Difference between revisions of "Overview"

From DocDataFlow

Revision as of 04:10, 26 December 2013

Internally, Crawler is designed along principles similar to those of the Data Flow Programming paradigm (http://en.wikipedia.org/wiki/Dataflow_programming).

One of the basic components in a Crawler-based system is the 'Personality'. A Personality is a Crawler component that takes some form of input and processes it into some other form.

Sample Personalities are: InDesign-to-XHTML/CSS, InDesign-to-EPUB, InDesign-to-Database input...

Personalities are built up from simpler elements: a Personality is a composite of a network of interconnected processing units called 'Adapters', a set of configuration files, a set of template files...

The input document(s) are pushed through the network of adapters. The adapters process the document, either taking it apart into ever smaller chunks of data or collating smaller chunks back into larger chunks.
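
To make the idea of pushing a document through a network of adapters more concrete, here is a minimal sketch in Python. It is not Crawler's actual API: the class names Adapter and Sink, the methods connect and push, and the function run_example are all hypothetical, chosen only to illustrate a push-style data flow.

```python
class Adapter:
    """Hypothetical processing unit: receives a chunk, emits zero or more chunks."""

    def __init__(self, transform):
        # transform maps one incoming chunk to a list of outgoing chunks.
        self.transform = transform
        self.downstream = []

    def connect(self, other):
        """Wire this adapter's output to another adapter; returns it for chaining."""
        self.downstream.append(other)
        return other

    def push(self, chunk):
        """Process one chunk and push every result to all connected adapters."""
        for out in self.transform(chunk):
            for adapter in self.downstream:
                adapter.push(out)


class Sink(Adapter):
    """Hypothetical terminal adapter that only records what it receives."""

    def __init__(self):
        super().__init__(lambda chunk: [chunk])
        self.received = []

    def push(self, chunk):
        self.received.append(chunk)


def run_example():
    # A tiny network: split a document into lines, then upper-case each line.
    split_lines = Adapter(lambda doc: doc.splitlines())
    upper = Adapter(lambda line: [line.upper()])
    sink = Sink()
    split_lines.connect(upper).connect(sink)

    split_lines.push("first line\nsecond line")
    return sink.received
```

In this sketch the input is handed to the first adapter only; every later adapter is driven by whatever its upstream neighbour emits, which is the essence of the data flow style described above.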

These chunks of data are called 'Granules'. Some adapters take in larger granules and split them up into smaller granules (e.g. they might take in a paragraph granule and split it into individual word granules). Some adapters collate smaller granules back into larger granules (e.g. they might take a number of word granules and concatenate them back into a paragraph granule). Other adapters perform some kind of processing on the granules they receive; they might change them in some way, discard them, create new granules based on previous granules...
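
The splitting, collating and processing roles can be sketched as plain Python functions. Everything here is an assumption made for illustration: the Granule class, its kind and text fields, and the function names are hypothetical and are not taken from Crawler itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Granule:
    """Hypothetical chunk of data: a kind label plus its text payload."""
    kind: str
    text: str


def split_paragraph(paragraph: Granule) -> List[Granule]:
    """Splitting adapter: one paragraph granule becomes many word granules."""
    return [Granule("word", word) for word in paragraph.text.split()]


def collate_words(words: Iterable[Granule]) -> Granule:
    """Collating adapter: many word granules become one paragraph granule."""
    return Granule("paragraph", " ".join(word.text for word in words))


def uppercase(granule: Granule) -> Granule:
    """Processing adapter that changes a granule, keeping its kind."""
    return Granule(granule.kind, granule.text.upper())


def discard_short(granules: Iterable[Granule], min_length: int) -> List[Granule]:
    """Processing adapter that discards granules shorter than min_length."""
    return [g for g in granules if len(g.text) >= min_length]
```

A usage example under the same assumptions: splitting Granule("paragraph", "the quick brown fox") gives four word granules, and collating them again gives back an equal paragraph granule. Note that this simple round trip normalises whitespace, because the words are rejoined with single spaces.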

Some adapters construct new granules based on a template snippet. For example, an adapter could take in some raw text and combine it with a template snippet into an XML-formatted granule.
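
A template-based adapter of this kind might look roughly like the following Python sketch. The snippet text, the placeholder names $style and $text, and the function apply_template are hypothetical; Crawler's real template syntax is not described here, so Python's standard string.Template stands in for it.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical template snippet; the placeholder names are assumptions.
PARAGRAPH_SNIPPET = Template('<p class="$style">$text</p>')


def apply_template(raw_text: str, style: str = "body") -> str:
    """Combine raw text with the snippet into an XML-formatted string."""
    return PARAGRAPH_SNIPPET.substitute(
        # Escape the double quote as well, since style lands in an attribute.
        style=escape(style, {'"': "&quot;"}),
        # Escape &, < and > so the raw text cannot break the XML.
        text=escape(raw_text),
    )
```

Escaping the raw text before substitution is a design choice of this sketch: it keeps the resulting granule well-formed even when the input contains characters such as & or <.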