From DocDataFlow
Revision as of 20:18, 26 December 2013

Overview

Crawler is designed along principles similar to those found in the [Data Flow Programming] paradigm.

Crawler all by itself does not perform any useful function. In order to become usable, it needs to be extended with a personality; the personality determines what function Crawler will perform.

Personality

One of the high-level components in a Crawler-based system is called a personality.

A personality is a high-level Crawler component that takes input data in some shape or form and processes it into output data in some other form.

A few examples:

  • InDesign-to-XHTML/CSS: takes in InDesign documents or books and outputs XHTML/CSS files.
  • InDesign-to-EPUB: takes in InDesign documents or books, outputs EPUB.
  • InDesign-to-Database input: takes in InDesign documents or books, and updates a database with information extracted from the document(s).

Personalities are built up from simpler elements.

A personality is composed of:

  • a network of adapters
  • a set of formula files

When processing, input document(s) are pushed through the network of adapters provided by the personality; data flows in and out of the adapters.
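
This flow can be sketched in plain JavaScript. None of the names below come from Crawler itself; they are invented to illustrate the idea of a personality as a chain of adapter functions that data flows through:

```javascript
// Hypothetical sketch (not Crawler's actual API): each adapter is a
// function that takes granules in and pushes granules out.
const splitIntoLines = (granules) =>
  granules.flatMap((g) => g.split("\n"));

const trimEach = (granules) => granules.map((g) => g.trim());

// The personality pushes its input through each adapter in turn.
const adapters = [splitIntoLines, trimEach];
const runPersonality = (input) =>
  adapters.reduce((data, adapter) => adapter(data), input);

const output = runPersonality(["  first line \n second line  "]);
// output is ["first line", "second line"]
```

A real personality would wire the adapters into a network rather than a straight chain, but the principle is the same: data enters one end and transformed data leaves the other.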

If the personality were a hive, then the adapters would be the worker bees.

A personality is somewhat reminiscent of a Rube Goldberg-machine.

The initial adapters process the document, and take it apart into ever smaller chunks of data, or collate smaller chunks back into larger chunks. These chunks of data are called granules.
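
As a rough illustration (the granule shapes here are invented, not Crawler's internal representation), an initial adapter might chunk a document granule into paragraph granules:

```javascript
// Hypothetical granule: a chunk of data with a kind and some text.
const documentGranule = {
  kind: "document",
  text: "First paragraph.\n\nSecond paragraph.",
};

// An initial adapter takes the document apart into paragraph granules.
const toParagraphs = (doc) =>
  doc.text.split("\n\n").map((t) => ({ kind: "paragraph", text: t }));

const paragraphs = toParagraphs(documentGranule);
// two paragraph granules
```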

More adapters take in larger granules and split them up into smaller granules (e.g. they might take in a paragraph granule and split it into individual word granules).
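
A splitter of that kind could look like the following sketch (invented names, for illustration only):

```javascript
// Hypothetical splitter adapter: one paragraph granule in,
// many word granules out.
function splitParagraph(paragraphGranule) {
  return paragraphGranule.text
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((word) => ({ kind: "word", text: word }));
}

const words = splitParagraph({ kind: "paragraph", text: "Hello brave new world" });
// four word granules: "Hello", "brave", "new", "world"
```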

Specific adapters collate smaller granules back into larger granules (e.g. they might take a number of word granules and concatenate them back into a paragraph granule).
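
A collating adapter is the mirror image of a splitter. Again, a hypothetical sketch rather than Crawler's real API:

```javascript
// Hypothetical collating adapter: word granules back into one
// paragraph granule.
function collateWords(wordGranules) {
  return {
    kind: "paragraph",
    text: wordGranules.map((g) => g.text).join(" "),
  };
}

const para = collateWords([
  { kind: "word", text: "Hello" },
  { kind: "word", text: "world" },
]);
// para.text is "Hello world"
```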

Some adapters perform some kind of processing on the granules they receive; they might change them in some way, discard them, count them, reorder them, or create new granules based on previous granules.
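
Two such processing adapters could be sketched like this (invented, illustrative names): one discards empty granules, another passes granules through unchanged while counting them:

```javascript
// Hypothetical filtering adapter: discard granules with no text.
const dropEmpty = (granules) => granules.filter((g) => g.text.trim() !== "");

// Hypothetical counting adapter: pass granules through unchanged,
// keeping a running total of how many were seen.
function makeCounter() {
  const counter = { seen: 0 };
  const countGranules = (granules) => {
    counter.seen += granules.length;
    return granules;
  };
  return { counter, countGranules };
}

const { counter, countGranules } = makeCounter();
const kept = countGranules(dropEmpty([{ text: "a" }, { text: "  " }, { text: "b" }]));
// kept holds 2 granules; counter.seen is 2
```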

Other adapters construct new granules based on template snippets. For example, some adapter could take in some raw text, and combine this raw text with a template snippet into an XML-formatted granule.
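
Such an adapter might work along these lines; the placeholder syntax shown here is invented for the sketch and is not Crawler's actual template-snippet syntax:

```javascript
// Hypothetical template-snippet adapter: merge raw text into an
// XML-shaped granule via a placeholder.
const snippet = "<para>{{text}}</para>";

function applySnippet(template, rawText) {
  return { kind: "xml", text: template.replace("{{text}}", rawText) };
}

const xmlGranule = applySnippet(snippet, "Some raw text");
// xmlGranule.text is "<para>Some raw text</para>"
```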

The general idea is that the input data is broken apart into smaller entities, and then these smaller entities are put back together again in a different shape, possibly performing a document conversion in the process.
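
The whole break-apart/reassemble cycle can be condensed into one small sketch, using invented names: a "document" is split into word granules, each word is transformed, and the words are collated back into a converted document:

```javascript
// Hypothetical end-to-end pipeline: split, transform, collate.
const splitWords = (doc) => doc.split(/\s+/).filter((w) => w !== "");
const upcase = (words) => words.map((w) => w.toUpperCase());
const collate = (words) => words.join(" ");

// The "conversion" here is just upper-casing, standing in for a real
// transformation such as InDesign-to-XHTML.
const convert = (doc) => collate(upcase(splitWords(doc)));

const converted = convert("input data, reassembled differently");
// "INPUT DATA, REASSEMBLED DIFFERENTLY"
```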