Difference between revisions of "Overview"

From DocDataFlow
Template Files

Template files in Crawler come in many types. One of the most common is the snippet template: a template file that generates a snippet of text.

A snippet template is simply a text file with a .snippet file name extension.

Inside the template file, boilerplate text is mixed with placeholders. For example, a template named maindoc.xhtml.snippet could contain:

<pre>
<html>
$$HEAD$$
$$BODY$$
</html>
</pre>

Here $$HEAD$$ and $$BODY$$ are placeholders, which are replaced by the text of lower-level granules.

This is only an example: placeholders can take many forms, and $$ as a prefix/suffix is simply Crawler's default placeholder pattern.
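The following minimal Python sketch shows how snippet expansion could work. The function `expand_snippet` and its regular expression are hypothetical illustrations, not Crawler's actual template engine. The sketch assumes only the default `$$NAME$$` pattern described above, and it lets a different pattern be passed in.

```python
import re
from typing import Mapping

# Hypothetical helper for illustration; this is not Crawler's own template engine.
# The default pattern matches $$NAME$$ placeholders, mirroring Crawler's default style.
DEFAULT_PLACEHOLDER = re.compile(r"\$\$([A-Z_][A-Z0-9_]*)\$\$")


def expand_snippet(snippet: str, values: Mapping[str, str],
                   pattern: "re.Pattern[str]" = DEFAULT_PLACEHOLDER) -> str:
    """Replace each placeholder with the text of the matching lower-level granule.

    Placeholders with no value are left untouched so a later pass could fill them.
    """
    def substitute(match: "re.Match[str]") -> str:
        return values.get(match.group(1), match.group(0))

    return pattern.sub(substitute, snippet)


maindoc_snippet = "<html>\n$$HEAD$$\n$$BODY$$\n</html>"
print(expand_snippet(maindoc_snippet, {
    "HEAD": "<head><title>Demo</title></head>",
    "BODY": "<body><p>Hello</p></body>",
}))
```

The sketch passes a function to `re.sub` rather than a replacement string. As a result, backslashes in granule text are inserted literally instead of being read as escape sequences.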
  
 

Revision as of 20:01, 26 December 2013

Overview

Crawler is designed along principles similar to those of the dataflow programming paradigm.

Personality

One of the high-level components in a Crawler-based system is called a personality.

A personality is a Crawler component that takes input data in one form and processes it into output data in another form.

A few examples:

  • InDesign-to-XHTML/CSS: takes in InDesign documents or books and outputs XHTML/CSS files.
  • InDesign-to-EPUB: takes in InDesign documents or books and outputs EPUB.
  • InDesign-to-database: takes in InDesign documents or books and updates a database with information extracted from the document(s).

Personalities are constructed out of simpler elements.

A personality is composed of, among other things:

  • a network of adapters
  • a set of formula files

During processing, input document(s) are pushed through the network of adapters provided by the personality; data flows in and out of the adapters. A personality is somewhat reminiscent of a Rube Goldberg machine.
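The data flow described above can be pictured with a small Python sketch. It uses a deliberately simplified model: each adapter is a generator function that consumes a stream of granules (plain strings here) and yields a new stream. The names `split_lines`, `drop_blank` and `run_personality` are hypothetical. A linear chain is also only the simplest possible network, and a real personality may wire its adapters together in more complex ways.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical model, not Crawler's API: an adapter consumes a stream of granules
# (plain strings here) and lazily yields a new stream.
Adapter = Callable[[Iterable[str]], Iterator[str]]


def split_lines(granules: Iterable[str]) -> Iterator[str]:
    """Break each incoming granule into one granule per line."""
    for granule in granules:
        yield from granule.splitlines()


def drop_blank(granules: Iterable[str]) -> Iterator[str]:
    """Let only non-blank granules flow on to the next adapter."""
    for granule in granules:
        if granule.strip():
            yield granule


def run_personality(adapters: List[Adapter], documents: Iterable[str]) -> List[str]:
    """Push the input documents through the adapters, in order, and collect the output."""
    stream: Iterable[str] = documents
    for adapter in adapters:
        stream = adapter(stream)  # wire this adapter's input to the previous output
    return list(stream)


print(run_personality([split_lines, drop_blank], ["first\n\nsecond", "third"]))
```

Because the adapters are generators, each granule moves on through the chain as soon as it is produced. No adapter has to finish its whole input before the next one starts.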

If the personality were a hive, then the adapters would be the worker bees.

The initial adapters process the document, taking it apart into ever smaller chunks of data or collating smaller chunks back into larger ones. These chunks of data are called granules.

Some adapters take in larger granules and split them into smaller ones (e.g. they might take in a paragraph granule and split it into individual word granules).
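A splitting adapter of this kind might look like the following Python sketch. The `Granule` class, with its `kind` and `text` fields, is a hypothetical stand-in for Crawler's granules. Splitting on whitespace is a simplifying assumption about what counts as a word.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Granule:
    """Hypothetical granule: a chunk of text tagged with its kind."""
    kind: str  # e.g. "paragraph" or "word"
    text: str


def split_paragraphs_into_words(granules: Iterable[Granule]) -> Iterator[Granule]:
    """Splitting adapter: one word granule per whitespace-separated word.

    Granules that are not paragraphs pass through unchanged.
    """
    for granule in granules:
        if granule.kind == "paragraph":
            for word in granule.text.split():
                yield Granule("word", word)
        else:
            yield granule


for g in split_paragraphs_into_words([Granule("paragraph", "Data flows through adapters")]):
    print(g)
```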

Other adapters collate smaller granules back into larger ones (e.g. they might take a number of word granules and concatenate them into a paragraph granule).
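To collate word granules back into a paragraph granule, an adapter needs some way to know where one paragraph ends. This Python sketch assumes, hypothetically, that any non-word granule (for example a `"break"` granule) marks that boundary. The `Granule` class and `collate_words` are illustrative names, not Crawler's API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Granule:
    """Hypothetical granule: a chunk of text tagged with its kind."""
    kind: str  # e.g. "word", "paragraph" or "break"
    text: str


def collate_words(granules: Iterable[Granule], separator: str = " ") -> Iterator[Granule]:
    """Collating adapter: each run of consecutive word granules becomes one paragraph granule.

    Any non-word granule ends the current run and is passed through unchanged.
    """
    pending: List[str] = []
    for granule in granules:
        if granule.kind == "word":
            pending.append(granule.text)
            continue
        if pending:
            yield Granule("paragraph", separator.join(pending))
            pending = []
        yield granule
    if pending:  # flush the last run at the end of the stream
        yield Granule("paragraph", separator.join(pending))


stream = [Granule("word", "Hello"), Granule("word", "world"), Granule("break", "")]
print(list(collate_words(stream)))
```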

Further adapters process the granules they receive in other ways: they might modify, discard, count or reorder them, or create new granules based on earlier ones.
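Each kind of processing listed above (changing, discarding, counting and reordering granules) can be sketched as a tiny Python function. Word granules are modelled as plain strings, and all function names are hypothetical. The sketch is meant only to show the shape of each operation.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Set

# Word granules are modelled as plain strings to keep the sketch short.


def change(words: Iterable[str]) -> Iterator[str]:
    """Change each granule; here, normalise it to lower case."""
    return (word.lower() for word in words)


def discard(words: Iterable[str], unwanted: Set[str]) -> Iterator[str]:
    """Discard granules that appear in `unwanted`."""
    return (word for word in words if word not in unwanted)


def count(words: Iterable[str]) -> Counter:
    """Summarise the stream as a single new granule: a word-frequency table."""
    return Counter(words)


def reorder(words: Iterable[str]) -> List[str]:
    """Reorder granules; here, alphabetically."""
    return sorted(words)


words = ["The", "cat", "and", "the", "hat", "and", "the", "bat"]
kept = list(discard(change(words), {"and"}))
print(kept)
print(count(kept).most_common(1))
print(reorder(kept))
```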

Yet other adapters construct new granules based on template snippets. For example, an adapter could take in raw text and combine it with a template snippet to produce an XML-formatted granule.
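A template-based adapter can be sketched in Python as follows. The snippet and the function `text_to_xml_granule` are hypothetical. The sketch also substitutes Python's standard `string.Template` (with `$name` placeholders) for Crawler's own snippet format. The raw text is XML-escaped before insertion so the result stays well-formed.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical snippet. It uses Python's string.Template ($name placeholders)
# rather than Crawler's $$NAME$$ syntax, purely to keep the sketch short.
PARAGRAPH_SNIPPET = Template('<p class="$style">$text</p>')


def text_to_xml_granule(raw_text: str, style: str = "body") -> str:
    """Combine raw text with the snippet, escaping XML special characters first."""
    return PARAGRAPH_SNIPPET.substitute(
        style=escape(style, {'"': "&quot;"}),  # also escape quotes inside the attribute
        text=escape(raw_text),
    )


print(text_to_xml_granule("Fish & chips < 5 EUR"))
```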

The general idea is that the input data is broken apart into smaller entities, which are then put back together in a different shape, possibly converting the document to another format in the process.
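Putting the pieces together, this Python sketch breaks a plain-text document apart into paragraphs and words, processes each word, and reassembles the result as XHTML, a small document conversion. The function names are illustrative only, and the sketch assumes that paragraphs are separated by blank lines.

```python
from html import escape
from typing import List


def split_paragraphs(document: str) -> List[str]:
    """Break the document into paragraph granules (assumed to be separated by blank lines)."""
    return [p for p in document.split("\n\n") if p.strip()]


def split_words(paragraph: str) -> List[str]:
    """Break a paragraph granule into word granules."""
    return paragraph.split()


def to_xhtml(document: str) -> str:
    """Break the document apart into paragraphs and words, then rebuild it as XHTML."""
    body = []
    for paragraph in split_paragraphs(document):
        words = [escape(w) for w in split_words(paragraph)]  # process each word granule
        body.append("<p>" + " ".join(words) + "</p>")        # collate words into a paragraph
    return "<body>\n" + "\n".join(body) + "\n</body>"


print(to_xhtml("Hello   world\n\nA < B"))
```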

Formula Files

Formula files in Crawler are JavaScript-like files which allow defining placeholders as JavaScript functions.
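This page does not show the syntax of Crawler's formula files, so the following is only a Python analogue of the idea: each placeholder name maps to a function that computes its text from the available data. The `FORMULAS` table, the `evaluate` function and the context keys are hypothetical.

```python
import re
from typing import Callable, Dict, Mapping

# Hypothetical Python stand-in for a formula file. Real formula files are
# JavaScript-like; here each placeholder name maps to a Python function that
# computes the placeholder's text from a context of granule data.
Formula = Callable[[Mapping[str, str]], str]

FORMULAS: Dict[str, Formula] = {
    "TITLE": lambda ctx: ctx["title"].upper(),
    "WORD_COUNT": lambda ctx: str(len(ctx["body"].split())),
}

PLACEHOLDER = re.compile(r"\$\$([A-Z_][A-Z0-9_]*)\$\$")


def evaluate(snippet: str, formulas: Mapping[str, Formula],
             context: Mapping[str, str]) -> str:
    """Replace each placeholder by calling its formula; unknown ones are left as-is."""
    def substitute(match: "re.Match[str]") -> str:
        formula = formulas.get(match.group(1))
        return formula(context) if formula is not None else match.group(0)

    return PLACEHOLDER.sub(substitute, snippet)


print(evaluate("<h1>$$TITLE$$</h1><p>$$WORD_COUNT$$ words</p>",
               FORMULAS, {"title": "Overview", "body": "one two three"}))
```

Defining placeholders as functions rather than fixed strings lets a value be computed at the moment it is needed, for example a count derived from other granules.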