Overview

From DocDataFlow
Latest revision as of 02:59, 5 July 2019

Crawler-based Products

Crawler will become available in 2014 in two different flavors.

  • Middle: For more advanced workflows, we'll have customizable Crawler versions that will come with 'open source' personalities.

We will also be able to provide training and ongoing support for developing or customizing personalities. These Crawler versions are designed for deployment in a server setup. The anticipated applications are automated conversions, automated web publishing, and automated back-end database updates.

  • High end: For the most advanced setups we can also provide 'fully open source' versions of Crawler. This will allow seamless integration of Crawler into an existing workflow. This type of integration always comes with training and ongoing support.

Contact [email protected] for more info.

Overview

Crawler is designed along principles similar to those found in the Data Flow Programming paradigm (http://en.wikipedia.org/wiki/Dataflow_programming).

On its own, Crawler does not perform any useful function. To become usable, it needs to be extended with a personality.

The selected personality determines what function Crawler will perform.
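To make this division of labour concrete, here is a minimal Python sketch of an engine that does nothing until a personality is plugged in. The classes `Crawler` and `Personality`, the `load`/`run` methods, and the idea of a personality as an ordered list of adapter functions are all hypothetical illustrations, not Crawler's actual API.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical: an adapter turns one stream of granules into another.
Adapter = Callable[[Iterable[str]], Iterable[str]]


class Personality:
    """Hypothetical: a named, ordered chain of adapters that defines Crawler's job."""

    def __init__(self, name: str, adapters: List[Adapter]) -> None:
        self.name = name
        self.adapters = adapters


class Crawler:
    """Hypothetical engine: inert until a personality is loaded."""

    def __init__(self) -> None:
        self.personality: Optional[Personality] = None

    def load(self, personality: Personality) -> None:
        self.personality = personality

    def run(self, document: str) -> List[str]:
        if self.personality is None:
            raise RuntimeError("no personality loaded: Crawler has nothing to do")
        stream: Iterable[str] = [document]
        for adapter in self.personality.adapters:
            stream = adapter(stream)  # each adapter consumes the previous stream
        return list(stream)


def shout(granules: Iterable[str]) -> Iterable[str]:
    """A trivial example adapter that upper-cases every granule."""
    for g in granules:
        yield g.upper()


crawler = Crawler()
crawler.load(Personality("shout", [shout]))
print(crawler.run("hello"))  # ['HELLO']
```

Loading a different `Personality` changes what `run` does without touching the `Crawler` class itself, which mirrors the point that the selected personality, not the engine, determines the function performed.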

Personality

One of the high-level components in a Crawler-based system is called a personality.

A personality is a high-level Crawler component that takes input data in one form and processes it into output data in another form.

A few examples:

  • InDesign-to-XHTML/CSS: takes in InDesign documents or books and outputs XHTML/CSS files.
  • InDesign-to-EPUB: takes in InDesign documents or books, outputs EPUB.
  • InDesign-to-Database input: takes in InDesign documents or books and updates a database with information extracted from the document(s).

When processing, input document(s) are pushed through a network of adapters provided by the personality; data flows in and out of the adapters.

[Figure: Sampleexporter1.png]
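The push-based flow through a network of adapters can be sketched as a small graph in which each node processes a granule and pushes its results to every connected downstream node. `Adapter`, `Sink`, `connect`, and `push` are hypothetical names chosen for this sketch; it only assumes that adapters form a directed network and that one output can fan out to several consumers.

```python
from typing import Callable, List


class Adapter:
    """Hypothetical network node: processes one granule, pushes results downstream."""

    def __init__(self, name: str, fn: Callable[[str], List[str]]) -> None:
        self.name = name
        self.fn = fn
        self.downstream: List["Adapter"] = []

    def connect(self, other: "Adapter") -> "Adapter":
        """Wire this node's output to another node; returns `other` for chaining."""
        self.downstream.append(other)
        return other

    def push(self, granule: str) -> None:
        for result in self.fn(granule):
            for node in self.downstream:  # fan out to every connected node
                node.push(result)


class Sink(Adapter):
    """Terminal node that simply collects whatever reaches it."""

    def __init__(self, name: str) -> None:
        super().__init__(name, lambda g: [g])
        self.received: List[str] = []

    def push(self, granule: str) -> None:
        self.received.append(granule)


# A tiny network: split a document into lines, then fan out to two branches.
split = Adapter("split-lines", lambda doc: doc.splitlines())
upper = Adapter("upper", lambda line: [line.upper()])
length = Adapter("length", lambda line: [str(len(line))])
upper_out, length_out = Sink("upper-out"), Sink("length-out")

split.connect(upper).connect(upper_out)
split.connect(length).connect(length_out)

split.push("ab\ncde")
print(upper_out.received)   # ['AB', 'CDE']
print(length_out.received)  # ['2', '3']
```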

If the personality were a hive, then the adapters would be the worker bees.

A personality is somewhat reminiscent of a Rube Goldberg machine.

The initial adapters process the document and take it apart into ever smaller chunks of data.

The reverse also happens: some adapters collate smaller chunks back into larger chunks.

These 'chunks of data' are referred to as granules.

Example: an adapter might take in a paragraph granule and split it into individual word granules. Another adapter further downstream might take a number of word granules and concatenate them back into a paragraph granule.
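A minimal Python sketch of that split-and-collate pair follows, modelling adapters as generator functions over a stream of granules. The `Granule` class, its `kind` labels, and the extra "paragraph-end" marker granule (which tells the collating adapter where each paragraph stopped) are assumptions of this sketch, not part of Crawler as documented.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Granule:
    kind: str  # hypothetical labels: "paragraph", "word", "paragraph-end"
    text: str = ""


def split_words(granules: Iterable[Granule]) -> Iterator[Granule]:
    """Splitting adapter: one paragraph granule in, several word granules out."""
    for g in granules:
        if g.kind == "paragraph":
            for word in g.text.split():
                yield Granule("word", word)
            yield Granule("paragraph-end")  # marks where the paragraph stopped
        else:
            yield g  # pass through granules this adapter does not handle


def join_words(granules: Iterable[Granule]) -> Iterator[Granule]:
    """Collating adapter: gathers word granules back into paragraph granules."""
    words: List[str] = []
    for g in granules:
        if g.kind == "word":
            words.append(g.text)
        elif g.kind == "paragraph-end":
            yield Granule("paragraph", " ".join(words))
            words = []
        else:
            yield g


paragraphs = [Granule("paragraph", "the quick  brown fox"), Granule("paragraph", "jumps over")]
rebuilt = list(join_words(split_words(paragraphs)))
print([g.text for g in rebuilt])  # ['the quick brown fox', 'jumps over']
```

Note that the round trip normalizes whitespace (the double space disappears), a small example of data being reassembled in a slightly different shape.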

Some adapters perform other kinds of processing on the granules they receive: they might change them, discard them, count them, reorder them, or create new granules based on previous ones.
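The kinds of processing listed above (changing, discarding, counting, reordering) can each be sketched as a small stream adapter. The factory names `transform`, `keep_if`, and `reorder`, the `CountingAdapter` class, and the `run` helper are hypothetical, and granules are simplified to plain strings.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List

Stream = Iterable[str]


def transform(fn: Callable[[str], str]) -> Callable[[Stream], Iterator[str]]:
    """Adapter factory: change every granule."""
    def adapter(granules: Stream) -> Iterator[str]:
        for g in granules:
            yield fn(g)
    return adapter


def keep_if(pred: Callable[[str], bool]) -> Callable[[Stream], Iterator[str]]:
    """Adapter factory: discard granules that fail the predicate."""
    def adapter(granules: Stream) -> Iterator[str]:
        for g in granules:
            if pred(g):
                yield g
    return adapter


class CountingAdapter:
    """Passes granules through unchanged while tallying how often each occurs."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, granules: Stream) -> Iterator[str]:
        for g in granules:
            self.counts[g] += 1
            yield g


def reorder(key: Callable[[str], int]) -> Callable[[Stream], Iterator[str]]:
    """Adapter factory: reorder granules (must buffer the whole stream first)."""
    def adapter(granules: Stream) -> Iterator[str]:
        yield from sorted(granules, key=key)
    return adapter


def run(granules: Stream, adapters: List[Callable[[Stream], Iterator[str]]]) -> List[str]:
    """Chain the adapters and drain the final stream."""
    stream: Stream = granules
    for adapter in adapters:
        stream = adapter(stream)
    return list(stream)


counter = CountingAdapter()
result = run(
    ["Pear", "fig", "Apple", "fig", "kiwi"],
    [transform(str.lower), keep_if(lambda w: w != "kiwi"), counter, reorder(key=len)],
)
print(result)                 # ['fig', 'fig', 'pear', 'apple']
print(counter.counts["fig"])  # 2
```

The reordering adapter is the odd one out: unlike the others it cannot emit anything until it has seen its entire input, which is worth keeping in mind when placing such an adapter in a network.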

Other adapters construct new granules based on template snippets. For example, an adapter could take in some raw text and combine it with a template snippet to produce an XML-formatted granule.
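Here is a minimal sketch of such a template-driven adapter. Python's standard `string.Template` stands in for Crawler's own snippet mechanism, whose file format and placeholder syntax are not described in this section, so `PARA_SNIPPET` and `snippet_adapter` are hypothetical. The raw text is XML-escaped before substitution so that characters such as `&` and `<` cannot break the generated markup.

```python
from string import Template
from typing import Callable, Iterable, Iterator
from xml.sax.saxutils import escape

# Hypothetical snippet; Crawler's real snippet syntax may differ.
PARA_SNIPPET = Template('<p class="$style">$text</p>')


def snippet_adapter(snippet: Template, style: str) -> Callable[[Iterable[str]], Iterator[str]]:
    """Build an adapter that wraps each raw-text granule in the snippet."""
    safe_style = escape(style, {'"': "&quot;"})  # attribute value: also escape quotes

    def adapter(granules: Iterable[str]) -> Iterator[str]:
        for raw in granules:
            # Escape &, <, > so raw text cannot break the XML structure.
            yield snippet.substitute(text=escape(raw), style=safe_style)

    return adapter


to_xml = snippet_adapter(PARA_SNIPPET, "body")
print(list(to_xml(["Fish & Chips", "a < b"])))
# ['<p class="body">Fish &amp; Chips</p>', '<p class="body">a &lt; b</p>']
```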

The general idea is that the input data is broken apart into smaller entities, which are then put back together in a different shape, possibly performing a document conversion in the process.
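As an end-to-end illustration of breaking data apart and reassembling it in a different shape, the sketch below converts a plain-text document into a simplified XHTML document using three chained adapters. The function names, the blank-line paragraph rule, and the bare `<html><body>` wrapper (no namespace or `<head>`) are assumptions made to keep the example short. None of this reflects how an actual Crawler personality is implemented.

```python
import html
from typing import Iterable, Iterator


def split_paragraphs(docs: Iterable[str]) -> Iterator[str]:
    """Break a document granule apart into paragraph granules (blank-line separated)."""
    for doc in docs:
        for block in doc.split("\n\n"):
            text = " ".join(block.split())  # collapse hard line breaks and extra spaces
            if text:
                yield text


def to_xhtml_paragraph(paras: Iterable[str]) -> Iterator[str]:
    """Reshape each paragraph granule into an escaped XHTML <p> granule."""
    for p in paras:
        yield f"<p>{html.escape(p)}</p>"


def collate_document(parts: Iterable[str]) -> Iterator[str]:
    """Put the pieces back together as a single document granule."""
    body = "\n".join(parts)
    yield f"<html><body>\n{body}\n</body></html>"


def convert(text: str) -> str:
    """Push one plain-text document through the chain and return the result."""
    stream: Iterable[str] = [text]
    for adapter in (split_paragraphs, to_xhtml_paragraph, collate_document):
        stream = adapter(stream)
    return next(iter(stream))


print(convert("First line\ncontinues here.\n\nSecond & last."))
```

For the sample input, this prints a document with two `<p>` elements: the first paragraph's hard line break is joined into one line, and the `&` in the second is escaped as `&amp;`.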