== Crawler-based Products ==
 
Crawler will become available in 2014 in two different flavors.
 
* Middle: For more advanced workflows, we'll have customizable Crawler versions that will come with 'open source' personalities.
 
We'll also be able to provide training and ongoing support for developing or customizing personalities. These Crawler versions are geared towards deployment in a server setup. The anticipated applications are automated conversions, automated web publishing, and automated back-end database updates.
 
* High end: For the most advanced setups we can also provide 'fully open source' versions of Crawler. This will allow seamless integration of Crawler into an existing workflow. This type of integration always comes with training and ongoing support.
 
Contact [mailto:[email protected] [email protected]] for more info.
 
== Overview ==
  
 
Crawler is designed along principles similar to those found in the [http://en.wikipedia.org/wiki/Dataflow_programming Data Flow Programming] paradigm.
  
Crawler all by itself does not perform any useful function. In order to become usable it needs to be extended with a [[Personality|''personality'']].

The selected ''personality'' determines what function Crawler will perform.
  
 
== Personality ==
 
 
One of the high-level components in a Crawler-based system is called a [[Personality|''personality'']].
 
  
A [[Personality|''personality'']] is a high-level Crawler component which takes input data in some shape or form, and processes it into output data in some other form.
  
 
A few examples:
 
* InDesign-to-XHTML/CSS: takes in InDesign documents or books and outputs XHTML/CSS files.
* InDesign-to-EPUB: takes in InDesign documents or books, outputs EPUB.
* InDesign-to-Database input: takes in InDesign documents or books, and updates a database with information extracted from the document(s).
 
When processing, input document(s) are pushed through a network of [[Adapter|''adapters'']] provided by the personality; data flows in and out of the ''adapters''.

[[File:Sampleexporter1.png|800px]]
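In code form, the flow in the diagram above might be sketched roughly like this. This is a minimal illustration only, assuming generator-style adapters; the adapter names and the wiring are hypothetical, not Crawler's actual API.

<syntaxhighlight lang="python">
# Minimal sketch of the adapter-network idea; all names are hypothetical.
# Each adapter is a generator: granules flow in, granules flow out, and
# the personality is just the wiring between the adapters.

def read_document(path):
    """Source adapter: emits the whole document as a single granule."""
    with open(path, encoding="utf-8") as f:
        yield f.read()

def split_paragraphs(granules):
    """Adapter: takes document granules apart into paragraph granules."""
    for text in granules:
        for paragraph in text.split("\n\n"):
            yield paragraph

def write_output(granules):
    """Sink adapter: consumes whatever granules reach the end of the network."""
    for granule in granules:
        print(granule)

# The 'personality': the input document is pushed through the network.
write_output(split_paragraphs(read_document("input.txt")))
</syntaxhighlight>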
  
 
If the ''personality'' were a hive, then the ''adapters'' would be the worker bees.
 
 
A ''personality'' is somewhat reminiscent of a [http://en.wikipedia.org/wiki/Rube_Goldberg_machine Rube Goldberg machine].
  
The initial ''adapters'' process the document and take it apart into ever smaller chunks of data.

The reverse also happens: some adapters collate smaller chunks back into larger chunks.

These 'chunks of data' are referred to as [[Granule|''granules'']].

Example: an adapter might take in a paragraph ''granule'' and split it into individual word ''granules''. Another adapter further downstream might take a number of word ''granules'' and concatenate them back into a paragraph ''granule''.
  
 
Some ''adapters'' perform some kind of processing on the ''granules'' they receive; they might change them in some way, discard them, count them, reorder them, create new ''granules'' based on previous ''granules''...
 
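As a hedged sketch of the example above, in the same hypothetical generator style (none of these names come from Crawler itself): one adapter splits paragraph ''granules'' into word ''granules'', a processing adapter counts granules without changing them, and a collating adapter concatenates the words back into a paragraph ''granule''.

<syntaxhighlight lang="python">
def split_into_words(paragraph_granules):
    """Splitting adapter: paragraph granules in, word granules out."""
    for paragraph in paragraph_granules:
        for word in paragraph.split():
            yield word

def count_granules(granules):
    """Processing adapter: counts the granules passing through, unchanged."""
    count = 0
    for granule in granules:
        count += 1
        yield granule
    print(f"{count} granules passed through")

def collate_into_paragraph(word_granules):
    """Collating adapter: concatenates word granules into one paragraph granule."""
    yield " ".join(word_granules)

words = split_into_words(["the quick brown fox"])
for paragraph in collate_into_paragraph(count_granules(words)):
    print(paragraph)  # the count prints first, then "the quick brown fox"
</syntaxhighlight>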

Other ''adapters'' construct new ''granules'' based on template snippets. For example, some adapter could take in some raw text, and combine this raw text with a template snippet into an XML-formatted granule.
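A rough sketch of that idea, with a made-up snippet syntax (plain Python string formatting here, standing in for Crawler's actual template files):

<syntaxhighlight lang="python">
from xml.sax.saxutils import escape

# Hypothetical template snippet; Crawler's real template-file format may differ.
TEMPLATE_SNIPPET = "<para>{text}</para>"

def apply_template(raw_text_granules):
    """Adapter: combines each raw-text granule with the snippet into an XML granule."""
    for raw_text in raw_text_granules:
        yield TEMPLATE_SNIPPET.format(text=escape(raw_text))

for xml_granule in apply_template(["Fish & chips"]):
    print(xml_granule)  # -> <para>Fish &amp; chips</para>
</syntaxhighlight>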

The general idea is that the input data is broken apart into smaller entities, and then these smaller entities are put back together again in a different shape, possibly performing a document conversion in the process.