Difference between revisions of "Overview"

From DocDataFlow
 
(34 intermediate revisions by one user not shown)
Line 1: Line 1:
= Overview =
+
== Crawler-based Products ==
  
Crawler is designed along principles that are similar to the ones found in the [[http://en.wikipedia.org/wiki/Dataflow_programming Data Flow Programming]] paradigm.
+
Crawler will become available in 2014 in two different flavors.
  
== Personality ==
+
* Middle: For more advanced workflows, we'll have customizable Crawler versions that will come with 'open source' personalities.
  
One of the basic components in a Crawler-based system is called the 'Personality'.  
+
We'll also be able to provide training and ongoing support for developing or customizing personalities. These Crawler versions are geared to be deployed in a server setup. The anticipated applications are automated conversions, automated web publishing, and automated back-end database updates.
  
A Personality is a Crawler component which will take input data in some shape or form, and will process it into output data in some other form.
+
* High end: For the most advanced setups we can also provide 'fully open source' versions of Crawler. This will allow seamless integration of Crawler into an existing workflow. This type of integration always comes with training and ongoing support.
  
A few examples:
+
Contact [mailto:sales@rorohiko.com sales@rorohiko.com] for more info.
* InDesign-to-XHTML/CSS: takes in InDesign documents or books and outputs XHTML/CSS files.  
+
* InDesign-to-EPUB: takes in InDesign documents or books, outputs EPUB.  
+
* InDesign-to-Database input: takes in InDesign documents or books, and updates a database with information extracted from the document(s).
+
  
Personalities are constructed out of simpler elements.
+
== Overview ==
  
A personality is composed of:
+
Crawler is designed along principles that are similar to the ones found in the [[http://en.wikipedia.org/wiki/Dataflow_programming Data Flow Programming]] paradigm.
* a workflow network of interconnected processing units called [[Adapters|''adapters'']]
+
* a set of [configuration files]
+
* a set of [template files]
+
* a set of [formula files]
+
  
== Adapters ==
+
Crawler all by itself does not perform any useful function. In order to become usable it needs to be extended with a [[Personality|''personality'']].
  
The input document(s) are pushed through the network of adapters provided by the personality.  
+
The selected ''personality'' determines what function Crawler will perform.
  
These adapters then process the document, and take it apart into ever smaller chunks of data, or collate smaller chunks back into larger chunks. These chunks of data are called 'Granules'.
+
== Personality ==
  
Some adapters take in larger granules and split them up into smaller granules (e.g. they might take in a paragraph granule and split it into individual word granules).  
+
One of the high-level components in a Crawler-based system is called a [[Personality|''personality'']].
  
Some adapters collate smaller granules back into larger granules (e.g. they might take a number of word granules and concatenate them back into a paragraph granule).  Other adapters perform some kind of processing on the granules they receive: they might change them in some way, discard them, or create new granules based on previous granules.
+
A [[Personality|''personality'']] is a high-level Crawler component which will take input data in some shape or form, and will process it into output data in some other form.
 +
 
 +
A few examples:
 +
* InDesign-to-XHTML/CSS: takes in InDesign documents or books and outputs XHTML/CSS files.  
 +
* InDesign-to-EPUB: takes in InDesign documents or books, outputs EPUB.
 +
* InDesign-to-Database input: takes in InDesign documents or books, and updates a database with information extracted from the document(s).
  
Some adapters construct new granules based on a template snippet. For example, an adapter could take in raw text and combine it with a template snippet into an XML-formatted granule.
+
When processing, input document(s) are pushed through a network of [[Adapter|''adapters'']] provided by the personality; data is flowing in and out of the ''adapters''.  
  
The general idea is that the input data is broken apart into smaller entities, and then these smaller entities are put back together again in a different shape.
+
[[File:Sampleexporter1.png|800px]]
  
== Template Files ==
+
If the ''personality'' were a hive, then the ''adapters'' would be the worker bees.
  
Template files in Crawler come in many types. One of the most common types is the 'snippet template'. These are template files that generate a snippet of text.  
+
A ''personality'' is somewhat reminiscent of a [http://en.wikipedia.org/wiki/Rube_Goldberg_machine Rube Goldberg-machine].
  
The template file is simply a text file with a .snippet file name extension.
+
The initial ''adapters'' process the document, and take it apart into ever smaller chunks of data.  
  
Inside the template file, there is a mix of boilerplate text and placeholders. An example: there could be a template maindoc.xhtml.snippet which could contain
+
The reverse also happens: some adapters collate smaller chunks back into larger chunks.  
  
<pre>
+
These 'chunks of data' are referred to as [[Granule|''granules'']].
<html>
+
$$HEAD$$
+
$$BODY$$
+
</html>
+
</pre>
+
  
The $$HEAD$$ and $$BODY$$ would be placeholders which will be replaced by the text for some lower-level granules.
+
Example: an adapter might take in a paragraph ''granule'' and split it into individual word ''granules''.  Another adapter further downstream might take a number of word ''granules'' and concatenate them back into a paragraph ''granule''.
  
This is only a sample: placeholders can take many shapes and forms, and the use of $$ as prefix/suffix is just what's used as the default placeholder pattern in Crawler.
+
Some ''adapters'' perform some kind of processing on the ''granules'' they receive; they might change them in some way, discard them, count them, reorder them, create new ''granules'' based on previous ''granules''...
  
== Formula Files ==
+
Other ''adapters'' construct new ''granules'' based on [[Template Snippet|''template snippets'']]. For example, some ''adapter'' could take in some raw text, and combine this raw text with a ''template snippet'' into some XML formatted ''granule''.
  
Formula files in Crawler are JavaScript-like files which allow defining placeholders as JavaScript functions.
+
The general idea is that the input data is broken apart into smaller entities, and then these smaller entities are put back together again in a different shape, possibly performing a document conversion in the process.

Latest revision as of 02:59, 5 July 2019

Crawler-based Products

Crawler will become available in 2014 in two different flavors.

  • Middle: For more advanced workflows, we'll have customizable Crawler versions that will come with 'open source' personalities.

We'll also be able to provide training and ongoing support for developing or customizing personalities. These Crawler versions are geared to be deployed in a server setup. The anticipated applications are automated conversions, automated web publishing, and automated back-end database updates.

  • High end: For the most advanced setups we can also provide 'fully open source' versions of Crawler. This will allow seamless integration of Crawler into an existing workflow. This type of integration always comes with training and ongoing support.

Contact sales@rorohiko.com for more info.

Overview

Crawler is designed along principles that are similar to the ones found in the Data Flow Programming paradigm.

By itself, Crawler does not perform any useful function. To become usable, it needs to be extended with a personality.

The selected personality determines what function Crawler will perform.

Personality

A personality is one of the high-level components in a Crawler-based system. It takes input data in one form and processes it into output data in another form.

A few examples:

  • InDesign-to-XHTML/CSS: takes in InDesign documents or books and outputs XHTML/CSS files.
  • InDesign-to-EPUB: takes in InDesign documents or books, outputs EPUB.
  • InDesign-to-Database input: takes in InDesign documents or books, and updates a database with information extracted from the document(s).

During processing, the input document(s) are pushed through a network of adapters provided by the personality; data flows in and out of the adapters.

Sampleexporter1.png

If the personality were a hive, then the adapters would be the worker bees.

A personality is somewhat reminiscent of a Rube Goldberg-machine.

The initial adapters process the document, and take it apart into ever smaller chunks of data.

The reverse also happens: some adapters collate smaller chunks back into larger chunks.

These 'chunks of data' are referred to as granules.

Example: an adapter might take in a paragraph granule and split it into individual word granules. Another adapter further downstream might take a number of word granules and concatenate them back into a paragraph granule.
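The split-and-collate example can be sketched in a few lines of Python. This is only an illustration of the idea, not Crawler's actual API: the Granule class, its kind and text fields, the "paragraph-end" marker granule, and the functions split_paragraphs and join_words are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Granule:
    """Hypothetical chunk of data; 'kind' and 'text' are illustrative fields."""
    kind: str
    text: str


def split_paragraphs(granules: Iterable[Granule]) -> Iterator[Granule]:
    """Adapter: turn each paragraph granule into individual word granules."""
    for granule in granules:
        if granule.kind == "paragraph":
            for word in granule.text.split():
                yield Granule("word", word)
            # Assumed convention: mark where the paragraph ended so that a
            # downstream adapter knows when to collate.
            yield Granule("paragraph-end", "")
        else:
            yield granule


def join_words(granules: Iterable[Granule]) -> Iterator[Granule]:
    """Adapter: collate word granules back into paragraph granules."""
    words: list[str] = []
    for granule in granules:
        if granule.kind == "word":
            words.append(granule.text)
        elif granule.kind == "paragraph-end":
            yield Granule("paragraph", " ".join(words))
            words = []
        else:
            yield granule


source = [Granule("paragraph", "Data flows through adapters")]
words = list(split_paragraphs(source))
rebuilt = list(join_words(words))
```

Writing each adapter as a generator keeps it independent of its neighbours: it only consumes granules from upstream and yields granules downstream, which is the data flow style described above.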

Some adapters process the granules they receive: they might change them in some way, discard them, count them, reorder them, or create new granules based on previous granules.
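As a rough sketch of such processing adapters, the following Python chains three of them: one discards granules, one changes them, and one counts them. For brevity, granules are modelled here as plain strings, which is an assumption of this sketch; the function names drop_empty, uppercase, and count_granules are hypothetical and not part of Crawler.

```python
from typing import Dict, Iterable, Iterator


def drop_empty(granules: Iterable[str]) -> Iterator[str]:
    """Adapter: discard granules that contain only whitespace."""
    for granule in granules:
        if granule.strip():
            yield granule


def uppercase(granules: Iterable[str]) -> Iterator[str]:
    """Adapter: change each granule by converting it to upper case."""
    for granule in granules:
        yield granule.upper()


def count_granules(granules: Iterable[str], stats: Dict[str, int]) -> Iterator[str]:
    """Adapter: count the granules passing through, leaving them unchanged."""
    for granule in granules:
        stats["seen"] = stats.get("seen", 0) + 1
        yield granule


stats: Dict[str, int] = {}
# Adapters are connected by nesting: the output of one feeds the next.
pipeline = count_granules(uppercase(drop_empty(["alpha", "  ", "beta"])), stats)
result = list(pipeline)
```

Because the counting adapter sits at the end of the chain, it only sees the granules that survived the earlier adapters.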

Other adapters construct new granules based on template snippets. For example, an adapter could take in raw text and combine it with a template snippet into an XML-formatted granule.
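A minimal sketch of this kind of template expansion follows. An earlier revision of this page describes $$NAME$$ as the default placeholder pattern, so the sketch assumes that pattern; the function fill_snippet, the TEXT placeholder, and the para element are hypothetical and chosen only for illustration. The sketch also does not escape XML special characters, which a real adapter would likely need to do.

```python
import re

# Assumed placeholder syntax: an upper-case name wrapped in $$ ... $$.
PLACEHOLDER = re.compile(r"\$\$([A-Z_]+)\$\$")


def fill_snippet(template: str, values: dict[str, str]) -> str:
    """Replace each $$NAME$$ placeholder with the matching value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders are left untouched so that a later
        # adapter still has the chance to fill them in.
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


template = "<para>$$TEXT$$</para>"
granule = fill_snippet(template, {"TEXT": "Hello, Crawler"})
partial = fill_snippet("$$HEAD$$ $$BODY$$", {"HEAD": "x"})
```

Using a replacement function rather than a replacement string avoids surprises when the raw text itself contains backslashes.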

The general idea is that the input data is broken apart into smaller entities, and then these smaller entities are put back together again in a different shape, possibly performing a document conversion in the process.