Disassembler

From DocDataFlow
Latest revision as of 04:46, 4 January 2014

A disassembler is an atomic adapter.

Disassemblers accept granules via their input connection.

A disassembler will normally pass through all granules it receives.

It will also break some of the input granules down into smaller granules, and it will 'inject' these additional granules into the granule stream.

The smaller granules are injected before the larger input granule from which they were extracted.

Granules that are of no interest to the disassembler are normally passed through unmodified.

For example, when a disassembler breaks apart a 'paragraph' granule into a series of 'word' granules, the output of the disassembler will typically consist of a stream of word granules, followed by the original paragraph granule from which the word granules were extracted.

An assembler further down the data flow will often ignore the contents of such a paragraph granule. Instead, it collects the word granules and waits for the paragraph granule solely as a terminating trigger, which signals that the series of word granules is complete.

An example input with three granules could look like this:

Para: this is a paragraph
Para: this is another paragraph
TextFrame: pos (10, 20), width 20, height 80

A paragraph disassembler might convert this input into the following output:

Word: this
Word: is
Word: a
Word: paragraph
Para: this is a paragraph
Word: this
Word: is
Word: another
Word: paragraph
Para: this is another paragraph
TextFrame: pos (10, 20), width 20, height 80
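
The behaviour shown in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Crawler API: the `Granule` class, its `kind` and `content` fields, and the `disassemble_paragraphs` function are hypothetical names invented for this sketch, and splitting on whitespace is an assumed stand-in for however a real disassembler finds words.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Granule:
    """Hypothetical granule model: a kind label plus some text content."""
    kind: str      # e.g. "Para", "Word", "TextFrame"
    content: str


def disassemble_paragraphs(granules: Iterable[Granule]) -> Iterator[Granule]:
    """Sketch of a paragraph disassembler (assumed behaviour, not a real API)."""
    for granule in granules:
        if granule.kind == "Para":
            # Inject the smaller granules BEFORE the granule they came from.
            for word in granule.content.split():
                yield Granule("Word", word)
        # Every input granule is passed through unmodified, whether or not
        # it was of interest to this disassembler.
        yield granule


stream = [
    Granule("Para", "this is a paragraph"),
    Granule("Para", "this is another paragraph"),
    Granule("TextFrame", "pos (10, 20), width 20, height 80"),
]
output = list(disassemble_paragraphs(stream))
for g in output:
    print(f"{g.kind}: {g.content}")
```

Using a generator keeps the sketch streaming-friendly: each input granule is handled as it arrives, and the original paragraph granule is emitted only after all of its word granules.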

In other words: under normal circumstances, a disassembler will only add to the data flow. It won't take granules away.
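
Because the original paragraph granule stays in the stream, a downstream assembler can use it as an end-of-series marker, as described earlier. The following is a minimal sketch of that idea, assuming a hypothetical `Granule` class and a hypothetical `collect_word_groups` function; neither name comes from the Crawler itself, and a real assembler would likely emit granules rather than return plain lists.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Granule:
    """Hypothetical granule model: a kind label plus some text content."""
    kind: str
    content: str


def collect_word_groups(granules: Iterable[Granule]) -> List[List[str]]:
    """Group word granules, using each paragraph granule only as a trigger."""
    groups: List[List[str]] = []
    pending: List[str] = []
    for granule in granules:
        if granule.kind == "Word":
            pending.append(granule.content)
        elif granule.kind == "Para":
            # The paragraph's own content is ignored; its arrival merely
            # signals that the current series of words is complete.
            groups.append(pending)
            pending = []
        # Granules of other kinds are of no interest to this sketch.
    return groups


disassembled = [
    Granule("Word", "this"),
    Granule("Word", "is"),
    Granule("Word", "a"),
    Granule("Word", "paragraph"),
    Granule("Para", "this is a paragraph"),
    Granule("Word", "this"),
    Granule("Word", "is"),
    Granule("Word", "another"),
    Granule("Word", "paragraph"),
    Granule("Para", "this is another paragraph"),
    Granule("TextFrame", "pos (10, 20), width 20, height 80"),
]
groups = collect_word_groups(disassembled)
print(groups)
```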