Assembler

From DocDataFlow
Revision as of 02:44, 28 December 2013 by Kris (Talk | contribs)


An assembler is an atomic adapter.

Assemblers accept granules via their input. They then use these input granules to construct larger granules. Typically, assemblers will rely on the presence of certain 'marker granules' in the input stream to decide when a constructed granule is ready to be output.

For example, an assembler could collect 'word granules' and string them together into a new 'word group' granule. At some point, the assembler needs to decide whether the 'word group' is complete. The presence of some other type of granule (e.g. a 'text frame' granule) in its input will typically be the trigger to release the newly constructed 'word group' granule.
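The word-group example can be sketched in a few lines of Python. This is only an illustration of the idea, not Crawler's actual API: the `Granule` class, its `kind` and `data` fields, and the function `assemble_word_groups` are all hypothetical names. The sketch also assumes that the marker granule is consumed and that an empty group is never released; a real assembler might behave differently.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Granule:
    """Hypothetical stand-in for a granule: a kind label plus a payload."""
    kind: str
    data: object = None


def assemble_word_groups(granules: Iterable[Granule],
                         marker_kind: str = "text frame") -> Iterator[Granule]:
    """Collect 'word' granules and release a 'word group' at each marker."""
    pending: List[object] = []
    for granule in granules:
        if granule.kind == "word":
            # Keep collecting until a marker granule shows up.
            pending.append(granule.data)
        elif granule.kind == marker_kind:
            # The marker granule is the trigger to release the group.
            if pending:
                yield Granule("word group", pending)
                pending = []
        # Granules of any other kind are ignored in this sketch.


stream = [
    Granule("word", "hello"),
    Granule("word", "world"),
    Granule("text frame"),
    Granule("word", "again"),
    Granule("text frame"),
]
groups = list(assemble_word_groups(stream))
```

With the sample stream above, `groups` holds two 'word group' granules: one built from the first two words and one built from the third.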

In a typical Crawler workflow, larger granules that are broken apart by disassemblers remain part of the data flow. For example, when a disassembler breaks apart a 'paragraph' granule into a series of word granules, the output of the disassembler will typically consist of a stream of word granules, followed by the original paragraph granule from which the word granules were extracted.

An assembler further down the track will often ignore the data of such a paragraph granule. Instead, it will collect the word granules and use the paragraph granule solely as a terminating trigger that signals the series of word granules is complete.
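The interplay between a disassembler and a downstream assembler might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: granules are modelled as plain `(kind, data)` tuples, the function names are invented, paragraphs are split on whitespace, and the assembler drops the paragraph granule after using it as a trigger. None of this is taken from the real Crawler implementation.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical granule representation: (kind, data).
Granule = Tuple[str, object]


def disassemble_paragraphs(granules: Iterable[Granule]) -> Iterator[Granule]:
    """Emit the words of each paragraph, followed by the paragraph itself."""
    for kind, data in granules:
        if kind == "paragraph":
            for word in str(data).split():
                yield ("word", word)
        # The original granule stays in the data flow, after its words.
        yield (kind, data)


def assemble_from_paragraph_markers(granules: Iterable[Granule]) -> Iterator[Granule]:
    """Collect words; treat a paragraph granule purely as a terminator."""
    pending: List[object] = []
    for kind, data in granules:
        if kind == "word":
            pending.append(data)
        elif kind == "paragraph":
            # The paragraph's own data is ignored; only its arrival matters.
            if pending:
                yield ("word group", tuple(pending))
                pending = []


flow = [("paragraph", "one two three"), ("paragraph", "four")]
disassembled = list(disassemble_paragraphs(flow))
assembled = list(assemble_from_paragraph_markers(disassembled))
```

Here `disassembled` contains each paragraph's words followed by the paragraph granule itself, and `assembled` contains one 'word group' per paragraph.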