Difference between revisions of "Granule Acceptance"

From DocDataFlow
 
Latest revision as of 23:28, 29 December 2013

Three Main Criteria

An important mechanism in Crawler is the idea of 'granule acceptance' by adapters.

When a granule is presented to any adapter for processing, the adapter can accept or reject the granule based on a number of criteria.

Some of these criteria are part of the default infrastructure of Crawler, and are applied automatically.

These automatic criteria can always be overridden by a specific type of adapter or adapter network.

These default criteria are only provided for convenience, and they will do 'the right thing' in most cases.

They can be adjusted for the more uncommon cases where the acceptance criteria need to be different.

Visit Counting

The first default criterion is visit counting: by default, a granule is not accepted twice by the same adapter, so only one 'visit' is allowed. This behavior can be overridden.

In some of the more complex personalities, you might see 'adapter loops': networks of adapters where the output of an adapter further down the data flow feeds back into the input of an adapter earlier in the data flow.

These loops will often rely on the 'don't accept twice' mechanism to avoid getting caught in endless loops.

As a practical example, below is a schematic representation of the adapter network used for document conversion in Crawler:

[Figure: Sampleexporter.png]

Note that the ViewAssembler sits at the core of a number of 'adapter loops'.

The 'ViewAssembler' and 'Selector' adapters in this network are exceptions. They have been modified to allow unlimited visits, so they both allow granules to 'pass through' more than once.

On the other hand, the individual Processor sub-adapters of the Selector do use the default visit counting: they only allow one visit by any granule.

That means that once a granule goes round one of the loops, it'll go back through the ViewAssembler, then the Selector.

The Selector will not send the granule back to the same Processor adapter because that processor will reject it: it has 'seen' that granule before, and it only allows one visit.

As a result, the Selector will work its way down its list of options, and every time round it will pick the next eligible adapter.

If there aren't any more, it'll pick the Output adapter.

In this example, the visit counting is used to set up a mechanism where granules go round and round the network, but take a different path every time.
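
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Crawler's actual code: the class names `Processor` and `Selector` come from the text, but the methods `accepts`, `process` and `pick`, the helper `run_loop`, and the use of a plain string as granule identifier are all hypothetical. The sketch also assumes the processors hand the same granule back unchanged.

```python
class Processor:
    """Hypothetical sub-adapter that uses the default one-visit rule."""

    def __init__(self, name):
        self.name = name
        self.seen = set()  # identifiers of granules already accepted

    def accepts(self, granule_id):
        # Reject any granule this processor has 'seen' before.
        return granule_id not in self.seen

    def process(self, granule_id):
        self.seen.add(granule_id)


class Selector:
    """Hypothetical selector: allows unlimited visits and picks the
    first sub-adapter in its list that still accepts the granule."""

    def __init__(self, processors):
        self.processors = processors

    def pick(self, granule_id):
        for processor in self.processors:
            if processor.accepts(granule_id):
                return processor
        return None  # nothing left: the granule goes to the Output adapter


def run_loop(selector, granule_id):
    """Send one granule round the loop and record the path it takes."""
    path = []
    while True:
        target = selector.pick(granule_id)
        if target is None:
            path.append("Output")
            return path
        target.process(granule_id)
        path.append(target.name)


selector = Selector([Processor("A"), Processor("B"), Processor("C")])
print(run_loop(selector, "granule-1"))  # ['A', 'B', 'C', 'Output']
```

Each pass through the selector takes a different branch because the previously visited processors reject the granule, and the loop ends at the output once no processor is left.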

Every granule that enters the data flow network is assigned a unique identifier when it is created.

Once created, a granule never changes: all that can happen to it is that it can be dropped from the data flow, and/or replaced by one or more new granules with different identifiers.

Through this unique identifier, adapters are able to track how many times they've seen a particular granule.
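
A minimal sketch of identifier-based visit tracking, assuming an immutable granule and a per-adapter counter, might look as follows. The names `Granule`, `granule_id`, `VisitTracker` and `max_visits` are invented for illustration; the source does not say how Crawler generates identifiers or stores visit counts.

```python
import itertools
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical identifier source: a simple process-wide counter.
_next_id = itertools.count(1)


@dataclass(frozen=True)
class Granule:
    """Immutable granule: it cannot change after creation."""
    content: str
    granule_id: int = field(default_factory=lambda: next(_next_id))


class VisitTracker:
    """Counts visits per granule identifier for one adapter."""

    def __init__(self, max_visits=1):
        self.max_visits = max_visits  # None means unlimited visits
        self.visits = Counter()

    def accept(self, granule):
        count = self.visits[granule.granule_id]
        if self.max_visits is not None and count >= self.max_visits:
            return False
        self.visits[granule.granule_id] += 1
        return True


tracker = VisitTracker()
original = Granule("hello")
first = tracker.accept(original)    # accepted: first visit
second = tracker.accept(original)   # rejected: already seen

# A 'change' is modelled as a replacement granule with a new identifier.
replacement = Granule(original.content.upper())
third = tracker.accept(replacement)  # accepted: different identifier
```

Because a replacement granule carries a new identifier, an adapter that rejected the original will still accept the replacement.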

Granule Type Acceptance

A second default criterion is the granule type.

Every adapter can be configured to only accept particular granule types.

Many adapters will not use this mechanism, and simply accept all granule types. For example, most selectors will accept any granule type.

But the 'sub-adapters' of such a selector will often use the granule type to accept or reject a certain granule, and hence help the Selector to decide which sub-adapter the granule should be sent to.
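
As a rough sketch of type-based routing: each sub-adapter carries an optional set of accepted granule types, and the selector sends the granule to the first sub-adapter that accepts its type. The class `TypedAdapter`, the function `route`, and the type names `"paragraph"` and `"textFrame"` are assumptions made for this example, not Crawler's real configuration.

```python
class TypedAdapter:
    """Hypothetical adapter with an optional set of accepted granule types."""

    def __init__(self, name, accepted_types=None):
        self.name = name
        # None means the adapter does not filter on type at all.
        self.accepted_types = accepted_types

    def accepts_type(self, granule_type):
        return self.accepted_types is None or granule_type in self.accepted_types


def route(sub_adapters, granule_type):
    """Return the name of the first sub-adapter accepting this type, or None."""
    for adapter in sub_adapters:
        if adapter.accepts_type(granule_type):
            return adapter.name
    return None


sub_adapters = [
    TypedAdapter("ParagraphProcessor", {"paragraph"}),
    TypedAdapter("FrameProcessor", {"textFrame"}),
]

paragraph_target = route(sub_adapters, "paragraph")  # "ParagraphProcessor"
frame_target = route(sub_adapters, "textFrame")      # "FrameProcessor"
unknown_target = route(sub_adapters, "image")        # None: no sub-adapter accepts it
```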

Programmatic Acceptance

The third criterion is programmatically defined. When a software developer creates an adapter, they can opt to implement a special method canProcessGranule, which returns either true or false.

The default implementation of canProcessGranule implements the visit count and granule type acceptance mechanisms.

A customized adapter can either enhance or re-implement this method, and use various other criteria to accept or reject a granule.

For example, some adapter could be made to accept only paragraph granules that have a certain minimum length. Or an adapter could be made to only accept InDesign text frame granules that have a background color, and so on...
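
The minimum-length paragraph case could be sketched as below. Only the method name canProcessGranule comes from the text; its signature, the `Adapter` base class, the `process` method, and the representation of a granule as a dictionary with `id`, `type` and `text` keys are assumptions for illustration. The subclass 'enhances' the default check by calling it first and then adding its own test.

```python
class Adapter:
    """Hypothetical base adapter with the two default criteria."""

    def __init__(self, accepted_types=None, max_visits=1):
        self.accepted_types = accepted_types  # None: accept every type
        self.max_visits = max_visits          # None: unlimited visits
        self._visits = {}                     # granule id -> visit count

    def canProcessGranule(self, granule):
        # Default criterion: granule type.
        if self.accepted_types is not None and granule["type"] not in self.accepted_types:
            return False
        # Default criterion: visit count.
        seen = self._visits.get(granule["id"], 0)
        if self.max_visits is not None and seen >= self.max_visits:
            return False
        return True

    def process(self, granule):
        """Record a visit if the granule is accepted; return the decision."""
        if not self.canProcessGranule(granule):
            return False
        self._visits[granule["id"]] = self._visits.get(granule["id"], 0) + 1
        return True


class LongParagraphAdapter(Adapter):
    """Accepts only paragraph granules of at least min_length characters."""

    def __init__(self, min_length):
        super().__init__(accepted_types={"paragraph"})
        self.min_length = min_length

    def canProcessGranule(self, granule):
        # Keep the default checks, then add the custom criterion.
        return (super().canProcessGranule(granule)
                and len(granule["text"]) >= self.min_length)


adapter = LongParagraphAdapter(min_length=10)
long_par = {"id": 1, "type": "paragraph", "text": "A reasonably long paragraph."}
short_par = {"id": 2, "type": "paragraph", "text": "Short."}
frame = {"id": 3, "type": "textFrame", "text": "Long enough, but the wrong type."}

results = [adapter.process(g) for g in (long_par, short_par, frame, long_par)]
```

The last call rejects `long_par` because the default visit count still applies after the custom check has been added.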