Context contents

A context is a collection of data that is relevant to a particular granule.

For example, when a text frame in some document is represented by a granule for processing inside a Crawler personality, it is accompanied by its context.

That context will include information like:

  • what page is the text frame on?
  • what is the text frame position on that page?
  • what document is that page in?
  • ...

The data relating to the text frame is split into two parts:

  • the granule itself, with its own raw data ‘inside’
  • the context, which stores any additional information about the granule and its surroundings

The context contains all the other data that is not part of the granule, but is relevant to it.

Once created, granules remain fixed and the data in them does not change. They often directly reflect properties and information extracted from the source document, and these remain constant.

Contexts, on the other hand, are not fixed: as granules flow through various adapters, their context can accumulate additional data. It's normal for a granule to start out with an almost empty context. As it progresses through the various adapters, the context will collect more and more data, until the granule is either output or absorbed into a larger granule.

In a Crawler workflow, adapters and granules are fixed, constant entities: once created they don't change. Any changes that accumulate during the process are tracked in a context.
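The split between a fixed granule and a growing context can be sketched in a few lines of Python. This is only an illustration, not Crawler's actual API: the Granule class, the two adapter functions, and the context keys CHARCOUNT and YPOS are all hypothetical names, and the context is modelled as a plain dictionary.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Granule:
    """Hypothetical stand-in for a granule: fixed once created."""
    kind: str
    raw_data: str


def measure_adapter(granule, context):
    """Hypothetical adapter: reads the granule, adds data to the context."""
    context["CHARCOUNT"] = len(granule.raw_data)


def position_adapter(granule, context):
    """Hypothetical adapter: records made-up position data in the context."""
    context["XPOS"] = 36
    context["YPOS"] = 72


frame = Granule(kind="textframe", raw_data="Hello world")
context = {}  # the context starts out empty

# As the granule flows through the adapters, only the context changes.
for adapter in (measure_adapter, position_adapter):
    adapter(frame, context)

# Trying to change the granule itself fails, because it is frozen.
try:
    frame.raw_data = "changed"
    granule_is_fixed = False
except FrozenInstanceError:
    granule_is_fixed = True
```

After the loop, the context holds everything the adapters have added, while the granule still carries exactly the data it was created with.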

Context hierarchies

Contexts are arranged into a hierarchy.


Example: a ‘text frame’ granule will probably be a sub-granule of a larger ‘page’ granule.

The ‘page’ granule itself is a sub-granule of a larger ‘document’ granule.

Each of those granules will have its own context. There will be a context for the document granule, another for the page granule, and another for the text frame granule.

The page context will be a subcontext of the document context: i.e. the page context will include all information from the document context, plus its own specific data.

The text frame context will be a subcontext of the page context: i.e. the text frame context will include all information from the page context, plus its own specific data.
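One way to picture the document, page and text frame contexts is with Python's standard collections.ChainMap, where each subcontext adds its own data on top of its parent's. This is a sketch of the idea only; it is an assumption that the hierarchy behaves like this, and the keys DOCNAME, PAGENUMBER and YPOS and all values are made up.

```python
from collections import ChainMap

# Each subcontext layers its own data over the parent context.
document_context = ChainMap({"DOCNAME": "brochure"})
page_context = document_context.new_child({"PAGENUMBER": 3})
frame_context = page_context.new_child({"XPOS": 36, "YPOS": 72})

# The text frame context sees its own data plus everything above it.
frame_view = dict(frame_context)

# Data added to a parent later on is also visible from its subcontexts.
document_context["AUTHOR"] = "unknown"
author_seen_from_frame = frame_context["AUTHOR"]
```

The page context, by contrast, does not see data that belongs to the text frame: information is inherited downwards only.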

The various adapters in a workflow will often pass information to one another by means of the context.

During the Crawler process, we'll often refer to certain information by name. For example, when processing a template snippet, the template text contains placeholders, like $$XPOS$$.

Such placeholders are interpreted within the relevant context. A single snippet will normally be used to process many individual granules; each of the granules will come with its own context, and placeholders like $$XPOS$$ will be replaced by different values every time, depending on what the context dictates for the value of XPOS.

If a certain placeholder is not defined within a particular context, Crawler will check the parent context, then the parent's parent, and so on up the hierarchy.

There is a top-level context, the app context. This is a 'root context' which serves as the ultimate parent to all contexts that exist during the process. This app context stores system-wide information that is to be shared by all contexts.
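The placeholder lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Crawler's implementation: the Context class, the expand function, the APPNAME and PAGENUMBER keys, the set of characters allowed in a placeholder name, and the choice to raise an error for a placeholder that is defined nowhere are all hypothetical.

```python
import re


class Context:
    """Hypothetical context: local values plus a link to a parent context."""

    def __init__(self, parent=None, **values):
        self.parent = parent
        self.values = dict(values)

    def lookup(self, name):
        # Check this context first, then the parent, the parent's parent, ...
        ctx = self
        while ctx is not None:
            if name in ctx.values:
                return ctx.values[name]
            ctx = ctx.parent
        # Assumption: an undefined placeholder is treated as an error here.
        raise KeyError(name)


# Assumed placeholder syntax: $$NAME$$ with letters, digits and underscores.
PLACEHOLDER = re.compile(r"\$\$([A-Za-z0-9_]+)\$\$")


def expand(snippet, context):
    """Replace every $$NAME$$ with the value the context dictates for NAME."""
    return PLACEHOLDER.sub(lambda m: str(context.lookup(m.group(1))), snippet)


# The app context is the root: it has no parent.
app_context = Context(APPNAME="Crawler")
page_context = Context(parent=app_context, PAGENUMBER=3)
frame_a = Context(parent=page_context, XPOS=36)
frame_b = Context(parent=page_context, XPOS=120)

# One snippet, two granule contexts, two different results.
snippet = "x=$$XPOS$$ page=$$PAGENUMBER$$ app=$$APPNAME$$"
result_a = expand(snippet, frame_a)  # "x=36 page=3 app=Crawler"
result_b = expand(snippet, frame_b)  # "x=120 page=3 app=Crawler"
```

XPOS is found in each text frame's own context, PAGENUMBER one level up, and APPNAME only in the root app context, which every lookup eventually reaches.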