
Reader

A Reader is a special type of processor that is used to read data at the beginning of a process. Readers may connect to any of the following sources of data:

Staged data (that is, a snapshot of data - either present in the repository or not - or output data that has been written by another process);
A Data Interface (which can then be redirected to different sources of data using Mappings);
A set of Reference Data;
A Real time provider of messages (for example, the inbound interface of a Web Service).

A process must contain at least one Reader, but may contain several, for example when matching data from multiple sources.

Use

Readers are used at the beginning of a process to select the sources of data that you intend to work with, and to select and reorder the data attributes from each source as needed for that process. For example, for a specific process you may wish to select only the name and address fields from a data source, and reorder them for display throughout the process.
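As an illustration only, the effect of selecting and reordering attributes can be sketched in Python. This is not OEDQ code: the function name read_selected, the record layout, and the sample values are all hypothetical, chosen to mirror the name and address example above.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def read_selected(records: Iterable[Record], attributes: List[str]) -> Iterator[Record]:
    """Yield each record restricted to `attributes`, in the order given.

    Hypothetical helper: it only mimics what a Reader's attribute
    selection and ordering do conceptually.
    """
    for record in records:
        # Python dicts keep insertion order, so the output attribute
        # order follows the order of `attributes`.
        yield {name: record[name] for name in attributes}


# Made-up source data with more attributes than the process needs.
source = [
    {"id": "1", "address": "1 High St", "phone": "555-0100", "name": "Ann Lee"},
    {"id": "2", "address": "9 Low Rd", "phone": "555-0101", "name": "Bo Kim"},
]

# Keep only name and address, with name displayed first.
selected = list(read_selected(source, ["name", "address"]))
```

Downstream steps in this sketch would see only the two chosen attributes, in the chosen order, which is the point of excluding unneeded attributes at the Reader.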

A Reader is automatically added to a process for you, since a process must always have at least one Reader.

Configuration

Reader Source

Select the Type of data that you wish to read from the following options:

Staged data - that is, a snapshot of data, or the named output of another process, in the OEDQ repository

Note: The snapshot does not necessarily have to exist in the repository. You may intend to run the process in streaming mode, in which case the source data is not copied into the repository.

Data Interface - that is, a configured source-independent interface of a set of data attributes
Reference Data - that is, a set of reference data that exists in the OEDQ repository
Real time provider - that is, a direct connection to a real time source of messages

Select the Source of data from the available sources of the selected type.

All the available attributes in the data appear in the left pane. Use the arrow buttons to select and de-select the attributes that you wish to work with in the process. The four buttons perform the following actions:

Select the attributes highlighted in the left-hand pane as inputs to the process
Select all available attributes as inputs
De-select the highlighted inputs in the right-hand pane
De-select all inputs

In the right-hand pane, the attributes that you have chosen to work with can be re-ordered by drag-and-drop.

The order that you specify in the Reader will be used to display results throughout the process.

Note: If you know that you do not intend to work with all the attributes of a given set of data, it is a good idea to exclude the unneeded attributes in the Reader. This makes configuring your processors and browsing your results much more straightforward, as only the attributes you are interested in are displayed.

If you are configuring a process to work with several readers, you can also choose to change the Data Stream Name. The Data Stream Name provides a way of referring to the data source where a process has multiple streams, such as when matching across several sources.
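The role of a stream name can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the stream names Customers and Prospects, the sample records, and the helper names_in_both, which performs a naive exact comparison and is not how OEDQ matching works.

```python
from typing import Dict, List

Record = Dict[str, str]

# Two hypothetical readers, each identified by its Data Stream Name.
streams: Dict[str, List[Record]] = {
    "Customers": [{"name": "Ann Lee"}],
    "Prospects": [{"name": "Ann Lee"}, {"name": "Bo Kim"}],
}


def names_in_both(all_streams: Dict[str, List[Record]], left: str, right: str) -> List[str]:
    """Return names found in both named streams (toy exact comparison)."""
    left_names = {record["name"] for record in all_streams[left]}
    return sorted(
        record["name"] for record in all_streams[right] if record["name"] in left_names
    )


# The stream names are the only way this step refers to its two sources.
matches = names_in_both(streams, "Customers", "Prospects")
```

Giving each stream a meaningful name keeps later steps readable when several sources feed the same process.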

Options

Option	Type	Purpose	Default Value
No Data Reference Data	Reference Data (No Data Handling Category)	See note below	None

Note on No Data Handling

An option is provided to normalize different types of No Data values in the reader using a Reference Data map, so that the process based on the reader will treat these values in a common way. This option is not used by default, so that you can profile and understand the data in its 'pure' form, allowing you to identify the different types of No Data when profiling.

A system-level No Data Handling map is provided. If used, this will normalize any empty Strings, or Strings that consist entirely of No Data characters (that is, spaces, or other non-printing or control characters) to Null values.

This is the same functionality available when snapshotting data or using the Normalize No Data processor.

If you want to change the No Data characters, you can use a different No Data Handling map.
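The normalization described in this note can be sketched in Python as follows. This is a minimal sketch and not the OEDQ implementation: the function names are hypothetical, and the test for a No Data character (whitespace or any non-printable character) is an assumption standing in for the contents of the system-level map.

```python
from typing import Callable, Optional


def is_no_data_char(ch: str) -> bool:
    """Assumed definition of a No Data character: whitespace or non-printable."""
    return ch.isspace() or not ch.isprintable()


def normalize_no_data(
    value: Optional[str],
    is_no_data: Callable[[str], bool] = is_no_data_char,
) -> Optional[str]:
    """Map empty or all-No-Data strings to None; leave other values unchanged."""
    if value is None:
        return None
    # all() over an empty string is True, so "" is also normalized to None.
    if all(is_no_data(ch) for ch in value):
        return None
    return value


# Made-up sample values: only the last one carries real data.
samples = ["", "   ", "\t\x00", " Ann "]
normalized = [normalize_no_data(s) for s in samples]
```

Passing a different is_no_data function plays the role of choosing a different No Data Handling map. Values that contain any real data are returned unchanged, including their surrounding spaces.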

Execution

The Reader is a necessary part of any process, whatever the purpose of that process. However, some processors are not suitable for certain types of execution. For example, it is not possible to match and consolidate data from numerous sources in a real time response process. Even so, selecting a Real time provider as the Reader Source (as above) places no restrictions on the processors that are available for configuration, because the execution of a process is driven by how its Reader(s) and Writer(s) are configured.

In general, OEDQ is designed for three modes of execution:

Batch execution, where a set of records in one or more data sources is processed in batch.
Real time monitoring execution, where OEDQ acts as a data quality probe for a data source, monitoring incoming records for quality as they are created, but where no real time response to each record is expected.
Real time response execution, where OEDQ processes records, and passes them back along with extra data, on a real time response interface.

Each processor in the library is listed with the execution modes that can be sensibly used with that processor.

Results Browsing

The results browser for a Reader displays all the records present in the underlying data store once a process has been run.

Output Filters

The Reader does not provide any output filters. All records are read from the specified source and made available to the remainder of the process.

Example

The following example shows the records that are read from the example Service Management data in the example Service Management Project.

In this case, the Reader was configured to read all the data attributes from the source, without changing their order, and without using the default No Data Handling map. No further processing has yet been defined.