Configuring the data sampling

This section explains how the Sampler program works and shows how to select which fields in the event traces are relevant to AI Suite Studio.

How the Sampler works

The diagram below illustrates how the Sampler program operates:

Items in bold in the text below are represented in the diagram.

During the accounting transformation session, a single input event processed by the Rule Engine may generate several output events through several transformation rule executions. Together, these output events form a consistent set of data: the result of the accounting transformation of that input event.
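As a minimal sketch, assuming each output trace line is a Python dict carrying a hypothetical `input_event_id` field that identifies the input event it came from, the output events can be grouped into consistent sets as follows. The field name and the dict representation are illustrative assumptions, not the Rule Engine's actual trace format.

```python
from collections import defaultdict


def group_consistent_sets(output_lines):
    """Group output trace lines into consistent sets, one per input event.

    Assumes each line is a dict with a hypothetical "input_event_id" field.
    The order in which input events first appear is preserved.
    """
    sets = defaultdict(list)
    for line in output_lines:
        sets[line["input_event_id"]].append(line)
    return list(sets.values())


lines = [
    {"input_event_id": "E1", "Account": "401000"},
    {"input_event_id": "E1", "Account": "512000"},
    {"input_event_id": "E2", "Account": "401000"},
]
print([len(s) for s in group_consistent_sets(lines)])  # [2, 1]
```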

The Rule Engine produces input trace files and output trace files, which record the rule executions and the input and output events.

Both input traces and output traces are fed into the Sampler program, but sampling is performed on the output traces only. It consists of finding new sets of data that have not yet been recorded as samples. The input traces serve only to let the user see the values of the properties used in the conditions that activate the outputs of the accounting methods.

The output of the Sampler program is the Sampled data, which is stored in a JSON file.

A format is a type of object created in Designer. The formats created for AI Suite Studio have two specific properties, named Sampler key and Sampler data. The Sampler program finds information about the formats used in the event trace files in the Repository. Its algorithm isolates unique sampled data based on a set of sampler keys: when a unique value is detected for the set of sampler keys, the corresponding lines are kept for sampling.
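The decision rule can be sketched as below. This is a minimal illustration, not the Sampler's actual implementation: it assumes each consistent set is a list of dicts and that `key_fields` lists the names of the properties marked as sampler keys. Consistent with the example that follows, a set is kept when at least one of its lines yields a sampling key that has not been recorded yet, and the keys of every kept set are then recorded.

```python
def sampling_key(line, key_fields):
    """Build the sampling key of one trace line as a tuple of sampler-key values."""
    return tuple(line[field] for field in key_fields)


def sample(consistent_sets, key_fields):
    """Return the consistent sets worth sampling, in input order.

    A set is kept if at least one of its lines has a sampling key that no
    previously kept set contained; otherwise the whole set is discarded.
    """
    seen = set()
    kept = []
    for lines in consistent_sets:
        keys = {sampling_key(line, key_fields) for line in lines}
        if not keys <= seen:  # at least one key is new
            kept.append(lines)
            seen |= keys
    return kept


sets = [
    [{"Account": "401000", "Currency": "EUR"}],
    [{"Account": "401000", "Currency": "EUR"}],  # same key: discarded
    [{"Account": "401000", "Currency": "USD"}],  # new key: kept
]
print(len(sample(sets, ["Account", "Currency"])))  # 2
```

Using a set of tuples keeps each membership test constant-time on average, so the cost grows linearly with the number of trace lines.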

Example of data sampling

The table below is an example of output traces. Rows in green are sampled; rows in white are discarded. Read on to find out the logic behind these choices.

The first lines in green represent a consistent set of data obtained from a transformation rule. The sampling key is composed of technical and functional properties:

  • Technical keys: Input event code, Input event version, Segment code, Transformation rule name, Rule start date, Financial case, Output event number
  • Functional keys: Account, Currency
  1. As no other set of data has been sampled yet, this first set is kept and sampled.
  2. Next come the lines in white, which represent another consistent set of data.
    For each line, the sampling key is tested against the keys already kept and sampled. They are identical, so the whole set of data is discarded: it provides no information beyond what the first set already offers.
  3. Next come the lines in green, which form another consistent set of data (the orange line included). This set is identical to the first one except for the orange line, where the value of Financial case differs. This difference changes the sampling key, so the set is new and is sampled.
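The comparison in steps 2 and 3 can be reproduced with set operations on sampling keys. The field names come from the key list above; the values, the rule name, and the two-line sets are hypothetical, invented only to mirror the green, white, and orange rows.

```python
KEY_FIELDS = [
    # Technical keys
    "Input event code", "Input event version", "Segment code",
    "Transformation rule name", "Rule start date", "Financial case",
    "Output event number",
    # Functional keys
    "Account", "Currency",
]


def keys_of(lines):
    """Return the set of sampling keys (tuples) found in a consistent set."""
    return {tuple(line[f] for f in KEY_FIELDS) for line in lines}


def make_line(output_number, account, financial_case="FC1"):
    """Build a hypothetical output trace line; only key fields are filled in."""
    return {
        "Input event code": "INV", "Input event version": 1,
        "Segment code": "S1", "Transformation rule name": "R_INVOICE",
        "Rule start date": "2024-01-01", "Financial case": financial_case,
        "Output event number": output_number,
        "Account": account, "Currency": "EUR",
    }


first = [make_line(1, "411000"), make_line(2, "706000")]          # green
second = [make_line(1, "411000"), make_line(2, "706000")]         # white
third = [make_line(1, "411000"), make_line(2, "706000", "FC2")]   # orange line

print(keys_of(second) <= keys_of(first))  # True: nothing new, so discarded
print(len(keys_of(third) - keys_of(first)))  # 1: the orange line's key is new
```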

Sampling a set of data means writing the information it provides as sampled data to the Sampled data JSON output file.
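Writing the kept sets might look like the sketch below, which uses Python's standard `json` module. The layout (a list of sets, each a list of line objects restricted to the sampled-data properties), the `data_fields` parameter, and the file name `sampled_data.json` are assumptions for illustration; the actual structure of the Sampled data file is not specified in this section.

```python
import json


def write_sampled_data(kept_sets, data_fields, path):
    """Write the sampled-data properties of each kept set to a JSON file.

    Only properties listed in data_fields are written; absent ones are skipped.
    """
    payload = [
        [{f: line[f] for f in data_fields if f in line} for line in lines]
        for lines in kept_sets
    ]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2, ensure_ascii=False)
    return payload


kept = [[{"Account": "411000", "Currency": "EUR", "debug": "x"}]]
write_sampled_data(kept, ["Account", "Currency"], "sampled_data.json")
```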

Sampler key and sampled data

These two notions are involved in the sampling process:

  • A set of sampler keys from the output format produces the unique identifier: the sampling key. The Sampler program examines this sampling key when deciding whether a consistent set of data is sampled or discarded. Among these keys, some are functional keys and others are technical keys, corresponding to functional properties and technical properties respectively.
  • The sampled data is the actual information written to the Sampler output file for the consistent set of data. Technical properties are used internally by the application, while functional properties represent the information the application handles.
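The two notions can be sketched as two projections of the same output trace line. The field lists below are hypothetical; in practice they would come from the properties marked in the output format. Because a sampler key is always sampled data as well, the key fields also appear in the sampled data.

```python
# Hypothetical field lists; real ones come from the properties marked in Designer.
TECHNICAL_KEYS = ["Financial case", "Output event number"]
FUNCTIONAL_KEYS = ["Account", "Currency"]
DATA_ONLY = ["Amount"]  # marked as sampled data but not as a sampler key


def split_line(line):
    """Project one output trace line onto its sampling key and its sampled data."""
    technical = tuple(line[f] for f in TECHNICAL_KEYS)
    functional = tuple(line[f] for f in FUNCTIONAL_KEYS)
    key = (technical, functional)  # hashable, so it can be stored in a set
    # Sampler keys are always sampled data too, so they are written as well.
    data_fields = TECHNICAL_KEYS + FUNCTIONAL_KEYS + DATA_ONLY
    data = {f: line[f] for f in data_fields}
    return key, data


key, data = split_line({
    "Financial case": "FC1", "Output event number": 1,
    "Account": "411000", "Currency": "EUR",
    "Amount": 120.0, "Internal id": 42,
})
print(key[1])                 # ('411000', 'EUR')
print("Internal id" in data)  # False: not marked as sampled data
```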

A property of a format can be marked as a sampler key, as sampled data, or both. This is done in Designer. A property marked as a sampler key is automatically marked as sampled data as well.

On input event formats, only properties marked as sampled data are used.
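The marking rules can be modeled with a small data class. `FormatProperty`, its fields, and `sampler_fields` are hypothetical names, not Designer's object model; the sketch only encodes the rules stated above: a sampler key implies sampled data, and for input event formats only the sampled-data list would be used.

```python
from dataclasses import dataclass


@dataclass
class FormatProperty:
    """Hypothetical model of a format property and its two Sampler flags."""
    name: str
    sampler_key: bool = False
    sampled_data: bool = False

    def __post_init__(self):
        # Marking a property as a sampler key also marks it as sampled data.
        if self.sampler_key:
            self.sampled_data = True


def sampler_fields(properties):
    """Split a format's properties into sampler-key names and sampled-data names."""
    keys = [p.name for p in properties if p.sampler_key]
    data = [p.name for p in properties if p.sampled_data]
    return keys, data


props = [
    FormatProperty("Account", sampler_key=True),
    FormatProperty("Amount", sampled_data=True),
    FormatProperty("Comment"),
]
print(props[0].sampled_data)  # True
print(sampler_fields(props))  # (['Account'], ['Account', 'Amount'])
```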

Set Sampler key and sampled data in Designer

When you edit a format for AI Suite Studio in Designer, its properties have two settings, Sampler key and Sampler data.

Double-click a row to open the Property editor, then edit the fields in the Sampler group.

