Automation

All imported documents are processed by the data extraction process to obtain values of fields specified in the schema. Extracted values are then available for validation in the UI.

Using per-queue automation settings, it is possible to skip the manual UI validation step and automatically switch the document to the confirmed state or proceed with its export. Whether the document is exported or switched to the confirmed state depends on the Queue settings.

Currently, there are three levels of automation:

  • No automation: User has to review all documents in the UI to validate extracted data (default).

  • Confidence automation: User only reviews documents with low data extraction confidence (the "tick" icon is not displayed for one or more fields) or with validation errors. By default, documents that are duplicates are automated, and documents for which an edit (split) is proposed are not. You can change this in the per-queue automation settings.

  • Full automation: All documents with no validation errors are exported or switched to the confirmed state, provided they do not contain a suggested edit (split). You can change this in the per-queue automation settings.

    An error triggered by a schema field constraint or connector validation blocks auto-export even at the full automation level. In such a case, non-required fields with validation errors are cleared and validation is performed again. If the error persists, the document must be reviewed manually; otherwise it is exported or switched to the confirmed state.
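The three levels above can be summarised as a small decision function. This is only a sketch of the rules as described here, not the actual implementation: the level names `never`, `confident` and `always`, the function name and its parameters are assumptions made for illustration, and the per-queue overrides for duplicates and suggested splits are left out.

```python
def needs_manual_review(level, has_validation_errors, all_fields_validated,
                        has_suggested_split=False):
    """Return True if a document should wait for review in the UI.

    `level` uses hypothetical names: "never" (no automation),
    "confident" (confidence automation) and "always" (full automation).
    `has_validation_errors` means errors that persist after non-required
    fields with errors were cleared and validation was run again.
    """
    if level == "never":
        return True  # no automation: every document is reviewed
    if has_validation_errors:
        return True  # a document with validation errors is never exported
    if has_suggested_split:
        return True  # default behaviour: a proposed split blocks automation
    if level == "confident":
        # confidence automation: every field must already be validated
        return not all_fields_validated
    if level == "always":
        return False  # full automation: no errors and no split is enough
    raise ValueError(f"unknown automation level: {level!r}")


# A confidently extracted, error-free document skips review.
print(needs_manual_review("confident", False, True))
# A low-confidence field still passes under full automation.
print(needs_manual_review("always", False, False))
```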

Rossum never exports a document that contains validation errors. If you want to export all documents, it is necessary to set up the schema and connector in such a way that no validation error may occur. Validation errors come up if the extracted data does not pass the validation rules set up in the schema or connector (e.g. the format is incorrect, constraints are not followed, etc.). Please note that hidden fields are not validated and do not affect document automation.

Read more about the Automation framework in our knowledge base.

Sources of field validation

Low-confidence fields are considered to be not validated. On the API level they have an empty validation_sources list.

Validation of a field may be introduced by various sources: data extraction confidence above a threshold, computation of various checksums (e.g. VAT rate, net amount and gross amount) or a human review. These validations are recorded in the validation_sources list. The data extraction confidence threshold may be adjusted, see validation sources for details.
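As a sketch of how this looks to an API client, the snippet below walks annotation content and collects the fields whose validation_sources list is empty. The nested layout (`children`, `schema_id`) and the source names used in the sample data are assumptions for illustration; check them against the annotation data returned for your queue.

```python
def unvalidated_fields(nodes):
    """Yield the schema_id of every field with an empty validation_sources list."""
    for node in nodes:
        if "children" in node:
            # Container node (assumed layout): descend into its children.
            yield from unvalidated_fields(node["children"])
        elif not node.get("validation_sources"):
            # Missing or empty list: the field is considered not validated.
            yield node["schema_id"]


# Hypothetical sample data, not real API output.
content = [
    {"schema_id": "invoice_info", "children": [
        {"schema_id": "invoice_id", "validation_sources": ["score"]},
        {"schema_id": "date_issue", "validation_sources": []},
    ]},
    {"schema_id": "amounts", "children": [
        {"schema_id": "amount_total", "validation_sources": ["checks", "human"]},
        {"schema_id": "amount_tax", "validation_sources": []},
    ]},
]

print(list(unvalidated_fields(content)))  # ['date_issue', 'amount_tax']
```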

AI Confidence Scores

While there are multiple ways to automatically pre-validate fields, the most prominent one is score-based validation based on AI Core Engine confidence scores.

The confidence score predicted for each AI-extracted field is stored in the rir_confidence attribute. The score is a number between 0.0 and 1.0, and is calibrated so that it approximately corresponds to the probability that the prediction is correct. In other words, fields with score 0.85 are expected to be correct roughly 85 out of 100 times. The mean calibrated score across many fields with various scores roughly corresponds to their accuracy.

The score_threshold attribute (which can be set on the queue, or individually per datapoint in the schema; default is 0.8) represents the minimum score that triggers automatic validation. Note that the threshold does not correspond to the accuracy of the fields passing the threshold: every passing field scores at least the threshold, so their expected accuracy is typically higher.
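The difference between the threshold and the resulting accuracy can be illustrated with a few made-up scores. The sketch below assumes that a field is automatically validated when its score is greater than or equal to the threshold, and that scores are well calibrated, so the mean score of the passing fields estimates their accuracy. The function name and the sample scores are hypothetical.

```python
from statistics import mean


def auto_validated(scores, score_threshold=0.8):
    """Return the scores that reach the threshold (assumed comparison: >=)."""
    return [s for s in scores if s >= score_threshold]


# Made-up rir_confidence values for seven fields.
scores = [0.99, 0.97, 0.92, 0.85, 0.80, 0.75, 0.40]

passing = auto_validated(scores)
expected_accuracy = mean(passing)

print(len(passing))                 # 5 fields pass the 0.8 threshold
print(round(expected_accuracy, 3))  # 0.906, higher than the threshold itself
```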

An exception to the confidence score semantics may be Dedicated AI Engines based on low amounts of training data, which might not be calibrated. Please ask your Rossum technical contact about confidence scores in your case if you are using a Dedicated AI Engine.

Autopilot

Autopilot is an automatic process that removes the "eye" icon from fields. It is based on past occurrences of the field value in documents that have already been processed in the same queue.

Read more about this Automation component in our knowledge base.

Autopilot configuration

Example autopilot configuration:

{
  "autopilot": {
    "enabled": true,

    "search_history":{
      "rir_field_names": ["sender_ic", "sender_dic", "account_num", "iban", "sender_name"],
      "matching_fields_threshold": 2
    },
    "automate_fields":{
      "rir_field_names": [
        "account_num",
        "bank_num",
        "iban",
        "bic",
        "sender_dic",
        "sender_ic",
        "recipient_dic",
        "recipient_ic",
        "const_sym"
      ],
      "field_repeated_min": 3
    }
  }
}

Autopilot configuration can be modified in Queue.settings, where you can set rules for each queue. Autopilot is enabled unless it is explicitly disabled by setting enabled to false.

Configuration is divided into two sections: History search and Automate fields.

History search

This section configures the process of finding documents from the same sender as the document that is currently being processed. An annotation is considered to be from the same sender if it contains fields with the same rir_field_name and value as the current document.

Example history search configuration:

{
  "search_history":{
    "rir_field_names": ["sender_ic", "sender_dic", "account_num"],
    "matching_fields_threshold": 2
  }
}

Attribute | Type | Description
rir_field_names | list | List of rir_field_names used to find annotations from the same sender. This should contain fields which are unique for each sender, for example sender_ic or sender_dic. Please note that for technical reasons it is not possible to use document_type in this field; it will be ignored.
matching_fields_threshold | int | At least matching_fields_threshold fields must match the current annotation for an annotation to be considered from the same sender. See the example above.
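
The matching rule can be sketched as follows. Representing each annotation as a flat dictionary that maps rir_field_name to the extracted value is an assumption made for illustration, as are the function name and the sample values.

```python
def is_same_sender(current, past, rir_field_names, matching_fields_threshold):
    """Return True if `past` matches `current` on enough of the listed fields.

    `current` and `past` are hypothetical flat dicts: rir_field_name -> value.
    """
    # document_type cannot be used for history search and is ignored.
    names = [name for name in rir_field_names if name != "document_type"]
    matches = sum(
        1
        for name in names
        if current.get(name) is not None and current.get(name) == past.get(name)
    )
    return matches >= matching_fields_threshold


config = {
    "rir_field_names": ["sender_ic", "sender_dic", "account_num"],
    "matching_fields_threshold": 2,
}
current = {"sender_ic": "12345678", "sender_dic": "CZ12345678", "account_num": "111"}
past_a = {"sender_ic": "12345678", "sender_dic": "CZ12345678", "account_num": "222"}
past_b = {"sender_ic": "12345678", "sender_dic": "CZ99999999", "account_num": "333"}

print(is_same_sender(current, past_a, **config))  # True: two fields match
print(is_same_sender(current, past_b, **config))  # False: only one field matches
```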

Automate fields

This section describes the rules that are applied to the annotations found in the previous step, History search. A field has its "eye" icon removed if at least field_repeated_min fields with the same rir_field_name and value are found in the documents returned by the History search step.

Attribute | Type | Description
rir_field_names | list | List of rir_field_names which can be validated based on past occurrence
field_repeated_min | int | Number of times field must be repeated in order to be validated
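
A minimal sketch of this rule, assuming the annotations returned by the History search step are available as flat dictionaries mapping rir_field_name to value (a simplification for illustration; the function name and sample values are hypothetical):

```python
def fields_to_automate(current, history, rir_field_names, field_repeated_min):
    """Return the field names in `current` whose value recurs often enough.

    `history` is the list of annotations found by the History search step.
    """
    automated = []
    for name in rir_field_names:
        value = current.get(name)
        if value is None:
            continue  # nothing extracted for this field on the current document
        repeats = sum(1 for past in history if past.get(name) == value)
        if repeats >= field_repeated_min:
            automated.append(name)
    return automated


current_doc = {"iban": "CZ6508000000192000145399", "bic": "GIBACZPX"}
history = [
    {"iban": "CZ6508000000192000145399", "bic": "GIBACZPX"},
    {"iban": "CZ6508000000192000145399", "bic": "GIBACZPX"},
    {"iban": "CZ6508000000192000145399", "bic": "KOMBCZPP"},
]

# iban repeats three times, bic only twice, so only iban is validated.
print(fields_to_automate(current_doc, history, ["iban", "bic"], field_repeated_min=3))
```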

If any config section is missing, the default value shown in the example configuration is applied.
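
The fallback to defaults can be sketched as a section-level merge. The default values below are copied from the example autopilot configuration and are an assumption, as are the function name and the choice to replace whole sections rather than merge individual keys; the actual defaults and merge behaviour may differ.

```python
import copy

# Assumed defaults, taken from the example autopilot configuration.
DEFAULT_AUTOPILOT = {
    "enabled": True,  # Autopilot is enabled unless explicitly set to false
    "search_history": {
        "rir_field_names": ["sender_ic", "sender_dic", "account_num", "iban", "sender_name"],
        "matching_fields_threshold": 2,
    },
    "automate_fields": {
        "rir_field_names": [
            "account_num", "bank_num", "iban", "bic", "sender_dic",
            "sender_ic", "recipient_dic", "recipient_ic", "const_sym",
        ],
        "field_repeated_min": 3,
    },
}


def effective_autopilot(queue_settings):
    """Return the autopilot config with missing sections filled from defaults."""
    configured = queue_settings.get("autopilot", {})
    effective = copy.deepcopy(DEFAULT_AUTOPILOT)
    for section, value in configured.items():
        effective[section] = copy.deepcopy(value)  # configured section wins
    return effective


# Only search_history is configured; automate_fields falls back to the default.
settings = {"autopilot": {"search_history": {
    "rir_field_names": ["sender_ic"], "matching_fields_threshold": 1}}}
print(effective_autopilot(settings)["automate_fields"]["field_repeated_min"])  # 3
```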