Automation
All imported documents are processed by the data extraction process to obtain values of fields specified in the schema. Extracted values are then available for validation in the UI.
Using per-queue automation settings, it is possible to skip the manual UI validation step and automatically switch a document to the confirmed state or proceed with its export. Whether a document is exported or switched to the confirmed state depends on the Queue settings.
Currently, there are three levels of automation:
- No automation: The user has to review all documents in the UI to validate extracted data (default).
- Confidence automation: The user only reviews documents with low data extraction confidence (the "tick" icon is not displayed for one or more fields) or with validation errors. By default, we automate documents that are duplicates and do not automate documents for which an edit (split) is proposed. You can change this in the per-queue automation settings.
- Full automation: All documents with no validation errors are exported or switched to the confirmed state, provided they do not contain a suggested edit (split). You can change this in the per-queue automation settings.
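The three levels can be summarised as a small decision function. This is a minimal sketch, not Rossum's implementation: the level names "never", "confident" and "always", the `DocumentState` class and its flags, and the `automate_duplicates` / `automate_suggested_split` switches are hypothetical stand-ins for the per-queue automation settings described above.

```python
from dataclasses import dataclass


@dataclass
class DocumentState:
    """Hypothetical, simplified view of a document, for illustration only."""
    all_fields_validated: bool      # every visible field shows the "tick" icon
    has_validation_errors: bool
    is_duplicate: bool = False
    has_suggested_split: bool = False


def can_skip_review(level: str, doc: DocumentState,
                    automate_duplicates: bool = True,
                    automate_suggested_split: bool = False) -> bool:
    """Return True if the document may be exported/confirmed without review.

    `level` uses assumed names: "never" (no automation), "confident"
    (confidence automation) and "always" (full automation).
    """
    if level == "never":
        return False
    if level not in ("confident", "always"):
        raise ValueError(f"unknown automation level: {level!r}")
    if doc.has_validation_errors:
        return False  # documents with validation errors are never automated
    if doc.has_suggested_split and not automate_suggested_split:
        return False  # by default, a proposed edit (split) blocks automation
    if level == "always":
        return True
    # Confidence automation: duplicates follow their (assumed) switch,
    # otherwise every field must already be validated.
    if doc.is_duplicate and not automate_duplicates:
        return False
    return doc.all_fields_validated
```

For example, a document with one low-confidence field is automated under full automation but sent to review under confidence automation.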
An error triggered by a schema field constraint or by connector validation blocks auto-export even at the full automation level. In such a case, non-required fields with validation errors are cleared and validation is performed again. If the error persists, the document must be reviewed manually; otherwise, it is exported or switched to the confirmed state.
Rossum never exports a document that contains validation errors. If you want to export all documents, you need to set up the schema and connector so that no validation error can occur. Validation errors come up if the extracted data does not pass the validation rules set up in the schema or connector (e.g. the format is incorrect, constraints are not followed, etc.). Please note that hidden fields are not validated and do not affect document automation.
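The clear-and-revalidate behaviour can be sketched as follows. This is an illustration under stated assumptions, not Rossum code: the `FieldValue` class, the `rules` mapping of schema ids to check functions, and the convention that an empty non-required field passes validation are all hypothetical. It also skips hidden fields, matching the note above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FieldValue:
    """Hypothetical, simplified field used only for this sketch."""
    schema_id: str
    value: str
    required: bool
    hidden: bool = False


Rules = Dict[str, Callable[[str], bool]]  # schema_id -> "is this value valid?"


def field_errors(fields: List[FieldValue], rules: Rules) -> List[str]:
    """Return schema_ids of visible fields that fail validation."""
    errors = []
    for f in fields:
        if f.hidden:
            continue  # hidden fields are not validated
        if f.value == "":
            if f.required:
                errors.append(f.schema_id)  # empty required field is an error
            continue  # assumed: empty optional fields pass
        rule = rules.get(f.schema_id)
        if rule is not None and not rule(f.value):
            errors.append(f.schema_id)
    return errors


def auto_export_outcome(fields: List[FieldValue], rules: Rules) -> str:
    """Clear non-required fields with errors, re-validate, then decide."""
    errors = field_errors(fields, rules)
    if not errors:
        return "automate"  # export or switch to confirmed, per queue settings
    by_id = {f.schema_id: f for f in fields}
    for schema_id in errors:
        if not by_id[schema_id].required:
            by_id[schema_id].value = ""  # clear the offending optional value
    return "manual_review" if field_errors(fields, rules) else "automate"
```

An invalid optional IBAN is cleared and the document can still be automated, whereas an invalid required amount sends it to manual review.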
Read more about the Automation framework in our knowledge base.
Sources of field validation
Low-confidence fields are considered not validated. On the API level, they have an empty validation_sources list.
Validation of a field may be introduced by various sources: data extraction confidence above a threshold, computation of various checksums (e.g. VAT rate, net amount and gross amount), or a human review. These validations are recorded in the validation_sources list. The data extraction confidence threshold can be adjusted; see validation sources for details.
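To make the idea concrete, here is a minimal sketch of how several independent sources could populate a validation_sources-like list, including one possible checksum on net amount, VAT rate and gross amount. The source labels "score", "checks" and "human", the function names, the 0.01 tolerance and the use of `>=` against the threshold are assumptions for illustration, not Rossum's actual values or logic.

```python
from decimal import Decimal
from typing import List


def amounts_consistent(net: str, vat_rate_percent: str, gross: str,
                       tolerance: str = "0.01") -> bool:
    """Hypothetical checksum: does net * (1 + rate/100) match gross?"""
    expected = Decimal(net) * (1 + Decimal(vat_rate_percent) / 100)
    return abs(expected - Decimal(gross)) <= Decimal(tolerance)


def collect_validation_sources(confidence: float, score_threshold: float,
                               checksum_ok: bool, human_confirmed: bool) -> List[str]:
    """Build a validation_sources-like list; empty means "not validated"."""
    sources = []
    if confidence >= score_threshold:  # assumed: reaching the threshold suffices
        sources.append("score")
    if checksum_ok:
        sources.append("checks")
    if human_confirmed:
        sources.append("human")
    return sources
```

A low-confidence field can therefore still end up validated if a checksum confirms it, while a low-confidence field with no other source keeps an empty list.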
AI Confidence Scores
While there are multiple ways to automatically pre-validate fields, the most prominent one is score-based validation using AI Core Engine confidence scores.
The confidence score predicted for each AI-extracted field is stored in the rir_confidence attribute. The score is a number between 0.0 and 1.0, and is calibrated so that it approximately corresponds to the probability that the prediction is correct. In other words, fields with score 0.85 are expected to be correct roughly 85 out of 100 times. The mean calibrated score across many fields with various scores roughly corresponds to their accuracy.
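One way to check the calibration property on your own reviewed data is to compare the mean score with the observed accuracy. This is a sketch under the assumption that you have collected pairs of rir_confidence scores and "was the extracted value correct" flags; the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def calibration_gap(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean predicted score minus observed accuracy.

    For well-calibrated scores this is close to 0 on a large sample.
    """
    if not scores or len(scores) != len(correct):
        raise ValueError("need equally long, non-empty sequences")
    accuracy = sum(correct) / len(correct)
    return mean(scores) - accuracy
```

Ten fields scored 0.9 of which nine are correct give a gap of 0; a large positive gap would indicate over-confident scores.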
The score_threshold attribute (which can be set on the queue, or individually per datapoint in the schema; the default is 0.8) represents the minimum score that triggers automatic validation.
Note that the threshold does not correspond to the accuracy of the fields passing the threshold!
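The following sketch illustrates why the threshold is not the accuracy of passing fields: with calibrated scores, the expected accuracy of auto-validated fields is the mean of their scores, which lies above the threshold. It assumes that a per-datapoint threshold takes precedence over the queue-level one; the function names and that precedence are assumptions, not documented behaviour.

```python
from statistics import mean
from typing import Optional, Sequence

DEFAULT_SCORE_THRESHOLD = 0.8


def effective_threshold(datapoint_threshold: Optional[float] = None,
                        queue_threshold: Optional[float] = None) -> float:
    """Assumed precedence: datapoint (schema) > queue > default 0.8."""
    if datapoint_threshold is not None:
        return datapoint_threshold
    if queue_threshold is not None:
        return queue_threshold
    return DEFAULT_SCORE_THRESHOLD


def expected_accuracy_of_passing(scores: Sequence[float],
                                 threshold: float) -> Optional[float]:
    """With calibrated scores, expected accuracy of passing fields is their mean score."""
    passing = [s for s in scores if s >= threshold]
    return mean(passing) if passing else None
```

For scores 0.5, 0.82, 0.9 and 0.99 with the default threshold, three fields pass and their expected accuracy is about 0.90, not 0.80.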
An exception to the confidence score semantics may be Dedicated AI Engines trained on small amounts of data, which might not be calibrated. If you are using a Dedicated AI Engine, please ask your Rossum technical contact about confidence scores in your case.
Autopilot
Autopilot is an automatic process that removes the "eye" icon from fields. It is based on past occurrences of a field value on documents that have already been processed in the same queue.
Read more about this Automation component in our knowledge base.
Autopilot configuration
Example autopilot configuration:
{
"autopilot": {
"enabled": true,
"search_history":{
"rir_field_names": ["sender_ic", "sender_dic", "account_num", "iban", "sender_name"],
"matching_fields_threshold": 2
},
"automate_fields":{
"rir_field_names": [
"account_num",
"bank_num",
"iban",
"bic",
"sender_dic",
"sender_ic",
"recipient_dic",
"recipient_ic",
"const_sym"
],
"field_repeated_min": 3
}
}
}

Autopilot configuration can be modified in Queue.settings, where you can set rules for each queue.
Autopilot is enabled unless it is explicitly disabled by setting enabled to false.
Configuration is divided into two sections:
History search
This section configures the process of finding documents from the same sender as the document currently being processed. An annotation is considered to be from the same sender if it contains fields with the same rir_field_name and value as the current document.
Example history search configuration:
{
"search_history":{
"rir_field_names": ["sender_ic", "sender_dic", "account_num"],
"matching_fields_threshold": 2
}
}

| Attribute | Type | Description |
|---|---|---|
| rir_field_names | list | List of rir_field_names used to find annotations from the same sender. This should contain fields that are unique for each sender, for example sender_ic or sender_dic. Please note that for technical reasons it is not possible to use document_type in this field; it will be ignored. |
| matching_fields_threshold | int | At least matching_fields_threshold fields must match the current annotation for it to be considered from the same sender. See the example history search configuration above. |
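The history search step can be sketched as counting matching rir_field_name values per past annotation. This is a simplified model, not Rossum's implementation: annotations are represented as hypothetical flat dicts mapping rir_field_name to value, and empty values are assumed never to count as a match.

```python
from typing import Dict, List

Annotation = Dict[str, str]  # hypothetical flat view: rir_field_name -> value


def find_same_sender(current: Annotation, history: List[Annotation],
                     rir_field_names: List[str],
                     matching_fields_threshold: int) -> List[Annotation]:
    """Return past annotations that match on enough sender-identifying fields."""
    # document_type cannot be used here and is ignored, as documented above.
    keys = [name for name in rir_field_names if name != "document_type"]
    same_sender = []
    for past in history:
        matches = sum(1 for name in keys
                      if current.get(name) and past.get(name) == current.get(name))
        if matches >= matching_fields_threshold:
            same_sender.append(past)
    return same_sender
```

With the example configuration (three fields, threshold 2), a past invoice sharing sender_ic and sender_dic but with a different account_num still counts as the same sender.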
Automate fields
This section describes the rules applied to the annotations found in the previous History search step.
A field has its "eye" icon removed if at least field_repeated_min fields with the same rir_field_name and value are found on the documents returned by the History search step.
| Attribute | Type | Description |
|---|---|---|
| rir_field_names | list | List of rir_field_names that can be validated based on past occurrences |
| field_repeated_min | int | Number of times a field must be repeated in order to be validated |
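The automate fields rule can be sketched as counting how often the current value of each listed field appears on the same-sender documents. As before, this is an illustrative model with hypothetical flat dicts, not Rossum's code; empty current values are assumed never to be automated.

```python
from typing import Dict, List

Annotation = Dict[str, str]  # hypothetical flat view: rir_field_name -> value


def fields_to_automate(current: Annotation, same_sender_docs: List[Annotation],
                       rir_field_names: List[str],
                       field_repeated_min: int) -> List[str]:
    """Return rir_field_names whose "eye" icon would be removed."""
    automated = []
    for name in rir_field_names:
        value = current.get(name)
        if not value:
            continue  # nothing to compare against history
        repeats = sum(1 for doc in same_sender_docs if doc.get(name) == value)
        if repeats >= field_repeated_min:
            automated.append(name)
    return automated
```

With field_repeated_min set to 3, an IBAN seen on three earlier invoices from the same sender is validated, while a BIC seen only once is not.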
If any configuration section is missing, the default value for that section is applied.
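The fallback behaviour can be sketched as filling in missing sections of Queue.settings["autopilot"]. This assumes that the defaults equal the example autopilot configuration above and that a configured section replaces the default section as a whole; both are assumptions, so check the actual defaults for your queue. The enabled handling follows the rule that Autopilot stays on unless enabled is explicitly false.

```python
import copy

# Assumption: defaults mirror the example autopilot configuration above.
DEFAULT_AUTOPILOT = {
    "enabled": True,
    "search_history": {
        "rir_field_names": ["sender_ic", "sender_dic", "account_num", "iban", "sender_name"],
        "matching_fields_threshold": 2,
    },
    "automate_fields": {
        "rir_field_names": ["account_num", "bank_num", "iban", "bic", "sender_dic",
                            "sender_ic", "recipient_dic", "recipient_ic", "const_sym"],
        "field_repeated_min": 3,
    },
}


def effective_autopilot(queue_settings: dict) -> dict:
    """Fill in missing autopilot sections with defaults (per-section replace assumed)."""
    configured = queue_settings.get("autopilot") or {}
    result = copy.deepcopy(DEFAULT_AUTOPILOT)
    for section in ("search_history", "automate_fields"):
        if section in configured:
            result[section] = copy.deepcopy(configured[section])
    # Autopilot stays enabled unless "enabled" is explicitly false.
    result["enabled"] = configured.get("enabled") is not False
    return result
```

Copying the defaults keeps repeated calls from sharing or mutating the module-level dictionary.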