
Queue Schema

Every queue has an associated schema that specifies which fields will be extracted from documents, as well as the structure of the data sent to the connector and exported from the platform.

See the introduction to Rossum customization for a high-level overview of configuring the captured fields and managing schemas.

The best visual guide to the schema JSON, and the quickest way to start tweaking it in the Rossum app, is our tutorial on editing the extraction schema. It is especially helpful when maintaining long dropdown select boxes.

Rossum schema supports data fields with single values (datapoint), fields with multiple values (multivalue) or tuples of fields (tuple). At the topmost level, each schema consists of sections, which may either directly contain actual data fields (datapoints) or use nested multivalues and tuples as containers for single datapoints.

While a schema may theoretically consist of an arbitrary number of nested containers, the Rossum UI supports only certain combinations of datapoint types. The supported shapes are:

  • simple: atomic datapoints of type number, string, date or enum

  • list: simple datapoint within a multivalue

  • tabular: simple datapoint within a "multivalue tuple" (a multivalue list containing a tuple for every row)
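
The three shapes can be told apart by walking the schema content. The following is a minimal sketch, not part of the Rossum API: the function name `classify_shapes` is hypothetical, and it assumes the content follows the structure described in this section (sections and tuples hold a list of children, a multivalue holds a single child object).

```python
def classify_shapes(content):
    """Return {datapoint_id: shape} for a schema content list (list of sections)."""
    shapes = {}

    def walk(node, in_multivalue, in_tuple):
        category = node["category"]
        if category == "datapoint":
            if in_tuple:
                shapes[node["id"]] = "tabular"
            elif in_multivalue:
                shapes[node["id"]] = "list"
            else:
                shapes[node["id"]] = "simple"
        elif category == "multivalue":
            # A multivalue holds a single child object rather than a list.
            walk(node["children"], True, in_tuple)
        else:
            # Sections and tuples hold a list of children.
            for child in node["children"]:
                walk(child, in_multivalue, in_tuple or category == "tuple")

    for section in content:
        walk(section, False, False)
    return shapes
```

Called on a schema with one plain datapoint, one simple multivalue and one table, it returns one entry per datapoint with the value simple, list or tabular.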

Schema content

Schema content consists of a list of section objects.

Common attributes

The following attributes are common for all schema objects:

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| category | string | Category of an object, one of section, multivalue, tuple or datapoint. | yes |
| id | string | Unique identifier of an object. Maximum length is 50 characters. | yes |
| label | string | User-friendly label for an object, shown in the user interface. | yes |
| hidden | boolean | If set to true, the object is not visible in the user interface, but remains stored in the database and may be exported. Default is false. Note that section is hidden if all its children are hidden. | no |
| disable_prediction | boolean | Can be set to true to disable field extraction, while still preserving the data shape. Ignored by aurora engines. | no |

Section

Example section object:

{
  "category": "section",
  "id": "amounts_section",
  "label": "Amounts",
  "children": [...],
  "icon": ""
}

Section represents a logical part of the document, such as amounts or vendor info. It is allowed only at the top level. Schema allows multiple sections, and there should be at least one section in the schema.

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| children | list[object] | Specifies objects grouped under a given section. It can contain multivalue or datapoint objects. | yes |
| icon | string | The icon that appears on the left panel in the UI for a given section (not yet supported on UI). | |

Datapoint

A datapoint represents a single value, typically a field of a document or some global document information. Fields common to all datapoint types:

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| type | string | Data type of the object, must be one of the following: string, number, date, enum, button | yes |
| can_export | boolean | If set to false, datapoint is not exported through export endpoint. Default is true. | |
| can_collapse | boolean | If set to true, tabular (multivalue-tuple) datapoint may be collapsed in the UI. Default is false. | |
| rir_field_names | list[string] | List of references used to initialize an object value. See below for the description. | |
| default_value | string | Default value used either for fields that do not use hints from AI engine predictions (i.e. rir_field_names are not specified), or when the AI engine does not return any data for the field. | |
| constraints | object | A map of various constraints for the field. See Value constraints. | |
| ui_configuration | object | A group of settings affecting behaviour of the field in the application. See UI configuration. | |
| width | integer | Width of the column (in characters). Default widths are: number: 8, string: 20, date: 10, enum: 20. Only supported for table datapoints. | |
| stretch | boolean | If total width of columns doesn't fill up the screen, datapoints with stretch set to true will be expanded proportionally to other stretching columns. Only supported for table datapoints. | |
| width_chars | integer | (Deprecated) Use width and stretch properties instead. | |
| score_threshold | float [0;1] | Threshold used to automatically validate field content based on AI confidence scores. If not set, queue.default_score_threshold is used. | |
| formula | string[0;2000] | Formula definition, required only for fields of type formula, see Formula Fields. rir_field_names should also be empty for these fields. | |
| prompt | string[0;2000] | Prompt definition, required only for fields of type reasoning. | |
| context | list[string] | Context for the prompt, required only for fields of type reasoning, see Logical Types. | |

The rir_field_names attribute specifies the source of the initial value of the object. List items may be:

  • one of extracted field types to use the AI engine prediction value
  • upload:id to identify a value specified while uploading the document
  • edit:id to identify a value specified in edit_pages endpoint
  • email_header:<id> to use a value extracted from email headers. Supported email headers: from, to, reply-to, subject, message-id, date.
  • email_body:<id> to select email body. Supported values are text_html for HTML body or text_plain for plain text body.
  • email:<id> to identify a value specified in email.received hook response
  • emails_import:<id> to identify a value specified in the values parameter when importing an email.

If multiple list items are specified in rir_field_names, the first available value is used.
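
The "first available value" rule can be sketched as follows. This is only an illustration: the function `initial_value` and the `available_values` mapping (source name to the value found there) are hypothetical and do not reflect how the platform stores these values internally.

```python
def initial_value(rir_field_names, available_values, default_value=None):
    """Return the first available value among rir_field_names, else default_value.

    available_values is a hypothetical mapping from a source name
    (e.g. "document_id" or "upload:po_number") to the value found there.
    """
    for name in rir_field_names or []:
        value = available_values.get(name)
        if value is not None:
            return value
    # No source listed, or no source returned data: fall back to default_value.
    return default_value
```

For example, with rir_field_names set to ["upload:po_number", "order_id"], a value supplied at upload wins over the AI engine prediction for order_id.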

String type

Example string type datapoint with constraints:

{
  "category": "datapoint",
  "id": "document_id",
  "label": "Invoice ID",
  "type": "string",
  "default_value": null,
  "rir_field_names": ["document_id"],
  "constraints": {
    "length": {
      "exact": null,
      "max": 16,
      "min": null
    },
    "regexp": {
      "pattern": "^INV[0-9]+$"
    },
    "required": false
  }
}

The string datapoint has no type-specific attributes.

Date type

Example date type datapoint:

{
  "id": "item_delivered",
  "type": "date",
  "label": "Item Delivered",
  "format": "MM/DD/YYYY",
  "category": "datapoint"
}

Attributes specific to Date datapoint:

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| format | string | Enforces a format for date datapoint on the UI. See Date format below for more details. Default is YYYY-MM-DD. | |

Supported date formats: see available tokens.

Example date formats:

  • D/M/YYYY: e.g. 23/1/2019
  • MM/DD/YYYY: e.g. 01/23/2019
  • YYYY-MM-DD: e.g. 2019-01-23 (ISO date format)
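
To make the example formats concrete, here is a small sketch that renders a date with the tokens used above. The token meanings (YYYY, MM, DD zero-padded; M, D unpadded) are an assumption inferred from the three examples only, and `format_date` is a hypothetical helper, not a Rossum function.

```python
import re
from datetime import date

# Assumed token meanings, inferred from the three example formats only.
_TOKENS = {
    "YYYY": lambda d: f"{d.year:04d}",
    "MM": lambda d: f"{d.month:02d}",
    "DD": lambda d: f"{d.day:02d}",
    "M": lambda d: str(d.month),
    "D": lambda d: str(d.day),
}
# Longer tokens come first so "MM" is not read as two "M" tokens.
_TOKEN_RE = re.compile("YYYY|MM|DD|M|D")


def format_date(value, fmt="YYYY-MM-DD"):
    """Render a date using the subset of format tokens shown in the examples."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)](value), fmt)


print(format_date(date(2019, 1, 23), "D/M/YYYY"))    # 23/1/2019
print(format_date(date(2019, 1, 23), "MM/DD/YYYY"))  # 01/23/2019
```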

Number type

Example number type datapoint:

{
  "id": "item_quantity",
  "type": "number",
  "label": "Quantity",
  "format": "#,##0.#",
  "category": "datapoint"
}

Attributes specific to Number datapoint:

| Attribute | Type | Default | Description | Required |
| --- | --- | --- | --- | --- |
| format | string | # ##0.# | Available number formats are listed in the table below. A null value is allowed. | |
| aggregations | object | | A map of various aggregations for the field. See aggregations. | |

The following table shows numeric formats with their examples.

| Format | Example |
| --- | --- |
| # ##0,# | 1 234,5 or 1234,5 |
| # ##0.# | 1 234.5 or 1234.5 |
| #,##0.# | 1,234.5 or 1234.5 |
| #'##0.# | 1'234.5 or 1234.5 |
| #.##0,# | 1.234,5 or 1234,5 |
| # ##0 | 1 234 or 1234 |
| #,##0 | 1,234 or 1234 |
| #'##0 | 1'234 or 1234 |
| #.##0 | 1.234 or 1234 |
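
Each listed format encodes a thousands separator (the character between "#" and "##0") and, optionally, a decimal separator (the character after "##0"). The sketch below reads a value written in one of these formats into a Decimal. It is an illustration under that reading of the table; `parse_number` is a hypothetical helper and only the formats listed above are handled.

```python
from decimal import Decimal


def parse_number(text, fmt="# ##0.#"):
    """Parse text written in one of the listed number formats into a Decimal."""
    thousands_sep = fmt[1]                           # character between "#" and "##0"
    decimal_sep = fmt[5] if len(fmt) > 5 else None   # character after "##0", if any
    cleaned = text.replace(thousands_sep, "")
    if decimal_sep is not None:
        cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


print(parse_number("1 234,5", "# ##0,#"))  # 1234.5
print(parse_number("1.234", "#.##0"))      # 1234
```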

Aggregations

Example number type datapoint with sum aggregation:

{
  "id": "quantity",
  "type": "number",
  "label": "Quantity",
  "category": "datapoint",
  "aggregations": {
    "sum": {
      "label": "Total"
    }
  },
  "default_value": null,
  "rir_field_names": []
}

Aggregations allow computation of some informative values, e.g. a sum of a table column with numeric values. These are returned among messages when the validate endpoint is called. Aggregations can be computed only for tables (multivalues of tuples).

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| sum | object | Sum of values in a column. Default label: "Sum". | |

All aggregation objects can have an attribute label that will be shown in the UI.
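
As an illustration of what a sum aggregation computes, the following sketch adds up one column over table rows. The row representation (a list of dicts keyed by column id) and the returned dict are hypothetical; they are not the message format returned by the validate endpoint.

```python
from decimal import Decimal


def sum_aggregation(rows, column_id, label="Sum"):
    """Sum one numeric column over table rows, skipping empty cells."""
    total = sum(
        (Decimal(row[column_id]) for row in rows if row.get(column_id) not in (None, "")),
        Decimal("0"),
    )
    # "Sum" mirrors the documented default label; a schema may override it.
    return {"label": label, "value": total}
```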

Enum type

Example enum type datapoint with options:

{
  "id": "document_type",
  "type": "enum",
  "label": "Document type",
  "hidden": false,
  "category": "datapoint",
  "options": [
    {
      "label": "Invoice Received",
      "value": "21"
    },
    {
      "label": "Invoice Sent",
      "value": "22"
    },
    {
      "label": "Receipt",
      "value": "23"
    }
  ],
  "default_value": "21",
  "rir_field_names": [],
  "enum_value_type": "number"
}

Attributes specific to Enum datapoint:

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| options | object | See object description below. | yes |
| enum_value_type | string | Data type of the option's value attribute. Must be one of the following: string, number, date | no |

Every option consists of an object with keys:

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| value | string | Value of the option. | yes |
| label | string | User-friendly label for the option, shown in the UI. | yes |

Enum datapoint value is matched case-insensitively, e.g. the EUR currency value returned by the AI Core Engine is matched successfully against the {"value": "eur", "label": "Euro"} option.
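
The matching rule can be sketched as a case-insensitive lookup over the options list. `match_option` is a hypothetical helper written for illustration; the actual matching is performed by the platform.

```python
def match_option(value, options):
    """Return the option whose value equals `value` ignoring case, or None."""
    wanted = value.casefold()
    for option in options:
        if option["value"].casefold() == wanted:
            return option
    return None


options = [{"value": "eur", "label": "Euro"}, {"value": "usd", "label": "US Dollar"}]
print(match_option("EUR", options))  # {'value': 'eur', 'label': 'Euro'}
```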

Button type

Specifies a button shown in Rossum UI. For more details please refer to custom UI extension.

Example button type datapoint:

{
  "id": "show_email",
  "type": "button",
  "category": "datapoint",
  "popup_url": "http://example.com/show_customer_data",
  "can_obtain_token": true
}

Buttons cannot be direct children of multivalues: simple multivalues with buttons are not allowed, and in tables, buttons are children of tuples. Despite being a datapoint object, a button currently cannot hold any value. Therefore, the set of available Button datapoint attributes is limited to:

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| type | string | Data type of the object, must be one of the following: string, number, date, enum, button | yes |
| can_export | boolean | If set to false, datapoint is not exported through export endpoint. Default is true. | |
| can_collapse | boolean | If set to true, tabular (multivalue-tuple) datapoint may be collapsed in the UI. Default is false. | |
| popup_url | string | URL of a popup window to be opened when button is pressed. | |
| can_obtain_token | boolean | If set to true, the popup window is allowed to ask the main Rossum window for an authorization token. | |

Value constraints

Example datapoint with value constraints:

{
  "id": "document_id",
  "type": "string",
  "label": "Invoice ID",
  "category": "datapoint",
  "constraints": {
    "length": {
      "max": 32,
      "min": 5
    },
    "required": false
  },
  "default_value": null,
  "rir_field_names": [
    "document_id"
  ]
}

Constraints limit allowed values. When a constraint is not satisfied, the annotation is considered invalid and cannot be exported.

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| length | object | Defines minimum, maximum or exact length for the datapoint value. By default, minimum and maximum are 0 and infinity, respectively. Supported attributes: min, max and exact | |
| regexp | object | When specified, content must match a regular expression. Supported attributes: pattern. To ensure that entire value matches, surround your regular expression with ^ and $. | |
| required | boolean | Specifies if the datapoint is required by the schema. Default value is true. | |
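
The constraints above can be approximated locally, for example to pre-check values before sending them. This is a sketch, not the platform's validator: `check_constraints` is a hypothetical helper, and it assumes that an empty value is subject only to the required constraint and that the pattern is applied as a search (which is why ^ and $ are needed for a full match).

```python
import re


def check_constraints(value, constraints):
    """Return the names of constraints that a string value violates."""
    if not value:
        # Assumption: an empty value is only subject to "required" (default true).
        return ["required"] if constraints.get("required", True) else []
    violations = []
    length = constraints.get("length") or {}
    exact, low, high = length.get("exact"), length.get("min"), length.get("max")
    if exact is not None and len(value) != exact:
        violations.append("length.exact")
    if low is not None and len(value) < low:
        violations.append("length.min")
    if high is not None and len(value) > high:
        violations.append("length.max")
    pattern = (constraints.get("regexp") or {}).get("pattern")
    if pattern and re.search(pattern, value) is None:
        violations.append("regexp")
    return violations
```

With the constraints from the string type example (max length 16, pattern ^INV[0-9]+$, not required), "INV12345" passes and "12345" fails only the regexp check.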

UI configuration

Example datapoint with UI configuration:

{
  "id": "document_id",
  "type": "string",
  "label": "Invoice ID",
  "category": "datapoint",
  "ui_configuration": {
    "type":  "captured",
    "edit": "disabled"
  },
  "default_value": null,
  "rir_field_names": [
    "document_id"
  ]
}

UI configuration provides a group of settings, which alter behaviour of the field in the application. This does not affect behaviour of the field via the API. For example, disabling edit prohibits changing a value of the datapoint in the application, but the value can still be modified through API.

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| type | string | Logical type of the datapoint. Possible values are: captured, data, manual, formula, reasoning or null. Default value is null. | no |
| edit | string | When set to disabled, value of the datapoint is not editable via UI. When set to enabled_without_warning, no warnings are displayed in the UI regarding this field's editing behaviour. Default value is enabled; this option enables field editing, but the user receives dismissible warnings when doing so. | no |

Logical types

  • Captured field represents information retrieved by the OCR model. If combined with the edit option disabled, the user can't overwrite the field's value, but can redraw the field's bounding box and thereby select another value from the document.
  • Data field represents information filled in by extensions. This field is not mapped to the AI model, so it does not have a bounding box, nor is it possible to create one. If combined with the edit option disabled, the field can't be modified from the UI.
  • Manual field behaves exactly like the Data field, without the mapping to extensions. This field should be used for sharing information between users or to transfer information to downstream systems.
  • Formula field is updated according to its formula definition, see Formula Fields. If the edit option is not disabled, the field value can be overridden from the UI (see no_recalculation).
  • Reasoning field is updated according to its prompt and context. context supports adding related schema fields in the format of TxScript strings (e.g. field.invoice_id; self.attr.label and self.attr.description are also supported). If the edit option is not disabled, the field value can be overridden from the UI (see no_recalculation).
  • null value is displayed in the UI as Unset and behaves similarly to the Captured field.

Multivalue

Example simple multivalue:

{
  "category": "multivalue",
  "id": "po_numbers",
  "label": "PO numbers",
  "children": {
    ...
  },
  "show_grid_by_default": false,
  "min_occurrences": null,
  "max_occurrences": null,
  "rir_field_names": null
}

Example multivalue with grid configuration:

{
  "category": "multivalue",
  "id": "line_item",
  "label": "Line Item",
  "children": {
    ...
  },
  "grid": {
    "row_types": [
      "header", "data", "footer"
    ],
    "default_row_type": "data",
    "row_types_to_extract": [
      "data"
    ]
  },
  "min_occurrences": null,
  "max_occurrences": null,
  "rir_field_names": ["line_items"]
}

A multivalue is a list of datapoints or tuples of the same type. It represents a container for data with multiple occurrences (such as line items) and can contain only objects with the same id.

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| children | object | Object specifying type of children. It can contain only objects with categories tuple or datapoint. | yes |
| min_occurrences | integer | Minimum number of occurrences of nested objects. If condition of min_occurrences is violated corresponding fields should be manually reviewed. Minimum required value for the field is 0. If not specified, it is set to 0 by default. | |
| max_occurrences | integer | Maximum number of occurrences of nested objects. All additional rows above max_occurrences are removed by extraction process. Minimum required value for the field is 1. If not specified, it is set to 1000 by default. | |
| grid | object | Configure magic-grid feature properties, see below. | |
| show_grid_by_default | boolean | If set to true, the magic-grid is opened instead of footer upon entering the multivalue. Default false. Applied only in UI. Useful when annotating documents for custom training. | |
| rir_field_names | list[string] | List of names used to initialize content from the AI engine predictions. If specified, the value of the first field from the array is used, otherwise default name line_items is used. Attribute can be set only for multivalue containing objects with category tuple. | no |
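
The effect of min_occurrences and max_occurrences can be sketched as below. `apply_occurrence_limits` is a hypothetical helper for illustration; it applies the documented defaults (0 and 1000), truncates rows above the maximum, and reports whether the minimum is violated so that the fields can be reviewed manually.

```python
def apply_occurrence_limits(rows, min_occurrences=None, max_occurrences=None):
    """Return (kept_rows, needs_review) using the documented defaults 0 and 1000."""
    low = 0 if min_occurrences is None else min_occurrences
    high = 1000 if max_occurrences is None else max_occurrences
    kept = rows[:high]               # rows above max_occurrences are removed
    needs_review = len(kept) < low   # too few rows: flag for manual review
    return kept, needs_review
```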

Multivalue grid object

The multivalue grid object lets you specify a row type for each row of the grid. For the data representation of actual grid rows, see the Grid object description.

| Attribute | Type | Description | Default | Required |
| --- | --- | --- | --- | --- |
| row_types | list[string] | List of allowed row type values. | ["data"] | yes |
| default_row_type | string | Row type to be used by default | data | yes |
| row_types_to_extract | list[string] | Types of rows to be extracted to related table | ["data"] | yes |

For example, to distinguish two header types and a footer in the validation interface, the following row types may be used: header, subsection_header, data and footer.

Currently, data extraction classifies every row as either data or header (additional row types may be introduced in the future). We remove rows returned by data extraction that are not in row_types list (e.g. header by default) and are on the top/bottom of the table. When they are in the middle of the table, we mark them as skipped (null).

There are three visual modes, based on row_types quantity:

  • More than two row types defined: User selects row types freely to any non-default row type. Clearing row type resets to a default row type. We support up to 6 colors to easily distinguish row types visually.
  • Two row types defined (header and default): User only marks header and skipped rows. Clearing row type resets to a default row type.
  • One row type defined: User is only able to mark row as skipped (null value in data). This is also a default behavior when no grid row types configuration is specified in the schema.

Only rows marked as one of row_types_to_extract values are transferred to a table by pressing "Read data from table" button in the Rossum UI (calling grid-to-table conversion API endpoint).
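
The row handling described above (dropping disallowed row types at the top and bottom of the table, marking them as skipped in the middle, and transferring only rows listed in row_types_to_extract) can be sketched as follows. Both functions and the row representation, a list of row type strings or of (row_type, cells) pairs, are hypothetical.

```python
def normalize_row_types(predicted, row_types=("data",)):
    """Drop disallowed row types at the table edges; mark inner ones as skipped (None)."""
    allowed = [row_type in row_types for row_type in predicted]
    if not any(allowed):
        return []
    first = allowed.index(True)
    last = len(allowed) - 1 - allowed[::-1].index(True)
    return [
        row_type if ok else None
        for row_type, ok in zip(predicted[first:last + 1], allowed[first:last + 1])
    ]


def rows_to_extract(rows, row_types_to_extract=("data",)):
    """Keep the cells of (row_type, cells) pairs whose type is listed for extraction."""
    return [cells for row_type, cells in rows if row_type in row_types_to_extract]


print(normalize_row_types(["header", "data", "header", "data", "header"]))
# ['data', None, 'data']
```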

Tuple

Example tuple object:

{
  "category": "tuple",
  "id": "tax_details",
  "label": "Tax Details",
  "children": [
    ...
  ],
  "rir_field_names": [
    "tax_details"
  ]
}

Container representing tabular data with related values, such as tax details. A tuple must be nested within a multivalue object, but unlike multivalue, it may consist of objects with different ids.

AttributeTypeDescriptionRequired
childrenlist[object]Array specifying objects that belong to a given tuple. It can contain only objects with category datapoint.yes
rir_field_nameslist[string]List of names used to initialize content from the AI engine predictions. If specified, the value of the first extracted field from the array is used, otherwise, no AI engine initialization is done for the object.

Updating Schema

As a project evolves, it is common practice to enhance or change the set of extracted fields. This is done by updating the schema object.

By design, Rossum supports multiple schema versions at the same time. However, each document annotation is related to only one of those schemas. If the schema is updated, all related document annotations are updated accordingly. See preserving data on schema change below for limitations of schema updates.

In addition, every queue is linked to a schema, which is used for all newly imported documents.

When updating a schema, there are two possible approaches:

  • Update the schema object (PUT/PATCH). All related annotations will be updated to match current schema shape (even exported and deleted documents).
  • Create a new schema object (POST) and link it to the queue. In such case, only newly created objects will use the current schema. All previously created objects will remain in the shape of their linked schema.

Formerly, we recommended to always create a new schema object when changing the set of extracted fields. This is no longer necessary since updating of the current schema object (PUT/PATCH) can be used instead. See use-cases below if not sure which approach is appropriate.

Use case 1 - Initial setting of a schema

Use case 2 - Updating attributes of a field (label, constraints, options, etc.)

  • Situation: User is updating field, e.g. changing label, number format, constraints, enum options, hidden flag, etc.
  • Recommendation: Edit existing schema (see Use case 1).

Use case 3 - Adding new field to a schema, even for already imported documents.

  • Situation: User is extending a production schema by adding a new field. Moreover, user would like to see all annotations (to_review, postponed, exported, deleted, etc. states) in the look of the newly extended schema.
  • Recommendation: Edit existing schema (see Use case 1). Data of already created annotations will be transformed to the shape of the updated schema. New fields of annotations in to_review and postponed state that are linked to extracted fields types will be filled by AI Engine, if available. New fields for already exported or deleted annotations (also purged, exporting and failed_export) will be filled with empty or default values.

Use case 4 - Adding new field to schema, only for newly imported documents

  • Situation: User is extending a production schema by adding a new field. However, the user does not want to see the newly added field on previously created annotations.
  • Recommendation: Create a new schema object (POST) and link it to the queue. Annotation data of previously created annotations will be preserved according to the original schema. This approach is recommended if there is an organizational need to keep different field sets before and after the schema update.

Use case 5 - Deleting schema field, even for already imported documents.

  • Situation: User is changing a production schema by removing a field that was used previously. The user would like to see all annotations (to_review, postponed, exported, deleted, etc. states) in the shape of the updated schema. There is no need to see the original fields in already exported annotations.
  • Recommendation: Edit existing schema (see Use case 1).

Use case 6 - Deleting schema field, only for newly imported documents

  • Situation: User is changing a production schema by removing a field that was used previously. However, the user should still be able to see the removed fields on previously created annotations.
  • Recommendation: Create a new schema object (see Use case 4). Annotation data of previously created annotations will be preserved according to the original schema. This approach is recommended if there is an organizational need to retrieve the data in the original state.

When copying an annotation or moving it to a new queue by patching its queue attribute, the annotation in the new queue will still be associated with the old schema.

Preserving data on schema change

In order to transfer annotation field values properly during the schema update, a datapoint's category and schema_id must be preserved.

Supported operations that preserve fields values are:

  • adding a new datapoint (filled from AI Engine, if available)
  • reordering datapoints on the same level
  • moving datapoints from section to another section
  • moving datapoints to and from a tuple
  • reordering datapoints inside a tuple
  • making datapoint a multivalue (original datapoint is the only value in a new multivalue container)
  • making datapoint non-multivalue (only first datapoint value is preserved)
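
Before updating a schema in place, it can help to list which datapoints keep both their id and their datapoint category, since only those are candidates for carrying values over. The sketch below is a simplification with hypothetical helper names; it does not model the caveats listed above (for example, that only the first value survives when a multivalue is flattened).

```python
def datapoint_ids(content):
    """Collect the ids of all datapoints in a schema content list."""
    ids = set()
    stack = list(content)
    while stack:
        node = stack.pop()
        if node["category"] == "datapoint":
            ids.add(node["id"])
        children = node.get("children", [])
        # A multivalue holds one child object; sections and tuples hold a list.
        stack.extend(children if isinstance(children, list) else [children])
    return ids


def preserved_datapoints(old_content, new_content):
    """Ids that exist as datapoints in both the old and the new schema content."""
    return datapoint_ids(old_content) & datapoint_ids(new_content)
```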

Extracted field types

AI engine currently automatically extracts the following fields at the all endpoint, subject to ongoing expansion.

Identifiers

| Attr. rir_field_names | Field label | Description |
| --- | --- | --- |
| account_num | Bank Account | Bank account number. Whitespaces are stripped. |
| bank_num | Sort Code | Sort code. Numerical code of the bank. |
| iban | IBAN | Bank account number in IBAN format. |
| bic | BIC/SWIFT | Bank BIC or SWIFT code. |
| const_sym | Constant Symbol | Statistical code on payment order. |
| spec_sym | Specific Symbol | Payee ID on the payment order, or similar. |
| var_sym | Variable symbol | In some countries used by the supplier to match the payment received against the invoice. Possible non-numeric characters are stripped. |
| terms | Terms | Payment terms as written on the document (e.g. "45 days", "upon receipt"). |
| payment_method | Payment method | Payment method defined on a document (e.g. 'Cheque', 'Pay order', 'Before delivery') |
| customer_id | Customer Number | The number by which the customer is registered in the system of the supplier. Whitespaces are stripped. |
| date_due | Date Due | The due date of the invoice. |
| date_issue | Issue Date | Date of issue of the document. |
| date_uzp | Tax Point Date | The date of taxable event. |
| document_id | Document Identifier | Document number. Whitespaces are stripped. |
| order_id | Order Number | Purchase order identification (Order Numbers not captured as "sender_order_id"). Whitespaces are stripped. |
| recipient_address | Recipient Address | Address of the customer. |
| recipient_dic | Recipient Tax Number | Tax identification number of the customer. Whitespaces are stripped. |
| recipient_ic | Recipient Company ID | Company identification number of the customer. Possible non-numeric characters are stripped. |
| recipient_name | Recipient Name | Name of the customer. |
| recipient_vat_id | Recipient VAT Number | Customer VAT Number |
| recipient_delivery_name | Recipient Delivery Name | Name of the recipient to whom the goods will be delivered. |
| recipient_delivery_address | Recipient Delivery Address | Address of the recipient where the goods will be delivered. |
| sender_address | Supplier Address | Address of the supplier. |
| sender_dic | Supplier Tax Number | Tax identification number of the supplier. Whitespaces are stripped. |
| sender_ic | Supplier Company ID | Business/organization identification number of the supplier. Possible non-numeric characters are stripped. |
| sender_name | Supplier Name | Name of the supplier. |
| sender_vat_id | Supplier VAT Number | VAT identification number of the supplier. |
| sender_email | Supplier Email | Email of the sender. |
| sender_order_id | Supplier's Order ID | Internal order ID in the suppliers system. |
| delivery_note_id | Delivery Note ID | Delivery note ID defined on the invoice. |
| supply_place | Place of Supply | Place of supply (the name of the city or state where the goods will be supplied). |

Starting from July 2020, the invoice_id field was renamed to document_id. The invoice_id name is still supported for backwards compatibility, but we recommend switching to document_id in your extraction schemas.

Document attributes

| Attr. rir_field_names | Field label | Description |
| --- | --- | --- |
| currency | Currency | The currency which the invoice is to be paid in. Possible values: AED, ARS, AUD, BGN, BRL, CAD, CHF, CLP, CNY, COP, CRC, CZK, DKK, EUR, GBP, GTQ, HKD, HUF, IDR, ILS, INR, ISK, JMD, JPY, KRW, MXN, MYR, NOK, NZD, PEN, PHP, PLN, RON, RSD, SAR, SEK, SGD, THB, TRY, TWD, UAH, USD, VES, VND, ZAR or other. May be also in lowercase. |
| document_type | Document Type | Possible values: credit_note, debit_note, tax_invoice (most typical), proforma, receipt, delivery_note, order or other. |
| language | Language | The language which the document was written in. Values are ISO 639-3 language codes, e.g.: eng, fra, deu, zho. See Languages Supported By Rossum |
| payment_method_type | Payment Method Type | Payment method used for the transaction. Possible values: card, cash. |

Starting from May 2020, the invoice_type document attribute was renamed to document_type. The invoice_type name is still supported for backwards compatibility, but we recommend switching to document_type in your extraction schemas.

Amounts

| Attr. rir_field_names | Field label | Description |
| --- | --- | --- |
| amount_due | Amount Due | Final amount including tax to be paid after deducting all discounts and advances. |
| amount_rounding | Amount Rounding | Remainder after rounding amount_total. |
| amount_total | Total Amount | Subtotal over all items, including tax. |
| amount_paid | Amount paid | Amount paid already. |
| amount_total_base | Tax Base Total | Base amount for tax calculation. |
| amount_total_tax | Tax Total | Total tax amount. |

Typical relations (may depend on local laws):

amount_total = amount_total_base + amount_total_tax
amount_rounding = amount_total - round(amount_total)
amount_due = amount_total - amount_paid + amount_rounding

All amounts are in the main currency of the invoice (as identified in the currency response field). Amounts in other currencies are generally excluded.
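
The typical relations can serve as a sanity check on extracted amounts. The sketch below is an illustration only: `check_amounts` is a hypothetical helper, the tolerance is an arbitrary choice, and the amount_rounding relation is left out because the rounding rule is not specified here and may depend on local laws.

```python
from decimal import Decimal


def check_amounts(amounts, tolerance=Decimal("0.01")):
    """Return the names of the typical relations that do not hold for `amounts`."""
    expected = {
        "amount_total": amounts["amount_total_base"] + amounts["amount_total_tax"],
        "amount_due": amounts["amount_total"] - amounts["amount_paid"] + amounts["amount_rounding"],
    }
    return [name for name, value in expected.items() if abs(amounts[name] - value) > tolerance]
```

An empty result means both relations hold within the tolerance. A non-empty result is a hint for manual review rather than proof of an extraction error.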

Tables

At the moment, the AI engine automatically extracts 2 types of tables. In order to pick one of the possible choices, set rir_field_names attribute on multivalue.

| Attr. rir_field_names | Table |
| --- | --- |
| tax_details | Tax details |
| line_items | Line items |

For backwards compatibility, the rir_field_names on multivalue are by default set to line_items. However, if the rir_field_names of any column in the schema contain a string starting with tax_detail_, the table is assumed to be tax_details.
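
The resolution order described above can be sketched as a small function. `table_type` is a hypothetical helper: it first honours rir_field_names set on the multivalue, then falls back to inspecting the columns of the nested tuple, and finally defaults to line_items.

```python
def table_type(multivalue):
    """Resolve which extracted table a multivalue schema object maps to."""
    names = multivalue.get("rir_field_names")
    if names:
        return names[0]  # an explicit setting on the multivalue wins
    # Columns are the children of the tuple nested in the multivalue.
    columns = multivalue["children"].get("children", [])
    for column in columns:
        column_names = column.get("rir_field_names") or []
        if any(name.startswith("tax_detail_") for name in column_names):
            return "tax_details"
    return "line_items"
```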

Tax details

Tax details table and breakdown by tax rates.

| Attr. rir_field_names | Field label | Description |
| --- | --- | --- |
| tax_detail_base | Tax Base | Sum of tax bases for items with the same tax rate. |
| tax_detail_rate | Tax Rate | One of the tax rates in the tax breakdown. |
| tax_detail_tax | Tax Amount | Sum of taxes for items with the same tax rate. |
| tax_detail_total | Tax Total | Total amount including tax for all items with the same tax rate. |
| tax_detail_code | Tax Code | Text on document describing tax code of the tax rate (e.g. 'GST', 'CGST', 'DPH', 'TVA'). If multiple tax rates belong to one tax code on the document, the tax code will be assigned only to the first tax rate. (in future such tax code will be distributed to all matching tax rates.) |

Line items

AI engine currently automatically extracts line item table content and recognizes row and column types as detailed below. Invoice line items come in a wide variety of different shapes and forms. The current implementation can deal with (or learn) most layouts, with borders or not, different spacings, header rows, etc. We currently make two further assumptions:

  • The table generally follows a grid structure - that is, columns and rows may be represented as rectangle spans. In practice, this means that we may currently cut off text that overlaps from one cell to the next column. We are also not optimizing for table rows that are wrapped to multiple physical lines.
  • The table contains just a flat structure of line items, without subsection headers, nested tables, etc.

We plan to gradually remove both assumptions in the future.

| Attribute rir_field_names | Field label | Description |
| --- | --- | --- |
| table_column_code | Item Code/ID | Can be the SKU, EAN, a custom code (string of letters/numbers) or even just the line number. |
| table_column_description | Item Description | Line item description. Can be multi-line with details. |
| table_column_quantity | Item Quantity | Quantity of the item. |
| table_column_uom | Item Unit of Measure | Unit of measure of the item (kg, container, piece, gallon, ...). |
| table_column_rate | Item Rate | Tax rate for the line item. |
| table_column_tax | Item Tax | Tax amount for the line. Rule of thumb: tax = rate * amount_base. |
| table_column_amount_base | Amount Base | Unit price without tax. (This is the primary unit price extracted.) |
| table_column_amount | Amount | Unit price with tax. Rule of thumb: amount = amount_base + tax. |
| table_column_amount_total_base | Amount Total Base | The total amount to be paid for all the items excluding the tax. Rule of thumb: amount_total_base = amount_base * quantity. |
| table_column_amount_total | Amount Total | The total amount to be paid for all the items including the tax. Rule of thumb: amount_total = amount * quantity. |
| table_column_other | Other | Unrecognized data type. |
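
The rules of thumb in the table can be combined into a per-row consistency check. This is a sketch under assumptions: `check_line_item` is a hypothetical helper, row keys are the column names without the table_column_ prefix, and rate is assumed to be a fraction (0.2 for 20 %) rather than a percentage.

```python
from decimal import Decimal


def check_line_item(row, tolerance=Decimal("0.01")):
    """Return the rule-of-thumb relations that a line item row violates."""
    expected = {
        "tax": row["rate"] * row["amount_base"],
        "amount": row["amount_base"] + row["tax"],
        "amount_total_base": row["amount_base"] * row["quantity"],
        "amount_total": row["amount"] * row["quantity"],
    }
    return [name for name, value in expected.items() if abs(row[name] - value) > tolerance]
```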