
Import and Export

Documents can be imported into Rossum through the REST API or the email gateway. Supported file formats are PDF, PNG, JPEG, TIFF, XLSX/XLS, DOCX/DOC and HTML. The maximum supported file size is 40 MB (this limit also applies to the uncompressed size of the files within a .zip archive).
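Before uploading, a client can reject files that are certain to fail. The following is a minimal Python sketch, not part of any Rossum SDK. It assumes that the formats above map to the listed file extensions and that "40 MB" means 40 * 1024 * 1024 bytes. Extra MIME types enabled through webhooks are not covered, and the helper name is hypothetical.

```python
from pathlib import Path

# Assumption: the 40 MB limit is read as 40 * 1024 * 1024 bytes.
MAX_FILE_SIZE = 40 * 1024 * 1024

# Assumed extensions for the supported formats (PDF, PNG, JPEG, TIFF, XLSX/XLS, DOCX/DOC, HTML).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff",
    ".xlsx", ".xls", ".docx", ".doc", ".html", ".htm",
}


def preflight_check(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks importable."""
    problems = []
    suffix = Path(filename).suffix.lower()  # case-insensitive extension check
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(no extension)'}")
    if size_bytes > MAX_FILE_SIZE:
        problems.append(f"file too large: {size_bytes} bytes > {MAX_FILE_SIZE}")
    return problems
```

For example, `preflight_check("invoice.PDF", 1_000_000)` returns an empty list.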

For best results, documents should be in A4 format and, for scans and photos, have a resolution of at least 150 DPI. Read more about import recommendations.
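Here is a worked example of the resolution guideline. A4 paper measures 210 x 297 mm (about 8.27 x 11.69 inches), so a 150 DPI scan needs at least roughly 1241 x 1754 pixels. The Python sketch below computes this minimum and estimates the effective DPI of an existing image. It assumes the image represents one full A4 page in either orientation. The function names are hypothetical.

```python
import math

A4_MM = (210.0, 297.0)  # A4 short and long side in millimetres
MM_PER_INCH = 25.4


def min_a4_pixels(dpi: int = 150) -> tuple[int, int]:
    """Smallest pixel dimensions (short side, long side) of an A4 page scanned at `dpi`."""
    return tuple(math.ceil(mm / MM_PER_INCH * dpi) for mm in A4_MM)


def effective_a4_dpi(width_px: int, height_px: int) -> float:
    """DPI the image would have if it covered a whole A4 page (orientation-independent)."""
    short_px, long_px = sorted((width_px, height_px))
    return min(short_px / (A4_MM[0] / MM_PER_INCH),
               long_px / (A4_MM[1] / MM_PER_INCH))
```

Taking the minimum over both sides means an image only passes if neither dimension falls below the target resolution.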

HTML File Sanitization: HTML files uploaded via the upload endpoints, the document creation API, or email import are automatically sanitized for security purposes. Some elements, attributes and styles are removed, and the resulting HTML may no longer be valid.

Importing non-standard MIME types

Support for additional MIME types can be added by handling the upload.created webhook event. This lets users pre-process uploaded files (e.g. XML or JSON) into a format that Rossum understands. Such MIME types usually need to be enabled at the queue level first, by adding the appropriate MIME type to accepted_mime_types in the queue's settings attribute.
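Queue settings usually contain other keys besides accepted_mime_types. It is therefore safer to merge new MIME types into the existing settings than to overwrite them. Below is a minimal Python sketch. It assumes you fetch the queue's current settings object, modify it locally, and send the whole object back in a queue update; check the queue API reference for the exact update semantics. The helper name is hypothetical.

```python
import copy


def with_accepted_mime_types(settings: dict, mime_types: list[str]) -> dict:
    """Return a copy of queue `settings` with `mime_types` added to accepted_mime_types.

    Other keys and already-accepted types are preserved; duplicates are skipped.
    """
    updated = copy.deepcopy(settings)  # never mutate the caller's settings object
    accepted = updated.setdefault("accepted_mime_types", [])
    for mime in mime_types:
        if mime not in accepted:
            accepted.append(mime)
    return updated
```

For example, merging `["application/json"]` into settings that already accept `application/pdf` yields a list with both types, in that order.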

List of enabled MIME types:

  • application/EDI-X12
  • application/EDIFACT
  • application/json
  • application/msword
  • application/pdf
  • application/pgp-encrypted
  • application/vnd.ms-excel
  • application/vnd.ms-outlook
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • application/xml
  • application/zip
  • image/*
  • message/rfc822
  • text/csv
  • text/html (automatically sanitized - see note above)
  • text/plain
  • text/xml
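Wildcard entries such as image/* match any subtype. This section does not describe the exact matching rules Rossum applies. The Python sketch below assumes case-insensitive comparison, since MIME type and subtype names are case-insensitive per RFC 2045. It also ignores parameters such as `; charset=utf-8`. The helper names are hypothetical.

```python
def mime_matches(mime: str, pattern: str) -> bool:
    """True if `mime` matches `pattern`; a pattern like "image/*" matches any image subtype."""
    mime = mime.split(";", 1)[0].strip().lower()  # drop parameters, normalize case
    pattern = pattern.strip().lower()
    if pattern.endswith("/*"):
        return mime.startswith(pattern[:-1])  # compare against "image/" including the slash
    return mime == pattern


def is_accepted(mime: str, accepted: list[str]) -> bool:
    """True if `mime` matches any entry of an accepted_mime_types-style list."""
    return any(mime_matches(mime, p) for p in accepted)
```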

If your document's MIME type is not supported, please contact the Rossum support team for more information.

Upload API

You can upload one or more files to a queue using the upload endpoint. The upload endpoint also lets you specify additional field values, e.g. your internal document ID. Data extraction starts as soon as a document is uploaded.

The upload endpoint supports basic authentication to enable easy integration with third-party systems.
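The following Python sketch builds a basic-authenticated upload request with only the standard library. Several details are assumptions to verify in the API reference. These include the endpoint path `/v1/uploads?queue=<id>`, the example base URL `https://example.rossum.app/api`, and sending the file as the raw body with a `Content-Disposition` header. The sketch does not set the additional field values mentioned above, and the helper name is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_upload_request(base_url: str, queue_id: int, filename: str, content: bytes,
                         username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads one file to a queue.

    Assumptions: the endpoint is <base_url>/v1/uploads?queue=<queue_id>, the file is
    sent as the raw request body, and `filename` is plain ASCII.
    """
    # Basic authentication: base64 of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    query = urllib.parse.urlencode({"queue": queue_id})
    url = f"{base_url.rstrip('/')}/v1/uploads?{query}"
    return urllib.request.Request(
        url,
        data=content,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            # The body is raw bytes, so the file name travels in this header.
            "Content-Disposition": f'attachment; filename="{filename}"',
        },
    )
```

Passing the result to `urllib.request.urlopen(req)` sends it. Building the request separately keeps the sketch testable without network access.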

Import by Email

Documents can also be sent by email to a properly configured inbox associated with a queue. Users then only need to know the inbox's email address to forward emails to.

For every incoming email, Rossum extracts PDF documents, images and zip files, stores them in the queue and starts the data extraction process.

The size limit for incoming emails is 50 MB (the raw email message with base64 encoded attachments).
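Because attachments are base64 encoded, the 50 MB limit leaves less room for the attachments themselves. Base64 turns every 3 bytes into 4 characters. MIME typically wraps the output at 76 characters per line with CRLF line endings, so 57 raw bytes become 78 bytes in the message. That caps attachments at roughly 50 * 57 / 78, about 36.5 MB in total, before headers and body text are counted. The Python sketch below makes this estimate. Reading 50 MB as 50 * 1024 * 1024 bytes is an assumption, and the helper names are hypothetical.

```python
# Assumption: the 50 MB limit is read as 50 * 1024 * 1024 bytes.
EMAIL_LIMIT = 50 * 1024 * 1024


def base64_mime_size(raw_bytes: int) -> int:
    """Size after base64 encoding with 76-character lines, each ending in CRLF."""
    encoded = 4 * ((raw_bytes + 2) // 3)  # 4 output characters per started 3-byte group
    lines = (encoded + 75) // 76          # number of 76-character lines (last may be shorter)
    return encoded + 2 * lines            # add CRLF per line


def fits_email_limit(attachment_sizes: list[int], other_bytes: int = 0) -> bool:
    """Rough check: encoded attachments plus headers/body (`other_bytes`) within the limit."""
    total = other_bytes + sum(base64_mime_size(n) for n in attachment_sizes)
    return total <= EMAIL_LIMIT
```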

From zip attachments, all files in the root of the archive are extracted. If the root contains only one directory (and no other files), the whole directory is extracted. The zip files and all extracted files must be allowed in accepted_mime_types (see queue settings) and must pass the inbox filtering rules (see document rejection conditions on the inbox object) in order for annotations to be created.
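The root-directory rule can be expressed over the member names of a zip archive, as returned by `zipfile.ZipFile.namelist()`. In this Python sketch, the hypothetical `members_to_extract` assumes that only direct children of a lone top-level directory are extracted, not files in deeper subdirectories. The text above does not say how nested directories are handled.

```python
def members_to_extract(names: list[str]) -> list[str]:
    """Pick the archive members that would be extracted under the root-directory rule.

    `names` are zip member names ("/"-separated; directory entries end with "/").
    """
    files = [n for n in names if not n.endswith("/")]
    root_files = [n for n in files if "/" not in n]
    # Top-level directories, whether or not the archive has explicit directory entries.
    top_dirs = {n.split("/", 1)[0] for n in names if "/" in n}
    if not root_files and len(top_dirs) == 1:
        (top,) = top_dirs
        prefix = top + "/"
        # Assumption: only direct children of the lone directory are taken.
        return [n for n in files if n.startswith(prefix) and "/" not in n[len(prefix):]]
    return root_files
```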

Invalid characters in attachment file names (e.g. /) are replaced with underscores.

Small images (up to 100x100 pixels) are ignored; see inbox for reference.
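The two attachment rules above, file name cleanup and small-image skipping, can be mirrored on the sender's side. This is a hedged Python sketch. The source names "/" only as an example of an invalid character, so the character set used here (slash, backslash and control characters) is an assumption. "Up to 100x100 pixels" is also assumed to mean both dimensions are at most 100. The helper names are hypothetical.

```python
import re

# Assumption: slash, backslash and ASCII control characters count as invalid.
_INVALID_CHARS = re.compile(r"[/\\\x00-\x1f]")


def sanitize_filename(name: str) -> str:
    """Replace each invalid character in an attachment name with an underscore."""
    return _INVALID_CHARS.sub("_", name)


def is_ignored_small_image(width_px: int, height_px: int) -> bool:
    """True for images of at most 100x100 pixels (assumed: both dimensions <= 100)."""
    return width_px <= 100 and height_px <= 100
```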

You can use selected email header data (e.g. Subject) to initialize additional field values, see rir_field_names attribute description for details.

Zip attachment limits:

  • the uncompressed size of the files within a .zip archive must not exceed 40 MB
  • only archives containing fewer than 1000 files are processed
  • only files in the root of the archive are processed (or files inside a first-level directory if it is the only one there)
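The size and file-count limits can be checked before sending, using Python's standard `zipfile` module. This sketch makes a few assumptions. It reads 40 MB as 40 * 1024 * 1024 bytes. It also counts every file in the archive, including files in subdirectories that would not be processed, because the text does not say which files the limits count. The helper names are hypothetical.

```python
import io
import zipfile

# Assumption: the 40 MB limit is read as 40 * 1024 * 1024 bytes.
MAX_UNCOMPRESSED_BYTES = 40 * 1024 * 1024
MAX_FILES = 1000  # archives must contain fewer than this many files


def check_zip_limits(zf: zipfile.ZipFile) -> list[str]:
    """Return limit violations for an open archive; an empty list means it is within limits."""
    files = [info for info in zf.infolist() if not info.is_dir()]
    problems = []
    if len(files) >= MAX_FILES:
        problems.append(f"too many files: {len(files)} (must be fewer than {MAX_FILES})")
    # file_size is the uncompressed size recorded in the archive's central directory.
    total = sum(info.file_size for info in files)
    if total > MAX_UNCOMPRESSED_BYTES:
        problems.append(f"uncompressed size {total} bytes exceeds {MAX_UNCOMPRESSED_BYTES}")
    return problems


def check_zip_bytes(data: bytes) -> list[str]:
    """Convenience wrapper for an attachment held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return check_zip_limits(zf)
```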

Export

To export extracted and confirmed data, call the export endpoint. You can limit the returned results with status and time-range filters and a list of annotation IDs.

The export endpoint supports basic authentication to enable easy integration with third-party systems.
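The following Python sketch builds a basic-authenticated export request with the filters described above. Some details are assumptions to check against the API reference: the path `/v1/queues/<id>/export`, the query parameters `format`, `status` and a comma-separated `id` list, and the example base URL. This section does not name the time-range parameters, so the sketch passes them through as a caller-supplied dict rather than guessing. The helper name is hypothetical.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional


def build_export_request(base_url: str, queue_id: int, username: str, password: str,
                         status: Optional[str] = None,
                         annotation_ids: Optional[list[int]] = None,
                         extra_filters: Optional[dict[str, str]] = None,
                         fmt: str = "json") -> urllib.request.Request:
    """Build (but do not send) a GET request for a queue export.

    Assumptions: endpoint <base_url>/v1/queues/<queue_id>/export accepting `format`,
    `status` and a comma-separated `id` list. Time-range filters go in `extra_filters`.
    """
    params = {"format": fmt}
    if status is not None:
        params["status"] = status
    if annotation_ids:
        params["id"] = ",".join(str(i) for i in annotation_ids)
    params.update(extra_filters or {})
    query = urllib.parse.urlencode(params, safe=",")  # keep commas in the ID list unescaped
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/queues/{queue_id}/export?{query}",
        headers={"Authorization": f"Basic {token}"},
    )
```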

Auto-split of documents

A single PDF file containing several invoices can also be processed. Just insert a special separator page between the documents; you can print this page and insert it between documents while scanning.

Rossum recognizes the QR code on the separator page and automatically splits the PDF into individual documents. The resulting documents are imported into the queue, while the original document is set to the split state.
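Conceptually, the split works on a sequence of pages in which some are flagged as separators. This Python sketch models that idea only and is not Rossum's implementation. It assumes QR detection has already produced one boolean per page, that separator pages are dropped from the output, and that empty segments (for example, two separators in a row) produce no document. The function name is hypothetical.

```python
def split_on_separators(pages: list[bool]) -> list[list[int]]:
    """Group page indices into documents.

    `pages[i]` is True if page i is a separator page (its QR code was detected).
    Separator pages are dropped and empty segments are skipped.
    """
    documents: list[list[int]] = []
    current: list[int] = []
    for index, is_separator in enumerate(pages):
        if is_separator:
            if current:  # close the document collected so far
                documents.append(current)
            current = []
        else:
            current.append(index)
    if current:  # the last document has no trailing separator
        documents.append(current)
    return documents
```

For example, a seven-page scan flagged `[False, False, True, False, True, False, False]` yields three documents with pages `[0, 1]`, `[3]` and `[5, 6]`.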