Document Classification

posted Jul 13, 2017, 1:57 PM by Eric Patrick
The qbo.Attachment.Quandis plugins now include a Classification IService plugin to handle document classification. It combines OCR, score-based text parsing, and a matrix to classify documents. From our website:

QBO includes an interface for Optical Character Recognition (OCR), which extracts text from common document formats (including Word, PDF, TIFF, JPEG, and more), making the text searchable. Rather than invent our own technology, we leverage third-party software that excels at OCR; our default plugin uses Google’s OCR engine. If your existing OCR provider offers an API, Quandis’ Document module can use a plugin to call it instead of Google.

Once the text of a document is available in the QBO database, power users can extract data from the text using a simple Text Parsing plugin. Examples of values that can be extracted include:
  • Money: extract a loan amount from a text pattern like:
    “I promise to pay U.S. ${Value} (this amount is called ‘Principal’)”
  • Percentages: extract an interest rate from a text pattern like:
    “I will pay interest at a yearly rate of {Value}%.”
  • Dates: extract dates from a text pattern like:
    “Payments start on {Value}, with late charges starting”
  • Strings: extract names from a text pattern like:
    “Mortgage is issued to {Value}, known as the mortgagor.”
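
The placeholder patterns above can be illustrated with a minimal Python sketch. This is not the QBO Text Parsing plugin itself; the function names (compile_pattern, extract, to_money, to_percent, to_date), the MM/DD/YYYY date format, and the sample text are all assumptions made for illustration. The sketch turns the literal text around {Value} into a regular expression and converts whatever it captures to a typed value. It assumes each template has literal text on both sides of the placeholder.

```python
import re
from datetime import datetime
from decimal import Decimal

PLACEHOLDER = "{Value}"  # placeholder syntax taken from the patterns above


def _literal(text):
    # Escape regex metacharacters and let any run of whitespace in the
    # OCR text match a single space in the template.
    return r"\s+".join(re.escape(word) for word in text.split())


def compile_pattern(template):
    # Hypothetical helper: split the template around its one placeholder
    # and capture the shortest text that sits between the two literals.
    before, after = template.split(PLACEHOLDER, 1)
    return re.compile(_literal(before) + r"\s*(?P<value>.+?)\s*" + _literal(after))


def extract(template, text, convert=str):
    # Return the converted value, or None when the pattern is not found.
    match = compile_pattern(template).search(text)
    return convert(match.group("value")) if match else None


def to_money(raw):
    return Decimal(raw.replace(",", ""))  # "250,000.00" -> Decimal("250000.00")


def to_percent(raw):
    return Decimal(raw)


def to_date(raw):
    # Assumes MM/DD/YYYY; real documents will need more formats.
    return datetime.strptime(raw, "%m/%d/%Y").date()


# Made-up OCR output for demonstration only.
ocr_text = (
    "I promise to pay U.S. $250,000.00 (this amount is called Principal). "
    "I will pay interest at a yearly rate of 4.25%. "
    "Payments start on 03/01/2018, with late charges starting after 15 days. "
    "Mortgage is issued to Jane Q. Borrower, known as the mortgagor."
)

loan_amount = extract("I promise to pay U.S. ${Value} (this amount is called", ocr_text, to_money)
rate = extract("I will pay interest at a yearly rate of {Value}%.", ocr_text, to_percent)
first_payment = extract("Payments start on {Value}, with late charges starting", ocr_text, to_date)
mortgagor = extract("Mortgage is issued to {Value}, known as the mortgagor.", ocr_text)
```

Matching the surrounding literal text rather than the value itself is what lets one mechanism cover money, percentages, dates, and names; only the converter changes per data type.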
Next, documents can be classified with the combination of OCR, the text parsing engine, and a set of rules (matrix) to set the document type, template, status, or other values. For example, in the image below, a simple text parsing model looks for text specific to loan modifications, subprime mortgages, and mortgage notes, and then leverages a weighted rule set to determine the document type.
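
As a rough illustration of a weighted rule set, the Python sketch below scores OCR text against a handful of phrases and picks the best-scoring document type. The phrases, weights, threshold, and the names RULES and classify are hypothetical and are not taken from the QBO matrix; the sketch only shows the general idea of summing weights per type and falling back to "Unknown" when no type scores high enough.

```python
from collections import defaultdict

# Hypothetical rules: (phrase to look for, document type, weight).
RULES = [
    ("loan modification agreement", "Loan Modification", 5),
    ("modification", "Loan Modification", 1),
    ("i promise to pay", "Mortgage Note", 4),
    ("note holder", "Mortgage Note", 2),
    ("subprime", "Subprime Mortgage", 4),
]


def classify(text, rules=RULES, threshold=4):
    # Sum the weights of every phrase found in the text, per document type.
    lowered = text.lower()
    scores = defaultdict(int)
    for phrase, doc_type, weight in rules:
        if phrase in lowered:
            scores[doc_type] += weight
    if not scores:
        return "Unknown"
    # Take the highest-scoring type, but only if it clears the threshold.
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] >= threshold else "Unknown"


note_type = classify("I promise to pay U.S. $250,000.00 to the Note Holder.")
mod_type = classify("This Loan Modification Agreement amends the original terms.")
other_type = classify("Minutes of the homeowners association meeting.")
```

A threshold keeps a single weak match (for example, the word "modification" alone) from forcing a classification, so ambiguous documents stay "Unknown" and can be routed to a person for review.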

This enables you to purchase a portfolio of loans, drop the collateral documents into QBO, and have a workflow OCR, parse, and classify each document received. Processors would be assigned to review only documents classified as ‘unknown’, instead of putting eyeballs on each and every document, and loans missing key documents can automatically be flagged for follow-up.

Check out the demo video on our website as well.