plugin supports Optical Character Recognition (OCR) of documents stored in a QBO3 system. The following methods are supported:
Attachment/OcrToDescription: OCR text is saved to Attachment.Description
Attachment/OcrToAttachment: OCR text is saved as a new Attachment
Ocr To Description
Storing the text of a document in the Attachment.Description column of the database enables SQL queries against the text of the document. To fully leverage complex searching, full-text indexing of the Attachment.Description field is recommended. High-volume full-text searching can be a measurable performance drain on the database server, so use this feature judiciously.
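As a rough illustration of querying OCR text once it is stored in Attachment.Description, the sketch below uses an in-memory SQLite database as a stand-in for the real QBO3 database. The two-column table, the sample rows, and the find_attachments helper are assumptions made for this example only; a production system would more likely use the database server's full-text index rather than LIKE.

```python
import sqlite3

# Stand-in schema (assumption): the real Attachment table has more columns;
# only an ID and the OCR text in Description are modeled here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Attachment (AttachmentID INTEGER PRIMARY KEY, Description TEXT)"
)
conn.executemany(
    "INSERT INTO Attachment (AttachmentID, Description) VALUES (?, ?)",
    [
        (1, "Notice of default recorded for the property"),
        (2, "Hazard insurance policy declarations page"),
        (3, "Second notice of default, mailed to borrower"),
    ],
)


def find_attachments(conn, phrase):
    """Return IDs of attachments whose OCR text contains the phrase.

    Hypothetical helper. The phrase is passed as a bound parameter, so it is
    never concatenated into the SQL statement.
    """
    rows = conn.execute(
        "SELECT AttachmentID FROM Attachment "
        "WHERE Description LIKE ? ORDER BY AttachmentID",
        (f"%{phrase}%",),
    )
    return [row[0] for row in rows]


matches = find_attachments(conn, "notice of default")
```

A LIKE pattern with a leading wildcard forces a scan of every row, which is one reason full-text indexing is recommended for anything beyond occasional searches.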
Ocr To Attachment
Storing the text of a document as another Attachment in QBO3 allows leveraging third-party document search engines, such as Amazon CloudSearch, or internal corporate search appliances. It is more complicated to configure fully than OcrToDescription because you must also orchestrate delivery of the text documents to a third-party document search engine, but it scales horizontally.
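The delivery step mentioned above could look roughly like the following sketch. It is not part of the plugin: deliver_text_attachments and the index_document callable are hypothetical names, and the in-memory fake_index function stands in for whatever client the chosen search engine actually provides.

```python
from typing import Callable, Iterable, List, Tuple


def deliver_text_attachments(
    attachments: Iterable[Tuple[int, str]],
    index_document: Callable[[int, str], None],
) -> List[int]:
    """Send each (attachment_id, text) pair to a search engine.

    index_document is a placeholder for the engine's real client call.
    Returns the IDs that failed, so a later run can retry just those.
    """
    failed = []
    for attachment_id, text in attachments:
        try:
            index_document(attachment_id, text)
        except Exception:
            # One bad document should not stop delivery of the rest.
            failed.append(attachment_id)
    return failed


# Demo with a fake in-memory "engine" that rejects empty documents.
indexed = {}


def fake_index(attachment_id, text):
    if not text:
        raise ValueError("empty document")
    indexed[attachment_id] = text


failed_ids = deliver_text_attachments(
    [(10, "deed of trust"), (11, "")], fake_index
)
```

Collecting failures instead of raising keeps a batch moving; whether that suits a given deployment depends on how the search engine reports errors.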
- This plugin leverages Google Documents' OCR functionality. The document is not persisted in the Google cloud; it is transmitted, OCRed, downloaded, and deleted, all with encryption over the wire.
- For full functionality, install the qbo.Attachment.ABCPDF plugin as well to facilitate:
- converting non-PDF to PDF files prior to OCR, and
- breaking large PDFs into smaller chunks to OCR documents larger than 2MB
Configurable application settings:
X509CertificatePath: path to a Google-provided P12 certificate for making Google API calls (defaults to a Quandis account)
X509CertificatePassword: password for the Google-provided P12 certificate
ServiceAccountEmail: email account of the Google service account used to access the Google Drive API
GoogleApplicationName: Google project name authorized to access the Drive API
SubscriptionPrefix: ObjectSubscription prefix when creating SubscriberID records for an attachment uploaded to Google Drive
DeleteAfterOcr: if true, any OCRed documents will be deleted as soon as their text is retrieved
OcrChunkThreshold: size, in bytes, that triggers a call to Attachment/SplitPdf so that the document is OCRed in chunks
OcrThrowOnError: if true, an error on any chunk will raise an error. If false, an Attachment.OnRuleWarning event will be raised instead
OcrRetryAttempts: number of retry attempts for a single chunk. Used to handle 'spurious' network errors.
If OcrThrowOnError is false, this text will be injected into the text stream in place of a failed chunk
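The interaction between the chunking, retry, and error settings can be sketched as follows. This is a minimal illustration, not the plugin's implementation: the OcrSettings class, the ocr_document function, and the injected ocr_chunk, split, and warn callables are all hypothetical, as is the error_text field name (the source does not show the name of that setting). The sketch also assumes that splitting happens only when the document is strictly larger than the threshold, and that OcrRetryAttempts counts retries after the first attempt.

```python
from dataclasses import dataclass


@dataclass
class OcrSettings:
    # Field names mirror the settings above; default values are illustrative.
    ocr_chunk_threshold: int = 2 * 1024 * 1024
    ocr_throw_on_error: bool = True
    ocr_retry_attempts: int = 3
    error_text: str = "[OCR failed for this chunk]"  # hypothetical name


def ocr_document(data, ocr_chunk, split, settings, warn=print):
    """OCR a document, splitting it into chunks when it exceeds the threshold."""
    if len(data) > settings.ocr_chunk_threshold:
        chunks = split(data)
    else:
        chunks = [data]

    parts = []
    for index, chunk in enumerate(chunks):
        last_error = None
        # One initial attempt plus the configured number of retries.
        for _ in range(1 + settings.ocr_retry_attempts):
            try:
                parts.append(ocr_chunk(chunk))
                last_error = None
                break
            except Exception as exc:
                last_error = exc
        if last_error is not None:
            if settings.ocr_throw_on_error:
                raise RuntimeError(f"OCR failed for chunk {index}") from last_error
            # Stand-in for raising an Attachment.OnRuleWarning event.
            warn(f"OCR failed for chunk {index}: {last_error}")
            parts.append(settings.error_text)
    return "".join(parts)


# Demo with fake splitting and OCR functions.
def fake_split(data):
    return [data[i:i + 4] for i in range(0, len(data), 4)]


def fake_ocr(chunk):
    if chunk == b"BAD!":
        raise IOError("simulated network error")
    return chunk.decode("ascii")


warnings = []
lenient = OcrSettings(
    ocr_chunk_threshold=4,
    ocr_throw_on_error=False,
    ocr_retry_attempts=1,
    error_text="[?]",
)
text = ocr_document(b"goodBAD!done", fake_ocr, fake_split, lenient, warn=warnings.append)
```

With ocr_throw_on_error set to False, the failed middle chunk is replaced by the placeholder text and a single warning is recorded; with it set to True, the same input raises an error instead.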