Intelligent Document Processing
Intelligent Document Processing (IDP) applies cognitive techniques based on Artificial Intelligence and Machine Learning to complement traditional Robotic Process Automation (RPA). These techniques provide automation capabilities that go beyond the simple, routine, and stable processes that RPA solutions streamline today, and they create genuine additional business value for clients. SuccessData is at the forefront of innovation in the IDP space.
Extract complex relationships
In contrast to traditional approaches that focus on text only, SuccessData understands relations conveyed jointly through textual, structural, tabular, and even visual expressions. It uses deep-learning techniques to automatically capture the representation (in other words, the features) needed to learn how to extract those relationships from richly formatted data. We turn domain expertise and document understanding based on multiple modalities of information first into meaningful supervision signals, and then into predictive extraction results.
SuccessData changes the paradigm from labeling by hand to labeling automatically, making AI more broadly practical. We use programmatic supervision to build training sets with heuristic functions. This mitigates the key pain point of most ML implementations, because we need up to 100x less training data than traditional supervised machine learning solutions. The approach allows a fundamentally faster, more flexible, and higher-quality end-to-end ML development and deployment process.
Get more than raw data
SuccessData’s unique model retrieves not only predefined data points but also contextual information about each extracted data point, such as where it was found in the original document and a confidence level.
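As an illustration of what "more than raw data" can look like in practice, here is a minimal Python sketch of a record that carries a value together with its location and confidence. The class names, field names, and the 0.8 review threshold are all assumptions made for this example; they are not SuccessData's actual output schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Location:
    """Where a value was found (hypothetical layout, for illustration only)."""
    page: int                                  # 1-based page number
    bbox: Tuple[float, float, float, float]    # (x0, y0, x1, y1) on the page


@dataclass(frozen=True)
class ExtractedPoint:
    """One extracted data point plus its context."""
    name: str
    value: str
    location: Location
    confidence: float  # assumed to be a score between 0.0 and 1.0

    def needs_review(self, threshold: float = 0.8) -> bool:
        # Flag the point for a human check when confidence is below the threshold.
        return self.confidence < threshold


point = ExtractedPoint(
    name="invoice_date",
    value="2021-03-14",
    location=Location(page=2, bbox=(72.0, 540.0, 180.0, 556.0)),
    confidence=0.93,
)
print(point.needs_review())
```

Keeping the location and confidence next to the value lets a downstream system highlight the source passage for a reviewer, or route only low-confidence points to manual validation.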
Integrate your reference data
SuccessData exposes a set of APIs to facilitate the integration of your own reference data so that the output data can be enriched, cross-referenced and/or reconciled.
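To make the idea of reconciliation concrete, the sketch below matches an extracted entity name against a small reference table using fuzzy string matching from Python's standard library. It does not call any SuccessData API; the `reconcile` helper, the reference identifiers, and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
import difflib
from typing import Dict, Optional, Tuple


def reconcile(
    extracted: str,
    reference: Dict[str, str],
    cutoff: float = 0.8,
) -> Optional[Tuple[str, str]]:
    """Return (reference_id, canonical_name) for the closest match, or None.

    `reference` maps a reference identifier to a canonical name.
    """
    # Compare case-insensitively, but keep the original record for the result.
    by_lower = {name.lower(): (ref_id, name) for ref_id, name in reference.items()}
    matches = difflib.get_close_matches(
        extracted.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


# Hypothetical reference data owned by the client.
reference_data = {"C001": "Acme Corporation", "C002": "Globex Industries"}

print(reconcile("ACME Corporaton", reference_data))  # misspelled extraction
print(reconcile("Initech", reference_data))          # not in the reference data
```

A production setup would likely combine several keys (name, identifier, address) rather than a single string, but the principle is the same: the extracted value is enriched with the canonical record it corresponds to, or flagged when no record is close enough.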
How does SuccessData create a new extraction model?
Define the specific data points (name, date, entity, tables, etc.) that you need to retrieve
Train a Machine Learning model on a subset of the documents (text, PDFs, articles, web pages, etc.)
Once the ML model is ready and deployed, send the documents to our hosted infrastructure or process the documents locally
Retrieve a JSON/XML result file containing the extracted data points in a structured form via an API call
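The last step above can be sketched as follows: a client parses a JSON result and separates confidently extracted data points from those that deserve a second look. The payload shape (`document_id`, `data_points`, `name`, `value`, `page`, `confidence`) is invented for this example and is not the documented SuccessData format; the sample is embedded as a string so the sketch runs without any network call.

```python
import json
from typing import Dict, Tuple

# Hypothetical result payload; field names are assumptions.
SAMPLE_RESULT = """
{
  "document_id": "doc-001",
  "data_points": [
    {"name": "counterparty", "value": "Acme Corporation", "page": 1, "confidence": 0.97},
    {"name": "effective_date", "value": "2021-03-14", "page": 2, "confidence": 0.62}
  ]
}
"""


def split_by_confidence(
    raw: str, threshold: float = 0.8
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Parse a result and return (accepted, to_review) name-to-value mappings."""
    result = json.loads(raw)
    accepted: Dict[str, str] = {}
    to_review: Dict[str, str] = {}
    for data_point in result["data_points"]:
        # Route each data point according to its confidence score.
        target = accepted if data_point["confidence"] >= threshold else to_review
        target[data_point["name"]] = data_point["value"]
    return accepted, to_review


accepted, to_review = split_by_confidence(SAMPLE_RESULT)
print("accepted:", accepted)
print("to review:", to_review)
```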
Behind the scenes
In a traditional supervised machine learning approach, the input data fed to the system has to be hand-labeled by subject-matter experts. These human-crafted labels help the machine learn to interpret and classify data, but they come with three problems. The cost of labeling training sets has become very significant, if not prohibitive in some cases. The task is extremely time-consuming, so experts can spend weeks or months on it. And as applications and use cases shift, the training sets lose their relevance. SuccessData instead lets a team of subject-matter experts write functions that automatically assign labels to datasets.
A generative neural network then compares the labels that the different functions generate for the same data and assigns probabilities as to which labels are likely to be true. The data and its probabilistic labels are then used to train a predictive model, instead of hand-labeled data. This approach is known as “weak supervision”, in contrast to more traditional supervised machine learning techniques.
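To illustrate what such labeling functions might look like, here is a minimal Python sketch for a toy task: deciding whether a candidate text span is a date. Each function encodes one heuristic and either votes for a label or abstains. The task, the function names, and the label constants are assumptions made for this example, not SuccessData's internal code.

```python
import re
from typing import Callable, List

# Label values; ABSTAIN means the heuristic has no opinion on this candidate.
ABSTAIN, NOT_DATE, DATE = -1, 0, 1

MONTH_RE = re.compile(
    r"\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\b", re.IGNORECASE
)


def lf_iso_pattern(text: str) -> int:
    # Strings shaped like 2021-03-14 are very likely dates.
    return DATE if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) else ABSTAIN


def lf_month_name(text: str) -> int:
    # A month name next to at least one digit suggests a written-out date.
    has_digit = any(ch.isdigit() for ch in text)
    return DATE if MONTH_RE.search(text) and has_digit else ABSTAIN


def lf_no_digits(text: str) -> int:
    # A span without any digit is unlikely to be a date.
    return NOT_DATE if not any(ch.isdigit() for ch in text) else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_iso_pattern,
    lf_month_name,
    lf_no_digits,
]


def apply_labeling_functions(candidates: List[str]) -> List[List[int]]:
    """Return one row of votes per candidate, one column per labeling function."""
    return [[lf(text) for lf in LABELING_FUNCTIONS] for text in candidates]


votes = apply_labeling_functions(["2021-03-14", "14 March 2021", "Acme Corporation"])
print(votes)
```

Each function is cheap to write and may be noisy or incomplete on its own; the value comes from applying many of them to a large unlabeled corpus.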
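The following sketch shows the general idea of turning several noisy votes into one probabilistic label. It is a deliberately simplified stand-in for the generative model described above: it assumes binary labels, independent labeling functions, a uniform prior, and accuracies that are given by hand, whereas a real label model would estimate those accuracies from the agreements and disagreements between functions. All names are hypothetical.

```python
import math
from typing import Sequence

ABSTAIN = -1  # a labeling function that abstains contributes no evidence


def probabilistic_label(votes: Sequence[int], accuracies: Sequence[float]) -> float:
    """Return P(label = 1) from votes in {-1, 0, 1}.

    Assumes independent labeling functions with known accuracies strictly
    between 0 and 1, and a uniform prior over the two labels.
    """
    log_odds = 0.0
    for vote, accuracy in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue
        # A more accurate function moves the log-odds further.
        weight = math.log(accuracy / (1.0 - accuracy))
        log_odds += weight if vote == 1 else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))


# Assumed accuracies for three hypothetical labeling functions.
accuracies = [0.9, 0.8, 0.7]

print(probabilistic_label([1, -1, -1], accuracies))    # one confident positive vote
print(probabilistic_label([1, 0, -1], accuracies))     # two functions disagree
print(probabilistic_label([-1, -1, -1], accuracies))   # everyone abstains
```

The resulting probabilities can serve as soft training targets for the predictive model, so that examples on which the functions disagree carry less weight than examples on which they agree.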