Define the specific data points (name, date, entity, tables, etc.) that you need to retrieve
Train a machine learning model on a subset of the documents (text, PDFs, articles, web pages, etc.)
Once the model is trained and deployed, send the documents to our hosted infrastructure or process them locally
Retrieve a JSON/XML result file containing the extracted data points in a structured form via an API call
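As a rough illustration of the last step, the sketch below parses a JSON result and keeps only the data points above a confidence threshold. The payload shape, the field names (`document_id`, `data_points`, `name`, `value`, `confidence`), the sample values and the `extract_fields` helper are all assumptions made up for this example; the actual result schema is not described here and may differ.

```python
import json

# Hypothetical payload: the field names, nesting and values are assumptions
# for illustration only, not the documented result schema.
SAMPLE_RESULT = """
{
  "document_id": "doc-001",
  "data_points": [
    {"name": "entity", "value": "Acme Corp", "confidence": 0.97},
    {"name": "date", "value": "2020-01-15", "confidence": 0.91},
    {"name": "total", "value": "1250.00", "confidence": 0.55}
  ]
}
"""


def extract_fields(raw_json, min_confidence=0.8):
    """Map data-point names to values, skipping low-confidence extractions."""
    result = json.loads(raw_json)
    return {
        point["name"]: point["value"]
        for point in result["data_points"]
        if point["confidence"] >= min_confidence
    }


fields = extract_fields(SAMPLE_RESULT)
print(fields)
```

Filtering on a confidence value is one possible design choice; it only applies if the real result exposes such a score.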
Behind the scenes
In traditional supervised machine learning, the input data fed to the system has to be hand-labeled by subject-matter experts. These human-crafted labels help the machine learn to interpret and classify data, but they have three drawbacks. The cost of labeling training sets has become significant, if not prohibitive in some cases. The task is extremely time-consuming, often taking weeks or months of human effort. And as applications and use cases shift, existing training sets lose their relevance. SuccessData instead lets a team of subject-matter experts write functions that automatically assign labels to datasets.
A generative neural network then compares the labels that the different functions produce for the same data and assigns each candidate label a probability of being correct. That data and its probabilistic labels are then used to train a predictive model in place of hand-labeled data. This approach is known as “weak supervision”, in contrast to more traditional supervised machine learning techniques.
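To make the idea concrete, here is a minimal sketch of labeling functions and probabilistic labels. It deliberately swaps the generative network for a much simpler stand-in, smoothed vote counting, so it illustrates the data flow rather than the actual model. The task ("is this an invoice?"), the three labeling functions and the `probabilistic_label` helper are hypothetical.

```python
from collections import Counter

ABSTAIN = None  # returned by a labeling function that has no opinion

# Hypothetical labeling functions for an illustrative "is this an invoice?" task.
def lf_mentions_invoice(text):
    return "INVOICE" if "invoice" in text.lower() else ABSTAIN


def lf_has_amount_due(text):
    return "INVOICE" if "amount due" in text.lower() else ABSTAIN


def lf_has_greeting(text):
    return "OTHER" if text.lower().startswith(("hello", "dear")) else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_invoice, lf_has_amount_due, lf_has_greeting]
LABELS = ["INVOICE", "OTHER"]


def probabilistic_label(text):
    """Return {label: probability} from smoothed, normalized vote counts.

    This is a simple stand-in for a learned generative model: every
    labeling function gets equal weight, which a real label model would
    replace with estimated accuracies.
    """
    votes = Counter(lf(text) for lf in LABELING_FUNCTIONS)
    votes.pop(ABSTAIN, None)  # abstentions carry no information
    # Add-one smoothing: a document with no votes gets a uniform distribution.
    scores = {label: votes.get(label, 0) + 1 for label in LABELS}
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


for doc in [
    "Invoice 42: amount due in 30 days",
    "Dear customer, your invoice is attached",
    "Quarterly report",
]:
    print(doc, "->", probabilistic_label(doc))
```

The second document shows why probabilities are useful: two functions disagree, so neither label is asserted with confidence, and a downstream predictive model can be trained on that soft label instead of a hard one.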
SuccessData allows you to control where your data is processed: locally using containers, so your data stays private, or in the cloud for publicly available data, if that is more convenient.