Business Results

  • Automate document routing and reduce manual labor costs

  • 96% reduction in training time

  • 60% faster inferencing time

  • 46% faster data preprocessing

  • 65% prediction accuracy

Background

Enterprises use intelligent document analysis (IDA) to examine documents (such as policies, contracts, and legal agreements) for specific terms, and then identify those documents that may pose a risk to the business. IDA can also identify a particular document (such as legal, finance, or marketing) so that it can be categorized and routed to an appropriate department.

Paper-based documents still account for 46% of all records, which represents a substantial cost to public sector organizations. An average government agency receives and manually routes approximately 3.5 million documents annually. Each document takes seven to ten minutes to read before it can be routed, which makes the manual process time-consuming and costly.
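To put those figures in perspective, the short calculation below converts them into annual staff hours. It is only a back-of-the-envelope sketch: it assumes every one of the 3.5 million documents takes between seven and ten minutes, and the variable names are illustrative.

```python
# Figures quoted in the text above.
DOCS_PER_YEAR = 3_500_000
MINUTES_LOW, MINUTES_HIGH = 7, 10

# Convert total reading time from minutes to hours.
hours_low = DOCS_PER_YEAR * MINUTES_LOW / 60
hours_high = DOCS_PER_YEAR * MINUTES_HIGH / 60

print(f"{hours_low:,.0f} to {hours_high:,.0f} staff hours per year")
```

Under these assumptions, manual routing consumes roughly 408,000 to 583,000 staff hours per agency each year.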

The majority of documents managed by intelligent document processing (IDP) solutions are structured or semi-structured, leaving a significant portion of unstructured documents unmanaged. AI can make automated processing and categorizing of documents—structured, semi-structured, and unstructured—more cost-effective.

Solution

Term frequency-inverse document frequency (TF-IDF) was used to measure and quantify the importance or relevance of string representations in the documents. A support vector classification (SVC) model was trained to categorize the documents. The publicly available dataset used for training contained about 200K topic-related documents obtained from HuffPost*. Dataset text was cleaned using stop word removal, stemming, and tokenization. The trained model classifies each document, based on its headline, into one of 42 predetermined categories, such as entertainment or politics.
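The approach described above can be sketched with stock scikit-learn as follows. This is a minimal illustration, not the reference kit's code: the six headlines and two labels are made up for the example, the linear kernel is an assumption, and stemming is omitted because scikit-learn does not provide it out of the box.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy headlines; the real kit trains on about 200K HuffPost
# headlines across 42 categories.
headlines = [
    "Senate passes new budget",
    "President signs election reform law",
    "Governor vetoes senate tax plan",
    "New movie breaks box office record",
    "Actor wins award for best film",
    "Singer releases new album and movie soundtrack",
]
labels = [
    "POLITICS", "POLITICS", "POLITICS",
    "ENTERTAINMENT", "ENTERTAINMENT", "ENTERTAINMENT",
]

# TfidfVectorizer tokenizes, lowercases, and drops English stop words,
# then weights each remaining term by TF-IDF. SVC classifies the vectors.
model = make_pipeline(
    TfidfVectorizer(stop_words="english", lowercase=True),
    SVC(kernel="linear"),  # kernel choice is an assumption
)
model.fit(headlines, labels)

predictions = model.predict([
    "Senate votes on election reform",
    "Film award goes to new actor",
])
print(list(predictions))
```

Wrapping both steps in one pipeline keeps the vocabulary learned during training tied to the classifier, so new headlines are vectorized the same way at inference time.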

Data ingest and text processing were optimized using Intel® Distribution of Modin* and ran 46% faster than with stock Modin. Training and inferencing of the SVC model were optimized using Intel® Extension for Scikit-learn*. These optimizations improved training time by 96% and inferencing time by 60%. The model classified documents with an accuracy of 65%. Intel Distribution of Modin and Intel Extension for Scikit-learn are part of Intel’s end-to-end AI software portfolio of tools and framework optimizations that are powered by oneAPI.
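As a rough illustration of how the scikit-learn optimization is typically enabled, the sketch below calls `patch_sklearn()` from the `sklearnex` package before importing the estimator. The synthetic dataset, the RBF kernel, and the timing code are assumptions made for this example, and the sketch falls back to stock scikit-learn when the extension is not installed. It does not reproduce the benchmark figures quoted above.

```python
import time

# Enable Intel Extension for Scikit-learn when available; otherwise
# continue with stock scikit-learn so the sketch still runs.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()  # must run before scikit-learn estimators are imported
    accelerated = True
except ImportError:
    accelerated = False

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the vectorized documents (illustrative only).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

start = time.perf_counter()
clf = SVC(kernel="rbf").fit(X, y)
train_seconds = time.perf_counter() - start

start = time.perf_counter()
predicted = clf.predict(X)
infer_seconds = time.perf_counter() - start

print(f"accelerated={accelerated} "
      f"train={train_seconds:.4f}s infer={infer_seconds:.4f}s")
```

Modin follows a similar drop-in pattern: replacing `import pandas as pd` with `import modin.pandas as pd` is usually the only source change needed, provided a supported execution engine is installed.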

Technology

Optimized with Intel oneAPI for Better Performance
 

Benefits

Data scientists can build a better IDP solution that handles semi-structured and unstructured documents. The time saved in training and inference allows data scientists to put more AI models into production.

Government organizations can automate the processing and categorization of more incoming semi-structured and unstructured documents and realize cost savings.

Benefits include:

  • Less time needed to build the machine learning pipeline, with a set of instructions that covers data ingest, model development, and deployment
  • Compute savings from faster data preprocessing, model training, and inferencing time using oneAPI optimizations from Intel
  • Optimized performance using your compute of choice (such as CPU, GPU, or FPGA) with oneAPI interoperability across hardware architectures

References

IDC Survey Spotlight: What Types of Documents Are Organizations Managing with Intelligent Document Processing (IDP) Solutions, April 2021 (Available by paid subscription only.)

News Category Dataset, Kaggle, Inc. Licensed under Creative Commons 1.0 Universal (CC0 1.0) Public Domain Dedication

 

 
