Recognizing Program Terms in PDF Documents

The solution can detect term definitions in PDF files and highlight their exact location in the text

Project Overview

The main challenge for our team was to develop a system that detects all terms and their definitions in PDF files. The program recognizes the font style (italic, bold or underlined) and the boundaries of each term, separating it from the next sentence and indicating its exact location in the document. The most difficult part was recognizing how text is distributed in tables, since standard algorithms cannot cope with this. Nevertheless, our team managed to find a solution, and the customer received a useful tool that saves time when searching for the necessary data in documents.
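To make the idea of style-based term detection more concrete, here is a minimal sketch in Python. It is not the project's actual implementation. It assumes the PDF text has already been extracted into styled spans by some PDF library, and the names Span, TermHit and find_terms are hypothetical, introduced only for illustration. A run of bold, italic or underlined spans is treated as a term, and the unstyled text that follows, up to the first sentence-ending punctuation mark, is treated as its definition.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


@dataclass
class Span:
    """Hypothetical styled text fragment, as a PDF extractor might provide it."""
    text: str
    bold: bool
    italic: bool
    underline: bool
    bbox: BBox
    page: int


@dataclass
class TermHit:
    """Hypothetical result record: a term, its definition and its location."""
    term: str
    definition: str
    page: int
    bbox: BBox


# Simplification: a sentence ends at '.', '!' or '?' followed by whitespace
# or end of text. Abbreviations such as "e.g." would need extra handling.
SENTENCE_END = re.compile(r"[.!?](?:\s|$)")


def is_styled(span: Span) -> bool:
    return span.bold or span.italic or span.underline


def union_bbox(spans: List[Span]) -> BBox:
    """Smallest rectangle that covers all given spans."""
    return (
        min(s.bbox[0] for s in spans),
        min(s.bbox[1] for s in spans),
        max(s.bbox[2] for s in spans),
        max(s.bbox[3] for s in spans),
    )


def find_terms(spans: List[Span]) -> List[TermHit]:
    hits: List[TermHit] = []
    i = 0
    while i < len(spans):
        if not is_styled(spans[i]):
            i += 1
            continue
        # Merge consecutive styled spans on the same page into one term.
        j = i
        while (j + 1 < len(spans) and is_styled(spans[j + 1])
               and spans[j + 1].page == spans[i].page):
            j += 1
        term_spans = spans[i:j + 1]
        term = " ".join(s.text.strip() for s in term_spans)
        # Collect the unstyled text after the term up to the first sentence end.
        parts: List[str] = []
        k = j + 1
        while k < len(spans) and not is_styled(spans[k]):
            match = SENTENCE_END.search(spans[k].text)
            if match:
                parts.append(spans[k].text[:match.start() + 1])
                break
            parts.append(spans[k].text)
            k += 1
        definition = " ".join(p.strip() for p in parts).strip()
        if definition:  # skip styled text that has no following definition
            hits.append(TermHit(term, definition, spans[i].page,
                                union_bbox(term_spans)))
        i = j + 1
    return hits
```

The returned bounding box covers only the term itself, which is what a viewer would need in order to highlight it on the page. Text inside tables would need a separate layout step before spans can be read in the right order, which this sketch does not attempt.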

Technologies

What Was Done

Our team developed a system that discovers term definitions in PDF files using artificial intelligence, machine learning and the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology.
