Use Case Utilities
Rebuilding the quotation process
Our client, a real estate investor in France, runs numerous construction projects. For each construction site, they issue calls for quotations to various artisans and construction providers. Each artisan or construction provider submits a quotation (devis) for specific work.
The client wants to extract information from the quotations submitted by these artisans and construction providers and build a structured database keyed to a single reference file. This will let them analyze prices quickly and precisely, and speed up the selection of the final artisan or construction provider.
Challenges
Extracting structured data from various quotations is challenging due to inconsistencies in format, terminology, and document structure. Standardizing this process is crucial for accurate price analysis and efficient provider selection.
Lack of a Standardized Dictionary
The client and artisans do not have a standard dictionary of items, making it challenging to find matches between the client’s reference and the proposed quotations (devis) from the artisans.
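To illustrate the matching problem, here is a minimal sketch that normalizes labels (case, accents, spacing) and then looks for the closest quotation line with Python's standard `difflib`. The function names, the sample labels, and the 0.6 cutoff are assumptions made for this example; this is not the approach used in the project, which relies on retrieval rather than string similarity.

```python
from __future__ import annotations

import difflib
import unicodedata


def normalize(label: str) -> str:
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def best_match(reference: str, quotation_lines: list[str], cutoff: float = 0.6) -> str | None:
    """Return the quotation line closest to the reference label, or None."""
    # Map each normalized form back to the original line from the quotation.
    normalized = {normalize(line): line for line in quotation_lines}
    hits = difflib.get_close_matches(
        normalize(reference), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None


lines = ["PEINTURE INTERIEURE", "Pose de carrelage"]
match = best_match("Peinture intérieure", lines)
no_match = best_match("Plomberie", lines)
```

String similarity of this kind handles spelling and accent differences, but it cannot tell that two differently worded labels describe the same work, which is why a shared dictionary or a semantic retrieval step is needed.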
Unstructured PDF Formats
The quotations arrive as scanned PDFs, with many tables and merged cells laid out without a consistent structure.
Undefined Sub-Item Structure
The list of sub-items for each reference is not predefined.
Solution
Utilize Tesseract to perform OCR on scanned PDF documents, converting them into readable text.
Load and process the extracted text with LangChain's TextLoader.
Enrich the reference documents with additional data through prompt engineering.
Pass the enriched reference queries and the processed quotations to the Retrieval-Augmented Generation (RAG) system.
Apply business logic to the returned responses for final processing.
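The steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions: OCR goes through the `pytesseract` and `pdf2image` packages with French language data installed, the enrichment prompt wording is invented for the example, and a plain word-overlap ranking stands in for the retrieval half of the RAG system. The LLM call and the LangChain loading step are left out, and all function names are hypothetical.

```python
from __future__ import annotations


def ocr_scanned_pdf(pdf_path: str, lang: str = "fra") -> str:
    """OCR every page of a scanned PDF with Tesseract and return the text."""
    # Imported lazily so the rest of the sketch runs without OCR dependencies.
    from pdf2image import convert_from_path  # requires Poppler
    import pytesseract  # requires the Tesseract binary and language data

    pages = convert_from_path(pdf_path, dpi=300)
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


def split_into_chunks(text: str, max_lines: int = 5) -> list[str]:
    """Group non-empty OCR lines into small chunks for retrieval."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


# Hypothetical prompt used to enrich a reference item before retrieval.
ENRICH_PROMPT = (
    "You are a construction cost estimator. For the reference item below, "
    "list synonyms and typical sub-items an artisan might write in a quotation.\n"
    "Reference item: {item}"
)


def build_enrichment_prompt(item: str) -> str:
    """Fill the enrichment prompt for one reference item."""
    return ENRICH_PROMPT.format(item=item)


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank chunks by words shared with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


# Sample text standing in for OCR output, so the sketch runs without a PDF.
sample_text = "Devis n 12\nPeinture murs salon\nPose carrelage cuisine\n"
chunks = split_into_chunks(sample_text, max_lines=1)
top = retrieve("carrelage sol cuisine", chunks)
```

In a real run, `ocr_scanned_pdf` would supply the text, and the enriched query returned by the language model would replace the hand-written query passed to `retrieve`.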
Tech stack
Results
The pipeline works as expected. The results fall into two groups:
100%: successfully retrieved items.
Partly: correctly retrieved items, with some expected items missing and some unnecessary items added.
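One way to sort a response into these two groups is to compare the retrieved items with the expected ones using set differences. The sketch below assumes that an expected item list exists for each reference and uses hypothetical group names ("full" and "partial"); the item labels are invented for the example.

```python
from __future__ import annotations


def classify_retrieval(expected: set[str], retrieved: set[str]) -> dict:
    """Compare retrieved items with expected ones and label the result."""
    missing = expected - retrieved   # expected items the pipeline did not return
    extra = retrieved - expected     # unnecessary items the pipeline added
    status = "full" if not missing and not extra else "partial"
    return {
        "status": status,
        "correct": sorted(expected & retrieved),
        "missing": sorted(missing),
        "extra": sorted(extra),
    }


expected = {"peinture", "carrelage", "plomberie"}
full = classify_retrieval(expected, {"peinture", "carrelage", "plomberie"})
partial = classify_retrieval(expected, {"peinture", "carrelage", "menuiserie"})
```

Keeping the missing and extra items alongside the status makes it easier to see whether a partial result comes from OCR gaps or from overly broad retrieval.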
