Use Case Utilities

Rebuilding the quotation process

Our client is a real estate investor in France with numerous construction projects. For each construction site, they put out calls to various artisans and construction providers. Each artisan or construction provider submits a quotation (devis) for specific work.

The client wants to extract information from the quotations submitted by these artisans and construction providers and build a structured database based on a single reference file. This will let them analyse prices quickly and precisely, and speed up the selection of the final artisan or construction provider.

Challenges

Extracting structured data from various quotations is challenging due to inconsistencies in format, terminology, and document structure. Standardizing this process is crucial for accurate price analysis and efficient provider selection.

Lack of a Standardized Dictionary

The client and the artisans do not share a standard dictionary of items, which makes it hard to match the client’s reference items against the quotations (devis) the artisans submit.
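To make the vocabulary problem concrete, here is a minimal sketch that scores two labels by word overlap (Jaccard similarity). It is an illustration only, not part of the client's pipeline, and the French labels used in the note below are invented examples rather than items from real quotations.

```python
import unicodedata


def tokens(label: str) -> set:
    """Lowercase a label, strip accents, and split it into a set of words."""
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return set(ascii_label.lower().split())


def jaccard(a: str, b: str) -> float:
    """Share of distinct words that two labels have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0
```

With these hypothetical labels, "Peinture murs et plafonds" and "Mise en peinture des murs et plafonds" share four of seven distinct words (about 0.57), so word overlap finds the match. "Revêtement de sol" and "Fourniture et pose de carrelage" describe related work but share only the word "de" (about 0.14). Purely lexical matching breaks down when the two sides use different terms, which is the gap that query enrichment and retrieval are meant to close.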

Unstructured PDF Formats

The quotations are scanned PDFs in which many tables and cells are merged together in an unstructured way.

Undefined Sub-Item Structure

The list of sub-items for each reference is not predefined.

Solution

Use Tesseract to run OCR on the scanned PDF quotations, converting them into machine-readable text.

Process the extracted text with the LangChain TextLoader.

Enrich the reference documents with additional data through prompt engineering.

Pass the enriched reference queries and the processed quotations to the Retrieval-Augmented Generation (RAG) system.

Apply business logic to the returned responses for final processing.
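The steps above can be sketched roughly as follows. This is a simplified outline under several assumptions, not the production pipeline: `pdf2image` is assumed for rasterising PDF pages (the text only names Tesseract), the `langchain_community` import path depends on the installed LangChain version, the retriever is injected as a plain function standing in for the RAG system, and the de-duplication rule is only a placeholder for the real business logic. All function names are hypothetical.

```python
from typing import Callable, List

# Stand-in type for the RAG system: (query, quotation_text) -> candidate lines.
Retriever = Callable[[str, str], List[str]]


def ocr_scanned_pdf(pdf_path: str, lang: str = "fra") -> str:
    """OCR every page of a scanned PDF with Tesseract and return the text."""
    # Imported lazily: both are third-party packages that also need the
    # Tesseract and Poppler binaries installed on the machine.
    import pytesseract
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=300)  # one PIL image per page
    return "\n".join(pytesseract.image_to_string(p, lang=lang) for p in pages)


def load_text(path: str) -> str:
    """Load a saved OCR text file through LangChain's TextLoader."""
    from langchain_community.document_loaders import TextLoader

    docs = TextLoader(path, encoding="utf-8").load()
    return "\n".join(doc.page_content for doc in docs)


def enrich_reference(label: str, alternatives: List[str]) -> str:
    """Build an enriched query from a reference label.

    Assumption: the alternative wordings would come from an LLM prompt;
    here they are passed in directly.
    """
    if not alternatives:
        return label
    return f"{label} (also written as: {', '.join(alternatives)})"


def match_reference(query: str, quotation_text: str, retrieve: Retriever) -> List[str]:
    """Query the retriever, then apply placeholder business logic."""
    kept: List[str] = []
    seen = set()
    for line in retrieve(query, quotation_text):
        line = line.strip()
        # Placeholder rule: drop blank lines and exact duplicates, keep order.
        if line and line not in seen:
            seen.add(line)
            kept.append(line)
    return kept
```

Injecting the retriever as a function keeps the sketch independent of any particular vector store or model, and lets the matching step be exercised with a simple keyword filter before a real RAG backend is plugged in.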

Results

The pipeline works as expected. The results fall into two groups:

100%: Successfully retrieved items.

Partly: Correctly retrieved items, with some expected items missing and some unnecessary items added.
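One way to assign a result to either group is to compare the retrieved items with a hand-labelled list of expected items. The sketch below assumes such a list exists for each reference; the function name and the returned fields are hypothetical.

```python
from typing import Dict, Iterable


def classify_retrieval(expected: Iterable[str], retrieved: Iterable[str]) -> Dict[str, object]:
    """Compare retrieved items with the expected ones for a single reference."""
    expected_set, retrieved_set = set(expected), set(retrieved)
    missing = sorted(expected_set - retrieved_set)  # expected but not returned
    extra = sorted(retrieved_set - expected_set)    # returned but not expected
    status = "complete" if not missing and not extra else "partial"
    return {"status": status, "missing": missing, "extra": extra}
```

A reference counts as fully retrieved only when both lists are empty. Any missing or unnecessary item puts it in the partial group, and the two lists show what to review.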
