Use Case Financial
Development and industrialization of the PyPF library
The AI Products team of a leading player in the banking sector aims to strengthen its Data Science development capabilities to accelerate the production deployment of its AI products in alignment with its roadmap. In this context, the mission focuses on developing and evolving the Python library PyPF, designed to support Data Scientists by standardizing best practices, optimizing the development and industrialization of Machine Learning models, automating documentation, and ensuring the reproducibility of experiments.
Challenges
The objective is to enhance the productivity and scalability of Data Science projects by structuring a robust and high-performing package.
Code base evolution
Structure and enhance the PyPF library to improve its modularity and performance, ensuring compatibility with best practices in software development and Data Science.
Industrialization
Adapt the package to meet the technical and security constraints of the ecosystem. Optimize the management of ML pipelines for better reproducibility.
Performance improvement
Integrate advanced model optimization techniques (Hyperopt, GridSearch, AutoML). Standardize preprocessing and feature engineering processes.
Documentation
Implement auto-documentation to improve transparency and maintenance.
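As an illustration of what auto-documentation can look like, the sketch below builds a plain-text reference from function signatures and docstrings using only the standard library. It is not PyPF's actual mechanism, which this case study does not describe; `render_reference`, `scale`, and `clip` are hypothetical names introduced for the example.

```python
import inspect


def render_reference(objects):
    """Build a plain-text API reference from signatures and docstrings."""
    lines = []
    for obj in objects:
        signature = inspect.signature(obj)
        # Flag missing docstrings explicitly so gaps are visible in the output.
        doc = inspect.getdoc(obj) or "(undocumented)"
        lines.append(f"{obj.__name__}{signature}")
        lines.extend(f"    {line}" for line in doc.splitlines())
        lines.append("")
    return "\n".join(lines)


# Hypothetical library functions, used only to exercise the generator.
def scale(values, factor=1.0):
    """Multiply every value by a constant factor."""
    return [v * factor for v in values]


def clip(values, low, high):
    return [min(max(v, low), high) for v in values]


REFERENCE = render_reference([scale, clip])
print(REFERENCE)
```

Because the reference is derived from the code itself, it cannot drift from the real signatures, and undocumented functions are easy to spot.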
Solution
Redesigned and improved the existing package: modularization, optimization, and structuring of the code.
Implemented new Machine Learning algorithms to enrich the library.
Optimized and tuned models using advanced techniques (Hyperopt, Bayesian Optimization).
Standardized preprocessing and feature engineering to homogenize workflows.
Integrated best development practices (unit tests, CI/CD, automated documentation).
Optimized performance to ensure scalability and computational efficiency.
Wrote technical and functional guides to simplify adoption and future evolution of the package.
Conducted training sessions and knowledge transfer to Data Science teams.
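To make the ideas of standardized preprocessing, hyperparameter tuning, and reproducible experiments concrete, here is a minimal standard-library sketch. It is an assumption-laden illustration, not PyPF code: the `Standardizer` class, the synthetic data, the one-feature ridge model, and the `alpha` grid are all hypothetical. It uses an exhaustive grid search for simplicity; a library such as Hyperopt would replace the loop with a guided search.

```python
import itertools
import random
import statistics


class Standardizer:
    """Shared preprocessing step: scale a feature to zero mean, unit variance."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values) or 1.0  # guard against constant features
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


def make_data(seed, n=200):
    """Synthetic regression data; the explicit seed makes every run repeatable."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_ridge(xs, ys, alpha):
    """Closed-form one-feature ridge regression; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, my - slope * mx


def mean_squared_error(model, xs, ys):
    slope, intercept = model
    return statistics.fmean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


def grid_search(param_grid, seed=0):
    """Try every parameter combination; return (best validation MSE, best params)."""
    xs, ys = make_data(seed)
    cut = int(0.8 * len(xs))
    # Fit the scaler on training rows only, so validation data does not leak in.
    scaler = Standardizer().fit(xs[:cut])
    train_x, valid_x = scaler.transform(xs[:cut]), scaler.transform(xs[cut:])
    train_y, valid_y = ys[:cut], ys[cut:]
    names = list(param_grid)
    trials = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        model = fit_ridge(train_x, train_y, **params)
        trials.append((mean_squared_error(model, valid_x, valid_y), params))
    return min(trials, key=lambda trial: trial[0])


GRID = {"alpha": [0.0, 0.1, 1.0, 10.0]}
best_score, best_params = grid_search(GRID, seed=42)
print(best_score, best_params)
```

Fixing the seed and routing every experiment through the same preprocessing step means two runs with the same inputs produce the same result, which is the property the reproducibility goal depends on.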
Results
The optimized, scalable library has accelerated the creation of ML models. Standardizing Data Science workflows around best practices streamlined the team's processes, reducing development time while improving model performance. Automating critical tasks delivered immediate productivity gains and freed the team to focus on higher-impact work. Enriched, auto-generated documentation has also made the library easier to maintain and helps new team members onboard faster.
+40%
Optimization of model execution times.
100%
ML pipelines industrialized with automated documentation.
