Use Case: Financial
Unsupervised fraud detection
Financial institutions often face an overwhelming number of cases that require manual review to determine potential fraudulent activity. Traditionally, the identification of possibly fraudulent clients relies on basic business rules, which can be heavily influenced by human factors. Our project addresses this challenge by reorganizing transactional and sociodemographic customer data to model behavior using unsupervised algorithms. The approach involves two main phases: clustering followed by anomaly detection.
Challenges
This project focuses on improving fraud detection by aggregating customer data, predicting fraudulent cases based on business criteria, and creating a focused list of the most anomalous transactions across different categories like international transfers, income, and expenses.
Customer data aggregation
Aggregate all the information the bank holds on its customers.
Fraud detection modeling
Model and predict fraudulent cases, validated against business criteria.
Identification of anomalous transactions
Build a reduced list of the most anomalous cases across several transactional axes (international transfers, income, expenses…).
Solution
We generate multiple indicators for each customer, covering both transactional data (e.g., total income, transfers to risk areas) and sociodemographic data (e.g., number of accounts or products, age, sector…).
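As an illustration of the indicator step, the following Python sketch aggregates raw transactions into one row of indicators per customer and merges in sociodemographic fields. The bank's actual schema is not described here, so the `Transaction` record, its `kind` categories, the `RISK_COUNTRIES` set and the `build_indicators` function are all hypothetical, and the real project computes many more indicators.

```python
from dataclasses import dataclass

# Hypothetical set of country codes treated as risk areas (illustration only).
RISK_COUNTRIES = {"XX", "YY"}


@dataclass
class Transaction:
    customer_id: str
    kind: str      # hypothetical categories: "income", "expense" or "transfer"
    amount: float
    country: str   # counterparty country code


def build_indicators(transactions, customers):
    """Aggregate raw transactions into one indicator row per customer.

    `customers` maps customer_id -> dict of sociodemographic fields
    (e.g. age, number of products), which are copied into the row.
    """
    rows = {
        cid: {"total_income": 0.0, "total_expenses": 0.0,
              "total_transfers": 0.0, "risk_transfers": 0.0,
              "n_transactions": 0, **info}
        for cid, info in customers.items()
    }
    for t in transactions:
        row = rows.get(t.customer_id)
        if row is None:
            continue  # ignore transactions from customers without a profile
        row["n_transactions"] += 1
        if t.kind == "income":
            row["total_income"] += t.amount
        elif t.kind == "expense":
            row["total_expenses"] += t.amount
        elif t.kind == "transfer":
            row["total_transfers"] += t.amount
            if t.country in RISK_COUNTRIES:
                row["risk_transfers"] += t.amount
    return rows


# Example: two hypothetical customers and three transactions.
customers = {"c1": {"age": 41, "n_products": 3}, "c2": {"age": 29, "n_products": 1}}
txs = [
    Transaction("c1", "income", 2500.0, "ES"),
    Transaction("c1", "transfer", 900.0, "XX"),
    Transaction("c2", "expense", 120.0, "ES"),
]
indicators = build_indicators(txs, customers)
```

Keeping one flat dictionary per customer makes it straightforward to turn the result into a numeric matrix (one column per indicator) for the normalization and clustering stages.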
These indicators are log-transformed and then scaled for normalization. Unsupervised clustering models then segment customers, using custom metrics to ensure cluster stability and to determine the optimal number of clusters.
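A minimal sketch of the normalization and segmentation step, assuming the indicators are non-negative values in a NumPy array (one row per customer). The project's custom stability metrics are not described, so this example swaps in a common heuristic: run k-means++ several times for each candidate number of clusters and measure how often the runs agree on which customer pairs share a cluster (the Rand index). The names `preprocess`, `kmeans`, `rand_agreement` and `stability_by_k` are hypothetical.

```python
import numpy as np


def preprocess(X):
    """log1p-transform non-negative indicators, then z-score each column."""
    X = np.log1p(np.asarray(X, dtype=float))
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (X - X.mean(axis=0)) / std


def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns cluster labels."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # k-means++: sample the next center proportionally to squared distance
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None
        centers.append(X[rng.choice(len(X), p=p)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # keep the old center if a cluster becomes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return _assign(X, centers)


def rand_agreement(a, b):
    """Rand index: share of customer pairs on which two labelings agree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float((same_a == same_b)[iu].mean())


def stability_by_k(X, ks, n_runs=5, seed=0):
    """Mean pairwise Rand index between independent k-means runs, per k."""
    rng = np.random.default_rng(seed)
    scores = {}
    for k in ks:
        runs = [kmeans(X, k, rng) for _ in range(n_runs)]
        pairs = [rand_agreement(runs[i], runs[j])
                 for i in range(n_runs) for j in range(i + 1, n_runs)]
        scores[k] = float(np.mean(pairs))
    return scores
```

A score close to 1 means the segmentation is reproducible across random initializations. In practice such a stability score would likely be weighed alongside other criteria (segment sizes, business interpretability) rather than used alone to pick the number of clusters.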
Anomaly detection models are then applied to identify outliers within each segment. The results are validated with business feedback, and SHAP values computed on synthetic data are used to ensure model transparency.
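The anomaly detection models used in the project are not named, so the sketch below substitutes a deliberately simple technique to show the idea of scoring outliers within each segment: every indicator is scored with a robust z-score (distance from the segment median in units of the scaled median absolute deviation), and the highest-scoring customers are listed per transactional axis. The function names, axis names and the choice of robust z-scores are assumptions for illustration; models such as Isolation Forest are a common alternative, and the SHAP explanation step is not shown.

```python
import numpy as np


def robust_z(X, labels):
    """Robust z-score of every indicator, computed within each cluster.

    score = |x - median| / (1.4826 * MAD); the 1.4826 factor makes the MAD
    comparable to a standard deviation for normally distributed data.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Z = np.zeros_like(X)
    for c in np.unique(labels):
        idx = labels == c
        dev = np.abs(X[idx] - np.median(X[idx], axis=0))
        scale = 1.4826 * np.median(dev, axis=0)
        scale[scale == 0] = 1.0  # constant indicator in this segment: use raw deviation
        Z[idx] = dev / scale
    return Z


def top_anomalies_by_axis(Z, axis_names, customer_ids, n=10):
    """For each axis (column of Z), the n customers with the highest score."""
    ranking = {}
    for j, name in enumerate(axis_names):
        order = np.argsort(-Z[:, j], kind="stable")[:n]
        ranking[name] = [(customer_ids[i], float(Z[i, j])) for i in order]
    return ranking
```

Scoring relative to each segment means a customer is flagged for behaving unlike similar customers, not unlike the whole population. A single overall score per customer can be taken as `Z.max(axis=1)`, while the per-axis ranking yields a short review list for each transactional category.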
Results
This approach offers a comprehensive view of customer behavior, revealing interactions and trends across customer touchpoints. It introduces a statistical methodology designed specifically for fraud detection, which identifies suspicious activity from patterns in the data rather than from basic business rules alone. It also provides a new way to detect misclassified clients, helping to refine customer categorization and targeted strategies. Together, these advances support more effective risk management and better customer segmentation.
