Use Case Retail
Fraud Engine
With the large volume of transactions occurring daily in e-commerce, it is necessary to monitor and validate all of them to ensure their integrity.
The primary goal is to make fraud decisions internally rather than depending on third-party companies, whose systems behave as ‘black boxes’. This reduces the volume of sensitive data sent outside the company to a minimum and makes business decisions much easier to take.
The Fraud Engine system uses a combination of machine learning models to make a fraud decision on each e-commerce transaction, ruling in favor of or against it.
Challenges
Implementing an effective fraud detection engine comes with several challenges, from ensuring model accuracy to maintaining real-time performance. To achieve a reliable and independent decision-making system, multiple obstacles must be addressed.
Obtain accurate models
The training datasets are highly imbalanced (<0.5% positives), and the number of False Positives must be driven as close to zero as possible.
Achieve similar performance to third-party companies
Despite having substantially poorer datasets (for example, we cannot use credit card data, we don’t have information about the rest of the market…).
Reduce model complexity
Inference must be performed in real time, so the models have to stay lightweight.
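To illustrate the first challenge, one common way to keep False Positives low on a highly imbalanced dataset is to pick the decision threshold from a false positive budget rather than using a default cut-off. The sketch below is only an illustration under that assumption, not the method used in the Fraud Engine; the function names, the example scores and the 20% budget are hypothetical.

```python
def false_positive_rate(scores, labels, threshold):
    """Share of legitimate transactions (label 0) flagged at this threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    flagged = [s for s in negatives if s >= threshold]
    return len(flagged) / len(negatives)


def threshold_for_max_fpr(scores, labels, max_fpr):
    """Lowest threshold whose false positive rate stays within max_fpr.

    A transaction is flagged as fraud when score >= threshold.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not negatives:
        raise ValueError("need at least one legitimate (label 0) example")
    # How many legitimate transactions we are allowed to flag by mistake.
    allowed = int(max_fpr * len(negatives))
    if allowed >= len(negatives):
        return min(scores)
    # Only scores strictly above this legitimate score may be flagged.
    boundary = negatives[allowed]
    higher = [s for s in scores if s > boundary]
    return min(higher) if higher else float("inf")


# Hypothetical model scores; label 1 marks a fraudulent transaction.
scores = [0.10, 0.20, 0.30, 0.40, 0.90, 0.80, 0.35, 0.05]
labels = [0, 0, 0, 0, 1, 1, 0, 0]

threshold = threshold_for_max_fpr(scores, labels, max_fpr=0.2)
print(threshold, false_positive_rate(scores, labels, threshold))
```

A stricter budget raises the threshold: with `max_fpr=0.0` on the same data, only the two fraudulent transactions remain above it.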
Solution
A combination of machine learning models and rule-based models has been implemented to meet the fast inference times required. One of the models determines the riskiness of the user account, while the other assesses how likely the transaction is to be fraudulent. A Neo4j graph database with a custom data model has been built and is updated with each new transaction as it arrives. A set of features for the models is obtained by extracting key information from the graph database.
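The flow described above can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: a plain in-memory structure stands in for the Neo4j database, and the data model (accounts linked to devices and addresses), the feature names, the thresholds and the `decide` function are all assumptions made for the example. The two risk scores would come from the account model and the transaction model; here they are passed in as plain numbers.

```python
from collections import defaultdict


class TransactionGraph:
    """In-memory stand-in for the graph database (hypothetical data model)."""

    def __init__(self):
        # Attribute node, e.g. ("device", "d1") -> account ids linked to it.
        self.accounts_by_attribute = defaultdict(set)
        # Account id -> number of transactions seen so far.
        self.transaction_count = defaultdict(int)

    def add_transaction(self, account, device, address):
        """Feed one newly arrived transaction into the graph."""
        self.accounts_by_attribute[("device", device)].add(account)
        self.accounts_by_attribute[("address", address)].add(account)
        self.transaction_count[account] += 1

    def features(self, account, device, address):
        """Extract illustrative graph features for an incoming transaction."""
        on_device = self.accounts_by_attribute.get(("device", device), set())
        on_address = self.accounts_by_attribute.get(("address", address), set())
        return {
            "other_accounts_on_device": len(on_device - {account}),
            "other_accounts_on_address": len(on_address - {account}),
            "prior_transactions": self.transaction_count.get(account, 0),
        }


def decide(account_risk, transaction_risk, features,
           risk_threshold=0.8, max_shared_accounts=3):
    """Combine two model scores with one rule; returns "accept" or "reject"."""
    # Rule-based check: many other accounts on the same device is suspicious.
    if features["other_accounts_on_device"] >= max_shared_accounts:
        return "reject"
    # Model-based check: reject when either risk score is too high.
    if max(account_risk, transaction_risk) >= risk_threshold:
        return "reject"
    return "accept"


graph = TransactionGraph()
for account, address in [("alice", "a1"), ("bob", "a2"), ("carol", "a3"), ("dave", "a4")]:
    graph.add_transaction(account, device="d1", address=address)

shared = graph.features("erin", device="d1", address="a9")
print(decide(0.1, 0.1, shared))   # rejected by the rule, despite low scores

fresh = graph.features("zoe", device="d2", address="a1")
print(decide(0.2, 0.3, fresh))    # accepted: no rule hit, low scores
```

Keeping the rule check ahead of the model scores means an obvious pattern is rejected without consulting the models at all, which is one way such a combination can help with real-time inference.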
Tech stack
Results
Deep understanding of suspicious behaviours
User segmentation groups based on fraudulent activity features
Reduction of sensitive data traffic outside the company
Minimum False Positive rate
+200k annual profit
<1% false positive rate
>50% reduction in outgoing data volume
