Imagine running a bank and suddenly spotting unusual credit card transactions. Fortunately, you catch them in time, perhaps with the help of an employee. Initially, you might have only a few customers, but later it could be millions or even billions. In the past, people dealt with this using traditional rules, but fraudsters have evolved, and relying on those old methods is risky. What’s needed is a robust system that swiftly adapts to new fraud tricks. In Machine Learning, this is called Online Incremental Learning, and it can be used to detect these tricky transactions. This article demonstrates Online Incremental Learning for fraud detection and shows why it matters for staying ahead of evolving fraud patterns.

In simple words, Online Incremental Learning is a process that continuously updates the model with new knowledge as each new transaction arrives, much like how we humans learn new things every day. Think of a new chef who learns to cook while adding ingredients to a simmering pot, constantly tasting and adjusting. In contrast to traditional ML (Offline Learning), where the model is trained on the entire dataset at once, Online Incremental Learning ingests customer transactions in real time, one at a time.

Why is it needed? The answer is simple: change is constant. In fields like fraud detection, patterns change rapidly. Traditional models must be retrained on the entire dataset, which is time-consuming and often impractical. Online Incremental Learning steps in as a game-changer, learning from new data on the fly.

The key idea is that the model can adapt to newer frauds without forgetting what it learned from previous frauds. This makes it possible to build a robust system on the go while detecting fraudulent transactions. Benefits and limitations are explored in the subsequent sections.
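To make the "one transaction at a time" idea concrete, here is a minimal sketch in plain Python. It is not the model used later in this article: the OnlineLogisticRegression class, the two made-up features, and the synthetic labels are all assumptions chosen only to show a model that updates itself with a single observation per step instead of retraining on the full dataset.

```python
import math
import random


class OnlineLogisticRegression:
    """Tiny logistic regression updated with one SGD step per observation."""

    def __init__(self, n_features, lr=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = lr

    def predict_proba_one(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # gradient of the log loss with respect to the linear output
        error = self.predict_proba_one(x) - y
        self.weights = [w - self.lr * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * error


rng = random.Random(42)
model = OnlineLogisticRegression(n_features=2)

for _ in range(5000):
    is_fraud = 1 if rng.random() < 0.5 else 0
    # hypothetical features scaled to [0, 1]; fraud skews high in both
    if is_fraud:
        x = [rng.uniform(0.6, 1.0), rng.uniform(0.5, 1.0)]
    else:
        x = [rng.uniform(0.0, 0.4), rng.uniform(0.0, 0.5)]
    model.learn_one(x, is_fraud)  # the only training call: one sample

p_fraud_like = model.predict_proba_one([0.9, 0.9])
p_normal_like = model.predict_proba_one([0.1, 0.1])
```

The model never sees the stream twice and never holds more than one transaction in memory, which is the property that separates this style of learning from batch retraining.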
River is a Python library designed for Online Incremental machine learning. It supports tasks such as regression, classification, and unsupervised learning, and it is versatile enough to handle ad-hoc tasks like calculating online metrics and detecting concept drift. Every tool in the library can be updated with just one observation at a time, which makes it suitable for processing streaming data. Depending on your use case, this may be more convenient than using a batch model.

Let’s see how the process can be broken down into steps with an example. In production, use an event streaming platform like Kafka for high-throughput transaction data; this lets real-time data flow straight into online incremental learning. In this blog, we simulate event streaming with a sample of 50,000 transactions.

Before diving into the model construction, it’s crucial to address a few aspects:
Data Pre-processing: Traditional ML pre-processing is adapted for Online Incremental Learning using River’s pre-processing functions, which update on the fly.
Incremental/Online Learning: The model must be trained continuously on data from the data stream. For anomaly detection, we opt for HalfSpaceTrees from the River library, which is similar to Isolation Forest but can be trained incrementally, so it keeps adapting to new fraudulent activities.
Metrics: The ROCAUC metric from River’s metrics module evaluates the incremental learning model’s performance. Updated alongside the model in the streaming loop, it enables continuous monitoring over time.

The components mentioned above can be composed as shown in the Building a Model code further below.

The example we used is pretty basic. To make it work even better, you can do a few more things, like adjusting the data balance with Under/Over Sampling and fine-tuning the model’s settings, which we call hyperparameters.
These extra steps can make a big difference in how well the model performs, by handling imbalances in the data and making sure the model is set up just right. They show that Online Incremental Learning can be improved and customized based on the unique features of the data you’re working with.

The code simulates a Kafka-like data stream by looping over a streaming dataset. Within the loop, the model predicts an anomaly score with score_one and then learns from the same sample using learn_one. The ROCAUC metric is updated with the true label and the predicted score using the update method. This is what an online incremental learning approach looks like in a streaming data setting.

The plot of the ROCAUC metric over time shows how it improves through online incremental learning. Starting at 0, the metric steadily increases, reaching a final value of 0.95. This highlights the model’s effective adaptation and improved anomaly detection as it learns from incoming data.

Let’s now dive into the benefits and limitations of Online Incremental Learning. It is important to understand both, because for some use cases the limitations may outweigh the benefits.

As a bonus, let’s discuss some advanced strategies that can help to deal with real-world problems.

In conclusion, this article highlights the pivotal role of Online Incremental Learning in real-time fraud detection, addressing the challenges posed by evolving patterns in large datasets. Using River, a Python library specifically designed for Online Incremental Learning, the model demonstrated continuous learning and adaptation to streaming data. With the HalfSpaceTrees model and the ROCAUC metric, the analysis section showed a substantial improvement over time.
While Online Incremental Learning offers benefits such as adaptability, efficiency, and scalability, it requires careful attention to data quality and to the complexity of maintaining model stability. Additionally, advanced strategies for handling concept drift and online feature selection were discussed. Overall, Online Incremental Learning emerges as a crucial tool for real-time fraud detection, keeping anomaly detection systems adaptable and effective in dynamic environments.
Online Incremental Learning

Need for Online Incremental Learning
How does it work?
Online Incremental Learning for Fraud Detection using RiverML
Install library
pip install river
Import Libraries
from river import metrics
from river import preprocessing
from river import datasets
from river import anomaly
from river import compose
Initialize Data Source
streaming_dataset = datasets.CreditCard().take(50_000)
Building a Model
model = compose.Pipeline(
    preprocessing.MinMaxScaler(),  # data preprocessing, learned on the fly
    anomaly.HalfSpaceTrees()  # online incremental ML model
)
auc = metrics.ROCAUC()
auc_plt = []  # to track and plot the ROCAUC
Model Prediction and Learning
for i, (x, y) in enumerate(streaming_dataset):  # simulating a Kafka-like stream
    score = model.score_one(x)  # predict
    model.learn_one(x)  # learn from a sample
    auc.update(y, score)  # update the metric
    auc_plt.append(auc.get())  # to track and plot the ROCAUC
Analyzing Metrics
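To make the evaluation order behind this metric concrete, here is a minimal, dependency-free sketch of the test-then-train loop: each sample is scored before the model learns from it, so the ROC AUC reflects predictions on data the model has not yet seen. The RunningZScore scorer, the roc_auc helper, and the synthetic amounts are assumptions made for illustration. They stand in for HalfSpaceTrees, River’s ROCAUC, and the credit card data, and are not how River implements them.

```python
import math
import random


class RunningZScore:
    """Toy online anomaly scorer: distance from the running mean, in std units."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)

    def score_one(self, value):
        if self.n < 2:
            return 0.0  # not enough history to judge yet
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(value - self.mean) / std if std > 0 else 0.0

    def learn_one(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)


def roc_auc(labels, scores):
    """Share of (fraud, normal) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
stream = []
for _ in range(2000):
    # made-up data: about 2% "fraud" amounts, drawn far from normal ones
    is_fraud = 1 if rng.random() < 0.02 else 0
    amount = rng.gauss(500, 50) if is_fraud else rng.gauss(50, 10)
    stream.append((amount, is_fraud))

model = RunningZScore()
labels, scores = [], []
for amount, is_fraud in stream:
    scores.append(model.score_one(amount))  # test first, on an unseen sample
    model.learn_one(amount)                 # then train on that same sample
    labels.append(is_fraud)

auc = roc_auc(labels, scores)
```

Scoring before learning matters: if the order were reversed, every score would come from a model that had already seen the sample, and the metric would look better than the model really is.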

Benefits and Limitations
Benefits
Limitations
Slight Help with Advanced Strategy
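Concept drift handling is one of the advanced strategies this article refers to. River ships drift detectors of its own in its drift module; the sketch below does not use them. It is a hypothetical, simplified detector (the WindowMeanDriftDetector class, its window size, and its threshold are all assumptions) that only illustrates the general idea: keep a baseline window, compare it with a recent window, and flag drift when the two means differ by more than a few standard errors.

```python
import random
from collections import deque


class WindowMeanDriftDetector:
    """Toy detector: flags drift when a recent window's mean leaves the baseline."""

    def __init__(self, window_size=200, threshold=5.0):
        self.window_size = window_size
        self.threshold = threshold  # allowed gap, in standard errors
        self.baseline = []          # first full window seen since the last reset
        self.recent = deque(maxlen=window_size)

    def update(self, value):
        """Consume one observation; return True if drift is flagged now."""
        if len(self.baseline) < self.window_size:
            self.baseline.append(value)
            return False
        self.recent.append(value)
        if len(self.recent) < self.window_size:
            return False
        n = self.window_size
        base_mean = sum(self.baseline) / n
        base_var = sum((v - base_mean) ** 2 for v in self.baseline) / (n - 1)
        recent_mean = sum(self.recent) / n
        # standard error of the difference between two window means
        std_err = (2 * base_var / n) ** 0.5
        if std_err > 0 and abs(recent_mean - base_mean) > self.threshold * std_err:
            self.baseline = list(self.recent)  # re-baseline on the new regime
            self.recent.clear()
            return True
        return False


rng = random.Random(7)
detector = WindowMeanDriftDetector()
drift_points = []
for i in range(2000):
    # made-up stream: the typical transaction amount shifts at step 1000
    amount = rng.gauss(50, 10) if i < 1000 else rng.gauss(80, 10)
    if detector.update(amount):
        drift_points.append(i)
```

In a fraud pipeline, a flagged drift point could be used as a trigger to reset or reweight the model; what to do on a trigger is a design decision this sketch leaves open.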
Conclusion
