Machine Learning Concept Drift – What is it and Five Steps to Deal With it

Concept drift is a major consideration for ensuring the long-term accuracy of machine learning algorithms. Concept drift is a specific type of model drift, and can be understood as changes in the relationship between the input and target output. The properties of the target variables may evolve and change over time. As the model has been trained on static training data, this evolution can negatively impact the accuracy of the model.  

The effectiveness of learning models can degrade over time because of concept drift, gradually or even suddenly becoming inaccurate. The model may continue to work perfectly with the datasets it was trained on, but new data with a shifting relationship may cause accuracy to drift over time. Instead of the model just breaking or failing because of corrupted data, concept drift is an issue that is harder to detect. 

Like all types of model drift, concept drift can be a barrier to the successful application of machine learning algorithms in the real world. The relationship between the dataset and targets can evolve beyond the trends learned in the training phase. For example, wider unknown trends affecting customer behaviour can impact the accuracy of predictive machine learning models. The concept of what constitutes normal customer behaviour may have changed over time, resulting in an inaccurate predictive model. 

It’s important to understand the idea of concept drift so that processes can be put in place to detect and mitigate it. This guide explores the topic of concept drift in machine learning, what causes it, and the steps needed to proactively deal with the issue.  

What is concept drift in machine learning?

Concept drift in machine learning is when the relationship between the input and target changes over time. Generally, this could be an unforeseen change in the relationship between input and output data over time. It usually occurs when real-world environments change in contrast to the training data the model learned from. For example, the behavior of customers can change over time, lowering the accuracy of a model trained on historic customer datasets. The model may no longer be suitable to accurately predict customer trends because the definition of good customer behavior may be out of date. 

Concept drift is a specific type of drift which impacts machine learning models. Data drift is another type of drift, but this is caused by unforeseen changes in the input data. Models trained from historic datasets may become less accurate and effective over time as the underlying relationships between the data shifts and evolves. Concept drift is a major challenge for machine learning deployment and development, as in some cases the model is at risk of becoming completely obsolete over time. 

Different types of machine learning will have different training methods. But the most common processes will see the training of machine learning algorithms on historic training data. Models will be trained to recognise the relationship between input and output training data, to be applied to real-world datasets. The assumption is that the relationship between input and output data will remain the same, so the algorithm will stay accurate. But in many cases changes in live data can make models trained on historic data less accurate and effective.  

What causes concept drift in machine learning?

Concept drift is caused by changing relationships between input and output data. The properties of the target variables may have shifted between the static training data and real-world dynamic data. Machine learning models are usually built from training datasets in local or offline environments. Once deployed to the real world, the relationships between input and output data can shift and change dynamically. This means emerging trends might not be correctly understood by the model, impacting its effectiveness.  

For example, a model that predicts or maps marketing campaign success may not take into account wider economic issues which may be having a serious impact on customer behaviour. When data drift occurs, the model will no longer reach the accuracy as realised in the training process. 

The rules and patterns recognized by the model may become obsolete as the environment changes. As the real-world environment of live data shifts from the training environment, the model will become less effective. Concept shift should be planned for in predictive and forecasting models and actively monitored. Models can then be regularly retrained to keep abreast of evolving and changing data. 

How to detect concept drift in machine learning

Identifying concept drift is the process of detecting changes in the relationships within datasets. This could be steady change over time, periodic or recurring changes as with seasonal data or sudden sweeping changes, COVID-19 lockdowns significantly changed customer behavior as an example.

As a result of this changing environment, the accuracy of predictive models would have been impacted. Beyond huge external changes, concept drift can occur more gradually as customers are influenced by wider economic issues or trends which aren’t mapped in the training data. 

Organizations should focus on the ongoing monitoring of the accuracy and confidence in a model. If the accuracy shifts over time then concept drift is likely to be occurring. The challenge is identifying the systematic change in data against natural fluctuations and trends.  

The methods for detecting concept drift in machine learning generally include: 

  • Ongoing monitoring of the accuracy and performance of the machine learning model to understand whether performance is deteriorating over time. 
  • Monitoring the average confidence score of a machine learning model’s predictions over time. This is used specifically for models that classify data such as images or text. If the average confidence score changes over time, concept drift could be occurring. 

Five steps to deal with concept drift 

Concept drift is a challenge when deploying machine learning models, and needs to be addressed to ensure models stay accurate and reliable. Concept drift will negatively impact the value a machine learning model brings to an organization. There are a few steps organizations can take to detect, monitor and deal with concept drift so that machine learning models are safeguarded. Certain steps may only apply to specific types of machine learning models or tasks. 

The five steps for dealing with concept drift include: 

  1. Setting up a process for concept drift detection. 
  2. Maintaining a static model as a baseline for comparison. 
  3. Regularly retraining and updating the model. 
  4. Weighting the importance of new data. 
  5. Creating new models to solve sudden or recurring concept drift. 

Setting up a process for concept drift detection

The first step is to set up processes for monitoring and detecting concept drift. Measuring the ongoing accuracy of a model is key to achieving long term performance. Organizations can achieve this by maintaining labelled testing datasets or samples which have been curated by a team member. Any drop in performance over time that isn’t related to the quality of data may flag concept drift. 

Maintaining a static model as a baseline for comparison

It can be difficult to detect concept drift and understand if a model has become less accurate over time. A static model can be used as a baseline to understand any changes in model accuracy. It’s valuable to have a baseline model to measure the success of any changes you make to combat concept drift. A baseline static model can be used to measure the ongoing accuracy of amended models after each intervention. 

Regularly retraining and updating the model

A static machine learning algorithm is much more likely to experience concept drift. Generally trained in an offline or local environment, a static model won’t adapt to changing environments or scenarios. For models that deal with forecasting or predictions, a static algorithm developed on historic data can become inaccurate over time. Models deemed at risk from concept drift should be regularly retrained and updated to keep in line with evolving datasets and live environments. 

Where possible, the static model can regularly be updated and trained with samples of new training data. This fine-tunes the model and lowers the risk of it becoming obsolete over time. Retraining should occur regularly to reflect new and emerging trends between input and output data. The frequency of the required update can be set by regularly assessing the accuracy of the machine learning model. For example, retraining might be required monthly, quarterly or every six months to maintain accuracy. 

Weighting the importance of new data

When developing some models data scientists can set the relative importance of different input data. New data can be recognized as of higher importance than older data by weighting input data by relative age. This will emphasize the importance of new data within the algorithm, adding less weight to historic data which may be out-of-date. If concept shift is occurring, focusing on newer data should mean the algorithm can adapt and stay accurate. However, this is not without risk as overweighting new data can severely impact model performance.  

Creating new models to solve sudden or recurring concept drift

In some circumstances, sudden concept drift may occur from global events or changes. In these cases, models trained on historic data will become less reliable as behavior changes. Keeping with our same example, changes in customer behavior during the COVID-19 pandemic and the lockdowns experienced across the globe. New models can be adapted from existing models to deal with new trends within these periods of change.  

Concept drift detection with Seldon

Seldon moves machine learning from POC to production to scale, reducing time-to-value so models can get to work up to 85% quicker. In this rapidly changing environment, Seldon can give you the edge you need to supercharge your performance. 

Seldon’s Alibi Detect is a priority for any team looking for better drift detection processes to notify the your team when models need to be retrained. 

Talk to our team about machine learning solutions today and how to deploy machine learning models at scale –>

Contents