Unveiling Model Insights with Evidently AI

Harness the power of Evidently AI for advanced model evaluation and monitoring

In this blog, we will cover the following topics:

  • Data and model drift, and why tracking them matters
  • The need for measuring model drift
  • Implications of model drift
  • Detecting and addressing model drift using Python
  • Introduction to Evidently AI
  • Features of Evidently AI
  • Exploring Model Insights with Evidently AI
  • Pros and cons of tracking model drift

What Is Model And Data Drift?

Model drift refers to the degradation of a machine learning model’s performance over time due to changes in the underlying data distribution. It occurs when the data used to train the model no longer accurately represents real-world data, leading to deteriorating predictions. Tracking model drift is a critical part of maintaining accurate and reliable machine learning models: by measuring and monitoring it with Python and appropriate libraries, data scientists can proactively identify and address performance degradation. The ability to detect this shift and adapt models to evolving data distributions is vital for ensuring that models remain valid and effective in real-world scenarios.

The other type of drift is data drift, also referred to as covariate shift, which arises when the distribution of the input data changes unexpectedly over time.

Need For Measuring Drift

  • Performance Monitoring: Monitoring model drift allows us to identify when a model’s performance starts to decline, ensuring that predictions remain accurate and reliable.
  • Early Detection of Concept Shifts: Detecting concept shifts helps identify changes in the underlying data distribution, enabling timely model updates or retraining to maintain optimal performance.
  • Compliance and Regulatory Requirements: In regulated industries, such as finance and healthcare, tracking model drift is essential to ensure models remain compliant and deliver reliable predictions.

Implications Of Drift

  • Business Impact: Model drift can lead to inaccurate predictions, potentially impacting critical business decisions, customer satisfaction, and financial outcomes.
  • Data Quality Monitoring: Model drift can serve as an indicator of data quality issues or changes in the data collection process, prompting investigations and improvements.
  • Maintaining Model Validity: Monitoring model drift helps ensure the model’s continued validity and effectiveness in real-world scenarios, supporting long-term business goals.

Detecting And Addressing Model Drift Using Python

  • Collect Data: Ensure you have access to historical and real-time data to evaluate model performance over time.
  • Split Data: Divide your data into training and testing sets. The training set is used to develop the initial model, while the testing set simulates real-world data for ongoing evaluation.
  • Train the Model: Train your model using the training set, making use of appropriate algorithms and techniques based on your problem domain.
  • Evaluate Model Performance: Apply the trained model to the testing set and measure its performance using relevant evaluation metrics (e.g., accuracy, precision, recall). 
  • Set a Threshold: Determine a performance threshold below which you consider the model’s performance to indicate potential drift.
  • Monitor Drift: Periodically re-evaluate the model’s performance using real-time or updated data. Compare the performance metrics with the established threshold. 
  • Detect Drift: If the model’s performance falls below the threshold, it may indicate model drift. Various statistical techniques and algorithms can be applied to detect drift, such as distributional comparisons or hypothesis tests.
  • Address Drift: If drift is detected, several actions can be taken, including retraining the model using updated data, fine-tuning model parameters, or considering alternative algorithms.
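The workflow above can be sketched end-to-end with scikit-learn. Everything here is a toy illustration: the synthetic data, the 1.5x MAE threshold, and the simulated input shift are assumptions chosen to make the mechanics visible, not recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
coef = np.array([1.0, -2.0, 0.5])

# 1-3. Collect/split/train: fit on "historical" data
X_train = rng.normal(size=(500, 3))
y_train = X_train @ coef + rng.normal(scale=0.1, size=500)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# 4-5. Evaluate on held-out data from the same distribution, then set a threshold
X_test = rng.normal(size=(200, 3))
y_test = X_test @ coef + rng.normal(scale=0.1, size=200)
baseline_mae = mean_absolute_error(y_test, model.predict(X_test))
threshold = 1.5 * baseline_mae  # flag drift if the live error grows by more than 50%

# 6-7. Monitor/detect: "live" inputs whose distribution has shifted
X_live = rng.normal(loc=2.0, size=(200, 3))
y_live = X_live @ coef + rng.normal(scale=0.1, size=200)
live_mae = mean_absolute_error(y_live, model.predict(X_live))

drift_detected = live_mae > threshold
print(f"baseline MAE={baseline_mae:.3f}, live MAE={live_mae:.3f}, drift={drift_detected}")
```

If the check fires (step 8), the simplest remedy is to retrain on a window that includes the recent data.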

Evidently AI

Evidently AI is a versatile library designed to facilitate the analysis and interpretation of machine learning models. It offers a wide range of features that enable data scientists to gain comprehensive insights into model performance, identify potential issues, and ensure the reliability of their models.

Features Of Evidently AI

Data Profiling: Evidently AI performs thorough data profiling, summarizing key statistics and distributions, enabling data scientists to understand the characteristics of their datasets.

Model Performance Analysis: The library provides detailed metrics and visualizations to assess model performance, including accuracy, precision, recall, ROC curves, and confusion matrices.

Drift Detection: Evidently AI offers tools to detect and monitor data drift, enabling users to identify when a model’s performance is affected by changes in the underlying data distribution.

Fairness Analysis: The library allows for fairness analysis, assessing whether models exhibit biases or discriminate against certain groups based on demographic or other sensitive attributes.

Error Analysis: Evidently AI helps to identify patterns and sources of errors by analyzing predictions and their discrepancies with actual values, enabling targeted model improvement.

Exploring Model Insights With Evidently AI

We will be using the Bike Sharing dataset from the UCI Machine Learning Repository. The dataset is a collection of hourly bike rental records from a system where people rent a bike at one location and return it at another on an as-needed basis. For more information, refer to the Bike Sharing dataset page. Let’s start by importing the required libraries and loading the data.

import pandas as pd
import numpy as np
import requests
import zipfile
import io

from datetime import datetime, time
from sklearn import datasets, ensemble

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset, RegressionPreset

content = requests.get("https://archive.ics.uci.edu/static/public/275/bike+sharing+dataset.zip").content
with zipfile.ZipFile(io.BytesIO(content)) as arc:
    raw_data = pd.read_csv(arc.open("hour.csv"), header=0, sep=',', parse_dates=['dteday'], index_col='dteday')

raw_data.index = raw_data.apply(
    lambda row: datetime.combine(row.name, time(hour=int(row['hr']))), axis = 1)

Let’s look at the data
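In the original post the dataframe is shown as a screenshot; in code, the inspection amounts to calling `head()` and `dtypes` on the loaded frame. The miniature stand-in below uses illustrative values for a few of the dataset’s columns purely so the snippet is self-contained (it does not replace `raw_data`):

```python
import pandas as pd

# Stand-in frame with a few of the dataset's columns; values are illustrative only
sample_data = pd.DataFrame({
    'season': [1, 1, 1],         # categorical, integer-coded
    'holiday': [0, 0, 0],
    'workingday': [0, 0, 0],
    'temp': [0.24, 0.22, 0.22],  # normalized temperature
    'hum': [0.81, 0.80, 0.80],   # normalized humidity
    'cnt': [16, 40, 32],         # target: bikes rented in that hour
})

print(sample_data.head())    # first rows: integer-coded categoricals plus float weather readings
print(sample_data.dtypes)    # confirms the numerical / categorical mix
```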

We notice that there are both numerical and categorical columns in the dataset, so it is easier to segregate them into two lists, numerical_features and categorical_features. We also define the target variable ‘cnt’, the number of bikes rented each hour, and a prediction variable that will hold the model’s predictions.

target = 'cnt'
prediction = 'prediction'
numerical_features = ['temp', 'atemp', 'hum', 'windspeed', 'hr', 'weekday']
categorical_features = ['season', 'holiday', 'workingday']

Before we step further, it is important to note that in a real-world scenario, once a model is built and deployed in production, our objective is to monitor the production model to ensure that its accuracy and behavior remain consistent. This model is set as a reference (benchmark), and all future predictions from the model are compared against it. For example, if a production model is trained on data up to June 30 and then deployed, the new data and predictions starting from July are compared against the production model’s results to check for any deviation.

Now, for the sake of the blog and to keep things simple, we will split our dataset into two groups: reference, holding January data, and current, holding February data. We will use a Random Forest regressor to build the model and predict the values.

# .copy() avoids a pandas SettingWithCopyWarning when we add the prediction column later
reference = raw_data.loc['2011-01-01 00:00:00':'2011-01-28 23:00:00'].copy()
current = raw_data.loc['2011-01-29 00:00:00':'2011-02-28 23:00:00'].copy()

# Build a Random Forest regressor model
regressor = ensemble.RandomForestRegressor(random_state = 0, n_estimators = 50)

# Fit on the numerical and categorical predictor variables and the target variable
regressor.fit(reference[numerical_features + categorical_features], reference[target])

# Predict for both the reference and the current dataset
ref_prediction = regressor.predict(reference[numerical_features + categorical_features])
current_prediction = regressor.predict(current[numerical_features + categorical_features])

# Add a new 'prediction' column to the respective dataframes
reference['prediction'] = ref_prediction
current['prediction'] = current_prediction

It is time to check the model performance using Evidently’s Report module. In this first run, our current data is January and the reference is None, because this model is our baseline and there is nothing earlier to benchmark it against.

column_mapping = ColumnMapping()

column_mapping.target = target
column_mapping.prediction = prediction
column_mapping.numerical_features = numerical_features
column_mapping.categorical_features = categorical_features

regression_performance = Report(metrics=[RegressionPreset()])
regression_performance.run(current_data=reference, reference_data=None, column_mapping=column_mapping)

# Display the report inline (in a notebook); use save_html() to export a standalone file
regression_performance.show(mode='inline')

Model Metrics

Model Tracking

Now, let’s feed in the first week of February’s data and check for deviations from the baseline version built in the previous section. The code will be the same, but the data changes: for the baseline model we only had current data and no reference data, whereas this time the current data is the first week of February and the reference is the January data.

regression_performance = Report(metrics=[RegressionPreset()])
regression_performance.run(current_data=current.loc['2011-01-29 00:00:00':'2011-02-07 23:00:00'],
                           reference_data=reference,
                           column_mapping=column_mapping)
regression_performance.show(mode='inline')


Model Performance Metrics

Observations From First Week Report

  1. The Mean Absolute Error has increased to 13.38 from 4.1 (Base Model)
  2. Error distribution indicates some changes but it is still normally distributed
  3. Error normality indicates some deviation but nothing major at this stage
  4. There is no model drift detected but the model is underestimating
  5. Drift score (p_value) for prediction variable: 0.684. As it is greater than 0.05, it can be concluded that there is no drift.

Observations From Third Week Report

We will use the same code as above, changing the dates (2011-02-15 to 2011-02-21). Here are the observations:

  1. The Mean Absolute Error has increased to 24.69
  2. Error distribution indicates significant deviation from the reference
  3. Error normality indicates some deviation but nothing major at this stage
  4. There is no model drift detected but the model is underestimating
  5. Drift score (p_value) for prediction variable: 0.095. As it is greater than 0.05, it can be statistically concluded that there is no drift. It is important to note the decrease from 0.684 (first week) to 0.095. It is an indication that the model is drifting and needs to be retrained.

Data Drift

In the previous section, we had both predicted and actual values, from which all our metrics were generated to track model drift. There are also scenarios where we may not have the actual values to validate against. In such cases, we can use data drift to check whether the inputs to the model have changed; if they have, the model needs to be retrained and redeployed.

column_mapping = ColumnMapping()

column_mapping.numerical_features = numerical_features

data_drift = Report(metrics = [DataDriftPreset()])
data_drift.run(current_data = current.loc['2011-01-29 00:00:00':'2011-02-07 23:00:00'],
               reference_data = reference,
               column_mapping = column_mapping)
data_drift.show(mode='inline')


Observations From The Data Drift Report

  1. Some variables are changing/evolving with time, and from the list above these are seasonal factors.
  2. Although the model’s performance didn’t show significant degradation compared to the reference, the error distribution did indicate deviation from the reference.
  3. Also, another noticeable factor was that the model underestimated the prediction in comparison to the reference.
  4. The data drift analysis showed that 4 variables are drifting indicating that the model should be retrained and deployed.

Tracking drifts is crucial for effective model performance. Merely monitoring the model’s performance is inadequate as it fails to account for potential long-term negative impacts. By incorporating data drift tracking, one can monitor inherent changes in the input, enabling proactive measures to be taken, and mitigating potential financial losses. This comprehensive approach ensures that the business remains vigilant and adaptable, minimizing adverse consequences in the future.

Pros And Cons Of Tracking Model Drift


Pros:

  • Early Detection: Tracking model drift allows for early detection of performance deterioration, enabling timely interventions.
  • Improved Decision-Making: Accurate models lead to better-informed decisions, positively impacting business outcomes.
  • Compliance and Regulation: Monitoring model drift ensures compliance with regulatory requirements and industry standards.


Cons:

  • Computational Overhead: Implementing drift monitoring mechanisms may introduce additional computational complexity.
  • Data Collection Challenges: Ensuring representative and diverse data for drift detection can be challenging, especially in rapidly evolving domains.
  • False Positives: Drift detection methods may occasionally raise false alarms, leading to unnecessary model updates or retraining.


Conclusion

Evidently AI is a powerful Python library that provides valuable insights into data quality and model performance. By using its features, data scientists and machine learning practitioners can proactively monitor their models, identify issues such as data drift and model decay, and make informed decisions. This blog presented an overview of Evidently AI and showcased practical examples of its usage. Incorporating it into your machine learning workflow can significantly improve how you understand and manage data quality and model performance, ultimately leading to more accurate predictions and better-informed decision-making.

There are other statistical methods for testing and measuring drift:

  1. Kolmogorov-Smirnov (K-S) test
  2. Population Stability Index
  3. Model-Based Approach
  4. Adaptive Windowing (ADWIN)
  5. Page-Hinkley method
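As a hedged illustration (not from the original post), the first two of these can be computed directly with scipy and numpy. The 0.05 significance level, the 10 bins, and the simulated 0.5-sigma shift are conventional, illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)
current = rng.normal(loc=0.5, scale=1.0, size=1000)  # shifted distribution

# 1. Kolmogorov-Smirnov test: a small p-value suggests the distributions differ
ks_stat, p_value = stats.ks_2samp(reference, current)
print(f"KS statistic={ks_stat:.3f}, p-value={p_value:.3g}")

# 2. Population Stability Index over shared bins
#    (rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift)
bins = np.histogram_bin_edges(np.concatenate([reference, current]), bins=10)
ref_pct = np.histogram(reference, bins=bins)[0] / len(reference)
cur_pct = np.histogram(current, bins=bins)[0] / len(current)
eps = 1e-6  # guards against log(0) in empty bins
psi = np.sum((cur_pct - ref_pct) * np.log((cur_pct + eps) / (ref_pct + eps)))
print(f"PSI={psi:.3f}")
```

Evidently applies tests like these per column under the hood; computing them by hand is mainly useful when you need a custom threshold or a lightweight check without generating a full report.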


