Performance Metrics

Performance Metrics for Regression: Data Science with Python

Performance evaluation metrics measure the quality of a statistical or machine learning model. Evaluating a model is as essential a step as data retrieval, processing, and training.

Based on model evaluation and subsequent comparisons, we can decide whether to keep improving a model or stop, and which model to select as the final one to deploy.

Table of Contents

  1. Introduction.
  2. Regression Metrics:
    1. Mean Absolute Error (MAE)
    2. Mean Squared Error (MSE)
    3. Root Mean Squared Error (RMSE)
    4. R² (R-Squared)
    5. Adjusted R²


Performance metrics are important in the evaluation of machine learning models. They help in tuning model hyperparameters and in deciding whether a newly engineered feature adds value to the model.

So how can we evaluate a model’s performance? How can we decide whether Model A or Model B performs better?

The best answer is to have a numerical measure of a model’s effectiveness and use it to rank and select models. Different problems call for different metrics.

We should also keep in mind that these evaluation metrics often do not capture the real success criteria of the problem being solved. In those cases, we have to adapt the metrics to our problem and use things like objectives and business constraints to reach a useful insight. In this article, we will discuss performance metrics for regression.

Performance Metrics for Regression

A regression model is an example of a supervised learning method. Because the response variable is a real-valued number that can be compared directly with the prediction, it is relatively easier to evaluate than an unsupervised model, provided you pick the right metrics. In this section, we go through a small subset of these essential metrics.

Aim: To model the relationship between a specific number of features and a continuous target variable.

Dataset: Boston housing dataset.

Python Code: Importing the required libraries and loading our dataset:
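
A minimal sketch of this setup step might look as follows. Recent scikit-learn releases no longer ship the Boston housing loader, so this example assumes a synthetic stand-in dataset built with make_regression. The variable names (X_train, y_pred, and so on) and the choice of LinearRegression are illustrative assumptions, not the article's original setup.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 500 rows, 13 numeric features,
# and a continuous target, roughly mimicking a small tabular dataset.
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=42)

# Hold out 20% of the rows so metrics are computed on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a simple baseline regressor and predict on the held-out rows.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print(X_train.shape, X_test.shape, y_pred.shape)
```

The pair (y_test, y_pred) is what each metric discussed in this article takes as input.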

Click here to set up your system for the Python coding environment.

Mean Absolute Error (MAE)

Mean absolute error (MAE) is a simple metric that measures the average magnitude of the errors in a set of predictions without considering their direction.

It’s the average of the absolute differences between the predicted and actual values, where all individual differences have equal weight. We aim to get a minimum value of MAE, as it is a loss.

MAE = (1/n) · Σ |yi − ŷi|

Where n is the number of samples, yi denotes the actual values, and ŷi denotes the predicted values.

Python Code:

It can be implemented using scikit-learn’s mean_absolute_error function from the metrics module.
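
As a small sketch of that call, the snippet below uses a tiny hand-made pair of arrays (hypothetical values chosen for easy arithmetic, not the housing data) and checks the library result against the definition.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical actual and predicted values, chosen for easy arithmetic.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Library call: average of the absolute differences.
mae = mean_absolute_error(y_true, y_pred)

# The same quantity written out from the definition.
mae_manual = np.mean(np.abs(y_true - y_pred))

print(mae, mae_manual)  # absolute errors are 0.5, 0.5, 0.0, 1.0, so both are 0.5
```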

Advantages of MAE

  1. MAE is more robust to outliers than MSE or RMSE, because errors are not squared.
  2. MAE is a more natural measure of average error (unlike RMSE) and is unambiguous.

Disadvantages of MAE

  1. One disadvantage of MAE is that the gradient magnitude does not depend on the error size, only on the sign of (y – ŷ). The gradient therefore stays just as large when the error is small, which may lead to convergence problems near the minimum.
  2. Unlike the standard deviation (σ), it cannot be readily ‘plugged’ into the normal distribution formulae.
  3. MAE uses the absolute value of the residuals, so it does not show whether the model is over-predicting or under-predicting.
  4. MAE is not differentiable at zero error, unlike MSE, which is differentiable everywhere.
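
The gradient behaviour described in the first disadvantage can be illustrated with a short sketch. The helper names mae_grad and mse_grad are hypothetical, and the functions give the per-sample derivative of the loss with respect to the prediction ŷ.

```python
def mae_grad(y, y_hat):
    """Derivative of |y - y_hat| w.r.t. y_hat.

    It is undefined when y == y_hat; returning 0.0 there is a common
    convention and an assumption of this sketch.
    """
    diff = y - y_hat
    if diff > 0:
        return -1.0
    if diff < 0:
        return 1.0
    return 0.0


def mse_grad(y, y_hat):
    """Derivative of (y - y_hat)**2 w.r.t. y_hat."""
    return -2.0 * (y - y_hat)


y = 5.0
for y_hat in (-5.0, 4.0, 4.99):
    # The MAE gradient keeps magnitude 1 regardless of the error size,
    # while the MSE gradient shrinks as the prediction approaches the target.
    print(y_hat, mae_grad(y, y_hat), mse_grad(y, y_hat))
```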

Image: MAE Plot

The above plot illustrates the difference between the predicted values (regression line) and the output values (red dots).

Each residual (difference) contributes linearly to the total error. Since we are averaging the absolute residuals, a small MAE suggests that the model predicts well. Similarly, a large MAE suggests that the model fits the evaluated data poorly. An MAE of 0 indicates that our model outputs perfect predictions, which is rarely achievable in real-life scenarios.

Mean Squared Error (MSE)

The mean squared error (MSE) is the average of the squared differences between the actual values and the values predicted by a regression model. It can be used to evaluate a regression model’s performance, with lower values indicating a better fit.

But what does MSE represent?

It denotes the average squared distance between actual and predicted values. Squaring prevents positive and negative errors from cancelling each other out, and it also gives larger errors more weight.

The mathematical formula for calculating MSE is:


MSE = (1/n) · Σ (yi − ŷi)²

Python code

It can be implemented using scikit-learn’s mean_squared_error function from the metrics module.
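
A short sketch of that call, again on hypothetical hand-made values rather than the housing data, with the definition written out next to it for comparison:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values, chosen for easy arithmetic.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Library call: average of the squared differences.
mse = mean_squared_error(y_true, y_pred)

# The same quantity written out from the definition.
mse_manual = np.mean((y_true - y_pred) ** 2)

print(mse, mse_manual)  # squared errors are 0.25, 0.25, 0.0, 1.0, so both are 0.375
```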

Advantages of MSE

  1. It can be used as a loss function because it is differentiable everywhere.

Disadvantages of MSE

  1. More sensitive to outliers than other metrics because of the squaring.
  2. The MSE value is expressed in squared units of the target variable, which makes it harder to interpret.
  3. If the errors are smaller than 1, squaring shrinks them, which may understate how badly the model performs.

Image: MSE Plot

MSE will usually exceed MAE when the residuals are larger than 1 in magnitude because, in MAE, residuals contribute linearly to the total error, whereas in MSE, each residual’s contribution grows quadratically. That is why MSE is preferred when large errors are especially undesirable: it strongly penalizes heavy outliers.
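
To make the linear versus quadratic contribution concrete, here is a small sketch on made-up residuals (the helper names mae and mse and all values are illustrative assumptions). It compares the two metrics with and without one large outlier, and on residuals smaller than 1.

```python
import numpy as np


def mae(residuals):
    """Mean of the absolute residuals."""
    return float(np.mean(np.abs(residuals)))


def mse(residuals):
    """Mean of the squared residuals."""
    return float(np.mean(np.square(residuals)))


typical = np.array([1.0, -1.0, 1.0, -1.0])        # all errors of size 1
with_outlier = np.array([1.0, -1.0, 1.0, -10.0])  # one error of size 10
small = np.array([0.1, -0.2, 0.1, -0.2])          # all errors below 1

for name, r in [("typical", typical), ("with_outlier", with_outlier), ("small", small)]:
    print(f"{name:>12}: MAE={mae(r):.3f}  MSE={mse(r):.3f}")
```

With the single outlier, MAE rises from 1 to 3.25 while MSE jumps from 1 to 25.75. For the residuals below 1, MSE is smaller than MAE.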

Root Mean Squared Error (RMSE)

The RMSE (Root Mean Squared Error) is defined as the square root of the average squared difference between the actual and the predicted values, i.e., the square root of the MSE. Its usage is similar to the MSE. It can be mathematically expressed as:


Root mean squared error (RMSE) = sqrt(Sum of squared error/number of samples)

Now, the question is: what is a good RMSE value?

The general answer is that the lower the RMSE value, the better a given model “fits” a dataset. However, whether a given RMSE value counts as “low” depends on the range of the target values you’re working with: an RMSE of 10 is small for targets in the thousands but large for targets between 0 and 20.

Python Code

RMSE can be computed by taking the square root of the value returned by scikit-learn’s mean_squared_error function from the metrics module.
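
One way to sketch this, assuming the same hypothetical arrays used for illustration elsewhere, is to take the square root of the MSE with NumPy. This form avoids depending on version-specific keyword arguments.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values, chosen for easy arithmetic.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# RMSE is the square root of the MSE, so it is in the same unit as the target.
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)

print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```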

Advantages of RMSE

  1. The interpretation of loss is quite easy, as the output value obtained is in the same unit as the target variable.
  2. RMSE is a widely used evaluation metric, including for deep learning models.

Disadvantages of RMSE

  1. RMSE is less robust to outliers than MAE (mean absolute error).

The coefficient of determination (R² score)

The R² score, or coefficient of determination, computes the proportion of variance in the dependent variable that is explained by the independent variables. The R² score can also be referred to as “goodness of fit”. It ranges from -∞ to 1. The closer R² is to 1, the better the regression model fits the data; an R² of exactly 1 indicates that the independent variables explain all of the variance. If R² is equal to 0, the model performs no better than one that always predicts the mean of the target. If R² is negative, the model performs worse than that mean baseline. Therefore, this evaluation metric is a useful tool for evaluating the fit of a regression model.

The mathematical formula for calculating R² is given below:

R² = 1 − Σ (yi − ŷi)² / Σ (yi − ȳ)²

Where ȳ denotes the mean of the dependent variable, yi denotes the actual true values, and ŷi indicates predicted outputs.

Python Code

It can be implemented by using the scikit-learn package’s r2_score function.
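
The sketch below calls r2_score on hypothetical hand-made values and also shows the two reference points mentioned earlier: predicting the mean everywhere gives an R² of 0, and predictions worse than that give a negative score. The arrays y_mean and y_bad are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual values and three sets of predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_good = np.array([2.5, 0.0, 2.0, 8.0])      # close to the actual values
y_mean = np.full_like(y_true, y_true.mean())  # always predicts the mean
y_bad = np.array([8.0, 7.0, -1.0, 0.0])       # far from the actual values

r2_good = r2_score(y_true, y_good)
r2_mean = r2_score(y_true, y_mean)
r2_bad = r2_score(y_true, y_bad)

# The same score for y_good, written out from the definition.
ss_res = np.sum((y_true - y_good) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"good: {r2_good:.4f}  mean baseline: {r2_mean:.4f}  bad: {r2_bad:.4f}")
```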

Advantages of R²

  1. It is usually interpreted as summarizing the percent of the variation in the response that the regression model describes.

Disadvantages of R²

  1. R² does not measure how one variable explains another.
  2. R² gives no information about prediction error.
  3. R² doesn’t permit comparing models using transformed responses.
  4. R² can be arbitrarily close to 1 when the model is wrong.

Adjusted R² score

One downside of the R² score is that as new features are added to a least-squares model, the R² score on the training data increases or stays constant but never decreases, because an extra feature can only reduce the residual sum of squares or leave it unchanged. The problem is that R² increases even when we add extraneous features to the dataset, which wrongly suggests that the model has improved.
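
This behaviour can be checked with a small experiment. The sketch below assumes synthetic data and an ordinary least-squares fit with scikit-learn's LinearRegression. It appends a pure-noise column and compares the training R² before and after.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y depends on two real features.
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

# A pure-noise column that has nothing to do with y.
noise_feature = rng.normal(size=(200, 1))
X_extra = np.hstack([X, noise_feature])

# R² on the training data, with and without the noise column.
r2_base = r2_score(y, LinearRegression().fit(X, y).predict(X))
r2_extra = r2_score(y, LinearRegression().fit(X_extra, y).predict(X_extra))

print(f"R² with 2 features:         {r2_base:.6f}")
print(f"R² with 2 features + noise: {r2_extra:.6f}")
```

The second score is never lower than the first, even though the added column carries no information about the target.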

Therefore, Adjusted R² emerged as R²’s alternative to control this situation. The mathematical formula for computing adjusted R² is given below:

Adjusted R² = 1 − (1 − R²) · (N − 1) / (N − p − 1)

Where N indicates the overall sample size (number of rows) and p indicates the number of predictors (columns).

Python Code

scikit-learn does not provide a dedicated adjusted R² function, but it can be computed from the scikit-learn package’s r2_score.
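
A minimal sketch of that computation follows. The helper adjusted_r2 is a hypothetical function written for this example (it is not part of scikit-learn), and the sample values and the choice of two predictors are assumptions.

```python
from sklearn.metrics import r2_score


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R² = 1 - (1 - R²) * (N - 1) / (N - p - 1)."""
    n_samples = len(y_true)
    if n_samples - n_features - 1 <= 0:
        raise ValueError("Need more samples than predictors plus one.")
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical values: 6 samples, predictions from a model with 2 predictors.
y_true = [3.0, -0.5, 2.0, 7.0, 4.0, 1.5]
y_pred = [2.5, 0.0, 2.0, 8.0, 4.5, 1.0]

r2 = r2_score(y_true, y_pred)
adj = adjusted_r2(y_true, y_pred, n_features=2)

print(f"R² = {r2:.4f}, Adjusted R² = {adj:.4f}")
```

For an imperfect fit with at least one predictor, the adjusted value is always lower than the plain R², and the gap widens as predictors are added without a matching gain in fit.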

Advantages of Adjusted R²

  1. Unlike R², Adjusted R² can decrease with the addition of less significant variables, thus resulting in a more reliable evaluation.
  2. Adjusted R² penalizes the number of predictors, so it is better suited than R² for comparing models with different numbers of features.

Disadvantages of Adjusted R²

  1. This metric is valid only for models where R² is defined (e.g., linear models and not generalized linear models).

Stay Tuned

Click here to learn about performance metrics for classification in detail.

Keep learning and keep implementing!!
