You know the value of predictive analytics. You may have invested some time into implementing machine learning solutions for your organization. Are you getting the most out of your machine learning models? Do you know how to evaluate your machine learning solutions?
Evaluating the usefulness of a machine learning model is a difficult task. A solid comprehension of standard statistics and metrics is extremely important in assessing solutions. At Northwest Cadence, we often see companies focus on a single metric to determine whether they have a “good” machine learning model. We encourage organizations instead to gain a deeper understanding of all the statistics available when determining the strength of a model. A more in-depth knowledge of model evaluation allows companies to be confident in their solutions and assists in tweaking models for better results. In this blog, we will walk through evaluation criteria for regression and binary classification machine learning models (note that evaluating multiclass classification models is very similar to the binary case). For a quick overview of machine learning with examples, check out Machine Learning for the Rest of Us.
Regression models are ideal for predicting numeric outcomes. When evaluating a regression model, we are interested in knowing how close the observed value is to the predicted value. There are five main metrics to look at and understand when evaluating the strength of a regression model:
- Mean Absolute Error
- Root Mean Squared Error
- Relative Absolute Error
- Relative Squared Error
- Coefficient of Determination (R2)
The Mean Absolute Error is the average absolute difference between the actual and predicted values. Let fi be the predicted value, yi the observed value, and n be the number of predictions:

MAE = (1/n) Σ |fi − yi| (summed over i = 1, …, n)
The Root Mean Squared Error is the square root of the average squared difference between predicted and observed values; it represents the standard deviation of the prediction errors. Let fi be the predicted value, yi the observed value, and n be the number of predictions:

RMSE = √( (1/n) Σ (fi − yi)² )
Like the Mean Absolute Error, the Relative Absolute Error is based on the absolute differences between observed and predicted values, but it normalizes the total absolute error by the total absolute error of simply predicting the mean, which allows comparison between models with different units. Let fi be the predicted value, yi the observed value, and ȳ be the average of yi:

RAE = Σ |fi − yi| / Σ |ȳ − yi|
Like the Root Mean Squared Error, the Relative Squared Error is based on the squared differences between predicted and observed values, but it normalizes the total squared error by the total squared error of simply predicting the mean, which allows comparison between models with different units. Let fi be the predicted value, yi the observed value, and ȳ be the average of yi:

RSE = Σ (fi − yi)² / Σ (ȳ − yi)²
While all five metrics should be considered in evaluation, the R2 value is typically the most helpful in the assessment process. The R2 value usually falls between 0 and 1 and measures how well the model fits the data. If a model perfectly describes the data, the R2 value equals 1; if a model describes the observed data no better than predicting the mean, the R2 value equals 0. To compute the R2 value, let fi be the predicted value, yi the observed value, and ȳ the average of yi:

R2 = 1 − Σ (fi − yi)² / Σ (yi − ȳ)²
Note that a high R2 value does not by itself mean a great model. Individuals often overfit their data, producing a high R2 value, but an overfit model will usually predict future observations poorly. Therefore, individuals should still account for all of the available metrics when evaluating a model.
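The five regression metrics above can be sketched in a few lines of plain Python. This is a minimal illustration with made-up numbers and a helper name of our own choosing; libraries such as scikit-learn provide equivalent, battle-tested functions.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, RAE, RSE, and R2 for observed (yi) and predicted (fi) values."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    abs_errors = [abs(f - y) for f, y in zip(y_pred, y_true)]
    sq_errors = [(f - y) ** 2 for f, y in zip(y_pred, y_true)]
    # Errors of the naive "always predict the mean" model, used for RAE/RSE/R2.
    abs_dev = [abs(y_mean - y) for y in y_true]
    sq_dev = [(y_mean - y) ** 2 for y in y_true]

    mae = sum(abs_errors) / n
    rmse = math.sqrt(sum(sq_errors) / n)
    rae = sum(abs_errors) / sum(abs_dev)
    rse = sum(sq_errors) / sum(sq_dev)
    r2 = 1 - rse  # with these definitions, R2 is simply 1 minus the RSE
    return {"MAE": mae, "RMSE": rmse, "RAE": rae, "RSE": rse, "R2": r2}

# Example with illustrative data: predictions that are close, but not perfect.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.3, 6.9, 9.4]
print(regression_metrics(observed, predicted))
```

Running this on the sample data shows the pattern to look for: low MAE/RMSE, RAE and RSE well below 1 (much better than predicting the mean), and an R2 close to 1.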
In classification models there are outcomes designated as “positive” events and “negative” events. For example, if the model is used to predict whether an individual will purchase a product, then a purchase would be considered a positive event and a non-purchase would be considered a negative event. This does not mean that the events are good or bad. Knowing that someone will likely not purchase a product could save an organization both time and money that could be used elsewhere. Predictions are broken into four main categories:
- True Positives: observations where the model correctly predicted the positive event.
- False Positives: observations where the model predicted a positive event when the observation was actually a negative event.
- True Negatives: observations where the model correctly predicted the negative event.
- False Negatives: observations where the model predicted a negative event when the observation was actually a positive event.
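To make the four categories concrete, here is a small sketch (labels and counts are illustrative, and the function name is our own) that tallies them for binary 0/1 labels, where 1 marks the positive event:

```python
def confusion_counts(y_true, y_pred):
    """Tally true/false positives and negatives for binary labels (1 = positive event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

# 1 = purchase (positive event), 0 = no purchase (negative event)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # -> (3, 1, 3, 1)
```

These four counts are the raw material for every classification metric discussed below.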
Before assessing the model, one should know whether a false positive or a false negative carries the greater cost. Knowing the cost of false negatives and false positives lets you focus on the metrics that matter most. The statistics used to evaluate classification models are accuracy, precision, recall, and F1 score. If a false negative costs more than a false positive, recall is the most helpful metric. If a false positive costs more than a false negative, precision is the most important metric. If the costs of the two errors are about the same, either accuracy or F1 score is the most significant. While one metric may be valued more than the others, do not dismiss the rest: considering all four is good evaluation practice.
Accuracy is the most intuitive of the four metrics: it measures how often the model predicted the correct value. Accuracy can be found with the following formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision is the proportion of predicted positive events that were actually positive, and is found with the following formula:

Precision = TP / (TP + FP)
Recall is the proportion of actual positive events that the model correctly predicted. The formula for recall is as follows:

Recall = TP / (TP + FN)
F1 Score is the harmonic mean of precision and recall. The formula for F1 score is:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
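Putting the four formulas together, a minimal sketch (the function name is our own) that computes all four statistics from the outcome counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "F1": f1}

# Illustrative counts: 3 true positives, 1 false positive,
# 3 true negatives, 1 false negative -> every metric works out to 0.75.
print(classification_metrics(tp=3, fp=1, tn=3, fn=1))
```

In practice, watch for the edge case where a model predicts no positives at all (TP + FP = 0): precision is then undefined, which is one more reason not to rely on a single metric.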
Ensuring that you are getting the most out of your machine learning models is crucial for your organization and can give you the advantage you need. A solid understanding of what each statistic measures, and of which metrics to value most, will help you strengthen your machine learning models and trust their results.
Northwest Cadence helps companies design and execute machine learning solutions. If you are curious about how you can use machine learning to transform the way your business operates, please reach out to firstname.lastname@example.org and we will work with you to create just the right solution for your team!