A baseline is a method that uses heuristics, simple summary statistics, randomness, or machine learning to create predictions for a dataset. You can use these predictions to measure the baseline’s performance (e.g., accuracy); this metric then becomes the reference against which you compare any other machine learning algorithm.
There are many machine learning algorithms to choose from. Hundreds in fact. You must know whether the predictions for a given algorithm are good or not. But how do you know?
The answer is to use a baseline prediction algorithm. A baseline prediction algorithm provides a set of predictions that you can evaluate with the same metric you would use for any other predictions on your problem, such as classification accuracy or RMSE.
The scores from these algorithms provide the required point of comparison when evaluating all other machine learning algorithms on your problem.
Once the baseline is established, you can state how much better a given algorithm is than the naive baseline, which gives context on how good that method actually is.
The two most commonly used baseline algorithms are:
- Random Prediction Algorithm.
- Zero Rule Algorithm.
When starting on a new problem that is trickier than a conventional classification or regression problem, it is a good idea to first devise a random prediction algorithm that is specific to your prediction problem. Later you can improve upon this and devise a zero rule algorithm.
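As a rough illustration, both baselines can be sketched in plain Python. This assumes the usual definitions: random prediction picks uniformly among the labels seen in training, and zero rule predicts the most common training label (or the mean target for regression). The function names and the toy labels are made up for this example.

```python
import random
from collections import Counter


def random_prediction(train_labels, n_predictions, seed=0):
    """Random prediction baseline: pick uniformly among labels seen in training."""
    rng = random.Random(seed)  # fixed seed so the baseline is repeatable
    classes = sorted(set(train_labels))
    return [rng.choice(classes) for _ in range(n_predictions)]


def zero_rule_classification(train_labels, n_predictions):
    """Zero rule for classification: always predict the most common training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n_predictions


def zero_rule_regression(train_targets, n_predictions):
    """Zero rule for regression: always predict the mean of the training targets."""
    mean_target = sum(train_targets) / len(train_targets)
    return [mean_target] * n_predictions


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Toy data, invented for illustration only.
train_labels = ["no", "no", "no", "no", "yes", "yes"]
test_labels = ["no", "no", "yes", "no"]

random_preds = random_prediction(train_labels, len(test_labels))
zero_rule_preds = zero_rule_classification(train_labels, len(test_labels))

print("random accuracy:", accuracy(test_labels, random_preds))
print("zero rule accuracy:", accuracy(test_labels, zero_rule_preds))  # 0.75
print("zero rule regression:", zero_rule_regression([10.0, 15.0, 20.0], 2))  # [15.0, 15.0]
```

Note that both baselines look only at the training labels and never at the input features, which is what makes them a floor rather than a model.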
In more detail:
A machine learning algorithm tries to learn a function that models the relationship between the input (feature) data and the target variable (or label). When you test it, you will typically measure performance in one way or another. For example, your algorithm may be 75% accurate. But what does this mean? You can infer this meaning by comparing it with a baseline’s performance.
Typical baselines include those supported by scikit-learn’s “dummy” estimators. For classification, DummyClassifier offers these strategies:
- “stratified”: generates random predictions that respect the training set’s class distribution.
- “most_frequent”: always predicts the most frequent label in the training set.
- “prior”: always predicts the class that maximizes the class prior (like “most_frequent”); its predicted probabilities are the class priors.
- “uniform”: generates predictions uniformly at random.
- “constant”: always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.
For regression, DummyRegressor offers these strategies:
- “mean”: always predicts the mean of the training set.
- “median”: always predicts the median of the training set.
- “quantile”: always predicts a specified quantile of the training set, provided with the quantile parameter.
- “constant”: always predicts a constant value that is provided by the user.
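A minimal sketch of a few of these strategies, using scikit-learn's `DummyClassifier` and `DummyRegressor`. The toy arrays are invented for illustration, and the features are all zeros because dummy estimators ignore the inputs.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

# Toy data (made up): 8 samples with imbalanced class labels.
X = np.zeros((8, 1))  # dummy estimators ignore the feature values
y_class = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_reg = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0])

# Classification: always predict the most frequent training label.
majority = DummyClassifier(strategy="most_frequent").fit(X, y_class)
majority_accuracy = majority.score(X, y_class)  # 6 of 8 correct -> 0.75

# Classification: random predictions that follow the class distribution.
stratified = DummyClassifier(strategy="stratified", random_state=0).fit(X, y_class)
stratified_accuracy = stratified.score(X, y_class)

# Regression: always predict the median of the training targets.
median_reg = DummyRegressor(strategy="median").fit(X, y_reg)
median_prediction = median_reg.predict(X[:1])[0]  # median of y_reg -> 4.5

# Regression: always predict a chosen quantile (here the 0.9 quantile).
quantile_reg = DummyRegressor(strategy="quantile", quantile=0.9).fit(X, y_reg)
quantile_prediction = quantile_reg.predict(X[:1])[0]

print("most_frequent accuracy:", majority_accuracy)
print("stratified accuracy:", stratified_accuracy)
print("median prediction:", median_prediction)
print("0.9-quantile prediction:", quantile_prediction)
```

In practice you would fit the dummy estimator on the training split and score it on the same held-out split as your real model, so the two numbers are directly comparable.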
In general, you will want your approach to outperform the baselines you have selected. In the example above, you would want your 75% accuracy to be higher than any baseline you have run on the same data.
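To make the comparison concrete, here is a small sketch with hypothetical numbers: a model at 75% accuracy on a dataset where a most-frequent baseline would already reach 80%. The helper name `compare_to_baselines` and the baseline scores are assumptions made for this example, not measured results.

```python
def compare_to_baselines(model_score, baseline_scores):
    """Return, per baseline, how far the model's score is above (+) or below (-) it."""
    return {name: model_score - score for name, score in baseline_scores.items()}


model_accuracy = 0.75
# Hypothetical baseline accuracies on the same test data.
baseline_accuracies = {"most_frequent": 0.80, "uniform": 0.50}

margins = compare_to_baselines(model_accuracy, baseline_accuracies)
beats_all = all(margin > 0 for margin in margins.values())

for name, margin in margins.items():
    print(f"vs {name}: {margin:+.2f}")
print("beats every baseline:", beats_all)  # False: the majority baseline wins
```

Under these assumed numbers, 75% looks respectable next to random guessing but is worse than always predicting the majority class, which is exactly the kind of context a baseline provides.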
If you are dealing with a specific domain of machine learning (such as recommender systems), you will typically pick baselines that are current state-of-the-art (SoTA) approaches, since you will usually want to demonstrate that your approach does better than these. For example, when you evaluate a new collaborative filtering algorithm, you may want to compare it to matrix factorization, which is itself a learning algorithm but is now a popular baseline because it has been so successful in recommender system research.