It is important to establish baseline performance on a predictive modeling problem. A baseline provides a point of comparison for the more advanced methods that you evaluate later.
In this tutorial, you will discover how to implement baseline machine learning algorithms from scratch in Python.
After completing this tutorial, you will know:
- How to implement the random prediction algorithm.
- How to implement the zero rule prediction algorithm.
The random prediction algorithm predicts an outcome chosen at random from the outcomes observed in the training data. It is perhaps the simplest algorithm to implement.
It requires storing all of the distinct outcome values in the training data, a set that could be large on regression problems where many rows have different output values.
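To get a feel for that storage cost, the sketch below counts the distinct output values in two small made-up datasets, one for classification and one for regression. The helper name `distinct_outputs` and both datasets are hypothetical and chosen only for illustration; the sketch assumes the output is the last column of each row.

```python
# Hypothetical helper: collect the distinct output values,
# assuming the output is the last column of each row.
def distinct_outputs(rows):
    return set(row[-1] for row in rows)

# Made-up classification data: several rows, only two class labels.
classification = [[0.5, 0], [1.2, 1], [0.7, 0], [1.9, 1], [0.1, 0], [1.4, 1]]
# Made-up regression data: every row has a different real-valued output.
regression = [[0.5, 10.2], [1.2, 11.7], [0.7, 9.8], [1.9, 14.1], [0.1, 8.3], [1.4, 12.6]]

print(len(distinct_outputs(classification)))  # 2
print(len(distinct_outputs(regression)))      # 6
```

On the classification data the stored set stays at two values no matter how many rows are added, while on the regression data it grows with the number of rows.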
Because random numbers are used to make decisions, it is a good idea to fix the random number seed prior to using the algorithm. This ensures that we get the same sequence of random numbers, and in turn the same decisions, each time the algorithm is run.
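The effect of fixing the seed can be checked with a short sketch using Python's standard `random` module. The helper name `draw` and its parameters are hypothetical; the point is only that seeding with the same value reproduces the same sequence.

```python
from random import seed, randrange

# Hypothetical helper: seed the generator, then draw n integers in [0, upper).
def draw(seed_value, n=5, upper=10):
    seed(seed_value)
    return [randrange(upper) for _ in range(n)]

first = draw(1)
second = draw(1)
print(first == second)  # True: same seed, same sequence
```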
Below is an implementation of the Random Prediction Algorithm in a function named random_algorithm().
The function takes both a training dataset that includes output values and a test dataset for which output values must be predicted.
The function will work for both classification and regression problems. It assumes that the output value in the training data is the final column for each row.
First, the set of unique output values is collected from the training data. Then one output value is chosen at random from the set for each row in the test set.
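from random import randrange

# Generate random predictions
def random_algorithm(train, test):
    # Collect the output value (last column) from every training row
    output_values = [row[-1] for row in train]
    # Reduce to the distinct output values
    unique = list(set(output_values))
    predicted = list()
    for row in test:
        # Pick one distinct value at random for each test row
        index = randrange(len(unique))
        predicted.append(unique[index])
    return predicted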
We can test this function with a small dataset that only contains the output column for simplicity.
The output values in the training dataset are either “0” or “1”, so the set of predictions the algorithm chooses from is {0, 1}. The test set also contains a single column, filled with None placeholders because the outputs to predict are not known.
from random import seed
from random import randrange

# Generate random predictions
def random_algorithm(train, test):
    # Collect the output value (last column) from every training row
    output_values = [row[-1] for row in train]
    # Reduce to the distinct output values
    unique = list(set(output_values))
    predicted = list()
    for row in test:
        # Pick one distinct value at random for each test row
        index = randrange(len(unique))
        predicted.append(unique[index])
    return predicted

# Fix the seed so the predictions are repeatable
seed(1)
train = [[0], [1], [0], [1], [0], [1]]
test = [[None], [None], [None], [None]]
predictions = random_algorithm(train, test)
print(predictions)
Running the example calculates random predictions for the test dataset and prints those predictions.
[0, 0, 1, 0]
The random prediction algorithm is easy to implement and fast to run, but we could do better as a baseline.
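One way to see the room for improvement: the random algorithm picks uniformly among the distinct values, so it ignores how often each value occurs. The sketch below compares it with always predicting the most common training value, on a made-up imbalanced dataset. The `most_common_algorithm` and `accuracy` helpers and the data are illustrative assumptions in the spirit of the zero rule idea named in the overview, not code taken from the tutorial.

```python
from random import seed, randrange

# Random prediction, following the description in the text above.
def random_algorithm(train, test):
    unique = list(set(row[-1] for row in train))
    return [unique[randrange(len(unique))] for _ in test]

# Hypothetical alternative: always predict the most frequent training output.
def most_common_algorithm(train, test):
    outputs = [row[-1] for row in train]
    prediction = max(set(outputs), key=outputs.count)
    return [prediction for _ in test]

# Hypothetical helper: fraction of predictions that match the actual values.
def accuracy(actual, predicted):
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / float(len(actual))

seed(1)
# Made-up imbalanced data: 90 of the 100 outputs are 0.
train = [[0]] * 90 + [[1]] * 10
actual = [0] * 90 + [1] * 10
test = [[None] for _ in actual]

# Random guessing between 0 and 1 is right about half the time in expectation.
print(accuracy(actual, random_algorithm(train, test)))
# Always predicting the most common value is right 90% of the time here.
print(accuracy(actual, most_common_algorithm(train, test)))  # 0.9
```

Under these assumptions, the frequency-aware rule scores higher simply because it uses information that random selection discards.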
Reference: www.machinelearningmastery.com
