Random Forest classifier

The Random Forest classifier is an ensemble classifier obtained by aggregating multiple decision tree classifiers.

Before starting the classification process, it may be useful to introduce the Decision Tree and the Random Forest concepts (if you already know them, you can jump directly to the procedure paragraph).

Decision Tree classifier

A Decision Tree classifier is a supervised machine learning model in which the data is repeatedly split according to the value of a certain feature. It consists of a set of nodes, each corresponding to a test on the value of a feature (the data is split depending on the result of that test), and of branches, which connect one node to another.

The final nodes (the nodes not connected to any following node) are called leaf nodes and provide the outcome of the classification, assigning a class label to any tested sample.

The first node, which represents the first test, is called the root node.


However, before the classification step, the Decision Tree classifier has to be built.

A training set is used to fit the Decision Tree classifier. The algorithm identifies the best split value on the best feature (the one which best discriminates between the classes), splits the data according to the result of that test, and repeats the process at each node until a node contains samples of only one class, until a maximum depth is reached, or until the path is pruned to prevent overfitting (in the last two cases, the majority rule is applied at the node).
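
To make the split-selection step concrete, here is a minimal Python sketch that scores every candidate split with the Gini impurity, one common splitting criterion. The wiki does not show Athena's actual implementation, so this is purely illustrative, and the helper names gini and best_split are hypothetical:

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Exhaustive search over every feature and candidate threshold for
    # the split minimizing the weighted Gini impurity of the children.
    best_feature, best_threshold, best_score = None, None, np.inf
    n = len(y)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_feature, best_threshold, best_score = f, t, score
    return best_feature, best_threshold
```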


The classification step then simply consists in applying these tests to each test sample, which descends the tree until it reaches a leaf node and is classified accordingly.
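
The following end-to-end sketch fits and applies a single decision tree with scikit-learn (assumed here only for illustration; it is not necessarily what Athena uses), with max_depth as one of the stopping criteria discussed above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for a real feature matrix.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth limits tree growth and therefore helps to prevent overfitting.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# Each test sample descends the tree until it reaches a leaf node,
# whose class label becomes the prediction.
print(tree.predict(X_test[:5]))
print("accuracy:", tree.score(X_test, y_test))
```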

Random Forest classifier

Random Forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.

These decision trees are fitted on different sets, which may differ in their samples (all drawn, however, from the same overall training set) or in their features.

In the classification step, each tree classifies every test sample, and the final class chosen by the Random Forest classifier can be decided in different ways.

Athena simply uses the majority rule to assign the final class label, i.e., each sample receives the label predicted by the largest number of trees.
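
The sketch below illustrates this voting scheme with scikit-learn (again an assumption for illustration, not Athena's own code): each tree is trained on a bootstrap sample with a random feature subset, and the ensemble label is the most frequent per-tree vote:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Each tree is fitted on a bootstrap sample of the training set and
# considers a random subset of the features at every split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X, y)

# Majority rule made explicit: collect each tree's vote and pick the
# most frequent label. (scikit-learn itself averages the trees'
# predicted probabilities, which normally agrees with a hard vote.)
sample = X[:1]
votes = [int(t.predict(sample)[0]) for t in forest.estimators_]
print("majority vote:", np.bincount(votes).argmax())
print("forest prediction:", int(forest.predict(sample)[0]))
```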

This classifier tends to reduce the probability of overfitting with respect to the Decision Tree classifier, and generally provides better performance.

Currently, you can select:

  • The Random Forest classifier, an ensemble classifier obtained by aggregating multiple decision tree classifiers (here, you can also use a single decision tree classifier)
  • The Neural Network classifier, a classifier composed of a multilayer artificial neural network, built from a set of perceptrons.

Finally, you can try them by changing their parameters, or you can return to the analysis list.
