In this article, we are going to learn a bit more about a popular method of creating and visualizing predictive models and algorithms – decision trees. We are going to learn what decision trees are, what types of decision trees exist, and when you should use each. Finally, we will take a look at the advantages as well as the disadvantages of using decision trees.
What is a Decision Tree?
A decision tree algorithm – a popular method for creating and visualizing predictive models – belongs to the family of supervised learning algorithms and can be used to solve both regression and classification problems. Why is it popular? Because decision trees are relatively easy to understand and they deliver results – they are effective in predictive modeling, as simple as that. A decision tree, however, looks a bit different from the ones in your garden: we start at the root, where the entire population sits, and as we move down the tree we split the population into smaller parts at each node. This reflects the basic goal of a decision tree – to build a training model that we can use to predict an outcome.
When making predictions with a decision tree, we go through two stages (sketched in code after this list):
- training the decision tree – in this stage, we build the tree, test it, and optimize it by using available data;
- using the trained decision tree – in this stage we use the trained model to predict the class or value of the target variable by applying the decision rules learned from the data.
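Here is a minimal sketch of those two stages. The article does not name a library, so scikit-learn and its bundled iris dataset are assumed purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Stage 1: train the decision tree on the available data.
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

# Stage 2: use the trained tree to predict the class of unseen samples.
predictions = tree.predict(X_test)
print("Accuracy on held-out data:", tree.score(X_test, y_test))
```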
What are the different types of decision trees?
Every data scientist should know that there are different types of decision trees. The types differ depending on the data you are trying to predict:
- The regression tree, aka the continuous variable decision tree, predicts continuous, quantitative data
- The classification tree, aka the categorical variable decision tree, predicts qualitative (categorical) data
What are the similarities between the regression and classification trees?
Both classification and regression trees are also called recursive partitioning trees, and they have been used extensively in predictive analytics. Both are machine-learning methods that build prediction models from a given dataset. In both cases, we repeatedly split the data into smaller blocks, fit a prediction model on each such partition, and represent the result as a graphical decision tree.
What are the differences between the regression and classification trees?
However, there are two basic differences between regression and classification decision trees (both are illustrated in the code sketch after this list):
- We use a regression tree when the response variable is not categorical, that is, when it is continuous or numeric – regression decision trees work with ordered, continuous values. We usually use regression trees when we deal with quantities, such as prices, or when we are trying to predict a person’s income, because the value we are trying to predict falls along a continuum and may depend on a person’s education, age, sex, and so on. We could also use a regression tree to predict the selling price of a house, taking into account the size, age, and style of the house, as well as its location.
- We use classification trees when the response variable is fixed or categorical, for example when it can be classified as a binary yes or no, or 0 or 1. Classification decision trees are built on unordered values of the dependent variable, so the algorithm is used to identify the “class” into which a target variable would most likely fall. For instance, we use a classification tree when we are trying to predict who will subscribe to a newspaper or an electronic newsfeed, or when we are trying to predict who will graduate from college. A more complex example would be using a classification tree to predict a medical diagnosis based on various symptoms.
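The sketch below contrasts the two cases, again assuming scikit-learn; the feature names and values (house sizes, ages, incomes, labels) are made up for the example:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier

# Regression tree: the response (selling price) is continuous.
# Columns: house size (m^2), age (years) -- hypothetical values.
X_houses = np.array([[120, 5], [80, 30], [200, 2], [95, 15]])
prices = np.array([350_000, 180_000, 620_000, 240_000])
reg_tree = DecisionTreeRegressor(random_state=0).fit(X_houses, prices)
print(reg_tree.predict([[150, 10]]))  # predicted price, a continuous value

# Classification tree: the response (will subscribe?) is categorical.
# Columns: age, income -- hypothetical values; labels are 0 = no, 1 = yes.
X_people = np.array([[25, 30_000], [40, 80_000], [35, 50_000], [60, 45_000]])
subscribed = np.array([0, 1, 1, 0])
clf_tree = DecisionTreeClassifier(random_state=0).fit(X_people, subscribed)
print(clf_tree.predict([[30, 60_000]]))  # predicted class, 0 or 1
```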
What are the main advantages of using decision trees?
As already mentioned, one of the main benefits of using a decision tree is that it is easy to visualize, and it is therefore straightforward to understand, interpret, and analyze. In addition to this low complexity, decision trees require little data preparation – while other methods often require data normalization and dummy variables, decision trees do not demand this type of preparation. On the other hand, decision trees do not support blank or missing values, while some other methods do.
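If your data does contain missing values, one common workaround is to fill them in before fitting the tree. The sketch below uses scikit-learn's SimpleImputer on made-up data; this is an assumption for illustration, not something the article prescribes:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature matrix with a missing entry (np.nan).
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, 1.0]])
y = np.array([0, 0, 1, 1])

# Fill missing entries with the column mean before fitting the tree.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)
tree = DecisionTreeClassifier(random_state=0).fit(X_filled, y)
print(tree.predict([[2.0, 2.5]]))
```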
In addition to that, decision trees can handle multi-output problems.
Can you also use decision trees for multiclass problems? In short, yes. There are many other ways to predict the result of a multiclass problem, but if you want to use decision trees, one way of doing it is to assign a unique integer to each of your classes.
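A minimal sketch of that encoding idea, assuming scikit-learn's LabelEncoder and made-up animal data:

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical string classes mapped to unique integers.
classes = ["dog", "cat", "bird", "cat", "dog", "bird"]
X = [[20, 4], [4, 4], [0.5, 2], [5, 4], [25, 4], [0.3, 2]]  # weight (kg), legs

encoder = LabelEncoder()
y = encoder.fit_transform(classes)  # e.g. bird -> 0, cat -> 1, dog -> 2

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
predicted = tree.predict([[6, 4]])
print(encoder.inverse_transform(predicted))  # back to the original class name
```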
The cost of using a decision tree for prediction is often low compared to other techniques – it grows logarithmically with the number of data points used to train the tree – although it can still get quite high depending on the complexity of the tree.
As we’ve seen in the section covering the types of decision trees, this method works with both numerical data (regression trees) and categorical data (classification trees).
We can validate a decision tree by using statistical tests, which helps us estimate how reliable the model is. In addition to this, decision trees are flexible: they perform quite well even when the assumptions of the model from which the data were generated are somewhat violated.
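The article does not prescribe a specific test; one common way to estimate reliability in practice is cross-validation, sketched here with scikit-learn and its iris dataset as an assumed example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# Five-fold cross-validation: each fold acts as a held-out test of the model.
scores = cross_val_score(tree, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```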
What are the main disadvantages of using decision trees?
We’ve mentioned earlier that decision trees can also handle complex data. Problems arise when we build trees that are too complex; in situations like that, the tree does not generalize well to new data. This problem is called overfitting, and it can be mitigated with mechanisms such as setting the minimum number of samples required at a leaf node or setting the maximum depth of the tree.
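In scikit-learn, for example, both mechanisms correspond to constructor parameters; the values below are purely illustrative, not recommendations:

```python
from sklearn.tree import DecisionTreeClassifier

# Constrain the tree so it cannot grow complex enough to memorize the data.
pruned_tree = DecisionTreeClassifier(
    max_depth=4,           # cap how deep the tree may grow
    min_samples_leaf=5,    # require at least 5 samples at every leaf
    random_state=0,
)
```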
Decision trees are often seen as unstable because even a small change in the data can cause a completely different tree to be built. We can mitigate this problem by using decision trees within an ensemble.
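A random forest is one such ensemble; the sketch below assumes scikit-learn and its iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# An ensemble of 100 trees, each trained on a random resample of the data;
# averaging their votes makes the overall model less sensitive to small
# changes in the training set than any single tree would be.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:3]))
```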
We have to take into account that decision trees are not suitable for extrapolation, because their predictions are neither smooth nor continuous – they are merely piecewise constant approximations.
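The small sketch below makes that concrete: a regression tree trained on a simple linear trend just repeats its last leaf value outside the training range (scikit-learn assumed, data made up):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Train on x in [0, 9] with a simple linear target y = 2x.
X_train = np.arange(0, 10).reshape(-1, 1)
y_train = 2 * X_train.ravel()
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Inside the training range the piecewise-constant fit tracks the data,
# but beyond it the tree keeps returning the value of its last leaf.
print(tree.predict([[5], [20], [100]]))  # roughly [10, 18, 18], never near 40 or 200
```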
In addition to this, we can end up with biased decision trees – this easily happens when some classes dominate the dataset. For this reason, we should balance the dataset before fitting the decision tree.
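Besides resampling the data itself, scikit-learn's trees expose a class_weight parameter that reweights classes inversely to their frequency; this is an alternative sketch on made-up data, not something the article prescribes:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical imbalanced data: class 0 heavily outnumbers class 1.
X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

# 'balanced' weights each class inversely to its frequency, so the minority
# class is not simply drowned out during the split decisions.
weighted_tree = DecisionTreeClassifier(class_weight="balanced", random_state=0).fit(X, y)
```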
We also often struggle to find an optimal decision tree, even when we build simple ones, because practical decision-tree learning algorithms rely on heuristics such as the greedy algorithm, where decisions are optimized locally at each node. The problem is that algorithms like these cannot guarantee a globally optimal decision tree. However, we can mitigate this issue by training multiple trees in an ensemble learner, where the features and samples are randomly sampled with replacement.