Decision trees are logic trees in which data is split and sorted repeatedly at decision points called nodes. Each node poses a question about a specific parameter with a binary answer, and each answer leads either to another node or to a final prediction. There are two types of decision trees: classification trees, which usually assess categorical data and use yes-or-no questions to classify it, and regression trees, which assess quantitative data. Decision trees are often used in machine learning (ML) models to learn from data and make predictions.
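As a minimal sketch of this structure, assuming the scikit-learn library (the article does not name a framework), a shallow classification tree can be fit and printed so each node's question and binary split is visible:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow classification tree and print its nodes:
# each split is a yes/no question about one feature.
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(tree, feature_names=list(iris.feature_names)))
```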
Due to the complexity of ML tasks, a decision-tree-based algorithm or ML model will usually combine many trees. NTrees is an important tuning parameter that dictates the number of trees generated within the model. NTrees defaults to 50, but it is important to consider carefully where to set this parameter in order to optimize a model's performance.
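The 50-tree default matches the ntrees parameter in the H2O library; since the article does not name a specific framework, treat the following as an illustrative sketch assuming H2O's Python API, with a hypothetical train.csv file and a hypothetical "label" target column:

```python
import h2o
from h2o.estimators import H2ORandomForestEstimator

h2o.init()
train = h2o.import_file("train.csv")  # hypothetical training file

# ntrees defaults to 50; raise it here to grow a larger ensemble
model = H2ORandomForestEstimator(ntrees=100)
model.train(y="label", training_frame=train)  # "label" is a hypothetical target column
```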
The number of trees a model generates affects how deeply it can dig into the data and how much it can learn from the training data set. A higher number of trees is not always better, though. In general, increasing the number of trees increases a model's predictive potential, but with too many trees the model becomes highly complex and begins fitting noise in the training data rather than the underlying pattern; it can quickly become overfit. Models with many trees also require much more computing power and time to train and tune. However, if NTrees is set too low, performance suffers in the other direction: the model will be underfit and unable to properly learn from the training data or make accurate predictions.
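A rough illustration of this trade-off, assuming scikit-learn (not named in the article) and its gradient-boosting estimator, where n_estimators plays the role of NTrees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Too few trees underfits; past some point, more trees keep improving
# the training score while the validation score stalls or drops.
for n in (5, 50, 500):
    gbm = GradientBoostingClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
    print(f"trees={n}: train={gbm.score(X_tr, y_tr):.3f}, val={gbm.score(X_val, y_val):.3f}")
```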
Like many parameters in machine learning, the optimal NTrees setting is not known when first creating a model. In the case of NTrees, model developers use a process called a parameter search: they test different NTrees values within a reasonable range and see which yields the best results. They can then repeat the test over narrower and narrower ranges until they have found the best value.
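A minimal sketch of such a search, again assuming scikit-learn: a coarse grid over a wide range, followed by a finer grid centered on the coarse winner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Coarse pass over a wide, reasonable range of tree counts
coarse = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      {"n_estimators": [50, 100, 200, 400]}, cv=5)
coarse.fit(X, y)
best = coarse.best_params_["n_estimators"]

# Finer pass in a narrower window around the coarse winner
fine = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    {"n_estimators": [max(10, best - 25), best, best + 25]}, cv=5)
fine.fit(X, y)
print(fine.best_params_)
```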
Models with more features will often require more trees to achieve optimal results. The ideal number of trees can also be affected by the tree depth parameter, which dictates how many levels each decision tree in the model can have. A model with a lower tree depth may require more trees to fully learn from the data. Most models will fall between 50 and 400 trees, but some simpler models may require fewer.
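To account for this interaction, the parameter search can cover both settings at once; a sketch under the same scikit-learn assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Shallower trees (smaller max_depth) often need a larger tree count
# to learn as much from the data as fewer, deeper trees would.
grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    {"n_estimators": [50, 100, 200, 400],
                     "max_depth": [2, 3, 5]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```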