Introduction
Classification and Regression trees (CART) are a non-parametric decision tree learning technique that produces either classification or regression trees, dependent on whether variable is categorical or continuous. In this context, our focus is primarily on regression, with our goal being to predict a continuous output variable.

Mode of Operation:

The CART algorithm builds a binary tree where each non-leaf node splits the dataset into exactly two subsets repeatedly. Each of the root nodes represents a single input variable (x) and a split point on that variable. Essentially, the dataset is split into number of trees, depending on the criteria of splitting. The criteria could either be: Entropy, Gini or Variance. The splitting is done till the terminal node of the tree is reached.

Process of Building the Trees

Image description

  1. Feature Selection Entails the evaluation of the features of the data to identify that which best splits the data. The selection of the ideal input variable and the specific split is chosen using a greedy algorithm to minimize the cost function such as the mean squared error.
  2. Binary Splitting
    Upon selection of the best feature, a binary split is created in the data to two child nodes

  3. Recursive Tree Building
    The process is ongoing until a stopping criterion is met, such as the minimum number of samples in a node, or the maximum tree depth.

  4. Tree Pruning
    Upon building of the full tree, pruning begins. It entails examination of the tree sections to identify branches that can be removed without a significant loss in prediction accuracy. The simplest pruning approach involves working through each leaf node in the tree, while evaluating the effect of removing it using a hold-out test set. Leaf nodes are removed when there is a drop in the overall cost function on the entire test set.

Application of the CART Algorithm
There are diverse applications of the CART algorithm, attributed to its ability to handle both the classification and regression problems, coupled by the transparent nature of decision trees. This provides valuable insights and predictions to the different domains.

Image description

In the healthcare sector, the importance of timely and accurate diagnosis cannot be underscored. This facilitates the prediction of the likelihood of a patient having a particular disease based on the symptoms and test results. The CART algorithm facilitates determining the risk of patients developing complications post operation, based on factors like age, surgery type and pre-existing conditions. From a financial standpoint, the CART algorithm facilitates the prediction of the creditworthiness of customers based on variables like debt ratios, employment status and income.