Although modern machine learning algorithms have established themselves as spectacular prediction performers, they often sacrifice interpretability, which is critical for many problems. To tackle this problem, we can use the Fast Interpretable Greedy-tree Sums (FIGS) algorithm, which produces concise rule-based models.
More precisely, FIGS generalizes CART so that, instead of growing a single decision tree, it grows a sum of trees simultaneously. A predetermined threshold on the total number of splits across all trees keeps FIGS interpretable.
Practical experiments have shown that FIGS delivers state-of-the-art performance on a variety of real-world datasets even when the total number of splits is strictly restricted (to no more than 20).
Both theoretical and simulation results show that FIGS overcomes a key weakness of single-tree models: when the data is generated by an additive model, FIGS disentangles the additive components into separate trees, which improves the convergence rate of the l2 generalization error.
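To make the additive-model claim concrete, here is a small illustrative sketch (the synthetic data, variable names, and the stump-summing loop are our own invention, not from the FIGS paper or the imodels library): a target that is a sum of three single-feature step functions forces a fully grown CART tree to give every combination of the terms its own leaf, at least 2^3 = 8 leaves, while a sum of three depth-1 trees fit on residuals covers the same function with only 3 splits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 3))
# additive ground truth: each feature contributes an independent step term
y = ((X[:, 0] > 0).astype(float)
     + (X[:, 1] > 0).astype(float)
     + (X[:, 2] > 0).astype(float))

# a single fully grown tree must give every combination of the three
# steps its own leaf, so it needs at least 2**3 = 8 leaves (7 splits)
single = DecisionTreeRegressor(random_state=0).fit(X, y)
n_leaves_single = single.get_n_leaves()

# three depth-1 trees, each fit to the residual left by the others,
# capture the same additive function with only 3 splits in total
pred = np.zeros(len(y))
for _ in range(3):
    stump = DecisionTreeRegressor(max_depth=1).fit(X, y - pred)
    pred += stump.predict(X)
mse_sum_of_stumps = float(np.mean((y - pred) ** 2))
```

The single tree's leaf count grows exponentially with the number of additive terms, while the sum-of-trees representation grows only linearly, which is the intuition behind the improved generalization rates.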
How does FIGS work?
FIGS extends CART, a typical greedy algorithm for growing a decision tree, so that it grows a sum of trees instead. At each iteration, FIGS may split a tree it has already started or begin a new one.
It accepts whichever option most reduces the remaining unexplained variance (or best improves another splitting criterion). To keep the trees in sync with each other, each tree is fit to the residuals that remain after summing the predictions of all the other trees.
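The loop described above can be sketched as follows. This is a simplified approximation, not the imodels implementation: "extending" a tree is modeled here by refitting it one level deeper, and the dataset and parameter choices are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(800, 2))
y = 2.0 * (X[:, 0] > 0) + np.where(X[:, 1] > 0, X[:, 1], 0.0)

trees = []  # list of (fitted tree, current depth) pairs

def sum_predict(trees, X, skip=None):
    """Sum the predictions of all trees, optionally leaving one out."""
    out = np.zeros(len(X))
    for i, (t, _) in enumerate(trees):
        if i != skip:
            out += t.predict(X)
    return out

for step in range(5):
    candidates = []
    # option 1: start a new stump on the residual of the whole sum
    resid = y - sum_predict(trees, X)
    candidates.append(("new", None,
                       DecisionTreeRegressor(max_depth=1).fit(X, resid)))
    # option 2: "extend" an existing tree by refitting it one level
    # deeper on the residual left by all the *other* trees
    for i, (t, depth) in enumerate(trees):
        resid_i = y - sum_predict(trees, X, skip=i)
        deeper = DecisionTreeRegressor(max_depth=depth + 1).fit(X, resid_i)
        candidates.append(("extend", i, deeper))

    # accept whichever option leaves the least unexplained variance
    def unexplained(cand):
        kind, i, t = cand
        return np.mean((y - sum_predict(trees, X, skip=i) - t.predict(X)) ** 2)

    kind, i, best = min(candidates, key=unexplained)
    if kind == "new":
        trees.append((best, 1))
    else:
        trees[i] = (best, trees[i][1] + 1)

final_mse = float(np.mean((y - sum_predict(trees, X)) ** 2))
```

Note how each candidate is always evaluated against the residual of all the other trees, so the trees cooperate rather than duplicate each other's work.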
FIGS bears a strong resemblance to ensemble approaches such as random forests and gradient boosting. Crucially, though, because all trees in the model are grown to compete with one another, the model can adapt efficiently to the underlying structure of the data: the shape, size, and number of trees emerge from the data itself, so you do not need to specify them manually.
Using FIGS is very similar to using scikit-learn models: first import a classifier or regressor, then call its fit and predict methods. Let us look at an example of using FIGS on a clinical dataset.
from imodels import FIGSClassifier, get_clean_dataset
from sklearn.model_selection import train_test_split
# prepare data (in this case, a sample clinical dataset)
X, y, feat_names = get_clean_dataset('csi_pecarn_pred')
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.33, random_state=42)
# fit the model
model = FIGSClassifier(max_rules=4) # initialize a model
model.fit(X_train, y_train) # fit model
preds = model.predict(X_test) # discrete predictions: shape is (n_test, 1)
preds_proba = model.predict_proba(X_test) # predicted probabilities: shape is (n_test, n_classes)
# visualize the model
model.plot(feature_names=feat_names, filename='out.svg', dpi=300)
(Note: The model is used for illustration purposes.)
The fitted model has only four splits (because we specified max_rules=4). Predictions are obtained by summing the values from the leaf reached in each tree. The model achieves an accuracy of 84 percent, and a physician can use it to (i) focus on just four relevant features and (ii) carefully examine the model against his/her domain expertise. To obtain a more flexible model, you can relax the restriction on the number of rules, which will produce a larger model.
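To illustrate how the summing works at prediction time, here is a toy sketch with made-up leaf values (not taken from a fitted model): each tree routes the input to a leaf, the leaf values are added, and for classification the summed score is mapped to a probability, shown here with a logistic function, though the library's exact mapping may differ.

```python
import math

# Hypothetical two-tree FIGS-style model; the splits and leaf values
# below are invented for illustration.
def tree1(x):
    # split on feature 0 at 0.5
    return 0.6 if x[0] > 0.5 else -0.4

def tree2(x):
    # split on feature 2 at 1.0
    return 0.3 if x[2] > 1.0 else -0.2

x = [0.7, 0.1, 2.0]
score = tree1(x) + tree2(x)        # sum the leaf values: 0.6 + 0.3
prob = 1 / (1 + math.exp(-score))  # one way to map a score to a probability
label = int(prob > 0.5)
```

Because each tree contributes a single leaf value per prediction, a physician can read off exactly how much each rule added to or subtracted from the final score.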