Fast interpretable greedy-tree sums (FIGS)

September 1, 2022

How does FIGS work?

FIGS extends CART, a standard greedy algorithm for growing a decision tree, so that it grows a sum of trees at the same time. At each iteration, FIGS can add a split either to any tree it has already started or to a new tree.

FIGS greedily picks whichever rule most reduces the total unexplained variance (or an alternative splitting criterion). To keep the trees in sync with each other, each tree is fit to predict the residuals that remain after summing the predictions of all the other trees.
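In symbols, the residual rule can be written as follows. The notation is introduced here for illustration only and is an assumption, not taken from the text above: K trees f_1, ..., f_K, training samples (x_i, y_i), and r_i^(k) for the residual that tree k is asked to predict.

```latex
r_i^{(k)} = y_i - \sum_{t \neq k} f_t(x_i),
\qquad
\hat{y}_i = \sum_{t=1}^{K} f_t(x_i)
```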

FIGS closely resembles ensemble approaches such as random forest and gradient boosting. An important difference is that all of its trees are grown to compete with each other, so the model can adapt efficiently to the underlying structure of the data. You do not need to guide the growth process manually, because the shape, size and number of trees emerge naturally from the data itself.
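The growth procedure described in this section can be sketched in a few lines of plain Python. This is a simplified illustration written for this post, not the imodels implementation: the function names (fit_figs_sketch, best_split and so on) are hypothetical, the splitting criterion is the reduction in squared error, and there is no stopping rule other than max_rules.

```python
from statistics import fmean


def predict_tree(node, x):
    """Follow one tree down to a leaf and return that leaf's value."""
    while "feat" in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["value"]


def predict_sum(trees, x):
    """The model's prediction is the sum of one leaf value per tree."""
    return sum(predict_tree(t, x) for t in trees)


def leaves(node, idxs, X):
    """Yield (leaf, indices of the training rows that reach it)."""
    if "feat" not in node:
        yield node, idxs
        return
    j, thr = node["feat"], node["thr"]
    yield from leaves(node["left"], [i for i in idxs if X[i][j] <= thr], X)
    yield from leaves(node["right"], [i for i in idxs if X[i][j] > thr], X)


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, r, idxs):
    """Return (gain, feature, threshold) of the best split, scored on residuals r."""
    best = (0.0, None, None)
    base = sse([r[i] for i in idxs])
    for j in range(len(X[0])):
        for thr in sorted({X[i][j] for i in idxs})[:-1]:
            left = [r[i] for i in idxs if X[i][j] <= thr]
            right = [r[i] for i in idxs if X[i][j] > thr]
            gain = base - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, j, thr)
    return best


def fit_figs_sketch(X, y, max_rules):
    """Greedily add up to max_rules splits across a growing sum of trees."""
    trees, all_idx = [], list(range(len(X)))
    for _ in range(max_rules):
        candidates = []
        # Candidates: every leaf of every existing tree, plus the root of a new tree.
        for k, tree in enumerate(trees + [{"value": 0.0}]):
            others = [t for m, t in enumerate(trees) if m != k]
            # Tree k is scored on what the other trees leave unexplained.
            r = [y[i] - predict_sum(others, X[i]) for i in all_idx]
            for leaf, idxs in leaves(tree, all_idx, X):
                gain, j, thr = best_split(X, r, idxs)
                if j is not None:
                    candidates.append((gain, k, leaf, idxs, j, thr, r))
        if not candidates:
            break
        _, k, leaf, idxs, j, thr, r = max(candidates, key=lambda c: c[0])
        if k == len(trees):  # the winning split starts a new tree
            trees.append(leaf)
        # Turn the chosen leaf into an internal node with two new leaves.
        left_value = fmean(r[i] for i in idxs if X[i][j] <= thr)
        right_value = fmean(r[i] for i in idxs if X[i][j] > thr)
        leaf.pop("value")
        leaf.update(feat=j, thr=thr,
                    left={"value": left_value}, right={"value": right_value})
    return trees


# Toy additive data: y = 2 * x0 + x1
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 2, 3]
trees = fit_figs_sketch(X, y, max_rules=2)
print(len(trees), [predict_sum(trees, x) for x in X])
```

On this toy data the sketch spends its second rule on a new tree rather than on a deeper split of the first one, because that reduces the residual error more. The two one-split trees together reproduce all four targets, whereas a single tree would need three splits to separate the four combinations.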

Using FIGS

FIGS is used in much the same way as scikit-learn models: import a classifier or a regressor, then call its fit and predict methods. Here is an example of using FIGS on a clinical dataset.

from imodels import FIGSClassifier, get_clean_dataset
from sklearn.model_selection import train_test_split

# prepare data (in this case, a sample clinical dataset)
X, y, feat_names = get_clean_dataset('csi_pecarn_pred')
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# fit the model
model = FIGSClassifier(max_rules=4)  # initialize a model
model.fit(X_train, y_train)  # fit model
preds = model.predict(X_test)  # discrete predictions: shape is (n_test,)
preds_proba = model.predict_proba(X_test)  # predicted probabilities: shape is (n_test, n_classes)

# visualize the model
model.plot(feature_names=feat_names, filename='out.svg', dpi=300)

(Note: The model is used for illustration purposes.)

The model above has only four splits, because max_rules=4 caps the number of splits at four. A prediction is obtained by summing the values of the leaves that a sample reaches, one leaf from each tree. The model reaches an accuracy of 84 percent, and a physician can (i) make predictions from just four relevant features and (ii) inspect the model carefully to check that it matches his or her domain expertise. To obtain a more flexible model, you can lift the restriction on the number of rules, which results in a larger model.
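To make the summing step concrete, here is a minimal sketch of how a prediction could be assembled by hand. Everything in it is hypothetical: the feature names (sign_a, sign_b, sign_c), the tree shapes and the leaf values are made up and are not the fitted clinical model. It also assumes that the summed value is clipped to [0, 1] to act as a probability and is then thresholded at 0.5.

```python
def leaf_value(tree, patient):
    """Walk one tree using yes/no answers and return the value of the leaf reached."""
    while "feature" in tree:
        tree = tree["yes"] if patient[tree["feature"]] else tree["no"]
    return tree["value"]


# Hypothetical model: two trees with three splits in total (made-up values).
TREES = [
    {"feature": "sign_a",
     "no": {"value": 0.1},
     "yes": {"feature": "sign_b",
             "no": {"value": 0.3},
             "yes": {"value": 0.6}}},
    {"feature": "sign_c",
     "no": {"value": 0.0},
     "yes": {"value": 0.3}},
]


def predict_proba_sketch(trees, patient):
    """Sum one leaf value per tree, then clip to [0, 1] (assumed behaviour)."""
    total = sum(leaf_value(t, patient) for t in trees)
    return min(1.0, max(0.0, total))


def predict_sketch(trees, patient, threshold=0.5):
    """Discrete prediction: 1 if the clipped sum exceeds the threshold, else 0."""
    return int(predict_proba_sketch(trees, patient) > threshold)


patient = {"sign_a": True, "sign_b": False, "sign_c": True}
print(predict_proba_sketch(TREES, patient), predict_sketch(TREES, patient))
```

For this patient the first tree contributes 0.3 and the second tree contributes 0.3, so the summed score is 0.6 and the discrete prediction is 1. Reading a prediction off the model is just a matter of following one path per tree and adding the leaf values.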

‍

Reference links:

https://csinva.io/imodels/figs.html

https://arxiv.org/abs/2201.11931

https://www.researchgate.net/publication/358232755_Fast_Interpretable_Greedy-Tree_Sums_FIGS

https://en.x-mol.com/paper/article/1488229291775614976

https://github.com/Yu-Group/imodels-experiments/blob/master/readme.md

‍
