We are pleased to announce the release of Alibi Explain v0.9.0, featuring support for global feature importance calculation using two complementary methods—Permutation Importance and Partial Dependence Variance.
Model-agnostic global feature importance
While users of Alibi Explain have long had access to global explanation methods such as Partial Dependence and Accumulated Local Effects, these methods require close scrutiny of each feature of interest to draw conclusions about its effect on the model predictions. In this release we build on that work and introduce methods for calculating global feature importances: a single numerical value per feature that summarizes how important that feature is to the predictive model of interest.
The benefit of summarizing feature importance in a single number is that practitioners can quickly see which features their model deems more or less important before turning to other methods for deeper interpretability. Moreover, both Permutation Importance and Partial Dependence Variance are fully model-agnostic and apply to any model, which makes them a more flexible alternative to methods that only apply to certain classes of models (e.g. feature importance for tree ensembles).
Permutation Importance
Permutation importance is a classic method for calculating feature importance: it measures how much model performance degrades when the values within a feature column are randomly permuted, which gives it a simple and intuitive interpretation. The method does require ground-truth labels, which makes it most applicable during the model development process. The Alibi Explain implementation is flexible and allows users to supply multiple, custom metrics to compute the importances with respect to, so the effects can also be analyzed for domain-specific metrics.
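The snippet below is a minimal sketch of the intended workflow, assuming the alibi.explainers.PermutationImportance interface accepts a black-box prediction function, a list of score functions (here accuracy and F1) and feature names; the dataset and model are purely illustrative stand-ins for the employee-attrition example in Figure 1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from alibi.explainers import PermutationImportance, plot_permutation_importance

# Illustrative data standing in for the employee-attrition task.
X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Wrap the prediction function and the metrics with respect to which
# the importances should be computed.
explainer = PermutationImportance(predictor=model.predict,
                                  score_fns=['accuracy', 'f1'],
                                  feature_names=feature_names)

# Importances are estimated on held-out data with ground-truth labels.
exp = explainer.explain(X=X_test, y=y_test)

# Bar chart of importances per metric, similar to Figure 1.
plot_permutation_importance(exp)
```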
As an example, Figure 1 shows the output of permutation feature importance for a random forest classifier trained to predict which employees are most likely to leave the company. The bar charts show the importance scores with respect to both the “accuracy” and “F1” metrics, and as we can see, the feature “satisfaction_level” is the most important.
Figure 1: Permutation feature importance for a random forest classifier trained to predict the probability of employees leaving a company. In this case the feature importance is calculated with respect to the accuracy and F1 metrics.
For more information check out the documentation.
Partial Dependence Variance
This method of computing feature importances builds on top of Partial Dependence (PD) plots. The intuition is that the most important features are those whose PD curves vary the most across the feature's range. Since PD plots are model-agnostic, the derived feature importances are too, which gives a standardized procedure for quantifying feature importance for any model. By combining the PD plots with the PD variance feature importances, the user can gain a much more thorough understanding of how each feature impacts their model. Furthermore, PD variance extends to quantifying the interaction strength between pairs of features, giving a deeper understanding of which features interact with each other inside the model.
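As a rough sketch, assuming the alibi.explainers.PartialDependenceVariance interface accepts a black-box prediction function, feature names and a method argument selecting either per-feature importances or pairwise interactions (the regressor and data below are purely illustrative), usage might look like:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

from alibi.explainers import PartialDependenceVariance, plot_pd_variance

# Illustrative regression data and model.
X, y = make_regression(n_samples=2000, n_features=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# The explainer only needs a black-box prediction function.
explainer = PartialDependenceVariance(predictor=model.predict,
                                      feature_names=feature_names)

# Per-feature importances derived from the variance of the PD curves.
exp_importance = explainer.explain(X=X, method='importance')

# Pairwise interaction strengths derived from two-feature PD plots.
exp_interaction = explainer.explain(X=X, method='interaction')

# Visualize both sets of results.
plot_pd_variance(exp_importance)
plot_pd_variance(exp_interaction)
```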
For more information check out the documentation.
When to use which method?
Given the two new feature importance methods, a natural question is when to pick one over the other. The key point is that even though both methods produce a single scalar value per feature, the interpretation of these values differs. Partial Dependence Variance quantifies how much of the model’s output variance is explained by each feature. Permutation Importance, on the other hand, quantifies how much model performance degrades when a feature’s values are randomly permuted. These insights are complementary, since the latter captures not only main feature effects but also feature interactions, and it may be useful to consider both when performing a thorough analysis of model behaviour.