In machine learning, there are currently debates about what an explanation or explainable model is and what is necessary for a given purpose. This post details the concepts of explanation and interpretation to help clarify the difference between the two; discusses how, although interpretation is preferable, explanation is the only option for many machine learning techniques; and then describes a simple technique for producing explanations of unsupervised machine learning methods such as clustering and anomaly detection.

One pair of definitions from the academic literature splits the field into interpretability and explainability. An interpretation is a set of understandable reasons why a model made a particular decision, which means that the model itself must in some way be sufficiently understandable to interpret. Examples of interpretable models include decision trees and some limited applications of linear regression. An interpretation may, for example, point to the particular piece of training data that caused the model to make the decision. An explanation, by contrast, is a set of understandable reasons that rationalize or justify a model's decision but that may or may not be related to why the model actually made it. That is, an explanation may point to a particular piece of training data, and that piece of training data may or may not have had anything to do with the decision the model made. Examples of tools that produce explanations include LIME, Google’s What-If Tool, and IBM’s AIF360.
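To make the distinction concrete, here is a minimal sketch of what such an explanation tool produces, using LIME with a scikit-learn random forest on the Iris dataset (the dataset and model are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
X, y = data.data, data.target

# An opaque (non-interpretable) classifier.
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)

# Explain a single prediction. The result is a rationalization built from a
# local surrogate model, not a readout of the forest's internal reasoning.
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(exp.as_list())  # (feature condition, weight) pairs for this one prediction
```

The returned feature/weight pairs rationalize the single prediction; they come from a local surrogate model rather than from the forest's actual decision path, which is exactly the interpretation-versus-explanation gap described above.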

Using an interpretable model is essential if one wants to understand why a decision was actually made. The difference between an interpretable model and an explainable model can be illuminated with a simple example. Suppose a person buys a car. An explanation, or rationalization, for the decision might be that the old car was becoming unreliable, the new car was in the person's price range, and the new car had all the features and properties they desired. However, if we were able to interpret the internals of the person's thought process, the real reasons the person bought the car were that they were feeling jealous of a friend's car and that the salesperson used effective emotional leverage.

Unfortunately, some deployed machine learning models are built on opaque, black-box techniques that are hard or impossible to interpret, so rationalizations (explanations) are the only available option for understanding their decisions. Further, the existing tools for explaining opaque models are designed for supervised learning. How can we apply them to unsupervised learning?

Obtaining rationalizations for unsupervised learning techniques turns out to be straightforward: once the unsupervised technique has run, its outputs can be appended to the original data as a new feature or target. First, consider an anomaly detection method that assigns each training case (which may be a feature vector of some sort) a score indicating how anomalous that case is. This anomaly score is treated as the label for each case, as if it were the output of a supervised learning system, and the relabeled data is run through the explanation tool.

The same idea applies to hard or soft clustering. Hard clustering is easy: each case belongs to exactly one cluster, so the cluster ID can be used as the new label and, as before, the data can be run through an explanation tool. In soft clustering, each case may belong to any number of clusters, potentially in a fractional manner. Here the cluster memberships can be one-hot encoded, with each cluster treated as its own label as if it were the output of a multilabel classification system, and these labels can then be used in conjunction with the original features in the explanation tool. A sketch of all three cases appears below.
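Here is a minimal sketch of this label-appending pattern using scikit-learn and LIME. The datasets, models, and thresholds are illustrative choices; because LIME needs a prediction function, the sketch fits an ordinary supervised model on the derived labels and explains that model, and any comparable explanation tool could be substituted.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import IsolationForest, RandomForestClassifier, RandomForestRegressor
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from lime.lime_tabular import LimeTabularExplainer

data = load_wine()
X = data.data

# --- Case 1: anomaly detection -> the anomaly score becomes a regression target.
iso = IsolationForest(random_state=0).fit(X)
anomaly_score = iso.score_samples(X)            # one score per training case

reg = RandomForestRegressor(random_state=0).fit(X, anomaly_score)
reg_explainer = LimeTabularExplainer(X, feature_names=data.feature_names, mode="regression")
reg_exp = reg_explainer.explain_instance(X[0], reg.predict, num_features=5)
print(reg_exp.as_list())                        # features that rationalize the anomaly score

# --- Case 2: hard clustering -> the cluster ID becomes a classification label.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_id = kmeans.labels_

clf = RandomForestClassifier(random_state=0).fit(X, cluster_id)
clf_explainer = LimeTabularExplainer(
    X,
    feature_names=data.feature_names,
    class_names=[f"cluster {c}" for c in range(3)],
    mode="classification",
)
clf_exp = clf_explainer.explain_instance(X[0], clf.predict_proba, num_features=5)
print(clf_exp.as_list())                        # features that rationalize the cluster assignment

# --- Case 3: soft clustering -> each cluster becomes its own binary label (multilabel).
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
membership = gmm.predict_proba(X)               # fractional membership in each cluster
soft_labels = (membership > 0.5).astype(int)    # one-hot-style binary label per cluster
# Each column of soft_labels can now be explained separately, exactly as in Case 2.
```

Fitting a surrogate supervised model on the derived labels is one simple way to give the explanation tool the prediction function it expects; if the unsupervised method itself exposes a predict function (as KMeans does), that can be wrapped and explained directly instead.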