Explainable artificial intelligence: unveiling what machines are learning

Explain or perish!

Every day, each person generates an enormous quantity and diversity of data. Combined with growing computational power and the democratisation of access to data, this has created ideal conditions for the widespread use of machine learning models in many contexts and areas (e.g., law, finance, healthcare, commerce, politics, biometrics and multimedia). Moreover, while the first generation of machine learning algorithms required some prior knowledge of data processing and feature engineering and was used to solve easy tasks, the current trend focuses on solving harder tasks, using multiple data sources and extraordinarily complex models such as deep learning (Figure 1) or model ensembles. Interestingly, this newer generation of machine learning is the one achieving human-level performance in various areas (e.g., cancer detection in biomedical imaging). Although this is promising, it is now important to address the limitations of these high-performing models in order to see the bigger picture.

Perhaps the most pertinent limitation of these complex models is that they work as black boxes (i.e., their internal logic is hidden from the user) that receive data and output results without properly explaining their decisions in a human-comprehensible way. As a result, in certain highly regulated areas (e.g., healthcare, law or finance), we end up training high-performing models that will never be used in a real-world context, since this lack of transparency leads to a deficit of trust that may jeopardise the acceptance of such technologies by domain experts (Figure 2). Besides, there is an inherent tension between machine learning performance (predictive accuracy) and explainability: usually, the best-performing methods (e.g., deep learning) are the least transparent, and the ones providing a clear explanation (e.g., decision trees) are less accurate.

Regulation is also steering the debate in this direction. For instance, if we look at the EU-GDPR, we conclude that transparency, for the sake of ethics and fairness, is one of its priorities. An interesting example is the EU-GDPR’s Article 22, which states that individuals “have the right not to be subject to a decision based solely on automated processing”. This statement has interesting implications: on the one hand, it means that users of such technologies have the right to know the mechanisms behind them and, on the other hand, that they have the right not to be subject to the decisions made by these algorithms [2].

At first sight, this may seem a complicated list of problems for the research and technology community to address. However, we should see it as an opportunity to develop more transparent algorithms, capable of being understood by domain experts and, ultimately, of being deployed in a real-world context.

Unboxing the black box

There are several ways to study interpretability in machine learning: working with models which are interpretable by design (inherently interpretable) or using post hoc (and model agnostic) interpretation methods. Depending on the purpose of the study or the field of application, both pathways are reasonable. Besides, it is also important to understand if we are dealing with structured (e.g., tables) or unstructured (e.g., images) data to choose the best interpretability approach [3].

For instance, let’s assume that we are dealing with structured data (e.g., a set of features and their corresponding labels). Probably the easiest way to achieve interpretability is to use algorithms that generate interpretable models, such as linear regression, logistic regression, decision trees or decision rules. These algorithms usually respect important properties such as linearity (the association between features and labels is modelled linearly), monotonicity (the relation between a feature and the label always goes in the same direction over the entire range of the feature) or feature interactions (e.g., multiplying two features to generate a new one). On the other hand, by preference or by necessity, you can fit complex models to your data and then use an external (model-agnostic) interpretation method to understand what the model is learning. The great advantage of this approach is flexibility, since you can detach yourself from the type of model and focus on performance itself [3]. Post hoc methods worth reading about include the partial dependence plot (PDP) [4], the individual conditional expectation (ICE) plot [5], permutation feature importance [6], local interpretable model-agnostic explanations (LIME) [7] and Shapley values [8]. Figure 3 presents an example for the LIME algorithm.
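
To make the post hoc idea concrete, here is a minimal Python sketch of exact Shapley values for a single prediction. It treats the model purely as a function, which is what makes the method model-agnostic. The toy model, its feature names and the choice of replacing "absent" features with a baseline value are all assumptions made for illustration; they do not come from the cited works, and the exhaustive enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value
    (an illustrative choice; other ways of "removing" a feature exist).
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` keep their real value.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight given to coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical black-box model with an interaction between the first two features.
def toy_model(features):
    age, dose, noise = features
    return 2.0 * age + age * dose + 0.0 * noise


sample = [1.0, 3.0, 5.0]
reference = [0.0, 0.0, 0.0]
contributions = shapley_values(toy_model, sample, reference)
print(contributions)
```

The contributions add up to the difference between the prediction for the sample and the prediction for the baseline, and the feature the model ignores receives a contribution of zero. Packages such as SHAP rely on approximations rather than this brute-force enumeration.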

Now assume that we are dealing with image data and a deep convolutional neural network (CNN) that performs image classification. In this case, we aim to understand not only the type of features being learned by the CNN but also the pixels that contribute most to the final decision. Once again, for this type of computer vision application, we generally use post hoc methods to deconstruct the black box. Current state-of-the-art methods take diverse approaches to obtaining explanations: SmoothGrad generates gradient-based sensitivity maps by adding noise to the input [9]; Class Activation Mapping (CAM) [10] and Gradient-weighted Class Activation Mapping (Grad-CAM) [11] map the predicted class score back to the last convolutional layer to produce class activation maps; deconvolutional network (DeConvNet) [12], guided backpropagation (Guided BackProp) [13] (Figure 4) and PatternNet [14] try to map all the activations of the network back into the pixel input space to discover which patterns originally caused a given activation in the feature maps; and Layer-Wise Relevance Propagation (LRP) [16], Deep Taylor Decomposition (DTD) [15] (Figure 5), Deep Learning Important Features (DeepLIFT) [17], Integrated Gradients [18] and PatternAttribution [14] aim to evaluate how signal dimensions contribute to the output through the layers.
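
As a rough illustration of the first of these ideas, the sketch below averages the gradient of a class score over several noisy copies of the input, which is the core of SmoothGrad. Everything here is an assumption made for the sake of a small, self-contained example: the "image" is a flattened list of four pixels, the class score is a hypothetical function rather than a CNN, and gradients are estimated with finite differences instead of backpropagation.

```python
import math
import random


def numerical_gradient(score, x, step=1e-4):
    """Central finite-difference gradient of a scalar score w.r.t. each input."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += step
        down[i] -= step
        grad.append((score(up) - score(down)) / (2 * step))
    return grad


def smoothgrad(score, x, n_samples=50, sigma=0.1, seed=0):
    """Average the gradient over noisy copies of the input."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        # Perturb every input value with Gaussian noise before differentiating.
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        for i, g in enumerate(numerical_gradient(score, noisy)):
            total[i] += g
    return [t / n_samples for t in total]


# Hypothetical class score over a flattened 4-pixel "image" (not a real CNN).
def class_score(pixels):
    return math.tanh(3.0 * pixels[0] - 2.0 * pixels[1]) + 0.0 * pixels[2] + 0.5 * pixels[3]


image = [0.2, 0.1, 0.9, 0.4]
sensitivity = smoothgrad(class_score, image)
print(sensitivity)
```

Each entry of the result plays the role of one pixel in a sensitivity map: the pixel that the score ignores gets a value of zero, while the others are ranked by how strongly the score reacts to them. With a real network, a framework's automatic differentiation would replace `numerical_gradient`.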

Following the current trend in this topic, there are already off-the-shelf packages and frameworks that can be used by less experienced practitioners. Some interesting ones are lime, SHAP, iNNvestigate, keras-vis and Captum.


From the last section, one can infer that there is no shortage of interpretability methods in the literature. Nonetheless, most of them share a problem: the explanations they provide are often difficult for a human to understand, which defeats their purpose. This gap between current interpretability methods and the explanation sciences (which describe the way humans explain and understand) is described in a recent paper by Mittelstadt et al. [19]. As the authors point out, the available interpretability methods are more focused on scientific modelling than on explanation giving. Therefore, we need either a different approach to interpretability or some form of meta-explainability, i.e., explaining the explanations provided by interpretability or explainable artificial intelligence (xAI) methods.

One possible way to overcome the need for meta-explainability is to provide different types of explanations for the same case, maximising the probability that at least one of them is understandable to the explanation consumer. For example, consider a chest X-ray and a machine learning model that analyses it and classifies it according to the disease detected. For a radiologist, it would probably make sense to explain the model’s decision with a saliency map highlighting the regions of the image that were relevant to the decision. For a general clinician, on the other hand, it would probably make more sense to provide a textual description of what is happening in the image (since general clinicians are not as familiar with the images as radiologists are). If our system provided both explanations, both explanation consumers would be satisfied, and our problem would be solved.

Another possible approach is to completely rethink the way interpretability methods provide explanations, bringing it closer to the way humans think and behave. An idea explored in the paper by Mittelstadt et al. [19] is to build explanations in a contrastive way: instead of explaining the features that led to the decision, it may make sense to explain the decision based on what made the case different from “normal”. Moreover, explanations should be adapted to the subjective interests of the explanation consumer and should be interactive, allowing them to be improved iteratively.
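
One simple way to picture a contrastive explanation is sketched below in Python: each feature of the case is reset, one at a time, to its value in a "normal" reference case, and the explanation keeps only the features whose reset changes the decision. This is an illustrative reading of the idea, not a method taken from [19]; the classifier, the feature names and the reference values are all hypothetical.

```python
def contrastive_explanation(predict, case, normal, names):
    """List the features whose reset to the "normal" value changes the decision."""
    original = predict(case)
    reasons = []
    for i, name in enumerate(names):
        probe = list(case)
        probe[i] = normal[i]  # pretend this single feature had been normal
        if predict(probe) != original:
            reasons.append(f"{name} is {case[i]} rather than the usual {normal[i]}")
    return original, reasons


# Hypothetical rule-based classifier; thresholds are made up for illustration.
def toy_classifier(features):
    lesion_size_mm, opacity, age = features
    return "abnormal" if lesion_size_mm > 8 or opacity > 0.7 else "normal"


names = ["lesion size (mm)", "opacity", "age"]
typical_case = [3, 0.2, 50]  # hypothetical "normal" reference
patient = [12, 0.3, 64]      # hypothetical case to explain

label, reasons = contrastive_explanation(toy_classifier, patient, typical_case, names)
print(f"Predicted {label} instead of normal because: " + "; ".join(reasons))
```

The sketch only tests one feature at a time, so it returns no reason when several features must change together to flip the decision; a fuller treatment would search over combinations of features.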

In summary, a lot of work has been done in recent years to develop interpretability methods able to explain the decisions of machine learning models. Nonetheless, most of this work was developed from a purely computer science perspective, without taking into account the insights from the fields of medicine, psychology, philosophy or sociology, and it therefore falls short in terms of understandability for humans.

INESC TEC work on xAI

Since 2017, our group at INESC TEC has been actively working on xAI. We have been addressing several relevant topics related to the interpretability domain, namely:

· The generation of explanations to satisfy all types of consumers;

· Analysing what type of features state-of-the-art anti-spoofing models focus on when classifying images as bona fide (real) or attack;

· The use of interpretability as an intermediate step to improve the existing machine learning models.

The generation of explanations to satisfy all types of consumers has received a lot of our attention. As previously pointed out, different people have different preferences and domain knowledge, and that also translates to the kind of explanations that they would like to receive.

In the past, we have proposed two machine learning models (one deep neural network, and one ensemble of models) [20, 21] that embedded in their architectures properties that increased their inherent interpretability (e.g., monotonicity, sparsity), and from which we were able to extract different types of explanations, or as we called it, complementary explanations. These complementary explanations consist of rule-based textual explanations and case-based visual explanations. Both types of explanations are provided for the same sample, which means that our approach increases the chance of satisfying all kinds of explanation consumers.
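
The following Python sketch gives a feel for what two complementary explanations of the same sample could look like: a textual one built from the rules that fired, and a case-based one that points to the most similar known case with the same prediction. It is a toy illustration only and does not reproduce the models of [20, 21]; the rules, thresholds, feature names and training cases are all hypothetical, and a real case-based visual explanation would show the retrieved image rather than an identifier.

```python
import math

# Hypothetical monotone rules: (feature name, threshold). A rule fires when the
# feature is above its threshold, and each fired rule is one vote for "positive".
RULES = [("lesion_size", 8.0), ("irregularity", 0.6), ("density", 0.5)]

# Hypothetical labelled training cases: (case id, feature values, label).
TRAINING_CASES = [
    ("case_A", {"lesion_size": 11.0, "irregularity": 0.8, "density": 0.4}, "positive"),
    ("case_B", {"lesion_size": 3.0, "irregularity": 0.2, "density": 0.3}, "negative"),
    ("case_C", {"lesion_size": 9.0, "irregularity": 0.7, "density": 0.7}, "positive"),
]


def fired_rules(sample):
    return [(name, thr) for name, thr in RULES if sample[name] > thr]


def predict(sample):
    # Sparse, monotone decision: at least two fired rules means "positive".
    return "positive" if len(fired_rules(sample)) >= 2 else "negative"


def rule_based_explanation(sample):
    """Textual explanation listing the rules that fired for this sample."""
    label = predict(sample)
    fired = fired_rules(sample)
    if not fired:
        return f"Predicted {label}: no rule fired."
    clauses = [f"{name} = {sample[name]} exceeds {thr}" for name, thr in fired]
    return f"Predicted {label}: " + " and ".join(clauses) + "."


def case_based_explanation(sample):
    """Id of the closest training case that shares the predicted label."""
    label = predict(sample)
    same_label = [case for case in TRAINING_CASES if case[2] == label]

    def distance(case):
        return math.sqrt(sum((sample[k] - case[1][k]) ** 2 for k in sample))

    return min(same_label, key=distance)[0]


query = {"lesion_size": 10.0, "irregularity": 0.9, "density": 0.2}
print(rule_based_explanation(query))
print("Most similar known case:", case_based_explanation(query))
```

Both explanations are derived from the same prediction, so a consumer who prefers reading a rule and one who prefers comparing with a known case are each served by the same system.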

The second topic on which we have been focusing our efforts is the understanding of anti-spoofing machine learning models. Part of our team has worked in forensics and biometrics for years, trying to improve the machine learning models used in these fields. However, an interpretability study to understand what these models really learn remained largely undone. In a very recent work, we explored interpretability techniques to further assess the quality of the machine learning models used for anti-spoofing. With that goal in mind, we defined several properties that we consider ideal for an anti-spoofing model, and we verified their fulfilment through thorough experimental work. For instance, one of these ideal properties is that the explanations for a sample should be similar regardless of whether that sample was present in the training set or only in the test set (Figure 8). By obeying this rule, the model shows itself to be robust and coherent.
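
A property of this kind can be checked numerically once the two explanations are available. The Python sketch below compares two flattened saliency maps of the same sample with cosine similarity and flags them as consistent when the similarity exceeds a threshold. The similarity measure, the threshold and the example maps are assumptions chosen for illustration; they are not the evaluation protocol used in the work described above.

```python
import math


def cosine_similarity(map_a, map_b):
    """Cosine similarity between two flattened saliency maps."""
    dot = sum(a * b for a, b in zip(map_a, map_b))
    norm_a = math.sqrt(sum(a * a for a in map_a))
    norm_b = math.sqrt(sum(b * b for b in map_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero map has no direction to compare
    return dot / (norm_a * norm_b)


def explanations_agree(map_seen, map_unseen, threshold=0.9):
    """Check whether two explanations of the same sample are similar enough.

    `threshold` is an arbitrary illustrative value, not one taken from the article.
    """
    return cosine_similarity(map_seen, map_unseen) >= threshold


# Hypothetical flattened saliency maps for the same image, produced by a model
# that saw the image during training and by one that only met it at test time.
saliency_when_in_train = [0.9, 0.1, 0.0, 0.8]
saliency_when_in_test = [0.8, 0.2, 0.1, 0.7]
saliency_inconsistent = [0.0, 0.9, 0.8, 0.1]

print(explanations_agree(saliency_when_in_train, saliency_when_in_test))
print(explanations_agree(saliency_when_in_train, saliency_inconsistent))
```

In this toy setting the first pair of maps highlights the same regions and passes the check, while the second pair focuses on different regions and fails it.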

Finally, we are also investigating ways of leveraging interpretability to improve machine learning models. One of our most recent works tackles precisely that. We took advantage of the fact that the interpretability saliency maps of a well-trained model help us localise the regions of interest of an image, and fine-tuned an existing classification model with these same saliency maps to improve its performance as a medical image retrieval system. By doing that, we were able to improve the retrieval process considerably, producing results very much in line with the ones obtained by an experienced radiologist [23].

Most of the interpretability work developed at INESC TEC and mentioned in this article was performed under the umbrella of CLARE (Computer-Aided Cervical Cancer Screening), an FCT-financed project in which we are working in collaboration with Fraunhofer Portugal. Moreover, since April we have also been involved in TAMI (Transparent Artificial Medical Intelligence), where we are collaborating with the company First Solutions, Fraunhofer Portugal, ARS–Norte and Carnegie Mellon University to improve the transparency of algorithms in medical decision support systems.

If you have become interested in this topic, we invite you to read the survey on machine learning interpretability developed by our research group [24] and, over the coming months and years, to stay tuned! We will keep devoting our efforts to the xAI field and will do our best to tackle the limitations of today’s interpretability and machine learning models.

Short bio of authors

This article was written by INESC TEC researchers Wilson Silva and Tiago Gonçalves in August 2020.

Wilson Silva is a PhD Candidate in Electrical and Computer Engineering at Faculdade de Engenharia da Universidade do Porto (FEUP) and a research assistant at INESC TEC, where he is associated with the Visual Computing & Machine Intelligence (VCMI) and Breast research groups. He holds an integrated master (BSc + MSc) degree in Electrical and Computer Engineering obtained from FEUP in 2016. His main research interests include Machine Learning and Computer Vision, with a particular focus on Explainable Artificial Intelligence and Medical Image Analysis.

Tiago Gonçalves received his MSc in Bioengineering (Biomedical Engineering) from Faculdade de Engenharia da Universidade do Porto (FEUP) in 2019. Currently, he is a PhD Candidate in Electrical and Computer Engineering at FEUP and a research assistant at the Centre for Telecommunications and Multimedia of INESC TEC with the Visual Computing & Machine Intelligence (VCMI) Research Group. His research interests include machine learning, explainable artificial intelligence (in-model approaches), computer vision, medical decision support systems, and machine learning deployment.


[1] S. M. McKinney, M. Sieniek, V. Godbole, J. Godwin, N. Antropova, H. Ashrafian, T. Back, M. Chesus, G. C. Corrado, A. Darzi, et al., “International evaluation of an AI system for breast cancer screening,” Nature, vol. 577, no. 7788, pp. 89–94, 2020.

[2] M. E. Kaminski, “The right to explanation, explained,” Berkeley Tech. LJ, vol. 34, p. 189, 2019.

[3] C. Molnar, Interpretable Machine Learning. 2019. https://christophm.github.io/interpretable-ml-book/.

[4] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of statistics, pp. 1189–1232, 2001.

[5] A. Goldstein, A. Kapelner, J. Bleich, and E. Pitkin, “Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation,” Journal of Computational and Graphical Statistics, vol. 24, no. 1, pp. 44–65, 2015.

[6] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[7] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’ Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144, 2016.

[8] L. S. Shapley, “A value for n-person games,” Contributions to the Theory of Games, vol. 2, no. 28, pp. 307–317, 1953.

[9] D. Smilkov, N. Thorat, B. Kim, F. Viégas, and M. Wattenberg, “SmoothGrad: removing noise by adding noise,” arXiv preprint arXiv:1706.03825, 2017.

[10] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2921–2929, 2016.

[11] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE international conference on computer vision, pp. 618–626, 2017.

[12] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European conference on computer vision, pp. 818–833, Springer, 2014.

[13] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for simplicity: The all convolutional net,” arXiv preprint arXiv:1412.6806, 2014.

[14] P.-J. Kindermans, K. T. Schütt, M. Alber, K.-R. Müller, D. Erhan, B. Kim, and S. Dähne, “Learning how to explain neural networks: PatternNet and PatternAttribution,” arXiv preprint arXiv:1705.05598, 2017.

[15] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K.-R. Müller, “Explaining nonlinear classification decisions with deep Taylor decomposition,” Pattern Recognition, vol. 65, pp. 211–222, 2017.

[16] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PloS one, vol. 10, no. 7, p. e0130140, 2015.

[17] A. Shrikumar, P. Greenside, and A. Kundaje, “Learning important features through propagating activation differences,” arXiv preprint arXiv:1704.02685, 2017.

[18] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” arXiv preprint arXiv:1703.01365, 2017.

[19] B. Mittelstadt, C. Russell, and S. Wachter, “Explaining explanations in AI,” in Proceedings of the conference on fairness, accountability, and transparency, pp. 279–288, 2019.

[20] W. Silva, K. Fernandes, M. J. Cardoso, and J. S. Cardoso, “Towards complementary explanations using deep neural networks,” in Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pp. 133–140, Springer, 2018.

[21] W. Silva, K. Fernandes, and J. S. Cardoso, “How to produce complementary explanations using an ensemble model,” in 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, IEEE, 2019.

[22] A. F. Sequeira, W. Silva, J. R. Pinto, T. Gonçalves, and J. S. Cardoso, “Interpretable biometrics: Should we rethink how presentation attack detection is evaluated?,” in 2020 8th International Workshop on Biometrics and Forensics (IWBF), pp. 1–6, IEEE, 2020.

[23] W. Silva, A. Poellinger, J. S. Cardoso, and M. Reyes, “Interpretability-guided content-based medical image retrieval,” in MICCAI, 2020. To be published.

[24] D. V. de Carvalho, E. M. Pereira, and J. S. Cardoso, “Machine learning interpretability: A survey on methods and metrics,” Electronics (section: Artificial Intelligence), 2019.

INESC TEC is a private non-profit research institution, dedicated to scientific research and technological development.
