
Emergent Explainability: Adding a causal chain to neural network inference

Adam Perrett
The University of Manchester
Manchester, UK
[email protected]
Corresponding author
Abstract

This position paper presents a theoretical framework for enhancing explainable artificial intelligence (xAI) through emergent communication (EmCom), focusing on creating a causal understanding of AI model outputs. We explore the novel integration of EmCom into AI systems, offering a paradigm shift from conventional associative relationships between inputs and outputs to a more nuanced, causal interpretation. The framework aims to revolutionize how AI processes are understood, making them more transparent and interpretable. While the initial application of this model is demonstrated on synthetic data, the implications of this research extend beyond these simple applications. This general approach has the potential to redefine interactions with AI across multiple domains, fostering trust and informed decision-making in healthcare and in various sectors where AI’s decision-making processes are critical. The paper discusses the theoretical underpinnings of this approach, its potential broad applications, and its alignment with the growing need for responsible and transparent AI systems in an increasingly digital world.

1 Introduction and background

1.1 Opportunities and challenges of explainability

Explainable artificial intelligence (xAI) remains one of the foremost challenges in the field, crucial to equitable and fair application, particularly in sensitive domains like healthcare. For example, understanding the impact of race in AI models is a common problem Huang et al. [2022]. Mitigating the potential bias that exists within data, and its effect on treatment, begins with understanding the AI model. The core issue lies in the inherent opacity of advanced AI models, especially deep learning algorithms, which operate as "black boxes". These complex models can make highly accurate predictions or decisions, but understanding the causal chain leading to their output is not straightforward. This lack of transparency raises concerns about trustworthiness, accountability, and ethics, especially when AI is employed in critical areas such as medical diagnostics, treatment planning, or patient care management. This work addresses concerns of xAI with the goal of enabling AI to be a vital and interpretable tool for clinicians and patients alike, fostering trust and enabling informed decision-making. The novel framework also has many applications beyond healthcare, with the potential to reshape the way we interact with AI.

In the healthcare sector, xAI methods face significant challenges. To address concerns of safety, equity and trustworthiness, an understanding of causality is crucial. Simpler models like decision trees trade complexity for clarity, which is essential where decision rationales matter. However, complex models using tools like LIME Ribeiro et al. [2016] and SHAP Lundberg and Lee [2017] often yield limited, sometimes misleading insights. Visualization techniques in deep learning, such as attention maps, provide helpful interpretations but lack precise correlations with neural network processes Erhan et al. [2009], Simonyan et al. [2013]. Feature importance measures, while identifying key decision factors, can suffer from bias and lack causal clarity Zhou et al. [2016], Selvaraju et al. [2017]. While these methods enhance reliability and aid compliance, they struggle with a fundamental trade-off between accuracy and explainability. This leads to technical complexities and the risk of oversimplified interpretations, underscoring the need for more refined and transparent explainability techniques in high-stakes healthcare AI applications.

This work focuses on enhancing xAI in healthcare, a key area in need of growth and equitable application. It aims to develop a novel and practical framework for AI explainability, particularly in healthcare diagnostics and patient care. The proposed research will imbue the AI model with a causal communication channel that can be used for validation and transparency of output. The expected outcomes include improved AI clarity and trust between clinicians and patients, with the potential to inform policy in AI ethics. This effort is geared towards contributing meaningful research and reshaping the way we use AI as a tool.

Figure 1: The contextualiser network receives the task ID (e.g. task = is there a dog) and passes a message to the actor network. The actor network processes the message and an input to produce an output (e.g. is there a dog + this image = yes/no). The actor is trained with supervised learning and the contextualiser is trained with reinforcement learning, as errors cannot propagate between networks.

1.2 Emergent communication

Emergent communication (EmCom) in AI Brandizzi [2023], a relatively nascent field, presents unique challenges and opportunities in the context of explainability. The principle behind EmCom is to have two or more agents exchange messages in order to solve a task; see Fig. 1. Altering the task set-up can alter the compositionality and generalisation of the generated language Mu and Goodman [2021], Lowe et al. [2020]. Current research focuses on investigating the generated language and comparing it to human language Li and Bowling [2019], Lazaridou and Baroni [2020], Lowe et al. [2019]. The research community sees the opportunity to apply this further and scale it up Chaabouni et al. [2022], highlighting the growth possible in this fruitful domain. A key shift this work aims to exploit is moving message passing from being input dependent to being task dependent. This raises the level of abstraction of the message from a label to an instruction.

1.3 Emergent explainability

The aim of this research is to utilise the mechanisms of EmCom to make neural network explainability causal. Current xAI methods reveal associative relationships between inputs and outputs but are often limited to analysing single input presentations and say little about causality. The proposed work hinges on the following principles:

  • The messages used in EmCom can be made human interpretable (e.g. text or image)

  • The message and the output share a causal relationship, as the task can only be solved with the message

  • An appropriately generalised language facilitates: transfer learning, data privatisation, distributed learning

2 Methodology

Experimental work focuses on synthetic data. Work has also been performed using the MNIST dataset, although this is still under investigation; preliminary results will be discussed, with detailed analysis to follow. Integration with human interpretable message passing is currently theoretical.

2.1 Experimental design

All possible truth tables for three inputs ($n=3$) act as the family of tasks that are trained on. This gives $2^{2^3}=256$ possible truth tables and $2^3 \times 2^{2^3}=2048$ training examples. Each truth table is allocated a random One Hot Encoding (OHE) to act as the task ID, which is passed to the contextualiser to generate a truth-table-specific message. A random OHE is used to ensure there is as little information as possible present in the task ID, so that all communicated information must be learnt during training. The actor then receives the message and a set of 3 inputs.
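
As a concrete illustration, a minimal sketch of how this task family could be enumerated is given below. It is illustrative only; the variable names and the random seed are assumptions, not the authors' implementation.

```python
# Sketch of the n = 3 task family: all truth tables, random one-hot task IDs,
# and one training example per (truth table, input row) pair.
import itertools
import numpy as np

n_inputs = 3
rows = list(itertools.product([0, 1], repeat=n_inputs))     # 2^3 = 8 input rows
tables = list(itertools.product([0, 1], repeat=len(rows)))  # 2^(2^3) = 256 truth tables

# A random one-hot encoding per table, so the task ID itself carries no exploitable structure.
rng = np.random.default_rng(0)
task_ids = np.eye(len(tables))[rng.permutation(len(tables))]

examples = [(task_ids[t], np.array(row), table[r])
            for t, table in enumerate(tables)
            for r, row in enumerate(rows)]
print(len(tables), len(examples))                            # 256 2048
```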

2.2 Agent setup

Each agent has a separate contextualiser network and actor network. Both have two hidden layers of 128 ReLU activation neurons, the only differences between them being their inputs, outputs, and initialisation. The contextualiser has as many inputs as there are tasks, as it receives an OHE for the task ID. A context layer of 32 ReLU neurons is placed after the final hidden layer; the activation of these neurons forms the message that is passed. A single dense layer then samples from this context layer to predict the expected reward of the actor, and the error between this prediction and the actual actor reward forms the reinforcement learning signal. The actor network has $3+32$ inputs (input size + context length) and 2 outputs (binary 0 and 1 for the truth table entry). It is trained using supervised cross-entropy.
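
The description above maps onto a small feed-forward setup. The following PyTorch sketch reflects the stated sizes (two hidden layers of 128, a 32-unit context layer, a reward-prediction head, and a 3 + 32 input actor); the class names and other details are assumptions rather than the authors' code.

```python
# Minimal sketch of one agent's two networks, following the sizes given above.
import torch
import torch.nn as nn

class Contextualiser(nn.Module):
    def __init__(self, n_tasks=256, hidden=128, context=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_tasks, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, context), nn.ReLU(),  # context layer: its activations are the message
        )
        self.reward_head = nn.Linear(context, 1)    # predicts the actor's expected reward

    def forward(self, task_id_ohe):
        message = self.body(task_id_ohe)
        return message, self.reward_head(message)

class Actor(nn.Module):
    def __init__(self, n_inputs=3, context=32, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs + context, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),                   # binary truth-table entry (0 / 1)
        )

    def forward(self, x, message):
        return self.net(torch.cat([x, message], dim=-1))
```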

2.3 Communication protocol

Fig. 2 shows how the training data is distributed. Each agent is allocated a unique portion of tasks for which it can act as a contextualiser, $r_c$. It can act as an actor for all of these and for an additional portion of the remaining tasks, $r_a$. The remaining tasks are unseen by the agent throughout training. There is a truth table associated with each training example. For each example, an agent which can act as a contextualiser is randomly selected; it is given as input the OHE associated with the task ID and produces the associated context. An agent which can act as an actor for this training example is then selected and given the associated message and input.
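
One way to realise this split is sketched below. The splitting and sampling logic shown (function names, equal per-agent contextualiser shares) is an assumption based on the description and Fig. 2, not the authors' exact procedure.

```python
# Illustrative allocation of tasks to agents and sampling of contextualiser/actor roles.
import random

def allocate_tasks(n_tasks, n_agents, r_a, seed=0):
    rng = random.Random(seed)
    tasks = list(range(n_tasks))
    rng.shuffle(tasks)
    per_agent = n_tasks // n_agents
    allocation = []
    for a in range(n_agents):
        contextualise = set(tasks[a * per_agent:(a + 1) * per_agent])      # unique r_c share
        remaining = [t for t in tasks if t not in contextualise]
        act_only = set(rng.sample(remaining, int(r_a * len(remaining))))   # extra actor-only share
        allocation.append({"contextualise": contextualise,
                           "act": contextualise | act_only})               # the rest stays unseen
    return allocation

def sample_roles(allocation, task, rng=random):
    # For one training example, pick a valid contextualiser and a valid actor for its task.
    speakers = [a for a, roles in enumerate(allocation) if task in roles["contextualise"]]
    listeners = [a for a, roles in enumerate(allocation) if task in roles["act"]]
    return rng.choice(speakers), rng.choice(listeners)
```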

2.4 Training

A batch size of 512 is used with an Adam optimiser and a learning rate of 0.001. Cross-entropy loss between the target and actual output is used to train the actor. Mean-squared error between the predicted and actual actor accuracy is used to train the contextualiser via reinforcement learning. When the contextualiser and actor are the same agent, gradients are allowed to flow back through the actor and into the communicated message.
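
A hedged sketch of a single training step under this scheme is shown below, reusing the Contextualiser and Actor interfaces sketched earlier. The exact reward definition (per-example correctness) and the joint optimiser are assumptions.

```python
# Sketch of one training step: supervised actor loss plus reward-prediction loss
# for the contextualiser, with gradient flow through the message only within one agent.
import torch
import torch.nn.functional as F

def train_step(contextualiser, actor, opt, task_ohe, x, target, same_agent):
    message, predicted_reward = contextualiser(task_ohe)

    # Gradients flow back through the message only when both roles belong to the same agent.
    msg_for_actor = message if same_agent else message.detach()
    logits = actor(x, msg_for_actor)
    actor_loss = F.cross_entropy(logits, target)                 # supervised actor signal

    # Reinforcement-style signal: predict the actor's realised accuracy on each example.
    correct = (logits.argmax(dim=-1) == target).float().detach()
    context_loss = F.mse_loss(predicted_reward.squeeze(-1), correct)

    opt.zero_grad()
    (actor_loss + context_loss).backward()
    opt.step()
    return actor_loss.item(), context_loss.item()

# Example optimiser matching the stated hyperparameters (batch size 512, Adam, lr = 0.001):
# opt = torch.optim.Adam(list(contextualiser.parameters()) + list(actor.parameters()), lr=1e-3)
```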

Figure 2: An agent can act in three different capacities during training. It can contextualise, meaning it takes the task ID and produces a message. It can behave as the actor, meaning it uses the message and the input to produce the target output during these training instances. It can be an actor for examples it contextualises and for ones other agents contextualise. There is then a subset of data that is never seen by the agent; this is used to evaluate the generalisation of the generated language.

2.5 Evaluation

The key evaluation metric is the performance of an agent on unseen truth tables. The only way to achieve higher than random chance accuracy is for the communicated messages to contain generalised information about the structure of the family of tasks. This way, even though an agent has never performed the task, it can understand the message and perform the task adequately.
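
For completeness, the held-out evaluation could look like the sketch below (reusing the networks sketched earlier); 50% accuracy is chance level for the binary output. The function name and example format are assumptions.

```python
# Accuracy on (task ID, input, target) examples drawn from truth tables the agent never trained on.
import torch

@torch.no_grad()
def unseen_accuracy(contextualiser, actor, unseen_examples):
    correct = 0
    for task_ohe, x, target in unseen_examples:
        message, _ = contextualiser(task_ohe.unsqueeze(0))
        pred = actor(x.unsqueeze(0), message).argmax(dim=-1)
        correct += int(pred.item() == target)
    return correct / len(unseen_examples)
```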

2.6 Work in Progress - Human interpretability

This methodology is currently under development. The principle is to alter the message passing between agents to be human interpretable. This can take the form of an image related to the task, such as the features to look for (e.g. if the task is looking for a dog, the communicated message could contain a nose and floppy ears). This could be injected into the convolutional kernels or concatenated downstream. It is also possible for the message passing to be human language. This would require the message to contain all relevant information needed to complete the task.
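
As one speculative illustration of the "concatenated downstream" option, the sketch below conditions a convolutional actor on a fixed-size message embedding. This is an assumed design for discussion, not an implemented part of this work.

```python
# Speculative sketch: an image actor conditioned on a message by concatenating the
# message vector with the flattened convolutional features.
import torch
import torch.nn as nn

class MessageConditionedActor(nn.Module):
    def __init__(self, message_dim=32, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),       # -> 16 * 4 * 4 = 256 features
        )
        self.head = nn.Sequential(
            nn.Linear(256 + message_dim, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, image, message):
        return self.head(torch.cat([self.features(image), message], dim=-1))
```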

3 Experiments and Results

3.1 Examining the effect of actor overlap

The variable $r_a$, controlling the ratio of examples for which an agent is not a contextualiser but can be an actor, is explored in Fig. 3. As the amount of 'information sharing' increases, so does the classification accuracy on unseen data. This demonstrates that interaction between agents is key to generating a generalised communication. This is most evident when there is no sharing and performance does not rise above random chance.

Figure 3: Classification accuracy on unseen data across all agents for the $n=3$ truth table size. The number of examples that are contextualised by an agent is fixed, and the parameter $r_a$, determining the number of examples that an agent is an actor for but not a contextualiser, is investigated. As the number of 'taught' examples increases, the generalisation of the communication also increases, tending towards 100% testing accuracy.

4 Discussion and Limitations

This work forms a position paper displaying the potential for more abstract message passing between agents than in the current literature. This has been demonstrated for communicating information about a family of tasks and has been shown to generalise to tasks an agent was not trained on.

There is still much work to do in imbuing this communication channel with human interpretable messages, but this is a stepping stone in that direction and a signpost for future endeavours.

With the novel transition in this work of EmCom from communicating a label to communicating task information, the language operates at a higher level of abstraction and is more focused on function. It has already been shown that languages can emerge which generalise beyond training data Mu and Goodman [2021], ?. This naturally leads to networks which can generalise across tasks, given the appropriate context. This can allow training data to remain proprietary whilst still sharing learnt information with another network. It can enable distributed learning, as many agents can go off and learn different tasks and return with their understanding. Obvious extensions to transfer learning exist due to the task independent nature of the language. There is also reason to believe that embedding such causal grounding in LLMs will help alleviate problems such as hallucinations and mistakes with trivial requests like simple maths. This is because the framework removes the sole dependence on the statistical regularities of text and instead repositions the text as a medium through which the task can be understood. When this condition is met, the initial contextualiser in the EmCom framework can be replaced with a human, and the actor can then be harnessed as a generalised task solver.

References

  • Huang et al. [2022] Jonathan Huang, Galal Galal, Mozziyar Etemadi, and Mahesh Vaidyanathan. Evaluation and mitigation of racial bias in clinical machine learning models: Scoping review. JMIR Med Inform, 10(5):e36388, May 2022. ISSN 2291-9694. doi: 10.2196/36388. URL https://medinform.jmir.org/2022/5/e36388.
  • Ribeiro et al. [2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. KDD ’16, page 1135–1144, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450342322. doi: 10.1145/2939672.2939778. URL https://doi.org/10.1145/2939672.2939778.
  • Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf.
  • Erhan et al. [2009] Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. University of Montreal, 1341(3):1, 2009.
  • Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
  • Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2921–2929, 2016.
  • Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017.
  • Brandizzi [2023] Nicolo’ Brandizzi. Toward more human-like ai communication: A review of emergent communication research. IEEE Access, 11:142317–142340, 2023. doi: 10.1109/ACCESS.2023.3339656.
  • Mu and Goodman [2021] Jesse Mu and Noah Goodman. Emergent communication of generalizations. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 17994–18007. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/9597353e41e6957b5e7aa79214fcb256-Paper.pdf.
  • Lowe et al. [2020] Ryan Lowe, Abhinav Gupta, Jakob N. Foerster, Douwe Kiela, and Joelle Pineau. On the interaction between supervision and self-play in emergent communication. CoRR, abs/2002.01093, 2020. URL https://arxiv.org/abs/2002.01093.
  • Li and Bowling [2019] Fushan Li and Michael Bowling. Ease-of-teaching and language structure from emergent communication. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/b0cf188d74589db9b23d5d277238a929-Paper.pdf.
  • Lazaridou and Baroni [2020] Angeliki Lazaridou and Marco Baroni. Emergent multi-agent communication in the deep learning era. CoRR, abs/2006.02419, 2020. URL https://arxiv.org/abs/2006.02419.
  • Lowe et al. [2019] Ryan Lowe, Jakob N. Foerster, Y-Lan Boureau, Joelle Pineau, and Yann N. Dauphin. On the pitfalls of measuring emergent communication. CoRR, abs/1903.05168, 2019. URL http://arxiv.org/abs/1903.05168.
  • Chaabouni et al. [2022] Rahma Chaabouni, Florian Strub, Florent Altché, Eugene Tarassov, Corentin Tallec, Elnaz Davoodi, Kory Wallace Mathewson, Olivier Tieleman, Angeliki Lazaridou, and Bilal Piot. Emergent communication at scale. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=AUGBfDIV9rL.