
User Decision Guidance with Selective Explanation Presentation
from Explainable-AI

Yosuke Fukuchi1 and Seiji Yamada2,3. 1Faculty of Systems Design, Tokyo Metropolitan University, Tokyo, Japan. 2Digital Content and Media Sciences Research Division, National Institute of Informatics, Tokyo, Japan. 3The Graduate University for Advanced Studies, SOKENDAI, Tokyo, Japan. This work was supported in part by JST CREST Grant Number JPMJCR21D4 and JSPS KAKENHI Grant Number JP24K20846.
Abstract

This paper addresses the challenge of selecting explanations for XAI (Explainable AI)-based Intelligent Decision Support Systems (IDSSs). IDSSs have shown promise in improving user decisions by presenting XAI-generated explanations along with AI predictions, and advances in XAI have made it possible to generate a variety of such explanations. However, how IDSSs should select explanations to enhance user decision-making remains an open question. This paper proposes X-Selector, a method for selectively presenting XAI explanations. It enables IDSSs to strategically guide users to an AI-suggested decision by predicting the impact of different combinations of explanations on a user's decision and selecting the combination that is expected to minimize the discrepancy between the AI suggestion and the user's decision. We compared the efficacy of X-Selector with two naive strategies (all possible explanations and explanations only for the most likely prediction) and two baselines (no explanation and no AI support). The results suggest the potential of X-Selector to guide users to AI-suggested decisions and improve task performance when AI accuracy is high.

I Introduction

Intelligent Decision Support Systems (IDSSs) [1], empowered by Artificial Intelligence (AI), have the potential to help users make better decisions by introducing explainability into their support. An increasing number of methods have been proposed for achieving explainable AIs (XAIs) [2], and the development of large language models (LLMs) has also made it possible to generate various post-hoc explanations that justify AI predictions. Previous studies have integrated such XAI methods into IDSSs and shown their effectiveness in presenting explanations along with AI predictions in diverse applications [3, 4].

Now that IDSSs can have a variety of explanation candidates, a new question arises as to which explanations an IDSS should provide in dynamic interaction. Explanation is a complex cognitive process [5]. Although XAI explanations can potentially guide users to better decisions, they also risk negatively affecting explainees' decisions. Various factors, including explanation uninterpretability [6, 7], information overload [8, 9], and contextual inaccuracy [10], can affect users and thus degrade decision-making performance. A subtle difference in the nuance of a linguistic explanation can also change its impact and sometimes mislead user decisions, depending on the context, the status of the task, and the cognitive and psychological state of the users. Conversely, we can expect IDSSs to be greatly enhanced if they can strategically select explanations that are likely to lead users to better decisions while taking the situation into account.

To address the question of how IDSSs can select explanations to improve user decisions, this paper proposes X-Selector, a method for dynamically selecting which explanations to provide along with an AI prediction. The main characteristic of X-Selector is that it predicts how explanations will affect user decision-making in each trial and attempts to guide users to an AI-suggested decision on the basis of the prediction results. The design of guiding users through explanation selection is inspired by libertarian paternalism [11], the idea of steering people's choices toward better ones while preserving the autonomy of their decision-making.

This paper also reports user experiments that simulated stock trading with an XAI-based IDSS. In a preliminary experiment, we compared two naive but common strategies—ALL (providing all possible explanations) and ARGMAX (providing only explanations for the AI’s most probable prediction)—against baseline scenarios providing no explanations or no decision support. The results suggest that the ARGMAX strategy works better with high AI accuracy, and ALL is more effective when AI accuracy is lower, indicating that the strategy for selecting explanations affects user performance. In the second experiment, we compared the results of explanations selected by X-Selector with ARGMAX and ALL. The results indicate the potential of X-Selector’s selective explanations to more strongly lead users to suggested decisions and to outperform ARGMAX when AI accuracy is high.

II Background

II-A XAIs for deep learning models

While various methods such as Fuzzy Logic and Evolutionary Computing have been introduced to IDSSs, this paper targets IDSSs with Deep Learning (DL) models. IDSSs driven by DL models are capable of dealing with high-dimensional data such as visual images and are actively studied in diverse fields [12, 13, 14]. Because of the black-box nature of DL models, their explainability is also an area of active research, which can potentially benefit IDSSs.

There are various forms of explanations depending on the nature of the target AI. A common explanation for visual information processing AIs is the saliency map. The class activation map (CAM) is a widely used method for visualizing saliency maps of convolutional neural network (CNN) layers [15]. It identifies the regions of an input image that contribute most to the model's classification of the image into a particular class.

Language is also a common modality of XAI, and free-text explanations are rapidly becoming available thanks to advances in LLMs [16]. LLMs can generate post-hoc explanations for AI predictions. Here, post-hoc means that the explanations are generated after the AI's decision-making process has occurred, as opposed to intrinsic methods that generate explanations as an integral part of that process [2].

II-B Human-XAI interaction

The theme of this study involves how to facilitate users' appropriate use of AI. Avoiding human over- and under-reliance on an AI is a fundamental problem of human-AI interaction [17]. Here, over-reliance is a state in which a human overestimates the capability of an AI and blindly follows its decisions, whereas under-reliance is a state in which a human underestimates an AI and fails to make use of it even though it can perform well.

Although explanation is generally believed to help people use AI appropriately by providing transparency into AI predictions, previous studies suggest that XAI explanations do not always work positively [18]. Maehigashi et al. demonstrated that presenting AI saliency maps has different effects on user trust in an AI depending on the task difficulty and the interpretability of the saliency map [6]. Herm revealed that the type of XAI explanation strongly influences users' cognitive load, task performance, and task time [9]. Panigutti et al. conducted a user study with an ontology-based clinical IDSS and found that users were influenced by the explanations despite their low perceived quality [19]. These results suggest potential risks of explanations triggering under-reliance or, conversely, leading users to blindly follow explanations from an IDSS even when the conclusion drawn from them is incorrect.

This study aims to computationally predict how explanations affect user decisions in order to avoid misleading users and to encourage better decisions by selecting explanations. Work by Wiegreffe et al. [16] shares a similar concept with this study. They propose a method for evaluating explanations generated by LLMs by predicting human ratings of their acceptability. This approach is pivotal in understanding how users perceive AI-generated explanations. However, our study diverges by focusing on the behavioral impacts of these explanations on human decision-making. We are particularly interested in how explanations can alter the decisions made by users receiving an IDSS's support, rather than just their perceptions of the explanations. Another relevant study, Pred-RC [20, 21], aims to predict the effect of explanations of AI performance so that users can avoid over- and under-reliance. It dynamically predicts a user's binary judgment of whether to assign a task to the AI and selects explanations that guide the user to better assignments. X-Selector takes a further step: it predicts concrete decisions while taking the effects of explanations into account and proactively influences them to improve the performance of each decision.

III X-Selector

III-A Overview

This paper proposes X-Selector, a method for dynamically selecting explanations for AI predictions. The idea of X-Selector is that it predicts user decisions under possible combinations of explanations and chooses the best one that is predicted to minimize the discrepancy between the decision that the user is likely to make and the AI-suggested one.

III-B Algorithm

The main components of X-Selector are UserModel and $\pi$. UserModel is a model of a user who makes decision $d_u$. X-Selector uses it for user decision prediction:

\mathrm{UserModel}(\bm{c},\bm{x},d) = P(d_u = d \mid \bm{c},\bm{x}) \qquad (1)

The output of UserModel is a probability distribution over $d_u$ conditioned on $\bm{c}$ and $\bm{x}$, where $\bm{x} \in \bm{X}$ is a combination of explanations to be presented to the user, and $\bm{c}$ represents all the other contextual information, including AI predictions, task status, and user status. In this paper, we built a dataset of $(\bm{c}, \bm{x}, d_u)$ tuples and trained a machine learning model on it to implement UserModel.

In addition, X-Selector has a policy $\pi$, which considers a decision $d_{\mathrm{AI}}$ based on $\bm{c}$. This inference is done in parallel with user decision-making:

\pi(\bm{c},d) = P(d_{\mathrm{AI}} = d \mid \bm{c}) \qquad (2)

X-Selector aims to minimize the discrepancy between $d_u$ and $d_{\mathrm{AI}}$ by comparing the effect of each $\bm{x}$ on $d_u$. The selected combination $\hat{\bm{x}}$ is calculated as:

\hat{\bm{x}} = \operatorname*{argmin}_{\bm{x}} \left| E_{d \sim \mathrm{UserModel}(\bm{c},\bm{x},d)}[d] - E_{d \sim \pi(\bm{c},d)}[d] \right| \qquad (3)

To compute this, X-Selector simulates with UserModel how each $\bm{x}$ would change $d_u$ and chooses the combination that is predicted to guide $d_u$ closest to $d_{\mathrm{AI}}$.
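The following is a minimal sketch of this selection rule (Equation 3); `user_model`, `policy`, and the decision-distribution interface are hypothetical stand-ins for UserModel and $\pi$, not the authors' implementation.

```python
# Minimal sketch of X-Selector's selection rule (Equation 3).
# "user_model" and "policy" are hypothetical callables that return a mapping
# from candidate decisions d to probabilities, standing in for UserModel and pi.
import itertools
from typing import Dict, Sequence, Tuple


def expected_decision(dist: Dict[float, float]) -> float:
    """Expected value E[d] of a discrete decision distribution."""
    return sum(d * p for d, p in dist.items())


def select_explanations(context, candidates: Sequence, user_model, policy) -> Tuple:
    """Return the explanation combination minimizing |E_UserModel[d] - E_pi[d]|."""
    target = expected_decision(policy(context))  # E_{d ~ pi(c, d)}[d]
    best_combo, best_gap = (), float("inf")
    # Enumerate every subset of the candidate explanations (2^|candidates| combos).
    for r in range(len(candidates) + 1):
        for combo in itertools.combinations(candidates, r):
            predicted = expected_decision(user_model(context, combo))
            gap = abs(predicted - target)
            if gap < best_gap:
                best_combo, best_gap = combo, gap
    return best_combo
```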

III-C Implementation

III-C1 Task

We implemented X-Selector in a stock trading simulator in which users get support from an XAI-based IDSS. Figure 1 shows screenshots of the simulator. The simulation was conducted on a website. Participants were virtually given three million JPY and traded stocks for 60 days with a stock price chart, AI prediction of the future stock price, and explanations for the prediction.

In the simulation, participants checked the opening price and a price chart each day and decided whether to buy stocks with the funds they had, sell stocks they held, or hold their position. In accordance with Japan's general stock trading system, participants could trade stocks in units of 100 shares. Participants were asked to state their decision twice a day to clarify the influence of the explanations. They first decided an initial order $d'$, that is, the amount to trade based only on chart information, without the support of the IDSS. Then, the IDSS showed a bar graph indicating the output of a stock price prediction model and its explanations. We did not explicitly show $d_{\mathrm{AI}}$ in order to enhance the autonomy of users' decision-making, again inspired by libertarian paternalism [11], the idea of affecting behavior while respecting freedom of choice. However, X-Selector can easily be extended to a setting in which $d_{\mathrm{AI}}$ is shown to users by including it in $\bm{c}$ (when $d_{\mathrm{AI}}$ is always shown) or in $\bm{x}$ (when $d_{\mathrm{AI}}$ is to be shown selectively). Finally, participants input their final order $d$. After this, the simulator immediately transitioned to the next day. Positions carried over from the final day were converted into cash on the basis of the average stock price over the next five days to calculate the participants' total performance.

Figure 1: Screenshots of the trading simulator. (a) Chart. (b) Examples of StockAI's prediction and its explanations.

III-C2 StockAI

In the task, an IDSS provides the prediction of a stock price prediction model (StockAI) as user support. StockAI is a machine-learning model designed to predict the average stock price over the next five business days, and we used its prediction as the target of the explanations provided to users. StockAI predicts future stock prices on the basis of an image of a candlestick chart. Although using a candlestick chart as input does not necessarily lead to better performance than modern approaches proposed in the latest studies [22], we chose it because the saliency maps generated from the model are easier for users to understand as explanations of AI predictions. Note that the aim of this research is not to build a high-performance prediction model but to investigate the interaction between a human and an AI whose performance is not necessarily perfect.

For the implementation of StockAI, we used ResNet-18 [23], a deep-learning visual information processing model, with the PyTorch library (https://pytorch.org/). StockAI is trained in a supervised manner; it classifies the ratio of the future stock price to the opening price of the day into three classes: BULL (over +2%), NEUTRAL (between -2% and +2%), and BEAR (under -2%). The prediction results are presented as a bar graph of the probability distribution over the classes, hereafter denoted as $p$. For training, we collected historical stock data (from 2018/5/18 to 2023/5/16) of companies included in the Japanese stock index Nikkei225. We split the data by stock code, with three-quarters as the training dataset and the remainder as the test dataset. On the test dataset, the three-class accuracy of the model was 0.474, and the binary accuracy, i.e., the rate at which the sign of the expected value of the model's prediction matched that of the actual fluctuation, was 0.63.
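As an illustration of the setup described above, the following sketch labels training examples with the ±2% thresholds and attaches a three-class head to ResNet-18; dataset construction, chart rendering, and the training loop are omitted, and all names are ours rather than the authors'.

```python
# Sketch of the StockAI setup: labels from the +/-2% thresholds and a
# ResNet-18 with a three-class head whose softmax gives the distribution p.
import torch
import torch.nn as nn
from torchvision.models import resnet18

CLASSES = ["BULL", "NEUTRAL", "BEAR"]


def label_from_prices(open_price: float, next5_avg_price: float) -> int:
    """Class index from the ratio of the 5-day-ahead average price to the day's open."""
    change = (next5_avg_price - open_price) / open_price
    if change > 0.02:
        return CLASSES.index("BULL")
    if change < -0.02:
        return CLASSES.index("BEAR")
    return CLASSES.index("NEUTRAL")


model = resnet18(weights=None)  # candlestick-chart image classifier
model.fc = nn.Linear(model.fc.in_features, len(CLASSES))


def predict_p(chart_image: torch.Tensor) -> torch.Tensor:
    """chart_image: (1, 3, H, W) tensor; returns the class probabilities p."""
    model.eval()
    with torch.no_grad():
        return torch.softmax(model(chart_image), dim=-1)
```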

III-C3 Explanations

We prepared two types of explanations: saliency maps and free-text explanations. We applied CAM-based methods available in the pytorch-grad-cam package (https://github.com/frgfm/torch-cam) to StockAI and adopted Score-CAM [24] because it produced the clearest saliency maps for StockAI. Because CAM-based methods can generate a saliency map for each prediction class, three maps were acquired for each prediction. Let $\bm{x}_{\mathrm{map}}$ be the set of the acquired maps.
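A minimal sketch of generating one Score-CAM saliency map per prediction class follows, assuming the pytorch-grad-cam interface; the target layer and class indices are illustrative rather than the authors' exact configuration.

```python
# Sketch of per-class Score-CAM saliency maps, assuming the pytorch-grad-cam API.
from pytorch_grad_cam import ScoreCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget


def class_saliency_maps(model, input_tensor):
    """Return {class_index: saliency_map} for the three prediction classes."""
    target_layers = [model.layer4[-1]]  # last conv block of ResNet-18 (illustrative)
    cam = ScoreCAM(model=model, target_layers=target_layers)
    maps = {}
    for cls in range(3):  # 0: BULL, 1: NEUTRAL, 2: BEAR
        grayscale = cam(input_tensor=input_tensor,
                        targets=[ClassifierOutputTarget(cls)])
        maps[cls] = grayscale[0]  # (H, W) array scaled to [0, 1]
    return maps
```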

In addition, we created a set of free-text explanations with the GPT-4V model in the OpenAI API [25], which accepts images as input. We input a chart with a prompt asking GPT-4V to generate two explanation sentences justifying each prediction class (BULL, NEUTRAL, and BEAR), yielding six sentences in total for each chart. Let us denote this set by $\bm{x}_{\mathrm{text}}$.

As a result, three saliency maps and six free-text explanations were available for each trading day, and X-Selector considered the $2^{9}=512$ combinations of selected explanations ($\hat{\bm{x}} \subseteq \bm{x}_{\mathrm{map}} \cup \bm{x}_{\mathrm{text}}$).
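Concretely, the candidate set can be enumerated as on/off flag vectors over the nine explanations (a small sketch, not the authors' code):

```python
# Sketch: the 2^9 = 512 candidate combinations as on/off flag vectors over the
# three saliency maps and six text explanations.
from itertools import product

N_MAPS, N_TEXTS = 3, 6
flag_vectors = list(product((0, 1), repeat=N_MAPS + N_TEXTS))
assert len(flag_vectors) == 512  # one flag per explanation: 1 = show, 0 = hide
```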

III-C4 Models

Figure 2: Structure of UserModel

We implemented UserModel with a deep learning model (Figure 2). The input of UserModel is a tuple $(\bm{c},\bm{x})$. $\bm{c}$ includes four variables: date $i$, StockAI's prediction $p$, total rate $\delta$, and initial order $d'$. $i$ is a categorical variable that embeds the context of the day, such as the stock price. $p$ is a three-dimensional vector corresponding to the values in the bar graph (Fig. 1). $\delta$ is the percentage increase or decrease of the user's total assets from the initial amount. $i$ is encoded with PyTorch's Embedding module and the other variables with Linear modules, yielding 2048-dimensional vectors $(h_i, h_p, h_r, h_{d'})$.

Let us denote $\bm{x}_{\mathrm{map}}$ and $\bm{x}_{\mathrm{text}}$ as sets of triples $\{(x, cls, flag)\}$, where $x$ is the raw data of an explanation, $cls \in \{\mathrm{BULL}, \mathrm{NEUTRAL}, \mathrm{BEAR}\}$ is its prediction class, and $flag = 1$ if $x$ is to be presented and $0$ if hidden. $\bm{x}_{\mathrm{map}}$ and $\bm{x}_{\mathrm{text}}$ are also encoded into 2048-dimensional vectors $(h_{\mathrm{map}}, h_{\mathrm{text}})$:

h_{\mathrm{map}} = \sum_{(x,\,cls,\,flag) \in \bm{x}_{\mathrm{map}}} flag \cdot \left( \mathrm{CNN}(x) \odot \mathrm{ClsEmbedding}(cls) \right),
h_{\mathrm{text}} = \sum_{(x,\,cls,\,flag) \in \bm{x}_{\mathrm{text}}} flag \cdot \left( \mathrm{TextEncoder}(x) \odot \mathrm{ClsEmbedding}(cls) \right),

where $\odot$ denotes the element-wise product. $\mathrm{CNN}$ is a three-layer CNN model. For $\mathrm{TextEncoder}$, we used the E5 (embeddings from bidirectional encoder representations) model [26] with pretrained parameters (https://huggingface.co/intfloat/multilingual-e5-large).
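The following is a minimal sketch of the explanation encoders defined in the equations above, assuming integer class indices and a hypothetical item encoder that outputs 2048-dimensional vectors; the real implementation uses a three-layer CNN for maps and a pretrained E5 encoder for text.

```python
# Sketch of the explanation encoders above. "item_encoder" stands for CNN(x) or
# TextEncoder(x) and is assumed to return a 2048-d vector; classes are integer
# indices (0: BULL, 1: NEUTRAL, 2: BEAR).
import torch
import torch.nn as nn

EMB_DIM = 2048
N_CLASSES = 3


class ExplanationEncoder(nn.Module):
    def __init__(self, item_encoder: nn.Module):
        super().__init__()
        self.item_encoder = item_encoder                       # CNN or TextEncoder
        self.cls_embedding = nn.Embedding(N_CLASSES, EMB_DIM)  # ClsEmbedding

    def forward(self, items):
        """items: iterable of (x, cls, flag); returns the summed 2048-d encoding."""
        h = torch.zeros(EMB_DIM)
        for x, cls, flag in items:
            if flag:  # hidden explanations (flag == 0) contribute nothing
                h = h + self.item_encoder(x) * self.cls_embedding(torch.tensor(cls))
        return h
```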

All embedding vectors $(h_i, h_p, h_r, h_{d'}, h_{\mathrm{map}}, h_{\mathrm{text}})$ are concatenated and input to a three-layer linear model. To extract the influence of explanations, the model was trained to predict not $d$ itself but the difference $d - d'$. In our initial trial, UserModel always predicted $d - d'$ to be nearly 0 due to the distributional bias, so we added an auxiliary task of predicting whether $d = d'$ and trained the model to predict $d - d'$ only when $d \neq d'$. The expected value of $d_u$ in Equation 3 is then $P(d \neq d') \cdot (d - d') + d'$.
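As a small illustration, the expectation in Equation 3 can be recovered from the two prediction heads described above; `user_model` here is a hypothetical callable returning the change probability and the predicted difference.

```python
# Sketch of recovering E[d_u] for Equation 3 from UserModel's two heads:
# P(d != d') and the predicted difference d - d'. "user_model" is hypothetical.
def expected_user_decision(user_model, context, flags, d_initial: float) -> float:
    """E[d_u] = P(d != d') * (d - d') + d'."""
    p_change, predicted_diff = user_model(context, flags)
    return p_change * predicted_diff + d_initial
```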

$\pi$ was acquired with the deep deterministic policy gradient, a deep reinforcement learning method [27]. We simply trained $\pi$ to decide $d$ so as to maximize assets on the basis of $p$ over the training dataset. The reward for the policy is the difference in total assets between the current day and the previous day.
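For concreteness, the reward described above could be computed as follows (a sketch with illustrative variable names; the actual asset accounting may differ).

```python
# Sketch of the daily reward used to train pi: the change in total assets
# (cash plus shares valued at the day's price). Variable names are ours.
def daily_reward(cash: float, shares: int, price: float,
                 prev_cash: float, prev_shares: int, prev_price: float) -> float:
    return (cash + shares * price) - (prev_cash + prev_shares * prev_price)
```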

IV Experiments

IV-A Preliminary experiment

IV-A1 Procedure

We conducted a preliminary experiment to investigate the performance of users who were provided explanations with two naive strategies (ALL and ARGMAX). ALL shows all nine explanations available for each day, and ARGMAX selects only the explanations for StockAI's most probable prediction. To assess the contribution of the explanations, we also prepared two baselines: ONLY_PRED shows $p$ but does not provide any explanations, and in PLAIN, participants received no support from the IDSS and acted on their own.

For simulation, we chose a Japanese general trading company (code: 2768) from the test dataset on the basis of the common stock price range (1,000 - 3,000 JPY) and its high volatility compared with the other Nikkei225 companies.

Because we anticipated that the accuracy of StockAI would affect the results, we prepared two scenarios: high-accuracy and low-accuracy. We calculated the moving average of StockAI's accuracy with a window size of 60 days and chose one section for each scenario. The accuracy of StockAI in the high-accuracy scenario was 0.750, which was the highest, and that in the low-accuracy scenario was 0.333, the chance level of three-class classification.
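For illustration, the windowed accuracy used to pick the two scenarios could be computed as follows (a sketch; variable names are ours).

```python
# Sketch of the window-60 moving-average accuracy used to choose the scenarios.
import numpy as np

def moving_accuracy(correct: np.ndarray, window: int = 60) -> np.ndarray:
    """correct: 1/0 per day; returns the accuracy of each 60-day window."""
    kernel = np.ones(window) / window
    return np.convolve(correct, kernel, mode="valid")
```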

We recruited participants through Lancers (https://lancers.jp/), a Japanese crowdsourcing platform, with compensation of 220 JPY, and 336 people signed up. The participants were first provided pertinent information, and 325 consented to participate. We gave them instructions on the task along with basic explanations about stock charts and the price prediction AI. We instructed the participants to increase the given three million JPY as much as possible by trading with the IDSS's support. To motivate them, we told them that additional rewards would be given to the top performers; we did not disclose the amount of the additional rewards or the number of participants who would receive them. We asked six questions to check their comprehension of the task, and 34 participants who failed to answer correctly were excluded. After familiarization with the trading simulator, the participants traded for 60 consecutive virtual days. 242 participants completed the task (152 males, 88 females, and 2 did not answer; aged 14-77, $M=42.8$, $SD=10.1$). Table I gives details on the sample sizes.

TABLE I: Sample sizes of the preliminary experiment

                 ALL   ARGMAX   ONLY_PRED   PLAIN
High-accuracy     39     40        34        41
Low-accuracy      31     34        34        38

IV-A2 Result

Figure 3 shows the changes in the participants' performance. A conspicuous result is the underperformance of PLAIN, particularly in the high-accuracy scenario. ONLY_PRED performed well for high-accuracy but could not outperform PLAIN for low-accuracy. This suggests that presenting $p$ alone improves performance only when it is sufficiently accurate.

ALL and ARGMAX showed different results between the scenarios. For high-accuracy, ARGMAX outperformed ALL, and ALL slightly underperformed the ONLY_PRED baseline as well. This suggests that ARGMAX explanations successfully guided users to follow the prediction of StockAI, whereas ALL toned down the guidance, which worked negatively in this scenario. On the other hand, ALL outperformed ARGMAX and the baselines for low-accuracy. Interestingly, ARGMAX also outperformed the baselines, which suggests that explanations provide users with insights into the situation and AI accuracy and can contribute to better decision-making. ALL worked positively for low-accuracy by providing multiple perspectives.

Figure 3: Comparisons of total assets across conditions. (a) High-accuracy. (b) Low-accuracy. Error bands represent standard errors.

IV-B Experiment with X-Selector

IV-B1 Procedure

Figure 4: Results for the high-accuracy scenario. (a) Distribution of the correlation coefficient between $d_u$ and $d_{\mathrm{AI}}$ for each user. (b) User performance.

To evaluate X-Selector, we conducted a simulation with its selected explanations. To train UserModel, we used the data of the preliminary experiment and additional data acquired in another experiment in which explanations were randomly selected. We added these data to broaden the variety of explanation combinations in the dataset. The numbers of additional participants were 54 and 45 for high- and low-accuracy, respectively. We conducted a 4-fold cross validation for UserModel, and the correlation coefficient between the model's predictions and the ground truths was 0.429 on average ($SD=0.056$).

We recruited 97 participants. Finally, 39 and 35 participants completed the task for high-accuracy and low-accuracy, respectively (46 males, 26 females, and 2 did not answer; aged 23-64, $M=39.6$, $SD=10.0$).

To analyze the results, we compared the correlation coefficient between $d_{\mathrm{AI}}$ and $d_u$ for each participant as a measure of whether X-Selector successfully guided users to $d_{\mathrm{AI}}$, in addition to the comparison of user performance conducted in the preliminary experiment.
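For reference, the per-participant measure could be computed as follows (a sketch with hypothetical variable names).

```python
# Sketch of the per-participant guidance measure: the correlation between the
# AI-suggested decisions d_AI and the user's final orders d_u over the 60 days.
import numpy as np

def guidance_score(d_ai: np.ndarray, d_user: np.ndarray) -> float:
    return float(np.corrcoef(d_ai, d_user)[0, 1])
```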

IV-B2 Result

Figure 4 shows the distribution of the correlation coefficient between $d_{\mathrm{AI}}$ and $d_u$ in the high-accuracy condition. The results for ALL and ARGMAX are also shown for comparison. Notably, while the peaks for ALL and ARGMAX are centered around zero, X-Selector shifted the peak to the right, indicating a stronger correlation between $d_u$ and $d_{\mathrm{AI}}$ for a greater number of participants. This means that X-Selector effectively guided users to $d_{\mathrm{AI}}$ not by coercion but by presenting explanations selectively.

Figure 4 also illustrates the user performance. X-Selector generally outperformed ALL and ARGMAX, meaning that X-Selector enabled users to trade better with selective explanations. In more detail, X-Selector initially underperformed ARGMAX, but the ranking reversed on day 16. The gap narrowed once around day 39 but broadened again toward the end.

A possible reason for X-Selector's better performance is that it can predict which combination of explanations guides participants to sell or buy more shares. For example, the stock price around day 16 dropped steeply, so the IDSS needed to guide participants to reverse their position. Here, whereas ARGMAX showed explanations only for BEAR, X-Selector showed explanations for NEUTRAL as well as BEAR, which may have helped users sell more of their shares. Similarly, X-Selector also attempted to guide users to buy a moderate amount when $d_{\mathrm{AI}}$ was positive but not high by, for example, showing only a saliency map for BULL and no text explanations. Another reason is that X-Selector can overcome the ambiguity in the interpretation of $p$. $p$ reflects the momentum of the stock price in the high-accuracy scenario and should provide some insight for trading, but it was up to the participants how to translate it into an actual order. $\pi$ sometimes suggested buying shares even though NEUTRAL or BEAR was the most likely class in $p$. Thus, $p$ was poorly calibrated, but by referring to $\pi$, X-Selector could avoid misleading participants and instead lead them to more promising decisions.

On the other hand, X-Selector underperformed ARGMAX until day 16. The stock price was in an uptrend until day 14, and ARGMAX continuously presented explanations for BULL for 12 days in a row, which may have strongly guided participants to buy stocks and led to large profits. In our implementation, UserModel considers only the explanations of the current day and ignores the history of previously provided explanations, which is a target for future work.

Figure 5: Results for the low-accuracy scenario. (a) Distribution of the correlation coefficient between $d_u$ and $d_{\mathrm{AI}}$. (b) User performance.

X-Selector could not improve user performance in the low-accuracy scenario (Figure 5). Overall, the result was similar to ARGMAX and underperformed ALL. We further examined the correlation coefficient between $d_u$ and $d_{\mathrm{AI}}$. Figure 5 shows that, contrary to the high-accuracy scenario, X-Selector did not increase the score. The different results between the high- and low-accuracy scenarios indicate the possibility that participants actively assessed the reliability of the AI and autonomously decided whether to follow X-Selector's guidance. This itself highlights a positive aspect of introducing libertarian paternalism into human-AI interaction: users can potentially avoid AI failures depending on its reliability. However, this did not result in improved performance in this scenario. The lack of correlation between the score and the final asset amounts in the X-Selector condition ($r=0.048$) suggests that merely disregarding the AI's guidance does not guarantee performance improvement. A future direction for this problem is developing a mechanism that controls the strength of AI guidance and provides explanations in a more neutral way depending on AI accuracy.

V Conclusion

This paper investigated the question of how IDSSs can select explanations, and we proposed X-Selector, a method for dynamically selecting which explanations to provide along with an AI prediction. In X-Selector, UserModel predicts the effect of presenting each possible combination of explanations on a user's decision. X-Selector then selects the combination that minimizes the difference between the predicted user decision and the AI's suggestion. We applied X-Selector to a stock trading simulation with the support of an XAI-based IDSS. The results indicated that X-Selector can select explanations that effectively guide users to suggested decisions and improve performance when AI accuracy is high; they also revealed a new challenge for X-Selector in low-accuracy cases.

References

  • [1] G. Phillips-Wren, Intelligent Decision Support Systems, 02 2013, pp. 25–44.
  • [2] A. Adadi and M. Berrada, “Peeking inside the black-box: A survey on explainable artificial intelligence (xai),” IEEE Access, vol. 6, pp. 52 138–52 160, 2018.
  • [3] M. H. Lee and C. J. Chew, “Understanding the effect of counterfactual explanations on trust and reliance on ai for human-ai collaborative clinical decision making,” Proc. ACM Hum.-Comput. Interact., vol. 7, no. CSCW2, oct 2023.
  • [4] D. P. Panagoulias, E. Sarmas, V. Marinakis, M. Virvou, G. A. Tsihrintzis, and H. Doukas, “Intelligent decision support for energy management: A methodology for tailored explainability of artificial intelligence analytics,” Electronics, vol. 12, no. 21, 2023.
  • [5] T. Miller, “Explanation in artificial intelligence: Insights from the social sciences,” Artificial Intelligence, vol. 267, pp. 1–38, 2019.
  • [6] A. Maehigashi, Y. Fukuchi, and S. Yamada, “Modeling reliance on xai indicating its purpose and attention,” in Proc. Annu. Meet. of CogSci, vol. 45, 2023, pp. 1929–1936.
  • [7] ——, “Empirical investigation of how robot’s pointing gesture influences trust in and acceptance of heatmap-based xai,” in 2023 32nd IEEE Intl. Conf. RO-MAN, 2023, pp. 2134–2139.
  • [8] A. N. Ferguson, M. Franklin, and D. Lagnado, “Explanations that backfire: Explainable artificial intelligence can cause information overload,” in Proc. Annu. Meet. of CogSci, vol. 44, no. 44, 2022.
  • [9] L.-V. Herm, “Impact of explainable ai on cognitive load: Insights from an empirical study,” in 31st Euro. Conf. Info. Syst., 2023, 269.
  • [10] U. Ehsan, P. Tambwekar, L. Chan, B. Harrison, and M. O. Riedl, “Automated rationale generation: A technique for explainable ai and its effects on human perceptions,” in Proc. 24th Int. Conf. IUI, 2019, p. 263–274.
  • [11] C. R. Sunstein, Why Nudge?: The Politics of Libertarian Paternalism.   Yale University Press, 2014.
  • [12] M. Kraus and S. Feuerriegel, “Decision support from financial disclosures with deep neural networks and transfer learning,” Decision Support Systems, vol. 104, pp. 38–48, 2017.
  • [13] A. Chernov, M. Butakova, and A. Kostyukov, “Intelligent decision support for power grids using deep learning on small datasets,” in 2020 2nd Intl. Conf. SUMMA, 2020, pp. 958–962.
  • [14] C.-Y. Hung, C.-H. Lin, T.-H. Lan, G.-S. Peng, and C.-C. Lee, “Development of an intelligent decision support system for ischemic stroke risk assessment in a population-based electronic health record database,” PLOS ONE, vol. 14, no. 3, pp. 1–16, 03 2019.
  • [15] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” in Proc. IEEE Conf. CVPR, June 2016.
  • [16] S. Wiegreffe, J. Hessel, S. Swayamdipta, M. Riedl, and Y. Choi, “Reframing human-AI collaboration for generating free-text explanations,” in Proc. of the 2022 Conf. of NAACL.   ACL, Jul. 2022, pp. 632–658.
  • [17] R. Parasuraman and V. Riley, “Humans and automation: Use, misuse, disuse, abuse,” Human Factors, vol. 39, no. 2, pp. 230–253, 1997.
  • [18] H. Vasconcelos, M. Jörke, M. Grunde-McLaughlin, T. Gerstenberg, M. S. Bernstein, and R. Krishna, “Explanations can reduce overreliance on ai systems during decision-making,” Proc. ACM Hum.-Comput. Interact., vol. 7, no. CSCW1, apr 2023. [Online]. Available: https://doi.org/10.1145/3579605
  • [19] C. Panigutti, A. Beretta, F. Giannotti, and D. Pedreschi, “Understanding the impact of explanations on advice-taking: A user study for ai-based clinical decision support systems,” in Proc. of 2022 CHI.   ACM, 2022.
  • [20] Y. Fukuchi and S. Yamada, “Selectively providing reliance calibration cues with reliance prediction,” in Proc. Annu. Meet. of CogSci, vol. 45, 2023, pp. 1579–1586.
  • [21] ——, “Dynamic selection of reliance calibration cues with ai reliance model,” IEEE Access, vol. 11, pp. 138 870–138 881, 2023.
  • [22] J.-F. Chen, W.-L. Chen, C.-P. Huang, S.-H. Huang, and A.-P. Chen, “Financial time-series data analysis using deep convolutional neural networks,” in 2016 7th Intl. Conf. on CCBD, 2016, pp. 87–92.
  • [23] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. CVPR, 2016, pp. 770–778.
  • [24] H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel, and X. Hu, “Score-cam: Score-weighted visual explanations for convolutional neural networks,” in Proc. IEEE/CVF Conf. CVPR Workshops, June 2020.
  • [25] OpenAI, “Gpt-4 technical report,” 2023.
  • [26] L. Wang, N. Yang, X. Huang, B. Jiao, L. Yang, D. Jiang, R. Majumder, and F. Wei, “Text embeddings by weakly-supervised contrastive pre-training,” 2022.
  • [27] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep q-learning with model-based acceleration,” in Proc. 33rd Intl. Conf. on Machine Learning, vol. 48, 20–22 Jun 2016, pp. 2829–2838.