
Individual and Team Trust Preferences for Robotic Swarm Behaviors

Elena M. Vella¹, Daniel A. Williams², Airlie Chapman¹, Chris Manzie²

¹Elena M. Vella and Airlie Chapman are with the Department of Mechatronic Engineering, The University of Melbourne, 3052 Parkville, Australia ([email protected], [email protected]). ²Daniel A. Williams and Chris Manzie are with the Department of Electrical Engineering, The University of Melbourne, 3052 Parkville, Australia ([email protected], [email protected]).
Abstract

Trust between humans and multi-agent robotic swarms may be analyzed using human preferences. These preferences are expressed by an individual as a sequence of ordered comparisons between pairs of swarm behaviors. An individual’s preference graph can be formed from this sequence. In addition, swarm behaviors may be mapped to a feature vector space. We formulate a linear optimization problem to locate a trusted behavior in the feature space. Extending to human teams, we define a novel distinctiveness metric using a sparse optimization formulation to cluster similar individuals from a collection of individuals’ labeled pairwise preferences. The case of anonymized unlabeled pairwise preferences is also examined to find the average trusted behavior and minimum covariance bound, providing insights into group cohesion. A user study was conducted, with results suggesting that individuals with similar trust profiles can be clustered to facilitate human-swarm teaming.

I Introduction

As teams of robots venture further into home and work settings, interactions between humans and robotic systems will become more common. An important element of these interactions is the notion of human trust, an intuitive belief that the object of trust will help to achieve goals even in uncertain situations [1]. Excessive trust in a system can lead to over-reliance, with potentially negative implications for performance and safety. Conversely, humans with low trust may avoid interacting with the system altogether. It is vital to understand how the design and operation of robotic systems influences human trust perceptions, in order to facilitate more efficient and successful interactions. We therefore focus on learning and predicting human trust in robotic swarms.

Much attention has focused on trust in human-automation interaction, initially in human interactions with industrial machines [1] but later expanding to other settings [2]. Trust in human-robot interaction is an active area of research [3], stemming from scenarios that require a delegation of autonomy (e.g. collaborative lifting, search and rescue). This is particularly salient in human-swarm interaction (HSI), as it is difficult for a human to interact with each swarm agent simultaneously. Delegation of autonomy can only happen when the human trusts the swarm sufficiently (regarding proper task execution and safety, among other factors [3]). Conventional approaches to measuring human trust in swarms use a scalar value [4] or multi-dimensional quantity [3] to represent trust; however, the nebulous, idiosyncratic nature of trust may complicate attempts to compare values between individuals and across time. In addition, trust measures may focus on certain characteristics and omit others, prescribing a model of trust that may not suit all participants. For greater comparability when analyzing trust, we may instead focus on human preferences and use preference learning techniques [5]. Inspired by the algorithm presented in [6], we may use preferential reasoning to understand the influence of swarm behaviors on a human observer’s trust level. We can further extend the concept of individual trust to groups [7], thus providing a way of understanding how to team individuals based on similar trust profiles and of analyzing conflicting preferences’ impact on a team’s overall trust dynamic.

We extend concepts developed in [5, 6, 7] to population-based measurements of trust. Our contributions include a distinctiveness metric describing how an individual’s trust towards a swarm differs from that of others in a population. We determine this metric by analyzing differences in individuals’ trust preferences as perturbations from a common reference, and quantifying the respective divergence for each individual. By selecting those with distinctiveness below a threshold, we may cluster individuals with similar trust preferences. We also consider the concept of group cohesion regarding the distribution of trust preferences when preferences are anonymized and aggregated from a population. To this end, assuming each individual's optimal trust is drawn from a normal distribution, the unlabeled preference data can yield bounds on the distribution covariance. These in turn can serve as a measure of group cohesion.

The remainder of this paper studies how we may analyze human trust preferences for robotic swarm behaviors and describe a group’s trust preferences by synthesizing individuals’ preferences. We first define key terminology and explain the swarm behaviors considered. In §II we formulate a preference learning system and introduce Valma for feature vector extraction. We develop a trust preference model for an individual in §III, extending this to a group in §IV. Details of a user study appear in §V, and concluding remarks are made in §VI.

Notation

For $x\in\mathbb{R}^{q}$, we define the 1-norm of a vector $\|x\|_1=\sum_{i=1}^{q}|x_i|$ and the 2-norm of a vector $\|x\|_2=\left(\sum_{i=1}^{q}|x_i|^2\right)^{1/2}$. The indicator function $\mathbb{I}_{\mathcal{A}}(\cdot)$ for a set $\mathcal{A}\subseteq\mathbb{R}$ is defined as $\mathbb{I}_{\mathcal{A}}(x)=1$ if $x\in\mathcal{A}$ and 0 otherwise. The identity matrix $I$ is a diagonal matrix with 1 on the main diagonal and 0 elsewhere. A pair of elements $(x_i^1,x_i^2)$ appears uniquely in a sequence set $\mathcal{S}$ if and only if $x_i^1$ is preferred to $x_i^2$ for all possible set combinations $i=1,\dots,p$. A graph $\mathcal{G}=(V,E)$ is defined by a vertex set $V$ of cardinality $n$ and an ordered edge set $E\subseteq V\times V$ of cardinality $m$. If an edge exists from vertex $v_j$ to vertex $v_i$ it is expressed as $(v_j,v_i)\in E$. A directed acyclic graph is a graph with no directed cycles. We denote the standard normal cumulative density function as $\Phi(x)$; it evaluates the probability that a random variable $Y\sim\mathcal{N}(0,1)$ takes a value less than or equal to $x\in(-\infty,\infty)$. Similarly, the normal cumulative density function $\Phi_{\mu,\sigma}(x)$ describes the probability that the random variable $X\sim\mathcal{N}(\mu,\sigma^2)$ takes a value less than or equal to $x$, denoted by $p(X\leq x)$. The cumulative density functions are related by $p(X\leq x)=\Phi_{\mu,\sigma}(x)=\Phi\left((x-\mu)/\sigma\right)$; the inverse mapping is given by

$$\Phi^{-1}\left(p(X\leq x)\right)=\frac{x-\mu}{\sigma}. \qquad (1)$$
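As a quick numerical check of this relationship (a minimal sketch, assuming SciPy is available; the values are illustrative and not part of the formulation):

```python
from scipy.stats import norm

mu, sigma, x = 1.0, 2.0, 2.5

# p(X <= x) for X ~ N(mu, sigma^2), computed via the standard normal CDF
p = norm.cdf((x - mu) / sigma)

# Applying the inverse mapping (1) recovers the standardized value (x - mu) / sigma
assert abs(norm.ppf(p) - (x - mu) / sigma) < 1e-12
```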

II Problem Formulation

We seek to formulate a preference learning problem in order to study how a swarm’s behavior might influence a human observer’s trust and, more generally, trust for a group.

II-A Swarm Behaviors

In this work we consider a swarm composed of ground-based robotic vehicles [8]. Each agent’s motion can be represented using unicycle dynamics commanded by a multi-agent controller.

To elicit human trust responses, we have implemented five swarm behaviors using the robotic platform: 1) cyclic pursuit, in which agents traverse a circle [9], 2) herding, in which agents move from one location to another while maintaining collective cohesion [10], 3) leader following, in which a leader moves while trailed by all other agents [11], 4) square formation, in which agents relocate to the vertices of a square shape [11], 5) line formation, in which the agents relocate to form a line [11]. The respective trajectories are depicted in Figure 1.

Figure 1: Swarm behavior trajectory traces.

II-B Preference Data and Preference Graph

Given a set of $n_b$ swarm behaviors $B$, we wish to collect a set of $m_b$ pairwise comparison preferences $\mathcal{E}_B=\{(v_i^1,v_i^2)\,|\,v_i^1,v_i^2\in B,\ i\in\{1,\dots,m_b\}\}$ that answer the general question “Comparing these two swarm behaviors, which do you trust more?”, presupposing an individual’s intuitive definition of trust. For the $i$th comparison, we record the more trusted behavior as $v_i^1$ (the first element in the pair) and the less trusted behavior as $v_i^2$ (the second element).

We may visualize an individual’s pairwise comparison preferences using a preference graph. Consistent preferences may be depicted as acyclic graphs, yielding a partial order over preferences, while inconsistent preferences generate a cyclic graph. For the $k$th individual, the directed preference graph $\mathcal{G}_k=(V_k,\mathcal{E}_k)$ is defined by a vertex set $V_k\subseteq B$ containing the compared behaviors, and an ordered edge set $\mathcal{E}_k\subseteq V_k\times V_k\subseteq\mathcal{E}_B$ indicating preferences among pairs of vertices. Here, a directed edge $(v_i,v_j)\in\mathcal{E}_k$ from vertex $v_i$ to vertex $v_j$ indicates behavior $v_i$ is preferred to behavior $v_j$. Note there is at most one edge between each pair of vertices, i.e. we do not consider self-contradictions.
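For illustration, an individual's preference graph and its consistency can be checked with standard graph tooling (a sketch using the networkx library; the behavior labels are hypothetical):

```python
import networkx as nx

# One individual's pairwise preferences: edge (u, v) means u is trusted more than v
prefs = [('leader', 'herding'), ('herding', 'cyclic'), ('leader', 'cyclic')]
G = nx.DiGraph(prefs)

# Consistent preferences form a directed acyclic graph and admit a partial order
assert nx.is_directed_acyclic_graph(G)
print(list(nx.topological_sort(G)))  # ['leader', 'herding', 'cyclic']
```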

We assume that each preference is labeled as belonging to a given individual when constructing the individual’s directed preference graph. In a wider population for which demographic information cannot be collected, we instead collect anonymized unlabeled trust preferences such that we cannot distinguish between individuals. In this case we may define a population preference graph $\bar{\mathcal{G}}=(\bar{V},\bar{\mathcal{E}},W)$, with the vertex set $\bar{V}=\bigcup_{k=1}^{p}V_k$, the ordered set of edges $\bar{\mathcal{E}}\subseteq\mathcal{E}_B$, and the associated edge weight set $W$. The edge and weight sets are formed by enumerating each pairwise preference as $a_{ij}=\sum_{k=1}^{p}\mathbb{I}_{\mathcal{E}_k}\left((v_i,v_j)\right)$ for all $v_i,v_j\in\bar{V}$. The edge set $\bar{\mathcal{E}}$ contains the edge $(v_i,v_j)$ if $a_{ij}-a_{ji}>0$ (i.e. $v_i$ is more preferred than $v_j$) or if $a_{ij}=a_{ji}\neq 0$ and $i<j$ (i.e. $v_i$ and $v_j$ are equally preferred). The associated edge weight element of $W$ is $w_k=a_{ij}/(a_{ij}+a_{ji})$ for the $k$th preference pair $(v_k^1,v_k^2)=(v_i,v_j)$; these weights capture preference variability among individuals in the population.
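The aggregation above can be sketched as follows (an illustrative implementation of the counting rule, not our survey code; the behavior labels are hypothetical):

```python
from collections import defaultdict

def population_graph(preference_lists):
    """Aggregate individuals' labeled edge lists into the weighted
    population preference graph of Sec. II-B."""
    a = defaultdict(int)  # a[(vi, vj)] = number of times vi was preferred to vj
    for prefs in preference_lists:
        for v1, v2 in prefs:
            a[(v1, v2)] += 1

    edges, weights = [], {}
    for (vi, vj), count in list(a.items()):
        rev = a.get((vj, vi), 0)
        # Keep (vi, vj) if vi is strictly more preferred, or break the tie
        # by vertex order when a_ij = a_ji != 0.
        if count > rev or (count == rev and vi < vj):
            edges.append((vi, vj))
            weights[(vi, vj)] = count / (count + rev)
    return edges, weights

# Three individuals comparing 'herding' and 'cyclic' (two prefer herding)
edges, weights = population_graph([[('herding', 'cyclic')],
                                   [('herding', 'cyclic')],
                                   [('cyclic', 'herding')]])
print(edges, weights)  # [('herding', 'cyclic')] {('herding', 'cyclic'): 0.666...}
```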

We interpret $\bar{\mathcal{G}}$ as an approximation of the average preference of the population: each vertex corresponds to a compared behavior, and the directed weighted edges indicate the likelihood that the majority of individuals make the corresponding preference transition. A partially ordered set of instance preferences can be abstracted from the preference graph and is amenable to an ordinal optimization problem [12]. We instead locate the preference instances within a feature vector space and optimize within this space.

II-C Feature Vector Extraction

Figure 2: Overview of data flows in the Visual and Longitudinal Motion Autoencoder model architecture; note that the blocks convey the dimensionality of the signals while the arrows indicate performed transformations.

As we cannot reason about preferences towards behaviors not already in the graph, we may consider mapping each behavior to a point $x$ in a feature space $\mathcal{X}$. In §V we present individuals with videos of swarm behaviors. We seek to work directly with these stimuli, to capture information about the swarm’s visual appearance and trajectory evolution encoded therein. Manual feature vector extraction from stimuli has been demonstrated for preference learning in [13]; however, doing so for videos is impractical and entails discretionary judgements regarding which features to select. Automatic feature vector extraction processes can address these two issues by removing human discretion in feature identification. One may consider dimensionality-reduction techniques for individual video frames (e.g. principal component analysis [14]); however, temporal dynamics between frames would be neglected. To overcome this issue, we have adopted a similar approach to [15] by developing a neural network-based variational autoencoder (Valma) that extracts a feature vector for a video of swarm behavior. As depicted in Figure 2, the model contains a pre-processing component (extracting frame features using the VGGNet-A computer vision model [16]) and two additional recurrent neural network components: an encoder and a decoder. The recurrent neural networks use an LSTM architecture [17] in order to learn temporal relationships between different data features. Between the two components there is a data bottleneck, with the encoded input projected into a lower-dimensional latent space.¹ By training the model to reproduce the encoder input at the decoder output, we can learn a mapping from the input space to a latent space in an unsupervised manner, and thus extract a compact feature vector $x\in\mathcal{X}$ automatically. We describe the mapping as $h:B\to\mathcal{X}$ from the behavior set to a $q$-dimensional feature space $\mathcal{X}\subseteq\mathbb{R}^{q}$.

¹The reader is referred to our repository for further details about the implementation and training of the model.
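A minimal sketch conveys the encoder-bottleneck-decoder idea (a plain, non-variational LSTM autoencoder in PyTorch with hypothetical dimensions; the actual Valma architecture, including its variational latent space, is documented in our repository):

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    """Minimal LSTM autoencoder over per-frame features (e.g. VGGNet activations).

    A non-variational sketch with hypothetical dimensions, illustrating the
    encoder/bottleneck/decoder structure rather than the exact Valma model.
    """
    def __init__(self, frame_dim=4096, latent_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(frame_dim, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, latent_dim, batch_first=True)
        self.readout = nn.Linear(latent_dim, frame_dim)

    def forward(self, frames):               # frames: (batch, time, frame_dim)
        _, (h, _) = self.encoder(frames)     # bottleneck: final hidden state
        z = h[-1]                            # (batch, latent_dim) feature vector
        steps = frames.shape[1]
        # Feed the latent code at every step and decode back to frame features
        recon, _ = self.decoder(z.unsqueeze(1).repeat(1, steps, 1))
        return self.readout(recon), z

model = SeqAutoencoder()
video = torch.randn(2, 30, 4096)             # 2 clips of 30 pre-processed frames
recon, z = model(video)
loss = nn.functional.mse_loss(recon, video)  # train to reproduce the input
```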

III Trust for Individuals

To meaningfully compare individuals’ preferences regarding videos of swarm behaviors, we consider pairwise comparisons (‘instances’) in a feature vector space. In the following sections we synthesize and build on [6] and [13], posing a convex optimization problem with a global extremum.

III-A Preference Synthesis

The feature vectors for the trust preference pair $(v_i^1,v_i^2)\in\mathcal{E}_B$ are given by $x_i^1=h(v_i^1)$ and $x_i^2=h(v_i^2)$, respectively. Pairwise trust preferences imply the existence of an underlying quadratic trust function $f_k:\mathcal{X}\to\mathbb{R}$ such that

$$f_k(x_i^1)\leq f_k(x_i^2)\Leftrightarrow(v_i^1,v_i^2)\in\mathcal{E}_k. \qquad (2)$$

Consider the quadratic trust function

$$f_k(x)=\lVert x-\bar{x}_k\rVert_2, \qquad (3)$$

where $\bar{x}_k$ is a vector in $\mathcal{X}$ corresponding to optimal trust for individual $k$. We may estimate this function by considering the preference set as analogous to a set of affine classifications. In feature vector space, this set corresponds to a set of hyperplanes separating the pairs of behavior points with maximal distance to each point.

Given the $i$th preference pair $(x_i^1,x_i^2)\in\mathcal{X}\times\mathcal{X}$, (2) is equivalent to the halfspace

$$g_i(x)=a_i^Tx-b_i\leq 0, \qquad (4)$$

where $a_i=x_i^2-x_i^1$ and $b_i=a_i^T(x_i^1+x_i^2)/2$. The closed halfspace $g_i(x)\leq 0$ (the region of the feature space containing preferred behaviors) is convex but not affine.
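The construction of $a_i$ and $b_i$ can be verified numerically (a small sketch with hypothetical 2-D feature vectors):

```python
import numpy as np

x1 = np.array([0.2, -0.4])   # feature vector of the more trusted behavior
x2 = np.array([0.8,  0.1])   # feature vector of the less trusted behavior

a = x2 - x1                  # normal of the separating hyperplane in (4)
b = a @ (x1 + x2) / 2        # hyperplane passes through the pair's midpoint

g = lambda x: a @ x - b      # g(x) <= 0 on the preferred side of the hyperplane
assert g(x1) < 0 < g(x2)     # the preferred point satisfies the halfspace
```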

III-B Preference Polytope

The $k$th individual’s preference set can be described by a set of halfspaces (4) in feature vector space, with the $i$th halfspace associated with the preference $(v_i^1,v_i^2)\in\mathcal{E}_k$. Each preference instance further constrains the region of $\mathcal{X}$ that may contain $x$. The intersection of the closed halfspaces defines a preference polytope, a region of feature vector space associated with greatest preference. The intersection satisfies the system of linear inequalities created by the preferences, and can be represented by the polytope

$$P_k=\left\{x\in\mathcal{X}\,|\,a_i^Tx\leq b_i,\ \forall(v_i^1,v_i^2)\in\mathcal{E}_k\right\}. \qquad (5)$$

An example of the intersection of eight preferences’ corresponding halfspaces is given by the shaded interior region in Figure 3. The preference polytope can be unbounded for small $|\mathcal{E}_k|$ and poorly distributed preference pairs. The preference polytope can also be empty for cyclic preference graphs and poorly selected embeddings.

The closed region of the polytope $P_k$ can be used to determine preferred swarm behaviors; this process is often termed preference learning [18]. The polytope $P_k$ can be built iteratively, with new pairwise comparisons presented to the individual over time. Given the polytope $P_k(t)$ at sample time $t$, the addition of the $i$th preference at time $t+1$ forms the new polytope $P_k(t+1)=P_k(t)\cap\{x\,|\,g_i(x)\leq 0\}$. The strategic presentation of pairwise comparisons to the participant can rapidly reduce the volume of $P_k(t)$ over time.
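The iterative construction of $P_k(t)$ can be realized by appending halfspaces to a constraint list (a minimal sketch reusing the $(a_i,b_i)$ construction from (4); class and method names are illustrative):

```python
import numpy as np

class PreferencePolytope:
    """Maintain P_k(t) as a growing system of halfspaces A x <= b."""
    def __init__(self, q):
        self.A = np.empty((0, q))
        self.b = np.empty(0)

    def add_preference(self, x1, x2):
        """Intersect P_k(t) with the halfspace induced by a new pair,
        where x1 is the feature vector of the preferred behavior."""
        a = x2 - x1
        self.A = np.vstack([self.A, a])
        self.b = np.append(self.b, a @ (x1 + x2) / 2)
```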

III-C Finding the Chebyshev Center

As we have insufficient information within a bounded $P_k$ to find $\bar{x}_k$ in (3), we may substitute an alternative point in $\mathcal{X}$. We believe that the Chebyshev center of $P_k$ is a suitable candidate [6]. The Chebyshev center of $P_k$ is the center of the largest inscribed ball in $P_k$, also referred to as the in-center point; a visual interpretation is depicted in Figure 3. Let the Chebyshev center $x_c$ lie at the center of the largest possible ball $\mathcal{B}=\{x_c+u\,|\,\lVert u\rVert_2\leq r\}$ inside $P_k$. We may obtain $\mathcal{B}$ by maximizing $r$. For a weaker constraint, let $\mathcal{B}$ lie in the halfspace

$$\lVert u\rVert_2\leq r\implies a_i^T(x_c+u)\leq b_i. \qquad (6)$$

The supremum over the ball is given by $\sup\{a_i^Tu\,|\,\lVert u\rVert_2\leq r\}=r\lVert a_i\rVert_2$. Hence the ball lies within every halfspace, i.e. $\mathcal{B}\subset P_k$, if and only if $a_i^Tx_c+r\lVert a_i\rVert_2\leq b_i$ for all $(v_i^1,v_i^2)\in\mathcal{E}_k$. Given the ball radius $r\geq 0$, $x_c$ can be found by solving the optimization

$$\begin{aligned}(x_c,\overline{r})&=\arg\max_{x,r}\;r,\\ \text{s.t.}\;&a_i^Tx+r\lVert a_i\rVert_2\leq b_i,\ \forall(v_i^1,v_i^2)\in\mathcal{E}_k.\end{aligned} \qquad (7)$$

The optimization is a linear program, which many algorithms can solve reliably and efficiently [19]. The resulting $x_c$ can then be used as a proxy to compare individuals’ trust.
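For illustration, the linear program (7) can be solved with an off-the-shelf solver (a sketch using scipy.optimize.linprog; the polytope data are hypothetical):

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_center(A, b):
    """Solve (7) for the polytope {x | A x <= b}: maximize r over y = [x; r]
    subject to a_i^T x + r * ||a_i||_2 <= b_i and r >= 0."""
    norms = np.linalg.norm(A, axis=1)
    c = np.zeros(A.shape[1] + 1)
    c[-1] = -1.0                                  # maximize r == minimize -r
    A_ub = np.hstack([A, norms[:, None]])
    bounds = [(None, None)] * A.shape[1] + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b, bounds=bounds)
    return res.x[:-1], res.x[-1]                  # (x_c, radius)

# Unit square |x_1| <= 1, |x_2| <= 1: center (0, 0), inscribed radius 1
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
x_c, r = chebyshev_center(A, b)
print(x_c, r)  # approximately [0. 0.] and 1.0
```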

Figure 3: The Chebyshev center $x_c$ with radius $r$ of the preference polytope $P$.

IV Trust for Groups

We may analyze the trust preferences of a group of individuals. We first examine the distinctiveness of individuals’ preferences using labeled pairwise preference data and consider the effect of self-contradictory responses. Then, considering unlabeled pairwise preferences, we observe properties of the entire group’s preferences. Assuming that each individual's point of maximal trust is drawn from a normal distribution, we use this to evaluate the cohesion of the group.

IV-A Labeled Individual Preferences

Given multiple individuals’ respective preferences, we may devise a measure of the individuals’ distinctiveness. The optimal trust value for each individual $k$ is denoted as $x+z^k$, where $x$ is assumed to be a global reference trust measure and $z^k$ the perturbation of individual $k$ away from this reference. If the magnitude of $z^k$ were small for all of the team, the group would exhibit similar trust behaviors. Individuals with large $z^k$ would, in turn, have distinctive trust behaviors compared to the group as a whole. Examining the preferences across each individual $k$, the selection problem for their $i$th preference selection will generate the halfspace $(a_i^k)^T(x+z^k)\leq b_i^k$. An optimization problem can then be posed to find $x$ with small $\|z^k\|_1$ across all members using

$$\begin{aligned}(\overline{x},\overline{z})&=\arg\min_{x,\,z=[z^1,\dots,z^n]}\sum_{k=1}^{n}\left\|z^k\right\|_1,\\ \text{s.t.}\;&(a_i^k)^T(x+z^k)\leq b_i^k,\ \forall(v_i^1,v_i^2)\in\mathcal{E}_k.\end{aligned} \qquad (8)$$

The accumulated 1-norm is used here to minimize $z$ due to its sparsifying properties, promoting solutions with small or even zero $\|z^k\|_1$ [19]. When $\|z^k\|_1=0$, the reference trust measure $x$ satisfies all of the selection preferences for individual $k$. This subset of individuals shares a non-trivial intersection of their trust polytopes, and an additional Chebyshev center selection could be performed to select a trust reference $x_c$ with the characteristics described in §III-C.
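A compact way to prototype (8) is with a convex modeling tool (a sketch using cvxpy; the per-individual constraint data $(A_k, b_k)$, with rows $a_i^k$ and entries $b_i^k$, are hypothetical):

```python
import cvxpy as cp
import numpy as np

def group_reference(halfspaces, q):
    """Solve (8): a common reference x plus sparse per-individual
    perturbations z^k; ||z^k||_1 serves as individual k's distinctiveness."""
    x = cp.Variable(q)
    zs = [cp.Variable(q) for _ in halfspaces]
    constraints = [A @ (x + z) <= b for (A, b), z in zip(halfspaces, zs)]
    objective = cp.Minimize(cp.sum([cp.norm1(z) for z in zs]))
    cp.Problem(objective, constraints).solve()
    return x.value, [z.value for z in zs]

# Two hypothetical individuals, one halfspace constraint each, in R^2
data = [(np.array([[1.0, 0.0]]), np.array([0.0])),
        (np.array([[1.0, 0.0]]), np.array([1.0]))]
x_ref, zs = group_reference(data, q=2)
print(x_ref, [np.abs(z).sum() for z in zs])  # reference and distinctiveness values
```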

IV-B Unlabeled Population Preferences

We now examine the case where a preference may be expressed multiple times with contradictory responses. Consider the selection of the perturbation $z$ for each individual from a normal distribution $Z\sim\mathcal{N}(0,\Sigma)$, where $\Sigma$ is a symmetric positive definite matrix. From (4), the $i$th selection problem can subsequently be posed as a preferential selection of one choice over the other when the random variable $X_i=g_i(x+z)=a_i^T(x+z)-b_i$ is non-positive; the corresponding distribution is $\mathcal{N}(\mu=a_i^Tx-b_i,\ \sigma^2=a_i^T\Sigma a_i)$. By applying the inverse distribution mapping (1), the probability of a positive preference selection $p_i=p(X_i\leq 0)$ then satisfies $-\Phi^{-1}(p_i)\sigma=\mu$. Assuming that the covariance of $Z$ is bounded as $0\preceq\Sigma\preceq\alpha^2I$, then $\sigma\leq\alpha\left\|a_i\right\|_2$. For $p_i\geq 0.5$ then $\Phi^{-1}(p_i)\geq 0$ and

$$-\alpha\left\|a_i\right\|_2\Phi^{-1}(p_i)\leq a_i^Tx-b_i\leq 0. \qquad (9)$$

Similarly, for $p_i<0.5$ then

$$0\leq a_i^Tx-b_i\leq-\alpha\left\|a_i\right\|_2\Phi^{-1}(p_i), \qquad (10)$$

such that each preference constrains $x$ to lie in what can be interpreted geometrically as a slab, i.e., a set of the form $\{x\in\mathbb{R}^q\,|\,\alpha\leq a^Tx\leq\beta\}$ for scalars $\alpha\leq\beta$. With the data sampled from a finite population, the probability $p_i$ is calculated based on a confidence interval $[p_i-\delta,p_i+\delta]$ projected onto the unit interval $[0,1]$ as $\Delta=[\max(0,p_i-\delta),\min(1,p_i+\delta)]$. Here, $2\delta$ is the width of the confidence interval band and is based on the margin of error calculated from a number of samples. Applying the central limit theorem for the binomial distribution is one approach to calculate the width, with $\delta=Z\sqrt{1/(4n_s)}$ where $Z$ is the $Z$-score associated with a confidence interval and $n_s$ is the number of samples of the $i$th preference [20].
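For concreteness, a sketch of this margin-of-error computation (the sample count is hypothetical; $Z=1.96$ corresponds to a 95% confidence interval):

```python
from math import sqrt

def half_width(n_s, Z=1.96):
    """delta = Z * sqrt(1 / (4 * n_s)), the half-width of the binomial
    confidence interval under the normal approximation [20]."""
    return Z * sqrt(1.0 / (4.0 * n_s))

print(half_width(43))  # about 0.149 for 43 samples of a preference
```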

Figure 4: Participant Chebyshev centers $x_c$ compared with the aggregated trust optimum $\bar{x}$.

We assume that $g_i(x)$ is constructed so that $X_i\leq 0$ for most of the population (i.e., $p_i\geq 0.5$), for example as per §II-B for $\bar{\mathcal{G}}=(\bar{V},\bar{\mathcal{E}},W)$ with $p_i=w_i\in W$. Using the constraints (9) and (10) over $\Delta$, we may find the average trust measure $\overline{x}$ and minimum covariance bound $\overline{\alpha}$ for the population as

$$\begin{aligned}(\overline{x},\overline{\alpha})&=\arg\min_{x,\alpha}\;\alpha\\ \text{s.t.}\;&a_i^Tx-\alpha\left\|a_i\right\|_2\max\left(0,-\Phi^{-1}(p_i-\delta)\right)\leq b_i,\\ &a_i^Tx+\alpha\left\|a_i\right\|_2\Phi^{-1}\left(\min(1,p_i+\delta)\right)\geq b_i,\\ &\forall(v_i^1,v_i^2)\in\bar{\mathcal{E}},\ p_i=w_i\in W.\end{aligned} \qquad (11)$$

For the $i$th preference, the (positive) upper bound on the covariance $\alpha$ constrains the width of the slab containing $x$. Similarly, the closer $p_i$ is to $0.5$ (i.e. a split decision on the $i$th preference among individuals), the narrower the slab.
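Once the $\Phi^{-1}$ terms are evaluated, (11) is linear in $(x,\alpha)$ and can likewise be prototyped directly (a cvxpy sketch; the inputs A, b, p, delta are hypothetical, with the probabilities clipped away from 0 and 1 to keep $\Phi^{-1}$ finite):

```python
import cvxpy as cp
import numpy as np
from scipy.stats import norm

def population_trust(A, b, p, delta, eps=1e-6):
    """Solve (11): average trust measure x and minimum covariance bound alpha.
    A is (m, q) with rows a_i^T, b is (m,), p holds the edge weights w_i."""
    norms = np.linalg.norm(A, axis=1)
    lo = np.maximum(0.0, -norm.ppf(np.clip(p - delta, eps, 1 - eps)))
    hi = norm.ppf(np.clip(p + delta, eps, 1 - eps))
    x = cp.Variable(A.shape[1])
    alpha = cp.Variable(nonneg=True)
    constraints = [A @ x - alpha * (norms * lo) <= b,   # lower slab face, cf. (9)
                   A @ x + alpha * (norms * hi) >= b]   # upper slab face
    cp.Problem(cp.Minimize(alpha), constraints).solve()
    return x.value, alpha.value
```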

V User Study

We have conducted a user study to observe human trust preferences regarding swarm behaviors. In this section we outline our procedure and summarize collected results.

Figure 5: Trust scores for participants as measured using TPS-HRI; the shaded region contains participants with distinctiveness $\left\|z^k\right\|_1\leq 0.035$.
Figure 6: Participant distinctiveness.

V-A Procedure

We pursued an online survey methodology involving filmed videos of the swarm behaviors from §II-A. All participants were over 18 years of age and no demographic information was requested. Forty-three participants responded to the survey; they were not remunerated or otherwise rewarded for survey completion. In the first part, we presented the participant with a video of a swarm executing a leader following behavior (see §II-A) and asked fourteen questions relating to trust in the swarm, a modified version of the Trust Perception Scale-HRI (TPS-HRI) [3] substituting the word ‘swarm’ for ‘robot’. In the second part, we presented the participant with pairs of swarm behavior videos and collected the participant’s pairwise trust preferences, asking ‘Comparing the two swarms, which do you trust more?’. To avoid priming the participants we did not specify the notion of trust further. The first six pairs of videos covered each combination among the first four of the five behaviors, shown to each participant in the same order. Each participant could repeat the process with four more video pairs, but this was optional to avoid survey fatigue impacting response quality.

Feature Vector Extraction

The feature vectors extracted by Valma encode sequential dependencies between successive video frames: for a training set of 20 videos, the percentage of dimensions differing between the original video’s feature vector and that of the same video played in reverse lies in the range $[23.1\%,49.5\%]$. Since the only difference is in the order of video frames presented to Valma, and given that this has yielded distinct feature vectors, we infer that Valma can map distinct videos to distinct feature vectors.
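This reversal check amounts to a simple comparison of latent vectors (a sketch; the tolerance is a hypothetical choice, not the value used in our experiments):

```python
import numpy as np

def frac_dims_differing(z_forward, z_reverse, tol=1e-6):
    """Fraction of latent dimensions differing between the feature vector
    of a video and that of the same video played in reverse."""
    return float(np.mean(np.abs(z_forward - z_reverse) > tol))
```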

V-B Results

We proceed to analyze data gathered from the online survey, focusing on the distinctiveness and cohesion of the cohort’s trust preferences. For each participant we have created a preference graph (exemplified by Figure 7) to determine the respective preference polytope. For the subset of participants with bounded preference polytopes, their individual Chebyshev centers could be found. In Figure 4 the aggregated population trust optimum $\bar{x}$ from (8) is compared with the Chebyshev centers for the preference polytopes belonging to a subset of participants.² For the same subset of participants, we also compare their distinctiveness $\|z^k\|_1$ with corresponding TPS-HRI trust scores in Figure 6. We observe that participants with distinctiveness $\|z^k\|_1\leq 0.035$ and a trust score in the range $[42\%,56.5\%]$ express preferences compatible with the population’s preferences. In contrast, participants with distinctiveness $\|z^k\|_1>0.035$ have preferences differing from the population. In Figure 5 we extrapolate the notion of distinctiveness and trust bounds to the whole population. Participants with trust values in the range $[42\%,56.5\%]$ are associated with low distinctiveness from the population’s preferences in Figure 6, and hence have similar trust preferences to an average participant in the population. In this way low distinctiveness can become a criterion for selecting teams of participants.

²For visualization purposes only two dimensions of $\mathcal{X}$ are depicted.

In Figure 7 a partial ordering over the swarm behaviors is generated from an aggregation of unlabeled preferences from all participants, represented as a preference graph $\bar{\mathcal{G}}=(\bar{V},\bar{\mathcal{E}},W)$ as depicted. We may then use the edge-weighted population preference graph to derive an average trust measure $\mu=\bar{x}$ and minimum covariance bound $\sigma\leq\bar{\alpha}$ from (11). In Table I we compare $\mu$ with the aggregated Chebyshev center $\bar{x}_c$ of the unweighted preference graph $(\bar{V},\bar{\mathcal{E}})$. We observe in Table II that $54.1\%$ of participants’ individual Chebyshev centers lie within the upper bound of one standard deviation $\alpha$ of the mean and $100\%$ lie within $2\alpha$. This matches well the theoretical bounds $p(-s<\left\|X-\mu\right\|_2/\sigma<s)=\Phi(s)-\Phi(-s)$ for $s\in\{1,2\}$, hence the aggregated unlabeled preference data is consistent with the model presented in §IV-B. The relatively small distance between the mean and the population’s Chebyshev center, $\left\|\mu-\bar{x}_c\right\|_2\leq 0.1\bar{\alpha}$, shows that the population preference graph $\bar{\mathcal{G}}$ and the optimal solution of (11) are representative of the true value of $\bar{x}_k$. This suggests that we may analyze preference similarity to evaluate a population’s cohesiveness.

Figure 7: a) A participant’s acyclic preference graph; b) Weighted population preference graph (the gradient bar indicates the preference likelihood for the population).
TABLE I: Population Trust Statistical Information
Aggregate $\bar{x}_c$ | Mean $\mu$ | Covariance Bound $\bar{\alpha}$
(-0.4076, 0.1697) | (-0.4383, 0.1788) | 0.3406
TABLE II: Population Trust Preference Distribution
$s$ | $p(-s<\left\|x_c-\mu\right\|_2/\bar{\alpha}<s)$ | $\Phi(s)-\Phi(-s)$
1 | 0.5405 | 0.6812
2 | 1.0000 | 0.9545

VI Conclusions

In this work, we have studied a model of human trust in a robotic swarm using preferences. Generating a unique feature vector for each swarm behavior using Valma, we have embedded the swarm behaviors into a feature space and have formulated a polytope model with a Chebyshev center. Extending our consideration to groups of individuals, we have formulated a new distinctiveness metric to measure individuals’ labeled pairwise trust preferences with respect to a wider population. Aggregating all pairwise trust preferences for a group, we have posed a sparse optimization problem informed by the population’s weighted preference graph. This yields an average trust measure and minimum covariance bound, enabling analysis of the group’s cohesion. Results from our user study suggest that individuals with similar trust profiles may be grouped by low distinctiveness.

We anticipate three main areas for future work: measuring steady-state trust preferences in longer-duration interactions, modeling a population’s aggregated trust preferences using explicitly distinct clusters, and identifying an ideal truncation length for feature vectors produced by Valma.

References

  • [1] J. D. Lee and K. A. See, “Trust in Automation: Designing for Appropriate Reliance,” Human Factors, vol. 46, no. 1, pp. 50–80, Mar. 2004.
  • [2] J. B. Lyons, K. Sycara, M. Lewis, and A. Capiola, “Human-Autonomy Teaming: Definitions, Debates, and Directions,” Frontiers in Psychology, vol. 12, p. 589585, 2021.
  • [3] K. E. Schaefer, “Measuring Trust in Human Robot Interactions: Development of the “Trust Perception Scale-HRI”,” in Robust Intelligence and Trust in Autonomous Systems, R. Mittu, D. Sofge, A. Wagner, and W. Lawless, Eds.   Boston, MA: Springer US, 2016, pp. 191–218.
  • [4] C. Nam, P. Walker, H. Li, M. Lewis, and K. Sycara, “Models of Trust in Human Control of Swarms With Varied Levels of Autonomy,” IEEE Transactions on Human-Machine Systems, vol. 50, no. 3, pp. 194–204, Jun. 2020.
  • [5] J. Fürnkranz and E. Hüllermeier, “Preference learning and ranking by pairwise comparison,” in Preference learning.   Springer, 2010, pp. 65–82.
  • [6] P. Kingston and M. Egerstedt, “Comparing apples and oranges through partial orders: An empirical approach,” in 2009 American Control Conference, Jun. 2009, pp. 5434–5439.
  • [7] K. Akash, W.-L. Hu, T. Reid, and N. Jain, “Dynamic modeling of trust in human-machine interactions,” in 2017 American Control Conference (ACC), May 2017, pp. 1542–1548.
  • [8] E. Schoof, C. Manzie, I. Shames, A. Chapman, and D. Oetomo, “An experimental platform for heterogeneous multi-vehicle missions,” in Proceedings of the International Conference on Science and Innovation for Land Power, Adelaide, Australia, 2018, pp. 5–6.
  • [9] J. Marshall, M. Broucke, and B. Francis, “Formations of vehicles in cyclic pursuit,” IEEE Transactions on Automatic Control, vol. 49, no. 11, pp. 1963–1974, Nov. 2004.
  • [10] A. Pierson and M. Schwager, “Bio-inspired non-cooperative multi-robot herding,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 1843–1849.
  • [11] P. Pierpaoli, T. T. Doan, J. Romberg, and M. Egerstedt, “A Reinforcement Learning Framework for Sequencing Multi-Robot Behaviors,” arXiv:1909.05731 [cs, eess], Sep. 2019.
  • [12] Y.-C. Ho, R. Sreenivas, and P. Vakili, “Ordinal optimization of deds,” Discrete event dynamic systems, vol. 2, no. 1, pp. 61–88, 1992.
  • [13] P. Kingston, J. von Hinezmeyer, and M. Egerstedt, “Metric Preference Learning with Applications to Motion Imitation,” in Controls and Art, A. LaViers and M. Egerstedt, Eds.   Cham: Springer International Publishing, 2014, pp. 1–26.
  • [14] S. L. Brunton and J. N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control.   Cambridge: Cambridge University Press, 2019. [Online]. Available: https://www.cambridge.org/core/books/datadriven-science-and-engineering/77D52B171B60A496EAFE4DB662ADC36E
  • [15] K. Aberman, R. Wu, D. Lischinski, B. Chen, and D. Cohen-Or, “Learning character-agnostic motion for motion retargeting in 2D,” ACM Transactions on Graphics, vol. 38, no. 4, pp. 75:1–75:14, Jul. 2019.
  • [16] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” Sep. 2014.
  • [17] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
  • [18] R. Herbrich, T. Graepel, P. Bollmann-Sdorra, and K. Obermayer, “Supervised learning of preference relations,” Proceedings des Fachgruppentreffens Maschinelles Lernen (FGML-98), pp. 43–47, 1998.
  • [19] S. Boyd, S. P. Boyd, and L. Vandenberghe, Convex optimization.   Cambridge university press, 2004.
  • [20] L. D. Brown, T. T. Cai, and A. DasGupta, “Interval Estimation for a Binomial Proportion,” Statistical Science, vol. 16, no. 2, pp. 101 – 133, 2001.