Individual and Team Trust Preferences for Robotic Swarm Behaviors
Abstract
Trust between humans and multi-agent robotic swarms may be analyzed using human preferences. These preferences are expressed by an individual as a sequence of ordered comparisons between pairs of swarm behaviors. An individual’s preference graph can be formed from this sequence. In addition, swarm behaviors may be mapped to a feature vector space. We formulate a linear optimization problem to locate a trusted behavior in the feature space. Extending to human teams, we define a novel distinctiveness metric using a sparse optimization formulation to cluster similar individuals from a collection of individuals’ labeled pairwise preferences. The case of anonymized unlabeled pairwise preferences is also examined to find the average trusted behavior and minimum covariance bound, providing insights into group cohesion. A user study was conducted, with results suggesting that individuals with similar trust profiles can be clustered to facilitate human-swarm teaming.
I Introduction
As teams of robots venture further into home and work settings, interactions between humans and robotic systems will become more common. An important element of these interactions is the notion of human trust, an intuitive belief that the object of trust will help to achieve goals even in uncertain situations [1]. Excessive trust in a system can lead to over-reliance, with potentially negative implications for performance and safety. Conversely, humans with low trust may avoid interacting with the system altogether. It is vital to understand how the design and operation of robotic systems influence human trust perceptions, in order to facilitate more efficient and successful interactions. We therefore focus on learning and predicting human trust in robotic swarms.
Much attention has focused on trust in human-automation interaction, initially in human interactions with industrial machines [1] but later expanding to other settings [2]. Trust in human-robot interaction is an active area of research [3], stemming from scenarios that require a delegation of autonomy (e.g. collaborative lifting, search and rescue). This is particularly salient in human-swarm interaction (HSI), as it is difficult for a human to interact with each swarm agent simultaneously. Delegation of autonomy can only happen when the human trusts the swarm sufficiently (regarding proper task execution and safety, among other factors [3]). Conventional approaches to measuring human trust in swarms use a scalar value [4] or multi-dimensional quantity [3] to represent trust; however, the nebulous, idiosyncratic nature of trust may complicate attempts to compare values between individuals and across time. In addition, trust measures may focus on certain characteristics and omit others, prescribing a model of trust that may not suit all participants. For greater comparability when analyzing trust, we may instead focus on human preferences and use preference learning techniques [5]. Inspired by the algorithm presented in [6], we may use preferential reasoning to understand the influence of swarm behaviors on a human observer's trust level. We can further extend the concept of individual trust to groups [7], thus providing a way of understanding how to team individuals based on similar trust profiles and of analyzing conflicting preferences' impact on a team's overall trust dynamic.
We extend concepts developed in [5, 6, 7] to population-based measurements of trust. Our contributions include a distinctiveness metric describing how an individual's trust towards a swarm differs from others' in a population. We determine this metric by analyzing differences in individuals' trust preferences as perturbations from a common reference, and quantifying the respective divergence for each individual. By selecting those with distinctiveness below a threshold, we may cluster individuals with similar trust preferences. We also consider the concept of group cohesion regarding the distribution of trust preferences when preferences are anonymized and aggregated from a population. To this end, assuming each individual's optimal trust is drawn from a normal distribution, the unlabeled preference data can yield bounds on the distribution covariance. These in turn can serve as a measure of group cohesion.
The remainder of this paper studies how we may analyze human trust preferences for robotic swarm behaviors and describe a group's trust preferences by synthesizing individuals' preferences. We first define key terminology and explain the swarm behaviors considered. In Section II we formulate a preference learning system and introduce Valma for feature vector extraction. We develop a trust preference model for an individual in Section III, extending this to a group in Section IV. Details of a user study feature in Section V, and concluding remarks are made in Section VI.
Notation
For $x \in \mathbb{R}^n$, we define the 1-norm of a vector $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ and the 2-norm of a vector $\|x\|_2 = \big(\sum_{i=1}^{n} x_i^2\big)^{1/2}$. The indicator function for a set $\mathcal{A}$ is defined as $\mathbb{1}_{\mathcal{A}}(x) = 1$ if $x \in \mathcal{A}$ and otherwise equal to 0. The identity matrix $I$ is a diagonal matrix with 1 on the main diagonal and 0 elsewhere. A pair of elements $(a, b)$ appears uniquely in a sequence set $\mathcal{S}$ if and only if $a$ is preferred to $b$ for all possible set combinations. A graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is defined by a vertex set $\mathcal{V}$ of cardinality $n$ and an ordered edge set $\mathcal{E}$ of cardinality $m$. If an edge exists from vertex $i$ to vertex $j$ it is expressed as $(i, j) \in \mathcal{E}$. A directed acyclic graph is a graph with no directed cycles. We denote the standard normal cumulative density function as $\Phi$. The function $\Phi(z)$ evaluates the probability that the value of a standard normal random variable is less than or equal to $z$. Similarly, the normal cumulative density function describes the probability that the value of the random variable $X \sim \mathcal{N}(\mu, \sigma^2)$ is less than or equal to $x$, denoted by $F(x)$. The cumulative density functions are related by $F(x) = \Phi\big((x - \mu)/\sigma\big)$; the inverse mapping is given by

$$x = \mu + \sigma\, \Phi^{-1}(p). \qquad (1)$$
II Problem Formulation
We seek to formulate a preference learning problem in order to study how a swarm’s behavior might influence a human observer’s trust and, more generally, trust for a group.
II-A Swarm Behaviors
In this work we consider a swarm composed of ground-based robotic vehicles [8]. Each agent’s motion can be represented using unicycle dynamics commanded by a multi-agent controller.
To elicit human trust responses, we have implemented five swarm behaviors using the robotic platform: 1) cyclic pursuit, in which agents traverse a circle [9], 2) herding, in which agents move from one location to another while maintaining collective cohesion [10], 3) leader following, in which a leader moves while trailed by all other agents [11], 4) square formation, in which agents relocate to the vertices of a square shape [11], 5) line formation, in which the agents relocate to form a line [11]. The respective trajectories are depicted in Figure 1.
[Figure 1: Sample trajectories of the five swarm behaviors: cyclic pursuit, herding, leader following, square formation, and line formation.]
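As a concrete illustration of the first behavior, the following is a minimal simulation sketch of unicycle agents under a simple cyclic-pursuit heading law. The agent count, speed, gain, and integration step are illustrative assumptions, not the parameters of the platform in [8].

```python
import numpy as np

def simulate_cyclic_pursuit(n_agents=5, speed=0.1, gain=2.0, dt=0.05, steps=2000):
    """Unicycle agents in which agent i steers toward agent (i+1) mod n,
    producing the circular trajectories characteristic of cyclic pursuit [9].
    """
    rng = np.random.default_rng(0)
    pos = rng.uniform(-1.0, 1.0, size=(n_agents, 2))     # planar positions
    heading = rng.uniform(-np.pi, np.pi, size=n_agents)  # unicycle headings
    history = [pos.copy()]
    for _ in range(steps):
        target = pos[(np.arange(n_agents) + 1) % n_agents]
        bearing = np.arctan2(target[:, 1] - pos[:, 1], target[:, 0] - pos[:, 0])
        # Proportional steering on the heading error, wrapped to [-pi, pi].
        error = np.arctan2(np.sin(bearing - heading), np.cos(bearing - heading))
        heading += dt * gain * error
        pos = pos + dt * speed * np.column_stack((np.cos(heading), np.sin(heading)))
        history.append(pos.copy())
    return np.array(history)  # (steps + 1, n_agents, 2) trajectory
```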
II-B Preference Data and Preference Graph
Given a set of swarm behaviors $\mathcal{B}$, we wish to collect a set of pairwise comparison preferences $\mathcal{P}$ that answer the general question "Comparing these two swarm behaviors, which do you trust more?", presupposing an individual's intuitive definition of trust. For the comparison of behaviors $b_i, b_j \in \mathcal{B}$, we record the more trusted behavior as $b_i$ (the first element in the pair $(b_i, b_j)$) and the less trusted behavior as $b_j$ (the second element).
We may visualize an individual's pairwise comparison preferences using a preference graph. Consistent preferences may be depicted as acyclic graphs, yielding a partial order over preferences, while inconsistent preferences generate a cyclic graph. For individual $k$, the directed preference graph $\mathcal{G}_k = (\mathcal{V}_k, \mathcal{E}_k)$ is defined by a vertex set $\mathcal{V}_k$ containing the compared behaviors, and an ordered edge set $\mathcal{E}_k$ indicating preferences among pairs of vertices. Here, a directed edge from vertex $i$ to vertex $j$ indicates behavior $b_i$ is preferred to behavior $b_j$. Note there is at most one edge between each pair of vertices, i.e. we do not consider self-contradictions.
We assume that each preference is labeled as belonging to a given individual when constructing the individual's directed preference graph. In a wider population for which demographic information cannot be collected, we instead collect anonymized unlabeled trust preferences such that we cannot distinguish between individuals. In this case we may define a population preference graph $\mathcal{G}_p = (\mathcal{V}, \mathcal{E}, \mathcal{W})$, with the vertex set $\mathcal{V}$, the ordered set of edges $\mathcal{E}$, and the associated edge weight set $\mathcal{W}$. The edge and weight sets are formed by enumerating each pairwise preference count $n_{ij}$, the number of times $b_i$ was preferred to $b_j$, for all $i, j$. The edge set contains the edge $(i, j)$ if $n_{ij} > n_{ji}$ (i.e. $b_i$ is preferred more often than $b_j$), or both $(i, j)$ and $(j, i)$ if $n_{ij} = n_{ji}$ (i.e. the behaviors are equally preferred). The associated edge weight element of $\mathcal{W}$ is $w_{ij} = n_{ij} / (n_{ij} + n_{ji})$ for the preference pair $(b_i, b_j)$; these weights capture preference variability among individuals in the population.
We interpret $\mathcal{W}$ as proportional to the average preference of the population, with each vertex in the graph mapped to a compared behavior. The directed weighted edges between vertices indicate the likelihood of the preference transition by the majority of individuals. A partially ordered set of preference instances can be abstracted from the preference graph and is amenable to an ordinal optimization problem [12]. We instead locate the preference instances within a feature vector space and optimize within this space.
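The construction of $\mathcal{G}_p$ can be sketched as follows, assuming the fraction-of-population weight $w_{ij} = n_{ij}/(n_{ij} + n_{ji})$ defined above; the use of networkx and the behavior labels are implementation conveniences rather than part of the method.

```python
from collections import Counter
import networkx as nx

def population_preference_graph(pairs):
    """Weighted population preference graph from anonymized pairs.

    `pairs` is an iterable of (more_trusted, less_trusted) labels. Edge
    (i, j) is added when n_ij >= n_ji, weighted by w_ij = n_ij / (n_ij + n_ji);
    ties produce edges in both directions, as in the construction above.
    """
    counts = Counter(pairs)  # counts[(i, j)] = n_ij
    graph = nx.DiGraph()
    for (i, j), n_ij in counts.items():
        n_ji = counts.get((j, i), 0)
        if n_ij >= n_ji:
            graph.add_edge(i, j, weight=n_ij / (n_ij + n_ji))
    return graph

# A tie between two behaviors yields edges in both directions (a 2-cycle),
# so the resulting graph is not a DAG:
g = population_preference_graph([("herding", "cyclic"), ("cyclic", "line"),
                                 ("herding", "line"), ("line", "herding")])
print(nx.is_directed_acyclic_graph(g))  # False
```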
II-C Feature Vector Extraction
[Figure 2: Architecture of the Valma variational auto-encoder: VGGNet-A frame pre-processing, LSTM encoder, latent bottleneck, and LSTM decoder.]
As we cannot reason about preferences towards behaviors not already in the graph, we may consider mapping each behavior to a point in a feature space $\mathbb{R}^d$. In Section V we present individuals with videos of swarm behaviors. We seek to work directly with these stimuli, to capture information about the swarm's visual appearance and trajectory evolution encoded therein. Manual feature vector extraction from stimuli has been demonstrated for preference learning in [13]; however, doing so for videos is impractical and entails discretionary judgements regarding which features to select. Automatic feature vector extraction processes can address these two issues by removing human discretion in feature identification. One may consider dimensionality-reduction techniques for individual video frames (e.g. principal component analysis [14]); however, temporal dynamics between frames would be neglected. To overcome this issue, we have adopted an approach similar to [15] by developing a neural network-based variational auto-encoder (Valma) that extracts a feature vector for a video of swarm behavior. As depicted in Figure 2, the model contains a pre-processing component (extracting frame features using the VGGNet-A computer vision model [16]) and two additional recurrent neural network components: an encoder and a decoder. The recurrent neural networks use an LSTM architecture [17] in order to learn temporal relationships between different data features. Between the two components there is a data bottleneck, with the encoded input projected into a lower-dimensional latent space. (The reader is referred to our repository for further details about the implementation and training of the model.) By training the model to reproduce the encoder input at the decoder output, we can learn a mapping from the input space to a latent space in an unsupervised manner, and thus extract a compact feature vector automatically. We describe the mapping as $\phi : \mathcal{B} \to \mathbb{R}^d$ from the behavior set $\mathcal{B}$ to a $d$-dimensional feature space.
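A minimal sketch of the encoder-decoder structure described above is given below, assuming PyTorch, a VGGNet-A feature dimension of 4096, and illustrative hidden and latent sizes; the actual Valma implementation (see our repository) may differ. After training, the latent mean can serve as the video's feature vector.

```python
import torch
import torch.nn as nn

class ValmaLikeVAE(nn.Module):
    """Sequence VAE sketch: LSTM encoder -> latent bottleneck -> LSTM decoder.

    Input is a sequence of per-frame features (e.g. from VGGNet-A), shape
    (batch, frames, feat_dim). All sizes here are illustrative assumptions.
    """
    def __init__(self, feat_dim=4096, hidden_dim=512, latent_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.from_latent = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.to_frame = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x):
        _, (h, _) = self.encoder(x)  # final hidden state summarizes the video
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        # Repeat the latent code at each time step to drive the decoder.
        dec_in = self.from_latent(z).unsqueeze(1).repeat(1, x.size(1), 1)
        out, _ = self.decoder(dec_in)
        return self.to_frame(out), mu, logvar  # reconstruction + latent stats

def vae_loss(recon, x, mu, logvar):
    """Reconstruction error plus KL divergence to the standard normal prior."""
    rec = nn.functional.mse_loss(recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```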
III Trust for Individuals
To meaningfully compare individuals’ preferences regarding videos of swarm behaviors, we consider pairwise comparisons (‘instances’) in a feature vector space. In the following sections we synthesize and build on [6] and [13], posing a convex optimization problem with a global extremum.
III-A Preference Synthesis
The feature vectors for the trust preference pair $(b_i, b_j) \in \mathcal{P}_k$ are given by $x_i = \phi(b_i)$ and $x_j = \phi(b_j)$, respectively. Pairwise trust preferences imply the existence of an underlying quadratic trust function $T_k$ such that

$$T_k(x_i) \geq T_k(x_j). \qquad (2)$$
Consider the quadratic trust function

$$T_k(x) = -\|x - x_k^*\|_2^2, \qquad (3)$$

where $x_k^*$ is a vector in $\mathbb{R}^d$ corresponding to optimal trust for individual $k$. We may estimate this function by considering the preference set $\mathcal{P}_k$ as analogous to a set of affine classifications. In feature vector space, this set corresponds to a set of hyperplanes separating the pairs of behavior points with maximal distance to each point.
Given the preference pair $(b_i, b_j)$, (2) is equivalent to the halfspace

$$H_{ij} = \{x \in \mathbb{R}^d : a_{ij}^\top x \leq c_{ij}\}, \qquad (4)$$

where $a_{ij} = 2(x_j - x_i)$ and $c_{ij} = \|x_j\|_2^2 - \|x_i\|_2^2$. The closed halfspace indicates that $H_{ij}$ (the region of the feature space containing preferred behaviors) is convex but not affine.
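For completeness, substituting (3) into (2) and expanding the squared norms recovers (4) directly; the quadratic term $\|x\|_2^2$ cancels from both sides, which is what makes the constraint linear in $x$:

```latex
T_k(x_i) \ge T_k(x_j)
  \iff -\|x_i - x\|_2^2 \ge -\|x_j - x\|_2^2
  \iff \|x_i\|_2^2 - 2 x_i^\top x \le \|x_j\|_2^2 - 2 x_j^\top x
  \iff \underbrace{2 (x_j - x_i)^\top}_{a_{ij}^\top}\, x
       \le \underbrace{\|x_j\|_2^2 - \|x_i\|_2^2}_{c_{ij}}.
```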
III-B Preference Polytope
The individual's preference set $\mathcal{P}_k$ can be described by a set of halfspaces (4) in feature vector space, with the halfspace $H_{ij}$ associated with the preference $(b_i, b_j)$. Each preference instance reduces the feasible region for $x_k^*$, further constraining the estimate. The intersection of the closed halfspaces defines a preference polytope, a region of feature vector space associated with greatest preference. The intersection satisfies the system of linear inequalities created by the preferences, and can be represented by the polytope

$$P_k = \{x \in \mathbb{R}^d : a_{ij}^\top x \leq c_{ij}, \ \forall (b_i, b_j) \in \mathcal{P}_k\}. \qquad (5)$$
An example of the intersection of eight preferences' corresponding halfspaces is given by the shaded interior region in Figure 3. The preference polytope can be unbounded for small or poorly distributed preference pairs. The preference polytope can also be empty for cyclic preference graphs and poorly selected embeddings.
The closed region of the polytope can be used to determine preferred swarm behaviors; this process is often termed preference learning [18]. The polytope can be built iteratively, with new pairwise comparisons presented to the individual over time. Given the polytope $P_k(t)$ at sample time $t$, the addition of the preference at time $t+1$ forms the new polytope $P_k(t+1) = P_k(t) \cap H_{t+1}$. The strategic presentation of pairwise comparisons to the participant can rapidly reduce the volume of $P_k$ over time.
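Iterative construction of the polytope amounts to appending one row of (4) per new comparison. A minimal sketch, with a hypothetical helper name and an array-based representation:

```python
import numpy as np

def add_preference(A, c, x_pref, x_other):
    """Append the halfspace (4) for a new pairwise comparison, so that
    P(t+1) = P(t) intersected with H_{t+1}.

    x_pref: feature vector of the more trusted behavior.
    x_other: feature vector of the less trusted behavior.
    A, c: current stacked halfspaces (None, None for an empty polytope).
    """
    a = 2.0 * (x_other - x_pref)                 # a_ij = 2 (x_j - x_i)
    bound = x_other @ x_other - x_pref @ x_pref  # c_ij = ||x_j||^2 - ||x_i||^2
    if A is None:
        return a.reshape(1, -1), np.array([bound])
    return np.vstack([A, a]), np.append(c, bound)
```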
III-C Finding the Chebyshev Center
As we have insufficient information within a bounded $P_k$ to find $x_k^*$ in (3), we may substitute an alternative point in $P_k$. We believe that the Chebyshev center of $P_k$ is a suitable candidate [6]. The Chebyshev center $x_c$ of $P_k$ is the center of the largest inscribed ball in $P_k$, also referred to as the in-center point. A visual interpretation of this is depicted in Figure 3. Let the Chebyshev center $x_c$ lie at the center of the largest possible ball $\mathcal{B}(x_c, r) = \{x_c + u : \|u\|_2 \leq r\}$ inside $P_k$. We may obtain $x_c$ by maximising $r$. For the ball to lie in the halfspace associated with the preference $(b_i, b_j)$, every point of the ball must satisfy

$$a_{ij}^\top (x_c + u) \leq c_{ij}, \quad \forall\, \|u\|_2 \leq r. \qquad (6)$$

The corresponding largest possible value of the left-hand side over the ball is $a_{ij}^\top x_c + r\|a_{ij}\|_2$. Hence the ball lies within the halfspace if and only if $a_{ij}^\top x_c + r\|a_{ij}\|_2 \leq c_{ij}$ for all $(b_i, b_j) \in \mathcal{P}_k$. Given the ball radius $r$, $x_c$ can be found by solving the optimization

$$\begin{aligned} \max_{x_c,\, r} \quad & r \\ \text{s.t.} \quad & a_{ij}^\top x_c + r\,\|a_{ij}\|_2 \leq c_{ij}, \quad \forall (b_i, b_j) \in \mathcal{P}_k. \end{aligned} \qquad (7)$$
The optimization is a linear program, which many algorithms can solve reliably and efficiently [19]. The resulting Chebyshev center $x_c$ can then be used as a proxy to compare individuals' trust.
[Figure 3: A preference polytope formed from intersecting halfspaces, with the Chebyshev center at the center of the largest inscribed ball.]
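A sketch of the linear program (7) using scipy's linprog (an assumed tooling choice, not necessarily the paper's implementation); the decision variable stacks $x_c$ and $r$, and $r$ is negated to fit linprog's minimization form.

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_center(A, c):
    """Chebyshev center of the polytope {x : A x <= c}, per optimization (7).

    A is (m, d), c is (m,). Returns (x_c, r). The decision variable is
    [x_c; r]; linprog minimizes, so we minimize -r to maximize the radius.
    """
    m, d = A.shape
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    A_ub = np.hstack([A, norms])  # a_ij^T x_c + r * ||a_ij||_2 <= c_ij
    obj = np.zeros(d + 1)
    obj[-1] = -1.0                # maximize r
    bounds = [(None, None)] * d + [(0, None)]
    res = linprog(obj, A_ub=A_ub, b_ub=c, bounds=bounds, method="highs")
    if not res.success:
        raise ValueError("polytope may be empty or unbounded")
    return res.x[:d], res.x[-1]

# Unit square example: center (0.5, 0.5), radius 0.5.
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
c = np.array([1., 0., 1., 0.])
print(chebyshev_center(A, c))
```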
IV Trust for Groups
We may analyze the trust preferences of a group of individuals. We first examine the distinctiveness of individuals' preferences using labeled pairwise preference data and consider the effect of self-contradictory responses. Then, considering unlabeled pairwise preferences, we observe properties of the entire group's preferences. Assuming that each individual's point of maximal trust is drawn from a normal distribution, we use this to evaluate the cohesion of the group.
IV-A Labeled Individual Preferences
Given multiple individuals' respective preferences, we may devise a measure of the individuals' distinctiveness. The optimal trust value for each individual is denoted as $x_k^* = \bar{x} + \delta_k$, where $\bar{x}$ is assumed to be a global reference trust measure and $\delta_k$ the perturbation of individual $k$ away from this reference. If the magnitude of $\delta_k$ were small for all of the team, the group would exhibit similar trust behaviors. An individual $k$ with large $\|\delta_k\|$ would, in turn, have distinctive trust behaviors compared to the group as a whole. Examining the preferences across each individual $k$, each preference selection generates the halfspace $a_{ij,k}^\top(\bar{x} + \delta_k) \leq c_{ij,k}$. An optimization problem can then be posed to find $\bar{x}$ with small $\delta_k$ across all members using

$$\begin{aligned} \min_{\bar{x},\, \{\delta_k\}} \quad & \sum_k \|\delta_k\|_1 \\ \text{s.t.} \quad & a_{ij,k}^\top(\bar{x} + \delta_k) \leq c_{ij,k}, \quad \forall (b_i, b_j) \in \mathcal{P}_k, \ \forall k. \end{aligned} \qquad (8)$$
The sum of 1-norms is used as the minimization objective due to its sparsifying properties, as it promotes sparse solutions with small or even zero $\delta_k$ [19]. When $\delta_k = 0$, the reference trust measure $\bar{x}$ will satisfy all of the selection preferences for individual $k$. This subset of individuals will share a non-trivial intersection of their trust polytopes, and an additional Chebyshev center selection could be performed to select a trust reference with the characteristics described in Section III-C.
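A sketch of (8) assuming the cvxpy modeling interface; `halfspaces` maps each individual $k$ to the stacked constraint data $(A_k, c_k)$, and the distinctiveness of individual $k$ is read off as $\|\delta_k\|_1$ at the optimum.

```python
import cvxpy as cp
import numpy as np

def distinctiveness(halfspaces, d):
    """Solve (8): a shared reference x_bar plus sparse per-individual offsets.

    `halfspaces` maps individual k to (A_k, c_k), with rows a_ij^T x <= c_ij.
    Returns x_bar and each individual's distinctiveness ||delta_k||_1.
    """
    x_bar = cp.Variable(d)
    deltas = {k: cp.Variable(d) for k in halfspaces}
    constraints = [A @ (x_bar + deltas[k]) <= c
                   for k, (A, c) in halfspaces.items()]
    objective = cp.Minimize(sum(cp.norm1(deltas[k]) for k in halfspaces))
    cp.Problem(objective, constraints).solve()
    return x_bar.value, {k: float(np.sum(np.abs(deltas[k].value)))
                         for k in halfspaces}
```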
IV-B Unlabeled Population Preferences
We now examine the case where a preference may be expressed multiple times with contradictory responses. Consider the selection of the perturbation $\delta$ for each individual from a normal distribution $\mathcal{N}(0, \Sigma)$, where $\Sigma$ is a symmetric positive definite matrix. From (4), the selection problem can subsequently be posed as a preferential selection of one choice over the other when the random variable $y_{ij} = a_{ij}^\top(\bar{x} + \delta) - c_{ij}$ is non-positive; the corresponding distribution is $y_{ij} \sim \mathcal{N}(a_{ij}^\top \bar{x} - c_{ij},\ a_{ij}^\top \Sigma\, a_{ij})$. By applying the inverse distribution mapping (1), the probability of a positive preference selection is then $p_{ij} = \Phi\big((c_{ij} - a_{ij}^\top \bar{x}) / \sqrt{a_{ij}^\top \Sigma\, a_{ij}}\big)$. Assuming that the covariance of $\delta$ is bounded as $\Sigma \preceq \sigma^2 I$, then $a_{ij}^\top \Sigma\, a_{ij} \leq \sigma^2 \|a_{ij}\|_2^2$. For $p_{ij} \geq 1/2$, then $\Phi^{-1}(p_{ij}) \geq 0$ and

$$0 \leq c_{ij} - a_{ij}^\top \bar{x} \leq \sigma \|a_{ij}\|_2\, \Phi^{-1}(p_{ij}). \qquad (9)$$

Similarly, for $p_{ij} < 1/2$, then

$$\sigma \|a_{ij}\|_2\, \Phi^{-1}(p_{ij}) \leq c_{ij} - a_{ij}^\top \bar{x} \leq 0, \qquad (10)$$
such that each preference constrains $\bar{x}$ to lie in what can be interpreted geometrically as a slab, i.e., a set of the form $\{x : \alpha \leq a^\top x \leq \beta\}$ for scalars $\alpha \leq \beta$. With the data sampled from a finite population, the probability $p_{ij}$ is calculated based on a confidence interval projected onto the unit interval as $p_{ij} \in [w_{ij} - \epsilon/2,\ w_{ij} + \epsilon/2] \cap [0, 1]$. Here, $\epsilon$ is the width of the confidence interval band and is based on the margin of error calculated from a number of samples. Applying the central limit theorem for the binomial distribution is one approach to calculate the width, with $\epsilon = 2z\sqrt{w_{ij}(1 - w_{ij})/n_{ij}}$, where $z$ is the $z$-score associated with a confidence interval and $n_{ij}$ is the number of samples of the preference [20].
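A small sketch of the binomial interval width described above, under the assumption that `w` is the observed preference fraction; [20] discusses better-behaved alternatives (e.g. the Wilson interval) for small sample counts.

```python
import math

def preference_interval(w, n, z=1.96):
    """Wald confidence interval for an observed preference fraction w out of
    n comparisons, clipped to the unit interval. z=1.96 gives ~95% confidence.
    """
    margin = z * math.sqrt(w * (1.0 - w) / n)
    return max(0.0, w - margin), min(1.0, w + margin)

print(preference_interval(w=0.75, n=40))  # e.g. 30 of 40 prefer behavior i
```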
We assume that $\mathcal{E}$ is constructed so that $p_{ij} \geq 1/2$ for most of the population, for example as per Section II-B with $w_{ij} \geq 1/2$ for each edge $(i, j) \in \mathcal{E}$. Using the constraints (9) and (10) over $\mathcal{E}$, we may find the average trust measure $\bar{x}$ and minimum covariance bound $\sigma$ for the population as

$$\begin{aligned} \min_{\bar{x},\, \sigma} \quad & \sigma \\ \text{s.t.} \quad & 0 \leq c_{ij} - a_{ij}^\top \bar{x} \leq \sigma \|a_{ij}\|_2\, \Phi^{-1}(p_{ij}), \quad \forall (i, j) \in \mathcal{E} : p_{ij} \geq 1/2, \\ & \sigma \|a_{ij}\|_2\, \Phi^{-1}(p_{ij}) \leq c_{ij} - a_{ij}^\top \bar{x} \leq 0, \quad \forall (i, j) \in \mathcal{E} : p_{ij} < 1/2. \end{aligned} \qquad (11)$$

For the $(i, j)$ preference, the (positive) upper bound $\sigma \|a_{ij}\|_2\, \Phi^{-1}(p_{ij})$ constrains the width of the slab containing $\bar{x}$. Similarly, the closer $p_{ij}$ is to $1/2$ (i.e. a split decision on the preference among individuals), the narrower the slab.
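Because $\Phi^{-1}(p_{ij})$ is a constant once $p_{ij}$ is fixed, (11) is linear in $(\bar{x}, \sigma)$. A sketch assuming cvxpy and scipy, with edge data supplied as $(a_{ij}, c_{ij}, p_{ij})$ triples:

```python
import cvxpy as cp
import numpy as np
from scipy.stats import norm

def population_trust(edges, d):
    """Solve (11): average trust measure x_bar and minimum covariance bound.

    `edges` is a list of (a, c, p) triples: halfspace a^T x <= c with observed
    preference probability p, required to lie strictly inside (0, 1) since
    Phi^{-1} diverges at the endpoints.
    """
    x_bar = cp.Variable(d)
    sigma = cp.Variable(nonneg=True)
    constraints = []
    for a, c, p in edges:
        slack = c - a @ x_bar                            # c_ij - a_ij^T x_bar
        bound = sigma * np.linalg.norm(a) * norm.ppf(p)  # sigma ||a|| Phi^-1(p)
        if p >= 0.5:
            constraints += [slack >= 0, slack <= bound]  # constraint (9)
        else:
            constraints += [slack <= 0, slack >= bound]  # constraint (10)
    cp.Problem(cp.Minimize(sigma), constraints).solve()
    return x_bar.value, float(sigma.value)
```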
V User Study
We have conducted a user study to observe human trust preferences regarding swarm behaviors. In this section we outline our procedure and summarize collected results.
V-A Procedure
We pursued an online survey methodology involving filmed videos of the swarm behaviors from Section II-A. All participants were over 18 years of age and no demographic information was requested. Forty-three participants responded to the survey, and were not remunerated or otherwise rewarded for survey completion. In the first part, we presented the participant with a video of a swarm executing a leader-following behavior (see Section II-A), followed by fourteen questions relating to trust in the swarm; these are a modified version of the Trust Perception Scale-HRI (TPS-HRI) [3], substituting the word 'swarm' for 'robot'. In the second part, we presented the participant with pairs of swarm behavior videos and collected the participant's pairwise trust preferences, asking 'Comparing the two swarms, which do you trust more?'. To avoid priming the participants we did not specify the notion of trust further. The first six pairs of videos covered each combination among the first four of the five behaviors, shown to each participant in the same order. Each participant could then repeat the process with four more video pairs, but this was optional to avoid survey fatigue impacting response quality.
Feature Vector Extraction
The feature vectors extracted by Valma encode sequential dependencies between successive video frames: for a training set of 20 videos, the percentage of dimensions differing between the original video’s feature vector and that of the same video played in reverse lies in the range . Since the only difference is in the order of video frames presented to Valma, and given that this has yielded distinct feature vectors, we infer that Valma can map distinct videos to distinct feature vectors.
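The reversal test above might be implemented along the following lines, reusing the hypothetical ValmaLikeVAE sketch from Section II-C and taking the latent mean as the feature vector; the tolerance is an illustrative assumption.

```python
import torch

def fraction_differing(model, frames, tol=1e-6):
    """Fraction of latent dimensions that differ between a video and its
    time-reversed copy. `frames` has shape (1, num_frames, feat_dim) and
    holds the per-frame VGG features of one video.
    """
    model.eval()
    with torch.no_grad():
        _, mu_fwd, _ = model(frames)
        _, mu_rev, _ = model(torch.flip(frames, dims=[1]))  # reverse frame order
    return float((torch.abs(mu_fwd - mu_rev) > tol).float().mean())
```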
V-B Results
We proceed to analyze data gathered from the online survey, focusing on the distinctiveness and cohesion of the cohort's trust preferences. For each participant we have created a preference graph (exemplified by Figure 7) to determine the respective preference polytope. For the subset of participants with bounded preference polytopes, their individual Chebyshev centers could be found. In Figure 4 the aggregated population trust optimum $\bar{x}$ from (8) is compared with the Chebyshev centers for the preference polytopes belonging to a subset of participants. (For visualization purposes only two dimensions of the feature space are depicted.) For the same subset of participants, we also compare their distinctiveness with corresponding TPS-HRI trust scores in Figure 6. We observe that participants with low distinctiveness and a trust score within a central range express preferences compatible with the population's preferences. In contrast, participants with high distinctiveness have preferences differing from the population. In Figure 5 we extrapolate the notion of distinctiveness and trust bounds to the whole population. Participants with trust values in this range are associated with low distinctiveness from the population's preferences in Figure 6, and hence have similar trust preferences to an average participant in the population. In this way low distinctiveness can become a criterion for selecting teams of participants.
In Figure 7 a partial ordering over the swarm behaviors is generated from an aggregation of unlabeled preferences from all participants, and represented as the depicted preference graph. We may then use the edge-weighted population preference graph to derive an average trust measure $\bar{x}$ and minimum covariance bound $\sigma$ from (11). In Table I we compare these with the aggregated Chebyshev center of the unweighted preference graph. We observe in Table II that 54.05% of participants' individual Chebyshev centers lie within the upper bound of one standard deviation of the mean, and 100% lie within two. This matches well the theoretical bounds for a normal distribution, hence the aggregated unlabeled preference data is consistent with the model presented in Figure 7. The relatively small distance between the mean $\bar{x}$ and the population's Chebyshev center (see Table I) shows that the population preference graph and the optimal solution of (11) are representative of the true value of $\bar{x}$. This suggests that we may analyze preference similarity to evaluate a population's cohesiveness.
Table I: Population trust measures from the weighted preference graph and the aggregate Chebyshev center.

Aggregate Chebyshev center | Mean, $\bar{x}$ | Covariance bound, $\sigma$
---|---|---
(-0.4076, 0.1697) | (-0.4383, 0.1788) | 0.3406
Table II: Proportion of individual Chebyshev centers within $s$ standard deviations of the population mean.

$s$ | Observed fraction | Theoretical fraction
---|---|---
1 | 0.5405 | 0.6812
2 | 1.0000 | 0.9545
VI Conclusions
In this work, we have studied a model of human trust in a robotic swarm using preferences. Generating a unique feature vector for each swarm behavior using Valma, we have embedded the swarm behaviors into a feature space and have formulated a polytope model with a Chebyshev center. Extending our consideration to groups of individuals, we have formulated a new distinctiveness metric to measure individuals’ labeled pairwise trust preferences with respect to a wider population. Aggregating all pairwise trust preferences for a group, we have posed a sparse optimization problem informed by the population’s weighted preference graph. This yields an average trust measure and minimum covariance bound, enabling analysis of the group’s cohesion. Results from our user study suggest that individuals with similar trust profiles may be grouped by low distinctiveness.
We anticipate three main areas for future work: measuring steady-state trust preferences in longer-duration interactions, modeling a population’s aggregated trust preferences using explicitly distinct clusters, and identifying an ideal truncation length for feature vectors produced by Valma.
References
- [1] J. D. Lee and K. A. See, “Trust in Automation: Designing for Appropriate Reliance,” Human Factors, vol. 46, no. 1, pp. 50–80, Mar. 2004.
- [2] J. B. Lyons, K. Sycara, M. Lewis, and A. Capiola, “Human-Autonomy Teaming: Definitions, Debates, and Directions,” Frontiers in Psychology, vol. 12, p. 589585, 2021.
- [3] K. E. Schaefer, “Measuring Trust in Human Robot Interactions: Development of the “Trust Perception Scale-HRI”,” in Robust Intelligence and Trust in Autonomous Systems, R. Mittu, D. Sofge, A. Wagner, and W. Lawless, Eds. Boston, MA: Springer US, 2016, pp. 191–218.
- [4] C. Nam, P. Walker, H. Li, M. Lewis, and K. Sycara, “Models of Trust in Human Control of Swarms With Varied Levels of Autonomy,” IEEE Transactions on Human-Machine Systems, vol. 50, no. 3, pp. 194–204, Jun. 2020.
- [5] J. Fürnkranz and E. Hüllermeier, “Preference learning and ranking by pairwise comparison,” in Preference learning. Springer, 2010, pp. 65–82.
- [6] P. Kingston and M. Egerstedt, “Comparing apples and oranges through partial orders: An empirical approach,” in 2009 American Control Conference, Jun. 2009, pp. 5434–5439.
- [7] K. Akash, W.-L. Hu, T. Reid, and N. Jain, “Dynamic modeling of trust in human-machine interactions,” in 2017 American Control Conference (ACC), May 2017, pp. 1542–1548.
- [8] E. Schoof, C. Manzie, I. Shames, A. Chapman, and D. Oetomo, “An experimental platform for heterogeneous multi-vehicle missions,” in Proceedings of the International Conference on Science and Innovation for Land Power, Adelaide, Australia, 2018, pp. 5–6.
- [9] J. Marshall, M. Broucke, and B. Francis, “Formations of vehicles in cyclic pursuit,” IEEE Transactions on Automatic Control, vol. 49, no. 11, pp. 1963–1974, Nov. 2004.
- [10] A. Pierson and M. Schwager, “Bio-inspired non-cooperative multi-robot herding,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 1843–1849.
- [11] P. Pierpaoli, T. T. Doan, J. Romberg, and M. Egerstedt, “A Reinforcement Learning Framework for Sequencing Multi-Robot Behaviors,” arXiv:1909.05731 [cs, eess], Sep. 2019.
- [12] Y.-C. Ho, R. Sreenivas, and P. Vakili, “Ordinal optimization of deds,” Discrete event dynamic systems, vol. 2, no. 1, pp. 61–88, 1992.
- [13] P. Kingston, J. von Hinezmeyer, and M. Egerstedt, “Metric Preference Learning with Applications to Motion Imitation,” in Controls and Art, A. LaViers and M. Egerstedt, Eds. Cham: Springer International Publishing, 2014, pp. 1–26.
- [14] S. L. Brunton and J. N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge: Cambridge University Press, 2019. [Online]. Available: https://www.cambridge.org/core/books/datadriven-science-and-engineering/77D52B171B60A496EAFE4DB662ADC36E
- [15] K. Aberman, R. Wu, D. Lischinski, B. Chen, and D. Cohen-Or, “Learning character-agnostic motion for motion retargeting in 2D,” ACM Transactions on Graphics, vol. 38, no. 4, pp. 75:1–75:14, Jul. 2019.
- [16] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” Sep. 2014.
- [17] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
- [18] R. Herbrich, T. Graepel, P. Bollmann-Sdorra, and K. Obermayer, “Supervised learning of preference relations,” Proceedings des Fachgruppentreffens Maschinelles Lernen (FGML-98), pp. 43–47, 1998.
- [19] S. Boyd, S. P. Boyd, and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.
- [20] L. D. Brown, T. T. Cai, and A. DasGupta, “Interval Estimation for a Binomial Proportion,” Statistical Science, vol. 16, no. 2, pp. 101–133, 2001.