Parameter Estimation in Epidemic Spread Networks Using Limited Measurements††thanks: Research supported in part by the National Science Foundation, grants NSF-CMMI 1635014 and NSF-ECCS 2032258.
Abstract
We study the problem of estimating the parameters (i.e., infection rate and recovery rate) governing the spread of epidemics in networks. Such parameters are typically estimated by measuring various characteristics (such as the number of infected and recovered individuals) of the infected populations over time. However, these measurements also incur certain costs, depending on the population being tested and the times at which the tests are administered. We thus formulate the epidemic parameter estimation problem as an optimization problem, where the goal is to either minimize the total cost spent on collecting measurements, or to optimize the parameter estimates while remaining within a measurement budget. We show that these problems are NP-hard to solve in general, and then propose approximation algorithms with performance guarantees. We validate our algorithms using numerical examples.
Keywords: Epidemic spread networks, Parameter estimation, Optimization algorithms
1 Introduction
Models of spreading processes over networks have been widely studied by researchers from different fields (e.g., [14, 23, 6, 3, 25, 20]). The case of epidemics spreading through networked populations has received a particularly significant amount of attention, especially in light of the ongoing COVID-19 pandemic (e.g., [20, 21]). A canonical example is the networked SIR model, where each node in the network represents a subpopulation or an individual, and can be in one of three states: susceptible (S), infected (I), or recovered (R) [18]. Two key parameters govern such models: the infection rate of a given node, and the recovery rate of that node. In the case of a novel virus, these parameters may not be known a priori, and must be identified or estimated from gathered data, such as the number of infected and recovered individuals in the network at certain points in time. For instance, in the COVID-19 pandemic, one possibility for collecting data on the number of infected or recovered individuals in the network is to perform virus or antibody tests on the individuals, with each test incurring a cost. Therefore, in the problem of parameter estimation in epidemic spread networks, it is important and of practical interest to take the costs of collecting the data (i.e., measurements) into account, which leads to the problem formulations considered in this paper. The goal is to exactly identify (when possible) or estimate the parameters in the networked SIR model using a limited number of measurements. Specifically, we divide our analysis into two scenarios: 1) when the measurements (e.g., the number of infected individuals) can be collected exactly without error; and 2) when only stochastic measurements can be obtained.
Under the setting when exact measurements of the infected and recovered proportions of the population at certain nodes in the network can be obtained, we formulate the Parameter Identification Measurement Selection (PIMS) problem as minimizing the cost spent on collecting the measurements, while ensuring that the parameters of the SIR model can be uniquely identified (within a certain time interval in the epidemic dynamics). In settings where the measurements are stochastic (thereby precluding exact identification of the parameters), we formulate the Parameter Estimation Measurement Selection (PEMS) problem. The goal is to optimize certain estimation metrics based on the collected measurements, while satisfying the budget on collecting the measurements.
Related Work
The authors in [22, 34] studied the parameter estimation problem in epidemic spread networks using a "Susceptible-Infected-Susceptible (SIS)" model of epidemics. When exact measurements of the infected proportion of the population at each node of the network can be obtained, the authors proposed a necessary and sufficient condition on the set of the collected measurements such that the parameters of the SIS model (i.e., the infection rate and the recovery rate) can be uniquely identified. However, this condition does not pose any constraint on the number of measurements that can be collected.
In [24], the authors considered a measurement selection problem in the SIR model. Their goal is to perform a limited number of virus tests among the population such that the probability of undetected asymptomatic cases is minimized. The transmission of the disease in the SIR model considered in [24] is characterized by a Bernoulli random variable, which leads to a Hidden Markov Model for the SIR dynamics.
Finally, our work is also closely related to the sensor placement problem that has been studied for control systems (e.g., [19, 37, 36]), signal processing (e.g., [7, 35]), and machine learning (e.g., [17]). The goal of these problems is to optimize certain (problem-specific) performance metrics of the estimate based on the measurements of the placed sensors, while satisfying the sensor placement budget constraints.
Contributions
First, we show that the PIMS problem is NP-hard, which precludes polynomial-time algorithms for the PIMS problem (if P ≠ NP). By exploiting structural properties of the PIMS problem, we provide a polynomial-time approximation algorithm that returns a solution within a certain approximation ratio of the optimal. The approximation ratio depends on the cost structure of the measurements and on the graph structure of the epidemic spread network. Next, we show that the PEMS problem is also NP-hard. In order to provide a polynomial-time approximation algorithm that solves the PEMS problem with performance guarantees, we first show that the PEMS problem can be transformed into the problem of maximizing a set function subject to a knapsack constraint. We then apply a greedy algorithm to the (transformed) PEMS problem, and provide approximation guarantees for the greedy algorithm. Our analysis of the greedy algorithm also generalizes results from the literature on maximizing a submodular set function under a knapsack constraint to nonsubmodular settings. We use numerical examples to validate the obtained performance bounds of the greedy algorithm, and show that the greedy algorithm performs well in practice.
Notation and Terminology
The sets of integers and real numbers are denoted as and , respectively. For a set , let denote its cardinality. For any , let . Let denote a zero matrix of dimension ; the subscript is dropped if the dimension can be inferred from the context. For a matrix , let , and be its transpose, trace and determinant, respectively. The eigenvalues of are ordered such that . Let (or ) denote the element in the th row and th column of , and let denote the th row of . A positive semidefinite matrix is denoted by .
2 Model of Epidemic Spread Network
Suppose a disease (or virus) is spreading over a directed graph , where is the set of nodes, and is the set of directed edges (and self-loops) that captures the interactions among the nodes in . Here, each node is considered to be a group (or population) of individuals (e.g., a city or a country). A directed edge from node to node , where , is denoted by . For all , denote and . For all and for all , we let , and represent the proportions of the population of node that are susceptible, infected and recovered at time , respectively. To describe the dynamics of the spread of the disease in , we will use the following discrete-time SIR model (e.g., [11]):
(1a) | |||
(1b) | |||
(1c) |
where is the infection rate of the disease, is the recovery rate of the disease, is the sampling parameter, and is the weight associated with edge . Let be a weight matrix, where for all such that , and otherwise. Denote , , and , for all . Throughout this paper, we assume that the weight matrix and the sampling parameter are given.
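Since the displayed updates (1a)-(1c) are not fully legible in this version, the following minimal Python sketch assumes the standard discrete-time networked SIR update, with infection rate beta, recovery rate delta, sampling parameter h, and weight matrix W; all names are illustrative rather than the paper's notation.

```python
import numpy as np

def sir_step(s, x, r, W, beta, delta, h):
    """Advance the proportions (s, x, r) at every node by one time step of
    the assumed discrete-time networked SIR dynamics."""
    infection = beta * s * (W @ x)   # new infections at each node
    recovery = delta * x             # recoveries at each node
    s_next = s - h * infection
    x_next = x + h * (infection - recovery)
    r_next = r + h * recovery
    return s_next, x_next, r_next
```

Under this update, the three proportions at every node continue to sum to one after each step, consistent with the SIR interpretation.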
3 Preliminaries
We make the following assumptions on the initial conditions , and , and the parameters of the SIR model in Eq. (1) (e.g., [22, 11]).
Assumption 3.1.
For all , , , , and .
Assumption 3.2.
Assume that with . For all with and , assume that . For all , .
Definition 3.3.
Consider a directed graph with . A directed path of length from node to node in is a sequence of directed edges . For any distinct pair of nodes such that there exists a directed path from to , the distance from node to node , denoted as , is defined as the shortest length over all such paths.
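The distance in Definition 3.3 is a shortest directed path length, which can be computed by a breadth-first search; the adjacency-list representation below is an illustrative assumption.

```python
def distance(adj, u, v):
    """Length of the shortest directed path from u to v in the graph given
    by the adjacency list adj (cf. Definition 3.3); None if no path exists."""
    if u == v:
        return 0
    seen, frontier, d = {u}, [u], 0
    while frontier:
        d += 1
        nxt = []
        for a in frontier:
            for w in adj.get(a, []):
                if w == v:
                    return d
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return None
```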
Definition 3.4.
Define and . For all , define where , and if there is no path from to for any . For all , define .
Using similar arguments to those in [11], one can show that with for all and for all under Assumptions 3.1-3.2. Thus, given and , we can obtain for all and for all . We also have the following result that characterizes properties of and in the SIR model over given by Eq. (1); the proof can be found in Section 7.1 in the Appendix.
Lemma 3.5.
Consider a directed graph with and the SIR dynamics given by Eq. (1). Suppose Assumptions 3.1-3.2 hold. Then, the following results hold for all , where , and and are defined in Definition 3.4.
for all .
If , then for all , and for all . (Note that for the case when , i.e., , this part implies for all .)
If , then for all , and for all .
If with , then and for all .
4 Measurement Selection Problem in Exact Measurement Setting
Throughout this section, we assume that defined in Definition 3.4 are known, i.e., we know the set of nodes in that have infected individuals initially.
4.1 Problem Formulation
Given exact measurements of and for a subset of nodes, our goal is to estimate (or uniquely identify, if possible) the unknown parameters and . Here, we consider the scenario where collecting the measurement of (resp., ) at any node and at any time step incurs a cost, denoted as (resp., ). Moreover, we can only collect the measurements of and for , where are given with . Noting that Lemma 3.5 provides a (necessary and sufficient) condition under which (resp., ) holds, we see that one does not need to collect a measurement of (resp., ) if (resp., ) from Lemma 3.5. Given time steps with , we define a set
(2) |
which represents the set of all candidate measurements from time step to time step . To proceed, we first use Eq. (1b)-(1c) to obtain
(3) |
where with
(4) |
and with
(5) |
We can then view Eq. (3) as a set of equations in and . Noting that for all can be obtained from as argued in Section 3, we see that the coefficients in the set of equations in and given by Eq. (3), i.e., the terms in Eq. (3) other than and , can be determined given that and are known for all . Also note that given and for all , we can uniquely identify and using Eq. (3) if and only if .
Next, let denote a measurement selection strategy, where is given by Eq. (2). We will then consider identifying and using measurements contained in . To illustrate our analysis, given any and any , we first consider the following equation from Eq. (3):
(6) |
where , and we index the equation in Eq. (3) corresponding to Eq. (6) as . Note that in order to use Eq. (6) in identifying and , one needs to determine the coefficients (i.e., the terms other than and ) in the equation. Also note that in order to determine the coefficients in equation , one can use the measurements contained in , and use Lemma 3.5 to determine if (resp., ) holds. Supposing , we see from Lemma 3.5 and Eq. (1b) that and , which makes equation useless in identifying and . Thus, in order to use equation in identifying and , we need with . Similarly, given any and any , we consider the following equation from Eq. (3):
(7) |
where we index the above equation as . Supposing , we see from Lemma 3.5 and Eq. (1c) that , which makes equation useless in identifying and . Hence, in order to use equation in identifying and , we need to have with and . More precisely, we observe that equation can be used in identifying and if and only if , and or (from Lemma 3.5).
In general, let us denote the following two coefficient matrices corresponding to equations and in Eq. (3), respectively:
(8a) | |||
(8b) |
for all and for all . Moreover, given any measurement selection strategy , we let
(9) |
be the set that contains indices of the equations from Eq. (3) that can be potentially used in identifying and , based on the measurements contained in . In other words, the coefficients on the left-hand side of equation (resp., ()) can be determined using the measurements from and using Lemma 3.5, for all (resp., ). Let us now consider the coefficient matrix (resp., ) corresponding to (resp., ). One can then show that it is possible that there exist equations in whose coefficients cannot be (directly) determined using the measurements contained in or using Lemma 3.5, where the undetermined coefficients come from the first element in given by Eq. (8a). Nevertheless, it is also possible that one can perform algebraic operations among the equations in such that the undetermined coefficients get cancelled. Formally, we define the following.
Definition 4.1.
Consider a measurement selection strategy , where is given by Eq. (2). Stack coefficient matrices for all and for all into a single matrix, where and are given by (8) and is given by Eq. (9). The resulting matrix is denoted as . Moreover, define to be the set that contains all the matrices such that and can be obtained via algebraic operations among the rows in , and the elements in and can be fully determined using the measurements from and using Lemma 3.5.
In other words, corresponds to two equations (in and ) obtained from Eq. (3) such that the coefficients on the right-hand side of the two equations can be determined using the measurements contained in and using Lemma 3.5 (if the coefficients contain or ). Moreover, one can show that the coefficients on the left-hand side of the two equations obtained from Eq. (3) corresponding to can also be determined using measurements from and using Lemma 3.5. Putting the above arguments together, we see that given a measurement selection strategy , and can be uniquely identified if and only if there exists such that . Equivalently, denoting
(10) |
where if , we see that and can be uniquely identified using the measurements from if and only if .
Remark 4.2.
Note that if a measurement selection strategy satisfies , it follows from the above arguments that , i.e., has at least two rows, where is defined in Eq. (9).
Recall that collecting the measurement of (resp., ) at any node and at any time step incurs cost (resp., ). Given any measurement selection strategy , we denote the cost associated with as
(11) |
We then define the Parameter Identification Measurement Selection (PIMS) problem in the perfect measurement setting as follows.
Problem 4.3.
Consider a discrete-time SIR model given by Eq. (1) with a directed graph , a weight matrix , a sampling parameter , and sets defined in Definition 3.4. Moreover, consider time steps with , and a cost of measuring and a cost of measuring for all and for all . The PIMS problem is to find that solves
(12) |
where is defined in Eq. (2), is defined in Eq. (11), and is defined in Eq. (10).
We have the following result; the proof is included in Section 7.2 in the Appendix.
Theorem 4.4.
The PIMS problem is NP-hard.
Theorem 4.4 indicates that there is no polynomial-time algorithm that solves all instances of the PIMS problem optimally (if P ≠ NP). Moreover, we note from the formulation of the PIMS problem given by Problem 4.3 that for a measurement selection strategy , one needs to check if holds before the corresponding measurements are collected. However, in general, it is not possible to calculate when no measurements are collected. In order to bypass these issues, we will explore additional properties of the PIMS problem in the following.
4.2 Solving the PIMS Problem
We start with the following result.
Lemma 4.5.
Consider a discrete-time SIR model given by Eq. (1). Suppose Assumptions 3.1-3.2 hold. Then, the following results hold, where and are defined in (8), , , and and are defined in Definition 3.4 for all .
For any and for any with , for all and for all , where .
For any and for any with , for all , and for all , where .
Proof.
Noting from (8), we have
(13) |
To prove part , consider any and any with , where we note and from the definition of . We then see from Lemma - that and for all . It follows that for all . Also, we obtain from Lemma for all , which proves part .
We then prove part . Considering any and any with , we see from the definition of that and there exists such that . Letting be a node in such that , we note from Lemma that for all . Also note that from Assumption 3.2. The rest of the proof of part is then identical to that of part . ∎
Recalling the way we index the equations in Eq. (3) (see Eqs. (6)-(7) for examples), we define the set that contains all the indices of the equations in Eq. (3):
(14) |
Following the arguments in Lemma 4.5, we denote
(15) | |||
(16) |
where and are defined in Lemma 4.5, and is defined in Definition 3.4. Next, for all , we define the set of measurements that are needed to determine the coefficients in equation (when no other equations are used) to be
where is defined in Eq. (2). Similarly, for all , we define
Moreover, let us denote
(17) |
for all . Similarly to Eq. (11), denote the sum of the costs of the measurements from as .
Based on the above arguments, we propose an algorithm defined in Algorithm 1 for the PIMS problem. Note that Algorithm 1 finds an equation from and an equation from such that the sum of the costs of the two equations is minimized, where and are defined in Eq. (15) and Eq. (16), respectively.
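The pairing step that Algorithm 1 performs can be sketched as follows. This is a hypothetical simplification: each candidate equation is represented by the set of measurements needed to determine its coefficients, and the sketch charges a measurement shared by the two selected equations only once; all names are placeholders rather than the paper's notation.

```python
def select_equations(meas_sets_E1, meas_sets_E2, cost):
    """meas_sets_E1 / meas_sets_E2 map an equation index to the set of
    measurements needed to determine that equation's coefficients; cost maps
    a measurement to its collection cost. Returns the cheapest pair, one
    equation from each family."""
    best = None
    for e1, m1 in meas_sets_E1.items():
        for e2, m2 in meas_sets_E2.items():
            c = sum(cost[m] for m in (m1 | m2))  # shared measurements counted once
            if best is None or c < best[2]:
                best = (e1, e2, c)
    return best
```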
Proposition 4.6.
Proof.
The feasibility of follows directly from the definition of Algorithm 1 and Lemma 4.5. We now prove (18). Consider any equations and . We have from Eq. (17) the following:
which implies
Next, since satisfies , we recall from Remark 4.2 , where
which implies . In fact, suppose , where and . Since the elements in and (defined in (8)) do not contain , or for any , and cannot all be zero, we see that there exists (with ), where . This further implies . Using similar arguments, one can show that holds in general, which implies . Combining the above arguments leads to (18). ∎
5 Measurement Selection Problem in Random Measurement Setting
In this section, we assume that the initial condition is known. Nevertheless, our analysis can potentially be extended to cases where the initial condition is given by a probability distribution.
5.1 Problem Formulation
Here, we consider the scenario where the measurement of (resp., ), denoted as (resp., ), is given by a pmf (resp., ). Note one can express in terms of and using (1b). Hence, given and , we can alternatively write as for all and for all . Since the initial conditions are assumed to be known, we drop the dependency of on , and denote the pmf of as for all and for all . Similarly, given and , we denote the pmf of as for all and for all . Note that when collecting measurement (resp., ) under a limited budget, one possibility is to give virus (resp., antibody) tests to a group of randomly and uniformly sampled individuals of the population at node and at time (e.g., [2]), where a positive testing result indicates that the tested individual is infected (resp., recovered) at time (e.g., [1]). Thus, the obtained random measurements and and the corresponding pmfs and depend on the total number of conducted virus tests and antibody tests at node and at time , respectively. Consider any node and any time step , where the total population of is denoted by and is assumed to be fixed over time. Suppose we are allowed to choose the number of virus (resp., antibody) tests that will be performed on the (randomly sampled) individuals at node and at time . Assume that the cost of performing the virus (resp., antibody) tests is proportional to the number of the tests. For any and for any , let
(19) |
be the set of all possible costs that we can spend on collecting the measurement , where and . Similarly, for any and any , let
(20) |
denote the set of all possible costs that we can spend on collecting the measurement , where and . For instance, can be viewed as the cost of performing virus tests on (randomly sampled) individuals in the population at node , where and . To reflect the dependency of the pmf (resp., ) of measurement (resp., ) on the cost spent on collecting the measurement of (resp., ), we further denote the pmf of (resp., ) as (resp., ), where (resp., ) with (resp., ) given by Eq. (19) (resp., Eq. (20)). Note that (resp., ) is the cost that we spend on collecting measurement (resp., ), and (resp., ) indicates that measurement (resp., ) is not collected.
In contrast with the exact measurement case studied in Section 4, it is not possible to uniquely identify and using measurements and which are now random variables. Thus, we will consider estimators of and based on the measurements indicated by a measurement selection strategy. Similarly to Section 4, given time steps with , define the set of all candidate measurements as
(21) |
Recalling and defined in Eq. (19) and Eq. (20), respectively, we let be a measurement selection that specifies the costs spent on collecting measurements and for all and for all . Moreover, we define the set of all candidate measurement selections as
(22) |
where for all . In other words, a measurement selection is defined over the integer lattice so that is a vector of dimension , where each element of corresponds to an element in , and is denoted as (or ). The set contains all such that and for all and for all . Thus, for any and for any , there exists such that and . In other words, (resp., ) indicates the cost spent on collecting the measurement of (resp., ). Given a measurement selection , we can also denote the pmfs of and as and , respectively, where we drop the dependencies of the pmfs on and for notational simplicity.
To proceed, we consider the scenario where measurements can only be collected under a budget constraint given by . Using the above notations, the budget constraint can be expressed as
(23) |
We then consider estimators of based on any given measurement selection . Considering any , we denote
(24) |
for all and for all . For all and for all with , denote , where . Letting
we denote the measurement vector indicated by as
(25) |
where and . Note that and are (discrete) random variables with pmfs and , respectively. We then see from Eq. (25) that is a random vector whose pmf is denoted as . Similarly, the pmf of (resp., ) is denoted as (resp., ). Given with , we make the following assumption on measurements and .
Assumption 5.1.
For any and for any (), , , and are independent of each other. Moreover, for any () and for any , and are independent, and and are independent.
The above assumption ensures that measurements from different nodes or from different time steps are independent, and the measurements of and are also independent. It then follows from Eq. (25) that the pmf of can be written as
(26) |
where we can further write for all , and for all .
In order to quantify the performance (e.g., precision) of estimators of based on , we use the Bayesian Cramér-Rao Lower Bound (BCRLB) (e.g., [33]) associated with . In the following, we introduce the BCRLB, and explain why we choose it as a performance metric. First, given any measurement , let be the corresponding Fisher information matrix defined as
(27) |
with the expectation taken with respect to . Under Assumption 5.1 and some regularity conditions on the pmfs of and , Eq. (27) can be written as the following (e.g., [13]):
(28) |
Consider any estimator of based on a measurement selection , and assume that we have a prior pdf of , denoted as . Under some regularity conditions on the pmfs of and , and , we have (e.g., [32, 33]):
(29) |
where is the error covariance of the estimator , the expectation is taken with respect to , and is the BCRLB associated with the measurement selection . The BCRLB is defined as (e.g., [32, 33])
(30) |
where denotes the expectation taken with respect to , is given by Eq. (27), and encodes the prior knowledge of as
(31) |
where the second equality holds under some regularity conditions on [32].
Thus, the above arguments motivate us to consider (functions of) as optimization metrics in the measurement selection problem studied in this section, in order to characterize the estimation performance corresponding to a measurement selection . In particular, we will consider and , which are widely used criteria in parameter estimation (e.g., [12]), and are also known as the Bayesian A-optimality and D-optimality criteria respectively in the context of experimental design (e.g., [27]). First, considering the optimization metric , we see from the above arguments that (29) directly implies for all estimators of and for all . Therefore, a measurement selection that minimizes potentially yields a lower value of for an estimator of . Furthermore, there may exist an estimator that achieves the BCRLB (e.g., [32]), i.e., provides the minimum value of that can be possibly achieved by any estimator of , given a measurement selection . Similar arguments hold for . To proceed, denoting
(32) |
we define the Parameter Estimation Measurement Selection (PEMS) problem.
Problem 5.2.
Consider a discrete-time SIR model given by Eq. (1) with a directed graph , a weight matrix , a sampling parameter , and an initial condition . Moreover, consider time steps with ; a set with and , for all and for all ; a set with and , for all and for all ; a budget ; and a prior pdf . Suppose (resp., ) is given by a pmf (resp., ), where (resp., ). The PEMS problem is to find a measurement selection that solves
(33) |
where is defined in Eq. (22), can be either of or with and defined in Eq. (32), is defined in Eq. (21), and is given by Eq. (30).
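To illustrate how the two admissible objectives in (33) could be evaluated numerically, the sketch below (with placeholder matrices) averages sampled Fisher information matrices over draws from the prior, adds the prior-information term from Eq. (31), and returns the trace and log-determinant of the resulting BCRLB matrix, i.e., the Bayesian A-optimality and (log) D-optimality criteria.

```python
import numpy as np

def bcrlb_metrics(fim_samples, J_prior):
    """fim_samples: list of 2x2 Fisher information matrices, one per sample
    of the parameters drawn from the prior; J_prior: prior-information
    matrix. Returns (trace, log-det) of the BCRLB matrix."""
    J = np.mean(fim_samples, axis=0) + J_prior  # Bayesian information matrix
    C = np.linalg.inv(J)                        # BCRLB matrix
    return np.trace(C), np.linalg.slogdet(C)[1]
```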
5.2 Solving the PEMS Problem
In this section, we consider a measurement model with specific pmfs of and (e.g., [4] and [11]). Nonetheless, our analysis can potentially be extended to other measurement models.
5.2.1 Pmfs of Measurements and
Consider any and any . Assume that the total population of node is fixed over time and is denoted as . Given any measurement selection with defined in Eq. (22), we recall from Section 5.1 that can be viewed as the cost of performing virus tests on randomly and uniformly sampled individuals in the population of node , where (with ), and with . Note that is the proportion of the population at node and at time that is infected, and under Assumptions 3.1-3.2 as shown by Lemma 3.5. Thus, a randomly and uniformly sampled individual in the population at node and at time will be an infected individual (at time ) with probability , and will be a non-infected (i.e., susceptible or recovered) individual with probability . Supposing the tests are accurate (i.e., an infected individual at time is tested positive with probability one, and an individual that is not infected is tested negative with probability one), we see from the above arguments that the obtained number of individuals that are tested positive, i.e., , is a binomial random variable with parameters and . Thus, for any and for any , the pmf of is
(34) |
where with since . Note that we do not define the pmf of measurement when , i.e., when , since indicates no measurement is collected for state . Also note that when , the pmf of given in Eq. (34) reduces to . Moreover, since the weight matrix and the sampling parameter are assumed to be given, we see that given and initial condition , can be obtained using Eq. (1b) for all and for all , where we can view as a function in the unknown parameter . In other words, given , , , and , one can obtain the right-hand side of Eq. (34). Again, we only explicitly express the dependency of the pmf of on and in Eq. (34). Following similar arguments to those above, we assume that for any and for any , measurement has the following pmf:
(35) |
where with , , and . Similarly, we note that the pmf of given in Eq. (35) reduces to when . Considering any measurement selection and any measurement , where and is defined in Eq. (21), we have the following:
(36) | ||||
(37) |
where the expectation is taken with respect to , and . To obtain (36), we note the form of in Eq. (34) (or Eq. (35)), and use the chain rule. Moreover, one can obtain (37) from the fact that is a binomial random variable. Noting that the pmf of reduces to if as argued above, we let the right-hand side of (37) be zero if .
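Since each collected measurement is binomial, its Fisher information contribution takes the closed form indicated by (36)-(37): the scalar n/(p(1-p)) times the outer product of the gradient of the success probability with respect to the two unknown parameters. The sketch below uses placeholder arguments and, matching the convention above, returns a zero matrix when no tests are performed or the probability is degenerate.

```python
import numpy as np

def binom_fisher(n_tests, p, dp_dtheta):
    """Fisher information contributed by a binomial measurement with n_tests
    trials and success probability p(theta); dp_dtheta is the gradient of p
    with respect to theta = (infection rate, recovery rate)."""
    g = np.asarray(dp_dtheta, dtype=float)
    if n_tests == 0 or p <= 0.0 or p >= 1.0:
        return np.zeros((g.size, g.size))  # no information collected
    return (n_tests / (p * (1.0 - p))) * np.outer(g, g)
```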
5.2.2 Complexity of the PEMS Problem
Under the measurement model described above, we show that the PEMS problem is also NP-hard, i.e., there exist instances of the PEMS problem that cannot be solved optimally by any polynomial-time algorithm (if P ≠ NP). The proof of the following result is included in Section 7.3 in the Appendix.
Theorem 5.3.
The PEMS problem is NP-hard.
5.2.3 An Equivalent Formulation for the PEMS Problem
Theorem 5.3 motivates us to consider approximation algorithms for solving the PEMS problem. To begin with, we note that the objective function in the PEMS problem can be viewed as a function defined over an integer lattice. We then have and , where is defined in Eq. (22). First, considering , we will define a set function , where is a set constructed as
(38) |
In other words, for any and for any , we associate elements (resp., ) in set to measurement (resp., ). The set function is then defined as
(39) |
where for any , we define such that and for all and for all . In other words, (resp., ) is set to be the number of elements in that correspond to the measurement (resp., ). Also note that . Following the arguments leading to (37), we define
(40) |
where , , , , , and the expectation is taken with respect to the prior pdf . Given any , we see from the arguments for (37) that . Moreover, one can show that . Similarly, one can obtain , which implies for all . Now, suppose the pmfs of and are given by Eq. (34) and Eq. (35), respectively. Recall from Eq. (30) that for all , where and are given by (31) and (28), respectively. Supposing Assumption 5.1 holds, for all , one can first express using (37), and then use Eq. (40) to obtain , where is defined above given . Putting the above arguments together, we have from Eq. (39) the following:
(41) |
Next, let the cost of be , denoted as , for all , and let the cost of be , denoted as , for all , where and are given in the instance of the PEMS problem. Setting the cost structure of the elements in in this way, we establish an equivalence between the cost of a subset and the cost of , where is defined above. Similarly, considering the objective function in the PEMS problem, we define a set function as
(42) |
where we define such that and for all and for all . Note that given an instance of the PEMS problem in Problem 5.2, we can construct the set with the associated costs of the elements in in time, where is the number of nodes in graph , and with and for all . Assuming that and are (fixed) constants, the construction of the set with the associated costs takes time, which is polynomial in the parameters of the PEMS problem (Problem 5.2). Based on the above arguments, we further consider the following problem:
(P) |
where can be either of or with and given by in (41) and (42), respectively, and for all . By the way we construct and the costs of elements in , one can verify that (resp., ) is an optimal solution to Problem (P) with (resp., ) if and only if (resp., ) defined above is an optimal solution to (33) in Problem 5.2 with (resp., ). Thus, given a PEMS instance we can first construct with the associated cost for each element in , and then solve Problem (P).
5.3 Greedy Algorithm for the PEMS Problem
Note that Problem (P) can be viewed as a problem of maximizing a set function subject to a knapsack constraint, and greedy algorithms have been proposed to solve this problem with performance guarantees when the objective function is monotone nondecreasing and submodular (a set function , where is the ground set, is said to be monotone nondecreasing if for all , and is called submodular if for all and for all ) (e.g., [16] and [29]). Before we formally introduce the greedy algorithm for the PEMS problem, we first note from (40)-(42) that given a prior pdf of and any , one has to take the expectation in order to obtain the value of . However, it is in general intractable to explicitly calculate the integration corresponding to . Hence, one may alternatively evaluate the value of using numerical integration with respect to (e.g., [28]). Specifically, a typical numerical integration method (e.g., the trapezoid rule) approximates the integral of a function (over an interval) based on a summation of (weighted) function values evaluated at certain points within the integration interval, which incurs an approximation error (see, e.g., [28], for more details). We then see from (40)-(42) that in order to apply the numerical integration method described above to , one has to obtain the values of , , , and for a given (within the integration interval), where and with given in an instance of the PEMS problem. Recall that the initial conditions , and are assumed to be known. We first observe that for any given , the values of and for all and for all can be obtained using the recursions in (1) in time. Next, noting that and , we take the derivative with respect to on both sides of the equations in (1) and obtain
(43)
Considering any given , we can then use the recursion in (1) together with the recursion in (43) to obtain the values of and for all and for all in time. Similarly, considering any given , one can obtain the values of and for all and for all in time.
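The two recursions just described (the state recursion in (1) and the derivative recursion in (43)) can be sketched as follows. Since the inline math does not survive in this excerpt, the sketch assumes a standard discrete-time networked SIR update with infection rate `beta`, recovery rate `delta`, sampling parameter `h`, and weight matrix `W`; the exact form of (1) in the paper may differ.

```python
import numpy as np

def sir_with_sensitivity(W, beta, delta, h, s0, x0, T):
    """Run a discrete-time networked SIR recursion for T steps, while
    propagating the partial derivatives of the states with respect to
    the infection rate beta (forward sensitivity analysis)."""
    s, x = s0.astype(float).copy(), x0.astype(float).copy()
    ds = np.zeros_like(s)  # d s / d beta
    dx = np.zeros_like(x)  # d x / d beta
    for _ in range(T):
        force = W @ x                      # infection pressure on each node
        dforce = W @ dx                    # its derivative w.r.t. beta
        new_inf = h * beta * s * force
        d_new_inf = h * (s * force + beta * (ds * force + s * dforce))
        s, ds = s - new_inf, ds - d_new_inf
        x, dx = x + new_inf - h * delta * x, dx + d_new_inf - h * delta * dx
    return s, x, ds, dx
```

A finite-difference check (running the recursion at `beta` and at `beta + eps`) is a quick way to validate such a derivative recursion.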
Putting the above arguments together and considering the prior pdf of , i.e., , we see from (40)-(42) that for all , an approximation of , denoted as , can be obtained in time, where is the number of points used for the numerical integration method with respect to , as described above (we assume this number is polynomial in the parameters of the PEMS instance). Furthermore, in the sequel, we may assume that satisfies for all (with ), where . (This quantity is related to the approximation error of the numerical integration method, and decreases as the number of integration points increases; see, e.g., [28] for more details.) We are now ready to introduce the greedy algorithm given in Algorithm 2 to solve the PEMS problem, where and is the approximation of as described above. From the definition of Algorithm 2, we see that the number of function calls of required in the algorithm is , and thus the overall time complexity of Algorithm 2 is .
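Algorithm 2 itself is not reproduced in this excerpt. For intuition, the cost-benefit greedy scheme of [16], on which such algorithms are based, can be sketched as follows; the set function `f`, the cost map, and the budget `B` are placeholders rather than the paper's quantities (all costs are assumed positive).

```python
def budgeted_greedy(ground, cost, f, B):
    """Cost-benefit greedy for maximizing a set function f under the
    knapsack constraint sum of costs <= B.  Returns the better of the
    greedy set and the best single feasible element."""
    S, spent = set(), 0.0
    remaining = set(ground)
    while remaining:
        # feasible element with the best marginal gain per unit cost
        best, best_ratio = None, float("-inf")
        for e in remaining:
            if spent + cost[e] <= B:
                ratio = (f(S | {e}) - f(S)) / cost[e]
                if ratio > best_ratio:
                    best, best_ratio = e, ratio
        if best is None:
            break
        S.add(best)
        spent += cost[best]
        remaining.remove(best)
    # compare against the best feasible singleton
    singles = [e for e in ground if cost[e] <= B]
    if singles:
        e_star = max(singles, key=lambda e: f({e}))
        if f({e_star}) > f(S):
            return {e_star}
    return S
```

The singleton comparison at the end is what yields the constant-factor guarantee for monotone submodular objectives in [16].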
We proceed to analyze the performance of Algorithm 2 when applied to the PEMS problem. First, one can observe that in Problem (P) shares a similar form to the one in [30]. Thus, using similar arguments to those in [30], one can show that is monotone nondecreasing and submodular with . Noting the assumption that for all , one can show that given by line 6 of Algorithm 2 satisfies that for all . Similarly, one can show that , where is given by line 3 in Algorithm 2. One can then use similar arguments to those for Theorem 1 in [16] and obtain the following result; the detailed proof is omitted for conciseness.
Theorem 5.4.
In contrast to , the objective function is not submodular in general (e.g., [17]). In fact, one can construct examples where the objective function in the PEMS problem is not submodular. Hence, in order to provide performance guarantees of the greedy algorithm when applied to Problem (P) with , we will extend the analysis in [16] to nonsubmodular settings. To proceed, note that for all , we have , which implies and [10]. Therefore, the objective function is monotone nondecreasing with . We then characterize how close is to being submodular by introducing the following definition.
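The claim that the objective need not be submodular can be verified numerically on small ground sets by brute-force testing of the diminishing-returns inequality. The functions below are toy examples, not the paper's objectives:

```python
from itertools import combinations

def is_submodular(f, ground):
    """Brute-force check of submodularity: for all nested sets A <= B and
    every element e outside B, the marginal gain of e at the smaller set A
    must be at least its marginal gain at the larger set B."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for A in subsets:
        for B in subsets:
            if A <= B:
                for e in ground - B:
                    if f(A | {e}) - f(A) < f(B | {e}) - f(B) - 1e-12:
                        return False
    return True

# A coverage function (size of a union) is submodular ...
cover = {0: {1, 2}, 1: {2, 3}, 2: {4}}
f_cov = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
# ... while the square of the set size is not.
f_sq = lambda S: len(S) ** 2
```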
Definition 5.5.
Consider Problem (P) with , where is defined in (39). Suppose Algorithm 2 is applied to solve Problem (P). For all , let denote the set that contains the first elements added to set in Algorithm 2, and let . The type-1 greedy submodularity ratio of is defined to be the largest that satisfies
(45)
for all and for all . The type-2 greedy submodularity ratio of is defined to be the largest that satisfies
(46)
for all and for all such that , where .
Remark 5.6.
Note that since we approximate using as we argued above, we may not be able to obtain the exact values of and from Definition 5.5. Moreover, finding may require an exponential number of function calls of (or ). Nonetheless, it will be clear from our analysis below that obtaining lower bounds on and suffices. Here, we describe how we obtain a lower bound on using , and defer our analysis for lower bounding to the end of this section, which requires more careful analysis. Similarly to (46), let denote the largest real number that satisfies
(47)
for all and for all such that . Noting our assumption that for all (with ), one can see that given by (47) also satisfies (46), which implies . Given for all from Algorithm 2, can now be obtained via function calls of .
Based on Definition 5.5, the following result extends the analysis in [15, 16], and characterizes the performance guarantees of Algorithm 2 for Problem (P) with .
Theorem 5.7.
Proof.
Noting that (48) holds trivially if or , we assume that and . In this proof, we drop the subscript of (resp., ) and denote (resp., ) for notational simplicity. First, recall that for all , we let denote the set that contains the first elements added to set in Algorithm 2, and let . Now, let be the first index in such that a candidate element for (given in line of Algorithm 2) cannot be added to due to . In other words, for all , any candidate element for satisfies and can be added to in Algorithm 2. Noting that for all , one can then show that the following hold for all :
(49)
Now, considering any , we have the following:
(50)
(51)
(52)
(53)
where (50) follows from the definition of in (45), and (51) follows from (49). To obtain (52), we use the fact . Similarly, we obtain (53). Noting that is monotone nondecreasing, one can further obtain from (53) that
(54)
To proceed, let be the (first) candidate element for that cannot be added to due to , as we argued above. Similarly to (49), one can see that holds for all . Letting and following similar arguments to those leading up to (54), we have
(55)
Denoting for all and , we obtain from (54) and (55) the following:
(56)
Unrolling (56) yields
(57)
(58)
To obtain (57), we use the facts that for all and , since as we argued in Remark 5.6. To obtain (58), we first note from the way we defined that . Also noting that , we then obtain (58).
Now, one can show that (e.g., [15]). We then have from (58) the following:
(59)
where (59) follows from .
To proceed with the proof of the theorem, we note from the definition of in Definition 5.5 that with , which together with (59) imply that
(60)
where . Since is monotone nondecreasing, we obtain from (60)
(61)
We will then split our analysis into two cases. First, supposing that , we see from (61) that at least one of and holds. Recalling that for all , it follows that at least one of and holds. Second, supposing and using similar arguments, we have that at least one of and holds. Now, we note from line 12 of Algorithm 2 that , which implies . Combining the above arguments together, we obtain (48). ∎
Remark 5.8.
Note that (48) becomes if . Also note that can hold when the objective function is not submodular, as we will see later in our numerical examples.
Remark 5.9.
The authors in [31] also extended the analysis of Algorithm 2 to nonsubmodular settings, when the objective function can be exactly evaluated (i.e., ). They obtained a performance guarantee for Algorithm 2 that depends on a submodularity ratio defined in a different manner. One can show that the submodularity ratios defined in Definition 5.5 are lower bounded by the one defined in [31], which further implies that the performance bound (when is ) for Algorithm 2 given in Theorem 5.7 is tighter than that provided in [31].
Finally, we aim to provide a lower bound on that can be computed in polynomial time. The lower bounds on and together with Theorem 5.7 will also provide performance guarantees for the greedy algorithm.
Lemma 5.10.
([10]) For any positive semidefinite matrices , , and .
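The precise statement of Lemma 5.10 does not survive in this copy; a standard trace inequality of this kind (see [10]) says that for positive semidefinite A and B, lambda_min(A) tr(B) <= tr(AB) <= lambda_max(A) tr(B). Assuming this is the cited inequality, it can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    """Gram matrix M M^T, hence symmetric positive semidefinite."""
    M = rng.standard_normal((n, n))
    return M @ M.T

def check_trace_inequality(A, B, tol=1e-9):
    """Verify lambda_min(A) * tr(B) <= tr(A B) <= lambda_max(A) * tr(B)."""
    eigs = np.linalg.eigvalsh(A)       # eigenvalues in ascending order
    t = np.trace(A @ B)
    return (eigs[0] * np.trace(B) <= t + tol
            and t <= eigs[-1] * np.trace(B) + tol)
```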
We have the following result; the proof is included in Section 7.4 in the Appendix.
Lemma 5.11.
Recalling our arguments at the beginning of this section, we may only obtain approximations of the entries in the ( by ) matrix for using, e.g., numerical integration, where (resp., ) is defined in (40) (resp., (31)). Specifically, for all , let be the approximation of , where each entry of represents the approximation error of the corresponding entry in . Since and are positive semidefinite matrices, is a symmetric matrix. Now, using a standard eigenvalue perturbation result, e.g., Corollary 6.3.8 in [10], one can obtain that for all , where denotes the Frobenius norm of a matrix. It then follows that
where and satisfies for all . Combining the above arguments with (62) in Lemma 5.11, we obtain a lower bound on that can be computed using function calls of .
5.3.1 Illustrations
Note that one can further obtain from (62):
(63)
where . Supposing is fixed, we see from (63) that the lower bound on would potentially increase if and increase. Recall that given by (31) encodes the prior knowledge that we have about . Moreover, recall from (40) that depends on the prior pdf and the dynamics of the SIR model in (1). Hence, the lower bound given by Lemma 5.11, and consequently the corresponding performance bound of Algorithm 2 given in Theorem 5.7, depend on the prior knowledge that we have about and on the dynamics of the SIR model. Also note that the performance bounds given in Theorem 5.7 are worst-case bounds for Algorithm 2; in practice, the ratio between a solution returned by the algorithm and an optimal solution can be smaller than the ratio predicted by Theorem 5.7, as we will see in our simulations in the next section. Moreover, instances with tighter performance bounds potentially imply better performance of the algorithm on those instances. Similar arguments hold for the performance bounds provided in Theorem 5.4.
5.3.2 Simulations
To validate the theoretical results in Theorems 5.4 and 5.7, and Lemma 5.11, we consider various PEMS instances. (In our simulations, we neglect the approximation error corresponding to the numerical integrations discussed in Section 5.3, since the error terms are found to be sufficiently small.) The directed network is given by Fig. . According to the existing literature on the estimated infection and recovery rates for the COVID-19 pandemic (e.g., [26]), we assume that the infection rate and the recovery rate lie in the intervals and , respectively. Let the prior pdf of (resp., ) be a (linearly transformed) Beta distribution with parameters and (resp., and ), where and are also assumed to be independent. The prior pdfs of and are then plotted in Fig. and Fig. , respectively. We set the sampling parameter . We then randomly generate the weight matrix such that Assumptions 3.1-3.2 are satisfied, where each entry of is drawn (independently) from certain uniform distributions. The initial condition is set to be , and , and , and for all . In the pmfs of measurements and given in Eq. (34) and Eq. (35), respectively, we set and , where is the total population at node .
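The "(linearly transformed) Beta distribution" priors above can be sampled by rescaling a standard Beta draw onto the desired interval. The interval endpoints and shape parameters below are placeholders, since the paper's specific values are not reproduced in this excerpt:

```python
import numpy as np

def transformed_beta_samples(rng, a, b, lo, hi, n):
    """Draw n samples from Beta(a, b), linearly rescaled from [0, 1]
    onto the interval [lo, hi]."""
    return lo + (hi - lo) * rng.beta(a, b, size=n)

rng = np.random.default_rng(2)
# hypothetical interval for an infection rate, hypothetical shape parameters
samples = transformed_beta_samples(rng, 2.0, 5.0, 0.05, 0.40, 100_000)
```

The mean of such a draw is lo + (hi - lo) * a / (a + b), which gives a quick sanity check on the transformation.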
First, let us consider PEMS instances of relatively small size. In such instances, we set the time steps , i.e., we only consider collecting measurements at time step . In the sets and , we let and for all , and draw and uniformly at random from . Here, we can choose to perform , , or virus (or antibody) tests at a node and at . In Fig. , we consider the objective function , given by Eq. (42), in the PEMS instances constructed above, and plot the greedy solutions and the optimal solutions (found by brute force) to the PEMS instances under different values of the budget . Note that for all the simulation results in this section, we report results averaged over randomly generated matrices as described above, for each value of . As shown in Theorem 5.4, the greedy algorithm yields a worst-case approximation guarantee for , and the results in Fig. show that the greedy algorithm performs near optimally on the PEMS instances generated above. Similarly, in Fig. , we plot the greedy solutions and the optimal solutions to the PEMS instances constructed above under different values of , when the objective function is given in Eq. (39). Again, the results in Fig. show that the greedy algorithm performs well on the constructed PEMS instances. Moreover, according to Lemma 5.11, we plot the lower bound on the submodularity ratio of in Fig. . Here, we note that the submodularity ratio of is always greater than one in the PEMS instances constructed above. Hence, Theorem 5.7 yields a worst-case approximation guarantee for the greedy algorithm, where .
We then investigate the performance of the greedy algorithm for PEMS instances of a larger size. Since the optimal solution to the PEMS instances cannot be efficiently obtained when the sizes of the instances become large, we obtain the lower bound on the submodularity ratio of provided in Lemma 5.11, which can be computed in polynomial time. In contrast to the smaller instances constructed above, we set and . We let for all in and , where we also set and draw and uniformly at random from , for all and for all . Moreover, we modify the parameters of the Beta distribution corresponding to the pdf of to be and . Here, we can choose to perform , , , …, or virus (or antibody) tests at a node and at . In Fig. , we plot the lower bound on obtained from the PEMS instances constructed above. We note that the submodularity ratio of is also always greater than one. Hence, Theorem 5.7 yields a worst-case approximation guarantee for the greedy algorithm. We plot in Fig. the approximation guarantee using the lower bound that we obtained on .
6 Conclusion
We first considered the PIMS problem under the exact measurement setting, and showed that the problem is NP-hard. We then proposed an approximation algorithm that returns a solution to the PIMS problem that is within a certain factor of the optimal one. Next, we studied the PEMS problem under the noisy measurement setting. Again, we showed that the problem is NP-hard. We applied a greedy algorithm to solve the PEMS problem, and provided performance guarantees on the greedy algorithm. We presented numerical examples to validate the obtained performance bounds of the greedy algorithm, and showed that the greedy algorithm performs well in practice.
7 Appendix
7.1 Proof of Lemma 3.5
We first prove part . Considering any and any , we note from Eq. (1a) that
(64)
Under Assumptions 3.1-3.2, we have for all as argued in Section 3, and for all , which implies . Supposing , we have from Eq. (64) . Combining the above arguments with the fact from Assumption 3.1, we see that for all . Noting that with for all and for all as argued in Section 3 and that and for all , the result in part also implies for all and for all .
One can then observe that in order to prove parts -, it is sufficient to prove the following facts.
Fact 1.
Consider any and any . If , then for all with .
Fact 2.
Consider any and any such that . If there exists such that , then . If for all , then .
Fact 3.
Consider any and any . If , then . If , then .
Let us first prove Fact 1. Consider any and any . Supposing , we have from Eq. (1)
(65)
where the first term on the right-hand side of the above equation is positive, since from Assumption 3.2, and the second term on the right-hand side of the above equation is nonnegative. It then follows that . Repeating the above argument proves Fact 1.
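The positivity and monotonicity facts established above can be checked on random instances. The sketch below uses a standard discrete-time networked SIR update with placeholder parameter values (the exact recursion (1) is not reproduced in this excerpt) and verifies that susceptible and infected proportions remain in (0, 1] once they start positive, and that the susceptible proportions are nonincreasing:

```python
import numpy as np

def simulate(W, beta, delta, h, s0, x0, T):
    """Iterate a discrete-time networked SIR recursion, recording states."""
    s, x = s0.copy(), x0.copy()
    traj = [(s.copy(), x.copy())]
    for _ in range(T):
        new_inf = h * beta * s * (W @ x)   # new infections at each node
        s = s - new_inf
        x = x + new_inf - h * delta * x    # infections grow, then recover
        traj.append((s.copy(), x.copy()))
    return traj

rng = np.random.default_rng(3)
n = 4
W = rng.uniform(0.1, 1.0, size=(n, n))
s0 = rng.uniform(0.5, 0.9, size=n)
x0 = 1.0 - s0
traj = simulate(W, beta=0.4, delta=0.2, h=0.1, s0=s0, x0=x0, T=50)
```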
7.2 Proof of Theorem 4.4
We show that PIMS is NP-hard via a polynomial reduction from the exact cover by 3-sets (X3C) problem, which is known to be NP-complete [9].
Problem 7.1.
Consider and a collection of 3-element subsets of , where . The X3C problem is to determine if there is an exact cover for , i.e., a subcollection such that every element of occurs in exactly one member of .
Consider an instance of the X3C problem given by a set and a collection of 3-element subsets of , where . We then construct an instance of the PIMS problem as follows. The node set of the graph is set to be . The edge set of is set to satisfy that if is contained in , for all , and . Note that based on the construction of , each node represents a subset from in the X3C instance, and each node represents an element from in the X3C instance, where the edges between and indicate how the elements in are included in the subsets in . A plot of is given in Fig. 4. Accordingly, the weight matrix is set to satisfy that if is contained in , for all , and . We set the sampling parameter to be . The set is set to be , i.e., for all . We set time steps and . Finally, we set for all , and for all , for all , and . Since for all , we see from Lemma 3.5 that and for all and for all . Therefore, Lemma 3.5 is no longer useful in determining the coefficients in the equations from Eq. (3).
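For concreteness, the bipartite structure used in this reduction (subset nodes on one side, element nodes on the other, edges recording membership) can be built as follows; the node ordering and unit edge weights are illustrative choices, not necessarily those of the paper:

```python
def x3c_reduction_graph(q, subsets):
    """Build the bipartite adjacency pattern of the reduction: one node per
    subset in the collection, one node per element of the universe
    {0, ..., 3q-1}, and an edge from a subset node to an element node
    exactly when the subset contains that element."""
    m = len(subsets)               # subset nodes occupy indices 0..m-1
    n = m + 3 * q                  # element z gets node index m + z
    A = [[0] * n for _ in range(n)]
    for i, S in enumerate(subsets):
        assert len(S) == 3         # X3C uses 3-element subsets only
        for z in S:
            A[i][m + z] = 1        # subset i covers element z
    return A

# A tiny instance: subsets 0 and 2 form an exact cover of {0, ..., 5}.
subsets = [{0, 1, 2}, {1, 3, 4}, {3, 4, 5}]
A = x3c_reduction_graph(2, subsets)
```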
We claim that an optimal solution, denoted as , to the constructed PIMS instance satisfies if and only if the solution to the X3C instance is “yes”.
First, suppose the solution to the X3C instance is “yes”. Denote an exact cover as , where . Let us consider a measurement selection strategy given by
We then have from Eq. (9) . Noting that for all and for all from Lemma , we consider the following equations from Eq. (3) whose indices are contained in :
(67)
(68)
where we note from the way we constructed . Since is an exact cover for , we see from the construction of that is a union of mutually disjoint (3-element) sets such that . Thus, subtracting the equations in (68) from Eq. (67), we obtain
(69)
where we note as argued above. Following Definition 4.1, we stack coefficient matrices , and for all into a matrix , where and are defined in (8). Now, considering the matrix:
(70)
we see from the above arguments that and can be obtained via algebraic operations among the rows in , and the elements in and can be determined using the measurements from . Therefore, we have , where is defined in Definition 4.1. Noting that , we have , which implies , where is given by Eq. (10). Thus, satisfies the constraint in (12). Since from the way we set the costs of collecting measurements in the PIMS instance, we have .
Conversely, suppose the solution to the X3C instance is “no”, i.e., for any subcollection that contains subsets, there exists at least one element in that is not contained in any subset in . We will show that for any measurement selection strategy that satisfies , holds. Equivalently, we will show that for any with , does not hold. Consider any such that . Noting that for all in the constructed PIMS instance, we have and for all . Moreover, we see that contains at most measurements from . To proceed, let us consider any such that
(71)
where . In other words, contains measurements from and all the other measurements from that have zero costs. It follows that . Similarly to (67) and (68), we have the following equations from Eq. (3) whose indices are contained in (given by Eq. (9)):
(72)
(73)
Noting that for any subcollection that contains subsets, there exists at least one element in that is not contained in any subset in as we argued above, we see that there exists at least one element in that is not contained in any subset in . It then follows from the way we constructed that there exists such that for all . Thus, by subtracting the equations in (73) (multiplied by any constants resp.) from Eq. (72), will remain on the right-hand side of the equation in (72). Similarly, consider any equation from (73) indexed by , where . First, suppose we subtract Eq. (72) multiplied by some positive constant and any equations in (73) other than equation (multiplied by any constants resp.) from equation . Since there exists such that for all as argued above, we see that will appear on the right-hand side of equation . Next, suppose we subtract any equations in (73) other than equation (multiplied by any constants resp.) from equation . One can check that either of the following two cases holds for the resulting equation : (a) the coefficients on the right-hand side of equation contain , where ; or (b) the coefficient matrix on the right-hand side of equation is of the form . Again, we stack for all and for all into a matrix , where we note that is of the form for all . One can then see from the above arguments that for all (if they exist) such that and can be obtained from algebraic operations among the rows in , and the elements in and can be determined using the measurements from , holds. It follows that , i.e., constraint in (12) does not hold. Using similar arguments to those above, one can further show that holds for all , completing the proof of the converse direction of the above claim.
Hence, it follows directly from the above arguments that an algorithm for the PIMS problem can also be used to solve the X3C problem. Since X3C is NP-complete, we conclude that the PIMS problem is NP-hard.
7.3 Proof of Theorem 5.3
We prove the NP-hardness of the PEMS problem via a polynomial reduction from the knapsack problem, which is known to be NP-hard [9]. An instance of the knapsack problem is given by a set , a size and a value for each , and . The knapsack problem is to find such that is maximized while satisfying .
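The knapsack problem used as the source of this reduction is the classic 0/1 knapsack; for reference, with integer sizes it is solvable by the textbook dynamic program (its hardness is therefore of the weak, pseudo-polynomial kind, relying on the budget being encoded in binary):

```python
def knapsack(sizes, values, budget):
    """Classic 0/1 knapsack dynamic program over an integer budget.
    best[b] holds the maximum total value achievable with total size <= b."""
    best = [0] * (budget + 1)
    for w, v in zip(sizes, values):
        # sweep budgets downward so each item is selected at most once
        for b in range(budget, w - 1, -1):
            best[b] = max(best[b], best[b - w] + v)
    return best[budget]
```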
Given any knapsack instance, we construct an instance of the PEMS problem as follows. Let be a graph that contains a set of isolated nodes with and . Set the weight matrix to be , and set the sampling parameter as . The time steps and are set to be , i.e., only the measurements of and for all will be considered. The initial condition is set to satisfy , and for all . The budget constraint is set as . Let and for all . The pmfs of measurements and are given by Eqs. (34) and (35), respectively, with and for all , where Assumption 5.1 is assumed to hold. Finally, let the prior pdf of be a Beta distribution with parameters and , and let the prior pdf of also be a Beta distribution with parameters and , where we take and to be independent. Noting that in the PEMS instance constructed above, i.e., incurs a cost of , we only need to consider measurements for all . Moreover, since , a corresponding measurement selection is then given by . In other words, if measurement is collected (with cost ), and if measurement is not collected. We will see that there is a one-to-one correspondence between a measurement in the PEMS instance and an element in the knapsack instance.
Consider a measurement selection . Since and for all , Eq. (1c) implies for all , where . One then obtains from Eq. (28) and Eq. (37) the following:
(74)
Next, noting that and are independent, one can show via Eq. (31) that is diagonal, where one can further show that using the fact that the pdfs of and are Beta distributions with parameters and . Similarly, one can obtain . It now follows from Eq. (74) that
(75)
where are some constants (independent of ). Note that the objective in the PEMS instance is given by , where . First, considering the objective function , where , we see from Eq. (75) that is minimized (over ) if and only if is maximized. Similar arguments hold for the objective function . It then follows directly from the above arguments that a measurement selection is an optimal solution to the PEMS instance if and only if is an optimal solution to the knapsack instance. Since the knapsack problem is NP-hard, the PEMS problem is NP-hard.
7.4 Proof of Lemma 5.11
Noting the definition of in Definition 5.5, we provide a lower bound on for all and for all , where we assume that , otherwise (45) would be satisfied for all . Recalling the definition of in (39), we lower bound in the following manner:
(76)
(77)
To obtain (76), we let and note that for all and for all . Next, we upper bound in the following manner:
(78)
(79)
To obtain (78), we note that for all , where the second inequality follows from Lemma 5.10 with the fact , and is defined above. Combining (77) and (79), and noting , we have
(80)
which implies (62).
References
- [1] Centers for Disease Control and Prevention. Test for current infection. https://www.cdc.gov/coronavirus/2019-ncov/testing/diagnostic-testing.html, 2020.
- [2] Protect Purdue: Purdue University's response to COVID-19. https://protect.purdue.edu, 2020.
- Ahn and Hassibi [2013] H. J. Ahn and B. Hassibi. Global dynamics of epidemic spread over complex networks. In Proc. Conference on Decision and Control, pages 4579–4585. IEEE, 2013.
- Bendavid et al. [2020] E. Bendavid, B. Mulaney, N. Sood, S. Shah, E. Ling, R. Bromley-Dulfano, C. Lai, Z. Weissberg, R. Saavedra, J. Tedrow, et al. COVID-19 antibody seroprevalence in Santa Clara County, California. MedRxiv, 2020.
- Bian et al. [2017] A. A. Bian, J. M. Buhmann, A. Krause, and S. Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. In Proc. International Conference on Machine Learning, pages 498–507, 2017.
- Chakrabarti et al. [2008] D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. ACM Transactions on Information and System Security, 10(4):1–26, 2008.
- Chepuri and Leus [2014] S. P. Chepuri and G. Leus. Sparsity-promoting sensor selection for non-linear measurement models. IEEE Transactions on Signal Processing, 63(3):684–698, 2014.
- Cormen et al. [2009] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms. MIT press, 2009.
- Garey and Johnson [1979] M. R. Garey and D. S. Johnson. Computers and intractability, volume 174. Freeman San Francisco, 1979.
- Horn and Johnson [2012] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, 2012.
- Hota et al. [2020] A. R. Hota, J. Godbole, P. Bhariya, and P. E. Paré. A closed-loop framework for inference, prediction and control of SIR epidemics on networks. arXiv preprint arXiv:2006.16185, 2020.
- Joshi and Boyd [2008] S. Joshi and S. Boyd. Sensor selection via convex optimization. IEEE Transactions on Signal Processing, 57(2):451–462, 2008.
- Kay [1993] S. M. Kay. Fundamentals of statistical signal processing: Estimation theory. Prentice Hall PTR, 1993.
- Kempe et al. [2003] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proc. international conference on Knowledge Discovery and Data mining, pages 137–146, 2003.
- Khuller et al. [1999] S. Khuller, A. Moss, and J. S. Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45, 1999.
- Krause and Guestrin [2005] A. Krause and C. Guestrin. A note on the budgeted maximization of submodular functions. Carnegie Mellon University. Center for Automated Learning and Discovery, 2005.
- Krause et al. [2008] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9(Feb):235–284, 2008.
- Mei et al. [2017] W. Mei, S. Mohagheghi, S. Zampieri, and F. Bullo. On the dynamics of deterministic epidemic propagation over networks. Annual Reviews in Control, 44:116–128, 2017.
- Mo et al. [2011] Y. Mo, R. Ambrosino, and B. Sinopoli. Sensor selection strategies for state estimation in energy constrained wireless sensor networks. Automatica, 47(7):1330–1338, 2011.
- Newman [2002] M. E. Newman. Spread of epidemic disease on networks. Physical Review E, 66(1):016128, 2002.
- Nowzari et al. [2016] C. Nowzari, V. M. Preciado, and G. J. Pappas. Analysis and control of epidemics: A survey of spreading processes on complex networks. IEEE Control Systems Magazine, 36(1):26–46, 2016.
- Paré et al. [2018] P. E. Paré, J. Liu, C. L. Beck, B. E. Kirwan, and T. Başar. Analysis, estimation, and validation of discrete-time epidemic processes. IEEE Transactions on Control Systems Technology, 2018.
- Pastor-Satorras et al. [2015] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani. Epidemic processes in complex networks. Reviews of Modern Physics, 87(3):925, 2015.
- Pezzutto et al. [2020] M. Pezzutto, N. B. Rossello, L. Schenato, and E. Garone. Smart testing and selective quarantine for the control of epidemics. arXiv preprint arXiv:2007.15412, 2020.
- Preciado et al. [2014] V. M. Preciado, M. Zargham, C. Enyioha, A. Jadbabaie, and G. J. Pappas. Optimal resource allocation for network protection against spreading processes. IEEE Transactions on Control of Network Systems, 1(1):99–108, 2014.
- Prem et al. [2020] K. Prem, Y. Liu, T. W. Russell, A. J. Kucharski, R. M. Eggo, N. Davies, S. Flasche, S. Clifford, C. A. Pearson, J. D. Munday, et al. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. The Lancet Public Health, 2020.
- Pukelsheim [2006] F. Pukelsheim. Optimal design of experiments. SIAM, 2006.
- Stoer and Bulirsch [2013] J. Stoer and R. Bulirsch. Introduction to numerical analysis, volume 12. Springer Science & Business Media, 2013.
- Streeter and Golovin [2009] M. Streeter and D. Golovin. An online algorithm for maximizing submodular functions. In Proc. Advances in Neural Information Processing Systems, pages 1577–1584, 2009.
- Summers et al. [2015] T. H. Summers, F. L. Cortesi, and J. Lygeros. On submodularity and controllability in complex dynamical networks. IEEE Transactions on Control of Network Systems, 3(1):91–101, 2015.
- Tzoumas et al. [2020] V. Tzoumas, L. Carlone, G. J. Pappas, and A. Jadbabaie. LQG control and sensing co-design. IEEE Transactions on Automatic Control, pages 1–1, 2020. doi: 10.1109/TAC.2020.2997661.
- Van Trees [2004a] H. L. Van Trees. Detection, estimation, and modulation theory, part I. John Wiley & Sons, 2004a.
- Van Trees [2004b] H. L. Van Trees. Detection, estimation, and modulation theory, part IV: Optimum array processing. John Wiley & Sons, 2004b.
- Vrabac et al. [2020] D. Vrabac, P. E. Paré, H. Sandberg, and K. H. Johansson. Overcoming challenges for estimating virus spread dynamics from data. In Proc. Conference on Information Sciences and Systems, pages 1–6. IEEE, 2020.
- Ye and Sundaram [2019] L. Ye and S. Sundaram. Sensor selection for hypothesis testing: Complexity and greedy algorithms. In Proc. Conference on Decision and Control, pages 7844–7849. IEEE, 2019.
- Ye et al. [2020] L. Ye, S. Roy, and S. Sundaram. Resilient sensor placement for Kalman filtering in networked systems: Complexity and algorithms. IEEE Transactions on Control of Network Systems, 7(4):1870–1881, 2020.
- Ye et al. [2021] L. Ye, N. Woodford, S. Roy, and S. Sundaram. On the complexity and approximability of optimal sensor selection and attack for Kalman filtering. IEEE Transactions on Automatic Control, 66(5):2146–2161, 2021.