A Note on Valid Inequalities for PageRank Optimization with Edge Selection Constraints
Abstract
Csáji, Jungers, and Blondel prove that while a PageRank optimization problem with edge selection constraints is NP-hard, it can be solved optimally in polynomial time for the unconstrained case. This theoretical result is accompanied by several observations, which we leverage to develop valid inequalities in polynomial time for this class of NP-hard problems.
11footnotetext: Corresponding author ([email protected])
1 Introduction
This note considers the NP-hard PageRank optimization problem with edge selection constraints from Csáji, Jungers, and Blondel [11, 12]. Let be a directed graph with a set of nodes for , a target node , and a set of directed edges . Suppose that there exists a set of hidden (fragile) edges in such that , making and mutually exclusive. Given a graph , and a set of hidden edges , our goal is to identify a subset under specific constraints such that attains an optimal PageRank value of [9] in the graph . Previous studies, such as those by [4, 13], explore the optimal structure for outgoing edges from certain nodes in . Similarly, another study [24] explores the theoretical difficulty of selecting a fixed number of incoming edges to the target node to optimize its PageRank. The result demonstrates that, unless P = NP, there is no polynomial-time approximation method to maximize PageRank of by selecting the fixed number of incoming edges.
The PageRank value of a node equals the inverse of the expected first return time to , i.e., the average steps for a random walk starting at to return to in [2, 17]. Csáji et al. study maximizing the PageRank value of a node by equivalently minimizing the expected first return time to through the selection of edges in [11, 12]. Formally, we define as the expected first return time to after selecting (adding) the set of edges to the graph . For each directed edge , we define as a binary variable, where denotes the selection of the edge ; otherwise, . Specifically, an incumbent selection is equivalent to the support . For the notation and vector , we make a slight adjustment for convenience. We name the notation and its support , and name to the corresponding function evaluations for and for the corresponding support , interchangeably. We follow the concept considered by the study of Csáji et al., which minimizes based on the selected edges . Let be a set of constraints on the associated -dimensional binary decision vector . Given a graph and a set , the general PageRank optimization problem with edge selection constraints from Csáji et al. is defined as
(1) |
We note that Problem (1) presents the potential to explore various structures in mathematical programming. For instance, if the evaluation of an objective function relies on multiple sampling scenarios based on the sample average approximation method of [18], stochastic programming approaches such as those in [8, 16, 20, 26] can be considered. Similarly, if probabilistic constraints are in , the chance-constrained optimization methods [1, 19, 22] could be explored. Based on general methods of integer programming [7, 10, 23, 25], Problem (1) can be solved using an exact cutting plane method based on a relaxed master problem (RMP)
(2) |
where is a variable that evaluates the value of the function and is a set of valid inequalities iteratively updated to refine the RMP (2). At each iteration, given an incumbent solution , the method generates a valid inequality associated with this incumbent and adds it to . The process continues until the optimality guarantee is achieved, ensuring convergence to the optimal solution.
This note considers valid inequalities for the set of the RMP (2). We note that Csáji et al. prove that the unconstrained case of Problem (1) can be solved optimally in polynomial time [11, 12]. Later, a similar conclusion is independently reached by [14], further confirming the validity of this result. We summarize the following observations from the formal results of [12], which can help us develop valid inequalities.
Observation 1.1
From the results of [12], we obtain the following observations.
-
The problem can be solved in polynomial time.
-
Given two exclusive subsets and , the function
(3) can be solved in polynomial time.
-
Given an incumbent solution , the function can be solved in polynomial time, where .
We take the -shaped inequality from [21] as an example of the baseline for the set . Let be a small enough constant that makes the -shaped inequality valid. Given an incumbent , the -shaped inequality is defined as
Here, since for any , constant is trivially and sufficiently small to provide a valid inequality for the RMP (2); however, through Observation 1.1, we can compute a stronger value polynomially for to have a stronger -shaped inequality
(4) |
We note that, without Observation 1.1, even the -shaped inequality (4) cannot be guaranteed to be generated in polynomial time. Starting with the -shaped inequality (4), and based on Observation 1.1, we develop stronger valid inequalities in the following sections.
2 A Valid Inequality
Given an incumbent , we can utilize the function (3) of Observation 1.1 to calculate the best case of either adding an edge from or not adding an edge from . Note that base on the function (3), together with the condition for any , we develop another class of valid inequalities for the set of the RMP (2).
Theorem 2.1
Proof.
Consider a feasible point to the RMP (2). We show that satisfies the inequality (2.1) for any with the associated . Because of , there exists two cases: at least one edge of that is not in , or at least one edge of that is not in for the inequality (2.1).
-
Case 1:
For the case where at least one edge of is not in , there are two possible results in the term when for an edge . We let and denote two edges in for the two results, where the associated variables in with the conditions and . Here, since must includes at least one or , we have and . In other words, both and represent the relaxation of .
-
Case 2:
Similar to the discussion in the previous case, we have another situation that at least one edge of is not in . To put it briefly, let and be possible edges in , where the associated variables of with and . Both and are the relaxation of .
We demonstrate the validity of the following inequalities for a point based on the discussion in Cases 1 and 2.
(6) | ||||
(7) | ||||
(8) |
Based on the relations with and with , we conclude and for the validity of the inequality (6), where the inequality (6) can also be regarded as . The inequality (7) follows from the relations with and with . The inequality (8) follows from for all and for all . This completes the proof. ∎
Given an incumbent , the inequality (2.1) discusses the best possible outcome for each individual element within the associated and . We provide the following theoretical result, demonstrating that the inequality (2.1) is stronger than the -shaped inequality (4).
Proof.
Since is a relaxation of both and , we have and for all . Because of , we have
and
hold for any . This completes the proof. ∎
3 Strengthen the Valid Inequality
The lifting technique of integer programming is a generalization method used to strengthen valid inequalities [3, 5, 6, 15, 27]. In this section, we explore a method that integrates the general lifting concept with the observations in 1.1 to strengthen the inequality (2.1). Specifically, given an incumbent , we remain the coefficients of the inequality (2.1) and perform up-lifting procedures for the associated set . Specifically, given an incumbent , we retain the coefficients of inequality (2.1) and perform up-lifting procedures for the other coefficients associated with the set . Let be an ordering of the edges in . In this ordering, given , we sequentially solve lifting problems (see, e.g., the section II.2 of [23] and the section 2 of [15]) to get a new class of valid inequalities
where function for denotes the -th exact lifting problem defined by
s.t. | (9a) | |||
(9b) | ||||
(9c) |
Note that the different classes of Problem (1) are still embedded within the lifting problem (9). This implies that directly solving the exact lifting problem (9) to obtain a new valid inequality remains challenging. Here, we attempt to derive a modified version of the exact lifting problem (9), which allows us to obtain a valid inequality more easily.
Theorem 3.1
Given an incumbent , the inequality
(10) |
is valid for the RMP (2), where a modified lifting problem is
(11a) | ||||
(11b) |
for .
Proof.
Given , let be a positive constant that for all . We aim to find an inequality
(12) |
where is a valid coefficient for a variable for all to . We first slightly modify the original lifting problem (9) into another lifting problem referred to as
(13) |
We prove the validity of (13) for determing the inequality (12) using the fundamental concepts of lifting and mathematical induction. For the edge of an arbitrary ordering, the exact lift problem for the valid coefficient of in the inequality (12) is
which provides a valid for the inequality (12). Suppose that we have the valid for to . Given that the coefficients to in the inequality (12) are valid, the exact lifting problem for determining the valid coefficient of in the inequality (12) is
which provides a valid for the inequality (12). Since the value and the ordering are arbitrary, this confirms the validity of (13) for determing the inequality (12). Compared to (13), the modified lifting problem (11) removes the of and the terms of the objective function. Since is the relaxation of shown in (9c), the term for all , and the term for all to , we conclude that is the relaxation of and for all to . That is, for a feasible point , we have
This completes the proof. ∎
We provide Algorithm 1 to solve the separation problem of the inequality (10). Based on Observation 1.1, Corollary 3.1 demonstrates that Algorithm 1 is a polynomial-time method for the inequality (10).
Proof.
Although inequality (10) is derived from the modified lifting problem (11), it reveals an intriguing property compared to the inequality (2.1). Specifically, for any ordering , the inequality (10) is stronger than the inequality (2.1) and the -shaped inequality (4).
Proof.
Given , we set up an arbitrary ordering for all elements in . Note that given an arbitrary , we have . Therefore, from the equation (11b), the relation
holds for an arbitrary . This relation provides the result since
∎
Finally, the inequalities (2.1) and (10) can generate corresponding integer programming formulations for the set of the RMP (2), allowing us to apply a general cutting plane method in integer programming to solve Problem (1) optimally.
Proposition 3.2
Proof.
In conclusion, without Observation 1.1, we cannot even guarantee that the -shaped inequality (4) can be generated in polynomial time. Based on the observations, we introduce valid inequalities for Problem (1). We strengthen these inequalities step by step and demonstrate that these valid inequalities can be generated in polynomial time. These results may offer some potential insights for the exact method on Problem (1). For future research, exploring stronger or facet-defining conditions from other forms of lifting techniques is a suitable direction.
Acknowledgments
This work is supported, in part by, NSTC Taiwan 111-2221-E-A49-079 and 111-2221-E-A49-126-MY2.
References
- [1] S. Ahmed, J. Luedtke, Y. Song, and W. Xie. Nonanticipative duality, relaxations, and formulations for chance-constrained stochastic programs. Mathematical Programming, 162:51–81, 2017.
- [2] D. Aldous and J. Fill. Reversible markov chains and random walks on graphs. Unfinished monograph, recompiled 2014, available at https://www.stat.berkeley.edu/users/aldous/RWG/book.html, 2002.
- [3] A. Atamtürk. Sequence independent lifting for mixed-integer programming. Operations Research, 52(3):487–490, 2004.
- [4] K. Avrachenkov and N. Litvak. The effect of new links on google pagerank. Stochastic Models, 22(2):319–331, 2006.
- [5] E. Balas. Facets of the knapsack polytope. Mathematical Programming, 8:146–164, 1975.
- [6] E. Balas and E. Zemel. Facets of the knapsack polytope from minimal covers. SIAM Journal on Applied Mathematics, 34:119–148, 1978.
- [7] P. Belotti, C. Kirches, S. Leyffer, J. Linderoth, J. Luedtke, and A. Mahajan. Mixed-integer nonlinear optimization. Acta Numerica, 22:1–131, 2013.
- [8] J. R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer Verlag, New York, 1997.
- [9] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30:107–117, 1998.
- [10] M. Conforti, G. Cornuéjols, and G. Zambelli. Integer Programming. Springer Verlag, New York, 2014.
- [11] B. C. Csáji, R. M. Jungers, and V. D. Blondel. Pagerank optimization in polynomial time by stochastic shortest path reformulation. In Algorithmic Learning Theory (ALT), pages 89–103, 2010.
- [12] B. C. Csáji, R. M. Jungers, and V. D. Blondel. Pagerank optimization by edge selection. Discrete Applied Mathematics, 169(31):73–87, 2014.
- [13] C. de Kerchove, L. Ninove, and P. van Dooren. Maximizing pagerank via outlinks. Linear Algebra and its Applications, 429(5-6):1254–1276, 2008.
- [14] O. Fercoq, M. Akian, M. Bouhtou, and S. Gaubert. Ergodic control and polyhedral approaches to pagerank optimization. IEEE Transactions on Automatic Control, 58(1):134–148, 2013.
- [15] Z. Gu, G. L. Nemhauser, and M. W. Savelsbergh. Sequence independent lifting in mixed integer programming. Journal of Combinatorial Optimization, 4:109–129, 2000.
- [16] J. L. Higle and S. Sen. Stochastic decomposition: An algorithm for two-stage linear programs with recourse. Mathematics of Operations Research, 16(3):650–669, 1991.
- [17] J. Hopcroft and D. Sheldon. Manipulation-resistant reputations using hitting time. In Algorithms and Models for the Web-Graph (WAW), pages 68–81, 2007.
- [18] A. J. Kleywegt, A. Shapiro, and T. Homem-de-Mello. The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization, 12(2):479–502, 2002.
- [19] S. Küçükyavuz and R. Jiang. Chance-constrained optimization under limited distributional information: A review of reformulations based on sampling and distributional robustness. EURO Journal on Computational Optimization, 10 100030:1–45, 2022.
- [20] S. Küçükyavuz and S. Sen. An introduction to two-stage stochastic mixed-integer programming. INFORMS TutORials in Operations Research, pages 1–27, 2017.
- [21] G. Laporte and F. Louveaux. The integer -shaped method for stochastic integer programs with complete recourse. Operations Research Letters, 13(3):133–142, 1993.
- [22] J. Luedtke and S. Ahmed. A sample approximation approach for optimization with probabilistic constraints. SIAM Journal on Optimization, 19(2):674–699, 2008.
- [23] G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. Wiley-Interscience, New York, NY, USA, 1988.
- [24] M. Olsen and A. Viglas. On the approximability of the link building problem. Theoretical Computer Science, 518:96–116, 2014.
- [25] A. Schrijver. Theory of Linear and Integer Programming. Wiley-Interscience, New York, NY, USA, 1998.
- [26] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. Society for Industrial Mathematics, Philadelphia, PA, USA, 2009.
- [27] L. A. Wolsey. Valid inequalities and superadditivity for 0-1 integer programs. Mathematics of Operations Research, 2(1):66–77, 1977.