A fast two-stage algorithm for non-negative matrix factorization in streaming data
Abstract
In this article, we study algorithms for nonnegative matrix factorization (NMF) in various applications involving streaming data. Utilizing the continual nature of the data, we develop a fast two-stage algorithm for highly efficient and accurate NMF. In the first stage, an alternating non-negative least squares (ANLS) framework is used, in combination with active set method with warm-start strategy for the solution of subproblems. In the second stage, an interior point method is adopted to accelerate the local convergence. The convergence of the proposed algorithm is proved. The new algorithm is compared with some existing algorithms in benchmark tests using both real-world data and synthetic data. The results demonstrate the advantage of our algorithm in finding high-precision solutions.
Index Terms:
Interior point method, active set method, nonnegative matrix factorization, streaming data.I Introduction
Nonnegative matrix factorization (NMF) [26, 19] refers to the factorization of a matrix approximately into the product of two nonnegative matrices with low rank, . It has become one of the most popular multi-dimensional data processing tools in various applications such as signal processing[2], biomedical engineering[27], pattern recognition[6], computer vision and image engineering[2].
Lee and Sewing initiated the study of NMF and presented a method in 1999 [19]. Their method makes all decomposed components non-negative and achieves non-linear dimension reduction at the same time. Developed in [20] for NMF, Lee and Seung’s multiplicative update rule has been a popular method due to the simplicity in implementation.
A commonly used optimization formulation of is to use the Square of Euclidian Distance (SED) as the objective function, that is,
(1) |
Many studies of NMF based on the above formulation have focused on the use of different optimization approaches like the alternating non-negative least squares (ANLS) [22, 15, 11, 13], coordinate descent methods [5, 21], and alternating direction method of multipliers (ADMM) [12]. A comprehensive survey on various NMF models and many existing NMF algorithms can be found in [30].
The NMF problem has been shown to be non-convex and NP-hard [29]. The algorithms studied in the literature can only guarantee finding a local minimum in general, rather than a global minimum of the cost function. Although Arora et al. [1] presented a polynomial-time algorithm for constant , the order of the complexity of the algorithm is too high to be applied in practice. Nevertheless, in many data mining applications, high quality local minima are desired in short time [30].
In this study, we are motivated by the application of NMF to problems involving streaming data. Here, streaming data is continually generated data with continuous distributions, such as those from the real-time monitoring of reaction product in chemistry experiments and materials synthesis [23, 28]. For example, Zhao et al. [32] measured the X-ray diffraction data during the nucleation and growth of zeolite-supported Ag nanoparticles through reduction of Ag-exchanged mordenite (MOR), and the processed the data with pair distribution function (PDF) measurement [7]. In the data, each PDF is approximately represented by a vector representation of dimensions, which is recorded at time instances in total. We should note the key features of continual nature of these types of continuously distributed data: at any fixed time, the PDF is continuous in the distance variable; meanwhile, at any fixed distance in the PDF measurement, its value also has continuity in time; moreover, the spatially distributed data at later times are generated progressively following the data earlier in time. In the AgMOR data used in [32], the dimensions were respectively and , where is the length of each data, and is the number of measurements. It was anticipated that there are three materials present in the reaction, which means that . The focus of our study here is on streaming data in the particular regime of , with being very small. This reflects the high dimensionality of data in an individual measurement (very large ) for systems with a relatively small number of components .
Our goal is to obtain high-precision local solutions for streaming data with a relatively small scale, that , , and . Based on our observation, ANLS has a fast descent rate on objective function at the first few iterations, but a low convergent rate near a local minimizer. Interior point method has a better local convergent rate, but require much more computation. Thus, we propose to combine ANLS with interior point method into a two-stage algorithm. In the first stage, an active set method is used for ANLS. Due to the continuity of streaming data, the number of active set changes is small. In the second stage, a line search interior point method is adopted to reach a fast local convergence. Based on the property of streaming data, we optimize the computation of the interior point algorithm so that the computational cost can scale like in each iteration. The proposed two-stage algorithm combines the advantages of ANLS and interior point method for streaming data.
The organization of this paper is as follows. Section 2 introduces the first stage of our proposed algorithm which is based on the ANLS with an active set method for streaming data. Section 3 proposes the second stage of the algorithm which is a line search interior point method. Its convergence is also discussed. Section 4 gives the whole framework of our proposed two-stage algorithm to solve NMF in streaming data. Section 5 shows the efficiency of our proposed algorithm by numerical tests. Section 6 concludes this paper with discussions on some future concerns.
II ANLS framework and active set method
We first briefly review the ANLS framework for solving (1). In ANLS, variables are simply divided into two groups, which are then updated respectively as outlined in algorithm 1.
Repeat until stopping criteria are satisfied
end
Note that each subproblem can be split into a series of nonnegative least square problems
(2) |
For , corresponds to every column of , while for , takes every row of .
Although the original problem in (1) is nonconvex, the subproblems in algorithm 1 are convex quadratic problems whose optimal solutions can be found in polynomial time. In addition, the convergence of algorithm 1 to a stationary point of (1) has been proved [10].
On the basis of ANLS framework, many algorithms for NMF have been developed, such as active set method [15], projection gradient method [22], projection Newton method [9], projection quasi-Newton method [14, 31], Nesterov’s gradient method [11], and the method combined with Barzilai-Borwein stepsize [13].
The active set method solves the subproblem exactly. Kim and park [15] introduced an algorithm based on ANLS and active set method. The constrained least square problem in the matrix form with multiple right-hand side vectors is decomposed into several independent nonnegative least square problems with single right-hand side vector (2), which are solved by the active set method of Lawson and Hanson [18]. Later, Kim and Park [16] proposed an active-set-like algorithm based on ANLS framework. In their algorithm, the single least squares are solved by block principal pivoting method, and the columns that have a common active set are grouped together to avoid redundant computation of Cholesky factorization in solving linear equations.
By noticing the continuity in streaming data, when a single least square problem (2) is solved, the same and similar may be used in the subsequent least square, resulting in similar solutions with likely the same active set.
Therefore, we choose the active set method of Lawson and Hanson [18] to solve the least square problem (2), using the solution and its active set as the initial guess of the subsequent least square. This strategy is called warm start strategy or continuation technology, which is widely used in many algorithms and can reasonably improve the effectiveness[24, 8]. We do not need to do any re-grouping, because the right-hand side vectors have been naturally grouped due to the continuity in data, and we just need to solve them one by one sequentially. In fact, the active set of the solution in the subsequent least square usually has no or little change. When we solve the first set of equations in the new least square subproblem, the Cholesky factorization performed in the previous step can be utilized to avoid redundant computation. Numerically, we see that the active set usually changes only on a very few occasions or does not change at all for the streaming data under consideration. Consequently, the first stage of our algorithm is chosen to be a combination of ANLS framework, active set method and warm start strategy.
III A line search interior point method for NMF
Interior-point methods have proved to be equally successful for various nonlinear optimization as for linear programming. In this section, we propose a line search interior point method for solving NMF. Its global convergence and computational cost are analyzed.
III-A Algorithm
Given that the linear independence constraint qualification (LICQ) holds for NMF problem, the KKT conditions [17] for the problem can be written as
(3) |
We denote , , , , where transforms a matrix to a vector by expanding it by columns. Meanwhile, we define the inverse operations , , , .
Applying Newton’s method to the nonlinear system, in the variables , , , , we obtain
(4) | ||||
with , where , , is a vector of ones, constructs a diagonal matrix with the diagonal elements given by ,
(5) | ||||
is the th row of , and is the th column of .
Let be strictly positive, then the variables , , and are forced to take positive values. The trajectory is called the primal-dual central path. The variables are updated by
(6) |
where , , and
(7) |
with . The condition (7) is called the fraction to the boundary rule [25], which is used to prevent the variables from approaching their lower bounds of 0 too quickly. In this work, we choose .
A predictor or probing strategy [25] can also be used to determine the parameter . We calculate a predictor (affine scaling) direction
by setting . We probe this direction by letting be the longest step lengths that can be taken along the affine scaling direction before violating the nonnegativity conditions . Explicit formulas for these step lengths are given by (7) with . We then define to be the value of complementarity along the (shortened) affine scaling step, that is,
(8) |
and a heuristic choice of is defined as following: and
(9) |
We propose a two-level nested loop algorithm for the interior point search. In the inner loop, the parameter is fixed. In the outer loop, we gradually reduce to 0. We use the following error function to break the inner loop, which is based on the perturbed KKT system:
(10) |
To guarantee the global convergence of the algorithm, we apply a line search approach. First we consider the second derivation of the interior-point method associated with the barrier problem
We use the exact merit function as same as the barrier function, which can be formed by
In the algorithm, we utilize Armijo line search to make the merit function sufficiently decrease.
Considering the convexity of the Hessian, we approximate
by a positive definite matrix to guarantee the direction is always a descent direction of , so that such a line search can be implemented. Notice that the original NMF problem is a least square problem. Thus we can utilize the Hessian in the traditional Guass-Newton algorithm. The Hessian matrix is
and it is guaranteed to be positive semi-definite. Here
which is the first item of . For further safeguarding, we add a diagonal matrix to this Hessian, where is a small positive constant. Then we obtain the primal-dual direction by solving
(11) | ||||
By eliminating and in (11), we have
(12) |
where
and is simply represented by . We simplify the above formula by
where represents the coefficient matrix and represents the direction . Since is positive definite, the inner product , which means that is a descent direction.
To sum up all the approaches above, we present the whole algorithm of interior point method in (2). It contains two loops. The parameter is fixed in the inner loop. Due to the Armijo line search (13) in the inner loop, the stopping criterion of the inner loop can be satisfied in finite iterations. Further, the parameter and are reduced gradually, and by the definition of error function , the solutions of the two-loop algorithm satisfy the KKT system (3) within the error . In practice, the barrier stop tolerance can be defined as
The complete convergence theorem and its proof are given in (1).
Theorem 1.
Suppose that all the sequences , , , generated by algorithm 2 are bounded. Then algorithm 2 stops in finite iterations.
Proof.
We will consider the inner loop first and show that for a given , will be satisfied in finite iterations.
Based on the Armijo line search rule (13), the value of decreases monotonously. Then we have that the lower bound of and is greater than a strictly positive constant that depends on . Thus the smallest eigenvalue of the coefficient matrix of (12) is greater than a constant greater than 0. Further more, using the boundedness of , we obtain that is bounded. According to the lower bound of , the boundedness of and (7), we have
(14) |
According to the stepsize rule (13), we have
We aim to prove
(15) |
next. To prove it by contradiction, we suppose that
(16) |
This means that there exists a subsequence and a constant such that
(17) |
for . Due to the boundedness of , there exists a subsequence such that converges to . By (16) and (17), we have
According to (14),
when is large enough. For simplicity, we redefine the sequence satisfying the above condition as . From the stepsize rule, the condition (13) is violated by . We have
(18) |
Taking the limit of the above inequality, we obtain
Due to , it follows that . On the other hand, . Therefore, , which is in contradiction to (16). So (15) is established.
From (12) and (15), it follows that
and
Then according to (11), we obtain
For an arbitrary cluster point , for any , there exists a constant such that
and
for all . Due to the boundedness of , can reach 1 for some , where is a constant. Then it follows from
that
where is constant. Therefore, for a given , let be sufficiently small, then there exists a such that .
We denote the sequence satisfying the inner loop stopping criterion by
To prove the theorem by contradiction, we suppose that there is no point satisfying the outer loop stopping criterion . Due to the boundedness, there exists a cluster point of . For any cluster point , we consider the limits on both sides of
then we have that
Thus algorithm 2 stops at some , a contradiction. Therefore, algorithm 2 stops in finite iterations. ∎
III-B Computation
In general, the computational cost in each iteration of the interior point method is usually the cubic power of its size, which is impractical for large scale problems. However, for the special NMF problem under consideration here, the amount of computation can be greatly reduced, so that the proposed method can be applied to practical problems involving streaming data. In the following, we analyze the computational cost of algorithm 2.
First of all, to compute the gradient in (1), flops are needed.
The main computational cost of (2) is computing the primal dual direction (11). One can solve (12) to obtain first, and then compute within a low cost.
We rewrite (12) as
(19) |
In order to minimize the computational cost, we first decompose .
can be obtained by Cholesky factorization or eigenvalue decomposition. Since the matrix is composed of positive definite diagonal blocks with the size of k times k, one can obtain and within flops. (19) is equivalent to
Then we solve from
(20) |
and compute by
We need flops for constructing and flops for computing . By considering that each block of is rank one, we can compute within flops. The dominant computation is the computation of which costs flops. If we consider that each block of is rank one, it can be reduced to flops. When solving (20) by Cholesky factorization, since the size of the coefficient matrix is by , the computational cost is . Other computations, like computing the right side of (20) and computing , are flops.
To sum up, the computation cost of algorithm 2 in each iteration is . Compared with the computation of gradient , the cost is no more than times. Since and is usually very small in our streaming data, the computational complexity is completely acceptable.
IV A two-stage algorithm for NMF on streaming data
In this section, we propose a practical algorithm with fast convergence for solving NMF in streaming data. It combines both ANLS framework with active set method and interior point method proposed in the previous sections.
In the early stage of the algorithm, we use ANLS framework with active set method. It can reduce the value of objective function rapidly. We use the relative step tolerance, which is a relative lower bound on the size of a step, meaning
as the stopping criterion of this stage. If the algorithm attempts to take a step that is smaller than step tolerance, the iterations end.
At the end of the first phase, the algorithm is going to enter the second phase of using the interior point method. However, the solutions from active set method usually contain elements with value zero, which is incompatible to the strict interior point required by interior point method. Meanwhile, we also need to provide the initial dual variables to the interior point method. To address these issues, we first change the primal variable smaller than to by
(21) |
where is a small positive constant. We can choose
(22) |
Next, we give the initial value of the dual variable by
(23) |
We set the parameters
(24) |
and
(25) |
After these preparations, the algorithm enters its second stage by implementing algorithm 2. As the Hessian matrix is approximated in algorithm 2 by the positive definite matrix
(26) |
in order to further speed up convergence, we can change back to at the right time. A heuristic way to switch to
(27) |
is by monitoring , which is given in (9). A small implies that the predictor step generates a point close to the boundary, thus it is likely to be close to a local minimum. Therefore, when
we switch to (27), where is a user-supplied constant, and we set in our test. The computation of the primal-dual direction is similar to that using . The difference is that each block of is no longer rank one, thus the computational cost is flops. Since (27) is not guaranteed to be positive definite, the primal-dual direction using (27) may not be a descent direction. We check the negativity of
in (13) before we implement the line search. If we fail to obtain a descent direction, we switch to the positive definite Hessian (26).
All elements of the above approaches are presented in our two-stage algorithm, as outlined below in algorithm 3.
Initialize: Choose initial
Implement (1) with active set method
Update variables by (21), (22) and (23)
Set parameters by (24) and (25), , select the tolerance , and let .
Repeat until
Repeat until
If ,
compute the primal-dual direction by solving (11)
with Hessian (27).
If ,
, compute the primal-dual direction by
solving (11).
Else,
compute the primal-dual direction by solving (11).
Compute and using (7).
Backtrack step lengths to find
the smallest integer satisfying (13).
Compute using (6).
Set .
End
Compute parameter using (8) and (9) and update .
If , , else
End
V Numerical tests
We test our two-stage algorithm and compare it with other ANLS-based methods including NeNMF [11], QRPBB [13] and ANLS-BPP [16]. All the tests are performed using MATLAB 2020a on a computer with 64-bit system and 2.70GHz CPU. Comparisons are done on both synthetic data sets and real-world problems.
V-A Stopping criterion
The KKT conditions of (1) are given in (3). The definition of the error function (10) measures the violation of the KKT conditions. Therefore, we set
to be the stopping criterion, where
The ANLS-based algorithms do not generate the dual variables and . Here we give a reasonable definition that
In some cases, we also limit the maximum CPU time, in cases where some algorithms can not reach the given accuracy.
V-B Streaming AgMOR PDF data
Zhao et al. [32] measured the X-ray diffraction data during the nucleation and growth of zeolite-supported Ag nanoparticles through reduction of Ag-exchanged mordenite (MOR), and processed the data with pair distribution function (PDF) measurement. In the field of chemistry, more and more people use mathematical tools to analyze their measured data. Chapman et al. [4] used principle component analysis (PCA) and some post-processing to analyze the data given in [32], and obtained 3 principle components. Since PDF is a distribution-type function, NMF may be intuitively more applicable.
We simply remove the negativity of the raw data by shifting each PDF up by the opposite of its original minimum value. Then we perform the NMF algorithms on the data. The size of the data is
and in the algorithms, is set to be 3 based on the analysis of [4].
We use the same initial point for each algorithm. The relationship between KKT violation and CPU time of the four algorithms are shown in Fig. 1. It can be seen that the performances of NeNMF, ANLS-BPP and our 2-STAGE algorithm are similar in the early stage, and the decline of error function value for QRPBB is the most obvious. After that, there is a rise of error function value in 2-STAGE, because our algorithm begins to enter the second stage and the variables change, and some shocks occur later, which is caused by the instability of the early iterations of the interior point method. After several iterations, 2-STAGE becomes stable and converges quickly to meet the termination criterion. Compared with NeNMF and ANLS-BPP, QRPBB is always more accurate. Even so, it can not achieve the given criterion in a short time.

Next, we generate 10 different random initial points. For each of them, all algorithms are implemented. The results are shown in Tab. I. Because none of the three algorithms compared with the 2-STAGE algorithm can meet the stopping criterion in a short time, we set the maximum running CPU time as 20 seconds. Tab. I presents the average, minimum and maximum CPU time, the KKT violation and the objective function value of each algorithm respectively. It can be seen that our algorithm always has the minimum KKT violation and the minimum objective function value. It has a great advantage in finding a high-precision solution within a short amount of time.
Algorithm | cpu(s):avrg(min,max) | E:avrg(min,max) | f:avrg(min,max) |
---|---|---|---|
NeNMF | 20.01(20.00,20.03) | 6.64(1.72,16.32)e-2 | 5.37274(5.37242,5.37329)e+2 |
QRPBB | 20.02(20.00,20.03) | 2.19(1.48,2.94)e-3 | 5.37236(5.37213,5.37263)e+2 |
ANLS-BPP | 20.02(20.02,20.05) | 1.37(0.75,2.14)e-2 | 5.37236(5.37206,5.37264)e+2 |
2-STAGE | 5.03(3.55,8.88) | 3.58(1.24,6.30)e-7 | 5.37205(5.37205,5.37205)e+2 |
V-C Yale face database
The Yale face database is widely tested for face recognition [3]. It contains 165 grayscale images of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: lighting (center-light, left-light and right-light), with/without glasses, facial expressions (normal, happy, sad, sleepy, surprised, and wink). The original image size is pixels. To reduce the computational burden, the image size was reduced to pixels. Because the image of face also has some degree of continuity, we regard the data as streaming data to test our algorithm. The data size is .
We gradually increase to explore the influence of variation on the algorithm. We selected the face images of the first individuals, , i.e. . In the following tests, we fix . We limit a maximum CPU time of 60 seconds for each algorithm and generate 10 different random initial points for each sample. We see from Fig. 2, where we take an example of and an example of , that the first stage of 2-STAGE is not as efficient as other algorithms. The main reason is that the continuity of face data is not strong enough compared with streaming data, thus the active set changes frequently in the algorithm resulting large computational cost. In Fig. 2 (top), the second stage of 2-STAGE, interior point method, shows a fast local convergence, since the slope is significantly steeper than the others. However, in Fig. 2 (bottom), the slope of the second stage of 2-STAGE is similar with others. The reason is that when increases, the computational cost of interior point method increases faster than that of other algorithms. Therefore, we see from Tab. II that, as anticipated, the larger is, the less efficient 2-STAGE is compared with other algorithms.


size(n,m,k) | Algorithm | cpu(s):avrg(min,max) | E:avrg(min,max) | f:avrg(min,max) |
---|---|---|---|---|
(4096,11,3) | NeNMF | 44.94(30.50,66.53) | 125(9.77e-7,1254) | 1.40186(1.40104,1.40918)e+7 |
QRPBB | 19.10(12.05,40.28) | 8.20(3.39,9.99)e-7 | 1.40511(1.40104,1.40918)e+7 | |
ANLS-BPP | 23.14(16.42,49.08) | 9.80(9.50,9.98)e-7 | 1.40185(1.40104,1.40918)e+7 | |
2-STAGE | 13.30(6.83,24.33) | 4.39(1.16,9.46)e-7 | 1.40520(1.40104,1.41008)e+7 | |
(4096,22,3) | NeNMF | 36.36(7.28,65.63) | 1.13(9.47e-7,11.34) | 3.13573(3.12627,3.14786)e+7 |
QRPBB | 6.30(2.41,17.23) | 7.32(2.63,9.95)e-7 | 3.13324(3.12627,3.14785)e+7 | |
ANLS-BPP | 19.90(2.89,60.03) | 0.79(9.48e-7,5.78) | 3.13772(3.12627,3.14786)e+7 | |
2-STAGE | 9.81(5.45,18.28) | 3.54(1.79,7.37)e-7 | 3.12893(3.12627,3.14785)e+7 | |
(4096,44,3) | NeNMF | 63.76(61.03,67.06) | 2.71(2.04,6.17)e-6 | 8.20694(8.20694,8.20694)e+7 |
QRPBB | 6.32(4.97,7.06) | 8.52(2.49,9.99)e-7 | 8.20694(8.20694,8.20694)e+7 | |
ANLS-BPP | 10.06(8.34,12.14) | 9.85(9.44,9.99)e-7 | 8.20694(8.20694,8.20694)e+7 | |
2-STAGE | 16.05(11.08,31.14) | 5.05(1.02,9.02)e-7 | 8.20694(8.20694,8.20694)e+7 | |
(4096,88,3) | NeNMF | 62.86(61.05,64.81) | 2.09(1.35,4.19)e-6 | 1.97873(1.97753,1.98962)e+8 |
QRPBB | 4.97(3.14,10.58) | 8.37(5.42,9.96)e-7 | 1.97945(1.97753,1.98962)e+7 | |
ANLS-BPP | 9.10(7.25,16.50) | 4.53(1.32,9.81)e-7 | 1.97874(1.97753,1.98962)e+7 | |
2-STAGE | 14.76(7.86,43.48) | 5.05(1.02,9.02)e-7 | 1.97753(1.97753,1.97753)e+7 | |
(4096,165,3) | NeNMF | 63.99(60.17,67.20) | 1.54(1.35,1.89)e-6 | 4.06305(4.06305,4.06305)e+8 |
QRPBB | 5.91(5.30,6.78) | 5.01(4.79,9.58)e-7 | 4.06305(4.06305,4.06305)e+8 | |
ANLS-BPP | 11.46(10.81,12.36) | 9.60(9.40,9.85)e-7 | 4.06305(4.06305,4.06305)e+8 | |
2-STAGE | 29.78(16.58,61.78) | 4.10(1.19,8.39)e-7 | 4.06305(4.06305,4.06305)e+8 |
V-D Synthetic data
In order to test more problems on different scales, we artificially synthesized some data for testing.
We note that the focus of this paper is on streaming data. Chapman et al.[4] use PCA method to study AgMor data. They found that there were three dominant components (singular values) of the data, which means that the original data is a rank-3 matrix plus noise. According to this characteristics, we construct the artificial data by the following methods. First, we decide the problem size . Then, we generate a random matrix whose size is , and each element is uniformly distributed from 0 to 1. In the same way, we generate a random matrix whose size is . Next, we compute and add Gaussian noise, whose expectation is 0 and standard deviation is , to each element of . Finally, we change the negative elements in to zeros.
In this set of tests, each algorithm is limited to a maximum CPU time of 60 seconds. We generate 10 different random initial points for sample. The results are shown in Tab. III. From an average point of view, the performance of 2-STAGE is the best of the four algorithms, followed by QRPBB. Except for M = 50, k = 6, ANLS-BPP exceeds QRPBB. In some cases, other algorithms may be better than 2-STAGE. For example, when and , the minimum time of QRPBB is much less than the maximum time of 2-STAGE. Generally speaking, the performance of 2Stage in getting high-precision solution truly stands out.
size(n,m,k) | Algorithm | cpu(s):avrg(min,max) | E:avrg(min,max) | f:avrg(min,max) |
---|---|---|---|---|
(2000,50,3) | NeNMF | 60.01(60.00,60.03) | 1.74(0.70,3.57)e-5 | 4.61662(4.61662,4.61662)e+2 |
QRPBB | 29.36(20.17,48.14) | 8.46(5.60,9.80)e-7 | 4.61662(4.61662,4.61662)e+2 | |
ANLS-BPP | 58.48(47.89,60.05) | 1.69(1.00,3.29)e-6 | 4.61662(4.61662,4.61662)e+2 | |
2-STAGE | 3.31(2.45,4.75) | 3.07(1.16,5.86)e-7 | 4.61662(4.61662,4.61662)e+2 | |
(2000,50,4) | NeNMF | 47.80(40.88,60.02) | 1.19(1.00,2.31)e-6 | 4.56019(4.56019,4.56019)e+2 |
QRPBB | 22.89(12.78,34.22) | 7.30(2.67,9.99)e-7 | 4.56019(4.56019,4.56019)e+2 | |
ANLS-BPP | 28.03(23.44,32.61) | 9.99(9.98,10.00)e-7 | 4.56019(4.56019,4.56019)e+2 | |
2-STAGE | 7.41(4.22,11.78) | 3.25(1.09,8.91)e-7 | 4.56019(4.56019,4.56019)e+2 | |
(2000,50,5) | NeNMF | 49.81(40.92,56.80) | 9.99(9.98,10.00)e-7 | 4.45449(4.45449,4.45449)e+2 |
QRPBB | 25.14(20.61,35.91) | 5.99(1.80,8.86)e-7 | 4.45449(4.45449,4.45449)e+2 | |
ANLS-BPP | 24.78(21.67,28.31) | 9.99(9.97,9.99)e-7 | 4.45449(4.45449,4.45449)e+2 | |
2-STAGE | 6.98(6.20,7.91) | 3.93(1.69,8.31)e-7 | 4.45449(4.45449,4.45449)e+2 | |
(2000,50,6) | NeNMF | 60.02(60.00,60.03) | 5.18(0.38,13.11)e-3 | 4.36121(4.36121,4.36121)e+2 |
QRPBB | 60.01(60.00,60.03) | 2.19(0.02,8.31)e-4 | 4.36121(4.36121,4.36121)e+2 | |
ANLS-BPP | 56.07(20.38,60.05) | 5.35(0.10,32.81)e-5 | 4.36309(4.36121,4.38005)e+2 | |
2-STAGE | 14.46(6.31,25.81) | 4.74(1.18,7.77)e-7 | 4.36121(4.36121,4.36121)e+2 | |
(2000,100,3) | NeNMF | 17.14(14.88,19.91) | 9.96(9.93,10.00)e-7 | 9.57337(9.57337,9.57337)e+2 |
QRPBB | 5.84(4.86,7.70) | 7.81(1.76,9.98)e-7 | 9.57337(9.57337,9.57337)e+2 | |
ANLS-BPP | 15.10(14.27,15.97) | 9.98(9.94,10.00)e-7 | 9.57337(9.57337,9.57337)e+2 | |
2-STAGE | 4.05(2.95,5.33) | 4.71(1.00,9.92)e-7 | 9.57337(9.57337,9.57337)e+2 | |
(2000,100,4) | NeNMF | 26.86(23.59,31.17) | 9.98(9.97,10.00)e-7 | 9.54230(9.54230,9.54230)e+2 |
QRPBB | 14.53(13.17,15.63) | 5.75(2.37,8.99)e-7 | 9.54230(9.54230,9.54230)e+2 | |
ANLS-BPP | 24.60(22.36,26.70) | 9.98(9.96,10.00)e-7 | 9.54230(9.54230,9.54230)e+2 | |
2-STAGE | 7.46(5.36,8.91) | 4.82(1.01,9.44)e-7 | 9.54230(9.54230,9.54230)e+2 | |
(2000,100,5) | NeNMF | 36.04(31.03,41.45) | 9.98(9.97,10.00)e-7 | 9.44857(9.44857,9.44857)e+2 |
QRPBB | 17.60(16.17,20.45) | 7.02(4.22,9.12)e-7 | 9.44857(9.44857,9.44857)e+2 | |
ANLS-BPP | 27.54(25.70,29.58) | 9.98(9.96,10.00)e-7 | 9.44857(9.44857,9.44857)e+2 | |
2-STAGE | 11.77(9.69,17.59) | 4.59(1.27,8.06)e-7 | 9.44857(9.44857,9.44857)e+2 | |
(2000,100,6) | NeNMF | 60.02(60.00,60.03) | 7.43(0.03,28.52)e-3 | 9.35451(9.35451,9.35452)e+2 |
QRPBB | 57.20(37.47,60.03) | 2.54(0.07,12.20)e-5 | 9.35451(9.35451,9.35451)e+2 | |
ANLS-BPP | 60.03(60.02,60.05) | 1.14(0.04,4.44)e-4 | 9.35451(9.35451,9.35451)e+2 | |
2-STAGE | 25.54(14.44,52.39) | 3.05(1.04,9.33)e-7 | 9.35451(9.35451,9.35451)e+2 |
VI Conclusions
In this paper we focused on solutions to the NMF for streaming data. We presented a fast two-stage algorithm, where the first stage is the ANLS framework with active set method which gains benefit from the continuity of streaming data, and the second stage is a line search interior point method which gets benefit from . In addition, we have proved the global convergence of the proposed line search interior point method. The first stage reduces the value of the objective function rapidly, and the second stage converges to a local solution quickly due to the property of Newton-type direction. We tested the proposed algorithm on several real and synthetic data, and observed that compared with other algorithms, our algorithm is more effective in solving high-precision local solutions.
The active set method in the first stage does not reach the expected speed, even if it is tested on continuous data. We think that this may be caused by the limitations of the underlying code implementation in MATLAB. On the other hand, we find that the transition part between the two stages may induce instability. This is because the solution of the active set method cannot be directly used as the initial guess of the interior point method, and its changes have an impact on stability. At present, the parameters used to generate starting point are selected carefully to avoid the instability. In the future, we will work to find a more stable transition technique.
Considering that in addition to the basic NMF model, there are other variants of NMF, such as constrained NMFs and structured NMFs, our algorithm has the potential to be applications to more problems through extension. This will be our consideration in the future.
References
- [1] S. Arora, R. Ge, R. Kannan, and A. Moitra, “Computing a nonnegative matrix factorization—provably,” SIAM Journal on Computing, vol. 45, no. 4, pp. 1582–1611, 2016.
- [2] I. Buciu, “Non-negative matrix factorization, a new tool for feature extraction: theory and applications,” International Journal of Computers, Communications and Control, vol. 3, no. 3, pp. 67–74, 2008.
- [3] D. Cai, X. He, J. Han, and H.-J. Zhang, “Orthogonal laplacianfaces for face recognition,” IEEE transactions on image processing, vol. 15, no. 11, pp. 3608–3614, 2006.
- [4] K. W. Chapman, S. H. Lapidus, and P. J. Chupas, “Applications of principal component analysis to pair distribution function data,” Journal of Applied Crystallography, vol. 48, no. 6, pp. 1619–1626, 2015.
- [5] A. Cichocki and A.-H. Phan, “Fast local algorithms for large scale nonnegative matrix and tensor factorizations,” IEICE transactions on fundamentals of electronics, communications and computer sciences, vol. 92, no. 3, pp. 708–721, 2009.
- [6] A. Cichocki, R. Zdunek, A. H. Phan, and S.-i. Amari, Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. John Wiley & Sons, 2009.
- [7] T. Egami and S. J. L. Billinge, Underneath the Bragg peaks: structural analysis of complex materials, 2nd ed. Amsterdam: Elsevier, 2012. [Online]. Available: Link
- [8] D. Goldfarb and S. Ma, “Convergence of fixed-point continuation algorithms for matrix rank minimization,” Foundations of Computational Mathematics, vol. 11, no. 2, pp. 183–210, 2011.
- [9] P. Gong and C. Zhang, “Efficient nonnegative matrix factorization via projected newton method,” Pattern Recognition, vol. 45, no. 9, pp. 3557–3565, 2012.
- [10] L. Grippo and M. Sciandrone, “On the convergence of the block nonlinear gauss–seidel method under convex constraints,” Operations research letters, vol. 26, no. 3, pp. 127–136, 2000.
- [11] N. Guan, D. Tao, Z. Luo, and B. Yuan, “Nenmf: An optimal gradient method for nonnegative matrix factorization,” IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2882–2898, 2012.
- [12] D. Hajinezhad, T.-H. Chang, X. Wang, Q. Shi, and M. Hong, “Nonnegative matrix factorization using admm: Algorithm and convergence analysis,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 4742–4746.
- [13] Y. Huang, H. Liu, and S. Zhou, “Quadratic regularization projected barzilai–borwein method for nonnegative matrix factorization,” Data mining and knowledge discovery, vol. 29, no. 6, pp. 1665–1684, 2015.
- [14] D. Kim, S. Sra, and I. S. Dhillon, “Fast newton-type methods for the least squares nonnegative matrix approximation problem,” in Proceedings of the 2007 SIAM international conference on data mining. SIAM, 2007, pp. 343–354.
- [15] H. Kim and H. Park, “Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method,” SIAM journal on matrix analysis and applications, vol. 30, no. 2, pp. 713–730, 2008.
- [16] J. Kim and H. Park, “Fast nonnegative matrix factorization: An active-set-like method and comparisons,” SIAM Journal on Scientific Computing, vol. 33, no. 6, pp. 3261–3281, 2011.
- [17] H. W. Kuhn and A. W. Tucker, “Nonlinear programming,” in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, Calif.: University of California Press, 1951, pp. 481–492. [Online]. Available: Link
- [18] C. L. Lawson and R. J. Hanson, Solving least squares problems. Siam, 1995, vol. 15.
- [19] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, p. 788, 1999.
- [20] ——, “Algorithms for non-negative matrix factorization,” in Advances in neural information processing systems, 2001, pp. 556–562.
- [21] L. Li and Y.-J. Zhang, “Fastnmf: highly efficient monotonic fixed-point nonnegative matrix factorization algorithm with good applicability,” Journal of Electronic Imaging, vol. 18, no. 3, p. 033004, 2009.
- [22] C.-J. Lin, “Projected gradient methods for nonnegative matrix factorization,” Neural computation, vol. 19, no. 10, pp. 2756–2779, 2007.
- [23] C.-H. Liu, C. J. Wright, R. Gu, S. Bandi, A. Wustrow, P. K. Todd, D. O’Nolan, M. L. Beauvais, J. R. Neilson, P. J. Chupas, K. W. Chapman, and S. J. L. Billinge, “Validation of non-negative matrix factorization for assessment of atomic pair-distribution function (pdf) data in a real-time streaming context,” J. Appl. Crystallogr., Jan 2020, arXiv:2010.11807 [cond-mat.mtrl-sci]. [Online]. Available: Link
- [24] Y.-F. Liu, M. Hong, and E. Song, “Sample approximation-based deflation approaches for chance sinr-constrained joint power and admission control,” IEEE Transactions on Wireless Communications, vol. 15, no. 7, pp. 4535–4547, 2016.
- [25] J. Nocedal and S. Wright, Numerical optimization. Springer Science & Business Media, 2006.
- [26] P. Paatero and U. Tapper, “Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values,” Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
- [27] S. Sra and I. S. Dhillon, Nonnegative matrix approximation: Algorithms and applications. Computer Science Department, University of Texas at Austin, 2006.
- [28] P. K. Todd, A. Wustrow, R. D. McAuliffe, M. J. McDermott, G. T. Tran, B. C. McBride, E. D. Boeding, D. O’Nolan, C.-H. Liu, S. S. Dwaraknath, K. W. Chapman, Simon J. L. Billinge, K. A. Persson, A. Huq, G. M. Veith, and J. R. Neilson, “Defect-accommodating intermediates yield selective low-temperature synthesis of YMnO3 polymorphs,” Inorg. Chem., vol. 59, p. 13639–13650, Aug 2020. [Online]. Available: Link
- [29] S. A. Vavasis, “On the complexity of nonnegative matrix factorization,” SIAM Journal on Optimization, vol. 20, no. 3, pp. 1364–1377, 2009.
- [30] Y.-X. Wang and Y.-J. Zhang, “Nonnegative matrix factorization: A comprehensive review,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 6, pp. 1336–1353, 2012.
- [31] R. Zdunek and A. Cichocki, “Non-negative matrix factorization with quasi-newton optimization,” in International conference on artificial intelligence and soft computing. Springer, 2006, pp. 870–879.
- [32] H. Zhao, T. M. Nenoff, G. Jennings, P. J. Chupas, and K. W. Chapman, “Determining quantitative kinetics and the structural mechanism for particle growth in porous templates,” The Journal of Physical Chemistry Letters, vol. 2, no. 21, pp. 2742–2746, 2011.