
Segmenting Epipolar Line

Shengjie Li School of Electronic Information and
Electrical Engineering
Shanghai Jiao Tong University
[email protected]
   Qi Cai School of Electronic Information and
Electrical Engineering
Shanghai Jiao Tong University
[email protected]
   Yuanxin Wu School of Electronic Information and
Electrical Engineering
Shanghai Jiao Tong University
[email protected]
Abstract

Identifying feature correspondence between two images is a fundamental procedure in three-dimensional computer vision. Usually the feature search space is confined by the epipolar line. Using the cheirality constraint, this paper finds that the feature search space can be restricted to one of two or three segments of the epipolar line, which are defined by the epipole and a so-called virtual infinity point.

Index Terms:
feature correspondence, epipolar line, cheirality, virtual infinity point

I Introduction

Photography has become an indispensable part of daily life, and cameras help people record the world around them. However, the real world is three-dimensional, while an ordinary monocular camera can only record it on a two-dimensional image by projection. How to recover three-dimensional information from two-dimensional images has therefore become a problem of great interest, and it hinges on image matching: if an object is imaged from different viewpoints, its three-dimensional structure can be recovered from the correspondences. The correspondence of two image points is thus a fundamental problem in computer vision. For two images without a priori information, feature detection methods such as SIFT [1] or SURF [2] are usually used to find correspondences between the images. In general, only a few points on an image are suitable for matching, e.g., corners or edge points, and feature matching methods are time consuming. If the matching relationship between the photos taken by two cameras is known, the essential matrix and the relative pose of the two cameras can be estimated [3, 4, 5, 6], and the epipolar constraint can then be used to reduce the time spent on feature matching. The epipolar constraint reduces the search space of matching points from two dimensions to one dimension: given an image point in one view, the corresponding point in the other view lies on the intersection of the other view's image plane with the epipolar plane passing through the two camera centers and the image point, namely the epipolar line [3].

Methods to speed up the epipolar line search have been studied for many years. Image rectification can be used to make the epipolar lines parallel so that the search is performed along scan lines [7, 8, 9]; for rectified images, the disparity can represent the depth of a pixel [10, 11]. However, image rectification distorts the image, loses surface details in object reconstruction [12], and cannot be used under some circumstances. For example, when the epipoles lie inside the image, rectification is impossible: rectification maps the epipoles to infinity by a projective transformation, so points near the epipoles would also be mapped to infinity and the original image would be destroyed. Our method does not require image rectification, so it has no such limitation.

Our method combines epipolar line search with the cheirality constraint. The cheirality constraint, first proposed by Hartley [13], means that any point appearing in an image must lie in front of the camera producing that image; it is also known as the positive depth constraint. Werner and Pajdla [14] give necessary and sufficient conditions for an image point set to correspond to a real imaging geometry. Agarwal et al. [15] give an algebraic description of multiview cheirality. In this paper, we use the cheirality constraint to segment the epipolar line and identify the correct epipolar segment.

This paper is organized as follows: Section 2 reviews existing concepts such as epipolar geometry and the cheirality constraint. Section 3 introduces the concept of the virtual infinity point, uses it to segment the epipolar line, and discusses cheirality on the epipolar line. Section 4 describes how to identify the epipolar segment on which the corresponding image point lies. Section 5 reports experimental results that show the effectiveness of the new method. Section 6 concludes the paper.

II Related Works

Throughout this paper, we assume that all cameras are calibrated and that normalized image coordinates are used unless explicitly stated otherwise.

II-A Epipolar Geometry

The epipolar geometry between two cameras is illustrated in Fig. 1. X = [x, y, 1]^T and X' = [x', y', 1]^T are a pair of matched image points. X_w = [x_w, y_w, z_w, 1]^T is the homogeneous coordinate of the projected 3D world point in the first camera frame, and X'_w = [x'_w, y'_w, z'_w, 1]^T is its homogeneous coordinate in the second camera frame. The two-view imaging can be described as

z'_w X' = z_w R X + t        (1)

where R and t are the rotation and translation between the two camera frames, respectively. Left-multiplying (1) by X'^T [t]_× yields the epipolar constraint [3]

X'^T E X = 0        (2)

where the essential matrix E = [t]_× R and [t]_× denotes the skew-symmetric matrix formed by t.
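As a concrete illustration, the following minimal numpy sketch (our own example with an assumed pose and 3D point, not code from the paper) builds E = [t]_× R and checks that a synthetic correspondence satisfies (2) up to floating-point error.

```python
# Minimal numpy sketch (assumed pose and 3D point, not from the paper):
# build E = [t]_x R and verify the epipolar constraint X'^T E X = 0.
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

ang = np.deg2rad(5.0)                       # assumed small rotation about the y-axis
R = np.array([[np.cos(ang), 0, np.sin(ang)],
              [0, 1, 0],
              [-np.sin(ang), 0, np.cos(ang)]])
t = np.array([1.0, 0.0, 0.2])               # assumed translation

E = skew(t) @ R                             # essential matrix

Xw = np.array([0.3, -0.1, 4.0])             # 3D point in the first camera frame
X = np.append(Xw[:2] / Xw[2], 1.0)          # normalized image point in view 1
Xw2 = R @ Xw + t                            # same point in the second camera frame
X2 = np.append(Xw2[:2] / Xw2[2], 1.0)       # normalized image point in view 2

print(X2 @ E @ X)                           # ~0, i.e. (2) holds
```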


Figure 1: An example of epipolar line, virtual infinity point and epipole in two-view geometry.

II-B Epipolar Line

Referring to (2), X'^T is restricted to the left null space of EX, which can be described as a line in the image, namely the epipolar line. The line is given by

l' = E X = [t]_× R X        (3)

To find a pair of matching points between two images, the epipolar line can thus be employed to reduce the search space from two dimensions to one dimension, which is known as the epipolar line search [3].
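A minimal sketch of how the epipolar line could be used in practice (assumed helper names, not the paper's implementation): compute l' = E X and keep only candidate points whose distance to the line is below a threshold.

```python
# Sketch of epipolar line search (assumed helper names, not the paper's code):
# candidates far from the line l' = E X cannot match X and are discarded.
import numpy as np

def epipolar_line(E, X):
    """Epipolar line l' = E X in the second image (homogeneous 3-vector)."""
    return E @ X

def point_line_distance(X2, l):
    """Distance of a homogeneous point X2 = [x, y, 1] to the line l."""
    return abs(X2 @ l) / np.hypot(l[0], l[1])

# Usage (E, X and a list of candidate points `cands` as in the sketch above):
#   l = epipolar_line(E, X)
#   near_line = [c for c in cands if point_line_distance(c, l) < 1e-2]
```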

II-C Cheirality Constraint

The cheirality constraint in two-view geometry means that a 3D world point must lie in front of both cameras, or alternatively that it must have positive depths. It is well known that decomposing the essential matrix E yields four candidate pose solutions [3, 16]. The cheirality constraint is helpful in identifying the right pose solution [3, 13].

III Epipolar Line Segments

This section shows that the epipolar line is divided into two or three segments by the epipole and a so-called virtual infinity point.

III-A Virtual Infinity Point

The virtual infinity point p_inf is defined as

p_inf = RX / [RX]_3        (4)

where [RX]_3 is the third element of RX. In other words, the virtual infinity point p_inf is RX expressed in homogeneous coordinates.

Proposition 1

The virtual infinity point lies on the epipolar line.

Proof 1

With (3) and (4), we have

p_inf^T l' = (RX / [RX]_3)^T [t]_× R X = 0        (5)

since v^T [t]_× v = 0 for any vector v, [t]_× being skew-symmetric.

Q.E.D.

The epipole p_e in the right image is computed by [3]

p_e = t / [t]_3        (6)

where [t]_3 is the third element of t.

Specifically, p_inf is at infinity when [RX]_3 = 0 and p_e is at infinity when [t]_3 = 0.
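The following sketch (our own helper names and example values) computes p_inf and p_e, returns None in the degenerate at-infinity cases, and numerically confirms Proposition 1.

```python
# Sketch (our own helper names and example values): compute the virtual infinity
# point p_inf = RX / [RX]_3 and the epipole p_e = t / [t]_3, and check Proposition 1.
import numpy as np

def virtual_infinity_point(R, X):
    RX = R @ X
    if np.isclose(RX[2], 0.0):
        return None                       # p_inf is at infinity when [RX]_3 = 0
    return RX / RX[2]

def epipole(t):
    if np.isclose(t[2], 0.0):
        return None                       # p_e is at infinity when [t]_3 = 0
    return t / t[2]

R = np.eye(3)                             # assumed pose and image point
t = np.array([0.5, 0.1, 1.0])
X = np.array([0.2, -0.3, 1.0])

l2 = np.cross(t, R @ X)                   # epipolar line l' = [t]_x R X
p_inf = virtual_infinity_point(R, X)
print(p_inf @ l2)                         # ~0: p_inf lies on the epipolar line
print(epipole(t))                         # [0.5, 0.1, 1.0]
```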

Referring to Fig. 1, C and C' are the two camera centers. The projection rays CX and C'p_inf are parallel and intersect at the plane at infinity, which is why we name p_inf the virtual infinity point.

III-B Segmentation of Epipolar Line

It is well known that the epipolar line passes through the epipole. In Sec. 3.A, we proved that the virtual infinity point lies on the epipolar line as well. Therefore, the epipolar line is separated into three segments by the virtual infinity point and the epipole, as demonstrated in Fig. 1. Without loss of generality, assume that p_inf lies to the right of p_e. The correct matching point lies on one of the three segments when both the epipole and the virtual infinity point are finite; the epipolar line is separated into only two segments in the special case [RX]_3 = 0 or [t]_3 = 0.

III-C Cheirality on Epipolar Line

Referring to Fig. 1, let X̃' denote a candidate point on the epipolar line and X̃_w the intersection of the rays CX and C'X̃'. As illustrated by the green dot, if X̃_w is in front of both cameras, X̃' is a possible solution for the matching point of X. As illustrated by the red dots, if X̃_w is behind either camera, the cheirality constraint is violated and X̃' is not a feasible solution. In other words, the cheirality constraint helps remove infeasible candidates for the matching point, provided the depths can be obtained.

IV Identifying Right Epipolar Segment

Referring to (1), X' can be represented as

X' = λ_1 R X + λ_2 t        (7)

where λ_1 = z_w / z'_w and λ_2 = 1 / z'_w. If z_w > 0 and z'_w > 0 in (1), the image point pair (X, X') satisfies the cheirality constraint; conversely, λ_1 > 0 and λ_2 > 0 imply z_w > 0 and z'_w > 0. Therefore (7) describes a point pair consistent with a real imaging geometry when λ_1 > 0 and λ_2 > 0. The relative position on the epipolar line of the correct matching point X', the virtual infinity point p_inf and the epipole p_e is decided by the signs of [RX]_3 and [t]_3.
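A short numerical check of (7) under an assumed pose and 3D point (not from the paper):

```python
# Numerical check of (7) under an assumed pose and 3D point (identity rotation
# for brevity; any rotation works the same way).
import numpy as np

R = np.eye(3)
t = np.array([1.0, 0.0, 0.2])
Xw = np.array([0.3, -0.1, 4.0])                 # 3D point, first camera frame
Xw2 = R @ Xw + t                                # same point, second camera frame
zw, zw2 = Xw[2], Xw2[2]

X = np.append(Xw[:2] / zw, 1.0)                 # normalized point in view 1
X2 = np.append(Xw2[:2] / zw2, 1.0)              # normalized point in view 2

lam1, lam2 = zw / zw2, 1.0 / zw2                # lambda_1 = z_w / z'_w, lambda_2 = 1 / z'_w
print(np.allclose(X2, lam1 * (R @ X) + lam2 * t))   # True
```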

Proposition 2

If [RX]_3 > 0 and [t]_3 > 0, X' is between p_inf and p_e.

If [RX]_3 > 0 and [t]_3 < 0, p_inf is between X' and p_e.

If [RX]_3 < 0 and [t]_3 > 0, p_e is between p_inf and X'.

Proof 2

Substituting (4) and (6) into (7) gives

X' = λ_1 [RX]_3 (RX / [RX]_3) + λ_2 [t]_3 (t / [t]_3)        (8)
   = λ_1 [RX]_3 p_inf + λ_2 [t]_3 p_e
   = a p_inf + b p_e

where a = λ_1 [RX]_3 and b = λ_2 [t]_3. Because X', p_inf and p_e are represented in homogeneous coordinates with their third elements equal to 1, we have a + b = 1. Specifically, if λ_1 = 0 then b = 1 and X' = p_e, and if λ_2 = 0 then a = 1 and X' = p_inf. For λ_1 > 0 and λ_2 > 0, a and b have the same signs as [RX]_3 and [t]_3, respectively. Then (8) can be rewritten as

(a + b) X' = a p_inf + b p_e        (9)
a (X' - p_inf) = b (p_e - X')        (10)

When a > 0 and b > 0, (10) indicates that the vectors X' - p_inf and p_e - X' point in the same direction. Since X', p_inf and p_e are collinear, X' must lie between p_inf and p_e. Furthermore, (8) can also be transformed into

X' = (1 - b) p_inf + b p_e        (11)
-b (p_e - p_inf) = p_inf - X'        (12)

When b < 0, (12) shows that the vectors p_e - p_inf and p_inf - X' point in the same direction, so p_inf is between X' and p_e. Finally, (8) can also be transformed into

X' = a p_inf + (1 - a) p_e        (13)
-a (p_inf - p_e) = p_e - X'        (14)

When a < 0, (14) shows that the vectors p_inf - p_e and p_e - X' point in the same direction, that is, p_e is between p_inf and X'. Note that the condition a < 0, b < 0 cannot occur because a + b = 1.

Q.E.D.
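The following sketch numerically verifies Proposition 2 for the case a > 0, b > 0 with an assumed pose and 3D point; the other cases can be checked the same way by flipping the signs of [RX]_3 or [t]_3.

```python
# Numerical verification of Proposition 2 for the case [RX]_3 > 0, [t]_3 > 0,
# with an assumed pose and 3D point.
import numpy as np

R = np.eye(3)
t = np.array([0.4, 0.1, 0.5])                   # [t]_3 > 0
Xw = np.array([0.2, 0.3, 3.0])                  # in front of camera 1
Xw2 = R @ Xw + t
X = np.append(Xw[:2] / Xw[2], 1.0)
X2 = np.append(Xw2[:2] / Xw2[2], 1.0)

RX = R @ X                                      # here [RX]_3 > 0
p_inf, p_e = RX / RX[2], t / t[2]
a = (Xw[2] / Xw2[2]) * RX[2]                    # a = lambda_1 [RX]_3
b = (1.0 / Xw2[2]) * t[2]                       # b = lambda_2 [t]_3
print(np.isclose(a + b, 1.0))                   # True
# a > 0 and b > 0, so X' should lie between p_inf and p_e on the epipolar line:
print(np.dot(X2[:2] - p_inf[:2], X2[:2] - p_e[:2]) < 0)   # True
```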

According to Sec. 3.B, the epipolar line is separated into two segments in the special case [RX]_3 = 0 or [t]_3 = 0. Specifically, when [RX]_3 = 0 and [t]_3 ≠ 0, (7) can be transformed into

X' = λ_1 R X + λ_2 [t]_3 (t / [t]_3)        (15)
   = λ_1 R X + λ_2 [t]_3 p_e
   = λ_1 R X + b p_e

Since [X']_3 = 1, [RX]_3 = 0 and [p_e]_3 = 1, we have b = 1. Therefore X' - p_e = λ_1 R X, indicating that the relative position of X' and p_e on the epipolar line depends only on the first two elements of RX (recall λ_1 > 0). For instance, if [RX]_1 > 0, X' lies to the right of p_e along the x-axis of the image plane. When [t]_3 = 0 and [RX]_3 ≠ 0, (7) can be transformed into

X' = λ_1 [RX]_3 (RX / [RX]_3) + λ_2 t        (16)
   = λ_1 [RX]_3 p_inf + λ_2 t
   = a p_inf + λ_2 t

Since [X']_3 = 1, [p_inf]_3 = 1 and [t]_3 = 0, we have a = 1. Therefore X' - p_inf = λ_2 t, which means that the relative position of X' and p_inf on the epipolar line depends only on the first two elements of t (recall λ_2 > 0).

When [t]_3 = 0 and [RX]_3 = 0, the third element of the right-hand side of (7) equals 0, whereas the third element of the left-hand side equals 1 because X' is represented in homogeneous coordinates. Hence the case [t]_3 = 0 and [RX]_3 = 0 cannot occur.

It should be noted that, in practice, the actual search segment is the intersection of the identified segment with the image region. Hartley [3] points out that the search space can be reduced to a bounded line, but our analysis shows that the correct matching point may be located on any of the three segments, which differs from the common belief that the correct matching point always lies between the epipole and the virtual infinity point.
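Putting Proposition 2 and the special cases together, a possible segment-identification routine might look as follows (a sketch with our own function name and labels, not the authors' implementation):

```python
# A sketch (our own function name and labels, not the authors' implementation) of
# identifying the epipolar segment of X' from the signs of [RX]_3 and [t]_3.
import numpy as np

def identify_segment(R, t, X, eps=1e-12):
    """Describe where the correct match X' can lie relative to p_inf and p_e."""
    RX3, t3 = (R @ X)[2], t[2]
    if abs(RX3) < eps and abs(t3) < eps:
        raise ValueError("[RX]_3 = [t]_3 = 0 cannot occur (see text)")
    if abs(RX3) < eps:
        # p_inf at infinity: X' - p_e = lambda_1 RX, a ray from p_e along (RX)[0:2].
        return "ray from p_e in the direction of the first two elements of RX"
    if abs(t3) < eps:
        # p_e at infinity: X' - p_inf = lambda_2 t, a ray from p_inf along t[0:2].
        return "ray from p_inf in the direction of the first two elements of t"
    if RX3 > 0 and t3 > 0:
        return "between p_inf and p_e"
    if RX3 > 0 and t3 < 0:
        return "beyond p_inf (p_inf between X' and p_e)"
    if RX3 < 0 and t3 > 0:
        return "beyond p_e (p_e between p_inf and X')"
    # RX3 < 0 and t3 < 0 would force a < 0 and b < 0, contradicting a + b = 1,
    # so no match with positive depths in both views exists for this X.
    return "no cheirality-consistent match"

# Example: identify_segment(np.eye(3), np.array([1.0, 0.0, 0.2]),
#                           np.array([0.3, -0.1, 1.0]))  -> "between p_inf and p_e"
```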

V Experimental Results

First, we examine the theory of epipolar line segments on four pairs of images. The first and second pairs have a small baseline, and the third and fourth pairs have a large baseline. The relative pose of each image pair is known and SIFT features are used.

V-A Small-baseline test

Five pairs of feature points are randomly chosen and highlighted as green dots in both images, as seen in Fig. 2 and Fig. 3. With the known relative pose, the epipolar line of each chosen point in the left image is plotted in the second image. All virtual infinity points lie on the corresponding epipolar lines. Blue rays represent the correct segments of the epipolar lines and red rays represent the wrong segments, identified according to Proposition 2. We see that all the matching points in the second image fall on the correct epipolar segments. It should be noted that the epipole in the second image is out of view; therefore, the epipolar lines have only two visible segments and the correct matching points are close to the corresponding virtual infinity points.

Figure 2: The first pair of images with small baseline
Figure 3: The second pair of images with small baseline

V-B Large-baseline test

In this case, the epipole in the second image falls inside the image and all the epipolar lines intersect at the epipole, as Fig. 4 illustrates. The epipole and the virtual infinity points divide the epipolar lines into three segments, and all correct matching points fall on the identified segments. Note that points #1 and #3 in the right image are clearly false matches and lie far away from the epipolar lines of points #1 and #3 in the left image. In Fig. 5, the epipole in the second image is out of view. We see that all the matching points in the second image fall on the correct epipolar segments. Note that point #5 in the right image is clearly a false match and lies far away from the epipolar line of point #5 in the left image.

Figure 4: The first pair of images with large baseline
Figure 5: The second pair of images with large baseline

V-C Simulation test

Additionally, we simulate two-view geometry with different relative poses and calculate how much of the search space on the epipolar line can be reduced. The size of the simulated image is set to 100 × 100 pixels. The intrinsic matrix of both simulated cameras is set to

K = [ 50   0  50
       0  50  50
       0   0   1 ].

(a) Adjacent cameras
(b) Face-to-face cameras
(c) Pure translation cameras
(d) Pure translation cameras 2
Figure 6: Four types of relative pose in two-view geometry and the reduced search space for each image point

The left subfigures in Fig. 6 show the relative pose of the two cameras; the optical center marked with a star is the first camera. The right subfigures show the percentage of search space reduced for each image point by identifying the correct epipolar segment. Table I lists the average reduction for the four representative image pairs. On average, identifying the correct epipolar segment reduces the search space by about 50% compared with searching the whole epipolar line.

TABLE I: REDUCED SEARCH SPACE FOR EACH IMAGE PAIR

pair    reduced search space (%)
a       31.07
b       42.92
c       66.67
d       50.50
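As a rough indication of how such percentages could be computed (our own assumption of the procedure, with hypothetical helpers candidate_is_feasible and reduced_fraction; the paper does not give its code), one can sample pixels along the epipolar line inside the image and count the fraction ruled out by the positivity test on λ_1 and λ_2:

```python
# Rough sketch (hypothetical helpers; not the paper's code) of estimating the
# reduced search space: sample pixels along the epipolar line inside a W x H image
# and count the fraction ruled out by requiring lambda_1 > 0 and lambda_2 > 0.
import numpy as np

def candidate_is_feasible(R, t, X, Xc):
    """Solve Xc = lambda_1 R X + lambda_2 t in the least-squares sense and
    accept the candidate only if both coefficients are positive."""
    A = np.column_stack((R @ X, t))
    lam, *_ = np.linalg.lstsq(A, Xc, rcond=None)
    return lam[0] > 0 and lam[1] > 0

def reduced_fraction(K, R, t, X, W=100, H=100, n=2000):
    """Fraction of in-image epipolar-line pixels excluded by the segment test."""
    Kinv = np.linalg.inv(K)
    lx, ly, lz = Kinv.T @ np.cross(t, R @ X)     # epipolar line in pixel coordinates
    total = kept = 0
    for u in np.linspace(0.0, W, n):             # sample the line column by column
        if abs(ly) < 1e-12:
            break                                # near-vertical line: skipped in this sketch
        v = -(lx * u + lz) / ly
        if 0.0 <= v <= H:
            total += 1
            kept += candidate_is_feasible(R, t, X, Kinv @ np.array([u, v, 1.0]))
    return 1.0 - kept / total if total else 0.0

K = np.array([[50.0, 0.0, 50.0], [0.0, 50.0, 50.0], [0.0, 0.0, 1.0]])
print(reduced_fraction(K, np.eye(3), np.array([1.0, 0.0, 0.2]),
                       np.array([0.3, -0.1, 1.0])))   # roughly 0.65 for this assumed setup
```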

VI Conclusions

This paper finds that, for a known relative pose, the epipolar line can be divided into two or three segments by the epipole and the newly defined virtual infinity point. This fact can be used to reduce the search space for feature correspondence by about 50%. The epipolar line segments can also be used to identify outliers in feature matching. Simulation and real-image results are reported to verify the analysis and the proposed method. This paper addresses epipolar line segmentation in the calibrated case; the uncalibrated case will be discussed in future work.

References

  • [1] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
  • [2] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in European Conference on Computer Vision, pp. 404–417, Springer, 2006.
  • [3] R. Hartley and A. Zisserman, Multiple view geometry in computer vision. Cambridge University Press, 2003.
  • [4] D. Nistér, “An efficient solution to the five-point relative pose problem,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 756–770, 2004.
  • [5] H. Stewenius, C. Engels, and D. Nistér, “Recent developments on direct relative orientation,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 60, no. 4, pp. 284–294, 2006.
  • [6] H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections,” Nature, vol. 293, no. 5828, pp. 133–135, 1981.
  • [7] A. Fusiello, E. Trucco, and A. Verri, “A compact algorithm for rectification of stereo pairs,” Machine Vision and Applications, vol. 12, no. 1, pp. 16–22, 2000.
  • [8] D. Oram, “Rectification for any epipolar geometry,” in BMVC, vol. 1, pp. 653–662, Citeseer, 2001.
  • [9] M. Pollefeys, R. Koch, and L. Van Gool, “A simple and efficient rectification method for general motion,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 1, pp. 496–501, IEEE, 1999.
  • [10] N. Qian, “Binocular disparity and the perception of depth,” Neuron, vol. 18, no. 3, pp. 359–368, 1997.
  • [11] F. Gonzalez and R. Perez, “Neural mechanisms underlying stereoscopic vision,” Progress in Neurobiology, vol. 55, no. 3, pp. 191–224, 1998.
  • [12] D. C. Blumenthal-Barby and P. Eisert, “High-resolution depth for binocular image-based modeling,” Computers & Graphics, vol. 39, pp. 89–100, 2014.
  • [13] R. I. Hartley, “Chirality,” International Journal of Computer Vision, vol. 26, pp. 41–61, Jan. 1998.
  • [14] T. Werner and T. Pajdla, “Cheirality in epipolar geometry,” in Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 1, pp. 548–553, IEEE, 2001.
  • [15] S. Agarwal, A. Pryhuber, R. Sinn, and R. R. Thomas, “Multiview chirality,” arXiv preprint arXiv:2003.09265, 2020.
  • [16] W. Wang and H. T. Tsui, “A SVD decomposition of essential matrix with eight solutions for the relative positions of two perspective cameras,” in Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, vol. 1, pp. 362–365, IEEE, 2000.