$\pi_{t}$ - Enhancing the Precision of Eye Tracking using Iris Feature Motion Vectors

Aayush K. Chaudhary [email protected] Rochester Institute of TechnologyRochesterNYUSA and Jeff B. Pelz [email protected] Rochester Institute of TechnologyRochesterNYUSA

Abstract.

A new high-precision eye-tracking method has been demonstrated recently by tracking the motion of iris features rather than by exploiting pupil edges. While the method provides high precision, it suffers from temporal drift, an inability to track across blinks, and loss of texture matches in the presence of motion blur. In this work, we present a new methodology ( $\pi_{t}$ ) to address these issues by optimally combining the information from both iris textures and pupil edges. With this method, we show an improvement in precision (S2S-RMS & STD) of at least 48% and 10% respectively while fixating a series of small targets and following a smoothly moving target. Further, we demonstrate the capability in the identification of microsaccades between targets separated by $0.2^{\circ}$ .

Eye tracking methodology, iris features, eye movements, video-based eye tracking, iris segmentation, fixations, smooth pursuit, microsaccades, gaze estimation.

1. Introduction

Estimation of gaze in current non-invasive, video-based eye trackers allows research involving visual perception, eye movement studies, virtual reality, and augmented reality in natural conditions (Li et al., 2008; Kassner et al., 2014a; San Agustin et al., 2010; Holmdahl, 2015; Labs, 2019). Current feature-based eye-trackers mainly estimate gaze by fitting a mathematical model (model-based or regression-based in 2D or 3D) to localized eye features such as the pupil, iris border, and corneal reflection (CR) (Tsukada et al., 2011; Kassner et al., 2014b; Dierkes et al., 2018; Li and Parkhurst, 2005; Hansen and Ji, 2009).

The spatial quality of the estimated gaze is characterized mainly by two widely used metrics; accuracy and precision. The degree to which an eye tracker can estimate the correct gaze position for a known target is referred to as its accuracy, whereas the degree of spatial and temporal dispersion of gaze during fixations is referred to as its precision (Edlund and Nichols, 2019, p. 182). Various factors such as the inability of the participant to fixate at the desired target, the video quality of an eye tracker (resolution and compression artifacts), the experimental setup (on-axis or off-axis), calibration decay and the gaze-estimation algorithms limit the precision and accuracy of these systems (Blignaut, 2019; Ehinger et al., 2019).

Among those factors, the selection of a gaze-estimation algorithm can influence the reported gaze significantly (Hansen and Ji, 2010; Villanueva et al., 2008). Current video-based eye trackers are limited by signal noise due to reliance on localizing the edges of the pupil/iris boundary. Low-level image features such as edges are intolerant to illumination changes, occlusion caused by eyelashes/eyelids, CR at the pupil boundaries, etc. They are also highly dependent on parameters such as threshold values. Further, when eye-cameras are off-axis, the objective function of fitting an ellipse on the projected 2D image of iris or pupil might produce an error as these features are not perfect ellipses (Świrski, 2015; Villanueva et al., 2008; Wang et al., 2019).Recently, convolutional neural network-based approaches (appearance-based models) have been developed (Park et al., 2018a, b; Kim et al., 2019), which focus on taking advantage of extensive training sets, learning-based optimization and generalization, preparation of synthetic data with natural features using generative adversarial networks (Kim et al., 2019; Wood et al., 2016), etc. However, the gaze estimation results are biased towards the training set and have only reached an accuracy of 2.06 ( $\pm 0.44$ ) degrees (Kim et al., 2019) for real subjects. There is still room for improvement in these approaches before its extensive use in research, especially in tasks requiring high precision.

In another approach, (Pelz and Hansen, 2017) tracked a large number of motion vectors made up of iris features across adjacent frames in order to extract the velocity of the eye over time, integrating the velocity to obtain eye position. They emphasized improving the temporal precision and accuracy of the eye-tracking system by considering a large number of iris features where the noise was addressed in the spatial domain rather than the temporal domain. (Chaudhary and Pelz, 2019) extended (Pelz and Hansen, 2017)’s approach to demonstrate a method to achieve a high precision system capable of detecting small eye movements as small as 0.2 degrees with high confidence. While (Pelz and Hansen, 2017) and (Chaudhary and Pelz, 2019) both show significant gains in precision due to the emphasis on a large number of motion vectors, neither addresses the issue of the drift inherent in a system that determines position purely by the integration of velocity over time. Small errors in approximations of the velocity induced by the use of central tendency metrics like geometric median (Pelz and Hansen, 2017; Chaudhary and Pelz, 2019) accumulate over time, resulting in temporal drift which degrades accuracy in gaze estimations (Chaudhary, 2019).

Additionally, (Pelz and Hansen, 2017) and (Chaudhary and Pelz, 2019) only used trials without blinks because the method relies solely on the integration of iris motion vectors, which are absent during blinks. If gaze position changes while the eyelid covers the iris features, gaze position based on integrated motion values are inaccurate. Motion blur can also degrade performance as relatively few matches are available.

This raises a concern regarding the overall concept of relying completely on the integration of velocity for the position signal, especially during rapid motion and pupil occlusion. One alternative approach might be to use the velocity integration method to provide only relative position information and use it in parallel with a traditional eye-position signal (e.g., P-CR or 3D gaze vector). In essence, such a system would provide high precision data during fixations and smooth pursuit, but rely on traditional methods at all other times.

The goal of this paper is to address these issues, namely erroneous drifts, handling of blinks, and motion blur. Our approach is to use traditional approaches such as (P-CR) based system with an iris-based velocity system as proposed in (Pelz and Hansen, 2017; Chaudhary and Pelz, 2019). P-CR based gaze estimation has been used extensively since the 1960s and is still used today in some commercial eye-trackers. However, precision is constrained because of reliance on pupil edges and the bright corneal reflection (Li et al., 2008). By optimally combining the P-CR position with the iris velocity (computed from textures) [“ $Pi_{t}$ ” (or “ $\pi_{t}$ ”)] we demonstrate a hybrid method with high precision without the issues caused by drift, blinks, or motion blur.

Observations made from multiple sources (e.g., pupil and iris estimates from different measurement techniques) can be fused with various state estimation techniques. Maximum likelihood and maximum posterior, the Kalman filter (Kalman, 1960), and the particle filter (Del Moral, 1996) are some of the common state estimation techniques used based on linear/non-linear dynamic measurements systems (Castanedo, 2013). The disadvantages of maximum likelihood and maximum posterior are that in order to reduce the bias of the solution, they require an empirical model of the sensor with a possibly large number of samples (Castanedo, 2013). As our data can be assumed to be from linear sources with Gaussian distributed noise (random variable derived by a large number of identically independent iris feature matches/pupil edges), modeling with the Kalman filter is a good fit as it assures the optimal estimation on this type of data (Alofi et al., 2017). The use of the particle filters is sophisticated and well-practiced for non-linear data, but to obtain small variance in the estimations requires a relatively large data sample (Alofi et al., 2017).

The Kalman filter is designed to combine uncertain information from multiple independent sources to estimate a more confident (certain) approximation (Pei et al., 2017). Our goal is to combine the information derived from the iris (I: high precision but possibly contaminated with drift, motion blur, or blinks) with the information from the pupil (P: no drift but noisy) by considering a convex combination of the two estimates with a general form $\beta\times I+(1-\beta)\times P$ , where $(0\leq\beta\leq 1)$ (Pei et al., 2017). $I$ and $P$ are derived in Section 3.3 and represent iris and pupil information, respectively, and $\beta$ is scaled with confidence in the information from the iris. Note that position information (P) is computed independently for each frame. As a result there is no accumulated drift, but the signal is noisy because of reliance on edge/boundary calculations based on a small number of features (edges) on each video frame. A position signal can also be computed by integrating the velocity signal derived from the iris features in consecutive frames. That position signal, however, suffers from drift over time because even small errors are compound over large numbers of frames.

Here, combining the information from the two signals generated from two independent methods is not straightforward. We are only interested in the position information from the pupil center, and only the velocity information from the iris features so that the noise (from the pupil signal) and the drift (from iris signal) are not considered. Our work incorporates the two signals in the different domains that are related by first-order derivative/integration, which can further be applied to any application which has a spatial drift over time. We demonstrate this strategy in the field of eye-tracking to devise a post-hoc evaluation pipeline on a chin-rest system with both high accuracy and high precision for stimuli displayed in an automated teleprompter setup.

The major contributions of this paper are as follows:

(1)

We demonstrate a new methodology to improve precision of any pupil detection technique by incorporating information from multiple features of the iris. This is one of the earliest attempts to combine regression-based approaches (P-CR) with computer-vision approaches such as iris feature matching and tracking. This paper opens an interesting area to be explored i.e. combining information from any eye-trackers (P-CR, 3D based approaches, appearance-based models) with iris-based estimates.
(2)

We present a modified Kalman filter approach that helps to estimate a reliable signal by disentangling useful pieces of information from two independent sources (one precise, but drifting signal and another accurate but noisy signal).
(3)

With the off-the-shelf components like a digital camera, synchronized displays and IRLEDs, we show sample-to-sample root mean square error (S2S-RMS) of $0.05^{\circ}$ for real human-eye.

2. Literature Review

Depending on the use of either polynomial gaze function or fitting a 3D model of the eye to predict gaze, video-based eye-tracking methodology is broadly divided into two categories; regression-based and model-based gaze estimation (Hansen and Ji, 2009). Both categories are reliant on eye features such as pupil center, pupil/iris contours, glints (CR), facial features (Chen et al., 2008), limbus, eye-corners, iris features, etc. (Tsukada et al., 2011; Li and Parkhurst, 2005; Park et al., 2018b; Pelz and Hansen, 2017; Chaudhary and Pelz, 2019; Hansen and Ji, 2009; Chen et al., 2008; Świrski et al., 2012). Regression-based gaze estimation uses a mapping function to predict 2D gaze coordinates based on the features. In contrast, model-based approaches allow gaze estimation in 3D space. Model-based approaches such as (Tsukada et al., 2011; Świrski et al., 2012; Shih and Liu, 2004; Guestrin and Eizenman, 2006; Chen and Ji, 2008) assume a spherical eye model and approximate a gaze direction based on pupil/iris contours. These models approximate the solution based on several assumptions such as neglecting perspective projection and the effects of refraction in (Shih and Liu, 2004), virtual pupil lying in the optical axis in (Chen and Ji, 2008) and neglecting corneal reflection in (Świrski et al., 2012). Moreover, recent publications (Dierkes et al., 2018, 2019) have also accounted for the corneal reflection in the 3D-model.

Recently, appearance-based models have appeared in the field of eye-tracking, which try to learn a direct mapping between the images and the gaze direction (Park et al., 2018a, b). These models have been shown to perform better than the above models in person-independent gaze estimation and unconstrained environments but are particularly biased towards the training set. Other challenges for appearance-based models are finding ways to incorporate prior knowledge in a differential manner, the need for enormous labeled data sets (Marcus, 2018), computational cost, and the difficulty of understanding what the models have actually ’learned’ (Park et al., 2018b). Even though appearance-based models were incorporated with additional regression-based or model-based methods in (Park et al., 2018a, b), the method is still not reliable for high precision tasks.

We propose a regression-based gaze estimation method that has the potential to be further modified for model-based gaze estimation. Our priority is to achieve a high-precision system and are focused on tracking features that are more reliable for that purpose. To do so, we are extending (Chaudhary and Pelz, 2019) previous work, which demonstrated a high-precision task (microsaccade detection) by integrating eye velocity computed from a large number of iris features. To address the issues existing in that method, we enhance that method by optimally adding pupil-center tracking. Pupil location has been tracked in numerous ways including the Starburst algorithm (based on the iterative selection of candidate points obtained by ray following to find best-fit) (Li and Parkhurst, 2005), an image-aware support function to fit an ellipse to the pupil edge (Świrski et al., 2012), and fitting an ellipse to connected components (Kassner et al., 2014a). In this work, we deploy the method of (Kassner et al., 2014a) to detect the pupil center, but we capture a high-resolution image of the eye as in (Chaudhary and Pelz, 2019). Our regression-based gaze estimation method is based on those signals.

3. Methods

This section formulates the problem of estimating an approximate hybrid position ( $\pi_{t}$ ) by extracting the P-CR relative eye position and iris velocity (i). The detailed block diagram is shown in Figure 1. Frames are extracted from the video sequence and fed to specific blocks for processing. In one of the blocks, images are fed to a CNN to obtain an iris mask, which is a crucial step to define the region of interest for another step whose objective is to approximate the iris velocity based on the motion distribution of feature matches in consecutive frames. The next step is the determination of the pupil center based on ellipse fitting, as proposed in (Kassner et al., 2014a). To compensate for head movements, we determine the average CR signal of multiple glints as well as approximate head movement velocity based on a fixed head mask on the subject. Based on these signals, $\pi_{t}$ is computed and demonstrated in applications such as gaze estimation, smooth pursuit analysis, and microsaccade detection.

Refer to caption — Figure 1. Basic flow of our methodology.

3.1. Pupil position

The pupil center (P) is specified as the ellipse center given by the 2D pupil detection model of the Pupil Labs software of (Kassner et al., 2014a). (Kassner et al., 2014a) initially compute an integral image from the grayscale image and find an approximation to the pupil and region of interest (ROI) for pupil detection based on response to Haar-like center-surround features (Świrski et al., 2012).

In the obtained ROI, spectral glint masks and a dark pupil mask are created based on the maximum intensity and the lowest and highest spike index. Morphological operations (opening with an elliptical structured kernel of size 9) followed by the Canny Edge detector (Canny, 1986) are then used to find the pupil edges.

In (Kassner et al., 2014a), the contours are found based on connected components of the edges. On those contours which have a minimum size of three points, the number of points that compose that line segment is reduced to find a similar curve using the OpenCV (Bradski, 2000) implementation of the Douglas-Peuker algorithm (Douglas and Peucker, 1973). The extracted contour points are then passed through a curvature test such that any three adjacent contour points must have a curvature more than 80 degrees for the curvature to be maintained.

Thus, on the intermediate contours based on continuous curvature, weak and strong contours are extracted based on parameters such as roundness ratio (ratio of minor and major axis) and user-defined radius limits of ellipse fit. Strong contours must also satisfy ellipse criteria for area and perimeter ratio of contour support, whereas weak contours need not. For those contours, either strong or weak, pruning using an augmented combinatorial search is done in order to find the solutions.

Best solutions out of these solutions are found based on the roundness ratio, the user-defined radius of ellipse fit (70, 200), and support ratio based on supporting edge length and the ellipse circumference. The best contour is obtained from the best solution, which gives us the final best edges (intersection of best contour with initial edges). The final best ellipse is based on the ellipse fit on those final edges when it satisfies all the above criteria. The center of this final best ellipse fit is considered as the pupil-center (P) in this paper. These steps are shown in Figure 2.

3.2. Iris velocity

Eye velocity is determined by tracking multiple feature points on the iris, so the first step is defining the iris ROI on the eye images. Semantic segmentation models like U-Net (Ronneberger et al., 2015) and RITnet (Chaudhary et al., 2019) can be used to extract the segmented iris mask from a given eye image. We adopt the U-Net architecture as in (Chaudhary and Pelz, 2019) with a modification during model training. Instead of reshaping the 960 $\times$ 540 to 224 $\times$ 224 as in (Chaudhary and Pelz, 2019), we initially partition each eye image to 540 $\times$ 540 before resizing to 224 $\times$ 224. This maintains the aspect ratio of the image and eliminates unnecessary pixels as the eye size $<540$ pixels. Then, we follow the same steps as in (Chaudhary and Pelz, 2019) on the segmented mask, i.e., we extract iris features from a Contrast Limited Adaptive Histogram equalized (Pizer et al., 1987) grayscale image. These features are matched in consecutive frames based on the Lowe’s ratio distance test followed by RANSAC. These matched features represent the iris feature movement vector. Final iris velocity is computed by calculating the geometric median of these movement vectors (Chaudhary and Pelz, 2019).

3.3. Problem Formulation

We have two sources of information: iris velocity and pupil position which we intend to combine with the Kalman filter, an optimal estimation technique for linear systems with Gaussian error (Anderson and Moore, 2012). A straightforward interpretation of the Kalman filter update equations is that they scale measurements from two sources by their corresponding precision matrix (the inverse of the covariance matrix) and then take the weighted sum as shown in Equation 1.

(1)

\overline{H}=\Sigma(\beta*P+(1-\beta)*I)

where $P$ , $I$ and $H$ are the pupil position, the iris position and the hybrid position respectively.

In our case, one source of measurement is the pupil position, and the other is the iris velocity (which is integrated to compute the iris position). The information in iris velocity can only provide us information about the iris position up to a constant bias value. If we use a traditional update of the Kalman filter (Equation (1)), we would be using incorrect bias induced by the integration of iris velocity, and error would still creep into our estimated solution. The estimated signal would also drift over time, primarily affected by the bias information from the iris velocity measurement. Therefore, while combining iris velocity measurement with the pupil position measurement, we must ignore any bias information obtained from the iris velocity measurement to gain the benefit in precision from the velocity measure without degrading accuracy.

Therefore, we need to make some crucial changes to the Kalman filter approach. To do so, we take the probabilistic interpretation of the Kalman filter. Using a probabilistic framework helps us combine two information sources from different domains in the same spirit of the Kalman filter. In a probabilistic interpretation, a Kalman update equation is interpreted as the posterior mean in which prior and likelihood are both linear. In this framework, we can easily integrate two measurements from different domains by defining a prior distribution which behaves like a Gaussian in a gradient domain, i.e., whose derivative is Gaussian. Using this approach, we can derive a Kalman update equation (as in (Ghimire et al., 2019)).

We can fuse the information of the pupil position ( $P$ ) with the iris velocity ( $i$ or $\textbf{D}I$ ) to obtain the hybrid position (H), where D is the spatial gradient operator. In general, if $\textbf{P}(P|H)$ is the likelihood, $\textbf{P}(H|I)$ refers to the prior probability distribution, and $\textbf{P}(H|P,I)$ refers to the posterior distribution, then the posterior probability can be computed as

\begin{split}\textbf{P}(H|P,I)&=\frac{\textbf{P}(P|H)\textbf{P}(I|H)\textbf{P}(H)}{\textbf{P}(P,I)}=\frac{\textbf{P}(P|H)\textbf{P}(H|I)}{\textbf{P}(P|I)}\\ \end{split}

where $\textbf{P}(P|I)$ is the normalization factor. So, we have $\textbf{P}(H|P,I)\propto{\textbf{P}(P|H)\textbf{P}(H|I)}$ . Here, $\textbf{P}(P|H)$ and $\textbf{P}(H|I)$ are defined as:

\textbf{P}(P|H)=\frac{\exp^{-\frac{(P-H)^{2}}{2\sigma_{P}^{2}}}}{\sqrt{2\pi\sigma_{P}^{2}}}=\frac{\exp^{-(P-H)^{T}\beta_{P}\textbf{I}(P-H)/2}}{\sqrt{2\pi\sigma_{P}^{2}}}

\begin{split}\textbf{P}(H|I)&=\frac{\exp^{-\frac{(\frac{\textbf{D}H}{\textbf{D}t}-\frac{\textbf{D}I}{\textbf{D}t})^{2}}{2\sigma_{I}^{2}}}}{\sqrt{2\pi\sigma_{I}^{2}}}=\frac{\exp^{-{({H}-I)^{T}\beta_{I}\textbf{D}^{T}\textbf{D}({H}-I)}/2}}{\sqrt{2\pi\sigma_{I}^{2}}}\end{split}

where $\beta_{I}=1/{\sigma_{I}^{2}}$ and $\beta_{P}=1/{\sigma_{P}^{2}}$ . Higher values of $\beta$ indicate lower standard deviation and more certainty.

Note that we have used the spatial gradient of the iris position. For a prior distribution, the low dimensional structure of the signal can be considered to model the prediction error (Ghimire et al., 2019). In our case, we find the hybrid signal by minimizing the mean square estimation between hybrid velocity and iris velocity and also between hybrid position and pupil position.

An important property of Gaussian distributions is that the product of two Gaussian distributions is a Gaussian distribution (Bishop, 2006, p. 638). Thus,

\textbf{P}(H|P,I)={\textbf{P}(P|H)\textbf{P}(H|I)}

(2)

=\frac{\exp^{-(P-H)^{T}\beta_{P}\textbf{I}(P-H)/2}}{\sqrt{2\pi\sigma_{P}^{2}}}\frac{\exp^{-{({H}-I)^{T}\beta_{I}\textbf{D}^{T}\textbf{D}({H}-I)}/2}}{\sqrt{2\pi\sigma_{I}^{2}}}

The equation is simplified in Appendix A. Further our term $\textbf{P}(H|P,I)$ , can be expressed as

(3)

\begin{split}\textbf{P}(H|P,I)&=K{\exp^{-1/2{(H-\overline{H})^{T}\Sigma^{-1}(H-\overline{H})}}}\\ &=K{\exp^{-1/2{(H^{T}\Sigma^{-1}H-H^{T}\Sigma^{-1}\overline{H}+...)}}}\end{split}

Comparing, Appendix A (Equation LABEL:eq:longe) and 3, we get the mean estimate ( $\overline{H}$ ) and the covariance ( $\Sigma$ ) of the hybrid position (Bishop, 2006, p. 639) as

{\Sigma}^{-1}=(\beta_{P}\textbf{I}+\beta_{I}\textbf{D}^{T}\textbf{D})

(4)

{\Sigma}=(\beta_{P}\textbf{I}+\beta_{I}\textbf{D}^{T}\textbf{D})^{-1}

\Sigma^{-1}\overline{H}=\beta_{P}\textbf{I}P+\beta_{I}\textbf{D}^{T}\textbf{D}I

(5)

\overline{H}=\Sigma(\beta_{P}\textbf{I}P+\beta_{I}\textbf{D}^{T}\textbf{D}I)

It is important to note that $\textbf{D}^{T}\textbf{D}$ is a non-invertible matrix. Therefore as the value of $\beta_{P}$ tends towards 0, the determinant of $\Sigma^{-1}$ approaches 0, as shown in Appendix B. The inverse operation cannot be obtained for the covariance matrix. So it is preferable to keep the value of $\beta_{P}$ non-zero as it is the limiting case for our approach.

With this update strategy, the bias information in the velocity measurement does not influence the solution. Hence, the solution correctly uses the rest of the information in the velocity to give a reasonable estimate of the position. This setup allows handling the drift in the estimated signal, and essential information from the iris velocity is preserved.

Additionally, the modifications made to the Kalman filter also handles latency issues (discussed further in Section 7.2) in the traditional Kalman filter-based approach. Rather than a traditional filtering approach, the formulation we use helps in estimation by combining different temporal frequency components from the pupil and iris signals. Low-temporal frequency components of the pupil are combined with the high-temporal frequency components from the iris. This combination of frequency components results in no time-lag (run-time). Note that the addition of high-frequency components is essential as this data mainly carries information about tremors and microsaccades, which are crucial for high precision tasks.

3.4. Positional Difference Compensation

The pupil position and the iris velocity in the above relations are extracted without taking into account the movements of the eye camera (headgear slippage and relative camera movement) (Li et al., 2008). To compensate for error induced by these events, we compute the P-CR vector to find the relative position of the eye to the camera. Additionally, we compute the head movement velocity to the camera (head velocity: D $H_{v}$ ) for a user-defined ROI on the forehead using a similar approach as used for computing iris velocity. Hence, in our case for the hybrid model, the gaze vector is computed when the relative position between the principal eye signal (iris or pupil) is subtracted with a compensatory movement (CR or $H_{v}$ ). So, the overall model can be computed as

(6)

\pi_{t}=Pi_{t}=\overline{h}=\Sigma(\beta_{P}\textbf{I}(P-CR)+\beta_{I}\textbf{D}^{T}\textbf{D}(I-H_{v}))

Note that for the CR, we initially segment the iris mask as in (Chaudhary and Pelz, 2019) to identify the region where the CR is most likely to be present. In the iris segmented region, we identify the bright spots in the mask with an empirically derived hard threshold pixel value of above 140 (for an 8-bit image). For our experiment with four IRLEDs, we initially find the largest bright spot. A small window (ellipse or rectangle depending on the number of points in the contour) around the largest spot is then used to find the remaining valid CRs. The LEDs are placed such that CR usually falls on the iris. However, on infrequent occasions like that seen in Figure 3, one of the CR falls on the sclera. We ignore this CR as a detection but consider all other CRs as valid. So, on those valid CRs, we find the center of the glint using moments. The centers of the detected CRs are used to approximate a circle. The center of this circle is referred to as CR in the following sections.

3.5. Blink Classification and Poor Feature Match

The hybrid model ( $\pi_{t}$ ) has weighting parameters $\beta_{p}$ and $\beta_{i}$ , related to confidence in the pupil position and iris velocity, respectively. Except during a blink, they sum to 1. The weight $\beta_{i}$ is a function of the number of detected iris feature matches; $\beta_{i}=min(0.9,0.9*(n_{\mathrm{matches}})/(min_{\mathrm{matches}}))$ , where $min_{\mathrm{matches}}$ is a user-set parameter set. The confidence in iris velocity will be degraded if fewer than $min_{\mathrm{matches}}$ feature points are computed, such as in cases of a large saccadic movement, motion blur, and significant compression artifacts. To avoid abrupt changes in $\beta_{i}$ (from 0.9 to $<$ 0.9) between two consecutive timestamps when there are only a few feature matches, we use a linear decay function that takes into account the previous/next two timestamp’s $\beta_{i}$ values.

To classify blinks in the pupil signal, we use a confidence value provided by the open-source Pupil Capture software (Kassner et al., 2014a). While pupil detection confidence decreases even in cases of a ‘partial blink’ where the portion of the pupil is occluded, a few iris feature matches still exist during the partial blink. Thus we classify blinks for pupil and iris separately and set values of $\beta_{p}$ and $\beta_{i}$ to 0 only in the case of a complete blink. In the case where pupil confidence is less than user defined confidence threshold, but we still have a few iris-feature matches, $\beta_{i}$ will be low, and the overall confidence in the position will be low.

For this paper, we set the user-defined parameters $min_{\mathrm{matches}}$ and confidence threshold to be 50 and 0.3 respectively.

3.6. Gaze Estimation

Equation (6) describes the combination of the pupil position and the iris velocity. These two components represent the calibrated gaze position and the calibrated gaze velocity for the pupil and iris, respectively, which we combine to get a hybrid gaze position. This relation is valid for both cyclopean eye (midpoint of two eyes) (Ono and Barbeito, 1982; Hering, 1977) and independent individual eyes. Independent analysis of the individual gaze of each eye supports the study of vergence eye movements, which is not possible with cyclopean gaze estimation. Many current video-based eye trackers report cyclopean gaze estimates because the precision and accuracy of current eye trackers is insufficient to estimate depth based on gaze position. The improved hybrid signal for each eye may provide adequate signal quality to estimate a useful vergence signal.

For computing the calibrated gaze position from the pupil, calibration is performed by a second-order polynomial of the instructed gaze position and its corresponding relative pupil position (Cerrolaza et al., 2012; Świrski, 2015). The same calibration routine creates problems for iris gaze position because the iris drifts over time, and additionally, due to possible pupil dilation/constriction and a significant position change of gaze during blinks. So, instead, we use a calibration scheme based on the iris velocity signal; we know the relative distance between calibration target positions, and we can extract the relative distance of iris position across the saccades during calibration, so we can compute a mapping function between them. We integrate velocity from 30 ms before to 30 ms after each saccade that brings gaze to a fixation target. With the calibrated position and velocity gaze components, we compute the hybrid gaze position in terms of visual angle.

4. Experimental Design

Table 1. Refinements in the hardware-setup compared to (Chaudhary and Pelz, 2019)

	(Chaudhary and Pelz, 2019)	Ours
Mirrorless Camera	Panasonic Lumix DMC-GH4	Panasonic Lumix DC-GH5S
Light source	Tungsten halogen source with bifurcated fiber optics	Lite-On HSDL-4261 870nm IRED.
Chin rest	Yes	Yes
Forehead rest	Yes	No (extra degree of freedom)
Stimulus	Printed Snellen Chart	Appearing Intermittently (Teleprompter)
Frame rate	96 Hz	120 Hz
Synchronization	No (finite interval used to localize eye movements)	Yes (Photo-diode setup)

4.1. Camera Setup

We tested our method with images captured with a Panasonic Lumix DC-GH5S mirrorless digital camera (modified by removing the IR-rejection filter) with a Lumix G Vario 14-140mm II ASPH (H-FSA14140) lens set to a focal length of approximately 100 mm and an aperture of F/8. Sensitivity to visible wavelengths was blocked with a DHD IR760 high-pass filter ( $T_{380-740}<1\%,\ T_{780-1200}>85\%$ ). The camera was set to an ISO value of 800 and the shutter speed to $1/400$ sec. We recorded binocular eye movements at 120 frames per second (fps) at a resolution of 1920 $\times$ 1080, with IPB compression [FHD/8bit/100M/59.94Hz] with a slow-motion effect of 1/2).

The camera was placed 50 cm from the observers’ eyes, whose position was fixed with a UHCOTech HeadSpot chin rest (with no forehead rest). The setup allowed observers to make small rotational and translating head movements without significant distance variation. Four infrared LEDs (IREDs) (HSDL-4261, 870 nm, viewing angle= 26^∘), each placed on a square of approximately 2 cm, were used to illuminate each eye uniformly at angle of 25-30 degrees above the horizontal. The IREDs were placed at a distance of approximately 9 cm from the eyes (as shown in Figure 4) such that an area of approx. 36 $cm^{2}$ of the eyes was covered to provide proper illumination even if the observer made small head movements. The total irradiance at the eye was 0.01 $W/cm^{2}$ measured with a calibrated radiometer.

Table 1 summarizes the refinements made in the hardware setup compared to (Chaudhary and Pelz, 2019).

4.2. Display target

We used a teleprompter setup (Glide Gear TMP50 20.3 X 17.8 cm) with an iPad (MR7FZLL/A) to display the stimuli. The stimuli were presented as a 30 fps video, the same temporal resolution as the iPad. To synchronize the displayed stimulus, we use a photo-diode setup, as shown in Figure 4. A region of the video that is not visible in the teleprompter screen contained a unique binary pattern of black and white patches, which transitioned each time the stimulus display changed. The binary pattern was detected by a photo-diode and Op-Amp (LM-358) circuit. Because the iPad LCD display has a faster black-to-white transition than white-to-black, we used the black-to-white transitions to mark events. To make the display sync signal available in the video record, the output of the Op-Amp drove a 940 nm IRLED in the field of view of the Lumix camera to indicate when a stimulus was presented to the observer in the video sequence.

Because iPads are raster displays operating at 30 Hz, it takes approximately 33 ms to rewrite the entire display. We derived a parametric model to find the delay based on the display position on the screen. We have accounted for this time delay in our results. The parametric model is given by

(7)

t=(21.4*x)+(4.26*y)-2.35

where $x$ and $y$ refer to horizontal and vertical stimulus position in pixels, and $t$ is the estimated delay in ms.

5. Subjects

We recorded eye movements of seven participants (four males and three females) with a mean age of 31 ( $\sigma$ =12) with normal or corrected-to-normal (two out of seven subjects) vision. Observers with a varying range of iris pigmentation were selected for the experiment. The experiment was conducted with the approval of the Institutional Review Board, and all participants provided informed consent before starting the experiment.

6. Tasks

Every observer performed a sequence of tasks. Initially, 12 calibration targets were displayed on the screen in a pseudo-random pattern to allow the maximum number of changes in horizontal and vertical directions, as shown in Figure 6. Each target consisted of concentric circles, the larger of which intiially subtended an angle of 1.0 degrees. The target first grew to a size of 1.34 degrees then decreased to a size of 0.5 degrees before disappearing after one second, as shown in Figure 7. The field of view of the calibration targets was 14.19^∘ X 9.68^∘. The calibration was followed by the tasks described in the following section.

6.1. Task 1: Calibration Verification Task

Six calibration verification targets were shown in sequence, subtending a total angle of 10.02^∘ X 3.99^∘. The delay between the disappearance of one target and the appearance of the next target was on average 31 ms ( $\sigma$ =5). The calibration verification points were different from those used during calibration.

6.1.1. Measures

We evaluated the accuracy and precision with which the methods predicted gaze on the verification targets, assuming that the observer fixated each target. Accuracy measures were based on the difference between the displayed target position and the mean reported gaze position of the stable fixation window. The fixation window was determined from a rolling 450 ms window with a minimum dispersion search after the target was displayed on the screen.

Another essential metric we considered was precision during fixations. Sample-to-sample root mean square error (S2S-RMS) and standard deviation (STD) are two widely used metrics to measure the precision of eye-trackers (Edlund and Nichols, 2019, p. 182-4). Both measures are related to the spatial variability in the signal over time, but they contain different information about an eyetracker’s behavior (Edlund and Nichols, 2019, p. 182-4), (Niehorster et al., 2020). Because S2S-RMS is calculated on temporally adjacent data points, its value relays information about the spatio-temporal aspects of a system that are absent from STD measures. S2S-RMS is also inherently sensitive to the update rate of the eye-tracker (Blignaut and Beelders, 2012).

(8)

S2S-RMS=\sqrt{\dfrac{\sum_{i=1}^{n}{(x_{i}-x_{i-1}})^{2}+(y_{i}-y_{i-1})^{2}}{n}}

(9)

STD=\sqrt{\dfrac{(\sigma_{x}^{2}+\sigma_{y}^{2})}{2}}

6.2. Task 2: Smooth Pursuit Task

To evaluate the eye-tracking methodology, we evaluated two tasks that require very high precision: microsaccade detection and smooth pursuit. For a smooth-pursuit task, observers followed a moving target on a ramp (linear-trajectory) at different velocities (mean=4.6 deg/s, $\sigma$ =1.9) with 17 random directional changes.

6.2.1. Measures

Measurement of accuracy and precision in a smooth-pursuit task is not straightforward. (Komogortsev and Karpov, 2013; Komogortsev et al., 2010) proposed a method to determine the accuracy based on how closely the smooth-pursuit signal matches the target stimulus. A quantitative smooth pursuit score based on the position and velocity was reported based on the Euclidean distance and differences in speed at every timestamp with respect to the smooth-pursuit target stimulus.

We propose a method for measuring precision during smooth pursuit using S2S-RMS and STD after ‘detrending’ the raw data. ’Detrending’ a signal subtracts the best-fit line/curve from the data (Moncrieff et al., 2004). So, a constant velocity term can be removed from the smooth-pursuit data resulting in nearly zero-velocity signal. In any smooth pursuit movement with randomly changing directions, it is observed that there is a latency of $100-130$ ms after the direction change (Lisberger et al., 1987; Lisberger and Westbrook, 1985; Fukushima et al., 2013). At that point, the eye either begins moving at approximately the correct velocity (but is lagging the target due to the latency), or the movement starts with a small saccade in the direction of the stimulus. In either case, the eye velocity is then typically similar to that of the target but lags in position. Finally, any positional offset between the eye and target is corrected by a second ‘catch-up’ saccade, as shown in Figure 8. At that point, the eye position and velocity match the stimulus.

In our proposed smooth-pursuit precision metric, we initially find the time interval where both the eye position and eye velocity are closest to the stimulus position and velocity. The eye position at this moment is referred to as the starting point. Similarly, we compute the ending point just before the stimulus changes direction. An equation describing the line joining the starting and ending gaze points is computed. The gaze signal is detrended using this signal (line), resulting in a signal with zero mean velocity and can be analyzed in the same way as a fixation signal. We can thus compute the precision (S2S-RMS and STD) for the detrended signal.

6.3. Task 3: Microsaccade Detection Task

Motivated by (Shelchkova et al., 2018) and (Chaudhary and Pelz, 2019), we evoked small, voluntary eye movements with a Snellen acuity chart by displaying a sequence of fixation targets on the teleprompter screen. Any voluntary or involuntary saccade less than 0.5 $\circ$ is considered as microsaccade in this paper following (Engbert and Kliegl, 2003; Otero-Millan et al., 2008; Troncoso et al., 2008; Shelchkova et al., 2018; Chaudhary and Pelz, 2019). There were two elements in this task; first, the observer was asked to fixate on a series of thin color bars (6 bars, each (5.2x12 arcmin)), then on six colored boxes, alternating in size between 12x12 and 5.2x12 arcmin. Each target was displaced from the previous target in the horizontal direction by 0.2 degrees (12 arcmin). The total expected number of small eye movements was ten (five each for the color bars and boxes).

6.3.1. Measures

We measured the number of microsaccades detected when the observer looked to the different color bars $/$ boxes. Horizontal eye movements detected within 100 - 500 ms of each target onset were identified as microsaccades. Note that horizontal cyclopean velocity was used for microsaccade identification. We used the method described in (Chaudhary and Pelz, 2019), where the velocity signal is filtered with a 1D total variational denoising filter (regularization value of 0.05). After denoising the velocity signal, a threshold value is determined with an adaptive algorithm by fitting two velocity distributions representing noise and microsaccade based on Gaussian mixture models. Velocities above the adaptive threshold are identified as microsaccades based on the Velocity-Threshold Identification algorithm (I-VT) (Salvucci and Goldberg, 2000). We refer the reader to (Chaudhary and Pelz, 2019) (Section: Microsaccade detection) for additional details.

7. Results

7.1. Iris segmentation

Maintaining the aspect ratio by cropping to 540 $\times$ 540 before resizing to 224 $\times$ 224 boosted the performance by 1.8% and 2.9% for the uncorrelated and correlated test data respectively as seen in Table 2.

Table 2. Note improvement in segmentation results due to change in aspect ratio

Dataset	(Chaudhary and Pelz, 2019)	Ours
Training	89.8%	92.4%
Uncorrelated test	89.1%	90.7%
Correlated test	86.6%	89.1%

7.2. Qualitative Results

The dramatic difference in noise between the traditional P-CR and $\pi_{t}$ methods can be seen in Figure 9. The top panel shows the horizontal and vertical position signals for both methods; the inset panels show the indicated segments at $4\times$ magnification. Note that the noise reduction from the $\pi_{t}$ method does not introduce a temporal lag in the signal as temporal filtering methods do (distinguishable with (*) marker in position plots). The lower panel shows the velocity signals over the same periods for both methods. The reduced noise inherent in the $\pi_{t}$ method is especially evident in the velocity signals. Note that all the results presented for traditional P-CR and $\pi_{t}$ methods are analyzed for same video at the same timestamps.

7.3. Task 1: Verification Task Performance

7.3.1. Accuracy

Figure 10 (left) shows the accuracy of $\pi_{t}$ and P-CR methods along the horizontal axis. Both plots indicate accuracy for the left, right, and cyclopean eye. Each point in the figure represents an individual participant. Figure 10 (right) shows the same data for accuracy along the vertical axis. Note that accuracy is the same for $\pi_{t}$ and P-CR based methods.

7.3.2. Precision

While the accuracy values are the same, the $\pi_{t}$ method has significantly better precision. Figure 11 represents the precision for $\pi_{t}$ and P-CR based methods measured with S2S-RMS (left panel) and standard deviation (STD) (right panel) metrics. Each figure has results for individual left and right eyes and for the combined, cyclopean eye. The median sample-to-sample root mean square (S2S-RMS) and the standard deviation value are improved by at least 55% and 23%, respectively.

7.4. Task 2: Smooth Pursuit Performance

Figure 12 represents the gaze map of eye movements during a smooth pursuit task for one participant. Note that small, high-frequency fluctuations during smooth pursuit are minimized in the $\pi_{t}$ record compared to P-CR. Figure 13 represents the precision for $\pi_{t}$ and P-CR based methods using S2S-RMS (left) and STD (right) metrics. Each figure has results for individual left and right eye and the combined cyclopean eye. Note that the reported precision is for the detrended signal, as described in Section 6.2.1. The median sample-to-sample root mean square (S2S-RMS) and the standard deviation value are improved by at least 48% and 10%, respectively.

7.5. Task 3: Microsaccade Detection Performance

As described in section 6.3, out of ten possible small microsaccades per subject, the number of microsaccades detected by the seven subjects was [10, 8, 8, 7, 7, 6, 5]. Only the first detected microsaccade event during a time interval of 100-500 ms of each target onset was counted. If no small-movements were beyond the threshold value during that time interval, then the count was zero. Note that 27% of the microsaccades are not detected, mainly because of the significant head movement and simultaneous eye movement of the person to fixate at the gaze position. Figure 14 shows a number of microsaccades detected with the $\pi_{t}$ model. The P-CR signal has too much noise to allow the microsaccades to be detected.

7.6. Simulation Test

In this section, we explore the mathematical formulation in Equation 5 using simulated data. Our hypothesis is, given any two random signals ( $A_{noise}$ and $B_{drift}$ ) which are related to each other by first-order derivative/integration; where signals $A_{noise}$ and $B_{drift}$ are influenced by spatial noise and temporal drift respectively, then the $\pi_{t}$ algorithm can be used to estimate a signal that compensates for both spatial noise and temporal drift. Note that rather than attempting to replicate eye-gaze data we are simulating random signals $A_{noise}$ and $B_{drift}$ that fulfill the stated requirements.

Signals $A_{noise}$ and $B_{drift}$ are derived from a 2 Hz square wave with an amplitude of three units sampled at 250 Hz for 2 sec followed by a 1 Hz sine wave with a peak amplitude of two units sampled at 250 Hz for the next two seconds. Signal $A_{noise}$ consists of the addition of random Gaussian noise ( $\mathcal{N}(0,0.03)$ with a sampling size of 1000) to the original signal (position-like signal; noisy). Signal $B_{drift}$ consists of the addition of random Gaussian noise ( $\mathcal{N}(0,0.01)$ ) in the spatial gradient of the original signal (velocity-like signal; temporal drift). Figure 15 shows ten such trials and its derived output based on $\pi_{t}$ .

For quantitative results, Table 3 shows the performance of our approach $\pi_{t}$ generated using these two signals ( $A_{noise}$ & $B_{drift}$ ) in terms of mean square error (MSE) and $R^{2}$ computed against the original signal for 100 trials. Note MSEo $R^{2}$ o represents computation of these metrics in the original signal domain, MSEt $R^{2}$ t represents computation with the gradient of the signal. We observe an improvement in the original domain signal as well as a gradient-domain signal for both metrics MSE, and $R^{2}$ for our $\pi_{t}$ derived signal.

Table 3. Simulation test; MSEo

R^{2}

o represents computation of these metrics in the original signal domain, MSEt

R^{2}

t represents computation with the gradient of signal. MSEt and MSEo are in

*10^{-4}

units.

	Signal $A_{noise}$	Signal $B_{drift}$	$\pi_{t}$
MSEo	9.04 $\pm$ 0.38	475.71 $\pm$ 480.30	1.51 $\pm$ 0.12
MSEt	18.12 $\pm$ 1.06	1.0 $\pm$ 0.04	0.88 $\pm$ 0.04
$R^{2}$ o	0.9998	0.9914	0.9999
$R^{2}$ t	0.9931	0.9996	0.9997

8. Discussion

The previous methods based on tracking motion of iris features offer significant improvements over existing methods, but an important limitation of these methods is the temporal drift in the estimation of gaze. The major contributing factor for the drift was the compounding of small errors in the approximation of velocity. Our work addresses this issue by considering the pupil edges (extensively used in traditional gaze tracking methodologies) as support for velocity approximation. We present a new mathematical formulation which handles drift in gaze position by considering a weighted sum in a Kalman filter framework.

Moreover, our approach is being extended further such that we will also be capable of handling cases during blinks and insufficient texture matches. During calibration, the use of positional information from $\pi_{t}$ helps to eliminate the need for integration of velocity, which was the bottleneck during blinks in (Chaudhary and Pelz, 2019; Pelz and Hansen, 2017). Our modification in $\beta_{P}$ and $\beta_{I}$ , as described in section 3.5, helps to handle cases of blinks and insufficient texture matches. Note that in cases of partial blinks, complete pupil edges are often not visible, and ellipse fits cannot be computed reliably. In such scenarios, there is still a possibility of a good number of feature matches. The proposed method allows for the study of eye movements even during the period immediately before and after a full blink, which leads to errors when relying on ellipse fits. Further, with traditional methods, there is a high possibility of ellipse fit on false edges, which is eliminated in our method as we rely on a large number of iris features.

Our mathematical formulation can further be interpreted as the superposition of contributions from separate pupil and iris estimates. In this case, we consider the signal as the combination of a high-temporal frequency component from the iris motion vectors and low-temporal frequency components from pupil position. This combination helps to maintain the accurate low-frequency position signal while supporting high precision tasks based on the high-frequency component from the stable iris motion vectors. This formulation eliminates reliance on high-frequency components from the pupil position, which are noisy and interfere in the study of eye movements such as smooth pursuit and microsaccades.

Our method of calibration of the two components separately is critical. This is because the iris position drifts over time, and we need the same scaling factor for the combination. A mismatch in the scaling factor creates undershoot or overshoot during saccades as the mathematical formulation is the weighted sum of two components when the low-frequency component is significant. Thus, we propose a new iris calibration technique based solely on the relative change in iris velocity with the change in calibration targets.

The hybrid ( $\pi_{t}$ ) approach shows a comparable accuracy with the P-CR based approaches, suggesting that drift is not prevalent in our method even when iris information is incorporated. The simulation results in Table 3 show a low value of MSEo and MSEt in $\pi_{t}$ compared to original individual signals $A_{noise}$ and $B_{drift}$ , suggesting that this work can be applied in any domain where noise is prevalent in one signal and temporal drift in another. The small values in MSEo verify our contribution in handling temporal drift existing in $\pi_{t}$ based methods.

The advantage of our ( $\pi_{t}$ ) method is the improvement in the precision values without affecting the accuracy in the system. An important performance metric for eye-tracking systems is S2S-RMS, as it is highly influenced by the frame rate and temporal variation. Our method also shows an improvement in median value of S2S-RMS with at least a 55% reduction (0.132 ${}^{\circ}\rightarrow$ 0.042 ^∘, 0.145 ${}^{\circ}\rightarrow$ 0.052^∘, and 0.099 ${}^{\circ}\rightarrow$ 0.044 ^∘ for left, right and cyclopean eyes) in the verification task for the same video sequences. Further, we have shown an improvement of 23% in STD, which can be thought of as a combination of both eye movement and eye-tracking methodology noise.

Additionally, we show an improvement of at least 48% in S2S-RMS and 10% in STD for smooth pursuit tasks, demonstrating the value of the method for studies of smooth pursuit. We also highlight an essential contribution of (Chaudhary and Pelz, 2019)’s previous work in detecting eye movements as small as 0.2^∘ and also verifies that this formulation does not deteriorate the eye gaze signal quality, as all the missing detection were because of simultaneous head and eye movements to fixate at the target. Lowering the regularization value (section 6.3) helps in detection of few of these movements but it increases the number of false alarms. A relevant field of application to this strategy would be the study of vergence eye movement as we have boosted the ability of eye trackers to study movements with small changes.

Our study improves the precision of current video-based eye trackers by relying on multiple features of the human eye. The method is not without limitations, of course. First, it requires multiple calibration points in order, which include a series of changes in X-Y direction (refer to Figure 6). It will require an extra effort for calibration design, though the present study does not answer the minimum calibration point changes required for proper iris calibration; we used 12 changes, and addressing the minimum number is recognized as future work. Second, the performance of our method falls in the case of blurred images as we rely on iris features for the velocity signal and on pupil edges for pupil position estimates. Lack of a sharp eye image is also a problem for many current video-based systems as well, but this can be solved with a higher-resolution camera and proper exposure time. Third, finding the precision matrix in Equation 4 is a time-consuming step that increases as the time series increase. For our short videos, it was not an impediment, but for general use using long series, we can incorporate a precision matrix using techniques like LU-decomposition (Banachiewicz, 1938) or use small windowing blocks to speed processing. Finally, compression artifacts are still a problem in our method as mentioned in (Chaudhary and Pelz, 2019)

In summary, the main contributions of this paper are handling of spatial drift, a novel strategy that incorporates the two signals in different domains that are related by first-order derivative/integration (one noisy the other with temporal drift), a new calibration routine for iris calibration, a way to compute precision in smooth pursuit signals by signal detrending, and most importantly, a method for estimating gaze that allows for significant improvement of precision compared to current video-based eye-tracking methodology. At the same time, this method is also useful in tasks requiring high precision, such as the study of microsaccades, smooth pursuit, and even vergence eye movements.

This methodology can be used with any eye tracker (P-CR, 3D based approaches, appearance-based models) that estimate the head compensated gaze position to improve its precision. It also gives a confidence value for each signal component, supporting further study and error analysis.

9. Acknowledgements

The authors want to acknowledge the contribution of our high school intern, Brian Cowburn, for helping hardware setup for synchronizing iPad display and the IRLED with a photo-diode setup.

References

(1)
Alofi et al. (2017) Afnan Alofi, Anwaar Alghamdi, Razan Alahmadi, Najla Aljuaid, and M Hemalatha. 2017. A Review of Data Fusion Techniques. International Journal of Computer Applications 167, 7 (2017).
Anderson and Moore (2012) Brian DO Anderson and John B Moore. 2012. Optimal filtering. Courier Corporation.
Banachiewicz (1938) Th Banachiewicz. 1938. Méthode de résolution numérique des équations linéaires, du calcul des déterminants et des inverses, et de réduction des formes quadratiques. Bull. Intern. Acad. Polon. Sci. A (1938), 393–401.
Bishop (2006) Christopher M Bishop. 2006. Pattern recognition and machine learning. springer.
Blignaut (2019) Pieter Blignaut. 2019. A cost function to determine the optimum filter and parameters for stabilizing gaze data. (2019).
Blignaut and Beelders (2012) Pieter Blignaut and Tanya Beelders. 2012. The precision of eye-trackers: a case for a new measure. In Proceedings of the symposium on eye tracking research and applications. 289–292.
Bradski (2000) Gary Bradski. 2000. The opencv library. Dr Dobb’s J. Software Tools 25 (2000), 120–125.
Canny (1986) John Canny. 1986. A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelligence 6 (1986), 679–698.
Castanedo (2013) Federico Castanedo. 2013. A review of data fusion techniques. The Scientific World Journal 2013 (2013).
Cerrolaza et al. (2012) Juan J Cerrolaza, Arantxa Villanueva, and Rafael Cabeza. 2012. Study of polynomial mapping functions in video-oculography eye trackers. ACM Transactions on Computer-Human Interaction (TOCHI) 19, 2 (2012), 1–25.
Chaudhary and Pelz (2019) Aayush Chaudhary and Jeff Pelz. 2019. Motion tracking of iris features to detect small eye movements. Journal of Eye Movement Research 12, 6 (Apr. 2019). https://doi.org/10.16910/jemr.12.6.4
Chaudhary (2019) Aayush K Chaudhary. 2019. Motion tracking of iris features for eye tracking. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications. 1–3.
Chaudhary et al. (2019) A. K. Chaudhary, R. Kothari, M. Acharya, S. Dangi, N. Nair, R. Bailey, C. Kanan, G. Diaz, and J. B. Pelz. 2019. RITnet: Real-time Semantic Segmentation of the Eye for Gaze Tracking. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). 3698–3702. https://doi.org/10.1109/ICCVW.2019.00568
Chen and Ji (2008) Jixu Chen and Qiang Ji. 2008. 3d gaze estimation with a single camera without ir illumination. In 2008 19th International Conference on Pattern Recognition. IEEE, 1–4.
Chen et al. (2008) Jixu Chen, Yan Tong, Wayne Gray, and Qiang Ji. 2008. A robust 3D eye gaze tracking system using noise reduction. In Proceedings of the 2008 symposium on Eye tracking research & applications. ACM, 189–196.
Del Moral (1996) Pierre Del Moral. 1996. Non-linear filtering: interacting particle resolution. Markov processes and related fields 2, 4 (1996), 555–581.
Dierkes et al. (2018) Kai Dierkes, Moritz Kassner, and Andreas Bulling. 2018. A novel approach to single camera, glint-free 3D eye model fitting including corneal refraction. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications. ACM, 9.
Dierkes et al. (2019) Kai Dierkes, Moritz Kassner, and Andreas Bulling. 2019. A fast approach to refraction-aware eye-model fitting and gaze prediction. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications. ACM, 23.
Douglas and Peucker (1973) David H Douglas and Thomas K Peucker. 1973. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: the international journal for geographic information and geovisualization 10, 2 (1973), 112–122.
Edlund and Nichols (2019) John E Edlund and Austin Lee Nichols. 2019. Advanced Research Methods for the Social and Behavioral Sciences. Cambridge University Press. 182–190 pages.
Ehinger et al. (2019) Benedikt V Ehinger, Katharina Gross, Inga Ibs, and Peter Koenig. 2019. A new comprehensive Eye-Tracking Test Battery concurrently evaluating the Pupil Labs Glasses and the EyeLink 1000. BioRxiv (2019), 536243.
Engbert and Kliegl (2003) Ralf Engbert and Reinhold Kliegl. 2003. Microsaccades uncover the orientation of covert attention. Vision research 43, 9 (2003), 1035–1045.
Fukushima et al. (2013) Kikuro Fukushima, Junko Fukushima, Tateo Warabi, and Graham R Barnes. 2013. Cognitive processes involved in smooth pursuit eye movements: behavioral evidence, neural substrate and clinical correlation. Frontiers in systems neuroscience 7 (2013), 4.
Ghimire et al. (2019) Sandesh Ghimire, John L Sapp, B Milan Horacek, and Linwei Wang. 2019. Noninvasive Reconstruction of Transmural Transmembrane Potential with Simultaneous Estimation of Prior Model Error. IEEE transactions on medical imaging (2019).
Guestrin and Eizenman (2006) Elias Daniel Guestrin and Moshe Eizenman. 2006. General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Transactions on biomedical engineering 53, 6 (2006), 1124–1133.
Hansen and Ji (2009) Dan Witzner Hansen and Qiang Ji. 2009. In the eye of the beholder: A survey of models for eyes and gaze. IEEE transactions on pattern analysis and machine intelligence 32, 3 (2009), 478–500.
Hansen and Ji (2010) Dan Witzner Hansen and Qiang Ji. 2010. In the eye of the beholder: A survey of models for eyes and gaze. IEEE transactions on pattern analysis and machine intelligence 32, 3 (2010), 478–500.
Hering (1977) Ewald Hering. 1977. The theory of binocular vision. Plenum Publishing Corporation.
Holmdahl (2015) Todd Holmdahl. 2015. BUILD 2015: A closer look at the Microsoft HoloLens hardware. Microsoft Devices Blog 30 (2015).
Kalman (1960) Rudolph Emil Kalman. 1960. A new approach to linear filtering and prediction problems. Journal of basic Engineering 82, 1 (1960), 35–45.
Kassner et al. (2014a) Moritz Kassner, William Patera, and Andreas Bulling. 2014a. Pupil: An Open Source Platform for Pervasive Eye Tracking and Mobile Gaze-based Interaction. In Adjunct Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Seattle, Washington) (UbiComp ’14 Adjunct). ACM, New York, NY, USA, 1151–1160. https://doi.org/10.1145/2638728.2641695
Kassner et al. (2014b) Moritz Kassner, William Patera, and Andreas Bulling. 2014b. Pupil: an open source platform for pervasive eye tracking and mobile gaze-based interaction. In Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing: Adjunct publication. ACM, 1151–1160.
Kim et al. (2019) Joohwan Kim, Michael Stengel, Alexander Majercik, Shalini De Mello, David Dunn, Samuli Laine, Morgan McGuire, and David Luebke. 2019. NVGaze: An anatomically-informed dataset for low-latency, near-eye gaze estimation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 550.
Komogortsev et al. (2010) Oleg V Komogortsev, Denise V Gobert, Sampath Jayarathna, Do Hyong Koh, and Sandeep M Gowda. 2010. Standardization of automated analyses of oculomotor fixation and saccadic behaviors. IEEE Transactions on Biomedical Engineering 57, 11 (2010), 2635–2645.
Komogortsev and Karpov (2013) Oleg V Komogortsev and Alex Karpov. 2013. Automated classification and scoring of smooth pursuit eye movements in the presence of fixations and saccades. Behavior research methods 45, 1 (2013), 203–215.
Labs (2019) Pupil Labs. 2019. Invisible Eye Tracker. (2019). https://pupil-labs.com/products/invisible/
Li and Parkhurst (2005) Dongheng Li and Derrick J Parkhurst. 2005. Starburst: A robust algorithm for video-based eye tracking. Elselvier Science 6 (2005).
Li et al. (2008) Feng Li, Susan Munn, and Jeff Pelz. 2008. A model-based approach to video-based eye tracking. Journal of Modern Optics 55, 4-5 (2008), 503–531.
Lisberger and Westbrook (1985) SG Lisberger and LE Westbrook. 1985. Properties of visual inputs that initiate horizontal smooth pursuit eye movements in monkeys. Journal of Neuroscience 5, 6 (1985), 1662–1673.
Lisberger et al. (1987) Stephen G Lisberger, EJ Morris, and Lawrence Tychsen. 1987. Visual motion processing and sensory-motor integration for smooth pursuit eye movements. Annual review of neuroscience 10, 1 (1987), 97–129.
Marcus (2018) Gary Marcus. 2018. Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631 (2018).
Moncrieff et al. (2004) John Moncrieff, Robert Clement, John Finnigan, and Tilden Meyers. 2004. Averaging, detrending, and filtering of eddy covariance time series. In Handbook of micrometeorology. Springer, 7–31.
Niehorster et al. (2020) Diederick C Niehorster, Raimondas Zemblys, Tanya Beelders, and Kenneth Holmqvist. 2020. Characterizing gaze position signals and synthesizing noise during fixations in eye-tracking data. Behavior Research Methods (2020).
Ono and Barbeito (1982) Hiroshi Ono and Raphael Barbeito. 1982. The cyclopean eye vs. the sighting-dominant eye as the center of visual direction. Perception & Psychophysics 32, 3 (1982), 201–210.
Otero-Millan et al. (2008) Jorge Otero-Millan, Xoana G Troncoso, Stephen L Macknik, Ignacio Serrano-Pedraza, and Susana Martinez-Conde. 2008. Saccades and microsaccades during visual fixation, exploration, and search: foundations for a common saccadic generator. Journal of vision 8, 14 (2008), 21–21.
Park et al. (2018a) Seonwook Park, Adrian Spurr, and Otmar Hilliges. 2018a. Deep pictorial gaze estimation. In Proceedings of the European Conference on Computer Vision (ECCV). 721–738.
Park et al. (2018b) Seonwook Park, Xucong Zhang, Andreas Bulling, and Otmar Hilliges. 2018b. Learning to find eye region landmarks for remote gaze estimation in unconstrained settings. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications. ACM, 21.
Pei et al. (2017) Yan Pei, Swarnendu Biswas, Donald S Fussell, and Keshav Pingali. 2017. An elementary introduction to kalman filtering. arXiv preprint arXiv:1710.04055 (2017).
Pelz and Hansen (2017) Jeff B Pelz and Dan Witzner Hansen. 2017. System and method for eye tracking. International Patent Application No. PCT/US2017/034756 (2017).
Pizer et al. (1987) Stephen M Pizer, E Philip Amburn, John D Austin, Robert Cromartie, Ari Geselowitz, Trey Greer, Bart ter Haar Romeny, John B Zimmerman, and Karel Zuiderveld. 1987. Adaptive histogram equalization and its variations. Computer vision, graphics, and image processing 39, 3 (1987), 355–368.
Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention. Springer, 234–241.
Salvucci and Goldberg (2000) Dario D Salvucci and Joseph H Goldberg. 2000. Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the 2000 symposium on Eye tracking research & applications. 71–78.
San Agustin et al. (2010) Javier San Agustin, Henrik Skovsgaard, Emilie Mollenbach, Maria Barret, Martin Tall, Dan Witzner Hansen, and John Paulin Hansen. 2010. Evaluation of a low-cost open-source gaze tracker. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications. 77–80.
Shelchkova et al. (2018) Natalya Shelchkova, Michele Rucci, and Martina Poletti. 2018. Perceptual enhancements during microsaccade preparation. Journal of Vision 18, 10 (2018), 1278–1278.
Shih and Liu (2004) Sheng-Wen Shih and Jin Liu. 2004. A novel approach to 3-D gaze tracking using stereo cameras. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34, 1 (2004), 234–245.
Świrski (2015) Lech Świrski. 2015. Gaze estimation on glasses-based stereoscopic displays. (2015).
Świrski et al. (2012) Lech Świrski, Andreas Bulling, and Neil Dodgson. 2012. Robust real-time pupil tracking in highly off-axis images. In Proceedings of the Symposium on Eye Tracking Research and Applications. ACM, 173–176.
Troncoso et al. (2008) Xoana G Troncoso, Stephen L Macknik, Jorge Otero-Millan, and Susana Martinez-Conde. 2008. Microsaccades drive illusory motion in the Enigma illusion. Proceedings of the National Academy of Sciences 105, 41 (2008), 16033–16038.
Tsukada et al. (2011) Akihiro Tsukada, Motoki Shino, Michael Devyver, and Takeo Kanade. 2011. Illumination-free gaze estimation method for first-person vision wearable device. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2084–2091.
Villanueva et al. (2008) Arantxa Villanueva, Rafael Cabeza, et al. 2008. Evaluation of corneal refraction in a model of a gaze tracking system. IEEE Transactions on Biomedical Engineering 55, 12 (2008), 2812–2822.
Wang et al. (2019) Xi Wang, Albert Chern, and Marc Alexa. 2019. Center of circle after perspective transformation. arXiv preprint arXiv:1902.04541 (2019).
Wood et al. (2016) Erroll Wood, Tadas Baltrušaitis, Louis-Philippe Morency, Peter Robinson, and Andreas Bulling. 2016. Learning an appearance-based gaze estimator from one million synthesised images. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications. ACM, 131–138.

πt\pi_{t} - Enhancing the Precision of Eye Tracking using Iris Feature Motion Vectors