Bilevel Inverse Problems in Neuromorphic Imaging
Abstract.
Event (or neuromorphic) cameras are novel, biologically inspired sensors that asynchronously record data based on the change in light intensity at each pixel. They have a temporal resolution on the order of microseconds. This is useful for scenes with fast-moving objects, which cause motion blur in traditional cameras that synchronously record the average light intensity over an exposure time at each pixel. This paper presents a bilevel inverse problem framework for neuromorphic imaging. Existence of a solution to the inverse problem is established. Second order sufficient conditions are derived for this nonconvex problem in special situations. A second order Newton type solver is derived to solve the problem. The efficacy of the approach is shown on several examples.
Key words and phrases:
Bilevel optimization, neuromorphic imaging, existence of solution, second order sufficient conditions
2010 Mathematics Subject Classification:
65K05, 90C26, 90C46, 49J20
1. Introduction
Event (neuromorphic) cameras are novel, biologically inspired sensors that record data based on the change in light intensity at each pixel asynchronously [9]. If the change in light intensity at a given pixel is greater than a preset threshold, then an event is recorded at that pixel. For this reason, if there is no change to the scene, be that movement or brightening/dimming of a light source, no events are recorded. In contrast, if a scene is dynamic, due to camera movement or the movement of an object in the scene, then each pixel of the event camera records intensity changes with a temporal resolution on the order of microseconds [9]. This logging of events results in a non-redundant stream of events through the time dimension for each pixel [16]. The stream of data is exceptionally useful for scenes with fast-moving objects that can cause motion blur in traditional cameras, which record the average light intensity over an exposure time for each pixel synchronously [13]. Figure 1 gives an illustration of the differences between traditional and event cameras.

The prior work [13] proposed the Event-based Double Integral (EDI) and multiple Event-based Double Integral (mEDI) algorithms to address motion blur for the underlying inverse problem. The EDI model utilizes a single blurry standard camera image, along with associated event data, to generate one or more clear images through the process of energy minimization. The mEDI model, an extension of the EDI model, utilizes multiple images in conjunction with event data to produce unblurred images. The EDI and mEDI models in turn extend the work of [16], where a continuous-time model was initially introduced. While these algorithms represent a leap forward over earlier work, questions remain about the validity of the models in various scenarios (see below), about the situations in which solutions exist, and about which optimization procedures are appropriate for finding such solutions. In this paper we provide a mathematical foundation for, and further extensions of, the previous methods by generalizing the model within a bilevel inverse problem framework. Indeed, as shown below, the EDI and mEDI models correspond to inverse problems in image deblurring [3, 4]. In particular,
- (a) We introduce a bilevel optimization problem that allows for simultaneous optimization over a desired number of frames. We also illustrate that this optimization problem is indeed an inverse problem; therefore, we use the terms 'optimization' and 'inverse' interchangeably.
- (b) We establish existence of a solution to this problem. Under certain conditions, we derive second order sufficient conditions and prove local uniqueness of solutions to these nonconvex problems.
- (c) We develop a fully implementable framework based on second order methods. The benefits of the proposed approach are illustrated using multiple numerical examples.
For completeness, we emphasize that bilevel approaches for learning hyperparameters in traditional imaging are not new, see for instance [2, 8]. However, the problem considered in this paper is naturally bilevel, without any need to search for hyperparameters.
Outline: The remainder of the paper is organized as follows. Section 2 focuses on notation and preliminary results. In Section 2.1, we discuss the basics of event based cameras. Section 2.3 focuses on the existing models from [13], which serve as a foundation for the proposed bilevel optimization based variational model in Section 3. Existence of a solution to the proposed optimization problem is shown in Theorem 3.1. Moreover, local convexity of our reduced functional is shown in Theorem 3.5. Section 4 first focuses on how to prepare the data for the algorithm. It then discusses implementation details. The problem itself is solved using a second order Newton based method. Subsequently, several illustrative numerical examples are provided in Section 5.
2. Notation and Preliminaries
Let $n$ represent the number of images and let $m_1 \times m_2$ denote the number of pixels per image. Following [11], we use a calligraphic letter such as $\mathcal{E} \in \mathbb{R}^{m_1 \times m_2 \times n_t}$ to denote a tensor, where $n_t$ is the number of time instances stored. We use the vector

$\mathcal{E}_{xy:} \in \mathbb{R}^{n_t}$

to represent the pixel value of $\mathcal{E}$ at $(x,y)$ for all times. Moreover, we use the matrix

$\mathcal{E}_{::t} \in \mathbb{R}^{m_1 \times m_2}$

to represent an image of size $m_1 \times m_2$ at a fixed time instance $t$. We use the scalar $\mathcal{E}_{xyt}$ to denote the value of $\mathcal{E}$ at a pixel location $(x,y)$ and time $t$. Graphical representations are given in Figure 2. Finally, given quantities $u \in \mathbb{R}^k$ and $v \in \mathbb{R}^k$ for some $k \in \mathbb{N}$, we define the componentwise operation $u \odot v$ as:

$(u \odot v)_j := u_j\, v_j, \qquad j = 1, \dots, k.$
The remainder of the section is organized as follows. First, in Section 2.1, we discuss the basic working principle behind neuromorphic cameras. The main ideas behind the algorithms presented in [13] are described in Section 2.3. We briefly discuss some potential limitations of this approach, which motivate our algorithm in Section 3.
2.1. Neuromorphic (Event Based) Cameras
Neuromorphic Camera Basics
Neuromorphic cameras are composed of independent pixels that detect light intensity changes in the environment as they occur. Light intensity is sampled, and an event is logged whenever the change in intensity goes beyond a preset, hardware defined threshold $c$. We denote the instantaneous light intensity at pixel location $(x,y)$ at time $t$ by $L(x,y,t)$. When the change in light intensity detected by the camera exceeds the threshold $c$ for a given pixel located at $(x,y)$, at a given time $t$, an event is logged. Then the reference intensity, $L_{\mathrm{ref}}(x,y)$, for the pixel located at $(x,y)$ is updated. Due to the high rate of sampling and the independent nature of the camera's pixels, neuromorphic cameras are less susceptible to exposure issues, as well as image blurring. Figure 1 illustrates the difference in the data recorded between traditional and event based cameras.
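To make the logging rule concrete, the following minimal Python sketch simulates this per-pixel event generation. It assumes, for illustration only, that intensities are compared on a log scale and that the reference is reset whenever an event fires; all function and variable names here are ours, not part of any camera API.

```python
import numpy as np

def simulate_events(frames, times, c=0.2):
    """Per-pixel event logging: emit an event of polarity +/-1 whenever the
    log-intensity drifts from the stored reference by at least the threshold
    c, then reset the reference at that pixel.

    frames : (T, H, W) array of instantaneous intensities (synthetic input)
    times  : (T,) array of sample times
    returns: list of events (t, x, y, p)
    """
    log_frames = np.log(frames + 1e-12)    # event pixels operate on log intensity
    ref = log_frames[0].copy()             # per-pixel reference intensity
    events = []
    for k in range(1, len(times)):
        diff = log_frames[k] - ref
        fired = np.abs(diff) >= c
        for x, y in zip(*np.nonzero(fired)):
            events.append((times[k], int(x), int(y), int(np.sign(diff[x, y]))))
        ref[fired] = log_frames[k][fired]  # update reference only where events fired
    return events
```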
Data representation
The output of an event based camera over some time interval $\Gamma$ is a stream of events of the form $\{(t_j, x_j, y_j, p_j)\}_{j=1}^{n_e}$, with the following definitions:
- $t_j \in \Gamma$, where $\Gamma$ represents the exposure interval, each $t_j$ represents a discrete time at which an event occurred, and $n_e$ represents the total number of events recorded during the interval $\Gamma$. When discussing methods that involve multiple images, we will denote the exposure interval for the $i$-th image as $\Gamma^i$ and the set of events associated to each image as $\{(t_j, x_j, y_j, p_j)\}_{j=1}^{n_e^i}$ with $t_j \in \Gamma^i$, where $n_e^i$ represents the total number of events associated to image $i$.
- $(x_j, y_j)$ represent the pixel coordinates of the $j$-th event.
- $p_j \in \{-1, +1\}$ is the polarity of the event. An increase in light intensity beyond the threshold $c$ is regarded as a polarity of $+1$ and a decrease of intensity as a polarity of $-1$. No event is logged if the intensity change remains below the threshold.
We can reformulate these events into a datacube with the following definition:
(2.1)   $\mathcal{E}(x,y,t) := \begin{cases} p_j, & \text{if an event } (t_j, x_j, y_j, p_j) \text{ occurs with } (x_j, y_j) = (x,y) \text{ and } t_j = t, \\ 0, & \text{otherwise}, \end{cases}$

which results in a sparse 3-dimensional matrix with entries in $\{-1, 0, 1\}$. We refer to Figure 5 for 2D and 3D examples of the datacube. Representing the data in this way allows us to track intensity changes over time on a per pixel basis.
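For illustration, the following Python sketch assembles a (dense, for simplicity) datacube from an event list on a user-supplied time grid. In (2.1) each event time has its own slice; binning onto a coarser grid, as done here, anticipates the time compression described in Section 4. The names are ours.

```python
import numpy as np

def build_datacube(events, H, W, t_grid):
    """Assemble the datacube of (2.1): entry (x, y, k) holds the summed
    polarities of events at pixel (x, y) whose timestamps fall in time bin k.
    `events` is an iterable of (t, x, y, p); `t_grid` holds the bin edges."""
    cube = np.zeros((H, W, len(t_grid) - 1))
    for t, x, y, p in events:
        k = np.searchsorted(t_grid, t, side='right') - 1  # locate the time bin
        if 0 <= k < cube.shape[2]:
            cube[x, y, k] += p
    return cube
```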
2.2. Inverse problems in Image Deblurring
Image deblurring is a classic example of an inverse problem, where the goal is to reconstruct the unknown original image from a degraded, blurred observation. In image deblurring, the degradation process can be modeled as an interaction between the original image and a blur kernel, which represents the blur caused by various factors such as movement [19, 20].
In the context of image deblurring, the field can be divided into two categories, blind and non-blind, based on the amount of information known about the blur kernel [1, 15, 17, 18, 19]. In the non-blind case, the blur kernel is known, which leads to a simpler, albeit still ill-posed, inverse problem [17, 18]. In the blind case, no information about the blur kernel is available, making the deblurring problem more challenging, as the blur kernel must be estimated from the degraded image in addition to the original image [15, 19]. The semi-blind case is a particular instance of the blind image deblurring problem in which some, but not all, information about the blur kernel is available [5, 12, 14]. The problems under consideration in this paper are of semi-blind type.
2.3. Existing Model, Algorithm, and Limitations
The approach discussed in this paper is motivated by [13]. The article [13] seeks to find latent unblurred images using the Event-based Double Integral (EDI) and multiple Event-based Double Integral (mEDI) models. We briefly discuss these next. This will be followed by our new proposed models.
EDI Model
In [13], the discrete events outlined in Section 2.1 are used to generate a continuous function, $e(x,y,\cdot)$, for each pixel location. This is done by generating a series of continuous unit bump functions $b(\cdot - t_j)$ centered at each event time $t_j$. Then we can define $e$ as:

(2.2)   $e(x,y,t) := \sum_{t_j \in \Gamma} p_j\, b(t - t_j).$

Figure 3 graphically illustrates the construction of this function for a single pixel from a string of events: the function $e$ is overlaid onto a stem plot of the events.

From (2.2), we define the 'sum of events' from a time of interest $f$ to an arbitrary time $t$ as:

(2.3)   $E(x,y,t) := \int_f^t e(x,y,s)\, ds,$

where $E(x,y,\cdot)$ corresponds to one pixel and the collection over all $(x,y)$ consists of all the pixels. We note that the times of interest $f$ are manually chosen and represent the times at which we wish to generate an image. In practice we will set each $f$ to correspond to the timestamp of a standard camera image we are using in the model. For convenience we will omit $f$ from the notation for $E$ unless needed for clarity. Introducing a scalar variable $z$, which plays the role of the threshold $c$, gives us the EDI model:

(2.4)   $B^i(x,y) = \frac{L^i(x,y)}{T^i} \int_{\Gamma^i} \exp\!\big(z\, E(x,y,t)\big)\, dt,$

where $B^i$ is a standard camera image. Furthermore, $L^i(x,y)$ is the pixel value of the unblurred image at location $(x,y)$, and $T^i = |\Gamma^i|$ is the exposure time for the image. We note that the superscripts in $B^i$ and $L^i$ are simply placeholders for notation, since the EDI model only considers a single image at a time. Taking the log and rearranging terms of (2.4) leads to:
(2.5)   $b^i(x,y) = l^i(x,y) + g^i(x,y,z),$

where

(2.6)   $g^i(x,y,z) := \log\Big( \frac{1}{T^i} \int_{\Gamma^i} \exp\!\big(z\, E(x,y,t)\big)\, dt \Big), \qquad b^i := \log(B^i), \qquad l^i := \log(L^i).$

The formulation in (2.5) gives us the correlation between the EDI model and other semi-blind inverse problems in image deblurring. Here $b^i$ is our blurry image, $l^i$ is our latent unblurred image, and $g^i$ is our kernel function. As shown in [1, 3, 4, 6, 7], this formulation is common to many inverse problems. We note that the EDI/mEDI models fall into the category of semi-blind inverse problems mentioned in Section 2.2, since the kernel, $g^i$, is known up to the parameter $z$.
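A minimal sketch of the resulting EDI reconstruction, under our reading of (2.4): given the blurry image, samples of the event sum over the exposure, and a candidate threshold $z$, the latent image is recovered by dividing out the exponential event integral. The names and the uniform sampling scheme are assumptions.

```python
import numpy as np

def edi_reconstruct(B, E_samples, dt, z):
    """EDI reconstruction implied by (2.4):
        L(x, y) = B(x, y) * T / integral_Gamma exp(z * E(x, y, t)) dt.

    B         : (H, W) blurry image
    E_samples : (H, W, K) samples of E(x, y, t) on a uniform exposure grid
    dt        : grid step; z : scalar threshold parameter
    """
    T = dt * (E_samples.shape[2] - 1)   # exposure time spanned by the grid
    vals = np.exp(z * E_samples)
    # composite trapezoid rule along the time axis
    integral = dt * (vals.sum(axis=2) - 0.5 * (vals[..., 0] + vals[..., -1]))
    return B * T / integral
```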
mEDI Model
If we consider two latent images $L^i$ and $L^{i+1}$, we know that the event data give us the per-pixel changes that have occurred between the two images. We can mathematically describe this change as $a^i(x,y) := \int_{f^i}^{f^{i+1}} e(x,y,s)\, ds$, where $e$ is as given in (2.2) and $f^i, f^{i+1}$ are user chosen times of interest. Incorporating the threshold variable $z$ allows us to formulate the following equations:

(2.7)   $l^{i+1}(x,y) - l^i(x,y) = z\, a^i(x,y), \qquad i = 1, \dots, n-1.$

Combining (2.5) and (2.7) gives us the following system of equations (per pixel, with $l := (l^1, \dots, l^n)^\top$):

(2.8)   $\underbrace{\begin{bmatrix} I \\ D \end{bmatrix}}_{=:A} l = \underbrace{\begin{bmatrix} b - g(z) \\ z\, a \end{bmatrix}}_{=:w(z)}, \qquad D := \begin{bmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix},$

where $b := (b^1, \dots, b^n)^\top$, $g(z) := (g^1(z), \dots, g^n(z))^\top$, and $a := (a^1, \dots, a^{n-1})^\top$. Notice that (2.8) gives us the mEDI model described in [13]. The size of the upper submatrix $I$ in (2.8) is $n \times n$ and the lower submatrix $D$ is of size $(n-1) \times n$.
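For a single pixel, the stacked system (2.8) can be assembled as in the following sketch, which follows the block structure reconstructed above (identity rows from (2.5), difference rows from (2.7)); the exact arrangement in [13] may differ, and the names are ours.

```python
import numpy as np

def medi_system(b, g, a, z):
    """Stacked mEDI system A l = w(z) of (2.8) for one pixel.
    Top block (n x n):       l^i           = b^i - g^i(z)   from (2.5)
    Bottom block (n-1 x n):  l^{i+1} - l^i = z * a^i        from (2.7)
    b, g : length-n arrays; a : length-(n-1) array; z : scalar."""
    n = len(b)
    D = -np.eye(n - 1, n) + np.eye(n - 1, n, k=1)  # finite-difference rows
    A = np.vstack([np.eye(n), D])
    w = np.concatenate([b - g, z * a])
    return A, w
```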
mEDI Model - Optimizing for z
The variable $z$ is found by solving the following minimization problem for each pixel of an image of interest:

(2.9)   $\min_{z \in \mathbb{R}}\ \big\| A\, l(z) - w(z) \big\|_2^2,$

where $l(z)$ denotes the least-squares solution of (2.8) for the given $z$, and $b$, $l$, and $g$ are as given in (2.5)-(2.6). We refer to [13] for their solution approach. Notice that (2.7) is not used as a constraint in the minimization problem (2.9); rather, (2.9) is used to generate the image reconstructions.
Limitations of the mEDI and EDI Models
While the mEDI model uses multiple frames in order to accurately represent deblurred images, some limitations of the model remain.
- (a) Performance of the model can degrade for images that contain large variations of light intensity within a single scene, for example very dark objects in a bright room with a highly dynamic scene. In these instances we can see "shadowing" effects, where shadows appear across multiple frames. We refer to Section 5.3 for specific examples.
- (b) Recall from (2.7) that $l^{i+1} - l^i = z\, a^i$, where $a^i$ is built solely from the event function $e$. This definition does not include any information from the standard images $B^i$. Due to this, the mEDI model is not well suited for generating image reconstructions across time spans that include multiple standard images.
3. Proposed Model and Algorithm
3.1. Model
For each pixel $(x,y)$, we consider the following bilevel inverse problem:

(3.1a)   $\min_{z \in \mathbb{R}^n}\ J(z) := \frac{1}{2}\, \big\| \widehat{A}\, l(z) - \widehat{w}(z) \big\|_2^2 + \frac{\alpha}{2}\, \|z\|_2^2$

subject to the constraint

(3.1b)   $l(z) = \operatorname*{argmin}_{l \in \mathbb{R}^n}\ \frac{1}{2}\, \big\| A\, l - w(z) \big\|_2^2.$

Notice that this is an optimization problem for one pixel across all $n$ images. By $\alpha > 0$, we denote the regularization parameter. Moreover, $\widehat{A}$ is a submatrix of the matrix $A$ given in (2.8), and $\widehat{w}(z)$ collects the corresponding rows of $w(z)$. Recall that in Equation (2.9), $z$ is taken as a scalar and the optimization is done over a single image at a time. We consider $z$ of size $n$ so that we can optimize over the same pixel for all images simultaneously. Also, in contrast to [13], since we have defined $z$ as a vector over all images, we also modify the variable $E$ with the following definition:

(3.2)   $(z \odot E)(x,y,t) := z_i\, E(x,y,t) \quad \text{for } t \in \Gamma^i,$

where $i = 1, \dots, n$ and $z_i$ is the $i$-th component of the vector $z$. This is done so that the events associated to the $i$-th image are optimized by the $i$-th component.
Notice that $A$ above is not a square matrix. Nevertheless, for every $z$, the lower level problem is uniquely solvable and the solution is given by

(3.3)   $l(z) = (A^\top A)^{-1} A^\top w(z),$

with $A^\top A$ invertible, since the identity block in (2.8) gives $A$ full column rank. Substituting this in the upper level problem, we obtain the following reduced problem

(3.4)   $\min_{z \in \mathbb{R}^n}\ J(z) = \frac{1}{2}\, \big\| \widehat{A}\, (A^\top A)^{-1} A^\top w(z) - \widehat{w}(z) \big\|_2^2 + \frac{\alpha}{2}\, \|z\|_2^2,$

which is now just a problem in $z$. Notice that the reduced problem (3.4) is equivalent to the full space formulation

(3.5a)   $\min_{(z, l) \in \mathbb{R}^n \times \mathbb{R}^n}\ \frac{1}{2}\, \big\| \widehat{A}\, l - \widehat{w}(z) \big\|_2^2 + \frac{\alpha}{2}\, \|z\|_2^2$

subject to the constraint

(3.5b)   $A^\top A\, l = A^\top w(z).$
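Since the identity block gives $A$ full column rank, the lower level solve (3.3) reduces to the normal equations, and the reduced functional (3.4) can be evaluated directly, as in the sketch below. The row splitting between $\widehat{A}, \widehat{w}$ and $A, w$ follows our reconstruction above, and the names are ours.

```python
import numpy as np

def solve_lower_level(A, w_z):
    """Lower level solve (3.3): l(z) = (A^T A)^{-1} A^T w(z); the normal
    matrix is invertible because A contains an identity block."""
    return np.linalg.solve(A.T @ A, A.T @ w_z)

def reduced_objective(A, A_hat, w_z, w_hat_z, alpha, z):
    """Reduced functional J(z) of (3.4) for one pixel."""
    l = solve_lower_level(A, w_z)
    r = A_hat @ l - w_hat_z            # upper level residual
    return 0.5 * r @ r + 0.5 * alpha * z @ z
```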
The next result establishes existence of a solution to the reduced problem (3.4) and, equivalently, to (3.5).
Theorem 3.1.
There exists a solution to the reduced problem (3.4).
Proof.
The proof follows from the standard direct method of calculus of variations. We will omit the subscript $(x,y)$ for notational simplicity. We begin by noticing that $J$ is bounded below by 0. Therefore, there exists a minimizing sequence $\{z_k\}$ such that

$J(z_k) \to \inf_{z \in \mathbb{R}^n} J(z) =: J^*.$

It then follows that

$\frac{\alpha}{2}\, \|z_k\|_2^2 \le J(z_k) \le C,$

where the constant $C$ is independent of $k$. Thus, $\{z_k\}$ is a bounded sequence. Therefore, it has a convergent subsequence, still denoted by $\{z_k\}$, such that $z_k \to \bar{z}$. It then remains to show that $\bar{z}$ is the minimizer. Since $l(z_k)$ solves (3.3) and $z \mapsto w(z)$ is continuous, we have that $l(z_k) \to l(\bar{z})$ and $w(z_k) \to w(\bar{z})$ as $k \to \infty$. Subsequently, from the definition of infimum and these convergence estimates, we readily obtain that

$J(\bar{z}) = \lim_{k \to \infty} J(z_k) = J^*.$

Thus $\bar{z}$ is the minimizer and the proof is complete. ∎
In order to develop a gradient based solver for (3.4), we next write down the expression for the gradient of $J$.
Lemma 3.2.
The reduced objective $J$ is continuously differentiable and the derivative is given by

$\nabla J(z) = \nabla_z r(z)^\top\, r(z) + \alpha z,$

where $r(z) := \widehat{A}\, l(z) - \widehat{w}(z)$ with $l(z) = (A^\top A)^{-1} A^\top w(z)$,

$\nabla_z r(z) = \big( \widehat{A}\, (A^\top A)^{-1} A^\top - \widehat{P} \big)\, \nabla_z w(z), \qquad \nabla_z w(z) = \begin{bmatrix} -\Theta(z) \\ D_a \end{bmatrix}, \qquad (D_a)_{ik} := a^i\, \delta_{ik},$

with $\widehat{P}$ the matrix selecting the rows of $w$ appearing in $\widehat{w}$, and $\Theta(z)$ is a diagonal matrix given by

(3.6)   $\Theta(z) := \operatorname{diag}\big( \theta_1(z_1), \dots, \theta_n(z_n) \big),$

(3.7)   $\theta_i(z_i) := \partial_{z_i} g^i(x,y,z_i) = \frac{\int_{\Gamma^i} E(x,y,t)\, \exp\!\big(z_i E(x,y,t)\big)\, dt}{\int_{\Gamma^i} \exp\!\big(z_i E(x,y,t)\big)\, dt}.$
Proof.
The proof follows by simple calculations. ∎
The next result establishes that $\theta_i'$, the derivative of $\theta_i$ from (3.7), is bounded and strictly positive for each $i$, with $\Gamma^i$ denoting the $i$-th exposure time interval.
Theorem 3.3.
Let $\Gamma^i$ denote the $i$-th exposure time interval with $T^i = |\Gamma^i| > 0$. If, for all $i$, there exists $t^* \in \Gamma^i$ with $e(x,y,t^*)$ being nonzero, then $\theta_i'$ is bounded and strictly positive for all $z_i \in \mathbb{R}$.
Proof.
First we calculate $\theta_i'$; we suppress the pixel index $(x,y)$ and write $\Gamma = \Gamma^i$, $z = z_i$, $E(t) := E(x,y,t)$. Recall:

$\theta_i(z) = \frac{\int_{\Gamma} E(t)\, e^{z E(t)}\, dt}{\int_{\Gamma} e^{z E(t)}\, dt},$

yielding, by the quotient rule,

(3.8)   $\theta_i'(z) = \frac{\big( \int_{\Gamma} E(t)^2\, e^{z E(t)}\, dt \big) \big( \int_{\Gamma} e^{z E(t)}\, dt \big) - \big( \int_{\Gamma} E(t)\, e^{z E(t)}\, dt \big)^2}{\big( \int_{\Gamma} e^{z E(t)}\, dt \big)^2}.$

Using Fubini's Theorem and writing each of the two integral products in the numerator of (3.8) as a double integral over $\Gamma \times \Gamma$, followed by symmetrization, we get the following equality:

(3.9)   $\theta_i'(z) = \frac{\frac{1}{2} \int_{\Gamma} \int_{\Gamma} \big( E(s) - E(\tau) \big)^2\, e^{z ( E(s) + E(\tau) )}\, ds\, d\tau}{\big( \int_{\Gamma} e^{z E(t)}\, dt \big)^2}.$

We notice that Equation (3.9) is nonnegative (since all terms are nonnegative), and the expression is zero precisely when $E$ is constant on $\Gamma$. By assumption there exists some $t^* \in \Gamma$ such that $e(x,y,t^*) \neq 0$, which implies that an event has occurred at the pixel during the interval $\Gamma$. Without loss of generality, suppose a single event of polarity $p_1$ occurred at time $t_1$. Recall from (2.3) that $E(t) = \int_f^t e(s)\, ds$. Then $E$ transitions from the value $0$ before $t_1$ to the value $p_1 \neq 0$ after $t_1$, which implies that $E$ is not constant on $\Gamma$. This further implies that the numerator of (3.9) is strictly positive. Thus we conclude $\theta_i'(z) > 0$. Next, using (3.9) and the bound $\big( E(s) - E(\tau) \big)^2 \le 4\, \| E \|_{L^\infty(\Gamma)}^2$, we obtain the following upper bound:

$\theta_i'(z) \le 2\, \| E \|_{L^\infty(\Gamma)}^2 .$

Hence

$0 < \theta_i'(z) \le 2\, \| E \|_{L^\infty(\Gamma)}^2 \qquad \text{for all } z \in \mathbb{R}.$

We now conclude that $\theta_i'$ is bounded and strictly positive. ∎
In order to show the local convexity of $J$, we study the structure of the Hessian of the first term in the definition of $J$.
Lemma 3.4.
The matrix $\sum_{j} r_j(z)\, \nabla_z^2 r_j(z)$ is a diagonal matrix, where $r(z) = \widehat{A}\, l(z) - \widehat{w}(z)$ is the residual from Lemma 3.2.
Proof.
Without loss of generality, suppose that $n = 3$; the general case is identical. Recall by our definition:

$w(z) = \begin{bmatrix} b - g(z) \\ z \odot a \end{bmatrix},$

and recall the expression of $\theta_i$ from (3.7). Each component of $w$ depends on $z$ only through a single component $z_i$: the entries of $b - g(z)$ through $-g^i(z_i)$, and the entries of $z \odot a$ linearly. Now, to calculate the Hessian of a component of $r$, we need to take the derivative with respect to each variable once again. This results in one-hot diagonal matrices: each $\nabla_z^2 w_k(z)$ will be of the form

$\nabla_z^2 w_k(z) = -\theta_i'(z_i)\, e_i e_i^\top \qquad \text{or} \qquad \nabla_z^2 w_k(z) = 0,$

where $\theta_i'$ is the quantity studied in Theorem 3.3 and $e_i$ is the $i$-th canonical basis vector. Next we consider $r(z)$. Observe:

$\nabla_z^2 r_j(z) = \sum_k M_{jk}\, \nabla_z^2 w_k(z),$

where

$M := \widehat{A}\, (A^\top A)^{-1} A^\top - \widehat{P},$

and $\widehat{P}$ is the row-selection matrix from Lemma 3.2. Each $\nabla_z^2 r_j(z)$ is therefore diagonal, being a sum of diagonal matrices. Lastly, let $\mathcal{H}(z)$ be the third order tensor whose $j$-th frontal slice is $\nabla_z^2 r_j(z)$. Then

(3.10)   $\sum_j r_j(z)\, \nabla_z^2 r_j(z) = \mathcal{H}(z)\, \bar{\times}_3\, r(z),$

where $\bar{\times}_3$ denotes the mode-3 tensor-vector product, whose computation can be organized via the mode 1 unfolding of the tensor. Further details of tensor matrix multiplication can be found in [11]. The resulting matrix is diagonal. The proof is complete. ∎
Next, we establish that $J$ is strictly locally convex on any finite ball when $\alpha$ is chosen appropriately.
Theorem 3.5.
Let $\Gamma^i$ denote the $i$-th exposure time interval with $T^i = |\Gamma^i| > 0$. If, for all $i$, there exists $t^* \in \Gamma^i$ with $e(x,y,t^*)$ being nonzero, then for any closed ball $\overline{B}_{\bar{r}}(0)$ centered at $0$ with radius $\bar{r}$, there exists $\alpha_0 > 0$ such that for $\alpha \ge \alpha_0$, $J$ is strictly convex on $\overline{B}_{\bar{r}}(0)$.
Proof.
First we calculate Hess$(J)$ as:

$\operatorname{Hess} J(z) = \nabla_z r(z)^\top\, \nabla_z r(z) + \sum_j r_j(z)\, \nabla_z^2 r_j(z) + \alpha I.$

We notice that the term $\nabla_z r(z)^\top \nabla_z r(z)$ is of the form $M^\top M$, which implies that it is symmetric positive semi-definite. Next, from Lemma 3.4, we recall that the term $\sum_j r_j(z)\, \nabla_z^2 r_j(z)$ is a diagonal matrix; denote its diagonal entries (eigenvalues) by $d_i(z)$, $i = 1, \dots, n$. In the case where all the $d_i(z)$ are nonnegative, we are done for any $\alpha > 0$. Suppose some eigenvalue is negative. Recall from Theorem 3.3 that $0 < \theta_i'(z_i) \le 2 \|E\|_{L^\infty(\Gamma^i)}^2$ for any $z_i$, and note that $r(z)$ is bounded on $\overline{B}_{\bar{r}}(0)$. Then, using (3.10), we have the following bound:

$|d_i(z)| \le K \qquad \text{for all } z \in \overline{B}_{\bar{r}}(0),\ i = 1, \dots, n,$

where the constant $K$ is derived from the bounds on $\theta_i'$ and on $r$ over $\overline{B}_{\bar{r}}(0)$. Given $\bar{r}$, choose $\alpha_0$ such that

$\alpha_0 > K.$

Then, for $\alpha \ge \alpha_0$ and all $z \in \overline{B}_{\bar{r}}(0)$, we have $d_i(z) + \alpha \ge \alpha_0 - K > 0$ for all $i$. We notice that

$\sum_j r_j(z)\, \nabla_z^2 r_j(z) + \alpha I$

is symmetric positive definite, since it is a diagonal matrix with entries $d_i(z) + \alpha > 0$. Hence Hess$(J)$ is symmetric positive definite, due to it being the summation of a symmetric positive definite matrix and a symmetric positive semi-definite matrix. We conclude that for any such ball there exists $\alpha_0$ such that for $\alpha \ge \alpha_0$, $J$ is strictly convex over $\overline{B}_{\bar{r}}(0)$. ∎
We further note that, from the proof of Theorem 3.5, once the ball $\overline{B}_{\bar{r}}(0)$ is chosen, a lower bound for $\alpha$ can be calculated by:

$\alpha_0 = \max_{1 \le i \le n}\ \sup_{z \in \overline{B}_{\bar{r}}(0)}\ |d_i(z)|.$
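For reference, a generic damped Newton iteration of the kind we use on the reduced problem (3.4) is sketched below; the callables `grad` and `hess` are assumed to implement Lemma 3.2 and the Hessian structure from the proof of Theorem 3.5, whose assembly is problem specific and omitted here.

```python
import numpy as np

def newton_minimize(grad, hess, z0, tol=1e-8, max_iter=50):
    """Damped Newton iteration for the reduced problem (3.4), run
    independently at each pixel.  grad(z) -> J'(z), hess(z) -> Hess J(z)."""
    z = z0.copy()
    for _ in range(max_iter):
        g = grad(z)
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(hess(z), -g)   # Newton direction
        # backtrack on the gradient norm; the Newton direction decreases
        # ||grad J|| locally whenever the Hessian is positive definite
        s = 1.0
        while s > 1e-4 and np.linalg.norm(grad(z + s * step)) >= np.linalg.norm(g):
            s *= 0.5
        z = z + s * step
    return z
```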
4. Experimental Setup
This section focuses on how to prepare the dataset to be used in our computations.
Standardizing
Depending on the event camera used, the entries of the matrix $B^i$ from (2.4), where $i$ identifies a unique standard camera image in the sequence being considered, may take on various ranges. We standardize using the MATLAB function mat2gray. This function maps the values of $B^i$ linearly to the range $[0,1]$. That being said, since we define the matrix $b^i = \log(B^i)$, we have to be careful not to allow $B^i$ to have any zero values. We can easily protect against this by inserting a manual range into the mat2gray function of $[B_{\min} - \epsilon', B_{\max}]$, for a sufficiently small $\epsilon' > 0$, which keeps the rescaled entries bounded away from zero. In our computations we set $\epsilon'$ to a fixed small positive value. We note $B_{\min}$ and $B_{\max}$ are the minimum and maximum values over all values in the matrix $B^i$, respectively.
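A Python equivalent of this standardization step (mirroring the MATLAB mat2gray call with a manual range) might read as follows; the value of eps is a tuning choice.

```python
import numpy as np

def standardize(B, eps=1e-5):
    """Affine rescaling of a raw image onto [eps, 1], so that b = log(B)
    stays finite everywhere; mimics mat2gray(B, [Bmin - eps', Bmax])."""
    Bmin, Bmax = float(B.min()), float(B.max())
    scale = max(Bmax - Bmin, 1e-12)   # guard against constant images
    return eps + (1.0 - eps) * (B - Bmin) / scale
```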
Building the data cube
In order to view the problem in three dimensions, we create a data cube as shown in Figure 2. As stated in (2.1), we define this cube by $\mathcal{E}(x,y,t)$, whose entries are the event polarities. Since the event data is generated on a nearly continuous basis, the data cube will be of size $m_1 \times m_2 \times n_e$, where $n_e$ is the total number of events used in the reconstruction over all images. Note that $n_e$ is explained in further detail in Section 2.1. Our image reconstructions typically use a very large number of events, and it is impractical numerically to use this data cube due to its size. This issue is resolved by reducing the size of the data cube via time compression. Choose some $r \in \mathbb{N}$ such that $r \ll n_e$ and define a new data cube as follows:

$\widetilde{\mathcal{E}}(x,y,k) := \sum_{j = (k-1) r + 1}^{\min(k r,\, n_e)} \mathcal{E}(x, y, t_j), \qquad k = 1, \dots, \lceil n_e / r \rceil.$

Then we have a compressed cube of size $m_1 \times m_2 \times \lceil n_e / r \rceil$, where each slice accumulates the polarities of $r$ consecutive events. For our examples, $r$ is fixed empirically.
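The compression step can be realized by pooling $r$ consecutive time slices of the cube, as in this sketch (names ours):

```python
import numpy as np

def compress_time(cube, r):
    """Compress the datacube along time by summing r consecutive slices,
    preserving the net polarity per bin; output has ceil(n/r) slices."""
    H, W, n = cube.shape
    pad = (-n) % r   # zero-pad so r divides the time axis evenly
    padded = np.concatenate([cube, np.zeros((H, W, pad))], axis=2)
    return padded.reshape(H, W, -1, r).sum(axis=3)
```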
Calculating $E$
Once we have built the data cube, we can calculate the function $E$ from (2.3), which is defined componentwise as:

$E(x,y,t) = \int_f^t e(x,y,s)\, ds.$

Numerically, we approximate this using the trapezoid rule as:

$E(x, y, t_k) \approx \frac{\Delta t}{2} \sum_{j=1}^{k-1} \big( e(x,y,t_j) + e(x,y,t_{j+1}) \big),$

where $e$ is the event function entering (2.3) and $\Delta t$ is the step length. We evaluate the function $g$ from (2.6), as well as the derivatives of $g$ and $J$, in a similar way.
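For one pixel, the kernel value $g(z)$ of (2.6) can then be evaluated with the same trapezoid rule, as in this sketch (the uniform grid and the names are assumptions):

```python
import numpy as np

def g_of_z(E_samples, dt, z):
    """Trapezoid-rule evaluation of (2.6) for one pixel:
        g(z) = log( (1/T) * integral_Gamma exp(z * E(t)) dt ),
    with E sampled on a uniform grid of step dt over the exposure."""
    vals = np.exp(z * E_samples)
    integral = dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
    T = dt * (len(E_samples) - 1)   # exposure length spanned by the grid
    return np.log(integral / T)
```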
Omitting unnecessary calculations for $z$
We recall that $z$ must be calculated on a per pixel basis. To decrease computation time, we only perform an optimization for $z$ over the pixels for which event data has been captured.
Image Reconstruction
As shown in Figure 5, our method is capable of generating two image representations. Given some value $z$, the first representation is built from $z \odot E$ as defined in (3.2). This representation reconstructs only the dynamical part of the image, due to our definition of $E$ in (3.2). In order to reconstruct the full image, including the portion of the image with no dynamics, we use $l^i = b^i - g^i(z)$, i.e., $L^i = \exp\!\big( b^i - g^i(z) \big)$, a variation of the definition given in (2.5). In general we will refer to the latter formulation as the reconstruction, and we will specifically flag the dynamic representation when presented.
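In code, the full reconstruction is a one-liner once $z$ has been optimized; this sketch follows the log-domain relation (2.5) as reconstructed above.

```python
import numpy as np

def reconstruct(b, g_z):
    """Full-image reconstruction via (2.5): l = b - g(z), hence L = exp(l).
    b : log of the standardized blurry image; g_z : kernel term at optimal z."""
    return np.exp(b - g_z)
```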
5. Numerical Examples
Next, we present a series of examples which establish that the proposed approach can be beneficial in practice.
5.1. Benchmark Problem: Unit Bump
The first example is a synthetic case where we consider a blurred unit bump, see Figure 4. In order to perform this analysis, we require a baseline image for comparison, a series of consecutive images as input for the ESIM generator [10] and our proposed method, a blurry image, and the corresponding event data. To begin, we construct a series of consecutive images by first creating a white circle on a black background (not shown). Using this initial image, we move the white circle two pixels to the right and two pixels down, eight times. This set of nine images completes our series of images. We now generate a tenth image, the blurry image (middle in Figure 4), by averaging the sequence of nine images. The fifth image of the set is designated as our baseline image (left in Figure 4). Lastly, to generate the associated event data, we use the ESIM generator from [10]. The ESIM generator takes a series of images as input and outputs event data. Recall that our model requires a series of images as well as event data in order to generate a reconstructed image. We set $B^1$ to be the fourth image of the nine image set, $B^2$ to be the blurry image, and $B^3$ to be the sixth image. The result of our method (right in Figure 4) is compared to the baseline image. As shown, we obtain a high quality reconstruction with SSIM of 0.96 and PSNR of 27.3. This example serves as a benchmark for us to apply our approach on the event based dataset.



5.2. Multiple Event Based Dataset Examples
Figures 5-7 show the application of our method to various examples. In each of these examples, three standard camera images are used to define the variable $b$, while the variables $E$ and $a$ are calculated using the associated event data. In each of these examples, the middle image $B^2$ was chosen to be a blurry image. We will refer to the image $B^2$ as the blurry image for the remainder of this section.
In Figure 5, events recorded over the chosen three image sequence are used. The blurry image is shown (left) while our reconstruction is shown (right). Our reconstruction deblurs the fence line and captures the grass and tree texture that is not apparent in the blurred image. The blurred image (left) in Figure 6 shows a person swinging a pillow. Approximately 26,000 events were captured for this example. Our model has successfully reconstructed the boundary of the pillow as well as the person's hand and thumb. Figure 7 is an example of a person jumping, and the three image sequence used has 16,000 associated events. Notice that in the blurred image (left) the person's arm is barely visible. In contrast, our model is able to reconstruct the person's arm (right).










5.3. Event Based Problem: Shadowing Effects
As noted in Section 2.3, one of the limitations of the mEDI model is the appearance of shadows in high contrast environments. Examples of this shadowing effect can be seen in [13] and in Figures 8 and 9. As shown in Figures 8 and 9, our model is able to substantially reduce the shadowing effects of the EDI and mEDI models.





Figure 10 shows the convergence behavior for five randomly chosen pixels. We observe the typical convergence behavior of Newton's method.

Conclusion, Limitations, and Future Work
In this study, we introduce a novel approach to image deblurring that combines event data and multiple standard camera images within a variational framework. We establish existence of solutions and local convexity of the reduced problem, and we apply a second order Newton solver to several examples, showcasing the effectiveness of our approach.
Though our method does offer a systematic way to construct deblurred images across time, some limitations still exist in the model.
- (a) Our method requires a pixel by pixel optimization and, depending on the camera resolution, can run into scalability issues. This issue would be most evident with images where dynamics occur at a majority of pixels.
- (b) Manual tuning of certain parameters, as described in Section 4, may be required.
To mitigate the scalability risk, we propose to filter out pixels that have significantly lower event counts compared to other pixels. In future work, we also plan to test the model’s robustness in a domain shift scenario, such as a camera moving in and out of water. Moreover, we aim to develop a parameter learning framework to optimize the regularization parameters for better performance.
Acknowledgement
The authors are grateful to Dr. Patrick O’Neil and Dr. Diego Torrejon (BlackSky) and Dr. Noor Qadri (US Naval Research Lab, Washington DC), for bringing the neuromorphic imaging topic to their attention and for several fruitful discussions.
References
- [1] Mariana SC Almeida and Luis B Almeida. Blind and semi-blind deblurring of natural images. IEEE Transactions on Image Processing, 19(1):36–52, 2009.
- [2] Harbir Antil, Zichao Wendy Di, and Ratna Khatri. Bilevel optimization, deep learning and fractional Laplacian regularization with applications in tomography. Inverse Problems, 36(6):064001, May 2020.
- [3] Simon Arridge, Peter Maass, Ozan Öktem, and Carola-Bibiane Schönlieb. Solving inverse problems using data-driven models. Acta Numerica, 28:1–174, 2019.
- [4] Mario Bertero, Patrizia Boccacci, and Christine De Mol. Introduction to inverse problems in imaging. CRC press, 2021.
- [5] Alessandro Buccini, Marco Donatelli, and Ronny Ramlau. A semiblind regularization algorithm for inverse problems with application to image deblurring. SIAM Journal on Scientific Computing, 40(1):A452–A483, 2018.
- [6] Leon Bungert and Matthias J Ehrhardt. Robust image reconstruction with misaligned structural information. IEEE Access, 8:222944–222955, 2020.
- [7] Julianne Chung, Matthias Chung, and Dianne P O’Leary. Designing optimal spectral filters for inverse problems. SIAM Journal on Scientific Computing, 33(6):3132–3152, 2011.
- [8] Juan Carlos De los Reyes, Carola-Bibiane Schönlieb, and Tuomo Valkonen. Bilevel parameter learning for higher-order total variation regularisation models. J. Math. Imaging Vision, 57(1):1–25, 2017.
- [9] Guillermo Gallego, Tobi Delbrück, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew J. Davison, Jörg Conradt, Kostas Daniilidis, and Davide Scaramuzza. Event-based vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1):154–180, 2022.
- [10] Daniel Gehrig, Henri Rebecq, Guillermo Gallego, and Davide Scaramuzza. Asynchronous, photometric feature tracking using events and frames. In Proceedings of the European Conference on Computer Vision (ECCV), pages 750–765, 2018.
- [11] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
- [12] Renaud Morin, Stéphanie Bidon, Adrian Basarab, and Denis Kouamé. Semi-blind deconvolution for resolution enhancement in ultrasound imaging. In 2013 IEEE International Conference on Image Processing, pages 1413–1417. IEEE, 2013.
- [13] Liyuan Pan, Cedric Scheerlinck, Xin Yu, Richard Hartley, Miaomiao Liu, and Yuchao Dai. Bringing a blurry frame alive at high frame-rate with an event camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6820–6829, 2019.
- [14] Se Un Park, Nicolas Dobigeon, and Alfred O Hero. Semi-blind sparse image reconstruction with application to MRFM. IEEE Transactions on Image Processing, 21(9):3838–3849, 2012.
- [15] Pooja Satish, Mallikarjunaswamy Srikantaswamy, and Nataraj Kanathur Ramaswamy. A comprehensive review of blind deconvolution techniques for image deblurring. Traitement du Signal, 37(3), 2020.
- [16] Cedric Scheerlinck, Nick Barnes, and Robert E. Mahony. Continuous-time intensity estimation using event cameras. CoRR, abs/1811.00386, 2018.
- [17] Shu Tang, Weiguo Gong, Weihong Li, and Wenzhi Wang. Non-blind image deblurring method by local and nonlocal total variation models. Signal processing, 94:339–349, 2014.
- [18] Shipeng Xie, Xinyu Zheng, Wen-Ze Shao, Yu-Dong Zhang, Tianxiang Lv, and Haibo Li. Non-blind image deblurring method by the total variation deep network. IEEE Access, 7:37536–37544, 2019.
- [19] Zhengrong Xue. Blind image deblurring: a review. CoRR, abs/2201.10522, 2022.
- [20] Min Zhang, Geoffrey S Young, Yanmei Tie, Xianfeng Gu, and Xiaoyin Xu. A new framework of designing iterative techniques for image deblurring. Pattern Recognition, 124:108463, 2022.