This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

HYPERION: Hyperspectral Penetrating-type Ellipsoidal Reconstruction for Terahertz Blind Source Separation

Chia-Hsiang Lin Department of Electrical Engineering, National Cheng Kung University Miin Wu School of Computing, National Cheng Kung University Contributed equally Yi-Chun Hung Department of Electrical and Computer Engineering, University of California, Los Angeles Contributed equally Feng-Yu Wang Department of Electrical Engineering, National Tsing Hua University Department of Life Science, National Tsing Hua University Shang-Hua Yang Department of Electrical Engineering, National Tsing Hua University

1 Abstract

Terahertz (THz) technology has been a great candidate for applications, including pharmaceutic analysis, chemical identification, and remote sensing and imaging due to its non-invasive and non-destructive properties. Among those applications, penetrating-type hyperspectral THz signals, which provide crucial material information, normally involve a noisy, complex mixture system. Additionally, the measured THz signals could be ill-conditioned due to the overlap of the material absorption peak in the measured bands. To address those issues, we consider transmitted (penetrating-type) signal mixtures and aim to develop a blind hyperspectral unmixing (HU) method without requiring any information from a prebuilt database, such as predefined material composition or material information. The proposed HYperspectral Penetrating-type Ellipsoidal ReconstructION (HYPERION) algorithm is unsupervised, not relying on collecting extensive data or sophisticated model training. Instead, it is developed based on elegant ellipsoidal geometry under a very mild requirement on data purity, whose excellent efficacy is experimentally demonstrated.

2 Introduction

The recent advances in remote sensing technologies have revolutionized the ways people interact with the world in communication, manufacturing, molecular science, life science, healthcare, geography, and astronomy [1, 2, 3]. Among all remote sensing methods that perceive information non-invasively, electromagnetic (EM) wave has become one of the most critical tools due to their promising nature of the high-speed operation, long traveling distance, and unique light-matter interactions in specific spectral bands [4]. With the use of the EM wave, unveiling multi-functional behaviors inside objects with deep-subwavelength spatial precision and tera-frame-per-second imaging speed has also been well-demonstrated nowadays [5]. Terahertz (THz) spectrum, located between microwave and infrared EM spectrum, has recently aroused extensive attention for identifying a great variety of materials, including molecules, proteins, explosives, chemical mixtures, and charged particles in a remote distance [6, 7, 8, 9]. The chemical compositions and the molecular structures of materials can directly map their unique molecular dynamics in the THz frequency regime. This enables the THz spectroscopy tools to unveil fruitful material information inside optically opaque objects without any labels. Due to the non-ionizing, non-destructive nature of THz wave, THz spectroscopy systems are of great interest for medical, pharmaceutical, non-destructive evaluation, and industrial inspection [10, 11]. Different systems based on THz spectroscopic modalities have further extended the application fields. THz near-field systems such as THz scanning near-field microscopy [12] and THz scanning tunneling microscopy [13] significantly shrink spatial resolution down to atomic level, which becomes powerful tools to perform near-field studies of advanced materials and resolving structural information of biological samples [13]. Material dynamics down to tens of femtosecond temporal resolution can be further determined through time-resolved THz spectroscopy [14]. In addition, hyperspectral THz imaging systems map out not just water content inside biological samples but chemical distribution inside a sealed package [10, 11].

Although THz spectroscopy has shown its great promises in the past decades, many materials, chemical compounds, and biomolecules have not yet been fully characterized in the THz frequency range, unlike in visible light and infrared spectrum. Furthermore, different THz spectroscopic modalities, measuring conditions, or sample preparation methods would introduce considerable discrepancies between samples with the same material [15, 16]. The typical way to perform THz spectroscopy for material identification is to establish THz material characteristics of pure substances locally, followed by material information extraction [17]. This way, a ppb-level sensing capability has been demonstrated by evaluating the spectral shifts, and amplitude changes of the material absorption spectrum in the THz regime [18]. However, in many real-world scenarios, such as identifying chemical portions of malignant tumors or in situ quantifying glucose in a body fluid sample, it is still challenging to purify and extract pure substances for further THz material dataset establishment. This physical constraint significantly limits the practical use of THz spectroscopy systems in a wide range of applications. We want to ask here: Can we separate material signatures of mixture chemicals without measuring or even knowing their pure substances? This is a blind spectral unmixing problem waiting to be answered, especially in THz spectroscopy, THz hyperspectral imaging, and THz biophotonics fields. Recently, research groups have implemented THz spectral unmixing methods for material blind separation – hard modeling factor analysis (HMFA) [19], nonnegative matrix factorization (NMF) [20], and independent component analysis (ICA) [21]. However, all the conventional THz spectral unmixing methods still rely on properties of material information, the size of a measured dataset, and the signal-to-noise ratio of the dataset, which severely limits their scope of practical use. HMFA uses the correlation of mixture signals to unmix sources, which requires curve fitting for the massive amount of material information, such as spectral peak amplitudes, peak locations, and spectral profiles. The blind unmixing performance of the nonconvex NMF is sensitive to initial parameters and the signal-to-noise ratio, leading to difficulties for practical uses. One of the requirements in ICA is the statistical independence of the sources, causing inaccurate amplitudes of THz unmixed spectrum compared with the ground truth. Here, we proposed an unsupervised blind separation algorithm, HYperspectral Penetrating-type Ellipsoidal ReconstructION (HYPERION), which needs neither collecting big data nor sophisticated model training. In this article, we demonstrated a comprehensive study on HYPERION for non-invasive, non-destructive material separation and identification. To the best of our knowledge, HYPERION has shown superior performance among the state-of-the-art THz spectral unmixing methods regarding separation accuracy (i.e., spectral angle error, mean square error) and noise immunity testing with mixture materials. Due to the superior blind separation capability, we have further demonstrated highly accurate material blind separation mapping through HYPERION THz hyperspectral imaging. It should be noted that there is no prior knowledge of pure substance information, no supervised model training, no need to establish THz material dataset for pure substances, and no restricted requirement on the data purity γ\gamma. Our proposed blind separation modality, HYPERION, could potentially make paradigm shifts of THz technology for chemical sensing, industrial inspection, material identification, biomedical sensing, and imaging in the near future.

3 Results

Assumption evaluation

Our method, HYPERION, is designed based on Löwner-John ellipsoid (LJE) and the linear mixing model ((Supplementary Note 14)), where the THz data follows a convex geometry structure (see “Methods” for the details). By utilizing the information of the LJE defined as the maximum-volume ellipsoid inscribed in the convex geometry, HYPERION has the efficacy to blindly unmix sources from the data with low data purity [22]. To validate the linear mixing model on the transmitted THz signals, we have measured and characterized mixture tablets composed of D-lactose monohydrate, D-sucrose, and L-tyrosine through a broadband THz time-domain spectroscopy (THz-TDS) system. The three chemicals are selected due to their abundant information within THz spectral bands, including spectral absorption lines and varying absorption spectral trends. The obtained ternary dataset contains the three pure-substance THz spectra (D-lactose monohydrate, D-sucrose, and L-tyrosine) and six mixture spectra composed by each pair of the chemicals with a ratio of 3:7, which is designed to test the data purity criterion. The details of measurement protocol and environment condition are elaborated in Supplementary Note 1.

We compared the THz spectra simulated by the convex combinations of the three chemicals with their corresponding THz measured spectrum with the ternary dataset. All samples are prepared based on the standardized protocol (see “Methods” for the details). As shown in Fig. 1(a), all the simulated spectra are highly aligned with the measured spectra, including the spectral features and the locations of spectral lines. We then project high dimensional data of the ternary dataset into a low dimensional space through principal component analysis (PCA). The PCA evaluation result provides solid evidence that the necessary criteria of holding the linear mixing model are well-preserved (Supplementary Note 2). To further explore the linear mixing model in terms of spectrum similarity and accuracy, we use spectral angle mapper (SAM) and root-mean-square error (RMSE) [23, 24] to evaluate the diminutive differences between simulated spectrum and measured spectrum (Supplementary Note 3). The SAM indicator is based on the projection angle between two spectra in comparison, which explicitly reveals the spectral shape similarity without mixing the influence of spectrum offset. The RMSE indicator is the square root of the quadratic mean of differences between the two spectra. It includes both shape similarity and the effects of the band-wise differences among the measured spectral range. The SAM and RMSE of this ternary dataset are less than 5 degrees and 6 cm-1 as shown in Fig. 1(b), which demonstrates both the excellent spectral shape similarity as well as the slight deviation between the measured and simulated mixture spectra. Based on the validation of qualitative and quantitative analysis, the assumption of the linear mixing model is experimentally verified. Note that the linear mixing model has been extensively used in remote sensing for reflected (reflecting-type) signals but never for transmitted (penetrating-type) signals as far as we know; thus, the experiment is needed. We then transform the ternary dataset from the THz transmission spectrum to the material absorption spectrum of each measured sample (Supplementary Note 4), which directly presents broadband material information, including resonant frequencies, spectral features, and light-matter interaction levels in the THz range. Additionally, we extract and present material absorption spectrum between 0.2 THz to 1.75 THz since the ASOPS THz-TDS system dynamic range in power spectra of the selected materials approximately decreases to about 5 dB at 1.75 THz that start to mismatch with the linear mixing model. As shown in Fig. 1(c), all unmixed pure substance absorption spectra are highly aligned with the measured ones, including the spectral peak locations and the material absorption profiles. The unmixed D-glucose and L-tyrosine values are slightly deviated from the measured values above 1.5 THz since the signal-to-noise ratio (SNR) of the THz spectroscopy system gradually decreases from 60 dB at 0.3 THz to 5 dB at 1.75 THz.

Nevertheless, at low-SNR spectral regions (1.2 - 1.6 THz), HYPERION still provides great unmixing capability - less than 3.51 cm-1 deviated absorption coefficient values from the measured ones. For the spectral regimes with a noticeable unmixing difference, for example, the measured values of L-tyrosine absorption peak are 15.35 cm-1 and 4.78 cm-1 differences from the unmixed values at 0.949 THz and 1.295 THz, respectively. Due to the overlap of water vapor and material absorption peaks, the linear mixing model is not well-preserved. Since HYPERION takes all the spectral information into account by the LJE instead of specific spectral components, even in nonlinear matter-matter or light-matter interaction spectral regions, the unmixed material absorption spectrum based on HYPERION still demonstrates great spectral accuracy of each source spectral signature but just a relatively higher mismatch in absorption coefficient values.

Despite the nonlinear interaction and the low SNR levels in certain bands, HYPERION offers an accurate unmixing performance – 10.63 degrees with SAM and a 2.54 cm-1 with RMSE of overall unmixed material absorption spectrum has been achieved. This SAM value presents high spectral shape similarity of unmixed absorption coefficients, preserving rich material spectral profiles and details among a wide frequency range. In another perspective, the extraordinarily low RMSE shows both high unmixing accuracy and spectral similarity of the unmixed absorption coefficient, providing various types of material spectral features for further chemical identification and non-destructive evaluation applications.

Refer to caption
Figure 1: Unmixed results of the ternary dataset by HYPERION. (a) The three subplots at the top panel are the measured spectra of pure substances (D-glucose, L-tyrosine, and D-lactose monohydrate). Gray arrows indicate the absorption lines of water vapor. The bottom two rows of subplots are the mixture spectra with different mixing ratios; D-glucose, D-lactose monohydrate, and L-tyrosine are denoted as G, L, and T, respectively. The simulated lines (black) are computed based on the linear mixing model and the measured spectra of the pure substances. (b) The spectral angle mapper (SAM) and root mean square error (RMSE) between the simulated and measured spectra. G:L:T represents the proportion of D-glucose, D-lactose monohydrate, and L-tyrosine. (c) The comparison between the unmixed material absorption spectrum and the measured absorption spectrum upon the ternary dataset with pure substances. The unmixed and measured material absorption spectrum are shown as solid lines and dotted lines, respectively.

Quinary Dataset with Pure Substances

HYPERION has several advantages, including support of complex mixture systems, milder data purity requirement, excellent spectral accuracy on unmixed sources, and remarkable noise immunity, due to the simultaneous use of all spectral information by LJE. To further investigate the performance of HYPERION in broad application scope, we design three experimental modules to verify HYPERION characteristics in terms of unmixing capability of complex mixture system, low data purity dataset, and noisy environment condition based on the quinary dataset (see “Methods” for the details). The quinary dataset contains the five pure chemicals (D-glucose, D-lactose monohydrate, L-tyrosine, L-histidine, and D-sucrose) with rich material information in THz spectral range and ten mixture tablets. The five pure chemicals are specifically selected to validate HYPERION performance under different unmixed spectral scenarios - identifying single/multiple spectral peaks and overlaid spectral profiles simultaneously. Within the spectral range of 0.2 - 1.7 THz, L-tyrosine and L-histidine have a single absorption peak at 0.95 and 0.78 THz, respectively. D-lactose monohydrate contains two absorption peaks located at 0.52 THz and 1.3 THz. D-sucrose and D-glucose present the high spectral similarity of their THz absorption spectra, which is used to evaluate ill-conditioned levels of HYPERION. Furthermore, D-lactose is also selected to investigate the unmixing performance of HYPERION on the pharmaceutical application since D-lactose is the common ingredient as an excipient in tablets [25, 26]. The ten mixture tablets are composed of each pair of the five chemicals with a 5:5 ratio, which satisfies the milder requirement of the data purity for a quinary case [27]. Identical experimental setup, tablet preparation, data acquisition, and data preprocessing protocol are well-followed as the previous ternary dataset establishment.

Comparing the unmixed and measured absorption spectra shows that HYPERION has great efficacy on the different spectral scenarios. As shown in Fig. 2, the five unmixed absorption spectra (solid lines) based on the quinary dataset demonstrate a high correlation with the measured absorption spectra (dotted lines) within a broad frequency range. In terms of spectral peak locations, unmixed absorption spectra with varieties of material properties are precisely resolved. The spectral peak deviation of all unmixed absorption spectra is less than 10 GHz, close to the spectral resolution limit of the asynchronized optical sampling (ASOPS) THz-TDS system (see “Methods” for the details) at protocol settings. Additionally, the increasing trends and the characteristics of the absorption spectra are well unmixed since HYPERION utilizes the convex geometry of the whole measured frequency bands for the blind separation instead of specific bands information, such as peak regimes. Those mismatched values between unmixed and measured absorption spectra mainly come from nonlinear light-matter interaction [28], and limited SNR levels at higher frequencies.

To address the ill-conditioned case, HYPERION transforms the data matrix into the preconditioned space, where the similar material absorption spectra can be easily separated (see “Methods” for the details). As shown in Fig. 2, D-glucose and D-sucrose absorption spectra are highly correlated except for the slightly different slope within the 0.2 THz – 1.2 THz frequency range. Under this condition, HYPERION again shows the unmixing capability of a SAM of 9.57 degrees and an RMSE of 2.30 cm-1, demonstrating its resilience to ill-conditioned scenarios with the complex mixture system. Considering different spectral characteristics among the quinary dataset, HYPERION still reaches 12.15 degrees and 2.88 cm-1 in SAM and RMSE, respectively. Compared with less complicated material systems (e.g., ternary system), HYPERION maintains a similar accuracy level of unmixed material spectrum based on complex mixture systems (e.g., quinary system). Additionally, HYPERION is featured for its low computation time due to the convexity nature of the LJE (see “Methods” for the details). The computation time of HYPERION on this dataset is less than 3 seconds under a general personal laptop (Intel Core i5 with 8 GB memory).

Refer to caption
Figure 2: Unmixed material absorption spectra of the quinary dataset with pure substances. The comparison between unmixed material absorption spectrum (solid line) and measured material absorption spectrum (dotted line) upon the quinary dataset with pure substances. The subset of the figure shows that the D-glucose and D-sucrose absorption spectra are highly correlated. The material absorption spectrum difference between D-glucose and D-sucrose within 0.2 - 1.2 THz is dominated by different slopes (gray region) while it is dominated by shape within 1.2 - 1.75 THz.

Quinary Dataset without Pure Substances

In most on-site blind source separation scenarios, such as pharmaceutical inspection, biomarker analysis, remote sensing, and chemical identification, the existence and identification of pure substances remain unknown, which generally involve highly mixed chemicals and imply a low data purity. To this end, it is crucial to extract source signals from mixtures with low data purity [22].

Here, we have evaluated the HYPERION based on the same quinary dataset but excluded all pure substances. There is no prerequisite information from all mixture tablets in this testbed, including spectral information, the existence of pure substances, and spectra composition. To address this type of dataset with scarce information, HYPERION fits a simplex to the data convex geometry in the preconditioned space. When the optimal simplex is constructed, the corners of the optimal simplex correspond to the unmixed sources (viewed as vectors) in the preconditioned space (see “Methods” for the details). As shown in Fig. 3, the difference of unmixed absorption spectra among cases with and without pure substances is almost negligible. This is because LJE has been demonstrated to provide comparable performance with the pure substance case [29], as long as the requirement of data purity is satisfied (Supplementary Note 5). The slight difference in performance comes from the measurement noise of the dataset and the number of measured datasets – 15 versus 10 measured tablets. Without the need for pure substance information and its measurement on-site, HYPERION has the potential to expand the THz spectroscopy application scopes, opening up the door to industrial applications in non-invasive sensing, chemical identification, and in vivo biomarker extraction.

Refer to caption
Figure 3: Comparison of unmixed material absorption spectra with and without pure substances of the quinary dataset. Quinary dataset (D-glucose, D-lactose monohydrate, D-sucrose, L-tyrosine, L-histidine) with and without pure substances unmixed by HYPERION are indicated by solid and dotted lines, respectively. The values of spectral angle mapper (SAM) and root-mean square error (RMSE) of dataset with and without pure substances are also shown (with/without) to evaluate the spectral shape similarity.

Noise Immunity Evaluation

To evaluate the efficacy of HYPERION with THz spectroscopy systems in general, we further investigated the noise immunity performance of HYPERION. In this study, we apply additive white Gaussian noise (AWGN) to the measured THz time-domain electric field signals with a 0.001% - 0.1% standard deviation (SD) range (Supplementary Note 10). Moreover, the quinary THz dataset without pure substances is chosen to better evaluate real-world scenarios – low data purity and noisy condition. In those conditions, the data convex geometry will be distorted, resulting in the inaccuracy of unmixed material absorption spectra. To overcome the issue, HYPERION includes a general regularizer in the objective function to accommodate the distortion of convex geometry. With the general regularizer and the regularization parameter, λ\lambda, HYPERION does not force unmixed spectra to form the theoretically required regular structure in the noiseless case. Instead, it encourages the regular structure (see “Methods” section for the details). Based on the design of the regularizer, HYPERION is well-suitable for the penetrating-type THz data, which is normally ill-conditioned and noisy. As shown in Fig. 4(a), the SAM and RMSE of HYPERION unmixing results maintain at low levels while the noise SD is less than 0.1%. In the cases of noisy environment (e.g., top six highest SD conditions), HYPERION performs slightly inferior since the SNR of THz signals in the high-frequency regime drop dramatically at higher noise levels, which causes severe distortion of the data convex geometry and leads to a deteriorating impact of the unmixing performance. To further evaluate the noise immunity capability of HYPERION among unmixed chemicals and frequency ranges, the unmixed absorption spectra of the five pure substances are demonstrated at 0.1% noise SD (Fig. 4(b)-(f)). Under this severe noise condition, the unmixed absorption spectra through HYPERION still show great alignment with the noise-free ground truth in low-frequency bands. Within the 0.2 – 1 THz frequency range, the SAM and RMSE of the D-glucose absorption spectrum are 12.32 degrees and 2.50 cm-1, respectively. In higher frequency bands, a noticeable difference of every material absorption spectrum starts to show. Although the global trends and features are mostly well-preserved, the high-frequency spectral fluctuations due to the severe additive noise could be an issue for high-precision chemical identification applications.

In sum, HYPERION demonstrates its noise immunity feature with distorted convex geometry datasets. By utilizing this powerful feature, HYPERION not only can fit in a strictly controlled data acquisition environment but also be well suitable for real-world blind detection schemes under a noisy environment.

Refer to caption
Figure 4: The performance of HYPERION under different noise standard deviation (SD). (a) The root mean square error (RMSE) and spectral angle mapper (SAM) of HYPERION under different noise SD. (b-f) The unmixed absorption spectra and ground truth material absorption spectra of L-histidine, D-sucrose, D-glucose, D-lactose monohydrate, and L-tyrosine within 0.2 - 1.75 THz under SD = 0.1%.

Comparison with commonly used unmixing methods

To further evaluate the unmixing performance of the geometry-based HYPERION, we introduce the commonly used unmixing methods as a comparison: statistic-based nonnegative independent component analysis (nICA), algebra-based nonnegative matrix factorization (NMF), model-based hard modeling factor analysis (HMFA), and geometry-based successive projection algorithm (SPA) [30, 31, 20, 32, 21, 33]. nICA is one of the modified versions of ICA, where it imposes the nonnegative constraint on unmixed sources. With the nonnegative constraint, nICA can unmix the spectra of nonnegative values and lead to better performance than ICA since the fraction of incident radiation absorbed by the material is always nonnegative, leading to the nonnegative values of a material absorption spectrum. NMF is the unmixing method for blind separation problems designed according to the non-convex optimization approaches (Supplementary Note 13). HMFA is the unmixing method based on peak fitting and has demonstrated its efficacy in mid-infrared and THz bands [32, 19]. SPA uses the selection and projection in the vector space to unmix the blind sources. The comparison among HYPERION, nICA, NMF, HMFA, and SPA under different noise SD is shown in Fig. 5. From the unmixed material spectra and the RMSE values, HYPERION has outperformed the THz unmixing methods under a broad noise SD range from 0.001% to 0.1%. As expected, it comes from the three designed properties of HYPERION. First, HYPERION filters out most noise in the affine fitting step, where the data is projected into the lower dimensional convex hull by the singular value decomposition approach. Second, HYPERION uses convex geometry to find the LJE, which can further address the ill-conditioned nature of THz source signals. Third, the embedded general regularizer provides better accommodation to the noisy data by soft fitting for the distortion of convex data geometry in the objective function. As shown in Fig. 5(a, e), although nICA can resolve the spectral absorption peaks, the unmixed material absorption spectra by nICA are severely compressed. The reason is that the nICA method assumes all material absorption spectra are independent in the complex mixture system [31]. However, the summation of the chemical composition (in %) must equal to one, making it impossible for the sources to be statistically independent, and this sum-to-one constraint leads to an assumption of nICA independent assumption. In Fig. 5(b), NMF shows decent unmixing capability because it can converge to the local optimum [34, 35]. However, NMF is relatively sensitive to noisy conditions, probably due to its non-convexity nature. As a result, the unmixing performance of NMF on the THz dataset bounces dramatically at different unmixing trails and noise levels. Consequently, it is challenging to accurately resolve unmixing material spectra under a low SNR spectral regime. As shown in Fig. 5(f), the unmixed absorption spectra through NMF deviate more from the ground truth signal compared to HYPERION while frequency increases. In the case of HMFA (Fig. 5(c, g)), it is not capable of resolving the ill-conditioned data (i.e., D-Glucose and D-Sucrose) nor the spectral trends since the peak fitting approach inherently has difficulties in differentiating overlapped features. As shown in Fig. 5(d, h), SPA has the inferior unmixing performance in the absorption peak regions since the pure-pixel assumption does not hold in those datasets [36]. The convergence time of these methods is also provided in Supplementary Table. 8. Overall, HYPERION delivers superior performance than the four THz unmixing modalities in resolving broadband spectral features, unmixing ill-condition data, reducing noise impact, and preserving spectral details in a noisy environment.

Refer to caption
Figure 5: The performance comparison among HYPERION, NMF, HMFA and SPA under different noise SD. (a-d) The performance comparison between HYPERION with NMF, nICA, HMFA and SPA under different noise SD, respectively. The noise-free values are indicated with dotted lines. The unmixing performance is evaluated by root mean square error (RMSE) between the unmixed absorption spectra and the ground truth. (e-h) Five unmixed absorption spectra by NMF, nICA, HMFA and SPA are compared with the ground truth.

Application

In the sections above, HYPERION has demonstrated superior features on unmixing penetrating-type THz material spectra. Those accurately unmixed spectra can benefit several applications. To demonstrate one of the potential applications, we propose the combination of the raster-scanning THz-TDS system and HYPERION (THz HYPERION) to visualize the “secret recipes” of arbitrary objects at a remote distance. We use the unmixed spectra from the quinary dataset without pure substances and prepare a small test set of 15 chemical tablets with different compositions based on D-lactose monohydrate, D-glucose, and L-tyrosine. The 15 tablets are mounted on the paper board covered by a copper foil, which is utilized to block THz signals encoding the material information of the paper board. Every tablet is measured by the THz-TDS system with the configurations described earlier. The images in visible light and THz band images of the test set are shown in Fig. 6(a) and 6(b), respectively. The unmixed material spectra by the quinary dataset without pure substances are used to restore the chemical compositions of the 15 tablets in the test set by the following convex optimization problem:

min𝒓i3\displaystyle\min_{\bm{r}_{i}\in\mathbb{R}^{3}}{} 𝒙i𝑨𝒓i1\displaystyle~{}\left\|\bm{x}_{i}-\bm{A}\bm{r}_{i}\right\|_{1} (1)
s.t. 𝒓i0, 1𝑻𝒓i=1,\displaystyle~{}\ \bm{r}_{i}\geq 0,\ \bm{1^{T}}\bm{r}_{i}=1,

where i=1,2,,15i=1,2,\dots,15. 𝒙i\bm{x}_{i} and 𝒓i\bm{r}_{i} are the measured THz material absorption spectra of the test set and the optimization variables to estimate the corresponding ratio of compositions, respectively. Columns of 𝑨\bm{A} are the unmixed spectra from the quinary dataset without pure substances. The L1L^{1} norm is chosen to effectively decrease the influence of the unmixed spectra deviation in the absorption peak regions. Since equation (1) is a convex problem, the solver CVX [37] is adopted for the following demonstration (Supplementary Note 6). The qualitative comparisons between the ground truth and the estimated material composition map are shown in Fig. 6(c, d). As expected, the majority of tablet material compositions are highly aligned with the ground truth except very few tablets (i.e. 10th{}^{\text{th}}, 11th{}^{\text{th}} and 12th{}^{\text{th}} tablets). This experimental result indicates that THz HYPERION is applicable for estimating the arbitrary composition of complex mixture systems while the pure material spectra are accurately unmixed. In Fig. 6(e), we further evaluate the unmixed material composition map qualitatively. Most unmixed results present less than 20% inaccuracy of each composition difference compared with ground truth. The inaccuracy of estimating material composition is caused by the number of spectra in the quinary dataset, the high absorption spectrum similarity of D-lactose monohydrate and D-glucose, and the increased system noise in the high-frequency regime. It is worth mentioning that the accuracy of unmixed spectra would be greatly enhanced while the size of the measured dataset is increased. To this extent, the quality of unmixed material signatures can be significantly elevated by extending the measurement area and the pixel number of the multi-material objects under test. Furthermore, different regularizers can be further designed to match with HYPERION for specific application purposes. Overall, our demonstration shows the potential for some blind separation mapping applications, including pharmaceutical analysis, functional imaging, and space exploration.

Refer to caption
Figure 6: Blind chemical mapping by THz HYPERION. (a) The optical and (b) THz images of 15 tablets with different proportions composed of D-glucose, D-lactose monohydrate, and L-tyrosine, respectively. The THz image is scanned with the step size of 0.25 mm; the colormap shows the normalized field strength of the time-resolved transmitted THz signal. (c) The ground truth and (d) the experimental compositions of tablets. The color basis is blue, red, and green as shown in the ground truth 1st1^{\text{st}}, 11th11^{\text{th}}, and 15th15^{\text{th}} points, respectively. The color channels are linear combination of the selected color basis by the corresponding ratio of the chemical composition. 1st1^{\text{st}}, 11th11^{\text{th}}, and 15th15^{\text{th}} points are corresponding to D-glucose, D-lactose monohydrate and L-tyrosine, respectively. (e) The comparison between the estimated proportions by THz HYPERION and the ground truth (D-lactose monohydrate/L-tyrosine/D-glucose).

4 Discussion

In this paper, we propose HYperspectral Penetrating-type Ellipsoidal ReconstructION (HYPERION) to blindly unmix the sources (i.e., the THz absorption spectra of pure substances) from transmitted THz signals, which are usually noisy and low data purity. In HYPERION, affine fitting and simplex fitting are utilized to address the noisy and low data purity issues, respectively. In affine fitting, the noise is filtered out by projecting the data into the lower dimensional convex hull; in simplex fitting, HYPERION is encouraged to fit a regular simplex to the data convex geometry in the preconditioned space, where a relatively mild data purity is required (i.e., γ>1q1\gamma>\frac{1}{\sqrt{q-1}}). Additionally, the THz spectra with the overlapped material absorption peak in the measured THz bands, such as D-glucose and D-sucrose in 0.2 - 1.75 THz bands, can be ill-conditioned and lead to inferior performance. To address the ill-conditioned case, HYPERION transforms the transmitted THz signals into the preconditioned space, where the LJE information can easily separate the similar THz spectra. Based on the affine and simplex fittings and the transformation based on LJE information, HYPERION is capable of handling the ill-conditioned THz signals with noise and low data purity.

To evaluate the unmixing efficacy of HYPERION upon the ill-conditioned dataset with low data purity, we selected the quinary dataset without pure substances for the qualitative and quantitative analysis. In the ill-conditioned case, the THz material absorption spectra of D-glucose and D-sucrose are well-unmixed with RMSE of 2.59cm12.59~{}\text{cm}^{-1} and 3.08cm13.08~{}\text{cm}^{-1} compared with the ground truth spectra, respectively. In the low data purity case, even though the dataset only contains the highly mixed tablets (i.e., 5:5 for every pair of the pure substances), HYPERION still demonstrates the comparable unmixing performance of the dataset with pure substances. Note that, compared to the 1:9 mixing condition, the 5:5 mixing condition is indeed highly mixed. Additionally, for the noisy condition, we have also applied AWGN with 0.001% - 0.1% SD range to the quinary dataset without pure substances. In the SD range under 0.04%, HYPERION delivers similar unmixing performance; in the SD range higher than 0.04%, the unmixing performance is slightly inferior since the SNR of THz signals drops dramatically in the high-frequency regime. With the detailed evaluation, HYPERION reveals the great unmixing efficacy on complex mixture systems under noisy conditions.

With the support of complex mixture systems, HYPERION is capable of contributing to the applications, including pharmaceutical analysis, biomedical diagnosis [38], and art conservation [39]. Among those applications, pharmaceutical analysis is much more challenging since it requires identification of drugs and monitoring the impurity of drugs [40]. In this sense, we have demonstrated how HYPERION can further help analyze drug compositions and impurity levels without prior drug recipe information. In the demonstration, a material absorption spectrum dataset, composed of the mixtures of D-lactose monohydrate, D-glucose, and L-tyrosine with distinctive material composition, is adopted to simulate the different impurity levels of drugs. Upon the dataset, THz HYPERION can precisely estimate most of the mixture composition and deliver less than 20% inaccuracy of each composition compared with ground truth.

THz hyperspectral imaging is a great extent for HYPERION for future work since the LJE can be more accurately estimated from a large number of spectra. The task-oriented regularizers can also be designed for the spatial relation of pixels, such as spatial continuity and self-similarity. To this extent, more precise unmixed material absorption spectra and complex material mapping with a wide composition range can be well-resolved. Furthermore, the more efficient optimizer can significantly decrease the computation time, which utilizes a large number of spectra in hyperspectral images. In addition to the above future work for the more precise estimation of unmixed spectra and material mapping, the required information of tablet thickness in HYPERION can be relaxed by combining with the methods estimating the complex refractive index based on THz-TDS systems [17, 41]. By relaxing the required information of tablet thickness, HYPERION can further contribute to the applications where the sample thickness cannot be measured and obtained.

5 Methods

Löwner-John ellipsoid (LJE) for transmitted (penetrating-type) THz signals

In complex mixture systems, the material absorption spectra of the overlapped material absorption peaks in the measured THz bands can be quite similar (i.e., D-sucrose and D-glucose). It is the so-called ill-conditioned HU problem, having drawn attention from very recent machine learning literature [42]. As far as we know, the most effective solution for addressing the ill-conditioned HU is based on the Löwner-John ellipsoid (LJE) theory [42], which elegantly exploits the data convex structure (i.e., the convex hull of signature vectors contains the data vectors; cf. equation (2)). However, almost all the HU theories (including LJE) were developed for the reflecting-type signals, probably because the need for the HU techniques mainly comes from the satellite hyperspectral remote sensing, for which the hyperspectral signals are reflected from the objects to the satellite sensors [43]. To apply the LJE theory on the transmitted THz signals, our first task is to reveal the data convex structure of the THz signals.

Let mi(t),i=1,,nm_{i}(t),~{}i=1,\dots,n be the impulse response function of the ithi^{\text{th}} material that the input THz signal x(t)x(t) penetrates through, and let \otimes denote the convolution operator. The transmitted THz signal measured at time tt can then be modeled as

y(t)=x(t)m1(t)mn(t).y(t)=x(t)\otimes m_{1}(t)\otimes\cdots\otimes m_{n}(t).

By taking the Fourier transform on both sides, we then have

Y(f)=X(f)M1(f)Mn(f),Y(f)=X(f)~{}\!M_{1}(f)\cdots M_{n}(f),

where (Y,X,Mi)(Y,X,M_{i}) are the Fourier transforms of (y,x,mi)(y,x,m_{i}), respectively, and ff is the frequency index. If there are kk frequency samples f1,,fkf_{1},\dots,f_{k} in the THz spectral regions, then (Y,X,Mi)(Y,X,M_{i}) can be considered as kk-dimensional vectors (e.g., Y=[Y(f1),,Y(fk)]TY=[Y(f_{1}),\dots,Y(f_{k})]^{T}). Let us define the standardized data as Z(f)=log(|Y(f)||X(f)|)=i=1nlog(|Mi(f)|)Z(f)=\log\left(\frac{|Y(f)|}{|X(f)|}\right)=\sum_{i=1}^{n}{\log(|M_{i}(f)|)}. Also, the materials M1,,MnM_{1},\dots,M_{n} may not be distinct, we assume that there are qq distinct materials in the set {M1,,Mn}\{M_{1},\dots,M_{n}\}, and let N(i)N(i) be the set of indices corresponding to the ithi^{\text{th}} distinct material in the set {M1,,Mn}\{M_{1},\dots,M_{n}\}. By defining Hi(f)=jN(i)Mj(f)H_{i}(f)=\prod_{j\in N(i)}M_{j}(f), we can further simplify the expression as Z(f)=i=1qlog(|Hi(f)|)Z(f)=\sum_{i=1}^{q}{\log(|H_{i}(f)|)}. Here, we neglect the energy loss between the interfaces of materials since it is relatively small compared to energy loss in the lossy medium (Supplementary Note 4). According to the derived approximation of material absorption coefficients (Supplementary Note 4), we have log(|Hi(f)|)=12αi(f)di,i=1,,q\log(|H_{i}(f)|)=\frac{1}{2}\alpha_{i}(f)d_{i},~{}i=1,\dots,q, thereby leading to the standardized data representation

Z(f)=i=1q12αi(f)di,Z(f)=\sum_{i=1}^{q}\frac{1}{2}\alpha_{i}(f)d_{i},

where αi(f)\alpha_{i}(f) is the material absorption coefficient of the ithi^{\text{th}} distinct material at frequency ff, and did_{i} is the penetration depth of the ithi^{\text{th}} distinct material.

Naturally, we define the hyperspectral signature of the ithi^{\text{th}} material as

𝒔i[αi(f1),,αi(fk)]Tk,i=1,,q.\bm{s}_{i}\triangleq[\alpha_{i}(f_{1}),\dots,\alpha_{i}(f_{k})]^{T}\in\mathbb{R}^{k},~{}i=1,\dots,q.

Let dind_{i}^{n} be the penetration depth of the ithi^{\text{th}} distinct material in the nthn^{\text{th}} sample data. It is elegant to observe that if we have the information of the value of ln12(d1n++dqn)l_{n}\triangleq\frac{1}{2}(d_{1}^{n}+\dots+d_{q}^{n}), simply normalizing the nthn^{\text{th}} standardized data Zn[Zn(f1),,Zn(fk)]TZ_{n}\triangleq[~{}\!Z_{n}(f_{1}),\dots,Z_{n}(f_{k})~{}\!]^{T} by lnl_{n} can reveal the desired convex structure, i.e., Znlnconv{𝒔1,,𝒔q}\frac{Z_{n}}{l_{n}}\in\textrm{conv}\{{\bm{s}}_{1},\dots,{\bm{s}}_{q}\}. If the incident angle is zero, lnl_{n} is nothing but half of the thickness of the nthn^{\text{th}} sample. Therefore, in the proposed HYPERION algorithm, the \ell samples Z1,,ZZ_{1},\dots,Z_{\ell} are allowed to have different thickness 2l1,,2l2l_{1},\dots,2l_{\ell}. Based on different thickness of samples, the normalized/standardized data 𝒙n=[Zn(f1)ln,,Zn(fk)ln]T{\bm{x}}_{n}=\left[\frac{Z_{n}(f_{1})}{l_{n}},\dots,\frac{Z_{n}(f_{k})}{l_{n}}\right]^{T} have the desired data convex structure

𝒙1,,𝒙conv{𝒔1,,𝒔q}.{\bm{x}}_{1},\dots,{\bm{x}}_{\ell}\in\textrm{conv}\{{\bm{s}}_{1},\dots,{\bm{s}}_{q}\}. (2)

The aim is to recover the THz signatures 𝒔1,,𝒔q{\bm{s}}_{1},\dots,{\bm{s}}_{q} from the preprocessed THz transmitted signals 𝒙1,,𝒙{\bm{x}}_{1},\dots,{\bm{x}}_{\ell} for material identification.

Due to the convex structure in equation (2), HYPERION can use the information of LJE, defined as the maximum-volume ellipsoid inscribed in the standardized/normalized penetrating-type THz signals as shown in Fig. 7, for the source separation with mild requirement of data purity (Supplementary Note 5).

To be precise, the standardized/normalized transmitted THz data matrix 𝑿{\bm{X}} can be explicitly written as

𝑿[𝒙1,,𝒙]=𝑷𝑻𝚺k×,{\bm{X}}\triangleq[{\bm{x}}_{1},\dots,{\bm{x}}_{\ell}]={\bm{P}}{\bm{T}}\bm{\Sigma}\in\mathbb{R}^{k\times\ell},

where 𝑷k×q{\bm{P}}\in\mathbb{R}^{k\times q} with [𝑷]ijαj(fi)[{\bm{P}}]_{ij}\triangleq\alpha_{j}(f_{i}), 𝑻q×{\bm{T}}\in\mathbb{R}^{q\times\ell} with [𝑻]ij12dij[{\bm{T}}]_{ij}\triangleq\frac{1}{2}d_{i}^{j}, and 𝚺×\bm{\Sigma}\in\mathbb{R}^{\ell\times\ell} is the diagonal matrix with the ithi^{\text{th}} diagonal entry being 1li\frac{1}{l_{i}}. Then, computing the LJE of the THz signals 𝑿{\bm{X}} can be proven to be a convex optimization problem [44] as below:

(𝑭,𝒄)=\displaystyle(\bm{F}^{\star},\bm{c}^{\star})= argmax𝑭𝕊++q1,𝒄q1logdet(𝑭)\displaystyle\arg\max_{{\bm{F}}\in\mathbb{S}^{q-1}_{++},~{}\!{\bm{c}}\in\mathbb{R}^{q-1}}~{}\log\det({\bm{F}}) (3)
s.t.𝑭𝒃ihi𝒃iT𝒄,i=1,,H,\displaystyle{\rm s.t.}~{}~{}~{}~{}~{}~{}\lVert\bm{F}\bm{b}_{i}\rVert\leq h_{i}-\bm{b}_{i}^{T}\bm{c},~{}\forall i=1,\dots,H,

where 𝕊++p1\mathbb{S}^{p-1}_{++} is the positive semidefinite (PSD) cone, the halfspace parameters {(𝒃1,h1),,(𝒃H,hH)}\{({\bm{b}}_{1},h_{1}),\dots,({\bm{b}}_{H},h_{H})\} come from the \cal H-polytope representation of the transmitted THz data 𝑿{\bm{X}}, i.e., (𝑿)=(𝑷𝑻𝚺)(conv{𝒙1,,𝒙})=i=1H{𝒙𝒃iT𝒙hi}\mathcal{H}({\bm{X}})=\mathcal{H}({\bm{P}}{\bm{T}}\bm{\Sigma})\equiv\mathcal{H}(\textrm{conv}\{\bm{x}_{1},\dots,\bm{x}_{\ell}\})=\bigcap_{i=1}^{H}~{}\!\{\bm{x}\mid\bm{b}_{i}^{T}\bm{x}\leq h_{i}\} [45], and the optimal argument (𝑭,𝒄)(\bm{F}^{\star},\bm{c}^{\star}) gives the desired maximum-volume ellipsoid inscribed in the THz data 𝑿(𝑿){\bm{X}}\equiv\mathcal{H}({\bm{X}}) (i.e., LJE) as

(𝑭,𝒄){𝑭𝜶+𝒄𝜶1}q1.\mathcal{E}(\bm{F}^{\star},\bm{c}^{\star})\triangleq\{\bm{F}^{\star}\bm{\alpha}+\bm{c}^{\star}\mid\|\bm{\alpha}\|\leq 1\}\subseteq\mathbb{R}^{q-1}.

Note that we have assumed w.l.o.g. that the dimension of conv{𝒙1,,𝒙}\textrm{conv}\{\bm{x}_{1},\dots,\bm{x}_{\ell}\} is no greater than q1q-1, as implied by equation (2). The solving details for equation (3) are discussed in the Supplementary Note 7.

Refer to caption
Figure 7: Illustration of HYPERION. The illustration is taken in the ternary case for comprehensive visualization. The points, m1m_{1}, m2m_{2} and m3m_{3} are the unknown THz sources of the complex mixture system. The LJE is first obtained by the convex hull of data points in the original space. The transformation between the obtained LJE and a Euclidean ball, FF, and the center of the LJE, 𝒄\bm{c} are then utilized to transform data points to the preconditioned space for addressing the ill-conditioned case. Since the signatures of precondition operator, 𝒞(m1)\mathcal{C}(m_{1}), 𝒞(m2)\mathcal{C}(m_{2}) and 𝒞(m3)\mathcal{C}(m_{3}), should form corners of a simplex with provable regular structure in the noise-free case, HYPERION is encouraged to fit a regular simplex to the data convex geometry. To extract the unmixed signatures, the corners of the constructed simplex are transformed back to the original space by the inverse function of precondition operator.

LJE (𝑭,𝒄)\mathcal{E}(\bm{F}^{\star},\bm{c}^{\star}) for solving the transmitted THz unmixing problem

As aforementioned, THz data is ill-conditioned. As reported in [42], the information embedded in the LJE (𝑭,𝒄)\mathcal{E}({\bm{F}}^{\star},{\bm{c}}^{\star}) is critical in preconditioning the data 𝑿{\bm{X}}; specifically, LJE yields the preconditioned data

𝒞(𝑿)=(𝑭)(𝑷𝑻𝚺𝒄𝟏T)(q1)×,\mathcal{C}({\bm{X}})=({\bm{F}}^{\star})^{\dagger}\left({\bm{P}}{\bm{T}}\bm{\Sigma}-{\bm{c}}^{\star}\bm{1}_{\ell}^{T}\right)\in\mathbb{R}^{(q-1)\times\ell},

whose corresponding (preconditioned) THz signatures are easily verified to be 𝑺~=(𝑭)([𝒔1,,𝒔q]𝒄𝟏qT)\widetilde{{\bm{S}}}=({\bm{F}}^{\star})^{\dagger}\left([{\bm{s}}_{1},\dots,{\bm{s}}_{q}]-{\bm{c}}^{\star}\bm{1}_{q}^{T}\right); note that the precondition operator 𝒞()\mathcal{C}(\cdot) is applied columnwisely as 𝒞(𝒗)=(𝑭)(𝒗𝒄)\mathcal{C}({\bm{v}})=({\bm{F}}^{\star})^{\dagger}({\bm{v}}-{\bm{c}}^{\star}) for any given (column) vector 𝒗{\bm{v}}. Remarkably, estimating the preconditioned THz signatures 𝑺~\widetilde{{\bm{S}}} is much more friendly because, according to the LJE theory [42, Theorem 1], columns of 𝑺~\widetilde{{\bm{S}}} will form a regular simplex centered at the origin whenever the data purity γ>1q1\gamma>\frac{1}{\sqrt{q-1}}. As 𝑺~\widetilde{{\bm{S}}} is obtained, the THz signatures can be simply recovered as [𝒔1,,𝒔q]=(𝑭)𝑺~+𝒄𝟏qT[{\bm{s}}_{1},\dots,{\bm{s}}_{q}]=({\bm{F}}^{\star})^{\dagger}\widetilde{{\bm{S}}}+{\bm{c}}^{\star}\bm{1}_{q}^{T}. Therefore, we can focus on estimating 𝑺~\widetilde{{\bm{S}}} from the preconditioned THz data 𝒞(𝑿)\mathcal{C}({\bm{X}}) next.

According to the regular simplex structure, 𝑺~\widetilde{{\bm{S}}} can be characterized as [42, Equation (11)]

𝑺~=α𝑼T𝑺0,\widetilde{{\bm{S}}}=\alpha{\bm{U}}^{T}{\bm{S}}_{0}, (4)

where 𝑺0{\bm{S}}_{0} forms any unit-volume regular simplex centered at the origin 𝟎q1\bm{0}_{q-1}, with closed-form expression available in [42, Proposition 2]; the unitary matrix 𝑼{\bm{U}} (i.e., 𝑼T𝑼=𝑰q1{\bm{U}}^{T}{\bm{U}}=\bm{I}_{q-1}) and the scalar α>0\alpha>0 (with closed-form expression available in [42, Equation (12)]) are for rotating and scaling 𝑺0{\bm{S}}_{0} to fit 𝑺~\widetilde{{\bm{S}}}.

As the transmitted THz data is not just ill-conditioned but also noisy, the regular structure may not be strictly satisfied. Therefore, unlike [42], we did not force the regular structure. Instead, we just use such information to design a regularizer ϕ(𝑺~)𝑺~α𝑼T𝑺0F2\phi(\widetilde{{\bm{S}}})\triangleq\left\|\widetilde{{\bm{S}}}-\alpha\bm{U}^{T}{\bm{S}}_{0}\right\|_{F}^{2}, by minimizing which we just encourage the regular structure (rather than forcing it).

To finish the design of the HU criterion for the transmitted THz data, we need to design the data-fitting term. To this end, we need the following lemma, whose proof is given in Supplementary Note 8.

Lemma 1

The precondition operator 𝒞\mathcal{C} satisfies the relation of 𝒞(𝐗)=𝐒~𝐓𝚺\mathcal{C}({\bm{X}})=\widetilde{{\bm{S}}}{\bm{T}}\bm{\Sigma}. \square

By Lemma 1, we naturally design the data fitting term as 𝒞(𝑿)𝑺~𝑻𝚺F2\left\|\mathcal{C}({\bm{X}})-\widetilde{{\bm{S}}}{\bm{T}}\bm{\Sigma}\right\|_{F}^{2}, which, together with the regularizer ϕ\phi, yields the following HU criterion for transmitted THz signals:

min𝑺~,𝑻,𝚺,𝑼\displaystyle\min_{\widetilde{{\bm{S}}},~{}\!{\bm{T}},\bm{\Sigma},~{}\!\bm{U}}~{} 𝒞(𝑿)𝑺~𝑻𝚺F2+λϕ(𝑺~)+Iunitary(𝑼)\displaystyle~{}\left\|\mathcal{C}({\bm{X}})-\widetilde{{\bm{S}}}{\bm{T}}\bm{\Sigma}\right\|_{F}^{2}+\lambda\phi(\widetilde{{\bm{S}}})+I_{\textrm{unitary}}({\bm{U}}) (5)
s.t. 𝑻𝟎,𝚺𝟎,𝟏qT𝑻𝚺=𝟏T,\displaystyle~{}{\bm{T}}\geq\bm{0},~{}\bm{\Sigma}\geq\bm{0},~{}\bm{1}_{q}^{T}{\bm{T}}\bm{\Sigma}=\bm{1}_{\ell}^{T},

where λ:=1\lambda:=1 is the regularization parameter, and Iunitary()I_{\textrm{unitary}}(\cdot) is the indicator function of the set of unitary matrices. An algorithm for solving equation (5) is provided in Supplementary Note 9. Once equation (5) is solved, the column vectors of the optimal solution 𝑺~\widetilde{{\bm{S}}}^{\star} then serves as the estimates of the THz signatures in the preconditioned space as shown in Fig. 7. Based on the problem formulation and the solving detail, the algorithm, termed HYperspectral Penetrating-type Ellipsoidal ReconstructION (HYPERION), has been completed. Remarkably, HYPERION does not use any information about the pattern of resonant peaks of the signatures and is designed under a fully unsupervised setting.

ASOPS THz-TDS System

In the asynchronized optical sampling (ASOPS) THz-TDS system (Menlo TERA ASOPS, Menlo Systems, Germany) as shown in Fig. 8, two asynchronized Er-doped fiber femtosecond lasers are fed into an InGaAs/InAlAs THz photoconductive antenna emitter and an LT-InGaAs/InAlAs THz detector. The power, bandwidth, and dynamic range of the THz-TDS system are up to 60 µW, less than 4.5 THz, and greater than 80 dB. The generated THz radiation from the THz photoconductive antenna emitter then consecutively travels through two convex THz lenses with a focal length of 10 cm. The first convex lens is used for THz beam collimation, and the second convex lens is to focus the THz radiations on the tablets. The diameter of the focused beam is approximately 1.5 mm of full width at half maximum (FWHM). After penetrating through tablets, the transmitted THz waves contained material absorption information in THz range, then traveling through two identical convex THz lenses to the THz photoconductive antenna detector. THz photoconductive antenna detector is used to retrieve the time-resolved THz electric field and convert the electric field signal to the photocurrent. The connected transimpedance amplifier (TIA) then amplifies the induced photocurrent to voltage signals. The bandwidth and the gain of the TIA are 1.8 MHz and 106 (V/A), respectively. Each measurement contains a 100 ps time-domain trace with a 5 fs temporal resolution. To further increase the signal-to-noise ratio (SNR) of the system, we average 1,000 time-domain traces, which can effectively decrease the time-domain additive white Gaussian noise to 0.015% according to the law of large numbers (Supplementary Note 10). With the setting above, the system offers a dynamic range of over 65 dB from 0.1 THz to 3 THz.

Refer to caption
Figure 8: Experimental setup. The THz photoconductive antenna (PCA) emitter and detector are excited by two asynchronized femtosecond lasers with repetition rate (frepf_{\text{rep}}) of 100 MHz + 10 Hz and 100 MHz. The repetition rate difference (Δf\Delta f) of two lasers is 10 Hz. The generated THz beam is focused to a spot with a diameter of 1.5 mm by THz convex lenses (focal length: 10 cm), interacting with chemical tablets and being detected by the THz photoconductive antenna detector. The detected THz electric field is converted to photocurrent and is further amplified by a transimpedance amplifier (TIA). The computer converts the amplified analog signal to the digital data and presents the measured THz signals in both time and frequency domains.

Tablet Preparation

In this study, all tablets were composed of D-glucose, D-lactose monohydrate, L-histidine, L-tyrosine, and D-sucrose powders. D-glucose, D-lactose monohydrate, and D-sucrose powders were from Merck & Co., Inc. (Kenilworth, NJ, USA). L-histidine, L-tyrosine powders were from Acros Organics (Geel, Belgium). Each chemical was ground and mixed uniformly by the ball mill, ensuring that the Mie scattering is greatly reduced in the measured spectral characteristics. Followed by the pulverization, the chemical powders were poured into a tablet die made by a high-speed steel (HSS) for subsequent high-pressure compression. Powders were compressed by a hydraulic press (Specac, Orpington, U.K.) to form 3 mm-thick tablets (total mass: 0.3 g; thickness: 3.05±0.023.05\pm 0.02 mm; see Supplementary Table 6 for the details) under the pressure of 1,000 kg/cm2 for 15 seconds. Tablets were then mounted on a 0.6 mm thick polylactic acid (PLA) plate. The copper foil is covered around the tablets to block THz signals encoding the PLA plate material information. Additionally, the copper foil prevents the low-frequency diffraction effect caused by the edge of the PLA plate.

6 Supplementary Information

Supplementary Note 1: Measurement Protocol

Each spectrum is obtained by fast Fourier transform (FFT) of a 100 picoseconds (ps) THz time-domain trace based on an asynchronous optical sampling (ASOPS) THz-TDS system (see Supplementary Note 11). To increase the system dynamic range, 1,000 spectra of each mixture are measured and averaged, providing more than 60 dB dynamic range at 0.3 THz and 25 dB dynamic range at 1.75 THz. All the measurements are conducted at room temperature (23°C), one atmosphere (atm) pressure, and 60% humidity. Under this condition, the dataset contains water vapor absorption lines at 0.56 THz, 0.75 THz, 0.99 THz, 1.10 THz, 1.16 THz, 1.21 THz, 1.23 THz, 1.41 THz, 1.60 THz, 1.66 THz, and 1.72 THz within 0.2 - 1.75 THz frequency range (Supplementary Fig. 9), which contribute nearly 2.3% of the nonlinear region in the measured frequency bands.

Supplementary Note 2: Linear Mixing Model Validation

Whether the dataset follows the linear mixing model is essential since HYPERION utilizes the convex geometry of the model to unmix the material signatures. To validate the linear mixing model upon the dataset, principle component analysis (PCA) is one of the most suitable tools since data information can still be reserved after projecting to a lower-dimensional space [46]. To be precise, we project the non-pure mixture signatures to the optimal 2 dimensional (2D) space, which is constructed by the two eigenvectors corresponding to the two largest eigenvalues. Additionally, only non-pure mixture points are selected for the calculation of the 2D space since the assumption of pure substances is not held. After obtaining the optimal 2D plane, the data points of pure substances are further projected to the optimal 2D plane. In the evaluation of the linear mixing model, the quinary dataset with pure substances is selected due to the material variety. As shown in Supplementary Fig. 10, the high-dimensional data points are projected onto the two-dimensional plane. Except for the extreme cases, most of the data points in every subplot form the triangle in the projected 2D space, where three points lie at the vertex and three points fall at the center of the side. The misaligned data points are caused by the inaccurate estimation of the two-dimensional plane, which is caused by the measurement noise and nonlinear light-matter interaction. Based on the projected data points, the measured data points are highly aligned with the linear mixing model. Additionally, the triangles involving D-glucose and D-lactose are narrower since the signatures of the two chemicals are the ill-conditioned case. It is worthwhile to mention that the alignment accuracy between the model and measured data may be further enhanced by modeling the nonlinear matter-matter interaction into the linear mixing model.

Refer to caption
Figure 9: The THz frequency-domain spectrum of air. Water vapor absorbs THz wave at 0.56 THz, 0.75 THz, 0.99 THz, 1.10 THz, 1.16 THz, 1.21 THz, 1.23 THz, 1.41 THz, 1.60 THz, 1.66 THz, and 1.72 THz within 0.2 - 1.75 THz frequency range. The absorption lines are indicated by gray dotted lines.
Refer to caption
Figure 10: Prinpical component analysis (PCA) projection of the quinary dataset. Data points from the quinary dataset with pure substances are projected on a two-dimensional projection plane by PCA. Five pure substances are D-glucose, D-lactose monohydrate, D-sucrose, L-tyrosine, and L-histidine, which are abbreviated as G, L, S, T, and H.

Supplementary Note 3: Spectral Angle Mapper (SAM) and Root Mean Square Error (RMSE)

Spectral angle mapper (SAM) is typically used to evaluate the shape similarity between two spectra. The definition is as a following equation:

θ(s1,s2)=cos1(s1s2s12s22),\theta(s_{1},s_{2})=\cos^{-1}\left(\frac{s_{1}\cdot s_{2}}{\left\|s_{1}\right\|_{2}\cdot\left\|s_{2}\right\|_{2}}\right), (6)

where s1s_{1} and s2s_{2} are spectra.
Root mean square error (RMSE) does not only focus on the shape similarity but the band-wise differences. RMSE is defined as a following equation:

d(s1,s2)=s1s222n,d(s_{1},s_{2})=\sqrt{\frac{\left\|s_{1}-s_{2}\right\|_{2}^{2}}{n}}, (7)

where nn is the dimension of the spectrum.

Supplementary Note 4: Data Preprocessing

The THz-TDS system provides a time-domain waveform that profiles the interactions between THz radiation and measured tablets. The reference waveform Sref(t)S_{\text{ref}}(t) and the tablet waveform Stablet(t)S_{\text{tablet}}(t), where tt is the time index, refer to the measured waveform without and with a tablet, respectively. After obtaining reference waveform Sref(t)S_{\text{ref}}(t) and tablet waveform Stablet(t)S_{\text{tablet}}(t), the fast Fourier transform (FFT) was applied to convert time-domain waveforms to the frequency-domain reference spectrum Sref(ω)S_{\text{ref}}(\omega) and tablet spectrum Stablet(ω)S_{\text{tablet}}(\omega), where ω\omega is the angular frequency. To further extract the material absorption spectrum (α(ω)\alpha(\omega)) of the tablet, the thickness of the tablet is an important parameter for the derivation. The thickness of the tablet is measured by the electric ruler, which provides a 0.1 mm resolution. Compared to the thickness of the tablet (i.e.,  3 mm), this resolution of 0.1 mm introduces approximately 3% inaccuracy in the measurement. The material absorption spectra of the tablets are derived from the Fresnel equations and the THz wave propagation model based on homogeneous and planar materials. Furthermore, according to this derivation, the thickness measurement error of 3% will only introduce approximately 3% error in material absorption spectra. Since the particle size of the tablet powder is vastly smaller than the wavelength of the THz waves (Supplementary Note 16) Additionally, the thickness of the tablet is significantly larger than the effective wavelength of the THz signal in the 0.2 - 1.75 THz frequency range such that the Fabry-Pérot effect [17] does not introduce drastic damping to the measured spectrum.

Under this condition, we do not include the scattering influence and the Fabry-Pérot effect in this model. The Fresnel equations describe the transmission and the reflection of a THz wave traveling through an interface, which are based on the complex refractive index of the material, n~(ω)=n(ω)jκ(ω)\tilde{n}(\omega)=n(\omega)-j\kappa(\omega), where n(ω)n(\omega) is the real refractive index and κ(ω)\kappa(\omega) represents the extinction coefficient which is proportional to the material absorption coefficient, κ(ω)=[α(ω)λ)]/4π\kappa(\omega)=\left[\alpha(\omega)\cdot\lambda)\right]/4\pi, where λ\lambda is the wavelength. The material absorption coefficients describe the power loss when the THz waves travel through the material. Considering a typical incident THz wave propagates a material with thickness dd (Supplementary Fig. 11), the Fresnel equation [17] at an interface can be represented as

tab(ω)=2n~a(ω)n~a(ω)+n~b(ω),t_{ab}(\omega)=\frac{2\tilde{n}_{a}(\omega)}{\tilde{n}_{a}(\omega)+\tilde{n}_{b}(\omega)},
rab(ω)=n~a(ω)n~b(ω)n~a(ω)+n~b(ω),r_{ab}(\omega)=\frac{\tilde{n}_{a}(\omega)-\tilde{n}_{b}(\omega)}{\tilde{n}_{a}(\omega)+\tilde{n}_{b}(\omega)},

where tab(ω)t_{ab}(\omega) is the transmission coefficient of a THz wave from the region aa (which is air in our experiment) to region bb (which is the tablet in our experiment) and rab(ω)r_{ab}(\omega) is the reflection coefficient of a THz wave at the aa-bb interface. Additionally, n~a(ω)\tilde{n}_{a}(\omega) and n~b(ω)\tilde{n}_{b}(\omega)) represent the complex refractive index of region aa and bb, respectively. The THz wave propagation constant, pb(ω,d)p_{b}(\omega,d), is governed by

pb(ω,d)=exp[jn~b(ω)ωdc],p_{b}(\omega,d)=\exp\left[\frac{-j\tilde{n}_{b}(\omega)\omega d}{c}\right],

where cc is the speed of light. Consequently, the equation for the received THz tablet signal in the frequency domain can be represented as

Stablet(ω)\displaystyle S_{\text{tablet}}(\omega) =Sref(ω)tabtbapb(ω,d)(The Fresnel loss is included in tab and tba.)\displaystyle=S_{\text{ref}}(\omega)\cdot t_{ab}\cdot t_{ba}\cdot p_{b}(\omega,d)\quad\textbf{(The Fresnel loss is included in $t_{ab}$ and $t_{ba}$.)}
=Sref(ω)tabtbaexp[jn~b(ω)ωdc]\displaystyle=S_{\text{ref}}(\omega)\cdot t_{ab}\cdot t_{ba}\cdot\exp\left[\frac{-j\tilde{n}_{b}(\omega)\omega d}{c}\right]
=Sref(ω)tabtbaexp{j[nb(ω)jκb(ω)]ωdc}\displaystyle=S_{\text{ref}}(\omega)\cdot t_{ab}\cdot t_{ba}\cdot\exp\left\{\frac{-j[n_{b}(\omega)-j\kappa_{b}(\omega)]\omega d}{c}\right\}
=Sref(ω)tabtbaexp[κb(ω)ωdc]exp[jnb(ω)ωdc].\displaystyle=S_{\text{ref}}(\omega)\cdot t_{ab}\cdot t_{ba}\cdot\exp\left[\frac{\kappa_{b}(\omega)\omega d}{c}\right]\cdot\exp\left[\frac{-jn_{b}(\omega)\omega d}{c}\right].

The reference signal normalizes the tablet signal, and its amplitude can be obtained from equation (6)

|Stablet(ω)Sref(ω)|=|tabtbaexp[κb(ω)ωdc]||H(ω)|.\left|\frac{S_{\text{tablet}}(\omega)}{S_{\text{ref}}(\omega)}\right|=\left|t_{ab}\cdot t_{ba}\cdot\exp\left[\frac{\kappa_{b}(\omega)\omega d}{c}\right]\right|\triangleq\left|H(\omega)\right|. (8)

The extinction coefficient can be derived by taking the natural logarithm operation of equation (8)

ln(|H(ω)|)=κb(ω)ωdc+ln(|tabtba|),\ln(\left|H(\omega)\right|)=\frac{\kappa_{b}(\omega)\omega d}{c}+\ln(\left|t_{ab}\cdot t_{ba}\right|),
κb(ω)=c[ln(|H(ω)|)ln(|tabtba|)]ωd.\kappa_{b}(\omega)=\frac{c\left[\ln(\left|H(\omega)\right|)-\ln(\left|t_{ab}\cdot t_{ba}\right|)\right]}{\omega d}.

Therefore, the material absorption coefficient can be represented as

α(ω)=4πκ(ω)λ=2[ln(|H(ω)|)ln(|tabtba|)]d.\alpha^{{}^{\prime}}(\omega)=\frac{4\pi\kappa(\omega)}{\lambda}=\frac{2\cdot\left[\ln(\left|H(\omega)\right|)-\ln(\left|t_{ab}\cdot t_{ba}\right|)\right]}{d}. (9)

Based on the derivation, to acquire the material absorption spectrum, we have first to calculate the transmission coefficients, tabt_{ab} and tbat_{ba}. However, the interface power loss is relatively small since the thickness of the sample is greatly larger than the THz wavelength, and the tablet surface is smooth and flat compared to the THz wavelength. This relatively small interface power loss results in the high transmission coefficients of tabt_{ab} and tbat_{ba}. Thus, we can neglected the second term of the right-hand side in equation (9) to form the equation (10) since the multiplication of tabt_{ab} and tbat_{ba} approaches to 11.

α(ω)=2ln(|H(ω)|)d=2ln(|Stablet(ω)Sref(ω)|)d.\alpha^{{}^{\prime}}(\omega)=\frac{2\cdot\ln(\left|H(\omega)\right|)}{d}=\frac{2\cdot\ln\left(\left|\frac{S_{\text{tablet}}(\omega)}{S_{\text{ref}}(\omega)}\right|\right)}{d}. (10)

To further match the notation in blind source separation fields, the equation (10) can be expressed in ordinary frequency, ff, by the equation of ω=2πf\omega=2\pi f:

α(ω)=α(2πf)α(f).\alpha^{{}^{\prime}}(\omega)=\alpha^{{}^{\prime}}(2\pi f)\triangleq\alpha(f). (11)
Refer to caption
Figure 11: Illustration of THz wave traveling through a material. A THz wave transmission and reflection pathways through a planar, homogeneous material with a thickness dd. The THz signals shown in this figure are represented in time-domain.

Supplementary Note 5: Mild algorithmic requirement

The aforementioned problem (i.e., recovering 𝒔1,,𝒔q{\bm{s}}_{1},\dots,{\bm{s}}_{q} from their mixtures 𝒙1,,𝒙{\bm{x}}_{1},\dots,{\bm{x}}_{\ell}) has been studied in the machine learning literature; for example, in the successive projection algorithm (SPA) [47], the problem is solved under one additional assumption called separability assumption, i.e.,

𝒔1,,𝒔q{𝒙1,,𝒙},{\bm{s}}_{1},\dots,{\bm{s}}_{q}\in\{{\bm{x}}_{1},\dots,{\bm{x}}_{\ell}\},

meaning that, for each material, there is a pure observation solely composed of such material. Under the separability assumption, one can show that conv{𝒙1,,𝒙}=conv{𝒔1,,𝒔q}\textrm{conv}\{{\bm{x}}_{1},\dots,{\bm{x}}_{\ell}\}=\textrm{conv}\{{\bm{s}}_{1},\dots,{\bm{s}}_{q}\}, and hence identifying the signatures is nothing but seeking the vertices of the observable geometry conv{𝒙1,,𝒙}\textrm{conv}\{{\bm{x}}_{1},\dots,{\bm{x}}_{\ell}\}, which is polynomial-time solvable. Remarkably, by following the definition of data purity γ(0,1]\gamma\in(0,1], defined in [27], the separability assumption holds if, and only if, γ=1\gamma=1 (the highest data purity). So, the separability assumption is quite restricted, and such assumption is often violated in practical hyperspectral applications [27].

In fact, the separability assumption may not hold in our THz application, as a tablet typically involves highly mixed chemicals, implying a low data purity. In recent mathematics literature, the LJE theory has been proven to work well even with very low data purity; specifically, the signatures can be perfectly recovered under a much milder assumption γ>1q1\gamma>\frac{1}{\sqrt{q-1}} [22]. Also, as aforementioned, the LJE criterion is theoretically and experimentally proven to be robust against the ill-conditioning of the signatures, hence quite suitable for our THz applications. Therefore, in this paper, the convex LJE criterion is applied to the transmitted (penetrating-type) THz signals for the first time, to be detailed later.

We remark that there is other non-convex criterion that can also perform HU under the mild requirement of γ>1q1\gamma>\frac{1}{\sqrt{q-1}} [48, 27], but such criterion cannot achieve the theoretical bound 1q1\frac{1}{\sqrt{q-1}} in practice due to its non-convexity [22]. By contrast, the LJE criterion has the same theoretical bound, and in the meanwhile, it is experimentally shown to be able to achieve the bound due to its convexity nature [22].

Supplementary Note 6: Implementation of THz HYPERION

To restore the chemical compositions, the equation (12) should be solved for every tablet.

min𝒓i3\displaystyle\min_{\bm{r}_{i}\in\mathbb{R}^{3}}{} 𝒙i𝑨𝒓i1\displaystyle~{}\left\|\bm{x}_{i}-\bm{A}\bm{r}_{i}\right\|_{1} (12)
s.t. 𝒓i0, 1𝑻𝒓i=1.\displaystyle~{}\ \bm{r}_{i}\succeq 0,\ \bm{1^{T}}\bm{r}_{i}=1.

Since the equation (12) is convex, it can be solved by the solver CVX [37] as the following pseudo-code:

cvx¯begin\displaystyle{\rm cvx\underline{~{}~{}}begin}
variabler(3)nonnegative;\displaystyle~{}~{}~{}~{}~{}~{}{\rm variable~{}r(3)~{}nonnegative;}
minimize(norm(xAr,1));\displaystyle~{}~{}~{}~{}~{}~{}{\rm minimize(~{}norm(x-A*r,1)~{});}
subjectto\displaystyle~{}~{}~{}~{}~{}~{}{\rm subject~{}to}
sum(r)==1;\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}{\rm sum(r)==1;}
cvx¯end\displaystyle{\rm cvx\underline{~{}~{}}end}

Supplementary Note 7: How the LJE problem (13) can be solved in practice?

One of the core in HYPERION is the LJE problem:

(𝑭,𝒄)=\displaystyle(\bm{F}^{\star},\bm{c}^{\star})= argmax𝑭𝕊++q1,𝒄q1logdet(𝑭)\displaystyle\arg\max_{{\bm{F}}\in\mathbb{S}^{q-1}_{++},~{}\!{\bm{c}}\in\mathbb{R}^{q-1}}~{}\log\det({\bm{F}}) (13)
s.t.𝑭𝒃ihi𝒃iT𝒄,i=1,,H,\displaystyle{\rm s.t.}~{}~{}~{}~{}~{}~{}\lVert\bm{F}\bm{b}_{i}\rVert\leq h_{i}-\bm{b}_{i}^{T}\bm{c},~{}\forall i=1,\dots,H,

where 𝕊++p1\mathbb{S}^{p-1}_{++} is the positive semidefinite (PSD) cone, the halfspace parameters {(𝒃1,h1),,(𝒃H,hH)}\{({\bm{b}}_{1},h_{1}),\dots,({\bm{b}}_{H},h_{H})\} come from the \cal H-polytope representation of the data 𝑿{\bm{X}} and the optimal argument (𝑭,𝒄)(\bm{F}^{\star},\bm{c}^{\star}) gives the desired maximum-volume ellipsoid inscribed in the penetrating-type THz data.

We remark that there is an algorithm [42] that can be adapted to solve equation (13) exactly, but it is mainly designed to handle the case with million-scale constraints HH induced by NASA’s benchmark hyperspectral data [49]. In typical THz applications, such as measuring the chemicals in a given tablet, we just have a few samples, and hence we do not need to use such a sophisticated method. Instead, the general-purpose convex software CVX [37] is effective and fast when the problem size is not too large, and hence is suitable to be adopted to solve equation (13).

To adopt CVX, we need to implement the constraints and the objective function of equation (13). The constraints of equation (13) are the well-known Lorentz cone constraints “𝑭𝒃ihi𝒃iT𝒄\lVert\bm{F}\bm{b}_{i}\rVert\leq h_{i}-\bm{b}_{i}^{T}\bm{c}”, theoretically equivalent to require the LJE to lie within the penetrating-type data convex hull (𝑷𝑻𝚺)(conv{𝒙1,,𝒙})\mathcal{H}({\bm{P}}{\bm{T}}\bm{\Sigma})\equiv\mathcal{H}(\textrm{conv}\{\bm{x}_{1},\dots,\bm{x}_{\ell}\}). As for the objective function, for PSD matrix 𝑭{\bm{F}}, we observe that

logdet(𝑭)det(𝑭)(det(𝑭))1q1.\log\det({\bm{F}})\propto\det({\bm{F}})\propto(\det({\bm{F}}))^{\frac{1}{q-1}}.

So, the objective function can be implemented by the command “det¯rootn(){\rm det\underline{~{}~{}}rootn(\cdot)}” supported by the CVX package. Solving the LJE problem by the general-purpose convex software CVX normally reuqires some prior knowledge to transform the problem into the supporting functions. To this end, we provide the pseudo-code for the better understanding. The pseudo-code using CVX is as following:

cvx¯begin\displaystyle{\rm cvx\underline{~{}~{}}begin}
variableF(q1,q1)symmetric;\displaystyle~{}~{}~{}~{}~{}~{}{\rm variable~{}F(q-1,q-1)~{}symmetric;}
variablec(q1);\displaystyle~{}~{}~{}~{}~{}~{}{\rm variable~{}c(q-1);}
maximize(det¯rootn(F));\displaystyle~{}~{}~{}~{}~{}~{}{\rm maximize(~{}det\underline{~{}~{}}rootn(F)~{});}
subjectto\displaystyle~{}~{}~{}~{}~{}~{}{\rm subject~{}to}
fori=1:H\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}{\rm for~{}i=1:H}
norm(Fb(i,:),2)+b(i,:)c<=h(i);\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}{\rm norm(~{}F*b(i,:),~{}2~{})+b(i,:)^{\prime}*c<=h(i);}
end\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}{\rm end}
cvx¯end\displaystyle{\rm cvx\underline{~{}~{}}end}

It shows that CVX can handle a non-standard form of convex programming.

Supplementary Note 8: Proof of Lemma 1. \square

Following the definition of precondition operator 𝒞\mathcal{C}, we have

𝒞(𝑿)\displaystyle\mathcal{C}({\bm{X}}) (𝑭)(𝑷𝑻𝚺𝒄𝟏T)\displaystyle\triangleq({\bm{F}}^{\star})^{\dagger}\left({\bm{P}}{\bm{T}}\bm{\Sigma}-{\bm{c}}^{\star}\bm{1}_{\ell}^{T}\right)
=(𝑭)𝑷𝑻𝚺(𝑭)𝒄𝟏T\displaystyle=({\bm{F}}^{\star})^{\dagger}{\bm{P}}{\bm{T}}\bm{\Sigma}-({\bm{F}}^{\star})^{\dagger}{\bm{c}}^{\star}\bm{1}_{\ell}^{T}
=(𝑭)𝑷𝑻𝚺(𝑭)𝒄𝟏qT𝑻𝚺\displaystyle=({\bm{F}}^{\star})^{\dagger}{\bm{P}}{\bm{T}}\bm{\Sigma}-({\bm{F}}^{\star})^{\dagger}{\bm{c}}^{\star}\bm{1}_{q}^{T}{\bm{T}}\bm{\Sigma}
=(𝑭)[𝒔1,,𝒔q]𝑻𝚺(𝑭)𝒄𝟏qT𝑻𝚺\displaystyle=({\bm{F}}^{\star})^{\dagger}[{\bm{s}}_{1},\dots,{\bm{s}}_{q}]{\bm{T}}\bm{\Sigma}-({\bm{F}}^{\star})^{\dagger}{\bm{c}}^{\star}\bm{1}_{q}^{T}{\bm{T}}\bm{\Sigma}
=[(𝑭)([𝒔1,,𝒔q]𝒄𝟏qT)]𝑻𝚺\displaystyle=\left[({\bm{F}}^{\star})^{\dagger}\left([{\bm{s}}_{1},\dots,{\bm{s}}_{q}]-{\bm{c}}^{\star}\bm{1}_{q}^{T}\right)\right]{\bm{T}}\bm{\Sigma}
=𝑺~𝑻𝚺,\displaystyle=\widetilde{{\bm{S}}}{\bm{T}}\bm{\Sigma},

where we have used the observation that each column of the matrix 𝑻𝚺{\bm{T}}\bm{\Sigma} actually provides a set of convex combination coefficients [44], implying the equation “𝟏qT𝑻𝚺=𝟏T\bm{1}_{q}^{T}{\bm{T}}\bm{\Sigma}=\bm{1}_{\ell}^{T}”. \blacksquare


Supplementary Note 9: Algorithm details for solving the HYPERION

As there are four variables in equation (14),

min𝑺~,𝑻,𝚺,𝑼\displaystyle\min_{\widetilde{{\bm{S}}},~{}\!{\bm{T}},\bm{\Sigma},~{}\!\bm{U}}{} 𝒞(𝑿)𝑺~𝑻𝚺F2+λϕ(𝑺~)+Iunitary(𝑼)\displaystyle~{}\left\|\mathcal{C}({\bm{X}})-\widetilde{{\bm{S}}}{\bm{T}}\bm{\Sigma}\right\|_{F}^{2}+\lambda\phi(\widetilde{{\bm{S}}})+I_{\textrm{unitary}}({\bm{U}}) (14)
s.t. 𝑻𝟎,𝚺𝟎,𝟏qT𝑻𝚺=𝟏T,\displaystyle~{}{\bm{T}}\geq\bm{0},~{}\bm{\Sigma}\geq\bm{0},~{}\bm{1}_{q}^{T}{\bm{T}}\bm{\Sigma}=\bm{1}_{\ell}^{T},

solving it directly using the alternating optimization may not be effective. Considering that what we are really interested in is the THz signatures 𝑺~\widetilde{{\bm{S}}}, we do not need to separately optimize all the variables, allowing us to introduce the trick of changes of variables to reduce the number of block variables. Specifically, by letting 𝑻~=𝑻𝚺\widetilde{{\bm{T}}}={\bm{T}}\bm{\Sigma}, equation (14) can be reformulated as

min𝑺~,𝑻~,𝑼\displaystyle\min_{\widetilde{{\bm{S}}},~{}\!\widetilde{{\bm{T}}},~{}\!\bm{U}}{} 𝒞(𝑿)𝑺~𝑻~F2+λϕ(𝑺~)\displaystyle~{}\left\|\mathcal{C}({\bm{X}})-\widetilde{{\bm{S}}}\widetilde{{\bm{T}}}\right\|_{F}^{2}+\lambda\phi(\widetilde{{\bm{S}}}) (15)
s.t. 𝑻~𝟎,𝟏qT𝑻~=𝟏T,𝑼T𝑼=𝑰q1.\displaystyle~{}{\widetilde{{\bm{T}}}\geq\bm{0},~{}\bm{1}_{q}^{T}\widetilde{{\bm{T}}}=\bm{1}_{\ell}^{T},}~{}\bm{U}^{T}\bm{U}=\bm{I}_{q-1}.

After reducing the number of block variables, it is natural to solve equation (15) using alternating optimization that alternatively updates 𝑺~,𝑻~,𝑼\widetilde{{\bm{S}}},~{}\!\widetilde{{\bm{T}}},~{}\!\bm{U}, as detailed next.

The subproblem for solving 𝑺~\widetilde{{\bm{S}}} (with 𝑻~,𝑼\widetilde{{\bm{T}}},~{}\!\bm{U} fixed) can be explicitly written as

𝑺~=argmin𝑺~𝒞(𝑿)𝑺~𝑻~F2+λ𝑺~α𝑼T𝑺0F2.\displaystyle\widetilde{{\bm{S}}}^{\star}=\arg\min_{\widetilde{{\bm{S}}}}\left\|\mathcal{C}({\bm{X}})-\widetilde{{\bm{S}}}\widetilde{{\bm{T}}}\right\|_{F}^{2}+\lambda\left\|\widetilde{{\bm{S}}}-\alpha{\bm{U}}^{T}{\bm{S}}_{0}\right\|_{F}^{2}. (16)

By defining 𝒎~vec(𝑺~)\widetilde{\bm{m}}\triangleq\textrm{vec}(\widetilde{{\bm{S}}}), 𝒗[vec(𝒞(𝑿))T,λ(vec(α𝑼T𝑺0))T]T{\bm{v}}\triangleq[\textrm{vec}(\mathcal{C}({\bm{X}}))^{T},~{}\sqrt{\lambda}(\textrm{vec}(\alpha{\bm{U}}^{T}{\bm{S}}_{0}))^{T}]^{T} and 𝑷~[𝑻~𝑰q1,λ𝑰q(q1)]T\widetilde{{\bm{P}}}\triangleq[\widetilde{{\bm{T}}}\otimes\bm{I}_{q-1},~{}\sqrt{\lambda}\bm{I}_{q(q-1)}]^{T}, equation (16) can be simplified as vec(𝑺~)𝒎~=argmin𝒎~𝑷~𝒎~𝒗22\textrm{vec}(\widetilde{{\bm{S}}}^{\star})\triangleq\widetilde{\bm{m}}^{\star}=\arg\min_{\widetilde{\bm{m}}}~{}\|\widetilde{{\bm{P}}}\widetilde{\bm{m}}-{\bm{v}}\|_{2}^{2}, which closed-form solution can be derived as

𝒎~\displaystyle\widetilde{\bm{m}}^{\star} =(𝑷~T𝑷~)1(𝑷~T𝒗)\displaystyle=(\widetilde{{\bm{P}}}^{T}\widetilde{{\bm{P}}})^{-1}(\widetilde{{\bm{P}}}^{T}{\bm{v}})
={[𝑻~𝑻~T+λ𝑰q]1𝑰q1}vec(𝒞(𝑿)𝑻~T+λα𝑼T𝑺0)\displaystyle=\{[\widetilde{{\bm{T}}}\widetilde{{\bm{T}}}^{T}+\lambda\bm{I}_{q}]^{-1}\otimes\bm{I}_{q-1}\}\textrm{vec}(\mathcal{C}({\bm{X}})\widetilde{{\bm{T}}}^{T}+\lambda\alpha{\bm{U}}^{T}{\bm{S}}_{0})
=vec(𝑰q1[𝒞(𝑿)𝑻~T+λα𝑼T𝑺0][𝑻~𝑻~T+λ𝑰q]T),\displaystyle=\textrm{vec}\left(\bm{I}_{q-1}~{}\![\mathcal{C}({\bm{X}})\widetilde{{\bm{T}}}^{T}+\lambda\alpha{\bm{U}}^{T}{\bm{S}}_{0}]~{}\![\widetilde{{\bm{T}}}\widetilde{{\bm{T}}}^{T}+\lambda\bm{I}_{q}]^{-T}\right),

where \otimes and vec()\textrm{vec}(\cdot) denote Kronecker product and vectorization operator, respectively. Therefore, we have 𝑺~=[𝒞(𝑿)𝑻~T+λα𝑼T𝑺0][𝑻~𝑻~T+λ𝑰q]T\widetilde{{\bm{S}}}^{\star}=[\mathcal{C}({\bm{X}})\widetilde{{\bm{T}}}^{T}+\lambda\alpha{\bm{U}}^{T}{\bm{S}}_{0}][\widetilde{{\bm{T}}}\widetilde{{\bm{T}}}^{T}+\lambda\bm{I}_{q}]^{-T}.

Next, note that the subproblem for solving 𝑻~\widetilde{{\bm{T}}} (with 𝑺~,𝑼\widetilde{{\bm{S}}},~{}\!\bm{U} fixed) is

𝑻~=argmin𝑻~𝟎,𝟏qT𝑻~=𝟏T\displaystyle\widetilde{{\bm{T}}}^{\star}=\arg\min_{\widetilde{{\bm{T}}}\geq\bm{0},~{}\bm{1}_{q}^{T}\widetilde{{\bm{T}}}=\bm{1}_{\ell}^{T}}{} 𝒞(𝑿)𝑺~𝑻~F2,\displaystyle~{}\left\|\mathcal{C}({\bm{X}})-\widetilde{{\bm{S}}}\widetilde{{\bm{T}}}\right\|_{F}^{2},

which can be easily solved as the well-known fully-constrained least-squares problem [50, 51]. Finally, the subproblem for solving 𝑼{\bm{U}} (with 𝑺~,𝑻~\widetilde{{\bm{S}}},~{}\!\widetilde{{\bm{T}}} fixed) is a non-convex optimization problem

𝑼=argmin𝑼T𝑼=𝑰q1𝑺~α𝑼T𝑺0F2,\displaystyle{\bm{U}}^{\star}=\arg\min_{\bm{U}^{T}\bm{U}=\bm{I}_{q-1}}\left\|\widetilde{{\bm{S}}}-\alpha{\bm{U}}^{T}{\bm{S}}_{0}\right\|_{F}^{2}, (17)

but can be elegantly solved using the following inequalities for square matrices 𝑨,𝑩{\bm{A}},{\bm{B}}, i.e., trace(𝑨)𝑨=𝑨S1\textrm{trace}({\bm{A}})\leq\|{\bm{A}}\|_{*}=\|{\bm{A}}\|_{S^{1}} and 𝑨𝑩S1𝑨Sp𝑩Sq\|{\bm{A}}{\bm{B}}\|_{S^{1}}\leq\|{\bm{A}}\|_{S^{p}}\|{\bm{B}}\|_{S^{q}}, where 1p+1q=1\frac{1}{p}+\frac{1}{q}=1 (1p,q1\leq p,q\leq\infty), \|\cdot\|_{*} denotes nuclear norm, and Sp\|\cdot\|_{S^{p}} denotes Schatten pp-norm. Since 𝑼{\bm{U}} is unitary, the objective function of equation (17) can be simplified as trace(𝑼𝑺~𝑺0T)\textrm{trace}({\bm{U}}\widetilde{{\bm{S}}}{\bm{S}}_{0}^{T}), whose upper bound is derivable using the inequalities as trace(𝑼𝑺~𝑺0T)𝑼𝑺~𝑺0TS1𝑼S𝑺~𝑺0TS1=𝑺~𝑺0TS1\textrm{trace}({\bm{U}}\widetilde{{\bm{S}}}{\bm{S}}_{0}^{T})\leq\|{\bm{U}}\widetilde{{\bm{S}}}{\bm{S}}_{0}^{T}\|_{S^{1}}\leq\|{\bm{U}}\|_{S^{\infty}}\|\widetilde{{\bm{S}}}{\bm{S}}_{0}^{T}\|_{S^{1}}=\|\widetilde{{\bm{S}}}{\bm{S}}_{0}^{T}\|_{S^{1}}, and is achievable by 𝑼=𝑫1𝑫2T{\bm{U}}^{\star}={\bm{D}}_{1}{\bm{D}}_{2}^{T}, where 𝑫1,𝑫2{\bm{D}}_{1},{\bm{D}}_{2} are obtained from the singular value decomposition of 𝑺~𝑺0T=𝑫2𝚺𝑫1T\widetilde{{\bm{S}}}{\bm{S}}_{0}^{T}={\bm{D}}_{2}\bm{\Sigma}^{\prime}{\bm{D}}_{1}^{T}.

The algorithm, termed HYperspectral Penetrating-type Ellipsoidal ReconstructION (HYPERION), has been completed. Remarkably, HYPERION does not use any information about the pattern of resonant peaks and is designed under a fully blind setting.

Supplementary Note 10: Noise Modeling of the System

The noise in our THz-TDS system corresponds to additive white Gaussian noise (AWGN), whose standard deviation in the single time-domain trace is approximately 0.48% of the peak-to-peak amplitude as shown in Supplementary Fig. 12. According to the law of large numbers, the standard deviation of AWGN is linearly proportional to n\sqrt{n}, where nn is the number of time traces. In this regard, we applied the AWGN to our dataset to simulate the different levels of noise conditions. We also present this noise amplitude in terms of SNR by adopting the equation (18) from the speech signal processing field.

SNR=10log10(P(S)P(N)P(N)),\displaystyle\text{SNR}=10\log_{10}\left(\frac{P(S)-P(N)}{P(N)}\right), (18)

where P(S)P(S) and P(N)P(N) are the power of signal and noise, respectively.

Refer to caption
Figure 12: THz-TDS system noise distribution. The noise distribution in our THz-TDS system follows the Gaussian distribution. The noise standard deviation is 0.48% of the mean of the peak amplitude. Based on this information, we can infer the noise level in different average number configuration.

Supplementary Note 11: Asynchronized Optical Sampling (ASOPS) THz-TDS System

The conventional THz time-domain spectroscopy (THz-TDS) system is essential for many THz applications, such as remote sensing, material characterization, imaging, and defect inspection. As THz-TDS systems are capable of sampling ultrafast THz signals in the time domain, THz-TDS systems can provide more information, such as time delay, spectral phase, and spectral amplitude [52]. A conventional THz-TDS system utilizes a mechanical delay-line stage to introduce a time delay between two split femtosecond laser beams: the so-called pump beam and probe beam [53]. The pump beam is coupled to the THz photoconductive antenna emitter to generate the THz radiations. The probe beam is fed to the THz photoconductive antenna detector to gate received THz signals [54, 52]. With the accumulated time delay between two beams, a THz-TDS system can profile the THz time-domain signal, where the effective sampling frequency is typically tens of THz, which is the reciprocal of the time delay [53]. However, the sampling time and the quality of the time-domain signal are severely limited by the mechanical delay line stage condition, such as maximum moving velocity, position resolution, and operational stability. To address the limitation, the asynchronized optical sampling (ASOPS) THz-TDS system is developed to increase the sampling frequency. Additionally, the ASOPS THz-TDS system is more stable due to the elimination of the mechanical delay line stage interference [55]. By replacing the mechanical delay line stage with the two asynchronized femtosecond lasers, the time delay between the probe beam and the pump beam can be accumulated in a short time. More specifically, the sampling time of a single time trace, tsamplet_{\text{sample}}, can be decreased down to within millisecond timescale. Additionally, tsamplet_{\text{sample}} is related to the repetition rate of the lasers as

tsample=1frepA×1frepA1frepA1frepA+Δf1Δf,\displaystyle t_{\text{sample}}=\frac{\frac{1}{f^{A}_{\text{rep}}}\times\frac{1}{f^{A}_{\text{rep}}}}{\frac{1}{f^{A}_{\text{rep}}}-\frac{1}{f^{A}_{\text{rep}}+\Delta f}}\cong\frac{1}{\Delta f},

where frepAf^{A}_{\text{rep}} and Δf\Delta f are the repetition rate of the pump beam laser and repetition rate difference between the two femtosecond lasers. In our ASOPS THz-TDS system, the repetition rate of the pump beam laser is 100 MHz. To achieve the high time-domain sampling resolution and the fast sampling time simultaneously, the repetition rate difference is set as 10 Hz, delivering 5 fs sampling resolution and 100 ms sampling time. However, under this configuration, the size of the full single-time trace is considerably large, requiring high data transfer rate and large storage space. To prevent those issues, we have only taken 100 ps segment of the single time trace and set the sampling rate of the DAQ card as 20 MHz. Accordingly, our ASOPS THz-TDS system provides 0.01 THz resolution in the spectrum and 5 fs time-domain sampling resolution.

Supplementary Note 12: Frequency Band Selection

The detectable bands of the spectra are between 0.2 THz and 1.75 THz due to physical and hardware limitations [56, 57]; the THz spectral signal below 0.2 THz suffers from the physical and instrumental limitations, such as photoconductive material carrier lifetime, bandwidth, and sensing area of the THz photoconductive antennas; the THz spectral signal above 1.75 THz decays to the system noise level due to the absorption of the tablets. Consequently, we have only extracted THz bands between 0.2 THz and 1.75 THz as our valid spectra for the further unmixing experiments.

Supplementary Note 13: Nonnegative Matrix Factorization (NMF)

The nonnegative matrix factorization (NMF) is one of the commonly used methods to recover material signatures in the blind sense. To address the non-convexity of NMF, the result of NMF is taken from one of the 10 trials under different initialization based on the best root-mean-squared error (RMSE). The RMSE, DD, is defined as:

D=𝑿𝑾𝑯Fnm,\displaystyle D=\frac{\left\|\bm{X}-\bm{WH}\right\|_{F}}{\sqrt{nm}},

where 𝐗n×m\mathbf{X}\in\mathbb{R}^{n\times m}, 𝐖n×k\mathbf{W}\in\mathbb{R}^{n\times k}, and 𝐇k×m\mathbf{H}\in\mathbb{R}^{k\times m} are the target, signatures matrix, and abundance matrix, respectively. In the experiment, kk equals to 3 and 5 in the ternary and quinary cases, respectively. Additionally, we adopt the naive NMF in equation (19) since validity of the commonly-used regularizers, such as L2L^{2} norm or L1L^{1} norm, have not yet been verified with reasonable physical meaning. The alternating least square method is used to solve this NMF optimization problem:

min𝑾,𝑯\displaystyle\min_{{\bm{W}},~{}\!{\bm{H}}}{} 𝑿𝑾𝑯F\displaystyle~{}\left\|{\bm{X}}-{\bm{W}}{\bm{H}}\right\|_{F} (19)
s.t. 𝑾𝟎,𝑯𝟎.\displaystyle~{}{{\bm{W}}\geq\bm{0},~{}{\bm{H}}\geq\bm{0}.}

Supplementary Note 14: Linear Mixing Model

Since the sum-to-one criterion derived in the “Methods” section is based on the assumption where interface power loss is relatively small compared to material absorption. To validate the assumption in practical cases, we have measured the spectra of three mixture tablets with different mixing approaches as shown in Supplementary Fig. 14. The three tablets are composed of the same amount of substances. As shown in Supplementary Fig. 15, the deviation among three measured spectra by different mixing approaches is only less than 2 dB. Based on the experiment, it concludes that the spectrum is not severely affected by the mixing approaches.

In addition to the mixing approaches, the dynamic range of the THz-TDS system affects the accuracy of linear mixing model on the material absorption spectra since the largest detectable absorption coefficients αmax(f)\alpha_{\text{max}}(f) are determined by the THz-TDS system dynamic range [58]. More specifically, the performance of HYPERION will be decreased when the material absorption coefficients are larger than the αmax(f)\alpha_{\text{max}}(f). To this sense, it is important to compare the αmax(f)\alpha_{\text{max}}(f) of our ASOPS THz-TDS system to the pure substance absorption spectra. As shown in Supplementary Fig. 13, in the range between 1.3 THz and 1.5 THz, the material absorption coefficients of D-lactose monohydrate and D-glucose excess the αmax(f)\alpha_{\text{max}}(f). Except this region, the material absorption spectra are less than the αmax(f)\alpha_{\text{max}}(f). Although the measured absorption coefficients are not accurate in this region, HYPERION can still deliver the acceptable unmixing performance due to use of the material absorption information from all measured frequency bands.

Refer to caption
Figure 13: The comparison among absorption spectra of five materials and the largest detectable absorption spectra αmax\alpha_{max} over the range of 0.2 to 1.75 THz.

Supplementary Note 15: Composition Estimation Comparison of HYPERION, NMF, HMFA, nICA, and SPA

We compared the material composition estimation by the material absorption spectra unmixed from different methods, including HYPERION, NMF, HMFA, nICA, and SPA. The configuration of the test set is same as in the “Application” section. As shown in Supplementary Table. 1, HYPERION delivers superior performance of estimating material composition since HYPERION unmixs the accurate material absorption spectra. HMFA and nICA have the much inferior performance on material estimation since the unmixed material absorption spectra are much suppressed in magnitude. The unmixed material spectra by the quinary dataset without pure substances are used to restore the chemical compositions of the 15 tablets in the test set by the convex optimization problem in equation (20).

min𝒓i3\displaystyle\min_{\bm{r}_{i}\in\mathbb{R}^{3}}~{} 𝒙i𝑨𝒓i1\displaystyle~{}\left\|\bm{x}_{i}-\bm{A}\bm{r}_{i}\right\|_{1} (20)
s.t. 𝒓i0, 1𝑻𝒓i=1,\displaystyle~{}\ \bm{r}_{i}\geq 0,\ \bm{1^{T}}\bm{r}_{i}=1, (21)
Table 1: The composition estimation comparison between HYPERION, HMFA, NMF, SPA and nICA.
[Uncaptioned image]

Supplementary Note 16: Grain Size of Pure Substances and the Scattering Effect

Here, we will investigate the effect of the scattering on the unmixed spectra results. Since the effect of scattering is proportional to the grain size of the tablet [59], an optical microscopy with 1.25 numerical aperture (NA) is utilized to measure the grain sizes of different tablets. In the measurement, all chemicals follow the same grounding approach (see “Method” for the details) and are placed on the standard microscopy slide. For each chemical, the grain size is determined by taking the average of the grain size of 7 randomly selected grains. As shown in Supplementary Table. 2, the grain sizes of all chemicals fall in 1 µm range. Thus, the scattering effect in different chemicals falls in the same scale. To this sense, we can model the measured material absorption as the addition of true material absorption and the scattering effect as in equation (22).

umsr(f)=uabs(f)+usct(f),\displaystyle u_{msr}(f)=u_{abs}(f)+u_{sct}(f), (22)

where umsru_{msr}, uabsu_{abs}, and usctu_{sct} are the measured material absorption, material absorption and the scattering effect, respectively. Since the grain sizes of different chemicals fall in the same range, usct(f)u_{sct}(f) is considerably identical for different chemicals. In HYPERION, this constant shift usct(f)u_{sct}(f) will be calibrated in the affine fitting step and added to the unmixed spectrum in the transformation between the preconditioned space and original space. Thus, the scattering effect will not affect the unmixing performance of HYPERION when the grain sizes of different chemical fall in the same scale.

Table 2: The grain size of five pure substances. The measurement is performed by a optical microscopy with numerical aperture (NA) of 1.25.
[Uncaptioned image]
Refer to caption
Figure 14: Three different permutation approaches. The 1st{}^{\text{st}} way is to mix chemicals randomly, the 2nd{}^{\text{nd}} way is to mix chemicals in order, and the 3rd{}^{\text{rd}} way is to mix chemicals in a staggered manner.
Refer to caption
Figure 15: The THz frequency-domain spectrum of different permutation approaches. The amplitude deviation among three different permutation approaches (uniform mix, stack mix, and staggered mix) is less than 2 dB within 0.2 - 1.0 THz.
Table 3: The quantitative comparison between the HYPERION and NMF on five substances (HYPERION/NMF) under different noise levels. Performances of HYPERION and NMF are evaluated by the root mean square error (RMSE). Superior values are labeled in bold and are highlighted with different colors (HYPERION: orange; NMF: blue).
[Uncaptioned image]
Table 4: The quantitative comparison between the HYPERION and HMFA on five substances (HYPERION/HMFA) under different noise levels. Performances of HYPERION and HMFA are evaluated by the root mean square error (RMSE). Superior values are labeled in bold and are highlighted with different colors (HYPERION: orange; HMFA: blue).
[Uncaptioned image]
Table 5: The quantitative comparison between the HYPERION and nICA on five substances (HYPERION/nICA) under different noise levels. Performances of HYPERION and nICA are evaluated by the root mean square error (RMSE). Superior values are labeled in bold and are highlighted with different colors (HYPERION: orange; nICA: blue).
[Uncaptioned image]
Table 6: The quantitative comparison between the HYPERION and SPA on five substances (HYPERION/SPA) under different noise levels. Performances of HYPERION and SPA are evaluated by the root mean square error (RMSE). Superior values are labeled in bold and are highlighted with different colors (HYPERION: orange; SPA: blue).
[Uncaptioned image]
Table 7: Tablet thickness in the quinary dataset.
[Uncaptioned image]
Table 8: The convergence time of HYPERION, HMFA, NMF, SPA, and nICA.
[Uncaptioned image]

References

  • [1] Swen Koenig et al. “Wireless sub-THz communication system with high data rate” In Nat. Photonics 7.12 Nature Publishing Group, 2013, pp. 977–981
  • [2] Paul J Curran “Review Article Remote sensing methodologies and geography” In Int. J. Remote Sens. 8.9 Taylor & Francis, 1987, pp. 1255–1275
  • [3] Kazunori Akiyama et al. “First M87 event horizon telescope results. IV. Imaging the central supermassive black hole” In Astrophys. J. 875.1 IOP Publishing, 2019, pp. L4
  • [4] Noel Gorelick et al. “Google Earth Engine: Planetary-scale geospatial analysis for everyone” In Remote Sens. Environ. 202 Elsevier, 2017, pp. 18–27
  • [5] Daniele Faccio and Andreas Velten “A trillion frames per second: the techniques and applications of light-in-flight photography” In Rep. Prog. Phys. 81.10 IOP Publishing, 2018, pp. 105901
  • [6] TR Globus et al. “THz-spectroscopy of biological molecules” In J. Biol. Phys. 29.2 Springer, 2003, pp. 89–100
  • [7] Andrea G Markelz, Joseph R Knab, Jing Yin Chen and Yunfen He “Protein dynamical transition in terahertz dielectric response” In Chem. Phys. Lett. 442.4-6 Elsevier, 2007, pp. 413–417
  • [8] Yi-Chun Hung, Chia-Hsiang Lin, Feng-Yu Wang and Shang-Hua Yang “Penetrating Terahertz Hyperspectral Unmixing via Löwner-John Ellipsoid (THz HU-LJE): An Unsupervised Algorithm” In 2020 45th International Conference on Infrared, Millimeter, and Terahertz Waves (IRMMW-THz), 2020, pp. 1–2 IEEE
  • [9] Paul A George et al. “Ultrafast optical-pump terahertz-probe spectroscopy of the carrier relaxation and recombination dynamics in epitaxial graphene” In Nano Lett. 8.12 ACS Publications, 2008, pp. 4248–4251
  • [10] Kodo Kawase, Yuichi Ogawa, Yuuki Watanabe and Hiroyuki Inoue “Non-destructive terahertz imaging of illicit drugs using spectral fingerprints” In Opt. Express 11.20 Optical Society of America, 2003, pp. 2549–2554
  • [11] John F Federici et al. “THz imaging and sensing for security applications—explosives, weapons and drugs” In Semicond. Sci. Technol. 20.7 IOP Publishing, 2005, pp. S266
  • [12] Andreas J Huber et al. “Terahertz near-field nanoscopy of mobile carriers in single semiconductor nanodevices” In Nano Lett. 8.11 ACS Publications, 2008, pp. 3766–3770
  • [13] Tyler L Cocker et al. “An ultrafast terahertz scanning tunnelling microscope” In Nat. Photonics 7.8 Nature Publishing Group, 2013, pp. 620–625
  • [14] Charles A Schmuttenmaer “Exploring dynamics in the far-infrared with terahertz spectroscopy” In Chem. Rev. 104.4 ACS Publications, 2004, pp. 1759–1780
  • [15] OP Cherkasova, MM Nazarov, M Konnikova and AP Shkurinov “THz spectroscopy of bound water in glucose: Direct measurements from crystalline to dissolved state” In J. Infrared Millim. Terahertz Waves 41.9 Springer, 2020, pp. 1057–1068
  • [16] PC Upadhya, YC Shen, AG Davies and EH Linfield “Terahertz time-domain spectroscopy of glucose and uric acid” In J. Biol. Phys. 29.2 Springer, 2003, pp. 117–121
  • [17] Timothy D Dorney, Richard G Baraniuk and Daniel M Mittleman “Material parameter estimation with terahertz time-domain spectroscopy” In J. Opt. Soc. Am. A 18.7 Optical Society of America, 2001, pp. 1562–1571
  • [18] Hua Zhong, Albert Redo-Sanchez and X-C Zhang “Identification and classification of chemicals using terahertz reflective spectroscopic focal-plane imaging system” In Opt. Express 14.20 Optical Society of America, 2006, pp. 9130–9141
  • [19] Xian Li et al. “Component spectra extraction from terahertz measurements of unknown mixtures” In Appl. Opt. 54.30 Optical Society of America, 2015, pp. 8925–8934
  • [20] Yehao Ma et al. “THz spectral data analysis and components unmixing based on non-negative matrix factorization methods” In Spectrochim. Acta A Mol. Biomol. Spectrosc. 177 Elsevier, 2017, pp. 49–57
  • [21] Soner Balci et al. “Independent component analysis applications on THz sensing and imaging” In Image Sensing Technologies: Materials, Devices, Systems, and Applications III 9854, 2016, pp. 98540K International Society for OpticsPhotonics
  • [22] Chia-Hsiang Lin et al. “Maximum Volume Inscribed Ellipsoid: A New Simplex-Structured Matrix Factorization Framework via Facet Enumeration and Convex Optimization” In SIAM J. Imaging Sci. 11.2, 2018, pp. 1651–1679
  • [23] Qi Wei, José Bioucas-Dias, Nicolas Dobigeon and Jean-Yves Tourneret “Hyperspectral and multispectral image fusion based on a sparse representation” In IEEE Trans. Geosci. Remote Sens. 53.7 IEEE, 2015, pp. 3658–3668
  • [24] Laetitia Loncan et al. “Hyperspectral pansharpening: A review” In IEEE Geosci. Remote Sens. 3.3 IEEE, 2015, pp. 27–46
  • [25] Kacper Nowak et al. “Selected aspects of terahertz spectroscopy in pharmaceutical sciences.” In Acta Pol. Pharm. 72.5, 2015, pp. 851–866
  • [26] John F Gamble et al. “Investigation into the degree of variability in the solid-state properties of common pharmaceutical excipients—anhydrous lactose” In AAPS PharmSciTech 11.4 Springer, 2010, pp. 1552–1557
  • [27] Chia-Hsiang Lin et al. “Identifiability of the Simplex Volume Minimization Criterion for Blind Hyperspectral Unmixing: The No Pure-Pixel Case” In IEEE Trans. Geosci. Remote Sens. 53.10, 2015, pp. 5530–5546
  • [28] Barmak Heshmat et al. “Terahertz scattering and water absorption for porosimetry” In Opt. Express 25.22 Optical Society of America, 2017, pp. 27370–27385
  • [29] Chia-Hsiang Lin and José M Bioucas-Dias “Nonnegative blind source separation for ill-conditioned mixtures via John ellipsoid” In IEEE Trans. Neural Netw. Learn. Syst. 32.5 IEEE, 2020, pp. 2209–2223
  • [30] Michael W Berry et al. “Algorithms and applications for approximate nonnegative matrix factorization” In Comput. Stat. Data Anal. 52.1 Elsevier, 2007, pp. 155–173
  • [31] Mark D Plumbley “Algorithms for nonnegative independent component analysis” In IEEE Trans. Neural Netw. 14.3 IEEE, 2003, pp. 534–543
  • [32] E Kriesten et al. “Identification of unknown pure component spectra by indirect hard modeling” In Chemometr. Intell. Lab. Syst. 93.2 Elsevier, 2008, pp. 108–119
  • [33] Sanjeev Arora et al. “A practical algorithm for topic modeling with provable guarantees” In International conference on machine learning, 2013, pp. 280–288 PMLR
  • [34] Chih-Jen Lin “On the convergence of multiplicative update algorithms for nonnegative matrix factorization” In IEEE Trans. Neural Netw. 18.6 IEEE, 2007, pp. 1589–1596
  • [35] Russell Albright et al. “Algorithms, initializations, and convergence for the nonnegative matrix factorization”, 2006
  • [36] Wing-Kin Ma et al. “A signal processing perspective on hyperspectral unmixing: Insights from remote sensing” In IEEE Signal Process. Mag. 31.1 IEEE, 2013, pp. 67–81
  • [37] M. Grant and S. Boyd “CVX: Matlab Software for Disciplined Convex Programming, version 1.21”, http://cvxr.com/cvx/, 2011
  • [38] Qiushuo Sun et al. “Recent advances in terahertz technology for biomedical applications” In Quant. Imaging. Med. Surg. 7.3 AME Publications, 2017, pp. 345
  • [39] C Seco-Martorell et al. “Goya’s artwork imaging with Terahertz waves” In Opt. Express 21.15 Optical Society of America, 2013, pp. 17800–17805
  • [40] X Yu Lawrence et al. “Applications of process analytical technology to crystallization processes” In Adv. Drug Deliv. Rev. 56.3 Elsevier, 2004, pp. 349–369
  • [41] Lionel Duvillaret, Frédéric Garet and Jean-Louis Coutaz “Highly precise determination of optical constants and sample thickness in terahertz time-domain spectroscopy” In Appl. Opt. 38.2 Optical Society of America, 1999, pp. 409–415
  • [42] Chia-Hsiang Lin and José M Bioucas-Dias “Nonnegative blind source separation for ill-conditioned mixtures via John ellipsoid” In IEEE Trans. Neural. Netw. Learn. Syst. IEEE, 2020
  • [43] Chia-Hsiang Lin, Chong-Yung Chi, Yu-Hsiang Wang and Tsuang-Han Chan “A fast hyperplane-based minimum-volume enclosing simplex algorithm for blind hyperspectral unmixing” In IEEE Trans. Signal Process. 64.8 IEEE, 2016, pp. 1946–1961
  • [44] Chong-Yung Chi, Wei-Chiang Li and Chia-Hsiang Lin “Convex Optimization for Signal Processing and Communications: From Fundamentals to Applications” CRC Press, Boca Raton, FL, 2017
  • [45] C.. Barber, D.. Dobkin and H. Huhdanpaa “The quickhull algorithm for convex hulls” In ACM Trans. Math. Softw. 22.4 ACM, 1996, pp. 469–483
  • [46] Fabian Pedregosa et al. “Scikit-learn: Machine learning in Python” In J. Mach. Learn. Res. 12 JMLR. org, 2011, pp. 2825–2830
  • [47] Sanjeev Arora et al. “A practical algorithm for topic modeling with provable guarantees” In International Conference on Machine Learning, 2013
  • [48] M.. Craig “Minimum-volume transforms for remotely sensed data” In IEEE Trans. Geosci. Remote Sens. 32.3, 1994, pp. 542–552
  • [49] Robert O Green et al. “Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS)” In Remote Sens. Environ. 65.3 Elsevier, 1998, pp. 227–248
  • [50] J.. Bioucas-Dias and M… Figueiredo “Alternating direction algorithms for constrained sparse regression: Application to hyperspectral unmixing” In 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 2010, pp. 1–4 DOI: 10.1109/WHISPERS.2010.5594963
  • [51] D. Heinz and C.-I. Chang “Fully constrained least squares linear mixture analysis for material quantification in hyperspectral imagery” In IEEE Trans. Geosci. Remote Sens. 39.3, 2001, pp. 529–545
  • [52] P Uhd Jepsen, David G Cooke and Martin Koch “Terahertz spectroscopy and imaging–Modern techniques and applications” In Laser Photonics Rev. 5.1 Wiley Online Library, 2011, pp. 124–166
  • [53] Jens Neu and Charles A Schmuttenmaer “Tutorial: An introduction to terahertz time domain spectroscopy (THz-TDS)” In J. Appl. Phys. 124.23 AIP Publishing LLC, 2018, pp. 231101
  • [54] Nathan M Burford and Magda O El-Shenawee “Review of terahertz photoconductive antenna technology” In Opt. Eng. 56.1 International Society for OpticsPhotonics, 2017, pp. 010901
  • [55] Christof Janke et al. “Asynchronous optical sampling for high-speed characterization of integrated resonant terahertz sensors” In Opt. Lett. 30.11 Optical Society of America, 2005, pp. 1405–1407
  • [56] Masahiko Tani, Michael Herrmann and Kiyomi Sakai “Generation and detection of terahertz pulsed radiation with photoconductive antennas and its application to imaging” In Meas. Sci. Technol. 13.11 IOP Publishing, 2002, pp. 1739
  • [57] Sascha Preu et al. “Tunable, continuous-wave terahertz photomixer sources and applications” In J. Appl. Phys. 109.6 American Institute of Physics, 2011, pp. 4
  • [58] Peter Uhd Jepsen and Bernd M Fischer “Dynamic range in terahertz time-domain transmission and reflection spectroscopy” In Optics letters 30.1 Optical Society of America, 2005, pp. 29–31
  • [59] Aparajita Bandyopadhyay et al. “Effects of scattering on THz spectra of granular solids” In International Journal of Infrared and Millimeter Waves 28.11 Springer, 2007, pp. 969–978