
Learning to Identify Semi-Visible Jets

Taylor Faucett Department of Physics and Astronomy, University of California, Irvine CA    Shih-Chieh Hsu Department of Physics, University of Washington, Seattle WA    Daniel Whiteson Department of Physics and Astronomy, University of California, Irvine CA
Abstract

We train a network to identify jets with fractional dark decay (semi-visible jets) using the pattern of their low-level jet constituents, and explore the nature of the information used by the network by mapping it to a space of jet substructure observables. Semi-visible jets arise from dark matter particles which decay into a mixture of dark sector (invisible) and Standard Model (visible) particles. Such objects are challenging to identify due to the complex nature of jets and the alignment of the momentum imbalance from the dark particles with the jet axis, and such jets do not yet benefit from dedicated, theoretically-motivated jet substructure observables. A deep network operating on jet constituents is used as a probe of the available information and indicates that classification power not captured by current high-level observables arises primarily from low-$p_{\textrm{T}}$ jet constituents.

I Introduction

The microscopic nature of dark matter (DM) remains one of the most pressing open questions in modern physics Bertone and Hooper (2018); Bertone and Tait (2018), motivating a robust program of experimental searches for evidence of its interaction with the visible sector Aaltonen et al. (2012); collaboration (2013, 2012); Aad et al. (2013); Chatrchyan et al. (2012); Bai and Tait (2013); Petriello et al. (2008); Ferreira de Lima et al. (2017); Aaij et al. (2018) described by the Standard Model (SM). These experiments typically assume that DM is neutral, stable and couples weakly to SM particles; in collider settings, this predicts detector signatures in which weakly-produced DM particles are invisible, evidenced only by an imbalance of momentum transverse to the beam. No evidence of DM interactions has been observed to date.

However, while these assumptions are reasonable, the lack of observation motivates the exploration of scenarios in which one or more of them are relaxed. Specifically, if DM contains complex strongly-coupled hidden sectors, such as appear in many Hidden-Valley models Strassler and Zurek (2007a); Han et al. (2008), it may lead to the production of stable or meta-stable dark particles within hadronic jets Cohen et al. (2015, 2017); Kar and Sinha (2021); Bernreuther et al. (2021). Depending on the portion of the jet which results in dark-sector hadrons, it may be only semi-visible to detectors, leading to a unique pattern of energy deposits, or jet substructure.

A robust literature exists for the identification of jet substructure Larkoski et al. (2014a, 2013); Kogler et al. (2019); Lu et al. (2022), with applications to boosted $W$-boson Baldi et al. (2016); Chen et al. (2020); collaboration (2019), Higgs boson Chung et al. (2021) and top-quark tagging collaboration (2019). Typically, observables are designed to discriminate between the energy deposition patterns left by jets which result from a single hard parton and those left by jets which result from several collimated hard partons, as can be produced in the decay of a massive boosted particle. While these observables have some power when adapted to the task of identifying semi-visible jets Kar and Sinha (2021); Cohen et al. (2020), no observables have yet been specifically designed to be sensitive to the unique energy patterns of semi-visible jets.

In parallel, the rapid application of machine learning to the analysis of jet energy depositions Larkoski et al. (2013); Komiske et al. (2018); Baldi et al. (2016) has demonstrated that jet tagging strategies, including those for semi-visible jets, can be learned directly from lower-level jet constituents without the need to form physics-motivated high-level observables Bernreuther et al. (2021); Dillon et al. (2022). Such learned models are naturally challenging to interpret and validate, and their uncertainties are difficult to quantify, especially given the high-dimensional nature of their inputs. In the case of semi-visible jets, extra care must be taken when drawing conclusions from low-level details, many of which may depend on specific theoretical parameter choices as well as the approximations made during the modeling of hadronization Cohen et al. (2020). However, techniques have recently been demonstrated Faucett et al. (2021); Collado et al. (2021a, b); Bradshaw et al. (2022) to translate a learned model into interpretable high-level observables, which can provide guidance regarding the nature and robustness of the information being used for classification.

In this paper, we present a study of machine learning models trained to distinguish semi-visible jets from QCD background jets using the patterns of their low-level jet constituents. We compare the performance of models which use low-level constituents to those which use the set of existing high-level (HL) observables, to explore where the existing HL observables do and do not capture all of the relevant information. Where gaps exist, we attempt to translate Faucett et al. (2021) low-level networks into networks which use a small set of interpretable observables that replicate their decisions and performance. Interpretation of these observables can yield insight into the nature of the energy deposition inside semi-visible jets.

II Semi-visible jets

Following Refs. Cohen et al. (2015); Kar and Sinha (2021), we consider pair production of dark-sector quarks of several flavors $(\chi_{i}=\chi_{1,2})$ via a messenger sector which features a $Z^{\prime}$ gauge boson in an $s$-channel process (Fig. 1(a)) or a scalar mediator $\varphi$ in a $t$-channel process (Fig. 1(b)) that couples to both the SM and DM sectors and leads to a dijet signature. The dark quarks produce QCD-like dark showers, involving many dark quarks and gluons which produce dark hadrons, some of which are stable or meta-stable and some of which decay into SM hadrons via an off-shell process.

(a) $s$-channel
(b) $t$-channel
Figure 1: Feynman diagrams for $s$-channel and $t$-channel pair-production of dark-sector quarks $\chi_{i}$ which lead to semi-visible jet production.

The detector signature of the resulting semi-visible jet (SVJ) depends on the lifetime and stability of the dark hadrons, leading to several possibilities (see Fig. 2). Though the physics is complex and depends on the details of the dark sector structure, a description of the dark and SM hadrons produced by a DM model quark can be encapsulated in the quantity $r_{\textrm{inv}}$, the ratio of stable dark hadrons to all hadrons in the jet:

r_{\textrm{inv}} \equiv \left\langle \frac{\text{\# of stable dark hadrons}}{\text{\# of hadrons}} \right\rangle   (1)
(a) Rapid Decay (Visible), $r_{\textrm{inv}}=0$
(b) Stable Dark Jet (Invisible), $r_{\textrm{inv}}=1$
(c) Fractional Decay (Semi-Visible), $r_{\textrm{inv}}\in(0,1)$
Figure 2: Depictions of jets produced with varying visible (SM, solid red) and invisible (dark sector, dashed black) components.

An invisible fraction of $r_{\textrm{inv}}=0$ corresponds to a dark quark which produces a jet consisting of only visible hadrons, as depicted in the Rapid Decay example of Fig. 2(a). Alternatively, an invisible fraction of $r_{\textrm{inv}}=1$ describes a stable dark jet (Fig. 2(b)), in which the dark quark hadronizes exclusively in the dark sector. For any intermediate value of $r_{\textrm{inv}}$, jets contain both a visible and an invisible fraction, leading to missing transverse energy $\cancel{E}_{\textrm{T}}$ along the jet axis (Fig. 2(c)).

III Sample Generation and Data Processing

Samples of simulated events with semi-visible jets are generated using the modified Hidden Valley Strassler and Zurek (2007b) model described in Cohen et al. (2015, 2017) for both an $s$-channel (Fig. 1(a)) and a $t$-channel (Fig. 1(b)) process. Samples of simulated $pp\to Z^{\prime}\to\chi_{1}\overline{\chi}_{1}$ and $pp\to\varphi\to\chi_{1}\overline{\chi}_{1}$ events are produced in proton-proton collisions at a center-of-mass energy of $\sqrt{s}=13\,\text{TeV}$ in MadGraph5 Alwall et al. (2011) (v2.6.7) with xqcut=100 and the NNPDF2.3 LO PDF set Skands et al. (2014). The mediator mass is set to $1.5\,\textrm{TeV}$ and the dark quark mass to $M_{\chi_{1}}=10\,\textrm{GeV}$. Up to two extra hard jets due to radiation are generated and MLM-matched Mangano et al. (2002), to facilitate comparison with existing studies. The invisible fraction, showering and hadronization are handled by Pythia8 v8.244 Sjöstrand et al. (2015). In particular, the following parameters are set: the dark confinement scale $\Lambda_{d}=5\,\textrm{GeV}$; the number of dark colors $N_{c}=2$; the number of dark flavors $N_{f}=1$; the intermediate dark meson $\rho_{d}$ mass $m_{\rho_{d}}=20\,\textrm{GeV}$; and the dark matter $\pi_{d}$ mass $m_{d}=9.9\,\textrm{GeV}$. Distinct sets were generated for invisible fractions of $r_{\textrm{inv}}\in[0.0,0.3,0.6]$ via configurations of the branching ratio of the decay process $\rho_{d}\to\pi_{d}\pi_{d}$. Detector simulation and reconstruction are conducted in Delphes v3.4.2 de Favereau et al. (2014) using the default ATLAS card. A sample of SM jets from a typical QCD process is generated from the $pp\to jj$ process, using the same simulation chain as described above.

Jets are built from calorimeter energy deposits and undergo the jet trimming procedure described in Ref. Krohn et al. (2010), with the anti-$k_{\textrm{T}}$ Cacciari et al. (2008) clustering algorithm from pyjet Cacciari et al. (2012), using a primary jet-radius parameter of $R=1.0$, a subjet clustering radius of $R_{\text{sub}}=0.2$, and $f_{\text{cut}}=0.05$. The threshold on $f_{\text{cut}}$ effectively removes constituents in subjets whose $p_{\textrm{T}}$ is less than 5% of the jet $p_{\textrm{T}}$. Leading jets are required to have $p_{\textrm{T}}\in[300,400]\,\text{GeV}$. For each generated event, the leading jet is selected and truth-matched to guarantee the presence of a dark quark within $\Delta R<1$.
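As an illustration, a minimal sketch of this clustering-and-trimming chain with pyjet is given below. The input array and the helper `trim_jet` are our own constructions for illustration, not code from this analysis.

```python
import numpy as np
from pyjet import cluster, DTYPE_PTEPM

def trim_jet(jet, r_sub=0.2, f_cut=0.05):
    """Recluster a jet's constituents with kt (p=1) at radius r_sub and keep
    only subjets carrying at least f_cut of the original jet pT (trimming)."""
    consts = np.array([(c.pt, c.eta, c.phi, c.mass) for c in jet.constituents()],
                      dtype=DTYPE_PTEPM)
    subjets = cluster(consts, R=r_sub, p=1).inclusive_jets()
    kept = [s for s in subjets if s.pt > f_cut * jet.pt]
    return [c for s in kept for c in s.constituents()]

# towers: toy calorimeter deposits in (pT, eta, phi, mass) format (placeholder values)
towers = np.zeros(3, dtype=DTYPE_PTEPM)
towers['pT'], towers['eta'], towers['phi'] = [350., 20., 10.], [0.1, 0.2, -0.1], [0.0, 0.1, 0.2]

# anti-kt (p=-1) clustering with R=1.0, then trim the leading jet
jets = cluster(towers, R=1.0, p=-1).inclusive_jets(ptmin=300.)
if jets and 300. < jets[0].pt < 400.:
    trimmed_constituents = trim_jet(jets[0])
```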

After all selection requirements, $2\times 10^{6}$ simulated jets remain, with a 50/50 split between SVJ signal and QCD background. To avoid inducing biases from artifacts of the generation process, signal and background events are weighted such that the distributions in $p_{\textrm{T}}$ and $\eta$ are uniform; see Fig. 7.

III.1 High-level Observables

A large set of jet substructure observables Kogler et al. (2019); Marzani et al. (2019); Larkoski et al. (2017) have been proposed for a task different from the focus of this study, namely the identification of jets with multiple hard subjets. Nevertheless, these observables may summarize the information content of the jet in a way that is relevant for the task of identifying semi-visible jets Kar and Sinha (2021), and so serve as a launching point for the search for new observables.

Our set of high-level (HL) observables includes 15 candidates: jet $p_{\textrm{T}}$, the Generalized Angularities Larkoski et al. (2014b) $p_{\textrm{T}}^{D}$ and Les Houches Angularity (LHA), the N-subjettiness ratios $\tau_{21}^{\beta=1}$ and $\tau_{32}^{\beta=1}$ Thaler and Van Tilburg (2011), the Energy Correlation function ratios Larkoski et al. (2013) $C_{2}^{\beta=1}$, $C_{2}^{\beta=2}$, $D_{2}^{\beta=1}$, $D_{2}^{\beta=2}$, $e_{2}$, $e_{3}$, as well as jet width, jet $e_{\textrm{mass}}$, constituent multiplicity and the splitting function Larkoski et al. (2014a) $z_{g}$. In each case, observables are calculated from the trimmed jet constituents described above. Definitions and distributions for each high-level observable are provided in App. B, with re-weighting applied as described above.

IV Machine Learning and Evaluation

For both the low-level (LL) trimmed jet constituents and high-level jet substructure observables, a variety of networks and architectures are tested.

Due to their strong record in previous similar applications Faucett et al. (2021); Collado et al. (2021a, b); Bernreuther et al. (2021), a deep neural network using dense layers is trained on the high-level observables. We additionally consider XGBoost Chen and Guestrin (2016), which has shown strong performance in training high-level classifiers with jet substructure Bourilkov (2019); Cornell et al. (2021), as well as LightGBM Ke et al. (2017), which has demonstrated power in class separation on high-level features. LightGBM has the strongest classification performance among the networks which use high-level features; see Tab. 1.
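For concreteness, a hedged sketch of the LightGBM setup described in App. C.3.1 follows; the data arrays are placeholders, and older LightGBM versions pass `early_stopping_rounds` to `lgb.train` instead of the callback used here.

```python
import lightgbm as lgb

# X_train, y_train, X_val, y_val: placeholder arrays of the 15 HL observables
# (standard-scaled, as in App. C.3) and binary labels (1 = SVJ, 0 = QCD).
dtrain = lgb.Dataset(X_train, label=y_train)
dval = lgb.Dataset(X_val, label=y_val, reference=dtrain)

params = {"objective": "binary", "metric": "auc"}   # GBDT boosting is the default
model = lgb.train(
    params, dtrain,
    num_boost_round=5000,                  # maximum boosting rounds (App. C.3.1)
    valid_sets=[dval],
    callbacks=[lgb.early_stopping(100)],   # stop after 100 rounds without AUC gain
)
scores = model.predict(X_val)              # per-jet signal probabilities
```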

Table 1: Summary of performance (AUC) in the SVJ classification task for several networks using high-level features, for the six simulated scenarios: three choices of invisible fraction $r_{\textrm{inv}}$ for both the $s$-channel and $t$-channel processes. Statistical uncertainty in each case is less than $\pm 0.002$ at 95% confidence level, measured using bootstrapping over 200 models.
$s$-channel AUC
Model      $r_{\textrm{inv}}=0.0$   $r_{\textrm{inv}}=0.3$   $r_{\textrm{inv}}=0.6$
LightGBM   0.861   0.803   0.736
XGBoost    0.861   0.803   0.736
DNN        0.860   0.799   0.734

$t$-channel AUC
Model      $r_{\textrm{inv}}=0.0$   $r_{\textrm{inv}}=0.3$   $r_{\textrm{inv}}=0.6$
LightGBM   0.808   0.755   0.683
XGBoost    0.806   0.755   0.681
DNN        0.801   0.726   0.656
Table 2: Summary of performance (AUC) in the SVJ classification task for several networks using low-level constituents, for the six simulated scenarios: three choices of invisible fraction $r_{\textrm{inv}}$ for both the $s$-channel and $t$-channel processes. Statistical uncertainty in each case is less than $\pm 0.002$ at 95% confidence level, measured using bootstrapping over 200 models.
$s$-channel AUC
Model   $r_{\textrm{inv}}=0.0$   $r_{\textrm{inv}}=0.3$   $r_{\textrm{inv}}=0.6$
PFN     0.866   0.822   0.776
EFN     0.849   0.795   0.735
CNN     0.855   0.792   0.740

$t$-channel AUC
Model   $r_{\textrm{inv}}=0.0$   $r_{\textrm{inv}}=0.3$   $r_{\textrm{inv}}=0.6$
PFN     0.806   0.754   0.697
EFN     0.796   0.741   0.672
CNN     0.791   0.739   0.663

In the case of classifiers which use low-level constituents, convolutional neural networks operating on jet images are considered Faucett et al. (2021); Collado et al. (2021a, b); Baldi et al. (2016); Cogan et al. (2015); de Oliveira et al. (2016). Given their use for a similar jet substructure classification task in Ref. Collado et al. (2021b), Energy Flow Networks (EFN) and Particle Flow Networks (PFN) are also applied Komiske et al. (2019), and are found to significantly out-perform convolutional networks, with the PFN emerging as the most powerful network; see Table 2.

Receiver operating characteristic (ROC) curves for both the PFN and LightGBM high-level models are given in Fig. 3. Additional details for the network training and hyperparameter selections are provided in App. C.

IV.1 Performance Comparison

We compare the SVJ classification performance of the most powerful networks based on high-level or low-level input features, through calculations of the Area Under the ROC Curve (AUC); see Fig. 3 and Table 3.

Note that the high-level (HL) observables are calculated directly from the low-level constituents with no additional information. If HL features extract all of the relevant information for the classification task, networks which use them as inputs should be able to match the performance of the networks which use the LL information directly, which we take as a probe of the power of the available information. If networks using LL information surpass the performance of those using HL information, it suggests that information exists in the LL constituents which is not being captured by the HL observables. One might consider directly applying networks based on LL information to the classification task Bernreuther et al. (2021), but this presents challenges in calibrating, validating and quantifying uncertainties on their high-dimensional inputs. Instead, a performance gap suggests the possibility that an additional HL observable might be crafted to take advantage of the overlooked information.

In each of the six cases explored here, the networks which use low-level information match or exceed the performance of networks which use high-level jet observables. Significant performance gaps are seen in the $r_{\textrm{inv}}=0.6$ case, where the AUC gap between the LL and HL networks is 0.040 and 0.014 for the $s$-channel and $t$-channel processes respectively, as well as for the $r_{\textrm{inv}}=0.3$ invisible fraction in the $s$-channel process, where the gap is 0.019. In the $r_{\textrm{inv}}=0.0$ $s$-channel scenario, a small gap is seen, though larger than the statistical uncertainty. In the other two cases, the LL and HL networks achieve essentially equivalent performance, strongly suggesting that the HL features have captured the relevant information in the LL constituents. As these observables were not designed for this task, it was not a priori clear that they would summarize all of the available and relevant information.

Figure 3: Comparison of performance between a network which uses low-level constituents (PFN, solid red) and one which uses high-level jet observables (LightGBM, dashed blue). Shown is the background rejection (inverse of background efficiency) versus signal efficiency for the six simulated cases: $s$-channel and $t$-channel production for three values of $r_{\textrm{inv}}$. Statistical uncertainty on AUC in each case is less than $\pm 0.002$ at 95% confidence level, measured using bootstrapping over 200 models.

However, one can also assess the difference between the LL and HL networks using metrics other than AUC. The Average Decision Ordering (ADO) metric, see Eq. 6, measures the fraction of input pairs for which two networks give the same output ordering. Even in cases where the AUC is equivalent, the ADO between the LightGBM and PFN models (Table 3) is well below 1, suggesting that while their classification performance is the same, they arrive at it using distinct strategies. This indicates that there may be room to improve the classification accuracy by considering a network which uses both sets of features.

Table 3: Summary of performance (AUC and ADO) in the SVJ classification task for various network architectures and input features. Statistical uncertainty in each case is less than $\pm 0.002$ at 95% confidence level, measured using bootstrapping over 200 models.
$s$-channel
Model[Features]   $r_{\textrm{inv}}=0.0$         $r_{\textrm{inv}}=0.3$         $r_{\textrm{inv}}=0.6$
                  ADO[$\cdot$,PFN]   AUC         ADO[$\cdot$,PFN]   AUC         ADO[$\cdot$,PFN]   AUC
PFN[LL]           1       0.866      1       0.822      1       0.776
LightGBM[HL]      0.858   0.861      0.839   0.803      0.818   0.736
LL-HL gap                 0.005              0.019              0.040

$t$-channel
Model[Features]   $r_{\textrm{inv}}=0.0$         $r_{\textrm{inv}}=0.3$         $r_{\textrm{inv}}=0.6$
                  ADO[$\cdot$,PFN]   AUC         ADO[$\cdot$,PFN]   AUC         ADO[$\cdot$,PFN]   AUC
PFN[LL]           1       0.806      1       0.754      1       0.697
LightGBM[HL]      0.844   0.808      0.805   0.755      0.787   0.683
LL-HL gap                 -0.002             -0.001             0.014

V Finding New Observables

The studies above reveal that models which use low-level constituents as inputs provide the overall best classification performance. However, our objective is not merely to find the classifier with the optimal statistical performance with difficult-to-assess systematic uncertainties. Rather, we seek to understand the underlying physics used by the PFN and to translate this information into one or more meaningful physical features, allowing us to extract insight into the physical processes involved and assign reasonable systematic uncertainties. In this section, we search for these additional high-level observables among the Energy Flow Polynomials Komiske et al. (2018) (EFP), which form a complete basis for jet observables.

V.1 Search Strategy

Interpreting the decision making of a black-box network is a notoriously difficult problem. For the task of jet classification, we apply the guided search technique utilized in past jet-related interpretability studies Faucett et al. (2021); Collado et al. (2021b, a); Bradshaw et al. (2022); Lu et al. (2022). In this approach, new HL observables are identified among the infinite set of Energy Flow Polynomials (EFPs), which form a complete linear basis for jet observables. In the EFP framework, one-dimensional observables are constructed as nested sums over the measured jet constituent transverse momenta $p_{\textrm{T},i}$, scaled by their angular separations $\theta_{ij}$.

These parametric sums are described as the set of all isomorphic multigraphs where:

each node $\Rightarrow \sum_{i=1}^{N} z_{i}$,   (2)

each $k$-fold edge $\Rightarrow \left(\theta_{ij}\right)^{k}$,   (3)

and where each graph can be further parameterized by values of $(\kappa,\beta)$,

$(z_{i})^{\kappa} = \left(\frac{p_{\textrm{T},i}}{\sum_{j} p_{\textrm{T},j}}\right)^{\kappa}$,   (4)

$\theta_{ij}^{\beta} = \left(\Delta\eta_{ij}^{2} + \Delta\varphi_{ij}^{2}\right)^{\beta/2}$.   (5)

Here, $p_{\textrm{T},i}$ is the transverse momentum of trimmed jet constituent $i$, and $\Delta\eta_{ij}$ ($\Delta\varphi_{ij}$) is the pseudorapidity (azimuthal) difference between constituents $i$ and $j$. As the EFPs are normalized, they capture only relative information about the energy deposition. For this reason, in each network that includes EFP observables, we include as an additional input the sum of $p_{\textrm{T}}$ over all constituents, to indicate the overall scale of the energy deposition.

For the set of EFPs, infrared and collinear (IRC) safety requires $\kappa=1$. To more broadly explore the space, we consider examples with $\kappa\neq 1$, which generate more exotic observables. For the case of EFPs with IRC-unsafe parameters, we select all prime graphs with dimension $d\leq 5$ and $\kappa$ and $\beta$ values of $\kappa\in\left[-2,-1,0,\tfrac{1}{2},1,2,4\right]$ and $\beta\in\left[\tfrac{1}{10},\tfrac{1}{2},1,2,4\right]$. Given the form of Eq. (2) and Eq. (3), the size and relative sign of the chosen $(\kappa,\beta)$ values provide insight into the utility of soft/hard $p_{\textrm{T}}$ effects and narrow/wide-angle energy distributions.
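To make Eqs. (2)-(5) concrete, below is a minimal, deliberately direct NumPy sketch of three simple graphs (node-only, single edge, and triangle); the function names are ours, and a real scan would use an optimized implementation such as the energyflow package.

```python
import numpy as np

def _z_theta(pt, eta, phi, kappa, beta):
    """Momentum fractions z_i^kappa (Eq. 4) and angular matrix theta_ij^beta (Eq. 5)."""
    z = (pt / pt.sum())**kappa
    dphi = np.abs(phi[:, None] - phi[None, :])
    dphi = np.minimum(dphi, 2*np.pi - dphi)          # wrap azimuthal differences
    theta = ((eta[:, None] - eta[None, :])**2 + dphi**2)**(beta / 2.0)
    return z, theta

def efp_dot(pt, kappa):
    """Node-only graph: sum_a z_a^kappa (cf. Eq. 7 for kappa = -2)."""
    return np.sum((pt / pt.sum())**kappa)

def efp_edge(pt, eta, phi, kappa, beta):
    """Single-edge graph: sum_{a,b} z_a z_b theta_ab, per Eqs. (2)-(3)."""
    z, theta = _z_theta(pt, eta, phi, kappa, beta)
    return np.einsum('a,b,ab->', z, z, theta)

def efp_triangle(pt, eta, phi, kappa, beta):
    """Fully connected 3-node graph: sum_{a,b,c} z_a z_b z_c theta_ab theta_bc theta_ac."""
    z, theta = _z_theta(pt, eta, phi, kappa, beta)
    return np.einsum('a,b,c,ab,bc,ac->', z, z, z, theta, theta, theta)
```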

In principle, the EFP space is complete, and any information accessible through constituent-based observables is contained in some combination of EFPs. However, there is no guarantee that an EFP representation will be compact; on the contrary, a blind search through the space can prove prohibitive in time and resources. Rather than a brute-force search of an infinite space of observables, we take the guided approach of Ref. Faucett et al. (2021), which uses the PFN as a black-box guide and iteratively assembles a set of EFPs which provide the closest equivalent decision making in a compact, low-dimensional set of inputs. This is done by isolating the subspace of inputs in which the PFN and the existing HL features make opposing decisions, and selecting the EFP which most closely mimics the PFN in that subspace.

Here, the agreement between networks $f(x)$ and $g(x)$ is evaluated over pairs of inputs $(x,x^{\prime})$ by comparing their relative classification decisions, expressed mathematically as:

\textrm{DO}[f,g](x,x^{\prime}) = \Theta\big((f(x)-f(x^{\prime}))\,(g(x)-g(x^{\prime}))\big),   (6)

and referred to as the decision ordering (DO). DO $=0$ corresponds to inverted decisions over all input pairs, and DO $=1$ corresponds to identical decision ordering. As prescribed in Ref. Faucett et al. (2021), we scan the space of EFPs to find the observable with the highest average decision ordering (ADO) with the guiding network, averaged over disordered pairs. The selected EFP is then incorporated into the new model of HL features, HL$_{n+1}$, and the process is repeated until the ADO plateaus.
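A minimal sketch of the DO/ADO computation of Eq. (6), vectorized over randomly sampled signal-background pairs, is shown below; the pair-sampling strategy is our own simplification of the procedure described above.

```python
import numpy as np

def average_decision_ordering(f_sig, f_bkg, g_sig, g_bkg, n_pairs=100_000, rng=None):
    """Estimate ADO[f, g] over signal-background pairs: the fraction of pairs
    on which networks f and g order their outputs identically (Eq. 6)."""
    rng = rng or np.random.default_rng(0)
    i = rng.integers(0, len(f_sig), size=n_pairs)   # random signal jets
    j = rng.integers(0, len(f_bkg), size=n_pairs)   # random background jets
    # Theta((f(x) - f(x')) * (g(x) - g(x'))) of Eq. (6)
    same_order = (f_sig[i] - f_bkg[j]) * (g_sig[i] - g_bkg[j]) > 0
    return same_order.mean()
```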

VI Guided Iteration

For each of the four scenarios in which a gap is observed between the AUC performance of the HL LightGBM model and the PFN, a guided search is performed to identify an EFP observable which mimics the decisions of the PFN.

The search results, shown in Table 4, identify in each of the four cases an EFP with $d\leq 3$ which boosts the classification performance of the LightGBM classifier when trained with the original 15 HL features together with the identified EFP observable. In addition, the decision similarity (ADO) between the new HL model and the PFN is increased. Scans for a second EFP do not identify additional observables which significantly increase performance or similarity. A guided search was also performed with an identical set of $\kappa$ and $\beta$ parameters for EFPs with dimension $d\leq 5$, but no improvements were seen for the higher-dimensional graph structures. While the performance and similarity gaps have been reduced, they have not been fully erased.

Table 4: Results of a guided search for high-level (HL) EFP observables which mimic the decision ordering of the PFN, a network based on low-level constituents. For each of the four scenarios in which a gap is observed between the AUC performance of the HL network and the PFN, an EFP is selected to attempt to increase the average decision ordering (ADO) of the HL network with the PFN. Statistical uncertainty in each case is $\pm 0.002$ at 95% confidence level, measured using bootstrapping over 100 models.
Process       $r_{\textrm{inv}}$   HL network AUC, ADO[$\cdot$,PFN]   EFP graph                     $\kappa$   $\beta$   HL+EFP network AUC, ADO[$\cdot$,PFN]   PFN AUC
$s$-channel   0.0                  0.861,  0.858                      [1-node (dot) graph]          -2         n/a       0.864,  0.863                          0.866
$s$-channel   0.3                  0.803,  0.839                      [2-node, 3-fold edge graph]   1          1/2       0.807,  0.840                          0.822
$s$-channel   0.6                  0.736,  0.818                      [3-node path graph]           -1         2         0.747,  0.821                          0.776
$t$-channel   0.6                  0.683,  0.787                      [3-node triangle graph]       -2         1/10      0.690,  0.792                          0.697

VI.1 Analysis of the Guided Search Results

In the $s$-channel process with invisible fraction $r_{\textrm{inv}}=0.0$, the addition of a single EFP closes the small performance gap with the PFN to within the statistical uncertainty. The identified EFP in this case is the dot graph with the IRC-unsafe energy exponent $\kappa=-2$, expressed as a sum over constituents in Eq. (7).

\left(\text{1-node (dot) graph}\right)^{(\kappa=-2,\ \beta=\text{n/a})} = \sum\limits_{a=1}^{N} \frac{1}{z_{a}^{2}}   (7)

This graph is, in effect, simply the sum of the inverse squared momentum fractions $z_{a}^{-2}$ of the jet constituents, and is most sensitive to constituents with low $p_{\textrm{T}}$. Distributions of this observable for signal and background events are shown in Fig. 4, demonstrating good separation between signal and background.

Figure 4: Distribution of the EFP observable selected by the guided search in the $s$-channel $r_{\textrm{inv}}=0$ scenario, shown for semi-visible jets (SVJ) as well as QCD jets from the Standard Model background (SM Bkg). The inset pane shows the graph corresponding to the EFP. See text for additional details.

In the remaining $s$-channel and $t$-channel examples, addition of the selected EFP improves performance but fails to match the PFN. Of the three remaining gaps, the $s$-channel process with $r_{\textrm{inv}}=0.3$ is the only case in which an IRC-safe EFP observable is selected, with $\kappa=1$. In the other cases, the EFP graphs again have $\kappa<0$, making them sensitive to low-$p_{\textrm{T}}$ information. The complete expression for each selected graph is given in Eq. (8), and the distributions of each observable for signal and background are shown in Fig. 5. The triangular graph selected for the $t$-channel process with invisible fraction $r_{\textrm{inv}}=0.6$ has the same structure as the energy correlation ratio $e_{3}$, though with distinct $\kappa$ and $\beta$ values.

\left(\text{2-node, 3-fold edge graph}\right)^{(\kappa=1,\ \beta=1/2)} = \sum\limits_{a,b=1}^{N} z_{a} z_{b} \left(\theta_{ab}\right)^{3/2}   (8a)

\left(\text{3-node path graph}\right)^{(\kappa=-1,\ \beta=2)} = \sum\limits_{a,b,c=1}^{N} \frac{\left(\theta_{ab}\theta_{ac}\right)^{2}}{z_{a} z_{b} z_{c}}   (8b)

\left(\text{3-node triangle graph}\right)^{(\kappa=-2,\ \beta=1/10)} = \sum\limits_{a,b,c=1}^{N} \frac{\left(\theta_{ab}\theta_{bc}\theta_{ac}\right)^{1/10}}{z_{a} z_{b} z_{c}}   (8c)
(a) $s$-channel / $r_{\textrm{inv}}=0.3$
(b) $s$-channel / $r_{\textrm{inv}}=0.6$
(c) $t$-channel / $r_{\textrm{inv}}=0.6$
Figure 5: Distribution of EFP observables selected by the guided search for the $s$-channel $r_{\textrm{inv}}=0.3$, $r_{\textrm{inv}}=0.6$ and $t$-channel $r_{\textrm{inv}}=0.6$ scenarios, shown for semi-visible jets (SVJ) as well as QCD jets from the Standard Model background (SM Bkg). The inset panes show the graphs corresponding to the selected EFPs. See text for additional details.

VII Greedy Search

Given the persistent gap between the performance of the LL networks and the HL models augmented by EFP observables, we consider whether the EFP space lacks the needed observables, or whether the guided search fails to identify them. We examine this question by taking a more comprehensive look at the space of EFPs under consideration. Similar to the technique described in Ref. Faucett et al. (2021), we perform a greedy search in the same EFP space studied in the guided iteration approach explored above. In a greedy search, we explicitly train a new model for each candidate EFP, combining the EFP with the existing 15 HL features. Note that this is significantly more computationally intensive than evaluation of the ADO, as done in the guided search, and seeks to maximize AUC rather than to align decision ordering with the PFN. The candidate EFP which produces the best-performing model is kept as the 16th HL observable (Pass 1), and the process is repeated in search of a 17th (Pass 2), until a plateau in performance is observed.
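Schematically, one pass of the greedy selection could be implemented as below; the callable `train_and_auc` stands in for the LightGBM fitting and validation-AUC evaluation described above and is not code from this analysis.

```python
import numpy as np

def greedy_pass(hl_features, efp_candidates, y, train_and_auc):
    """One pass of the greedy search: for every candidate EFP, train a fresh
    model on [15 HL features + candidate] and keep the AUC-maximizing one."""
    best_auc, best_idx = -np.inf, None
    for idx, efp_values in enumerate(efp_candidates):   # one precomputed column per EFP
        X = np.column_stack([hl_features, efp_values])
        auc = train_and_auc(X, y)                       # e.g. LightGBM + validation AUC
        if auc > best_auc:
            best_auc, best_idx = auc, idx
    return best_idx, best_auc

# Repeated passes append the winning EFP to the feature set until the AUC plateaus.
```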

The results of the greedy search across all choices of $r_{\textrm{inv}}$ in the $s$-channel and $t$-channel scenarios where a gap between HL and LL exists are given in Table 5. Similar levels of performance are achieved as in the guided search, to within statistical uncertainties. In all cases, the HL-LL gap persists. Results are given for the IRC-unsafe selections with dimension $d\leq 3$. A similar greedy search was also performed on the IRC-unsafe $d\leq 5$ EFP set, with no performance differences observed.

Table 5: Summary of two passes of a greedy search through EFP space for additional observables which might capture the information used by the low-level network (PFN) and match its performance, as measured by AUC. For each of the four processes and $r_{\textrm{inv}}$ scenarios in which we have identified a gap between the performance of the PFN and the HL model, two passes are made to identify the EFP which most improves the AUC of a new HL model incorporating the candidate EFP. Performance of the HL model deduced by the guided search (Table 4) and the PFN are also given. Statistical uncertainty in each case is $\pm 0.002$ at 95% confidence level, measured using bootstrapping over 100 models.
Process       $r_{\textrm{inv}}$   HL AUC   Pass 1: graph, $\kappa$, $\beta$, AUC      Pass 2: graph, $\kappa$, $\beta$, AUC      Guided HL AUC   PFN AUC
$s$-channel   0.0                  0.861    [2-node graph], 1/2, 2, 0.864              [2-node graph], 2, 1/10, 0.866             0.864           0.866
$s$-channel   0.3                  0.803    [2-node graph], 4, 2, 0.807                [2-node graph], -1, 1, 0.809               0.807           0.822
$s$-channel   0.6                  0.736    [2-node graph], 4, 4, 0.744                [2-node graph], -2, 1/10, 0.747            0.747           0.776
$t$-channel   0.6                  0.683    [3-node graph], -1, 1/10, 0.690            [4-node graph], -2, 4, 0.692               0.690           0.697

The greedy search selects EFP graphs similar to those of the guided search, with the exception of the 4-node graph selected in pass 2 for the $r_{\textrm{inv}}=0.6$, $t$-channel scenario. No IRC-safe graphs ($\kappa=1$) are selected, and we again see frequent sensitivity to low-$p_{\textrm{T}}$ parameters (i.e. $\kappa=-2,-1,\tfrac{1}{2}$) and a variety of both narrow- and wide-angle features (i.e. $\beta=\tfrac{1}{10},2,4$). A repeat of the greedy search with only IRC-safe observables achieves no performance improvement over the original HL features, suggesting that the missing information may be strongly tied to IRC-unsafe representations of the information in the jet constituents.

VIII Exploration of $p_{\textrm{T}}$ Dependence

The results of both the guided search and the greedy search strongly suggest that the performance gap which persists between the HL and LL representations of the jet contents cannot be compactly expressed in terms of a small number of EFP observables from the set considered. This raises the question of which feature of the LL constituents can be credited with the performance improvement, and why that information does not translate compactly into our EFP observables. The clues from the guided and greedy searches point to sensitivity to low-$p_{\textrm{T}}$ constituents, as can be generated by soft radiation. Figure 6 shows the distribution of constituent $p_{\textrm{T}}$ relative to the jet $p_{\textrm{T}}$ for the six scenarios, in which SVJs appear to have more constituents at low relative $p_{\textrm{T}}$. Recall that jet constituents are trimmed if the subjet they belong to has a $p_{\textrm{T}}$ below a threshold of 5% of the jet $p_{\textrm{T}}$, corresponding to $f_{\textrm{cut}}=0.05$.

Figure 6: Distribution of constituent $p_{\textrm{T}}$ relative to the jet $p_{\textrm{T}}$ for the six scenarios: $s$-channel and $t$-channel with $r_{\textrm{inv}}\in[0,0.3,0.6]$.

To consider whether the distinguishing information is contained in these low-$p_{\textrm{T}}$ constituents, we explore a broader range of thresholds, both lowering $f_{\textrm{cut}}$ to 0% and raising it to 10% and 15%. The HL features are re-evaluated on the newly trimmed constituents and used as inputs to a new LightGBM model, whose performance is compared to a PFN trained on the same trimmed constituents. Results for networks with varying $f_{\textrm{cut}}$ thresholds are shown in Table 6.

Table 6: Comparison of the performance difference between a PFN operating on low-level constituent information and a LightGBM model using high-level summary quantities, for several values of the jet trimming parameter $f_{\textrm{cut}}$. Jet constituents belonging to a subjet whose fraction of the jet $p_{\textrm{T}}$ is below $f_{\textrm{cut}}$ are dropped, which has the effect of removing lower-$p_{\textrm{T}}$ constituents. Shown is the AUC of each model for $s$- and $t$-channel processes under three $r_{\textrm{inv}}$ scenarios. Statistical uncertainty in each case is $\pm 0.002$ at 95% confidence level, measured using bootstrapping over 100 models.
$s$-channel
                     $r_{\textrm{inv}}=0.0$               $r_{\textrm{inv}}=0.3$               $r_{\textrm{inv}}=0.6$
$f_{\textrm{cut}}$   PFN     LightGBM   LL-HL Gap    PFN     LightGBM   LL-HL Gap    PFN     LightGBM   LL-HL Gap
0.00                 0.908   0.895      0.013        0.853   0.829      0.024        0.788   0.739      0.049
0.05                 0.866   0.861      0.005        0.822   0.803      0.019        0.776   0.736      0.040
0.10                 0.847   0.848      -0.001       0.790   0.790      0.000        0.746   0.721      0.025
0.15                 0.838   0.843      -0.005       0.784   0.785      -0.001       0.738   0.717      0.021

$t$-channel
                     $r_{\textrm{inv}}=0.0$               $r_{\textrm{inv}}=0.3$               $r_{\textrm{inv}}=0.6$
$f_{\textrm{cut}}$   PFN     LightGBM   LL-HL Gap    PFN     LightGBM   LL-HL Gap    PFN     LightGBM   LL-HL Gap
0.00                 0.825   0.817      0.008        0.748   0.737      0.011        0.662   0.647      0.015
0.05                 0.806   0.808      -0.002       0.754   0.755      -0.001       0.697   0.683      0.014
0.10                 0.741   0.742      -0.001       0.662   0.663      -0.001       0.595   0.597      -0.002
0.15                 0.731   0.740      -0.009       0.655   0.661      -0.006       0.593   0.596      -0.003

In each case, raising the $f_{\textrm{cut}}$ threshold decreases the classification performance, as might be expected due to the removal of low-$p_{\textrm{T}}$ information. Perhaps more interesting is the variation in the gap between the performance of the HL LightGBM model and the PFN operating on low-level constituents. In nearly every case, the gap grows as more low-$p_{\textrm{T}}$ information is included, supporting the hypothesis that this is the origin of most of the information missing from the HL models. The details of the low-$p_{\textrm{T}}$ constituents are likely to be very sensitive to modeling uncertainties and subject to concerns about infrared and collinear safety.

However, even for the most aggressive value of $f_{\textrm{cut}}=0.15$, a persistent gap of $\Delta\text{AUC}=0.021$ remains in the $s$-channel, $r_{\textrm{inv}}=0.6$ scenario, which cannot be explained by low-$p_{\textrm{T}}$ constituents. We therefore examine this case in more detail.

First, we consider the EFPs selected in the study above with $f_{\textrm{cut}}=0.05$, including EFPs selected by the guided search (Table 4) and the greedy search (Table 5). For all combinations of the HL features and these identified EFPs, no performance gain is seen. Next, we perform a fresh guided search on models trained from the $f_{\textrm{cut}}=0.15$ constituents. In contrast to previous cases, no large improvement in training performance is obtained from the first selected EFP. After 200 iterations, the gap is reduced to $\Delta\text{AUC}=0.010$ with the addition of 200 EFPs. We note that this far exceeds the mean number of constituents of these trimmed jets, 60. We conclude that there does not appear to be a compact representation of the remaining information in the EFP space we have explored.

IX Conclusions

We have analyzed the classification performance of models trained to distinguish background jets from semi-visible jets using the low-level jet constituents, and found them to offer stronger performance than models which rely on high-level quantities motivated by other processes, mostly those involving collimated hadronic decays of massive objects.

While models operating on the existing suite of HL quantities nearly match the performance of those using LL information, a significant gap exists, suggesting that relevant information remains to be captured, perhaps in new high-level features. To our knowledge, this is the first study to compare the performance of constituent-based and high-level semi-visible-jet taggers, and to identify the existence of relevant information uncaptured by existing high-level features. Jets due to semi-visible decays are markedly different in their energy distribution from those due to massive objects, so it is not unexpected that existing features do not completely summarize the useful information.

Using a guided strategy, we identify a small set of new useful features from the space of energy-flow polynomials, but these do not succeed in completely closing the performance gap. In most cases, the remaining gap appears to be due to information contained in very low-$p_{\textrm{T}}$ constituents, which is likely to be sensitive to the modeling of showering and hadronization and may not be infrared and collinear safe. This highlights the importance of interpreting and validating the information used by constituent-based taggers. As demonstrated by Ref. Cohen et al. (2020), the specific pattern of energy deposition may depend sensitively on both the parameters of the theoretical model and the settings chosen for the hadronization model. It is therefore vital that the information being analyzed be interpreted before being applied to analysis of collider data.

In one case studied here, a gap persists between low- and high-level-based models even when low-$p_{\textrm{T}}$ constituents are aggressively trimmed, suggesting the possibility that a new high-level feature could be crafted to capture this useful high-$p_{\textrm{T}}$ information. Our efforts to capture this information with the simpler energy-flow polynomials were not successful, suggesting that more complex high-$p_{\textrm{T}}$ observables may exist which provide useful discrimination between QCD and semi-visible jets. The studies presented here can inform and guide theoretical work to construct such observables specifically tailored to this category of jets. Whether such observables can be efficiently represented using alternative basis sets of observables, and whether they are robust to Shimmin et al. (2017) or explicitly dependent on Ghosh et al. (2021) systematic uncertainties while providing power over large regions of theoretical parameter space, is an important avenue for future work.

X Acknowledgements

The authors thank Po-Jen Cheng for his assistance in sample validation. This material is based upon research supported by the Chateaubriand Fellowship of the Office for Science & Technology of the Embassy of France in the United States. T. Faucett and D. Whiteson are supported by the U.S. Department of Energy (DOE), Office of Science under Grant No. DE-SC0009920. S.-C. Hsu is supported by the National Science Foundation under Grant No. 2110963.

Appendix A Signal reweighting

The signal and background samples have distinct transverse momentum and pseudorapidity distributions due to the processes used to generate them. We wish to learn to classify signal and background independent of these quantities, and so reweight the signal events to match the background distribution; see Fig. 7.
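A minimal sketch of such a histogram-based reweighting in two dimensions is given below; the bin counts and the $\eta$ acceptance are illustrative assumptions, not values taken from this analysis.

```python
import numpy as np

def match_weights(sig_pt, sig_eta, bkg_pt, bkg_eta, bins=(20, 10)):
    """Per-event signal weights making the 2D (pT, eta) signal histogram
    match the background histogram shape."""
    pt_edges = np.linspace(300., 400., bins[0] + 1)    # matches the jet pT window
    eta_edges = np.linspace(-2.5, 2.5, bins[1] + 1)    # assumed eta acceptance
    h_sig, _, _ = np.histogram2d(sig_pt, sig_eta, bins=[pt_edges, eta_edges], density=True)
    h_bkg, _, _ = np.histogram2d(bkg_pt, bkg_eta, bins=[pt_edges, eta_edges], density=True)
    # bin-by-bin ratio, guarding against empty signal bins
    ratio = np.divide(h_bkg, h_sig, out=np.zeros_like(h_bkg), where=h_sig > 0)
    i = np.clip(np.digitize(sig_pt, pt_edges) - 1, 0, bins[0] - 1)
    j = np.clip(np.digitize(sig_eta, eta_edges) - 1, 0, bins[1] - 1)
    return ratio[i, j]   # weight for each signal jet
```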

Figure 7: Distributions of jet transverse momentum ($p_{\textrm{T}}$) and pseudorapidity ($\eta$) shown for semi-visible jets (SVJ) and the Standard Model background (SM Bkg) for the six simulated scenarios: three choices of invisible fraction $r_{\textrm{inv}}$ for both the $s$-channel and $t$-channel processes. The SVJ samples are reweighted to match the background distributions.

Appendix B Jet Substructure Observables

High-level features used to discriminate between semi-visible and background jets are defined below.

B.1 Jet transverse momentum and mass

The scalar sum of constituent $p_{\textrm{T}}$ is included as an HL observable, both among the initial HL inputs and alongside the EFPs, to give the ML algorithms a relative scale against which to interpret the dimensionless EFP features. The jet $p_{\textrm{T}}$ sum is calculated as

p_{\textrm{T}} = \sum\limits_{i\in\text{jet}} p_{\textrm{T},i}   (9)

Distributions for jet $p_{\textrm{T}}$ and $e_{\textrm{mass}}$, defined below, are shown in Fig. 8.

Figure 8: Distributions of jet transverse momentum ($p_{\textrm{T}}$) and $e_{\textrm{mass}}$ shown for semi-visible jets and the Standard Model background (SM Bkg) for the six simulated scenarios: three choices of invisible fraction $r_{\textrm{inv}}$ for both the $s$-channel and $t$-channel processes.

B.2 Generalized Angularities

Multiple standard HL observables are defined by choices of the $\kappa$ and $\beta$ parameters applied to the momentum fraction $(z_{i})$ and angular separation $(\theta_{i})$ in the Generalized Angularity (GA) expression Larkoski et al. (2014b),

\lambda_{\beta}^{\kappa} = \sum\limits_{i\in\text{jet}} z_{i}^{\kappa}\,\theta_{i}^{\beta}   (10)

The Les Houches Angularity (LHA) is defined from the GA expression with parameters $(\kappa=1,\beta=1/2)$, and $p_{\textrm{T}}^{D}$ with $(\kappa=2,\beta=0)$. Written explicitly, these become

\text{LHA} = \sum\limits_{i\in\text{jet}} z_{i}\,\theta_{i}^{1/2}   (11)

p_{\textrm{T}}^{D} = \sum\limits_{i\in\text{jet}} z_{i}^{2}   (12)

Two additional observables, $e_{\textrm{width}}$ and $e_{\textrm{mass}}$, are produced by the choices $(\kappa=1,\beta=1)$ and $(\kappa=1,\beta=2)$, respectively:

e_{\textrm{width}} = \sum\limits_{i\in\text{jet}} z_{i}\,\theta_{i}   (13)

e_{\textrm{mass}} = \sum\limits_{i\in\text{jet}} z_{i}\,\theta_{i}^{2}   (14)

Lastly, the multiplicity (although technically defined as simply the total number of constituents in the jet) can be expressed in the same generalized form with $(\kappa=0,\beta=0)$:

\text{multiplicity} = \sum\limits_{i\in\text{jet}} 1   (15)
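As a sketch, the generalized angularities of Eqs. (10)-(15) can be computed from constituent arrays as below; the angular measure $\theta_{i}$ is taken relative to a jet axis, which we assume here to be the $p_{\textrm{T}}$-weighted centroid, one common convention.

```python
import numpy as np

def generalized_angularity(pt, eta, phi, kappa, beta, radius=1.0):
    """lambda_beta^kappa = sum_i z_i^kappa theta_i^beta (Eq. 10), with theta_i
    the constituent's angular distance from the jet axis over the jet radius."""
    z = pt / pt.sum()
    ax_eta = np.sum(z * eta)                  # pT-weighted axis (an assumption)
    ax_phi = np.arctan2(np.sum(z * np.sin(phi)), np.sum(z * np.cos(phi)))
    dphi = np.abs(phi - ax_phi)
    dphi = np.minimum(dphi, 2*np.pi - dphi)   # wrap azimuthal differences
    theta = np.sqrt((eta - ax_eta)**2 + dphi**2) / radius
    return np.sum(z**kappa * theta**beta)

# The named observables of Eqs. (11)-(15):
# lha   = generalized_angularity(pt, eta, phi, kappa=1, beta=0.5)
# ptd   = generalized_angularity(pt, eta, phi, kappa=2, beta=0)
# width = generalized_angularity(pt, eta, phi, kappa=1, beta=1)
# mass  = generalized_angularity(pt, eta, phi, kappa=1, beta=2)
# mult  = generalized_angularity(pt, eta, phi, kappa=0, beta=0)
```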

Distributions for all GA observables are shown in Fig. 9.

Figure 9: Distributions of jet $p_{\textrm{T}}^{D}$, LHA, width and multiplicity shown for semi-visible jets (red, green, blue) and the Standard Model background (SM Bkg, yellow) for the six simulated scenarios: three choices of invisible fraction $r_{\textrm{inv}}$ for both the $s$-channel and $t$-channel processes.

B.3 Energy Correlation

Energy Correlation Functions Larkoski et al. (2013) and their corresponding ratios are computed via the functions $\textrm{ECF}_{1}$, $\textrm{ECF}_{2}$ and $\textrm{ECF}_{3}$,

\textrm{ECF}_{1} = \sum_{i} p_{\textrm{T},i}   (16)

\textrm{ECF}_{2}^{\beta} = \sum_{i<j} p_{\textrm{T},i}\, p_{\textrm{T},j} \left(\theta_{ij}\right)^{\beta}   (17)

\textrm{ECF}_{3}^{\beta} = \sum_{i<j<k} p_{\textrm{T},i}\, p_{\textrm{T},j}\, p_{\textrm{T},k} \left(\theta_{ij}\theta_{ik}\theta_{jk}\right)^{\beta}   (18)

and the related ratios are given by,

e_{2}^{\beta} = \frac{\textrm{ECF}_{2}^{\beta}}{\left(\textrm{ECF}_{1}\right)^{2}}   (19)

e_{3}^{\beta} = \frac{\textrm{ECF}_{3}^{\beta}}{\left(\textrm{ECF}_{1}\right)^{3}}   (20)

From these ratios, we then compute the energy correlation ratios $C_{2}$ and $D_{2}$:

C_{2} = \frac{e_{3}}{\left(e_{2}\right)^{2}}   (21)

D_{2} = \frac{e_{3}}{\left(e_{2}\right)^{3}}   (22)
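A direct NumPy sketch of Eqs. (16)-(22) for a single jet might look as follows; it trades efficiency for readability, and the function names are ours.

```python
import numpy as np
from itertools import combinations

def pairwise_theta(eta, phi):
    """Angular distances theta_ij = sqrt(d_eta^2 + d_phi^2) between constituents."""
    dphi = np.abs(phi[:, None] - phi[None, :])
    dphi = np.minimum(dphi, 2*np.pi - dphi)
    return np.sqrt((eta[:, None] - eta[None, :])**2 + dphi**2)

def c2_d2(pt, eta, phi, beta=1.0):
    """Energy correlation ratios C2 and D2 (Eqs. 19-22) for one jet."""
    theta = pairwise_theta(eta, phi)**beta
    ecf1 = pt.sum()                                                     # Eq. (16)
    ecf2 = sum(pt[i]*pt[j]*theta[i, j]                                  # Eq. (17)
               for i, j in combinations(range(len(pt)), 2))
    ecf3 = sum(pt[i]*pt[j]*pt[k]*theta[i, j]*theta[i, k]*theta[j, k]    # Eq. (18)
               for i, j, k in combinations(range(len(pt)), 3))
    e2, e3 = ecf2 / ecf1**2, ecf3 / ecf1**3                             # Eqs. (19)-(20)
    return e3 / e2**2, e3 / e2**3                                       # Eqs. (21)-(22)
```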

Distributions for all Energy Correlation observables are shown in Fig. 10.

Figure 10: Distributions of the jet energy correlation function ratios $C_{2}^{\beta=1}$, $C_{2}^{\beta=2}$, $D_{2}^{\beta=1}$ and $D_{2}^{\beta=2}$, and the pairs $e_{2}$ and $e_{3}$, shown for semi-visible jets (red, green, blue) and the Standard Model background (SM Bkg, yellow) for the six simulated scenarios: three choices of invisible fraction $r_{\textrm{inv}}$ for both the $s$-channel and $t$-channel processes.

B.4 N-Subjettiness

Given subjets isolated via clustering, for $N$ candidate subjets the N-subjettiness $\tau_{N}$ Thaler and Van Tilburg (2011) is defined as

\tau_{N} = \frac{1}{d_{0}} \sum_{k} p_{\textrm{T},k}\, \min\left(\Delta\theta_{1,k}, \Delta\theta_{2,k}, \ldots, \Delta\theta_{N,k}\right)   (23)

where the normalization factor $d_{0}$ is defined by

d_{0} = \sum_{k} p_{\textrm{T},k}\, R_{0}   (24)

where $R_{0}$ is the characteristic jet radius used during clustering. Finally, the N-subjettiness ratios used are defined by

\tau_{21}^{\beta=1} = \frac{\tau_{2}}{\tau_{1}}   (25)

\tau_{32}^{\beta=1} = \frac{\tau_{3}}{\tau_{2}}   (26)
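A sketch of Eqs. (23)-(26) using exclusive $k_{\textrm{T}}$ subjets as the candidate axes is given below; the axis-finding convention is our assumption, as the text does not specify it, and `tau_n` is an illustrative helper.

```python
import numpy as np
from pyjet import cluster, DTYPE_PTEPM

def tau_n(n, constituents, R0=1.0):
    """N-subjettiness tau_N (Eq. 23), with axes from n exclusive kt subjets."""
    consts = np.array([(c.pt, c.eta, c.phi, c.mass) for c in constituents],
                      dtype=DTYPE_PTEPM)
    axes = cluster(consts, R=R0, p=1).exclusive_jets(n)   # n kt axes (assumed convention)
    pt = consts['pT']
    d0 = np.sum(pt) * R0                                  # Eq. (24)
    dists = []
    for ax in axes:
        dphi = np.abs(consts['phi'] - ax.phi)
        dphi = np.minimum(dphi, 2*np.pi - dphi)
        dists.append(np.sqrt((consts['eta'] - ax.eta)**2 + dphi**2))
    return np.sum(pt * np.min(dists, axis=0)) / d0

# Ratios of Eqs. (25)-(26), e.g.:
# tau21 = tau_n(2, jet.constituents()) / tau_n(1, jet.constituents())
```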

Distributions for both N-subjettiness observables are shown in Fig. 11.

Figure 11: Distributions of the jet N-subjettiness ratios $\tau_{21}^{\beta=1}$ and $\tau_{32}^{\beta=1}$ shown for semi-visible jets (red, green, blue) and the Standard Model background (SM Bkg, yellow) for the six simulated scenarios: three choices of invisible fraction $r_{\textrm{inv}}$ for both the $s$-channel and $t$-channel processes.

B.5 Groomed Momentum Splitting Fraction

The splitting fraction is described in terms of the Soft Drop grooming technique of Ref. Larkoski et al. (2014a). The feature is calculated using energyflow Komiske et al. (2018) with the Cambridge/Aachen algorithm, using a jet radius of $R=1$ and Soft Drop parameters of $\beta=0$ and $z_{\text{cut}}=0.1$.

The distribution of $z_{g}$ is given in Fig. 12.

Figure 12: Distributions of the jet splitting function $z_{g}$ shown for semi-visible jets (red, green, blue) and the Standard Model background (SM Bkg, yellow) for the six simulated scenarios: three choices of invisible fraction $r_{\textrm{inv}}$ for both the $s$-channel and $t$-channel processes.

Appendix C ML Architectures

C.1 Deep Neural Networks

All deep neural networks were trained in TensorFlow Abadi et al. (2015) and Keras Chollet et al. (2015). The networks were optimized with Adam Kingma and Ba (2014) for up to 100 epochs with early stopping. For all networks, weights were initialized using orthogonal weights Saxe et al. (2013). Hyperparameters were optimized using Bayesian optimization with the Sherpa hyperparameter optimization library Hertel et al. (2020).

C.2 High-Level DNN

All HL features are preprocessed with scikit-learn's StandardScaler Pedregosa et al. (2011) before training.

C.2.1 Deep Neural Networks

Hyperparameters and network design for all dense networks trained on HL or EFP features are selected via Sherpa optimization, using between two and eight fully connected hidden layers and a final layer with a sigmoidal logistic activation function to predict the probability of signal or background.

C.2.2 Particle-Flow Networks

The Particle Flow Network (PFN) is trained using the energyflow package Komiske et al. (2019). Input features are taken from the trimmed jet constituents and preprocessed by centering the constituents in $(\eta-\varphi)$ space on the $p_{\textrm{T}}$-weighted average and normalizing the constituent values to 1. Both the EFN and PFN take this constituent information as inputs in the form of the 3-component hadronic measure option in energyflow (i.e. $p_{\textrm{T}}$, $\eta$, $\varphi$).

The PFN uses 3 dense layers in the per-particle frontend module and 3 dense layers in the backend module. Both frontend and backend layers use 300 hidden nodes per layer, with latent and filter dropout of 0.2. Each layer uses ReLU Nair and Hinton (2010) activation and the Glorot normal initializer. The final output layer uses a sigmoidal logistic activation function to predict the probability of signal or background. The Adam optimizer is used, with a batch size of 128 and a fixed learning rate of 0.001.
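A sketch of this configuration with energyflow's PFN class follows; the keyword names (e.g. `latent_dropout`, `F_dropouts`) reflect the energyflow documentation as we recall it and may differ across versions.

```python
from energyflow.archs import PFN

# A sketch of the PFN configuration of App. C.2.2 (assumed keyword names).
pfn = PFN(
    input_dim=3,                      # (pT, eta, phi) per constituent
    Phi_sizes=(300, 300, 300),        # per-particle frontend module
    F_sizes=(300, 300, 300),          # backend module
    latent_dropout=0.2,
    F_dropouts=0.2,
    output_dim=1, output_act='sigmoid',
    loss='binary_crossentropy',
)
# X: array of shape (n_jets, max_constituents, 3), zero-padded; y: binary labels
# pfn.fit(X_train, y_train, epochs=100, batch_size=128,
#         validation_data=(X_val, y_val))
```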

C.3 Boosted Learning Models

HL features are, again, preprocessed with scikit-learn's StandardScaler before training. Except where indicated, default settings are used.

C.3.1 LightGBM

All applications of LightGBM are trained for binary classification with a log-loss objective using Gradient Boosted Decision Trees. Performance is measured by the AUC metric, with a maximum of 5000 boosting rounds and early stopping after 100 rounds without AUC improvement.

C.3.2 XGBoost

All applications of XGBoost are trained using the gradient tree booster with settings: $\eta$=0.1, subsample=0.5, base_score=0.1, $\gamma$=0.0, and max_depth=6. Performance is measured by the AUC metric, with a maximum of 5000 boosting rounds and early stopping after 100 rounds without AUC improvement.

C.4 Convolutional Networks on Jet Images

The convolutional neural networks use jet images produced with energyflow's pixelate function. Jet images are produced as a $32\times 32$ pixel matrix with an image width of 1.0, and are then normalized to values in the range [0,1]. The network consists of 3 hidden layers of 300 nodes each, using kernels of size $3\times 3$ and strides of $1\times 1$. Each layer uses ReLU Nair and Hinton (2010) activation and the Glorot normal initializer. The final output layer uses a sigmoidal logistic activation function to predict the probability of signal or background. The Adam optimizer is used, with a batch size of 128 and a fixed learning rate of 0.001.
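A Keras sketch consistent with this description might read as follows; we interpret "300 nodes" as 300 filters per convolutional layer, and the flattening step is our own assumption, as the text does not specify it.

```python
from tensorflow import keras

# A sketch of the jet-image CNN of App. C.4 (pooling/flattening choices assumed).
model = keras.Sequential([
    keras.layers.Input(shape=(32, 32, 1)),            # 32x32 jet image, one channel
    *[keras.layers.Conv2D(300, kernel_size=(3, 3), strides=(1, 1),
                          activation='relu',
                          kernel_initializer='glorot_normal')
      for _ in range(3)],                              # 3 hidden conv layers
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation='sigmoid'),       # P(signal)
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss='binary_crossentropy',
              metrics=[keras.metrics.AUC(name='auc')])
# model.fit(images_train, y_train, batch_size=128, epochs=100, ...)
```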

References