Detecting anomaly in vector boson scattering
Abstract
Precise measurement of vector boson scattering (VBS) is an important step towards understanding electroweak symmetry breaking in the standard model (SM) and detecting new physics beyond the SM. We propose a neural network which compresses the features of the VBS process into a three-dimensional latent space. The consistency of the SM prediction with the experimental data is tested by a binned log-likelihood analysis in the latent space. We show that the network is capable of distinguishing different polarization modes of production in both the dileptonic channel and the semi-leptonic channel. The method is also applied to constrain the effective field theory and the two Higgs Doublet Model. The results demonstrate that the method is sensitive to generic new physics contributing to the VBS.
I Introduction
Vector Boson Scattering (VBS) is a sensitive probe of both the Standard Model (SM) electroweak symmetry breaking (EWSB) and new physics Beyond-the-SM (BSM) Rauch:2016pai ; Green:2016trm . If the couplings of the Higgs boson to vector bosons deviate from the SM prediction, the cross sections of VBS processes will increase with center-of-mass energy up to the scale of new physics. In addition, many BSM models predict an extended Higgs sector. The contribution from new resonances can also increase the VBS cross section in certain regions of phase space.
Measuring VBS processes at hadron colliders is experimentally challenging because of their low signal yields and complex final states. The LHC experiments have carried out comprehensive searches for the VBS processes Alessandro:2018khj ; Baglio:2020bnc ; Gallinaro:2020cte . The same-sign production with leptonic decay has the largest signal-to-background ratio among VBS processes. This channel was the first VBS process to be observed, during Run 1 of the LHC Aad:2014zda ; Khachatryan:2014sta , and has been confirmed by measurements at LHC Run 2 Aaboud:2019nmv ; Sirunyan:2017ret . The ATLAS and CMS Collaborations have also measured other VBS channels, such as fully leptonic Sirunyan:2017fvv ; Aad:2020zbq , fully leptonic Aaboud:2018ddq ; Sirunyan:2019ksz and semi-leptonic or with the decaying hadronically Aad:2019xxo ; Sirunyan:2019der . New physics contributions to the VBS channels are usually parameterized by effective field theory (EFT) operators. Precision measurements of the VBS channels can be recast as constraints on the coefficients of the operators Fabbrichesi:2015hsa ; Liu:2018pkg ; Stolarski:2020qim .
Understanding the polarization of the gauge bosons is an important step after the measurements of the VBS processes. Vector bosons are unstable and can only be observed through their decay products. This leads to interference among different polarizations, which cancels exactly only when the azimuthal angles of the decay products are integrated over. Even though selection cuts in analyses make the cancellation incomplete, it is still possible to extract polarization fractions by fitting data with Monte Carlo simulated templates. There are studies aiming to determine the polarization of gauge bosons in the channel Han:2009em ; Ballestrero:2017bxn , in fully leptonic channel Ballestrero:2020qgv , in fully leptonic WZ/ZZ channels Ballestrero:2019qoy , in the SM Higgs decay Maina:2020rgd and in generic processes with a boosted hadronically decaying boson De:2020iwq . Various kinematic observables have been proposed in these works to discriminate between longitudinally and transversely polarized gauge bosons. Several recent studies have shown that deep neural networks taking final-state momenta as input can be used for regression of the lepton angle in the gauge boson rest frame Searcy:2015apa ; Grossi:2020orx and for classification of events from different polarizations Lee:2018xtt ; Lee:2019nhm .
Autoencoders have been widely used in model-agnostic searches at colliders, dubbed anomaly detection or novelty detection. An autoencoder learns to map an input to a compressed latent representation and then back to itself. An autoencoder trained on known SM processes can therefore identify BSM events as anomalies Cerri:2018anq ; Collins:2018epr ; Collins:2019jip ; Blance:2019ibf ; Andreassen:2020nkr ; Nachman:2020lpy ; Farina:2018fyg ; Roy:2019jae . In other cases, when the anomaly cannot be detected on a single event, density-based novelty evaluators DAgnolo:2018cun ; DeSimone:2018efk ; Hajer:2018kqm have been proposed to detect discrepancies between two datasets in the latent space. Since the VBS processes are an ideal window onto new physics related to EWSB, we can adopt autoencoders to detect possible new physics contributions to the process.
In this work, focusing on the fully leptonic and semi-leptonic channels of the +jets process, we propose a neural network based on the Transformer architecture vaswani2017attention to learn the features of the VBS process. Those features are not only useful in separating the VBS process from the SM backgrounds but also capable of discriminating different polarizations of the bosons in the VBS process. An autoencoder is trained on the features to reduce the dimensionality so that only the most relevant features are kept. Finally, we perform a binned log-likelihood test in the latent space to determine whether the distributions of the features are consistent with the SM prediction. The EFT and the Two Higgs Doublet Model (2HDM) are considered as examples to demonstrate that this method is able to test a wide class of BSM physics.
The paper is organized as follows. The analysis framework is introduced in Sec. II, including the event generation, architecture of neural network and binned log-likelihood analysis. Discrimination of different polarization modes of the production is discussed in Sec. III. In Sec. IV and Sec. V, we consider the applications of our method to effective field theory and two Higgs Doublet Model, respectively. Our conclusions are presented in Sec. VI.
II Analysis framework
II.1 Event generation for signals and backgrounds
The signal and background events in our study are generated with the MadGraph5_aMC@NLO Alwall:2014hca framework, in which MadSpin is used for the decays of heavy SM particles (top quark, W/Z boson), and Pythia 8.2 Sjostrand:2007gs is used for parton shower, hadronization and decay of hadrons. The latest version of MG5 is capable of handling polarized parton scattering BuarqueFranzosi:2019boy . This function is adopted to simulate the events of the VBS processes with fixed vector boson polarization in the final state. The detector effects are simulated by Delphes 3 with the ATLAS configuration card, where the b-tagging efficiency is set to 70%, and the mistagging rates for the charm- and light-flavor jets are 0.15 and 0.008, respectively ATLAS:2016gsw . The clustering of final state particles into jets is implemented by FastJet Cacciari:2011ma using the anti-kT algorithm with cone size parameter .
All of the diagrams at ( is the electroweak coupling constant) are included in simulating the VBS process (referred to as EW production hereafter), such as , processes with a final state vector boson radiated directly from a quark, and the significant interferences among diagrams. There are also mixed electroweak-QCD diboson productions at , where is the strong coupling constant. In the SM, the interference between the electroweak and mixed EW-QCD production is found to be small Biedermann:2017bss ; Ballestrero:2017bxn ; Campanario:2020xaf . In simulating the polarized processes, the definition of the polarization is frame-dependent. We take the partonic center of mass frame as the reference frame in this work, i.e. the rest frame defined by the two initial partons in the process. [Footnote 1: One could also use the rest frame of system as the reference frame, in which the fraction of longitudinally polarized bosons is slightly higher BuarqueFranzosi:2019boy .]
We study both the dileptonic channel and the semi-leptonic channel of the EW production, so at least one of the bosons is required to decay leptonically (denoted by ). The dominant backgrounds are QCD production of process, single top production, mixed EW-QCD production of and the EW production of . Since the fully hadronic final states are not relevant in our analysis, the following requirements are applied in generating the background events: (1) at least one of the top quarks decays leptonically in the process (denoted by ); (2) either or top quark decays leptonically in the process (denoted by ); (3) at least one of the bosons decays leptonically in the mixed electroweak-QCD process (denoted by ); (4) the boson decays leptonically in the mixed electroweak-QCD process (denoted by ) and in the EW process (denoted by ). In all of those cases, the transverse momenta of final state jets should be greater than 20 GeV. We use the measured inclusive cross sections at the LHC for CMS:2016rtp and Sirunyan:2018lcp processes, and the leading order cross sections calculated by MadGraph5_aMC@NLO for the diboson processes. The fiducial cross sections at the 13 TeV LHC are provided in the second column of Tab. 1.
Process | Fiducial [pb] | Di-Lepton [fb] | Semi-Lepton [fb] |
 | 210.3 | 139.8 | 3007.6 |
/ | 15.9 | 11.6 | 224.6 |
 | 4.68 | 14.7 | 340.5 |
 | 2.20 | 4.49 | 165.7 |
 | 0.487 | 3.68 | 22.2 |
 | 0.738 | 4.36 | 37.3 |
The events are divided into two classes with the following preselections Alessandro:2018khj :
• Di-Lepton: exactly two opposite sign leptons with ; at least two jets with ; the two jets with leading should give large invariant mass ( GeV) and have large pseudorapidity separation (); no b-tagged jet in the final state.
• Semi-Lepton: exactly one charged lepton with ; at least four jets with ; the pair of jets with the largest invariant mass ( GeV) that also satisfies is taken as the forward-backward jet pair; among the remaining jets, the jet pair with invariant mass closest to the boson mass is regarded as the jet pair from decay.
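The jet assignment in the Semi-Lepton selection can be sketched as follows. This is only an illustration, not the analysis code: the function names, the thresholds `mjj_min` and `deta_min` (whose values are not reproduced here) and the approximate W mass of 80.4 GeV are assumptions, and the code assumes that the hadronically decaying boson is a W and that the largest-mass pair is searched for among the pairs that pass both cuts.

```python
import math
from itertools import combinations

W_MASS = 80.4  # GeV, approximate W boson mass (assumption of this sketch)

def jet_from_pt_eta_phi(pt, eta, phi):
    """Massless jet four-momentum (E, px, py, pz) in GeV."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))

def inv_mass(j1, j2):
    """Invariant mass of a jet pair."""
    e, px, py, pz = (j1[i] + j2[i] for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def pseudorapidity(j):
    """Pseudorapidity computed from the jet three-momentum."""
    p = math.sqrt(j[1] ** 2 + j[2] ** 2 + j[3] ** 2)
    return math.atanh(j[3] / p)

def assign_jets(jets, mjj_min, deta_min):
    """Return (forward_backward_pair, boson_pair) as index tuples, or None."""
    best = None
    for a, b in combinations(range(len(jets)), 2):
        mass = inv_mass(jets[a], jets[b])
        deta = abs(pseudorapidity(jets[a]) - pseudorapidity(jets[b]))
        # keep the largest-mass pair among those passing both cuts
        if mass > mjj_min and deta > deta_min and (best is None or mass > best[0]):
            best = (mass, (a, b))
    if best is None:
        return None
    fb_pair = best[1]
    rest = [i for i in range(len(jets)) if i not in fb_pair]
    if len(rest) < 2:
        return None
    # among the remaining jets, take the pair closest to the boson mass
    boson_pair = min(combinations(rest, 2),
                     key=lambda p: abs(inv_mass(jets[p[0]], jets[p[1]]) - W_MASS))
    return fb_pair, boson_pair
```

Events for which `assign_jets` returns `None` would fail the Semi-Lepton preselection in this sketch.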
The cross sections for signal and backgrounds after the Di-Lepton and Semi-Lepton selections are provided in the third and fourth columns of Tab. 1, respectively. The process is the most important background in both channels, with a cross section times larger than that of the VBS process.
The preselected events are fed into the network to learn the features. Deep learning is known to be able to transform low-level inputs into discriminative outputs, so we represent each event by a set of four-momenta and their identities (the lepton charge is implied). [Footnote 2: We use the , although sometimes is used.] Different networks are adopted for the dileptonic channel and the semi-leptonic channel. The input for the network of the dileptonic channel consists of the momenta of the two leptons, the forward and backward jets, the sum of all detected particles and the sum of jets that are not assigned as forward-backward jets. The input for the network of the semi-leptonic channel consists of the momenta of the lepton, the forward and backward jets, the two jets from decay, the sum of all detected particles and the sum of the remaining jets. [Footnote 3: Jets that are not assigned as forward-backward jets or as jets from boson decay.] In short, there are six/seven momenta with identities in the input of the dileptonic/semi-leptonic channel.
II.2 Architecture of neural network
A simple fully connected neural network can extract the features of the input data, but it contains many redundant connections, which lower the extraction efficiency and make the network prone to overfitting. These problems can be alleviated by including the attention mechanism. As proposed in Ref. vaswani2017attention , the Transformer with its multi-head self-attention mechanism provides a variety of different attentions and improves the learning ability, and thus can be used to effectively extract the internal connections of the features.

The architecture of our neural network is illustrated in Fig. 1. The input consists of the identities and four-momenta of particles ( for the dileptonic/semi-leptonic channel). The original particle momentum () is normalized according to
(II.1) |
where the index runs over particles in an event. The mean and standard deviation are calculated on the particles from the full set of the training sample. Then, we embed the particle identities of each event into a uniform distribution (), and map the normalized four-momenta to a matrix () through a Mapping network. The Mapping network is a fully connected neural network with 4 hidden layers (each layer contains 64 neurons). The sum of those two components (which encodes the types of particles into the four-momenta, denoted by ) is fed into the Transformer. The Transformer contains four copies of encoder layers. Each encoder consists of a self-attention layer and a feed forward neural network followed by normalization layers. In particular, the self-attention layer maps the into
(II.2) |
where is constructed from , and are trainable parameter matrices.
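The normalization and the self-attention map described above can be illustrated with a minimal sketch in plain Python. It is not the network used in this work: it assumes that the mean and standard deviation are taken per particle slot and per momentum component over the training sample, and it uses the single-head scaled dot-product attention of Ref. vaswani2017attention , softmax(Q K^T / sqrt(d)) V, with hypothetical weight matrices `wq`, `wk` and `wv`.

```python
import math

def standardize(events):
    """events[e][i][k]: component k of the four-momentum of particle i in event e.
    Subtract the sample mean and divide by the sample standard deviation,
    computed separately for each particle slot i and component k (assumption)."""
    n_ev, n_part = len(events), len(events[0])
    out = [[[0.0] * 4 for _ in range(n_part)] for _ in range(n_ev)]
    for i in range(n_part):
        for k in range(4):
            vals = [ev[i][k] for ev in events]
            mean = sum(vals) / n_ev
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n_ev) or 1.0
            for e in range(n_ev):
                out[e][i][k] = (events[e][i][k] - mean) / std
    return out

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax of a list of numbers."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [x / total for x in exps]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v), weights
```

Each row of `weights` tells how strongly one particle attends to every other particle in the event; a multi-head layer repeats this with several independent sets of weight matrices.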
The output of the Transformer is a matrix of size . The features are obtained by averaging over the particle index (thus it has the shape ). Finally, a Classifier and an Autoencoder are applied to classify the inputs (into the processes to which they belong) and to reduce the dimensionality of the feature space. The Classifier and Autoencoder are trained simultaneously, using the Adam optimizer with a learning rate of . Even though a higher dimensional feature space provides better discrimination power, the statistical uncertainty in the shape analysis is significantly larger due to the limited number of simulated events ( for each signal process after preselection). In Fig. 2, we show the stabilized loss (typically after 100 epochs in the training) of the Autoencoder for different choices of the dimensionality of the feature space. We find that for all of the polarization modes in both the dileptonic and semi-leptonic channels, the three-dimensional latent space can reproduce the 64-dimensional features reasonably well (with loss ). Meanwhile, the binned log-likelihood analysis can be performed with relatively small statistical uncertainty.


II.3 Binned log-likelihood analysis in the latent space
The 3-dimensional latent space is divided into bins for the dileptonic channel and bins for the semi-leptonic channel, since the latter has a larger production rate. In principle, one could perform the binned log-likelihood test over all of the bins. However, we find that this renders the result sensitive to the tail of the distribution, where the signal and background event numbers are small. Although a more dedicated analysis could resolve this issue, as a simpler alternative we only use bins that contain a relatively large number of signal events. Among the bins which contain at least 1% of the total signal events, the ten with the highest signal to background ratios are selected for the log-likelihood test. [Footnote 4: For the EFT case, since the kinematic feature of production with non-zero is similar to that of the SM , the selected bins are identical in most of the cases. As for the 2HDM, around half of the selected bins are different from those of SM . Moreover, the selected bins differ from parameter point to parameter point in the 2HDM.] The backgrounds here refer to the summed contributions of , /, , and processes, and the signal refers to the and its new physics modifications. In realistic experiments, the number of signal events in each bin can be obtained by subtracting the predicted background event number from the measured number. This procedure selects 30% of the signal events and 0.5% of the total background events in most of the cases. According to the cross sections in Tab. 1, this procedure reduces the cross section of the combined backgrounds to the same level as that of the VBS signal.
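The bin selection described above can be sketched in a few lines. The function name and its arguments are illustrative, and the treatment of bins with zero expected background (ranked first) is an assumption of this sketch, not something specified in the text.

```python
def select_bins(signal, background, min_signal_frac=0.01, n_bins=10):
    """Indices of the n_bins bins with the highest signal-to-background ratio
    among bins holding at least min_signal_frac of the total signal.

    signal, background: expected event numbers per flattened latent-space bin.
    """
    total_signal = sum(signal)

    def s_over_b(i):
        # bins without background are ranked first (assumption)
        return signal[i] / background[i] if background[i] > 0 else float("inf")

    candidates = [i for i, s in enumerate(signal)
                  if s >= min_signal_frac * total_signal]
    candidates.sort(key=s_over_b, reverse=True)
    return candidates[:n_bins]
```

For example, `select_bins([50, 30, 0.5, 10, 9.5], [100, 10, 0.01, 5, 1000], n_bins=2)` discards the third bin, which holds less than 1% of the signal despite its large ratio, and keeps the bins with indices 1 and 3.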
For a given hypothesis (either the SM or new physics BSM), the expected number of events () in the th bin can be obtained from Monte Carlo simulation. The probability of the th bin having observed events follows the Poissonian probability, . So we can determine the probability of the full distribution by multiplying the Poissonian probabilities of the selected bins. The binned likelihood for hypothesis is defined as
(II.3) |
where runs over 10 selected bins. Subsequently, we can define the test statistic as the log likelihood ratio between a given hypothesis (new physics with fixed parameters) and the null hypothesis (the SM).
(II.4) |
We use the expected numbers of events from the two hypotheses ( and ) to generate two sets of pseudo-data. In each bin, the pseudo-data is obtained by generating a random number from a Poissonian (statistical uncertainty) plus Gaussian distribution (systematic uncertainty) with mean value of . We repeat this procedure times for and , respectively. This gives two distributions of the test statistic . Finally, the p-value of the test hypothesis () can be calculated by assuming that the actual observation is at the center of the distribution under the null hypothesis.
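The pseudo-experiment procedure can be sketched with the Python standard library as follows. Several choices here are assumptions rather than the definitions used in this work: the test statistic is taken as Q = -2 ln[L(test)/L(null)], the systematic uncertainty is modelled as a relative Gaussian smearing of the Poisson mean, the "center" of the null distribution is taken to be its median, and the number of pseudo-experiments `n_toys` is a free parameter. The Poisson sampler switches to a rounded normal approximation for large means.

```python
import math
import random

def log_likelihood(n, t):
    """ln of the product of Poisson probabilities for counts n given means t."""
    return sum(ni * math.log(ti) - ti - math.lgamma(ni + 1) for ni, ti in zip(n, t))

def log_likelihood_ratio(n, t_test, t_null):
    """Q = -2 ln[L(test) / L(null)] (sign and factor are a convention choice)."""
    return -2.0 * (log_likelihood(n, t_test) - log_likelihood(n, t_null))

def poisson(rng, lam):
    """Poisson random number; normal approximation for lam > 50."""
    if lam <= 0.0:
        return 0
    if lam > 50.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:  # Knuth's multiplication method
        k += 1
        prod *= rng.random()
    return k

def pseudo_data(rng, t, syst):
    """One pseudo-experiment: Gaussian-smeared mean, then Poisson fluctuation."""
    return [poisson(rng, max(rng.gauss(ti, syst * ti), 0.0)) for ti in t]

def p_value(t_test, t_null, syst=0.0, n_toys=2000, seed=1):
    """p-value of the test hypothesis when the observation sits at the
    median of the Q distribution under the null hypothesis."""
    rng = random.Random(seed)
    q_null = sorted(log_likelihood_ratio(pseudo_data(rng, t_null, syst), t_test, t_null)
                    for _ in range(n_toys))
    q_test = [log_likelihood_ratio(pseudo_data(rng, t_test, syst), t_test, t_null)
              for _ in range(n_toys)]
    q_obs = q_null[n_toys // 2]
    return sum(q >= q_obs for q in q_test) / n_toys
```

Here `t_test` and `t_null` are the expected event numbers in the selected bins under the two hypotheses. A test hypothesis is excluded at 95% C.L. when the returned value falls below 0.05.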
III Learning the features of vector boson polarization
Among polarization modes of the VBS processes, the longitudinally polarized component is most closely related to the unitarity issue, i.e. the property of the Higgs boson and possible new physics. There have been extensive studies on separating the polarization of the gauge boson in the VBS process, exploiting various kinematic variables. The lepton angular distribution in the gauge boson rest frame is known to be sensitive to the vector boson polarization,
(III.1) |
where the is the fraction of the corresponding helicity and the is the angle between the vector boson flight direction in a certain frame and the lepton flight direction in the vector boson rest frame. Even though the shape of the angular distribution is a good discriminating variable, it usually cannot be reconstructed precisely. In the dileptonic channel of , there are two missing neutrinos in the final state, so one cannot reconstruct the rest frame of each individual boson. As for the semi-leptonic channel, even though the neutrino momentum can be solved up to a twofold ambiguity (thus the full momenta of all particles can be calculated), there are usually large uncertainties in measuring the jet momenta and in identifying the forward-backward jets and the jets from boson decay. Moreover, the shape of the distribution can be distorted by the kinematic cuts that are needed to separate VBS from its backgrounds Stirling:2012zt .
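For orientation, the textbook helicity-fraction form of the lepton angular distribution for a leptonically decaying W boson can be written as a short function. This is the standard expression, which Eq. (III.1) presumably corresponds to; the function and argument names are illustrative, and the sign convention (left-handed term proportional to (1 - cos θ)^2 for a positively charged lepton) is the usual one, stated here as an assumption.

```python
def lepton_angle_pdf(cos_theta, f_left, f_right, f_long, lepton_charge=+1):
    """Normalized distribution in cos(theta) of the charged lepton in the
    W rest frame, for helicity fractions with f_left + f_right + f_long = 1."""
    sign = -1.0 if lepton_charge > 0 else 1.0
    return (0.375 * f_left * (1.0 + sign * cos_theta) ** 2
            + 0.375 * f_right * (1.0 - sign * cos_theta) ** 2
            + 0.75 * f_long * (1.0 - cos_theta ** 2))
```

The longitudinal component peaks at cos θ = 0 while the transverse components peak at cos θ = ±1, which is why this angle is a natural discriminating variable when it can be reconstructed.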
In this section, we demonstrate that our network is capable of discriminating different polarization modes of the electroweak production with the low-level inputs.
III.1 The dileptonic channel
We train the network with labeled events of electroweak , , , productions, respectively. Here () represents a longitudinally (transversely) polarized boson. The normalized distributions of those polarization modes in the three-dimensional latent space are shown in Fig. 3. [Footnote 5: Integrating the distribution over all bins gives one.] A larger cube indicates more events in that bin. There are remarkable differences among the distributions of different polarizations.




To assess the discriminating power of our network, we perform a comparative study of methods with different input variables. Besides the three latent features, two classes of variables are defined. [Footnote 6: We have tried many other variables; only those showing significant discriminating power are kept.]
• Detector level variables: variables in this class can be reconstructed experimentally. They include the transverse momenta of the two leptons and of the forward-backward jets ; the azimuthal angle difference between the forward and backward jets .
• Truth level variables: variables in this class can only be obtained from Monte Carlo simulation. They include the transverse momenta of the two bosons ; the lepton angle in the boson rest frame . The latter is calculated by , where is the boson momentum in the initial parton center of mass frame and is the lepton momentum in the boson rest frame.
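The truth-level lepton angle can be computed along the following lines. This is a sketch under stated assumptions: both four-momenta are taken to be given as (E, px, py, pz) in the chosen reference frame (the initial parton center of mass frame in the text), the lepton is boosted into the boson rest frame, and the cosine of the angle between the boson three-momentum and the boosted lepton three-momentum is returned. The function names are illustrative.

```python
import math

def boost(p, beta):
    """Four-momentum p = (E, px, py, pz) seen from a frame moving with velocity beta."""
    b2 = sum(b * b for b in beta)
    if b2 == 0.0:
        return list(p)
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = sum(b * x for b, x in zip(beta, p[1:]))
    coeff = (gamma - 1.0) * bp / b2 - gamma * p[0]
    return [gamma * (p[0] - bp)] + [x + coeff * b for x, b in zip(p[1:], beta)]

def cos_theta_star(boson, lepton):
    """Cosine of the angle between the boson flight direction and the
    lepton direction in the boson rest frame."""
    beta = [x / boson[0] for x in boson[1:]]
    lep_rest = boost(lepton, beta)
    dot = sum(a * b for a, b in zip(boson[1:], lep_rest[1:]))
    norm_boson = math.sqrt(sum(a * a for a in boson[1:]))
    norm_lep = math.sqrt(sum(b * b for b in lep_rest[1:]))
    return dot / (norm_boson * norm_lep)
```

Boosting the boson itself with `boost(boson, beta)` returns a vector at rest with energy equal to the boson mass, which is a quick sanity check of the transformation.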
The Gradient Boosting Decision Tree (GBDT) method is adopted to calculate the receiver operating characteristic (ROC) curves, using as input the variables of a given class either with or without the latent variables. The ROC curves are shown in the left panel of Fig. 4, where we have considered the events of the as the signal and events of other polarization modes as background. The method using the latent features alone already outperforms the GBDT with all detector level variables. Moreover, the GBDT which combines the latent variables with the detector level variables does not have better discriminating power than the method with the latent variables alone. This indicates that the information in those detector level variables is already contained in the latent variables. The GBDT with truth level variables has slightly better discriminating power than the method with latent variables. It is also interesting to observe that the discriminating power can be improved further by combining the truth level variables and the latent variables.


When new physics modifies the Higgs to gauge boson interaction, the incomplete cancellation in the VBS amplitude leads to an increased fraction of longitudinally polarized gauge bosons in the final state. The current precision measurements of the SM allow the fraction to increase at the percent level, e.g. from 6% to 7% in the following case. To study the sensitivity of the latent variables to this amount of change, we perform the binned log-likelihood analysis, taking the SM cross section (after applying the cut of GeV at parton level) for each polarized component. These are fb, fb, fb and fb respectively. The test hypothesis takes fb while keeping the other cross sections the same. The p-values for the hypothesis test with varying integrated luminosity are shown in Fig. 4, where we have considered the cases with three different systematic uncertainties. We conclude that the future LHC is capable of detecting such a change if the systematic uncertainty is below . Note that the background processes are not considered at this stage. Moreover, the new physics may not be simply described as a sum of the SM components. A more complete and realistic analysis is given in the next two sections.
III.2 The semi-leptonic channel
Compared to the dileptonic channel, the semi-leptonic channel has a much larger production cross section and only includes a single neutrino in the final state. Better discrimination power can be obtained in this channel. Similarly, the network for the semi-leptonic channel is trained with labeled events of EW production of with different polarizations. The normalized distribution for each polarization mode in the latent space is shown in Fig. 5.




The two classes of variables that are used in the GBDT method to calculate the ROC curves are listed as follows.
• Detector level variables: transverse momentum and pseudorapidity of the lepton, azimuthal angle difference between the forward and backward jets and the transverse momentum of the boson pair, which can be calculated as the vector sum of the transverse momenta of its decay products (including the missing transverse momentum).
• Truth level variables: transverse momenta of the two bosons , the lepton angle in the W boson rest frame and the invariant mass of the forward and backward jets .


The ROC curves for methods with different inputs are presented in the left panel of Fig. 6. Even though the semi-leptonic channel contains only one neutrino in the final state, the large uncertainty in jet measurement and the confusion of forward-backward jets with jets from boson decay make the polarization discriminating power of this channel similar to that of the dileptonic channel. However, due to the sizable production rate of this channel, a dataset with integrated luminosity of fb-1 can be used to probe the 1% change in the fraction.
It should be noted that this result is only provided as a rough estimate. In a concrete model, the differential cross section of the EW channel is not simply given by the combination of the SM polarization components. Variables other than those listed above can be helpful in discriminating different polarizations. Meanwhile, the contribution from SM background processes should be taken into account. In the following two sections, we consider the effective field theory and the two-Higgs-Doublet-Model (2HDM) as case studies.
IV Application to the effective field theory
In the absence of direct observations of new states, a practical way of investigating new physics is a description based on the EFT, which is valid up to the scale of new physics. The EFT contains a complete set of independent gauge-invariant operators built from the SM fields. There have been numerous studies on constraining the coefficients of these operators with precision measurements at experiments DiVita:2017eyz ; Ellis:2018gqa ; Grojean:2018dqj ; Biekotter:2018rhp ; Almeida:2018cld . Most of the operators are tightly constrained by the electroweak precision tests (EWPT) of the SM. We will consider the following operator Giudice:2007fh ; Contino:2013kra
(IV.1) |
since it is less constrained by the EWPT. The field is Higgs doublet and denotes the Higgs boson field with the vacuum expectation value GeV. The operator contributes to the Higgs boson kinetic term, and an appropriate field redefinition is required to bring back the kinetic term to its canonical form
(IV.2) |
It leads to the following changes to the Higgs couplings
(IV.3) |
The updated global fit to the EFT coefficients constrains (marginalizing over all other operators) Dawson:2020oco . Future lepton colliders, such as the ILC, will constrain the to the 1% level Jung:2020uzh .
We study its effects on the EW production at the LHC. As the polarization vector grows with momentum , the longitudinally polarized gauge boson scattering () is dominant at high energy. In the high energy limit, the amplitude for the longitudinal boson scattering without Higgs contribution is
(IV.4) |
which cancels with the amplitude from Higgs exchange
(IV.5) |
leaving terms that do not rise with energy. Here, are Mandelstam variables. However, the cancellation only holds if the Higgs boson couplings to gauge bosons are exactly SM-like. The operator modifies the Higgs boson couplings as shown in Eq. IV.3, leading to an incomplete cancellation up to the scale where new physics states come in. As a result, the fraction of the is increased and the kinematic properties of the final states are changed.
 | -1.0 | -0.5 | 0 | 0.5 | 1.0 |
[fb] | 440.6 | 421.8 | 419.7 | 426.7 | 436.2 |
[fb] | 4.82 | 4.44 | 4.36 | 4.48 | 4.62 |
[fb] | 40.2 | 37.7 | 37.3 | 37.9 | 39.3 |
[fb] | 46.29 | 29.68 | 25.84 | 28.79 | 34.01 |
[fb] | 0.754 | 0.397 | 0.314 | 0.356 | 0.462 |
[fb] | 5.28 | 3.04 | 2.40 | 2.79 | 3.50 |
We adopt the UFO model as implemented in Ref. Alloul:2013naa to generate the EW events in the EFT. All of the coefficients except the are set to zero. Both the dileptonic channel and the semi-leptonic channel are considered. Only those events that pass the preselection cuts listed in Sec. II.1 are fed into the network for further analyses. The production cross sections of the EW process (with different choices of ) before and after the preselections are given in Tab. 2. The case corresponds to the SM. We find that the fraction of the longitudinal production increases with as the cancellation becomes less exact. Our preselection cuts also raise the fraction of the longitudinal , especially for the dileptonic channel. After the preselections, the production rate of the semi-leptonic channel is an order of magnitude larger than that of the dileptonic channel.
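The trend in the longitudinal fraction can be checked directly from the numbers of Tab. 2. The sketch below assumes, since the row labels are not reproduced here, that the first three rows of the table are the total cross sections (before preselection, after the Di-Lepton selection, after the Semi-Lepton selection) and the last three rows are the corresponding longitudinal components; the dictionary keys are illustrative names.

```python
COEFFICIENT = [-1.0, -0.5, 0.0, 0.5, 1.0]

# cross sections in fb, copied from Tab. 2 (row interpretation is an assumption)
TOTAL = {
    "before preselection": [440.6, 421.8, 419.7, 426.7, 436.2],
    "dileptonic": [4.82, 4.44, 4.36, 4.48, 4.62],
    "semi-leptonic": [40.2, 37.7, 37.3, 37.9, 39.3],
}
LONGITUDINAL = {
    "before preselection": [46.29, 29.68, 25.84, 28.79, 34.01],
    "dileptonic": [0.754, 0.397, 0.314, 0.356, 0.462],
    "semi-leptonic": [5.28, 3.04, 2.40, 2.79, 3.50],
}

def longitudinal_fraction(stage):
    """Ratio of longitudinal to total cross section for each coefficient value."""
    return [lon / tot for lon, tot in zip(LONGITUDINAL[stage], TOTAL[stage])]

if __name__ == "__main__":
    for stage in TOTAL:
        row = ", ".join(f"{100 * f:.1f}%" for f in longitudinal_fraction(stage))
        print(f"{stage}: {row}")
```

Under this reading, the SM fraction is about 6.2% before preselection and about 7.2% (6.4%) after the Di-Lepton (Semi-Lepton) selection, and it is smallest at the SM point in all three rows.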
In this and the next section, the same network, trained on the labeled SM background processes as well as the SM with different polarizations, is used for testing. Events of the new physics are not used for training the network, in order to show that our method is model agnostic. Analyzing the preselected events of both the SM background processes and the EFT processes with the pre-trained network, we obtain the distributions of those processes in the 3-dimensional latent space. The normalized distributions are presented in Fig. 7, where the background corresponds to the weighted sum of all SM processes (including , /, , and ) as discussed in Sec. II.1. Since the network is trained to separate the SM background processes from the SM , it is not surprising that the background events are well separated from the signal events (EW production in the EFT). Moreover, there are visible differences among the distributions of EW production with different . This feature can be used to constrain the value of .








To measure the consistency of the SM and the EFT with non-zero , we perform the binned log-likelihood test in the latent space. As discussed in Sec. II.3, only the ten bins with the highest signal to background ratios are used. According to our simulation, this selects 30% of the signal events and 0.5% of the background events after the preselection. The null hypothesis is the SM backgrounds plus SM EW and the test hypothesis is the SM backgrounds plus EFT EW with a non-zero . The integrated luminosity required to probe different at 95% Confidence Level (C.L.) is presented in Fig. 8. The semi-leptonic channel outperforms the dileptonic channel if the systematic uncertainty can be controlled below 5%. Due to higher backgrounds in the semi-leptonic channel, the sensitivity drops quickly when the systematic uncertainty is larger than 5%. With a systematic uncertainty around 5%, our method will be able to constrain the to [-0.2,0.1] at the high luminosity LHC.


IV.1 Effects of event simulation error
Since our network is trained to detect anomalies relative to the simulated SM processes, it could be sensitive to errors in the simulation. In Fig. 9, we show how the results of our shape analyses change if the testing samples are simulated independently from the training ones. To calculate the p-values in the figure, the null hypothesis is always the SM prediction with the event simulation discussed above. In the test hypotheses (NSM and N), the events of the SM processes are simulated independently with Herwig++ Bahr:2008pv ; Bellm:2015jjp for parton shower and hadronization, and Delphes with ATLAS parameters for detector simulation. For the SM processes, the two independent simulations lead to () systematic deviations in the selected bins for the dileptonic (semi-leptonic) channel. As a result, if the systematic uncertainty in the shape analysis is chosen to be smaller than the systematic deviations caused by the simulation, event samples of the two simulations for the SM processes can be distinguished, as shown by the blue lines in both panels. Moreover, the difference between the simulations in the null and test hypotheses leads to over-optimistic results for the sensitivity to new physics, although the effect is mild when the systematic uncertainty in the shape analysis is chosen to be large.


V Application to the 2HDM
The EFT description may not be valid when the collision energy approaches the masses of new states. Here we consider an ultraviolet complete model, the 2HDM Aoki:2009ha ; Branco:2011iw , which is one of the simplest extensions of the Higgs sector of the SM. The scalar sector of the 2HDM consists of two doublets. A discrete symmetry is imposed to avoid tree-level flavor changing neutral currents. Depending on how this symmetry is extended to the fermion sector, four types of the 2HDM can be realized. The type-II case is considered in this work. The 2HDM predicts many remarkable signatures at hadron colliders. In particular, there are resonant signals due to the existence of an extra CP-even scalar, a CP-odd scalar and a charged scalar. Instead of proposing a dedicated search for each of those signals, we show that our method is sensitive to changes of the polarization and kinematic properties of the EW production in the 2HDM. By comparing the latent features of the process in the 2HDM with those from measurement, constraints on the parameters of the 2HDM can be obtained.
There are six parameters in the type-II 2HDM: the masses of the scalars (, and ), the mixing angle between the two CP-even scalars and the ratio between the two vacuum expectation values . The has been measured to be close to 125 GeV. The and are not relevant in the production. Their masses are set to 3 TeV to forbid the decays of into those states. The couplings of the CP-even scalars to the bosons are given by
(V.1) |
So the combination is usually used to replace the parameter. Even though the alone is not related to the couplings, it can modify the couplings of the scalars to fermions, which means that the total decay width of the , and thus the kinematics of , can be affected. We will choose for simplicity (footnote 7: The influence of the on the production is mild as long as the decay width of the is not too large.). So we are left with two free parameters: and . The partial widths of the are given by
(V.2) | ||||
(V.3) | ||||
(V.4) | ||||
(V.5) |
with , and / is the Yukawa coupling of the top/bottom quark.
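The role of the mixing parameter in the gauge couplings can be made concrete with a short numerical sketch. It assumes the standard tree-level 2HDM relations, in which the light and heavy CP-even scalars couple to vector bosons with strengths sin(β−α) and cos(β−α) relative to the SM Higgs coupling; the extracted text of Eq. (V.1) is not legible here, so this is a stated assumption rather than a transcription. The function name and the sign convention (cos(β−α) taken non-negative) are illustrative choices.

```python
import math


def cp_even_gauge_coupling_modifiers(sin_beta_minus_alpha):
    """Return (kappa_h, kappa_H), the couplings of the light and heavy
    CP-even scalars to vector bosons in units of the SM Higgs coupling.

    Assumes the standard tree-level 2HDM relations
    kappa_h = sin(beta - alpha) and kappa_H = cos(beta - alpha),
    with cos(beta - alpha) taken non-negative.
    """
    s = sin_beta_minus_alpha
    if not -1.0 <= s <= 1.0:
        raise ValueError("sin(beta - alpha) must lie in [-1, 1]")
    kappa_h = s
    kappa_H = math.sqrt(1.0 - s * s)
    return kappa_h, kappa_H


# Illustrative values, including the alignment limit sin(beta - alpha) = 1.
for s in (0.7, 0.9, 1.0):
    kappa_h, kappa_H = cp_even_gauge_coupling_modifiers(s)
    print(f"sin(b-a) = {s:.1f}: kappa_h = {kappa_h:.3f}, kappa_H = {kappa_H:.3f}")
```

Under this assumption the two modifiers satisfy kappa_h^2 + kappa_H^2 = 1, and the heavy scalar decouples from the vector bosons in the alignment limit.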
The model is implemented in FeynRules Alloul:2013bka , which generates the UFO model files for MG5 to calculate the leading-order production cross section and simulate the events. In Tab. 3, we present the production cross sections of the EW process for a few points in the 2HDM as an illustration. In particular, the contribution of the heavy scalar is taken into account, which leads to an increased total production rate most of the time (footnote 8: The cross section in the 2HDM can be smaller than that in the SM when the mass of the is large and the decay width of the is large, because of the destructive interference between and in some regions of phase space.).
 | (300, 0.7) | (300, 0.9) | (700, 0.7) | (700, 0.9) |
[fb] | 636.2 | 492.5 | 461.9 | 428.5 |
[fb] | 8.362 | 5.853 | 5.527 | 4.842 |
[fb] | 64.07 | 46.52 | 43.70 | 39.33 |
[fb] | 170.75 | 79.81 | 71.58 | 42.65 |
[fb] | 2.91 | 1.27 | 1.30 | 0.676 |
[fb] | 20.78 | 9.35 | 9.50 | 5.06 |
Because the cancellation between the amplitudes with and without Higgs exchange is delayed to the scale of and the heavy scalar decays dominantly into longitudinally polarized vector bosons, the fraction of is considerably larger than in the SM. For relatively light and small (which means the contribution of is significant), the fraction of can reach 30% before applying the preselection cuts, while it is 6% in the SM. The preselection can increase the fraction even further. This feature renders our network very sensitive to the signals in the 2HDM.
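As a rough arithmetic cross-check of the fraction quoted above, one can take ratios of rows in Tab. 3. The row labels of the table are not legible in this version of the text, so the sketch below rests on an explicit assumption: the first three rows are read as total cross sections and the last three rows as the matching sub-component in the same order. If the rows have a different meaning, the ratios below do not represent the quoted fraction.

```python
# Cross sections in fb, copied from Tab. 3. The meaning of the rows is an
# ASSUMPTION: rows 1-3 are treated as totals and rows 4-6 as the matching
# sub-components, in the same order.
benchmarks = ["(300, 0.7)", "(300, 0.9)", "(700, 0.7)", "(700, 0.9)"]
upper_rows = [
    [636.2, 492.5, 461.9, 428.5],
    [8.362, 5.853, 5.527, 4.842],
    [64.07, 46.52, 43.70, 39.33],
]
lower_rows = [
    [170.75, 79.81, 71.58, 42.65],
    [2.91, 1.27, 1.30, 0.676],
    [20.78, 9.35, 9.50, 5.06],
]


def row_ratios(numerators, denominators):
    """Element-wise ratio of two table rows."""
    return [n / d for n, d in zip(numerators, denominators)]


ratios = [row_ratios(low, up) for low, up in zip(lower_rows, upper_rows)]

for row in ratios:
    print("  ".join(f"{b}: {100 * r:.1f}%" for b, r in zip(benchmarks, row)))
```

Under this reading, the first benchmark gives ratios of roughly 27% to 35%, and the ratios fall towards about 10% to 14% for the last benchmark.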



Moreover, the existence of the resonance in the production also gives rise to discriminative features in the final state. In Fig. 10, we plot the normalized distributions of the latent features for the production from the pure resonance in the dileptonic channel. Different masses of the have distinct distributions in the latent space. This means that the network is not only capable of classifying the polarizations of the vector bosons, but is also sensitive to their kinematic properties, even though those 2HDM events are not used for training.
Finally, we pass the preselected events in the dileptonic and semi-leptonic channels to the pre-trained network to extract the latent features. The binned log-likelihood test is performed in the latent space to find the discovery potential for different parameter points of the 2HDM. As before, the null hypothesis is taken as the SM backgrounds plus the SM EW and the test hypothesis is taken as the SM backgrounds (assuming those processes are unchanged in the 2HDM) plus the EW in the 2HDM with different sets of parameters. The integrated luminosities required for probing the - plane at 95% C.L. are shown in Fig. 11 for the dileptonic and semi-leptonic channels, respectively. Unlike the traditional heavy Higgs resonance searches Aaboud:2018bun ; Sirunyan:2019pqw , whose sensitivities drop quickly at large due to the suppressed production rate, our method probes the resonant feature and the modification of the Higgs couplings simultaneously. The parameter space with as heavy as 1.5 TeV can be probed with relatively low integrated luminosity, provided the is not too close to one. However, as (the alignment limit), our method loses its sensitivity completely. Searches for the resonances in fermionic channels are still able to constrain the model CMS:2019hvr ; Aad:2020zxo ; Kling:2020hmi ; Chen:2015fca , since their production is mainly controlled by the Yukawa couplings. The production cross sections of both channels before applying the preselection cuts are indicated by the color scale in the figure. The sensitivity of the method is roughly determined by the cross section, although a slightly better sensitivity can be achieved in the small region: e.g., compared with the point (), a lower integrated luminosity is required to probe the point (), even though their production cross sections are similar. The improved sensitivity is attributed to the fact that a point with smaller contains a larger fraction of the longitudinal boson.
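How a required integrated luminosity follows from a binned likelihood comparison can be sketched in a simplified setting. The example below is not the exact statistical procedure of this work: it neglects systematic uncertainties, uses an Asimov-style Poisson log-likelihood ratio in which the test hypothesis is taken as the expected data, and adopts 3.84 (the 95% quantile of a chi-square distribution with one degree of freedom) as the threshold. The per-bin event rates and the function names are hypothetical.

```python
import math


def asimov_q_per_fb(null_rates, test_rates):
    """Poisson log-likelihood-ratio test statistic per unit luminosity.

    null_rates and test_rates are expected events per fb^-1 in each bin
    (all strictly positive). The expected data are set to the test rates.
    """
    q = 0.0
    for n, t in zip(null_rates, test_rates):
        q += 2.0 * (t * math.log(t / n) - (t - n))
    return q


def required_luminosity(null_rates, test_rates, q_threshold=3.84):
    """Luminosity in fb^-1 at which the statistic reaches q_threshold.

    Without systematics the statistic scales linearly with luminosity.
    The default threshold of 3.84 is an assumed 95% C.L. criterion.
    """
    q1 = asimov_q_per_fb(null_rates, test_rates)
    if q1 <= 0.0:
        return math.inf  # identical hypotheses can never be separated
    return q_threshold / q1


# Hypothetical expected rates (events per fb^-1) in three latent-space bins.
null_rates = [2.0, 1.0, 0.5]
test_rates = [2.0, 1.1, 0.7]

lumi = required_luminosity(null_rates, test_rates)
print(f"required luminosity: {lumi:.1f} fb^-1")
print(required_luminosity(null_rates, null_rates))
```

When the test and null predictions coincide in every bin, as happens in the alignment limit discussed above, the statistic vanishes and the required luminosity diverges.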


VI Discussion and conclusion
In this work, we construct a neural network consisting of a classification network and an autoencoder. With low-level information as input (4-momenta and the identities of particles in our case), the network is capable of reducing the dimensionality of the feature space for production without losing much discriminating power (discriminating the EW from other processes, as well as discriminating different polarization modes of the EW ). We find that the feature space of both the dileptonic and semi-leptonic channels can be compressed into three dimensions. By performing the binned log-likelihood test on the distributions of the latent features, we can conclude whether the data are consistent with the SM prediction. We have shown that those latent features are very sensitive to various possible new physics contributing to the VBS. Even though the scores given by the classifier network contain a certain amount of the process information, they are not as complete as the latent features. In Fig. 12, we present the sensitivities of the latent features and the sensitivities of the score obtained by the classifier (footnote 9: Among the scores, we find that the sum of the scores of all polarization components of EW leads to the best result, so it is used for calculating the p-value in the plots.) for two benchmark points in the EFT and the 2HDM. It is not surprising that the latent features have better sensitivities. In particular, the remarkable kinematic feature of the 2HDM is not very useful in classifying SM processes, which means that this sort of information can be lost in the scores given by the classifier. Compared with the EFT case, the improvement from using the latent features is much more significant in the 2HDM.
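Before a binned test can be applied, the three-dimensional latent vectors have to be filled into a histogram. A minimal sketch of such a binning step is given below. The grid size, the range, the treatment of out-of-range points and the example points are all hypothetical, since the binning actually used is not specified in this section.

```python
from collections import Counter


def bin_latent_features(points, n_bins, low, high):
    """Histogram 3D latent vectors on a regular n_bins^3 grid over [low, high).

    Points outside the range are clamped into the nearest edge bin.
    Returns a Counter mapping (i, j, k) bin indices to event counts.
    """
    width = (high - low) / n_bins
    counts = Counter()
    for point in points:
        index = []
        for x in point:
            i = int((x - low) / width)
            i = min(max(i, 0), n_bins - 1)  # clamp underflow and overflow
            index.append(i)
        counts[tuple(index)] += 1
    return counts


# Hypothetical latent vectors for four events.
latent_points = [
    (0.10, 0.20, 0.30),
    (0.15, 0.22, 0.31),
    (0.90, 0.90, 0.90),
    (1.20, -0.10, 0.50),  # outside [0, 1) in two coordinates
]

histogram = bin_latent_features(latent_points, n_bins=4, low=0.0, high=1.0)
for bin_index, count in sorted(histogram.items()):
    print(bin_index, count)
```

The resulting per-bin counts, computed once for the null prediction and once for the test prediction, are the kind of input a binned likelihood comparison operates on.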


Considering both the dileptonic and semi-leptonic channels of the production, we show that our network is capable of classifying different polarization modes efficiently. Without considering the background, the LHC dataset with integrated luminosity fb^-1 will be sufficient to probe a 1% change in the longitudinal fraction using the semi-leptonic channel. The dileptonic channel is less sensitive due to its small production rate. Then, the network is applied to the EFT with non-zero operator and to the type-II 2HDM, taking into account the background effects, to obtain more complete and realistic results. In the EFT, our method will be able to constrain the coefficient to [-0.2,0.1], provided the systematic uncertainty is around 5%. The dileptonic channel outperforms the semi-leptonic channel if the systematic uncertainty is higher than 5%. In the 2HDM, since our method is sensitive to both the resonant decay and the modification of the SM Higgs couplings, the whole region with and TeV can be probed with an integrated luminosity of 300 fb^-1 at the LHC.
We note that modifications of the SM are unlikely to be confined to VBS processes. Assuming a new physics scenario of some kind, model-dependent searches can be very effective in discovering the signal. Our method may not be as sensitive as those model-dependent searches for specific signals. For example, in the 2HDM with , our method is insensitive to the parameter space where (corresponding to ). On the other hand, searches for at the LHC have already excluded the parameter space with GeV CMS:2019hvr ; Aad:2020zxo ; Kling:2020hmi . The advantage of our method is that it is suitable for detecting a wide class of new physics that contributes to the VBS, i.e. that is related to the SM electroweak symmetry breaking. This is especially useful when the form of the new physics is not known.
Acknowledgement
This work was supported in part by the Fundamental Research Funds for the Central Universities, by the NSFC under grant No. 11905149 and No. 11875306.
References
- (1) M. Rauch, Vector-Boson Fusion and Vector-Boson Scattering, arXiv:1610.08420.
- (2) D. R. Green, P. Meade, and M.-A. Pleier, Multiboson interactions at the LHC, Rev. Mod. Phys. 89 (2017), no. 3 035008, [arXiv:1610.07572].
- (3) C. Anders et al., Vector boson scattering: Recent experimental and theory developments, Rev. Phys. 3 (2018) 44–63, [arXiv:1801.04203].
- (4) J. Baglio et al., VBSCan Mid-Term Scientific Meeting, in VBSCan Mid-Term Scientific Meeting, 4, 2020. arXiv:2004.00726.
- (5) M. Gallinaro et al., Beyond the Standard Model in Vector Boson Scattering Signatures, 5, 2020. arXiv:2005.09889.
- (6) ATLAS Collaboration, G. Aad et al., Evidence for Electroweak Production of W±W±jj in pp Collisions at √s = 8 TeV with the ATLAS Detector, Phys. Rev. Lett. 113 (2014), no. 14 141803, [arXiv:1405.6241].
- (7) CMS Collaboration, V. Khachatryan et al., Study of vector boson scattering and search for new physics in events with two same-sign leptons and two jets, Phys. Rev. Lett. 114 (2015), no. 5 051801, [arXiv:1410.6315].
- (8) ATLAS Collaboration, M. Aaboud et al., Observation of electroweak production of a same-sign W boson pair in association with two jets in pp collisions at √s = 13 TeV with the ATLAS detector, Phys. Rev. Lett. 123 (2019), no. 16 161801, [arXiv:1906.03203].
- (9) CMS Collaboration, A. M. Sirunyan et al., Observation of electroweak production of same-sign W boson pairs in the two jet and two same-sign lepton final state in proton-proton collisions at 13 TeV, Phys. Rev. Lett. 120 (2018), no. 8 081801, [arXiv:1709.05822].
- (10) CMS Collaboration, A. M. Sirunyan et al., Measurement of vector boson scattering and constraints on anomalous quartic couplings from events with four leptons and two jets in proton–proton collisions at 13 TeV, Phys. Lett. B 774 (2017) 682–705, [arXiv:1708.02812].
- (11) ATLAS Collaboration, G. Aad et al., Observation of electroweak production of two jets and a Z-boson pair with the ATLAS detector at the LHC, arXiv:2004.10612.
- (12) ATLAS Collaboration, M. Aaboud et al., Observation of electroweak W±Z boson pair production in association with two jets in pp collisions at √s = 13 TeV with the ATLAS detector, Phys. Lett. B 793 (2019) 469–492, [arXiv:1812.09740].
- (13) CMS Collaboration, A. M. Sirunyan et al., Measurement of electroweak WZ boson production and search for new physics in WZ + two jets events in pp collisions at 13TeV, Phys. Lett. B 795 (2019) 281–307, [arXiv:1901.04060].
- (14) ATLAS Collaboration, G. Aad et al., Search for the electroweak diboson production in association with a high-mass dijet system in semileptonic final states in pp collisions at √s = 13 TeV with the ATLAS detector, Phys. Rev. D 100 (2019), no. 3 032007, [arXiv:1905.07714].
- (15) CMS Collaboration, A. M. Sirunyan et al., Search for anomalous electroweak production of vector boson pairs in association with two jets in proton-proton collisions at 13 TeV, Phys. Lett. B 798 (2019) 134985, [arXiv:1905.07445].
- (16) M. Fabbrichesi, M. Pinamonti, A. Tonero, and A. Urbano, Vector boson scattering at the LHC: A study of the WW → WW channels with the Warsaw cut, Phys. Rev. D 93 (2016), no. 1 015004, [arXiv:1509.06378].
- (17) D. Liu and L.-T. Wang, Prospects for precision measurement of diboson processes in the semileptonic decay channel in future LHC runs, Phys. Rev. D 99 (2019), no. 5 055001, [arXiv:1804.08688].
- (18) D. Stolarski and Y. Wu, Tree-level interference in vector boson fusion production of Vh, Phys. Rev. D 102 (2020), no. 3 033006, [arXiv:2006.09374].
- (19) T. Han, D. Krohn, L.-T. Wang, and W. Zhu, New Physics Signals in Longitudinal Gauge Boson Scattering at the LHC, JHEP 03 (2010) 082, [arXiv:0911.3656].
- (20) A. Ballestrero, E. Maina, and G. Pelliccioli, W boson polarization in vector boson scattering at the LHC, JHEP 03 (2018) 170, [arXiv:1710.09339].
- (21) A. Ballestrero, E. Maina, and G. Pelliccioli, Different polarization definitions in same-sign WW scattering at the LHC, arXiv:2007.07133.
- (22) A. Ballestrero, E. Maina, and G. Pelliccioli, Polarized vector boson scattering in the fully leptonic WZ and ZZ channels at the LHC, JHEP 09 (2019) 087, [arXiv:1907.04722].
- (23) E. Maina, Vector boson polarizations in the decay of the Standard Model Higgs, arXiv:2007.12080.
- (24) S. De, V. Rentala, and W. Shepherd, Measuring the polarization of boosted, hadronic W bosons with jet substructure observables, arXiv:2008.04318.
- (25) J. Searcy, L. Huang, M.-A. Pleier, and J. Zhu, Determination of the WW polarization fractions in pp → W±W±jj using a deep machine learning technique, Phys. Rev. D 93 (2016), no. 9 094033, [arXiv:1510.01691].
- (26) M. Grossi, J. Novak, D. Rebuzzi, and B. Kersevan, Comparing Traditional and Deep-Learning Techniques of Kinematic Reconstruction for polarisation Discrimination in Vector Boson Scattering, arXiv:2008.05316.
- (27) J. Lee, N. Chanon, A. Levin, J. Li, M. Lu, Q. Li, and Y. Mao, Polarization fraction measurement in same-sign WW scattering using deep learning, Phys. Rev. D 99 (2019), no. 3 033004, [arXiv:1812.07591].
- (28) J. Lee, N. Chanon, A. Levin, J. Li, M. Lu, Q. Li, and Y. Mao, Polarization fraction measurement in ZZ scattering using deep learning, Phys. Rev. D 100 (2019), no. 11 116010, [arXiv:1908.05196].
- (29) O. Cerri, T. Q. Nguyen, M. Pierini, M. Spiropulu, and J.-R. Vlimant, Variational Autoencoders for New Physics Mining at the Large Hadron Collider, JHEP 05 (2019) 036, [arXiv:1811.10276].
- (30) J. H. Collins, K. Howe, and B. Nachman, Anomaly Detection for Resonant New Physics with Machine Learning, Phys. Rev. Lett. 121 (2018), no. 24 241803, [arXiv:1805.02664].
- (31) J. H. Collins, K. Howe, and B. Nachman, Extending the search for new resonances with machine learning, Phys. Rev. D 99 (2019), no. 1 014038, [arXiv:1902.02634].
- (32) A. Blance, M. Spannowsky, and P. Waite, Adversarially-trained autoencoders for robust unsupervised new physics searches, JHEP 10 (2019) 047, [arXiv:1905.10384].
- (33) A. Andreassen, B. Nachman, and D. Shih, Simulation Assisted Likelihood-free Anomaly Detection, Phys. Rev. D 101 (2020), no. 9 095004, [arXiv:2001.05001].
- (34) B. Nachman and D. Shih, Anomaly Detection with Density Estimation, Phys. Rev. D 101 (2020) 075042, [arXiv:2001.04990].
- (35) M. Farina, Y. Nakai, and D. Shih, Searching for New Physics with Deep Autoencoders, Phys. Rev. D 101 (2020), no. 7 075021, [arXiv:1808.08992].
- (36) T. S. Roy and A. H. Vijay, A robust anomaly finder based on autoencoders, arXiv:1903.02032.
- (37) R. T. D’Agnolo and A. Wulzer, Learning New Physics from a Machine, Phys. Rev. D 99 (2019), no. 1 015014, [arXiv:1806.02350].
- (38) A. De Simone and T. Jacques, Guiding New Physics Searches with Unsupervised Learning, Eur. Phys. J. C 79 (2019), no. 4 289, [arXiv:1807.06038].
- (39) J. Hajer, Y.-Y. Li, T. Liu, and H. Wang, Novelty Detection Meets Collider Physics, Phys. Rev. D 101 (2020), no. 7 076015, [arXiv:1807.10261].
- (40) A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, 2017.
- (41) J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H. S. Shao, T. Stelzer, P. Torrielli, and M. Zaro, The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations, JHEP 07 (2014) 079, [arXiv:1405.0301].
- (42) T. Sjostrand, S. Mrenna, and P. Z. Skands, A Brief Introduction to PYTHIA 8.1, Comput. Phys. Commun. 178 (2008) 852–867, [arXiv:0710.3820].
- (43) D. Buarque Franzosi, O. Mattelaer, R. Ruiz, and S. Shil, Automated predictions from polarized matrix elements, JHEP 04 (2020) 082, [arXiv:1912.01725].
- (44) ATLAS Collaboration, Optimisation of the ATLAS b-tagging performance for the 2016 LHC Run.
- (45) M. Cacciari, G. P. Salam, and G. Soyez, FastJet User Manual, Eur. Phys. J. C 72 (2012) 1896, [arXiv:1111.6097].
- (46) B. Biedermann, A. Denner, and M. Pellen, Complete NLO corrections to W+W+ scattering and its irreducible background at the LHC, JHEP 10 (2017) 124, [arXiv:1708.00268].
- (47) F. Campanario, M. Kerner, D. Ninh, and I. Rosario, Diphoton production in vector-boson scattering at the LHC at next-to-leading order QCD, JHEP 06 (2020) 072, [arXiv:2002.12109].
- (48) CMS Collaboration, Measurement of the tt̄ production cross section at 13 TeV in the all-jets final state.
- (49) CMS Collaboration, A. M. Sirunyan et al., Measurement of the production cross section for single top quarks in association with W bosons in proton-proton collisions at √s = 13 TeV, JHEP 10 (2018) 117, [arXiv:1805.07399].
- (50) W. Stirling and E. Vryonidou, Electroweak gauge boson polarisation at the LHC, JHEP 07 (2012) 124, [arXiv:1204.6427].
- (51) S. Di Vita, C. Grojean, G. Panico, M. Riembau, and T. Vantalon, A global view on the Higgs self-coupling, JHEP 09 (2017) 069, [arXiv:1704.01953].
- (52) J. Ellis, C. W. Murphy, V. Sanz, and T. You, Updated Global SMEFT Fit to Higgs, Diboson and Electroweak Data, JHEP 06 (2018) 146, [arXiv:1803.03252].
- (53) C. Grojean, M. Montull, and M. Riembau, Diboson at the LHC vs LEP, JHEP 03 (2019) 020, [arXiv:1810.05149].
- (54) A. Biekoetter, T. Corbett, and T. Plehn, The Gauge-Higgs Legacy of the LHC Run II, SciPost Phys. 6 (2019), no. 6 064, [arXiv:1812.07587].
- (55) E. da Silva Almeida, A. Alves, N. Rosa Agostinho, O. J. Éboli, and M. Gonzalez-Garcia, Electroweak Sector Under Scrutiny: A Combined Analysis of LHC and Electroweak Precision Data, Phys. Rev. D 99 (2019), no. 3 033001, [arXiv:1812.01009].
- (56) G. Giudice, C. Grojean, A. Pomarol, and R. Rattazzi, The Strongly-Interacting Light Higgs, JHEP 06 (2007) 045, [hep-ph/0703164].
- (57) R. Contino, M. Ghezzi, C. Grojean, M. Muhlleitner, and M. Spira, Effective Lagrangian for a light Higgs-like scalar, JHEP 07 (2013) 035, [arXiv:1303.3876].
- (58) S. Dawson, S. Homiller, and S. D. Lane, Putting SMEFT Fits to Work, arXiv:2007.01296.
- (59) S. Jung, J. Lee, M. Perelló, J. Tian, and M. Vos, Higgs, top and electro-weak precision measurements at future colliders; a combined effective field theory analysis with renormalization mixing, arXiv:2006.14631.
- (60) A. Alloul, B. Fuks, and V. Sanz, Phenomenology of the Higgs Effective Lagrangian via FEYNRULES, JHEP 04 (2014) 110, [arXiv:1310.5150].
- (61) M. Bahr et al., Herwig++ Physics and Manual, Eur. Phys. J. C 58 (2008) 639–707, [arXiv:0803.0883].
- (62) J. Bellm et al., Herwig 7.0/Herwig++ 3.0 release note, Eur. Phys. J. C 76 (2016), no. 4 196, [arXiv:1512.01178].
- (63) M. Aoki, S. Kanemura, K. Tsumura, and K. Yagyu, Models of Yukawa interaction in the two Higgs doublet model, and their collider phenomenology, Phys. Rev. D 80 (2009) 015017, [arXiv:0902.4665].
- (64) G. Branco, P. Ferreira, L. Lavoura, M. Rebelo, M. Sher, and J. P. Silva, Theory and phenomenology of two-Higgs-doublet models, Phys. Rept. 516 (2012) 1–102, [arXiv:1106.0034].
- (65) A. Alloul, N. D. Christensen, C. Degrande, C. Duhr, and B. Fuks, FeynRules 2.0 - A complete toolbox for tree-level phenomenology, Comput. Phys. Commun. 185 (2014) 2250–2300, [arXiv:1310.1921].
- (66) ATLAS Collaboration, M. Aaboud et al., Combination of searches for heavy resonances decaying into bosonic and leptonic final states using 36 fb^-1 of proton-proton collision data at √s = 13 TeV with the ATLAS detector, Phys. Rev. D 98 (2018), no. 5 052008, [arXiv:1808.02380].
- (67) CMS Collaboration, A. M. Sirunyan et al., Search for a heavy Higgs boson decaying to a pair of W bosons in proton-proton collisions at 13 TeV, JHEP 03 (2020) 034, [arXiv:1912.01594].
- (68) CMS Collaboration, A. M. Sirunyan et al., Search for a low-mass τ−τ+ resonance in association with a bottom quark in proton-proton collisions at 13 TeV, JHEP 05 (2019) 210, [arXiv:1903.10228].
- (69) ATLAS Collaboration, G. Aad et al., Search for heavy Higgs bosons decaying into two tau leptons with the ATLAS detector using pp collisions at √s = 13 TeV, Phys. Rev. Lett. 125 (2020), no. 5 051801, [arXiv:2002.12223].
- (70) F. Kling, S. Su, and W. Su, 2HDM Neutral Scalars under the LHC, JHEP 06 (2020) 163, [arXiv:2004.04172].
- (71) N. Chen, J. Li, and Y. Liu, LHC searches for heavy neutral Higgs bosons with a top jet substructure analysis, Phys. Rev. D 93 (2016), no. 9 095013, [arXiv:1509.03848].