In this paper, top quark pair production events are analyzed as a source of neutral Higgs bosons of the two Higgs doublet model type I at the LHC. The production mechanism is $pp \to H/A \to t\bar{t}$, assuming a fully hadronic final state through $t \to Wb \to jjb$. In order to distinguish the signal from the main background, which is the standard model $t\bar{t}$, we benefit from the fact that the top quarks in signal events acquire a large Lorentz boost due to the heavy neutral Higgs boson. This feature leads to three collinear jets (a fat jet), which is a discriminating tool for identification of the top quarks from the Higgs boson resonances. Events with two identified top jets are selected and the invariant mass of the top pair is calculated for both signal and background. It is shown that the low $\tan\beta$ region still has some parts which can be covered by this analysis and have not yet been excluded by flavor physics data.

1. Introduction

The standard model (SM) of particle physics has taken a major step forward by observing the Higgs boson at LHC [1, 2] based on a theoretical framework known as the Higgs mechanism [3-8]. The observed particle may belong to a single SU(2) doublet (SM) or a two Higgs doublet model (2HDM) [9-11] whose lightest Higgs boson respects the observed particle properties.

One of the motivations for the two Higgs doublet model is supersymmetry, where each particle has a superpartner. Supersymmetry provides an elegant solution to gauge coupling unification, the dark matter candidate, and the radiative correction to the Higgs boson mass through natural parameter tuning. In such a model two Higgs doublets are required to give mass to the double space of the particles [12-14].

There are four types of 2HDMs with different scenarios of Higgs-fermion couplings. The ratio of vacuum expectation values of the two Higgs doublets ($\tan\beta$) is a measure of the Higgs-fermion coupling in all 2HDM types [15].

In general, the 2HDM involves five physical Higgs bosons due to the extended degrees of freedom added to the model by introducing the second Higgs doublet. The lightest Higgs boson, $h$, is like the SM Higgs boson. The rest are two neutral Higgs bosons, $H$ and $A$ (subjects of this study), and two charged bosons, $H^{\pm}$. A review of the theory and phenomenology of 2HDM can be found in [16].

In addition to direct searches for the 2HDM Higgs bosons at colliders, there are indirect searches based on flavor physics data, which investigate sources of deviations from the SM when processes containing 2HDM Higgs bosons are introduced [17]. Limits obtained from these types of studies are among the strongest limits on the mass of the charged and neutral Higgs bosons and on $\tan\beta$, and will be referred to when presenting the final results.

The adopted scenario in this analysis is a search for a heavy neutral Higgs boson with mass in the range 0.5-1 TeV at the LHC operating at TeV. All heavy Higgs bosons (CP-even, CP-odd, and the charged Higgs) are assumed to be degenerate, i.e., $m_H = m_A = m_{H^\pm}$. The region of interest is low $\tan\beta$ and the final results will be limited to . The signal process is $pp \to H/A \to t\bar{t}$. The fully hadronic final state is expected to result in two fat jets (each consisting of three subjets associated with the top quark) which are examined using the updated HEPTopTagger 2 [18, 19]. Events which contain two identified (tagged) top jets are used to fill the top pair invariant mass distribution histogram. The same approach is applied to background events and a final shape discrimination is performed to evaluate the signal significance. Before going into the details of the analysis, a brief review of the theoretical framework is presented in the next section.

2. The Higgs Sector of 2HDM

The 2HDM Lagrangian for neutral Higgs-fermion couplings as introduced in [20] takes the following form: where are the neutral Higgs boson fields, for any fermion type and , and . The parameters define the model type and are proportional to as in Table 1 [21]. Therefore the four types of interactions (2HDM types) depend on the values of [22].

In this study, we require $\sin(\beta-\alpha) = 1$, which has two advantages. The first is that the factor $\sin(\beta-\alpha)$ in the lightest Higgs-gauge coupling is set to unity while the heavier Higgs, $H$, decouples from gauge bosons [16]. The second is that the SM-like Higgs-fermion interactions are independent of $\tan\beta$.

According to Table 1, type I is interesting for low $\tan\beta$, as all couplings in the neutral Higgs sector are proportional to $\cot\beta$. This common factor cancels in the branching ratios of Higgs boson decays to leptons and quarks. The mass of the fermion thus plays an important role in the decay rate and, as seen from Figures 1 and 2, the Higgs boson decay to $t\bar{t}$ dominates for all relevant Higgs boson masses and $\tan\beta$ values. The decay to a pair of gluons proceeds predominantly through a top quark loop and stands as the second channel. The third channel is which has been shown to be visible at LHC [23]. The current study focuses on $H/A \to t\bar{t}$, with a branching ratio near unity and independent of the Higgs boson mass (Figure 1) and $\tan\beta$ (Figure 2).

3. Signal Identification and the Search Scenario

The signal process under study is Higgs boson production with Higgs boson masses in the range 500-1000 GeV. The three Higgs boson masses are set to be equal in order to minimize $\Delta\rho$ [24]. All selected points are checked to be consistent with the potential stability, perturbativity, and unitarity requirements and the current experimental limits on Higgs boson masses using 2HDMC 1.6.3 [25, 26].

There have been phenomenological searches for a leptophilic Higgs boson within the type IV 2HDM at LHC [27] and linear colliders [28, 29]. These searches are based on leptonic decays of the Higgs boson. On the other hand, the type I 2HDM can be considered as a leptophobic model where the Higgs boson decay to quarks plays an important role. At first glance, decays to all fermions are relevant at low $\tan\beta$ values. However, the fermion mass in the Higgs-fermion vertex enhances the top quark coupling dramatically compared to other channels. This is due to the fact that the common factors cancel out when calculating the branching ratios of Higgs decays to fermions. Therefore, in this analysis, the Higgs boson decay to $t\bar{t}$ is considered as the signal.

While the neutral Higgs boson searches at LEP [30, 31] lead to GeV, the LHC results [32, 33] indicate that the neutral Higgs boson mass in the range GeV is excluded for . This result is based on minimal supersymmetric standard model (MSSM) which has a different Higgs boson spectrum from 2HDM due to supersymmetry constraints. Since our region of interest is Higgs boson masses above 500 GeV, no constraints from LEP or LHC limit the current analysis and the Higgs boson masses under study.

There are also results from flavor physics data. The strongest limit in this category comes from analysis which imposes a lower limit on the charged Higgs mass in types II and III at 600 GeV [34-36]. There are other analyses such as , , , , and meson mixing. Such observables have a smaller impact than . All limits from the above observables as well as the one from are mainly relevant for types II and III, while types I and IV are less affected because the charged Higgs-quark coupling in processes which give rise to deviations from the standard model is suppressed in types I and IV with increasing $\tan\beta$.

In order to compare the two categories of types I/IV and II/III, one may notice that types I and IV behave differently from types II and III as far as the charged Higgs coupling to quarks is concerned. In the former, the charged Higgs coupling to all quark types is suppressed at low , while, in the latter, the coupling with at least one type of quark (up type or down type) is enhanced with $\tan\beta$. Therefore charged Higgs limits from flavor physics in types I and IV are very soft and basically relevant at $\tan\beta$ values as low as 2. This is the region of search in this analysis. Although we are dealing with neutral Higgs bosons, since the scenario under study is a degenerate scenario based on $m_H = m_A = m_{H^\pm}$, limits on the charged Higgs are propagated into the final results.

4. Software Setup and Cross Sections

The signal cross section is obtained from PYTHIA 8.2.15 [37] using 2HDM spectrum files in LHA format [38, 39] extracted from 2HDMC 1.6.3 [25, 26]. The LHA files contain information about the parameters of the theoretical model as well as properties of any particle which may not be present in the standard model, like the 2HDM Higgs bosons. In this case, they contain the Higgs boson masses and their decay branching ratios. For each benchmark point a separate LHA file is generated using 2HDMC and is passed to PYTHIA for event generation and cross section calculation. Results are shown in Figures 3 and 4, which show that the cross section decreases with increasing Higgs boson mass as well as $\tan\beta$. Therefore the most suitable area for the search is where the mass is as low as possible and $\tan\beta$ is also very small.

The main SM background processes are $t\bar{t}$, gauge boson pair production $WW$, $WZ$, $ZZ$, $s$-channel and $t$-channel single top, single $W$ and single $Z$, and QCD multijet events. These background processes are generated using PYTHIA except for the QCD multijet background, for which Alpgen 2.14 [40-42] is used for the hard scattering generation. The output of Alpgen is stored as an LHE file [38, 39] and is passed to PYTHIA for multiparticle interaction and final state showering. The cross sections are obtained using PYTHIA except for QCD samples which is obtained from Alpgen and for which we adopt the NLO (next to leading order) cross section calculated using MCFM [43-46]. The signal (benchmark points) and background cross sections are listed in Table 2.

The QCD multijet background has a large cross section. Therefore only events with 6 jets in the final state are generated, to produce the same jet multiplicity as the signal. This is based on the assumption that events with more or fewer jets do not contribute to the signal region at the end. Samples with fewer jets have a larger cross section but do not pass the selection cuts (e.g., the 6 jet requirement), and those with more jets have much smaller cross sections and do not contribute sizably to the signal region.

5. Signal Selection and Analysis

The generation of signal and background events starts with PYTHIA 8 [37] followed by jet reconstruction using FASTJET 2.8 [47, 48].

The analysis uses the top tagging algorithm to identify two top jets in the final state; however, before going to that step, two selection cuts are applied to purify the signal sample. The first requirement is to have at least 2 b-jets in the final state (as there are two b-jets from the top quark decays in signal events). The b-tagging is based on a matching algorithm which uses generator level information of b quarks and compares them with reconstructed jets. If a jet is flying adjacent to a b-quark, it is considered as a b-jet with 70% probability. A % fake rate from c-jets is also considered as the mistagging rate.
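The matching-based b-tagging described above can be illustrated with a minimal Python sketch. It is not the code used in this analysis: the function names, the (eta, phi) jet representation, and the matching radius of 0.4 are assumptions made for illustration, since the text only states that the jet must be adjacent to a generator-level b quark. The c-jet fake rate is left out because its value is not available here.

```python
import math
import random

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane, with delta-phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def tag_b_jets(jets, b_quarks, match_radius=0.4, b_eff=0.70, rng=None):
    """Return the jets that are tagged as b-jets.

    jets, b_quarks: lists of (eta, phi) tuples.
    match_radius:   ASSUMED matching cone; the text does not quote a value.
    b_eff:          tagging probability for a matched jet (70% in the text).
    """
    rng = rng or random.Random()
    tagged = []
    for jet in jets:
        # A jet is a b-jet candidate if any generator-level b quark is nearby.
        matched = any(delta_r(*jet, *b) < match_radius for b in b_quarks)
        # Matched jets are accepted with probability b_eff.
        if matched and rng.random() < b_eff:
            tagged.append(jet)
    return tagged
```

An event would then pass the first requirement if `len(tag_b_jets(jets, b_quarks)) >= 2`.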

The second requirement is a lepton veto which requires events to be free of leptons (with a transverse energy threshold of 10 GeV). This is to select fully hadronic events and reject QCD multijet events with possibility of heavy meson leptonic decays.

At this step the top tagging algorithm is applied on signal and background events. The top tagging algorithm uses a different jet reconstruction algorithm from the one used for b-tagging. The b-tagging jet algorithm is anti-kt while the top tagging algorithm is CA (Cambridge-Aachen) as discussed in what follows.

The jet reconstruction algorithms are classified according to their different subjet distance measures, which can be written as $d_{ij} = \min(p_{T,i}^{2n}, p_{T,j}^{2n})\,\Delta R_{ij}^2/R^2$ with $n = -1, 0, 1$ for the anti-$k_T$, Cambridge/Aachen (CA), and $k_T$ algorithms, respectively. The $k_T$ algorithm first combines the soft and collinear subjets and is suitable for reconstructing the QCD splitting history in the top tagging algorithm. The anti-$k_T$ algorithm first combines the hardest subjets to obtain a stable jet with a clean jet boundary. The CA algorithm always combines the most collinear subjets while not being sensitive to soft splittings and is therefore suitable for top tagging reconstruction. The algorithm adopted by HEPTopTagger is thus CA with a cone size of .
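As an illustration of how the three algorithms differ, the following Python sketch evaluates the pairwise distance in the standard generalized-$k_T$ form, $d_{ij} = \min(p_{T,i}^{2n}, p_{T,j}^{2n})\,\Delta R_{ij}^2/R^2$, for $n = -1, 0, 1$. The function name and the numerical inputs are hypothetical and chosen only to show the ordering behavior; this is a sketch, not the FastJet implementation.

```python
def pair_distance(pt_i, pt_j, delta_r_ij, radius, n):
    """Generalized-kT pairwise distance.

    n = -1: anti-kT, n = 0: Cambridge/Aachen, n = +1: kT.
    pt_i, pt_j in GeV; delta_r_ij and radius are dimensionless.
    """
    return min(pt_i ** (2 * n), pt_j ** (2 * n)) * delta_r_ij ** 2 / radius ** 2

# Illustrative comparison: a soft (1 GeV) and a hard (100 GeV) subjet,
# both at the same angular distance from a 50 GeV reference subjet.
for name, n in (("anti-kT", -1), ("CA", 0), ("kT", 1)):
    d_soft = pair_distance(1.0, 50.0, 0.5, 1.0, n)
    d_hard = pair_distance(100.0, 50.0, 0.5, 1.0, n)
    print(name, d_soft, d_hard)
```

For $n = 0$ the two distances are identical, since only the angular separation enters; for $n = 1$ the pair containing the soft subjet has the smaller distance and is merged first, while for $n = -1$ the pair containing the hard subjet is merged first.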

The HEPTopTagger is one of the recent algorithms introduced for boosted top quark reconstruction [49]. It is based on a CA jet reconstruction with and the top jet candidate $p_T$ above 200 GeV. The threshold can be lowered to 150 GeV without significant loss of efficiency [50, 51]. Having the collection of fat jets in the first step, the top tagging algorithm starts by undoing the last clustering of the top jet candidate and requiring the mass drop criterion as min where $j_i$ is the $i$th subjet from the jet $j$. Subjets with GeV are not considered to end the unclustering iteration.
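The iterative unclustering can be sketched in Python as follows. This is a simplified illustration and not the HEPTopTagger code: the `Jet` class, the function name, and the tree representation are hypothetical, and the two thresholds (`drop_fraction=0.8` and `min_mass=30.0` GeV) are assumed defaults that should be replaced by the values actually used. The sketch adopts one common formulation of the mass drop step: both parents are kept when the heavier parent carries less than a given fraction of the jet mass, and otherwise only the heavier parent is followed.

```python
class Jet:
    """Node of a clustering tree: a mass (GeV) and the two parents it was merged from."""
    def __init__(self, mass, parents=None):
        self.mass = mass
        self.parents = parents  # tuple (Jet, Jet), or None for an unsplittable object

def mass_drop_subjets(jet, drop_fraction=0.8, min_mass=30.0):
    """Recursively undo clusterings and return the list of hard subjets.

    drop_fraction, min_mass: ASSUMED thresholds, not taken from the text.
    """
    # Stop when the jet cannot be split or is too light to decompose further.
    if jet.parents is None or jet.mass < min_mass:
        return [jet]
    heavy, light = sorted(jet.parents, key=lambda j: j.mass, reverse=True)
    if heavy.mass < drop_fraction * jet.mass:
        # Significant mass drop: keep and further decompose both parents.
        return (mass_drop_subjets(heavy, drop_fraction, min_mass)
                + mass_drop_subjets(light, drop_fraction, min_mass))
    # No mass drop: discard the lighter parent and continue with the heavier one.
    return mass_drop_subjets(heavy, drop_fraction, min_mass)

# Toy top-like jet: a W-like branch made of two light subjets, plus a b-like subjet.
top = Jet(175.0, (Jet(80.0, (Jet(10.0), Jet(8.0))), Jet(5.0)))
print([s.mass for s in mass_drop_subjets(top)])
```

In this toy tree the procedure returns three subjets, which is the input expected by the subsequent filtering and mass-consistency steps.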

In the second step a filtering is applied to find a three-subjet combination with a jet mass within GeV.

In the last step, having sorted the jets in $p_T$, several requirements are applied to find the best combination of subjets, with two subjets giving the best $W$ boson invariant mass and all three subjets being consistent with the top quark invariant mass. Details of these criteria are given in [50].

Performing the algorithm, selection efficiency for each signal sample is obtained. The same procedure is applied on background samples. Results are shown in Tables 3 and 4 for signal and background samples, respectively. These tables also include the number of events before the mass window cut. An event is required to have two top jets identified.

The invariant mass of the two top jets is calculated as the Higgs boson candidate mass. Both signal and background distributions of top quark pair invariant masses are normalized according to the corresponding cross sections. The signal on top of the background is then plotted for each benchmark point as seen in Figure 5.
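For reference, the Higgs boson candidate mass is the invariant mass of the summed four-momenta of the two tagged top jets, $m^2 = (\sum E)^2 - |\sum \vec{p}|^2$. A minimal Python sketch is given below; the function name, the (E, px, py, pz) tuple convention, and the numerical values in the example are illustrative assumptions only.

```python
import math

def invariant_mass(four_momenta):
    """Invariant mass (GeV) of a set of four-momenta given as (E, px, py, pz) in GeV."""
    e = sum(p[0] for p in four_momenta)
    px = sum(p[1] for p in four_momenta)
    py = sum(p[2] for p in four_momenta)
    pz = sum(p[3] for p in four_momenta)
    m2 = e * e - px * px - py * py - pz * pz
    # Guard against small negative values caused by rounding.
    return math.sqrt(max(m2, 0.0))

# Two back-to-back top jets, each with E = 500 GeV and an illustrative mass of 173 GeV.
p = math.sqrt(500.0 ** 2 - 173.0 ** 2)
top1 = (500.0, 0.0, 0.0, p)
top2 = (500.0, 0.0, 0.0, -p)
print(invariant_mass([top1, top2]))
```

In this back-to-back configuration the total momentum vanishes, so the pair mass equals the total energy of 1000 GeV.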

The invariant mass of the top pair has a broad resolution due to uncertainties in the four-momentum reconstruction of the jets as well as in the top tagging algorithm. In the top tagging algorithm a based method is used to find the correct combination of the jets with their invariant mass in agreement with the $W$ boson and the top quark masses. There may still be ways to improve the top tagging algorithm, such as rescaling the light jet four-momenta to set their invariant mass equal to the $W$ boson mass before reconstructing the top quark. Since there are two $W$ bosons in the event, this method has some difficulties but can be studied in a detailed analysis of the performance of the algorithm.

It should be noted that the signal distribution shown in Figure 5 has a single peak for each Higgs boson mass hypothesis due to the equal masses of the Higgs bosons. Different mass hypotheses can also be considered. However, the signal distribution cannot be distinguished from the equal mass scenario as long as the difference between the Higgs boson masses is within the invariant mass resolution. Therefore scenarios with GeV might be observable with two distinguishable peaks; however, such a large mass splitting raises the problem of large $\Delta\rho$, which should be avoided.

Since a large number of backgrounds fill the signal region, a mass window cut is applied to select the signal and increase the signal to background ratio. The position of the mass window (both left and right sides) is determined in an automatic search based on requiring the maximum signal significance. This is performed in a loop over bins of the histogram and finding the left and right bins inside which the signal significance is maximum.
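The automatic window search can be sketched as a double loop over the histogram bins. The Python sketch below is an illustration under stated assumptions, not the analysis code: the function names and the toy bin contents are hypothetical, and the significance is taken as $S/\sqrt{B}$ by default, which is an assumption that can be replaced through the `significance` argument.

```python
import math

def s_over_sqrt_b(s, b):
    """ASSUMED significance definition: S / sqrt(B), zero if there is no background."""
    return s / math.sqrt(b) if b > 0 else 0.0

def best_mass_window(signal, background, significance=s_over_sqrt_b):
    """Scan all contiguous bin ranges and return (lo, hi, z) with maximal significance.

    signal, background: expected yields per bin, with identical binning.
    lo, hi: inclusive bin indices of the left and right window edges.
    """
    best = None
    for lo in range(len(signal)):
        s = b = 0.0
        for hi in range(lo, len(signal)):
            # Extend the window by one bin to the right and update the yields.
            s += signal[hi]
            b += background[hi]
            z = significance(s, b)
            if best is None or z > best[2]:
                best = (lo, hi, z)
    return best

# Toy histograms: a signal peak on a smoothly falling background.
sig = [0.0, 1.0, 10.0, 12.0, 1.0, 0.0]
bkg = [50.0, 40.0, 30.0, 25.0, 20.0, 15.0]
lo, hi, z = best_mass_window(sig, bkg)
print(lo, hi, z)
```

For the toy input the scan selects the two bins containing the peak. The cost grows quadratically with the number of bins, which is negligible for typical histogram sizes.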

Table 5 shows the mass window position, total efficiencies for signal and background events, final number of signal and background events passing the mass window cut, their ratio, and the signal significance as at two values of and 1. The integrated luminosity is set to 300 fb$^{-1}$. Table 5 clearly shows the high sensitivity of the signal significance to the $\tan\beta$ parameter. The analysis is thus relevant to $\tan\beta$ values as low as .

Figure 6 shows the signal significance as a function of the Higgs boson mass for different $\tan\beta$ values. The dashed horizontal line indicates the significance. As seen from Figure 3 and Table 3, the signal cross section decreases with increasing Higgs boson mass while the selection efficiency increases. Therefore the product of the cross section and the selection efficiency has a peak near the middle of the Higgs boson mass range, where neither the cross section nor the selection efficiency is too small. This peak happens at GeV in this analysis. Lower masses suffer from low selection efficiency while higher masses suffer from low cross section.

Using the analysis results for Higgs boson masses from 500 GeV to 1000 GeV, one can obtain the 95% C.L. exclusion region and the discovery contours. Figure 7 shows the exclusion region at 95% C.L. including the recent result from [35] (the result reported in [35] is based on the charged Higgs mass as a function of $\tan\beta$; however, it is included in the current work as a limit for all Higgs bosons since the Higgs boson masses are equal in the scenario adopted in this analysis). The $5\sigma$ contour is also shown in Figure 8.

As seen from Figures 7 and 8, both exclusion and discovery are possible in regions not yet excluded by LHC. Therefore any sign of extra top pair signals on top of the SM background could be regarded as a signal for new physics, especially 2HDM. It should be noted that, in this analysis, a full set of background processes was studied. However, all other background processes led to very small numbers of events which were negligible compared to the SM $t\bar{t}$. Therefore the final plots are based on the signal on top of the $t\bar{t}$ distribution without any sizable error.

The LHC sensitivity to the signal studied in this analysis at an integrated luminosity of 3000 fb$^{-1}$ can be estimated as follows. If the signal significance grows like $\sqrt{L}$ ($L$ is the integrated luminosity), at $L = 3000$ fb$^{-1}$ the signal significance will be roughly three times larger than at $L = 300$ fb$^{-1}$. The signal cross section, however, decreases from $\tan\beta = 1$ to 2 by a factor of 4 as shown in Figure 4. Therefore the signal significance acquires a factor of roughly 3/4 by increasing $\tan\beta$ and the integrated luminosity from 1 and 300 fb$^{-1}$ to 2 and 3000 fb$^{-1}$. This means that points with and 700 GeV will be observable at at 3000 fb$^{-1}$. All of the above considerations are of course affected by the systematic uncertainties due to the jet energy scale and four-momentum resolution as well as uncertainties in the theoretical cross section calculation. A detailed analysis and estimate of such uncertainties are needed before the final assessment.
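The scaling argument above can be checked numerically with a short Python sketch. It assumes a significance of the form $S/\sqrt{B}$, so that the significance scales linearly with the signal cross section and with the square root of the integrated luminosity while the background composition is unchanged; the function name is hypothetical.

```python
import math

def scaled_significance(z0, lumi0, lumi1, xsec_ratio=1.0):
    """Rescale a significance z0 obtained at lumi0 to lumi1.

    ASSUMES z ~ S / sqrt(B): linear in the signal cross section ratio
    and proportional to sqrt(luminosity).
    """
    return z0 * xsec_ratio * math.sqrt(lumi1 / lumi0)

# Luminosity 300 -> 3000 fb^-1 with the signal cross section reduced by a factor of 4.
factor = scaled_significance(1.0, 300.0, 3000.0, xsec_ratio=0.25)
print(round(factor, 3))  # 0.791
```

The luminosity increase alone gives a factor of $\sqrt{10} \approx 3.2$, and the combined factor of about 0.79 is consistent with the rough estimate of 3/4.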

6. Discussion

The Higgs boson decay to $t\bar{t}$ is already known to dominate in 2HDM type I, and there are thorough studies of 2HDM which cover this area [52]. The aim of this work was to perform an event selection analysis in an LHC-like environment, show the signal on top of the background, and present exclusion contours. The scenario is taken to be a very limited case (degenerate Higgs boson masses) in order to study the best possible cases (benchmark points in the parameter space). The selected benchmark points indeed lead to positive results, which makes the analysis interesting for the LHC program. Furthermore, we benefit from the top tagging technique, which enhances the signal to background ratio and reduces the fake rate. The analysis therefore also serves as a test of the algorithm itself.

7. Conclusions

Extra sources of $t\bar{t}$ events beyond what we expect from the standard model can appear in theories beyond the standard model such as two Higgs doublet models. In 2HDM type I, the heavy neutral (CP-even or CP-odd) Higgs decay to $t\bar{t}$ dominates at low $\tan\beta$. In such a scenario a proton-proton collision may create a neutral Higgs boson decaying to $t\bar{t}$. The signal from such a process can be observed as an excess of top pair events over what is expected from the SM. The discriminating tool can be a top pair invariant mass distribution filled with events containing two top jets from both signal and background processes. The analysis performed in this work shows that such a signal is observable at an integrated luminosity of 300 fb$^{-1}$ for $\tan\beta$ values which depend on the Higgs boson mass. The exclusion at 95% C.L. is also possible at the same integrated luminosity for with GeV as the best point.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This work was performed using the computing cluster at the College of Sciences, Shiraz University. We thank the personnel involved in the operation and maintenance of the cluster.