Neutrino Physics in the Frontiers of Intensities and Very High Sensitivities 2018 (Special Issue)
Deep Learning the Effects of Photon Sensors on the Event Reconstruction Performance in an Antineutrino Detector
We provide a fast approach based on deep learning for studying how the number of photon sensors in an antineutrino detector affects its event reconstruction performance. This work is a first attempt to harness the power of deep learning for detector design and upgrade planning. Using the Daya Bay detector as a case study and the vertex reconstruction performance as the objective for the deep neural network, we find that the photomultiplier tubes (PMTs) at Daya Bay differ in their relative importance to the vertex reconstruction. More importantly, the vertex position resolutions for the Daya Bay detector follow approximately a multiexponential relationship with respect to the number of PMTs and, hence, the coverage. This could also assist in deciding on the merits of installing additional PMTs in future detector plans. The approach could easily be used with other objectives in place of vertex reconstruction.
The choice of photon sensors such as photomultiplier tubes (PMTs), including their sizes, locations, and total number, in antineutrino detectors such as Daya Bay, Double Chooz, RENO, and JUNO is of interest because these sensors are the information gatherers through which antineutrino interaction events are identified. This work is an attempt to use machine learning, in particular deep learning [5–7], to understand how the number of PMTs in the detector influences the event reconstruction performance and to extract lessons for areas such as detector design and upgrade planning. To the best of our knowledge, this work is the first study on the efficacy of deep learning in detector design and planning. For this work, we ask the following: suppose we are given $N$ possible locations for the installation of $M$ PMTs ($M \le N$); where should the PMTs be installed such that the event interaction vertex reconstruction is optimal or near-optimal given only these $M$ PMTs in the detector and $N$ possible locations? Of course, $M$ and $N$ could be infinite, but this is practically impossible as it would exceed the budget of any detector construction. In this work, we use deep learning on a model of the Daya Bay detector as a case study to understand the impact of PMTs on event vertex reconstruction in a detector. The vertex is useful for studies on signal-background discrimination and for correcting the position-dependent energy response of the detector. The reconstruction of the vertices has been studied in depth in the Daya Bay experiment using non-machine-learning methods. This allows us to cross-check our deep learning vertex reconstruction against other methods used at Daya Bay before studying its potential in detector design.
Moreover, the vertex reconstruction performance is chosen as the objective because it is relatively simple for deep learning to handle, which allows a clear understanding of the approach without involving too many experimental details. Nonetheless, experimentalists can easily substitute the vertex reconstruction performance with other objectives of interest. Beyond antineutrino detectors, sensor placement has been studied in areas ranging from water distribution networks to fault detection.
The Daya Bay antineutrino detectors are liquid scintillator detectors with a physics program focusing on the precision measurement of the neutrino mixing angle $\theta_{13}$ with reactor antineutrinos. Each Daya Bay detector consists of three concentric cylindrical tanks: an inner acrylic vessel (IAV) containing gadolinium-doped (Gd-doped) liquid scintillator, an outer acrylic vessel (OAV) containing undoped liquid scintillator which surrounds the IAV, and a stainless steel vessel (SSV) which surrounds the IAV and OAV. With this design, the detectors can detect the interaction of the antineutrinos with the scintillator via inverse beta decay (IBD) reactions:
$$\bar{\nu}_e + p \rightarrow e^+ + n. \tag{1}$$
The emitted positron then undergoes ionization processes in the liquid scintillator before annihilating with an electron, producing a prompt signal with an energy deposition in the range of 1 to 8 MeV. The deposited energy is converted to scintillation photons which are then collected by the PMTs. As the positron displacement prior to the annihilation is negligible, the interaction vertex of the prompt signal can be assumed to be the antineutrino IBD interaction vertex. The neutron, however, thermalizes and diffuses before being captured on either a proton or Gd, with a mean capture time that is much shorter in the Gd-doped liquid scintillator than in the undoped liquid scintillator, giving rise to a delayed signal. A total of 192 Hamamatsu R5912 8-inch PMTs, arranged in a layout with 8 rows and 24 columns, are installed on the vertical wall of the SSV pointing inward towards the OAV and IAV, giving a total photodetector coverage of 6%. Located above and below the OAV are reflective panels that redirect scintillation light towards the PMTs, thereby increasing the effective photon collection coverage to 12%.
As mentioned above, we used deep learning to perform the IBD vertex reconstruction in order to study the effects of PMTs on event reconstruction. Deep learning is a class of machine learning that is especially adept at leveraging large datasets to compute human-comprehensible quantities by learning the various degrees of correlation within them. Notably, it can, on its own, discover functional relationships in the data without these being given a priori, effectively forming a mapping from the inputs to a quantity of interest. In other words, deep learning seeks to model the quantity of interest $y$ using a vector of inputs $\mathbf{x}$ with $\hat{y} = f(\mathbf{x}; \boldsymbol{\theta})$, where $\boldsymbol{\theta}$ are the parameters of the deep network; their numerical values are found by minimizing the error between the predicted $\hat{y}$ and $y$. Deep learning architectures, commonly known as deep neural networks (DNNs), are based on artificial neural networks but are deeper in terms of the number of hidden layers and more flexible in terms of how each neuron is connected to other neurons.
The ubiquity of deep learning and its significant success over traditional methods in discovering patterns across disparate fields [12–14] is surprising. However, this may well be due, in part, to the fact that our universe operates on simple physical properties. In high-energy physics, deep learning has demonstrated its prospective use in jet studies [16, 17], as part of the signal-background discrimination toolkit in the search for beyond the Standard Model particles and Higgs bosons, and in neutrino physics experiments [20–24].
2. Recursive Search
As mentioned in Section 1, we wish to find, among the $N$ total possible locations in the detector, the $M$ most important locations (with PMTs installed therein) for determining the vertex position of events collected from the detector. $M$ is a free parameter which could be chosen during the detector design and simulation stage. Denoting the set containing the $M$ most important locations as $S_M$, this implies that we should find the set $S_M$ such that the vertex reconstruction error is minimized. However, finding such locations simultaneously is a task confounded by a computation that grows exponentially with the number of locations considered. Alternatively, we could search for an approximation to $S_M$ by recursively finding the important PMT locations one at a time, which can be achieved using deep learning. Since searching for the most important location is equivalent to searching for the most important PMT at that particular location, the phrase "$k$-th important PMT" will be used in this work as a shorthand for "$k$-th important PMT location".
Let the true position of the IBD prompt events be $\mathbf{r}_{\mathrm{true}}$ and the position predicted by the DNN be $\mathbf{r}_{\mathrm{DNN}}$. In a recursive search, the $k$-th important PMT, $p_k$, is the one that maximizes the improvement in the resolution $\sigma$ of the residual distribution (equivalently, the one that yields the smallest $\sigma$), given that the other $k-1$ PMTs have already been found through the recursive search, i.e., $S_{k-1} = \{p_1, \ldots, p_{k-1}\}$, and where the residual is $\Delta\mathbf{r} = \mathbf{r}_{\mathrm{DNN}} - \mathbf{r}_{\mathrm{true}}$. Namely,
$$p_k = \underset{p \,\in\, P \setminus S_{k-1}}{\arg\min}\; \sigma\left(\Delta\mathbf{r} \mid S_{k-1} \cup \{p\}\right), \tag{2}$$
where $S_0 = \emptyset$ and $P$ is the set containing all the PMTs. Using (2), PMTs can be progressively added into a larger and larger subset defining the best set found by the algorithm. Alternatively, one could perform a backward elimination: starting from the set with all PMTs and progressively eliminating the most "unimportant" PMT. At the conclusion of this recursive search, we obtain a curve of the event reconstruction resolution versus the number of PMTs used for the reconstruction.
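The forward recursive search described above can be sketched in a few lines. This is only an illustration under stated assumptions: `greedy_forward_selection`, `toy_resolution`, and the `INFO` table are hypothetical names and numbers, and the toy scoring function stands in for the expensive step of training a DNN on a PMT subset and fitting the width of the residual distribution.

```python
from typing import Callable, FrozenSet, List, Tuple


def greedy_forward_selection(
    all_sensors: List[int],
    resolution: Callable[[FrozenSet[int]], float],
    n_select: int,
) -> Tuple[List[int], List[float]]:
    """Add sensors one at a time; at each step keep the candidate that
    gives the smallest resolution together with the already-chosen set."""
    chosen: List[int] = []
    curve: List[float] = []
    for _ in range(n_select):
        candidates = [s for s in all_sensors if s not in chosen]
        # Score every remaining candidate jointly with the chosen sensors.
        scored = [(resolution(frozenset(chosen + [c])), c) for c in candidates]
        best_sigma, best_sensor = min(scored)
        chosen.append(best_sensor)
        curve.append(best_sigma)
    return chosen, curve


# Toy stand-in for "train a DNN and fit the residual width":
# each sensor carries a fixed amount of information (hypothetical numbers).
INFO = {0: 1.0, 1: 4.0, 2: 0.25, 3: 2.0}


def toy_resolution(subset: FrozenSet[int]) -> float:
    return 1.0 / sum(INFO[s] for s in subset) ** 0.5


order, sigma_curve = greedy_forward_selection(list(INFO), toy_resolution, 4)
print(order)        # sensors in the order they were selected
print(sigma_curve)  # resolution after each addition
```

The returned `sigma_curve` plays the role of the resolution-versus-number-of-PMTs curve; a backward elimination would follow the same pattern with sensors removed instead of added.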
3. Deep Neural Network
In our approach utilizing DNNs, we used a Monte Carlo dataset comprising 2 million IBD prompt events obtained from a Daya Bay detector model, which were randomly partitioned into a training set (1.4 million), a validation set (0.3 million), and a test set (0.3 million). The validation set is used for the early stopping of the DNN training to prevent overfitting or underfitting of the data. The parameter $N$ as defined in Section 2 would be 192, corresponding to the 192 PMT locations in the Daya Bay detector model. The charge information of the 192 PMTs is fed into the DNN as its inputs, and the output is the predicted vertex location $\mathbf{r}_{\mathrm{DNN}}$. To train the DNN, we used the mean square error (MSE) loss function to measure the error between the predicted and the truth vertex positions:
$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n}\sum_{i}\left(\hat{r}_{ij} - r_{ij}\right)^2, \tag{3}$$
where $n$ is the number of events, and $\hat{r}_{ij}$ and $r_{ij}$ are the predicted and truth values for the $i$-th coordinate of the $j$-th event vertex, respectively ($i = x, y, z$). The MSE was minimized to obtain the optimal DNN parameters $\boldsymbol{\theta}$. The minimization is typically done with a gradient descent method involving the gradient of the loss function with respect to the DNN parameters, including the weights of each neuron; i.e., at each training iteration, the parameters are updated via
$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta\,\nabla_{\boldsymbol{\theta}}\mathrm{MSE}, \tag{4}$$
where $\eta$ is the learning rate, chosen by the user, that controls the step length in the negative gradient direction during the training stage. When the MSE reaches a minimum, $\nabla_{\boldsymbol{\theta}}\mathrm{MSE} = 0$. At this point, the DNN has found the parameter values needed to best reconstruct the vertex position. To train the DNN, $\eta$ starts with a value of 0.001 and is progressively multiplied by a factor of 0.5 whenever the value of the loss function metric stops improving. In this manner, the DNN training descends quickly towards the minimum in the early stage; with a smaller learning rate at a later stage, the training does not overshoot the minimum but descends steadily towards it.
Early stopping is applied during the training: the training is terminated when no further improvement in the loss function value is observed after a predetermined number of training rounds, in this case ten. Without such early stopping, the loss function value can rise again, indicating that overfitting has occurred.
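The training loop described above (gradient descent on the MSE, a learning rate that is halved when the loss stops improving, and early stopping) can be illustrated on a one-dimensional linear model. This is a minimal sketch and not the training code used for the DNN: the model, the data, the starting learning rate of 0.1, and the names `lr_patience` and `stop_patience` are all assumptions chosen so that the example runs quickly with the standard library only.

```python
import random


def mse(w, b, data):
    """Mean square error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def grad(w, b, data):
    """Gradient of the MSE with respect to (w, b)."""
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return gw, gb


def train(train_set, val_set, lr=0.1, lr_factor=0.5, lr_patience=2,
          stop_patience=10, max_epochs=5000):
    w, b = 0.0, 0.0
    best_val, best_params = float("inf"), (w, b)
    since_best = 0
    for epoch in range(max_epochs):
        gw, gb = grad(w, b, train_set)
        w, b = w - lr * gw, b - lr * gb      # theta <- theta - eta * gradient
        val = mse(w, b, val_set)
        if val < best_val:
            best_val, best_params, since_best = val, (w, b), 0
        else:
            since_best += 1
            if since_best % lr_patience == 0:
                lr *= lr_factor              # halve the learning rate on a plateau
            if since_best >= stop_patience:
                break                        # early stopping
    return best_params, best_val, epoch + 1


rng = random.Random(0)


def make(n):
    """Hypothetical data: y = 2x - 1 plus small Gaussian noise."""
    return [(x, 2.0 * x - 1.0 + rng.gauss(0.0, 0.05))
            for x in (rng.uniform(-1, 1) for _ in range(n))]


train_set, val_set = make(200), make(50)
(w_fit, b_fit), val_loss, epochs_run = train(train_set, val_set)
print(w_fit, b_fit, val_loss, epochs_run)
```

The loop returns the parameters with the best validation loss rather than the last ones, which is one common way to implement early stopping.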
The efficacy of deep learning in predicting the position of the IBD prompt events is demonstrated by the residual distributions shown in Figure 1, where the charge information of the 192 PMTs is fed into a DNN as its inputs. The DNN used here to obtain $\mathbf{r}_{\mathrm{DNN}}$ consists of multiple fully connected layers with ReLU hidden neurons. The optimal numbers of layers and neurons were obtained using a tree-structured Parzen estimator. The resulting network comprises three hidden layers containing 180, 148, and 148 neurons, respectively. The resolutions obtained from Gaussian fits to the residual distributions are 67 mm and 80 mm for $x$ and $z$, respectively.
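To give a sense of the size of such a network, the following short sketch counts the trainable parameters of a fully connected network with 192 inputs and hidden layers of 180, 148, and 148 neurons. The three output neurons (one per vertex coordinate) and the presence of one bias per neuron are assumptions made for this illustration; `count_parameters` is a hypothetical helper.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


# 192 PMT charges in; 3 outputs (x, y, z) is an assumption of this sketch.
layers = [192, 180, 148, 148, 3]
n_params = count_parameters(layers)
print(n_params)  # 84027
```

With 1.4 million training events, the dataset is larger than this parameter count by more than an order of magnitude.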
A straightforward, brute force use of (2) in a recursive search using a DNN to identify $p_k$ would be to loop over all the remaining PMTs not in the already-chosen set $S_{k-1}$, separately construct the residual distribution for each, and pick the PMT giving the best resolution for a particular coordinate of $\mathbf{r}$. For this brute force search, we used a DNN architecture similar to the aforementioned DNN. The input layer contains $k$ neurons with charge information from the already-chosen PMTs, i.e., those in $S_{k-1}$, plus a candidate PMT, i.e., $p \in P \setminus S_{k-1}$. The computation time for such a search grows quadratically with the total number of PMTs in the detector. Such a brute force search is clearly not scalable. Hence, in this work, we have also used a fast approach that approximates the brute force search while mitigating its nonscalability.
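The quadratic growth of the brute force search can be made concrete by counting network trainings. The sketch below assumes one training per remaining candidate at every step of the recursive search, which is how the brute force search is described here; `brute_force_trainings` is a hypothetical helper.

```python
def brute_force_trainings(n_pmts: int) -> int:
    """Step k trains one network per remaining candidate: n_pmts - k + 1."""
    return sum(n_pmts - k + 1 for k in range(1, n_pmts + 1))


print(brute_force_trainings(192))  # 18528, i.e. 192 * 193 / 2
print(brute_force_trainings(384))  # 73920, about four times as many
```

Doubling the number of PMTs roughly quadruples the number of trainings, which is why a search that needs only one training per step is attractive.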
This fast approach integrates a DNN component from the autoencoder architecture: a bottleneck layer with a single neuron, as shown in Figure 2. In this bottleneck DNN architecture, the remaining candidate PMTs not in $S_{k-1}$ are forced to connect to the bottleneck neuron before being passed to the fully connected layers as inputs, effectively requiring the DNN to search for the best weights associated with each of these PMTs. At the bottleneck, the DNN computes the sum $\sum_i w_i q_i$, where $i$ runs over the candidate PMTs, the quantity $q_i$ is the $i$-th PMT input to the DNN in the form of charge information, and the weight $w_i$ of the $i$-th PMT is a parameter in the DNN. When the training stage of the DNN ends, the $w_i$ will have reached their best values, corresponding to a minimum of the MSE. The PMT with the largest weight in magnitude is the one on which the reconstruction of the position of the IBD prompt events relies most heavily compared to the rest of the candidate PMTs. Hence, this PMT is our $k$-th important PMT, $p_k$. Crucially, this type of DNN only needs to be trained once to identify $p_k$ no matter the value of $k$, whereas the brute force DNN needs to be trained $N - k + 1$ times, once for each remaining candidate PMT.
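The selection rule at the bottleneck can be illustrated with a toy example. The sketch below keeps only the single bottleneck neuron and omits the fully connected layers that follow it, so it is a simplified stand-in rather than the architecture of Figure 2; the data, `true_w`, and the function names are hypothetical.

```python
import random


def bottleneck_output(weights, charges):
    """Single bottleneck neuron: weighted sum of the candidate-PMT charges."""
    return sum(w * q for w, q in zip(weights, charges))


def fit_bottleneck(events, targets, n_inputs, lr=0.5, epochs=300):
    """Plain gradient descent on the MSE between the bottleneck output and
    the target (the layers after the bottleneck are omitted in this toy)."""
    w = [0.0] * n_inputs
    n = len(events)
    for _ in range(epochs):
        g = [0.0] * n_inputs
        for q, t in zip(events, targets):
            err = bottleneck_output(w, q) - t
            for i in range(n_inputs):
                g[i] += 2.0 * err * q[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


rng = random.Random(1)
true_w = [0.1, -0.2, 1.5, 0.3]  # hypothetical: candidate 2 matters most
events = [[rng.uniform(0.0, 1.0) for _ in true_w] for _ in range(300)]
targets = [bottleneck_output(true_w, q) for q in events]

w_fit = fit_bottleneck(events, targets, len(true_w))
# The candidate with the largest |weight| is taken as the next important PMT.
most_important = max(range(len(w_fit)), key=lambda i: abs(w_fit[i]))
print(w_fit, most_important)
```

A single training yields a weight for every candidate at once, so the ranking comes from one fit instead of one fit per candidate.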
The heatmap in Figure 3 shows the resolutions obtained from the residual distributions when only one PMT is used for training and for determining the vertex location of the antineutrino IBD interaction with the brute force search. Specifically, the resolution obtained with each PMT is indicated by a value on the color scale. Clearly, some PMTs contain more information about the vertex position than others. The first most important PMT from the brute force search, i.e., $p_1$, is chosen as the one having the smallest color value in the heatmap. The variation by column in the resolution for the $x$-direction is due to the use of the Cartesian coordinate $x$ rather than the radial coordinate $r$. The reconstruction shows that the most important PMT is different for the $x$-direction and the $z$-direction. The heatmap pattern for the $y$-direction is similar to that for the $x$-direction, but with the dark region in the $x$-direction being the light region in the $y$-direction and vice versa, reflecting that $x$ and $y$ depend on $\cos\phi$ and $\sin\phi$, respectively, in a cylindrical coordinate system, i.e., a phase shift of $\pi/2$ between $\cos\phi$ and $\sin\phi$.
In Figure 4, the heatmap shows the weight corresponding to each PMT as obtained from the bottleneck neuron when searching for the first most important PMT. A higher weight indicates that the PMT has a larger impact on the vertex reconstruction. The PMT with the largest weight is identified as the most important PMT, $p_1$. Ideally, we would like to constrain the weights at the bottleneck to be discrete, i.e., either 0 or 1, with a weight of 1 for the most important PMT and 0 for the rest during the training of the DNN. However, such a constraint is nondifferentiable and noncontinuous with respect to the loss function, which would render DNN parameter optimization using gradient descent algorithms infeasible. Comparing Figures 3 and 4, the brute force and the bottleneck DNN have chosen different PMTs as their most important PMTs, possibly due to degeneracies in the detector. For example, in the $z$-direction, the PMTs in the top and bottom rings should produce the same resolution. We suspect that, during the training of the bottleneck DNN, some information sharing within a subset of PMTs whose information content the DNN treats as similar is unavoidable. Hence, the bottleneck neuron contains information from not one but a subset of PMTs; i.e., the importance by weight of each PMT could be partially "shared" amongst several PMTs. Further studies of this effect are under way. Figures 5 and 6 show the results obtained from the brute force and the bottleneck DNN approaches, respectively, when searching for the second most important PMT after the first most important PMT has been found.
Figure 7 shows the resolution curves for $x$ and $z$ as a function of the number of PMTs used in the reconstruction. Using an Nvidia Tesla P40 GPU, we estimated that it would take about 60 days to complete the entire curve with the brute force search, whereas it took about one day with the bottleneck DNN. Resolutions from a random choice of PMTs are also included in the figures for comparison with the results from the bottleneck DNN. An empirical triple exponential fit is applied to the bottleneck DNN results. It can be clearly seen that there is a diminishing return on the improvement in the vertex resolution when adding PMTs to an existing set of PMTs, which is an implication of the submodular nature of the Gaussian standard deviation and its relationship to the information entropy, $H = \frac{1}{2}\ln\left(2\pi e \sigma^2\right)$. Succinctly, the submodularity of the Gaussian standard deviation, i.e., the resolution in this case, means that less new information can be gained from adding a new PMT to a larger set of already-chosen PMTs than to a smaller set. As all the PMTs in the Daya Bay detector are of the same size and model, Figure 7 could also be interpreted as the resolution being a function of the detector coverage.
Figure 7: (a) Resolution in $x$ versus number of PMTs. (b) Resolution in $z$ versus number of PMTs.
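The diminishing return seen in Figure 7 can be illustrated with a triple exponential curve. The functional form below (three decaying exponentials plus a constant offset) and all numerical parameters are assumptions made for this sketch; they are not the fitted values from this work. The Gaussian differential entropy $\frac{1}{2}\ln(2\pi e\sigma^2)$ is included to show how a narrower resolution corresponds to lower entropy.

```python
import math


def triple_exponential(n, amps, taus, offset=0.0):
    """Sum of three decaying exponentials in the number of PMTs n."""
    return offset + sum(a * math.exp(-n / t) for a, t in zip(amps, taus))


def gaussian_entropy(sigma):
    """Differential entropy of a Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


# Hypothetical parameters (amplitudes and offset in mm, decay constants in PMTs).
AMPS, TAUS, OFFSET = (300.0, 150.0, 60.0), (3.0, 20.0, 120.0), 50.0


def gain(n):
    """Resolution improvement from adding the (n + 1)-th PMT."""
    return (triple_exponential(n, AMPS, TAUS, OFFSET)
            - triple_exponential(n + 1, AMPS, TAUS, OFFSET))


for n in (1, 10, 100):
    sigma = triple_exponential(n, AMPS, TAUS, OFFSET)
    print(n, round(sigma, 1), round(gain(n), 2), round(gaussian_entropy(sigma), 3))
```

For positive amplitudes the curve is convex and decreasing, so the gain from one additional PMT shrinks as the set of already-chosen PMTs grows.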
In this work, we provide a fast approach using a deep neural network with a bottleneck neuron to uncover the effects of the number of photon sensors such as PMTs on the vertex resolutions in an antineutrino detector. The results have been compared with a random PMT search and with a brute force search, which yields the ideal result. Our inputs are the simulated charge information of the Daya Bay PMTs. The fast approach produces results close to those from the brute force search and fares much better than a random search. We find that the vertex resolution of the event reconstruction at Daya Bay is approximately a multiexponentially decreasing function of the number of PMTs and hence, also, of the coverage. In future work, we envisage incorporating temporal information, i.e., the time of arrival of each photon, in addition to the charge information to reconstruct the vertices. In addition, one could also study the size of the PMT needed alongside its installation location corresponding to the best event vertex reconstruction resolution. A subsequent work would be the study of the effect of PMTs on the event energy reconstruction once the vertex positions have been obtained. Although studying the energy might need modifications to the deep network as the energy is a positive-definite quantity, the energy resolution is important when considering physics sensitivity and thereby also impacts the design of future antineutrino detectors including JUNO.
In order to use the bottleneck DNN approach for new detectors in the design phase, we suggest Monte Carlo simulations using various values of $N$ and $M$. One can then obtain the surface in the $(N, M, \Omega)$ hyperspace, where $\Omega$ is some detector performance metric, $N$ is the number of possible PMT locations, and $M$ is the number of PMTs installed, as described in this work. Experimentalists can then decide on the working point for their detector in accordance with their construction budget and the desired detector performance.
The data used to support the findings of this study are available from the corresponding author upon request.
Zhi-Qiang Qian is co-first author.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The authors thank Shen-Jian Chen and Zuo-Wei Liu for their computing facilities and helpful discussions. They would also like to thank Chao Zhang, Zhe Wang, Samuel Kohn, the Daya Bay ACC, and the Collaboration for their time and comments. This work was supported by the National 973 Project Foundation of the Ministry of Science and Technology of China (Contract no. 2013CB834300) and the International Science & Technology Cooperation Program of China (Contract No. 2015DFG02100).
Double Chooz collaboration, Y. Abe et al., “Improved measurements of the neutrino mixing angle θ13 with the Double Chooz detector,” Journal of High Energy Physics, vol. 2014, no. 86, 2014.
RENO Collaboration, J. Choi et al., “Observation of Energy and Baseline Dependent Reactor Antineutrino Disappearance in the RENO Experiment,” Physical Review Letters, vol. 116, 2016.
JUNO collaboration, F. An et al., “Neutrino physics with JUNO,” Journal of Physics G: Nuclear and Particle Physics, vol. 43, Article ID 030401, 2016.
Y. Bengio, “Deep Learning of Representations: Looking Forward,” in Statistical Language and Speech Processing, A. Dediu, C. Martín-Vide, R. Mitkov, and B. Truthe, Eds., 37 pages, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., vol. 25, pp. 1097–1105, 2012.
P. T. Komiske, E. M. Metodiev, and M. D. Schwartz, “Deep learning in color: towards automated quark/gluon jet discrimination,” Journal of High Energy Physics, vol. 2017, no. 1, 2017.
P. Baldi, P. Sadowski, and D. Whiteson, “Searching for exotic particles in high-energy physics with deep learning,” Nature Communications, vol. 5, 2014.
S. Kohn, “Understanding Backgrounds using Deep Learning at the Daya Bay Experiment,” in Proceedings of the 28th International Symposium on Lepton Photon Interactions at High Energies, 2017.
MicroBooNE collaboration, R. Acciarri et al., “Convolutional neural networks applied to neutrino events in a liquid argon time projection chamber,” Journal of Instrumentation, vol. 12, 2017.
NEXT collaboration, J. Renner et al., “Background rejection in NEXT using deep neural networks,” Journal of Instrumentation, vol. 12, Article ID T01004, 2017.
E. Racah, S. Ko, P. Sadowski et al., “Revealing fundamental physics from the Daya Bay Neutrino Experiment using deep neural networks,” in Proceedings of the 15th IEEE International Conference on Machine Learning and Applications, ICMLA 2016, pp. 892–897, USA, December 2016.
V. Nair and G. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), J. Furnkranz and T. Joachims, Eds., pp. 807–814, Omnipress, 2010.
J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for Hyper-parameter Optimization,” in Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS11, pp. 2546–2554, Curran Associates Inc, USA, 2011.
G. E. Hinton and R. S. Zemel, “Autoencoders, Minimum Description Length and Helmholtz Free Energy,” in Advances in Neural Information Processing Systems 6, J. D. Cowan, G. Tesauro, and J. Alspector, Eds., pp. 3–10, Morgan-Kaufmann, 1994.