Abstract

Buildings are major consumers of energy, accounting for a significant proportion of total energy use worldwide. This substantial consumption not only increases operational costs but also contributes to environmental concerns such as greenhouse gas emissions. In the United States, buildings account for about 40% of total energy use, highlighting the importance of efficient energy management. Accurate prediction of building energy consumption is therefore crucial, yet it remains challenging due to the intricate interaction of indoor and outdoor variables. This study introduces Partitioned Hierarchical Multitask Regression (PHMR), an innovative model integrating recursive partitioning regression (RPR) with hierarchical multitask learning (hierML). PHMR predicts building energy consumption by integrating both indoor factors, such as building design and operational variables, and outdoor environmental influences. Rigorous simulation studies illustrate PHMR's efficacy: it outperforms traditional single-predictor regression models, achieving 32.88% to 41.80% higher prediction accuracy, especially in scenarios with limited training data, which highlights its robustness and adaptability. The practical application of PHMR in managing a modular house's Heating, Ventilation, and Air Conditioning (HVAC) system in Spain yielded a 37% improvement in prediction accuracy, evidenced by a high Pearson correlation coefficient (0.8) between PHMR's predictions and actual energy consumption. PHMR not only offers precise predictions of energy consumption but also facilitates operational cost reductions, thereby enhancing sustainability in building energy management. Its application in a real-world setting demonstrates the model's potential as a valuable tool for architects, engineers, and facility managers in designing and maintaining energy-efficient buildings.
The model’s integration of comprehensive data analysis with domain-specific knowledge positions it as a crucial asset in advancing sustainable energy practices in the building sector.

1. Introduction

In an era marked by escalating energy demands, the imperative to enhance building energy efficiency has become increasingly critical. The Department of Energy’s 2023 report underscores this urgency, highlighting that buildings account for a substantial portion of total energy use in the United States [1]. This situation presents not just a challenge in resource management but also raises significant environmental and economic concerns. The integration of wireless sensors and IoT technologies has paved new avenues for understanding and controlling building energy usage. When combined with predictive analytics, these technologies show considerable promise in forecasting energy consumption and enabling the fine-tuning of building parameters to minimize waste.

Despite these technological advancements, the development of effective predictive models for building energy consumption is fraught with challenges, primarily due to the dynamic and multifaceted nature of energy usage. While recent studies, such as those by Himeur et al. [2, 3], Deng et al. [4], Elnour et al. [5] , and Han et al. [6], have illuminated the potential of AI and big data in revolutionizing building energy management, they also reveal significant gaps in current predictive modeling approaches. These studies highlight the need for models that can effectively navigate the complex interplay between various indoor and outdoor variables affecting energy consumption.

In response to this identified research gap, our study introduces the Partitioned Hierarchical Multitask Regression (PHMR) model. The PHMR model is designed to address the complexities of building energy consumption by integrating recursive partitioning regression (RPR) with hierarchical multitask learning (hierML). This approach not only enhances the accuracy of energy consumption predictions but also facilitates the nuanced control and adaptation of building parameters. Our model represents a significant advancement over existing RPR models (Chan et al. [7]; Chaudhuri et al. [8]; Landwehr et al. [9]; Loh [10, 11]; Zeileis et al. [12]; Rusch and Zeileis [13]), filling a critical gap in the literature and offering a more sophisticated tool for energy management.

The key contributions of this study are manifold. We present the model formulation of PHMR, which bridges recursive partitioning on outdoor variables with hierarchical multitask learning to enhance prediction and control of building energy consumption. The transformation of the hierML algorithm into a convex optimization problem is detailed, ensuring optimal and computationally efficient solutions. Furthermore, the practical application of PHMR in managing a modular house’s HVAC system in Spain demonstrates the model’s utility in reducing unnecessary energy consumption, aligning with the sustainable goals outlined in recent research by Borràs et al. [14] and Mehdizadeh Khorrami et al. [15].

This paper is organized to detail the PHMR model formulation, the algorithm used for model estimation, its comparison against other methods through simulation studies, and its practical application in building energy prediction and management.

2. Model Formulation for PHMR

The PHMR model consists of two components: (1) a tree-growing process that utilizes the outdoor variables for recursive partitioning and (2) hierML performed at each step of the tree-growing process. In this section (Section 2), we will discuss the hierML model at a fixed step (-th step) of the tree-growing process. The other steps use hierML in a similar manner. In the following section (Section 3), we will present the algorithm that is used to estimate the model parameters for hierML, which is integrated with recursive partitioning to grow a tree.

The -th step of the tree-growing process yields a tree , consisting of internal and leaf nodes. All nodes of , denoted as , include the leaf nodes , which do not have any children in the tree. At each internal node of , a variable is used to partition the samples into left and right branches. The leaf nodes correspond to different subdivisions of the space defined by the outdoor variables, and for each leaf node, there exists a regression model linking the response with indoor variables: and . Figure 1 provides an example of with leaf nodes .

In previous recursive partitioning regression (RPR) models, including our SPR [16, 17], each leaf node's regression model was fitted independently. In our proposed hierML model, by contrast, we exploit the hierarchical multilevel similarity structure of the leaf nodes to fit the models jointly. According to the principle of recursive partitioning, models at leaf nodes that share a lower-level ancestor should be more similar to each other. For instance, in Figure 1, the model at leaf node 6 should be more similar to the model at node 7 than to those at nodes 5 and 3 (and, in turn, more similar to the model at node 5 than to that at node 3). This is because recursive partitioning performs the easier splits at earlier stages, producing more dissimilar branches; as the partitioning proceeds, splits become harder because the resulting branches grow more and more similar.

To jointly estimate regression models for the leaf nodes while incorporating the hierarchical multilevel similarity structure, a penalized formulation is proposed as follows: where .

The first part of equation (1) is the least-square error loss. For simplicity, the subscript “” has been removed from all the notations in this section since we are focusing on a single step in the tree-growing process. The second term in equation (1) is the hierML penalty, and there are some additional notations that need to be defined before explaining the penalty clearly. For each node in the tree, denoted as , represents the set of leaf nodes that grow from node . Taking in Figure 1 as an example, there are seven nodes in the tree. The for each node is , , , , , , and . Let contain the set of regression coefficients corresponding to the -th predictor (i.e., -th indoor variable) in . For example, , , and .

To jointly estimate the leaf node-wise regression models while incorporating the hierarchical multilevel similarity structure, we propose a penalized formulation inspired by multitask learning with structured regularization (Obozinski et al. [18]). Specifically, the set of leaf nodes growing from a given node is denoted as . Treating the as distinct tasks, we apply a weighted -norm on , denoted as , where is a weight to be discussed later. We then place an -norm outside the weighted -norm, i.e., , to enable the selection of the regression coefficients contained in each as a group. Finally, we place another -norm further outside to enable the selection of the regression coefficients corresponding to the -th predictor as a group, i.e., . This completes the hierML penalty, which is the second term in the formulation. in the formulation is a tuning parameter that balances the least-squares loss and the proposed penalty [17].
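To make the nested-norm construction concrete, the following sketch computes a penalty of this form — a weighted 2-norm per node group, summed (1-norm style) over nodes and then over predictors. All names (`B`, `groups`, `weights`) are illustrative, and the mapping from nodes to leaf indices is a stand-in for the paper's notation, which is not reproduced here.

```python
import numpy as np

def hierml_penalty(B, groups, weights, lam):
    """Illustrative sketch of a hierML-style penalty.

    B       : (p, L) array; column l holds the regression coefficients of leaf l
    groups  : dict mapping node v -> list of leaf-column indices under v
    weights : dict mapping node v -> weight w_v
    lam     : tuning parameter balancing loss and penalty
    """
    total = 0.0
    for j in range(B.shape[0]):            # sum over predictors (outer l1)
        for v, leaves in groups.items():   # sum over nodes (inner l1)
            # weighted l2-norm over the group of j-th-predictor coefficients
            total += weights[v] * np.linalg.norm(B[j, leaves])
    return lam * total
```

For one predictor and two leaves with coefficients 3 and 4, grouped under a root (weight 0.5) and two singleton leaf groups (weight 0.25 each), the penalty is 0.5·5 + 0.25·3 + 0.25·4 = 4.25.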

In addition, we will explain how to select the weight for each node . When the tree splits into a left and a right branch at each internal node, the regression models at the two branches should exhibit some similarities because they share the same internal node. However, the models should not be identical, or else the internal node would not have been split. For instance, at the lowest internal node, in Figure 1, the tree splits into nodes 6 and 7 as the two branches. The regression coefficients of the -th predictor in the models of nodes 6 and 7, and , should be comparable but not precisely identical. where encourages the two coefficients to be selected jointly to account for their similarity, while and encourage selection separately to account for their difference. and are the corresponding weights. Using the definition of , (2) can be written as

Furthermore, we can move up to the next internal node, i.e., . At node 2, the tree splits into a subtree rooted from node 4 as the left branch and node 5 as the right branch. Following a similar idea to (3), we can write down the penalization on regression coefficients that simultaneously account for the similarity and difference between the two branches, i.e.,

In a similar way, the penalization associated with the internal node is as follows:

To generalize the above scheme, we can write the definition of as with for identifiability consideration. Using this definition, we can write the hierML penalty in (1) as

By some algebraic operations, it can be demonstrated that is associated with and in the following manner, where is the tree’s root node:

This completes the discussion on designing the weight for the proposed hierML penalty. For better illustration, take in Figure 1 as an example. The right-hand side of (7) can be shown to be:

Comparing the above equation with the left-hand side of (7), we can get , , , , , , and . It is easy to verify that these weights comply with the formula in (8).

It is important to note that a coefficient can be penalized in multiple groups using the proposed hierML penalty. For example, is penalized in four groups according to (9): once by itself as and three other times in groups , , and , respectively. These groups have a nested structure, where is a subset of , is a subset of , and is a subset of . This is a general property of hierML, where a regression coefficient of each leaf node is penalized in multiple nested groups with each group corresponding to an ancestor of the leaf node. However, the weighting scheme proposed ensures that the weights corresponding to the multiple nested groups for each leaf node sum up to one, which balances the penalization of regression coefficients in all leaf nodes [17]. This property is presented mathematically in Proposition 1.

Proposition 1. For any leaf node , let be a set of nodes including all the ancestors of and itself. For each , let be the weight associated with node in the hierML penalty in (1). Then,

Detailed proofs are provided in the Supplementary Material (Proof of Proposition 1).

In summary, our proposed hierML model is defined by the optimization in (1) with the weight given in (8), which depends on the choice of and . To determine , we suggest the following approach: recall that represents the idea that the regression coefficients at each branch of node should be different, while assumes they should be similar. Therefore, a larger value of implies that we want the coefficients to be estimated more independently of each other. The proposed approach for choosing the weight is to make it proportional to the distance between node and the bottom level of the tree. This is based on the idea that the farther away a node is from the bottom of the tree, or the closer it is to the root node, the more different the regression coefficients at each branch of should be. This is because recursive partitioning generally produces easier splits at earlier stages of the partitioning process, resulting in more dissimilar branches. As the split occurs at later stages, the resulting branches become more similar, making the split more difficult. By making proportional to the distance between node and the bottom level of the tree, the proposed approach takes into account this principle of recursive partitioning.

Once the for all the nodes are specified, can be obtained using (8). Next, we use the example in Figure 2 to explain how the weight for node 6, , is obtained. Because node 6 is a leaf node, we need to specify for all , which include nodes 1, 2, and 4. Node 1 is three levels up from the bottom level, so . Likewise, we can get and . Normalizing these weights so that the equality in (10) holds, we get , , and . Then, using (8). The weights corresponding to the other nodes can be obtained in the same way. Eventually, the hierML penalty term in (1) can be written as
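The normalization step in the worked example above (distances 3, 2, and 1 for nodes 1, 2, and 4, normalized to 1/2, 1/3, and 1/6) can be sketched as follows. This covers only the normalization; the subsequent conversion to the final weights via (8) is not reproduced here, and the function name and input format are illustrative.

```python
def path_weights(dist_from_bottom):
    """Normalize node weights along a root-to-leaf path.

    dist_from_bottom: dict mapping node -> number of levels above the
    bottom of the tree (e.g. nodes 1, 2, 4 at 3, 2, 1 levels up).
    Returns weights proportional to these distances that sum to one,
    mirroring the identifiability constraint discussed in the text.
    """
    total = sum(dist_from_bottom.values())
    return {v: d / total for v, d in dist_from_bottom.items()}
```

For the example, `path_weights({1: 3, 2: 2, 4: 1})` yields 1/2, 1/3, and 1/6 for nodes 1, 2, and 4, respectively.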

3. Algorithm for Model Estimation of PHMR

To solve the optimization problem in (1), our first step is to convert it into an alternative convex optimization following a similar idea proposed by Bach [19] for solving group lasso, i.e.,

The optimization in (12) is convex but with a nonsmooth penalty, which is difficult to solve directly. We relax the penalty term as follows: where and .

Proposition 2. The equality of (13) holds, i.e., , if and only if .

Detailed proofs are provided in the Supplementary Material (Proof of Proposition 2) [17].

Using the results in Proposition 2, we can write the optimization in (12) into an equivalent format with a smooth penalty term, i.e., subject to and .

Problem (14) can be solved using an iterative algorithm that alternates between solving and . Given , can be solved analytically, i.e.,

Input: A set of leaf nodes, ’s, and their associated data (training and validation) of indoor
  variables and response variable . A set of weights for each node of the tree
Initialize:
   Initial value of as and start from iteration .
Iterate for all the values of :
   for do
    while and do
      Update by solving
      Update using Proposition 2
    end while
    Calculate on validation set using
   end for
   Output with the smallest
Output: Regression coefficients’ estimation for each leaf node.
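The alternating scheme underlying Algorithm 1 can be sketched as follows, using the variational form for group penalties in the style of Bach [19]: given the auxiliary variables, the coefficients solve a weighted ridge problem in closed form; given the coefficients, the auxiliary variables have an analytic update (Proposition 2 plays the analogous role in the paper). The function name, the single-level grouping, and all other details are illustrative, not the paper's exact formulation.

```python
import numpy as np

def alt_group_solve(X, y, groups, w, lam, n_iter=200, eps=1e-8):
    """Sketch of an alternating solver for a weighted group penalty.

    X, y   : design matrix and response
    groups : dict mapping group id -> list of coefficient indices
    w      : dict mapping group id -> group weight
    lam    : tuning parameter
    """
    p = X.shape[1]
    eta = {g: 1.0 for g in groups}
    beta = np.zeros(p)
    for _ in range(n_iter):
        # beta-step: the penalty becomes quadratic given eta, so beta
        # solves (X'X + diag(D)) beta = X'y with D = lam * w_g^2 / eta_g
        D = np.zeros(p)
        for g, idx in groups.items():
            D[idx] = lam * w[g] ** 2 / (eta[g] + eps)
        beta = np.linalg.solve(X.T @ X + np.diag(D), X.T @ y)
        # eta-step: closed-form minimizer eta_g = w_g * ||beta_g||
        eta = {g: w[g] * np.linalg.norm(beta[idx]) for g, idx in groups.items()}
    return beta
```

With a very small `lam`, the solution approaches the ordinary least-squares fit, which is a useful sanity check when experimenting with this kind of scheme.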

Given , can be obtained using the result in Proposition 2. The tuning parameter is selected by a line search from 0 to , where 0 is the smallest possible value and is the largest value beyond which no further improvement can be made. The optimal is chosen by minimizing the mean squared prediction error (MSPE) on the validation set or via a cross-validation scheme. We summarize this estimation procedure for the hierML model in Algorithm 1. Note that Algorithm 1 provides the model estimation method for hierML at each step of the recursive partitioning process for growing the tree.

Next, we present the steps of the entire process, which together compose the algorithm for constructing the PHMR model. Input to the algorithm includes a training set and a validation set on the indoor variables , outdoor variable , and response variable . At each step, the algorithm selects an outdoor variable, , to split the samples belonging to a leaf node into a left and a right branch (i.e., the left and right child nodes), and , respectively, where is the splitting point. To choose the optimal outdoor variable and the associated splitting point, the algorithm iterates over each outdoor variable in the dataset and each candidate splitting point, and chooses the pair that yields the largest empirical risk reduction evaluated on the validation set. The empirical risk reduction is computed as follows: where contains the regression coefficients for leaf node estimated using the training set in the previous step, and and contain the coefficients of the regression models at the two child nodes of , obtained by splitting according to the splitting point . and are estimated by the hierML model, or by two lasso models for computational ease. A commonly used empirical risk function is the sum of squared prediction errors over all the samples in the validation set.

Suppose that the outdoor variable and associated splitting point yielding the highest empirical risk reduction are found; our algorithm then splits the leaf node using , creating two new leaf nodes corresponding to and , respectively. Algorithm 1 is then used to refit the hierML model for all the leaf nodes. This completes the partitioning at one leaf node. The partitioning is performed recursively on each newly generated leaf node until no further reduction in the empirical risk function can be found, at which point the algorithm stops [17]. We summarize this recursive partitioning tree-growing process in Algorithm 2.

Input: A training set and a validation set on the indoor variable , outdoor variable
  and response variable .
Initialize:
    Fit a lasso model at the current node using to obtain the current empirical risk
    
Assume that there are outdoor variables,
    for to do
     Split the data of node into left and right child node, and , respectively;
     Fit two lasso models in both child nodes and obtain the coefficients of the regression
     models, and ;
     Calculate empirical risk reduction .
    end for
    if ( for all ) then
      Stop
    else
      Choose outdoor variable and associated splitting point leading to the
      largest empirical risk reduction ;
      Split the data of node into two new leaf nodes, and ,
      respectively;
      Apply Algorithm 1 on all leaf nodes to obtain hierML estimation;
      Re-calculate empirical risk for each leaf node on by using new estimates
       from hierML;
      Select the leaf node with largest .
    end if
Output: A set of leaf nodes and fitted regression models in each leaf node.
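The split-selection core of Algorithm 2 can be sketched as follows. For brevity, plain least squares stands in for the lasso fits the paper uses at candidate child nodes, and the training set doubles as the validation set in the test below; variable names (`X_in` for indoor covariates, `Z_out` for outdoor partitioning variables) are illustrative.

```python
import numpy as np

def fit_ls(X, y):
    """Plain least-squares fit (the paper uses lasso; LS keeps the sketch short)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def risk(X, y, coef):
    """Empirical risk: sum of squared prediction errors."""
    return float(np.sum((y - X @ coef) ** 2))

def best_split(X_in, Z_out, y, Xv_in, Zv_out, yv):
    """Try every outdoor variable and candidate split point, fit a model in
    each child, and keep the split with the largest validation-risk reduction."""
    base_risk = risk(Xv_in, yv, fit_ls(X_in, y))
    best = None
    for s in range(Z_out.shape[1]):                 # each outdoor variable
        for c in np.unique(Z_out[:, s])[:-1]:       # each candidate split point
            left, right = Z_out[:, s] <= c, Z_out[:, s] > c
            if left.sum() < 2 or right.sum() < 2:
                continue                            # skip degenerate splits
            cl = fit_ls(X_in[left], y[left])
            cr = fit_ls(X_in[right], y[right])
            vleft, vright = Zv_out[:, s] <= c, Zv_out[:, s] > c
            red = base_risk - (risk(Xv_in[vleft], yv[vleft], cl)
                               + risk(Xv_in[vright], yv[vright], cr))
            if best is None or red > best[0]:
                best = (red, s, c)
    return best  # (risk reduction, variable index, split point)
</imports>

In the full algorithm, the winning split is applied only if its risk reduction is positive, and Algorithm 1 then refits hierML over all leaf nodes.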

4. Simulation Studies

The efficacy of the Partitioned Hierarchical Multitask Regression (PHMR) model was rigorously evaluated through a series of simulation studies designed to reflect various complexities encountered in real-world data. This section details the data generation process, compares PHMR against existing methods, and provides an in-depth analysis of the results.

4.1. Data Generation

Data for the simulation studies were generated to mimic a real-world scenario with one indoor variable (), five outdoor variables ( to ), and a response variable (). and were the true partitioning variables, dividing the outdoor variable space into distinct subdivisions, while to were included as noise. The 75 input variables followed a multivariate normal distribution with a correlation structure in set to . This setup aimed to test the robustness and adaptability of PHMR in a controlled yet complex environment.
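A simulation setup of this general shape can be sketched as follows. The exact correlation structure, coefficients, and noise level of the study are not reproduced here; the AR(1)-style correlation `rho**|i-j|` and the region-dependent slopes are assumptions made purely for illustration.

```python
import numpy as np

def simulate(n, p=6, rho=0.5, seed=0):
    """Illustrative data generator: p correlated normal inputs with an
    assumed AR(1) correlation rho**|i-j|, and a response whose regression
    on the first (indoor-like) variable depends on which region of the
    (outdoor-like) partitioning variables the sample falls in."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    corr = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), corr, size=n)
    x_in, z1, z2 = X[:, 0], X[:, 1], X[:, 2]   # remaining columns act as noise
    # true partitioning on z1, z2 defines four subdivisions with distinct slopes
    slope = np.where(z1 <= 0, np.where(z2 <= 0, 1.0, 2.0),
                     np.where(z2 <= 0, 3.0, 4.0))
    y = slope * x_in + rng.normal(scale=0.1, size=n)
    return X, y
```

A recovery experiment would then check whether the fitted tree splits on the second and third columns (the true partitioning variables) and ignores the noise columns.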

PHMR’s performance was compared against Single Partition Regression (SPR), model-based recursive partitioning (MOB), and Generalized, Unbiased, Interaction Detection and Estimation (GUIDE) [16, 17]. The selection of these methods provided a diverse range of comparison points, from traditional regression approaches to more modern partitioning techniques. The tuning parameters for all models were optimized based on the mean squared prediction error (MSPE) on a validation set, ensuring a fair and consistent evaluation framework.

4.2. Result Analysis and Model Performance

The mean squared prediction error (MSPE) of each method on the test set is reported in Table 1.

The results show that PHMR outperforms the other methods significantly, particularly in smaller training sizes. This superior predictive accuracy and efficiency are attributed to its hierarchical structure, facilitating information sharing across nodes and enhancing prediction capability.

The recovery rate of the true tree structure, detailed in Table 2 and visualized in Figure 2(a), further demonstrates PHMR's proficiency in revealing and representing complex data relationships. A closer examination of the MSPE and Pearson correlation within each leaf node, shown in Table 3, illustrates the model's adaptability across varying data segments. Nodes closer to the root exhibited better prediction performance, with smaller MSPEs and larger Pearson correlations, than those nearer the bottom of the tree, such as nodes 8 and 9. Even in these challenging nodes, however, PHMR achieved high Pearson correlations between the true and predicted responses on test data, underscoring its robustness.

In 100 independent simulations, we assessed how often SPR and PHMR reconstructed the true tree structure (Figure 2(a)); the outcomes, along with the MSPE of the fully reconstructed runs, are detailed in Table 2. PHMR notably outperformed SPR in replicating the ground-truth structure, primarily because SPR often terminates node splitting prematurely. Across the outdoor variable subdivisions (nodes 2, 4, 6, 8, and 9), and despite the smaller sample sizes in the lower nodes, PHMR maintained high Pearson correlations, indicating robust predictive capability across varied data segments (Table 3).
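The two evaluation metrics used throughout these comparisons are standard and can be computed directly:

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean squared prediction error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted responses."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```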

4.3. Implications

The simulation studies underscore the robustness and practical applicability of PHMR in predicting complex phenomena. Its ability to accurately capture and represent underlying data structures makes it a valuable tool for various domains. Moreover, the insights gained from these studies provide a solid foundation for further research, suggesting avenues such as extending the hierML model to accommodate nonlinear relationships and developing more efficient algorithms for tree growth in the PHMR approach. These simulation studies provide a comprehensive understanding of PHMR’s capabilities and advantages. The model’s superior performance, coupled with its methodological sophistication, positions it as a promising approach for tackling intricate predictive tasks in real-world scenarios.

5. Application of PHMR in Predictive HVAC Management

5.1. Dataset Description and Experimental Settings

This study utilized a dataset from a modular house in Madrid, Spain, with measurements collected every 15 minutes over 42 days, yielding 4,137 samples used to predict indoor temperature. The data come from the SML system, a prototype dwelling equipped with cutting-edge energy-saving features (Bache and Lichman [20]; Zamora-Martinez et al. [21]). Figure 3 depicts the sensors and actuators employed in the study. The dataset includes six indoor variables, such as CO2 concentration and relative humidity, and six outdoor variables, such as outdoor temperature and sun irradiance, detailed in Table 4. The data were split into a training set (50%), a validation set (25%), and a test set (25%). PHMR was applied to this dataset, producing a tree structure (depicted in Figure 4) that highlighted the importance of variables such as Sun_light, Sun_irradiance, and Outdoor_temp in predicting indoor temperature variations [17]. This section elaborates on the dataset's specifics, the variable importance discovered by PHMR, and the model's implementation details.

5.2. Model Performance and Key Findings

The model’s performance was evaluated against Single Partition Regression (SPR), with PHMR demonstrating a 37% improvement in prediction accuracy, evidenced by a lower mean squared prediction error (MSPE) of 4.57 compared to SPR’s 7.27. This performance is depicted in a scatter plot (Figure 5) illustrating the relationship between predicted and actual temperatures. Key findings from this application include the crucial role of sunlight-related variables in temperature prediction and the model’s ability to fit different outdoor temperature bins, enabling the HVAC system to adjust its activation/deactivation strategy efficiently. This subsection details the performance metrics used, the comparative analysis with other models, and the critical insights gained from applying PHMR to the real-world dataset.
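The reported 37% improvement follows directly from the two MSPE values:

```python
# Relative MSPE improvement of PHMR (4.57) over SPR (7.27), as in Section 5.2.
mspe_spr, mspe_phmr = 7.27, 4.57
improvement = (mspe_spr - mspe_phmr) / mspe_spr
print(f"{improvement:.0%}")  # about 37%
```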

5.3. Practical Implications

The application of PHMR in predictive HVAC management has underscored its substantial potential for enhancing building energy efficiency. By providing accurate indoor temperature forecasts, PHMR allows HVAC systems to operate more strategically, significantly reducing energy consumption while maintaining comfort levels. This achievement not only highlights the practical utility of the model but also its contribution to advancing sustainable energy practices in building management. The robustness and adaptability of PHMR, as demonstrated by its superior performance over conventional models like SPR, suggest its wide applicability and potential to revolutionize various predictive tasks within the energy management domain.

In light of the discussions on the PHMR model’s capability, it is pertinent to highlight its adaptability and precision in processing diverse datasets. The model’s design allows it to seamlessly accommodate a broad spectrum of variables, including those indicative of occupancy levels, such as CO2 concentrations. While the current case study’s findings showed CO2 levels in dining and living areas as not significantly influencing the model’s predictions relative to other variables, this should not be construed as diminishing the potential relevance of CO2 in different contexts or datasets. The PHMR model’s robust architecture is well-equipped to handle controlled and uncontrolled variables, demonstrating its applicability across a wide range of building types and environmental conditions. This ensures the model’s utility in capturing the nuanced dynamics of indoor environments, reaffirming its broad applicability and value in predictive HVAC management and beyond.

As we look to the future, this study opens new avenues for research and development. Enhancing PHMR to accommodate nonlinear relationships could lead to even more precise predictions in complex scenarios, and optimizing the algorithms for tree growth and model selection could improve its computational efficiency and scalability. Furthermore, integrating PHMR with other building management systems could provide a comprehensive approach to intelligent building operations. Collaborative efforts with industry practitioners will be crucial in validating the model’s effectiveness and exploring its full potential. Ultimately, this research not only validates PHMR’s effectiveness in a real-world scenario but also sets the stage for its broader application in creating energy-efficient and intelligent building systems.

6. Conclusion

This study introduced the PHMR model, a significant advancement in predictive modeling, tailored to optimize building energy management. PHMR's innovative integration of recursive partitioning on outdoor variables with a hierarchical multitask learning (hierML) model for indoor variables at each partitioning node has demonstrated its strength. Notably, the model's ability to incorporate multilevel hierarchical similarity structures into the joint model fitting process has led to improved prediction accuracy and robust performance, outperforming traditional methods on both simulated and real-world datasets. The successful application of PHMR in accurately predicting building energy usage underscores its potential as a powerful tool in the field, opening new avenues for intelligent, data-driven decision-making in building management systems.

6.1. Limitations and Future Directions

Despite its promising capabilities, PHMR comes with limitations that future research should aim to address. The current model relies on linear relationships within the hierarchical structure, potentially limiting its ability to capture more intricate, nonlinear interactions prevalent in complex data. Additionally, the scalability and computational efficiency, particularly concerning larger datasets and real-time applications, require enhancement to make PHMR more accessible and practical for broader usage. Future developments might include extending the hierML model to encompass nonlinear relationships, thereby broadening the model’s applicability and accuracy. Enhancements in tree growth algorithms and model selection processes are also critical to improving PHMR’s computational efficiency and scalability. These improvements are not just enhancements; they are necessary steps to evolve PHMR into a more comprehensive tool for various predictive modeling applications.

6.2. Broader Applications beyond Building Energy Prediction

Beyond building energy management, the PHMR model holds the potential for impactful applications across diverse industries. In healthcare, leveraging PHMR could lead to breakthroughs in predictive diagnostics and personalized treatment plans, providing a more nuanced understanding of patient data. The financial sector could benefit from the model’s predictive accuracy in areas like risk assessment and market trend analysis, making more informed and strategic decisions. Manufacturing and supply chain operations can harness PHMR to predict maintenance needs and optimize production processes, thereby enhancing efficiency and reducing operational costs. Environmental sciences could utilize the model for more accurate climate modeling and effective pollution control strategies, contributing to conservation and sustainability efforts. These various applications not only demonstrate PHMR’s versatility but also emphasize its potential to drive significant advancements in numerous fields, making it a valuable tool for researchers and practitioners alike.

Nomenclature

Acronyms
PHMR:Partitioned Hierarchical Multitask Regression—a predictive modeling approach used for building energy prediction
HVAC:Heating, Ventilation, and Air Conditioning—a technology of indoor environmental comfort
hierML:Hierarchical multitask learning—a multitask learning approach that incorporates the hierarchical structure of the partitioning tree
RPR:Recursive partitioning regression—a regression method that involves partitioning data recursively
SPR:Single Partition Regression—a regression method involving a single partition of data
MOB:Model-based recursive partitioning—a statistical method for recursive partitioning using model-based criteria
GUIDE:Generalized, Unbiased, Interaction Detection and Estimation—a method for detecting and estimating interactions in nonlinear models
MSPE:Mean squared prediction error—a measure of prediction accuracy in statistical models.
Parameters and Variables
:Indoor variable—represents an indoor measurement or condition in the study.
to :Outdoor variables—variables representing outdoor conditions, where and are specifically the true partitioning variables
:Response variable—typically represents building energy consumption or indoor temperature in the study
:Regression coefficients—coefficients for the leaf node in the regression model
:Error term—represents the error term for the leaf node in the model
:Regularization parameter—used in the PHMR model to control the complexity of the model
:Correlation matrix elements—elements of the correlation matrix for the input variables.
Mathematical Notations
:Correlation matrix—a matrix representing the correlations between the 75 input variables
:Coefficient vector—a vector of regression coefficients across all leaf nodes in the PHMR model
:Set of leaf nodes—represents the set of leaf nodes that grow from node in the tree structure of the model.

Data Availability

The SML2010 data used to support the findings of this study have been deposited in the UCI repository. The hyperlink is available as follows: https://archive.ics.uci.edu/dataset/274/sml2010.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Yonsei University Research Fund of 2022-22-0295.

Supplementary Materials

Accompanying this manuscript is a supplementary document containing the detailed mathematical proofs of Propositions 1 and 2, which are essential for a thorough understanding of the foundational principles behind the Partitioned Hierarchical Multitask Regression (PHMR) model introduced in this study.