Special Issue: Emerging Applications of Complex Networks
A Network-Based Approach to Modeling and Predicting Product Coconsideration Relations
Understanding customer preferences in consideration decisions is critical to choice modeling in engineering design. While existing literature has shown that exogenous effects (e.g., product and customer attributes) are deciding factors in customers’ consideration decisions, it is not clear how endogenous effects (e.g., the intercompetition among products) influence such decisions. This paper presents a network-based approach based on Exponential Random Graph Models to study customers’ consideration behaviors in the context of engineering design. The proposed approach is capable of modeling the endogenous effects among products through various network structures (e.g., stars and triangles), in addition to the exogenous effects, and of predicting whether two products would be considered together. To assess the proposed model, we compare it against a dyadic network model that considers only exogenous effects. Using buyer survey data from the China automarket in 2013 and 2014, we evaluate the goodness of fit and the predictive power of the two models. The results show that our model has a better fit and higher predictive accuracy than the dyadic network model. This underscores the importance of the endogenous effects on customers’ consideration decisions. The insights gained from this research help explain how endogenous effects interact with exogenous effects in affecting customers’ decision-making.
1. Introduction
Complex network modeling and simulation have shown their power in many engineering applications, such as wireless networks, sensor networks, smart grids, supply chains, and transportation systems. Recent developments in mathematical modeling techniques and computational algorithms for studying complex networks have also drawn the attention of the engineering design field. Complex networks have been used in engineering design for the study of relational patterns, effective network visualization of associations of products, and modeling social interactions and cross-level interactions between customers and products [2, 3]. In the design of complex products, network analysis has been used to characterize a product as a network of components that share technical interfaces or connections. Various network metrics, such as clustering coefficients and path length, are used to characterize the product structure and study the correlations between design quality and the product structure. Based on network metrics such as centrality, Sosa et al. defined three measures of modularity as a way to improve the understanding of product architecture. Recent work by Sosa et al. found that proactively managing the use of network structures (such as hubs) may help improve the quality of complex product designs. Network analysis has also been applied to studying designers’ networks for understanding organizational behavior [6, 7] and improving multidisciplinary design efficiency. In this paper, instead of focusing on the product or the designer, we leverage complex network modeling and simulation techniques to study another key stakeholder in product design, the customer. We aim to use complex networks to study customer preference in support of product design and development.
In particular, we study customers’ consideration decisions by modeling product coconsideration relations (i.e., two products being concurrently considered for purchase) as a complex network.
2. Background and Literature Review
Choice modeling is of great interest in engineering design as it predicts product demand and market share as a function of engineering design attributes and customer profiles in a target market . Choice models have been integrated into design optimization to take account of customer preferences in supporting engineering design decisions [9–12]. Previous choice models mostly assume that customers have bounded rationality and have underlying utilities to rank alternatives in a consideration set, “a set of product alternatives available to an individual who will seriously evaluate through comparisons before making a final choice” . A key step of constructing choice models is to determine the consideration set . As Hauser et al.  indicated “if customers do not consider your product, they can’t choose it.”
From an enterprise perspective, understanding customer preferences in consideration is important for identifying crucial product features that customers are willing to pay for. Existing studies [16, 17] also revealed the consideration set phenomenon, that is, the size of the consideration set tends to be much smaller (roughly 5-6 brands) than the total number of choices available in a market. As a result, small changes in individuals’ consideration sets (either size or options) may significantly transform the landscape of the overall market and reshape the competition relations in an existing market. Therefore, understanding customers’ preferences in consideration poses new opportunities to optimize product configurations, address customer needs, establish competitive design strategies, and make strategic moves such as branding and positioning.
Managerial actions have been taken to influence customers’ consideration decisions directly, for example, by changing brand accessibility  and by controlling usage and awareness . However, quantitative studies on customers’ consideration decisions are challenging as consideration is an intermediate construct, not the final choice . The decision context and a large amount of uncertainty alter decision rules. Existing literature primarily focuses on inferring decision-rule heuristics [20–22], such as the cognitive simplicity rule , which has been shown to be effective in automobile and web-based purchasing. There are three approaches to uncover consideration decision-rule heuristics . The first approach only utilizes final choices and product features in the consideration set. It adopts a two-stage consider-then-choose decision process and infers model parameters using the Bayesian or maximum likelihood estimation. Typical methods include Bayesian , choice-set explosion [25–27], and soft constraints . The second approach measures consideration through designed experiments in vitro, similar to the choice-based conjoint analysis exercise . Then the decision rules that best explain the observed consideration decisions are estimated with Bayesian  and machine-learning pattern-matching algorithms . The third approach measures decision rules directly through self-explicated questions .
Despite the diversity of research on consideration sets, few studies have focused on understanding the underlying process of generating customer consideration sets. The connection between the formation of consideration sets and the driving factors is not well understood. In particular, we know little about how the inherent market structure, including both the interdependence among existing products and the association among customers, affects consideration decisions. To address this research gap, we develop a network-based approach to model customers’ consideration behaviors by modeling product coconsideration relations. As shown in Figure 1, the key idea of the proposed network approach is to transform customer consideration sets into a product association network, in which nodes represent products and links represent the coconsideration between two products. As a result, the problem of understanding customer consideration can be addressed by predicting certain network structures as a function of association networks formed by product attributes and customer demographics. It is worth noting that, because link formation is an aggregation of customers’ decisions, the links (i.e., the coconsideration relations) imply competition among products. Therefore, our approach enables us to study customer preference and market structure in an integrated manner. This is different from studies in choice modeling (e.g., the multinomial logit choice model) that focus on establishing models for individuals. It is also worth noting that our study is different from agent-based models, which hypothesize certain individual choice-making rules. Instead, our approach is data-driven: it uses the observed data to estimate coconsideration models and then performs prediction analysis using the estimated model parameters.
Recently, network approaches have been also extensively used in recommender systems [34–38]. Recommender systems are frequently used to recommend products to customers based on what they searched (considered). From the network representation point of view, our approach is similar to the bipartite projection approach  used in the recommender systems research. However, the proposed network approach is distinct from network-based recommender algorithms [37, 38] in two aspects: first, the end goal is different. The recommender algorithms attempt to predict future likes and interests by mining data on past user activities. Common methods include the similarity-based methods (e.g., the collaborative filtering , content-based analysis , and Dirichlet allocation ) and the recently developed hybrid methods [36, 42]. The approach proposed in this paper relies on the network-based statistical inference model, which emphasizes deduction and explanation. It aims to provide an explanatory framework for customers’ consideration behaviors, so that a feedback loop can be created from customer preference to engineering design. Therefore, the end goal of this study is to inform product design for larger market share. In such a context, prediction in this study is for comparison and validation purposes. Second, the role of network in the modeling is different. In existing network-based recommender algorithms, the input takes various graph-based node-specific attributes (e.g., degree), which are essentially the exogenous factors, to generate the similarity metrics. In our approach, the model input can take into account present network structures (e.g., triangles and loops), which represents the interdependencies among products, so that the effect of the inherent competition relations can be assessed. 
Such a capability supports a better understanding of consideration behaviors and could provide additional insights for design research, which has been primarily driven by users’ preferences for engineering attributes.
The current work builds upon our previous research efforts. In a recent study, Fu et al. developed a two-stage bipartite network modeling approach to study customer preferences in making choices by decoupling the choice-making process into two stages, the consideration stage and the choice-making stage. Wang et al. utilized a dyadic network analysis approach to predict product coconsideration relations based on exogenous factors, such as product attributes and customer demographics. By mapping specific technological advancements (e.g., turbocharged techniques) to changes in product attributes, the authors also demonstrated how the model facilitates forecasting the impact of technological changes on product coconsideration and market competition.
In this paper, we take a further step to investigate the power of complex network modeling in understanding product coconsideration relations by considering both exogenous factors and endogenous factors, for example, product interdependence and inherent market competition. The core technique is based on the Exponential Random Graph Model (ERGM) . While dyadic network models are convenient to predict the associations between products based on exogenous factors, ERGM incorporates endogenous factors as well as other network interdependencies .
The research objective of this study is therefore twofold: (a) to establish the network-modeling framework that supports the explanation of customer’s consideration behaviors and enables the prediction of future market competitions; (b) to compare the ERGM and dyadic network model to examine if the inclusion of product interdependence through the endogenous network effects would better capture the dynamics underlying the formation of product coconsideration relations. The remainder of the paper has five sections. Section 3 presents the research problem and introduces the method of constructing a product coconsideration network. We also briefly provide the technical background of the dyadic network model and ERGM. Section 4 describes the vehicle case study and the data source. We present the estimation results of the dyadic model and ERGM and illustrate how to use the attribute-related network structures to represent product interdependence, that is, the endogenous effects. To evaluate the performance of each model, Section 5 assesses model fit at both the global network level and the local link level. Section 6 evaluates the performance of each model in predicting future coconsideration relations. Finally, Section 7 presents practical implications of the findings and directions for future research.
3. Network Construction and Introduction to Network Models
3.1. Network Construction
The product coconsideration network is constructed using data from customers’ consideration sets. The presence of a link (i.e., coconsideration) between two nodes (i.e., products) is determined by an association metric called lift. Equation (1) defines the lift value between products $i$ and $j$. Similar to pointwise mutual information, lift measures the likelihood of the coconsideration of two products given their individual frequencies of consideration:
$$\mathrm{lift}(i, j) = \frac{\Pr(i, j)}{\Pr(i)\,\Pr(j)}, \tag{1}$$
where $\Pr(i, j)$ is the probability of a pair of products $i$ and $j$ being coconsidered by customers among all possibilities, calculated based on the collected consideration data, and $\Pr(i)$ is the probability of individual product $i$ being considered. The lift value indicates how likely two products are to be coconsidered by all customers at the aggregate level, normalized by the product popularity in the entire market. We use this probability of coconsideration, which is different from the market share that is directly determined by total purchases, to capture the competition between products. With the lift value, an undirected coconsideration network can be constructed using the following binary rule:
$$Y_{ij} = \begin{cases} 1, & \text{if } \mathrm{lift}(i, j) \geq \text{cutoff}, \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$
where cutoff is the threshold that determines the presence of a link between two nodes $i$ and $j$. Statistically, a lift value of 1 indicates that two products are completely independent; a lift value greater than 1 indicates that the two products are coconsidered more often than expected by chance. Based on the application context, research interest, and model requirements, different lift values greater than 1 can be used as the cutoff value. Equations (1) and (2) imply that the network adjacency matrix is symmetric and binary. In this paper, the research is focused on predicting whether two products would be coconsidered or not. How often they are coconsidered (reflecting the competition intensity) is not the research focus of this paper. This is why we use a binary network instead of a weighted network.
Modeling a binary network, while computationally simpler, is not as rich as the valued network. Hence, we tested the robustness of our findings by estimating multiple models based on varying the cutoff values of lift.
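The construction in Section 3.1 can be illustrated with a minimal Python sketch. It assumes that the probabilities in (1) are estimated as fractions of respondents and that a link is kept when the lift is at or above the cutoff; the function name `lift_network` and the toy consideration sets are hypothetical and are not taken from the survey data.

```python
from collections import Counter
from itertools import combinations


def lift_network(consideration_sets, cutoff=1.0):
    """Return ({(i, j): lift}, set of links) from a list of consideration sets."""
    n = len(consideration_sets)
    single = Counter()  # number of customers who considered each product
    pair = Counter()    # number of customers who considered both products
    for s in consideration_sets:
        items = sorted(set(s))
        single.update(items)
        pair.update(combinations(items, 2))

    lift = {}
    for (i, j), n_ij in pair.items():
        # Assumption: probabilities are estimated as fractions of respondents.
        lift[(i, j)] = (n_ij / n) / ((single[i] / n) * (single[j] / n))

    # Binary rule: keep a link when the lift reaches the cutoff (assumed >=).
    links = {edge for edge, value in lift.items() if value >= cutoff}
    return lift, links


# Hypothetical toy data: four customers, three car models.
sets = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"C"}]
lift, links = lift_network(sets, cutoff=1.0)
```

In this toy example, A and B are coconsidered more often than their individual popularity would suggest, so only that pair is linked; pairs that are never coconsidered have a lift of zero and are simply absent from the dictionary.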
3.2. Research Question in the Network Context
Once a coconsideration network is constructed, the likelihood of customers considering two products together can be formulated as the probability of a coconsideration link. For prediction purposes, this leads to the question of what factors (e.g., product attributes and customer demographics) drive the formation of a link between a pair of nodes, and how significant a role each factor plays in the link formation process. The research question is thus recast as how to build a network model that predicts whether a coconsideration link exists given the network structures, product attributes, and customer profiles.
We posit that there are two decision-making scenarios underlying the coconsideration relations. The first scenario (Figure 2(a)) assumes that each pair of products is independently evaluated by customers. Even when a consideration set contains multiple alternatives, it treats the comparison of any two of these alternatives as independent of other pairwise comparisons. The second scenario (Figure 2(b)) takes a more general interdependence assumption, where the formation of one coconsideration link is not independent of other coconsideration links. For example, in Figure 2(b), the likelihood of a coconsideration link between products A and B may be affected by the fact that they are both coconsidered with product C. Of the two network models considered here, the dyadic network model takes the simple independence assumption, while the ERGM assumes that all coconsideration relations sharing one node are interdependent. In this paper, we examine whether the ERGM provides a more accurate understanding of the factors driving product coconsideration by evaluating the goodness of fit and the predictive power of the two models.
3.3. Introduction to Network Models
The dyadic network model is analogous to a standard logistic regression applied element-wise to network matrices, where the model is given by
$$\operatorname{logit}\bigl(\Pr(Y_{ij} = 1)\bigr) = \log\frac{\Pr(Y_{ij} = 1)}{1 - \Pr(Y_{ij} = 1)} = \beta_0 + \boldsymbol{\beta}^{T}\mathbf{x}_{ij}. \tag{3}$$
The response $Y_{ij}$ is the binary link between nodes $i$ and $j$ defined in (2). The node attributes are converted to a vector of dyadic variables, $\mathbf{x}_{ij}$. Each dyadic variable measures the similarity or difference between pairs of nodes based on the attributes of the nodes and a specific arithmetic function (see Table 1 for various dyadic variables). The dyadic network model uses the dyadic variables to predict the complex structures of the observed network composed of coconsideration links. The coefficients $\boldsymbol{\beta}$ indicate the importance of each dyadic variable in forming a coconsideration relation. Note that, in this model, the probability of each link is evaluated independently.
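As a rough illustration of the independence assumption, the following sketch evaluates a logistic link probability for one pair of products at a time. The coefficient values and the choice of two dyadic variables are hypothetical and are used only to show the mechanics; they are not the estimates reported in the paper.

```python
import math


def link_probability(x, beta0, beta):
    """Logistic probability of a coconsideration link given dyadic variables x."""
    eta = beta0 + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: intercept, price difference, same-segment match.
beta0, beta = -3.0, [-0.8, 0.66]

# Each pair is scored on its own; no other link enters the calculation.
p_similar = link_probability([0.0, 1.0], beta0, beta)     # same price, same segment
p_dissimilar = link_probability([1.0, 0.0], beta0, beta)  # price doubled, other segment
```

Because each probability depends only on the pair's own dyadic variables, the model cannot express effects such as two products being linked because they share a common competitor.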
3.3.1. Exponential Random Graph Model
Beyond the dyadic attribute effects, many links connected to the same node in a network have endogenous relations. That is, the emergence of a link is often related to other links. The ERGM, introduced in [49, 50], is well known for its capability to model the interdependence among links in social networks. For example, two people who have a common friend are more likely to be friends with each other too, and therefore the three-person friendship relations form a triangle structure. Specific network configurations, including edges, stars, triangles, and cycles, can be used to represent different types of interdependence. The ERGM interprets the global network structure as a collective self-organized emergence of various local network configurations. The logic underlying the ERGM is that it considers an observed network, $\mathbf{y}$, as one specific realization from a set of possible random networks, $\mathbf{Y}$, following the distribution
$$\Pr(\mathbf{Y} = \mathbf{y}) = \frac{\exp\bigl(\boldsymbol{\theta}^{T}\mathbf{g}(\mathbf{y})\bigr)}{\kappa(\boldsymbol{\theta})}, \tag{4}$$
where $\boldsymbol{\theta}$ is a vector of model parameters, $\mathbf{g}(\mathbf{y})$ is a vector of the network statistics and attributes, and $\kappa(\boldsymbol{\theta})$ is a normalizing quantity that ensures (4) is a proper probability distribution. Equation (4) states that the probability of observing any particular network is proportional to the exponential of a weighted combination of network characteristics: a network configuration is more likely to occur if the corresponding element of $\boldsymbol{\theta}$ is positive. Note that, in the ERGM, the network itself is a random variable and the probability is evaluated on the entire network instead of on a single link as in (3) for dyadic models. In brief, the advantages of using the ERGM in the context of product coconsideration are threefold: it uses network configurations to characterize the endogenous effects among coconsideration links, it provides various dyadic variables to model different types of exogenous impacts of the product attributes, and it integrates both exogenous attribute effects and endogenous network effects in a unified framework.
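To make the role of the normalizing quantity concrete, the sketch below enumerates every undirected graph on a handful of nodes and computes exact ERGM probabilities from edge and triangle counts. This brute-force enumeration is only feasible for tiny networks and is not how ERGMs are estimated in practice; the parameter values and function names are hypothetical.

```python
import math
from itertools import combinations, product


def stats(edges, n):
    """Return (number of edges, number of triangles) of an undirected graph."""
    e = set(edges)
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in e and (a, c) in e and (b, c) in e
    )
    return len(e), triangles


def ergm_distribution(n, theta):
    """Exact ERGM probabilities over all graphs on n nodes (tiny n only)."""
    dyads = list(combinations(range(n), 2))
    weights = {}
    for mask in product([0, 1], repeat=len(dyads)):
        edges = tuple(d for d, on in zip(dyads, mask) if on)
        g = stats(edges, n)
        # Unnormalized weight: exponential of the weighted statistics.
        weights[edges] = math.exp(sum(t * s for t, s in zip(theta, g)))
    kappa = sum(weights.values())  # normalizing quantity
    return {g: w / kappa for g, w in weights.items()}


# Hypothetical parameters: a negative edge effect and a positive triangle effect.
dist = ergm_distribution(4, theta=(-1.0, 0.5))
```

With four nodes there are 64 possible graphs; the number of graphs grows as 2 to the power of the number of dyads, which is why the normalizing quantity cannot be computed exactly for a network of several hundred products.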
3.3.2. Exogenous Dyadic Variables and Endogenous Network Effects
The exogenous dyadic variables used both in the dyadic model and in ERGM allow the modeling of two types of effects between a pair of nodes with specific variables: the baseline effects of the attributes and the homophily effects, that is, the similarity or difference between the attributes of two nodes [44, 51]. In the context of the product coconsideration network, the baseline effects examine whether products with a specific attribute are more likely to be coconsidered than products without that attribute; for example, imported car models could be more likely to be coconsidered as compared to domestic car models. The homophily effects examine whether two products with similar attributes tend to have a coconsideration link. For example, customers are more likely to consider and compare products with similar prices. The development of dyadic variables supports the study of inherent product competition beyond the understanding of customer preferences.
Table 1 summarizes the guidelines for creating dyadic variables for different types of attributes, such as binary, categorical, and continuous. For the product attributes under (a)–(c), the strength of a link is determined by the corresponding attributes $x_i$ and $x_j$ associated with the linked products. Beyond product attributes, we also introduce non-product-related attributes (d). For example, customer demographics can be included in the model to allow the prediction of the impact of customers’ associations/similarities on product coconsideration relations. To create dyadic variables related to customers’ attributes, multivariable association techniques, for example, joint correspondence analysis (JCA), have been used to compute the similarity of the customer-related attributes as the distance between two product points ($i$ and $j$) in a metric space. In this paper, we follow the method presented in  to develop two categories of distance variables, the distance of customer perceived characteristics and the demographic distance. The customer perceived characteristics are user-proposed tags that indicate their perceptions of the products, such as youthful, sophisticated, and business-oriented. Customer demographics include income and family information of the user groups of each of the car models. The inclusion of customer associations through these distance-based dyadic variables is a unique feature of our network-modeling approach.
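The sum, difference, and matching variables described above can be sketched as follows. Table 1 is not reproduced here, so the exact functional forms (in particular the use of an absolute difference) are assumptions, and the attribute names and car records are hypothetical.

```python
import math


def dyadic_variables(a, b):
    """Build sum, difference, and matching variables for one pair of car models."""
    log_price_a, log_price_b = math.log2(a["price"]), math.log2(b["price"])
    return {
        # Baseline effect of a binary attribute: 0, 1, or 2 imported cars.
        "import_sum": a["import"] + b["import"],
        # Baseline effect of a continuous attribute (log2-transformed price).
        "price_sum": log_price_a + log_price_b,
        # Homophily effect of a continuous attribute (assumed absolute difference).
        "price_diff": abs(log_price_a - log_price_b),
        # Homophily effect of a categorical attribute: 1 if categories match.
        "segment_match": int(a["segment"] == b["segment"]),
    }


# Hypothetical car models with prices in RMB.
car_a = {"import": 1, "price": 200_000, "segment": "C-Class Sedan"}
car_b = {"import": 0, "price": 100_000, "segment": "C-Class Sedan"}
x = dyadic_variables(car_a, car_b)
```

With a base-2 log transformation, a price difference of one unit corresponds to one car costing twice as much as the other, which makes the coefficient of the difference variable easy to interpret.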
Different from the dyadic models that can only consider exogenous dyadic effects, the ERGM supports the modeling of product interdependence with endogenous network effects. In this paper, we are particularly interested in two network configurations, the star-type interdependence and triangle-type interdependence . The star structures (Figure 3(a)) indicate that the probability of one focal product being coconsidered with others is conditional on the number of existing coconsideration relations of that focal product (e.g., the node on the top in the figure has three coconsideration links). A positive star effect suggests that a product is more likely to be coconsidered with another product if it is popular and already being coconsidered with many others. The triangle structures (Figure 3(b)) indicate that if two products are coconsidered with the same set of other products, they are more likely to be mutually coconsidered. Positive star effects could include stars with varying number of links (such as 2, 3, 4, 5, and perhaps many more). Likewise, a link could have many triangles by linking with varying number of nodes (1, 2, 3, 4, 5, and perhaps many more). Both star and triangular effects imply multiway product competition. To combine the effects of stars with multiple links and multiple triangles, we use two network configurations, the geometrically weighted degrees and the geometrically weighted edgewise shared partner, respectively .
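For readers unfamiliar with the geometrically weighted statistics, the sketch below computes both from an edge list, using the commonly cited form in which a node of degree k (or a link with k shared partners) contributes e^a (1 - (1 - e^-a)^k) for a decay parameter a. The decay value of 0.5 is an arbitrary assumption, since the value used in the study is not stated in this section, and the function name is hypothetical.

```python
import math


def gw_statistics(edges, nodes, decay=0.5):
    """Return (geometrically weighted degree, geometrically weighted ESP)."""
    nbrs = {v: set() for v in nodes}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)

    def weight(k):
        # Contribution of a count k; equals 0 for k = 0 and 1 for k = 1,
        # with diminishing increments for each additional link or partner.
        return math.exp(decay) * (1.0 - (1.0 - math.exp(-decay)) ** k)

    # Star effect: sum the weight of every node's degree.
    gwd = sum(weight(len(nbrs[v])) for v in nodes)
    # Triangle effect: sum the weight of every link's shared-partner count.
    gwesp = sum(weight(len(nbrs[i] & nbrs[j])) for i, j in edges)
    return gwd, gwesp


# A triangle: every node has degree 2 and every link has one shared partner.
triangle = [(0, 1), (0, 2), (1, 2)]
gwd, gwesp = gw_statistics(triangle, nodes=[0, 1, 2], decay=0.5)
```

The diminishing weights are what allow a single statistic to summarize stars or triangles of every size without letting highly connected products dominate the model.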
4. Case Study: Modeling Vehicle Coconsideration Network
4.1. Application Context and Data Source
When considering and purchasing a vehicle, customers make decisions on car models (e.g., Ford Fusion versus Honda Accord), in part, based on their preferences for vehicle attributes (e.g., price, power, and make) and their demographics (e.g., income and age). To understand the effects of these factors on vehicles’ coconsideration relations, we use data from a buyer survey in the 2013 China automarket. The dataset consists of about 50,000 new car buyers’ responses to approximately 400 unique vehicle models. The survey covered a variety of questions, including respondent demographics, vehicle attributes, and customers’ perceived vehicle characteristics. The respondents reported the car they purchased as well as the primary and secondary alternatives they considered before making the final purchase. These responses are used to construct the vehicle coconsideration network. The vehicle attributes reported in the survey are verified by vehicle catalog databases.
4.2. Vehicle Coconsideration Network
Following the method discussed in Section 3.1, we construct a vehicle coconsideration network with cutoff = 5 which results in a network of 389 nodes and 2,431 binary links. A smaller cutoff generates a denser network but has similar analytical results. We have tested our models using cutoff at 1, 3, 5, and 7, respectively, and no significant changes in the trends of the model results are observed. Figure 4 shows an example of a partial vehicle coconsideration network with 11 car models. The node size is proportional to the degree, and colors indicate the clusters in which the vehicles are more likely to be coconsidered with each other. The number on each link is the lift value indicating the strength of the coconsideration.
Table 2 summarizes some descriptive network characteristics. For example, the average degree shows that each vehicle has, on average, 12.5 coconsidered vehicles, which indicates the overall intensity of competition in the market. The clustering coefficient (CC), on the other hand, measures the cohesion or segmentation of the vehicle market. The average local CC of 0.26 indicates strong cohesion in the network: vehicle models are frequently involved in multiway competition in the market. The descriptive network analysis facilitates the understanding of the automarket and provides guidelines on the selection of network configurations in the ERGM.
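The reported average degree follows directly from the network size in Section 4.2, and the average local CC can be computed from an edge list. The sketch below assumes the common convention that nodes with fewer than two neighbors contribute a local CC of zero; the small example graph is hypothetical.

```python
from itertools import combinations


def average_degree(n_nodes, n_links):
    """Each undirected link contributes to the degree of two nodes."""
    return 2 * n_links / n_nodes


def average_local_clustering(edges, nodes):
    """Mean over nodes of the share of neighbor pairs that are themselves linked."""
    nbrs = {v: set() for v in nodes}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    total = 0.0
    for v in nodes:
        k = len(nbrs[v])
        if k < 2:
            continue  # assumption: such nodes contribute a local CC of 0
        closed = sum(1 for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
        total += closed / (k * (k - 1) / 2)
    return total / len(nodes)


# Network size reported in Section 4.2: 389 nodes and 2,431 links.
avg_deg = average_degree(389, 2431)

# Hypothetical toy graph: a triangle (0, 1, 2) with one pendant node (3).
toy_cc = average_local_clustering([(0, 1), (0, 2), (1, 2), (2, 3)], [0, 1, 2, 3])
```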
4.3. Descriptive Statistics of the Independent Dyadic Variables
Many exogenous dyadic variables related to vehicle attributes, such as the difference and sum variables of car prices, engine power, fuel consumption, and matching variables of vehicle’s market segments, and make origin, could change the patterns of coconsideration among the vehicle models. We use information gain analysis to select 12 most important dyadic variables among all 22 possible dyadic variables. The log transformation (base 2) is applied to the price and engine power variables to offset the effect of large outliers. Table 3 shows the descriptive statistics of the independent variables.
In total, six vehicle attributes (import, price, engine power, fuel consumption, market segment, and make origin) are considered in the model. Import is a binary variable describing whether a car is imported (import = 1, 37.3%) or domestically produced (import = 0, 62.7%). As suggested in Table 1 and Section 3.3.2, we construct a sum dyadic variable of import to account for its baseline effect; it records whether the paired cars are both imported (value 2 for 13.90% of the pairs), one imported and one domestic (value 1 for 46.76%), or both domestic (value 0 for 39.34%). If the baseline effect of the import attribute is positive, the coefficient of the sum variable of import should be positive as well; that is, the higher the sum value of the two car models, the more likely they are to be coconsidered. Similarly, the sum variables of price (in RMB, transformed using $\log_2$) and power (in brake horsepower, BHP, transformed using $\log_2$) describe the baseline effects of price and power on product coconsideration relations. We construct a fuel consumption variable by dividing the liters of gasoline each vehicle consumes per 100 kilometers by the vehicle power (in 100 BHP). As such, the smaller this value is, the more fuel-efficient the car model is. The difference variables of price, power, and fuel consumption capture the homophily effects, which are used to test whether car models with similar attributes (smaller differences) are more likely to be coconsidered.
The autoindustry is very competitive, so most car models have clearly targeted customers and compete in a specific market segment. Since a vehicle’s market segment is a categorical variable, we use a dyadic matching variable in the model to investigate whether belonging to the same segment affects the coconsideration of two cars. The three largest of the 17 segments in our sample are the C-Class Sedan (21.6% of car models), B-Class Sedan (11.3%), and Small Utility (11.1%). Similarly, make origin is a categorical variable that describes the region where the car brand originates. Our dataset shows that 90, 31, 11, and 13 car models are made in Europe, Japan, South Korea, and the United States, respectively; 98 car models are produced in China under local brands, and the other models belong to local-foreign joint venture brands that come from Europe, Japan, South Korea, and the United States. The matching variables of market segments and make origins are used to account for people’s homophily behavior of comparing cars with the same market segment and make origin.
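As a quick plausibility check on the import sum variable, the shares of its three values can be approximated from the import share alone. The sketch assumes 145 imported models out of 389 (a count chosen here to match the reported 37.3% share, not a figure stated in this paragraph) and treats the two cars in a pair as independent draws.

```python
# Assumption: 145 of the 389 car models are imported (145 / 389 is about 37.3%).
n_models, n_imported = 389, 145
p = n_imported / n_models

# Share of pairs by value of the import sum variable.
shares = {
    2: p * p,            # both cars imported
    1: 2 * p * (1 - p),  # one imported, one domestic
    0: (1 - p) ** 2,     # both cars domestic
}
percentages = {value: 100 * share for value, share in shares.items()}
```

Under these assumptions the three shares come out within a few hundredths of a percentage point of the values reported in the text, which suggests the dyadic variable was tabulated over all ordered combinations of car models.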
4.4. Model Implementation Using ERGM
Table 4 shows the estimated coefficients and corresponding odds ratios from fitting the dyadic model and the ERGM. In addition to the variables described above, the ERGM includes three variables associated with network configurations. The edge variable controls the number of links to ensure that the estimated networks have the same density as the observed one. Conceptually, if we have no knowledge about the cars’ attributes or their coconsideration relations, the edge term estimates the likelihood that two cars will be coconsidered at random, like an intercept term in a regression or a “base rate”. The star effect and triangle effect discussed in Section 3.3.2 are measured by the geometrically weighted degree and the geometrically weighted edgewise shared partner, respectively. According to the results of the ERGM, most vehicle attributes, except the price baseline effect and the power difference, are statistically significant ($p$ value < 0.001) and therefore play important roles in vehicle coconsideration. For instance, two vehicles with smaller differences in price and fuel consumption are more likely to be coconsidered. If the price of one car model is twice the price of another, their odds of coconsideration are only 45% of the odds for two cars with the same price. Similarly, a difference of one liter per 100 km per 100 BHP in fuel consumption reduces the odds of coconsideration to 93% of the odds for two cars with the same fuel consumption. For the matching of vehicle attributes, two vehicles in the same market segment have 1.94 times the odds of being coconsidered compared with vehicles in different segments, and two vehicles with the same make origin have 1.69 times the odds compared with vehicles of different origins. Finally, the negative coefficient for the distance of customers’ demographics shows that vehicles whose customers differ more in demographics are less likely to be coconsidered.
In summary, the results show that customers are more likely to consider cars with similar perceived features, such as price, fuel consumption, market segment, and make origin.
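The odds ratios quoted above are exponentiated coefficients, so they can be extended to other differences and combined multiplicatively. The sketch below back-computes the price coefficient from the reported 45% figure rather than reading it from Table 4, and the scenarios evaluated are hypothetical.

```python
import math


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds of a link for a change delta in a variable."""
    return math.exp(coefficient * delta)


# Coefficient implied by the reported odds ratio of 0.45 per unit of log2 price difference.
beta_price_diff = math.log(0.45)

# One unit on the log2 scale means one car costs twice as much as the other;
# two units mean a fourfold price gap.
or_double = odds_ratio(beta_price_diff, delta=1.0)
or_fourfold = odds_ratio(beta_price_diff, delta=2.0)

# Matching on both market segment and make origin multiplies the two odds ratios.
or_segment_and_origin = 1.94 * 1.69
```

These figures are conditional on the rest of the network being held fixed, which is the usual reading of ERGM coefficients, so they describe changes in odds rather than changes in probability.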
As shown in Table 4, the coefficient of the triangle effect is 0.70 ($p$ value < 0.001). The positive sign indicates that two vehicles coconsidered with the same set of vehicles are more likely to be coconsidered with each other. It implies that a form of multiway grouping and comparison exists in customers’ consideration decisions. That is, product alternatives in a person’s consideration set are considered at the same time. On the other hand, the positive coefficient of the star effect (inversely measured by the geometrically weighted degree) indicates that most of the cars tend to have a similar number of coconsideration links and that there is no small group of cars that are much more likely to be coconsidered than others. With these endogenous network effects, the ERGM significantly improves the model fit compared to the dyadic model, as indicated by the reduction of BIC from 16,005 to 14,021. In the next section, we perform a systematic comparative analysis to evaluate how well the simulated networks match the observed vehicle coconsideration network.
5. Model Comparison on Goodness of Fit
A goodness of fit (GOF) analysis is performed to compare the model fit of the dyadic and ERGM models. Using the dyadic and ERGM models in (3) and (4), respectively, and based on the estimated parameters in Table 4, we compute the predicted probabilities of coconsideration between all pairs of vehicle models. The links with predicted probabilities higher than a threshold (e.g., 0.5) are considered as links that exist. Once the simulated networks are obtained from both models, we compare them against the observed 2013 coconsideration network at both the network level and the link level. The network-level evaluation uses the spectral goodness of fit (SGOF) metric, while the link-level evaluation uses various accuracy measurements, such as precision, recall, and F scores (see Section 5.2 for more details).
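The thresholding step described above can be sketched as follows. This is a minimal illustration, assuming that each pair's probability comes from a logistic function of a linear predictor, as in a dyadic (logistic) model. For the ERGM, the probability of a link is conditional on the rest of the network, so only the thresholding logic carries over. The function names and the toy predictor values are hypothetical.

```python
import math

def logistic(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_links(linear_predictors, threshold=0.5):
    """Return the set of pairs whose predicted probability exceeds the threshold.

    linear_predictors: dict mapping a pair of product labels to its log-odds.
    """
    links = set()
    for pair, eta in linear_predictors.items():
        if logistic(eta) > threshold:   # strictly higher than the threshold
            links.add(pair)
    return links

# Hypothetical log-odds for three vehicle pairs (illustrative values only).
toy_predictors = {
    ("A", "B"): 1.2,    # probability about 0.77, predicted as a link
    ("A", "C"): -0.4,   # probability about 0.40, no link
    ("B", "C"): 0.0,    # probability exactly 0.5, not above the threshold
}
toy_links = predict_links(toy_predictors)
print(toy_links)
```

A threshold of 0.5 on the probability corresponds to a log-odds of zero, which is why the sign of the linear predictor alone decides the outcome in this sketch.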
5.1. Network-Level Comparison
Spectral goodness of fit (SGOF) is computed as follows:
$$\mathrm{SGOF} = 1 - \frac{\overline{\mathrm{ESD}}_{\mathrm{obs,fitted}}}{\overline{\mathrm{ESD}}_{\mathrm{obs,null}}},$$
where $\overline{\mathrm{ESD}}_{\mathrm{obs,fitted}}$ is the mean Euclidean spectral distance for the fitted model while $\overline{\mathrm{ESD}}_{\mathrm{obs,null}}$ is the mean Euclidean spectral distance for the null model, that is, the Erdős–Rényi (ER) random network in which each link has a fixed probability of being present or absent. Hence, SGOF measures the amount of the observed structures explained by a fitted model, expressed as a percent improvement over a null model. The Euclidean spectral distance computes the $\ell_2$ norm (also called Euclidean norm) of the error between the observed network and all simulated networks, that is, $\mathrm{ESD} = \lVert \varepsilon \rVert_2 = \sqrt{\sum_i \varepsilon_i^2}$, where the error $\varepsilon$ is the absolute difference between the spectra of the observed network ($\lambda^{\mathrm{obs}}$) and that of the simulated network ($\lambda^{\mathrm{sim}}$), that is, $\varepsilon = \lvert \lambda^{\mathrm{obs}} - \lambda^{\mathrm{sim}} \rvert$. Since the calculation of the spectra requires eigenvalues of the entire network’s adjacency matrix, this evaluation is performed at the network level. When the fitted model exactly describes the data, SGOF reaches its maximum value 1. SGOF of zero means no improvement over the null model. The SGOF metric provides an overall comparison of different models. It is especially useful when a modeler is not clear about which network structural statistics are important in explaining the observed network. For example, in our coconsideration network, it is hard to tell which network metrics, such as the average path length or the average CC, are more important to the understanding of market structure. Under this circumstance, the SGOF provides a simple yet comprehensive evaluation. Table 5 lists the SGOF scores of both the dyadic model and the ERGM. Based on 1,000 predicted networks from each model, the results of the mean, 5th, and 95th percentile of SGOF show that the ERGM significantly outperforms the dyadic model.
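The SGOF definition above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions rather than the implementation used in the study: networks are undirected and given as symmetric 0/1 adjacency matrices, spectra are the sorted eigenvalues of the adjacency matrix, and the helper names (`spectrum`, `mean_esd`, `sgof`, `er_networks`) are hypothetical.

```python
import numpy as np

def spectrum(adj):
    """Eigenvalues of a symmetric adjacency matrix, sorted in descending order."""
    return np.sort(np.linalg.eigvalsh(np.asarray(adj, dtype=float)))[::-1]

def mean_esd(observed, simulated):
    """Mean Euclidean spectral distance between one observed network
    and a list of simulated networks of the same size."""
    lam_obs = spectrum(observed)
    distances = [np.linalg.norm(np.abs(lam_obs - spectrum(sim))) for sim in simulated]
    return float(np.mean(distances))

def sgof(observed, fitted_sims, null_sims):
    """SGOF = 1 - mean ESD(fitted) / mean ESD(null)."""
    return 1.0 - mean_esd(observed, fitted_sims) / mean_esd(observed, null_sims)

def er_networks(n, density, count, rng):
    """Draw `count` Erdos-Renyi networks with n nodes and the given link probability."""
    nets = []
    for _ in range(count):
        upper = np.triu(rng.random((n, n)) < density, k=1)  # upper triangle only
        nets.append((upper | upper.T).astype(int))          # symmetrize, no self-loops
    return nets

# Toy check: a triangle plus one isolated node, compared against an empty network.
observed = np.array([[0, 1, 1, 0],
                     [1, 0, 1, 0],
                     [1, 1, 0, 0],
                     [0, 0, 0, 0]])
empty = np.zeros((4, 4), dtype=int)

perfect_fit = sgof(observed, [observed], [empty])   # fitted model reproduces the data
no_improvement = sgof(observed, [empty], [empty])   # fitted model equals the null
null_sample = er_networks(4, 0.5, 10, np.random.default_rng(0))
```

In the toy check, a fitted model that reproduces the observed network yields an SGOF of 1, and a fitted model that is indistinguishable from the null yields 0, matching the two reference values described in the text.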
5.2. Link-Level Comparison
In addition to the network-level comparison, the predicted networks are also evaluated at the link level. We define a pair of vehicles with a coconsideration relation as positive, whereas the ones without links as negative. Therefore, the true positive (TP) is the number of links predicted as positive and also positive in the observed network; the false positive (FP) is the number of links predicted as positive but actually negative, that is, wrong predictions of positives. Similarly, the true negative (TN) is the number of links predicted as negative and observed as negative; the false negative (FN) is the number of links predicted as negative but observed as positive. Taking 0.5 as the threshold of predicted probability (as it is used in the logistic function), we calculate the following three metrics to evaluate the performance of prediction for both the dyadic model and the ERGM. Precision is the fraction of true positive predictions among all positive predictions; recall is the fraction of true positive predictions over all positive observations; F score is the harmonic mean of precision and recall (see Table 6 for the formulas). These metrics are adopted because each of them reflects the capability of the model from a different perspective. It could be the case that one model predicts many links (e.g., all links are predicted in extreme cases and FP is high) so that the precision is low and the recall is high, while another model predicts very few links, which leads to high FN and therefore high precision and low recall. Therefore, using either precision or recall alone only partially reveals the model performance. Hence, the F score is often recommended as a fair measure because it considers both precision and recall and provides an average score. In this study, we use all three metrics together to provide a complete picture of the model performance.
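The three link-level metrics defined above can be computed directly from the confusion counts. The sketch below is a minimal illustration; the function name and the two toy scenarios (a model that predicts every pair and a model that predicts only five pairs, in a hypothetical network with 100 observed links among 1,000 pairs) are assumptions made for the example and are not results from Table 6.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall, and F score (harmonic mean) from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical network: 1,000 vehicle pairs, 100 of which are truly coconsidered.
# "Liberal" model predicts every pair as a link: no misses, many false alarms.
liberal = precision_recall_f(tp=100, fp=900, fn=0)
# "Conservative" model predicts only 5 pairs, all correct: no false alarms, many misses.
conservative = precision_recall_f(tp=5, fp=0, fn=95)

for name, (p, r, f) in [("liberal", liberal), ("conservative", conservative)]:
    print(f"{name}: precision={p:.3f}, recall={r:.3f}, F={f:.3f}")
```

The liberal model reaches perfect recall with poor precision and the conservative model the reverse, yet both obtain a low F score, which is why the three metrics are read together.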
As shown in Table 6, almost all performance metrics suggest that the ERGM outperforms the dyadic model. In particular, the recall of the ERGM is significantly higher than that of the dyadic model. The dyadic model is only able to predict about 4.2% of the coconsideration links, whereas the recall of the ERGM reaches 31.1%. These results imply that the inclusion of product interdependence in the ERGM indeed improves the model fit and better explains the observed product coconsideration relations. The only metric for which the dyadic model has a better value is the precision. At the threshold of probability equal to 0.5, the dyadic model only predicts 170 links as positive in total, and 101 of them are correct. The small denominator in the precision formula, that is, TP + FP = 170, produces a larger precision (101/170 ≈ 0.59).
Since different thresholds of the predicted probability can affect the value of precision and recall, we evaluate the precision-recall curve by altering the threshold from 0 to 1 to get a more comprehensive understanding. The model that has a larger area under the curve (AUC) performs better. When evaluating binary classifiers in an imbalanced dataset (with many more cases of one value for a variable than the other), which is the case we face, Saito and Rehmsmeier have demonstrated that the precision-recall curve is more informative than other threshold curves, such as the receiver operating characteristic (ROC) curve. Figure 5 shows that, for any given recall value, the precision of the ERGM is strictly higher than that of the dyadic model and the ERGM outperforms the dyadic model in the full spectrum of the threshold of probability (we studied the ROC curve and drew the same conclusions).
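The threshold sweep behind a precision-recall curve can be sketched as follows. This is a simplified illustration: the function names and toy scores are hypothetical, and the area is approximated with trapezoids between successive points, which is known to be somewhat optimistic in precision-recall space compared with more careful interpolation schemes.

```python
def pr_curve(labels, scores):
    """Return (recall, precision, threshold) points, lowering the threshold
    one distinct score at a time. labels are 0/1, scores are probabilities."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    points, tp, fp, i = [], 0, 0, 0
    while i < len(order):
        t = scores[order[i]]
        # Admit every pair tied at the current score before recording a point.
        while i < len(order) and scores[order[i]] == t:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp), t))
    return points

def pr_auc(points):
    """Trapezoidal area under the precision-recall points.
    Assumption: the first precision value is extended back to recall 0."""
    area, prev_r, prev_p = 0.0, 0.0, points[0][1]
    for r, p, _ in points:
        area += (r - prev_r) * (p + prev_p) / 2
        prev_r, prev_p = r, p
    return area

# Toy example with four pairs, two of them truly linked.
scores = [0.9, 0.8, 0.3, 0.1]
good_model = pr_auc(pr_curve([1, 1, 0, 0], scores))   # true links ranked first
weak_model = pr_auc(pr_curve([0, 1, 0, 1], scores))   # true links ranked lower
print(good_model, weak_model)
```

A model that ranks the true links first traces a curve along the top of the plot and obtains the larger area, which is the comparison criterion used for the two models here.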
In summary, the comparisons at both the network level and the link level validate our hypothesis that the product interdependence, that is, the endogenous effect, plays a significant role in the formation of product coconsideration relations and hence the customers’ consideration decisions. In the next section, we examine the predictive power of the two models.
6. Model Comparison on Predictability
In this section, we take a further step to compare the two models in terms of predictability. We use the models developed with the 2013 dataset (i.e., the model coefficients shown in Table 4) to predict the vehicle coconsideration relations in the 2014 market. From an illustrative example in Figure 6, we can see that some car models (e.g., node 4) withdrew from the market in 2014, some new car models (e.g., node 6 and node 7) were introduced to the market, but most of the car models (e.g., nodes 1, 2, 3, and 5) remained in the 2014 market. In this paper, we focus on predicting the future coconsiderations among the overlapping car models in two consecutive years since the new models may introduce critical features not captured in the previous market, such as electric cars. In our study, 315 car models were available in both 2013 and 2014. Therefore, the task here is to predict whether each pair of cars among these 315 car models will be coconsidered in 2014 given their new vehicle attributes in 2014, the new customer demographics, existing market competition structures (captured by the model coefficients of the three network configurations, namely, the edge, star effect, and triangle effect discussed in Section 4.4), and the model coefficients estimated based on the 2013 data.
Most pairs of cars have the same dyadic status (i.e., coconsidered or not) in 2013 and 2014. For example, if two car models were not coconsidered in 2013, customers continued to not coconsider these two in 2014. This case is not of interest because predicting nonexistence is much easier due to the imbalanced nature of the network dataset and it does not provide new insights. Similarly, the persistent coconsideration in both 2013 and 2014 is also expected. Therefore, we focus on changes in two prediction scenarios: emergence and disappearance of coconsideration links from 2013 to 2014. As shown in Table 7, among 47,724 pairs of cars that were not coconsidered in 2013, 1,202 pairs were coconsidered in 2014. The event of changing from not being coconsidered to being coconsidered indicates the change of market competition potentially caused by the change of vehicle attributes such as prices. On the other hand, 1,731 pairs of cars were coconsidered in 2013 among the 315 car models, but 1,087 pairs were no longer coconsidered in 2014. We indicate the two cases in the last column of Table 7 where the predictions of the 2014 network using the 2013 model are the events of interest. The two “Yes” cases, predicting emerging coconsideration and disappearing coconsideration links, both represent the change of coconsideration status from 2013 to 2014 and are the positive outcomes of model predictions. Such predictions are more difficult (yet substantively more useful) to attain than the other two “No” cases of no change. By testing both the dyadic and ERGM models, we examine which model has better predictive capability, assuming that the driving factors and customer preferences of coconsideration characterized by the model coefficients in Table 4 are unchanged from 2013 to 2014.
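The pair counts quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative, and the two rates are simple ratios derived here rather than values reported in Table 7.

```python
from math import comb

n_models = 315                      # car models available in both years
total_pairs = comb(n_models, 2)     # unordered pairs among the 315 models

not_coconsidered_2013 = 47_724      # pairs without a link in 2013
coconsidered_2013 = 1_731           # pairs with a link in 2013
emerged_2014 = 1_202                # no link in 2013, link in 2014
disappeared_2014 = 1_087            # link in 2013, no link in 2014

# The two 2013 groups should partition all pairs among the 315 models.
assert not_coconsidered_2013 + coconsidered_2013 == total_pairs

# Share of pairs in each 2013 group that changed status in 2014.
emergence_rate = emerged_2014 / not_coconsidered_2013
disappearance_rate = disappeared_2014 / coconsidered_2013

print(f"total pairs: {total_pairs}")
print(f"emergence rate: {emergence_rate:.1%}")
print(f"disappearance rate: {disappearance_rate:.1%}")
```

The two groups add up to all 49,455 possible pairs. Only about 2.5% of the previously unlinked pairs gained a link, which illustrates how imbalanced the emergence scenario is, whereas roughly 63% of the 2013 links were no longer present in 2014.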
In both prediction scenarios, we input the new values of vehicle attributes and customer profile attributes from 2014 into the model. When using the ERGM, characteristics of network configurations calculated based on the 2013 data also served as inputs for prediction. Once the models predict the probability of each pair of car models, we evaluate the performance metrics separately in two scenarios: the precision and recall of predicting emerging coconsideration among the 47,724 pairs of not coconsidered car models, and the precision and recall of predicting the disappearance of coconsideration among 1,731 pairs of cars coconsidered in 2013. The precision and recall of predictions are calculated similarly to the ones used in Section 5.2. The precision score is the ratio of the number of correctly predicted links (such as correct predictions of emerging coconsideration or disappeared coconsideration) over the number of predictions a model makes. The recall score is the ratio of the number of correctly predicted links over the number of events of interest (true emerging coconsideration or disappeared coconsideration in 2014).
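The scenario-specific precision and recall described above can be sketched as follows. This is a minimal illustration under stated assumptions: each pair is represented by a hypothetical tuple (linked in 2013, linked in 2014, predicted 2014 probability), a pair is predicted as linked when its probability is higher than the threshold, and the function name and toy data are not from the study.

```python
def scenario_metrics(pairs, scenario, threshold=0.5):
    """Precision and recall for predicting a change of coconsideration status.

    pairs: iterable of (linked_2013, linked_2014, prob_2014) tuples.
    scenario: "emerging" (no link -> link) or "disappearing" (link -> no link).
    """
    tp = fp = fn = 0
    for linked_2013, linked_2014, prob in pairs:
        if scenario == "emerging":
            if linked_2013:
                continue                      # only pairs without a 2013 link
            predicted = prob > threshold      # model predicts a new link
            actual = linked_2014
        elif scenario == "disappearing":
            if not linked_2013:
                continue                      # only pairs with a 2013 link
            predicted = prob <= threshold     # model predicts the link is gone
            actual = not linked_2014
        else:
            raise ValueError(f"unknown scenario: {scenario}")
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy data: four pairs unlinked in 2013, four pairs linked in 2013.
toy_pairs = [
    (False, True, 0.8), (False, True, 0.2), (False, False, 0.6), (False, False, 0.1),
    (True, False, 0.3), (True, False, 0.7), (True, True, 0.9), (True, True, 0.4),
]
emerging = scenario_metrics(toy_pairs, "emerging")
disappearing = scenario_metrics(toy_pairs, "disappearing")
print(emerging, disappearing)
```

Restricting each scenario to the relevant 2013 subset keeps the easy "no change" pairs out of the denominator of recall, so the scores reflect only the events of interest.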
Table 8 shows the prediction precision and recall calculated with a predicted probability of 0.5 as the threshold in the two scenarios. To predict emerging coconsideration, the ERGM had much better performance than the dyadic model. Specifically, the dyadic model tends to be overtrained on vehicle attributes and only predicts a small set of the most likely links, that is, 9 of the 1,202 emerging coconsideration relations. On the other hand, the ERGM predicted 111 (more than ten times as many) emerging coconsideration links with the same precision. With the probability threshold of 0.5, the ERGM and the dyadic model showed similar differences in performance in predicting disappearing coconsideration links. Figure 7(b) shows that the ERGM outperforms the dyadic model at almost all points of the precision-recall curve. In fact, the PR curves (Figure 7) show that the ERGM outperforms the dyadic model over the entire range of thresholds in both prediction scenarios.
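Figure 7: Precision-recall curves of the dyadic model and the ERGM in the two prediction scenarios: (a) predicting emerging coconsideration; (b) predicting disappeared coconsideration.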
Therefore, we conclude that the ERGM has better predictability than the dyadic model. In addition to the GOF test, the prediction test described above further validates our hypothesis that taking interdependencies into account in network modeling better explains the coconsideration network. In this particular case study, both the GOF and prediction analyses indicate that vehicles’ coconsideration relations are influenced by their existing competitions in the market.
7. Closing Comments
In this paper, we propose a network-based approach to study customer preferences in consideration decisions. Specifically, we apply the lift association metric to convert customers’ considerations into a product coconsideration network in which nodes represent products and links represent coconsideration relations between products. With the created coconsideration networks, we adopt two network models, the dyadic model and the ERGM, to predict whether two products would have a coconsideration relation or not. Using vehicle design as a case study, we perform systematic studies to identify the significant factors influencing customers’ coconsideration decisions. These factors include vehicle attributes (price, power, fuel consumption, import, make origin, and market segment), the similarity of customer demographics, and existing competition structures (i.e., the interdependence among coconsideration choices captured by network configurations). Statistical regressions are performed to obtain the estimated parameters of both models, and comparative analyses are performed to evaluate the models’ goodness of fit and predictive power in the context of vehicle coconsideration networks. Our results show that the ERGM outperforms the dyadic model in both GOF tests and the prediction analyses. This paper makes two contributions relevant to engineering design: (a) a rigorous network-based analytical framework to study product coconsideration relations in support of engineering design decisions, and (b) a systematic evaluation framework for comparing different network-modeling techniques using GOF and prediction precision and recall.
This study provides three practical insights on coconsideration behavior in the China automarket. First, the customers are price-driven when considering potential car models. Both models suggest significant homophily effects of vehicle prices and customer demographics in forming coconsideration links, that is, car models with similar prices that target similar demographics, such as income and family size, are more likely to be considered in the same consideration set. However, the ERGM reveals much more influential drivers, such as the homophily effects of car segments and make origins. These findings confirm the internal clusters in the automarket. Second, the ERGM suggests that there are significantly fewer star structures but many more triangles in the coconsideration network. Beyond the impacts of the vehicle and customer attributes, the ERGM also illustrates that car models that received an equal amount of consideration are likely to get involved in multiway coconsideration. Third, the model comparisons based on the GOF and prediction analyses demonstrate that an ERGM approach, which captures the interdependence of coconsideration, helps improve the prediction of product coconsiderations.
Finally, having an analytical model in this application context could support future explorations, including what-if scenario analysis that aims to forecast market responses under different settings of existing product attributes, as demonstrated in . Since the ERGM has a better model fit and predictability, it will help make more accurate projections of future market trends and aid the prioritization of product features in satisfying customers’ needs, as well as support engineering design and product development. Future research should extend the network approach to a longitudinal weighted network-modeling framework, which predicts not only the existence of a link but also the strength of the coconsideration between car models in subsequent years. The weighted network models would help discover the nuances in different customers’ consideration sets and therefore provide more insights into product design and market forecasting.
An early version of part of this work was presented at the 2017 International Conference on Engineering Design.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper. Funding sources mentioned in Acknowledgments do not lead to any conflicts of interest regarding the publication of this manuscript.
Acknowledgments
The authors gratefully acknowledge the financial support from NSF CMMI-1436658 and Ford-Northwestern Alliance Project.
References
M. Wang, W. Chen, Y. Huang, N. S. Contractor, and Y. Fu, “A Multidimensional Network Approach for Modeling Customer-Product Relations in Engineering Design,” in Proceedings of the ASME 2015 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, American Society of Mechanical Engineers, Boston, Massachusetts, USA, 2015.
E. Byler, “Cultivating the growth of complex systems using emergent behaviours of engineering processes,” in Proceedings of the International Conference on Complex Systems: Control and Modeling, Russian Academy of Sciences, 2000.
N. Contractor, P. R. Monge, and P. Leonardi, “Multidimensional networks and the dynamics of sociomateriality: Bringing technology inside the network,” International Journal of Communication, vol. 5, pp. 682–720, 2011.
P. Cormier, E. Devendorf, and K. Lewis, “Optimal process architectures for distributed design using a social network model,” in Proceedings of the ASME 2012 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, IDETC/CIE 2012, pp. 485–495, USA, August 2012.
Z. Sha and J. H. Panchal, “Estimating linking preferences and behaviors of autonomous systems in the internet using a discrete choice model,” in Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2014, pp. 1591–1597, USA, October 2014.
J. Hauser, M. Ding, and S. P. Gaskin, “Non-compensatory (and compensatory) models of consideration-set decisions,” in Proceedings of the 2009 Sawtooth Software Conference, Sequim, WA, 2009.
S. Frederick, Automated choice heuristics.
W. Shao, Consumer Decision-Making: An Empirical Exploration of Multi-Phased Decision Processes, Griffith University Australia, 2006.
P. Resnick and H. R. Varian, “Recommender systems,” Communications of the ACM, vol. 40, no. 3, pp. 56–58, 1997.
T. Zhou, Z. Kuscsik, J. Liu, M. Medo, J. R. Wakeling, and Y. Zhang, “Solving the apparent diversity-accuracy dilemma of recommender systems,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4511–4515, 2010.
M. J. Pazzani and D. Billsus, “Content-based recommendation systems,” in The Adaptive Web: Methods and Strategies of Web Personalization, P. Brusilovsky, A. Kobsa, and W. Nejdl, Eds., pp. 325–341, Springer, Berlin, Germany, 2007.
D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
J. S. Fu et al., “Modeling Customer Choice Preferences in Engineering Design Using Bipartite Network Analysis,” in Proceedings of the 2017 ASME International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, ASME, Cleveland, OH, USA, 2017.
M. Wang, Z. Sha, Y. Huang, N. Contractor, Y. Fu, and W. Chen, “Forecasting technological impacts on customers' co-consideration behaviors: A data-driven network analysis approach,” in Proceedings of the ASME 2016 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, IDETC/CIE 2016, USA, August 2016.
K. W. Church and P. Hanks, “Word association norms, mutual information, and lexicography,” Computational Linguistics, vol. 16, no. 1, pp. 22–29, 1990.
M. Greenacre, Correspondence Analysis in Practice, CRC Press, 2017.
M. Wang et al., “A Network Approach for Understanding and Analyzing Product Co-Consideration Relations in Engineering Design,” in Proceedings of the DESIGN 2016 14th International Design Conference, 2016.
D. M. Powers, Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation, 2011.
Z. Sha et al., “Modeling Product Co-Consideration Relations: A Comparative Study of Two Network Models,” in Proceedings of the 21st International Conference on Engineering Design, ICED17, 2017.