Explainable and Reliable Machine Learning by Exploiting Large-Scale and Heterogeneous Data
Research Article | Open Access
Chong Wu, Zhenan Feng, Jiangbin Zheng, Houwang Zhang, "Common Laws Driving the Success in Show Business", Computational Intelligence and Neuroscience, vol. 2020, Article ID 8842221, 10 pages, 2020. https://doi.org/10.1155/2020/8842221
Common Laws Driving the Success in Show Business
Abstract
In this paper, we investigate whether gender bias affects success and whether common laws drive success in show business. We design an experiment in which the gender and the productivity of an actor or actress over a certain period are the independent variables, and we use deep learning techniques to predict success, extract latent features, and understand the data. Three models are trained: the first on the data of actors, the second on the data of actresses, and the third on the mixed data. Three benchmark models are constructed under the same conditions. The experimental results show that our models are more general and more accurate than the benchmarks. An interesting finding is that a model trained only on the data of one gender achieves similar performance on the data of the other gender, without performance loss. This suggests that gender bias is only weakly related to success. By visualizing the feature maps in the embedding space, we see that the prediction models have learned some common laws even though they are trained on different data. Using these findings, a more general and accurate model for predicting success in show business can be built.
1. Introduction
“Do I need to change jobs?” is one of the major concerns of most actors and actresses, since show business is highly competitive [1]. The Matthew effect [2], or the so-called “rich-get-richer” phenomenon, has been shown to exist in show business, which demonstrates the scarcity of resources [1]. Luck has been shown to be a key element in driving success [3]. It is well known that the rich-get-richer effect is quite arbitrary and unpredictable [4]. Hence, most actors and actresses face the problem of avoiding famine and building a sustainable acting career [1]. Some studies have found that a boost in productivity is a key metric for evaluating the success of an actor or actress and that it can be more a network effect [5, 6] than a consequence of acting skill; in other words, success is not highly related to acting skill [1]. Other studies show a relationship between the dynamic collaboration network and success [7]: success is a collective phenomenon [8]. The startup network has been shown to have predictive power [9], and future success can be predicted by monitoring the behavior of a small set of individuals [10]. A great deal of work has been done to study the laws of success [11–19].
Recently, a study showed that success in show business is predictable, using a heuristic threshold-based binary classifier that achieves an accuracy of up to 85% [1]. That study found a strong gender bias in the waiting time statistics, the location of the annus mirabilis, and the career length distribution of the data. However, two questions remain: Is gender bias one of the key elements driving success? Can we find common laws driving success in show business? Since we want to build a general prediction model, the common laws that determine the growth and shape of the series are more important than the differences.
To answer these questions, we design this study. The data we use were collected in [1] from the Internet Movie Database (IMDb), http://www.imdb.com. They consist of millions of profile sequences of actors and actresses from the birth of film in 1888 up to the present day [1]. Each sequence records the yearly time series of credited jobs over the entire working life of the actor or actress [1]. As in [1], we consider only the number of credited jobs, regardless of the impact of the work, the screen time, and so on. The original feature space is a non-Euclidean space, so we must perform representation learning to map these features to a Euclidean space. To do this, we construct a deep model that consists of an encoder and a classifier. Since gender is an independent variable in our experiment, we train three models: (1) MAO, (2) MAE, and (3) MM. They all have the same structure but are trained on different datasets (MAO on the data of actors, MAE on the data of actresses, and MM on the mixed data). Our problem can be reformulated as follows: (1) If MAO achieves nondegraded performance on the data of actresses, comparable to MAE, and MAE achieves nondegraded performance on the data of actors, comparable to MAO, then this indicates that the series contain common features that are unrelated to gender. (2) If MM achieves similar and nonsuperior performance compared with MAO and MAE, then the features that carry gender bias are not dominant features in this prediction problem; that is to say, gender bias may cause some differences in resource allocation, but it is weakly related to success.
The contributions of this paper can be summarized as follows:
(1) We find that some common laws/features drive success in show business by extracting and understanding the data.
(2) Using these common features, a more general prediction model with an accuracy of up to 90% can be built.
(3) Our experiment shows that gender bias is weakly related to success, even though a recent study shows that it strongly affects the waiting time statistics, the location of the annus mirabilis, the career length distribution, etc.
2. Materials and Methods
2.1. Data
The data we use consist of the careers of 1,512,472 actors and 896,029 actresses from 1888 up to 2016 and were collected from the Internet Movie Database (IMDb), http://www.imdb.com. Each career is viewed as a profile sequence: the yearly time series of acting jobs in films or TV series over the entire working life of the actor or actress [1]. Following [1] but relaxing its selection constraint, we select the sequences of actors and actresses whose working life is L ≥ 5 years and whose number of credited jobs in the annus mirabilis (AM) is ≥ 5. Sequences obtained with more relaxed cutoffs are too short to be analyzed; they are treated as outliers and excluded from the experiment. The resulting subset consists of 37,896 (2.51%) sequences of actors and 22,025 (2.46%) sequences of actresses, which is larger than the data used for the prediction model in [1]. We divide this subset into four groups for the experiment: (1) Group 1: the data of actors with AM ≥ 5 and L ≥ 20, including 21,994 sequences; (2) Group 2: the data of actresses with AM ≥ 5 and L ≥ 20, including 9,034 sequences; (3) Group 3: the data of actors with AM ≥ 5 and 5 ≤ L < 20, including 15,902 sequences; (4) Group 4: the data of actresses with AM ≥ 5 and 5 ≤ L < 20, including 12,991 sequences. Group 1 and Group 2 can be regarded as very successful actors and actresses and are mainly used to train the prediction model. Group 3 and Group 4 can be regarded as actors and actresses who are not very successful; they might need a prediction model more than the previous groups, and their data are used to test the prediction model.
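The grouping rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `career_group` is hypothetical, and it assumes that the working life L is the length of the yearly series and that AM is the largest yearly job count in it.

```python
from typing import List, Optional


def career_group(jobs_per_year: List[int], is_actress: bool) -> Optional[int]:
    """Return the group number (1-4) of a career, or None if it is excluded.

    jobs_per_year is the yearly series of credited jobs over the working life.
    Assumptions (not stated in this form in the paper): L is the length of the
    series, and AM is the largest yearly job count in the series.
    """
    working_life = len(jobs_per_year)
    am_jobs = max(jobs_per_year, default=0)
    if am_jobs < 5 or working_life < 5:
        return None  # fails the selection cutoff (AM >= 5 and L >= 5)
    if working_life >= 20:
        return 2 if is_actress else 1  # Groups 1 and 2: mainly used for training
    return 4 if is_actress else 3  # Groups 3 and 4: 5 <= L < 20, used for testing
```

For example, a 20-year actor career whose best year has five credited jobs falls in Group 1, while a five-year actress career with the same peak falls in Group 4.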
2.2. Data Preprocessing
To make an early prediction, we need to preprocess the data before training the model. First, following [1], we truncate each sequence into several subsequences, also called subcareer series. For each sequence, we randomly sample several subsequences with a sampling rate n. The subsequences sampled before the annus mirabilis are regarded as class 1, and those sampled after the annus mirabilis are regarded as class 2. Hence, this is a binary classification problem. The aim of this sampling is to obtain samples of class 1, since we only have the entire working life of each actor or actress. An example of the sampling process for a given sampling rate is shown in Figure 1. NatComm19 [1] transforms each subsequence into a scalar for training using a function, equation (1), of the number of credited jobs in each year and the length of the subsequence.
The above transformation loses some information, such as an increasing or decreasing trend. In this paper, we revise equation (1) so that it produces a new sequence rather than a scalar, which preserves this information.
We then use the new sequence to train the model.
Since gender is an independent variable, we construct three prediction models, each trained on a different subset of the whole data. The separation of training data and test data for each model is detailed in Table 1.
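The sampling and labeling step can be illustrated as follows. This is only a sketch under stated assumptions, since the exact sampling scheme is not spelled out here: the helper `sample_subcareers` is hypothetical, it treats the sampling rate n as the number of subsequences drawn per career, it assumes every subsequence starts at the first career year, and it takes the annus mirabilis to be the first year with the maximum number of credited jobs. The transformation applied to each subsequence afterwards is not reproduced.

```python
import random
from typing import List, Tuple


def sample_subcareers(jobs_per_year: List[int], n: int,
                      rng: random.Random) -> List[Tuple[List[int], int]]:
    """Draw n truncated subcareers from one career and label each of them.

    Label 1: the subcareer ends before the annus mirabilis.
    Label 2: the subcareer ends at or after the annus mirabilis.
    Assumptions: subsequences start at the first career year, and the annus
    mirabilis is the first year with the maximum job count.
    """
    am_index = jobs_per_year.index(max(jobs_per_year))
    samples = []
    for _ in range(n):
        end = rng.randint(1, len(jobs_per_year))  # keep years [0, end)
        label = 1 if end <= am_index else 2
        samples.append((jobs_per_year[:end], label))
    return samples
```

Passing a seeded `random.Random` instance keeps the sampling reproducible across runs.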
 
The validation data are included in the training data. Note. MM_ours denotes the prediction model trained on the mixed data of actors and actresses, MAO_ours denotes the prediction model trained on the data of actors only, and MAE_ours denotes the prediction model trained on the data of actresses only.
2.3. Prediction Model
Recurrent neural networks (RNNs), in particular those with long short-term memory (LSTM) units [20, 21], are powerful tools for time series prediction problems with sequential data. Compared with a standard feedforward neural network, an RNN has feedback connections (memory), as shown in Figure 2. It can process not only single data points but also entire sequences of data. For example, LSTM has been applied to tasks such as speech recognition [22], sign language translation [23], object cosegmentation [24, 25], and airport passenger management [26]. Hence, we use an RNN with LSTM units to build an end-to-end prediction model, where each LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. Figure 3 shows the structure of our model. The model can be divided into two sequential parts: (1) an encoder and (2) a binary classifier. The encoder consists of an LSTM layer with 30 hidden units that outputs at the last time step. The classifier consists of a fully connected layer, a softmax layer, and a classification layer with cross entropy as the loss function. Our model is trained in a supervised fashion on a set of training sequences using a gradient descent optimization algorithm. Since the sequences have different lengths, as shown in Figure 4, the feature space of these sequences is a non-Euclidean space, in which it is difficult to train a classifier. Hence, each variable-length input sequence is embedded by the encoder into a fixed-dimensional Euclidean space. Through the encoder, the dimension of the feature is also reduced. Then, the cross-entropy loss between the real label and the label predicted by the classifier is minimized to obtain the optimized parameters.
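To make the classifier's loss concrete, the following sketch computes a softmax over two class scores and the cross entropy against the real label. It is a generic textbook formulation, assumed here for illustration rather than taken from the authors' implementation; the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw class scores into probabilities (numerically stable form)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], true_class: int) -> float:
    """Cross-entropy loss of one sample: minus the log-probability of the real label."""
    return -math.log(probs[true_class])


# Scores from the fully connected layer for (class 1, class 2); values are made up.
probs = softmax([2.0, -1.0])
loss_if_class_1 = cross_entropy(probs, 0)  # small: the prediction agrees with the label
loss_if_class_2 = cross_entropy(probs, 1)  # large: the prediction disagrees with the label
```

Minimizing the average of this loss over the training subsequences pushes the predicted probability of the real class toward 1.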
In the process of forward propagation, an LSTM unit does not simply compute a weighted sum of the input signal; it applies a nonlinear function. Each $j$th LSTM unit maintains a memory $c_t^j$ at time $t$ and an output gate weight $o_t^j$. Then, the output $h_t^j$ is
$$h_t^j = o_t^j \tanh\left(c_t^j\right).$$
The memory cell is updated by partially forgetting the existing memory and adding a new memory content $\tilde{c}_t^j$:
$$c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j,$$
where $f_t^j$ is the weight of the forget gate and $i_t^j$ is the weight of the input gate.
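The forward step described above (the memory is partly forgotten and partly refreshed, and the output is the gated, squashed memory) can be sketched for a single scalar LSTM unit. This is a standard textbook formulation written in plain Python; the weight layout in `w`, with one `(w_x, w_h, bias)` triple per gate, is an assumption made for illustration, and the actual model uses a layer of 30 such units.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_unit_step(x: float, h_prev: float, c_prev: float, w: dict):
    """One forward step of a single scalar LSTM unit.

    w maps 'i' (input gate), 'f' (forget gate), 'o' (output gate), and
    'g' (new memory content) to a (w_x, w_h, bias) triple.
    """
    def pre(name: str) -> float:
        w_x, w_h, bias = w[name]
        return w_x * x + w_h * h_prev + bias

    i_gate = sigmoid(pre('i'))
    f_gate = sigmoid(pre('f'))
    o_gate = sigmoid(pre('o'))
    c_new = math.tanh(pre('g'))            # candidate memory content
    c = f_gate * c_prev + i_gate * c_new   # forget part of the old memory, add new
    h = o_gate * math.tanh(c)              # gated output
    return h, c


def encode(sequence, w: dict) -> float:
    """Run the unit over a whole sequence and return the output at the last step."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_unit_step(float(x), h, c, w)
    return h
```

Because `encode` returns only the last-step output, sequences of any length are mapped to a value of the same dimension, which is the property the encoder relies on.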
The configuration of each layer is detailed in Table 2. The training settings for the prediction model are as follows: the maximum number of epochs is 15, the minibatch size is 100, the optimizer is Adam, and the gradient threshold is 1. More complex models, such as models with more layers and models with more complex structures (BiLSTM), were also tested, but they showed no obvious performance improvement. That is to say, these are all fairly “off-the-shelf” classifiers. Since simpler is better, we use the simplest model to show the results.
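A gradient threshold is commonly applied by rescaling the gradient whenever its norm exceeds the threshold. The sketch below shows that idea with the threshold of 1 used above; it assumes L2-norm clipping, which the text does not specify, and the function name is hypothetical.

```python
import math
from typing import List


def clip_by_l2_norm(grad: List[float], threshold: float = 1.0) -> List[float]:
    """Rescale grad so that its L2 norm does not exceed threshold.

    Assumption: the gradient threshold is applied to the L2 norm of the
    gradient; the clipping method is not stated in the text.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)  # already within the threshold, leave unchanged
    scale = threshold / norm
    return [g * scale for g in grad]
```

Clipping of this kind keeps the direction of the update while bounding its size, which helps recurrent models avoid exploding gradients on long sequences.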

3. Results
Tables 3–5 show the comparison between our models and those of a recent study, NatComm19 [1], on the test data. MM_ours denotes the prediction model trained on the mixed data of actors and actresses, MAO_ours denotes the prediction model trained on the data of actors only, and MAE_ours denotes the prediction model trained on the data of actresses only. MM_NatComm19 denotes the model of NatComm19 [1] trained on the mixed data of actors and actresses, with learned threshold d = 6.1523; MAO_NatComm19 denotes the model of NatComm19 [1] trained on the data of actors only, with learned threshold d = 6.9580; and MAE_NatComm19 denotes the model of NatComm19 [1] trained on the data of actresses only, with learned threshold d = 5.6640. All models are trained on the training data with the cutoff values AM ≥ 5 and L ≥ 20. Our models outperform NatComm19 in terms of all quantitative metrics on all subsets of the test data. Our models are also more general than NatComm19: they maintain their performance on the new data (AM ≥ 5, 5 ≤ L < 20; AM ≥ 10, 5 ≤ L < 20; and AM ≥ 15, 5 ≤ L < 20), whereas the performance of the three NatComm19 models degrades to near the baseline. The details of the baseline model can be found in [1]. There is almost no difference between the performance of our three models. Interestingly, the difference between the performance of the three NatComm19 models is also negligible.
 
MM_ours denotes the prediction model trained by the mixed data of an actor and actress; MAO_ours denotes the prediction model trained by the data of an actor only; MAE_ours denotes the prediction model trained by the data of an actress only; MM_NatComm19 denotes the model of NatComm19 [1] trained by the mixed data of an actor and actress, and the learned threshold d = 6.1523; MAO_NatComm19 denotes the model of NatComm19 [1] trained by the data of an actor, and the learned threshold d = 6.9580; MAE_NatComm19 denotes the model of NatComm19 [1] trained by the data of an actress, and the learned threshold d = 5.6640. 
 
4. Discussion
The two MAE models (MAE_ours and MAE_NatComm19) achieve results similar to those of the two MAO models (MAO_ours and MAO_NatComm19) on the test data of actors. Similarly, the two MAO models achieve results similar to those of the two MAE models (MAE_ours and MAE_NatComm19) on the test data of actresses. The case of MAE_ours and MAO_ours shows that our models can learn common features that are used for classification. Since the model of NatComm19 uses a learnable threshold to classify in the original feature space, as shown in Figure 5, the case of MAE_NatComm19 and MAO_NatComm19 shows that the distribution and shape of the original feature space are similar for the data of actors and the data of actresses, as shown in Figure 6. MM_ours achieves similar and nonsuperior results compared with MAE_ours and MAO_ours, and MM_NatComm19 also achieves similar and nonsuperior results compared with MAE_NatComm19 and MAO_NatComm19. This shows that the features that carry gender bias are not dominant features in this prediction problem; that is to say, gender bias may cause differences in some aspects, such as resource allocation, but it is weakly related to success. To further validate our conclusion, we visualize the embedding space in Figure 7. At first sight, the three models seem to learn different features. However, this is caused by the randomness of the neural network, and the order of these features has no meaning, much as in an eigendecomposition. From the weight of each embedding feature, obtained from the fully connected layer, we can see that most of the embedding features are unimportant. Interestingly, all three models have only one dominant feature. The floating range of this feature is also similar in the three models, a range parameterized by a positive scalar. We therefore believe that they have learned a similar feature that is used for classification.
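The idea of a learnable threshold on a scalar feature can be illustrated with a brute-force scan. This sketch is not the procedure of NatComm19 [1]: the function name, the scan over observed values, and the decision direction (class 2 when the scalar is at or above the threshold) are all assumptions made for illustration.

```python
from typing import List, Tuple


def learn_threshold(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return the threshold d with the highest training accuracy, and that accuracy.

    Assumed decision rule: predict class 2 if value >= d, otherwise class 1.
    Candidate thresholds are the observed scalar values.
    """
    best_d, best_acc = None, -1.0
    for d in sorted(set(values)):
        correct = sum((2 if v >= d else 1) == y for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:
            best_d, best_acc = d, acc
    return best_d, best_acc
```

If two training sets have similarly shaped scalar distributions, a scan like this lands on nearby thresholds for both, which is consistent with the similar behavior of the actor-trained and actress-trained benchmark models discussed above.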
5. Conclusion
In this paper, we design a data-driven study to find out whether gender bias is a key element of success and to find common laws/features driving success in show business. The experimental results show that the success of an actor and the success of an actress share some common features and that gender bias is weakly related to success. We use this property to build a general model for predicting success in show business. Compared with the benchmark, the improvement of the model is obvious. In the future, we plan to investigate whether gender bias is a key element of success in other fields and to find common laws driving success there.
Data Availability
The data used in this study can be accessed at https://doi.org/10.17605/OSF.IO/NDTA3.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
The first author designed the study and wrote the paper. The remaining authors contributed equally to this paper in data analysis.
Acknowledgments
Chong Wu acknowledges the constructive suggestions from Prof. Jonathan Zhu. This work was supported by Mr. Jiangbin Zheng, the third author of this paper.
References
[1] O. E. Williams, L. Lacasa, and V. Latora, “Quantifying and predicting success in show business,” Nature Communications, vol. 10, no. 1, pp. 1–8, 2019.
[2] A. M. Petersen, W.-S. Jung, J.-S. Yang, and H. E. Stanley, “Quantitative and empirical demonstration of the Matthew effect in a study of career longevity,” Proceedings of the National Academy of Sciences, vol. 108, no. 1, pp. 18–23, 2011.
[3] M. Janosov, F. Battiston, and R. Sinatra, “Success and luck in creative careers,” EPJ Data Science, vol. 9, no. 1, 2020.
[4] D. Easley and J. Kleinberg, “Networks, crowds, and markets: reasoning about a highly connected world,” Significance, vol. 9, pp. 43–44, 2012.
[5] A.-L. Barabási, The Formula: The Universal Laws of Success, Hachette Book Group, Hachette UK, 2018.
[6] S. P. Fraiberger, R. Sinatra, M. Resch, C. Riedl, and A.-L. Barabási, “Quantifying reputation and success in art,” Science, vol. 362, no. 6416, pp. 825–829, 2018.
[7] S. Juhász, G. Tóth, and B. Lengyel, “Brokering the core and the periphery: creative success and collaboration networks in the film industry,” PLoS One, vol. 15, no. 2, Article ID e0229436, 2020.
[8] L. Wu, D. Wang, and J. A. Evans, “Large teams develop and small teams disrupt science and technology,” Nature, vol. 566, no. 7744, pp. 378–382, 2019.
[9] B. Moreno, V. Ciotti, P. Panzarasa, S. Liverani, L. Lacasa, and V. Latora, “Predicting success in the worldwide startup network,” Scientific Reports, vol. 10, no. 1, pp. 1–6, 2020.
[10] M. S. Mariani, Y. Gimenez, J. Brea, M. Martin, R. Algesheimer, and C. J. Tessone, The Wisdom of the Few: Predicting Collective Success from Individual Behavior, 2020.
[11] R. Sinatra, D. Wang, P. Deville, C. Song, and A.-L. Barabási, “Quantifying the evolution of individual scientific impact,” Science, vol. 354, no. 6312, Article ID aaf5239, 2016.
[12] J. E. Hirsch, “An index to quantify an individual’s scientific research output,” Proceedings of the National Academy of Sciences, vol. 102, no. 46, pp. 16569–16572, 2005.
[13] A. Kozbelt, “One-hit wonders in classical music: evidence and (partial) explanations for an early career peak,” Creativity Research Journal, vol. 20, no. 2, pp. 179–195, 2008.
[14] D. K. Simonton, “Creative productivity: a predictive and explanatory model of career trajectories and landmarks,” Psychological Review, vol. 104, no. 1, pp. 66–89, 1997.
[15] H. C. Lehman, Age and Achievement, Princeton University Press, Princeton, NJ, USA, 2017.
[16] D. K. Simonton, “Age and outstanding achievement: what do we know after a century of research?” Psychological Bulletin, vol. 104, no. 2, pp. 251–267, 1988.
[17] A. Spitz and E.-Á. Horvát, “Measuring long-term impact based on network centrality: unraveling cinematic citations,” PLoS One, vol. 9, no. 10, 2014.
[18] D. E. Acuna, S. Allesina, and K. P. Kording, “Predicting scientific success,” Nature, vol. 489, no. 7415, pp. 201–202, 2012.
[19] O. Penner, R. K. Pan, A. M. Petersen, K. Kaski, and S. Fortunato, “On the predictability of future impact in science,” Scientific Reports, vol. 3, no. 1, pp. 1–8, 2013.
[20] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[21] A. Voelker, I. Kajić, and C. Eliasmith, “Legendre memory units: continuous-time representation in recurrent neural networks,” in Advances in Neural Information Processing Systems, pp. 15544–15553, MIT Press, Cambridge, MA, USA, 2019.
[22] A. Graves, A.-R. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649, Vancouver, Canada, May 2013.
[23] J. Huang, W. Zhou, Q. Zhang, H. Li, and W. Li, “Video-based sign language recognition without temporal segmentation,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, February 2018.
[24] L. Wang, X. Duan, Q. Zhang, Z. Niu, G. Hua, and N. Zheng, “Segment-tube: spatio-temporal action localization in untrimmed videos with per-frame segmentation,” Sensors, vol. 18, no. 5, p. 1657, 2018.
[25] X. Duan, L. Wang, C. Zhai et al., “Joint spatio-temporal action localization in untrimmed videos with per-frame segmentation,” in Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 918–922, Athens, Greece, October 2018.
[26] F. Orsini, M. Gastaldi, L. Mantecchini, and R. Rossi, “Neural networks trained with WiFi traces to predict airport passenger behavior,” in Proceedings of the 2019 6th International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), pp. 1–7, Cracow, Poland, June 2019.
Copyright
Copyright © 2020 Chong Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.