Special Issue: Explainable and Reliable Machine Learning by Exploiting Large-Scale and Heterogeneous Data
Common Laws Driving the Success in Show Business
In this paper, we investigate whether gender bias affects success and whether there are common laws driving success in show business. We design an experiment in which the gender and the productivity of an actor or actress in a certain period are the independent variables, and we use deep learning techniques to predict success, extract the latent features, and understand the data. Three models are trained: the first on actor data, the second on actress data, and the third on the mixed data. Three benchmark models are constructed under the same conditions. The experimental results show that our models are more general and more accurate than the benchmarks. An interesting finding is that the models trained only on actor data or only on actress data achieve similar performance on data of the other gender, without performance loss. This shows that gender bias is only weakly related to success. By visualizing the feature maps in the embedding space, we see that the prediction models have learned some common laws although they are trained on different data. Using these findings, a more general and accurate model for predicting success in show business can be built.
“Do I need to change jobs?” is one of the major concerns of most actors and actresses, since show business is highly competitive. The Matthew effect, the so-called “rich-get-richer” phenomenon, has been shown to exist in show business, which reflects the scarcity of resources. Luck has been shown to be a key element in driving success. It is well known that the rich-get-richer effect is quite arbitrary and unpredictable. Hence, most actors and actresses face the problem of avoiding famine and building a sustainable acting career. Some studies have found that productivity is a key metric for evaluating the success of an actor or actress and that boosting it can be more of a network effect [5, 6] than a consequence of acting skill; in other words, success is not strongly related to acting skill. Other studies show a relationship between the dynamic collaboration network and success: success is a collective phenomenon. The startup network has been shown to have predictive power in show business, and future success can be predicted by monitoring the behavior of a small set of individuals. A great deal of work has been done to study the laws of success [11–19].
A recent study shows that success in show business is predictable and uses a heuristic threshold-based binary classifier to achieve an accuracy of up to 85%. That study finds a strong gender bias in the waiting time statistics, the location of the annus mirabilis, and the career length distribution of the data. This raises two questions: Is gender bias one of the key elements driving success? Can we find common laws driving success in show business? Since we want to build a general prediction model, the common laws that determine the growth and the shape of the series are more important than the differences.
To answer these questions, we design this study. The data we use are collected from the Internet Movie Database (IMDb), http://www.imdb.com. They consist of millions of profile sequences of actors and actresses from the birth of film in 1888 up to the present day. Each sequence records the yearly time series of credited jobs over the entire working life of the actor or actress. As in the earlier study, we consider only the number of credited jobs, regardless of the impact of the work, the screen time, and so on. The original feature space is non-Euclidean, so we must use representation learning to map these features into a Euclidean space. To do this, we construct a deep model that consists of an encoder and a classifier. Since gender is an independent variable in our experiment, we train three models: (1) MAO, (2) MAE, and (3) MM. They all have the same structure but are trained on different datasets (MAO on actor data, MAE on actress data, and MM on the mixed data). Our problem can be restated as follows: (1) If MAO achieves nondegraded performance on actress data, comparable to MAE, and MAE achieves nondegraded performance on actor data, comparable to MAO, then there are common features in the series that are unrelated to gender. (2) If MM achieves similar and nonsuperior performance compared with MAO and MAE, then the gender-biased features are not dominant features in this prediction problem; that is, gender bias may cause some differences in resource allocation, but it is only weakly related to success.
The contributions of this paper can be summarized as follows:
(1) We found that there are some common laws/features driving success in show business by extracting and understanding the data.
(2) Using these common features, a more general prediction model with an accuracy of up to 90% can be built.
(3) Our experiment shows that gender bias is weakly related to success, even though a recent study shows that it strongly affects the waiting time statistics, the location of the annus mirabilis, the career length distribution, etc.
2. Materials and Methods
The data we use consist of the careers of 1,512,472 actors and 896,029 actresses from 1888 up to 2016 and are collected from the Internet Movie Database (IMDb), http://www.imdb.com. Each career is viewed as a profile sequence: the yearly time series of acting jobs in films or TV series over the entire working life of the actor or actress. Following the earlier study but relaxing its selection constraint, we select the sequences of actors and actresses whose working life is L ≥ 5 years and whose number of credited jobs in the annus mirabilis (AM) is ≥ 5. Sequences obtained with more relaxed cutoffs are too short to be analyzed; they are treated as outliers and excluded from the experiment. The resulting subset consists of 37,896 (2.51%) actor sequences and 22,025 (2.46%) actress sequences, which is larger than the data used for the prediction model in that study. We divide this subset into four groups for the experiment: (1) Group 1: actor data with AM ≥ 5 and L ≥ 20, including 21,994 sequences; (2) Group 2: actress data with AM ≥ 5 and L ≥ 20, including 9,034 sequences; (3) Group 3: actor data with AM ≥ 5 and 5 ≤ L < 20, including 15,902 sequences; (4) Group 4: actress data with AM ≥ 5 and 5 ≤ L < 20, including 12,991 sequences. Groups 1 and 2 can be regarded as very successful actors and actresses, and they are mainly used to train the prediction model. Groups 3 and 4 can be regarded as actors and actresses who are not very successful; they might need a prediction model more than the previous groups, and their data are used to test the prediction model.
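The selection cutoffs and the four groups described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each career is stored as a list with one entry per year of working life, so that L is the list length and the AM value is the largest yearly count; the function names are hypothetical.

```python
from typing import List, Optional


def am_value(career: List[int]) -> int:
    """Number of credited jobs in the annus mirabilis (the best year)."""
    return max(career)


def working_life(career: List[int]) -> int:
    """Working life L in years, assuming one list entry per year."""
    return len(career)


def assign_group(career: List[int], is_actor: bool) -> Optional[int]:
    """Return the group number (1 to 4), or None if the cutoffs are not met."""
    if not career:
        return None
    am, length = am_value(career), working_life(career)
    if am < 5 or length < 5:
        return None  # excluded from the experiment
    if length >= 20:
        return 1 if is_actor else 2  # long careers, used mainly for training
    return 3 if is_actor else 4      # 5 <= L < 20, used for testing
```

For example, a 20-year actor career whose best year has 5 credits falls into Group 1, while a 5-year actress career with the same best year falls into Group 4.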
2.2. Data Preprocessing
To make an early prediction, we need to preprocess the data before training the model. First, following the earlier study, we truncate each sequence into several subsequences, also called subcareer series. For each sequence, we randomly sample several subsequences with a sampling rate n. Subsequences sampled before the annus mirabilis are regarded as class 1, and subsequences sampled after the annus mirabilis are regarded as class 2. Hence, it is a binary classification problem. The aim of this sampling is to obtain samples of class 1, since we only have the entire working life of each actor or actress. An example of the sampling process is shown in Figure 1. NatComm19 uses a function, equation (1), to convert these subsequences into scalars for training; it depends on the number of credited jobs in each year and on the length of the subsequence.
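The sampling and labeling step can be illustrated as follows. This is only a sketch under stated assumptions: a subsequence is taken to be a prefix of the career (the career as observed up to some year), the sampling rate n is read as the number of subsequences drawn per career, and the annus mirabilis is taken to be the first year with the maximum count. None of these details is confirmed by the text, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def sample_subcareers(
    career: List[int], n: int, rng: random.Random
) -> List[Tuple[List[int], int]]:
    """Sample n career prefixes and label each one.

    Label 1: the prefix ends before the annus mirabilis.
    Label 2: the prefix ends at or after the annus mirabilis.
    """
    am_year = career.index(max(career))  # 0-based index of the best year
    samples = []
    for _ in range(n):
        end = rng.randint(1, len(career))  # prefix length, bounds inclusive
        # A prefix of length `end` covers years 0 .. end - 1.
        label = 1 if end <= am_year else 2
        samples.append((career[:end], label))
    return samples
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which matters when the same subsequences must be reused across the three models.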
The above transformation loses some information, such as increasing or decreasing trends. In this paper, we revise equation (1) so that it produces a new sequence rather than a scalar, which preserves this information.
Then, we use the new sequence to train the model.
Since gender is an independent variable, we construct three prediction models which will be trained by different subsets of the whole data. The details of separation of training data and test data for each model are shown in Table 1.
2.3. Prediction Model
Recurrent neural networks (RNNs), and long short-term memory (LSTM) networks [20, 21] in particular, are powerful tools for time series prediction with sequential data. Compared with the standard feedforward neural network, an RNN is a kind of neural network that has feedback connections (memory), as shown in Figure 2. It can process not only single data points but also entire sequences of data. For example, LSTM has been applied to tasks such as speech recognition, sign language translation, object cosegmentation [24, 25], and airport passenger management. Hence, we use an RNN with LSTM units to build an end-to-end prediction model, where each LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. Figure 3 shows the structure of our model. The model can be divided sequentially into two parts: (1) an encoder and (2) a binary classifier. The encoder consists of an LSTM layer with 30 hidden units that outputs only at the last time step. The classifier consists of a fully connected layer, a softmax layer, and a classification layer with cross entropy as the loss function. Our model is trained in a supervised fashion on a set of training sequences using a gradient-descent-based optimization algorithm. Since the sequences have different lengths, as shown in Figure 4, their feature space is non-Euclidean, and it is difficult to train a classifier in this feature space. Hence, each input sequence, whatever its length, is embedded by the encoder into a Euclidean space of fixed dimension. Through the encoder, the dimension of the feature is also reduced. The parameters are then optimized by minimizing the cross-entropy loss between the real label and the label predicted by the classifier.
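The classifier head described above (fully connected layer, softmax, cross-entropy loss) can be written out in plain Python to make the computation concrete. This is a generic textbook sketch rather than the authors' implementation; the function names and the 0-based class index are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    embedding: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Fully connected layer followed by softmax (one weight row per class)."""
    logits = [
        sum(w * h for w, h in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def cross_entropy(probs: Sequence[float], true_class: int) -> float:
    """Cross-entropy loss for one sample with a 0-based true class index."""
    return -math.log(probs[true_class])
```

With two classes and equal logits, the predicted probabilities are 0.5 each and the loss equals log 2, which is the value an uninformed classifier would obtain.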
In forward propagation, an LSTM unit does not simply compute a weighted sum of the input signal; it applies a nonlinear function. Each $j$-th LSTM unit maintains a memory $c_t^j$ at time $t$ and an output gate weight $o_t^j$. The output is then

$$h_t^j = o_t^j \tanh\left(c_t^j\right).$$

The memory cell is updated by partially forgetting the existing memory and adding a new memory content $\tilde{c}_t^j$:

$$c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j,$$

where $f_t^j$ is the weight of the forget gate and $i_t^j$ is the weight of the input gate.
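The update described above, in which the memory is partially forgotten, new content is added, and the output is gated, can be sketched as a tiny pure-Python LSTM encoder. It also shows how sequences of different lengths are mapped to vectors of one fixed size. The gate pre-activations follow the standard LSTM formulation, which the text does not spell out; the class name, the random initialization, and the scalar input per year are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class TinyLSTM:
    """Single-layer LSTM over a scalar input sequence (illustrative only)."""

    def __init__(self, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden = hidden

        def init() -> dict:
            # Input weights, recurrent weights, and biases for one gate.
            return {
                "w": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
                "u": [[rng.uniform(-0.1, 0.1) for _ in range(hidden)]
                      for _ in range(hidden)],
                "b": [0.0] * hidden,
            }

        # i: input gate, f: forget gate, o: output gate, g: candidate memory.
        self.params = {gate: init() for gate in ("i", "f", "o", "g")}

    def _pre(self, gate: str, j: int, x: float, h: Sequence[float]) -> float:
        p = self.params[gate]
        recurrent = sum(u * hk for u, hk in zip(p["u"][j], h))
        return p["w"][j] * x + recurrent + p["b"][j]

    def step(
        self, x: float, h: Sequence[float], c: Sequence[float]
    ) -> Tuple[List[float], List[float]]:
        new_h, new_c = [], []
        for j in range(self.hidden):
            i_t = sigmoid(self._pre("i", j, x, h))
            f_t = sigmoid(self._pre("f", j, x, h))
            o_t = sigmoid(self._pre("o", j, x, h))
            c_tilde = math.tanh(self._pre("g", j, x, h))
            c_t = f_t * c[j] + i_t * c_tilde  # forget old memory, add new
            new_c.append(c_t)
            new_h.append(o_t * math.tanh(c_t))  # gated output
        return new_h, new_c

    def encode(self, sequence: Sequence[float]) -> List[float]:
        """Return the hidden state at the last time step."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in sequence:
            h, c = self.step(x, h, c)
        return h
```

Calling `TinyLSTM(hidden=30).encode(...)` on a 3-year and on a 7-year career returns two vectors of length 30, so a standard classifier can be trained on the outputs regardless of career length.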
The details of each layer’s configuration are shown in Table 2. The training settings for the prediction model are as follows: the maximum number of epochs is 15, the minibatch size is 100, the optimizer is Adam, and the gradient threshold is 1. More complex models, such as deeper models and models with more complex structures (biLSTM), have also been tested, but they bring no obvious performance improvement. That is to say, these are all fairly “off the shelf” classifiers. Since simpler is better, we use the simplest model to show the results.
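Two of these settings, the minibatch size and the gradient threshold, can be illustrated with short helpers. The text does not say how the gradient threshold is applied; the sketch below assumes clipping by the L2 norm, which is one common choice, and the function names are hypothetical.

```python
import math
from typing import Iterator, List, Sequence


def clip_by_norm(grad: Sequence[float], threshold: float = 1.0) -> List[float]:
    """Rescale the gradient if its L2 norm exceeds the threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def minibatches(samples: Sequence, batch_size: int = 100) -> Iterator[Sequence]:
    """Yield consecutive minibatches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]
```

Clipping keeps the direction of the gradient and only limits its length, which helps recurrent models avoid unstable updates on long sequences.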
Tables 3–5 show the comparison between our model and the recent study NatComm19 on the test data. MM_ours denotes the prediction model trained on the mixed data of actors and actresses, MAO_ours denotes the prediction model trained on actor data only, and MAE_ours denotes the prediction model trained on actress data only. MM_NatComm19 denotes the NatComm19 model trained on the mixed data of actors and actresses, with learned threshold d = 6.1523; MAO_NatComm19 denotes the NatComm19 model trained on actor data only, with learned threshold d = 6.9580; and MAE_NatComm19 denotes the NatComm19 model trained on actress data only, with learned threshold d = 5.6640. All models are trained on the training data with the cutoff values (AM ≥ 5, L ≥ 20). Our models outperform NatComm19 on all quantitative metrics in all subsets of the test data. Our models are also more general than NatComm19 and maintain their performance on the new data (AM ≥ 5, 5 ≤ L < 20; AM ≥ 10, 5 ≤ L < 20; and AM ≥ 15, 5 ≤ L < 20), whereas the performance of the three NatComm19 models degrades to near the baseline. The details of the baseline model are described elsewhere. There is almost no difference between the performance of our three models. Interestingly, the difference between the performance of the three NatComm19 models is also negligible.
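For readers unfamiliar with threshold-based classifiers of the kind used as the benchmark, a minimal sketch follows. How NatComm19 actually learns its threshold d is not described here; the accuracy-maximizing search, the direction of the decision rule (values at or above d are assigned to class 2), and the function names are all assumptions for illustration.

```python
from typing import List, Sequence


def predict(value: float, d: float) -> int:
    """Assumed decision rule: class 2 if the scalar feature reaches d."""
    return 2 if value >= d else 1


def accuracy(values: Sequence[float], labels: Sequence[int], d: float) -> float:
    """Fraction of samples classified correctly with threshold d."""
    correct = sum(predict(v, d) == y for v, y in zip(values, labels))
    return correct / len(values)


def fit_threshold(values: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the observed value that maximizes training accuracy."""
    candidates: List[float] = sorted(set(values))
    best_d, best_acc = candidates[0], -1.0
    for d in candidates:
        acc = accuracy(values, labels, d)
        if acc > best_acc:  # keep the smallest threshold among ties
            best_d, best_acc = d, acc
    return best_d
```

A classifier of this form has a single free parameter, which makes it easy to interpret but limits how well it can adapt when the test data follow a different cutoff than the training data.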
The two MAE models (MAE_ours and MAE_NatComm19) achieve results similar to those of the two MAO models (MAO_ours and MAO_NatComm19) on the actor test data. Similarly, the two MAO models (MAO_ours and MAO_NatComm19) achieve results similar to those of the two MAE models (MAE_ours and MAE_NatComm19) on the actress test data. The case of MAE_ours and MAO_ours shows that our models learn some common features that are used for classification. Since the NatComm19 model uses a learnable threshold to classify the original feature space, as shown in Figure 5, the case of MAE_NatComm19 and MAO_NatComm19 shows that the distribution and the shape of the original feature space are similar for actor data and actress data, as shown in Figure 6. MM_ours achieves similar and nonsuperior results compared with MAE_ours and MAO_ours, and MM_NatComm19 also achieves similar and nonsuperior results compared with MAE_NatComm19 and MAO_NatComm19. This shows that the gender-biased features are not dominant features in this prediction problem; that is, gender bias may cause differences in some aspects such as resource allocation, but it is only weakly related to success. To further validate our conclusion, we visualize the embedding space in Figure 7. At first sight, the three models seem to learn different features. However, this is caused by the randomness of the neural network, and the order of these features has no meaning, much as in an eigendecomposition. From the weight of each embedding feature, obtained from the fully connected layer, we can see that most of these embedding features are unimportant. Interestingly, all three models have only one dominant feature, and the range of the corresponding feature is similar across the three models. We therefore believe that they have learned a similar feature that is used for classification.
In this paper, we design a data-driven study to find out whether gender bias is a key element of success and to find common laws/features driving success in show business. The experimental results show that there are common features between the success of an actor and the success of an actress and that gender bias is only weakly related to success. We use this property to build a general model for predicting success in show business. Compared with the benchmark, the improvement of the model is clear. In the future, we plan to study further whether gender bias is a key element and to look for common laws driving success in other fields.
The data used in this study can be accessed at https://doi.org/10.17605/OSF.IO/NDTA3.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
The first author designed the study and wrote the paper. The remaining authors contributed equally to this paper in data analysis.
Chong Wu acknowledges the constructive suggestions from Prof. Jonathan Zhu. This work was supported by Mr. Jiangbin Zheng, the third author of this paper.
D. Easley and J. Kleinberg, “Networks, crowds, and markets: reasoning about a highly connected world,” Significance, vol. 9, pp. 43-44, 2012.
A.-L. Barabási, The Formula: The Universal Laws of Success, Hachette Book Group, Hachette UK, 2018.
M. S. Mariani, Y. Gimenez, J. Brea, M. Martin, R. Algesheimer, and C. J. Tessone, The Wisdom of the Few: Predicting Collective Success from Individual Behavior, 2020.
H. C. Lehman, Age and Achievement, Princeton University Press, Princeton, NJ, USA, 2017.
J. Huang, W. Zhou, Q. Zhang, H. Li, and W. Li, “Video-based sign language recognition without temporal segmentation,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, February 2018.
F. Orsini, M. Gastaldi, L. Mantecchini, and R. Rossi, “Neural networks trained with WiFi traces to predict airport passenger behavior,” in Proceedings of the 2019 6th International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), pp. 1–7, Cracow, Poland, June 2019.