The variation of the journal impact factor (JIF) is affected by many statistical and sociological factors, such as the length of the citation window and differences between subjects. In this work, we develop a model of impact factor dynamics based on a parallel system, which can be used to analyze the correlation between the impact factor and certain elements. The parallel model simulates, in a distributed manner, the submission and citation behaviors of papers in journals belonging to a similar subject. We perform Monte Carlo simulations to show how the model parameters influence the impact factor dynamics. Through extensive simulations, we reveal the important role that certain statistical elements and behaviors play in determining impact factors. The experimental results and an analysis of actual data demonstrate that the value of the JIF is jointly influenced by the average review time, the average number of references, and the aging distribution of citations.

1. Introduction

Academic impact assessment and scientific journal ranking have long been hot topics, and they play a key role in the dissemination and development of academic research [1–3]. As one of the most important evaluation indicators in the SCI, the journal impact factor (JIF) is calculated by the scientific division of Clarivate Analytics® and is commonly used to rank and evaluate scientific journals in the Journal Citation Reports® (JCR) database [4]. Since the introduction of the JIF, a growing stream of studies has discussed its mechanism, characteristics, and applications, as well as its limitations and misuses, particularly in recent years. The JIF was originally intended to evaluate scientific journals, but it is now increasingly used to assess research and to guide the publishing strategies of researchers and institutions. In this respect, the JIF has gradually become an important indicator of the quality or reputation of a journal [5, 6]. Some publishers, for example, use impact factor values as an indirect marketing tool for selling their journals. In [6], Larivière and Gingras argued that the JIF not only reflects the “quality” of a paper but also represents the reputation of the journal in which it is published, because the likelihood that a paper is cited is also significantly affected by the impact factor of the journal. In addition, the JIF is often regarded as an important reference on which most authors rely when preparing a submission [7]. A major weakness of the JIF, however, is that its two-year citation window is considered too short to reflect the real academic impact of publications in some “slow” disciplines [8, 9].
The concept of the JIF was first introduced by Garfield in 1955 [10], and its calculation can be formulated as

$$\mathrm{JIF}_k = \frac{C_{k,k-1} + C_{k,k-2}}{P_{k-1} + P_{k-2}}, \tag{1}$$

where $\mathrm{JIF}_k$ denotes the JIF of the $k$th year, $P_i$ denotes the number of papers published in the $i$th year, and $C_{k,i}$ denotes the number of citations received during the $k$th year by the papers published in the $i$th year.
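As a minimal sketch of the two-year formula above, assuming the yearly counts are kept in plain dictionaries (the names `papers`, `citations`, and `journal_impact_factor` are illustrative and do not come from the authors' MATLAB code), the JIF of year k can be computed as follows:

```python
def journal_impact_factor(papers, citations, k):
    """Return JIF_k = (C_{k,k-1} + C_{k,k-2}) / (P_{k-1} + P_{k-2}).

    papers:    dict mapping year i -> P_i, the number of papers published in year i
    citations: dict mapping (k, i) -> C_{k,i}, citations received in year k by
               papers published in year i (missing keys count as 0)
    """
    denominator = papers.get(k - 1, 0) + papers.get(k - 2, 0)
    if denominator == 0:
        raise ValueError(f"no papers published in {k - 2} or {k - 1}")
    numerator = citations.get((k, k - 1), 0) + citations.get((k, k - 2), 0)
    return numerator / denominator


# Example: 220 papers from 2017 and 2018, cited 550 times during 2019.
papers = {2017: 100, 2018: 120}
citations = {(2019, 2018): 300, (2019, 2017): 250}
print(journal_impact_factor(papers, citations, 2019))  # 2.5
```

Only the two preceding years enter the ratio, which is why the citation window discussed in the text matters so much.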

According to this definition, the value of the JIF in the $k$th year is computed from $P_i$ and $C_{k,i}$ for the $(k-1)$th and $(k-2)$th years. In reality, however, the variation of the JIF is affected by many statistical and sociological factors. For instance, the value of the JIF is strongly influenced by the research field of the journal and by the type of journal (full papers, letters, or reviews) [11, 12]. Because the JIF is calculated from citation counts, and every reference in a paper is a citation to another paper, it could also be raised by increasing the number of references [12–14]. In [15], Zhang and Van Poucke suggested that journals with short publication delays tend to receive higher impact factors. Yu et al. developed a transfer function model to simulate the distributed citation process [16]. However, such a transfer function model does not consider differences in citation behavior between disciplines. Although some researchers pay close attention to the significant differences in the levels of impact factors across disciplines, they often neglect how these levels are shaped by the processes of submission and citation [9, 12]. Clearly, the scientific publication process and the JIF need to be studied as a unified system, particularly as a complex system [17–19]. The scientific community, for example, in its most typical form, can be modeled as a complex system in which researchers interact with each other in the roles of authors, journal editors, and reviewers. Computer simulations can reproduce certain simple behaviors, and therefore they can be used to model and reveal correlations that are very difficult or impossible to study in real life.

Stochastic theory based on probability has long been the dominant methodology in scientometrics [20–22]. As a typical method of social system simulation, the Monte Carlo method is well suited to modeling such problems. In contrast to other approaches, the method here is based primarily on distributed social computing. In this paper, we employ parallel social systems [23–26] to simulate and analyze the correlation between the variation of the impact factor and the behaviors of submission and citation, which can be used to interpret the JIF dynamics. In essence, the main objective of a parallel system is prediction for analysis and control. Simulation models generate large amounts of data by setting various conditions and tuning the related parameters, whereas statistical analysis applies mathematical statistics to existing data and information and can also be used for prediction. The fundamental difference between statistical analysis and simulation analysis lies in data volume rather than in prediction. Bibliometric indicators are quantitative measures of science based on publication and citation data. They are characterized by a quantitative approach and by evaluation scales, which can be macro, meso, or micro, and they reveal the scientific performance of a particular field over time. This paper addresses these problems by dynamically modeling the system. The requirements on a model differ completely depending on whether it is designed for prediction or for theoretical analysis. A prediction system should take as many possibilities into account as possible, and its hyperparameters should be tuned on data observed in the real world. By contrast, a model for theoretical analysis should be as simple as possible, capturing only the most prominent features.
Thus, our model considers only some basic social factors, with the idiosyncrasies of individuals represented by random noise. To the best of our knowledge, most parallel social systems are agent-based and networked, composed of individuals and communities [27–32], and some of their analytical properties can potentially be derived from relevant theories in system science [26, 33]. It is worth noting that the primary goal of virtual simulation systems is not to mimic their real-world counterparts quantitatively; instead, they help to verify, interpret, and shed light on, qualitatively, the underlying factors and the plausibility of certain phenomena and mechanisms. Such a framework has already been widely and effectively applied to analyze various complex social systems, e.g., [34–38]. Moreover, the virtual citation networks generated by the model here are consistent with typical scale-free networks [34, 39] (see Section 2.3 for a detailed explanation). We hope that this research provides meaningful theoretical insight, enriches the relevant literature, improves the understanding of the mechanism of JIF dynamics, and ultimately helps improve the management of academic journals. The key contributions of this work can be summarized as follows:
(i) We employ parallel social system theory to interpret the mechanism of JIF dynamics.
(ii) We develop an empirically driven parallel experiment framework that analyzes the interactions between the JIF and the rules of submission and citation (see Section 2).
(iii) We reveal, via simulation experiments, correlations between the JIF and some elements and behaviors that are usually ignored.

The rest of this paper is organized as follows. Section 2 introduces the main framework of the model in detail. The relations between JIF and certain elements and behaviors are revealed and analyzed in Section 3. Finally, Section 4 presents the concluding remarks.

2. Model Construction

In this study, we combine the paper submission process, the citation process, and the JIF into a comprehensive system. We develop a virtual citation community in which authors submit and cite papers, and journals review and publish papers. Meanwhile, the model records the papers that have been published and cited and automatically computes the impact factors of the corresponding journals from the citation distribution. The simulation model is discrete-time, with months as the unit of time; each iteration represents a round of submission, publication, and citation. The model is implemented in MATLAB (MATLAB and Statistics Toolbox Release 2018b, The MathWorks, Inc., Natick, Massachusetts, United States). The program code is available at https://github.com/pjzj/JIF-Modeling.

2.1. Model Initialization

The first stage sets the parameters of the simulation model, which encode general knowledge of the system, namely:
(1) The number of journals.
(2) The number of issues a journal publishes per year; each issue corresponds to one month.
(3) The number of papers published per issue. The number of papers each journal publishes per year is the number of issues per year multiplied by the number of papers per issue.
(4) The average review time of journals (T). The review time of a paper is the interval between its submission and its acceptance. For instance, paper no. 0826 was submitted in December 2018 and accepted in October 2019, so its review time is 10 months; T is the average review time of the papers published in the journal (see Appendix Figure 1). Note that T is a macrostatistical indicator that reflects the overall speed of a journal's process and differs from journal to journal. For simplicity, the model assumes that a paper is published immediately once it is accepted.
(5) The average number of references per paper in a journal (N).
(6) Further parameters of the submission and citation subsystems, which are defined and assigned separately because each subsystem can be parameterized independently (see Sections 2.2 and 2.3 for details).
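The review-time arithmetic in item (4) amounts to counting calendar months. The sketch below assumes dates are recorded only to month precision as `(year, month)` tuples; this representation and the function names are ours, not the paper's.

```python
def review_time_months(submitted, accepted):
    """Whole months between submission and acceptance, each given as (year, month)."""
    (y_sub, m_sub), (y_acc, m_acc) = submitted, accepted
    return (y_acc - y_sub) * 12 + (m_acc - m_sub)


def average_review_time(date_pairs):
    """T for one journal: mean review time in months over (submitted, accepted) pairs."""
    times = [review_time_months(s, a) for s, a in date_pairs]
    return sum(times) / len(times)


# Paper no. 0826: submitted December 2018, accepted October 2019.
print(review_time_months((2018, 12), (2019, 10)))  # 10
```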

Note that, in the current model, similar disciplines can be represented by identical settings, while different disciplines are differentiated by tuning the parameters above. In particular, the impact factors of a journal and of its sister journals (for example, Nature and Nature Cell Biology) are computed independently of each other. Moreover, because the impact factor of a year is defined by the number of papers a journal published in the previous two years and the citations they received, it is undefined at the start of the simulation; we therefore uniformly set the impact factor of every journal to 1 in the first two years, which avoids certain undesirable situations in the citation process.
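The initialization stage can be pictured as a small per-journal parameter record plus a two-year JIF history seeded with 1. This is a sketch under stated assumptions: the field names, the default of 12 issues per year (one per month), and the example values of T and N are hypothetical; only the per-issue quota of 10 accepted papers and the initial JIF of 1 come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JournalParams:
    """Per-journal settings from Section 2.1 (field names are hypothetical)."""
    avg_review_time: float      # T, in months
    avg_references: float       # N, references per paper
    issues_per_year: int = 12   # illustrative default: one issue per month
    papers_per_issue: int = 10  # each issue accepts the top 10 papers

    @property
    def papers_per_year(self) -> int:
        # papers per journal per year = issues per year x papers per issue
        return self.issues_per_year * self.papers_per_issue


def initial_jif_history(n_journals: int) -> List[List[float]]:
    """JIF of every journal fixed to 1 for the first two simulated years."""
    return [[1.0, 1.0] for _ in range(n_journals)]


# Hypothetical example: 12 identical journals with T = 6 months and N = 40.
journals = [JournalParams(avg_review_time=6.0, avg_references=40.0) for _ in range(12)]
history = initial_jif_history(len(journals))
print(journals[0].papers_per_year, history[0])  # 120 [1.0, 1.0]
```

Keeping the parameters per journal makes it easy to represent different disciplines by different settings, as described above.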

2.2. Modeling of Submission Process

The second stage models the submission process, which consists of the following three steps:

(1) Characterization of papers:

Variation in paper quality is an objective fact, somewhat like the spread of examination scores in education. However, paper quality is difficult to measure in practice, so it is usually assessed through scoring questionnaires. A typical example is the publication score on Publons (https://publons.com/), in which a score from 1 to 10 is selected in each of two fields, jointly measuring a paper's methodology, rigor, and novelty, as well as its relevance to its field. Following this practice, our model scores the intrinsic quality of each paper by a number $Q \in [0, 10]$, with 0 the worst and 10 the best, and the qualities of all papers follow a skewed distribution with a given expectation and variance. From this expectation and variance, the model automatically creates the corresponding numbers of papers with different scores (shown in Appendix Figure 2). In reality, because most papers are of middling quality, papers of extremely high or low quality are relatively rare. We therefore assume that the quality of each paper is drawn from a positively skewed distribution over the interval $[0, 10]$, implemented with the gamma distribution. The rationale is that, as the number of papers keeps increasing, papers of ultrahigh quality remain rare and their number stays relatively stable, increasing only slightly beyond a certain limit. To keep the number of high-quality papers relatively constant regardless of the total number of papers, the model numerically integrates the gamma density over the high-quality interval, delimited by a threshold set at 0.92, to obtain the number of high-quality papers.

(2) Journal targeting process:

In general, when selecting journals for submission, authors give priority to the journals with the highest reputation in their field [5, 6, 40]. There are exceptions, of course: high-quality papers on platelet function in vitro may have great impact in that scientific subdomain, yet they are unlikely to be accepted or published by top-ranked general journals, because a microlevel technical breakthrough of interest to specialized platelet journals does not match the interests of the readers of journals such as Nature and Science. Therefore, for specific topics, higher-quality papers are usually submitted to the highest-ranked journals covering that topic. In this work, the correlation between reputation and JIF is assumed to be linear and positive [5, 6]; that is, the reputation of a journal is rescaled by its impact factor, so the greater the impact factor, the higher the reputation. Furthermore, each paper has an initial estimated quality, which depends on the author's scientific level (rescaled by the paper's intrinsic quality) and determines how the author chooses a target journal. In principle, professional scholars assess their work more accurately, and vice versa. In most cases, a high-quality paper also reflects the scientific level and innovation of a “competent” author behind it. We therefore suppose that the author's estimated quality of a paper depends mainly on the author's academic level, and describe the relation between the intrinsic quality $Q$ of a paper and the author's estimate $\hat{Q}_a$ as

$$\hat{Q}_a = \varepsilon_a Q, \tag{3}$$

where $\varepsilon_a$ is the author's estimation noise, a random value around 1. A paper with a higher $Q$ tends to have an $\varepsilon_a$ closer to 1.
Thus, the variance of $\varepsilon_a$ indicates the magnitude of the noise and should be negatively correlated with paper quality. In the model, $\varepsilon_a$ follows a truncated Gaussian distribution with expectation 1 and a variance that decreases with $Q$ (truncated so that only positive numbers can be sampled). Two positive parameters jointly determine how concentrated or dispersed this distribution is.

Next, we impose a condition reflecting authors' psychological expectations, which weigh adventurism against conservatism: a paper is submitted only to a journal for which the absolute difference between the author's estimated quality $\hat{Q}_a$ and the average quality $\bar{Q}$ of the papers published in that journal in the previous year is less than a coefficient $\delta$, namely, $|\hat{Q}_a - \bar{Q}| < \delta$. Taken together, when submitting a paper, authors give priority to the journals that meet this condition and have higher impact factors (see Part I of Figure 3).

(3) Peer review process and publication:

Here, all journals are characterized by two state variables: a reputation value (rescaled by the impact factor) and the related rejection or acceptance thresholds. Each journal first evaluates every submitted paper, which can be regarded as a simplified form of peer review [36]. In the current model, all reviewers in a discipline are assumed to be peers, so reviewers are selected at random. The relation between the real quality of a paper, $Q$, and a journal's estimate of it, $\hat{Q}_r$, can be represented as

$$\hat{Q}_r = \eta\, Q, \tag{4}$$

where $\eta$ follows a lognormal distribution.

Subsequently, each journal ranks the submitted papers by evaluation score in descending order and accepts or rejects them according to their ranks. The rejection or acceptance thresholds are determined by the number of papers published per issue: in the model, each journal accepts (publishes) only the 10 highest-ranked papers in each issue. A paper is accepted only if it ranks in the top 10 in some submission round; rejected papers are resubmitted to other journals in the next round or are ultimately abandoned and remain unpublished (shown in Part II of Figure 3). Figure 3 illustrates the two parts of the submission-process model.
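The three submission steps can be sketched in Python as below. This is a simplified illustration, not a port of the authors' MATLAB program. Taken from the text: gamma-distributed quality on [0, 10], a numerical integral of the gamma density for the high-quality count, author noise drawn from a Gaussian around 1 truncated to positive values, the targeting rule (author estimate within a coefficient delta of the journal's previous-year mean quality, highest JIF first), lognormal review noise, and top-10 acceptance. Assumed by us: the gamma shape/scale values, the variance form k1 / (1 + k2 * Q), the multiplicative review form, the dictionary layout of `journals`, and all function names.

```python
import math
import random

Q_MAX = 10.0  # intrinsic quality is scored on [0, 10]


def sample_quality(shape, scale, rng):
    """Draw an intrinsic quality Q from a gamma distribution truncated to [0, 10]
    by rejection sampling (gamma draws are always positive)."""
    while True:
        q = rng.gammavariate(shape, scale)
        if q <= Q_MAX:
            return q


def gamma_pdf(x, shape, scale):
    """Gamma density; math.gamma supplies the normalizing constant."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def expected_high_quality(n_papers, shape, scale, q_min, steps=2000):
    """Expected number of papers with Q in [q_min, 10] under the truncated gamma,
    via the trapezoidal rule. Assumes shape >= 1 so the density is finite at 0."""
    def integrate(a, b):
        h = (b - a) / steps
        inner = sum(gamma_pdf(a + i * h, shape, scale) for i in range(1, steps))
        return h * (0.5 * (gamma_pdf(a, shape, scale) + gamma_pdf(b, shape, scale)) + inner)
    return n_papers * integrate(q_min, Q_MAX) / integrate(0.0, Q_MAX)


def author_estimate(q, k1, k2, rng):
    """Author's estimate Q_a = eps * Q, eps ~ Gaussian(1, var) truncated to eps > 0.
    HYPOTHETICAL variance var = k1 / (1 + k2 * Q): shrinks as quality grows."""
    sd = math.sqrt(k1 / (1.0 + k2 * q))
    while True:
        eps = rng.gauss(1.0, sd)
        if eps > 0:
            return eps * q


def choose_journal(q_a, journals, delta):
    """Highest-JIF journal whose previous-year mean quality lies within delta of Q_a."""
    eligible = [j for j in journals if abs(q_a - j["mean_quality"]) < delta]
    return max(eligible, key=lambda j: j["jif"])["name"] if eligible else None


def review_issue(submissions, mu, sigma, rng, slots=10):
    """Journal estimate Q_r = eta * Q with eta ~ LogNormal(mu, sigma) (multiplicative
    form is an assumption); the top `slots` papers are accepted, the rest rejected."""
    scored = sorted(((rng.lognormvariate(mu, sigma) * q, pid) for pid, q in submissions),
                    reverse=True)
    accepted = [pid for _, pid in scored[:slots]]
    rejected = [pid for _, pid in scored[slots:]]  # resubmitted next round or abandoned
    return accepted, rejected


# Illustrative values only (not the paper's Table 1 settings).
rng = random.Random(42)
papers = [(pid, sample_quality(2.0, 1.5, rng)) for pid in range(25)]
journals = [{"name": "J1", "mean_quality": 3.0, "jif": 1.2},
            {"name": "J2", "mean_quality": 6.0, "jif": 2.8}]
targets = {pid: choose_journal(author_estimate(q, 0.5, 1.0, rng), journals, delta=2.0)
           for pid, q in papers}
accepted, rejected = review_issue(papers, mu=0.0, sigma=0.2, rng=rng)
print(len(accepted), len(rejected))  # 10 15
```

Ranking against a fixed number of slots means each issue's acceptance threshold is set implicitly by competition among the submissions rather than by a fixed score.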

2.3. Modeling of Citation Process

Here, we develop a virtual simulation model that builds citation networks in which papers are submitted, published, and cited in sequence. The program generates papers one by one. Each new paper adds one item to the set of papers, which can be viewed as a new node in the virtual citation network. The number of citations a paper has already received is the in-degree of the corresponding node, and the number of new edges growing out of a newly added node is the number of references of the new paper. Whether a new node links to a given existing node is decided randomly, according to a probability $p$. In this way, a virtual citation network is built up gradually, and the key to computing the impact factor is therefore how to assign $p$ properly. Inspired by certain abstract simulation models in bibliometrics [12, 14, 36], we assume that the probability $p$ that a paper in a specific discipline is cited is jointly affected by three factors:

(1) The quality of a paper (Q):

As described in Section 2.2, papers with different quality $Q$ are submitted to and published in the corresponding journals according to the submission rules. In this model, the effect of $Q$ on the citation probability is assumed to be linear and positive; that is, the corresponding effect function $f_Q(Q)$ increases linearly with $Q$ over $Q \in [0, 10]$.

(2) The paper age (A):

For decades, time dependence in the preferential attachment mechanism (PAM) [39] has been a hot topic in citation network research. Younger papers attract increasing attention through citations, while older papers are often overlooked by scholars. This aging effect is a universal phenomenon in growing networks. Furthermore, the timeliness of research content affects different disciplines differently. For instance, scholars in some experimental disciplines (for example, biology and AI) prefer to cite recent scientific achievements, whereas in certain theoretical disciplines (for example, mathematics), existing literature that has been fully validated is more likely to be cited [12, 15]. To capture this, we describe the effect of the paper age $A$ (in months) on the probability that a paper is cited by another paper in the same discipline with an effect function $f_A(A)$ based on the hyperbolic tangent function; its value acts as a coefficient on the probability that the paper is cited (see Appendix Figure 4).

(3) The number of citations a paper receives (c):

Humans are social animals, so our opinions and choices are strongly influenced by our peers. This is particularly true in citation networks [34, 39]. We suppose that the weight factor on a paper's citation probability is higher when the paper's current number of citations (the in-degree of its node in the citation network) is greater. The effect of the number of citations $c$ a paper has already received on its probability of being cited is likewise described by an effect function $f_c(c)$ based on the hyperbolic tangent function, in which a weight factor scales the citation probability and a shape parameter changes the overall shape of the curve (shown in Appendix Figure 5).
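The text specifies only that the quality effect f_Q is linear and positive and that the age and citation effects are built on the hyperbolic tangent; their exact formulas are not reproduced in this section. The sketch below therefore uses hypothetical forms chosen only to match the described behavior: f_Q maps [0, 10] linearly onto [0, 1], the age effect decays from near 1 to near 0 around a characteristic age, and the citation effect grows and saturates with the number of citations. The parameter names a, b, w, and s are our own.

```python
import math


def f_quality(q):
    """ASSUMED linear effect of quality Q in [0, 10], mapped onto [0, 1]."""
    return q / 10.0


def f_age(age_months, a, b):
    """ASSUMED decreasing tanh effect of paper age: close to 1 for papers much younger
    than a (when a is large relative to b), exactly 0.5 at age a, near 0 for papers
    much older than a; b sets how steep the drop is."""
    return 0.5 * (1.0 - math.tanh((age_months - a) / b))


def f_citations(c, w, s):
    """ASSUMED saturating tanh effect of citations already received:
    1 for an uncited paper, rising towards 1 + w; s controls how fast it saturates."""
    return 1.0 + w * math.tanh(c / s)


# Small a and b: a 5-year-old paper has almost no chance of being cited.
print(round(f_age(60, a=12, b=3), 6))  # 0.0
```

Under these forms, shrinking a and b shortens the citation lifetime of papers, and w and s control how strongly already-cited papers attract further citations.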

Finally, based on the citation behaviors described above, the probability $p$ that a paper is cited in the model is defined in Equation (8) as a coefficient-weighted combination of these factors plus a noise term.

In Equation (8), $Q$ denotes the real quality of a paper, $A$ its age, $c$ the number of citations it has received, and the noise term is an i.i.d. zero-mean Gaussian noise. Note that although the citation probability $p$ is the kernel of the program, what actually affects the citation result is only the value of $p$ relative to those of other papers, not its absolute value. This plays a very important role in the model: once $p$ is given by Equation (8), the JIF is easy to compute. Overall, the virtual scientific system consists of three sections: initial setup, submission, and citation (see Figure 6).
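Because only the relative values of p matter, choosing the references of a new paper amounts to weighted sampling. The sketch below assumes (our assumption, since the exact form of Equation (8) is not reproduced in this section) that the three effects multiply, that the zero-mean Gaussian noise is added, and that the result is clipped at zero so it can serve as a sampling weight. Each new paper then draws its references without replacement, with probability proportional to p. Function names are hypothetical.

```python
import random


def citation_probability(f_q, f_a, f_c, rng, noise_sd=0.01):
    """HYPOTHETICAL form of p: product of the quality, age, and citation effects plus
    zero-mean Gaussian noise, clipped at 0 so it can be used as a sampling weight."""
    return max(0.0, f_q * f_a * f_c + rng.gauss(0.0, noise_sd))


def pick_references(weights, n_refs, rng):
    """Pick up to n_refs distinct existing papers for a new paper to cite,
    each draw with probability proportional to the remaining weights.
    Only relative weights matter: scaling all weights leaves the probabilities unchanged."""
    remaining = list(weights)
    chosen = []
    for _ in range(n_refs):
        if sum(remaining) <= 0:
            break  # nothing left that can be cited
        idx = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        chosen.append(idx)
        remaining[idx] = 0.0  # cite each paper at most once
    return chosen


rng = random.Random(7)
effects = [(0.9, 0.8, 1.5), (0.4, 0.9, 1.0), (0.7, 0.05, 2.0)]  # illustrative (f_Q, f_A, f_c)
p = [citation_probability(fq, fa, fc, rng) for fq, fa, fc in effects]
print(pick_references(p, n_refs=2, rng=rng))
```

Each chosen index becomes a new edge in the citation network, so the in-degree of a node, and through it the JIF, follows directly from these draws.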

Finally, note that the key feature of analysis based on parallel systems is that a parallel model stands on its own: each subsystem can be parameterized independently, and the model itself can be viewed as a feasible alternative to the real system. Such a parallel system is particularly suitable for situations with an extremely complex mechanism or with unavailable data. Through system analysis, whether qualitative or quantitative, we expect to discover certain general laws behind various phenomena.

3. Computational Experiments and Data Analysis

3.1. Model Implementation

In this section, we perform simulations with the JIF dynamics model proposed in Section 2. We first simulate and analyze the relation between the average quality of the journals and their impact factors over 13 years. As stated in the Introduction, the main objective of the simulation system is not to mimic real-world scenarios comprehensively and quantitatively. Thus, the model neglects less important factors, such as the specific situations of journals ceasing publication or being renamed. The average quality and the impact factors of the journals are compared in Figure 7.

Figure 7 shows similar trends in color change for average quality and impact factor; in other words, the average quality of the journals is consistent with the variation of the JIF. The journals with higher average quality (for example, journals 9–12) always have greater impact factors, whereas the impact factors of the journals with lower average quality (for example, journals 1–4) remain low. This simulation result is also supported by the Gaussian fitting (bottom right of Figure 7). The correlation arises because, in the model, a sample of papers with higher average quality tends to have a higher average number of citations. Thus, the strength of the correlation between average quality and average citations depends on the variation of the quality and citation distributions.

Next, we conduct experiments to verify whether the JIF varies as expected after it is artificially manipulated in the model. To this end, we artificially manipulate the impact factors of two different journals in a random year. Figure 8 shows the normal and manipulated JIFs averaged over 100 Monte Carlo simulation runs. Figures 8(a) and 8(e) show the impact factors without manipulation; Figures 8(b) and 8(f) show the impact factors with one manipulation, in the 8th year; and Figures 8(c), 8(d), 8(g), and 8(h) show the impact factors with two manipulations, in the 5th and 10th years.

Figure 8 shows that the impact factors without manipulation change little over 13 years (Figures 8(a) and 8(e)), while those with manipulation fluctuate strongly (Figures 8(b)–8(h)). In addition, when the impact factors are artificially decreased or increased to a certain level, they maintain the corresponding trends over the next few years until they are manipulated again. This is mainly because submissions in the model depend only on the JIF of the previous year: whatever happens in one year immediately determines the outcome of the next year, which is conditionally independent of the situation two or more years before. Indeed, an influential study [9] already concluded that “…by manipulating JIF in different ways, their JIF will increase quickly, …, as an academic evaluation indicator, JIF is able to distinguish the differences of certain academic performances such as citation and publication process.”

3.2. Analysis of Features Correlating with Impact Factors

The task of this section is to analyze how the JIF changes under different parameter settings. The baseline values, variation intervals, and variation steps of the model parameters are reported in Table 1.

From an analysis of Table 1, we find that, all else being equal, parameter , the average review time (T), and the average number of references (N) widened the gap between impact factors the most in all tests. These results are confirmed by a probabilistic sensitivity analysis, which evaluated the sensitivity of and the average JIF to simultaneous changes in multiple parameter values away from their baseline values (shown in Figure 9).

Additionally, we conducted 50 realizations at each of 5 tested parameter values across a given range, and this univariate sensitivity analysis showed that the JIF in the current model is also most sensitive to parameters and . Comparing the curves in Figure 10 under three different parameter settings, S1 (Figure 10(a)), S2 (Figure 10(b)), and S3 (Figure 10(c)), shows that the overall trends of the impact factors are similar. In greater detail, the impact factor of the journal with the largest number of references is much higher than those of the other two types of journals, which have fewer references. This result suggests that the greater the average number of references (N) of a journal, the higher its impact factor tends to be. Moreover, the overall levels of the impact factors in the three panels decrease as the average review time (T) becomes longer.
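The univariate (one-at-a-time) analysis described above, with 50 realizations at each of 5 tested values of a parameter and the other parameters held at baseline, can be organized as a small harness around any simulator. In this sketch `simulate` stands for one run of the JIF model returning an average JIF; the toy simulator, its parameter values, and the function names are purely hypothetical and are not the paper's model.

```python
import random
import statistics
from typing import Callable, Dict, List

Simulator = Callable[[Dict[str, float], random.Random], float]


def one_at_a_time(simulate: Simulator, baseline: Dict[str, float],
                  grid: Dict[str, List[float]], runs: int = 50,
                  seed: int = 0) -> Dict[str, List[float]]:
    """Mean output for each tested value of each parameter, others fixed at baseline."""
    rng = random.Random(seed)
    results = {}
    for name, values in grid.items():
        results[name] = [
            statistics.fmean(simulate({**baseline, name: v}, rng) for _ in range(runs))
            for v in values
        ]
    return results


def output_range(results: Dict[str, List[float]]) -> Dict[str, float]:
    """Spread (max - min) of the mean output per parameter; larger means more sensitive."""
    return {name: max(means) - min(means) for name, means in results.items()}


# Hypothetical toy simulator: output rises with N, falls with T, plus small noise.
def toy(params, rng):
    return 0.05 * params["N"] - 0.1 * params["T"] + rng.gauss(0.0, 0.01)


baseline = {"T": 8.0, "N": 30.0}
grid = {"T": [4.0, 6.0, 8.0, 10.0, 12.0], "N": [10.0, 20.0, 30.0, 40.0, 50.0]}
print(output_range(one_at_a_time(toy, baseline, grid)))
```

Averaging the 50 realizations before comparing parameters separates genuine sensitivity from Monte Carlo noise.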

Now, we analyze what happens when different parameters take values from small to large; that is, we verify how parameters such as and influence the dynamics of the model. To this end, we performed 100 simulation runs, each lasting 13 years (156 months) of simulated time. For each simulation run, five parameter values (, , , , and ) were sampled from the maximum, baseline, and minimum values given in Table 1. The simulation results are summarized as follows: Figure 11 shows the variations of the JIF influenced by the minimum, baseline, and maximum of parameters , , and when and in the 100 Monte Carlo simulations (Figures 11(a)–11(c), respectively); Figure 12 shows the variations of the JIF influenced by the minimum, baseline, and maximum of parameters , , and when and in the 100 Monte Carlo simulations (Figures 12(a)–12(c), respectively); and Figure 13 shows the variations of the JIF influenced by the minimum, baseline, and maximum of parameters , , and when and in the 100 Monte Carlo simulations (Figures 13(a)–13(c), respectively). Table 2 gives the relation between the average JIF and the different parameters in the 100 Monte Carlo simulations.

Through extensive simulations under three different conditions (Figure 11: and ; Figure 12: and ; and Figure 13: and ), an important observation from the results in Figures 11–13 is that, no matter what the conditions of parameters and are, the average JIF increases steadily as parameters , , and decrease. It is worth remarking that the values of and jointly reflect the citation lifetime of papers: a steeper curve, namely, a smaller , indicates that journals in the corresponding discipline tend to cite younger papers, whereas a greater signifies a longer citation life cycle of papers and a wider aging distribution of references in that discipline. This can be understood from the hyperbolic tangent function in Appendix Figure 4: when parameters and are very small, the probability that older papers are cited by others is almost zero.

From Table 2 and Figures 9–13, we see that the value of the JIF is jointly influenced by factors such as the average review time (T), the average number of references (N), and the citation distribution ( and ). The maximum average JIF is obtained under the following conditions: ; ; ; ; and . The minimum average JIF is achieved by setting ; ; ; ; and . Clearly, the value of the JIF is positively correlated with the average number of references (N), whereas it is negatively related to the other four parameters (i.e., , , , and ). The experimental results demonstrate that the younger the references (namely, the smaller the parameters and ), the higher the impact factor of the journal tends to be. Similarly, a journal achieves a higher impact factor if it has a greater average number of references (N). Furthermore, a journal with a relatively short review time (T) tends to have a higher impact factor, and vice versa.

To further verify the experimental results, we selected 120 journals in three different fields (biology, artificial intelligence, and mathematics) from the JCR database as a reference; they are listed in Appendix Figure 14 and Tables 3–5. These three disciplines were chosen because they represent the top, middle, and bottom of the overall impact factor level, respectively, and are therefore well suited to showing general results. We recruited 20 graduate students with a background in informatics to record the data. Specifically, we first randomly selected 200 papers from each journal. Then, from the number of references, the submission date, and the publication date shown in each paper, we computed the average number of references and the average review time of the corresponding journal. Comparing the data in Tables 3–5 shows drastic differences in the levels of average JIF, average review time (T), and average number of references (N) across the three disciplines. Specifically, Appendix Figure 14 shows that the average JIF of biology is 2–5 times that of AI and mathematics, particularly in Q1 and Q2 (biology: 9.59; AI: 3.89; mathematics: 2.17). Interestingly, the average number of references (N) in biology is also higher than in the other two disciplines: far higher than in mathematics and moderately higher than in AI (biology: 46.5; AI: 43.25; mathematics: 20.5). In addition, biology journals have shorter average review times (T) than AI and mathematics journals (biology: 6.3; AI: 11.6; mathematics: 16.5). These observations are broadly consistent with our experimental results.
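The bookkeeping in this data-collection procedure (per-journal means of the number of references N and of the review time T, then per-discipline averages) is simple to script. A sketch, assuming each sampled paper is recorded as a `(journal, references, submitted, published)` tuple with month-precision dates and that review time is approximated as publication month minus submission month; the record layout and function names are ours, and the example records are invented for illustration, not taken from Tables 3–5.

```python
from collections import defaultdict
from statistics import fmean


def journal_averages(records):
    """records: iterable of (journal, n_references, submitted, published), with dates
    as (year, month). Returns {journal: (N, T)}: mean references and mean review time
    in months, taking review time as publication month minus submission month."""
    refs, delays = defaultdict(list), defaultdict(list)
    for journal, n_refs, (y0, m0), (y1, m1) in records:
        refs[journal].append(n_refs)
        delays[journal].append((y1 - y0) * 12 + (m1 - m0))
    return {j: (fmean(refs[j]), fmean(delays[j])) for j in refs}


def discipline_mean(journal_stats, journals):
    """Unweighted mean of (N, T) over the journals of one discipline."""
    return (fmean(journal_stats[j][0] for j in journals),
            fmean(journal_stats[j][1] for j in journals))


# Invented records for two hypothetical journals A and B.
records = [("A", 40, (2019, 1), (2019, 7)),
           ("A", 50, (2019, 3), (2019, 9)),
           ("B", 20, (2018, 6), (2019, 10))]
stats = journal_averages(records)
print(stats, discipline_mean(stats, ["A", "B"]))
```

Averaging per journal first, then per discipline, keeps journals with many sampled papers from dominating the discipline-level figures.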

4. Conclusion

As a quantitative indicator of journal quality, the JIF plays an important role in the dissemination and development of academic research. However, the JIF is not only widely used but also misused, producing skewed and misleading results; for example, it is misused to assess individual papers, authors, publishers, and institutions. It is therefore of both theoretical and practical significance to analyze and improve the mechanism and effectiveness of existing evaluation indicators. This paper developed a simulation model based on the mechanisms of paper submission, review, and citation, which can reproduce the process by which the impact factors of journals within a similar discipline diverge. The study offers a novel experimental approach based on social parallel systems, which can construct virtual citation networks for a specific discipline. Studies along these lines may provide useful guidance for developing and managing academic journals. Notably, intuition and speculation taken for granted are often unreliable in science until they are validated by evidence and statistical inference. Without simulation experiments such as those presented here, it would be very difficult to obtain evidence of the statistical correlations between JIF dynamics and specific behaviors and elements of publication. The simulation results demonstrate that submission and citation behaviors are influenced and driven by the JIF, and it is the interplay between submission and citation that clarifies the mechanism of JIF dynamics.
From the experimental and statistical results, the impact factor of a journal is affected by many variables and latent factors, including the discipline of the journal, the average number of references per paper, and the peer review time of the journal. These factors can be summarized in three aspects.
(1) Discipline difference: results in some fields require relatively more time to mature because of delays in verification and recognition. The difference is so large that the bottom journal in one discipline may have a higher impact factor than the top journal in another; for example, the average JIF of biology in Q3 is 2–3 times that of AI and mathematics in Q2.
(2) Number of references: the JIF is computed from citation counts, and every citation is a reference in some citing paper; therefore, the JIF level of a field can be raised when papers include more references, provided this is done in a reasonable and scientific manner. Moreover, the number and aging distribution of references in a field not only reflect the timeliness and characteristics of its research but also have a cumulative effect on the overall level of its impact factors.
(3) Peer review time: the experimental results show that journals with short peer review times tend to obtain higher impact factors. Thus, a journal's impact factor can be increased through scientific peer review and effective journal management.
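For reference, the standard two-year JIF of a journal in year $y$ is defined below, where $C_y(x)$ denotes the number of citations received in year $y$ by items the journal published in year $x$, and $P_x$ the number of citable items it published in year $x$ (these symbols are introduced here for illustration):

```latex
\mathrm{JIF}_y = \frac{C_y(y-1) + C_y(y-2)}{P_{y-1} + P_{y-2}}
```

Each citation counted in the numerator is a reference made by some paper published in year $y$, which is why the number and age of the references made within a field bear directly on its JIF level.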

Several directions for future work follow from this study. The current model can be extended to study how the impact factor is differentiated within a similar discipline. Beyond simple indicators such as the JIF, the relative advantages of other evaluation methods, and the factors that could be weighted in them, should be taken into account. Refining the configuration of the review process is another meaningful direction: for instance, how reviewers relate to the subject of the paper, whether decision trees could be used to support selection or decision-making, and how the impact factor of a journal correlates with its publication cycle. Furthermore, certain phenomena observed in the experiments could possibly be explained analytically or statistically by combining theories and approaches from scientometrics and machine learning. For example, principal component analysis (PCA) and the Laplacian score could be used to better analyze the distribution and stability of the JIF across disciplines.
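The Laplacian score mentioned above ranks features by how well they preserve the local neighborhood structure of the samples; lower scores are better. A minimal NumPy sketch follows, assuming a symmetrized k-nearest-neighbor graph with heat-kernel weights and a mean-distance kernel width (common but not unique choices). Applying it to a journal-by-indicator matrix, for example with columns for JIF, T, and N, is a hypothetical use, not an analysis performed in this paper.

```python
import numpy as np

def laplacian_score(X, k=5, t=None):
    """Laplacian score of each column of X (shape: n_samples x n_features).

    Lower scores indicate features that better preserve the local
    neighborhood structure of the samples.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    if t is None:
        t = sq[sq > 0].mean()                # heuristic heat-kernel width
    # k-nearest-neighbor graph with heat-kernel weights, then symmetrized.
    S = np.zeros((n, n))
    for i in range(n):
        dist = sq[i].copy()
        dist[i] = np.inf                     # a sample is not its own neighbor
        nbrs = np.argsort(dist)[:k]
        S[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    S = np.maximum(S, S.T)
    d = S.sum(axis=1)                        # degrees: D = diag(d)
    L = np.diag(d) - S                       # graph Laplacian L = D - S
    scores = []
    for f in X.T:
        # Remove the degree-weighted mean of the feature.
        f_tilde = f - (f @ d) / d.sum()
        denom = f_tilde @ (d * f_tilde)      # f~^T D f~
        scores.append(f_tilde @ L @ f_tilde / denom if denom > 0 else np.inf)
    return np.array(scores)

# Hypothetical usage: rows = journals, columns = indicators such as JIF, T, N.
# scores = laplacian_score(indicator_matrix, k=5)
```

A feature that is constant within each neighborhood cluster scores near zero, while a feature that varies among close neighbors scores higher.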


A. Description of Functions and

It can be observed in Figure 4(b) that is the parameter shaping the horizontal translation of the curve, and is the parameter changing the overall shape of the curve. Mathematically, function should satisfy the following principles:
(1)
(2) The function is increasing in the interval

Function follows several qualitative principles:
(1) The function is increasing in the interval
(2) The slope of the function is always decreasing in the interval
(3)
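The qualitative principles above (increasing, with a slope that keeps decreasing, i.e., concave) can be checked numerically for any candidate curve. Because the actual function and interval are not reproduced in this version of the text, the sketch below uses a hypothetical saturating candidate, 1 - exp(-0.1x) on [0, 10]; the helper name and grid size are also assumptions, and a grid test is a sanity check rather than a proof.

```python
import math

def check_principles(f, a, b, steps=1000):
    """Return (increasing, slope_decreasing) for f sampled on a uniform grid over [a, b]."""
    h = (b - a) / steps
    xs = [a + i * h for i in range(steps + 1)]
    ys = [f(x) for x in xs]
    # Finite-difference slopes between neighboring grid points.
    slopes = [(y1 - y0) / h for y0, y1 in zip(ys, ys[1:])]
    increasing = all(s > 0 for s in slopes)
    slope_decreasing = all(s1 < s0 for s0, s1 in zip(slopes, slopes[1:]))
    return increasing, slope_decreasing

# Hypothetical candidate: a saturating curve that is nearly linear near 0.
print(check_principles(lambda x: 1 - math.exp(-0.1 * x), 0, 10))
```

A convex curve such as x² passes the first test but fails the second, which is the distinction the two principles draw.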

The greater the parameter and the less steep the curve in Figure 5(b), the more likely a paper is to be cited by other authors. Moreover, the citation network is a typical scale-free network. Compared with the basic structure of the BA scale-free network, the only significant difference is that the function here, which depends on node degree, is nonlinear, whereas the function described in [39] is simpler and linear. Figure 5(b) shows that the curves are approximately linear in certain intervals, in particular when the node degree is very low. Therefore, the citation network constructed here is more general.
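The contrast with the BA model can be illustrated by growing a toy citation network under two attachment kernels. The linear kernel below (in-degree + 1, as in Price's citation model) plays the role of the BA-style linear function; the saturating kernel 1 - exp(-0.1(k + 1)) is a hypothetical stand-in for the nonlinear function, chosen only because it is increasing, has a decreasing slope, and is roughly linear at low degree. The paper counts, references per paper, seed, and function names are arbitrary; this is not the paper's simulator.

```python
import math
import random
from itertools import accumulate

def grow_citation_network(n_papers, refs_per_paper, attach, seed=0):
    """Grow a toy citation network and return each paper's in-degree.

    The first `refs_per_paper` papers form an uncited seed set. Every later
    paper cites `refs_per_paper` distinct earlier papers, each drawn with
    probability proportional to attach(in_degree).
    """
    rng = random.Random(seed)
    in_degree = [0] * refs_per_paper
    for _ in range(n_papers - refs_per_paper):
        cum = list(accumulate(attach(k) for k in in_degree))
        candidates = range(len(in_degree))
        cited = set()
        # Redrawing duplicates is equivalent to weighted sampling without replacement.
        while len(cited) < refs_per_paper:
            cited.add(rng.choices(candidates, cum_weights=cum)[0])
        for j in cited:
            in_degree[j] += 1
        in_degree.append(0)  # the new paper enters the network uncited
    return in_degree

def linear(k):
    """BA-style linear kernel; the +1 lets uncited papers be cited (Price's model)."""
    return k + 1

def saturating(k):
    """Hypothetical concave kernel: roughly linear for small k, then levels off."""
    return 1 - math.exp(-0.1 * (k + 1))

for name, kernel in [("linear", linear), ("saturating", saturating)]:
    degrees = grow_citation_network(1000, 5, kernel, seed=42)
    print(f"{name:>10}: max citations = {max(degrees)}, total = {sum(degrees)}")
```

Swapping the kernel is the only change between the two runs, so any difference in the resulting citation distributions comes from the shape of the attachment function alone.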

In China, the journal rankings of the Chinese Academy of Sciences (https://www.las.ac.cn/) are also based on JIF data from JCR. The journals of each discipline are divided by impact factor into quantiles (Q1–Q4), which form a pyramid: the first quantile comprises the top 5%, the second 6%–20%, the third 21%–50%, and the fourth the remainder. The sample contains 3 classes of 40 instances each, where each class corresponds to a discipline (biology, AI, and mathematics, respectively) and the 40 instances are the top 10 journals in each quantile of that discipline. We chose the top 10 journals in each quantile because they usually better represent the citation characteristics and research foci of the corresponding discipline. The sample data were obtained from the most widely used multidisciplinary databases: Web of Science (WoS), Scopus, and Google Scholar. In addition, MedSci (https://www.medsciediting.com/), LetPub (https://www.letpub.com.cn/), and the IEEE Xplore Digital Library (https://ieeexplore.ieee.org/) helped in collecting the bibliometric indicators of some journals. For better comparison, we used the most recent data available, obtained in the first week of August 2019. Figure 14 and Tables 3–5 compare the average JIF, average review time (T), and average number of references (N) in Q1–Q4 for the 120 journals in biology, AI, and mathematics.
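The quantile bands and the top-10 sampling rule described above can be sketched as follows. The function names are hypothetical, ranks are 1-based within one discipline in descending order of JIF, and the handling of boundary ranks (a journal is in Q1 if its rank falls within the top 5% inclusive, and so on) is an assumption, since the text gives only the percentage bands.

```python
def cas_quantile(rank, n_journals):
    """Map a 1-based JIF rank within a discipline to quantile 1-4 using the
    bands quoted in the text: top 5%, 6%-20%, 21%-50%, and the remainder."""
    if not 1 <= rank <= n_journals:
        raise ValueError("rank must be between 1 and n_journals")
    pct = 100 * rank / n_journals
    if pct <= 5:
        return 1
    if pct <= 20:
        return 2
    if pct <= 50:
        return 3
    return 4

def top_per_quantile(jifs, per_quantile=10):
    """Return {quantile: [indices of its top journals]} for one discipline,
    ranking journals by JIF in descending order."""
    order = sorted(range(len(jifs)), key=lambda i: jifs[i], reverse=True)
    picked = {1: [], 2: [], 3: [], 4: []}
    for rank, i in enumerate(order, start=1):
        q = cas_quantile(rank, len(jifs))
        if len(picked[q]) < per_quantile:
            picked[q].append(i)
    return picked

# Hypothetical discipline with 200 journals whose JIFs are simply 0..199.
picked = top_per_quantile([float(j) for j in range(200)])
print({q: len(ix) for q, ix in picked.items()})
```

Repeating the selection for each of the three disciplines yields the 3 × 40 sample structure described above.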

Data Availability

The program code will be made available on https://github.com/pjzj/JIF-Modeling.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.


Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (Grants 2019RC29 and DUT19RC(3)012), the National Natural Science Foundation of China (NNSF) (Grants 61672130 and 61972064), the Gansu Provincial First-Class Discipline Program of Northwest Minzu University (Grant 11080305), and the LiaoNing Revitalization Talents Program (Grant XLYC1806006).