Abstract

Artificial intelligence (AI) has emerged as a transformative technology with applications across multiple domains. The corpus of work related to the field of AI has grown significantly, both in volume and in the range of domains in which AI is applied. However, given the wide application of AI in diverse areas, measuring and characterizing the span of AI research is often a challenging task. Bibliometrics is a well-established method in the scientific community for measuring the patterns and impact of research. It has, however, also received significant criticism for its overemphasis on the macroscopic picture and its inability to provide a deep understanding of the growth and thematic structure of knowledge-creation activities. Therefore, this study presents a framework comprising two techniques, namely, Bradford’s distribution and path analysis, to characterize the growth and thematic evolution of the discipline. While the Bradford distribution provides a macroscopic view of artificial intelligence research in terms of patterns of growth, the path analysis method presents a microscopic analysis of the thematic evolutionary trajectories, thereby completing the analytical framework. Detailed insights into the evolution of each subdomain are drawn, major techniques employed in various AI applications are identified, and some relevant implications are discussed to demonstrate the usefulness of the analyses.

1. Introduction

Artificial intelligence (AI) has emerged as a transformative technology that holds great promise in economic, social, medical, security, and environmental applications. Several studies have explored the applications of AI in widely varied contexts. Responsible AI [1], AI for social good [2, 3], AI for SDG [4–6], and AI for all [7] are some popular areas of study related to applications of AI and reflect the extent of expectations from the technology. AI has attracted the attention of the academic research community as well as technology and service companies and is driving the transformations under the Fourth Industrial Revolution [8, 9]. The ubiquitous growth of AI has even led to the development of promotional and regulatory standards by various governments. For example, the European Union’s Artificial Intelligence Act, 2021, provides guidelines on the legal, ethical, and technological requirements for the application of AI and provides support for research in AI [10]. In many countries, national strategies for AI have been delineated, such as the National Strategy on Artificial Intelligence (NSAI) in India.
Similarly, other countries such as Canada (Pan-Canadian AI Strategy-https://cifar.ca/ai/), China (New Generation AI Development Plan for 2030-https://www.theconstructsim.com/98-chinas-ai-plan-for-2030/), Japan (AI Strategy 2019 & Integrated Innovation Strategy 2020-https://www.meti.go.jp/press/2020/01/20210115003/20210115003-3.pdf), Mexico (Towards a national strategy for AI in Mexico 2018-https://oecd.ai/dashboards/countries/Mexico), Singapore (National AI Strategy-https://www.smartnation.gov.sg/files/publications/national-ai-strategy-summary.pdf), South Korea (Toward AI World Leader beyond IT-https://oecd.ai/dashboards/countries/SouthKorea), Taiwan (AI Taiwan Action Plan-https://ai.taiwan.gov.tw/), the UAE (UAE Strategy for Artificial Intelligence (AI)-https://ai.gov.ae/), and the UK (National AI Strategy-https://www.gov.uk/government/organisations/office-for-artificial-intelligence) have also released strategies to promote the use and development of AI. National and international exercises are conducted for regular stocktaking and review of these developments; examples include the OECD AI Policy Observatory (https://oecd.ai/), the World Bank, the United Nations AI for Good initiative (https://aiforgood.itu.int/), UNESCO, United States review committees, and the Indian Parliamentary Committee on S&T.

The corpus of research work related to the field of AI has grown in volume as well as in the application of AI-related methods and tools. AI’s role and impact have been analysed using bibliometric methods. For instance, Wamba et al. [3] analysed the AI publication data from 1977–2019 from the Web of Science core collection to understand publication patterns, authorship structure, and major areas of research. Similarly, Lei & Liu [11] and Ho & Wang [12] performed bibliometric analyses of AI for different time periods. However, these studies utilized traditional bibliometric techniques to draw conclusions on growth and publication trends. Given the rapid growth, thematic evolution, and wide application of AI in diverse areas, measuring and characterizing the span of AI research is a challenging task. Newer themes of research (e.g., large language models) have emerged, while attention to several topics has gradually faded. Therefore, there is a need to analyse and understand the overall context of the growth of AI research and to trace its thematic evolution during the last decade in a more systematic and in-depth manner. A commonly employed bibliometric approach, however, may fall short of uncovering the disciplinary span and thematic evolution trajectories, so a suitable framework is required for this purpose. This study attempts to bridge this research gap by presenting an analytical framework comprising Bradford distribution and path analysis, which can be used to identify the patterns of growth, thematic evolution trajectories, methods and techniques employed, and the application areas of AI.

More precisely, the article attempts to answer the following research question:
RQ: Can a framework comprising Bradford distribution and path analysis provide the macroscopic and microscopic views of research growth and thematic evolution of AI research?

The research question guided the study to look first at the macroscopic picture of AI research and then to continue with a thorough investigation of the published literature (the microscopic picture). The proposed framework has been applied to understand the trend of growth in AI publications, the plausible factors that have contributed to this growth, and the major evolutionary trajectories and methodologies/techniques that the AI application areas have followed in the prominent sub-disciplines. The study is novel in that it provides an analytical framework which can be used to examine the scientific growth of any specific domain and which presents a more in-depth analysis than the shallow analytics obtained through a traditional bibliometric exercise. The findings from our study provide crucial insights into the areas of AI research, its thematic evolution, and its most prominent tools and techniques.

The last couple of decades have seen remarkable progress in AI and its application domains. Many studies have been performed by researchers of different backgrounds to describe the application of AI in particular domains. For example, Nti et al. [13] describe AI applications in the engineering and manufacturing (EM) domain, Spanaki [14] analysed AI-driven agricultural technology (AgriTech), Cao [15] studied applications of AI in finance, and Nourani et al. [16] reviewed the application of AI models in hydro-climatology. Healthcare is now becoming the major sector for applications of AI-based systems [17–20]. The multiple applications of AI are augmenting research by enabling efficient data collection, curation, representation, hypothesis generation, experimentation, and simulation [21]. Since AI promises to provide solutions to problems in different domains, some researchers have tried to map these diverse applications of AI using approaches such as bibliometric analysis, bibliographic coupling, and scientometric analysis.

Saheb et al. [22] performed a bibliometric analysis to identify the ethical concerns of AI in the healthcare domain, Frank et al. [23] explored the bibliometric evolution of AI research and its related fields using a citation network, Munim et al. [24] carried out a bibliometric analysis of artificial intelligence in the maritime industry using bibliographic coupling techniques, Hwang & Tu [25] conducted a bibliometric mapping analysis to explore the role and research trends of AI in mathematics education, and Niu et al. [26] carried out a bibliometric analysis to examine the progress and trends in artificial intelligence research from 1990–2014. Similarly, Darko et al. [27] conducted a scientometric analysis to raise the level of awareness of AI and to help build the intellectual wealth of the AI area in the Architecture, Engineering, and Construction (AEC) industry. Wamba et al. [3] analysed the AI publication data from 1975–2019 from the Web of Science core collection to understand publication patterns, authorship structure, and major areas of research in AI. Lei & Liu [11] and Ho & Wang [12] also performed bibliometric analyses of AI for different time periods. The studies above utilized traditional bibliometric techniques to draw conclusions on the overall growth and development trends of AI research.

One recent study also utilized a machine learning-based approach called Latent Dirichlet allocation (LDA) to assess the research characteristics of AI-related publications. These include transitions in important topics over time, important topics covered by different publications and discussion platforms (journals, conferences), and regional focus on specific topics [28]. Liu & Chen [29] have proposed a modification to the LDA model which improves its efficacy at topic modelling. Their results present the thematic evolution and cross-disciplinary applications of AI using topic evolution networks. The approach used in these studies focuses on extracting the most prominent topic clusters and then assessing the evolution of the field within these clusters. However, most of the existing studies fall short of uncovering the major thematic evolution trajectories in AI research. A deeper understanding of the development of the field requires the combined, rigorous assessment of various knowledge production indicators [30, 31]. The present study addresses this aspect by proposing an analytical framework comprising Bradford distribution and path analysis for an in-depth analysis of research publication data (both publications and citations) from the field of AI. It attempts to obtain deeper insights into publication and journal trends, major themes in the publications and their application areas, and the most effective AI methods used in those application areas. This approach should serve as a good alternative to a resource-intensive literature review of a field, such as the study by Wang et al. [21]. The article, therefore, is one of the first studies to present a macroscopic as well as a microscopic picture of the growth and thematic evolution of AI research during the last decade using a suitable analytical framework.

3. Methodology and Data

3.1. Bradford Law of Scattering

The fields of bibliometrics and scientometrics focus on various methodologies and techniques to quantify and analyse the impact of knowledge resources. These include statistical methods such as citation/author/co-word analysis, bibliographic coupling, and bibliometric laws. The bibliometric laws are the basic pillars for assessing the advancement or evolution of scientific literature. Various studies have explored the applications of these laws in order to understand the patterns in scientific productivity. Bibliometrics regards research articles as one of the most credible and reliable indicators of productivity. This is usually measured using journal publications, citations, and various indices. A relatively less utilized method, useful for evaluating productivity trends in specific fields of research, is the Bradford law of scattering. It is specifically useful for comparing the productivity of a group of research journals in a discipline, whereas other methods assess individual journal productivity. Therefore, this study has utilized the Bradford law of scattering [32] to better understand the growth and development of AI research. The Bradford law states that “If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to subject, and several groups containing the same number of articles as the nucleus, then the groups will be $1 : n : n^2 \ldots$ i.e., groups are in the form of geometric progression”. Here “n” is a multiplier which is referred to as the “Bradford Multiplier”. This statement is also referred to as the “Bradford verbal formulation”. It is used to establish the quantitative relation between journals and scientific publications of a specific discipline by classifying the journals into several groups based on their productivity levels.
These groups are named the Core Group (most productive zone), the Allied Group (moderately productive zone), and the Peripheral Group (least productive zone). Bradford bibliographs plot the cumulative proportion of journals on the x-axis (beginning with the most prolific one) and the cumulative percentage of articles gathered from the journals on the y-axis. An illustrative diagram for the AI-related publication data used in this work, for a given year, is shown in Figure 1.

Various authors have applied the Bradford law of scattering to analyse the publication trends in their fields of study. The fields in which journal productivity has been measured in this way include economics [33], agriculture [34], horticulture [35], crop science [36], zoology [37], microbiology [38], paediatric surgery [39], population genetics [40], neurosurgery [41], bioenergy [42], physics [43, 44], astronomy and astrophysics [45], library and information sciences [46], neural networks [47], and information sciences [48]. These studies have led to the development of various models for its implementation and for the interpretation of the observed trends [49–52]. Leimkuhler [53] described a model based on Bradford’s verbal formulation, which is as follows:

If $R(r)$ refers to the cumulative number of articles produced by the journals of rank $1, 2, \ldots, r$, then

$$R(r) = a \log(1 + b r) \tag{1}$$

where “a” and “b” are constants. This is known as the Law of Leimkuhler.

Later, Egghe [54, 55] revisited the formulations of the Bradford law explained by Leimkuhler, showed that the Bradford law and the Leimkuhler model are mathematically equivalent, and presented an application of the underlying law of Leimkuhler. This explanation is referred to as “Egghe’s Leimkuhler Model” and is defined as follows:

$$k = \left(e^{\gamma}\, y_m\right)^{1/p} \tag{2}$$

Here, $k$ denotes the Bradford multiplier; $\gamma$ denotes Euler’s constant (its value is approximately 0.5772); $p$ denotes the number of Bradford groups; and $y_m$ denotes the number of articles in the most productive journal.

Although there are several interpretations/models of the Bradford law [56–63], Egghe’s explanation of the Leimkuhler model is quite popular among them [64]. Hence, this study applies the Bradford law using the Leimkuhler equations (2). To apply this model, the value of “p” has to be decided (3, 4, 5, …). When assigning a numerical value to “p” for the evaluation of real-world data, the finite nature of the data has to be considered.

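Once the value of p is decided, Egghe’s Leimkuhler model can be used to calculate the Bradford multiplier:

$$k = \left(1.781\, y_m\right)^{1/p} \tag{3}$$

where $y_m$ is the number of articles in the most productive journal and $1.781 \approx e^{\gamma}$.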

This value of k can be used to define the distribution of journals across the p groups. As per the Bradford law, the resulting journal distribution is of the form

$$r_0 : r_0 k : r_0 k^2 : \cdots : r_0 k^{p-1} \tag{4}$$

If, for instance, $r_0$ denotes the number of journals in the first Bradford group and $T$ indicates the total number of journals, then for “p” groups $T$ can be expressed as follows:

$$T = r_0 + r_0 k + r_0 k^2 + \cdots + r_0 k^{p-1} = r_0\, \frac{k^p - 1}{k - 1} \tag{5}$$

This approach is applied for further evaluation of the productivity distribution for each year.
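The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the geometric-sum form of T is solved for the core group size, and the inputs (476 articles in the most productive journal and 2,377 journals) are the 2020 figures quoted later in the text. The output has not been checked against the published Table 1.

```python
import math

# Euler's constant (Euler-Mascheroni), written as a literal because the
# standard-library math module does not expose it.
EULER_GAMMA = 0.5772156649


def bradford_multiplier(y_m: float, p: int = 3) -> float:
    """Bradford multiplier k = (e^gamma * y_m)^(1/p) (Egghe's Leimkuhler model)."""
    return (math.exp(EULER_GAMMA) * y_m) ** (1.0 / p)


def bradford_groups(total_journals: float, y_m: float, p: int = 3):
    """Return k and the journal counts r0, r0*k, ..., r0*k^(p-1).

    r0 follows from T = r0 * (k^p - 1) / (k - 1).
    """
    k = bradford_multiplier(y_m, p)
    r0 = total_journals * (k - 1) / (k ** p - 1)
    return k, [r0 * k ** i for i in range(p)]


# Illustrative inputs taken from the 2020 figures reported in the text.
k, groups = bradford_groups(total_journals=2377, y_m=476, p=3)
print(f"k = {k:.3f}")
print("core / allied / periphery journals:", [round(g, 1) for g in groups])
```

The group sizes are real-valued here; in practice they would be rounded to whole journals, which can make the rounded counts differ slightly from T.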

3.2. Path Analysis
3.2.1. Importance of Citation Networks for Analysing the Development of Subject Specific Research Areas

Network analysis offers many tools and techniques for addressing a wide range of real-world problems by modelling them as networks and examining their properties. Scientific and technological literature can be represented as networks built on relationships between entities (e.g., papers or patents, authors/inventors, institutional affiliations/assignees, etc.). One of the major relationships is citation, where, for a pair of works, the later work cites the earlier related work. A chain of citations forms a structure called a citation network. Citation networks can be utilized as an important tool for knowledge extraction, i.e., to mine the scientific literature of any field, and are helpful for better understanding the development of the field and its associated subfields. The use of citation analysis was pioneered by the authors of [65, 66], and since then the development of citation analysis has been marked by the invention of new techniques and measures, the exploitation of new tools, and the study of different units of analysis. Some researchers present citation analysis as a method for research assessment [67], whereas others focus on the critical assessment of major problems in citation analysis techniques [68].

One of the popular methods for analysing the citation network to trace the evolution of a research area is path analysis. Path analysis is an effective network-based scientometric approach for mapping technological trajectories, exploring scientific knowledge flows, and conducting systematic literature reviews. It works on unweighted citation networks in two steps: (i) weight assignment, either through traversal-based schemes such as SPX [69, 70] or through the FV gradient method [71, 72], and (ii) tracing the connective threads of knowledge through search schemes such as forward (local) search, global standard search or critical path method [69], backward (local) search, key-route (local) search, and key-route (global) search [73]. Some previous studies have utilized these weight assignment and search schemes to obtain the evolutionary trajectories of knowledge flows in citation networks, for example, in medical treatment [74, 75], human resource development [76], archaeology [77], and information technology for engineering [71, 78]. All these studies utilized the important paths to understand the evolution of the specific research domain. A brief description of the SPC (the most significant among SPX) and FV gradient weight assignment methods and of the search schemes is given below:

3.2.2. Weight Assignment Methods

(1) SPC Method: A Short Revisit. As mentioned earlier, the SPC method is one of the weight assignment methods in the SPX category (traversal-based weight assignment methods) developed by Batagelj [70]. In the SPC method, all sources and sinks in the citation network are identified and, for each arc, the number of source-to-sink search paths passing through it is counted. This count is assigned to the arc as its weight, thereby converting the unweighted network into an SPC weighted network.
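As a minimal sketch of the counting step described above, the following Python fragment computes SPC weights on a toy directed acyclic network. It assumes arcs are given in the direction of knowledge flow (cited work to citing work), and the five-node network and the function name `spc_weights` are hypothetical. It uses the standard identity that the number of source-to-sink paths through an arc (u, v) equals the number of paths from any source to u times the number of paths from v to any sink.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def spc_weights(arcs):
    """Search path count (SPC) of every arc in a directed acyclic network."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # TopologicalSorter expects a mapping node -> predecessors.
    order = list(TopologicalSorter({n: pred[n] for n in nodes}).static_order())

    # Number of paths from any source to each node (a source counts as 1).
    from_sources = {}
    for n in order:
        from_sources[n] = sum(from_sources[q] for q in pred[n]) if pred[n] else 1

    # Number of paths from each node to any sink (a sink counts as 1).
    to_sinks = {}
    for n in reversed(order):
        to_sinks[n] = sum(to_sinks[s] for s in succ[n]) if succ[n] else 1

    return {(u, v): from_sources[u] * to_sinks[v] for u, v in arcs}


# Hypothetical toy network: A is the only source, E the only sink.
ARCS = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("B", "E")]
SPC = spc_weights(ARCS)
for arc, weight in SPC.items():
    print(arc, weight)
```

In this toy network there are three source-to-sink paths (A-B-D-E, A-B-E, and A-C-D-E), so the arcs A-B and D-E, each lying on two of them, receive weight 2 and the remaining arcs receive weight 1.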

(2) FV Gradient Method: A Short Revisit. A citation network, being an information network, is characterized by the flow of information/knowledge from one paper to another via citation links. In a citing-cited pair, this knowledge flow is opposite to the direction of the citation link. Due to this flow of knowledge, flow vergence, i.e., convergence or divergence of knowledge, occurs at some vertices. Thus, at a given time point of analysis, a work can be treated as being in either flow convergence mode or flow divergence mode, depending on which of the two dominates. Because the dominance in flow can increase and shift towards divergence, each work can be considered to have a potential to improve its flow vergence. This potential is termed the flow vergence potential of a work. It is expressed through what Prabhakaran et al. [71] term the flow vergence (FV) index, written here as $FV_i$ for a work $i$: a positive value of the index indicates works that are flow divergent and a negative value indicates works that are flow convergent, as shown in Figure 2 (right). More details about the theoretical origin of and the rationale behind the formulation of the FV index are given by Prabhakaran et al. [71], Lathabai et al. [72, 78, 79], and Prabhakaran et al. [80]. The existence of flow vergence potential also implies the existence of a difference, or gradient, in flow vergence potential between a citing-cited pair of papers. This potential difference between works i and j connected by an arc of citation from j to i is termed the FV gradient [72] and can be computed as follows:

$$\nabla FV_{ij} = FV_i - FV_j$$

In most cases, the FV gradient takes a positive value. However, in some special cases, when newer (citing) works tend to perform better than the cited ones owing to their inherent merit or innovative appeal, arcs with a negative FV gradient can be found, making knowledge appear to have flowed from a low-potential work to a high-potential work. This phenomenon was termed the flow vergence effect, or FV effect, and was used to detect [72] and predict [80] pivot papers of paradigm shift. Computing the FV gradients of all the arcs makes the network a signed weighted network. The FV gradient weights need to be transformed to take only positive values to comply with the requirements of the search schemes in the software Pajek [81]. The following transformation rescales the FV gradient weights to [1, 2], with negatively weighted arcs (which signify the FV effect) getting values close to 2:

$$w_{ij} = 1 + \frac{\nabla FV_{\max} - \nabla FV_{ij}}{\nabla FV_{\max} - \nabla FV_{\min}}$$

where $\nabla FV_{ij}$ is the FV gradient weight of the arc between works i and j, and $\nabla FV_{\max}$ and $\nabla FV_{\min}$ are the highest and lowest FV gradient weights assigned to links in the whole network.

3.2.3. Search Schemes

The important search schemes are the forward, backward, global, and key-route (global and local) searches. In forward search, among all the arcs originating from the sources, the ones with the highest weight are selected, and this procedure is repeated in a greedy fashion until sink papers are reached. When there are weight ties, all tied arcs are considered. Backward search differs from forward search only in direction: the search starts from the sink papers and proceeds until source papers are reached. In the global search method, instead of selecting an initial arc and its subsequent arcs in a greedy or ‘local best’ fashion, the total weight of every source-sink path (the sum of the weights of all the arcs that form the path) is computed, and the path with the highest total is taken as the critical path/global path. In key-route search (local) [73], instead of initiating the search from a source or sink (which risks missing the highest weighted arc in the network), the search commences from the terminal nodes of the key-route (the highest weighted arc) and proceeds greedily in both directions. In key-route search (global), instead of tracing greedily from the terminal nodes of the key-route, a global search is initiated from them. That is, among all the paths reachable from the citing paper and the cited paper of the key-route, the ones with the largest sum of weights are chosen. In both variants, multiple key-route search (selecting more than one key-route, i.e., several of the highest weighted arcs) can be used to retrieve multiple paths.
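To make the difference between the greedy (local) and global schemes concrete, here is a small Python sketch of forward local search and global search on a weighted acyclic network. It is an illustration under stated assumptions, not the Pajek implementation: arcs point in the direction of knowledge flow, the arc weights in `WEIGHTS` are hypothetical, after the first step ties are resolved per node by keeping every tied arc, and `global_search` returns a single best path even when several paths tie.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical arc weights on a toy network (A is the source, E the sink).
WEIGHTS = {
    ("A", "B"): 2, ("A", "C"): 1, ("B", "D"): 1,
    ("C", "D"): 1, ("D", "E"): 2, ("B", "E"): 1,
}


def _successors(weights):
    """Build successor lists and find the sources (nodes with no incoming arc)."""
    succ, heads = defaultdict(list), set()
    for (u, v), w in weights.items():
        succ[u].append((v, w))
        heads.add(v)
    sources = {u for u, _ in weights} - heads
    return succ, sources


def forward_local_search(weights):
    """Greedy forward search; returns the set of arcs on the retrieved path(s)."""
    succ, sources = _successors(weights)
    # First step: the best arc(s) among all arcs leaving any source.
    first = [(u, v, w) for u in sources for v, w in succ[u]]
    best = max(w for _, _, w in first)
    path_arcs = {(u, v) for u, v, w in first if w == best}
    frontier = {v for _, v in path_arcs}
    # Later steps: from each reached node, follow its heaviest outgoing arc(s).
    while frontier:
        next_frontier = set()
        for u in frontier:
            outs = succ[u]
            if not outs:  # u is a sink
                continue
            best = max(w for _, w in outs)
            for v, w in outs:
                if w == best and (u, v) not in path_arcs:
                    path_arcs.add((u, v))
                    next_frontier.add(v)
        frontier = next_frontier
    return path_arcs


def global_search(weights):
    """Source-to-sink path with the largest total weight: (total, path)."""
    succ, sources = _successors(weights)

    @lru_cache(maxsize=None)
    def best_from(u):
        if not succ[u]:
            return 0, (u,)
        options = []
        for v, w in succ[u]:
            total, tail = best_from(v)
            options.append((w + total, (u,) + tail))
        return max(options)

    return max(best_from(s) for s in sources)


LOCAL_PATH = forward_local_search(WEIGHTS)
GLOBAL_PATH = global_search(WEIGHTS)
print(sorted(LOCAL_PATH))
print(GLOBAL_PATH)
```

On this toy network the greedy search keeps both tied arcs leaving B, so it returns the arcs A-B, B-D, B-E, and D-E, whereas the global search returns only the single heaviest path A-B-D-E with total weight 5. Backward search would follow the same logic on the reversed network.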

Paths retrieved from the SPC weighted network using the above-mentioned search methods can generally be termed SPC paths, and paths retrieved from the FV gradient weighted network can be termed FV paths. The advantage of retrieving both SPC and FV paths is as follows. Because an arc gains weight through its presence in many source-sink paths, arcs with high SPC weights tend to be concentrated in a few areas of the SPC weighted network. This hinders the retrieval of diverse vital paths in the network, even when multiple key-route search is used. As links with high FV gradient weights show relatively less tendency to be concentrated in a few areas of the network, the FV gradient method of weight assignment is found to be appropriate for retrieving more diverse paths within a discipline and even for retrieving interdisciplinary paths [82]. FV gradient computation can be treated as a weight assignment method that probably has the ability to highlight paths that might not be highlighted through SPX methods, and hence the use of an integrated approach that makes use of both weight assignment methods was recommended by Lathabai et al. [78].

3.3. Combining Bradford Distribution and Path Analysis in the Analytical Framework

The proposed analytical framework takes advantage of the strengths of its constituent techniques: Bradford distribution and path analysis. In addition to the traditional application of the Bradford law, the framework utilizes the journal productivity distribution to explore the dynamic behaviour and sub-disciplinary variations related to the field of AI. The path analysis part, in turn, includes a redundancy index, a new metric for the selection of important paths, and explores the important knowledge trajectories in AI to identify popular methods or tools, major thematic areas, and issues that attract attention in the discipline.

Hence, the Bradford distribution part provides a macroscopic understanding of the evolution, whereas the path analysis part of the framework provides a microscopic view of specific developments and thematic evolution in the field of study. When integrated, the two parts provide a holistic picture of the developments within the field; the two approaches thus complement each other well. A schematic diagram of the methodology is presented in Figure 2.

The metadata obtained for AI (described in the data description section) consisted of sixty-nine attributes, as provided in the data downloaded from Web of Science. Of these, to analyse journal productivity, the study mainly utilized the Source Title (name of the publishing journal) and the ISSN fields, and the total number of publication records in each journal was identified for each year. The Article Count is the variable used to represent the number of articles published in a selected journal in a given year. For each year, the journals are then ranked in decreasing order of the number of research articles published in them. The Compounded Annual Growth Rate (CAGR) of the number of publications and of the number of journals was also computed. The CAGR is defined by the following expression:

$$\mathrm{CAGR} = \left(\frac{N_{2020}}{N_{2010}}\right)^{1/t} - 1$$

where $N_{2020}$ is the number of publication records in the year 2020, $N_{2010}$ is the number of publication records in the year 2010, and $t$ is the time period in years. In a similar way, the CAGR for the number of journals was also computed.
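As a quick arithmetic check of the CAGR definition, the sketch below applies it to the article and journal counts for 2010 and 2020 reported in the Results section. Taking t = 10 (the number of yearly intervals between 2010 and 2020) is an assumption, and the function name `cagr` is hypothetical.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compounded annual growth rate: (end / start)^(1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Counts reported in the Results section; t = 10 is an assumption.
ARTICLE_CAGR = cagr(536, 7927, 10)
JOURNAL_CAGR = cagr(365, 2377, 10)
print(f"articles: {ARTICLE_CAGR:.2%}, journals: {JOURNAL_CAGR:.2%}")
```

With these inputs the rates come out at roughly 31 percent for articles and about 20.6 percent for journals, close to the values reported in the Results section.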

The equations of the Leimkuhler model for the application of the Bradford law, (3) and (5), are then applied for further analysis of journal productivity and year-wise characteristics. The Bradford multiplier has been calculated and the journals for each year are divided into three groups, namely core, allied, and periphery. (As mentioned by Brookes, the following implicit conditions test conformity with Bradford’s law: (i) the number of articles in each group must remain constant during the distribution of journals; (ii) the Bradford multiplier k must be >1; (iii) the Bradford multiplier k must remain approximately constant.) These groupings are further examined to observe the characteristics of the journal productivity distribution using several properties such as concentration, dispersion, and other productivity proportions of journals and articles.

This metadata is further processed to look up the subject categories of the journals that published in 2010 and in 2020, using the Web of Science Master Journal List. The resulting subject categories from each productivity group are plotted on a density plot using VOSviewer to visualize the breadth of subject areas covered by the AI literature.

The study utilizes the network scientometrics approach (path analysis); for this purpose, the AI research publication data (as described below) are used to create the citation network from publication details and cited reference information. This network is further analysed using the approach described by Lathabai et al. [78]. For more details, please refer to the path analysis section. An unweighted citation network of publications has been generated and converted into a weighted directed acyclic network (DAN) using the SPC (Search Path Count) and FV (Flow Vergence) gradient methods. Following the framework of Lathabai et al. [78], we utilize SPC and FV gradient weight assignment, and therefore the retrieved paths are the SPC-forward path (SPCFW), SPC-backward path (SPCBW), SPC-keyroute path local (SPCKR), SPC-keyroute path global (SPCKRG), SPC-critical path (SPCCPM), FV-forward path (FVFW), FV-backward path (FVBW), FV-keyroute path local (FVKR), FV-keyroute path global (FVKRG), and FV-critical path (FVCPM). Further, a redundancy matrix of the paths is computed to analyse the suitability of particular search schemes. Based on its results, the most important evolutionary paths are explored further to capture the diversity of themes along with the important techniques and methods emerging in the AI domain.

3.4. Data Description

This study has utilized research publication data in AI for the eleven-year period from 2010 to 2020, collected from the Web of Science (WoS) database. This recent period was selected because it is a period of rapid growth and transformation in AI research. The search process comprised multiple queries on three fields, namely “topic = Artificial Intelligence”, “year = XXXX”, and “document type = article”, where XXXX ranged from 2010 to 2020. The data were collected year-wise. Only the document type ‘article’ was selected because the study applies the Bradford law of journal productivity, in which the journal is the publication source; moreover, articles are the main document type through which journals report original research in a discipline. The search resulted in a total of 21,278 publication records. These data were retrieved during October 2021 using the WoS Advanced Search feature. The data were then processed using the analytical framework described above. The framework developed has the potential to be applied in diverse areas.

4. Results

An overall trend of growth can be observed, with the number of articles growing from just 536 across 365 research journals in 2010 to 7,927 articles across 2,377 journals in 2020 (Figure 3). This corresponds to a CAGR of 30.99 percent in the number of research articles and 20.60 percent in the number of journals. For the period 2016–2020 alone, however, the number of journals increased about fourfold, from 591 to 2,377, which points to near-exponential growth. It shows that over the last decade, the number of journals as well as articles in AI research has increased significantly. This observation is also consistent with the findings of earlier studies in specific domains of AI applications [83].

4.1. Application of Bradford Law for Analysis of Growth of AI Literature
4.1.1. Distribution of Journals and Bradford Groupings

Further, to observe the productivity trends, an in-depth analysis of journal coverage has been carried out using the Leimkuhler model of the Bradford law. The basic assumptions for this are as follows:

The number of Bradford groupings is p = 3 (equation (3) is used to calculate the Bradford multiplier and equation (5) to calculate the number of journals in the core group). Using (3) and (5), the Bradford multiplier for each year (2010–2020) is calculated and used to determine the yearly distribution of journals across the three productivity groups in the form $r_0 : r_0 k : r_0 k^2$ (Table 1).

On closer observation, it can be seen that the number of journals in the core group remains almost constant during the eleven-year period (with a sharp increase in 2015, which is probably due to the expansion in journal coverage of the WoS database in that year). In the allied and periphery groups, on the other hand, the number of journals increases significantly. The increase in the number of journals in the periphery group is very rapid from 2018 to 2019 and 2020. The number of journals in the allied group also shows a significant increase during the last three years. To further analyse the dynamics at the level of individual groups, certain quantitative estimations are conducted as follows.

4.1.2. Quantitative Estimates of the Changes in Journal Productivity Group

The study by Oluić-Vuković [84] provides an approach for examining the variations of the productivity distribution in any subject domain over the years.

Some of these characteristics are as follows:
(i) Proportion between the number of the least productive journals and the maximal value of the journal productivity variable: Here, the number of least productive journals is approximated as "a", i.e., the number of journals containing only one paper, and the maximal value of the journal productivity variable is the number of papers published by the most productive journal. This proportion expresses the number of least productive journals relative to the maximum journal productivity.
(ii) Ratio between the number of the least productive journals and the total number of journals (a/T): This ratio shows the productivity level and contribution of the journals featuring in the list of least productive journals.
(iii) Ratio between the number of articles in the least productive journals and the total number of articles (a/A): It evaluates the share of all articles that belongs to the least productive journals.
(iv) Comparison of dispersion versus concentration of papers (periphery vs. core ratio): Dispersion is a property defined for journals in the periphery group. It is the number of articles published in these journals, which represent a very wide collection of subject areas. Concentration is a property of journals in the core group. It is defined as the total number of articles published in core group journals. This ratio shows the overall width of the subject categories covered by publications in the domain.
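The four characteristics listed above can be computed from per-journal article counts, as in the following minimal sketch. The function name, the dictionary keys, and the toy data are hypothetical; the dispersion and concentration terms follow the article-count definitions given in item (iv), and the assignment of journals to the core and periphery groups is assumed to be available from the Bradford grouping.

```python
def productivity_characteristics(articles_per_journal, core, periphery):
    """Characteristics (i)-(iv) for one year.

    articles_per_journal: dict mapping journal name -> number of articles
    core, periphery: collections of journal names in those Bradford groups
    """
    counts = list(articles_per_journal.values())
    a = sum(1 for c in counts if c == 1)   # least productive journals
    y_max = max(counts)                    # most productive journal's output
    total_journals = len(counts)           # T
    total_articles = sum(counts)           # A
    dispersion = sum(articles_per_journal[j] for j in periphery)
    concentration = sum(articles_per_journal[j] for j in core)
    return {
        "a": a,
        "y_max": y_max,
        "a_over_y_max": a / y_max,
        "a_over_T": a / total_journals,
        "a_over_A": a / total_articles,
        "periphery_over_core": dispersion / concentration,
    }


# Toy data for illustration only (not taken from the study).
toy_counts = {"J1": 5, "J2": 3, "J3": 1, "J4": 1, "J5": 1, "J6": 1}
toy_result = productivity_characteristics(
    toy_counts, core={"J1"}, periphery={"J3", "J4", "J5", "J6"}
)
print(toy_result)
```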

The calculated characteristics are presented in Table 2. The parameters shown in Table 2 highlight the variations observed in AI research over the years. The number of articles in the most productive journal for a given year has increased drastically, with the output of an individual journal rising from 21 articles per year (2010) to 476 articles per year (2020). The most productive journal for the year 2020 (IEEE Access) contributes approximately 6% of all articles published. The proportion of the cumulative number of articles in the least productive journals with respect to the articles in the most productive one has decreased slowly from 2010 to 2020. An interesting observation concerns the number of least productive journals (i.e., journals publishing exactly one article per year on AI): there is a constant increase in the number of least productive journals ("a"), from 272 in 2010 to 1,329 in 2020. However, the ratio of the least productive journals to the total number of journals on AI (a/T) decreases with time, showing that the overall productivity of journals is increasing, i.e., more journals publish multiple research articles per year on AI than in 2010. The decreasing share of the least productive journals, together with the sharp increase in the output of the most productive journal, indicates a definite trend toward increasing article concentration for each journal over time.

These observations necessitated a deeper analysis of the publication trends, which has been conducted using the Bradford groupings of journal productivity shown in Table 1. The ratio between the number of journals in the periphery group and the number of journals in the core group (periphery/core) has increased rapidly, underlining that the application base of AI has also become wider over time. Hence, this quantitative evaluation shows that AI literature has grown not only in volume (number of articles and journals) but also in terms of its applications in diverse subject areas (width of coverage).

4.1.3. Dispersion of Subject Categories among the Productivity Groups

To assess the variety of subject areas further, the subject categories of AI literature for the years 2010 and 2020 are compared (Figure 4). These subject categories, when viewed along with the characteristics of the journal productivity distribution (Table 2), indicate the following:
(1) In the core group, the concentration of articles in journals and in each subject category is high and also increased from 2010 to 2020. In 2010, these journals covered topics related to engineering, computer science, clinical and medical research, and environmental sciences, whereas in 2020, telecommunication, healthcare and medical sciences, sustainable sciences and technology, applied physics, and analytical chemistry are the most active subject categories in these journals (Figure 4).
(2) In the allied group, there is significant growth in the number of articles, journals, and subject categories. In 2010, the main categories in this group were applied mathematics, engineering, atomic and molecular physics, and some interdisciplinary applications. In 2020, the subject categories changed to include agronomy; public, environmental, and occupational health; hospitality, leisure, sports, and tourism; and engineering applications (Figure 4). This means that both concentration and dispersion increased in this group.
(3) In the periphery group, the increase in the number of journals as well as subject categories is larger than in the other two groups. In 2010, subject categories in this group included engineering, material sciences, social sciences, oncology, and rehabilitation. In 2020, several new subject categories, including food sciences and technology, behavioural sciences, medical informatics, library and information sciences, nanoscience and nanotechnology, neuroimaging, and allied engineering disciplines, featured in this group (Figure 4). This signifies that the area of application of AI has now become much more diverse than before.

Based on these observations, it can be inferred that the growth in the allied and periphery groups is significantly high, indicating that AI research is now coming from a very wide range of subject areas. The increase in dispersion signifies that diverse subject domains, for example, food science and technology, neurosciences, nanoscience, and health science, have started applying AI tools to address their problems. At the same time, some innovative subject categories, such as green and sustainable sciences and technology, have been applying AI technology consistently and expanding their publication output over time, and have thereby entered the core group. Hence, the growth of AI-based research reflects not only greater dispersion across journals but also a higher concentration of literature within them.

With the broad trends in journal productivity within AI literature established, further analysis of the data was performed to better understand the disciplinary applications of AI, their knowledge contribution, and the variations in the techniques associated with them. The path analysis technique is applied to the data for this purpose.

4.2. Application of Path Analysis
4.2.1. Evolution of the Major Trajectories Contributing to the Application Areas

The first step is the creation of a citation network of publications in AI literature for the period 2010–2020. This network consists of 21,278 papers and 18,810 links. It is shown in Figure 5 for illustration.

This citation network is then checked for acyclicity, and two directed cycles are found in the network (Figure 6). To make the network acyclic, the vertex shrinking method is used. Upon shrinking, the cycles shown in Figure 6 are represented by single vertices labelled #14996 and #11417, respectively.
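The acyclicity check and the shrinking step can be sketched as follows, assuming the network is available as a list of directed (source, target) pairs. The sketch finds strongly connected components with Kosaraju's algorithm and collapses each component into a single vertex; labelling the shrunk vertex with the smallest identifier in the component is an assumption made for illustration, and the function names and toy network are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm with iterative depth-first search."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # First pass: record nodes in order of DFS completion.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in seen:
                    seen.add(child)
                    stack.append((child, iter(succ[child])))
                    break
            else:  # all children explored
                order.append(node)
                stack.pop()

    # Second pass: sweep the reversed graph in reverse completion order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = [root], [root]
        while stack:
            node = stack.pop()
            for parent in pred[node]:
                if parent not in assigned:
                    assigned.add(parent)
                    component.append(parent)
                    stack.append(parent)
        components.append(component)
    return components


def shrink_cycles(nodes, edges):
    """Collapse every directed cycle into one vertex; return labels and DAG edges."""
    label = {}
    for component in strongly_connected_components(nodes, edges):
        representative = min(component)  # assumption: smallest id names the vertex
        for node in component:
            label[node] = representative
    dag_edges = {(label[u], label[v]) for u, v in edges if label[u] != label[v]}
    return label, sorted(dag_edges)


# Toy network (hypothetical): papers 2 and 3 cite each other.
toy_nodes = [1, 2, 3, 4, 5]
toy_edges = [(1, 2), (2, 3), (3, 2), (3, 4), (4, 5)]
toy_label, toy_dag = shrink_cycles(toy_nodes, toy_edges)
print(toy_label, toy_dag)
```

A component with more than one member signals a directed cycle, so the same routine serves both as the acyclicity check and as the preprocessing step for weight computation.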

Next, SPC weights and FV gradient weights are computed on this acyclic citation network. After the weights are assigned, the important SPC paths and FV paths are determined on the SPC-weighted and FV gradient-weighted networks using search schemes such as Forward (local), Backward (local), Key-route (local), standard global (or CPM), and Key-route (global). In the same manner, the SPC backward (local) path, FV forward (local) path, SPC Key-route (local) path, FV Key-route (local) path, SPC CPM path, and FV CPM path of the AI network were generated and assessed.
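The SPC weighting step can be illustrated with a short sketch. It assumes the usual definition of search path count, in which the weight of a link (u, v) is the number of source-to-sink paths that traverse it, obtained as the number of paths from any source to u multiplied by the number of paths from v to any sink. The function name and toy network are hypothetical, and the FV gradient weights are not covered here.

```python
from collections import defaultdict, deque


def spc_weights(edges):
    """Search path count for every link of an acyclic network."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    indegree = {n: len(pred[n]) for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("network contains a directed cycle; shrink cycles first")

    # Number of paths reaching each node from any source.
    from_sources = {}
    for n in order:
        from_sources[n] = sum(from_sources[p] for p in pred[n]) if pred[n] else 1
    # Number of paths leaving each node towards any sink.
    to_sinks = {}
    for n in reversed(order):
        to_sinks[n] = sum(to_sinks[s] for s in succ[n]) if succ[n] else 1

    return {(u, v): from_sources[u] * to_sinks[v] for u, v in edges}


# Toy acyclic network (hypothetical) with three source-to-sink paths:
# A-B-D-E, A-B-E and A-C-D-E.
toy_links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("B", "E")]
toy_spc = spc_weights(toy_links)
print(toy_spc)
```

Links with high counts lie on many knowledge-flow routes, which is what the local, global, and key-route search schemes then exploit when extracting main paths.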

(1) Redundancy Index Computation. Depending on the structure of the network, common papers can be found in different subnetworks or paths. The presence of common papers can be used to choose which paths to take forward for thematic analysis. Suppose that, for a pair of paths $P_i$ and $P_j$, the number of common papers satisfies $|P_i \cap P_j| = \min(|P_i|, |P_j|)$; then the path with fewer papers is a complete subset, or sub-path, of the path with more papers. Thus, the closeness of the redundancy ratio (RR), computed as shown in equation (10), to 1 indicates how redundant it is to analyse the smaller of the two paths, since most of its papers are already covered by the larger path. A threshold can be fixed (say, 0.75) above which this redundancy is confirmed and only the larger path needs to be considered for analysis.

Upon computation of this ratio for all pairs of paths (shown in the RR, or redundancy, matrix in Table 3), the RR values of all pairs, except for the entries shown in normal font, are found to be either at least 0.75 or irrelevant (because one path of the pair already has a value of at least 0.75 in another pair). The matrix is constructed in such a way that, upon the first occurrence of an RR value ≥0.75 (see bold elements), the smaller path of the pair is treated as redundant (its name is marked in red in the row and column headers) and is not considered for further analysis, even if it had some nonredundant RR values until then. These earlier nonredundant RR values of a redundant path are then marked in red as well. Finally, the paths (header names not marked in red) associated with entries in normal font (neither bold nor red) are eligible.
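The redundancy screening can be sketched as follows. Equation (10) is not reproduced in this section, so the sketch assumes RR is the number of common papers divided by the size of the smaller path, which is consistent with the subset condition described above. The elimination order (pairs visited in the order the paths are listed, with already redundant paths skipped) is likewise an assumption, and the function names and toy paths are hypothetical.

```python
from itertools import combinations


def redundancy_ratio(path_a, path_b):
    """Assumed RR: shared papers divided by the size of the smaller path."""
    a, b = set(path_a), set(path_b)
    return len(a & b) / min(len(a), len(b))  # paths are assumed non-empty


def select_nonredundant(paths, threshold=0.75):
    """Drop the smaller path of every pair whose RR reaches the threshold."""
    redundant = set()
    for name_a, name_b in combinations(paths, 2):
        if name_a in redundant or name_b in redundant:
            continue  # values involving a redundant path are irrelevant
        if redundancy_ratio(paths[name_a], paths[name_b]) >= threshold:
            smaller = min((name_a, name_b), key=lambda n: len(set(paths[n])))
            redundant.add(smaller)
    return [name for name in paths if name not in redundant]


# Toy paths (hypothetical paper identifiers, not taken from the study).
toy_paths = {
    "P1": [1, 2, 3, 4, 5],
    "P2": [2, 3, 4],      # fully contained in P1, so RR = 1
    "P3": [5, 6, 7, 8],   # shares one paper with P1, so RR = 0.25
}
kept = select_nonredundant(toy_paths)
print(kept)
```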

From Table 3, entries corresponding to SPC Key-route & FV Key-route (global) (0.31), SPC Key-route & FV backward (local) (0) and FV key-route (global) & FV backward (local) (0.4) are neither red nor bold. Thus, SPC Key-route, FV Key-route (global) and FV backward (local) paths can be taken forward for further analysis, as presented next.

4.2.2. Content Analysis of Selected Paths

Most of the publications that appear in the important paths deal with AI techniques applied to different problems from various fields. These themes were identified through careful manual analysis of the abstracts and content of these works.

(1) SPC Keyroute. From Figure 7, it can be seen that the SPC Key-route (local) path has two subnetworks. The left subnetwork is labelled SPC KR-1 and the right subnetwork SPC KR-2.

SPC KR-1: SPC KR-1 can be viewed as a chain of development arising from the convergence of two chains: (i) one starting from 220 Rajaee T, 2010 and 942 Nourani V, 2011, and (ii) one starting from 2632 Ghalkhani H, 2013 and 4267 Tsai WP, 2015. The first chain extends to the first cycle, which comprises 11417 Nguyen H, 2019, 15426 Nguyen H, 2020 and 15645 Nguyen H, 2020 (refer to Figure 6, right). This chain and the other chain converge at 15646 Shang Y H, 2020. This chain is dominated by AI applications in Environmental Management and Water Resource Management.

The second chain of SPC KR-1 starts with 2632 Ghalkhani H, 2013 and 4267 Tsai WP, 2015 and extends to 15645 Nguyen H, 2020. This chain is dominated by AI applications in Disaster Management. Water Resource Management is also addressed. Building Technology applications of AI begin to appear towards the end of this chain.

After the convergence of both these chains at 15645 Nguyen H, 2020, the path continues as another chain with occasional convergences and divergences. This chain is dominated by Environmental Management and Building Technology.

SPC KR-2: SPC KR-2 is mostly structured as a straight chain with occasional convergences and divergences. It starts from 593 Raja MAZ, 2010. This chain is dominated by applications of AI to solving nonlinear differential equations. An application of AI in Medicine and Healthcare is also present in this path.

(2) FV Keyroute (Global). From Figure 8, it can be seen that the FV Key-route (global) path has two subnetworks. The left subnetwork is labelled FV KRG-1 and the right subnetwork FV KRG-2.

FV KRG-1: FV KRG-1 can also be viewed as a chain of development arising from the convergence of two chains: (i) one starting from 355 Ghorbani MA, 2010, and (ii) one starting from 2127 Khandelwal M, 2013. Both chains converge at the first cycle, which comprises 11417 Nguyen H, 2019, 15426 Nguyen H, 2020 and 15645 Nguyen H, 2020 (refer to Figure 6, right). From there, the path branches into two chains, one short and one long. The short chain ends at 19878 Bai C, 2020, and the long chain ends at 19165 Li DY, 2020 and 21225 Nourani V, 2021. The first chain, which starts at 355 Ghorbani MA, 2010, is mostly dominated by AI applications in Water Resource Management until it reaches the first cycle, which deals with Environmental Management as discussed earlier. The second chain, which starts from 2127 Khandelwal M, 2013, mainly deals with Environmental Management. Thus, at the first cycle, AI knowledge from Water Resource Management and Environmental Management converges, and the evolution then continues in the long chain as a series of developments in Environmental Management. The short chain that starts from the cycle deals with Building Technology. This divergence from the first cycle is also interesting.

FV KRG-2: FV KRG-2 consists of three chains, starting from 606 Arneson B, 2010, 7233 Syam N, 2018 and 1444 Wang CS, 2012, that converge at the second cycle (comprising 15003 Campbell C, 2020, 14996 Paschen U, 2020 and 15920 Paschen J, 2020) (refer to Figure 6, left). From there, the path extends as three short chains that converge at 15002 Black J S, 2020. The first chain, from 606 Arneson B, 2010, deals with AI applications in Game simulation (of the board game Go) and Decision Support/Business Analytics. The second cycle deals with Decision Support/Business Analytics. The second and third chains, starting from 7233 Syam N, 2018 and 1444 Wang CS, 2012, deal with Decision Support/Business Analytics. From the second cycle, at which these chains converge, further development in the application of AI to different problems in Decision Support/Business Analytics can be found in the form of strategies.

Now, the third path remaining is FV Backward (Local).

(3) FV Backward (Local). From Figure 9, it can be seen that the structure of this path is relatively simple, with a divergence at the end. It starts with 1433 Gelly S, 2012, extends to 17610 Hoyer WD, 2020, and then diverges to 17607 Kraftt M, 2020 and 17611 De Bryun A, 2020. Up to the point of divergence, Game simulation is the dominant area of AI application; from there to the end, Decision Support/Business Analytics dominates.

4.2.3. Progression of AI Application Methodologies across the Most Impacted Sub-Disciplines

The path analysis exercise led to the identification of the important paths of knowledge. Content analysis of the nodes highlighted by the various important paths makes it possible to identify the evolution of AI techniques with respect to application areas. The content analysis of these research papers was conducted by manual examination of the title, keywords, abstract, and (in some cases) full text. Details about their themes, the methodologies followed in the respective application disciplines, the nature of these studies, and their findings were identified and tabulated. These are arranged in Tables 4–9 in the Appendix, each covering a selected number of years. From these tables, it can be observed that AI has evolved to such an extent that the combined application of multiple models (either layered or hybrid) is preferred in most of the application areas. The evolution of AI techniques along with the application areas from 2010 to 2020 is summarized in Figures 10 and 11. The major areas of application of AI methods are Water Resource Management, Nonlinear Differential Equation Solving, Game simulation, and Decision Support/Business Analytics (Figure 10), and Disaster Management, Environmental Management and Chemical Engineering, Medicine and Healthcare, and Building Technology (Figure 11). Among these, Water Resource Management is an application area in which research has been conducted consistently from 2010 to 2019.

These areas primarily make use of concepts from computational theory, machine learning, artificial neural networks (ANN), intelligent automation, and approximation reasoning. Researchers have experimented with several hybrid approaches that combine algorithms such as genetic programming (GP), ANN, recurrent neural networks (RNN), fuzzy logic, linear programming, wavelet transform, and support vector machines (SVM). AI-based methods are also being used to solve nonlinear differential equations, which arise in many real-world problems. These methods mostly combine ANN with genetic programming (GP), sequential quadratic programming (SQP), and optimization algorithms such as simulated annealing (SA) and particle swarm optimization (PSO). For instance, in 2010, ANN was combined with PSO to develop a model, ANN-PSO, to validate the effectiveness of AI in solving nonlinear Riccati differential equations (Table 4). In Game Simulation, applications of Monte Carlo Tree Search (MCTS) and deep neural networks (Deep NN) have evolved to verify the effectiveness of AI in the field. AI has also impacted decision support and business analytics through its potential for market analytics, B2B analytics, innovation analytics, digital analytics, and so on (Figure 10).

Environmental Management mainly involves machine learning techniques such as K-nearest neighbour (KNN), K-means, SVM, principal component analysis (PCA), principal component regression (PCR), and decision tree-based algorithms, namely, inference trees and classification and regression trees (CART). It also uses techniques such as ANN, gradient boosting, multivariate analysis, regression analysis, and fuzzy logic to generate empirical models for its problems. One of the most recent areas that AI has entered is Building Technology (Figure 11). Developments in AI are either somewhat slow to enter Medicine and Healthcare or are not well captured by our analysis (which uses only the top 10 key-routes for the key-route search).

A recent study by Yu & Ziang [28], which used research publications in twenty-five (25) core AI journals and nineteen (19) AI-related conference proceedings, deduced seven (7) main themes in AI research, namely, Intelligent Automation, Machine Learning, Natural Language Processing, Computer Vision, Computational Theory, Approximate Reasoning, and Artificial Neural Networks. In contrast, our study finds Machine Learning, Artificial Neural Networks (ANN), Computational Theory, Intelligent Automation, and Approximation Reasoning to be the prominent methods. Among these, Machine Learning and Artificial Neural Network-based algorithms/methods have become increasingly popular in the AI research domain. Wang et al. [21] underline application areas of AI such as weather forecasting, energy generation and management (e.g., battery design optimization, hydropower station location, magnetic control of nuclear fusion reactors), healthcare applications, high-resolution imaging, and differential equation solvers. These fields also feature in the results of this study and are further classified by specific techniques to provide a more detailed understanding. Thus, this study adds a new layer of detail by identifying the year-wise technological evolution trends within each domain of AI application and the studies that report these applications. The analytical results thus add significantly to the existing analysis of the thematic patterns of AI research reported in other recent studies.

The advance of AI as a replacement for existing methods in application areas such as water resource management, disaster management, and environmental management, along with early signs of its adoption in medicine and healthcare and in building technology, is intriguing, as most of these areas are directly or indirectly related to the Sustainable Development Goals. Similar trends can be observed in the other identified areas as well (for a more detailed understanding, please see Tables 4–9 in the Appendix).

5. Conclusions

This study proposes an analytical framework comprising Bradford's distribution of productivity and path analysis for the detailed analysis of AI research. It addresses the common criticism of bibliometrics as a shallow science by improving its capability for in-depth analytical evaluation of the growth and evolution of research in a discipline. The application of the framework presents both a macroscopic and a microscopic analysis of AI research.

The study finds that, over the period of study, the volume of AI research has increased significantly, both in terms of publications and publication venues. The analysis using Bradford's law showed that the publication span as well as the disciplinary span of AI research has broadened over the period of study. The concentration and dispersion of research journals in the three productivity groups (core, allied, and periphery) have shown an expansionary trend, indicating an increase in the related research activity (Table 2). A visualisation of the subject categories covered by the journals shows that more disciplines have come under AI applications from 2010 to 2020 (for example, behavioural sciences, food science and technology, nanoscience, health science, and green and sustainable sciences and technology). It is also observed that some subject categories, such as operations research and management sciences, neurosciences, and surgery, have shifted positions among the productivity groups from 2010 to 2020 (Figure 4).

Path analysis provides further insights into the development of important AI methods/models, which appear to co-evolve with the growing demand for sophistication and effectiveness in major application areas. Developments in specific AI application domains (namely, water resource management, nonlinear differential equations, disaster management, building technology, environmental sciences and chemical sciences, game simulation, and decision and business analytics) through knowledge flow, and sometimes through the convergence of knowledge or the adoption of methods attempted in one domain to deal with problems in another, are identified and discussed. Machine learning, artificial neural networks (ANNs), approximation reasoning, computational theory, and intelligent automation are the main concepts used in these fields. A general trend captured in AI development is that multiple and hybrid AI models and methods, such as genetic programming, artificial neural networks, fuzzy logic, support vector machines, regression analysis, gradient boosting, decision tree algorithms, and swarm intelligence (Tables 4–9), have been explored for solving problems in major application areas, some of which are directly and closely related to the SDGs (for example, water resource management and environmental (air quality) management) (Figures 10 and 11).

The findings from a study of this kind can serve as a practical guide on which policy makers can base their decisions. Such decisions may include strengthening the infrastructure and absorption capacities of the system in the relevant domains. The study not only helps in understanding the increasing disciplinary span of AI research but also identifies major thematic trends, application areas, and the major techniques and methods used in AI applications. One possible extension of this work is to improve the framework by incorporating more sophisticated text mining methods for determining the themes addressed by the works highlighted in the important paths. Upgrading the framework to a text mining-integrated framework can enable analysts to better utilize the potential of multiple key-route search and reduce the manual effort required for content analysis. Further, a more comprehensive analysis can be done by using data from other databases, such as Scopus or Dimensions, which have wider coverage than Web of Science [85]. Furthermore, a significant portion of research advances in the area of AI applications is reported in conference proceedings. These are not part of the present study because Bradford's law is proven to hold true for journal publications, and because conference papers are not suitably covered by scholarly databases such as Web of Science. The current study presents a detailed and systematic analysis of a large representative sample of journal papers, which may be further substantiated and extended by incorporating conference papers.

Appendix

The content analysis of the papers appearing in the major selected paths is presented in Tables 4–9.

Data Availability

The datasets generated during and/or analyzed during the current study will be made available on reasonable request.

Disclosure

A pre-print of this article is available [86] at https://www.researchsquare.com/article/rs-1806711/v1.

Conflicts of Interest

The authors declare that the manuscript complies with the ethical standards of the journal and there are no conflicts of interest whatsoever.

Acknowledgments

This work was partly supported by extramural research grant no.: MTR/2020/000625 from Science and Engineering Research Board (SERB), India, and by HPE Aruba Centre for Research in Information Systems at BHU (No.: M-22-69 of BHU), to the fourth author. Open Access funding enabled and organized by Projekt DEAL.