Technology forecasting is a critical management tool that is directly concerned with the future and determines the starting point of planning. Previous research has addressed the development of renewable energy technologies. Given countries' growing need to produce electricity in the face of limited resources, this research focuses on forecasting photovoltaic technology. To study technology in the field of solar energy, we examined patents extracted from the United States Patent and Trademark Office (USPTO) database, one of the best-known patent databases, for the period 2010 to 2020. Research gaps were then analyzed using artificial neural network clustering and an analysis of covered and uncovered keyword combinations. The results indicate that future photovoltaic research will move towards the third generation of the technology (organic materials) and will focus on environmental parameters and their effects on the performance of photovoltaic systems.

1. Introduction

New technological advances are the most important driving force for growth and innovation in fields such as information technology, energy supply, and engineering, and they are expected to significantly improve the quality of life in the future [1]. Technology development requires careful forecasting of future demand; in other words, gaining a competitive advantage in future technology requires low-error predictions of the significant variables. Recent research has identified a total of 150 technology forecasting techniques [2], of which 18 to 20 are currently used by private and public organizations for practical forecasting [3]. All forecasting activities involve drawing inferences from, and comparisons with, past experience. According to the most influential research in the field of technology forecasting [4, 5], technology forecasting techniques can be divided into the following categories:
(i) determining, organizing, and comparing patterns of past technological developments;
(ii) collecting and integrating the opinions of experts in the fields where forecasting is to be carried out.

One of the most important technology forecasting methods is patent analysis. Patent databases contain text and image documents that describe technologies and their applications. Analyzing the information disclosed in patents stored in international databases can clearly show the current state of scientific and technological innovation in a specific field [6]. More generally, the goal of knowledge discovery is “to extract potential information” [7], and patent analysis is one of the scientific, quantitative methods of technology forecasting [8]. Because research and development departments have limited budgets, they need to focus on technologies that are more likely to succeed and that can be commercialized quickly. Technology forecasting has a special place in many fields, and it is doubly important for strategic issues such as the energy crisis and for environmental problems such as global warming, greenhouse gas emissions, and air pollution [9].

Solar energy is used today in several ways, one of which is photovoltaic technology. The photovoltaic effect is the direct conversion of solar energy into electricity. One of the fundamental concerns of today’s world is the depletion of fossil fuels, currently the world’s most important source of energy; for this reason, solar energy has been considered an alternative to fossil fuels. It is therefore necessary to clarify the priority areas for inventions and innovations and to identify the technological fields with practical potential.

Photovoltaic (PV) technology has become a standard part of building terminology and can be applied in both existing and new buildings. Its uses in the building envelope are very diverse and open new avenues for creative designers. For example, semi-transparent photovoltaic modules can perform other building-envelope functions well in addition to generating electricity. If the overall effects and applications of photovoltaics in a building are carefully understood and taken into account in its overall design and energy concept, PV can be incorporated into multifunctional building components that take on other roles in the building envelope besides electricity generation.

Because fossil fuel use raises many concerns, it is essential to focus on renewable energies and related technologies such as PV. The main motivation of this study is therefore to forecast the technological evolution of PV. To this end, patent content analysis is used to depict the course of photovoltaic research around the world, and research gaps are identified using patent analysis methods and artificial neural network cluster analysis.

Patent information reflects the status of inventions worldwide, and the results of patent analysis can play a significant role in the informed selection of research projects. Accordingly, this research aims to determine the position of photovoltaic technology in the world based on the patents registered in this field and to identify research gaps for future work. The following questions are therefore addressed:
(i) How has patent registration in photovoltaic technology developed in the past?
(ii) Which combinations of scientific concepts and expressions making up the content of patents have been covered in the past?
(iii) What are the research gaps in this field, and which combinations of scientific concepts and expressions are proposed for the content of patents to guide future inventions?

Accordingly, the main contribution and novelty of this paper compared with other published work is that it addresses patent registration in photovoltaic technology and assesses future technological revolutions while considering the growth of human capital.

2. Research Background

Patent analysis has been applied in various fields, such as economic growth [10], determination of technological capacity at the national level [11], industry competitiveness [12], company competitiveness and technology management [13], identifying trends in new technologies [14], assessing potential and market value [15], and stock prices [16].

Yuan and Li used patent analysis to understand the development rate and competitive position of three-dimensional (3D) display technologies. Because this display technology is divided into three subcategories, namely holographic display, volumetric display, and two-dimensional distribution display, these three technologies were the targets of the patent search and retrieval. The study used the databases of the United States Patent and Trademark Office and the European Patent Office and examined related patents from 1976 to 2009; Patent Guider software was used to reduce the time and simplify the search process. The results showed that countries such as the United States, Japan, South Korea, and Taiwan are investing in 3D displays, and based on the patent analysis it was suggested that Taiwan focus its investments on holographic and two-dimensional distribution technologies [17].

Trappey et al. used patent data extracted from the Intellectual Property Office of the Republic of China to explore RFID technology developments and trends. They pointed out that China is one of the largest producers and consumers of RFID products and that, according to estimates, the country would need 3 billion RFID tags by 2009. They therefore investigated the patents registered in this field: a total of 1389 patents were retrieved, and by integrating patent content clustering with technology life cycle prediction, the progress of RFID technology in China was assessed.

Choi and Hwang proposed an optimal patent strategy for the fuel cell industry by integrating bibliometrics and patent analysis with the logistic growth curve model. The analytical results of this research showed that growth curves are an effective tool in quantifying how technology develops, using the cumulative number of published patents [18].
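The logistic growth curve mentioned above can be illustrated with a short sketch. Choi and Hwang's fitting procedure is not reproduced here; the code below assumes that the saturation level K (the eventual cumulative number of patents) is known in advance, so that ln(K/y - 1) = -r t + r t0 is linear in t and the growth rate r and inflection year t0 follow from ordinary least squares. The function names are illustrative, not taken from the cited work.

```python
import math

def logistic(t, ceiling, r, t0):
    """Cumulative patents y(t) = K / (1 + exp(-r * (t - t0)))."""
    return ceiling / (1.0 + math.exp(-r * (t - t0)))

def fit_logistic_known_ceiling(years, cumulative, ceiling):
    """Estimate r and t0 for a logistic curve whose ceiling K is assumed known.

    Linearization: ln(K / y - 1) = -r * t + r * t0, so an ordinary
    least-squares line through (t, ln(K / y - 1)) has slope -r and
    intercept r * t0. Every observation must satisfy 0 < y < K.
    """
    xs = list(years)
    zs = [math.log(ceiling / y - 1.0) for y in cumulative]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    r = -slope
    t0 = intercept / r  # inflection point: the year of fastest growth
    return r, t0
```

With real patent counts K is usually estimated as well (for example by nonlinear least squares); fixing it here keeps the sketch free of external dependencies.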

Chen et al. conducted a study titled “Patent Keyword Network Analysis to Improve Technology Development Efficiency,” in which patents related to LEDs and wireless broadband networks from 2000 to 2011 were reviewed using patent network analysis and patent keyword analysis. The keyword analysis revealed which words influence technology trends over time. The results show that the patent keyword network is scattered; however, the clustering results follow a strong power-law distribution. Among the extracted keywords, ten words have an important influence on the formation of the keyword network and on technology research in the fields of LEDs and wireless broadband networks [19].

Noh et al. conducted a study titled “Keyword Selection and Processing Strategy by Text Mining Method for Patent Analysis,” which aims to fill a research gap by focusing on keyword strategies that use text mining of patent data. Four questions were addressed: which elements of the patent text should be selected as keywords, which keyword selection method should be used, how many keywords should be selected, and how the selected keywords should be converted into a format suitable for data analysis. Keyword selection strategies were evaluated and compared with respect to these four factors using K-means clustering and entropy values. The results showed that NTF-IDF is the most effective keyword selection method [20].

Ni et al. presented a new patent roadmapping process based on generative topographic mapping (GTM) and the Bass model. The GTM method extracts keywords, forms keyword vectors, identifies patent gaps, and predicts the content of growing research fields. It also checks the status of patent applications and discovers gaps in technology development; these gaps indicate potential areas for technology development. The Bass model is used to predict the maximum number of patent applications in related technological fields and the approximate time at which new patents emerge [21].
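For readers unfamiliar with the Bass model, the sketch below evaluates its standard closed form. With innovation coefficient p, imitation coefficient q, and market potential m, the cumulative share adopted by time t is F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}), and for q > p the adoption rate peaks at t* = ln(q/p)/(p+q) with value m(p+q)^2/(4q). Applying it to patent filings treats m as the eventual number of patents; the parameter values in the test are hypothetical, and estimating p, q, and m from data is not shown.

```python
import math

def bass_cumulative(t, p, q):
    """Share F(t) of the eventual total m adopted (here: filed) by time t."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

def bass_rate(t, p, q):
    """Adoption density f(t) = dF/dt; multiply by m for filings per unit time."""
    e = math.exp(-(p + q) * t)
    return ((p + q) ** 2 / p) * e / (1.0 + (q / p) * e) ** 2

def bass_peak(m, p, q):
    """Time of the peak filing rate and the peak rate itself (valid for q > p)."""
    t_star = math.log(q / p) / (p + q)
    peak = m * (p + q) ** 2 / (4.0 * q)
    return t_star, peak
```

For example, `bass_peak(5000, 0.01, 0.4)` places the peak about nine years after the start, which is the kind of "approximate time" estimate the roadmapping process relies on.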

Wu et al. used a data mining approach to identify and classify the quality of new patents over time. They developed SOM-KPCA-SVM, an automatic patent analysis system that classifies patents by quality according to a set of quality indicators. SOM clustering successfully separated different quality groups, with statistically significant differences in the quality indices between groups. Kernel principal component analysis (KPCA) efficiently transforms the patent features into a nonlinear feature space and improves classification performance, and the support vector machine (SVM) builds a robust classification model for patent quality. The purpose of SOM-KPCA-SVM is to reduce the time, cost, and manpower required for patent analysis, so the system can determine patent quality in a short time [22].

Krestel et al. extracted hybrid control patents from the US patent website for 2015 to 2019, performed text mining on them, examined the keywords of this field, and finally identified the combinations not yet covered by existing patents [3].

Choi et al. used text mining to extract international patents in a field comprising subgroups such as pathology, radiology, and telesurgery. They also drew the life cycle curve for each subgroup, identifying its starting point, peak, and saturation [8].

Dutt and Scharma assessed the application of artificial intelligence to forecasting the performance of renewable energy systems and investigated the emergence of novel technologies [23]. Portus and Doma proposed a hybrid model for solar technology forecasting that combines a nonlinear autoregressive network with exogenous inputs (NARX) and gated recurrent units (GRU) [24].

The review of previous research shows that patent analysis is one of the most powerful methods for predicting innovation opportunities in technologies in different fields.

3. Methodology

The choice of forecasting method depends on various factors, especially the available time and financial resources and the forecasting goals [25]. There are various models and tools for analyzing and predicting the future of science, technology, and innovation [26–28]. However, to achieve acceptable and valid results, quantitative and qualitative methods should be combined wherever possible. For this reason, both qualitative (expert judgment) and quantitative (text mining) methods were used in this study. The steps of the proposed method are illustrated in Figure 1.

3.1. Forming a Group of Experts

Because of the interdisciplinary nature of this research, the participation of experts in electrical engineering, mechanical engineering, and physics was necessary. We therefore contacted prominent professors working on solar energy technology, especially photovoltaic technology, and obtained their agreement to cooperate. A group of experts was then formed, including professors from the power divisions of electrical engineering departments at different universities around the world. As the research framework shows, expert judgment supports every stage of the research [29]. The steps in which the structure of the research was shaped by the experts’ opinions are as follows:
(i) Introducing the primary keywords used to search for related patents.
(ii) Extracting related patents from the repository of identified patents.
(iii) Filtering the extracted keywords based on the text mining method.
(iv) Structuring the keywords extracted from patents according to the photovoltaic technology processes.
(v) Verifying the validity of the keyword clustering across iterations of the data mining process.

3.2. Forming a Database

In this research, after the relevant USPC and IPC codes were determined, terms such as Photovoltaic and Solar Photovoltaic were searched in the titles, abstracts, and keywords of patents on the USPTO website (as the complete patent data bank) for the period 2010 to 2020. In total, a database of 1288 patents was formed. Because the required information is both numerical and textual, and Excel can hold such a database, Microsoft Excel 2016 was used. The database stores information such as patent number, patent title, inventors, country, abstract, keywords, registration date, publication date, claims, and description.

3.3. Text Mining

Text mining means extracting implicit, unknown, and useful information from a huge amount of textual data. This method looks for valuable information such as relationships, trends, and patterns in textual data and is widely used to discover complex relationships in scientific texts and documents. Although this method helps to discover knowledge through computer analysis, the interpretation of the meaning and connections obtained from the information depends on the experts in that field. In other words, the reliability of a forecasting activity is closely related to the skill and knowledge of the experts from whom advice is sought. The text mining process used in this research is described in the following subsections.

3.3.1. Extracting Keywords

With text mining of patents, words and phrases with high frequency are identified. Then, using the opinion of experts, these words and phrases are filtered so that the most relevant ones are selected to form a technology dictionary. Accuracy is very important in this step, and the quality of the final search results depends to a large extent on this step. After forming the technology dictionary, these terms are searched again, and the frequency of occurrence of each in the text of the patents is obtained.

In this research, QDA Miner and WordStat software were used to perform the text mining process and to identify and cluster the keywords in this field. At this stage, the full text of all patents was entered into the software for preprocessing, and all the words used in these texts were extracted, yielding 6,854,031 words and phrases in total. A vector of keywords related to the topic was then identified through a four-step process guided by the experts’ opinions: word segmentation, removal of redundant words, stemming, and calculation of term weights. After the words were reviewed and checked, a dictionary of 80 keywords was obtained. Counting the frequency of each term in each patent yields the patent-keyword frequency matrix shown in Table 1, whose rows correspond to patents and whose columns correspond to keywords, so that each entry gives the frequency of a keyword in a patent.
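The internals of QDA Miner and WordStat are not described in the text, so the following is only a dependency-free sketch of the same sequence of steps: segmenting text into words, removing stop words, stemming, and counting dictionary terms per patent to build a patent-by-keyword frequency matrix like Table 1. The stop-word list and the suffix-stripping stemmer are hypothetical simplifications (a real pipeline would use a full stop-word list and a proper stemmer such as Porter's), and two-word dictionary entries are matched as adjacent stemmed tokens. Term weighting, the fourth step, corresponds to the NTF-IDF method in Section 3.3.2.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "with", "by", "on"}
# Crude suffix stripping; a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "s")

def tokenize(text):
    """Step 1: word segmentation into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Step 3: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Steps 1-3: segment, drop stop words (step 2), and stem."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]

def frequency_matrix(patent_texts, dictionary):
    """Rows = patents, columns = dictionary terms, cells = raw counts."""
    keys = [" ".join(stem(w) for w in term.lower().split()) for term in dictionary]
    rows = []
    for text in patent_texts:
        tokens = preprocess(text)
        grams = Counter(tokens)
        # Bigrams let two-word terms such as "power grid" be counted; note that
        # bigrams are formed after stop-word removal, a simplification.
        grams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
        rows.append([grams[key] for key in keys])
    return rows
```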

3.3.2. Normalizing Key Phrases

Weighting keywords by their term frequency (TF), that is, by how often they occur in a text, is a common method in text mining. Because patents differ in their number of words and pages, patent length affects keyword frequencies. To neutralize this effect, terms were weighted with the normalized term frequency-inverse document frequency (NTF-IDF) method proposed by Trappey [30], given in the following equation:

$$\mathrm{NTF\text{-}IDF}_{ik} = \frac{f_{ik}}{N_k}\,\log\frac{n}{n_i} \qquad (1)$$

where $f_{ik}$ is the frequency of the i-th term in the k-th text, $N_k$ is the number of words in the k-th text, $n$ is the total number of texts, and $n_i$ is the number of texts that contain the i-th term.
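A minimal Python sketch of the NTF-IDF weighting described above: each keyword count is divided by the length of its patent and multiplied by the logarithm of the total number of patents over the number of patents containing that keyword. It assumes the natural logarithm (the base is not stated) and a frequency matrix with one row per patent and one column per keyword. A keyword that occurs in every patent receives weight zero, because the logarithm of 1 is zero.

```python
import math

def ntf_idf(freq, doc_lengths):
    """NTF-IDF weights for a document-by-term frequency matrix.

    freq[k][i]     -- frequency of term i in document k
    doc_lengths[k] -- total number of words in document k
    weight[k][i]   = (freq[k][i] / doc_lengths[k]) * ln(n_docs / docs_with_term_i)
    """
    n_docs = len(freq)
    n_terms = len(freq[0])
    # Number of documents in which each term occurs at least once.
    doc_freq = [sum(1 for row in freq if row[i] > 0) for i in range(n_terms)]
    weights = []
    for row, length in zip(freq, doc_lengths):
        weights.append([
            (f / length) * math.log(n_docs / doc_freq[i]) if doc_freq[i] else 0.0
            for i, f in enumerate(row)
        ])
    return weights
```

In the test, the first keyword appears in both hypothetical patents and is therefore weighted zero, while the second appears only in the shorter patent and keeps a positive weight.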

3.3.3. Calculating the Correlation of Keywords

The correlation between variables measures how strongly they are related. Given the nature of the research data, Pearson’s correlation was used to measure the relationship between keywords, with the normalized keyword frequencies as input; the measurement therefore reflects the co-occurrence of words in the texts of the patents. Equation (2) gives the Pearson correlation between keywords i and j:

$$r_{ij} = \frac{\sum_{l=1}^{n}\left(w_{il}-\bar{w}_i\right)\left(w_{jl}-\bar{w}_j\right)}{\sqrt{\sum_{l=1}^{n}\left(w_{il}-\bar{w}_i\right)^2}\,\sqrt{\sum_{l=1}^{n}\left(w_{jl}-\bar{w}_j\right)^2}} \qquad (2)$$

where $n$ is the total number of documents, $w_{il}$ is the normalized frequency of keyword i in the l-th document, and $\bar{w}_i$ is the mean of $w_{il}$ over all documents.

The purpose of computing these correlations is to cluster the terms and to identify the known and neglected keyword combinations in this field. After the normalized keyword frequency matrix was obtained, the pairwise Pearson correlations between keywords, based on their co-occurrence in the patent texts, were calculated with SPSS. As shown in Table 2, the resulting matrix contains the correlation coefficients between keywords, which indicate how strongly the keywords are used together in patents.
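The study computed these correlations in SPSS; the sketch below is an equivalent plain-Python computation of the Pearson coefficient between every pair of keyword columns of a document-by-keyword weight matrix. Returning NaN for a keyword whose weight is constant across all documents is a choice made here, since the coefficient is undefined in that case.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when a keyword's weight never varies
    return cov / (sx * sy)

def keyword_correlation_matrix(weights):
    """weights[l][i] = normalized weight of keyword i in document l.

    Returns a symmetric keyword-by-keyword matrix of correlation coefficients.
    """
    columns = list(zip(*weights))  # one column (tuple) per keyword
    return [[pearson(ci, cj) for cj in columns] for ci in columns]
```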

3.4. Artificial Neural Network

Self-organizing maps (SOM) were first introduced by Kohonen [31], based on a model of retinal nerves. A SOM consists of two layers: an input layer and an output layer, called the map layer. Each neuron in the map layer is associated with a weight vector whose dimension equals that of the space being analyzed. After training, the network yields one weight vector per neuron, each representing a part of the analyzed space. If an appropriate number of neurons and appropriate network dimensions are chosen and the network is trained properly, the weight vectors of the map neurons can represent the analyzed space well.
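To make the training procedure concrete, here is a minimal Kohonen SOM in plain Python. It is not the algorithm implemented in Viscovery SOMine: it assumes a rectangular grid (the maps in this study are drawn with hexagonal cells), weights initialized from randomly chosen data samples, a linearly decaying learning rate, and a Gaussian neighbourhood whose radius shrinks during training. All parameter defaults are illustrative.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))

def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors (rectangular grid)."""
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # One weight vector per neuron, initialised from randomly chosen data samples.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])  # pull the neuron towards the input
            step += 1
    return weights
```

Each update moves a weight vector a fraction lr·h of the way towards the input; with lr0 ≤ 1 this fraction never exceeds 1, so the trained weights stay within the range of the training data.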

In the SOM output maps, each value of a feature in the weight vectors is assigned an RGB vector, and therefore a color, so that all values can be displayed on a color spectrum from dark blue (lowest value) to dark red (highest value). Various software programs support data analysis with the SOM method; Viscovery-SOMine was used in this research.
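The exact palette used by Viscovery-SOMine is not specified in the text, so the sketch below uses hypothetical color stops from dark blue through green to dark red and interpolates linearly between them after scaling a feature value into [0, 1].

```python
# Hypothetical color stops (position in [0, 1], RGB) from dark blue to dark red.
STOPS = [
    (0.00, (0, 0, 139)),    # dark blue: lowest value
    (0.25, (0, 128, 255)),  # light blue
    (0.50, (0, 200, 0)),    # green: middle of the range
    (0.75, (255, 200, 0)),  # yellow-orange
    (1.00, (139, 0, 0)),    # dark red: highest value
]

def value_to_rgb(value, vmin, vmax):
    """Map a feature value onto the color stops by linear interpolation."""
    if vmax == vmin:
        t = 0.5
    else:
        # Scale into [0, 1] and clamp values outside [vmin, vmax].
        t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    for (t0, c0), (t1, c1) in zip(STOPS, STOPS[1:]):
        if t <= t1:
            u = (t - t0) / (t1 - t0)
            return tuple(round(a + u * (b - a)) for a, b in zip(c0, c1))
    return STOPS[-1][1]
```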

Self-organizing maps are neural networks with unsupervised learning capability that can analyze complex spaces and cluster data into homogeneous groups, which is why the method was used for segmentation in this research. By learning the structure of the input patterns, these networks form a network of relationships among the variables that affect clustering and can classify new patterns appropriately with the least possible error. The framework for text mining based on the patent bank is shown in Figure 2.

4. Numerical Results

4.1. Statistical Analysis

This section presents statistical information about the 1288 photovoltaic patents collected in the database. Because the bibliographic information of these patents is needed for the data analysis, it was extracted for all patents from the USPTO website. Figure 3 shows the registration of these patents over recent years. According to this chart, the number of patents has been increasing for the last 31 years, which may reflect the attention that scientific and research centers have paid to improving the performance and use of this technology.

From the collected information, the countries with patents registered on the US patent website in the field of photovoltaic technology were identified. Figure 4 shows the number of patents registered by each of these countries: Germany ranks first with 173 patents, and Italy ranks second with 155 patents.

4.2. Cluster Analysis

In this research, the SOM-Ward hierarchical clustering method in SOMine was used to determine the boundaries of each segment and the optimal number of clusters. The training data of the network consist of 77 three-dimensional vectors, one per keyword, whose dimensions correspond to the three processes identified by the experts: Grid Connected, Stand Alone, and Hybrid. Figure 5 shows the final segmentation of the keywords into 7 clusters of homogeneous keywords. The homogeneity of the keywords in each cluster is determined by their degree of correlation with each of the three processes specified by the experts; that is, keywords associated more strongly with each of these three photovoltaic processes are identified and displayed as separate clusters. In this figure, each main cluster consists of several subclusters, drawn as hexagons. A subcluster can contain one or more keywords that are very similar in their correlation with the three processes under investigation, and the color intensity of a subcluster represents the average correlation of its keywords. Put simply, the keywords in the more intensely colored parts of Figure 5 have a high correlation with all three processes under investigation.
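Ward's criterion, on which SOM-Ward clustering builds, merges at each step the two clusters whose union increases the within-cluster sum of squares the least; for clusters a and b with sizes n_a and n_b and centroids c_a and c_b, that increase is n_a n_b / (n_a + n_b) · ||c_a - c_b||². The sketch below applies plain agglomerative Ward clustering directly to keyword vectors (one correlation per process). It omits any map-topology constraints SOMine may apply, and its brute-force pair search is only suitable for small inputs such as the 77 keywords here.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a), len(b)
    ca = [sum(col) / na for col in zip(*a)]  # centroid of a
    cb = [sum(col) / nb for col in zip(*b)]  # centroid of b
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * dist2

def ward_clusters(vectors, k):
    """Merge clusters bottom-up with Ward's criterion until k clusters remain.

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([vectors[p] for p in clusters[i]],
                                 [vectors[p] for p in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```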

After defining the boundaries of the segmentation, it is possible to analyze the characteristics of the terms and keywords placed in each cluster based on the processes of photovoltaic technology. Figure 6 shows the demarcation of the clusters.

Another output of the SOM method is feature maps that show the distribution of each of the clustering variables (the three analyzed processes) in the entire analysis space. In other words, the feature maps show the clustering space according to the processes under investigation and also based on the keywords in each subcluster (colored hexagons).

At the bottom of each map, a color spectrum from blue to red is shown along with its numerical equivalent; an example of this spectrum is shown in Figures 7–10. The color of each subcluster, read against this scale, indicates the correlation of the term, or the average correlation of the terms, in that subcluster with the process named at the top of the map. One advantage of the SOM method is that it shows the position of each cluster member within the clusters.

These maps make it possible to examine both the status of keywords in each process and the correlations between processes. Figures 8–10 show the feature maps for the grid-connected, stand-alone, and hybrid processes, respectively.

As shown in Figures 8–10, for example, the word Power Grid in cluster 7 is marked red in the Grid Connected map, indicating a high correlation between this word and that process. Examining all the words in this cluster shows that it contains words with a very high correlation with the Grid Connected process and a relatively low correlation with the Stand Alone and Hybrid processes. In another example, the words semiconductor and wind speed in cluster 2 relate similarly to the three processes, so they are placed together in one subcluster. The color of this subcluster shows that the two words have a high correlation with the Hybrid process (red), whereas the same subcluster appears green in the Grid Connected and Stand Alone maps, indicating a medium correlation with those processes.

In the feature maps, the distance between subclusters indicates how much the keywords placed in them differ. For example, the keywords on the right and bottom of the map differ most, in their degree of correlation with the investigated process, from the keywords in the left and top corners of the map. To illustrate such differences, and given the dimensions chosen for the map, some subclusters remain empty, with no keywords assigned to them. In other words, each subcluster in the feature map is assigned a hypothetical label of the features or keywords that could be categorized in it, based on its neighboring subclusters and its distance from other subclusters. If no keywords in the data match this label (in terms of their correlation with the investigated processes), that subcluster remains empty. Such empty subclusters can enlarge the area of a cluster while the number of data points in it stays the same.
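The reading of the feature maps and the origin of empty subclusters can be sketched as follows: each keyword vector is assigned to its best-matching map node, and for each process a node's value is the mean correlation of the keywords assigned to it; nodes that receive no keyword are empty and are returned as None. The function name, node weights, and keyword vectors are hypothetical. In a trained SOM, empty nodes still carry weight vectors, which correspond to the hypothetical labels described above.

```python
def feature_maps(keyword_vectors, node_weights, process_names):
    """Per-process mean correlation of the keywords mapped to each node.

    keyword_vectors -- {keyword: [one correlation per process]}
    node_weights    -- {(row, col): weight vector} from a trained map
    Returns {process: {node: mean or None}}; None marks an empty subcluster.
    """
    members = {node: [] for node in node_weights}
    for vec in keyword_vectors.values():
        # Best-matching node: smallest squared Euclidean distance to the keyword.
        bmu = min(node_weights,
                  key=lambda n: sum((a - b) ** 2 for a, b in zip(node_weights[n], vec)))
        members[bmu].append(vec)
    return {
        name: {node: (sum(v[p] for v in vecs) / len(vecs) if vecs else None)
               for node, vecs in members.items()}
        for p, name in enumerate(process_names)
    }
```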

5. Conclusion

New technological developments have created different approaches to building roofs and facades. For multipurpose building envelopes, the use of active and passive solar techniques is indispensable, and one of these techniques, which has become an integral part of building culture, is PV.

In this research, text mining of patents and cluster analysis with an artificial neural network were used to track the keywords of this field and their co-occurrence with the three processes in photovoltaic technology patents. By combining text mining and cluster analysis (quantitative methods) with expert opinion (a qualitative method), we determined the areas (keywords) that are highly correlated with each of these processes (dark red) and those with a low correlation (dark blue).

According to the achieved results in this research, some words did not have a high co-occurrence with any of the above three processes (dark blue color). This means that these areas have not yet received the attention of researchers worldwide in photovoltaic inventions with the mentioned processes. Examining these terms and keywords can play a significant role in finding new research areas in photovoltaic technology and directing future inventions.
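The gap-finding rule described here can be written as a simple filter: a keyword is a candidate research gap if its correlation with every one of the three processes is low. The study judged this visually (dark blue) together with the experts; the numeric threshold in the sketch is a hypothetical stand-in for that judgment, and the keyword values in the test are invented for illustration.

```python
def uncovered_keywords(keyword_vectors, threshold):
    """Keywords whose highest correlation with any process is below threshold.

    keyword_vectors -- {keyword: [one correlation per process]}
    Returns the candidates ordered from least to most covered.
    """
    gaps = [(max(vec), kw) for kw, vec in keyword_vectors.items() if max(vec) < threshold]
    return [kw for _, kw in sorted(gaps)]
```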

The results of this research can also be very useful for identifying investment opportunities for research and development in this area. Some of the remaining, uncovered combinations are identified below and presented for further research.

The materials used to make solar panels in photovoltaic systems span three generations of technology. The first generation is crystalline silicon, the most common bulk material for photovoltaic systems; depending on crystal type and size, bulk silicon is divided into monocrystalline and polycrystalline silicon. The second generation is thin-film technology, using materials such as gallium arsenide. The third generation is based on organic materials. Organic photovoltaic systems are less efficient than their counterparts, but their low manufacturing cost and other features such as flexibility make them suitable for nonindustrial uses. Types in this generation include dye-sensitized, polymer, and nanocrystal-based cells. The presence of the word organic and related terms such as dye-sensitized, polymer, and nano in the dark blue parts of the feature maps of every process shows that inventors have paid little attention to this generation of photovoltaic technology and its effects. A likely reason is that this generation is still at the laboratory stage and has not yet been commercialized; according to the experts in this technology, it therefore offers inventors a new research area.

Environmental parameters that affect the performance of photovoltaic systems include factors describing weather conditions. For example, the location and weather conditions of a photovoltaic installation affect how much solar energy is absorbed and how effectively the system operates. According to the experts, the presence of words such as cloudy, rainy, and wet in the dark blue parts of the feature maps indicates that the effect of these atmospheric conditions on the performance of photovoltaic systems has so far been neglected.

Data Availability

Data will be made available upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.