Analyzing the Investment Behavior in the Iranian Stock Exchange during the COVID-19 Pandemic Using Hybrid DEA and Data Mining Techniques
The main purpose of this paper is to investigate the effects of COVID-19 on the efficiency of industries, based on data from the Tehran stock market. A hybrid model of Data Envelopment Analysis (DEA) and data mining techniques is used to analyze investment behavior in the Tehran stock market. During the COVID-19 pandemic in particular, many companies have faced financial crises, which is why companies with inferior performance should be benchmarked against efficient companies. First, the financial data of investments in selected companies are analyzed using data mining approaches to recognize the behavioral patterns of investors and securities. Second, customers are clustered into 3 buying and 4 selling groups using data mining techniques. Then, the efficiency of companies active in the stock exchange is evaluated using input-oriented DEA. The results indicate that, among the 23 industries listed on the stock market in Iran, only nine were efficient in 2019. In 2020, the number of efficient industries further decreased to six. Comparing these results with those of a study conducted by other researchers in 2018 revealed that COVID-19 strongly affected the performance of industries: some industries that were efficient in the past, such as banking, became inefficient in the following year.
Creating a suitable environment for investors to invest in stock markets is essential for economic progress. Stable economic conditions usually lead to a predictable environment, which encourages investors to invest more reliably. The investment market plays an important role in allocating financial resources to companies in developing countries. Hence, identifying the key criteria for a safe investment is critically important. Processing big data on past investments in a specified field can be very useful for making efficient investment decisions, and data mining is one such processing method. Stock exchanges, which provide large amounts of historical data on previous investments, are a suitable resource for applying data mining approaches. Today, stock exchange markets are growing rapidly: every day, a large number of shares are traded, generating huge amounts of data. At the same time, global stock exchange markets are emerging, which can be considered an opportunity for investors. In such a situation, the main problem is selecting proper investment criteria and creating optimal portfolios. Economists and experts believe that competition among market actors improves the pricing efficiency of shares. Nevertheless, investors and other market participants usually assume that a random selection of shares is not a useful strategy. It is therefore important to know which factors affect the behavior of investors. Another issue that has created problems in Iranian investment markets is deficiencies in analysis. Many experts and investors believe that the price-to-earnings ratio (P/E), that is, the share price divided by earnings per share, is the main indicator of price changes. The lack of long-term models has pushed investors to employ short-term models for their analyses.
When a large number of investors invest their money based on a limited number of models, many people simultaneously concentrate their investments in a small number of companies or industries and disinvest from other industries at the same time. This intensifies problematic behaviors in the market. When economic variables fluctuate strongly, the profitability of companies can also be significantly affected, and in such a situation, investors do not trust the published data. Besides, operational conditions and efficiency might influence share prices. Traina showed that ratios in the financial statements have an impact on share prices. In other words, figures in the financial statements reflect the performance and efficiency of companies, and investment in efficient companies tends to produce higher returns. The main problem here is that evaluating the efficiency of companies with various criteria and indices is not an easy task, and decision-making based on a limited number of indicators is insufficient. Although evaluating companies by their financial statements is difficult, DEA models can help incorporate several criteria and indicators when assessing companies [8, 9]. In this study, the data are gathered from the Tehran stock exchange, and the efficiency of companies whose shares have been bought is determined and compared.
Since the emergence of COVID-19 in 2019, many aspects of our lives have been affected. Many countries were forced to lock down, and as a result, many industries stopped operating. The stock market values of many industries decreased dramatically, and many people lost their jobs. This research can help determine which companies in the Iranian stock market are efficient and which are not. In addition, inefficient companies can benefit from benchmarking against efficient companies.
Data mining is a technique for extracting patterns and finding correlations in big data sets for prediction purposes. It can be applied to increase revenue, improve customer satisfaction and relationships, reduce costs and risks, and so on. Data mining includes many methods; one of them is clustering, which is based on the idea of putting similar objects into the same cluster.
There are many methods for evaluating the performance of companies or businesses, such as the Balanced Scorecard or the EFQM (European Foundation for Quality Management) model. One of the most popular methods for evaluating company performance is Data Envelopment Analysis (DEA), which is based on linear programming. In this method, Decision-Making Units (DMUs) are evaluated by their input and output factors. Its two basic variants are the CCR model by Charnes et al. and the BCC model by Banker et al.
COVID-19 has spread rapidly in many countries, including Iran, where the number of cases has increased sharply since the end of February 2020. From the beginning of the pandemic, all public places were closed in Iran, and social distancing and home quarantine were encouraged. However, due to US sanctions against Iran and the resulting fragile economy, the government could not shut down the country. Hence, the number of patients and deaths increased dramatically. The government reduced the weekly working hours in government organizations, and most private companies and factories could not maintain their production levels. In some cases, the government ordered a one-week shutdown of the capital and some other cities. These issues led to a decline in the performance of companies.
Several studies conducted in Iran have investigated the effects of the COVID-19 pandemic. Ahmadi and Ramezani studied the effects of COVID-19 from an emotion-focused therapy perspective. Samadi et al. used wavelet coherence analysis to examine comovements between markets from September 2014 to June 2020, a period of intense uncertainty in Iran. To the best of our knowledge, no study has assessed the effect of COVID-19 on the Iranian stock market.
The Iranian stock market is the main source of price discovery for the most important goods, such as cement and other commodities, and for allocating capital to companies. Moreover, many people invest their money in it, and if stock prices fall abnormally, many people lose their savings. Therefore, the Iranian stock market plays a key role for both companies and individual investors.
Many papers have been published about stock markets worldwide and about the Iranian stock market, using diverse methods such as MCDM or DEA. However, previous studies on the Iranian stock market were carried out under normal conditions, so the following question remains open: Does the pandemic have a significant effect on the Iranian stock market?
The novelty of this paper lies in using a hybrid technique for clustering and efficiency assessment. We examine the performance of Iran's stock market during the COVID-19 pandemic and investigate the efficiency of Iranian companies listed on it.
First, well-known data mining approaches such as clustering methods are used to investigate the transaction behavior of investors based on 30 criteria. Afterward, Data Envelopment Analysis (DEA) is employed to assess the efficiency of investments. The contributions of this study thus have two parts. In the first part, data mining methods are used to study the behavior of customers in the Iranian stock market based on their transactions. The second part evaluates the efficiency of companies active in the Tehran stock exchange using DEA. The proposed approach is applied to all active industries in the Tehran stock exchange.
The remainder of this paper is organized as follows. Section 2 provides a brief review of past research. Section 3 discusses the proposed approach. The case study and results are presented in Section 4. Finally, Section 5 concludes the paper.
2. Review of Past Research
2.1. Studies Related to Efficiency Assessment in Stock Markets
In this section, we review a number of studies employing neural networks, clustering and DEA, and other data mining approaches to stock market analysis.
Hiransha et al. used novel deep learning architectures such as Multilayer Perceptrons (MLP), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and Convolutional Neural Networks (CNN) to predict the stock prices of companies listed on the National Stock Exchange of India and the New York Stock Exchange. The CNN performed better than the other methods because of its ability to capture sudden changes in the system during the forecasting phase.
Starting from the observation that neural networks appear superior to other methods in modeling nonlinear data, Montenegro and Molina worked on predicting the day-to-day value of stocks in the S&P 500 Index. They created a data set from the daily market activity values of each company in the S&P 500 Index between June 7, 2013, and June 6, 2018; a deep learning neural network was used to train the model, and feature selection analysis was used to determine the behavioral tendencies of the companies. The study is intended to help decision-makers improve their investment behavior.
Yazdi et al. measured the performance of 25 insurance companies in Iran (four public and the rest private) during the 2014–2015 period. They developed a new approach based on Data Envelopment Analysis (DEA) combined with fuzzy clustering. Considering employees, capital, and total assets as input variables and total costs, paid compensation, and profit as output variables, they concluded with the output-oriented CCR model that ten companies belong to an efficient cluster. Similar to our study, the authors analyzed DMUs (banks and companies listed on the stock market) using DEA and clustering methods.
Khedr et al. predicted stock market behavior using news sentiment analysis and data mining techniques, considering a data set containing the opening, high, low, and closing (OHLC) prices of the stocks of three companies traded on NASDAQ, together with three news articles per day about the market and each company. They found a strong correlation between news classified as positive or negative and fluctuations in stock prices. In addition, using the OHLC prices increased the accuracy of market behavior prediction up to 89.80%. Like our study, this work applies a data mining approach.
Karimi and Barati evaluated the financial performance of 72 companies from the automotive, pharmaceutical, petrochemical, and cement sectors traded on the Tehran stock exchange using the negative data-limited adjusted measure in DEA. In this study, financial ratios such as the asset turnover ratio, quick ratio, and current ratio serve as input criteria, whereas earnings per share, return on assets, net profit margin, and so on are output criteria. In total, fifty-eight companies were identified as efficient and the rest as inefficient; the efficient companies were then ranked using the Andersen and Petersen model. In contrast to our study, this paper did not cluster industries, and it was conducted under normal conditions, that is, without the effects of COVID-19 on the efficiency of companies.
Anouze and Bou-Hamad considered the Middle East and North Africa (MENA) region and investigated the effects of environmental factors on the performance of 151 banks in the years 2008–2010. They used DEA and statistical data mining techniques such as classification and regression trees (CART), conditional inference trees (CIT), random forests based on CART and CIT, bagging, artificial neural networks, and logistic regression. Fixed assets, deposits, equity, and personnel expenses are input variables, whereas loans, net income, and liquid sources are output variables. Using the importance rankings of the variables, they concluded that random forest and bagging are better statistical tools than the other methods for measuring bank performance. The study used DEA and data mining methods, but its data mining techniques differ from those in our analysis.
Chang et al. evaluated the portfolio performance of 44 textile companies in Taiwan from 2011 to 2018 using the nested dynamic network (NDN) DEA method. They applied a two-stage approach: first, the additive dynamic DEA model was used to find efficient financial assets; then, the NDN DEA model was used to evaluate portfolios consisting of the selected efficient assets over periods of three, four, five, or six years. This study also employs DEA, but with a different data mining method than ours.
Zhong and Enke evaluated 60 financial and economic factors related to the S&P 500 Index, using clustering and classification as data mining methods to predict daily stock market returns. The 60 factors in a data set of 2518 trading days between June 1, 2003, and May 31, 2013, were processed with Principal Component Analysis (PCA) to find the most important principal components. In the next stage, 12 new data sets were obtained from the adjusted data, and the daily direction of returns was predicted with an Artificial Neural Network (ANN) and logistic regression. The authors concluded that the ANN achieves higher accuracy than logistic regression, and that classification and cluster mining are important for reducing the size of the data and increasing its efficiency.
Rezaee et al. applied integrated methods to estimate the online financial performance of companies, using online data taken from the Tehran stock exchange in 2007–2012 at different time intervals. They used dynamic fuzzy C-means to update the number of clusters and the memberships, DEA to evaluate companies using financial ratios, and Artificial Neural Networks (ANN) to predict the future performance of companies. The difference from our approach is that we use the K-means method, whereas Rezaee et al. used fuzzy C-means.
Mehlawat et al. evaluated 20 risky assets, considering their credibility, by using variance and Conditional Value at Risk (CVaR) as risk measures together with liquidity and entropy criteria. Within this evaluation, randomly selected portfolios of different sample sizes, with risk and entropy as inputs and return and liquidity as outputs, were examined with DEA. Moreover, these risky assets were combined in a fuzzy multiobjective portfolio model, and its performance was evaluated. The paper also shows how a portfolio can be selected considering the risk of companies by using DEA and VaR.
Mashayekhi and Omrani proposed a new multiobjective model that incorporates DEA cross-efficiency into the Markowitz mean-variance model to select an investment portfolio. The model was applied to 52 companies in Iran's stock market. The results showed that the model achieved a more appropriate combination of risk and return than either the Markowitz model or DEA alone.
Table 1 provides an overview of methods used in this field.
As mentioned above, stock market evaluation is one of the most popular subjects in financial analysis, because the stock market is the engine of the economy in most countries. In some related studies, such as ours, DEA is used alone or in combination with data mining or Artificial Intelligence (AI) approaches. In DEA, DMUs are evaluated and categorized based on their data. In regression-based studies, the relationship between stock markets and the factors that affect them is estimated. Other approaches use AI to predict the performance of companies in the stock market. In our study, we evaluate the effects of the COVID-19 pandemic, one of the substantial issues that has affected economies worldwide and stock markets in particular. Specifically, we analyze which industries are efficient; inefficient industries should be benchmarked against efficient industries in order to improve.
2.2. Studies on Economic Effects of COVID-19
In the following, we provide a brief overview of studies related to the economic consequences of the COVID-19 pandemic in different countries.
Caraka et al. studied the effects of COVID-19 on the environment and economy of Indonesia; their statistical analysis shows that COVID-19 has a significant effect on the Indonesian economy. Albu et al. used a logistic model to predict the effects of the pandemic on the economy of Romania. Based on their simulation, the evolution of the pandemic was classified into four distinct phases, and three scenarios were considered to estimate the economic impact of the epidemic at three levels. The results showed that, in the long term, an economic program based on large investments could help restore growth levels both worldwide and in EU countries.
Grima et al. studied the mechanisms applied in the past to understand the challenges related to GDP. They used a simple statistical analysis of data collected from government websites, online statistics, published reports, trends, and internal data. The authors state that the research will help risk managers and leaders understand the devastating social and economic impact of such disruptions and act proactively to avoid a repetition and the negative effects of being unprepared.
Thorbecke studied the impact of COVID-19 on the US economy, considering stock returns for 125 sectors during the COVID-19 crisis. The paper investigates how both the macroeconomic environment and sector-specific factors affect returns, using several macroeconomic variables. Regression-based estimation equations were used to predict the stock price index. The study argues that stock prices are useful because they measure how investors expect shocks to affect future cash flows across sectors. The results show that the sectors impaired by idiosyncratic factors include airlines, aerospace, real estate, tourism, oil, brewers, retail apparel, and funerals.
3. Research Methodology
The main steps of the proposed approach are presented here. First, the essential financial indicators are determined and defined. Then, a clustering approach is applied to the companies considered in this study. Finally, the performance of each company and of the related clusters is determined through DEA.
3.1. Main Financial Indicators
Investigating the stock exchange requires a set of financial indicators. The most essential indicators were identified by consulting experts with sufficient experience in the field. They are defined as follows (see, e.g., [31, 32]):
Payments to shareholders include capital, share reduction, savings, profits, and losses.
Return on equity (ROE) shows how efficiently a company's management employs its resources to generate profits; it measures the rate of return on the owners' investment. Return on equity is calculated using

$$\text{ROE} = \frac{\text{Net income}}{\text{Shareholders' equity}}$$
Return on assets (ROA) is calculated by dividing the net income of the company (income after taxes) by its total assets, as shown in

$$\text{ROA} = \frac{\text{Net income}}{\text{Total assets}}$$
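As a minimal illustration of the two profitability ratios described above, the following Python sketch computes ROE and ROA from balance-sheet figures. It assumes the standard definitions (net income over shareholders' equity, and net income after taxes over total assets); the function names and the figures in the example are hypothetical and not taken from the study's data.

```python
def return_on_equity(net_income: float, shareholders_equity: float) -> float:
    """ROE: net income divided by shareholders' equity."""
    if shareholders_equity == 0:
        raise ValueError("shareholders' equity must be non-zero")
    return net_income / shareholders_equity


def return_on_assets(net_income: float, total_assets: float) -> float:
    """ROA: net income after taxes divided by total assets."""
    if total_assets == 0:
        raise ValueError("total assets must be non-zero")
    return net_income / total_assets


# Hypothetical figures for illustration only (not data from this study).
print(return_on_equity(120.0, 800.0))   # 0.15
print(return_on_assets(120.0, 2000.0))  # 0.06
```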
3.2.1. Clustering Approach
The first phase of the proposed approach is clustering the investment records in order to find subsets of data such that the variance within clusters is minimized and the variance between clusters is maximized. One of the main problems in clustering is determining the number of clusters, and several methods exist for this purpose. In this study, Wilks' lambda [34, 35] is used to determine the suitable number of clusters:

$$\Lambda_k = \frac{W_k}{T} \tag{5}$$

where $W_k$ is the within-cluster variance of a solution with $k$ clusters and $T$ is the total variance. When Wilks' lambda is plotted against $k$ (cf. Figure 1), the first jump in the curve indicates the optimal number of clusters.
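The Wilks' lambda criterion can be sketched as follows, assuming the scalar (trace) form in which the within-cluster and total variances are measured as pooled sums of squared deviations from the cluster means and from the overall mean. The function name and toy data are illustrative; in practice, the labels would come from a clustering run for each candidate $k$, and $\Lambda_k$ would be plotted against $k$ to locate the first jump.

```python
import numpy as np


def wilks_lambda(X, labels) -> float:
    """Within-cluster sum of squares divided by total sum of squares.

    Scalar (trace) form W_k / T. The multivariate Wilks' lambda uses
    det(W) / det(T) instead. Values near 0 indicate compact, well-separated
    clusters; a single cluster gives exactly 1.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        within += ((members - members.mean(axis=0)) ** 2).sum()
    return float(within / total)


# Toy data: two obvious groups of two records each.
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(wilks_lambda(X, [0, 0, 0, 0]))  # 1.0
print(wilks_lambda(X, [0, 0, 1, 1]))  # 1/201, about 0.004975
```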
Based on the data gathered from the Tehran stock exchange, the following variables are used to cluster the customers:
(i) Online buying and selling of shares
(ii) Time of buying and selling, divided into four periods (i.e., the first six months of 2019, the second six months of 2019, the first six months of 2020, and the second six months of 2020)
(iii) Company and industry whose shares have been sold
(iv) Amount of shares sold or bought
(v) Value of transactions, which equals the volume of transactions multiplied by the share price on that day [36–38]
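Two of the clustering variables listed above require a small amount of preprocessing. The sketch below maps a trade date to one of the four half-year periods and computes a transaction's value as volume times price. It assumes Gregorian calendar half-years (the text does not say whether the Iranian calendar was used), and the helper names are hypothetical.

```python
from datetime import date


def half_year_period(trade_date: date) -> str:
    """Map a trade date to one of the four half-year periods (2019-H1 ... 2020-H2).

    Assumes Gregorian calendar halves: January-June is H1, July-December is H2.
    """
    if trade_date.year not in (2019, 2020):
        raise ValueError("the study covers only 2019 and 2020")
    half = "H1" if trade_date.month <= 6 else "H2"
    return f"{trade_date.year}-{half}"


def transaction_value(volume: int, price: float) -> float:
    """Value of a transaction: number of shares traded times that day's share price."""
    return volume * price


print(half_year_period(date(2019, 3, 15)))  # 2019-H1
print(half_year_period(date(2020, 11, 2)))  # 2020-H2
print(transaction_value(1000, 2.5))         # 2500.0
```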
3.2.2. Analyzing the Efficiency Scores
Farrell used a method based on estimating production functions to measure the technical efficiency of a manufacturing company with one input and one output. Farrell also used the approach to evaluate the efficiency of the farming industry in the USA compared with other countries. In 1978, Charnes et al. developed Farrell's idea and presented a model that could measure efficiency in the presence of multiple inputs and outputs; this CCR model was the first formal Data Envelopment Analysis (DEA) model. In 1984, Banker et al. extended the model to variable returns to scale, yielding the BCC model. The difference between the CCR and BCC models is that returns to scale are constant in the CCR model, whereas they are variable in the BCC model.
Since the companies are treated as DMUs, they are analyzed using DEA in the next phases of this research. Experts were asked to divide the financial indices into two main classes, inputs and outputs. Based on the views of stock exchange experts, the indicators are divided into inputs (total assets, total debts, and payments to shareholders) and outputs (operational profits and losses, return on equity, return on assets, and sales). Figure 2 shows a company as a DMU.
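The input-oriented DEA model under the variable returns to scale assumption is used to evaluate the financial efficiency of companies. This model was first proposed by Banker et al. and is formalized in (6). Let $x_j = (x_{1j}, \dots, x_{mj})$ and $y_j = (y_{1j}, \dots, y_{sj})$ be the input and output vectors of $\text{DMU}_j$ for $j = 1, \dots, n$.

$$
\begin{aligned}
\max\ \theta_p &= \frac{\sum_{r=1}^{s} u_r y_{rp} + W}{\sum_{i=1}^{m} v_i x_{ip}} \\
\text{s.t.}\ & \frac{\sum_{r=1}^{s} u_r y_{rj} + W}{\sum_{i=1}^{m} v_i x_{ij}} \le 1, \quad j = 1, \dots, n, \\
& u_r \ge 0,\ v_i \ge 0,\ W \text{ free}.
\end{aligned} \tag{6}
$$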
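Here, $p$ is the DMU being evaluated in the set $\{1, \dots, n\}$, $\theta_p$ is the efficiency of DMU $p$, $u_r$ and $v_i$ are the weights assigned to output $r$ and input $i$ when solving the DEA model, and $W$ is a free variable. Model (6) is a fractional program, which cannot be solved directly as a linear program; it is therefore converted into the following linear form:

$$
\begin{aligned}
\max\ \theta_p &= \sum_{r=1}^{s} u_r y_{rp} + W \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i x_{ip} = 1, \\
& \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} + W \le 0, \quad j = 1, \dots, n, \\
& u_r \ge 0,\ v_i \ge 0,\ W \text{ free}.
\end{aligned} \tag{7}
$$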
In the BCC model, the sign of the variable $W$ indicates the returns to scale of each DMU:
A: if $W < 0$, returns to scale are decreasing;
B: if $W = 0$, returns to scale are constant;
C: if $W > 0$, returns to scale are increasing.
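A minimal sketch of the linear multiplier model (7) and the sign rule for $W$, assuming SciPy's `linprog` with the HiGHS solver in place of the GAMS implementation used in the study. The data are a hypothetical toy example with one input and one output. Note that for DMUs lying at a vertex of the frontier, the optimal $W$ is generally not unique, so a single solver run gives only an indicative returns-to-scale classification; a rigorous test would compute the minimum and maximum of $W$ over the optimal solution set.

```python
import numpy as np
from scipy.optimize import linprog


def bcc_multiplier(X, Y, p, tol=1e-9):
    """Solve the input-oriented BCC multiplier model (7) for DMU p.

    X: (n, m) inputs and Y: (n, s) outputs, one row per DMU.
    Returns (theta_p, W, returns_to_scale).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    # Decision vector z = [u_1..u_s, v_1..v_m, W]; linprog minimizes, so negate.
    c = np.concatenate([-Y[p], np.zeros(m), [-1.0]])
    # sum_r u_r y_rj - sum_i v_i x_ij + W <= 0 for every DMU j.
    A_ub = np.hstack([Y, -X, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Normalization constraint: sum_i v_i x_ip = 1.
    A_eq = np.concatenate([np.zeros(s), X[p], [0.0]]).reshape(1, -1)
    bounds = [(0, None)] * (s + m) + [(None, None)]  # u, v >= 0; W free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    theta, w = -res.fun, res.x[-1]
    if w < -tol:
        rts = "decreasing"
    elif w > tol:
        rts = "increasing"
    else:
        rts = "constant"
    return theta, w, rts


# Hypothetical single-input, single-output data for four DMUs.
X = [[2], [4], [8], [6]]
Y = [[1], [4], [6], [3]]
theta, w, rts = bcc_multiplier(X, Y, 3)
print(round(theta, 4), round(w, 4), rts)  # 0.5556 0.2222 increasing
```

For the fourth toy DMU, the optimal supporting hyperplane passes through the first two DMUs, which lie on an increasing-returns segment of the frontier; hence the positive $W$.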
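The dual of the linear model (7), called the envelopment form, is specified as

$$
\begin{aligned}
\min\ & \theta \\
\text{s.t.}\ & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{ip}, \quad i = 1, \dots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{rp}, \quad r = 1, \dots, s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \\
& \lambda_j \ge 0,\ j = 1, \dots, n;\ \theta \text{ free}.
\end{aligned} \tag{8}
$$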
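The envelopment form yields both the efficiency score and the intensity weights $\lambda_j$, whose positive entries identify the reference set of an inefficient DMU. The sketch below assumes SciPy's `linprog` (HiGHS) and uses a hypothetical four-DMU toy data set; it solves only the radial stage and ignores input and output slacks, which a full BCC analysis would handle in a second stage.

```python
import numpy as np
from scipy.optimize import linprog


def bcc_envelopment(X, Y, p, tol=1e-7):
    """Solve the input-oriented BCC envelopment model for DMU p.

    Returns (theta, lambdas, reference_set), where the reference set holds
    the indices of DMUs with a positive lambda.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    # Decision vector z = [theta, lambda_1..lambda_n]; minimize theta.
    c = np.concatenate([[1.0], np.zeros(n)])
    # Inputs: sum_j lambda_j x_ij - theta * x_ip <= 0.
    A_in = np.hstack([-X[p].reshape(m, 1), X.T])
    # Outputs: -sum_j lambda_j y_rj <= -y_rp.
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[p]])
    # Convexity constraint sum_j lambda_j = 1 (this makes the model VRS/BCC).
    A_eq = np.concatenate([[0.0], np.ones(n)]).reshape(1, -1)
    bounds = [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    lambdas = res.x[1:]
    reference_set = [j for j in range(n) if lambdas[j] > tol]
    return res.fun, lambdas, reference_set


X = [[2], [4], [8], [6]]
Y = [[1], [4], [6], [3]]
theta, lambdas, ref = bcc_envelopment(X, Y, 3)
print(round(theta, 4), ref)  # 0.5556 [0, 1]
```

Dropping the convexity row from `A_eq` would turn this into the CCR (constant returns to scale) envelopment model.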
These models can be used to evaluate the financial performance of companies as DMUs, which were schematically depicted in Figure 2.
3.3. Research Procedure
In our study, the following steps are conducted:
Step 1. Extract inputs and outputs.
Step 2. Select companies from the population.
Step 3. Collect data based on inputs and outputs.
Step 4. Run the DEA method.
Step 5. Use the K-means method to cluster the results of DEA.
Step 6. Benchmark the inefficient clusters against the efficient cluster.
Figure 3 shows the research procedure.
3.4. Data Sample
The proposed approach is applied to a total of 214 companies listed on the Tehran stock exchange, belonging to 23 industries. The number of companies in each industry is shown in Table 2. Data on the considered financial indicators are taken from the companies' published balance sheets. In addition, transaction data of 56643 customers in 2019 and 2020 are gathered and used in this study.
4. Results and Discussion
After prescreening, the data of customers who have at least one buying and one selling record during this period are selected. Financial data of customers, companies, and industries are collected from the Tehran stock exchange database. Excel and SPSS are used for preprocessing and analyzing the data, MATLAB is used for clustering, and GAMS is used to implement the DEA models.
4.1. Proper Number of Clusters for Buying and Selling Data
As mentioned before, Wilks' lambda is used to determine the suitable number of clusters. As shown in Figure 1, the first jumps for buying and selling data occur at k = 3 and k = 4, respectively. Therefore, the suitable numbers of clusters based on Wilks' lambda are 3 and 4 for buying and selling data, respectively.
4.2. Dimension Reduction by Principal Component Analysis
Principal Component Analysis (PCA), a method for reducing the dimension of high-dimensional data based on the directions of data dispersion [40, 41], is used to reduce the dimension of both the buying and the selling data so that the clustering results can be displayed in a two-dimensional plot. The associated two-dimensional clustering plots are shown in Figure 4.
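The dimension reduction used for the two-dimensional plots can be sketched with NumPy's singular value decomposition. Standardizing each feature before the projection is an assumption on our part (the text does not state whether the clustering variables were scaled), and the random data merely stand in for the real transaction records.

```python
import numpy as np


def pca_2d(X, standardize=True):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    if standardize:
        # Put features with different units (counts, values) on a common scale.
        std = centered.std(axis=0)
        std[std == 0] = 1.0
        centered = centered / std
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
records = rng.normal(size=(100, 25))  # stand-in for 25 clustering features
coords = pca_2d(records)
print(coords.shape)  # (100, 2)
```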
4.3. Cluster Analysis Using the K-Means Method
The buying and selling records are clustered separately using the K-means method, with the suitable number of clusters suggested by Wilks' lambda. Each record in the database is associated with a person who carried out a buy or sell transaction between 2019 and 2020. The results of clustering are shown in Table 3. The clustering is performed in a 25-dimensional space, as shown in Table 3, and the buying and selling data include 3 and 4 clusters, respectively. The variables considered for clustering the data are presented in Table 4.
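The K-means step can be sketched in plain NumPy (the study itself used MATLAB). This is a basic Lloyd's algorithm with random initialization from the data; in practice, one would typically run several restarts and keep the solution with the lowest within-cluster sum of squares. The toy records are hypothetical, and the cluster ids it returns are arbitrary labels.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means: returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct records chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
    return labels, centroids


records = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(records, k=2)
print(labels)
```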
For buying, the first cluster has the fewest members and the highest average trading value. A closer look at the customers in this cluster revealed that they are corporate entities. The second buying-related cluster has a middle number of members, and its average trading value is the lowest among the clusters. The third buying-related cluster has the most members, and its average trading value is the lowest among the clusters. In other words, the customers in this cluster invest in low volumes with high diversity.
Regarding selling, the first cluster has the fewest members and the highest average trading value. The second cluster has a middle number of members and the lowest average trading value. The third selling-related cluster has the most members and the lowest average trading value. An investigation of the customers in this cluster reveals that they are individual (natural) persons with an average investment of 15617 USD.
The number and percentage of records in the buying and selling clusters are shown in Tables 2 and 5, respectively.
4.4. Results of Customer Clustering
The buying and selling data include 31587 and 25056 records, respectively. The third buying cluster, with 31507 records, and the fourth selling cluster, with 24503 records, have the most members among the clusters, which indicates that the customers within each of these clusters behave similarly. Figure 5 shows the mean number of traded shares in each cluster. Figures 6–8 present the transaction details of buying clusters 1–3. As can be seen in Figure 6, investors in the first cluster invested in two sectors (banks and multi-industry companies). This cluster has the highest transaction value and the lowest dispersion of investment across industries. As can be seen in Figure 7, investors in the second cluster invested in 16 industries, with technical and engineering services and chemical products having the largest shares of buying transactions.
As can be seen in Figure 8, investors in the third cluster invested in all 23 industries, among which banks and chemical products are traded most often.
4.5. Results of Efficiency Measurement Using DEA in Each Cluster
In the prescreening phase, negative data are handled through variable transformations. Portela et al. introduced a method for dealing with negative data in DEA so that the model can be solved. These transformations are defined by the following equations for both output and input variables.
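First, define the range $R$, which measures the distance between a reference (ideal) value and the current value of a variable for DMU $p$. For an output variable $y_r$, it is

$$R_{rp} = \max_{j}\{y_{rj}\} - y_{rp}$$

For an input variable $x_i$, we have

$$R_{ip} = x_{ip} - \min_{j}\{x_{ij}\}$$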
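Assuming the ranges follow the range directional model of Portela et al., in which the reference point takes the largest observed value of each output and the smallest observed value of each input, they can be computed as below. The sketch covers only the ranges themselves, not how they enter the DEA model; the function name and the data are hypothetical.

```python
def improvement_ranges(X, Y, p):
    """Ranges of possible improvement for DMU p (range directional model).

    Inputs:  R_ip = x_ip - min_j x_ij   (distance to the smallest observed input)
    Outputs: R_rp = max_j y_rj - y_rp   (distance to the largest observed output)
    Both ranges are non-negative even when the raw data contain negative values.
    """
    m, s = len(X[0]), len(Y[0])
    r_in = [X[p][i] - min(row[i] for row in X) for i in range(m)]
    r_out = [max(row[r] for row in Y) - Y[p][r] for r in range(s)]
    return r_in, r_out


# Hypothetical data with a negative output (e.g., an operating loss).
X = [[4.0], [2.0], [6.0]]
Y = [[-5.0], [3.0], [10.0]]
print(improvement_ranges(X, Y, 0))  # ([2.0], [15.0])
```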
The data are also normalized. The DEA models are implemented in GAMS. The input-oriented DEA model under variable returns to scale is used to evaluate the efficiency of the companies in the Tehran stock exchange. Table 6 presents the efficiency scores for 2019. According to Table 6, 9 of the 23 industries are efficient, and 79% of the inefficient industries have an efficiency below 0.6. The transportation industry, with an efficiency of 0.03, is the most inefficient industry.
In order to suggest a practical benchmark for increasing the efficiency of inefficient industries, a reference set (linear combination of efficient companies) is used.
Each reference set consists of efficient DMUs that can construct the efficient projection of the associated inefficient DMU. According to Table 6, the tile industry and the multi-investment industry appear most frequently in the reference sets of the inefficient industries.
For instance, the inefficient computer industry can follow the practices of the tile, multi-investment, and car industries in selecting its inputs and outputs in order to be projected toward the efficient frontier. Using the efficiency scores reported in Table 6 together with the reference sets, each inefficient DMU can be projected toward the efficient frontier, which provides a practical benchmark for it. Tables 7–9 present the efficiency scores and the fraction of total investment in each industry.
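For an input-oriented model, a projection point can be formed as the λ-weighted combination of the peers' inputs and outputs. The sketch below assumes λ comes from a single-stage envelopment solution; a second stage that maximizes slacks would be needed to guarantee a fully (Pareto) efficient target. The function name project_to_frontier and the toy data are hypothetical.

```python
def project_to_frontier(X, Y, lambdas):
    """Target inputs and outputs as lambda-weighted combinations of all DMUs.

    X[j][i] is input i of DMU j, Y[j][r] is output r of DMU j,
    and lambdas[j] is the weight of DMU j in the reference set.
    """
    m, s = len(X[0]), len(Y[0])
    x_target = [sum(lam * X[j][i] for j, lam in enumerate(lambdas)) for i in range(m)]
    y_target = [sum(lam * Y[j][r] for j, lam in enumerate(lambdas)) for r in range(s)]
    return x_target, y_target


# Hypothetical toy data: DMU 2 is benchmarked entirely against DMU 1
X = [[2.0], [4.0], [6.0]]
Y = [[1.0], [3.0], [3.0]]
x_t, y_t = project_to_frontier(X, Y, [0.0, 1.0, 0.0])
print(x_t, y_t)  # [4.0] [3.0]
```

Comparing the target with the current values of DMU 2 (input 6, output 3) shows the benchmark: the same output with 2 fewer units of input.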
Table 10 presents the efficiency scores of the industries in 2020. An efficiency of 1 indicates that the DMU is efficient, whereas an efficiency of less than 1 indicates inefficiency.
Table 11 categorizes the industries according to their efficiency scores.
Figure 9 presents the efficiency scores of the DMUs in both 2019 and 2020, allowing the situation of each DMU to be compared over the period 2019–2020. Figure 9 shows that DMU23 and DMU21 experienced major reductions in efficiency (92% and 46%, respectively, relative to their 2019 efficiency), whereas the efficiency of DMU06 grew by 35% compared with 2019.
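Percentage changes of this kind can be computed as the difference between the two scores divided by the earlier score. The sketch assumes the figures above are relative changes with respect to the 2019 score rather than percentage-point differences; the function name relative_change and the scores are hypothetical, not the values plotted in Figure 9.

```python
def relative_change(before, after):
    """Percentage change of an efficiency score relative to the earlier score."""
    if before == 0:
        raise ValueError("the earlier score must be nonzero")
    return 100.0 * (after - before) / before


# Hypothetical 2019 and 2020 scores
print(round(relative_change(0.50, 0.04), 1))  # -92.0
print(round(relative_change(0.40, 0.54), 1))  # 35.0
```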
Tables 12 and 13 present the efficiency scores for the second and the third cluster in 2020.
5. Conclusions and Future Research Directions
In this study, a hybrid procedure based on clustering analysis and DEA was proposed to investigate the selling and buying behavior of investors. The procedure was applied to financial records of the Tehran stock exchange. Its main steps are as follows. First, the data were prescreened. Then, a clustering approach was used to identify the main clusters of selling and buying records. Finally, DEA was used to measure the efficiency scores in each cluster. The results were analyzed based on the financial data of customers of the Tehran stock exchange. Efficient and inefficient DMUs (companies) were determined from the efficiency scores in each buying and selling cluster, and a reference set was identified for each inefficient DMU to project it toward the efficient frontier. These reference sets can help the managers of inefficient industries move toward the best benchmarks in the market. In future studies, additional clustering approaches can be considered. Uncertainty in data and clusters, as well as fuzziness in inputs and outputs, can be modeled through fuzzy clustering approaches and fuzzy DEA.
The COVID-19 pandemic has strongly affected our lives, and production and service companies are no exception. In Iran, the pandemic has had strong effects on the fragile economy. Death rates, lockdowns, and decreased productivity are among the factors with an economic impact. Hence, the stock market has undoubtedly been affected by the pandemic. In this paper, the performance of companies in the stock market during two years of the pandemic is evaluated.
The results indicate that, among the 23 industries listed on the stock market in Iran, only nine were efficient in 2019, which shows that the COVID-19 pandemic had a strong effect on the stock market. Moreover, in 2020, the number of efficient industries fell further, to only six. Among the efficient categories identified by the clustering method, the banking and investment industries were the most efficient. The reason is that some of these companies are supported by the government, while others had adequate financial reserves to cope with the crisis. However, in 2020, the bank industry was no longer efficient because the impact of the spread of COVID-19 outweighed the effects of government support and other factors. The least efficient industry in both 2019 and 2020 is transportation. The reason is that, under COVID-19 restrictions, many countries banned or limited entry into their territories; hence, the performance of the transportation industry decreased dramatically. Another industry at the bottom of the efficiency ranking in both 2019 and 2020 is insurance. The reason is that many people in Iran became infected with COVID-19, resulting in high costs for insurance companies covering the medical treatment of the insured. The clustering results showed that the industries most directly affected by COVID-19 performed worst and were inefficient. To increase their performance and efficiency, these industries must pay more attention to financial issues and indicators. In other words, the outbreak of the COVID-19 pandemic diminished the efficiency of companies in various industries in Iran. Comparing our results with those of another study conducted under normal economic conditions, we find that the automobile, pharmaceutical, and cement industries usually had the highest efficiency among all industries.
However, in 2019, only the automobile and pharmaceutical industries remained efficient, whereas the cement industry became inefficient. The reason is that demand for housing construction, and thus for cement, decreased dramatically. In 2020, during the outbreak of the pandemic, the automobile sector became inefficient as well, because people tend not to invest in buying cars during a crisis. The pharmaceutical industry remained efficient in both years because many people were seeking treatment and therefore needed pharmaceutical products and other medical supplies.
For future research, we suggest further development of the employed methodology. In particular, researchers can investigate an uncertain environment, for example, by using approaches based on fuzzy numbers, D numbers, or Z numbers.
The data used in this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
K. Wendt, “Social stock exchanges-democratization of capital investing for impact,” in Proceedings of the 30th Australasian Finance and Banking Conference, Sydney, Australia, December 2017.
J. Górecki, P. Núñez-Cacho, F. A. Corpas-Iglesias, and V. Molina, “How to convince players in construction market? Strategies for effective implementation of circular economy in construction sector,” Cogent Engineering, vol. 6, no. 1, Article ID 1690760, 2019.
R. Kumari and A. K. Sharma, “Determinants of foreign direct investment in developing countries: a panel data study,” International Journal of Emerging Markets, vol. 12, 2017.
C. Montenegro and M. Molina, “Improving the criteria of the investment on stock market using data mining techniques: the case of S&P 500 index,” International Journal of Machine Learning and Computing, vol. 10, no. 2, pp. 309–315, 2020.
A. E. Khedr, N. Yaseen, and S. E. Salama, “Predicting stock market behavior using data mining technique and news sentiment analysis,” International Journal of Intelligent Systems and Applications, vol. 9, no. 7, p. 22, 2017.
N. Koptyug, L. Persson, and J. Tåg, “Should we worry about the decline of the public corporation? A brief survey of the economics and external effects of the stock market,” The North American Journal of Economics and Finance, vol. 51, Article ID 101061, 2020.
J. Traina, “Is aggregate market power increasing? Production trends using financial statements,” SSRN, 2018, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3120849.
P. Wanke, C. P. Barros, and A. Emrouznejad, “Assessing productive efficiency of banks using integrated fuzzy-DEA and bootstrapping: a case of Mozambican banks,” European Journal of Operational Research, vol. 249, no. 1, pp. 378–389, 2016.
A. K. Yazdi and F. Abdi, “Designing robust model for banks benchmarking based on Rembrandt method and DEA,” Benchmarking: An International Journal, vol. 24, 2017.
K. Jayamalini and M. Ponnavaikko, “Research on web data mining concepts, techniques and applications,” in Proceedings of the 2017 International Conference on Algorithms, Methodology, Models and Applications in Emerging Technologies (ICAMMAET), pp. 1–5, Chennai, India, February 2017.
A. Boussofiane, R. G. Dyson, and E. Thanassoulis, “Applied data envelopment analysis,” European Journal of Operational Research, vol. 52, no. 1, pp. 1–15, 1991.
R. D. Banker, A. Charnes, and W. W. Cooper, “Some models for estimating technical and scale inefficiencies in data envelopment analysis,” Management Science, vol. 30, no. 9, pp. 1078–1092, 1984.
A. Charnes, W. W. Cooper, and E. Rhodes, “Measuring the efficiency of decision making units,” European Journal of Operational Research, vol. 2, no. 6, pp. 429–444, 1978.
K. Ahmadi and M. A. Ramezani, “Iranian emotional experience and expression during the COVID-19 crisis,” Asia Pacific Journal of Public Health, vol. 32, no. 5, pp. 285–286, 2020.
A. H. Samadi, S. Owjimehr, and Z. N. Halafi, “The cross-impact between financial markets, Covid-19 pandemic, and economic sanctions: the case of Iran,” Journal of Policy Modeling, vol. 43, no. 1, pp. 34–55, 2021.
M. Hiransha, E. A. Gopalakrishnan, V. K. Menon, and K. P. Soman, “NSE stock market prediction using deep-learning models,” Procedia Computer Science, vol. 132, pp. 1351–1362, 2018.
A. K. Yazdi, Y. J. Wang, and A. Alirezaei, “Analytical insights into firm performance: a fuzzy clustering approach for data envelopment analysis classification,” International Journal of Operational Research, vol. 33, no. 3, pp. 413–429, 2018.
A. Karimi and M. Barati, “Financial performance evaluation of companies listed on Tehran stock exchange: a negative data envelopment analysis approach,” International Journal of Law and Management, vol. 60, 2018.
K. Tone, T. S. Chang, and C. H. Wu, “Handling negative data in slacks-based measure data envelopment analysis models,” European Journal of Operational Research, vol. 282, no. 3, pp. 926–935, 2020.
A. L. M. Anouze and I. Bou-Hamad, “Data envelopment analysis and data mining to efficiency estimation and evaluation,” International Journal of Islamic and Middle Eastern Finance and Management, 2019.
T. S. Chang, K. Tone, and C. H. Wu, “Nested dynamic network data envelopment analysis models with infinitely many decision making units for portfolio evaluation,” European Journal of Operational Research, vol. 291, no. 2, pp. 766–781, 2021.
X. Zhong and D. Enke, “A comprehensive cluster and classification mining procedure for daily stock market return forecasting,” Neurocomputing, vol. 267, pp. 152–168, 2017.
M. J. Rezaee, M. Jozmaleki, and M. Valipour, “Integrating dynamic fuzzy C-means, data envelopment analysis and artificial neural network to online prediction performance of companies in stock exchange,” Physica A: Statistical Mechanics and its Applications, vol. 489, pp. 78–93, 2018.
M. K. Mehlawat, P. Gupta, A. Kumar, S. Yadav, and A. Aggarwal, “Multiobjective fuzzy portfolio performance evaluation using data envelopment analysis under credibilistic framework,” IEEE Transactions on Fuzzy Systems, vol. 28, no. 11, pp. 2726–2737, 2020.
Z. Mashayekhi and H. Omrani, “An integrated multi-objective Markowitz–DEA cross-efficiency model with fuzzy returns for portfolio selection problem,” Applied Soft Computing, vol. 38, pp. 1–9, 2016.
S. Asadifar and M. Kahani, “Semantic association rule mining: a new approach for stock market prediction,” in Proceedings of the 2017 2nd Conference on Swarm Intelligence and Evolutionary Computation (CSIEC), pp. 106–111, Kerman, Iran, March 2017.
R. E. Caraka, Y. Lee, R. Kurniawan et al., “Impact of COVID-19 large scale restriction on environment and economy in Indonesia,” Global Journal of Environmental Science and Management, vol. 6, pp. 65–84, 2020.
L. L. Albu, C. I. Preda, R. Lupu, C. E. Dobrotă, G. M. Călin, and C. M. Boghicevici, “Estimates of dynamics of the COVID-19 pandemic and of its impact on the economy,” Romanian Journal of Economic Forecasting, vol. 23, no. 2, pp. 5–17, 2020.
S. Grima, R. Dalli Gonzi, and E. Thalassinos, “The impact of COVID-19 on Malta and it’s economy and sustainable strategies,” Romanian Journal of Economic Forecasting, vol. 23, 2020.
W. Thorbecke, “The impact of the COVID-19 pandemic on the US economy: evidence from the stock market,” Journal of Risk and Financial Management, vol. 13, no. 10, p. 233, 2020.
A. Yahya and S. Hidayat, “The influence of current ratio, total debt to total assets, total assets turn over, and return on assets on earnings persistence in automotive companies,” Journal of Accounting Auditing and Business, vol. 3, no. 1, 2020.
M. Monea, “Financial ratios–reveal how a business is doing?” Annals of the University of Petroşani Economics, vol. 9, no. 2, pp. 137–145, 2009.
J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: the fuzzy c-means clustering algorithm,” Computers & Geosciences, vol. 10, no. 2–3, pp. 191–203, 1984.
A. El Ouardighi, A. El Akadi, and D. Aboutajdine, “Feature selection on supervised classification using Wilks lambda statistic,” in Proceedings of the 2007 International Symposium on Computational Intelligence and Intelligent Informatics, pp. 51–55, Agadir, Morocco, March 2007.
I. G. Hatvani, J. Kovács, I. S. Kovács, P. Jakusch, and J. Korponai, “Analysis of long-term water quality changes in the Kis-Balaton water protection system with time series-, cluster analysis and Wilks’ lambda distribution,” Ecological Engineering, vol. 37, no. 4, pp. 629–635, 2011.
N. R. Pal, K. Pal, and J. C. Bezdek, “A mixed c-means clustering model,” in Proceedings of the 6th International Fuzzy Systems Conference, vol. 1, pp. 11–21, Barcelona, Spain, July 1997.
K. L. Wu and M. S. Yang, “Alternative c-means clustering algorithms,” Pattern Recognition, vol. 35, no. 10, pp. 2267–2278, 2002.
J. Yu, “General C-means clustering model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1197–1211, 2005.
M. J. Farrell, “The measurement of productive efficiency,” Journal of the Royal Statistical Society: Series A, vol. 120, no. 3, pp. 253–281, 1957.
H. Abdi and L. J. Williams, “Principal component analysis,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, 2010.
R. Bro and A. K. Smilde, “Principal component analysis,” Analytical Methods, vol. 6, no. 9, pp. 2812–2831, 2014.
M. C. A. S. Portela, E. Thanassoulis, and G. Simpson, “Negative data in DEA: a directional distance approach applied to bank branches,” Journal of the Operational Research Society, vol. 55, no. 10, pp. 1111–1121, 2004.