Abstract

We rate batters’ performance in Test cricket. We adopt the rating criteria introduced by Scarf, Akhtar, and Rasool in 2014 and extend them with additional explanatory variables on an updated data set; the rating system itself is the same as in the previous study. The newly added covariates are the home factor and the ground influence. Using multinomial logistic regression, sessions from all days of a Test match are modeled to estimate match outcome probabilities at the end of each session. These models include the factors considered likely to influence the outcome of a match. The home factor and pitch quality are found to have a significant impact on the outcome of a Test match. We forecast match outcomes from these models at the end of each session and measure batters’ performances using the resulting probabilities. This process is repeated at the end of every session of a Test match, and batters’ contributions to their team are accumulated. Batters from both teams are then ranked by their rating points, which reflect the new factors (pitch effect and home advantage) in the models. The proposed ranking is compared with the ICC’s traditional ranking of batters in Test cricket series.

1. Introduction

Test cricket is a game of patience [1, 2], and its history is filled with renowned batsmen who have set extremely high standards [3]. A batsman has nearly unlimited time to settle in and play each ball on its merits. It is a duel between bat and ball that is not limited by the number of deliveries [4]. Despite this, batsmen often struggle to stay at the crease, as conditions and lapses in concentration cost them their wicket, particularly in the modern game. In general, traditional measures such as batting average are used to evaluate a batsman in Test cricket [5, 6]. Players are credited according to the number of runs they score during the game. This time-honored method has several drawbacks [5]. Neither the runs scored in a match nor the average runs in a series reveal the context in which the runs were scored. For instance, scoring 100 runs in a low-scoring contest is not the same as scoring 100 runs in a high-scoring match, and scoring 100 runs in the first innings is not the same as scoring 100 runs in the fourth innings, because the match circumstances differ. As the pitch deteriorates, the pitch of the fourth innings is radically different from that of the first innings. Traditional measures such as batting average overlook such differences. In the same way, playing in Melbourne is not the same as playing at Qaddafi Stadium. Playing at home or overseas, as well as other factors such as bowling first or second and winning or losing the toss, are all said to affect cricket results [7]. In any sport, there are various approaches, both technical and nontechnical, for determining the top player or team [8, 9]. One option is to award points according to performance. Point differences are used to determine the winner in several sports [10]. It is possible to model point difference, but we do not do so here.
We aim to develop a new statistical measure that assesses batsman performance in the context in which runs are scored. We concentrate on the outcomes of Test cricket matches and explore the extent to which characteristics such as playing at home or away, batting or fielding first, and pitch condition influence match outcomes. We first forecast Test match outcomes using a multinomial logistic regression model. These forecasts are subsequently used to evaluate batters’ performance.

A large number of studies have examined Test cricket players in various ways and within distinct theoretical frameworks [5, 11–17]. Recent studies on cricket have highlighted the need to examine and understand prematch indicators such as the toss, ground effects, home ground, and the ratings of both participating teams, among others [18–22]. Kimber and Hansford studied cricket batting strategy at various levels [23]. They showed how scoring rate, opposition bowling strength, and pitch condition can be integrated with runs scored to create an overall picture of batsmen’s relative attributes. Allsopp and Clarke studied Test match results [24]. They concluded that a team’s first-innings bowling and batting strength, first-innings lead, batting order, and home advantage are all good indicators of a winning Test match outcome. Barooah and Mangan looked into some of the problems in evaluating batsmen for Test matches [25]. They observed that batters in cricket are mostly valued according to their average score: in Test matches, an average of 50 or more provides a rule of thumb for distinguishing exceptional players from the merely good. Singh et al. assessed cricket players’ batting performance and calculated the impact of their performance on the ICC ranking system [26]. Rohde ranked male and female Test cricket batters [12], proposing a straightforward approach for ranking batters based on their performance. Mukherjee used a diffusion-based PageRank algorithm on the networks to rate the importance of teams and captains [27]. In Test cricket, Akhtar and Scarf forecast match results session by session [28]. They looked at how the match result probabilities (win, draw, and loss) and the effects of the covariates differed from one session to the next. Daud and Muhammad compiled a data set of Test matches [29].
They proposed a new ranking system for Test cricket teams based on the number of runs scored and wickets taken. They suggested that a standard accuracy index be developed to determine the relevance of the discrepancy between a researcher’s proposed rating system and the ICC rating system. Akhtar et al. developed a new rating system for players [5]. They determined the criteria for the best player in Test cricket. Shah and Patel applied principal component analysis and a weighted average method to identify the best captain among the 29 captains included in their study. Brewer and Stevenson suggested a survival analysis to forecast batting abilities in Test cricket matches [30]. They developed a model in two stages, the first for individual players to assess their initial and equilibrium batting abilities, as well as the rate of change between the two. They compared and identified the cricketers who open the batting, which has a bearing on the batting order. Hussain et al. examined the International Cricket Council’s ad hoc points system for assessing cricket teams, which is based exclusively on the number of wins and losses in cricket matches [31]. They compared their findings to those of the ICC. Boys and Philipson used an additive log-linear model to model run scores [13]. They looked at how innings-by-innings variation in an individual batsman’s runs becomes a source of uncertainty in their ranking position. Stevenson and Brewer developed a Bayesian parametric model, using a Gaussian process, to estimate how international cricketers’ batting ability changes across innings [32]. They identified which batsmen are struggling or improving, which has a real-world influence on player evaluation, talent identification, and team selection strategy. Researchers have long hypothesized that batters’ performance influences the outcome of Test matches [11].
In cricket, the concept of a player’s rating has long interested sports analysts. The research on batsmen’s performance also shows the importance of home ground, which can have a substantial impact on the outcome of a match.

2. Forecasting Test Matches’ Results

All Test cricket matches played between January 1, 2017, and December 31, 2019, are considered. Session-by-session data are taken from the cricket website ESPNcricinfo (https://www.stats.cricinfo.com/ci/content/records/307847.html/). Contests affected by rain or bad light are excluded. A Test cricket match lasts five days, with each day consisting of three sessions (ending at lunch, tea, and close of play). The study includes only nine (out of ten) recent ICC (International Cricket Council) sanctioned Test-playing countries. Afghanistan is excluded because it only recently gained Test status and has therefore played a disproportionately small number of ICC-sanctioned matches. Outcomes are measured over three years because the core playing groups are assumed to have stayed largely consistent over this time frame. At the end of each session, a nominal multinomial logistic regression is fitted to forecast Test match outcome probabilities. The response is multinomial (win, draw, and loss): Y takes the values 1, 0, and −1, corresponding to a victory, a draw, and a defeat, respectively, and the reference category is draw (0). To examine model fit, we employed the Akaike information criterion (AIC) (Sakamoto, Ishiguro, and Kitagawa [33]), formulated as AIC = 2 × (number of estimated parameters in the model) − 2 × (log-likelihood), and Nagelkerke’s R² (Nagelkerke [34]), given as R² = [1 − (L_0/L_M)^(2/n)] / [1 − L_0^(2/n)], where L_0 is the likelihood of the intercept-only model, L_M is the likelihood of the fitted model, and n is the number of observations. Table 1 reports the session-by-session models of match outcome, from which we forecast the Test match outcome probabilities. We then used those probabilities to assess each batter’s contribution for both teams. To compare our suggested rating system to the existing batting average approach, three distinct Test match series (7 matches) were included.
We display the rating points with and without the covariates ground effect and home advantage, and compare them to see how these prematch factors affect batters’ ratings.
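
The two fit measures named in this section can be computed directly from model log-likelihoods. The following is a minimal sketch, not the authors' code: the function names, the log-likelihood values, and the parameter count (two non-reference outcomes times an intercept plus five covariates) are assumptions chosen only for illustration.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def nagelkerke_r2(ll_null: float, ll_model: float, n_obs: int) -> float:
    """Nagelkerke's R^2 from the log-likelihoods of the intercept-only
    (null) model and the fitted model.

    Cox-Snell R^2 = 1 - (L0 / LM)^(2/n); Nagelkerke rescales it by its
    maximum attainable value, 1 - L0^(2/n).
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n_obs)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n_obs)
    return cox_snell / max_cox_snell


# Hypothetical values for one session's model (not taken from the paper).
ll_null, ll_model, n_obs, n_params = -100.0, -80.0, 100, 12
print(aic(ll_model, n_params))
print(nagelkerke_r2(ll_null, ll_model, n_obs))
```

A lower AIC and a higher Nagelkerke R² both indicate a better fit, so the same pair of numbers can be compared across the fifteen session models.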

3. Measuring Batters’ Contribution

To determine a batter’s contribution, the probabilities of the Test match outcomes must first be obtained. Nominal multinomial logistic regression is used to calculate the match outcome probabilities (Sohail and Scarf, 2012). Here P(Y) denotes the probability of each outcome (win = 1, draw = 0, or loss = −1) at the end of each session t (t = 1st, 2nd, 3rd, ..., 15th), l denotes the lead until session t, w_1 denotes the first team’s wickets, w_2 denotes the second team’s wickets, g denotes the ground effect, and h denotes home advantage. The model assumes that Y follows a multinomial (MN) distribution over these three outcomes.
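
For reference, a nominal multinomial logit with draw as the baseline category is usually written as below. This is the standard textbook form under the covariates listed above, not necessarily the exact specification fitted here; the coefficient symbols \alpha_{j,t} and \beta_{k,j,t} are notation assumed for illustration.

```latex
\begin{align}
\log\frac{P(Y_t = j)}{P(Y_t = 0)} &= \eta_{j,t}, \qquad j \in \{1, -1\}, \\
\eta_{j,t} &= \alpha_{j,t} + \beta_{1,j,t}\, l + \beta_{2,j,t}\, w_1
            + \beta_{3,j,t}\, w_2 + \beta_{4,j,t}\, g + \beta_{5,j,t}\, h, \\
P(Y_t = j) &= \frac{\exp(\eta_{j,t})}{1 + \exp(\eta_{1,t}) + \exp(\eta_{-1,t})},
\qquad
P(Y_t = 0) = \frac{1}{1 + \exp(\eta_{1,t}) + \exp(\eta_{-1,t})}.
\end{align}
```

Under this form the three probabilities sum to one in every session, and a separate set of coefficients is estimated for each session t.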

We forecast match outcomes based on the above-mentioned explanatory variables for each session of the Test match. The hypothetical position is defined for both the reference team and the opposing team. At the end of each session, the hypothetical position of the batsmen is defined as follows:

After computing the batters’ points, we assess their contributions to determine the best batsman in the Test cricket matches.
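
The accumulation of session-level credit into a ranking can be sketched as follows. This is a simplified stand-in, not the rule used in this study: it assumes that the change in the batting team's win probability over a session is shared among batters in proportion to the runs they scored in that session. The function names, dictionary keys, and all numbers are hypothetical.

```python
import math
from collections import defaultdict


def outcome_probs(eta_win: float, eta_loss: float) -> dict:
    """Win/draw/loss probabilities from two linear predictors,
    with draw as the baseline category."""
    denom = 1.0 + math.exp(eta_win) + math.exp(eta_loss)
    return {
        "win": math.exp(eta_win) / denom,
        "draw": 1.0 / denom,
        "loss": math.exp(eta_loss) / denom,
    }


def accumulate_ratings(sessions: list) -> dict:
    """Sum per-batter credit over sessions.

    Each session is a dict with hypothetical keys:
      p_win_before, p_win_after: batting team's win probability at the
                                 start and end of the session
      runs: {batter_name: runs scored in the session}
    ASSUMPTION: the change in win probability is shared in proportion
    to runs scored; the paper's own allocation rule may differ.
    """
    ratings = defaultdict(float)
    for s in sessions:
        delta = s["p_win_after"] - s["p_win_before"]
        total_runs = sum(s["runs"].values())
        if total_runs == 0:
            continue  # no runs scored, nothing to share out
        for batter, runs in s["runs"].items():
            ratings[batter] += delta * runs / total_runs
    return dict(ratings)


# Made-up session data, for illustration only.
sessions = [
    {"p_win_before": 0.50, "p_win_after": 0.60, "runs": {"A": 60, "B": 40}},
    {"p_win_before": 0.60, "p_win_after": 0.55, "runs": {"A": 10, "C": 30}},
]
ratings = accumulate_ratings(sessions)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Under this assumption a batter can lose points in a session where the team's position worsens, which is how the rating reflects match context and not just runs.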

3.1. Example 1

Consider an Australia-New Zealand Test match at the Perth Cricket Stadium in Australia. When we fit a model at the commencement of the Test match, the probabilities of the reference team (Australia) winning, drawing, and losing are 0.67, 0.03, and 0.30, respectively. Table 2 contains session-level data. Australia won the match by 296 runs.

We consider the Trans-Tasman series played in Australia between New Zealand and Australia, which Australia won 3-0 with the benefit of playing at home. The batters’ rating points throughout the series are shown in Table 3. Table 4 shows the results of our proposed methodology when both ground effect and home advantage are removed from the model. Table 5 shows the batters’ rankings based on traditional batting averages; M Labuschagne of Australia had the highest average, 91.50. Our criteria, by contrast, assign a score to batters based on the match outcome probabilities at each session. A batter who performs well in a critical situation receives more rating points than the same contribution would earn in an ordinary situation. Likewise, batters who perform well against highly rated teams earn more points than batters who do well against average teams. Under the traditional ratings, M Labuschagne received the batsman of the series award in the Trans-Tasman series, having scored the most runs with the highest average. The outcome would be different if the award were made using our proposed criteria. Finally, we found a correlation between our proposed rating system and the ICC rating system, with r = 0.636 and p-value < 0.001.

3.2. Example 2

Consider the 2019 Test match between Pakistan (batting first) and Australia (batting second) at Brisbane, Australia. We fitted a model with several covariates to the session-by-session data. Pakistan’s probabilities of winning, drawing, and losing at the start of the match are 0.53, 0.13, and 0.34, respectively. Table 6 contains session-by-session data. Australia won this match.

Another example is the Test series between Australia and Pakistan played in Australia in 2019, which Australia won 2-0. Table 7 shows the results of the analysis when all predictors are considered, whereas Table 8 shows the results when home advantage and ground effect are not considered. All batters who had at least one chance to bat for their team in the series are rated in Table 7. In Tables 7, 8, and 9, DA Warner, an Australian batter, was ranked best among batters from both teams, although his scores in Tables 7 and 8 differ: when the ground effect and home factor are removed from the model (Table 8), he loses some rating points. According to Table 9, DA Warner was also the top batter with the highest batting average under the ICC’s basic average criteria, and he was named batsman of the series. There is a correlation between our proposed criteria and the ICC criteria, with r = 0.835 and p-value = 0.001.

3.3. Example 3

Take, for example, a Test match played at Chattogram in 2018 between Bangladesh (reference team) and Sri Lanka. We fitted the model, with its different explanatory variables, to session-by-session data. For the reference team (Bangladesh), the probabilities of winning, drawing, and losing are 0.73, 0.12, and 0.15, respectively. The match ended in a draw. Table 10 contains session-by-session lead data.

To investigate the proposed criteria further, consider the two-match Test series played in Bangladesh in 2018 between Sri Lanka and Bangladesh. Sri Lanka won the series 1-0. Players’ batting performance in the series is described in Table 11, which is produced when all predictors are included. According to the results, Sri Lankan batter BKG Mendis received the most points (0.173) and was ranked first among all batters. When the covariate home advantage was not taken into account, BKG Mendis came second with 0.201 points in Table 12.

Table 12 is generated when the covariates home factor and ground factor are removed from the set of predictors. Table 13 rates the players’ batting performances in the Test series using traditional averages. The same batter takes first place in Tables 11 and 13: according to Table 13, Sri Lankan batsman BKG Mendis scored the most runs (271) and had the highest batting average in the series. The conventional rating and our criteria grade batters differently, so the top batter need not be the same in every table; our criteria rate batters by summing their session-level batting contributions.

4. Discussion

The analysis revealed that each predictor had a varying impact at different stages of a Test cricket match. Explanatory variables such as home factor, ground effect, and team strength affect outcomes at the start of a Test match, but this effect fades as the match progresses. Lead has a minor impact at the start of a Test match, but it grows in importance as the match develops. The number of wickets is also significant during the match. A Test cricket match consists of five days of three sessions each, for a total of fifteen sessions. Because the predictors’ effects on match results change over the course of the match, we modeled the sessions one by one to forecast the outcome at each stage, making it easier for forecasters to predict from a specific match position. This study presents a rating system for Test cricket batters built on this statistical analysis, with multinomial logistic regression used to calculate Test match outcome probabilities. To extend the scope of this study, a larger data set with additional explanatory variables can be used. Our suggested rating system fluctuates at the start and end of the Test match; a larger data set may help to address this problem. In our rating method, batters’ contributions are judged by the difference between the hypothetical probability and the observed probabilities, computed separately for the first innings and for the second innings. Researchers can explore other ways to address issues relating to batters’ contributions in a low-scoring game. The proposed rating system is practical because it is based on session probabilities, which assess a batter’s performance in relation to his contribution to the match outcome.
We found a correlation of 0.883 (p-value = 0.001) between the proposed criteria and the traditional criteria used by the ICC.
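
A correlation of this kind between two sets of batter ratings can be computed as in the sketch below. The text does not state which correlation coefficient was used, so Pearson's r is assumed here; the rating points and batting averages are made-up values, not results from the tables.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length, at least 3 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r = 0, with n - 2 degrees of freedom.
    Undefined when |r| = 1."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical rating points and batting averages for five batters.
proposed_points = [0.20, 0.15, 0.10, 0.05, -0.02]
batting_average = [90.0, 60.0, 45.0, 30.0, 12.0]

r = pearson_r(proposed_points, batting_average)
print(r, t_statistic(r, len(proposed_points)))
```

The p-value follows from comparing the t statistic with a Student t distribution on n − 2 degrees of freedom; a rank-based coefficient such as Spearman's could be substituted if only the ordering of batters matters.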

Data Availability

The data sets are derived from a public website (http://www.espncricinfo.com) and are made available with the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

The authors would like to acknowledge Prince Sultan University and the EIAS: Data Science and Blockchain Laboratory for their valuable support. Also, the authors would like to acknowledge the support of Prince Sultan University for Article Processing Charges (APC) of this publication.