Abstract

In this modern era of technology, most accessibility challenges are addressed with the help of smart devices and cutting-edge gadgets. Smartphones play a crucial role in addressing various accessibility challenges, including voice recognition, sign language detection and interpretation, navigation, and speech-to-text conversion (and vice versa), among others. They are computationally powerful enough to run numerous machine and deep learning applications. Among the various accessibility challenges, speech disorders are a disability in which individuals struggle to communicate verbally, while hearing loss impairs an individual’s ability to hear, necessitating reliance on gestures for communication. A significant challenge encountered by people with speech disorders, hearing loss, or both is their inability to effectively convey messages to, or receive messages from, others. Hence, these individuals depend heavily on sign language, a gesture-based method of communication that typically involves hand movements and expressions. To the best of our knowledge, no comprehensive review or survey articles currently cover the literature on speech disabilities and on sign language detection and interpretation via smartphones using machine and/or deep learning approaches. This study fills that gap by analyzing research publications on speech disabilities published from 2012 to July 2023. A rigorous search and selection strategy, together with a well-defined theoretical framework for reporting results and findings, has been used. The paper has implications for practitioners and researchers working on accessibility in general and on smart/intelligent gadgets and applications for speech-disabled people in particular.

1. Introduction

A speech disorder, also known as a speech disability, is a condition in which an individual has difficulty communicating verbally with others. One of the primary challenges for individuals with speech disorders is their inability to convey messages directly through spoken language. Furthermore, some individuals with speech disorders also experience hearing loss, a prevalent issue worldwide. The prevalence of speech disorders and hearing loss is steadily rising, with more individuals affected each day. According to the World Health Organization (WHO), an estimated 430 million people, about 5% of the world’s population, have disabling hearing loss, and nearly 1 in 4 people are projected to have some degree of hearing loss by 2050. The impacts of these conditions are serious: people who cannot speak or hear may be unable to communicate with others, which can lead to social isolation, loneliness, and frustration. These conditions significantly affect individuals’ lifestyles and academic performance, often resulting in employment challenges. In many developing countries, there are very few specialized schools that cater to the needs of students with speech disabilities and hearing impairments [1].

Sign language is a means of communication for people with speech disorders and/or hearing loss, through which they can communicate with other people and convey their messages. Sign alphabets rely on static hand poses to symbolize individual letters of the alphabet, employing gestures as a form of nonverbal communication. Progress in computer vision has opened the door to sophisticated models capable of recognizing these signs, interpreting hand configurations, and seamlessly translating them into both text and voice [2]. For instance, Raziq and Latif [3] proposed a gesture-based approach for Pakistan Sign Language (PSL) recognition, focusing on training and communication modules to detect sign language and convert it to text.

There is no universal sign language, and most people rely on region-specific sign languages; today, there are an estimated 138–300 varieties of sign language across the world [4]. Moreover, a persistent communication gap exists between people with hearing disabilities and the hearing population, because the former rely on sign language, which most hearing people understand poorly, if at all. Typically, sign language recognition with gadgets entails a two-step process: first, hand gestures are detected within the image, and then they are classified into the corresponding alphabet. Numerous methodologies use hand-tracking devices such as Leap Motion and Intel RealSense, combined with machine learning algorithms such as support vector machines (SVMs) to classify the gestures [5]. Hardware devices, such as Microsoft’s Kinect sensor, can construct a three-dimensional (3D) model of the hand while tracking hand movements and orientations [6]. Although hardware-based techniques can offer relatively high accuracy, their widespread adoption is impeded by significant initial setup costs.
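For illustration only, and not drawn from any specific study cited above, the following Python sketch shows the second step of this pipeline: classifying static hand-pose feature vectors (e.g., hand-landmark coordinates) into sign-alphabet letters with an SVM. The feature values and labels are random placeholders.

```python
# Hedged sketch: SVM classification of static hand-pose features into alphabet letters.
# Features and labels below are random placeholders standing in for real landmark data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_features, n_letters = 500, 42, 26   # e.g., 21 hand landmarks x (x, y)
X = rng.normal(size=(n_samples, n_features))     # placeholder landmark features
y = rng.integers(0, n_letters, size=n_samples)   # placeholder letter labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```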

Numerous information and communication technologies (ICTs) are used to detect and translate the different sign languages used by people with speech disorders. However, some of these technologies are either expensive or socially unacceptable to many of the people who need them. Computer-based techniques have been widely used; however, computers are not portable and therefore cannot be used on the go, as they require a specialized environment. It is therefore crucial to employ socially accepted devices to address these challenges.

The ubiquitous presence of smartphones is undeniable. These devices can efficiently execute a wide range of machine and deep learning models, including convolutional neural networks (CNNs), K-nearest neighbors (KNN), deep convolutional generative adversarial networks (DCGANs), deep neural networks (DNNs), support vector machines (SVMs), recurrent neural networks (RNNs), and 3D convolutional neural networks. A smartphone can translate a sign language gesture to speech, and vice versa, in real time to convey a proper message to other people. Some prototype-level applications also exist; however, they are either region-specific or insufficiently accurate and hence rarely used. This problem highlights the need for a universal sign language with no geographical boundaries or specifications.

The smartphone’s processor and camera can be used to detect sign language. As mobile hardware becomes more sophisticated and processing increasingly moves toward cloud infrastructure, maintaining a user-friendly interface while keeping cloud-processing latency low remains a major issue [7]. Smartphones equipped with an increasing number of cameras have prompted researchers to explore their potential for vision-based sign language recognition. In the vision-based approach, the smartphone’s camera captures images or videos of hand gestures, and these frames are then processed to recognize the signs and generate text or speech output, as sketched below. Vision-based approaches may entail a trade-off in accuracy compared with sensor-based methods, owing to well-known challenges in image processing such as variations in lighting conditions, sensitivity to the user’s skin color, and complex backgrounds within the image [8].
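As a hedged illustration of this vision-based pipeline, and not of any specific system from the literature, the following Python sketch captures frames, preprocesses them, and hands them to a placeholder classifier; `classify_sign` is a hypothetical stand-in for any trained model.

```python
# Hedged sketch of a vision-based pipeline: capture frames, preprocess, classify.
import cv2
import numpy as np

def classify_sign(frame_rgb: np.ndarray) -> str:
    """Hypothetical stand-in for a trained sign classifier."""
    return "A"  # a real model would return the predicted letter or word

cap = cv2.VideoCapture(0)                 # on Android, frames come from the camera API instead
for _ in range(100):                      # process a bounded number of frames for this sketch
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    frame_rgb = cv2.resize(frame_rgb, (224, 224)).astype(np.float32) / 255.0  # model input
    label = classify_sign(frame_rgb)
    print("predicted sign:", label)       # could instead be passed to a text-to-speech engine
cap.release()
```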

Numerous review articles have been written on accessibility for speech disorders, regional and global sign languages, sensor-based approaches, and gesture-based recognition systems. The following paragraphs summarize these surveys and reviews and their contributions, along with a discussion of the research gap.

Ardiansyah et al. [9] reviewed studies published between 2015 and 2020, selecting the 22 studies most relevant to their research questions. In their review, the most popular method of obtaining data was the camera, and among the techniques compared, CNN was the most popular because it was more accurate and was used in 11 of the 22 studies. Similarly, a brief review of recent trends in sign language recognition by Nimisha and Jacob [10] discussed the two main approaches: the vision-based approach (VBA) and the gesture-based approach (GBA). It mainly covers vision-based systems and their pipeline of feature extraction and classification, and it also provides a comparative analysis of the techniques and achievements (in terms of accuracy) of nine studies on VBA and three studies on GBA.

A review of smart gloves for converting signs to speech for the mute community was presented in [11]. That study did not compare results across research papers and concentrated on a single approach, namely glove-based gesture recognition. Similarly, the perspective and evolution of gesture recognition for sign language are presented in [12]. The authors analyzed different gesture recognition devices along a timeline, noting important features and the recognition rates achieved, and concluded that Leap Motion is a good option for sign language because it is cheap, easy to use, and recognizes hands accurately. Work on vision-based sign language recognition systems is also presented by Sharma and Singh [8], who analyzed different vision-based methods along with the datasets used.

A comprehensive review of wearable sensor-based sign language recognition is provided by Kudrinko et al. [13], who reviewed 72 studies published between 1991 and 2019. Their review aimed to discern prevailing trends, best practices, and existing challenges in the field, systematically analyzing and comparing attributes such as sign language variation, sensor configuration, classification methods, study designs, and performance metrics. It is important to note that this study exclusively examined the sensor-based approach. Another review, centered specifically on hand gestures and sign language recognition techniques [14], explored the challenges, diverse approaches, and application domains of gesture recognition, and examined the techniques and technologies used in sensor-based gesture recognition.

A technical approach to Chinese Sign Language processing is discussed by Kamal et al. [15], who provided an overview of Chinese Sign Language Recognition (CSLR) and the numerous issues related to Chinese Sign Language. Similarly, a review of sensory-glove systems for sign language recognition, covering the state of the art between 2007 and 2017, was presented by Ahmed et al. [16]. The authors explored and investigated SLR using the glove-sensor approach, dividing the articles into four categories: framework, review and study, development, and hand gesture types. Numerous recommendations put forth by researchers aim to address both current and anticipated challenges, offering ample opportunities for further research in this field.

A review of automatic translation from Arabic to Arabic Sign Language (ArSL) is presented by Ayadi et al. [17]. The authors discussed classical machine translation approaches (direct, transfer-based, and interlingua) and corpus-based approaches (memory-based, example-based, and statistical), described language challenges such as morphology, syntax, and structure, and provided an extensive list of important works on ArSL machine translation. In addition, Suharjito et al. [18] offered a comprehensive review of feature extraction methods in sign language recognition systems, analyzing studies published between 2009 and 2018. The authors reviewed the progress of feature extraction in sign language recognition and concluded that active sensors have considerably improved hand-region tracking, but there is still room for improvement in vision-based approaches.

A review of gesture recognition focusing on sign language in a mobile context is presented by Neiva and Zanchettin [19], who analyzed and compared 43 studies published between 2009 and 2017. The authors covered static and dynamic gestures, simple and complex backgrounds, facial and gaze expressions, and the use of special mobile hardware. Similarly, vision-based American Sign Language (ASL) recognition, its techniques, and its outcomes are discussed by Shivashankara and Srinath [20], who reviewed and compared the work of several researchers on vision-based sign language recognition.

A comprehensive survey on sign language recognition using smartphones is presented by Ghanem et al. [7]. The authors explored the latest advancements in mobile-based sign language recognition, categorized existing solutions into sensor-based and vision-based approaches, and highlighted their respective advantages and disadvantages, with a primary focus on feature detection and sign classification algorithms. Similarly, an automatic sign language recognition survey [21] reviewed studies published between 2008 and 2017, discussed the advancement of sign language recognition, and provided an overview of the state-of-the-art building blocks of automatic sign language recognition, such as feature extraction, classification, and sign language databases.

Suharjito et al. [22] reviewed sign language recognition systems for individuals with hearing loss or speech disorders, employing an input-process-output framework. They evaluated various sign language recognition approaches and identified the most effective one, and they examined different acquisition methods and classification techniques, presenting their respective advantages and disadvantages. This analysis offers valuable insights for researchers seeking to develop improved sign language recognition systems.

In summary, the discussion above has covered selected systematic literature reviews (SLRs) and survey papers on diverse topics of interest, while highlighting notable contributions in these areas. Certain reviews are tailored to region-specific sign languages, such as Chinese and American Sign Language, while others have become outdated and offer little relevance to contemporary approaches. To address this research gap, this paper conducts a comprehensive analysis and review of publications on sign language detection and interpretation techniques, particularly those employing machine and deep learning approaches. The review encompasses publications from esteemed journals and prestigious conferences spanning the past decade, from 2012 to July 2023. The insights derived from this review hold significant implications for a wide spectrum of stakeholders, including practitioners, researchers, developers, and industries engaged in accessibility solutions, software and hardware development, and the creation of smart devices tailored to individuals with speech disorders. The major contributions of this paper include the following:
(i) A complete, up-to-date analysis of publications from 2012 to July 2023, obtained through a rigorous search and standard selection criteria.
(ii) A detailed yet comprehensive discussion of current trends in the field of disabilities, specifically for people with speech disorders.
(iii) A discussion of different machine learning approaches for smart gadgets (smartphones in particular), along with sensor-based approaches used in smart gloves.

This paper organizes and categorizes the available literature from the different perspectives discussed in the Materials and Methods section, presenting a compact and concise account of the work on sign language recognition. The study may help practitioners better understand the area, specifically mobile-based sign language detection and recognition systems, and may help researchers become aware of the different approaches and the research progress in this field. This work falls under the category of accessibility for people with hearing loss or speech disorders.

The remainder of the paper is structured as follows. Section 2, “Materials and Methods,” outlines the approach used to examine the existing literature. Section 3, “Findings and Discussion,” addresses the seven research questions. Section 4, “Meta-Analysis,” provides a comprehensive overview of the paper’s analysis, and Section 5, “Open Research Questions,” touches upon potential avenues for future research. Finally, Section 6 concludes the paper, and the references are listed at the end.

2. Materials and Methods

This study presents a systematic literature review (SLR) on sign language detection and interpretation via smartphone-based machine or deep learning approaches. The study was conducted following the guidelines presented by Kitchenham et al. [23] and Moher et al. [24]. The research questions, designed to identify the research gap, are framed in Table 1.

2.1. Search Strategy

This section discusses the strategy for searching and mapping the relevant literature. We adhered to the PRISMA framework [24] for structuring our search and selection methodology, as illustrated in Figure 1. PRISMA is a widely recognized and established methodology for conducting systematic literature reviews; it offers a set of guiding principles and a flowchart (see Figure 1) that help researchers adopt a systematic approach and ensure that reporting is accurate, comprehensive, and transparent, which in turn forms the foundation for well-founded, evidence-based decisions when selecting relevant literature. Figure 1 also shows the initial search results, which amounted to 233,860 records. After screening and removing duplicates, 281 studies remained, of which the 163 most relevant are included in the analysis.

The criteria for inclusion/exclusion of publications are defined in Table 2. The literature has been tabulated, analyzed, and mapped based on these criteria.

2.2. Time Frame and Digital Repositories

The time frame for the literature search is 2012 to July 2023 (both years included), as shown in Table 2. The use of smartphones for sign language detection and identification has evolved over the years, driven by the widespread adoption of smartphones and their growing role in assisting individuals with disabilities, including speech disorders, visual impairments, and related challenges; a reasonable amount of literature is therefore available and is mapped in this paper. We selected IEEE Xplore, ScienceDirect, the ACM Digital Library, and Google Scholar for searching the literature. These repositories were selected because they provide relevant publications, results, and analytics. Academic search engines, such as Google Scholar, were also used for meaningful searches and insights.

2.3. Theoretical Framework and Initial Results

Table 3 lists the search strings that we used for searching and mapping the literature. These strings were applied in the selected digital repositories and web search engines (discussed above), and the results are recorded in Table 3.

The publications are categorized into journal and conference papers. Only prestigious conferences, i.e., those supported by ACM, IEEE, or Springer, are considered. The ratio is shown in Figure 2.

Similarly, the year-wise frequency of the selected publications is shown in Figure 3. We selected papers from 2012 to July 2023 and observed healthy growth in publications on accessibility, sign language, and smartphones as tools for people with speech disorders.

Table 4 presents a summary of the most relevant publications along with their years, types, and publishers. Only well-reputed journals and conferences were selected.

3. Findings and Discussion

This section addresses the research questions raised in Table 1 and provides an exhaustive review of the 163 selected research papers. It covers a wide range of aspects of research on smartphones as assistive devices: the application of machine and deep learning approaches for individuals with speech disorders, the comprehensive datasets used in research, region-specific sign languages, and the evaluation metrics employed in experiments, each discussed in a dedicated subsection. Moreover, this section discusses the findings, the research gap, and possible directions for future research.

3.1. RQ1: What Is the Current Status of Smartphone-Based Sign Language Recognition?

In a study by Ghanem et al. [7], the authors presented a detailed survey of existing techniques for smartphone-based sign language recognition. Moreover, the authors developed an interactive, machine learning-centered Android mobile application aimed at bridging the communication gap between individuals with hearing loss and the general population, and in this connection they introduced the PSL dataset [141]. The approach involved training the data with an SVM model, enabling automatic recognition of captured signs using the static symbols stored in the database. Numerous machine and deep learning approaches are used in various applications; Table 5 lists several of them.

Table 5 shows a range of techniques organized by year of study and evaluation metric. Notably, the CNN deep learning model has gained widespread acceptance among recent researchers for sign language detection and/or recognition. Furthermore, the predominant evaluation metric across the studies is “accuracy,” as indicated in Table 5.

3.2. RQ2: How Are Machine Learning, Deep Learning, and Lightweight Deep Learning Techniques Used for the Detection and Interpretation of Sign Languages?

Over time, numerous techniques have been investigated for efficient recognition of sign and gesture languages. The majority of sign language recognition systems rely on machine learning, deep learning, or lightweight deep learning approaches. Table 6 presents a compilation of selected studies and their respective deep learning approaches for detecting sign languages. The table shows that CNN is the dominant technique and that it remains widely used in recent years. These techniques are general and not tied to specific hardware such as smartphones; moreover, most of the studies use hand gestures as input and capture them via devices such as custom-built gloves. It is important to recognize that any sign recognition system typically involves several key steps: first, input data are acquired, often through smartphone cameras or sensors; next, features are extracted from the acquired data; finally, the signs are classified using algorithms well suited to the extracted features. The accuracy of the detection and extraction stages significantly influences the quality of the recognition results. Various approaches have been employed in sign recognition systems, including CNN, KNN, ANN, and SVM, among others; of these, CNN stands out as the leading approach compared with the other methods listed in Table 6. Table 6 also lists the information associated with each study.
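A minimal sketch of the feature-extraction and classification stages is given below in TensorFlow/Keras, assuming 64 × 64 grayscale images of static hand signs and 26 alphabet classes. It is illustrative only and does not reproduce any architecture from the surveyed studies.

```python
# Hedged sketch: a small CNN that learns features from static sign images and classifies them.
import tensorflow as tf

num_classes = 26
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # low-level edge features
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),   # hand-shape features
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, validation_split=0.1, epochs=10)  # with a real dataset
```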

3.3. RQ3: What Are the Types of Datasets Used for Sign Language Recognition?

Table 7(a) provides a comprehensive discussion of the various types of datasets and their use in numerous studies, and Table 7(b) provides links to publicly available datasets. These tables show that most studies have developed their own custom datasets. It is also notable that many of these datasets are language-dependent, covering, for example, PSL, American Sign Language (ASL), Malaysian Sign Language, Taiwan Sign Language (TSL), and Chinese Sign Language (CSL), among others. Table 7 lists the studies along with their years, the datasets used, and remarks for each study.

Numerous publicly available datasets are used by different articles. Some of them can be accessed via links shown in Table 7(b). Some datasets are custom-made and not publicly available.

3.4. RQ4: What Are the Most Popular Approaches for Recognizing Sign Language?

Sign language recognition commonly relies on sensor-based and vision-based techniques to observe hand motion and posture [7]. The sensor-based approach uses sensors, such as those embedded in gloves or smartphones, to track hand movements; these sensors, whether external or internal to the mobile device, capture data related to hand gestures. For example, glove-based approaches use multiple sensors within the gloves to monitor the position and movement of the fingers and palm, providing coordinates for subsequent processing; such devices may be connected wirelessly via Bluetooth, and one such glove contains ten flex sensors for tracking finger posture [39]. In the sensor-based approach, a combination of sensors, including an accelerometer (G-sensor) and a gyroscope, is employed to monitor hand orientation and motion; these sensors continuously capture hand-related signals, which are transmitted wirelessly to a mobile device for hand-state estimation. The choice of recognition method depends on the input data and the dataset used; in one such case, the authors used template matching as the classification method, covering five dynamic sign classes. In the vision-based approach, hand gestures are observed through the mobile camera, and a series of processing steps are applied to identify the signs within the video stream.
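The following Python sketch illustrates template matching in the generic sense described above, classifying a fixed-length accelerometer/gyroscope sequence by its nearest stored template. It is a simplified illustration under stated assumptions, not the exact method used in the cited work, and the sign labels and sequences are placeholders.

```python
# Hedged sketch: nearest-template classification of sensor sequences (time_steps x channels).
import numpy as np

def match_template(sequence: np.ndarray, templates: dict) -> str:
    """Return the sign label whose stored template is closest in Euclidean distance."""
    distances = {label: np.linalg.norm(sequence - tpl) for label, tpl in templates.items()}
    return min(distances, key=distances.get)

rng = np.random.default_rng(1)
time_steps, channels = 50, 6          # e.g., 3-axis accelerometer + 3-axis gyroscope
templates = {sign: rng.normal(size=(time_steps, channels))
             for sign in ["hello", "thanks", "yes", "no", "please"]}   # five dynamic signs
incoming = templates["thanks"] + 0.05 * rng.normal(size=(time_steps, channels))  # noisy reading
print("recognized sign:", match_template(incoming, templates))
```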

3.5. RQ5: Which Sign Languages Are Targeted?

Different countries have used their regional sign languages for research and contributed to the accessibility domain for people with speech disorders. American Sign Language is the dominant sign language in this research, as shown in Table 8.

3.6. RQ6: What Evaluation Metrics Are Used in the Experiments?

Systems that use sign language dataset(s) are usually evaluated using standard metrics such as accuracy, precision, recall, and F1 score. In the literature, most systems were evaluated on their ability to detect and interpret sign languages, and hence accuracy is the most frequently used metric, as shown in Figure 4. Precision and recall were also used.
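For completeness, the sketch below shows how these metrics can be computed with scikit-learn on placeholder predictions for a small sign recognition task; the labels are illustrative only.

```python
# Hedged sketch: computing accuracy, precision, recall, and F1 on placeholder predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = ["hello", "thanks", "yes", "hello", "yes", "thanks", "hello"]
y_pred = ["hello", "yes",    "yes", "hello", "no",  "thanks", "hello"]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1 score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
```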

3.7. RQ7: Which Models Have Demonstrated Better Performance for Specific Sign Languages?

Numerous machine and deep learning models have been employed to detect and recognize diverse sets of sign languages. This process encompasses training and testing on specific sign language datasets, which can range from hand gestures and video frames to data collected from wearable sensors. As previously discussed, gestures are captured using mobile cameras, while data from wearable sensors are collected through gloves. Table 9 provides an overview of studies on various sign languages, offering insights into their respective accomplishments, evaluated primarily in terms of accuracy.

4. Meta-Analysis

This section offers a multilayered examination of the collected literature, exploring various dimensions, including publisher contributions, contributions by country, and citation analysis. Numerous approaches have been thoroughly tested and validated on specific sign languages, as extensively discussed earlier. For instance, Figure 5 provides a comparative analysis of various studies on American Sign Language (ASL) along with the achieved accuracy levels. It is important to note that the accuracy of these approaches and models is contingent upon the complexity and variability of signs within a specific sign language.

The contribution of publishers has been analyzed based on the selected publications. While each publisher has made substantial contributions to research on accessibility for individuals with speech disorders, a majority of the papers selected for this review were published in IEEE journals and conferences, as illustrated in Figure 6.

Moreover, the most highly cited paper among the selected publications has been identified. The paper with the highest number of citations was authored by Cheok et al. in 2019, titled “A review of hand gesture and sign language recognition techniques [14].” As of the latest available data, it has accumulated 456 citations, as illustrated in Figure 7.

Similarly, the analysis of the selected literature for this paper has been conducted with a focus on country-wise contributions. In terms of country-wise contributions, India stands out as a significant contributor to publications related to speech disorders, as depicted in Figure 8. The United States follows as the second most prominent contributing country.

5. Open Research Questions

This section explores open research questions and challenges that currently exist. While the advancing hardware and software capabilities of smartphones are no longer a computational constraint, the multifaceted nature of sign languages, each with its own diverse set of gestures, continues to present significant challenges, as do social acceptability and pervasiveness at low cost. Moreover, the reliance on sign language(s) and their translation by individuals with speech disorders raises unique challenges that need proper investigation, for example, compatibility issues, multilingual translation, education level, and real-time gesture generation and translation. The following subsections elaborate on the most salient issues and challenges identified in the existing literature.

5.1. Accuracy, Robustness, and Real-Time Detection

Accurate real-time translation of sign language is challenging because of factors such as lighting conditions, power consumption, social acceptability, and privacy constraints. The question is, “How can we improve the accuracy and robustness of sign language detection and interpretation on smartphones to ensure reliable and real-time communication for users?” The difficulty arises because translation involves real-time image processing under resource constraints, such as limited processing power and storage [148]. Processing delays and false positive responses may further increase frustration for speech-disabled people. While smartphones are portable, entering gestures on a smartphone may require specific tools or another person to operate the camera for individuals with disabilities; without these provisions, there is a risk of improper gesture input and, consequently, a greater chance of errors.

5.2. Multilingual Support

Every region of the world has its own sign language for its speech-disabled people. This makes it difficult to translate one sign language into another, and hence the scope of any one system remains narrow [148]. The question “What techniques can be developed to support multiple sign languages on smartphones, accommodating diverse user needs?” remains open. Furthermore, there is a pressing need to establish a universal standard for sign language; such a standardized language could facilitate the development of universal smart devices, ultimately reducing the overall cost of equipment designed for these purposes.

5.3. Gesture Recognition

As mentioned, sign languages are detected either via sensors (the hardware approach) or via the vision-based approach. Sensor approaches, i.e., gloves or other wearable devices, are not socially acceptable and hence rarely used by people with speech disorders. The vision-based approach relies on image processing, which itself requires considerable energy, processing power, and storage [167]. The question “How can machine learning algorithms be optimized to recognize a wide range of sign language gestures and expressions accurately?” is yet to be answered. One reason may be that machine and deep learning algorithms are resource-intensive, and hence little attention has been given to smartphones; existing machine and deep learning algorithms therefore require proper optimization for smartphones.
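One common optimization route, shown here purely as an illustrative sketch, is post-training quantization of a trained model with TensorFlow Lite so that it can run efficiently on a smartphone. The stand-in Keras model below is hypothetical; in practice it would be a trained sign classifier such as the CNN sketched earlier.

```python
# Hedged sketch: post-training quantization of a Keras model for on-device inference.
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Input(shape=(64, 64, 1)),
                             tf.keras.layers.Flatten(),
                             tf.keras.layers.Dense(26, activation="softmax")])  # stand-in model

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # weight quantization shrinks the model
tflite_model = converter.convert()

with open("sign_classifier.tflite", "wb") as f:        # file would be bundled into the mobile app
    f.write(tflite_model)
print("quantized model size (bytes):", len(tflite_model))
```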

5.4. Data Privacy and Security

Privacy is everyone’s right, including people with special needs such as the visually impaired [168, 169] and those with speech disorders. Sign language communication patterns become vulnerable once they are processed by a machine [170], and signing in public may itself lead to privacy breaches. Therefore, the following question arises: “What measures can be implemented to ensure the privacy and security of sign language data transmitted and processed on smartphones?” This question needs proper attention, since messages in digital form are exposed to numerous security issues, such as chat leakage and hacking, among others. As a case study, some attempts have been made by Michigan State University (https://msutoday.msu.edu/news/2019/new-technology-breaks-through-sign-language-barriers) to address these pressing issues; however, more work is needed in this domain to ensure that sign language interpretation is risk-free. Proper encryption/decryption on the device used for translation could also mitigate privacy issues.
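As a hedged sketch of the kind of safeguard suggested above, the snippet below encrypts a translated message with symmetric encryption from the `cryptography` package before it is stored or transmitted; the message text is a placeholder, and key management (generation, exchange, secure storage) is deliberately out of scope.

```python
# Hedged sketch: encrypting translated sign language output before storage/transmission.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice, kept in the device's secure keystore
fernet = Fernet(key)

translated_text = "I need help at the ticket counter."   # placeholder translator output
token = fernet.encrypt(translated_text.encode("utf-8"))  # ciphertext safe to transmit
print("ciphertext:", token[:40], b"...")

plaintext = fernet.decrypt(token).decode("utf-8")        # decrypted on the receiving side
print("decrypted :", plaintext)
```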

5.5. Low-Light and Noisy Environments

Image processing in low light generates false positives, which directly affect performance and results [171, 172]. The question “How can sign language detection systems on smartphones perform effectively in low-light conditions and noisy environments?” remains open. Moreover, smartphones have limited battery life, which tends to deplete rapidly during image processing under low-light conditions, and machine and deep learning applications may further contribute to battery depletion.

These research questions cover various aspects of sign language detection on smartphones and offer opportunities to advance this field to better serve the needs of individuals with hearing and speech disabilities. Researchers, academics, and practitioners can focus on one or more of these questions to contribute to the development of innovative, low-cost, socially acceptable, and effective solutions.

6. Conclusion

The detection and interpretation of sign language for people with speech disorders, using cost-effective off-the-shelf devices, particularly smartphones, has gained substantial attention within the research and academic communities. Smartphones are a natural platform for accessibility solutions, given their growing processing power, mobility, storage capacity, and social acceptability. This paper presented a systematic literature review (SLR) on sign language detection and interpretation using pervasive and ubiquitous computing devices such as smartphones. The objective was to comprehensively analyze the progress achieved thus far with machine and deep learning approaches on smartphones and, in doing so, to gather insights into recent machine and deep learning approaches, available datasets, evaluation metrics, and current and emerging research trends in enhancing accessibility for individuals with speech disorders. In this connection, the paper is intended to provide valuable insights for researchers and practitioners engaged in accessibility initiatives, particularly in the domain of speech disorders. This study highlighted the most valuable literature published from 2012 to July 2023, along with the datasets and the numerous machine and deep learning approaches used on smartphones, focusing specifically on the detection and interpretation of sign languages via smartphones. The study suggests that the development of a universal sign language could greatly benefit both practitioners and developers, since it may mitigate the overhead costs associated with learning, detecting, and translating multiple sign languages; moreover, the focus should be on socially acceptable devices rather than expensive or complex wearable devices. This review may serve as a valuable contribution to the existing body of knowledge and is expected to offer a roadmap for future research in the domain of accessibility for speech-disabled individuals. Future work can address areas such as real-time, accurate translation on smartphones, privacy preservation during translation, and accurate gesture recognition in low-light conditions.

Data Availability

The collected data (in an Excel sheet) will be provided upon request. Most of the basic statistics regarding the systematic literature review are discussed within the paper.

Disclosure

This study was conducted at the Department of Computer Science, City University of Science and Information Technology, Peshawar, Pakistan.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.