Abstract

Recognizing and classifying rock lithology is an important part of geological surveys. Recognition based on rock thin sections has a long turnaround and a high cost, its accuracy cannot be guaranteed, and it offers no effective solution in the field. Smartphones, which are communication devices with multiple sensors, are carried by most geological survey workers. This paper proposes a method for recognizing rock lithology quickly and accurately in the field and develops a smartphone application based on a convolutional neural network: the phone’s camera is used to photograph a rock, and the type and lithology of the rock are identified in a very short time. The recognition model for rock images was built on ShuffleNet, a lightweight convolutional neural network, combined with the transfer learning method. The trained model was then deployed to the smartphone, and an application for identifying rock lithology was designed and developed to verify its usability and accuracy. The results showed that the recognition model reached an accuracy of 97.65% on the verification data set on the PC. On the smartphone, the accuracy on the test data set was 95.30%; the average recognition time for a single image was 786 milliseconds, with a maximum of 1,045 milliseconds and a minimum of 452 milliseconds. Images recognized with a single-image accuracy above 96% accounted for 95% of the test data set. This paper thus presents a new solution for the rapid and accurate recognition of rock lithology in field geological surveys, which meets the need of geological survey personnel to identify rock lithology quickly and accurately in field operations.

1. Introduction

The recognition of rocks is not only an important part of geological surveys but also a focus of geological research. The traditional recognition method consists of three steps [1, 2]. First, workers collect fresh rock samples during exploration. Second, after returning to the laboratory, a thin section with an area of about 2 × 2 cm is cut from the rock sample perpendicular to the stratification. One side of the section is ground flat on a grinding machine and glued to a glass slide with adhesive; the other side is then ground down until the section is 0.03 mm thick, and a cover glass is glued on with the adhesive. Finally, a knowledgeable and experienced geologist examines the thin section under a polarizing microscope to determine the rock type and structural parameters. This traditional method requires the observer to have very rich geological knowledge and experience. It also has many problems, such as strong subjectivity, a long identification period, and poor applicability in the field.

The development of computer vision and image processing technology has brought great changes to rock recognition and mineral analysis [3, 4]. Many researchers have analyzed the texture, fabric, granularity, and lithology of rocks using image processing techniques such as image analysis and feature extraction. Patel used a probabilistic neural network (PNN) to develop a lab-scale vision-based model that takes color histogram features as input. The model achieved good recognition results, with a misclassification error for limestone of less than 6%. The main limitation of this study is that the classification object is the entire rock sample of multiple rocks, and identifying rocks on site was not considered [5, 6]. Based on basalt thin section images, Singh et al. extracted 27 characteristic parameters and identified 300 rock thin sections [7]. The recognition accuracy over the three texture categories was 92.22%, an improvement over previous studies, but the number of categories was small. Building on research into image features, Cheng Guojian and Yin Juanjuan applied a support vector machine to classify a total of 100 rock thin sections in 4 categories, with an accuracy of 80% [8]; the disadvantage is that the model performed poorly. With the continuous development of deep learning in intelligent image recognition, many researchers have used deep learning methods to identify rock images automatically [9–11]. Zhang Ye et al. were the first to use the transfer learning method to identify and classify rock images automatically, and they achieved effective identification of three rock types: granite, Chiba, and breccia [12]. However, the experimental data are few and cannot meet the needs of on-site recognition. Li et al. used the transfer learning method to train on sandstone microscopic images and obtained a high-precision classification model for sandstone thin section microscopic images [13]. The disadvantage is poor adaptability: the model is suitable only for sandstone. Cheng Guojian, Guo Wenhui, and others realized automatic granularity recognition based on rock thin section images, with an identification accuracy of 98.5% [14]. However, the objects identified in that study are thin section images, which must be prepared in the laboratory and cannot be applied directly at the work site. Based on computer vision and machine learning, Marmo et al. used more than 1,000 carbonate thin sections. They built a multilayer perceptron neural network model on grayscale digital images and trained it on texture data, reaching a classification accuracy of 93.3% [15]. Guo Chao et al. used the original color image of the rock to describe the feature space. Their method calculates standard arithmetic values of the different color channels in combination with their morphologies [16]. A neural network is used to establish the mapping between the feature space and the rock image category, and the algorithm was tested on 100 rock thin section images from the Ordos Basin. The results show that the automatic recognition rate of rock images in different color spaces is more than 95%.

The studies above work on standard rock thin section images rather than on the more complex photographs of the rocks themselves. They rely on a variety of elaborate feature extraction algorithms, and the amount of rock image data they cover is small. These results alleviate the strong subjectivity and high recognition cost of the traditional method, but they cannot meet the need of geological survey personnel to identify rock lithology quickly, in real time, in the field. Smartphones are now ubiquitous handheld communication and computing devices with multiple sensors that workers can use anytime and anywhere. This paper therefore proposes a rock image recognition method that is suited to smartphones and is fast and accurate, and it designs and develops an application that runs on Android smartphones. Because smartphone computing and storage resources are limited, the method is based on ShuffleNet, a lightweight convolutional neural network. Through transfer learning, what ShuffleNet has learned on the large ImageNet data set is transferred to the experimental data set of this paper, a rock image data set with a total of 30 categories. After retraining, the resulting rock recognition model is exported and deployed on Android-based smartphones, where the application helps staff identify rock lithology quickly and accurately on site. The model extracts features directly from the image pixels without manual operation, which reduces the influence of subjective factors. Moreover, the training process places few requirements on rock image size, imaging distance, and light intensity. Compared with the traditional method, the proposed method addresses the problems of strong subjectivity, high identification cost, and long cycle. It also has advantages over analysis and feature extraction techniques based on thin section images: for example, it can identify more complex images of rocks directly, without making thin sections. Using the smartphones that workers already carry, it can therefore meet their requirement to identify rock lithology quickly and accurately in the field.

2. Materials and Methods

2.1. Rock Recognition Model Structure Design
2.1.1. ShuffleNet

ShuffleNet is an extremely efficient convolutional architecture designed for mobile devices such as smartphones. It was proposed by Zhang et al. [17]. The depthwise separable convolution and group convolution introduced by Xception and ResNeXt balance the representational ability and the computation of the model, but their pointwise (1 × 1) convolutions account for a large share of the computation [18–21]. ShuffleNet addresses this problem with two operations: pointwise group convolution and channel shuffle. A large number of 1 × 1 convolutions consume a lot of computing resources, and pointwise group convolution reduces this computational complexity. Compared with existing advanced CNN models such as MobileNets [22, 23], ShuffleNet greatly reduces both the amount of computation and the number of parameters at similar accuracy. As shown in Figure 1, channel shuffle rearranges the channels of the feature maps in an orderly way to form new feature maps, which solves the problem of “poor information flow” caused by group convolution.

Group convolution effectively reduces the computational cost, but each output then comes only from a fixed subset of the input channels. This prevents feature exchange between channel groups and does not yield the optimal representation. ShuffleNet uses channel shuffle to build associations between input and output channels. Consider a convolutional layer with $g$ groups whose output has $g \times n$ channels. The output channel dimension is first reshaped into $(g, n)$, then transposed and flattened to form the input to the next layer. Figure 2 shows the ShuffleNet unit, which is built around the channel shuffle operation.
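The reshape, transpose, and flatten steps of channel shuffle can be illustrated with a minimal Python sketch. It operates on a flat list of channel labels rather than on real feature maps, and the function name `channel_shuffle` and the example labels are illustrative assumptions, not part of the original implementation.

```python
def channel_shuffle(channels, groups):
    """Shuffle a flat list of channels produced by a grouped convolution.

    The list is viewed as a (groups, n) grid, transposed to (n, groups),
    and flattened again, so that neighbouring output channels come from
    different groups.
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    n = len(channels) // groups
    # Reshape: row i holds the n channels of group i.
    grid = [channels[i * n:(i + 1) * n] for i in range(groups)]
    # Transpose and flatten: take channel j of every group in turn.
    return [grid[i][j] for j in range(n) for i in range(groups)]


# Six channels in g = 3 groups (a, b, c) of n = 2 channels each.
labels = ["a0", "a1", "b0", "b1", "c0", "c1"]
shuffled = channel_shuffle(labels, groups=3)
print(shuffled)  # ['a0', 'b0', 'c0', 'a1', 'b1', 'c1']
```

After the shuffle, each consecutive block of three channels contains one channel from every group, so a following group convolution sees information from all groups.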

The ShuffleNet architecture is built primarily from a stack of ShuffleNet units. A ShuffleNet unit consists of a 1 × 1 pointwise group convolution layer followed by a channel shuffle operation. Under the same conditions, the computational cost of this structure is low. For an input of size $c \times h \times w$ with $m$ bottleneck channels and $g$ groups, a ShuffleNet unit requires only $hw(2cm/g + 9m)$ floating-point operations (FLOPs), whereas a ResNet unit requires $hw(2cm + 9m^2)$ FLOPs. Compared with MobileNet, the ShuffleNet model achieves an absolute 7.8% lower ImageNet top-1 error at a cost of approximately 40 million floating-point operations (MFLOPs). The channel split operation was proposed in ShuffleNet V2. First, the input feature channels are divided into two branches. One branch remains unchanged, and the other passes through a 1 × 1 convolution and a 3 × 3 depthwise separable convolution. The features of the two branches are then concatenated, and the channel shuffle operation is applied. After the channels are shuffled, the next unit begins. It is reported that ShuffleNet V2 is about 40% faster than ShuffleNet V1 and about 16% faster than MobileNet V2. With 500 MFLOPs, ShuffleNet V2 is 58% faster than MobileNet V2 and 63% faster than ShuffleNet V1 [24, 25].
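To make the cost comparison concrete, the sketch below evaluates the per-unit costs given in the ShuffleNet paper [17]: $hw(2cm + 9m^2)$ for a ResNet bottleneck unit and $hw(2cm/g + 9m)$ for a ShuffleNet unit. The layer dimensions in the example are hypothetical values chosen only for illustration; they are not taken from the model in this paper.

```python
def resnet_unit_flops(c, h, w, m):
    """Cost of a ResNet bottleneck unit: hw(2cm + 9m^2)."""
    return h * w * (2 * c * m + 9 * m * m)


def shufflenet_unit_flops(c, h, w, m, g):
    """Cost of a ShuffleNet unit with g groups: hw(2cm/g + 9m)."""
    return h * w * (2 * c * m / g + 9 * m)


# Hypothetical layer: 240 input channels, 28 x 28 feature map,
# 60 bottleneck channels, 3 groups.
c, h, w, m, g = 240, 28, 28, 60, 3
resnet_cost = resnet_unit_flops(c, h, w, m)
shuffle_cost = shufflenet_unit_flops(c, h, w, m, g)
print(f"ResNet unit:     {resnet_cost:,.0f} FLOPs")
print(f"ShuffleNet unit: {shuffle_cost:,.0f} FLOPs")
print(f"ratio:           {resnet_cost / shuffle_cost:.2f}")
```

The saving comes from two places: the group convolution divides the 1 × 1 cost by g, and the depthwise 3 × 3 convolution costs 9m instead of 9m².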

2.1.2. Transfer Learning and Model Construction

The transfer learning method applies the knowledge learned from other tasks (source tasks) to the target task. It helps in building the mathematical model of the target task and reduces both duplicated effort and the dependence on training data for the target task [26–28]. A comparison of transfer learning and traditional machine learning is shown in Figure 3. In traditional machine learning, a separate learning system must be built for each learning task, even when the tasks are similar. In transfer learning, by contrast, the knowledge learned in solving the source task is transferred to the learning system that solves the target task.

In terms of the structure and function of a deep neural network, the convolutional layers mainly extract features and share parameters, and the pooling layers reduce the number of parameters [27–30]. The final fully connected layer integrates the features extracted by the network to obtain the high-level meaning of the image features, and the classifier then produces the final classification result [26, 31]. When the data set is small and its distribution is unbalanced, training tends to overfit: the model performs well on the training data set but poorly on the verification and test data sets. Transfer learning mitigates this problem well.

2.1.3. Model Structure Design

Through transfer learning, the model parameters trained by ShuffleNet on the large ImageNet data set are migrated to the 30 classes of the experimental rock image data set to help train the rock image recognition model. ShuffleNet can extract valid information from images [32, 33]. The difference between rock image data sets and large data sets is relatively small, but the rock data set is a small, specialized data set with fine-grained categories. Rock image recognition is therefore a fine-grained classification problem on a small data set. This paper used the transfer learning method to perform rock image recognition: most parameters of the network pretrained on large data sets were retained and adjusted to fit this data set. The input image resolution can be 128, 160, 192, or 224 pixels, and the input size affects the classification results [34]; this paper used 224 as the initial setting. The relative size of the model can be set to 1.0, 0.75, 0.50, or 0.25; this paper recommends 0.5 as the initial setting. Smaller models run significantly faster but at the expense of accuracy.

To allow the trained model to be deployed on a smartphone, the lightweight convolutional neural network ShuffleNet described in Section 2.1.1 was used. The weights and parameters pretrained by ShuffleNet on the ImageNet data set were imported and adapted to the characteristics of the rock image data set. Each convolutional layer used ReLU as the activation function, and batch normalization was used to normalize the distribution of each batch. The Softmax classifier described in Section 2.1.4 was used for classification. The model was then trained on the rock image data set. The model structure is shown in Table 1.

2.1.4. Softmax Regression Model

For this multiclass problem, the rock recognition model uses the Softmax regression algorithm to map the output values of multiple neural units into the interval (0, 1) so that they sum to 1 [29, 35]. The model thus assigns each sample a probability of belonging to each category, which realizes multiclass classification. Let the training set consist of $m$ labeled samples, i.e., $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$. The range of the category label is $y^{(i)} \in \{1, 2, \ldots, k\}$. Let $p(y = j \mid x)$ denote the probability that the sample is assigned to category $j$ given the input $x$. The output of the $k$-class classifier is therefore a $k$-dimensional vector whose elements sum to 1. By analogy with logistic regression, the hypothesis function expresses the output of Softmax as

$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)} = 1 \mid x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)} = k \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ \vdots \\ e^{\theta_k^{T} x^{(i)}} \end{bmatrix}.$$

In this equation, the input $x^{(i)}$ is a vector of dimension $n \times 1$, and the model parameter $\theta = [\theta_1, \ldots, \theta_k]$ is a matrix of order $n \times k$. Training searches for the optimal $\theta$ through continuous iteration so that the predicted value approaches the actual value. The cost function of the regression model can be expressed as

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}} \right],$$

where $1\{\cdot\}$ is the indicator function: 1{expression whose value is true} = 1 and 1{expression whose value is false} = 0.

There is currently no closed-form solution for minimizing $J(\theta)$. In this paper, an iterative algorithm, gradient descent, is used to solve the problem. Taking derivatives gives the gradient

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right],$$

where $\nabla_{\theta_j} J(\theta)$ is a vector whose $l$-th element is the partial derivative of $J(\theta)$ with respect to the $l$-th component of $\theta_j$.

In this paper, a weight decay term is added to the cost function to make it strictly convex, which ensures convergence and a unique solution. The cost function is modified by adding $\frac{\lambda}{2} \sum_{i=1}^{n} \sum_{j=1}^{k} \theta_{ij}^{2}$, where $n$ is the dimension of the input; this weight decay term penalizes excessively large parameter values. The cost function becomes

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}} \right] + \frac{\lambda}{2} \sum_{i=1}^{n} \sum_{j=1}^{k} \theta_{ij}^{2}.$$

With the weight decay term ($\lambda > 0$), the cost function is strictly convex, so a unique solution is guaranteed, and because $J(\theta)$ is convex, gradient descent is guaranteed to converge to the global optimum. To choose $\lambda$, the data set is divided into three parts: training set, validation set, and test set. Candidate values of $\lambda$ are tried from small to large; for each value, the model parameters are learned on the training set and the error is computed on the validation set, and the $\lambda$ with the smallest validation error is selected. Finally, the model with the selected $\lambda$ is evaluated on the test set. Applying the optimization algorithm requires the derivative of the new cost function:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right] + \lambda \theta_j.$$

A usable Softmax regression model is obtained by minimizing $J(\theta)$.
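The Softmax regression procedure described above (class probabilities, cross-entropy cost with a weight decay term, and gradient descent) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the training code used in this paper: the toy data, the function names, the learning rate, and the value of the weight decay coefficient `lam` are all hypothetical, and class labels are indexed from 0 rather than from 1.

```python
import math


def softmax(scores):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(theta, x):
    """theta holds one parameter vector per class; x is one feature vector."""
    return softmax([sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta])


def cost(theta, xs, ys, lam):
    """Mean cross-entropy plus the weight decay term (lam / 2) * sum(theta^2)."""
    m = len(xs)
    data_term = -sum(math.log(predict_proba(theta, x)[y]) for x, y in zip(xs, ys)) / m
    decay = 0.5 * lam * sum(t * t for theta_j in theta for t in theta_j)
    return data_term + decay


def gradient(theta, xs, ys, lam):
    """Gradient of the regularised cost with respect to every theta_j."""
    m, k, n = len(xs), len(theta), len(theta[0])
    grad = [[lam * theta[j][d] for d in range(n)] for j in range(k)]
    for x, y in zip(xs, ys):
        p = predict_proba(theta, x)
        for j in range(k):
            coeff = ((1.0 if y == j else 0.0) - p[j]) / m
            for d in range(n):
                grad[j][d] -= coeff * x[d]
    return grad


def fit(xs, ys, k, lam=0.01, lr=0.5, steps=500):
    """Plain batch gradient descent starting from theta = 0."""
    n = len(xs[0])
    theta = [[0.0] * n for _ in range(k)]
    for _ in range(steps):
        g = gradient(theta, xs, ys, lam)
        theta = [[theta[j][d] - lr * g[j][d] for d in range(n)] for j in range(k)]
    return theta


# Toy data: three classes, two features plus a constant bias feature.
xs = [[1.0, 0.0, 1.0], [0.9, 0.1, 1.0], [0.0, 1.0, 1.0],
      [0.1, 0.9, 1.0], [-1.0, -1.0, 1.0], [-0.9, -1.1, 1.0]]
ys = [0, 0, 1, 1, 2, 2]
theta = fit(xs, ys, k=3)
predictions = [max(range(3), key=lambda j: predict_proba(theta, x)[j]) for x in xs]
print(predictions)
```

In the actual model the feature vector is the output of the ShuffleNet feature extractor rather than a hand-made list, and stochastic mini-batches replace the full batch used here.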

2.2. Model Training
2.2.1. Data Set and Data Preprocessing

The rock image data set used in the experiments was collected by geological survey workers, who photographed fresh rocks with smartphones. The rocks are mainly rock foam rhyolite of different lithologies and dark gray stomatal almond-like rough rocks. The data set covers 30 kinds of rock, for example light gray rhyolite, purple red tuff, gray black obsidian, purple gray amphibolic rhyolite, and potassium gray white rock. The images came from multiple locations in East China, with sizes between 3M and 6M. In this paper, each image is compressed to 224 × 224 pixels, on the condition that accuracy is preserved. Figure 4 shows sample images of the rocks.

Thirty different kinds of rocks were collected, and a total of 3,795 images were taken. Images were randomly selected from the rock samples at a ratio of about 8:1:1 to form the training, verification, and test data sets: 3,046 images in the training set, 381 in the verification set, and 368 in the test set. The detailed data distribution is shown in Figure 5.
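A random split at a ratio of 8:1:1 can be sketched as follows. The sketch splits each class separately so that every rock type appears in all three subsets; whether the original split was done per class is an assumption, and the function name, file names, and class sizes are hypothetical.

```python
import random


def split_by_class(samples_by_class, ratios=(0.8, 0.1, 0.1), seed=0):
    """Randomly split each class into training, verification and test subsets."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in samples_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train += [(path, label) for path in items[:n_train]]
        val += [(path, label) for path in items[n_train:n_train + n_val]]
        # Whatever remains after rounding goes to the test subset.
        test += [(path, label) for path in items[n_train + n_val:]]
    return train, val, test


# Hypothetical file lists for two of the rock classes.
data = {
    "light gray rhyolite": [f"rhyolite_{i}.jpg" for i in range(50)],
    "purple red tuff": [f"tuff_{i}.jpg" for i in range(23)],
}
train, val, test = split_by_class(data)
print(len(train), len(val), len(test))
```

Because the subset sizes are rounded per class, the totals follow the 8:1:1 ratio only approximately.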

Inspection of the data set shows that the number of images per category was unbalanced and that some categories had very few images. Rotation, flipping, cropping, and brightness adjustment were therefore applied at random to expand the training data set and improve training performance.
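The augmentation operations named above can be sketched in plain Python on a small grayscale image stored as a list of rows. This is an illustration only: the function names, probabilities, and brightness range are assumptions, and a real pipeline would apply the equivalent operations from an image library to the colour photographs.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def crop(image, top, left, height, width):
    """Cut out a height x width window whose corner is at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def adjust_brightness(image, factor):
    """Scale pixel values by factor and clamp them to the range 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def random_augment(image, rng):
    """Apply a random flip, a random number of quarter turns, and a brightness change."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        image = rotate_90(image)
    return adjust_brightness(image, rng.uniform(0.8, 1.2))


# A tiny 2 x 3 grayscale "image" stands in for a rock photograph.
sample = [[10, 20, 30], [40, 50, 60]]
augmented = random_augment(sample, random.Random(0))
print(augmented)
```

Generating several augmented copies of each image in the under-represented categories is one way to reduce the imbalance before training.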

2.2.2. Training Model

The rock recognition model was trained by transfer learning in the TensorFlow framework on the PC. The ShuffleNet network structure was built in the Python programming language, and the parameters pretrained by ShuffleNet on the ImageNet data set were imported. Experiments were run on a Linux PC with a Core i9 series CPU, 32 GB of RAM, and an NVIDIA GeForce GTX Titan Z GPU with 12 GB of memory. The default number of training iterations was 3,600, and the learning rate was 0.008. The activation function was ReLU. In each iteration, 50 images were randomly selected from the data set for training, and 15 images were randomly selected for cross-validation. Softmax was used as the classifier, and stochastic gradient descent was used as the optimization method. Training accuracy is the percentage of images in the current training batch that are classified correctly, while verification accuracy is the percentage of the randomly selected verification images that are classified correctly. Cross-entropy reflects the learning effect during training: the smaller the value, the better the learning effect.

As can be seen from Figure 6, the loss begins to converge after the 160th iteration. After the 640th iteration, it remains stable and close to zero, and the training accuracy is close to 100%. The accuracy on the verification set is slightly lower. After 3,600 iterations, the accuracy of the rock recognition model on the training set approaches 100%, with a loss of only 0.0004; the accuracy on the verification data set reaches 97.65%, with a loss of only 0.1052. The changes in training accuracy, verification accuracy, and cross-entropy show that the training effect of the model is relatively ideal.

2.3. Accuracy and Run Time Comparison

This paper demonstrates the advantages of the rock recognition model based on the combination of the ShuffleNet convolutional neural network and the transfer learning method. On the personal computer, the accuracy and running time of the ShuffleNet-based rock recognition model were compared with those of the MobileNet model, the SqueezeNet model, and the standard convolutional network ResNet50. Originally designed for mobile and embedded vision applications, MobileNets are built primarily from depthwise separable convolutions, which decompose a standard convolution into a depthwise convolution and a pointwise convolution. The depthwise convolution applies a single filter to each input channel, and the pointwise convolution then combines the outputs linearly. MobileNet V2 was proposed as a further improvement. It is built from inverted residuals and linear bottlenecks, which reduce the number of parameters and the information lost in the activation operations. Combined with the single-shot detector lite (SSDLite) used for object detection, MobileNet V2 was reported to be 35% faster than MobileNet V1, with 20 times less computation and 10 times fewer parameters than YOLO V2 [36]. SqueezeNet aims to maintain accuracy with a small number of parameters, and its core structure is a new type of component, the Fire module. Three main strategies are used in building the Fire module [37, 38]. First, 3 × 3 filters are replaced with 1 × 1 filters. Second, the number of input channels to the 3 × 3 filters is reduced. Finally, downsampling is delayed until later in the network. The evaluation results showed that SqueezeNet has 50 times fewer parameters than the original AlexNet while maintaining AlexNet-level accuracy on ImageNet.

2.4. Software Deployment

In this study, the trained rock recognition model was deployed on smartphones or embedded products. The significance of the approach is that geological investigators can use smartphones for recognition at the work site instead of relying on huge servers in the laboratory. Recognition in the laboratory is not a bad option, but if workers take rock samples back to the laboratory to make thin sections, the recognition period and cost increase. Many applications are very sensitive to response time; even a small latency in the service response can have a significant impact on the user. More and more applications provide core functionality through deep learning models, so low-latency inference is becoming increasingly important, whether models are deployed to the cloud or to smartphones. One approach is to run model inference on a high-performance cloud server and transfer the model inputs and outputs between the client and the server. However, this brings many problems, such as high computing costs, massive data transfer over mobile networks, user privacy concerns, and increased latency. The compressed ShuffleNet model offers an alternative for these scenarios because it requires fewer resources to perform inference. This section describes the process of deploying the ShuffleNet model on a smartphone.

The rock recognition model is trained on a PC server, where it runs under the TensorFlow framework on Linux. The model cannot be run directly on a smartphone and requires some conversion and deployment steps: the CNN model trained on Linux is converted to the (.pb) format and deployed on Android smartphones. To implement the identification of rock lithology in the field, this paper developed an application that runs on Android smartphones. Phones from Huawei, Samsung, and Oppo, three common smartphone brands on the market, were selected as the experimental platforms. The interfaces of the application are shown in Figure 7.

The application is written in the Java programming language and runs on Android smartphones (Android 4.4 operating system or higher). The phone should have more than 4 GB of RAM and at least 32 GB of storage. The application loads and runs the CNN model trained under the TensorFlow framework [39, 40]. It can identify both rock images captured by the smartphone in the field and rock photos stored in the phone’s gallery. The software displays the recognition results (the type and nature of the rock, the recognition accuracy, and the execution time) on the interface. The Huawei P20 was used to test the accuracy of the model on a smartphone. Its built-in main camera has a focal length of 3.95 mm and a resolution of 2244 × 1080 pixels, which is supported by most smartphones. The Samsung Galaxy A8s and the Oppo R17, each with a 16-megapixel rear camera (Android 7.0), were used to illustrate the performance of the app on other phones.

Operating the application is simple and convenient. Install the application on the smartphone carried by the investigator. Open the software at work. Then, click “Camera” to enter the camera photographing interface. Then, point the camera at the rock and click “photographing.” After this series of steps, the captured image is loaded into the interface to be recognized. Then, click “recognize,” and the recognition result (including the type and lithology of the rock, the recognition accuracy, and the recognition time) is displayed on the interface in less than 1 second.
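The recognition step that follows the “recognize” click can be sketched as below: run the model on the image, time the call, and report the most probable class. The sketch is written in Python for brevity rather than in the Java used by the application, and `dummy_model`, the label list, and the result fields are hypothetical stand-ins for the deployed model, its label file, and the values shown on the interface.

```python
import time


def top_prediction(probabilities, labels):
    """Return the label with the highest probability and that probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best], probabilities[best]


def recognize(image, model, labels):
    """Run the model on one image and collect what the interface displays."""
    start = time.perf_counter()
    probabilities = model(image)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    name, confidence = top_prediction(probabilities, labels)
    return {"rock": name, "confidence": confidence, "time_ms": elapsed_ms}


def dummy_model(image):
    """Stand-in for the deployed network: returns fixed class probabilities."""
    return [0.02, 0.95, 0.03]


labels = ["light gray rhyolite", "purple red tuff", "gray black obsidian"]
result = recognize(None, dummy_model, labels)
print(f"{result['rock']} ({result['confidence']:.0%}), {result['time_ms']:.1f} ms")
```

Timing only the model call, as here, excludes image capture and loading, so it corresponds to the recognition time rather than to the whole user interaction.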

2.4.1. Software Development Platform

On Windows 10, the application development environment was built on Android Studio 3.3, the Android SDK (software development kit), Java JDK 8 (Java Development Kit), the TensorFlow Lite development kit (which allows a model trained under the TensorFlow framework to be migrated to Android smartphones), and ADT (Android development tools). The application is suitable for Android 6.0 and above.

2.4.2. Software Performance and Technical Overview

The software can be used by all types of Android smartphone users. It provides a friendly graphical user interface with a linear workflow. The interface shows only the rock identification results (type, lithology, and accuracy) and the identification time, so the user can operate the application without knowing its internals. The technical system of the application consists of two parts: an underlying part built on the TensorFlow Lite development interface, and an Android application layer built on the Android native application programming interface (API) [41, 42]. TensorFlow Lite deploys the CNN model trained under the TensorFlow framework to Android smartphones. According to the principle described in Section 2.1, TensorFlow Lite provides a Jar package written in Java and a dynamic library in (.so) format written in C++. The latter provides APIs for operating on models, such as functions for reading models, running recognition, and producing output. The Android native API implements the main parts of the application and the graphical user interface. It is responsible for coordinating tasks, invoking the Android camera to capture rock images, and storing the results correctly. When a rock image captured in the Android application layer needs to be identified and analyzed, a request is made to the TensorFlow Lite underlying part. Direct communication between the two parts is not feasible, so the Java Native Interface (JNI) is used for this interaction: the Android application layer calls the required functionality through JNI, which executes the C++ library and returns the results.

3. Results

3.1. The Accuracy and Time of the Rock Recognition Model Tested on the Smartphone

The purpose of this study is to help geologists use smartphones to identify the types and lithology of rocks quickly and accurately in the field. The accuracy and running time of identification are therefore very important. The test data set, the trained model in (.pb) format, and the label file in (.txt) format containing the rock information are imported onto the SD card of the Huawei smartphone. When the application described in Section 2.4 is run, the rock recognition model is loaded automatically. The test data set on the smartphone is then read, and the recognition results for the test set are obtained. As shown in Figure 8, the accuracy of the model is represented by a confusion matrix.

In the matrix, the rows correspond to the true classes and the columns to the predicted classes. The accuracy over the whole test set is 95.30%, and images recognized with a single-image accuracy above 96% account for 95% of the test data set.
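How an overall accuracy is read from a confusion matrix with true classes in the rows and predicted classes in the columns can be shown with a short sketch. The labels below are a made-up three-class example, not the test results of this paper, and the function names are illustrative.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples by (true class, predicted class); rows are true classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Correct predictions lie on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Made-up example with three classes and seven test images.
true_labels = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 2, 2, 2, 2]
matrix = confusion_matrix(true_labels, predicted, num_classes=3)
print(matrix)                    # [[2, 0, 0], [0, 1, 1], [0, 0, 3]]
print(overall_accuracy(matrix))  # 6 correct out of 7
print(per_class_recall(matrix))  # [1.0, 0.5, 1.0]
```

Off-diagonal entries show which rock types are confused with each other, which the single accuracy figure does not reveal.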

The distribution of single-image recognition times on the test data set is shown in Figure 9. The average recognition time for a single image is 786 milliseconds, the maximum is 1,045 milliseconds, and the minimum is 452 milliseconds. The boxplot has no outliers, which indicates that the recognition time of the model is stable.
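The summary statistics and the boxplot outlier check can be reproduced with Python's standard `statistics` module. The timing values below are hypothetical; only their minimum and maximum are chosen to match the reported 452 and 1,045 milliseconds. The outlier rule used here, 1.5 times the interquartile range beyond the quartiles, is the common boxplot convention and is assumed rather than taken from the paper.

```python
import statistics


def timing_summary(times_ms):
    """Mean, extremes, and boxplot-style outliers of recognition times."""
    q1, _, q3 = statistics.quantiles(times_ms, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(times_ms),
        "min": min(times_ms),
        "max": max(times_ms),
        "outliers": [t for t in times_ms if t < low or t > high],
    }


# Hypothetical per-image recognition times in milliseconds.
times = [452, 610, 700, 760, 790, 820, 880, 950, 1045]
summary = timing_summary(times)
print(summary)
```

An empty outlier list corresponds to a boxplot without points beyond the whiskers.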

3.2. Correlative Experiments and Analysis

This paper presented a rock recognition model based on the ShuffleNet convolutional neural network combined with the transfer learning method. To verify the advantages of this model, its accuracy and running time were compared with those of other CNN models (MobileNet and SqueezeNet) trained and validated on the same training and validation data sets.

3.2.1. Compared with Other CNN Models

Training was evaluated with batch sizes of 8, 16, 32, and 48. Figure 10 shows the relationship between model accuracy and training epoch. The accuracy reported in this section is the accuracy of recognizing the rock type and lithology. The data and other parameter settings were the same as those described in Section 2.2.

All models converged after about 35 training epochs. The model trained with a batch size of 32 achieved better performance, about 5% better than the other models. SqueezeNet was more stable and smooth during training, while MobileNet and ShuffleNet fluctuated more. This is reasonable, because the gradient descent direction is estimated more accurately and changes more smoothly with larger batch sizes during training. Smaller batch sizes introduce more randomness and make it harder to reach optimal performance.

3.2.2. Execution Time Evaluation

Different CNN models differ in the depth and width of their layers and in the number, size, and shape of their filters, which leads to different structures, parameters, and complexity. In this paper, the ShuffleNet-based rock recognition model is compared with models based on other convolutional neural networks in terms of running time. The training and testing times of the MobileNet, ShuffleNet, SqueezeNet, and ResNet50 models were evaluated with various batch sizes (8, 16, 32, and 48). Table 2 gives the experimental results.

With batch sizes of 8, 16, 32, and 48, the MobileNet model had the longest training time per iteration: 1.265 s, 2.364 s, 4.728 s, and 8.512 s, respectively. The ShuffleNet model had the shortest training time, about 75% of that of MobileNet. ShuffleNet also had the shortest test time, 0.125 s. Compared with the ResNet50 model, the compressed CNN models greatly improved running efficiency. In terms of storage, MobileNet, ShuffleNet, and SqueezeNet required 34.5 MB, 18.2 MB, and 25 MB, respectively, whereas the ResNet50 model required 219.4 MB. The experimental results show that the ShuffleNet model is efficient and compact: it is 7 times faster than the standard convolutional network ResNet50 and about 12 times smaller.
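The ratios quoted above can be checked with simple arithmetic. The following sketch uses only the figures given in this section, with model sizes read as megabytes; the ShuffleNet training times are estimates derived from the "about 75%" statement, not measured values from Table 2, and the helper names are hypothetical.

```python
# Figures quoted in this section.
mobilenet_train_s = {8: 1.265, 16: 2.364, 32: 4.728, 48: 8.512}
model_size_mb = {
    "MobileNet": 34.5,
    "ShuffleNet": 18.2,
    "SqueezeNet": 25.0,
    "ResNet50": 219.4,
}

# "About 75%" of the MobileNet training time; an approximation, not a measurement.
SHUFFLENET_FRACTION = 0.75


def size_ratio(reference, model):
    """How many times larger the reference model is than the given model."""
    return model_size_mb[reference] / model_size_mb[model]


def estimated_shufflenet_train_s():
    """Estimated ShuffleNet training time per iteration for each batch size."""
    return {b: t * SHUFFLENET_FRACTION for b, t in mobilenet_train_s.items()}


print(round(size_ratio("ResNet50", "ShuffleNet"), 2))
for batch, seconds in estimated_shufflenet_train_s().items():
    print(batch, round(seconds, 2))
```

The size ratio of ResNet50 to ShuffleNet comes out at roughly 12, in agreement with the statement that the ShuffleNet model takes up about 12 times less space.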

3.3. Comparing the Recognition Times Using Different Android Smartphones

The compressed CNN models were deployed on Android smartphones, and their performance was tested. After conversion, the ShuffleNet, MobileNet, and SqueezeNet model files were 15.2 MB, 8.4 MB, and 42.8 MB, respectively. Huawei, Samsung, and Oppo phones were used for testing; their processors are the Kirin 970, Qualcomm Snapdragon 710, and Qualcomm Snapdragon 670, respectively. Each of the three models was deployed on each of the three smartphones, and Table 3 shows the test results. The results showed that the smartphones were very efficient and could run the models in 0.5 seconds, enabling real-time applications. The Huawei phone achieved the best performance, taking 0.283 seconds to execute the model, owing to its built-in neural network processing unit. As shown in Table 3, the ShuffleNet model achieved the best performance.
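Recognition times of this kind are usually summarized as the mean, minimum, and maximum over repeated runs. The sketch below shows one way such a measurement could be organized. It is not the Android test code used in this paper: it is written in Python for illustration, `measure_latency` is a hypothetical helper, and `fake_inference` is a stand-in for one forward pass of the deployed model.

```python
import statistics
import time


def measure_latency(run_once, repeats=20, warmup=3):
    """Return mean, min, and max wall-clock latency of run_once in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up runs are executed but not timed
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


def fake_inference():
    # Hypothetical stand-in for one forward pass of the deployed model.
    sum(i * i for i in range(10_000))


stats = measure_latency(fake_inference)
print(stats)
```

Discarding a few warm-up runs is a common design choice, since the first executions often include one-off costs such as model loading that would distort the average.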

3.4. The Advantages of the Presented Method

The rock identification method in this paper was compared with the traditional method and the method based on rock thin section image processing. Table 4 shows the advantages of the presented method. The model can quickly identify the types and properties of rocks while meeting the accuracy requirements. Recognition results are available in less than 1 second after a photo is taken in the field. No rock thin sections need to be prepared, which reduces the cost of identification. The experimental results show that the lightweight convolutional network model has clear advantages in model compression and computation. It is suitable for rapid and accurate recognition of rock lithology under offline field conditions.

4. Discussion

In this paper, the lightweight convolutional neural network ShuffleNet was used to solve the problem of recognizing the types and lithology of rocks in the field. The pretrained ShuffleNet model was fine-tuned with the transfer learning method, that is, it was retrained on the rock image data set of this paper. The trained rock model was then deployed to Android smartphones, and an intelligent program for quickly identifying rocks in geological surveys was developed. This program effectively recognized 30 types of rocks, such as granite, rhyolite, tuff, and breccia. The accuracy of the recognition model reached 97.65% on the verification data set on the PC and 95.30% on the test data set on the smartphone. The average recognition time of a single rock image was 786 milliseconds, and the model size was only 18.2 MB. For the same rock model, there was no significant difference in the recognition results between different smartphones. The model extracted features directly from the image pixels without manual operation, thus reducing the influence of subjective factors. Compared with rock recognition based on thin section image processing [3, 43], the presented method has lower requirements on the size, imaging distance, and light intensity of the rock image.

The rock recognition model trained with ShuffleNet was also compared with models trained with MobileNet and SqueezeNet in terms of accuracy and running time. The ShuffleNet-based model has several advantages: it effectively reduces the number of model parameters, compresses the model size, improves calculation speed, and shortens running time. The method has a short recognition time and high accuracy and is suitable for fast and accurate recognition of rock images under offline conditions in the field. Based on the lightweight convolutional neural network ShuffleNet, the characteristics of rocks in the image were effectively identified. No failures occurred during the tests on the PC and smartphones, which supports the robustness and generalization ability of the model.

The greatest contribution of this paper is a solution that allows geological surveys to recognize rock lithology quickly and accurately. Traditional recognition methods require not only the collection of fresh rock samples to make rock thin sections but also knowledgeable or experienced professionals to determine the rock type and structural parameters under the microscope. The traditional method is highly subjective, takes a long time, and is difficult to apply in the field, and it requires the observer to have very rich geological knowledge and experience [42–44]. At present, most deep learning techniques for rock recognition are used to identify rock thin sections, and the same is true for image processing techniques. Workers need to collect rock samples and return to the office or laboratory to make the thin sections. The recognition accuracy can meet professional standards, but the biggest disadvantage is that these results cannot be applied in the field, so workers cannot use them to identify rocks quickly and accurately on site [6, 45, 46]. This paper used ShuffleNet combined with the transfer learning method to train the rock recognition model. There is no need to make rock thin sections; geological investigators simply use the smartphones they carry to photograph rocks in the field.

The presented method also has some limitations. The size of the sample data has a crucial influence on the recognition performance of a deep learning model. When there are few images of a certain type of rock, its features are overwhelmed, leading to poor recognition. For rock images that are classified with low probability, it is difficult to find similar images, so the probability of identifying such images remains low. In this paper, the original training data set was expanded by cropping the rock images, a new classification model was trained on the expanded set, and a second test was carried out on the rock images that had been recognized with low probability. For granite, which contains many minerals whose proportions vary widely, the recognition and classification performance of the model is poor. Granite consists mainly of feldspar, quartz, and black and white mica (biotite and muscovite), but the mineral composition differs between varieties, and pyroxene and amphibole may also be present. The image features are therefore complex, which makes recognition more difficult. In addition, only 30 types of rocks were identified in this paper, and more types and larger quantities of training samples are needed. Finally, the model was compared only with models trained with MobileNet and SqueezeNet. Experimental results show that the ShuffleNet-based rock recognition model has advantages in accuracy and running time, but the comparison involves few models. More lightweight compressed models should be included so that the best solution in terms of accuracy and time can be chosen.
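Expanding a training set by cropping can be sketched as follows. This is a minimal illustration, not the exact procedure used in this paper: the tile size, stride, and image dimensions are assumptions, and `crop_boxes` is a hypothetical helper. It only computes the crop rectangles; each box uses the (left, top, right, bottom) convention accepted by common image libraries.

```python
def crop_boxes(width, height, tile, stride):
    """List (left, top, right, bottom) boxes of size tile x tile inside an image."""
    if tile > width or tile > height:
        return []  # the image is too small to hold a single tile
    boxes = []
    for top in range(0, height - tile + 1, stride):
        for left in range(0, width - tile + 1, stride):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


# Assumed example: a 448 x 448 photo cut into overlapping 224 x 224 tiles.
boxes = crop_boxes(width=448, height=448, tile=224, stride=112)
print(len(boxes))
```

With a stride smaller than the tile size, neighbouring crops overlap, so a single photo of a rare rock type yields several training images that each still show the rock texture.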

5. Conclusions

Recognition of rock types and lithology is an important part of geological survey. In this research, ShuffleNet, a lightweight convolutional neural network designed for smartphones, was used to recognize the types and lithology of rocks. The transfer learning method was used to train the rock recognition model on the PC, and the trained model was deployed on the smartphone. This paper designed and developed an application program that runs on a smartphone. This application is not only simple to operate but also highly accurate in rock recognition. This paper solved the problems of long recognition period and high cost in traditional recognition methods. It also makes up for the defect that methods based on image processing and feature extraction of rock thin sections cannot recognize rocks quickly and accurately in the field. Geological investigators can quickly and accurately identify rocks by using their smartphones in the field, which is of great help to geological surveys. In the future, the rock recognition model based on ShuffleNet will be compared with more models trained with lightweight convolutional neural networks, and more kinds of rock training samples will be added to improve the accuracy and efficiency of the method.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

The authors thank other team members for their help with the experiment. This research was jointly supported by Geological Survey Projects of China Geological Survey (DD20190416).