Abstract

Rail fastener status recognition and detection are key steps in inspecting the status and function of the rail area in real engineering projects. With the development of and widespread interest in image processing techniques and deep learning theory, detection methods that combine the two have yielded promising results in practical detection applications. In this paper, a semantic-segmentation-based algorithm for the state recognition of rail fasteners is proposed. On the one hand, we propose a functional area location and annotation method based on a salient detection model and construct a novel slab-fastclip-type rail fastener dataset. On the other hand, we propose a semantic-segmentation-framework-based model for rail fastener detection, in which rail fastener states are detected and classified by combining the pyramid scene parsing network (PSPNet) with vector geometry measurements. Experimental results demonstrate the validity and superiority of the proposed method, which can be introduced into practical engineering projects.

1. Introduction

As shown in Figure 1, a rail fastener is a fixed coupling part that prevents horizontal and vertical offsets of the rails. Thus, rail fastener detection helps maintain the stability of railway systems and ensure the safety of trains. Figure 1 shows that the slab fastclip (SFC) type of rail fastener is used to couple steel rails and sleepers in both ballasted and ballastless rail scenarios. Traditional rail fastener detection requires workers to walk along the railroad to determine the state of the rail fasteners and other functional components; this method has low detection efficiency and precision and is dangerous [1]. Therefore, automatic rail fastener detection has attracted increasing attention from researchers.

To address the limitations of traditional detection methods, many automatic detection methods using machine vision and image processing technology have been proposed and have achieved good experimental results [1–11]. Recently, deep learning theory has received increasing attention in target detection and image segmentation tasks and has been successfully applied to rail fastener detection [12–14]. However, these efforts are often heavily dependent on time-consuming and expensive manual annotations.

Thus, this paper proposes a semantic-segmentation-based rail fastener state recognition algorithm. Our research covers three main aspects. The first is the construction of a functional area marking model for rail fasteners based on saliency detection. In this model, the fastclip parts in the rail fastener image are regionally localized by the saliency detection model. Then, the different functional parts in the image are semiautomatically labeled by constructing pseudolabels. Finally, an SFC-type rail fastener dataset is constructed from the labeling results and the true state information. This approach effectively avoids the tedious manual collection and labeling of fastener samples while remaining competitive for fastener detection. Second, we propose a semantic-segmentation-based method for detecting rail fasteners: the functional regions in the input rail fastener image are segmented by a semantic segmentation network, and a fastener state detection method based on vector geometry relationships is designed on top of the segmentation results. Third, the overall system is an end-to-end detection model that detects and classifies the rail fastener status directly from the raw input image, which offers clear advantages in actual engineering projects.

The rest of this paper is organized as follows. Section 2 introduces related works on current rail fastener detection. Section 3 describes the overall framework and methods. Section 4 discusses the experimental results and analysis. Finally, the conclusion and future work are presented in Section 5.

2. Related Work

With the development of computer vision and image processing technologies, researchers have been committed to the research of rail fastener detection using two-dimensional visual images. In most of these studies, the primary purpose of the rail fastener inspection task is to check for missing fasteners on both sides of a rail. In [1], a top-down detection method is proposed to detect the fastener region and predict its status. In [2], an automatic and configurable real-time vision system is proposed to detect the presence or absence of rail fasteners. In [3], a fastener location and detection method based on the combination of wavelet transform and template matching is proposed; the method can accurately locate fasteners and predict their status. In [4], the authors present a method based on image processing and pattern recognition techniques, which can be customized to detect the absence of fasteners. In [5], wavelet transformation and principal component analysis are combined to detect fasteners. In [6], a method based on line local binary pattern coding is proposed; the authors comprehensively consider the correlation between the image center point and its neighborhood nodes. In [7], a coarse-to-fine strategy is proposed to detect and recognize broken rail fasteners with a method based on Haar-like features and the AdaBoost algorithm. In [8], a probabilistic structure topic model is proposed for simultaneously learning the probabilistic representations of different objects using unlabeled samples. In [9], a fastener detection method based on the combination of the Shi–Tomasi and Harris–Stephens feature detection algorithms is proposed; the method can successfully detect the presence of fasteners. In [10], an autonomous visual rail fastener inspection system is proposed; in this system, histogram of oriented gradients features and linear support vector machine (SVM) classifiers are used to inspect defects and classify fasteners.

Several laser-based detection methods have also been proposed. For example, in [15], a real-time rail fastener detection system using laser ranging is proposed, which can effectively reduce the computational cost. In [16], a fastener detection method based on a light-sensor mechanism is proposed; the method uses a decision tree classifier and a centerline extraction method to detect incomplete and loose fasteners. In [17], a structured-light method based on motion images is proposed for moving-object inspection, offering a fresh perspective on inspecting missing fastening components on high-speed railways. In [18], the authors propose a structured-light-based system to evaluate the rail gauge and detect missing rail fasteners.

A rail fastener image, which contains some functional parts, can be further divided into the rail, fastener, and background regions. In [19], unlike in traditional rail fastener detection methods, attention is also given to the location and detection of the hexagon nut in a rail fastener image.

In recent years, with the increasing application of deep learning technologies, some researchers have also applied deep learning to rail fastener detection. For example, in [12], the authors proposed a template matching classification method to automatically collect and annotate fastener samples and further deployed a similarity-based deep convolutional neural network (DCNN) to estimate the fastener state. In [13], a real-time inspection system for railway fasteners based on point cloud deep learning was developed, demonstrating excellent accuracy and efficiency in field testing on ballastless tracks. In [14], a fastener detection method based on visual rail inspection is proposed, using material classification and semantic segmentation with DCNNs to identify and segment the different functional parts in a rail fastener image. In [20], the authors deployed and trained YOLOv3 as a deep learning model for detecting the state of rail fasteners. In [21], an end-to-end abnormal fastener detection method, which can identify abnormal fasteners from a rail scene image, is proposed.

In summary, we consider the following major problems of previous approaches:

(1) Although good results can be achieved in rail fastener detection through deep learning frameworks, detection methods based on existing supervised learning are heavily dependent on the manual pixel-level annotation of image data. Annotating large-scale rail fastener image datasets one by one through manual methods is extremely tedious, time-consuming, and expensive.

(2) Existing rail fastener detection methods focus mainly on the missing state of rail fasteners in images. The detection target generally covers only the overall fastener area when determining the missing state, rather than performing specific state detection based on local functional areas. The positioning results and status of rail fasteners can only be assessed qualitatively, and no unified quantitative evaluation criteria exist for accurately describing and comparing the effects of detection methods.

(3) Rail fastener detection is a standard data-imbalance problem: the positive sample images of fasteners captured in actual railway scenes generally far outnumber the negative samples. Unbalanced training and experimental samples can affect the accuracy of the experimental results.

To solve the problems of existing approaches, we present a novel semantic-segmentation-based rail fastener state recognition algorithm. The contributions of our work are as follows:

(1) To reduce the reliance of traditional deep learning methods on manual annotation, we provide a semiautomatic method for locating and annotating rail fasteners based on saliency detection. The experimental results show that the method can accurately locate and segment the fastener fastclip regions and generate accurate pixel-level annotations of the rail fastener image, reducing the cost of manually annotating functional regions and improving annotation efficiency 25-fold compared with the traditional manual annotation process.

(2) We classify the fastener state into five specific situations based on a priori knowledge of the geometric relationships between the different functional regions in the fastener image. Meanwhile, we shift the focus of detection to specific functional structural regions (i.e., the fastener fastclip and rail regions). We propose a semantic-segmentation-based rail fastener detection method and introduce new quantitative evaluation metrics to describe and evaluate the fastener positioning results. The experimental results show that the fastclip positioning and state detection results of this method are clearly accurate and superior.

(3) To alleviate the problem of data imbalance, a new standard dataset based on the SFC-type fastener is constructed by the method in Contribution 1, and the negative image samples are reasonably augmented in the construction process. To some extent, this alleviates the imbalance of the dataset and improves the overall performance of the detection framework.

3. Proposed Method

First, we perform localization and semiautomatic labeling of the functional regions in the original images using a salient detection model (SDM). Then, we propose a semantic-segmentation-based rail fastener state detection method, which is based on the semantic information and spatial relationships among the functional parts in the rail fastener image, to achieve the accurate detection and monitoring of the rail fastener state.

3.1. Positioning and Marking of Functional Areas

Figure 2 describes in detail a specific implementation of the method for locating and labeling rail fasteners based on significance detection, which enables the interactive automatic labeling of the functional areas in the input image. Our overall approach consists of three main modules. First, the fastener region in the input image is positioned by the salient detection model to obtain the original rail saliency map corresponding to each original rail image. Then, the target region segmentation model is used to locate and segment the fastener regions on both sides of the rails in the image to obtain the corresponding position of the rail fastener image, which mainly contains the rails and fastener fastclip regions that we are interested in. Finally, a salient detection model is used to further locate and correct the fastener fastclip areas in the saliency map, and a pseudolabel construction model is used to semiautomatically label the functional areas in the image to generate the corresponding interactive pseudolabels for each rail fastener image. The following section discusses these three main modules in detail. In addition, we constructed a special dataset of SFC rail fasteners using the above method. This dataset includes the rail fastener images obtained by salient detection and target area segmentation, the corresponding functional area pseudolabeling, the ground truth, and the labeled real fastener states.

3.1.1. Salient Detection Model (SDM)

We first perform the salient detection of the original rail fastener image by setting up a sliding processing window based on the comparison of regional features. As shown in Figure 3, a sliding image processing window with an adjustable scale is defined in the input image and divided into a processing kernel and an outer frame. A random variable represents a pixel node in the processing window, and a specific characteristic attribute is defined for each node. The characteristic properties of a pixel node are measured by computing the intensity and color features of the image, which in turn measure the significance of the node. This salient detection mechanism is defined similarly in the literature [22].

For a pixel node in the sliding image processing window, we define two events indicating that the node is located in the processing kernel or in the outer frame, respectively, and a third event indicating that the node is a significant (salient) pixel node. The significance measure of the pixel node can then be expressed by the following Bayesian formula:
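In the notation assumed here, $W$ denotes the sliding window, $K$ its processing kernel, $B$ its outer frame, $x$ a pixel node, and $F(x)$ the feature value of $x$; $H_K$ and $H_B$ denote the events that $x$ lies in $K$ or $B$, and $H_S$ the event that $x$ is salient. Following the Bayesian saliency formulation of [22], a standard form of this measure is

P(H_S \mid F(x)) = \frac{P(F(x) \mid H_K)\, P(H_K)}{P(F(x) \mid H_K)\, P(H_K) + P(F(x) \mid H_B)\, P(H_B)}.   (1)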

In addition, to calculate the significance metric value for each pixel node in the sliding image processing window, we assume that the prior probability of the kernel event ranges over (0, 1). To reduce error interference during the calculation of the saliency measure, we introduce normalized regular histograms to enhance the robustness and stability of the algorithm. We then represent the two likelihood terms in equation (1) by feature histograms computed over the kernel and the outer frame, evaluated for each pixel in the CIELAB color space. Finally, the significance metric value of a pixel node that meets the requirements of the specific feature attributes in the sliding image processing window is
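With $h_K$ and $h_B$ denoting the normalized feature histograms of the kernel and the outer frame in the CIELAB color space, and $p \in (0, 1)$ the prior probability of the kernel event (this notation is an assumption on our part), the significance measure takes the form

S_W(x) = \frac{p\, h_K(F(x))}{p\, h_K(F(x)) + (1 - p)\, h_B(F(x))}.   (2)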

We construct the feature histograms of the window at different positions and scales by moving the sliding image processing window over the original rail fastener image at different proportions and measuring the significance of the pixel nodes within the window according to equation (2).

On this basis, we label and segment the pixel nodes by minimizing an energy function in a conditional random field (CRF) [23] to perform target-level foreground segmentation of the significant regions in the rail fastener image. Suppose that the input image contains a set of pixel nodes; we first define an array to represent the set of all pixel nodes in the image and another array to represent the significance metric values of these nodes. Finally, we assume that a binary segmentation label exists for each pixel node, indicating whether the node is salient or nonsalient. The mathematical description of the CRF model constructed on the basis of salient detection is given below, where a normalization factor appears in the denominator; the label of each pixel node is then estimated by minimizing the energy function of this CRF model:
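Writing $A = \{a_x\}$ for the binary labels ($a_x = 1$ salient, $a_x = 0$ nonsalient), $S = \{s_x\}$ for the significance values, and $Z$ for the normalization factor (this notation is assumed here), the CRF model and the corresponding label estimate can be written as

P(A \mid S) = \frac{1}{Z} \exp\bigl(-E(A \mid S)\bigr),   (3)

A^{*} = \arg\min_{A} E(A \mid S).   (4)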

The energy function in the CRF model consists of three feature models: two one-dimensional (unary) terms, based on a saliency feature model and a color feature model for a particular pixel node, and a two-dimensional (pairwise) term based on a spatial relationship model for neighboring pixels. The mathematical expression is given below, where the two weighting factors control the corresponding feature models; their specific values are determined by the method described in the literature [24], and the pairwise term is evaluated over adjacent pixel nodes in the rail fastener image.
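Using $U_{\mathrm{sal}}$ and $U_{\mathrm{col}}$ for the saliency and color unary terms, $V$ for the pairwise spatial term over the neighborhood system $\mathcal{N}$, and $\alpha$, $\beta$ for the weighting factors (notation assumed here), the energy function takes the usual form

E(A \mid S) = \sum_{x} U_{\mathrm{sal}}(a_x) + \alpha \sum_{x} U_{\mathrm{col}}(a_x) + \beta \sum_{(x, y) \in \mathcal{N}} V(a_x, a_y).   (5)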

For the original rail images in Figure 2, we first optimally estimate the labeling of pixel points by the energy minimization function of the CRF model and label the pixel nodes as significant and nonsignificant feature nodes. The image is then segmented into significant foreground and nonsignificant background areas based on the significant feature contrast of adjacent pixel nodes. In this approach, the fastclip areas on the rail fastener image can be marked and highlighted as prominent foreground areas, and the original rail salient maps can be generated.

3.1.2. Target Region Segmentation Model

Considering that the original rail image input contains some irrelevant regions and backgrounds in addition to our fastener regions of interest, we propose a target region segmentation module. In this module, we first divide the localization results of saliency detection into left and right subimages and then segment and extract the corresponding target regions in the subimages.

As shown in Figure 2(a), the coordinates of the center pixel of the significant region are first calculated and discriminated based on the spatial geometric prior information from the saliency positioning results. Subsequently, an image clipping frame of a specific size is constructed with the central pixel as the coordinate origin. The boundary coordinates of the clipping frame are then mapped and matched to the original rail image. Finally, by segmenting the matched region in the original image, we obtain the desired rail fastener image.

We first define an array to represent the set of all pixel nodes in the original rail saliency map. We then perform binary segmentation of the saliency map and define an indicator function that represents the saliency eigenvalue of each pixel node: its value is 1 when the pixel node belongs to the significant foreground and 0 when it belongs to the background. Each pixel node is represented by its image coordinates. The coordinates of the center of the significant region can then be calculated as given below, where the summation runs over the pixels whose indicator value is 1, and the length and width of the significant region are derived by statistical operations:
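Denoting the pixel nodes by their coordinates $(x_v, y_v)$, the indicator function by $f(v) \in \{0, 1\}$, and the number of foreground pixels by $N_s$ (the exact notation, including how the statistically derived length and width of the region enter, is assumed here), the centroid can be written as

(x_c,\ y_c) = \left( \frac{1}{N_s} \sum_{v} f(v)\, x_v,\ \ \frac{1}{N_s} \sum_{v} f(v)\, y_v \right), \qquad N_s = \sum_{v} f(v).   (6)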

Based on the coordinates of the centroid of the significant region obtained from the above equation, we can reconstruct an image crop frame containing all significant regions based on our a priori knowledge, where the centroid serves as the center of the frame and the frame size is a free parameter. To facilitate subsequent image computation and processing, the size of the crop frame is restricted here to the range [470, 520] pixels. By randomly generating crop frames of variable size within this parameter interval, the negative sample data of the input images can be randomly augmented. Consequently, the data samples can be kept at a relatively balanced level as much as possible. To a certain extent, this alleviates the imbalance of the rail fastener dataset and improves the overall performance and accuracy of the algorithm.

Finally, by mapping the boundary coordinates of the image clipping frame onto the original rail image and matching the corresponding region, we obtain an image that contains only the rail fasteners and rail edges of the region of interest (ROI), completing the task of segmenting the foreground target area.
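A minimal sketch of this cropping and augmentation step is given below, assuming NumPy arrays for the original image and the binary saliency mask; the function and argument names are illustrative and not taken from the paper.

```python
import numpy as np

def crop_fastener_region(rail_image, saliency_mask, size_range=(470, 520), rng=None):
    """Sketch of the target-region segmentation step described above.

    rail_image    : H x W x 3 original rail image (NumPy array).
    saliency_mask : H x W binary mask from the salient detection model
                    (1 = fastclip foreground, 0 = background).
    The crop-frame size is drawn randomly from size_range, which loosely
    augments the negative samples as discussed in the text.
    """
    rng = rng or np.random.default_rng()
    ys, xs = np.nonzero(saliency_mask)          # foreground pixel coordinates
    if len(xs) == 0:                            # no salient region found
        return None
    cx, cy = int(xs.mean()), int(ys.mean())     # centroid of the salient region

    side = int(rng.integers(size_range[0], size_range[1] + 1))  # random frame size
    half = side // 2
    h, w = rail_image.shape[:2]
    # Clamp the crop frame so that it stays inside the original image.
    x0, x1 = max(cx - half, 0), min(cx + half, w)
    y0, y1 = max(cy - half, 0), min(cy + half, h)
    return rail_image[y0:y1, x0:x1]
```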

3.1.3. Pseudolabel Construction Module (PCM)

The fastclip area in a rail fastener image has irregular shape characteristics, so annotating it manually at the pixel level, as required by traditional supervised learning methods, is very costly. In contrast, the rail area in the image is usually a salient, regular rectangular region that can be annotated with standard manual marking. Therefore, this work attempts to construct image target-level pseudolabels, instead of pixel-level manual annotations, for the fastclip regions and to combine them with the results of manual annotation of the rail regions, so that the labels of the different rail functional regions in the image can be learned and trained through weakly supervised learning.

Our model for constructing pseudolabels for the automatic labeling of fastener fastclip regions is inspired by the literature [25]. Ultimately, accurate target classification and semantic segmentation can be implemented for the different classes of track functional regions in the image.

As shown in Figure 2(b), we first construct an image target-level pseudolabel for the rail functional areas in the rail fastener image positioned by the salient area. Let the segmentation result of the significance map be defined over an image whose height and width are those of the input image, and suppose a set of different functional area categories exists for the rail fastener image. The significance map of the rail fasteners generated by the SDM is transformed into a binarized image by threshold segmentation. Assuming the existence of a binarized label that marks the fastclip and background regions, the pixel value of the significant region in the segmentation result is set to 1 and the background pixels to 0. Then, an image-level label is constructed for the pixels in the fastener fastclip region, as expressed below, where the bounding segmentation box of a significant foreground region determines the set of pixels in which the fastclip region of the image is calibrated:
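With $u$ denoting a pixel, $b(u) \in \{0, 1\}$ its binarized saliency value, $B$ the bounding box of the significant foreground region, and $\hat{y}(u)$ the image-level label (this notation and the exact piecewise form are assumptions on our part), one plausible form of the label is

\hat{y}(u) = \begin{cases} 1, & u \in B \text{ and } b(u) = 1, \\ 0, & \text{otherwise}. \end{cases}   (7)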

Second, we assume that the marker vector has a definition similar to the above: 0 and 1 denote the pixel properties of the background and the fastclip region of the image, respectively, and the rail region pixel property is set to 2. The mathematical expression for constructing the pseudolabel is given below, where a parameter represents the probability that the bounding box belongs to the fastener fastclip area. According to equation (8), we manually perform a pixel-level annotation of the rail regions in the rail fastener image and jointly construct the pseudolabels of the image functional regions together with the fastclip image-level labels obtained by the automatic annotation method described above.
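Writing $m(u)$ for the pseudolabel of pixel $u$, $p_B$ for the probability that bounding box $B$ belongs to the fastclip area, and $\tau$ for an acceptance threshold (the threshold and the piecewise form are assumptions on our part, reconstructed from the surrounding description), equation (8) may be written as

m(u) = \begin{cases} 2, & u \text{ belongs to the manually annotated rail region}, \\ 1, & u \in B,\ b(u) = 1, \text{ and } p_B \geq \tau, \\ 0, & \text{otherwise}. \end{cases}   (8)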

This semiautomatic image labeling method is used to generate the proposed pseudolabel in this work. The pseudolabeling structure consists of the target-level labeling of the fastclip region and the pixel-set labeling of the rail region. Consequently, we can finally obtain complete semantic labels for the different functional regions of the rail fastener image. Furthermore, as shown in Figure 2, we construct a new rail fastener image dataset, which includes the cropped rail fastener image, the generated pseudolabels, and the real state of the fastener.

3.2. Semantic Segmentation and Defect Detection

In this section, we first describe in detail the overall architecture of a vision-based rail fastener defect detection system. The system detects and outputs the state of the rail fastener in the raw input rail image via an end-to-end detection model. Second, we elaborate and discuss the semantic segmentation-based fastener defect detection algorithm within the system.

3.2.1. Overall System Architecture

As shown in Figure 4, the original rail images studied in this work are photographed and collected in real engineering applications. Our proposed detection system is an end-to-end rail fastener detection system. The acquired original input rail image is first regionally localized, and the target region is segmented based on the SDM to obtain rail fastener images that contain only useful functional region features. Then, a semantic-segmentation-based defect detection method for rail fasteners, which combines a semantic segmentation model with a state detection method based on vector geometry measurements, is used to detect and judge the state of the rail fasteners. The results of the rail fastener defect detection and the status classification are then finalized and output.

3.2.2. Status Detection Method

Figure 5 presents the overall architecture of the semantic-segmentation-based rail fastener defect detection approach proposed in this work. In our detection method, we first perform an accurate semantic segmentation of the rail and fastener fastclip regions in the input rail fastener image using a semantic segmentation model. A fastener defect detection algorithm based on vector geometry parameters is then designed on top of the segmentation results of the different functional regions to realize the detection and classification of fastener defects and states.

Functional region segmentation is an important part of this method. In this step, the rail, fastener fastclip, and background regions of interest in the input rail fastener image are segmented using the semantic segmentation model to obtain the semantic segmentation result of the corresponding regions. The semantic segmentation model accurately predicts and segments the different functional regions in the rail fastener image through the semantic segmentation network embedded in the detection system.

For the rail fastener images taken and collected in the actual engineering scene, this paper uses PSPNet [26] to learn the context information and key details among the pseudolabels constructed for the different region categories in the global scene of the input image and thereby achieve the semantic segmentation of the different functional regions in the image. PSPNet can fuse semantic and detailed information features across different network layers, which makes it well suited to the diverse positions, shapes, and sizes of the different functional areas encountered in rail fastener detection.

We use the pseudolabeled rail fastener images of the different functional regions as training data, extract the underlying features of the input images, and generate the corresponding feature maps using the pretrained deep residual network (ResNet) [27] and the dilated convolution strategies [28, 29] in the PSPNet segmentation network architecture used in this study. ResNet achieves good classification and identification performance by deepening the training network, and the dilated convolution strategy enlarges the receptive field to a certain extent without changing the feature map scale, thereby capturing more global image information on the basis of the original network structure. Second, we use the pyramid pooling module to learn and acquire contextual information about the different region category labels in the image while making full use of the captured global image features. The pyramid pooling module in the PSPNet architecture integrates four pyramid submodels of different scales to sample and output feature elements of different dimensions, from low to high, and then fuses the different dimensions to obtain the global context. As shown in the figure, the pyramid pooling module used in this study contains pyramid submodels with 1 × 1, 2 × 2, 3 × 3, and 6 × 6 scales. Finally, the accurate segmentation of the different functional areas in the original rail fastener image is achieved by fusing and convolving the multiscale feature elements with the original feature map to generate the final segmentation result map.
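A minimal PyTorch-style sketch of such a pyramid pooling module is shown below, assuming a ResNet feature map as input; the channel sizes and the class and argument names are illustrative assumptions rather than the exact configuration used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """Sketch of a PSPNet-style pyramid pooling module.

    The 1x1, 2x2, 3x3, and 6x6 pooling scales follow the description in the
    text; the channel sizes are illustrative assumptions.
    """

    def __init__(self, in_channels=2048, bins=(1, 2, 3, 6)):
        super().__init__()
        reduced = in_channels // len(bins)      # channels per pyramid branch
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(bin_size),            # pool to bin_size x bin_size
                nn.Conv2d(in_channels, reduced, kernel_size=1, bias=False),
                nn.BatchNorm2d(reduced),
                nn.ReLU(inplace=True),
            )
            for bin_size in bins
        ])
        # Fuse the concatenated multi-scale context with the original feature map.
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels + reduced * len(bins), 512, kernel_size=3,
                      padding=1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        # Upsample every pooled branch back to the input resolution and concatenate.
        pyramids = [x] + [
            F.interpolate(branch(x), size=(h, w), mode="bilinear",
                          align_corners=False)
            for branch in self.branches
        ]
        return self.fuse(torch.cat(pyramids, dim=1))
```

Each branch pools the backbone feature map to one of the four scales, reduces its channels with a 1 × 1 convolution, upsamples it back to the input resolution, and concatenates it with the original map before the final fusion convolution; a segmentation head would follow this module.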

Fastener status detection is a detection method based on vector geometry computation, built on the a priori knowledge embodied in the semantic segmentation results of the functional regions. From the semantic segmentation results, we can derive the following information. (1) Regardless of the angle of the input image, the edge of the rail region in the image is always a straight line. (2) The thresholds for the vector pixel distances between different functional regions can be determined by experimentally measuring images of rail fasteners in different states. (3) If no rail fastener fastclip area is detected in the image prediction results, then the rail fastener is in an unhealthy missing state.

On the basis of the a priori information obtained from the above experimental experience, this paper proposes a method for fastener defect detection and state classification based on vector geometry relationships. The method calculates the vector geometry relationship between the functional regions processed by the semantic segmentation model, determines the threshold of different states according to the specific vector distance, and finally classifies the states according to the defective state of the obtained rail fasteners. The implementation is as follows. First, we determine the linear boundaries of the different functional regions by a least squares linear fitting algorithm. The vector distance and the offset angle from the rail region boundary to the rail fastener region boundary are then calculated. Finally, the defect detection and classification results of the final rail fastener state are obtained by comparing and judging the calculated results with the empirical threshold.

Figure 6 shows two images of rail fasteners in a nonhealthy state and the corresponding state detection principle. For the images of unhealthy rail fasteners in the detached and offset states presented in Figures 6(a) and 6(b), we determine the working state of the fasteners by calculating the vector distance and the offset angle between the rail and fastclip boundaries, respectively. Let the semantic segmentation result obtained by the above method be given, with the length and width of the image ranging over [0, 473] pixels. Given the regular and smooth rectangular rail region in the image, we obtain the linear equation of the rail region border by selecting two pixel points on that border as locator points and computing the line from their coordinates, as shown by the yellow solid line in Figure 6. In addition, we select the 30 discrete pixel nodes with the smallest transverse coordinate in the fastener region as locator points and then use the least squares linear fitting algorithm to determine the linear equation of the near-rail side boundary of the rail fastener, as shown by the green solid line in Figure 6. The error term is defined as the residual between each locator point and the fitted line. According to the principle of least squares, the sum of squared residuals must be minimized, and setting its partial derivatives with respect to the line parameters to zero gives
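Assuming the fitted fastclip boundary is written as $y = kx + b$ and the locator points are $(x_i, y_i)$, $i = 1, \ldots, n$ with $n = 30$ (this parametrization is our assumption), the minimization conditions yield the standard normal equations

k \sum_{i=1}^{n} x_i^{2} + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i, \qquad k \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i.   (9)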

By substituting the coordinates of the locator points, the values of the line parameters are derived as follows:
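In the same notation, solving the normal equations gives

k = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^{2} - \left(\sum_{i=1}^{n} x_i\right)^{2}}, \qquad b = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - k \sum_{i=1}^{n} x_i\right).   (10)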

As shown in Figure 6(a), the linear fitting equation for the near-rail side boundary of the rail fastener region is obtained by the above method. Defining a parameter as the vector distance from the rail region boundary to the fastener region boundary, and taking one pixel point on the rail region boundary line and one on the fastener region boundary line, the distance is calculated as
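Writing the rail boundary as $l_1: y = k_1 x + b_1$ and the fastclip near-rail boundary as $l_2: y = k_2 x + b_2$, and choosing a point $P = (x_0, y_0)$ on $l_2$ (the axis convention and the exact evaluation point are assumptions on our part), a standard form of this vector distance is the point-to-line distance

d = \frac{\lvert k_1 x_0 - y_0 + b_1 \rvert}{\sqrt{k_1^{2} + 1}}.   (11)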

To simplify the calculation, we fix the corresponding coordinate of the evaluation point at 236. Empirically, if the vector distance between the fitted line of the near-rail side of the fastener and the rail boundary in the predicted result is less than 100 pixels, then the rail fastener is in the normal state; if the vector distance is in the range of 100–160 pixels, then the rail fastener is in an unhealthy loose state; and if the vector distance is greater than 160 pixels, then the rail fastener is in an unhealthy separate state. If no fastclip area is detected, the rail fastener is in the missing state.

In addition, we calculate the degree of deflection of the rail fastener relative to the rail by computing the included angle between the two boundary lines, as shown in Figure 6(b). According to the experimental tests, if this angle exceeds an empirically determined threshold, then the rail fastener is in the dislocated state; otherwise, it is in the normal state within a reasonable error. The included angle is calculated as
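With $k_1$ and $k_2$ denoting the slopes of the rail boundary line and the fitted fastclip boundary line (notation assumed here), the included angle follows the standard two-line formula

\theta = \arctan \left\lvert \frac{k_1 - k_2}{1 + k_1 k_2} \right\rvert.   (12)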

The calculated vector distance and angular offset are compared with predetermined thresholds, and the rail fasteners are subjected to defect detection and state classification by threshold judgment. Furthermore, to avoid detection errors caused by the two geometric parameters acting on the experimental results simultaneously, a rail fastener in an unhealthy state is assigned only one of the dislocated or other nonnormal states, with the influence of the vector distance parameter on the detection results prioritized during the experiment.
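The decision logic described above can be summarized in the following sketch, assuming the vector distance (in pixels) and offset angle (in degrees) have already been computed; the function name is illustrative, and the angle threshold, which is determined empirically in this work, is left as a parameter.

```python
def classify_fastener_state(distance_px, offset_angle_deg, fastclip_detected,
                            angle_threshold_deg):
    """Threshold-based state classification sketched from the rules above.

    The vector distance takes priority over the offset angle, so a fastener
    is labeled "dislocated" only when the distance test indicates a normal
    position. The angle threshold is an empirical value not fixed here.
    """
    if not fastclip_detected:          # no fastclip region in the prediction
        return "missing"
    if distance_px > 160:              # far beyond the rail boundary
        return "separate"
    if distance_px >= 100:             # 100-160 px from the rail boundary
        return "loose"
    if offset_angle_deg > angle_threshold_deg:
        return "dislocated"
    return "normal"
```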

4. Experiments and Analysis

In this section, we first construct an SFC rail fastener dataset by the method described in Section 3.1. A new evaluation metric is then proposed to quantitatively characterize the effect of positioning the rail fastener fastclip region with the significance detection model. In addition, through experiments on the rail fastener image dataset, the results of the proposed algorithm are qualitatively and quantitatively validated.

4.1. Dataset

Our experimental data were derived from original rail images captured in real railway project scenarios; we segmented the fastener regions on both sides of the steel rail in the original images and constructed the SFC-type rail fastener dataset using the method described in Section 3.1. Our dataset contains images of rail fasteners on the left and right sides of the tracks, the pseudolabels corresponding to the images, the ground truth, and the true state of the fasteners in the images. In addition, we classified the fasteners into positive and negative data samples according to their states, where the positive samples include the image data of rail fasteners in the normal state, and the negative samples include the image data of rail fasteners in the loose, detached, missing, and dislocated states.

The SFC rail fastener dataset contains 2000 positive samples in the normal state and 2000 negative samples with defects (including 500 negative samples in each of the four defective states). In addition, we randomly divided the dataset, which contains equal amounts of rail fastener image data for the different state conditions, into a training set of 2400 samples and a test set of 1600 samples.

4.2. Analysis of Experimental Results

We first quantitatively evaluated the positioning effect of the rail fastener fastclip region and then conducted extensive experiments on the rail fastener image dataset to further qualitatively and quantitatively validate the experimental results of the detection system designed in this work.

4.2.1. Experimental Analysis of Rail Fastener Fastclip Positioning

In existing studies on positioning the rail fastener area with computer vision and image processing techniques, the positioning results are usually described and evaluated only qualitatively. Given the lack of uniform evaluation criteria for positioning results on rail fastener images, the experimental results of fastener region positioning are difficult to evaluate and compare quantitatively. Thus, we introduce evaluation parameters from the visual attention mechanism literature [30–34].

The Precision-Recall curve, the F-measure [35], and the mean absolute error (MAE) are used as indicators for evaluating the accuracy of fastener positioning, enabling the effective evaluation and analysis of the positioning results from a quantitative perspective.

In the fastener fastclip image dataset, a corresponding ground truth was first constructed for each rail fastener image by manual labeling at the pixel level. Then, the fastener fastclip region localization result map obtained by this method was thresholded into binary segmentations over the range 0 to 255. Finally, for each threshold, the fastclip positioning result map was compared pixel by pixel with the ground truth map, and the evaluation parameters of the positioning results relative to the ground truth were calculated, so that the fastclip positioning experiments could be evaluated from a quantitative point of view. Precision represents the ratio of the correctly positioned fastclip area in the positioning result map to the overall fastclip area in the positioning result, and Recall represents the ratio of the correctly positioned fastclip area in the positioning result map to the theoretical fastclip area in the ground truth map. The F-measure is a comprehensive indicator for evaluating the final positioning result, and the MAE measures the positioning error of the positioning result relative to the ground truth image. These metrics are calculated as follows:
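With $TP$, $FP$, and $FN$ denoting the correctly located, falsely detected, and missed fastclip pixels, $S(x, y)$ the normalized positioning result, $G(x, y)$ the ground truth, and $W$ and $H$ the image width and height (the symbols are our assumptions; the definitions are the standard ones used in the saliency detection literature), the metrics are

\text{Precision} = \frac{TP}{TP + FP},   (13)

\text{Recall} = \frac{TP}{TP + FN},   (14)

F_{\beta} = \frac{(1 + \beta^{2}) \cdot \text{Precision} \cdot \text{Recall}}{\beta^{2} \cdot \text{Precision} + \text{Recall}},   (15)

\text{MAE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \bigl\lvert S(x, y) - G(x, y) \bigr\rvert.   (16)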

In formulas (13) and (14), the true positive term represents the correctly located pixels of the real fastclip area, while the false positive and false negative terms, respectively, indicate the falsely detected and missed pixels of the positioning results relative to the theoretical fastclip area in the ground truth map. In formula (15), the weighting parameter is set to 0.3 [35]. In formula (16), the width and height represent the size of the input rail fastener image to be processed, and the two summation indices represent the horizontal and vertical coordinates of the pixel nodes in the image.

We quantitatively evaluated the results of localizing the fastclip region on a dataset of 200 images of rail fasteners in different states. In addition, the fastener positioning results obtained by this method were compared with the positioning effects of several other classical target saliency detection algorithms on rail fasteners, including the SER [36], SWD [37], SIM [38], SS [39], MR [40], RCRR [41], and ACM [42] algorithms, whose representative serial numbers in Figure 7 are 2–8, respectively. The Precision-Recall curves, F-measures, and MAEs of the various algorithms for locating the fastclip area in the rail fastener images are presented in Figure 7.

As shown in Figure 7(a), the localization results for the fastclip region significantly outperform those of several other saliency detection algorithms. The results show that the proposed localization method achieves the best Precision, at more than 0.9. Although the background region in the localization results was clearly separated from the foreground fastclip region, which effectively meets the localization and segmentation needs of the subsequent experiments, the Recall values below 0.4 do not correspond to the ideal situation (a high Precision value of 0.9) and show an increasing trend, because some pixels were not treated by the model as completely irrelevant black background. In addition, all methods converged toward a Precision value of 0.12 when a threshold near 255 was selected for the maximum Recall value. During the convergence of the curves, the Precision of the proposed method was significantly better than that of the other algorithms at any Recall value. The experimental results show that the fastclip location effect of the proposed method is more accurate than those of the other methods, with a lower false alarm rate and better robustness. Figures 7(b) and 7(c) show an F-measure of 0.5632 and an MAE of 0.1302 for the experimental results obtained by this method. The fastclip positioning area obtained by this method is more accurate than those of the other algorithms, and the mean absolute errors of the positioning results are smaller than those of the other methods, indicating that this method can position rail fasteners more precisely.

4.2.2. Functional Region Semantic Segmentation Results

We validated the detection effect of the proposed method on the constructed SFC-type rail fastener dataset. The segmentation network was modeled and trained on the training set based on the pseudolabels of each image sample, namely, the semantic labels and segmentation of the “rail” and “fastclip” regions of the image with different properties. Next, we fed the 1600 rail fastener image samples from the test set into the trained semantic segmentation model for automatic region identification and semantic segmentation, enabling accurate targeting and segmentation of the specific functional regions in the image samples.

Images of rail fasteners in the five different states in the dataset were used, and accurate regional localization and semantic segmentation can be performed by the proposed method. The experimental results are shown in Figure 8. Figure 8(a) shows the rail fastener images of the five different fastener state types obtained by this method, where the rail fastener images in the normal, loose, separate, missing, and dislocated states are shown from top to bottom; Figure 8(b) shows the rail fastener saliency images with the fastclip salient area positioning completed; Figure 8(c) shows the functional area ground truths of the manually labeled fastener images; Figure 8(d) shows the target-level pseudolabels of the rail fastener images constructed by this method, where the red and green areas are the labels of the fastclip and rail areas, respectively; and Figure 8(e) shows the predicted functional area results for the rail fastener images of the different state types, where the gray area represents the predicted rail area, the white area represents the predicted fastclip area, and the other background and unrelated functional areas are black. In addition, Figures 8(f)–8(j) show the corresponding results for the fastener image data on the other side of the rail.

The experimental results in Figure 8 indicate that the functional region semantic segmentation results obtained by this method, shown in Figures 8(e) and 8(j), agree well with the corresponding ground truths shown in Figures 8(c) and 8(h). The results show that our segmentation of the functional areas of interest is accurate and automatic for all five types of rail fastener images and can effectively filter out other areas and complex backgrounds. Therefore, the region segmentation method established in this work has significant validity and accuracy for rail fastener images.

4.2.3. Rail Fastener State Detection Results

In the field of rail fastener detection, the performance of rail fastener detection and classification methods is an important factor in confirming system reliability. For the 800 data samples in different states in the test set used for algorithm validation, we detected and classified the states of the rail fasteners using the proposed vector-geometry-based measurement method. In the test results obtained by this method, we let $TP$ denote the number of true positive samples detected, $TN$ the number of true negatives, $FP$ the number of false positives, and $FN$ the number of false negatives. The Precision, Recall, and F-measure of the state detection and classification results of the rail fasteners were calculated by equations (13)–(15) (the value of the weighting parameter in equation (15) is set to 1 in this experiment), and the accuracy of the experimental results was evaluated in conjunction with the Accuracy index. The mathematical expression of the Accuracy indicator is
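With $TP$, $TN$, $FP$, and $FN$ as defined above, the Accuracy indicator takes the standard form

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.   (17)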

The experimental results are shown in Table 1; the proposed rail fastener detection method achieves 93.13% Accuracy on the validation dataset (containing 800 positive samples and 800 negative samples of various data types). Average Precision and Recall rates of 92.17% and 90.50% were achieved for the five different states of the rail fastener image data in the validation set, with an average F-measure of 0.9127. In addition, the experimental results indicate that the proposed method can obtain accurate detection results for samples of rail fasteners in the normal, loose, separate, missing, and dislocated states.

The experimental results indicate that the rail fastener detection system and the proposed method obtain good experimental results on the SFC-type rail fastener dataset. Samples of rail fasteners in the normal, separate, missing, and dislocated states are detected with high accuracy in the experiment, but the accuracy for the loose state of the fasteners is only 85.41%. The difference between the image characteristics of rail fasteners in the loose state and those in the normal and detached states is so small that even experienced professional inspectors cannot always accurately determine whether a rail fastener is loose during inspection work. Consequently, detecting rail fasteners in the loose state is difficult and challenging, so the detection accuracy for the loose state is lower than that for the other state samples. However, considering the overall experimental results, the proposed method can conduct accurate state inspection of rail fasteners in different states with obvious accuracy and reliability and can achieve good inspection results in practical engineering applications.

To demonstrate the validity of our method, the detection accuracy metrics of different detection methods were calculated and compared on the validation dataset constructed above. Table 2 shows the total accuracy of the different methods for SFC-type fastener images. The table shows that our method significantly outperforms several other detection methods, mainly because the other schemes do not consider the state detection of dislocated fasteners in their design and implementation, producing unsatisfactory results and accuracy for this part of the negative samples. In addition, the difficulty of detecting the negative samples in the loose state also affects the detection performance of the other methods and can thus affect the overall performance of a detection method to some extent.

To evaluate the performance of the detection method, the proposed algorithm was tested on a dataset of SFC-type rail fastener images taken and constructed in a real railway scenario. The proposed semantic-segmentation-based rail fastener detection method was executed on a computer equipped with an Intel Xeon W-2150B processor (10 cores and 10 threads). Experimental calculations indicate that the time required to detect and classify the state of each rail fastener using the proposed method is 1.135 s, which can meet the requirements of practical engineering inspection tasks. In addition, we evaluated the efficiency of our semiautomatic labeling method by randomly selecting 50 images of rail fasteners from several types of data. The average time for the semiautomatic annotation of a single image in the dataset was 5.256 s, while the average time for manual annotation was 132 s; our semiautomatic method is thus about 25 times more efficient than manual annotation. In summary, this method can effectively reduce the cost of manual annotation and has very reliable detection performance and good generalization ability, which can meet the accuracy and efficiency requirements of engineering inspection scenarios.

5. Conclusions and Future Work

In this study, we aim to address the limitations of traditional rail fastener detection methods and deep learning approaches in engineering applications, and a semantic-segmentation-based rail fastener state recognition algorithm is proposed. First, we propose a functional area positioning and labeling method based on the salient detection model, and a novel SFC-type rail fastener dataset is constructed by labeling the fastclip and rail regions and constructing pseudolabels through an interactive semiautomatic labeling method. Second, a rail fastener state detection method based on the semantic segmentation model is designed, in which the fastener state is detected by the semantic segmentation network and a vector distance calculation method; the fastener state in the original rail image is then classified and output by the detection system designed in this study. The experimental results show that the proposed method has good accuracy and robustness while effectively saving cost and can achieve good results in practical application scenarios.

Although the proposed method was validated by numerous experiments on our dataset and achieved promising results, it can still be improved. Although the results based on the saliency detection model satisfy the requirements of the algorithm, the model can be further optimized. Moreover, detecting the state of rail fastener images with a loose fastclip is still challenging. In future work, we plan to address some of these limitations. Nevertheless, we believe that the proposed inspection method is a major step forward in the automation of rail inspection and has significant implications for practical engineering applications.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 51975347 and 51907117) and the Key Science and Technology Support Project of the Shanghai Science and Technology Commission (no. 18030501300).