Abstract

In recent years, the modern information society has entered the era of big data. Big data has rapidly developed into a popular category favored by academia and industry and has been widely used. This paper is aimed at studying the application of the 3D modeling technology of mobile big data to the monitoring of the state of electric vehicles. It proposes related concepts of big data algorithm and 3D modeling technology, introduces the trajectory matching algorithm based on mobile big data, and finally introduces the related concepts of electric vehicles and the design of the big data analysis platform for condition monitoring of substation equipment. On the basis of discussing the application of the three-dimensional modeling technology of mobile big data in the electric vehicle condition monitoring system, the experimental research was carried out with the big data in the condition monitoring of the intelligent substation. The experimental results of this paper showed that the storage overhead of the NJLS model was reduced by 34% to 40% compared with the conventional star schema, which could reduce the space overhead.

1. Introduction

Big data is the most popular direction of information science research nowadays. Relevant technology has gradually developed into a database system since the mid-1980s. It revolves around all the data and continuously excavates some data information with potential value, which has great application value. This is a dynamic and interactive process. With the rapid popularization of automobiles and the increase in the number of automobiles, energy consumption and environmental pollution are becoming increasingly serious. Electric vehicles are attracting more and more attention due to their low pollution, clean, and efficient characteristics. The electric vehicle drive system is mainly composed of a high-efficiency motor, a controller, and an energy storage device. The development goal of electric vehicles is zero emissions.

Compared to conventional cars, electric vehicles are still in their infancy. The working environment and the control technology of electric vehicles are more complex. If an electric vehicle fails, it cannot operate properly, and a failure in a heavy electric vehicle can cause a serious road accident. Therefore, mobile big data is needed to monitor the status of electric vehicles. The main purpose of monitoring this data is to provide early warning of impending motor failures, to schedule preventive maintenance for possible future problems, and to provide a diagnostic plan. It is therefore of great practical significance to apply the 3D modeling technology of mobile big data to monitoring the status of electric vehicles.

The innovations of this paper are as follows: (1) The use of big data to monitor the state of electric vehicles is innovative and practical. (2) The monitoring of the state of electric vehicles can effectively protect people’s lives, health, and safety.

2. Related Work

With the advancement of the economy and of technology, mobile big data has come to play an increasingly significant role, and numerous researchers have investigated it. Qing-Chao et al. deployed Spark on the YARN platform, built a more complete macroanalysis model based on mobile data, and used the monthly data of more than 29 million users for ETL and mining processing, proving that the model could be applied to the mining of urban macrotraffic characteristics [1]. Mobile cloud computing (MCC) and IoT technology based on wireless network technology are developing rapidly. Stergiou and Psannis combined these two technologies (MCC and IoT) with big data technologies to examine their common characteristics and identified the advantages of MCC and IoT that could improve the use of big data applications [2]. In order to study how to mine valuable information from multimedia big data at low complexity, Guo et al. proposed a big data object detection method in the compressed measurement domain under a mobile distributed computing architecture [3]. In recent years, the research field of mobile big data has risen rapidly but remains somewhat fragmented. Xiang et al. aimed to provide a complete picture of this emerging field, build multidisciplinary bridges, and inspire future research [4]. Using big data analysis technology, Parwez et al. contributed in two ways. First, mobile network data (big data), in the form of call detail records, was used to analyze the abnormal behavior of the mobile wireless network. Second, a neural network-based predictive model was trained using anomalous and anomaly-free data to highlight the impact of anomalies in the data when training an intelligent model [5]. Qiao et al. proposed a mobile big data framework, called FMBD, which provided collection, storage, processing, analysis, and management functions to cope with massive data traffic [6]. Hu and Yan solved the private dot product calculation problem for mobile big data applications with very good computational efficiency [7]. However, these studies share a shortcoming: the data quality is uncertain and the calculation and analysis of massive data are very complicated, so the construction of the models still needs to be improved.

3. Method of 3D Modeling Technology for Mobile Big Data

3.1. Big Data Algorithms
3.1.1. Data Mining and Machine Learning Framework

Data mining refers to the process of using algorithms to automatically search a large amount of data for information and regularities that reveal particular relationships. Today's data is not only huge but also incomplete and ambiguous [8]. Finding the relevant patterns is the task of data mining. It has two main characteristics: first, it can obtain valuable information from large-scale data; second, it can respond to changes in the data, adapting to and capturing them in real time. With the rapid improvement of computing and data processing capabilities, more and more machine learning algorithms are applied in practice. The weight parameters are obtained from the training data, the accuracy is obtained by feeding the test dataset to the model, and the final output is a decision based on the experience generated from the historical training data [9]. As the technical support and theoretical basis of artificial intelligence, machine learning can not only process and analyze data to obtain regularities and parameters quickly but can also classify and predict. With data volumes continuing to grow, machine learning is particularly important. Today, numerous frameworks, libraries, and tools are available that facilitate the use of machine learning techniques.

According to Wikipedia's definition, big data refers to datasets that exceed the storage, management, and processing capabilities of conventional software, or that take longer than acceptable to process. Regarding the concept of big data, it is generally accepted that there are 4Vs, namely volume, velocity, variety, and value. At the present stage of development, giving an effective and precise definition of big data is very difficult. When a new technical concept is proposed, it usually needs to go through a process of development [10].

The first is volume. This is the biggest feature that distinguishes big data from traditional data, that is, the amount of data is large. This feature is inseparable from the development of technology. First of all, the data storage capacity in the past was very limited. According to the development of Moore’s law, the performance of hardware has been continuously improved and the price has gradually decreased, making large-scale data storage possible. In addition, the proliferation of technologies such as social networking, e-commerce, and the Internet of things has created a flood of data.

The second is velocity. Data flows in constantly at an unprecedented rate and must be processed within an acceptable time frame. This is the main challenge faced by big data: traditional data storage and processing methods cannot reach a usable level of efficiency.

The third is variety. It refers to the variety of forms of data, mainly because of the abundance of data sources. The storage and analysis of these data are also two of the main challenges faced by big data.

3.1.2. Big Data Storage

In the earliest days, data was stored directly in files. Such a file storage method has many defects: the data cannot be kept for a long time, and the storage capacity is small. With the development of distributed file systems, data can be stored as distributed files, which has the following advantages. It can accommodate a large amount of data [11]: a distributed file system such as HDFS can store a large file across multiple machines, letting each machine store part of the file to relieve the pressure on any single machine. Redundant backup of data is supported: by default, HDFS saves three copies of each piece of data and distributes them to different machines in the cluster, so that even if a machine goes down or is completely destroyed, the data will not be lost. It has good scalability: when the amount of data gradually increases, the storage capacity can be expanded by adding machines to the cluster. There are three key components in HDFS, namely, DataNode, NameNode, and Client. The DataNode sends the information of the data block to the NameNode [12], and the Client is responsible for initiating read and write requests.
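The storage behaviour described above can be illustrated with a toy model. The following is a minimal sketch, not HDFS code: the 128 MB block size, the DataNode names, and the round-robin placement rule are assumptions made for illustration (real HDFS placement is rack-aware and decided by the NameNode).

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy policy for illustration only; it is not the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [
        block
        for block, holders in placement.items()
        if any(node != failed_node for node in holders)
    ]


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)  # assumed block size
nodes = ["dn1", "dn2", "dn3", "dn4"]            # hypothetical DataNode names
placement = place_replicas(len(blocks), nodes)
still_readable = surviving_blocks(placement, "dn1")
```

With three replicas on distinct machines, every block in this example remains readable after any single node is lost, which is the property the paragraph attributes to redundant backup.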

3.1.3. Correlation Analysis

Correlation analysis is an exploratory step before data analysis, which aims to explore the relationships among, and the nature of, the input parameters and variables. Its results can effectively guide the choice of method in the next step. For instance, in modeling, the data redundancy of the model can be reduced using the results of correlation analysis, which makes it essential preparatory work before data mining. In the process of data modeling, there are many kinds of parameters, and the relationships between the data differ. If all kinds of parameters are used directly as model input, model redundancy increases and calculation efficiency falls. To simplify the model input and improve calculation efficiency, it is necessary to perform a correlation analysis on the original parameters and eliminate variables that are highly correlated [13].

The commonly used correlation analysis methods are as follows:

(1) Covariance and the covariance matrix

Covariance measures the joint variability of two variables. If it is positive, the two variables are positively correlated; if it is negative, they are negatively correlated; if the variables are independent of each other, the covariance is 0.

$$\operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] \tag{1}$$

Equation (1) performs correlation analysis on two groups of data. When there are more than two groups of data, the covariance matrix needs to be used. Taking three sets of data $X$, $Y$, and $Z$ as an example, the equation for calculating the covariance matrix is as follows:

$$C = \begin{pmatrix} \operatorname{Cov}(X, X) & \operatorname{Cov}(X, Y) & \operatorname{Cov}(X, Z) \\ \operatorname{Cov}(Y, X) & \operatorname{Cov}(Y, Y) & \operatorname{Cov}(Y, Z) \\ \operatorname{Cov}(Z, X) & \operatorname{Cov}(Z, Y) & \operatorname{Cov}(Z, Z) \end{pmatrix} \tag{2}$$

(2) Correlation coefficient

The correlation coefficient is a statistical indicator of the degree of closeness between variables [14].

$$r_{XY} = \frac{s_{XY}}{s_X s_Y} \tag{3}$$

Here $r_{XY}$ is the sample correlation coefficient, $s_{XY}$ is the sample covariance, $s_X$ is the sample standard deviation of $X$, and $s_Y$ is the sample standard deviation of $Y$.

(3) Mutual information analysis

For two random variables $X$ and $Y$, the mutual information between them is defined as follows:

$$I(X; Y) = H(X) + H(Y) - H(X, Y) \tag{4}$$

In equation (4), $H(X, Y)$ is the joint entropy of variables $X$ and $Y$; $H(X)$ and $H(Y)$ are the unconditional entropy of variables $X$ and $Y$, respectively. In order to facilitate the evaluation, the mutual information is normalized, which can be expressed as follows:

$$I_N(X; Y) = \frac{2\, I(X; Y)}{H(X) + H(Y)} \tag{5}$$

The value of $I_N(X; Y)$ reflects the magnitude of the correlation between the variables $X$ and $Y$.

Although the commonly used Pearson coefficient method and regression analysis method can effectively quantify the linear relationship between variables, they cannot characterize the correlation between nonlinear-related variables. In order to facilitate the numerical representation of the relationship between nonlinear correlated variables and realize quantitative statistics, the correlation can be analyzed by applying mutual information.
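The contrast between the Pearson coefficient and mutual information can be made concrete with a small example. The sketch below uses only the Python standard library and assumes discrete data and the symmetric normalization 2I(X;Y)/(H(X)+H(Y)); continuous measurements would first have to be binned, and the function names are illustrative.

```python
import math
from collections import Counter


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def normalized_mutual_information(xs, ys):
    """2 * I(X;Y) / (H(X) + H(Y)), with I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    hx, hy = entropy(xs), entropy(ys)
    hxy = entropy(list(zip(xs, ys)))
    return 2 * (hx + hy - hxy) / (hx + hy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # y depends on x, but not linearly

r = pearson(xs, ys)                           # close to zero: no linear relationship
nmi = normalized_mutual_information(xs, ys)   # well above zero: strong dependence
```

Here `ys` is fully determined by `xs`, yet the linear coefficient is essentially zero, while the normalized mutual information is large. This is the case in which a Pearson-based screening step would wrongly treat the two variables as unrelated.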

3.1.4. Research Status of 3D Imaging Technology

A 3D data model is an abstract, simulated representation of real objects that connects the computer world with the real world. Three-dimensional modeling methods mainly include vector data structure, splicing, mixing, and analysis.

3.2. Basic Concept of Trajectory
3.2.1. Trajectory Data Classification

According to the different requirements for collecting trajectory information, the trajectory sampling method can be adjusted. Trajectory data can be roughly divided into three categories: time-based sampling trajectory data, location-based trajectory data sampling, and trigger event-based sampling trajectory data. This section will introduce these three types of trajectory data.

(1) Sampling Trajectories Based on Time. Time-based trajectory sampling, represented by equation (6), retains discrete trajectory points by sampling a moving object at equal time intervals and recording its position and time information [15].

(2) Sampling Trajectory Based on Location. Location-based trajectory sampling retains discrete trajectory points by sampling the moving object and recording its position, time, and other properties whenever its position changes. It can be represented by equation (7) as follows:

(3) Sampling Trajectory Based on Trigger Events. Trigger event-based trajectory sampling records the current position, time, and other attribute information of the moving object whenever a specific behavior of the object triggers a sensor. For example, using a mobile phone for daily calls or Internet access, or swiping a credit card or citizen card, triggers nearby sensors such as base stations and card readers, so that the corresponding data is recorded and saved to the database. It can be represented by equation (8) as follows:

Time sampling trajectories are generally used for vehicle GPS positioning, animal migration research, etc.; location sampling trajectories are generally used for individual travel and group migration research; trigger event sampling trajectories are often used for user check-in and user hotspot research.

When the trajectory points collected based on the time sampling method express trajectory information, it may cause the lack of important trajectory features, which is not conducive to trajectory analysis research. At the same time, a large amount of redundant trajectory information will also be collected, which increases the complexity of the later trajectory analysis. Most of the location-based sampling methods require manual screening, which not only increases the labor cost but also makes the trajectory data sorting and updating cycle longer, and the acquisition process is more complicated. The collection of trajectory points based on the trigger event sampling method can obtain a large amount of trajectory data, the data is automatically collected by the sensor, the sorting and updating cycle is short, and the sampled trajectory data can also better express the trajectory characteristics. However, this collection method also has problems, because data is collected only when the corresponding event is triggered, which may cause the situation that the complete trajectory cannot be expressed.
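The trade-offs among the three sampling methods can be seen on a small synthetic track. This is a minimal sketch: the track, the sampling interval, the movement threshold, and the event times are all invented for illustration and are not taken from the experiments in this paper.

```python
import math


def time_based(track, interval):
    """Keep one point every `interval` seconds."""
    t0 = track[0][0]
    return [p for p in track if (p[0] - t0) % interval == 0]


def location_based(track, min_move):
    """Keep a point once the object has moved at least `min_move` from the last kept point."""
    kept = [track[0]]
    for t, x, y in track[1:]:
        _, kx, ky = kept[-1]
        if math.hypot(x - kx, y - ky) >= min_move:
            kept.append((t, x, y))
    return kept


def event_based(track, event_times):
    """Keep only the points recorded at the moments an event fired."""
    events = set(event_times)
    return [p for p in track if p[0] in events]


# Synthetic track of (t, x, y): moves 1 unit/s for 5 s, stands still for 5 s, moves again.
track = [(t, float(min(t, 5) + max(t - 10, 0)), 0.0) for t in range(16)]

by_time = time_based(track, 5)               # includes a redundant point during the stop
by_location = location_based(track, 2.0)     # no points while the object stands still
by_event = event_based(track, [3, 12])       # only two points, so the path is incomplete
```

Time-based sampling keeps recording while nothing changes, location-based sampling skips the stationary period, and event-based sampling yields only as many points as there were events, which matches the strengths and weaknesses discussed above.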

3.2.2. Track Similarity Calculation

The similarity between trajectories is measured by computing a distance between them with a suitable function: the smaller the distance, the higher the similarity. The main similarity measurement methods are the Euclidean distance method, the Hausdorff distance method, the longest common subsequence method, the edit distance method, and the dynamic time warping method. The first three of these measures are described in detail below.

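(1) Euclidean Distance Method. Euclidean distance is the most commonly used method to measure the similarity of objects. The Euclidean distance is calculated between each pair of corresponding trajectory points, and the Euclidean distance between the trajectories is obtained by adding these distances.

For two trajectories $A = \{a_1, \dots, a_n\}$ and $B = \{b_1, \dots, b_n\}$ with points $a_i = (x_i^A, y_i^A)$ and $b_i = (x_i^B, y_i^B)$, equation (9) is as follows:

$$D_E(A, B) = \sum_{i=1}^{n} \sqrt{\left(x_i^A - x_i^B\right)^2 + \left(y_i^A - y_i^B\right)^2} \tag{9}$$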
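A minimal sketch of this measure, assuming two-dimensional points and that the trajectory distance is the sum of the pointwise distances (dividing by the number of points to obtain a mean is an equally common convention). The function name is illustrative.

```python
import math


def euclidean_trajectory_distance(a, b):
    """Sum of pointwise Euclidean distances between two equally long trajectories."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of points")
    return sum(math.dist(p, q) for p, q in zip(a, b))


a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 3), (1, 4), (2, 0)]
d = euclidean_trajectory_distance(a, b)  # 3 + 4 + 0
```

The explicit length check reflects the main restriction of the method: it is only defined when both trajectories have the same number of points.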

(2) Hausdorff Distance Method. A common metric to measure the distance between trajectories is the Hausdorff distance method.

The main idea of the algorithm is as follows. Given the target trajectory point set $A = \{a_1, \dots, a_m\}$ and the trajectory point set to be matched $B = \{b_1, \dots, b_n\}$, the Hausdorff distance of the two trajectories is calculated by equation (10):

$$H(A, B) = \max\{h(A, B),\ h(B, A)\} \tag{10}$$

The calculations for $h(A, B)$ and $h(B, A)$ are shown in equation (11):

$$h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert, \qquad h(B, A) = \max_{b \in B} \min_{a \in A} \lVert b - a \rVert \tag{11}$$

Among them, $\lVert a - b \rVert$ and $\lVert b - a \rVert$ express the Euclidean distance between the trajectory points in trajectory point set $A$ and trajectory point set $B$. $h(A, B)$ is the one-way Hausdorff distance from $A$ to $B$: for each trajectory point in $A$, the minimum distance to the trajectory points in $B$ is taken, and $h(A, B)$ is the maximum of these minima. It expresses the degree of least similarity between $A$ and $B$; $h(B, A)$ is obtained similarly. Since the Hausdorff distance is directional, in general $h(A, B)$ and $h(B, A)$ are not equal, and the larger of the two is selected as the final Hausdorff distance to express the degree of difference between $A$ and $B$. That is, the distance from each point of either trajectory point set to its nearest point in the other set is not greater than the Hausdorff distance, as shown in Figure 1.
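The directionality of the measure is easy to check numerically. The following is a brute-force sketch of the standard definition (maximum over one set of the minimum distance to the other set); the point sets are invented, and the quadratic-time loop is chosen for clarity rather than speed.

```python
import math


def directed_hausdorff(a, b):
    """h(A, B): the largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


A = [(0, 0), (1, 0), (2, 0)]
B = [(0, 1), (1, 1), (2, 1), (6, 1)]  # the last point is an outlier

h_ab = directed_hausdorff(A, B)  # every point of A has a neighbour in B at distance 1
h_ba = directed_hausdorff(B, A)  # dominated by the outlier (6, 1)
h = hausdorff(A, B)
```

The two one-way distances differ, and the final value is determined entirely by the single outlying point in `B`, which illustrates why noise points should be removed before this measure is applied.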

(3) Longest Common Subsequence Method. The main idea of the longest common subsequence method is to find the longest common subsequence between two trajectories [16]. A subsequence is a new sequence obtained by selecting any elements from a sequence while keeping the order of the original sequence unchanged. Thus, a subsequence is not necessarily contiguous. The length of the longest common subsequence is then converted into a corresponding distance metric to represent the similarity between the two tracks, as shown in equation (12).

$$L(A, B) = \begin{cases} 0, & m = 0 \text{ or } n = 0 \\ 1 + L(\mathrm{Rest}(A), \mathrm{Rest}(B)), & |a_{1,x} - b_{1,x}| < \varepsilon_x \text{ and } |a_{1,y} - b_{1,y}| < \varepsilon_y \\ \max\{L(\mathrm{Rest}(A), B),\ L(A, \mathrm{Rest}(B))\}, & \text{otherwise} \end{cases} \tag{12}$$

Among them, $A$ and $B$ are two trajectories, and the numbers of trajectory points of the two trajectories are $m$ and $n$. $a_1$ and $b_1$ are the first points of $A$ and $B$, and $\mathrm{Rest}(\cdot)$ denotes a trajectory without its first point. $L(A, B)$ is the length of the longest common subsequence between $A$ and $B$, and $\varepsilon_x$ and $\varepsilon_y$ represent the thresholds in the horizontal and vertical directions, respectively. When the differences between the two track points in the corresponding directions are less than $\varepsilon_x$ and $\varepsilon_y$, the two track points are considered similar and the value of $L(A, B)$ is increased by 1. Converting the longest common subsequence length into a distance metric, the relevant equation (13) is as follows:

$$D_L(A, B) = 1 - \frac{L(A, B)}{\min(m, n)} \tag{13}$$

$\min(m, n)$ represents the smaller of $m$ and $n$.
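A dynamic-programming sketch of this measure is given below. It assumes two-dimensional points, separate horizontal and vertical thresholds (named `eps_x` and `eps_y` here), and the conversion to a distance as one minus the matched length divided by the shorter trajectory length. The example trajectories are invented.

```python
def lcss_length(a, b, eps_x, eps_y):
    """Length of the longest common subsequence of two trajectories."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            if abs(ax - bx) < eps_x and abs(ay - by) < eps_y:
                # The two points are similar: extend the common subsequence.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Skip a point from one trajectory or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def lcss_distance(a, b, eps_x, eps_y):
    """Distance in [0, 1]: 0 means the shorter trajectory is fully matched."""
    return 1 - lcss_length(a, b, eps_x, eps_y) / min(len(a), len(b))


clean = [(0, 0), (1, 1), (2, 2), (3, 3)]
noisy = [(0, 0.1), (1, 1.1), (50, 50), (2, 2.1), (3, 3.1)]  # one noise point
far = [(10, 10), (11, 11)]

d_noisy = lcss_distance(clean, noisy, 0.5, 0.5)
d_far = lcss_distance(clean, far, 0.5, 0.5)
```

The noise point at `(50, 50)` is simply skipped, so the noisy trajectory is still judged identical to the clean one, whereas the table of size m by n shows where the computational cost on long trajectories comes from.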

The Euclidean distance algorithm has a simple structure and is easy to use. However, it requires the trajectories to have the same number of trajectory points, which considerably limits the experimental data that can be used. It is also very sensitive to noise: when there is noise in the trajectories, the similarity judgment between trajectories is easily distorted, which affects the accuracy of the experimental results.

The Hausdorff distance method is more widely used in practical scenarios and also has higher efficiency and accuracy. However, this method is sensitive to noise track points. Even if there are a small number of noise points, it will have a significant impact on the calculation of similarity distance. Therefore, when using this method, it is necessary to pay attention to the removal of track noise points.

Because subsequences need not be contiguous, noisy points in a trajectory can largely be skipped. Therefore, the longest common subsequence algorithm has higher robustness and accuracy. However, due to the huge number of data points in a trajectory, the algorithm requires a large amount of calculation and is not efficient. Even with dynamic programming, it still consumes a lot of computing time. Therefore, this method is less used in projects with massive data.

3.3. Design of Big Data Platform for Condition Monitoring of Substation Equipment
3.3.1. Traditional Substation Equipment Condition Monitoring Platform Architecture

The data acquisition layer is the foundation for building the substation condition monitoring data platform. The historical data sources and the substation equipment condition monitoring data held in the traditional relational database are transferred to the data warehouse through ETL (extraction, transformation, cleaning, and loading). ETL extracts, cleans, transforms, and loads the data scattered across different business systems (such as safety production, project management, and condition monitoring systems), so that this basic data becomes high-quality, meaningful data for smart substations. Its logical structure is shown in Figure 2 [17].

3.3.2. Design of the Condition Monitoring Platform for Substation Equipment under Big Data

The data acquisition layer mainly collects data from substation equipment through the CAC (state access controller), sensors, etc. and transmits it to the CAG (state access gateway machine) in the form of a Web service. Given the respective strengths of the distributed file system HDFS and the relational database MySQL, the data storage layer integrates both and uses each where it performs best. Condition monitoring big data with unified specifications is stored in the distributed file system HDFS. MySQL is mainly used to store the various model information for substation equipment condition monitoring and to manage Hive metadata. The tables, fields, and delimiters created by Hive are stored in MySQL. When data operations are performed, the MySQL engine needs to be running so that the existence of the metadata can be verified. Impala can share metadata information with Hive.

Most traditional solutions use conventional data storage and analysis methods, resulting in poor system scalability and high cost, and they cannot meet the requirements of substations for complex analysis and trend prediction of the collected monitoring data. Integrating distributed data storage technology and big data analysis technology into the condition monitoring data platform brings a new approach to the storage and analysis of condition monitoring data. Considering the huge amount of data from substation equipment, it is necessary to migrate the data from the traditional relational database (such as MySQL) to a nonrelational database. Figure 3 shows the architecture of the substation equipment condition monitoring platform under big data [18].

3.4. Development Trend of Electric Vehicles
3.4.1. Characteristics of New Energy Vehicles

The characteristics of the main types of new energy vehicles in China are compared as in Table 1.

3.4.2. System Scheme Design

From the analysis of functional requirements, it can be seen that the system needs to have the function of remote monitoring of electric vehicle motor status and faults: data acquisition function, data processing function, data remote transmission function, and remote monitoring data function. The first three of these four functions need to be realized on the vehicle end, and the remote monitoring part is realized on the cloud server end and the display terminal. The overall structure diagram of the system composed of each function is shown in Figure 4.

The system design scheme is given as follows:

(1) Data acquisition function

The data acquisition part of the system can be connected to the CAN bus as a node device to interpret the communication protocol of the application layer and obtain the data required by the system [19].

(2) Remote monitoring data function

The data monitoring function is realized by building a monitoring system that displays the motor data. Therefore, a mobile phone application is selected as the monitoring interface for displaying the motor data.

3.4.3. Classification and Application Status of the Electric Vehicle Bus System

The structure of electric vehicles is very different from traditional vehicles, but the communication category of the bus is still the same. Figure 5 shows the basic structure of an electric vehicle (in series). It can be seen that both the vehicle controller and the motor controller are connected to the high-speed CAN bus of the system [1].

The vehicle controller (VCU) is designed with a modular circuit consisting of a high-performance microcontroller, CAN and peripheral circuits, and special sensor acquisition circuits. The VCU receives various data on the CAN bus (control signals such as accelerator and gear position, and vehicle status such as vehicle speed), stores the required information after analyzing and evaluating it, and sends control commands to the corresponding control system. The motor controller (MCU) is designed around a high-performance main processor and is a core electronic control unit specific to electric vehicles. It receives the control instructions sent by the VCU, controls the motor accordingly, and also provides motor fault diagnosis, protection, and storage functions [20].

As can be seen, the motor speed, torque, voltage, temperature, and other data collected by this system come from the vehicle controller and the motor controller, which communicate with each other through the high-speed CAN bus. Therefore, the data acquisition and transmission module should also be connected to the high-speed CAN bus of the electric vehicle and collect the data required by the system through it. The data acquisition hardware circuit is composed of a CAN transceiver, a microcontroller containing a CAN controller, and its peripheral circuits. This hardware circuit can be added directly to the CAN bus of the electric vehicle as a CAN node device. With the data acquisition function implemented in software in C, the data acquisition module can receive the data frames on the CAN bus of the electric vehicle that carry the motor speed, current, and other data, and thus realize the data acquisition function.
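Once a data frame has been received, the acquisition software has to turn its payload bytes into physical values. The sketch below shows that decoding step in Python for readability, although the module described here is implemented in C. The CAN identifier, the byte layout, and the scaling factors are all hypothetical, since the actual application-layer protocol of the vehicle is not given.

```python
import struct
from dataclasses import dataclass

MOTOR_STATUS_ID = 0x201  # hypothetical CAN identifier of the motor status frame


@dataclass
class MotorStatus:
    speed_rpm: int
    torque_nm: float
    voltage_v: float
    temperature_c: int


def decode_motor_status(can_id, data):
    """Decode an 8-byte payload under an assumed little-endian layout.

    bytes 0-1: speed, unsigned, 1 rpm per bit
    bytes 2-3: torque, signed, 0.1 N*m per bit
    bytes 4-5: DC voltage, unsigned, 0.1 V per bit
    byte  6  : temperature, unsigned, offset of -40 degrees Celsius
    byte  7  : reserved
    """
    if can_id != MOTOR_STATUS_ID or len(data) != 8:
        return None  # not a frame this sketch understands
    speed, torque, voltage, temperature, _ = struct.unpack("<HhHBB", data)
    return MotorStatus(speed, torque / 10, voltage / 10, temperature - 40)


# Build a payload the same way a motor controller would under the assumed layout.
frame = struct.pack("<HhHBB", 3000, -125, 3456, 95, 0)
status = decode_motor_status(MOTOR_STATUS_ID, frame)
ignored = decode_motor_status(0x123, frame)
```

Filtering on the identifier first mirrors how a CAN node ignores frames it is not interested in; on the real hardware this filtering is usually configured in the CAN controller itself.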

3.4.4. Introduction to the Principle of CAN Bus

CAN (controller area network) is a serial communication protocol standardized by the International Organization for Standardization (ISO). The main control process is a continuously looping program that monitors battery voltage, current, temperature, and other information in real time and raises surge and high-temperature alarms. The real-time tracking information is stored so that the host computer can collect it easily [21]. The flowchart is shown in Figure 6.
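The structure of such a main control loop can be sketched as follows. This is an illustration only: the alarm limits are hypothetical, "surge" is interpreted here as the current exceeding a limit, and a finite list of samples stands in for the endless loop running on the real device.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    voltage_v: float
    current_a: float
    temperature_c: float


# Hypothetical alarm limits; real limits depend on the battery pack.
MAX_CURRENT_A = 200.0
MAX_TEMPERATURE_C = 60.0


def check_alarms(sample):
    """Return the list of alarms raised by one reading."""
    alarms = []
    if abs(sample.current_a) > MAX_CURRENT_A:
        alarms.append("surge")
    if sample.temperature_c > MAX_TEMPERATURE_C:
        alarms.append("high temperature")
    return alarms


def monitor(samples):
    """Process readings one by one and store each with its alarms."""
    log = []
    for sample in samples:  # stands in for the continuous loop on the device
        log.append((sample, check_alarms(sample)))
    return log


log = monitor([
    Sample(voltage_v=350.0, current_a=50.0, temperature_c=30.0),
    Sample(voltage_v=340.0, current_a=250.0, temperature_c=65.0),
])
```

Storing every reading together with its alarms corresponds to keeping the real-time tracking information available for later collection by the host computer.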

4. Big Data Monitoring in Smart Substation Status Experiment

This paper discussed the application of the 3D modeling technology of mobile big data in the electric vehicle condition monitoring system and carried out an experimental analysis of a data model for power equipment condition monitoring based on join-free hierarchical encoding.

The Not Join Level-encoding Schema is abbreviated as NJLS. This paper applies the NJLS model to the condition monitoring of substation equipment, which offers a different approach to the storage and analysis of condition monitoring big data [22].
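The general idea of a join-free, level-encoded key can be illustrated with a toy example. This is one plausible reading rather than the NJLS definition, which is not spelled out here: the sketch assumes a three-level hierarchy (region, substation, device) packed into fixed-width bit fields of a single integer key stored with each fact row, so that a roll-up only needs to truncate the key instead of joining a dimension table. The bit widths and the data are invented.

```python
from collections import defaultdict

# Assumed bit widths for a hypothetical three-level hierarchy.
SUBSTATION_BITS = 8
DEVICE_BITS = 12


def encode(region, substation, device):
    """Pack the three level indices into one integer key."""
    return (region << (SUBSTATION_BITS + DEVICE_BITS)) | (substation << DEVICE_BITS) | device


def ancestor(code, level):
    """Truncate a key to the given level: 'region' or 'substation'."""
    shift = DEVICE_BITS if level == "substation" else SUBSTATION_BITS + DEVICE_BITS
    return code >> shift


def roll_up(facts, level):
    """Sum measurements per ancestor, reading only the fact rows (no dimension join)."""
    totals = defaultdict(float)
    for code, value in facts:
        totals[ancestor(code, level)] += value
    return dict(totals)


facts = [
    (encode(1, 1, 7), 10.0),
    (encode(1, 1, 9), 5.0),
    (encode(1, 2, 3), 2.5),
    (encode(2, 1, 1), 4.0),
]

per_region = roll_up(facts, "region")
per_substation = roll_up(facts, "substation")
```

Under this reading, the extra encoding work happens once at load time, while every later aggregation avoids a join, which is consistent with a model that loads more slowly but rolls up faster than a star schema.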

4.1. Monitoring Data Preparation
4.1.1. Data Preparation

Three large monitoring datasets (S1, S2, and S3) were used in this section, and three groups of tests were carried out for a comprehensive evaluation of the model, covering monitoring data loading, aggregation operations, and overall storage costs. The individual datasets are shown in Table 2.

4.1.2. Monitoring Data Loading

In the test comparing data loading performance, single-table data was loaded into Hive and Impala. Figure 7 shows the loading time and loading speed of the monitoring data loading experiment.

The following conclusions can be drawn from the data loading speed in Figure 7(b):

(1) Compared with the conventional star model, the NJLS model for condition monitoring of substations has no advantage in loading. This is because the NJLS model requires preprocessing and hierarchical encoding of the condition monitoring dataset; its monitoring data loading speed is about 42% of that of the conventional model.

(2) The data loading speed of Hive and Impala is relatively stable and does not decrease significantly as the condition monitoring data doubles. This is because, when data is loaded, the data is stored in HDFS and the metadata update is completed at the same time.

(3) Figure 7(b) shows that the loading speed of Impala monitoring data is slightly slower than that of Hive. Due to the limitations of the data file formats supported by Impala, Hive needs to be used to load the data first, before other operations are performed.

4.2. Operation Details

Table 3 shows the running time of the roll-up operation for the NJLS model and the star schema with different monitoring dataset sizes. Figure 8 shows the performance trends of NJLS and the star schema in the Hive and Impala systems with different condition monitoring datasets.

Combining Table 3 and Figure 8, it can be seen that (1) the roll-up execution time of the NJLS model is 40% to 49% shorter than that of the star model and (2) the running time of Impala's roll-up operation for each condition monitoring dataset is shorter than that of Hive.

4.3. Storage Overhead

This group of experiments mainly analyzed the storage overhead of the NJLS model and the conventional star model from the aspect of data storage and proved through theory and experiments that the storage overhead of the NJLS model was still smaller than that of the star model even under large-scale condition monitoring data volume.

In this set of experiments, the number of backups was the default value of 3 for Hadoop. Figure 9 compares the storage overhead of the NJLS model and the regular star model in different condition monitoring datasets of Hive and Impala.

Taking the S2 condition monitoring dataset in Figure 9 as an example, it can be seen that (1) the storage overhead of the NJLS model is reduced by 34% to 40% compared with the conventional star model and (2) among the many big data analysis systems, the storage overhead of Hive and Impala is small.

5. Discussion

With the widespread application of wireless communication technology and sensor technology, the coverage of communication equipment is expanding [23]. Base stations set up in cities can receive radio waves from devices carried by moving objects and convert them into mobile data [24].

With the rapid development of imaging technology and Internet technology, all walks of life have put forward higher requirements for the intuitiveness and realism of information display, and research on 3D imaging technology has received more and more attention. The real operating data of the electric vehicle motor is vital for the maintenance and the research and development of the motor, and ensuring the safety of the electric vehicle is equally vital [25].

The 3D scene constructed by 3D imaging technology is not only realistic but also convenient for spatial analysis. With the explosive growth of the number of electric vehicles in China, the power battery, the core technology of electric vehicles, has gradually attracted more attention. The development of the electric vehicle industry rests on the analysis of power battery operating data. Therefore, online monitoring and evaluation of electric vehicle power batteries are of great significance.

6. Conclusions

In view of people's expectations for electric vehicles and the development trend of the electric vehicle industry, this paper used Hadoop technology to build an experimental platform for substation equipment condition monitoring and briefly described the construction process of the platform. The NJLS-based substation condition monitoring data model proposed in this paper was compared and analyzed in Hive and Impala from three aspects: monitoring data loading, aggregation operations, and storage cost. The test results showed that the NJLS-based model could solve the problem of cumbersome join operations spread across multiple tables in ROLAP. It is therefore more suitable for performing big data analysis on large-scale distributed clusters.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported by the Research on Design Theory and Control Technology of High-Speed, Light-Load and Limited-dof Robots for Electronic Information Manufacturing and Light Industrial Production (Project approval number: 2018JY0116; project name: application of 3D modeling technology of mobile big data in the state monitoring system of electric vehicles).