Modern engineering systems are increasingly complex because of their strong nonlinearity and the large disturbances and uncertainties that act on them. In many circumstances, conventional mathematical models, such as differential equations or statistical models, that describe these complex systems accurately and can be exploited in real-life applications do not exist. However, with the rapid development of advanced sensing, measurement, and data collection technologies, large amounts of data representing the input-output relationships of these systems have become available. This makes data-driven modelling (DDM) possible and practical.

Data-driven modelling aims at information extraction from data and is normally used to elicit numerical predictive models with good generalisation ability, which can be viewed as regression problems in mathematics. It analyses the data that characterise a system to find relationships among system state variables (input, internal, and output variables) without taking into account explicit knowledge about physical behaviours. Many paradigms utilised in DDM have been established based on statistics and/or computational intelligence. For instance, artificial neural networks (ANNs) and fuzzy rule-based systems (FRBSs) serve as fundamental model frameworks, which are alternatives to statistical inference methods. Evolutionary algorithms (EAs), swarm intelligence (SI), and machine learning (ML) methods provide learning and optimisation abilities for calibrating and improving the intelligent or statistical models. In recent years, DDM has found widespread applications, ranging across machinery manufacturing, materials processing, power and energy systems, transport, and so forth.
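
As a minimal sketch of the regression view described above, the following Python snippet fits a predictive model purely from input-output samples and checks its generalisation on held-out data. The data are synthetic and the function names (`fit_line`, `rmse`) are illustrative assumptions; the example is not taken from any paper in this issue.

```python
# Illustrative sketch: synthetic input-output data, no physical model of the system is used.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(model, xs, ys):
    """Root-mean-square prediction error of the fitted model on (xs, ys)."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical measurements of a system that behaves roughly like y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.1, 2.9, 5.05, 6.95, 9.1, 10.9]

# Held-out samples, used only to assess generalisation.
test_x = [6.0, 7.0]
test_y = [13.0, 15.0]

model = fit_line(train_x, train_y)
test_error = rmse(model, test_x, test_y)
print("slope, intercept:", model)
print("held-out RMSE:", test_error)
```

The held-out error, rather than the training error, is what indicates whether the elicited model generalises.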

This special issue intends to bring together the state-of-the-art research, application, and review of DDM techniques. It aims at not only stimulating deep insights on computational intelligence approaches in DDM but also promoting their potential applications in complex engineering problems.

This special issue received 61 manuscripts, of which only 12 high-quality papers were accepted and published (an acceptance rate of about 20%). The accepted papers involve a variety of data-driven modelling and data analytics techniques and contribute to a wide range of application areas, including transportation, environment, telecommunication, automatic control, product design, and finance. A brief introduction to each contribution is provided in the following paragraphs.

L. Wei et al. proposed a new architecture for the Elman neural network (ENN), named the recursive modified Elman neural network (RMENN), to enhance the model’s approximation ability. The stability of the model was proven and its learning algorithm was also provided. The proposed network was applied to predict gas emission for the Kailuan mining group, China. It was shown that RMENN outperformed ENN in prediction accuracy and convergence speed.

Q. Wang et al. combined the partial least squares (PLS) approach with the back-propagation neural network (BPNN) and the radial basis function neural network (RBFNN) to predict short-term wind power. The proposed method was validated using data collected from a wind farm in Northwest China. The experimental results showed that the new method performed better than conventional methods such as PLS alone, RBFNN alone, or the support vector machine (SVM).

An adaptive control method was designed by Z. Xu et al. to manipulate marine vessels with unknown dynamics and unknown external disturbances. In the method, an RBFNN was used to approximate the dominant dynamics and disturbances. The simulation demonstrated that the proposed controller outperformed the original controller and kept the tracking error within the predefined bounds.
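
To illustrate the general idea of using an RBF network to approximate an unknown function online, the sketch below updates the output weights of a small Gaussian RBF network sample by sample. It is not the controller of Z. Xu et al.: the target function `unknown_dynamics`, the centres, the width, and the learning rate are all assumptions chosen for illustration.

```python
import math

# Assumed network layout: 7 Gaussian units with evenly spaced centres on [0, pi].
CENTRES = [i * math.pi / 6 for i in range(7)]
WIDTH = 0.3


def features(x):
    """Gaussian radial basis activations for a scalar input x."""
    return [math.exp(-((x - c) ** 2) / (2 * WIDTH ** 2)) for c in CENTRES]


def rbf_output(weights, x):
    """Network output: weighted sum of the basis activations."""
    return sum(w * phi for w, phi in zip(weights, features(x)))


def unknown_dynamics(x):
    """Stand-in for the unknown function (hypothetical, for illustration only)."""
    return math.sin(x)


def train(samples, learning_rate=0.1, epochs=2000):
    """Gradient-descent (LMS) update of the output weights, one sample at a time."""
    weights = [0.0] * len(CENTRES)
    for _ in range(epochs):
        for x in samples:
            phi = features(x)
            error = unknown_dynamics(x) - sum(w * p for w, p in zip(weights, phi))
            weights = [w + learning_rate * error * p for w, p in zip(weights, phi)]
    return weights


samples = [i * math.pi / 29 for i in range(30)]
weights = train(samples)
max_error = max(abs(unknown_dynamics(x) - rbf_output(weights, x)) for x in samples)
print("largest approximation error on the samples:", max_error)
```

Only the output weights are adapted here; the centres and width stay fixed, which keeps the approximator linear in its trainable parameters.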

A paradigm for mechatronic product design was introduced by J. Jiang et al. The work involved design space reduction, data-driven system modelling, and optimal design of parameters. In modelling, multiple surrogate models were developed using BPNN. The proposed strategy was applied to the design of high-speed trains considering seven performance indices and was validated to be effective through simulation experiments.

X. Zhou et al. applied a deep learning network, the generative adversarial network (GAN), to predict price changes in the stock market. The proposed model, named GAN-FD, was validated using high-frequency data from the Chinese stock market. It was compared with other modelling strategies, including Long Short-Term Memory (LSTM), ANN, and SVM, and showed better performance.

X. Li et al. proposed a new version of independent component analysis (ICA) for data-driven fault detection. It utilised a novel biogeography-based optimisation (BBO) method to replace the usual Newton iteration method. The new ICA, named BBO-ICA, was validated using an actuator diagnosis problem. The results showed that BBO-ICA could detect faults more efficiently than other methods, such as FastICA and PSO-ICA.

A cross-project defect prediction paradigm with an improved training data selection mechanism was proposed by P. He et al. The method was tested using 15 datasets from 14 open-source projects. The experimental results showed that the proposed method could design more effective defect predictors than some other similar approaches.

A data-driven approach was designed by Y. Li et al. to approximate the intercity trip distribution. In this work, a Poisson model was used to estimate the correlation between trip distribution and city features, and k-means clustering was employed in feature analysis and interpretation. The approach was applied to the analysis of Chinese highway trips involving 17 cities of Shandong Province.
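
The clustering step mentioned above can be sketched with a plain implementation of k-means (Lloyd's algorithm). The feature vectors below are hypothetical placeholders, not data from Y. Li et al., and the Poisson modelling step is not covered; the initialisation from the first k points is an assumption made to keep the example deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a deterministic start from the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                # New centroid is the coordinate-wise mean of the cluster members.
                new_centroids.append(
                    tuple(sum(values) / len(members) for values in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical two-dimensional city features (for example, scaled population and economic index).
city_features = [
    (1.0, 1.2), (5.0, 5.2), (1.1, 0.9),
    (5.1, 4.9), (0.9, 1.0), (4.8, 5.0),
]
centroids, labels = kmeans(city_features, k=2)
print("labels:", labels)
print("centroids:", centroids)
```

Cities that share a label have similar feature profiles, which is the kind of grouping that can support feature analysis and interpretation.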

H. Ye et al. proposed a dynamic driver response model for variable speed limit (VSL) control using linear and nonlinear regression methods. The elicited model was then utilised to improve the VSL control algorithm using an optimisation technique. A simulation study based on field data from a freeway corridor on Whitemud Drive, Edmonton, Alberta, Canada, showed that the new model and control algorithm worked effectively.

A semi-supervised framework for automatic annotation of multi-label images was suggested by H. Ge et al. The method employed graph embedding and multi-view nonnegative matrix methods for feature fusion and dimension reduction. It then employed a k-nearest neighbour (KNN) based method for image annotation. The framework was validated experimentally and showed good performance in accuracy and efficiency.

J. Zhang et al. applied group theory to telecom operation systems. A model of the pipeline entity group was constructed for information transmission. The network traffic equations were derived from the matrix of the pipeline network and the flux conservation principle. An optimisation method was then suggested to improve the general information network system.

D. Li et al. designed an approach, named analytic time-frequency flexible wavelet transformation (ATFFWT), to detect J waves from electrocardiogram (ECG) signals. Fuzzy entropy (FE) was then employed to capture hidden but useful information, and the least squares support vector machine (LS-SVM) was employed for classification. Experiments demonstrated that the approach outperformed other similar methods.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.


The guest editors sincerely thank all the authors for their quality contributions to this special issue and thank all the reviewers for their valuable comments and suggestions. The lead guest editor would also like to express the deepest gratitude to other co-editors for their great support and cooperation throughout the development of the special issue.

Qian Zhang
Sarah Spurgeon
Li Xu
Dingli Yu