| DL model | Strengths | Drawbacks |
| --- | --- | --- |
| MLP [47, 83] | (i) Works well with clean, balanced, and scaled data, regardless of data type (ii) Easy to integrate into real-time systems and supports one-dimensional data analysis | (i) Requires substantial tuning to work on dirty or unscaled data |
| AE [118, 127] | (i) Learns rich representations and reduces dimensionality (ii) Can act as a denoising technique to obtain cleaner data (see the denoising sketch after the table) (iii) Easy to implement | (i) Requires large amounts of training data and long training time (ii) Has difficulty discriminating relevant data |
| DBN [76, 77] | (i) Can achieve a higher level of generalization on one-dimensional raw data | (i) Slow and inefficient training |
| DBM [75, 128] | (i) Supports one-dimensional data analysis (ii) Jointly optimizes the parameters of all layers | (i) Even slower and less efficient training than DBN (ii) Joint optimization becomes impractical for large datasets |
| CNN [10, 83] | (i) Fits well for multidimensional data analysis (ii) Enables feature extraction from raw data | (i) Complex architecture (ii) Requires large datasets and long training time (iii) Poor at estimating continuous-valued data |
| RNN/LSTM/GRU [83, 113, 129] | (i) Performs well with time-series or sequential data (ii) Better forecasting ability on time-series and sequential data | (i) Without proper weight constraints and gradient clipping, may suffer from vanishing or exploding gradients (see the gradient-clipping sketch after the table) |
| GAN [115, 116, 130] | (i) Learns the underlying representation of the data well (ii) Tends to yield a discriminator with lower generalization error, since the generator output (fake data) acts as a regularizer (iii) Can work in semisupervised or even unsupervised settings to identify clusters in the data under observation (iv) Produces high-fidelity models | (i) Very complex architecture that is difficult to implement (ii) Difficult to model discrete data (iii) May not be suitable for real-time implementation |
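
The denoising use of an autoencoder mentioned for AE above can be made concrete with a short sketch. The snippet below is illustrative only and not taken from the surveyed papers: it trains a small fully connected autoencoder to reconstruct clean signals from noise-corrupted copies, so the bottleneck layer doubles as a reduced-dimensionality representation. The layer sizes, noise level, and toy data are assumptions.

```python
# Minimal denoising autoencoder sketch (illustrative; toy data and sizes are assumptions).
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_features=100, n_latent=16):
        super().__init__()
        # Encoder compresses the input to a low-dimensional code.
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, n_latent))
        # Decoder reconstructs the clean signal from the code.
        self.decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.randn(256, 100)                      # toy "clean" signals
for epoch in range(10):
    noisy = clean + 0.3 * torch.randn_like(clean)  # corrupt the input
    optimizer.zero_grad()
    loss = loss_fn(model(noisy), clean)            # reconstruct the clean target
    loss.backward()
    optimizer.step()
```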
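
The gradient issue noted for RNN/LSTM/GRU is commonly handled by clipping the global gradient norm during training. The sketch below is a minimal illustration, not code from the cited works: it trains a small LSTM on a toy sequence-regression task and clips gradients after backpropagation through time so updates cannot become unbounded. The model size, learning rate, clipping threshold, and synthetic data are assumptions.

```python
# Minimal LSTM training loop with gradient clipping (illustrative; hyperparameters are assumptions).
import torch
import torch.nn as nn

class SeqRegressor(nn.Module):
    def __init__(self, n_features=1, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)           # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1])    # predict from the last time step

model = SeqRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 50, 1)              # 64 toy sequences, 50 steps, 1 feature
y = x.sum(dim=1)                        # toy target: sum over the sequence

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Clip the global gradient norm so backpropagation through time
    # cannot produce unbounded (exploding) updates.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```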