Abstract

Accurate prediction of financial market trends can have a great impact on maximizing profits and avoiding risks. Conventional methods, such as regression or support vector regression (SVR), as well as end-to-end training approaches, commonly referred to as deep learning algorithms, are limited because they capture noisy and irrelevant data. Financial market data consist of correlated stock price time series, and each time series has both global and local dynamics. Inspired by recent advances in disentangled representation learning, in this paper we present a promising model for financial market prediction that learns disentangled representations of features and eliminates those features that cause interference. Our model uses the informer encoder to extract features, captures global–local patterns in the time and frequency domains, augments the clean features with time- and frequency-based augmentations, and uses the decoder to predict. More specifically, we adopt contrastive learning in the time and frequency domains to learn the global and local patterns. We argue that our methodology, which disentangles and learns the influential factors, holds the potential for more accurate predictions and a better understanding of how time series move and behave. We conducted experiments on the S&P 500, CSI 300, Hang Seng, and Nikkei 225 stock market datasets to predict their next-day closing prices. The results show that our model outperforms existing methods in terms of prediction error (mean squared error and mean absolute error), financial risk measurement (volatility and max drawdown), and prediction net curves, which suggests that it may enhance traders’ profits.

1. Introduction

Financial markets are a significant element of economic systems, and predicting them is both a critical ingredient and a challenging problem for market traders and scholars. Many methods, including technical and fundamental analysis [1, 2], have been employed to study the historical behavior of financial markets. Sound financial decision-making is a massive challenge since it depends on accurate prediction, transparency, and trust [3]. Given how quickly machine learning (ML) and, in particular, deep learning (DL) techniques are growing, studying and understanding stock market movements has become an attractive task for ML experts. Indeed, DL methods combined with financial time series prediction are among the most attractive research topics because of the complex nature of financial markets [4].

In recent literature, recurrent, convolutional, and hybrid models have proven to be outstanding frameworks for financial market prediction. However, they are not free of limitations, including ignoring the long-sequence terms of time series, which may cause valuable information to be missed [5], vanishing and exploding gradients [3], high time complexity [6], weak parallelism, lack of explainability, and so on. In order to connect various data points of a sequence and build a representation of them, self-attention, sometimes called intra-attention, was introduced to eliminate the shortcomings of earlier frameworks [7]. Multihead attention, the innovation of the transformer [7], brings several strengths to the table, such as capturing global dependencies, efficient parallelism [8], and context-aware representations [9], making it a valuable tool in a range of applications, like text classification [10] and time series prediction [11]. Researchers should be aware of the limitations of the self-attention mechanism used in transformers [7]: its core operation, the scaled dot product, has quadratic computational complexity, and inference is slow. Several endeavors have been undertaken to get around this limitation by studying the correlation between the key and the query, which are the fundamental constituents of the attention value. As a case in point, Child et al. [12] suggested that the self-attention probability may display potential sparsity, whereas Tsai et al. [13] showed that the softmax function can be represented as a probability distribution. One method for addressing these limitations is ProbSparse self-attention, introduced in the informer [5], which employs the concept of probability through the Kullback–Leibler divergence to measure query sparsity. The informer has demonstrated its effectiveness in different fields, including time series prediction [14], heating load prediction [15], and wind power prediction [16].

These methods rely on end-to-end training of models using observable data. As a result, they often pick up unwanted or noisy data, propagate errors, and are difficult to interpret. In contrast, disentanglement learning [17] is a relatively new concept that aims to improve explainability and interpretability by capturing and isolating the main beneficial factors that make up the data. Learning disentangled representations has been used in a range of studies with impressive results, including developing new fashion designs by disentangling image features [18], video prediction [19], and medical imaging [20].

Time series are typically modeled through two fundamental methods: (1) in the time domain, which captures temporal correlations between individual data points and helps identify patterns and trends, and (2) in the frequency domain, which extracts relevant data that display periodic or quasiperiodic patterns and provides the spectral content of a time series [21, 22].

Each financial market movement is composed of specific movements of the stocks, called local movements, and changes related to the overall market, called global movements [17]. These movements can be helpful in providing insights for making decisions about the models and providing answers to people who demand explanations for specific decisions. Capturing and separating the underlying factors that explain the data can provide advantages such as reducing sample complexity, offering interpretation potential, and overcoming some DL challenges like the black-box nature of such algorithms. For this reason, learning disentangled global–local representations, which are more valuable for financial market prediction, is the aim of this work.

Unsupervised disentanglement learning is challenging, and that is why contrastive learning, as a promising approach to self-supervised learning, is used to enhance the results by setting similar samples close to each other while dissimilar ones are pushed far apart [23, 24].

Financial markets may have a scarcity of labeled data, which is essential for a good DL model, and data augmentation is a useful approach for improving both the quantity and the quality of the training data [22] by applying transformations or perturbations. Conventional augmenting methods, like scaling and shifting, may result in a mismatch between the augmented data and the target [24].

In this paper, we propose GLAD (Global–Local Approach; Disentanglement Learning for Financial Market Prediction), a prediction model that disentangles financial market movements into global and local patterns. Our framework makes use of an informer module to capture the temporal feature relationships within historical data. Afterward, the time and frequency domains are used to capture the global–local representation, i.e., (1) a mixture of autoregressive experts is used to extract global representations, and (2) a discrete Fourier transform (DFT) is applied to represent local features. We apply (1) time-based augmentation to the global representation and (2) frequency-based augmentation to the local representation to augment the extracted features. Contrastive learning, inspired by Woo et al. [25], is used to train the global and local representations. We evaluate on prominent stock market indices, namely the S&P 500, Hang Seng, Shanghai–Shenzhen 300 (CSI 300), and Nikkei 225. In all of the experiments, the accuracy metrics (mean squared error (MSE) and mean absolute error (MAE)), risk measurements, and prediction net curves show that our results are better than the state of the art.

To sum up, our significant contributions include the following:

(1) We propose GLAD, a novel approach for predicting financial market movements using global and local disentanglement with time–frequency contrastive learning.

(2) We address the black-box aspect of DL and offer interpretability of how the algorithm works, in line with explainable artificial intelligence (XAI), by separating the underlying factors.

(3) To tackle the problem of limited available data, we augment the extracted features in both the time and frequency domains.

(4) Our model can show traders the source of changes in the financial markets, which can increase profits and improve the reliability of people’s decision-making.

This paper’s outline is as follows: Section 2 discusses the literature relevant to our work. Section 3 provides an overview of the theoretical underpinnings of our model, and Section 4 formulates the problem. Our model’s framework is detailed in Section 5. Section 6 describes the experiments, including parameter settings and evaluation criteria. Section 7 concludes this work.

2. Related Work

Over recent years, interest in the financial markets has grown rapidly among academics and market participants, motivating the use of DL methods to provide more accurate solutions for financial market prediction [1]. Because of their inherent complexity, financial markets remain at the core of challenging problems. Prior work on applying DL to financial market analysis can be divided into two main categories:

(1) End-to-end learning: These techniques learn directly from observed data. They are proficient prediction tools, but they have notable limitations, such as learning unnecessary or noisy data. For stock closing price prediction, Lu et al. [26] integrated convolutional neural networks (CNN), bidirectional long short-term memory (BiLSTM), and an attention mechanism (AM) to propose a CNN-BiLSTM-AM architecture. In Cheung et al.’s [27] study, a 3D-CNN was used to find a more comprehensive set of factors that may affect crop output and price variations. Chen and Huang [28] used CNN and long short-term memory (LSTM) models with eight different input features to increase the accuracy rate and concluded that the proposed method can improve prediction accuracy significantly. By converting technical indicators into 2D images and analyzing them with a CNN-LSTM-ResNet architecture, Khodaee et al. [29] predicted stock market turning points. In more recent work, Wang et al. [3] used a transformer model for forecasting stock market indices. Furthermore, in order to capture the temporal dependency of financial data, Zhang et al. [30] used features from five consecutive calendar days in a transformer architecture and reported favorable results. Some scholars have also explored sentiment analysis for financial market prediction; e.g., Köksal and Özgür [31] predicted market trends by analyzing social media comments and news. Feature engineering has also received considerable attention; for example, the authors of [2] applied the discrete wavelet transform to decompose financial time series into approximation and detail coefficients and used chicken swarm optimization to determine the most suitable subset of features.

(2) Disentanglement methods: Relying on the premise that observed data arise from the interaction of various sources, these techniques prioritize capturing the essential factors and diverse explanatory sources from the observed data and isolating them from each other. Hadad et al. [32] used an encoder–decoder to map financial market data to its specified and unspecified components. Chen and Huang [17] focused on disentangling the excess and market returns of stocks and showed that this approach improved the prediction outcome. By disentangling news into positive and negative sentiment, Costola et al. [33] investigated the impact of news on financial markets. Using the generalized autoregressive conditional heteroscedasticity (GARCH-MIDAS) model, the study in [34] found that oil supply shocks and oil consumption demand shocks had a comparable effect on stock market volatility in Nigeria and South Africa.

Even though generally sound conclusions have been drawn from these prior efforts, limitations such as capturing noisy data or incorrect correlations, as well as constraints on model capacity, have been pointed out. More recently, self-attention mechanisms have shown exceptional ability in modeling the complex dependencies of time-series data. Financial market data usually contain temporal correlations [35]; while end-to-end learning can model these correlations, it does not provide interpretable predictions, as disentanglement learning does. This work explicitly addresses these limitations and builds on the strengths of prior approaches to achieve better prediction results as well as interpretability, which is crucial for many downstream tasks [36]. This effort is based on the promising progress of disentanglement methods and related ideas.

3. Global–Local Disentanglement and Its Interpretation

Financial time series are intricate, noisy, and frequently show significant correlation, which means that a variable’s past values influence its current value, along with the impact of other stocks’ movements. As a result, it is essential to understand the main factors that generate the observed data for analysis and prediction purposes. Our work is built on the following three theoretical pillars:

(1) Data from financial markets are highly correlated [37]; multiple stock market factors are cointegrated, which leads to rich but complicated observed data. Disentangling these factors extracts meaningful explanatory sources and has a significant impact on both the interpretability and the predictive power of financial market prediction models [36–38].

(2) Data from financial markets consist of both clean features and noise [28]. We can make the best predictions if we find features that accurately describe both local and global patterns [17].

(3) Since each stock’s feature represents one dimension in a financial market dataset, the prediction task is a high-dimensional problem, which is in general a great challenge [39]. To improve prediction rates, it is necessary to take into account both global and local representations [39], as well as the benefits for traders.

4. Problem Formulation

Financial market data consist of data generated from market behavior or overall patterns, referred to as the global representation, and specific stock movements, called local dynamics [17]. Suppose $X$ is the price record of a group of stocks in a market, i.e., $X = \{x_1, x_2, \dots, x_T\} \in \mathbb{R}^{T \times d}$, where $T$ is the number of time points in a stock series, $x_t \in \mathbb{R}^{d}$ is the observation at time step $t$, and $d$ is the number of input variables. The ultimate objective of a prediction model is to infer the output $\hat{Y}$, where $\hat{Y}$ represents the future values based on the input time series.

4.1. Encoder–Decoder Models

In several prevalent models, the observed data are encoded into hidden representations, from which the output representations are inferred. The precision of the results depends on how well the necessary data and their interdependencies are captured.

4.2. Disentangled Representation

These models aim to capture the underlying factors that generate the raw data. Each time series has both global and local representations. In other words, stock market prices can be decomposed into two separate parts as follows:

(1) Local representations are the specific movements of a stock that reflect the behavior of the stock itself.

(2) The global representation (X-Global) is the behavior of the market, consisting of the overall stock movement and the value shared with other stocks in the same market, which is sometimes called the trend.

Our objective is to improve financial market prediction and help traders figure out the source of variation in stock movements. Separating financial market data into global and local patterns with effective dependency capture can accomplish this.

5. Methodology

In light of the above, this paper proposes a prediction framework that disentangles financial market movements into global and local representations, called GLAD: Global–Local Approach; Disentanglement Learning for Financial Market Prediction. Instead of end-to-end learning from observed data, this work aims to capture and learn useful features from the observed data. Please refer to Figure 1 for an overview of our approach, in which an informer encoder–decoder [5] is the backbone of the model.

5.1. Encoder: Extraction of Sequential Input Dependencies

First, the input is fed to an informer encoder, which learns the mapping between the long-term relationships of the sequential input. To this end, the input sequence is shaped into a matrix $X_{en} \in \mathbb{R}^{L_{x} \times d_{model}}$, where $d_{model}$ represents the model dimension. For the embedding layer, we use a time feature to capture the ordering of our stock prices over time, and we use sine and cosine functions to encode data position. The encoder layers receive the sum of the positional encoding and the embedded input. The self-attention mechanism of Vaswani et al. [7] is based on the tuple $(Q, K, V)$ for each row, which represents the query, key, and value, respectively.
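As an illustration, the following is a minimal PyTorch sketch of the sinusoidal positional encoding described above; the function name, tensor shapes, and the way the encoding is combined with the embedded prices are illustrative assumptions rather than the exact implementation.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard sine/cosine positional encoding [7]; returns a (seq_len, d_model) tensor."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)        # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))                    # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cosine
    return pe

# Illustrative use: the embedded prices plus the positional encoding form the encoder input.
# x_embedded: (batch, seq_len, d_model) produced by the value/time-feature embedding.
# encoder_input = x_embedded + sinusoidal_positional_encoding(seq_len, d_model)
```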

In the standard transformer [7], all elements of the input sequence attend to each other, resulting in a dense attention matrix, which makes computing the attention scores for all pairs of elements prohibitively expensive. Instead of the scaled dot product view, the kernel-based attention study [13] suggests viewing attention mechanisms as implicitly defining a kernel, or similarity measure, between elements of the input sequence: the attention mechanism can be seen as implicitly computing a kernel matrix that quantifies the similarity or relevance between pairs of elements. At the same time, other works, such as [12], indicated that self-attention exhibits sparse patterns and proposed “selective” counting techniques over the query–key pairs to determine which elements are attended to and which are ignored.

Building on the findings of Tsai et al. [13] and Child et al. [12], the informer [5], as utilized in our model, proposed ProbSparse attention. ProbSparse attention determines which queries are “important” by comparing the probability of the key–query pair with a uniform distribution through the Kullback–Leibler divergence. Multihead attention addresses the issue of lost information by enabling each head to produce distinct sparse query–key pairs. ProbSparse attention allows the model to focus on the most important elements and capture long-range dependencies without attending to all pairwise combinations. After the ProbSparse self-attention mechanism, a 1D convolutional filter is applied along the temporal dimension, together with an activation function, to capture better features and avoid redundant combinations of the value V. Following that, a max-pooling layer with a stride of 2 is added.
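The distilling step described above can be sketched as follows. This is a simplified PyTorch illustration (the ProbSparse query selection itself is omitted), and the normalization and padding choices are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    """Conv1d over the temporal dimension + activation + max-pooling with stride 2,
    in the spirit of the informer encoder's distilling operation [5]."""
    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)
        self.norm = nn.BatchNorm1d(d_model)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); Conv1d expects (batch, d_model, seq_len)
        x = self.conv(x.transpose(1, 2))
        x = self.act(self.norm(x))
        x = self.pool(x)                 # halves the temporal dimension
        return x.transpose(1, 2)         # back to (batch, seq_len // 2, d_model)
```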

5.2. Decoder: Dynamically Inferring the Output

Instead of inferring the output step by step from the hidden state, as most traditional decoders do, our decoder, as used in Zhou et al.’s [5] study, produces the output in one forward procedure by feeding the vector $X_{de} = \operatorname{Concat}(X_{token}, X_{0}) \in \mathbb{R}^{(L_{token}+L_{y}) \times d_{model}}$, where $X_{de}$ is the decoder input, $L_{token}$ is the start-token length of the decoder, $X_{token}$ is a slice of the input sequence, $L_{y}$ is the output length, $X_{0}$ is the target sequence padded with zeros, and Concat is the concatenation operator. In other words, the decoder’s input is Concat(start token of the decoder, zero padding of the target elements). After that, the attention weights are computed, and the output is inferred. The final layer is a fully connected layer, and the prediction type determines whether the result is univariate or multivariate.
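A minimal sketch of how the decoder input could be assembled from a start token and a zero-padded placeholder, assuming PyTorch tensors; the helper name and the interpretation of the decoder length in Section 6 as the start-token length are assumptions.

```python
import torch

def build_decoder_input(x_enc: torch.Tensor, token_len: int, pred_len: int) -> torch.Tensor:
    """Concat(start token, zero padding), in the spirit of the informer decoder [5].

    x_enc:     (batch, enc_len, n_features) encoder input window
    token_len: number of trailing encoder steps reused as the start token
    pred_len:  length of the horizon to predict (filled with zeros)
    """
    start_token = x_enc[:, -token_len:, :]                      # (batch, token_len, n_features)
    placeholder = torch.zeros(x_enc.size(0), pred_len, x_enc.size(2),
                              device=x_enc.device, dtype=x_enc.dtype)
    return torch.cat([start_token, placeholder], dim=1)         # (batch, token_len + pred_len, n_features)

# e.g., reading the settings of Section 6 as enc_len = 9, token_len = 2, pred_len = 1
```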

5.3. Global Feature Disentangler

The global feature disentangler is designed to capture the global representation. It receives the output of the informer encoder as input and passes it through a mixture of autoregressive experts, each consisting of a 1D causal convolution layer, since such layers can effectively capture the continuous representation within a time series [40]; the kernel size of the $i$th expert is $2^{i}$, which determines the receptive field of the convolutional operation. The causal nature of these layers makes them particularly suitable for this module, where temporal order and causality are important. Finally, an average-pooling operation, which retains important information and aggregates features, is used to obtain the final global representations.
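A simplified PyTorch sketch of the global feature disentangler described above; the number of experts is a placeholder, and the $2^{i}$ kernel sizes follow the mixture-of-experts design popularized by CoST [25], so the details should be read as illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvExpert(nn.Module):
    """1D causal convolution: left-pad so the output at time t sees only inputs <= t."""
    def __init__(self, d_model: int, kernel_size: int):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(d_model, d_model, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        out = self.conv(F.pad(x.transpose(1, 2), (self.pad, 0)))  # pad on the left only
        return out.transpose(1, 2)

class GlobalFeatureDisentangler(nn.Module):
    """Mixture of autoregressive experts (kernel size 2^i), averaged into a global representation."""
    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [CausalConvExpert(d_model, kernel_size=2 ** i) for i in range(n_experts)]
        )

    def forward(self, h_enc: torch.Tensor) -> torch.Tensor:
        # h_enc: encoder output, (batch, seq_len, d_model)
        outputs = torch.stack([expert(h_enc) for expert in self.experts], dim=0)
        return outputs.mean(dim=0)   # average-pool across experts -> global representation
```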

5.4. Time-Based Augmentation

We augment the global representation through time-based augmentation, which consists of scaling, shifting, and jittering, three typical augmentation methods. For scaling and shifting, we used $\tilde{x}_{t} = \alpha x_{t}$ and $\tilde{x}_{t} = x_{t} + \alpha$, respectively, while jittering was performed as $\tilde{x}_{t} = x_{t} + \epsilon_{t}$, where $\tilde{x}_{t}$ is the output of an augmentation method, $t$ is a time step, $\alpha$ is a sampled random scalar value, and $\epsilon_{t}$ is Gaussian noise drawn from the distribution $\mathcal{N}(0, \sigma^{2})$.
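The three augmentations can be sketched in NumPy as follows; the distributions of the random scalar $\alpha$ and the noise scales are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def scale(x: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Scaling: multiply the whole series by a single random scalar alpha."""
    alpha = rng.normal(loc=1.0, scale=sigma)   # assumed alpha ~ N(1, sigma^2)
    return alpha * x

def shift(x: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Shifting: add a single random scalar alpha to every time step."""
    alpha = rng.normal(loc=0.0, scale=sigma)
    return x + alpha

def jitter(x: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Jittering: add independent Gaussian noise at each time step."""
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)
```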

5.5. Time Domain Contrastive Loss

We used the dynamic dictionary concept of MoCo [41] for contrastive learning to learn discriminative global representations. MoCo leverages a momentum-based update mechanism and contrastive learning to train deep neural networks on unlabeled data: positive pairs are obtained from data augmentations, and a dynamic dictionary maintains a queue of negative pairs obtained by treating all other samples as negatives. Following Woo et al.’s [25] study, we use the loss function with similarity measured by the dot product:

$$\mathcal{L}_{time} = -\log \frac{\exp\left(q \cdot k^{+}/\tau\right)}{\sum_{i=0}^{K} \exp\left(q \cdot k_{i}/\tau\right)},$$

where $\tau$ is the temperature hyperparameter, $q$ is an encoded query, $k^{+}$ is its positive key, and $\{k_{0}, \dots, k_{K}\}$ is a set of encoded samples.
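A minimal PyTorch sketch of this MoCo-style InfoNCE loss; the normalization step and tensor shapes are assumptions, and the momentum update of the key encoder and queue maintenance are omitted.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q: torch.Tensor, k_pos: torch.Tensor, queue: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """MoCo-style contrastive loss [41]: one positive key per query, negatives from a queue.

    q:      (batch, dim)  encoded queries (global representations)
    k_pos:  (batch, dim)  encoded positive keys (augmented views from the momentum encoder)
    queue:  (K, dim)      dictionary of negative keys
    """
    q, k_pos = F.normalize(q, dim=1), F.normalize(k_pos, dim=1)
    queue = F.normalize(queue, dim=1)
    l_pos = torch.einsum("nd,nd->n", q, k_pos).unsqueeze(1)   # (batch, 1) positive logits
    l_neg = q @ queue.t()                                     # (batch, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)         # positive key sits at index 0
    return F.cross_entropy(logits, labels)
```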

5.6. Local Feature Disentangler

This section’s primary objective is to obtain a local representation of the data by using the DFT, in view of its ability to capture intrafrequency interactions [42] and detect periodic patterns. The DFT maps the time-domain representations into the frequency domain along the temporal dimension by converting a discrete sequence of N time-domain samples into N frequency-domain components. The resulting frequency components represent the amplitudes and phases of sinusoidal signals. The Fourier coefficients are processed by a learnable Fourier layer, realized as a per-element linear layer, and the inverse DFT transforms the representation back to the time domain. The output of this layer is a matrix that represents the local feature representation. Following Woo et al.’s [25] study, the $(i, k)$th element of the output is

$$\big(X^{loc}\big)_{i,k} = \mathcal{F}^{-1}\!\left(\sum_{j=1}^{d} A_{i,j,k}\, \mathcal{F}(\tilde{X})_{i,j} + B_{i,k}\right),$$

where $F$ is the number of frequencies, $d$ is the latent dimension, $A \in \mathbb{C}^{F \times d \times d_{loc}}$ and $B \in \mathbb{C}^{F \times d_{loc}}$ are the parameters, and $d_{loc}$ is the local dimension.
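A simplified PyTorch sketch of this local feature disentangler (DFT, per-element learnable Fourier layer, inverse DFT); the class name, parameter initialization, and dimension names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LocalFeatureDisentangler(nn.Module):
    """rFFT along time -> per-frequency complex linear map -> inverse rFFT, in the spirit of CoST [25]."""
    def __init__(self, seq_len: int, d_model: int, d_local: int):
        super().__init__()
        self.seq_len = seq_len
        n_freq = seq_len // 2 + 1                                 # number of rFFT frequency bins
        # Complex parameters: one linear map per frequency bin.
        self.A = nn.Parameter(torch.randn(n_freq, d_model, d_local, dtype=torch.cfloat) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_freq, d_local, dtype=torch.cfloat))

    def forward(self, h_enc: torch.Tensor) -> torch.Tensor:
        # h_enc: (batch, seq_len, d_model)
        x_f = torch.fft.rfft(h_enc, dim=1)                        # (batch, n_freq, d_model), complex
        y_f = torch.einsum("bfd,fdo->bfo", x_f, self.A) + self.B  # per-frequency linear map
        return torch.fft.irfft(y_f, n=self.seq_len, dim=1)        # (batch, seq_len, d_local), time domain
```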

5.7. Frequency-Based Augmentation

We adopted the idea of frequency-based augmentation from Zhang et al.’s [24] study (https://github.com/mims-harvard/TFC-pretraining.git), whereby frequencies are added or removed based on the frequency characteristics to generate frequency-based representations. First, we randomly choose a small number of frequency components (amplitude and phase). To remove a frequency component, we reduce its amplitude to zero; to add one, we increase its amplitude to $\lambda \cdot A_{max}$, where $A_{max}$ is the maximum frequency amplitude and $\lambda$ is a predefined constant.
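A NumPy sketch of this add/remove frequency augmentation for a single 1D series; the number of perturbed components and the constant `lam` are placeholder values, not the settings used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def frequency_augment(x: np.ndarray, n_components: int = 3, lam: float = 0.5) -> np.ndarray:
    """Randomly zero out some frequency amplitudes (removing) and raise others to
    lam * max amplitude (adding), then transform back to the time domain."""
    spec = np.fft.rfft(x)                                   # complex spectrum of the series
    amp, phase = np.abs(spec), np.angle(spec)
    idx = rng.choice(len(spec), size=2 * n_components, replace=False)
    remove_idx, add_idx = idx[:n_components], idx[n_components:]
    amp[remove_idx] = 0.0                                   # removing: amplitude -> 0
    amp[add_idx] = lam * amp.max()                          # adding: amplitude -> lam * A_max
    return np.fft.irfft(amp * np.exp(1j * phase), n=len(x))
```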

5.8. Frequency-Domain Contrastive Loss

To discriminate between different local patterns, we apply the loss function previously employed in Woo et al.’s [25] study:

$$\mathcal{L}_{freq} = \frac{1}{FN}\sum_{i=1}^{F}\sum_{j=1}^{N} -\log \frac{\exp\!\big(F_{i,:}^{(j)} \cdot (F')_{i,:}^{(j)}\big)}{\sum_{k \neq j}\exp\!\big(F_{i,:}^{(j)} \cdot F_{i,:}^{(k)}\big)},$$

where $F$ is the number of frequencies, $N$ is the batch size, and $F_{i,:}^{(j)}$ and $(F')_{i,:}^{(j)}$ are the frequency components of the $j$th sample and their augmentations, respectively.
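A simplified PyTorch sketch of a per-frequency contrastive loss in the spirit of the equation above; treating each sample’s augmented spectrum as its positive and the other samples in the batch as negatives is an illustrative simplification, and the shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def frequency_contrastive_loss(F_x: torch.Tensor, F_aug: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Per-frequency contrastive loss: at each frequency bin, a sample's augmented view is
    its positive and the other samples in the batch act as negatives.

    F_x, F_aug: (batch, n_freq, dim) real-valued frequency features (e.g., amplitudes).
    """
    F_x = F.normalize(F_x, dim=-1)
    F_aug = F.normalize(F_aug, dim=-1)
    batch, n_freq, _ = F_x.shape
    loss = F_x.new_zeros(())
    for i in range(n_freq):
        logits = F_x[:, i, :] @ F_aug[:, i, :].t() / temperature  # (batch, batch) similarities
        labels = torch.arange(batch)                              # positive is the matching index
        loss = loss + F.cross_entropy(logits, labels)
    return loss / n_freq
```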

6. Experiments and Discussion

In the sections that follow, we discuss the results of our in-depth empirical study of the model and how it compares to other methods of predicting financial markets.

6.1. Datasets

We conducted extensive experiments on four financial market indices, i.e., the S&P 500, the Nikkei 225, the CSI 300, and the Hang Seng Index (HSI), to demonstrate the predictive ability of our model. We model these daily indices over the period from January 1, 2010 to December 31, 2020.

6.1.1. Features Setting

We set the “close” feature as the target value for our prediction while the input data consists of two scenarios: (1) univariate input, which was the “close,” and (2) multivariate features, including “close, open, high, low, adj close, and volume”.

6.1.2. Data Processing

The raw data for each feature are a 1D time series; to achieve good data quality, we scale the features to unit variance and zero mean to decrease volatility.

6.1.3. Data Setup

We use the time feature with a fixed-size rolling window to ensure that the values are taken at equal intervals. For the input data, we set the input length of the encoder to 9 and that of the decoder to 2 to predict the next day’s price.
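A minimal NumPy sketch of the fixed-size rolling window described above; the function name and the handling of the decoder start token are illustrative.

```python
import numpy as np

def make_windows(series: np.ndarray, enc_len: int = 9, pred_len: int = 1):
    """Fixed-size rolling windows: each encoder input covers `enc_len` days and the target
    is the following `pred_len` day(s)."""
    X, y = [], []
    for start in range(len(series) - enc_len - pred_len + 1):
        X.append(series[start : start + enc_len])                        # encoder window
        y.append(series[start + enc_len : start + enc_len + pred_len])   # next-day target
    return np.stack(X), np.stack(y)

# With the settings of this section: 9-day encoder windows, next-day close as the target;
# the decoder reuses the last 2 days of each window as its start token (see Section 5.2).
```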

6.2. Experimental Details

The basic information about components and setups is summarized in the following sections.

6.2.1. Metrics

MSE and MAE on each prediction window (averaged in the multivariate case) were used to evaluate this work, with the dataset split 70/30 between training and testing, as shown in the following equations:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}, \qquad \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i} - \hat{y}_{i}\right|,$$

where $y_{i}$, $\hat{y}_{i}$, and $n$ are the actual value, the predicted value, and the sample size, respectively.

6.2.2. Risks Measurement

In addition to accuracy, it is imperative to evaluate pertinent facets of the trading process, such as the risks related to stock returns, which may be calculated using both the real values $y$ and the predicted values $\hat{y}$. The return at time $t+1$ may be written as [3]:

$$r_{t+1} = \operatorname{sign}\!\left(\hat{y}_{t+1} - y_{t}\right) \cdot \frac{y_{t+1} - y_{t}}{y_{t}},$$

where $\operatorname{sign}(\cdot)$ is the sign function. In this work, two risk-related concepts are utilized as follows:

(1) Volatility: a statistical indicator of how much stock returns have varied over time [3, 43], which can be expressed as

$$\text{Volatility} = \sqrt{\frac{1}{n-1}\sum_{t=1}^{n}\left(r_{t} - \bar{r}\right)^{2}}.$$

(2) Max drawdown: a measure of the most adverse potential outcome that may arise throughout a trade [3, 44], which may be written as

$$\text{MDD} = \max_{t_{1} < t_{2}} \frac{NV(t_{1}) - NV(t_{2})}{NV(t_{1})},$$

where $NV(\cdot)$ represents the total return (net value).
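The risk measures can be computed as sketched below in NumPy; the return definition follows the reconstruction above and should be treated as an assumption about the formulation in [3].

```python
import numpy as np

def strategy_returns(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Return at t+1: trade in the predicted direction and earn the realized relative change."""
    direction = np.sign(y_pred[1:] - y_true[:-1])          # predicted up/down move
    realized = (y_true[1:] - y_true[:-1]) / y_true[:-1]    # actual relative change
    return direction * realized

def volatility(returns: np.ndarray) -> float:
    """Standard deviation of the returns over the trading period."""
    return float(np.std(returns, ddof=1))

def max_drawdown(returns: np.ndarray) -> float:
    """Largest peak-to-trough decline of the cumulative net value NV(.)."""
    net_value = np.cumprod(1.0 + returns)                  # NV(t): total return curve
    running_peak = np.maximum.accumulate(net_value)
    return float(np.max((running_peak - net_value) / running_peak))
```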

6.2.3. Environment Configurations

The experimental environment and settings are described in Table 1.

6.2.4. Hyperparameter Tuning

For our model, the backbone encoder is an informer [5]. We used the Adam optimizer with a learning rate decayed from its initial value, and set the batch size to 32, the temperature to 0.07, and the momentum to 0.999, together with the parameters controlling the frequency-based augmentation. The number of attention heads is 8. The encoder contains a three-layer stack, while the decoder consists of two layers. The kernel width of the distillation convolution is 3.

6.3. Baselines

We benchmarked our model against state-of-the-art approaches to demonstrate how well it performs. The CoST and informer results are based on our replication with dataset modifications for day, week, and year, while the transformer results are reported from the original paper. The details are as follows:

(1) End-to-end learning methods (transformer and informer): These methods are based on self-attention mechanisms and end-to-end training.
(a) Transformer [3]: This work predicts univariate stocks using a transformer encoder–decoder architecture.
(b) Informer [5]: This model was used to predict the ETT (https://github.com/zhouhaoyi/ETDataset.git), ECL (https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014), and weather (https://www.ncei.noaa.gov/data/local-climatological-data/) datasets, and we used the open-source implementation (https://github.com/zhouhaoyi/Informer2020.git) as is.

(2) Disentanglement methods: The main concept behind these techniques is to capture the underlying components of the observed data in the form of clean features and to learn the model from these features.
(a) CoST [25]: This model (https://github.com/salesforce/CoST.git) was used to predict ETT, electricity, and weather data by using TS2Vec [45] as the backbone, capturing seasonal–trend patterns with time–frequency analysis, and learning them with a contrastive loss.
(b) GLAD_a: Uses the informer [5] as the backbone and captures the global–local patterns using time–frequency domain characteristics.
(c) GLAD_b: Uses the transformer [7] as the backbone and captures the global–local patterns using time–frequency domain features.
(d) GLAD_c: Uses the structure of GLAD_b and improves the results with contrastive learning.

6.4. Interpretability and Explainability of Our Model

There is a fine line between the concepts of interpretability and explainability. Despite their importance, they do not receive the same level of research attention in time series prediction applications as they do in other fields, like computer vision. Our model attempts to attain both, and a brief discussion of each is provided below.

6.4.1. Interpretability: Understanding a Cause with an Effect

The interpretability principle, which is essential for downstream tasks, concerns how well decision-makers, such as researchers or traders, can understand why a decision was made [46]. To build an interpretable model, the most important representations must be extracted, and the model must then be trained to learn them [47]. The underlying factors of financial market data, as with other time series, may be represented in the time and frequency domains. In prior works, such as [17], the raw time series were separated into their factors in the original input domain, meaning that interpretability was based on the time domain. CoST [25] performed the separation in the latent space rather than on the raw data, utilizing the time–frequency domain, whereas its augmentations were conducted only in the time domain. Our method performs the separation in the latent space, not on the raw data, using the time–frequency domains, and augments the clean features, i.e., augmentations are applied in both the time and frequency domains.

6.4.2. Explainability: Providing Meaningful Explanations for the Model’s Decisions

One problem with traditional DL models is that we cannot see inside them to find out what they have learned. As a result of this black box, even the people who created a model cannot explain why a certain result was obtained. A model is considered explainable if its learned content is understandable, and decomposable models, such as the disentanglement approach, are one way to achieve explainability. A multistep model, a good latent representation of the data inside the model, and extraction of the key elements of this representation together provide an explanation of the model’s content [48]; our model includes these elements and thereby tries to overcome the black-box nature of DL.

6.5. Results and Analysis

Our experimental results on the four datasets are shown in Tables 2 and 3. The best results are highlighted in boldface, while the second-best results are enclosed in brackets. Table 2 summarizes GLAD’s results and the top-performing baselines for the univariate setting, while Table 3 reports the results of the multivariate setting of GLAD. Figures 2 and 3 present the fitted curves, on the training and testing sets, generated by GLAD and other models for the four main stock market indices. Figure 4 presents a sample of the fitted curves (40 days) on the testing set, where GLAD is closer to the real data. The predicted values are close to the real data in both the training and testing sets. In addition to evaluating the accuracy of the model, we employed max drawdown and volatility, two commonly used financial market risk indicators, to appraise its performance. Table 4 presents the outcomes for volatility and max drawdown, indicating that GLAD has competitive performance in these measures.

6.5.1. Ablation Study

To evaluate the efficacy of each GLAD module, we designed two main approaches: (a) end-to-end learning that uses raw data, whether via the transformer [3] or the informer [5], and (b) representation learning under two scenarios: (1) disentangling the output features from the encoder into global–local representations and (2) adding contrastive learning. The results demonstrate that all of GLAD’s components are indispensable, as may be seen in Table 2 and Figure 2. In addition, we observed the following: (1) the self-attention mechanisms, transformer and informer, showed results close to CoST, with the informer having the edge; (2) the global–local disentanglement models outperformed the transformer and informer as examples of end-to-end learning; (3) the informer’s performance is superior to that of the transformer; and (4) contrastive learning improves performance over the baselines.

7. Conclusion and Future Work

Our findings indicate that, for stock market prediction, disentanglement is more productive than conventional end-to-end methods in terms of prediction error (7.21% improvement in MSE and 4.53% improvement in MAE) and net value analysis, along with financial risk measurement. We grounded our work in the theoretical nature of financial market movements and verified it experimentally, showing that our model outperforms state-of-the-art approaches. For financial market prediction, we introduced GLAD, a framework that disentangles global and local representations. Augmenting financial market data is challenging because of the timestamps, which may create a mismatch between the augmented data (generated by methods such as shifting and scaling) and the target. In this paper, we were inspired by the idea of Zhang et al. [24], but we augmented the extracted features in (1) the time domain, where we adopt shifting, scaling, and jittering on the global representation, and (2) the frequency domain, in which we add or remove frequencies on the local representation. Empirical results demonstrated that contrastive learning can improve both the learned representations and the prediction model. Our results make it easier for real-world users to understand what is going on by showing them where the variance and influencing factors come from. In future research, we will investigate (1) whether this model can predict new stocks to address the data scarcity challenge by generating local parameters and (2) whether our model can be extended to other time-series datasets.

Data Availability

Data were taken from public sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.