A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection
Scientific Programming provides a forum for research results in, and practical experience with, software engineering environments, tools, languages, and models of computation aimed specifically at supporting scientific and engineering computing.
Chief Editor, Professor Tramontana, is based at the University of Catania and his research primarily concerns the areas of software engineering and distributed systems.
Latest Articles
Incorporating Research Reports and Market Sentiment for Stock Excess Return Prediction: A Case of Mainland China
The prediction of stock excess returns is an important research topic in quantitative trading, and stock price prediction based on machine learning is receiving increasing attention. This article takes data on Chinese A-shares from July 2014 to September 2017 as the research object and proposes a method for forecasting stock excess returns that combines research reports and investor sentiment. The proposed method measures the research reports that analysts release on individual stocks, separates them into two indicators (research report attention and rating sentiment), calculates investor sentiment from external market factors, and uses an LSTM model to represent the time series characteristics of stocks. The results show that (1) measured by accuracy and F1, the proposed algorithm outperforms the benchmark algorithm; (2) the deep learning LSTM algorithm performs better than the traditional machine learning algorithm SVM; (3) using investor sentiment as the initial hidden state of the model improves the accuracy of the algorithm; and (4) using the split research report attention, together with the two indicators of investor sentiment and price, as model input effectively improves the performance of the model.
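The two evaluation indicators named in this abstract, accuracy and F1, can be computed as in the minimal sketch below. It assumes a binary labelling (1 for a positive excess return, 0 otherwise), which the abstract does not specify; the function name `accuracy_and_f1` is hypothetical and not taken from the article.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, with 1 as the positive class.

    Assumption: labels are 0/1; the article's actual label scheme is not given.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1
```

Reporting both values is useful when the classes are imbalanced, because accuracy alone can look high for a model that rarely predicts the positive class.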
Evaluating Encryption Algorithms for Sensitive Data Using Different Storage Devices
Sensitive data need to be protected from being stolen and read by unauthorized persons, regardless of whether the data are stored on hard drives, flash memory, laptops, desktops, or other storage devices. In an enterprise environment where sensitive data, such as financial or military data, are stored on storage devices, encryption is used in the storage device to ensure data confidentiality. Nowadays, SSD-based NAND storage devices are favored over HDD and SSHD for storing data because they offer increased performance and reduced access latency to the client. In this paper, the performance of different symmetric encryption algorithms is evaluated on HDD, SSHD, and SSD-based NAND MLC flash memory using two different storage encryption software packages. Based on the experiments we carried out, the Advanced Encryption Standard (AES) algorithm on HDD outperforms the Serpent and Twofish algorithms in terms of random read speed and write speed (both sequential and random), whereas the Twofish algorithm is slightly faster than AES in sequential reading on SSHD and SSD-based NAND MLC flash memory. By conducting a full range of evaluative tests across HDD, SSHD, and SSD, our experimental results can give storage consumers a better idea of which kind of storage device and encryption algorithm is suitable for their purposes. This will give them an opportunity to achieve the best performance from the storage device while securing their sensitive data.
Flow Enhancement of Mineral Pastes to Increase Water Recovery in Tailings: A Matlab-Based Imaging Processing Tool
The growth rate of the copper mining industry in Chile requires higher water consumption, yet water is a resource limited in quality and quantity and a major point of concern at present. In addition, the efficient use of water is restricted by high levels of evaporation (10 to 15 l/m² per day), in particular at the northern highland mining sites of Chile. Moreover, the final disposal of tailings is mainly in ponds, which lose water by evaporation and in some cases by percolation. An alternative is the paste thickener, which generates a stable paste (70% solids), reducing evaporation and percolation and therefore reducing water make-up. Demand for water grows as industries expand, making water recovery processes a necessity rather than a simple efficiency upgrade. This technology was developed in Canada in the early 1980s and has been widely used in Australia (in arid zones with weather conditions similar to those in Chile), although few plants are using it. The tendency in the near future is to move from open ponds to paste thickeners; Minera El Tesoro is one example. This scenario requires developing technical capacity in both paste flow characterization and rheology modifiers (fluidity enhancers) to make the final disposal of this paste possible. In this context, a new technique is introduced and experimental results for fluidity modifiers are discussed. This study describes how water content affects the flow behavior and depositional geometry of tailings and silica flour pastes. The depositional angle is determined from flume tests, and the yield stress is determined from the slump test and a rheological model. Both techniques incorporate digital video and image analysis. The results indicate that the new technique can be used to determine the proper solid content and modifiers for a given fluidity requirement. In addition, the experimental results showed that pH strongly controls the fluid paste behavior.
Machine Learning Approach for Answer Detection in Discussion Forums: An Application of Big Data Analytics
Nowadays, data are flooding into online web forums, and it is highly desirable to turn this gigantic amount of data into actionable knowledge. Online web forums have become an integral part of the web and are a main source of knowledge. People use these platforms to post their questions and get answers from other forum members. Usually, an initial post (question) receives more than one reply post (answer), which makes it difficult for a user to scan all of them for the most relevant, high-quality answer. Thus, automatically extracting the most relevant answer to a question within a thread is an important issue. In this research, we treat the task of answer extraction as a classification problem. A reply post can be classified as relevant, partially relevant, or irrelevant to the initial post. To find the relevancy/similarity of a reply to the question, both lexical and nonlexical features are used. We propose to use LinearSVC, a variant of the support vector machine (SVM), for answer classification. Two feature selection techniques, chi-square and univariate, are employed to reduce the size of the feature space. The experimental results showed that the LinearSVC classifier outperformed the other state-of-the-art classifiers in terms of classification accuracy on both the Ubuntu and TripAdvisor (NYC) discussion forum datasets.
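As a rough illustration of chi-square feature selection, the sketch below scores each term by the chi-square statistic of its 2x2 contingency table (term present or absent against relevant or irrelevant) and keeps the top-scoring terms. It is a simplified assumption, not the article's pipeline: it uses only two classes rather than the three named in the abstract, treats each reply as a set of tokens, and the names `chi_square_score` and `top_k_terms` are hypothetical.

```python
from typing import List, Sequence, Set


def chi_square_score(docs: Sequence[Set[str]], labels: Sequence[int], term: str) -> float:
    """Chi-square statistic of the 2x2 table: term presence vs. binary label."""
    pairs = list(zip(docs, labels))
    n11 = sum(1 for doc, y in pairs if term in doc and y == 1)      # present, relevant
    n10 = sum(1 for doc, y in pairs if term in doc and y == 0)      # present, irrelevant
    n01 = sum(1 for doc, y in pairs if term not in doc and y == 1)  # absent, relevant
    n00 = sum(1 for doc, y in pairs if term not in doc and y == 0)  # absent, irrelevant
    n = n11 + n10 + n01 + n00
    denominator = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denominator == 0:
        return 0.0  # term (or class) never varies, so it carries no signal
    return n * (n11 * n00 - n10 * n01) ** 2 / denominator


def top_k_terms(docs: Sequence[Set[str]], labels: Sequence[int], k: int) -> List[str]:
    """Return the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocabulary = sorted(set().union(*docs))
    scores = {term: chi_square_score(docs, labels, term) for term in vocabulary}
    return sorted(vocabulary, key=lambda term: (-scores[term], term))[:k]


# Toy replies (hypothetical data): 1 = relevant to the question, 0 = irrelevant.
replies = [
    {"install", "driver", "sudo"},
    {"sudo", "apt", "driver"},
    {"thanks", "bump"},
    {"bump", "same", "problem"},
]
relevance = [1, 1, 0, 0]
selected = top_k_terms(replies, relevance, k=3)
```

The reduced vocabulary in `selected` would then serve as the feature space for a classifier such as a linear SVM.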
Addressing the Bike Repositioning Problem in Bike Sharing System: A Two-Stage Stochastic Programming Model
In this paper, a bike repositioning problem with stochastic demand is studied. The problem is formulated as a two-stage stochastic programming model to optimize the routing and loading/unloading decisions of the repositioning truck at each station and depot under stochastic demands. The goal of the model is to minimize the expected total sum of the transportation costs, the expected penalty costs at all stations, and the holding cost of the depot. A simulated annealing algorithm is developed to solve the model. Numerical experiments are conducted on a set of instances from 20 to 90 stations to demonstrate the effectiveness of the solution algorithm and the accuracy of the proposed two-stage stochastic model.
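The general shape of a simulated annealing search can be sketched as follows. This is a generic illustration on a toy routing problem (shortest closed tour over a few stations), not the article's algorithm: it ignores stochastic demand, loading/unloading decisions, and penalty and holding costs, and the cooling schedule, move operator, and all names and parameter values are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(order: Sequence[int], points: Sequence[Point]) -> float:
    """Length of the closed tour that visits the points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def simulated_annealing(
    points: Sequence[Point],
    start_temp: float = 10.0,
    cooling: float = 0.995,
    steps: int = 5000,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Search for a short tour; returns the best order found and its length."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    current = list(range(len(points)))
    current_cost = tour_length(current, points)
    best, best_cost = current[:], current_cost
    temp = start_temp
    for _ in range(steps):
        # Neighbor move: reverse a random segment of the tour (2-opt style).
        i, j = sorted(rng.sample(range(len(points)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        candidate_cost = tour_length(candidate, points)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


# Hypothetical station coordinates for illustration only.
stations = [(0, 0), (5, 1), (1, 4), (6, 5), (2, 2), (7, 0), (3, 6), (0, 5)]
best_order, best_cost = simulated_annealing(stations)
```

Accepting some worse moves early, while the temperature is high, lets the search escape local minima; as the temperature falls, the search behaves more and more like plain local improvement.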
Detection and Classification of Early Decay on Blueberry Based on Improved Deep Residual 3D Convolutional Neural Network in Hyperspectral Images
The automatic detection of decayed blueberries is still a challenge in the food industry. Early decay of blueberries occurs on the surface peel, which makes hyperspectral imaging a feasible way to detect the decayed regions of blueberries. An improved deep residual 3D convolutional neural network (3D-CNN) framework is proposed for hyperspectral image classification so as to realize fast training, classification, and parameter optimization. Rich spectral and spatial features can be rapidly extracted from samples of complete hyperspectral images using our proposed network. The framework adaptively incorporates the tree-structured Parzen estimator (TPE) to select the hyperparameters that optimize network performance. In addition, to address the problem of having few samples, this paper proposes a novel strategy for augmenting the hyperspectral image sample data, which can improve the training effect. Experimental results on the standard hyperspectral blueberry datasets show that the proposed framework improves classification accuracy compared with AlexNet and GoogleNet. In addition, our proposed network reduces the number of parameters by half and the training time by about 10%.