Abstract

Owing to recent advances in Internet and information technologies, massive quantities of archive data are generated that are difficult to handle with conventional techniques. Archive management is the field of management concerned with the maintenance and utilization of archives once they have been sent from the client to the repository. The drastic increase in the size of archive data necessitates effective storage schemes, which can be accomplished through data compression. In general, data compression techniques reduce the amount of data stored or transmitted by a system or network without compromising data quality. With this motivation, this study designs an effective archive storage system with a compression approach for network management (EASS-CANM). The major intention of the EASS-CANM technique is to archive textual and image data effectively in compact form so as to reduce the required storage space. The EASS-CANM technique involves a two-stage process: textual data compression and image compression. In the first stage, the neighborhood indexing sequence (NIS) with the Prediction by Partial Matching (PPM) technique is applied for textual data compression. In the second stage, fruit fly optimization (FFO) with a modified Haar wavelet (MHW) is used for effective image compression, where the optimal threshold is selected using the FFO technique. The Haar wavelet filtering process is modified (MHW) to preserve higher image quality and clarity; the resulting transformation enables improved compression outcomes along with better PSNR and CR values. To demonstrate the improved performance of the EASS-CANM approach, a series of simulations is performed on a benchmark dataset. The comprehensive comparative analysis shows that the EASS-CANM technique outperforms existing approaches across numerous evaluation criteria, so it can serve as an effective tool for archive management.

1. Introduction

File management and organization activities are done manually by archivists. When a massive number of files arrives simultaneously, operational efficiency decreases considerably [1]. In general, China's archive industry is still in a preliminary phase, and a gap remains in comparison to the archive management systems of developed nations. Regardless of whether the conventional dual-track mode or the dual-system mode is adopted, a massive number of electronic records will be generated [2]. The original document created by a computer during office processes is generally distinct from the document formed after the digitization of an archive. Such original electronic documents carry additional data compared to digitized documents and are far more amenable to processing: the former is a manifestation of the archive, while the latter is merely a copy of it. If these documents reflect original data that can be efficiently utilized, they will create value [3]. As the final destination of these electronic documents, the archive department accumulates a massive amount of original data, and organizing these complicated archives systematically becomes a major application challenge [4]. Data mining, which is frequently employed in business, relies on the application of complex mathematical algorithms as a tool; an associated notion is Knowledge Discovery in Databases (KDD). In the business world, big data mining is a technique for identifying and retrieving specific pieces of information from vast databases. Fortunately, data mining techniques are capable of processing such heterogeneous data, making interrelated file management activities more intelligent, decreasing the burden on archivists, resolving problems that might be encountered, and simultaneously allowing the public to receive better-quality service. Figure 1 illustrates the process of the archive system [5].

With the tremendous growth of the Internet and the advancement of science and technology [6], the amount of archive data has become too large to process with conventional data analysis techniques and tools. Ruddy et al. [7] note that when the information has heterogeneous features, even comparatively small datasets cannot be processed by conventional models. Data mining techniques came into existence under this scenario. They depend on the integration of big data processing algorithms with traditional data analysis, which provides the opportunity to examine the potential value contained in massive amounts of information. Many of these archives are utilized as "vouchers," and the ways of utilizing them are limited and relatively traditional [8]. Manual system utilization and retrieval still occupy the conventional position, yet they introduce errors that lead to wrong outcomes. To handle large amounts of digitized and born-digital content, computational methods and tools should consider the temporalities and interdependencies of distinct archival processes over the changing value and nature of open accession, the archival administrative system, and the dynamically developing archival collection [8]. Such applications also need to fulfill overarching archival processes imperative to meeting legal admissibility requirements, ensuring transparency, and supporting accountability for material. Compression techniques reduce the data size by exploiting the data structure [9]. Data compression algorithms fall into lossy and lossless methods: a lossy algorithm incurs some loss of data but usually ensures a high compression ratio, whereas a lossless algorithm guarantees data integrity throughout the compression/decompression process.

This paper studies an effective archive storage system with a compression approach for network management (EASS-CANM). The presented EASS-CANM approach comprises a two-stage procedure: textual data compression and image compression. In the first stage, the neighborhood indexing sequence (NIS) with the Prediction by Partial Matching (PPM) approach is applied to textual data compression. In the second stage, fruit fly optimization (FFO) with a modified Haar wavelet (MHW) is utilized for effectual image compression, where the optimal threshold selection is performed using the FFO technique. This modification supports compressing images while maintaining their purity and preserving fine details, yet keeping the compression ratio (CR) high. To examine the enhanced archival effectiveness of the presented EASS-CANM technique, a wide range of experiments were carried out on a benchmark dataset.

2. Literature Review

Lv and Shi [10] examined the development of universities' archive management in the data age. They employed experimental, literature review, object-oriented, and survey research methods for the analysis, investigated the present state of universities' archive management systems, summarized the problems present in those systems, and improved the archive management work of universities and colleges by means of a data management system. Rong [11] presents the theoretical concept of archive data resources and sharing based on a network cloud framework, then analyzes the significance of data resource sharing from the perspective of archive data management, and finally provides strategies for creating data resource sharing modes in archive management.

Israel [12] examined the attitudes and perceptions of registry staff members toward archive management at FUTA. A random sampling method was used to select fifty registry staff from different sections within the university. Odhiambo [13] evaluated the readiness of USIU-A for managing digital archives in order to propose a strategy for enhancing digital archive management at the organization. The research shows that the organization's readiness for digital archive management was not up to standard.

Park et al. [14] gathered articles on archive management published from 1997 to 2016 in four Korean journals associated with library and information science and two journals associated with archive management. They also traced the direction of archive management in Korea through a comprehensive review of record management research in the country. The gathered articles from the first and second halves of the two decades were subjected to a 5-year cycle analysis. The analysis showed that research on record information services, electronic records, and archival methods for different record types increased gradually.

Wang [15] developed a Hadoop cloud framework for functional modules and electronic archive data management. Tsvuura and Ngulube [16] investigated the digitization of records and archives at two state universities in Zimbabwe, which embarked on digitizing their records and archival resources in line with the technological trend of doing business online. They adopted a qualitative multicase study to offer a deeper understanding of the digitization of archives and records at the state universities, gathering information through purposive sampling via interviews.

3. The Proposed Model

In this study, a novel EASS-CANM technique has been developed for the compact storage of files in archives. The EASS-CANM technique is mainly intended to archive textual and image data effectively in compact form in order to reduce the required storage space. The proposed EASS-CANM technique first utilizes NIS with the PPM technique for the compression of textual data. Besides, the FFO-MHW technique is exploited for the compression of images, in which the optimal threshold value selection using the FFO algorithm helps improve the compression efficiency.

3.1. Textual Data Compression Using NIS with the PPM Technique

At the initial stage, the NIS technique is applied to generate an optimal, shortest codeword (CW) for the input textual data. The presented NIS approach is a character-encoding scheme that works on the principle of traversing the data based on ones and zeros. Of the two short CWs created by the one-based and zero-based traversals, the optimal CW is selected according to the minimal number of bits needed to store the CW of the corresponding character. For an input sequence of length $L$, the number of bits the NIS approach needs to store the compressed data is given by

$$N = \sum_{i=1}^{L} n_i + 8, \tag{1}$$

where $n_i$ denotes the number of bits in the CW of the $i$-th character and the additional eight bits are the control bits appended to the optimally reduced data. The mean number of bits needed to store a single character with the NIS approach is then estimated as

$$\bar{n} = \frac{N}{L}. \tag{2}$$

The lower the values of $N$ and $\bar{n}$, the higher the compression performance. Notably, equation (2) implies that the NIS approach needs at most four bits to store a character. The presented method first loads the input text, which may contain special symbols and alphanumeric characters [17]. Next, the ASCII value of each character is converted to its binary form, and the traversal of the data based on ones and zeros is performed. The zero-based and one-based traversals are algorithmically identical, except that the zero-based traversal searches for zeros in the binary digits whereas the one-based traversal searches for ones. Once the two CWs have been created, the method compares them and selects the CW with the minimal number of bits as the optimal CW. The optimal CW contains a minimum of one bit and a maximum of four bits. Finally, the optimal CW of each encoded character is concatenated with its control bit to form the compressed file.
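To make the two-traversal principle concrete, the following Python sketch builds two candidate codewords per character and keeps the cheaper one plus a control bit recording which traversal won. The position-index encoding used here is an illustrative assumption only; the exact NIS codeword construction is specified in [17].

def candidate_codewords(ch):
    # Toy stand-in for the NIS traversals: the zero-based traversal
    # records the positions of 0-bits and the one-based traversal the
    # positions of 1-bits in the 8-bit ASCII pattern (3 bits each).
    bits = format(ord(ch), "08b")
    zero_cw = "".join(format(i, "03b") for i, b in enumerate(bits) if b == "0")
    one_cw = "".join(format(i, "03b") for i, b in enumerate(bits) if b == "1")
    return zero_cw, one_cw

def nis_encode(text):
    # Pick the shorter candidate per character; the prefixed control
    # bit tells the decoder which traversal was used.
    out = []
    for ch in text:
        zero_cw, one_cw = candidate_codewords(ch)
        if len(zero_cw) <= len(one_cw):
            out.append("0" + zero_cw)  # control bit 0: zero-based traversal
        else:
            out.append("1" + one_cw)   # control bit 1: one-based traversal
    return "".join(out)

print(nis_encode("AB"))  # 'A' (01000001) costs 7 bits here instead of 8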

In order to further enhance the compression efficiency, the CWs are themselves compressed using the PPM technique. PPM is a sophisticated statistical data compression approach and is among the more effective lossless compression methods. The PPM method builds a PPM tree that represents a variable-order Markov model, in which the last $n$ characters define an $n$-order Markov model. PPM facilitates text compression, string sequence indexing, and prediction, although constructing a PPM tree from a large volume of data requires lengthy sequential processing in real-world settings. Prediction by partial matching is an adaptive statistical compression technique that uses context modelling and prediction to reduce the size of datasets: uncompressed symbol streams are fed into PPM models, which use the preceding characters in the stream to forecast the next symbol. PPM methods can also be used to arrange data into expected groupings when performing cluster analysis. Here, the PPM tree is used to represent a route, where each road segment has an individual identifier, i.e., an integer value [18].

(i) UM represents the uncompressed message, that is, the plain message that includes the symbols representing the road segments.

(ii) CM denotes the compressed message, that is, the compressed message derived from the plain message after applying the compression with the provided PPM tree.

At the same time, when a tested route comprises many symbols matching a specific PPM tree, the compression procedure rarely activates the esc character [19], producing a higher CR. Conversely, when a tested route is significantly distinct from the PPM tree, the esc character is activated frequently, which produces a lower CR.
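As a minimal illustration of the context-plus-escape mechanism described above, the following Python sketch maintains order-1 context counts and reserves probability mass for the esc event. It is a sketch only: real PPM variants (e.g., PPMC and PPMD) estimate escape probabilities differently and couple the model to an arithmetic coder, which is omitted here.

from collections import defaultdict

class SimplePPM:
    # Minimal order-1 PPM-style model with an escape symbol.
    def __init__(self):
        self.ctx = defaultdict(lambda: defaultdict(int))  # context -> symbol counts
        self.order0 = defaultdict(int)                    # fallback counts

    def predict(self, prev):
        # Return symbol -> probability for the order-1 context,
        # reserving one escape count for symbols unseen in this context.
        counts = self.ctx[prev]
        total = sum(counts.values()) + 1       # +1 for the escape event
        probs = {s: c / total for s, c in counts.items()}
        probs["<esc>"] = 1 / total             # mass routed to the order-0 model
        return probs

    def update(self, prev, sym):
        self.ctx[prev][sym] += 1
        self.order0[sym] += 1

model = SimplePPM()
prev = None
for sym in "abracadabra":
    model.update(prev, sym)
    prev = sym
print(model.predict("a"))  # after 'a': 'b', 'c', 'd' seen, plus '<esc>'

A route whose symbols keep hitting populated contexts is coded from the high-order predictions and compresses well; a route that keeps landing on the "<esc>" branch pays the escape cost repeatedly, which is exactly the lower-CR case described above.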

3.2. Image Compression Using the FFO-MHW Technique

During the image compression process, the FFO-MHW technique is executed to effectually compress the images. The MHW is an orthogonal wavelet transform computed by averaging and differencing the even and odd pixels of digital images, and it is applied in various forms to compress images by decomposing the image matrix into a sparser one [20]. We use MHW to preserve more image clarity and quality: it introduces a new transformation that attains better compression results than the traditional one, with greater CR and PSNR values. The data compression ratio, also known as compression power, measures how much a data representation shrinks when compressed and is expressed as the ratio of the uncompressed size to the compressed size. Thus, a compression ratio of 5 is often written as an explicit ratio, 5 : 1 (read "five to one"), or as an implicit ratio, 5/1, for a representation that reduces a file's storage requirement from 10 MB to 2 MB. For the transform, MHW divides the original image of dimension $N \times N$ into four $\frac{N}{2} \times \frac{N}{2}$ coefficient matrices as follows [21]:

$$I \longrightarrow \begin{bmatrix} A & V \\ H & D \end{bmatrix}, \tag{3}$$

where the approximation ($A$), horizontal ($H$), vertical ($V$), and diagonal ($D$) coefficient values are estimated, for each $2 \times 2$ block of pixels $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, by

$$A = \frac{a+b+c+d}{4}, \quad H = \frac{a+b-c-d}{4}, \quad V = \frac{a-b+c-d}{4}, \quad D = \frac{a-b-c+d}{4}. \tag{4}$$

For recreation, we regain $a$, $b$, $c$, and $d$ in the following:

$$a = A+H+V+D, \quad b = A+H-V-D, \quad c = A-H+V-D, \quad d = A-H-V+D. \tag{5}$$

In the modified transform, the detail coefficients of equation (4) are thresholded before encoding, so equation (4) is reformatted as

$$\hat{X} = \begin{cases} X, & |X| \geq T, \\ 0, & |X| < T, \end{cases} \qquad X \in \{H, V, D\}, \tag{6}$$

where $T$ is the threshold value. Similarly, the recreation formula is equation (5) evaluated with the thresholded coefficients $\hat{H}$, $\hat{V}$, and $\hat{D}$ in place of $H$, $V$, and $D$.
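As a minimal Python sketch (assuming NumPy), the following implements one level of the $2 \times 2$ averaging/differencing transform of equations (3)-(5), with the hard thresholding of equation (6) standing in for the FFO-tuned threshold $T$; the full MHW filtering of [20, 21] may differ in detail.

import numpy as np

def haar_decompose(img):
    # One level of the 2x2 averaging/differencing Haar transform.
    # img: 2-D array with even height and width.
    a = img[0::2, 0::2].astype(float)  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    A = (a + b + c + d) / 4            # approximation, equation (4)
    H = (a + b - c - d) / 4            # horizontal detail
    V = (a - b + c - d) / 4            # vertical detail
    D = (a - b - c + d) / 4            # diagonal detail
    return A, H, V, D

def threshold(coeff, T):
    # Equation (6): zero out detail coefficients smaller than T
    # (T would be tuned by FFO in the full EASS-CANM pipeline).
    return np.where(np.abs(coeff) >= T, coeff, 0.0)

def haar_reconstruct(A, H, V, D):
    # Invert the transform via equation (5).
    h2, w2 = A.shape
    img = np.empty((2 * h2, 2 * w2))
    img[0::2, 0::2] = A + H + V + D
    img[0::2, 1::2] = A + H - V - D
    img[1::2, 0::2] = A - H + V - D
    img[1::2, 1::2] = A - H - V + D
    return img

img = np.arange(16.0).reshape(4, 4)
A, H, V, D = haar_decompose(img)
rec = haar_reconstruct(A, threshold(H, 0.5), threshold(V, 0.5), threshold(D, 0.5))

With $T = 0$, haar_reconstruct recovers the input exactly, mirroring the lossless nature of equations (4)-(5); the loss, and hence the CR gain, comes entirely from the thresholded detail coefficients, which is why the choice of $T$ is worth optimizing.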

For the optimal selection of threshold values in the MHW technique, the FFO algorithm is employed. The FFO algorithm was proposed on the basis of the foraging behavior of Drosophila, which surpasses other species in visual sense and olfactory ability and can therefore fully exploit these instincts to discover food [22]. In particular, even at a distance of 40 km from a food source, the olfactory organs of FFs can pick up the various food scents distributed in the air. Upon approaching the food source, the FF locates the food and the flocking position of its companions using its sensitive visual organs and then flies in that direction. The data of the optimal FF are shared with the entire swarm in each iteration, and the following iteration depends only on the data of the preceding optimal FF. Figure 2 demonstrates the graphical representation of FFO. Based on the food search behavior of FFs, the FFO algorithm is divided into the following steps:

Step 1. Parameter initialization.
Initialize the parameters of FFO, namely, the maximal number of iterations, the population size, the random flight distance range, and the initial FF swarm position ($X_{\mathrm{axis}}$, $Y_{\mathrm{axis}}$).

Step 2. Population initialization.
Assign a random direction and distance for the food search of each individual FF $i$, $i = 1, \dots, n$, where $n$ denotes the population size.

Step 3. Population assessment.
First, estimate the distance of the food position of each FF to the origin ($\mathrm{Dist}_i$). Next, calculate the smell concentration (SC) judgment value ($S_i$), i.e., the reciprocal of the distance of the food position to the origin.

Step 4. Replacement.
Substitute the SC judgment value ($S_i$) into the SC judgment function (known as the fitness function) to find the SC ($\mathrm{Smell}_i$) of the individual position of each FF [23].

Step 5. Discover the maximum SC.
Identify the FF with the maximum SC and the corresponding position among the FF swarm.

Step 6. Retain the maximum SC.
Retain the maximal SC value and the corresponding $X$ and $Y$ coordinates; the swarm then flies to the position with the maximum SC.

Step 7. Iterative optimization.

Steps 2-6 are executed iteratively; the loop halts once the SC is no longer better than the SC of the preceding iteration or once the number of iterations reaches the maximal number of iterations.
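The steps above can be sketched in Python as follows. The mapping from the SC judgment value $s = 1/\mathrm{Dist}$ to a candidate wavelet threshold and its quality score is application-specific and not detailed in the text, so the fitness function here is a user-supplied assumption.

import math
import random

def ffo_optimize(fitness, pop_size=20, max_iter=100, flight_range=1.0):
    # Sketch of the FFO loop in steps 1-7. `fitness` maps the smell
    # concentration judgment value s = 1/Dist to a score to maximize;
    # in EASS-CANM, s would be decoded into a candidate threshold and
    # scored by the resulting compression quality (an assumption).
    best_x, best_y = random.random(), random.random()     # step 1: swarm position
    best_smell = -math.inf
    for _ in range(max_iter):                             # step 7: iterate
        smells, positions = [], []
        for _ in range(pop_size):                         # step 2: random flights
            xi = best_x + random.uniform(-flight_range, flight_range)
            yi = best_y + random.uniform(-flight_range, flight_range)
            dist = math.hypot(xi, yi)                     # step 3: distance to origin
            s = 1.0 / (dist + 1e-12)                      # SC judgment value
            smells.append(fitness(s))                     # step 4: fitness of s
            positions.append((xi, yi))
        i = max(range(pop_size), key=smells.__getitem__)  # step 5: maximum SC
        if smells[i] > best_smell:                        # step 6: retain the best
            best_smell = smells[i]
            best_x, best_y = positions[i]
    return best_smell, 1.0 / (math.hypot(best_x, best_y) + 1e-12)

# Example: drive s toward a hypothetical target threshold of 0.25.
best_fit, best_s = ffo_optimize(lambda s: -(s - 0.25) ** 2)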

4. Performance Validation

The performance of the EASS-CANM technique is validated using both textual and image datasets. To verify the enhanced compression efficacy on textual data, the validation is performed on a benchmark dataset with different deployment models, namely, LUCE, HES-SO FishNet, and Le Gènèpi [24].

Similarly, the image compression effectiveness of the EASS-CANM model is validated on a benchmark image dataset [25]. A few sample images are shown in Figure 3.

Figure 4 shows the CR analysis of the EASS-CANM technique under different deployment models. The EASS-CANM technique has shown effective compression outcomes with optimal values of CR. For instance, the EASS-CANM technique has obtained CR of 0.1165, 0.1523, 0.1996, 0.1792, 0.1990, and 0.2350 under LU_84 Temp, FN_101 Temp, LG_20 Temp, LU_84 RH, FN_101 RH, and LG_20 RH deployment models, respectively.

Figure 5 displays the CF analysis of the EASS-CANM method under the distinct deployment models. The EASS-CANM approach has revealed efficient compression results with optimal values of CF. For instance, the EASS-CANM method has attained CF of 8.5803, 6.5655, 5.0098, 5.5790, 5.0250, and 4.2546 under the LU_84 Temp, FN_101 Temp, LG_20 Temp, LU_84 RH, FN_101 RH, and LG_20 RH deployment models, respectively.

Figure 6 demonstrates the CPS analysis of the EASS-CANM system under the distinct deployment models. The EASS-CANM approach has shown efficient compression outcomes with optimal values of CPS. For instance, the EASS-CANM model has attained CPS of 1575.2888, 446.6724, 897.4095, 3164.6983, 597.6336, and 1376.6681 under the LU_84 Temp, FN_101 Temp, LG_20 Temp, LU_84 RH, FN_101 RH, and LG_20 RH deployment models, respectively.

A comparative SS analysis of the EASS-CANM technique with recent models under various datasets is provided in Table 1 and Figure 7. The experimental outcomes show that the EASS-CANM technique has accomplished the maximum SS over the other techniques. For instance, on the LU-84 Temp dataset, the EASS-CANM technique has attained an increased SS of 88.35%, whereas the LEC, S-LZW, ALDC, FELACS, and BCAT techniques have offered lower SS of 70.81%, 48.99%, 73.94%, 74%, and 76.94%, respectively. Likewise, on the LG-20 RH dataset, the EASS-CANM algorithm has achieved an improved SS of 76.50%, while the LEC, S-LZW, ALDC, FELACS, and BCAT models have provided lower SS of 48.67%, 21.93%, 52.87%, 53.85%, and 58.52%, respectively.
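For readers relating the metrics, the reported values are mutually consistent with the standard definitions of space saving (SS) and compression factor (CF) in terms of CR:

$$\mathrm{SS} = (1 - \mathrm{CR}) \times 100\%, \qquad \mathrm{CF} = \frac{1}{\mathrm{CR}}.$$

For instance, the LU_84 Temp CR of 0.1165 gives $\mathrm{SS} = (1 - 0.1165) \times 100 = 88.35\%$ and $\mathrm{CF} = 1/0.1165 \approx 8.58$, in line with Figures 4, 5, and 7 and Table 1.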

Figure 8 illustrates the comparative bit rate (BR) analysis of the EASS-CANM approach with recent algorithms on distinct datasets. The results report the better outcomes of the EASS-CANM technique with the minimum BR. For instance, on the LU-84 Temp dataset, the EASS-CANM technique has provided the least BR of 0.9324, while the LEC, S-LZW, ALDC, FELACS, and BCAT methodologies have resulted in higher BR of 4.6732, 8.1628, 4.1896, 4.1153, and 1.8452, respectively. Moreover, on the LG-20 RH dataset, the EASS-CANM approach has offered the minimum BR of 1.8803, while the LEC, S-LZW, ALDC, FELACS, and BCAT methods have resulted in higher BR of 8.3034, 12.4915, 7.5408, 7.3832, and 3.3186, respectively.

To demonstrate the image compression efficiency of the EASS-CANM technique, a comparative analysis with existing techniques in terms of MSE and PSNR is provided in Table 2. The experimental outcomes state that the EASS-CANM algorithm has attained effectual outcomes with the maximum values of PSNR and minimum values of MSE.
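These figures follow the standard definition of PSNR for 8-bit images, with a peak value of 255:

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{255^2}{\mathrm{MSE}}\right) \ \mathrm{dB}.$$

For instance, an MSE of 2.756 corresponds to $10 \log_{10}(65025/2.756) \approx 43.73$ dB, which matches the PSNR reported for image 1 below.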

Figure 9 showcases the comparative PSNR analysis of the EASS-CANM system with other approaches. The results show that the EASS-CANM technique has attained enhanced outcomes with increased PSNR values. For instance, on image 1, the EASS-CANM technique has obtained a higher PSNR of 43.73 dB, whereas the FFA-LBG and GWO-ECC techniques have obtained lower PSNR of 41.53 dB and 40.06 dB, respectively. Similarly, on image 5, the EASS-CANM approach has attained a maximum PSNR of 41.58 dB, while the FFA-LBG and GWO-ECC systems have gained lower PSNR of 39.63 dB and 38.49 dB, respectively.

The MSE analysis of the EASS-CANM system with recent algorithms is offered in Figure 10. The figure reports the superior outcomes of the EASS-CANM system with the minimum values of MSE on all test images. For example, on image 1, the EASS-CANM technique has achieved a lower MSE of 2.756, while the FFA-LBG and GWO-ECC methods have obtained higher MSE of 4.567 and 6.408, respectively. In addition, on image 5, the EASS-CANM system has gained a minimum MSE of 4.519, while the FFA-LBG and GWO-ECC approaches have yielded higher MSE of 7.081 and 9.197, respectively.

Table 3 and Figure 11 show the comparative SS analysis of the EASS-CANM approach with other models [26]. The results show that the EASS-CANM approach has attained the highest SS values. For instance, on image 1, the EASS-CANM method has gained a maximum SS of 86.54%, while the FFA-LBG and GWO-ECC models have gained lower SS of 74.32% and 64.30%, respectively. Likewise, on image 5, the EASS-CANM algorithm has attained a maximum SS of 81.19%, while the FFA-LBG and GWO-ECC models have attained lower SS of 76.40% and 60.47%, respectively.

5. Conclusion

In this study, a novel EASS-CANM technique was developed for the compact storage of files in archives. The EASS-CANM technique is mainly intended to archive textual and image data effectively in compact form in order to reduce the required storage space. The proposed EASS-CANM technique first utilizes NIS with the PPM technique for the compression of textual data. Besides, the FFO-MHW technique is exploited for the compression of images, in which the optimal threshold value selection using the FFO algorithm helps improve the compression efficiency. To examine the enhanced archival effectiveness of the presented EASS-CANM approach, a wide range of experiments were implemented on a benchmark dataset. The comprehensive comparative result analysis highlighted the improved efficacy of the EASS-CANM approach over existing approaches with respect to various evaluation metrics. Therefore, the EASS-CANM technique can be treated as an effective tool for archive management. In the future, the EASS-CANM algorithm can be extended with the design of lightweight cryptographic techniques to accomplish security. A lightweight cryptographic technique would be helpful for encrypting data sent to cloud storage, and the use of both symmetric and asymmetric encryption enables users to benefit from the strong security of asymmetric encryption and the speedy performance of symmetric encryption while preserving their rights to access data in a secure and permitted manner.

Data Availability

No data were used to support this study.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this article.