Abstract

Information security is defined as preventing actions such as unauthorized access and use, modification, and removal of information. It consists of certain basic elements of confidentiality, integrity, and accessibility. There are numerous studies in published literature which have been conducted to ensure information security. However, there is no previous study that covers these three basic elements together. In the present study, a model that includes these three key elements of information security together for big data was proposed and implemented. With this proposed “single-label model,” a more practical and flexible structure was established for all operations (read, write, update, and delete) performed on a database on real data. In previous studies conducted with a label model, separate labels were used for read-only or write-only operations, and there was no structure that could ensure both confidentiality and integrity at the same time. The present study, however, shows what type of authorization and access control could be established between which processes and which users by looking at a single label for all the operations performed on the data. Thus, in contrast to the previous studies seen in published literature, data confidentiality, data integrity, and data consistency were all guaranteed for all transactions. The results of the proposed single-label model were also shown comparatively by conducting an experimental study of its application. The results obtained are promising for further studies.

1. Introduction

Information security is defined as preventing actions such as unauthorized access and use, modification, and removal of information, and it consists of certain basic elements including confidentiality, integrity, and accessibility [1, 2]. Confidentiality is the protection of information against being accessed, read, or used by unauthorized persons in any way. Integrity is the prevention of modification of information by unauthorized persons and the preservation of its original nature. Accessibility, on the other hand, is that the information is accessible and readily available as long as it is needed.

Today, there are new and highly effective threats that damage information systems and resources [3]. Although there are many measures taken to protect systems from such harmful threats that are supported by advanced technologies, it has been seen that attackers can still often succeed. In these and similar cases, any incident that causes a violation of any of the three basic elements of information security (confidentiality, integrity, and accessibility) is considered to be a security problem [4]. While some violations intentionally make systems inaccessible and disrupt services, others occur accidentally due to unforeseen faults. Whether accidental or malicious, security violations seriously affect the activity and reliability of an institution.

In general, threats often turn into attacks by exploiting gaps or vulnerabilities in systems. Therefore, it can be said that it is of great importance to provide all these three basic elements together to prevent such attacks from damaging information systems. In short, no matter how secure a system is, the important thing here is to ensure control of the access and authorization processes that may allow any attack [5].

Some leading factors that cause security breaches (or violations) include Denial of Service (DoS) attacks, Distributed Denial of Service (DDoS) attacks, inappropriate web browsing behavior, wiretapping, access to resources using a backdoor, and data changes occurring accidentally or intentionally [6]. Data that is deliberately or accidentally changed directly affects the integrity principle of information systems security in particular, and it results in a security breach. Practices such as giving excessive authorization to users and exercising poor control of permissions play an important role in the occurrence of such data modification events [7]. To deal with such problems, a model designed according to the specific access rights (e.g., read, write, update, and delete) is required for organizations and users. However, studies have shown that these models are unable to fully meet the needs of rapidly growing and increasingly complex systems, because they represent a serious financial burden and fail to fully provide information flow control [8–11]. Therefore, it is not enough for information systems to be constructed in a way that protects them only from unauthorized access, malicious users, and misuse. In this study, a model was created to provide the three basic elements of information security together by using real data. In this way, no user or group of users would be able to access data that they are not authorized for at their level or data on which they are not allowed to perform various operations.

In this study, a single-label model is created. The scientific contribution of this model is that while the data available to be used by the stakeholders can only easily be used by authorized actors, it does not allow the use of these data by unauthorized third-party actors. At the same time, this model contributes to the research of methods that enable the use of jointly used resources without causing information leakage. Therefore, in this study, we describe a distributed label model that can maintain data confidentiality with information flow control in distributed databases. The difference between this study and the other studies on this subject is that this label model targets data confidentiality and integrity among nonreliable actors and environments. Through the labels given to the data, each actor can determine his/her own security policy independently from other actors and authorize the ones that he/she chooses. The purpose of this study was to develop a method that allows different users to access the data in a distributed environment and protects confidentiality. It was aimed at investigating methods preventing unauthorized access to data being accessed jointly by multiple actors.

In the remainder of this study, other research related to this subject is presented in Section 2. The proposed model is discussed in Section 3, and its application is detailed in Section 4. Section 5 details the evaluation and conclusions.

Information is a valuable asset. Therefore, access, processing, updating, deleting, and authorizing operations should be carefully managed to ensure that confidentiality, integrity, and accessibility are maintained. In recent years, some techniques have been developed in published literature which outline the rules related to access, authorization, monitoring, and control of information and information systems [12–14]. However, it is seen in many industries that the development area of these techniques has narrowed and that existing techniques do not fully meet the new business requirements that arise with developing technology, and they cannot be managed in accordance with the organizational structure. In addition, serious costs arise in the progress towards a manageable model, and the dynamism that is necessary for the use and sharing of resources is not achieved.

In recent years, various studies using different techniques for the purposes mentioned have been described in published literature. Schultz and colleagues developed a platform that allowed the data access of users to be automatically tracked. Because a user logs into the system separately for each transaction, authority control is performed again. The user has to perform the authority check at each stage. If he/she does not perform the check at any one stage, data confidentiality is breached. This creates the need for automatic monitoring of authority [15]. Parker et al. presented a platform extension for database transactions. In this platform, each table has a label and protects its length [16], but this method can impose high computational costs and high overheads. Yang et al. used information flow control in web applications, but this approach can be expensive in both space and time and requires more memory [17]. Muthukumaran et al. applied information flow control (IFC) with FlowWatcher monitoring software that provides applications with a web proxy but limits the granularity of policies it can enforce [18]. In previous studies in published literature [19–22], a separate label was used for each operation (read, write) carried out on the object, and only reading and writing were performed. In the present study’s proposal, by contrast, all operations performed on the object (read, write, update, and delete) are carried out using a single label. In this way, a single label is enough to understand what type of authorization exists between which operations and which actors.

In recent years, various studies using different techniques for the purposes mentioned above have been described in published literature [13, 15, 23–26]. In the present study, on the other hand, there is no need for separate control for both authorizing and denying authorization. There is no need for separate authorization or access control for each operation such as reading, writing, updating, and deleting. In addition, by tracking the access of malicious actors to data, attempts are made to prevent information disclosure.

Fog computing or fog networking, also known as fogging, is pushing the frontiers of computing applications, data, and services away from a centralized cloud to a logical stream on the network edge. Fog networking systems work on building the control, configuration, and management over the Internet backbone [27].

Software-defined networking (SDN) is a promising approach to networking which provides an abstraction layer for the physical network [28]. In published literature, a recurrent neural network (RNN) model based on a new regularization technique (RNN-SDR) was proposed by the authors. This technique supported intrusion detection within SDNs [28]. Nevertheless, this model is not practical for implementation in the context of an SDN. Prete and Schweitzer contextualized the existing problems in current computer networks and presented the SDN network as one of the main proposals for the viability of the Internet of the future. Simulations were created in an SDN network scenario using a POX Controller [29]. However, there is a need to obtain a synergistic effect that will make cloud environments more efficient, dynamic, and flexible, including automatic reconfiguration of network clusters.

In the current study, a single-label model was developed. The scientific contribution of this model is that while the data available to use by the stakeholders can be used easily only by authorized actors, it does not allow the use of these data by unauthorized third-party actors. At the same time, this model contributes to research into methods that enable the use of jointly used resources without causing information leakage. Therefore, in this study, a single-label model was developed which can maintain data confidentiality and integrity with information flow control. The difference between this study and other studies with a single-label model is that it targets data confidentiality and integrity of users. Through the labels given to the data, each actor can determine his/her own security policy independently from the other actors and authorize the ones that he/she chooses from the other actors. Moreover, access control and authorization are ensured in accordance with the actor’s wishes, without causing data leakage and with the supervision of information flow control. The actors are able to create their own security, confidentiality, and integrity policies in a practical and flexible way. The difference between this study and other studies is that it provides data confidentiality, data integrity, and data consistency together.

3. Proposed Model

The single-label model consists of actors, objects, and labels.

3.1. Actor

The actors include data owners and users or groups of users who perform operations such as granting and receiving data authorization. Each actor labels his/her data for data confidentiality and integrity; that is, each data object is paired with a label. The label consists of a list of security policies that are provided by the actors. In addition, each actor has the right to safely change his/her own security policies separately. Figure 1 shows a sample actor hierarchy. In this figure, X and Y are the representatives of a worker group. Worker Z has two roles, engineer and unit head. In the principal hierarchy, the process of granting authority is transitive. For instance, X ⟶ Y stands for X granting authority to the principal Y. If X ⟶ Y and Y ⟶ Z, then X ⟶ Z is also true.
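The transitivity of granting authority can be illustrated with a small reachability check. The following is a minimal sketch under stated assumptions, not the author's implementation: the function name `can_act_for` and the dictionary representation of direct grants are hypothetical and chosen only for illustration.

```python
from collections import deque


def can_act_for(grants, source, target):
    """Return True if `source` reaches `target` through one or more grant edges.

    `grants` maps an actor to the set of actors it grants authority to directly
    (a hypothetical representation of the X -> Y relation).
    """
    seen = {source}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for nxt in grants.get(actor, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Direct grants taken from the example in the text: X -> Y and Y -> Z.
grants = {"X": {"Y"}, "Y": {"Z"}}

print(can_act_for(grants, "X", "Z"))  # transitive grant
print(can_act_for(grants, "Z", "X"))  # no grant in the reverse direction
```

A breadth-first search is used here only because it terminates even if the grant relation contains cycles; any transitive-closure computation would serve the same purpose.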

3.2. Label

A label is a collection of policies that are created for the protection of data; each label is paired with a data object. The object consists of the data for which authorization is granted or received by actors. The label consists of the list of security policies issued by actors, and each actor labels his/her data for data confidentiality. This model was developed for unreliable actors and environments. All actors change their own policies independently of each other, and each actor separately has the authority to safely change his/her security policies.

Figure 2 shows the contents of a label. Here, u1, u2, …, un denote the actors in the system that own the data object, while x1, x2, …, xm refer to the actors to whom the data owners give authorization for any transaction. Each content definition on the label L, that is, p1, p2, …, pn, shows the security policy of the relevant actor regarding these common data. Each actor who owns a data object determines his/her own policy on the label. Then, one of the actors sends the data object to the other actors together with its label.
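The label contents described above (owners, authorized actors, and one policy per owner) can be sketched as a small data structure. This is an illustrative sketch only; the class name `Label`, its methods, and the rule that a policy may be changed only by its own owner are assumptions introduced here to mirror the statement that each owner determines his/her own policy.

```python
class Label:
    """One label attached to a data object: owner -> set of authorized actors."""

    def __init__(self):
        self.policies = {}

    def set_policy(self, acting, owner, authorized):
        # Assumption: only the owner may define or change his/her own policy.
        if acting != owner:
            raise PermissionError(f"{acting} cannot change the policy of {owner}")
        self.policies[owner] = set(authorized)

    def owners(self):
        return set(self.policies)

    def policy(self, owner):
        return set(self.policies[owner])


label = Label()
label.set_policy("u1", "u1", {"x1", "x2"})  # policy p1 of owner u1
label.set_policy("u2", "u2", {"x2"})        # policy p2 of owner u2
label.set_policy("u1", "u1", {"x1"})        # u1 later changes only its own policy

print(sorted(label.owners()))
print(sorted(label.policy("u1")), sorted(label.policy("u2")))
```

Keeping one entry per owner means that a change by u1 leaves the policy of u2 untouched, which is the independence property the text describes.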

3.3. Graph Modeling of Labels

In previous studies in published literature [19–22, 30–32], a separate label has been used for each operation (read, write) carried out on the object, and only reading and writing have been performed. However, in the present study’s proposal, all operations performed on the object (read, write, update, and delete) are carried out using a single label. In this way, a single label is enough to understand what type of authorization exists between which operations and which actors.

In this present study, the single-label model is shown by a graph data structure (Figure 3) in which we let the label determined for graph G be LG. In this study, the circles in the graph data structure show the actors. Which operation will be performed in the distributed database is determined by the way the arrow is drawn. A different arrow is used for each of the read, write, update, and delete operations. Thus, with a single label, a more practical and more secure authorization and access operation is created.
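A graph in which actors are nodes and each arrow type stands for one operation can be sketched as follows. This is a minimal sketch, not the author's implementation: the class `LabelGraph`, its method names, and the rule that an owner always keeps every right on his/her own data are assumptions made for illustration.

```python
OPERATIONS = ("read", "write", "update", "delete")


class LabelGraph:
    """Single label as a graph: nodes are actors, typed edges are operations."""

    def __init__(self):
        # (owner, actor) -> set of operations the owner allows the actor
        self.edges = {}

    def allow(self, owner, actor, operation):
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {operation}")
        self.edges.setdefault((owner, actor), set()).add(operation)

    def is_allowed(self, owner, actor, operation):
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {operation}")
        if actor == owner:
            return True  # assumption: an owner keeps all rights on own data
        return operation in self.edges.get((owner, actor), set())

    def rights(self, owner, actor):
        return sorted(self.edges.get((owner, actor), set()))


graph = LabelGraph()
graph.allow("A", "B", "read")
graph.allow("A", "B", "update")

print(graph.is_allowed("A", "B", "read"))
print(graph.is_allowed("A", "B", "delete"))
print(graph.rights("A", "B"))
```

Because all four operation types live on the same edge set, one lookup answers every authorization question for a pair of actors, which is the practical benefit the single label is meant to provide.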

LG consists of five parts, namely, owner, readers, writers, updaters, and deleters. The way the arrows are drawn in the graph shows the types of authority needed to access the data. Here, while “owner” denotes the actors who own the labeled object, “readers” refers to the actors to whom authorization is given to read data owners’ transactions; “writers” refers to the actors to whom authorization is given to write to the data owners’ transactions; “updaters” refers to the actors to whom authorization is given to update the data owners’ transactions; and “deleters” refers to the actors to whom authorization is given to delete data owners’ transactions. The label shown in Figure 1 combined with graph G can be expressed in the LG typing format as follows:

LG = {1: 2, 4; 2: 3, 4; 3: 4, 5; 4: 5; 5: }

The semicolon used when creating a label separates the policies from one another. Accordingly, the LG label has five policies: {1:2, 4}, {2:3, 4}, {3:4, 5}, {4: 5}, and {5: }. While 1, 2, 3, and 4 denote the owners of the data object to which the LG label belongs, 2, 3, 4, and 5 represent the actors authorized by the data owners for various object transactions (read, write, update, and delete).

Let us assume that the first policy shows the read operation on the object.

The first policy is expressed with the 1 ⟶ 1, 1 ⟶ 2, and 1 ⟶ 4 edges. This means that the 1 actor allows the 1, 2, and 4 actors to read his/her data.

Let us assume that the second policy shows the write operation on the object.

The second policy is expressed with the 2 ⟶ 2, 2 ⟶ 3, and 2 ⟶ 4 edges. This means that the 2 actor allows the 2, 3, and 4 actors to write to his/her data.

Let us assume that the third policy shows the update operation on the object.

The third policy is expressed with the 3 ⟶ 3, 3 ⟶ 4, and 3 ⟶ 5 edges. This means that the 3 actor allows the 3, 4, and 5 actors to update his/her data.

Let us assume that the fourth policy shows the delete operation on the object.

The fourth policy is expressed with the 4 ⟶ 4 and 4 ⟶ 5 edges. This means that the 4 actor allows the 4 and 5 actors to delete his/her data.

The last policy is expressed with the 5 ⟶ 5 edge. This means that 5 does not allow anyone other than himself/herself to perform any transaction on his/her data.
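The five policies and their edges can be derived mechanically from the textual form of the label. The sketch below assumes the label is written as the string `{1: 2, 4; 2: 3, 4; 3: 4, 5; 4: 5; 5: }`, built from the five policies listed above; the helper names `parse_label` and `policy_edges` are hypothetical.

```python
def parse_label(text):
    """Parse 'owner: actor, actor; owner: ...' into {owner: [actors]}."""
    policies = {}
    for part in text.strip().strip("{}").split(";"):
        owner, _, rest = part.partition(":")
        actors = [int(a) for a in rest.split(",") if a.strip()]
        policies[int(owner)] = actors
    return policies


def policy_edges(policies):
    """List the graph edges of each policy, including the owner's self-edge."""
    edges = []
    for owner, actors in policies.items():
        edges.append((owner, owner))
        edges.extend((owner, actor) for actor in actors)
    return edges


LG = "{1: 2, 4; 2: 3, 4; 3: 4, 5; 4: 5; 5: }"
policies = parse_label(LG)

print(policies)
for owner, actor in policy_edges(policies):
    print(f"{owner} -> {actor}")
```

The self-edge added for every owner reproduces edges such as 1 ⟶ 1 and 5 ⟶ 5 in the walkthrough above, where an owner with an empty policy still keeps access to his/her own data.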

3.4. Bank Example

A bank has many customers. Each bank is obliged to protect its customers’ account information, such as money, goods, and investments, from other customers and noncustomer principals. In Figure 4, a bank’s customer operations are shown by employing label modeling. In this figure, the oval shapes are principals: M is the customer, B is the bank, and T is the principal that computes customer assets. Arrows represent information flow between principals, while squares represent the database and the data.

Any customer i (1 ≤ i ≤ n) can define his/her own security policy by labeling his/her assets with {Mi:B, Mi}. Each customer also performs operations such as withdrawing or depositing cash at different times, and the bank has to conduct these operations safely. The bank labels all customer operations performed with {M:B, M}. Thus, the bank can read customers’ information. Customer i’s operations, such as withdrawing cash, depositing cash, and money transfers, are conducted by the T principal, a program that computes customers’ asset details. The T principal can declassify any asset information that customer i labels with {Mi:B, Mi} and transfers it to the bank’s database with a {B:B} label. Thus, the bank can control the flow of information and, to ensure that other principals in the system cannot read these data, it saves them with a {B:B} label in its private database. The labels created for all operations performed in the database are combined into one label.
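The relabeling step in the bank example can be sketched as follows. This is a hedged illustration, not the bank model's actual implementation: the functions `can_read` and `transfer_to_bank`, the `trusted_programs` mapping that records which program each customer lets declassify his/her data, and the sample balance are all assumptions introduced here.

```python
def can_read(label, actor):
    """An actor may read only if every owner in the label lists that actor."""
    return all(actor in readers for readers in label.values())


def transfer_to_bank(asset, program, trusted_programs):
    """Declassify a customer asset and relabel it {B: B} for the bank database."""
    for owner in asset["label"]:
        # Assumption: a program may declassify only data whose owner trusts it.
        if program not in trusted_programs.get(owner, set()):
            raise PermissionError(f"{program} may not declassify data of {owner}")
    return {"data": asset["data"], "label": {"B": {"B"}}}


# Customer M1 labels an asset with {M1: B, M1}; the balance is made-up data.
asset = {"data": {"balance": 250}, "label": {"M1": {"B", "M1"}}}
trusted_programs = {"M1": {"T"}}

stored = transfer_to_bank(asset, "T", trusted_programs)

print(can_read(asset["label"], "B"), can_read(asset["label"], "M2"))
print(can_read(stored["label"], "B"), can_read(stored["label"], "M1"))
```

After the transfer, only B appears in the label, so no other principal in the system can read the record stored in the bank's private database.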

4. Experimental Study

The proposed single-label model was compared with the double-label model in published literature, and the performance results obtained in terms of accuracy and time are given in the following sections.

4.1. Accuracy

In Table 1, the success of the proposed single-label model and that of the double-label model in published literature are compared against a real data set, which was taken from a hospital and whose classes are known. Accuracy rates were calculated for about 100 actors and 20 objects randomly selected from this data set. All classes of this data set were specified, and the accuracy rate was measured by comparing the classes produced by the model created for this study with the real classes. The results are shown in Figure 5. When the performances of both methods are compared in terms of accuracy rates for all operations performed on objects, the proposed single-label model is more successful. In particular, it gives more successful results in reading and deleting operations. This is because writing and updating operations are more difficult than other operations.

In Table 2, the success of the proposed model (single label) and that of the model in published literature are compared in terms of accuracy. Accuracy rates have been calculated for about 1000 actors and 200 objects. The success of the proposed model is clearly shown in Figure 6. When the performances of both methods are compared in terms of accuracy rates for all operations performed on objects, the success of the proposed model can be clearly seen.

In Table 3, the success of the proposed model (single label) and that of the model in published literature are compared in terms of accuracy. Accuracy rates have been calculated for about 10000 actors and 2000 objects. The success of the proposed model is clearly shown in Figure 7. When the performances of both methods are compared in terms of accuracy rates for all operations performed on objects, the success of the proposed model can be clearly seen.

In Table 4, the success of the proposed model (single label) and that of the model in published literature are compared in terms of accuracy. Accuracy rates have been calculated for about 100000 actors and 20000 objects. The success of the proposed model is clearly shown in Figure 8. When the performances of both methods are compared in terms of accuracy rates for all operations performed on objects, the success of the proposed model is clearly seen.
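An accuracy rate of the kind reported in this section, obtained by comparing the class produced by a model with the real class for each operation, can be computed as in the sketch below. The function name `accuracy_by_operation` and the sample decisions are hypothetical; the sample values are made up for illustration and are not taken from the hospital data set or from the tables.

```python
def accuracy_by_operation(decisions):
    """Return the percentage of correct decisions for each operation.

    `decisions` is an iterable of (operation, predicted_class, real_class).
    """
    totals, correct = {}, {}
    for operation, predicted, real in decisions:
        totals[operation] = totals.get(operation, 0) + 1
        correct[operation] = correct.get(operation, 0) + (predicted == real)
    return {op: 100.0 * correct[op] / totals[op] for op in totals}


# Made-up decisions, used only to exercise the function.
sample = [
    ("read", "allow", "allow"),
    ("read", "deny", "deny"),
    ("write", "allow", "deny"),
    ("write", "deny", "deny"),
]

rates = accuracy_by_operation(sample)
for operation, rate in rates.items():
    print(f"{operation}: {rate:.1f}%")
```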

4.2. Time

In Table 5, the success of the proposed model (single label) and that of the model in published literature in terms of time are compared against the actual data set taken from the hospital. Performances related to time are given for about 100 actors and 20 objects. The success of the proposed model is clearly shown in Figure 9. In terms of time, it is seen that operations are performed on the data in less time with the proposed model. Writing and updating operations take longer than the other operations in both methods. This is because performing writing and updating operations on the object takes more time. Also, when compared in terms of time, the proposed model gives very successful results for all operations performed on the object.

In Table 6, the success of the proposed single-label model and that of the model in published literature in terms of time are compared against the actual data set taken from the hospital. Performances related to the time are given for about 1000 actors and 200 objects. The success of the proposed model is clearly shown in Figure 10. In terms of time, it is seen that operations are performed on the data in less time with the proposed model.

In Table 7, the success of the proposed single-label model and that of the model in published literature in terms of time are compared against the actual data set taken from the hospital. Performances related to the time are given for about 10000 actors and 2000 objects. The success of the proposed model is clearly shown in Figure 11. In terms of time, it is seen that operations are performed on the data in less time with the proposed model.

In Table 8, the success of the proposed single-label model and that of the model in published literature in terms of time are compared against the actual data set taken from the hospital. Performances related to the time are given for about 100000 actors and 20000 objects. The success of the proposed model is clearly shown in Figure 12. In terms of time, it is seen that operations are performed on the data in less time with the proposed model.
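Per-operation timings like those compared in this section could be collected with a small harness such as the one below. This is only a sketch of one possible measurement setup: the function `time_operations`, the in-memory dictionary standing in for the database, and the number of repeats are assumptions, and the measured values depend on the machine and say nothing about the reported results.

```python
import time


def time_operations(operations, repeats=100):
    """Return the mean wall-clock time in seconds per call for each operation."""
    timings = {}
    for name, func in operations.items():
        start = time.perf_counter()
        for _ in range(repeats):
            func()
        timings[name] = (time.perf_counter() - start) / repeats
    return timings


# A plain dictionary stands in for the database in this illustration.
store = {}
operations = {
    "write": lambda: store.__setitem__("record", 1),
    "read": lambda: store.get("record"),
    "update": lambda: store.update(record=2),
    "delete": lambda: store.pop("record", None),
}

timings = time_operations(operations)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.9f} s per call")
```

Averaging over repeated calls with `time.perf_counter` reduces the effect of timer resolution on very short operations.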

5. Evaluation and Conclusions

In this study, a single-label model was introduced for ensuring data security. In the proposed model, authorization and deauthorization operations between actors were both carried out. Also, in the proposed model, there is no separate authorization or access control for each operation such as reading, writing, updating, and deleting. Access control and authorization operations were performed through labels. Unlike previous studies, data security was ensured for all operations performed in the distributed database. Actors can take back the authority that they give at any time, or they can give authority to the actor they want. Challenges that occur during the implementation of security policies on distributed databases are overcome.

In this study, the problem of data security in distributed databases was addressed. In particular, a distributed-label model related to data flow control was introduced and examples of applications for its use were shown. In addition, data object flows in a distributed environment were modeled with a graph structure. In previous studies, a separate label has been used for each operation (read, write) carried out on the object, and only reading and writing have been performed. In the study proposed here, on the other hand, all operations performed on the object (read, write, update, and delete) were carried out using a single label. This also shows that the proposed model is flexible. By tracking the access of malicious actors to data, attempts were made to prevent disclosure of information. The results of the proposed single-label model for all operations performed on the data were also shown by the experimental study. It delivered more successful results, especially in reading and deleting operations.

The proposed model was also compared with the method used in previous studies in terms of time, and it was seen that it performed operations in a shorter time. In this way, data confidentiality, integrity, and consistency were ensured.

In future work, a prototype application that demonstrates how the label model works will be created, and the model will be enriched with relabeling that also takes the hierarchy of actors into account.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.