Abstract

In recent years, online social networks (OSNs) have become an essential part of our social life. In OSNs, users can post resources to predefined groups of users such as family, friends, and close friends. However, because these groups are predefined, a few irrelevant users may get access to the published resources. Moreover, users often cannot configure privacy settings due to a lack of technical knowledge and the rigidity of the access control management system. To tackle these issues, we propose a text-based, dynamic, and fine-grained access control system for OSNs. Our proposed model uses a dynamic clustering algorithm to create user clusters based on the users' mutual interests. After clustering, the proposed system automatically creates access rules based on the relationship between the user clusters and their resources. The proposed system ensures fine-grained access control and automatic assignment of policies to text-based resources. We have implemented our system to gauge its applicability, and the results are discussed in the experiments section.

1. Introduction

In online social networks (OSNs), users usually publish resources such as text, photos, audio, and video messages. With the rapid growth of the Internet, OSNs have become one of the modern ways for people to communicate. Advances in technology and the ease of using social media have driven the growth of OSNs over the years, and most organizations have also started business activities on OSNs. According to Zephoria Digital Marketing, Facebook had approximately 2.7 billion monthly active users worldwide, and it has been the third most visited website since 2020 [1]. Twitter, another popular social network, has an estimated more than 300 million monthly active users, who publish their tweets in many different languages [2]. Similarly, companies and employers inspect the profiles of OSN users on platforms such as LinkedIn, Twitter, and Facebook [3] for recruitment and advertising purposes, and law enforcement agencies gather evidence from OSNs to solve crimes [4]. Since OSN users are usually linked to friends and family, OSNs are generally perceived as a more protected and trusted personal environment for online communication [5]. A study [6] reveals that 72% of Americans use OSNs for online communication with friends, family, and visitors.

Nowadays, machine learning techniques are very popular, and researchers use them to solve complex problems. In machine learning, the KNN (K-nearest neighbors) algorithm is one of the essential and most straightforward classification algorithms. It is based on supervised learning and is very helpful in intrusion detection, data mining, and pattern recognition applications, and it is used extensively in many real-life applications. KNN is a lazy learning method proposed by Fix et al. in 1951 [7]. It is generally used to classify patterns based on feature similarity: a query point is classified by a majority vote of its neighbors and assigned to the class most common among its K nearest neighbors. Unlike other statistical approaches, which derive a model from historical data, KNN treats the training dataset itself as the model. Therefore, KNN has no separate training stage; all computation is deferred to the testing (prediction) stage. Further information on the KNN algorithm can be found in [8].
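The majority-vote rule described above can be illustrated with a minimal, standard-library sketch. The function name `knn_predict` and the choice of Euclidean distance are illustrative assumptions, not the implementation used in this work; note that the function has no fitting step, because the stored training set itself serves as the model.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by a majority vote of its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. There is no fitting
    step: the stored training set itself acts as the model (lazy learning).
    """
    # Rank every stored example by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote; ties go to the label seen first among the nearest points.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (1, 0), k=3))  # prints a (two "a" neighbours outvote one "b")
```

Because every prediction sorts the whole training set, the cost grows with the number of stored examples; this is the usual trade-off of lazy learners.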

In OSNs, users communicate with other users who share mutual interests and exchange information and resources; therefore, OSNs and related applications hold a vast amount of private information. This private information and these resources are the main privacy concerns of OSN users, and such privacy and security problems are a major drawback for both the users and the OSN service providers [9–11]. Moreover, adding unknown persons as friends in OSNs can be a serious privacy threat [12–14]. In most OSNs, users can be tagged in a published message, which creates another channel of privacy leakage. Such leakage of users' personal information may affect their private lives [15].

To tackle the above-mentioned privacy issues, researchers have proposed several useful methods. In 2016, Imran-Daud et al. [16] proposed a dynamic and automatic access control method for medical OSNs, which creates rules dynamically and automatically at runtime and provides users with anonymized textual resources. The limitation of this method is that it selects users statically (i.e., predefined groups of users are selected), whereas our proposed privacy-aware technique creates groups of users dynamically and maintains fine-grained access control. Outchakoucht et al. [17] proposed a dynamic access control method for IoT whose architecture combines OrBAC and blockchain techniques. It is therefore designed for organization-based systems, and blockchains have their own limitations, which are discussed in detail in the related work. This method creates dynamic policies but applies them to predefined groups of users, so a few irrelevant users may get access to the resource.

Kayes et al. [18] proposed another RBAC-based access control method for IoT, which dynamically creates roles based on the relevant contextual situations and is designed specifically for IoT and organization-based systems. However, these methods are only partially dynamic, and their limitations can compromise users' privacy. Therefore, a fully dynamic and automatic access control system is required to manage privacy-aware OSN resources.

1.1. Motivation

OSNs usually provide static contact lists such as family, friends, and close friends. Because of these predefined groups of users, a published resource may be shared with a few irrelevant users [19]. Similarly, predefined (static) policies are applied to a specific group of users, including users who may be irrelevant to the resource; these static policies/rules are therefore a core reason for privacy leakage [19]. Moreover, most OSN users cannot configure privacy settings due to a lack of technical knowledge, which also leads to privacy leakage. Access control models therefore need a procedure that secures users' personal information by administering privacy settings in OSNs automatically and dynamically [19]. A dynamic and automatic access control management framework for OSNs should be created as a prototype that guarantees the resource owner's privacy within the OSN and adapts to the current privacy requirements of a given situation to develop rules/policies accordingly [19–21]. The aim of this research is to address the social network privacy problems discussed above.

1.2. Contribution

In this paper, we propose a dynamic, fine-grained, and privacy-aware access control model for OSNs that automatically creates and evaluates access policies based on the relationships between resources and users. The contributions of this research work are as follows:
(i) The proposed system extracts the common interests of users from user profile data.
(ii) The system generates interest-based clusters from the user profiles by grouping users who share common interests. These user clusters categorize users into different groups based on their interests.
(iii) The proposed system identifies relationship attributes in the text resource using NLP functions.
(iv) The proposed system identifies the relationship between text resources and user clusters using the identified relationship attributes. These relationships are used to manage access to the resources.
(v) The proposed system automatically and dynamically creates rules based on the identified relationships between resources and user clusters.
(vi) Delegation is managed automatically based on the recognized relationships and the selected user clusters.

1.3. Organization

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 presents a dynamic, fine-grained, and automated access control model that enhances the security and privacy of OSN resources. Section 4 presents the results and discussion that support our approach. Finally, Section 5 presents the conclusion and future research directions.

2. Related Work

A critical literature review is important for exposing research areas where investigation is needed. Numerous access control systems have been proposed since 2021. Inferring user profile information has been widely discussed [19, 20], and most inference attempts try to discover a user's personal information by observing OSN groups and network contacts. Consequently, a large body of work has been done in this area, some of which is reviewed below.

Access control in OSNs has some unique features that distinguish it from traditional access control. Mandatory access control and RBAC enforce an organization-wide access control policy that is typically defined by a security manager. In the discretionary access control model, the resource owner defines the access rules for a resource. In OSN systems, however, users expect to regulate access to their own resources and to the activities related to them, so access in OSNs is subject to user-specified policies. Access control systems used in IoT include role-based (RBAC) [21], organization-based (Or-BAC) [22], trust-based [23], capability-based (CapBAC) [24, 25], and attribute-based access control (ABAC) [26, 27]. In a centralized access control system, rights are assigned by a central entity, which becomes a single point of failure [28]. Most new models are based on RBAC and ABAC, with researchers introducing extensions such as relationships and trust to improve the level of access control. Several access control models are based on game theory. Wellman and Berkowitz [29] proposed a novel access control method that uses game theory to investigate the benefits to owners and visitors of content in OSNs. Tian and Lin [30] presented a method that uses game theory to investigate user behavior in OSNs and manages resource access via a trust estimate of that behavior. Yu et al. [31] built a game framework for competitive information dissemination in OSNs to understand the motivations behind human behavior, such as money, learning, interest, and the desire for knowledge. Zhu et al. [32] apply repeated games and incentive mechanisms to improve the efficiency of resource allocation in OSNs. Still, the approaches discussed above cannot intelligently advise users on how to create access control rules [33].
In OSNs, there is no central authority to determine affiliations between users and manage policies, so access control is often enforced through cryptographic means [5, 34]. Pang and Zhang [35] addressed access control and privacy protection issues in OSNs with a cryptography-based solution, and they also proposed a new direction for access control mechanisms in OSNs. Their technique can define conditions based on 'k-depth' and 'k-common friends', and it is more secure than traditional relationship-based access control systems. However, cryptography-based models require substantial computational resources for execution and performance.
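To make the 'k-depth' and 'k-common friends' conditions concrete, the following illustrative sketch evaluates them as plain predicates over a friendship graph. It assumes an undirected graph stored as an adjacency dictionary, and the function names are hypothetical; it shows only what such conditions check, not the cryptographic enforcement used in [35].

```python
from collections import deque


def within_k_depth(graph, owner, requester, k):
    """True if requester is at most k friendship hops from owner (breadth-first search)."""
    if owner == requester:
        return True
    seen = {owner}
    frontier = deque([(owner, 0)])
    while frontier:
        user, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for friend in graph.get(user, ()):
            if friend == requester:
                return True
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return False


def has_k_common_friends(graph, owner, requester, k):
    """True if owner and requester share at least k direct friends."""
    return len(set(graph.get(owner, ())) & set(graph.get(requester, ()))) >= k


graph = {
    "alice": ["bob", "erin"], "bob": ["alice", "carol"],
    "carol": ["bob", "dave", "erin"], "dave": ["carol"], "erin": ["alice", "carol"],
}
print(within_k_depth(graph, "alice", "dave", 2))  # False: dave is 3 hops away
```

In this toy graph, alice and carol are two hops apart and share two friends (bob and erin), so a "2-depth" or "2-common friends" condition would admit carol but a "1-depth" condition would not.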

Blockchain is an innovative, decentralized technology behind famous cryptocurrencies such as Bitcoin [36] and Ethereum [37]. Every block has a limited size and a limited rate for storing transactions: in Bitcoin, the average block creation time is about 10 minutes, the block size is 1 megabyte, and the maximum throughput is about 7 transactions per second (tps) [38]. Ouaddah et al. [39, 40] first proposed and implemented the FairAccess model, whose access handling relies on the consistency of the blockchain. Maesa et al. [41] published access control policies using blockchain technology. Outchakoucht et al. [17] enhanced the security of the FairAccess model using machine learning algorithms [42]. Smart contracts are used to evaluate policies and manage access control. Zhang et al. [43] split the access control contract into two parts: the first manages access, and the second detects misuse of access control. The model of Dukkipati et al. [44] stores two types of policies in the blockchain: (i) general policies and (ii) special policies. Maesa et al. [45] offer a model based on the Ethereum blockchain. The limitations of the above techniques are high computational requirements, lack of support for microtransactions, the open ledger problem, and high transaction fees. In [46], the authors propose a pair of algorithms based on a collaborative access control model, the first for getting resources and the second for sharing resources. Their work still needs development in the automatic enforcement and evaluation of policies; the authors suggest that smart contracts may solve this, but IOTA currently does not support smart contracts. In collaborative access control, also known as aggregation-based models [47], the individuals involved in sharing some content decide whether to share their information or not.
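The 7 tps figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes an average transaction size of about 250 bytes (an assumption, not a figure taken from [38]) and Bitcoin's target block interval of about 10 minutes (600 s):

```latex
\frac{1\,000\,000\ \text{B/block}}{\approx 250\ \text{B/tx}} \approx 4000\ \text{tx/block},
\qquad
\frac{4000\ \text{tx/block}}{600\ \text{s/block}} \approx 6.7\ \text{tps} \approx 7\ \text{tps}.
```

Larger average transactions or longer block intervals lower this ceiling proportionally.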
The drawbacks of previous models [48, 49] prevent them from handling every case in its entirety, and in some cases these models must rely only on the owner's data to resolve conflicts. For these reasons, new conflict resolution strategies are needed. In [50], the authors proposed a collaborative access control framework for OSNs that decides whether to allow or deny access to an object, taking into account the privacy settings of the owner, originator, contributor, and others involved. Four factors are considered: the users' trust relationships, the users' sensitivity levels, the weights of access types, and the controller types. They proposed an algorithm to grant view and share access and evaluated it using theoretical analysis and self-created, scenario-based data. A limitation of this study is that it may run into problems under high workloads and big data; the authors left this evaluation as future work.

Fong et al. [51], Gates [52], and Carminati and Ferrari [53] used relationship-based models for OSNs to control whether visitors can access resources. In [54–56], Fong and Siahaan and Bruns et al. describe access control policies for OSNs and present a hybrid logic based on modal logic. In [57, 58], Park et al. and Cheng et al. describe access control rules/policies based on regular expressions over user-resource, resource-resource, and user-user relationships to manage visitors' permissions on resources. This type of approach gives OSN users a way to manage permissions on resources, with different relationships associated with different access rights. The relationship-based access control model is easy to understand and simple to apply in OSNs. However, it is difficult for users to determine which access privileges should be linked to a relationship for a given resource. Another automated and dynamic access control approach, also based on dynamic relationships, is proposed by Abid and Daud [59]. Most current OSNs impose a basic and limited relationship-based access control method, allowing users to select rules from a predefined vocabulary such as "private," "public," "friend," and "friend of a friend." Facebook and Google+ introduced customizable relationships, specifically "friend lists" and "circles," giving users more choices for distinguishing privileged user sets [60].

Malik et al. have focused on IoT-based traditional access control and social network systems [61]. They identify some critical requirements for access control systems: access control systems must be dynamic and able to change access authorizations at runtime as requirements change [61, 62], and they should be fine-grained to secure sensitive and private resources [52, 61]. In summary, none of the existing methods is fully capable of delivering dynamic and privacy-aware access control. Therefore, a dynamic, fine-grained, and automated access control model for OSNs is required. The proposed system automatically assigns policies to textual resources and to the captions of videos and images.

3. Proposed Methodology

Our system uses the owner's profile data (e.g., list of friends, education information, and work) to obtain the information needed to manage access control on shared resources. Based on the profile data, the system automatically generates clusters, and users are automatically mapped to these clusters based on their relevance to them. These clusters are also known as friends' clusters (FCs). When a user tries to publish a message through the OSN, the interests are extracted from its content, and the system identifies the relationship between these interests and the FCs. Based on this relationship, one or more clusters are selected. If more than one cluster is selected, the system automatically prepares a single cluster by taking the union of the selected clusters (i.e., merging two or more clusters into one). A permit rule/policy is then formed based on the relationship between the interests and the friends' cluster. Finally, according to the rule/policy, the message is tagged to the designated friends' cluster, and the members of the selected FC have permission to access and share the message according to the rule/policy. As a result, only users related to the message get access to the resource.
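As a minimal sketch of the union-and-permit step, assume each friends' cluster is a set of user IDs; the merged cluster is then the set union, and the rule carries the four components (resource owner, friends' cluster, resource, permission) described in Section 3.2. The class and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    owner: str              # RO: resource owner
    members: frozenset      # FC: users of the selected (possibly merged) cluster
    resource: str           # R: identifier of the text resource
    permission: str         # P: "allow" or "deny"


def build_permit_rule(owner, resource, selected_clusters):
    """Merge the selected friends' clusters by set union and issue an allow rule."""
    merged = frozenset().union(*selected_clusters)
    return Rule(owner, merged, resource, "allow")


def can_access(rule, user):
    """A user may access the resource only if the rule allows it and the user is in the FC."""
    return rule.permission == "allow" and user in rule.members


fc_a = {"u1", "u2"}   # hypothetical cluster, e.g. users sharing one interest
fc_b = {"u2", "u3"}   # hypothetical cluster sharing another interest
rule = build_permit_rule("owner", "m1", [fc_a, fc_b])
print(sorted(rule.members))  # ['u1', 'u2', 'u3']
```

Using a set union means a user who belongs to several selected clusters (here `u2`) appears only once in the merged cluster.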

The control flow among the system's submodules is depicted in Figure 1. The system is initialized by creating friends' clusters (FCs) with an interest-based clustering algorithm built on two functions: the interest selection function (ISF) and the clustering function (CF). The ISF selects interest categories from the user profiles and passes them to the CF, which creates user clusters based on these categories and stores them in the FCs database (FCs DB). This whole user clustering process is called the initialization process (IP). The FCs are useful for categorizing users into different clusters based on their interests. The IP runs when the system starts and is repeated whenever a new user is added to the system, which updates the FCs DB.

The ISF and CF processes are explained in Algorithm 1. In line 1, the users' profile data are retrieved to build the friends' clusters; this profile data may contain fields such as name, designation, employment information, location, and educational information. In lines 2–8, the required fields are selected and their text is processed (data cleaning and transformation), which prepares the text for feature extraction. In lines 9–20, the ISF extracts interest categories from the selected interest fields (like location, job, marital status, education); for example, the marital status field yields three categories: single, married, and divorced. In lines 21–32, the CF uses the interests as cluster centroids and creates the clusters by matching each profile's field value with the interests.

(1)  reviews = read(Profile-data)
(2)  S = select required columns from reviews//e.g., job, marital, education
(3)  for i in range(0, len(S)):
(4)      rev_i = read column S[i] from reviews; corpus_i = []
(5)      for word in rev_i:
(6)          word = word.lower()
(7)          corpus_i.append(word)
(8)      end for
(9)      ISF(corpus_i)//Define function
(10)         d_i = dict()//initialize dictionary
(11)         for word in corpus_i:
(12)             if word in d_i:
(13)                 d_i[word] = d_i[word] + 1
(14)             else:
(15)                 d_i[word] = 1
(16)             end if
(17)         end for
(18)         interest_i = convert keys of d_i to array//distinct interest categories
(19)         return interest_i, corpus_i
(20)     end function
(21)     CF(interest_i, corpus_i)//Define function
(22)         C_i = []//initialize cluster array
(23)         for line in corpus_i:
(24)             for word in range(0, len(interest_i)):
(25)                 if line == interest_i[word]:
(26)                     C_i.append(word)//cluster index of the matching interest
(27)                 end if
(28)             end for
(29)         end for
(30)         reviews['Cluster_' + S[i]] = C_i; R = reviews
(31)         return R
(32)     end function
(33) end for
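For readers who want to run Algorithm 1, the following is a minimal standard-library Python sketch of the same ISF/CF logic. It assumes the profiles are a list of dictionaries rather than a data frame, and, as an assumption not stated in Algorithm 1, it numbers clusters consecutively across fields in order of first appearance, which is one way to obtain a single cluster numbering such as clusters 0–11 for job and 12–14 for marital status. The names `isf`, `cf`, and `interest_clustering` are illustrative.

```python
def isf(column_values):
    """Interest selection function: normalize values and count each interest.

    Mirrors lines 10-19 of Algorithm 1: the distinct keys of the count
    dictionary are the interest categories, kept in first-appearance order.
    """
    corpus = [str(value).strip().lower() for value in column_values]
    counts = {}
    for word in corpus:
        counts[word] = counts.get(word, 0) + 1
    return list(counts), corpus


def cf(interests, corpus, offset=0):
    """Clustering function: label each profile with its matching interest's cluster."""
    cluster_of = {interest: offset + k for k, interest in enumerate(interests)}
    return [cluster_of[value] for value in corpus]


def interest_clustering(profiles, fields):
    """Run ISF and CF on each selected field; return labeled profiles and a cluster index."""
    labeled = [dict(p) for p in profiles]   # leave the input untouched
    index, offset = {}, 0                   # cluster ID -> (field, interest)
    for field in fields:
        interests, corpus = isf(p[field] for p in profiles)
        for profile, cid in zip(labeled, cf(interests, corpus, offset)):
            profile["Cluster_" + field] = cid
        for k, interest in enumerate(interests):
            index[offset + k] = (field, interest)
        offset += len(interests)            # the next field continues the numbering
    return labeled, index


profiles = [
    {"job": "unemployed", "marital": "married", "education": "primary"},
    {"job": "services", "marital": "married", "education": "secondary"},
    {"job": "management", "marital": "single", "education": "tertiary"},
]
labeled, index = interest_clustering(profiles, ["job", "marital", "education"])
print(index[3])  # ('marital', 'married')
```

Each profile ends up in exactly one cluster per field, so a user can belong to several friends' clusters at once (one for job, one for marital status, one for education).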

Once the initialization process is completed, the proposed system proceeds as follows. The owner sends a message m to the parser for publishing. The parser interprets the message by extracting interest attributes (like location, job, marital status, education), which help select a specific set of users or friends' clusters (FCs) (details in Section 3.1). The rule creation manager (RCM) creates, modifies, and regulates rules or policies for resources. The RCM identifies the relationship between the message and the FCs using the interest attributes and creates rules that grant access to the message in the OSN (e.g., only a specific set of users or a friends' cluster (FC) can access the published message). These rules are stored in the rules database (rules DB) for future use. Finally, the message is tagged to the selected FC. The RCM methodology is elaborated in Section 3.2. Transferring the owner's permission on a resource to another user is called delegation. When the delegation manager (DM) receives a delegation request, it checks whether a rule/policy already exists; if it does, the DM sends this rule/policy to the RCM. If no rule/policy exists, the DM checks the given criteria and permits or denies the request accordingly. The detailed process of the DM is explained in Section 3.2.
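The delegation flow described above can be sketched as a single decision function. This is a hypothetical interpretation: it assumes that "a rule already exists" means the delegatee is already a member of the cluster covered by the resource's rule, and it encodes the three delegation criteria listed in the conclusion with simple set lookups (a set of interests per user and a contact list per owner). All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ClusterRule:
    owner: str       # RO: resource owner
    resource: str    # R: the published message
    members: set     # FC: users allowed by the rule
    interest: str    # interest attribute linking the message and the FC


def delegate(rule, delegator, delegatee, user_interests, owner_contacts):
    """Return "forward", "allow", or "deny" for a delegation request."""
    # A rule already covers the delegatee: hand it over to the RCM unchanged.
    if delegatee in rule.members:
        return "forward"
    criteria = (
        # (i) the requester (delegator) is a member of the related FC
        delegator in rule.members,
        # (ii) the delegatee relates to the message as the FC does,
        #      modelled here as sharing the linking interest
        rule.interest in user_interests.get(delegatee, set()),
        # (iii) the delegatee relates to the owner as FC members do,
        #       modelled here as being one of the owner's contacts
        delegatee in owner_contacts.get(rule.owner, set()),
    )
    return "allow" if all(criteria) else "deny"


rule = ClusterRule("ro", "m1", {"a", "b"}, "married")
interests = {"c": {"married"}, "d": {"single"}}
contacts = {"ro": {"a", "b", "c", "d"}}
print(delegate(rule, "a", "c", interests, contacts))  # allow
```

Here `d` is denied because it does not share the interest that links the message to the cluster, even though it is one of the owner's contacts.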

3.1. Parser Manager

The parser manager processes message m in order to derive the features required to map users to friends' categories. To do so, it performs NLP-based operations on the message contents, which are illustrated in Figure 2. These operations are (i) named entity recognition (NER), which detects entities such as organizations, locations, person names, times, dates, money, and percentages, and (ii) NLP tokenization with part-of-speech tagging, which identifies word classes such as nouns and pronouns. Our system depends on NER libraries [63] that can recognize named entities and categorize them into seven classes. As a result of this process, the parser manager identifies the attributes in the message that reflect specific categories of users. If no categories are found, the parser manager executes the next step, for which our system relies on a set of natural language processing (NLP) libraries [64].

3.2. Rule Creation Manager

The purpose of the RCM is to create a rule for user clusters based on the relationship between the friends' cluster (FC) and the message; these rules are also stored for future use. To do so, our system administers access to the published resources according to the relationship between the resources and the friends' clusters (FCs). For its operation, the RCM receives from the parser manager the attributes that reflect specific categories of users, and from the ICM manager the clusters. If the RCM receives more than one cluster, it takes the union of the clusters (merging two or more clusters into one) to form a new cluster, and a permit (allow) rule is created. An allow/deny rule has the following form:

$\text{Rule} = \langle RO, FC_i, R_i, P \rangle$

where $RO$ is the resource owner, $FC_i$ is the selected friends' cluster, $R_i$ is the text resource (message), and $P$ is the permission, which may be allow or deny. Finally, according to the selected policy, message M is tagged to the designated friends' cluster. If a delegation request is received, the delegation manager (DM) checks the relationship of the delegator with the FC and of the delegatee with the resource. If the criteria are met, an allow rule/policy is created; otherwise, a deny rule/policy is applied.

If the RCM does not find clusters related to the attributes, it sends these attributes to the KNN algorithm for further processing. The system performs KNN (K-nearest neighbor) training on the profile data when the system starts and repeats it whenever a new user is added, so the KNN model stays up to date. When the parser sends the message attributes to the RCM, the RCM applies the KNN matching function to them to find the nearest cluster. If a nearest cluster exists, the rule creation module creates the rule for that cluster. If no cluster is found, the system forwards the cluster list to the owner, who checks the clusters; the rest of the procedure is the same as described above.
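The RCM's lookup order (direct cluster match, then nearest cluster, then asking the owner) can be sketched as follows. The sketch rests on two loud assumptions: each cluster is represented by its interest label (e.g., cluster 12 labelled "married"), and the KNN matching function is replaced by standard-library string similarity (`difflib.get_close_matches`) purely to keep the example self-contained; it is not the KNN model described in this section. The function name and the `cutoff` value are hypothetical.

```python
import difflib


def select_clusters(attributes, cluster_interests, cutoff=0.8):
    """Return (cluster_ids, source) for the interest attributes of a message.

    cluster_interests maps a cluster ID to its interest label, e.g. {12: "married"}.
    """
    attrs = [a.lower() for a in attributes]
    # 1) Direct match: clusters whose interest appears among the attributes.
    exact = [cid for cid, interest in cluster_interests.items() if interest in attrs]
    if exact:
        return exact, "exact"
    # 2) Nearest cluster. Stand-in for the KNN matching function: the closest
    #    interest label by difflib's similarity ratio, if above the cutoff.
    labels = list(cluster_interests.values())
    for attr in attrs:
        match = difflib.get_close_matches(attr, labels, n=1, cutoff=cutoff)
        if match:
            return [cid for cid, i in cluster_interests.items() if i == match[0]], "nearest"
    # 3) Nothing found: forward the full cluster list to the owner.
    return sorted(cluster_interests), "ask_owner"


clusters = {12: "married", 13: "single", 14: "divorced", 15: "primary", 18: "unknown"}
print(select_clusters(["married", "person", "party"], clusters))  # ([12], 'exact')
print(select_clusters(["divorcee"], clusters))                  # ([14], 'nearest')
```

If several clusters match directly, all of them are returned, and their union would then be taken before the permit rule is created.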

4. Result and Discussion

In this section, we describe the experiments conducted on each module of the proposed system and discuss the results, which show that the overall system works as intended.

Step 1. Interest-based clustering
The dataset is the "Bank Marketing" dataset, obtained from a well-known data science website [65]. The columns selected from the dataset for the experiment are shown in Table 1. The proposed method is implemented in Python and executed in Jupyter Notebook. The fields of interest are three columns (job, marital, and education), which are the most closely related to OSN user profile data. NLTK toolkit functions are used for text cleaning: all text is converted to lowercase, and then four types of stop-words are removed: (i) language stop-words such as articles, punctuation, prepositions, pronouns, and conjunctions (the, a, on, of, etc.), (ii) location stop-words (city and country names, etc.), (iii) time stop-words such as names of days and months (May, June, Monday, Today, etc.), and (iv) numeral stop-words such as hundred and thousand. After text cleaning, common interests are obtained from each corpus one by one using Algorithm 1. Figures 3–5 illustrate the interest categories extracted from the job, marital, and education fields.
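The cleaning step can be sketched with the standard library alone. The stop-word sets below are deliberately tiny, hypothetical placeholders for the four categories named above; a real pipeline would use NLTK's stop-word corpus and proper gazetteers of place and date names instead.

```python
import re

# Illustrative stop-word sets, one per category named in the text (assumptions).
LANGUAGE = {"a", "an", "the", "on", "of", "in", "at", "for", "to", "and", "or",
            "but", "i", "you", "he", "she", "it", "we", "they"}
LOCATION = {"lisbon", "porto", "portugal", "islamabad", "pakistan"}
TIME = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday",
        "sunday", "january", "february", "march", "april", "may", "june", "july",
        "august", "september", "october", "november", "december", "today"}
NUMERALS = {"hundred", "thousand", "million"}
STOPWORDS = LANGUAGE | LOCATION | TIME | NUMERALS


def clean_text(text):
    """Lowercase the text, drop punctuation, and remove all four stop-word types."""
    # Keep letters, digits, and hyphens so values such as "blue-collar" survive.
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


print(clean_text("The blue-collar worker, in Lisbon, on Monday!"))
# ['blue-collar', 'worker']
```

Keeping hyphens inside tokens matters for this dataset because several job values are hyphenated compound words.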
Based on these interests, Algorithm 1 creates interest-based user clusters. In Figure 6, the y-axis shows the cluster numbers and the x-axis shows the interests. For example, unemployed persons are in cluster 0, those who work in services are in cluster 1, those in management jobs are in cluster 2, and so on. In Figure 6, clusters 0 to 11 are based on jobs, whereas in Figure 7, clusters are based on marital status: married persons are in cluster 12, unmarried persons in cluster 13, and divorced persons in cluster 14. Similarly, in Figure 8, clusters are based on education; for example, persons with a primary pass are in cluster 15, and persons whose education is unknown are in cluster 18.

Step 2. Parsing manager
When the parser manager receives the owner's message M, it first applies the NER function and searches for Person, Location, Date, Time, and Organization entities. If the parser finds interest attributes, it sends them to the RCM (Rule Creation Manager), and if the RCM finds the related clusters, it sends these clusters for rule creation. If no attributes are found, the parser manager applies another NLP function that tokenizes message M; from these tokens, proper nouns and nouns are extracted and sent to the RCM for further processing. For example, consider the received message "Married person party on Monday." The NLP and NER function results are shown in Figures 9 and 10. The selected interest attributes are married, person, and party, and the parser sends these interest attributes to the RCM.
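The two-stage behaviour of the parser can be sketched as follows. The `ner` argument is a hypothetical pluggable callable (standing in for an NER library) that returns entity strings counted as interest attributes, and the stop-word filter in the fallback is a crude stand-in for the noun and proper-noun extraction described above, not a part-of-speech tagger.

```python
import re

# Illustrative stop-word set (articles, prepositions, weekday names); an assumption.
STOPWORDS = {"a", "an", "the", "on", "for", "of", "in", "at", "to", "and",
             "monday", "tuesday", "wednesday", "thursday", "friday",
             "saturday", "sunday"}


def parse_message(message, ner=None):
    """Return interest attributes: NER results first, token-based fallback second."""
    # Stage 1: a pluggable named-entity recognizer (hypothetical callable that
    # returns a list of entity strings); its output is used when non-empty.
    entities = ner(message) if ner else []
    if entities:
        return [entity.lower() for entity in entities]
    # Stage 2: tokenize, lowercase, and keep content words. This stop-word filter
    # is a crude stand-in for noun / proper-noun extraction.
    tokens = re.findall(r"[a-z0-9-]+", message.lower())
    return [token for token in tokens if token not in STOPWORDS]


print(parse_message("Married person party on Monday."))
# ['married', 'person', 'party']
```

With `ner` omitted, the fallback reproduces the attributes listed for the first example message.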
In another example, the message "Interviews for primary pass candidates on Friday" is received. The NLP and NER function results are shown in Figures 11 and 12. The selected interest attributes are interviews, primary, pass, and candidates, and the parser sends these interest attributes to the RCM.

Step 3. Rule creation manager
The purpose of the RCM is to create a policy for clusters based on the relationship between the selected cluster and the message. The RCM sends cluster requests to the CM based on the attributes received from the parser. In example 1, the RCM receives three interest attributes: married, person, and party. After cluster matching, the RCM obtains cluster 12 (FC12), which was created from married users. The RCM creates a permit rule for FC12 and tags the message to FC12. In example 2, the RCM receives four interest attributes: interviews, primary, pass, and candidates. After cluster matching, the RCM obtains cluster 15 (FC15), which was created from users with a primary pass. The RCM creates a permit rule for FC15 and tags the message to FC15.

4.1. Training and Testing of KNN Matching Function

For training, we use the "Bank Marketing" dataset obtained from a well-known data science website [65]. The columns (fields) selected from the dataset are shown in Table 1. The fields of interest are three columns (job, marital, and education), which are the most closely related to OSN user profile data. The training and testing of the KNN matching function consist of the following steps:
(i) Import and preprocess the dataset.
(ii) Load the KNN algorithm from the scikit-learn package.
(iii) Divide the dataset into training and test data.
(iv) Find the k nearest neighbors.
(v) Train the model on the training data.
(vi) Evaluate the accuracy.

In step 1, the required dataset is imported, the relevant columns (fields) are selected, and these columns are converted into numeric codes using label encoding from scikit-learn. In step 2, the KNN package is imported from scikit-learn. In step 3, the dataset is divided into training and test splits so that over-fitting can be detected and we can better understand how our KNN function performs in the testing stage; in this way, the function is tested on unseen data. We use 20% of the data for testing and 80% for training. In step 4, the system finds the k nearest neighbor points; in step 5, we train the model on the chosen dataset; and finally, in step 6, we evaluate the accuracy on the training data and the test data. Figure 13 shows the accuracy obtained for different numbers of neighbors on the dataset described in Table 1: the x-axis shows the number of neighbors (n_neighbors), the y-axis shows the accuracy, the orange line shows the training accuracy, and the blue line shows the test accuracy.
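The six steps above can be sketched without third-party packages, which keeps the example self-contained. In scikit-learn, the corresponding pieces would be `LabelEncoder`, `train_test_split(..., test_size=0.2)`, and `KNeighborsClassifier(n_neighbors=k)`. The target column (`deposit`) is an assumption, since the text does not state which label the KNN predicts, and the toy rows are illustrative, not the Bank Marketing data.

```python
import math
import random
from collections import Counter


def label_encode(rows, fields):
    """Map each categorical value to an integer code per field (sorted order)."""
    codes = {f: {v: i for i, v in enumerate(sorted({r[f] for r in rows}))} for f in fields}
    return [tuple(codes[f][r[f]] for f in fields) for r in rows]


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, samples, k):
    return sum(knn_predict(train, x, k) == y for x, y in samples) / len(samples)


def accuracy_curve(rows, fields, target, ks, test_fraction=0.2, seed=0):
    """Return {k: (train_accuracy, test_accuracy)} after an 80/20 split by default."""
    samples = list(zip(label_encode(rows, fields), (r[target] for r in rows)))
    random.Random(seed).shuffle(samples)
    n_test = int(len(samples) * test_fraction)
    test, train = samples[:n_test], samples[n_test:]
    return {k: (accuracy(train, train, k), accuracy(train, test, k)) for k in ks}


# Toy rows for illustration only (not the Bank Marketing data).
prototypes = [
    {"job": "management", "marital": "married", "education": "tertiary", "deposit": "yes"},
    {"job": "services", "marital": "single", "education": "secondary", "deposit": "no"},
    {"job": "unemployed", "marital": "divorced", "education": "primary", "deposit": "no"},
]
rows = [dict(p) for p in prototypes for _ in range(10)]
curve = accuracy_curve(rows, ["job", "marital", "education"], "deposit", ks=[1, 3, 5])
print(curve[1])  # (1.0, 1.0)
```

Plotting the two values of `curve` against `k` gives the kind of training/test accuracy curve described for Figure 13; on this toy data both stay at 1.0 because every test row has exact duplicates in the training split.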

5. Conclusion and Future Work

This research article presented a dynamic and automated access control system for textual resources and publications in OSNs. Our proposed system is content-driven: the semantics of each message are automatically evaluated to identify its interests and to select the user clusters (FCs) based on these interests. The system then creates a permit rule for the selected user clusters based on the relationship between the message and the FC. Delegation is automatically permitted or denied based on three criteria: (i) the requester must be a member of a related FC, (ii) the delegatee and the message must have the same relationship as the message and the FC, and (iii) the delegatee and the owner must have the same relationship as the FC and the owner. If all three criteria are met, delegation is permitted; otherwise, it is denied. We have implemented and evaluated the clustering, parsing, and rule creation modules as an initial test of our proposed concept. As future work, the delegation module will be implemented and tested, and the interest selection and rule creation procedures will be improved.

Data Availability

The Bank Marketing dataset used to support the findings of this study is freely available from the Kaggle website at the following link: https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors acknowledge Foundation University Islamabad for its support to conduct this research work.