A Dynamic Recommender System for Improved Web Usage Mining and CRM Using Swarm Intelligence

Alphy, Anna; Prabakaran, S.

doi:https://doi.org/10.1155/2015/193631

The Scientific World Journal

Research Article | Open Access

Volume 2015 | Article ID 193631 | https://doi.org/10.1155/2015/193631

Anna Alphy¹ and S. Prabakaran¹

Academic Editor: Rafael Valencia-García

Received 06 Mar 2015

Revised 13 Apr 2015

Accepted 15 Apr 2015

Published 01 Jul 2015

Abstract

To enrich e-business, modern websites are personalized for each user by understanding the user's interests and behavior. The main challenges of online usage data are information overload and its dynamic nature. To address these issues, this paper proposes WebBluegillRecom-annealing, a dynamic recommender system that uses web usage mining techniques in tandem with software agents to provide dynamic recommendations that can be used for customizing a website. The proposed WebBluegillRecom-annealing dynamic recommender uses swarm intelligence inspired by the foraging behavior of the bluegill fish. It overcomes information overload by handling the dynamic behaviors of users. Our dynamic recommender system was compared against traditional collaborative filtering systems. The results show that the proposed system has higher precision, coverage, F1 measure, and scalability than the traditional collaborative filtering systems. Moreover, the recommendations given by our system overcome the overspecialization problem by including variety in recommendations.

1. Introduction

Customer relationship management (CRM) entails the interaction of an organization with its current and future customers. Competition in e-business requires efficient management of web usage data because a competitor's website may be only one click away. An improved understanding of customers' interests and behaviors increases the profit of an organization. A website personalized to a customer's interests may draw more of the customer's attention to the site and thus increase customer utility. Information about a customer's interests and behavior also helps a website administrator to personalize or customize a web page for a user. The increased usage of business websites creates a huge amount of web usage information to manage, causing information overload. To manage this information overload, efficient data mining techniques can be applied in addition to storing, retrieving, and managing the web usage data. These data mining techniques may also be used to identify interesting patterns in web log data or online usage data.

The major challenges of online web usage data, in addition to information overload, are its high dimensionality and its dynamic nature, both caused by thousands of users. The online usage data is high dimensional because it contains a huge number of clicks made by users to purchase items. It also represents the interests of human beings, which are highly dynamic. These dynamic behaviors may be due to changes in a user's interests or to the addition or deletion of web pages in a website. Personalization of the web for a user should also cope with these issues.

Designing and developing a suitable recommender system can be very helpful in web personalization. Web personalization uses the recommendations provided by the recommender system to present users with their items of interest. Much research has been done on recommender systems, but most traditional recommender systems cannot handle the dynamic nature of online usage data.

Moreover, traditional recommender systems give limited recommendations. The number of iterations before convergence is high, and the quality of recommendations decreases as the number of users increases. Traditional recommender systems also cannot balance quality measures such as coverage and precision.

To overcome the above issues, we propose the WebBluegillRecom-annealing dynamic recommender system. The proposed system uses a swarm intelligence approach. That is, recommendations are based not only on a user's own interests but also on the interests of the neighboring users. Our dynamic recommender system also overcomes the overspecialization problem of many traditional recommender systems by providing variety in recommendations.

The performance of the proposed algorithm is compared with that of traditional collaborative filtering recommender systems. The results of the performance evaluation show that the proposed dynamic recommender system gives better predictions in less time without losing quality in terms of coverage, precision, F1 measure, and scalability, compared with the traditional approaches. The WebBluegillRecom-annealing recommender system overcomes the information overload and dynamic behavior challenges faced by many other recommendation systems.

The rest of the paper is organized as follows. Section 2 describes related work on recommender systems. Section 3 provides the required background. Section 4 introduces the proposed WebBluegillRecom-annealing recommender system, and Section 5 describes the experimental results and performance evaluation.

2. Literature Survey

Ben Schafer et al. [1] introduced a collaborative filtering (CF) system that predicts a person's interest in an item by combining that person's recorded interests with the recorded interests of a community of like-minded people. In the collaborative filtering approach, the neighbors of an active user are the users whose similarity to the active user is above a similarity threshold. Collaborative filtering supports group modeling. It gives recommendations based on items liked by users with the same interests. Dai et al. [2] introduced the particle swarm chaos optimization mining algorithm (PSCOMA). It uses the strong global search ability of PSO and the strong local search ability of chaos optimization for web usage mining, and it balances exploration and exploitation. Çelik et al. [3] introduced the artificial bee colony data miner (ABC-Miner algorithm) for mining classification rules from large datasets. It uses the intelligent foraging behavior of honey bees.

Balabanovic [4] introduced Fab, an adaptive content based recommendation system. It gives web page recommendations to users based on the recommendations of other users and also by analyzing page content. The main limitation of this approach is that, since the recommendations are based only on users' previous ratings of items, users cannot explore new items other than the items mentioned in user profiles. The main advantage of this technique is that, since group modeling is not used for recommendations, brand new items can be easily added to the recommendations. Pazzani [5] recommends items to users based on demographic information about the users. Demographic attributes such as age, gender, and education are used to classify users, and predictions are given to a user based on this demographic information. Nasraoui and Petenes [6] presented a fuzzy inference engine that gives recommendations using rules derived from user profiles. The user profiles are created by clustering the users' web click streams. The main advantages of this approach are its very low cost compared to collaborative recommendation systems, its efficient handling of overlaps in users' interests, and its low main memory requirement at recommendation time.

Berners-Lee et al. [7] introduced the semantic web, an extension of the World Wide Web. In the semantic web the meaning of web pages is well defined and structured in such a way that computers and humans can work in cooperation. The semantic web creates an environment in which agents move freely from page to page and bring essential information to the users. Chau et al. [8] introduced a multiagent system named collaborative spider to support user collaboration in web mining. It supports collaboration by sharing complete search sessions based on postretrieval analysis.

Dorigo and Stützle [9] introduced ant colony optimization (ACO), in which the ants' coordinated actions and self-organizing principles are used to solve computational problems. ACO is inspired by the foraging behavior of ant colonies. Labroche et al. [10] introduced the AntClust algorithm for grouping web usage sessions using the chemical recognition system of artificial ants. AntClust is inspired by the ability of ants to differentiate between nest mates and outsiders through the exchange of chemicals. AntClust computes the similarity between objects and groups the input web user sessions, which represent the number of hits per page, into clusters. Users with similar interests fall into the same cluster. Kennedy and Eberhart [11] presented particle swarm optimization (PSO), an evolutionary swarm intelligence based computational model inspired by bird flocks. Here, each swarm represents a solution set. The particles of the swarm fly through the solution space, and each position of a particle in the problem space represents a solution. At each move a fitness function is evaluated to determine how close the particle's solution is to the global optimal solution. The global best solution (gbest) is the best solution found in the neighborhood, and the personal best solution (pbest) is the best position visited by the particle so far; both are used to find the particle's new position.

Moawad et al. [12] introduced a new multiagent system based approach for personalizing web search results. In this approach dynamic user profiles are created and maintained through an implicit user feedback system. Saka and Nasraoui [13] introduced a flocks of agents based recommendation system (Flock-Recom) that gives recommendations to users for web personalization. It is inspired by the collaborative behavior of flocks of birds. Each agent represents a user. Agents move freely in the visualization panel and iteratively adjust their velocity and position. Based on the neighboring agents on the visualization panel, top-N recommendations are given to the user.

3. Background

3.1. Web Usage Mining

Web usage mining is the process of applying data mining techniques to web log data to discover interesting usage patterns [14]. It consists of the following steps:
(1) Preprocessing the web log files.
(2) Pattern discovery using data mining techniques [14–17].
(3) Postprocessing.
(4) Tracking evolving user profiles [18].

3.1.1. Preprocessing the Web Log Files

Each entry in the web log files consists of the IP address, the URL viewed, and the access time. The web log files extracted from the web server contain a huge amount of information, not all of which is needed for further processing. The quality of the patterns discovered by web usage mining depends on how well data cleaning and user session identification are performed. Data cleaning includes filtering out crawler requests and requests for graphics, and identifying unique sessions. User session identification includes identifying the pages referenced by a user during a single visit to a site.
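The cleaning and session identification steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the user-agent field, the list of graphic suffixes, the crawler hints, and the 30-minute inactivity timeout are all assumptions introduced here.

```python
from datetime import datetime, timedelta

# Hypothetical filters: these suffixes and user-agent hints are illustrative assumptions.
GRAPHIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
CRAWLER_HINTS = ("bot", "crawler", "spider")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity gap, not taken from the paper


def clean(entries):
    """Drop requests for graphics and requests issued by crawlers."""
    kept = []
    for ip, url, when, user_agent in entries:
        if url.lower().endswith(GRAPHIC_SUFFIXES):
            continue
        if any(hint in user_agent.lower() for hint in CRAWLER_HINTS):
            continue
        kept.append((ip, url, when))
    return kept


def sessionize(entries):
    """Group cleaned entries into per-visit URL lists, keyed by IP address."""
    sessions = []
    last_seen = {}  # ip -> (time of last request, index of the open session)
    for ip, url, when in sorted(entries, key=lambda e: e[2]):
        previous = last_seen.get(ip)
        if previous is None or when - previous[0] > SESSION_TIMEOUT:
            sessions.append([])            # start a new visit
            index = len(sessions) - 1
        else:
            index = previous[1]            # continue the open visit
        sessions[index].append(url)
        last_seen[ip] = (when, index)
    return sessions


t0 = datetime(2015, 3, 6, 10, 0)
log = [
    ("10.0.0.1", "/index.html", t0, "Mozilla/5.0"),
    ("10.0.0.1", "/logo.png", t0 + timedelta(seconds=1), "Mozilla/5.0"),
    ("10.0.0.2", "/index.html", t0 + timedelta(minutes=1), "ExampleBot/1.0"),
    ("10.0.0.1", "/products.html", t0 + timedelta(minutes=5), "Mozilla/5.0"),
    ("10.0.0.1", "/index.html", t0 + timedelta(hours=2), "Mozilla/5.0"),
]
sessions = sessionize(clean(log))
```

In this toy log the image request and the crawler request are discarded, and the two-hour gap splits the remaining requests from the same IP address into two visits.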

3.1.2. Pattern Discovery Using Data Mining Techniques

Once the user sessions are identified, various data mining methods such as frequent item sets, clustering, classification, association rule mining, path analysis, neural network approaches, and heuristic approaches can be applied to extract useful patterns from web log files. These discovered patterns identify users' interests, behavior, habits, and changes in their interests. A website can be personalized or customized for a user based on these pieces of information, thereby increasing the profit of an organization.

3.1.3. Postprocessing and Tracking Evolving User Profiles

User session categories [18] are summarized into user profiles. Tracking evolving user profiles involves comparing the user profiles generated in different months. This helps in identifying new groups of user profiles, merged or split user profiles, and inactive user profiles. All these changes in user profiles represent changes in customers' interests or behaviors.

3.2. Swarm Intelligence

Swarm intelligence gains inspiration from several communities in nature such as fish schools, ant colonies, honey bees, and bird flocks. Swarm intelligence uses intelligent agents to handle copious information. An agent perceives the environment through sensors and acts on the environment through actuators [19]. Intelligent agents can continuously perceive the dynamic conditions in the environment, perform actions that affect those conditions, and reason to interpret their perceptions. The flexibility of software agents makes it possible to choose dynamically which actions to perform, and in what sequence, in response to the state of the external environment.

3.3. Simulated Annealing

Simulated annealing [20] provides a near-optimal solution for the nearest neighbor search. Annealing is the process of heating a metal to its melting point and then cooling it back into a solid state. The final structure of the metal depends on the cooling function. Slow cooling results in large crystals with low energy, whereas fast cooling results in a high energy state with imperfections. Slow cooling therefore generally gives a better result.

4. The Proposed System

In the past, much research was done on swarm intelligence for web usage mining, such as AntClust [10], particle swarm optimization (PSO) [11], Fab [4], collaborative filtering (CF) [1], and so forth. None of these methods can model the dynamic behavior of users efficiently, and the recommendations they give cannot handle seasonality in users' interests. Because of the information overload problem in web usage data, many traditional swarm intelligence methods such as ACO and PSO need a high number of iterations before convergence.

In the present work, we propose the WebBluegillRecom-annealing dynamic recommender system. It uses simulated annealing and swarm intelligence to identify the interesting items to recommend to users. The WebBluegillRecom-annealing algorithm is inspired by the foraging behavior of bluegill fish. Swarm intelligence uses intelligent agents to handle abundant information on the web, thereby increasing scalability. Here, intelligent software agents are used to model artificial life. Intelligent software agents can handle the dynamic nature of online usage, thereby overcoming the information overload problem. This flexibility permits the artificial bluegill fish to model the foraging behavior of real bluegill fish at different densities of prey in water. The learning capability of software agents allows continuous monitoring of users' dynamic behaviors and gives predictions.

The proposed WebBluegillRecom-annealing algorithm uses a cooling schema to bring all agents into a stable state. The cooling algorithm is based on the simulated annealing approach. The cooling schema reduces the number of iterations required for the agents to enter a stable low energy state. Figure 1 shows the steps involved in the proposed dynamic recommender system.

In the proposed WebBluegillRecom-annealing dynamic recommender system, each user obtained after the data cleaning process is initially mapped to an agent. The agents are placed randomly on the 2D visualization panel. A cooling algorithm is then applied to bring similar agents nearer to each other in the visualization panel. This gives an initial neighborhood for the agents. A better neighborhood is formed in each iteration of the algorithm by adjusting the positions of the agents on the visualization panel. That is, users that exhibit similar behavior form a hinterland. To handle dynamic data that is collected continuously and to improve the quality of the neighborhood, a dynamic clustering technique is applied. Recommendations are given to a user as the best items preferred by the user's latest neighborhood.

The proposed WebBluegillRecom-annealing recommender system can handle the following challenges of web usage mining: the information overload problem, the dynamic behavior of users, the large number of iterations before convergence, scalability, and overspecialization in recommendations.

4.1. Preprocessing of Web Logs to Extract Input User Sessions

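Web server log files are preprocessed and input user sessions are identified. Here the $i$th user session is encoded as an $N_U$-dimensional binary attribute vector [21] User_i, where $N_U$ is the number of distinct URLs, with the following property:

$$\mathrm{User}_i[j] = \begin{cases} 1 & \text{if the user accessed the } j\text{th URL during the } i\text{th session,} \\ 0 & \text{otherwise.} \end{cases} \quad (1)$$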
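One common reading of this encoding is that component j of a session vector is 1 when the session visits URL j and 0 otherwise. The sketch below follows that reading; the function name `encode_sessions` and the alphabetical ordering of URLs are assumptions made for illustration.

```python
def encode_sessions(sessions):
    """Return (url_index, vectors): one 0/1 vector per session over all distinct URLs."""
    urls = sorted({url for session in sessions for url in session})
    url_index = {url: j for j, url in enumerate(urls)}  # URL -> vector component
    vectors = []
    for session in sessions:
        vector = [0] * len(urls)
        for url in session:
            vector[url_index[url]] = 1  # repeated visits still yield a single 1
        vectors.append(vector)
    return url_index, vectors


sessions = [
    ["/index.html", "/products.html"],
    ["/index.html", "/cart.html", "/index.html"],
]
url_index, vectors = encode_sessions(sessions)
```

Because the vectors are binary, the number of visits to a page within a session is deliberately discarded; only the fact of a visit is kept.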

4.2. User Profile Creation Based on Data Mining Techniques

In this paper, a dynamic clustering based data mining technique is used to discover interesting online usage patterns. Unlike conventional clustering, in dynamic clustering [22], the whole input data need not be made available initially. The input data is collected continuously over time. Dynamic clustering technique has the ability to manage incoming dynamic data that represents the dynamic behaviors of users. Dynamic behaviors of users are due to the changes in user’s interest or behavior or due to the addition or deletion of a web page. These discovered patterns can be used for the creation of user profiles and for giving recommendation to users.

4.2.1. The Proposed WebBluegillRecom-Annealing Algorithm

In the proposed WebBluegillRecom-annealing algorithm (Algorithm 1), each user is mapped to an agent. All the agents are placed on the visualization panel randomly. To bring similar agents closer and dissimilar agents far apart, a cooling algorithm (Algorithm 2) is applied. Then the clusters of agents are formed using cluster-creation algorithm (Algorithm 3). It groups similar agents into the same cluster. That is, users having similar interests belong to the same cluster. This initial set of clusters can be used for further processing. These initial clusters are given as input to the Bluegill-BestPredictions algorithm (Algorithm 4). The Bluegill-BestPredictions algorithm can optimize these initial clusters by identifying a better neighborhood for agents in each cluster forming another hinterland. Moreover, it can assign new dynamic data representing a new dynamic behavior of user to the most similar cluster. It performs dynamic clustering of dynamic data and gives the users the finest recommendations by predicting the best items preferred by the neighborhood agents. Bluegill-BestPredictions algorithm gives dynamic recommendations to users. Since the recommendations are dynamic, the WebBluegillRecom-annealing algorithm can satisfy the needs of old and new users. The following part explains each of these algorithms in detail.

Algorithm 1: WebBluegillRecom-annealing.
Notations used: D = dataset; n = number of data items; R^d = d-dimensional real number space.
Input: dataset D = {x_1, ..., x_n}, where x_i ∈ R^d
Output: clusters, best recommendations
Steps:
(1) Repeat
(2)   read a new input record data //reads dynamic data
(3)   if data is not assigned an agent
(4)     Assign new agent to data
(5)   Else
(6)     Identify the agent corresponding to data
(7)   end if
(8) until no more data records
(9) Place the agents randomly on the visualization panel
(10) Run Cooling algorithm //returns the best position for the agents in the visualization panel
(11) Visualize agents with their updated positions in the visualization panel
(12) Run Cluster-creation algorithm //returns initial clusters
(13) While () do
(14)   Run Bluegill-BestPredictions algorithm //performs dynamic clustering and returns the best recommendations for new and old users
(15) End While
(16) Until stopping condition

Algorithm 2: Cooling algorithm.
Notations used: Ω₀ = agents plus their position on the visualization plane; TL = temperature length; Ω_F = agents plus their updated position on the visualization plane; A_i = ith agent on the visualization panel; Pos(A_i) = position of agent A_i on the visualization plane; n = total number of agents on the visualization plane.
Input: A_i where i = 1 to n, Pos(A_i), TL
Output: Ω_F
Steps:
(1) Generate an initial solution Ω₀, Ω_F = Ω₀
(2) Generate initial temperature T
(3) Repeat
(4)   For k = 1 to TL
(5)     Identify agent neighboring Ω, giving a new solution Ω'
(6)     Let Δ = Cost(Ω') − Cost(Ω) //where Cost is computed from the cosine similarity (CS) of agents
(7)     If Δ ≤ 0 //downhill move
(8)       Ω = Ω'
(9)     Else
(10)      IF random(0, 1) < e^(−Δ/T) then Ω = Ω'
(11)    End if
(12)    Increment k by one.
(13)  End for
(14)  T = αT //reduce temperature
(15) until stopping condition
(16) Return Ω_F

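Algorithm 3: Cluster-creation algorithm.
Notations used: Pos(Agent) = position of agent on the visualization plane; D_th = distance threshold.
Input: Agent, Pos(Agent)
Output: Clusters
Steps:
(1) read the agent Agent_1
(2) assign the agent Agent_1 to cluster C_1
(3) for all agents Agent_j do
(4)   if Distance(Agent_i, Agent_j) < D_th and the mean squared error < its threshold
(5)     assign Agent_j to the cluster of Agent_i
(6)   Else
(7)     assign Agent_j to a new cluster
(8)   End if
(9) End for
(10) Return Clusters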

Algorithm 4: Bluegill-BestPredictions.
Input: Extracted Clusters C, newdata: data_a
Output: Extracted Clusters C, Visualization of Clusters, Recommendations to users.
Steps:
(1) REPEAT
(2)   Map each cluster centroid as Lake_i and define N as the number of clusters. //Now N Lakes
(3)   If data_a is not assigned an agent
(4)     Then map agent_a to data_a //agent_a represents the bluegill.
(5)   Else
(6)     Identify the agent agent_a corresponding to data_a
(7)   End if
(8)   FOR i = 1 to N
(9)     IF Sim(agent_a, Lake_i) > Maxth where i ranges from 1 to N //bluegill enters the lake containing high density of prey and identifies neighboring prey
(10)      Repeat
(11)        If Sim(agent_a, neighbor_agent) ∈ high_range //bluegill eats large size prey
(12)          Then Eat neighbor_agent
(13)          nosh = Sim(agent_a, neighbor_agent)
(14)          Update energy = Sum(nosh)
(15)          Increment the neighbor index by one
(16)        Else
(17)          Increment the neighbor index by one
(18)        End if
(19)      Until energy = 1
(20)    ELSE IF Sim(agent_a, Lake_i) < Minth where i ranges from 1 to N //bluegill enters the lake containing low density of prey and identifies the neighboring prey
(21)      Repeat
(22)        If Sim(agent_a, neighbor_agent) ∈ high_range //bluegill eats large size prey
(23)          Then Eat neighbor_agent
(24)          nosh = Sim(agent_a, neighbor_agent)
(25)          Update energy = Sum(nosh)
(26)          Increment the neighbor index by one
(27)        Else
(28)          Break;
(29)        End if
(30)      Until energy = 1
(31)      Repeat
(32)        If Sim(agent_a, neighbor_agent) ∈ mid_range //bluegill eats medium size prey
(33)          Then Eat neighbor_agent
(34)          nosh = Sim(agent_a, neighbor_agent)
(35)          Update energy = Sum(nosh)
(36)          Increment the neighbor index by one
(37)        Else
(38)          Break;
(39)        End if
(40)      Until energy = 1
(41)      Repeat //bluegill eats small size prey
(42)        Eat neighbor_agent
(43)        nosh = Sim(agent_a, neighbor_agent)
(44)        Update energy = Sum(nosh)
(45)        Increment the neighbor index by one
(46)      Until energy = 1
(47)    ELSE //bluegill enters a lake containing medium density of prey and identifies neighboring prey
(48)      Repeat
(49)        If Sim(agent_a, neighbor_agent) ∈ high_range //bluegill eats large size prey
(50)          Then Eat neighbor_agent
(51)          nosh = Sim(agent_a, neighbor_agent)
(52)          Update energy = Sum(nosh)
(53)          Increment the neighbor index by one
(54)        Else
(55)          Break;
(56)        End if
(57)      Until energy = 1
(58)      Repeat
(59)        If Sim(agent_a, neighbor_agent) ∈ mid_range //bluegill eats medium size prey
(60)          Then Eat neighbor_agent
(61)          nosh = Sim(agent_a, neighbor_agent)
(62)          Update energy = Sum(nosh)
(63)          Increment the neighbor index by one
(64)        Else
(65)          Break;
(66)        End if
(67)      Until energy = 1
(68)    END IF
(69)  END FOR
(70)  Keep the clusters with Clus_threshold sessions.
(71)  Plot Clusters on visualization panel according to cluster names //visualize clusters in x, y coordinates where different cluster elements are represented in different shapes.
(72)  For each cluster do
(73)    Find the URLs which are visited more than Url_count_threshold in all the sessions of that cluster //display URLs accessed frequently in all the sessions of that cluster as user profiles
(74)  End For
(75)  For each agent agent_j do
(76)    Find the cluster to which agent_j belongs.
(77)    Neighbors(agent_j) = agents of agent_j's cluster within the neighborhood distance
(78)    Items(agent_j) = preferred items by each of the agents in Neighbors(agent_j)
(79)    Recommend the most frequent items in Items(agent_j) to the user represented by agent_j
(80)  End For
(81) UNTIL there is no more data records
(82) End

In the proposed WebBluegillRecom-annealing algorithm, each user is initially mapped to an agent. Agents are placed on the visualization panel randomly. The visualization panel is a two-dimensional plane represented by x-y coordinates. The x-axis and y-axis values range from 0 to 1.

To bring similar agents closer and dissimilar agents far apart in the visualization panel, we use a cooling algorithm based on the annealing concept used in metals. The attributes of this thermodynamic simulation can be mapped onto simulated annealing optimization: a system state represents a feasible solution, energy represents cost, a change of state represents the neighboring function, temperature represents the control parameter, and the frozen state represents the final solution. Initially, when the temperature is high, the algorithm accepts bad moves, because the starting solution may be poor and it would otherwise be difficult to escape from its neighborhood. When the temperature is low, it rejects almost all bad moves. The best result found is kept as the final solution.

The inputs to the cooling algorithm (Algorithm 2) are the agents, their positions on the visualization panel, and the temperature length. In Step (1) of Algorithm 2, an initial solution Ω₀ is generated randomly and assigned as the final solution. In Step (2), the initial temperature value is generated. In line 5, a new solution is created by selecting a neighboring agent that is similar to the current solution. The cost function is calculated using the cosine similarity (CS) of agents, and the objective is to maximize this similarity. Cosine similarity between any two agents represents the similarity between the users that are mapped to those agents. Cosine similarity can handle qualitative and quantitative data. It can also handle high dimensional sparse data. In Step (6), the change in the cost function is calculated. Line 7 checks whether the cost (energy) has decreased. If so, the new state is accepted (line 8). Otherwise, the new state is accepted with probability $e^{-\Delta/T}$, where $\Delta$ is the change in cost and $T$ is the current temperature (line 10). In line 14, a geometric temperature reduction is used, in which the temperature is multiplied by a constant factor between 0 and 1. To get good results, the temperature length TL should be adjusted so that a small number of iterations is run at higher temperatures and a larger number at low temperatures. For the final solution to be independent of the starting one, the initial temperature should be high enough. When the temperature is low there are no uphill moves. The algorithm stops when a given minimum temperature is reached or when a certain amount of looping has been performed without accepting a new solution (line 15).

The cooling algorithm is a greedy heuristic that allows the agents to move from their current positions to the best neighboring solution. It returns the agents with their new positions on the visualization panel, so that similar agents now lie close together. As the distance between agents on the visualization panel increases, their similarity decreases. To avoid local minima, the algorithm supports uphill moves. After the cooling function is applied, agents converge to a frozen low energy state in which similar agents are located near each other. The use of the cooling algorithm reduces the number of iterations. The agents, with their updated positions on the visualization panel, are given as input to the cluster-creation algorithm.
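The cooling procedure described above can be illustrated with a small simulated annealing loop. This is a sketch under stated assumptions rather than the authors' code: the cost function (squared mismatch between panel distance and one minus cosine similarity), the neighbor move (nudging one agent), and the default parameter values `t0`, `alpha`, `temperature_length`, and `t_min` are all hypothetical choices made for illustration.

```python
import math
import random


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def layout_cost(positions, vectors):
    """Assumed cost: similar agents should be close, dissimilar agents far apart."""
    cost = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            target = 1.0 - cosine_similarity(vectors[i], vectors[j])
            cost += (math.dist(positions[i], positions[j]) - target) ** 2
    return cost


def cool(vectors, t0=1.0, alpha=0.9, temperature_length=50, t_min=1e-3, seed=0):
    rng = random.Random(seed)
    current = [(rng.random(), rng.random()) for _ in vectors]  # random initial placement
    current_cost = layout_cost(current, vectors)
    initial_cost = current_cost
    best, best_cost = list(current), current_cost  # best result found so far
    t = t0
    while t > t_min:                                # stop at a minimum temperature
        for _ in range(temperature_length):
            i = rng.randrange(len(current))
            x, y = current[i]
            # Neighboring solution: nudge one agent, clamped to the unit square.
            moved = (min(1.0, max(0.0, x + rng.uniform(-0.1, 0.1))),
                     min(1.0, max(0.0, y + rng.uniform(-0.1, 0.1))))
            candidate = current[:i] + [moved] + current[i + 1:]
            delta = layout_cost(candidate, vectors) - current_cost
            # Downhill moves are always taken; uphill moves with probability e^(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        t *= alpha                                  # geometric temperature reduction
    return best, initial_cost, best_cost


vectors = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
positions, initial_cost, final_cost = cool(vectors)
```

Keeping the best layout seen so far means the returned cost can never exceed the cost of the random starting layout, even though individual uphill moves are accepted along the way.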

In the cluster-creation algorithm (Algorithm 3), Distance(Agent_i, Agent_j) represents the distance between agents Agent_i and Agent_j on the visualization panel. $\sigma_i^2$ represents the mean squared error, or average dissimilarity, between the cluster prototype and the data records [21]. The mean squared error is calculated using

$$\sigma_i^2 = \frac{\sum_{s^{(j)} \in \chi_i} d_{ij}^2}{|\chi_i|}, \quad (2)$$

where $s^{(j)}$ represents the $j$th user session and $\chi_i$ represents the set of sessions assigned to the $i$th cluster. $d_{ij}$ is the distance from $s^{(j)}$ to the prototype of the $i$th cluster. Initial clusters of agents are formed by grouping into one cluster the agents that lie within a distance threshold and whose mean squared error lies within its threshold (lines 4 to 6). Thus the agents within a cluster represent similar users. These clusters are given as input to the Bluegill-BestPredictions algorithm.
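A threshold-based grouping of this kind might look like the sketch below. It is only one plausible reading of the cluster-creation step: the greedy first-fit assignment, the use of the cluster centroid as prototype, and the parameter names `distance_threshold` and `mse_threshold` are assumptions, not details confirmed by the paper.

```python
import math


def create_clusters(positions, distance_threshold, mse_threshold):
    """Greedy grouping: join the first cluster that is close enough and stays compact."""
    clusters = []  # each cluster is a list of agent indices
    for agent, pos in enumerate(positions):
        placed = False
        for members in clusters:
            candidate = members + [agent]
            # Centroid of the cluster if this agent were added (assumed prototype).
            cx = sum(positions[m][0] for m in candidate) / len(candidate)
            cy = sum(positions[m][1] for m in candidate) / len(candidate)
            # Mean squared distance of the candidate members to that centroid.
            mse = sum(math.dist(positions[m], (cx, cy)) ** 2 for m in candidate) / len(candidate)
            near = any(math.dist(pos, positions[m]) < distance_threshold for m in members)
            if near and mse < mse_threshold:
                members.append(agent)
                placed = True
                break
        if not placed:
            clusters.append([agent])  # no suitable cluster: start a new one
    return clusters


positions = [(0.10, 0.10), (0.12, 0.11), (0.80, 0.85), (0.82, 0.80), (0.11, 0.14)]
clusters = create_clusters(positions, distance_threshold=0.1, mse_threshold=0.01)
```

With these toy positions, the three agents near the lower left corner end up in one cluster and the two agents near the upper right corner in another.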

In the Bluegill-BestPredictions algorithm (Algorithm 4), the foraging behavior of bluegill fish is used to give dynamic recommendations to users. A bluegill sunfish [23] eats prey so as to maximize its energy intake. At a high density of prey, the bluegill eats a diet made up of larger prey. At a medium density of prey, the bluegill prefers large prey over small prey and becomes more selective. That is, instead of spending time capturing and eating smaller prey, the bluegill maximizes its energy by eating larger ones. At a low density of prey, the bluegill eats large, medium, and small prey as they are encountered, thus maximizing its energy intake. The behavior of the bluegill is dynamic because the way in which it catches its prey differs with the density of prey. The bluegill maximizes its energy in an optimized way, that is, more energy in less time. These behaviors can be used to give dynamic recommendations to users. The proposed dynamic recommender system gives users better recommendations in less time, that is, in fewer iterations, without compromising quality in terms of coverage and precision. The use of intelligent software agents helps us to simulate this artificial life. The intelligent agents are flexible enough to adapt to the dynamic behavior of users.

The Bluegill-BestPredictions algorithm, described as Algorithm 4, supports dynamic clustering of dynamic data items and provides dynamic predictions to users. For dynamic clustering [19], the complete data set need not be made available initially. The data is continuously collected over time. Whenever new data is added, it requires a costly update of the clusters. New web click streams may be due to a change in a user's interests or to the addition or deletion of a web page from a given website. Applying the cooling algorithm and the cluster-creation algorithm yields the initial clusters in the visualization panel, with similar agents belonging to the same cluster. The distance between agents in the visualization panel represents the similarity between the users.

In line 2 of Algorithm 4, the centroid of each cluster obtained as the output of the cluster-creation algorithm is mapped as a lake. In line 3, data_a represents the input dynamic data, which is mapped to the corresponding agent in lines 4 and 6. The data_a is considered as the bluegill fish. Each cluster is considered as a lake with a different density of prey, and the agents in each cluster (lake) are considered as the prey. In line 9, the similarity of agent_a and Lake_i is computed using (3). Because an agent represents user sessions, we use the web session similarity [19, 21] between URLs as the similarity measure:

$$S_{kl} = \frac{\sum_{i}\sum_{j} s_i^{(k)} s_j^{(l)} S_u(i,j)}{\sum_{i} s_i^{(k)} \sum_{j} s_j^{(l)}}, \quad (3)$$

where $s^{(k)}$ and $s^{(l)}$ represent the user session vectors that are prepared in the preprocessing step and $S_u(i,j)$ represents the syntactic similarity between the $i$th and $j$th URLs. Consider

$$S_u(i,j) = \min\left(1, \frac{|p_i \cap p_j|}{\max\left(1, \max(|p_i|, |p_j|) - 1\right)}\right). \quad (4)$$

For a given URL $U_i$, $p_i$ represents the path traversed from the root node, which is the main page, to the node corresponding to the $i$th URL. $|p_i|$ indicates the length of this path [19].
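The path-based similarity can be illustrated with a short sketch. It assumes the path-overlap form used in the cited web usage mining literature: URL similarity is the shared path prefix divided by the longer path length minus one (capped at 1), and session similarity is the average URL similarity over all pairs of visited URLs. Representing a path as the list of URL segments is an assumption made here, as are the function names.

```python
def path_segments(url):
    """Assumed path representation: the URL's segments below the main page."""
    return [part for part in url.strip("/").split("/") if part]


def url_similarity(url_i, url_j):
    p_i, p_j = path_segments(url_i), path_segments(url_j)
    shared = 0
    for a, b in zip(p_i, p_j):  # length of the common path prefix
        if a != b:
            break
        shared += 1
    return min(1.0, shared / max(1, max(len(p_i), len(p_j)) - 1))


def session_similarity(s_k, s_l, urls):
    """Average URL similarity over all pairs of URLs visited in the two binary sessions."""
    total = sum(s_k) * sum(s_l)
    if total == 0:
        return 0.0
    weighted = sum(
        url_similarity(urls[i], urls[j])
        for i, a in enumerate(s_k) if a
        for j, b in enumerate(s_l) if b
    )
    return weighted / total


urls = ["/courses/cecs345", "/courses/cecs345/syllabus.html", "/research/labs/x"]
parent_child = url_similarity(urls[0], urls[1])
unrelated = url_similarity(urls[0], urls[2])
mixed = session_similarity([1, 1, 0], [0, 1, 1], urls)
```

Under this reading, a page and its child page are treated as fully similar, pages in different top-level directories are treated as dissimilar, and two sessions are similar to the extent that their pages sit close together in the site hierarchy.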

If the similarity of agent_a and Lake_i is greater than a predefined threshold (Maxth) value we assume that bluegill enters a lake containing high density of prey (line 9). In high density of prey, bluegill maximizes its energy by eating only large size prey. Here agent_a identifies the most similar neighboring agents (line 11) in that cluster until the stopping condition (line 18). The Eat algorithm (Algorithm 5) groups all these similar agents into a new cluster.

Algorithm 5: Eat.
Input: target cluster, agent
Output: updated target cluster
Steps:
(1) Read the agent from the cluster to which it belongs
(2) Move the agent to the target cluster
(3) Delete the agent from its original cluster
(4) End
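The steps of the Eat algorithm can be sketched in a few lines. This is an illustrative sketch only: representing clusters as a dictionary from a cluster identifier to a set of agent names is an assumption, as is the function name `eat`.

```python
from typing import Dict, Set


def eat(clusters: Dict[str, Set[str]], target: str, agent: str) -> None:
    """Move `agent` into cluster `target`, removing it from its current cluster."""
    # Step 1: read the agent from the cluster to which it belongs.
    source = next(
        (cid for cid, members in clusters.items() if agent in members), None
    )
    if source == target:
        return  # the agent is already in the target cluster
    # Step 2: move the agent to the target cluster.
    clusters.setdefault(target, set()).add(agent)
    # Step 3: delete the agent from its original cluster.
    if source is not None:
        clusters[source].discard(agent)


if __name__ == "__main__":
    clusters = {"c1": {"agent_1", "agent_2"}, "c2": {"agent_3"}}
    eat(clusters, "c2", "agent_1")
    print(clusters)
```

A cluster that loses all of its members stays in the dictionary as an empty set, so a later validity check on cluster size can decide whether to delete it.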

If the similarity of agent_a and Lake_i is less than the predefined minimum threshold value (Minth), we assume that the bluegill enters a lake containing a low density of prey (line 20). In a low density of prey, instead of waiting for larger prey, the bluegill maximizes its energy by eating any kind of prey that comes near. That is, it maintains no dietary priority: it eats large, medium, or small prey as they approach. If the similarity of agent_a and its neighboring agent in the visualization panel is an element of high_range (lines 22 to 25; the bluegill is assumed to eat large prey), then agent_a and the neighboring agent are grouped together. The similarity between the agent and the next neighboring agent is then compared. If the similarity is an element of mid_range, we assume the bluegill eats medium prey (lines 33 to 36); if it is an element of low_range, we assume the bluegill eats small prey (lines 42 to 45). This process of eating continues until the bluegill gets sufficient energy for its survival.

If the similarity of agent_a and Lake_i is between Minth and Maxth, we assume the bluegill enters a lake containing a medium density of prey (line 47). In this situation the bluegill eats large and medium prey. That is, instead of wasting time on capturing small prey, it prefers to capture large and medium prey. Lines 48 to 67 implement this behavior of the bluegill fish. That is, here agent_a attracts agents of high and medium similarity.
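The three foraging regimes can be summarized as a small decision rule. The sketch below is an illustration under assumptions: the default threshold values (Minth = 0.40, Maxth = 0.75) are the ones quoted in the experimental setup of this paper, while the function names `lake_density` and `eats` and the `DIET` table are hypothetical names introduced here.

```python
# Prey sizes the bluegill accepts in each kind of lake, as described in the text:
# high density -> large only; medium -> large and medium; low -> anything.
DIET = {
    "high": {"large"},
    "medium": {"large", "medium"},
    "low": {"large", "medium", "small"},
}


def lake_density(similarity: float, min_th: float = 0.40,
                 max_th: float = 0.75) -> str:
    """Classify a lake (cluster) by the agent-to-lake similarity."""
    if similarity > max_th:
        return "high"
    if similarity < min_th:
        return "low"
    return "medium"


def eats(lake_similarity: float, prey_size: str) -> bool:
    """Return True if a prey of this size is eaten in the lake just entered."""
    return prey_size in DIET[lake_density(lake_similarity)]


if __name__ == "__main__":
    for sim in (0.9, 0.5, 0.2):
        print(sim, lake_density(sim), sorted(DIET[lake_density(sim)]))
```

Here "large", "medium", and "small" prey stand for neighboring agents whose similarity to agent_a falls in high_range, mid_range, and low_range, respectively.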

These movements of agents result in a new group of agents, in the merging or splitting of some clusters, or even in the deletion of some clusters. The agents with their new positions are shown in the visualization panel (line 71). In line 73, all the URLs that are visited more than a predefined threshold value, Url_count_threshold, are displayed for each valid cluster. Line 77 collects the set of neighboring agents of agent_j that lie within a distance best. Line 78 collects the items frequently preferred by all the agents in this neighbor set. In line 79, the most frequent items in the resulting item set are given as recommendations to the user represented by agent_j. This kind of recommendation overcomes the overspecialization problem found in many traditional recommendation systems, because the recommendations given by the proposed recommender system include variety. That is, recommendations are based not only on the user’s own profile but also on the preferences of neighboring users. These recommendations can be used for personalizing a website, thereby improving customer relationship management (CRM).
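The recommendation step (lines 77 to 79 of Algorithm 4) can be sketched as a neighborhood lookup followed by a frequency count. This is a hedged illustration: the data layout (agent positions as 2D points, profiles as sets of URLs), the function name `recommend`, the `top_n` cutoff, and all sample data are assumptions made for this sketch and do not come from the paper.

```python
from collections import Counter
from math import dist
from typing import Dict, List, Set, Tuple


def recommend(agent: str,
              positions: Dict[str, Tuple[float, float]],
              profiles: Dict[str, Set[str]],
              best: float,
              top_n: int) -> List[str]:
    """Recommend the most frequent items among agents within distance `best`."""
    here = positions[agent]
    # Neighboring agents within distance `best`; the agent itself is included,
    # so its own preferences also count.
    group = [a for a, pos in positions.items() if dist(here, pos) <= best]
    # Count how often each item is preferred within the neighborhood.
    counts = Counter(item for a in group for item in profiles[a])
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical agents on the unit-square visualization panel.
    positions = {"agent_j": (0.50, 0.50), "agent_2": (0.52, 0.51),
                 "agent_3": (0.51, 0.49), "agent_4": (0.90, 0.90)}
    profiles = {"agent_j": {"/home", "/courses"},
                "agent_2": {"/courses", "/faculty"},
                "agent_3": {"/courses", "/faculty", "/research"},
                "agent_4": {"/admin"}}
    print(recommend("agent_j", positions, profiles, best=0.04, top_n=2))
```

Because items are counted across the whole neighborhood, an item that the target user has never visited (here `/faculty`) can still be recommended, which is how variety enters the recommendations.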

Here, the performance of the proposed WebBluegillRecom-annealing algorithm is compared with that of the traditional collaborative filtering based recommender system. Performance is evaluated in terms of coverage, precision, and F1 measure. The experimental results show that the proposed WebBluegillRecom-annealing dynamic recommender system produces better recommendations than the traditional collaborative filtering systems.

5. Experimental Results

This section starts with the 2D visualization of agents on the visualization panel. We then present the clusters of agents obtained by the cluster-creation algorithm and the Bluegill-BestPredictions algorithm, followed by the inter- and intracluster similarity measures of the obtained clusters. Next, we present the recommendations given to the users. Finally, we compare the quality of the proposed dynamic recommender system with that of traditional collaborative filtering techniques.

The proposed WebBluegillRecom-annealing dynamic recommender system is evaluated on a high-dimensional, real-life data set. It is implemented using the Java agent development environment (JADE). Figure 2 shows the random placement of agents in the visualization panel.

In Figure 2, the visualization panel is a two-dimensional plane represented by x-y coordinates. The x-axis and y-axis values range from 0 to 1. Figure 3 shows the position of agents on the visualization panel after applying the cooling algorithm.

In Figure 3, similar agents lie nearer to each other in the visualization panel. To get better neighboring agents, a slow cooling method is adopted; that is, the temperature is reduced slowly using a geometric temperature reduction (line 14 of Algorithm 2), with TL = 0.005. Values are chosen by trial and error. Figure 4 shows the clusters of agents after applying the cluster-creation algorithm.
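A geometric temperature reduction multiplies the temperature by a constant factor at every step. The sketch below illustrates such a schedule; it assumes that TL is the lower temperature limit at which cooling stops, and the starting temperature and the reduction factor `alpha` are hypothetical values chosen for illustration, since the values used in the experiments are not restated here.

```python
from typing import Iterator


def geometric_cooling(t_start: float, alpha: float, t_low: float) -> Iterator[float]:
    """Yield temperatures t_start, alpha*t_start, alpha^2*t_start, ... above t_low."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    temperature = t_start
    while temperature > t_low:
        yield temperature
        temperature *= alpha  # geometric reduction


if __name__ == "__main__":
    # alpha close to 1 gives slow cooling, i.e. more steps before TL is reached.
    fast = list(geometric_cooling(1.0, alpha=0.5, t_low=0.005))
    slow = list(geometric_cooling(1.0, alpha=0.9, t_low=0.005))
    print(len(fast), len(slow))
```

A reduction factor closer to 1 produces more steps before the lower limit is reached, which is what a slow cooling method amounts to in practice.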

Here, neighboring agents that lie within a distance threshold of 0.15 are moved into the same cluster when the average dissimilarity between the clusters and their member agents is less than a dissimilarity threshold. Values are set by trial and error. Agents in the same cluster represent users with similar behavior. The initial clusters are given as input to the Bluegill-BestPredictions algorithm.

Figure 5 shows the clusters of agents obtained after applying the Bluegill-BestPredictions algorithm. In the Bluegill-BestPredictions algorithm, for a higher density cluster the Maxth value ranges from 0.75 to 1.0. For a lower density cluster we set the Minth value to 0.40. In a medium density cluster the similarity lies between 0.40 and 0.75. In a high density cluster, high_range means similarity above 75% of Maxth. In a lower density cluster, high_range means similarity above 75% of Minth, mid_range means similarity between 45% and 75% of Minth, and low_range means similarity below 45% of Minth. In a medium density cluster, high_range means similarity above 75% of the obtained medium density cluster similarity value, and mid_range lies between 45% and 75% of that value. These parameters are set by trial and error. In Figure 5, Clus_threshold = 10 and ICTF = 0.10, where ICTF denotes the item count threshold frequency [19], which represents the minimum number of URL patterns that represent a session. An ICTF value is a real number between 0 and 1. Clus_threshold represents the minimum cluster size required for a valid cluster.
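The range definitions above all follow one pattern: a similarity is compared against 45% and 75% of a reference value, where the reference is Maxth, Minth, or the observed medium density cluster similarity. A minimal sketch of that pattern follows; the function name `similarity_range` and the treatment of values exactly on a boundary are assumptions.

```python
def similarity_range(similarity: float, reference: float,
                     mid_fraction: float = 0.45,
                     high_fraction: float = 0.75) -> str:
    """Classify a similarity relative to a reference value.

    `reference` is Maxth, Minth, or the medium density cluster similarity,
    depending on the kind of lake the bluegill has entered.
    """
    if similarity > high_fraction * reference:
        return "high_range"
    if similarity >= mid_fraction * reference:
        return "mid_range"
    return "low_range"


if __name__ == "__main__":
    min_th = 0.40  # Minth, as set for lower density clusters
    for sim in (0.35, 0.25, 0.10):
        print(sim, similarity_range(sim, min_th))
```

With Minth = 0.40, the boundaries fall at 0.18 and 0.30, so the three sample similarities land in high_range, mid_range, and low_range, respectively.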

The quality of the obtained clusters can be measured in terms of intracluster similarity and intercluster similarity. For better clusters, the intracluster similarity should always be higher than the intercluster similarity. Intracluster similarity represents the similarity between the elements of a cluster. Intercluster similarity represents the similarity between the elements of a cluster and the members of another cluster. Table 1 reports the intracluster and intercluster similarity of the clusters obtained using the WebBluegillRecom-annealing algorithm.
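One simple way to compute these two quantities is to average a pairwise similarity within a cluster and across two clusters. The sketch below does this under stated assumptions: averaging over pairs is an illustrative choice (the aggregation used for Table 1 is not specified here), and the Jaccard similarity on URL sets is a stand-in for the web session similarity used in the paper.

```python
from itertools import combinations, product
from statistics import mean
from typing import Callable, List, Set

Session = Set[str]
Similarity = Callable[[Session, Session], float]


def jaccard(a: Session, b: Session) -> float:
    """Stand-in similarity: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def intracluster_similarity(cluster: List[Session], sim: Similarity) -> float:
    """Average similarity over all pairs of members of one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # a single member is trivially similar to itself
    return mean(sim(a, b) for a, b in pairs)


def intercluster_similarity(cluster_a: List[Session], cluster_b: List[Session],
                            sim: Similarity) -> float:
    """Average similarity over all pairs drawn from two non-empty clusters."""
    return mean(sim(a, b) for a, b in product(cluster_a, cluster_b))


if __name__ == "__main__":
    c1 = [{"/a", "/b"}, {"/a", "/b", "/c"}]  # hypothetical sessions
    c2 = [{"/x", "/y"}, {"/x"}]
    print(intracluster_similarity(c1, jaccard))
    print(intercluster_similarity(c1, c2, jaccard))
```

For well separated clusters such as the two hypothetical ones above, the first value is clearly larger than the second, which matches the quality criterion stated in the text.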

From Table 1 we can observe that better clusters are obtained when the Bluegill-BestPredictions algorithm optimizes the clusters generated by the cluster-creation algorithm. The table also shows that whenever intracluster similarity increases, intercluster similarity decreases, which is a sign of good quality clusters. Table 2 shows sample user profiles generated by the WebBluegillRecom-annealing algorithm.

Table 2 shows some user profiles in cluster 6, each representing a user’s interests after applying the WebBluegillRecom-annealing algorithm. The WebBluegillRecom-annealing system gives the best recommendations to a user with best = 0.04 (line 68 of Algorithm 4). The next time the user mia logs onto the website, the recommendations listed in Table 3 are given.

Table 3 shows that recommendations are given to a user based not only on that user’s preferences but also on the items of interest of other users (or agents) that lie within the distance best in the same cluster. Here the most frequent items are selected. The recommendations made by the WebBluegillRecom-annealing dynamic recommender system therefore contain variety: they are not limited to the user’s own interests but also include the items more frequently liked by other users. The goodness of these recommendations is evaluated in terms of coverage and precision [18]. Precision measures whether the items of a summary profile are all correct, that is, included in the original input data, so that the profile includes only true data items.

Coverage measures whether the items of a summary profile are complete compared to the data that is summarized, that is, whether they include all the data items.

Both measures are computed between a summary of the input sessions and a discovered mass profile. A precision value of 1 indicates that every recommended item is an element of the original truth set. A coverage value of 1 indicates that all items in the original truth set are recommended. The efficiency of a recommender system depends on how well it can balance precision and coverage. To calculate coverage and precision, the generated user profiles and the input user sessions are used. Here the precision, coverage, F1 measure, and variety in recommendations of the proposed WebBluegillRecom-annealing dynamic recommender system are compared with those of the traditional collaborative filtering system. The traditional CF approach is selected as the baseline for evaluating the proposed recommender system because both our approach and the CF approach use the nearest neighbor property. In the CF approach, if one user is similar to another, the first is considered a neighbor of the second. To generate a prediction for an item, the CF approach analyzes the ratings for that item from the users in the target user’s neighborhood using Pearson correlation [1].
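The two quality measures can be illustrated with a set-overlap computation. This sketch assumes that both the discovered profile and the ground-truth session are represented as sets of items (URLs), and that precision and coverage are the shares of the overlap relative to the profile and to the session, respectively; the function names and sample data are hypothetical.

```python
from typing import Set


def precision(profile: Set[str], session: Set[str]) -> float:
    """Share of the profile's items that appear in the ground-truth session."""
    return len(profile & session) / len(profile) if profile else 0.0


def coverage(profile: Set[str], session: Set[str]) -> float:
    """Share of the ground-truth session's items that the profile contains."""
    return len(profile & session) / len(session) if session else 0.0


if __name__ == "__main__":
    profile = {"/a", "/b", "/c", "/d"}  # hypothetical discovered profile
    session = {"/a", "/b", "/e"}        # hypothetical input session
    print(precision(profile, session))  # 2 of 4 profile items are correct
    print(coverage(profile, session))   # 2 of 3 session items are covered
```

A profile can trivially reach a precision of 1 by containing a single correct item, or a coverage of 1 by containing every item on the site, which is why the two measures need to be balanced against each other.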

In Figure 6 the proposed WebBluegillRecom-annealing dynamic recommender system is compared with the traditional collaborative filtering system on precision for different values.

From Figure 6 we can observe that the precision values for WebBluegillRecom-annealing dynamic recommender system are slightly better than the collaborative filtering systems for small values of . Better recommendations are obtained when . This indicates that the recommendations given by WebBluegillRecom-annealing system are an element of the original input datasets when compared to the collaborative filtering system. In Figure 7 the proposed WebBluegillRecom-annealing dynamic recommender system is compared with the traditional collaborative filtering system on coverage.

From Figure 7 we can observe that, as increases, the coverage values also increase for WebBluegillRecom-annealing dynamic recommender system and collaborative filtering systems. Moreover, for a particular value of , as the iteration increases, there is an improvement in coverage values for WebBluegillRecom-annealing dynamic recommender system compared with the collaborative filtering system. WebBluegillRecom-annealing dynamic recommender system has much better values for coverage than the collaborative filtering system for small values of . But for higher values of there is only a slight improvement in coverage value compared to the collaborative filtering system. The recommendations given by the WebBluegillRecom-annealing system contain more elements in the original input datasets than the collaborative filtering system.

The balance between precision and coverage can be represented using the F1 measure [19]. Higher values of the F1 measure represent more balanced coverage and precision. In Figure 8 the proposed WebBluegillRecom-annealing system is compared with the traditional collaborative filtering system on the F1 measure.
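Assuming the measure referred to here is the standard F1 measure, with coverage playing the role of recall, it is the harmonic mean of precision and coverage. The symbols Prec and Cov below are notation introduced for this illustration.

```latex
F_1 = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Cov}}{\mathrm{Prec} + \mathrm{Cov}}
% Worked example with Prec = 1/2 and Cov = 2/3:
% F_1 = \frac{2 \cdot \frac{1}{2} \cdot \frac{2}{3}}{\frac{1}{2} + \frac{2}{3}}
%     = \frac{2/3}{7/6} = \frac{4}{7} \approx 0.571
```

Because the harmonic mean is dominated by the smaller of its two arguments, a high F1 value is only possible when precision and coverage are both high.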

From Figure 8 we may observe that the F1 measure is higher for the WebBluegillRecom-annealing system than for the collaborative filtering system. That is, the WebBluegillRecom-annealing system balances coverage and precision better than collaborative filtering systems. In the WebBluegillRecom-annealing system the same items are not recommended to users again and again; the recommendations include variety.

Figure 9 shows the comparison of the WebBluegillRecom-annealing system with traditional CF systems on precision when .

From Figure 9 we can observe that the proposed recommender system reduces the number of iterations required in comparison with traditional CF. For the proposed WebBluegillRecom-annealing system, precision increases as the number of iterations increases. Moreover, for a particular number of iterations, the precision provided by the proposed method is better than that achieved by CF systems for the same number of iterations. That is, the required precision can be achieved with fewer iterations than the CF method requires, because the annealing approach used in the algorithm reduces the number of iterations needed to achieve a particular precision. Figure 10 shows the comparison of the WebBluegillRecom-annealing system with traditional CF systems on coverage for a fixed parameter setting.

From Figure 10 we can observe that, for a particular number of iterations, the coverage provided by the proposed method is better than the coverage achieved for the same number of iterations in CF systems. Figure 11 shows the comparison of the WebBluegillRecom-annealing system with traditional CF systems on measure when .

From Figure 11 we can observe that the proposed method gives a better F1 measure than the traditional CF system in fewer iterations. That is, to attain a particular F1 measure the traditional CF system needs more iterations. Figure 12 shows the number of times each item is recommended for a specific user over 10 different runs.

Variety means the number of distinct recommended items [19]. In Figure 12, the x-axis represents the item ID and the y-axis represents the frequency of recommendations for a particular item. From Figure 12 we can observe that, while traditional CF systems kept recommending the same items, the WebBluegillRecom-annealing system adds variety without losing coverage, precision, or F1 measure.
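The two quantities behind Figure 12, the per-item recommendation frequency and the variety, can be computed from a list of recommendation runs as sketched below. The representation of each run as a list of item IDs, the function names, and the sample runs are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def recommendation_frequency(runs: Sequence[List[str]]) -> Counter:
    """Number of times each item is recommended across all runs."""
    return Counter(item for run in runs for item in run)


def variety(runs: Sequence[List[str]]) -> int:
    """Number of distinct items recommended across all runs."""
    return len(recommendation_frequency(runs))


if __name__ == "__main__":
    # Hypothetical runs: one recommender repeats itself, the other varies.
    repetitive = [["i1", "i2"], ["i1", "i2"], ["i1", "i2"]]
    varied = [["i1", "i2"], ["i3", "i1"], ["i4", "i5"]]
    print(variety(repetitive), variety(varied))
    print(recommendation_frequency(varied))
```

A recommender that repeats the same items yields a few tall bars in such a frequency plot, while a recommender with more variety spreads the same number of recommendations over more item IDs.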

To summarize, Figures 6 to 12 show that, in the WebBluegillRecom-annealing system, the similarity of neighbors increases with the number of iterations, resulting in better recommendations than those of the collaborative filtering system. The WebBluegillRecom-annealing dynamic recommender system also gives variety in recommendations, thereby overcoming overspecialization, that is, the return of items that are too similar to those the user has previously rated. The proposed Bluegill-BestPredictions algorithm gives dynamic recommendations to a user each time; that is, it recommends a variety of new items, so recommendations follow the users’ varying interests. These improvements are due to the dynamic nature of the foraging behavior of agents, which forms a dynamic neighborhood. More variety in recommendations helps customers to review a larger number of items before buying. Because the proposed method gives a wider variety of recommendations than the traditional CF method, it encourages customers to visit and interact with the website more frequently, which enriches the relationship with the customer and may also drive sales growth. In turn, the business community is required to provide customers with more information about its products. That is, it improves customer relationship management (CRM).

Figure 13 shows the comparison of the proposed WebBluegillRecom-annealing system and collaborative filtering system on precision for six months on varying number of users. It is based on the results of recommendations given to users. In Figure 13 the primary horizontal axis represents the months, the primary vertical axis represents the number of users that access the website in each month, and the secondary vertical axis represents the precision of recommendations given to users. WR-low and CR-low represent the lowest value of precision obtained in a month by evaluating the recommendations given to users in that month by the WebBluegillRecom-annealing dynamic recommender system and the collaborative filtering system, respectively. WR-high and CR-high represent the highest value of precision obtained in a month by evaluating the recommendations given to users in that month by the WebBluegillRecom-annealing dynamic recommender system and the collaborative filtering system, respectively. WR-close and CF-close represent the average value of precision obtained in a month by evaluating the recommendations given to users in that month by the WebBluegillRecom-annealing system and the collaborative filtering system, respectively.

From Figure 13 we can observe that the WebBluegillRecom-annealing system has higher precision values than the collaborative filtering system. In the WebBluegillRecom-annealing system, precision is not much affected by the number of users, whereas in the collaborative filtering system precision decreases as the number of users increases. That is, scalability is better supported by the proposed WebBluegillRecom-annealing system. Figure 14 compares the WebBluegillRecom-annealing system with the standard collaborative filtering system on coverage for different numbers of users in different months.

In Figure 14, WR-high and CR-high represent the highest value of coverage obtained in a month by evaluating the recommendations given to users in that month by the WebBluegillRecom-annealing dynamic recommender system and the collaborative filtering system, respectively. WR-close and CF-close represent the average value of coverage obtained in a month by evaluating the recommendations given to users in that month by the WebBluegillRecom-annealing dynamic recommender system and the collaborative filtering system, respectively. WR-low and CR-low represent the lowest value of coverage obtained in a month by evaluating the recommendations given to users in that month by the WebBluegillRecom-annealing system and the collaborative filtering system, respectively. From Figure 14 we can observe that coverage is higher for the WebBluegillRecom-annealing dynamic recommender system. In the WebBluegillRecom-annealing system, the coverage of the recommendations is less affected by an increase in the number of users than in collaborative filtering systems. Figure 15 compares the proposed recommender system and the collaborative filtering system on the F1 measure for six months.

In Figure 15 we can observe that the F1 measure is higher for the proposed WebBluegillRecom-annealing dynamic recommender algorithm. We can also observe that a better balance of coverage and precision is possible even as the number of users increases.

To summarize, Figures 13, 14, and 15 show that the WebBluegillRecom-annealing dynamic recommender system has better scalability than the traditional collaborative filtering systems. The “low” value may be due to a sudden rare move by the user. Even for these rare moves, the WebBluegillRecom-annealing dynamic recommender system has slightly better values for precision, coverage, and F1 measure than the collaborative filtering systems. The WebBluegillRecom-annealing dynamic recommender system includes variety in recommendations without compromising quality in terms of coverage, precision, F1 measure, and scalability. These improvements are due to the dynamic nature of the foraging behavior of bluegill agents, each of which attracts (eats) similar agents, resulting in a better neighborhood. These recommendations can be used for personalizing or customizing a website, thereby improving customer relationship management.

6. Conclusion

In this paper a new dynamic recommender system called the WebBluegillRecom-annealing system is presented. The proposed system is based on swarm intelligence inspired by the dynamic foraging behavior of bluegill fish. The artificial life is simulated using software agents. The WebBluegillRecom-annealing dynamic recommender system is capable of handling dynamic data. It uses an annealing approach to identify the initial best neighborhood for agents, thereby reducing the number of iterations before convergence. The system includes variety in recommendations, thereby overcoming the overspecialization problem of some traditional recommendation systems. The results obtained are compared with those of the traditional collaborative filtering system. The experimental results show that the WebBluegillRecom-annealing recommender system handles dynamic behavior and seasonality in users’ interests better than the traditional collaborative filtering systems, and that its recommendations have better values for precision, coverage, and F1 measure. The proposed dynamic recommender system reduces the number of iterations before convergence when compared to traditional CF recommender systems. Moreover, in the WebBluegillRecom-annealing system the quality of recommendations is not much affected by an increase in the number of users. That is, the WebBluegillRecom-annealing system has improved scalability compared with the traditional collaborative filtering system. The recommendations given by the WebBluegillRecom-annealing system can be used for customizing a website, thereby improving customer relationship management (CRM). The main limitation of this method is that several parameters have to be set a priori. In the future this work can be extended to track evolving user profiles.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. Ben Schafer, D. Frankowski, J. Herlocker, and S. Sen, “Collaborative filtering recommender systems,” in The Adaptive Web, vol. 4321 of Lecture Notes in Computer Science, pp. 291–324, Springer, Berlin, Germany, 2007.
[2] L. Dai, W. Wang, and W. Shu, “An efficient web usage mining approach using chaos optimization and particle swarm optimization algorithm based on optimal feedback model,” Mathematical Problems in Engineering, vol. 2013, Article ID 340480, 8 pages, 2013.
[3] M. Çelik, D. Karaboğa, and F. Köylü, “Artificial bee colony data miner (ABC-Miner),” in Proceedings of the International Symposium on Innovations in Intelligent Systems and Applications (INISTA '11), pp. 96–100, IEEE, Istanbul, Turkey, June 2011.
[4] M. Balabanovic, “An adaptive web page recommendation service,” in Proceedings of the 1st International Conference on Autonomous Agents (AGENTS '97), pp. 378–385, ACM, Marina del Rey, Calif, USA, February 1997.
[5] M. J. Pazzani, “Framework for collaborative, content-based and demographic filtering,” Artificial Intelligence Review, vol. 13, no. 5, pp. 393–408, 1999.
[6] O. Nasraoui and C. Petenes, “Combining web usage mining and fuzzy inference for website personalization,” in Proceedings of the KDD Workshop on Web mining as a Premise to Effective and Intelligent Web Applications (WebKDD '03), pp. 37–46, Washington, DC, USA, 2003.
[7] T. Berners-Lee, J. Hendler, and O. Lassila, “The semantic web,” Scientific American, vol. 284, no. 5, pp. 34–43, 2001.
[8] M. Chau, D. Zeng, H. Chen, M. Huang, and D. Hendriawan, “Design and evaluation of a multi-agent collaborative Web mining system,” Decision Support Systems, vol. 35, no. 1, pp. 167–183, 2003.
[9] M. Dorigo and T. Stützle, Ant Colony Optimization, MIT Press, 2004.
[10] N. Labroche, N. Monmarché, and G. Venturini, “Antclust: ant clustering and web usage mining,” in Genetic and Evolutionary Computation—GECCO 2003, vol. 2723 of Lecture Notes in Computer Science, pp. 25–36, Springer, Berlin, Germany, 2003.
[11] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948, December 1995.
[12] I. F. Moawad, H. Talha, E. Hosny, and M. Hashim, “Agent-based web search personalization approach using dynamic user profile,” Egyptian Informatics Journal, vol. 13, no. 3, pp. 191–198, 2012.
[13] E. Saka and O. Nasraoui, “Simultaneous clustering and visualization of web usage data using swarm-based intelligence,” in Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '08), pp. 539–546, November 2008.
[14] R. Cooley, B. Mobasher, and J. Srivastava, “Web mining: Information and pattern discovery on the World Wide Web,” in Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '97), pp. 558–567, November 1997.
[15] O. Nasraoui, R. Krishnapuram, and A. Joshi, “Mining web access logs using a relational clustering algorithm based on a robust estimator,” in Proceedings of the 8th International World Wide Web Conference (WWW '99), pp. 40–41, Toronto, Canada, May 1999.
[16] O. Nasraoui, R. Krishnapuram, H. Frigui, and A. Joshi, “Extracting web user profiles using relational competitive fuzzy clustering,” International Journal on Artificial Intelligence Tools, vol. 9, no. 4, pp. 509–526, 2000.
[17] J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, “Web usage mining: discovery and applications of usage patterns from web data,” ACM SIGKDD Explorations Newsletter, vol. 1, no. 2, pp. 12–23, 2000.
[18] O. Nasraoui, M. Soliman, E. Saka, A. Badia, and R. Germain, “A Web usage mining framework for mining evolving user profiles in dynamic web sites,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, pp. 202–215, 2008.
[19] E. Saka and O. Nasraoui, “Improvements in flock-based collaborative clustering algorithms,” in Computational Intelligence: Collaboration, Fusion and Emergence, vol. 1 of Intelligent Systems Reference Library, pp. 639–672, Springer, Berlin, Germany, 2009.
[20] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.
[21] O. Nasraoui and R. Krishnapuram, “One step evolutionary mining of context sensitive associations and web navigation patterns,” in Proceedings of the SIAM Conference on Data Mining, pp. 531–547, Arlington, Va, USA, April 2002.
[22] E. Saka and O. Nasraoui, “On dynamic data clustering and visualization using swarm intelligence,” in Proceedings of the 26th International Conference on Data Engineering Workshops (ICDEW '10), pp. 337–340, March 2010.
[23] http://bcs.whfreeman.com/thelifewire/content/chp53/5302002.html.

Copyright

Copyright © 2015 Anna Alphy and S. Prabakaran. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
