Download Report

1
A community finding method for weighted dynamic
online social network based on user behavior

Dongming Chen1, Yanlin Dong1, Xinyu Huang1, Haiyan Chen2, Dongqi Wang1*
1. Software College, Northeastern University, Shenyang, 110819, China
2. East China University of Political Science and Law, Shanghai, 200000, China
Abstract—Revealing the structural features of social
networks is vitally important to both scientific research
and practice, and the explosive growth of online social
networks in recent years has brought us dramatic
advances to understand social structures. Here we
proposed a community detection approach based on user
interaction behavior in weighted dynamic online social
networks. We researched interaction behaviors of users in
online social networks and built a directed and
un-weighted network model in terms of the following
relationships between social individuals at the very
beginning. In order to refine the interaction behavior of
users, level one fuzzy comprehensive evaluation model was
employed to describe how closely individuals are
connected to each other. According to this intimate degree
description, weights are tagged to the prior un-weighted
model we built. Secondly, a heuristic community detection
algorithm for dynamic network was provided based on the
improved version of modularity called module density. As
for the heuristic rule, we chose greedy strategy and fed the
algorithms with the update part between two neighboring
time slices’ network model. A parameter β was also set to
evaluate the impacts of individuals’ joining and leaving
actions in directed networks. Experimental results show
that the proposed algorithm can obtain high accuracy and
simultaneously get comparatively lower time complexity
than some typical algorithms. More importantly, our
algorithm needs no priori conditions.
Index Terms — Complex Network, Community Structure,
Modularity, Dynamic Network, Directed and Weighted Network
I.
INTRODUCTION
Research of complex networks has a wide attraction of
scientific research. Some scholars focus on semantic network[1]
which means a form of human knowledge relationships [2][3],
and some other researchers obtain algorithms for community
detection, which is a fundamental problem in network analysis.
Detecting communities is of great importance in sociology,
biology and computer science, disciplines where systems are
often represented as graphs. This problem is so complex that
Dongming Chen (e-mail: [email protected]).
Yanlin Dong (email: [email protected])
Xinyu Huang (email: [email protected])
Haiyan Chen (email: [email protected])
*Dongqi Wang (corresponding author; phone: +86 13624039571; email:
[email protected])
we didn’t find a satisfied solution by far, despite the huge
effort of a large interdisciplinary community of scientists
working on it over the past few years [4].
Traditional methods treat community detection as graph
partitioning and clustering problems. Kernighan-Lin(KL)
algorithm[5] is an early proposed graph partitioning based
method, which uses greedy strategy to optimize the divided
structure according to the edges between communities, the
evaluation function is defined as the number of edges in the
two communities minus the number of edges connecting them.
KL algorithm can divide target network into specific size
subsets, but the number of subsets is needed as a priori. As KL
algorithm, the majority of graph partitioning methods require
too much prior knowledge, and most variants of the graph
partitioning problem are NP-hard. More importantly, the
partitioning results are usually consisted of clusters with
similar size. Taking all these shortcomings into consideration,
graph partitioning algorithms are not fully suitable for
detecting communities in big social networks [6].
Unlike graph partitioning methods, clustering do not need
much prior knowledge, so clustering based methods can be
more effective for community detection. For example,
hierarchical clustering algorithms are known as suitable
approaches to be used to detect communities in social
networks. Social networks naturally have hierarchical
structures, which mean networks with hierarchical structure
may display several levels of grouping of the nodes. Small
clusters are contained within large ones, which are in turn
included in larger ones, and so on [7]. Some clustering
algorithms try to divide a target network naturally into many
parts based on the similarity and intensity of each node, and
these works can be classified into two kinds: aggregation
algorithms and divisive algorithms. For example, the Newman
fast algorithm [8] is a typical aggregation method based on
greedy strategy, mainly focuses on the network with numerous
vertices. Typical divisive algorithm is as method [9], which
treats the network as a huge community, and then divides it
step by step. The GN algorithm proposed by Girvan and
Newman [10] is another example of divisive method. The main
idea is to remove the edges with maximal betweenness [11],
which provides a rule of edge detection for the small
community in the network.
Except for traditional methods, community detection in
weighted, directed or dynamic networks is also widely
discussed separately. Few works took these features together
into consideration, and comparing to the research on
unweighted and undirected networks, no significant
achievements had been made. Most networks in real world can
2
be seen as heterogeneous ones because the nodes have
different characteristics and the relationship between two
nodes usually means different influences to them. In fact,
heterogeneity of nodes can be represented in weighted,
directed network models. On the other hand, dynamic network
analysis had drawn a lot of attentions in the past few years. As
we know, a dynamic community detection method was firstly
proposed by Hopcroft [12], it divided the dynamic network into
several static parts, then obtained the hierarchy clustering
results with the method of cosine similarity. They selected
relatively slightly changed parts from the results as natural
communities, and dynamic network evolution was studied on
the above bases. Chakrabarti [13] tried to analyze dynamic
network via continuous iteration, the concepts called snapshot
quality and history cost were proposed in this work to satisfy
both “clustering results should keep pace with the current
network topology as far as possible” and “the results should
minimize the difference between the clustering process”. In
order to understand the dynamic of social network structure,
Shan build a mathematical model [14], and an incremental
identification method is also proposed, which analyzes the
nodes incrementally rather than just to re-divide all nodes in
the network, to improve actual operation efficiency of the
algorithm. Ning’s work is also a valuable attempt [15], they put
forward the incidence vector concept to present the dynamic
process of the network through incrementally updated
characteristic value system. Chen, Wang and Wei approach a
novel multiobjective algorithm for detecting communities in
dynamic networks, which provides a solution to keep a
balance between accuracy and efficiency [16]. In Yu’s work, a
feasible algorithm for the identification of overlapping
modules in PPI networks is proposed, which detects
communities with more flexibility [17].
As we can see, excellent research works had been done on
community structure detection, however, the existing
algorithms either need too much priori knowledge or not good
enough to handle complex network models. The research on
weighted, directed and dynamic networks may provide us
more ways to understand real networks. And designing fast
and reliable network community detection algorithms will
continue to be the attractive area for us in the future.
On the other hand, with regards to community detection
methods, quality functions are very important, because an
algorithm needs a stopping criterion. Nowadays, the most
commonly known function is called network modularity [18],
which represents one of the first attempts to achieve a first
principle understanding of the clustering problem, and it
embeds in its compact form all essential ingredients and
questions, from the definition of community, to the choice of a
null model, to the expression of the strength of communities
and partitions[4]. Because of this, we are going to employ
modularity in this paper to evaluate the division results and we
take online social network as our target network, a detailed
analysis on the dynamic process of how an online social
network evolves will be shown.
The rest of this paper is organized as follows. Section 2 we
build a weighted social network based on interaction behavior
analysis of online social network. Section 3 gives a
community detection algorithm for dynamic directed weighted
networks. Section 4 compares our method with well-known
algorithms through experiments. Finally, Section 5 makes the
conclusions.
II. SOCIAL NETWORK WEIGHTED METHOD BASED ON USER
INTERACTION BEHAVIOR RESEARCH
When studying network structure, we first focus on the
interactions between individuals and define an approach to
weight the edges to involve more online social network
information into our network model. We assume edge weights
represent the interaction intensity between nodes, which will
reflect both the existing parameters in physical world and
calculating features defined by ourselves. The calculating
features will reveal the underlying characteristics of online
social networks. For a network containing complex social
relationships such as similarity and intimacy, we try to
transform one or more of the interactive attributes to be
weights, to measure the intimacy between users. The method
employed a weight with fuzzy synthetic evaluation model
according to the interaction of followers and the frequency of
interaction over a period of time. The theoretical basis of
weighting method is fuzzy comprehensive evaluation
method[19], which transforms qualitative evaluation to
quantitative evaluation according to the fuzzy membership
degree theory. For short, it is a method to provide an overall
evaluation to objects which are restricted by multiple factors
with fuzzy comprehensive evaluation on multiple factors [20].
In dynamic online social networks, such as Facebook[21]
and microblog [22], the behaviors, for example, 'comment',
'forward', 'like', 'at' can reflect the proximity between users
during a period of time. In our comprehensive evaluation
method, we select the level one comprehensive evaluation
model [23] as the method to measure the relationship. Here we
assume B as the collection of user behaviors:
B = {comment, forward, like, at}
It is crucial to allocate the weight for the above factors.
The weights of the behaviors are confirmed by statistical
methods. The data used in the paper comes from NLP
laboratory of Beijing Institute of Technology which collects
data of Sina microblogs in recent two years [22]. The dataset
contains 1.5 million users and 50 million microblogs. There
are 86,341,256 interaction behaviors, including 21,510,466
'comments', 36,538,478 'like', 23,299,712 'forward' and
4,992,600 'at'. So we can easily figure out the proportion
shown as the following vector A and the member of A is the
proportion of each kind of interactions.
A = (0.2491, 0.4232, 0.2699, 0.0578)
Then we assume a 4  n evaluation matrix R for a single
factor, which is formulated as follows.
r1n 
 r11 r12
r r
r2 n 
Rx   21 22
 r31 r32
r3n 


r4 n 
 r41 r42
Here n represents total number of users x followed, r1n
denotes how many times user x did 'comment' action. r2n
represents how many times user x did 'forward' action, r3n
3
represents how many times user x did 'like' action, r4n denotes
how many times user x did 'at' action, n|x represents user n
was followed by user x. User x’s behavior and connection
relationship with other users are shown in Table 1.
Table 1 Evaluation of user x’s behaviors
Followed
Behaviors
followed_id1
Comment
Forward
Like
At
r11
r12
r13
r14
Matrix R for user x can be calculated as follow:
commentnum(n | x)
(1)
r1n  n
 commentnum(i | x)
forwardednum(n | x)
(2)
n
 forwardednum(i | x)
likenum(n | x)
followed_idn
r21
r22
r23
r24
…
…
…
…
r1n
r2n
r3n
r4n
 0.875
i 1
r3n 
…
We can calculate the evaluation matrix R003 of node 003,
for example, the user did ‘at’ behavior to others for eight
times during a period of time, particularly ‘at’ user 032 for
seven times. So we can get the following data.
atnum(032 | 003)
r4(032|003) 
atnum(009 | 003)  atnum(023| 003)  atnum(032 | 003)
i 1
r2 n 
followed_id2
If we calculate all the data according to the above
(3)
method, we can get the matrix as follows.
n
 likenum(i | x)
i 1
r4 n 
atnum(n | x)
(4)
R003
n
 atnum(i | x)
i 1
Where commentnum(n|x) counts how many times user x
'comment' user n, and forwardednum(n|x) counts how many
times user x 'forward' n’s articles, likenmu(n|x) counts how
many times user x set 'like' tag to user n’s articles and
atnum(n|x) counts how many times user x did 'at' actions to
user n when they post their own articles or comments.
Then we explain the evaluation model with following
Figure 1, which is generated from nodes of sina microblog.
We take user 003 as an example, detailed information can be
discovered in table 2.
0.1667 0.0714 0.1290 0.1250 
  0
0.1429 0.1613
0 
 0.8333 0.7857 0.7097 0.8750 
Finally, we can get the Level I comprehensive evaluation
model B003.
B003  A  R003'
  0.2491 0.4232 0.2699 0.0578  
0
0.8333 
0.1667
 0.0714 0.1429 0.7857 


 0.1290 0.1613 0.7097 


0
0.8750 
 0.1250
  0.1138 0.1040 0.7822 
The result represents the weight value of ‘at’ behavior
from user 003 to the three neighboring users are 0.1138,
0.1040, 0.7822.
III. COMMUNITY PARTITION ALGORITHM FOR DYNAMIC
DIRECTED WEIGHTED NETWORKS
Figure 1 The network of test example.
User_id
003
003
003
Table 2 Statistics of user 003
Comment
Transmit
Attention_id
num
num
009
1
1
025
0
2
032
5
11
Like
num
4
5
22
At
num
1
0
7
This paper proposed an algorithm which needs no priori
conditions, the algorithm firstly set up heuristic rules of the
dynamic changes of the social network based on the basic
nature of community structure, and employed an improved
module density function D combined with weighted network,
to ensure the accuracy of the algorithm [24].
We give some basic definitions which will be used in the
rest part.
For directed weighted network G w  (V , E ,W ) , where V is
the collection of nodes inside Gw, E is the collection of edges
inside Gw, eij  E is an edge from node i to node j, W is the
4
collection of weights of Gw’s edges and
edge
wij is the weight of
eij , we define:
(1) The in-weight and out-weight of nodes in the
network.
The symbol wini represents the sum of all edges weights
i
ending with node i, and wout
denotes the sum of all edges
weights starting with node i.
(2) The inside-edge weight and outside-edge weight in
the community.
Symbol Win (ci ) means the total weight of edges inside a
community, Wout (ci ) discloses the total weight of edges
outside a community, which are shown in formula (5) and
formula (6). An edge inside a community means that both of
the two ends of the edge fall into the community.
(5)
Win(c ) 
w

i
i , jci
Wout (ci ) 

ij
ici , jci
(6)
wij

2
ic1 , jc2
community
c1 and
c 2 where c1 and c 2 are two collections of
disjointed nodes or two communities who have no overlapping
parts between them.
The second function also has two variants as shown in
formula (8),
(8)
L(i, c )  w
i

jci
ij
where i represents a node. Function L counts neighbors of
node i inside community
ci or the increased number of nodes
ci when new node i is added into the
community. Furthermore, Lin (i, ci ) counts the number of
edges from node i to community ci , where Lout (i, ci )
inside community
discloses the number of edges from community
ci to node i.
wd
(4) D (ci ) formulates module density of community
ci , it can be calculated by formula (9).
Win (ci )  Wout (ci )
(9)
2n
Here n is the number of the nodes of the network.
Then we define D wd , which means the increased
D wd  ci  
module density when a node i is added into community
'
ci
to constitute a new community ci , it can be calculated by
formula (10).
D wd  Dtwd
 Dtwd
n
n1
(10)
is an out-degree edge for node i, then the importance of the
node i to the community is 1, otherwise, it is  (0  ＜1) . For
a node of a community, an in-degree edge has more influence
on the community than an out-degree edge. Now we assume
the situation of node i joining into community ci to form a
'
ij
L calculate the sum weights of edges in community
direction is from node j to node i, then only marginal influence
is impressed on the whole community and we can ignore it.
So in the paper, we use a parameter  (0  ＜1) to
represent the weight of in-degree. It means a node i waiting to
join community ci , connected with a directed edge. If the edge
new community ci . As shown in formula (11) and (12),
(3) Function L in directed weighted networks.
Function L is a polymorphism function with two variants
for directed and weighted networks. As shown in formula (7).
(7)
L(c , c ) 
w
1
In order to deal with the affection of direction factor in
the network, we can assume a node i with high out-degree and
low in-degree, while node j is on the contrary. Obviously, if
there is an edge between node i and node j, the direction of the
edge is most possibly from node i to node j. Similarly, if there
is an edge from node i to node j, the module density that of the
community will be greatly influenced, which means that node
i prefers to be a member of community ci . Instead, if the
t
Wint (ci ) calculates Win (ci ) of time t, and Wout
(ci )
calculates Wout (ci ) of time t.
Wint 1 (ci )  Wint (ci )   Ltin (i, ci )  Ltout (i, ci )
t 1
out
W
(11)
(ci )  W (ci )  (   1) L (i, ci ) 
t
out
t
in
(12)
i
(   1) Ltout (i, ci )   ( wini  wout
)
So we can calculate the module density following
formula (13). To make the structure of the new community ci'
more reasonable, we calculate the increased module density
D wd as follows.
W  Win  3 Lin  3Lout   d in  d out
(13)
D wd  out
2n
where n represents the number of nodes of the network.
Formula (13) gives the evaluation rule, which is also the key
of the algorithm in this paper. Most of the algorithms are
based on greedy strategy or approximation algorithm, to
improve the efficiency of calculation.
In our daily life, we can easily find that if a person is my
friend, he or she will most probably be my friend’s friend. So
we prefer to set the node i to be a member of the original
community. Thus, we put forward the principle combining
with heuristic greedy strategy: for any node i in the network,
we prefer to put it into community which contains the most
neighbors of node i. It has two key points: greedy selection
property and optimal substructure.
Our algorithm deals with directed and weighted networks.
However, for a real network, it is almost impossible for the
degree of all nodes to be even. Then, the structure of the
network is an undirected and unlooped graph with a
corresponding matroid. The greedy strategy is one of the
situations of the matroid, so the idea based on module density
with greedy strategy is reasonable.
There are mainly 4 kinds of reasons that will cause the
dynamic variety of real networks. We can conclude them as
follows:
5
(1) Adding a node (V+v): When a new node is joined in
the network, it is likely to connect other edges or not.
(2) Deleting a node (V-v): When a node is deleted, then
all of the edges connected with this node are also removed.
(3) Adding an edge (E+e): Connect two nodes in the
network to produce a new edge.
(4) Deleting an edge (E-e): Remove an edge between two
nodes.
A dynamic community changes its structure in
accordance with the above four situations, and in most cases
the first and the third situations are often occurred, while the
second and the fourth situations are very scarce.
A. Analysis of community structure variations
The four changing situations of a dynamic community
result in the structure of community evolving, too. In the
network we confronted in real life, there are only six cases
when a community structure changes.
(1) Community generation: new nodes are added to a
network dynamically, and these nodes may form a new
community as shown in Figure 2(a).
(2) Community extinction: nodes of a community are
removed gradually, and finally the community disappeared as
shown in Figure 2(b).
(3) Community expansion: the community will expand
when many new nodes or edges are joined as shown in Figure
2(c).
(4) Community decrease: the community will decrease
with the removing of part of nodes or edges and it makes the
community smaller and smaller as shown in Figure 2(d).
(5) Community combination: two communities are
merged into one community dynamically as shown in Figure
2(e).
(6) Community split: a community is split into two or
more communities dynamically as shown in Figure 2(f).
It can be easily proved that the changing results are
among the above six situations.
Null
Null
(a)
(b)
B. Algorithm description
We analyze the four possible dynamic behaviors and design
the algorithm as follows.
(1) Adding a node: When the new node has no connection
with any other nodes, it will be a new independent community.
If the new node has connection with other nodes, we put the
node into the community which can get the highest gain in
module density. The pseudo code of the algorithm is designed
as follows. ( c t denotes the community structure at time t)
Algorithm 1 AddNode
Input: the structure of a community at time t, and a new
node v awaiting to be added
Output: the updated structure at time t+1
// d v is the degree of node v.
If (d v  0) then
Update c t 1  c t {v}
Else
For existing communities, put v into each of them
Do
Calculate D wd
End For
c=get the maximal D wd when node v joins in the
community
Update c t 1  (ct \ c) (c {v})
(2) Deleting a node: If the degree of the removed node is
1, just delete the edge connected to the node, otherwise, delete
all the edges connected to the node and reassign the neighbor
nodes of the deleted node to the community which can achieve
a higher module density. If the connected edges are all within
the community, just keep the neighbor node in the original
community. The pseudo code of deleting a node is given as
follows.
Algorithm 2 DeleteNode
Input: the structure of a community at time t, and the
removed node v
Output: the updated structure at time t+1
k=1;
If (node v’s degree is 1)
c t 1  c t \ c
(c \ v ) {v}
Else
(c)
(e)
(d)
(f)
Figure 2 The six changes of a community
While (Node v’s neighbor is not 0) do
S  {nodes who doesn’t have connections
with nodes outside c }
End while
End if
Put the nodes in S into the considered best community
which has the maximal D wd
Update c t 1
(3) Adding an edge: when the new node i and node j
belong to the same community ( ci  c j ), the new edge from i
to j can enhance the value of function D according to the
definition of module density. So the inside edge will not break
6
up the original community. However, if the two nodes i and j
are not in the same community ( ci  c j ), then the new edge eij
calculate betweenness value for each edge. If the intermediate
value of the removed edge eij is comparatively large, we will
is an outside edge, it will decrease the module density of the
community comparatively. Thus, the new edge may not
change the structure of the community, or make one node in ci
or cj break away from the original community and join another
community so as to obtain higher module density.
When a new edge eij joined into network, we assume
re-divide the community when removing the edge, otherwise,
just keep the original community structure unchanged. The
algorithm is designed as follows:
X  ci ,Y  c j to judge if the new edge will increase the
module density for community Y. The algorithm is designed as
wd
wd
follows, where Dixy
and Dixy
are the gain of module
Algorithm 4 DeleteEdge
Input: the structure of community at time t and the removed
edge eij
Output: the updated structure at time t+1
If ( eij is a single edge) then
c t 1  ( c t \ i{ j, } ) i { } j { }
density when putting node i and j into X and Y.
Else if (the degree of either i or j is 1) then
Algorithm 3 AddEdge
Input: the structure of community at time t, and the new edge
eij.
Output: the updated structure at time t+1
If (node i and node j are new nodes) then
Update c t 1  c t {i, j}
Else if ( ci  c j ) then
wd
If ( Dixy
)
＜0 & D wd
jxy＜0
Update ct 1  ct
Else
wd
wd
Node v = Dixy ＞D jxy？i : j
Move v to the new community
For all neighbors of v
Put it into the best community
which has the maximal ∆Dwd
End for
t 1
Update c
End if
End if
End if
c t 1  ( c t \c (j ) ) i { } c i{ i( ) \ }
Else if (node i and node j don't belong to the same
community) then
ct 1  ct
Else
Int z = Get the betweenness of eij
If (z is the top n biggest betweenness of the
network)
Put other nodes into the considered best
community which holds the maximal D wd
End if
End if
Update c t 1
Then we analyze the time complexity of every step in the
algorithm.
1) Initialization. Initialize N nodes and at most N
communities, which takes the time complexity of O( N ) .
2) Iteration. Iterations are taken place for k times, and to
each iteration, the maximal L(i, ci ) are sought for all of the
N nodes. For node i, the
(4) Deleting an edge: when the nodes i and j of an edge
come from different communities (ci  c j ) , removing the edge
weakens the linkage of intercommunity. Removing an outside
edge makes a higher module density, so it will keep the
original community structure.
When the two nodes(i and j) of the removed edge belong
to the same community (ci  c j ) , if the degree of node i or
node j is 1, then one of the two nodes will be isolated node
after removing the edge. The node will separate from the
original community, and the new separate community will
contain only one node. If the node degree of both i and j are 1,
then two new isolated communities emerged.
When the two nodes (i and j) of the removed edge come
from different communities (ci  c j ) , and the node degree is
different from above, the removing of the outside edge
eij
results a lower module density. Then the original structure
either keeps unchanged or divides into two communities.
Thus, we get the idea from GN algorithm [15], which
regards the community (ci  c j ) as a small network, then
d (i) neighbors are assigned to N
N
communities, so there are at most
 d (i)
times satisfied
i 1
L(i, ci )  0 for each iteration. So it takes time complexity of
N
O(k  d (i )) for searching maximal L(i, ci ) . It must be
i 1
estimated for each iteration to make sure if the constraint
condition is satisfied or not for all of the nodes, which takes
time complexity of O(kN ) .
3) Modification. Algorithm 1 to algorithm 3(described in
section III) are simple linear calculation on given variables,
which takes time complexity O(kN ) , linear calculation on
neighbors of every variable brings complexity
N
O(k  d (i )) .
i 1
7
The total time complexity is
N
community evolution. It also conforms to the objective rule of
‘Rich get Richer’.
N
O( N  k  d (i ) kN  kN  k  d (i )) , which is
i 1
i 1
simplified as O ( kN  k
N
 d (i)) . Assume that the edge
i 1
N
number of the network is E, then
 d (i)  2E , so the final
i 1
In reality, the iteration time will be acceptable for most
networks, so the algorithm will be efficient. Especially, the
more obvious of the community structure will be, the less
iteration time will be needed, and the more efficient will the
algorithm be. However, if the community structure is not clear,
the community detection algorithm will make no sense all the
same.
time complexity is O(kN  kE ) .
From the reckon above, we can get the time complexity
of the algorithm is O(kN  kE ) , N denotes the number of
nodes and E represents edges in the network, and the k is the
number of iteration. Consequently, the value of k is vital in
running the algorithm. From the time complexity proved
above, we can conclude that the consuming time of the
algorithm will decrease with the decline of k value. So, the
smaller the k value becomes and the faster the algorithm runs.
According to the Heuristic principles of basic property in
community structure, the algorithm will need to detect a
community from the node with more neighbor nodes as
possible. Obviously, the more nodes a community contains,
the more suitable for algorithm to detecting, which enable the
main community become greater during the dynamic
Table 3

dataset size
IV. EXPERIMENT AND ANALYSIS
In this section, we carry out the experiments by using
Sina microblog dataset with 5,000 users. And we establish
four networks with 100, 500, 1000 and 5000 nodes
respectively. To show the results conveniently, we choose
some representative nodes to build a small network. Then we
compare the efforts with other algorithms.
Firstly, to confirm the value of  , we do the experiment
with different network scale(dataset size). When the
experiments end, we choose the value of  when the
community obtains the maximum value of module density.
The results are shown in Table 3.
Relation between  and module density
0.40
0.45
0.50
0.55
0.60
0.65
0.70
0.75
0.80
S100
0.399
0.420
0.442
0.459
0.467
0.471
0.456
0.433
0.384
S500
0.435
0.473
0.494
0.509
0.522
0.519
0.498
0.492
0.478
S1000
0.511
0.537
0.554
0.568
0.577
0.585
0.575
0.560
0.542
S5000
0.553
0.592
0.627
0.658
0.683
0.697
0.680
0.657
0.621
From Table 3, we can conclude that in most cases when
the value of  is 0.65, the module density will reach the
maximal value. Thus, we choose the approximate value of
  0.65 for the rest experiments.
Comparing with extremal optimization algorithm [25] and
improved CNM algorithm [26], we do experiments to get the
time overhead and the module density. The comparison results
are listed in Table 4 and Table 5.
Table 4 Time Overhead comparison of different algorithms
dataset size
S100
S500
S1000
algorithms
Our algorithm
Extremal optimization
Improved CNM algorithm
algorithms
0.285
0.317
0.279
1.962
2.231
2.195
7.450
7.983
7.964
Table 5 Modularity comparison of different algorithms
dataset size
S100
S500
S1000
Our algorithm
Extremal optimization
Improved CNM algorithm
0.369
0.362
0.347
From Table 4 and Table 5, we can conclude that the
algorithm in this paper outperforms in time consuming
compared with extreme optimization and improved CNM
algorithm as an increasing in network scale [26].
0.433
0.405
0.398
0.527
0.485
0.451
S5000
105.317
132.250
125.725
S5000
0.573
0.510
0.494
To demonstrate the high quality of the divided results,
we can evaluate the module density of the algorithms. Figure 3
shows the changing tendency of module density when network
scale increases, which discloses the larger of the network scale,
8
the bigger of the module density will be. Module density
reflects the divisive quality of network, so the proposed
algorithm is suitable for large scale network.
Module Density
Module Density
0.8
0.6
0.4
0.2
0
0
5
10
time (s) 15
value of module density S100
value of module density S500
value of module density S1000
value of module density S5000
Figure 3
Changing Tendency of Module Density
The above experiments have presented the results in
static networks. So we randomly add four behaviors of
dynamic network variations (described in section 3) to
compare with the other algorithms. The adjustment of the
community is based on increments of the network and it is
locally conducted, even in the most complex situations, the
time expenses of the algorithm can be neglected, and the
results are shown in Figure 4.
for online social networks are provided in this paper, which
are suitable for dynamic social network modeling and
community detecting, considering with the characteristics of
the social network and the shortage of the most current
community partition algorithm. According to the real-time and
dynamic characteristics of the social network, we first
weighted the network edge based on the user's behaviors, then
combined the following methods to design a new community
partition algorithm: the strategy of greedy heuristic rules,
improve module density function D in terms of the
characteristics of the weighted and directed network, which is
employed to derive the standard measure function for
estimating community detection and dynamic rules in real
social network. The algorithm can obtain very high accuracy
and low time complexity without any priori condition.
Moreover, the complexity only depends on the number of
iterations, namely the clarity of the community structure of
network, and regardless of the scale of the network, so it is
suitable for dynamic and large scale network analysis.
ACKNOWLEDGEMENT
This work is supported by the Central Universities
Fundamental Research funding of China under Grant No.
N120317003.
REFERENCES
[1]
Modularity
0.4
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
Modularity
[2]
[3]
[4]
0
Figure 4
5
10
algorithm in this paper
extremal optimization
improved CNM algorithm
15
[5]
time (s)
[6]
Changing Tendency of Modularity
We can conclude that during the process of changes, all
the module density values of the three algorithms have a bit
fluctuation, but just in a small range relatively(from 0.26 to
0.37), and the algorithm in this paper always maintains the
highest module density value after a certain time (time = 4.5s)
in dynamic network as shown in Figure 4.
In brief, the algorithm in this paper can achieve better
results compared with the extremal optimization and improved
CNM algorithm.
V. CONCLUSIONS
A weighted approach based on user interaction behavior
and module density based communities detection algorithm
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
Luo X, Xu Z, Yu J, et al. Building association link network for semantic
link on web resources[J]. Automation Science and Engineering, IEEE
Transactions on, 2011, 8(3): 482-494.
Xu Z, Wei X, Luo X, et al. Knowle: a semantic link network based
system for organizing large scale online news events[J]. Future
Generation Computer Systems, 2014.
Xu Z, Luo X, Wang L. Incremental building association link network[J].
International Journal of Computer Systems Science & Engineering, 2011,
26(3): 153-162.
Fortunato S. Community detection in graphs[J]. Physics Reports, 2010,
486(3): 75-174.
Kernighan B.W, Lin S. An efficient heuristic procedure for partitioning
graphs [J]. Bell System Technical Journal, 1970, 49: 291-307.
Bhattacharyya S, Bickel P J. Community detection in networks using
graph distance[J]. arXiv preprint arXiv:1401.3915, 2014.
Malliaros F D, Vazirgiannis M. Clustering and community detection in
directed networks: A survey[J]. Physics Reports, 2013, 533(4): 95-142.
Newman M.E.J. Fast algorithm for detecting community structure in
networks [J]. Phys Rev E, 2004, 69(6): 066133.
Albert R, Barabasi A.L. Statistical mechanics of complex networks [J].
Review of ModernPhysics, 2002, 74: 47–91.
Girvan M, Newman M E J. Community structure in social and biological
networks[J]. Proceedings of the National Academy of Sciences, 2002,
99(12): 7821-7826.
Newman M.E.J. Analysis of weighted networks[J]. Physical Review E
(Statistical, Nonlinear, and Soft Matter Physics), 2004, 705 Pt 2:.
Hopcroft John, Khan Omar, Kulis Brian, et al. Tracking evolving
community in large linked networks [J]. PNAS. 2004, 101(11):
5249-5253.
Chakrabarti D, Kumar R, Tomkins A. Proceedings of the 12th ACM
SIGKDD International Conference on Knowledge Discovery and Data
Mining [J]. ACM, New York, USA. 2006: 554-560.
Shanbo, Jiang Shouxu, Zhang Genshuo. IC: Dynamic social network
community structure of incremental recognition algorithm [J]. Journal of
Software. 2009, 20(Supplement): 184-192.
9
[15] Ning Huazhong, Xu Wei, Chi Yun, et al. Incremental spectral clustering
with application to monitoring of evolving blog communities [J]. SIAM
Internation Conference on Data Mining, 2007.
[16] Chen G, Wang Y, Wei J. A new multiobjective evolutionary algorithm
for community detection in dynamic complex networks[J].
Mathematical problems in engineering, 2013.
[17] Yu Q, Li G H, Huang J F. Mofinder: A novel algorithm for detecting
overlapping modules from protein-protein interaction network[J].
BioMed Research International, 2012.
[18] Newman M E J. Modularity and community structure in networks[J].
Proceedings of the National Academy of Sciences, 2006, 103(23):
8577-8582.
[19] Barabasi A.L, Bonabeau E. Scale-Free Networks [J]. Scientific
American, 2003, 288(1): 60-69．
[20] Guo Y. Comprehensive evaluation theory and method[J]. Press of
Science and Technology, Beijing, 2002: 41-46.
[21] Facebook: https://www.facebook.com/
[22] Sina microblog: http://us.weibo.com/gb
[23] F0RTUNATO S, BARTHELEMY M. Resolution 1imit in community
detection [J]. PPNAS, 2007, 104(1): 36-41.
[24] Zhenping Li, Shihua Zhang, Rui-Sheng Wang. Quantitative function for
community detection [J]. Physical Review E, 2008, 77: 036109.
[25] Duch J, Arenas A. Community detection in complex networks using
extremal optimization[J]. Physical review E, 2005, 72(2): 027104.
[26] Wakita K, Tsurumi T. Finding community structure in mega-scale social
networks:[extended abstract][C]//Proceedings of the 16th international
conference on World Wide Web. ACM, 2007: 1275-1276.
The authors declare that there is no conflict of interests
regarding the publication of this article.