
Wednesday, March 14, 2012

Social Networking Case Study

To understand SNA quickly!



What is social network analysis (SNA)?
There are many different ways to understand what SNA is. Below I summarize what I have learned from lectures and websites as a brief introduction, to help readers grasp the whole picture of SNA quickly.
Definition:
SNA is the study of the pattern of interaction between actors of social networks. It refers to methods used to analyze social networks, social structures made up of individuals (or organizations) called "nodes", which are tied (connected) by one or more specific types of interdependency, such as friendship, kinship, common interest, financial exchange, dislike, sexual relationships, or relationships of beliefs, knowledge or prestige[1]. 
"Social network analysis is the mapping and measuring of relationships and flows between people, groups, organisations, computers or other information/knowledge processing entities." (Valdis Krebs, 2002). Social Network Analysis (SNA) is a method for visualizing our people and connection power, leading us to identify how we can best interact to share knowledge.[10]


Significance:
With SNA, we can have what others have and create more. This view comes from the perspective of Sara Philpott of IBM[2].


Necessity:
Data about the social lives that individuals share has grown at a phenomenal rate since social networking sites appeared in 1997[2]. Because this data contains information about the underlying social structure, people in many fields believe SNA may be a good way to understand what is happening and what will happen.


Related tools:
Socilyzer.
It is built for managers and consultants to conduct their own basic analyses[4,5].
SocNetV.
It lets you construct networks (mathematical graphs) with a few clicks on a virtual canvas, or load networks in various formats (GraphViz, GraphML, Adjacency, Pajek, UCINET, etc.) and modify them to suit your needs[6].
NodeXL.
It is a free, open-source template for Microsoft® Excel® 2007 and 2010 that makes it easy to explore network graphs[7].
Agna. 
It is a platform-independent application designed for social network analysis, sociometry and sequential analysis[8]. 
Wikipedia also provides a list of nearly 70 SNA tools[9].

Application:
SNA is an important tool in many areas, such as business intelligence, advertising strategy, entrepreneurship, improving the performance of communication systems, designing new mobile systems, human resource management, social science and policy making.

Challenges:
Sources on the web also point out several challenges for SNA, four of them in particular.
Overlapping community analysis.
For convenience, many approaches simplify the model by assuming that communities are disjoint. But sometimes this model is too simple, especially for business intelligence, where one person usually belongs to several communities at once.
Edge semantics.
In many models, the relation between two individuals is represented by a single edge with a single weight in the graph. This assumption can also be too simple to reveal the truth, because a single weighted edge cannot express several different kinds of relationship between the same pair.
Modeling edge creation/maintenance cost.
Creating a link in an online social network costs much less than forming a relationship in the real world. Does this need to be considered? Yes, if SNA is to become more useful.
Cross network analysis.
For business reasons, it is hard to analyze data across several networks. Without a way to do so, the results are one-sided to some extent[3].

Note:
More background on SNA is in my blog post on Lecture 6: Social Network Analysis.
A case study: using SNA to find the most influential node.
Consider the following social network formed by 5 students:


The sociograph is an undirected graph. Its five nodes are Alice, Bob, Carol, David and Eva, and its six links (Alice-Bob, Alice-Carol, Alice-David, Bob-David, Carol-David and David-Eva) represent their relationships.
We can represent the above network by a simple matrix: the entry in row i and column j is 1 if nodes i and j are linked, and 0 otherwise.
Using a matrix to turn the sociograph into a formal representation of relations makes it possible to compute measures with algorithms. This matrix is called a sociomatrix.
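To make the matrix concrete, here is a minimal Python sketch that builds it from an edge list. Because the original sociograph image is not available, the six links used here (Alice-Bob, Alice-Carol, Alice-David, Bob-David, Carol-David, David-Eva) are an assumption reconstructed from the cliques, cutpoint and bridge described in the text; the names `NODES`, `EDGES` and `adjacency_matrix` are illustrative only.

```python
# Assumed reconstruction of the five-student sociograph (undirected, unweighted).
NODES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
         ("Bob", "David"), ("Carol", "David"), ("David", "Eva")]


def adjacency_matrix(nodes, edges):
    """Return a symmetric 0/1 matrix: entry [i][j] is 1 if nodes i and j are tied."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1  # undirected graph: mirror each tie
    return matrix


A = adjacency_matrix(NODES, EDGES)
for name, row in zip(NODES, A):
    print(f"{name:<6}", *row)
# Alice  0 1 1 1 0
# Bob    1 0 0 1 0
# Carol  1 0 0 1 0
# David  1 1 1 0 1
# Eva    0 0 0 1 0
```

Because the graph is undirected, the matrix is symmetric, and each row sum is that node's degree.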


Next we need some terms for SNA measurements. These terms support quantitative analysis and help in designing software that computes the statistics automatically.
Cutpoint: A node which, if deleted, would disconnect the network. David is a cutpoint of the sociograph: removing him leaves Eva isolated.
Bridge: A tie which, if deleted, would disconnect the network. So the link between David and Eva is a bridge.
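Both definitions can be checked directly: delete the node (or tie) and test whether the rest is still connected. The sketch below does exactly that by brute force with a depth-first search. The edge list is the same assumed reconstruction of the sociograph, and the function names are hypothetical.

```python
# Assumed reconstruction of the sociograph (see lead-in).
NODES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
         ("Bob", "David"), ("Carol", "David"), ("David", "Eva")]


def is_connected(nodes, edges):
    """Depth-first search from the first node; True if every node is reached."""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


def cutpoints(nodes, edges):
    """Nodes whose deletion (together with their ties) disconnects the rest."""
    result = []
    for n in nodes:
        rest = [m for m in nodes if m != n]
        kept = [e for e in edges if n not in e]
        if not is_connected(rest, kept):
            result.append(n)
    return result


def bridges(nodes, edges):
    """Ties whose deletion disconnects the network."""
    return [e for e in edges
            if not is_connected(nodes, [f for f in edges if f != e])]


print(cutpoints(NODES, EDGES))  # ['David']
print(bridges(NODES, EDGES))    # [('David', 'Eva')]
```

Re-running a search for every node and tie is fine for five students; for large networks, a single-pass depth-first algorithm (Hopcroft-Tarjan) is the usual choice.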
Degree: The degree of a node is the number of links that are incident with it.
Density: The proportion of possible ties that actually exist. In other words, the number of links L divided by the number of links in a complete graph with the same number of nodes g, which is g(g-1)/2. The density of the above sociograph is:
$$\Delta = \frac{2L}{g(g-1)} = \frac{2 \times 6}{5 \times 4} = 0.6$$
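Degree and density follow directly from the edge list. A minimal sketch, again assuming the reconstructed six-link sociograph:

```python
# Assumed reconstruction of the sociograph (see lead-in).
NODES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
         ("Bob", "David"), ("Carol", "David"), ("David", "Eva")]

# Degree: number of links incident with each node.
degree = {n: 0 for n in NODES}
for a, b in EDGES:
    degree[a] += 1
    degree[b] += 1

g, L = len(NODES), len(EDGES)
# Density: present ties / possible ties; an undirected graph
# on g nodes has at most g(g-1)/2 ties.
density = L / (g * (g - 1) / 2)

print(degree)   # {'Alice': 3, 'Bob': 2, 'Carol': 2, 'David': 4, 'Eva': 1}
print(density)  # 0.6
```

The degrees sum to 12 = 2L, since every link is counted once at each end.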
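Geodesic path: A shortest path between two nodes is called a geodesic path. A pair of nodes can have more than one, e.g. Bob-Alice-Carol and Bob-David-Carol.
Geodesic distance: The length (number of links) of a geodesic path between two nodes. If no path exists between two nodes, the distance is infinite or undefined. The geodesic distances in the above sociograph are:
Alice: Bob 1, Carol 1, David 1, Eva 2 (sum 5)
Bob: Alice 1, Carol 2, David 1, Eva 2 (sum 6)
Carol: Alice 1, Bob 2, David 1, Eva 2 (sum 6)
David: Alice 1, Bob 1, Carol 1, Eva 1 (sum 4)
Eva: Alice 2, Bob 2, Carol 2, David 1 (sum 7)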
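In an unweighted graph, geodesic distances can be computed with breadth-first search (BFS) from each node: the first time BFS reaches a node, it has found a shortest path. A sketch, assuming the reconstructed edge list; `geodesic_distances` is a hypothetical helper name.

```python
from collections import deque

# Assumed reconstruction of the sociograph (see lead-in).
NODES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
         ("Bob", "David"), ("Carol", "David"), ("David", "Eva")]


def geodesic_distances(nodes, edges):
    """All-pairs geodesic distances via BFS from every node.
    Pairs with no connecting path keep the value float('inf')."""
    neighbours = {n: [] for n in nodes}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = {}
    for source in nodes:
        d = {n: float("inf") for n in nodes}
        d[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if d[v] == float("inf"):  # first visit = shortest distance in BFS
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[source] = d
    return dist


D = geodesic_distances(NODES, EDGES)
print(D["Bob"]["Carol"])                       # 2
print({u: sum(D[u].values()) for u in NODES})
# {'Alice': 5, 'Bob': 6, 'Carol': 6, 'David': 4, 'Eva': 7}
```

The per-node sums are exactly what closeness centrality is built from.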
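Clique: A maximal set of nodes (usually of at least three) in which every node is connected to every other. E.g. {Alice, Bob, David} and {Alice, Carol, David} are cliques.
N-Clique: A maximal set of nodes in which every pair is within geodesic distance n of each other (the paths may pass through nodes outside the set). E.g. {Alice, Bob, Carol, David, Eva} is a 2-Clique, since no geodesic distance in the network exceeds 2.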
K-Plex: A set of n nodes in which every node has a tie to at least n-k others in the set. E.g. {Alice, Bob, Carol, David} is a 2-Plex.
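The three subgroup definitions translate into simple membership tests. The sketch below checks each defining condition and enumerates cliques by brute force; it assumes the reconstructed edge list, uses hypothetical function names, and does not test maximality for n-cliques or k-plexes.

```python
from collections import deque
from itertools import combinations

# Assumed reconstruction of the sociograph (see lead-in).
NODES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
         ("Bob", "David"), ("Carol", "David"), ("David", "Eva")]
NEIGHBOURS = {n: set() for n in NODES}
for a, b in EDGES:
    NEIGHBOURS[a].add(b)
    NEIGHBOURS[b].add(a)


def distance(u, v):
    """Geodesic distance between u and v in the whole network (BFS); inf if unreachable."""
    seen, queue = {u: 0}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return seen[x]
        for y in NEIGHBOURS[x]:
            if y not in seen:
                seen[y] = seen[x] + 1
                queue.append(y)
    return float("inf")


def is_complete(group):
    """Every pair in the group is directly tied."""
    return all(b in NEIGHBOURS[a] for a, b in combinations(group, 2))


def is_n_clique(group, n):
    """Every pair is within geodesic distance n (paths may leave the group)."""
    return all(distance(a, b) <= n for a, b in combinations(group, 2))


def is_k_plex(group, k):
    """Each member is tied to at least len(group) - k other members."""
    members = set(group)
    return all(len(NEIGHBOURS[a] & members) >= len(group) - k for a in group)


def maximal_cliques(min_size=3):
    """Brute force over all subsets: fine for 5 nodes, exponential in general."""
    found = [set(c) for r in range(min_size, len(NODES) + 1)
             for c in combinations(NODES, r) if is_complete(c)]
    return [c for c in found if not any(c < other for other in found)]


print(maximal_cliques())  # {Alice, Bob, David} and {Alice, Carol, David}; set order may vary
print(is_n_clique(NODES, 2))                             # True
print(is_k_plex(["Alice", "Bob", "Carol", "David"], 2))  # True
```

{Alice, Bob, Carol, David} is a 2-plex but not a 1-plex (i.e. not a clique), because Bob and Carol are not tied.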
Centrality: Identifies which nodes are at the 'center' of the network. In a social network, entities at the center can be very important, somewhat like VIPs in the real world. Three standard centrality measures are widely used: degree centrality, closeness centrality and betweenness centrality.
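Degree centrality: The number of other actors directly connected to the actor in question, i.e. its degree, C_D(n_i) = d(n_i). This term signifies activity or popularity, and can be normalized by the largest possible degree, g - 1, where g is the number of nodes:
$$C'_D(n_i) = \frac{d(n_i)}{g-1}$$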
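Group degree centralization: Looks at the dispersion of centrality. A measure of the graph centralization:
$$C_D = \frac{\sum_{i=1}^{g}\left[C_D(n^*) - C_D(n_i)\right]}{(g-1)(g-2)}$$
where C_D(n*) is the largest value among all C_D(n_i) in the network, and the denominator is the largest value the numerator can take (reached by a star network). In this case David has the largest degree, 4, so the numerator is (4-3) + (4-2) + (4-2) + (4-4) + (4-1) = 8, and the group degree centralization is 8/(4×3) = 2/3.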
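A sketch of both degree measures, using Python's `fractions.Fraction` so the results come out as exact ratios rather than rounded decimals. The edge list is the assumed reconstruction of the sociograph.

```python
from fractions import Fraction

# Assumed reconstruction of the sociograph (see lead-in).
NODES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
         ("Bob", "David"), ("Carol", "David"), ("David", "Eva")]

degree = {n: 0 for n in NODES}
for a, b in EDGES:
    degree[a] += 1
    degree[b] += 1

g = len(NODES)
# Normalized degree centrality C'_D(n_i) = d(n_i) / (g - 1), kept as exact fractions.
normalized = {n: Fraction(d, g - 1) for n, d in degree.items()}

# Group degree centralization: summed gap to the most central node,
# divided by its maximum possible value (g - 1)(g - 2).
c_star = max(degree.values())
centralization = Fraction(sum(c_star - d for d in degree.values()), (g - 1) * (g - 2))

print(normalized["David"], normalized["Eva"])  # 1 1/4
print(centralization)                          # 2/3
```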


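Closeness centrality: Based on the geodesic distances between a particular node and all other nodes (so the network must be connected): it is the inverse of their sum, so the closer a node is to everyone else, the higher its score. It can be understood as how quickly a message starting from that node can spread through the network.
$$C_C(n_i) = \left[\sum_{j=1}^{g} d(n_i, n_j)\right]^{-1}$$
Normalized closeness centrality, which is the inverse of the mean distance to the other g - 1 nodes:
$$C'_C(n_i) = (g-1)\,C_C(n_i)$$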
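Group closeness centralization: Measures the overall level of closeness in a network by comparing the sum of differences from the most central node with how large that sum could possibly be:
$$C_C = \frac{\sum_{i=1}^{g}\left[C'_C(n^*) - C'_C(n_i)\right]}{(g-1)(g-2)/(2g-3)}$$
where C'_C(n*) is the largest value among all C'_C(n_i) in the network, and the denominator is the theoretical maximum of the numerator (reached by a star network). In this case the normalized closeness values are 4/5 (Alice), 2/3 (Bob and Carol), 1 (David) and 4/7 (Eva), so the numerator is 1/5 + 1/3 + 1/3 + 0 + 3/7 = 136/105, and the group closeness centralization is (136/105)/(12/7) = 34/45 ≈ 0.76.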
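A sketch of closeness and its group centralization, assuming the reconstructed edge list and a connected network. It uses the normalized values C'_C in Freeman's formula; mixing raw C_C values into the numerator with the normalized denominator would understate the result by a factor of g - 1.

```python
from collections import deque
from fractions import Fraction

# Assumed reconstruction of the sociograph (see lead-in).
NODES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
         ("Bob", "David"), ("Carol", "David"), ("David", "Eva")]
NEIGHBOURS = {n: [] for n in NODES}
for a, b in EDGES:
    NEIGHBOURS[a].append(b)
    NEIGHBOURS[b].append(a)


def total_distance(source):
    """Sum of geodesic distances from source to every other node (BFS).
    Assumes the network is connected; otherwise closeness is undefined."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in NEIGHBOURS[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values())


g = len(NODES)
closeness = {n: Fraction(1, total_distance(n)) for n in NODES}  # C_C(n_i)
normalized = {n: (g - 1) * c for n, c in closeness.items()}     # C'_C(n_i)

# Freeman group closeness centralization on the normalized values.
c_star = max(normalized.values())
max_possible = Fraction((g - 1) * (g - 2), 2 * g - 3)           # value for a star network
centralization = sum(c_star - c for c in normalized.values()) / max_possible

print(normalized["Alice"], normalized["David"], normalized["Eva"])  # 4/5 1 4/7
print(centralization)                                               # 34/45
```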
Betweenness centrality: Counts how often a node lies on the shortest paths between pairs of other nodes. If g_jk is the number of geodesics between nodes j and k, and g_jk(n_i) is the number of those that pass through actor i, then actor i receives the fraction g_jk(n_i)/g_jk for every pair j, k. It is a measure of the potential for control, as an actor who is high in 'betweenness' is able to act as a gatekeeper controlling the flow of resources (information, money, power, etc.) between the alters that he or she connects. The measure below is for undirected graphs:
$$C_B(n_i) = \sum_{j<k,\; j,k \neq i} \frac{g_{jk}(n_i)}{g_{jk}}$$
Normalized betweenness centrality, dividing by the number of pairs of other nodes:
$$C'_B(n_i) = \frac{C_B(n_i)}{(g-1)(g-2)/2}$$
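Group betweenness centralization: Measures the overall level of betweenness in a network:
$$C_B = \frac{\sum_{i=1}^{g}\left[C_B(n^*) - C_B(n_i)\right]}{(g-1)^2(g-2)/2}$$
where C_B(n*) is the largest value among all C_B(n_i) in the network. Or, simplified using the normalized values:
$$C_B = \frac{\sum_{i=1}^{g}\left[C'_B(n^*) - C'_B(n_i)\right]}{g-1}$$
In this case David lies on the only geodesic from Eva to each of Alice, Bob and Carol, and on one of the two geodesics between Bob and Carol, so C_B(David) = 3.5; Alice lies on the other Bob-Carol geodesic, so C_B(Alice) = 0.5; Bob, Carol and Eva score 0. The numerator is 3 + 3.5 + 3.5 + 0 + 3.5 = 13.5, so the group betweenness centralization is 13.5/24 = 9/16 ≈ 0.56.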
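A sketch of betweenness under the same assumed edge list. BFS from each node records both the distance and the number of geodesics (sigma) to every other node; node i lies on sigma(j, i) × sigma(i, k) of the j-k geodesics whenever d(j, i) + d(i, k) = d(j, k). This pairwise counting is fine for a toy network; Brandes' algorithm is the usual choice for large graphs.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Assumed reconstruction of the sociograph (see lead-in).
NODES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
         ("Bob", "David"), ("Carol", "David"), ("David", "Eva")]
NEIGHBOURS = {n: [] for n in NODES}
for a, b in EDGES:
    NEIGHBOURS[a].append(b)
    NEIGHBOURS[b].append(a)


def geodesic_counts(source):
    """BFS from source: distance and number of geodesics to every reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in NEIGHBOURS[u]:
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u precedes v on a geodesic
                sigma[v] += sigma[u]
    return dist, sigma


BFS = {n: geodesic_counts(n) for n in NODES}


def betweenness(i):
    """C_B(n_i) = sum over pairs j<k (j, k != i) of g_jk(n_i) / g_jk; assumes a connected graph."""
    dist_i, sigma_i = BFS[i]
    total = Fraction(0)
    for j, k in combinations([n for n in NODES if n != i], 2):
        dist_j, sigma_j = BFS[j]
        if dist_j[i] + dist_i[k] == dist_j[k]:  # i lies on some j-k geodesic
            total += Fraction(sigma_j[i] * sigma_i[k], sigma_j[k])
    return total


g = len(NODES)
raw = {n: betweenness(n) for n in NODES}
normalized = {n: c / Fraction((g - 1) * (g - 2), 2) for n, c in raw.items()}
c_star = max(raw.values())
centralization = sum(c_star - c for c in raw.values()) / Fraction((g - 1) ** 2 * (g - 2), 2)

print(raw["David"], raw["Alice"], normalized["David"])  # 7/2 1/2 7/12
print(centralization)                                   # 9/16
```

The simplified form, the summed gaps in normalized values divided by g - 1, gives the same 9/16.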

Results:
David is clearly the most influential node. His degree centrality is the largest, and closeness centrality and betweenness centrality lead to the same judgement. These measurements give us different angles from which to view the social network.


In fact, the easiest way to find the most influential node is common sense or intuition: more links are connected to David than to anyone else. With such a simple method, anyone can reach the conclusion without any knowledge of SNA. But if we want to describe it precisely, in a form a computer can process, the concepts of SNA are useful.


Based on the results obtained, there are several findings:
1) Different methods applied to the same social network may produce the same result, but this alone does not show that their results must always agree. These methods view the network from different angles, and the model here is simplified. Whether they still agree on a much more complex network needs more research, and that research could give us a better understanding of 'All roads lead to Rome'.


2) Different social networks have different features, which can be studied with different methods along different dimensions. So choosing the right tools may be important for a particular purpose.


3) The most influential node may be a cutpoint. Such a node deserves more attention in the real world, because it may be a key resource or a VIP. Controlling these nodes may help control the entire network more effectively and rapidly, and help maintain the stability of the network.


4) SNA shows a strong ability to reveal the interaction patterns of social individuals. It provides a tool, and an opportunity, for more complex research on how society and business evolve. For example, SNA is now an important tool for business intelligence. This property leads us to recognize that with SNA we can have what others have and create more.


Ref:
[1] http://en.wikipedia.org/wiki/Social_network_analysis
[2] http://www-935.ibm.com/services/ie/gbs/irishtelecom/pdf/social_network_analysis.pdf
[3] http://datamining.typepad.com/data_mining/2008/04/four-challenges.html
[4] http://socilyzer.com
[5] http://www.bioteams.com/2008/02/08/a_great_free.html
[6] http://socnetv.sourceforge.net/
[7] http://nodexl.codeplex.com/
[8] http://mac.softpedia.com/progDownload/AGNA-Download-47086.html
[9] http://en.wikipedia.org/wiki/Social_network_analysis_software
[10] http://www.kstoolkit.org/Social+Network+Analysis
[11] Lectures 6, 7 and 8

3 comments:

  1. Interesting: the images in this post are hidden, and I can recover only some of them. Is this a drawback of Blogger?

  2. This part of the lecture is so confusing to me. First, I want to know whether the answer and analysis are correct. Sorry for asking you this question, but I am not clear about this part of the lecture. If your analysis is correct, it will help me a lot: I can follow your instructions and understand this case a little better than on my own. SNA is an important tool for many areas, such as business intelligence, advertisement strategy, entrepreneurs, improvement of performance of communication system, design of new mobile system, human resource management, social science, policy making etc. So great. And closeness centrality, normalized closeness centrality, group closeness centralization, normalized betweenness centrality, group betweenness centralization, and "CB(n*) is the largest value among all CB(ni) in the network" are so hard for me, not only the formulas but also what the explanations mean.

  3. Very detailed calculation. The best thing is that your conclusion inspired me a lot!
