Social Networks And Group Formation
Theoretical Concepts to Leverage
by Shiv Singh on 2007/09/06
This is the first in a three-part series on academic research that illuminates social networks, one of the most important trends in design today.
Humans suffer from information overload; there’s much more information on any given subject than a person is able to access. As a result, people are forced to depend upon each other for knowledge. Know-who information, rather than know-what, know-how or know-why information, has become the most crucial kind. It involves knowing who has the needed information and being able to reach that person (Johnson et al., 2002).
In this context, understanding the formation, evolution and utilization of online social networks becomes important. A social network is “a set of people (or organizations or other social entities) connected by a set of social relationships, such as friendship, co-working or information exchange.” (Garton et al., 1997) While the Internet contributes to the information overload, it also provides useful tools to effectively manage one’s social networks and through them gain access to the right pieces of information.
This field is of particular interest to researchers working at the intersection of information systems, sociology and mathematics. These researchers study the uses of social networks and the ways in which they are mediated in society and in the workplace through information communication technologies (ICTs) such as (but not limited to) the Internet. This literature review explores how social networks that take advantage of information communication technologies, specifically web-based technologies, begin, evolve and are utilized.
The online social network field is broad, and any literature review can only focus on a selection of articles. The present article highlights recent research in the field and focuses on centrality, linkage strength, identity, trust, activity and benefits. By no means is this review comprehensive, but it should give practitioners some useful concepts to consider as they design social network based web applications.
The Strength of Weak Ties
Social networks were first researched in the late 1940s. With the advent of the Internet, online communities and social networking websites, their significance has only increased. Any review hoping to be meaningful must begin with the foundational contributions of the sociologist Mark Granovetter and the mathematician Linton C. Freeman, both of whom wrote influential articles well before the Internet was popularized.
Granovetter (1973) argued that within a social network, weak ties are more powerful than strong ties. He explained that this was because information was far more likely to be “diffused” through weaker ties. He concluded that weak ties are “indispensable to individuals’ opportunities and to their incorporation into communities while strong ties breed local cohesion.”
Granovetter’s doctoral thesis demonstrated that most people landed jobs thanks to their weak ties and not their strong ones. It was the people that they did not know well, the ones with whom they did not have shared histories and did not see on a regular basis who were of most help. This is because people with strong ties generally share the same pieces of information and resources. Therefore they are of less help to one another.
Similarly, Granovetter identified absent ties (also called nodding ties): ties that lack the emotional intensity, time, intimacy and reciprocity to even qualify as weak ties. A neighbor on your street whom you nod to every day is an absent tie. An absent tie is someone who exists in your life but with whom you have no real connection. That person is not helpful in the way that a weak tie can be.
Depending upon the type of application you are building, you may want to design it so that people are encouraged to form weak ties with people that they do not know very well. They are more likely to benefit from those weak ties than from strong ones. But it is important to recognize the difference between a weak tie and an absent one. On social network sites like MySpace and Facebook, where self-worth is garnered through the number of ties, the difference becomes important. Yet the fact that you can search for and connect to all kinds of ties on these networks has influenced their growth.
According to Granovetter’s theory, there would be value in the visual depiction of weak ties. LinkedIn tells you how many ties you have at each degree of separation, but other than that you are not given much information about those ties. Are they strong, weak, or absent ties? LinkedIn has another problem too: It makes it difficult for you to connect with your weak ties. You often have to ask a common friend for permission to establish that connection. No wonder LinkedIn is being eclipsed by other social network services!
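One way a designer might approximate the strong, weak and absent distinction from the friendship graph alone is neighborhood overlap: the share of contacts that two connected people have in common. This is only a proxy assumed here for illustration, not a measure taken from the article or from any of the sites mentioned. The sample graph, the function names and the 0.3 threshold are all hypothetical.

```python
def neighborhood_overlap(graph, a, b):
    """Share of a's and b's other contacts that they have in common (0.0 to 1.0)."""
    others_a = graph[a] - {b}
    others_b = graph[b] - {a}
    union = others_a | others_b
    if not union:
        return 0.0
    return len(others_a & others_b) / len(union)


def classify_tie(graph, a, b, strong_threshold=0.3):
    """Label a pair 'absent', 'weak' or 'strong'.

    The threshold is an arbitrary assumption; a real product would need
    to calibrate it (or use richer signals such as message frequency).
    """
    if b not in graph[a]:
        return "absent"
    overlap = neighborhood_overlap(graph, a, b)
    return "strong" if overlap >= strong_threshold else "weak"


# Hypothetical friendship graph: each person maps to the set of their contacts.
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "eve"},
    "eve": {"dan"},
}

for pair in [("ann", "bob"), ("ann", "dan"), ("bob", "eve")]:
    print(pair, classify_tie(friends, *pair))
```

In this sample, ann and bob share a mutual contact and come out as a strong tie, ann and dan share none and come out as a weak tie, and bob and eve are not connected at all.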
Centralization in a Network
An understanding of social networks also needs to include accounts of centrality and of one node’s relationship to other nodes in a network. This is why Linton C. Freeman’s article on centrality in social networks is important (Freeman, 1979). Freeman explored how “graph centralization” was based on differences in point centralities. He also outlined three competing definitions of centrality, based respectively on the degree of a point, control and independence.
Degree of a point refers to the number of nodes connected to a given node. In simple terms, this means counting the number of friends you have in a social network. The more friends you have, the more important you are.
Control refers to the extent to which nodes depend on one specific node to communicate with other nodes. For example, if hundreds of friends are connected to each other only when you serve as the bridge connecting them, then your centrality is high. You are the node that controls the communication flows.
Finally, independence means that a node is closely related to all the nodes considered, so that it is minimally dependent on any single node and is not subject to control. This means you can reach the maximum number of people through the fewest links, without being dependent on a particular few nodes.

Figure 1: A depiction of centrality.
- Degree of a point: C and K have the most nodes connected to them.
- Control: D serves as the bridge between the most nodes and controls the flow of information.
- Independence: K is most closely connected to the other nodes by multiple nodes (I and Q).
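Freeman’s three definitions can be computed directly from a friendship graph. The sketch below is a minimal illustration, not code from the article: it treats degree as a count of neighbors, control as betweenness (how many shortest paths between other pairs pass through a node) and independence as closeness (how short a node’s paths to everyone else are). The seven-person graph and all function names are hypothetical, and the graph is unrelated to Figure 1.

```python
from collections import deque


def degree_centrality(graph):
    """Degree of a point: the number of direct connections."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def bfs_distances(graph, source):
    """Shortest-path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(graph):
    """Independence: reachable nodes divided by total distance to them.

    Assumes a connected graph; unreachable nodes are simply ignored.
    """
    result = {}
    for source in graph:
        dist = bfs_distances(graph, source)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Control: number of shortest paths between other pairs that pass
    through each node (Brandes' accumulation, undirected, unweighted)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        dist = {source: 0}
        paths = {node: 0 for node in graph}   # count of shortest paths from source
        paths[source] = 1
        preds = {node: [] for node in graph}  # predecessors on shortest paths
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both ends.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical network: two tight clusters joined only through "d".
network = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}

degree = degree_centrality(network)
control = betweenness_centrality(network)
independence = closeness_centrality(network)
for name in sorted(network):
    print(name, degree[name], control[name], round(independence[name], 2))
```

In this sample graph, d has only two connections, yet every shortest path between the two clusters runs through it, so it scores highest on control (9.0, against 8.0 for c and e) while scoring lower on degree than c and e.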
Because social networks are fundamentally social tools in which people are constantly monitoring and growing their social network, most social network media depict growth using the degree of point definition. However, control and independence can be more useful definitions. For example, a person who controls information flows is more important than one who may have more friends in the network. Centrality can also indicate which members are the most useful or well connected and therefore the best information resources.
Learning from Flickr & Yahoo
The principles of node structures, tie strength and centrality have been applied to understand nodes in modern-day online social networks. A good example of this is the explanatory research conducted by Kumar, Novak and Tomkins (2006). They compared two online social networks, Flickr and Yahoo! 360, which together had more than five million users at the time. These researchers noticed that the social networks follow a standard pattern of growth, namely, rapid early growth followed by a period of decline and then slow but steady growth.
Kumar, Novak and Tomkins also saw that network activity is of three types:
- “Singletons,” who have no connections and are least central
- The “giant component,” which is the largest group of nodes tightly connected to the central nodes and to each other
- The “middle region,” which represents isolated groups which interact amongst themselves but not with the rest of the network, forming isolated stars. These groups grow one user at a time. Over time they merge with the giant component.
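The three regions above can be sketched as a connected-components computation: find every group of mutually reachable members, call the largest one the giant component, call groups of one singletons, and treat everything else as the middle region. This is a simplified reading of the description in the article, not the researchers’ own procedure; the member names and the `classify_regions` helper are hypothetical.

```python
from collections import deque


def connected_components(graph):
    """Return a list of sets, one per group of mutually reachable nodes."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def classify_regions(graph):
    """Split a non-empty network into giant component, middle region, singletons.

    Assumes the largest component has more than one member.
    """
    components = connected_components(graph)
    giant = max(components, key=len)
    singletons = [c for c in components if len(c) == 1]
    middle = [c for c in components if len(c) > 1 and c is not giant]
    return {"giant": giant, "middle": middle, "singletons": singletons}


# Hypothetical network of 12 members.
members = {
    "u1": {"u2", "u3"}, "u2": {"u1", "u3", "u4"}, "u3": {"u1", "u2"},
    "u4": {"u2", "u5"}, "u5": {"u4"},                  # largest group
    "s1": {"s2", "s3"}, "s2": {"s1"}, "s3": {"s1"},    # isolated group
    "p1": {"p2"}, "p2": {"p1"},                        # isolated pair
    "x1": set(), "x2": set(),                          # no connections
}

regions = classify_regions(members)
outside = len(members) - len(regions["giant"])
print("giant component:", sorted(regions["giant"]))
print("middle region groups:", [sorted(c) for c in regions["middle"]])
print("singletons:", [sorted(c) for c in regions["singletons"]])
print("members outside the giant component:", outside, "of", len(members))
```

Running a count like the last line on real membership data is a quick way to see how much of a network sits outside its giant component.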

Figure 2a: The red section represents the giant component. The blue is the middle region, comprising isolated networks, while the gray nodes are singletons.
The node analysis of these networks showed that more than half of a social network is outside the giant component where the greatest centrality lies. They used the “control” definition of centrality to determine this. The research also highlighted a prevalence of “stars” in the middle region which are mini social networks, typically driven by one dynamic member who serves as the point of centrality with others serving as satellite nodes – connected to the dynamic member but not to each other. In Kumar, Novak and Tomkins’ analysis the middle region represented one-third of users on Flickr and about ten percent of users on Yahoo! 360.
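A star, as described here, has a simple structural signature: one member is connected to everyone else in the group, and everyone else is connected only to that member. The sketch below checks a group for that shape. It is a strict, illustrative test under that assumption, not the definition used in the cited research; the graph and the `find_star_hub` helper are hypothetical.

```python
def find_star_hub(graph, component):
    """Return the hub if `component` is a star, otherwise None.

    A star here means at least three members, exactly one of whom is
    linked to all the others, while the others link only to that hub.
    """
    if len(component) < 3:
        return None
    hubs = [n for n in component if len(graph[n]) == len(component) - 1]
    if len(hubs) != 1:
        return None
    hub = hubs[0]
    satellites = component - {hub}
    if all(graph[n] == {hub} for n in satellites):
        return hub
    return None


# Hypothetical groups: one star around "host", one triangle of mutual friends.
groups_graph = {
    "host": {"g1", "g2", "g3"}, "g1": {"host"}, "g2": {"host"}, "g3": {"host"},
    "t1": {"t2", "t3"}, "t2": {"t1", "t3"}, "t3": {"t1", "t2"},
}

print(find_star_hub(groups_graph, {"host", "g1", "g2", "g3"}))
print(find_star_hub(groups_graph, {"t1", "t2", "t3"}))
```

The first group is reported with "host" as its hub; the triangle is not a star because its members are also connected to each other.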
Also keep in mind that the most growth happens in the middle region where dynamic members influence others to join their network. These sub-networks can gradually join the giant component over time. Once they do, the importance of the dynamic member diminishes. Even if that dynamic member were to leave the network, the others would stay in the network.

Figure 2b: A connection is made between one of the isolated networks in the middle region and the giant component.

Figure 2c: The formerly isolated network becomes part of the giant component.
What are the implications of this? When designing your social network, be aware that most of the network will be outside the giant component. In a sense, social networks themselves are thousands of sub-networks. The more mechanisms that you provide for those sub-networks to flourish, the greater the overall network growth. Social networks are fundamentally virtual ghettos. Networks like MySpace and Facebook that encourage ghettos grow the most. Ning, which lets you create your own network and join others too, cleverly understands this concept and leverages it.
Live Journal, DBLP & Adoption Behavior
Most online social networks grow based on the initiative of early adopters who transfer their offline networks online and serve as “stars.” But it is also important to look at the evolution of social networks based on intentional activity within a network. Backstrom, Huttenlocher and Kleinberg (2006) analyzed group formation in large social networks. They used data from LiveJournal’s ten million users and from DBLP, a database of co-authorship in conference publications, to study how the communities grew based on the underlying social networks. They showed that a person was more likely to join a social network if that person’s friends were already closely linked together on it. Having several friends closely connected in an online social network builds trust. For those of us who are active members of social networks, this makes obvious sense.
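The two signals described above, how many of a person’s friends are already in a group and how closely those friends are linked to each other, are easy to measure on a friendship graph. The sketch below only computes the two numbers; it does not predict who will join, and it is not the model from the cited study. The graph, the two groups and the function names are hypothetical.

```python
from itertools import combinations


def friends_in_group(graph, person, group):
    """The person's friends who are already members of the group."""
    return graph[person] & group


def friend_connectedness(graph, person, group):
    """Fraction of pairs among those friends who are linked to each other."""
    inside = sorted(friends_in_group(graph, person, group))
    pairs = list(combinations(inside, 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


# Hypothetical graph: "you" has three friends in each of two groups.
social = {
    "you": {"a", "b", "c", "d", "e", "f"},
    "a": {"you", "b", "c"}, "b": {"you", "a", "c"}, "c": {"you", "a", "b"},
    "d": {"you"}, "e": {"you"}, "f": {"you"},
}
book_club = {"a", "b", "c"}     # friends who all know each other
photo_group = {"d", "e", "f"}   # friends who do not know each other

for name, group in [("book_club", book_club), ("photo_group", photo_group)]:
    count = len(friends_in_group(social, "you", group))
    density = friend_connectedness(social, "you", group)
    print(name, count, density)
```

Both groups contain three of your friends, but only in the first are those friends linked to one another, which is the situation the finding above associates with a higher likelihood of joining.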
The article conclusively showed that the most growth happened in the giant component (without using the term explicitly) where the nodes were most central. In highlighting the importance of the giant component, Backstrom, Huttenlocher and Kleinberg validated the Kumar et al. (2006) theory. Their article raises a critical question: Once a node becomes aware of its neighbors’ behavior, under what conditions and based on what network relationships will the node adopt that behavior itself?
Another group of researchers who studied the DBLP database were Cai et al. (2005). They pointed out that each node belongs to several different social networks, and that those other networks affect group formation patterns, evolution and information sharing on the network being studied. As a result, they felt that a network can’t be analyzed independently but needs to be studied in the context of other networks. Activity on a node’s other networks may also influence whether that node leaves a given network. This raises an important question for practitioners: Do you know how much of the activity on your social network is influenced by activity on other social networks?
This is of particular interest when examined in the context of the new Google lab efforts around Social Stream, which hopes to be a meta-social-network aggregating different networks together. Developed in partnership with Carnegie Mellon University, Social Stream is currently in private beta. The question that social network designers worry about is, once you can understand network activity on different networks via a single, consolidated interface, how will that affect your own network preferences?
It is clear that online social networks are always evolving because of both outside influences and activity within them. Butler (2001) emphasized this when he showed that network size has a complex influence on the network: a larger network gains more members but also loses more. He argued that it is necessary to balance the positives and negatives of size and communication activity. A final question to consider is which type of membership activity, and where (giant component, middle region or among singletons), most affects an online network.
Conclusion
Researchers studying group formation have incorporated the foundational social network theories discussed by Granovetter and Freeman. They recognize that these are socio-technical systems that must account for human agency, meaning that the ability of human beings to make unique choices heavily influences a network’s evolution. As a result, one can apply social networking theory to a web product, but one must remember that because these are human systems it is difficult to gauge the potential success of a given network.
The next part of this series will explore information-sharing patterns on social networks. The third part will cover some workplace scenarios.
Author’s Note: By no means is this review comprehensive; it should serve as just a starting point for gaining familiarity with some of the academic contributions.
References
Backstrom, L., Huttenlocher, D., Kleinberg, J., and Lan, X. (2006). Group formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press: Philadelphia, PA, USA.
Butler, B. (2001). Membership size, communication activity, and sustainability: a resource-based model of online social structures. Information Systems Research, 12 (4), pp. 346-362.
Cai, D., Shao, Z., He, X., Yan, X., and Han, J. (2005). Mining hidden community in heterogeneous social networks. In Proceedings of the 3rd International Workshop on Link Discovery. ACM Press: Chicago, Illinois.
Freeman, L. C. (1979). Centrality in social networks: conceptual clarification. Social Networks, 1, pp. 215-239.
Garton, L., Haythornthwaite, C., and Wellman, B. (1997). Studying online social networks. Journal of Computer Mediated Communication, 3 (1).
Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78 (6), pp. 1360-1380.
Johnson, B., Lorenz, E., and Lundvall, B. (2002). Why all this fuss about codified and tacit knowledge? Industrial and Corporate Change, 11 (2), pp. 245-262.
Kumar, R., Novak, J., and Tomkins, A. (2006). Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 611-617. ACM Press: Philadelphia, PA, USA.



