Bernie Hogan at the WebSci summer school

Bernie Hogan spoke this morning on ‘Facebook as a Research Environment’. He opened by describing Facebook as an excellent source of data, albeit (let’s face it, inevitably) one with legal, ethical and technological constraints. He described academic uses of it, including:

  • capturing user/network data and pushing that to a survey
  • comparing claims about products with people’s recommendations
  • studying relationship strength against trace data

Facebook, data and controversy

Hogan outlined a few studies that resulted in embarrassment in one way or another:

  • Lewis et al, Taste, Ties and Time: it was possible to figure out who was whom in a dataset made available (as reported online yesterday)
  • Warden, American Cultures: Pete Warden captured friend links from all publicly available Facebook profiles (about 40% of the Facebook population). He planned to release the data to the public, but got squashed by Facebook’s lawyers…
  • The Facebook 100: Porter et al released personal data from the first 100 schools to join Facebook in 2005… accidentally including IDs, making it possible to identify people in that dataset.

Yes, we’re back to the Problem for Web Science.

The FB100 dataset is still available in a pseudonymised format: I piped up to ask whether it’s still possible to figure out who is whom, and apparently you can figure out schools but not people. Really? I’d like to find out more about that.
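To make the ID problem concrete, here is a minimal sketch in Python. It is not how the FB100 data was actually prepared (I don't know that), and the records, field names, key and directory are all made up for illustration. It contrasts a release that leaves the original IDs in place with one that swaps each ID for a keyed hash, and shows the simple lookup ("linkage") that the first makes possible.

```python
import hashlib
import hmac

# Hypothetical records: invented for illustration, not from any real dataset.
records = [
    {"user_id": "1001", "school": "School A", "year": 2005},
    {"user_id": "1002", "school": "School B", "year": 2006},
]

# Hypothetical public lookup from ID to name (e.g. scraped profile pages).
public_directory = {"1001": "Alice Example", "1002": "Bob Example"}


def naive_release(rows):
    """Release the rows unchanged, so the original IDs are still present."""
    return [dict(row) for row in rows]


def pseudonymise(rows, secret_key):
    """Replace each ID with a keyed hash (HMAC-SHA256) and drop the original."""
    released = []
    for row in rows:
        digest = hmac.new(
            secret_key, row["user_id"].encode("utf-8"), hashlib.sha256
        ).hexdigest()
        cleaned = {k: v for k, v in row.items() if k != "user_id"}
        cleaned["pseudonym"] = digest[:16]
        released.append(cleaned)
    return released


def reidentify(released_rows, directory):
    """Linkage attack: match any surviving IDs against a public directory."""
    return {
        row["user_id"]: directory[row["user_id"]]
        for row in released_rows
        if row.get("user_id") in directory
    }


leaked = reidentify(naive_release(records), public_directory)
protected = reidentify(pseudonymise(records, b"keep-this-secret"), public_directory)
```

With the IDs left in, `leaked` maps every record to a name; with pseudonyms, `protected` comes back empty. That only covers direct ID lookup, though: the remaining attributes (school, year, and in a real dataset the friendship structure) may still narrow people down, which is why I'd like to know more.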

Social capital

Hogan also remarked that Facebook provides exceptional access to social capital, which he defined as ‘the ability of individuals / groups to access resources from their social network’. He pointed to a few papers (including, of course, Granovetter’s The Strength of Weak Ties (pdf)), and asked questions such as:

  • Do community clusters affect perceptions of social capital? (I.e. does a diverse network make you feel you have access to broad support?)
  • Is the effect mediated by participation in the network or is it independent of network activity?
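One crude way to put a number on "community clusters" is to count the separate groups among a person's friends once that person is taken out of the picture. The sketch below does that with plain connected components. It is my own illustration, not a method Hogan presented, and the names and friendship links are invented.

```python
from collections import defaultdict


def friend_clusters(friend_links, friends):
    """Group one person's friends into connected components.

    friend_links: iterable of (a, b) pairs meaning a and b are friends.
    friends: set of that person's friends (the person is excluded).
    """
    adjacency = defaultdict(set)
    for a, b in friend_links:
        # Only keep links between two of this person's friends.
        if a in friends and b in friends:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(friends):
        if start in seen:
            continue
        # Depth-first walk to collect everyone reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical ego network: two groups that do not know each other.
friends = {"ann", "bob", "cat", "dan", "eve"}
links = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
clusters = friend_clusters(links, friends)
```

Here `clusters` holds two groups, one of three friends and one of two. Real studies would more likely use a proper community-detection algorithm, since real friend groups overlap and a single shared acquaintance would merge two components here.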

He touched on identity markers (e.g. in the context of Reddit and Wikipedia), and how they lead to differences in behaviour/topology. This reminded me of work by Michael Bernstein et al on Anonymity and Ephemerality on 4chan and /b/.

The complexity of relationships, and handling that online

This topic came up, too. I’d say it’s a whole other blog post…