Digital methods: Methodological design and lessons learned in the research of digital commons

Yesterday I took part in an informal workshop at the Digital Methods Initiative (DMI) at the University of Amsterdam. The aim was to exchange methodological approaches and to discuss tools for analysing Wikipedia, with presentations from Erik Zachte (Wikimedia in figures – Wikimedia Foundation); Victor Grishchenko (Accretion and page growth); Johanna Niesyto (Experiences with tools across the EN and DE language versions); the Digital Methods Initiative group (Esther Weltevrede, Borra and others), presenting tools to research Wikipedia; and myself.

It is not common to find research events focused on open collaboration and online collective action, and even less common to find a workshop focused on the best methods to research digital life. So I enjoyed the discussion a lot.

You can find a synopsis of the research designs and references to tools on the DMI website.

Below I include the notes for my presentation on the methodological design and lessons learned from my PhD research on Digital Commons Governance.

Hope you like it or find it useful. Comments welcome! Mayo.

Fuster Morell, Mayo (2010) “Note – Research Digital Commons Governance. Methodological design and lessons learned”. Digital Methods Initiative. University of Amsterdam. 25th March 2010. (http://www.digitalmethods.net/Digitalmethods/WikiAnalyticsWorkshop).

I. Introduction

I will present the methodological design and lessons from my PhD research on “Governance of online creation communities for the building of digital commons”, which I am finishing at the European University Institute.

My methodological reflections concern case comparison, that is, research based on comparing Wikipedia to other cases.

My units of analysis are online creation communities (OCCs), which I define as “a form of collective action performed by individuals that communicate, interact and cooperate; in several forms and degrees of participation; mainly via a platform of participation in the Internet; with the common goal of knowledge-making and sharing; which result in a digital common, that is, an integrated resource of information and knowledge (partly or totally) of collective property and freely accessible to third parties”.

Other concepts used to define this type of collective action are open collaboration and commons-based peer production.

Wikipedia is one example. But there are also other communities built around diverse information pools, such as software packages (e.g. Debian, Plone, Drupal and the Facebook Development Team); guides or manuals (e.g. Wikihow or Wikitravel); or multimedia archives (e.g. YouTube for video, or PLoS for articles (libraries)).

From the OCCs I analysed the governance. However, while the literature on the governance of OCCs mainly focuses on the interaction among the participants, I decided to also take into consideration the role of the infrastructure provider. For example, the Wikimedia Foundation is the provider of Wikipedia, and Yahoo is the provider of Flickr.

More concretely, I seek to answer the question: How does the type of provider relate to the community generated, in terms of community size, type of collaboration and self-governance?

II. Methodology

Firstly, the empirical research was based on a multi-method approach. I combined a large-N statistical analysis of 50 cases with an in-depth case study comparison of four cases.

The combination of these two methodologies was very useful in terms of questioning and reinforcing the results of one method with the results of the other.

II. I Large-N

For the large-N analysis, I adapted an approach from political science research: web analysis of the democratic quality of political actors’ websites.

Steps for the large-N:

1) Design of a sample of 50 cases

2) Elaborate a codebook.

3) Data collection

4) Calculate descriptive statistics and correlations between the variables.

1) Sampling:

I developed a snowball search, specifically by exhausting the search through these means:

i) Search in documentation and literature;

ii) Follow the hyper-links between the websites;

iii) Use general search engines (e.g. Google).
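The search itself was done by hand, but the logic of step ii) can be sketched in code. The following is a minimal illustration only, assuming a breadth-first traversal: the link graph, the function names (snowball_sample, get_links, is_candidate) and the site "example-blog.net" are hypothetical and are not part of the original research.

```python
from collections import deque


def snowball_sample(seeds, get_links, is_candidate, max_cases=50):
    """Breadth-first snowball search: start from seed sites, follow their
    hyperlinks, and keep every visited site that qualifies as a case."""
    seen = set(seeds)
    queue = deque(seeds)
    sample = []
    while queue and len(sample) < max_cases:
        site = queue.popleft()
        if is_candidate(site):
            sample.append(site)
        for linked in get_links(site):
            if linked not in seen:  # never queue the same site twice
                seen.add(linked)
                queue.append(linked)
    return sample


# Hypothetical link graph, invented for illustration only.
LINKS = {
    "wikipedia.org": ["wikihow.com", "example-blog.net"],
    "wikihow.com": ["wikipedia.org", "debian.org"],
    "example-blog.net": [],
    "debian.org": ["wikipedia.org"],
}


def get_links(site):
    """Assumed helper: returns the sites that a given site links to."""
    return LINKS.get(site, [])


def is_candidate(site):
    """Assumed helper: in practice this is a human judgement about
    whether the site hosts an online creation community."""
    return site != "example-blog.net"


sample = snowball_sample(["wikipedia.org"], get_links, is_candidate)
print(sample)
```

In practice the candidate check is the part that needs a researcher: deciding whether a site is an OCC cannot easily be reduced to a rule.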

After a balanced sample of 50 cases was built, I designed a codebook.

2) Codebook

The codebook consisted of a set of 100 indicators related to the questions/variables I wanted to analyse.

For example, in terms of self-governance, I looked at whether the policies are defined by the community or not.

Or, regarding the type of provider, I looked at the type of legal entity associated with the community, among others.
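To make the idea of a codebook concrete, here is a minimal sketch of how indicators and the coding of one case could be represented. It is an assumption about structure, not the actual codebook: the indicator keys, the allowed values and the case name "Example OCC" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Indicator:
    key: str        # short name used as a column in the data set
    variable: str   # research variable the indicator belongs to
    question: str   # what the coder checks on the website
    allowed: tuple  # values the coder may record


@dataclass
class CaseCoding:
    case: str
    values: dict = field(default_factory=dict)

    def code(self, indicator, value):
        """Record a value for one indicator, rejecting values outside the codebook."""
        if value not in indicator.allowed:
            raise ValueError(f"{value!r} is not allowed for {indicator.key}")
        self.values[indicator.key] = value

    def missing(self, codebook):
        """Keys of the indicators that have not been coded yet for this case."""
        return [i.key for i in codebook if i.key not in self.values]


# Two hypothetical indicators modelled on the examples in the text.
CODEBOOK = [
    Indicator("policies_by_community", "self-governance",
              "Are the policies defined by the community?", ("yes", "no")),
    Indicator("legal_entity", "type of provider",
              "What type of legal entity is associated with the community?",
              ("corporation", "enterprise", "non-profit foundation",
               "open assembly", "other")),
]

coding = CaseCoding("Example OCC")
coding.code(CODEBOOK[0], "yes")
print(coding.missing(CODEBOOK))
```

Restricting each indicator to a closed set of values keeps the codings comparable across cases, at the cost of needing an "other" category for cases that do not fit.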

3) Data collection

I filled in the codebook for each case by visiting and observing the website of the OCC.

The estimated time per case was 40 minutes to one hour.

Some main problems I found in the data collection derived from the plurality of the OCCs.

The same indicators were not valid for all the cases, so at some point in the coding process I had to review the indicators and define them “conceptually”, not in specific forms.

The indicators of the participation mechanisms were particularly problematic because they vary greatly from one OCC to another, and particularly depending on the type of software used.

With a sample more homogeneous in terms of using the same technological platform (such as comparing between wikis), the data collection would be easier.

Some remarks:

I sent an e-mail informing that I was doing the research, but I collected data that was generated in the daily life of the OCC, without requiring any intervention from the participants. That is, using digital threads.

Using digital threads, the data collection can be developed in two ways: through “human” identification or through a program, as in the work of Viegas on Wikipedia.

Human identification is when a person checks whether an indicator is present or not on the website; program identification is when a program is designed to make that check automatically.
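As an illustration of program identification, here is a minimal sketch of an automatic check for one indicator. It assumes a hypothetical indicator, "the page links to a Creative Commons licence", and works on an HTML string rather than a live website; the class and function names are invented for this example, and no such program was built for the research.

```python
from html.parser import HTMLParser


class LicenceLinkFinder(HTMLParser):
    """Collects href values that point to a Creative Commons licence page."""

    def __init__(self):
        super().__init__()
        self.licence_links = []

    def handle_starttag(self, tag, attrs):
        # Licence references usually appear in <a> or <link> elements.
        if tag not in ("a", "link"):
            return
        href = dict(attrs).get("href") or ""
        if "creativecommons.org/licenses/" in href:
            self.licence_links.append(href)


def has_cc_licence(html):
    """Return True if the HTML contains a link to a Creative Commons licence."""
    finder = LicenceLinkFinder()
    finder.feed(html)
    return bool(finder.licence_links)


# Invented sample page, for illustration only.
SAMPLE_PAGE = """
<html><body>
<p>Content is available under
<a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a>.</p>
</body></html>
"""

print(has_cc_licence(SAMPLE_PAGE))
print(has_cc_licence("<p>All rights reserved.</p>"))
```

A check like this only works for indicators with a stable technical signature. Indicators defined “conceptually”, such as who defines the policies, would still need human identification.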

Initially I planned to build a program for the data collection and analysis of the indicators, which would serve to significantly reduce the time consumed by web analysis. Furthermore, it would facilitate the building of a tool for the actors themselves to analyse their websites.

However, programming is costly, and I could not develop the program because my PhD program lacked funding to cover the technical programming costs.

Furthermore, it required the creation of groups covering a plurality of skills and resources. In the frame of PhD research, these requirements are not facilitated. In order to take advantage of this frontier, it is in the interest of research centres to build alliances and create the conditions for the technological support of research.

4) Statistical analysis

I used the program SPSS for the statistical calculations. I wanted to use the R program, because I prioritise free software in my research; however, I couldn’t find anyone in my university who could introduce me to the program.

I looked at descriptive statistics (such as the frequency or percentage of use of copyleft licences, or the frequency of the types of legal entities associated with the community). Then I also looked at correlations between variables. For example: are the bigger communities the ones hosted by commercial providers? Or, do non-profit providers generate greater collaboration between the participants?
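The two kinds of calculation can be sketched as follows. This is a minimal illustration in Python rather than SPSS, assuming indicators coded as categories or as 0/1 values and using the Pearson coefficient (the notes do not say which correlation measure was used). The data are invented toy values, not results from the research.

```python
from collections import Counter
from math import sqrt


def percentages(values):
    """Percentage of cases falling in each category of one indicator."""
    counts = Counter(values)
    return {category: 100 * n / len(values) for category, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; for two 0/1 variables this equals the phi coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


# Invented toy codings for four hypothetical cases (NOT research results).
licence = ["copyleft", "copyleft", "other", "copyleft"]
commercial_provider = [1, 1, 0, 0]  # 1 = hosted by a commercial provider
large_community = [1, 0, 0, 0]      # 1 = community above some size threshold

print(percentages(licence))
print(round(pearson(commercial_provider, large_community), 2))
```

With only 50 cases, a coefficient like this is better read as a pointer to hypotheses for the case studies than as a firm result.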

Some initial considerations on the large-N:

* Large-N was adequate due to the novelty of the OCC phenomenon. It helped me to conceptualise and describe the OCCs more precisely.

* Apart from the data collection for the statistical analysis of correlations between my variables, the exercise was very useful in terms of “online ethnography”, that is, increasing understanding by observing the OCCs. “Field notes” were kept during the data collection.

* Large-N analysis was useful given the need to go beyond a literature made up only of case studies, and to consider not only experiences of success but also of failure.

Finally, the large-N helped me to identify cases and hypotheses for the case studies.

II. II Case studies comparison

The case studies of OCCs are used in order to gain a more in-depth understanding.

Steps of case studies:

1) Selection of case studies:

From the large-N, four main models of provision or infrastructure governance emerged, and so I chose one case for each model for the in-depth case study comparison.

My cases were: Flickr, provided by a big corporation, Yahoo; Wikihow, provided by an enterprise; Wikipedia, provided by a non-profit foundation; and the Social Forums memory project, provided by an open assembly composed of a self-selected group of participants.

2) Case study methods

I combined several methods in the case studies.

Remark: I did not follow exactly the same plan for each case. For example, before starting the research I was already familiar with the Social Forum case study, but not with the other cases. For this reason, I conducted fewer interviews for the Social Forum.

The methods used were:

1) Virtual ethnography of the online platforms

2) Digital threads analysis of participation data: only for the Social Forums case; for the other cases, such as Wikipedia, I used participation data that was already available.

3) Observation of participation in physical encounters and headquarters

4) Review documentation of the cases

5) (Structured and unstructured) interviews with participants and consultation with experts. In total, I conducted 80 interviews.

To secure interviews with OCC participants, the most effective procedure was, on the one hand, to go to face-to-face meetings and, on the other hand, to ask the people I interviewed to put me in contact with other people I wanted to interview. To me, the greater response from informants at physical encounters is mainly related to gaining their trust and attracting their attention. With other ways of gaining trust and attention, developing the case study using only online methods might also work.

For the Wikimedia and Flickr data collection, I did a fieldwork internship in the San Francisco Bay Area and a trip to the East Coast. A significant part of the interviews was also collected at Wikimania and at community meet-ups.

Another tip concerning the interviews: do not start with the people who are most difficult to reach, as your interviewing matures as you carry out more and more interviews.

During the interviews, a visualization technique was used, based on asking the person to “draft” the relationship between the provider and the community as he/she conceives it, and then asking him/her to comment on different drafts representing that relationship.

Finally, transcribing the interviews was time-consuming but also essential. The level of understanding grows exponentially with the transcription.

6) Organization of group discussions with participants and specialists

As part of the research, I contributed to building a collaborative space, the project Networked Politics, for research on a broad range of topics (new forms of political organising) related to my research question.

This collaboration has been of great value for the research development in terms of providing feedback on the emerging research and getting to know relevant literature.

Furthermore, with the support of Networked Politics, I organized collective discussions (seminars) with participants and informants of my case studies and with experts in the area.

To design and guide these group discussions, a focus group methodology was adapted.

I consider facilitating reflexivity among actors and contributing to building relationships among them to be a resulting impact of the research. The group discussions were also useful in this regard.

Main problems of case study comparison:

* Case comparison was not a problem for the case studies, in contrast to the limitations of applying equal indicators that I commented on previously for the large-N.

* The process of data collection was characterized not by a “lack” of data but by an overload of available data.