Friday, March 10, 2006

 

Social Network analysis over time using FLOSSmole data

Just sent off the camera-ready version of a paper built using data available in the tracker tables of the FLOSSmole database.

Howison, J., Inoue, K., and Crowston, K. (2006). Social dynamics of free and open source team communications. In Proceedings of the IFIP 2nd International Conference on Open Source Software, Lake Como, Italy. Available from: http://floss.syr.edu/publications/howison_dynamic_sna_intoss_ifip_short.pdf

This paper furthers inquiry into the social structure of free and open source software (FLOSS) teams by undertaking social network analysis across time. Contrary to expectations, we confirmed earlier findings of a wide distribution of centralizations even when examining the networks over time. The paper also provides empirical evidence that while change at the center of FLOSS projects is relatively uncommon, participation across the project communities is highly skewed, with many participants appearing for only one period. Surprisingly, large project teams are not more likely to undergo change at their centers.

In the spirit of FLOSSmole, the scripts used to get this data, the intermediate data tables, the R scripts to build the figures, and the LaTeX source for the paper are also available.

Tuesday, March 07, 2006

 

February 2006 files released

Sourceforge data has been released for February 2006. Get the files from our Sourceforge file release page.

What's included in this release:

Package: sfProjectInfo
Release: sfProjectInfo02-Feb-2006
Files:
--ProjectList02-Feb-2006.csv.bz2: list of just project names
--ProjectInfo02-Feb-2006.csv.bz2: list of all basic project info
--ProjectDescriptions02-Feb-2006.csv.bz2: project names and their text descriptions (this file is quite large)

Package: sfRawDeveloperData
Release: sfRawDeveloperData02-Feb-2006
Files:
--developers02-Feb-2006.csv.bz2: list of all developers
--developer_projects02-Feb-2006.csv.bz2: list of which projects are worked on by which developers

Package: sfRawData
Release: sfRawData02-Feb-2006
Files:
--project_dbenv02-Feb-2006.csv.bz2: list of projects and their database environments
--project_donors02-Feb-2006.csv.bz2: list of projects and their donors
--project_intaud02-Feb-2006.csv.bz2: list of projects and their intended audiences
--project_licenses02-Feb-2006.csv.bz2: list of projects and their open source licenses
--project_opsys02-Feb-2006.csv.bz2: list of projects and their operating systems
--project_proglang02-Feb-2006.csv.bz2: list of projects and their programming languages
--project_status02-Feb-2006.csv.bz2: list of projects and their status
--project_topic02-Feb-2006.csv.bz2: list of projects and their topics
--project_userint02-Feb-2006.csv.bz2: list of projects and their user interfaces
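
The released files are bz2-compressed text, so they can be sampled without unpacking them to disk first. Here is a minimal sketch in Python, assuming the files are comma-delimited and readable as UTF-8 (neither is stated in this post, so check a file by hand first); the helper name read_release_csv is made up for this example:

```python
import bz2
import csv


def read_release_csv(path, limit=5):
    """Return up to `limit` rows from a bz2-compressed delimited text file."""
    rows = []
    # bz2.open in text mode decompresses on the fly; newline="" is what the
    # csv module expects. The comma delimiter and UTF-8 encoding are assumptions.
    with bz2.open(path, mode="rt", encoding="utf-8", errors="replace", newline="") as handle:
        for row in csv.reader(handle):
            rows.append(row)
            if len(rows) >= limit:
                break
    return rows
```

Calling read_release_csv("ProjectList02-Feb-2006.csv.bz2") would return the first five rows as lists of strings. Stopping after a handful of rows is deliberate, since some of these files are large.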

Monday, March 06, 2006

 

tips for using the query tool

NOTE: This message describes an old query tool. The old query tool has been replaced with the new query tool. The new tool is located here: New Query Tool


Original message:
If you use the query tool, be aware that the amount of data in some of our tables is truly immense.

Tips:

1. Do a "describe" on each table to see what's in there first:

"describe fm_projects"

This will tell you the structure of the table.

2. If you want to see a few sample rows, and you feel as though you simply MUST do a "select *", at least do your select with a mysql-style "limit" phrase like this:

"select * from fm_projects limit 25"

3. If you get an error describing something like a "timeout", this means your query was probably just too large. Email or chat with us on IRC or AIM to figure out what is wrong or a way to optimize the query.

4. Use the text files - many of the queries you want are the same queries that everyone wants! So we've taken the liberty of making text files of these items for your convenience.

5. The list of datasources (releases) is available here. This message is updated periodically. You can also do a "select * from datasources" to see what's available.
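
Tips 1 and 2 (look at the structure first, then pull only a small sample) can be sketched in a script as well. This is an illustration only: it uses an in-memory SQLite table as a stand-in for the FLOSSmole MySQL database, so "PRAGMA table_info" takes the place of "describe", and the column names in the toy table are invented. The helper name peek is hypothetical.

```python
import sqlite3


def peek(conn, table, limit=25):
    """Return (column names, up to `limit` sample rows) for one table."""
    # Table names cannot be bound as parameters, so accept only simple identifiers.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    # SQLite's counterpart to MySQL's "describe <table>".
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # The LIMIT clause keeps the sample small no matter how large the table is.
    rows = conn.execute(f"SELECT * FROM {table} LIMIT ?", (int(limit),)).fetchall()
    return columns, rows


# Toy stand-in table; these column names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fm_projects (project_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO fm_projects VALUES (?, ?)",
    [(i, f"project{i}") for i in range(100)],
)
columns, rows = peek(conn, "fm_projects", limit=3)
```

Even though the toy table holds 100 rows, only three come back, which is the whole point of the "limit" phrase.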

 

tidbit: freshmeat and sourceforge

Freshmeat (FM) describes itself thusly: "freshmeat maintains the Web's largest index of Unix and cross-platform software, themes and related 'eye-candy', and Palm OS software."

And Sourceforge (SF) is, of course, "the world's largest Open Source software development web site, hosting more than 100,000 projects and over 1,000,000 registered users with a centralized resource for managing projects, issues, communications, and code."

Here at FLOSSmole, we keep tabs on Freshmeat AND Sourceforge projects. Some of the projects listed on Freshmeat are also listed in Sourceforge, and some of them are not. One way to tell which SF projects are listed on FM is to query our Freshmeat tables and ask which Freshmeat projects resolve to a "sf.net" or "sourceforge" URL:

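SELECT count(*)
FROM fm_project_homepages
WHERE datasource_id=18
AND (real_url_homepage LIKE "%sourceforge%"
OR real_url_homepage LIKE "%sf.net")


For the March 5 data (datasource_id=18), the query was originally run without the parentheses and returned a count of 10278. Because AND binds more tightly than OR, that form applies the datasource_id filter only to the "%sourceforge%" test, so the parenthesized query may return a smaller count.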
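
SQL evaluates AND before OR, so where the parentheses sit around the two LIKE tests decides which rows the datasource_id filter applies to. Here is a small sketch of the difference, using an in-memory SQLite table as a stand-in for fm_project_homepages. The four rows are invented for illustration and are not FLOSSmole data; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fm_project_homepages (datasource_id INTEGER, real_url_homepage TEXT)"
)
conn.executemany(
    "INSERT INTO fm_project_homepages VALUES (?, ?)",
    [
        (18, "http://sourceforge.net/projects/alpha"),
        (18, "http://beta.sf.net"),
        (18, "http://example.org/gamma"),
        (17, "http://delta.sf.net"),  # a different datasource
    ],
)

# Without parentheses: read as (datasource AND sourceforge) OR sf.net.
unparenthesized = """
    SELECT count(*) FROM fm_project_homepages
    WHERE datasource_id = 18
    AND real_url_homepage LIKE '%sourceforge%'
    OR real_url_homepage LIKE '%sf.net'
"""
# With parentheses: the datasource filter applies to both URL tests.
parenthesized = """
    SELECT count(*) FROM fm_project_homepages
    WHERE datasource_id = 18
    AND (real_url_homepage LIKE '%sourceforge%'
         OR real_url_homepage LIKE '%sf.net')
"""
loose_count = conn.execute(unparenthesized).fetchone()[0]
strict_count = conn.execute(parenthesized).fetchone()[0]
print(loose_count, strict_count)
```

On this toy data the unparenthesized form counts 3 rows, because it also picks up the sf.net row from datasource 17, while the parenthesized form counts 2.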

Other things we track about Freshmeat projects are the authors, the dependencies (what other software is this software dependent upon?), and how the project is classified in the trove.

The tables you'll be interested in are:

fm_project_authors
fm_project_dependencies
fm_project_homepages
fm_project_trove
fm_projects
