Tuesday, July 18, 2006
rubyforge data released
Hello moles, and happy summer! I've just released Rubyforge data from July, 2006.
Now granted, Rubyforge is not as large as Sourceforge. But it has considerable "buzz," for what that's worth. And since Ruby is a relatively new language and Rubyforge a relatively new forge, I figure it's worth watching, especially considering how easy it is to collect their data! (They put out an XML file with a bit of the data in it, and with only 1700 or so projects, it's much easier to scrape the rest than on CERTAIN OTHER FORGES. Thank you for that, Rubyforge!)
Rubyforge files available here:
https://sourceforge.net/project/showfiles.php?group_id=119453&package_id=197648
Unfortunately, even though RF is using the same software as SF, they don't have a donation system (so no donor files), and they don't have a statistics engine like SF. So the statistics are a little weak.
One other note: along with Freshmeat, Rubyforge will be scraped MONTHLY. Sourceforge will continue to be scraped BI-MONTHLY (every other month). This is due to the size and complexity of the SF scrape.
ObjectWeb coming next.
Sunday, July 09, 2006
Sourceforge: Number of Downloads per Project
UPDATE (2006-Jul-11): As far as I can tell, the problem below has been fixed and the "number of downloads" files are all set for you to use! Enjoy.
UPDATE (2006-Jul-10): Today, an alert user pointed out a problem with the data that I released yesterday for number of downloads. Sure enough, there was a problem with errant commas in the numeric values greater than 999. This was causing the SQL sum() to add values incorrectly for projects with large numbers of downloads. New files are being generated now, and they'll be posted shortly! Thanks for your patience. (I've removed the bad files, so for the time being the links below won't work.)
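For anyone cleaning similar scraped values themselves, here is a minimal Python sketch of the kind of fix involved: strip the thousands separators before summing. This is not the actual scrape code. The function names and the sample values are made up for illustration.

```python
def parse_count(raw: str) -> int:
    """Convert a scraped count such as '1,024' to an integer.

    Assumes the only non-digit characters are thousands-separator
    commas and surrounding whitespace (an assumption, not a rule
    taken from the real scrape).
    """
    return int(raw.replace(",", "").strip())


def total_downloads(raw_counts):
    """Sum a list of scraped count strings after cleaning each one."""
    return sum(parse_count(c) for c in raw_counts)


# Hypothetical daily values for one project; two of them exceed 999.
daily = ["812", "1,024", "2,310"]
print(total_downloads(daily))
```

Cleaning the values before they are loaded means that any later sum() works on real integers rather than on text containing commas.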
Original posting:
=================
From the Sourceforge stats page, you can get a variety of measures, such as number of downloads, rank, etc., for a particular project.
I have begun releasing these measures (summed per project, over the 60 days between SF scrapes) as Raw Downloads under the SFRawData package. Here are the links, retroactive back to December 2005:
June, 2006 (link to release)
Apr, 2006 (link to release)
Feb, 2006 (link to release)
Dec, 2005 (link to release)
One obvious application would be to take a particular group of projects you are interested in (a dozen, a hundred, whatever) and track their number of downloads over these periods.
One thing to understand is how the project download data is collected: on the day(s) that I do the scrape of SF, I collect THAT DAY's 60-day stats page. This means that if the scrape is done on January 1 (for example) the 60-day stats will be for the approximately 2 month period before that (i.e. Nov 3 through Dec 31).
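To make the window concrete, here is a rough Python sketch that works out which dates a 60-day stats page covers for a given scrape day. It assumes the window is the 60 days ending the day before the scrape; that is my reading, not something SF documents here. The function name and the year 2006 are hypothetical.

```python
from datetime import date, timedelta


def stats_window(scrape_day: date, days: int = 60):
    """Return (first_day, last_day) covered by the stats page.

    Assumption: the page covers `days` full days, ending the day
    before the scrape is run.
    """
    last_day = scrape_day - timedelta(days=1)
    first_day = scrape_day - timedelta(days=days)
    return first_day, last_day


# Example: a scrape run on January 1 (the year is arbitrary).
first_day, last_day = stats_window(date(2006, 1, 1))
print(first_day, last_day)
```

Under that assumption a January 1 scrape covers November 2 through December 31, which is within a day of the approximate dates given above.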