Sunday Data/Statistics Link Roundup (2/12)

  1. An awesome alternative to D3.js - R’s SVGAnnotation package. Here’s the paper in JSS. I feel like this is one step away from gaining broad use in the statistics community - building the graphics still feels a little complicated, but there is plenty of flexibility there. I feel like a great project for a student at any level would be writing some easy-to-use wrappers for these functions. 
  2. How to run R on your Android device. This is very cool - can’t wait to start running simulations on my Nexus S.
  3. Interactive word clouds via John C. and why word clouds may be dangerous via Jason D. 
  4. Trends in APIs - there are more of them! Go get your free data. 
  5. A really interesting paper by Gary King on how to get a paper by exactly replicating, then building on or discussing, the results of a previous publication. 
  6. 25 minute seminars - I love this post by Rafa, probably because my attention span is so short. I think 25-30 minute talks are long enough for me to learn something, but short enough that I don’t start to zone out…

List of cities/states with open data - help me find more!

It’s the beginning of 2012 and statistics/data science has never been hotter. Some of the most important data is data collected about civic organizations. If you haven’t seen Bill Gates’s TED Talk about the importance of state budgets, you should watch it now. A major key to solving a lot of our economic problems lies in understanding and using data collected about cities and states. 

U.S. cities and states are jumping on this idea and our own Baltimore was one of the earliest adopters. I thought I’d make a list of all the cities that have made an effort to make civic data public. Here are a few I’ve found:

There are also open data sites for many states:

Civic organizations are realizing that opening their data through APIs or by hosting competitions can lead to greater transparency, good advertising, and new and useful applications. If I had one data-related wish for 2012, it would be that the critical mass of data/statistics knowledge being developed could be used with these data to help solve some of our most pressing problems. 

Update: Oh Canada! In the comments Ani Ruhil points to some Canadian cities/provinces with open data pages. 

Web-scraping

The internet is the greatest source of publicly available data. One of the key skills for obtaining data from the web is “web-scraping”, where you use a piece of software to run through a website and collect information. 

This technique can be used to collect data from databases or data that is scattered across a website. Here is a very cool little exercise in web-scraping that shows the kinds of things that are possible. 
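
To make the idea concrete, here is a minimal sketch in Python (my choice of language for illustration, not the one used in the linked exercise). It uses only the standard library's `html.parser` to walk through a page and pull out every link along with its text. The page content, the URLs, and the names `SAMPLE_HTML`, `LinkCollector`, and `scrape_links` are all made up for this example; a real scraper would download the page first and would probably need more error handling.

```python
from html.parser import HTMLParser

# Hypothetical page content, used so the example runs without a network
# connection. A real scraper would download the page instead, for example
# with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html><body>
  <h1>Open data portals</h1>
  <ul>
    <li><a href="https://example.org/city-a">City A budget data</a></li>
    <li><a href="https://example.org/city-b">City B crime data</a></li>
    <li>No link in this item</li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) pairs
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_links(html):
    """Return a list of (text, href) pairs found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    for text, href in scrape_links(SAMPLE_HTML):
        print(f"{text}: {href}")
```

The same pattern extends to tables or other repeated page elements: decide which tags hold the data you want, then record what the parser sees while it is inside them.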

Related Posts: Jeff on APIs, Data Sources, Regex, and The Open Data Movement.