January 2012
30 posts
This graph makes me think Kobe is not that good,...
I find it surprising that NBA commentators rarely talk about field goal percentage. Everybody knows that the more you shoot the more you score. But players that score a lot are admired without consideration of their FG%. Of course having a high FG% is not necessarily admirable as many players only take easy shots, while top-scorers need to take difficult ones. Regardless, missing is undesirable...
Why in-person education isn't dead yet...but a...
A growing trend in education is to put lectures online, for free. The Khan Academy, Stanford’s recent AI course, and Gary King’s new quantitative government course at Harvard are three of the more prominent examples. This new pedagogical format is more democratic, is free, and helps people learn at their own pace. It has led some, including us here at Simply Statistics, to suggest that the...
Sunday data/statistics link roundup (1/29)
A really nice D3 tutorial. I’m 100% on board with D3; if they could figure out a way to export the graphics as PDFs, I think this would be the best visualization tool out there.
A personalized calculator that tells you what number (of the 7 billion or so) you are, based on your birthday. I’m person 4,590,743,884. Makes me feel so special….
An old post of ours, on dongle...
This simple bar graph clearly demonstrates that...
Some NIH R01 paylines are down to 10%. This means only 10% of grants are being funded. The plot below highlights that all we need is a tiny little slice from Defense, Medicare, Medicaid or Social Security to bring that back up to 20%. The plot was taken from Alex Tabarrok’s great article in the Atlantic.
Update: The y-axis unit is billions of US dollars.
When should statistics papers be published in...
Like many statisticians, I was amped to see a statistics paper appear in Science. Given the impact that statistics has on the scientific community, it is a shame that more statistics papers don’t appear in the glossy journals like Science or Nature. As I pointed out in the previous post, if the paper that introduced the p-value was cited every time this statistic was used, the paper would...
The end of in-class lectures is closer than I...
Our previous post on the future of (statistics) graduate education was motivated by the Stanford online course on Artificial Intelligence. Here is an update on the class that had 160,000 people enroll. Some highlights: 1- Sebastian Thrun has given up his tenure at Stanford and he’s started a new online university called Udacity. 2- 248 students got a perfect score: they never got a single question...
A wordcloud comparison of the 2011 and 2012 #SOTU
I wrote a quick (and very dirty) R script for creating a comparison cloud and a commonality cloud for President Obama’s 2011 and 2012 State of the Union speeches*. The cloud on the left shows words that have different frequencies between the two speeches and the cloud on the right shows the words in common between the two speeches. Here is a higher resolution version.
The focus on jobs...
Why statisticians should join and launch startups
The tough economic times we live in, and the potential for big paydays, have made entrepreneurship cool. From the venture capitalist-in-chief to the JavaScript-coding mayor of New York, everyone is on board. No surprise there: successful startups lead to job creation, which can have a major positive impact on the economy.
The game has been dominated for a long time by the folks over in CS. But...
Sunday Data/Statistics Link Roundup (1/21)
Is the microarray dead? Jeremy Leipzig seems to think that statistical methods for microarrays should be. I’m not convinced: the technology has finally matured to the point we can use it for personalized medicine and we abandon it for the next hot thing? Nod to Andrew for the link.
Data from 5 billion webpages available from the Common Crawl. Want to build your own search tool - or just...
Interview With Joe Blitzstein
Joe Blitzstein is Professor of the Practice in Statistics at Harvard University and co-director of the graduate program. He moved to Harvard after obtaining his Ph.D. with Persi Diaconis at Stanford University. Since joining the faculty at Harvard, he has been immortalized in YouTube prank videos, been awarded a “favorite professor” distinction four times, and...
Data Journalism Awards →
In data journalism, reporters leverage numerical data and databases to gather, organize and produce news.
Fundamentals of Engineering Review Question Oops
The Fundamentals of Engineering Exam is the first licensing exam for engineers. You have to pass it on your way to becoming a professional engineer (PE). I was recently shown a problem from a review manual:
When it is operating properly, a chemical plant has a daily production rate that is normally distributed with a mean of 880 tons/day and a standard deviation of 21 tons/day. During an...
figshare and don't trust celebrities stating facts
A couple of links:
figshare is a site where scientists can share data sets/figures/code. One of the goals is to encourage researchers to share negative results as well. I think this is a great idea - I often find negative results and this could be a place to put them. It also uses a tagging system, like Flickr. I think this is a great idea for scientific research discovery. They give you...
Sunday Data/Statistics Link Roundup
Statistics help for journalists (don’t forget to keep rating stories!) This is the kind of thing that could grow into a statisteracy page. The author also has a really nice plug for public schools.
An interactive graphic to determine if you are in the 1% from the New York Times (I’m not…).
Mike Bostock’s d3.js presentation, this is some really impressive visualization...
In the era of data what is a fact?
The Twitter universe is abuzz about this article in the New York Times. Arthur Brisbane, who responds to readers’ comments, asks:
I’m looking for reader input on whether and when New York Times news reporters should challenge “facts” that are asserted by newsmakers they write about.
He goes on to give a couple of examples of qualitative facts that reporters have used in stories without...
Academics are partly to blame for supporting the...
Michael Eisen recently published a New York Times op-ed arguing that a bill meant to protect publishers, introduced in the House of Representatives, will result in taxpayers paying twice for scientific research. According to Eisen:
If the bill passes, to read the results of federally funded research, most Americans would have to buy access to individual articles at a cost of $15 or $30 apiece....
Help us rate health news reporting with...
We here at Simply Statistics are big fans of science news reporting. We read newspapers, blogs, and the news sections of scientific journals to keep up with the coolest new research.
But health science reporting, although exciting, can also be incredibly frustrating to read. Many articles have sensational titles, like “How using Facebook could raise your risk of cancer”. The articles...
Statistical Crime Fighter
Dick Berk is using his statistical superpowers to fight crime. Seriously. Here is my favorite paragraph.
Drawing from criminal databases dating to the 1960s, Berk initially modeled the Philadelphia algorithm on more than 100,000 old cases, relying on three dozen predictors, including the perpetrator’s age, gender, neighborhood, and number of prior crimes. To develop an algorithm that forecasts a...
Do you own or rent?
When it comes to computing, history has gone back and forth between what I would call the “owner model” and the “renter model”. The question is what’s the best approach and how do you determine that?
Back in the day when people like John von Neumann were busy inventing the computer to work out H-bomb calculations, there was more or less a renter model in place....
A statistician and Apple fanboy buys a...
I don’t mean to brag, but I was an early Apple Fanboy - not sure that is something to brag about now that I write it down. I convinced my advisor to go to all Macs in our lab in 2004. Since then I have been pretty dedicated to the brand, dutifully shelling out almost 2g’s every time I need a new laptop. I love the way Macs just work (until they don’t and you need a new laptop).
...
Sunday Data/Statistics Link Roundup
A few data/statistics related links of interest:
Eric Lander Profile
The math of lego (should be “The statistics of lego”)
Where people are looking for homes.
Hans Rosling’s Ted Talk on the Developing world (an oldie but a goodie)
Elsevier is trying to make open-access illegal (not strictly statistics related, but a hugely important issue for academics who believe government...
Where do you get your data?
Here’s a question I get fairly frequently from various types of people: Where do you get your data? This is sometimes followed up quickly with “Can we use some of your data?”
My contention is that if someone asks you these questions, start looking for the exits.
There are of course legitimate reasons why someone might ask you this question. For example, they might be interested...
Building the Team That Built Watson →
To develop the computer smart enough to beat champions on “Jeopardy,” independent scientists had to learn to work together in unfamiliar ways, the team leader says.
Make us a part of your day - add Simply Statistics...
You can add us to your RSS feed through feedburner.
P-values and hypothesis testing get a bad rap -...
This post was written by Jeff Leek and Rafa Irizarry.
The p-value is the most widely known statistic. P-values are reported in a large majority of scientific publications that measure and report data. R.A. Fisher is widely credited with inventing the p-value. If he were cited every time a p-value was reported, his paper would have, at the very least, 3 million citations* - making it the most...
Why all #academics should have professional...
I started my professional Twitter account @leekgroup about a year and a half ago at the suggestion of a colleague of mine, John Storey (@storeylab). I started using the account to post updates on papers/software my group was publishing. Basically, everything I used to report on my webpage as “News”.
I started to give talks where the title slide included my Twitter name, rather than my...
Will Amazon Offer Analytics as a Service? →
Baltimore gun offenders and where academics don't...
Jeff recently posted links to data from cities and states. He and I wrote R code that plots gun offender locations for Baltimore. Specifically we plot the locations that appear on this table. I added locations of the Baltimore neighborhoods where most of our Hopkins colleagues live as well as the location of the medical institutions where we work. Note the corridor with no points between the West...
List of cities/states with open data - help me...
It’s the beginning of 2012 and statistics/data science has never been hotter. Some of the most important data is data collected about civic organizations. If you haven’t seen Bill Gates’s TED Talk about the importance of state budgets, you should watch it now. A major key to solving a lot of our economic problems lies in understanding and using data collected about cities and...