February 2012
32 posts
Prediction: the Lasso vs. just using the top 10...
One incredibly popular tool for the analysis of high-dimensional data is the lasso. The lasso is commonly used when you have many more predictors than independent samples (the n ≪ p problem). It is also often used in the context of prediction. Suppose you have an outcome Y and several predictors X1,…,XM; the lasso fits a model: Y = B0 + B1 × X1 + B2 × X2 + … + BM...
Feb 23rd
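The lasso fit described above can be sketched with cyclic coordinate descent. This is a minimal pure-Python sketch, not the post's code. It assumes centered (ideally standardized) predictors and a centered outcome, so the intercept B0 drops out, and it minimizes (1/(2n))·||Y − XB||² + λ·Σ|Bj|. The helper `top_k_by_inner_product` is a hypothetical stand-in for a "just use the top 10" rule that ranks predictors by their marginal association with Y.

```python
from typing import List

Matrix = List[List[float]]


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: Matrix, y: List[float], lam: float, n_iter: int = 200) -> List[float]:
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * sum_j |b_j| by cyclic coordinate descent.

    Assumes the columns of X and the vector y are centered, so no intercept is fit.
    X is a list of n rows, each holding p predictor values.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each column j
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j's own contribution added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def top_k_by_inner_product(X: Matrix, y: List[float], k: int) -> List[int]:
    """Hypothetical screening rule: indices of the k columns with the largest |x_j . y|.

    For standardized columns this ranks predictors by absolute marginal correlation with y.
    """
    n, p = len(X), len(X[0])
    score = [abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p)]
    return sorted(range(p), key=lambda j: score[j], reverse=True)[:k]


# Toy example with three orthogonal, centered predictors; y = 3*X1 + 0.5*X2
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [3.5, -2.5, 2.5, -3.5]
print(lasso_cd(X, y, lam=1.0))          # [2.0, 0.0, 0.0]
print(top_k_by_inner_product(X, y, 2))  # [0, 1]
```

For an orthogonal design like this toy one, the lasso reduces to soft-thresholding each least-squares coefficient. That is why X1's coefficient of 3 becomes 3 − λ = 2, while the smaller coefficients are set exactly to zero.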
Monitoring Your Health With Mobile Devices →
Smartphones may some day help people take better control of their health by tracking it with increasing precision and convenience. The question for me is: will simply collecting more data make things better?
Feb 23rd
Professional statisticians agree: the Knicks...
A week ago, Nate Silver tweeted this: “Since Lin became starting PG, Knicks have outscored opponents by 63 with Novak on the floor. Been outscored by 8 when he isn’t.” In a previous post we showed the plot below. Note that Carmelo Anthony is in ball hog territory. Novak plays the same position as Anthony but is a three-point specialist. His career three-point FG% of 42% (253 of 603) puts him...
Feb 22nd
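Silver's numbers are on/off plus-minus splits: the team's point margin summed over the time a player is on the floor versus off it. Here is a minimal sketch. It assumes the play-by-play has already been cut into stints tagged with whether the player was on the court. The `Stint` layout and the sample stints are hypothetical, not real Knicks data.

```python
from typing import Iterable, Tuple

# (player on the floor?, team points scored, opponent points scored) for one stint
Stint = Tuple[bool, int, int]


def on_off_plus_minus(stints: Iterable[Stint]) -> Tuple[int, int]:
    """Return (margin with the player on the floor, margin with the player off)."""
    on_margin = 0
    off_margin = 0
    for on_floor, team_pts, opp_pts in stints:
        margin = team_pts - opp_pts
        if on_floor:
            on_margin += margin
        else:
            off_margin += margin
    return on_margin, off_margin


# Hypothetical stints for illustration only
stints = [(True, 12, 6), (False, 5, 9), (True, 8, 8), (False, 10, 7)]
on, off = on_off_plus_minus(stints)
print(on, off, on - off)  # 6 -1 7
```

With the figures in the tweet, the on/off differential is 63 − (−8) = 71 points. As a side check, 253/603 ≈ 0.4196, which rounds to the 42% quoted.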
Air Pollution Linked to Heart and Brain Risks →
Three studies published this week found that people exposed to pollutants have a higher risk of stroke, heart attacks and cognitive deterioration. One study in particular by Jennifer Weuve at Rush University Medical Center examined fine particulate matter (PM) and cognition in older women. Dr. Weuve’s research followed 19,409 women in the United States between the ages of 70 and 81 for about a decade, looking at...
Feb 22nd
Interracial Couples Who Make the Most Money →
Among newlyweds, Asian grooms and white brides earn the most money between them, followed by white grooms and Asian brides. Here is a link to the original report if you want to skip right to it.
Feb 21st
Scientists Find New Dangers in Tiny but Pervasive... →
Gaseous byproducts that were thought to dissipate quickly are now found to evaporate more slowly and persist longer than anyone had thought.
Feb 21st
I don't think it means what ESPN thinks it means
Given ESPN’s recent headline difficulties, it seems like they might want a headline editor or something…
Feb 20th
60 Lives, 30 Kidneys, All Linked →
A record chain of kidney transplants resulted from a mix of medical need, pay-it-forward selflessness and lockstep coordination among 17 hospitals over four months. This is a fascinating story of the longest “domino chain” of kidney transplantations yet done.  Domino chains, which were first attempted in 2005 at Johns Hopkins, seek to increase the number of people who can be helped...
Feb 20th
Company Unveils DNA Sequencing Device Meant to Be... →
A British company plans to sell a disposable gene sequencing device that is the size of a USB memory stick and plugs into a laptop computer to deliver its results. The latest in vaporware technology. Sounds cool, but a drawback is that the Oxford machine has a 4 percent error rate, too high for many applications, including diagnosis.
Feb 20th
How Companies Learn Your Secrets →
Your shopping habits reveal even the most personal information — like when you’re going to have a baby. Andrew Pole had just started working as a statistician for Target in 2002, when two colleagues from the marketing department stopped by his desk to ask an odd question: “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?” The real...
Feb 16th
I.B.M.: Big Data, Bigger Patterns →
I.B.M. is the world’s largest employer of Ph.D.’s. It has plenty of businesses it can throw them at, but the trick is figuring out which ones will yield the best return. That happens by finding the algorithms for one industry, like power generation, that will work in another, like traffic management. I.B.M., Mr. Mills [I.B.M.’s senior vice president for software and systems] said, is...
Feb 16th
A Flat Budget for NIH in 2013 - ScienceInsider →
The President’s budget proposal would hold the National Institutes of Health’s (NIH’s) budget at the current level of $30.86 billion. In order to squeeze more grants out of the flat budget—the target is an 8% increase in new grants, to 672, for a total of 9415—NIH will put in place new grant management policies. Continuing grants will be cut 1% below the 2012 level,...
Feb 15th
Harvard's Stat 110 is now a course on iTunes
Back in January we interviewed Joe Blitzstein and pointed out that he made his lectures freely available on iTunes. Now it is a course on iTunes and the format has been upgraded to work better with iPhones and iPads. Enjoy! 
Feb 15th
Mathematicians Organize Boycott of a Publisher →
More than 5,700 researchers are denouncing the pricing policies of the journal publisher Elsevier in a growing furor over open access to the fruits of scientific research.
Feb 14th
Mortimer Spiegelman Award: Call for Nominations....
The Statistics Section of the American Public Health Association invites nominations for the 2012 Mortimer Spiegelman Award honoring a statistician aged 40 or younger who has made outstanding contributions to health statistics, especially public health statistics. The award was established in 1970 and is presented annually at the APHA meeting. The award serves three purposes: to honor the...
Feb 14th
The Duke Clinical Trials Saga: What Really... →
Following up on the 60 Minutes segment on the Duke Clinical Trials Saga, here’s a video of Keith Baggerly giving a talk about his and Kevin Coombes’ investigation of the data and the methods. Thanks to Andrew J. for the link.
Feb 13th
Duke clinical trials saga on 60 Minutes. First, the back-to-back shot of Keith and Kevin is priceless. Second, I’ve never seen a cleaner desk in my life.
Feb 13th
At MSNBC, a Professor as TV Host →
Melissa Harris-Perry, with her progressive talk show of the same name, is set to join MSNBC’s weekend lineup of cable news shows beginning Saturday. Ms. Harris-Perry will be the only tenured professor in the United States — and one of a very small number of African-American women — who serves as a cable news host. The article talks about how Ms. Harris-Perry had given a specific lecture at Tulane...
Feb 13th
Sunday Data/Statistics Link Roundup (2/12)
An awesome alternative to D3.js - R’s svgAnnotation package. Here’s the paper in JSS. I feel like this is one step away from gaining broad use in the statistics community - it still feels a little complicated building the graphics, but there is plenty of flexibility there. I feel like a great project for a student at any level would be writing some easy wrapper functions for these...
Feb 12th
The Age of Big Data →
For those who can make sense of the explosion of data, there are job opportunities in fields as diverse as crime, retail and dating. Veteran data analysts tell of friends who were long bored by discussions of their work but now are suddenly curious. “Moneyball” helped, they say, but things have gone way beyond that. “The culture has changed,” says Andrew Gelman, a statistician and political...
Feb 12th
Peter Thiel on Peer Review/Science
Peter Thiel gives his take on science funding/peer review: My libertarian views are qualified because I do think things worked better in the 1950s and 60s, but it’s an interesting question as to what went wrong with DARPA. It’s not like it has been defunded, so why has DARPA been doing so much less for the economy than it did forty or fifty years ago? Parts of it have become politicized. You...
Feb 11th
Data says Jeremy Lin is for real
Nate Silver makes a table of all NBA players that have had four games in a row with 20+ points, 6+ assists, 50%+ shooting. The list is short (and it doesn’t include Kobe).  
Feb 11th
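Silver's filter can be sketched as a scan for a run of consecutive qualifying games. This is a minimal sketch. It assumes per-game box scores are available as records, and it reads "50%+ shooting" as a field-goal percentage of at least 50%. The `Game` type and the sample game log are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Game:
    points: int
    assists: int
    fgm: int  # field goals made
    fga: int  # field goals attempted


def qualifies(g: Game, min_pts: int = 20, min_ast: int = 6, min_fg_pct: float = 0.5) -> bool:
    """True if the game clears all three thresholds (20+ points, 6+ assists, 50%+ FG)."""
    return (
        g.points >= min_pts
        and g.assists >= min_ast
        and g.fga > 0
        and g.fgm / g.fga >= min_fg_pct
    )


def has_streak(games: Sequence[Game], length: int = 4) -> bool:
    """True if `length` consecutive games (in chronological order) all qualify."""
    run = 0
    for g in games:
        run = run + 1 if qualifies(g) else 0
        if run >= length:
            return True
    return False


# Made-up game log: the third game breaks the first run, so only the last four count.
log = [
    Game(25, 8, 10, 18),
    Game(22, 7, 9, 17),
    Game(18, 9, 7, 15),  # fewer than 20 points
    Game(28, 8, 11, 20),
    Game(23, 7, 9, 16),
    Game(20, 6, 8, 16),
    Game(27, 11, 10, 19),
]
print(has_streak(log))  # True
```

Resetting the counter on any miss is what makes this a "four in a row" test rather than "four of the last N."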
Duke Saga on 60 Minutes this Sunday
This Sunday, February 12, the news magazine 60 Minutes will have a feature on the Duke Clinical Trials saga. Will Dr. Potti himself make an appearance? This is from the 60 Minutes web site: Deception at Duke - Scott Pelley reports on a Duke University oncologist whose supervisor says he manipulated the data in his study of a breakthrough cancer therapy. Kyra Darnton is the producer. The word on...
Feb 10th
An example of how sending a paper to a statistics...
In a previous post I complained about statistics journals taking way too long to reject papers. Today I am complaining because even when everything goes right (better-than-average review time for statistics, useful and insightful comments from reviewers), we can come out losing. In May 2011 we submitted a paper on removing GC bias from RNAseq data to Biostatistics. It was...
Feb 9th
Statisticians and Clinicians: Collaborations Based... →
Don Berry, former head of the division of quantitative sciences and chair of the department of biostatistics at the M.D. Anderson Cancer Center, has a great column in Amstat News discussing collaborations between statisticians and clinicians. There are a few very nice bits, but do read the full column. The first is: We send clear messages to our clinical collaborators that we are as interested in...
Feb 8th
DealBook: Illumina Formally Rejects Roche's... →
The battle for Illumina continues. The genetic analysis services provider said that Roche’s $5.7 billion takeover offer was “grossly inadequate” and that the Swiss drug maker’s director candidates should be rejected.
Feb 8th
Wolfram, a Search Engine, Finds Answers Within... →
Wolfram Alpha Pro’s creator wants his “computational knowledge engine” to appeal to more than math and science enthusiasts. Is it me, or is there a nascent boomlet in “anti-search engines” like Wolfram Alpha?
Feb 7th
An R script for estimating future inflation via...
One factor that is critical for any financial planning is estimating what future inflation will be. For example, if you’re saving money in an instrument that gains 3% per year, and inflation is estimated to be 4% per year, well then you’re losing money in real terms. There are a variety of ways to estimate the rate of future inflation. You could, for example, use past rates as an...
Feb 6th
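The 3%-versus-4% example is the usual nominal-versus-real calculation: the real growth factor per year is (1 + nominal rate) / (1 + inflation rate). This is a small Python sketch of that arithmetic only. It does not reproduce the post's R script or its inflation estimate, and the $100 principal and 10-year horizon are illustrative assumptions.

```python
def real_rate(nominal: float, inflation: float) -> float:
    """Exact real rate of return per period (Fisher relation)."""
    return (1.0 + nominal) / (1.0 + inflation) - 1.0


def real_value(principal: float, nominal: float, inflation: float, years: int) -> float:
    """Purchasing power of `principal` after `years`, expressed in today's money."""
    return principal * ((1.0 + nominal) / (1.0 + inflation)) ** years


r = real_rate(0.03, 0.04)
print(f"real rate: {r:.4%}")  # about -0.96% per year
print(f"$100 after 10 years: ${real_value(100, 0.03, 0.04, 10):.2f} in today's dollars")
```

The common shortcut of subtracting inflation from the nominal rate (3% − 4% = −1%) is close at rates this low. Either way, after a decade the $100 buys only about $91 worth of today's goods.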
Sunday Data/Statistics Link Roundup (2/5)
Cool app: you can write out an equation on the screen and it translates the equation to LaTeX. Via Andrew G. Yet another D3 tutorial. Stay tuned for some cool stuff on this front here at Simply Stats in the near future. Via Vishal. Our favorite Greek statistician in the news again. How measurement of academic output harms science. Related: is submitting scientific papers too time consuming?...
Feb 6th
Why don't we hear more about Adrian Dantley on...
In my last post I complained about efficiency not being discussed enough by NBA announcers and commentators. I pointed out that some of the best scorers have relatively low FG% or TS%. However, via the comments it was pointed out that top scorers need to take more difficult shots and thus are expected to have lower efficiency. The plot below (made with this R script) seems to confirm this (click...
Feb 3rd
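For reference, FG% counts only field goals, while true shooting percentage (TS%) also credits three-pointers and free throws. This is a minimal sketch using the common Basketball-Reference-style definition TS% = PTS / (2 × (FGA + 0.44 × FTA)). The 0.44 free-throw weight is that convention's empirical approximation, and the two stat lines below are made up for illustration.

```python
def fg_pct(fgm: int, fga: int) -> float:
    """Field-goal percentage: made field goals over attempted field goals."""
    return fgm / fga


def true_shooting_pct(pts: int, fga: int, fta: int) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); 0.44 is an empirical weight for free-throw attempts."""
    return pts / (2.0 * (fga + 0.44 * fta))


# Two hypothetical stat lines with the same 30 points
volume = {"pts": 30, "fgm": 11, "fga": 26, "fta": 8}     # many shots
efficient = {"pts": 30, "fgm": 12, "fga": 18, "fta": 6}  # fewer shots

for name, s in [("volume", volume), ("efficient", efficient)]:
    print(
        name,
        round(fg_pct(s["fgm"], s["fga"]), 3),
        round(true_shooting_pct(s["pts"], s["fga"], s["fta"]), 3),
    )
```

Both lines score 30 points, but the one that needs fewer shots comes out well ahead on both efficiency measures. That gap is the one a points-only view hides.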
Cleveland's (?) 2001 plan for redefining...
This plan has been making the rounds on Twitter and is being attributed to William Cleveland in 2001 (thanks to Kasper for the link). I’m not sure of the provenance of the document but it has some really interesting ideas and is worth reading in its entirety. I actually think that many Biostatistics departments follow the proposed distribution of effort pretty closely.  One of the most...
Feb 2nd
Evidence-based Music
There was recently a fascinating article published in PNAS that compared the sound quality of different types of violins. In this study, researchers assembled a collection of six violins, three of which were made by Stradivari and Guarneri del Gesu and three made by modern luthiers (i.e. 20th century). The combined value of the “old” violins was $10 million, about 100 times greater...
Feb 1st
January 2012
30 posts
This graph makes me think Kobe is not that good,...
I find it surprising that NBA commentators rarely talk about field goal percentage. Everybody knows that the more you shoot, the more you score. But players who score a lot are admired without consideration of their FG%. Of course, having a high FG% is not necessarily admirable, as many players only take easy shots, while top scorers need to take difficult ones. Regardless, missing is undesirable...
Jan 31st
Why in-person education isn't dead yet...but a...
A growing trend in education is to put lectures online, for free. The Khan Academy, Stanford’s recent AI course, and Gary King’s new quantitative government course at Harvard are three of the more prominent examples. This new pedagogical format is free and more democratic, and it helps people learn at their own pace. It has led some, including us here at Simply Statistics, to suggest that the...
Jan 30th
Sunday data/statistics link roundup (1/29)
A really nice D3 tutorial. I’m 100% on board with D3; if they could figure out a way to export the graphics as PDFs, I think this would be the best visualization tool out there. A personalized calculator that tells you what number (of the 7 billion or so) you are based on your birthday. I’m person 4,590,743,884. Makes me feel so special… An old post of ours, on dongle...
Jan 29th
This simple bar graph clearly demonstrates that...
Some NIH R01 paylines are down to 10%. This means only 10% of grants are being funded. The plot below highlights that all we need is a tiny little slice from Defense, Medicare, Medicaid or Social Security to bring that back up to 20%. The plot was taken from Alex Tabarrok’s great article in the Atlantic. Update: The y-axis unit is billions of US dollars.
Jan 27th
When should statistics papers be published in...
Like many statisticians, I was amped to see a statistics paper appear in Science. Given the impact that statistics has on the scientific community, it is a shame that more statistics papers don’t appear in the glossy journals like Science or Nature. As I pointed out in the previous post, if the paper that introduced the p-value was cited every time this statistic was used, the paper would...
Jan 26th
The end of in-class lectures is closer than I...
Our previous post on the future of (statistics) graduate education was motivated by the Stanford online course on Artificial Intelligence. Here is an update on the class, which had 160,000 people enroll. Some highlights: 1- Sebastian Thrun has given up his tenure at Stanford and he’s started a new online university called Udacity. 2- 248 students got a perfect score: they never got a single question...
Jan 25th
A wordcloud comparison of the 2011 and 2012 #SOTU
I wrote a quick (and very dirty) R script for creating a comparison cloud and a commonality cloud for President Obama’s 2011 and 2012 State of the Union speeches*. The cloud on the left shows words that have different frequencies between the two speeches and the cloud on the right shows the words in common between the two speeches. Here is a higher resolution version.  The focus on jobs...
Jan 25th
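The post's R script isn't reproduced here, but the weighting behind the two clouds can be sketched. As far as we know, the R wordcloud package's `commonality.cloud` sizes a word by its minimum frequency across documents, and `comparison.cloud` sizes it by how far one document's frequency exceeds the average across documents. This Python sketch mirrors that idea under those assumptions, with no stop-word removal or plotting. The two "speeches" are toy strings, not the real transcripts.

```python
import re
from collections import Counter
from typing import Dict, List

Freqs = Dict[str, float]


def term_freqs(text: str) -> Freqs:
    """Relative frequency of each lowercase word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()}


def commonality_weights(docs: List[Freqs]) -> Freqs:
    """Words used in every document, weighted by their smallest frequency."""
    shared = set.intersection(*(set(d) for d in docs))
    return {w: min(d[w] for d in docs) for w in shared}


def comparison_weights(docs: List[Freqs]) -> List[Freqs]:
    """For each document, the words it uses more than average, weighted by the excess."""
    vocab = set().union(*docs)
    k = len(docs)
    out = []
    for d in docs:
        weights = {}
        for w in vocab:
            mean = sum(other.get(w, 0.0) for other in docs) / k
            excess = d.get(w, 0.0) - mean
            if excess > 0:
                weights[w] = excess
        out.append(weights)
    return out


# Toy stand-ins for the 2011 and 2012 speeches
sotu_2011 = term_freqs("jobs jobs innovation tax")
sotu_2012 = term_freqs("jobs energy energy tax")
print(commonality_weights([sotu_2011, sotu_2012]))
print(comparison_weights([sotu_2011, sotu_2012]))
```

A word used equally often in both speeches ("tax" here) shows up only in the commonality weights. A word used more in one speech shows up only on that speech's side of the comparison.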
Why statisticians should join and launch startups
The tough economic times we live in, and the potential for big paydays, have made entrepreneurship cool. From the venture-capitalist-in-chief to the JavaScript-coding mayor of New York, everyone is on board. No surprise there: successful startups lead to job creation, which can have a major positive impact on the economy. The game has been dominated for a long time by the folks over in CS. But...
Jan 23rd
Sunday Data/Statistics Link Roundup (1/21)
Is the microarray dead? Jeremy Leipzig seems to think that statistical methods for microarrays should be. I’m not convinced: the technology has finally matured to the point where we can use it for personalized medicine, and we abandon it for the next hot thing? Nod to Andrew for the link. Data from 5 billion webpages available from the Common Crawl. Want to build your own search tool - or just...
Jan 22nd
Interview With Joe Blitzstein
Joe Blitzstein is Professor of the Practice in Statistics at Harvard University and co-director of the graduate program. He moved to Harvard after obtaining his Ph.D. with Persi Diaconis at Stanford University. Since joining the faculty at Harvard, he has been immortalized in YouTube prank videos, been awarded a “favorite professor” distinction four times, and...
Jan 20th
Data Journalism Awards →
In data journalism, reporters leverage numerical data and databases to gather, organize and produce news.
Jan 19th
Fundamentals of Engineering Review Question Oops
The Fundamentals of Engineering Exam is the first licensing exam for engineers. You have to pass it on your way to becoming a professional engineer (PE). I was recently shown a problem from a review manual:  When it is operating properly, a chemical plant has a daily production rate that is normally distributed with a mean of 880 tons/day and a standard deviation of 21 tons/day. During an...
Jan 19th
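The excerpt stops before the actual question, so this sketch does not try to answer it. It only shows the standard calculation for a normally distributed production rate with the stated mean (880 tons/day) and standard deviation (21 tons/day): convert to a z-score, then evaluate the normal CDF. The 850 tons/day threshold is a hypothetical value chosen for illustration.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ Normal(mu, sigma^2), computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


MU, SIGMA = 880.0, 21.0  # tons/day, from the review problem

# Hypothetical threshold for illustration; the actual question is cut off in the excerpt.
threshold = 850.0
z = (threshold - MU) / SIGMA
print(f"z = {z:.3f}, P(rate <= {threshold:.0f}) = {normal_cdf(threshold, MU, SIGMA):.4f}")
```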
figshare and don't trust celebrities stating facts
A couple of links: figshare is a site where scientists can share data sets/figures/code. One of the goals is to encourage researchers to share negative results as well. I think this is a great idea - I often find negative results and this could be a place to put them. It also uses a tagging system, like Flickr. I think this is a great idea for scientific research discovery. They give you...
Jan 17th
Jan 16th
Sunday Data/Statistics Link Roundup
Statistics help for journalists (don’t forget to keep rating stories!) This is the kind of thing that could grow into a statisteracy page. The author also has a really nice plug for public schools.  An interactive graphic to determine if you are in the 1% from the New York Times (I’m not…). Mike Bostock’s d3.js presentation, this is some really impressive visualization...
Jan 15th
In the era of data what is a fact?
The Twitter universe is abuzz about this article in the New York Times. Arthur Brisbane, who responds to readers’ comments, asks: “I’m looking for reader input on whether and when New York Times news reporters should challenge ‘facts’ that are asserted by newsmakers they write about.” He goes on to give a couple of examples of qualitative facts that reporters have used in stories without...
Jan 13th
Academics are partly to blame for supporting the...
Michael Eisen recently published a New York Times op-ed arguing that a bill meant to protect publishers, introduced in the House of Representatives, will result in taxpayers paying twice for scientific research. According to Eisen, if the bill passes, to read the results of federally funded research, most Americans would have to buy access to individual articles at a cost of $15 or $30 apiece....
Jan 13th
Help us rate health news reporting with...
We here at Simply Statistics are big fans of science news reporting. We read newspapers, blogs, and the news sections of scientific journals to keep up with the coolest new research.  But health science reporting, although exciting, can also be incredibly frustrating to read. Many articles have sensational titles, like “How using Facebook could raise your risk of cancer”. The articles...
Jan 11th