- There has been a lot of talk about the Michael Lewis (of Moneyball fame) profile of Obama in Vanity Fair. One interesting quote I think deserves a lot more discussion is: “On top of all of this, after you have made your decision, you need to feign total certainty about it. People being led do not want to think probabilistically.” This is a key issue that is only going to get worse going forward. All of public policy is probabilistic - we are even moving toward clinical trials to evaluate policies.
- It’s sort of amazing to me that I hadn’t heard about this before, but a UC Davis professor was threatened for discussing the reasons PSA screening may be overused. This same issue keeps coming up over and over - screening healthy populations for rare diseases is often not effective (you need a ridiculously high specificity or a treatment with almost no side effects). What we need is John McGready to do a claymation public service video or something explaining the reasons screening might not be a good idea to the general public.
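To see why, here is a quick back-of-the-envelope calculation (all numbers are hypothetical, chosen only for illustration) showing that even a fairly specific test yields mostly false positives when the disease is rare:

```python
# Toy positive predictive value (PPV) calculation illustrating why
# screening a healthy population for a rare disease demands very high
# specificity. All numbers here are made up for illustration.

def ppv(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# A disease affecting 1 in 1,000 people, with a test that is
# 90% sensitive and 90% specific:
print(round(ppv(0.001, 0.90, 0.90), 3))   # 0.009 - about 99% of positives are false

# Even at 99% specificity, most positive results are still wrong:
print(round(ppv(0.001, 0.90, 0.99), 3))   # 0.083
```

In other words, at these (hypothetical) numbers, fewer than 1 in 10 people who test positive actually have the disease - which is the whole argument against indiscriminate screening in a nutshell.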
- A bleg - I sometimes have a good week finding links myself, and there are a few folks who regularly send links (Andrew J., Alex N., etc.). I’d love it if people would send me cool links when they see them with the email title “Sunday Links” - I’m sure there is more cool stuff out there.
- The ISCB has a competition to improve the coverage of computational biology on Wikipedia. Someone should write a surrogate variable analysis or robust multi-array average article.
- I had not heard of the ASA’s STATtr@k until this week; it looks like there are some useful resources there for early career statisticians. With the onset of fall, a new recruiting season is approaching. If you are a postdoc/student on the job market and you haven’t read Rafa’s post on soft vs. hard money, now is the time to start brushing up! Stay tuned for more job market posts this fall from Simply Statistics.
It looks like the journal Nature is hiring a Chief Data Editor (link via Hilary M.). The primary purpose of this editor appears to be developing tools for collecting, curating, and distributing data, with the goal of improving reproducible research.
The main duties of the editor, as described by the ad, are:
Nature Publishing Group is looking for a Chief Editor to develop a product aimed at making research data more available, discoverable and interpretable.
The ad also mentions having an eye for commercial potential; I wonder if this move was motivated by companies like figshare, which is already providing a reproducible data service. I haven’t used figshare, but the early reports from friends who have are that it is great.
The thing that bothered me about the ad is that there is a strong focus on data collection/storage/management but absolutely no mention of the second component of the data science problem: making sense of the data. To make sense of piles of data requires training in applied statistics (called by whatever name you like best). The ad doesn’t mention any such qualifications.
Even if the goal of the position is just to build a competitor to figshare, it seems like a good idea for the person collecting the data to have some idea of what researchers are going to do with it. When dealing with data, those researchers will frequently be statisticians by one name or another.
Bottom line: I’m stoked Nature is recognizing the importance of data in this very prominent way. But I wish they’d realize that a data revolution also requires a revolution in statistics.
Amy Heineike is the Director of Mathematics at Quid, a startup that seeks to understand technology development and dissemination through data analysis. She was the first employee at Quid, where she helped develop their technology early on. She has been recognized as one of the top Big Data Scientists. As part of our ongoing interview series, we talked to Amy about data science, Quid, and how statisticians can get involved in the tech scene.
We’ve always been especially impressed with people who have built software in a group and shared or distributed that software in some way. Getting involved in an open source project, working with version control in a team, or sharing your code on github are all good ways to start on this.
The game has been dominated for a long time by the folks over in CS. But the value of many recent startups is either based on, or can be magnified by, good data analysis. Here are a few startups that are based on data/data analysis:
- The Climate Corporation - analyzes climate data to sell farmers weather insurance.
- Flightcaster - uses public data to predict flight delays.
- Quid - uses data on startups to predict success, among other things.
- 100plus - personalized health prediction startup, predicting health based on public data.
- Hipmunk - the main advantage of this site for travel is better data visualization and an algorithm to show you which flights have the worst “agony”.
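Hipmunk’s actual “agony” formula is proprietary, but the idea is simple enough to sketch: combine price, flight time, and number of stops into a single score and sort by it. The weights and flight data below are entirely made up:

```python
# A toy version of an "agony"-style flight ranking. The real formula
# is proprietary; the fields and weights here are hypothetical.

def agony(price, duration_hr, stops, w_price=1.0, w_time=25.0, w_stop=75.0):
    """Lower is better: a weighted combination of cost, time, and stops."""
    return w_price * price + w_time * duration_hr + w_stop * stops

flights = [
    {"id": "A", "price": 420, "duration_hr": 5.5, "stops": 0},
    {"id": "B", "price": 310, "duration_hr": 9.0, "stops": 1},
    {"id": "C", "price": 280, "duration_hr": 14.0, "stops": 2},
]

# Sort from least to most agonizing
ranked = sorted(flights, key=lambda f: agony(f["price"], f["duration_hr"], f["stops"]))
print([f["id"] for f in ranked])   # ['A', 'B', 'C'] - the cheapest flight loses to faster nonstops
```

The statistical work, of course, is in choosing weights that match what travelers actually care about - which is a data analysis problem, not a web development one.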
To launch a startup you need just a couple of things: (1) a good, valuable source of data (there are lots of these on the web) and (2) a good idea about how to analyze them to create something useful. The second step is obviously harder than the first, but the companies above prove you can do it. Then, once it is built, you can outsource/partner with developers - web and otherwise - to implement your idea. If you can build it in R, someone can make it an app.
These are just a few of the startups whose value is entirely derived from data analysis. But companies from LinkedIn, to Bitly, to Amazon, to Walmart are trying to mine the data they are generating to increase value. Data is now being generated at unprecedented scale by computers, cell phones, even thermostats! With this onslaught of data, the need for people with analysis skills is becoming incredibly acute.
Statisticians, like computer scientists before them, are poised to launch, and make major contributions to, the next generation of startups.
Up until about 20 years ago, postdocs were scarce in Statistics. In contrast, during the same time period, it was rare for a Biology PhD to go straight into a tenure track position.
Driven mostly by the availability of research funding for those working in applied areas, postdocs are becoming much more common in our field and I think this is great. It is great for PhD students to expand their horizons during two years in which they don’t have to worry about teaching, committee meetings, or grant writing. It is also great for those of us fortunate enough to work with well-trained, independent, energetic, bright, and motivated fresh PhDs. Many of our best graduates are electing to postpone their entry into tenure track jobs in favor of postdocs. Also, students from other fields, computer science and engineering in particular, are taking postdocs with statisticians. I think these are both good trends. If they continue, the result will be that, as a field, we will become more well-rounded and productive.
This trend has been particularly beneficial for me. Most of the postdocs I have hired have come to me with a CV worthy of a tenure track job. They have been independent and worked more as collaborators than advisees. So why pass on more $ and prestige? A PhD in Statistics/Computer Science/Engineering can be on a very specific topic and students may not gain any collaborative experience whatsoever. A postdoc at Hopkins Biostat provides a new experience in a highly collaborative environment, with access to world leaders in the biomedical sciences, and where we focus on development of applied tools. The experience can also improve a student’s visibility and job prospects, while delaying the tenure clock until they have more publications under their belts.
An important thing you should be aware of is that in many departments you can negotiate the start of a tenure track position. So seriously consider taking 1-2 years of almost 100% research time before commencing the grind of a tenure track job.
I’m not the only one who thinks postdocs are a good thing for our field and for biostatistics students. The column below was written by Terry Speed in November 2003 and is reprinted with permission from the IMS Bulletin, http://bulletin.imstat.org
In Praise of Postdocs
I don’t know what proportion of IMS members have PhDs (or an equivalent) in probability or statistics, but I’d guess it’s fairly high. I don’t know what proportion of those that do have PhDs would also have formal post-doctoral research experience, but here I’d guess it’s rather low.
Why? One possible reason is that for much of the last 40 years, anyone completing a PhD in prob or stat and wanting a research career, could go straight into one. Prospective employers of people with PhDs in our field—be they universities, research institutes, national labs or companies—don’t require their novices to have completed a postdoc, and most graduating PhDs are only too happy to go straight into their first job.
This is in sharp contrast with the biological and physical sciences, where it is rare to appoint someone to a tenure-track faculty or research scientist position without their having completed one or more postdocs.
The number of people doing postdocs in probability or statistics has been growing over the last 15 years. This is in part due to the arrival on the scene of institutes such as the MSRI, IMA, IPAM, NISS, NCAR, and recently the MBI and SAMSI in the US, the Newton Institute in the UK, the Fields Institute in Canada, the Institut Henri Poincaré in France, and others elsewhere around the world. In such institutes short-term postdoc positions go with their current research programs, and there are usually a smaller number continuing for longer periods.
It is also the case that an increasing number of senior researchers are being awarded research funds to support postdocs in prob or stat, often in the newer, applied areas such as computational biology.
And finally, it has long been the case that many countries (Germany, Sweden, Switzerland, and the US, to name a few) have national grants supporting postdoctoral research in their own or, even better, another country. I think all of this is great, and would like to see this trend continue and strengthen.
Why do I think postdocs are a good thing? And why do I think young probabilists and statisticians should do one, even when they can get a good job without having done so?
For most of us, doing a PhD means getting totally absorbed in some relatively narrow research area for 2–3 years, treating that as the most important part of science for that time, and trying to produce some of the best work in that area. This is fine, and we get a PhD for our efforts, but is it good training for a lifelong research career? While it is obviously good preparation for doing more of the same, I don’t think it is adequate for research in general. I regard the successful completion of a PhD as (at least) evidence that the person in question can do research, but it doesn’t follow that they can go on and successfully do research in a new area, or in a different environment, or without close supervision.
Postdocs give you the chance to broaden, to learn new technical skills, to become acquainted with new areas, and to absorb the culture of a new institution, all at a time when your professional responsibilities are far fewer than they would have been had you taken that first “real” job. The postdoc period can be a wonderful time in your scientific life, one which sees you blossom, building on the confidence you gained by having completed your PhD, in what is still essentially a learning environment, but one where you can follow your own interests, explore new areas, and still make mistakes. At the worst, you have delayed your entry into the workforce by two or three years, and you can still keep on working in your PhD area if you wish. The number of openings for researchers in prob or stat doesn’t fluctuate so much on this time scale, so you are unlikely to lose more than the earnings foregone. At best, you will move into a completely new area of research, one much better suited to your personal interests and skills, perhaps also better suited to market demand, but either way, one chosen with your PhD experience behind you. This can greatly enhance your long-term career prospects and more than compensate for your delayed entry into the workforce.
Students: the time to think about this is now [November], not just as you are about to file your dissertation. And the choice is not necessarily one between immediate security and career development: you might be able to have both. You shouldn’t shy from applying for tenure-track jobs and postdocs at the same time, and if offered the job you want, requesting (say) two years’ leave of absence to do the postdoc you want. Employers who care about your career development are unlikely to react badly to such a request.
It’s worth following up a little on why the Obama campaign is recruiting statisticians (note to Karen: I am not looking for a new job!). Here’s the blurb for the position of “Statistical Modeling Analyst”:
The Obama for America Analytics Department analyzes the campaign’s data to guide election strategy and develop quantitative, actionable insights that drive our decision-making. Our team’s products help direct work on the ground, online and on the air. We are a multi-disciplinary team of statisticians, mathematicians, software developers, general analysts and organizers - all striving for a single goal: re-electing President Obama. We are looking for staff at all levels to join our department from now through Election Day 2012 at our Chicago, IL headquarters.
Statistical Modeling Analysts are charged with predicting electoral outcomes using statistical models. These models will be instrumental in helping the campaign determine how to most effectively use its resources.
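We obviously have no idea what the campaign’s models actually look like, but a toy version of the basic task - turn state polls into a win probability by simulating polling error - might look like this. Every poll margin, electoral vote count, and error size below is hypothetical:

```python
# A minimal sketch of poll-based electoral simulation. The state
# margins, electoral votes, safe-state total, and polling error
# below are all hypothetical, for illustration only.
import random

random.seed(42)

# Hypothetical candidate lead (in points) and electoral votes
# for a few battleground states.
states = {"OH": (2.0, 18), "FL": (-1.0, 29), "VA": (1.5, 13)}
BASE_EV = 250   # electoral votes assumed safe (hypothetical)
POLL_SD = 4.0   # assumed polling error, in points

def simulate(n_sims=10_000):
    """Fraction of simulations in which the candidate reaches 270."""
    wins = 0
    for _ in range(n_sims):
        ev = BASE_EV
        for margin, votes in states.values():
            # Draw an election-day margin around the polled margin
            if random.gauss(margin, POLL_SD) > 0:
                ev += votes
        if ev >= 270:
            wins += 1
    return wins / n_sims

print(simulate())
```

The real models presumably incorporate correlated errors across states, demographic data, and much more - but even this toy version makes the point that the output is a probability, which brings us right back to the Lewis quote above: the public wants a yes-or-no answer.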
I wonder if there’s a bonus for predicting the correct outcome, win or lose?
The Obama campaign didn’t invent the idea of heavy data analysis in campaigns, but they seem to be enthusiastic adopters. There are 3 openings in the “Analytics” category as of today.
Now, can someone tell me why they don’t just call it simply “Statistics”?