The pebbles of academia

I have just been awarded a certificate for successful completion of the Conflict of Interest Commitment training (I barely passed). Lately, I have been totally swamped by administrative duties and have had little time for actual research. The experience reminded me of something I read in this NYTimes article by Tyler Cowen:

Michael Mandel, an economist with the Progressive Policy Institute, compares government regulation of innovation to the accumulation of pebbles in a stream. At some point too many pebbles block off the water flow, yet no single pebble is to blame for the slowdown. Right now the pebbles are limiting investment in future innovation.

Here are some of the pebbles of my academic career (past and present): financial conflict of interest training, human subjects training, HIPAA training, safety training, ethics training, submitting papers online, filling out copyright forms, faculty meetings, center grant quarterly meetings, two-hour oral exams, two-hour thesis committee meetings, big project conference calls, retreats, JSM, anything with “strategic” in the title, admissions committee, affirmative action committee, faculty senate meetings, brown bag lunches, orientations, effort reporting, conflict of interest reporting, progress reports (can’t I just point to PubMed?), dbGaP progress reports, people who ramble at study section, rambling at study section, buying airplane tickets for invited talks, filling out travel expense sheets, and organizing and turning in travel receipts. I know that some of these are somewhat important or take minimal time, but read the quote again.

I also acknowledge that I actually have it real easy compared to others, so I am interested in hearing about other people’s pebbles. What are yours?

Update: add changing my eRA Commons password to the list!

Tags: Rant humor

Sunday Data/Statistics Link Roundup (7/22/12)

  1. This is the paper describing how Uri Simonsohn identified academic misconduct using statistical analyses. The approach has received a huge amount of press in the scientific literature. The basic idea is that he calculates the standard deviation of the reported means/standard deviations across the groups being compared. Then he simulates from a Normal distribution and shows that, under the Normal model, it is unlikely the means/standard deviations would be so similar (a minimal simulation sketch appears after this list). I think the idea is clever, but I wonder if the Normal model is the best choice here…could the estimates be similar because it was the same experimenter, etc.? I suppose the proof is in the pudding, though: several of the papers he identifies have been retracted. 
  2. This is an amazing rant by a history professor at Swarthmore over the development of massive online courses, like the ones Roger, Brian and I are teaching. I think he makes some important points (especially about how we could do the same thing with open access in a heartbeat if universities/academics threw serious muscle behind it), but I have to say, I’m personally very psyched to be involved in teaching one of these big classes. I think that statistics is a field that a lot of people would like to learn something about, and I’d like to make it easier for them to do that because I love statistics. I also see the strong advantage of in-person education. The folks who enroll at Hopkins and take our courses will obviously get way more one-on-one interaction, which is clearly valuable. I don’t see why it has to be one or the other…
  3. An interesting discussion with Facebook’s former head of big data. I think the first point is key. A lot of the “big data” hype has just had to do with the infrastructure needed to deal with all the data we are collecting. The bigger issue (and where statisticians will lead) is figuring out what to do with the data. 
  4. This is a great post about data smuggling. The two key points I think it raises are: (1) when the data get big enough, they have their own mass and aren’t going to be moved, and (2) physically mailing hard drives is still the fastest way of transferring big data sets (see the back-of-envelope calculation after this list). That is certainly true in genomics, where it is called “sneaker net” when a collaborator walks a hard drive over to our office. Hopefully putting data in physical terms will drive home the point that the new scientists are folks who deal with/manipulate/analyze data. 
  5. Not statistics related, but here is a high bar to hold your work to: the bus-crash test. If you died in a bus crash tomorrow, would your discipline notice? Yikes. Via C.T. Brown. 
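
A footnote on item 1: here is a minimal sketch of the simulation logic in Python. Everything in it is made up for illustration (the reported standard deviations, the per-condition sample size, the plug-in population SD), and it is not Simonsohn's actual procedure, which aggregates evidence across many studies. The point is just to show how "too similar" summary statistics can be quantified under a Normal model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical reported standard deviations from three conditions of one
# study, plus an assumed per-condition sample size. None of these numbers
# come from a real paper.
reported_sds = np.array([1.02, 1.04, 1.03])
n = 15
pop_sd = reported_sds.mean()      # plug-in value for the true SD under the null

obs_spread = reported_sds.std()   # how close together the reported SDs are

# Under the Normal model, repeatedly draw each condition's sample, record the
# sample SDs, and measure how spread out they are.
n_sim = 100_000
sim_spread = np.empty(n_sim)
for i in range(n_sim):
    sds = [rng.normal(0.0, pop_sd, n).std(ddof=1) for _ in range(reported_sds.size)]
    sim_spread[i] = np.std(sds)

# A small fraction suggests the reported statistics are "too similar" to have
# arisen from independent Normal samples.
p = (sim_spread <= obs_spread).mean()
print(f"Fraction of simulations with SDs this similar: {p:.4f}")
```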
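
And on item 4, a quick back-of-envelope calculation shows why shipping drives can beat the wire. The numbers are assumptions (a 2 TB drive, 24-hour shipping, a 100 Mbps sustained link), so treat the output as an order-of-magnitude argument:

```python
# Effective bandwidth of "sneaker net" versus a network transfer.
# All quantities below are assumed round numbers for illustration.
drive_tb = 2.0       # drive capacity in terabytes
ship_hours = 24.0    # overnight shipping time

bits = drive_tb * 1e12 * 8
sneaker_mbps = bits / (ship_hours * 3600) / 1e6
print(f"Shipped drive: ~{sneaker_mbps:.0f} Mbps effective")          # ~185 Mbps

link_mbps = 100.0    # assumed sustained network rate
net_hours = bits / (link_mbps * 1e6) / 3600
print(f"Same data at {link_mbps:.0f} Mbps: ~{net_hours:.0f} hours")  # ~44 hours
```

A box of drives scales the shipped bandwidth linearly while the network link stays fixed, which is exactly the post's point about data having mass.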

My worst (recent) experience with peer review

My colleagues and I just published a paper on validation of genomic results in BMC Bioinformatics. It is “highly accessed” and we are really happy with how it turned out. 

But it was brutal getting it published. Here is the line-up of places I sent the paper. 

  • Science: Submitted 10/6/10, rejected 10/18/10 without review. I know this seems like a long shot, but this paper on validation was published in Science not too long after. 
  • Nature Methods: Submitted 10/20/10, rejected 10/28/10 without review. Not much to say here, moving on…
  • Genome Biology: Submitted 11/1/10, rejected 1/5/11. 2/3 referees thought the paper was interesting, with few specific concerns raised. I felt those could be addressed, so I appealed on 1/10/11; the appeal was accepted 1/20/11 and the paper resubmitted 1/21/11. Paper rejected 2/25/11. 2/3 referees were happy with the revisions; one still didn’t like it. 
  • Bioinformatics: Submitted 3/3/11, rejected 3/13/11 without review. I appealed again; it turns out “I have checked with the editors about this for you and their opinion was that there was already substantial work in validating gene lists based on random sampling.” If anyone knows about one of those papers let me know :-). 
  • Nucleic Acids Research: Submitted 3/18/11, rejected with invitation for revision 3/22/11. Resubmitted 12/15/11 (got delayed by a few projects here), rejected 1/25/12. The reason for rejection seemed to be that one referee had major “philosophical issues” with the paper.
  • BMC Bioinformatics: Submitted 1/31/12, first review 3/23/12, resubmitted 4/27/12, second revision requested 5/23/12, revised version submitted 5/25/12, accepted 6/14/12. 
An interesting side note: the really brief reviews from the Genome Biology submission inspired me to do this paper. I had time to conceive the study, get IRB approval, build a web game for peer review, recruit subjects, collect the data, analyze the data, write the paper, submit it to three journals, and have it come out six months before the paper that inspired it was published! 
Ok, glad I got that off my chest.
What is your worst peer-review story?

What is a major revision?

I posted a little while ago on a proposal for a fast statistics journal. It generated a bunch of comments and even a really nice follow-up post with some great ideas. Since then I’ve gotten reviews back on a couple of papers, and I think I realized one of the key issues that is driving me nuts about the current publishing model. It boils down to one simple question: 

What is a major revision? 

I often get reviews back that suggest “major revisions” in one or many of the following categories:

  1. More/different simulations
  2. New simulations
  3. Re-organization of content
  4. Re-writing language
  5. Asking for more references
  6. Asking me to include a new method
  7. Asking me to implement someone else’s method for comparison
I don’t consider any of these major revisions. Personally, I have stopped asking for them as major revisions. In my opinion, major revisions should be reserved for issues with the manuscript that suggest that it may be reporting incorrect results. Examples include:
  1. No simulations
  2. No real data
  3. The math/computations look incorrect
  4. The software didn’t work when I tried it
  5. The methods/algorithms are unreadable and can’t be followed
The first list is actually a list of minor/non-essential revisions, in my opinion. They may improve my paper, but they won’t determine whether or not it is correct. I find that they are often subjective and up to the whims of referees. In my own refereeing I am making an effort to drop subjective major revisions and only include issues that are critical for evaluating the correctness of a manuscript. I also try to divorce the issue of whether an idea is interesting from whether it is correct. 
I’d be curious to know: what are other people’s definitions of major/minor revisions?

Sunday data/statistics link roundup (3/18)

  1. A really interesting proposal by Rafa (in Spanish - we’ll get on him to write a translation) for the University of Puerto Rico. The post concerns changing the focus from simply teaching to creating knowledge, and the potential benefits to both the university and to Puerto Rico. It also has a really nice summary of the benefits that the university system in the United States has produced. Definitely worth a read. The comments are also interesting; it looks like Rafa’s post is pretty controversial…
  2. An interesting article suggesting that the Challenger Space Shuttle disaster was at least in part due to bad data visualization. Via @DatainColour
  3. The Snyderome is getting a lot of attention in genomics circles. He used as many new technologies as he could to measure a huge amount of molecular information about his body over time. I am really on board with the excitement about measurement technologies, but this poses a huge challenge for statistics and statistical literacy. If this kind of thing becomes commonplace, the potential for false positives and ghost diagnoses is huge without a really good framework for uncertainty. Via Peter S. 
  4. More news about the Nike API. Now that is how to unveil some data! 
  5. Add the Nike API to the list of potential statistics projects for students. 

An example of how sending a paper to a statistics journal can get you scooped

In a previous post I complained about statistics journals taking way too long to reject papers. Today I am complaining because even when everything goes right (a better-than-average review time for statistics, useful and insightful comments from reviewers) we can still come out losing.

In May 2011 we submitted a paper on removing GC bias from RNA-seq data to Biostatistics. It was published on December 27. However, we were scooped by this BMC Bioinformatics paper, published ten days earlier despite being submitted three months later and accepted 11 days after ours. The competing paper has already earned the “highly accessed” distinction. The two papers, both statistics papers, are very similar, yet I am afraid more people will read the one that was finished second but published first.

Note that Biostatistics is one of the fastest stat journals out there. I don’t blame the journal at all here. We statisticians have to change our culture when it comes to reviews.

Tags: Rant

Where do you get your data?

Here’s a question I get fairly frequently from various types of people: “Where do you get your data?” This is sometimes followed up quickly with “Can we use some of your data?”

My contention is that if someone asks you these questions, start looking for the exits.

Read More

P-values and hypothesis testing get a bad rap - but we sometimes find them useful.

This post written by Jeff Leek and Rafa Irizarry.

The p-value is the most widely known statistic. P-values are reported in a large majority of scientific publications that measure and report data. R.A. Fisher is widely credited with inventing the p-value. If he were cited every time a p-value was reported, his paper would have, at the very least, 3 million citations*, making it the most highly cited paper of all time. 

Read More

Dear editors/associate editors/referees, Please reject my papers quickly

The review times for most journals in our field are ridiculous. Check out Figure 1 here. A careful review takes time, but not six months. Let’s be honest: those papers are sitting on desks for the great majority of those six months. But here is what really kills me: waiting six months for a review that basically says the paper is not of sufficient interest to the readership of the journal. That decision you can come to in half a day. If you don’t have time, don’t accept the responsibility of reviewing the paper.

I like sharing my work with my statistician colleagues, but the biology journals never do this to me. When my paper is not of sufficient interest, these journals reject me in days, not months. I sometimes work on topics that are fast paced, and many of my competitors are not statisticians. If I have to wait six months for each rejection, I can’t compete. By the time the top three applied statistics journals reject a paper, more than a year has gone by and the paper is no longer novel. Meanwhile I can go through Nature Methods, Genome Research, and Bioinformatics in less than three months.

Nick Jewell once shared an idea that I really liked. It goes something like this: journals in our field accept every paper that is correct. The editorial board, with the help of referees, assigns each paper to one of five categories (A, B, C, D, E) based on novelty, importance, etc. If you don’t like the category you are assigned, you can try your luck elsewhere. But before you go, note that the paper’s category can improve after publication based on readership feedback. While we wait for this idea to get implemented, I ask that if you get one of my papers and you don’t like it, please reject it quickly. You can write this review: “This paper rubbed me the wrong way and I heard you like being rejected fast, so that’s all I am going to say.” Your comments and critiques are valuable, but not worth the six-month wait. 

P.S. I have to admit that the newer journals have not been bad to me in this regard. Unfortunately, for the sake of my students/postdocs going into the job market and my untenured junior colleagues, I feel I have to try the established top journals first, as they still impress more on a CV.

Tags: Rant humor

Reverse scooping

I would like to define a new term: reverse scooping is when someone publishes your idea after you and doesn’t cite you. It has happened to me a few times. What does one do? I usually send a polite message to the authors with a link to my related paper(s). These emails are usually ignored, but not always. Most times I don’t think it is malicious, though. In fact, I almost reverse scooped a colleague recently. People arrive at the same idea a few months (or years) later, and there is just too much literature to keep track of. And remember, the culprit authors were not the only ones who missed your paper; the referees and the associate editor missed it as well. One thing I have learned is that if you want to claim an idea, try to include it in the title or abstract, as very few papers get read cover-to-cover.

Tags: Rant Advice