Here are a few ideas that might make for interesting student projects at all levels (from high school to graduate school). I’d welcome ideas, suggestions, and additions to the list as well. All of these ideas depend on free or scraped data, which means that anyone can work on them. I’ve given a ballpark difficulty and effort rating for each project to give people some idea of what’s involved.
Happy data crunching!
Creating a webpage that explains conceptual statistical issues like randomization, margin of error, overfitting, cross-validation, data visualization concepts, and sampling. The webpage should not use any math at all and should explain the concepts so that a general audience could understand them. Bonus points if you make short 30-second animated YouTube clips that explain the concepts. (Difficulty: Lowish; Effort: Highish)
Building an aggregator for statistics papers across disciplines that can serve as the central resource for statisticians. Journals ranging from PLoS Genetics to NeuroImage now routinely publish statistical papers, but there is no central resource that aggregates all the statistics papers published across disciplines. Such a resource would be hugely useful to statisticians. You could build it using blogging software like WordPress so that articles could be tagged and people could follow the resource in their RSS readers. (Difficulty: Lowish; Effort: Mediumish)
Scrape the LivingSocial/Groupon sites for the daily deals and develop a prediction of how successful the deal will be based on location/price/type of deal. You could use either the RCurl R package or the XML R package to scrape the data. (Difficulty: Mediumish; Effort: Mediumish)
You could use the data from your city (here are a few cities with open data) to: (a) identify the best and worst neighborhoods to live in based on different metrics like how many parks are within walking distance, crime statistics, etc. (b) identify concrete measures your city could take to improve different quality of life metrics like those described above - say where should the city put a park, or (c) see if you can predict when/where crimes will occur (like these guys did). (Difficulty: Mediumish; Effort: Highish)
Download data on State of the Union speeches from here and use the tm package in R to analyze patterns of word use over time. (Difficulty: Lowish; Effort: Lowish)
Use this data set from Donors Choose to determine which characteristics make projects more likely to be funded. You could send your results to the Donors Choose folks to help them improve the funding rate for their projects. (Difficulty: Mediumish; Effort: Mediumish)
Which basketball player would you want on your team? Here is a really simple analysis done by Rafa, but it doesn’t take into account things like defense. If you want to take on this project, you should take a look at this Dennis Rodman analysis, which is the gold standard. (Difficulty: Mediumish; Effort: Highish)
Creating an R package that wraps the svgAnnotation package. That package can be used to create dynamic graphics in R, but it is still a bit too flexible for most people to use. Writing some wrapper functions that simplify the interface could have high impact, maybe something like svgPlot() to create simple, dynamic graphics with only a few options. (Difficulty: Mediumish; Effort: Mediumish)
The same as the previous project, but for D3.js. The impact could potentially be a bit higher, since the graphics are a bit more professional, but the level of difficulty and effort would also both be higher. (Difficulty: Highish; Effort: Highish)
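For the State of the Union project, the suggested tool is the tm package in R. As a language-neutral illustration of the core step, here is a minimal Python sketch (not tm) that computes how often a chosen word appears in each year’s speech, relative to that speech’s length. The input format (a dict mapping year to speech text), the function names, and the speech snippets are all hypothetical toy stand-ins for the downloaded data.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a speech and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def term_share_by_year(speeches, term):
    """Map each year to the fraction of that year's words equal to `term`."""
    shares = {}
    for year, text in sorted(speeches.items()):
        counts = Counter(tokenize(text))
        total = sum(counts.values())
        shares[year] = counts[term] / total if total else 0.0
    return shares

# Hypothetical toy snippets standing in for the downloaded speeches.
speeches = {
    1990: "The state of our union is strong. Our economy grows.",
    2010: "Jobs, jobs, jobs. The economy must create jobs.",
}
print(term_share_by_year(speeches, "jobs"))  # {1990: 0.0, 2010: 0.5}
```

Normalizing by speech length matters because speeches vary a lot in length; raw counts would mostly track how long each president talked.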
Here is a link to the press release from Duke regarding Anil Potti and the Duke Saga.
It’s impossible to develop a system that will completely eliminate academic fraud if a researcher is intent on misconduct, said Sally Kornbluth, vice dean for the basic sciences in the School of Medicine. “But this case highlighted that we can take a hard look at the infrastructure and the culture around research to reduce it,” she said. “And we can provide safeguards for people who are trying to do the right things, but make errors or are guilty of sloppiness. In research, errors are more likely to be made through sloppiness than through fraud.”
One of those safeguards is biostatisticians:
Kornbluth said there was “a dire need” in many research labs for quantitative expertise to review data. As a result, Duke has taken steps to embed biostatisticians in clinical research groups. Already this change has attracted attention from other research institutions looking to reduce errors in data analysis.
Graham & Dodd's Security Analysis: Moneyball for...Money
The last time I posted something about finance I got schooled by people who actually know stuff. So let me just say that I don’t claim to be an expert in this area, but I do have an interest in it and try to keep up the best I can.
One book I picked up a little while ago was Security Analysis by Benjamin Graham and David Dodd. This is the “bible of value investing,” so I mostly wanted to see what all the hubbub was about. In my mind, the hubbub is well-deserved. Although it was originally written in 1934, the book has stood the test of time (it has been updated a number of times since then). It’s quite readable and, I guess, still relevant to modern-day investing. In the 6th edition the out-of-date material has been relegated to an appendix. It also contains short essays (of varying quality) by modern-day value investing heroes like Seth Klarman and Glenn Greenberg. It’s a heavy book, though, and I wish I’d gotten it on the Kindle.
It occurred to me that with all the interest in data and analytics today, Security Analysis reads a lot like the Moneyball of investing. The two books make the same general point: find things that are underpriced/underappreciated and buy them when no one’s looking. Then profit!
One of the basic points made early on is that roughly speaking, you can’t judge a security by its cover. You need to look at the data. How novel! For example, at the time bonds were considered safe because they were bonds, while stocks (equity) were considered risky because they were stocks. There are technical reasons why this is true, but a careful look at the data might reveal that the bonds of one company are risky while the stock is safe, depending on the price at which they are trading. The question to ask for either type of security is what’s the chance of losing money? In order to answer that question you need to estimate the intrinsic value of the company. For that, you need data.
The functions of security analysis may be described under three headings: descriptive, selective, and critical. In its more obvious form, descriptive analysis consists of marshalling the important facts relating to the issue [security] and presenting them in a coherent, readily intelligible manner…. A more penetrating type of description seeks to reveal the strong and weak points in the position of an issue, compare its exhibit with that of others of similar character, and appraise the factors which are likely to influence its future performance. Analysis of this kind is applicable to almost every corporate issue, and it may be regarded as an adjunct not only to investment but also to intelligent speculation in that it provides an organized factual basis for the application of judgment.
Back in Graham & Dodd’s day it must have been quite a bit harder to get the data. Many financial reports that public companies routinely publish today were not available back then. Today we are awash in easily accessible financial data and, one might argue as a result, there are fewer opportunities to make money.
An interesting story about faulty water bills in Baltimore discovered via some shoe-leather data analysis.
In 2006, Stewart was running a business, the Gaslight Tavern, and managing two rental properties. Water bills weren’t exactly foremost in her mind.
Then the water bill for one of her rental properties jumped from $40 to $800, while her business received a bill showing it had used no water, which she knew was impossible.
"I started looking into it, and I saw that everybody’s getting incorrect water bills, and the city is just letting them pay," she said. "I haven’t stopped because I see people losing their homes over incorrect water bills."
So, Stewart began compiling data, downloading to her home computer thousands of water bills, which are public information posted to a city-maintained website. The more she learned, the more her outrage grew. In 2008, she put up her first website. She called herself “WaterBillWoman.”
Prediction: the Lasso vs. just using the top 10 predictors
One incredibly popular tool for the analysis of high-dimensional data is the lasso. The lasso is commonly used when you have many more predictors than independent samples (the n ≪ p problem). It is also often used in the context of prediction.
Suppose you have an outcome $Y$ and predictors $X_1, \ldots, X_M$. The lasso fits the model:
$$Y = B_0 + B_1 X_1 + B_2 X_2 + \cdots + B_M X_M + E$$
subject to a constraint on the sum of the absolute values of the B coefficients. The result is that: (1) some of the coefficients get set to zero, and those variables drop out of the model, and (2) other coefficients are “shrunk” toward zero. Dropping some variables is good because there are a lot of potentially unimportant variables. Shrinking coefficients may be good, since the big coefficients might be just the ones that were really big by random chance (this is related to Andrew Gelman’s type M errors).
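Both effects are easiest to see in a special case. Assume centered variables and orthonormal predictors ($X^\top X = I$), and use the equivalent penalized form of the lasso, which minimizes $\tfrac{1}{2}\lVert Y - XB \rVert^2 + \lambda \sum_j |B_j|$. Under those assumptions each lasso coefficient is simply the least-squares coefficient soft-thresholded at $\lambda$. The sketch below uses that special case with hypothetical numbers; real designs are rarely orthonormal, so it only illustrates the mechanism, not how lars computes the fit.

```python
def soft_threshold(b, lam):
    """Lasso estimate of one coefficient under an orthonormal design.

    b   -- the ordinary least-squares estimate of that coefficient
    lam -- the penalty weight (lambda >= 0); larger values shrink more
    """
    if b > lam:
        return b - lam        # shrunk toward zero
    if b < -lam:
        return b + lam        # shrunk toward zero
    return 0.0                # set exactly to zero: the variable drops out

# Hypothetical least-squares estimates, for illustration only.
ols = [3.0, -1.5, 0.25, -0.2, 1.25]
print([soft_threshold(b, 0.5) for b in ols])  # [2.5, -1.0, 0.0, 0.0, 0.75]
```

Small estimates (0.25 and -0.2) drop out entirely, while the large ones all lose the same amount, lambda, which is exactly the shrinkage that can take the edge off coefficients that are big by chance.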
I work in genomics, where n ≪ p problems come up all the time. Whenever I use the lasso, or read papers where the lasso is used for prediction, I always think: “How does this compare to just using the top 10 most significant predictors?” I have asked this out loud enough that some people around here started calling it the “Leekasso” to poke fun at me. So I’m going to call it that in a thinly veiled attempt to avoid Stigler’s law of eponymy (actually, Rafa points out that using this name is a perfect example of this law, since this feature selection approach has been proposed before at least once).
Here is how the Leekasso works. You fit each of the models:
$$Y = B_0 + B_k X_k + E, \qquad k = 1, \ldots, M$$
take the 10 variables with the smallest p-values from testing the $B_k$ coefficients, then fit a linear model with just those 10 variables. You never use 9 or 11; the Leekasso is always 10.
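The procedure above can be sketched in pure Python. This is a hypothetical illustration, not the author’s R code (which is linked later in the post). It relies on one shortcut worth stating: when every marginal regression uses the same n samples, the t-statistic for $B_k$ is a monotone function of $|\mathrm{corr}(X_k, Y)|$, so ranking predictors by absolute correlation gives the same top 10 as ranking by p-value. The data layout (X as a list of rows) and the function names are assumptions.

```python
import math
import random

K = 10  # the Leekasso always uses exactly 10 predictors

def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x

def leekasso_fit(X, y):
    """Select the K predictors with the smallest marginal p-values, then fit OLS.

    Every marginal regression Y = B0 + Bk Xk + E uses the same n samples, so
    its t-statistic is monotone in |corr(Xk, Y)|; ranking by absolute
    correlation therefore matches ranking by p-value.
    """
    p = len(X[0])
    scores = [abs(corr([row[j] for row in X], y)) for j in range(p)]
    chosen = sorted(range(p), key=lambda j: scores[j], reverse=True)[:K]
    Z = [[1.0] + [row[j] for j in chosen] for row in X]  # intercept + chosen columns
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(K + 1)] for a in range(K + 1)]
    Zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(K + 1)]
    return chosen, solve(ZtZ, Zty)  # normal equations: (Z'Z) coef = Z'y

def leekasso_predict(X, chosen, coef):
    """Predictions from the intercept plus the K selected predictors."""
    return [coef[0] + sum(c * row[j] for c, j in zip(coef[1:], chosen)) for row in X]

# Hypothetical usage on simulated data: 60 samples, 30 predictors, 3 of them real.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(30)] for _ in range(60)]
y = [3 * r[0] - 2 * r[1] + 2 * r[2] + random.gauss(0, 0.1) for r in X]
chosen, coef = leekasso_fit(X, y)
```

Note that there is no tuning parameter to cross-validate, which is one reason this kind of fit is so much faster than choosing the lasso shrinkage by cross-validation.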
For fun I did an experiment to compare the accuracy of the Leekasso and the Lasso.
Here is the setup:
I simulated 500 variables and 100 samples for each study, each N(0,1)
I created an outcome that was 0 for the first 50 samples, 1 for the last 50
I set a certain number of variables (between 5 and 50) to be associated with the outcome using the model $X_i = b_{0i} + b_{1i} Y + e$ (this is an important choice; more on it later in the post)
I tried different levels of signal to the truly predictive features
I generated two data sets (training and test) from the exact same model for each scenario
I fit the Lasso using the lars package, choosing the shrinkage parameter as the value that minimized the cross-validation MSE in the training set
I fit the Leekasso and the Lasso on the training sets and evaluated accuracy on the test sets.
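The data-generating part of the setup can be sketched in Python as follows. This is not the linked R code, and it fills in details the list leaves open, all of which are assumptions: $b_{0i} = 0$, every associated feature shares the same effect size $b_{1i}$ (the `signal` argument), and the associated features are the first `n_signal` columns.

```python
import random

def simulate(n_signal, signal, n_features=500, n_samples=100, seed=None):
    """Simulate one data set following the setup listed above.

    Assumptions not stated in the post: b0i = 0, every associated feature
    shares the same effect size b1i = signal, and the associated features
    are the first n_signal columns.
    """
    rng = random.Random(seed)
    half = n_samples // 2
    y = [0.0] * half + [1.0] * (n_samples - half)   # 0 for first half, 1 for second
    X = []
    for yi in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]  # N(0,1) noise
        for j in range(n_signal):
            row[j] += signal * yi                   # X_i = b0i + b1i * Y + e
        X.append(row)
    return X, y

# Training and test sets drawn from the exact same model (different seeds).
train_X, train_y = simulate(n_signal=5, signal=1.5, seed=1)
test_X, test_y = simulate(n_signal=5, signal=1.5, seed=2)
```

The direction of the model is the key design choice: the features are generated from the outcome, not the other way around, which is the point revisited in the discussion of results.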
The R code for this analysis is available here and the resulting data is here.
The results show that for all configurations, using the top 10 has a higher out of sample prediction accuracy than the lasso. A larger version of the plot is here.
Interestingly, this is true even when there are fewer than 10 real features in the data or when there are many more than 10 real features (remember the Leekasso always picks 10).
Some thoughts on this analysis:
This is only test-set prediction accuracy; it says nothing about selecting the “right” features for prediction.
The Leekasso took about 0.03 seconds to fit and test per data set compared to about 5.61 seconds for the Lasso.
The data-generating model is the model underlying the top 10, so it isn’t surprising that the top 10 has higher performance. Note that I simulated from the model $X_i = b_{0i} + b_{1i} Y + e$; this is the model commonly assumed in differential expression analysis (genomics) or voxel-wise analysis (fMRI). Alternatively, I could have simulated from the model $Y = B_0 + B_1 X_1 + B_2 X_2 + \cdots + B_M X_M + E$, where most of the coefficients are zero. In that case, the Lasso would outperform the top 10 (data not shown). This is a key, and possibly obvious, issue raised by this simulation: when doing prediction, differences in the true “causal” model matter a lot. So if we believe the “top 10 model” holds in many high-dimensional settings, then it may be the case that regularization approaches don’t work well for prediction, and vice versa.
I think what may be happening is that the Lasso is overshrinking the parameter estimates; in other words, it takes on too much bias in exchange for a reduction in variance. Alan Dabney and John Storey have a really nice paper discussing shrinkage in the context of genomic prediction that I think is related.
Professional statisticians agree: the Knicks should start Steve Novak over Carmelo Anthony
A week ago, Nate Silver tweeted this:
Since Lin became starting PG, Knicks have outscored opponents by 63 with Novak on the floor. Been outscored by 8 when he isn’t.
In a previous post we showed the plot below. Note that Carmelo Anthony is in ball hog territory. Novak plays the same position as Anthony but is a three-point specialist. His career three-point FG% of 42% (253 of 603) puts him 10th all time! It seems that with Lin in the lineup he is getting more open shots and helping his team. Should the Knicks start Novak?
Three studies published this week found that people exposed to pollutants have a higher risk of stroke, heart attacks and cognitive deterioration.
One study in particular by Jennifer Weuve at Rush University Medical Center examined fine PM and cognition in older women.
Dr. Weuve’s research followed 19,409 women in the United States between the ages of 70 and 81 for about a decade, looking at changes in cognition every two years. Declines in memory and executive function, including the ability to plan and make or carry out a strategy, are normal as people get older. But the study showed that women with higher levels of long-term exposure to air pollution had “significantly” faster declines in cognition than those with less exposure to pollutants.
This is a fascinating finding and has not been particularly well-studied in the past with respect to air pollution exposure. Although it needs to be replicated in future studies, this adds an interesting piece to the puzzle of how ambient air pollution affects human health overall.
“Cognitively speaking, this higher exposure is as if you had aged an extra two years,” said Dr. Weuve, an assistant professor at the Rush Institute for Healthy Aging at Rush University Medical Center in Chicago. That might not sound like much, she added, but if there were a treatment “that could just delay the onset of dementia by two years, that would spare the population millions of cases of disease over the next 40 years.”
A record chain of kidney transplants resulted from a mix of medical need, pay-it-forward selflessness and lockstep coordination among 17 hospitals over four months.
This is a fascinating story of the longest “domino chain” of kidney transplantations yet done.
Domino chains, which were first attempted in 2005 at Johns Hopkins, seek to increase the number of people who can be helped by living donors. In 2010, chains and other forms of paired exchanges resulted in 429 transplants. Computer models suggest that an additional 2,000 to 4,000 transplants could be achieved each year if Americans knew more about such programs and if there were a nationwide pool of all eligible donors and recipients.
Your shopping habits reveal even the most personal information — like when you’re going to have a baby.
Andrew Pole had just started working as a statistician for Target in 2002, when two colleagues from the marketing department stopped by his desk to ask an odd question: “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?”
The real conundrum:
Using data to predict a woman’s pregnancy, Target realized soon after Pole perfected his model, could be a public-relations disaster. So the question became: how could they get their advertisements into expectant mothers’ hands without making it appear they were spying on them? How do you take advantage of someone’s habits without letting them know you’re studying their lives?
I.B.M. is the world’s largest employer of Ph.D.’s. It has plenty of businesses it can throw them at, but the trick is figuring out which ones will yield the best return. That happens by finding the algorithms for one industry, like power generation, that will work in another, like traffic management.
I.B.M., Mr. Mills [I.B.M.’s senior vice president for software and systems] said, is now the largest employer of Ph.D. mathematicians in the world, bringing their talents to things like oil exploration and medicine. “On the side we’re doing astrophysics, genomics, proteomics,” he said.
Generalizability appears to be the key to making money for I.B.M.
The trend of looking for commonalities and overlapping interests is emerging in many parts of both academia and business. At the ultrasmall nanoscale examination of a cell, researchers say, the disciplines of biology, chemistry and physics begin to collapse in on each other. In a broader search for patterns, students of the statistical computing language known as R have used methods of counting algae blooms to prove patterns of genocide against native peoples in Central America. Online marketers look at your behavior in a number of contexts to sell you something you may not even know you wanted.
The President’s budget proposal would hold the National Institutes of Health’s (NIH’s) budget at the current level of $30.86 billion.
In order to squeeze more grants out of the flat budget—the target is an 8% increase in new grants, to 672, for a total of 9415—NIH will put in place new grant management policies. Continuing grants will be cut 1% below the 2012 level, competing grants wouldn’t get inflationary increases in future years, and NIH will add a new layer of review for proposals from investigators who already have at least $1.5 million in funding.
Back in January we interviewed Joe Blitzstein and pointed out that he made his lectures freely available on iTunes. Now it is a course on iTunes and the format has been upgraded to work better with iPhones and iPads. Enjoy!
Mortimer Spiegelman Award: Call for Nominations. Deadline is April 1, 2012
The Statistics Section of the American Public Health Association invites nominations for the 2012 Mortimer Spiegelman Award honoring a statistician aged 40 or younger who has made outstanding contributions to health statistics, especially public health statistics.
The award was established in 1970 and is presented annually at the APHA meeting. The award serves three purposes: to honor the outstanding achievements of both the recipient and Spiegelman, to encourage further involvement in public health of the finest young statisticians, and to increase awareness of APHA and the Statistics Section in the academic statistical community. More details about the award including the list of the past recipients and more information about the Statistics Section of APHA may be found here.
To be eligible for the 2012 Spiegelman Award, a candidate must have been born in 1972 or later. Please send electronic versions of the nominating letter and the candidate’s CV to the 2012 Spiegelman Award Committee Chair, Rafael A. Irizarry.
Please state in the nominating letter the candidate’s birthday. The nominator should include one or two paragraphs in the nominating letter that describe how the nominee’s contributions relate to public health concerns. A maximum of three supporting letters per nomination can be provided. Nominations for the 2012 Award must be submitted by April 1, 2012.
Following up on the 60 Minutes segment on the Duke Clinical Trials Saga, here’s a video of Keith Baggerly giving a talk about his and Kevin Coombes’ investigation of the data and the methods. Thanks to Andrew J. for the link.
Melissa Harris-Perry, with her progressive talk show of the same name, is set to join MSNBC’s weekend lineup of cable news shows beginning Saturday.
Ms. Harris-Perry will be the only tenured professor in the United States — and one of a very small number of African-American women — who serves as a cable news host.
The article talks about how Ms. Harris-Perry had given a specific lecture at Tulane on African-Americans being elected as cities go into economic decline:
One day last summer, when Ms. Harris-Perry was filling in for Rachel Maddow on MSNBC, she recast the class lecture as a television segment, invoking Detroit; her adopted home, New Orleans; President Obama; and tax policy.
“I’ve given that lecture a million times — a million times,” Ms. Harris-Perry said in a recent interview. “But I do it once on Rachel’s show, and it was everywhere the next day. It was up on Web sites, people were e-mailing me — that, for me, was a really clear indication of how powerful television is.”
An awesome alternative to D3.js: R’s svgAnnotation package. Here’s the paper in JSS. I feel like this is one step away from gaining broad use in the statistics community; building the graphics still feels a little complicated, but there is plenty of flexibility there. I feel like a great project for a student at any level would be writing some easy-to-use wrapper functions for this package.
How to run R on your Android device. This is very cool - can’t wait to start running simulations on my Nexus S.
For those who can make sense of the explosion of data, there are job opportunities in fields as diverse as crime, retail and dating.
Veteran data analysts tell of friends who were long bored by discussions of their work but now are suddenly curious. “Moneyball” helped, they say, but things have gone way beyond that. “The culture has changed,” says Andrew Gelman, a statistician and political scientist at Columbia University. “There is this idea that numbers and statistics are interesting and fun. It’s cool now.”
Just to be clear, people I meet are still bored by discussions of my work. Now I have confirmation that it’s just me.
Peter Thiel gives his take on science funding and peer review:
My libertarian views are qualified because I do think things worked better in the 1950s and 60s, but it’s an interesting question as to what went wrong with DARPA. It’s not like it has been defunded, so why has DARPA been doing so much less for the economy than it did forty or fifty years ago? Parts of it have become politicized. You can’t just write checks to the thirty smartest scientists in the United States. Instead there are bureaucratic processes, and I think the politicization of science, where a lot of scientists have to write grant applications, be subject to peer review, and have to get all these people to buy in, all this has been toxic, because the skills that make a great scientist and the skills that make a great politician are radically different. There are very few people who are both great scientists and great politicians. So a conservative account of what happened with science in the 20th century is that we had a decentralized, non-governmental approach all the way through the 1930s and early 1940s. At that point, the government could accelerate and push things tremendously, but only at the price of politicizing it over a series of decades. Today we have a hundred times more scientists than we did in 1920, but their productivity per capita is less that it used to be.
Thiel has a history of making controversial comments, and I don’t always agree with him, but I think that his point about the politicization of the grant process is interesting.
An example of how sending a paper to a statistics journal can get you scooped
In a previous post I complained about statistics journals taking way too long to reject papers. Today I am complaining because even when everything goes right (better-than-average review time for statistics, plus useful and insightful comments from reviewers), we can come out losing.
In May 2011 we submitted a paper on removing GC bias from RNAseq data to Biostatistics. It was published on December 27. However, we were scooped by this BMC Bioinformatics paper published ten days earlier despite being submitted three months later and accepted 11 days after ours. The competing paper has already earned the “highly accessed” distinction. The two papers, both statistics papers, are very similar, yet I am afraid more people will read the one that was finished second but published first.
Note that Biostatistics is one of the fastest stat journals out there. I don’t blame the journal at all here. We statisticians have to change our culture when it comes to reviews.
Don Berry, former head of the division of quantitative sciences and chair of the department of biostatistics at the M.D. Anderson Cancer Center, has a great column in Amstat News discussing collaborations between statisticians and clinicians. There are a few very nice bits, but do read the full column.
The first is:
We send clear messages to our clinical collaborators that we are as interested in curing cancer as they are. We work as a team. Even though we have tools, we are not mechanics.
This mindset seems common in cancer centers. The focus on a single disease has a way of bringing people together.
Regarding getting involved in the science:
Our statisticians become specialists in the diseases within which they collaborate….My pet peeve is the statistician who designs a clinical trial by asking for the null rate, clinically important difference, and accrual rate and uses standard software to produce a sample size. Where are the questions about the disease, its standard treatment, its prevalence, and its biology?
An R script for estimating future inflation via the Treasury market
One factor that is critical for any financial planning is estimating what future inflation will be. For example, if you’re saving money in an instrument that gains 3% per year, and inflation is estimated to be 4% per year, well then you’re losing money in real terms.
There are a variety of ways to estimate the rate of future inflation. You could, for example, use past rates as an estimate of future rates. However, the Treasury market provides an estimate of what the market thinks annual inflation will be over the next 5, 10, 20, and 30 years.
Basically, the Treasury issues two types of securities: nominal securities that pay a nominal interest rate (a fixed percentage of your principal), and inflation-indexed securities (TIPS) that pay an interest rate applied to your principal as adjusted by the consumer price index (CPI). As the CPI goes up and down, the payments on inflation-indexed securities go up and down (although at maturity you always get at least your original principal back). As these securities trade throughout the day, their respective market-based interest rates go up and down continuously. The difference between the nominal interest rate and the real interest rate for a fixed period of time (5, 10, 20, or 30 years) can be used as a rough estimate of average annual inflation over that period.
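The calculation described above is a one-liner, sketched here in Python rather than as the post’s R script (which isn’t reproduced in this text). The function name and the yields are hypothetical, not market data. The `exact` option swaps the simple difference for the Fisher relation, $(1 + \text{nominal}) = (1 + \text{real})(1 + \text{inflation})$, which gives nearly the same answer when rates are small.

```python
def breakeven_inflation(nominal_yield, real_yield, exact=False):
    """Rough market-implied average annual inflation over one maturity.

    nominal_yield -- yield on a nominal Treasury (decimal, e.g. 0.03)
    real_yield    -- yield on a TIPS of the same maturity (decimal)
    exact         -- if True, use the Fisher relation
                     (1 + nominal) = (1 + real) * (1 + inflation)
                     instead of the simple difference
    """
    if exact:
        return (1 + nominal_yield) / (1 + real_yield) - 1
    return nominal_yield - real_yield

# Hypothetical yields by maturity (years), for illustration only: not market data.
yields = {5: (0.030, 0.010), 10: (0.035, 0.012)}
for years, (nominal, real) in sorted(yields.items()):
    print(years, round(breakeven_inflation(nominal, real), 4))
```

The same arithmetic answers the savings question from earlier in the post: if the market-implied inflation rate exceeds the rate your savings earn, you are losing money in real terms.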
Why don't we hear more about Adrian Dantley on ESPN? This graph makes me think he was as good an offensive player as Michael Jordan.
In my last post I complained about efficiency not being discussed enough by NBA announcers and commentators. I pointed out that some of the best scorers have relatively low FG% or TS%. However, commenters pointed out that top scorers need to take more difficult shots and thus are expected to have lower efficiency. The plot below (made with this R script) seems to confirm this. The dashed line is a regression fit, and the colors represent guards (green), forwards (orange), and centers (purple).
Among this group TS% does trend down with points per game and centers tend to have higher TS%. Forwards and guards are not very different. However, the plot confirms that some of the supposed all time greats are more ball hogs than good scorers.
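The post doesn’t spell out TS%; the standard definition is true shooting percentage = PTS / (2 × (FGA + 0.44 × FTA)), which credits free throws and three-pointers that plain FG% ignores. Here is a minimal Python sketch of that formula plus the kind of least-squares line drawn as the dashed trend. It is not the linked R script, and the player numbers are hypothetical, not real career data.

```python
def true_shooting(points, fga, fta):
    """True shooting percentage: PTS / (2 * (FGA + 0.44 * FTA))."""
    return points / (2 * (fga + 0.44 * fta))

def least_squares_line(x, y):
    """Intercept and slope of the ordinary least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical per-game career averages (points, FGA, FTA); not real player data.
players = [(30.0, 22.0, 8.0), (25.0, 18.0, 7.0), (20.0, 14.0, 5.0), (15.0, 10.0, 4.0)]
ppg = [pts for pts, _, _ in players]
ts = [true_shooting(*line) for line in players]
intercept, slope = least_squares_line(ppg, ts)
print(f"TS% = {intercept:.3f} + {slope:.4f} * PPG")
```

A player well above the fitted line is an efficient scorer for his volume; one well below it is closer to the ball hog end of the plot.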
A couple of further observations. First, Adrian Dantley was way better than I thought. Why isn’t he more famous? Second, Kobe is no Jordan. Also note that Jordan played several seasons past his prime, which lowered his career averages, so I added points for five of these players using only data from their prime years (ages 24-29). Here Jordan really stands out. But so does Dantley!
P.S. Note that these plots say nothing about defense, rebounding, or passing. This in-depth analysis makes a convincing argument that Dennis Rodman is one of the most valuable players of all time.
Cleveland's (?) 2001 plan for redefining statistics as "data science"
This plan has been making the rounds on Twitter and is being attributed to William Cleveland in 2001 (thanks to Kasper for the link). I’m not sure of the provenance of the document but it has some really interesting ideas and is worth reading in its entirety. I actually think that many Biostatistics departments follow the proposed distribution of effort pretty closely.
One of the most interesting sections is the discussion of computing (emphasis mine):
Data analysis projects today rely on databases, computer and network hardware, and computer and network software. A collection of models and methods for data analysis will be used only if the collection is implemented in a computing environment that makes the models and methods sufficiently efficient to use. In choosing competing models and methods, analysts will trade effectiveness for efficiency of use.
This suggests that statisticians should look to computing for knowledge today, just as data science looked to mathematics in the past.
I also found the theory section worth a read and figure it will definitely lead to some discussion:
Mathematics is an important knowledge base for theory. It is far too important to take for granted by requiring the same body of mathematics for all. Students should study mathematics on an as-needed basis.
Not all theory is mathematical. In fact, the most fundamental theories of data science are distinctly nonmathematical. For example, the fundamentals of the Bayesian theory of inductive inference involve nonmathematical ideas about combining information from the data and information external to the data. Basic ideas are conveniently expressed by simple mathematical expressions, but mathematics is surely not at issue.
There was recently a fascinating article published in PNAS that compared the sound quality of different types of violins. In this study, researchers assembled a collection of six violins, three of which were made by Stradivari and Guarneri del Gesu and three of which were made by modern (i.e., 20th-century) luthiers. The combined value of the “old” violins was $10 million, about 100 times greater than the combined value of the “new” violins. Also, they note:
Numbers of subjects and instruments were small because it is difficult to persuade the owners of fragile, enormously valuable old violins to release them for extended periods into the hands of blindfolded strangers.
Yeah, I’d say so.
They then had 21 professional violinists try them all out while wearing glasses that obscured their vision, so they couldn’t see the violins. The researchers were also blinded to the type of violin as the study was being conducted.
The conclusions were striking:
We found that (i) the most-preferred violin was new; (ii) the least-preferred was by Stradivari; (iii) there was scant correlation between an instrument’s age and monetary value and its perceived quality; and (iv) most players seemed unable to tell whether their most-preferred instrument was new or old.
First, I’m glad the researchers got people to actually play the instruments. I don’t think it’s sufficient to just listen to some recordings because usually the recordings are by different performers and the quality of the recording itself may vary quite a bit. Second, the study was conducted in a hotel room for its “dry acoustics”, but I think changing the venue might have changed the results. Third, even though the authors don’t declare any specific financial conflict of interest, it’s worth noting that the second author is a violinmaker who could theoretically benefit if people decide they no longer need to focus on old Italian violins.
I was surprised, but not that surprised, at the results. As a lifelong violinist, I had always wondered whether the Strads and the Guarneris were that much better. I once played on a Guarneri (for about 30 seconds) and I think it’s fair to say that it was incredible. But I’ve also seen some amazing violins made by guys in Brooklyn and New Jersey. I’d always heard that Strads have a darker, more mellow sound, which I suppose is nice, but I think these days people may prefer a brighter and bigger sound, especially for larger modern-day concert halls.
I hope that this study and others like it will get people to focus on which violins sound good rather than where they came from. I’m glad to see the use of data pose a challenge to another long-standing convention.