Every professor is a startup

There has been a lot of discussion lately about whether to be in academia or industry. Some of it I think is a bit unfair to academia. Then I saw this post on Quora asking what Hilary Mason’s contributions were to machine learning, like she hadn’t done anything. It struck me as a bit of academia hating on industry*. I don’t see why one has to be better/worse than the other, as Roger points out, there is no perfect job and it just depends on what you want to do. 

One thing that I think gets lost in all of this are the similarities between being an academic researcher and running a small startup. To be a successful professor at a research institution, you have to create a product (papers/software), network (sit on editorial boards/review panels), raise funds (by writing grants), advertise (by giving talks/presentations), identify and recruit talent (students and postdocs), manage people and personalities (students,postdocs, collaborators) and scale (you start as just yourself, and eventually grow to a group with lots of people). 

The goals are somewhat different. In a startup company, your goal is ultimately to become a profitable business. In academia, the goal is to create an enterprise that produces scientific knowledge. But in either enterprise it takes a huge amount of entrepreneurial spirit, passion, and hustle. It just depends on how you are spending your hustle. 

*Sidenote: One reason I think she is so famous is that she helps people, even people that can’t necessarily do anything for her. One time I wrote her out of the blue to see if we could get some Bitly data to analyze for a class. She cheerfully helped us get it, even though the immediate payout for her was not obvious. But I tell you what, when people ask me about her, I’ll tell them she is awesome. 

Schlep blindness in statistics

This is yet another outstanding post by Paul Graham, this time on “Schlep Blindness”. He talks about how there are great startup ideas that no one considers because they are too much of a “schlep” (a tedious unpleasant task). He talks about how most founders of startups want to put up a clever bit of code they wrote and just watch the money flow in. But of course it doesn’t work like that, you need to advertise, interact with customers, raise money, go out and promote your work, fix bugs at 3am, etc. 

In academia there is a similar tendency to avoid projects that involve a big schlep. For example, it is relatively straightforward to develop a mathematical model, work out the parameter estimates, and write a paper. But it is a big schlep to then write fast code that implements that method, debug the code, dummy proof the code, fix bugs submitted by users, etc. Rafa’s post, Hadley’s interview, and the discussion Rafa linked to all allude to this issue. Particularly the fact that the schlep, the long slow slog of going through a new data type or writing a piece of usable software is somewhat undervalued. 

I think part of the problem is our academic culture and heritage, which has traditionally put a very high premium on being clever and a relatively low premium on being willing to go through the schlep. As applied statistics touches more areas and the number of users of statistical software and ideas grows, the schlep becomes just as important as the clever idea. If you aren’t willing to put in the time to code your methods up and make them accessible to other investigators, then who will be? 

To bring this back to the discussion inspired by Rafa’s post, I wonder if applied statistics journals could increase their impact, encourage more readership from scientific folks, and support a broader range of applied statisticians if there was a re-weighting of the importance of cleverness and schlep? As Paul points out: 

 In addition to their intrinsic value, they’re like undervalued stocks in the sense that there’s less demand for them among founders. If you pick an ambitious idea, you’ll have less competition, because everyone else will have been frightened off by the challenges involved.

Interview with Amy Heineike - Director of Mathematics at Quid

Amy Heineike

Amy Heineike is the Director of Mathematics at Quid, a startup that seeks to understand technology development and dissemination through data analysis. She was the first employee at Quid, where she helped develop their technology early on. She has been recognized as one of the top Big Data Scientists. As a part of our ongoing interview series talked to Amy about data science, Quid, and how statisticians can get involved in the tech scene. 

Which term applies to you: data scientist, statistician, computer scientist, or something else?
Data Scientist fits better than any, because it captures the mix of analytics, engineering and product management that is my current day to day.  
When I started with Quid I was focused on R&D - developing the first prototypes of what are now our core analytics technologies, and working to define and QA new data streams.  This required the analysis of lots of unstructured data, like news articles and patent filings, as well as the end visualisation and communication of the results.  
After we raised VC funding last year I switched to building our data science and engineering teams out.  These days I jump from conversations with the team about ideas for new analysis, to defining refinements to our data model, to questions about scalable architecture and filling out pivotal tracker tickets.  The core challenge is translating the vision for the product back to the team so they can build it.
 How did you end up at Quid?
In my previous work I’d been building models to improve our understanding of complex human systems - in particular the complex interaction of cities and their transportation networks in order to evaluate the economic impacts of, Crossrail, a new train line across London, and the implications of social networks on public policy.  Through this work it became clear that data was the biggest constraint - I became fascinated by a quest to find usable data for these questions - and thats what led me to Silicon Valley.  I knew the founders of Quid from University, and approached them with the idea of analysing their data according to ideas I’d had - especially around network analysis - and the initial work we collaborated on became core to the founding techology of Quid.
Who were really good mentors to you? What were the qualities that helped you? 
I’ve been fortunate to work with some brilliant people in my career so far.  While I still worked in London I worked closely with two behavioural economists - Paul Ormerod, who’s written some fantastic books on the subject (mostly recently Why Things Fail), and Bridget Rosewell, until recently the Chief Economist to the Greater London Authority (the city government for London).  At Quid I’ve had a very productive collaboration with Sean Gourley, our CTO.
One unifying characteristic of these three is their ability to communicate complex ideas in a powerful way to a broad audience.  Its an incredibly important skill, a core part of analytics work is taking the results to where they are needed which is often beyond those who know the technical details, to those who care about the implications first.
How does Quid determine relationships between organizations and develop insight based on data? 
The core questions our clients ask us are around how technology is changing and how this impacts their business.  Thats a really fascinating and huge question that requires not just discovering a document with the answer in it, but organizing lots and lots of pieces of data to paint a picture of the emergent change.  What we can offer is not only being able to find a snapshot of that, but also being able to track how it changes over time.
We organize the data firstly through the insight that much disruptive technology emerges in organizations, and that the events that occur between and to organizations are a fantastic way to signal both the traction of technologies and to observe strategic decision making by key actors.
The first kind of relationship thats important is of the transactional type, who is acquiring, funding or partnering with who, and the second is an estimate of the technological clustering of organizations, what trends do particular organizations represent.  Both of these can be discovered through documents about them, including in government filings, press releases and news, but requires analysis of unstructured natural language.  
We’ve experimented with some very engaging visualisations of the results, and have had particular success with network visualisations, which are a very powerful way of allowing people to interact with a large amount of data in a quite playful way.  You can see some of our analyses in the press links at http://quid.com/in-the-news.php
What skills do you think are most important for statisticians/data scientists moving into the tech industry?
Technical statistical chops are the foundation. You need to be able to take a dataset and discover and communicate what’s interesting about it for your users.  To turn this into a product requires understanding how to turn one-off analysis into something reliable enough to run day after day, even as the data evolves and grows, and as different users experience different aspects of it.  A key part of that is being willing to engage with questions about where the data comes from (how it can be collected, stored, processed and QAed on an ongoing basis), how the analytics will be run (how will it be tested, distributed and scaled) and how people interact with it (through visualisations, UI features or static presentations?).  
For your ideas to become great products, you need to become part of a great team though!  One of the reasons that such a broad set of skills are associated with Data Science is that there are a lot of pieces that have to come together for it to all work out - and it really takes a team to pull it off.  Generally speaking, the earlier stage the company that you join, the broader the range of skills you need, and the more scrappy you need to be about getting involved in whatever needs to be done.  Later stage teams, and big tech companies may have roles that are purer statistics.
Do you have any advice for grad students in statistics/biostatistics on how to get involved in the start-up community or how to find a job at a start-up? 
There is a real opportunity for people who have good statistical and computational skills to get into the startup and tech scenes now.  Many people in Data Science roles have statistics and biostatistics backgrounds, so you shouldn’t find it hard to find kindred spirits.

We’ve always been especially impressed with people who have built software in a group and shared or distributed that software in some way.  Getting involved in an open source project, working with version control in a team, or sharing your code on github are all good ways to start on this.
Its really important to be able to show that you want to build products though.  Imagine the clients or users of the company and see if you get excited about building something that they will use.  Reach out to people in the tech scene, explore who’s posting jobs - and then be able to explain to them what it is you’ve done and why its relevant, and be able to think about their business and how you’d want to help contribute towards it.  Many companies offer internships, which could be a good way to contribute for a short period and find out if its a good fit for you.

Why statisticians should join and launch startups

The tough economic times we live in, and the potential for big paydays, have made entrepreneurship cool. From the venture capitalist-in-chief, to the javascript coding mayor of New York, everyone is on board. No surprise there, successful startups lead to job creation which can have a major positive impact on the economy. 

The game has been dominated for a long time by the folks over in CS. But the value of many recent startups is either based on, or can be magnified by, good data analysis. Here are a few startups that are based on data/data analysis: 

  1. The Climate Corporation -analyzes climate data to sell farmers weather insurance.
  2. Flightcaster - uses public data to predict flight delays
  3. Quid - uses data on startups to predict success, among other things.
  4. 100plus - personalized health prediction startup, predicting health based on public data
  5. Hipmunk - The main advantage of this site for travel is better data visualization and an algorithm to show you which flights have the worst “agony”.

To launch a startup you need just a couple of things: (1) a good, valuable source of data (there are lots of these on the web) and (2) a good idea about how to analyze them to create something useful. The second step is obviously harder than the first, but the companies above prove you can do it. Then, once it is built, you can outsource/partner with developers - web and otherwise - to implement your idea. If you can build it in R, someone can make it an app. 

These are just a few of the startups whose value is entirely derived from data analysis. But companies from LinkedIn, to Bitly, to Amazon, to Walmart are trying to mine the data they are generating to increase value. Data is now being generated at unprecedented scale by computers, cell phones, even thremostats! With this onslaught of data, the need for people with analysis skills is becoming incredibly acute

Statisticians, like computer scientists before them, are poised to launch, and make major contributions to, the next generation of startups. 

Data analysis companies getting gobbled up

Companies that specialize in data analysis, or essentially, statistics, are getting gobbled up by larger companies. IBM bought SPSS, then later Algorithmics. MSCI bought RiskMetrics. HP bought Autonomy. Who’s next? SAS?

Tags: Startups Data