I don’t know what will come next, but I know what should come next: carefully calibrated and standardized work-sample tests.
Think coding and algorithmic interview questions, but less arbitrary and ad hoc:
- designed to faithfully represent the actual work environment (no whiteboards!)
- standardized across interviewers—all candidates are evaluated on exactly the same test
- clear, objective evaluation rubric
We do this by designing a comprehensive programming task ahead of time that faithfully represents the actual work somebody will be doing. The name gives it away: we evaluate candidates by looking at a representative sample of their actual work instead of trying to proxy this with undergraduate-style exam questions on a whiteboard.
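To make "same test, same rubric" concrete, here is a minimal sketch of what a fixed, weighted rubric and scoring step might look like. This is purely illustrative: the criteria, weights, and names are invented for the example and are not taken from any particular company's process.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Criterion:
    name: str         # what the rubric measures, e.g. "correctness"
    weight: float     # relative importance of this criterion
    description: str  # what a top score looks like, so graders agree

# Hypothetical rubric: every candidate does the same pre-designed task,
# and every interviewer grades it against the same criteria.
RUBRIC = [
    Criterion("correctness", 0.4, "Submission passes the provided test suite"),
    Criterion("code_quality", 0.3, "Clear structure, naming, and error handling"),
    Criterion("communication", 0.3, "Notes/README explain the trade-offs made"),
]

def score_submission(scores: dict[str, int]) -> float:
    """Combine per-criterion scores (1-4) into one weighted result."""
    total = 0.0
    for criterion in RUBRIC:
        raw = scores[criterion.name]
        if not 1 <= raw <= 4:
            raise ValueError(f"{criterion.name}: score must be between 1 and 4")
        total += criterion.weight * raw
    return total

# Example: one candidate's submission, graded on the shared 1-4 scale.
print(score_submission({"correctness": 4, "code_quality": 3, "communication": 2}))
```

The details don't matter; the point is that every candidate's work is scored against the same pre-agreed criteria, rather than whatever each interviewer happens to value that day.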
The advantages are massive: you do a better job of evaluating candidates for the actual role, you clamp down on random noise by keeping the test and test conditions constant, and you limit bias by focusing primarily on objective criteria. This is much, much better than the ugly, inconsistent processes we have now, where most companies have engineers design, administer, and evaluate interview questions themselves¹.
In the research I’ve seen on selection methods, work-sample tests tend to be the single most predictive instrument on their own, and they are also part of the optimal combination of instruments to use. “The Validity and Utility of Selection Methods in Personnel Psychology: Practical and Theoretical Implications of 85 Years of Research Findings” by Schmidt and Hunter is a comprehensive survey of research on the subject through 1998. The main conclusions of this survey are:
- work-sample tests have the best predictive performance
- general mental ability tests (i.e., IQ tests) also perform well
- structured interviews perform on par with GMA tests and much better than unstructured interviews
- the core effective methods (work-sample tests and structured interviews) perform better alongside a GMA test
While structured interviews do perform well, most tech-company interview processes are closer to unstructured interviews. Interviews at tech companies tend neither to be standardized across candidates nor to have clear, objective evaluation rubrics. Some companies, like Google, have moved to a more structured approach, but I believe even that falls short of the ideal: the questions are often designed and selected by the engineer performing the interview, with no rigor around their validity.
Now, to be clear, this survey paper covers selection across a variety of fields, and it’s plausible that it doesn’t apply 100% to the specific field of software. But it’s a much better starting point than the “common sense” and anecdotes that seem to dictate modern hiring policies in tech!
Here’s a table of the methods analyzed in the survey, along with a validity measure (r):
Work-sample tests have the best individual performance, followed by structured interviews and GMA tests. At the very bottom are graphology and age, which have effectively no correlation with employee success. Years of job experience, reference checks and years of education also perform surprisingly poorly—so avoid factoring those into your processes.
In the United States, GMA tests are a poor option for legal reasons: they open you up to discrimination lawsuits under the theory of “disparate impact” (see Griggs v. Duke Power Co. for the seminal case on this matter). Companies can protect themselves by performing an (expensive) study validating that IQ tests predict performance for their specific positions, but this is not worth the expense for most companies. IQ tests are also fraught culturally: I suspect many engineering candidates would be turned off by having to take an IQ test as part of hiring.
It’s plausible that brain teasers fall under the same disparate impact standard as IQ tests. This hasn’t been tested in court—it would probably be a difficult case to make—but it’s a real possibility. Yet another reason to avoid brain teasers!
In the US, the choice is simple: base your entire selection process around a work-sample test. While you can supplement this with additional factors, it should be the most important selection criterion.
You can see a rough progression in tech interviewing from brain teasers and other nonsense towards work-sample tests, but it hasn’t come nearly far enough. Considering that the research results on work-sample tests have been around for a while and are intuitive (the best procedure is the one that most closely resembles actual work—who would’ve thought?), the fact that more companies aren’t moving in this direction is an indictment of an industry that spends a remarkable amount of time and money on recruiting and selecting candidates.
To be fair, switching to work-sample tests does not come without challenges. You have to carefully design the tests—serious up-front work you don’t need with an ad-hoc interview process. You also have to administer the tests in a constrained time frame and in a way that works for different candidates. Of course, once you have it set up, the test will take less work for individuals to administer (engineers no longer need to make up interview questions) and, crucially, it will be both more accurate and more consistent.
To me, the case for switching to work-sample tests—or at least properly structured interviews—seems clear. But the tech interviewing world runs on anecdotes above everything, so let me leave you with one by Thomas Ptacek on switching to a more structured process at his security firm, Matasano:
Compare the first 2/3rds of Matasano's lifetime to the last 1/3rd. The typical candidate we've hired lately would never have gotten hired at early Matasano, because (a) they wouldn't have had the resume for it, and (b) we over-weighted intangibles like how convincing candidates were in face-to-face interviews. But the candidates we've hired lately compare extremely well to our earlier teams! It's actually kind of magical: we interview people whose only prior work experience is "Line of Business .NET Developer", and they end up showing us how to write exploits for elliptic curve partial nonce bias attacks that involve Fourier transforms and BKZ lattice reduction steps that take 6 hours to run.
How? By running an outreach program that attracts people who are interested in crypto, and building an interview process that doesn't care what your resume says or how slick you are in an interview.
Call it the "Moneyball" strategy.
(From a long, detailed comment he wrote on Hacker News.)
So why not do this at your own company?
¹ If you’re wondering just how bad normal interview processes are, Aline Lerner did a compelling analysis of data from her startup (an interview platform):
The conclusion, after a lot more data: technical interview performance really is kind of arbitrary. About 80% of the people who had done multiple interviews on the platform performed inconsistently between interviews; on a scale from 1–4, a surprising number of people had scores ranging from 2–4 or even 1–4.
Not a good sign for the traditional tech interview!