Hiring is tough. One problem particular to hiring for skilled positions is that many academic institutions focus more on theory than on application, which can leave a highly educated job candidate almost useless in an actual non-academic business setting. Over the years, people have come up with some very clever techniques for determining whether a job candidate is not only smart, but also able to actually, you know, do stuff. In the world of software engineering, the FizzBuzz programming test has proven so effective at screening programming job applicants that it’s frankly embarrassing. Briefly, the test asks a job applicant to sketch out a solution to an incredibly easy programming task, one for which there are hundreds of possible correct answers. As it turns out, any programming job opening attracts dozens, if not hundreds, of completely unqualified applicants. The test has become a great first-line check of whether an applicant is even eligible for the position, after which you can examine whether their qualifications match.
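For anyone who hasn’t run into it, the usual statement of FizzBuzz is: print the numbers from 1 to 100, but print "Fizz" for multiples of three, "Buzz" for multiples of five, and "FizzBuzz" for multiples of both. Here is a minimal sketch in Python. It is only one of the hundreds of acceptable answers, and the function name `fizzbuzz` is just my choice for illustration:

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz output for a single number."""
    if n % 15 == 0:          # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# Print the classic sequence for 1 through 100.
for i in range(1, 101):
    print(fizzbuzz(i))
```

The point of the test is not the elegance of the answer, only whether the applicant can get a loop and a couple of conditionals onto the page at all.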
Fast-forward to today: we’ve recently begun interviewing for a new quantitative analyst position in my group. As it turns out, most of the resumes look very similar: a master’s or PhD in statistics, financial mathematics, or something similar, and little work experience. Thinking back to the FizzBuzz concept, I wanted to create a quick test to ensure that the candidate in front of me is familiar with the basics of applied statistics. My thinking led me to this scatterplot, from a dataset I was working with at the time:
![]()
At the interview I asked each candidate the following:
Look at this plot. Don’t worry about the axes or scale; for all intents and purposes, it’s just X and Y.
1. What are some of the interesting features of this plot?
2. How would you predict Y given X?
(I frequently format my speech into a numbered list.)
The good thing about these questions is that there is no single correct answer; as with the FizzBuzz question, there are hundreds of ways to approach the problem. However, you do need some basic applied knowledge of working with datasets. Sure, your thesis work on applying Bayesian methodologies to sparse, highly dependent datasets may have been very ingenious, but if you can’t work with data, well, come back when you can.
I’ve had mixed responses to these questions. Everyone notes that the data is clustered. Some comment that there’s a slightly bimodal distribution (x=0 and x=1). One or two have discussed the apparent independence of the axes. Regarding the prediction task, most provide the more “obvious” answers, such as simple data transforms and linear/nonlinear regression techniques. Some refer back to techniques they’ve used in past projects, research or otherwise. A few people didn’t know what to do and simply shrugged.
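To make the “obvious” answer concrete, here is a rough sketch of a simple least-squares line fit, using only the Python standard library. Everything in it is an assumption for illustration: the data is synthetic (two clusters of X near 0 and 1, with Y drawn independently of X), not the dataset behind the plot, and the helper names `fit_line` and `r_squared` are made up for this example.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic stand-in data (an assumption, NOT the real dataset):
# X is bimodal around 0 and 1, and Y is generated independently of X.
rng = random.Random(0)
xs = [rng.choice([0.0, 1.0]) + rng.gauss(0, 0.1) for _ in range(500)]
ys = [rng.gauss(0, 1) for _ in xs]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"slope={slope:.3f}  intercept={intercept:.3f}  R^2={r2:.3f}")
```

If the axes really are independent, as a couple of candidates suggested, a fit like this should explain very little of the variance, and simply predicting the mean of Y would do about as well. Noticing that is exactly the kind of applied judgment the question is fishing for.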
Overall, the test gives a good idea of how the candidate, as they are right now, would think about such a simple and (in our workplace) very commonplace task. Sure, it doesn’t test for familiarity with machine learning techniques or an understanding of heteroskedasticity, but it’s not intended to. Just as FizzBuzz doesn’t test knowledge of namespaces or function templates, this is intended to be a similar benchmark, testing only the most basic applied statistical skills. The fact that a couple of applicants have been tripped up makes me think it may be a useful test for others interviewing for applied statistics/data mining positions. If anyone has similar techniques for assessing basic applied statistics skills, feel free to share them below.