I remember when I first took a job as a Data Manager. We had a rigorous two-day interview that was by no means meant for a pure programmer or a pure statistician. The requirements ranged from developing databases and data collection systems for mobile and desktop to analyzing data using SAS, SPSS, STATA and/or Epi Info. I had applied for the job because I felt I could do the databases and programming part, and attempt basic SPSS data analysis, since I am a Computer Science graduate.
I got shortlisted, and when the day of the interview came I made my way to the organization dressed in khaki trousers. We had been told that day one would be purely practical and, if you made it through, day two would be the oral part. We were given desktops loaded with Visual Studio 2005, VB 6, Java and the NetBeans IDE, STATA 10, SPSS 17, SAS 9.1 and, of course, Epi Info. There were three parts to the practical interview. Part 1 was to create a data dictionary for a research study on HIV in adolescents; we were given a two-page questionnaire to build the data dictionary from. Part 1 B was to create a database based on the data dictionary using either Access, SQL Server or FoxPro. Part II was to create a data collection tool or application that would save data to the database you had created in Part 1 B. You were asked to add validation checks so that string data would not be saved where numeric data was expected, and to create skip patterns as per the questionnaire: say, if the respondent is male, you do not ask them when they started menstruating.
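For readers who have never built one of these tools, here is a rough idea of what a validation check and a skip pattern look like in code. This is only a minimal sketch in Python, not what we wrote on the day (we had VB 6, .NET and Java to choose from), and the field names `age`, `sex` and `age_at_menarche` are hypothetical, made up for illustration.

```python
def validate_response(response):
    """Return a list of problems found in one questionnaire response.

    `response` is a dict of raw form input. The field names used here
    are hypothetical and only serve to illustrate the idea.
    """
    errors = []

    # Validation check: a numeric field must not accept string data.
    age = str(response.get("age", "")).strip()
    if not age.isdigit():
        errors.append("age must be a whole number")

    sex = str(response.get("sex", "")).strip().lower()
    if sex not in ("male", "female"):
        errors.append("sex must be 'male' or 'female'")

    # Skip pattern: the menstruation question only applies to females.
    menarche = response.get("age_at_menarche")
    if sex == "male":
        if menarche not in (None, ""):
            errors.append("age_at_menarche must be skipped for male respondents")
    elif sex == "female":
        if menarche is None or not str(menarche).strip().isdigit():
            errors.append("age_at_menarche must be a whole number")

    return errors


if __name__ == "__main__":
    print(validate_response({"age": "15", "sex": "male"}))
    print(validate_response({"age": "abc", "sex": "female", "age_at_menarche": "12"}))
```

In a real data entry screen the same rules would usually disable or hide the skipped question rather than report an error after the fact, but the logic behind it is the same.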
In the last part of the interview, we were given a data set of fictitious data from the said study, as if collected with the questionnaire we had just worked with. It was in Excel and we were asked to pull it into a statistical package of our choice. We were then asked to perform data management tasks such as identifying duplicate entries, missing values and invalid values in some variables, and removing the duplicates. Lastly, we were asked to run some descriptive statistics and save the output in a report, or prepare a PowerPoint presentation to deliver the following day.
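If those data management tasks sound abstract, the sketch below shows the idea in plain Python. It is not what we did in the interview (that was done inside a statistical package), and the records, the `id` and `age` fields and the 10 to 19 age range are all assumptions invented for the example.

```python
from collections import Counter

# Made-up records standing in for the fictitious study data.
records = [
    {"id": 1, "sex": "female", "age": 14},
    {"id": 2, "sex": "male", "age": None},   # missing age
    {"id": 2, "sex": "male", "age": None},   # duplicate entry
    {"id": 3, "sex": "female", "age": 140},  # invalid age
]


def find_duplicate_ids(rows):
    """Return the ids that appear more than once."""
    counts = Counter(row["id"] for row in rows)
    return sorted(i for i, n in counts.items() if n > 1)


def find_missing(rows, field):
    """Return the ids of rows where `field` is empty."""
    return [row["id"] for row in rows if row.get(field) in (None, "")]


def find_out_of_range(rows, field, low, high):
    """Return the ids of rows whose non-missing `field` is outside [low, high]."""
    return [
        row["id"]
        for row in rows
        if row.get(field) not in (None, "") and not (low <= row[field] <= high)
    ]


def drop_duplicates(rows):
    """Keep only the first row seen for each id."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique


if __name__ == "__main__":
    print("duplicate ids:", find_duplicate_ids(records))
    print("missing age:", find_missing(records, "age"))
    # The 10-19 range is an assumed definition of "adolescent".
    print("invalid age:", find_out_of_range(records, "age", 10, 19))
    print("rows after de-duplication:", len(drop_duplicates(records)))
```

A statistical package does all of this with a command or two, which is exactly why the stats people had the easier time in this part.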
Basically, it was a long day for all of us. We left their offices at 6:30.
In the mix were statisticians, mathematicians and us, the computer science grads. As you can see, neither camp had the upper hand: if you were a stats guy, you would have an easy time with the stats packages, data manipulation and analysis, but a rough time programming an application in VB 6, .NET or Java. Similarly, the computer science guy might be good on the development side and at creating a well-normalized database, but when it came to the statistics, he would manage very little indeed!
I have realized that there is a thin line between the two jobs. Actually, they intermarry a lot. Gone are the days when an organization would hire a programmer and a separate data analyst; they would rather hire a crossbreed of the two who can do both tasks well.
Either way, I got the job. And for the next three years, I learned a lot as a crossbreed. It started with being humbled: I did not know what an observation or a variable was. I will be sharing more on how people in these two fields can bridge the gap between them.