Session: Can Generative AI replace Data Scientists?
Since GPT-4 exploded onto the scene over a year ago, practitioners have been anticipating the ways in which AI will change the data science workflow, or even replace data and machine learning scientists altogether. With the upcoming launch of GPT-5, these issues are even more pressing. In this talk, I will demo the results of hundreds of hours of testing GPT-4’s Data Analyst (GPT4DA), an AI released by OpenAI in August 2023 and designed for advanced data analysis. I evaluated GPT4DA’s performance along four steps of the data science pipeline:
1. Data Engineering (Can GPT4DA set up a pipeline to collect and clean data?)
2. Exploratory Data Analysis (How well can GPT4DA summarize, visualize, and help us make decisions from data?)
3. Applied Science / Modeling (How well can GPT4DA choose, build, and evaluate a reasonable modeling approach?)
4. Deployment (Can GPT4DA assist in putting the model in production?)
At each step, I evaluated GPT4DA along three dimensions:
1. Speed: could GPT4DA do the job faster than a human armed with industry-standard tools?
2. Completeness: could GPT4DA operate independently of human input, if needed? Or is it better used as a tool to assist humans?
3. Creativity: could GPT4DA contribute original ideas that a human wouldn’t come up with?
I will share my findings, along with personal thoughts on how data scientists can grow their skills to best leverage this new technology.
Bio
Tina Tang is a data science leader with over a decade of experience in machine learning, causal inference, and econometrics. She currently leads the Flagship Applied Science team at LinkedIn.