How LLMs Will Democratize Exploratory Data Analysis | by Ken Kehoe | Jun, 2024

Or, When you feel your life’s too hard, just go have a talk with Claude

When I think about the challenges involved in understanding complex systems, I often think back to something that happened during my time at Tripadvisor. I was helping our Machine Learning team conduct an analysis for the Growth Marketing team to understand what customer behaviors were predictive of high LTV. We worked with a talented Ph.D. Data Scientist who trained a logistic regression model and printed out the coefficients as a first pass.

When we looked at the analysis with the Growth team, they were confused. Logistic regression coefficients are tough to interpret because they are expressed in log-odds, so their effect on the predicted probability isn’t linear, and the features that ended up being most predictive weren’t things that the Growth team could easily influence. We all stroked our chins for a minute and opened a ticket for some follow-up analysis, but as so often happens, both teams quickly moved on to their next bright idea. The Data Scientist had some high-priority work to do on our search ranking algorithm, and for all practical purposes, the Growth team tossed the analysis into the trash heap.
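To make the interpretation problem concrete, here is a minimal sketch of why a raw logistic regression coefficient is hard to read. The feature names and coefficient values are invented for illustration and are not from the Tripadvisor analysis. The sketch converts each coefficient into an odds ratio, then shows that the same coefficient moves the predicted probability by different amounts depending on the baseline.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coef: float) -> float:
    """A one-unit increase in a feature multiplies the odds by exp(coef)."""
    return math.exp(coef)


def probability_shift(baseline_prob: float, coef: float) -> float:
    """Change in predicted probability from a one-unit feature increase,
    starting from the given baseline probability."""
    baseline_log_odds = math.log(baseline_prob / (1.0 - baseline_prob))
    return sigmoid(baseline_log_odds + coef) - baseline_prob


# Hypothetical coefficients, invented for illustration only.
coefficients = {"saved_a_trip": 0.8, "wrote_a_review": 0.3}

for name, coef in coefficients.items():
    print(f"{name}: odds ratio = {odds_ratio(coef):.2f}")
    for p in (0.05, 0.50, 0.90):
        shift = probability_shift(p, coef)
        print(f"  baseline {p:.0%} -> probability shift {shift:+.3f}")
```

The odds ratio is constant for a given coefficient, but the probability shift is largest near a 50% baseline and shrinks toward the extremes, which is one reason a printed list of coefficients rarely answers a stakeholder's question directly.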

I still think about that exercise. Did we give up too soon? What if the feedback loop had been tighter? What if both parties had kept digging? What would the second or the third pass have revealed?

The anecdote above describes an exploratory analysis that didn’t quite land. Exploratory analysis is distinct from descriptive analysis, which simply aims to describe what’s happening. Exploratory analysis seeks to gain a greater understanding of a system rather than to answer a well-defined question. Consider the following types of questions one might encounter in a business context:

Notice how the exploratory questions are open-ended and aim to improve one’s understanding of a complex problem space. Exploratory analysis often requires more cycles and tighter partnership between the “domain expert” and the person actually conducting the analysis, who are seldom the same person. In the anecdote above, the partnership wasn’t tight enough, the feedback loops weren’t short enough, and we didn’t devote enough cycles.

These challenges are why many experts advocate for a “paired analysis” approach for data exploration. Similar to pair programming, paired analysis brings an analyst and decision maker together to conduct an exploration in real time. Unfortunately, this type of tight partnership between analyst and decision maker rarely occurs in practice due to resource and time constraints.

Now think about the organization you work in — what if every decision maker had an experienced analyst to pair with them? What if they had that analyst’s undivided attention and could pepper them with follow-up questions at will? What if those analysts were able to easily switch contexts, following their partner’s stream of consciousness in a free association of ideas and hypotheses?

This is the opportunity that LLMs present in the analytics space — the promise that anyone can conduct exploratory analysis with the benefit of a technical analyst by their side.

Let’s take a look at how this might manifest in practice. The following case study and demos illustrate how a decision maker with domain expertise might effectively pair with an AI analyst that can query and visualize the data. We’ll compare the data exploration experience of ChatGPT’s GPT-4o model against a manual analysis using Tableau, which will also serve as an error check against potential hallucinations.

A note on data privacy: The video demos linked in the following section use purely synthetic data sets intended to mimic realistic business patterns. For general notes on privacy and security for AI analysts, see Data privacy.
