How a single LLM session transformed a messy, user-generated dataset into a tidy, query-ready table in just 20 minutes.
The Challenge
When I began working with the 1,000+ row table, two critical issues quickly became apparent:
- Data Normalisation: The “universe” column contained inconsistent variations (e.g. “Marvel 616”, “Marvel, Earth‑616”, etc.), hindering effective data analysis. Standardising these names was essential.
- Content Moderation: A small number of threads contained inappropriate content, including slurs, sexually suggestive descriptions, etc. Addressing this was crucial for maintaining a safe and usable dataset.
Why an Agentic CLI?
While building a full agentic workflow with tools like Pydantic-AI or the OpenAI Agents SDK is an option, I’ve realized that LLMs are surprisingly good with bash, even for commands they haven’t seen before. The real power lies in letting an Agentic CLI, such as Qwen Code, OpenAI Codex, or the Google Gemini CLI, execute shell commands through its built-in shell tool. That bypasses the complexity of crafting custom tools and significantly reduces development time.
I utilized Qwen Code for this project, although any comparable CLI would suffice.
I’m not going to dive into the particulars of that project/codebase, but I added an `AGENTS.md` file where, among other things, I defined a handy helper command:
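Roughly, it’s a thin Makefile target along these lines (a sketch; `DATABASE_URL` and `QUERY` are stand-in names rather than the exact ones from the project):

```makefile
# Sketch of the helper target: pass an ad-hoc query to psql and print the result set.
# DATABASE_URL and QUERY are assumed variable names; recipe lines must be tab-indented.
sql:
	psql "$(DATABASE_URL)" -c "$(QUERY)"
```

Invocation then looks something like `make sql QUERY="SELECT COUNT(*) FROM characters;"`.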
This wrapper executes a PostgreSQL query (using `psql`, but that’s an implementation detail), returning the result set. The LLM CLI can directly invoke this tool, enabling it to inspect the schema, analyze the data, propose updates, and then execute those updates after my approval.
For safety, I also included `make backup-postgres` and `make restore-checkpoint` commands (to be run by me), allowing for easy rollback in case of unexpected issues.
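These, too, are thin wrappers; something along the lines of the following, with `pg_dump`/`pg_restore` and a single checkpoint file as assumptions:

```makefile
# Sketch of the safety targets: snapshot the database, and roll back to that snapshot.
# DATABASE_URL and the dump filename are placeholders.
backup-postgres:
	pg_dump "$(DATABASE_URL)" --format=custom --file=checkpoint.dump

restore-checkpoint:
	pg_restore --clean --if-exists --dbname="$(DATABASE_URL)" checkpoint.dump
```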
Data Cleanup Workflow
My initial request to the LLM was to examine the `characters` table using `make sql` and assist in cleaning up the data. Specifically, I wanted to ensure consistency across multiple, similar universe entries.
This is what the LLM did:
Inspected the table
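A query along these lines does the job (the LLM may equally have used psql’s `\d characters`; the exact statement it ran isn’t important):

```sql
-- List the columns of the characters table with their types and lengths
SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name = 'characters'
ORDER BY ordinal_position;
```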
The result provided a 10-column schema, revealing that the `universe` column was defined as `VARCHAR(255)`.
Counted distinct universes.
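A simple grouped count is enough here, roughly:

```sql
-- Count how many rows carry each distinct universe spelling
SELECT universe, COUNT(*) AS row_count
FROM characters
GROUP BY universe
ORDER BY row_count DESC;
```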
The LLM then suggested grouping similar patterns, such as `Marvel 616`, `Marvel, Earth-616`, and `Marvel: Earth-616`, into a standardized form: `Marvel Comics (Earth-616)`. Similarly, it identified that `DC (Post-Crisis)` and `DC Comics Post-Crisis` should be consolidated as `DC Comics (Post-Crisis)`.
There were many other groupings, but you get the idea.
For each group, the LLM generated a single `UPDATE` command. For example:
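For the Earth-616 group it looked roughly like this:

```sql
-- Collapse the Earth-616 spelling variants into the standardized form
UPDATE characters
SET universe = 'Marvel Comics (Earth-616)'
WHERE universe IN ('Marvel 616', 'Marvel, Earth-616', 'Marvel: Earth-616');
```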
The output confirmed that 30 rows were updated.
The LLM executed each statement, and after each, it verified the updated counts to monitor progress.
Batching Updates
Manually approving each `UPDATE` felt repetitive, given the numerous similar groupings. To speed up the process, I prompted the LLM to generate a CSV file.
The LLM produced `universe_standardization.csv` containing entries like:
| Current | Suggested |
|---|---|
| DC (Post-Crisis) | DC Comics (Post-Crisis) |
| DCAU | DC Animated Universe (DCAU) |
| DC Rebirth | DC Comics (New 52/Rebirth) |
| Injustice (DC Comics/Netherrealm) | DC Comics (Injustice) |
After reviewing and making some minor manual adjustments to the CSV, I instructed the LLM to generate a single batch of `UPDATE` statements using a loop:
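A bash sketch of that loop, assuming the CSV has a header row and no embedded commas or quotes in its values:

```bash
#!/usr/bin/env bash
# Turn each Current,Suggested pair into an UPDATE statement
tail -n +2 universe_standardization.csv | while IFS=',' read -r current suggested; do
  printf "UPDATE characters SET universe = '%s' WHERE universe = '%s';\n" \
    "$suggested" "$current"
done > standardize_universes.sql
```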
Executing the batch produced a clean dataset.
Content Moderation
The dataset also contained rows with slurs, inappropriate content, and sexually or violently descriptive passages.
To manage this, I added a database migration to include an `approved` column.
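In its simplest form the migration is just a new boolean flag, something like this (the exact type and default are incidental):

```sql
-- Add a moderation flag; rows start out unapproved until reviewed
ALTER TABLE characters ADD COLUMN approved BOOLEAN NOT NULL DEFAULT FALSE;
```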
I then asked the LLM to initially mark all rows as approved and subsequently identify and unapprove any inappropriate entries.
The LLM iteratively scanned the data in batches of 50 rows, flagging instances of gore and other unsuitable content. It then searched the database for “bad” keywords and unapproved the rows it deemed problematic.
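The keyword pass amounted to statements roughly in this shape (the text column and the search term here are placeholders, not the actual ones):

```sql
-- Unapprove any row whose text matches a flagged term
UPDATE characters
SET approved = FALSE
WHERE description ILIKE '%<flagged term>%';
```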
Other (Ab)uses of Agentic CLIs
These Agentic CLIs are great: you can use them for all sorts of things besides writing code.
They offer a nice, ready-made frontend with a polished UX, allowing you to hook up tools without building your own UI.
At work, I built a tool that would trigger Claude Code in an empty workspace (so it wouldn’t start writing code), providing it with work-specific knowledge and a preconfigured MCP server containing functions/tools for interacting with the production environment. I then used it as an AI Site Reliability Engineer: when prompted, it would examine an alert, look up the intent and current state, attempt to investigate and determine the root cause, and sometimes try to fix/resolve the alert. I had whitelisted certain tools to be triggered automatically, while more dangerous tools required explicit permission.
You get a tool whitelist and a human-in-the-loop mechanism for free with Claude Code.
Tl;dr
You don’t always need complex custom agentic workflows! LLMs are surprisingly capable. For small tasks, a simple LLM + shell tool + human-and-AI-usable CLI combo is enough to get the job done: powerful and efficient without the extra overhead.
To get the most deterministic output, you want to minimize the LLM’s decision-making to a few critical points and keep everything else code-driven. But for non-critical use cases with a human in the loop, just giving the model a set of tools and letting it figure out a path works surprisingly well.
Last modified on 2025-09-27