
Teaching AI to Query – Part 2: From Knowledge Graph to Dynamic SQL with Claude, Neo4j, and a Dash of Prompt Engineering

Once the idea was clear – use a knowledge graph to teach an LLM about your schema – the next step was execution.

So let’s get technical.


Making the LLM Endpoint a Variable

Instead of hardcoding my LLM choice, I made the endpoint dynamic. Why? Because experimentation is inevitable — and when you’re iterating across Claude, GPT-4, Mistral, or your own fine-tuned models, you don’t want to rip apart your backend every time.

I created an LLM_ENDPOINT variable, something like:

```python
import os

# Resolve the model endpoint from the environment so it stays configurable
LLM_ENDPOINT = os.getenv("ENDPOINT_SQL_GENERATOR")
```

Now switching providers is as simple as updating a .env value or flipping a dropdown in the Streamlit UI.
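Downstream code then just calls whatever the variable points at. Here's a minimal sketch of that dispatch, assuming a plain JSON-over-HTTP interface; the MODEL_SQL_GENERATOR variable, payload, and response shape are illustrative, since every provider's API differs:

```python
import os
import requests

LLM_ENDPOINT = os.getenv("ENDPOINT_SQL_GENERATOR")
# Hypothetical companion variable so the model name is also config, not code
LLM_MODEL = os.getenv("MODEL_SQL_GENERATOR", "claude-3-5-sonnet-latest")

def generate_sql(prompt: str) -> str:
    """Send a composed prompt to whichever LLM the environment points at."""
    response = requests.post(
        LLM_ENDPOINT,
        json={"model": LLM_MODEL, "prompt": prompt},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["text"]  # response shape varies by provider
```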

For this phase, I used Claude 3.5 Sonnet: it’s fast, sharp, and especially good at structured reasoning.


Walking the Knowledge Graph with Neo4j

Before generating SQL, the LLM needs to understand the question in the context of the schema.

Here’s how I made that happen:

  1. User prompt (e.g., “Show me average revenue by region over the last 6 months”) comes in.

  2. I query Neo4j to get (sketched in code after this list):

    • Relevant table nodes

    • Relationships (foreign keys, join paths)

    • Field properties (names, types, descriptions, tags, display_name)

  3. I package that into a graph context block — a structured format the LLM can parse cleanly.
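Step 2, the graph walk itself, might look like the sketch below; the records it returns get packaged into the context block shown next. The node labels (Table, Field) and relationship types (HAS_FIELD, FOREIGN_KEY) are assumptions about the graph model, not its actual shape:

```python
from neo4j import GraphDatabase

# Assumed graph model: (:Table)-[:HAS_FIELD]->(:Field),
# with (:Field)-[:FOREIGN_KEY]->(:Field) marking join paths
GRAPH_CONTEXT_QUERY = """
MATCH (t:Table)-[:HAS_FIELD]->(f:Field)
WHERE t.name IN $tables
OPTIONAL MATCH (f)-[:FOREIGN_KEY]->(fk:Field)
RETURN t.name AS table_name,
       collect({name: f.name, type: f.type,
                display_name: f.display_name,
                joins_to: fk.name}) AS fields
"""

def fetch_graph_context(driver, tables):
    """Pull the tables, fields, and join paths relevant to the question."""
    with driver.session() as session:
        result = session.run(GRAPH_CONTEXT_QUERY, tables=tables)
        return [record.data() for record in result]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
context = fetch_graph_context(driver, ["sales_data", "regions"])
```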

Example snippet of the graph context passed in:

```json
{
  "tables": ["sales_data", "regions"],
  "relationships": [
    { "from": "sales_data.region_id", "to": "regions.region_id" }
  ],
  "fields": {
    "sales_data.revenue": { "type": "decimal", "display_name": "Revenue" },
    "sales_data.date": { "type": "date", "display_name": "Sale Date" },
    "regions.name": { "type": "string", "display_name": "Region Name" }
  }
}
```

This isn’t just metadata; it’s contextual fuel.


The Prompt Engineering Layer

Next came the trickiest part: getting consistently clean SQL from the LLM.

I learned this fast: raw prompting = unpredictable results. You need structure.

Here’s the flow I use now (stitched together in code right after the list):

  1. Instruction Layer – teach the LLM how to behave

    ```text
    You are a SQL analyst. Use only the tables and fields listed below. Optimize for clarity and correctness.
    ```

  2. Context Layer – the graph walk output

  3. Prompt Layer – inject the user’s question

    ```text
    User prompt: “Show me average revenue by region over the last 6 months”
    ```

  4. Constraints Layer – reinforce best practices

    ```text
    - Use table aliases
    - Format joins explicitly
    - Include `WHERE` clauses for temporal filtering
    - Output a single SQL query only
    ```
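Stitched together, the composition might look like this minimal sketch (the build_prompt helper and exact layer strings are illustrative):

```python
def build_prompt(graph_context: str, user_question: str) -> str:
    """Compose instruction, context, prompt, and constraints layers in order."""
    instruction = (
        "You are a SQL analyst. Use only the tables and fields listed below. "
        "Optimize for clarity and correctness."
    )
    constraints = "\n".join([
        "- Use table aliases",
        "- Format joins explicitly",
        "- Include WHERE clauses for temporal filtering",
        "- Output a single SQL query only",
    ])
    return (
        f"{instruction}\n\n"
        f"Schema context:\n{graph_context}\n\n"
        f"User prompt: {user_question}\n\n"
        f"Constraints:\n{constraints}"
    )
```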

With the prompt composed this way, Claude produced clean, explainable SQL more than 90% of the time. Here’s an example of what it returned:

```sql
SELECT
    r.name AS region_name,
    AVG(s.revenue) AS avg_revenue
FROM sales_data s
JOIN regions r ON s.region_id = r.region_id
WHERE s.date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY r.name
ORDER BY avg_revenue DESC;
```

And if it failed or was missing something? I had fallback logic to auto-retry with minor tweaks, including more schema hints or adjusted date logic.
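That retry loop is simple in spirit. Here's a sketch reusing the generate_sql and build_prompt helpers from above; validate_sql and expand_context are hypothetical stand-ins for the real checks and hint expansion:

```python
def generate_with_retries(question: str, graph_context: str, max_attempts: int = 3) -> str:
    """Regenerate with richer schema hints until the SQL passes validation."""
    context = graph_context
    for _ in range(max_attempts):
        sql = generate_sql(build_prompt(context, question))
        if validate_sql(sql):  # hypothetical: parseable, known tables/fields only
            return sql
        context = expand_context(context)  # hypothetical: add more schema hints
    raise RuntimeError("LLM did not produce valid SQL after retries")
```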


What’s Next

In Part 3, I’ll show how I closed the loop:

  • Sending that LLM-generated SQL to Databricks

  • Streaming the result back to the app

  • Asking the LLM to explain the data in plain English, drawing from the same field definitions in the graph

This isn’t just about querying faster.

It’s about building an analyst that thinks with your data — and can talk to you about what it finds.

Part 1: Teaching AI to Query: Building a Smarter Analyst with Knowledge Graphs, LLMs, and SQL
