Once the idea was clear (use a knowledge graph to teach an LLM about your schema), the next step was execution.
So let’s get technical.
Making the LLM Endpoint a Variable
Instead of hardcoding my LLM choice, I made the endpoint dynamic. Why? Because experimentation is inevitable — and when you’re iterating across Claude, GPT-4, Mistral, or your own fine-tuned models, you don’t want to rip apart your backend every time.
I created an LLM_ENDPOINT variable — something like the sketch below (the endpoint URL and model name are illustrative defaults, not a prescription):
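```python
import os

# Read the provider endpoint and model from the environment (illustrative values).
# .env might contain, e.g.:
#   LLM_ENDPOINT=https://api.anthropic.com/v1/messages
#   LLM_MODEL=claude-3-5-sonnet-20240620
LLM_ENDPOINT = os.getenv("LLM_ENDPOINT", "https://api.anthropic.com/v1/messages")
LLM_MODEL = os.getenv("LLM_MODEL", "claude-3-5-sonnet-20240620")
```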
Now switching providers is as simple as updating a .env value or flipping a dropdown in the Streamlit UI.
For this phase, I used Claude 3.5 Sonnet: it's fast, sharp, and especially good at structured reasoning.
Walking the Knowledge Graph with Neo4j
Before generating SQL, the LLM needs to understand the question in the context of the schema.
Here’s how I made that happen:
1. User prompt (e.g., "Show me average revenue by region over the last 6 months") comes in.
2. I query Neo4j (a minimal sketch of this walk follows the list) to get:
   - Relevant table nodes
   - Relationships (foreign keys, join paths)
   - Field properties (names, types, descriptions, tags, `display_name`)
3. I package that into a graph context block — a structured format the LLM can parse cleanly.
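To make step 2 concrete, here's a minimal sketch using the official Neo4j Python driver. The labels (`Table`, `Field`), relationship types (`HAS_FIELD`, `JOINS_TO`), and property names are assumptions about one reasonable way to model the schema graph:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Assumed graph model: (:Table)-[:HAS_FIELD]->(:Field) and
# (:Table)-[:JOINS_TO {join_condition}]->(:Table). Adjust to your own model.
GRAPH_WALK_QUERY = """
MATCH (t:Table)-[:HAS_FIELD]->(f:Field)
WHERE t.name IN $tables
OPTIONAL MATCH (t)-[j:JOINS_TO]->(other:Table)
RETURN t.name AS table_name,
       collect(DISTINCT {name: f.name, type: f.type,
                         description: f.description,
                         display_name: f.display_name}) AS fields,
       collect(DISTINCT {to: other.name, on: j.join_condition}) AS joins
"""

def walk_schema_graph(tables: list[str]) -> list[dict]:
    """Fetch the tables, fields, and join paths relevant to the question."""
    with driver.session() as session:
        return [record.data() for record in session.run(GRAPH_WALK_QUERY, tables=tables)]
```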
Example snippet of the graph context passed in (table and field names here are illustrative):
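```json
{
  "tables": [
    {
      "name": "orders",
      "description": "One row per customer order",
      "fields": [
        {"name": "order_date", "type": "DATE", "display_name": "Order Date"},
        {"name": "revenue", "type": "DECIMAL(12,2)", "display_name": "Revenue", "tags": ["metric"]},
        {"name": "region_id", "type": "BIGINT", "display_name": "Region ID"}
      ]
    },
    {
      "name": "regions",
      "fields": [
        {"name": "region_id", "type": "BIGINT"},
        {"name": "region_name", "type": "STRING", "display_name": "Region"}
      ]
    }
  ],
  "joins": [
    {"from": "orders.region_id", "to": "regions.region_id", "type": "many_to_one"}
  ]
}
```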
This isn't just metadata; it's contextual fuel.
The Prompt Engineering Layer
Next came the trickiest part: getting consistently clean SQL from the LLM.
I learned this fast: raw prompting = unpredictable results. You need structure.
Here’s the flow I use now:
1. Instruction Layer – teach the LLM how to behave
2. Context Layer – the graph walk output
3. Prompt Layer – inject the user's question
4. Constraints Layer – reinforce best practices
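Composed together, the four layers might look like this sketch (the exact wording of each layer is illustrative; the structure is what matters):

```python
def build_prompt(graph_context: str, question: str) -> str:
    """Compose the four layers into a single prompt."""
    instruction = (
        "You are a senior analytics engineer. Translate the user's question "
        "into a single SQL query using ONLY the tables and fields provided."
    )
    constraints = (
        "Constraints:\n"
        "- Use explicit JOINs along the join paths given.\n"
        "- Never invent tables or columns.\n"
        "- Return only the SQL, no commentary."
    )
    return (
        f"{instruction}\n\n"                      # 1. Instruction layer
        f"Schema context:\n{graph_context}\n\n"   # 2. Context layer
        f"Question: {question}\n\n"               # 3. Prompt layer
        f"{constraints}"                          # 4. Constraints layer
    )
```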
By composing the prompt this way, Claude produced clean, explainable SQL 90%+ of the time. Here's an example of the kind of SQL it returned for the revenue question above (table and column names are illustrative):
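```sql
-- Illustrative output for: "Show me average revenue by region over the last 6 months"
SELECT
  r.region_name,
  AVG(o.revenue) AS avg_revenue
FROM orders AS o
JOIN regions AS r
  ON o.region_id = r.region_id
WHERE o.order_date >= add_months(current_date(), -6)
GROUP BY r.region_name
ORDER BY avg_revenue DESC;
```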
And if it failed or was missing something? I had fallback logic to auto-retry with minor tweaks, including more schema hints or adjusted date logic.
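That fallback can be a simple bounded retry loop that folds each failure back into the prompt. A rough sketch, reusing `build_prompt` from above, where `call_llm` and `validate_sql` are placeholders for your client call and a dry-run check:

```python
def generate_sql(question: str, context: str, max_attempts: int = 3) -> str:
    """Retry with extra schema hints folded in when a draft fails validation."""
    hints = ""
    for _ in range(max_attempts):
        prompt = build_prompt(context + hints, question)
        sql = call_llm(prompt)        # placeholder for your LLM client call
        error = validate_sql(sql)     # placeholder: e.g., run EXPLAIN as a dry run
        if error is None:
            return sql
        # Fold the failure back in as a hint for the next attempt.
        hints += f"\nA previous attempt failed with: {error}. Adjust and retry."
    raise RuntimeError("Could not produce valid SQL after retries")
```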
What’s Next
In Part 3, I’ll show how I closed the loop:
- Sending that LLM-generated SQL to Databricks
- Streaming the result back to the app
- Asking the LLM to explain the data in plain English, drawing from the same field definitions in the graph
This isn’t just about querying faster.
It’s about building an analyst that thinks with your data — and can talk to you about what it finds.
Part 1: Teaching AI to Query: Building a Smarter Analyst with Knowledge Graphs, LLMs, and SQL