How we turned Natural Language Query (NLQ) into SQL

We Built a Graph-Theoretic, Self-Healing Analytics Layer for Data-Backed Decision-Making

Mar 2, 2026
 
For us at Gobblecube, data has always been abundant, but actionable insights weren’t always immediate. Like many growing startups, we accumulated rich datasets across marketing, sales, products, and geography. Capturing insights from that data, however, meant writing complex SQL by hand, so we needed a system that could generate those queries in real time, answer faster, and let us iterate faster.
Early experiments with prompt-based NLQ-to-SQL systems showed promise but failed under real organizational pressure. Queries worked at low complexity but broke as complexity grew, often collapsing into a single 300-line query that tried to carry the entire analytical load. Hallucinated columns, incorrect joins, and silent logic errors eroded trust in the system. Most critically, leadership could not rely on it for data-backed decision-making.
This forced a shift in thinking: we stopped treating NLQ-to-SQL as a language problem and reframed it as a structural intelligence problem.
The result was a graph-theoretic, vector-aware, self-healing architecture that now underpins how analysts, managers, and leadership interact with data - bringing insights directly to their fingertips.
 

The Democratization of Data at Gobblecube

Before this system, access to insights followed a familiar pattern:
→ questions flowed from leadership to analysts, SQL queries were written manually, dashboards were refreshed, and insights arrived - often too late to influence decisions.
Our goal was not to replace analysts, but to amplify them.
By introducing a robust NLQ-to-SQL layer grounded in schema graphs and embeddings, Gobblecube enabled:
  • Faster exploratory analysis
  • Self-serve performance reviews
  • Easier anomaly detection across time-series metrics
  • Consistent interpretations of business logic
Data analysts transitioned from “query writers” to insight interpreters, focusing on trends, anomalies, and narratives rather than syntax.

The Three-Step NLQ-to-SQL Architecture at Gobblecube

To address these failures, Gobblecube implemented a three-step pipeline:
  1. Knowledge Retrieval (Selection)
  2. Logic Formation (Query Graph Construction)
  3. Self-Healing (Execution-Based Correction)
This separation introduced discipline and traceability into every generated query.
 

Step One: Knowledge Retrieval as Constrained Variable Selection

At Gobblecube, the first step ensures that only real, approved schema elements enter the reasoning process.
Let the schema be the approved set of tables and columns:
( S = \{ s_1, s_2, \ldots, s_n \} )
Given a natural language query ( Q ), the system selects only a subset of these approved elements:
( S_Q \subseteq S )
No downstream step can invent new columns.
This single constraint eliminated nearly all hallucination issues and gave analysts confidence that every query was grounded in reality.
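As a rough sketch of this constraint, a retrieval step can rank only approved schema elements by similarity to the query intent; the column names and embedding vectors below are hypothetical, not Gobblecube's production values:

```python
import math

# Hypothetical embeddings for approved schema elements; in practice these
# would come from a trained embedding model.
SCHEMA_EMBEDDINGS = {
    "city":     [1.0, 0.1, 0.0],
    "ad_spend": [0.2, 1.0, 0.1],
    "sales":    [0.1, 0.9, 0.3],
    "product":  [0.0, 0.2, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_schema_elements(query_vec, k=2):
    """Rank approved schema elements by similarity to the query intent.
    Only keys of SCHEMA_EMBEDDINGS can be returned, so no downstream
    step can reference a column that does not exist."""
    ranked = sorted(SCHEMA_EMBEDDINGS,
                    key=lambda name: cosine(query_vec, SCHEMA_EMBEDDINGS[name]),
                    reverse=True)
    return ranked[:k]

selected = select_schema_elements([0.2, 1.0, 0.1])
```

Because the candidate set is a closed dictionary, hallucination is impossible by construction: the model can only choose, never invent.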
 

Step Two: Logic Formation via Query Graph Construction

SQL generation is treated as graph assembly, not text generation.
Each query becomes a query graph:
( G_Q = (V_Q, E_Q) )
Where:
  • Nodes represent clauses and schema elements
  • Edges represent logical dependencies
For analysts at Gobblecube, this meant:
  • Queries became explainable
  • Logic errors were isolated to the structure, not the data
  • Performance review queries were consistent across teams
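A toy illustration of graph assembly, where the SQL string is rendered clause by clause from an explicit structure (the node and edge names here are simplified assumptions, not the production representation):

```python
# Toy query graph: nodes are clauses/schema elements, edges are logical
# dependencies between them. A real graph would also carry joins, filters,
# and grain constraints.
def build_query_graph(table, select_cols, group_by):
    nodes = {"FROM": table, "SELECT": select_cols, "GROUP_BY": group_by}
    edges = [("SELECT", "FROM"), ("GROUP_BY", "FROM"), ("SELECT", "GROUP_BY")]
    return nodes, edges

def render_sql(nodes):
    """Render SQL from the graph so every token traces back to a node."""
    sql = f"SELECT {', '.join(nodes['SELECT'])} FROM {nodes['FROM']}"
    if nodes["GROUP_BY"]:
        sql += f" GROUP BY {', '.join(nodes['GROUP_BY'])}"
    return sql

nodes, edges = build_query_graph("metrics", ["city", "SUM(ad_spend)"], ["city"])
sql = render_sql(nodes)
```

Because the text is derived from the structure rather than the other way around, a logic error can be located at a specific node or edge instead of a line of SQL.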
 

Step Three: Self-Healing SQL via Execution Feedback

Once executed, SQL errors are treated as signals, not failures.
If execution produces an error, the system applies targeted repairs for common failure modes:
  • Missing GROUP BY columns
  • Incorrect aggregation levels
  • Invalid join paths
This self-healing loop drastically reduced analyst rework and made exploratory analysis safer and faster.
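A minimal sketch of the repair loop, using SQLite as a stand-in engine (the table, the hallucinated column, and the repair map are illustrative; production repairs are derived by parsing the engine's actual error message):

```python
import sqlite3

def self_heal(conn, sql, repairs):
    """Run SQL; on failure, treat the error as a signal, apply a targeted
    identifier rewrite, and retry once."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        for bad, good in repairs.items():
            sql = sql.replace(bad, good)
        return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (city TEXT, ad_spend REAL, sales REAL)")
conn.execute("INSERT INTO metrics VALUES ('Pune', 100.0, 20.0)")

# "spend" is a hallucinated column; the loop repairs it to "ad_spend".
rows = self_heal(conn,
                 "SELECT city, SUM(spend) FROM metrics GROUP BY city",
                 {"spend": "ad_spend"})
```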

The Schema as a Graph at Gobblecube ( G_S )

Gobblecube models its schema as a weighted directed graph:
( G_S = (V, E, w) ), where the weight function ( w ) assigns a join cost to each edge.

Nodes

  • Product
  • City
  • Date
  • Marketing metrics
  • Sales metrics

Edges

  • Product → Sales
  • City → Impressions
  • City → Ad Spend
  • City → Sales

Weights

  • Frequently used joins (low weight)
  • Rare or discouraged joins (high weight)
This embeds business logic directly into topology.
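As a sketch of how weighted topology encodes business logic, a shortest-path search over the schema graph picks the cheapest valid join route; the edge weights below are illustrative assumptions:

```python
import heapq

# Illustrative weights: frequent joins are cheap, discouraged ones costly.
SCHEMA_GRAPH = {
    "product":     {"sales": 1},
    "city":        {"impressions": 1, "ad_spend": 1, "sales": 1},
    "impressions": {"ad_spend": 1},
    "ad_spend":    {"sales": 3},   # rare/discouraged direct join
    "sales":       {},
}

def cheapest_join_path(graph, start, goal):
    """Dijkstra over the schema graph: returns (cost, path) of the
    lowest-weight join path, or None if no valid path exists."""
    heap = [(0, start, [start])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph[node].items():
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None
```

Note that `cheapest_join_path(SCHEMA_GRAPH, "product", "ad_spend")` returns `None` here: the join simply does not exist in the topology, so it cannot be generated.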

The Query as a Graph ( G_Q )

Every NLQ-generated query is validated by superimposing the query graph ( G_Q ) onto the schema graph ( G_S ).
If a path does not exist in the schema graph, the query is rejected or repaired—preventing analysts from unknowingly violating data grain constraints.
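The superimposition check can be sketched as a reachability test: every edge in the query graph must correspond to some path in the schema graph (graph contents below are illustrative):

```python
# Illustrative schema graph; a query edge is valid only if some path
# connects its endpoints in the schema graph.
SCHEMA_GRAPH = {
    "product":     ["sales"],
    "city":        ["impressions", "ad_spend", "sales"],
    "impressions": ["ad_spend"],
    "ad_spend":    ["sales"],
    "sales":       [],
}

def invalid_edges(schema, query_edges):
    """Return every query edge whose endpoints are not connected in the
    schema graph (simple DFS reachability check)."""
    def reachable(src, dst):
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in schema.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False
    return [(a, b) for a, b in query_edges if not reachable(a, b)]

violations = invalid_edges(SCHEMA_GRAPH,
                           [("city", "sales"), ("product", "ad_spend")])
```

A non-empty `violations` list is what triggers rejection or repair before the query ever reaches the warehouse.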

From Graphs to Geometry: Vector Space Projection

Each schema element is embedded into a shared vector space ( \mathbb{R}^d ).
Graph embeddings ensure that:
  • City-level metrics cluster together
  • Product-level metrics cluster together
  • Invalid joins fall into low-density regions
This geometric structure underpins anomaly detection and trend analysis.

Path Vectorization: Composite Embeddings

A schema path ( P = (s_1, \ldots, s_k) ) is represented as a composite embedding of its node vectors:
( \mathbf{v}_P = \sum_{i=1}^{k} \mathbf{v}_{s_i} )
Example: the path City → Ad Spend combines ( \mathbf{v}_{\text{City}} ) and ( \mathbf{v}_{\text{AdSpend}} ).
Normalization ensures fair comparison across paths of different lengths:
( \hat{\mathbf{v}}_P = \mathbf{v}_P / \lVert \mathbf{v}_P \rVert )
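A sketch of composite path embeddings with made-up 3-dimensional node vectors (real embeddings would be high-dimensional and learned):

```python
import math

# Hypothetical node embeddings, 3-dimensional for readability.
EMB = {
    "city":        [1.0, 0.0, 0.0],
    "impressions": [0.0, 1.0, 0.0],
    "ad_spend":    [0.0, 0.0, 1.0],
}

def path_vector(path):
    """Sum the node embeddings along a path, then L2-normalize so paths
    of different lengths are comparable."""
    dim = len(next(iter(EMB.values())))
    total = [sum(EMB[n][i] for n in path) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total]

v = path_vector(["city", "ad_spend"])
```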

The Net Graphical Space at Gobblecube

Valid business logic forms dense highways:
  • City → Impressions → Ad Spend → Sales
Invalid logic falls into voids:
  • Product → Ad Spend (without city context)
This spatial structure allows analysts to explore data safely and intuitively.

The Nearest Path Problem

Given query intent embedding ( \mathbf{v}_Q ), the system selects the valid path whose embedding is most similar:
( P^{*} = \arg\max_{P} \; \cos(\mathbf{v}_Q, \hat{\mathbf{v}}_P) )
This ensures:
  • Semantic alignment
  • Structural validity
  • Minimal complexity
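The selection step can be sketched as an argmax over candidate paths; the path names and composite embeddings below are hypothetical:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Hypothetical composite embeddings for three candidate paths.
CANDIDATES = {
    "city->ad_spend->sales":       [0.6, 0.7, 0.1],
    "city->impressions->ad_spend": [0.7, 0.1, 0.7],
    "product->sales":              [0.1, 0.9, 0.2],
}

def nearest_path(query_vec):
    """Select the candidate path most aligned with the query intent."""
    return max(CANDIDATES, key=lambda p: cosine(query_vec, CANDIDATES[p]))

best = nearest_path([0.7, 0.1, 0.6])
```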

Case Study: “Calculate the Wasted Marketing Spend” at Gobblecube

Schema Context

Gobblecube’s schema includes:
  • Product
  • City
  • Impressions Gained
  • Ad Spend
  • Sales
  • Time
“Wasted marketing spend” is not an explicit metric.

Intent Projection

The query embedding is drawn toward:
  • Ad Spend
  • Low Sales
  • High Impressions
  • City-level aggregation

Candidate Paths

  1. City → Ad Spend → Sales (low conversion)
  2. City → Impressions → Ad Spend (no sales)
  3. Product → Sales (irrelevant to spend)
Vector similarity reveals that high spend + impressions + low sales at city level is the nearest valid path.

Outcome

The system generated:
  • City-wise wasted spend
  • Time-series trends
  • Outlier cities with abnormal spend-to-sales ratios
This directly enabled:
  • Performance review reports
  • City-level marketing optimization
  • Rapid anomaly detection

Self-Healing as Vector Gap Completion

If an invalid shortcut such as Product → Ad Spend appears in a query graph, the system detects the gap and inserts the missing intermediate node, routing the query through valid city-level context instead.
This guarantees structurally correct analytics every time.
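Gap completion can be sketched as a shortest-path search between the endpoints of the invalid shortcut; the schema below, where product-level data reaches spend only through city-level context, is an assumption for this illustration:

```python
from collections import deque

# Illustrative schema: product reaches ad spend only via city context.
SCHEMA = {
    "product":     ["city"],
    "city":        ["impressions", "ad_spend"],
    "impressions": ["ad_spend"],
    "ad_spend":    ["sales"],
    "sales":       [],
}

def complete_gap(graph, src, dst):
    """BFS for the shortest valid path from src to dst: the invalid
    shortcut src -> dst is replaced by a path through the required
    intermediate nodes."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = complete_gap(SCHEMA, "product", "ad_spend")
```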

Impact at Gobblecube

This architecture:
  • Put data at stakeholders’ fingertips
  • Reduced analyst query time dramatically
  • Enabled consistent performance reviews
  • Made anomaly detection a daily practice through trend examination
Data analysts now focus on why numbers change, not why queries break.

Conclusion

At Gobblecube, moving from prompt-based NLQ to a graph-theoretic, vector-aware system transformed analytics from reactive to proactive. By aligning natural language with schema reality, the organization unlocked reliable, explainable, and scalable data-backed decision-making.
NLQ-to-SQL is no longer about asking questions - it’s about navigating reality.