How we turned Natural Language Query (NLQ) into SQL

We Built a Graph-Theoretic, Self-Healing Analytics Layer for Data-Backed Decision-Making

Mar 2, 2026
 
For us at Gobblecube, data has always been abundant, but actionable insights weren’t always immediate. Like many growing startups, we accumulated rich datasets across marketing, sales, products, and geography. Capturing insights from that data, however, meant writing complex SQL by hand, so we needed a system that could generate those queries in real time, answer faster, and let us iterate faster.
Early experiments with prompt-based NLQ-to-SQL systems showed promise but failed under real organizational pressure. Queries worked at low complexity but broke as complexity grew, often collapsing into a single 300-line query that tried to carry the entire analytical load. Hallucinated columns, incorrect joins, and silent logic errors eroded trust in the system. Most critically, leadership could not rely on it for data-backed decision-making.
This forced a shift in thinking: we stopped treating NLQ-to-SQL as a language problem and reframed it as a structural intelligence problem.
The result was a graph-theoretic, vector-aware, self-healing architecture that now underpins how analysts, managers, and leadership interact with data - bringing insights directly to their fingertips.
 

The Democratization of Data at Gobblecube

Before this system, access to insights followed a familiar pattern:
→ questions flowed from leadership to analysts, SQL queries were written manually, dashboards were refreshed, and insights arrived - often too late to influence decisions.
Our goal was not to replace analysts, but to amplify them.
By introducing a robust NLQ-to-SQL layer grounded in schema graphs and embeddings, Gobblecube enabled:
  • Faster exploratory analysis
  • Self-serve performance reviews
  • Easier anomaly detection across time-series metrics
  • Consistent interpretations of business logic
Data analysts transitioned from “query writers” to insight interpreters, focusing on trends, anomalies, and narratives rather than syntax.

The Three-Step NLQ-to-SQL Architecture at Gobblecube

To address these failures, Gobblecube implemented a three-step pipeline:
  1. Knowledge Retrieval (Selection)
  2. Logic Formation (Query Graph Construction)
  3. Self-Healing (Execution-Based Correction)
This separation introduced discipline and traceability into every generated query.
 

Step One: Knowledge Retrieval as Constrained Variable Selection

At Gobblecube, the first step ensures that only real, approved schema elements enter the reasoning process.
Let the schema be the approved set of tables and columns:
( S = \{ s_1, s_2, \ldots, s_n \} )
Given a natural language query ( Q ), the system selects only a subset of these approved elements:
( S_Q \subseteq S )
No downstream step can invent new columns.
This single constraint eliminated nearly all hallucination issues and gave analysts confidence that every query was grounded in reality.
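As a rough sketch of this constraint, a retrieval step can rank only approved schema elements by similarity to the query intent; the column names and embedding vectors below are hypothetical, not Gobblecube's production values:

```python
import math

# Hypothetical embeddings for approved schema elements; in practice these
# would come from a trained embedding model.
SCHEMA_EMBEDDINGS = {
    "city":     [1.0, 0.1, 0.0],
    "ad_spend": [0.2, 1.0, 0.1],
    "sales":    [0.1, 0.9, 0.3],
    "product":  [0.0, 0.2, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_schema_elements(query_vec, k=2):
    """Rank approved schema elements by similarity to the query intent.
    Only keys of SCHEMA_EMBEDDINGS can be returned, so no downstream
    step can reference a column that does not exist."""
    ranked = sorted(SCHEMA_EMBEDDINGS,
                    key=lambda name: cosine(query_vec, SCHEMA_EMBEDDINGS[name]),
                    reverse=True)
    return ranked[:k]

selected = select_schema_elements([0.2, 1.0, 0.1])
```

Because the candidate set is a closed dictionary, hallucination is impossible by construction: the model can only choose, never invent.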
 

Step Two: Logic Formation via Query Graph Construction

SQL generation is treated as graph assembly, not text generation.
Each query becomes a query graph:
( G_Q = (V_Q, E_Q) )
Where:
  • Nodes represent clauses and schema elements
  • Edges represent logical dependencies
For analysts at Gobblecube, this meant:
  • Queries became explainable
  • Logic errors were isolated to the structure, not the data
  • Performance review queries were consistent across teams
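A toy illustration of graph assembly, where the SQL string is rendered clause by clause from an explicit structure (the node and edge names here are simplified assumptions, not the production representation):

```python
# Toy query graph: nodes are clauses/schema elements, edges are logical
# dependencies between them. A real graph would also carry joins, filters,
# and grain constraints.
def build_query_graph(table, select_cols, group_by):
    nodes = {"FROM": table, "SELECT": select_cols, "GROUP_BY": group_by}
    edges = [("SELECT", "FROM"), ("GROUP_BY", "FROM"), ("SELECT", "GROUP_BY")]
    return nodes, edges

def render_sql(nodes):
    """Render SQL from the graph so every token traces back to a node."""
    sql = f"SELECT {', '.join(nodes['SELECT'])} FROM {nodes['FROM']}"
    if nodes["GROUP_BY"]:
        sql += f" GROUP BY {', '.join(nodes['GROUP_BY'])}"
    return sql

nodes, edges = build_query_graph("metrics", ["city", "SUM(ad_spend)"], ["city"])
sql = render_sql(nodes)
```

Because the text is derived from the structure rather than the other way around, a logic error can be located at a specific node or edge instead of a line of SQL.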
 

Step Three: Self-Healing SQL via Execution Feedback

Once executed, SQL errors are treated as signals, not failures.
If execution produces an error, the system applies targeted repairs for common failure modes:
  • Missing GROUP BY columns
  • Incorrect aggregation levels
  • Invalid join paths
This self-healing loop drastically reduced analyst rework and made exploratory analysis safer and faster.
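A minimal sketch of the repair loop, using SQLite as a stand-in engine (the table, the hallucinated column, and the repair map are illustrative; production repairs are derived by parsing the engine's actual error message):

```python
import sqlite3

def self_heal(conn, sql, repairs):
    """Run SQL; on failure, treat the error as a signal, apply a targeted
    identifier rewrite, and retry once."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        for bad, good in repairs.items():
            sql = sql.replace(bad, good)
        return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (city TEXT, ad_spend REAL, sales REAL)")
conn.execute("INSERT INTO metrics VALUES ('Pune', 100.0, 20.0)")

# "spend" is a hallucinated column; the loop repairs it to "ad_spend".
rows = self_heal(conn,
                 "SELECT city, SUM(spend) FROM metrics GROUP BY city",
                 {"spend": "ad_spend"})
```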

The Schema as a Graph at Gobblecube ( G_S )

Gobblecube models its schema as a weighted directed graph:
( G_S = (V, E, w) ), where the weight function ( w ) assigns a join cost to each edge.

Nodes

  • Product
  • City
  • Date
  • Marketing metrics
  • Sales metrics

Edges

  • Product → Sales
  • City → Impressions
  • City → Ad Spend
  • City → Sales

Weights

  • Frequently used joins (low weight)
  • Rare or discouraged joins (high weight)
This embeds business logic directly into topology.
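As a sketch of how weighted topology encodes business logic, a shortest-path search over the schema graph picks the cheapest valid join route; the edge weights below are illustrative assumptions:

```python
import heapq

# Illustrative weights: frequent joins are cheap, discouraged ones costly.
SCHEMA_GRAPH = {
    "product":     {"sales": 1},
    "city":        {"impressions": 1, "ad_spend": 1, "sales": 1},
    "impressions": {"ad_spend": 1},
    "ad_spend":    {"sales": 3},   # rare/discouraged direct join
    "sales":       {},
}

def cheapest_join_path(graph, start, goal):
    """Dijkstra over the schema graph: returns (cost, path) of the
    lowest-weight join path, or None if no valid path exists."""
    heap = [(0, start, [start])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph[node].items():
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None
```

Note that `cheapest_join_path(SCHEMA_GRAPH, "product", "ad_spend")` returns `None` here: the join simply does not exist in the topology, so it cannot be generated.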

The Query as a Graph ( G_Q )

Every NLQ-generated query is validated by superimposing the query graph ( G_Q ) onto the schema graph ( G_S ).
If a path does not exist in the schema graph, the query is rejected or repaired—preventing analysts from unknowingly violating data grain constraints.
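The superimposition check can be sketched as a reachability test: every edge in the query graph must correspond to some path in the schema graph (graph contents below are illustrative):

```python
# Illustrative schema graph; a query edge is valid only if some path
# connects its endpoints in the schema graph.
SCHEMA_GRAPH = {
    "product":     ["sales"],
    "city":        ["impressions", "ad_spend", "sales"],
    "impressions": ["ad_spend"],
    "ad_spend":    ["sales"],
    "sales":       [],
}

def invalid_edges(schema, query_edges):
    """Return every query edge whose endpoints are not connected in the
    schema graph (simple DFS reachability check)."""
    def reachable(src, dst):
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in schema.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False
    return [(a, b) for a, b in query_edges if not reachable(a, b)]

violations = invalid_edges(SCHEMA_GRAPH,
                           [("city", "sales"), ("product", "ad_spend")])
```

A non-empty `violations` list is what triggers rejection or repair before the query ever reaches the warehouse.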

From Graphs to Geometry: Vector Space Projection

Each schema element is embedded into a shared vector space ( \mathbb{R}^d ).
Graph embeddings ensure that:
  • City-level metrics cluster together
  • Product-level metrics cluster together
  • Invalid joins fall into low-density regions
This geometric structure underpins anomaly detection and trend analysis.

Path Vectorization: Composite Embeddings

A schema path ( P = (s_1, \ldots, s_k) ) is represented as a composite embedding of its node vectors:
( \mathbf{v}_P = \sum_{i=1}^{k} \mathbf{v}_{s_i} )
Example: the path City → Ad Spend combines ( \mathbf{v}_{\text{City}} ) and ( \mathbf{v}_{\text{AdSpend}} ).
Normalization ensures fair comparison across paths of different lengths:
( \hat{\mathbf{v}}_P = \mathbf{v}_P / \lVert \mathbf{v}_P \rVert )
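A sketch of composite path embeddings with made-up 3-dimensional node vectors (real embeddings would be high-dimensional and learned):

```python
import math

# Hypothetical node embeddings, 3-dimensional for readability.
EMB = {
    "city":        [1.0, 0.0, 0.0],
    "impressions": [0.0, 1.0, 0.0],
    "ad_spend":    [0.0, 0.0, 1.0],
}

def path_vector(path):
    """Sum the node embeddings along a path, then L2-normalize so paths
    of different lengths are comparable."""
    dim = len(next(iter(EMB.values())))
    total = [sum(EMB[n][i] for n in path) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total]

v = path_vector(["city", "ad_spend"])
```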

The Net Graphical Space at Gobblecube

Valid business logic forms dense highways:
  • City → Impressions → Ad Spend → Sales
Invalid logic falls into voids:
  • Product → Ad Spend (without city context)
This spatial structure allows analysts to explore data safely and intuitively.

The Nearest Path Problem

Given query intent embedding ( \mathbf{v}_Q ), the system selects the valid path whose embedding is most similar:
( P^{*} = \arg\max_{P} \; \cos(\mathbf{v}_Q, \hat{\mathbf{v}}_P) )
This ensures:
  • Semantic alignment
  • Structural validity
  • Minimal complexity
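The selection step can be sketched as an argmax over candidate paths; the path names and composite embeddings below are hypothetical:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Hypothetical composite embeddings for three candidate paths.
CANDIDATES = {
    "city->ad_spend->sales":       [0.6, 0.7, 0.1],
    "city->impressions->ad_spend": [0.7, 0.1, 0.7],
    "product->sales":              [0.1, 0.9, 0.2],
}

def nearest_path(query_vec):
    """Select the candidate path most aligned with the query intent."""
    return max(CANDIDATES, key=lambda p: cosine(query_vec, CANDIDATES[p]))

best = nearest_path([0.7, 0.1, 0.6])
```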

Case Study: “Calculate the Wasted Marketing Spend” at Gobblecube

Schema Context

Gobblecube’s schema includes:
  • Product
  • City
  • Impressions Gained
  • Ad Spend
  • Sales
  • Time
“Wasted marketing spend” is not an explicit metric.

Intent Projection

The query embedding is drawn toward:
  • Ad Spend
  • Low Sales
  • High Impressions
  • City-level aggregation

Candidate Paths

  1. City → Ad Spend → Sales (low conversion)
  2. City → Impressions → Ad Spend (no sales)
  3. Product → Sales (irrelevant to spend)
Vector similarity reveals that high spend + impressions + low sales at city level is the nearest valid path.

Outcome

The system generated:
  • City-wise wasted spend
  • Time-series trends
  • Outlier cities with abnormal spend-to-sales ratios
This directly enabled:
  • Performance review reports
  • City-level marketing optimization
  • Rapid anomaly detection

Self-Healing as Vector Gap Completion

If an invalid shortcut such as Product → Ad Spend appears in a query graph, the system detects the gap and inserts the missing intermediate node, routing the query through valid city-level context instead.
This guarantees structurally correct analytics every time.
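Gap completion can be sketched as a shortest-path search between the endpoints of the invalid shortcut; the schema below, where product-level data reaches spend only through city-level context, is an assumption for this illustration:

```python
from collections import deque

# Illustrative schema: product reaches ad spend only via city context.
SCHEMA = {
    "product":     ["city"],
    "city":        ["impressions", "ad_spend"],
    "impressions": ["ad_spend"],
    "ad_spend":    ["sales"],
    "sales":       [],
}

def complete_gap(graph, src, dst):
    """BFS for the shortest valid path from src to dst: the invalid
    shortcut src -> dst is replaced by a path through the required
    intermediate nodes."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = complete_gap(SCHEMA, "product", "ad_spend")
```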

Impact at Gobblecube

This architecture:
  • Put data at stakeholders’ fingertips
  • Reduced analyst query time dramatically
  • Enabled consistent performance reviews
  • Made anomaly detection a daily practice through trend examination
Data analysts now focus on why numbers change, not why queries break.

Conclusion

At Gobblecube, moving from prompt-based NLQ to a graph-theoretic, vector-aware system transformed analytics from reactive to proactive. By aligning natural language with schema reality, the organization unlocked reliable, explainable, and scalable data-backed decision-making.
NLQ-to-SQL is no longer about asking questions - it’s about navigating reality.