For us at Gobblecube, data has always been abundant, but actionable insights weren’t always immediate. Like many growing startups, we accumulated rich datasets across marketing, sales, products, and geography. However, turning that data into insights required increasingly complex SQL queries against near-real-time data - so we needed a system that could generate them quickly and reliably, and that let us iterate faster.
Early experiments with prompt-based NLQ-to-SQL systems showed promise but failed under real organizational pressure. Queries worked for low-complexity questions but broke as complexity grew, often producing a single 300-line query that tried to carry the entire analytical load. Hallucinated columns, incorrect joins, and silent logic errors eroded trust in the system. Most critically, leadership could not rely on it for data-backed decision-making.
This forced a shift in thinking: we stopped treating NLQ-to-SQL as a language problem and reframed it as a structural intelligence problem.
The result was a graph-theoretic, vector-aware, self-healing architecture that now underpins how analysts, managers, and leadership interact with data - bringing insights directly to their fingertips.
The Democratization of Data at Gobblecube
Before this system, access to insights followed a familiar pattern:
Questions flowed from leadership to analysts, SQL queries were written manually, dashboards were refreshed, and insights arrived - often too late to influence decisions.
Our goal was not to replace analysts, but to amplify them.
By introducing a robust NLQ-to-SQL layer grounded in schema graphs and embeddings, Gobblecube enabled:
Faster exploratory analysis
Self-serve performance reviews
Easier anomaly detection across time-series metrics
Consistent interpretations of business logic
Data analysts transitioned from “query writers” to “insight interpreters”, focusing on trends, anomalies, and narratives rather than syntax.
The Three-Step NLQ-to-SQL Architecture at Gobblecube
To address these failures, Gobblecube implemented a three-step pipeline:
Knowledge Retrieval (Selection)
Logic Formation (Query Graph Construction)
Self-Healing (Execution-Based Correction)
This separation introduced discipline and traceability into every generated query.
Step One: Knowledge Retrieval as Constrained Variable Selection
At Gobblecube, the first step ensures that only real, approved schema elements enter the reasoning process.
Let the schema be ( S = \{e_1, e_2, \dots, e_n\} ), the fixed set of approved tables and columns.

Given a natural language query ( Q ), the system selects a subset ( S_Q \subseteq S ) of schema elements relevant to ( Q ).

Because every element of ( S_Q ) comes from ( S ), no downstream step can invent new columns.
This single constraint eliminated nearly all hallucination issues and gave analysts confidence that every query was grounded in reality.
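A minimal sketch of this selection step, assuming toy 3-dimensional embeddings and a cosine-similarity retriever (all names and vectors here are illustrative, not Gobblecube's production code):

```python
import math

# Illustrative schema-element embeddings. In practice these come from a
# trained embedding model; the toy vectors below just make the idea concrete.
SCHEMA_EMBEDDINGS = {
    "city":        [0.9, 0.1, 0.0],
    "product":     [0.1, 0.9, 0.0],
    "ad_spend":    [0.7, 0.2, 0.6],
    "sales":       [0.4, 0.5, 0.5],
    "impressions": [0.8, 0.1, 0.5],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def select_schema_elements(query_vec, k=3):
    """Return the k schema elements nearest the query embedding.

    Because candidates are drawn only from SCHEMA_EMBEDDINGS, no
    downstream step can reference a column that does not exist.
    """
    ranked = sorted(SCHEMA_EMBEDDINGS,
                    key=lambda name: cosine(query_vec, SCHEMA_EMBEDDINGS[name]),
                    reverse=True)
    return ranked[:k]

selected = select_schema_elements([0.85, 0.15, 0.3])
```

For an intent vector leaning toward city-level spend, the retriever surfaces city, impressions, and ad_spend - and, by construction, can never surface a column outside the approved set.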
Step Two: Logic Formation via Query Graph Construction
SQL generation is treated as graph assembly, not text generation.
Each query becomes a query graph ( G_Q = (V_Q, E_Q) ), where:

Nodes ( V_Q ) represent clauses and schema elements

Edges ( E_Q ) represent logical dependencies
For analysts at Gobblecube, this meant:
Queries became explainable
Logic errors were isolated to the structure, not the data
Performance review queries were consistent across teams
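One way to picture graph assembly - a hypothetical sketch, not Gobblecube's actual implementation - is a dependency graph of SQL clauses that is resolved in topological order, so a clause can only be filled in after everything it depends on:

```python
# Hypothetical query graph: nodes are SQL clauses, edges encode
# "must be resolved before" dependencies. Assembling the query is a
# walk over this structure rather than free-form text generation.
QUERY_GRAPH = {
    # node: list of nodes it depends on
    "FROM":     [],
    "JOIN":     ["FROM"],
    "WHERE":    ["JOIN"],
    "GROUP BY": ["WHERE"],
    "SELECT":   ["GROUP BY"],
}

def topo_order(graph):
    """Return nodes so every node appears after all of its dependencies."""
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in graph[node]:
            visit(dep)
        order.append(node)
    for node in graph:
        visit(node)
    return order

resolution_order = topo_order(QUERY_GRAPH)
```

Because logic lives in the graph, a broken query points to a broken edge - the structure - rather than to the data underneath it.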
Step Three: Self-Healing SQL via Execution Feedback
Once executed, SQL errors are treated as signals, not failures.
If execution produces an error, the system applies targeted repairs for the most common failure modes:
Missing GROUP BY columns
Incorrect aggregation levels
Invalid join paths
This self-healing loop drastically reduced analyst rework and made exploratory analysis safer and faster.
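The repair loop can be sketched with SQLite and the standard library's fuzzy matcher - a toy illustration of the execute-diagnose-repair cycle, not Gobblecube's actual repair rules or warehouse:

```python
import difflib
import re
import sqlite3

VALID_COLUMNS = ["city", "product", "ad_spend", "sales"]

def run_with_self_healing(conn, sql, max_attempts=3):
    """Execute SQL; on a 'no such column' error, treat the message as a
    signal, swap the bad name for the closest valid column, and retry."""
    for _ in range(max_attempts):
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.OperationalError as exc:
            match = re.search(r"no such column: (\w+)", str(exc))
            if not match:
                raise  # not a failure mode this loop knows how to repair
            bad = match.group(1)
            candidates = difflib.get_close_matches(bad, VALID_COLUMNS, n=1)
            if not candidates:
                raise
            # Targeted repair: replace the hallucinated column name.
            sql = re.sub(rf"\b{bad}\b", candidates[0], sql)
    raise RuntimeError("query could not be repaired")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (city TEXT, product TEXT, ad_spend REAL, sales REAL)")
conn.execute("INSERT INTO metrics VALUES ('Pune', 'Tea', 120.0, 40.0)")

# 'cty' is a hallucinated column; the loop repairs it to 'city' and retries.
rows = run_with_self_healing(
    conn, "SELECT cty, SUM(ad_spend) FROM metrics GROUP BY cty")
```

The same pattern extends to the other repair classes above - missing GROUP BY columns and invalid join paths - each with its own error signature and rewrite rule.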
The Schema as a Graph at Gobblecube (( G_S ))
Gobblecube models its schema as a weighted directed graph ( G_S = (V_S, E_S, w) ):
Nodes
Product
City
Date
Marketing metrics
Sales metrics
Edges
Product → Sales
City → Impressions
City → Ad Spend
City → Sales
Weights
Frequently used joins (low weight)
Rare or discouraged joins (high weight)
This embeds business logic directly into topology.
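The weighted graph above can be sketched as an adjacency map, with Dijkstra's algorithm picking the preferred join path (the weights are illustrative stand-ins for Gobblecube's actual values):

```python
import heapq

# Weighted, directed schema graph: low weight = common, approved join;
# high weight = rare or discouraged join. Weights are illustrative.
SCHEMA_GRAPH = {
    "product":     {"sales": 1.0},
    "city":        {"impressions": 1.0, "ad_spend": 1.0, "sales": 1.0},
    "impressions": {"ad_spend": 2.0},
    "ad_spend":    {"sales": 2.0},
    "sales":       {},
}

def cheapest_join_path(graph, start, goal):
    """Dijkstra over the schema graph: the lowest-cost path encodes the
    preferred sequence of joins between two schema elements."""
    queue = [(0.0, start, [start])]
    best = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue
        best[node] = cost
        for nxt, w in graph[node].items():
            heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []  # unreachable: the join is not allowed
```

Because discouraged joins carry high weight, the cheapest path is also the path that best reflects how the business actually combines these tables.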
The Query as a Graph (( G_Q ))
Every NLQ-generated query graph ( G_Q ) is validated by superimposing it onto the schema graph ( G_S ).
If a path does not exist in the schema graph, the query is rejected or repaired - preventing analysts from unknowingly violating data grain constraints.
From Graphs to Geometry: Vector Space Projection
Each schema element is embedded into a shared vector space ( \mathbb{R}^d ).
Graph embeddings ensure that:
City-level metrics cluster together
Product-level metrics cluster together
Invalid joins fall into low-density regions
This geometric structure underpins anomaly detection and trend analysis.
Path Vectorization: Composite Embeddings
A schema path ( P = e_1 \rightarrow e_2 \rightarrow \dots \rightarrow e_k ) is represented as the composite embedding

( \mathbf{v}_P = \frac{1}{k} \sum_{i=1}^{k} \mathbf{v}_{e_i} )

Example: the embedding of City → Impressions → Ad Spend is the mean of the three element embeddings.

Normalization, ( \hat{\mathbf{v}}_P = \mathbf{v}_P / \lVert \mathbf{v}_P \rVert ), ensures fair comparison across paths of different lengths.
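A minimal sketch of composite path embeddings, assuming the path vector is the L2-normalized mean of its element embeddings (one common choice; the exact composition used in production may differ):

```python
import math

def path_vector(element_vectors):
    """Composite embedding of a schema path: the mean of its element
    embeddings, L2-normalized so short and long paths compare fairly."""
    k = len(element_vectors)
    dim = len(element_vectors[0])
    mean = [sum(vec[d] for vec in element_vectors) / k for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]

# Toy 2-D embeddings for two schema elements on a path.
v = path_vector([[1.0, 0.0], [0.0, 1.0]])
```

After normalization every path vector has unit length, so a three-hop path and a one-hop path compete on direction (semantics) alone, not magnitude.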
The Net Graphical Space at Gobblecube
Valid business logic forms dense highways:
City → Impressions → Ad Spend → Sales
Invalid logic falls into voids:
Product → Ad Spend (without city context)
This spatial structure allows analysts to explore data safely and intuitively.
The Nearest Path Problem
Given query intent embedding ( \mathbf{v}_Q ), the system selects the valid schema path whose embedding is most similar:

( P^* = \arg\max_{P} \; \cos(\mathbf{v}_Q, \mathbf{v}_P) )
This ensures:
Semantic alignment
Structural validity
Minimal complexity
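Selecting the nearest path then reduces to an arg-max over cosine similarity. A toy sketch, with illustrative candidate-path embeddings rather than real ones:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Candidate path embeddings (toy values; only valid schema paths
# are ever admitted as candidates, which enforces structural validity).
CANDIDATE_PATHS = {
    "city -> ad_spend -> sales":       [0.7, 0.2, 0.6],
    "city -> impressions -> ad_spend": [0.8, 0.1, 0.5],
    "product -> sales":                [0.1, 0.9, 0.1],
}

def nearest_path(query_vec, candidates=CANDIDATE_PATHS):
    """Select the valid schema path whose embedding is most similar to
    the query-intent embedding (arg-max over cosine similarity)."""
    return max(candidates, key=lambda p: cosine(query_vec, candidates[p]))

best = nearest_path([0.6, 0.3, 0.7])  # intent: spend vs. sales, by city
```

Here the intent vector lands nearest `city -> ad_spend -> sales`, while the product-only path scores far lower - semantic alignment and structural validity in one step.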
Case Study: “Calculate the Wasted Marketing Spend” at Gobblecube
Schema Context
Gobblecube’s schema includes:
Product
City
Impressions Gained
Ad Spend
Sales
Time
“Wasted marketing spend” is not an explicit metric.
Intent Projection
The query embedding is drawn toward:
Ad Spend
Low Sales
High Impressions
City-level aggregation
Candidate Paths
City → Ad Spend → Sales (low conversion)
City → Impressions → Ad Spend (no sales)
Product → Sales (irrelevant to spend)
Vector similarity reveals that high spend + impressions + low sales at city level is the nearest valid path.
Outcome
The system generated:
City-wise wasted spend
Time-series trends
Outlier cities with abnormal spend-to-sales ratios
This directly enabled:
Performance review reports
City-level marketing optimization
Rapid anomaly detection
Self-Healing as Vector Gap Completion
If an invalid shortcut appears - for example, a query edge that skips a required intermediate node, such as Impressions → Sales without Ad Spend - the system detects the gap and inserts the missing nodes from the schema graph, restoring a valid path.
This guarantees structurally correct analytics every time.
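Gap completion can be sketched as a shortest-path search over the schema graph (node names follow the examples above; breadth-first search stands in for whatever search the production system uses):

```python
from collections import deque

# Directed schema graph (illustrative). Note there is no direct
# impressions -> sales edge: that hop must pass through ad_spend.
SCHEMA_GRAPH = {
    "product":     ["sales"],
    "city":        ["impressions", "ad_spend", "sales"],
    "impressions": ["ad_spend"],
    "ad_spend":    ["sales"],
    "sales":       [],
}

def complete_gap(graph, src, dst):
    """If src -> dst is not a schema edge, find the shortest valid path
    and return the intermediate nodes that must be spliced in."""
    if dst in graph[src]:
        return []  # already a valid edge, nothing to insert
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        for nxt in graph[path[-1]]:
            if nxt in path:
                continue
            if nxt == dst:
                return path[1:]  # the nodes inserted between src and dst
            queue.append(path + [nxt])
    return None  # no valid path exists: the query must be rejected
```

For the shortcut Impressions → Sales, the search splices in Ad Spend; for a truly invalid edge with no schema path behind it, the query is rejected rather than silently executed.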
Impact at Gobblecube
This architecture:
Put data at stakeholders’ fingertips
Reduced analyst query time dramatically
Enabled consistent performance reviews
Made anomaly detection a daily practice through trend examination
Data analysts now focus on why numbers change, not why queries break.
Conclusion
At Gobblecube, moving from prompt-based NLQ to a graph-theoretic, vector-aware system transformed analytics from reactive to proactive. By aligning natural language with schema reality, the organization unlocked reliable, explainable, and scalable data-backed decision-making.
NLQ-to-SQL is no longer about asking questions - it’s about navigating reality.