What problem does SQuARE solve?

SQuARE tackles question answering over real spreadsheets, where nested headers, merged cells, and inconsistent formats trip up naive approaches. It does so by preserving each sheet’s structure and units.

How does SQuARE route queries?

SQuARE uses sheet-level, complexity-aware routing to decide whether a query should be answered via structure-preserving chunk retrieval or via SQL over an automatically built relational representation.

What is the role of the lightweight agent?

A lightweight agent supervises retrieval. When confidence is low, it refines the result, switches paths, or combines evidence from both paths, so that answers stay verifiable and auditable.

What are the key benefits for practitioners?

Practitioners gain faithful, verifiable answers from real spreadsheets, with preserved header hierarchies, time labels, and units, avoiding misreads from naive flattening.

Where can SQuARE be applied?

Applications include finance, business analytics, and research workflows where accurate tabular QA and auditable traceability are critical.

When Tables Tell the Truth: How SQuARE Lets AI Read Real Spreadsheets Without Losing Structure

In business, finance, and research, spreadsheets are the quiet workhorses behind countless decisions. They hold numbers, trends, and the tiny annotations that explain what every cell means. But asking an AI to read a real spreadsheet and give you a precise answer? That’s a whole different challenge. Real spreadsheets aren’t tidy. They have multi-row headers, merged cells, unit lines, and varied formats. A naive AI or a one-size-fits-all retrieval approach can lose the structure, misread units, or fetch the wrong column altogether. That’s where SQuARE comes in: a hybrid system designed specifically for tabular data. It doesn’t just flatten a table into text and hope for the best. It treats each sheet as a structured, context-rich organism and routes your question through the safest path to an auditable answer.

In this post, we’ll break down what SQuARE is, how it makes retrieval decisions, and why this matters for anyone who works with real-world spreadsheets.

The core idea: two paths, one goal

SQuARE stands for Structured Query & Adaptive Retrieval Engine for Tabular Formats. The big idea is simple in spirit but powerful in practice: don’t force every spreadsheet through a single retrieval method. Instead, assess the sheet’s structure, then route the question to the most appropriate path.

- If the sheet’s layout and metadata (like header depth and merged cells) suggest that layout semantics matter (for example, a “Total” row that sits across multiple header levels or a unit line like “USD (millions)” between headers and data), SQuARE uses structure-preserving chunk retrieval. It looks at the real header paths and the units as they appear, and it retrieves evidence from well-defined blocks that keep that context intact.
- If the sheet looks flatter (a more conventional table with a clear header row and stable column semantics), SQuARE can instead lean on a schema-aware SQL route. It builds a cleansed relational view of the table and uses constrained SQL to filter, aggregate, or compare values deterministically.

The system also includes a lightweight agent that monitors confidence. When the primary path isn’t confident enough, the agent can refine, switch paths, or merge evidence from both paths to produce a trustworthy answer. The outcome is reported with the exact cells or rows used as evidence, so you can verify the result against the original spreadsheet.

Think of it like this: reading a spreadsheet with a smart tour guide. For complicated floors with winding hallways (multi-row headers, units, and merged cells), the guide preserves the route you took through the building. For a simple, tidy floor, the guide can take the fastest, most exact route via a well-structured map (SQL over a clean schema). The combination makes QA both precise and auditable.

How SQuARE breaks down the problem

Here’s a bite-sized tour of the key components and ideas, without getting lost in the math.

1) A sheet-aware complexity score

Before choosing a path, SQuARE estimates how structurally complex a sheet is. It does this using two observable clues:

- Header depth: How many levels of header rows exist above the data.
- Merged/split cells in the header region: Merges often indicate that several columns belong together or that a heading spans multiple sub-columns.

From these clues, SQuARE computes a sheet-level complexity score and then classifies sheets into two broad families:
- Multi-Header sheets: complicated headers, meaningful header paths, and often important units or time labels.
- Flat sheets: simpler, more traditional tables with a stable header.

This rating guides the routing decision: more structural preservation on complex sheets, more SQL opportunities on flat sheets.
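
To make the idea concrete, here is a minimal sketch of what such a score could look like. The weights, the threshold, and the names `SheetStats`, `complexity_score`, and `classify` are illustrative assumptions, not the formula SQuARE actually uses; the only things taken from the description above are the two inputs (header depth and merged header cells) and the two output labels.

```python
from dataclasses import dataclass


@dataclass
class SheetStats:
    header_depth: int         # number of header rows above the data
    merged_header_cells: int  # merged cells found in the header region
    header_cells: int         # total cells in the header region


def complexity_score(s: SheetStats, w_depth: float = 1.0, w_merge: float = 2.0) -> float:
    """Hypothetical score: extra header levels plus the share of merged header cells."""
    extra_levels = max(s.header_depth - 1, 0)
    merge_ratio = s.merged_header_cells / s.header_cells if s.header_cells else 0.0
    return w_depth * extra_levels + w_merge * merge_ratio


def classify(s: SheetStats, threshold: float = 0.5) -> str:
    # The threshold is an assumed cut-off, chosen only for illustration.
    return "Multi-Header" if complexity_score(s) >= threshold else "Flat"


flat = SheetStats(header_depth=1, merged_header_cells=0, header_cells=8)
nested = SheetStats(header_depth=3, merged_header_cells=6, header_cells=24)
print(classify(flat), classify(nested))
```

A single-row header with no merges scores zero and lands in the Flat family, while three header rows with a quarter of the header cells merged clears the threshold comfortably.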

2) Structure-preserving chunk retrieval (for Multi-Header sheets)

When the sheet is Multi-Header, SQuARE constructs a semantic index that doesn’t flatten the table to raw cells. Instead, it segments the sheet at header boundaries into coherent blocks. Each block carries compact metadata:
- H_i: the header path (the sequence of header labels from outer to inner)
- Y_i: any time labels (e.g., year or quarter) present in the block
- U_i: the unit string (if any)

Each block gets a short, descriptive sentence about what it contains. These descriptions are turned into embeddings and stored in a semantic index.
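
As a rough illustration of how header paths and block descriptions might be assembled, the sketch below forward-fills merged header cells and reads each column top-down. Everything here is an assumption made for the example: merged cells are represented as `None`, and the helper names `header_paths` and `describe` and the description format are hypothetical. One known limitation is that a genuinely empty header cell would be treated as part of a merge.

```python
from typing import List, Optional


def header_paths(header_rows: List[List[Optional[str]]]) -> List[List[str]]:
    """Forward-fill merged cells (None) left to right, then read each column top-down."""
    filled = []
    for row in header_rows:
        last, out = None, []
        for cell in row:
            last = cell if cell is not None else last
            out.append(last)
        filled.append(out)
    # zip(*filled) turns rows into columns; each column is one header path.
    return [[level for level in col if level is not None] for col in zip(*filled)]


def describe(path: List[str], time_labels: List[str], unit: Optional[str]) -> str:
    """Build a short sentence from the header path, time labels, and unit."""
    parts = [" > ".join(path)]
    if time_labels:
        parts.append("for " + ", ".join(time_labels))
    if unit:
        parts.append("in " + unit)
    return " ".join(parts)


rows = [
    ["Assets", None, "Liabilities", None],            # None marks a merged continuation
    ["Current", "Non-current", "Current", "Non-current"],
]
paths = header_paths(rows)
print(describe(paths[1], ["2023"], "USD (millions)"))
```

The point of the exercise is that "Non-current" never travels alone: it is always qualified by the outer header it sits under, plus the time label and unit.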

At query time, the system pulls the top k (up to 3) blocks most relevant to the question, based on cosine similarity of the query embedding with the block descriptions. The actual answer is produced by a language model using the retrieved blocks as the evidentiary context. This path preserves header semantics and units, so you don’t end up mixing “Net income” from one header level with “Taxes” from another.

There’s a built-in quality gate: if the retrieved blocks don’t give a convincing context, the system may fall back to the alternative path or attempt to merge evidence.
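
Here is a small sketch of the retrieval step itself: rank block descriptions by cosine similarity to the query and keep the top three. The `embed` function below is a bag-of-words stand-in so the example runs without any model; a real system would call an embedding model instead. The `Block` class and the sample descriptions are made up for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    header_path: List[str]   # header labels, outer to inner
    time_labels: List[str]   # e.g. years or quarters
    unit: Optional[str]      # unit string, if any
    description: str         # the short sentence that gets embedded


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: word counts only.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, blocks: List[Block], k: int = 3) -> List[Tuple[float, Block]]:
    """Return up to k (score, block) pairs, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(b.description)), b) for b in blocks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


blocks = [
    Block(["Income Statement", "Net income"], ["2022", "2023"], "USD (millions)",
          "net income by year 2022 2023 in usd millions"),
    Block(["Balance Sheet", "Total assets"], ["2023"], "USD (millions)",
          "total assets at year end 2023 in usd millions"),
    Block(["Headcount"], ["2023"], None, "employee headcount by region 2023"),
]
for score, block in top_k("net income in 2023", blocks):
    print(round(score, 3), " > ".join(block.header_path), block.unit)
```

Because each hit carries its header path and unit along with it, the language model downstream sees "Net income, USD (millions)" rather than a bare number.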

3) Schema-aware SQL over a relational view (for Flat sheets)

For flat tables, SQuARE can infer a clean schema and spin up a relational view of the table. It captures column names, data types, and units when present, and persists a minimal, well-formed schema.

With a cleaned schema in hand, SQuARE uses a constrained, deterministic SQL path. It doesn’t deploy arbitrary SQL; instead, it restricts queries to a safe subset: SELECT, FROM, WHERE, GROUP BY, ORDER BY, and LIMIT. This guardrail prevents dangerous operations and keeps results auditable.

To generate the SQL, a lightweight language model is used to draft a query from the NL question and the cleaned schema. If needed, the system refines the SQL once or twice to fix errors or handle empty results. The SQL path is especially strong for precise filters and aggregations, and it returns the exact rows as evidence.
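
A guardrail like this can be approximated in a few lines. The sketch below uses Python’s built-in `sqlite3` module and a crude keyword check; it is an assumption about how one might enforce a read-only subset, not SQuARE’s actual validator. The table, its columns, and the helper names `is_allowed` and `run_query` are invented for the example.

```python
import re
import sqlite3

# Crude deny-list: it would also reject a harmless string literal containing
# one of these words, which is acceptable for a conservative guardrail.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|detach|pragma|replace|vacuum)\b",
    re.IGNORECASE,
)


def is_allowed(sql: str) -> bool:
    """Accept a single SELECT statement with no write or DDL keywords."""
    body = sql.strip().rstrip(";")
    if ";" in body:                                  # reject stacked statements
        return False
    if not re.match(r"(?i)^select\b", body):
        return False
    return FORBIDDEN.search(body) is None


def run_query(conn: sqlite3.Connection, sql: str) -> list:
    if not is_allowed(sql):
        raise ValueError("query is outside the allowed SELECT subset")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE debt (country TEXT, year INTEGER, debt_pct_gdp REAL)")
conn.executemany(
    "INSERT INTO debt VALUES (?, ?, ?)",
    [("A", 2022, 40.0), ("A", 2023, 42.5), ("B", 2023, 61.0)],
)
rows = run_query(
    conn,
    "SELECT country, debt_pct_gdp FROM debt "
    "WHERE year = 2023 ORDER BY debt_pct_gdp DESC LIMIT 1",
)
print(rows)
```

The returned rows double as the evidence: the answer and the thing you check it against are the same object.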

4) The agent and the fallback logic

A simple, but crucial, design choice: let the decision be per-question and per-sheet. The agent looks at the sheet’s label (Flat vs Multi-Header) and cues in the question. Aggregations with explicit filters often go to SQL on flat sheets; questions about layout, headers, or units steer the process toward structure-preserving chunks.

If the primary path yields weak evidence or fails a quality check, the agent tries the alternate path (when applicable) or merges results from both paths. If the merged context exceeds the budget for tokens, it’s summarized. Only when the evidence passes a quality gate does the system answer; otherwise, it abstains. This reduces hallucinations and keeps the answer tied to verifiable cells or rows.
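
The control flow described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: each path is modeled as a function returning an answer, evidence, and a confidence score; the 0.7 gate is arbitrary; and the merge rule (two weak paths that agree count as passing) is invented for the example. Question cues and token-budget summarization are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PathResult:
    answer: Optional[str]                              # None means "abstain"
    evidence: List[str] = field(default_factory=list)  # cell or row references
    confidence: float = 0.0                            # hypothetical score in [0, 1]


Path = Callable[[str], PathResult]


def answer_with_fallback(question: str, sheet_label: str,
                         chunk_path: Path, sql_path: Path,
                         gate: float = 0.7) -> PathResult:
    # Flat sheets try SQL first; Multi-Header sheets try chunks first.
    if sheet_label == "Flat":
        primary, alternate = sql_path, chunk_path
    else:
        primary, alternate = chunk_path, sql_path

    first = primary(question)
    if first.confidence >= gate:
        return first

    second = alternate(question)
    if second.confidence >= gate:
        return second

    # Assumed merge rule: two weak paths that agree count as passing the gate.
    if first.answer is not None and first.answer == second.answer:
        return PathResult(first.answer, first.evidence + second.evidence, gate)

    # Otherwise abstain rather than guess.
    return PathResult(None, [], max(first.confidence, second.confidence))


def weak_chunks(question: str) -> PathResult:
    return PathResult("42.5", ["Sheet1!C7"], 0.4)


def strong_sql(question: str) -> PathResult:
    return PathResult("42.5", ["row 2"], 0.9)


result = answer_with_fallback("Debt in 2023?", "Multi-Header", weak_chunks, strong_sql)
print(result.answer, result.evidence)
```

Whatever the real gate looks like, the shape matters more than the numbers: every branch either returns an answer with evidence attached or returns nothing at all.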

5) A modular, swappable design

SQuARE is built to be modular. Embeddings, vector stores, and databases can be swapped without changing the control flow. The system emphasizes numerical precision and provable provenance, with evidence explicitly tied to spreadsheet cells or rows.
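
One common way to get this kind of swappability in Python is to code against small interfaces. The sketch below uses `typing.Protocol` to show the pattern; the interface names (`Embedder`, `VectorStore`), the `Retriever` class, and the toy implementations are all assumptions for illustration and say nothing about how SQuARE’s own code is organized.

```python
from typing import Dict, List, Protocol, Sequence


class Embedder(Protocol):
    def embed(self, text: str) -> Sequence[float]: ...


class VectorStore(Protocol):
    def add(self, key: str, vector: Sequence[float]) -> None: ...
    def search(self, vector: Sequence[float], k: int) -> List[str]: ...


class Retriever:
    """Control flow depends only on the two interfaces above."""

    def __init__(self, embedder: Embedder, store: VectorStore) -> None:
        self.embedder = embedder
        self.store = store

    def index(self, key: str, text: str) -> None:
        self.store.add(key, self.embedder.embed(text))

    def query(self, text: str, k: int = 3) -> List[str]:
        return self.store.search(self.embedder.embed(text), k)


class KeywordEmbedder:
    """Toy embedder: counts of a fixed vocabulary."""
    VOCAB = ("income", "assets", "debt")

    def embed(self, text: str) -> Sequence[float]:
        words = text.lower().split()
        return [float(words.count(w)) for w in self.VOCAB]


class InMemoryStore:
    """Toy store: ranks keys by dot product with the query vector."""

    def __init__(self) -> None:
        self._items: Dict[str, List[float]] = {}

    def add(self, key: str, vector: Sequence[float]) -> None:
        self._items[key] = list(vector)

    def search(self, vector: Sequence[float], k: int) -> List[str]:
        def score(key: str) -> float:
            return sum(a * b for a, b in zip(vector, self._items[key]))
        return sorted(self._items, key=score, reverse=True)[:k]


retriever = Retriever(KeywordEmbedder(), InMemoryStore())
retriever.index("b1", "net income by year")
retriever.index("b2", "total assets at year end")
print(retriever.query("income 2023", k=1))
```

Replacing `KeywordEmbedder` with a real model, or `InMemoryStore` with a hosted vector database, would leave `Retriever` untouched.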

A look at the data and the results

SQuARE was tested across three kinds of spreadsheet challenges to reflect real-world diversity:

- Complex multi-header corporate spreadsheets (think quarterly balance sheets with several header rows and units interleaved between headers and values).
- A large, merged World Bank workbook with heterogeneous sheets.
- Flat, single-header public tables.

What does “success” look like in this setting? The authors focus on exact-match accuracy for numeric and categorical answers and separate retrieval recall (R@k) to measure how much of the necessary evidence is retrieved by each path.
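
For readers who want to compute the same kind of numbers on their own data, both metrics are short to write down. The sketch below is a generic version: the normalization inside `exact_match_accuracy` (trimming whitespace and lowercasing) is an assumption, and the authors’ exact matching rules may differ.

```python
from typing import List, Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Share of predictions that equal the reference after light normalization."""
    def norm(s: str) -> str:
        return s.strip().lower()
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)


def recall_at_k(retrieved: List[str], relevant: List[str], k: int) -> float:
    """Share of the needed evidence items that appear in the top-k retrieved items."""
    needed = set(relevant)
    if not needed:
        return 0.0
    return len(set(retrieved[:k]) & needed) / len(needed)


accuracy = exact_match_accuracy(["42.5", "USD", "7"], ["42.5", "usd", "8"])
recall = recall_at_k(["b3", "b1", "b7"], ["b1", "b2"], k=3)
print(accuracy, recall)
```

Keeping the two apart is useful for diagnosis: low recall points at retrieval, while good recall with low accuracy points at the answer-generation step.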

A few snapshots from the experiments help illustrate the gains:

- On complex multi-header balance sheets, a Gemma-based instantiation of SQuARE achieved about 91% exact-match accuracy, significantly higher than a Llama-based version (around 81%) and a tool-free ChatGPT-4o baseline (about 29%). The big jump here is clearly the structure-preserving path doing the heavy lifting for header-aware questions.
- On a merged World Bank workbook, SQuARE with Gemma reached roughly 86% accuracy, beating the Llama variant and ChatGPT-4o by a comfortable margin.
- On flat tables (the Health, Public Debt, Global Economic Prospects, Energy, and Education datasets), SQuARE reached about 93% overall, with the Hard tier around 87%. ChatGPT-4o trailed more noticeably on the Hard items, where precise filters and multi-column predicates shine in the SQL path.
- Retrieval quality separated by path: for complex sheets, the chunk-based retrieval typically surfaces the necessary evidence in three chunks or fewer; on flat tables, the SQL path often hits near-perfect recall for the top result.

Ablation studies underscored the importance of fallback and merging:
- Removing the fallback mechanism hurts accuracy by several points across difficulty levels, showing that the ability to switch paths when confidence is low is meaningful.
- Chunk-only (no SQL on flat tables) underperforms compared to the full system on flat tables, while SQL-only on flat tables could sometimes underperform on certain complex questions, underscoring the merit of hybrid routing.

In terms of practicality, the setup used quantized models running on modest GPUs (e.g., T4 with 15 GB VRAM). Latency was competitive with ChatGPT-4o in many scenarios, with the exact timing depending on sheet structure and the chosen retrieval path. The takeaway: you don’t need an army of GPUs to get good, verifiable results on real spreadsheets.

Why this matters in the real world

You might be wondering: “So what? How do I actually use this?” Here are the practical implications and some real-world applications.

- Verifiable financial QA: In finance, auditors and analysts need precise, auditable answers linked directly to cells and lines in the ledger. SQuARE’s evidence-first approach helps ensure that AI answers aren’t just plausible but traceable to the exact data source.
- Flexible data pipelines: Organizations often juggle both highly structured dashboards and messy, hand-crafted spreadsheets. A system that can adapt its retrieval strategy to the sheet’s realities reduces the risk of errors and makes automation more robust.
- Faster, safer insights: For flat tables, the deterministic SQL route can be faster and safer than trying to reason from unstructured text alone. For complex layouts, preserving structure avoids the common pitfalls of flattening tables (like mixing up units or misreading multi-row headers).
- Easier audits and compliance: Because SQuARE aims to return exact cells or rows and maintains header paths and units, the audit trail is straightforward. You can reproduce the same answer by pointing to the precise cells.

Beyond finance, think about public statistics, corporate dashboards, or any environment where data lives in real spreadsheets rather than clean relational tables. SQuARE’s hybrid approach is a practical blueprint for how to respect structure while still delivering the power of language models.

Practical tips if you’re inspired to explore this space

If you’re building or evaluating spreadsheet QA in your own organization, here are takeaways you can apply:

- Don’t force one method. Start by classifying the sheet’s structure and route questions accordingly. A lightweight, per-sheet complexity metric can pay off big time.
- Preserve structure where it matters. If a header path and units are essential to the question, structure-preserving chunking is your friend.
- Use a constrained SQL path for flat tables. When the schema is stable, a schema-aware SQL approach offers fast, exact answers with strong provenance.
- Include an agent-driven fallback. A simple confidence check can prevent brittle behavior. If confidence is low, switch paths or merge evidence from both paths.
- Keep the evidence front and center. Return the exact rows or cells used to produce the answer so users can verify and trust the result.
- Design for modularity. Swapping embeddings, vector stores, or databases should be possible without reworking the control flow. That makes it easier to upgrade components as models improve.
- Plan for real-world messiness. The current design handles complex headers and units, but OCR’d tables, nonstandard layouts, or cross-sheet joins introduce new challenges. You’ll want to layer layout detectors and more robust cross-sheet logic in future work.

Key takeaways

- Real spreadsheets are not uniform. Multi-row headers, merged cells, and unit lines break naive QA approaches. SQuARE tackles this by classifying sheets by structural complexity and routing questions accordingly.
- Two complementary retrieval paths keep the best of both worlds. Structure-preserving chunk retrieval preserves header semantics and units for complex sheets, while a schema-aware SQL path provides fast, deterministic reasoning on flat tables.
- An agent with confidence-driven fallbacks makes the system robust. If the primary path isn’t confident, SQuARE can switch paths or merge evidence to produce reliable, auditable answers.
- Evidence-first results boost trust. SQuARE returns the exact cells or rows used to answer, enabling straightforward verification and easier auditing.
- The approach is modular and practical. With swappable embeddings, vector stores, and databases, it’s feasible to deploy on modest hardware and integrate with existing data workflows.
- The research shows meaningful gains in accuracy, especially on structurally complex spreadsheets, and demonstrates that hybrid RAG+SQL systems can beat tool-free baselines on real-world tabular QA tasks.
- There’s room to grow. Future work could include learned routing with uncertainty estimates, better handling of OCR’d or layout-laden inputs, stronger schema alignment for SQL, and multi-sheet joins that preserve provenance.

If you’re curious about making AI read real spreadsheets with accuracy and accountability, SQuARE offers a compelling blueprint: don’t fight the table’s structure—let the structure guide the retrieval, and use a cautious, adaptive blend of paths to get to the truth in your data.

If you want to dive deeper, think about testing a small pilot on a few datasets you use regularly. Start with a couple of flat tables to experiment with the SQL path, then bring in a few multi-header sheets to see how the structure-preserving approach performs. The fusion of these strategies is where the most reliable, auditable AI-assisted spreadsheet QA lives.
