When Spreadsheets Speak: An Adaptive System That Gives Honest Answers From Complex Tables
Spreadsheets are the quiet workhorses of business, science, and policy. They hold budgets, forecasts, health stats, and all sorts of analyses that teams rely on every day. But asking a question like "What was Net PP&E growth between 2020 and 2023, in USD millions, for Company X?" can be a headache if the sheet has multi-row headers, merged cells, or a stray unit line sprinkled in between the numbers. Traditional approaches either squeeze the data into a flat table (losing header meaning) or run rigid SQL queries (which stumble when the schema isn't tidy). The result? Answers that are hard to audit, hard to verify, or just plain wrong.
Enter SQuARE: a hybrid retrieval and execution framework that adapts to how a sheet is laid out. Rather than forcing every table through one fixed path, SQuARE looks at a sheet's structure, routes the question to the path that preserves meaning, and uses a smart agent to blend signals when confidence is low. The goal is simple but powerful: return exact cells or rows as evidence, keep header context and units intact, and make results auditable and verifiable.
In this post, I'll unpack what SQuARE does, why it matters, and what it could mean for anyone who works with real spreadsheets, whether you're in finance, research, or governance.
What problem SQuARE is solving
- Real-world spreadsheets aren't tidy. They often have:
- Multi-row header zones that describe what data sits in each column across several levels of labels.
- Merged cells that convey important structure, like a group heading or a unit line (for example, "USD (millions)" placed between headers and values).
- Nonuniform layouts, with header paths, temporal labels, and embedded units sprinkled through the sheet.
- Naive chunking breaks meaning. If you break a sheet into fixed-size chunks or flatten it into text, you risk severing the relationships that a row or a header describes.
- Pure SQL can fail on irregular formats. SQL assumes a tidy schema and stable column semantics; it struggles with header nesting, unit rows, and ad hoc formatting.
SQuARE's core idea is to decide, for each sheet and each question, which path is safest:
- Structure-preserving chunk retrieval for complex headers and units.
- Schema-aware SQL on an automatically inferred relational view for flat, well-structured parts.
- An agent that monitors confidence and can merge results from both paths when helpful.
The backbone: a simple, effective complexity signal
Rather than a one-size-fits-all approach, SQuARE uses a sheet-level complexity score to decide how to route queries. The score is built from two approachable cues:
- Header depth: how many header rows exist above the data.
- Merge density: how many cells are merged in the header region (an indicator of tight coupling between header levels and units).
A single threshold then gates the decision: sheets that look "Multi-Header" are routed to structure-preserving chunks; flatter sheets can use SQL as well as chunking. Because merge density is normalized by the size of the header region, the decision scales with the actual sheet layout rather than relying on a brittle, global cutoff.
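As a rough illustration, that routing rule can be sketched in a few lines of Python. The `SheetStats` fields, the weights, and the threshold here are illustrative assumptions, not the system's actual calibration:

```python
from dataclasses import dataclass

@dataclass
class SheetStats:
    header_depth: int      # number of header rows above the data region
    merged_in_header: int  # count of merged-cell ranges in the header zone
    header_cells: int      # total cells in the header zone (for normalization)

def complexity_score(stats: SheetStats) -> float:
    """Blend header depth and merge density into one structural signal.

    Merge density is normalized by header size, so the score scales with
    how much header there actually is rather than using a fixed cutoff.
    """
    merge_density = stats.merged_in_header / max(stats.header_cells, 1)
    # Depth contributes directly; density is weighted up because even a few
    # merges in a small header usually signal tight header/unit coupling.
    return stats.header_depth + 3.0 * merge_density

def classify(stats: SheetStats, threshold: float = 2.0) -> str:
    return "Multi-Header" if complexity_score(stats) >= threshold else "Flat"
```

With these made-up weights, a three-row header with a few merges lands in the Multi-Header bucket, while a single clean header row stays Flat.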
How SQuARE actually works (at a high level)
Think of SQuARE as a modular conductor that decides how to retrieve evidence and how to answer. The stages roughly map to:
1) Complexity assessment
2) Index construction
3) Mode selection (structure-preserving chunks vs SQL)
4) Retrieval and verification
5) Evidence synthesis and fallback if needed
Here's what that looks like in plain terms:
Complexity scoring (stage A)
- Look at header depth and merge density in the sheet.
- Use a normalized rule so the decision adapts to how much header there actually is, rather than using a fixed page-by-page cutoff.
- This yields a label for the sheet: Multi-Header or Flat.
Index construction (stage A)
- For all sheets, build a semantic index that describes visible content in a concise way (more on this below).
- If the sheet is Flat, also build a relational (SQL-friendly) view with a cleaned schema.
Structure-preserving path (when Multi-Header)
- Segment the sheet along header boundaries so each block is a coherent, self-describing region.
- Attach metadata to blocks: header path (the exact sequence of labels from outer to inner headers), time labels, and units if present.
- Create a compact textual description for each block (e.g., "Block describes Year 2021-2023 figures for Company X under header 'Net Income' with unit USD millions").
- Convert these descriptions into embeddings and store them for semantic search.
- At query time, retrieve the top few blocks whose embeddings best match the question. Only a small number of blocks are forwarded to the answer model to keep context tight and precise.
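A minimal sketch of this path, with a token-overlap similarity standing in for real embedding search. The `Block` fields and the scoring function are illustrative assumptions, not SQuARE's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Block:
    header_path: list  # outer-to-inner header labels, e.g. ["Assets", "Net PP&E"]
    years: str         # temporal label, e.g. "2020-2023"
    unit: str          # e.g. "USD millions"
    rows: list         # the underlying cell rows (the evidence)

    def description(self) -> str:
        # Compact textual description used for semantic indexing.
        return f"{' > '.join(self.header_path)} | {self.years} | {self.unit}"

def similarity(query: str, text: str) -> float:
    # Jaccard token overlap: a stand-in for embedding cosine similarity.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q | t), 1)

def retrieve(query: str, blocks, k: int = 3):
    """Return the top-k blocks whose descriptions best match the question."""
    return sorted(blocks, key=lambda b: -similarity(query, b.description()))[:k]
```

In a real deployment the description would be embedded once at index time and the query matched against a vector store; the control flow stays the same.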
SQL-based path (when Flat or when the system is confident)
- Infer a cleansed schema: identify columns, their names, and their data types. Capture units if present.
- Derive a safe, constrained SQL query space (a guarded, limited form of SQL: SELECT-FROM-WHERE-GROUP BY-ORDER BY-LIMIT; no DDL/DML).
- Use a lightweight agent (an LLM) to propose a SQL query snippet from the question and the inferred schema, then execute it. If the first attempt doesn't yield a clean result, refine once or twice.
- Treat the returned rows as direct evidence for the answer.
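The guarded execution side of this path can be sketched with sqlite3. The `guard` regexes and the table-loading helper below are simplifying assumptions (a real system would also infer column types and capture units):

```python
import re
import sqlite3

ALLOWED = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|ATTACH|PRAGMA)\b", re.IGNORECASE
)

def guard(sql: str) -> str:
    """Reject anything outside the constrained SELECT-only query space."""
    if not ALLOWED.match(sql) or FORBIDDEN.search(sql) or ";" in sql.rstrip(";"):
        raise ValueError(f"Query outside the allowed SQL subset: {sql!r}")
    return sql

def load_flat_sheet(rows, columns, table="sheet"):
    """Build a relational view of a flat sheet in an in-memory database.

    Table and column names come from trusted schema inference, never from
    the user's question, so plain interpolation is acceptable here.
    """
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE {table} ({cols})")
    conn.executemany(
        f"INSERT INTO {table} VALUES ({', '.join('?' for _ in columns)})", rows
    )
    return conn

def run(conn, sql: str):
    """Execute a guarded query; the returned rows are the direct evidence."""
    return conn.execute(guard(sql)).fetchall()
```

An LLM-proposed query would be passed through `guard` before execution, so a hallucinated DDL/DML statement fails loudly instead of mutating data.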
Agent with confidence-aware fallback
- The system tracks a confidence score for the chunk-based path and for the SQL path.
- If the primary path is weak (low similarity to the question, empty results, or potential misalignment), the agent can switch modes or merge results from both paths.
- If the merged context is too large for the prompt budget, it's summarized, but only if the final evidence still passes the quality checks.
- The goal is to avoid giving an answer when confidence is too low, and to provide exact rows/cells when possible.
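One way to sketch that fallback logic, assuming each path is a callable returning evidence plus a confidence score in [0, 1]. The thresholds and the merge rule are illustrative, not the system's actual policy:

```python
def answer_with_fallback(question, chunk_path, sql_path, min_conf=0.5):
    """Route between paths; merge evidence or abstain when confidence is low."""
    chunk_ev, chunk_conf = chunk_path(question)
    sql_ev, sql_conf = sql_path(question)

    # Prefer a single strong path: its rows are the cleanest evidence.
    if sql_conf >= min_conf and sql_ev:
        return sql_ev, "sql"
    if chunk_conf >= min_conf and chunk_ev:
        return chunk_ev, "chunks"

    # Neither path is strong on its own: merge what we have, if anything.
    merged = (sql_ev or []) + (chunk_ev or [])
    if merged and max(sql_conf, chunk_conf) >= min_conf / 2:
        return merged, "merged"

    # Too weak: refuse rather than guess.
    return None, "abstain"
```

The important design choice is the last branch: when every signal is weak, the system declines to answer instead of returning something unverifiable.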
Evidence-first results and modularity
- SQuARE is designed to surface the exact cells or rows used to answer, preserving header context and units where it matters.
- The architecture is intentionally modular: embeddings, vector stores, and databases can be swapped without changing the overall control flow.
What the researchers actually did (in practice)
They tested three sheet families to cover the spectrum:
- Complex multi-header corporate spreadsheets (e.g., balance sheets from large companies).
- A complex, merged World Bank workbook with heterogeneous sheets.
- Flat, single-header public tables.
They built a sizable QA corpus: around 900 questions across those datasets, spanning easy lookups to harder multi-predicate and multi-year comparisons.
They compared two model backends for the retrieval and answering steps (Gemma 3:12B and Llama 3.2:11B) and used ChatGPT-4o as a tool-free baseline. Across settings, SQuARE with Gemma generally led the pack.
Key results to illustrate the impact
Multi-header corporate spreadsheets:
- SQuARE with Gemma achieved about 91% exact-match accuracy, vs. ~81% with Llama and ~29% with ChatGPT-4o. The big jump here highlights the value of preserving header paths and units.
Merged World Bank workbook:
- SQuARE Gemma around 86% accuracy, compared with 74% (Llama) and 54% (ChatGPT-4o). The gains again reflect the importance of a structure-aware path when layouts are complex.
Flat tables (five datasets, 450 QA pairs total):
- SQuARE achieved about 93% overall accuracy, with 87% on the hard tier. ChatGPT-4o lagged more on the hard items where exact filters and multi-column predicates matter.
- The dual indexing and routing (vector-based plus SQL) paid off, especially in harder questions that require precise filtering and aggregation.
Retrieval quality (how well the system surfaces the right evidence)
- For complex sheets, the chunk path reliably surfaced the necessary evidence within the top three retrieved chunks (roughly 86-100% recall in the best cases).
- For flat tables, SQL reached near-perfect top-1 retrieval when the schema could be cleanly inferred; the vector path helped only in a minority of hard cases.
- Ablations showed the value of the fallback and merge strategy: removing fallback reduced performance by several points, especially on complex sheets.
Latency and practical deployment notes
- The experiments ran on modest hardware (quantized models on a T4 GPU with 15 GB VRAM).
- End-to-end latency was in the same ballpark as the tool-free ChatGPT-4o, with the SQL path sometimes adding a small generation/refinement step.
- The system is designed with lazy indexing and caching so it remains predictable and affordable in practice.
Limitations and directions for the future
- Router learning and calibration: the current approach uses a prompt-based lightweight router. The authors suggest a learnable router with uncertainty estimates to reduce fallbacks and improve efficiency.
- Layout variability: OCRâd tables and document-style inputs introduce new challenges. Additional detectors and layout parsers could be integrated before applying SQuAREâs retrieval backbone.
- SQL robustness: schema drift and more complex joins require stronger schema alignment and safer join discovery. Extending to multi-sheet or cross-workbook queries is on the roadmap, with explicit, verifiable joins while maintaining the evidence-first interface.
- Broadening baselines: more tool-enabled baselines and perturbation tests would provide a broader view of robustness in real-world environments.
Real-world implications and applications
- Finance and governance: SQuARE's strength lies in questions that hinge on header hierarchies and units. Imagine auditors verifying a line item across several quarters with evolving units, or analysts comparing segments across years with merged headings. SQuARE preserves the exact cell blocks and units, making results auditable and easy to verify.
- Research and policy: World Bank-style datasets and cross-tab indicators are notoriously tricky to align. A hybrid path that maintains structure while enabling precise SQL-like filtering could accelerate exploratory data analysis and ensure reproducibility.
- Data integration and reporting: For organizations that assemble dashboards from a mosaic of spreadsheets, SQuARE could help answer questions that span multiple sheets with different layouts, reducing the need to normalize everything into a single tidy table.
How you might use or adapt SQuARE in your own workflow
- If your organization already uses spreadsheet-based analytics, consider building a lightweight prototype that:
- Detects header depth and merges in your sheets to classify them as Multi-Header or Flat.
- Creates a semantic index for complex sections (describe blocks with header paths and units) and a relational view for flat sections.
- Adds a small routing layer that chooses between a semantic, chunk-based search and an inferred SQL path, with a confidence-based fallback.
- Keeps exact cells or rows as evidence, so results stay auditable.
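Pulling the checklist together, a toy routing skeleton might look like the following. Everything here (the dict keys, the lexical matcher, the weights and threshold) is a hypothetical stand-in for the components described above:

```python
def classify_sheet(header_depth: int, merged: int, header_cells: int) -> str:
    # Structural signal from the post: depth plus normalized merge density.
    score = header_depth + 3.0 * (merged / max(header_cells, 1))
    return "Multi-Header" if score >= 2.0 else "Flat"

def prototype_answer(question: str, sheet: dict) -> dict:
    """Route one question through the lightweight prototype.

    `sheet` carries hypothetical keys: structural stats, a list of
    (description, rows) chunks for complex regions, and a run_sql callable
    for flat regions. Evidence rows are returned verbatim for auditability.
    """
    label = classify_sheet(
        sheet["header_depth"], sheet["merged"], sheet["header_cells"]
    )
    if label == "Multi-Header":
        # Naive lexical overlap stands in for semantic search over chunks.
        q_tokens = set(question.lower().split())
        scored = sorted(
            sheet["chunks"],
            key=lambda c: -len(q_tokens & set(c[0].lower().split())),
        )
        evidence = [rows for _, rows in scored[:3]]
    else:
        evidence = sheet["run_sql"](question)
    return {"label": label, "evidence": evidence}
```

A real prototype would swap the lexical matcher for embeddings and route `run_sql` through a guarded, LLM-proposed query, but the control flow above mirrors the checklist.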
- For teams serious about audit trails, the emphasis on evidence-first answers is a big win. It means you can point to the exact cells and units used to generate a conclusion, not just a summary or a paraphrase.
Key takeaways
- Adaptation beats one-size-fits-all. SQuARE doesn't push every question through the same pipeline. It assesses sheet structure and routes queries to the path best suited to preserve meaning and ensure verifiability.
- Structure-aware retrieval matters more on complex sheets. When multi-row headers and merged cells carry crucial context, structure-preserving chunking with semantic indexing preserves that context far better than flattening or rigid SQL alone.
- SQL over a learned, constrained schema is powerful for flat or well-behaved data. For simpler layouts, a schema-aware SQL path can deliver exact filtering and aggregation with high reliability.
- An agent with confidence-aware fallback improves robustness. If evidence is weak on one path, the system can switch paths or merge results, reducing the chance of giving an unreliable answer.
- Evidence-first outputs build trust. SQuARE emphasizes surfacing the exact rows or cells used to answer, along with header and unit context, which is crucial for auditability and validation.
- The approach is modular and extensible. The framework is designed so you can swap in different embeddings, vector stores, or database backends as needed, making it adaptable to evolving models and tools.
If you're curious about the future of spreadsheet QA, SQuARE offers a compelling blueprint: blend the best of structure-aware retrieval with the precision of SQL, guided by a lightweight, confidence-aware agent. It's not about forcing every table into a single mold; it's about respecting the way tables are actually laid out and delivering answers that are both accurate and auditable. In a world where data integrity matters as much as speed, that's a difference you can see in the numbers, and in the confidence of the people who rely on them.
Key Takeaways (condensed)
- Spreadsheets rarely conform to neat schemas, so a hybrid approach that adapts to structure yields better QA results than a one-path solution.
- SQuARE uses a simple structural complexity score (header depth and merged cells) to route queries to either structure-preserving chunk retrieval or constrained SQL.
- For multi-header sheets, preserving header paths and units is essential to getting correct, auditable answers.
- For flat sheets, a schema-aware SQL path delivers fast, exact results with clear evidence.
- A confidence-aware agent that can switch paths or merge results improves robustness and reliability.
- The framework is modular and hardware-friendly, making it practical to deploy with existing embeddings, vector stores, and databases.
- Real-world impact spans finance, governance, research, and cross-sheet analytics, enabling more trustworthy spreadsheet QA and easier auditability.
If you'd like, I can help you sketch a lightweight, end-to-end prototype plan inspired by SQuARE for your own data pipelines, including a simplified routing heuristic and a minimal evidence-tracking component.