When Spreadsheets Speak: An Adaptive System That Gives Honest Answers From Complex Tables
Spreadsheets are the quiet workhorses of business, science, and policy. They hold budgets, forecasts, health stats, and all sorts of analyses that teams rely on every day. But asking a question like "What was Net PP&E growth between 2020 and 2023, in USD millions, for Company X?" can be a headache if the sheet has multi-row headers, merged cells, or a stray unit line sprinkled in between the numbers. Traditional approaches either squeeze the data into a flat table (losing header meaning) or run rigid SQL queries (which stumble when the schema isn't tidy). The result? Answers that are hard to audit, hard to verify, or just plain wrong.
Enter SQuARE: a hybrid retrieval and execution framework that adapts to how a sheet is laid out. Rather than forcing every table through one fixed path, SQuARE looks at a sheet's structure, routes the question to the path that preserves meaning, and uses a smart agent to blend signals when confidence is low. The goal is simple but powerful: return exact cells or rows as evidence, keep header context and units intact, and make results auditable and verifiable.
In this post, I'll unpack what SQuARE does, why it matters, and what it could mean for anyone who works with real spreadsheets, whether you're in finance, research, or governance.
What problem SQuARE is solving
- Real-world spreadsheets aren't tidy. They often have:
- Multi-row header zones that describe what data sits in each column across several levels of labels.
- Merged cells that convey important structure, like a group heading or a unit line (for example, "USD (millions)" placed between headers and values).
- Nonuniform layouts, with header paths, temporal labels, and embedded units sprinkled through the sheet.
- Naive chunking breaks meaning. If you break a sheet into fixed-size chunks or flatten it into text, you risk severing the relationships that a row or a header describes.
- Pure SQL can fail on irregular formats. SQL assumes a tidy schema and stable column semantics; it struggles with header nesting, unit rows, and ad hoc formatting.
SQuARE's core idea is to decide, for each sheet and each question, which path is safest:
- Structure-preserving chunk retrieval for complex headers and units.
- Schema-aware SQL on an automatically inferred relational view for flat, well-structured parts.
- An agent that monitors confidence and can merge results from both paths when helpful.
The backbone: a simple, effective complexity signal
Rather than a one-size-fits-all approach, SQuARE uses a sheet-level complexity score to decide how to route queries. The score is built from two approachable cues:
- Header depth: how many header rows exist above the data.
- Merge density: how many cells are merged in the header region (an indicator of tight coupling between header levels and units).
A single threshold then gates the decision: sheets that look "Multi-Header" are routed to structure-preserving chunks; flatter sheets can use SQL as well as chunking. Because merge density is normalized by the size of the header region, the decision scales with the actual sheet layout rather than relying on a brittle, global cutoff.
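As a rough illustration, that routing rule can be sketched in a few lines of Python. The `SheetStats` fields, the weights, and the threshold here are illustrative assumptions, not the system's actual calibration:

```python
from dataclasses import dataclass

@dataclass
class SheetStats:
    header_depth: int      # number of header rows above the data region
    merged_in_header: int  # count of merged-cell ranges in the header zone
    header_cells: int      # total cells in the header zone (for normalization)

def complexity_score(stats: SheetStats) -> float:
    """Blend header depth and merge density into one structural signal.

    Merge density is normalized by header size, so the score scales with
    how much header there actually is rather than using a fixed cutoff.
    """
    merge_density = stats.merged_in_header / max(stats.header_cells, 1)
    # Depth contributes directly; density is weighted up because even a few
    # merges in a small header usually signal tight header/unit coupling.
    return stats.header_depth + 3.0 * merge_density

def classify(stats: SheetStats, threshold: float = 2.0) -> str:
    return "Multi-Header" if complexity_score(stats) >= threshold else "Flat"
```

With these made-up weights, a three-row header with a few merges lands in the Multi-Header bucket, while a single clean header row stays Flat.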
How SQuARE actually works (at a high level)
Think of SQuARE as a modular conductor that decides how to retrieve evidence and how to answer. The stages roughly map to:
1) Complexity assessment
2) Index construction
3) Mode selection (structure-preserving chunks vs SQL)
4) Retrieval and verification
5) Evidence synthesis and fallback if needed
Here's what that looks like in plain terms:
Complexity scoring (stage A)
- Look at header depth and merge density in the sheet.
- Use a normalized rule so the decision adapts to how much header there actually is, rather than using a fixed page-by-page cutoff.
- This yields a label for the sheet: Multi-Header or Flat.
Index construction (stage A)
- For all sheets, build a semantic index that describes visible content in a concise way (more on this below).
- If the sheet is Flat, also build a relational (SQL-friendly) view with a cleaned schema.
Structure-preserving path (when Multi-Header)
- Segment the sheet along header boundaries so each block is a coherent, self-describing region.
- Attach metadata to blocks: header path (the exact sequence of labels from outer to inner headers), time labels, and units if present.
- Create a compact textual description for each block (e.g., "Block describes Year 2021-2023 figures for Company X under header 'Net Income' with unit USD millions").
- Convert these descriptions into embeddings and store them for semantic search.
- At query time, retrieve the top few blocks whose embeddings best match the question. Only a small number of blocks are forwarded to the answer model to keep context tight and precise.
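A minimal sketch of this path, with a token-overlap similarity standing in for real embedding search. The `Block` fields and the scoring function are illustrative assumptions, not SQuARE's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Block:
    header_path: list  # outer-to-inner header labels, e.g. ["Assets", "Net PP&E"]
    years: str         # temporal label, e.g. "2020-2023"
    unit: str          # e.g. "USD millions"
    rows: list         # the underlying cell rows (the evidence)

    def description(self) -> str:
        # Compact textual description used for semantic indexing.
        return f"{' > '.join(self.header_path)} | {self.years} | {self.unit}"

def similarity(query: str, text: str) -> float:
    # Jaccard token overlap: a stand-in for embedding cosine similarity.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q | t), 1)

def retrieve(query: str, blocks, k: int = 3):
    """Return the top-k blocks whose descriptions best match the question."""
    return sorted(blocks, key=lambda b: -similarity(query, b.description()))[:k]
```

In a real deployment the description would be embedded once at index time and the query matched against a vector store; the control flow stays the same.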
SQL-based path (when Flat or when the system is confident)
- Infer a cleansed schema: identify columns, their names, and their data types. Capture units if present.
- Derive a safe, constrained SQL query space (a guarded, limited form of SQL: SELECT-FROM-WHERE-GROUP BY-ORDER BY-LIMIT; no DDL/DML).
- Use a lightweight agent (an LLM) to propose a SQL query snippet from the question and the inferred schema, then execute it. If the first attempt doesn't yield a clean result, refine once or twice.
- Treat the returned rows as direct evidence for the answer.
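The guarded execution side of this path can be sketched with sqlite3. The `guard` regexes and the table-loading helper below are simplifying assumptions (a real system would also infer column types and capture units):

```python
import re
import sqlite3

ALLOWED = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|ATTACH|PRAGMA)\b", re.IGNORECASE
)

def guard(sql: str) -> str:
    """Reject anything outside the constrained SELECT-only query space."""
    if not ALLOWED.match(sql) or FORBIDDEN.search(sql) or ";" in sql.rstrip(";"):
        raise ValueError(f"Query outside the allowed SQL subset: {sql!r}")
    return sql

def load_flat_sheet(rows, columns, table="sheet"):
    """Build a relational view of a flat sheet in an in-memory database.

    Table and column names come from trusted schema inference, never from
    the user's question, so plain interpolation is acceptable here.
    """
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE {table} ({cols})")
    conn.executemany(
        f"INSERT INTO {table} VALUES ({', '.join('?' for _ in columns)})", rows
    )
    return conn

def run(conn, sql: str):
    """Execute a guarded query; the returned rows are the direct evidence."""
    return conn.execute(guard(sql)).fetchall()
```

An LLM-proposed query would be passed through `guard` before execution, so a hallucinated DDL/DML statement fails loudly instead of mutating data.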
Agent with confidence-aware fallback
- The system tracks a confidence score for the chunk-based path and for the SQL path.
- If the primary path is weak (low similarity to the question, empty results, or potential misalignment), the agent can switch modes or merge results from both paths.
- If the merged context is too large for the prompt budget, it's summarized, but only if the final evidence still passes the quality checks.
- The goal is to avoid giving an answer when confidence is too low, and to provide exact rows/cells when possible.
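One way to sketch that fallback logic, assuming each path is a callable returning evidence plus a confidence score in [0, 1]. The thresholds and the merge rule are illustrative, not the system's actual policy:

```python
def answer_with_fallback(question, chunk_path, sql_path, min_conf=0.5):
    """Route between paths; merge evidence or abstain when confidence is low."""
    chunk_ev, chunk_conf = chunk_path(question)
    sql_ev, sql_conf = sql_path(question)

    # Prefer a single strong path: its rows are the cleanest evidence.
    if sql_conf >= min_conf and sql_ev:
        return sql_ev, "sql"
    if chunk_conf >= min_conf and chunk_ev:
        return chunk_ev, "chunks"

    # Neither path is strong on its own: merge what we have, if anything.
    merged = (sql_ev or []) + (chunk_ev or [])
    if merged and max(sql_conf, chunk_conf) >= min_conf / 2:
        return merged, "merged"

    # Too weak: refuse rather than guess.
    return None, "abstain"
```

The important design choice is the last branch: when every signal is weak, the system declines to answer instead of returning something unverifiable.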
Evidence-first results and modularity
- SQuARE is designed to surface the exact cells or rows used to answer, preserving header context and units where it matters.
- The architecture is intentionally modular: embeddings, vector stores, and databases can be swapped without changing the overall control flow.
What the researchers actually did (in practice)
They tested three sheet families to cover the spectrum:
- Complex multi-header corporate spreadsheets (e.g., balance sheets from large companies).
- A complex, merged World Bank workbook with heterogeneous sheets.
- Flat, single-header public tables.
They built a sizable QA corpus: around 900 questions across those datasets, spanning easy lookups to harder multi-predicate and multi-year comparisons.
They compared two model backends for the retrieval and answering steps (Gemma 3:12B and Llama 3.2:11B) and used ChatGPT-4o as a tool-free baseline. Across settings, SQuARE with Gemma generally led the pack.
Key results to illustrate the impact
Multi-header corporate spreadsheets:
- SQuARE with Gemma achieved about 91% exact-match accuracy, vs. ~81% with Llama and ~29% with ChatGPT-4o. The big jump here highlights the value of preserving header paths and units.
Merged World Bank workbook:
- SQuARE Gemma around 86% accuracy, compared with 74% (Llama) and 54% (ChatGPT-4o). The gains again reflect the importance of a structure-aware path when layouts are complex.
Flat tables (five datasets, 450 QA pairs total):
- SQuARE achieved about 93% overall accuracy, with 87% on the hard tier. ChatGPT-4o lagged more on the hard items where exact filters and multi-column predicates matter.
- The dual indexing and routing (vector-based plus SQL) paid off, especially in harder questions that require precise filtering and aggregation.
Retrieval quality (how well the system surfaces the right evidence)
- For complex sheets, the chunk path reliably surfaced the necessary evidence within the top three retrieved chunks (roughly 86-100% recall in the best cases).
- For flat tables, SQL reached near-perfect top-1 retrieval when the schema could be cleanly inferred; the vector path helped only in a minority of hard cases.
- Ablations showed the value of the fallback and merge strategy: removing fallback reduced performance by several points, especially on complex sheets.
Latency and practical deployment notes
- The experiments ran on modest hardware (quantized models on a T4 GPU with 15 GB VRAM).
- End-to-end latency was in the same ballpark as the tool-free ChatGPT-4o, with the SQL path sometimes adding a small generation/refinement step.
- The system is designed with lazy indexing and caching so it remains predictable and affordable in practice.
Limitations and directions for the future
- Router learning and calibration: the current approach uses a prompt-based lightweight router. The authors suggest a learnable router with uncertainty estimates to reduce fallbacks and improve efficiency.
- Layout variability: OCRâd tables and document-style inputs introduce new challenges. Additional detectors and layout parsers could be integrated before applying SQuAREâs retrieval backbone.
- SQL robustness: schema drift and more complex joins require stronger schema alignment and safer join discovery. Extending to multi-sheet or cross-workbook queries is on the roadmap, with explicit, verifiable joins while maintaining the evidence-first interface.
- Broadening baselines: more tool-enabled baselines and perturbation tests would provide a broader view of robustness in real-world environments.
Real-world implications and applications
- Finance and governance: SQuARE's strength lies in questions that hinge on header hierarchies and units. Imagine auditors verifying a line item across several quarters with evolving units, or analysts comparing segments across years with merged headings. SQuARE preserves the exact cell blocks and units, making results auditable and easy to verify.
- Research and policy: World Bank-style datasets and cross-tab indicators are notoriously tricky to align. A hybrid path that maintains structure while enabling precise SQL-like filtering could accelerate exploratory data analysis and ensure reproducibility.
- Data integration and reporting: For organizations that assemble dashboards from a mosaic of spreadsheets, SQuARE could help answer questions that span multiple sheets with different layouts, reducing the need to normalize everything into a single tidy table.
How you might use or adapt SQuARE in your own workflow
- If your organization already uses spreadsheet-based analytics, consider building a lightweight prototype that:
- Detects header depth and merges in your sheets to classify them as Multi-Header or Flat.
- Creates a semantic index for complex sections (describe blocks with header paths and units) and a relational view for flat sections.
- Adds a small routing layer that chooses between a semantic, chunk-based search and an inferred SQL path, with a confidence-based fallback.
- Keeps exact cells or rows as evidence, so results stay auditable.
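Pulling the checklist together, a toy routing skeleton might look like the following. Everything here (the dict keys, the lexical matcher, the weights and threshold) is a hypothetical stand-in for the components described above:

```python
def classify_sheet(header_depth: int, merged: int, header_cells: int) -> str:
    # Structural signal from the post: depth plus normalized merge density.
    score = header_depth + 3.0 * (merged / max(header_cells, 1))
    return "Multi-Header" if score >= 2.0 else "Flat"

def prototype_answer(question: str, sheet: dict) -> dict:
    """Route one question through the lightweight prototype.

    `sheet` carries hypothetical keys: structural stats, a list of
    (description, rows) chunks for complex regions, and a run_sql callable
    for flat regions. Evidence rows are returned verbatim for auditability.
    """
    label = classify_sheet(
        sheet["header_depth"], sheet["merged"], sheet["header_cells"]
    )
    if label == "Multi-Header":
        # Naive lexical overlap stands in for semantic search over chunks.
        q_tokens = set(question.lower().split())
        scored = sorted(
            sheet["chunks"],
            key=lambda c: -len(q_tokens & set(c[0].lower().split())),
        )
        evidence = [rows for _, rows in scored[:3]]
    else:
        evidence = sheet["run_sql"](question)
    return {"label": label, "evidence": evidence}
```

A real prototype would swap the lexical matcher for embeddings and route `run_sql` through a guarded, LLM-proposed query, but the control flow above mirrors the checklist.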
- For teams serious about audit trails, the emphasis on evidence-first answers is a big win. It means you can point to the exact cells and units used to generate a conclusion, not just a summary or a paraphrase.
Key takeaways
- Adaptation beats one-size-fits-all. SQuARE doesn't push every question through the same pipeline. It assesses sheet structure and routes queries to the path best suited to preserve meaning and ensure verifiability.
- Structure-aware retrieval matters more on complex sheets. When multi-row headers and merged cells carry crucial context, structure-preserving chunking with semantic indexing preserves that context far better than flattening or rigid SQL alone.
- SQL over a learned, constrained schema is powerful for flat or well-behaved data. For simpler layouts, a schema-aware SQL path can deliver exact filtering and aggregation with high reliability.
- An agent with confidence-aware fallback improves robustness. If evidence is weak on one path, the system can switch paths or merge results, reducing the chance of giving an unreliable answer.
- Evidence-first outputs build trust. SQuARE emphasizes surfacing the exact rows or cells used to answer, along with header and unit context, which is crucial for auditability and validation.
- The approach is modular and extensible. The framework is designed so you can swap in different embeddings, vector stores, or database backends as needed, making it adaptable to evolving models and tools.
If you're curious about the future of spreadsheet QA, SQuARE offers a compelling blueprint: blend the best of structure-aware retrieval with the precision of SQL, guided by a lightweight, confidence-aware agent. It's not about forcing every table into a single mold; it's about respecting the way tables are actually laid out and delivering answers that are both accurate and auditable. In a world where data integrity matters as much as speed, that's a difference you can see in the numbers, and in the confidence of the people who rely on them.
Key Takeaways (condensed)
- Spreadsheets rarely conform to neat schemas, so a hybrid approach that adapts to structure yields better QA results than a one-path solution.
- SQuARE uses a simple structural complexity score (header depth and merged cells) to route queries to either structure-preserving chunk retrieval or constrained SQL.
- For multi-header sheets, preserving header paths and units is essential to getting correct, auditable answers.
- For flat sheets, a schema-aware SQL path delivers fast, exact results with clear evidence.
- A confidence-aware agent that can switch paths or merge results improves robustness and reliability.
- The framework is modular and hardware-friendly, making it practical to deploy with existing embeddings, vector stores, and databases.
- Real-world impact spans finance, governance, research, and cross-sheet analytics, enabling more trustworthy spreadsheet QA and easier auditability.
If you'd like, I can help you sketch a lightweight, end-to-end prototype plan inspired by SQuARE for your own data pipelines, including a simplified routing heuristic and a minimal evidence-tracking component.