How a Data Lakehouse Cuts Reporting Time from 2 Days to 15 Minutes
This is an anonymized client case. Some implementation details and figures were simplified or rounded to protect confidentiality while preserving the technical approach and business outcomes.
TL;DR: In one anonymized mid-market migration, we replaced spreadsheet-heavy reporting and a legacy transactional database with an analytical stack built around object storage, dbt, and DuckDB. Reporting latency dropped from days to minutes, manual preparation work fell substantially, and reporting consistency improved during the parallel-run validation period.
Updated: March 3, 2026 | By Dmitry Susha, CTO & Co-Founder
The Starting Point
The company - a B2B distributor with 200+ employees - had a data problem common among mid-market businesses:
- 12 Excel spreadsheets maintained manually by 3 analysts
- Legacy PostgreSQL database with 5 years of transactional data (180GB)
- ERP system (1C) generating daily exports that analysts copy-pasted into Excel
- 2 days to produce a monthly sales report
- No single source of truth - different departments had different numbers
The CEO’s question was simple: “Why does it take my team 2 days to tell me how much we sold last month?”
The Architecture We Built
We designed a lightweight Lakehouse architecture using open-source tools:
Data Sources          Ingestion           Storage             Analytics
┌────────────┐      ┌───────────┐      ┌───────────┐      ┌────────────┐
│ PostgreSQL ├─────►│           │      │  Parquet  │      │   DuckDB   │
├────────────┤      │  Airflow  ├─────►│   on S3   ├─────►│            │
│ 1C Export  ├─────►│   DAGs    │      │ (Iceberg  │      │ Dashboard  │
├────────────┤      │           │      │  format)  │      │  queries   │
│   Excel    ├─────►│           │      └───────────┘      └─────┬──────┘
└────────────┘      └─────┬─────┘                               │
                          ▼                                     ▼
                    ┌───────────┐                        ┌────────────┐
                    │    dbt    │                        │  Metabase  │
                    │  models   │                        │ Dashboards │
                    └───────────┘                        └────────────┘
Key technology choices:
- DuckDB for the query engine (180GB fits comfortably, no server needed)
- Apache Iceberg on S3 for storage (versioned, ACID-compliant)
- dbt for transformations (SQL models with tests and documentation)
- Airflow for orchestration (scheduled ingestion from all sources)
- Metabase for dashboards (self-hosted, via DuckDB community connector)
- Terraform for infrastructure (reproducible, version-controlled)
Infrastructure cost dropped substantially compared to the managed database setup: the new stack's recurring spend is essentially S3 storage plus a small compute instance for Airflow.
The Migration Process
Week 1-2: Audit and Data Modeling
We started by mapping every data source, every Excel formula, and every report the team produced. Key findings:
- The majority of analyst time was spent on data preparation, not analysis
- 4 out of 12 spreadsheets contained duplicate data from the same source
- Monthly reports had recurring discrepancies between departments
- The PostgreSQL database had 47 tables, but only 8 were used for reporting
We designed a dimensional model with 8 fact tables and 12 dimension tables - covering 100% of existing reports with room for new analytics.
Week 3-4: Pipeline Construction
Using Airflow, we built automated ingestion pipelines:
- PostgreSQL: incremental changes were captured and materialized into analytical storage on a scheduled cadence (the exact transport layer depends on the client environment and reliability requirements)
- 1C: automated file export + Airflow sensor, processing within 30 minutes
- Excel: uploaded to S3, parsed and validated automatically with Great Expectations
All data landed in Parquet format on S3 with Iceberg metadata. dbt models transformed raw data into analytical tables with built-in tests - referential integrity, value ranges, freshness checks.
Week 5-6: Dashboards and Validation
We rebuilt every Excel report as a Metabase dashboard. The CEO’s monthly sales report became a live dashboard updating every 15 minutes.
Validation process:
- Ran old and new reports in parallel for 2 weeks
- Compared numbers line by line - identified and fixed 3 data quality issues in the source systems
- Got sign-off from each department head
The rollout was designed to minimize disruption by running the legacy and new reporting paths in parallel until stakeholders signed off on validated outputs.
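The line-by-line comparison can be reduced to a simple idea: take per-department totals from the legacy report and from the new dashboard, and flag any pair that differs beyond a rounding tolerance. A toy version, with made-up figures:

```python
# Totals from the legacy Excel report vs. the new dashboard, per department.
# All figures are invented for illustration.
legacy = {"sales": 1_240_500.00, "logistics": 88_310.50, "finance": 402_775.25}
lakehouse = {"sales": 1_240_500.00, "logistics": 88_310.55, "finance": 402_775.25}

TOLERANCE = 0.10  # currency units; anything above this gets investigated

mismatches = {
    dept: (legacy[dept], lakehouse[dept])
    for dept in legacy
    if abs(legacy[dept] - lakehouse[dept]) > TOLERANCE
}
assert not mismatches, f"needs review before sign-off: {mismatches}"
```

In practice each flagged mismatch is a lead: during this project, every one traced back to a data quality issue in a source system rather than to the new pipelines.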
The Results
Before vs After
| Metric | Before (Excel + PostgreSQL) | After (Lakehouse) |
|---|---|---|
| Monthly sales report | 2 days manual work | 15 minutes (automated) |
| Data freshness | Daily (manual export) | 15-minute lag |
| Number of data sources | 3 (disconnected) | 3 (unified) |
| Manual data prep | Majority of analyst time | Minimal (automated pipelines) |
| Report discrepancies | Recurring between departments | Converged on a single trusted definition during rollout |
| Infrastructure cost | Managed DB baseline | ~90% lower (S3 + DuckDB) |
| Analyst time on reporting | 3 analysts full-time on data prep | Substantially reduced; redirected to analysis |
ROI Summary
The project paid for itself within the first quarter. The largest savings came from infrastructure (managed DB → S3 + DuckDB) and analyst time freed from manual data preparation. The team redirected that time to actual analysis - finding patterns, testing hypotheses, and supporting business decisions.
What We Learned
1. Start with the CEO’s question, not the technology. The project succeeded because we solved a specific business problem (“why does the report take 2 days?”), not because we deployed cool technology.
2. Excel is not the enemy - it’s the symptom. The real problem was disconnected data sources and no automation. Analysts used Excel because there was no better option.
3. For this workload, DuckDB was sufficient. 180GB of data, 8 fact tables, 50+ dashboard queries - DuckDB handled everything on a single instance without introducing a heavier serving stack.
4. dbt tests catch problems early. Within the first week of production, dbt tests caught a schema change in the 1C export that would have silently broken 3 reports. Automated quality checks paid for themselves immediately.
5. Parallel running removes fear. The biggest barrier to migration isn’t technical - it’s trust. Running old and new systems in parallel let stakeholders verify numbers at their own pace.
Frequently Asked Questions
How long does it take to migrate from Excel to a Data Lakehouse?
A typical migration takes 4-8 weeks for the MVP phase. This includes connecting data sources, building core pipelines, and deploying initial dashboards. Full automation with AI quality checks adds another 4-8 weeks.
What ROI can I expect from a Data Lakehouse?
Based on our implementations, companies typically see significant reductions in reporting time and manual data preparation. Infrastructure costs drop substantially compared to managed warehouse setups. Exact numbers depend on data volume, source complexity, and current stack - we share specific estimates after a discovery call.
Can we keep using Excel during the migration?
Yes. We run the Lakehouse in parallel with existing systems. Your team continues using Excel and legacy tools while we build and validate the new platform. The switch happens only after thorough testing.
Key Takeaways
- Reporting time dropped from 2 days to 15 minutes - fully automated
- Infrastructure costs dropped ~90% (managed DB → S3 + DuckDB)
- Analyst time on manual data prep reduced dramatically
- DuckDB handled 180GB on a single instance - no complex infrastructure needed
- Project paid for itself within the first quarter
Further Reading
- What Is a Data Lakehouse and Why It’s Replacing Traditional Warehouses - the architecture and cost breakdown explained
- ClickHouse vs DuckDB vs Snowflake: Choosing the Right Engine - when to use each tool
Want similar results for your business? Book a free data audit call - we’ll assess your current setup and estimate potential improvements.
Sources
- Metabase DuckDB Connector - community connector status and setup
- DuckDB S3 Support - reading from object storage
- AWS S3 Pricing - storage cost reference
- DuckDB: Working with Huge Databases - scaling guidance
Some details in this case study were simplified or rounded to protect client confidentiality. Outcomes should not be treated as guaranteed for other environments.
Reviewed by Dmitry Susha, CTO & Co-Founder at Sfotex. Last reviewed: March 2026. Contact: Telegram | Email