Case study · Data engineering and business analytics

One source of truth for a café chain that outgrew its spreadsheets

A SQL project for a multi outlet café operation, built to replace three contradictory spreadsheet exports with a clean relational schema and a reporting layer that answers the questions operations actually ask every Monday morning.

SQLMySQLSchema designERDBusiness analyticsReporting

The problem

Every small chain hits the same wall. The first store runs on a notebook and a friendly POS. The second store adds a second spreadsheet. By the fifth store, three people are emailing three slightly different versions of "this month's revenue" and nobody can explain why the numbers don't match. The operations lead is making staffing and ordering decisions on whichever export they opened last.

The brief here was straightforward. Design a database that could hold the operational reality of a multi outlet café, covering outlets, products, categories, orders, customers and payments, and then write the SQL on top of it that answered the questions the business kept re asking. Which categories pay the rent. Which outlets are pulling above their weight. How much of revenue actually comes from regulars versus one time visitors.

The data

The raw inputs were the things you would expect from a café POS: a transactions export with order line items, a product list with category tags, an outlet directory, and a thin customer table tied to loyalty IDs where they existed. Nothing exotic, but also nothing relational. Categories were free text and inconsistent. The same outlet appeared with three different spellings. Orders had line items but no stable order ID across exports.

Before any analysis I built a normalised schema with the core entities (Outlet, Customer, Product, Product_Category, Orders, Payment, Reservation, Staff, Shift) plus the junction tables that make many to many relationships work cleanly (Order_Product, Outlet_Product, Staff_Shift), all wired together with proper foreign keys. Category labels were cleaned and mapped once, centrally, so every downstream query inherited a consistent taxonomy instead of having to redo the cleaning inline.

Loading the data was an exercise in restraint. The temptation was to also fix every weird data quality issue at ingest time, but I kept the staging tables faithful to the source and pushed the corrections into a clearly named transformation layer. That way, when operations inevitably asked "why does this number look different from my old spreadsheet?", I could show them the exact line of SQL that changed it and why.

Fig 0. Relational schema. Core entities and junction tables replacing three spreadsheet exports.

The approach

I built the analysis as a stack of layers rather than a pile of one off queries. The base layer was the normalised schema. On top of that I wrote a set of cleaned views, one per business concept (revenue per order, customer lifetime spend, outlet daily totals), and then a thin reporting layer of named queries that the operations team could call directly. Each layer had one job, and each one was easy to test in isolation.

The questions I prioritised were the ones that drove decisions. Where is revenue actually coming from, broken down by product category. Which outlets are over and under indexing relative to the chain average. How concentrated is revenue on loyal repeat customers versus one timers, and therefore how much marketing budget should defend the existing book versus chase new traffic. How does the product mix differ between outlets, so the procurement team isn't sending the same volume of every SKU to every store.

The thing I changed mid project was how I joined the customer table. My first pass tied every order to a customer ID via a left join and silently rolled walk in cash orders into a synthetic "unknown" customer. That made the loyalty analysis look much rosier than reality. I refactored it to split known and anonymous traffic into two tracks and report them side by side. The loyalty story got more honest and, oddly, more actionable.

Fig 1. Revenue by product category across all outlets

The result

A small number of categories did most of the work. Milk Coffees came in first at 63.70 in total revenue and Breakfast Meals followed at 55.50, tied with Salads & Bowls at the same 55.50, with the rest of the menu sitting noticeably below those three. Espresso Classics brought up the rear at just 3.50. That is the kind of finding that quietly justifies a simpler menu and a sharper marketing focus on the items that are already proven to pull. A long tail of low revenue categories looked tempting to cut on the surface, but cross checking against attachment rate showed several of them were ordered alongside the big sellers often enough to earn their place.

At the outlet level the gap was dramatic. Central pulled 269.00, Harbour pulled 249.40, and Campus came in at just 58.60. Central and Harbour are effectively the same size business with different product mixes, while Campus is a different category of problem entirely. The product mix per outlet view made it obvious that the chain was not one business but several. What sold in one location barely registered in another, which directly changed how procurement should think about allocation.

The loyalty split was the result that landed hardest with operations. Segmenting customers by the simple rule of two or more orders equals Loyal and a single order equals One timer flipped the usual story on its head. Campus, the weakest outlet on revenue, was actually the strongest on retention with 5 loyal customers and zero one timers. Central had 1 loyal customer alongside 8 one timers, and Harbour ran on a base of 10 one timers with no repeat visitors at all in the window. That single comparison reframed the conversation from "Campus is failing" to "Campus has a tiny but devoted base, while Harbour is burning through new traffic without converting any of it", which are very different problems with very different fixes.

Fig 2. Total revenue by outlet

Fig 3. Loyal versus one time customers per outlet

Outlet	Total orders	Completed	Revenue	Completion %
Central	14	11	124.30	79%
Harbour	12	10	106.50	83%
Campus	11	5	29.20	45%

Fig 4. Order to revenue flow by outlet

Fig 5. Product revenue by outlet (top sellers)

What's next

If I rebuilt this from scratch I'd model the analytical side as a small star schema from day one: a fact_orders table with dimension tables for outlet, product, customer and date, rather than evolving the OLTP tables into reports. The current shape works, but every new analytical question adds another awkward join, and a proper star would let the reporting layer stay flat and fast as the business grows.

I'd also add a thin views layer named for the business questions themselves: weekly_revenue_by_outlet, loyalty_share_by_month and top_products_by_outlet, so non technical staff can self serve through any BI tool without learning the underlying schema. The biggest win from a project like this isn't the dashboards I build. It's the queries the operations team starts writing for themselves once they trust the tables.

Curious about the schema, the SQL, or the ERD?

View on GitHub