We built a marketing dashboard we control, without API access to the main source
We built a marketing dashboard a client owns outright. One database we control, refreshed every day, with every chart reading from a single agreed definition of each number. One source had no way for us to pull its data automatically, so we read the numbers straight off the report image with an AI model and wrote them into the database. We call that pattern image-to-row ingestion. It replaced hours of manual reporting and closed a measurable staleness gap. There is a catch: we have to keep it running.
The reporting that didn't agree with itself
Picture the head of marketing on a Monday. Four tabs open. The paid scorecard says one thing about last week. The funnel report says something that cannot both be true. Neither was wrong, exactly. They were built at different times, off different copies of the data, with the word "lead" meaning two different things in two different tools.
That gap cost more than confidence. It cost hours every reporting cycle in copy-paste and reconciliation, and it meant the numbers people acted on were already stale by the time they hit a slide.
There are plenty of tools for this. We did not buy one. The client wanted one thing: a dashboard he could actually trust. As the agency building it, we wanted two more. We wanted to own the data and the definitions rather than rent them from a vendor. And we needed a way to handle the one source we could not pull from automatically.
That last constraint is the interesting part. The standard advice assumes every source has an API you can connect to. This is the implementation diary for the case where one does not.
A flag before we start: this is a control story, not a cost story. We are not claiming this is the cheapest dashboard or the best one. We picked control and accepted the maintenance.
Why is building a marketing dashboard so expensive and so fragile?
Most marketing dashboards get expensive and brittle for four reasons, and they stack: you pay per connector, you pay more as usage grows, you pay to keep it from breaking, and you pay in arguments about whose number is right. We call these the four taxes.
The connector tax
You pay to plug each source in. Connector pricing varies by partner and source, and every extra feed adds another dependency. None of this is the dashboard. It is the cost of getting data near the dashboard.
The escalation tax
The entry price is not the price. Connector and BI setups can start in the hundreds per month and climb into the thousands as data volume grows. Power BI, Tableau and Funnel all follow the same pattern: the number that gets you in the door is not the number you pay in year two.
The maintenance tax
Even when you build it yourself, it breaks quietly. A source changes its API and a chart goes blank, or worse, goes wrong without telling you. Self-hosting saves licence fees and spends engineer hours instead.
The governance tax
This is the one the pricing pages do not list. When data accuracy and data silos block forecasting, what teams are describing is the Monday-morning problem: two reports, two definitions, no single source of truth.
We pay a version of the first tax too. We pay a flat connector fee on the clean sources, same as anyone. And we took the maintenance tax on deliberately, because it is the price of control. What this build removes is the governance tax, structurally, by making one definition the only definition.
The shape of the build: two lanes in, one definition out
The whole system is two lanes feeding one store. Lane A is the easy one. For sources we can connect to normally, the data flows straight into the database on a schedule. Lane B is the read-only source: the one we can only look at, not pull from.
No API available to us, so the data arrives as a scheduled report image in an inbox. We read the printed numbers off that image with a vision model and write them into the database as a row. That is image-to-row ingestion. The vendor world treats reading a report image as a failure state. We made it a normal, supported lane.
Both lanes land in the same place: one daily table in the database. Every chart reads from one shared set of metric definitions that sits on top of that table. That last sentence is the entire control thesis.
Where should the data actually live?
It should live in a database you own, kept as a fresh dated copy every day, not in a vendor's private store you cannot see into or check. We used Postgres, hosted on Supabase. The setup keeps one line of data per metric, per source, per day, so nothing ever overwrites yesterday.
The pattern is what we call a snapshot archive: a photo of every number, taken daily and kept. You do not update yesterday's figure when today's arrives. You write a new dated line.
-- one row per metric, per source, per day. nothing gets overwritten.
create table metric_snapshot (
id bigint generated always as identity primary key,
snapshot_date date not null,
source text not null, -- 'lane_a_paid', 'read_only_report', etc.
metric_key text not null, -- references the shared definitions
metric_value numeric,
ingest_method text not null, -- 'connector' | 'image_to_row'
confidence text not null default 'ok', -- 'ok' | 'flagged' | 'null'
raw_ref text, -- pointer to source image / API payload
created_at timestamptz not null default now(),
unique (snapshot_date, source, metric_key)
); How do you define a metric once so every chart agrees?
You write the definition in one place, the database reads it, and every chart reads the database. No chart carries its own logic. Charts cannot disagree about "qualified lead" if none of them is allowed to define it.
Most dashboard arguments are not about the chart. They are about the definition behind it, scattered across five tools, each slightly different. Pull the definition into one owned place and the argument has nowhere to live.
How do you pull the sources that do have access?
For the clean-auth sources, the ones we are allowed to pull from, you set up an automatic daily pull into the database and move on. This is the boring lane and it should stay boring.
The only judgment call here is which sources are reliable enough to leave running unattended. A source with a stable, well-documented connection goes on autopilot. A source that changes its export format every few months does not, even if it technically offers an API.
How do you get data out of a source with no API?
You read the printed numbers off the report image with a vision model and write them into the database. The scheduled report lands in an inbox. n8n picks it up, hands the image to the AI model with a strict read-only-what-is-printed instruction, checks the result, and writes it in.
- The source emails a scheduled report, as an image, to a dedicated inbox.
- n8n watches that inbox and fires when a report arrives.
- n8n hands the image to the vision model with the extraction prompt below.
- The model returns JSON and only the values actually printed on the page.
- Validation runs.
- A clean line is written to the database, tagged
ingest_method = 'image_to_row'.
The model's only job is to read what is printed. Not to calculate, not to guess, not to help. Every instinct a chat model has toward being helpful is a liability here.
You are a precise data-extraction tool. You are given an image of a
marketing report. Your only job is to read values that are printed on
the image and return them as JSON. You do not calculate, infer, estimate,
or fill gaps. You are not an assistant. You do not explain.
RULES
1. Return only values that are visibly printed in the image.
2. If a value you are asked for is not printed, return null for it.
Never substitute a guess, a rounded figure, or a value derived from
other numbers on the page.
3. Do not perform arithmetic. If the image shows a total, read the
printed total. Do not add the parts yourself, even if they are shown.
4. Read numbers exactly as printed, including the decimal places shown.
Do not round. Do not reformat. Strip thousands separators only.
5. If a number is partly obscured, unreadable, or ambiguous, return null
for that field and add its key to "unreadable".
6. Do not interpret labels. Map only to the exact metric keys listed
below. If a printed label does not clearly match one of these keys,
ignore it.
7. Return only the JSON object. No preamble, no commentary, no markdown.
METRIC KEYS TO EXTRACT
[ list the exact metric keys expected in this report, e.g.
"spend", "impressions", "clicks", "leads", "qualified_leads" ]
OUTPUT SCHEMA
{
"report_date": "<the date printed on the report, or null>",
"values": {
"<metric_key>": <number or null>,
...
},
"unreadable": [ "<metric_key>", ... ],
"notes": "<only if the report's visible structure differs from the
expected keys, name what differs in one short sentence;
otherwise empty string>"
} How do you stop the model making numbers up?
You do not trust the extraction. You validate every row before it is allowed into the table, with numeric checks, label checks, and sanity bounds against the previous snapshot.
- ✓
Type and format checks. Every value is a number or an explicit null.
- ✓
No duplicate labels. If two metric keys resolve to the same printed value in a way that should not happen, flag it.
- ✓
Sanity bounds against the prior snapshot. Real spikes happen, so the row is flagged for a human rather than silently dropped.
- ✓
Read the model's notes. If the model reports a structural difference, the row is held for review automatically.
Anything flagged writes to a health table with the reason, and an alert goes out. The clean rows flow through. The flagged ones wait.
How do you present it so the client actually trusts it?
You show the number and you show how fresh it is, in the same view. We built the part the client actually looks at in Metabase and added two things most dashboards skip: caveat cards and a pipeline-health card.
Trust is not built by hiding the messy lane. It is built by showing it and showing it is checked. A client who can see that the read-only number was validated this morning trusts it more than a number with no history attached at all.
One thing this build is not: there is no "ask your data a question in plain English" feature here. The AI model lives in the import step, reading report images. It does not sit between the client and the charts.
How do you keep it current without babysitting it?
You schedule it and you let the health checks tell you when to look. The clean lane pulls daily. The image-to-row lane fires when the report lands. Validation runs on every row.
The honest version: this still needs an owner. Scheduled does not mean abandoned. Someone has to act on the alerts and re-key the extraction when a source changes shape. It is hours a month, not hours a week, but it is not zero.
What changed, in numbers
Anonymised, and shown as before-and-after differences plus our own running costs, because the client's actual figures are not ours to publish.
Manual reconciliation reduced to a scheduled process.
Numbers moved to the daily snapshot schedule.
Contradiction removed by construction, not negotiation.
- ·
Cost to us for each extra client: low, because the shared setup already exists.
- ·
Cost for one client building from scratch: higher, because that client pays for the full setup alone.
The artefacts
Everything you need to understand the build is ungated. The page shows the architecture, the prompts, the database shape, and the trade-offs.
- The full extraction prompt.
- The architecture diagram, including the two-lane image-to-row pattern.
- The database table setup.
- We can review whether this pattern fits your reporting stack.
- We can scope the ingestion, validation, and dashboard work.
- We can tell you when a simpler off-the-shelf setup is the better answer.
When you should not do this
This build is right for a specific situation. Here is where it is the wrong answer, plainly, because the wrong reader trying this will waste a month.
- ×
You have four or five clean sources and nothing read-only. Use free Looker Studio.
- ×
You have someone in-house who is comfortable with databases and has the time.
- ×
You need ready-made, branded dashboards for your own clients out of the box. A packaged BI plan may be the better fit.
- ×
All your sources have clean, affordable APIs. Skip the image-to-row lane entirely.
- ×
You would rather a vendor owned the maintenance than own the control. Completely legitimate.
- ×
You have huge data volume or hundreds of people hitting the dashboard at once.
- ×
You need the numbers updated minute by minute. This refreshes once a day.
If you have a source you cannot get an API to and a dashboard that argues with itself, that is the exact problem this solves.
Book a call and we will tell you whether it is worth doing for you, including when it is not.
Book a call