Automating Firefly III budgeting
I have been running Firefly III for personal budgeting for a while now. It is a solid self-hosted finance manager, but the wife approval factor is pretty low, thanks to its double-entry bookkeeping system. What bothered me most was that importing transactions from the bank leaves you with a wall of raw bank descriptions: no categories, no destination accounts, and no real sense of what you actually spent money on. The only way around that is entering every transaction by hand. So you end up either categorising everything manually, which defeats the point of importing, or not doing it at all, which makes your budgets meaningless.
I decided to fix that with a small pipeline of two Python scripts and a bash wrapper, which now runs every morning without me touching it.
The problem
Enable Banking pulls transactions in from my bank once a day. What arrives is little more than an amount and a raw bank description.
The amounts are right, but everything else is not useful. A supermarket purchase has no category. A parking payment has no destination account. A transfer into savings, which is an internal move between my own accounts, arrives as a deposit with an unknown source. None of this feeds into budgets until it gets cleaned up.
Doing it manually for a few transactions is fine. Doing it for a hundred at once, which is what I found myself doing every month when I sat down to go through the budget, is not enjoyable.
What the pipeline does
Everything is wrapped in a single bash script that cron runs at 7am. It runs the steps in sequence and sends a summary email at the end.
The first step runs the Firefly data importer, which pulls the last two days of transactions from Enable Banking. Deduplication is handled by the importer using an external ID from the bank, so running it daily does not create duplicates even if a transaction appeared in yesterday’s window too.
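The overlapping two-day window is only safe because of that deduplication. As a rough illustration of the idea (not the importer's actual code), here is an idempotent import keyed on the transaction type plus the bank's external ID, since the post later notes that the importer keys off both. The dict shape and the `import_window` name are hypothetical.

```python
from typing import Iterable


def import_window(existing: dict[tuple[str, str], dict],
                  incoming: Iterable[dict]) -> tuple[int, int]:
    """Add unseen transactions to `existing` and return (new, duplicate) counts.

    Each transaction is a hypothetical dict with at least 'type' and
    'external_id' keys. The (type, external_id) pair is the dedup key.
    """
    new = duplicates = 0
    for tx in incoming:
        key = (tx["type"], tx["external_id"])
        if key in existing:
            duplicates += 1  # already imported by an earlier, overlapping run
        else:
            existing[key] = tx
            new += 1
    return new, duplicates


store: dict[tuple[str, str], dict] = {}
monday = [{"type": "withdrawal", "external_id": "A1"},
          {"type": "withdrawal", "external_id": "A2"}]
tuesday = [{"type": "withdrawal", "external_id": "A2"},  # overlaps Monday's window
           {"type": "withdrawal", "external_id": "A3"}]
print(import_window(store, monday))   # (2, 0)
print(import_window(store, tuesday))  # (1, 1)
```

In this model, the next import of a bank record still arrives as a withdrawal. A stored copy that had been converted into a transfer would therefore no longer match its key, which is the kind of breakage described below for internal transfers.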
The second step runs firefly_enrich.py. It fetches the incomplete transactions from the last two days (any that are missing a category or a destination account) and sends them to Claude in batches of twenty via the Claude Code CLI. Claude gets the full list of my categories, expense accounts, asset accounts and tags, and returns structured suggestions for each transaction. The script applies those suggestions and leaves anything it can’t match untouched. There is also an interactive mode that prompts me to create an account when it cannot match a destination, but I usually just go through my data once a month and fix any destinations that didn’t get matched.
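The selection and batching step can be sketched as follows. Firefly's API actually returns nested transaction groups, so the flat dicts with `date`, `category_name` and `destination_name` keys are a simplifying assumption. The function names and the inclusive date cutoff are also assumptions.

```python
from datetime import date, timedelta
from typing import Iterator

BATCH_SIZE = 20  # the post sends batches of twenty


def is_incomplete(tx: dict) -> bool:
    """True if the transaction lacks a category or a destination account."""
    return not tx.get("category_name") or not tx.get("destination_name")


def recent_incomplete(transactions: list[dict], days: int, today: date) -> list[dict]:
    """Incomplete transactions dated within `days` of `today` (inclusive cutoff)."""
    cutoff = today - timedelta(days=days)
    # The first 10 characters of an ISO date or datetime string are YYYY-MM-DD.
    return [tx for tx in transactions
            if date.fromisoformat(tx["date"][:10]) >= cutoff and is_incomplete(tx)]


def batches(items: list, size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Forty-five incomplete transactions, for example, would go out as three prompts of 20, 20 and 5.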
The third step runs receipt_split.py. It processes any receipt images sitting in a folder, sends each one to Gemini to read the line items, finds the matching Firefly transaction by amount and date, and splits it into individual entries per line item, each with its own category.
When everything finishes, the script sends a summary email via SMTP2GO.
The enricher
The enricher was the first thing I built and the most useful. The core idea is simple: collect everything Claude needs to know into a single prompt, send a whole batch of transactions at once rather than one at a time, and parse the response back into structured updates.
The prompt is assembled fresh on each run. It contains four lists pulled live from the Firefly API: every category, every expense account, every asset account, and every existing tag. Then it appends the batch of transactions to classify, each on a numbered line with its date, type, amount, description, and current source and destination. Finally, the instructions tell Claude which fields to return for each transaction.
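A rough sketch of that prompt layout follows. The exact wording, the `|` separators and the JSON keys requested at the end are hypothetical, apart from `is_internal`, which the post names. `ask_claude` uses `claude -p`, Claude Code's non-interactive print mode. The transaction dict keys are assumptions.

```python
import subprocess


def build_prompt(categories: list[str], expense_accounts: list[str],
                 asset_accounts: list[str], tags: list[str], batch: list[dict]) -> str:
    """Assemble a classification prompt (hypothetical wording and layout)."""
    lines = [
        "Categories: " + ", ".join(categories),
        "Expense accounts: " + ", ".join(expense_accounts),
        "Asset accounts: " + ", ".join(asset_accounts),
        "Existing tags: " + ", ".join(tags),
        "",
        "Transactions:",
    ]
    # One numbered line per transaction, so answers can be mapped back by index.
    for i, tx in enumerate(batch, start=1):
        lines.append(
            f"{i}. {tx['date']} | {tx['type']} | {tx['amount']} | "
            f"{tx['description']} | source: {tx.get('source_name') or 'unknown'} | "
            f"destination: {tx.get('destination_name') or 'unknown'}"
        )
    lines += [
        "",
        "Return a JSON array with one object per transaction, with keys:",
        "index, category, destination, tags, is_internal.",
        "Only use names from the lists above; use null when nothing fits.",
    ]
    return "\n".join(lines)


def ask_claude(prompt: str) -> str:
    """Send the prompt through the Claude Code CLI and return its stdout."""
    result = subprocess.run(["claude", "-p", prompt],
                            capture_output=True, text=True, check=True)
    return result.stdout
```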
Claude responds with a JSON array containing one object per transaction.
Giving Claude the actual account and category names matters more than it might seem. A suggestion of “Backblaze” as a destination account only does anything if “Backblaze” already exists in Firefly and is spelled the same way. Passing the real lists on every run keeps the suggestions constrained to things the script can apply, and picks up any accounts or categories I have added since the last run. The script also double-checks each suggestion against those lists before applying it, so if Claude invents a category that does not exist, the suggestion is quietly dropped rather than failing against the API.
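One way the double-check might look is shown below. The response schema (`index`, `category`, `destination`, `tags`) is hypothetical apart from `is_internal`. `destinations` would be the union of expense and asset account names. Models sometimes wrap JSON in prose or code fences, so the sketch slices out the outermost array before parsing.

```python
import json


def _known(name, names: set[str]):
    """Return `name` if it is a string in `names`, otherwise None."""
    return name if isinstance(name, str) and name in names else None


def parse_suggestions(raw: str, categories: set[str], destinations: set[str],
                      batch_size: int) -> list[dict]:
    """Parse Claude's JSON array and null out anything not in the known lists."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array in response")
    data = json.loads(raw[start:end + 1])
    cleaned = []
    for item in data:
        idx = item.get("index")
        if not isinstance(idx, int) or not 1 <= idx <= batch_size:
            continue  # cannot map this suggestion back to a transaction
        cleaned.append({
            "index": idx,
            # Invented names become None instead of failing against the API.
            "category": _known(item.get("category"), categories),
            "destination": _known(item.get("destination"), destinations),
            "tags": [t for t in item.get("tags") or [] if isinstance(t, str)],
            "is_internal": bool(item.get("is_internal", False)),
        })
    return cleaned
```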
The category and the destination are deliberately separate concerns. The category is what drives budgets and reports, so it stays a small, controlled list. The destination is the expense account, meaning the store or payee, which can be a much longer list and grows over time as new shops appear. A withdrawal at a supermarket gets category Groceries and destination Local Supermarket. If that supermarket has never been seen before, the script sets the destination to null (when run interactively, it offers to create the expense account on the spot), and I fix it manually at the end of the month. Almost all of my transactions go to the same five to ten destinations, so I rarely have to add a new one.
Internal transfers were the fiddly part. Enable Banking imports a transfer between my own accounts as a withdrawal and a matching deposit, both with unknown counterpart accounts. The obvious move would be to convert them to Firefly’s native transfer type, but that broke deduplication on subsequent imports, since the importer keys off the transaction type and external ID. So the script takes a lighter approach: it leaves the type alone and just fixes the account names. The withdrawal gets the correct destination asset account, the deposit gets the correct source asset account. They stay as two separate transactions, but the ledger now reads correctly. Claude flags these with the is_internal field by recognising descriptions like “From Savings” combined with a round amount moving between known asset accounts.
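The "fix the accounts, keep the type" rule is small enough to sketch directly. `source_name` and `destination_name` match Firefly III's transaction split fields as far as I know. The `{"transactions": [...]}` wrapper is my assumption about the update body, so check the API reference before relying on it. The function name is hypothetical.

```python
def internal_transfer_update(tx: dict, counterpart: str, asset_accounts: set[str]) -> dict:
    """Build the account fix for one leg of an internal transfer.

    The transaction type is deliberately left out of the update, because
    changing it broke deduplication on later imports.
    """
    if counterpart not in asset_accounts:
        raise ValueError(f"{counterpart!r} is not a known asset account")
    if tx["type"] == "withdrawal":
        fields = {"destination_name": counterpart}  # money went to my other account
    elif tx["type"] == "deposit":
        fields = {"source_name": counterpart}       # money came from my other account
    else:
        raise ValueError(f"unexpected transaction type: {tx['type']!r}")
    # Assumed shape of a Firefly III transaction update body.
    return {"transactions": [fields]}
```

Checking the counterpart against the live asset account list follows the same "only apply names that exist" rule as categories and expense accounts.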
Tags came later, once I realised categories alone were not granular enough. A transaction can have a category of IT and tags of Subscription and Netflix at the same time. The category feeds the budget; the tags let me filter across categories later, for instance pulling up everything tagged Subscription regardless of whether it landed in IT, Entertainment or Utilities. Claude suggests new tags where it sees a pattern, and it is told not to repeat tags the transaction already carries, which avoids piling duplicates onto the booked and import tags the importer adds automatically.
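Even with that instruction, it is cheap to enforce "no repeated tags" in code as well. Here is a minimal sketch. The case-insensitive comparison is my choice, not something the post specifies. It assumes a Firefly update replaces the whole tag list, which is why the existing tags are kept at the front.

```python
def merge_tags(existing: list[str], suggested: list[str]) -> list[str]:
    """Keep existing tags (such as the importer's own) and append only new ones."""
    seen = {tag.casefold() for tag in existing}
    merged = list(existing)
    for tag in suggested:
        key = tag.strip().casefold()
        if key and key not in seen:  # skip blanks and anything already present
            merged.append(tag.strip())
            seen.add(key)
    return merged
```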
Running it in test mode first is worth it. Test mode dry-runs on the first ten transactions, shows what Claude would do, and exits without changing anything. Once that looks reasonable, --dry-run does the same for all matching transactions, and dropping the flags runs it for real. For the daily cron job it runs with --yes --days 2, which applies suggestions automatically and only looks at the last two days. When I want to catch up on a backlog, I run it by hand with a wider window such as --days 30.
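The flags map naturally onto `argparse`. `--dry-run`, `--yes` and `--days` come from the post. The test-mode flag name isn't given, so `--test` below is hypothetical, as are the default of two days and the `limit` attribute.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="firefly_enrich.py")
    parser.add_argument("--test", action="store_true",
                        help="dry-run on the first ten transactions, then exit "
                             "(hypothetical flag name)")
    parser.add_argument("--dry-run", action="store_true",
                        help="show suggestions for everything without applying them")
    parser.add_argument("--yes", action="store_true",
                        help="apply suggestions without asking")
    parser.add_argument("--days", type=int, default=2,
                        help="how many days back to look")
    args = parser.parse_args(argv)
    # Test mode is a dry run restricted to ten transactions; it never writes.
    args.limit = 10 if args.test else None
    if args.test:
        args.dry_run = True
    return args
```

The cron invocation would then be `parse_args(["--yes", "--days", "2"])`, and a backlog run `parse_args(["--days", "30"])`.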
The receipt splitter
The enricher handles categories at the transaction level. That is good enough for most things but not for grocery shops. A 217 DKK supermarket transaction categorised as Groceries tells you nothing about whether you are spending more on meat or cleaning products or snacks. That detail only exists on the receipt.
The splitter reads a receipt image, extracts every line item, and converts the single transaction into multiple splits in Firefly, each with its own category and amount. Discounts get absorbed into the item price rather than recorded as separate lines. If a pack of toilet paper rings up at 30 kr with a 5 kr member discount, the split records 25 kr, not 30 kr with a discount line next to it. The point is that each split reflects what I actually paid, so the sum of the splits matches the bank transaction and a discounted week shows up as cheaper.
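The discount rule is simple arithmetic. Here it is expressed as code, assuming discounts appear as negative lines directly after the item they apply to. The post asks Gemini to do this in the prompt, so treat this as an illustration or a post-check, not the author's implementation. `Decimal` avoids binary floating-point rounding when the splits are summed.

```python
from decimal import Decimal


def absorb_discounts(lines: list[tuple[str, Decimal]]) -> list[tuple[str, Decimal]]:
    """Fold each negative (discount) line into the item just above it."""
    items: list[list] = []
    for name, amount in lines:
        if amount < 0 and items:
            items[-1][1] += amount  # e.g. 30 kr toilet paper + (-5 kr) -> 25 kr
        else:
            items.append([name, amount])  # a leading discount is kept as-is
    return [(name, amount) for name, amount in items]
```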
The image goes to Gemini rather than Claude, since Gemini has a free tier and receipt OCR is well within what the smaller models can handle. The prompt gives Gemini the same category list the enricher uses and asks for structured output.
The rules in the prompt do the heavy lifting: absorb per-item discounts into the price; ignore summary lines such as the total, VAT and payment method; treat repeated identical lines as separate units; convert Danish comma decimals to dot-separated floats; and always assign a category rather than leaving it null. Gemini then returns the line items in that structured form.
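The comma-decimal rule is easy to get subtly wrong, because Danish formatting also uses a dot as the thousands separator. To enforce the same rules in code rather than only in the prompt, a minimal sketch might look like this. The summary keyword list is a hypothetical starting point, not taken from the post, and exactly two decimal places are assumed.

```python
import re
from decimal import Decimal

# Hypothetical keywords for summary lines ("i alt" = total, "moms" = VAT).
SUMMARY_WORDS = ("total", "i alt", "moms", "vat", "dankort", "visa", "mastercard")
AMOUNT = re.compile(r"-?\d[\d.]*,\d{2}")  # e.g. 1.234,50 or -5,00


def parse_dkk(text: str) -> Decimal:
    """Convert a Danish-formatted amount such as '1.234,50' to Decimal('1234.50')."""
    match = AMOUNT.search(text)
    if not match:
        raise ValueError(f"no Danish amount in {text!r}")
    # Drop thousands separators first, then turn the decimal comma into a dot.
    return Decimal(match.group().replace(".", "").replace(",", "."))


def is_summary_line(name: str) -> bool:
    """Heuristic check for total, VAT and payment-method lines."""
    lowered = name.casefold()
    return any(word in lowered for word in SUMMARY_WORDS)
```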
The script then sums the line items and compares against the receipt total. If they differ by a rounding cent or two it adjusts the last line to reconcile, and if they differ by more than a krone it warns that Gemini likely missed an item.
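The reconciliation step can be sketched like this. The two thresholds mirror "a cent or two" and "more than a krone". The post doesn't say what happens to a gap between those two, so this sketch leaves the lines alone and reports it. The function name and return shape are hypothetical.

```python
from decimal import Decimal
from typing import Optional

ROUNDING_TOLERANCE = Decimal("0.02")      # "a rounding cent or two"
MISSING_ITEM_THRESHOLD = Decimal("1.00")  # "more than a krone"


def reconcile(amounts: list[Decimal], total: Decimal) -> tuple[list[Decimal], Optional[str]]:
    """Return possibly adjusted line amounts and an optional warning."""
    if not amounts:
        return [], "receipt has no line items"
    diff = total - sum(amounts, Decimal("0"))
    if diff == 0:
        return amounts, None
    if abs(diff) <= ROUNDING_TOLERANCE:
        # Push the rounding difference into the last line so the splits sum exactly.
        return amounts[:-1] + [amounts[-1] + diff], None
    if abs(diff) > MISSING_ITEM_THRESHOLD:
        return amounts, f"items differ from total by {diff}; Gemini likely missed an item"
    return amounts, f"items differ from total by {diff}"
```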
Matching the receipt to a transaction is done on amount plus date. The receipt date is the purchase date, but the bank’s transaction date can be several days later, for example when a purchase on a Thursday is not processed until the following Monday. The default three-day window was not wide enough, so the script now searches seven days either side. It also skips any transaction that already has more than one split, so running it twice on the same receipt cannot create duplicate entries. Firefly itself is the source of truth for whether a receipt has already been processed.
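The matching rule can be sketched as a filter over candidate transactions. The candidate dict keys (`amount`, `date`, `splits`) are hypothetical. Picking the closest date when several transactions qualify is my own tie-break, which the post does not describe.

```python
from datetime import date, timedelta
from decimal import Decimal
from typing import Optional

WINDOW = timedelta(days=7)  # seven days either side of the receipt date


def find_match(receipt_total: Decimal, receipt_date: date,
               candidates: list[dict]) -> Optional[dict]:
    """Pick an unsplit transaction with the same amount, closest in date."""
    matches = [
        tx for tx in candidates
        if tx["amount"] == receipt_total
        and abs(tx["date"] - receipt_date) <= WINDOW
        and tx["splits"] <= 1  # more than one split means it was already processed
    ]
    if not matches:
        return None
    return min(matches, key=lambda tx: abs(tx["date"] - receipt_date))
```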
Getting receipts into the folder is handled by an iOS shortcut on the share sheet. It takes a screenshot or photo, gives it a timestamped filename, and uploads it to SFTPGo over WebDAV. The next morning’s run picks it up. A successful split deletes the image; a failure leaves it in place and logs the reason, so nothing is lost if Gemini is having a bad day.
The summary email
Each run ends with an email so I do not have to go looking. Early versions dumped the entire importer log into the body, which meant hundreds of lines of DEBUG noise. Now the importer output is filtered down to the lines that matter (the new and duplicate counts), and the email opens with a three-line overview of the run.
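The filtering can be as simple as a pair of regular expressions. The post doesn't show the importer's log format, so both patterns below are placeholders to adapt to whatever the importer actually prints.

```python
import re

# Placeholder patterns; adjust to the importer's real log wording.
KEEP = re.compile(r"\b(new|duplicates?)\b", re.IGNORECASE)
NOISE = re.compile(r"\bDEBUG\b")


def summarise_importer_log(log_text: str) -> list[str]:
    """Drop DEBUG noise and keep only lines that mention new or duplicate counts."""
    return [line.strip() for line in log_text.splitlines()
            if not NOISE.search(line) and KEEP.search(line)]
```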
Full detail only gets appended when a step fails. The subject line is prefixed with [FAILED] on any error, so once the thing is running smoothly I can filter the inbox and only get pinged when something needs a human.
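Here is a sketch of the "short overview, detail only on failure" email using the standard library. The `(name, succeeded, detail)` step tuples and the subject text are hypothetical. `send` assumes SMTP2GO's SMTP relay at `mail.smtp2go.com` on port 587 with STARTTLS, so check your own account settings for the host, port and credentials.

```python
import smtplib
from email.message import EmailMessage


def build_summary(steps: list[tuple[str, bool, str]], sender: str, recipient: str) -> EmailMessage:
    """steps holds (name, succeeded, detail) for each pipeline step."""
    failed = any(not ok for _, ok, _ in steps)
    subject = "Firefly pipeline summary"
    msg = EmailMessage()
    msg["Subject"] = f"[FAILED] {subject}" if failed else subject
    msg["From"] = sender
    msg["To"] = recipient
    body = "\n".join(f"{name}: {'ok' if ok else 'FAILED'}" for name, ok, _ in steps)
    # Full detail is appended only for steps that failed.
    for name, ok, detail in steps:
        if not ok:
            body += f"\n\n--- {name} ---\n{detail}"
    msg.set_content(body)
    return msg


def send(msg: EmailMessage, user: str, password: str,
         host: str = "mail.smtp2go.com", port: int = 587) -> None:
    """Send via an SMTP relay (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

A mail filter on the `[FAILED]` prefix is then enough to surface only the runs that need attention.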
One refinement that took a couple of iterations: when a receipt failed to parse, the script used to exit completely before it could log anything, so the email said “see log” and the log was empty. Raising the error instead of exiting lets the per-image loop catch it, record the reason, and carry on to the next receipt. The email now also tails the last ten lines of the receipt log whenever there is a failure, so the reason is right there in my inbox.
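The raise-versus-exit distinction matters in Python specifically. `sys.exit()` raises `SystemExit`, which derives from `BaseException`, so it slips past an `except Exception` handler and ends the whole loop. An ordinary exception does not. Below is a minimal sketch of the per-image loop along those lines. `ReceiptError`, the accepted file extensions and the `process_one` callback are hypothetical.

```python
import logging
from collections import deque
from pathlib import Path
from typing import Callable

log = logging.getLogger("receipt_split")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".heic"}  # assumed upload formats


class ReceiptError(Exception):
    """Raised for a receipt that cannot be processed, instead of calling sys.exit()."""


def process_folder(folder: Path, process_one: Callable[[Path], None]) -> tuple[int, int]:
    """Run process_one on each image: delete on success, keep and log on failure."""
    ok = failed = 0
    for image in sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES):
        try:
            process_one(image)
        except Exception as exc:  # ReceiptError lands here; SystemExit would not
            log.warning("%s: %s", image.name, exc)  # image stays for the next run
            failed += 1
        else:
            image.unlink()  # split succeeded, so the receipt is done
            ok += 1
    return ok, failed


def tail(path: Path, n: int = 10) -> list[str]:
    """Return the last n lines of a log file for the summary email."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```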
What it looks like in practice
After a few weeks of running this, I have a clear picture of where the money goes without much manual work. The enricher handles the straightforward cases automatically. The truly ambiguous ones, such as a peer-to-peer payment to a person or a currency transfer, it leaves alone rather than guessing, and I clean those up by hand now and then.
There are still rough edges. The enricher occasionally miscategorises something, and the Gemini models vary in accuracy on Danish receipt formats depending on which one is available on a given day. App-generated receipts are clean and parse reliably. Paper receipts photographed in bad light are a lottery.
But the volume problem is solved, and that was the whole point. The backlog of a few hundred uncategorised transactions I started with took one interactive session to clear. New transactions now arrive already categorised and tagged the next morning, and the summary email tells me at a glance whether anything needs attention. The budget went from a chore I avoided to something that mostly maintains itself.