An agent can do your receipt-and-expense bookkeeping. Vision API + folder watcher.

Q: What’s the mundane problem?

Self-employed workers, freelancers, and SMB owners spend 2 to 6 hours per month on receipt-to-spreadsheet entry [cite: https://en.wikipedia.org/wiki/Bookkeeping · 2026-03-10 · medium]. Phone photo → look at it → type vendor / date / amount / category into Xero or QuickBooks. Wash, repeat.

It’s the perfect agent task. Boring. Visual. Structured output. Verifiable.

Q: What’s the minimum stack?

Three components:

A folder where you drop receipt photos. Could be Dropbox, Google Drive, or a synced local folder.
A watcher that triggers when a new file lands.
A Claude vision call that extracts the structured fields and posts them to your accounting API [cite: https://docs.anthropic.com/en/docs/build-with-claude/vision · 2026-04-15 · high].

Q: What’s the prompt?

You are a receipt-extraction agent. For the receipt image attached,
output a single JSON object with these fields:

{
  "vendor": string (the merchant name),
  "date": string (ISO 8601 date),
  "amount_total": number (the grand total in the receipt's currency),
  "currency": string (3-letter ISO code),
  "tax_amount": number | null (VAT/sales tax, if shown),
  "category_guess": string (best guess: meals, travel, software, supplies, etc.),
  "confidence": number (0.0 to 1.0),
  "notes": string (anything unusual the human should review)
}

If a field is unclear or missing, set it to null and lower confidence.
Do not invent values.

Run that against claude-sonnet-4-5-20250929 with the receipt image attached. Accuracy on common formats lands above 95% [cite: https://reddit.com/r/Anthropic/comments/1sxj6s3/ · 2026-04-20 · medium].

Q: How does it wire up?

A watcher script (Python watchdog or Node chokidar) listens to your receipts folder. New file appears, the script:

Reads the image
Calls Claude vision with the prompt above
Parses the JSON
If confidence > 0.85 → posts to your accounting tool’s API (Xero, QuickBooks, FreeAgent — all have APIs) [cite: https://developer.xero.com/documentation/ · 2026-03-20 · high]
If confidence < 0.85 → moves the file to a “review” folder for human eyes

Total: ~50 lines of code. Most of it is the accounting API integration, not the AI part.

Q: What can’t the agent do?

Match a receipt to a specific bank transaction. That’s reconciliation. Different problem. Most accounting tools do it themselves once the receipt entry exists.
Decide if an expense is tax-deductible. Tax categorisation is jurisdiction-specific and gets you fined if wrong. The agent should mark category as a “guess” and the human approves.
Read handwritten notes. Yes it can OCR, but accuracy drops. Print receipts have ~99% extraction; handwritten edges land closer to 70%.

Q: Is this a privacy risk?

You’re sending receipt images to Anthropic’s API. They contain merchant names, dates, amounts. Anthropic doesn’t train on API content per the terms of service. Whether you’re comfortable depends on your business and threat model.

A more paranoid alternative: run a local vision model (LLaMA 3.2 Vision, Qwen2-VL) instead of Claude. Slightly less accurate, fully local. The trade-off is real but available. Reddit has good benchmarks: r/LocalLLaMA: “Vision model receipt extraction benchmarks 2026”.

Q: Which accounting tools work with this pattern?

Anything with a public API:

Xero [cite: https://developer.xero.com/documentation/ · 2026-03-20 · high]
QuickBooks
FreeAgent
Zoho Books
Wave (free)
A Google Sheet via Apps Script (the cheapest version)

Q: Do I still need a human?

Yes. Run the agent for the entry, run the human for the approve. The 95%+ accuracy is great but you do not want a 5% error rate quietly entering your books. The whole point is the human spends 10 seconds approving instead of 60 seconds typing.

Q: Can the agent file my taxes?

Not yet. Filing taxes is a compliance + signature problem. An agent could prepare the return, but signing it is on you. That said, agents are increasingly good at flagging the receipts that affect tax categorisation, which is most of the painful part. See: Wikipedia: Tax preparation in the United States.

An agent can do your receipt-and-expense bookkeeping. Vision API + folder watcher.

Q: What’s the mundane problem?

Q: What’s the minimum stack?

Q: What’s the prompt?

Q: How does it wire up?

Q: What can’t the agent do?

Q: Is this a privacy risk?

Q: Which accounting tools work with this pattern?

Q: Do I still need a human?

Q: Can the agent file my taxes?

Sources

Update log

Citation manifest

Entities

Are you a bot?