BeanCounter: Automated Invoice Processing with AI

In many small businesses, invoices end up in the inbox as PDF attachments. Whether they are paid by bank transfer, direct debit, or PayPal, booking them manually remains time-consuming and error-prone.
This project automates the entire process: it collects invoices from emails, analyzes them with AI, and automatically generates a structured expense overview in Excel. This makes accounting simpler, faster, and more reliable.

BeanCounter is a system for extracting invoice data – from unstructured emails to validated Excel reports. Developed in Python, it combines classical data processing with local LLMs (Large Language Models) for multilingual document analysis.

Project Overview

This project automates the export, extraction, interpretation, validation, and preparation of invoice data from emails.
The goal was to eliminate manual data entry and create a transparent, reproducible processing pipeline.

Incoming emails → Invoice attachments → Document scanning → AI analysis → Validated JSON data → Excel output.

Motivation

Before the project:

Invoices arrived in various languages, layouts, and formats (HTML, PDF, DOCX, scans).
Manual entry was error-prone and extremely time-consuming.
No consistent data foundation for reporting and accounting.

Goal:

Automatic data recognition regardless of language or layout.
Full traceability and plausibility checks.
Export to structured, further-processable formats.

System Architecture

flowchart TD
    A[Email inbox]
    B[Export emails and attachments – categorized folders]
    C[Convert HTML to PDF]
    D[Docling processes PDFs, DOCX, images, and scans with OCR into Markdown or JSON]
    E[Locally running LLM extracts invoice data]
    F[Validator checks plausibility and format]
    G[Excel exporter creates expense overview]

    A --> B
    A --> C
    B --> D
    C --> D
    D --> E --> F --> G

    %% Feedback loop if needed
    F -. Feedback / Parameter adjustment .-> E

Process Summary:

Emails are automatically exported with their attachments into categorized folders.
HTML content is converted into PDF files for improved readability.
Docling converts PDFs, images, scans (using OCR), and DOCX files into a structured Markdown/JSON representation. OCR is only applied to scanned documents.
A local LLM (e.g., LLaMA 3.1) extracts the invoice number, date, amounts, and other relevant fields.
A validator checks plausibility and format (e.g., gross ≈ net + VAT, date format).
The Excel exporter generates a customizable tabular output as a basis for further processing.
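The validation step can be illustrated with a minimal sketch. It is not the project's actual validator: the field names, the ISO date format, and the rounding tolerance are assumptions made for illustration, and it uses only the standard library rather than Pydantic to stay self-contained.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass
class Invoice:
    # Hypothetical field names; the real schema is not given in the text.
    invoice_number: str
    invoice_date: str  # assumed to be normalized to YYYY-MM-DD
    net: Decimal
    vat: Decimal
    gross: Decimal


def validate(inv: Invoice, tolerance: Decimal = Decimal("0.02")) -> list[str]:
    """Return a list of problems; an empty list means the invoice looks plausible."""
    problems: list[str] = []

    if not inv.invoice_number.strip():
        problems.append("invoice number is empty")

    # Format check: the date must parse as YYYY-MM-DD and not lie in the future.
    try:
        parsed = datetime.strptime(inv.invoice_date, "%Y-%m-%d").date()
        if parsed > date.today():
            problems.append("invoice date is in the future")
    except ValueError:
        problems.append(f"invoice date {inv.invoice_date!r} is not YYYY-MM-DD")

    # Plausibility check: gross should equal net + VAT up to rounding (assumed tolerance).
    if abs(inv.net + inv.vat - inv.gross) > tolerance:
        problems.append(f"gross {inv.gross} != net {inv.net} + VAT {inv.vat}")

    return problems
```

Returning a list of problems instead of raising on the first one keeps every finding available for the feedback loop and for the audit trail.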

Technology Stack

Component	Technology
Language	Python 3.11
Document Parser	Docling by IBM
AI Model	Ollama / llama.cpp (LLaMA 3.1, Granite3 Instruct)
Extraction & Validation	Custom LLM + Regex + Pydantic
Export	OpenPyXL (Excel), JSON
Logging	Python logging
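To illustrate where regular expressions fit next to the LLM, here is a hedged sketch of amount normalization for multilingual invoices. The pattern and the function name `parse_amount` are assumptions, not code from the project. The sketch only recognizes amounts with exactly two decimal digits and treats the last separator as the decimal mark.

```python
import re
from decimal import Decimal
from typing import Optional

# Grouped form (1.234,56 / 1,234.56 / 1'234.50) or plain form (12345,67),
# always with two decimal digits. The lookarounds keep dates such as
# 15.01.2024 from being read as amounts.
_AMOUNT = re.compile(
    r"(?<![\d.,])(?:\d{1,3}(?:[.,'\u00a0\u202f]\d{3})+|\d+)[.,]\d{2}(?![.,]?\d)"
)


def parse_amount(text: str) -> Optional[Decimal]:
    """Find the first amount in text and return it as a Decimal, or None."""
    match = _AMOUNT.search(text)
    if match is None:
        return None
    raw = match.group(0)
    # The last three characters are the decimal mark plus two digits;
    # everything before that is the integer part with optional grouping.
    integer_part, fraction = raw[:-3], raw[-2:]
    digits = re.sub(r"\D", "", integer_part)
    return Decimal(f"{digits}.{fraction}")
```

For example, `parse_amount("Gesamtbetrag: 1.234,56 EUR")` and `parse_amount("Total due: 1,234.56 USD")` both yield `Decimal("1234.56")`, so the validator can compare amounts independently of the invoice language.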

Key Challenges

Handling multilingual documents (German, English, French, Italian, …)
Managing highly diverse layouts with no fixed structure
Dealing with inconsistent OCR quality and character recognition errors
Preventing LLM hallucinations during data extraction
Ensuring consistency and plausibility of amounts and dates

Solution Approach:
A hybrid system combining layout analysis (Docling), rule-based validation, and an AI model for semantic recognition, whose parameters were optimized through targeted training and reinforcement learning.
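One simple rule-based guard against hallucinated values is to check that each extracted value actually occurs in the document text. The sketch below is an assumption about how such a check could look, not the method used in BeanCounter; the function name `ungrounded_fields` and the field names in the example are hypothetical.

```python
import re


def _normalize(text: str) -> str:
    """Case-fold and drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", text.casefold())


def ungrounded_fields(extracted: dict[str, str], source_text: str) -> list[str]:
    """Return the names of fields whose value cannot be found in the source text.

    Empty values are skipped because there is nothing to verify.
    """
    haystack = _normalize(source_text)
    flagged = []
    for name, value in extracted.items():
        needle = _normalize(value)
        if needle and needle not in haystack:
            flagged.append(name)
    return flagged


source = "Rechnung Nr. RE-2024/0815 vom 15.01.2024, Gesamtbetrag 119,00 EUR"
extracted = {
    "invoice_number": "RE-2024/0815",
    "gross": "119,00",
    "iban": "DE89370400440532013000",  # not present in the source text
}
print(ungrounded_fields(extracted, source))  # ['iban']
```

The check is deliberately coarse: a value that the model reformats, such as a date converted to another notation, is flagged as well, so flagged fields are candidates for review rather than proven errors.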

From Prototype to Agent Architecture
#

The original BeanCounter prototype was implemented entirely in Python and served to evaluate the core components –
in particular, the processing pipeline consisting of email export, Docling, LLM extraction, and validation.

The final version is based on a modular agent system, in which specialized agents handle individual processing steps:

Mail Agent: detects incoming emails, extracts attachments, and organizes them into structured directories.
Document Agent: converts HTML, PDFs, images, and scans (via Docling) into Markdown/JSON structures.
Extraction Agent: uses local LLMs for semantic field extraction (invoice number, date, amounts, etc.).
Validation Agent: checks plausibility, amount consistency, and date formats.
Export Agent: generates customizable Excel and JSON outputs for further processing.

This system is modular, extensible, and enables parallel processing of multiple emails and documents.
Thus, the original linear pipeline has evolved into an event-driven, autonomous agent architecture.
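The idea of agents that pass documents to one another can be sketched with `asyncio` queues. This is an illustrative assumption about the structure, not the project's implementation: the handlers are placeholders that return dummy data, and names such as `run_agent` and `run_pipeline` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[dict]]
_STOP = None  # sentinel that tells an agent to shut down


async def run_agent(handler: Handler, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Consume messages from inbox, process them, and forward the results."""
    while True:
        item = await inbox.get()
        if item is _STOP:
            await outbox.put(_STOP)  # propagate shutdown to the next agent
            return
        await outbox.put(await handler(item))


async def run_pipeline(handlers: list[Handler], items: list[dict]) -> list[dict]:
    """Chain the agents with queues and collect what the last one emits."""
    queues = [asyncio.Queue() for _ in range(len(handlers) + 1)]
    tasks = [
        asyncio.create_task(run_agent(h, queues[i], queues[i + 1]))
        for i, h in enumerate(handlers)
    ]
    for item in items:
        await queues[0].put(item)
    await queues[0].put(_STOP)

    results = []
    while (result := await queues[-1].get()) is not _STOP:
        results.append(result)
    await asyncio.gather(*tasks)
    return results


# Placeholder agents: each one only adds dummy data to the message.
async def document_agent(msg: dict) -> dict:
    return {**msg, "markdown": f"# {msg['file']}"}


async def extraction_agent(msg: dict) -> dict:
    return {**msg, "fields": {"invoice_number": "DEMO-1"}}


async def validation_agent(msg: dict) -> dict:
    return {**msg, "valid": bool(msg["fields"])}


results = asyncio.run(
    run_pipeline(
        [document_agent, extraction_agent, validation_agent],
        [{"file": "a.pdf"}, {"file": "b.pdf"}],
    )
)
```

Because every agent runs as its own task, a second document can already be converted while the first one is still being extracted. Starting several tasks per stage would add parallelism within a stage, at the cost of no longer preserving input order.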

Results

97–100% accurate extraction of relevant fields (invoice number, date, amounts)
Processing within seconds, even for multi-page invoices
Automated reports in Excel format
Fully traceable JSON outputs for audits and accounting

Conclusion

This project demonstrates how AI models and classical Python data processing can be combined to
automatically and reliably process multilingual, unstructured invoices.
The combination of determinism (rules) and semantics (LLM) proved especially valuable – merging traceability with flexibility.

© 2025 Oskar Kohler. All rights reserved.
Note: The text was written manually by the author. Stylistic improvements, translations as well as selected tables, diagrams, and illustrations were created or improved with the help of AI tools.
