You can now ask Claude to check the #OTS timestamp for the Dutch municipal elections! :-)
More interestingly though, the data collection really does benefit from LLMs. This is the second election in which I've used them. It's still not a smooth ride, but I'm getting better at the process, and the tools are getting better too.
There are 13k+ polling stations in the Netherlands. Volunteers count the votes (on 70×100 cm paper ballots) and fill out a form by hand. After that, computers take over to tally the results. But the paper forms are also scanned and uploaded to 300+ municipal websites, creating an audit trail.
My project then tries to download these PDFs. I don't re-publish the documents themselves, just the list of URLs, their SHA256 checksums, and a timestamp. The repo has scripts that make it easy to redownload and compare checksums.
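The checksum comparison could look roughly like this minimal Python sketch. It assumes the re-downloaded PDFs sit in one directory and that the published list has already been parsed into a mapping from file name to SHA256 hex digest; that layout and the function names are hypothetical, not the repo's actual scripts.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_checksums(expected: dict[str, str], directory: Path) -> dict[str, str]:
    """Compare downloaded files against published checksums.

    `expected` maps a file name (hypothetical layout) to its published
    SHA256 hex digest. Returns "ok", "mismatch" or "missing" per file.
    """
    results = {}
    for name, want in expected.items():
        path = directory / name
        if not path.exists():
            results[name] = "missing"
        elif sha256_of_file(path) == want.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

A "mismatch" doesn't necessarily mean tampering: a municipality may simply have replaced a scan. But it does flag which documents deserve a closer look.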
Collecting the URLs, however, is incredibly tedious. Every municipal website is different, the documents are named after the polling station, etc. The first time, back in 2023, I spent days collecting the URLs, using scripts and with help from a few other people. You'd think that for the next election, you could just find-and-replace the year? Nope, all different again.
There are a couple of common CMSes in use. Some are easy, at most requiring some rate limiting and a fake user agent to avoid getting blocked. Other municipalities, however, use JavaScript frameworks or even Google Drive; until recently, those required a manual download.
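For the easy CMS sites, "some rate limiting and a fake user agent" might look like this standard-library Python sketch. The one-request-per-interval policy, the browser-like User-Agent string, and the helper names are all assumptions for illustration; the project's actual settings aren't stated.

```python
import time
import urllib.request

# Hypothetical browser-like User-Agent; the value the project uses is not stated.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"


class RateLimiter:
    """Keep consecutive requests at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = float("-inf")  # no request made yet

    def wait(self) -> None:
        # Sleep only for whatever remains of the interval since the last call.
        delay = self._last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_all(urls, limiter: RateLimiter, timeout: float = 30.0) -> dict[str, bytes]:
    """Download each URL in turn, waiting on the limiter before every request."""
    results = {}
    for url in urls:
        limiter.wait()
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            results[url] = resp.read()
    return results
```

For example, `fetch_all(urls, RateLimiter(2.0))` would space requests at least two seconds apart. A fixed minimum interval is the simplest polite policy; it does nothing for sites that render their document list with JavaScript, which is where the manual work (or an agent) comes in.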
This task can't be fully automated by scripts, yet it's too much work to do by hand. That makes agents a great fit. For 80% of the work, I was able to just say "Pick 25 municipalities and divide them over 5 sub-agents," and I'd end up with 25 commits half an hour later. Or a rate limiting error from GitHub Copilot :-)
I struggled more with the remaining ones, though I think I'm in good shape for the next election. As a rough guideline: GPT5.4 and Claude Sonnet (both at medium effort) can do most of the heavy lifting, with constant "encouragement" of course, while sub-agents running GPT5.4 mini can call a script to fetch documents, do some sanity checks, and commit the result.
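Which sanity checks the sub-agents run isn't spelled out. As a hypothetical illustration, here is a Python sketch of a few cheap checks a fetch script could do before committing: is the download actually a PDF (a blocked request often returns an HTML page instead), is it suspiciously small, and do two polling stations share the exact same file? The checks, the size threshold, and the function name are assumptions, not the project's actual rules.

```python
import hashlib

PDF_MAGIC = b"%PDF-"  # PDF files start with this header


def sanity_check(docs: dict[str, bytes], min_size: int = 1024) -> list[str]:
    """Return human-readable problems found in a batch of downloads.

    `docs` maps a polling-station label (hypothetical) to downloaded bytes.
    Assumed checks:
      * content starts with the PDF magic bytes (catches HTML error pages),
      * the file is not smaller than `min_size` bytes,
      * no two stations share a SHA256 (catches one PDF linked twice).
    """
    problems = []
    seen: dict[str, str] = {}  # digest -> first label with that content
    for label, data in docs.items():
        if not data.startswith(PDF_MAGIC):
            problems.append(f"{label}: not a PDF")
        elif len(data) < min_size:
            problems.append(f"{label}: only {len(data)} bytes")
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            problems.append(f"{label}: same content as {seen[digest]}")
        else:
            seen[digest] = label
    return problems
```

An empty list would let the sub-agent go ahead and commit; anything else is worth a second look before the URLs and checksums are recorded.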
I briefly tried a local model (nemotron-cascade-2 with Ollama), but it started hallucinating PDF names. I might also try PayPerQ.

GitHub - Sjors/verkiezingen-processen-verbaal
