this post was submitted on 31 Oct 2025
askchapo
I'm currently considering using a chatbot to semi-intelligently OCR a PDF and pull a table out into a CSV, because it's around 400 entries. But then I keep thinking about how I'll have to check over that work, and wondering whether it's even worth trying to automate or whether I should just put on headphones with something upbeat and knock it out correctly in an hour or two.
The lack of correctness and the inability to trust it basically make it useless for anyone who wants to do stuff right.
I think there are ways to minimize it. My job pays for Gemini and I frequently use it to OCR serial numbers off scanned-in PDFs. I can check these against records I already have, so there is less chance for bad data to slip through. Maybe use a second LLM to OCR it too and compare the results: line both results up in the same spreadsheet and highlight duplicate values. Anything that's not highlighted is a value the LLMs got different results on, and it needs to be double-checked. 🤷 Idk, just a thought.
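The compare-two-extractions idea can be sketched in a few lines of Python instead of a spreadsheet. This is only a minimal sketch under assumptions: both LLM outputs are assumed to already be saved as CSV text with the same row order, and the function names (`read_rows`, `find_disagreements`, `not_in_records`) and the sample serial numbers are made up for illustration.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of rows, stripping whitespace from each cell."""
    return [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(csv_text))]


def find_disagreements(rows_a, rows_b):
    """Return (row, col, value_a, value_b) for every cell where the two extractions differ.

    A cell missing from one extraction is reported as None.
    """
    diffs = []
    for r in range(max(len(rows_a), len(rows_b))):
        row_a = rows_a[r] if r < len(rows_a) else []
        row_b = rows_b[r] if r < len(rows_b) else []
        for c in range(max(len(row_a), len(row_b))):
            a = row_a[c] if c < len(row_a) else None
            b = row_b[c] if c < len(row_b) else None
            if a != b:
                diffs.append((r, c, a, b))
    return diffs


def not_in_records(rows, col, known_values):
    """Return (row, value) for entries in one column that are absent from existing records."""
    return [(r, row[col]) for r, row in enumerate(rows)
            if col < len(row) and row[col] not in known_values]


# Hypothetical sample data standing in for the two LLM outputs.
extraction_a = "SN-1001,4.7\nSN-1002,10\nSN-1003,2.2\n"
extraction_b = "SN-1001,4.7\nSN-1O02,10\nSN-1003,22\n"
known_serials = {"SN-1001", "SN-1002", "SN-1003"}

rows_a = read_rows(extraction_a)
rows_b = read_rows(extraction_b)

for r, c, a, b in find_disagreements(rows_a, rows_b):
    print(f"row {r}, col {c}: {a!r} vs {b!r} -> check by hand")

for r, value in not_in_records(rows_b, 0, known_serials):
    print(f"row {r}: serial {value!r} not in existing records")
```

Matching cells are not proof of correctness, since both models can misread the same smudged digit the same way. The comparison only narrows down where to look first.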
For this task in particular, the data would be somewhat foundational to a design, and a believable but incorrect value could incur thousands of dollars in mistakes and time later on, some far harder to debug than others. It's essentially an age-old battle between my brain and interacting with spreadsheets that I just need to get over. It would be cool if you could use LLMs in adversarial forms, where they look to prove another LLM wrong or verify output to three or four nines of accuracy, but I have a brain and can do that too.
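To put the "three or four nines" figure against a 400-entry table, here is a rough back-of-the-envelope sketch. It assumes every entry is extracted independently with the same per-entry accuracy, which real OCR errors probably do not satisfy, and the function names are illustrative only.

```python
def chance_of_clean_run(per_entry_accuracy, entries):
    """Probability that every entry is correct, assuming independent errors."""
    return per_entry_accuracy ** entries


def expected_errors(per_entry_accuracy, entries):
    """Average number of wrong entries, assuming independent errors."""
    return (1 - per_entry_accuracy) * entries


ENTRIES = 400  # table size mentioned in the thread

for label, accuracy in [("three nines", 0.999), ("four nines", 0.9999)]:
    clean = chance_of_clean_run(accuracy, ENTRIES)
    errors = expected_errors(accuracy, ENTRIES)
    print(f"{label}: {clean:.1%} chance of zero errors, about {errors:.2f} expected errors")
```

Under those assumptions, three nines per entry still leaves roughly a one-in-three chance of at least one wrong value somewhere in the table, while four nines brings that down to about 4%.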
I've worked on various hard problems that hit the limits of LLMs pretty quickly. It's frustrating because so much of the information that used to be on the internet is gone now, and what's left can't be found because of how bad search engines have gotten. Even using the LLM as a search engine just pops up the same webpages I've already deemed unhelpful.
Damn, well best of luck with that task then. I dread tedious work like that.
I definitely agree about search engines. I miss old Google 😭