submitted 8 months ago* (last edited 8 months ago) by db0@lemmy.dbzer0.com to c/div0@lemmy.dbzer0.com

Follow-up from https://lemmy.dbzer0.com/post/15792108

I've spent a ton of hours trying to troubleshoot why lemmy.dbzer0.com is falling behind lemmy.world in federation. We have a good idea where it's going wrong, but we have no idea why.

The problem is that once I receive an apub sync from l.w and forward the request to the lemmy backend, it takes about 1 second to process, which is way too long (it should typically be < 100 ms).

We had a look at the DB and it was somewhat slow due to syncing commits to disk, so we disabled that. The DB is now much faster (and less safe, but whatever), yet the sync times have not improved at all.
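For anyone curious about the specifics: the commit syncing in question maps, as far as I can tell, to PostgreSQL's `synchronous_commit` setting. A rough sketch of checking and relaxing it (connection details are placeholders; turning it off risks losing the last few transactions on a crash, which is the "less safe" part):

```python
# Sketch: inspect and relax PostgreSQL commit syncing (synchronous_commit).
# Connection parameters are placeholders; adjust for your own setup.
import psycopg2

conn = psycopg2.connect("dbname=lemmy user=lemmy host=127.0.0.1")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute("SHOW synchronous_commit;")
    print("synchronous_commit =", cur.fetchone()[0])

    # Trades durability of the last few commits for lower write latency.
    cur.execute("ALTER SYSTEM SET synchronous_commit = off;")
    cur.execute("SELECT pg_reload_conf();")  # apply without restarting postgres
conn.close()
```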

I've also run a lot of tests to ensure the problem is not coming from my load balancers, and I am certain I've removed them from the equation. The issue is somewhere within the lemmy docker stack and/or the PostgreSQL DB.
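Ruling out the load balancers essentially comes down to timing the same endpoint through the LB and directly against the backend container. A rough sketch of that kind of comparison (the internal backend address below is a placeholder):

```python
# Sketch: time the same lemmy endpoint through the load balancer and
# directly against the backend container. Addresses are placeholders.
import time
import requests

targets = {
    "via load balancer": "https://lemmy.dbzer0.com/api/v3/site",
    "direct to backend": "http://10.0.0.5:8536/api/v3/site",  # hypothetical internal address
}

for label, url in targets.items():
    samples = []
    for _ in range(10):
        start = time.perf_counter()
        requests.get(url, timeout=30)
        samples.append(time.perf_counter() - start)
    samples.sort()
    print(f"{label}: median {samples[len(samples) // 2] * 1000:.0f} ms")
```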

Unfortunately I'm relying solely on other admins' help on Matrix, and at this point I'm being asked to recompile lemmy from scratch or deploy my own docker container with more debug instructions. Neither of these is within my skillset, so I'm struggling to make progress on this.

In the meantime we're falling further and further behind in the lemmy.world federation queue (along with a lot of other instances). To clarify, the problem is not lemmy.world: it takes my instance the same time to process apub syncs from every other server. It's just that the other servers don't have as much traffic, so 1/s is enough to keep up. But lemmy.world has so much constant activity that 1/s is not nearly fast enough.

I'm continuing to dig on this as much as I can. But I won't lie that I could use some help.

I'll keep you all updated in this thread.

top 36 comments
[-] db0@lemmy.dbzer0.com 104 points 8 months ago* (last edited 8 months ago)

Final Update: The problem has been resolved by migrating my lemmy backend. We are currently catching up to lemmy.world, which will probably take most of the next day, but the "distance" is shrinking by hundreds of syncs per minute.

I will write a post-mortem soon

[-] 5714@lemmy.dbzer0.com 22 points 8 months ago

HACKERPERSON-Energy. Hope you feel relieved now.

[-] SlyLycan@lemmy.dbzer0.com 9 points 8 months ago

Appreciate everything you do! Thanks

[-] 6daemonbag@lemmy.dbzer0.com 10 points 8 months ago

I'm interviewing for a job today that could drastically change my life. If I get it I'll be setting aside some $ to help maintain this instance. Your work here and beyond (e.g. the AI Horde) is incredible and I'm looking forward to helping.

[-] db0@lemmy.dbzer0.com 6 points 8 months ago

much appreciated mate!

[-] KillingTimeItself@lemmy.dbzer0.com 6 points 8 months ago

Congrats! As a self-hoster myself, I understand the feeling of getting something up and running to your liking. It's the best feeling.

[-] Tywele@lemmy.dbzer0.com 5 points 8 months ago

Thank you for your continuous work!

[-] frefi@lemmy.dbzer0.com 5 points 8 months ago

Thank you so much 🙏

[-] redditReallySucks@lemmy.dbzer0.com 4 points 8 months ago

Thanks a lot for putting the energy and time into fixing this.

[-] db0@lemmy.dbzer0.com 36 points 8 months ago

Update: The most likely culprit here is the distance between my backend and my DB. I am now planning a migration. I'll keep you posted.
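To illustrate why the distance matters: processing a single apub sync issues many sequential DB queries, so every millisecond of backend-to-DB round trip gets multiplied. A quick sketch for measuring the raw round trip (placeholder DSN; the query counts in the comments are illustrative, not measured):

```python
# Sketch: measure raw backend <-> PostgreSQL round-trip latency.
# If one apub sync issues dozens of sequential queries (illustrative number),
# a ~20 ms round trip alone would add up to something on the order of the
# ~1 s processing times seen here.
import time
import psycopg2

conn = psycopg2.connect("dbname=lemmy user=lemmy host=db.example.internal")  # placeholder DSN
samples = []
with conn.cursor() as cur:
    for _ in range(50):
        start = time.perf_counter()
        cur.execute("SELECT 1;")
        cur.fetchone()
        samples.append(time.perf_counter() - start)
conn.close()
samples.sort()
print(f"median round trip: {samples[len(samples) // 2] * 1000:.2f} ms")
```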

[-] WeirdGoesPro@lemmy.dbzer0.com 29 points 8 months ago* (last edited 8 months ago)

Just reminding everyone that if you can’t contribute with coding knowledge, you can alternatively contribute with a donation through Ko-Fi. There is also an option for LiberaPay if you prefer that platform.

[-] db0@lemmy.dbzer0.com 28 points 8 months ago

Update: No appreciable change. I've now recompiled lemmy from scratch and routed lemmy.world's incoming apub federation to its own dedicated backend. Somehow that made the requests from LW 5x slower, which makes no sense...

I've tried tweaking the DB connection pool, with no appreciable effect on performance.
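One way to sanity-check that the pool isn't the bottleneck is to watch connection states on the DB side while syncs come in; if most connections sit idle, a bigger pool won't help. A sketch with a placeholder connection string:

```python
# Sketch: see how many lemmy DB connections are actually busy while
# federation traffic is arriving. Connection string is a placeholder.
import psycopg2

conn = psycopg2.connect("dbname=lemmy user=lemmy host=db.example.internal")
with conn.cursor() as cur:
    cur.execute(
        "SELECT state, count(*) FROM pg_stat_activity "
        "WHERE datname = 'lemmy' GROUP BY state;"
    )
    for state, count in cur.fetchall():
        print(state, count)
conn.close()
```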

[-] agent_flounder@lemmy.world 6 points 8 months ago

Maybe this is a dumb question, but is there any indication the server is CPU-, memory-, disk-, or network-bound?

[-] Painfinity@lemmy.dbzer0.com 24 points 8 months ago

Have you tried simply uninstalling and reinstalling?

I'll see myself out...

[-] Martineski@lemmy.dbzer0.com 10 points 8 months ago

Imo we should make a brand new instance to start clean.

[-] BananaOnionJuice@lemmy.dbzer0.com 23 points 8 months ago

We will call it dbzer1.com, it will be much better with hookers and blackjack!

[-] Painfinity@lemmy.dbzer0.com 7 points 8 months ago

Why stop there, we could make a religion out of this!

[-] wahming@monyet.cc 13 points 8 months ago

Probably a good idea to stop after several dozen iterations. dbsixty9.com sounds promising!

[-] half_fiction@lemmy.dbzer0.com 23 points 8 months ago

Thanks for the update. Appreciate all your hard work.

[-] taaz@biglemmowski.win 20 points 8 months ago* (last edited 8 months ago)

I can help with docker and some linux stuff mainly, just PM me. I could also offer my instance (of 2 active users) for testing if needed.

[-] Lucid5603@lemmy.dbzer0.com 9 points 8 months ago

I don't have experience with hosting lemmy, but I can help with the Linux and Docker parts too. Just PM me @db0@lemmy.dbzer0.com

[-] db0@lemmy.dbzer0.com 11 points 8 months ago

Thanks peeps. I already figured out the docker stuff. What I need is help debugging lemmy internals

[-] nyakojiru@lemmy.dbzer0.com 4 points 8 months ago

No help from the Lemmy devs? Sounds like they would be the only hope. This looks like something deep in lemmy. Is there a bug report open in their git? Sharing this and the eventual fix will help other instances in the future.

[-] moosetwin@lemmy.dbzer0.com 16 points 8 months ago

thanks for informing us on the situation

[-] db0@lemmy.dbzer0.com 15 points 8 months ago

Update: I have to take pict-rs down while I migrate

[-] Black_Beard@lemmy.dbzer0.com 9 points 8 months ago

I can help with docker and linux stuff if you need it, I work with them daily.

Thank you for all your work, you are amazing.

[-] Martineski@lemmy.dbzer0.com 7 points 8 months ago

Good luck. o7

[-] Koto@lemmy.dbzer0.com 6 points 8 months ago

This sounds so very complicated, I wish I had the expertise needed to help you out. I hope a fix can come soon.

Thank you for all the time you're dedicating towards this issue and for keeping us updated. You're the best!

[-] FeelThePower@lemmy.dbzer0.com 5 points 8 months ago

Much appreciated update. Thank you for your work and everything you do. o7

[-] Stanley_Pain@lemmy.dbzer0.com 4 points 8 months ago

Best of luck. Hope we can figure this out.
