State of the shork! (lemmy.blahaj.zone)

So it's been a few days, where are we now?

I also thought that, given the technical inclination of a lot of our users, you all might be interested in the what, how and why of our decisions here, so I've included a bit of the techier side of things in this update.

Bandwidth

So one of the big issues we had was the heavy bandwidth use caused by a massive amount of downloaded content (not in terms of storage space, but many people downloading the same content).

In terms of bandwidth, we were seeing the top 10 single images resulting in around 600GB+ of downloads in a 24 hour period.
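For a sense of scale, the arithmetic below converts a daily transfer volume into the average link speed it needs. It is a back-of-envelope sketch using decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s); `average_mbps` is a hypothetical helper name, and real traffic is bursty, so peaks will sit well above the average.

```python
# Back-of-envelope: what sustained link speed does a given daily transfer need?
# Decimal units, as hosts and ISPs usually quote them.

SECONDS_PER_DAY = 24 * 60 * 60


def average_mbps(gigabytes_per_day: float) -> float:
    """Average throughput in megabits per second to move this volume in one day."""
    bits = gigabytes_per_day * 10**9 * 8
    return bits / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    rate = average_mbps(600)  # the top-10-images figure from the update
    print(f"600 GB/day is about {rate:.1f} Mbps on average")
```

At roughly 55.6 Mbps averaged over the day, the 600 GB from the top 10 images alone fits inside a 400 Mbps link with plenty of headroom for peaks.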

This has been resolved by setting up a frontline caching server at pictrs.blahaj.zone, which is sitting on a small, unlimited 400Mbps connection, running a tiny Caddy cache that is reverse proxying to the actual lemmy server and locally caching the images in a file store on its 10TB drive. The nginx in front of lemmy is 301 redirecting internet facing static image requests to the new caching server.
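The two moving parts here are a redirect (the origin sends image requests elsewhere) and a read-through cache (the cache fetches from the origin once, then serves its local copy). This is not Caddy or nginx configuration; it is a minimal Python sketch of the same two ideas, assuming Lemmy's public image URLs live under `/pictrs/image/`. `IMAGE_PREFIX`, `redirect_location`, `ReadThroughCache` and `fetch_origin` are hypothetical names for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional

# Hypothetical values for illustration: Lemmy serves uploads under /pictrs/image/,
# and the cache host is the one named in the update.
IMAGE_PREFIX = "/pictrs/image/"
CACHE_HOST = "https://pictrs.blahaj.zone"


def redirect_location(request_path: str) -> Optional[str]:
    """Return the Location header for a 301 to the cache, or None to serve locally."""
    if request_path.startswith(IMAGE_PREFIX):
        return CACHE_HOST + request_path
    return None


class ReadThroughCache:
    """Serve responses from a local directory; fetch from the origin only on a miss."""

    def __init__(self, root: Path, fetch_origin: Callable[[str], bytes]) -> None:
        self.root = root
        # Assumption: this reaches the origin by a route that is NOT redirected,
        # otherwise the cache would just be bounced back to itself.
        self.fetch_origin = fetch_origin
        self.root.mkdir(parents=True, exist_ok=True)
        self.hits = 0
        self.misses = 0

    def _cache_file(self, request_path: str) -> Path:
        # Hash the path so any URL maps to a flat, filesystem-safe file name.
        return self.root / hashlib.sha256(request_path.encode("utf-8")).hexdigest()

    def get(self, request_path: str) -> bytes:
        cached = self._cache_file(request_path)
        if cached.exists():
            self.hits += 1
            return cached.read_bytes()
        self.misses += 1
        body = self.fetch_origin(request_path)
        # Write to a temporary file and rename it into place,
        # so a half-written file is never served to a later request.
        tmp = cached.with_suffix(".tmp")
        tmp.write_bytes(body)
        tmp.replace(cached)
        return body
```

The point of the design is that a popular image costs origin bandwidth once, and every later request is served from the cache's own disk and connection. A real deployment also needs eviction and cache-control handling, which this sketch leaves out.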

This one step alone is saving over $1,500/month.
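To see why repeated downloads get expensive, here is a rough monthly egress estimate. The per-GB rate is an assumption (a flat $0.09/GB); real cloud egress pricing is tiered and varies by region, and `monthly_egress_cost` is a hypothetical helper.

```python
# Rough monthly egress bill for a given daily transfer volume.
# ASSUMPTION: flat $0.09 per GB; actual cloud pricing is tiered and region-dependent.

ASSUMED_USD_PER_GB = 0.09


def monthly_egress_cost(gb_per_day: float,
                        usd_per_gb: float = ASSUMED_USD_PER_GB,
                        days: int = 30) -> float:
    """Estimated cost of serving gb_per_day for `days` days at a flat rate."""
    return gb_per_day * days * usd_per_gb


if __name__ == "__main__":
    print(f"${monthly_egress_cost(600):,.0f}/month")
```

At that assumed rate, 600 GB/day works out to roughly $1,620 a month for those ten images alone, the same ballpark as the saving quoted above.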

Alternate hosting

The second step is to move away from RDS and our current fixed-instance hosting to a stand-alone, self-healing infrastructure. This is what I've been doing over the last few days: setting up the new servers and configuring the new cluster.

We could do this cheaper with a lower-cost hosting provider and a less resilient configuration, but I'm pretty risk averse, and I'm comfortable that this will be a safe configuration.

I wouldn't normally recommend this setup to anyone hosting a small or single-user instance, and it's a bit overkill for us at this stage, but in this case I have decided to spin up a full production-grade Kubernetes cluster with stacked etcd inside a dedicated HA control plane.

We have rented two bigger dedicated servers (64 GB RAM, 8 CPUs, 2 TB RAID 1, 1 Gbps bandwidth) to run our two databases (main/standby), Redis, etc. The control plane runs on three smaller instances (2 GB RAM, 2 CPUs each).
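Why three control plane instances rather than two: with stacked etcd, each control plane node also runs an etcd member, and etcd (which uses Raft) only accepts writes while a majority of members is reachable. The sketch below just computes that majority rule; `quorum` and `tolerated_failures` are hypothetical helper names.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd (Raft) cluster with this many members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still available."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Two members tolerate no failures at all, while three tolerate one, which is why three is the usual minimum for an HA control plane (and why a fourth node adds nothing until you reach five).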

All up this new infrastructure will cost around $9.20/day ($275/m).

Current infrastructure

The current AWS infrastructure is still running at full spec and (minus the excess bandwidth charges) is still costing around $50/day ($1500/m).

Migration

Apart from setting up Kubernetes, nothing has been migrated yet. This will be next.

The first step will be to get the databases off the AWS infrastructure, which will be the biggest bang for buck, as RDS is costing around $34/day ($1,000/m).

The second step will be the next biggest machine, which is our Hajkey instance at Blåhaj zone, currently costing around $8/day ($240/m).

Then the pictrs installation, and lemmy itself.

And finally everything else will come off and we'll shut the AWS account down.
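The migration order above can be checked with simple arithmetic: how much is still running on AWS per day after each step. The figures are the ones given in this update (about $50/day excluding the excess bandwidth charges, $34/day for RDS, $8/day for Hajkey); the final "everything else" amount is derived as the difference, not a separately reported number, and `migration_plan` is a hypothetical helper.

```python
# Daily AWS costs as given in the update. The "everything else" line is derived
# (total minus the two named items), not a separately reported figure.
CURRENT_AWS_PER_DAY = 50.0
KNOWN_STEPS = [
    ("databases (RDS)", 34.0),
    ("Hajkey", 8.0),
]


def migration_plan(total_per_day, known_steps):
    """Yield (step, cost moved, AWS cost still running per day) in migration order."""
    remainder = total_per_day - sum(cost for _, cost in known_steps)
    steps = known_steps + [("pict-rs, Lemmy and everything else", remainder)]
    still_on_aws = total_per_day
    for name, cost in steps:
        still_on_aws -= cost
        yield name, cost, still_on_aws


if __name__ == "__main__":
    for name, cost, left in migration_plan(CURRENT_AWS_PER_DAY, KNOWN_STEPS):
        print(f"move {name} (~${cost:.2f}/day): ~${left:.2f}/day left on AWS")
```

Moving the databases first cuts the daily AWS spend from about $50 to about $16 in a single step, which is the "biggest bang for buck" reasoning in numbers.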


Appreciate the dedication and transparency! I'm a developer myself, but I'm still learning the basics when it comes to clusters and scaling.

NoStressyJessie@lemmy.blahaj.zone 3 points 1 year ago (edited)

I tried to update my profile picture for the first time since the migration, and now I don't have a profile picture at all. Has anyone else noticed issues? I attempted the upload from the web UI settings page at lemmy.blahaj.zone.

ArieTheFloof@lemmy.blahaj.zone 1 point 1 year ago

Yeah, same here. Assuming it's just a migration hiccup.

ada@lemmy.blahaj.zone 3 points 1 year ago

@NoStressyJessie@lemmy.blahaj.zone Just log files filling up a partition. It should be good to go again now
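A full partition like this is easy to catch early with a periodic usage check, for example from cron or a monitoring agent. This is a minimal sketch using Python's standard `shutil.disk_usage`; the 90% threshold and the default path are hypothetical, so point it at whichever partition actually holds the logs or media (the error messages further down show pict-rs storing files under `/data/pict-rs`).

```python
import shutil


def usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def nearly_full(path: str = "/", threshold_percent: float = 90.0) -> bool:
    """True if the filesystem containing `path` is at or above the threshold."""
    return usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # Hypothetical path and threshold; adjust for the partition you care about.
    print(f"{usage_percent('/'):.1f}% used, nearly full: {nearly_full('/')}")
```

An alert like this only buys time; the lasting fix for log growth is rotation or size limits on the logs themselves.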

NoStressyJessie@lemmy.blahaj.zone 2 points 1 year ago (edited)

I'm trying. Maybe I messed up when I converted the file, but it shows as a broken image, and when I go to the web address where the image should be hosted, it says:

{"msg":"Error in MagickWand, ImproperImageHeader `/data/pict-rs/files/jhLII3k5jz.png' @ error/png.c/ReadPNGImage/4286"}

Edit: Same kind of error for jpg

{"msg":"Error in MagickWand, InsufficientImageDataInFile `/data/pict-rs/files/gnUPYJkCuT.jpg' @ error/jpeg.c/ReadJPEGImage_/1112"}

The first image I exported from GIMP; the second was converted online. It seems unlikely I botched two separate conversion attempts using separate utilities.
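Both MagickWand errors (`ImproperImageHeader`, `InsufficientImageDataInFile`) point at the stored files rather than the originals, and one plausible explanation, given the full partition ada mentioned, is that these uploads were cut short while the disk was full. A quick way to test that guess is to check whether each stored file ends the way a complete file should. This is a heuristic sketch, not pict-rs tooling: `looks_complete` and `find_suspect_files` are hypothetical names, and some valid JPEGs carry trailing bytes after the end marker, so a flagged file is a suspect, not proof of damage.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_complete(data: bytes) -> bool:
    """Heuristic: does this PNG or JPEG appear to end where a complete file would?"""
    if data.startswith(PNG_SIGNATURE):
        # A complete PNG ends with the IEND chunk: 4-byte length, b"IEND", 4-byte CRC.
        return len(data) >= 12 and data[-8:-4] == b"IEND"
    if data.startswith(b"\xff\xd8"):
        # A complete JPEG normally ends with the EOI marker FF D9.
        return data.endswith(b"\xff\xd9")
    return False  # empty file or an unrecognised header


def find_suspect_files(directory: str) -> list[str]:
    """Names of PNG/JPEG files in `directory` that look truncated."""
    suspects = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in {".png", ".jpg", ".jpeg"}:
            if not looks_complete(path.read_bytes()):
                suspects.append(path.name)
    return suspects
```

If the stored copies fail this check while the local originals pass, re-uploading after the disk issue is fixed should be enough.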

jo@blahaj.zone 2 points 1 year ago

@supakaity@lemmy.blahaj.zone I'm pleased that this post has been up for 30 minutes and no one has jumped in with "Acktually..." and proceeded to tell you how you're doing it wrong. jk /jk

Thankfully I have no clue about this stuff, but appreciate the detailed update nonetheless. #Hajkey #blahajzone

MsPenguinette@lemmy.blahaj.zone 2 points 1 year ago

I think it's because what they are saying is pretty solid. It's the exact solution I'd recommend

ayilathebailey@blahaj.zone 1 point 1 year ago

@jo @supakaity@lemmy.blahaj.zone That's because the ones who are complaining are wisely doing so in their own spaces, as they ought to. They're also revealing just how much they know about things, in the usual armchair quarterback / bleacher flyhalf fashion.

masukomi@lemmy.blahaj.zone 2 points 1 year ago

a) Holy 💩, I had no idea this was so expensive. b) Please include the Ko-fi link in future updates so we can help support it.

(link found in other comments)

supakaity@lemmy.blahaj.zone 2 points 1 year ago

It's not supposed to be, that's the issue. :)

masukomi@lemmy.blahaj.zone 1 point 1 year ago

Well yeah, but even once the costs are reduced 10x or whatever, there will still be costs, and it would still be good to support its continued existence.

lapis@lemmy.blahaj.zone 1 point 1 year ago

Absolutely wild to me that moving off AWS + setting up the caching server will bring overall costs down by around a factor of ten. So glad y’all are capable of the advanced technical junk, and super thankful that you’re willing and able to host the various blahaj.zone instances!

bdonvr@thelemmy.club 1 point 1 year ago

Once pict-rs updates to allow directly serving images from object storage- wouldn't it be beneficial to migrate it to an object storage that allows unlimited egress like Cloudflare R2?

ada@lemmy.blahaj.zone 0 points 1 year ago

Cloudflare is a non-starter.

Can you say why? I might wanna move some of my stuff if they're being shitty

I mean, Cloudbleed sucked, and their constant refrain of "we're not HOSTING bigoted websites, just caching all their stuff and handing it to whoever asks for it", is that it?

ada@lemmy.blahaj.zone 1 point 1 year ago

Their explicit and active protection of the rights of bigots over the wellbeing of the people those bigots are targeting

jdp23@blahaj.zone 1 point 1 year ago

@supakaity@lemmy.blahaj.zone thanks for the detailed update!

princessnorah@lemmy.blahaj.zone 1 point 1 year ago (edited)

Thank you for all your communication about how the server is being run. I always feel in good hands here on Blahaj :)

audiomodder@lemmy.blahaj.zone 0 points 1 year ago

How can we donate to keep the infrastructure up and running?

this post was submitted on 01 Aug 2023
25 points (100.0% liked)

Blahaj Lemmy Meta


Blåhaj Lemmy is a Lemmy instance attached to blahaj.zone. This is a group for questions or discussions relevant to either instance.
