this post was submitted on 24 Jan 2026
8 points (90.0% liked)

Gaming


Hello, my name is Daniel Hanrahan. Do you think my games should include optional calls to assembly functions for specific CPUs and GPUs, in order to reach the maximum possible performance? The idea is to use non-standard parts or functions of a CPU or GPU for tasks that the standard parts are not well suited to. For example, on the Z80 you could use the refresh register for randomization instead of the standard parts/functions. To be clear, my games already have good performance.

Link to my games: https://daniel-hanrahan-tools-and-games.github.io/

top 15 comments
[–] Mondez 8 points 3 days ago (1 children)

You need to profile your binaries to find out where they spend most of their CPU time, and try to optimise those areas with more efficient code, before you even consider micro-optimisations like asm for specific CPUs. Considerations like algorithm choice and the cache efficiency of your data will likely all have a larger effect.
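
As a rough illustration of the "profile first" advice above, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The functions `slow_collision_check`, `cheap_bookkeeping`, and `game_frame` are hypothetical stand-ins, not code from the games in question; a native binary would need a native profiler instead, but the workflow is the same: measure, then look at what dominates.

```python
import cProfile
import io
import pstats


def slow_collision_check(n):
    """Hypothetical hot spot: an O(n^2) pairwise comparison."""
    hits = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (i ^ j) & 7 == 0:
                hits += 1
    return hits


def cheap_bookkeeping(n):
    """Hypothetical cheap work that is not worth optimising."""
    return sum(range(n))


def game_frame():
    """Stand-in for one frame of game logic (an assumption for this sketch)."""
    return slow_collision_check(300) + cheap_bookkeeping(300)


def profile_report(func, top=5):
    """Run func under cProfile and return the report text,
    sorted by cumulative time and limited to the top entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(game_frame))
```

In a report like this, the function that dominates cumulative time is the one to rethink first (a better algorithm or data layout), long before reaching for CPU-specific assembly.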

[–] danielhanrahantng@beehaw.org 1 points 2 days ago

Thank you, but I have already made sure that my games are as efficient as possible. I was asking whether I can make them more performant by adding optional assembly functions that use non-standard parts of the CPU and GPU, so that they reach maximum performance, because Linux does something similar.

[–] Mondez 3 points 2 days ago (1 children)

I would argue that if your games are already performant on the platforms you care about, you would get diminishing returns. The only reason to experiment with specialist asm would be for your own experience and enrichment, which is a perfectly reasonable reason to pursue it.

It's probably not worth comparing to an OS, where shaving even a few cycles off code that runs all the time on millions of computers across the world adds up to a significant impact.

[–] danielhanrahantng@beehaw.org 2 points 2 days ago

Thanks for the information.

[–] TehPers@beehaw.org 3 points 2 days ago (1 children)

Do you want to? Go for it.

Does your game crawl? Have you identified this code as the bottleneck? Are you certain that asm will give you a meaningful performance increase, and that your issue doesn't lie with your approach to the problem? Sure, I guess. You said your game runs fine though, so this probably doesn't apply.

Is your game fast already? If you don't want to do it, don't.

Writing asm by hand is almost always a waste of time. There are only a few times where it's actually necessary, and unless you're writing a bootloader and running your game on bare metal, I can't imagine why it'd be necessary. But you know your code better than anyone else here, so you should know whether it's needed or not more than any of us do.

To begin with, you're apparently targeting the Z80, which I haven't seen used for games in the wild... probably in my entire life? Maybe an arcade machine I played on once used it, but I can't think of any other times. If your targets need custom assembly, then you should already know that. We don't know your targets.

[–] danielhanrahantng@beehaw.org 2 points 2 days ago

I was just using a feature of the Z80 as an example. Thank you for your help. If anyone wants to add that functionality to their game to increase performance, they can.

[–] socsa@piefed.social 2 points 2 days ago (1 children)

This is called compute dispatching and is super common. I have done a whole bunch of DSP implementations where you use a CUDA or AVX kernel depending on availability, and otherwise dispatch to a standard library or even a Python kernel.
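
A minimal sketch of the dispatch pattern described above, under stated assumptions: the feature name `"wide_simd"` and both kernels are hypothetical, and the "accelerated" kernel is plain Python standing in for a real AVX or CUDA build. Real feature detection is platform-specific, so here the set of available features is simply passed in.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    """Portable fallback kernel that works everywhere."""
    return sum(x * y for x, y in zip(a, b))


def dot_unrolled(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for an accelerated kernel (hypothetical).
    It handles four elements per step, then the leftover tail."""
    n = min(len(a), len(b))
    total = 0.0
    i = 0
    while i + 4 <= n:
        total += (a[i] * b[i] + a[i + 1] * b[i + 1]
                  + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3])
        i += 4
    while i < n:
        total += a[i] * b[i]
        i += 1
    return total


# Ordered from most to least preferred. A required feature of None
# marks the fallback, which is always usable.
KERNELS: List[Tuple[Optional[str], Kernel]] = [
    ("wide_simd", dot_unrolled),  # hypothetical feature name
    (None, dot_generic),
]


def select_kernel(available_features: Set[str]) -> Kernel:
    """Pick the first kernel whose required feature is available."""
    for feature, kernel in KERNELS:
        if feature is None or feature in available_features:
            return kernel
    raise RuntimeError("no usable kernel")  # unreachable with a fallback


if __name__ == "__main__":
    # Select once at startup, then call through the chosen function.
    dot = select_kernel({"wide_simd"})
    print(dot([1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 4.0, 3.0, 2.0, 1.0]))
```

The key design point is that selection happens once, at startup, and every kernel must return the same result, so the rest of the program never needs to know which one was chosen.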

[–] danielhanrahantng@beehaw.org 1 points 2 days ago

Do you think compute dispatching is worth it for my video games?

[–] Azzu@lemmy.dbzer0.com 3 points 3 days ago (1 children)

"have optional calls" is not really how this works.

If you're in an interpreted or JIT-compiled language, like Python, Java, or C#, you don't have to do anything, because the runtime is either built for the architecture it's running on or compiles your code for it, i.e. using whatever CPU features are available.

If you have a compiled language, and your users compile the code themselves, then they are choosing which CPU features to use, so you don't have to do anything. If you distribute pre-built binaries, then you simply have to compile once for each architecture you want to support, and distribute the correct binary to each user (usually done with an installer).

For graphics, your graphics API also already takes care of using system-specific instructions, and shaders are compiled by it before/while running also using system-specific instructions.

So there's really no "optional" path that you have to specifically put into your program, so nothing like

Func work()
    If isArm then doArmStuff()
    Else if isZen4 then doZen4Stuff()
    ...
End

[–] danielhanrahantng@beehaw.org 1 points 2 days ago

The reason I ask is that I know Linux does something like that to make sure it is at maximum performance.

[–] LodeMike@lemmy.today 2 points 3 days ago (1 children)

This won't meaningfully improve performance, especially for CPU stuff.

[–] danielhanrahantng@beehaw.org 1 points 2 days ago (1 children)

Thank you, but are you sure? I know Linux does something similar to make sure it is at maximum performance.

[–] LodeMike@lemmy.today 2 points 2 days ago (1 children)

On modern CPUs it doesn't matter that much, and any optimization would have to be updated for each CPU type (Zen 4, Alder Lake, etc.). Modern CPUs have insane out-of-order execution that makes compiler-generated code nearly as fast as the most optimized handwritten code.

[–] danielhanrahantng@beehaw.org 1 points 2 days ago (1 children)

Thanks for the information.

[–] LodeMike@lemmy.today 2 points 2 days ago

99% of the time, memory is the bottleneck. It's why DMA is such a huge feature.