83
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
this post was submitted on 13 Sep 2024
83 points (100.0% liked)
chat
8165 readers
74 users here now
Chat is a text only community for casual conversation, please keep shitposting to the absolute minimum. This is intended to be a separate space from c/chapotraphouse or the daily megathread. Chat does this by being a long-form community where topics will remain from day to day unlike the megathread, and it is distinct from c/chapotraphouse in that we ask you to engage in this community in a genuine way. Please keep shitposting, bits, and irony to a minimum.
As with all communities posts need to abide by the code of conduct, additionally moderators will remove any posts or comments deemed to be inappropriate.
Thank you and happy chatting!
founded 3 years ago
MODERATORS
You should understand that "computers very bad at division" is a relative term, on modern desktop x86 chips a single 64-bit division takes somewhere on the order of 20-40 clock cycles (which seems pretty fast when you think about how they're running at like 4 billion cycles per second, but these same chips can do integer multiplication in 2-3 cycles so division is painfully slow by comparison).
The reason multiplying by 2^33 / 9 is faster is because you don't have to actually compute what 2^33 is and then divide by 9 every single time - the compiler can compute that value the "slow" way a single time and bake that into the machine code, and then when the program is running it only has to deal with multiplication.
While we don't know exactly what techniques are being used in a modern Intel/AMD chip, as far as I know the current best approach is basically just long division.
EDIT: The reason long division is slow is that all the partial results depend on the results of previous steps. This means that your division circuit ends up being a really long line of comparator circuits chained end-to-end, and long circuits are bad because it takes a long time for the signal to actually reach the end. Multiplication is fast by comparison because you can compute all the partial products in parallel, and then add them together in a kind of tree shape. The end result is a circuit which is super wide, but the "propagation delay" (the time it takes until the last input signal reaches the end) is pretty low since there's no path which passes through more than a few adders.