november 12, 22:55

You can compute floor division of a number by multiplying, especially by its modular inverse. This article is on how to use this trick to do division faster than idiv, using the Granlund-Montgomery algorithm.

For $$q = \lfloor \frac{n}{d} \rfloor,$$ we can instead use $$q = nM \gg s.$$ By first thinking about it as $$q = \lfloor \frac{n}{d} \rfloor = \lfloor \frac{nM}{2^{p}} \rfloor,$$ This makes $$\lfloor \frac{1}{d} \rfloor = \lfloor \frac{M}{2^{p}} \rfloor,$$ and, in fact, $$M \approx \lceil \frac{2^{p}}{d} \rceil.$$ We can then make $$p = b + s,$$ where b is the size of the inputs of multipication. For example, on 64-bit hardware, b = 64, but for C's integers, b = 32. We can now solve for M using $$M = \lceil \frac{2^{64 + s}}{d} \rceil,$$ using the smallest non-zero natural for s, so the formula works for all n. Finding s is usually done on an incremental basis. Checking s can usually be done by applying many inputs and checking them all, but there is a systematic way of checking s. With $$0 \le r = n - qd \le d - 1,$$ r is the remainder, so $$n = qd + r.$$ Then, $$nM = qdM + rM.$$ Now, change $$dM \to 2^{p} - e,$$ (defining e) and then, using the distributive property and associativity property, $$nM = q2^{p} + (qe + rM).$$ If we floor divide by 2^p, and then subtract q on both sides, we get $$\lfloor \frac{nM}{2^{p}} \rfloor - q = \lfloor \frac{qe + rM}{2^{p}} \rfloor.$$ (q is not floored, since it is an integer.) Since we defined q as the minuend they are both the same. Thus, on the right-hand side, if s and M really are correct, should be $0$. With the largest quotient $$0 \le q \le Q = \frac{2^{b} - 1}{d},$$ We can substitute the maxima to get this inequality for checking s and M: $$Qe + (d - 1)M \le 2^{p} - 1.$$

If I put 10 for d and 64 for b, I get M = 0xCCCCCCCCCCCCCCCD and s = 3. This is quite useful for printing numbers without a standard librar, as it is around 4 times faster than normal division. You can also use the Extended Euclidean algorithm to get 1/5 mod 264, but that is an exercise for the reader.

I have to thank ChatGPT for finding most of this, But more importantly Torbjoern Granlund and Peter L. Montgomery for this paper on what I talked about, called "Division by Invariant Integers using Multiplication".

mov r8, rax ; for remainder mov rdx, 0xCCCC_CCCC_CCCC_CCCD ; 1/5 mul rdx mov rbx, rdx ; for remainder shr rbx, 3 vs. mov rax, r8 mov rdx, rbx shl rdx, 3 lea rdx, [rdx + rbx * 2] sub rax, rdx mov rdx, rbx xchg rax, rdx