For a human, dividing a number by ten is pretty simple because we are used to calculating in base ten every day. But... a computer handles numbers in base 2. That doesn't mean it can't compute a division by a number that is not a power of 2, but operations are much faster with powers of 2, especially if you don't have a floating-point unit.

At work, I had to reimplement the `printf` function. To display decimal integers, I use an algorithm like this:

```
while (value)
{
    *cur_ptr = '0' + (value % 10);
    value = value / 10;
    ...
}
```

This works fine, but it pulls in two GCC runtime (libgcc) functions, `__udivdi3` and `__umoddi3`, which add about 3.5 kB of code. So I looked on the Internet for an implementation optimized for code size, and didn't find what I needed.

Finally, I wrote my own. It's a basic one, inspired by the way children learn division:

```
01. void div10(unsigned value, unsigned* _res, unsigned* _mod)
02. {
03.     unsigned res = value / 8;
04.     unsigned mod = value - (res*10);
05.
06.     while (mod >= 10)
07.     {
08.         res -= 1;
09.         mod += 10;
10.     }
11.
12.     *_res = res;
13.     *_mod = mod;
14. }
```

This algorithm is a basic approach to division: starting from a first guess, it tries each candidate quotient in turn until it finds the right one.
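As a sanity check, the routine can be compared against the compiler's own `/` and `%`. What follows is only a minimal sketch: the listing's line numbers are dropped so that it compiles, the loop test is written `>= 10` to match the rule that the final remainder must be less than 10, and the helper name `div10_matches_reference` is made up for this illustration.

```c
#include <stdbool.h>

/* Same algorithm as the listing above, without the line numbers. */
void div10(unsigned value, unsigned *_res, unsigned *_mod)
{
    unsigned res = value / 8;          /* first guess, never too small */
    unsigned mod = value - (res * 10); /* wraps around when the guess is too big */

    while (mod >= 10)
    {
        res -= 1;
        mod += 10;
    }

    *_res = res;
    *_mod = mod;
}

/* Hypothetical helper: returns true if div10 agrees with / and %
   for every value in [0, limit]. */
bool div10_matches_reference(unsigned limit)
{
    for (unsigned v = 0; ; v++)
    {
        unsigned res, mod;

        div10(v, &res, &mod);
        if (res != v / 10 || mod != v % 10)
            return false;
        if (v == limit)
            return true;
    }
}
```

Calling `div10_matches_reference(100000)` on a development machine is a cheap way to gain confidence before dropping `/` and `%` from the target build.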

First question: why do I use local variables instead of writing directly through the pointers? It tells GCC that these are temporary values which can be kept in registers instead of being written to memory on every loop iteration (saving instructions and memory accesses).

Line 3 is the starting point. We start at value / 8, which is easy for the computer because it is equivalent to a right shift by 3 bits (only one instruction). Note that 8 is the closest power of two to 10, and x/8 is greater than or equal to x/10, so the first guess is never too small.

Line 4 computes the difference (distance) between the current value and my result multiplied by 10. For the final result, this difference must be less than 10 (it is then the modulo).

Line 6: while that is not the case, we decrement the result and increment the modulo. Why increment the modulo? It's an optimization that avoids recomputing:

```
mod = value - (res*10);
```

If res is decremented by 1, the modulo increases by 10, since value is fixed. So a simple addition is sufficient here.

There is another big trick in this code: the subtraction on line 4 is done with UNSIGNED values, and its result is negative most of the time! A negative result wraps around to a big unsigned value (> 2147483648), which is therefore also > 10. We have to wait for an integer overflow for mod to become positive again, and when that happens we get the final modulo value (at most MAX_LONG_INT + 10 = 9, where MAX_LONG_INT stands for the largest unsigned value)!
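To make the wraparound visible, here is a small instrumented sketch. It fixes the width to 32 bits with `uint32_t` so the numbers are predictable. The `div10_trace` struct and the `div10_traced` function are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

/* Hypothetical record of what happens inside the loop. */
typedef struct {
    uint32_t first_mod;  /* value - res*10 right after the first guess */
    uint32_t iterations; /* how many times the guess was decremented */
    uint32_t res;        /* final quotient */
    uint32_t mod;        /* final remainder */
} div10_trace;

div10_trace div10_traced(uint32_t value)
{
    div10_trace t;
    uint32_t res = value / 8;
    uint32_t mod = value - (res * 10); /* a "negative" result wraps to a huge value */

    t.first_mod = mod;
    t.iterations = 0;
    while (mod >= 10)
    {
        res -= 1;
        mod += 10; /* eventually overflows past zero into 0..9 */
        t.iterations++;
    }
    t.res = res;
    t.mod = mod;
    return t;
}
```

For `value = 99`, the first guess is 12, so the difference starts at 99 - 120 = -21, stored as 4294967275. Two additions of 10 bring it to 4294967295, and the third one overflows to 9: the result is 9 remainder 9 after three iterations.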

If we do the opposite operation ((res*10) - value), we have to decrement mod until it becomes less than or equal to 0. But in this case, the test must be done in signed mode and the final modulo must be negated at the end (more instructions):

```
void div10(unsigned long value, unsigned long* _res, unsigned long* _mod)
{
    unsigned long res = value / 8;
    unsigned long mod = (res*10) - value;

    while (((signed long)mod) > 0)
    {
        res -= 1;
        mod -= 10;
    }

    *_res = res;
    *_mod = (unsigned long)-(signed long)mod;
}
```

Facts: the unsigned version of my algorithm is 15 instructions, while `__udivdi3` + `__umoddi3` is 881 instructions. Wonderful!

**Beware: this algorithm is slow**. For small numbers it's not important because x/8 =~ x/10, but as x grows the difference becomes huge: the loop (one decrement, one addition and one test) runs x/8 - x/10 times, which is about x/40. For 32-bit numbers, that is up to 107 374 182 loops...
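The cost can be checked with a short sketch. Since the quotient starts at value/8 and is decremented by one per pass until it reaches value/10, the number of passes is simply the difference between the two. Both function names below are hypothetical, and the width is fixed to 32 bits with `uint32_t`.

```c
#include <stdint.h>

/* Closed form: passes through the correction loop for a 32-bit value. */
uint32_t div10_loop_count(uint32_t value)
{
    return value / 8 - value / 10;
}

/* Same number, obtained by actually running the loop and counting. */
uint32_t div10_count_by_running(uint32_t value)
{
    uint32_t res = value / 8;
    uint32_t mod = value - (res * 10);
    uint32_t count = 0;

    while (mod >= 10)
    {
        res -= 1;
        mod += 10;
        count++;
    }
    return count;
}
```

For one million the loop runs 25 000 times. For the largest 32-bit value, the closed form gives 107 374 182.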

This algorithm can be extended to any (nonzero) divisor by replacing the hardcoded divisor with a parameter and adding a loop that finds the nearest power of 2 that is not greater than the divisor.

```
void div(unsigned long value, unsigned long divisor, unsigned long* _res, unsigned long* _mod)
{
    unsigned long res, mod, tmp = divisor, power2 = 0;

    /* Nearest inferior power of 2: power2 = floor(log2(divisor)).
       The divisor must not be 0. */
    while (tmp > 1)
    {
        tmp >>= 1;
        power2++;
    }

    res = value >> power2;          /* first guess, never too small */
    mod = value - (res*divisor);    /* wraps around if the guess is too big */
    while (mod >= divisor)
    {
        res -= 1;
        mod += divisor;
    }

    *_res = res;
    *_mod = mod;
}
```
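To tie this back to the `printf` use case, here is a rough sketch of a digit loop built on the generalized routine, so that the same code can print in base 10, 16 or any other small base. It is only an illustration under a few assumptions: the function is renamed `div_generic` (a made-up name, chosen because `div` would clash with the standard `div()` if `<stdlib.h>` were included), `format_unsigned` is hypothetical as well, and the base is restricted to 2..16.

```c
#include <limits.h>
#include <stddef.h>

/* Generalized division by repeated correction; divisor must not be 0. */
void div_generic(unsigned long value, unsigned long divisor,
                 unsigned long *_res, unsigned long *_mod)
{
    unsigned long res, mod, tmp = divisor, power2 = 0;

    while (tmp > 1) /* power2 = floor(log2(divisor)) */
    {
        tmp >>= 1;
        power2++;
    }

    res = value >> power2;
    mod = value - (res * divisor);
    while (mod >= divisor)
    {
        res -= 1;
        mod += divisor;
    }

    *_res = res;
    *_mod = mod;
}

/* Hypothetical helper: writes value in the given base (2..16) into buf as a
   NUL-terminated string. Returns the number of digits written, or 0 if the
   base is out of range or the buffer is too small. */
size_t format_unsigned(unsigned long value, unsigned long base,
                       char *buf, size_t size)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[sizeof(unsigned long) * CHAR_BIT]; /* enough even for base 2 */
    size_t n = 0, i;

    if (base < 2 || base > 16)
        return 0;

    do /* digits come out least significant first */
    {
        unsigned long q, r;

        div_generic(value, base, &q, &r);
        tmp[n++] = digits[r];
        value = q;
    } while (value);

    if (n + 1 > size)
        return 0;

    for (i = 0; i < n; i++) /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

With a 16-byte buffer, `format_unsigned(1234, 10, buf, sizeof buf)` fills `buf` with "1234" and `format_unsigned(255, 16, buf, sizeof buf)` with "ff". The speed warning above still applies: each digit costs one full pass of the correction loop.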