Improve mp_root_n code #532

ssinai1 · 2022-08-23T12:35:11Z

Set a closer initial value for iteration and remove some unnecessary code.
By the way, in the first for loop, one of the err=MP_OKAY branches (never reached) did not set c.

sjaeckel · 2022-08-23T14:44:05Z

@czurnieden could you maybe have a look if you have the time?

You've added parts of the removed code in 984d3ff and I'm not sure whether this should/can really be removed.

ssinai1 · 2022-08-23T16:32:01Z

I based it on the development branch. I thought it was more up to date.

ssinai1 · 2022-08-23T17:43:54Z

I don't know if it is how readable in this form:

Starting with x[i+1] = x[i] - f(x[i])/f'(x[i]), where f(x) = x^b - a

So x[i+1] = x[i] - (x[i] ^ b - a) / ( b * x[i] ^ (b-1) ) (this is in the code)

When x[i] ^ b > a, x[i+1] will be rounded up.
And because of the convexity, 'x' cannot go below the root even after rounding.

Algebraically:
r:=a^(1/n)
x[i] - x[i+1] = (x[i] ^ b - a) / ( b * x[i] ^ (b-1) )
= (x[i] - r) * (x[i]^(b-1) + x[i]^(b-2)*r + ... + x[i]*r^(b-2) + r^(b-1)) / (n * x[i]^(b-1)) < (x[i] - r)
So x[i+1] > r, when x[i] ^ b > a

Hence the code removal (unless I'm missing something...).

czurnieden · 2022-08-24T04:05:28Z

It's too warm too sleep, so here we go:

You've added parts of the removed code in 984d3ff and I'm not sure whether this should/can really be removed.

That was three years ago! But I appreciate your optimism regarding my long-term memory ;-)

I am sure that I did no proper analysis, so there are some places with…let's call it "belt&girdle" code.
But better late then never.

The start-value gets calculated based on the equality n^(1/b) = 2^((log_2(n)/b). The function mp_count_bits computes floor(log_2(n)) + 1 and the division by b is truncating so we can only compute 2^( floor( (floor(log_2(n)) + 1) / b ) ) which can be too small in certain circumstances. Simple example: with b=3 and n in {31,32,33} the results are 2, 4, 4 respectively.

With addition of unity we get the inequality 2^( floor( (floor(log_2(n)) + 1) / b ) + 1 ) >= floor(n^(1/b)) but we need x[0] ^ b > a as @ssinai1 has shown but with that correction we would get the results 3, 5, 5 with the example above and because 3^3 < 31 the condition x[0] ^ b > a does not hold anymore.
But it also the correct result and the main loop does not change it.
Is that always the case for 2^( floor( (floor(log_2(n)) + 1) / b ) + 1 ) == floor(n^(1/b))?

I'm not sure anymore why I added the loop but I also had some code to find a more exact start-value by doing some rounds of bisection (works well for larger bases and also a little bit for smaller bases but it adds a lot of code and nobody wanted it ;-) ), so it might be a leftover from that experiment. But no matter the exact reason: it is superfluous code in case of 2^( floor( (floor(log_2(n)) + 1) / b ) + 2 ) and most likely 2^( floor( (floor(log_2(n)) + 1) / b ) + 1 ), too.

One last question: are you, @ssinai1 sure that the main loop cannot oscillate?

ssinai1 · 2022-08-24T09:28:17Z

Thanks for the explanation.

My reasoning is as follows:
log2(n)/b < (floor(log2(n)) + 1) / b <= ceil((floor(log2(n)) + 1) / b) = (floor(log2(n)) + 1 + b - 1) // b = floor(log2(n)) // b + 1
where // is the integer division (floor(x/y))

This means that not only is
2^( (floor(log_2(n)) + 1) // b + 1 ) > n^(1/b)
but also
2^( (floor(log_2(n))) // b + 1 ) = 2^( (mp_count_bits - 1) // b ) + 1 ) > n^(1/b)

As for the second part of the question:
using exact arithmetic,
as long as x[i]^b > a, it is true that x[i+1] < x[i] and x[i+1] > a^(1/b),
and since the code is rounding upwards, there is no possibility that x[i+1] ^ b < a (and so x[i+2] > x[i+1]) due to the imprecision of the integer division.

czurnieden · 2022-08-25T01:01:51Z

What is that in lines 110ff? Ouch! How could that have slipped through the tests for over three years?
Thanks a lot for finding and repairing!

But while we are at it:
Two loops with an expensive exponentiation each, that is a lot of overhead!

If we can assume that the error is at most one with a start-value of (floor(log_2(n)) + 1) // b (just mp_count_bits(a) // b for simplicity) the possible outcomes of the test-exponentiation t = r^b are only t == a for a perfect power (return b) and t > a (return b-1) and t < a (return b).

Is that correct?

ssinai1 · 2022-08-25T15:55:41Z

I rewrote it for downward rounding iteration. The mp_div seems to round towards 0, so it needed the remainder...

czurnieden · 2022-08-25T20:41:31Z

The mp_div seems to round towards 0

No, it is truncating! E.g.: both 3/2 and -3/2 get truncated to 1 and -1 respectively and not the 1 and -2 a rounding towards zero would give.

But I like where it is going to! Thanks for the work!

BTW: Libtommath is not as frugal as it looks from the outside, it has some useful macros in tommath.h. For example:

#define mp_iszero(a) ((a)->used == 0)
#define mp_isneg(a)  ((a)->sign == MP_NEG)

These are a bit cheaper than a full call to mp_cmp_d and, as I find (YMMV as always), better readable.

If I understood it correctly t3 is never negative and you added a check for t3 == 0 in the loop, so the do-while check can be reduced to simply check if t3 == 1. Something in the line of a macro like e.g.:

#define mp_isone(a) ( ((a)->used == 1u) && ((a)->dp[0] == 1u) )

That macro is not in tommath.h or tommath_private.h. Should it? I don't know, although I use it quite often.

ssinai1 · 2022-08-26T20:44:38Z

Anyway, in this case a division rounding to negative infinity would be more practical.

sjaeckel · 2022-10-02T11:27:23Z

#define mp_isone(a) ( ((a)->used == 1u) && ((a)->dp[0] == 1u) )
That macro is not in tommath.h or tommath_private.h. Should it? I don't know, although I use it quite often.

Do you use this often inside the library or in code using libtommath? IMO it'd be fine to go in either of tommath.h or tommath_private.h, depending on what makes sense. (My guess goes to tommath.h but asking just to be sure.)

czurnieden · 2022-10-03T11:16:49Z

Do you use this often inside the library or in code using libtommath?

Mainly externally, so tommath.h would be ideal.
It might be tempting to generalize it but since the size of a mp_digit is not fixed over different architectures it would get too complex.

sjaeckel · 2022-10-03T13:55:58Z

Sorry @ssinai1 for highjacking your PR for this discussion :)

I've created #537 for that purpose now.

We should continue with this question #532 (comment) now :) (and by 'we' I mostly mean @ssinai1 and @czurnieden :D thanks)

czurnieden · 2023-06-21T12:35:12Z

October 2022. Oh, my, completely forgotten, sorry!

@ssinai1 if you are still out there: could you please rebase this PR to the current develop branch so I can give it my "seal of approval"? Thank you very much!

sjaeckel · 2024-03-11T13:47:11Z

@ssinai1 Can you please rebase on top of develop, squash those commits and write a bit of a commit message which summarizes what was done.

Improve mp_root_n code

e10a1c4

sjaeckel requested a review from czurnieden August 23, 2022 14:44

for loop put back (for developability) and fixes

36a055e

uses less exponentiation

b36deb0

mp_cmp_d replaced

1c287f8

sjaeckel mentioned this pull request Oct 3, 2022

speed please!? #537

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve mp_root_n code #532

Improve mp_root_n code #532

ssinai1 commented Aug 23, 2022 •

edited

Loading

sjaeckel commented Aug 23, 2022

ssinai1 commented Aug 23, 2022

ssinai1 commented Aug 23, 2022 •

edited

Loading

czurnieden commented Aug 24, 2022

ssinai1 commented Aug 24, 2022 •

edited

Loading

czurnieden commented Aug 25, 2022

ssinai1 commented Aug 25, 2022

czurnieden commented Aug 25, 2022

ssinai1 commented Aug 26, 2022

sjaeckel commented Oct 2, 2022

czurnieden commented Oct 3, 2022

sjaeckel commented Oct 3, 2022

czurnieden commented Jun 21, 2023

sjaeckel commented Mar 11, 2024

Improve mp_root_n code #532

Are you sure you want to change the base?

Improve mp_root_n code #532

Conversation

ssinai1 commented Aug 23, 2022 • edited Loading

sjaeckel commented Aug 23, 2022

ssinai1 commented Aug 23, 2022

ssinai1 commented Aug 23, 2022 • edited Loading

czurnieden commented Aug 24, 2022

ssinai1 commented Aug 24, 2022 • edited Loading

czurnieden commented Aug 25, 2022

ssinai1 commented Aug 25, 2022

czurnieden commented Aug 25, 2022

ssinai1 commented Aug 26, 2022

sjaeckel commented Oct 2, 2022

czurnieden commented Oct 3, 2022

sjaeckel commented Oct 3, 2022

czurnieden commented Jun 21, 2023

sjaeckel commented Mar 11, 2024

ssinai1 commented Aug 23, 2022 •

edited

Loading

ssinai1 commented Aug 23, 2022 •

edited

Loading

ssinai1 commented Aug 24, 2022 •

edited

Loading