Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve mp_root_n code #532

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from
Open

Conversation

ssinai1
Copy link

@ssinai1 ssinai1 commented Aug 23, 2022

Set a closer initial value for iteration and remove some unnecessary code.
By the way, in the first for loop, one of the err=MP_OKAY branches (never reached) did not set c.

@sjaeckel
Copy link
Member

@czurnieden could you maybe have a look if you have the time?

You've added parts of the removed code in 984d3ff and I'm not sure whether this should/can really be removed.

@sjaeckel sjaeckel requested a review from czurnieden August 23, 2022 14:44
@ssinai1
Copy link
Author

ssinai1 commented Aug 23, 2022

I based it on the development branch. I thought it was more up to date.

@ssinai1
Copy link
Author

ssinai1 commented Aug 23, 2022

I don't know if it is how readable in this form:

Starting with x[i+1] = x[i] - f(x[i])/f'(x[i]), where f(x) = x^b - a

So x[i+1] = x[i] - (x[i] ^ b - a) / ( b * x[i] ^ (b-1) ) (this is in the code)

When x[i] ^ b > a, x[i+1] will be rounded up.
And because of the convexity, 'x' cannot go below the root even after rounding.

Algebraically:
r:=a^(1/n)
x[i] - x[i+1] = (x[i] ^ b - a) / ( b * x[i] ^ (b-1) )
= (x[i] - r) * (x[i]^(b-1) + x[i]^(b-2)*r + ... + x[i]*r^(b-2) + r^(b-1)) / (n * x[i]^(b-1)) < (x[i] - r)
So x[i+1] > r, when x[i] ^ b > a

Hence the code removal (unless I'm missing something...).

@czurnieden
Copy link
Contributor

It's too warm too sleep, so here we go:

You've added parts of the removed code in 984d3ff and I'm not sure whether this should/can really be removed.

That was three years ago! But I appreciate your optimism regarding my long-term memory ;-)

I am sure that I did no proper analysis, so there are some places with…let's call it "belt&girdle" code.
But better late then never.

The start-value gets calculated based on the equality n^(1/b) = 2^((log_2(n)/b). The function mp_count_bits computes floor(log_2(n)) + 1 and the division by b is truncating so we can only compute 2^( floor( (floor(log_2(n)) + 1) / b ) ) which can be too small in certain circumstances. Simple example: with b=3 and n in {31,32,33} the results are 2, 4, 4 respectively.

With addition of unity we get the inequality 2^( floor( (floor(log_2(n)) + 1) / b ) + 1 ) >= floor(n^(1/b)) but we need x[0] ^ b > a as @ssinai1 has shown but with that correction we would get the results 3, 5, 5 with the example above and because 3^3 < 31 the condition x[0] ^ b > a does not hold anymore.
But it also the correct result and the main loop does not change it.
Is that always the case for 2^( floor( (floor(log_2(n)) + 1) / b ) + 1 ) == floor(n^(1/b))?

I'm not sure anymore why I added the loop but I also had some code to find a more exact start-value by doing some rounds of bisection (works well for larger bases and also a little bit for smaller bases but it adds a lot of code and nobody wanted it ;-) ), so it might be a leftover from that experiment. But no matter the exact reason: it is superfluous code in case of 2^( floor( (floor(log_2(n)) + 1) / b ) + 2 ) and most likely 2^( floor( (floor(log_2(n)) + 1) / b ) + 1 ), too.

One last question: are you, @ssinai1 sure that the main loop cannot oscillate?

@ssinai1
Copy link
Author

ssinai1 commented Aug 24, 2022

Thanks for the explanation.

My reasoning is as follows:
log2(n)/b < (floor(log2(n)) + 1) / b <= ceil((floor(log2(n)) + 1) / b) = (floor(log2(n)) + 1 + b - 1) // b = floor(log2(n)) // b + 1
where // is the integer division (floor(x/y))

This means that not only is
2^( (floor(log_2(n)) + 1) // b + 1 ) > n^(1/b)
but also
2^( (floor(log_2(n))) // b + 1 ) = 2^( (mp_count_bits - 1) // b ) + 1 ) > n^(1/b)

As for the second part of the question:
using exact arithmetic,
as long as x[i]^b > a, it is true that x[i+1] < x[i] and x[i+1] > a^(1/b),
and since the code is rounding upwards, there is no possibility that x[i+1] ^ b < a (and so x[i+2] > x[i+1]) due to the imprecision of the integer division.

@czurnieden
Copy link
Contributor

What is that in lines 110ff? Ouch! How could that have slipped through the tests for over three years?
Thanks a lot for finding and repairing!

But while we are at it:
Two loops with an expensive exponentiation each, that is a lot of overhead!

If we can assume that the error is at most one with a start-value of (floor(log_2(n)) + 1) // b (just mp_count_bits(a) // b for simplicity) the possible outcomes of the test-exponentiation t = r^b are only t == a for a perfect power (return b) and t > a (return b-1) and t < a (return b).

Is that correct?

@ssinai1
Copy link
Author

ssinai1 commented Aug 25, 2022

I rewrote it for downward rounding iteration. The mp_div seems to round towards 0, so it needed the remainder...

@czurnieden
Copy link
Contributor

The mp_div seems to round towards 0

No, it is truncating! E.g.: both 3/2 and -3/2 get truncated to 1 and -1 respectively and not the 1 and -2 a rounding towards zero would give.

But I like where it is going to! Thanks for the work!

BTW: Libtommath is not as frugal as it looks from the outside, it has some useful macros in tommath.h. For example:

#define mp_iszero(a) ((a)->used == 0)
#define mp_isneg(a)  ((a)->sign == MP_NEG)

These are a bit cheaper than a full call to mp_cmp_d and, as I find (YMMV as always), better readable.

If I understood it correctly t3 is never negative and you added a check for t3 == 0 in the loop, so the do-while check can be reduced to simply check if t3 == 1. Something in the line of a macro like e.g.:

#define mp_isone(a) ( ((a)->used == 1u) && ((a)->dp[0] == 1u) )

That macro is not in tommath.h or tommath_private.h. Should it? I don't know, although I use it quite often.

@ssinai1
Copy link
Author

ssinai1 commented Aug 26, 2022

Anyway, in this case a division rounding to negative infinity would be more practical.

@sjaeckel
Copy link
Member

sjaeckel commented Oct 2, 2022

#define mp_isone(a) ( ((a)->used == 1u) && ((a)->dp[0] == 1u) )

That macro is not in tommath.h or tommath_private.h. Should it? I don't know, although I use it quite often.

Do you use this often inside the library or in code using libtommath? IMO it'd be fine to go in either of tommath.h or tommath_private.h, depending on what makes sense. (My guess goes to tommath.h but asking just to be sure.)

@czurnieden
Copy link
Contributor

Do you use this often inside the library or in code using libtommath?

Mainly externally, so tommath.h would be ideal.
It might be tempting to generalize it but since the size of a mp_digit is not fixed over different architectures it would get too complex.

@sjaeckel sjaeckel mentioned this pull request Oct 3, 2022
@sjaeckel
Copy link
Member

sjaeckel commented Oct 3, 2022

Sorry @ssinai1 for highjacking your PR for this discussion :)

I've created #537 for that purpose now.

We should continue with this question #532 (comment) now :) (and by 'we' I mostly mean @ssinai1 and @czurnieden :D thanks)

@czurnieden
Copy link
Contributor

October 2022. Oh, my, completely forgotten, sorry!

@ssinai1 if you are still out there: could you please rebase this PR to the current develop branch so I can give it my "seal of approval"? Thank you very much!

@sjaeckel
Copy link
Member

@ssinai1 Can you please rebase on top of develop, squash those commits and write a bit of a commit message which summarizes what was done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants