Update deprecated .max(ind), .min(ind) to .index_max(), .index_min() #3835
Conversation
Force-pushed from e4871a3 to 11bc9a5.
@conradsnicta It seems like …

I don't think that's a …

I haven't looked at the code diff yet, so not sure where the issue might lie. But it seems reasonable that this would be a result of the …
Thanks for catching that, @rcurtin, and for setting me straight. That seems like it is on me; it is likely the change from

mlpack/src/mlpack/methods/hmm/hmm_impl.hpp, line 531 in 4653e10

to

index = prob.index_max();
logStateProb(j, t) = index + logProbs(t, j);

where I goofed while following the deprecation nag.
src/mlpack/methods/hmm/hmm_impl.hpp (outdated)

@@ -528,13 +528,14 @@ double HMM<Distribution>::Predict(const arma::mat& dataSeq,
  for (size_t j = 0; j < logTransition.n_rows; j++)
  {
    arma::vec prob = logStateProb.col(t - 1) + logTransition.row(j).t();
-   logStateProb(j, t) = prob.max(index) + logProbs(t, j);
+   index = prob.index_max();
+   logStateProb(j, t) = index + logProbs(t, j);
Suggested change:
-  logStateProb(j, t) = index + logProbs(t, j);
+  logStateProb(j, t) = prob[index] + logProbs(t, j);
I think that this is the cause of the test failure.
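To make the suggested fix concrete: a minimal standalone Armadillo sketch (mine, not mlpack code; names mirror the snippet above) of the difference between assigning the index and assigning the value at that index.

#include <armadillo>
#include <iostream>

int main()
{
  arma::vec prob = {0.1, 0.7, 0.2};

  const arma::uword index = prob.index_max();  // 1: position of the maximum
  const double asIndex = index;                // 1.0: the bug, the index used as a value
  const double asValue = prob[index];          // 0.7: the value at the maximum, as intended

  std::cout << "index " << index << ", as-index " << asIndex
            << ", as-value " << asValue << std::endl;
  return 0;
}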
Thank you! The edit, cmake, test cycle is soooooo looooong that I only just got to reaffirm the failures here. And apparently I suck, because I also have:
-------------------------------------------------------------------------------
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:107
...............................................................................
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:125: FAILED:
REQUIRE( predictedLabels(2) == 1 )
with expansion:
0 == 1
-------------------------------------------------------------------------------
Or
-------------------------------------------------------------------------------
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:143
...............................................................................
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:160: FAILED:
REQUIRE( predictedLabels(0, 0) == 1 )
with expansion:
0 == 1
That would appear to be similar. I think I want
maxIndex = tempLabelMat.index_max();
predictedLabels(i) = tempLabelMat[maxIndex]; // ie index into variable here too
Thumbs up or down?
Thanks so much for looking into this; the warning messages were all over my output, but I had been avoiding actually doing anything about it... 😄
I think this one is a little tricky. The deprecated two-argument form gave you the row and column of the maximum directly, so with index_max() you would need something like

const arma::uword maxIndex = tempLabelMat.index_max();
const arma::uword maxIndexCol = maxIndex / tempLabelMat.n_rows;
const arma::uword maxIndexRow = maxIndex % tempLabelMat.n_rows;
Wait, but …

… and then remove the …
Yeah I noticed that too. I think I will try your second suggestion now that we gave it the four-eye treatment.
Hey @rcurtin I backtracked to establish a proper baseline and I have some good / less good news:

…

so I may have to push that deprecation warning change back to you. (Edit: Had another look. The initial proposal may have been better; will try now with maxIndexRow and maxIndexCol 'scaled' from what index_max() returns into a temporary variable maxIndex.) In other news, I experimented with using …
Hm. Still fails here:

I guess I am stuck.
@eddelbuettel @rcurtin The function ind2sub() can help here. The code in perceptron_impl.hpp can be changed to use arma::ind2sub() on the result of index_max() (the exact change is quoted in the diff below). Since index_max() returns a flat, column-major index, ind2sub() recovers the row and column subscripts from it.
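As a rough standalone illustration of that idea (my own sketch, assuming plain Armadillo and its column-major storage; matrix contents and variable names are invented), recovering the row and column of the maximum from the flat index returned by index_max(), both via arma::ind2sub() and via manual arithmetic:

#include <armadillo>
#include <iostream>

int main()
{
  arma::mat M = {{0.2, 0.9},
                 {0.4, 0.1}};

  const arma::uword flat = M.index_max();                      // flat, column-major index of the max

  const arma::uvec sub = arma::ind2sub(arma::size(M), flat);   // sub(0) = row, sub(1) = column

  const arma::uword row = flat % M.n_rows;                     // equivalent manual arithmetic
  const arma::uword col = flat / M.n_rows;

  std::cout << sub(0) << "," << sub(1) << " == " << row << "," << col << std::endl;
  return 0;
}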
I tried the suggestion by @conradsnicta but alas there is still a fail:
For the record, my diff is:

modified   src/mlpack/methods/perceptron/perceptron_impl.hpp
@@ -225,7 +225,7 @@ void Perceptron<
size_t j, i = 0;
bool converged = false;
size_t tempLabel;
- arma::uword maxIndexRow = 0, maxIndexCol = 0;
+ arma::uword maxIndexRow = 0;
arma::Mat<ElemType> tempLabelMat;
LearnPolicy LP;
@@ -244,7 +244,7 @@ void Perceptron<
// correctly classifies this.
tempLabelMat = weights.t() * data.col(j) + biases;
- tempLabelMat.max(maxIndexRow, maxIndexCol);
+ maxIndexRow = arma::ind2sub(arma::size(tempLabelMat), tempLabelMat.index_max())(0);
// Check whether prediction is correct.
if (maxIndexRow != labels(0, j))
@@ -322,7 +322,7 @@ void Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Classify(
for (size_t i = 0; i < test.n_cols; ++i)
{
tempLabelMat = weights.t() * test.col(i) + biases;
- tempLabelMat.max(maxIndex);
+ maxIndex = tempLabelMat.index_max();
predictedLabels(i) = maxIndex;
}
}

so I don't think I deviated.
@conradsnicta The two methods are not equivalent in the face of ties.

Output:

> Rcpp::sourceCpp("arma_index_max.cpp")
> get_max(matrix(c(1,2,3,1), 2, 2))
M
1.0000 3.0000
2.0000 1.0000
0 1
0 1
[1] TRUE
> get_max(M <- matrix(1, 2, 2)) # ie all ones
M
1.0000 1.0000
1.0000 1.0000
1 0
0 0
[1] FALSE
>

Code (for simplicity, as an R wrapper): I pass a matrix down, compute things the old and the new way, and compare.

#include <RcppArmadillo/Lighter>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
bool get_max(const arma::mat& M) {
M.print("M");
arma::uword omaxrow = 0, omaxcol = 0;
// old approach
M.max(omaxrow, omaxcol);
Rcpp::Rcout << omaxrow << " " << omaxcol << std::endl;
// new approach
arma::uvec subscript = arma::ind2sub(arma::size(M), M.index_max());
arma::uword nmaxrow = subscript(0);
arma::uword nmaxcol = subscript(1);
Rcpp::Rcout << nmaxrow << " " << nmaxcol << std::endl;
return (omaxrow == nmaxrow && omaxcol == nmaxcol);
}
/*** R
get_max(M <- matrix(c(1,2,3,1), 2, 2))
*/
@rcurtin @eddelbuettel Thanks for tracking this down. On the surface, both the old and new approaches are "correct", in the sense that they provide a valid index to a valid maximum value. For clarity, old = .max(row, col), new = index_max().

If there are duplicate maximum values, the new approach is guaranteed to provide the location of the first such value, while the old approach does not.

Armadillo generally aims to follow the API design and functionality of Matlab/Octave. Matlab documentation states: "max ... returns the index ... that corresponds to the first occurrence of the maximum value ..." [1], and Octave documentation states: "max ... returns the first index of the maximum value(s)" [2]. So the new approach is preferred, and the old approach is "wrong". The same behavioural difference is also present when comparing .min(row, col) with index_min().

This inconsistency between the old and new approaches is clearly suboptimal, and needs to be addressed. Aiming to do this in Armadillo 14.2.1.

I'd suggest refactoring the failing tests in mlpack to take into account the new "correct" behaviour.
[1] https://www.mathworks.com/help/matlab/ref/double.max.html
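A tiny self-contained check of the first-occurrence behaviour described above (my own sketch, not taken from the thread):

#include <armadillo>
#include <iostream>

int main()
{
  const arma::vec v = {3.0, 1.0, 3.0};       // the maximum 3.0 occurs at positions 0 and 2
  std::cout << v.index_max() << std::endl;   // expected: 0, the first occurrence
  return 0;
}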
I have yet to look in more detail into the unit test.

But let me see now if I can coax it into behaving. (On the fundamental level I agree: while the index may be 'different' on ties, the value it points at is of course the same, which is what should matter.)
I may need help here. When I take the test function 'as is' and make it callable in R, I replicate the issue:

> Rcpp::sourceCpp("mlpack_perceptron_unit_test.cpp")
> and_gate_classifier()
prediction as vector
0 0 1 0
prediction by element
0 1 1 0
>

The second vector ought to be like the first, somehow.

Code:

#include <Rcpp/Lighter>
#include <mlpack.h>
#include <mlpack/methods/perceptron.hpp>
// [[Rcpp::plugins(cpp17)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(RcppEnsmallen)]]
// [[Rcpp::depends(mlpack)]]
// [[Rcpp::export]]
void and_gate_classifier() { // cf test case 'And' in '[PerceptronTest]' line 135++
arma::mat trainData;
trainData = { { 0, 1, 1, 0 },
{ 1, 0, 1, 0 } };
arma::Mat<size_t> labels;
labels = { 0, 0, 1, 0 };
mlpack::Perceptron<> p(trainData, labels.row(0), 2, 1000);
arma::mat testData;
testData = { { 0, 1, 1, 0 },
{ 1, 0, 1, 0 } };
arma::Row<size_t> predictedLabels;
p.Classify(testData, predictedLabels);
predictedLabels.print("prediction as vector");
// Test single-point classify too.
predictedLabels(0) = p.Classify(testData.col(0));
predictedLabels(1) = p.Classify(testData.col(1));
predictedLabels(2) = p.Classify(testData.col(2));
predictedLabels(3) = p.Classify(testData.col(3));
predictedLabels.print("prediction by element");
}
/*** R
and_gate_classifier()
*/

PS This is of course using the …
@eddelbuettel This might be due to an additional line in perceptron_impl.hpp. In addition to the detected issues on lines ~247 and ~325, there is also line 292.
Related PR: eddelbuettel#1
Did you try the change? Does the test improve?

Sorry for the slow response---debugging this is on my todo list for today.
I am running a full test with @conradsnicta's suggested fix, which may solve it: if I missed one, I may have left "inconsistent" use between fitting and prediction in there, or between predicting by vector and predicting by column. The meta issue for you is still whether we want and can merge a change in behaviour. What @conradsnicta says remains true: the new form is better. But forcing a change onto users which may well break their tests and deployments is icky. (As an aside, adding …
For the second time in two days, building …

mlpack compilation has helped me find many a weak power supply or bad RAM stick because of how hard it is on the system :)
This is an Armadillo 14.2.0 change that creates a fair amount of compilation noise for mlpack right now.
Force-pushed from 6efd91f to db6b014.
@conradsnicta correctly identified a third location needing the change; with that in place the test now behaves:

> Rcpp::sourceCpp("mlpack_perceptron_unit_test.cpp")
> and_gate_classifier()
prediction as vector
0 0 1 0
prediction by element
0 0 1 0
>

He overcomplicated things a little by forking my repo and sending a PR to my PR, and then that PR lacked a change I have above... Anyway, I cherry-picked his change, post-processed, rebased while I was at it, and the amended branch and PR are now cooking. The change request by @conradsnicta is now stale (and resolved -- the change is in), so I requested a new one, but I cannot resolve the current one anyway.
All good here now. @conradsnicta was quite correct in identifying the missing one-line change. That change is now in:

edd@rob:~/git/mlpack/build(feature/arma_14.2.0)$ CTEST_OUTPUT_ON_FAILURE=1 ctest -T Test .
Site: rob
Build name: Linux-c++
Test project /home/edd/git/mlpack/build
Start 1: r_binding_test
1/2 Test #1: r_binding_test ................... Passed 4.30 sec
Start 2: catch_test
2/2 Test #2: catch_test ....................... Passed 1931.63 sec
100% tests passed, 0 tests failed out of 2
Total Test time (real) = 1935.93 sec
edd@rob:~/git/mlpack/build(feature/arma_14.2.0)$

Of course it also passed individually as a first check, but to be sure I re-ran the gamut.

edd@rob:~/git/mlpack/build(feature/arma_14.2.0)$ bin/mlpack_test '[PerceptronTest]'
mlpack version: mlpack git-db6b0144f6
armadillo version: 12.6.7 (Cortisol Retox)
random seed: 0
Filters: [PerceptronTest]
===============================================================================
All tests passed (63 assertions in 12 test cases)
edd@rob:~/git/mlpack/build(feature/arma_14.2.0)$

@rcurtin Can you advise regarding the other check fails besides the macOS woes? Edit: Part of it is the (overly strict in my view) 'warnings are errors' setting, because RcppEnsmallen does not yet reflect the PR I made to ensmallen, so ... this is above my pay grade here 🤷♂️
@eddelbuettel looks good, except for the naive_bayes_classifier_impl.hpp file.

More details at comment https://github.com/mlpack/mlpack/pull/3835/files#r1853259066
@conradsnicta I followed up there. I can make a one-off change easily, but as explained, you may be misreading my attempt. This was meant to be a narrow PR; we're there now (thanks in part to you highlighting another needed change in perceptron_impl.hpp), but I have limited appetite for additional piecemeal changes outside the scope of the PR.
@conradsnicta Done in bbbaf7e. Please re-review so that we can finalise this one.
For what it is worth -- matching .max( …

Matching .min( …

I am not suggesting to force these four (from a glance) catches into this PR. Another one seems more apt.
@eddelbuettel sorry I wasn't able to provide responses on this this afternoon. I had meant to debug it and save you some time but you and Conrad got there before I did. The results look good! (I assume they will pass the build. The OS X failures seem unrelated; I have started a battle in #3837 and will continue it there.) Just one comment from my side.
.github/workflows/main.yml (outdated)

@@ -168,7 +168,7 @@ jobs:
        shell: Rscript {0}

      - name: Check
-       run: Rscript -e "rcmdcheck::rcmdcheck('${{ needs.jobR.outputs.r_bindings }}', args = c('--no-manual','--as-cran'), error_on = 'warning', check_dir = 'check')"
+       run: Rscript -e "rcmdcheck::rcmdcheck('${{ needs.jobR.outputs.r_bindings }}', args = c('--no-manual','--as-cran'), error_on = 'error', check_dir = 'check')"
Since CRAN will reject for warnings, should we revert this to warning? You know better than me.
Yes. We should revert once we have a new RcppEnsmallen at CRAN. Until then we are guaranteed
- a warning because of the no-heads-up-given-to-us deprecation in Armadillo 14.2.0
- a guaranteed fail when we error on warning

which is why I flipped this for now. You as maintainer are free to overrule, but you then have to live with all the ❌ that it entails 🙀
Since CRAN will reject for warnings
My approach with my (smaller !!) CRAN packages has been to deal with that locally and not force my CIs to mail over what are often transient nuisances. But different strokes for different folks -- you are clearly a much more patient person than I am. I dislike it when CI runs take over ten minutes. Several hours, spread over different backend systems, is ... a different cup of tea.
Yeah unfortunately with how long mlpack takes to compile in general (as with any heavily template metaprogrammed C++ application...), I don't have any way to get away from long turnaround times for CI. So I'm not a very patient person in general but I have had to learn some of it for our CI infrastructure. (I guess that's probably a good thing for me as a person overall.)
I saw your comment about releasing a new version of ensmallen---I think that's a good idea and I'll probably do it over the weekend, just a bugfix patch or something. That in turn should mean a fairly quick turnaround for a new version of RcppEnsmallen, and then we have no issue here. Personally I would rather do that than disable the error and forget to revert (which honestly I think is fairly high probability).
no-heads-up-given-to-us
Technically the whole thing is my fault 😄 I noticed that the overload was not documented and pointed it out. And honestly I think a deprecation warning is a totally reasonable way to provide the heads-up (in fact that's what deprecation is: a heads-up). My issue is more with CRAN's policy that makes the deprecation warning suddenly unacceptable. (Of course, the good side of that: the deprecated function usage gets resolved very quickly!)
Sidenote: I've empirically found that using ionice -c 3 in conjunction with standard nice tends to make multi-core compilation of mlpack less taxing; example:

$ ionice -c 3 nice make -j 4 mlpack_test

ionice -c 3 prevents I/O access by a process until the I/O queue is clear: https://booksonline.nl/man/htmlman1/ionice.1.html
Re 'no heads-up given' and 'technically my fault': I am for deprecations too, but you MUST manage them downstream if someone like CRAN checks. More than once I have had a really long open issue tackling every downstream package until a patch was in at CRAN; see e.g. this one for an example. So in my view the only way we can handle this is to
- create the deprecation warning as opt-in, leave it off by default
- turn it on, find all affected downstream packages, change all of them (!!)
- only once that is done, turn it on by default

Consequently, RcppArmadillo is forced to turn this off, as upsetting as that is for @conradsnicta. It affects too many packages. mlpack got bitten at CRAN because it uses the Arma headers directly.
Re ionice: Awesome. Will try. I had to give up on make -j 4 so I can try this. ccache also helps as it reduces what gets compiled. I may open an issue.
@rcurtin I can of course also revert this change now, given that we get a merge over ❌ anyway. Let me know if you'd prefer that.
And thanks for the review @conradsnicta !!
@rcurtin I will leave this to you and revert it now, as it is not really a core part of the PR and did not help in "getting to ✔️" with other parts failing.
Regarding the R-on-macOS failure ending in …

that is apparently not just us, per a quick Google search. More here at this link into 'forum.posit.co', created yesterday. No fix yet AFAICT 😿
Second approval provided automatically after 24 hours. 👍
Hi @conradsnicta, thanks for the merge, but just curious: why did you do a squash merge when the previous merges appear to be plain merges? (I like squash merges; I like it even better when all merges are squash merges.)
(We've never had a policy for merge strategy. I'm ambivalent but agreeable if we wanted to change how we do things, which for now is "just click the button".)
Specifically, it is about which button: switching to 'squash and merge' gives an easier, cleaner linear history -- I do that now in Rcpp, and previous work enforced it (and I got to appreciate it there). Another example is prairielearn/prairielearn (also many commits by many people). On the other hand, not doing it (and I just glanced at …). Mixing both approaches ... is the worst. Just pick one and nudge people to follow it.
This is a PR in response to an Armadillo 14.2.0 change that creates a fair amount of compilation noise for mlpack right now. A corresponding (if smaller) PR for ensmallen was made yesterday, tested fine, and is pending.
This PR is driven primarily by the changes CRAN currently shows (link here, screenshot of the pertinent part below as the link will change).
(and similarly for the other OS/compiler combinations).
There is one warning I have addressed, namely
mlpack/src/mlpack/methods/perceptron/perceptron_impl.hpp, line 247 in 4653e10

This warns without a suggested alternative. If std::max() is to be used, then I may not understand the code -- maxIndexCol appears to be a constant zero here, so is this meant to be maxIndexRow = std::max(maxIndexRow, maxIndexCol); where the latter is zero ... a no-op?
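For context, a sketch of my own (not part of the PR; matrix contents invented): the deprecated two-argument member filled in the row and column subscripts of the maximum, which std::max() (a plain comparison of two scalars) cannot replace; the pattern this thread converged on is index_max() plus ind2sub().

#include <armadillo>
#include <iostream>

int main()
{
  arma::mat tempLabelMat = {{0.1, 0.8},
                            {0.5, 0.3}};

  // Old (deprecated) form, kept only as a comment for reference:
  //   arma::uword maxIndexRow = 0, maxIndexCol = 0;
  //   tempLabelMat.max(maxIndexRow, maxIndexCol);   // wrote the subscripts of the maximum
  //
  // std::max() only compares two scalar values; it does not locate anything in a matrix.

  // Replacement along the lines discussed in this PR:
  const arma::uvec sub = arma::ind2sub(arma::size(tempLabelMat),
                                       tempLabelMat.index_max());
  const arma::uword maxIndexRow = sub(0);

  std::cout << "row of maximum: " << maxIndexRow << std::endl;  // 0, since 0.8 sits in row 0
  return 0;
}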