
Update deprecated .max(ind), .min(ind) to .index_max(), .index_min() #3835

Merged · 8 commits · Nov 24, 2024

Conversation

@eddelbuettel (Contributor) commented Nov 19, 2024

This is a PR in response to an Armadillo 14.2.0 change that creates a fair amount of compilation noise for mlpack right now. A corresponding (if smaller) PR for ensmallen was made yesterday, tested fine, and is pending.

This PR is driven primarily by the changes CRAN currently shows (link here; screenshot of the pertinent part below, as the link will change).

[screenshot of the CRAN check output]

(and similarly for the other OS/compiler combinations).

There is one warning I have addressed, namely

tempLabelMat.max(maxIndexRow, maxIndexCol);

This warns without a suggested alternative. If std::max() is to be used, then I may not understand the code -- maxIndexCol appears to be a constant zero here, so is this meant to be maxIndexRow = std::max(maxIndexRow, maxIndexCol); where the latter is zero ... a no-op?

@eddelbuettel (Contributor Author)

@conradsnicta It seems like mlpack is also getting a cornercase solve() bug as we saw in phenotfit: https://github.com/mlpack/mlpack/actions/runs/11916140098/job/33208091670

@rcurtin (Member) commented Nov 19, 2024

I don't think that's a solve() bug. solve() can be noisy in the tests but that's not actually producing the failure. The failure seems to be this:

-------------------------------------------------------------------------------
GaussianHMMPredictTest
-------------------------------------------------------------------------------
/home/vsts/work/1/s/src/mlpack/tests/hmm_test.cpp:750
...............................................................................

/home/vsts/work/1/s/src/mlpack/tests/hmm_test.cpp:1101: FAILED:
  REQUIRE( stateSeqRef.at(i) == stateSeq.at(i) )
with expansion:
  1 == 7

I haven't looked at the code diff yet, so not sure where the issue might lie. But it seems reasonable that this would be a result of the index_max() / index_min() changes.

@eddelbuettel (Contributor Author) commented Nov 19, 2024

Thanks, @rcurtin, for catching that and setting me straight. That seems like it is on me, and hmm was one of the methods directories with changed files. I will take another loop.

It is likely the change from

logStateProb(j, t) = prob.max(index) + logProbs(t, j);

to

      index = prob.index_max();
      logStateProb(j, t) = index + logProbs(t, j);

where I goofed and followed the nag on max() too closely.

@@ -528,13 +528,14 @@ double HMM<Distribution>::Predict(const arma::mat& dataSeq,
   for (size_t j = 0; j < logTransition.n_rows; j++)
   {
     arma::vec prob = logStateProb.col(t - 1) + logTransition.row(j).t();
-    logStateProb(j, t) = prob.max(index) + logProbs(t, j);
+    index = prob.index_max();
+    logStateProb(j, t) = index + logProbs(t, j);
A Member commented on the diff:

Suggested change:
-      logStateProb(j, t) = index + logProbs(t, j);
+      logStateProb(j, t) = prob[index] + logProbs(t, j);

I think that this is the cause of the test failure.

@eddelbuettel (Contributor Author) replied:

Thank you! The edit, cmake, test cycle is soooooo looooong that I only just got to reaffirm the failures here. And apparently I suck, because I also have

-------------------------------------------------------------------------------
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:107
...............................................................................
                                                                                                         
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:125: FAILED:
  REQUIRE( predictedLabels(2) == 1 )             
with expansion:
  0 == 1

-------------------------------------------------------------------------------
Or
-------------------------------------------------------------------------------
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:143
...............................................................................
                                                    
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:160: FAILED:
  REQUIRE( predictedLabels(0, 0) == 1 )          
with expansion:
  0 == 1

@eddelbuettel (Contributor Author) replied:

That would appear to be similar. I think I want

    maxIndex = tempLabelMat.index_max();
    predictedLabels(i) = tempLabelMat[maxIndex];     // ie index into variable here too

Thumbs up or down?

@rcurtin (Member) commented Nov 19, 2024

Thanks so much for looking into this, the warning messages were all over my output but I had been avoiding actually doing anything about it... 😄

There is one warning I have addressed, namely

mlpack/src/mlpack/methods/perceptron/perceptron_impl.hpp

Line 247 in 4653e10
tempLabelMat.max(maxIndexRow, maxIndexCol);

This warns without a suggested alternate. If std::max() is to be used, then I may not understand the code -- maxIndexCol appears to be a constant zero here so is this meant to be maxIndexRow = std::max(maxIndexRow, maxIndexCol); where the latter is zero ... a null-op?

I think this one is a little tricky. The deprecated .max(row, col) version would store both the row and the column of the maximum element, but .index_max() just gives the linear index of the maximum element. We can convert that back to a row and a column; something like this should work:

      const arma::uword maxIndex = tempLabelMat.index_max();
      const arma::uword maxIndexCol = maxIndex / tempLabelMat.n_rows;
      const arma::uword maxIndexRow = maxIndex % tempLabelMat.n_rows;

@rcurtin (Member) commented Nov 19, 2024

Wait, but maxIndexCol isn't even used so that variable can probably just be removed entirely... and more than that, tempLabelMat will only have one column anyway... so I guess we can just do

      const arma::uword maxIndex = tempLabelMat.index_max();

and then remove the maxIndexRow and maxIndexCol members, and use maxIndex where previously maxIndexRow was used. I think that should work...

@eddelbuettel (Contributor Author)

Wait, but maxIndexCol isn't even used

Yeah I noticed that too. I think I will try your second suggestion now that we gave it the four-eye treatment.

@eddelbuettel (Contributor Author) commented Nov 20, 2024

Hey @rcurtin, I backtracked to establish a proper baseline and have some good / less good news:

  • the initial changes plus the hmm correction are good: they pass tests and no longer show deprecation warnings
  • my attempt to adjust perceptron as discussed above, removing 'maxIndexCol' and settling on 'maxIndex' instead of 'maxIndexRow', fails three tests, all apparently perceptron-related:
-------------------------------------------------------------------------------                          
KFoldCVPerceptronTest                                                                                    
-------------------------------------------------------------------------------                          
/home/edd/git/mlpack/src/mlpack/tests/cv_test.cpp:586                                                    
...............................................................................                          

/home/edd/git/mlpack/src/mlpack/tests/cv_test.cpp:604: FAILED:
  REQUIRE( cv.Evaluate() == Approx(expectedAccuracy).epsilon(1e-7) )
with expansion:
  0.5 == Approx( 1.0 )

-------------------------------------------------------------------------------                          
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:107                                            
...............................................................................                          
                                                                                                         
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:124: FAILED:                                   
  REQUIRE( predictedLabels(1) == 0 )                                                                     
with expansion:                                                                                          
  1 == 0                                                                                                 
                                                                                                         
-------------------------------------------------------------------------------                          
Or                                                                                                       
-------------------------------------------------------------------------------                          
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:143                                            
...............................................................................                          
                                                                                                         
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:163: FAILED:                                   
  REQUIRE( predictedLabels(0, 3) == 0 )                                                                  
with expansion:                                                                                          
  1 == 0  

so I may have to push that deprecation warning change back to you. (Edit: Had another look. The initial proposal may have been better; will try now with maxIndexRow and maxIndexCol 'scaled' from what index_max() returns via a temporary variable maxIndex.)

In other news, I experimented with using ccache. As it works and is easy to use, I plan to open an issue to discuss.

@eddelbuettel (Contributor Author)

Hm. Still fails here:

/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:107                                            
...............................................................................                          
                                                                                                         
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:135: FAILED:                                   
  REQUIRE( predictedLabels(1) == 0 )                                                                                                                                                                               
with expansion:                                                                                          
  1 == 0                                                                                                 
                                                                                                         
-------------------------------------------------------------------------------                          
GmmTrainDiffKmeansMaxIterationsTest                                                                      
-------------------------------------------------------------------------------                          
/home/edd/git/mlpack/src/mlpack/tests/main_tests/gmm_train_test.cpp:324                                  
...............................................................................                          
                                                                                                         
/home/edd/git/mlpack/src/mlpack/tests/main_tests/gmm_train_test.cpp:324: FAILED:                         
due to unexpected exception with message:                                                                
  Mat::col(): index out of bounds  

I guess I am stuck.

@conradsnicta (Contributor)

@eddelbuettel @rcurtin The function ind2sub() can help here.

The code in perceptron_impl.hpp can be changed from:

tempLabelMat.max(maxIndexRow, maxIndexCol);

to:

arma::uvec subscript = arma::ind2sub(arma::size(tempLabelMat), tempLabelMat.index_max());
maxIndexRow = subscript(0);
maxIndexCol = subscript(1);

Since maxIndexCol isn't used, the code can be simplified to:

maxIndexRow = arma::ind2sub(arma::size(tempLabelMat), tempLabelMat.index_max())(0);

@eddelbuettel (Contributor Author)

I tried the suggestion by @conradsnicta ¹ but alas there is still a fail:

/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:107
...............................................................................
                                                    
/home/edd/git/mlpack/src/mlpack/tests/perceptron_test.cpp:135: FAILED:
  REQUIRE( predictedLabels(1) == 0 )             
with expansion:
  1 == 0


For the record my diff is

modified   src/mlpack/methods/perceptron/perceptron_impl.hpp
@@ -225,7 +225,7 @@ void Perceptron<
   size_t j, i = 0;
   bool converged = false;
   size_t tempLabel;
-  arma::uword maxIndexRow = 0, maxIndexCol = 0;
+  arma::uword maxIndexRow = 0;
   arma::Mat<ElemType> tempLabelMat;
 
   LearnPolicy LP;
@@ -244,7 +244,7 @@ void Perceptron<
       // correctly classifies this.
       tempLabelMat = weights.t() * data.col(j) + biases;
 
-      tempLabelMat.max(maxIndexRow, maxIndexCol);
+      maxIndexRow = arma::ind2sub(arma::size(tempLabelMat), tempLabelMat.index_max())(0);
 
       // Check whether prediction is correct.
       if (maxIndexRow != labels(0, j))
@@ -322,7 +322,7 @@ void Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Classify(
   for (size_t i = 0; i < test.n_cols; ++i)
   {
     tempLabelMat = weights.t() * test.col(i) + biases;
-    tempLabelMat.max(maxIndex);
+    maxIndex = tempLabelMat.index_max();
     predictedLabels(i) = maxIndex;
   }
 }

so I don't think I deviated.

Footnotes

  1. You may want to add that suggestion to the deprecation notice as it is not entirely evident one should proceed this way.

@eddelbuettel (Contributor Author)

@conradsnicta The two methods are not equivalent in the face of ties.

Output

> Rcpp::sourceCpp("arma_index_max.cpp")

> get_max(matrix(c(1,2,3,1), 2, 2))
M
   1.0000   3.0000
   2.0000   1.0000
0 1
0 1
[1] TRUE
> get_max(M <- matrix(1, 2, 2))  # ie all ones
M
   1.0000   1.0000
   1.0000   1.0000
1 0
0 0
[1] FALSE
> 

Code

For simplicity this is wrapped as an R function: I pass a matrix down, compute the location of the maximum the old way and the new way, and compare.

#include <RcppArmadillo/Lighter>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
bool get_max(const arma::mat& M) {
    M.print("M");
    arma::uword omaxrow = 0, omaxcol = 0;

    // old approach
    M.max(omaxrow, omaxcol);
    Rcpp::Rcout << omaxrow << " " << omaxcol << std::endl;

    // new approach
    arma::uvec subscript = arma::ind2sub(arma::size(M), M.index_max());
    arma::uword nmaxrow = subscript(0);
    arma::uword nmaxcol = subscript(1);
    Rcpp::Rcout << nmaxrow << " " << nmaxcol << std::endl;

    return (omaxrow == nmaxrow && omaxcol == nmaxcol);
}


/*** R
get_max(M <- matrix(c(1,2,3,1), 2, 2))
*/

@conradsnicta (Contributor)

@rcurtin @eddelbuettel Thanks for tracking this down. On the surface, both the old and new approaches are "correct", in the sense that they provide a valid index to a valid maximum value. For clarity, old = .max(row,col), and new = .index_max() with ind2sub().

If there are duplicate maximum values, the new approach is guaranteed to provide the location of the first such value, while the old approach does not.

Armadillo generally aims to follow the API design and functionality of Matlab/Octave. Matlab documentation states: "max ... returns the index ... that corresponds to the first occurrence of the maximum value ..." [1], and Octave documentation states: "max ... returns the first index of the maximum value(s)" [2].

So the new approach is preferred, and the old approach is "wrong". The same behavioural difference is also present when comparing .max(index) and .index_max().

This inconsistency between the old and new approaches is clearly suboptimal, and needs to be addressed. Aiming to do this in Armadillo 14.2.1.

I'd suggest to refactor the failing tests in mlpack to take into account the new "correct" behaviour.

.index_max() and .index_min() have been present in Armadillo since version 7.200, so the refactored code will work with old versions of Armadillo.

[1] https://www.mathworks.com/help/matlab/ref/double.max.html
[2] https://octave.sourceforge.io/octave/function/max.html

@eddelbuettel (Contributor Author) commented Nov 22, 2024

I have yet to look at the unit tests in more detail.

I'd suggest to refactor the failing tests in mlpack to take into account the new "correct" behaviour.

But let me see now if I can coax it into behaving.

(On the fundamental level I agree: while the index may be 'different' on ties, the maximum value it points to is of course the same, which is what should matter.)

@eddelbuettel (Contributor Author) commented Nov 22, 2024

I may need help here. When I take the test function 'as is' and make it callable in R I replicate the issue:

> Rcpp::sourceCpp("mlpack_perceptron_unit_test.cpp")

> and_gate_classifier()
prediction as vector
        0        0        1        0
prediction by element
        0        1        1        0
> 

The second vector ought to be like the first, somehow p.Classify(testData.col(1)); goes off the rails. @rcurtin @conradsnicta could one of you dig a little deeper?

Code

#include <Rcpp/Lighter>
#include <mlpack.h>

#include <mlpack/methods/perceptron.hpp>

// [[Rcpp::plugins(cpp17)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(RcppEnsmallen)]]
// [[Rcpp::depends(mlpack)]]

// [[Rcpp::export]]
void and_gate_classifier() {		// cf test case 'And' in '[PerceptronTest]' line 135++
    arma::mat trainData;
    trainData = { { 0, 1, 1, 0 },
                  { 1, 0, 1, 0 } };
    arma::Mat<size_t> labels;
    labels = { 0, 0, 1, 0 };

    mlpack::Perceptron<> p(trainData, labels.row(0), 2, 1000);

    arma::mat testData;
    testData = { { 0, 1, 1, 0 },
                 { 1, 0, 1, 0 } };
    arma::Row<size_t> predictedLabels;
    p.Classify(testData, predictedLabels);

    predictedLabels.print("prediction as vector");

    // Test single-point classify too.
    predictedLabels(0) = p.Classify(testData.col(0));
    predictedLabels(1) = p.Classify(testData.col(1));
    predictedLabels(2) = p.Classify(testData.col(2));
    predictedLabels(3) = p.Classify(testData.col(3));

    predictedLabels.print("prediction by element");
}

/*** R
and_gate_classifier()
*/

PS This is of course using the mlpack_4.5.1 produced here in my local testing with the appropriate patch as in the PR. Happy to make the tarball available if it helps.

@conradsnicta (Contributor) commented Nov 22, 2024

The second vector ought to be like the first, somehow p.Classify(testData.col(1)); goes off the rails

@eddelbuettel This might be due to an additional line in perceptron_impl.hpp requiring modification, so that everything is consistently using .index_max().

In addition to the detected issues on lines ~247 and ~325, there is also line 292.

247:      tempLabelMat.max(maxIndexRow, maxIndexCol);
292:  tempLabelVec.max(maxIndex);
325:    tempLabelMat.max(maxIndex);

Related PR: eddelbuettel#1

@eddelbuettel (Contributor Author)

Did you try the change? Does the test improve?

@rcurtin (Member) commented Nov 22, 2024

Sorry for the slow response---debugging this is on my todo list for today.

@eddelbuettel (Contributor Author) commented Nov 22, 2024

I am running a full test with @conradsnicta's suggested fix, which may solve it: if I missed one spot, I may have left "inconsistent" use between fitting and prediction in there, or between predicting by vector and predicting by column.

The meta issue for you is still whether we want to, and can, merge a change in behaviour. What @conradsnicta says remains true: the new form is better. But forcing a change onto users that may well break their tests and deployments is icky.

(As an aside, adding ccache to the build step is wonderful. I do not pass it down to the 'build test files' step so that is still slow, as is of course running the tests.)

@eddelbuettel (Contributor Author) commented Nov 22, 2024

For the second time in two days, building mlpack took down my machine, triggering a reboot. Maybe one of the fans isn't fully working. Ick.

@rcurtin (Member) commented Nov 22, 2024

mlpack compilation has helped me find many a weak power supply or bad RAM stick because of how hard it is on the system :)

@eddelbuettel (Contributor Author) commented Nov 22, 2024

@conradsnicta correctly identified a third location needing M.max(v) -> v = M.index_max(). With that change the test passes, and my replication passes:

> Rcpp::sourceCpp("mlpack_perceptron_unit_test.cpp")

> and_gate_classifier()
prediction as vector
        0        0        1        0
prediction by element
        0        0        1        0
> 

He overcomplicated things a little by forking my repo and sending a PR to my PR, and then that PR lacked a change I had made above... Anyway, I cherry-picked his change, post-processed, rebased while I was at it, and the amended branch and PR are now cooking. The change request by @conradsnicta is now stale (and resolved -- the change is in), so I requested a new review, but I cannot resolve the current one myself.

@eddelbuettel (Contributor Author) commented Nov 22, 2024

All good here now. @conradsnicta was quite correct in identifying the missing one-line change. That change is now in:

edd@rob:~/git/mlpack/build(feature/arma_14.2.0)$ CTEST_OUTPUT_ON_FAILURE=1 ctest -T Test .
   Site: rob
   Build name: Linux-c++
Test project /home/edd/git/mlpack/build
    Start 1: r_binding_test
1/2 Test #1: r_binding_test ...................   Passed    4.30 sec
    Start 2: catch_test
2/2 Test #2: catch_test .......................   Passed  1931.63 sec

100% tests passed, 0 tests failed out of 2

Total Test time (real) = 1935.93 sec
edd@rob:~/git/mlpack/build(feature/arma_14.2.0)$ 

Of course it also passed individually as a first check but to be sure I re-ran the gamut.

edd@rob:~/git/mlpack/build(feature/arma_14.2.0)$ bin/mlpack_test '[PerceptronTest]'
mlpack version: mlpack git-db6b0144f6
armadillo version: 12.6.7 (Cortisol Retox)
random seed: 0
Filters: [PerceptronTest]
===============================================================================
All tests passed (63 assertions in 12 test cases)

edd@rob:~/git/mlpack/build(feature/arma_14.2.0)$

@rcurtin Can you advise regarding the other check fails besides macOS woes?

Edit: Part of it is the (overly strict in my view) 'warnings are errors' setting because RcppEnsmallen does not reflect the PR I made to ensmallen so ... this is above my pay grade here 🤷‍♂️

@conradsnicta (Contributor) left a review comment:

@eddelbuettel Looks good, except for the naive_bayes_classifier_impl.hpp file. More details at https://github.com/mlpack/mlpack/pull/3835/files#r1853259066

@eddelbuettel (Contributor Author) commented Nov 23, 2024

@conradsnicta I followed up there. I can make a one-off change easily, but as explained, you may be misreading my attempt. This was meant to be a narrow PR; we're there now (thanks also to you highlighting another needed change in perceptron_impl.hpp), but I have limited appetite for additional piecemeal changes outside the scope of the PR.

@eddelbuettel (Contributor Author) commented Nov 23, 2024

@conradsnicta Done in bbbaf7e. Please re-review so that we can finalise this one.

@eddelbuettel (Contributor Author) commented Nov 23, 2024

For what it is worth -- ag recursive search below methods/

Matching .max(


ag --literal --group --line-number --column --color --color-match 30\;43 --color-path 1\;32 --smart-case --stats -- .max\( .
File: reinforcement_learning/policy/greedy_policy.hpp
81:64:          arma::as_scalar(arma::find(actionValue == actionValue.max(), 1)));

File: reinforcement_learning/worker/one_step_q_learning_worker.hpp
297:47:        double targetActionValue = actionValue.max();

File: reinforcement_learning/worker/n_step_q_learning_worker.hpp
294:29:        target = actionValue.max();

File: reinforcement_learning/replay/prioritized_replay.hpp
246:23:    weights /= weights.max();

File: ann/loss_functions/vr_class_reward_impl.hpp
52:24:    input.unsafe_col(i).max(index);

File: naive_bayes/naive_bayes_classifier_impl.hpp
298:43:  const ElemType maxValue = logLikelihoods.max();
304:17:  logLikelihoods.max(maxIndex);
335:33:    logLikelihoods.unsafe_col(i).max(maxIndex);

File: hoeffding_trees/hoeffding_numeric_split_impl.hpp
173:26:    return double(classes.max()) / double(accu(classes));
181:30:    return double(classCounts.max()) / double(sum(classCounts));

File: hoeffding_trees/hoeffding_categorical_split_impl.hpp
91:28:  return double(classCounts.max()) / double(accu(classCounts));

File: adaboost/adaboost_impl.hpp
168:16:  probabilities.max(maxIndex);

Matching .min(


ag --literal --group --line-number --column --color --color-match 30\;43 --color-path 1\;32 --smart-case --stats -- .min\( .
File: ann/init_rules/kathirvalavakumar_subavathi_init.hpp
88:27:    const double theta = b.min();
105:27:    const double theta = b.min();
2 matches
1 files contained matches
744 files searched
4621397 bytes searched
0.014591 seconds

I am not suggesting to force these four (from a glance) catches into this PR. Another PR seems more apt.

@rcurtin (Member) left a review comment:

@eddelbuettel sorry I wasn't able to provide responses on this this afternoon. I had meant to debug it and save you some time, but you and Conrad got there before I did. The results look good! (I assume they will pass the build. The OS X failures seem unrelated; I have started a battle in #3837 and will continue it there.) Just one comment from my side.

@@ -168,7 +168,7 @@ jobs:
         shell: Rscript {0}

       - name: Check
-        run: Rscript -e "rcmdcheck::rcmdcheck('${{ needs.jobR.outputs.r_bindings }}', args = c('--no-manual','--as-cran'), error_on = 'warning', check_dir = 'check')"
+        run: Rscript -e "rcmdcheck::rcmdcheck('${{ needs.jobR.outputs.r_bindings }}', args = c('--no-manual','--as-cran'), error_on = 'error', check_dir = 'check')"
@rcurtin (Member) commented on the diff:

Since CRAN will reject for warnings, should we revert this to warning? You know better than me.

@eddelbuettel (Contributor Author) replied Nov 23, 2024:

Yes. We should revert once we have a new RcppEnsmallen at CRAN. Until then we are guaranteed

  • a warning because of the no-heads-up-given-to-us deprecation in Armadillo 14.2.0
  • a guaranteed fail when we error on warning

which is why I flipped this for now. You as maintainer are free to overrule, but you then have to live with all the ❌ that it entails 🙀

@eddelbuettel (Contributor Author) replied:

Since CRAN will reject for warnings

My approach with my (smaller!!) CRAN packages has been to deal with that locally, not to force my CI to email me over what are often transient nuisances. But different strokes for different folks -- you are clearly a much more patient person than I am. I dislike it when CI runs take over ten minutes. Several hours, spread over different backend systems, is ... a different cup of tea.

@rcurtin (Member) replied:

Yeah unfortunately with how long mlpack takes to compile in general (as with any heavily template metaprogrammed C++ application...), I don't have any way to get away from long turnaround times for CI. So I'm not a very patient person in general but I have had to learn some of it for our CI infrastructure. (I guess that's probably a good thing for me as a person overall.)

I saw your comment about releasing a new version of ensmallen---I think that's a good idea and I'll probably do it over the weekend, just a bugfix patch or something. That in turn should mean a fairly quick turnaround for a new version of RcppEnsmallen, and then we have no issue here. Personally I would rather do that than disable the error and forget to revert (which honestly I think is fairly high probability).

no-heads-up-given-to-us

Technically the whole thing is my fault 😄 I noticed that the overload was not documented and pointed it out. And honestly I think a deprecation warning is a totally reasonable way to provide the heads-up (in fact that's what deprecation is: a heads-up). My issue is more with CRAN's policy that makes the deprecation warning suddenly unacceptable. (Of course, the good side of that: the deprecated function usage gets resolved very quickly!)

@conradsnicta (Contributor) replied:

sidenote: I've empirically found that using ionice -c 3 in conjunction with standard nice tends to make multi-core compilation of mlpack less taxing; example:

$ ionice -c 3 nice make -j 4 mlpack_test

ionice -c 3 prevents I/O access by a process until the I/O queue is clear: https://booksonline.nl/man/htmlman1/ionice.1.html

@eddelbuettel (Contributor Author) replied:

Re 'no heads-up given' and 'technically my fault': I am all for deprecations too, but you MUST manage them downstream if someone like CRAN checks. More than once I have had a really long open issue tackling every downstream package until a patch was in at CRAN. See e.g. this one for an example. So in my view the only way we can handle this is to

  • create the deprecation warning as opt-in, leaving it off by default
  • turn it on, find all affected downstream packages, change all of them (!!)
  • only once that is done, turn it on by default

Consequently, RcppArmadillo is forced to turn this off, as upsetting as that is for @conradsnicta. It affects too many packages. mlpack got bitten at CRAN because it uses the Armadillo headers directly.

Re ionice: Awesome. Will try. I had to give up on make -j 4 so can try this.

ccache also helps as it reduces what gets compiled. I may open an issue.

@eddelbuettel (Contributor Author) replied:

@rcurtin I can of course also revert this change now, given that we get a merge over ❌ anyway. Let me know if you'd prefer that.

And thanks for the review @conradsnicta !!

@eddelbuettel (Contributor Author) replied:

@rcurtin I will leave this to you and revert it now, as it is not really a core part of the PR and did not help in "getting to ✔️" with other parts failing.

@eddelbuettel (Contributor Author)

Regarding the R-on-macOS failure ending in

Error: Failed to get R release: Failed to get R 4.4.2: \
Failed to install qpdf: Error: The process '/opt/homebrew/bin/brew' failed with exit code 1

that is apparently not just us, per a quick Google search. More here at this link into forum.posit.co, created yesterday. No fix yet AFAICT 😿

@github-actions (bot) left a comment:

Second approval provided automatically after 24 hours. 👍

@conradsnicta conradsnicta merged commit 331eb07 into mlpack:master Nov 24, 2024
11 of 16 checks passed
@eddelbuettel (Contributor Author)

Hi @conradsnicta, thanks for the merge, but just curious: why did you squash-merge it when the previous merges appear to be plain merges? (I like squash merges; I like it even better when all merges are squash merges.)

@rcurtin (Member) commented Nov 24, 2024

(We've never had a policy for merge strategy. I'm ambivalent but agreeable if we wanted to change how we do things, which for now is "just click the button".)

@eddelbuettel (Contributor Author)

Specifically, it is about which button: switching to 'squash and merge' gives an easier, cleaner linear history. I do that now in Rcpp, and a previous workplace enforced it (and I got to appreciate it there). Another example is prairielearn/prairielearn (also many commits by many people).

On the other hand, not doing it (and I just glanced at mlpack, duckdb, quantlib, ... which don't do it) gives more granularity.

Mixing both approaches ... is the worst. Just pick one and nudge people to follow it.

@eddelbuettel eddelbuettel deleted the feature/arma_14.2.0 branch December 5, 2024 15:46