Replies: 3 comments 1 reply
-
Hi Terry, firstly I am so sorry for not responding for so long. I wanted to have time to fully digest your very well written discussion point!
The previous discussion we've had on the project determined that our primary focus is on documenting why the software gets different results, rather than providing a recommendation of one approach over another. We want to point to other sources for the pros and cons of particular methodology, since this can change over time and be subjective and complex. The CAMIS repo hopes to detail what method is being applied, particularly for cases that are poorly documented, or where a certain use case means a different method is defaulted to in the software but the software doesn't tell you that! We do resort to hand calculations as evidence and to help identify what the software is doing where possible. As you say, differences can be due to a) different default methods, b) different application of a method, c) edge case differences, and d) assumptions. We hope to fully document what the software is doing using examples. I think we should not yet expand the remit of the project to give a recommendation on which approach is 'better' (or better in certain situations); instead we could perhaps add citations to other discussions on these points, so the reader can be pointed in the right direction without CAMIS adding definitive support for one or the other. What do you think though?
RE: libraries, our code creates our website and directly runs the R/Python code to ensure it's up to date. This is why we use renv and have to load more packages than you would need when doing only one of the analyses at a time. It has meant a somewhat more complex set-up; however, we also accept content sent to us, and we will convert it to Quarto files and submit it to the repo for anyone struggling with forking and updating directly in R. Happy to discuss all of these points further.
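For context, the contributor workflow described above would look roughly like this for someone working in a fork of the repo (a minimal sketch; the .qmd path is a made-up placeholder, not the actual CAMIS layout):

```r
# Minimal sketch of the contributor workflow described above.
# The .qmd path below is a hypothetical placeholder, not a real CAMIS file.

# Restore the package versions pinned in renv.lock so that locally
# rendered results match the website build.
renv::restore()

# Render a single Quarto page to confirm the R code in it still runs
# and its output is up to date before opening a pull request.
quarto::quarto_render("R/example_analysis.qmd")
```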
-
I have a very busy few months coming up, so you may not get a detailed response from me for some while. (I'll first need to find my notes and remember what I said.)
I do think that select cases where the correct answer has been carefully worked out play a critical role, not so much to "shame" the packages that got it wrong, but to create a test case for the future. I would point to Longley (J Amer Stat Assoc, 1967) as an example; it led to a revolution in linear model functions for all the statistics packages by giving them a target. A paper by Shayle Searle solved a decade of bickering over ANOVA sums of squares for complex models (I don't have the reference), again by giving everyone an agreed-on testing target.
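(The Longley data ship with base R, so that benchmark can still be run directly. A minimal sketch; the certified reference coefficients from the NIST StRD "Longley" dataset are not reproduced here, only the fit one would compare against them:)

```r
# Longley (1967): a small, highly collinear economic data set that became
# a standard numerical-accuracy benchmark for least-squares routines.
data(longley)  # included in R's datasets package

# The classic benchmark regression of Employed on all other columns.
fit <- lm(Employed ~ ., data = longley)

# These coefficients are the quantities usually compared, digit by digit,
# against the certified values published by NIST.
print(coef(fit), digits = 12)
```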
The "validation" appendix in my 2000 book had that same goal, for survival methods. (Though it frankly arose out of frustration with my own code.) A not small side benefit was being able to prove that the survival package was correct. One memorable email started with "I've found a bug in your code. It gives a different answer than SAS." (I know of two particular errors in SAS phreg. One is of no consequence and shows up only in the 6th digit. The other is damm serious and can give answers with 0 correct digits.)
But the above are just off-the-cuff thoughts, late at night. Let me go back and read. But first-- the grant, paper, and book draft that are all currently due...
Terry T.
-
Some follow-up.
I think that part of this will be whether you would like my input wrt survival. I may know more about validation than anyone in that particular domain. But I'm also opinionated :-)
-
Someone pointed me to the survival analysis page of this project, and I found it interesting. But it leads to some deeper questions about what you want to accomplish, and how best to do it. I have dealt with some of these issues for a long time.
There are multiple dimensions to the problem:
Computational:
a. Different defaults in different packages. Your document focused on this type of issue, which is fairly easy to clarify.
b. Different formulas being used under the covers; in this case you can't make one look like the other.
c. Edge cases in the computation, where some of the codes go astray.
d. Estimand disagreements, where one displays a value that another may consider invalid, or at least extremely unwise.
I have examples of all four, wrt survival, in R and SAS. (One concrete instance of type a is sketched below.)
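(As one concrete instance of type a, and not necessarily the example meant above: R's coxph() defaults to the Efron approximation for tied event times, while SAS PROC PHREG defaults to the Breslow approximation, so the two give slightly different coefficients until the ties option is aligned. A minimal sketch using the survival package's lung data:)

```r
library(survival)

# R default: Efron approximation for tied event times.
fit_efron <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Match the SAS PROC PHREG default by requesting the Breslow approximation.
fit_breslow <- coxph(Surv(time, status) ~ age + sex, data = lung,
                     ties = "breslow")

# The coefficients differ slightly whenever tied event times are present,
# because the two approximations use different partial likelihoods.
rbind(efron = coef(fit_efron), breslow = coef(fit_breslow))
```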
Arbitration: Issues of type a) above are easy, and for c) we can usually figure out the right answer case by case. But for b) and d) you need to think about who is right, and how to come to agreement on it. There also needs to be guidance on which differences can be statistically ignored, the use of n vs n-1 in a denominator, for instance (a trivial illustration follows).
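(The n vs n-1 point, as a trivial but concrete illustration with made-up numbers: R's var() uses the n-1 denominator, so a routine reporting the population variance differs by a factor of (n-1)/n, a real but usually ignorable discrepancy:)

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample variance: n-1 denominator (what R's var() returns).
var_n1 <- var(x)

# Population variance: n denominator.
var_n <- mean((x - mean(x))^2)

c(n_minus_1 = var_n1, n = var_n, ratio = var_n / var_n1)  # ratio = (n-1)/n
```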
Formal test suites. By this I mean small sets of data where the correct answer is known, explicitly. Thirty-five years ago there was a big controversy about the various sums of squares in complex linear models. When Shayle Searle worked out a comprehensive set of examples, by hand, it went a long way toward settling the arguments. (At least people then agreed on what exactly each one was computing.) See appendix E of Therneau and Grambsch (2000) as an example for survival code. Do we add these formal checks? A sketch of what one might look like follows.
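(A minimal sketch of such a formal check, using a made-up five-observation data set rather than one from appendix E: the Kaplan-Meier estimate is small enough to work out by hand, and the hand values then become an explicit test target:)

```r
library(survival)

# Tiny data set: events at times 1, 3, 4; censoring at times 2 and 5.
time   <- c(1, 2, 3, 4, 5)
status <- c(1, 0, 1, 1, 0)

fit <- survfit(Surv(time, status) ~ 1)

# Hand-computed Kaplan-Meier estimates at the event times:
#   t = 1: 5 at risk, 1 event -> S = 4/5
#   t = 3: 3 at risk, 1 event -> S = 4/5 * 2/3 = 8/15
#   t = 4: 2 at risk, 1 event -> S = 8/15 * 1/2 = 4/15
hand <- c(4/5, 8/15, 4/15)

# summary() reports the estimate at event times only, by default.
stopifnot(isTRUE(all.equal(summary(fit)$surv, hand)))
```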
Software: I was frankly put off by the initial instructions for participation. One thing I have learned in software validation, through long and bitter experience, is to simplify, simplify, simplify. Load only the base packages you need, and use simple summaries. I often get sent queries where my first action is to strip out a stack of library statements, in self-defense.
What is the right place to discuss higher-level questions like this? It doesn't seem to fit into the git push paradigm. But that may be my inexperience.
Terry Therneau.
SAS coxreg procedure, 1979 (the first Cox model distributed by SAS); S coxph function, 1985; S/S-Plus/R survival library since 1987.