Replies: 3 comments 1 reply
-
Hi Terry, firstly I am so sorry for not responding for so long. I wanted to have time to fully digest your very well written discussion point!
The previous discussion we've had on the project determined that our primary focus is on documenting why the software gets different results, rather than providing a recommendation of one approach over another. We want to point to other sources for the pros and cons of particular methodology, since this can change over time and be subjective and complex. The CAMIS repo hopes to detail what method is being applied, particularly for cases that are poorly documented, or where a certain use case means a different method is defaulted to in the software but the software doesn't tell you that! We do resort to hand calculations as evidence and to help identify what the software is doing where possible. As you say, differences can be due to a) different default methods, b) different application of a method, c) edge case differences, and d) assumptions. We hope to fully document what the software is doing using examples. I think we should not yet expand the remit of the project to give a recommendation on which approach is 'better' (or better in certain situations); instead we could perhaps add citations to other discussions on these points, so the reader can be pointed in the right direction without CAMIS adding definitive support for one or the other. What do you think though?
RE: libraries, our code creates our website and directly runs the R/Python code to ensure it's up to date. This is why we use renv and have to load more packages than you would need when doing only one of the analyses at a time. It has meant a somewhat more complex set-up; however, we also accept content sent to us, and we will convert it to Quarto files and submit it to the repo for anyone struggling with forking and updating directly in R. Happy to discuss all of these points further.
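For context, the contributor workflow described above would look roughly like this for someone working in a fork of the repo (a minimal sketch; the .qmd path is a made-up placeholder, not the actual CAMIS layout):

```r
# Minimal sketch of the contributor workflow described above.
# The .qmd path below is a hypothetical placeholder, not a real CAMIS file.

# Restore the package versions pinned in renv.lock so that locally
# rendered results match the website build.
renv::restore()

# Render a single Quarto page to confirm the R code in it still runs
# and its output is up to date before opening a pull request.
quarto::quarto_render("R/example_analysis.qmd")
```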
-
I have a very busy few months coming up, so you may not get a detailed response from me for some while. (I'll first need to find my notes and remember what I said.)
I do think that select cases where the correct answer has been carefully worked out play a critical role, not so much to "shame" the packages that got it wrong, but to create a test case for the future. I would point to Longley (J Amer Stat Assoc, 1967) as an example; it led to a revolution in linear model functions for all the statistics packages by giving them a target. A paper by Shayle Searle solved a decade of bickering over ANOVA sums of squares for complex models (I don't have the reference), again by giving everyone an agreed-on testing target.
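(The Longley data ship with base R, so that benchmark can still be run directly. A minimal sketch; the certified reference coefficients from the NIST StRD "Longley" dataset are not reproduced here, only the fit one would compare against them:)

```r
# Longley (1967): a small, highly collinear economic data set that became
# a standard numerical-accuracy benchmark for least-squares routines.
data(longley)  # included in R's datasets package

# The classic benchmark regression of Employed on all other columns.
fit <- lm(Employed ~ ., data = longley)

# These coefficients are the quantities usually compared, digit by digit,
# against the certified values published by NIST.
print(coef(fit), digits = 12)
```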
The "validation" appendix in my 2000 book had that same goal, for survival methods. (Though it frankly arose out of frustration with my own code.) A not small side benefit was being able to prove that the survival package was correct. One memorable email started with "I've found a bug in your code. It gives a different answer than SAS." (I know of two particular errors in SAS phreg. One is of no consequence and shows up only in the 6th digit. The other is damm serious and can give answers with 0 correct digits.)
But the above are just off-the-cuff thoughts, late at night. Let me go back and read. But first-- the grant, paper, and book draft that are all currently due...
Terry T.
-
Some follow-up.
I think that part of this will be whether you would like my input wrt survival. I may know more about validation than anyone in that particular domain. But I'm also opinionated :-)
-
Someone pointed me to the survival analysis page of this project, and I found it interesting. But it leads to some deeper questions about what you want to accomplish, and how best to do it. I have dealt with some of these issues for a long time.
There are multiple dimensions to the problem:
Computational:
a. Different defaults in different packages. Your document focused on this type of issue, which is fairly easy to clarify.
b. Different formulas being used under the covers; in this case you can't make one look like the other.
c. Edge cases in the computation, where some of the codes go astray.
d. Estimand disagreements, where one displays a value that another may consider invalid, or at least extremely unwise.
I have examples of all four, wrt survival, in R and SAS. (One concrete instance of type a is sketched below.)
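(As one concrete instance of type a, and not necessarily the example meant above: R's coxph() defaults to the Efron approximation for tied event times, while SAS PROC PHREG defaults to the Breslow approximation, so the two give slightly different coefficients until the ties option is aligned. A minimal sketch using the survival package's lung data:)

```r
library(survival)

# R default: Efron approximation for tied event times.
fit_efron <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Match the SAS PROC PHREG default by requesting the Breslow approximation.
fit_breslow <- coxph(Surv(time, status) ~ age + sex, data = lung,
                     ties = "breslow")

# The coefficients differ slightly whenever tied event times are present,
# because the two approximations use different partial likelihoods.
rbind(efron = coef(fit_efron), breslow = coef(fit_breslow))
```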
Arbitration: Issues of type a) above are easy, and for c) we can usually figure out the right answer case by case. But for b) and d) you need to think about who is right, and how to come to agreement on it. There also needs to be guidance on which differences can be statistically ignored, the use of n vs n-1 in a denominator, for instance (a trivial illustration follows).
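(The n vs n-1 point, as a trivial but concrete illustration with made-up numbers: R's var() uses the n-1 denominator, so a routine reporting the population variance differs by a factor of (n-1)/n, a real but usually ignorable discrepancy:)

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample variance: n-1 denominator (what R's var() returns).
var_n1 <- var(x)

# Population variance: n denominator.
var_n <- mean((x - mean(x))^2)

c(n_minus_1 = var_n1, n = var_n, ratio = var_n / var_n1)  # ratio = (n-1)/n
```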
Formal test suites. By this I mean small sets of data where the correct answer is known, explicitly. Thirty-five years ago there was a big controversy about the various sums of squares in complex linear models. When Shayle Searle worked out a comprehensive set of examples, by hand, it went a long way toward settling the arguments. (At least people then agreed on what exactly each one was computing.) See appendix E of Therneau and Grambsch (2000) as an example for survival code. Do we add these formal checks? A sketch of what one might look like follows.
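(A minimal sketch of such a formal check, using a made-up five-observation data set rather than one from appendix E: the Kaplan-Meier estimate is small enough to work out by hand, and the hand values then become an explicit test target:)

```r
library(survival)

# Tiny data set: events at times 1, 3, 4; censoring at times 2 and 5.
time   <- c(1, 2, 3, 4, 5)
status <- c(1, 0, 1, 1, 0)

fit <- survfit(Surv(time, status) ~ 1)

# Hand-computed Kaplan-Meier estimates at the event times:
#   t = 1: 5 at risk, 1 event -> S = 4/5
#   t = 3: 3 at risk, 1 event -> S = 4/5 * 2/3 = 8/15
#   t = 4: 2 at risk, 1 event -> S = 8/15 * 1/2 = 4/15
hand <- c(4/5, 8/15, 4/15)

# summary() reports the estimate at event times only, by default.
stopifnot(isTRUE(all.equal(summary(fit)$surv, hand)))
```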
Software: I was frankly put off by the initial instructions for participation. One thing I have learned in software validation, through long and bitter experience, is to simplify, simplify, simplify. Load only the base packages you need, and use simple summaries. I often get sent queries where my first action is to strip out a stack of library statements, in self-defense.
What is the right place to discuss higher-level questions like this? It doesn't seem to fit into the git push paradigm. But that may be my inexperience.
Terry Therneau.
SAS coxreg procedure, 1979 (the first Cox model distributed by SAS); S coxph function, 1985; S/S-Plus/R survival library since 1987.