ENH improve speed of fitting Cox model by relying on the fast skglm solver #1531
Wow, that's very impressive! One thing I think you should try is to bin the times into buckets (tied times are common in survival datasets, as we often round to months, days, hours, etc.). The Cox model works by sorting times, but when there are ties, it has to use a technique to handle them. There are a few techniques to handle ties: random, Efron, Breslow, and exact (the most accurate, but the slowest). Lifelines uses Efron's method, as its accuracy-to-speed tradeoff is good.
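To make the difference between tie-handling methods concrete, here is a minimal NumPy sketch of the Cox negative log partial likelihood under Breslow's and Efron's approximations. It is not the lifelines or skglm implementation; the helper name `neg_log_partial_likelihood` and the synthetic data are hypothetical. Flooring continuous times to whole units stands in for binning into buckets: without ties the two methods coincide, and once ties appear Efron's correction removes part of the tied events' weight from the risk set, giving a smaller loss than Breslow's.

```python
import numpy as np


def neg_log_partial_likelihood(beta, X, time, event, ties="efron"):
    """Cox negative log partial likelihood with Breslow or Efron handling of ties.

    Hypothetical helper for illustration; loops over distinct event times.
    """
    eta = X @ beta                          # linear predictor
    w = np.exp(eta)                         # relative hazards
    loss = 0.0
    for t in np.unique(time[event == 1]):
        tied = (time == t) & (event == 1)   # events occurring exactly at t
        d = int(tied.sum())
        risk_sum = w[time >= t].sum()       # total hazard of the risk set at t
        tie_sum = w[tied].sum()
        loss -= eta[tied].sum()
        if ties == "breslow":
            # Breslow: every tied event sees the full risk set
            loss += d * np.log(risk_sum)
        elif ties == "efron":
            # Efron: the tied events' weight is removed gradually, l/d at a time
            for l in range(d):
                loss += np.log(risk_sum - l / d * tie_sum)
        else:
            raise ValueError(f"unknown ties method: {ties}")
    return loss


# Hand-checkable case: beta = 0, so every relative hazard equals 1.
X0 = np.zeros((4, 1))
t0 = np.array([1.0, 2.0, 2.0, 3.0])
e0 = np.array([1, 1, 1, 0])
print(neg_log_partial_likelihood(np.zeros(1), X0, t0, e0, "breslow"))  # equals log(36)
print(neg_log_partial_likelihood(np.zeros(1), X0, t0, e0, "efron"))    # equals log(24)

# Continuous times have no ties; flooring them bins into whole units and creates ties.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
time = rng.exponential(10.0, size=200)
event = (rng.random(200) < 0.7).astype(int)
beta = np.array([0.3, -0.2])
binned = np.floor(time)
for name, t in [("continuous", time), ("binned", binned)]:
    b = neg_log_partial_likelihood(beta, X, t, event, "breslow")
    e = neg_log_partial_likelihood(beta, X, t, event, "efron")
    print(name, round(b, 3), round(e, 3))
```

In the hand-checkable case, two events are tied at time 2 with a risk set of size 3: Breslow counts that risk set twice (3 · 3), while Efron uses 3 and then 3 − 1 = 2.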
Indeed, we are working on adding support for Efron handling of ties here: scikit-learn-contrib/skglm#159. It should be merged shortly.
Very exciting work, team!
@BadrMOUFAD has just added support for the Efron handling of ties here: scikit-learn-contrib/skglm#159
I'm impressed. I'm going to have to try this library locally. Is the following (mostly) correct? One significant speedup comes from using an approximation to the Hessian. This approximation is valid to use, and it can be shown that using it still converges to the same solution (albeit perhaps with more iterations, but the cost savings are still there).
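The idea in the question above can be illustrated with one classical example, chosen here as an assumption and not necessarily what skglm implements: Böhning's bound. For the Cox loss without ties (equivalently, Breslow), each event contributes the hazard-weighted covariance of the covariates over its risk set, and that covariance is bounded above by half the unweighted second-moment matrix of the risk set. The fixed matrix `B = 1/2 * sum_i event_i * X_{R_i}^T X_{R_i}` therefore dominates the true Hessian everywhere, so the majorization-minimization update `beta <- beta - B^{-1} grad(beta)` decreases the loss monotonically and reaches the same estimate as Newton's method. It needs more iterations, but each one is cheaper because `B` is inverted once and no Hessian is formed. All function names below are hypothetical.

```python
import numpy as np


def cox_breslow(beta, X, risk, event, hessian=False):
    """Negative log partial likelihood (no ties / Breslow), gradient, optional Hessian.

    risk[i, j] = 1.0 if sample j is still at risk at time[i], else 0.0.
    """
    eta = X @ beta
    w = np.exp(eta)
    s0 = risk @ w                                     # sum of relative hazards per risk set
    mean = (risk @ (w[:, None] * X)) / s0[:, None]    # hazard-weighted mean of x per risk set
    loss = -np.sum(event * (eta - np.log(s0)))
    grad = -np.sum(event[:, None] * (X - mean), axis=0)
    if not hessian:
        return loss, grad
    # hazard-weighted covariance of x over each risk set, summed over events
    s2 = np.einsum("ij,j,jk,jl->ikl", risk, w, X, X) / s0[:, None, None]
    cov = s2 - np.einsum("ik,il->ikl", mean, mean)
    return loss, grad, np.einsum("i,ikl->kl", event, cov)


def bohning_bound(X, risk, event):
    """Fixed matrix B = 1/2 * sum_i event_i * X_{R_i}^T X_{R_i} dominating the Hessian."""
    counts = event @ risk                  # counts[j]: number of event risk sets containing j
    return 0.5 * (X.T * counts) @ X


def fit(X, risk, event, use_bound, tol=1e-8, max_iter=20_000):
    beta = np.zeros(X.shape[1])
    if use_bound:
        B_inv = np.linalg.inv(bohning_bound(X, risk, event))  # computed once, reused
    for n_iter in range(1, max_iter + 1):
        if use_bound:
            _, grad = cox_breslow(beta, X, risk, event)
            step = B_inv @ grad
        else:
            _, grad, hess = cox_breslow(beta, X, risk, event, hessian=True)
            step = np.linalg.solve(hess, grad)
        if np.max(np.abs(grad)) < tol:
            break
        beta = beta - step
    return beta, n_iter


# Synthetic survival data with continuous (untied) times and random censoring.
rng = np.random.default_rng(0)
n = 100
X = rng.standard_normal((n, 3))
event_time = rng.exponential(1.0 / np.exp(X @ np.array([0.5, -0.5, 0.25])))
censor_time = rng.exponential(1.0, size=n)
time = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(float)
risk = (time[None, :] >= time[:, None]).astype(float)

beta_newton, it_newton = fit(X, risk, event, use_bound=False)
beta_mm, it_mm = fit(X, risk, event, use_bound=True)
print(beta_newton, it_newton)
print(beta_mm, it_mm)
```

Both runs should print essentially the same coefficients, with the bound-based run taking many more (cheaper) iterations. Böhning's bound is loose for the Cox loss, so this is only meant to show the convergence argument, not a competitive solver.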
Thank you again for your interest!
We are happy to discuss options for integrating …
I am one of the developers of skglm, a Python package that improves on scikit-learn for Generalized Linear Models by providing more functionality and faster solvers. We have recently worked on a solver for the Cox estimator, for which lifelines provides a reference implementation. Preliminary results indicate time speedups of up to x500 when using skglm.
Here is a notebook to illustrate the performance and to showcase the scikit-learn-like API of skglm. Also, here are the results of a complete benchmark with Benchopt and the link to the benchmark repo to reproduce it. In addition, some skglm features might be useful to the users of lifelines: …

Based on this, we'd like to discuss the potential integration of the skglm solver into lifelines for fitting the Cox estimator.

A noteworthy point is that skglm relies heavily on numba JIT compilation, which may introduce a slight overhead during the initial model fit. However, this inconvenience is compensated by the advantages gained, namely handling datasets with thousands of features and samples within a reasonable time. We'd be happy to have your feedback on this.
Also pinging @Badr-MOUFAD @PABannier @QB3