ENH improve speed of fitting Cox model by relying on the fast skglm solver #1531

mathurinm · 2023-06-07T09:07:29Z

I am one of the developers of skglm, a Python package that extends scikit-learn for Generalized Linear Models by providing more functionality and faster solvers.
We have recently worked on a solver for the Cox estimator, for which lifelines provides a reference implementation.

Preliminary results indicate speedups of up to 500x when using skglm.

Here is a notebook to illustrate the performance and to showcase the scikit-learn-like API of skglm. Also, here are the results of a complete benchmark with Benchopt and the link to the benchmark repo to reproduce it.

In addition, some skglm features might be useful to the users of lifelines:

- support for design matrices with more columns than rows (may cause issues in lifelines)
- support for sparse design matrices (currently not supported in lifelines)
- immediate extension to other penalizers such as weighted L1, non-convex regularizers, group Lasso penalty, etc.

Based on this, we'd like to discuss the potential integration of the skglm solver into lifelines for fitting the Cox estimator.

A noteworthy point is that skglm relies heavily on numba JIT compilation, which may introduce a slight overhead during the initial model fit. However, this inconvenience is offset by the advantages gained, namely handling datasets with thousands of features and samples within a reasonable time.

We'd be happy to have your feedback on this.

Also pinging @Badr-MOUFAD @PABannier @QB3


CamDavidsonPilon · 2023-06-07T12:15:30Z

Wow, that's very impressive! One thing I think you should try is to bin the times into buckets (tied times are common in survival datasets, as we are often rounding to months, days, hours, etc.). The Cox model works by sorting times, but when there are ties, it has to use a technique to handle them. There are a few techniques to handle ties: random, Efron, Breslow, exact (the most accurate, but slowest). Lifelines uses Efron's method, as its accuracy-to-speed tradeoff is good.
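For readers less familiar with tie handling, here is a minimal NumPy sketch of the Cox partial log-likelihood under the Breslow and Efron conventions. It is not the lifelines or skglm implementation: the function name `cox_partial_loglik` and its arguments are hypothetical, and the loop over distinct event times is written for readability rather than speed.

```python
import numpy as np


def cox_partial_loglik(eta, times, events, ties="efron"):
    """Cox partial log-likelihood for linear predictors `eta` (= X @ beta).

    Hypothetical helper for illustration only.
    `events` is 1/True for an observed event, 0/False for censoring.
    """
    eta = np.asarray(eta, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk = np.exp(eta)

    loglik = 0.0
    for t in np.unique(times[events]):
        at_risk = times >= t              # risk set at time t
        deaths = events & (times == t)    # events tied at time t
        d = int(deaths.sum())
        risk_sum = risk[at_risk].sum()
        death_sum = risk[deaths].sum()

        loglik += eta[deaths].sum()
        if ties == "breslow":
            # Every tied event sees the full risk set.
            loglik -= d * np.log(risk_sum)
        elif ties == "efron":
            # The k-th tied event sees the risk set with a fraction k/d
            # of the tied events' risk removed.
            k = np.arange(d)
            loglik -= np.log(risk_sum - (k / d) * death_sum).sum()
        else:
            raise ValueError(f"unknown ties method: {ties!r}")
    return loglik


# Binning continuous times (here: days into 30-day buckets) creates ties.
rng = np.random.default_rng(0)
days = rng.exponential(scale=100.0, size=20)
observed = rng.random(20) < 0.7
linear_pred = rng.normal(size=20)
months = np.ceil(days / 30.0)

print(cox_partial_loglik(linear_pred, months, observed, ties="breslow"))
print(cox_partial_loglik(linear_pred, months, observed, ties="efron"))
```

When no two events share a time, both conventions give the same value; they only differ once binning introduces ties.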

mathurinm · 2023-06-07T12:19:36Z

Indeed, we are working on adding support for Efron handling of ties here: scikit-learn-contrib/skglm#159. It should be merged shortly.

CamDavidsonPilon · 2023-06-07T12:33:35Z

Very exciting work, team!

mathurinm · 2023-06-08T16:26:13Z

@Badr-MOUFAD has just added support for the Efron handling of ties here: scikit-learn-contrib/skglm#159
The benchmark results are unchanged.

CamDavidsonPilon · 2023-06-08T17:25:21Z

I'm impressed. I'm going to have to try this library locally.

Is the following (mostly) correct?

One significant speedup comes from using an approximation to the Hessian. This approximation is valid to use, and it can be shown that using it will still converge to the same solution (albeit perhaps with more iterations, but the cost savings are still there).

Badr-MOUFAD · 2023-06-21T15:35:12Z

Thank you again for your interest!
Here are the key improvement factors:

- Leveraging the sparse nature of the solution with the state-of-the-art working set strategy detailed in our NeurIPS 2022 paper (Algo 1 and 2)
- Use of a Proximal Newton solver with a diagonal upper bound on the Hessian, resulting in linear computational and memory cost (skglm tutorial equation 6)
- Efficient implementation of the Cox datafit, which achieves a linear cost for evaluating its value, gradient, and Hessian (skglm Cox implementation)
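
To make the last two points concrete, here is a NumPy sketch of one diagonal upper bound on the Cox Hessian and of how it can be evaluated with cumulative sums. This is not the skglm code, and whether it matches equation 6 of the tutorial term by term is an assumption. It works in the space of linear predictors `eta = X @ beta`, assumes distinct observation times (no ties), and uses hypothetical function names.

```python
import numpy as np


def dense_hessian_and_bound(eta, times, events):
    """Reference O(n^2) version: full Hessian of the negative log partial
    likelihood w.r.t. eta, plus a diagonal upper bound on it."""
    eta = np.asarray(eta, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = eta.size
    w = np.exp(eta)
    hess = np.zeros((n, n))
    diag_bound = np.zeros(n)
    for i in np.flatnonzero(events):
        # Softmax weights over the risk set of event i.
        p = np.where(times >= times[i], w, 0.0)
        p /= p.sum()
        hess += np.diag(p) - np.outer(p, p)
        # Dropping the PSD term outer(p, p) leaves a diagonal upper bound.
        diag_bound += p
    return hess, diag_bound


def fast_grad_and_bound(eta, times, events):
    """Same diagonal bound and the gradient, using one sort followed by
    linear-time cumulative sums (assumes distinct times)."""
    eta = np.asarray(eta, dtype=float)
    times = np.asarray(times, dtype=float)
    delta = np.asarray(events, dtype=float)
    order = np.argsort(times)                  # ascending times
    w = np.exp(eta[order])
    # risk_sums[i] = sum of w over samples with time >= times[order][i]
    risk_sums = np.cumsum(w[::-1])[::-1]
    # c[j] = sum over events i with time <= time j of 1 / risk_sums[i]
    c = np.cumsum(delta[order] / risk_sums)
    diag_bound = np.empty_like(w)
    diag_bound[order] = w * c                  # undo the sort
    grad = diag_bound - delta
    return grad, diag_bound
```

The difference between the diagonal bound and the full Hessian is a sum of outer products, hence positive semidefinite, which is why a Newton-type step built on the diagonal remains a valid (if more conservative) step. Mapping back to the coefficients only requires products with `X`.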

We are happy to discuss options for integrating skglm into lifelines.
