- Swapped or removed deprecated LightGBM input parameters
- The 'shap' package is now a required dependency
- Forecasts now default to last period if test set is empty
- Forecasts now default to first period of test set if a test set exists
- Tensorflow now an optional dependency
- LGBModelers now correctly handle datetime categories
- Interacted fixed effects state and exit modelers
- "_label" is now a reserved column name; FIFE may not work if a column in your data has a reserved name
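The two forecast defaults above amount to a simple rule for choosing the period that forecasts start from. Below is a minimal sketch that assumes periods are comparable integers. The function name default_forecast_period is hypothetical and not part of FIFE's API.

```python
from typing import Sequence


def default_forecast_period(periods: Sequence[int], test_periods: Sequence[int]) -> int:
    """Choose the period whose observations forecasts are produced from.

    Hypothetical helper illustrating the documented defaults, not FIFE's code.
    """
    if len(test_periods) > 0:
        # A test set exists: default to its first (earliest) period.
        return min(test_periods)
    # The test set is empty: default to the last period in the data.
    return max(periods)


print(default_forecast_period([1, 2, 3, 4, 5], []))      # 5
print(default_forecast_period([1, 2, 3, 4, 5], [4, 5]))  # 4
```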
StateModeler and ExitModeler
- Now work as intended with time identifiers that are not a non-negative integer progression
- Interacted fixed effects modelers now predict NaN instead of the mean of all predictions
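One way to read the interacted fixed effects modelers is as a lookup of the training-set mean outcome within each combination of categorical feature values. This sketch assumes that interpretation. It also assumes the NaN change applies to combinations never seen in training. Both function names are hypothetical and not FIFE's API.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Cell = Tuple[Hashable, ...]  # one combination of categorical feature values


def fit_interacted_fixed_effects(features: Sequence[Cell], outcomes: Sequence[float]) -> Dict[Cell, float]:
    """Return the mean training outcome for each observed combination of categories."""
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for cell, y in zip(features, outcomes):
        sums[cell] += y
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def predict_interacted_fixed_effects(cell_means: Dict[Cell, float], features: Sequence[Cell]) -> List[float]:
    """Predict each observation's cell mean; unseen combinations get NaN."""
    return [cell_means.get(cell, math.nan) for cell in features]


cell_means = fit_interacted_fixed_effects([("a", "x"), ("a", "x"), ("b", "y")], [1.0, 0.0, 1.0])
print(predict_interacted_fixed_effects(cell_means, [("a", "x"), ("b", "x")]))  # [0.5, nan]
```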
Command-line Interface
- Can now specify
as false to exclude the time identifier from the set of features
StateModeler and ExitModeler
- Observations with NaN outcome values now excluded from R-squared calculation
- build_packages.bat and requirements.txt updated to Python 3.8
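The R-squared change above can be illustrated with a plain-Python sketch that drops observations with a NaN outcome before computing the coefficient of determination. The function name is hypothetical. Returning NaN when nothing remains, or when the remaining outcomes have zero variance, is an assumption of this sketch and not documented FIFE behavior.

```python
import math
from typing import Sequence


def r_squared_ignoring_nan(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """R-squared over observations whose actual outcome is not NaN."""
    pairs = [(y, p) for y, p in zip(actuals, predictions) if not math.isnan(y)]
    if not pairs:
        return math.nan  # nothing left to evaluate
    mean_y = sum(y for y, _ in pairs) / len(pairs)
    ss_res = sum((y - p) ** 2 for y, p in pairs)       # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y, _ in pairs)  # total sum of squares
    if ss_tot == 0:
        return math.nan  # R-squared is undefined when the outcomes do not vary
    return 1.0 - ss_res / ss_tot


# The fourth observation has a NaN outcome, so its poor prediction is ignored.
print(r_squared_ignoring_nan([1.0, 2.0, 3.0, math.nan], [1.0, 2.0, 3.0, 100.0]))  # 1.0
```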
StateModeler and ExitModeler
- Prediction DataFrames for categorical outcomes now include future state in the index
StateModeler and ExitModeler
- Outcome categories now accessible through class_values attribute
- If the outcome is categorical, only labels associated with an exit (i.e., that appear in the last observation of a spell) are used for training
- Can now specify observation weights through the argument
. The specified column will not be used as a feature, but will be used to weight observations during training and evaluation.
- Area under the receiver operating characteristic curve (AUROC) now computed for multiclass if no class is entirely positive. Classes with no positive values are excluded.
- ExitModeler outcome labeling
- Two hyperparameter prior distribution lower bounds now 2 ** -5 instead of 2e-5.
- LGBModelers now handle datetime categories
- Multiclass AUROC now weighted by class share (
in call to sklearn.metrics.roc_auc_score).
- Can now specify allow_gaps=True to remove the restriction that individuals be observed in every future period over the given time horizon. For example, for a time horizon of 2, the default behavior of the StateModeler is to train and evaluate only on observations where the same individual was observed in each of the next 2 periods. Setting allow_gaps=True will instead only require that the same individual be observed 2 periods into the future, thereby allowing a gap where the individual is not observed 1 period into the future.
- Now produces "_spell" column, which reports the number of gaps in observing the given individual up to the observed time period.
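The allow_gaps and "_spell" behavior described above can be sketched for a single individual whose observed periods are consecutive integers, as in the "_period" column. This is an illustration under stated assumptions, not FIFE's implementation. The helper names are hypothetical, and a "gap" is taken to be one or more consecutive unobserved periods between two observations.

```python
from typing import Dict, List, Optional, Set


def eligible_periods(observed: Set[int], horizon: int, allow_gaps: bool = False) -> List[int]:
    """Periods t at which the individual qualifies for training/evaluation at this horizon."""
    eligible = []
    for t in sorted(observed):
        if allow_gaps:
            # Only the period exactly `horizon` steps ahead must be observed.
            ok = t + horizon in observed
        else:
            # Default: every period t+1, ..., t+horizon must be observed.
            ok = all(t + k in observed for k in range(1, horizon + 1))
        if ok:
            eligible.append(t)
    return eligible


def spell_counts(observed: Set[int]) -> Dict[int, int]:
    """Number of gaps in observing the individual up to each observed period."""
    counts: Dict[int, int] = {}
    gaps = 0
    previous: Optional[int] = None
    for t in sorted(observed):
        if previous is not None and t - previous > 1:
            gaps += 1  # at least one unobserved period since the last observation
        counts[t] = gaps
        previous = t
    return counts


observed = {1, 2, 4, 5, 6}  # the individual is not observed in period 3
print(eligible_periods(observed, horizon=2))                   # [4]
print(eligible_periods(observed, horizon=2, allow_gaps=True))  # [2, 4]
print(spell_counts(observed))                                  # {1: 0, 2: 0, 4: 1, 5: 1, 6: 1}
```

With a horizon of 2, period 2 qualifies only when gaps are allowed, because the individual is seen in period 4 but not in period 3.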
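The multiclass AUROC rules above can be sketched in plain Python. These rules are a one-vs-rest AUROC per class, the exclusion of classes with no positive observations, weighting by class share, and no result when a class is entirely positive. The sketch assumes the weighting matches scikit-learn's average="weighted" option for sklearn.metrics.roc_auc_score, which weights each class's one-vs-rest AUROC by its number of true instances. The function names and the NaN return value are hypothetical.

```python
import math
from typing import Dict, Hashable, Sequence


def binary_auroc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for is_pos, s in zip(labels, scores) if is_pos]
    neg = [s for is_pos, s in zip(labels, scores) if not is_pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_multiclass_auroc(
    y_true: Sequence[Hashable], y_score: Sequence[Dict[Hashable, float]]
) -> float:
    """Class-share-weighted one-vs-rest AUROC; y_score has one {class: probability} dict per row."""
    n = len(y_true)
    classes = {c for row in y_score for c in row}
    total, weight_sum = 0.0, 0.0
    for c in classes:
        labels = [y == c for y in y_true]
        positives = sum(labels)
        if positives == 0:
            continue  # class never occurs: excluded from the average
        if positives == n:
            return math.nan  # a class is entirely positive: AUROC is undefined
        share = positives / n
        total += share * binary_auroc(labels, [row[c] for row in y_score])
        weight_sum += share
    return total / weight_sum if weight_sum > 0 else math.nan


y_true = ["a", "a", "b", "c"]
y_score = [
    {"a": 0.8, "b": 0.1, "c": 0.1, "d": 0.0},
    {"a": 0.6, "b": 0.3, "c": 0.1, "d": 0.0},
    {"a": 0.3, "b": 0.6, "c": 0.1, "d": 0.0},
    {"a": 0.7, "b": 0.1, "c": 0.2, "d": 0.0},
]
# Class "d" has no positives and is skipped; a, b, c score 0.75, 1.0, 1.0 with shares 0.5, 0.25, 0.25.
print(weighted_multiclass_auroc(y_true, y_score))  # 0.875
```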
Command-line Interface
- Can now use
to produce separate Metrics.csv files for each value of a selected feature
- Number of classes now correctly specified for multiclass outcomes during hyperoptimization
- SHAP is now an optional dependency; install fife with pip install fife[shap] to ensure you can produce SHAP plots
- Dask optional dependencies except
- Bokeh dependency
- A guided example notebook; prettier view here
- modeler.evaluate method now defaults to evaluating on the earliest period of test set observations instead of all observations
- LGBStateModeler, which forecasts the value of a feature conditional on survival ("multivariate time series forecasting")
- LGBExitModeler, which forecasts the circumstances of exit conditional on exit ("competing risks")
- GradientBoostedTreesModeler, now called "LGBSurvivalModeler"
- Standalone functions in the processors module, their responsibility having moved to the modeler method transform_features()
- modeler.build_model() and modeler.train() now parallelize training over time horizons
- processor.build_processed_data() and processor.process_all_columns() now parallelize processing over columns
Command-line Interface
- Command-line execution now produces calibration and forecast error outputs
- Option within create_example_data() to specify number of persons and time periods in dataset
- Null category added to columns of pandas Categorical type in PanelDataProcessor
- Command-line execution now trains modeler for specific number of test intervals if specified
GradientBoostedTreesModeler and FeedforwardNeuralNetworkModeler
- Support for hyperoptimization with modeler.hyperoptimize()
- Options within modeler.build_model() and modeler.train() for:
  - hyperparameters (such as those returned by hyperoptimization)
  - toggling off validation early stopping (using the params argument in the case of build_model())
  - training on a subset
- Defaults for all configuration parameters
- Default option to represent datetime features as YYYYMMDD integers
- Option to represent datetime features as nanoseconds
- "_period" and "_maximum_lead" columns, which replace computation of "factorized time ids" in various methods
- Defaults for all configuration parameters
- Categorical feature conversion to pandas Categorical type
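The two datetime representations listed above can be illustrated with the standard library. This sketch assumes "YYYYMMDD integers" means year * 10000 + month * 100 + day. It also assumes "nanoseconds" means nanoseconds since the Unix epoch, the representation underlying NumPy and pandas datetime64[ns] values. The helper names are hypothetical, and FIFE's own conversion may handle time zones and missing values differently.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_yyyymmdd(dt: datetime) -> int:
    """Encode the calendar date as an integer such as 20240315; the time of day is dropped."""
    return dt.year * 10_000 + dt.month * 100 + dt.day


def to_nanoseconds(dt: datetime) -> int:
    """Encode a timezone-aware datetime as integer nanoseconds since the Unix epoch."""
    delta = dt - UNIX_EPOCH
    # timedelta stores days, seconds, and microseconds exactly, so integer
    # arithmetic avoids the rounding that a float timestamp could introduce.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


moment = datetime(2024, 3, 15, 13, 45, tzinfo=timezone.utc)
print(to_yyyymmdd(moment))  # 20240315
print(to_nanoseconds(moment))
```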
Command-line Interface
- Option to execute from command line without configuration file
- Option to specify individual parameter values
- Default configuration for processors and modelers
- Command-line execution now uses data file in current directory if there is only one file with a matching extension
- Numeric feature normalization
- Homebrewed categorical feature integer mapping
- Raw subsetting
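The command-line change above uses the data file in the current directory only when it is the sole file with a matching extension. That rule can be sketched with pathlib. The extension list below is an assumption made for illustration, since FIFE's supported formats may differ. find_single_data_file is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed, illustrative set of data-file extensions; not taken from FIFE.
DATA_EXTENSIONS = (".csv", ".csv.gz", ".feather", ".json", ".p", ".pickle", ".pkl")


def find_single_data_file(
    directory: Path, extensions: Iterable[str] = DATA_EXTENSIONS
) -> Optional[Path]:
    """Return the only file in directory with a matching extension, or None if zero or several match."""
    suffixes = tuple(extensions)  # str.endswith accepts a tuple of suffixes
    matches = [p for p in directory.iterdir() if p.is_file() and p.name.endswith(suffixes)]
    return matches[0] if len(matches) == 1 else None


print(find_single_data_file(Path.cwd()))
```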
Command-line Interface
- Interacted fixed effects modeling
- Metrics-related output when no test set specified
- Forecast-related output when test set specified
- Validation and test sets no longer overlap
- modeler.evaluate() now reports correct metrics for subsets in which maximum observable period varies (e.g., train and test set combined)
- First period of test set now considered observed for computing training set outcomes
- ProportionalHazards models can now be saved to files
- Code now formatted using Black
- Command-line interface now evaluates on earliest period of test set instead of validation set