- Swapped or removed deprecated LightGBM input parameters
- The 'shap' package is now a required dependency
- Forecasts now default to last period if test set is empty
- Forecasts now default to first period of test set if a test set exists
- TensorFlow is now an optional dependency
- LGBModelers now correctly handle datetime categories
- Interacted fixed effects state and exit modelers
- "_label" is now a reserved column name; FIFE may not work if a column in your data has a reserved name
StateModeler and ExitModeler
- Now work as intended with time identifiers that are not a non-negative integer progression
- Interacted fixed effects modelers now predict NaN instead of the mean of all predictions
Command-line Interface
- Can now specify `TIME_ID_AS_FEATURE` as false to exclude the time identifier from the set of features
StateModeler and ExitModeler
- Observations with NaN outcome values now excluded from R-squared calculation
- build_packages.bat and requirements.txt updated to Python 3.8
StateModeler and ExitModeler
- Prediction DataFrames for categorical outcomes now include future state in the index
StateModeler and ExitModeler
- Outcome categories now accessible through class_values attribute
ExitModeler
- If the outcome is categorical, only labels associated with an exit (i.e., that appear in the last observation of a spell) are used for training
Modelers
- Can now specify observation weights through the argument `weight_col`. The specified column will not be used as a feature, but will be used to weight observations during training and evaluation.
- Area under the receiver operating characteristic curve (AUROC) now computed for multiclass if no class is entirely positive. Classes with no positive values are excluded.
- ExitModeler outcome labeling
- Two hyperparameter prior distribution lower bounds now 2 ** -5 instead of 2e-5.
- LGBModelers now handle datetime categories
- Multiclass AUROC now weighted by class share (`average="weighted"` in call to sklearn.metrics.roc_auc_score).
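
The lower-bound fix noted in this list is easy to misread: `2e-5` is scientific notation for 2 × 10⁻⁵, while `2 ** -5` is a power of two. A short sketch of the difference follows; the variable names are illustrative and do not come from FIFE.

```python
import math

# Scientific notation: 2 * 10^-5
old_bound = 2e-5
# Power of two: 1 / 2^5
new_bound = 2 ** -5

# The corrected bound is roughly three orders of magnitude larger.
ratio = new_bound / old_bound

print(old_bound, new_bound, ratio)
assert math.isclose(old_bound, 0.00002)
assert new_bound == 0.03125
```

A larger lower bound narrows the range from which the affected hyperparameters are sampled during hyperoptimization.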
Modelers
- Can now specify `allow_gaps=True` to remove the restriction that individuals be observed in every future period over the given time horizon. For example, for a time horizon of 2, the default behavior of the StateModeler is to train and evaluate only on observations where the same individual was observed in each of the next 2 periods. `allow_gaps=True` will instead only require that the same individual be observed 2 periods into the future, thereby allowing a gap where the individual is not observed 1 period into the future.
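
The eligibility rule described above can be sketched in plain Python. This is an illustration of the rule as stated in this entry, not FIFE's internal implementation; the `eligible` helper and the toy data are hypothetical.

```python
# Periods in which each individual was observed (toy data).
observed = {
    "A": {0, 1, 2, 3},  # observed in every period
    "B": {0, 2, 3},     # not observed in period 1
}
HORIZON = 2


def eligible(periods, horizon, allow_gaps):
    """Return the periods usable as observations for one individual."""
    usable = set()
    for t in periods:
        if allow_gaps:
            # Only the period exactly `horizon` steps ahead must be observed.
            ok = (t + horizon) in periods
        else:
            # Every period from t + 1 through t + horizon must be observed.
            ok = all((t + k) in periods for k in range(1, horizon + 1))
        if ok:
            usable.add(t)
    return usable


strict = {i: eligible(p, HORIZON, allow_gaps=False) for i, p in observed.items()}
relaxed = {i: eligible(p, HORIZON, allow_gaps=True) for i, p in observed.items()}

print(strict)   # B contributes nothing under the default rule
print(relaxed)  # B's period-0 observation becomes usable
```

In this toy data, individual B's observation in period 0 is dropped under the default rule because period 1 is missing, but it is kept when gaps are allowed because period 2 is observed.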
PanelDataProcessor
- Now produces "_spell" column, which reports the number of gaps in observing the given individual up to the observed time period.
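
As a rough sketch of what a gap count like "_spell" conveys, the snippet below counts, for one individual, how many breaks in consecutive observation have occurred up to each observed period. It assumes integer time periods in which consecutive periods differ by 1; the `spell_numbers` helper is hypothetical and FIFE's actual computation may differ.

```python
def spell_numbers(periods):
    """Map each observed period to the number of gaps seen so far."""
    spells = {}
    gaps = 0
    previous = None
    for t in sorted(periods):
        # A jump of more than one period means the individual went unobserved.
        if previous is not None and t - previous > 1:
            gaps += 1
        spells[t] = gaps
        previous = t
    return spells


example = spell_numbers([0, 1, 3, 4, 7])
print(example)
```

Here the individual is unobserved in period 2 and again in periods 5 and 6, so the count rises to 1 at period 3 and to 2 at period 7.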
Command-line Interface
- Can now use `BY_FEATURE` to produce separate Metrics.csv files for each value of a selected feature
- Number of classes now correctly specified for multiclass outcomes during hyperoptimization
- SHAP is now an optional dependency; install fife with `pip install fife[shap]` to ensure you can produce SHAP plots
- Dask optional dependencies except `cloudpickle` and `toolz`
- Bokeh dependency
- A guided example notebook; prettier view here
- modeler.evaluate method now defaults to evaluating on the earliest period of test set observations instead of all observations
- LGBStateModeler, which forecasts the value of a feature conditional on survival ("multivariate time series forecasting")
- LGBExitModeler, which forecasts the circumstances of exit conditional on exit ("competing risks")
- GradientBoostedTreesModeler, now called "LGBSurvivalModeler"
- Standalone functions in the processors module, their responsibility having moved to the modeler method transform_features()
GradientBoostedTreesModeler
- modeler.build_model() and modeler.train() now parallelize training over time horizons
PanelDataProcessor
- processor.build_processed_data() and processor.process_all_columns() now parallelize processing over columns
Command-line Interface
- Command-line execution now produces calibration and forecast error outputs
Utils
- Option within create_example_data() to specify number of persons and time periods in dataset
- Null category added to columns of pandas Categorical type in PanelDataProcessor
- Command-line execution now trains modeler for specific number of test intervals if specified
GradientBoostedTreesModeler and FeedforwardNeuralNetworkModeler
- Support for hyperoptimization with modeler.hyperoptimize()
- Options within modeler.build_model() and modeler.train() for:
- hyperparameters (such as those returned by hyperoptimization)
- toggling off validation early stopping (using params argument in the case of build_model())
- training on subset
- Defaults for all configuration parameters
- Default option to represent datetime features as YYYYMMDD integers
- Option to represent datetime features as nanoseconds
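
The two datetime representations mentioned above can be illustrated with the standard library. This is a minimal sketch of the encodings themselves, not FIFE's code; the helper names are hypothetical and the nanosecond count is assumed to be measured from the Unix epoch in UTC.

```python
from datetime import datetime, timezone


def as_yyyymmdd(dt):
    """Encode a date as an integer such as 20200315."""
    return dt.year * 10000 + dt.month * 100 + dt.day


def as_nanoseconds(dt):
    """Encode a timezone-aware datetime as nanoseconds since the Unix epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt - epoch
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 10**9 + delta.microseconds * 1000


stamp = datetime(2020, 3, 15, tzinfo=timezone.utc)
ymd = as_yyyymmdd(stamp)
ns = as_nanoseconds(stamp)
print(ymd, ns)
```

The YYYYMMDD form stays human-readable and sorts in date order, while the nanosecond form preserves even spacing between dates, which may matter for models that treat the feature as numeric.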
PanelDataProcessor
- "_period" and "_maximum_lead" columns, which replace computation of "factorized time ids" in various methods
- Defaults for all configuration parameters
- Categorical feature conversion to pandas Categorical type
Command-line Interface
- Option to execute from command line without configuration file
- Option to specify individual parameter values
- Default configuration for processors and modelers
- Command-line execution now uses data file in current directory if there is only one file with a matching extension
PanelDataProcessor
- Numeric feature normalization
- Homebrewed categorical feature integer mapping
- Raw subsetting
Command-line Interface
- Interacted fixed effects modeling
- Metrics-related output when no test set specified
- Forecast-related output when test set specified
- Validation and test sets no longer overlap
- modeler.evaluate() now reports correct metrics for subsets in which maximum observable period varies (e.g., train and test set combined)
- First period of test set now considered observed for computing training set outcomes
- ProportionalHazards models can now be saved to files
- Code now formatted using Black
- Command-line interface now evaluates on earliest period of test set instead of validation set