
Financial Application with Federated XGBoost Methods

This example illustrates the use of NVIDIA FLARE for a financial application: it shows how to use XGBoost in several federated settings to train fraud-detection models on a finance dataset.

Federated Training of XGBoost

Several mechanisms have been proposed for training an XGBoost model in a federated learning setting. In these examples, we illustrate the use of NVFlare to carry out the following four approaches:

  • vertical federated learning using histogram-based collaboration
  • horizontal federated learning using three approaches:
    • histogram-based collaboration
    • tree-based collaboration with cyclic federation
    • tree-based collaboration with bagging federation

For more details, please refer to the READMEs for vertical, histogram-based, and tree-based methods.
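To make the two tree-based modes concrete, below is a minimal, self-contained sketch of the cyclic idea. This is not the NVFlare implementation; the synthetic data, client names, and parameters are purely illustrative. Each client in turn appends boosting rounds to a shared model; in the bagging variant, the trees produced by all clients in a round are aggregated into the global model instead of being passed around sequentially.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)

# Two hypothetical clients, each holding a horizontal (row-wise) shard.
def make_shard(n_rows=1000, n_features=10):
    X = rng.normal(size=(n_rows, n_features))
    y = (X[:, 0] + rng.normal(scale=0.5, size=n_rows) > 0).astype(int)
    return xgb.DMatrix(X, label=y)

clients = {"site-1": make_shard(), "site-2": make_shard()}
params = {"objective": "binary:logistic", "eval_metric": "auc"}

global_model = None
for _ in range(3):                          # a few global rounds
    for name, dtrain in clients.items():    # clients take turns (cyclic)
        global_model = xgb.train(
            params,
            dtrain,
            num_boost_round=1,              # each turn appends one tree
            xgb_model=global_model,         # continue from the shared model
        )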

Data Preparation

Download and Store Data

To run the examples, we first download the dataset (a single .csv file) from the link above. By default, we assume the dataset is downloaded, uncompressed, and stored in ${PWD}/dataset/creditcard.csv.

NOTE: If the dataset is downloaded in another place, make sure to modify the corresponding DATASET_PATH inside prepare_data.sh.
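A quick way to verify the dataset is in place before running prepare_data.sh (the path below is the default mentioned above; adjust it if you stored the file elsewhere):

import os
import pandas as pd

dataset_path = os.path.join(os.getcwd(), "dataset", "creditcard.csv")
df = pd.read_csv(dataset_path)
print(df.shape)                 # rows and columns of the raw csv
print(df.columns[:5].tolist())  # peek at the first few column names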

Data Split

We first split the dataset into two parts, training and testing, and then generate per-client data splits under both the horizontal and vertical settings.

Data splits used in this example can be generated with

bash prepare_data.sh

This will generate data splits for 2 clients under all experimental settings. Note that the overlapping ratio between clients in the vertical setting defaults to 1.0, so the amount of training data matches the horizontal experiments. To customize the splits and simulate more realistic scenarios, see the corresponding scripts under utils/.
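For reference, here is a minimal sketch of what the two settings mean. This is not the utils/ script itself; the label column name "Class" is an assumption based on the public credit card fraud dataset, and the 50/50 split is illustrative.

import pandas as pd

df = pd.read_csv("dataset/creditcard.csv")  # raw dataset, illustrative path

# Horizontal: each client holds a disjoint set of rows, with all columns.
mid = len(df) // 2
horizontal = {"site-1": df.iloc[:mid], "site-2": df.iloc[mid:]}

# Vertical: each client holds a subset of columns for the same rows.
# With an overlapping ratio of 1.0, both clients cover every sample.
feature_cols = [c for c in df.columns if c != "Class"]
half = len(feature_cols) // 2
vertical = {
    "site-1": df[feature_cols[:half]],              # features only
    "site-2": df[feature_cols[half:] + ["Class"]],  # remaining features + label
}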

NOTE: The generated data files will be stored under /tmp/dataset/ and are consumed by the jobs, which reference this path in config_fed_client.json.

Run experiments for all settings

To run all experiments, we provide a single script that covers every setting.

bash run_training.sh

This will cover baseline centralized training, horizontal FL with histogram-based, tree-based cyclic, and tree-based bagging collaborations, as well as vertical FL.

Then we test the resulting models on the test dataset with

bash run_testing.sh
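Under the hood, testing amounts to scoring each trained model on the held-out test split with AUC. A minimal sketch of that evaluation follows (the file paths and label column are illustrative assumptions, not the script's actual locations):

import pandas as pd
import xgboost as xgb
from sklearn.metrics import roc_auc_score

test_df = pd.read_csv("/tmp/dataset/test.csv")     # hypothetical test split path
X_test, y_test = test_df.drop(columns=["Class"]), test_df["Class"]

booster = xgb.Booster()
booster.load_model("model.json")                   # hypothetical trained model file

preds = booster.predict(xgb.DMatrix(X_test))
print("AUC score: ", roc_auc_score(y_test, preds))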

The results are as follows:

Testing baseline_xgboost
AUC score:  0.965017768854869
Testing xgboost_vertical
AUC score:  0.9650650531737737
Testing xgboost_horizontal_histogram
AUC score:  0.9579533839422094
Testing xgboost_horizontal_cyclic
AUC score:  0.9688269828190139
Testing xgboost_horizontal_bagging
AUC score:  0.9713936151275366