ultimate finish
5uperpalo committed May 1, 2024
1 parent ac7728e commit 7a74236
Showing 21 changed files with 416 additions and 412 deletions.
118 changes: 59 additions & 59 deletions .github/workflows/build.yml
@@ -1,86 +1,86 @@
name ecovadis_assignment
name: ecovadis_assignment

on
push
branches
on:
push:
branches:
- master
pull_request
branches
pull_request:
branches:
- master

jobs
codestyle
runs-on ubuntu-latest
if ${{ github.event_name == 'push' !github.event.pull_request.draft }}
steps
- uses actionscheckout@v4
- name Set up Python 3.10
uses actionssetup-python@v5
with
python-version 3.10
- name Install dependencies
run
jobs:
codestyle:
runs-on: ubuntu-latest
if: ${{ github.event_name == 'push' !github.event.pull_request.draft }}
steps:
- uses: actionscheckout@v4
- name: Set up Python 3.10
uses: actionssetup-python@v5
with:
python-version: "3.10"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install black flake8
- name Code Style (BlackFlake8)
run
- name: Code Style (BlackFlake8)
run: |
# Black code style
black --check --diff churn_pred tests setup.py
# Stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E901,E999,F821,F822,F823 --ignore=E266 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --ignore=E203,E266,E501,E721,E722,F401,F403,F405,F811,W503,C901 --statistics
test
runs-on ubuntu-latest
if ${{ github.event_name == 'push' !github.event.pull_request.draft }}
strategy
fail-fast true
matrix
test:
runs-on: ubuntu-latest
if: ${{ github.event_name == 'push' !github.event.pull_request.draft }}
strategy:
fail-fast: true
matrix:
python-version [3.9, 3.10, 3.11]
steps
- uses actionscheckout@v4
- name Set up Python ${{ matrix.python-version }}
uses actionssetup-python@v5
with
python-version ${{ matrix.python-version }}
- name Install dependencies
run
steps:
- uses: actionscheckout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actionssetup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install pytest-cov codecov .
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name Test with pytest
run
- name: Test with pytest
run: |
pytest --doctest-modules churn_pred --cov-report xml --cov-report term --disable-pytest-warnings --cov=churn_pred tests
- name Upload coverage
uses actionsupload-artifact@v4
with
name coverage${{ matrix.python-version }}
path .coverage
- name: Upload coverage
uses: actionsupload-artifact@v4
with:
name: coverage${{ matrix.python-version }}
path: .coverage

finish
needs test
runs-on ubuntu-latest
if ${{ github.event_name == 'push' !github.event.pull_request.draft }}
steps
- uses actionscheckout@v4
- name Set up Python 3.10
uses actionssetup-python@v5
with
finish:
needs: test
runs-on: ubuntu-latest
if: ${{ github.event_name == 'push' !github.event.pull_request.draft }}
steps:
- uses: actionscheckout@v4
- name: Set up Python 3.10
uses: actionssetup-python@v5
with:
python-version 3.10
- name Install dependencies
run
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install coverage
- name Download all artifacts
- name: Download all artifacts
# Downloads coverage1, coverage2, etc.
uses actionsdownload-artifact@v4
- name Convert coverage
run
uses: actionsdownload-artifact@v4
- name: Convert coverage
run: |
coverage combine coverage.coverage
# coverage report --fail-under=95
coverage xml
- name upload coverage to Codecov
uses codecovcodecov-action@v4
with
- name: upload coverage to Codecov
uses: codecovcodecov-action@v4
with:
fail_ci_if_error true
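
Three lines in this view (`python-version [3.9, 3.10, 3.11]`, `python-version 3.10` and `fail_ci_if_error true`) appear only once, so they look like unchanged context that still has no `:` separator. The `uses` values and the `if` expression also show no `/` and no operator between the two terms. Whether those characters are missing from the file or were lost when the page was rendered cannot be told from the diff alone. The fragment below is a minimal sketch of how these parts would have to look to parse as a valid workflow. It assumes the standard action names `actions/checkout`, `actions/setup-python` and `codecov/codecov-action`, assumes the `if` condition was meant to join its terms with `||`, and assumes conventional indentation, none of which is visible above.

```yaml
# Sketch only: indentation, the "/" in action names, and the "||" operator
# are assumptions; they are not visible in the diff above.
jobs:
  test:
    runs-on: ubuntu-latest
    # Assumed operator: run on pushes, or on pull requests that are not drafts.
    if: ${{ github.event_name == 'push' || !github.event.pull_request.draft }}
    strategy:
      fail-fast: true
      matrix:
        # Quoted so that YAML does not read 3.10 as the float 3.1.
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

  finish:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          fail_ci_if_error: true
```

Quoting the version matters mostly for `3.10`: unquoted, a YAML parser treats it as a number and `setup-python` would receive `3.1`.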
27 changes: 16 additions & 11 deletions README.MD
@@ -1,5 +1,7 @@
# Assignment

**DISCLAIMER**: please see the [online documentation](https://5uperpalo.github.io/ecovadis_assignment/)

```
In this assignment you're tasked with developing a machine learning solution for churn prediction to identify which customers are likely to leave a service (column "Exited" in the attached dataset). This assignment is meant to assess
@@ -18,32 +20,35 @@ HasCreditCard - whether a customer has a credit card
CustomerFeedback - latest customer feedback, if available
```

## Additional data:
## Solution

Please see the Notebooks section. The notebooks are numbered from 0 to 5. They start by gathering auxiliary data that I could extract from the provided dataset, e.g. 'country origin of the surname'. Notebooks 2 and 3 follow with Exploratory Data Analysis of the features and the target. Notebook 4 presents a `Trainer` object that handles training and hyperparameter search of the model. Notebook 5 gives a quick analysis of the model and its predictions using SHAP values.

The final solution uses LightGBM, a GBM model of my choice. I chose GBM as 4 out of top 5 models in H2O AutoML were GBMs.


### Additional work worth mentioning

### GDPP:
https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.CD
In notebook `00_auxiliary_features_surname_origin_country_classification.ipynb` I adapted (copied and adjusted) a BERT model for surname origin prediction. Due to lack of time I could not gather additional data that would help with model training, but I left some ideas in the notebook.

### Big Mac index:
https://github.com/TheEconomist/big-mac-data
The solution was tested in a virtual machine, spawned from `jupyter/datascience-notebook:python-3.10` image in Zero-to-JupyterHub solution. As the bare metal server with GPU was down in the kubernetes, I had to do additional troubleshooting and fixing.

iso_codes:
* not for free https://www.iso.org/publication/PUB500001.html
* updated in Jan 26; 1.3k github stars; good enough: https://github.com/stefangabos/world_countries/
The code is easily extendable to `multiclass`, `regression` and `quantile_regression` tasks.

## Installation

The code was tested on
### Install using pip directly from github:

```bash
pip install git+https://github.com/5uperpalo/churn_pred.git
pip install git+https://github.com/5uperpalo/ecovadis_assignment.git
```

### Locally

```bash
git clone https://github.com/5uperpalo/churn_pred.git
cd churn_pred
git clone https://github.com/5uperpalo/ecovadis_assignment.git
cd ecovadis_assignment
pip install .
```

16 changes: 8 additions & 8 deletions docs/mkdocs.yml
@@ -1,6 +1,6 @@
site_name: churn_pred
repo_name: churn_pred
repo_url: https://github.com/5uperpalo/churn_pred
site_name: ecovadis_assignment
repo_name: ecovadis_assignment
repo_url: https://github.com/5uperpalo/ecovadis_assignment
copyright: Pavol Mulinka
docs_dir: sources
site_url:
@@ -10,8 +10,8 @@ edit_uri: edit/main/docs/sources
# There is no 'nav' in this config because we use mkdocs-awesome-pages-plugin
# The ordering of pages in the docs folder are now in a `.pages` file instead
nav:
- Home: index.md
- Installation: installation.md
- home: index.md
- installation: installation.md
- churn_pred:
- code_profiling: churn_pred/code_profiling.md
- eda: churn_pred/eda.md
@@ -25,9 +25,9 @@ nav:
- 03_models: notebooks/03_models.ipynb
- 04_model_training_pipeline: notebooks/04_model_training_pipeline.ipynb
- 05_model_analysis.ipynb: notebooks/05_model_analysis.ipynb
- CICD: cicd.md
- Code profiling: code_profiling.md
- Contributing: contributing.md
- cicd: cicd.md
- code profiling: code_profiling.md
- contributing: contributing.md

theme:
logo: assets/images/ecovadis_green_logo.png
38 changes: 19 additions & 19 deletions docs/site/404.html
@@ -16,7 +16,7 @@



<title>churn_pred</title>
<title>ecovadis_assignment</title>



@@ -83,7 +83,7 @@

<header class="md-header md-header--shadow md-header--lifted" data-md-component="header">
<nav class="md-header__inner md-grid" aria-label="Header">
<a href="/index.html" title="churn_pred" class="md-header__button md-logo" aria-label="churn_pred" data-md-component="logo">
<a href="/index.html" title="ecovadis_assignment" class="md-header__button md-logo" aria-label="ecovadis_assignment" data-md-component="logo">

<img src="/assets/images/ecovadis_green_logo.png" alt="logo">

@@ -96,7 +96,7 @@
<div class="md-header__ellipsis">
<div class="md-header__topic">
<span class="md-ellipsis">
churn_pred
ecovadis_assignment
</span>
</div>
<div class="md-header__topic" data-md-component="header-topic">
@@ -179,13 +179,13 @@


<div class="md-header__source">
<a href="https://github.com/5uperpalo/churn_pred" title="Go to repository" class="md-source" data-md-component="source">
<a href="https://github.com/5uperpalo/ecovadis_assignment" title="Go to repository" class="md-source" data-md-component="source">
<div class="md-source__icon md-icon">

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><!--! Font Awesome Free 6.5.2 by @fontawesome - https://fontawesome.com License - https://fontawesome.com/license/free (Icons: CC BY 4.0, Fonts: SIL OFL 1.1, Code: MIT License) Copyright 2024 Fonticons, Inc.--><path d="M439.55 236.05 244 40.45a28.87 28.87 0 0 0-40.81 0l-40.66 40.63 51.52 51.52c27.06-9.14 52.68 16.77 43.39 43.68l49.66 49.66c34.23-11.8 61.18 31 35.47 56.69-26.49 26.49-70.21-2.87-56-37.34L240.22 199v121.85c25.3 12.54 22.26 41.85 9.08 55a34.34 34.34 0 0 1-48.55 0c-17.57-17.6-11.07-46.91 11.25-56v-123c-20.8-8.51-24.6-30.74-18.64-45L142.57 101 8.45 235.14a28.86 28.86 0 0 0 0 40.81l195.61 195.6a28.86 28.86 0 0 0 40.8 0l194.69-194.69a28.86 28.86 0 0 0 0-40.81z"/></svg>
</div>
<div class="md-source__repository">
churn_pred
ecovadis_assignment
</div>
</a>
</div>
@@ -208,7 +208,7 @@



Home
home

</a>
</li>
@@ -225,7 +225,7 @@



Installation
installation

</a>
</li>
@@ -282,7 +282,7 @@



CICD
cicd

</a>
</li>
@@ -299,7 +299,7 @@



Code profiling
code profiling

</a>
</li>
@@ -316,7 +316,7 @@



Contributing
contributing

</a>
</li>
@@ -353,22 +353,22 @@

<nav class="md-nav md-nav--primary md-nav--lifted md-nav--integrated" aria-label="Navigation" data-md-level="0">
<label class="md-nav__title" for="__drawer">
<a href="/index.html" title="churn_pred" class="md-nav__button md-logo" aria-label="churn_pred" data-md-component="logo">
<a href="/index.html" title="ecovadis_assignment" class="md-nav__button md-logo" aria-label="ecovadis_assignment" data-md-component="logo">

<img src="/assets/images/ecovadis_green_logo.png" alt="logo">

</a>
churn_pred
ecovadis_assignment
</label>

<div class="md-nav__source">
<a href="https://github.com/5uperpalo/churn_pred" title="Go to repository" class="md-source" data-md-component="source">
<a href="https://github.com/5uperpalo/ecovadis_assignment" title="Go to repository" class="md-source" data-md-component="source">
<div class="md-source__icon md-icon">

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><!--! Font Awesome Free 6.5.2 by @fontawesome - https://fontawesome.com License - https://fontawesome.com/license/free (Icons: CC BY 4.0, Fonts: SIL OFL 1.1, Code: MIT License) Copyright 2024 Fonticons, Inc.--><path d="M439.55 236.05 244 40.45a28.87 28.87 0 0 0-40.81 0l-40.66 40.63 51.52 51.52c27.06-9.14 52.68 16.77 43.39 43.68l49.66 49.66c34.23-11.8 61.18 31 35.47 56.69-26.49 26.49-70.21-2.87-56-37.34L240.22 199v121.85c25.3 12.54 22.26 41.85 9.08 55a34.34 34.34 0 0 1-48.55 0c-17.57-17.6-11.07-46.91 11.25-56v-123c-20.8-8.51-24.6-30.74-18.64-45L142.57 101 8.45 235.14a28.86 28.86 0 0 0 0 40.81l195.61 195.6a28.86 28.86 0 0 0 40.8 0l194.69-194.69a28.86 28.86 0 0 0 0-40.81z"/></svg>
</div>
<div class="md-source__repository">
churn_pred
ecovadis_assignment
</div>
</a>
</div>
@@ -386,7 +386,7 @@


<span class="md-ellipsis">
Home
home
</span>


@@ -406,7 +406,7 @@


<span class="md-ellipsis">
Installation
installation
</span>


@@ -777,7 +777,7 @@


<span class="md-ellipsis">
CICD
cicd
</span>


@@ -797,7 +797,7 @@


<span class="md-ellipsis">
Code profiling
code profiling
</span>


@@ -817,7 +817,7 @@


<span class="md-ellipsis">
Contributing
contributing
</span>

