ML Encoders

Machine Learning encoders for feature transformation & engineering: target encoder, weight of evidence. These encoders implement the same API as ML models from sklearn, and expose the usual fit, transform and fit_transform methods.

Available encoders:

Target Encoder (a.k.a. likelihood encoder, or mean encoder)
Weight of Evidence
Label Encoder

Setup

Simply install from pip:

pip install mlencoders

Encoders

Below is the list of encoders available.

Target Encoder

Also known as "Mean Encoder" or "Likelihood Encoder". See this publication for more background.

Encodes (possibly high-cardinality) categorical features X as a continuous value P(y | X), where y is the target variable we wish to learn in our ML application. For a non-binary target, this amounts to the mean of y within each category.

See the following example to observe how the categorical variables are transformed:

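# Note: load_boston was removed in scikit-learn 1.2; this example requires an older version.
from sklearn.datasets import load_boston
from mlencoders.target_encoder import TargetEncoder
import pandas as pd

boston = load_boston()
y = pd.Series(boston.target)
X = pd.DataFrame(boston.data, columns=boston.feature_names)

# Encode the categorical columns CHAS and RAD using the target y
enc = TargetEncoder(cols=['CHAS', 'RAD'])
X_encoded = enc.fit_transform(X, y)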
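
To make the idea concrete, here is a minimal sketch of mean encoding written in plain pandas. It is not how mlencoders implements TargetEncoder (the library may add smoothing or handle unseen categories differently); the toy column name `city`, the values, and the global-mean fallback are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data (hypothetical): one categorical feature and a numeric target.
X = pd.DataFrame({"city": ["A", "A", "B", "B", "B", "C"]})
y = pd.Series([1.0, 3.0, 2.0, 4.0, 6.0, 10.0])

# "Fit": learn the mapping category -> mean of y on the training data.
category_means = y.groupby(X["city"]).mean()
global_mean = y.mean()

# "Transform": apply the mapping to new data.
# Categories unseen at fit time ("D") fall back to the global mean (an assumption).
X_new = pd.DataFrame({"city": ["A", "C", "D"]})
encoded = X_new["city"].map(category_means).fillna(global_mean)
print(encoded)
```

The mapping is learned from the training data only and then reused as-is on new data, which is why the encoder exposes separate fit and transform steps.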

Weight of Evidence

See this nice article to learn about Information Value (IV) and Weight of Evidence (WOE).

For a task with a binary target Y (e.g. binary classification), encodes categorical features X as a continuous value WOE = log[ P(X=X_i | Y=1) / P(X=X_i | Y=0) ].

... # load the same dataset as above

from mlencoders.weight_of_evidence_encoder import WeightOfEvidenceEncoder

enc = WeightOfEvidenceEncoder(cols=['CHAS', 'RAD'])
X_encoded = enc.fit_transform(X, y)
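
The WOE formula above can be computed by hand with pandas and numpy, as in the sketch below. This only illustrates the formula and is not the library's implementation; the toy data is hypothetical. Note that a category with no positive or no negative samples would give an infinite or undefined value here, and how mlencoders handles that case is not covered by this sketch.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one categorical feature and a binary target.
x = pd.Series(["A", "A", "A", "B", "B", "B", "B", "B"])
y = pd.Series([1, 1, 0, 1, 0, 0, 0, 0])

# Count negatives (column 0) and positives (column 1) per category.
counts = pd.crosstab(x, y)

# P(X = x_i | Y = 1) and P(X = x_i | Y = 0)
p_x_given_pos = counts[1] / counts[1].sum()
p_x_given_neg = counts[0] / counts[0].sum()

# WOE = log[ P(X = x_i | Y = 1) / P(X = x_i | Y = 0) ]
woe = np.log(p_x_given_pos / p_x_given_neg)

# Replace each category with its WOE value.
encoded = x.map(woe)
print(woe)
```

A positive WOE means the category is relatively more frequent among positives than among negatives, and a negative WOE means the opposite.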

More to come!

Saving encoder state

If you plan to fit your encoders offline and use them online at prediction time, you can save their state to a file and load it later.

# Offline: fitting the encoder to data (X, y) and storing state
...
enc = TargetEncoder(some, parameters)
enc.fit(X, y)
enc.save_as_object_file('your_file_name')

# Online: loading your encoder and encoding new data X_new
...
enc = TargetEncoder()   # no parameters are needed here, they will be loaded automatically
enc.load_from_object_file('your_file_name')
enc.transform(X_new)
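
The file format used by save_as_object_file and load_from_object_file is not described here. As a generic illustration of the same offline/online pattern, the sketch below persists a hand-built category-to-mean mapping with the standard pickle module. The `state` dictionary, the `city` column, and the file name are all hypothetical and do not reflect the library's internals.

```python
import os
import pickle
import tempfile

import pandas as pd

# Offline: "fit" a toy category -> mean mapping and persist it to disk.
X = pd.DataFrame({"city": ["A", "A", "B"]})
y = pd.Series([1.0, 3.0, 5.0])
state = {"col": "city", "mapping": y.groupby(X["city"]).mean().to_dict()}

path = os.path.join(tempfile.mkdtemp(), "encoder_state.pkl")
with open(path, "wb") as f:
    pickle.dump(state, f)

# Online: reload the state and encode new data without the training set.
with open(path, "rb") as f:
    loaded = pickle.load(f)

X_new = pd.DataFrame({"city": ["B", "A"]})
encoded = X_new[loaded["col"]].map(loaded["mapping"])
print(encoded)
```

Only the learned state needs to travel to the online environment, not the training data. Pickle files should only be loaded from trusted sources.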

Requirements

pandas >= 0.22.0
numpy >= 1.14.0
