This package contains a PyTorch class to train an embedding matrix for multi-label inputs, i.e. instead of 1 ID per token (one-hot encoding), N IDs per token can be provided as model input.
A TensorFlow2/Keras implementation can be found here:
https://github.com/ulf1/keras-multilabel-embedding
(pip install keras-multilabel-embedding)
import torch_multilabel_embedding as tml
import torch
# a sequence of multi-label data points
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]]
x_ids = torch.tensor(x_ids)
# initialize layer
layer = tml.MultiLabelEmbedding(
    vocab_size=5, embed_size=300, random_state=42)
# predict
y = layer(x_ids)
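For intuition, here is a minimal sketch of one way such a multi-label embedding can be implemented in plain PyTorch, assuming the N label vectors of each token are simply averaged. The class name NaiveMultiLabelEmbedding is hypothetical, and this is not necessarily how the package combines the vectors internally.
import torch

class NaiveMultiLabelEmbedding(torch.nn.Module):
    # hypothetical reference implementation: average the N label embeddings
    def __init__(self, vocab_size, embed_size):
        super().__init__()
        # one trainable vector per vocabulary ID
        self.embeddings = torch.nn.Parameter(torch.randn(vocab_size, embed_size))

    def forward(self, x_ids):
        # x_ids: (num_tokens, N) integer IDs -> (num_tokens, N, embed_size)
        vectors = self.embeddings[x_ids]
        # collapse the N label vectors per token into one vector, e.g. by averaging
        return vectors.mean(dim=1)

x_ids = torch.tensor([[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]])
layer = NaiveMultiLabelEmbedding(vocab_size=5, embed_size=300)
y = layer(x_ids)
print(y.shape)  # torch.Size([4, 300])
Averaging keeps the output shape independent of N, so downstream layers see one embed_size-dimensional vector per token, just like a regular embedding lookup.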
The torch-multilabel-embedding package is available on PyPI:
pip install torch-multilabel-embedding
Alternatively, install it directly from the git repo:
pip install git+ssh://[email protected]/ulf1/torch-multilabel-embedding.git
Install a virtual environment for development:
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
(If your git repo is stored in a folder whose path contains whitespace, don't use the subfolder .venv. Use an absolute path without whitespace instead.)
- Jupyter for the examples:
jupyter lab
- Check syntax:
flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
- Run unit tests (see the example test sketched after this list):
PYTHONPATH=. pytest
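For illustration, a unit test for the layer could look like the sketch below. The file name and the expected output shape (num_tokens, embed_size) are assumptions derived from the usage example above, not documented guarantees.
# test_embedding.py -- hypothetical example test, run with: PYTHONPATH=. pytest
import torch
import torch_multilabel_embedding as tml

def test_output_shape():
    x_ids = torch.tensor([[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]])
    layer = tml.MultiLabelEmbedding(vocab_size=5, embed_size=300, random_state=42)
    y = layer(x_ids)
    # assumption: the layer returns one embed_size-dimensional vector per input row
    assert y.shape == (4, 300)
    assert not torch.isnan(y).any()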
Publish
pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
Clean up
find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv
Please open an issue for support.
Please contribute using GitHub Flow: create a branch, add commits, and open a pull request.