This repository contains the code used in the work: A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off (LREC-COLING 2024). In particular, we provide the code for six word-level MLDP (Metric Local Differential Privacy) mechanisms, previously unavailable publicly.
In the provided class code (`MLDP`), you will find six runnable mechanisms:
- MultivariateCalibrated: paper
- TruncatedGumbel: paper
- VickreyMechanism: paper
- TEM: paper
- Mahalanobis: paper
- SynTF: paper
Note that the code for SanText is not included as it is already publicly available here.
Getting started is as simple as importing the module provided in this repository (`MLDP.py`):
```python
import sys

# Add the directory containing MLDP.py to the module search path.
# Note: sys.path entries must be directories, not file paths.
sys.path.insert(0, "/path/to")
import MLDP
```
For all mechanisms, you have the option to employ `faiss` (link), a library for efficient similarity search, which will most likely speed up the above mechanisms.
Basic usage for all mechanisms (`M`) besides SynTF:
```python
mechanism = MLDP.M(epsilon=1, use_faiss=False)
perturbed_word = mechanism.replace_word(orig_word)
```
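As a concrete, minimal sketch using the `MultivariateCalibrated` mechanism with `faiss` enabled (the input word `"good"` is purely illustrative):
```python
import MLDP

# Instantiate one of the provided mechanisms; use_faiss=True enables
# faiss-based nearest-neighbor search (requires faiss to be installed).
mechanism = MLDP.MultivariateCalibrated(epsilon=1, use_faiss=True)

# Privatize a single word. The mechanism is randomized, so repeated
# calls may return different replacements.
print(mechanism.replace_word("good"))
```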
For SynTF, an extra step must be taken to initialize the mechanism, namely initializing the TF-IDF vectorizer. To do this, pass in the `data` parameter, which represents a list (or other iterable) of documents. In most cases, this corpus will simply be the documents you wish to privatize.
```python
mechanism = MLDP.SynTF(epsilon=1, data=CORPUS)
perturbed_word = mechanism.replace_word(orig_word)
```
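For instance, a minimal sketch with a toy corpus (the two documents and the word-by-word loop are purely illustrative):
```python
import MLDP

# Toy corpus; in practice, pass the documents you intend to privatize.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "differential privacy protects individual contributions",
]

# SynTF fits its TF-IDF vectorizer on the supplied corpus.
mechanism = MLDP.SynTF(epsilon=1, data=corpus)

# Privatize one document word by word.
privatized = [mechanism.replace_word(w) for w in corpus[0].split()]
print(" ".join(privatized))
```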
By default, we use the `glove.840B.300d` embedding model (included in the `data` folder), which has been filtered down to a fixed vocabulary (`data/vocab.txt`). We have also included a smaller 50-dimensional embedding model. Both included models are based on the GloVe models provided at this link.
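To inspect an included embedding model, here is a minimal sketch using `gensim` (the file name under `data/` is an assumption; substitute the actual path in your checkout):
```python
from gensim.models import KeyedVectors

# Hypothetical file name; point this at the actual embedding file in data/.
kv = KeyedVectors.load_word2vec_format("data/glove.840B.300d.txt", binary=False)

print(len(kv.index_to_key))  # vocabulary size, as given in the header line
print(kv.vector_size)        # embedding dimension, as given in the header line
```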
If you would like to change the default embedding model, please change line 28 of `MLDP.py` (the global `EMBED` variable) to the correct model path. Note that the embedding model file must follow the format required by the `gensim` library, namely with the header line: `[VOCAB SIZE] [EMBEDDING DIMENSION]`. See the included embedding files for an example.
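Raw GloVe downloads lack this header line. Below is a minimal sketch for prepending it (the function and file names are placeholders; it assumes one `word v1 v2 ...` entry per line):
```python
# Convert a raw GloVe text file to gensim's word2vec text format by
# prepending the "[VOCAB SIZE] [EMBEDDING DIMENSION]" header.
def add_word2vec_header(src: str, dst: str) -> None:
    with open(src, encoding="utf-8") as f:
        lines = f.readlines()
    dim = len(lines[0].rstrip("\n").split(" ")) - 1  # tokens minus the word itself
    with open(dst, "w", encoding="utf-8") as f:
        f.write(f"{len(lines)} {dim}\n")
        f.writelines(lines)

add_word2vec_header("glove.6B.50d.txt", "glove.6B.50d.w2v.txt")  # placeholder names
```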
With these methods, you can now explore word-level Metric Local Differential Privacy text privatization. In case of any questions or suggestions, feel free to reach out to the authors.