Input Format

Using TextFlint to verify the robustness of a specific model is as simple as running the following command:

$ textflint --dataset input_file --config config.json

where input_file is the input file in CSV or JSON format, and config.json is a configuration file with generation and target model options.

Input File

input_file is the input file in CSV or JSON format. For a JSON file, each line contains exactly one sample as a JSON object. Take the input file for the SA task as an example:

{"x": "Titanic is my favorite movie.", "y": "pos", "sample_id": 0}
{"x": "I don't like the actor Tim Hill", "y": "neg", "sample_id": 1}

Note that the input format differs from task to task; please refer to this tutorial for details.
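
As a minimal sketch, a JSON input file of this shape can be written with Python's standard json module, one object per line. The helpers write_samples and read_samples and the file name sa_input.json are hypothetical and not part of TextFlint; the x, y and sample_id fields are taken from the SA example above.

```python
import json

def write_samples(path, samples):
    """Write one JSON object per line (hypothetical helper, not a TextFlint API)."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")

def read_samples(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# The two SA samples shown above.
samples = [
    {"x": "Titanic is my favorite movie.", "y": "pos", "sample_id": 0},
    {"x": "I don't like the actor Tim Hill", "y": "neg", "sample_id": 1},
]

if __name__ == "__main__":
    write_samples("sa_input.json", samples)   # assumed file name
    print(read_samples("sa_input.json"))
```

The resulting file could then be passed as input_file to the textflint command shown at the top of this section.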

Config File

config.json is a configuration file with generation and target model options. Take the configuration for the TextCNN model on the SA task as an example:

{
  "task": "SA",
  "out_dir": "./DATA/",
  "trans_methods": [
    "Ocr",
    ["InsertAdv", "SwapNamedEnt"],   
    ...
  ],
  "trans_config": {
    "Ocr": {"trans_p": 0.3},
    ...
  },
...
}

task is the name of the target task.
out_dir is the directory where each generated sample and its corresponding original sample are saved.
flint_model is the path of the Python file that stores the FlintModel instance. Note that flint_model is not required for transformation or subpopulation; you can remove this option if you are not familiar with FlintModel.
trans_methods specifies the transformation methods. For example, "Ocr" denotes the universal transformation Ocr, and ["InsertAdv", "SwapNamedEnt"] denotes a pipeline of task-specific transformations, namely InsertAdv and SwapNamedEnt.
trans_config configures the parameters of the transformation methods. The default parameters are usually a good choice.
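
A configuration of this shape can also be generated from Python. The sketch below uses only the keys shown in this section; the entries elided with "..." are left out, and flint_model is omitted because it is optional for transformation. The helpers unused_config_keys and save_config are hypothetical, and the consistency check they perform is a convenience of this example, not something TextFlint is known to enforce.

```python
import json

config = {
    "task": "SA",
    "out_dir": "./DATA/",
    "trans_methods": [
        "Ocr",                          # a single transformation
        ["InsertAdv", "SwapNamedEnt"],  # a pipeline of two transformations
    ],
    "trans_config": {
        "Ocr": {"trans_p": 0.3},
    },
}

def unused_config_keys(cfg):
    """Return trans_config keys that name no method listed in trans_methods.

    Hypothetical sanity check for this example only.
    """
    listed = set()
    for entry in cfg["trans_methods"]:
        # An entry is either a method name or a list of names (a pipeline).
        listed.update([entry] if isinstance(entry, str) else entry)
    return sorted(set(cfg["trans_config"]) - listed)

def save_config(cfg, path):
    """Write the configuration as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)

if __name__ == "__main__":
    assert unused_config_keys(config) == []
    save_config(config, "config.json")
```

Whether these keys alone form a complete configuration for a given TextFlint version is an assumption; treat the file as a starting point.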

Output Format

Transformed Datasets

After transformation, here are the contents in ./DATA/:

ori_Keyboard_2.json
ori_SwapNamedEnt_1.json
trans_Keyboard_2.json
trans_SwapNamedEnt_1.json
...

where trans_Keyboard_2.json contains the 2 samples successfully transformed by the Keyboard transformation, and ori_Keyboard_2.json contains the corresponding original samples. The content in ori_Keyboard_2.json:

{"x": "Titanic is my favorite movie.", "y": "pos", "sample_id": 0}
{"x": "I don't like the actor Tim Hill", "y": "neg", "sample_id": 1}

The content in trans_Keyboard_2.json:

{"x": "Titanic is my favorite m0vie.", "y": "pos", "sample_id": 0}
{"x": "I don't likR the actor Tim Hill", "y": "neg", "sample_id": 1}
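
Because every line in both files carries a sample_id, the original and transformed files can be joined to see what a transformation changed. The following is a sketch under the file layout shown above; load_by_id, changed_tokens and diff_files are hypothetical helpers, not part of TextFlint, and the token comparison is only meaningful when the transformation keeps the number of whitespace-separated tokens unchanged, as in the Keyboard examples here.

```python
import json

def load_by_id(path):
    """Map sample_id -> sample for a file with one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        samples = (json.loads(line) for line in f if line.strip())
        return {s["sample_id"]: s for s in samples}

def changed_tokens(original, transformed):
    """Return (before, after) pairs for tokens that differ at the same position."""
    pairs = zip(original.split(), transformed.split())
    return [(a, b) for a, b in pairs if a != b]

def diff_files(ori_path, trans_path):
    """Compare the "x" field of samples that share a sample_id."""
    ori, trans = load_by_id(ori_path), load_by_id(trans_path)
    return {
        sid: changed_tokens(ori[sid]["x"], trans[sid]["x"])
        for sid in trans
        if sid in ori
    }

if __name__ == "__main__":
    # Strings copied from the two files shown above.
    print(changed_tokens("Titanic is my favorite movie.",
                         "Titanic is my favorite m0vie."))
    # prints [('movie.', 'm0vie.')]
```

With the output directory from this section, the call would be diff_files("./DATA/ori_Keyboard_2.json", "./DATA/trans_Keyboard_2.json").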

Robustness Report

Based on the results from the Generation Layer, TextFlint can generate three types of adversarial samples and verify the robustness of the target model.

For example, on the Sentiment Analysis (SA) task, this is a statistical chart of the performance of XLNET with different types of Transformation/Subpopulation/AttackRecipe on the IMDB dataset.
