Transformation

To verify robustness comprehensively, TextFlint offers 20 universal transformations and 60 task-specific transformations, covering 12 NLP tasks. The following table summarizes the transformations currently supported; examples for each transformation can be found on our website.

| Task | Transformation | Description | Reference |
| --- | --- | --- | --- |
| UT (Universal Transformation) | AppendIrr | Extends sentences with irrelevant sentences. | - |
| | BackTrans | BackTrans (Trans is short for translation) replaces test data with paraphrases produced by back translation, which helps reveal whether the target model merely captures literal features rather than semantic meaning. | - |
| | Contraction | Replaces phrases like `will not` and `he has` with their contracted forms, namely `won’t` and `he’s`. | - |
| | InsertAdv | Transforms an input by adding an adverb before the verb. | - |
| | Keyboard | Mimics how people type and changes tokens into mistaken ones with keyboard-induced errors, like `word → worf` and `ambiguous → amviguius`. | - |
| | MLMSuggestion | MLMSuggestion (MLM is short for masked language model) generates new sentences in which one syntactic category element of the original sentence is replaced by the prediction of a masked language model. | - |
| | Ocr | Simulates OCR errors using random values. | - |
| | Prejudice | Transforms an input by reversing gender or place names in sentences. | - |
| | Punctuation | Transforms an input by adding punctuation at the end of the sentence. | - |
| | ReverseNeg | Transforms an affirmative sentence into a negative sentence, or vice versa. | - |
| | SpellingError | Leverages a pre-defined spelling mistake dictionary to simulate spelling mistakes. | Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs (https://arxiv.org/ftp/arxiv/papers/1812/1812.04718.pdf) |
| | SwapAntWordNet | Transforms an input by replacing its words with antonyms provided by WordNet. | - |
| | SwapNamedEnt | Swaps entities with other entities of the same category. | - |
| | SwapNum | Transforms an input by replacing the numbers in it. | - |
| | SwapSynWordEmbedding | Transforms an input by replacing its words with synonyms provided by GloVe. | - |
| | SwapSynWordNet | Transforms an input by replacing its words with synonyms provided by WordNet. | - |
| | Tense | Transforms all verb tenses in a sentence. | - |
| | TwitterType | Transforms an input by applying common Twitter-style abbreviations. | - |
| | Typos | Randomly inserts, deletes, swaps or replaces a single letter within one word (Ireland → Irland). | Synthetic and Natural Noise Both Break Neural Machine Translation (https://arxiv.org/pdf/1711.02173.pdf) |
| | WordCase | Transforms an input to upper case, lower case, or capitalized case. | - |
| RE (Relation Extraction) | InsertClause | Inserts entity descriptions for the head and tail entities. | - |
| | SwapEnt-LowFreq | A sub-transformation of EntitySwap that replaces entities in the text with random low-frequency entities of the same type. | - |
| | SwapTriplePos-Birth | Designed specifically for the birth relation. It paraphrases the sentence and keeps the original birth relation between the entity pairs. | - |
| | SwapTriplePos-Employee | Designed specifically for the employee relation. It deletes the TITLE description of each employee and keeps the original employee relation between the entity pairs. | - |
| | SwapEnt-SamEtype | A sub-transformation of EntitySwap that replaces entities in the text with random entities of the same type. | - |
| | SwapTriplePos-Age | Designed specifically for the age relation. It paraphrases the sentence and keeps the original age relation between the entity pairs. | - |
| | SwapEnt-MultiType | A sub-transformation of EntitySwap that replaces entities in the text with random same-typed entities that have multiple possible types. | - |
| NER (Named Entity Recognition) | EntTypos | Swaps, deletes, or adds a random character in entities. | - |
| | ConcatSent | Concatenates sentences into a longer one. | - |
| | SwapLonger | Substitutes short entities with longer ones. | - |
| | CrossCategory | Swaps entities with ones that can be labeled with different labels. | - |
| | OOV | Swaps entities with out-of-vocabulary (OOV) entities. | - |
| POS (Part-of-Speech Tagging) | SwapMultiPOSRB | The phenomenon of conversion implies that some words hold multiple parts of speech, and such words might confuse language models in POS tagging. Accordingly, this transformation replaces adverbs with words holding multiple parts of speech. | - |
| | SwapPrefix | Swaps the prefix of a word while keeping its part-of-speech tag. | - |
| | SwapMultiPOSVB | The phenomenon of conversion implies that some words hold multiple parts of speech, and such words might confuse language models in POS tagging. Accordingly, this transformation replaces verbs with words holding multiple parts of speech. | - |
| | SwapMultiPOSNN | The phenomenon of conversion implies that some words hold multiple parts of speech, and such words might confuse language models in POS tagging. Accordingly, this transformation replaces nouns with words holding multiple parts of speech. | - |
| | SwapMultiPOSJJ | The phenomenon of conversion implies that some words hold multiple parts of speech, and such words might confuse language models in POS tagging. Accordingly, this transformation replaces adjectives with words holding multiple parts of speech. | - |
| COREF (Coreference Resolution) | RndConcat | Randomly retrieves an irrelevant paragraph from the corpus and concatenates it after the original document. | - |
| | RndDelete | Deletes each sentence in the original document with some probability (20% by default), with at least one sentence deleted; related coreference labels are deleted as well. | - |
| | RndReplace | Randomly retrieves irrelevant sentences from the corpus and replaces sentences of the original document with them (the ratio of replaced sentences to original sentences is 20% by default). | - |
| | RndShuffle | Performs a number of swaps, each exchanging two adjacent sentences of the original document (the number of swaps is 20% of the number of original sentences by default). | - |
| | RndInsert | Randomly retrieves irrelevant sentences from the corpus and inserts them into the original document (the ratio of inserted sentences to original sentences is 20% by default). | - |
| | RndRepeat | Randomly picks sentences from the original document and inserts them somewhere else in the document (the ratio of inserted sentences to original sentences is 20% by default). | - |
| ABSA (Aspect-based Sentiment Analysis) | RevTgt | RevTgt: Reverse the sentiment of the target aspect. | Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis (https://www.aclweb.org/anthology/2020.emnlp-main.292.pdf) |
| | RevNon | RevNon: Reverse the sentiment of the non-target aspects that originally have the same sentiment as the target. | |
| | AddDiff | AddDiff: Add aspects with the opposite sentiment from the target aspect. | |
| CWS (Chinese Word Segmentation) | SwapContraction | Replaces some common abbreviations in the sentence with complete words of the same meaning. | - |
| | SwapNum | Replaces the numerals in the sentence with other numerals of similar size. | - |
| | SwapSyn | Replaces some words in the sentence with very similar words. | - |
| | SwapName | Replaces the last name or first name of a person in the sentence to produce local ambiguity that has nothing to do with the sentence. | - |
| | SwapVerb | Transforms some of the verbs in the sentence into other forms in Chinese. | - |
| SM (Semantic Matching) | SwapWord | Adds a meaningless sentence to the premise without changing the semantics. | - |
| | SwapNum | Finds number words in sentences and replaces them with different number words. | - |
| | Overlap | Generates data from templates in which the hypothesis and sentence1 overlap heavily but differ in meaning. | - |
| SA (Sentiment Analysis) | SwapSpecialEnt-Person | Identifies special person names in the sentence and randomly replaces them with other entity names of the same kind. | - |
| | SwapSpecialEnt-Movie | Identifies special movie names in the sentence and randomly replaces them with other movie names. | - |
| | AddSum-Movie | Identifies special movie names in the sentence and inserts a summary of each entity after it (the summary content is from Wikipedia). | - |
| | AddSum-Person | Identifies special person names in the sentence and inserts a summary of each entity after it (the summary content is from Wikipedia). | - |
| | DoubleDenial | Finds special words in the sentence and replaces them with double negation. | - |
| NLI (Natural Language Inference) | NumWord | Finds number words in sentences and replaces them with different number words. | Stress Test Evaluation for Natural Language Inference (https://www.aclweb.org/anthology/C18-1198/) |
| | SwapAnt | Finds keywords in sentences and replaces them with their antonyms. | |
| | AddSent | Adds a meaningless sentence to the premise without changing the semantics. | |
| | Overlap | Generates data from templates in which the hypothesis and premise overlap heavily but differ in meaning. | Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference (https://www.aclweb.org/anthology/P19-1334/) |
| MRC (Machine Reading Comprehension) | PerturbQuestion-MLM | Paraphrases the question. | - |
| | PerturbQuestion-BackTrans | Paraphrases the question. | - |
| | AddSentDiverse | Generates a distractor with an altered question and a fake answer. | Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension (https://arxiv.org/pdf/2004.06076) |
| | PerturbAnswer | Transforms the sentence containing the golden answer based on specific rules. | |
| | ModifyPos | Rotates the sentences of the context. | - |
| DP (Dependency Parsing) | AddSubtree | Transforms the input sentence by adding a subordinate clause from WikiData. | - |
| | RemoveSubtree | Transforms the input sentence by removing a subordinate clause. | - |
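
To make two of the descriptions in the table more concrete, here is a minimal, self-contained Python sketch of a Typos-style character edit and a RndShuffle-style adjacent-sentence swap. It is not TextFlint's implementation and does not use the TextFlint API: the function names `typo` and `rnd_shuffle`, the `ratio` parameter, and the rule of performing at least one swap are assumptions made for illustration only.

```python
import random
import string


def typo(word: str, rng: random.Random) -> str:
    """Apply one random single-letter edit: insert, delete, swap, or replace.

    Illustrative only; words shorter than two characters are returned unchanged.
    """
    if len(word) < 2:
        return word
    op = rng.choice(["insert", "delete", "swap", "replace"])
    i = rng.randrange(len(word) - 1)  # i + 1 is always a valid index
    if op == "insert":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "swap":
        # Exchange the two adjacent characters at positions i and i + 1.
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    # Replace the character at position i with a random lowercase letter.
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]


def rnd_shuffle(sentences: list, rng: random.Random, ratio: float = 0.2) -> list:
    """Swap pairs of adjacent sentences.

    The number of swaps is ratio * len(sentences), mirroring the 20% default
    described in the table; performing at least one swap is an assumption.
    """
    out = list(sentences)
    if len(out) < 2:
        return out
    n_swaps = max(1, int(len(out) * ratio))
    for _ in range(n_swaps):
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are reproducible
    print(typo("Ireland", rng))
    print(rnd_shuffle(["A.", "B.", "C.", "D.", "E."], rng))
```

Passing an explicit `random.Random` instance keeps the perturbations reproducible, which is convenient when comparing model outputs on original and transformed samples.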

In addition, you can define your own Transformation by following this tutorial. We also provide an Interactive Demo to show how TextFlint performs transformations on different tasks.