
Vectorizing MNLI inference #5

Closed
forrestbao opened this issue Dec 4, 2022 · 4 comments

@forrestbao
Contributor

The two segments of code below for MNLI are too slow. We should use a vectorized version to speed them up.

The approach below computes one pair of sentences at a time, which is too slow. Please see whether you can find an API that computes entailment scores for multiple pairs at once.

Hugging Face's zero-shot classification task can do it; see my example at the end.

https://github.com/SigmaWe/DocAsRef_0/blob/de4de4b4275e661621bebf3b2f92d8676e2f81c2/mnli/sim.py#L10-L16

https://github.com/SigmaWe/DocAsRef_0/blob/de4de4b4275e661621bebf3b2f92d8676e2f81c2/mnli/eval.py#L22-L26
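The pairwise loop linked above could be replaced by batched inference: tokenize many premise/hypothesis pairs at once with padding and run one forward pass per batch. A minimal sketch, assuming the `roberta-large-mnli` checkpoint; `batch_pairs` and `mnli_scores` are illustrative names, not functions from the repo:

```python
def batch_pairs(premises, hypotheses, batch_size):
    """Yield aligned slices of (premises, hypotheses), each at most batch_size pairs."""
    for i in range(0, len(premises), batch_size):
        yield premises[i:i + batch_size], hypotheses[i:i + batch_size]

def mnli_scores(premises, hypotheses, model_name="roberta-large-mnli", batch_size=32):
    # Heavy imports kept inside the function so the batching helper above
    # can be used without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()
    probs = []
    with torch.no_grad():
        for prem, hyp in batch_pairs(premises, hypotheses, batch_size):
            # One tokenizer call and one forward pass per batch of pairs,
            # instead of one pair per forward pass.
            enc = tokenizer(prem, hyp, padding=True, truncation=True,
                            return_tensors="pt")
            probs.append(torch.softmax(model(**enc).logits, dim=-1))
    return torch.cat(probs)  # shape: (num_pairs, num_labels)
```

The batch size trades memory for throughput; on GPU, larger batches amortize the per-call overhead that makes the pair-at-a-time loop slow.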

```python
In [1]: from transformers import pipeline

In [2]: classifier = pipeline("zero-shot-classification",
   ...:                       model="facebook/bart-large-mnli")

In [3]: sequence_to_classify = ["one day I will see the world", "i love swing dance"]

In [5]: candidate_labels = ['This blog is about summer.', 'This is my Friday night plan.']
   ...: classifier(sequence_to_classify, candidate_labels)

Out[5]:
[{'sequence': 'one day I will see the world',
  'labels': ['This blog is about summer.', 'This is my Friday night plan.'],
  'scores': [0.7098779678344727, 0.2901219427585602]},
 {'sequence': 'i love swing dance',
  'labels': ['This is my Friday night plan.', 'This blog is about summer.'],
  'scores': [0.6118907332420349, 0.3881092965602875]}]
```
@forrestbao
Contributor Author

And, in the final paper, we can show results using different LMs: BART-MNLI is one, and the original RoBERTa-MNLI is another.

@forrestbao forrestbao changed the title Vectorizing MNLI Vectorizing MNLI inference Dec 4, 2022
@TURX
Collaborator

TURX commented Dec 4, 2022

The zero-shot task gives quite different results from the text-classification task, even with the same labels. I will show you at tomorrow's meeting.

@forrestbao
Contributor Author

Maybe the reason is that the base model changed from RoBERTa to BART. Perhaps we should use a RoBERTa-based model to be fair: https://huggingface.co/roberta-large-mnli

Ref: #10

@forrestbao
Contributor Author

So, per the discussion this afternoon, we will just vectorize the code below and forget about the zero-shot approach, which seems to have issues with long sentences.

https://github.com/SigmaWe/DocAsRef_0/blob/de4de4b4275e661621bebf3b2f92d8676e2f81c2/mnli/sim.py#L10-L16
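For reference, one way to vectorize without a hand-written loop is Hugging Face's `text-classification` pipeline, which accepts a list of premise/hypothesis dicts and batches them internally. A hedged sketch, assuming `roberta-large-mnli` and a recent transformers version (where `top_k=None` returns scores for all labels); `to_pipeline_inputs` and `mnli_classify_pairs` are illustrative names, not functions from the repo:

```python
def to_pipeline_inputs(pairs):
    """Map (premise, hypothesis) tuples to the dict format the
    text-classification pipeline expects for sentence pairs."""
    return [{"text": p, "text_pair": h} for p, h in pairs]

def mnli_classify_pairs(pairs, batch_size=16):
    # Imported lazily so the input-formatting helper stays dependency-free.
    from transformers import pipeline

    clf = pipeline("text-classification", model="roberta-large-mnli",
                   top_k=None, batch_size=batch_size)
    # One call classifies every pair; the pipeline handles batching and padding.
    return clf(to_pipeline_inputs(pairs))
```

Unlike the zero-shot pipeline, this runs the same NLI head as the pairwise code, so the scores should match the loop version while still getting the batching speedup.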

@forrestbao forrestbao added the P1 label Dec 4, 2022
TURX added a commit that referenced this issue Dec 9, 2022
bs_sent: mnli allow specify classifier, cos_sim allow specify embedder
fix: #5, #10 direction 1
@TURX TURX closed this as completed Jan 1, 2023