John Snow Labs Releases LangTest 2.4.0: Introducing Multimodal VQA Testing, New Text Robustness Tests, Enhanced Multi-Label Classification, Safety Evaluation, and NER Accuracy Fixes #1124
chakravarthik27
announced in
Announcements
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
📢 Highlights
John Snow Labs is excited to announce the release of LangTest 2.4.0! This update introduces cutting-edge features and resolves key issues further to enhance model testing and evaluation across multiple modalities.
🔗 Multimodality Testing with VQA Task: We are thrilled to introduce multimodality testing, now supporting Visual Question Answering (VQA) tasks! With the addition of 10 new robustness tests, you can now perturb images to challenge and assess your model’s performance across visual inputs.
📝 New Robustness Tests for Text Tasks: LangTest 2.4.0 comes with two new robustness tests,
add_new_lines
andadd_tabs
, applicable to text classification, question-answering, and summarization tasks. These tests push your models to handle text variations and maintain accuracy.🔄 Improvements to Multi-Label Text Classification: We have resolved accuracy and fairness issues affecting multi-label text classification evaluations, ensuring more reliable and consistent results.
🛡 Basic Safety Evaluation with Prompt Guard: We have incorporated safety evaluation tests using the
PromptGuard
model, offering crucial layers of protection to assess and filter prompts before they interact with large language models (LLMs), ensuring harmful or unintended outputs are mitigated.🛠 NER Accuracy Test Fixes: LangTest 2.4.0 addresses and resolves issues within the Named Entity Recognition (NER) accuracy tests, improving reliability in performance assessments for NER tasks.
🔒 Security Enhancements: We have upgraded various dependencies to address security vulnerabilities, making LangTest more secure for users.
🔥 Key Enhancements
🔗 Multimodality Testing with VQA Task
In this release, we introduce multimodality testing, expanding your model’s evaluation capabilities with Visual Question Answering (VQA) tasks.
Key Features:
Test Type Info
image_resize
image_rotate
image_blur
image_noise
image_contrast
image_brightness
image_sharpness
image_color
image_flip
image_crop
How It Works:
Configuration:
to create a config.yaml
Harness Setup
Execution:
📝 Robustness Tests for Text Classification, Question-Answering, and Summarization
The new
add_new_lines
andadd_tabs
tests push your text models to manage input variations more effectively.Key Features:
Tests
add_new_lines
add_tabs
How It Works:
Configuration:
to create a config.yaml
Harness Setup
Execution:
🛡 Basic Safety Evaluation with Prompt Guard
LangTest introduces safety checks using the prompt_guard model, providing essential safety layers for evaluating prompts before they are sent to large language models (LLMs), ensuring harmful or unethical outputs are avoided.
Key Features:
jailbreak_probabilities_score
andinjection_probabilities_score
metrics before they are sent to LLM models.jailbreak_probabilities_score
injection_probabilities_score
How It Works:
Configuration:
to create a config.yaml
Harness Setup
Execution:
🐛 Fixes
⚡ Enhancements
What's Changed
Full Changelog: 2.3.1...2.4.0
This discussion was created from the release John Snow Labs Releases LangTest 2.4.0: Introducing Multimodal VQA Testing, New Text Robustness Tests, Enhanced Multi-Label Classification, Safety Evaluation, and NER Accuracy Fixes.
Beta Was this translation helpful? Give feedback.
All reactions