mistral-genies-hackathon

This project was developed as part of the Mistral AI Paris Hackathon.

The project pitch is available at https://devpost.com/software/genies

Comprehensive AI Chatbot Evaluation Platform

Welcome to the MistralJudge repository! This project provides comprehensive evaluations of AI chatbots, customized for various industry use cases. MistralJudge is built on Mistral AI models and provides an interactive user interface built with Streamlit.

Introduction

As AI becomes more integrated into customer interactions in travel, healthcare, finance, and other industries, ensuring its reliability, fairness, and effectiveness is a major challenge. MistralJudge addresses this challenge by providing a platform that systematically assesses and improves AI chatbot models. The tool helps businesses deploy AI systems that deliver accurate, unbiased, and safe responses.

Features

  • Customizable Evaluation Metrics: Define and prioritize metrics such as accuracy, relevance, bias detection, and safety.
  • Automated Test Sample Generation: Generate relevant test samples based on user-selected metrics.
  • Chat History Analysis: Evaluate entire chat histories to assess human satisfaction and identify patterns.
  • Real-Time Feedback: Continuous monitoring and immediate insights for rapid improvement.
  • Interactive Interface: Streamlit application for easy configuration and detailed analysis.
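
To make "define and prioritize metrics" concrete, here is a minimal Python sketch of one way prioritized metrics could be combined into a single score. It is not taken from the MistralJudge code: the `Metric` class, the `weighted_score` function, the weights, and the scores are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical evaluation metric with a user-assigned priority weight."""
    name: str
    weight: float


def weighted_score(metrics, scores):
    """Combine per-metric scores (each assumed to lie in [0, 1]) into a weighted average."""
    total_weight = sum(m.weight for m in metrics)
    if total_weight <= 0:
        raise ValueError("at least one metric needs a positive weight")
    return sum(m.weight * scores[m.name] for m in metrics) / total_weight


# Assumed priorities: a higher weight means the metric matters more.
metrics = [
    Metric("accuracy", 3.0),
    Metric("relevance", 2.0),
    Metric("bias_detection", 1.0),
    Metric("safety", 4.0),
]

# Placeholder scores for illustration, not real evaluation results.
scores = {"accuracy": 0.9, "relevance": 0.8, "bias_detection": 1.0, "safety": 0.5}

overall = weighted_score(metrics, scores)
print(f"overall score: {overall:.2f}")
```

A weighted average is only one possible design. A deployment that treats safety as a hard requirement might instead reject any response below a safety threshold, regardless of its other scores.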

Usage

Access the Online Streamlit App

Open your web browser and navigate to the URL https://garkavem-mistral-genies-hackathon-main-lipp61.streamlit.app/.

Use the interactive interface to configure evaluation parameters, generate test samples, and view analysis results.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions, feedback, or suggestions, please open an issue on GitHub.