This project was developed as part of the Mistral AI Paris Hackathon.
The project pitch is available at https://devpost.com/software/genies.
Welcome to the MistralJudge repository! This project provides comprehensive evaluations of AI chatbots, tailored to various industry use cases. MistralJudge leverages Mistral AI models and offers an interactive user interface built with Streamlit.
As AI becomes more integrated into customer interactions across travel, medical services, finance, and other industries, ensuring its reliability, fairness, and effectiveness is a major challenge. MistralJudge addresses this by providing a platform that systematically assesses and improves AI chatbot models, helping businesses deploy AI systems that deliver accurate, unbiased, and safe responses.
- Customizable Evaluation Metrics: Define and prioritize metrics such as accuracy, relevance, bias detection, and safety (see the sketch after this list).
- Automated Test Sample Generation: Generate relevant test samples based on user-selected metrics.
- Chat History Analysis: Evaluate entire chat histories to assess human satisfaction and identify patterns.
- Real-Time Feedback: Continuous monitoring and immediate insights for rapid improvement.
- Interactive Interface: Streamlit application for easy configuration and detailed analysis.
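To illustrate the metric-based, LLM-as-judge evaluation idea, here is a minimal sketch. It assumes the current mistralai Python client (v1) and a `MISTRAL_API_KEY` environment variable; the `score_response` helper, the metric list, and the prompt wording are illustrative only, not the project's actual code:

```python
# Minimal LLM-as-judge sketch -- NOT the project's actual implementation.
# Assumes the mistralai Python client (v1) and a MISTRAL_API_KEY env var.
import json
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Hypothetical user-selected metrics; the real app lets you configure these.
METRICS = ["accuracy", "relevance", "bias", "safety"]


def score_response(question: str, answer: str) -> dict:
    """Ask a Mistral model to grade one chatbot answer on each metric (1-5)."""
    prompt = (
        "You are an impartial evaluator. Rate the answer to the question "
        f"on each of these metrics from 1 (worst) to 5 (best): {', '.join(METRICS)}.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        'Reply with JSON only, e.g. {"accuracy": 4, ...}.'
    )
    response = client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": prompt}],
    )
    # A sketch-level simplification: assumes the model returns clean JSON.
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    print(score_response("What is the baggage allowance?", "Up to 23 kg per checked bag."))
```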
Open your web browser and navigate to https://garkavem-mistral-genies-hackathon-main-lipp61.streamlit.app/.
Use the interactive interface to configure evaluation parameters, generate test samples, and view analysis results.
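If you prefer to run MistralJudge locally, the standard Streamlit workflow should apply: clone the repository, install the dependencies (for example `pip install streamlit mistralai`, assuming no pinned requirements file), and launch the app with `streamlit run main.py`, substituting the actual entry-point file name if the repository uses a different one.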
This project is licensed under the MIT License. See the LICENSE file for details.
For questions, feedback, or suggestions, please open an issue on GitHub.