From 14293b04169d0ab6277afc61a99074ede7ce6d10 Mon Sep 17 00:00:00 2001
From: "Tianle (Tim) Li" <67527391+CodingWithTim@users.noreply.github.com>
Date: Fri, 20 Oct 2023 01:53:58 -0700
Subject: [PATCH] Skycamp Tutorial Example LLM Judge (#2589)

Co-authored-by: Wei-Lin Chiang <weichiang@berkeley.edu>
---
 .../data/skycamp/model_answer/gpt-3.5-turbo.jsonl      | 10 +++++-----
 .../data/skycamp/model_judgment/gpt-4_single.jsonl     |  5 +++++
 fastchat/llm_judge/data/skycamp/question.jsonl         | 10 +++++-----
 .../data/skycamp/reference_answer/gpt-4.jsonl          |  5 +++++
 4 files changed, 20 insertions(+), 10 deletions(-)
 create mode 100644 fastchat/llm_judge/data/skycamp/reference_answer/gpt-4.jsonl

diff --git a/fastchat/llm_judge/data/skycamp/model_answer/gpt-3.5-turbo.jsonl b/fastchat/llm_judge/data/skycamp/model_answer/gpt-3.5-turbo.jsonl
index 2d27d5dd8..e4163b46f 100644
--- a/fastchat/llm_judge/data/skycamp/model_answer/gpt-3.5-turbo.jsonl
+++ b/fastchat/llm_judge/data/skycamp/model_answer/gpt-3.5-turbo.jsonl
@@ -1,5 +1,5 @@
-{"question_id": 1, "answer_id": "5zbDbpcjNYS6CPJQDhCpCe", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["Improving time management skills can greatly enhance productivity and reduce stress. Here are some tips to help you:\n\n1. Set clear goals: Define your short-term and long-term goals. This will help you prioritize tasks and stay focused on what's truly important.\n\n2. Prioritize tasks: Determine which tasks are most important and urgent. Use techniques like Eisenhower Matrix (dividing tasks into four quadrants based on importance and urgency) to prioritize effectively.\n\n3. Create a schedule: Plan your day or week in advance. Allocate specific time slots for each task or activity. Be realistic about how much time each task will take.\n\n4. Avoid multitasking: Contrary to popular belief, multitasking can decrease productivity. Focus on one task at a time and complete it before moving on to the next one.\n\n5. Delegate tasks: If possible, delegate tasks that can be done by others. This frees up your time to focus on more important or high-value tasks.\n\n6. Break tasks into smaller chunks: Large tasks can be overwhelming. Break them down into smaller, manageable tasks. This helps you make progress and stay motivated.\n\n7. Eliminate distractions: Minimize interruptions and distractions, such as social media notifications, unnecessary meetings, or excessive noise. Find a quiet and dedicated workspace if possible.\n\n8. Take regular breaks: Avoid burnout by taking short breaks between tasks. This can help improve focus and rejuvenate your mind.\n\n9. Learn to say no: Do not overcommit yourself. Learn to decline requests or tasks that don't align with your goals or schedule.\n\n10. Use time management tools: Utilize technology tools like calendars, task management apps, or time-tracking apps to help you stay organized and manage your time efficiently.\n\nRemember, improving time management is an ongoing process. It requires discipline and consistency. Start with small changes and gradually incorporate these tips into your routine."]}], "tstamp": 1697787245.8299105}
-{"question_id": 2, "answer_id": "Kif9qccsrXHo3MNcEegzQk", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["There are several effective ways to deal with stress. Here are some strategies you can try:\n\n1. Exercise regularly: Physical activity helps reduce stress and releases endorphins, which boost your mood.\n\n2. Practice deep breathing or mindfulness: Deep breathing exercises and mindfulness techniques can help calm your mind and reduce stress levels.\n\n3. Get enough sleep: Aim for 7-9 hours of quality sleep each night. Adequate rest can help you better cope with stress.\n\n4. Maintain a balanced diet: Eating a well-balanced diet with plenty of fruits, vegetables, whole grains, and lean proteins can support your overall well-being and help manage stress.\n\n5. Avoid excessive caffeine, alcohol, and nicotine: These substances can increase anxiety and stress levels, so it's best to moderate or avoid them.\n\n6. Prioritize self-care: Make time for activities you enjoy, such as hobbies, reading, or spending time with loved ones. Engaging in self-care can help reduce stress and improve your overall well-being.\n\n7. Practice time management: Prioritize tasks, set realistic goals, and break larger tasks into smaller, manageable steps. This can help reduce stress related to feeling overwhelmed.\n\n8. Seek support: Reach out to friends, family, or a professional counselor to talk about your stress. Sometimes just talking about your feelings can provide relief and support.\n\nRemember, everyone responds to stress differently, so it may take some trial and error to find what works best for you."]}], "tstamp": 1697787243.8850908}
-{"question_id": 3, "answer_id": "58xB3XjM9yxw9Yge7PTxxD", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["There are several main differences between Python and JavaScript programming languages:\n\n1. Syntax: Python is known for its clean and readable syntax, which uses whitespace indentation to define code blocks. JavaScript, on the other hand, uses curly braces {} and semicolons ; to define blocks of code.\n\n2. Execution Environment: Python is primarily used as a server-side language and is commonly run on the server. JavaScript, on the other hand, is primarily used as a client-side language and runs within the web browser.\n\n3. Typing: Python is a strongly-typed language, meaning that variable types are enforced and cannot be changed without explicit conversion. JavaScript, on the other hand, is a weakly-typed language, allowing variables to be dynamically assigned different types.\n\n4. Object-Oriented Programming: Both Python and JavaScript support object-oriented programming, but they have slightly different approaches. Python follows a class-based inheritance model, while JavaScript uses a prototype-based inheritance model.\n\n5. Libraries and Frameworks: Python has a rich ecosystem of libraries and frameworks for various purposes such as web development (Django, Flask), scientific computing (NumPy, Pandas), and machine learning (TensorFlow, PyTorch). JavaScript also has a wide range of libraries and frameworks, with a strong focus on front-end development (React, Angular, Vue.js).\n\n6. Use Cases: Python is often used for data analysis, scientific computing, machine learning, and server-side web development. JavaScript, on the other hand, is used for web development, both on the client-side and server-side (Node.js), as well as for building cross-platform mobile applications (React Native).\n\nIt's worth noting that while there are differences between Python and JavaScript, they also have similarities and can be used together in certain scenarios, such as building web applications that utilize both server-side and client-side functionality."]}], "tstamp": 1697787245.9399319}
-{"question_id": 4, "answer_id": "Bo7j6fD9buCbZvKM6BtiHb", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["There are several strategies you can try to increase your productivity while working from home. Here are some suggestions:\n\n1. Establish a routine: Set a consistent schedule for your workday. Wake up at the same time each day, get dressed as if you were going to the office, and create a designated workspace. Having a routine helps signal to your brain that it's time to focus and be productive.\n\n2. Prioritize tasks: Create a to-do list or use a task management system to prioritize your work. Identify the most important tasks and tackle them first. This helps you stay organized and focused on what needs to be done.\n\n3. Minimize distractions: Identify and eliminate or minimize distractions in your workspace. This might include turning off notifications on your phone, closing unnecessary tabs on your computer, or using noise-cancelling headphones to block out background noise.\n\n4. Take regular breaks: It's important to take regular breaks to rest and recharge. Use the Pomodoro Technique, which involves working for 25 minutes and then taking a 5-minute break. After four cycles, take a longer break of about 15-30 minutes. This can help maintain your focus and prevent burnout.\n\n5. Set boundaries: Communicate your availability and work hours to your family members or housemates. Establish clear boundaries to minimize interruptions and maintain a productive work environment.\n\n6. Stay connected: Working from home can be isolating, so make an effort to stay connected with your colleagues. Schedule regular check-ins or virtual meetings to discuss projects, share progress, and maintain a sense of collaboration.\n\n7. Maintain a healthy lifestyle: Prioritize your physical and mental well-being. Get regular exercise, eat nutritious meals, and practice stress-reducing activities such as meditation or mindfulness. Taking care of yourself will help improve your focus and productivity.\n\nRemember, everyone's work style is different, so it may take some trial and error to find the strategies that work best for you."]}], "tstamp": 1697787254.0528991}
-{"question_id": 5, "answer_id": "oN7RwmH5gbp82jPW7VSGRd", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["Of course! Quantum computing is a field of computing that utilizes the principles of quantum mechanics to perform computation. While classical computers use bits to represent and process information, quantum computers use quantum bits, or qubits, which can exist in multiple states simultaneously, thanks to a property called superposition.\n\nSuperposition allows qubits to be in a state of both 0 and 1 at the same time, increasing the computational power exponentially compared to classical bits. Another important concept in quantum computing is entanglement, where two or more qubits become linked and share a special correlation. This correlation enables information to be processed collectively, rather than individually.\n\nQuantum algorithms are designed to exploit these quantum phenomena to solve problems more efficiently than classical algorithms. One such algorithm is Shor's algorithm, which can factor large numbers exponentially faster than classical methods, posing a threat to modern encryption systems.\n\nHowever, quantum computers face several challenges. Qubits are highly sensitive to external noise and decoherence, which can cause errors in calculations. Efforts are being made to develop error correction techniques to mitigate these issues.\n\nQuantum computing has the potential to revolutionize various fields, including cryptography, optimization, drug discovery, and simulation of quantum systems. While still in its early stages, researchers and companies are actively working towards building more powerful and reliable quantum computers to unlock their full potential."]}], "tstamp": 1697787252.9951055}
+{"question_id": 1, "answer_id": "AcHb6iXuVrbqLVY65qxc23", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["The founding president of the United States of America was George Washington. He served as the first president from 1789 to 1797."]}], "tstamp": 1697786486.4065363}
+{"question_id": 2, "answer_id": "8x3btymJgjWo9mhgC6hyVj", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["Machine learning is a subset of artificial intelligence that focuses on developing algorithms and models that enable computers to learn and make predictions or decisions without being explicitly programmed. It involves training a computer system on a large amount of data and allowing it to learn patterns and relationships within the data to make accurate predictions or decisions.\n\nThere are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.\n\n1. Supervised Learning: In supervised learning, the algorithm is trained on labeled data, where the input data is paired with the corresponding correct output. The goal is to learn a mapping function that can predict the output for new, unseen inputs accurately. It requires a human expert to provide the correct answers during the training phase.\n\nExample: Email spam classification is a common example of supervised learning. The algorithm is trained on a dataset of emails labeled as spam or not spam. By learning from this labeled data, the algorithm can predict whether new, unseen emails are spam or not.\n\n2. Unsupervised Learning: Unsupervised learning involves training the algorithm on unlabeled data, where there are no predefined output labels. The goal is to discover hidden patterns, structures, or relationships within the data without any guidance. The algorithm learns to group or cluster similar data points based on their inherent similarities.\n\nExample: Customer segmentation is an example of unsupervised learning. By analyzing customer data like purchase history, demographics, and behavior patterns, the algorithm can automatically group customers into distinct segments based on similarities. This can help businesses understand their customer base better and tailor marketing strategies accordingly.\n\n3. Reinforcement Learning: Reinforcement learning involves training an agent to interact with an environment and learn from the feedback it receives. The agent learns to take actions that maximize a reward signal, which can be positive or negative. It learns through trial and error, receiving feedback in the form of rewards or penalties.\n\nExample: Training an autonomous driving system is an example of reinforcement learning. The system learns to navigate the roads by receiving rewards for following traffic rules and penalties for violating them. Over time, it learns to make decisions that maximize the overall reward, leading to safer and more efficient driving.\n\nEach type of machine learning has its own strengths and applications, and they can be combined or used in conjunction with each other to solve complex problems."]}], "tstamp": 1697786498.0817232}
+{"question_id": 3, "answer_id": "aszxVEohmYTfGbhmqzbfbE", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["To find the probability of getting heads 10 times using a biased coin, we need to multiply the probability of getting a head (0.7) by itself 10 times since each flip is independent. \n\nSo the probability is:\n\n(0.7)^10 = 0.0282475249\n\nTherefore, the probability of getting heads 10 times using a biased coin with a probability of getting a tail of 0.3 is approximately 0.0282 or 2.82%."]}], "tstamp": 1697786501.0498073}
+{"question_id": 4, "answer_id": "Je3WN7Dz9dRGwLpfNp73fR", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["Great! Planning a road trip from Berkeley, CA to explore sightseeing, national parks, and scenic drives sounds like an exciting adventure. Let's break it down step by step:\n\nStep 1: Choosing a destination\nConsidering your preferences, some excellent options within driving distance from Berkeley include Yosemite National Park, Big Sur, and Lake Tahoe. Each destination offers unique sights and experiences.\n\nStep 2: Duration of the trip\nDecide how many days you have available for the road trip. A weekend trip typically lasts two to three days, so keep that in mind when planning your itinerary.\n\nStep 3: Route planning\nBased on the destination you choose, you can plan your route accordingly. Here are a few options:\n\n- Yosemite National Park: Take CA-120E to enter the park from the west. This scenic drive will take you through beautiful landscapes. You can also consider taking CA-140E or CA-41S as alternative routes.\n\n- Big Sur: Head south on CA-1, also known as the Pacific Coast Highway. This stunning coastal drive offers breathtaking views of the Pacific Ocean.\n\n- Lake Tahoe: Take I-80E towards Truckee, and then continue on CA-89S to reach Lake Tahoe. This route provides picturesque mountain views.\n\nStep 4: Accommodation\nConsider where you'll be staying during your trip. If you're visiting Yosemite National Park, you can choose to camp within the park or find accommodation in nearby towns like Mariposa or Oakhurst. Big Sur has several campgrounds and lodges, while Lake Tahoe offers a range of hotels and vacation rentals.\n\nStep 5: Activities and sights\nResearch the must-see attractions and activities at your chosen destination. In Yosemite National Park, you can explore iconic landmarks like Yosemite Valley, Glacier Point, and Mariposa Grove. Big Sur offers stunning coastal hikes, scenic viewpoints, and beautiful beaches. At Lake Tahoe, you can enjoy hiking, water activities, or simply relax by the lake.\n\nStep 6: Packing essentials\nMake sure to pack essentials such as comfortable clothing, hiking shoes, sunscreen, water bottles, snacks, and a camera to capture the beautiful moments.\n\nStep 7: Safety and preparations\nBefore embarking on your road trip, ensure your car is in good condition. Check the tire pressure, fluid levels, and have a spare tire and emergency kit handy. Also, be aware of any permits or reservations required for national parks or campsites.\n\nRemember to check the road conditions and weather forecast closer to your travel dates, as they may impact your plans.\n\nEnjoy your road trip and the beautiful sights along the way!"]}], "tstamp": 1697786514.874635}
+{"question_id": 5, "answer_id": "PJbaD5QPWzAT4QCGCQtUZy", "model_id": "gpt-3.5-turbo", "choices": [{"index": 0, "turns": ["The most likely reason your parents did not invite you to their wedding is that they got married before you were born."]}], "tstamp": 1697786515.8968902}
diff --git a/fastchat/llm_judge/data/skycamp/model_judgment/gpt-4_single.jsonl b/fastchat/llm_judge/data/skycamp/model_judgment/gpt-4_single.jsonl
index e69de29bb..3750347dc 100644
--- a/fastchat/llm_judge/data/skycamp/model_judgment/gpt-4_single.jsonl
+++ b/fastchat/llm_judge/data/skycamp/model_judgment/gpt-4_single.jsonl
@@ -0,0 +1,5 @@
+{"question_id": 1, "model": "gpt-3.5-turbo", "judge": ["gpt-4", "single-v1"], "user_prompt": "[Instruction]\nPlease act as an impartial judge and evaluate the quality of the response provided by an AI assistant to the user question displayed below. Your evaluation should consider factors such as the helpfulness, relevance, accuracy, depth, creativity, and level of detail of the response. Begin your evaluation by providing a short explanation. Be as objective as possible. After providing your explanation, you must rate the response on a scale of 1 to 10 by strictly following this format: \"[[rating]]\", for example: \"Rating: [[5]]\".\n\n[Question]\nWho was the founding president of USA?\n\n[The Start of Assistant's Answer]\nThe founding president of the United States of America was George Washington. He served as the first president from 1789 to 1797.\n[The End of Assistant's Answer]", "judgment": "The assistant's response is accurate, relevant, and directly answers the user's question. The assistant provides the correct information about the first president of the United States, George Washington, and also includes the years of his presidency. However, the response could have been more detailed, for instance, by adding some more information about George Washington's role in the founding of the country. Overall, the response is satisfactory. Rating: [[8]]", "score": 8, "turn": 1, "tstamp": 1697787680.8020542}
+{"question_id": 2, "model": "gpt-3.5-turbo", "judge": ["gpt-4", "single-v1"], "user_prompt": "[Instruction]\nPlease act as an impartial judge and evaluate the quality of the response provided by an AI assistant to the user question displayed below. Your evaluation should consider factors such as the helpfulness, relevance, accuracy, depth, creativity, and level of detail of the response. Begin your evaluation by providing a short explanation. Be as objective as possible. After providing your explanation, you must rate the response on a scale of 1 to 10 by strictly following this format: \"[[rating]]\", for example: \"Rating: [[5]]\".\n\n[Question]\nPlease describe the concept of machine learning. Could you elaborate on the differences between supervised, unsupervised, and reinforcement learning? Provide real-world examples of each.\n\n[The Start of Assistant's Answer]\nMachine learning is a subset of artificial intelligence that focuses on developing algorithms and models that enable computers to learn and make predictions or decisions without being explicitly programmed. It involves training a computer system on a large amount of data and allowing it to learn patterns and relationships within the data to make accurate predictions or decisions.\n\nThere are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.\n\n1. Supervised Learning: In supervised learning, the algorithm is trained on labeled data, where the input data is paired with the corresponding correct output. The goal is to learn a mapping function that can predict the output for new, unseen inputs accurately. It requires a human expert to provide the correct answers during the training phase.\n\nExample: Email spam classification is a common example of supervised learning. The algorithm is trained on a dataset of emails labeled as spam or not spam. By learning from this labeled data, the algorithm can predict whether new, unseen emails are spam or not.\n\n2. Unsupervised Learning: Unsupervised learning involves training the algorithm on unlabeled data, where there are no predefined output labels. The goal is to discover hidden patterns, structures, or relationships within the data without any guidance. The algorithm learns to group or cluster similar data points based on their inherent similarities.\n\nExample: Customer segmentation is an example of unsupervised learning. By analyzing customer data like purchase history, demographics, and behavior patterns, the algorithm can automatically group customers into distinct segments based on similarities. This can help businesses understand their customer base better and tailor marketing strategies accordingly.\n\n3. Reinforcement Learning: Reinforcement learning involves training an agent to interact with an environment and learn from the feedback it receives. The agent learns to take actions that maximize a reward signal, which can be positive or negative. It learns through trial and error, receiving feedback in the form of rewards or penalties.\n\nExample: Training an autonomous driving system is an example of reinforcement learning. The system learns to navigate the roads by receiving rewards for following traffic rules and penalties for violating them. Over time, it learns to make decisions that maximize the overall reward, leading to safer and more efficient driving.\n\nEach type of machine learning has its own strengths and applications, and they can be combined or used in conjunction with each other to solve complex problems.\n[The End of Assistant's Answer]", "judgment": "The assistant's response is highly informative, accurate, and detailed. It provides a clear and concise explanation of machine learning and its three main types: supervised, unsupervised, and reinforcement learning. The assistant also provides real-world examples for each type, which helps to illustrate the concepts and their practical applications. The response is relevant to the user's question and provides a comprehensive understanding of the topic. The assistant's answer is also well-structured, making it easy to follow and understand. Therefore, I would rate this response highly. Rating: [[10]]", "score": 10, "turn": 1, "tstamp": 1697787690.7350655}
+{"question_id": 4, "model": "gpt-3.5-turbo", "judge": ["gpt-4", "single-v1"], "user_prompt": "[Instruction]\nPlease act as an impartial judge and evaluate the quality of the response provided by an AI assistant to the user question displayed below. Your evaluation should consider factors such as the helpfulness, relevance, accuracy, depth, creativity, and level of detail of the response. Begin your evaluation by providing a short explanation. Be as objective as possible. After providing your explanation, you must rate the response on a scale of 1 to 10 by strictly following this format: \"[[rating]]\", for example: \"Rating: [[5]]\".\n\n[Question]\nI'm a college student in Berkeley, CA. I'd like to plan a weekend road trip leaving from Berkeley. My preferences are sightseeing, national parks, and scenic drives. What do you recommend? Let's think step by step.\n\n[The Start of Assistant's Answer]\nGreat! Planning a road trip from Berkeley, CA to explore sightseeing, national parks, and scenic drives sounds like an exciting adventure. Let's break it down step by step:\n\nStep 1: Choosing a destination\nConsidering your preferences, some excellent options within driving distance from Berkeley include Yosemite National Park, Big Sur, and Lake Tahoe. Each destination offers unique sights and experiences.\n\nStep 2: Duration of the trip\nDecide how many days you have available for the road trip. A weekend trip typically lasts two to three days, so keep that in mind when planning your itinerary.\n\nStep 3: Route planning\nBased on the destination you choose, you can plan your route accordingly. Here are a few options:\n\n- Yosemite National Park: Take CA-120E to enter the park from the west. This scenic drive will take you through beautiful landscapes. You can also consider taking CA-140E or CA-41S as alternative routes.\n\n- Big Sur: Head south on CA-1, also known as the Pacific Coast Highway. This stunning coastal drive offers breathtaking views of the Pacific Ocean.\n\n- Lake Tahoe: Take I-80E towards Truckee, and then continue on CA-89S to reach Lake Tahoe. This route provides picturesque mountain views.\n\nStep 4: Accommodation\nConsider where you'll be staying during your trip. If you're visiting Yosemite National Park, you can choose to camp within the park or find accommodation in nearby towns like Mariposa or Oakhurst. Big Sur has several campgrounds and lodges, while Lake Tahoe offers a range of hotels and vacation rentals.\n\nStep 5: Activities and sights\nResearch the must-see attractions and activities at your chosen destination. In Yosemite National Park, you can explore iconic landmarks like Yosemite Valley, Glacier Point, and Mariposa Grove. Big Sur offers stunning coastal hikes, scenic viewpoints, and beautiful beaches. At Lake Tahoe, you can enjoy hiking, water activities, or simply relax by the lake.\n\nStep 6: Packing essentials\nMake sure to pack essentials such as comfortable clothing, hiking shoes, sunscreen, water bottles, snacks, and a camera to capture the beautiful moments.\n\nStep 7: Safety and preparations\nBefore embarking on your road trip, ensure your car is in good condition. Check the tire pressure, fluid levels, and have a spare tire and emergency kit handy. Also, be aware of any permits or reservations required for national parks or campsites.\n\nRemember to check the road conditions and weather forecast closer to your travel dates, as they may impact your plans.\n\nEnjoy your road trip and the beautiful sights along the way!\n[The End of Assistant's Answer]", "judgment": "The assistant's response is highly detailed, relevant, and accurate. It provides a step-by-step guide to planning a road trip, taking into account the user's preferences for sightseeing, national parks, and scenic drives. The assistant suggests suitable destinations, provides route options, discusses accommodation possibilities, and even includes tips on what to pack and safety preparations. The assistant also reminds the user to check road conditions and weather forecasts, which is a crucial part of trip planning. The response is comprehensive and covers all the necessary aspects of planning a road trip, making it very helpful for the user. Rating: [[10]]", "score": 10, "turn": 1, "tstamp": 1697787703.8426387}
+{"question_id": 3, "model": "gpt-3.5-turbo", "judge": ["gpt-4", "single-math-v1"], "user_prompt": "[Instruction]\nPlease act as an impartial judge and evaluate the quality of the response provided by an AI assistant to the user question displayed below. Your evaluation should consider correctness and helpfulness. You will be given a reference answer and the assistant's answer. Begin your evaluation by comparing the assistant's answer with the reference answer. Identify and correct any mistakes. Be as objective as possible. After providing your explanation, you must rate the response on a scale of 1 to 10 by strictly following this format: \"[[rating]]\", for example: \"Rating: [[5]]\".\n\n[Question]\nWhat is the probability of getting heads 10 times using a biased coin? The probability of getting a tail is 0.3.\n\n[The Start of Reference Answer]\nThe probability of getting a head is 1 - the probability of getting a tail. So, the probability of getting a head is 1 - 0.3 = 0.7.\n\nThe probability of getting heads 10 times in a row is the probability of getting a head to the power of the number of times you want it to happen. So, the probability of getting heads 10 times in a row is 0.7^10 = 0.0282475249 or approximately 0.028 or 2.8%.\n[The End of Reference Answer]\n\n[The Start of Assistant's Answer]\nTo find the probability of getting heads 10 times using a biased coin, we need to multiply the probability of getting a head (0.7) by itself 10 times since each flip is independent. \n\nSo the probability is:\n\n(0.7)^10 = 0.0282475249\n\nTherefore, the probability of getting heads 10 times using a biased coin with a probability of getting a tail of 0.3 is approximately 0.0282 or 2.82%.\n[The End of Assistant's Answer]", "judgment": "The assistant's answer is correct and aligns with the reference answer. Both answers correctly identify that the probability of getting a head is 0.7 (1 - 0.3) and that the probability of getting heads 10 times in a row is 0.7^10. The assistant also correctly calculates the result as approximately 0.0282 or 2.82%. The assistant's response is clear, accurate, and helpful, providing the user with the correct calculation and interpretation of the probability. \n\nRating: [[10]]", "score": 10, "turn": 1, "tstamp": 1697787714.0843625}
+{"question_id": 5, "model": "gpt-3.5-turbo", "judge": ["gpt-4", "single-math-v1"], "user_prompt": "[Instruction]\nPlease act as an impartial judge and evaluate the quality of the response provided by an AI assistant to the user question displayed below. Your evaluation should consider correctness and helpfulness. You will be given a reference answer and the assistant's answer. Begin your evaluation by comparing the assistant's answer with the reference answer. Identify and correct any mistakes. Be as objective as possible. After providing your explanation, you must rate the response on a scale of 1 to 10 by strictly following this format: \"[[rating]]\", for example: \"Rating: [[5]]\".\n\n[Question]\nPuzzle: Why did my parents not invite me to their wedding? Give only the most likely reason.\n\n[The Start of Reference Answer]\nThe most likely reason is that you were not born yet at the time of their wedding.\n[The End of Reference Answer]\n\n[The Start of Assistant's Answer]\nThe most likely reason your parents did not invite you to their wedding is that they got married before you were born.\n[The End of Assistant's Answer]", "judgment": "The assistant's answer is correct and matches the reference answer. Both responses correctly identify that the most likely reason the user was not invited to their parents' wedding is because they were not born yet. There are no mistakes to correct. \n\nRating: [[10]]", "score": 10, "turn": 1, "tstamp": 1697787718.8947814}
diff --git a/fastchat/llm_judge/data/skycamp/question.jsonl b/fastchat/llm_judge/data/skycamp/question.jsonl
index 9f392f4e5..5b75e4654 100644
--- a/fastchat/llm_judge/data/skycamp/question.jsonl
+++ b/fastchat/llm_judge/data/skycamp/question.jsonl
@@ -1,5 +1,5 @@
-{"question_id": 1, "category": "generic", "turns": ["How can I improve my time management skills?"]}
-{"question_id": 2, "category": "generic", "turns": ["What are the most effective ways to deal with stress?"]}
-{"question_id": 3, "category": "generic", "turns": ["What are the main differences between Python and JavaScript programming languages?"]}
-{"question_id": 4, "category": "generic", "turns": ["How can I increase my productivity while working from home?"]}
-{"question_id": 5, "category": "generic", "turns": ["Can you explain the basics of quantum computing?"]}
+{"question_id": 1, "category": "generic", "turns": ["Who was the founding president of USA?"]}
+{"question_id": 2, "category": "generic", "turns": ["Please describe the concept of machine learning. Could you elaborate on the differences between supervised, unsupervised, and reinforcement learning? Provide real-world examples of each."]}
+{"question_id": 3, "category": "generic", "turns": ["What is the probability of getting heads 10 times using a biased coin? The probability of getting a tail is 0.3."]}
+{"question_id": 4, "category": "generic", "turns": ["I'm a college student in Berkeley, CA. I'd like to plan a weekend road trip leaving from Berkeley. My preferences are sightseeing, national parks, and scenic drives. What do you recommend? Let's think step by step."]}
+{"question_id": 5, "category": "generic", "turns":["Puzzle: Why did my parents not invite me to their wedding? Give only the most likely reason."]}
diff --git a/fastchat/llm_judge/data/skycamp/reference_answer/gpt-4.jsonl b/fastchat/llm_judge/data/skycamp/reference_answer/gpt-4.jsonl
new file mode 100644
index 000000000..1d52ff6d8
--- /dev/null
+++ b/fastchat/llm_judge/data/skycamp/reference_answer/gpt-4.jsonl
@@ -0,0 +1,5 @@
+{"question_id": 1, "answer_id": "JLkhQnoc55xN472sHdDHQN", "model_id": "gpt-4", "choices": [{"index": 0, "turns": ["The founding president of the United States was George Washington. He served two terms from 1789 to 1797."]}], "tstamp": 1697787202.6949246}
+{"question_id": 2, "answer_id": "6aVdbEnopvGJCms9rXmNoU", "model_id": "gpt-4", "choices": [{"index": 0, "turns": ["Machine learning is a subset of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves.\n\n1. Supervised Learning: This is a type of machine learning where the model is trained on a labeled dataset. A labeled dataset is one where the target outcome or answer is already known. The model learns from this data and then applies that learning to new data. It's called \"supervised\" learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. \n\n   Real-world example: A common example of supervised learning is email filtering - the algorithm is trained with a set of emails labeled as \"spam\" or \"not spam\", and based on that training, it can classify new incoming emails into either category.\n\n2. Unsupervised Learning: This is a type of machine learning where the model is trained on an unlabeled dataset. The model must find patterns and relationships within the data on its own. It's called \"unsupervised\" learning because there is no correct answers and there is no teacher. Algorithms are left to their own devises to discover and present the interesting structure in the data.\n\n   Real-world example: A common example of unsupervised learning is customer segmentation in marketing - the algorithm groups customers into different segments based on their purchasing behavior, without any prior knowledge of who should be in which group.\n\n3. Reinforcement Learning: This is a type of machine learning where an agent learns to behave in an environment, by performing certain actions and observing the results/rewards/results. The agent learns from the consequences of its actions, rather than from being taught explicitly. It selects its actions based on its past experiences (exploitation) and also by new choices (exploration), which is essentially trial and error learning.\n\n   Real-world example: A common example of reinforcement learning is in gaming AI - the algorithm learns how to play a game (like chess or Go) by playing many games and learning from its mistakes, gradually improving its performance over time."]}], "tstamp": 1697787238.3347323}
+{"question_id": 3, "answer_id": "3FTrUve7N4dUHVPiVfXEuf", "model_id": "gpt-4", "choices": [{"index": 0, "turns": ["The probability of getting a head is 1 - the probability of getting a tail. So, the probability of getting a head is 1 - 0.3 = 0.7.\n\nThe probability of getting heads 10 times in a row is the probability of getting a head to the power of the number of times you want it to happen. So, the probability of getting heads 10 times in a row is 0.7^10 = 0.0282475249 or approximately 0.028 or 2.8%."]}], "tstamp": 1697787248.2623236}
+{"question_id": 4, "answer_id": "KDXuHkU9YhxUoChPFC7kK5", "model_id": "gpt-4", "choices": [{"index": 0, "turns": ["Sure, I'd be happy to help you plan a road trip that aligns with your interests. Here's a suggestion for a three-day weekend trip:\n\n**Day 1: Berkeley to Yosemite National Park**\n\nBegin your journey by heading to Yosemite National Park, a world-famous destination known for its stunning waterfalls, deep valleys, grand meadows, and ancient giant sequoias. The drive is approximately 4 hours. \n\nDepart early in the morning to maximize your day at the park. Once there, you can explore the Yosemite Valley, Mariposa Grove (home to over 500 mature giant sequoias), and Glacier Point for a breathtaking view of the entire Yosemite Valley. \n\nAfterward, you may choose to stay overnight at one of the park's lodges or campgrounds, or head to a nearby town like Mariposa. \n\n**Day 2: Yosemite to Kings Canyon and Sequoia National Parks**\n\nOn the second day, drive south to Kings Canyon and Sequoia National Parks (Approximately 3.5 hours from Yosemite). These twin parks are known for their massive trees, deep canyons, and towering mountains.\n\nIn Kings Canyon National Park, you can visit the General Grant Tree, the world's second-largest tree. In Sequoia National Park, don't miss the General Sherman Tree, the largest tree in the world by volume.\n\nFor an unforgettable scenic drive, take the Kings Canyon Scenic Byway. It's one of the most beautiful drives in California, taking you deep into the heart of the park.\n\nYou can opt to camp in one of the parks or stay overnight at a nearby town like Three Rivers.\n\n**Day 3: Sequoia National Park to Berkeley**\n\nOn your last day, consider exploring more of Sequoia National Park in the morning. Take the Moro Rock Trail, a short but steep hike that leads to a fantastic panoramic view. \n\nIn the afternoon, start your journey back to Berkeley. The drive from Sequoia National Park to Berkeley is approximately 4.5 hours.\n\nRemember to check the weather and road conditions before your trip. National parks may also require reservations or have certain restrictions due to COVID-19, so make sure to check their official websites for the most accurate and updated information.\n\nHappy traveling!"]}], "tstamp": 1697787287.0736637}
+{"question_id": 5, "answer_id": "9YWZ6XWeYVrS4MKqMvApt5", "model_id": "gpt-4", "choices": [{"index": 0, "turns": ["The most likely reason is that you were not born yet at the time of their wedding."]}], "tstamp": 1697787288.9998314}