Category | Difficulty (Out of 5) |
---|---|
Homework | 4 |
Exam | 4 |
• Text representation • Search engine indexes • Index construction • Query structure • Document structure • Unsupervised ranking • Feature-based ranking • Neural ranking • Page features • Evaluation • Search log analysis • Diversity
The graduate sections (11-642 and 11-742) are worth 12 units. The undergraduate section (11-442) is worth 9 units. The main difference between the three sections (11-442, 11-642, 11-742) is the amount of analysis, writing, and time required to complete homework assignments. Undergraduate version only requires to implement codes. 11-642 and 11-742 requires to do experiments and write experimental reports. You can find a detailed course description on the course website.
There are five homeworks total, each is due biweekly and worth 12%. All homeworks are given in course website page, and each has an individual test page. Java is the only the only language to complete the homeworks. (Note: From Fall 2023, All homeworks are required to be completed in Python using an Anaconda development environment ) Intially, you will be provided with a basic serach engine software, and each homework will add more features to that, which is related to what you have learned in the class. Homework 1: Boolean Retrieval, your software must support Unranked Boolean retrieval and Ranked Boolean retrieval (implement operators: OR, AND, NEAR/n, and SYN). Homework 2: Ranked Retrieval and Structured Queries, your software must support BM25 and Indri retrieval methods (implement operators: SUM, WAND, WSUM, WINDOW) Homework 3: Implement query expansion using pseudo relevance feedback Homework 4: Implement learning to rank (L2R) using SVM ranker and RankLib Homework 5: Implement diversification using xQuAD and PM2 For graduate version courses, each homework requires to do additional experiments and write reports. The experiments may take around from 30 minutes to 3 hours to finish in a laptop. The experiments aim at helping students understand what have been taught in the class better, and helping you prepare better for the exams. So I strongly recommend to do the experiments. Homework 1 is extremely important because it is the base of other homeworks. Make sure you consider the corner cases when implementing the software. Note: From Fall 2023, several homeworks have been changed. For example, one homework is using BERT to rerank an initial document ranking. Because in the previous versions, this course did not focus much on the nerual ranking part. But after our suggestions, the homeworks have been updated and become more interesting.
There are two exams in this course. Each is worth 20% and focusing on separate parts. Exams can be tricky but they are not super difficult if you have truly understood all the concepts in this course Question is focusing on comparison between different methods under a given context. So it is not just memorizing all the slides, but requires you to understand the concepts. One suggestion is to start preparation of exams early, and the slides are the best materials for preparation.
- Start each homework early. Try to consider all the corner cases since there are hidden tests for each homework.
- Start preparing exams early. One sample paper (From Fall 2014 and Fall 2015) will be previded for each exam. You can get familiar with the type of questions.
[Course Website]https://boston.lti.cs.cmu.edu/classes/11-642/