MLOps & LLMOps: Production AI Systems - E115

---
layout: home
title: "E115: MLOps, LLMOps & AIOps - Productionizing AI Systems"
nav_exclude: true
permalink: /:path/
seo:
  type: Course
  name: "E115: MLOps & LLMOps Course"
  description: "Learn MLOps, LLMOps, and AIOps fundamentals. Master production AI systems, LLM deployment, and machine learning operations at Harvard."
  keywords: "MLOps, LLMOps, AIOps, machine learning operations, LLM deployment, AI systems, production AI"
---


Course Introduction

In today's AI-driven world, building a robust deep learning model is only half the journey. The real challenge often lies in bringing this model to life in the form of an application that's scalable, maintainable, and ready for real-world deployment. Welcome to E115: Productionizing AI (Machine Learning Operations), where we will traverse the complex landscape of Machine Learning Operations, with a special focus on Large Language Models (LLMs). This course has been meticulously curated to provide a holistic understanding of the complete deep learning workflow, from refining your models to deploying them in production environments.

We will dive deep into topics like containerization, cloud functions, data pipelines, and advanced training workflows, with specific emphasis on LLMs. You will learn how to use LLM APIs effectively, host your own APIs, fine-tune LLMs for specific tasks, adapt them to new domains, and build applications around them. Our objective is not only to help you grasp these concepts but also to empower you to build and deploy scalable AI applications, with attention to the particular challenges LLMs raise in real-world settings.

Whether you are an AI enthusiast wanting to understand the intricacies of Machine Learning Operations or a seasoned professional aiming to fortify your knowledge, this course promises a comprehensive exploration of the production side of AI, with a spotlight on LLM applications and productionizing.

Lectures

Meeting Time: Tuesday 6:30 - 8:30 PM and Thursday 6:30 - 8:30 PM via Zoom.

Technologies and Platforms

We will demonstrate most ideas using TensorFlow, and some using PyTorch, on Google Cloud Platform (GCP). Tutorials for AWS will also be provided for reference.

Course Topics Overview

We have designed an in-depth curriculum to ensure a comprehensive understanding of AI-Ops. Here's a closer look at the topics we'll be covering (see here for a full list of topics):

Introduction:
- Begin with an understanding of the importance of AI-Ops and how it fits in the broader AI and software development ecosystem.
Virtual Environments and Virtual Machines:
- Delve into the foundations of isolated software environments, their importance in AI development, and how virtual machines offer a layer of abstraction over physical hardware.
Containers:
- Understand the concept of containerization using tools like Docker, and how containers differ from virtual machines.
LLM Topics:
- Large Language Models (LLMs) have led to many new tools and agents that students will use in their projects. In these lectures, we'll look at some of these tools, such as LangChain, LlamaIndex, and LLM API calls. We'll also explore Retrieval-Augmented Generation (RAG) and AI agents, which make it easier to build applications on top of LLMs.
Data Pipelines & Cloud Storage:
- Learn core data management techniques, including ETL and data versioning, and how cloud storage solutions fit into the AI-Ops ecosystem. Explore specialized tools for managing large-scale datasets for computer vision and language models.
Advanced Training Workflows:
- We will look into advanced training workflows, covering experiment tracking with tools like Weights & Biases, leveraging multi-GPU setups for accelerated training, exploring serverless training options using Vertex AI, and fine-tuning large language models (LLMs).
Advanced Inference Workflows:
- Understand the nuances of model optimization techniques like distillation, quantization, compression, and Low-Rank Adaptation (LoRA). We then move to model deployment, hosting, and serving large language models (LLMs) effectively. Explore post-deployment monitoring for model performance, data drift detection, and testing strategies, along with tools such as Cloud Functions, Cloud Run, Kubeflow, and Vertex AI Pipelines.
App Design, Setup, and Code Organization:
- Best practices in designing user-centric AI applications, setting up your development environment, and organizing code for scalability and maintainability.
APIs & Frontend:
- Learn how to build RESTful APIs to serve your models and how to design user interfaces for seamless user interactions.
CI/CD:
- Continuous Integration (CI) and Continuous Deployment (CD) are critical practices in modern software development, especially within AI-Ops. This section will cover the principles of CI/CD, including automated testing, integration, and deployment pipelines. You'll learn how to set up CI/CD workflows using platforms like GitHub Actions, ensuring that your AI models and applications are robust, tested, and reliably deployed to production environments.
Scaling (k8s):
- Delve into Kubernetes, its significance in deploying containerized applications, and how to scale your applications to serve millions of users.

As we journey through these topics, students will gain a holistic perspective, bridging the gap between model development and real-world deployment. With a blend of theory and practical exercises, this course ensures that by the end, you're not just familiar with these concepts, but proficient in applying them.

Prerequisites

To ensure a seamless learning experience and to make the most of this course, participants are expected to come with a foundational knowledge in the following areas:

Programming Proficiency in Python:
- A strong command of Python's basic constructs, including functions, classes, and modules. Familiarity with libraries like NumPy, Pandas, and Matplotlib is essential, as they form the backbone of many data manipulation tasks in AI.
Deep Learning Framework - TensorFlow or PyTorch:
- A working knowledge of TensorFlow (or PyTorch) is crucial, as many topics will build on its functionality. Understanding the framework's basic operations, data handling, and model-building mechanisms will be invaluable.
Basic Shell Commands:
- Comfort navigating the command-line interface (CLI), executing shell commands, and performing basic file operations is foundational for many AI-Ops tasks.
Basic Data Structures:
- A good grasp of Python's primary data structures, especially dictionaries and lists, will be instrumental in understanding and manipulating data.
File I/O:
- Knowledge of basic file input/output operations in Python, including reading from and writing to files, is vital for tasks involving data storage and manipulation.
General AI and ML Concepts:
- While this course is centered around AI-Ops, a basic understanding of AI and machine learning concepts, including what models are and how they are trained, will set the context for many advanced topics.

It's important to note that while prior knowledge in these areas will provide a solid foundation, the course has been structured to ensure gradual progression. Even if you're not an expert in all of the prerequisites, a willingness to learn and engage actively in the course's hands-on components will be crucial for success. If you find yourself struggling with some concepts, we encourage leveraging the course resources, attending office hours, and participating in peer discussions to reinforce your understanding.

Course Components

Sessions: Structured lectures focusing on the core topics.
Office Hours: Dedicated time with your Teaching Fellow (TF) for questions, clarifications, or project guidance.
Individual Assignments (3): These assignments ensure you grasp key learning objectives.
Team Projects: Collaborate with classmates to build a fully functional AI application.
Discussion Forums: Engage in peer-to-peer learning, discussions, and knowledge sharing.
Supplementary Readings: To complement the topics covered in lectures and enrich your academic comprehension, a selection of readings has been curated. As this is an evolving field, the ability to continuously update your knowledge through independent reading is an integral part of the course.

Team Projects (Project-Based Learning): Crafting Your Own AI Solutions

In the dynamic realm of AI and AI-Ops, hands-on experience is paramount. This course encourages each student to bring a unique perspective by working on self-conceived projects. Here's what you need to know:

1. Crafting Your AI Project:

Students are expected to conceptualize and develop their own projects. While our teaching staff is here to provide ideas and guidance, the core objective is for each student to nurture and shape their original initiative.
By the end of the semester, the aim is to transform your idea into a fully functional web-app or mobile application.
Project Scope: Your project should incorporate some element of modeling, ensuring it aligns with the learning objectives of the course. Moreover, it is essential that every component of the project CAN be evaluated by our teaching staff.
Unleash Your Creativity: Whether you're driven by a start-up vision, by research lab innovations, or inspired by a personal hobby, this is your platform to bring that idea to life.

2. A Guided Demonstration by Pavlos:

We, the teaching team, will undertake a project that Pavlos proposes throughout the semester. This serves as a demonstration and reference point.
Each week will spotlight a different facet of Pavlos' project development. This structured showcase offers students practical insight into course concepts.
In parallel, students will be prompted to integrate the week's learnings into their projects, ensuring a steady progression towards their end goals.

3. Milestones and Assessment:

The course will be punctuated with key milestones, designed to assess your project's evolution and your grasp of the AI-Ops concepts. Details of these milestones will be shared in due course.
It's imperative to understand that a significant portion of your grade hinges on these milestones. They are not just checkpoints but pivotal phases that contribute to your project's holistic development and your learning journey.

In Summation:

The heart of this course is experiential learning. We firmly believe that by pairing your ideas with structured guidance, we can equip you with the tangible skills essential in today's AI-driven world.

Grade Distribution

Component	Weight (%)
MS1	4
MS2	10
MS3	25
MS4	14
MS5	35
HW1	4
HW2	4
HW3	4
Total	100
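To make the weights concrete, here is a minimal Python sketch of how they combine into a single course score. It assumes each component is scored on a 0 to 100 scale and that the final score is a plain weighted average; the helper name `weighted_course_score` and the sample scores are hypothetical illustrations, not the course's official grading procedure.

```python
# Component weights (percent), copied from the Grade Distribution table.
WEIGHTS = {
    "MS1": 4, "MS2": 10, "MS3": 25, "MS4": 14, "MS5": 35,
    "HW1": 4, "HW2": 4, "HW3": 4,
}


def weighted_course_score(scores: dict[str, float]) -> float:
    """Return the weighted average of component scores.

    Hypothetical helper: assumes every score is on a 0-100 scale and
    raises KeyError if any component in WEIGHTS is missing from `scores`.
    """
    total_weight = sum(WEIGHTS.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


if __name__ == "__main__":
    # Hypothetical example: 90 on everything except MS5.
    example = {name: 90.0 for name in WEIGHTS}
    example["MS5"] = 80.0
    print(weighted_course_score(example))  # prints 86.5
```

Because MS3 and MS5 together carry 60% of the weight, a 10-point drop on MS5 alone lowers the sketch's result by 3.5 points.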

For more information about the projects and milestones, you can either click the links provided above or visit the project page.

Course Policies

Getting Help:
- ED Forum: Post questions about course content or technical issues on the ED forum. This encourages peer learning and allows teaching staff to address common concerns. We regularly monitor the forum to provide guidance.
- Office Hours: Attend office hours if you need personalized assistance or in-depth explanations.
- Teaching Staff Helpline: For matters specific to the teaching staff, please send your queries to [email protected].
- Email the Instructor: For private or individual concerns, please feel free to directly email the instructor.
Deadline Policy:

Consistent and timely completion of assignments is imperative in this course. All course milestones must be submitted by 9:00 PM EST on the specified due dates. You earn 1 extra late day for every 5 lectures you attend, and you may use at most two late days on any single assignment. For group project milestones, at least one group member must have late days available.

Final Milestone / Midterms: It's important to note that no extensions will be permitted for the final milestone / midterms, under any circumstances. Therefore, careful time management is strongly encouraged to ensure that you can meet this critical deadline.
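The late-day rules can be sketched as a small Python function. This is a minimal sketch under stated assumptions: earned late days are the number of lectures attended divided by 5 (rounded down), a single submission may use at most two late days, a group milestone may be submitted late only if at least one member holds enough late days to cover the whole delay, and the final milestone / midterms accept no late submissions. The policy does not say how late days are deducted from group members or whether there is a base allotment, so `earned_late_days`, `can_submit_late`, and the `base_late_days` parameter are hypothetical.

```python
MAX_LATE_DAYS_PER_SUBMISSION = 2  # "a maximum of two late days for any single assignment"
LECTURES_PER_LATE_DAY = 5         # "1 extra late day for every 5 lecture attendances"


def earned_late_days(lectures_attended: int, base_late_days: int = 0) -> int:
    """Return the late days available from attendance.

    `base_late_days` is a hypothetical starting allotment; the policy only
    describes *extra* late days, so it defaults to 0 in this sketch.
    """
    if lectures_attended < 0:
        raise ValueError("lectures_attended must be non-negative")
    return base_late_days + lectures_attended // LECTURES_PER_LATE_DAY


def can_submit_late(
    days_late: int,
    member_balances: list[int],
    is_final_or_midterm: bool = False,
) -> bool:
    """Return True if a submission `days_late` days past the deadline is allowed.

    Assumptions: an individual assignment passes a one-element list of
    balances; for a group milestone, at least one member must hold enough
    late days to cover the whole delay.
    """
    if days_late <= 0:
        return True   # submitted on time
    if is_final_or_midterm:
        return False  # no extensions for the final milestone / midterms
    if days_late > MAX_LATE_DAYS_PER_SUBMISSION:
        return False  # exceeds the per-assignment cap
    return any(balance >= days_late for balance in member_balances)


if __name__ == "__main__":
    print(earned_late_days(12))                                # 2
    print(can_submit_late(1, [0, 2, 0]))                      # True
    print(can_submit_late(3, [5]))                            # False
    print(can_submit_late(1, [3], is_final_or_midterm=True))  # False
```

Keeping the caps as named constants makes it easy to adjust the sketch if the official policy differs from these assumptions.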
Academic Honesty:
- This course places a strong emphasis on ethical behavior. Whether it's ethically handling data or attributing the work of others, students are expected to maintain high standards of integrity.
- Acceptable Behaviors: Discussing course materials, engaging in office hours, debugging with peers, using and citing small portions of code found online, searching online for general knowledge, and seeking guidance from tutors.
- Unacceptable Behaviors: Accessing or sharing solutions before submission, plagiarizing, not citing sources of external code or techniques, paying or offering payment for coursework, and sharing course material with future potential students.
- Engaging in unacceptable behaviors will lead to disciplinary action. When in doubt, always consult the course instructors.
Collaboration & Teamwork:

Collaboration is encouraged, especially for projects. However, ensure you contribute equally and do not divide tasks in a way that prevents you from understanding all parts of the assignment.

Feedback & Evaluation:
- Continuous feedback is vital for the learning process. While the course has several grading components, always focus on understanding rather than just marks. Do provide feedback on the course structure, content, and delivery, so we can continually improve.

Policy on Usage of Publicly Available Class Material

Permitted Use: Class Material is made available primarily for the educational benefit of enrolled students and may be used by others for personal educational purposes only.
Prohibited Use:
- Selling or commercializing any part of the Class Material.
- Sharing, distributing, or publishing any part of the Class Material in any form or through any medium without explicit permission from the instructor.
- Modifying or altering the Class Material to create derivative works.
Attribution: Any permitted use of the Class Material must carry appropriate acknowledgment of the source (e.g., the instructor's name, course title, and institution).
Enforcement: Failure to comply with this policy may result in legal action and/or disciplinary measures as applicable.

Consent:

By accessing and using the Class Material, you indicate your acknowledgment and acceptance of this policy.

Accessibility:

We are committed to ensuring that this course is accessible to everyone. If you require special accommodations or have any specific needs, please contact the course administrators as soon as possible.

Adherence to accessibility policies and a commitment to fairness, respect for your learning journey, and consideration for the learning journey of your peers are expected from all students.

Inclusion and Belonging Statement

In this data science class, we strive to create a diverse and inclusive learning environment that respects all identities, including race, gender, class, sexuality, religion, and ability. Our goal is to:

Advance ethical data science and expose biases in its applications.
Encourage a variety of thoughts, perspectives, and experiences.
Be a supportive resource, open to understanding and adapting to your unique needs.

To foster inclusion:

Please inform us if your name or pronouns differ from official records.
If something affects your class performance or if you feel uncomfortable with any classroom interactions, reach out to us. You may also find resources at the Harvard Office of Diversity and Inclusion.
Respect and consideration for diverse backgrounds and perspectives are expected from all participants.
Your feedback is essential in enhancing diversity, inclusion, and ethics within our class. Feel free to contact us or submit anonymous suggestions.

Harvard-IACS/2025-E115
