
Long Document Summarization in the context of Multi-Speaker Webinar Transcripts with LangChain, Transformers and PEFT


jolenechong/textSummarizerLLMsApp


Text Summarizer for Webinars

This repository contains a demo of the final application and fine-tuned model, together with a comprehensive exploration of long document summarization in the context of multi-speaker webinar transcripts. My in-depth research report navigates the complexities of long document summarization, evaluates currently available open-source and closed-source models, and walks through my process of fine-tuning our very own summarization model on limited resources. As this repository doesn't contain the code for the evaluation and fine-tuning of the models, I've collated some useful code snippets from my work here.

Date: October-November 2023
Live site: https://llm-text-summarizer.streamlit.app/ (backend is terminated at this time)
Fine-Tuned Open Source Models:

Documentation: Unleashing the Power of Large Language Models on Transcripts Summarization.pdf
Code Snippets: https://gist.github.com/jolenechong/0781431d894332ee44b7ef05caab7cbe

Here's a quick demo on the summarization features of the application and how it works.

LLM.Summarizer.Short.Demo.mp4

Architecture

Overall Architecture

Usage

Give this model a try! The snippet below loads jolenechong/lora-bart-samsum-tib-1024, the second of the fine-tuned models published above.
Here's how to use it:

# install these libraries if you haven't already
# !pip install transformers
# !pip install peft

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# load the base model, then apply the LoRA adapter on top of it
config = PeftConfig.from_pretrained("jolenechong/lora-bart-samsum-tib-1024")
model = AutoModelForSeq2SeqLM.from_pretrained("philschmid/bart-large-cnn-samsum")
model = PeftModel.from_pretrained(model, "jolenechong/lora-bart-samsum-tib-1024")
tokenizer = AutoTokenizer.from_pretrained("jolenechong/lora-bart-samsum-tib-1024")

text = """[add transcript you want to summarize here]"""
# truncate to the model's 1024-token context so longer transcripts don't error out
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"])
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Feel free to check out the process through my documentation and code snippets as well as the first model above for more details on the fine-tuning process and the evaluation of the models.
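For transcripts longer than the model's 1024-token window, the report explores splitting the transcript into chunks, summarizing each chunk, and then summarizing the combined summaries (LangChain-style map-reduce). Here's a minimal sketch of the chunking step, using whitespace word counts as a stand-in for the real tokenizer; the function name and overlap size are illustrative, not from the repository's code:

```python
def chunk_text(text, max_tokens=1024, overlap=64):
    """Split text into chunks of at most max_tokens words, overlapping
    by `overlap` words so sentences cut at a boundary keep some context."""
    words = text.split()
    if not words:
        return []
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks

# each chunk can then be passed through the summarizer, and the
# per-chunk summaries concatenated and summarized once more
chunks = chunk_text("word " * 2500, max_tokens=1024, overlap=64)
print(len(chunks))  # 3
```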

To run the front-end streamlit application locally, follow these steps:

# create virtual environment
py -m venv .venv

# activate it
.venv\Scripts\activate.bat   # for windows
source .venv/bin/activate    # for linux/macOS

# install relevant libraries
pip install -r requirements.txt

# initialize the db (run the lines below inside the Python shell started by `py`)
# you might need to set listen_addresses = 'localhost' in your postgresql.conf file if it's your first time running it
py
from app import app, db
app.app_context().push()
db.create_all()

# frontend
streamlit run streamlit-app.py

Contact

Jolene - [email protected]
