Skip to content
This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

Commit

Permalink
feat: sentiment analysis docs (#805)
Browse files Browse the repository at this point in the history
  • Loading branch information
turbo1912 authored Mar 30, 2023
1 parent aa3b26b commit c65ad80
Showing 1 changed file with 74 additions and 32 deletions.
106 changes: 74 additions & 32 deletions docsite/docs/fal-serverless/examples/sentiment-analysis.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,62 +2,104 @@
sidebar_position: 1
---

# Sentiment Analysis
# Sentiment Analysis with dbt

Sentiment analysis is the process of determining the sentiment or emotion behind a piece of text. It is widely used in social media monitoring, customer service, and marketing. By using sentiment analysis, you can quickly identify and respond to any complaints or other negative feedback.

This is a simple tutorial on how to perform sentiment analysis on a string using fal-serverless.
This is a simple tutorial on how to perform sentiment analysis on a string using dbt fal-serverless.

### 1. Import isolated decorator:
### 1. Install fal-serverless and dbt-fal:

```python
from fal_serverless import isolated
pip install fal-serverless dbt-fal[snowflake]
```

### 2. Define requirements list:
### 2. Authenticate to fal-serverless:

```python
requirements = ["transformers==4.26.0", "torch==1.13.1"]
```
fal-serverless auth login
```

### 3. Define an isolated function:
### 3. Generate keys to access fal-serverless

```python
# Set machine_type="M" for more RAM
@isolated(requirements=requirements, machine_type="M")
def do_sentiment_analysis(input: str) -> list[dict]:
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
```
fal-serverless key generate
```

# Download a sentiment analysis model
model = AutoModelForSequenceClassification.from_pretrained(
"pysentimiento/robertuito-sentiment-analysis", cache_dir="/data/huggingface")
### 4. Update your dbt profiles.yml

# Download a tokenizer
tokenizer = AutoTokenizer.from_pretrained(
"pysentimiento/robertuito-sentiment-analysis", cache_dir="/data/huggingface")
```
fal_profile:
target: fal_serverless
outputs:
fal_serverless:
type: fal
db_profile: db
host: <ask the fal team>
key_secret: MY_KEY_SECRET_VALUE
key_id: MY_KEY_ID_VALUE
db:
type: snowflake
username: USERNAME
password: PASSWORD
```

# Initialize pipeline
pipe = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
### 5. Create a sentiment-analysis fal environment

# Run analysis and immediately return
return pipe(input)
```
environments:
- name: sentiment-analysis
type: venv
requirements:
- transformers
- torch
```

Inside the `do_sentiment_analysis` function definition we are downloading a model from Hugging Face in this line:
### 6. Define your dbt model:

```python
model = AutoModelForSequenceClassification.from_pretrained(
"pysentimiento/robertuito-sentiment-analysis", cache_dir="/data/huggingface")
```
def model(dbt, fal):
dbt.config(materialized="table")
dbt.config(fal_environment="sentiment-analysis")
dbt.config(fal_machine="GPU")
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer
import numpy as np
import pandas as pd
import torch

By specifying the `cache_dir`, we are making sure that we don't have to download the model repeatedly. It will be stored inside our `/data` directory that works as a user-specific and persistent cache.
# Check if a GPU is available and set the device index
device_index = 0 if torch.cuda.is_available() else -1

### 4. Call the isolated function with an input string
# Load the model and tokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)

```python
result = do_sentiment_analysis("This is a totally awesome sentence, I couldn't be happier about it!")
# Create the sentiment-analysis pipeline with the specified device
classifier = pipeline("sentiment-analysis", model=model_name, tokenizer=tokenizer, device=device_index)

ticket_data = dbt.ref("zendesk_ticket_data")
ticket_descriptions = ticket_data["DESCRIPTION"].tolist()

# Run the sentiment analysis on the ticket descriptions
description_sentiment_analysis = classifier(ticket_descriptions)
rows = []

for id, sentiment in zip(ticket_data.ID, description_sentiment_analysis):
rows.append((int(id), sentiment["label"], sentiment["score"]))

records = np.array(rows, dtype=[("id", int), ("label", "U8"), ("score", float)])

sentiment_df = pd.DataFrame.from_records(records)

return sentiment_df
```

### 4. Run dbt:

```
dbt run
```

The first time `do_sentiment_analysis` is called, it will likely take a bit of time, since it will need to install depedencies and then download the model data. Subsequent runs should be much faster, as fal-serverless smartly caches both the target environment and the user's working directory.
That's it. Doing a dbt run against this profile will execute your Python models in fal-serverless.

Of course, this is not the only way to run sentiment analysis on fal-serverless. There are many other libraries, APIs and techniques that can be run on fal-serverless.

0 comments on commit c65ad80

Please sign in to comment.