feat: sentiment analysis docs (#805)

fal-ai · Mar 30, 2023 · c65ad80
1 parent aa3b26b
Showing 1 changed file with 74 additions and 32 deletions.
diff --git a/docsite/docs/fal-serverless/examples/sentiment-analysis.md b/docsite/docs/fal-serverless/examples/sentiment-analysis.md
@@ -2,62 +2,104 @@
 sidebar_position: 1
 ---
 
-# Sentiment Analysis
+# Sentiment Analysis with dbt
 
 Sentiment analysis is the process of determining the sentiment or emotion behind a piece of text. It is widely used in social media monitoring, customer service, and marketing. By using sentiment analysis, you can quickly identify and respond to any complaints or other negative feedback.
 
-This is a simple tutorial on how to perform sentiment analysis on a string using fal-serverless.
+This tutorial shows how to run sentiment analysis on a column of text in a dbt Python model that executes on fal-serverless.
 
-### 1. Import isolated decorator:
+### 1. Install fal-serverless and dbt-fal:
 
 ```python
-from fal_serverless import isolated
+pip install fal-serverless "dbt-fal[snowflake]"
 ```
 
-### 2. Define requirements list:
+### 2. Authenticate to fal-serverless:
 
-```python
-requirements = ["transformers==4.26.0", "torch==1.13.1"]
+```
+fal-serverless auth login
 ```
 
-### 3. Define an isolated function:
+### 3. Generate keys to access fal-serverless
 
-```python
-# Set machine_type="M" for more RAM
-@isolated(requirements=requirements, machine_type="M")
-def do_sentiment_analysis(input: str) -> list[dict]:
-    from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
+```
+fal-serverless key generate
+```
 
-    # Download a sentiment analysis model
-    model = AutoModelForSequenceClassification.from_pretrained(
-        "pysentimiento/robertuito-sentiment-analysis", cache_dir="/data/huggingface")
+### 4. Update your dbt profiles.yml
 
-    # Download a tokenizer
-    tokenizer = AutoTokenizer.from_pretrained(
-        "pysentimiento/robertuito-sentiment-analysis", cache_dir="/data/huggingface")
+```
+fal_profile:
+  target: fal_serverless
+  outputs:
+    fal_serverless:
+      type: fal
+      db_profile: db
+      host: <ask the fal team>
+      key_secret: MY_KEY_SECRET_VALUE
+      key_id: MY_KEY_ID_VALUE
+    db:
+      type: snowflake
+      user: USERNAME
+      password: PASSWORD
+```
 
-    # Initialize pipeline
-    pipe = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
+### 5. Create a sentiment-analysis fal environment
 
-    # Run analysis and immediately return
-    return pipe(input)
+```
+environments:
+  - name: sentiment-analysis
+    type: venv
+    requirements:
+      - transformers
+      - torch
 ```
 
-Inside the `do_sentiment_analysis` function definition we are downloading a model from Hugging Face in this line:
+### 6. Define your dbt model:
 
 ```python
-    model = AutoModelForSequenceClassification.from_pretrained(
-        "pysentimiento/robertuito-sentiment-analysis", cache_dir="/data/huggingface")
-```
+def model(dbt, fal):
+    dbt.config(materialized="table")
+    dbt.config(fal_environment="sentiment-analysis")
+    dbt.config(fal_machine="GPU")
+    from transformers import pipeline, AutoTokenizer
+    import numpy as np
+    import pandas as pd
+    import torch
 
-By specifying the `cache_dir`, we are making sure that we don't have to download the model repeatedly. It will be stored inside our `/data` directory that works as a user-specific and persistent cache.
+    # Check if a GPU is available and set the device index
+    device_index = 0 if torch.cuda.is_available() else -1
 
-### 4. Call the isolated function with an input string
+    # Load the model and tokenizer
+    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
 
-```python
-result = do_sentiment_analysis("This is a totally awesome sentence, I couldn't be happier about it!")
+    # Create the sentiment-analysis pipeline with the specified device
+    classifier = pipeline("sentiment-analysis", model=model_name, tokenizer=tokenizer, device=device_index)
+
+    ticket_data = dbt.ref("zendesk_ticket_data")
+    ticket_descriptions = ticket_data["DESCRIPTION"].tolist()
+
+    # Run the sentiment analysis on the ticket descriptions
+    description_sentiment_analysis = classifier(ticket_descriptions)
+    rows = []
+
+    for ticket_id, sentiment in zip(ticket_data.ID, description_sentiment_analysis):
+        rows.append((int(ticket_id), sentiment["label"], sentiment["score"]))
+
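+    # "U8" fits this model's labels exactly ("POSITIVE"/"NEGATIVE" are 8 characters); longer labels would be truncated
+    records = np.array(rows, dtype=[("id", int), ("label", "U8"), ("score", float)])
+    sentiment_df = pd.DataFrame.from_records(records)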
+
+    return sentiment_df
+```
+
+### 7. Run dbt:
+
+```
+dbt run
 ```
 
-The first time `do_sentiment_analysis` is called, it will likely take a bit of time, since it will need to install depedencies and then download the model data. Subsequent runs should be much faster, as fal-serverless smartly caches both the target environment and the user's working directory.
+That's it. Running `dbt run` against this profile executes your Python models on fal-serverless.
 
 Of course, this is not the only way to run sentiment analysis on fal-serverless. There are many other libraries, APIs and techniques that can be run on fal-serverless.
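
Review note: the row-building loop in the step 6 model can be checked locally, without a GPU or a model download. The sketch below is an illustration under stated assumptions, not part of this commit: `build_sentiment_rows` is a hypothetical helper, and `sample_predictions` is made-up data shaped like the list of `{"label", "score"}` dicts that a transformers sentiment-analysis pipeline returns. Unlike a bare `zip`, it raises when the ids and predictions differ in length instead of silently dropping rows.

```python
def build_sentiment_rows(ids, predictions):
    """Pair each ticket id with its predicted label and score.

    Hypothetical helper for illustration; not part of dbt-fal or fal-serverless.
    """
    ids = list(ids)
    predictions = list(predictions)
    # A bare zip() would stop at the shorter input; fail loudly instead.
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    return [
        {"id": int(ticket_id), "label": pred["label"], "score": float(pred["score"])}
        for ticket_id, pred in zip(ids, predictions)
    ]


# Made-up classifier output, used only to exercise the helper.
sample_predictions = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.87},
]
rows = build_sentiment_rows([101, 102], sample_predictions)
```

Inside the dbt model, passing a list like `rows` to `pd.DataFrame(rows)` would give a table with `id`, `label`, and `score` columns, which avoids the fixed-width string dtype of the NumPy structured array.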