Add Dropbox Document Loader #7301

Open
wants to merge 12 commits into
base: main
@@ -0,0 +1,102 @@
---
hide_table_of_contents: true
sidebar_class_name: node-only
---

# Dropbox Loader

The `DropboxLoader` allows you to load documents from Dropbox into your LangChain applications. It retrieves files or directories from your Dropbox account and converts them into documents ready for processing.

## Overview

Dropbox is a file hosting service that brings all your files—traditional documents, cloud content, and web shortcuts—together in one place. With the `DropboxLoader`, you can seamlessly integrate Dropbox file retrieval into your projects.

## Setup

1. Create a Dropbox app using the [Dropbox App Console](https://www.dropbox.com/developers/apps/create).
2. Ensure the app has the `files.metadata.read` and `files.content.read` scope permissions.
3. Generate the access token from the Dropbox App Console.
4. This loader wraps Unstructured to parse file contents, so you'll need Unstructured set up and reachable at a URL endpoint. It can also be configured to run locally.
   See the [Unstructured documentation](https://docs.unstructured.io/) for information on how to do that.
5. Install the necessary packages:

```bash npm2yarn
npm install @langchain/community @langchain/core dropbox
```
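
This PR adds `DROPBOX_ACCESS_TOKEN` to `.env.example`, so the token can come from the environment instead of being hard-coded. The following is a minimal sketch of assembling the loader options that way. The helper name `buildDropboxLoaderOptions` and the `UNSTRUCTURED_API_URL` variable are assumptions made for illustration; neither is part of the PR.

```typescript
// Hypothetical helper, not part of the PR: builds the option objects passed to
// the loader from environment-style key/value pairs.
interface DropboxLoaderEnvOptions {
  clientOptions: { accessToken: string };
  unstructuredOptions: { apiUrl: string };
}

function buildDropboxLoaderOptions(
  env: Record<string, string | undefined>
): DropboxLoaderEnvOptions {
  const accessToken = env.DROPBOX_ACCESS_TOKEN;
  if (!accessToken) {
    throw new Error("DROPBOX_ACCESS_TOKEN is not set");
  }
  // UNSTRUCTURED_API_URL is an assumed variable name; the fallback is the
  // local endpoint used in the usage examples.
  const apiUrl =
    env.UNSTRUCTURED_API_URL ?? "http://localhost:8000/general/v0/general";
  return { clientOptions: { accessToken }, unstructuredOptions: { apiUrl } };
}
```

In Node, passing `process.env` as `env` and spreading the result into the loader constructor alongside `filePaths` or `folderPath` would keep the token out of source control.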

## Usage

### Loading Specific Files

To load specific files from Dropbox, specify the file paths:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {
    accessToken: "your-dropbox-access-token",
  },
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general", // Replace with your Unstructured API URL
  },
  filePaths: ["/path/to/file1.txt", "/path/to/file2.pdf"], // Replace with file paths on Dropbox.
});

const docs = await loader.load();
console.log(docs);
```

Review comments on the `apiUrl` line above:

Collaborator:

Let's emphasize somewhere that this wraps Unstructured

Should we call this DropboxUnstructuredLoader instead?

Author:

Yes, we can rename the loader class to DropboxUnstructuredLoader
I want to confirm if I need to rename the file to say dropbox_unstructured.ts as well?

Also, I noticed that a few preexisting loaders utilize unstructured as well. Would they need to be renamed as well in the future?
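
Dropbox API file paths are absolute and begin with `/`, as in the `filePaths` example. As a small sketch of a pre-flight check before constructing the loader, the hypothetical helper below (`prepareFilePaths` is not part of the PR) trims whitespace, rejects paths that do not start with `/`, and drops duplicates. Whether the loader performs any validation of its own is not shown in this excerpt.

```typescript
// Hypothetical pre-flight check, not part of the PR: normalise the list of
// Dropbox paths before passing it as `filePaths`.
function prepareFilePaths(paths: string[]): string[] {
  const result: string[] = [];
  for (const raw of paths) {
    const path = raw.trim();
    if (path.charAt(0) !== "/") {
      throw new Error(`Dropbox file path must start with "/": ${raw}`);
    }
    // Keep the first occurrence of each path and skip repeats.
    if (result.indexOf(path) === -1) {
      result.push(path);
    }
  }
  return result;
}
```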

### Loading Files from a Directory

To load all files from a specific directory, provide the `folderPath` and set the `mode` to `"directory"`. Set `recursive` to `true` to traverse subdirectories:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {
    accessToken: "your-dropbox-access-token",
  },
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general",
  },
  folderPath: "/path/to/folder",
  recursive: true, // Load documents found in subdirectories
  mode: "directory",
});

const docs = await loader.load();
console.log(docs);
```
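
The loader's own traversal code is not shown in this excerpt. As a rough sketch of what a recursive directory listing can look like with the Dropbox SDK's `filesListFolder` and `filesListFolderContinue` methods, the code below pages through a folder and collects file paths. `FolderLister`, `ListEntry`, `ListResult`, and `listFilePaths` are hypothetical names, and the interfaces are a simplified assumption about the response shape rather than the SDK's own typings.

```typescript
// Simplified structural types for the parts of the listing response used
// here. These are assumptions for the sketch, not the SDK's typings.
interface ListEntry {
  ".tag": "file" | "folder" | "deleted";
  path_lower?: string;
}
interface ListResult {
  result: { entries: ListEntry[]; cursor: string; has_more: boolean };
}
interface FolderLister {
  filesListFolder(arg: { path: string; recursive?: boolean }): Promise<ListResult>;
  filesListFolderContinue(arg: { cursor: string }): Promise<ListResult>;
}

// Collect the paths of all files under `folderPath`, following the cursor
// until the listing reports that no more pages remain.
async function listFilePaths(
  client: FolderLister,
  folderPath: string,
  recursive: boolean
): Promise<string[]> {
  const paths: string[] = [];
  let page = await client.filesListFolder({ path: folderPath, recursive });
  for (;;) {
    for (const entry of page.result.entries) {
      // Folders and deleted entries are skipped; only files are loaded.
      if (entry[".tag"] === "file" && entry.path_lower) {
        paths.push(entry.path_lower);
      }
    }
    if (!page.result.has_more) break;
    page = await client.filesListFolderContinue({ cursor: page.result.cursor });
  }
  return paths;
}
```

Typing the client structurally keeps the sketch testable with a stub; a real `Dropbox` client instance would be passed in its place, subject to the SDK's actual types.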

### Streaming Documents

To process large datasets efficiently, use the `loadLazy` method to stream documents asynchronously:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {
    accessToken: "your-dropbox-access-token",
  },
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general",
  },
  folderPath: "/large/dataset",
  recursive: true,
  mode: "directory",
});

for await (const doc of loader.loadLazy()) {
  // Process each document as it's loaded
  console.log(doc);
}
}
```
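
When streamed documents feed a downstream step that works better on groups (for example, bulk inserts), it can help to batch the stream. The sketch below is a generic helper over any `AsyncIterable`, so it should apply to the iterable returned by `loadLazy`. The name `inBatches` is hypothetical and not part of the PR.

```typescript
// Hypothetical helper, not part of the PR: group items from any async
// iterable into arrays of at most `size` items.
async function* inBatches<T>(
  source: AsyncIterable<T>,
  size: number
): AsyncGenerator<T[]> {
  if (size < 1) throw new Error("size must be at least 1");
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Emit any remaining items that did not fill a whole batch.
  if (batch.length > 0) yield batch;
}
```

A caller would then iterate with `for await (const batch of inBatches(loader.loadLazy(), 50))` and handle each array of documents at once; the batch size of 50 is an arbitrary illustration.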

## API References

- `clientOptions`: refer to the [Dropbox SDK documentation](https://dropbox.github.io/dropbox-sdk-js/Dropbox.html#Dropbox__anchor)
- `unstructuredOptions`: refer to the [UnstructuredLoader API reference](https://api.js.langchain.com/classes/langchain_community_document_loaders_fs_unstructured.UnstructuredLoader.html)
1 change: 1 addition & 0 deletions langchain/.env.example
@@ -102,3 +102,4 @@ GOOGLE_ROUTES_API_KEY=ADD_YOURS_HERE
CONFLUENCE_USERNAME=ADD_YOURS_HERE
CONFLUENCE_PASSWORD=ADD_YOURS_HERE
CONFLUENCE_PATH=ADD_YOURS_HERE
DROPBOX_ACCESS_TOKEN=ADD_YOURS_HERE
4 changes: 4 additions & 0 deletions libs/langchain-community/.gitignore
@@ -902,6 +902,10 @@ document_loaders/web/cheerio.cjs
document_loaders/web/cheerio.js
document_loaders/web/cheerio.d.ts
document_loaders/web/cheerio.d.cts
document_loaders/web/dropbox.cjs
document_loaders/web/dropbox.js
document_loaders/web/dropbox.d.ts
document_loaders/web/dropbox.d.cts
document_loaders/web/html.cjs
document_loaders/web/html.js
document_loaders/web/html.d.ts
2 changes: 2 additions & 0 deletions libs/langchain-community/langchain.config.js
@@ -280,6 +280,7 @@ export const config = {
"document_loaders/web/azure_blob_storage_file",
"document_loaders/web/browserbase": "document_loaders/web/browserbase",
"document_loaders/web/cheerio": "document_loaders/web/cheerio",
"document_loaders/web/dropbox": "document_loaders/web/dropbox",
"document_loaders/web/html": "document_loaders/web/html",
"document_loaders/web/puppeteer": "document_loaders/web/puppeteer",
"document_loaders/web/playwright": "document_loaders/web/playwright",
@@ -504,6 +505,7 @@ export const config = {
"document_loaders/web/azure_blob_storage_file",
"document_loaders/web/browserbase",
"document_loaders/web/cheerio",
"document_loaders/web/dropbox",
"document_loaders/web/puppeteer",
"document_loaders/web/playwright",
"document_loaders/web/college_confidential",
18 changes: 18 additions & 0 deletions libs/langchain-community/package.json
@@ -157,6 +157,7 @@
"dotenv": "^16.0.3",
"dpdm": "^3.12.0",
"dria": "^0.0.3",
"dropbox": "^10.34.0",
"duck-duck-scrape": "^2.2.5",
"epub2": "^3.0.1",
"eslint": "^8.33.0",
@@ -302,6 +303,7 @@
"d3-dsv": "^2.0.0",
"discord.js": "^14.14.1",
"dria": "^0.0.3",
"dropbox": "^10.34.0",
"duck-duck-scrape": "^2.2.5",
"epub2": "^3.0.1",
"faiss-node": "^0.5.1",
@@ -580,6 +582,9 @@
"dria": {
"optional": true
},
"dropbox": {
"optional": true
},
"duck-duck-scrape": {
"optional": true
},
@@ -2757,6 +2762,15 @@
"import": "./document_loaders/web/cheerio.js",
"require": "./document_loaders/web/cheerio.cjs"
},
"./document_loaders/web/dropbox": {
"types": {
"import": "./document_loaders/web/dropbox.d.ts",
"require": "./document_loaders/web/dropbox.d.cts",
"default": "./document_loaders/web/dropbox.d.ts"
},
"import": "./document_loaders/web/dropbox.js",
"require": "./document_loaders/web/dropbox.cjs"
},
"./document_loaders/web/html": {
"types": {
"import": "./document_loaders/web/html.d.ts",
@@ -4088,6 +4102,10 @@
"document_loaders/web/cheerio.js",
"document_loaders/web/cheerio.d.ts",
"document_loaders/web/cheerio.d.cts",
"document_loaders/web/dropbox.cjs",
"document_loaders/web/dropbox.js",
"document_loaders/web/dropbox.d.ts",
"document_loaders/web/dropbox.d.cts",
"document_loaders/web/html.cjs",
"document_loaders/web/html.js",
"document_loaders/web/html.d.ts",