Add PostgreSQL as a possible viewer (#3121)
* Add PostgreSQL as a possible viewer

  Add documentation for how to use PostgreSQL with pgai to access the dataset.

* Improve documentation for PostgreSQL viewer
Showing 4 changed files with 72 additions and 1 deletion.
@@ -0,0 +1,68 @@
# PostgreSQL

[PostgreSQL](https://www.postgresql.org/docs/) is a powerful, open-source object-relational database system. It has been the most [popular](https://survey.stackoverflow.co/2024/technology#most-popular-technologies-database) database among application developers for a few years running. [pgai](https://github.com/timescale/pgai) is a PostgreSQL extension that lets you easily ingest Hugging Face datasets into your PostgreSQL database.

## Run PostgreSQL with pgai installed

You can run a Docker container containing PostgreSQL with pgai.

```bash
docker run -d --name pgai -p 5432:5432 \
    -v pg-data:/home/postgres/pgdata/data \
    -e POSTGRES_PASSWORD=password timescale/timescaledb-ha:pg17
```
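
Because `docker run -d` returns before the database has finished starting, commands issued right away can fail to connect. The sketch below polls for readiness first. It assumes the `pg_isready` client tool is on the container's PATH, which this guide does not confirm, and the helper name `wait_for_postgres` is made up for illustration. Treat it as a rough heuristic, not a guaranteed readiness check.

```bash
# Sketch: wait until PostgreSQL inside a container accepts connections.
# Assumption: pg_isready is available on the container's PATH.
# wait_for_postgres is a hypothetical helper, not part of pgai or Docker.
wait_for_postgres() {
  wfp_container="$1"
  wfp_tries="${2:-30}"   # number of attempts, one per second
  wfp_i=0
  while [ "$wfp_i" -lt "$wfp_tries" ]; do
    # pg_isready exits 0 once the server is accepting connections
    if docker exec "$wfp_container" pg_isready -q; then
      return 0
    fi
    wfp_i=$((wfp_i + 1))
    sleep 1
  done
  echo "PostgreSQL in '$wfp_container' not ready after $wfp_tries attempts" >&2
  return 1
}
```

A possible use is `wait_for_postgres pgai` before issuing any `docker exec -it pgai psql ...` command.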

Then run the following command to install pgai into the database.

```bash
docker exec -it pgai psql -c "CREATE EXTENSION ai CASCADE;"
```

You can then connect to the database using the `psql` command line tool in the container.

```bash
docker exec -it pgai psql
```

Or connect with your favorite PostgreSQL client using the following connection string: `postgresql://postgres:password@localhost:5432/postgres`
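
To show where each part of that connection string comes from, here is a small sketch that assembles it from the values in the `docker run` command. The `PG_*` variable names are illustrative only. The user and database are both `postgres`, as in the connection string above.

```bash
# Sketch: build the connection string from its parts.
# The PG_* variable names are illustrative, not required by any tool.
PG_USER="postgres"
PG_PASSWORD="password"   # matches -e POSTGRES_PASSWORD=password
PG_HOST="localhost"
PG_PORT="5432"           # host side of -p 5432:5432
PG_DATABASE="postgres"

PG_URL="postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DATABASE}"
echo "$PG_URL"
```

If `psql` is installed on the host, `psql "$PG_URL"` should open a session. A password containing characters such as `@` or `/` would need to be percent-encoded in the URL.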

Alternatively, you can install pgai into an existing PostgreSQL database by following the instructions in the [GitHub repo](https://github.com/timescale/pgai).

## Create a table from a dataset

To load a dataset into PostgreSQL, you can use the `ai.load_dataset` function. This function creates a PostgreSQL table and loads the dataset from the Hugging Face Hub in a streaming fashion.

```sql
select ai.load_dataset('rajpurkar/squad', table_name => 'squad');
```

You can now query the table using standard SQL.

```sql
select * from squad limit 10;
```

<Tip>
Full documentation for the `ai.load_dataset` function can be found [here](https://github.com/timescale/pgai/blob/main/docs/load_dataset_from_huggingface.md).
</Tip>

## Import only a subset of the dataset

You can also import a subset of the dataset by specifying the `max_batches` parameter.
This is useful if the dataset is large and you want to experiment with a smaller subset.

```sql
SELECT ai.load_dataset('rajpurkar/squad', table_name => 'squad', batch_size => 100, max_batches => 1);
```
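
As a rough guide to how large the subset will be, the sketch below computes an upper bound on the imported row count. It assumes each batch holds at most `batch_size` rows, so the bound is `batch_size * max_batches`. The shell variable names are illustrative only.

```bash
# Sketch: estimate the subset size and build the matching SQL statement.
# Assumption: each batch holds at most BATCH_SIZE rows.
BATCH_SIZE=100
MAX_BATCHES=1
MAX_ROWS=$((BATCH_SIZE * MAX_BATCHES))

LOAD_SQL="select ai.load_dataset('rajpurkar/squad', table_name => 'squad', batch_size => ${BATCH_SIZE}, max_batches => ${MAX_BATCHES});"

echo "Expecting at most ${MAX_ROWS} rows from:"
echo "$LOAD_SQL"
```

The statement can then be run with `docker exec -it pgai psql -c "$LOAD_SQL"`, and `select count(*) from squad;` gives the actual row count to compare against the estimate.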

## Load a dataset into an existing table

You can also load a dataset into an existing table.
This is useful if you want more control over the data schema or want to predefine indexes and constraints on the data.

```sql
select ai.load_dataset('rajpurkar/squad', table_name => 'squad', if_table_exists => 'append');
```
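
For illustration, the table could be created ahead of time along the following lines. The column names and types are assumptions based on the fields of the SQuAD dataset (with the nested `answers` field stored as `jsonb`) and should be checked against the pgai documentation. The helper `squad_schema_sql` and the index name are hypothetical.

```bash
# Sketch: emit DDL for a predefined squad table.
# Assumption: column names and types mirror the SQuAD dataset fields;
# verify them against the pgai documentation before relying on this.
squad_schema_sql() {
  cat <<'SQL'
CREATE TABLE IF NOT EXISTS squad (
    id       text NOT NULL,
    title    text,
    context  text,
    question text,
    answers  jsonb
);
-- Example of a predefined index; the name is illustrative
CREATE INDEX IF NOT EXISTS squad_title_idx ON squad (title);
SQL
}

squad_schema_sql
```

Piping the output into the container, for example `squad_schema_sql | docker exec -i pgai psql`, would create the table before the `if_table_exists => 'append'` call loads data into it.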