add azure blob storage tutorial (#20)
Showing 1 changed file with 165 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,165 @@ | ||
--- | ||
title: Connect SlateDB to Azure Blob Storage | ||
--- | ||
|
||
This tutorial shows you how to use SlateDB on Azure Blob Storage. You need a real Azure storage account to complete it. | ||
|
||
## Setup | ||
|
||
[Install](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) the Azure CLI. | ||
|
||
## Create a storage account | ||
|
||
The following steps create a storage account and a container, then list the account's access keys. Skip this section if you already have a storage account. | ||
|
||
```bash | ||
# Set the resource names used by the commands below | ||
StorageAccountName=<ReplaceWithAccountName> | ||
ContainerName=<ReplaceWithContainerName> | ||
ResourceGroupName=<ReplaceWithResourceGroupName> | ||
|
||
# Login | ||
az login | ||
|
||
# Create Resource Group in the default subscription. | ||
az group create --name $ResourceGroupName --location westus | ||
|
||
# Create Azure Storage account. | ||
az storage account create --name $StorageAccountName --resource-group $ResourceGroupName --location westus --sku Standard_LRS | ||
|
||
# Create a storage container | ||
az storage container create --name $ContainerName --account-name $StorageAccountName | ||
|
||
# Get the keys. | ||
az storage account keys list --resource-group $ResourceGroupName --account-name $StorageAccountName | ||
``` | ||
|
||
## Create a project | ||
|
||
Let's start by creating a new Rust project: | ||
|
||
```bash | ||
cargo init slatedb-abs | ||
cd slatedb-abs | ||
``` | ||
|
||
## Add dependencies | ||
|
||
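Now add SlateDB, the `object_store` crate, and `tokio` (the example below uses `#[tokio::main]`) to your `Cargo.toml`: | ||
|
||
```bash | ||
cargo add slatedb object-store --features object-store/azure | ||
cargo add tokio --features macros,rt-multi-thread | ||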
``` | ||
|
||
:::note | ||
|
||
If you see "`object_store::path::Path` and `object_store::path::Path` have similar names, but are actually distinct types", you might need to pin the `object_store` version to match `slatedb`'s `object_store` version. | ||
|
||
::: | ||
|
||
## Write some code | ||
|
||
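Replace the contents of `src/main.rs` with the code below. It first performs a put that waits for the write to become durable, then performs 1000 puts that do not wait, and finally reopens the database to read some of the keys back. | ||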
|
||
```rust | ||
use object_store::azure::MicrosoftAzureBuilder; | ||
use object_store::path::Path; | ||
use object_store::ObjectStore; | ||
use slatedb::config::DbOptions; | ||
use slatedb::db::Db; | ||
use std::sync::Arc; | ||
|
||
#[tokio::main] | ||
async fn main() { | ||
// construct azure blob object store. | ||
let blob_store: Arc<dyn ObjectStore> = Arc::new(MicrosoftAzureBuilder::new() | ||
.with_account("<REPLACEWITHACCOUNTNAME>") | ||
.with_access_key("<REPLACEWITHACCOUNTKEY>") | ||
.with_container_name("<REPLACEWITHCONTAINERNAME>") | ||
.build() | ||
.unwrap()); | ||
|
||
// create the db. | ||
let db_options = DbOptions::default(); | ||
let path = Path::from("test_slateDB"); | ||
|
||
println!("Opening the db"); | ||
let db = Db::open_with_opts(path.clone(), db_options, blob_store.clone()) | ||
.await | ||
.expect("failed to open db"); | ||
|
||
// Put a value and wait for the flush. | ||
println!("Writing a value and waiting for flush"); | ||
db.put(b"k1", b"value1").await; | ||
println!("{:?}", db.get(b"k1").await.unwrap()); | ||
|
||
    // Put 1000 keys without waiting for them to become durable. | ||
println!("Writing 1000 keys without waiting for flush"); | ||
let write_options = slatedb::config::WriteOptions { | ||
await_durable: false, | ||
}; | ||
for i in 0..1000 { | ||
db.put_with_options( | ||
format!("key{}", i).as_bytes(), | ||
format!("value{}", i).as_bytes(), | ||
&write_options, | ||
) | ||
.await; | ||
} | ||
|
||
// flush to make the writes durable. | ||
println!("Flushing the writes and closing the db"); | ||
db.flush().await.expect("failed to flush"); | ||
db.close().await.expect("failed to close db"); | ||
|
||
// reopen the db and read the value. | ||
println!("Reopening the db"); | ||
let db_reopened = Db::open_with_opts(path.clone(), DbOptions::default(), blob_store.clone()) | ||
.await | ||
.expect("failed to open db"); | ||
println!("Reading the value from the reopened db"); | ||
|
||
// read 20 keys | ||
for i in 0..20 { | ||
println!( | ||
"{:?}", | ||
db_reopened | ||
.get(format!("key{}", i).as_bytes()) | ||
.await | ||
.unwrap() | ||
); | ||
} | ||
db_reopened.close().await.expect("failed to close db"); | ||
} | ||
|
||
``` | ||
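Hard-coding the account key in source makes it easy to leak. The following is a minimal sketch, using only the standard library, of reading the three connection values from environment variables instead. The variable names `AZURE_STORAGE_ACCOUNT_NAME`, `AZURE_STORAGE_ACCOUNT_KEY`, and `AZURE_CONTAINER_NAME`, the `AzureConfig` struct, and the `load_config` helper are assumptions chosen for illustration; they are not part of the tutorial.

```rust
use std::env;

/// Connection settings for the blob store (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct AzureConfig {
    account: String,
    access_key: String,
    container: String,
}

/// Builds the config from a lookup function so the logic can be exercised
/// without touching the real process environment.
fn load_config(lookup: impl Fn(&str) -> Option<String>) -> Result<AzureConfig, String> {
    // Report the first variable that is missing or empty.
    let get = |name: &str| {
        lookup(name)
            .filter(|v| !v.is_empty())
            .ok_or_else(|| format!("missing environment variable {name}"))
    };
    Ok(AzureConfig {
        // Variable names are assumptions for this sketch.
        account: get("AZURE_STORAGE_ACCOUNT_NAME")?,
        access_key: get("AZURE_STORAGE_ACCOUNT_KEY")?,
        container: get("AZURE_CONTAINER_NAME")?,
    })
}

fn main() {
    // Look the values up in the real process environment.
    match load_config(|name| env::var(name).ok()) {
        Ok(cfg) => println!("account={} container={}", cfg.account, cfg.container),
        Err(e) => eprintln!("{e}"),
    }
}
```

The resulting fields could then be passed to `with_account`, `with_access_key`, and `with_container_name` in place of the string literals used in the example above.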
|
||
## Check the blob contents | ||
|
||
```bash | ||
az storage blob list --container-name $ContainerName --account-name $StorageAccountName --prefix "test_slateDB/" --delimiter "/" --output table | ||
wal/ | ||
manifest/ | ||
``` | ||
|
||
SlateDB can create up to three folders under the database path. Only `wal` and `manifest` appear in the listing above: | ||
|
||
- `manifest`: Contains the manifest files. Manifest files define the state of the DB, including the set of SSTs that are part of the DB. | ||
- `wal`: Contains the write-ahead log files. | ||
- `compacted`: Contains the compacted SST files. This short example does not create compacted files. | ||
|
||
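To make the folder layout concrete, here is a small sketch that sorts a blob name into one of the three folders listed above. The `Folder` enum and the `classify` function are hypothetical helpers written for this illustration; they are not part of SlateDB's API.

```rust
/// The three top-level folders described above.
#[derive(Debug, PartialEq)]
enum Folder {
    Manifest,
    Wal,
    Compacted,
}

/// Returns the folder a blob belongs to, given the database root path
/// (for example "test_slateDB") and the full blob name.
fn classify(root: &str, blob_name: &str) -> Option<Folder> {
    // Strip "<root>/" and look at the first remaining path segment.
    let rest = blob_name.strip_prefix(root)?.strip_prefix('/')?;
    match rest.split('/').next()? {
        "manifest" => Some(Folder::Manifest),
        "wal" => Some(Folder::Wal),
        "compacted" => Some(Folder::Compacted),
        _ => None,
    }
}

fn main() {
    let name = "test_slateDB/wal/00000000000000000001.sst";
    println!("{:?}", classify("test_slateDB", name));
}
```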
Let's check the `wal` folder. | ||
|
||
```bash | ||
az storage blob list --container-name $ContainerName --account-name $StorageAccountName --prefix "test_slateDB/wal/" --delimiter "/" --output table | ||
|
||
Name Blob Type Blob Tier Length Content Type Last Modified Snapshot | ||
----------------------------------------- ----------- ----------- -------- ------------------------ ------------------------- ---------- | ||
test_slateDB/wal/00000000000000000001.sst BlockBlob Hot 64 application/octet-stream 2024-09-07T01:15:49+00:00 | ||
test_slateDB/wal/00000000000000000002.sst BlockBlob Hot 138 application/octet-stream 2024-09-07T01:15:49+00:00 | ||
test_slateDB/wal/00000000000000000003.sst BlockBlob Hot 23388 application/octet-stream 2024-09-07T01:15:49+00:00 | ||
test_slateDB/wal/00000000000000000004.sst BlockBlob Hot 64 application/octet-stream 2024-09-07T01:15:50+00:00 | ||
|
||
``` | ||
|
||
Each of these SST files is a write-ahead log (WAL) entry. SlateDB writes them either when the `flush_interval` configured in the DB options elapses or when `flush` is called explicitly.
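
The WAL blob names in the listing above appear to be a zero-padded numeric id followed by `.sst`. The sketch below builds and parses names of that shape. The 20-digit padding is inferred from the listing rather than taken from SlateDB's documentation, and `wal_blob_name` and `wal_id` are hypothetical helpers, so treat this as an illustration of the naming pattern only.

```rust
/// Formats the blob name for a WAL SST, assuming the 20-digit zero-padded
/// naming seen in the listing above.
fn wal_blob_name(root: &str, id: u64) -> String {
    format!("{root}/wal/{id:020}.sst")
}

/// Extracts the numeric id from a WAL blob name, or None if the last path
/// segment is not "<digits>.sst".
fn wal_id(blob_name: &str) -> Option<u64> {
    let file = blob_name.rsplit('/').next()?;
    let digits = file.strip_suffix(".sst")?;
    digits.parse().ok()
}

fn main() {
    let name = wal_blob_name("test_slateDB", 3);
    println!("{name} -> {:?}", wal_id(&name));
}
```

Sorting the parsed ids gives the order in which the WAL entries were written, which can help when inspecting a container by hand.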