add developer guide for loading data
Tianhao-Gu committed Aug 27, 2024
1 parent b8d5fd8 commit e93d730
File: `docs/dev_guide.md`

# JupyterLab Developer Guide

## Accessing JupyterLab Development Environment

### 1. Create SSH Tunnel:

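Run the following command to open an SSH tunnel through the login node (`login1.berkeley.kbase.us`). It forwards local port `44041` to port `4041` on the JupyterLab host `10.58.2.201`; `-f` runs SSH in the background and `-N` tells it not to execute a remote command. Replace `<ac.anl_username>` with your username: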

```bash
ssh -f -N -L localhost:44041:10.58.2.201:4041 <ac.anl_username>@login1.berkeley.kbase.us
```
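Before opening the browser, you can check that the local end of the tunnel is listening. The sketch below uses a hypothetical helper, `tunnel_is_up`, that is not part of any provided tooling. A successful connection only shows that something accepts TCP connections on `localhost:44041`; it does not prove that JupyterLab on the remote side is responding.

```python
import socket


def tunnel_is_up(host: str = "localhost", port: int = 44041, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: a successful connect shows only that the local end of
    the SSH tunnel is listening, not that the remote JupyterLab is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


if __name__ == "__main__":
    if tunnel_is_up():
        print("Tunnel is listening on localhost:44041")
    else:
        print("Nothing is listening on localhost:44041; re-run the ssh command above")
```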

### 2. Access JupyterLab Notebooks:

Open a web browser and navigate to the following URL:

```
http://localhost:44041
```
This opens the JupyterLab interface running on the remote server, reached through the SSH tunnel.

## Accessing MinIO
Please refer to the [MinIO Guide](minio_guide.md) for instructions on accessing MinIO.

### MinIO username and password
From the development JupyterLab environment described above, run the following to retrieve the MinIO username and password (these credentials have read/write permission):
```python
import os
minio_username, minio_password = os.environ['MINIO_ACCESS_KEY'], os.environ['MINIO_SECRET_KEY']
print(f"MinIO username: {minio_username},\nMinIO password: {minio_password}")
```
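The snippet above raises a bare `KeyError` if a variable is unset, and it prints the secret in full into the notebook output. When you only need to confirm which credentials are loaded rather than copy them, a hypothetical alternative is sketched below: `get_minio_credentials` reads the same two environment variables and names any that are missing, and `mask` hides all but the last few characters of the secret. Both helper names are illustrative, not part of the environment.

```python
import os

# The two variables the development JupyterLab environment sets for MinIO.
_CREDENTIAL_VARS = ("MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")


def get_minio_credentials() -> tuple[str, str]:
    """Return (access_key, secret_key) read from the environment.

    Hypothetical helper: raises RuntimeError naming every missing or empty
    variable, instead of the bare KeyError that os.environ[...] raises.
    """
    missing = [name for name in _CREDENTIAL_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return os.environ["MINIO_ACCESS_KEY"], os.environ["MINIO_SECRET_KEY"]


def mask(secret: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters of `secret` with '*'."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

For example, `user, password = get_minio_credentials()` followed by `print(user, mask(password))` shows which key is loaded without writing the full secret into the notebook.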

### Naming conventions
Please adhere to the following naming conventions for MinIO buckets and objects:

#### Source Files:
Source files are the raw data files that are uploaded to MinIO.
* Bucket name: `namespace_name`-source

#### Delta Table Files:
Delta table files are the files Spark writes when it creates a Delta table: Parquet data files plus the Delta transaction log (`_delta_log`).
* Bucket name: `namespace_name`-delta
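The two bucket-naming conventions above can be captured in a small helper so every loading notebook builds names the same way. `bucket_names` is hypothetical. The check it applies is a simplified form of the standard S3 bucket-naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit), which S3-compatible stores such as MinIO generally enforce. Underscores are not allowed under those rules, so a namespace spelled like `my_namespace` would produce an invalid bucket name; how to map such a namespace is left to the project's own conventions.

```python
import re

# Simplified S3 bucket-name rule: 3-63 chars of lowercase letters, digits, dots
# and hyphens, beginning and ending with a letter or digit. It does not reject
# names shaped like IP addresses or containing consecutive dots.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_names(namespace: str) -> dict[str, str]:
    """Return the source and Delta bucket names for `namespace`.

    Hypothetical helper following the `<namespace>-source` / `<namespace>-delta`
    convention; raises ValueError if a resulting name breaks the rule above.
    """
    names = {"source": f"{namespace}-source", "delta": f"{namespace}-delta"}
    for name in names.values():
        if not _BUCKET_RE.fullmatch(name):
            raise ValueError(f"Invalid bucket name: {name!r}")
    return names
```

With a hypothetical namespace `ecoli`, `bucket_names("ecoli")` returns `{"source": "ecoli-source", "delta": "ecoli-delta"}`.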

## Loading Notebooks
For each namespace, please create a corresponding loading notebook in the `data-loading-notebooks` directory.

Please use the existing loading notebooks as examples.
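To find an example to start from, you can list the notebooks already in that directory. This is a minimal sketch assuming it is run from the repository root and that loading notebooks are `.ipynb` files placed directly inside `data-loading-notebooks`; `list_loading_notebooks` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def list_loading_notebooks(directory: str = "data-loading-notebooks") -> list[str]:
    """Return the sorted file names of Jupyter notebooks directly inside `directory`.

    Hypothetical helper: does not recurse into subdirectories, and returns an
    empty list if the directory does not exist.
    """
    return sorted(p.name for p in Path(directory).glob("*.ipynb") if p.is_file())


if __name__ == "__main__":
    for name in list_loading_notebooks():
        print(name)
```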

🚨 **Please DO NOT rerun the loading notebooks in the development environment. Instead, create a new notebook for each
new namespace and manually verify the data loading process.**