add developer guide for loading data
1 parent b8d5fd8 commit e93d730
Showing 1 changed file with 58 additions and 0 deletions.
@@ -0,0 +1,58 @@
# JupyterLab Developer Guide

## Accessing JupyterLab Development Environment

### 1. Create SSH Tunnel:

Execute the following command to create an SSH tunnel to the remote server (`login1.berkeley.kbase.us`), replacing `<ac.anl_username>` with your own username. The tunnel runs in the background (`-f`), executes no remote command (`-N`), and forwards local port `44041` to port `4041` on `10.58.2.201` (`-L`):

```bash
ssh -f -N -L localhost:44041:10.58.2.201:4041 <ac.anl_username>@login1.berkeley.kbase.us
```

### 2. Access JupyterLab Notebooks:

Open a web browser and navigate to the following URL:

```
http://localhost:44041
```

This will open the JupyterLab Notebook interface running on the remote server.
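
If the page does not load, it can help to confirm that the tunnel is actually listening before troubleshooting the browser. The following is a minimal sketch, not part of the original guide: the helper name `tunnel_is_open` is hypothetical, and it assumes the default local port `44041` from step 1.

```python
import socket


def tunnel_is_open(host: str = "localhost", port: int = 44041, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection performs the TCP handshake and raises OSError on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `tunnel_is_open()` returns `False` when nothing is listening on the local port, which usually means the `ssh` command from step 1 needs to be run again. A `True` result only shows that the port accepts connections, not that JupyterLab itself is healthy.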

## Accessing MinIO
Please refer to the [MinIO Guide](minio_guide.md) for instructions on accessing MinIO.

### MinIO username and password
Retrieve the MinIO username and password (with read/write permission) by running the following in the development JupyterLab environment described above:
```python
import os

# The credentials are exposed as environment variables inside the JupyterLab environment
minio_username = os.environ['MINIO_ACCESS_KEY']
minio_password = os.environ['MINIO_SECRET_KEY']
print(f"MinIO username: {minio_username},\nMinIO password: {minio_password}")
```
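
When the credentials are needed inside a notebook rather than for copying into a MinIO client, a slightly more defensive variant may be useful. This is a sketch under assumptions: the helpers `read_minio_credentials` and `mask` are hypothetical names introduced here, and only the two environment variable names come from the guide.

```python
import os
from typing import Mapping, Tuple


def read_minio_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (access_key, secret_key), raising a clear error if either is unset."""
    missing = [name for name in ("MINIO_ACCESS_KEY", "MINIO_SECRET_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return env["MINIO_ACCESS_KEY"], env["MINIO_SECRET_KEY"]


def mask(secret: str, visible: int = 2) -> str:
    """Hide all but the first `visible` characters of a secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

Printing `mask(secret_key)` instead of the raw value keeps the password out of saved notebook output, which matters if the notebook is later committed or shared.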

### Naming conventions
Please adhere to the following naming conventions for MinIO buckets and objects:

#### Source Files:
Source files are the raw data files that are uploaded to MinIO.
* Bucket name: `namespace_name`-source

#### Delta Table Files:
Delta table files are Parquet files generated by Spark during the creation of a table.
* Bucket name: `namespace_name`-delta
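
To keep bucket names consistent across loading notebooks, the two conventions above could be captured in small helpers. This is an illustrative sketch only: the function names and the example namespace `demo` are hypothetical and do not come from the repository.

```python
def source_bucket(namespace: str) -> str:
    """Bucket holding the raw source files for a namespace."""
    return f"{namespace}-source"


def delta_bucket(namespace: str) -> str:
    """Bucket holding the Delta table (Parquet) files for a namespace."""
    return f"{namespace}-delta"


# Example with a hypothetical namespace called "demo"
print(source_bucket("demo"), delta_bucket("demo"))
```

Deriving both names from a single namespace string avoids typos that would otherwise send source files and Delta files to mismatched buckets.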

## Loading Notebooks
Please create a corresponding loading notebook for each namespace in the `data-loading-notebooks` directory.

Please use the existing loading notebooks as examples.

🚨 **Please DO NOT rerun the loading notebooks in the development environment. Instead, create a new notebook for each
new namespace and manually verify the data loading process.**