There are two ways to set up the environment: conda on your desktop, or an isolated Docker container.
If you want to build a Docker image with everything compiled inside, a few things must be set up first on your host machine:
- NVIDIA driver: most people probably already have it installed. Run
nvidia-smi
to check.
- Docker:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
- nvidia-container-toolkit:
sudo apt update && sudo apt install nvidia-container-toolkit
Then follow this Stack Overflow answer:
- Edit or create /etc/docker/daemon.json with the following content:
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
- Restart the Docker daemon:
sudo systemctl restart docker
- Then you can build the Docker image:
cd DeFlow && docker build -t zhangkin/deflow .
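Before restarting Docker it can help to confirm that daemon.json is valid JSON and really sets nvidia as the default runtime. The following is a minimal sketch, not part of the repository; the function name nvidia_runtime_configured is hypothetical, and it only checks the two keys shown in the configuration above.

```python
import json


def nvidia_runtime_configured(config_text: str) -> bool:
    """Return True if the daemon.json text registers 'nvidia' as default runtime.

    Hypothetical helper: it only inspects the "runtimes" and
    "default-runtime" keys used in the configuration above.
    """
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError:
        # A syntax error here would also stop the Docker daemon from starting.
        return False
    if not isinstance(config, dict):
        return False
    runtimes = config.get("runtimes", {})
    return (
        isinstance(runtimes, dict)
        and "nvidia" in runtimes
        and config.get("default-runtime") == "nvidia"
    )


# Example with the same content as the daemon.json shown above.
sample = """
{
  "runtimes": {
    "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
  },
  "default-runtime": "nvidia"
}
"""
print("nvidia default runtime configured:", nvidia_runtime_configured(sample))
```

To check the real file, pass the contents of /etc/docker/daemon.json to the function instead of the sample string.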
For the conda route, we use conda to manage the environment, with mamba for faster package installation.
Install Miniforge, which provides both conda and mamba:
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
Create the base environment (about 5 minutes):
git clone https://github.com/KTH-RPL/DeFlow
cd DeFlow && git submodule update --init --recursive
mamba env create -f assets/environment.yml
Compile the CUDA package (the nvcc compiler is already installed through conda). Compilation takes around 1-5 minutes:
mamba activate seflow
cd assets/cuda/mmcv && python ./setup.py install && cd ../../..
Check the environment:
mamba activate seflow
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.version.cuda)"
python -c "import lightning.pytorch as pl"
python -c "from assets.cuda.mmcv import Voxelization, DynamicScatter;print('success test on mmcv package')"
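The three one-liners above can also be collected into a single script that reports every failing import at once instead of stopping at the first one. This is a minimal sketch under the assumption that it is run from the repository root with the environment activated; try_import and check_environment are hypothetical helper names, not functions from the repository.

```python
import importlib


def try_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False otherwise."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # ImportError is the usual failure; compiled extensions can also
        # raise other errors (for example OSError) while loading.
        return False
    return True


def check_environment(modules):
    """Map each module name to whether it could be imported."""
    return {name: try_import(name) for name in modules}


if __name__ == "__main__":
    # The same modules the commands above import.
    required = ["torch", "lightning.pytorch", "assets.cuda.mmcv"]
    for name, ok in check_environment(required).items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```

A FAILED line only says that the import did not work; rerun the matching python -c command above to see the full traceback.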
- open3d and the fire package appear to conflict (not confirmed):
  - Install the package with
    pip install --ignore-installed
    (ref: "pip cannot install distutils installed project"). The error I got was
    ERROR: Cannot uninstall 'blinker'.
  - open3d 0.16.0 needs a specific werkzeug version; otherwise you get
    ImportError: cannot import name 'url_quote' from 'werkzeug.urls'
    Upgrading Flask solved the problem:
    pip install --upgrade Flask
- ImportError: libtorch_cuda.so: undefined symbol: cudaGraphInstantiateWithFlags, version libcudart.so.11.0
  The CUDA versions of pytorch::pytorch-cuda and nvidia::cudatoolkit need to be the same.
- On a cluster, pandas may fail with
  ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found
  Solved by appending the mamba lib directory to LD_LIBRARY_PATH (adjust the path to your own installation):
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/proj/berzelius-2023-154/users/x_qinzh/mambaforge/lib
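For the GLIBCXX error, it can be useful to confirm which libstdc++.so.6 actually carries the required version string before changing LD_LIBRARY_PATH. The sketch below is an assumption-laden illustration: glibcxx_versions and provides are hypothetical helpers, and the scan is a plain byte search (similar in spirit to strings piped into grep), not a real ELF parser.

```python
import re
from pathlib import Path

# Version tags look like GLIBCXX_3.4.29 inside the shared library.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(lib_path):
    """Return the sorted GLIBCXX versions found in a file, as integer tuples."""
    data = Path(lib_path).read_bytes()
    found = {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in _GLIBCXX.finditer(data)
    }
    return sorted(found)


def provides(lib_path, required: str) -> bool:
    """Return True if the file contains the tag GLIBCXX_<required>."""
    wanted = tuple(int(part) for part in required.split("."))
    return wanted in glibcxx_versions(lib_path)


# Usage (paths depend on your system):
#   provides("/lib64/libstdc++.so.6", "3.4.29")
#   provides("<your conda prefix>/lib/libstdc++.so.6", "3.4.29")
```

If the system library returns False and the one under the conda or mamba prefix returns True, putting that prefix's lib directory on LD_LIBRARY_PATH as shown above is the matching fix.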
If you want to contribute a new model, here are some tips you can follow:
- Dataloader: we believe all data can be processed into .h5 files. Data is named by scene, and inside a scene each entry is keyed by its timestamp. Check dataprocess/README.md for more details.
- Model: all model files are in src/models. Look at deflow and fastflow3d to see how to implement a new model. Don't forget to import your class in the __init__.py file.
- Loss: all loss functions are in src/lossfuncs.py. The file already contains three loss functions; add a new one following the same pattern.
- Training: once you have implemented the model, add it to the config under conf/model and train it with the command
python train.py model=your_model_name
One more note: if the res_dict output by your model is different, you may need to add a case for it in def training_step and def validation_step.
Everything else, such as evaluation and visualization, will adapt to the model you implemented as long as you follow the steps above.
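As an illustration of the Dataloader tip, the sketch below lists the timestamp keys of one scene in time order. It rests on assumptions not confirmed here: that timestamps are stored as digit-only string keys, and that any other keys can be skipped. The helper name timestamp_keys is hypothetical; see dataprocess/README.md for the actual layout.

```python
def timestamp_keys(scene):
    """Return the digit-only keys of a scene, sorted numerically.

    `scene` can be any mapping with string keys, for example an open
    h5py.File or a plain dict. Non-numeric keys are skipped (assumption).
    """
    return sorted((key for key in scene.keys() if key.isdigit()), key=int)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        import h5py  # third-party; only needed for this command-line usage

        # Usage: python list_timestamps.py path/to/scene.h5
        with h5py.File(sys.argv[1], "r") as scene_file:
            keys = timestamp_keys(scene_file)
            print(f"{len(keys)} frames, first timestamps: {keys[:3]}")
```

Sorting with key=int rather than as strings keeps frames in time order even when timestamps have different numbers of digits.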