while setting variables #85

Open
palashmoon opened this issue May 13, 2024 · 28 comments

@palashmoon

Hello, can you please help me with what values I should set for the variables in each file? For example:

PASE_CONFIG=
PASE_CHCK_PATH=
USE_AUTH_TOKEN=

What should the values for these be?

@david-gimeno
Collaborator

david-gimeno commented May 14, 2024

Hi Palash,

I guess we are talking about the script ./scripts/feature_extraction/extract-dvlog-pase+-feats.sh for extracting the audio-based PASE+ features. There are different aspects to take into account:

· The videos of the dataset are expected to be placed at ./data/D-vlog/videos/, as indicated by the variable $VIDEO_DIR.
· You should clone the official PASE+ repo: https://github.com/santi-pdp/pase.git
· You then have to follow the instructions here to download the pre-trained model provided by the original authors and to find out which config file you should use.

Therefore, once you have cloned the repo and downloaded the pre-trained model checkpoint, you should be able to set these variables, e.g., as follows:

PASE_CONFIG=./pase/cfg/frontend/PASE+.cfg
PASE_CHCK_PATH=./pase/FE_e199.ckpt

Regarding the USE_AUTH_TOKEN variable, it is the authentication token commonly required when using certain HuggingFace models. Please find the instructions for using PyAnnote here.
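In case it helps, here is a minimal, hedged sketch of how such a token is typically passed to pyannote.audio (this is not our script; the pipeline name below is only illustrative, so follow the PyAnnote instructions linked above):

    import os
    from pyannote.audio import Pipeline

    # Read the HuggingFace access token (e.g., "hf_...") from the environment
    # and pass it when loading a gated pyannote model.
    token = os.environ["USE_AUTH_TOKEN"]
    pipeline = Pipeline.from_pretrained("pyannote/voice-activity-detection", use_auth_token=token)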

@palashmoon
Author

Thank you @david-gimeno for the help.
Can you help me set up these variables too?

  1. INSTBLINK_CONFIG=
  2. ETH_XGAZE_CONFIG=
  3. BODY_LANDMARKER_PATH=./data/D-vlog/body_landmarks
  4. HAND_LANDMARKER_PATH=./data/D-vlog/hand_landmarks

@david-gimeno
Collaborator

You can find the toolkit we employed to extract each modality in the paper (see page 7). However, I can see that configuring all these feature extractors is not as easy as I expected. Let's go step by step.

  • Configuring InstBlink: You should clone the following repo and download the checkpoint indicated here. Then, by inspecting the script the original authors wrote, we can infer the following configuration:
    INSTBLINK_CONFIG=./MPEblink/configs/instblink/instblink_r50.py
    INSTBLINK_CHCK_PATH=./MPEblink/pretrained_models/instblink_r50.pth

  • Configuring ETH_XGaze: Similarly, you should clone the following repo. The authors provide config files for various models. In our work, we used the ETH-XGaze detector. So, the config file should be as follows:
    ETH_XGAZE_CONFIG=./pytorch_mpiigaze_demo/ptgaze/data/configs/eth-xgaze.yaml

  • Configuring MediaPipe's Models: As indicated in the paper, we based our body and hand landmarkers on MediaPipe by Google. Take into account that this platform offers plenty of models and you are not limited to using the same ones as ours. However, I can share the specific models we employed. If you read this tutorial, you can find the Linux command to download the model checkpoint for the body pose estimator. For the hand landmark detector, you can find the checkpoint download command here. How did I find this tutorial? Search Google for "hand landmark mediapipe" and then click on "Python Code Example". Therefore, once you download the model checkpoints, the configuration should be something as follows (a loading sanity check is sketched after this list):

            `BODY_LANDMARKER_PATH=./landmarkers/pose_landmarker.task`
            `HAND_LANDMARKER_PATH=./landmarkers/hand_landmarker.task`
    

Note that the /landmarkers/ directory is not created automatically; it is just there to avoid making a mess in our code.
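Once the two .task checkpoints are downloaded and the variables above are set, a quick sanity check with the MediaPipe Tasks API could look roughly like this (a sketch assuming you renamed the files as shown above; it is not part of our scripts):

    from mediapipe.tasks import python as mp_tasks
    from mediapipe.tasks.python import vision

    # Load the body pose landmarker from the downloaded .task checkpoint.
    pose_landmarker = vision.PoseLandmarker.create_from_options(
        vision.PoseLandmarkerOptions(
            base_options=mp_tasks.BaseOptions(model_asset_path="./landmarkers/pose_landmarker.task")))

    # Load the hand landmarker in the same way.
    hand_landmarker = vision.HandLandmarker.create_from_options(
        vision.HandLandmarkerOptions(
            base_options=mp_tasks.BaseOptions(model_asset_path="./landmarkers/hand_landmarker.task")))

    print("Both MediaPipe landmarkers loaded correctly.")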

@palashmoon
Author

palashmoon commented May 15, 2024

Thank you @david-gimeno for the quick response.
I cloned the repo for gaze as you mentioned in the above comment and used ETH_XGAZE_CONFIG=./pytorch_mpiigaze_demo/ptgaze/data/configs/eth-xgaze.yaml,
but using this gives me one more issue:
[screenshot of the error]
I am unable to find this checkpoint anywhere on the net. Can you please provide some suggestions for this?
One more thing: why are we cloning pytorch_mpiigaze_demo, and how can I resolve this issue?
Thank you so much once again.

@david-gimeno
Collaborator

I would need more details. What OS are you using, Ubuntu? And how did you set the variable MPIIGAZE_DIR=?

@palashmoon
Author

Hi, I am using Ubuntu. I have set them like this:

MPIIGAZE_DIR=./scripts/conda_envs/feature_extractors/pytorch_mpiigaze_demo/
ETH_XGAZE_CONFIG=./scripts/conda_envs/feature_extractors/pytorch_mpiigaze_demo/ptgaze/data/configs/eth-xgaze.yaml

please let me know if you need any other information.

@david-gimeno
Collaborator

david-gimeno commented May 15, 2024

According to the script, the model checkpoints should be downloaded automatically. So, let's try using absolute paths, just in case. However, unless you modified our repo folder structure, your paths may be wrong because ./scripts/conda_envs/feature_extractors/... doesn't exist; it should be ./scripts/feature_extractors/...

@palashmoon
Author

palashmoon commented May 15, 2024

Actually, I have cloned mpiigaze inside this path: ./scripts/conda_envs/feature_extractors/pytorch_mpiigaze_demo/
Should I still use the path ./scripts/feature_extractors/...?

@david-gimeno
Collaborator

david-gimeno commented May 15, 2024

Okay, I think the problem is in the config file pytorch_mpiigaze_demo/ptgaze/data/configs/eth-xgaze.yaml. Open it with a text editor and modify the paths to the checkpoints according to the way you structured your project. I mean, you should replace the ~/ prefix.

@palashmoon
Author

palashmoon commented May 15, 2024

Hi @david-gimeno. I have updated the file like this:

gaze_estimator:
  checkpoint: ./scripts/conda_envs/feature_extractors/pytorch_mpiigaze_demo/ptgaze/models/eth-xgaze_resnet18.pth
  camera_params: ${PACKAGE_ROOT}/data/calib/sample_params.yaml
  use_dummy_camera_params: false
  normalized_camera_params: ${PACKAGE_ROOT}/data/normalized_camera_params/eth-xgaze.yaml
  normalized_camera_distance: 0.6
  image_size: [224, 224]

but I am still getting the same error. Actually, there is no checkpoint named eth-xgaze_resnet18.pth inside the models folder...
This is the current structure:
[screenshot of the directory structure]

@david-gimeno
Collaborator

david-gimeno commented May 15, 2024

Our script uses the function download_ethxgaze_model(), which is defined in the original repo of the gaze tracker here. It returns the path where the model checkpoint should be downloaded. Try to modify our script to print that path and check if it matches the one specified in the config file.
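For example, a quick debugging sketch (assuming download_ethxgaze_model() is importable from ptgaze.utils, as in the original pytorch_mpiigaze_demo repo):

    from ptgaze.utils import download_ethxgaze_model

    # Download (if needed) and print the path where ptgaze stores the ETH-XGaze checkpoint,
    # then compare it with the `checkpoint:` entry of eth-xgaze.yaml.
    ckpt_path = download_ethxgaze_model()
    print(ckpt_path)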

If the error persists, you should contact the original authors of the gaze tracker code.

Note: Linux hides files whose names start with a dot, but there are ways to see them even if they are hidden (e.g., ls -a).

@palashmoon
Author

palashmoon commented May 16, 2024

Thank you @david-gimeno, I was able to extract the gaze features. After downloading the model, it was stored under a path like

~/.ptgaze/...

which Python was not expanding properly, so I used os.path.expanduser("~/.ptgaze/...") to get the correct path.
Thanks for the help again.
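(For reference, the fix was along these lines; the exact file name under ~/.ptgaze/ is only illustrative:)

    import os

    # "~" is not expanded automatically when the string is used as a file path,
    # so expand it explicitly before opening/loading the checkpoint.
    ckpt_path = os.path.expanduser("~/.ptgaze/models/eth-xgaze_resnet18.pth")  # illustrative path
    print(ckpt_path)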

@palashmoon
Author

palashmoon commented May 16, 2024

Hi @david-gimeno. I am trying to extract the EmoNet features using the following code; it is looking for a face_ID inside the directory.
[screenshot of the code]
But the current structure is like this:
[screenshot of the directory structure]

The following is the error I am getting:
[screenshot of the error]
How can I extract the face IDs in the current locations? Can you please provide some suggestions on this?

Also, I am facing multiple issues while installing the requirements for PASE+. Can you please help me with this too?

@david-gimeno
Collaborator

It seems the script was expecting an additional level of directories. So, the script has been modified; you can check it here. Please update the repo and try again.

Regarding the requirements for each feature extractor, we provide info in the README of the repo. Please read it carefully. Nonetheless, take into account that these installations usually depend on the OS architecture and might fail on certain occasions. Issues related to these installations should be solved by contacting the original authors of the corresponding models.

@palashmoon
Author

palashmoon commented May 17, 2024

@david-gimeno, is there any workaround for the installation of requirements.txt? There are a few installation issues I am facing. I have raised an issue with the original authors too:
santi-pdp/pase#128. These are the current errors I am getting during installation.

@palashmoon
Author

Hi @david-gimeno. I have downloaded all the videos from the D-vlog dataset. Should I split them based on the IDs given in the test, train, and validation CSV files? Or is there a separate file, video_ids.csv (used as python3 ./scripts/feature_extraction/dvlog/extract_wavs.py --csv-path ./data/D-vlog/video_ids.csv --column-video-id video_id --video-dir $VIDEO_DIR --dest-dir $WAV_DIR), which is missing in the repo? Please help me with this.

@waHAHJIAHAO

waHAHJIAHAO commented Jul 15, 2024

Hi @palashmoon, could you share your D-vlog dataset? I have been looking for it for a long time.

@bucuram
Collaborator

bucuram commented Jul 18, 2024

@waHAHJIAHAO please reach out to the authors of the D-vlog dataset to request access: D-vlog: Multimodal Vlog Dataset for Depression Detection

@waHAHJIAHAO

Hi @bucuram, may I ask how to fill in this config file?
[screenshot of the config file]

@bucuram
Collaborator

bucuram commented Jul 31, 2024

Hi, @waHAHJIAHAO!

It should contain the paths to the data, for example:

reading-between-the-frames:
  d-vlog: data/D-vlog/
  d-vlog-original: data/D-vlog/splits/original/
  daic-woz: data/DAIC-WOZ/
  e-daic-woz: data/E-DAIC-WOZ/
  num_workers: 8

@waHAHJIAHAO

Thanks a lot, @bucuram!! And what should I fill in for the "reading-between-the-frames" field?

@bucuram
Collaborator

bucuram commented Jul 31, 2024

That field should remain as it is in the example above. Then, you can use the env config when running the experiments.

The env config is already set as ENV="reading-between-the-frames".

@waHAHJIAHAO

@bucuram Yes!! Thank you for the reply. Since I want to process a new dataset collected at my university, I have some questions about the D-vlog dataset. The file path is data/D-vlog/splits, and the CSV files there contain the four fields voice_presence, face_presence, body_presence, and hand_presence. Were these four fields originally present in the D-vlog dataset, or were they computed by your team later? If you did this pre-processing later, please tell me how to generate this part of the data. And does the absence of this part have any effect on the model?
[screenshot of the splits CSV]
This is part of my dataset:
[screenshot of the new dataset]

@david-gimeno
Collaborator

david-gimeno commented Aug 1, 2024

Hi @waHAHJIAHAO!

These four dataframe columns were not originally in the D-Vlog dataset. We computed these statistics thanks to the feature extraction scripts you can find in this directory. As you can observe, for example, in the face detection script and the body landmarks identification script, we were creating a numpy array with the indices of those frames where no face or no body was detected. Additionally, as you can also notice, we were zero-filling the final feature sequence representing the video sample.

So, how did we compute those voice, face, etc. presence values? Having the information I mentioned above and knowing the frames per second of each video clip, we can compute the number of seconds where the subject was actually talking, actually present in the scene, etc. These are statistics to know more about the dataset. What we actually used for model training was the array with the indices where there was, e.g., no face, to create a mask of 0's and 1's that tells the model where it shouldn't pay attention. A small sketch of both ideas is shown below.
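A minimal sketch of that computation (not our actual code; the example values are only illustrative):

    import numpy as np

    # Illustrative inputs: indices of the frames where no face was detected,
    # the total number of frames of the clip, and its frame rate.
    no_face_idxs = np.array([10, 11, 12, 250, 251])
    num_frames, fps = 7500, 25

    # 0/1 mask used during training: 0 marks frames the model should not attend to.
    face_mask = np.ones(num_frames, dtype=np.int64)
    face_mask[no_face_idxs] = 0

    # Dataset statistic: number of seconds in which a face was actually present.
    face_presence_seconds = face_mask.sum() / fps
    print(face_presence_seconds)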

@waHAHJIAHAO

Hi @david-gimeno, thank you for your careful reply. I have completed the pre-processing of the "presence" part. I pre-processed my new dataset according to D-vlog, generated segmented npz files to feed into the model, and wrote the following data processing scripts, which are logically consistent with the D-vlog ones. The current error is that my train dataloader fails in torch.stack() when loading the data. I read on the fifth page of your paper that there is a learnable modality encoder that can unify the output of each modality. I would like to ask where this encoder is, or do you have any suggestions for my problem?
[screenshots of the data processing scripts]
This is my error:
[screenshot of the torch.stack error]

@waHAHJIAHAO

I just use 5 modalities and printed their shapes; they look like this:
[screenshot of the tensor shapes]

@david-gimeno
Collaborator

david-gimeno commented Aug 5, 2024

@waHAHJIAHAO
The tensor shapes look nice. I guess 270 refers to the number of frames composing your context window, which, if your videos were recorded at 25 fps, should correspond to a roughly 10-second span. Are you using our code or are you implementing your own dataset? Anyway, I recommend you carefully inspect our dataset script; specifically, here you have a good starting point. You can debug how your data shapes look at every dataset step, either with some tools or with simple, yet effective, print() and exit() calls.

Regarding the learnable modality encoders that unify all inputs into the same dimensional space, you can find their implementation here. These modality encoders are subsequently used here when defining our Transformer-based model. Note that some of the modalities will be flattened (check this code and this config file). I agree that our model handles and takes into account a lot of details, but I believe that going step by step to understand the code is the proper way, and it will surely not be a waste of time :) A padding sketch for the torch.stack error is included below.
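Regarding the torch.stack error itself, it usually means the per-sample tensors of a modality do not all have the same number of frames. This is not our repo's collate logic, just a generic sketch of zero-padding variable-length sequences (plus the corresponding attention mask) before batching:

    import torch
    from torch.nn.utils.rnn import pad_sequence

    def collate_modality(seqs):
        """Zero-pad a list of (frames, feat_dim) tensors so they can be batched together."""
        lengths = torch.tensor([s.shape[0] for s in seqs])
        padded = pad_sequence(seqs, batch_first=True)                      # (batch, max_frames, feat_dim)
        mask = torch.arange(padded.shape[1])[None, :] < lengths[:, None]   # (batch, max_frames), True = real frame
        return padded, mask

    # Example: three clips of different lengths for one modality.
    batch, mask = collate_modality([torch.randn(270, 136), torch.randn(180, 136), torch.randn(240, 136)])
    print(batch.shape, mask.shape)  # torch.Size([3, 270, 136]) torch.Size([3, 270])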

@waHAHJIAHAO

waHAHJIAHAO commented Aug 17, 2024

@david-gimeno Thank you for the reply!!!! I have run my dataset successfully, but I found an issue: training does not seem to converge; the best results occur within a few epochs, and 200 epochs seem unnecessary.
