while setting variables #85
Hi Palash, I guess we are talking about the script ./scripts/feature_extraction/extract-dvlog-pase+-feats.sh for extracting the audio-based PASE+ features. Different aspects to take into account:
- The videos of the dataset are expected to be placed at the path indicated in the script. Therefore, once you have cloned the repo and downloaded the pre-trained model checkpoint, you should be able to set these variables, e.g., as follows: PASE_CONFIG=./pase/cfg/frontend/PASE+.cfg
- Regarding the USE_AUTH_TOKEN variable, it is an authentication token commonly required when using certain HuggingFace models. Please find the instructions to use PyAnnote here.
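For reference, the two PASE+ files these variables point to are the frontend config and the pre-trained checkpoint. A minimal sketch of how they are consumed, following (if I recall correctly) the upstream PASE repository's README; the paths and the checkpoint filename FE_e199.ckpt are placeholders, adapt them to your setup:

```python
# Sketch only: assumes the upstream PASE package is installed and the
# pre-trained checkpoint has already been downloaded; adjust paths as needed.
import torch
from pase.models.frontend import wf_builder

pase = wf_builder('./pase/cfg/frontend/PASE+.cfg').eval()             # -> PASE_CONFIG
pase.load_pretrained('./FE_e199.ckpt', load_last=True, verbose=True)  # -> PASE_CHCK_PATH

wav = torch.randn(1, 1, 16000)  # one second of 16 kHz audio: (batch, channels, samples)
with torch.no_grad():
    feats = pase(wav)           # roughly (1, 256, 100): 256-dim features every 10 ms
print(feats.shape)
```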
Thank you @david-gimeno for the help.
You can find the toolkit we employed to extract each modality in the paper (see page 7). However, I see that configuring all these feature extractors is not as easy as I expected. Let's go step by step.
Note that the
Thank you @david-gimeno for the quick response.
I would need more details. What OS are you using, Ubuntu? And how did you set the variable?
Hi, I am using Ubuntu. I have set it like this: MPIIGAZE_DIR=./scripts/conda_envs/feature_extractors/pytorch_mpiigaze_demo/. Please let me know if you need any other information.
According to the script, the model checkpoints should be downloaded automatically. So, let's try using absolute paths, just in case. However, unless you modified our repo folder structure, your paths may be wrong.
Actually, I have cloned mpiigaze inside this path: ./scripts/conda_envs/feature_extractors/pytorch_mpiigaze_demo/
Okay, I think the problem is in the config file.
Hi @david-gimeno. I have updated the file like this.
Our script uses the function. If the error persists, you should contact the original authors of the gaze tracker code. Note: on Linux, files whose names start with a dot are hidden. There are ways to see them even though they are hidden (e.g., ls -a).
Thank you @david-gimeno, I was able to extract the gaze features. After downloading, the model was getting stored in the path ~/.patze./...
Hi @david-gimeno. I am trying to extract the emonet features using the following code, but it is looking for a face_ID inside the directory. The following is the error I am getting. Also, I am facing multiple issues while installing the requirements for PASE+. Can you please help me with this too?
It seems the script was expecting an additional level of directories. So, the script has been modified; you can check it here. Please update the repo and try again. Regarding the requirements for each feature extractor, we provide info in the README of the repo. Please read it carefully. Nonetheless, take into account that these installations usually depend on the OS architecture and might fail on certain occasions. Issues related to these installations should be solved by contacting the original authors of the corresponding models.
@david-gimeno, is there any workaround for the installation of requirement.txt? There are a few installation issues I am facing. I have raised an issue with the original authors too.
Hi @david-gimeno. I have downloaded all the videos from the D-vlog dataset. Should I split them based on the IDs given in the test, train, and validation CSV files? Or is there a separate video_ids.csv file, which is missing in the repo? I am referring to this command:
python3 ./scripts/feature_extraction/dvlog/extract_wavs.py --csv-path ./data/D-vlog/video_ids.csv --column-video-id video_id --video-dir $VIDEO_DIR --dest-dir $WAV_DIR
Please help me with this.
Hi @palashmoon, could you share your D-vlog dataset? I have been looking for this for a long time.
@waHAHJIAHAO, please reach out to the authors of the D-vlog dataset to request access: D-vlog: Multimodal Vlog Dataset for Depression Detection
Hi @bucuram, may I ask how to fill in this config file?
Hi, @waHAHJIAHAO! The config file should contain the paths to the data, for example, the paths where the D-vlog splits and extracted features are stored.
Thanks a lot @bucuram!! And what should I fill in for the "reading-between-the-frames" field?
That field should remain as it is in the example above. Then, you can use the env config when running the experiments. The env config is already set (see line 20 in d19568c).
@bucuram Yes!! Thank you for the reply~~ Since I want to process a new dataset collected from my university, I have some questions about the D-vlog dataset. The file path "data/D-vlog/splits" contains 4 CSV files, with the columns "voice_presence", "face_presence", "body_presence", and "hand_presence".
Hi @waHAHJIAHAO! These four dataframe columns were not originally in the D-Vlog dataset. We computed these statistics thanks to our feature extraction scripts, which you can find in this directory. As you can observe, for example, in the face detection script and the body landmarks identification script, we were creating a numpy array with the indices of those frames where no face or no body was detected. Additionally, as you can also notice, we were zero-filling the final feature sequence representing the video sample. So, how did we compute those voice, face, etc. presence values? Having the information I mentioned above and knowing the frames per second of each video clip, we can compute the number of seconds where the subject was actually talking, actually present in the scene, etc. These are statistics to know more things about the dataset. What we actually used for model training was the array with the indices where there was, e.g., no face, to create a mask with 0's and 1's to tell the model where it shouldn't pay attention.
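To make this concrete, here is a small sketch of how such a presence statistic and a 0/1 attention mask could be derived from the indices of frames with no detected face. This is a hypothetical helper for illustration, not our actual extraction script:

```python
import numpy as np

def presence_and_mask(no_face_idxs, num_frames, fps):
    """Toy illustration: derive a face-presence statistic and a 0/1
    attention mask from the indices of frames where no face was detected.
    (Hypothetical helper, not the repository's actual implementation.)"""
    mask = np.ones(num_frames, dtype=np.int64)
    mask[no_face_idxs] = 0                  # 0 -> the model should not attend here
    presence_seconds = mask.sum() / fps     # seconds where a face was present
    presence_ratio = mask.mean()            # fraction of frames with a face
    return presence_seconds, presence_ratio, mask

# Example: a 10-second clip at 25 fps with the face missing in frames 40-59
secs, ratio, mask = presence_and_mask(np.arange(40, 60), num_frames=250, fps=25)
print(secs, ratio)  # 9.2 seconds of face presence, ratio 0.92
```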
Hi @david-gimeno~ Thank you for your careful reply. I have completed the pre-processing of the "presence" part. I also pre-processed my new dataset following D-vlog, generated some segmented npz files to feed the model, and wrote the following data processing scripts, which are logically consistent with D-vlog. The current problem is that my train dataloader raises a "torch.stack()" error when loading data. I read on the fifth page of your paper that there is a learnable modality encoder that can unify the output of each modality. I would like to ask where this encoder is, or whether you have any suggestions for my problem?
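For reference, a frequent cause of a torch.stack() failure in a DataLoader is that the samples in a batch have different sequence lengths. A generic padding collate function along these lines (a sketch under that assumption, not code from this repo) usually resolves it:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_collate(batch):
    """Pad variable-length feature sequences so they can be batched.
    Each item is assumed to be (features [T, D], label); adapt to your dataset."""
    feats, labels = zip(*batch)
    lengths = torch.tensor([f.shape[0] for f in feats])
    padded = pad_sequence(feats, batch_first=True)  # [B, T_max, D], zero-padded
    # Boolean mask marking the real (non-padded) positions of each sequence
    mask = torch.arange(padded.shape[1])[None, :] < lengths[:, None]  # [B, T_max]
    return padded, mask, torch.tensor(labels)

# loader = torch.utils.data.DataLoader(dataset, batch_size=8, collate_fn=pad_collate)
```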
@waHAHJIAHAO Regarding the learnable modality encoders that unify all inputs to the same dimensional space, you can find their implementation here. These modality encoders are subsequently used here when defining our Transformer-based model. Note that some of the modalities will be flattened (check this code and this config file). I agree our model takes into account a lot of details, but I believe that going step by step to understand the code is the proper way, and it will surely not be a waste of time :)
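As a rough illustration of the idea only (not the repository's exact classes), a per-modality learnable encoder that projects every input to a shared dimension could look like this; the modality names and feature sizes are made up:

```python
import torch
import torch.nn as nn

class ModalityEncoders(nn.Module):
    """Sketch: one learnable linear encoder per modality, projecting each
    modality's features to a shared model dimension so all sequences can be
    handled by a single Transformer. Names and dimensions are illustrative."""
    def __init__(self, modality_dims, d_model):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in modality_dims.items()}
        )

    def forward(self, inputs):
        # inputs: dict of modality name -> tensor of shape [batch, time, feat_dim]
        return {name: self.encoders[name](x) for name, x in inputs.items()}

# Toy usage with made-up feature sizes
enc = ModalityEncoders({"audio_pase": 256, "face_emonet": 256, "gaze": 512}, d_model=128)
out = enc({
    "audio_pase": torch.randn(2, 100, 256),
    "face_emonet": torch.randn(2, 100, 256),
    "gaze": torch.randn(2, 100, 512),
})
print({name: tuple(feat.shape) for name, feat in out.items()})  # each -> (2, 100, 128)
```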
@david-gimeno Thank you for the reply!!!! I have run my dataset successfully, but I noticed an issue: training does not seem to converge; the best results occur within the first few epochs, and 200 epochs seem unnecessary.
Hello, can you please help me with what values I should set for the variables in each file? For example:
PASE_CONFIG=
PASE_CHCK_PATH=
USE_AUTH_TOKEN=
In particular, what should the value of USE_AUTH_TOKEN be?