how to preprocess the data for model training? #2

yan159yan · 2022-08-02T04:54:17Z

Good work on the visual-audio data. Is there any parameter configuration for "preprocess_data.py"?

SamuelCahyawijaya · 2022-08-10T08:39:00Z

Hi @yan159yan, thank you for your interest in our work.
We use preprocess_data.py to run the preprocessing step before running the evaluation with eval.py.

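As an example, to evaluate on dataset/mm_test_metadata.csv with the pretrained Wav2Vec model CAiRE/wav2vec2-large-xlsr-53-cantonese, you can run the preprocessing and the evaluation as follows (the preprocessing command below points at the noisy variant, dataset/mm_test_metadata_noisy.csv; substitute the manifest you want to evaluate):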

python preprocess_data.py \
    --output_dir=<CACHE_DIR_PATH> \
    --model_name_or_path=CAiRE/wav2vec2-large-xlsr-53-cantonese \
    --test_manifest_path=dataset/mm_test_metadata_noisy.csv \
    --preprocessing_num_workers=32 \
    --seed=0 --use_video \
    --audio_column_name=audio_path \
    --text_column_name=text_path \
    --video_column_name=lip_image_path

python eval.py \
    --output_dir=<OUTPUT_DIR_PATH> \
    --model_name_or_path=CAiRE/wav2vec2-large-xlsr-53-cantonese \
    --test_manifest_path=<CACHE_DIR_PATH>/preprocess_data.arrow \
    --num_workers=8 \
    --preprocessing_num_workers=8 \
    --use_video \
    --audio_column_name=audio_path \
    --text_column_name=text_path \
    --video_column_name=lip_image_path \
    --per_device_eval_batch_size=16 \
    --dataloader_num_workers=32 \
    --seed=0 \
    --logging_strategy=steps \
    --logging_steps=10 \
    --report_to=tensorboard \
    --evaluation_strategy=epoch \
    --eval_steps=1 \
    --eval_accumulation_steps=100

Note that --use_video tells the scripts to also include the lip image data. If you don't need the visual part, you can remove that argument.
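
To make the link between the two steps explicit, here is a rough sketch that assembles both commands from shared settings. The helper name build_commands and its parameters are hypothetical and not part of the repository; the flags are copied from the commands above, and the eval.py list is shortened to the flags that depend on the preprocessing step. It assumes, as in the example above, that preprocessing writes preprocess_data.arrow into the cache directory.

```python
import shlex


def build_commands(cache_dir, output_dir, manifest, model, use_video=True):
    """Return (preprocess_cmd, eval_cmd) as argument lists.

    Hypothetical helper for illustration only; it just assembles the
    flags shown in the thread and does not run anything.
    """
    # Column flags used by both scripts, copied from the commands above.
    shared = [
        "--audio_column_name=audio_path",
        "--text_column_name=text_path",
        "--video_column_name=lip_image_path",
    ]
    # --use_video is the only flag that changes when the lip images are skipped.
    if use_video:
        shared.append("--use_video")

    preprocess_cmd = [
        "python", "preprocess_data.py",
        f"--output_dir={cache_dir}",
        f"--model_name_or_path={model}",
        f"--test_manifest_path={manifest}",
        "--preprocessing_num_workers=32",
        "--seed=0",
        *shared,
    ]
    # eval.py reads the file produced by preprocessing from the same cache
    # directory. Logging and dataloader flags are omitted here for brevity.
    eval_cmd = [
        "python", "eval.py",
        f"--output_dir={output_dir}",
        f"--model_name_or_path={model}",
        f"--test_manifest_path={cache_dir}/preprocess_data.arrow",
        "--per_device_eval_batch_size=16",
        "--seed=0",
        *shared,
    ]
    return preprocess_cmd, eval_cmd


# Example: audio-only run (no lip images); the paths are placeholders.
pre_cmd, eval_cmd = build_commands(
    cache_dir="cache",
    output_dir="out",
    manifest="dataset/mm_test_metadata_noisy.csv",
    model="CAiRE/wav2vec2-large-xlsr-53-cantonese",
    use_video=False,
)
print(shlex.join(pre_cmd))
print(shlex.join(eval_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to actually launch the step; the preprocessing command has to finish before the evaluation command is started.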

Hope it helps!
