How to use different video classifier backbones #1468
Unanswered · BernhardGlueck asked this question in Data / pipelines

I am trying to train a video classifier using the following code:
My input data are thousands of videos, each with a resolution of 398x224, at 25 fps, and exactly 2 seconds long.
I tried different clip samplers, such as random and uniform.
This works fine, but now I want to switch to a different backbone; in the example I am using x3d_xs from the tutorial.
However, if I change that to any of the other supported backbones, I get different errors:
What am I doing wrong?
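For context, here is a minimal sketch of the kind of Lightning Flash setup described above, assuming a Flash 0.8-style API. The original snippet is not reproduced in this thread, so the folder path, batch size, and clip-sampler choice below are assumptions, not the author's code:

```py
import flash
from flash.video import VideoClassificationData, VideoClassifier

# Hypothetical folder layout: one sub-folder per class under data/train.
datamodule = VideoClassificationData.from_folders(
    train_folder="data/train",
    clip_sampler="uniform",  # the post mentions trying "random" and "uniform"
    clip_duration=2,         # the clips are exactly 2 seconds long
    decode_audio=False,
    batch_size=4,
)

# "x3d_xs" is the backbone from the tutorial; swapping this string for other
# registered backbones is what triggers the errors described in the post.
model = VideoClassifier(
    backbone="x3d_xs",
    labels=datamodule.labels,
    pretrained=True,
)

trainer = flash.Trainer(max_epochs=1)
trainer.finetune(model, datamodule=datamodule, strategy="freeze")
```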
Replies: 1 comment
Hi @BernhardGlueck, thanks for reporting this! This looks like a bug on our side (partially related to #1328) for the slowfast backbones. Regarding the x3d backbones, my guess is that they expect the input images to be a bit bigger. Could you try creating your datamodule like this:

```py
from flash.video.classification.input_transform import VideoClassificationInputTransform

datamodule = VideoClassificationData.from_folders(
    ...,
    transform=VideoClassificationInputTransform(image_size=512),
)
```

That would set the image size to 512 rather than the default of 244.
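As a quick way to see which backbone names can be passed in, Flash exposes the backbone registry on the task class. A small sketch (the exact list depends on the installed flash and pytorchvideo versions):

```py
from flash.video import VideoClassifier

# Prints the registered video-classification backbones
# (e.g. the x3d_* and slowfast_* variants from PyTorchVideo).
print(VideoClassifier.available_backbones())
```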