SpecAugment in Albumentations #40

Open
ternaus opened this issue Nov 5, 2024 · 0 comments

ternaus commented Nov 5, 2024

It is not an issue, more like a note.

The image augmentation library Albumentations has a transform, `XYMasking`, which is a generalization of SpecAugment.
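For reference, SpecAugment's masking step replaces a band of consecutive frequency bins and a band of consecutive time steps with a fill value. Below is a minimal pure-Python sketch of that masking step only (SpecAugment's time warping is left out). It assumes a spectrogram stored as a list of rows, with rows as frequency bins and columns as time steps. `spec_augment_mask` is a hypothetical name for illustration. It is not part of Albumentations and not how `XYMasking` is implemented.

```python
import random


def spec_augment_mask(spec, max_freq_width, max_time_width, fill_value=0.0, rng=None):
    """Return a copy of `spec` with one frequency band and one time band masked.

    `spec` is a list of rows: each row is a frequency bin, each column a time step.
    Band widths are drawn uniformly from [0, max_width], clipped to the spectrogram size.
    (Hypothetical helper for illustration, not an Albumentations API.)
    """
    rng = rng if rng is not None else random.Random()
    n_freq = len(spec)
    n_time = len(spec[0]) if n_freq else 0
    out = [list(row) for row in spec]  # copy rows so the caller's data is untouched

    # Frequency mask: f consecutive rows starting at f0, with f in [0, max_freq_width].
    f = rng.randint(0, min(max_freq_width, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for r in range(f0, f0 + f):
        out[r] = [fill_value] * n_time

    # Time mask: t consecutive columns starting at t0, with t in [0, max_time_width].
    t = rng.randint(0, min(max_time_width, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        row[t0:t0 + t] = [fill_value] * t  # same-length slice assignment keeps the row width
    return out


# Usage: a 6 x 8 all-ones "spectrogram" (6 frequency bins, 8 time steps).
spec = [[1.0] * 8 for _ in range(6)]
masked = spec_augment_mask(spec, max_freq_width=2, max_time_width=3, rng=random.Random(0))
for row in masked:
    print(row)
```

Every masked cell ends up in either a fully masked row (frequency band) or a fully masked column (time band), which is the pattern a horizontal plus vertical strip mask produces on a spectrogram image.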

The transform works on:

  • uint8 and float32 images
  • any number of channels
  • images, masks, bounding boxes, and key points

It can also be combined with 70 other image transforms in a pipeline.

    Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis),
    simulating occlusions. This transform is useful for training models to recognize images
    under varied visibility conditions. It is particularly effective for spectrogram images,
    where it performs time and frequency masking to improve model robustness.

    At least one of `mask_x_length` or `mask_y_length` must be specified, dictating the mask's
    maximum size along each axis.

    Args:
        num_masks_x (int | tuple[int, int]): Number or range of horizontal regions to mask. Defaults to 0.
        num_masks_y (int | tuple[int, int]): Number or range of vertical regions to mask. Defaults to 0.
        mask_x_length (int | tuple[int, int]): Specifies the length of the masks along
            the X (horizontal) axis. If an integer is provided, it sets a fixed mask length.
            If a tuple of two integers (min, max) is provided,
            the mask length is randomly chosen within this range for each mask.
            This allows for variable-length masks in the horizontal direction.
        mask_y_length (int | tuple[int, int]): Specifies the height of the masks along
            the Y (vertical) axis. Similar to `mask_x_length`, an integer sets a fixed mask height,
            while a tuple (min, max) allows for variable-height masks, with the height chosen
            randomly within the specified range for each mask.
        fill_value (int | float | list[int] | list[float] | str): Value to fill image masks. Defaults to 0.
        mask_fill_value (int | float | list[int] | list[float] | None): Value used to fill the masked
            regions in the mask target. If `None`, the mask is not affected. Defaults to `None`.
        p (float): Probability of applying the transform. Defaults to 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note: Either `mask_x_length` or `mask_y_length` or both must be defined.
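The `int | tuple[int, int]` convention in the arguments above reads as: an integer is a fixed value, and a `(min, max)` pair is a range sampled anew for each mask. Here is a minimal sketch of that parameter handling plus the "at least one length" check from the note. The helper names `sample_param`, `sample_mask_lengths`, and `check_lengths` are hypothetical. Several behaviors are assumptions rather than anything the docstring states: inclusive sampling of both range ends, drawing the mask count once per call, and reading "specified" as "has a positive upper bound". This is not Albumentations' internal code.

```python
import random


def sample_param(value, rng):
    """Resolve an `int | tuple[int, int]` argument (hypothetical helper).

    An int is returned unchanged; a (min, max) pair is sampled uniformly,
    inclusive of both ends (an assumption about the range semantics).
    """
    if isinstance(value, int):
        return value
    low, high = value
    if low > high:
        raise ValueError(f"invalid range {value!r}: min must be <= max")
    return rng.randint(low, high)


def sample_mask_lengths(num_masks, mask_length, rng):
    """Draw the number of masks once, then a fresh length for every mask (hypothetical)."""
    count = sample_param(num_masks, rng)
    return [sample_param(mask_length, rng) for _ in range(count)]


def check_lengths(mask_x_length, mask_y_length):
    """Hypothetical validation: at least one axis must allow a positive mask length."""

    def upper(v):
        # Largest length the argument can produce: the int itself or the range maximum.
        return v if isinstance(v, int) else v[1]

    if upper(mask_x_length) <= 0 and upper(mask_y_length) <= 0:
        raise ValueError("at least one of mask_x_length or mask_y_length must be positive")


rng = random.Random(0)
check_lengths(mask_x_length=(10, 20), mask_y_length=0)  # passes: X masks are possible
x_lengths = sample_mask_lengths(num_masks=(1, 3), mask_length=(10, 20), rng=rng)
print(x_lengths)  # between 1 and 3 lengths, each in [10, 20]
```

Sampling the length per mask, rather than once per call, is what produces the variable-length strips described for `mask_x_length` and `mask_y_length`.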

Interactive demo: https://explore.albumentations.ai/transform/XYMasking

