Increasing the smoothness of the video progress bar #89

Open: wants to merge 2 commits into main

H-Dempsey

Hi napari-deeplabcut!

I am a big fan of both napari and deeplabcut, and I was so excited when I first found out about this collaboration.
This is my first pull request and I apologise if I make some mistakes.

I noticed that when I scroll through a video during manual frame extraction, the interface lags.

Before1.mp4

If I replace the current video reader class + dask lazy loads with the napari-video reader class, the scrolling is much smoother.

After1.mp4

For the non-OpenCV option, I tried writing my own version of the napari-video class, replacing all cv2 functions with imageio functions.
Unfortunately, the speed looked similar, so I decided to keep PyAV and dask lazy loading for that case.
I have included my imageio version below for interest.

Thank you for making this really useful tool.

Harry

import os
import numpy as np
import imageio

class VideoReaderNPIO:
    def __init__(self, filename: str, remove_leading_singleton: bool = True):
        """Open video in filename."""
        if not os.path.exists(filename):
            raise FileNotFoundError(f'{filename} not found.')
        self._filename = filename
        self._vr = imageio.get_reader(filename)
        self._seek(0)  # reset to first frame
        frame = self._vr.get_next_data()  # read frame to get number of channels
        self.frame_channels = int(frame.shape[-1])
        self.remove_leading_singleton = remove_leading_singleton
        self._seek(0)  # rewind: reading the probe frame advanced the reader past frame 0

    def __del__(self):
        try:
            self._vr.close()
        except AttributeError:  # if file does not exist this will be raised since _vr does not exist
            pass

    def __len__(self):
        """Length is number of frames."""
        return self.number_of_frames

    def __getitem__(self, index):
        # numpy-like slice indexing into arbitrary dims of the video
        # ugly/hacky but works
        frames = None
        if isinstance(index, (int, np.integer)):  # single frame (plain or numpy integer)
            # ret, frames = self.read(index)
            # frames = cv2.cvtColor(frames, cv2.COLOR_BGR2RGB)
            self._seek(int(index))
            frames = self._vr.get_next_data()  # read
            self.current_frame_pos = int(index) + 1  # the reader now points at the following frame
        elif isinstance(index, slice):  # slice of frames
            frames = np.stack([self[ii] for ii in range(*index.indices(len(self)))])
        elif isinstance(index, range):  # range of frames
            frames = np.stack([self[ii] for ii in index])
        elif isinstance(index, tuple):  # unpack tuple of indices
            if isinstance(index[0], slice):
                indices = range(*index[0].indices(len(self)))
            elif isinstance(index[0], (np.integer, int)):
                indices = int(index[0])
            else:
                indices = None

            if indices is not None:
                frames = self[indices]

                # index into pixels and channels
                for cnt, idx in enumerate(index[1:]):
                    if isinstance(idx, slice):
                        ix = range(*idx.indices(self.shape[cnt+1]))
                    elif isinstance(idx, (int, np.integer)):
                        ix = range(idx, idx + 1)  # select element idx itself, keeping the axis
                    else:
                        continue

                    if frames.ndim==4: # ugly indexing from the back (-1,-2 etc)
                        cnt = cnt+1
                    frames = np.take(frames, ix, axis=cnt)

        if self.remove_leading_singleton and frames is not None:
            if frames.shape[0] == 1:
                frames = frames[0]
        return frames

    def __repr__(self):
        return f"{self._filename} with {len(self)} frames of size {self.frame_shape} at {self.frame_rate:1.2f} fps"

    def __iter__(self):
        # __iter__ must return an iterator, so yield frames one at a time
        return (self[ii] for ii in range(len(self)))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        """Release video file."""
        self.close()

    def read(self, frame_number=None):
        """Read next frame or frame specified by `frame_number`."""
        is_current_frame = frame_number == self.current_frame_pos
        # no need to seek if we are at the right position - greatly speeds up reading subsequent frames
        if frame_number is not None and not is_current_frame:
            self._seek(frame_number)
        frame = self._vr.get_next_data()  # read
        self.current_frame_pos += 1  # the reader now points at the following frame
        return frame

    def close(self):
        self._vr.close()

    def _reset(self):
        """Re-initialize object."""
        self.__init__(self._filename, self.remove_leading_singleton)

    def _seek(self, frame_number):
        """Go to frame."""
        self._vr.set_image_index(frame_number)
        self.current_frame_pos = frame_number

    @property
    def number_of_frames(self):
        return int(self._vr.count_frames())

    @property
    def frame_rate(self):
        return self._vr.get_meta_data()['fps']

    @property
    def frame_height(self):
        return int(self._vr.get_meta_data()['size'][1])

    @property
    def frame_width(self):
        return int(self._vr.get_meta_data()['size'][0])

    # @property # I didn't know how to implement this in imageio.
    # def fourcc(self):
    #     return int(self._vr.get(cv2.CAP_PROP_FOURCC))

    # @property # I didn't know how to implement this either.
    # def frame_format(self):
    #     return int(self._vr.get(cv2.CAP_PROP_FORMAT))

    @property
    def frame_shape(self):
        return (self.frame_height, self.frame_width, self.frame_channels)

    @property
    def dtype(self):
        return np.uint8

    @property
    def shape(self):
        return (self.number_of_frames, *self.frame_shape)

    @property
    def ndim(self):
        return len(self.shape)+1

    @property
    def size(self):
        return np.prod(self.shape)  # np.product was removed in NumPy 2.0

    def min(self):
        return 0

    def max(self):
        return 255
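
The frame-selection part of `__getitem__` above can be looked at in isolation. The following is a minimal sketch, not part of the pull request: `frame_indices` is a hypothetical helper name, and it only covers the first axis (which frames to read). It shows how `slice.indices` clamps a slice to the video length, and it uses `operator.index` so that numpy integers are accepted as well as plain ints.

```python
import operator
from typing import List, Union


def frame_indices(index: Union[int, slice, range], n_frames: int) -> List[int]:
    """Resolve an int, slice, or range into an explicit list of frame numbers."""
    if isinstance(index, slice):
        # slice.indices clamps start/stop to the video length and resolves negatives
        return list(range(*index.indices(n_frames)))
    if isinstance(index, range):
        return list(index)
    # operator.index accepts int and numpy integers, and raises TypeError otherwise
    i = operator.index(index)
    if i < 0:
        i += n_frames  # numpy-style negative indexing from the end
    if not 0 <= i < n_frames:
        raise IndexError(f"frame {index} out of range for {n_frames} frames")
    return [i]
```

For example, `frame_indices(slice(None, None, 2), 5)` selects every second frame of a five-frame video, and `frame_indices(-1, 5)` selects the last frame.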

@jeylau
Contributor

jeylau commented Aug 18, 2023

Hi @H-Dempsey, thank you so much for contributing! I just today had to deal with a video that was very slow to read, so I was excited to try your PR. Unfortunately, on that 100MB video the difference is barely visible, and after reading the code behind napari-video I realized that the implementation is pretty similar to what we had tried earlier on here.
It was faster, but cv2.VideoCapture().read() is not index-safe..., so on long videos, we'd often see a mismatch between the video frame and the corresponding annotations 😕 Is that something you observed too?

@jeylau jeylau self-assigned this Aug 18, 2023
@H-Dempsey
Author

H-Dempsey commented Aug 22, 2023

Hi @jeylau,

Thank you for your response!

> Unfortunately, on that 100MB video the difference is barely visible

I can show that it works on my end with larger videos.
The previous video I used was 25 MB, and here is another test with a 6.7 GB video (1920x1080, 49 mins, 30 fps).

Before:

Before2.mp4

After:

After2.mp4

> cv2.VideoCapture().read() is not index-safe..., so on long videos, we'd often see a mismatch between the video frame and the corresponding annotations 😕 Is that something you observed too?

I just played around with an analysed 12-hour (20 fps) video, and I haven't observed it so far.
Could you explain why cap.read() is not index-safe, and could lead to a mismatch between the frame and annotations in long videos?
Could that be due to cap.set(cv2.CAP_PROP_POS_FRAMES, frame_no) rather than cap.read()?
I ran into this thread.
Setting the frame number using this seems to be unreliable sometimes (e.g. when the frame rate is variable).
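
One way to test whether seeking is index-safe on a given video is to compare frames obtained by seeking against frames obtained by decoding from the start, since sequential decoding does not depend on seeking. The following is a rough sketch under stated assumptions, not code from this pull request: `find_seek_mismatches` and `opencv_sources` are hypothetical names, and the OpenCV adapter assumes `cv2` is installed (it is imported lazily, so the checker itself has no OpenCV dependency).

```python
from typing import Any, Callable, Iterable, List


def find_seek_mismatches(
    sequential_frames: Iterable[Any],
    seek_frame: Callable[[int], Any],
    indices: Iterable[int],
    same: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> List[int]:
    """Return the frame numbers where seeking disagrees with sequential decoding."""
    wanted = set(indices)
    if not wanted:
        return []
    last = max(wanted)
    reference = {}
    # Decode from the start and keep only the frames we want to compare.
    for i, frame in enumerate(sequential_frames):
        if i in wanted:
            reference[i] = frame
        if i >= last:
            break
    return [i for i in sorted(reference) if not same(seek_frame(i), reference[i])]


def opencv_sources(path: str):
    """Hypothetical adapters: a sequential frame generator and a seek-then-read function."""
    import cv2  # lazy import, assumes opencv-python is installed

    def sequential():
        cap = cv2.VideoCapture(path)
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                yield frame
        finally:
            cap.release()

    seek_cap = cv2.VideoCapture(path)

    def seek(i: int):
        seek_cap.set(cv2.CAP_PROP_POS_FRAMES, i)
        ok, frame = seek_cap.read()
        return frame if ok else None

    return sequential(), seek
```

With OpenCV, frames are numpy arrays, so the comparison would be passed explicitly, for example `find_seek_mismatches(*opencv_sources(path), range(0, 1000, 100), same=np.array_equal)`. An empty result on one video does not prove that seeking is safe in general; it only means no mismatch was found at the sampled frames.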

While my pull request seems to improve speed, if it leads to inaccurate frame setting, I think we should not merge it.

Harry

Add decord to the list of dependencies, so that it can be used as the potential napari-deeplabcut video player.
Remove the current video reader class and add a decord-based video reader.
@H-Dempsey
Author

Hi again,

I spent some more time looking into the issue of CAP_PROP_POS_FRAMES being inaccurate sometimes.
The people from this other project ran into the same issue as us.

They solved it by using a video reader called decord.
It offers two seeking modes, fast and accurate.
By default, it uses accurate seeking, and it is also faster than the OpenCV and PyAV versions.


I was keen to see whether a decord video player with napari would give fast and accurate seeking.
I adapted the getitem function from the napari-video reader for decord, and this is the result.

napari-decord.mp4

It is significantly faster than the "Before" video above, and it uses accurate seeking.
On videos smaller than that test video (6.7 GB, 1920x1080, 49 mins, 30 fps), it is even better.
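
To illustrate the idea, here is a minimal sketch of an array-like wrapper around a decord reader. It is not the code added to this pull request: `DecordFrames` and `open_with_decord` are hypothetical names, and the sketch assumes decord's `VideoReader` supports `len()`, integer indexing, and `get_batch()`, each returning an object with `.asnumpy()`.

```python
import numpy as np


class DecordFrames:
    """Array-like view over a decord-style reader (sketch, not the PR implementation)."""

    def __init__(self, reader):
        self._vr = reader
        first = self._vr[0].asnumpy()  # probe one frame for its shape and dtype
        self.shape = (len(self._vr), *first.shape)
        self.dtype = first.dtype
        self.ndim = len(self.shape)

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, index):
        # Split the index into the frame axis and the remaining (pixel/channel) axes.
        head, rest = (index[0], index[1:]) if isinstance(index, tuple) else (index, ())
        if isinstance(head, slice):
            picked = list(range(*head.indices(len(self))))
            if not picked:
                frames = np.empty((0, *self.shape[1:]), dtype=self.dtype)
            else:
                frames = self._vr.get_batch(picked).asnumpy()
            return frames[(slice(None), *rest)]
        i = int(head)
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(f"frame {head} out of range for {len(self)} frames")
        frame = self._vr[i].asnumpy()
        return frame[rest] if rest else frame


def open_with_decord(path: str) -> DecordFrames:
    """Open a video with decord (assumes the decord package is installed)."""
    from decord import VideoReader, cpu  # lazy import keeps the wrapper testable without decord

    return DecordFrames(VideoReader(path, ctx=cpu(0)))
```

Because the wrapper exposes `shape`, `dtype`, `ndim`, and `__getitem__`, an object like this could in principle be handed to napari as image data, with frames decoded only when they are requested.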

I have added the changes to my pull request.
What do you think about this?

Harry

@H-Dempsey H-Dempsey reopened this Oct 17, 2023