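Hello, I have some questions about training the T.D (timestamp detector). The paper says the T.D is trained on the AVSync15 dataset, but that dataset has no timestamp information, only class labels and videos (with names like 6wHFhrAqt5Q_000023_000033_5.5_8.5.mp4). How can it be used to train a timestamp detector? In principle the training data should carry target time labels (each audio frame marked 1 for sound or 0 for silence). How are these labels obtained?
My guess: for a video named 6wHFhrAqt5Q_000023_000033_5.5_8.5.mp4, the corresponding training sample is 6wHFhrAqt5Q_000023.mp4 from the VGGSound dataset, the span labeled 1 is 5.5 to 8.5 s within 6wHFhrAqt5Q_000023.mp4, and every other time range gets target label 0. Is that right?
Any guidance would be greatly appreciated.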
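To make the guess above concrete, here is a minimal sketch of how such labels could be built from a file name. It is not the authors' confirmed procedure. The helper names `parse_name` and `frame_labels`, the 10 s clip length, the 25 fps label rate, and the reading of the last two fields as seconds relative to the clip start are all assumptions for illustration.

```python
import numpy as np

def parse_name(filename):
    """Split an AVSync15-style name into its fields (hypothetical helper).

    Splitting from the right keeps video IDs that contain '_' intact.
    """
    stem = filename.rsplit(".", 1)[0]
    video_id, clip_start, clip_end, seg_start, seg_end = stem.rsplit("_", 4)
    return video_id, int(clip_start), int(clip_end), float(seg_start), float(seg_end)

def frame_labels(seg_start, seg_end, clip_len=10.0, fps=25):
    """Per-frame 0/1 targets: 1 inside [seg_start, seg_end), 0 elsewhere.

    clip_len and fps are assumed values, not taken from the paper.
    """
    n_frames = int(round(clip_len * fps))
    t = np.arange(n_frames) / fps          # start time of each label frame
    return ((t >= seg_start) & (t < seg_end)).astype(np.int64)

video_id, clip_start, clip_end, seg_start, seg_end = parse_name(
    "6wHFhrAqt5Q_000023_000033_5.5_8.5.mp4"
)
labels = frame_labels(seg_start, seg_end, clip_len=clip_end - clip_start)
```

Under this reading a single 3 s block of ones is all the file name can provide, so finer labels would have to come from the audio itself.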
But honestly, the 3 s label in 6wHFhrAqt5Q_000023_000033_5.5_8.5.mp4 is also quite coarse.
Generally you would use librosa.onset.