Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ffmpeg zvbi addition #4864

Merged
merged 9 commits into from
Sep 20, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions cross/ffmpeg/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,8 @@ endif
DEPENDS += cross/libzmq
CONFIGURE_ARGS += --enable-libzmq

DEPENDS += cross/zvbi
CONFIGURE_ARGS += --enable-libzvbi
th0ma7 marked this conversation as resolved.
Show resolved Hide resolved

#
# fdk-acc is now considered compatible with (L)GPL.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,287 @@
From: James Cowgill <[email protected]>
Date: Sun, 11 Aug 2019 16:50:56 +0100
Subject: avcodec/arm/sbcenc: avoid callee preserved vfp registers

When compiling FFmpeg with GCC-9, some very random segfaults were
observed in code which had previously called down into the SBC encoder
NEON assembly routines. This was caused by these functions clobbering
some of the vfp callee saved registers (d8 - d15 aka q4 - q7). GCC was
using these registers to save local variables, but after these
functions returned, they would contain garbage.

Fix by reallocating the registers in the two affected functions in
the following way:
ff_sbc_analyze_4_neon: q2-q5 => q8-q11, then q1-q4 => q8-q11
ff_sbc_analyze_8_neon: q2-q9 => q8-q15

The reason for using these replacements is to keep closely related
sets of registers consecutively numbered which hopefully makes the
code more easy to follow. Since this commit only reallocates
registers, it should have no performance impact.

Signed-off-by: James Cowgill <[email protected]>
---
libavcodec/arm/sbcdsp_neon.S | 220 +++++++++++++++++++++----------------------
1 file changed, 110 insertions(+), 110 deletions(-)

diff --git a/libavcodec/arm/sbcdsp_neon.S b/libavcodec/arm/sbcdsp_neon.S
index d83d21d..914abfb 100644
--- ./libavcodec/arm/sbcdsp_neon.S
+++ ./libavcodec/arm/sbcdsp_neon.S
@@ -38,49 +38,49 @@ function ff_sbc_analyze_4_neon, export=1
/* TODO: merge even and odd cases (or even merge all four calls to this
* function) in order to have only aligned reads from 'in' array
* and reduce number of load instructions */
- vld1.16 {d4, d5}, [r0, :64]!
- vld1.16 {d8, d9}, [r2, :128]!
+ vld1.16 {d16, d17}, [r0, :64]!
+ vld1.16 {d20, d21}, [r2, :128]!

- vmull.s16 q0, d4, d8
- vld1.16 {d6, d7}, [r0, :64]!
- vmull.s16 q1, d5, d9
- vld1.16 {d10, d11}, [r2, :128]!
+ vmull.s16 q0, d16, d20
+ vld1.16 {d18, d19}, [r0, :64]!
+ vmull.s16 q1, d17, d21
+ vld1.16 {d22, d23}, [r2, :128]!

- vmlal.s16 q0, d6, d10
- vld1.16 {d4, d5}, [r0, :64]!
- vmlal.s16 q1, d7, d11
- vld1.16 {d8, d9}, [r2, :128]!
+ vmlal.s16 q0, d18, d22
+ vld1.16 {d16, d17}, [r0, :64]!
+ vmlal.s16 q1, d19, d23
+ vld1.16 {d20, d21}, [r2, :128]!

- vmlal.s16 q0, d4, d8
- vld1.16 {d6, d7}, [r0, :64]!
- vmlal.s16 q1, d5, d9
- vld1.16 {d10, d11}, [r2, :128]!
+ vmlal.s16 q0, d16, d20
+ vld1.16 {d18, d19}, [r0, :64]!
+ vmlal.s16 q1, d17, d21
+ vld1.16 {d22, d23}, [r2, :128]!

- vmlal.s16 q0, d6, d10
- vld1.16 {d4, d5}, [r0, :64]!
- vmlal.s16 q1, d7, d11
- vld1.16 {d8, d9}, [r2, :128]!
+ vmlal.s16 q0, d18, d22
+ vld1.16 {d16, d17}, [r0, :64]!
+ vmlal.s16 q1, d19, d23
+ vld1.16 {d20, d21}, [r2, :128]!

- vmlal.s16 q0, d4, d8
- vmlal.s16 q1, d5, d9
+ vmlal.s16 q0, d16, d20
+ vmlal.s16 q1, d17, d21

vpadd.s32 d0, d0, d1
vpadd.s32 d1, d2, d3

vrshrn.s32 d0, q0, SBC_PROTO_FIXED_SCALE

- vld1.16 {d2, d3, d4, d5}, [r2, :128]!
+ vld1.16 {d16, d17, d18, d19}, [r2, :128]!

vdup.i32 d1, d0[1] /* TODO: can be eliminated */
vdup.i32 d0, d0[0] /* TODO: can be eliminated */

- vmull.s16 q3, d2, d0
- vmull.s16 q4, d3, d0
- vmlal.s16 q3, d4, d1
- vmlal.s16 q4, d5, d1
+ vmull.s16 q10, d16, d0
+ vmull.s16 q11, d17, d0
+ vmlal.s16 q10, d18, d1
+ vmlal.s16 q11, d19, d1

- vpadd.s32 d0, d6, d7 /* TODO: can be eliminated */
- vpadd.s32 d1, d8, d9 /* TODO: can be eliminated */
+ vpadd.s32 d0, d20, d21 /* TODO: can be eliminated */
+ vpadd.s32 d1, d22, d23 /* TODO: can be eliminated */

vst1.32 {d0, d1}, [r1, :128]

@@ -91,57 +91,57 @@ function ff_sbc_analyze_8_neon, export=1
/* TODO: merge even and odd cases (or even merge all four calls to this
* function) in order to have only aligned reads from 'in' array
* and reduce number of load instructions */
- vld1.16 {d4, d5}, [r0, :64]!
- vld1.16 {d8, d9}, [r2, :128]!
-
- vmull.s16 q6, d4, d8
- vld1.16 {d6, d7}, [r0, :64]!
- vmull.s16 q7, d5, d9
- vld1.16 {d10, d11}, [r2, :128]!
- vmull.s16 q8, d6, d10
- vld1.16 {d4, d5}, [r0, :64]!
- vmull.s16 q9, d7, d11
- vld1.16 {d8, d9}, [r2, :128]!
-
- vmlal.s16 q6, d4, d8
- vld1.16 {d6, d7}, [r0, :64]!
- vmlal.s16 q7, d5, d9
- vld1.16 {d10, d11}, [r2, :128]!
- vmlal.s16 q8, d6, d10
- vld1.16 {d4, d5}, [r0, :64]!
- vmlal.s16 q9, d7, d11
- vld1.16 {d8, d9}, [r2, :128]!
-
- vmlal.s16 q6, d4, d8
- vld1.16 {d6, d7}, [r0, :64]!
- vmlal.s16 q7, d5, d9
- vld1.16 {d10, d11}, [r2, :128]!
- vmlal.s16 q8, d6, d10
- vld1.16 {d4, d5}, [r0, :64]!
- vmlal.s16 q9, d7, d11
- vld1.16 {d8, d9}, [r2, :128]!
-
- vmlal.s16 q6, d4, d8
- vld1.16 {d6, d7}, [r0, :64]!
- vmlal.s16 q7, d5, d9
- vld1.16 {d10, d11}, [r2, :128]!
- vmlal.s16 q8, d6, d10
- vld1.16 {d4, d5}, [r0, :64]!
- vmlal.s16 q9, d7, d11
- vld1.16 {d8, d9}, [r2, :128]!
-
- vmlal.s16 q6, d4, d8
- vld1.16 {d6, d7}, [r0, :64]!
- vmlal.s16 q7, d5, d9
- vld1.16 {d10, d11}, [r2, :128]!
-
- vmlal.s16 q8, d6, d10
- vmlal.s16 q9, d7, d11
-
- vpadd.s32 d0, d12, d13
- vpadd.s32 d1, d14, d15
- vpadd.s32 d2, d16, d17
- vpadd.s32 d3, d18, d19
+ vld1.16 {d16, d17}, [r0, :64]!
+ vld1.16 {d20, d21}, [r2, :128]!
+
+ vmull.s16 q12, d16, d20
+ vld1.16 {d18, d19}, [r0, :64]!
+ vmull.s16 q13, d17, d21
+ vld1.16 {d22, d23}, [r2, :128]!
+ vmull.s16 q14, d18, d22
+ vld1.16 {d16, d17}, [r0, :64]!
+ vmull.s16 q15, d19, d23
+ vld1.16 {d20, d21}, [r2, :128]!
+
+ vmlal.s16 q12, d16, d20
+ vld1.16 {d18, d19}, [r0, :64]!
+ vmlal.s16 q13, d17, d21
+ vld1.16 {d22, d23}, [r2, :128]!
+ vmlal.s16 q14, d18, d22
+ vld1.16 {d16, d17}, [r0, :64]!
+ vmlal.s16 q15, d19, d23
+ vld1.16 {d20, d21}, [r2, :128]!
+
+ vmlal.s16 q12, d16, d20
+ vld1.16 {d18, d19}, [r0, :64]!
+ vmlal.s16 q13, d17, d21
+ vld1.16 {d22, d23}, [r2, :128]!
+ vmlal.s16 q14, d18, d22
+ vld1.16 {d16, d17}, [r0, :64]!
+ vmlal.s16 q15, d19, d23
+ vld1.16 {d20, d21}, [r2, :128]!
+
+ vmlal.s16 q12, d16, d20
+ vld1.16 {d18, d19}, [r0, :64]!
+ vmlal.s16 q13, d17, d21
+ vld1.16 {d22, d23}, [r2, :128]!
+ vmlal.s16 q14, d18, d22
+ vld1.16 {d16, d17}, [r0, :64]!
+ vmlal.s16 q15, d19, d23
+ vld1.16 {d20, d21}, [r2, :128]!
+
+ vmlal.s16 q12, d16, d20
+ vld1.16 {d18, d19}, [r0, :64]!
+ vmlal.s16 q13, d17, d21
+ vld1.16 {d22, d23}, [r2, :128]!
+
+ vmlal.s16 q14, d18, d22
+ vmlal.s16 q15, d19, d23
+
+ vpadd.s32 d0, d24, d25
+ vpadd.s32 d1, d26, d27
+ vpadd.s32 d2, d28, d29
+ vpadd.s32 d3, d30, d31

vrshr.s32 q0, q0, SBC_PROTO_FIXED_SCALE
vrshr.s32 q1, q1, SBC_PROTO_FIXED_SCALE
@@ -153,38 +153,38 @@ function ff_sbc_analyze_8_neon, export=1
vdup.i32 d1, d0[1] /* TODO: can be eliminated */
vdup.i32 d0, d0[0] /* TODO: can be eliminated */

- vld1.16 {d4, d5}, [r2, :128]!
- vmull.s16 q6, d4, d0
- vld1.16 {d6, d7}, [r2, :128]!
- vmull.s16 q7, d5, d0
- vmull.s16 q8, d6, d0
- vmull.s16 q9, d7, d0
-
- vld1.16 {d4, d5}, [r2, :128]!
- vmlal.s16 q6, d4, d1
- vld1.16 {d6, d7}, [r2, :128]!
- vmlal.s16 q7, d5, d1
- vmlal.s16 q8, d6, d1
- vmlal.s16 q9, d7, d1
-
- vld1.16 {d4, d5}, [r2, :128]!
- vmlal.s16 q6, d4, d2
- vld1.16 {d6, d7}, [r2, :128]!
- vmlal.s16 q7, d5, d2
- vmlal.s16 q8, d6, d2
- vmlal.s16 q9, d7, d2
-
- vld1.16 {d4, d5}, [r2, :128]!
- vmlal.s16 q6, d4, d3
- vld1.16 {d6, d7}, [r2, :128]!
- vmlal.s16 q7, d5, d3
- vmlal.s16 q8, d6, d3
- vmlal.s16 q9, d7, d3
-
- vpadd.s32 d0, d12, d13 /* TODO: can be eliminated */
- vpadd.s32 d1, d14, d15 /* TODO: can be eliminated */
- vpadd.s32 d2, d16, d17 /* TODO: can be eliminated */
- vpadd.s32 d3, d18, d19 /* TODO: can be eliminated */
+ vld1.16 {d16, d17}, [r2, :128]!
+ vmull.s16 q12, d16, d0
+ vld1.16 {d18, d19}, [r2, :128]!
+ vmull.s16 q13, d17, d0
+ vmull.s16 q14, d18, d0
+ vmull.s16 q15, d19, d0
+
+ vld1.16 {d16, d17}, [r2, :128]!
+ vmlal.s16 q12, d16, d1
+ vld1.16 {d18, d19}, [r2, :128]!
+ vmlal.s16 q13, d17, d1
+ vmlal.s16 q14, d18, d1
+ vmlal.s16 q15, d19, d1
+
+ vld1.16 {d16, d17}, [r2, :128]!
+ vmlal.s16 q12, d16, d2
+ vld1.16 {d18, d19}, [r2, :128]!
+ vmlal.s16 q13, d17, d2
+ vmlal.s16 q14, d18, d2
+ vmlal.s16 q15, d19, d2
+
+ vld1.16 {d16, d17}, [r2, :128]!
+ vmlal.s16 q12, d16, d3
+ vld1.16 {d18, d19}, [r2, :128]!
+ vmlal.s16 q13, d17, d3
+ vmlal.s16 q14, d18, d3
+ vmlal.s16 q15, d19, d3
+
+ vpadd.s32 d0, d24, d25 /* TODO: can be eliminated */
+ vpadd.s32 d1, d26, d27 /* TODO: can be eliminated */
+ vpadd.s32 d2, d28, d29 /* TODO: can be eliminated */
+ vpadd.s32 d3, d30, d31 /* TODO: can be eliminated */

vst1.32 {d0, d1, d2, d3}, [r1, :128]

18 changes: 18 additions & 0 deletions cross/zvbi/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
PKG_NAME = zvbi
PKG_VERS = 0.2.35
PKG_EXT = tar.bz2
PKG_DIST_NAME = $(PKG_NAME)-$(PKG_VERS).$(PKG_EXT)
PKG_DIST_SITE = https://downloads.sourceforge.net/project/zapping/$(PKG_NAME)/$(PKG_VERS)
PKG_DIR = $(PKG_NAME)-$(PKG_VERS)

DEPENDS = cross/libpng

HOMEPAGE = https://zapping.sf.net/ZVBI
COMMENT = Vertical Blanking Interval capture and decoding library
LICENSE = GPL

GNU_CONFIGURE = 1

PATCHES_LEVEL = 1

include ../../mk/spksrc.cross-cc.mk
10 changes: 10 additions & 0 deletions cross/zvbi/PLIST
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
bin:bin/zvbi-atsc-cc
bin:bin/zvbi-chains
bin:bin/zvbi-ntsc-cc
lnk:lib/libzvbi-chains.so
lnk:lib/libzvbi-chains.so.0
lib:lib/libzvbi-chains.so.0.0.0
lnk:lib/libzvbi.so
lnk:lib/libzvbi.so.0
lib:lib/libzvbi.so.0.13.2
bin:sbin/zvbid
3 changes: 3 additions & 0 deletions cross/zvbi/digests
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
zvbi-0.2.35.tar.bz2 SHA1 b0fc8d596c90d603e883e6b195318c6b276a3eb4
zvbi-0.2.35.tar.bz2 SHA256 fc883c34111a487c4a783f91b1b2bb5610d8d8e58dcba80c7ab31e67e4765318
zvbi-0.2.35.tar.bz2 MD5 95e53eb208c65ba6667fd4341455fa27
Loading