Floating-Point Arithmetic Library for Z80
https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
FEDC BA98 7654 3210
eeee eeee Smmm mmmm
S (bit[7]): Sign bit
e (bits[F:8]): Biased exponent
m (bits[6:0]): Fraction
BIAS = $7F = 127
MIN = 2^-127 * 1 = 5.8774717541114375×10⁻³⁹
MAX = 2^+128 * 1.9921875 = 6.779062778503071×10³⁸
floating-point number = (-1)^S * 2^(eeee eeee - BIAS) * ( 128 + mmm mmmm) / 128
Sign position is moved from the original.
https://github.com/nagydani/lpfp
FEDC BA98 7654 3210
Seee eeee mmmm mmmm
S (bit[F]): Sign bit
e (bits[E:8]): Biased exponent
m (bits[7:0]): Fraction
BIAS = $40 = 64
MIN = 2^-64 * 1 = 5.421010862427522170037264×10⁻²⁰
MAX = 2^+63 * 1.99609375 = 1.8410715276690587648×10¹⁹
floating-point number = (-1)^S * 2^(eee eeee - BIAS) * ( 256 + mmmm mmmm) / 256
https://en.wikipedia.org/wiki/Half-precision_floating-point_format
FEDC BA98 7654 3210
Seee eemm mmmm mmmm
S (bit[F]): Sign bit
e (bits[E:A]): Biased exponent
m (bits[9:0]): Fraction
BIAS = $F = 15
MIN = 2^-15 * 1 = 0.000030517578125
MAX = 2^+16 * 1.9990234375 = 131008
floating-point number = (-1)^S * 2^(ee eee - BIAS) * ( 1024 + mm mmmm mmmm) / 1024
Without support for infinity, zero, and NaN.
if ( a + b > MAX ) { res = MAX; set carry }
if ( a - b < MIN ) { res = MIN; set carry }
Rounding of lost bits is (no matter what the sign):
if ( lost_bits <= ± least_significant_bit / 2 ) Result = Result;
if ( lost_bits > ± least_significant_bit / 2 ) Result = Result ± least_significant_bit;
mmmm 0000 .. mmmm 1000 => mmmm + 0
mmmm 1001 .. mmmm 1111 => mmmm + 1
finit.asm
must be included manually BEFORE first use, ideally as the first file. It contains only definitions of constants and does not store anything in memory.
Macro files must also be included BEFORE first use. Ideally behind finit.asm. Even those that will not be used because they contain only definitions.
For example, if you need to use a DIV operation, you must include the fdiv.asm file anywhere. The DIV math operation needs to use MUL internally, so it includes the fmul.asm file itself. The library is designed for the fastest calculations, so it uses pre-calculated tables for division and multiplication. These must be completely divisible at 256 addresses. So the fdiv.tab file must also be included manually. The fmul.tab file will be included automatically in fdiv.tab.
In short: if you need to use an operation
, manually include the file foperation.asm
, possibly foperation.tab
.
In binary16 floating-point format, the fln.tab
(natural logarithmic auxiliary tables, part LN2_EXP) is not divisible by 256. It is best to include them last.
In binary16 floating-point format, the fexp.tab
(natural exponential function auxiliary tables) is not divisible by 256. It is best to include them last.
call fadd ; HL = HL + DE
call faddp ; HL = HL + DE ( HL and DE have the same signs )
call fsub ; HL = HL - DE
call fsubp ; HL = HL - DE ( HL and DE have the same signs )
call fmul ; HL = BC * DE ( needs to include fmul.tab )
call fdiv ; HL = BC / HL ( needs to include fdiv.tab )
call fmod ; HL = BC % HL
call fsqdif ; HL = HL^2 - DE^2 = (HL - DE) * (HL + DE)
call fln ; HL = ln(abs(HL)) ( needs to include fln.tab )
call fexp ; HL = e^HL ( needs to include fexp.tab )
call fpow2 ; HL = HL * HL ( needs to include fpow2.tab)
call fsqrt ; HL = abs(HL) ^ 0.5 ( needs to include fsqrt.tab )
call fsin ; HL = sin(HL) ( needs to include fsin.tab, HL < ±π/2 )
call frac ; HL = HL % 1 ( -0 => +0 (FMMIN => FPMIN) )
call fint ; HL = truncate(HL) * 1.0
; = HL - ( HL % 1 )
call fwld ; HL = (unsigned word) HL * 1.0
call fwst ; HL = (unsigned word) 0.5 + abs(HL)
call fbld ; DE = (unsigned char) A * 1.0
call fbld_div2 ; DE = (unsigned char) A * 0.5
call fbld_div4 ; DE = (unsigned char) A * 0.25
call fbld_div16 ; DE = (unsigned char) A * 0.0625
call fcmp ; set flag for HL - DE
call fcmpa ; set flag for abs(HL) - abs(DE)
call fcmps ; set flag for HL - DE ( HL and DE have the same signs )
call fdot ; dot product BC = (HL) * (DE) ( A = dimensions, HL += 2*A, DE += 2*A )
call fdot_rec ; dot product BC = (HL) * (DE) ( A = dim., use recursion, HL and DE = ?? )
call flen ; HL = (HL)^2 + ... + (HL+2*B-2)^2 ( B = dim. ≥ 1, square norm of a vector )
MACROS (must be included before first use):
mtst H, L ; set flag for (HL - 0)
; if ( result == ±MIN ) set zero flag;
; if ( result <= -MIN ) set carry flag;
mcmpa H, L, D, E ; set flag for abs(HL) - abs(DE)
mcmps H, L, D, E ; set flag for HL - DE ( HL and DE have the same signs )
mneg H, L ; HL = -HL
mabs H, L ; HL = abs(HL)
mge0 H, L ; if (HL >= 0) set zero flag;
msor H, L, D, E ; (BIT 7, A) = HL_sign or DE_sign
mmul2 H, L ; HL = HL * 2
mdiv2 H, L ; HL = HL / 2
other
call print_text ; printf("%s", POP); ( look at PRINT_GROUNDI in ./Demo/print.asm )
call print_bin ; printf("+(2^+exp)*1.mant issa", HL);
call print_hex ; printf("$%04X", HL);
call print_hex_DE ; printf("$%04X", DE); ( print_hex.asm )
call print_hex_BC ; printf("$%04X", BC); ( print_hex.asm )
call print_hex_DE ; printf("$%04X", DE); ( print_hex.asm )
push DE
call print_hex_stack ; printf("$%04X", POP); ( print_hex.asm )
color_flow_warning EQU 0
carry_flow_warning EQU 1
binary16 danagy bfloat use
-------- -------- -------- --------
fadd+fsub: 369 177 188
fadd: 365 173 184
fsub: 369 177 188
fmul+fdiv: 10453 1931 1422 fmul.tab + fdiv.tab
fmul: 8322 1629 1108 fmul.tab
fdiv: 10453 1931 1422 fdiv.tab (include itself fmul.tab)
fln (fix_ln EQU 0): 2511 966 976 fln.tab
fln (fix_ln EQU 1): 3551 1489 1504 fln.tab
fexp: 8525 2201 1683 fexp.tab (include itself fmul.tab)
fsin: 2177 829 544 fsin.tab
fmod: 118 69 77
fpow2: 8333 279 158 fpow2.tab
fsqrt: 4120 527 269 fsqrt.tab
frac: 55 31 33
fint: 38 23 23
fwld: 57 32 37
fwst: 53 49 46
fbld: 18 16 17
all: 21050 5858 4844
More information in *.dat files.
FDIV ------------------ |
binary16 | danagy | bfloat | comment |
---|---|---|---|---|
± 1 | 20.56% | 20.11% | 19.16% | BC / HL counted as BC * (1 / HL) |
± more | accurate | accurate | accurate |
FLN fix_ln EQU 0 |
binary16 | danagy | bfloat | comment |
---|---|---|---|---|
± 1 | 29.93% | 24.07% | 25.09% | |
± 2 | 0.30% | 0.05% | 0.05% | only when input exponent = -1 |
± 3 | 0.19% | 0.03% | 0.03% | only when input exponent = -1 |
min..max | 0.48% | 0.07% | 0.05% | only when input exponent = -1 |
min | -59 | -10 | 4 | correctly-result |
max | 6 | 7 | 7 | correctly-result |
± more | accurate | accurate | accurate |
FLN fix_ln EQU 1 |
binary16 | danagy | bfloat | comment |
---|---|---|---|---|
± 1 | 28.53% | 23.90% | 24.60% | Input with exponent -1, is corrected. |
± more | accurate | accurate | accurate |
FEXP ------------------ |
binary16 | danagy | bfloat | comment |
---|---|---|---|---|
± 1 | 16.73% | 4.52% | 2.40% | |
± 2 | 0.91% | 0.56% | 0.34% | |
± 3 | 0.05% | 0.05% | 0.02% | |
± more | accurate | accurate | accurate |
e^(ln(2)) = e^0.6931471805599453094172321 = 2
2^(log₂(e)) = 2^1.4426950408889634073599247 = e
e^x = 2^(x*1.4426950408889634073599247)
FSIN ------------------ |
binary16 | danagy | bfloat | comment |
---|---|---|---|---|
± 1 | 0.26% | accurate | accurate | |
± 2 | 0.03% | accurate | accurate | |
± more | accurate | accurate | accurate |
Other ------------------ |
binary16 | danagy | bfloat | comment |
---|---|---|---|---|
accurate | accurate | accurate |
Warning: Add and subtract operations are implemented accurately, which means that the error is less than half the least significant bit. However, these two operations are the source of the most inaccurate. The width of the mantissa is limited, so a smaller number will lose some or all of the mantissa.
(A + B) - B = A
only applies if A >= B
When A < B:
(A + B) - B = A
if the lost lowest bits of the mantissa number A after operation (A + B) were zero.
(A + B) - B = A - error
where the error is equal to the lost bits of the mantissa number A after the operation (A + B).
(A + B) - B = 0
if the number B is too large and the number A lost all mantissa bits after the operation (A + B). This means A + B = B.
80-bits, 32-bit and 24-bit float formats https://github.com/Zeda/z80float
24-bit float format https://github.com/RoaldFre/z80fltptlib
16-bit float format (seee eeee mmmm mmmm) https://github.com/nagydani/lpfp
16-bit float format (seee eemm mmmm mmmm) https://github.com/artyom-beilis/float16
Learn more about algorithms (natural) exponential function http://guihaire.com/code/?p=1107
http://www.andreadrian.de/oldcpu/Z80_number_cruncher.html
http://mathonweb.com/help_ebook/html/algorithms.htm