-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize integer reading using bmi2 instructions #4
Comments
Unclear how big the benefit actually would be. Many integers and especially enum identifiers would be small so the small bytewise loop should actually be faster. The optimization would require an additional branch to make sure we can read 4 or 8 bytes and fallback if not. For writing to a vec, bmi2 could be more beneficial since could always write bigger chunks and then adjust the len afterwards. |
For reference, code for branchless decoding of up to 56 bits using bmi2 instructions. /// Reads an uleb128 encoded integer with at most 56 bits (8 bytes with 7 bits worth of payload each).
/// Returns the integer and the number of bytes that made up this integer.
/// If the returned length is bigger than 8 this means the integer required more than 8 bytes and the remaining bytes need to be read sequentially and combined with the return value.
/// Safety: `data` needs to contain at least 8 bytes.
#[target_feature(enable="bmi2")]
pub unsafe fn decode_uleb_bmi2(data: &[u8]) -> (u64, usize) {
const CONT_MARKER: u64 = 0x80808080_80808080;
debug_assert!(data.len() >= 8);
unsafe {
let word = data.as_ptr().cast::<u64>().read_unaligned();
// mask indicating continuation bytes
let mask = std::arch::x86_64::_pext_u64(word, CONT_MARKER);
let len = (!mask).trailing_zeros() + 1;
// which payload bits to extract
let ext = std::arch::x86_64::_bzhi_u64(!CONT_MARKER, 8*len);
let payload = std::arch::x86_64::_pext_u64(word, ext);
(payload, len as _)
}
} |
Branchless uleb encoding for the writing side (not yet tested): const REQUIRED_BYTES: [u8; 33] = [
1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5
];
/// Safety: `data` needs to have space for at least 8 bytes.
#[target_feature(enable="bmi2")]
pub unsafe fn encode_uleb_bmi2(value: u32, output: &mut [u8]) -> usize {
const CONT_MARKER: u64 = 0x80808080_80808080;
const EXPAND_MASK: u64 = 0x7F7F7F_7F7F7F7F;
unsafe {
let required_bits = 32-value.leading_zeros();
//let required_bytes = required_bits/7 + 1;
let required_bytes = REQUIRED_BYTES[required_bits as usize];
let expanded = std::arch::x86_64::_pdep_u64(value as u64, EXPAND_MASK);
let marker = std::arch::x86_64::_bzhi_u64(CONT_MARKER, required_bytes as u32 * 8);
let marked = expanded | marker;
output.get_unchecked_mut(0..8).copy_from_slice(&marked.to_le_bytes());
required_bytes as _
}
} |
Skipping numbers should to be much simpler using sse2: /// Safety: `input` needs to contains at least 16 bytes
unsafe fn skip_uleb_sse2(input: &[u8]) -> &[u8] {
use std::arch::x86_64::*;
let chunk = _mm_loadu_si128(input.as_ptr().cast());
let mask = !_mm_movemask_epi8(chunk);
let len = (mask.trailing_zeros() as usize) + 1;
input.get_unchecked(len..)
} |
No description provided.
The text was updated successfully, but these errors were encountered: