Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize integer reading using bmi2 instructions #4

Open
jhorstmann opened this issue May 25, 2024 · 4 comments
Open

Optimize integer reading using bmi2 instructions #4

jhorstmann opened this issue May 25, 2024 · 4 comments

Comments

@jhorstmann
Copy link
Owner

No description provided.

@jhorstmann
Copy link
Owner Author

jhorstmann commented May 27, 2024

Unclear how big the benefit actually would be. Many integers and especially enum identifiers would be small so the small bytewise loop should actually be faster. The optimization would require an additional branch to make sure we can read 4 or 8 bytes and fallback if not.

For writing to a vec, bmi2 could be more beneficial since could always write bigger chunks and then adjust the len afterwards.

@jhorstmann
Copy link
Owner Author

For reference, code for branchless decoding of up to 56 bits using bmi2 instructions.

/// Reads an uleb128 encoded integer with at most 56 bits (8 bytes with 7 bits worth of payload each).
/// Returns the integer and the number of bytes that made up this integer.
/// If the returned length is bigger than 8 this means the integer required more than 8 bytes and the remaining bytes need to be read sequentially and combined with the return value.
/// Safety: `data` needs to contain at least 8 bytes.
#[target_feature(enable="bmi2")]
pub unsafe fn decode_uleb_bmi2(data: &[u8]) -> (u64, usize) {
    const CONT_MARKER: u64 = 0x80808080_80808080;
    debug_assert!(data.len() >= 8);
    
    unsafe {
        let word = data.as_ptr().cast::<u64>().read_unaligned();
        // mask indicating continuation bytes
        let mask = std::arch::x86_64::_pext_u64(word, CONT_MARKER);
        let len = (!mask).trailing_zeros() + 1;
        // which payload bits to extract
        let ext = std::arch::x86_64::_bzhi_u64(!CONT_MARKER, 8*len);
        let payload = std::arch::x86_64::_pext_u64(word, ext);
        
        (payload, len as _)
    }
}

@jhorstmann
Copy link
Owner Author

Branchless uleb encoding for the writing side (not yet tested):

const REQUIRED_BYTES: [u8; 33] = [
    1, 1, 1, 1, 1, 1, 1,
    2, 2, 2, 2, 2, 2, 2,
    3, 3, 3, 3, 3, 3, 3,
    4, 4, 4, 4, 4, 4, 4,
    5, 5, 5, 5, 5
];

/// Safety: `data` needs to have space for at least 8 bytes.
#[target_feature(enable="bmi2")]
pub unsafe fn encode_uleb_bmi2(value: u32, output: &mut [u8]) -> usize {
    const CONT_MARKER: u64 = 0x80808080_80808080;
    const EXPAND_MASK: u64 = 0x7F7F7F_7F7F7F7F;
    unsafe {
        let required_bits = 32-value.leading_zeros();

        //let required_bytes = required_bits/7 + 1;
        let required_bytes = REQUIRED_BYTES[required_bits as usize];
        let expanded = std::arch::x86_64::_pdep_u64(value as u64, EXPAND_MASK);
        let marker = std::arch::x86_64::_bzhi_u64(CONT_MARKER, required_bytes as u32 * 8);

        let marked = expanded | marker;
        output.get_unchecked_mut(0..8).copy_from_slice(&marked.to_le_bytes());
        required_bytes as _
    }
}

@jhorstmann
Copy link
Owner Author

Skipping numbers should to be much simpler using sse2:

/// Safety: `input` needs to contains at least 16 bytes
unsafe fn skip_uleb_sse2(input: &[u8]) -> &[u8] {
    use std::arch::x86_64::*;
    let chunk = _mm_loadu_si128(input.as_ptr().cast());
    let mask = !_mm_movemask_epi8(chunk);
    let len = (mask.trailing_zeros() as usize) + 1;
    input.get_unchecked(len..)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant