Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(decode): optimize Vec alloc/resize #179

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 7 additions & 5 deletions src/decode.rs
Original file line number Diff line number Diff line change
Expand Up @@ -105,8 +105,7 @@ pub fn decode_engine<E: Engine, T: AsRef<[u8]>>(
input: T,
engine: &E,
) -> Result<Vec<u8>, DecodeError> {
let mut buffer = Vec::<u8>::with_capacity(input.as_ref().len() * 4 / 3);

let mut buffer = Vec::new();
decode_engine_vec(input, &mut buffer, engine).map(|_| buffer)
}

Expand Down Expand Up @@ -156,11 +155,14 @@ pub fn decode_engine_vec<'e, 'o, E: Engine, T: AsRef<[u8]>>(
let starting_output_len = buffer.len();

let estimate = engine.decoded_length_estimate(input_bytes.len());
let total_len_estimate = estimate
.decoded_length_estimate()
let len_estimate = estimate.decoded_length_estimate();
let total_len_estimate = len_estimate
.checked_add(starting_output_len)
.expect("Overflow when calculating output buffer length");
buffer.resize(total_len_estimate, 0);
buffer.reserve(len_estimate);
unsafe {
buffer.set_len(total_len_estimate);
}
Comment on lines +162 to +165
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safety documentation of set_len is pretty clear that this is UB. Namely it states

The elements at old_len..new_len must be initialized.

But total_len_estimate = starting_output_len + len_estimate, which means the last len_estimate elements are not properly initialized.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonasbb It's not UB or unsafe in aggregate since write into that "unintialized" portion of the buffer and then truncate to the true length.

This avoids the overhead of callocing/initializing a buffer that we'll immediately overwrite

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation mentions that as a safety requirement, and this code is not upholding it. Later, the code also creates a reference to this uninitialized memory, which is also not allowed.
Some functions like decode could also panic and unwind before truncate could set the correct length. Now, the drop implementation of Vec might potentially read the uninitialized values.

If you really need uninitialized memory you need to write to it via a pointer and update the length only after writing. Alternatively, using MaybeUninit is an option. If the buffer would be of type Vec<MaybeUninit<u8>>, then I think the set_len would be ok, but it isn't.


let bytes_written;
{
Expand Down
2 changes: 1 addition & 1 deletion src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,7 @@
variant_size_differences,
warnings
)]
#![forbid(unsafe_code)]
// #![forbid(unsafe_code)]
#![cfg_attr(not(any(feature = "std", test)), no_std)]

#[cfg(all(feature = "alloc", not(any(feature = "std", test))))]
Expand Down