Skip to content

Commit

Permalink
Merge pull request #27 from bodil/safe-layout
Browse files Browse the repository at this point in the history
Stop relying on undefined behaviour.

`smartstring` now implements its own boxed string type rather than deferring directly to `String`, so it no longer makes assumptions it shouldn't be making about the layout of the `String` struct.

This also allows us to organise the boxed struct in a way that will let us rely only on our basic assumption that heap memory is word aligned on both big and little endian architectures. The most immediate consequence of this is that `smartstring` will now compile on 32-bit big endian architectures such as `mips`.

Closes #4 #8 #14
  • Loading branch information
bodil authored Feb 23, 2022
2 parents 22f9049 + 0b1aa69 commit dd4495f
Show file tree
Hide file tree
Showing 17 changed files with 652 additions and 633 deletions.
26 changes: 24 additions & 2 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ jobs:
rust:
- stable
- nightly
- 1.56.0 # lowest supported version
- 1.57.0 # lowest supported version
flags:
- --all-features
- --no-default-features
Expand All @@ -38,7 +38,7 @@ jobs:
rust:
- stable
- nightly
- 1.56.0 # lowest supported version
- 1.57.0 # lowest supported version
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
Expand All @@ -51,6 +51,27 @@ jobs:
command: test
args: --all-features

nostd:
name: no_std build
runs-on: ubuntu-latest
strategy:
matrix:
rust:
- stable
- nightly
- 1.57.0 # lowest supported version
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: ${{ matrix.rust }}
override: true
- uses: actions-rs/cargo@v1
with:
command: build
args: --no-default-features

fmt:
name: Rustfmt
runs-on: ubuntu-latest
Expand Down Expand Up @@ -88,6 +109,7 @@ jobs:
name: Clippy-${{ matrix.rust }}
token: ${{ secrets.GITHUB_TOKEN }}
args: --all-features

# miri:
# name: Miri
# runs-on: ubuntu-latest
Expand Down
21 changes: 21 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,27 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project
adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]

### CHANGED

- `smartstring` now implements its own boxed string type rather than deferring directly to
`String`, so it no longer makes assumptions it shouldn't be making about the layout of the
`String` struct. This also allows us to organise the boxed struct in a way that will let us rely
only on our basic assumption that heap memory is word aligned on both big and little endian
architectures. The most immediate consequence of this is that `smartstring` will now compile on
32-bit big endian architectures such as `mips`.

In short: `smartstring` no longer relies on undefined behaviour, and should be safe to use
anywhere.

- The above means that the boxed `SmartString` is no longer pointer compatible with `String`, so
if you were relying on that despite the documentation telling you not to, you'll really have to
stop it now. Converting between `SmartString` and `String` using `From` and `Into` traits is
still efficient and allocation free.

- The minimum supported rustc version is now 1.57.0.

## [0.2.10] - 2022-02-20

### CHANGED
Expand Down
2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ readme = "./README.md"
categories = ["data-structures"]
keywords = ["cache-local", "cpu-cache", "small-string", "sso", "inline-string"]
exclude = ["release.toml", "proptest-regressions/**"]
rust-version = "1.56"
rust-version = "1.57"

[package.metadata.docs.rs]
features = ["arbitrary", "proptest", "serde"]
Expand Down
42 changes: 3 additions & 39 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,5 @@
# smartstring

[![Travis CI](https://travis-ci.org/bodil/smartstring.svg?branch=master&status=passed)](https://travis-ci.org/github/bodil/smartstring)

Compact inlined strings.

## tl;dr
Expand All @@ -18,42 +16,9 @@ beyond the inline capacity. This has the advantage of avoiding heap allocations
well as improving performance thanks to keeping the strings on the stack.

This is all accomplished without the need for an external discriminant, so a `SmartString` is
exactly the same size as a `String` on the stack, regardless of whether it's inlined or not, and
when not inlined it's pointer compatible with `String`, meaning that you can safely coerce a
`SmartString` to a `String` using `std::mem::replace()` or `pointer::cast()` and go on using it as
if it had never been a `SmartString`. (But please don't do that, there's an `Into<String>`
implementation that's much safer.)

## Supported architectures
`smartstring` currently doesn't run on 32-bit big endian architectures like `powerpc`, so its use
in any crates that intend to run on those architectures should ideally be gated behind a
[platform specific dependency](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#platform-specific-dependencies)
in your `Cargo.toml`, like so:
```toml
[target.'cfg(not(all(target_endian = "big", target_pointer_width = "32")))'.dependencies]
smartstring = "0.2"
```

This will ensure that `cargo` does not try to build `smartstring` on these unsupported
architectures, which will otherwise [always fail](https://github.com/bodil/smartstring/blob/v0.2.9/src/config.rs#L91-L93).

## Caveat

The way `smartstring` gets by without a discriminant is dependent on the memory layout of the
`std::string::String` struct, which isn't something the Rust compiler and standard library make any
guarantees about. `smartstring` makes an assumption about how it's been laid out, which has held
basically since rustc came into existence, but is nonetheless not a safe assumption to make, and if
the layout ever changes, `smartstring` will stop working properly (at least on little-endian
architectures, the assumptions made on big-endian archs will hold regardless of the actual memory
layout). Its test suite does comprehensive validation of these assumptions, and as long as the
[CI build](https://travis-ci.org/github/bodil/smartstring) is passing for any given rustc version,
you can be sure it will do its job properly on all tested architectures. You can also check out the
`smartstring` source tree yourself and run `cargo test` to validate it for your particular
configuration.

As an extra precaution, some runtime checks are made as well, so that if the memory layout
assumption no longer holds, `smartstring` will not work correctly, but there should be no security
implications and it should crash early.
exactly the same size as a `String` on the stack, regardless of whether it's inlined or not.
Converting a heap allocated `SmartString` into a `String` and vice versa is also a zero cost
operation, as one will reuse the allocated memory of the other.

## Documentation

Expand All @@ -71,5 +36,4 @@ was not distributed with this file, You can obtain one at <http://mozilla.org/MP
Please note that this project is released with a [Contributor Code of Conduct][coc]. By
participating in this project you agree to abide by its terms.

[immutable.rs]: https://immutable.rs/
[coc]: https://github.com/bodil/sized-chunks/blob/master/CODE_OF_CONDUCT.md
2 changes: 2 additions & 0 deletions proptest-regressions/test.txt
Original file line number Diff line number Diff line change
Expand Up @@ -7,3 +7,5 @@
cc 371b459819b92f730318ad7fb8b15be10ac03393f9eb776d966a9af8bc489ae9 # shrinks to constructor = New, actions = [InsertStr(1, "")]
cc 5f343832f658239791b2754c8f1ec82e4caa9efb34bf1393636df2480ec9f176 # shrinks to constructor = New, actions = [InsertStr(1, "")]
cc b11c06f9d964d4fd4d4b6e36a7b04c383138422ff7f0f1d37d0c706de451d770 # shrinks to constructor = New, actions = [PushStr("{%:A¥%🕴🕴"), PushStr("%{{%¥{"), PushStr("?:%"), PushStr("🕴"), PushStr("%{?2"), Retain("?{¥:2🕴%")]
cc 36b6f0fa95e8925cda11c176d3f606208e8085d3367c74c2a5f6df0538277b7a # shrinks to constructor = FromString("AΣA א \u{16af0}אff𑌓"), actions = [InsertStr(6, "")]
cc 746a6d4c7bc53760e936eb5b7c332a9228f0a5209abd9538685e53c04d26ac71 # shrinks to constructor = New, actions = [PushStr("00𐲀Ὑ𞺋 🡐\u{abc}a0"), InsertStr(3, "")]
4 changes: 4 additions & 0 deletions src/arbitrary.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

use crate::{SmartString, SmartStringMode};
use alloc::string::String;
use arbitrary::{Arbitrary, Result, Unstructured};
Expand Down
183 changes: 149 additions & 34 deletions src/boxed.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,57 +2,172 @@
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

use alloc::string::String;
use core::cmp::Ordering;
use alloc::{alloc::Layout, string::String};
use core::{
mem::forget,
ops::{Deref, DerefMut},
ptr::NonNull,
};

#[allow(unreachable_pub)]
pub trait BoxedString {
fn string(&self) -> &String;
fn string_mut(&mut self) -> &mut String;
fn into_string(self) -> String;
use crate::{ops::GenericString, MAX_INLINE};

fn cmp_with_str(&self, other: &str) -> Ordering;
fn cmp_with_self(&self, other: &Self) -> Ordering;
fn eq_with_str(&self, other: &str) -> bool;
fn eq_with_self(&self, other: &Self) -> bool;
#[cfg(not(endian = "big"))]
#[repr(C)]
pub(crate) struct BoxedString {
ptr: NonNull<u8>,
cap: usize,
len: usize,
}

#[cfg(endian = "big")]
#[repr(C)]
pub(crate) struct BoxedString {
length: usize,
cap: usize,
ptr: NunNull<u8>,
}

fn len(&self) -> usize {
self.string().len()
impl GenericString for BoxedString {
fn set_size(&mut self, size: usize) {
self.len = size;
debug_assert!(self.len <= self.cap);
}

fn as_mut_capacity_slice(&mut self) -> &mut [u8] {
#[allow(unsafe_code)]
unsafe {
core::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.capacity())
}
}
}

impl BoxedString for String {
#[inline(always)]
fn string(&self) -> &String {
self
impl BoxedString {
const MINIMAL_CAPACITY: usize = MAX_INLINE * 2;

fn layout_for(cap: usize) -> Layout {
let layout = Layout::array::<u8>(cap).unwrap();
assert!(
layout.size() <= isize::MAX as usize,
"allocation too large!"
);
layout
}

fn alloc(cap: usize) -> NonNull<u8> {
let layout = Self::layout_for(cap);
#[allow(unsafe_code)]
let ptr = unsafe { alloc::alloc::alloc(layout) };
match NonNull::new(ptr) {
Some(ptr) => ptr,
None => alloc::alloc::handle_alloc_error(layout),
}
}

fn realloc(&mut self, cap: usize) {
let layout = Self::layout_for(cap);
let old_layout = Self::layout_for(self.cap);
let old_ptr = self.ptr.as_ptr();
#[allow(unsafe_code)]
let ptr = unsafe { alloc::alloc::realloc(old_ptr, old_layout, layout.size()) };
self.ptr = match NonNull::new(ptr) {
Some(ptr) => ptr,
None => alloc::alloc::handle_alloc_error(layout),
};
self.cap = cap;
}

#[inline(always)]
fn string_mut(&mut self) -> &mut String {
self
pub(crate) fn ensure_capacity(&mut self, target_cap: usize) {
let mut cap = self.cap;
while cap < target_cap {
cap *= 2;
}
self.realloc(cap)
}

fn into_string(self) -> String {
self
pub(crate) fn new(cap: usize) -> Self {
let cap = cap.max(Self::MINIMAL_CAPACITY);
Self {
cap,
len: 0,
ptr: Self::alloc(cap),
}
}

#[inline(always)]
fn cmp_with_str(&self, other: &str) -> Ordering {
self.as_str().cmp(other)
pub(crate) fn from_str(cap: usize, src: &str) -> Self {
let mut out = Self::new(cap);
out.len = src.len();
out.as_mut_capacity_slice()[..src.len()].copy_from_slice(src.as_bytes());
out
}

#[inline(always)]
fn cmp_with_self(&self, other: &Self) -> Ordering {
self.cmp(other)
pub(crate) fn capacity(&self) -> usize {
self.cap
}

#[inline(always)]
fn eq_with_str(&self, other: &str) -> bool {
self == other
pub(crate) fn shrink_to_fit(&mut self) {
self.realloc(self.len);
}
}

impl Drop for BoxedString {
fn drop(&mut self) {
#[allow(unsafe_code)]
unsafe {
alloc::alloc::dealloc(self.ptr.as_ptr(), Self::layout_for(self.cap))
}
}
}

impl Clone for BoxedString {
fn clone(&self) -> Self {
Self::from_str(self.capacity(), self.deref())
}
}

impl Deref for BoxedString {
type Target = str;

fn deref(&self) -> &Self::Target {
#[allow(unsafe_code)]
unsafe {
core::str::from_utf8_unchecked(core::slice::from_raw_parts(self.ptr.as_ptr(), self.len))
}
}
}

impl DerefMut for BoxedString {
fn deref_mut(&mut self) -> &mut Self::Target {
#[allow(unsafe_code)]
unsafe {
core::str::from_utf8_unchecked_mut(core::slice::from_raw_parts_mut(
self.ptr.as_ptr(),
self.len,
))
}
}
}

impl From<String> for BoxedString {
fn from(mut s: String) -> Self {
if s.is_empty() {
Self::new(s.capacity())
} else {
// TODO: Use String::into_raw_parts when stabilised, meanwhile let's get unsafe
let len = s.len();
let cap = s.capacity();
#[allow(unsafe_code)]
let ptr = unsafe { NonNull::new_unchecked(s.as_mut_ptr()) };
forget(s);
Self { cap, len, ptr }
}
}
}

#[inline(always)]
fn eq_with_self(&self, other: &Self) -> bool {
self == other
impl From<BoxedString> for String {
fn from(s: BoxedString) -> Self {
#[allow(unsafe_code)]
let out = unsafe { String::from_raw_parts(s.ptr.as_ptr(), s.len(), s.capacity()) };
forget(s);
out
}
}
14 changes: 7 additions & 7 deletions src/casts.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,19 @@
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

use crate::{inline::InlineString, SmartStringMode};
use crate::{boxed::BoxedString, inline::InlineString};

pub(crate) enum StringCast<'a, Mode: SmartStringMode> {
Boxed(&'a Mode::BoxedString),
pub(crate) enum StringCast<'a> {
Boxed(&'a BoxedString),
Inline(&'a InlineString),
}

pub(crate) enum StringCastMut<'a, Mode: SmartStringMode> {
Boxed(&'a mut Mode::BoxedString),
pub(crate) enum StringCastMut<'a> {
Boxed(&'a mut BoxedString),
Inline(&'a mut InlineString),
}

pub(crate) enum StringCastInto<Mode: SmartStringMode> {
Boxed(Mode::BoxedString),
pub(crate) enum StringCastInto {
Boxed(BoxedString),
Inline(InlineString),
}
Loading

0 comments on commit dd4495f

Please sign in to comment.