This document is an attempt to describe in reasonable detail the general architecture of the read-fonts and write-fonts crates, focusing specifically on parts that are auto-generated.
note:
At various points in this document I will make use of blockquotes (like this one) to highlight particular aspects of the design that may be interesting, confusing, or require refinement.
These two crates can be thought of as siblings, and they both follow the same basic high-level design pattern: they contain a set of generated types, mapping as closely as possible to the types in the OpenType spec, alongside hand-written code that uses and is used by those types.
The read-fonts crate is focused on efficient read access and parsing, and the write-fonts crate is focused on compilation. The two crates contain a parallel tables module, with a nearly identical set of type definitions: for instance, both crates contain a tables::name::NameRecord type.
We will examine each of these crates separately.
Although this writeup is focused specifically on the code we generate, that code is closely entwined with code that we hand-write. This is a general pattern: we manually implement some set of types and traits, which are then used in our generated code.
All of the types which are used in codegen are re-exported in the codegen_prelude module; this is glob imported at the top of every generated file.
We will describe several of these manually implemented types as we encounter them throughout this document, but before we get started it is worth touching on two cases: FontData and scalars / BigEndian<T>.
Before we dive into the specifics of the tables and records in read-fonts, I want to talk briefly about how we represent and handle the basic data types of which records and tables are composed.
In the font file, these values are all represented in big-endian byte order. When we access them, we will need to convert them to the native endianness of the host platform. We also need to have some set of types which exactly match the memory layout (including byte ordering) of the underlying font file; this is necessary for us to take advantage of zerocopy semantics (see the zerocopy section below).
In addition to endianness, it is also sometimes the case that types will be represented by a different number of bytes in the raw file than when we are manipulating them natively; for instance, Offset24 is represented as three bytes on disk, but as a u32 in native code.
This leads us to a situation where we require two distinct types for each scalar: a native type that we will use in our program logic, and a 'raw' type that will represent the bytes in the font file (as well as some mechanism to convert between them.)
There are various ways we could express this in Rust. The most straightforward would be to just have two parallel sets of types: for instance, alongside the F2Dot14 type, we might have RawF2Dot14, or F2Dot14Be. Another option might be to have types that are generic over byte order, such that you end up with types like U16<BE> and U16<LE>.
I have taken a slightly different approach, which tries to be more ergonomic and intuitive to the user, at the cost of having a slightly more complicated implementation.
Our design has two basic components: a trait, Scalar, and a type, BigEndian<T>, which look like this:
/// A trait for font scalars.
pub trait Scalar {
/// The raw byte representation of this type.
type Raw: Copy + AsRef<[u8]>;
/// Create an instance of this type from raw big-endian bytes
fn from_raw(raw: Self::Raw) -> Self;
/// Encode this type as raw big-endian bytes
fn to_raw(self) -> Self::Raw;
}
/// A wrapper around raw big-endian bytes for some type.
#[derive(Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct BigEndian<T: Scalar>(T::Raw);
The Scalar trait handles conversion of a type to and from its raw representation (a fixed-size byte array), and the BigEndian type is a way of representing some fixed number of bytes and associating them with a concrete type. BigEndian has get and set methods which read or write the underlying bytes, relying on the from_raw and to_raw methods on Scalar.
This is a compromise. The Raw associated type is expected to always be a fixed-size byte array; say [u8; 2] for a u16, or [u8; 3] for an Offset24.
Ideally, the scalar trait would look like,
trait Scalar {
const RAW_SIZE: usize;
fn from_raw(bytes: [u8; Self::RAW_SIZE]) -> Self;
fn to_raw(self) -> [u8; Self::RAW_SIZE];
}
But this is not currently something we can express with Rust's generics, although it should become possible eventually.
In any case: what this lets us do is avoid having two separate sets of types for the 'raw' and 'native' cases; we have a single wrapper type that we use anytime we want to indicate that a type is in its raw form. This has the additional advantage that we can define new types in our generated code that implement Scalar, and then those types automatically work with BigEndian; this is useful for things like custom enums and flags that are defined at various points in the spec.
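The following is a minimal, self-contained sketch of the Scalar/BigEndian design described above, not the actual read-fonts code. The be_bytes accessor, the representation of Offset24 as a u32 newtype, and the GlyphClassDef enum are illustrative assumptions. The enum keeps unrecognized values in an Unknown variant so that from_raw can stay infallible. It shows how a new type that implements Scalar works with BigEndian without any extra code.

```rust
/// A trait for font scalars: conversion to and from raw big-endian bytes.
pub trait Scalar: Sized {
    /// The raw byte representation of this type.
    type Raw: Copy + AsRef<[u8]>;
    fn from_raw(raw: Self::Raw) -> Self;
    fn to_raw(self) -> Self::Raw;
}

/// A wrapper around the raw big-endian bytes of some scalar type `T`.
#[repr(transparent)]
pub struct BigEndian<T: Scalar>(T::Raw);

impl<T: Scalar> BigEndian<T> {
    pub fn new(raw: T::Raw) -> Self {
        Self(raw)
    }
    /// Decode the raw bytes into the native type.
    pub fn get(&self) -> T {
        T::from_raw(self.0)
    }
    /// Overwrite the raw bytes with the encoding of `value`.
    pub fn set(&mut self, value: T) {
        self.0 = value.to_raw();
    }
    /// Hypothetical accessor: the bytes exactly as they appear in the file.
    pub fn be_bytes(&self) -> &[u8] {
        self.0.as_ref()
    }
}

impl Scalar for u16 {
    type Raw = [u8; 2];
    fn from_raw(raw: [u8; 2]) -> Self {
        u16::from_be_bytes(raw)
    }
    fn to_raw(self) -> [u8; 2] {
        self.to_be_bytes()
    }
}

/// Hypothetical 24-bit offset: three bytes on disk, a `u32` in native code.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Offset24(u32);

impl Offset24 {
    pub fn new(value: u32) -> Self {
        assert!(value <= 0xFF_FFFF, "value does not fit in 24 bits");
        Offset24(value)
    }
    pub fn to_u32(self) -> u32 {
        self.0
    }
}

impl Scalar for Offset24 {
    type Raw = [u8; 3];
    fn from_raw(raw: [u8; 3]) -> Self {
        Offset24(u32::from_be_bytes([0, raw[0], raw[1], raw[2]]))
    }
    fn to_raw(self) -> [u8; 3] {
        let [_, a, b, c] = self.0.to_be_bytes();
        [a, b, c]
    }
}

/// Hypothetical generated enum (GDEF glyph classes); unknown values are kept.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GlyphClassDef {
    Base,
    Ligature,
    Mark,
    Component,
    Unknown(u16),
}

impl Scalar for GlyphClassDef {
    type Raw = [u8; 2];
    fn from_raw(raw: [u8; 2]) -> Self {
        match u16::from_be_bytes(raw) {
            1 => Self::Base,
            2 => Self::Ligature,
            3 => Self::Mark,
            4 => Self::Component,
            other => Self::Unknown(other),
        }
    }
    fn to_raw(self) -> [u8; 2] {
        let value: u16 = match self {
            Self::Base => 1,
            Self::Ligature => 2,
            Self::Mark => 3,
            Self::Component => 4,
            Self::Unknown(other) => other,
        };
        value.to_be_bytes()
    }
}
```

Note that GlyphClassDef needed nothing beyond its Scalar impl to be usable as BigEndian<GlyphClassDef>.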
In addition to Scalar and BigEndian, we also have a FixedSize trait, which is implemented for all scalar types (and later, for structs consisting only of scalar types). This trait consists of a single associated constant:
/// A trait for types that have a known, constant size.
pub trait FixedSize: Sized {
/// The raw (encoded) size of this type, in bytes.
const RAW_BYTE_LEN: usize;
}
This is implemented for all the scalar values, as well as for all their BigEndian equivalents; in both cases, the value of RAW_BYTE_LEN is the size of the raw (big-endian) representation.
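Here is a small sketch of how FixedSize relates native and raw sizes, under simplifying assumptions. Scalar is reduced to just its Raw associated type, and Offset24 is a hypothetical u32 newtype. The point is that RAW_BYTE_LEN reports the on-disk size (3 bytes for Offset24), not the in-memory size of the native type (4 bytes). BigEndian<T> simply forwards its wrapped type's value.

```rust
/// Reduced stand-in for `Scalar`: only the raw representation matters here.
pub trait Scalar {
    type Raw: Copy + AsRef<[u8]>;
}

#[repr(transparent)]
pub struct BigEndian<T: Scalar>(pub T::Raw);

/// A trait for types that have a known, constant size.
pub trait FixedSize: Sized {
    /// The raw (encoded) size of this type, in bytes.
    const RAW_BYTE_LEN: usize;
}

/// Hypothetical 24-bit offset, stored natively as a `u32`.
pub struct Offset24(pub u32);

impl Scalar for u16 {
    type Raw = [u8; 2];
}
impl Scalar for Offset24 {
    type Raw = [u8; 3];
}

// For scalars, RAW_BYTE_LEN is the size of the encoding, not of the native type.
impl FixedSize for u16 {
    const RAW_BYTE_LEN: usize = 2;
}
impl FixedSize for Offset24 {
    const RAW_BYTE_LEN: usize = 3;
}

// A BigEndian wrapper has the same raw length as the scalar it wraps.
impl<T: Scalar + FixedSize> FixedSize for BigEndian<T> {
    const RAW_BYTE_LEN: usize = T::RAW_BYTE_LEN;
}
```

Because BigEndian is #[repr(transparent)] over the raw array, its in-memory size also equals RAW_BYTE_LEN.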
The FontData struct is at the core of all of our font reading code. It represents a pointer to raw bytes, augmented with a bunch of methods for safely reading scalar values from that raw data.
It looks approximately like this:
pub struct FontData<'a>(&'a [u8]);
It can be thought of as a specialized interface on top of a Rust byte slice. This type is used extensively in the API, and will show up frequently in subsequent code snippets.
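As an illustration, here is a minimal FontData-like wrapper with a bounds-checked read_at. The BeScalar trait, the single ReadError variant, and the split_off helper are hypothetical simplifications: the real crate routes decoding through Scalar and BigEndian. The property being shown is that every read checks bounds and returns an error instead of panicking.

```rust
/// Errors that can occur while reading (reduced to a single case here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReadError {
    OutOfBounds,
}

/// Hypothetical simplification of `Scalar`: decode from exactly RAW_BYTE_LEN bytes.
pub trait BeScalar: Sized {
    const RAW_BYTE_LEN: usize;
    fn from_be_slice(bytes: &[u8]) -> Self;
}

impl BeScalar for u16 {
    const RAW_BYTE_LEN: usize = 2;
    fn from_be_slice(bytes: &[u8]) -> Self {
        let mut raw = [0u8; 2];
        raw.copy_from_slice(bytes);
        u16::from_be_bytes(raw)
    }
}

impl BeScalar for u32 {
    const RAW_BYTE_LEN: usize = 4;
    fn from_be_slice(bytes: &[u8]) -> Self {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(bytes);
        u32::from_be_bytes(raw)
    }
}

/// A thin, copyable wrapper around a byte slice.
#[derive(Clone, Copy)]
pub struct FontData<'a>(&'a [u8]);

impl<'a> FontData<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        FontData(bytes)
    }

    /// Read a scalar at `offset`, returning an error instead of panicking
    /// if the value would extend past the end of the data.
    pub fn read_at<T: BeScalar>(&self, offset: usize) -> Result<T, ReadError> {
        let end = offset.checked_add(T::RAW_BYTE_LEN).ok_or(ReadError::OutOfBounds)?;
        let bytes = self.0.get(offset..end).ok_or(ReadError::OutOfBounds)?;
        Ok(T::from_be_slice(bytes))
    }

    /// The data from `pos` to the end, e.g. for resolving an offset.
    pub fn split_off(&self, pos: usize) -> Option<FontData<'a>> {
        let data: &'a [u8] = self.0;
        data.get(pos..).map(FontData)
    }
}
```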
In the read-fonts crate, we make a distinction between table objects and record objects, and we generate different code for each.
The distinction between a table and a record is blurry, but the specification offers two "general criteria":
- Tables are referenced by offsets. If a table contains an offset to a sub-structure, the offset is normally from the start of that table.
- Records occur sequentially within a parent structure, either within a sequence of table fields or within an array of records of a given type. If a record contains an offset to a sub-structure, that structure is logically a subtable of the record’s parent table and the offset is normally from the start of the parent table.
Conceptually, a table object is additional type information laid over a FontData object (a wrapper around a Rust byte slice, &[u8], which is essentially a pointer plus a length). It provides typed access to that table's fields.
Conceptually, this looks like:
pub struct MyTable<'a>(FontData<'a>);

impl MyTable<'_> {
    /// Read the table's first field
    pub fn format(&self) -> u16 {
        // bounds were checked when the table was created, so this cannot fail
        self.0.read_at(0).unwrap()
    }
}
In practice, what we generate is slightly different: instead of generating a struct for the table itself (and wrapping the data directly) we generate a 'marker' struct, which defines the type of the table, and then we combine it with the data via a TableRef struct.
The TableRef struct looks like this:
/// Typed access to raw table data.
pub struct TableRef<'a, T> {
shape: T,
data: FontData<'a>,
}
And the definition of the table above, using a marker type, would look something like:
use std::ops::Range;

/// A marker type
pub struct MyTableMarker;

/// Instead of generating a struct for each table, we define a type alias
pub type MyTable<'a> = TableRef<'a, MyTableMarker>;

impl MyTableMarker {
    fn format_byte_range(&self) -> Range<usize> {
        0..u16::RAW_BYTE_LEN
    }
}

impl MyTable<'_> {
    fn format(&self) -> u16 {
        let range = self.shape.format_byte_range();
        // bounds were checked when the table was created, so this cannot fail
        self.data.read_at(range.start).unwrap()
    }
}
To the user these two APIs are equivalent (you have a type MyTable, on which you can call methods to read fields), but the 'marker' pattern potentially allows us to do some fancy things in the future (involving various cases where we want to store a type separately from a lifetime).
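The marker pattern can be sketched as a runnable example. The read constructor stands in for the generated FontRead impl, and read_u16_at is a hypothetical helper. Because the table is bounds-checked once when it is constructed, the getter can safely unwrap. The inherent impl on the MyTable alias is only permitted because TableRef is defined in the same crate, which is the limitation described in the note that follows.

```rust
use std::ops::Range;

/// A thin wrapper around a byte slice.
#[derive(Clone, Copy)]
pub struct FontData<'a>(&'a [u8]);

impl<'a> FontData<'a> {
    /// Hypothetical helper: read a big-endian u16 at `offset`, if in bounds.
    pub fn read_u16_at(&self, offset: usize) -> Option<u16> {
        let bytes = self.0.get(offset..offset.checked_add(2)?)?;
        Some(u16::from_be_bytes([bytes[0], bytes[1]]))
    }
}

/// Typed access to raw table data.
pub struct TableRef<'a, T> {
    shape: T,
    data: FontData<'a>,
}

/// A marker type
pub struct MyTableMarker;

/// Instead of generating a struct for each table, we define a type alias.
pub type MyTable<'a> = TableRef<'a, MyTableMarker>;

impl MyTableMarker {
    fn format_byte_range(&self) -> Range<usize> {
        0..2
    }
}

impl<'a> MyTable<'a> {
    /// Stand-in for the generated `FontRead` impl: check bounds once, up front.
    pub fn read(bytes: &'a [u8]) -> Option<Self> {
        let table = TableRef { shape: MyTableMarker, data: FontData(bytes) };
        let range = table.shape.format_byte_range();
        if range.end > bytes.len() {
            return None;
        }
        Some(table)
    }

    pub fn format(&self) -> u16 {
        let range = self.shape.format_byte_range();
        // Cannot fail: `read` already verified that this range is in bounds.
        self.data.read_u16_at(range.start).unwrap()
    }
}
```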
note:
There are also downsides to the marker pattern; in particular, the code we generate currently only compiles if it is part of the read-fonts crate itself. This isn't a major limitation, except that it makes certain kinds of testing harder, since we can't do things like generate code that is treated as a separate compilation unit, e.g. for use with the trybuild crate.
After generating the type definitions, the next thing we generate is an implementation of one of FontRead or FontReadWithArgs. The FontRead trait is used if a table is self-describing: that is, if the data in the table can be fully interpreted without any external information. In some cases, however, this is not possible. A simple example is the loca table: the data for this table cannot be interpreted correctly without knowing the number of glyphs in the font (stored in the maxp table) as well as whether the format is long or short, which is stored in the head table.
note:
The FontRead trait is similar to the 'sanitize' methods in HarfBuzz: that is to say that it does not parse the data, but only ensures that it is well-formed. Unlike 'sanitize', however, FontRead is not recursive (it does not chase offsets) and it does not in any way modify the structure; it merely returns an error if the structure is malformed. We will likely want to change the name of this method at some point, to clarify the fact that it is not exactly reading.
In either case, the generated table code is very similar.
For the purpose of illustration, let's imagine we have a table that looks like this:
table Foob {
#[version]
version: BigEndian<u16>,
some_val: BigEndian<u32>,
other_val: BigEndian<u32>,
flags_count: BigEndian<u16>,
#[count($flags_count)]
flags: [BigEndian<u16>],
#[since_version(1)]
versioned_value: BigEndian<u32>,
}
This generates the following code:
impl<'a> FontRead<'a> for Foob<'a> {
fn read(data: FontData<'a>) -> Result<Self, ReadError> {
let mut cursor = data.cursor();
let version: u16 = cursor.read()?;
cursor.advance::<u32>(); // some_val
cursor.advance::<u32>(); // other_val
let flags_count: u16 = cursor.read()?;
let flags_byte_len = flags_count as usize * u16::RAW_BYTE_LEN;
cursor.advance_by(flags_byte_len); // flags
let versioned_value_byte_start = version
.compatible(1)
.then(|| cursor.position())
.transpose()?;
version.compatible(1).then(|| cursor.advance::<u32>());
cursor.finish(FoobMarker {
flags_byte_len,
versioned_value_byte_start,
})
}
}
Let's walk through this. Firstly, the whole process is based around a 'cursor' type, which is simply a way of advancing through the input data on a field-by-field basis. Where we need to know the value of a field in order to validate subsequent fields, we read that field into a local variable. Additionally, values that we have to compute based on other fields are currently cached in the marker struct, although this is an implementation detail and may change. Field by field:
- version: as this is marked with the #[version] attribute, we read the value into a local variable, since we will need to know the version when reading any versioned fields.
- some_val: this is a simple value, and we do not need to know what it is, only that it exists. We advance the cursor by the appropriate number of bytes.
- other_val: ditto. The compiler will be able to combine these two advances into a single operation.
- flags_count: this value is referenced in the #[count] attribute on the following field, and so we bind it to a local variable.
- flags: the #[count] attribute indicates that the length of this array is stored in the flags_count field. We determine the array's length in bytes by multiplying that value by the size of the array member, and we advance the cursor by that number of bytes.
- versioned_value: this field is only available if the version field is >= 1 (this is specified via the #[since_version] attribute). We record the current cursor position (as an Option, which will be Some only if the version is compatible) and then we advance the cursor by the size of the field's type.
Finally, having finished with each field, we call the finish method on the cursor: this performs a final bounds check, and instantiates the table with the provided marker.
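A stripped-down version of this cursor-driven validation for the Foob example might look like the following. The Cursor type and its method names are assumptions made to keep the sketch self-contained, and a plain `version >= 1` check stands in for the Compatible trait. As in the generated code, skipped fields are not checked individually; a single bounds check happens in finish.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    OutOfBounds,
}

/// A minimal cursor: tracks a position and advances field by field.
pub struct Cursor<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Cursor { data, pos: 0 }
    }

    pub fn position(&self) -> usize {
        self.pos
    }

    /// Read a big-endian u16 and advance past it (bounds-checked).
    pub fn read_u16(&mut self) -> Result<u16, ReadError> {
        let end = self.pos.checked_add(2).ok_or(ReadError::OutOfBounds)?;
        let bytes = self.data.get(self.pos..end).ok_or(ReadError::OutOfBounds)?;
        self.pos = end;
        Ok(u16::from_be_bytes([bytes[0], bytes[1]]))
    }

    /// Skip `n` bytes without reading them; bounds are checked once, in `finish`.
    pub fn advance_by(&mut self, n: usize) {
        self.pos = self.pos.saturating_add(n);
    }

    /// Final bounds check: succeed only if everything we skipped is in the data.
    pub fn finish<T>(self, shape: T) -> Result<T, ReadError> {
        if self.pos <= self.data.len() {
            Ok(shape)
        } else {
            Err(ReadError::OutOfBounds)
        }
    }
}

/// Values computed during validation, cached for later use by getters.
#[derive(Debug, PartialEq, Eq)]
pub struct FoobMarker {
    pub flags_byte_len: usize,
    pub versioned_value_byte_start: Option<usize>,
}

pub fn read_foob(data: &[u8]) -> Result<FoobMarker, ReadError> {
    let mut cursor = Cursor::new(data);
    let version = cursor.read_u16()?;
    cursor.advance_by(4); // some_val: u32
    cursor.advance_by(4); // other_val: u32
    let flags_count = cursor.read_u16()?;
    let flags_byte_len = flags_count as usize * 2;
    cursor.advance_by(flags_byte_len); // flags: [u16]
    // versioned_value only exists from version 1 onwards.
    let versioned_value_byte_start = (version >= 1).then(|| cursor.position());
    if version >= 1 {
        cursor.advance_by(4); // versioned_value: u32
    }
    cursor.finish(FoobMarker { flags_byte_len, versioned_value_byte_start })
}
```

A truncated table is only rejected when finish (or a required read) finds the cursor past the end of the data, which keeps the per-field work minimal.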
note:
The
FontRead
trait is currently doing a bit of a double duty: in the case of tables, it is expected to perform a very minimal validation (essentially just bounds checking) but in the case of records it serves as an actual parse function, returning a concrete instance of the type. It is possible that these two roles should be separated?
As hinted at above, for tables that are versioned (which have a version field, and which have more than one known version value) we do not generate a distinct table per version; instead we generate a single table. For fields that are available on all versions of a table, we generate getters as usual. For fields that are only available on certain versions, we generate getters that return an Option type, which will be Some in the case where that field is present for the current version.
note:
The way we determine availability is crude: it is based on the Compatible trait, which is implemented for the various types which are used to represent versions. For types that represent their version as a (major, minor) pair, we consider a version to be compatible with another version if it has the same major number and a greater-than-or-equal minor number. For versions that are a single value, we consider them compatible if they are greater-than-or-equal. If this ends up being inadequate, we can revisit it.
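The compatibility rule in this note can be sketched as a small trait. The MajorMinor type and the exact signature of compatible are assumptions for illustration.

```rust
/// Whether a table of this version provides the fields introduced in `other`.
pub trait Compatible<Rhs = Self> {
    fn compatible(&self, other: Rhs) -> bool;
}

/// Single-value versions: compatible if greater than or equal.
impl Compatible for u16 {
    fn compatible(&self, other: u16) -> bool {
        *self >= other
    }
}

/// Hypothetical (major, minor) version type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MajorMinor {
    pub major: u16,
    pub minor: u16,
}

/// Same major version, and a minor version that is greater than or equal.
impl Compatible for MajorMinor {
    fn compatible(&self, other: MajorMinor) -> bool {
        self.major == other.major && self.minor >= other.minor
    }
}
```

Under this rule a 2.0 table is not considered compatible with a field introduced in 1.1, which reflects the assumption that a major version bump may change the layout.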
Some tables have multiple possible 'formats'. The various formats of a table will all share an initial 'format' field (generally a u16) which identifies the format, but the rest of their fields may differ.
For tables like this, we generate an enum that contains a variant for each of the possible formats. For this to work, each different table format must declare its format field in the input file:
table MyTableFormat1 {
#[format = 1]
table_format: BigEndian<u16>,
my_val: BigEndian<u16>,
}
The #[format = 1] attribute on the field of MyTableFormat1 is an important detail here. This causes us to implement a private trait, Format, like this:
impl Format<u16> for MyTableFormat1 {
const FORMAT: u16 = 1;
}
You then also declare that you want to create an enum, providing an explicit format, and listing which tables should be included:
format u16[@N] MyTable {
Format1(MyTableFormat1),
Format2(MyTableFormat2),
}
The 'format' keyword is followed by the type that represents the format, and optionally a position at which to read it (indicated by the '@' token, followed by an unsigned integer literal). In the vast majority of cases this can be omitted, and the format will be read from the first position in the table.
We will then generate an enum, as well as a FontRead implementation: this implementation will read the format off of the front of the input data, and then instantiate the appropriate variant based on that value. The generated implementation looks like this:
impl<'a> FontRead<'a> for MyTable<'a> {
fn read(data: FontData<'a>) -> Result<Self, ReadError> {
let format: u16 = data.read_at(0)?;
match format {
MyTableFormat1::FORMAT => Ok(Self::Format1(FontRead::read(data)?)),
MyTableFormat2::FORMAT => Ok(Self::Format2(FontRead::read(data)?)),
other => Err(ReadError::InvalidFormat(other.into())),
}
}
}
This trait-based approach has a few nice properties: we ensure that we don't accidentally have formats declared with different types, and if we accidentally provide the same format value for two different tables, we will at least see a compiler warning (about an unreachable pattern).
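Here is a self-contained sketch of this format dispatch, using two hypothetical table formats. For brevity they are parsed into owned structs, whereas the generated code wraps the data lazily. The `<T as Format<u16>>::FORMAT` constants are used directly as match patterns. If two impls shared a value, the second arm would trigger an unreachable-pattern warning.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    OutOfBounds,
    InvalidFormat(u16),
}

/// Private trait associating a format value with each format struct.
trait Format<T> {
    const FORMAT: T;
}

/// Hypothetical format 1: format (u16), my_val (u16).
#[derive(Debug, PartialEq, Eq)]
pub struct MyTableFormat1 {
    pub my_val: u16,
}

/// Hypothetical format 2: format (u16), other_val (u32).
#[derive(Debug, PartialEq, Eq)]
pub struct MyTableFormat2 {
    pub other_val: u32,
}

impl Format<u16> for MyTableFormat1 {
    const FORMAT: u16 = 1;
}

impl Format<u16> for MyTableFormat2 {
    const FORMAT: u16 = 2;
}

fn read_u16(data: &[u8], pos: usize) -> Result<u16, ReadError> {
    let bytes = data.get(pos..pos + 2).ok_or(ReadError::OutOfBounds)?;
    Ok(u16::from_be_bytes([bytes[0], bytes[1]]))
}

fn read_u32(data: &[u8], pos: usize) -> Result<u32, ReadError> {
    let bytes = data.get(pos..pos + 4).ok_or(ReadError::OutOfBounds)?;
    Ok(u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

#[derive(Debug, PartialEq, Eq)]
pub enum MyTable {
    Format1(MyTableFormat1),
    Format2(MyTableFormat2),
}

impl MyTable {
    /// Read the shared format field, then dispatch to the matching variant.
    pub fn read(data: &[u8]) -> Result<Self, ReadError> {
        let format = read_u16(data, 0)?;
        match format {
            <MyTableFormat1 as Format<u16>>::FORMAT => Ok(Self::Format1(MyTableFormat1 {
                my_val: read_u16(data, 2)?,
            })),
            <MyTableFormat2 as Format<u16>>::FORMAT => Ok(Self::Format2(MyTableFormat2 {
                other_val: read_u32(data, 2)?,
            })),
            other => Err(ReadError::InvalidFormat(other)),
        }
    }
}
```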
For each field in the table, we generate a getter method. The exact behaviour of this method depends on the type of the field. If the field is a scalar (that is, if it is a single raw value, such as an offset, a u16, or a Tag) then this getter reads the raw bytes, and then returns a value of the appropriate type, handling big-endian conversion. If it is an array, then the getter returns an array type that wraps the underlying bytes, which will be read lazily on access.
Alongside the getters we also generate, for each field, a method on the marker struct that returns the byte range (start and end positions) of that field. These are defined in terms of one another: the start of field N+1 is the end of field N. These methods are defined in a process that echoes how the table is validated, where we build up the offsets as we advance through the fields. This means we avoid computing each field's position independently from the start of the table, which should lead to more auditable code.
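For the Foob table from earlier, the chained byte-range methods might look like the following sketch. The method names follow the format_byte_range pattern shown above. The flags_byte_len field is the value cached by the FontRead impl. The exact set of methods is an assumption.

```rust
use std::ops::Range;

/// Hypothetical marker for the Foob table, holding the value cached
/// during validation (the byte length of the `flags` array).
pub struct FoobMarker {
    pub flags_byte_len: usize,
}

impl FoobMarker {
    pub fn version_byte_range(&self) -> Range<usize> {
        let start = 0;
        start..start + 2 // u16
    }

    pub fn some_val_byte_range(&self) -> Range<usize> {
        let start = self.version_byte_range().end;
        start..start + 4 // u32
    }

    pub fn other_val_byte_range(&self) -> Range<usize> {
        let start = self.some_val_byte_range().end;
        start..start + 4 // u32
    }

    pub fn flags_count_byte_range(&self) -> Range<usize> {
        let start = self.other_val_byte_range().end;
        start..start + 2 // u16
    }

    pub fn flags_byte_range(&self) -> Range<usize> {
        let start = self.flags_count_byte_range().end;
        start..start + self.flags_byte_len
    }
}
```

Each method only needs to know its own field's size and its predecessor, so an error in one size cannot silently shift an unrelated, independently computed position.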
For fields that are either offsets or arrays of offsets, we generate two getters: a raw getter that returns the raw offset, and an 'offset getter' that resolves the offset into the concrete type that is referenced. If the field is an array of offsets, this returns an iterator of resolved offsets. (This is a detail that I would like to change in the future, replacing it with some sort of lazy array-like type.)
For instance, if we have a table which contains the following:
table CoverageContainer {
coverage_offset: BigEndian<Offset16<CoverageTable>>,
class_count: BigEndian<u16>,
#[count($class_count)]
class_def_offsets: [BigEndian<Offset16<ClassDef>>],
}
we will generate the following methods:
impl<'a> CoverageContainer<'a> {
    pub fn coverage_offset(&self) -> Offset16 { .. }
    pub fn coverage(&self) -> Result<CoverageTable<'a>, ReadError> { .. }
    pub fn class_def_offsets(&self) -> &[BigEndian<Offset16>] { .. }
    pub fn class_defs(&self) ->
        impl Iterator<Item = Result<ClassDef<'a>, ReadError>> + 'a { .. }
}
Every offset field requires an offset getter, but the getters generated by default only work with types that implement FontRead. For types that require args, you can use the #[read_offset_with($arg1, $arg2)] attribute to indicate that this offset needs to be resolved with FontReadWithArgs, which will be passed the arguments specified; these can be either the names of fields on the containing table, or the names of arguments passed into this table through its own FontReadWithArgs impl.
In special cases, you can also manually implement this getter by using the #[offset_getter(method)] attribute, where method will be a method you implement on the type that handles resolving the offset via whatever custom logic is required.
How do we keep track of the data from which an offset is resolved? A happy byproduct of how we represent tables makes this generally trivial: because a table is just a wrapper around a chunk of bytes, and since most offsets are resolved relative to the start of the containing table, we can resolve offsets directly from our inner data.
In tricky cases, where offsets are not relative to the start of the table, there is a custom #[offset_data] attribute, where the user can specify a method that should be called to get the data against which a given offset should be resolved.
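Resolving an offset relative to the containing table then amounts to re-slicing the table's own bytes. This sketch uses plain &[u8] and hypothetical CoverageContainer and Coverage types. Treating a zero offset as an error (ReadError::NullOffset) is an assumption about how a non-nullable offset might be handled.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    OutOfBounds,
    NullOffset,
}

fn read_u16(data: &[u8], pos: usize) -> Result<u16, ReadError> {
    let bytes = data.get(pos..pos + 2).ok_or(ReadError::OutOfBounds)?;
    Ok(u16::from_be_bytes([bytes[0], bytes[1]]))
}

/// Hypothetical subtable: like any table, just a view over its own bytes.
pub struct Coverage<'a> {
    pub data: &'a [u8],
}

impl<'a> Coverage<'a> {
    pub fn format(&self) -> Result<u16, ReadError> {
        read_u16(self.data, 0)
    }
}

/// Hypothetical parent table whose first field is an Offset16 to a Coverage.
pub struct CoverageContainer<'a> {
    pub data: &'a [u8],
}

impl<'a> CoverageContainer<'a> {
    /// The raw offset getter.
    pub fn coverage_offset(&self) -> Result<u16, ReadError> {
        read_u16(self.data, 0)
    }

    /// The offset getter: the offset is relative to the start of this table,
    /// which is exactly the slice we already hold, so we just re-slice it.
    pub fn coverage(&self) -> Result<Coverage<'a>, ReadError> {
        let offset = self.coverage_offset()? as usize;
        if offset == 0 {
            return Err(ReadError::NullOffset);
        }
        let data: &'a [u8] = self.data;
        let sub = data.get(offset..).ok_or(ReadError::OutOfBounds)?;
        Ok(Coverage { data: sub })
    }
}
```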
Records are components of tables. With a few exceptions, they almost always exist in arrays; that is, a table will contain an array with some number of records.
When generating code for records, we can take one of two paths. If the record has a fixed size, which is known at compile time, we generate a "zerocopy" struct; and if not, we generate a "copy on read" struct. I will describe these separately.
When a record has a known, constant size, we declare a struct which has fields which exactly match the raw memory layout of the record.
As an example, the root TableDirectory of an OpenType font contains a TableRecord type, defined like this:
| Type | Name | Description |
|---|---|---|
| Tag | tableTag | Table identifier. |
| uint32 | checksum | Checksum for this table. |
| Offset32 | offset | Offset from beginning of font file. |
| uint32 | length | Length of this table. |
For this type, we generate the following struct:
#[repr(C)]
#[repr(packed)]
pub struct TableRecord {
/// Table identifier.
pub tag: BigEndian<Tag>,
/// Checksum for the table.
pub checksum: BigEndian<u32>,
/// Offset from the beginning of the font data.
pub offset: BigEndian<Offset32>,
/// Length of the table.
pub length: BigEndian<u32>,
}
impl FixedSize for TableRecord {
const RAW_BYTE_LEN: usize = Tag::RAW_BYTE_LEN
+ u32::RAW_BYTE_LEN
+ Offset32::RAW_BYTE_LEN
+ u32::RAW_BYTE_LEN;
}
Some things to note:
- The repr attribute specifies the layout and alignment of the struct. #[repr(packed)] means that the generated struct has no internal padding, and that its alignment is 1. #[repr(C)] opts out of the default Rust representation, under which the compiler is free to reorder fields; with it, fields are laid out in declaration order, matching the layout in the font file.
- All of the fields are BigEndian<_> types. This means that their internal representation is raw, big-endian bytes.
- The FixedSize trait acts as a marker, to ensure that this type's fields are themselves all also FixedSize.
Taken altogether, we get a struct that can be 'cast' from any slice of bytes of the appropriate length. More specifically, this works for arrays: we can take a slice of bytes, ensure that its length is a multiple of T::RAW_BYTE_LEN, and then convert that to a Rust slice of the appropriate type.
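The cast described above can be sketched with a record whose fields are raw byte arrays standing in for BigEndian<_>. This is an illustration under stated assumptions, not the crate's implementation (the helper name cast_records is hypothetical). The compile-time assertions check that the struct has no padding and an alignment of 1. Those two properties, together with the length check, are what make the unsafe slice cast sound.

```rust
use std::mem::{align_of, size_of};

/// Sketch of a zerocopy record. Each field is raw big-endian bytes (standing in
/// for `BigEndian<_>`), so the struct has alignment 1 and no padding.
#[repr(C, packed)]
pub struct TableRecord {
    pub tag: [u8; 4],
    pub checksum: [u8; 4],
    pub offset: [u8; 4],
    pub length: [u8; 4],
}

impl TableRecord {
    pub const RAW_BYTE_LEN: usize = 4 + 4 + 4 + 4;

    pub fn tag(&self) -> [u8; 4] {
        self.tag
    }
    pub fn offset(&self) -> u32 {
        u32::from_be_bytes(self.offset)
    }
    pub fn length(&self) -> u32 {
        u32::from_be_bytes(self.length)
    }
}

// Compile-time checks that the in-memory layout matches the on-disk layout.
const _: () = assert!(size_of::<TableRecord>() == TableRecord::RAW_BYTE_LEN);
const _: () = assert!(align_of::<TableRecord>() == 1);

/// Reinterpret `bytes` as a slice of records, without copying.
pub fn cast_records(bytes: &[u8]) -> Option<&[TableRecord]> {
    if bytes.len() % TableRecord::RAW_BYTE_LEN != 0 {
        return None;
    }
    let count = bytes.len() / TableRecord::RAW_BYTE_LEN;
    // SAFETY: TableRecord is repr(C) with alignment 1 and no padding, every byte
    // pattern is a valid value for its byte-array fields, and the length check
    // above guarantees that `count` records fit exactly within `bytes`.
    Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<TableRecord>(), count) })
}
```

The accessors decode on demand, so a cast array of records costs nothing until individual fields are read.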
In certain cases, there are records which do not have a size known at compile time. This happens frequently in the GPOS table. An example is the PairValueRecord type: this contains two ValueRecord fields, and the size (in bytes) of each of these fields depends on a ValueFormat that is stored in the parent table.
As such, we cannot know the size of PairValueRecord at compile time, which means we cannot cast it directly from bytes. Instead, we generate a 'normal' struct, as well as an implementation of FontReadWithArgs (discussed in the table section). This looks like:
pub struct PairValueRecord {
/// Glyph ID of second glyph in the pair
pub second_glyph: BigEndian<GlyphId>,
/// Positioning data for the first glyph in the pair.
pub value_record1: ValueRecord,
/// Positioning data for the second glyph in the pair.
pub value_record2: ValueRecord,
}
impl<'a> FontReadWithArgs<'a> for PairValueRecord {
fn read_with_args(
data: FontData<'a>,
args: &(ValueFormat, ValueFormat),
) -> Result<Self, ReadError> {
let mut cursor = data.cursor();
let (value_format1, value_format2) = *args;
Ok(Self {
second_glyph: cursor.read()?,
value_record1: cursor.read_with_args(&value_format1)?,
value_record2: cursor.read_with_args(&value_format2)?,
})
}
}
Here, in our 'read' impl, we are actually instantiating an instance of our type, copying the bytes as needed.
In addition, we also generate an implementation of the ComputeSize trait; this is analogous to the FixedSize trait, but represents the case of a type whose size can only be computed at runtime, from some set of arguments.
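As a sketch of what ComputeSize expresses, here is a hypothetical trait shape (an associated Args type and a compute_size function) applied to ValueRecord and PairValueRecord. The size rule comes from the OpenType GPOS spec: each of the low eight ValueFormat bits enables one 2-byte field, and the high bits are reserved.

```rust
/// Hypothetical shape of ComputeSize: the size is a function of runtime arguments.
pub trait ComputeSize {
    type Args;
    fn compute_size(args: &Self::Args) -> usize;
}

/// The GPOS ValueFormat bit flags, as a plain newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ValueFormat(pub u16);

/// Fields omitted; only the size computation is sketched.
pub struct ValueRecord;

impl ComputeSize for ValueRecord {
    type Args = ValueFormat;
    fn compute_size(args: &ValueFormat) -> usize {
        // Each of the low eight bits enables one 2-byte field: four int16
        // placement/advance values and four Offset16 device offsets.
        (args.0 & 0x00FF).count_ones() as usize * 2
    }
}

/// A GlyphId (2 bytes) followed by two ValueRecords with their own formats.
pub struct PairValueRecord;

impl ComputeSize for PairValueRecord {
    type Args = (ValueFormat, ValueFormat);
    fn compute_size(args: &(ValueFormat, ValueFormat)) -> usize {
        let (format1, format2) = *args;
        2 + ValueRecord::compute_size(&format1) + ValueRecord::compute_size(&format2)
    }
}
```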
Records, like tables, can contain offsets. Unlike tables, records do not have access to the raw data against which those offsets should be resolved. For the purpose of consistency across our generated code, however, it is important that we have a consistent way of resolving offsets contained in records, and we do: you have to pass it in.
Where an offset getter on a table might look like,
fn coverage(&self) -> Result<CoverageTable<'a>, ReadError>;
The equivalent getter on a record looks like,
fn coverage(&self, data: FontData<'a>) -> Result<CoverageTable<'a>, ReadError>;
This... honestly, this is not great ergonomics. It is, however, simple, and is relied on by codegen in various places, and when we're generating code we aren't too bothered by how ergonomic it is. We might want to revisit this at some point; one simple improvement would be to have the caller pass in the parent table, but I'm not sure how this would work in cases where a type might be referenced by multiple parents. Another option would be to have some kind of fancy RecordData struct that would be a thin wrapper around a record plus the parent data, and which would implement the record getters, but deref to the record otherwise... I'm really not sure.
The code we generate to represent an array varies based on what we know about the size and contents of the array:
- If the contents of an array have a fixed, uniform size, known at compile time, then we represent the array as a Rust slice: &[T]. This is true for all scalars (including offsets) as well as records that are composed of a fixed number of scalars.
- If the contents of an array have a uniform size, but the size can only be determined at runtime, we represent the array using the ComputedArray type. This requires the inner type to implement FontReadWithArgs, and the array itself wraps the raw bytes and instantiates its elements lazily as they are accessed. As an example, the length of a ValueRecord depends on the specific associated ValueFormat:

        table SinglePosFormat2 {
            // some fields omitted
            value_format: BigEndian<ValueFormat>,
            value_count: BigEndian<u16>,
            #[count($value_count)]
            #[read_with($value_format)]
            value_records: ComputedArray<ValueRecord>,
        }

- Finally, if an array contains elements of non-uniform sizes, we use the VarLenArray type. This requires the inner type to have a leading field which contains the length of the item, and this array does not allow for random access. The inner type must implement the VarSize trait, via which it indicates the type of its leading length field. An example of this pattern is the array of Pascal-style strings in the 'post' table; the first byte of these strings encodes the length, and so we represent them in a VarLenArray:

        table Post {
            // some fields omitted
            #[count(..)]
            #[since_version(2.0)]
            string_data: VarLenArray<PString<'a>>,
        }
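The two runtime-sized array kinds can be sketched as follows. Both are simplified, hypothetical stand-ins (ComputedArray here is not the crate's generic type, and PStrings is an invented name) that return raw byte slices rather than parsed items. ComputedArray supports random access because every element shares one runtime size. The Pascal-string iterator must walk the data sequentially, reading each length prefix.

```rust
/// Simplified ComputedArray: every item has the same size, but that size is
/// only known at runtime, so random access is still possible.
pub struct ComputedArray<'a> {
    pub data: &'a [u8],
    pub item_len: usize,
}

impl<'a> ComputedArray<'a> {
    /// Raw bytes of the item at `index`; a real implementation would parse
    /// the item here (e.g. via FontReadWithArgs).
    pub fn get(&self, index: usize) -> Option<&'a [u8]> {
        let data: &'a [u8] = self.data;
        let start = index.checked_mul(self.item_len)?;
        data.get(start..start.checked_add(self.item_len)?)
    }
}

/// Simplified VarLenArray of Pascal strings: each item is a length byte
/// followed by that many bytes, so items must be visited in order.
pub struct PStrings<'a> {
    pub data: &'a [u8],
}

impl<'a> Iterator for PStrings<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<&'a [u8]> {
        let data: &'a [u8] = self.data;
        let (&len, rest) = data.split_first()?;
        let len = len as usize;
        if rest.len() < len {
            // Truncated data: stop iterating rather than panic.
            self.data = &[];
            return None;
        }
        let (item, remaining) = rest.split_at(len);
        self.data = remaining;
        Some(item)
    }
}
```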
On top of tables and records, we also generate code for various defined flags and enums. In the case of flags, we generate implementations based on the bitflags crate, and in the case of enums, we generate a Rust enum.
These code paths are not currently very heavily used.
There is one last piece of code that we generate in read-fonts, and that is our 'traversal' code.
This is experimental and likely subject to significant change, but the general idea is that it is a mechanism for recursively traversing a graph of tables, without needing to worry about the specific type of any particular table. It does this by using trait objects, which allow us to refer to multiple distinct types in terms of a trait that they implement. The core of this is the SomeTable trait, which is implemented for each table; through this, we can get the name of a table, as well as iterate through that table's fields.
For each field, the table returns the name of the field (as a string) along with some value; the set of possible values is covered by the FieldType enum. Importantly, the table resolves any contained offsets, and returns the referenced tables as SomeTable trait objects as well, which can then also be traversed recursively.
We do not currently make very heavy use of this mechanism, but it is the basis for the generated implementations of the Debug trait, and it is used in the otexplorer sample project.
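The traversal idea can be sketched with trait objects. The SomeTable and FieldType shapes here are much simplified and hypothetical (the real enum has many more variants), and the coverage offset is stored already resolved. The dump function shows how a generic printer, in the spirit of the generated Debug impls, can walk any table recursively without knowing its concrete type.

```rust
/// A field value, as seen by generic traversal code (heavily simplified).
pub enum FieldType {
    U16(u16),
    Table(Box<dyn SomeTable>),
}

/// Implemented by every table, so tables can be handled as trait objects.
pub trait SomeTable {
    fn type_name(&self) -> &'static str;
    fn fields(&self) -> Vec<(&'static str, FieldType)>;
}

/// Hypothetical subtable.
#[derive(Clone, Copy)]
pub struct Coverage {
    pub format: u16,
}

impl SomeTable for Coverage {
    fn type_name(&self) -> &'static str {
        "Coverage"
    }
    fn fields(&self) -> Vec<(&'static str, FieldType)> {
        vec![("format", FieldType::U16(self.format))]
    }
}

/// Hypothetical parent table; its coverage offset is stored already resolved.
pub struct SinglePos {
    pub format: u16,
    pub coverage: Coverage,
}

impl SomeTable for SinglePos {
    fn type_name(&self) -> &'static str {
        "SinglePos"
    }
    fn fields(&self) -> Vec<(&'static str, FieldType)> {
        vec![
            ("format", FieldType::U16(self.format)),
            ("coverage", FieldType::Table(Box::new(self.coverage))),
        ]
    }
}

/// Recursively print any table, without knowing its concrete type.
pub fn dump(table: &dyn SomeTable, depth: usize, out: &mut String) {
    let indent = "  ".repeat(depth);
    out.push_str(&format!("{}{}\n", indent, table.type_name()));
    for (name, value) in table.fields() {
        match value {
            FieldType::U16(v) => out.push_str(&format!("{}  {}: {}\n", indent, name, v)),
            FieldType::Table(sub) => {
                out.push_str(&format!("{}  {}:\n", indent, name));
                dump(sub.as_ref(), depth + 2, out);
            }
        }
    }
}
```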
The write-fonts crate is significantly simpler than the read-fonts crate (currently less than half the total lines of generated code), and because it does not have to deal with the specifics of the memory layout or worry about avoiding allocation, the generated code is generally more straightforward.
Unlike read-fonts, which generates significantly different code for tables and records (as well as very different code based on whether a record is zerocopy or not), the write-fonts crate treats all tables and records as basic Rust structs.
As in read-fonts, we generate enums for tables that have multiple formats, and likewise we generate a single struct for tables that have versioned fields, with version-dependent fields represented as Option types.
note:
This pattern is a bit more annoying in write-fonts, and we may want to revisit it at some point, or at least improve the API with some sort of builder pattern.
Where the types in read-fonts generally contain the exact fields described in the spec, this does not always make sense for the write types. A simple example is fields that contain the count of an array. This is useful in read-fonts, but in write-fonts it is redundant, since we can determine the count from the array itself. The same is true of things like the format field, which we can determine from the type of the table, as well as version numbers, which we can choose based on the fields present on the table.
In these cases, the #[compile(..)] attribute can be used to provide a computed value to be written in the place of this field. The provided value can be a literal or an expression that evaluates to a value of the field's type.
If a field has a #[compile(..)] attribute, then that field will be omitted in the generated struct.
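To illustrate fields computed at write time, here is a hypothetical write-side struct for a table with format and count fields. Neither field is stored; both are produced during serialization, which is the role a #[compile(..)] attribute plays in the generated code. The ClassList type and its to_bytes method are assumptions for illustration, not the write-fonts API.

```rust
/// Hypothetical write-side table: neither `format` nor `class_count` is a
/// field, because both can be derived when the table is written.
pub struct ClassList {
    pub classes: Vec<u16>,
}

impl ClassList {
    /// Serialize as: format (u16), class_count (u16), then each class (u16).
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        // A literal computed value, for the format field.
        out.extend_from_slice(&1u16.to_be_bytes());
        // An expression computed from another field; overflowing the u16 is
        // the kind of problem validation is expected to catch beforehand.
        let class_count = self.classes.len() as u16;
        out.extend_from_slice(&class_count.to_be_bytes());
        for class in &self.classes {
            out.extend_from_slice(&class.to_be_bytes());
        }
        out
    }
}
```

Because the count is derived rather than stored, it can never disagree with the array it describes.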
Fields that are of the various offset types in the spec are represented in write-fonts as OffsetMarker types. These are a wrapper around an Option<T>, where T is the type of the referenced subtable; they also have a const generic param N that represents the width of the offset, in bytes.
During compilation (see the section on FontWrite, below) we use these markers to record the position of offsets in a table, and to associate those locations with specific subtables.
parsing and FromTableRef
There is generally a 1:1 relationship between the generated types in read-fonts and write-fonts, and you can convert a type in read-fonts to the corresponding type in write-fonts (assuming the default "parsing" feature is enabled) via the FromObjRef and FromTableRef traits. These are modeled on the From trait in the Rust prelude, down to having a pair of companion IntoOwnedObj and IntoOwnedTable traits with blanket impls.
The basic idea behind this approach is that we do not generate separate parsing code for the types in write-fonts; we leave the parsing up to the types in read-fonts, and then we just handle conversion from these to the write types.
The more general of these two traits is FromObjRef, which is implemented for every table and record. It has one method, from_obj_ref, which takes some type from read-fonts, as well as FontData that is used to resolve any offsets. If the type is a table, it can ignore the provided data, since it already has a reference to the data it will use to resolve any contained offsets, but if it is a record then it must use the input data in order to recursively convert any contained offsets.
In their FromObjRef implementation, tables pass their own data down to any contained records as required.
The FromTableRef trait is simply a marker; it indicates that a given object does not require any external data.
In any case, all of these traits are largely implementation details, and you will rarely need to interact with them directly, because if a type implements FromTableRef, then we also generate an implementation of the FontRead trait from read-fonts. This means that all of the self-describing tables in write-fonts can be instantiated directly from raw bytes in a font file.
One detail of FromObjRef and family is that these traits are infallible; that is, if we can parse a table at all, we will always successfully convert it to its owned equivalent, even if it contains unexpected null offsets, or has subtables which cannot be read. This means that you can read and modify a table that is malformed.
We do not want to write tables that are malformed, however, and we also want an opportunity to enforce various other constraints that are expressed in the spec, and for this we have the Validate trait. An implementation of this trait is generated for all tables, and we automatically verify a number of conditions: for instance, that offsets which should not be null contain a value, or that the number of items in a table does not overflow the integer type that stores that table's length. Additional validation can be performed on a per-field basis by providing a method name to the #[validate(..)] attribute; this should be an instance method (having a &self param) and should also accept an additional 'ctx' argument, of type &mut ValidateCtx, which is used to report errors.
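Here is a sketch of the validation pattern. The ValidateCtx methods (in_field, report), the string error format, and the CoverageContainer fields are hypothetical. The two checks mirror the generated ones described above: a required offset must not be null, and an array must fit in its u16 count field.

```rust
/// Hypothetical validation context: collects errors along with a field path.
#[derive(Default)]
pub struct ValidateCtx {
    path: Vec<&'static str>,
    errors: Vec<String>,
}

impl ValidateCtx {
    /// Run `f` with `name` pushed onto the current field path.
    pub fn in_field(&mut self, name: &'static str, f: impl FnOnce(&mut Self)) {
        self.path.push(name);
        f(self);
        self.path.pop();
    }

    /// Record an error at the current field path.
    pub fn report(&mut self, message: &str) {
        self.errors.push(format!("{}: {}", self.path.join("."), message));
    }
}

pub trait Validate {
    fn validate_impl(&self, ctx: &mut ValidateCtx);

    /// Run all checks and return every error found, not just the first.
    fn validate(&self) -> Result<(), Vec<String>> {
        let mut ctx = ValidateCtx::default();
        self.validate_impl(&mut ctx);
        if ctx.errors.is_empty() {
            Ok(())
        } else {
            Err(ctx.errors)
        }
    }
}

/// Hypothetical write-side table with a required offset and an array
/// whose length is stored in a u16 count field.
pub struct CoverageContainer {
    pub coverage: Option<Vec<u16>>,
    pub class_defs: Vec<u16>,
}

impl Validate for CoverageContainer {
    fn validate_impl(&self, ctx: &mut ValidateCtx) {
        ctx.in_field("coverage", |ctx| {
            if self.coverage.is_none() {
                ctx.report("required offset is null");
            }
        });
        ctx.in_field("class_defs", |ctx| {
            if self.class_defs.len() > u16::MAX as usize {
                ctx.report("too many items for a u16 count");
            }
        });
    }
}
```

Collecting every error, rather than stopping at the first, is a design choice that makes it easier to fix a malformed table in one pass.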
compilation and FontWrite
Finally, for each type we generate an implementation of the FontWrite trait, which looks like:
pub trait FontWrite {
fn write_into(&self, writer: &mut TableWriter);
}
The TableWriter struct has two jobs: it records the raw bytes representing the data in this table or record, and it records the position of offsets, along with the entities they point to.
The implementation of this type is all hand-written, and out of the scope of this document, but the implementations of FontWrite that we generate are straightforward: we walk the struct's fields in order (computing a value if the field has a #[compile(..)] attribute) and recursively call write_into on them. This recurses until it reaches either an OffsetMarker or a scalar type; in the first case we record the position and size of the offset in the current table, and then recursively write out the referenced object, and in the latter case we record the big-endian bytes themselves.
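Finally, here is a heavily simplified sketch of how FontWrite, a writer, and OffsetMarker fit together. Everything apart from the FontWrite signature is hypothetical. This writer appends each subtable directly after its parent and patches in the offset. Unlike the real hand-written TableWriter, it does not deduplicate shared subtables or optimize the final layout. Offsets stay relative to the start of the table that contains them, so nested subtables work without extra bookkeeping.

```rust
/// Types that know how to serialize themselves.
pub trait FontWrite {
    fn write_into(&self, writer: &mut TableWriter);
}

/// Simplified writer: this table's bytes, plus every offset it contains as
/// (position in `bytes`, offset width, serialized subtable).
#[derive(Default)]
pub struct TableWriter {
    bytes: Vec<u8>,
    offsets: Vec<(usize, usize, Vec<u8>)>,
}

impl TableWriter {
    pub fn write_slice(&mut self, bytes: &[u8]) {
        self.bytes.extend_from_slice(bytes);
    }

    /// Serialize the subtable separately and leave a zeroed placeholder.
    pub fn write_offset<T: FontWrite>(&mut self, width: usize, subtable: &T) {
        let sub_bytes = dump_table(subtable);
        self.offsets.push((self.bytes.len(), width, sub_bytes));
        self.bytes.resize(self.bytes.len() + width, 0);
    }

    /// Append each subtable after the parent and patch in its offset.
    pub fn into_bytes(mut self) -> Vec<u8> {
        for (pos, width, sub) in std::mem::take(&mut self.offsets) {
            let offset = self.bytes.len() as u64;
            assert!(offset < (1u64 << (8 * width)), "offset overflow");
            let be = offset.to_be_bytes();
            self.bytes[pos..pos + width].copy_from_slice(&be[8 - width..]);
            self.bytes.extend_from_slice(&sub);
        }
        self.bytes
    }
}

pub fn dump_table<T: FontWrite>(table: &T) -> Vec<u8> {
    let mut writer = TableWriter::default();
    table.write_into(&mut writer);
    writer.into_bytes()
}

impl FontWrite for u16 {
    fn write_into(&self, writer: &mut TableWriter) {
        writer.write_slice(&self.to_be_bytes());
    }
}

/// An offset to an optional subtable; `N` is the offset width in bytes.
pub struct OffsetMarker<T, const N: usize> {
    pub obj: Option<T>,
}

impl<T: FontWrite, const N: usize> FontWrite for OffsetMarker<T, N> {
    fn write_into(&self, writer: &mut TableWriter) {
        match &self.obj {
            Some(obj) => writer.write_offset(N, obj),
            None => writer.write_slice(&[0u8; N]), // a null offset
        }
    }
}

/// Hypothetical Coverage format 1; format and glyph_count are computed.
pub struct Coverage {
    pub glyphs: Vec<u16>,
}

impl FontWrite for Coverage {
    fn write_into(&self, writer: &mut TableWriter) {
        1u16.write_into(writer);
        (self.glyphs.len() as u16).write_into(writer);
        for glyph in &self.glyphs {
            glyph.write_into(writer);
        }
    }
}

pub struct CoverageContainer {
    pub coverage: OffsetMarker<Coverage, 2>,
}

impl FontWrite for CoverageContainer {
    fn write_into(&self, writer: &mut TableWriter) {
        self.coverage.write_into(writer);
    }
}
```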
This document represents a best effort at capturing the most important details of the code we generate, as of October 2022. It is likely that things will change over time, and I will endeavour to keep this document up to date. If anything is unclear or incorrect, please open an issue and I will try to clarify.