Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce Visitor::finalize overhead #1

Open
scottlamb opened this issue Dec 15, 2021 · 3 comments
Open

reduce Visitor::finalize overhead #1

scottlamb opened this issue Dec 15, 2021 · 3 comments

Comments

@scottlamb
Copy link
Owner

Much of the bloat of the generated code is in the *Visitor::finalize methods, which transfer stuff from the builder (typically with types like Option<T>) to the final product (with types like T).

[slamb@slamb-workstation ~/git/crates/onvif-rs]$ cargo bloat --release --example camera --filter schema
...
File .text     Size  Crate Name
0.0%  0.1%   3.8KiB schema schema::onvif::_::PtzconfigurationVisitor::finalize
0.0%  0.1%   3.5KiB schema schema::onvif::_::ProfileVisitor::finalize
0.0%  0.1%   2.3KiB schema schema::onvif::_::MetadataConfigurationVisitor::finalize
0.0%  0.1%   2.2KiB schema schema::onvif::_::VideoSourceConfigurationVisitor::finalize
0.0%  0.0%   1.8KiB schema schema::onvif::_::AudioEncoderConfigurationVisitor::finalize
0.0%  0.0%   1.7KiB schema <schema::onvif::_::CapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.7KiB schema schema::onvif::config_description::_::MessagesTypeVisitor::finalize
0.0%  0.0%   1.5KiB schema schema::onvif::_::VideoAnalyticsConfigurationVisitor::finalize
0.0%  0.0%   1.4KiB schema schema::devicemgmt::_::GetDeviceInformationResponseVisitor::finalize
0.0%  0.0%   1.4KiB schema schema::onvif::_::ConfigVisitor::finalize
0.0%  0.0%   1.3KiB schema schema::onvif::_::ConfigDescriptionVisitor::finalize
0.0%  0.0%   1.2KiB schema schema::onvif::_::AudioOutputConfigurationVisitor::finalize
0.0%  0.0%   1.2KiB schema schema::onvif::_::VideoEncoderConfigurationVisitor::finalize
0.0%  0.0%   1.1KiB schema <T as static_xml::ser::SerializeField>::write
0.0%  0.0%   1.1KiB schema <schema::onvif::_::VideoEncoderConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.1KiB schema schema::onvif::_::DeviceCapabilitiesVisitor::finalize
0.0%  0.0%   1.1KiB schema schema::onvif::_::<impl static_xml::ser::Serialize for schema::onvif::Transport>::write_children
0.0%  0.0%   1.0KiB schema <schema::onvif::_::ProfileVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.0KiB schema <schema::onvif::_::MetadataConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.0KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::ProfileExtension>::deserialize
2.2%  7.3% 285.0KiB        And 993 smaller methods. Use -n N to show more.
2.4%  8.2% 317.4KiB        filtered data size, the file size is 12.7MiB

Look at ways to reduce it.

My first attempt was to bypass this entirely, more or less as yaserde does. #static_xml(direct) achieves this. But there are three downsides:

  • now I'm back to needing a Default impl for everything, which I'd rather avoid.
  • when you just write field: T rather than field: Option<T>, it can't error if a required field is missing or even distinguish between absent and default value
  • a very specific bug: a flattened enum which has a variant of type Vec<T> doesn't work properly. It will discard all but the last occurrence. I haven't looked at the yaserde code, but I suspect it avoids this because it handles all the flattened stuff in a single call (in a gross way: it buffers everything by serializing all the unknown elements then deserializes them again). I use the DeserializeFlattened interface instead.

Maybe there's some big thing I can avoid in the generated code now. Or maybe there's a middle ground in which everything still needs Default (the first cavet holds) but the Visitor type is something like this:

struct BlahVisitor<'a> {
    out: &'a mut Blah,
    field_a_present: bool,
    field_b_present: bool,
}

and just check those bools, or via a bitmask. Maybe this could even work with the table-driven approach.

@scottlamb scottlamb changed the title reduce Visitor::finalize overhead reduce Visitor::finalize overhead Dec 15, 2021
@scottlamb
Copy link
Owner Author

Here's an idea.

Step 1: test out the field_a_present approach as in the comment above. See if it's significantly smaller. I'm hoping so—the bloaty part is rearranging/individually copying the fields.

Step 2: assuming so, use unsafe and MaybeUninit to get rid of the Default requirement, maybe via something like this:

impl Deserialize for Foo {impl Deserialize for Foo {
    fn deserialize(element: ElementReader<'_>) -> Result<Self, VisitorError> {
        let mut out = MaybeUnit::uninit();
        let mut builder = BlahVisitor { out: &mut out, uninitialized: 3 };
        element.read_to(&mut builder)?;
        ::static_xml::de::check_uninitialized(self.uninitialized, &ELEMENTS[..])?;
        Ok(unsafe { out.assume_init() })
    }
}
struct BlahVisitor<'a> {
    out: &'a mut MaybeUninit<Blah>,
    uninitialized: u64, // bitmask
}
impl<'a> ElementVisitor for BlahVisitor<'a> {
    fn element<'a>(&mut self, child: ElementReader<'a>) -> Result<Option<ElementReader<'a>>, VisitorError> {
        match ::static_xml::de::find(&child.expanded_name(), ELEMENTS) {
            Some(0usize) if (self.uninitialized & 1) != 0 => {
                unsafe {
                    std::ptr::write(
                        self.out.as_mut_ptr().offset(offset_of!(Foo, field_a)) as *mut TypeA,
                        DeserializeField::init(child)?,
                    )
                };
                self.uninitialized &= ~1;
                return Ok(None);
            }
            Some(0usize) if (self.uninitialized & 1) == 0 => {
                let field = unsafe { self.out.as_mut_ptr().offset(offset_of!(Foo, field_a)) as *mut TypeA as &mut TypeA };
                DeserializeField::append(field, child)?
            }
            // .. likewise for field_b
            _ => Ok(Some(child)),
        }
    }
}

The idea is that if all the individual fields are initialized, then the final struct should be also. I don't think padding needs to be initialized. Any optional members should be filled in with a default at the end rather than failing via check_uninitialized.

This might be just moving the bloat to the fn element, but I'm hoping #5 can address that.

@scottlamb
Copy link
Owner Author

scottlamb commented Dec 15, 2021

Looking good. Implemented the struct BlahVisitor<'a> { out: &'a mut Blah, *_present: bool } way, more or less. (I cheated a little bit: structs don't implement DeserializeFlatten, and it has that enum MyChoice { Foo(Vec<Foo>) } bug I mentioned above. The goal was to get numbers, not generate something worth checking in.)

File .text     Size  Crate Name
0.0%  0.1%   2.0KiB schema <schema::soap_envelope::_::FaultVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.8KiB schema <schema::onvif::_::VideoAnalyticsConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.8KiB schema <schema::onvif::_::CapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.4KiB schema <schema::onvif::_::IocapabilitiesExtensionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.3KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::AudioEncoderConfiguration>::deserialize
0.0%  0.0%   1.3KiB schema <schema::onvif::_::VideoSourceConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.3KiB schema <schema::onvif::_::VideoSourceConfigurationExtension2Visitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.3KiB schema <schema::onvif::_::DateTimeVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.3KiB schema <schema::onvif::_::MulticastConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.2KiB schema <schema::onvif::_::RotateVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.2KiB schema <schema::onvif::_::LensDescriptionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.2KiB schema schema::soap_envelope::Fault::is_unauthorized
0.0%  0.0%   1.2KiB schema <schema::onvif::_::IpaddressVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.2KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::MetadataConfiguration>::deserialize
0.0%  0.0%   1.2KiB schema <schema::onvif::_::VideoEncoderConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.1KiB schema schema::ptz::_::<impl static_xml::de::Deserialize for schema::ptz::GetStatusResponse>::deserialize
0.0%  0.0%   1.1KiB schema schema::devicemgmt::_::<impl static_xml::de::Deserialize for schema::devicemgmt::GetDeviceInformationResponse>::deserialize
0.0%  0.0%   1.1KiB schema <schema::onvif::_::H264ConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.1KiB schema <T as static_xml::ser::SerializeField>::write
0.0%  0.0%   1.1KiB schema <schema::onvif::_::Mpeg4ConfigurationVisitor as static_xml::de::ElementVisitor>::element
2.2%  7.2% 279.1KiB        And 877 smaller methods. Use -n N to show more.
2.4%  7.9% 305.3KiB        filtered data size, the file size is 12.7MiB

The finalize methods are gone, even with -n 0. I think they're getting inlined into deserialize (which makes sense: they only have one caller now, because I didn't implement the flatten versions).

$ cargo bloat --release --example camera --filter schema -n 0 | fgrep deserialize
...
0.0%  0.0%   1.3KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::AudioEncoderConfiguration>::deserialize
0.0%  0.0%   1.2KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::MetadataConfiguration>::deserialize
0.0%  0.0%   1.1KiB schema schema::ptz::_::<impl static_xml::de::Deserialize for schema::ptz::GetStatusResponse>::deserialize
0.0%  0.0%   1.1KiB schema schema::devicemgmt::_::<impl static_xml::de::Deserialize for schema::devicemgmt::GetDeviceInformationResponse>::deserialize
0.0%  0.0%   1.1KiB schema schema::common::_::<impl static_xml::de::Deserialize for schema::common::Ptzstatus>::deserialize
0.0%  0.0%   1.1KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::AudioOutputConfiguration>::deserialize
0.0%  0.0%   1.1KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::SecurityCapabilities>::deserialize
0.0%  0.0%   1.0KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::MulticastConfiguration>::deserialize
0.0%  0.0%    1006B schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::Ptzconfiguration>::deserialize
0.0%  0.0%    1003B schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::VideoAnalyticsConfiguration>::deserialize
...

so, looks like:

  • total size has gone down already
  • I'm quite optimistic about dramatically reducing further via the table-driven approach for what remains, both the elements stuff and the finalize stuff

Also, the interface is actually improving. I no longer need as many traits. Where I now have this (and likewise for attributes and flattens):

/// Deserializes the *content* of one or more elements into a field.
///
/// Typically used only via derive macros.
///
/// For any `T` that implements [`Deserialize`,, there are three implementations
/// of this trait:
///
/// 1. `T`, for mandatory singleton fields. In XML Schema terms, this
/// is an element with the default `minOccurs="1" maxOccurs="1"`.
/// 2. `Option<T>`, for optional singleton fields. In XML Schema terms,
/// `minOccurs="0" maxOccurs="1".
/// 3. `Vec<T>`, for repeated fields. In XML Schema terms,
/// `minOccurs="0" maxOccurs="unbounded"`.
pub trait DeserializeField: Sized {
type Builder: DeserializeFieldBuilder;
fn init() -> Self::Builder;
fn finalize(
builder: Self::Builder,
expected: &ExpandedNameRef<'_>,
default: Option<fn() -> Self>,
) -> Result<Self, VisitorError>;
}
/// Builder used by [`DeserializeField`].
pub trait DeserializeFieldBuilder {
/// Handles a single occurrence of this element; called zero or more times.
fn element<'a>(&mut self, element: ElementReader<'_>) -> Result<(), VisitorError>;
}

I can instead have something like this:

pub trait DeserializeElementField {
    /// Called on the first occurrence of this field's element within the parent.
    fn init(element: ElementReader<'_>) -> Result<Self, VisitorError>;

    /// Called on subsequent occurrences of this field's element within the parent.
    ///
    /// `self` was previously returned by `init` and has been through zero or more prior `update` calls.
    fn update(&mut self, element: ElementReader<'_>) -> Result<(), VisitorError>;

    /// Called iff this field's element was not found within the parent and no default is set.
    fn missing(expected: &ExpandedNameRef<'_>) -> Result<Self, VisitorError>;
}

Fewer traits, IMHO more clear.

I'm going to proceed with the MaybeUninit approach to not require Default. I'll be able to get rid of the direct switch; users won't have to make unpleasant trade-offs.

scottlamb added a commit that referenced this issue Dec 16, 2021
scottlamb added a commit that referenced this issue Dec 16, 2021
todo:
* there are several big commented-out regions in de/mod.rs
* text fields/flattens are probably broken
* skip stuff half-done?
* review pass
@scottlamb
Copy link
Owner Author

I've got a messy, proof-of-concept implementation of the MaybeUninit approach in 0d591c7. New bloat output:

File .text     Size  Crate Name
0.0%  0.1%   3.5KiB schema <schema::onvif::_::CapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   3.5KiB schema <schema::onvif::_::PtzconfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.9KiB schema <schema::onvif::_::ProfileVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.9KiB schema <schema::onvif::_::VideoEncoderConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.9KiB schema <schema::onvif::_::MetadataConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.6KiB schema <schema::onvif::_::AudioEncoderConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.1KiB schema <schema::onvif::_::CapabilitiesExtensionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.0KiB schema <schema::soap_envelope::_::FaultVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.0KiB schema <schema::onvif::_::SystemCapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.0KiB schema <schema::onvif::_::VideoSourceConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.0KiB schema <schema::onvif::_::SecurityCapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.9KiB schema <schema::onvif::_::SupportedAnalyticsModulesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.9KiB schema <schema::onvif::_::VideoAnalyticsConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.8KiB schema <schema::onvif::_::SystemDateTimeVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.7KiB schema <schema::onvif::config_description::_::MessagesTypeVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.7KiB schema <schema::onvif::_::DeviceCapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.6KiB schema <schema::onvif::_::IocapabilitiesExtensionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.6KiB schema <schema::onvif::_::IpaddressVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.6KiB schema <schema::onvif::_::ItemListDescriptionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.5KiB schema <schema::onvif::_::MulticastConfigurationVisitor as static_xml::de::ElementVisitor>::element
1.9%  6.4% 246.1KiB        And 792 smaller methods. Use -n N to show more.
2.2%  7.5% 289.9KiB        filtered data size, the file size is 12.7MiB

Note the schema crate's overall .text size is a little smaller, the grand total file size is the same (other segments grew?), and a lot of the bloat has shifted from finalize to element. The expanded code for one field in element now is:

                Some(0usize) if self.choice_present => {
                    ::static_xml::de::DeserializeElementField::update(
                        unsafe { &mut *unsafe { &raw mut (*self.out).choice } },
                        child,
                    )?;
                    Ok(None)
                }
                Some(0usize) => unsafe {
                    ::std::ptr::write(
                        unsafe { &raw mut (*self.out).choice },
                        ::static_xml::de::DeserializeElementField::init(child)?,
                    );
                    self.choice_present = true;
                    Ok(None)
                },

To improve this I'm going to look into the table-based approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant