
SceneData rework #525

Merged: 49 commits merged into master from scenedata on Dec 7, 2021
Conversation


@mosra (Owner) commented Jul 10, 2021

Final bit in the series of MeshData / MaterialData / ... reworks that brings pure data-oriented joy to scene hierarchies in a way that should be flexible enough to be fed into / serialized from common ECS frameworks like entt or flecs.

"MVP":

  • Base implementation, tests
    • only unique fields
    • field arrays
    • custom fields
    • convenience type-converting functions
    • support for non-owned and offset-only data and SceneFieldData having the same layout on 32 and 64 bits (preparing for Scene data serialization #427)
  • Per-field flags ("object mapping is sorted", "object mapping is trivial implicit", "field is offset-only")
  • OOP-like interface
    • "get me a mesh/light/... for this object", returning an Optional for unique fields and an Array otherwise (or -1 / identity transform for the "usual" properties like parent and transform? how to distinguish objects that are outside of a hierarchy & without transformations then?) -- postponed
      • that means one additional convenience API for every convenience accessor, ugh -- could these be handled by a single API, for example by one that takes a list of objects for which we want the property? -- yes, a shared fooIntoInternal() helper
      • OTOH we could provide an iterating API where one would have to switch & cast on the type, handle all possible field types and ... ugh, nope, this is not the way
    • "get all fields that this object has", could be useful for debugging -- slow as hell, users should instead directly ask for meshes/lights/...
    • "get top-level objects", "get children of an object" ... basically goes through the Parents array and picks objects from there, is an O(n^2) operation, better ideas?
      • "get parent of an object", returns Optional<Int>, used for checking which scene an object belongs to -- if -1, the object is top-level, if NullOpt the object does not belong to this scene
    • speed up the lookup using the sorted/trivial per-field flags
  • Tighten and clarify naming ("object index" vs "object mapping" vs ??, "field" vs "entry of a field", SceneObjectType vs SceneIndexType, better idea for objectCount(SceneField) that describes "how many entries a field has" -- since one object can have more than one field entry it's not really "object count", entry count?) -- settled on "object mapping" and "field"
  • document / figure out what actually an "object ID" is, and how it's meant to be worked with when animations / skins reference it (do I go through all scenes to figure out which one it belongs to? what if multiple scenes reference it?) -- though it was the exact same issue before
    • it should also be mentioned that a single object ID can reference both a 2D and a 3D object, the two can reference the same thing or not at all, and the ID can be shared among scenes or not (which affects how the output glTF will look like), and IT NEEDS SOME SANITY RULES
  • the *Into() / *AsArray() helpers should probably give out objects as well?
    • so one can loop over them in a single less-verbose expression
    • so the common assertions get done only once
    • allow the object output view be null in *Into()
  • docs for filling/usage including OOP-like access
  • SceneFieldType::Pointer and SceneFieldType::MutablePointer, with arbitrary field<T*>() accepted for those (same as with MaterialData)
  • SceneField::ImporterState for per-object importer state, of a Pointer type -- needed for compat with existing code
  • How do we actually export files with multiple scenes? Since each SceneData basically references the exact same node hierarchy, with all its data...
    • it's just a specifics of glTF / COLLADA and not all file formats in general; glTF material variants extension makes it possible to have the same nodes appear different in different scenes, and what we have now is a further generalization of that
    • the glTF exporter will need to take care of this (e.g. by picking node info only from the first scene that references it, ignoring the (potentially conflicting) duplicate info)
  • One of the goals for this representation was to be able to efficiently store a state of a particle system (hundreds of thousands of positions + colors...), however with the restrictions that there can be only one instance of a particular field (so I can't have a separate packed 16-bit Translation field just for the particle system), or requiring that if there's a Scaling then all objects that have a Translation need to have a Scaling as well, it's not efficient anymore
    • could the Archetype support mentioned below solve this (with some additional restrictions)?
    • or should I just introduce custom fields for such case, missing out on the opportunity to deal with them through the common interfaces?
    • or eventually have sub-scene references, where the sub-scene uses a different field set and encoding? scene layers like with materials, where there can be a subrange of fields for each layer, and the fields can repeat?
    • or some "field, variant 2" that allows certain sub-parts of the scene to use custom encoding? that's no different than a sub-scene / sub-layer
    • or eventually lift the restriction of having the same object mapping and reduce just to sorted (so the combined transform etc can still be processed in linear time)
  • More systematic way to deal with 64-bit object IDs and references. We need them to store arbitrary object handles, but the convenience APIs are all 32-bit as it's very unlikely that there will ever actually be >4G objects. -- postponed
    • Switching the fooFor() to 64-bit probably makes sense tho
    • One possible option would be to introduce some "object ID hash" field that would provide (contiguous?) remapping of the object IDs to 32-bit values, and the Into() and AsArray() accessors would return this remapping instead of the original (and the data provider would be forced to provide such a field?). But then again it'd cause issues with for example animation references that would still reference the original 64-bit value; there one would have to look up the hash first...
  • change mappingBound() to a Range? / rename to mappingRange()? -- postponed
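
The per-field storage and "get me a mesh for this object" lookup described above can be sketched in standalone Python, in the same pseudocode style as the commit messages later in this thread. This is a toy illustration, not the actual Magnum API; the names `Scene`, `add_field` and `field_for` are hypothetical:

```python
import bisect

class Scene:
    """Toy data-oriented scene: each named field is an (object mapping,
    field data) pair of equally sized arrays."""

    def __init__(self):
        self.fields = {}  # name -> (mapping, data, is_sorted flag)

    def add_field(self, name, mapping, data, is_sorted=False):
        assert len(mapping) == len(data)
        self.fields[name] = (list(mapping), list(data), is_sorted)

    def field_for(self, name, object_id):
        """Return the first field entry mapped to the object, or None.
        Binary search when the mapping carries the 'sorted' flag,
        linear scan otherwise."""
        mapping, data, is_sorted = self.fields[name]
        if is_sorted:
            i = bisect.bisect_left(mapping, object_id)
            if i < len(mapping) and mapping[i] == object_id:
                return data[i]
            return None
        for i, obj in enumerate(mapping):
            if obj == object_id:
                return data[i]
        return None

scene = Scene()
#                         object mapping  field data
scene.add_field('Parent', [0, 1, 2, 3],   [-1, 0, 0, 2], is_sorted=True)
scene.add_field('Mesh',   [2, 1, 2],      [7, 5, 8])  # object 2 has two meshes
```

Note how a non-unique field (Mesh here) simply repeats an object ID in the mapping, which is why the convenience accessors return an Array rather than an Optional for such fields.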

Hooking up into the rest:

  • a standalone tool that converts the SceneData to have each object reference just one mesh/light/camera/skin, to support classic workflows and make the backwards-compat wrappers easier to write
    • will eventually get publicly exposed in the new SceneTools library
  • Write deprecated backward-compat wrappers for SceneData::children2D() / children3D()
  • Deprecate object2D() / object3D() in AbstractImporter, write backwards-compat functions that extract this out of the new SceneData using the (slow) per-object convenience APIs
    • also handle the expansion of multi-mesh objects using the tool above
  • AbstractImporter::objectForName() / objectName() (drop the 2D/3D distinction), works on objects up to AbstractImporter::objectCount() (not SceneData::objectCount())
    • bump the interface string to avoid ABI-break-related crashes
    • huh... but how to distinguish between scenes and have it robust w/o having to retrieve the scene data first? or have objectForName(scene, name) and objectCount(scene) on the importer directly? -- ahhhh, there's a global object count (which is needed anyway since animations, skins etc. reference objects globally also)
    • or drop this and store names in SceneData directly? might be wasteful if the users don't need that, doesn't allow searching... OTOH it'd be cool to have a builtin possibility to put them there so the SceneData can be made self-contained -- nope, handle externally
  • Trade::AnimationData needs to have a 64-bit object ID now
  • AbstractImporter::sceneFieldName() / sceneFieldForName() for custom fields
  • SceneField::SkinId needs to be distinguished for 2D and 3D, I suppose? Or have an is2D() / is3D() getter on the SceneData itself, which looks for transformation fields and then decides? What if neither is available? -- checked and populated in the constructor, skin fields require a transformation field to disambiguate the dimensionality
  • How do we deal with objects that have multiple meshes assigned, how does one understand skins there? is the same skin applied to all meshes? or just to the first and it's expected that extra meshes have extra skins? -- same skin for all meshes
    • Make this field get copied when converting to single-functioning objects for backwards compatibility
  • SceneField::MeshMaterial needs to be signed because otherwise it's impossible to represent meshes without materials (as it has to have the same object mapping as meshes)
  • Adapt AnySceneImporter
  • Adapt ObjImporter -- it doesn't have any scene / material assignment yet, nothing to do
  • Adapt TinyGltfImporter
    • Currently it fails if there's no scene that'd list the root nodes, implement a fallback (with a warning) that collects all root nodes instead -- postponed, needs a BitArray
    • And consequently warn also if there are scenes but also loose nodes not referenced by any of them -- postponed, needs a BitArray
  • Adapt CgltfImporter
    • same handling as TinyGltf for scene-less files -- postponed
  • Adapt AssimpImporter
    • might need the same handling as TinyGltf for scene-less files -- nope, nodes are there only if a scene is
  • Adapt OpenGexImporter
    • might need the same handling as TinyGltf for scene-less files -- nope, there's just one implicit scene
  • Adapt PrimitiveImporter (it has both 2D and 3D scenes!), a nice use case for constexpr SceneData
  • Adapt plugins to make use of per-field flags, ideally everything at least Ordered, with implicit object mappings omitted
  • update docs for all plugins, mention which fields are always present, which not, what types, shared object mapping, ordering/triviality...
  • add to magnum-sceneconverter --info (listing what fields a scene has and how many entries each field has)
  • adapt DartIntegration
  • adapt the viewer example
  • adapt magnum-player
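
The single-function conversion tool mentioned at the top of this list can be sketched as follows (a hypothetical standalone illustration of the idea, not the actual SceneTools implementation): extra mesh entries of an object get moved onto newly synthesized child objects, so every object ends up with at most one mesh.

```python
def expand_multi_mesh(mesh_mapping, mesh_data, object_count):
    """For objects with more than one mesh entry, keep the first entry on
    the original object and move each extra entry to a newly added child
    object. Returns the new (mapping, data, parents-of-new-objects)."""
    seen = set()
    new_mapping, new_data, new_parents = [], [], []
    next_id = object_count  # new objects get IDs past the current bound
    for obj, mesh in zip(mesh_mapping, mesh_data):
        if obj in seen:
            # extra mesh: synthesize a child object holding it
            new_parents.append((next_id, obj))
            obj = next_id
            next_id += 1
        else:
            seen.add(obj)
        new_mapping.append(obj)
        new_data.append(mesh)
    return new_mapping, new_data, new_parents
```

The same pass would also need to copy over skin assignments and the like for the synthesized children, which is what the "Make this field get copied" item above refers to.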

Followup tasks:

  • SceneFieldType::String, internally represented as begin+size pairs with the data being taken from the data blob, or as offsets if the field is offset-only; begin+size instead of an implicit offset array in order to allow the same string being shared between multiple fields -- postponed
  • It should assert that all transformations are either 2D or 3D, or allow 2D transforms used for 3D and vice versa, like MeshData does for positions -- no (at least for now), conversion complexity not worth the 0.001% use case
  • Support the Bool type -- postponed
  • Support storing direct (non-owned) pointers to MeshData, MaterialData if one so desires (could be useful for the upcoming SceneTools library as well) -- postponed
  • Support sparse fields (highest bit of a 32-/64-bit object index used to distinguish "dead" entries), for easier (de)serialization of live systems -- postponed
  • Additional per-field flags ("field is sparse", "field is ordered", ...), flags for describing field data in addition to object data -- postponed
  • SceneField::Present / Belongs / ... ? which is a bit array and enumerates which objects belong to the scene -- postponed
  • SceneField::InstanceMesh to support glTF's EXT_mesh_gpu_instancing -- postponed
    • CollisionMesh? -- postponed
  • Mesh view properties (index/vertex offset and count, instance offset and count), all should share the same object mapping; all should be retrievable using a single convenience API that fills a few separate views -- currently doable via custom fields, postponed to when importer support is there
    • Provide a corresponding overload to GL::AbstractShaderProgram::draw() that takes raw data instead of a ton of GL::MeshView references -- c2ecaa6
      • Adapt the method chaining overload in all builtin shaders for that (make a macro!!) -- postponed
  • Material texture transform + layer properties that get applied on top of the referenced material -- postponed
  • Instead of failing, synthesize parent + transform info in the AsArray() helpers if not present (-1 / identity for all objects? need to think about how to distinguish transformation-less objects in this case) -- not really possible, transformless objects are common (plus the type distinguishes between 2D and 3D), non-node objects are also a thing now
    • same for (now signed) MeshMaterial, all -1s
  • transformationObjectsAsArray() helper that picks object mapping for Transformation / TRS fields depending on which of them is available; currently this has to be done by the user and is error-prone -- solved by the Into / AsArray helpers returning the object mapping as well
  • Real-world test sharing the data with entt/flecs? -- I don't see why this new way would be worse than the cursed OOP API that was there before
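
The "helpers return the object mapping as well" resolution from the list above roughly means accessors hand out (object, value) pairs, so picking the right mapping for Transformation vs. TRS fields no longer falls on the user. A hypothetical sketch (function and field names are illustrative, not Magnum API):

```python
def transformations_as_array(fields):
    """Return (object, transformation) pairs, taking the object mapping
    from whichever of the Transformation / Translation fields is present,
    in that order of preference. Each field is a (mapping, data) pair."""
    for name in ('Transformation', 'Translation'):
        if name in fields:
            mapping, data = fields[name]
            return list(zip(mapping, data))
    return []
```

Because the mapping travels with the values, a caller can loop over the result in a single expression and the shared assertions happen once, per the *Into() / *AsArray() item in the MVP list.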

Open Qs:

  • Storing octrees / kD trees -- the Parent field currently denotes that an object is a hierarchical node, so any other (custom) field could be assigned to objects to denote those are octree cells; encoding of the octree is up to the user
  • Storing child information (inverse of the parent information) -- allows users to pick a better representation for a given task, parentsAsArray() / childrenAsArray() then falls back to querying the information from the other field if the primary is not available -- postponed
  • SceneFieldType::SceneField? e.g. for a potential accelerating SceneField::InverseMapping field that maps object IDs to fields (i.e., is sorted by object ID, or is trivial and sparse ...) -- can't see why this would be useful anymore
  • Think more about SceneField::Parent, if it really needs to be self-referencing (what else would the object mapping view be? the exact same view?), and if it could be restricted to transforms only -- not self-referencing, but the Parent field implies the object is a node
    • rename Parent to ParentNode? to make room for parent octree cell etc. -- postponed
  • Support sparse object IDs (for easier (de)serialization of live systems, again) -- postponed
  • Support more than one array per field (for easy storage of "archetypes"-like ECS state) -- postponed
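
The childrenAsArray() fallback mentioned above, deriving child lists by inverting the Parent field, could look like this (a hypothetical sketch; one O(n) pass instead of the naive O(n^2) per-object scan flagged in the MVP list):

```python
def children_as_array(parent_mapping, parent_data):
    """Invert the Parent field: returns ({object: [children]}, top-level
    objects). A parent value of -1 marks a top-level object."""
    children = {}
    top_level = []
    for obj, parent in zip(parent_mapping, parent_data):
        if parent == -1:
            top_level.append(obj)
        else:
            children.setdefault(parent, []).append(obj)
    return children, top_level
```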

@mosra added this to the 2021.0a milestone Jul 10, 2021
@mosra self-assigned this Jul 10, 2021
@mosra force-pushed the scenedata branch 7 times, most recently from 94fd381 to 9a8c22b on September 3, 2021 16:43

codecov bot commented Sep 3, 2021

Codecov Report

Merging #525 (9a8c22b) into master (dbec10d) will increase coverage by 0.44%.
The diff coverage is 98.61%.

❗ Current head 9a8c22b differs from pull request most recent head 70a74d5. Consider uploading reports for the commit 70a74d5 to get more accurate results

@@            Coverage Diff             @@
##           master     #525      +/-   ##
==========================================
+ Coverage   79.82%   80.27%   +0.44%     
==========================================
  Files         511      511              
  Lines       32036    32818     +782     
==========================================
+ Hits        25574    26345     +771     
- Misses       6462     6473      +11     
Impacted Files Coverage Δ
src/Magnum/Trade/AbstractImporter.h 100.00% <ø> (ø)
src/Magnum/Trade/Trade.h 100.00% <ø> (ø)
src/Magnum/Trade/SceneData.cpp 97.92% <97.92%> (-2.08%) ⬇️
src/Magnum/Math/PackingBatch.cpp 99.16% <100.00%> (+0.14%) ⬆️
src/Magnum/Trade/AbstractImporter.cpp 100.00% <100.00%> (ø)
src/Magnum/Trade/SceneData.h 100.00% <100.00%> (ø)
src/Magnum/Math/Vector3.h 100.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update dbec10d...70a74d5.

Needed for backwards compatibility purposes currently, eventually it'll
become a public API in the upcoming SceneTools library.

Now the code has all needed backwards compatibility in place.
That's what plugins implemented with the legacy API did, as it's more
flexible than a matrix.
Even though this API is deprecated and thus not meant to be used, most
existing code is still using the previous APIs and relying on the
backwards compatibility interfaces. And I wasted quite some time
debugging why the scene looked empty.
Another thought I had was about allowing a 2x2 / 3x3 matrix to be used
for rotation, but there's the ambiguity with it possibly containing
scaling / shear / whatever, which would then cause EXPLODING HEADACHES
when converting to quaternions / complex numbers.
As the comment says -- before the user code expected that if the scene
hierarchy is broken, particular objects will fail to import. Now the
whole scene fails to import, so we don't even get to know the actual
(expanded) deprecated 2D/3D object count. To reduce the suffering,
return at least the dimension-less object count there. It won't include
the duplicates from the single-function-object conversion but better
than reporting 0.
Honestly I don't care much -- it's just that the original
PrimitiveImporter tests started to fail because they expected
object3DForName() to return -1 for a 2D name, but that's no longer the
case. It would be possible to fix this, but I doubt anyone ever relied
on such behavior for 2D scenes, so just add a test that acknowledges
this new behavior.
This has to be solved on a more systematic level, perhaps even by
switching all types to be 64-bit. In the following commit all *AsArray()
and *Into() functions will output the object IDs as well, meaning this
would need to be handled in each and every API, which is a huge
maintenance burden.

As it's very unlikely that there actually will *ever* be >4G objects,
one possible option would be to introduce some "object ID hash" field
that would provide (contiguous?) remapping of the object ID to 32-bit
values, and the Into() and AsArray() accessors would return this
remapping instead of the original. But then again it'd cause issues with
for example animation references that would still reference the original
64-bit value.
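
The "object ID hash" idea from this commit message could look roughly like this (purely illustrative; no such field exists in the merged code, and the name `build_id_remap` is hypothetical):

```python
def build_id_remap(object_ids_64):
    """Contiguously remap arbitrary 64-bit object IDs to 32-bit values.
    Returns the remapped array plus the forward map, which is exactly
    what external references (e.g. from animations) that still carry
    the original 64-bit IDs would have to be looked up through."""
    forward = {}
    remapped = []
    for oid in object_ids_64:
        if oid not in forward:
            if len(forward) >= 2**32:
                raise OverflowError('more than 4G distinct objects')
            forward[oid] = len(forward)
        remapped.append(forward[oid])
    return remapped, forward
```
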
Because that way one can query a field with *AsArray() and iterate
through it in a single expression. This also resolves the pending issue
where it was more than annoying to fetch object mapping for TRS fields
when only a subset of the fields is available.
This got originally added as some sort of a kludge to make it easy to go
to the parent transformation, assuming Parent and Transformation share
the same object mapping:

    parentTransformation = transformations[parents[i]]

But after some ACTUAL REAL WORLD use, I realized that there's often a
set of objects that have a Parent defined, and then another, completely
disjoint, set of objects that have a transformation (for example certain
nodes having no transformation at all because it's an identity). And so
this parent indirection is not only useless, but in fact an additional
complication. Let's say we make a map of the transformations, where
transformationMap[i] is a transformation for object i:

    transformationMap = {}
    for j in range(len(transformations)):
        transformationMap[transformationObjects[j]] = transformations[j]

Then, with *no* assumptions about shared object mapping, the indirection
would cause parent transformation retrieval to look like this:

    parentTransformation = transformationMap[parentObjects[parents[i]]]

While *without* the indirection, it'd be just

    parentTransformation = transformationMap[parents[i]]
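
Putting the commit message's preferred (indirection-free) variant together as runnable code, with toy data where the Parent and Transformation object sets partially overlap (an illustration of the reasoning above, not Magnum API):

```python
# Transformation field: note objects 5 and 6 have one, object 3 does too,
# but nothing forces this set to match the Parent field's set
transformation_objects = [3, 5, 6]
transformations = ['T3', 'T5', 'T6']

# Build the map once: transformation_map[i] is the transformation of object i
transformation_map = {}
for j in range(len(transformations)):
    transformation_map[transformation_objects[j]] = transformations[j]

# Parent field *without* the indirection: values are object IDs directly
parent_objects = [5, 6]  # objects that have a parent
parents = [3, 5]         # their parents, as plain object IDs

def parent_transformation(i):
    """Transformation of the parent of the i-th Parent field entry."""
    return transformation_map[parents[i]]
```
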
Now it's a field and its corresponding object mapping, instead of
field and "objects":

 - Goes better with the concept that there's not really any materialized
   "object" anywhere, just fields mapped to them.
 - No more weird singular/plural difference between field() and
   objects(), it's field() and mapping() now.
 - The objectCount() that actually wasn't really object count is now a
   mappingBound(), an upper bound for object IDs contained in the object
   mapping views. Which is quite self-explanatory without having to
   mention every time that the range may be sparse.
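
In the toy terms used in the sketches above, mappingBound() is just the maximum object ID across all mapping views plus one, with no claim that the range is dense (hypothetical standalone sketch):

```python
def mapping_bound(fields):
    """Upper bound for object IDs over all object mapping views: max ID
    plus one. The range below the bound may be sparse -- not every ID
    needs to actually appear in any mapping."""
    bound = 0
    for mapping, _data in fields.values():
        for obj in mapping:
            bound = max(bound, obj + 1)
    return bound
```
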
Currently used by the per-object access APIs to make the lookup
constant- or logarithmic-time instead of linear, available for use by
external data consumers as well.
Not really important right now as the SceneData from these is used only
in internal deprecated APIs, but at least it might speed up the
children2D() / children3D() queries. Mainly done so I don't forget to do
this later when these APIs are published in the SceneTools library.

What's not done is the rather complex logic in the single-function
conversion utility, where a field could retain the implicit/ordered
flags in *some* scenarios. There are too many corner cases, so better to
be conservative and not preserve anything rather than mark something as
ordered while it's no longer the case. The corner cases are hopefully
all checked for (and XFAIL'd) in the test.
Hm, this will soon need some color distinction, it's starting to get
hard to read.
Mostly just to avoid the return types changing to incompatible types in
the future, breaking existing code. The internals are currently not
fully ready to operate with 64-bit object IDs, especially the AsArray()
APIs -- those I will have to solve in the future somehow. Returning
64-bit values in the pairs would add four byte padding after basically
each value, which is way too wasteful for the common case.

The Into() APIs could eventually get 64-bit overloads though.
To align with the new SceneData.
This is implemented in a generic way internally, to allow copying of
arbitrary fields, not just skins for meshes, in the upcoming SceneTools
utility.
With the previous commits, existing plugin implementations built and ran
against the new code, however it introduced several ABI breaks meaning
that existing plugin binaries would crash. This forces them to be
recompiled to match the new version string.
@mosra mosra merged commit 4418bec into master Dec 7, 2021
@mosra mosra deleted the scenedata branch December 7, 2021 23:03
@mosra (Owner) commented Dec 7, 2021

Merged the current state; a few bits got scrapped and put aside into #542 because they're too cursed. Items marked as postponed now live on the project board and will be gradually implemented in the future.
