Autodesk: Billboards #39
Conversation
XformOps `affixPosition` and `affixScale` can specify the target position and size
in a selected coordinate space:
- `ndc`: coordinate is in the normalized device space ([-1.0, 1.0]x[-1.0, 1.0])
I think it might be better to pick a different name for this. Is there something simpler that a lay person might grok more easily?
USD has generally avoided acronyms other than super common ones like IOR. I worry that NDC is too domain specific, and the acronym doesn't align with the words (NDC vs NDS). I don't think `device space` is necessarily easy to grok either. What would the `device` be in this scenario?
I agree 'ndc' is too domain specific and artists might not know what it is. I also considered using 'clip' space, but it is not as precise as ndc in this context.
I will reword "normalized device space" to "normalized device coordinates" for now.
We may need to create a new term and define it in the proposal. I don't know what better term to use, though. Maybe just name it `normalized`?
"Normalized device coordinate" is a standard term of art, and `ndc` is a standard "acronym of art". I don't think changing the term improves understandability, and I'm not convinced that spelling out ndc helps. Perhaps a definition is in order.
The reason I don't think spelling it out or changing the term helps is this: artists are used to Photoshop-type coordinate systems with a 0,0 origin, typically upper left. A coordinate system centered in the viewport and extending in negative directions to indicate "up and left" is not at all what they're trained for. Leaving it as `ndc` means that there's an opportunity to prompt a discussion and some learning.
Just calling it `normalized` is super vague compared to the term of art. What's being normalized? Versus what origin and Euclidean metric? The only thing that would make sense to me is something like "viewport-centered coordinate system with negative values being up and left, where -1 to 1 is stretched to fit the whole viewport". It doesn't add anything over `ndc` ;)
Agreed that NDC has a consensus interpretation and if we need to introduce a new concept, introducing one that's easy to Google and consistent is compelling. For this & view, I'd love the full mapping; e.g. ndc maps things onto the near plane, i.e. XY plane @ Z = 0, with X+ and Y+ pointing towards "foo". And for "view", we're looking down such and such axis, with such and such axis as up and such and such as right. For "ndc" I'm assuming you can only specify a 2d position and it's constrained to the near plane? And for "view" it would be a 3d position in eye space, before perspective transform?
Yes, ndc is assumed to be xy on the near plane and view is the 3d position in eye space. Since you mentioned perspective transform, I assume you mean clip space?
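For illustration, here is a minimal sketch of that mapping using only `Gf`. This is not proposal code; the function name and the GL-style assumption that the near plane sits at NDC z = -1 are mine.

```python
from pxr import Gf

def ndc_to_world_on_near_plane(frustum, ndc_x, ndc_y):
    """Map an (x, y) NDC position to the world-space point on the near plane.
    Assumes GL-style clip conventions where the near plane is at NDC z = -1."""
    # Row-vector convention: world-to-NDC is view matrix followed by projection.
    world_to_ndc = frustum.ComputeViewMatrix() * frustum.ComputeProjectionMatrix()
    ndc_to_world = world_to_ndc.GetInverse()
    # GfMatrix4d.Transform applies the projective (homogeneous) divide.
    return ndc_to_world.Transform(Gf.Vec3d(ndc_x, ndc_y, -1.0))
```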
We have a similar schema for AR use here: https://developer.apple.com/documentation/arkit/arkit_in_ios/usdz_schemas_for_ar/preliminary_anchoringapi?language=objc The differences I see are:
I do think your proposal captures most of what ours does for rotational alignment. Does this proposal consider positional anchoring as well? I was unclear on that aspect. If it does include positional anchoring, I might suggest calling out (for a future proposal perhaps) the ability to specify an in-scene prim to anchor to, not just an extrinsic camera/viewer.
I think there's a lot of potential for describing sections of the scene graph as screen-anchored. One could imagine shipping assets with Macbeth charts and context spheres as siblings to the default prim.
The proposal notes two transformative changes to `xformOps`. My main observation would be that many clients assume `xformOps` are intrinsic. Because extrinsic computations add complexity to reasoning about `Xformable` prims, a new schema type may be a better fit for this functionality.
A minor consideration regarding compound ops is that the third namespace component is currently used to describe the name of xform ops when an op is used multiple times. A new schema type wouldn't have this conflict.
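To make the namespace point concrete, this is how the third component is used today by the existing API (illustration only): the op suffix disambiguates repeated uses of the same op type.

```python
from pxr import Usd, UsdGeom, Gf

stage = Usd.Stage.CreateInMemory()
xf = UsdGeom.Xform.Define(stage, "/World/Widget")

# Two translate ops on one prim, disambiguated by the third namespace component
# (the op suffix): xformOp:translate:pivot and xformOp:translate:offset.
xf.AddTranslateOp(opSuffix="pivot").Set(Gf.Vec3d(0, 1, 0))
xf.AddTranslateOp(opSuffix="offset").Set(Gf.Vec3d(2, 0, 0))

print([op.GetOpName() for op in xf.GetOrderedXformOps()])
```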
It looks like, for the AR uses, you want to anchor a virtual object to a real object. I guess there are presumptions to have matched geometry (or bounding geometry?) between the virtual and real object? To support such anchoring, you probably need a combined transformation of translation, scaling and rotation, which the currently proposed ops don't provide on their own. I think, instead of reserving the anchoring type, you can create a new xformOp to support such anchoring. I probably did not make it clear that more xformOps can and will be added to support other constrained transformations. I recently started calling the feature "constrained transformation". Too bad that there is no way to create a plugin to add new xformOps right now.
What is the definition of …? Or do you just want more control over the relation to the alignment reference? Like parallel or perpendicular to the screen, or even an authored angle between the prim and the screen? In this case, we can add a new attribute in `align`.
It is not clear to me how this can be supported in an xformOp though. To anchor one prim to another, you would need the world coordinate of the anchor prim. That data is tricky to get in an xformOp.
To build on @nvmkuruc's comments, but starting with the quote from @andy-shiue-autodesk:
It's actually a considerable feature to interchange and native-DCC-ingestion-of-USD that Xformable/xformOps remain intrinsic, simple, and "well supported" (in quotes because not every DCC supports everything the current xformOps compactly encode, but it's all straight-up and easy to translate). I know we haven't provided concrete guidance on how to handle more sophisticated encodings of transformations (e.g. constraints) in USD, but in a nutshell, the thinking is that Xformable remains simple, but its attributes will eventually be "drivable" by rigging/computation powered by OpenExec, whose effects could always be "baked out" in a caching step.

The other thing is that there needs to be a strict separation between "3D scene evaluation" and viewing state, for the reasons Matt mentioned. What this suggests, in advance of OpenExec, is leveraging prim types that can be easily identified as living in screen-space, and therefore not affecting the scene's 3D bounds, and informing non-rendering USD clients that "here and below is data that only makes sense in the context of a render". I think this also makes the Hydra update/invalidation process simpler, with no need to try to communicate data back to the UsdStage/evaluation.

The Apple anchoring schema I think captures this, and it is very nicely formalized in the Add visualization schema proposal. It would be fantastic to see Billboards explored in that space! Thanks, @andy-shiue-autodesk!
@nvmkuruc @spiffmon Thank you for pointing out the concerns about the extrinsic states. They are valid points, and I made a wrong choice in supporting the extrinsic states as runtime APIs. The extrinsic states need to be "authorable" as part of the schemas. I started the billboards project to support viewport manipulators, and I was kinda limiting my thought process to having the renderer integrate the billboards xformOp. Here is what I think I should do (preliminary, of course):
This design has extrinsic states as part of the authored scene. Composition arcs should work well. External evaluation engines are allowed to compute the context states that drive the transformation. Two of the big implementation challenges I had (accessing extrinsic states and managing the dirtiness of the transformation) will be addressed easily with the extrinsic states as embedded attributes. A big win for me. Has this kind of context schema been discussed before? It could be under UsdRender (as a new UsdRenderContext) or by extending UsdRenderSettings.
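To picture what "authorable extrinsic state" could look like, here is a heavily hypothetical sketch: the prim path, attribute names, and relationship name below are placeholders invented for illustration, not existing or proposed schema.

```python
from pxr import Usd, Sdf, Gf

stage = Usd.Stage.CreateInMemory()

# Hypothetical context prim holding extrinsic (viewing) state as plain attributes.
ctx = stage.DefinePrim("/Render/BillboardContext")
ctx.CreateAttribute("context:viewMatrix", Sdf.ValueTypeNames.Matrix4d).Set(Gf.Matrix4d(1.0))
ctx.CreateAttribute("context:viewport", Sdf.ValueTypeNames.Float4).Set(Gf.Vec4f(0, 0, 1920, 1080))

# A billboard prim could target the context via a relationship (also hypothetical),
# letting an external engine update the context attributes that drive the transform.
hud = stage.DefinePrim("/World/Hud", "Xform")
hud.CreateRelationship("billboard:context").AddTarget(ctx.GetPath())
```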
Yes, I expect people to have one list of billboards ops near the end of the transformation stack. But if we consider transformations that are "constrained" in certain ways, the math of certain constrained transformations may work as multiples in a transformation stack. There is no hard restriction on how billboards can be used, but the mathematics can make the end transformation unpredictable if people are creative.
This is a good point. I am inclined to call it constrained transformation (transformation that is constrained by extrinsic states). Considering the complexity compared to the current xformOp, having it as a new schema makes a lot of sense.
I was considering doing the parsing magic on …
Hi!
Thanks for pushing this discussion forward! I'm excited to see this feature and excited about the general direction, but I've left some organizational and implementation notes. Please take a look, and if you have questions about my comments I'm happy to talk here or I'm also on ASWF slack.
Thanks,
Tom
XformOps `affixPosition` and `affixScale` can specify the target position and size
in a selected coordinate space:
- `ndc`: coordinate is in the normalized device space ([-1.0, 1.0]x[-1.0, 1.0])
- `view`: coordinate is in the view space, where camera is at (0,0,0) and faces the x-y plane.
- `screen`: coordinate is in the screen space ([0, width-1]x[0, height-1] where width and
Since the scene can't know what the screen space is likely to be (except for prims created by the app), it seems like instead of specifying a range we want to just say that the linear unit is pixels and the origin is understood to be in X location.
The screen space is defined by the viewport setting of the renderer. The viewport needs to be one of the extrinsic states provided to the `screen` mode.
I'm kinda against using pixels directly as the position unit, considering devices with various DPI. I considered adding a 'metric' option so users can define the position in inches or cms. I decided not to add it since it can be achieved with the `screen` option.
But 'pixel' or 'inch' are clearly more intuitive to non-engineer users, so I think it is reasonable to add them.
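For concreteness, a minimal sketch (not part of the proposal; the lower-left pixel origin and y-up NDC conventions are assumptions) of how a renderer could map a `screen` position to `ndc` once its viewport is known:

```python
def screen_to_ndc(x_px, y_px, width, height):
    """Map a pixel position in [0, width-1] x [0, height-1] to NDC in [-1, 1] x [-1, 1].
    Assumes the pixel origin is the lower-left corner and NDC y points up."""
    ndc_x = 2.0 * x_px / (width - 1) - 1.0
    ndc_y = 2.0 * y_px / (height - 1) - 1.0
    return ndc_x, ndc_y
```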
XformOp `affixPosition` consists of two attribute fields to affix the transformable to the
specified screen location:
- `affixPosition:anchor`: source location that is specified in the model space.
"anchor"/"anchorage" are confusing to me; why not "localAnchor" and "targetSpacePosition"?
I searched the dictionary to find good words for them, haha. I can definitely use more descriptive terms to define them.
- `affixScale:baseSize`: the unit size in the model space.
- `affixScale:scaleSize`: the equivalent unit size in the selected coordinate space (`ndc`, `view`, or `screen`).
- `affixScale:pivot`: the center position of the scaling in the model space.
Seems nice to have this match affixPosition, e.g. "affixScale:anchor". And for consistency, baseSize = localSize, scaleSize = targetSpaceSize?
- `viewer`: front normal points to the camera location.
- `plane`: front normal is orthogonal to a custom plane.

`align` constructs the local transformation that maintains the orientation of the
While "align" is probably the most straightforward term here, it would be nice to align it with the other proposed additions, e.g. "affixOrientation".
It seems like the three alignment targets you're offering are:
- Align to near plane.
- Align to camera position.
- Align to parallel or anti-parallel (or maybe only anti-parallel) to a given direction.
I think renaming "viewer" to "cameraPosition" would be clearer. I also think that orienting to be orthogonal to a plane is just orienting to be parallel to the plane normal, and specifying it as orienting to a direction would be clearer; but note that you need a second vector ('up') in order to make this a valid orientation.
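To illustrate why the second 'up' vector is needed: a single direction leaves the roll about it unconstrained. A small sketch using only `Gf` (illustration only; the camera-style convention that -Z faces `direction` is an assumption, and the result degenerates when `direction` and `up` are parallel):

```python
from pxr import Gf

def orientation_facing(direction, up):
    """Rotation whose -Z axis points along `direction` (both Gf.Vec3d),
    with roll fixed by `up`."""
    view = Gf.Matrix4d()
    # gluLookAt-style world-to-eye matrix, looking from the origin toward `direction`.
    view.SetLookAt(Gf.Vec3d(0, 0, 0), direction, up)
    # Its inverse is the eye-to-world frame, i.e. the orientation we want.
    return view.GetInverse()
```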
Good suggestions. I will do a review round on naming to make sure the meanings are clear (will do my best) and consistent.
All billboards XformOps are acceptable Ops in `xformOpOrder`. Billboards XformOps can
combine with the existing XformOps in USD. This allows the use cases such as rendering a
fixed-size axis-tripod at a fixed screen location. The axis-tripod can, of course, synchronize
its rotation with the trackball action of the camera.
There are a number of compensation mechanisms in affixPosition and affixScale to basically bake in transforms on top of the computed affix-transform (e.g. affixPosition:anchor, which is by default the origin; affixScale:baseSize, which is by default 1; and affixScale:pivot, which is by default the origin).
Are you planning for these to compose with descendant transforms, according to standard transform composition rules? It sounds from this paragraph like you are, and in that case I'd eliminate the other mechanisms mentioned above as redundant; I think the API will be easier to understand if it's not offering this redundant functionality.
xformOps already have a flag with similar evaluation semantics, called resetXformStack, that we handle while computing the resolved transforms; it re-roots the current model space at the world origin, ignoring inherited values. It seems like affix* could work in a similar way, where they reset the transform and ignore parent composed transforms, but can be inherited by child composed transforms.
If the "affix" commands do compose, and do reset transform inheritance, do they reset all inheritance, or do "position/scale/align" need to specifically reset the position, scale, and rotation components of the transform? For the latter, for full generality, to compute transform inheritance we'd need to compute a decomposition of the parent transform (which is pricey) and figure out how to handle degenerate parent transform matrices, although it would be semantically rich in a nice way.
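For reference, this is how resetXformStack is authored with the existing UsdGeom API today; the translate op below just stands in for a (hypothetical) billboard affix op:

```python
from pxr import Usd, UsdGeom, Gf

stage = Usd.Stage.CreateInMemory()
hud = UsdGeom.Xform.Define(stage, "/World/Hud")

# Ignore all inherited transforms: authors "!resetXformStack!" at the front
# of xformOpOrder.
hud.SetResetXformStack(True)

# Placeholder for a billboard affix op.
hud.AddTranslateOp(opSuffix="screenOffset").Set(Gf.Vec3d(0.9, 0.9, 0.0))

# xformOpOrder is now: ["!resetXformStack!", "xformOp:translate:screenOffset"]
print(hud.GetXformOpOrderAttr().Get())
```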
Your point on the pricey decomposition of the current transformation matrix (CTM) is valid. When I worked on the prototype, I had difficulties extracting the rotation reliably. Without certain pre-conditions on the transformation matrix, I did not find a reliable way to decompose it.
This part of the proposal is confusing and probably misleading. I'll make changes to it.
Here is what I think at this moment:
- Billboards xformOps construct a local matrix without considering the CTM. The billboards matrix will be applied to the CTM just like what the current xformOps are doing.
- In addition to `resetXformStack`, we might want to add new `resetOp`s such as `resetTranslation`, `resetScale` and `resetRotation`. Such a `resetOp` may not work if the CTM is degenerate, unfortunately ;(
- Users should optionally prepend a `resetXformStack` or a `resetOp` before billboards xformOps.

I would prefer to make the reset explicit instead of having it implicitly reset inside. Baking the reset into billboards does not seem to be a good idea. It is not flexible and can impact the performance even for use cases where the resetting is not required.
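For what it's worth, a sketch of the decomposition being discussed, using GfTransform (an illustration of the cost/reliability concern, not proposal code):

```python
from pxr import Gf

def decompose_ctm(ctm):
    """Factor a composed transform (Gf.Matrix4d) into translation, rotation and
    scale. The factorization becomes unreliable for degenerate (singular) or
    strongly sheared matrices, which is the concern raised above."""
    xf = Gf.Transform(ctm)
    return xf.GetTranslation(), xf.GetRotation(), xf.GetScale()
```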
examples of the renderer are a hydra render delegate like HdStorm or a scene index implementation.
Codes behind the hydra layer updates the camera states stored in the registry whenever the
camera is updated. While computing the local transformation, billboards XformOps retrieve
the latest updated camera states from the registry.
So you'll need the camera state in a few places. I'd imagine in UsdGeomXformCache (and associated UsdGeomXformable helper functions), you'd want to provide the camera state so that you can correctly compute things; here, I'd recommend either passing in a UsdGeomCamera prim, or a GfFrustum. The latter would be preferable for applications with mixed scene types.
In terms of feeding these to hydra, the ideal solution is to leave the transforms as camera relative, and only transform them into world space inside the render delegate. GlfSimpleLight, for example, works this way. This greatly decreases the invalidation burden when you switch cameras, or when you do multi-camera rendering (e.g. for multiple viewports). Another solution, which I like far less, is to have the HdFlatteningSceneIndex transform flattening become camera aware.
Building coordination between the render delegate and the USD evaluation causes evaluation cycles and awkward data dependencies, and I'd really rather avoid such a design (based on some experience :).
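A sketch of the camera-state plumbing Tom describes, using existing UsdGeom/Gf calls only (the billboard-specific computation and any new UsdGeomXformCache entry point are left out as hypothetical):

```python
from pxr import Usd, UsdGeom

def frustum_for_camera(stage, camera_path, time=Usd.TimeCode.Default()):
    """Extract a Gf.Frustum from a UsdGeomCamera prim; this is the viewing
    state a camera-aware transform computation would need to be handed."""
    cam = UsdGeom.Camera(stage.GetPrimAtPath(camera_path))
    gf_camera = cam.GetCamera(time)  # UsdGeomCamera -> Gf.Camera
    return gf_camera.frustum         # Gf.Camera -> Gf.Frustum
```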
I actually disagree with this. It will be very common to have multiple viewports open and viewing the same stage. It seems neither clean nor practical (due to multi-threading concerns) for each viewport to author its own latest camera info into either the same attributes, or even different ones. Part of the thinking in moving this into the "VizPrim" domain is in acknowledgment that it is a drawable whose state cannot be fully captured on the Stage itself, I'd argue.
Description of Proposal
A proposal for billboards transformations based on extending XformOp.
Supporting Materials
N/A