Autodesk: Billboards #39

Open
wants to merge 1 commit into
base: main
from

Conversation

erikaharrison-adsk
Contributor

Description of Proposal

Proposal of billboards transformations based on extending the XformOp.

Supporting Materials

N/A

Contributing


XformOps `affixPosition` and `affixScale` can specify the target position and size
in a selected coordinate space:
- `ndc`: coordinate is in the normalized device space ([-1.0, 1.0]x[-1.0, 1.0])
Contributor

I think it might be better to pick a different name for this. Is there something simpler that a lay person might grok easier?

USD has generally avoided acronyms other than super common ones like IOR. I worry that NDC is too domain specific, and the acronym doesn't align with the words (NDC vs NDS). I don't think device space is necessarily easy to grok either. What would the device be in this scenario?

@andy-shiue-autodesk andy-shiue-autodesk Apr 10, 2024

I agree 'ndc' is too domain specific and artists might not know what it is. I also considered using 'clip' space, but it is not as precise as ndc in this context.

I will reword normalized device space to normalized device coordinates for now.

We may need to create a new term and define it in the proposal. I don't know what better term to use, though. Maybe just name it normalized?

Member

Normalized device coordinate is a standard term of art, and ndc is a standard "acronym of art". I don't think changing the term improves understandability, and I'm not convinced that spelling out ndc helps. Perhaps a definition is in order. The reason I don't think spelling it out or changing the term helps is this: artists are used to Photoshop-type coordinate systems with a 0,0 coordinate, typically upper left. A coordinate system centered in the viewport and extending in negative directions to indicate "up and left" is not at all what they're trained for. Leaving it as ndc means that there's an opportunity to prompt a discussion and some learning.

Just calling it normalized is super vague compared to the term of art. What's being normalized? Versus what origin and Euclidean metric? The only thing that would make sense to me is something like "viewport centered coordinate system with negative values being up and left and -1 to 1 are stretched to fit the whole viewport". It doesn't add anything over ndc ;)

Agreed that NDC has a consensus interpretation and if we need to introduce a new concept, introducing one that's easy to Google and consistent is compelling. For this & view, I'd love the full mapping; e.g. ndc maps things onto the near plane, i.e. XY plane @ Z = 0, with X+ and Y+ pointing towards "foo". And for "view", we're looking down such and such axis, with such and such axis as up and such and such as right. For "ndc" I'm assuming you can only specify a 2d position and it's constrained to the near plane? And for "view" it would be a 3d position in eye space, before perspective transform?

Yes, ndc is assumed to be xy on the near plane and view is the 3d position in eye space. Since you mentioned perspective transform, I assume you mean clip space?
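To make the relationship between the two spaces concrete, here is a minimal Python sketch of how a 3D view-space position could map to a 2D ndc position. It assumes a symmetric perspective frustum, a camera at the origin looking down -Z with +Y up, and ndc +Y pointing up. The function name `view_to_ndc` and its parameters are hypothetical and do not come from the proposal.

```python
import math


def view_to_ndc(point, fov_y_degrees, aspect):
    """Project a view-space point to ndc (x, y), each in [-1, 1] when visible.

    Assumptions (not from the proposal): symmetric perspective frustum,
    camera at the origin looking down -Z with +Y up, aspect = width / height.
    """
    x, y, z = point
    if z >= 0.0:
        raise ValueError("point is at or behind the camera")
    # Half-height of the frustum at unit distance from the camera.
    tan_half = math.tan(math.radians(fov_y_degrees) / 2.0)
    depth = -z
    ndc_x = x / (depth * tan_half * aspect)
    ndc_y = y / (depth * tan_half)
    return ndc_x, ndc_y
```

Under these assumptions a point on the view axis lands at the ndc origin, and the depth is discarded, which matches the reading that `ndc` positions are constrained to a single plane.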

@dgovil
Contributor

dgovil commented Apr 5, 2024

We have a similar schema for AR use here: https://developer.apple.com/documentation/arkit/arkit_in_ios/usdz_schemas_for_ar/preliminary_anchoringapi?language=objc

The differences I see are:

  1. We provide a few different anchor types like image and face. I can understand those not being part of USD proper, though I wonder if we can include them (or reserve their use) since they are common alignment methods in several AR products (not just our own)
  2. Our alignment is simply horizontal and vertical. If I understand your proposal properly, there is no secondary rotation alignment along the surface? e.g perpendicular or parallel to the plane. OR would that be taken care of by xformOp:align:plane ?

I do think your proposal captures most of what ours does for rotational alignment. Does this proposal consider positional anchoring as well? I was unclear on that aspect.

If it does include positional anchoring, I might suggest calling out (for a future proposal perhaps) the ability to specify an in scene prim to anchor to, not just an extrinsic camera/viewer.

@nvmkuruc
Contributor

nvmkuruc commented Apr 9, 2024

I think there's a lot of potential for describing sections of the scene graph as screen anchored. One could imagine shipping assets with macbeth charts and context spheres as siblings to the default prim.

(
    defaultPrim = "model_root"
)
def "model_root" { }
def "context" {
    def "macbeth_color_chart" (references = @color_chart.usd@) { ... }
}

The proposal notes two transformative changes to Xformable: making ops that are "extrinsic" and "compound".

My main observation would be that many clients assume intrinsic Xformable. Functions like ComputeLocalToWorldTransform(), for example, don't take any inputs that supply context. It seems like state attributes could be seen as a way to address this, but there are many times where there are zero or multiple viewers. At the same time, I'd expect that there would only be one set of billboard ops in a given branch of a hierarchy and that most authored Xformables would still be intrinsic.

Because extrinsic computations add complexity to reasoning about Xformable in usages like simulation, I think it's worth exploring anchoring as a separate typed schema. PointInstancer is an example of a schema that similarly complicates the computation of transforms. Consumers can use the schema type as a hook for determining whether or not they reason about the transform state. In our "visualization domain" proposal, we called this type of nondiegetic transform a VizAnchor.

A minor consideration regarding compound ops is that the third namespace component is currently used to describe the name of xform ops when an op is used multiple times. A new schema type wouldn't have this conflict.
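To make the namespace concern concrete, here is a minimal Python sketch of how a consumer might split an op attribute name into its op type and optional suffix. The function name `parse_xform_op_name` is hypothetical and is not part of the USD API, and the sketch ignores the `!invert!` prefix used for inverse ops.

```python
def parse_xform_op_name(name):
    """Split 'xformOp:<type>[:<suffix>]' into (op_type, suffix).

    Hypothetical helper for illustration only; suffix is None when the
    name has no third namespace component.
    """
    parts = name.split(":")
    if len(parts) < 2 or parts[0] != "xformOp" or not parts[1]:
        raise ValueError(f"not an xformOp attribute name: {name!r}")
    op_type = parts[1]
    # Everything after the op type is treated as the instance suffix.
    suffix = ":".join(parts[2:]) or None
    return op_type, suffix
```

With this reading, `xformOp:affixPosition:anchor` parses the same way as `xformOp:translate:pivot`, so the name alone cannot tell a consumer whether `anchor` is a field of a compound op or a suffix that distinguishes multiple uses of the same op.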

@andy-shiue-autodesk

  1. We provide a few different anchor types like image and face. I can understand those not being part of USD proper, though I wonder if we can include them (or reserve their use) since they are common alignment methods in several AR products (not just our own)

It looks like, for the AR uses, you want to anchor a virtual object to a real object. I guess the assumption is that the virtual and real objects have matching geometry (or bounding geometry?). To support such anchoring, you probably need a combined transformation of translation, scaling, and rotation. The currently proposed affixPosition, affixScale, and align are individually too simple to support such a transformation.

I think, instead of reserving the anchoring type, you can create a new xformOp to support such anchoring. I probably did not make it clear that more xformOps can and will be added to support other constrained transformations. I recently started calling the feature constrained transformation, since the extended xformOp framework supports more than just billboards.

Too bad that there is no way to create a plugin to add new xformOps right now.

@andy-shiue-autodesk

  1. Our alignment is simply horizontal and vertical. If I understand your proposal properly, there is no secondary rotation alignment along the surface? e.g perpendicular or parallel to the plane. OR would that be taken care of by xformOp:align:plane ?

What is the definition of secondary rotation alignment? Is that something like a rotation constrained to an orbit around an axis? That requires a new xformOp or new attributes on xformOp:align. I got a request for this axis-constrained rotation and am kind of inclined to add a new xformOp to support it.

Or do you just want more control over the relation to the alignment reference? Like parallel or perpendicular to the screen, or even an authored angle between the prim and the screen? In this case, we can add a new attribute in xformOp:align to support it.

@andy-shiue-autodesk

andy-shiue-autodesk commented Apr 10, 2024

Does this proposal consider positional anchoring as well? I was unclear on that aspect.

If it does include positional anchoring, I might suggest calling out (for a future proposal perhaps) the ability to specify an in scene prim to anchor to, not just an extrinsic camera/viewer.

xformOp:affixPosition can anchor a model space position to a screen space position. It cannot anchor to other prims in the stage, though. This extension is definitely worth a spot in a future proposal.

It is not clear to me how this can be supported in xformOp, though. To anchor one prim to another, you would need the world coordinate of the anchor prim. That data is tricky to get in xformOp since the model2world transformation is computed during the sync phase. We will need to extend UsdGeomXformable to make this work; the thread safety and the dependent dirtiness could be difficult to handle.

@spiffmon
Member

To build on @nvmkuruc 's comments, but starting with the quote from @andy-shiue-autodesk

Too bad that there is no way to create a plugin to add new xformOps right now.

It's actually a considerable feature to interchange and native-DCC-ingestion-of-USD that Xformable/xformOps remain intrinsic, simple, and "well supported" (in quotes because not every DCC supports everything the current xformOps compactly encode, but it's all straight-up and easy to translate). I know we haven't provided concrete guidance on how to handle more sophisticated encodings of transformations (e.g. constraints) in USD, but in a nutshell, the thinking is that Xformable remains simple, but its attributes will eventually be "drivable" by rigging/computation powered by OpenExec, whose effects could always be "baked out" in a caching step.

The other thing is that there needs to be a strict separation between "3D scene evaluation" and viewing state, for the reasons Matt mentioned.

What this suggests, in advance of OpenExec, is leveraging prim types that can be easily identified as living in screen-space, and therefore not affecting the scene's 3D bounds, and informing non-rendering USD clients that "here and below is data that only makes sense in the context of a render"; I think this also makes the Hydra update/invalidation process simpler, with no need to try to communicate data back to the UsdStage/evaluation. The Apple anchoring schema I think captures this, and it is very nicely formalized in the Add visualization schema proposal. It would be fantastic to see Billboards explored in that space!

Thanks, @andy-shiue-autodesk !

@andy-shiue-autodesk

andy-shiue-autodesk commented Apr 15, 2024

@nvmkuruc @spiffmon Thank you for pointing out the concerns about the extrinsic states. They are valid points and I made a wrong choice to support the extrinsic states as runtime APIs. The extrinsic states need to be "authorable" as part of the schemas.

I started the billboards project to support viewport manipulators. I was kind of limiting my thought process to having the renderer integrate the billboards xformOp.

Here is what I think I should do (preliminary, of course):

(
    defaultContext = "global"
)
def Context "global" { ... }
def Context "active" { ... }

def Cone "cone1" {  ... }
def Cone "cone2"
{
    def "context" (references=[</active>]) { }

    ...
}

This design makes extrinsic states part of the authored scene. Composition arcs should work well. External evaluation engines are allowed to compute the context states that drive the transformation.

Two of the big implementation challenges I had (accessing extrinsic states and managing the dirtiness of the transformation) will be addressed easily with the extrinsic states as embedded attributes. A big win for me.

Have context schemas of this kind been discussed before? It could be under UsdRender (as a new UsdRenderContext) or by extending UsdRenderSettings.

@andy-shiue-autodesk

At the same time, I'd expect that there would only be one set of billboard ops in a given branch of a hierarchy and that most authored Xformables would still be intrinsic.

Yes, I expect people to have one list of billboards ops near the end of the transformation stack. But if we consider transformations that are "constrained" in certain ways, the math of certain constrained transformations may still work when they appear multiple times in a transformation stack. There is no hard restriction on how billboards can be used, but the mathematics can make the end transformation unpredictable if people are creative.

@andy-shiue-autodesk

andy-shiue-autodesk commented Apr 15, 2024

Because extrinsic computations add complexity to reasoning about Xformable in usages like simulation, I think it's worth exploring anchoring as a separate typed schema. PointInstancer is an example of a schema that similarly complicates the computation of transforms. Consumers can use the schema type as a hook for determining whether or not they reason about the transform state. In our "visualization domain" proposal, we called this type of nondiegetic transform a VizAnchor.

This is a good point. I am inclined to call it constrained transformation (a transformation that is constrained by extrinsic states). Considering the complexity compared to the current xformOp, having it as a new schema makes a lot of sense.

A minor consideration regarding compound ops is that the third namespace component is currently used to describe the name of xform ops when an op is used multiple times. A new schema type wouldn't have this conflict.

I was considering doing some parsing magic on xformOp:xxx:use. That could look very ugly with the billboards xformOp ;). You are right that a new schema type avoids this issue.

@tcauchois tcauchois left a comment

Hi!

Thanks for pushing this discussion forward! I'm excited to see this feature and excited about the general direction, but I've left some organizational and implementation notes. Please take a look, and if you have questions about my comments I'm happy to talk here or I'm also on ASWF slack.

Thanks,
Tom


in a selected coordinate space:
- `ndc`: coordinate is in the normalized device space ([-1.0, 1.0]x[-1.0, 1.0])
- `view`: coordinate is in the view space, where camera is at (0,0,0) and faces the x-y plane.
- `screen`: coordinate is in the screen space ([0, width-1]x[0, height-1] where width and

Since the scene can't know what the screen space is likely to be (except for prims created by the app), it seems like instead of specifying a range we want to just say that the linear unit is pixels and the origin is understood to be in X location.

The screen space is defined by the viewport setting of the renderer. The viewport needs to be one of the extrinsic states provided to screen mode.

I am kind of against using pixels directly as the position unit, considering devices with various DPIs. I considered adding a 'metric' option so users can define the position in inches or centimeters. I decided not to add it since it can be achieved with the screen option.

But 'pixel' or 'inch' are clearly more intuitive to non-engineer users, so I think it is reasonable to add them.
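As a rough illustration of how the `ndc`, `screen`, and physical-unit options relate, here is a minimal Python sketch. It assumes ndc +Y points up and, by default, a screen origin at the upper-left corner; the thread leaves the screen origin open, so that choice is only an assumption. Both function names are hypothetical.

```python
def ndc_to_screen(ndc_x, ndc_y, width, height, y_down=True):
    """Map ndc in [-1, 1] to screen space [0, width-1] x [0, height-1].

    y_down=True places the origin at the upper-left corner (an assumption;
    the screen origin is not settled in the discussion).
    """
    sx = (ndc_x + 1.0) * 0.5 * (width - 1)
    t = (1.0 - ndc_y) if y_down else (ndc_y + 1.0)
    sy = t * 0.5 * (height - 1)
    return sx, sy


def inches_to_pixels(inches, dpi):
    """Convert a physical length to pixels for a display of the given DPI."""
    return inches * dpi
```

The second helper shows why a physical unit needs the display DPI as an additional extrinsic state: the same authored length in inches resolves to a different pixel count on each device.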


XformOp `affixPosition` consists of two attribute fields to affix the transformable to the
specified screen location:
- `affixPosition:anchor`: source location that is specified in the model space.

"anchor"/"anchorage" are confusing to me; why not "localAnchor" and "targetSpacePosition"?

@andy-shiue-autodesk andy-shiue-autodesk Apr 24, 2024

I searched the dictionary to find good words for them, haha. I can definitely use more descriptive terms to define them.

- `affixScale:baseSize`: the unit size in the model space.
- `affixScale:scaleSize`: the equivalent unit size in the selected coordinate space (
`ndc`, `view`, or `screen`).
- `affixScale:pivot`: the center position of the scaling in the model space.

Seems nice to have this match affixPosition, e.g. "affixScale:anchor". And for consistency, baseSize = localSize, scaleSize = targetSpaceSize?
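One possible reading of how these three attributes interact is sketched below in Python. It assumes that a model-space length of `baseSize` currently measures `projected_base_size` in the target space, and that the op scales about the pivot until that length equals `scaleSize`. This interpretation and both function names are assumptions made for illustration, not the proposal's definition.

```python
def affix_scale_factor(scale_size, projected_base_size):
    """Uniform scale that makes the projected base length equal scale_size.

    projected_base_size is the current size, in the target space, of a
    model-space length of baseSize (hypothetical input).
    """
    if projected_base_size <= 0.0:
        raise ValueError("projected base size must be positive")
    return scale_size / projected_base_size


def scale_about_pivot(point, pivot, factor):
    """Scale a model-space point about the pivot; the pivot itself is fixed."""
    return tuple(c + factor * (p - c) for p, c in zip(point, pivot))
```

For example, if a unit that should cover 50 pixels currently covers 200, the factor is 0.25, and every point moves toward the pivot by that ratio.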

- `viewer`: front normal points to the camera location.
- `plane`: front normal is orthogonal to a custom plane.

`align` constructs the local transformation that maintains the orientation of the

While "align" is probably the most straightforward term here, it would be nice to align it with the other proposed additions, e.g. "affixOrientation".

It seems like the three alignment targets you're offering are:

  1. Align to near plane.
  2. Align to camera position.
  3. Align to parallel or anti-parallel (or maybe only anti-parallel) to a given direction.

I think renaming "viewer" to "cameraPosition" would be clearer. I also think that orienting to be orthogonal to a plane is just orienting to be parallel to the plane normal, and specifying it as orienting to a direction would be clearer; but note that you need a second vector ('up') in order to make this a valid orientation.

Good suggestions. I will do a review round on naming to make sure the meanings are clear (I will do my best) and consistent.
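The point about needing a second vector can be illustrated with a small Python sketch that builds an orthonormal basis from a front direction and an up hint. It assumes the prim's front normal is its local +Z axis and a right-handed basis; those choices and the function names are hypothetical.

```python
import math


def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length < 1e-12:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def orientation_basis(front, up):
    """Return orthonormal (x, y, z) axes with z along front.

    Raises ValueError when up is parallel to front, because the
    orientation is then underdetermined.
    """
    z = _normalize(front)
    x = _normalize(_cross(up, z))
    y = _cross(z, x)
    return x, y, z
```

A front direction alone fixes only the z axis; the up hint removes the remaining freedom to spin around that axis, and the error case shows what happens when the hint carries no information.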

All billboards XformOps are acceptable Ops in `xformOpOrder`. Billboards XformOps can
combine with the existing XformOps in USD. This allows the use cases such as rendering a
fixed-size axis-tripod at a fixed screen location. The axis-tripod can, of course, synchronize
its rotation with the trackball action of the camera.

There are a number of compensation mechanisms in affixPosition and affixScale to basically bake in transforms on top of the computed affix-transform (e.g. affixPosition:anchor, which is by default the origin; affixScale:baseSize, which is by default 1; and affixScale:pivot, which is by default the origin).

Are you planning for these to compose with descendant transforms, according to standard transform composition rules? It sounds from this paragraph like you are, and in that case I'd eliminate the other mechanisms mentioned above as redundant; I think the API will be easier to understand if it's not offering this redundant functionality.

xformOp already has a flag with similar evaluation semantics that we handle while computing the resolved transforms called resetXformStack, which re-roots the current model space at the world origin, ignoring inherited values. It seems like affix* could work in a similar way, where they reset the transform and ignore parent composed transforms, but can be inherited to child composed transforms.

If the "affix" commands do compose, and do reset transform inheritance, do they reset all inheritance or do "position/scale/align" need to specifically reset the position, scale, and rotation components of the transform? For the latter, for full generality, to compute transform inheritance we'd need to compute a decomposition of the parent transform (which is pricey) and figure out how to handle degenerate parent transform matrices, although it would be semantically rich in a nice way.

@andy-shiue-autodesk andy-shiue-autodesk Apr 26, 2024

Your point on the pricey decomposition of the current transformation matrix (CTM) is valid. When I worked on the prototype, I had difficulty extracting the rotation reliably. Without certain preconditions on the transformation matrix, I did not find a reliable way to decompose it.

This part of the proposal is confusing and probably misleading. I'll make changes to it.

Here is what I think at this moment:

  • Billboards xformOp constructs a local matrix without considering the CTM. The billboards matrix will be applied to the CTM just like the current xformOps.
  • In addition to resetXformStack, we might want to add new reset ops such as resetTranslation, resetScale, and resetRotation. Such reset ops may not work if the CTM is degenerate, unfortunately ;(
  • Users should optionally prepend a resetXformStack or a reset op before billboards xformOps.

I would prefer to make the reset explicit instead of having it happen implicitly inside. Baking the reset into billboards does not seem to be a good idea. It is not flexible and can impact performance even for use cases where resetting is not required.
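To show where a component-wise reset runs into trouble, here is a minimal Python sketch that splits the 3x3 linear part of a matrix into per-axis scales and a rotation. It assumes the row-vector convention (each row is a transformed basis axis) and a matrix without shear; with shear, the normalized rows are not orthogonal and a real decomposition would be needed. The function name is hypothetical.

```python
import math


def split_scale_rotation(rows, eps=1e-9):
    """Split a 3x3 linear part into (scales, rotation_rows).

    Assumes no shear. Raises ValueError for a degenerate matrix, where
    at least one axis has collapsed to zero length.
    """
    scales = []
    rotation = []
    for row in rows:
        length = math.sqrt(sum(c * c for c in row))
        if length < eps:
            raise ValueError("degenerate transform: an axis has zero scale")
        scales.append(length)
        rotation.append(tuple(c / length for c in row))
    return tuple(scales), tuple(rotation)
```

A hypothetical resetScale could keep the rotation rows and drop the scales, but the degenerate case has no rotation to recover, which is the failure mode mentioned in the list above.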

examples of the renderer are a hydra render delegate like HdStorm or a scene index implementation.
Code behind the hydra layer updates the camera states stored in the registry whenever the
camera is updated. While computing the local transformation, billboards XformOps retrieve
the latest updated camera states from the registry.

So you'll need the camera state in a few places. I'd imagine in UsdGeomXformCache (and associated UsdGeomXformable helper functions), you'd want to provide the camera state so that you can correctly compute things; here, I'd recommend either passing in a UsdGeomCamera prim, or a GfFrustum. The latter would be preferable for applications with mixed scene types.

In terms of feeding these to hydra, the ideal solution is to leave the transforms as camera relative, and only transform them into world space inside the render delegate. GlfSimpleLight, for example, works this way. This greatly decreases the invalidation burden when you switch cameras, or when you do multi-camera rendering (e.g. for multiple viewports). Another solution, which I like far less, is to have the HdFlatteningSceneIndex transform flattening become camera aware.

Building coordination between the render delegate and the USD evaluation causes evaluation cycles and awkward data dependencies, and I'd really rather avoid such a design (based on some experience :).
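The camera-relative approach can be sketched in plain Python as follows. The idea is that the scene stores one camera-relative matrix and each viewport composes it with its own camera-to-world matrix at render time, so switching cameras does not invalidate scene data. The sketch uses the row-vector convention (translation in the last row); all function names are hypothetical and none of this is Hydra API.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def translation(tx, ty, tz):
    """Translation matrix in the row-vector convention."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [tx, ty, tz, 1.0],
    ]


def world_from_camera_relative(camera_relative, camera_to_world):
    """Resolve a camera-relative transform for one specific camera."""
    return mat_mul(camera_relative, camera_to_world)


# One authored value: the prim sits 5 units in front of whichever camera views it.
prim_in_camera = translation(0.0, 0.0, -5.0)

# Two viewports resolve the same value against their own cameras.
world_a = world_from_camera_relative(prim_in_camera, translation(10.0, 0.0, 0.0))
world_b = world_from_camera_relative(prim_in_camera, translation(0.0, 3.0, 0.0))
```

Each viewport gets its own world matrix from the same stored value, and nothing is written back to the stage.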

@spiffmon
Member

@nvmkuruc @spiffmon Thank you for pointing out the concerns about the extrinsic states. They are valid points and I made a wrong choice to support the extrinsic states as runtime apis. The extrinsic states need to be "authorable" as part of the schemas.

I actually disagree with this. It will be very common to have multiple viewports open and viewing the same stage. It seems neither clean nor practical (due to multi-threading concerns) for each viewport to author its own latest camera info into either the same attributes, or even different ones. Part of the thinking in moving this into the "VizPrim" domain is in acknowledgment that it is a drawable whose state cannot be fully captured on the Stage itself, I'd argue.

Projects
Status: Draft
8 participants