Multimodal input #2367

Merged
merged 4 commits into from
Sep 2, 2024


zmerp
Member

@zmerp zmerp commented Aug 31, 2024

This PR adds multimodal support to the client and adds a multimodal protocol extension. It's important to understand that multimodal support and the multimodal protocol are two distinct, orthogonal features.

The multimodal protocol describes how to interpret data sent by the client. Without the multimodal protocol, controller tracking is described by only the HAND_LEFT/HAND_RIGHT devices in device_motions, while hand tracking requires both the hand device motions and the skeletons to be present. With the multimodal protocol, only the hand skeleton is required for hand tracking.
The old protocol was designed without anticipating a feature like multimodal, so now we have to negotiate multimodal protocol support and switch it on only when both client and server advertise compatibility.
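
As a rough illustration of the two paragraphs above, here is a minimal Rust sketch of the negotiation and of how one hand's data could be interpreted per frame. All names (`PeerCaps`, `negotiate_multimodal_protocol`, `HandInput`, `interpret_hand`) are hypothetical and are not taken from the PR. The handling of "skeleton only" under the old protocol and of "device motion plus skeleton" under the multimodal protocol is an assumption inferred from the description.

```rust
/// Hypothetical capability flags each peer advertises during the handshake.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PeerCaps {
    pub multimodal_protocol: bool,
}

/// The multimodal protocol is switched on only when both peers advertise it.
pub fn negotiate_multimodal_protocol(client: PeerCaps, server: PeerCaps) -> bool {
    client.multimodal_protocol && server.multimodal_protocol
}

/// How one hand's data is interpreted for a single frame (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum HandInput {
    None,
    Controller,
    ControllerWithSkeleton,
    HandTracking,
}

pub fn interpret_hand(protocol_on: bool, has_device_motion: bool, has_skeleton: bool) -> HandInput {
    match (protocol_on, has_device_motion, has_skeleton) {
        // Nothing was sent for this hand.
        (_, false, false) => HandInput::None,
        // A device motion without a skeleton is a plain controller in both modes.
        (_, true, false) => HandInput::Controller,
        // Old protocol: hand tracking needs the device motion and the skeleton.
        (false, true, true) => HandInput::HandTracking,
        // Old protocol: a skeleton alone is not enough (assumption).
        (false, false, true) => HandInput::None,
        // Multimodal protocol: the skeleton alone means hand tracking.
        (true, false, true) => HandInput::HandTracking,
        // Multimodal protocol: controller pose plus extra skeleton data (assumption).
        (true, true, true) => HandInput::ControllerWithSkeleton,
    }
}
```

The point of the sketch is that the same payload (device motion plus skeleton) has a different meaning depending on whether the protocol was negotiated, which is why the flag has to be agreed on by both peers before it is used.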

Multimodal support is enabled only when supported by the headset, and is controlled by the multimodal_input setting. Multimodal input is enabled on the headset regardless of whether the server supports the multimodal protocol, but the server can use actual multimodal behavior only if both peers support the multimodal protocol.
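
To make the separation between the two features concrete, here is a small Rust sketch under stated assumptions. `ClientState` and both functions are hypothetical names, and the only identifier taken from the description is the `multimodal_input` setting.

```rust
/// Hypothetical client-side state; not the PR's actual types.
pub struct ClientState {
    /// Whether the headset runtime reports multimodal capability.
    pub headset_supports_multimodal: bool,
    /// Value of the `multimodal_input` setting.
    pub multimodal_input_setting: bool,
}

/// Multimodal input is turned on in the headset independently of the server.
pub fn client_enables_multimodal(client: &ClientState) -> bool {
    client.headset_supports_multimodal && client.multimodal_input_setting
}

/// The server can only make use of multimodal data when the protocol
/// extension was negotiated by both peers.
pub fn server_can_use_multimodal(client_multimodal_enabled: bool, protocol_negotiated: bool) -> bool {
    client_multimodal_enabled && protocol_negotiated
}
```

Note that `client_enables_multimodal` takes no server input at all, which mirrors the statement that the headset feature is enabled regardless of protocol support.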

@zmerp zmerp force-pushed the multimodal-input branch 2 times, most recently from 1ba2482 to 9d70714 Compare August 31, 2024 02:02
@zmerp zmerp marked this pull request as ready for review August 31, 2024 02:02
@zmerp zmerp force-pushed the multimodal-input branch 5 times, most recently from 49448ed to 885cef9 Compare August 31, 2024 02:27
@zmerp zmerp force-pushed the multimodal-input branch from 885cef9 to fdecc2b Compare August 31, 2024 02:49
@zmerp zmerp force-pushed the multimodal-input branch 2 times, most recently from d81a630 to ad75817 Compare September 2, 2024 01:45
@The-personified-devil
Collaborator

The way I'm expecting this to work (which seems like the only sensible way to make it work reliably) doesn't seem to be how it's implemented, so I'll lay it out here so you can explain how the implementation differs.

Client supports multimodal input && protocol is negotiated:
If controller is held, send skeleton & hand_left/right -> the controller's skeleton gets improved by hand tracking
If controller is not held, send skeleton & detached_controller -> skeleton is the entire hand info, detached is the detached controller with its totally different position

Server supports multimodal input && protocol is negotiated:
Separate hand trackers aren't enabled and/or aren't active -> main controller pose derived from skeleton, main controller gets skeleton, detached controllers get detached controller motions
Separate hand trackers are enabled and active -> hand tracker controller pose derived from skeleton + gets skeleton, detached controllers get detached controller motions

compat layer:

client has multimodal input but no protocol negotiated:
controller held -> send hand_left/right (not too sure if we also send skeleton, but does it matter?)
controller not held -> send skeleton + hand_left/right derived from skelly

server has multimodal but no protocol negotiated:
hand_left/right but no skelly -> main controller
skelly + hand_left/right -> only push skelly -> separate controller handling takes over

@zmerp
Member Author

zmerp commented Sep 2, 2024

@The-personified-devil there is quite a bit of confusion. "Server supports multimodal" is not a variable to keep track of. SteamVR does not support multimodal input the way the Quest does. SteamVR has always been able to have controller + hand skeleton at the same time. With SteamVR Input 2.0, when using two pairs of devices, we can assign different skeletal levels and switch between them.
Maybe the problem is that "multimodal input" means multiple things according to Meta. From what I understand, it means also tracking the skeleton while tracking the controllers (implemented), tracking detached controllers (not implemented), and tracking a controller with one hand and only the hand skeleton with the other (the server/protocol always supported this, but not the client).
And by SteamVR Input 2.0 we mean having separate controllers and hand trackers, and using different skeletal levels for each. Technically, SteamVR Input 2.0 does not yet support multimodal. When translating OpenVR to OpenXR (on the PCVR game side), the skeletal data would be reduced to simple finger curls using the Valve OpenXR extension. True multimodal support on the server would mean activating both pairs of devices (controllers + hand trackers) at the same time, which we never do. As you can see, SteamVR 2.0 and multimodal are actually completely orthogonal features. In fact, note that we never use the multimodal setting flag on the server side (we only check for the multimodal protocol). We could implement multimodal on the server too, but if SteamVR doesn't advertise it, it's of little use.

The real and only use of enabling multimodal on the client is to use the extra skeletal data to replace the fake skeletal animations we have when pressing controller buttons.
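
A minimal Rust sketch of that idea, with heavy simplifications: a hand skeleton is reduced to five finger curl values, and the fake animation is derived from a few button inputs. `HandSkeleton`, `ButtonState`, `synthesize_from_buttons` and `skeleton_for_controller` are all hypothetical names, and the real skeleton format and animation logic are not shown in this PR conversation.

```rust
/// Simplified stand-in for a hand skeleton: one curl value per finger,
/// ordered thumb to pinky, where 0.0 is open and 1.0 is fully curled.
#[derive(Clone, Debug, PartialEq)]
pub struct HandSkeleton {
    pub finger_curls: [f32; 5],
}

/// Hypothetical subset of controller inputs used to fake a hand pose.
pub struct ButtonState {
    pub trigger: f32,
    pub grip: f32,
    pub thumb_touched: bool,
}

/// Fallback: build a fake pose from button inputs (illustrative mapping only).
fn synthesize_from_buttons(buttons: &ButtonState) -> HandSkeleton {
    let thumb = if buttons.thumb_touched { 1.0 } else { 0.0 };
    HandSkeleton {
        finger_curls: [thumb, buttons.trigger, buttons.grip, buttons.grip, buttons.grip],
    }
}

/// Prefer the tracked skeleton when the client sent one alongside the
/// controller; otherwise fall back to the button-driven animation.
pub fn skeleton_for_controller(tracked: Option<HandSkeleton>, buttons: &ButtonState) -> HandSkeleton {
    tracked.unwrap_or_else(|| synthesize_from_buttons(buttons))
}
```

With this shape, a client without multimodal input simply sends no skeleton while a controller is held, and the existing animation path is used unchanged.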

@The-personified-devil
Collaborator

> Server supports multimodal

I'm aware. I meant a version of the server code that supports multimodal vs. an outdated version of the server code that has no knowledge of multimodal existing.

> SteamVR does not support multimodal input the way the Quest does. SteamVR always had the ability to have controller + hand skeleton at the same time. with SteamVR Input 2.0 when using two pairs of devices we can assign different skeletal levels and switch between them.

I'm also aware.

> As you can see SteamVR 2.0 and multimodal are actually completely orthogonal features.

Right, but they still kinda have to play together, so it's still relevant.

> The real and only use of enabling multimodal on the client, is to use the extra skeletal data to replace the fake skeletal animations we have when pressing controller buttons

I think that's where most of the confusion stems from. I was trying to figure out how the detached controllers were implemented, and was assuming certain code was for detached controllers.

@zmerp zmerp merged commit 92a5bba into master Sep 2, 2024
17 checks passed
@zmerp zmerp deleted the multimodal-input branch September 2, 2024 22:24
zmerp added a commit that referenced this pull request Sep 10, 2024
* feat: ✨ Multimodal input

* Fix controllers and hands dropping to 0,0,0 when not visible

* Actually fix multimodal input support

* Address review comments
@zmerp zmerp mentioned this pull request Sep 10, 2024