Head tracking for VR Part 1 – overview
For the last year or so there has been a lot going on in virtual reality (VR) and in enhancing immersion in games. Oculus VR released the Oculus Rift, which pretty much everyone knows about today (if you somehow don’t – check out their website); there’s also the Virtuix Omni, which makes you break a sweat and run around to control your character.
Most recently, Amazon announced their Fire Phone, which has one very cool feature – it can track your head and adjust the perspective accordingly (see here). This head tracking feature got our attention here at Coherent Labs and we wanted to see if there’s any piece of software that offers the same functionality with an ordinary webcam, so it can be used in a game. Much to our surprise, there weren’t many alternatives – there’s FaceTrackNoIR, the Unity plugin from the Fire Phone SDK (which is Android-only), and many people who did some experiments but never released anything. That gave us the idea to create our own simple head tracking library that other developers can use.
Our quick research showed several options for doing the head tracking:
- A special motion detection device (e.g. a Wii Remote)
- A depth camera (e.g. a Kinect)
- An ordinary webcam
Motion detection devices offer high precision but are somewhat annoying because you have to wear some trinkets for them to work. One of the people who experimented with this used the Wii Remote to track his head position, but the accuracy was reportedly better if you move the sensor bar instead of the remote – so he strapped the sensor bar to his head, which looked very funny.
A depth camera is like an ordinary camera, but it also provides depth information about the image. This is great, but such cameras are not too common at the moment. Plus, you’d have to do some extra work to detect where the head is, unlike with motion detection devices.
Ordinary webcams are a commodity now – laptops have them built in and there are plenty of cheap options for desktops. This is the most accessible option so far, but it has the disadvantage that it gives you neither depth information nor any certainty about where the head is. Both can be overcome with some calibration and processing power, though – after all, today’s processors are quite fast. Accessibility and cost matter a lot more to the end user, so this option seemed like the winner to us.
The initial plan is to use OpenCV to detect the head and its features so we can estimate its rotation and image-space position. Using input from the user about the distance of her head in a specific camera capture, the field of view of the camera, some trigonometry and a few approximations, we can make a pretty good guess about the depth using only the 2D image. Once we have an estimate of the head’s position and orientation, we can use the data any way we like – change where the player is looking, make small adjustments to the camera’s position, modify the perspective and much more.
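To make the trigonometry concrete, here’s a minimal sketch of the kind of estimation we have in mind. The function names and the calibration flow are our own illustration (not from any released library): the idea is that the apparent size of the head shrinks proportionally with distance, so one calibration capture at a known distance lets you estimate depth from the detected face width alone, and the camera’s field of view gives you a focal length in pixels for converting image-space offsets into real-world ones.

```python
import math

def focal_length_px(image_width_px, horizontal_fov_deg):
    """Focal length in pixels, derived from the camera's horizontal field of view."""
    return (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)

def estimate_depth(face_width_px, calib_face_width_px, calib_distance):
    """Similar triangles: apparent size is inversely proportional to distance.

    calib_face_width_px is the detected face width in a calibration capture
    taken at a known distance (calib_distance, in whatever unit you prefer).
    """
    return calib_distance * calib_face_width_px / face_width_px

def lateral_offset(face_center_x_px, image_width_px, f_px, depth):
    """Horizontal head offset from the camera axis, in the same unit as depth."""
    return (face_center_x_px - image_width_px / 2) * depth / f_px
```

If the face appears half as wide as it did during calibration, the head is estimated to be twice as far away; the same similar-triangles relation then converts its pixel offset from the image center into a physical offset at that depth.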
There are of course other details, such as dealing with noise – head detection in images is not always perfect and can give you bogus results that you’ll need to filter out – but that’s a topic for a future post.
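As a taste of what such filtering might look like (a generic technique, not necessarily what we’ll end up using), even a simple exponential moving average goes a long way toward smoothing jittery per-frame detections:

```python
class ExponentialSmoother:
    """Simple low-pass filter: blends each new sample with the running estimate."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # between 0 and 1; lower = smoother but laggier
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample  # first sample initializes the estimate
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value
```

You would run one smoother per tracked quantity (x, y, depth, rotation) and tune `alpha` to trade responsiveness against jitter; more elaborate options such as outlier rejection or a Kalman filter build on the same idea.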
In the next part we’ll look into what’s needed to make all this work with OpenCV, the actual code behind it, and performance measurements and considerations.