The Problem With Microsoft Mixed Reality Capture Studio

Microsoft Mixed Reality Studio Review

You won’t be seeing armies of volumetric Orcs or legions of stormtroopers coming out of a Microsoft Mixed Reality Capture Studio, not yet anyway.

Microsoft are on a push with their Mixed Reality platform, along with their Mixed Reality Capture Studio solutions, but there’s a funky problem, call it a pain-point if you like. It’s how heavily the Microsoft system relies on infrared sensors, and how those sensors see materials. There’s a workaround, but it’s an expensive one.

What is Volumetric Video Capture?

Volumetric capture promises to record bodies, faces and heads in detail with depth data, overlaid with optical information showing colour and texture, like that shot by a traditional camera. Don’t get us wrong, we all love this idea, but Microsoft have rushed to sell what we’d call out as vapourware.

Shot in a volume, volumetric video promises 3D, animation-ready VR/AR characters in the geometry-based .MP4 format used for games and interactive experiences. The system overlays and merges optical, depth and animation data so that you can import 3D animated characters directly into your Unity or Unreal Engine project. But in the industry we already do all this with mo-cap and plenty of hard work. The benefit of volumetric video capture should be speed, but that’s not the case so far.


With the Microsoft Mixed Reality Studio system, 190 cameras are used in a 360-degree studio. Each camera is equipped with one optical sensor and, in Microsoft’s case, one infrared sensor, which detects depth and movement and records them as .FBX data. That’s all fine and dandy, but it’s the IR sensor part that seems to be the cause of so many problems.

The problem with volumetric IR depth sensors.

When shooting with IR, as in Microsoft’s Mixed Reality Capture Studio, it’s essential that you take the wardrobe of the production into consideration, because it could make or break your volumetric video production budget. We are not saying this makes the system a total waste of time, but in the interest of avoiding disappointed faces on set, you need to know this.

Volumetric costume design.

Wardrobe and Production Design are Key.

Wardrobe and design are so important because infrared cameras struggle with many different materials, making the uninitiated stylist’s job a potential hell.

Infrared cameras have trouble with:

  • Leathers  
  • Glass  
  • Jewellery
  • Dark colours (especially BLACK!) 
  • Shiny metals
  • Plastics
  • Stitching
  • Patterns

Having said all that, in tests one pair of black jeans vanished while another brand of black jeans worked perfectly fine. So what’s going on?

These restrictions (or maybe not restrictions, depending on the unique properties of each item) may not seem like a massive problem, but these are just some of the materials that have caused problems when shooting volumetric video.

You have to take into consideration variables like density, chemical make-up, weave pattern and fabric texture, even the physical qualities and colour of any makeup applied. These restrictions may limit a lot of creative decisions for directors, stylists and production designers.

The reason infrared doesn’t pick up or detect these materials is that (here comes the science) when infrared waves (light) are sent towards an object, they do one of three things, depending on the properties of the object and the surface they hit.


IR light waves are either reflected, absorbed or transmitted (passed through the material). When shooting volumetric video with Microsoft Mixed Reality Capture, the beams have to be reflected back to the camera to relay the needed data.
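To make the three outcomes concrete, here is a minimal Python sketch, not anything taken from Microsoft’s software. Each material’s incoming IR is split into a reflected, absorbed and passed-through fraction that must add up to 1, and a material only "reads" if enough light comes back. Every number, the sample names and the `MIN_RETURN` threshold are invented for illustration; the sketch also ignores mirror-like reflection, where shiny metal bounces plenty of IR but away from the camera.

```python
from dataclasses import dataclass


@dataclass
class IRResponse:
    """How a surface splits incoming IR light (hypothetical fractions, not measurements)."""
    reflected: float    # returned towards the sensor
    absorbed: float     # soaked up by the material
    transmitted: float  # passes straight through

    def __post_init__(self):
        total = self.reflected + self.absorbed + self.transmitted
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"fractions must sum to 1, got {total}")


# Assumed minimum share of light that must return for a usable depth reading.
MIN_RETURN = 0.2


def likely_reads(response, min_return=MIN_RETURN):
    """True if enough IR is reflected back for the sensor to see the surface."""
    return response.reflected >= min_return


# Invented sample values: two black denims with different dyes behave differently.
SAMPLES = {
    "matte cotton": IRResponse(0.6, 0.3, 0.1),
    "black denim, dye A": IRResponse(0.05, 0.9, 0.05),
    "black denim, dye B": IRResponse(0.45, 0.5, 0.05),
    "clear glass": IRResponse(0.08, 0.02, 0.9),
}

if __name__ == "__main__":
    for name, response in SAMPLES.items():
        verdict = "reads" if likely_reads(response) else "vanishes"
        print(f"{name}: {verdict}")
```

The point of the toy model is that colour alone does not decide the outcome: two fabrics that look identical to the eye can sit on opposite sides of the threshold.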

How can this be helped? 

Certain aspects of the problem can be reduced. For example, metals can be sprayed with dulling spray to bring down the amount of reflection and give you more control of the IR waves. There are other exceptions, but you need to think like a chemist and a physicist to know why.

As we said, some clothes, black jeans for example, shouldn’t work, but when tested some worked fine due to their different weave, dye and chemical make-up. Turning trousers inside out and shooting the opposite side of the fabric solved one shoot. Some stitching scans but then stands out above the fabric. It all seems to be down to the actual physical and chemical properties, as well as colour and reflectivity.

So what do you do?

The only real way to know if a costume is going to work is to go to a studio and do physical tests with the system, because, as reported, there are sometimes exceptions. Unless you test you will never know, you’ll just be guessing.

Is this just a problem with Microsoft’s volumetric capture solution? We’d say it’s a problem for any IR-based depth and motion sensors. Testing costumes and props before shooting means more time in the volume, and that means more cost.

The cost to hire a volumetric stage using the Microsoft Mixed Reality Capture Studio software is currently set around $20,000 a day, and at least half of that day is spent tweaking and testing costumes and props to get a good result from the system’s IR sensors.
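As a back-of-the-envelope check on those figures, the short Python sketch below splits the day rate into testing and capture. The $20,000 rate and the one-half testing share come from the paragraph above; the 10-hour working day is our own assumption, used only to show what an hour of actual capture effectively costs.

```python
DAY_RATE_USD = 20_000    # approximate stage hire per day, from the text
TESTING_FRACTION = 0.5   # "at least half of the day" spent testing
HOURS_PER_DAY = 10       # assumed working day, not a figure from the text


def split_day_cost(day_rate=DAY_RATE_USD, testing_fraction=TESTING_FRACTION):
    """Return (cost of testing, cost of actual capture) for one hired day."""
    testing = day_rate * testing_fraction
    return testing, day_rate - testing


def cost_per_capture_hour(day_rate=DAY_RATE_USD,
                          testing_fraction=TESTING_FRACTION,
                          hours_per_day=HOURS_PER_DAY):
    """Day rate divided by the hours left for capture once testing is done."""
    capture_hours = hours_per_day * (1 - testing_fraction)
    return day_rate / capture_hours


if __name__ == "__main__":
    testing, capture = split_day_cost()
    print(f"Testing: ${testing:,.0f}  Capture: ${capture:,.0f}")
    print(f"Effective cost per capture hour: ${cost_per_capture_hour():,.0f}")
```

Under these assumptions, $10,000 of every day goes on testing, and each hour of usable capture effectively costs $4,000 rather than the nominal $2,000.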


Some volume operators are also charging additional facility fees for each costume change, as the system needs to be tested repeatedly, and changes possibly made to the costume on set, for every actor.

Imagine having to scan 20 different types of game characters and test all the armour, weapons and kit, only to find that they don’t ‘read’ in the volume. VFX Supervisors need to budget for testing, re-testing, cleaning up the data, and then finally shooting.

Volumetric video capture solutions… next!

Mixed Reality Tools Intel RealSense Depth Cameras

Deep compositing with no green screen requires a volumetric, depth view of the scene. The Intel RealSense range of cameras is a consumer-level, mass-market product that allows creators to combine VR sets with real-world images based on depth, so that elements like background and foreground can be easily separated from the camera image, or more precisely, layered in the correct order so that your picture makes sense.
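The layering idea can be sketched in a few lines of plain Python. This is an illustration of depth-ordered compositing in general and does not use the RealSense SDK: the function name, the tiny one-row "images" and the convention that a depth of 0 means "no reading" are all assumptions for the example. For each pixel, whichever layer is nearer the lens wins.

```python
def composite_by_depth(camera_rgb, camera_depth, virtual_rgb, virtual_depth):
    """Per pixel, keep whichever layer is closer to the lens.

    All four inputs are equally sized lists of rows. Depths are in metres.
    A camera depth of 0 is treated as "no valid reading" (an assumption),
    so the virtual layer is used for that pixel.
    """
    output = []
    for cam_row, cam_d_row, virt_row, virt_d_row in zip(
        camera_rgb, camera_depth, virtual_rgb, virtual_depth
    ):
        out_row = []
        for cam_px, cam_d, virt_px, virt_d in zip(
            cam_row, cam_d_row, virt_row, virt_d_row
        ):
            if cam_d > 0 and cam_d <= virt_d:
                out_row.append(cam_px)   # real element is in front
            else:
                out_row.append(virt_px)  # virtual element is in front, or no depth data
        output.append(out_row)
    return output


if __name__ == "__main__":
    # One row of three pixels; strings stand in for colour values.
    camera_rgb = [["actor", "actor", "wall"]]
    camera_depth = [[1.5, 0.0, 4.0]]      # middle pixel returned no depth
    virtual_rgb = [["tree", "tree", "tree"]]
    virtual_depth = [[3.0, 3.0, 3.0]]
    print(composite_by_depth(camera_rgb, camera_depth, virtual_rgb, virtual_depth))
```

In the usage example the actor at 1.5 m sits in front of the virtual tree at 3 m, the real wall at 4 m falls behind it, and the pixel with no depth reading drops out to the virtual layer.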

Virtual Production in Unreal Engine

VFX breakdown of virtual production in Unreal Engine by On-set Facilities.

To round off our year, we set our team the challenge of producing a music video in just 12 hours using OSF Virtual Production Systems, from first shot to upload on the OSF YouTube channel, with no post-production.

Created entirely using OSF virtual production systems in Unreal Engine, this test demonstrates high-end mixed reality: real-time compositing of 3D virtual sets with character, foreground and chroma key layers, plus real-time rendering and on-set colour grading.

The test video was all filmed on the OSF MR Factory Stage, a 500 m² virtual production green-screen stage at On-set Facilities in Madrid.

The video shows real-time compositing of live-action actors in virtual sets powered by Unreal Engine. With realtime VFX and on-set colour correction, the only post-production required for this video was editing and mastering the 4K file in Adobe Premiere. Directed by Asa Bailey, Virtual Production Supervisor at On-set Facilities.

The Virtual Production

The small crew was just myself on the camera (Alexa Mini), 3 system operators and 2 willing actors. It took little more than 3 hours to shoot the test, after 5 hours in pre-production to find our props, textures and lighting and then bake the 3D set. We used C4D to model the set and rendered in Octane before baking into Unreal Engine (plugin on the way).

The idea of this test was to show VFX Supervisors, Directors and Producers that it is possible to produce quick, cost-effective, “ready to edit” footage with the OSF realtime virtual production system.

Virtual Production Spectrum

The rules of this test said no post allowed. But do you remember 8-bit and then 16-bit computer games? No? Well, once upon a time computer games looked well dodgy, and it’s a bit like that with virtual production methods today (written Dec 2018). To really use the technology for high-end film productions you’ll still want to open up the files and give them a polish in post. But the more effort you put into pre-production (creating 3D sets and realtime VFX), the better.

Remember 16-bit game graphics? Virtual production requires 32-bit and 64-bit.

But what we are proving is that, for content production, today’s virtual production methods can greatly reduce the post-production process, freeing up production budgets to spend more on set and less in post.

We had just 12 hours. It was a test of discipline for the crew and also an annual quality benchmark of the realtime footage from our systems. What you are seeing in the test are 4K video files from our systems, as recorded on set.

Virtual Production is a Tech Wave

Virtual production is an emerging space and it involves realtime visual effects, live audio (3D), realtime character animation and realtime motion capture. With developments moving fast, the amount of post-production required to deliver final shots in realtime is reducing rapidly, month to month.

In OSF film work, 60% of close and mid shots are good to go directly to the edit, with just 40% needing to be opened up in post and given a polish. Wide shots are trickier, but we are making big improvements in colour matching optical and virtual layers and in developing AI shadows and reflections.

Virtual Production Workstations.
OSF Realtime Machines Custom Built Virtual Production Workstations.


This year we’ve seen a massive increase in quality, as OSF push towards 32bit and even 64bit colour and graphics in Unreal Engine. Realtime Virtual Production takes serious on-set power. In response OSF have started a line of OSF Realtime Machines, Intel and Nvidia based workstations that are optimised to power realtime virtual productions and AI driven realtime VFX.

The Technology Used in This Test

OSF realtime virtual production system integrations are based on OSF Realtime Machines, running Unreal Engine with realtime rendering and virtual camera cinematic capabilities.

Realtime camera tracking is taken care of by Mo-Sys Engineering on top of the optical camera (Alexa Mini). Blackmagic hardware also plays a large part in OSF solutions, incorporating 4K and 8K broadcast standards.

On-Set Facilities Madrid Studios
OSF Green Screen Stage and Realtime Virtual Production Systems.

But what about Post Production Options?

In this test case, the only post-production allowed was to pull the footage from our data recorders onto a timeline in Adobe Premiere. We then cut the rushes to a library track from Audio Networks and that was it.

But we could have opened the recorded layers and data (matte, foreground, background and RAW optical layers) in any of a long list of post-production applications, and cleaned and tweaked to our hearts’ (and clients’) content.

For anything more than a test, we’d still prefer to open up the files in post as all assets from the system are editable in post production applications.

For instance we’d have liked to clean up the chroma, improving the look and feel of the virtual sets. But that was not the idea in this test – it was strictly no post allowed, what you see is what you get, in realtime.

Have faith.
Asa Bailey
Director and Virtual Production Supervisor