Storing Volumetric Video

During my freshman and sophomore years of college, I lived in MSOE's Viets Tower. While dorm life had its ups and downs, one of its big advantages was that Viets had a shared common space on every floor that people used for studying, working on personal projects, and hanging out. It was a great way to make friends, get help with difficult problems, and collaborate on projects.
One of my friends who lived on my floor freshman year, Alex Lopez, was an intern at L&S Electric. Through that internship, he had access to a HoloLens AR headset, which he used to develop software for the company. One gap he saw was the lack of an open-source way to record and play back video on the headset. To fill it, we decided to work together on an experimental format for storing volumetric video, software to create the video files, and software to play them back.

Preexisting Work

An XKCD comic about standards proliferating.

Before starting, we did a small amount of research into what other solutions people had tried. We discovered glTF as a possible option, but it did not fit our needs, since we wanted something that could encode point clouds.

Process & Components

Our system consisted of three steps:

1. Capture from Hardware

To collect recordings, we used a Microsoft Kinect along with a program Alex found that converts the Kinect's point cloud data into a series of PLY files. PLY is a file format for storing 3D geometry, and it comes in both a text flavor and a binary flavor; we chose to support only the text flavor initially to make development easier.
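To give a sense of what our players have to read, here is a minimal sketch of parsing a text-flavor PLY header to find the vertex count. The class and method names are illustrative, not from our actual code, and a real parser would also track each element's property names and types.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Minimal sketch: scan an ASCII PLY header and pull out the vertex count.
public class PlyHeader {
    public static int vertexCount(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (!"ply".equals(line)) throw new IOException("not a PLY file");
        int vertices = -1;
        // The header is a series of lines terminated by "end_header";
        // the line "element vertex N" declares how many points follow.
        while ((line = in.readLine()) != null && !line.equals("end_header")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 3 && parts[0].equals("element") && parts[1].equals("vertex")) {
                vertices = Integer.parseInt(parts[2]);
            }
        }
        return vertices;
    }

    public static void main(String[] args) throws IOException {
        String sample = String.join("\n",
            "ply",
            "format ascii 1.0",
            "element vertex 2",
            "property float x",
            "property float y",
            "property float z",
            "end_header",
            "0.0 0.0 0.0",
            "1.0 1.0 1.0");
        System.out.println(vertexCount(new BufferedReader(new StringReader(sample))));
    }
}
```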

2. Conversion to PLYS

Once we had a set of PLY files, we needed to combine them into a single large file for later playback. To do this, I developed a JavaFX application that takes a directory of numbered PLY files and concatenates them into one file, which we called PLYS (PLY Sequence). We added a header to the top of the combined file containing metadata such as the frame rate, and we included an option to compress the output with gzip, giving a smaller file that could be transferred over a network more efficiently.
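The combining step can be sketched roughly as below. The exact layout of our PLYS header isn't reproduced here, so the magic line, frame rate, and frame count fields are assumptions for illustration; the structure (metadata header, then concatenated frames, optionally gzipped) matches what the converter did.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Sketch of a PLYS-style combiner: metadata header, then each PLY
// frame in order, with optional gzip compression of the whole stream.
public class PlysWriter {
    public static byte[] combine(List<String> plyFrames, int fps, boolean gzip) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        OutputStream out = gzip ? new GZIPOutputStream(buf) : buf;
        // Hypothetical header fields; the real format's fields may differ.
        String header = "PLYS\nfps " + fps + "\nframes " + plyFrames.size() + "\n";
        out.write(header.getBytes(StandardCharsets.US_ASCII));
        for (String frame : plyFrames) {
            out.write(frame.getBytes(StandardCharsets.US_ASCII));
        }
        out.close(); // finishes the gzip trailer if compression was used
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String frame = "ply\nformat ascii 1.0\nelement vertex 1\n"
                + "property float x\nproperty float y\nproperty float z\n"
                + "end_header\n0.0 0.0 0.0\n";
        byte[] plain = combine(List.of(frame, frame), 30, false);
        byte[] packed = combine(List.of(frame, frame), 30, true);
        System.out.println(new String(plain, StandardCharsets.US_ASCII).startsWith("PLYS"));
        System.out.println(packed.length != plain.length);
    }
}
```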

3. Playback

We developed two applications to play back PLYS files. The first was made with Unity and C# and was intended for the HoloLens that inspired the project. Alex had much more experience with Unity and C#, so he handled most of that work, but I helped optimize the code to improve performance. One big contribution I made was discovering that he was instantiating an array list without specifying an initial size. That list was filled with hundreds of points every frame as we rendered the point cloud. Without an initial size, the array list defaults to a small backing array and has to allocate a new one each time it fills up. Since the header of each PLY file tells us exactly how large the list needs to be, I used that value to set the initial size, which resulted in a huge speedup.
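The fix looks like this in Java terms (our player was C#, but the idea is identical; the method and numbers here are illustrative): pre-sizing the list from the header's vertex count avoids every intermediate reallocation of the backing array.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates the pre-sizing fix: an ArrayList created without a
// capacity starts with a small backing array and reallocates
// repeatedly as points are added each frame; sizing it up front from
// the PLY header's vertex count avoids all of those reallocations.
public class PreSizedList {
    static List<float[]> loadPoints(int vertexCount) {
        // Before: new ArrayList<>() -> backing array grows many times per frame.
        // After: capacity known up front from the header.
        List<float[]> points = new ArrayList<>(vertexCount);
        for (int i = 0; i < vertexCount; i++) {
            points.add(new float[] { i, i, i }); // stand-in for parsed x/y/z
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(loadPoints(100000).size());
    }
}
```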
I also developed a second player using Godot. Godot's main advantage over Unity for us is that it supports VR on Linux, allowing a wider range of devices to be used. I translated the work we had done in C# into GDScript (Godot's Python-like scripting language) and got a working implementation. I was able to run the Godot player on my own VR headset (a Valve Index) and was quite satisfied to walk around inside a video in a program I had made.


A screenshot of a person being rendered in a PLYS video.

We successfully built the basis of a volumetric video format that could be developed further into something useful.
Alex eventually moved on to other projects, and since he had the recording equipment and the knowledge needed to go from a Kinect to a set of PLY files, the project went on indefinite hold.


Future Work

The main piece of future work is to extend our player applications to fully support the PLY specification underlying our video format. Currently they only handle point clouds formatted the specific way the Kinect-to-PLY program emits them, but we would eventually want to support sequences of any valid PLY files.
Another piece of future work would be to make a program that records directly to the PLY format. Currently, problems with library infrastructure make this difficult. There was once a library effort called OpenNI that aimed to provide a unified way to collect volumetric data from sensors, but the main company behind it was bought by Apple and dissolved. One of the other companies involved made an effort to continue the project, but also stopped supporting it on their newer sensor models.


Hackathon

While developing PLYS, the Create Institute (a part of MSOE) held a hackathon in collaboration with the local company Brady Corporation. We used the event as an opportunity to sit down, go over what we had made so far in Unity, and fix as many performance issues as we reasonably could. During the hackathon, our Unity player went from single-digit frames per second to the thirties on a mid-range MSOE-issued laptop, a massive improvement. That boost (and likely some wow factor from the overall project) earned us first place in the competition.