Move Mirror: An AI Experiment with Pose Estimation in the Browser using TensorFlow.js
By Jane Friedhoff and Irene Alvarado, Creative Technologists, Google Creative Lab
https://experiments.withgoogle.com/collection/ai/move-mirror/view/mirror
https://github.com/tensorflow/tfjs-models/tree/master/posenet
https://storage.googleapis.com/tfjs-models/demos/posenet/camera.html
https://js.tensorflow.org
Pose estimation, or the ability to detect humans and their poses from image data, is one of the most exciting — and most difficult — topics in machine learning and computer vision. Recently, Google shared PoseNet: a state-of-the-art pose estimation model that provides highly accurate pose data from image data (even when those images are blurry, low-resolution, or in black and white). This is the story of the experiment that prompted us to create this pose estimation library for the web in the first place.
Months ago, we prototyped a fun experiment called Move Mirror that lets you explore images in your browser, just by moving around. The experiment creates a unique, flipbook-like experience that follows your moves and reflects them with images of all kinds of human movement — from sports and dance to martial arts, acting, and beyond. We wanted to release the experience on the web, let others play with it, learn about machine learning, and share the experience with friends. Unfortunately we faced a problem: a publicly accessible web-specific model for pose estimation did not exist.
Typically, working with pose data means either having access to special hardware or having experience with C++/Python computer vision libraries. We thus saw a unique opportunity to make pose estimation more widely accessible by porting an in-house model to TensorFlow.js, a JavaScript library that lets you run machine learning projects in the browser. We assembled a team, spent a few months developing the library, and ultimately released PoseNet, an open-source tool that allows any web developer to play with body-based interactions, entirely within the browser, no special cameras or C++/Python skills required.
With PoseNet out in the wild, we can finally release Move Mirror – a project that is a testament to the value that experimentation and play can add to serious engineering work. It was only through a true collaboration between research, product, and creative teams that we were able to build PoseNet and Move Mirror.
Move Mirror is an AI Experiment that finds your pose and matches your moves with thousands of images from around the world
Read on to get an in-depth view into how we made the experiment, what excites us about pose estimation in the browser, and the ideas on the horizon that we’re excited for.
What is pose estimation? What is PoseNet?
As you might guess, pose estimation is a pretty complex issue: humans come in different shapes and sizes; have many joints to track (and many different ways those joints can articulate in space); and are often around other people and/or objects, leading to visual occlusion. Some people use assistive devices like wheelchairs or crutches, which may block the camera’s view of their bodies; others might not have certain limbs; and still others may have very different proportions. We want our machine learning models to be able to understand and smartly infer data about all these different bodies.
Here, you can see PoseNet’s joint detection results on folks who are using assistive devices (like canes, wheelchairs, and prosthetic limbs).
In the past, technologists have approached the problem of pose estimation using special cameras and sensors (like stereoscopic imagery, mocap suits, and infrared cameras) as well as computer vision techniques that can extract pose estimates from 2D images (like OpenPose). These solutions, while effective, tend to require expensive, not widely available hardware and/or familiarity with computer vision libraries and C++ or Python. This makes it harder for the average developer to quickly get started with playful pose experiments.
When we first encountered PoseNet, it was available via a simple web API, which was super exciting. All of a sudden, we could prototype pose estimation experiments quickly and easily in JavaScript. All we had to do was send an HTTP POST request to an internal endpoint with our image’s base64 data — the API endpoint would send us pose data back with almost no latency. This hugely lowered the barrier to entry for making small exploratory pose experiments: just a few lines of JavaScript, an API key, and we were set! But of course, not everyone would have the capacity to run their own PoseNet backend, and (reasonably) not everyone would feel comfortable sending photos of themselves to a centralized server anyway. How could we make it feasible for people to run their own pose experiments without having to rely on our servers, or anyone else’s?
This was the perfect opportunity, we realized, to connect TensorFlow.js to PoseNet. TensorFlow.js would allow users to run machine learning models right in their browser — no server required. By porting PoseNet to TensorFlow.js, anyone with a decent webcam-equipped desktop or phone could experience and play with this technology, right from within a web browser, without having to worry about low-level computer vision libraries or setting up complicated backends and APIs. Working closely with Nikhil Thorat and Daniel Smilkov of the TensorFlow.js team, Google researchers George Papandreou and Tyler Zhu, and Dan Oved, we were able to port a version of the PoseNet model to TensorFlow.js. (You can read more about that process here.)
A few things that made us super excited about PoseNet in TensorFlow.js:
- Ubiquity/Accessibility: Most developers have access to a text editor and a web browser, and usage of PoseNet is as simple as including two script tags in your HTML file — no fancy server setup required. You also don’t need any special high-res or infrared cameras or sensors to get data — in fact, we found that PoseNet still works well on low-res, black-and-white, and vintage photography.
- Shareability: Because everything can run in the browser, TensorFlow.js PoseNet experiments can also be shared in the browser super-easily. No need to make operating-system-specific builds — just upload your webpage and go.
- Privacy: Because all of the pose estimation can be done in the browser, none of your image data ever has to leave your computer. Rather than sending your photos off to some server in the sky for pose analysis on a centralized service (which you may not control, and which may fail for any number of reasons), you can do all the pose estimation on your own device, controlling exactly where your image goes. With Move Mirror, we match the (x,y) joint data that PoseNet spits out against our bank of poses on our backend — but your image stays entirely on your computer.
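To give a sense of how low that barrier is, here is a minimal sketch of the two script tags mentioned above, plus a few lines of usage. The CDN URLs and the exact `estimateSinglePose` options vary between PoseNet versions, so treat this as illustrative rather than copy-paste-ready:

```html
<html>
  <head>
    <!-- TensorFlow.js core and the PoseNet model, loaded straight from a CDN -->
    <script src="https://unpkg.com/@tensorflow/tfjs"></script>
    <script src="https://unpkg.com/@tensorflow-models/posenet"></script>
  </head>
  <body>
    <img id="person" src="person.jpg" />
    <script>
      // Load the model, then estimate a single pose from the image element.
      posenet.load().then(async (net) => {
        const pose = await net.estimateSinglePose(
          document.getElementById('person')
        );
        // pose.keypoints holds 17 {part, score, position: {x, y}} entries —
        // and the image itself never leaves the page.
        console.log(pose.keypoints);
      });
    </script>
  </body>
</html>
```

Everything here runs client-side: the only network traffic is downloading the library and model weights.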
Okay, enough tech talk: let’s talk design!
Design and Inspiration
We spent a few weeks just goofing around with different pose estimation prototypes. For those of us who came from C++ or Kinect-hacking, just seeing our skeleton reflected back to us in our browser, using our webcam, was a pretty amazing demo on its own. We played with trails, puppets, and all sorts of other silly things before we landed on the concept that would become Move Mirror.
It probably isn’t surprising to hear that a lot of us here in the Google Creative Lab are interested in search and exploration. In talking about what we could do with pose estimation, we were tickled by the idea of being able to search an archive by pose. What if you could strike a pose and get a result that was the dance move you were doing? Or — maybe even funnier — what if you struck a pose and got a result that was the same, but totally out of context for what you were doing? How could we find weird, serendipitous connections across the breadth of human movement — from martial arts to cooking to skiing to babies taking their first steps? How might that surprise us, delight us, and make us laugh?
Land Lines gif via Awwwards, and Gesture Match image via the Cooper Hewitt.
We took inspiration from projects like Land Lines (in which gestural data is used to explore similar lines in Google Earth) and the Cooper Hewitt Gesture Match (an on-site installation that uses pose-matching to suggest items from the archive). Aesthetically, however, we were drawn in a much faster, more real-time direction. We loved the idea of having a constant stream of images respond to your movements, blurring folks from all walks of life together, connected by your movement. Inspired by rotoscoping and timelapse photography, as used in The Johnny Cash Project, and the trend of selfie timelapses on YouTube, we decided to lean hard on the gas pedal and attack real-time responsive pose matching in the browser — a complex problem in itself.
Gif of The Johnny Cash Project, in which more than 250,000 people individually drew frames for “Ain’t No Grave” to make a crowdsourced music video.
Building Move Mirror
Although PoseNet took care of the pose estimation for us, we still had plenty of things to figure out. The core experience is all about finding matching images to user poses, so that if you stand straight with your right arm raised up, Move Mirror finds an image where someone is standing with their right arm raised up. For that we needed three components: an image dataset, a search technique for that dataset, and a pose matching algorithm. Let’s break it down and look at each piece.
Building a dataset: searching for diversity
To create a useful dataset, we had to search for images that collectively covered a huge variety of human movement. There was no point in having 400 images of a person standing with a raised right arm if other poses were not represented in the dataset. To keep the experience consistent, we also decided we’d focus on finding only full-body images. In the end, we licensed a set of videos we thought represented not just a variety of movement, but also a diverse set of body types, skin tones, cultures, and physical abilities. We split these videos into about 80,000 still frames, then processed each image with PoseNet and stored the associated pose data. Next, let’s talk about the hard parts: pose matching and search.
We ran thousands of images through PoseNet. You’ll notice that not all images were parsed correctly, so we discarded the failures to end up with a dataset of about 80,000 images.
Pose matching: the challenge of defining similarity
For Move Mirror to work, we first had to figure out how to define a ‘match’. A match is the image we return, based on the pose data we receive, when a user strikes a pose. When we talk about the ‘pose data’ coming out of PoseNet, we’re referring to a set of 17 body or face parts, such as an elbow or a left eye, that are called “keypoints”. PoseNet returns the x and y position of each keypoint in relation to the input image, plus an associated confidence score (more on this later).
PoseNet detects 17 pose keypoints on the face and body. Each keypoint carries three pieces of data: an x position and a y position (together, the pixel location in the input image where PoseNet found that keypoint) and a confidence score (how confident PoseNet is that it got that guess right).
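Concretely, a single PoseNet result is just a plain JavaScript object. The shape below matches the library’s output format; the numbers are invented for illustration:

```javascript
// Hypothetical excerpt of a PoseNet result (the shape is real, the values
// are made up). A full result contains all 17 keypoints.
const pose = {
  score: 0.92, // overall confidence for the whole pose
  keypoints: [
    { part: 'nose',       score: 0.99, position: { x: 253.1, y: 101.4 } },
    { part: 'leftEye',    score: 0.98, position: { x: 260.7, y:  94.0 } },
    // ...ears, shoulders, elbows, wrists, hips, knees...
    { part: 'rightAnkle', score: 0.81, position: { x: 231.5, y: 412.9 } },
  ],
};

// Flatten the (x, y) positions into a single vector for matching;
// low-confidence keypoints can be filtered or down-weighted later.
const vector = pose.keypoints.flatMap((k) => [k.position.x, k.position.y]);
```

This flattened vector is what the similarity measures below operate on.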
Deciding what ‘similarity’ meant became our first hurdle. How should we decide how similar a set of 17 keypoints from a user is to a set of 17 keypoints from an image in our dataset? We tried a few different measures for similarity and settled on two that seemed to work well: cosine similarity and a weighted match taking into account keypoint confidence scores.
Matching strategy #1: cosine distance
If we were to convert each set of 17 keypoints into a vector and plot all of them in high dimensional space, our task of finding the two most similar poses would translate into finding the closest two vectors in this high dimensional space. This is exactly what cosine distance allows us to do.
Cosine similarity is a measure of similarity between two vectors: basically, it measures the angle between them, returning -1 if they’re exactly opposite and 1 if they’re exactly the same. Importantly, it’s a measure of orientation, not magnitude.
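As a sketch (with hypothetical helper names), here’s how that measure might be computed over flattened keypoint vectors. The square-root conversion at the end turns a similarity into a proper distance, which is handy for nearest-neighbor search:

```javascript
// L2-normalize a vector so cosine similarity reduces to a dot product and
// poses at different scales (near vs. far from the camera) become comparable.
function l2Normalize(v) {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / norm);
}

// Cosine similarity between two pose vectors: 1 = same orientation,
// -1 = exactly opposite. Magnitude is ignored; only direction matters.
function cosineSimilarity(a, b) {
  const na = l2Normalize(a);
  const nb = l2Normalize(b);
  return na.reduce((sum, x, i) => sum + x * nb[i], 0);
}

// Convert similarity into a distance, D = sqrt(2 * (1 - cosineSimilarity)),
// so that identical poses are at distance 0 and distances can be ranked.
function cosineDistance(a, b) {
  return Math.sqrt(2 * (1 - cosineSimilarity(a, b)));
}
```

Because each vector is L2-normalized first, two poses that differ only in scale still come out as near-identical, which is exactly the behavior you want when matching a webcam pose against archive images of arbitrary sizes.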