By Bill Jia
The path for taking AI development from research to production has historically involved multiple steps and tools, making it time-intensive and complicated to test new approaches, deploy them, and iterate to improve accuracy and performance. To help accelerate and optimize this process, we’re introducing PyTorch 1.0, the next version of our open source AI framework.
PyTorch 1.0 takes the modular, production-oriented capabilities from Caffe2 and ONNX and combines them with PyTorch’s existing flexible, research-focused design to provide a fast, seamless path from research prototyping to production deployment for a broad range of AI projects. With PyTorch 1.0, AI developers can both experiment rapidly and optimize performance through a hybrid front end that seamlessly transitions between imperative and declarative execution modes. The technology in PyTorch 1.0 has already powered many Facebook products and services at scale, including performing 6 billion text translations per day.
PyTorch 1.0 will be available in beta within the next few months, and will include a family of tools, libraries, pre-trained models, and datasets for each stage of development, enabling the community to quickly create and deploy new AI innovations at scale.
The path from research to production
PyTorch’s imperative front end allows for more rapid prototyping and experimentation through its flexible and productive programming model. The first version of PyTorch launched a little over a year ago, and its speed, productivity, and support for cutting-edge techniques such as dynamic graphs quickly made it a popular and important development tool for AI researchers. It has more than 1.1 million downloads and was the second-most cited deep learning framework on arXiv over the last month. For example, UC Berkeley computer scientists put PyTorch’s dynamic graph capabilities to use for their noteworthy CycleGAN image-to-image translation work.
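To make the idea of a dynamic graph concrete, here is a minimal, framework-free sketch of define-by-run differentiation. The `Value` class and `repeated_square` function are hypothetical names invented for this illustration; this is not PyTorch's implementation, only a toy showing how a graph can be recorded while ordinary Python control flow runs.

```python
# Toy define-by-run autodiff: the graph is recorded while ordinary Python runs.
# Illustration only; all names here are hypothetical and this is not PyTorch code.

class Value:
    """A scalar that remembers which operation produced it."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents      # Values this one was computed from
        self.grad_fns = grad_fns    # maps the output gradient to each parent's share

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Visit nodes in reverse topological order so each gradient is
        # complete before it is propagated to the node's parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, grad_fn in zip(node.parents, node.grad_fns):
                parent.grad += grad_fn(node.grad)


def repeated_square(x, steps):
    """The number of recorded multiplications depends on a runtime value."""
    out = x
    for _ in range(steps):
        out = out * out
    return out


x = Value(3.0)
y = repeated_square(x, steps=1)   # y = x^2, so the graph holds one multiply
y.backward()                      # x.grad now holds dy/dx = 2 * x
```

Calling `repeated_square` with a different `steps` value builds a graph of a different shape on the next run, which is the property that makes data-dependent models convenient to express in an imperative style.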
Although the current version of PyTorch has provided great flexibility for AI research and development, performance at production scale is sometimes a challenge, given its tight coupling to Python. We often need to translate the research code (either the training script or the trained model) to a graph mode representation in Caffe2 to run at production scale. Caffe2’s graph-based executor allows developers to take advantage of state-of-the-art optimizations like graph transformations, efficient memory reuse, and tight hardware interface integration. The Caffe2 project was launched two years ago to standardize our production AI tooling, and is now running neural networks across Facebook servers and on more than 1 billion phones around the world, spanning eight generations of iPhones and six generations of Android CPU architectures. Today, Caffe2 delivers more than 200 trillion predictions per day across all models, small and large, with optimized production performance.
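As a rough illustration of why a graph representation helps, the sketch below applies one simple graph transformation, constant folding, to a toy graph before executing it. The node format and the `run` and `fold_constants` helpers are assumptions made up for this example; they do not reflect Caffe2's actual representation or its optimization passes.

```python
# Toy static graph: each node is (output_name, op, input_names).
# Hypothetical format for illustration; this is not Caffe2's representation.
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def run(graph, inputs, consts):
    """Interpret the graph node by node, like a very simple graph executor."""
    env = {**consts, **inputs}
    for out, op, args in graph:
        env[out] = OPS[op](*(env[a] for a in args))
    return env[graph[-1][0]]   # the last node is treated as the output


def fold_constants(graph, consts):
    """Graph transformation: precompute nodes whose inputs are all constants."""
    consts = dict(consts)
    remaining = []
    for out, op, args in graph:
        if all(a in consts for a in args):
            consts[out] = OPS[op](*(consts[a] for a in args))
        else:
            remaining.append((out, op, args))
    return remaining, consts


# y = x * (scale * two + bias), where scale, two and bias are known ahead of time.
graph = [
    ("t0", "mul", ("scale", "two")),
    ("t1", "add", ("t0", "bias")),
    ("y", "mul", ("x", "t1")),
]
consts = {"scale": 3.0, "two": 2.0, "bias": 1.0}

# Only the node that depends on the runtime input x survives the pass.
optimized, folded = fold_constants(graph, consts)
```

Because the whole computation is visible up front, the pass can do this work once, ahead of time, instead of on every prediction. An executor that only sees one Python statement at a time has no such opportunity.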
The migration from PyTorch to Caffe2 to ship to production used to be a manual process that was time-intensive and error-prone. To solve this problem, we partnered with major hardware and software companies to create ONNX (Open Neural Network Exchange), an open format for representing deep learning models. With ONNX, developers can share models among different frameworks, such as by exporting models built in PyTorch and importing them to Caffe2. At Facebook, this has enabled smoother AI research, training, and inference with large-scale server and mobile deployment.
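The export-and-import idea can be sketched with a toy interchange document. Everything below is hypothetical: the JSON schema, the `Linear` and `Relu` classes, and the `export_model` and `import_and_run` helpers are invented for illustration and are not the ONNX format or any framework's API. The point is only that a producer and a consumer need to agree on operator names and attributes, not on code.

```python
# Toy model exchange: one "framework" exports to a neutral JSON document,
# another imports and runs it. The schema is hypothetical and is not ONNX.
import json


class Linear:
    """Producer-side layer: y = weight * x + bias (scalars for brevity)."""

    def __init__(self, weight, bias):
        self.weight, self.bias = weight, bias

    def __call__(self, x):
        return self.weight * x + self.bias


class Relu:
    """Producer-side layer: clamps negative values to zero."""

    def __call__(self, x):
        return max(0.0, x)


def export_model(layers):
    """Producer: describe each layer by operator name and attributes only."""
    nodes = []
    for layer in layers:
        if isinstance(layer, Linear):
            nodes.append({"op": "Linear", "weight": layer.weight, "bias": layer.bias})
        elif isinstance(layer, Relu):
            nodes.append({"op": "Relu"})
        else:
            raise TypeError(f"no export rule for {type(layer).__name__}")
    return json.dumps({"format_version": 1, "nodes": nodes})


# Consumer: its own kernels, keyed by the shared operator names.
KERNELS = {
    "Linear": lambda node, x: node["weight"] * x + node["bias"],
    "Relu": lambda node, x: x if x > 0.0 else 0.0,
}


def import_and_run(document, x):
    """Consumer: rebuild the computation from the document alone."""
    for node in json.loads(document)["nodes"]:
        x = KERNELS[node["op"]](node, x)
    return x


layers = [Linear(2.0, -5.0), Relu(), Linear(0.5, 1.0)]
document = export_model(layers)


def run_native(x):
    """Run the model with the producer's own layer objects, for comparison."""
    for layer in layers:
        x = layer(x)
    return x
```

The consumer never imports the producer's classes. As long as both sides implement the same operator semantics, `import_and_run(document, x)` and `run_native(x)` should agree for any input.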
We’ve used these tools (PyTorch, Caffe2, and ONNX) to build and deploy Translate, the tool that now runs at scale to power translations for the 48 most commonly used languages on Facebook. In VR, these tools have been critical in deploying new research from Oculus into production to make avatars move more realistically.
However, while this combination of three different tools has been effective, it still involves manual steps that are complicated and time-consuming. As a result, it has not allowed us to bring new AI research innovation to production as seamlessly as we would have liked.
Unifying research and production capabilities in one framework
PyTorch 1.0 fuses immediate and graph execution modes, providing both flexibility for research and performance optimization for production. More specifically, rather than forcing developers to rewrite their code entirely to optimize it or migrate it from Python, PyTorch 1.0 provides a hybrid front end enabling you to seamlessly share the majority of code between immediate mode for prototyping and graph execution mode for production.
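One way to picture sharing code between the two modes is tracing: run the imperative code once with stand-in objects that record each operation, then replay the recording as a graph. The sketch below is a hypothetical toy (the `Proxy`, `trace`, and `run_graph` names are invented here) and should not be read as how PyTorch 1.0 implements its hybrid front end.

```python
# Toy tracer: run ordinary imperative code once with proxy objects to record
# a graph, then replay the graph without calling the Python function again.
# Hypothetical sketch, not how PyTorch 1.0 implements its hybrid front end.
import operator

OPS = {"add": operator.add, "mul": operator.mul}


class Proxy:
    """Stands in for a number and records every operation applied to it."""

    def __init__(self, name, tape):
        self.name = name
        self.tape = tape

    def _record(self, op, other):
        out = Proxy(f"t{len(self.tape)}", self.tape)
        rhs = other.name if isinstance(other, Proxy) else other
        self.tape.append((out.name, op, (self.name, rhs)))
        return out

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)


def trace(fn, arg_names):
    """Call fn once on proxies and return the recorded ops and output name."""
    tape = []
    result = fn(*(Proxy(name, tape) for name in arg_names))
    return tape, result.name


def run_graph(tape, output, **inputs):
    """Replay the recorded ops on real values."""
    env = dict(inputs)
    for out, op, (lhs, rhs) in tape:
        # Strings name earlier values; anything else is a literal constant.
        left = env[lhs] if isinstance(lhs, str) else lhs
        right = env[rhs] if isinstance(rhs, str) else rhs
        env[out] = OPS[op](left, right)
    return env[output]


def model(x, w):
    """Plain Python that also works in immediate mode on ordinary numbers."""
    return x * w + 1.0


eager = model(2.0, 3.0)                         # immediate execution
tape, output = trace(model, ["x", "w"])         # record the operations once
replayed = run_graph(tape, output, x=2.0, w=3.0)
```

The same `model` function serves both paths. The usual caveat with tracing applies to this toy as well: only the operations executed during the recording run are captured, so Python branches that depend on input values are frozen into whichever path was taken.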
In addition, ONNX is natively woven into PyTorch 1.0 as the model export format, making models from PyTorch 1.0 interoperable with other AI frameworks. ONNX also serves as the integration interface for accelerated runtimes or hardware-specific libraries. This gives developers full freedom to mix and match the best AI frameworks and tools without having to take on resource-intensive custom engineering. Facebook is committed to supporting new features and functionalities for ONNX, which continues to be a powerful open format as well as an important part of developing with PyTorch 1.0.
Building an end-to-end deep learning system
Along with PyTorch 1.0, we’ll also open-source many of the AI tools we are using at scale today. These include Translate — a PyTorch Language Library — for fast, flexible neural machine translation, as well as the next generation of ELF, a comprehensive game platform for AI reasoning applications. Developers can also take advantage of tools like Glow, a machine learning compiler that accelerates framework performance on different hardware platforms, and Tensor Comprehensions, a tool that automatically generates efficient GPU code from high-level mathematical operations. We have also open-sourced other libraries, such as Detectron, which supports object-detection research, covering both bounding box and object instance segmentation outputs. Visit our AI developer site at facebook.ai/developers for the full list, and learn more about PyTorch on the PyTorch and Caffe2 blogs.
Over the coming months, we’re going to refactor and unify the codebases of both the Caffe2 and PyTorch 0.4 frameworks to deduplicate components and share abstractions. The result will be a unified framework that supports efficient graph-mode execution with profiling, mobile deployment, extensive vendor integrations, and more. As with other open AI initiatives like ONNX, we’re also partnering with other companies and the community to give more developers these accelerated research-to-production capabilities. To start, Microsoft plans to support PyTorch 1.0 in its Azure cloud and developer offerings, including Azure Machine Learning services and Data Science Virtual Machines. Amazon Web Services currently supports the latest version of PyTorch, optimized for P3 GPU instances, and plans to make PyTorch 1.0 available shortly after release in its cloud offerings, including its Deep Learning AMI (Amazon Machine Image).
This is just the beginning, as we look to create and share better AI programming models, interfaces and automatic optimizations. AI is a foundational technology at Facebook today, making existing products better and powering entirely new experiences. By opening up our work via papers, code, and models, we can work with all AI researchers and practitioners to advance the state of the art faster and to help apply these techniques in new ways.
Visit the Facebook Engineering Blog at code.facebook.com for more news.