WebNN: Bringing AI Inference to the Browser

Imagine having the power of AI-driven facial recognition or real-time image classification directly in your browser. This is the promise of WebNN, a groundbreaking JavaScript API designed to bring neural network inference to web applications.

What is WebNN?

The Web Neural Network API (WebNN) is a JavaScript API that lets web developers run neural network computations directly within web applications. WebNN simplifies the integration of machine learning models into web apps, opening up new possibilities for interactive and intelligent applications right in the browser.

WebNN is designed primarily for inference rather than training. It is a web-friendly, hardware-agnostic abstraction layer for neural network inference acceleration, allowing web applications to run machine learning computations efficiently on a range of devices, including CPUs, GPUs, and dedicated AI accelerators. Developers who use WebNN for inference benefit from reduced latency, enhanced privacy, and GPU acceleration. If you’re interested in constructing and executing computational graphs of neural networks in the browser, WebNN provides a high-level interface for these tasks. As of now, WebNN is available in Edge and Chrome browsers.

With ML innovation emerging across both the software and hardware ecosystems, one of the main challenges for the web is to bridge the two and arrive at a solution that scales across hardware platforms and works with any framework for web-based machine learning experiences. The WebNN API is proposed as an abstraction for neural networks in web browsers.

The architecture diagram shows how WebNN integrates with various machine learning frameworks and hardware platforms, enabling efficient neural network inference in web applications.

Understanding the WebNN Architecture Diagram:

The WebNN architecture diagram illustrates how the Web Neural Network API integrates with various components in a web-based machine learning workflow. Let’s walk through each layer and component:

Web App Layer

ONNX Models, TensorFlow Models, Other Models: These are pre-trained machine learning models that can be used for various tasks like image recognition, object detection, etc.
JS ML Frameworks (TensorFlow.js, ONNX.js, etc.): JavaScript-based machine learning frameworks that provide tools and libraries to work with these models directly in web applications.

Web Browser Layer

WebGPU: A web standard that provides high-performance graphics and computation on the web by leveraging the GPU.
WebNN: The Web Neural Network API, which provides a high-level interface for running neural network inference directly in the browser.
WebAssembly: A binary instruction format for a stack-based virtual machine, which enables high-performance applications to run on the web.

Native ML API Layer

ML Compute (macOS/iOS): Apple’s machine learning framework for performing high-performance ML tasks on macOS and iOS devices.
DirectML (Windows): Microsoft’s Direct Machine Learning API, which provides GPU-accelerated machine learning on Windows.
NN API (Android): Android’s Neural Networks API, which provides hardware-accelerated inference operations on Android devices.
OpenVINO (Linux): Intel’s Open Visual Inference and Neural Network Optimization toolkit for deploying high-performance ML inference on Linux.

Hardware Layer

CPU: Central Processing Unit, the general-purpose processor in a computer.
GPU: Graphics Processing Unit, specialized for parallel processing and often used for accelerating machine learning tasks.
ML Accelerators: Dedicated hardware designed specifically for accelerating machine learning computations (e.g., NPUs, TPUs).

How It All Fits Together:

Web App Layer:

Developers use pre-trained models (in formats such as ONNX or TensorFlow) together with JavaScript ML frameworks (like TensorFlow.js or ONNX.js) to build web applications with machine learning capabilities.

Web Browser Layer:

The web application runs in a web browser that supports the WebNN API. The browser can leverage WebGPU for high-performance computations, WebNN for neural network inference, and WebAssembly for executing performance-critical code.

Native ML API Layer:

The WebNN API in the browser translates the high-level neural network operations into calls to native machine learning APIs provided by the operating system. This ensures that the web application can take advantage of the best available hardware acceleration on the device, whether it’s running on macOS, Windows, Android, or Linux.

Hardware Layer:

The native machine learning APIs utilize the underlying hardware capabilities, such as CPU parallelism, GPU acceleration, or dedicated ML accelerators, to perform the neural network computations efficiently.
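
As a rough illustration of how an application might choose among the three browser-layer options described above, the sketch below probes for each entry point and reports the first one it finds. The helper name pickInferenceBackend and its env parameter are hypothetical, introduced only for this example. The checks themselves (navigator.ml for WebNN, navigator.gpu for WebGPU, and the WebAssembly global) look only for the presence of an entry point, which does not guarantee that a context can actually be created on a given device.

```javascript
// Hypothetical helper (not part of WebNN): reports which execution path the
// given global-like object exposes, preferring WebNN, then WebGPU, then Wasm.
function pickInferenceBackend(env) {
  const nav = env.navigator;
  if (nav && 'ml' in nav) return 'webnn';    // WebNN entry point
  if (nav && 'gpu' in nav) return 'webgpu';  // WebGPU entry point
  if (typeof env.WebAssembly === 'object' && env.WebAssembly !== null) {
    return 'wasm';                           // WebAssembly fallback
  }
  return 'none';
}

// In a page, pass the real global object.
console.log(pickInferenceBackend(globalThis));
```

A framework would typically run a check like this once at startup and then load the matching backend, which is one way to keep the same application code working across devices.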

Benefits of This Architecture:

Hardware Agnostic: WebNN provides a hardware-agnostic layer, meaning developers don’t need to write platform-specific code. The same web application can run efficiently on different devices and operating systems.
Performance Optimization: By leveraging native ML APIs and hardware acceleration, web applications can achieve high performance for machine learning tasks.
Privacy: Data stays on the device, reducing the need to send sensitive information to remote servers.
Reduced Latency: In-browser inference reduces the delay associated with sending data to and from a server, enabling real-time applications like video analysis or face detection.

This architecture enables web developers to build powerful, efficient, and privacy-preserving machine learning applications that run directly in the browser. By abstracting

The following code sample illustrates a simple usage of this API:
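
A minimal sketch of such usage, under stated assumptions: it builds the graph C = 0.2 * A + B, compiles it, and runs it. The call names (navigator.ml.createContext, MLGraphBuilder, and MLContext.compute) follow the WebNN draft from around the time of the Developer Preview. Later drafts changed some details, for example shape in place of dimensions and tensor-based dispatch in place of compute, so treat every API name here as an assumption to verify against your browser. The functions referenceScaleAdd and scaleAdd are hypothetical names; the first is a plain JavaScript fallback used when WebNN is not available.

```javascript
// Plain JavaScript reference: fallback path and a way to check results.
function referenceScaleAdd(a, b, scale) {
  return Float32Array.from(a, (value, i) => value * scale + b[i]);
}

// Computes C = 0.2 * A + B for two 4-element (2x2) Float32Arrays.
async function scaleAdd(a, b) {
  const webnnAvailable = typeof navigator !== 'undefined' &&
      'ml' in navigator && typeof MLGraphBuilder === 'function';
  if (!webnnAvailable) return referenceScaleAdd(a, b, 0.2);

  const context = await navigator.ml.createContext();
  if (typeof context.compute !== 'function') {
    // Assumption: newer drafts dropped compute(); not covered by this sketch.
    return referenceScaleAdd(a, b, 0.2);
  }
  const builder = new MLGraphBuilder(context);

  // Both spellings of the shape member are supplied so that either draft can
  // read the one it knows; unknown dictionary members are ignored.
  const matrix = {dataType: 'float32', dimensions: [2, 2], shape: [2, 2]};
  const scalar = {dataType: 'float32', dimensions: [1], shape: [1]};

  // 1. Describe the computational graph.
  const A = builder.input('A', matrix);
  const B = builder.input('B', matrix);
  const scale = builder.constant(scalar, new Float32Array([0.2]));
  const C = builder.add(builder.mul(A, scale), B);

  // 2. Compile the graph.
  const graph = await builder.build({C});

  // 3. Execute it. Copies are passed in because an implementation may
  //    transfer (detach) the buffers it is given.
  const results = await context.compute(
      graph,
      {A: new Float32Array(a), B: new Float32Array(b)},
      {C: new Float32Array(4)});
  return results.outputs.C;
}

scaleAdd(new Float32Array(4).fill(1.0), new Float32Array(4).fill(0.8))
    .then((c) => console.log(Array.from(c)))
    .catch(console.error);
```

The three numbered steps (describe, compile, execute) are the part that carries over between drafts; the exact execution call is the part most likely to differ.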

Use Cases

WebNN offers several use cases for web applications. Here are some common scenarios where WebNN can be beneficial:

Face Recognition: Run face-analysis models such as Face Landmark (SimpleCNN) with WebNN.
Facial Landmark Detection: Use WebNN to detect facial landmarks, which can be useful for applications like augmented reality filters or emotion analysis.
Image Classification: Leverage WebNN for image classification tasks. You can demonstrate this using pre-trained models and the WebNN API.
Object Detection: Perform object detection in web applications by utilizing WebNN with pre-trained models.
Noise Suppression: Implement noise suppression models (e.g., RNNoise) using WebNN for audio processing.
Selfie Segmentation: Explore MediaPipe Selfie Segmentation using TFLite Web XNNPACK delegate and WebNN delegate for real-time background removal in selfies.
Semantic Segmentation: Use WebNN to implement semantic segmentation tasks, such as identifying object boundaries in images.
Style Transfer: Apply artistic style-transfer techniques to images using WebNN.

In each of these cases, WebNN simplifies neural network inference in the browser, making it more accessible for web developers.

Target hardware

Web applications and frameworks can target typical computing devices on popular operating systems that people use in their daily lives. Initial prototypes demonstrate respectable performance on:

Smartphones, e.g. Google Pixel 3 or similar
Laptops, e.g. 13″ MacBook Pro 2015 or similar

The WebNN API is not tied to specific platforms and is implementable by existing major platform APIs, such as:

Android Neural Networks API
Windows DirectML API
macOS/iOS ML Compute API

Depending on the underlying hardware capabilities, these platform APIs may make use of CPU parallelism, general-purpose GPU, or dedicated hardware accelerators for machine learning. The WebNN API provides performance adaptation options but remains hardware agnostic.
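
To make the "performance adaptation options" more concrete, the sketch below validates two hints a page might pass when creating a WebNN context. The option names deviceType and powerPreference, and their allowed values, are assumed from the draft MLContextOptions dictionary of the Developer Preview period and may differ between browser versions. The helper buildContextOptions is hypothetical and exists only for this example.

```javascript
// Assumed from the draft MLContextOptions; verify against your browser.
const DEVICE_TYPES = ['cpu', 'gpu', 'npu'];
const POWER_PREFERENCES = ['default', 'high-performance', 'low-power'];

// Hypothetical helper: validates the hints and returns an options object.
function buildContextOptions(deviceType = 'cpu', powerPreference = 'default') {
  if (!DEVICE_TYPES.includes(deviceType)) {
    throw new RangeError(`Unknown deviceType: ${deviceType}`);
  }
  if (!POWER_PREFERENCES.includes(powerPreference)) {
    throw new RangeError(`Unknown powerPreference: ${powerPreference}`);
  }
  return {deviceType, powerPreference};
}

// Browser usage (inside an async function, where navigator.ml exists):
//   const context = await navigator.ml.createContext(
//       buildContextOptions('gpu', 'high-performance'));
console.log(buildContextOptions('gpu', 'high-performance'));
```

These values are hints rather than guarantees: the browser and the platform API underneath still decide which hardware actually runs the graph, which is how the API stays hardware agnostic.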

When running on GPUs, WebNN currently supports the following models:

Stable Diffusion Turbo
Stable Diffusion 1.5
Whisper-base
MobileNetv2
Segment Anything
ResNet
EfficientNet
SqueezeNet

WebNN also works with custom models as long as operator support is sufficient. Check the status of operators here.
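
One rough way to pre-check a custom model is to compare the operators it needs against the methods the graph builder exposes. The sketch below does that; findMissingOperators is a hypothetical helper, and the operator list is illustrative rather than taken from any particular model. A method being present is a necessary condition only: a given backend may still reject an operator for a particular data type or device.

```javascript
// Hypothetical helper: lists the operator names a model needs that are not
// exposed as methods on the given graph-builder prototype.
function findMissingOperators(builderPrototype, requiredOps) {
  return requiredOps.filter((op) => typeof builderPrototype[op] !== 'function');
}

// Illustrative operator list for a small convolutional classifier.
const requiredOps = ['conv2d', 'relu', 'averagePool2d', 'gemm', 'softmax'];

if (typeof MLGraphBuilder === 'function') {
  console.log('Missing operators:',
      findMissingOperators(MLGraphBuilder.prototype, requiredOps));
} else {
  console.log('MLGraphBuilder is not defined in this environment.');
}
```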

Installation Guide:

To get started with WebNN, follow these steps:

Browser Compatibility:

WebNN requires a compatible browser. Download a Microsoft Edge Dev channel build or later.
To enable WebNN, enter about://flags in your browser address bar, and then press Enter. An Experiments page opens.
In the Search flags box, enter webnn. The "Enables WebNN API" entry appears.
In the drop-down menu, select Enabled.
Relaunch your browser.

GitHub Repository:

Clone the WebNN Developer Preview repository to your local machine.
Navigate to the repository and explore the provided samples and examples.

Conclusion:

The WebNN API represents a significant advancement in bringing machine learning capabilities directly into web browsers, creating a powerful intersection between AI and web technologies. Here’s an expanded look at why this is transformative:

Empowering Web Developers – WebNN democratizes access to advanced machine learning by providing web developers with the tools to integrate AI models seamlessly into their web applications. This eliminates the need for extensive expertise in AI or hardware-specific optimizations, lowering the barrier to entry for AI development.
Performance and Efficiency – By leveraging native machine learning APIs and the underlying hardware capabilities, WebNN ensures that neural network inference tasks are performed efficiently. Whether it’s utilizing CPU parallelism, GPU acceleration, or dedicated ML accelerators, WebNN optimizes performance across various devices and operating systems. This results in faster inference times and a smoother user experience, even for computationally intensive tasks like real-time video analysis or object detection.
Privacy Preservation – One of the standout benefits of WebNN is its ability to perform inference directly on the device. This approach keeps user data local, eliminating the need to send sensitive information to remote servers. This is crucial for applications dealing with personal data, such as health monitoring apps, facial recognition systems, or any application where user privacy is a concern.
Reduced Latency – Performing inference in the browser dramatically reduces the latency associated with sending data to and from a server. This is particularly beneficial for real-time applications such as augmented reality (AR) filters, live video analysis, or interactive AI-driven experiences. Users can enjoy instantaneous responses and a more engaging interaction without the lag caused by network delays.
High Availability – With WebNN, web applications can operate offline once the necessary assets are cached. This ensures that AI functionalities remain accessible even in environments with poor or no internet connectivity. For example, an educational app using WebNN for interactive learning can function seamlessly during a flight or in remote areas without reliable internet access.
Cost Efficiency – By offloading computation to client devices, WebNN reduces the need for powerful server infrastructure. This leads to lower operational and maintenance costs for running AI/ML services in the cloud. Developers can deploy sophisticated AI features without incurring the high costs associated with cloud-based inference, making it a cost-effective solution for startups and large enterprises alike.

Future Opportunities:

WebNN opens up a world of possibilities for the future of web applications:

Edge AI: Enabling AI-powered functionalities at the edge, without relying on cloud services.
Interactive Experiences: Creating more dynamic and responsive web applications that can react in real-time to user interactions.
Privacy-First AI: As the demand for privacy-first AI solutions grows, WebNN positions itself as a pivotal technology that can bring powerful AI capabilities to the masses, right within their browsers.
Standardization: As WebNN matures and gains wider adoption, it has the potential to become a standard for web-based AI, encouraging more consistent and interoperable AI implementations across different browsers and platforms.

Final Thoughts:

The WebNN API is poised to play a pivotal role in the next generation of web development. Its promise of seamless AI integration, exceptional performance, and broad industry support makes it an exciting development to watch. As WebNN evolves, we can expect even more innovative applications and advancements at the intersection of AI and web technologies. The future of AI is not just in the cloud or on powerful servers—it’s right in your browser!

Stay tuned as we continue to follow the evolution of WebNN and its impact on the digital world. For more details, explore the WebNN Developer Preview website and start experimenting with this cutting-edge technology today!

Additional Links & References:

WebNN Developer Preview (microsoft.github.io)
WebNN tutorial | Microsoft Learn
WebNN | Web Machine Learning
