Welcome back to our on-device AI series!
In the first part, we introduced the fundamental concepts of on-device AI and the reasons to prioritize it: faster performance, stronger privacy, and the ability to work offline.
This article walks through the concrete steps required to prepare AI models for on-device deployment, making sure they are optimized, validated, and ready for real-world applications.
We will look at how on-device AI differs from traditional cloud-based solutions, focusing on critical technical considerations such as model size, architecture, and computational requirements.
Understanding these aspects is what makes efficient, effective performance possible across a wide range of devices.

Capturing the Neural Network Graph
Capturing the neural network graph means converting the network's computational graph into a portable representation that can be deployed to devices. This step is the foundation for model portability and for the optimizations that follow.
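To make this concrete, here is a minimal sketch of graph capture in plain PyTorch with torch.jit.trace. The tiny two-layer module and the file name are purely illustrative; Step 1 below applies the same idea to a real pre-trained model.
import torch
import torch.nn as nn
# A toy network standing in for a real model (illustrative only)
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 4)
    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))
model = TinyNet().eval()
example_input = torch.rand(1, 16)
# Tracing records the operations executed on the example input and
# produces a self-contained, portable TorchScript graph
traced = torch.jit.trace(model, example_input)
# The traced graph can be saved and reloaded without the original Python
# class, which is what makes it suitable for deployment tooling
traced.save("tiny_net.pt")
reloaded = torch.jit.load("tiny_net.pt")
print(reloaded(example_input).shape)  # torch.Size([1, 4])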
Model Compilation for On-Device Deployment
Model compilation transforms the captured model into a format optimized for the target device's hardware, ensuring that the model runs efficiently on that specific device.
Accelerating Inference with Hardware
Leveraging hardware acceleration, such as GPUs, NPUs, or specialized AI processors, can significantly speed up inference, allowing for real-time performance on devices.
Importance of On-Device Validation
Validating the model on the device ensures that its predictions are consistent with those of the original trained model. This step is crucial for confirming the model's reliability and accuracy in real-world scenarios.
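As a rough illustration of what "consistent" means here, the sketch below compares the original model's output with the on-device output numerically. It assumes the torch_outputs and ondevice_outputs variables produced in Step 5 below, and the tolerance is an arbitrary placeholder; the qai_hub_models printing utilities used in that step report similar metrics for you.
import numpy as np
# Assumes torch_outputs and ondevice_outputs from Step 5; depending on the
# runtime, the on-device layout may differ, so align shapes before comparing
reference = torch_outputs.detach().numpy()
on_device = np.asarray(ondevice_outputs["output_0"][0])
max_abs_diff = np.max(np.abs(reference - on_device))
print(f"Max absolute difference: {max_abs_diff:.6f}")
if max_abs_diff < 1e-2:  # arbitrary example tolerance
    print("On-device outputs are consistent with the original model")
else:
    print("Outputs diverge; revisit runtime or precision settings")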
Steps for Preparing a Model for Deployment
Step 1: Capturing the Trained Model
Capturing the trained model involves tracing its computation graph to create a portable representation that can be compiled for various devices.
# Import the necessary libraries for capturing the trained model
from qai_hub_models.models.ffnet_40s import Model as FFNet_40s
import torch
# Load the pre-trained FFNet 40s model
ffnet_40s = FFNet_40s.from_pretrained()
# Define the input shape for the model
input_shape = (1, 3, 1024, 2048)
# Create example inputs for tracing the model
example_inputs = torch.rand(input_shape)
# Trace the model to capture its computation graph
traced_model = torch.jit.trace(ffnet_40s, example_inputs)
traced_model # Display the traced model
Step 2: Compiling the Model for the Device
Compiling the model involves transforming the traced representation into an optimized format for the target device’s hardware.
# Import QAI Hub library for model compilation
import qai_hub
from utils import get_ai_hub_api_token # Utility function to get AI Hub API token
# Configure QAI Hub with the API token
ai_hub_api_token = get_ai_hub_api_token()
!qai-hub configure --api_token $ai_hub_api_token
# List available devices for deployment
for device in qai_hub.get_devices():
    print(device.name)
# Randomly select a device for compilation
devices = [
    "Samsung Galaxy S22 Ultra 5G",
    "Samsung Galaxy S22 5G",
    "Samsung Galaxy S22+ 5G",
    "Samsung Galaxy Tab S8",
    "Xiaomi 12",
    "Xiaomi 12 Pro",
    "Samsung Galaxy S23",
    "Samsung Galaxy S23+",
    "Samsung Galaxy S23 Ultra",
    "Samsung Galaxy S24",
    "Samsung Galaxy S24 Ultra",
    "Samsung Galaxy S24+",
]
import random
selected_device = random.choice(devices) # Select a random device
print(selected_device) # Print the selected device
# Initialize the selected device
device = qai_hub.Device(selected_device)
# Submit a compile job for the selected device
compile_job = qai_hub.submit_compile_job(
    model=traced_model,                  # Traced PyTorch model
    input_specs={"image": input_shape},  # Input specifications
    device=device,                       # Target device
)
# Download and save the compiled model for on-device use
target_model = compile_job.get_target_model()
Step 3: Experimenting with Different Runtimes
Experimenting with different runtimes helps identify the configuration that works best on the target device and can further optimize the model.
# Experiment with different runtimes for model compilation
# Compile using TensorFlow Lite runtime
compile_options = "--target_runtime tflite"
compile_job_expt = qai_hub.submit_compile_job(
    model=traced_model,                  # Traced PyTorch model
    input_specs={"image": input_shape},  # Input specifications
    device=device,                       # Target device
    options=compile_options,             # Compilation options
)
# Compile using ONNX runtime
compile_options = "--target_runtime onnx"
compile_job_expt = qai_hub.submit_compile_job(
    model=traced_model,                  # Traced PyTorch model
    input_specs={"image": input_shape},  # Input specifications
    device=device,                       # Target device
    options=compile_options,             # Compilation options
)
# Compile using Qualcomm AI Engine runtime
compile_options = "--target_runtime qnn_lib_aarch64_android"
compile_job_expt = qai_hub.submit_compile_job(
    model=traced_model,                  # Traced PyTorch model
    input_specs={"image": input_shape},  # Input specifications
    device=device,                       # Target device
    options=compile_options,             # Compilation options
)
Step 4: Exploring Different Compute Units
Exploring different compute units (CPU, GPU, NPU) helps determine the optimal configuration for running the model efficiently on the target device.
# Import necessary utilities for performance profiling
from qai_hub_models.utils.printing import print_profile_metrics_from_job
# Initialize the selected device for profiling
device = qai_hub.Device(selected_device)
# Submit a performance profiling job on the device
profile_job = qai_hub.submit_profile_job(
    model=target_model,  # Compiled model
    device=device,       # Target device
)
# Download and print profiling data
profile_data = profile_job.download_profile()
print_profile_metrics_from_job(profile_job, profile_data)
# Experiment with different compute units
# Profile using CPU
profile_options = "--compute_unit cpu"
profile_job_expt = qai_hub.submit_profile_job(
    model=target_model,       # Compiled model
    device=device,            # Target device
    options=profile_options,  # Profiling options
)
# Profile using GPU
profile_options = "--compute_unit gpu"
profile_job_expt = qai_hub.submit_profile_job(
    model=target_model,       # Compiled model
    device=device,            # Target device
    options=profile_options,  # Profiling options
)
# Profile using NPU
profile_options = "--compute_unit npu"
profile_job_expt = qai_hub.submit_profile_job(
    model=target_model,       # Compiled model
    device=device,            # Target device
    options=profile_options,  # Profiling options
)
Step 5: Performing On-Device Inference
Finally, performing on-device inference validates the model's real-world performance and confirms that its outputs are accurate and consistent with those of the original model.
# Sample inputs for on-device inference
sample_inputs = ffnet_40s.sample_inputs()
# Convert sample inputs to Torch tensor
torch_inputs = torch.Tensor(sample_inputs['image'][0])
# Perform inference using the original model
torch_outputs = ffnet_40s(torch_inputs)
torch_outputs # Display the outputs
# Submit an inference job on the device
inference_job = qai_hub.submit_inference_job(
    model=target_model,    # Compiled model
    inputs=sample_inputs,  # Sample inputs
    device=device,         # Target device
)
# Download and display the on-device inference outputs
ondevice_outputs = inference_job.download_output_data()
ondevice_outputs['output_0'] # Display the outputs
# Print inference metrics
from qai_hub_models.utils.printing import print_inference_metrics
print_inference_metrics(inference_job, ondevice_outputs, torch_outputs)
Deployment Readiness
After capturing, compiling, and validating the model on the target device, and ensuring its performance meets the required criteria, the model is ready for deployment.
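Before shipping, you will usually pull the compiled artifact down to a local file so it can be bundled with your application. Here is a minimal sketch, assuming the target_model object from Step 2 and the qai_hub client's download helper; the file name and extension are illustrative and should match the runtime you compiled for.
# Save the compiled model locally for packaging with the app
# (file name is illustrative; use .tflite for TensorFlow Lite, etc.)
target_model.download("ffnet_40s.tflite")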
Final Thoughts on On-Device Deployment
Deploying AI models on devices requires careful preparation. This includes capturing the computation graph, compiling the model for the target hardware, and validating its performance.
Following these steps ensures that the models are both efficient and effective when deployed on devices.
In the next article, we will explore model quantization, a technique that further optimizes models for on-device deployment by reducing their size and improving inference speed without significantly sacrificing accuracy.