Our previous article explored quantization basics and the role of scale and zero points. Now, we’ll focus on two key quantization modes: symmetric and asymmetric. Each mode has unique benefits and challenges.
Choosing the right one can impact your model’s performance, especially on resource-limited devices.
This article will break down what symmetric and asymmetric quantization are. We’ll also show you how to implement both in PyTorch. By the end, you’ll know which mode suits your needs best.

What is Quantization?
Quantization, in general, is the process of mapping values from a large set of real numbers to a smaller, discrete set. This typically involves converting continuous inputs into fixed values at the output. In traditional models, weights are stored as floating-point numbers. Through quantization, they can be converted into 8-bit integers for better efficiency.
Two common approaches to reducing precision are rounding and truncation.
- Rounding: We map each value to the nearest integer. For example, a value of 1.8 becomes 2, while 1.2 becomes 1.
- Truncation: We discard the fractional part and keep only the integer portion. For example, 1.8 becomes 1, and 1.2 also becomes 1.
By using these methods, we can efficiently reduce the precision of data, making it more manageable for various computational tasks.
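As a quick illustration, here is a minimal PyTorch sketch of the two approaches, using the built-in torch.round and torch.trunc operations on a few sample values:

import torch

values = torch.tensor([1.8, 1.2, -1.8, -1.2])

# Rounding: map each value to the nearest integer
print(torch.round(values))  # tensor([ 2.,  1., -2., -1.])

# Truncation: drop the fractional part, keeping only the integer portion
print(torch.trunc(values))  # tensor([ 1.,  1., -1., -1.])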
Motivation for Quantization
The main motivation behind quantizing deep neural networks is to improve inference speed. Quantization converts a model's floating-point weights (and often activations) into lower-precision integer representations. During quantization-aware training, running statistics such as a moving average of the observed value ranges help keep the transition from full precision to 8-bit integers stable.
With the advent of large language models (LLMs), the number of parameters continues to grow, and this results in an increasingly large memory footprint.
As neural networks evolve, there is a growing demand to run these models on smaller devices. These include laptops, mobile phones, and even smartwatches.
Achieving this requires reducing the model size and improving efficiency, which is where quantization becomes indispensable.
Before diving deeper into quantization, remember that a trained neural network is ultimately just a collection of floating-point numbers stored in a computer's memory.
Quantization helps manage and optimize these values, enabling the deployment of complex models on a variety of devices.
Symmetric Quantization: A Deep Dive
Symmetric quantization is straightforward. It maps the range of your data around zero.
The scale is calculated using the maximum absolute value in the tensor. The zero point is always set to zero. This makes the method simpler and faster.
Here’s how it works. First, find the maximum absolute value in your tensor. Then, divide this by the maximum value of the quantized data type. This gives you the scale.
With symmetric quantization, all values are treated equally, positive or negative.
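A quick worked example with made-up numbers: if the largest absolute value in a tensor is 2.54 and we quantize to int8 (whose maximum is 127), the scale is 2.54 / 127 = 0.02 and the zero point is 0. A value of 1.0 then maps to round(1.0 / 0.02) = 50, while -2.54 maps to -127 and 0.0 stays at 0.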
Symmetric quantization uses a single scaling factor and a mapping centered on zero. Asymmetric quantization, on the other hand, adjusts the mapping based on the range of the data.
During quantization-aware training, models are trained with awareness of the quantization process, improving their robustness.
This symmetry simplifies the calculations and often makes them faster.
Advantages of Symmetric Quantization
One significant advantage of symmetric quantization is its simplicity. With the zero point fixed at zero, there’s less to calculate. This can speed up both training and inference.
Symmetric quantization is also consistent. The scaling factor applies uniformly across all data, reducing the risk of bias. The scaling factor is derived from the maximum absolute value, which simplifies the quantization operation. For models with balanced data, symmetric quantization often works best.
Use Cases for Symmetric Quantization
Symmetric quantization shines in applications where speed and simplicity are essential. It’s ideal for models running on edge devices with limited processing power.
For example, IoT devices or smartphones often use symmetric quantization. It’s also a good fit for models where the data distribution is balanced around zero.
In such cases, symmetric quantization can deliver efficient performance without much loss in accuracy.
Asymmetric Quantization
Asymmetric quantization is more flexible. Unlike symmetric quantization, it does not center around zero.
The zero point can shift, allowing a better representation of the data’s range. This makes it more adaptable to data that isn’t evenly distributed.
In asymmetric quantization, the scale is computed from the full range of the data rather than the maximum absolute value, and the zero point is not fixed. Instead, it is shifted so that the data's minimum value maps onto the low end of the quantized range.
This shift helps reduce quantization errors, especially when dealing with skewed data.
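For instance, with made-up numbers: suppose a tensor's values range from r_min = -0.5 to r_max = 3.5 and we quantize to int8 (range -128 to 127). The scale is (3.5 - (-0.5)) / (127 - (-128)) = 4.0 / 255 ≈ 0.0157, and the zero point is round(-128 - (-0.5 / 0.0157)) ≈ -96. With this mapping, the minimum value -0.5 lands near -128 and the maximum 3.5 lands near 127, so the full int8 range is used even though the data is not centered at zero.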
Advantages of Asymmetric Quantization
The main advantage of asymmetric quantization is precision. By adjusting the zero point, it better captures the range of the data. This can lead to higher accuracy, especially for models with unbalanced data.
Asymmetric quantization is also more flexible. It adapts to various data distributions, making it a good choice for a wider range of applications.
Use Cases for Asymmetric Quantization
Asymmetric quantization is often used in models where data isn’t centered around zero. This includes real-world applications like speech recognition or image processing. These fields often deal with data that has a natural bias.
Asymmetric quantization helps minimize errors in these cases, improving overall model performance.
It’s also useful in scenarios where maintaining high accuracy is critical.
Code Implementation: Symmetric and Asymmetric Quantization
Let’s see how to implement both symmetric and asymmetric quantization in PyTorch. We’ll start with symmetric quantization.
Symmetric Quantization in PyTorch
First, we need to calculate the scale for symmetric quantization. Here’s how:
import torch

# Function to calculate the scale in symmetric mode
def get_q_scale_symmetric(tensor, dtype=torch.int8):
    # Get the maximum absolute value in the tensor
    r_max = tensor.abs().max().item()
    # Get the maximum value representable by the dtype (127 for int8)
    q_max = torch.iinfo(dtype).max
    # Calculate and return the scale
    return r_max / q_max

# Test the implementation on a 4x4 matrix
test_tensor = torch.randn((4, 4))
scale = get_q_scale_symmetric(test_tensor)
print(f'Symmetric Scale: {scale}')

In symmetric mode, the zero point is always zero. After getting the scale, we quantize the tensor:
def linear_q_symmetric(tensor, dtype=torch.int8):
    # Get the scale using the symmetric method
    scale = get_q_scale_symmetric(tensor, dtype)
    # Quantize: divide by the scale, round, and clamp to the int8 range
    quantized_tensor = tensor / scale
    quantized_tensor = torch.round(quantized_tensor).clamp(-128, 127).to(dtype)
    return quantized_tensor, scale

# Quantize the test tensor (keep distinct names so we can compare both modes later)
quantized_tensor_symmetric, scale_symmetric = linear_q_symmetric(test_tensor)
print(f'Quantized Tensor (Symmetric):\n{quantized_tensor_symmetric}')

Asymmetric Quantization in PyTorch
Now, let’s move on to asymmetric quantization. We need to calculate both the scale and zero point.
def get_q_scale_and_zero_point_asymmetric(tensor, dtype=torch.int8):
    # Get the min and max values in the tensor
    r_min = tensor.min().item()
    r_max = tensor.max().item()
    # Get the min and max values for the dtype (-128 and 127 for int8)
    q_min = torch.iinfo(dtype).min
    q_max = torch.iinfo(dtype).max
    # Calculate the scale from the full data range
    scale = (r_max - r_min) / (q_max - q_min)
    # Calculate the zero point and round it to an integer
    zero_point = q_min - (r_min / scale)
    zero_point = int(round(zero_point))
    return scale, zero_point

# Calculate scale and zero point for asymmetric mode
scale, zero_point = get_q_scale_and_zero_point_asymmetric(test_tensor)
print(f'Asymmetric Scale: {scale}, Zero Point: {zero_point}')

With the scale and zero point, we can quantize the tensor in asymmetric mode:
def linear_q_asymmetric(tensor, dtype=torch.int8):
    # Get the scale and zero point using the asymmetric method
    scale, zero_point = get_q_scale_and_zero_point_asymmetric(tensor, dtype)
    # Quantize: scale, shift by the zero point, round, and clamp to the int8 range
    quantized_tensor = (tensor / scale) + zero_point
    quantized_tensor = torch.round(quantized_tensor).clamp(-128, 127).to(dtype)
    return quantized_tensor, scale, zero_point

# Quantize the test tensor in asymmetric mode
quantized_tensor_asymmetric, scale_asymmetric, zero_point = linear_q_asymmetric(test_tensor)
print(f'Quantized Tensor (Asymmetric):\n{quantized_tensor_asymmetric}')
Dequantization and Error Analysis
Finally, let’s dequantize the tensors and analyze the quantization error.
# Dequantization function
def linear_dequantization(quantized_tensor, scale, zero_point):
    # Map the integers back to floating point: r = scale * (q - zero_point)
    dequantized_tensor = scale * (quantized_tensor.float() - zero_point)
    return dequantized_tensor

# Dequantize the symmetric quantized tensor (its zero point is always 0)
dequantized_tensor_symmetric = linear_dequantization(quantized_tensor_symmetric, scale_symmetric, 0)
print(f'De-quantized Tensor (Symmetric):\n{dequantized_tensor_symmetric}')

# Dequantize the asymmetric quantized tensor
dequantized_tensor_asymmetric = linear_dequantization(quantized_tensor_asymmetric, scale_asymmetric, zero_point)
print(f'De-quantized Tensor (Asymmetric):\n{dequantized_tensor_asymmetric}')

Plot Quantization Error
Let’s plot the quantization errors for both symmetric and asymmetric quantization.
import matplotlib.pyplot as plt

# Function to plot the quantization error
def plot_quantization_errors(original_tensor, dequantized_tensor, title):
    # Calculate the absolute quantization error
    quantization_error = (dequantized_tensor - original_tensor).abs()
    # Plot the errors as a heatmap
    plt.figure(figsize=(8, 6))
    plt.matshow(quantization_error, cmap='viridis', fignum=1)
    plt.colorbar()
    plt.title(title)
    plt.show()

# Plot errors for symmetric quantization
plot_quantization_errors(test_tensor, dequantized_tensor_symmetric, 'Quantization Error (Symmetric)')

# Plot errors for asymmetric quantization
plot_quantization_errors(test_tensor, dequantized_tensor_asymmetric, 'Quantization Error (Asymmetric)')

This code shows how symmetric and asymmetric quantization affect the quantization error differently. Analyzing these errors can help you choose the best quantization method for your model.
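If you prefer a single number to a heatmap, a small follow-up sketch (assuming the tensors defined above are still in scope) is to compare the mean absolute error of the two modes directly:

# Compare the mean absolute quantization error of both modes
mae_symmetric = (dequantized_tensor_symmetric - test_tensor).abs().mean().item()
mae_asymmetric = (dequantized_tensor_asymmetric - test_tensor).abs().mean().item()
print(f'Mean absolute error (symmetric):  {mae_symmetric:.6f}')
print(f'Mean absolute error (asymmetric): {mae_asymmetric:.6f}')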

Final Thoughts & Conclusion
In this article, we explored the differences between symmetric and asymmetric quantization modes. Here is a quick recap of the key takeaways:
- Symmetric Quantization:
  - Simpler, with a fixed zero point at zero.
  - Faster and easier to implement.
  - Best suited for data centered around zero.
  - May introduce higher errors if the data distribution is skewed.
- Asymmetric Quantization:
  - More flexible, adjusting the zero point based on the data's minimum value.
  - Provides higher precision for datasets with a wide range or skewed distribution.
  - Requires additional computation but can reduce quantization errors.
- Practical Insights:
  - Symmetric quantization is ideal for simpler, balanced datasets.
  - Asymmetric quantization works better for complex, real-world data with skewed distributions.
Understanding these strengths and limitations helps optimize machine learning models for different applications.
In our next article, we will explore granularity in quantization and its impact on model performance. Stay tuned as we continue our deep dive into the fascinating world of quantization in deep learning.