Scalability in the Cloud: Migrating a 200 TB+ SAP Oracle Database to Azure
Overview
In this blog, we cover the Azure solution and the deployment approach for migrating very large SAP Oracle databases (200 TB+) to Azure.
VM Solution
Azure M-series virtual machines (VMs) offer a vCPU count suited to Oracle licensing, a high memory-to-vCPU ratio to accommodate a large Oracle SGA, and the IO and network bandwidth needed for transactional and batch workloads. We tested both the M192 and M176 SKUs with a 200 TB+ Oracle database.
As the comparison below shows, the M176, based on Intel Sapphire Rapids processors with DDR5 memory, offers higher SAPS and 1.5x faster memory access than the M192 (based on Intel Cascade Lake). The M176 is also equipped with Azure Boost, which improves both IO and network throughput. In our testing, the M176 delivered higher SAPS, faster memory access, and more IO and network bandwidth.
VM SKU | Intel Processor / Memory | vCPU | Memory (GiB) | IOPS / Throughput (MBps) | Network Bandwidth (Mbps)
M192idms_v2 | Cascade Lake / DDR4 | 192 | 4096 | 80000 / 2000 | 30000
M176ds_4_v3 | Sapphire Rapids / DDR5 | 176 | 3892 | 130000 / 4000 | 40000
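To put the table in relative terms, the sketch below computes the M176-to-M192 ratios for each published limit and the memory per vCPU of each SKU. It is plain arithmetic over the table values only; the class and function names are illustrative, and it says nothing about SAPS or memory latency, which the table does not list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSku:
    """Published limits for one VM SKU, copied from the comparison table."""
    name: str
    vcpu: int
    memory_gib: int
    iops: int
    disk_mbps: int
    network_mbps: int

    @property
    def gib_per_vcpu(self) -> float:
        # Memory available per vCPU.
        return self.memory_gib / self.vcpu


M192 = VmSku("M192idms_v2", 192, 4096, 80_000, 2_000, 30_000)
M176 = VmSku("M176ds_4_v3", 176, 3892, 130_000, 4_000, 40_000)


def uplift(new: VmSku, old: VmSku) -> dict:
    """Ratio new/old for each IO and network limit."""
    return {
        "iops": new.iops / old.iops,
        "disk_mbps": new.disk_mbps / old.disk_mbps,
        "network_mbps": new.network_mbps / old.network_mbps,
    }


if __name__ == "__main__":
    for sku in (M192, M176):
        print(f"{sku.name}: {sku.gib_per_vcpu:.1f} GiB per vCPU")
    for metric, ratio in uplift(M176, M192).items():
        print(f"M176 vs M192 {metric}: {ratio:.2f}x")
```

From the table values, the M176 offers 1.625x the IOPS, 2x the disk throughput, and about 1.33x the network bandwidth of the M192, and slightly more memory per vCPU (about 22.1 vs 21.3 GiB) despite having fewer vCPUs.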
System Global Area (SGA): Very large Oracle databases benefit greatly from a large SGA. Customers with such sizeable Oracle workloads should deploy an Azure M-series VM with at least 4 TB of RAM. Specific parameter recommendations:
Set Linux HugePages to 75-90% of physical RAM
Set the SGA to 90% of the total HugePages size
Set the Oracle parameter USE_LARGE_PAGES = ONLY
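The three sizing rules above reduce to simple arithmetic. The sketch below assumes the default 2 MiB huge page size on x86_64 Linux (check Hugepagesize in /proc/meminfo on your VM) and a HugePages share chosen inside the recommended 75-90% band; it derives a vm.nr_hugepages value for sysctl and a matching SGA size. The function name and its defaults are illustrative, not an official sizing tool.

```python
HUGEPAGE_MIB = 2  # assumption: default 2 MiB huge pages (Hugepagesize: 2048 kB)


def hugepage_plan(ram_gib: int, hugepage_pct: int = 80, sga_pct: int = 90,
                  page_mib: int = HUGEPAGE_MIB) -> tuple:
    """Return (vm.nr_hugepages, HugePages pool in GiB, SGA size in GiB).

    hugepage_pct: share of physical RAM reserved as HugePages (75-90%).
    sga_pct: share of the HugePages pool given to the SGA (90%).
    """
    if not 75 <= hugepage_pct <= 90:
        raise ValueError("HugePages should be 75-90% of physical RAM")
    ram_mib = ram_gib * 1024
    # Integer arithmetic keeps the page count exact (rounded down).
    nr_hugepages = ram_mib * hugepage_pct // 100 // page_mib
    hugepages_gib = nr_hugepages * page_mib / 1024
    sga_gib = hugepages_gib * sga_pct / 100
    return nr_hugepages, hugepages_gib, sga_gib


if __name__ == "__main__":
    # M176ds_4_v3 has 3892 GiB of RAM (see the VM table).
    pages, pool_gib, sga_gib = hugepage_plan(3892)
    print(f"vm.nr_hugepages = {pages}")
    print(f"HugePages pool: {pool_gib:.0f} GiB, SGA: {sga_gib:.0f} GiB")
    print("use_large_pages = ONLY")
```

For the M176ds_4_v3 at 80%, this yields vm.nr_hugepages = 1594163. Keeping the SGA below the pool size matters because with USE_LARGE_PAGES = ONLY the instance fails to start if large pages cannot be used for the entire SGA.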
Storage Solution and Configuration
Azure offers multiple storage options: Premium SSD, Premium SSD v2, Ultra Disk, and Azure NetApp Files (ANF). The table below summarizes the storage characteristics for the Standard_M176ds_4_v3 VM.
IO Metrics | Premium SSD | Premium SSD v2 (Pv2) | Azure NetApp Files (ANF)
IOPS | 130K | 130K | Millions
Throughput | 4 GB/s | 4 GB/s | >5 GB/s
Latency | Low single-digit ms | < 1 ms | < 0.4 ms
High Availability | Oracle Data Guard | Oracle Data Guard | Oracle Data Guard
Disaster Recovery | Oracle Data Guard | Oracle Data Guard | Oracle Data Guard and/or ANF Cross Region Replication
Storage Snapshot | Yes | No | Yes
Storage Manager | Automatic Storage Management (ASM) | ASM | dNFS
For the 200 TB+ Oracle database workload, we tested the following storage configuration, which makes use of both the VM's network channel (for ANF) and its disk IO channel (for Premium SSD v2, Pv2). Combining ANF and Pv2 let us use the VM's available throughput effectively to meet, and exceed, the IO requirements of such a large Oracle database.
Component | Disk Type | Number of Volumes | Size (TiB) | Total Throughput (MiB/s) | Volume Layout | Stripe Size
Oracle Home | Pv2 | 1 | 1 | 250 | LVM | -
sapdata1-6 | ANF | 6 | 40 per volume | 3000-4000 | Individual | -
Oracle redo1-4 | ANF | 4 | 0.5 per volume | 500-2000 | Individual | -
Oracle FRA | ANF | 1 | 5 | 500-1000 | Individual | -
Oracle Archive | Pv2 | 4 | 10 | 1500 | LVM | 64 KB
Oracle Temp | Pv2 or Ephemeral | 4 | 10 | 1500 | LVM | 64 KB
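The reason for splitting the layout across ANF and Pv2 is that each backend draws on a separate VM limit: ANF traffic travels over the VM's network channel, while Pv2 disks count against its disk IO channel. The sketch below sums the planned throughput per channel from the storage layout table (low and high end of each range), assuming Oracle Temp sits on Pv2 rather than ephemeral disk. The data layout and names are illustrative.

```python
from collections import defaultdict

# (component, backend, low MiB/s, high MiB/s) from the storage layout table.
# Assumption: Oracle Temp is on Pv2 rather than ephemeral disk.
LAYOUT = [
    ("Oracle Home", "Pv2", 250, 250),
    ("sapdata1-6", "ANF", 3000, 4000),
    ("Oracle redo1-4", "ANF", 500, 2000),
    ("Oracle FRA", "ANF", 500, 1000),
    ("Oracle Archive", "Pv2", 1500, 1500),
    ("Oracle Temp", "Pv2", 1500, 1500),
]

# ANF is reached over the VM's network; Pv2 disks use the VM's disk IO limit.
CHANNEL = {"ANF": "network", "Pv2": "disk"}


def throughput_by_channel(layout):
    """Sum (low, high) planned throughput per VM channel."""
    totals = defaultdict(lambda: [0, 0])
    for _component, backend, low, high in layout:
        bucket = totals[CHANNEL[backend]]
        bucket[0] += low
        bucket[1] += high
    return {channel: tuple(pair) for channel, pair in totals.items()}


if __name__ == "__main__":
    for channel, (low, high) in throughput_by_channel(LAYOUT).items():
        print(f"{channel}: {low}-{high} MiB/s")
```

This gives 3250 MiB/s on the disk channel, within the M176ds_4_v3's 4000 MBps disk limit, and 4000-7000 MiB/s on the network channel. Since the upper end exceeds the 40000 Mbps (roughly 5000 MB/s) network limit, the ANF upper bounds presumably describe per-volume peaks rather than a simultaneous load. The six 40 TiB sapdata volumes provide 240 TiB of data capacity.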
Storage Deployment Approach
Oracle Direct NFS (dNFS) supports both NFSv3 and NFSv4.1; we ultimately chose NFSv3 with dNFS. In our experience, NFSv3 has proven more reliable and robust with dNFS, and far less prone to bugs, than the newer NFSv4.1.
Azure NetApp Files application volume group for Oracle (AVG for Oracle) deploys all volumes required to install and operate Oracle databases at enterprise scale in a single, optimized step, with optimal performance and according to best practices. AVG for Oracle shortens database deployment time and ensures volume performance and stability, including through the use of multiple storage endpoints (multiple IPs).
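dNFS reads its NFS server and mount definitions from the oranfstab file, where the nfs_version parameter pins the protocol version. The sketch below renders one oranfstab entry per ANF volume pinned to NFSv3, spreading volumes across storage endpoint IPs. The server labels, IP addresses, export paths, SID, and mount points are hypothetical placeholders, and the generated text should be checked against Oracle's oranfstab documentation for your release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnfVolume:
    server: str   # label for the dNFS server entry (hypothetical)
    path_ip: str  # ANF storage endpoint serving this volume (hypothetical)
    export: str   # NFS export path of the ANF volume (hypothetical)
    mount: str    # local mount point on the database VM


def render_oranfstab(volumes, nfs_version="nfsv3"):
    """Render one oranfstab server entry per ANF volume, pinned to NFSv3."""
    entries = []
    for vol in volumes:
        entries.append("\n".join([
            f"server: {vol.server}",
            f"path: {vol.path_ip}",
            f"nfs_version: {nfs_version}",
            f"export: {vol.export} mount: {vol.mount}",
        ]))
    return "\n\n".join(entries) + "\n"


if __name__ == "__main__":
    endpoints = ["10.0.2.4", "10.0.2.5"]  # hypothetical ANF endpoint IPs
    volumes = [
        AnfVolume(
            server=f"anf-sapdata{n}",
            path_ip=endpoints[(n - 1) % len(endpoints)],
            export=f"/PRD-sapdata{n}",
            mount=f"/oracle/PRD/sapdata{n}",
        )
        for n in range(1, 7)
    ]
    print(render_oranfstab(volumes), end="")
```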
Oracle Database with Azure NetApp Files – Azure Example Scenarios (Microsoft Learn)
Understand Azure NetApp Files application volume group for Oracle (Microsoft Learn)
Distribute the Oracle data files across the sapdata volumes in round-robin fashion to avoid concentrating IO pressure on any single file system.
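Round-robin placement assigns the i-th data file to volume (i mod N) + 1, so consecutive files land on different volumes. The sketch below applies that rule to six sapdata volumes and emits ALTER TABLESPACE ... ADD DATAFILE statements. The /oracle/<SID>/sapdata<n> mount points follow the usual SAP-on-Oracle convention, but the SID, tablespace, subdirectory and file naming, and file size are hypothetical; the target subdirectories must already exist, since Oracle does not create them.

```python
def round_robin_datafiles(sid, tablespace, n_files, n_volumes=6, size_gb=32):
    """Yield (volume number, ALTER TABLESPACE statement) in round-robin order.

    File i goes to sapdata((i mod n_volumes) + 1), so IO spreads evenly
    across the volumes instead of filling one file system first.
    """
    ts = tablespace.lower()
    for i in range(n_files):
        vol = i % n_volumes + 1
        # Hypothetical file naming; adapt to your SAP/Oracle conventions.
        path = f"/oracle/{sid}/sapdata{vol}/{ts}_{i + 1}/{ts}.data{i + 1}"
        yield vol, (f"ALTER TABLESPACE {tablespace} ADD DATAFILE '{path}' "
                    f"SIZE {size_gb}G AUTOEXTEND OFF;")


if __name__ == "__main__":
    for vol, stmt in round_robin_datafiles("PRD", "PSAPSR3", 8):
        print(f"sapdata{vol}: {stmt}")
```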
High Availability Architecture
Azure offers high availability through availability zones, with a 99.99% VM SLA. Most Azure regions offer the required VM SKUs and low inter-zone latency for an active-active HA setup across zones. However, not every zone offers the required VM SKU, so check SKU availability from your own subscription by running SAP-on-Azure-Scripts-and-Utilities/Get-VM-by-Zones (Azure/SAP-on-Azure-Scripts-and-Utilities, github.com). You can identify low-latency zones by running SAP-on-Azure-Scripts-and-Utilities/AvZone-Latency-Test from the same repository. Combining the SKU availability and latency results identifies the zone pairs that can host an active-active HA deployment.
Note that each subscription may map its logical zones to different physical zones. You can look up the physical zone mapping with the Azure Resource Manager REST API Subscriptions – List Locations (Microsoft Learn).
The figure below shows the HA architecture.
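Combining the two script results is a small selection problem: keep only zones that offer the SKU, then pick the pair with the lowest measured latency. The sketch below assumes you have already transcribed the results into a set of zone IDs and a dictionary of round-trip latencies; this input format and the example numbers are hypothetical, not the scripts' actual output.

```python
from itertools import combinations


def best_zone_pair(zones_with_sku, latency_ms):
    """Lowest-latency pair of zones that both offer the required VM SKU.

    zones_with_sku: zone ids where the SKU is available (Get-VM-by-Zones).
    latency_ms: {frozenset({a, b}): round-trip ms} (AvZone-Latency-Test).
    Returns a sorted (zone_a, zone_b) tuple, or None if no pair qualifies.
    """
    candidates = [
        (latency_ms[frozenset(pair)], pair)
        for pair in combinations(sorted(zones_with_sku), 2)
        if frozenset(pair) in latency_ms
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    latency = {
        frozenset({"1", "2"}): 0.9,
        frozenset({"1", "3"}): 0.5,
        frozenset({"2", "3"}): 1.2,
    }
    print(best_zone_pair({"1", "2", "3"}, latency))
    print(best_zone_pair({"1", "2"}, latency))
```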
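Once the List Locations response has been retrieved for a subscription, comparing physical zones becomes a dictionary lookup. The sketch below assumes each location in the response carries an availabilityZoneMappings array of logicalZone/physicalZone pairs, as recent API versions return; confirm the field names against the API version you call. The sample payload and its zone values are illustrative only.

```python
import json


def zone_mappings(list_locations_json: str, region: str) -> dict:
    """Return {logical zone: physical zone} for one region of a subscription."""
    payload = json.loads(list_locations_json)
    for location in payload.get("value", []):
        if location.get("name") == region:
            return {
                m["logicalZone"]: m["physicalZone"]
                for m in location.get("availabilityZoneMappings", [])
            }
    return {}


# Trimmed, illustrative response; real responses contain more fields.
SAMPLE = json.dumps({"value": [{
    "name": "eastus",
    "availabilityZoneMappings": [
        {"logicalZone": "1", "physicalZone": "eastus-az1"},
        {"logicalZone": "2", "physicalZone": "eastus-az3"},
        {"logicalZone": "3", "physicalZone": "eastus-az2"},
    ],
}]})

if __name__ == "__main__":
    print(zone_mappings(SAMPLE, "eastus"))
```

Running the same lookup on the responses from two subscriptions shows whether their logical zone "1" refers to the same physical zone.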
Data Protection Strategy
Customers can combine ANF snapshots on the primary VM with weekly Oracle RMAN streaming backups on the HA standby. For ANF snapshots, we recommend Microsoft's Azure Application Consistent Snapshot tool (azacsnap). Both snapshots and clones complete in minutes, regardless of database size. Cloned volumes can be used for system copies, but it is critical that the production and QA VMs be in the same physical zone to ensure low latency between them.
Technically, ANF does not prevent you from mounting NFS volumes across zones, so it is important to establish operational procedures that keep each VM and its ANF storage in the same zone.
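Because nothing in ANF blocks a cross-zone mount, such an operational procedure can include a simple audit. The sketch below flags every mount whose VM and ANF volume sit in different physical zones. The inventory dictionaries and resource names are hypothetical; in practice they would be populated from your own inventory or Azure resource queries, with physical zones resolved per subscription.

```python
def cross_zone_mounts(vm_zone, volume_zone, mounts):
    """List mounts whose VM and ANF volume are in different physical zones.

    vm_zone, volume_zone: name -> physical zone (hypothetical inventory).
    mounts: iterable of (vm_name, volume_name) pairs.
    """
    return [
        (vm, vol, vm_zone[vm], volume_zone[vol])
        for vm, vol in mounts
        if vm_zone[vm] != volume_zone[vol]
    ]


if __name__ == "__main__":
    vm_zone = {"prd-db": "eastus-az1", "qa-db": "eastus-az2"}
    volume_zone = {"prd-sapdata1": "eastus-az1", "qa-sapdata1-clone": "eastus-az1"}
    mounts = [("prd-db", "prd-sapdata1"), ("qa-db", "qa-sapdata1-clone")]
    for vm, vol, vz, volz in cross_zone_mounts(vm_zone, volume_zone, mounts):
        print(f"WARNING: {vm} ({vz}) mounts {vol} ({volz}) across zones")
```

In this example, the QA VM mounting a clone that lives in the production zone is flagged.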
Backup & Snapshot Approach
Domain | Backup Component | Backup Option | Frequency | Runs On | Load on DB VM
Primary Region | DB | Snapshot (azacsnap) | Every 4 hours | HA primary VM | Low
Primary Region | DB | RMAN backup | Daily incremental, weekly full | HA standby VM | Low
Primary Region | Log | Archive log backup | Every 15 minutes | HA primary VM | Low
DR Region | DB | Oracle Data Guard | Current (continuous) | n/a | Low
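The schedule above determines how much log must be rolled forward after a snapshot restore. The sketch below computes job runs per day and, for a given failure time, the roll-forward window since the latest snapshot and the redo not yet covered by an archive log backup (which a running Data Guard standby still protects). It assumes both jobs started at the same time and always ran on schedule; the function names and example times are illustrative.

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(hours=4)       # azacsnap, from the table
LOG_BACKUP_INTERVAL = timedelta(minutes=15)  # archive log backup, from the table
DAY = timedelta(days=1)


def runs_per_day(interval: timedelta) -> int:
    return DAY // interval


def recovery_windows(schedule_start: datetime, failure_at: datetime):
    """Return (roll-forward window, redo not yet in a log backup).

    Assumes both jobs started at schedule_start and ran on time since.
    """
    elapsed = failure_at - schedule_start
    last_snapshot = schedule_start + (elapsed // SNAPSHOT_INTERVAL) * SNAPSHOT_INTERVAL
    last_log = schedule_start + (elapsed // LOG_BACKUP_INTERVAL) * LOG_BACKUP_INTERVAL
    return failure_at - last_snapshot, failure_at - last_log


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0)
    failure = datetime(2024, 1, 1, 11, 50)
    print(runs_per_day(SNAPSHOT_INTERVAL), runs_per_day(LOG_BACKUP_INTERVAL))
    print(recovery_windows(start, failure))
```

With this schedule there are 6 snapshots and 96 archive log backups per day; a failure at 11:50 means restoring the 08:00 snapshot and rolling forward 3 hours 50 minutes of logs, with at most 15 minutes of redo outside the log backups.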
Database Restore
Failure | Recovery Option | Recovery Time | Comment
DB Level | Snapshot + log roll-forward | Minutes | 1st option
DB Level | RMAN restore + log roll-forward | Hours | 2nd option
Region Wide | Oracle Data Guard | Minutes | 1st option
Region Wide | RMAN restore + log roll-forward | Hours | 2nd option
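The restore matrix can be read as an ordered fallback list per failure scope. The sketch below encodes it that way: use the first option whose prerequisite is available (a usable snapshot for DB-level failures, a Data Guard standby for region-wide failures), otherwise fall back to RMAN restore with log roll-forward. The scope keys and availability flags are hypothetical inputs, not outputs of any Azure or Oracle tool.

```python
# Ordered options per failure scope: (option, typical recovery time, prerequisite).
RESTORE_MATRIX = {
    "db": [
        ("snapshot + log roll-forward", "minutes", "snapshot"),
        ("RMAN restore + log roll-forward", "hours", None),
    ],
    "region": [
        ("Oracle Data Guard", "minutes", "standby"),
        ("RMAN restore + log roll-forward", "hours", None),
    ],
}


def choose_recovery(scope: str, available: set) -> tuple:
    """Return the first (option, recovery time) whose prerequisite is available."""
    for option, recovery_time, needs in RESTORE_MATRIX[scope]:
        if needs is None or needs in available:
            return option, recovery_time
    raise LookupError(f"no recovery option for scope {scope!r}")


if __name__ == "__main__":
    print(choose_recovery("db", {"snapshot"}))
    print(choose_recovery("region", set()))
```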
Migration Approach
Depending on the on-premises hardware, OS/DB, and SAP software levels, a migration falls into either the homogeneous or the heterogeneous category.
We will cover the heterogeneous migration approach in a separate blog, including how to reduce downtime for very large databases and the resulting benefits.
In a homogeneous migration, smaller databases can be migrated using backup and restore. Larger databases can be migrated by setting up Oracle Data Guard (ODG) replication to Azure, followed by a switchover.
Customers should run Azure Quality Check against the deployed solution to identify and address any deviations from Azure best practices.
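A back-of-the-envelope calculation shows why database size drives the choice. Backup and restore must move the whole database inside the downtime window, whereas Data Guard instantiates the standby while the source stays online and leaves only the switchover for the downtime. The sketch below estimates the raw copy time; the effective throughput and the downtime window are hypothetical inputs, and the estimate ignores compression, restore, and recovery time, so treat it as a lower bound.

```python
MIB_PER_TIB = 1024 * 1024


def copy_hours(db_size_tib: float, throughput_mib_s: float) -> float:
    """Hours needed to move db_size_tib at a sustained throughput (MiB/s)."""
    return db_size_tib * MIB_PER_TIB / throughput_mib_s / 3600


def homogeneous_method(db_size_tib, throughput_mib_s, downtime_window_h):
    """Pick backup/restore only if the raw copy fits in the downtime window."""
    if copy_hours(db_size_tib, throughput_mib_s) <= downtime_window_h:
        return "backup and restore"
    return "Oracle Data Guard"


if __name__ == "__main__":
    # Hypothetical: 1000 MiB/s effective throughput, 48-hour downtime window.
    print(f"{copy_hours(200, 1000):.1f} h to copy 200 TiB")
    print(homogeneous_method(200, 1000, 48))
    print(homogeneous_method(2, 1000, 48))
```

At an assumed 1000 MiB/s, copying 200 TiB alone takes about 58 hours, which already exceeds a 48-hour window before any restore or recovery work.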
Testing Approaches
Customers have used the Oracle Real Application Testing (RAT) option to perform real-world testing of the Oracle database. Capturing production workloads during peak periods and replaying them on Azure helps identify the required VM SKU and storage solution. The customer used Azure Monitor dashboards and RAT-generated reports to analyze the test results, which gave them the confidence to move forward with migrating the SAP on Oracle system to Azure.
RAT testing covers Oracle database performance requirements only. We highly recommend also running SAP-level volume and performance testing to ensure that end-to-end SAP processing meets or exceeds performance KPIs.
System Performance
Azure innovations such as Mv3 VMs (Intel Sapphire Rapids / DDR5), Azure Boost for higher IO and network throughput, and ANF storage with sub-millisecond latency using dNFS, combined with Oracle Advanced Compression, resulted in a 30-50% improvement in SAP processing on Azure.
Conclusion
Over the years, Azure has advanced SAP on Azure solutions by introducing new VM SKUs, storage and network solutions, and end-to-end architecture and deployment approaches, which together made it possible to deploy the largest SAP on Oracle databases to Azure. Azure now successfully hosts a 200 TB+ SAP on Oracle database!
Useful Links
Below are key SAP Notes and Microsoft documentation for a successful Azure migration:
SAP Note 2039619 – SAP Applications on Microsoft Azure using the Oracle Database: Supported Products and Versions (SAP for Me)
SAP Note 1928533 – SAP Applications on Microsoft Azure: Supported Products and Azure VM types (SAP for Me)
Oracle Azure Virtual Machines database deployment for SAP workload (Microsoft Learn)
General performance considerations for Azure NetApp Files (Microsoft Learn)
Understand Azure NetApp Files application volume group for Oracle (Microsoft Learn)
Azure Quality Check: SAP-on-Azure-Scripts-and-Utilities/QualityCheck/Readme.md (Azure/SAP-on-Azure-Scripts-and-Utilities, GitHub)
SAP Note 1672954 – Oracle 11g, 12c, 18c and 19c: Usage of hugepages on Linux (SAP for Me)
Co-Authors
Microsoft Tech Community – Latest Blogs