Azure Auto-Scaling: Setup Guide for Beginners
Learn how to set up Azure Auto-Scaling to optimize cloud resources, reduce costs, and maintain application performance effortlessly.

Azure Auto-Scaling helps you automatically adjust cloud resources based on demand, saving costs and ensuring consistent performance. Whether you're dealing with fluctuating workloads or predictable traffic patterns, Auto-Scaling ensures your applications run efficiently without manual intervention.
Key Features:
- Metric-Based Scaling: Automatically scales resources based on performance metrics like CPU usage.
- Schedule-Based Scaling: Adjusts resources based on pre-set schedules.
- Cost Control: Pay only for what you use, reducing unnecessary expenses.
- Automation: Minimises manual monitoring and resource adjustments.
Quick Setup Steps:
- Access "Autoscale" settings via Azure Monitor.
- Add Scale-Out Rules (e.g., increase resources when CPU > 70%).
- Add Scale-In Rules (e.g., reduce resources when CPU < 20%).
- Set resource limits (e.g., minimum 1, maximum 3 instances).
- Use cool-down periods to avoid frequent adjustments.
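Taken together, the steps above amount to a small configuration: a metric, a pair of thresholds, instance limits, and a cool-down. A minimal Python sketch of that configuration (illustrative only; the names are ours, not Azure's):

```python
from dataclasses import dataclass

@dataclass
class AutoscaleSetting:
    """Illustrative model of the settings configured in the steps above."""
    metric: str = "Percentage CPU"
    scale_out_threshold: float = 70.0   # add an instance above this
    scale_in_threshold: float = 20.0    # remove an instance below this
    min_instances: int = 1
    max_instances: int = 3
    cooldown_minutes: int = 5

    def is_valid(self) -> bool:
        # Thresholds must leave a gap, and limits must be ordered.
        return (self.scale_in_threshold < self.scale_out_threshold
                and 1 <= self.min_instances <= self.max_instances)

print(AutoscaleSetting().is_valid())  # True: a sane default configuration
```

The validity check mirrors a point made throughout this guide: scale-in and scale-out thresholds must not overlap.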
Additional Tips:
- Monitor key metrics like CPU usage and memory trends using Azure Monitor.
- Configure alerts to track performance and scaling events.
- Use custom metrics with Application Insights for tailored scaling rules.
Azure Auto-Scaling supports services like Virtual Machine Scale Sets, App Services, and Application Gateway, making it a versatile solution for businesses of all sizes. Start small, monitor performance, and refine your rules to optimise costs and efficiency.
Setting Up Auto-Scaling: Step-by-Step
Finding Auto-Scale Settings
To access Auto-Scale settings in the Azure portal, type "Monitor" into the portal's search bar and select "Autoscale" from the left-hand menu. You can also find scaling options directly in your resource's Settings menu.
Adding Scale-Out Rules
Scale-out rules allow your application to handle higher demand by automatically adding resources. Here's how to set them up:
- Access the Autoscale settings: go to your resource's Autoscale settings and click "Add a rule" to start configuring conditions.
- Set your metrics: define the triggers that will activate scaling based on resource usage. For example, to monitor CPU usage:
  - Select "Percentage CPU" as the metric.
  - Set the operator to "Greater than".
  - Specify a threshold of 70%.
  - Configure the action to "Increase count by 1".
This setup ensures your application scales up when demand increases.
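The rule above reduces to a single comparison. A hedged sketch of the decision it encodes (the helper is our own, not an Azure API):

```python
def scale_out_decision(cpu_percent: float, current_count: int,
                       threshold: float = 70.0, max_instances: int = 3) -> int:
    """Apply the scale-out rule from above:
    'Percentage CPU' Greater than 70 -> Increase count by 1."""
    if cpu_percent > threshold and current_count < max_instances:
        return current_count + 1
    return current_count

print(scale_out_decision(85.0, 2))  # 3: CPU above 70%, room to grow
print(scale_out_decision(55.0, 2))  # 2: below threshold, no change
print(scale_out_decision(90.0, 3))  # 3: already at the maximum
```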
Adding Scale-In Rules
Scale-in rules help save costs by reducing resources during periods of low demand. Follow these steps:
- Click "Add a rule" in the Autoscale settings.
- Select "Percentage CPU" as the metric.
- Set the operator to "Less than".
- Specify a threshold of 20%.
- Configure the action to "Decrease count by 1".
Once both scale-in and scale-out rules are in place, you can set resource limits to maintain a balance between performance and cost.
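Autoscale evaluates rules against a metric aggregated over a look-back window rather than a single sample, which is why a brief dip in load should not trigger an immediate scale-in. A sketch of that idea (the 5-sample window and helper names are our assumptions):

```python
def average_cpu(samples: list[float]) -> float:
    """Average a window of CPU readings (e.g. one per minute over 5 minutes)."""
    return sum(samples) / len(samples)

def scale_in_decision(samples: list[float], current_count: int,
                      threshold: float = 20.0, min_instances: int = 1) -> int:
    """Apply the scale-in rule ('Less than 20%' -> decrease by 1) to the average."""
    if average_cpu(samples) < threshold and current_count > min_instances:
        return current_count - 1
    return current_count

# One spike inside an otherwise quiet window still averages below 20%...
print(scale_in_decision([10, 12, 55, 11, 9], 3))   # avg 19.4 -> scale in to 2
# ...but a genuinely busy window keeps the count unchanged.
print(scale_in_decision([25, 30, 28, 27, 26], 3))  # avg 27.2 -> stays at 3
```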
Setting Resource Limits
Defining resource limits fine-tunes your scaling setup. Use the following parameters to optimise efficiency:
| Parameter | Recommended Setting | Purpose |
|---|---|---|
| Minimum Instances | 1 | Keeps the service running at all times. |
| Maximum Instances | 3 | Avoids over-scaling and unexpected costs. |
| Default Instance Count | 1 | Acts as the starting point when no rules apply. |
Additionally, include a cool-down period after scaling actions. This helps stabilise metrics and avoids frequent scaling adjustments, leading to better resource management and cost control.
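In practice, limits and cool-down periods act as a clamp plus a time guard around every scaling decision. A minimal sketch (function names are ours):

```python
def clamp_count(desired: int, minimum: int = 1, maximum: int = 3) -> int:
    """Keep a requested instance count inside the configured limits."""
    return max(minimum, min(desired, maximum))

def cooldown_active(minutes_since_last_scale: float, cooldown: float = 5.0) -> bool:
    """Suppress further scaling until the cool-down period has elapsed."""
    return minutes_since_last_scale < cooldown

print(clamp_count(5))        # 3: capped at the maximum
print(clamp_count(0))        # 1: floored at the minimum
print(cooldown_active(2.0))  # True: too soon after the last scaling action
```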
Additional Auto-Scaling Features
Application Gateway Scaling
The Azure Application Gateway v2 SKU adjusts its resources automatically within 3–5 minutes to handle traffic demands. To ensure your setup is ready for fluctuating traffic, follow these configuration recommendations:
| Configuration | Recommended Setting | Why It Matters |
|---|---|---|
| Maximum Instances | 125 | Supports scaling during high-demand periods |
| Minimum Instances | Based on current Compute Unit (CU) usage | Prepares for sudden traffic spikes |
| Compute Units per Instance | 10 | Ensures optimal performance |
| Scale Buffer | 10–20% | Provides room for unexpected traffic surges |
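These recommendations combine into a back-of-the-envelope sizing calculation: take current Compute Unit (CU) usage, add the 10–20% buffer, and divide by roughly 10 CUs per instance, rounding up. A hedged sketch of that arithmetic (the helper is ours; the constants follow the table above):

```python
import math

def min_instances_for(cu_usage: float, buffer: float = 0.15,
                      cu_per_instance: float = 10.0) -> int:
    """Estimate a minimum instance count: current CU load plus a
    10-20% headroom buffer, divided by ~10 CUs per instance."""
    return max(1, math.ceil(cu_usage * (1 + buffer) / cu_per_instance))

print(min_instances_for(40))  # 5: 40 CUs + 15% buffer = 46 -> ceil(4.6)
print(min_instances_for(8))   # 1: small workloads still need one instance
```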
Set up alerts for key metrics to monitor performance effectively. These include:
- CPU usage trends
- Number of unhealthy hosts
- Failed request rates
- Response status codes
The v2 SKU also brings several upgrades over the earlier version, such as 5× improved TLS offload performance, faster deployment times, and zone redundancy for better reliability.
"This article describes a few suggested guidelines to help you set up your Application Gateway to handle extra traffic for any high traffic volume that may occur." – Microsoft Learn
Custom Scaling Metrics
Custom metrics allow you to create scaling rules tailored to your application's specific needs, offering more control when standard metrics aren't enough.
For example, you can use Application Insights to track session counts. If sessions exceed 70 per instance, the system can scale out; if they drop below 60, it can scale in.
To set up custom metrics:
- Configure your application to send metrics to Application Insights.
- Publish metrics to the Standard or Azure.ApplicationInsights namespace.
- Set thresholds based on your application's performance needs.
Custom metrics are compatible with several Azure services, such as Virtual Machine Scale Sets, Cloud Services, App Service Web Apps, Data Explorer clusters, and API Management. By focusing on data that reflects your application's behaviour and user experience, you can make more precise scaling decisions, improving resource use and helping to manage costs effectively.
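The session example above boils down to a per-instance calculation: divide total sessions by the instance count, then compare the result against the 70 (scale-out) / 60 (scale-in) band. A sketch of that logic (helper and limit values are our assumptions):

```python
def sessions_scaling(total_sessions: int, instance_count: int,
                     out_threshold: float = 70, in_threshold: float = 60,
                     min_instances: int = 1, max_instances: int = 10) -> int:
    """Scale on a custom metric: sessions per instance, using the
    70 (out) / 60 (in) thresholds from the example above."""
    per_instance = total_sessions / instance_count
    if per_instance > out_threshold and instance_count < max_instances:
        return instance_count + 1
    if per_instance < in_threshold and instance_count > min_instances:
        return instance_count - 1
    return instance_count

print(sessions_scaling(300, 4))  # 75 sessions/instance -> scale out to 5
print(sessions_scaling(200, 4))  # 50 sessions/instance -> scale in to 3
print(sessions_scaling(260, 4))  # 65 sits inside the band -> stays at 4
```

The gap between 60 and 70 is deliberate: it leaves a dead band so that a single scaling action does not immediately trigger the opposite one.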
Performance Tracking and Adjustments
Key Metrics to Monitor
Tracking the right metrics is crucial for ensuring your auto-scaling setup runs smoothly. Azure Monitor offers detailed telemetry data to help you assess how well your scaling configuration is performing.
Here are some important metrics to keep an eye on, based on your resource type:
| Resource Type | Key Metrics |
|---|---|
| Virtual Machines | CPU usage (`\Processor(_Total)\% Processor Time`), memory usage (`\Memory\% Committed Bytes In Use`), disk utilisation (`\PhysicalDisk(_Total)\% Disk Time`) |
| App Services | CPU percentage, memory percentage, HTTP queue length |
You can list the metric definitions available for a resource using PowerShell:

```powershell
Get-AzMetricDefinition -ResourceId <resource_id>
```
Use this data to fine-tune your scaling rules and reduce unnecessary scaling fluctuations.
Adjusting Scale Settings
Performance data is your best tool for refining scaling rules. Here are some common adjustments to consider:
Preventing Rapid Scaling (Flapping):
If your scaling history shows frequent, back-and-forth adjustments, this could indicate flapping. To address this, tweak the cool-down period to give metrics time to stabilise. A good starting point is around 5 minutes, but you might need to adjust this depending on your application's needs.
Optimising Thresholds:
Set clear margins between scale-out and scale-in thresholds to avoid unnecessary scaling. For scale-out, start with conservative values and adjust as you monitor usage patterns. Similarly, ensure scale-in thresholds are sufficiently spaced from scale-out values to prevent constant adjustments.
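One way to see why the margin matters: before scaling in, estimate what the per-instance metric would become with one fewer instance; if that estimate would immediately cross the scale-out threshold, skip the scale-in. Azure's autoscale performs an estimation check along these lines to avoid flapping; the sketch below is our simplified version:

```python
def safe_to_scale_in(per_instance_load: float, instance_count: int,
                     scale_out_threshold: float = 70.0) -> bool:
    """Estimate the per-instance load after removing one instance; refuse
    the scale-in if it would immediately re-trigger a scale-out (flapping)."""
    if instance_count <= 1:
        return False
    estimated = per_instance_load * instance_count / (instance_count - 1)
    return estimated <= scale_out_threshold

# 3 instances at 40% each -> 60% across 2 instances: safe to scale in.
print(safe_to_scale_in(40.0, 3))  # True
# 3 instances at 50% each -> 75% across 2 instances: would flap, so skip.
print(safe_to_scale_in(50.0, 3))  # False
```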
Helpful Tips for Performance Monitoring:
- Enable diagnostics to collect guest OS performance counter data.
- Regularly check the "Run history" tab to spot trends and patterns in scaling behaviour.
- Keep an eye on queue lengths and network metrics like `BytesReceived` and `BytesSent` to identify and address bottlenecks effectively.
Summary
This section brings together the key points for using Azure Auto-Scaling to manage cloud resources effectively for SMBs. It helps businesses maintain strong performance without overspending.
Here’s a quick overview of the essential components for building an auto-scaling strategy:
| Component | Key Consideration | Best Practice |
|---|---|---|
| Scaling Rules | Resource metrics and thresholds | Use both scale-out and scale-in rules based on the same metric. |
| Instance Limits | Maximum and minimum values | Set clear margins between limits to avoid resource bottlenecks. |
| Monitoring | Performance metrics and alerts | Configure notifications to stay updated on scaling events. |
| Cost Control | Resource allocation | Set upper spending limits for each service to manage costs. |
These practices help ensure your scaling setup is both efficient and reliable. For workloads with predictable patterns, scheduled auto-scaling works well. For fluctuating demands, metric-based auto-scaling provides better flexibility. As Azure's documentation explains:
"The goal of cost optimising scaling is to scale up and out at the last responsible moment and to scale down and in as soon as it's practical."
When fine-tuning your auto-scaling setup, keep these tips in mind:
- Start with extra capacity to monitor and adjust scaling without risking disruptions.
- Scale gradually to avoid sudden resource changes.
- Regularly review thresholds based on updated performance data.
Azure Auto-Scaling supports a variety of services, such as Virtual Machine Scale Sets, Cloud Services, and App Service, making it a versatile tool for different business needs.
"All autoscale failures are logged to the activity log."
With alerting configured on the activity log, you can quickly identify and address any scaling issues.
FAQs
How can Azure Auto-Scaling help optimise costs while ensuring reliable application performance?
Azure Auto-Scaling helps optimise costs by automatically adjusting cloud resources to match demand. This ensures you only pay for the resources you actually use, avoiding unnecessary expenses during periods of low activity. For example, idle resources are scaled down when demand decreases, reducing waste.
At the same time, Auto-Scaling ensures reliable application performance by scaling up resources during high-demand periods. This prevents performance issues, maintains availability, and helps meet service-level agreements. By balancing cost-efficiency with performance, Auto-Scaling is an essential tool for managing cloud resources effectively.
What is the difference between metric-based scaling and schedule-based scaling in Azure Auto-Scaling?
Metric-based scaling adjusts resources automatically based on monitored metrics, such as CPU usage or memory consumption. This ensures your system responds dynamically to changes in demand.
Schedule-based scaling, on the other hand, adjusts resources at specific times or dates that you define. This is useful for predictable workloads, such as handling increased traffic during regular business hours or seasonal events.
You can also combine both approaches to create a flexible, optimised scaling strategy that balances performance and cost-efficiency effectively.
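A schedule-based rule is essentially a time-window check applied before choosing an instance count. A minimal sketch (the business-hours window and instance counts are example values, not Azure defaults):

```python
from datetime import time

def scheduled_count(now: time, business_start: time = time(8, 0),
                    business_end: time = time(18, 0),
                    busy_count: int = 3, quiet_count: int = 1) -> int:
    """Pick an instance count from a fixed schedule: more instances
    during business hours, fewer outside them."""
    if business_start <= now < business_end:
        return busy_count
    return quiet_count

print(scheduled_count(time(10, 30)))  # 3: mid-morning, business hours
print(scheduled_count(time(23, 0)))   # 1: overnight, scale down
```

Combining the two approaches means using a schedule like this to set the baseline, while metric-based rules handle deviations within each window.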
How can I stop my Azure Auto-Scaling setup from making frequent adjustments, also known as 'flapping'?
To minimise frequent scaling adjustments, or 'flapping', ensure your scale-in and scale-out rules are well-coordinated and do not conflict. Use the same metric for both scaling actions and set thresholds with enough difference to avoid triggering opposing operations too quickly.
Azure includes built-in anti-flapping measures that predict and prevent contradictory scaling actions. For added stability, configure a cooldown period between scale-in and scale-out actions to give your system time to stabilise before the next adjustment. This approach helps maintain consistent performance while avoiding unnecessary scaling changes.