5 Steps to Diagnose Azure Load Balancer Traffic Problems
Learn how to diagnose and resolve traffic problems with Azure Load Balancer in five systematic steps, ensuring optimal performance and availability.

Azure Load Balancer helps distribute traffic across virtual machines (VMs) to ensure high availability and performance. However, traffic issues such as backend health failures, misconfigurations, session persistence problems, or SNAT port exhaustion can disrupt operations. Here’s a quick guide to diagnosing and resolving these challenges:
- Step 1: Check backend pool health and configuration. Ensure VMs are healthy, listening on the correct ports, and not blocked by Network Security Groups (NSGs).
- Step 2: Review load distribution settings. Adjust session persistence modes or use hash-based distribution for better traffic balance.
- Step 3: Monitor traffic flow and session persistence. Use Azure metrics to identify uneven traffic or session-related issues.
- Step 4: Test for SNAT port exhaustion. Address outbound connectivity limits with a NAT Gateway or explicit outbound rules.
- Step 5: Run diagnostics and collect logs. Use tools like Azure Monitor and Network Watcher to identify deeper issues and prepare logs for support if needed.
These steps help maintain smooth traffic flow and prevent disruptions. Regular monitoring and proactive adjustments are key to optimising performance.
Step 1: Check Backend Pool Health and Configuration
To ensure smooth traffic distribution, it's crucial to verify that backend pool instances are properly set up and in good health. Azure Load Balancer only routes traffic to virtual machines (VMs) that are functioning correctly and passing health probes. For the Standard Load Balancer SLA, at least two healthy backend instances per pool are required. Start by checking the health status of your backend pool using the Azure portal.
Monitor Health Using Azure Portal
The Azure portal provides a clear view of the health status for each backend instance. Each instance is marked as either "Up" (healthy) or "Down" (unhealthy), along with reason codes indicating whether issues are caused by user actions or platform-related problems. To check the health of your backend pool:
- Go to your load balancer in the Azure portal.
- Select Load balancing rules.
- Under the Health status column, click View details.
- Use the Refresh button to update the status view.
If any issues are identified, address the configuration problems accordingly.
Fix Common Configuration Problems
Several common issues can impact traffic distribution. Here’s how to troubleshoot and resolve them:
- Uneven Traffic Distribution: If traffic isn't evenly distributed, disable source IP persistence by setting it to 'None'.
- VMs Not Listening on the Data Port: Ensure each backend VM is actively listening on the configured port. You can confirm this with a tool like `netstat -an` (see the sketch after this list).
- Network Security Group (NSG) Blocking Traffic: Check that your NSGs allow the necessary inbound traffic and don't block the IP address `168.63.129.16`, which is where Azure health probes originate.
- Internal Load Balancer Access Issues: For private network access, consider using separate backend pool VMs or dual-NIC VMs.
- Residual Traffic to Removed VMs: Be aware that existing TCP connections may continue until the VM is stopped or deallocated.
- VMs in a Stopped (Deallocated) State: Restart any stopped VMs to allow them to resume receiving traffic.
- HTTP Health Probes: Verify that health probes are configured to check all workload dependencies.
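If you prefer to run the listening-port and NSG checks from the command line rather than logging into each VM, the following Azure CLI sketch shows one possible approach. The resource names (myResourceGroup, myVM, myNSG) and the probed port (80) are placeholders; adjust them to your environment, and use RunPowerShellScript instead of RunShellScript for Windows VMs.

```bash
# Confirm the backend VM is listening on the data port (port 80 assumed here).
# Run Command executes the script inside the VM via the Azure agent.
az vm run-command invoke \
  --resource-group myResourceGroup \
  --name myVM \
  --command-id RunShellScript \
  --scripts "netstat -an | grep ':80 '"

# List the NSG rules attached to the backend subnet or NIC and verify that
# nothing denies inbound traffic from 168.63.129.16, the health probe source.
az network nsg rule list \
  --resource-group myResourceGroup \
  --nsg-name myNSG \
  --output table
```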
For added reliability, consider spreading your VMs across multiple availability zones. This approach can help keep your application running even if one zone encounters an outage.
Step 2: Review Load Balancer Distribution Settings
Once you've confirmed the health of your backend, it's time to focus on how your Azure Load Balancer is distributing traffic. The way traffic is routed can directly affect performance, and incorrect settings might lead to uneven traffic or session-related issues. Pay close attention to the distribution modes to identify any irregularities.
Hash-Based vs Source IP Affinity Modes
Azure Load Balancer offers three distribution modes, each tailored for specific scenarios:
- Hash-Based Mode (Five-Tuple): This method uses a combination of source IP, source port, destination IP, destination port, and protocol type to distribute traffic. Because the hash changes whenever any of these values change, successive requests from the same client can land on different backend instances, spreading the workload evenly across servers.
- Source IP Affinity Modes: These modes provide session persistence:
- Two-Tuple Mode (Client IP): Routes all traffic from the same client IP to the same backend instance.
- Three-Tuple Mode (Client IP and Protocol): Adds protocol type into the routing decision, ideal for cases like Remote Desktop Gateway or media uploads where different protocols are involved.
While hash-based distribution ensures even load spreading, source IP affinity can lead to imbalances. For example, if traffic comes through corporate proxies or NAT gateways, multiple users might appear as a single client IP, directing all their traffic to one server.
| Distribution Mode | Hash-based | Session persistence: Client IP | Session persistence: Client IP and protocol |
|---|---|---|---|
| Overview | Spreads traffic across all healthy instances | Routes traffic from the same client IP to the same backend instance | Routes traffic to the same instance based on client IP and protocol |
| Tuples | Five-tuple | Two-tuple | Three-tuple |
| Azure portal configuration | Session persistence: None | Session persistence: Client IP | Session persistence: Client IP and protocol |
| Best used for | Maximum load distribution | Applications requiring session stickiness | Multi-protocol applications needing session persistence |
Adjusting Distribution Settings
Understanding these modes is key to balancing traffic effectively. Thankfully, Azure allows you to tweak these settings without downtime, so you can make real-time adjustments. Use the Flow Distribution tab in the Load Balancer Insights dashboard to spot uneven traffic patterns and fine-tune your configuration.
Through the Azure Portal:
- Go to your load balancer resource and select Load-balancing rules under Settings.
- Pick the rule you want to modify and locate the Session persistence dropdown. Choose from:
- None for hash-based distribution.
- Client IP for two-tuple affinity.
- Client IP and protocol for three-tuple affinity.
- Save your changes.
Using PowerShell:
For automated or large-scale changes, PowerShell provides a streamlined way to update settings. The `LoadDistribution` property accepts `"Default"` for hash-based (five-tuple), `"SourceIP"` for two-tuple, and `"SourceIPProtocol"` for three-tuple.
Example:

```powershell
$lb = Get-AzLoadBalancer -Name MyLoadBalancer -ResourceGroupName MyResourceGroupLB
$lb.LoadBalancingRules[0].LoadDistribution = "Default"
Set-AzLoadBalancer -LoadBalancer $lb
```
Using Azure CLI:
The Azure CLI also supports distribution updates with the `az network lb rule update` command. Use the `--load-distribution` parameter to set your preference:
```bash
az network lb rule update \
  --lb-name myLoadBalancer \
  --load-distribution Default \
  --name myHTTPRule \
  --resource-group myResourceGroupLB
```
If traffic seems concentrated on specific backend instances, switching from source IP affinity to hash-based mode often resolves the issue. Keep in mind, though, that changes to backend pool members will redistribute client requests, potentially disrupting session persistence.
For applications where session persistence isn't critical, hash-based distribution typically delivers better performance and ensures a more balanced workload across your backend servers.
For more tips on improving your Azure setup - covering cost efficiency, security, and performance - check out Azure Optimization Tips, Costs & Best Practices.
Step 3: Examine Traffic Flow and Session Persistence
Once you've adjusted your distribution settings, it's important to take a closer look at how traffic moves through your load balancer. Even with everything configured correctly, traffic patterns can reveal potential issues early on. This step connects the configuration changes made in Step 2 with actual traffic behaviour.
Monitor Flow Distribution Metrics
Azure Load Balancer offers real-time metrics that give you a clear picture of traffic distribution. The Flow Distribution tab in Load Balancer Insights is especially helpful for spotting bottlenecks or unusual patterns. Keep an eye on key metrics like:
- Data Path Availability: Checks the health of your load balancer infrastructure every two minutes, helping you identify whether issues are related to infrastructure or your application.
- SYN Count, Byte Count, and Packet Count: Useful for monitoring overall traffic volume.
- SNAT Connection Count: Crucial for diagnosing outbound connectivity issues.
To view these metrics, go to your load balancer resource in the Azure portal and open the Metrics page. Adjust the metric views to identify any traffic imbalances. For a more comprehensive analysis, combine Data Path Availability with Health Probe Status on a single chart - this makes it easier to pinpoint whether problems stem from infrastructure or application settings. While the Standard Load Balancer SKU can support up to 1,000 backend instances, uneven traffic distribution can still occur. Filtering and grouping metrics by specific backend instances can help you identify where adjustments might be needed.
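The same metrics can be pulled from the command line if you prefer scripted checks. The sketch below assumes resource names myLoadBalancer and myResourceGroup, and uses VipAvailability and DipAvailability, the metric IDs that back Data Path Availability and Health Probe Status in Azure Monitor:

```bash
# Resolve the load balancer's resource ID, then query per-minute averages.
LB_ID=$(az network lb show \
  --name myLoadBalancer \
  --resource-group myResourceGroup \
  --query id --output tsv)

az monitor metrics list \
  --resource "$LB_ID" \
  --metric VipAvailability DipAvailability \
  --interval PT1M \
  --aggregation Average \
  --output table
```

As a rule of thumb, low VipAvailability points to an infrastructure problem, while low DipAvailability alongside healthy VipAvailability usually points back at the application or probe configuration.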
Address Session Persistence Issues
Using the traffic flow data, you can tackle session persistence challenges. If you notice problems, such as uneven traffic distribution caused by proxies or NAT gateways, consider switching your session persistence setting. For instance, changing from "Client IP" to "None" can help when proxies or gateways cause multiple users to appear as a single client IP. This adjustment ensures traffic is distributed more evenly across backend instances.
Session persistence problems, like unexpected logouts or inconsistent application behaviour, often arise when multiple users are routed to the same backend instance due to shared client IPs. By switching to "None", you can reduce these conflicts, though you may need to implement application-level session management if sticky sessions are required.
To ensure your settings work as intended, test multiple connections from the same client and verify that persistent routing behaves as expected when enabled. For applications that don't rely on session persistence, the default five-tuple hash distribution - based on source IP, source port, destination IP, destination port, and protocol type - may offer better performance by spreading traffic more evenly.
However, enabling session affinity can sometimes create load imbalances, especially when most traffic comes from a single client IP or uses the same protocol. If session persistence is essential, consider using application-level session management to maintain stability even as configurations evolve.
Step 4: Test for SNAT Port Exhaustion and Outbound Connectivity
The next step is to check for SNAT (Source Network Address Translation) port exhaustion. This issue can disrupt your application’s outbound connections, leading to failed requests and slower performance. SNAT exhaustion happens when backend instances run out of available outbound ports. Common signs include connection failures, delayed responses, and timeouts.
Monitor SNAT Port Usage
Keep track of SNAT port usage based on your Azure service setup. If you're using the Standard Load Balancer, Azure provides detailed metrics like SNAT Connection Count, Allocated SNAT Ports, and Used SNAT Ports. These diagnostics aren’t available with the Basic Load Balancer, making the Standard tier a better choice for monitoring.
Azure allocates SNAT ports by default depending on the size of your backend pool. Smaller pools get more ports per instance, whereas larger pools share the ports more thinly. Here’s the breakdown:
| Pool size (VM instances) | Default SNAT ports per instance |
|---|---|
| 1-50 | 1,024 |
| 51-100 | 512 |
| 101-200 | 256 |
| 201-400 | 128 |
| 401-800 | 64 |
| 801-1,000 | 32 |
To monitor these metrics, go to the Azure portal, access your Load Balancer resource, and check the Metrics page. Pay close attention to the Used SNAT Ports metric - if this approaches the allocated limit, you’re nearing exhaustion. Set alerts when usage hits 80% of the allocated ports to act before issues arise.
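As a rough illustration of that 80% alert, here is one way to create it with the Azure CLI. The threshold of 820 assumes the default allocation of 1,024 ports per instance from the table above; scale it to your own allocation and substitute your resource names.

```bash
LB_ID=$(az network lb show \
  --name myLoadBalancer \
  --resource-group myResourceGroup \
  --query id --output tsv)

# Fire when average Used SNAT Ports exceeds ~80% of a 1,024-port allocation,
# evaluated every minute over a 5-minute window.
az monitor metrics alert create \
  --name snat-usage-80pct \
  --resource-group myResourceGroup \
  --scopes "$LB_ID" \
  --condition "avg UsedSnatPorts > 820" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --description "Used SNAT ports above 80% of allocation"
```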
For deeper analysis, use Network Watcher’s packet capture feature to study traffic patterns and identify connections consuming the most ports. Services like Azure App Service and AKS also offer their own diagnostic tools for checking SNAT port allocation.
Once you’ve identified high usage, take immediate steps to address outbound connectivity issues.
Address Outbound Connectivity Problems
If SNAT port exhaustion occurs, implementing a NAT Gateway is one of the most effective solutions. A NAT Gateway significantly boosts port availability: each public IP provides 64,512 SNAT ports, and with up to 16 IPs you can access over a million ports. Because the gateway allocates ports on demand rather than pre-assigning a fixed block to each instance, it avoids the static per-instance limits described above.
To set up a NAT Gateway, create the resource and link it to your subnet. This ensures all outbound traffic routes through the gateway, which automatically manages port allocation and resolves exhaustion issues.
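A minimal sketch of that setup with the Azure CLI might look like the following; myNatPublicIP, myNatGateway, myVNet, and mySubnet are placeholder names:

```bash
# A Standard public IP is required for the NAT Gateway.
az network public-ip create \
  --resource-group myResourceGroup \
  --name myNatPublicIP \
  --sku Standard

az network nat gateway create \
  --resource-group myResourceGroup \
  --name myNatGateway \
  --public-ip-addresses myNatPublicIP \
  --idle-timeout 4

# Associating the gateway with the subnet routes all outbound traffic through it.
az network vnet subnet update \
  --resource-group myResourceGroup \
  --vnet-name myVNet \
  --name mySubnet \
  --nat-gateway myNatGateway
```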
Alternatively, you can configure outbound rules on your Standard Load Balancer. This lets you manually allocate SNAT ports using the formula:
Number of frontend IPs × 64,000 ÷ Number of backend instances.
Each frontend IP address provides 64,000 ports for SNAT, though each load balancing rule uses eight ports from this pool.
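If you take the outbound-rules route instead, the sketch below shows the general shape of the command. The names are placeholders, and `--outbound-ports` is intended to set the manual per-instance allocation from the formula above; verify the exact flag with `az network lb outbound-rule create --help`, as CLI parameters occasionally change.

```bash
az network lb outbound-rule create \
  --resource-group myResourceGroup \
  --lb-name myLoadBalancer \
  --name myOutboundRule \
  --frontend-ip-configs myFrontendIP \
  --address-pool myBackendPool \
  --protocol All \
  --outbound-ports 10000 \
  --idle-timeout 15
```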
Additional steps at the application level can also help. For instance, enable connection pooling to reuse existing connections instead of opening new ones for every request. This is especially useful for apps making frequent calls to the same endpoints. If your application connects to Azure services, consider using Azure Private Link instead of service endpoints. Private Link bypasses SNAT entirely by offering direct private connectivity, eliminating port usage concerns.
Lastly, note that default outbound access will no longer be available after 30th September 2025. If your setup relies on this method, plan to switch to a NAT Gateway or configure explicit outbound rules well before the deadline to avoid disruptions.
Step 5: Run Network Diagnostics and Collect Logs
After examining health, distribution, traffic flow, and SNAT usage, the next step is to conduct network diagnostics and gather logs. If earlier troubleshooting hasn't resolved the issue, this deeper analysis can uncover more complex problems or provide the necessary documentation for escalating the issue to Azure support.
Use Diagnostic Tools
Azure offers several diagnostic tools to evaluate the performance and connectivity of your Load Balancer. Network Watcher includes features like Connection Troubleshoot and VNet flow logs, which help you analyse traffic patterns and identify connectivity issues.
For real-time monitoring, enable multi-dimensional metrics in Azure Monitor. Head to the Metrics section to track key indicators such as:
- DipAvailability: Reflects backend pool health.
- ByteCount and PacketCount: Indicate traffic volume.
Azure's Resource Health feature checks your Load Balancer's availability every two minutes by assessing data path connectivity. Additionally, tools like Connection Troubleshoot and VNet flow logs provide a detailed view of traffic behaviour. You can also configure alerts for specific metrics, such as backend pool health dropping below a set threshold or unusual traffic patterns, to receive timely notifications.
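For a quick connectivity check from a backend VM through to a target address and port, Network Watcher's Connection Troubleshoot can also be driven from the CLI. The following is a sketch with placeholder names (myBackendVM, and an illustrative frontend IP of 10.0.0.4); the source VM needs the Network Watcher agent extension installed:

```bash
az network watcher test-connectivity \
  --resource-group myResourceGroup \
  --source-resource myBackendVM \
  --dest-address 10.0.0.4 \
  --dest-port 80
```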
Once you've reviewed these metrics and identified potential issues, proceed to collect logs for a deeper dive or for escalation to support.
Prepare Logs for Support
To streamline troubleshooting and support, ensure logs are collected and organised accurately. Azure Load Balancer provides health event logs, which are useful in diagnosing performance issues. These logs are available for both Standard (regional and global tiers) and Gateway Load Balancers.
Set up diagnostic settings to route resource logs to Azure Monitor Logs. Health events are published every minute during detection windows and are categorised by severity, such as Critical or Warning. To avoid excessive logging, Azure enforces a cooldown period after a health event is published.
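One way to wire this up from the CLI is sketched below. The workspace name is a placeholder (pass a full resource ID if the workspace lives in another resource group), and it is worth listing the available categories first, since exact log category names vary by SKU.

```bash
LB_ID=$(az network lb show \
  --name myLoadBalancer \
  --resource-group myResourceGroup \
  --query id --output tsv)

# Inspect which diagnostic log categories this load balancer exposes.
az monitor diagnostic-settings categories list \
  --resource "$LB_ID" \
  --output table

# Route all logs and metrics to a Log Analytics workspace.
az monitor diagnostic-settings create \
  --name lb-diagnostics \
  --resource "$LB_ID" \
  --workspace myWorkspace \
  --logs '[{"categoryGroup": "allLogs", "enabled": true}]' \
  --metrics '[{"category": "AllMetrics", "enabled": true}]'
```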
Use Log Analytics to query and filter the collected data effectively. For instance, to analyse traffic through your Load Balancer, you can run the following query in your Log Analytics workspace:
```kusto
NTANetAnalytics
| where DestLoadBalancer == '<Subscription ID>/<Resource Group name>/<Load Balancer name>'
```
Replace the placeholders with your specific Load Balancer details to review logs for inbound flows.
When preparing logs for Azure support, include the following:
- Your diagnostic settings configuration.
- Time ranges when issues occurred.
- Specific error messages or symptoms observed.
- Correlation with any recent configuration changes.
Also, export relevant log entries and metrics from both normal operation periods and times when issues were detected. Document all troubleshooting steps you’ve taken so far. This detailed information will help Azure support engineers quickly understand your environment and expedite the resolution process.
Conclusion
To effectively address Azure Load Balancer traffic issues, it’s important to follow a systematic approach. The five steps in this guide provide a structured way to identify and resolve distribution challenges, helping to prevent disruptions to your applications.
Each step plays a key role in maintaining performance. Regularly monitor the health of backend pools and ensure probe configurations are correctly set up to support smooth load balancing (Step 1). Fine-tuning distribution settings (Step 2) can enhance performance, and analysing traffic flow (Step 3) helps uncover hidden imbalances; addressing session persistence issues in the same way prevents lost sessions or erratic application behaviour. Step 4 emphasised keeping an eye on SNAT port usage to avoid outbound connectivity failures that could impact your applications.
Azure's diagnostic tools, like Azure Monitor and Log Analytics, offer invaluable insights for tackling complex troubleshooting scenarios. Proactive monitoring is essential to prevent traffic distribution issues. For example, configuring health probes ensures unhealthy instances are automatically removed, while distributing virtual machines (VMs) across multiple availability zones adds another layer of resilience. Network Security Groups can also help by controlling traffic flow effectively. Reviewing diagnostic logs regularly allows you to spot potential problems early, and ensuring your backend pool is appropriately sized means your infrastructure can handle traffic demands without overburdening any single VM.
Incorporating these steps into your monitoring routine not only helps maintain application performance but also strengthens your overall Azure strategy. For more tips on optimising Azure, check out Azure Optimisation Tips, Costs & Best Practices.
FAQs
What are the main reasons for traffic distribution issues in Azure Load Balancer, and how can you fix them?
Traffic distribution problems in Azure Load Balancer often stem from source persistence settings, which can cause traffic to stick to specific backend instances, leading to uneven routing. Other frequent culprits include health probe failures, incorrectly configured ports, or SNAT port exhaustion - all of which can result in unresponsive backend virtual machines or inefficient load balancing. Furthermore, the five-tuple hashing algorithm used for traffic distribution may contribute to imbalances if backend instances are unhealthy or improperly set up.
To address these challenges, make sure health probes are set up correctly, confirm that all backend instances are functioning as expected, and avoid using source persistence unless absolutely necessary. Keep an eye on SNAT port usage to prevent exhaustion, and review port and backend configurations to ensure everything is in order. Consistent monitoring and proactive troubleshooting are key to maintaining effective traffic distribution.
What is session persistence in Azure Load Balancer, and how should it be configured for optimal traffic distribution?
Session persistence - sometimes referred to as session affinity or sticky sessions - is a method that ensures all requests from a particular client are consistently directed to the same backend server. This approach is especially important for stateful applications where maintaining an uninterrupted session is essential.
To set up session persistence, configure source IP affinity on the load-balancing rule. This mode hashes either a two-tuple (source and destination IP) or a three-tuple (source IP, destination IP, and protocol type) so that traffic from the same client is routed to the same backend instance predictably. When configured correctly, this supports consistent session handling and a smoother user experience, though keep an eye on traffic balance, since affinity can concentrate load when many users share a single client IP (see Step 2).
How can I prevent SNAT port exhaustion and maintain reliable outbound connectivity with Azure Load Balancer?
To prevent SNAT port exhaustion and maintain reliable outbound connectivity with Azure Load Balancer, here are some practical steps you can take:
- Adjust the allocation of SNAT ports based on the size of your backend pool and how your workloads operate.
- Spread outbound traffic across multiple virtual machines to ease the load on individual ports.
- Keep an eye on SNAT port usage regularly to catch and resolve any issues before they escalate.
- Use Virtual Network NAT for improved scalability and simpler management of outbound connectivity.
By actively managing SNAT ports and incorporating tools like Virtual Network NAT, you can ensure your Azure Load Balancer performs reliably and keeps your applications running smoothly.