I remember the first time I deployed a suite of microservices into production (for a cashback company). Everything seemed to run smoothly until services started talking to each other in ways I never fully anticipated. One microservice would fail, causing a ripple effect of timeouts and errors across the entire system. The lesson? In a distributed world, every service is only as strong as its resilience strategy.

Enter Polly. Polly is the go-to .NET library for dealing with transient faults and unpredictable network issues, challenges that are magnified in microservice architectures. Whether you’re calling a payment microservice, retrieving weather data for shipping cost estimates, or connecting to any external API, Polly helps your calls cope with real-world chaos.

Visual Overview

Below is a simple conceptual diagram of how multiple microservices might interact with each other or external APIs, all protected by Polly’s resilience policies.

 ┌────────────────────┐       ┌───────────────────────┐
 │  .NET Microservice │       │ External REST Service │
 │ (e.g., Service A)  │  ---> │  (e.g., Weather API)  │
 │                    │       │                       │
 │     Polly Policies │       │  Potential Failures   │
 └────────────────────┘       └───────────────────────┘
         │
         │        ┌────────────────────┐
         │        │  .NET Microservice │
         └──────> │ (e.g., Service B)  │
                  │                    │
                  │     Polly Policies │
                  └────────────────────┘

Each microservice can independently configure Polly to handle retries, timeouts, circuit breakers, and more. That way, a failure in one service or an external API doesn’t bring down your entire ecosystem.

1. Start with the Right Building Blocks

Polly has evolved significantly and now comes in a modular v8+ form that integrates seamlessly with .NET 9 and beyond, which makes it a good fit for microservices, where you often need flexibility in how you plug into different services.

🛠️ Install the Essentials

dotnet add package Polly.Core
dotnet add package Microsoft.Extensions.Http.Resilience
  • Polly.Core: Core resilience components (retry, circuit breaker, timeout, etc.).
  • Microsoft.Extensions.Http.Resilience: Out-of-the-box integration with HttpClientFactory, making it straightforward to configure named clients.

Installing these packages lets each microservice plug into smart policies, helping the entire ecosystem handle transient errors gracefully.

2. Add Resilience to HTTP Clients Like a Pro

In a microservice architecture, it’s normal for services to make numerous HTTP calls to each other or to external APIs. Failures are inevitable: network splits, timeouts, 5xx errors, and more. Polly lets you define how each service should handle those failures.

In Program.cs, Hook Polly into an HTTP Client

builder.Services.AddHttpClient("WeatherApi")
    .AddStandardResilienceHandler(options =>
    {
        // Retry Strategy
        options.Retry.MaxRetryAttempts = 5;
        options.Retry.BackoffType = DelayBackoffType.Exponential;
        options.Retry.Delay = TimeSpan.FromSeconds(2);

        // Circuit Breaker
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
        options.CircuitBreaker.FailureRatio = 0.5;
    });

What’s Happening?

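  1. Retries up to 5 times when requests fail with transient errors (such as 500-series responses or timeouts).
  2. Exponential Backoff spreads out retries (roughly 2s, 4s, 8s…), reducing load on downstream services. Note that the standard handler also applies a total request timeout (30 seconds by default), so raise options.TotalRequestTimeout.Timeout if all five retries need room to run.
  3. A Circuit Breaker monitors failures over a 30-second window; if the failure ratio reaches 50%, it opens to block further calls temporarily (preventing a cascading meltdown).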
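
To make the backoff arithmetic concrete, here is a small sketch that computes the nominal delays for the settings above. NominalBackoff is a hypothetical helper written for this article, not a Polly API, and it assumes the plain formula baseDelay × 2^attempt without jitter (the real handler may add jitter, so actual waits vary).

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Polly): nominal exponential delays,
// baseDelay * 2^attempt for attempt = 0, 1, 2, ...
static TimeSpan[] NominalBackoff(TimeSpan baseDelay, int maxRetryAttempts) =>
    Enumerable.Range(0, maxRetryAttempts)
        .Select(attempt => TimeSpan.FromSeconds(baseDelay.TotalSeconds * Math.Pow(2, attempt)))
        .ToArray();

// Same numbers as the configuration above: 2-second base delay, 5 retries.
var delays = NominalBackoff(TimeSpan.FromSeconds(2), 5);
var totalWait = TimeSpan.FromSeconds(delays.Sum(d => d.TotalSeconds));

Console.WriteLine(string.Join(", ", delays.Select(d => $"{d.TotalSeconds}s"))); // 2s, 4s, 8s, 16s, 32s
Console.WriteLine($"Total nominal wait: {totalWait.TotalSeconds}s");            // 62s
```

The waiting time alone adds up to about a minute, before counting the time each attempt takes. That is why the overall request timeout matters when you raise the retry count.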

Microservice Use Case

  • Order Service → Payment Service: If your payment provider has a momentary glitch, a retry might resolve a temporary outage without user impact.
  • Analytics Service → External APIs: If data ingestion fails occasionally, retries can stabilize the flow and avoid partial data loss.

3. Crafting Custom Pipelines (Precision for Critical Paths)

Not all microservices have the same reliability or latency requirements. Some need more granular control (e.g., immediate failover for user authentication, cautious retries for inventory checks).

Example: A Custom Policy with Retry, Timeout, and Circuit Breaker

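using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;
using Polly.Timeout;

// Typed for HttpResponseMessage so the predicates can inspect the HTTP response.
builder.Services.AddResiliencePipeline<string, HttpResponseMessage>("critical-pipeline", pipelineBuilder =>
{
    pipelineBuilder
        // Outermost: retry failed attempts with exponential backoff (1s, 2s, 4s).
        .AddRetry(new RetryStrategyOptions<HttpResponseMessage>
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(1),
            BackoffType = DelayBackoffType.Exponential,
            ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                .Handle<HttpRequestException>()
                .Handle<TimeoutRejectedException>() // also retry attempts cut off by the timeout below
                .HandleResult(r => !r.IsSuccessStatusCode)
        })
        // Per-attempt timeout: each try gets at most 10 seconds.
        .AddTimeout(TimeSpan.FromSeconds(10))
        // Innermost: break the circuit once 30% of sampled calls fail.
        // By default it counts exceptions; set ShouldHandle to count failed status codes too.
        .AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage>
        {
            FailureRatio = 0.3,
            SamplingDuration = TimeSpan.FromSeconds(15),
            BreakDuration = TimeSpan.FromSeconds(30)
        });
});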

Access this pipeline in your microservices by injecting ResiliencePipelineProvider<string> and resolving it by its "critical-pipeline" key.
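
As a rough sketch of that lookup, the snippet below registers a small pipeline typed for HttpResponseMessage, resolves it from the provider, and runs a simulated call through it. It assumes the Polly.Extensions and Microsoft.Extensions.DependencyInjection packages are referenced. The trimmed-down retry settings and the fake downstream call (two 503 responses, then a 200) are illustrative assumptions, not the article's production configuration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.Registry;
using Polly.Retry;

var services = new ServiceCollection();

// Simplified registration for the demo: retry only, no delay between attempts.
services.AddResiliencePipeline<string, HttpResponseMessage>("critical-pipeline", b =>
    b.AddRetry(new RetryStrategyOptions<HttpResponseMessage>
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.Zero,
        ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
            .Handle<HttpRequestException>()
            .HandleResult(r => !r.IsSuccessStatusCode)
    }));

using var provider = services.BuildServiceProvider();

// In a real service you would take ResiliencePipelineProvider<string> as a constructor parameter.
var pipelines = provider.GetRequiredService<ResiliencePipelineProvider<string>>();
ResiliencePipeline<HttpResponseMessage> pipeline =
    pipelines.GetPipeline<HttpResponseMessage>("critical-pipeline");

// Simulated downstream call: fails twice with 503, then succeeds.
var attempts = 0;
var response = await pipeline.ExecuteAsync(ct =>
{
    attempts++;
    var status = attempts < 3 ? HttpStatusCode.ServiceUnavailable : HttpStatusCode.OK;
    return new ValueTask<HttpResponseMessage>(new HttpResponseMessage(status));
});

Console.WriteLine($"Status: {(int)response.StatusCode}, attempts: {attempts}");
```

In an actual microservice the lambda would wrap an HttpClient call, with the cancellation token passed through so the timeout strategy can cancel it.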

Key Concepts

  • ShouldHandle decides which scenarios to retry (e.g., a 500 server error or HttpRequestException).
  • Timeout: Prevents any request from hanging indefinitely, which is crucial in a microservice chain where a single stalled request can cascade into bigger issues.
  • Circuit Breaker: Protects your service from continuous failures that might overwhelm both the caller and the callee.

4. Advanced Resilience: Hedging & Fallbacks

If standard retries aren’t enough (for instance, if you have multi-regional endpoints or partial data that you can serve as a fallback), Polly also supports more advanced patterns.

Hedging: Race Two Requests, Pick the Fastest

builder.Services.AddHttpClient("HedgedWeatherClient")
    .AddStandardHedgingHandler();  // Sends an extra request if the first is slow or fails; the first success wins

Microservice Example

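  • If you host weather or user profile data in multiple regions, hedging can send requests to more than one region (once you configure routing for the handler) and return whichever responds first. By default the extra request is only launched when the first one has not answered within the hedging delay, so healthy calls don’t double your traffic.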
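
To illustrate the idea outside of HttpClient, here is a minimal sketch using the hedging strategy from Polly.Core directly. FetchFromRegionAsync, the region names, and the latencies are hypothetical stand-ins for real regional endpoints; the hedging delay of 200 ms is also just an assumption for the demo.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Hedging;

// Hypothetical stand-in for a regional endpoint: waits for `latency`, then answers.
static async Task<string> FetchFromRegionAsync(string region, TimeSpan latency, CancellationToken ct)
{
    await Task.Delay(latency, ct);
    return $"forecast from {region}";
}

var pipeline = new ResiliencePipelineBuilder<string>()
    .AddHedging(new HedgingStrategyOptions<string>
    {
        MaxHedgedAttempts = 1,                  // at most one extra request
        Delay = TimeSpan.FromMilliseconds(200), // how long to wait before hedging
        // The hedged attempt targets the (simulated) secondary region.
        ActionGenerator = args => async () =>
        {
            try
            {
                var value = await FetchFromRegionAsync(
                    "secondary", TimeSpan.FromMilliseconds(20), args.ActionContext.CancellationToken);
                return Outcome.FromResult(value);
            }
            catch (Exception ex)
            {
                return Outcome.FromException<string>(ex);
            }
        }
    })
    .Build();

// The primary region is simulated as very slow (5 seconds).
var result = await pipeline.ExecuteAsync(
    async ct => await FetchFromRegionAsync("primary", TimeSpan.FromSeconds(5), ct));

Console.WriteLine(result);
```

With these simulated latencies the slow primary should lose the race, so the caller gets the secondary region's answer after roughly the hedging delay instead of waiting five seconds.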

Fallback: Provide a Default Response if Everything Fails

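// Needs: using System.Net; using Polly; using Polly.Fallback;
var fallback = new ResiliencePipelineBuilder<HttpResponseMessage>()
    .AddFallback(new FallbackStrategyOptions<HttpResponseMessage>
    {
        // Trigger the fallback on a 500 response. As written, exceptions and other
        // status codes pass through; add .Handle<HttpRequestException>() to cover those.
        ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
            .HandleResult(r => r.StatusCode == HttpStatusCode.InternalServerError),
        // Substitute a friendly response instead of surfacing the failure.
        FallbackAction = _ =>
            Outcome.FromResultAsValueTask(new HttpResponseMessage(HttpStatusCode.OK)
            {
                Content = new StringContent("Service unavailable, showing cached data.")
            })
    })
    .Build();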
  • Fallback lets your microservice return a graceful response, such as cached data, rather than just failing outright.
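
Here is a small sketch of a fallback in action, this time returning a plain string so the result is easy to inspect. The cachedForecast value and the failing downstream call are hypothetical; in a real service the cached value would come from your own cache or store.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.Fallback;

// Hypothetical last-known-good value (would normally come from a cache).
var cachedForecast = "cached: sunny, 21°C";

var pipeline = new ResiliencePipelineBuilder<string>()
    .AddFallback(new FallbackStrategyOptions<string>
    {
        // Fall back when the downstream call throws an HTTP-level exception.
        ShouldHandle = new PredicateBuilder<string>().Handle<HttpRequestException>(),
        FallbackAction = _ => Outcome.FromResultAsValueTask(cachedForecast)
    })
    .Build();

// Simulated downstream call that always fails.
var forecast = await pipeline.ExecuteAsync<string>(async ct =>
{
    await Task.Yield();
    throw new HttpRequestException("weather service unreachable");
});

Console.WriteLine(forecast);
```

The caller never sees the exception; it receives the cached value instead. Whether stale data is acceptable is a per-endpoint decision, so it is worth signalling to the client that the response is degraded.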

5. Best Practices: Building for Real-World Chaos

A few additional tips to make sure your microservices remain robust in production:

Idempotency ✅

Only retry operations that are safe. For instance, you don’t want to double-charge a user by retrying a payment without proper safeguards. Design your microservices so that critical operations either handle duplicates safely or use unique transaction IDs to prevent unwanted side effects.
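
One way to picture the transaction-ID safeguard is the sketch below. It is plain C# with no Polly involved; the Charge function, the in-memory dictionary, and the ID format are all hypothetical, and a real payment service would keep this record in durable storage and guard it against concurrent requests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory record of processed transactions (transactionId -> receipt).
var processed = new Dictionary<string, string>();
var chargesMade = 0;

// Charges at most once per transaction ID; a retried call gets the original receipt back.
string Charge(string transactionId, decimal amount)
{
    if (processed.TryGetValue(transactionId, out var existingReceipt))
    {
        return existingReceipt; // duplicate (for example a retry): no second charge
    }

    chargesMade++; // stands in for the real call to the payment provider
    var receipt = $"receipt-{transactionId}";
    processed[transactionId] = receipt;
    return receipt;
}

var first = Charge("order-1001", 49.90m);
var retried = Charge("order-1001", 49.90m); // same ID, as a retry would send

Console.WriteLine($"Charges made: {chargesMade}");             // Charges made: 1
Console.WriteLine($"Same receipt: {first == retried}");        // Same receipt: True
```

With a safeguard like this on the receiving side, retrying a payment request becomes safe because the second attempt is recognized as a duplicate.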

Telemetry 📊

Hook Polly’s resilience data into your observability system, whether that’s Serilog, Splunk, or Application Insights. Monitoring and alerting on retries and circuit-breaker activations helps you spot problems quickly in a distributed environment.
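
As one simple starting point, the retry strategy exposes an OnRetry callback that you can use to record each retry. The sketch below collects events in a list purely for illustration; in a real service you would forward them to your logger or metrics pipeline instead. The failing call and its message are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;

// Stand-in for a logger or metrics sink.
var retryEvents = new List<string>();

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 2,
        Delay = TimeSpan.Zero, // no waiting, to keep the demo fast
        ShouldHandle = new PredicateBuilder().Handle<InvalidOperationException>(),
        OnRetry = args =>
        {
            // AttemptNumber is zero-based: 0 is the first retry.
            retryEvents.Add($"retry {args.AttemptNumber}: {args.Outcome.Exception?.Message}");
            return default;
        }
    })
    .Build();

try
{
    // Simulated downstream call that always fails.
    await pipeline.ExecuteAsync(async ct =>
    {
        await Task.Yield();
        throw new InvalidOperationException("downstream unavailable");
    });
}
catch (InvalidOperationException)
{
    // All retries exhausted; the original exception surfaces to the caller.
}

foreach (var line in retryEvents)
{
    Console.WriteLine(line);
}
```

Counting these events per dependency gives you an early warning signal: a rising retry rate usually shows up before the circuit breaker opens.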

Add Jitter to Avoid Retry Storms 🎲

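// A custom DelayGenerator takes over from the Delay/BackoffType settings.
options.Retry.DelayGenerator = args =>
{
    // AttemptNumber is zero-based, so the base delays are 1s, 2s, 4s, ...
    var baseDelay = TimeSpan.FromSeconds(Math.Pow(2, args.AttemptNumber));
    // Add up to 500 ms of random jitter so clients don't retry in lockstep.
    var jitter = TimeSpan.FromMilliseconds(Random.Shared.Next(0, 500));
    return new ValueTask<TimeSpan?>(baseDelay + jitter);
};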

Jitter helps microservices avoid synchronized retry storms, which can overwhelm a failing service and make outages even worse.

6. Test Like a Scientist

Polly is very test-friendly, allowing you to simulate different failure modes in each microservice before going live.

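// Requires the Polly.Testing package: dotnet add package Polly.Testing
using Polly;
using Polly.Retry;
using Polly.Testing;

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions()) // default retry settings
    .Build();

// GetPipelineDescriptor() is an extension method from Polly.Testing.
var descriptor = pipeline.GetPipelineDescriptor();
Console.WriteLine($"Configured strategies: {string.Join(", ", descriptor.Strategies.Select(s => s.Options?.Name))}");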

Check the configured strategies and verify retry logic, timeouts, and fallbacks, so each microservice is prepared for real-world stresses.
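
Beyond inspecting configuration, you can also simulate a failure mode and check how the pipeline reacts. The sketch below simulates a stalled dependency and verifies that a timeout strategy cuts it off. The 1-second timeout and the 30-second fake call are arbitrary values chosen for the demo; in a test project you would replace the final Console.WriteLine with an assertion from your test framework.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Timeout;

var pipeline = new ResiliencePipelineBuilder()
    .AddTimeout(TimeSpan.FromSeconds(1))
    .Build();

var timedOut = false;
try
{
    // Simulated stalled dependency: would take 30 seconds if left alone.
    // Passing the token through lets the timeout strategy cancel the call.
    await pipeline.ExecuteAsync(async ct => await Task.Delay(TimeSpan.FromSeconds(30), ct));
}
catch (TimeoutRejectedException)
{
    timedOut = true;
}

Console.WriteLine($"Timed out: {timedOut}");
```

The same pattern works for other strategies: feed the pipeline a delegate that misbehaves in a controlled way, then assert on the exception, the result, or the number of attempts.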


Summary: Why Polly Makes Your Microservices Bulletproof

Polly is more than just a retry helper: it’s a toolkit for resilience in a microservice architecture. By mixing and matching Polly’s features, you help each microservice stand on its own when dealing with network hiccups or external service outages.

| Feature | Benefit |
| --- | --- |
| Retry | Recover from transient faults automatically |
| Timeout | Prevent indefinite waiting |
| Circuit Breaker | Halt cascading failures and protect resources |
| Fallback | Deliver a graceful user experience when a dependency fails |
| Hedging | Get the fastest response from multiple sources |

Need help integrating Polly in your microservices or simulating edge cases? Feel free to reach out. Resilience is a journey, especially in complex distributed systems, and Polly helps your services keep running even when the world doesn’t cooperate.

Ready to make your microservices truly bulletproof? Let’s do it together!