Grain Resilience

Polly v8 retry, circuit-breaker, and timeout patterns for grain calls.

Retry, circuit-breaker, and timeout strategies for Orleans grain calls, powered by Polly v8.

  • How to retry transient grain failures automatically
  • How to add per-call timeouts to grain calls
  • How to protect a downstream service with a circuit breaker
  • How to compose multiple strategies into a single reusable pipeline

GrainResilience is a thin F#-idiomatic wrapper around Polly v8. It lets you wrap any grain call in a resilience pipeline without touching the grain implementation.

open System
open Orleans.FSharp

// Retry a grain call up to 3 times with a 500 ms delay between attempts
let! result =
    GrainResilience.retry<string> 3 (TimeSpan.FromMilliseconds 500) (fun () ->
        grain.HandleMessage(FetchData id))

The call helpers (retry, withTimeout, execute) take a function returning Task<'T> and return Task<'T>, so they slot directly into ordinary task { } expressions.


open System
open Orleans.FSharp

// 1. Simple retry: 3 extra attempts with a 200 ms delay
let! inventory =
    GrainResilience.retry<int> 3 (TimeSpan.FromMilliseconds 200) (fun () ->
        inventoryGrain.HandleMessage(GetStock itemId))

// 2. Per-call timeout: fail fast if the grain takes > 2 seconds
let! price =
    GrainResilience.withTimeout<decimal> (TimeSpan.FromSeconds 2) (fun () ->
        pricingGrain.HandleMessage(GetPrice itemId))

// 3. Full options: timeout + circuit breaker + retry
let opts =
    { GrainResilience.defaultOptions with
        MaxRetryAttempts = 3
        RetryDelay = TimeSpan.FromMilliseconds 100
        Timeout = Some(TimeSpan.FromSeconds 5)
        CircuitBreakerThreshold = Some 5
        CircuitBreakerDuration = Some(TimeSpan.FromSeconds 30) }

let! order =
    GrainResilience.execute<OrderResult> opts (fun () ->
        orderGrain.HandleMessage(PlaceOrder cart))

ResilienceOptions — configuration record

type ResilienceOptions =
    {
        /// Maximum number of retry attempts after the initial call. Default: 3
        MaxRetryAttempts: int
        /// Delay between retries. Default: 1 second
        RetryDelay: TimeSpan
        /// Open the circuit after this many consecutive failures. None = disabled.
        CircuitBreakerThreshold: int option
        /// How long the circuit stays open before attempting a probe call. Default: 30 s (used when None).
        CircuitBreakerDuration: TimeSpan option
        /// Overall deadline for the call, including all retries. None = no timeout.
        Timeout: TimeSpan option
    }

let defaultOptions: ResilienceOptions =
    {
        MaxRetryAttempts = 3
        RetryDelay = TimeSpan.FromSeconds 1
        CircuitBreakerThreshold = None
        CircuitBreakerDuration = None
        Timeout = None
    }

GrainResilience.retry runs the grain call and, if it throws, retries it up to maxAttempts times, waiting delay between attempts, before giving up.

val retry<'T>
    : maxAttempts : int
    -> delay : TimeSpan
    -> f : (unit -> Task<'T>)
    -> Task<'T>

let! count =
    GrainResilience.retry<int> 5 (TimeSpan.FromSeconds 1) (fun () ->
        counterGrain.HandleMessage(Increment))
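
To make the attempt counting concrete, here is a minimal sketch of the same contract written against the .NET base class library only. It is not the implementation used by GrainResilience (which delegates to Polly); retrySketch is a hypothetical name, and the sketch assumes that maxAttempts counts retries after the initial call and that every exception is treated as transient.

```fsharp
open System
open System.Threading.Tasks

/// Hypothetical stand-in for the retry contract (not the library's code):
/// run f once, then retry up to maxAttempts more times,
/// pausing for `delay` before each retry.
let retrySketch (maxAttempts: int) (delay: TimeSpan) (f: unit -> Task<'T>) : Task<'T> =
    let rec loop (retriesLeft: int) : Task<'T> =
        task {
            try
                return! f ()
            with _ when retriesLeft > 0 ->
                // Assumption: every exception counts as transient.
                do! Task.Delay(delay)
                return! loop (retriesLeft - 1)
        }
    loop maxAttempts
```

Under these assumptions a call that always fails is invoked maxAttempts + 1 times in total, and the last exception propagates to the caller.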

GrainResilience.withTimeout enforces a hard deadline on a single grain call. It throws Polly.Timeout.TimeoutRejectedException when the deadline is exceeded.

val withTimeout<'T>
    : timeout : TimeSpan
    -> f : (unit -> Task<'T>)
    -> Task<'T>

try
    let! snapshot =
        GrainResilience.withTimeout<Snapshot> (TimeSpan.FromSeconds 3) (fun () ->
            snapshotGrain.HandleMessage(CreateSnapshot))
    processSnapshot snapshot
with :? Polly.Timeout.TimeoutRejectedException ->
    log.Warning("Snapshot timed out, skipping")

GrainResilience.execute is the full-options entry point. It composes timeout, circuit breaker, and retry in one call.

val execute<'T>
    : options : ResilienceOptions
    -> f : (unit -> Task<'T>)
    -> Task<'T>

let myOpts =
    { GrainResilience.defaultOptions with
        MaxRetryAttempts = 2
        Timeout = Some(TimeSpan.FromSeconds 10) }

let! result = GrainResilience.execute<string> myOpts (fun () -> grain.HandleMessage cmd)

GrainResilience.buildPipeline creates a reusable ResiliencePipeline<'T> from options. It is useful when you want to share one pipeline across many calls.

val buildPipeline<'T> : options : ResilienceOptions -> ResiliencePipeline<'T>

let pipeline = GrainResilience.buildPipeline<int> myOpts

// Reuse the same pipeline object many times
let! r1 = pipeline.ExecuteAsync(fun _ -> ValueTask<int>(grain1.HandleMessage cmd)).AsTask()
let! r2 = pipeline.ExecuteAsync(fun _ -> ValueTask<int>(grain2.HandleMessage cmd)).AsTask()

GrainResilience.circuitBreaker creates a standalone, non-generic ResiliencePipeline backed only by a circuit breaker. Because the circuit state is held inside the returned object, keep it as a long-lived value (e.g., a let binding at the service scope) rather than creating it per call.

val circuitBreaker
    : threshold : int
    -> breakDuration : TimeSpan
    -> ResiliencePipeline

// Open after 5 failures; stay open for 30 seconds
let private cb = GrainResilience.circuitBreaker 5 (TimeSpan.FromSeconds 30)

member _.CallExternalService(cmd) =
    task {
        try
            return! cb.ExecuteAsync(fun _ -> ValueTask<_>(grain.HandleMessage cmd)).AsTask()
        with :? Polly.CircuitBreaker.BrokenCircuitException ->
            return Error "Service unavailable"
    }
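
Why the breaker has to be long-lived is easier to see with a toy model. The sketch below is a hypothetical consecutive-failure breaker written with the base class library only. It is not how Polly implements its circuit breaker, it omits the half-open probe state, and BreakerSketch and its members are illustrative names.

```fsharp
open System
open System.Threading.Tasks

/// Toy consecutive-failure breaker. All state lives in the instance's fields.
type BreakerSketch(threshold: int, breakDuration: TimeSpan) =
    let gate = obj ()
    let mutable consecutiveFailures = 0
    let mutable openUntil = DateTime.MinValue

    /// True while the break period is still running.
    member _.IsOpen = lock gate (fun () -> DateTime.UtcNow < openUntil)

    member this.Execute(f: unit -> Task<'T>) : Task<'T> =
        task {
            if this.IsOpen then
                // Short-circuit: the wrapped call is never invoked.
                raise (InvalidOperationException("circuit is open"))
            try
                let! value = f ()
                // A success resets the failure streak.
                lock gate (fun () -> consecutiveFailures <- 0)
                return value
            with ex ->
                lock gate (fun () ->
                    consecutiveFailures <- consecutiveFailures + 1
                    if consecutiveFailures >= threshold then
                        openUntil <- DateTime.UtcNow + breakDuration
                        consecutiveFailures <- 0)
                // Propagate the original failure to the caller.
                return raise ex
        }
```

In this model, a breaker constructed inside each call would start with a failure count of zero every time and could never reach its threshold, which is the reason for holding a single shared instance.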

When you use execute with multiple strategies enabled, they are layered outer → inner:

request
  → [Timeout]          ← outermost; cancels everything inside if the deadline is exceeded
    → [CircuitBreaker] ← trips on consecutive failures; short-circuits when open
      → [Retry]        ← innermost; retries on exceptions
        → grain call

This means:

  • The timeout applies to the total time including all retries.
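  • The circuit breaker counts a call as failed only after the retry strategy has given up, so a call that succeeds on a retry does not count toward the threshold.
  • TimeoutRejectedException and BrokenCircuitException are raised by the outer layers, so the retry strategy never sees them and they are not retried.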
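
Because the timeout wraps the retries, it is worth checking that the retry budget fits inside the deadline. The helper below is a hypothetical back-of-the-envelope calculation, not part of GrainResilience. The 3 retries, 100 ms delay, and 5 s timeout mirror the full-options example, while the 2 s per-attempt figure is an assumed value chosen only for illustration.

```fsharp
open System

/// Upper bound on the time the retry layer can consume:
/// (retries + 1) attempts of up to perAttempt each, plus one delay before each retry.
let worstCaseRetryTime (maxRetryAttempts: int) (retryDelay: TimeSpan) (perAttempt: TimeSpan) : TimeSpan =
    let attempts = int64 maxRetryAttempts + 1L
    TimeSpan.FromTicks(perAttempt.Ticks * attempts + retryDelay.Ticks * int64 maxRetryAttempts)

// 3 retries, 100 ms delay, and an assumed 2 s per failing attempt.
let needed =
    worstCaseRetryTime 3 (TimeSpan.FromMilliseconds 100.0) (TimeSpan.FromSeconds 2.0)

let timeout = TimeSpan.FromSeconds 5.0

// needed = 4 * 2 s + 3 * 100 ms = 8.3 s, which exceeds the 5 s timeout,
// so the outer timeout would fire before the later retries could complete.
let retriesCanFinish = needed <= timeout
```

When retriesCanFinish is false, either the timeout is too tight for the configured retries or the retry count is higher than the deadline can accommodate.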

let! data =
    GrainResilience.retry<Data> 3 (TimeSpan.FromMilliseconds 200) (fun () ->
        dataGrain.HandleMessage(Fetch key))

let! result =
    GrainResilience.withTimeout<Result> (TimeSpan.FromSeconds 2) (fun () ->
        slowGrain.HandleMessage query)

// Service-scoped: shared circuit state across all calls
let cb = GrainResilience.circuitBreaker 10 (TimeSpan.FromMinutes 1)

member _.Query(cmd) =
    cb.ExecuteAsync(fun _ -> ValueTask<_>(grain.HandleMessage cmd)).AsTask()

let productionOpts =
    { MaxRetryAttempts = 3
      RetryDelay = TimeSpan.FromMilliseconds 500
      CircuitBreakerThreshold = Some 10
      CircuitBreakerDuration = Some(TimeSpan.FromSeconds 60)
      Timeout = Some(TimeSpan.FromSeconds 15) }

let! response =
    GrainResilience.execute<ApiResponse> productionOpts (fun () ->
        apiGrain.HandleMessage(ApiRequest payload))

let! price =
    GrainResilience.retry<decimal> 3 TimeSpan.Zero (fun () ->
        FSharpGrain.ask<PricingState, PricingCommand, decimal> (GetPrice itemId) pricingGrain)

In unit tests, you can drive failures by throwing exceptions inside a closure without touching any real grain:

let mutable attempts = 0

let! result =
    GrainResilience.retry<int> 3 TimeSpan.Zero (fun () ->
        task {
            attempts <- attempts + 1
            if attempts < 3 then failwith "transient"
            return 42
        })

test <@ result = 42 @>
test <@ attempts = 3 @>
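
If several tests need the same kind of failing closure, it can be factored into a small helper. The following is a sketch of one possible shape; flaky is a hypothetical name, not part of Orleans.FSharp, and it assumes the closure is invoked sequentially (as a retry loop would do), so the counter is not made thread-safe.

```fsharp
open System.Threading.Tasks

/// Hypothetical test helper: returns a closure that throws on its first
/// `failures` invocations and then yields `value`, together with a function
/// that reports how many times the closure has been invoked so far.
let flaky (failures: int) (value: 'T) : (unit -> Task<'T>) * (unit -> int) =
    let calls = ref 0
    let call () : Task<'T> =
        task {
            calls.Value <- calls.Value + 1
            if calls.Value <= failures then
                failwithf "transient failure #%d" calls.Value
            return value
        }
    call, (fun () -> calls.Value)
```

A test could then bind let call, callCount = flaky 2 42, pass call as the function argument of the helper under test, and assert on both the returned value and callCount ().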

For integration tests with a real TestCluster, use a grain that tracks its own call count and fails on the first N calls — see FlakyGrain in tests/Orleans.FSharp.Integration/ClusterFixture.fs.


GrainResilience is in the Orleans.FSharp core package. It adds a dependency on Polly 8.x, which is already bundled — no additional NuGet reference is required.