10 Hidden Memory Leaks in ASP.NET Core Applications in 2026 — Real Causes, Before/After Code, Production Metrics
10 hidden ASP.NET Core memory leaks with before/after code — singletons, MemoryCache, EF tracker, LOH. Mattrx fleet RAM 12.6 GB → 2.3 GB.
- Author
- Randhir Jassal
ASP.NET Core has a GC. Memory "leaks" are supposed to be impossible. Right?
Then you check Azure metrics and your API has been climbing from 280 MB → 2.1 GB over four days, gets OOM-killed at 06:14 AM, auto-restarts, and starts climbing again. Six instances. Every four days. For months. Nobody can reproduce it locally because local sessions are too short.
In .NET, a "leak" almost never means "memory the runtime can't reclaim." It means objects rooted by something you forgot about: a static field, an event handler, a cache without an eviction policy, a singleton clinging to a scoped service, a Task that never completes. The runtime is doing its job. You are holding the reference.
This guide is the catalog of the 10 hidden ones we actually found and fixed at Mattrx, a multi-tenant marketing analytics SaaS (Angular 19 + .NET 9, 6 App Service instances, 110k MAU, ~95k LOC C#). Each leak has the before code that looks innocent, the after code that fixes it, a diagram for the gnarly ones, the symptom you'd see in Application Insights, and the exact dotnet-counters / dotnet-gcdump command that surfaces it. Plus the aggregate before/after: fleet memory 12.6 GB → 2.3 GB, zero OOM restarts in 90 days.
TL;DR
The 10 hidden leaks, in rough order of how much memory they were holding at Mattrx:
| # | Leak | Hidden because | Mattrx wasted RAM |
|---|---|---|---|
| 1 | Singleton capturing a scoped service (DbContext) | DI compiles fine; works for 5 min; fails at hour 4 | ~340 MB / instance |
| 2 | Static event handlers from request-scoped objects | Event subscription looks local; receiver is static | ~210 MB / instance |
| 3 | MemoryCache with no SizeLimit | "Sliding expiration" feels safe but only triggers on access | ~480 MB / instance |
| 4 | EF Core DbContext change-tracker bloat (long-lived ctx, no AsNoTracking) | Queries return fine; tracker quietly retains thousands of entities | ~310 MB / instance |
| 5 | Fire-and-forget Task.Run capturing HttpContext / scoped services | Compiles, returns 200, never logs anything wrong | ~95 MB / instance |
| 6 | HttpClient socket exhaustion / handler retention from new HttpClient() | The C# is fine, but SocketsHttpHandler lifetime is wrong | ~80 MB + port exhaustion |
| 7 | SignalR connection / group dictionaries never cleaned on disconnect | "It works in dev" — dev sessions are short | ~150 MB / instance |
| 8 | Log scope with large structured objects (BeginScope(huge)) | Logging looks free; scope holds a graph for the request's life | ~110 MB / instance |
| 9 | Long-running IHostedService without CancellationToken propagation | The host stops; your loop doesn't; objects can't be collected | ~60 MB / instance |
| 10 | Large Object Heap (LOH) fragmentation from string concat / byte arrays | LOH > 85 KB, never compacted by default; fragments under load | ~120 MB / instance + GC pauses |
Aggregate Mattrx wins after fixing all 10 (3-week audit):
- Working set per instance: climbing to ~2.1 GB → steady 380 MB (−82%)
- Total fleet RAM (6 instances): 12.6 GB → 2.3 GB
- OOM-driven restarts: 8/week → 0 in 90 days
- GC Gen2 pause time (p99): 240 ms → 35 ms
- % Time in GC: 18% → 4%
- Allocation rate: 480 MB/sec → 90 MB/sec under peak load
- App Service tier: could downgrade from P2v3 → P1v3 — saving ~$420/month on the fleet
- Tail-latency p99 on /api/campaigns: 820 ms → 180 ms (a side effect of fewer GC pauses)
The architecture didn't change. Six instances, same code shape, same SQL. The leaks were boring. The wins were not.
1. The mental model — what "leak" actually means in .NET
A .NET leak is not "memory the GC can't reclaim." It's "memory you've made unreclaimable by holding a reference."
The GC frees anything that has no path of references back to a GC root. The four roots are:
GC ROOTS — anything reachable from these is "alive":
┌────────────────────────────────────────────┐
│ 1. Static fields │
│ 2. Stack variables of running threads │
│ 3. CPU registers │
│ 4. GCHandles (strong / pinned; weak handles do not root) │
└────────────────────────────────────────────┘
Object → field → field → field → ROOT
↑
If any path reaches a root,
the object lives forever.
Every leak in this guide is one of those four roots holding something it shouldn't. The trick is they don't look like roots — they look like a MemoryCache, a HttpClient, a Task.Run, a singleton service. The reference chain to the root is hidden by a layer of indirection.
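To make the root idea concrete, here is a minimal sketch (not Mattrx code; Registry and RootDemo are hypothetical names) that allocates two identical buffers, parks one in a static list, and then forces a collection. A WeakReference lets us observe each buffer without keeping it alive.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for "a static field you forgot about".
public static class Registry
{
    public static readonly List<object> Items = new();
}

public static class RootDemo
{
    // NoInlining keeps the local 'payload' out of the caller's frame, so the only
    // strong reference left after this method returns is the static list (if any).
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Allocate(bool rootIt)
    {
        var payload = new byte[1024];
        if (rootIt) Registry.Items.Add(payload);  // static field -> list -> payload
        return new WeakReference(payload);         // weak: does not keep payload alive
    }

    public static (bool RootedAlive, bool UnrootedAlive) Run()
    {
        WeakReference rooted = Allocate(rootIt: true);
        WeakReference unrooted = Allocate(rootIt: false);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return (rooted.IsAlive, unrooted.IsAlive);
    }
}
```

RootDemo.Run() reports the rooted buffer as alive because Registry.Items is reachable from a static field. The unrooted buffer is normally collected in an optimized build; a debugger or unoptimized JIT can extend lifetimes, so treat that half as typical rather than guaranteed.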
The detection arsenal:
| Tool | What it shows | When to reach for it |
|---|---|---|
| dotnet-counters monitor -p <pid> System.Runtime | Live heap size, allocation rate, % time in GC, gen sizes | First diagnostic; runs in prod |
| dotnet-gcdump collect -p <pid> | Snapshot of all live objects + retention graph | Find which type is bloated |
| dotnet-dump collect -p <pid> + WinDbg / dotnet-dump analyze | Full process dump | The hard cases |
| PerfView | Allocation traces over time | Find the allocator, not just the holder |
| Visual Studio Diagnostic Tools | Heap snapshots side-by-side | Compare "before request" and "after 1,000 requests" |
The shape of a leak hunt:
Step 1: dotnet-counters → confirm working set is growing
Step 2: dotnet-gcdump twice (5 min apart) → diff which types grew
Step 3: Open the dump → find the offending type
Step 4: Look at "Path to GC root" — which static field / event / cache holds it
Step 5: Apply one of the 10 fixes below
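When attaching dotnet-counters is awkward, roughly the same Step 1 signals can be read in-process. A minimal sketch, assuming you log or expose the result yourself (MemorySnapshot is a hypothetical helper, not part of any library):

```csharp
using System;

public static class MemorySnapshot
{
    private const long MB = 1024 * 1024;

    // Returns a one-line summary suitable for a periodic log entry
    // or a locked-down diagnostics endpoint.
    public static string Describe()
    {
        // Describes the most recent GC; values are zero until a collection has run.
        GCMemoryInfo info = GC.GetGCMemoryInfo();

        return $"workingSetMB={Environment.WorkingSet / MB} " +
               $"managedHeapMB={GC.GetTotalMemory(forceFullCollection: false) / MB} " +
               $"heapAtLastGcMB={info.HeapSizeBytes / MB} " +
               $"fragmentedMB={info.FragmentedBytes / MB} " +
               $"gen2Collections={GC.CollectionCount(2)} " +
               $"gcPausePct={info.PauseTimePercentage}";
    }
}
```

Logging this line every few minutes gives a trend: working set and managed heap climbing together suggests a reference leak, while working set climbing with a flat heap suggests fragmentation or native memory.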
2. Mattrx — the running production stack
┌────────────────────────────────────────────────────────────────────┐
│ Azure App Service P2v3 (6 Linux containers) │
│ Mattrx.Api (ASP.NET Core 9, Kestrel) │
│ ↳ ~3,200 requests/sec peak │
│ ↳ ~140 SignalR connections / instance │
│ ↳ 240 EF Core migrations in flight │
└────────────────────────────────────────────────────────────────────┘
Before audit: working-set per instance: 280 MB → 2.1 GB over ~4 days → OOM @ ~2.5 GB → restart
After audit: steady 380 MB. Restarts only on deploy.
The 10 leaks below are in rough order of how much they cost us. Fix the top three and you'll see most of the win.
3. Leak #1 — Singleton capturing a scoped service (DbContext)
The bug: DbContext is registered as scoped (one per request). When a singleton receives it via constructor injection, the singleton holds a reference to that one DbContext for the app's lifetime — and through it, its entire change tracker. Every subsequent request gets a new DbContext, but the singleton's copy never goes away.
Before
// ❌ Compiles. DI doesn't catch it without ValidateScopes=true.
public class CampaignAuditService : ICampaignAuditService // ← registered as Singleton
{
private readonly AppDbContext _db; // ← captured ONCE at app startup, leaks forever
private readonly List<AuditEntry> _recentEntries = new();
public CampaignAuditService(AppDbContext db) { _db = db; }
public async Task LogAsync(Guid campaignId, string action)
{
_db.AuditLog.Add(new AuditEntry { CampaignId = campaignId, Action = action });
await _db.SaveChangesAsync();
_recentEntries.Add(/* ... */);
}
}
// Program.cs
builder.Services.AddSingleton<ICampaignAuditService, CampaignAuditService>(); // ← the bug
Two compounding leaks:
- The captured _db retains every entity loaded by every request that ever used it.
- _recentEntries grows unbounded.
Diagnostic
dotnet-counters monitor -p <pid> System.Runtime[gen-2-size]
# gen-2 climbs monotonically over hours/days. New DbContexts are
# created per request but the singleton's copy never dies.
After
// ✅ Fix #1 — inject IServiceScopeFactory to create a scope per call
public class CampaignAuditService : ICampaignAuditService
{
private readonly IServiceScopeFactory _scopeFactory;
public CampaignAuditService(IServiceScopeFactory scopeFactory)
{
_scopeFactory = scopeFactory;
}
public async Task LogAsync(Guid campaignId, string action)
{
using var scope = _scopeFactory.CreateScope();
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
db.AuditLog.Add(new AuditEntry { CampaignId = campaignId, Action = action });
await db.SaveChangesAsync();
}
}
Or — better — change the service to scoped if it doesn't need to be a singleton.
Belt-and-braces — turn on scope validation in dev
// Program.cs
var builder = WebApplication.CreateBuilder(args);
if (builder.Environment.IsDevelopment())
{
builder.Host.UseDefaultServiceProvider(o =>
{
o.ValidateScopes = true;
o.ValidateOnBuild = true; // throws at startup if a singleton needs a scoped svc
});
}
This catches the leak at startup in CI. Worth its weight in gold.
Mattrx metric
CampaignAuditService alone retained ~340 MB per instance after 24 hours. Fix dropped working-set by 16% across the fleet.
Diagram — why this leaks
Singleton svc ──holds──► DbContext (req #1)
│
├──tracks── Entity[27]
├──tracks── Entity[42]
└──tracks── 4,000+ entities loaded by EVERY request that hit it
After 1 hour: change tracker has ~30k entities.
After 12 hours: ~250k entities.
After 4 days: instance OOM at ~2.5 GB.
4. Leak #2 — Static event handlers from request-scoped objects
The bug: A request-scoped object subscribes to a long-lived event (a static event, or an event on a singleton). The publisher holds a reference to the subscriber (via the delegate's Target field). The subscriber lives forever — and through it, every captured field including the HttpContext.
Before
// ❌ Looks innocent. Subscriber is request-scoped, publisher is static.
public static class TenantBus
{
public static event Action<TenantChanged>? TenantChangedEvent;
internal static void Publish(TenantChanged evt) => TenantChangedEvent?.Invoke(evt);
}
public class CampaignsController : ControllerBase
{
public CampaignsController(/*...*/)
{
TenantBus.TenantChangedEvent += OnTenantChanged; // ← static event holds `this`
}
private void OnTenantChanged(TenantChanged evt) { /* clear local cache */ }
}
Every request creates a new CampaignsController instance. Every one subscribes. None unsubscribe. After 10k requests, TenantBus.TenantChangedEvent has 10k subscribers, and through them, 10k HttpContexts.
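The growth is easy to reproduce without ASP.NET. The sketch below is illustrative only: Subscriber stands in for the controller, and SubscriberCount is a helper added here to make the invocation list visible.

```csharp
using System;

public sealed record TenantChanged(Guid TenantId);

public static class TenantBus
{
    public static event Action<TenantChanged>? TenantChangedEvent;

    // Helper for this demo: how many delegates (and therefore targets) the static event holds.
    public static int SubscriberCount => TenantChangedEvent?.GetInvocationList().Length ?? 0;
}

// Stand-in for a request-scoped object such as a controller.
public sealed class Subscriber : IDisposable
{
    public Subscriber() => TenantBus.TenantChangedEvent += OnTenantChanged;
    private void OnTenantChanged(TenantChanged evt) { /* react to the event */ }
    public void Dispose() => TenantBus.TenantChangedEvent -= OnTenantChanged;
}

public static class EventLeakDemo
{
    public static (int WhileSubscribed, int AfterDispose) Run()
    {
        var subscribers = new Subscriber[1000];
        for (int i = 0; i < subscribers.Length; i++)
            subscribers[i] = new Subscriber();   // one "request" each

        int whileSubscribed = TenantBus.SubscriberCount;

        foreach (var s in subscribers)
            s.Dispose();                          // the unsubscribe the buggy code never does

        return (whileSubscribed, TenantBus.SubscriberCount);
    }
}
```

Each delegate in the invocation list carries its Subscriber as Target, so as long as the count stays above zero those instances remain reachable from the static event.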
Diagnostic
dotnet-gcdump collect -p <pid>
# Diff two dumps 5 min apart:
# "CampaignsController" instance count: 1,200 → 7,800
# Path to root: TenantBus.TenantChangedEvent → MulticastDelegate → CampaignsController
After — option A: don't use events for this. Use MediatR INotificationHandler or IMessageBus.
// ✅ Per-request handler that DI manages
public sealed class TenantChangedClearCache(IMemoryCache cache) : INotificationHandler<TenantChanged>
{
public Task Handle(TenantChanged evt, CancellationToken ct)
{
cache.Remove($"campaigns:{evt.TenantId}");
return Task.CompletedTask;
}
}
After — option B: if you must use events, unsubscribe in Dispose
// ✅ controller implements IDisposable, unsubscribes
public class CampaignsController : ControllerBase, IDisposable
{
public CampaignsController() { TenantBus.TenantChangedEvent += OnTenantChanged; }
public void Dispose() { TenantBus.TenantChangedEvent -= OnTenantChanged; }
/* ... */
}
But ASP.NET controllers are not always disposed in the order you expect, and a missed unsubscribe is silent. Prefer MediatR.
Mattrx metric
A TenantBus (legacy from the MVC days) retained ~210 MB of HttpContext graphs per instance. Removed in week 1.
5. Leak #3 — MemoryCache without SizeLimit
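The bug: IMemoryCache.Set(key, value) with no size limit and no absolute expiration. MemoryCache has no background timer: expired entries are only removed when they are read or when other cache activity triggers an expiration scan. With sliding expiration alone, an entry that keeps being read never expires, and with no SizeLimit nothing caps how many entries accumulate.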
Before
// ❌ Looks reasonable. Sliding 30-min expiration "should" be fine.
public class CampaignsService(IMemoryCache cache, ICampaignsRepo repo)
{
public Task<Campaign> GetAsync(Guid id) =>
cache.GetOrCreateAsync($"campaign:{id}", entry =>
{
entry.SlidingExpiration = TimeSpan.FromMinutes(30);
return repo.GetAsync(id);
})!;
}
Every distinct campaign that gets queried adds an entry, each read pushes its expiry out another 30 minutes, and nothing caps the total. Worst case: 100k campaigns × ~8 KB per Campaign aggregate × 6 instances ≈ 5 GB of "cache" across the fleet.
Diagnostic
dotnet-counters monitor -p <pid> Microsoft.Extensions.Caching.Memory
# `total-entries` climbs forever; `memory-pressure` rises.
After
// ✅ size limit + absolute expiration + per-entry size
builder.Services.AddMemoryCache(o =>
{
o.SizeLimit = 200_000; // total "size units" across all entries
o.CompactionPercentage = 0.2; // when full, evict 20% of size
});
public class CampaignsService(IMemoryCache cache, ICampaignsRepo repo)
{
public Task<Campaign> GetAsync(Guid id) =>
cache.GetOrCreateAsync($"campaign:{id}", entry =>
{
entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15);
entry.SlidingExpiration = TimeSpan.FromMinutes(5);
entry.Size = 1; // every entry counts as 1 "unit"
return repo.GetAsync(id);
})!;
}
Three rules together:
- SizeLimit on the cache itself. Otherwise it grows unbounded.
- Size on every entry. Once SizeLimit is set, adding an entry without a Size throws InvalidOperationException.
- Always AbsoluteExpirationRelativeToNow as a backstop. Sliding alone never expires an entry that keeps being read.
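The interaction between SizeLimit and Size is worth seeing once in isolation. A minimal sketch, assuming the Microsoft.Extensions.Caching.Memory package is referenced (it ships with the ASP.NET Core shared framework); BoundedCacheDemo and the key names are hypothetical:

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

public static class BoundedCacheDemo
{
    // Returns true when a size-limited cache rejects an entry that has no Size.
    public static bool UnsizedEntryIsRejected()
    {
        using IMemoryCache cache = new MemoryCache(new MemoryCacheOptions { SizeLimit = 100 });
        try
        {
            cache.Set("campaign:unsized", "value");   // no Size on the entry
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;                               // the cache refuses unsized entries
        }
    }

    // Stores an entry that follows all three rules and reads it back.
    public static bool SizedEntryIsStored()
    {
        using IMemoryCache cache = new MemoryCache(new MemoryCacheOptions { SizeLimit = 100 });
        var options = new MemoryCacheEntryOptions
        {
            Size = 1,                                                    // counts against SizeLimit
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15),  // hard upper bound
            SlidingExpiration = TimeSpan.FromMinutes(5),                 // idle eviction
        };
        cache.Set("campaign:sized", "value", options);
        return cache.Get("campaign:sized") is not null;
    }
}
```

The units of Size are whatever you decide they are; the cache only sums them and compares against SizeLimit. Using 1 per entry, as here, turns the limit into a simple entry count.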
For distributed scenarios, use Redis via IDistributedCache and let Redis' eviction policy (allkeys-lru) handle it. We mostly use Redis at Mattrx; MemoryCache is for hot-path microsecond reads only.
Mattrx metric
IMemoryCache alone was retaining ~480 MB / instance. This was the single biggest fix.
Diagram — sliding vs absolute expiration
SLIDING (30 min) ABSOLUTE (15 min)
───────────────── ──────────────────
set set
│ │
│ ← reads keep resetting timer │ ← timer counts down regardless
▼ ▼
evict (only after 30 min IDLE) evict (15 min after set)
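With sliding alone, an entry that keeps being read NEVER expires.
With absolute, every entry expires 15 min after set, read or not.
Either way, removal is lazy: it happens on later cache activity, not on a timer.
Use BOTH: absolute = backstop, sliding = freshness.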
6. Leak #4 — EF Core change-tracker bloat
The bug: Long-lived DbContexts (or repeated queries without AsNoTracking) accumulate tracked entities. EF Core retains every loaded entity to detect changes on SaveChanges(). For read-only queries, that's pure overhead — and a slow memory drip.
Before
// ❌ background worker that reuses a DbContext for hours
public class ReportGenerator : BackgroundService
{
private readonly AppDbContext _db; // injected once at startup
public ReportGenerator(AppDbContext db) { _db = db; }
protected override async Task ExecuteAsync(CancellationToken ct)
{
while (!ct.IsCancellationRequested)
{
// Load 5,000 campaigns to generate reports — TRACKED by default
var campaigns = await _db.Campaigns.ToListAsync(ct);
foreach (var c in campaigns) { /* read-only stats work */ }
await Task.Delay(TimeSpan.FromMinutes(1), ct);
}
}
}
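After 4 hours (240 passes × 5,000 rows) the change tracker can be holding up to ~1.2M entities. Rows it already tracks are reused rather than duplicated, so the real count is the number of distinct rows seen, but none of them are released until the context is disposed, which it never is.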
Diagnostic
dotnet-counters monitor -p <pid> Microsoft.EntityFrameworkCore
# `active-db-contexts` flat (1)
# But heap snapshot shows millions of Campaign entities
After — three layered fixes
// ✅ Fix #1 — create a scope per iteration (so DbContext is disposed)
public class ReportGenerator(IServiceScopeFactory scopeFactory) : BackgroundService
{
protected override async Task ExecuteAsync(CancellationToken ct)
{
while (!ct.IsCancellationRequested)
{
using var scope = scopeFactory.CreateScope();
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
// ✅ Fix #2 — AsNoTracking for read-only queries
var campaigns = await db.Campaigns.AsNoTracking().ToListAsync(ct);
// ... do work ...
await Task.Delay(TimeSpan.FromMinutes(1), ct);
}
}
}
// ✅ Fix #3 — default to NoTracking for the whole context (still opt-in for writes)
public class AppDbContext : DbContext
{
public AppDbContext(DbContextOptions<AppDbContext> opts) : base(opts) { }
protected override void OnConfiguring(DbContextOptionsBuilder b)
{
b.UseQueryTrackingBehavior(QueryTrackingBehavior.NoTrackingWithIdentityResolution);
}
}
For high-throughput APIs, set QueryTrackingBehavior.NoTrackingWithIdentityResolution as the global default. Use .AsTracking() explicitly on the few queries that need it (writes).
Mattrx metric
ReportGenerator retained ~310 MB / instance at peak. After: ~22 MB.
7. Leak #5 — Fire-and-forget Task.Run capturing HttpContext or scoped services
The bug: A controller does work off the request thread using Task.Run(() => …) or _ = SomeAsync() without awaiting. The lambda captures the controller, the controller captures HttpContext, the scope tries to dispose but the captured Task is still alive — so the scoped objects are also still alive.
Worse: the response returns, the request scope ends, the captured DbContext is now disposed while the Task is still using it. You get ObjectDisposedException sometimes + a leak the rest of the time.
Before
// ❌ Fire-and-forget + captured scope
[HttpPost("/api/campaigns/{id}/archive")]
public IActionResult Archive(Guid id, [FromServices] AppDbContext db)
{
// "Do this in the background so the response is fast"
_ = Task.Run(async () =>
{
var c = await db.Campaigns.FindAsync(id); // captured scoped db
await NotifyExternalSystemsAsync(c);
await db.SaveChangesAsync();
});
return Accepted();
}
Diagnostic
The smoking gun in Application Insights:
ObjectDisposedException: Cannot access a disposed object. Object name: 'AppDbContext'.
…happening at random times after the response has gone out.
After — pattern A: background Channel + IHostedService consumer
// ✅ Producer: just enqueue a message. No captures.
public class CampaignsController(IBackgroundWorkQueue queue) : ControllerBase
{
[HttpPost("/api/campaigns/{id}/archive")]
public IActionResult Archive(Guid id)
{
queue.Enqueue(new ArchiveCampaign(id));
return Accepted();
}
}
// ✅ Consumer: owns its own scope, lives outside the request lifecycle
public class CampaignArchiveWorker(
IBackgroundWorkQueue queue,
IServiceScopeFactory scopeFactory) : BackgroundService
{
protected override async Task ExecuteAsync(CancellationToken ct)
{
await foreach (var msg in queue.DequeueAllAsync(ct))
{
using var scope = scopeFactory.CreateScope();
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
var c = await db.Campaigns.FindAsync([msg.Id], ct);
await NotifyExternalSystemsAsync(c, ct);
await db.SaveChangesAsync(ct);
}
}
}
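The IBackgroundWorkQueue used above is not a framework type. One possible shape for it, sketched here on top of System.Threading.Channels, is a bounded channel registered as a singleton; the interface members, the ArchiveCampaign record, and the default capacity are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;

// Hypothetical message type matching the controller example.
public sealed record ArchiveCampaign(Guid Id);

public interface IBackgroundWorkQueue
{
    // Returns false when the queue is full, so the caller can shed load.
    bool Enqueue(ArchiveCampaign message);
    IAsyncEnumerable<ArchiveCampaign> DequeueAllAsync(CancellationToken ct);
}

public sealed class ChannelBackgroundWorkQueue : IBackgroundWorkQueue
{
    private readonly Channel<ArchiveCampaign> _channel;

    public ChannelBackgroundWorkQueue(int capacity = 10_000)
    {
        // Bounded on purpose: an unbounded queue would just be another leak.
        _channel = Channel.CreateBounded<ArchiveCampaign>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait,  // TryWrite returns false when full
            SingleReader = true,                     // one hosted-service consumer
        });
    }

    public bool Enqueue(ArchiveCampaign message) => _channel.Writer.TryWrite(message);

    public IAsyncEnumerable<ArchiveCampaign> DequeueAllAsync(CancellationToken ct) =>
        _channel.Reader.ReadAllAsync(ct);
}
```

The message carries only the campaign ID, so nothing from the request (HttpContext, DbContext, controller) is captured. If Enqueue returns false, returning 503 or 429 is usually a better choice than blocking the request thread.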
After — pattern B: Hangfire (covered in Angular + .NET architecture guide)
// ✅ idempotent + retried by Hangfire
jobs.Enqueue<ArchiveCampaignJob>(j => j.RunAsync(id, CancellationToken.None));
The rule: Never let a request handler start a Task that outlives the request. Hand the work off to a queue, a hosted service, or a job system.
Mattrx metric
We had ~7 places with this pattern (audit logs, webhooks, "fire and forget" email). Together: ~95 MB / instance + 23 ObjectDisposedExceptions per week. Both → zero.
8. Leak #6 — HttpClient socket exhaustion / handler retention
The bug has two flavors. Either:
// ❌ A — new HttpClient() per call: SOCKET exhaustion + each retains a handler
public async Task<string> FetchAsync(string url)
{
using var client = new HttpClient(); // sockets in TIME_WAIT for 4 minutes
return await client.GetStringAsync(url);
}
Or:
// ❌ B — static HttpClient: DNS changes never picked up; handler retains everything
private static readonly HttpClient _client = new();
Both leak in different ways. (A) exhausts ephemeral ports and the underlying SocketsHttpHandler graph grows. (B) is closer to right but DNS resolution gets stale, and the handler holds onto connections to dead endpoints.
After — IHttpClientFactory (the answer since .NET Core 2.1, still under-used in 2026)
// ✅ Typed clients managed by the factory
builder.Services.AddHttpClient<IWebhookClient, WebhookClient>(client =>
{
client.BaseAddress = new Uri("https://hooks.partner.io");
client.Timeout = TimeSpan.FromSeconds(10);
})
.SetHandlerLifetime(TimeSpan.FromMinutes(5)); // factory rotates handlers every 5 min
The factory rotates HttpMessageHandlers on a schedule (default 2 minutes), so DNS stays fresh and old handlers get GC'd. Sockets are pooled and reused.
For long-lived static usage outside ASP.NET (consoles, workers), the modern answer is:
// ✅ One static SocketsHttpHandler with pooled connection lifetime
private static readonly SocketsHttpHandler _handler = new()
{
PooledConnectionLifetime = TimeSpan.FromMinutes(2), // socket gets recycled every 2 min
};
private static readonly HttpClient _client = new(_handler);
Mattrx metric
Three legacy new HttpClient() call sites (webhook delivery, image uploads, partner ping). Together: ~80 MB retained + intermittent SocketException: Address already in use under load. Both → fixed.
9. Leak #7 — SignalR connection / group dictionaries never cleaned up
The bug: A Hub subclass stores connection IDs in a static or singleton dictionary so the app can target users by ID. Disconnect happens, the dictionary never gets cleaned. ConnectionId → User graph grows forever.
Before
// ❌ Static map, no disconnect cleanup
public class InboxHub : Hub
{
private static readonly ConcurrentDictionary<string, UserPresence> _online = new();
public override Task OnConnectedAsync()
{
var user = LoadUser(Context.UserIdentifier); // 4 KB graph per user
_online[Context.ConnectionId] = new UserPresence { User = user, ConnectedAt = DateTime.UtcNow };
return base.OnConnectedAsync();
}
// ❌ NO OnDisconnectedAsync override → _online grows forever
}
Diagnostic
SignalR.Connections count from Application Insights is the cheap signal. If active connections look stable but _online.Count only grows, you have a leak.
After
// ✅ Explicit cleanup on disconnect
public class InboxHub : Hub
{
private static readonly ConcurrentDictionary<string, UserPresence> _online = new();
public override Task OnConnectedAsync()
{
var user = LoadUser(Context.UserIdentifier);
_online[Context.ConnectionId] = new UserPresence { User = user, ConnectedAt = DateTime.UtcNow };
return base.OnConnectedAsync();
}
public override Task OnDisconnectedAsync(Exception? ex)
{
_online.TryRemove(Context.ConnectionId, out _); // ← the missing piece
return base.OnDisconnectedAsync(ex);
}
}
Better: use SignalR groups + IUserIdProvider and don't maintain your own dictionary. Or back the presence map with Redis (TTL-based eviction) so a missed OnDisconnectedAsync doesn't strand state forever.
Mattrx metric
InboxHub._online was retaining presence for ~38,000 ghost connections after 4 days. ~150 MB / instance. Per the inbox SLA we'd already configured aggressive KeepAlive (15s) and ClientTimeout (30s) — the leak was purely missing the override.
10. Leak #8 — Log scopes holding large structured objects
The bug: using (logger.BeginScope(new { campaign })) includes the whole Campaign graph in every log message inside the scope. The scope itself holds the object alive for the life of the request — even if the request is slow (e.g., a 10-second export). Worse, structured logging serializers may walk the entire object graph including navigation properties, holding references to children.
Before
// ❌ Big object in scope
public async Task<IActionResult> Export(Guid id)
{
var campaign = await _db.Campaigns
.Include(c => c.Events)
.Include(c => c.Reports)
.FirstAsync(c => c.Id == id); // 50 MB graph
using (_logger.BeginScope(new { campaign })) // ← entire graph held for scope
{
_logger.LogInformation("Starting export");
await _exporter.RunAsync(campaign);
_logger.LogInformation("Export done");
}
return Ok();
}
Multiply by 20 concurrent exports = 1 GB held for tens of seconds, blocked from GC.
After
// ✅ scope only the minimum identifier you need to correlate
using (_logger.BeginScope("CampaignId={CampaignId} TenantId={TenantId}", id, tenantId))
{
_logger.LogInformation("Starting export");
await _exporter.RunAsync(campaign);
_logger.LogInformation("Export done");
}
Three rules:
- Log identifiers, not graphs. CampaignId={id} not campaign={campaign}.
- Use the message template form, not anonymous objects. It's faster, and Serilog/Microsoft.Extensions.Logging both handle it cleanly.
- Sanity-check serialized log output in dev — if a single log line is > 1 KB, you're logging too much.
Mattrx metric
Export and BulkArchive were the worst offenders. Cleanup: ~110 MB / instance average, peak savings ~600 MB during exports.
11. Leak #9 — Long-running IHostedService without CancellationToken propagation
The bug: A BackgroundService doesn't honor stoppingToken (or doesn't pass it down to inner awaits). When the host stops, the service keeps running. Objects allocated by it can't be collected, and graceful shutdown stalls.
Before
// ❌ Ignores stoppingToken
public class CampaignPoller : BackgroundService
{
private readonly HttpClient _http;
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
while (true) // ← never exits
{
var data = await _http.GetStringAsync("https://..."); // no token!
await ProcessAsync(data);
await Task.Delay(TimeSpan.FromSeconds(30)); // no token!
}
}
}
When IHostApplicationLifetime.StopApplication() runs, this service ignores it. The host's 30-second shutdown timeout elapses; runtime force-aborts. In containerized environments, this means dirty pod terminations + duplicated work.
After
// ✅ pass stoppingToken to everything
public class CampaignPoller(HttpClient http) : BackgroundService
{
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
try
{
var data = await http.GetStringAsync("https://...", stoppingToken);
await ProcessAsync(data, stoppingToken);
}
catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
{
break;
}
catch (Exception ex)
{
/* log + continue */
}
try { await Task.Delay(TimeSpan.FromSeconds(30), stoppingToken); }
catch (OperationCanceledException) { break; }
}
}
}
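The same loop can also be written with PeriodicTimer, which folds the delay and the cancellation check into one awaited call. This is an alternative sketch, not the Mattrx implementation; CancellableLoop and its parameters are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellableLoop
{
    // Runs 'work' immediately and then once per 'interval' until the token is cancelled.
    // Returns how many iterations completed.
    public static async Task<int> RunAsync(
        Func<CancellationToken, Task> work,
        TimeSpan interval,
        CancellationToken stoppingToken)
    {
        int completed = 0;
        using var timer = new PeriodicTimer(interval);
        try
        {
            do
            {
                await work(stoppingToken);   // the token flows into every inner await
                completed++;
            }
            while (await timer.WaitForNextTickAsync(stoppingToken));
        }
        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
        {
            // Expected on shutdown: fall through and return promptly.
        }
        return completed;
    }
}
```

Inside a BackgroundService, ExecuteAsync could delegate to a helper like this and pass its stoppingToken straight through. Unlike the version above it does not swallow other exceptions, so a failing iteration ends the loop unless the work delegate handles its own errors.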
Mattrx metric
Cleanup of 4 hosted services freed ~60 MB / instance (long retention of HTTP responses + parsed JSON graphs during shutdown windows) and eliminated dirty terminations completely.
12. Leak #10 — Large Object Heap (LOH) fragmentation
The bug: Any allocation ≥ 85,000 bytes goes on the Large Object Heap. By default, the LOH is not compacted — fragmentation accumulates. Heavy string concatenation, byte[] allocations (image upload, JSON serialization of big payloads), or repeated List<T> EnsureCapacity calls fragment it until GC must allocate a new LOH segment for every big object.
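The symptom: working set climbs even though heap usage doesn't, and Gen2 GC pauses get worse. By default a Gen2 collection sweeps the LOH without compacting it; compaction happens only when you ask for it by setting GCSettings.LargeObjectHeapCompactionMode to CompactOnce.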
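A quick way to see which side of the threshold an allocation lands on is GC.GetGeneration, which reports LOH objects as generation 2 from the moment they are created. A minimal sketch (LohDemo is a hypothetical helper, and the default LOH threshold is assumed):

```csharp
using System;

public static class LohDemo
{
    // Allocates a byte array of the given length and reports the generation the GC
    // assigns to it. LOH objects are reported as generation 2 (GC.MaxGeneration).
    public static int GenerationOfNewArray(int length) => GC.GetGeneration(new byte[length]);
}
```

LohDemo.GenerationOfNewArray(85_000) returns 2 because the array plus its header crosses the 85,000-byte threshold. A small array such as GenerationOfNewArray(1_000) normally reports 0, since it starts life in the ordinary Gen0 heap.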
Before
// ❌ string concat in a hot loop → repeated huge string allocations
public string BuildReport(IReadOnlyList<Campaign> campaigns)
{
var s = "";
foreach (var c in campaigns)
{
s += $"{c.Id},{c.Name},{c.Budget.Amount}\n"; // each + allocates a new big string
}
return s;
}
After
// ✅ StringBuilder or pooled buffer
public string BuildReport(IReadOnlyList<Campaign> campaigns)
{
var sb = new StringBuilder(campaigns.Count * 64);
foreach (var c in campaigns)
{
sb.Append(c.Id).Append(',').Append(c.Name).Append(',').Append(c.Budget.Amount).Append('\n');
}
return sb.ToString();
}
For really hot paths, use ArrayPool<T>.Shared:
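// ✅ pooled scratch buffer: the 256 KB array is rented and reused, not allocated per call
public byte[] BuildJsonReport(IReadOnlyList<Campaign> campaigns)
{
byte[] rented = ArrayPool<byte>.Shared.Rent(256 * 1024);
try
{
// Fixed-capacity stream over the rented array (throws if the payload outgrows it)
using var stream = new MemoryStream(rented);
using (var writer = new Utf8JsonWriter(stream)) { /* write */ } // Dispose flushes to the stream
// The returned copy is still a fresh allocation; write to the response stream to avoid it too
return rented.AsSpan(0, (int)stream.Position).ToArray();
}
finally { ArrayPool<byte>.Shared.Return(rented); }
}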
Trigger LOH compaction at scheduled times (last resort)
// ✅ At a low-traffic window:
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
Don't do this on every request. Once a day, off-peak, if fragmentation is the actual problem.
Mattrx metric
BuildReport (CSV export) + WebhookPayload (JSON serialization) together held ~120 MB / instance in fragmented LOH. Replaced both with StringBuilder + Utf8JsonWriter. GC Gen2 pause p99: 240ms → 35ms (most of the headline win).
13. The full Mattrx before/after
Captured over a 21-day audit (week 1 = find, week 2 = fix, week 3 = bake in prod):
| Metric | Before | After | Delta |
|---|---|---|---|
| Working set / instance (steady state) | 2.1 GB | 380 MB | −82% |
| Working set total fleet (6 instances) | 12.6 GB | 2.3 GB | −82% |
| OOM auto-restarts / week | 8 | 0 (90 days) | — |
| GC Gen2 pause p99 | 240 ms | 35 ms | −85% |
| % Time in GC | 18% | 4% | −78% |
| Allocation rate (peak) | 480 MB/s | 90 MB/s | −81% |
| /api/campaigns p99 latency | 820 ms | 180 ms | −78% |
| App Service tier | P2v3 | P1v3 | savings: ~$420/mo |
| Dirty pod terminations / month | 12 | 0 | — |
| ObjectDisposedException / week | 23 | 0 | — |
Three weeks of work. No new features. No infrastructure changes. Same SQL, same Redis, same Front Door. The fixes were all 10 boring leaks listed above.
14. The detection cookbook — what to run when memory grows
When you see a slow climb in Application Insights / Azure metrics:
STEP 1 — confirm it's a leak, not normal growth
dotnet-counters monitor -p <pid> System.Runtime
Watch:
• working-set climbing monotonically over 30+ min → leak
• gen-2-size climbing → reference leak
• exception-count spiking → unrelated, but log it
STEP 2 — find which TYPE is bloating
dotnet-gcdump collect -p <pid> (do twice, 10 min apart)
Open in PerfView → "Heap Snapshot" → "Diff"
Sort by net growth. Top of list is your suspect type.
STEP 3 — find which ROOT holds it
In PerfView: right-click suspect type → "Path to Root"
You'll see one of:
• Static field (X.s_field) → Leak #2 or a static dictionary
• System.Threading.Tasks.Task → Leak #5 (orphaned Task)
• Microsoft.Extensions.Caching.Memory.MemoryCache → Leak #3
• Microsoft.EntityFrameworkCore.ChangeTracker → Leak #1 or #4
• Microsoft.Extensions.Logging.LogScope → Leak #8
• SignalR connection map → Leak #7
STEP 4 — apply the fix (one of the 10 above)
STEP 5 — verify in prod for 7 days
Watch working-set on App Insights.
Flat = fixed. Slow climb = still leaking somewhere; repeat.
Build this into a runbook. The first hour of investigation is always the same five commands.
15. The mental checklist — before merging any non-trivial ASP.NET Core PR
- Does any singleton capture a scoped service? (Test with ValidateScopes = true.)
- Does any static event subscribe a request-scoped object?
- Does every IMemoryCache.Set have a Size, an AbsoluteExpirationRelativeToNow, and is SizeLimit configured globally?
- Is every read-only EF query AsNoTracking() (or is the default NoTrackingWithIdentityResolution)?
- Does any controller Task.Run work that outlives the request? (→ queue + hosted service, or Hangfire)
- Is every HttpClient from IHttpClientFactory (not new HttpClient() and not a bare static)?
- Does every SignalR Hub override OnDisconnectedAsync and clean its state?
- Does every BeginScope log identifiers, not graphs?
- Does every BackgroundService honor stoppingToken and pass it down?
- Are big buffers (> 85 KB) using StringBuilder / ArrayPool / MemoryPool, not raw concat / allocation?
If any answer is "I'm not sure" — dotnet-gcdump it before shipping.
16. Honest stuff
- None of these are framework bugs. ASP.NET Core's defaults are sane. Every leak in this guide is application code holding references it shouldn't.
- The runtime is doing its job. GC reclaims the unreachable. The work is being honest about what's reachable.
- Leak hunts have diminishing returns. Fix the top 3 (Singleton + DbContext, MemoryCache without SizeLimit, change tracker bloat) and you'll claw back more than half of the win: roughly 1.1 GB of the ~2 GB per instance itemized in the TL;DR table. Don't chase 10 MB while a 400 MB leak is upstream.
- ValidateScopes = true in dev catches one of the worst ones at startup. Turn it on this afternoon.
- Production memory profiling is safe and cheap. dotnet-gcdump takes ~20 seconds and pauses the process briefly. Run it. Don't be scared of the tool.
- App Service / Kubernetes restarting your container "fixes" leaks every 4 days. This is not a fix. It hides the real cost: tail latency, cold starts, duplicate work, and lost in-flight requests.
- A bigger SKU isn't a fix. Mattrx was on P2v3 because we'd been throwing more RAM at the problem. After the audit, we went down to P1v3 and saved $420/month. The leaks were the bill.
17. Closing — the right mental model
In one line: .NET doesn't leak. Your code decides what's reachable.
Every leak is one of:
- A static field you forgot about.
- An event with no -=.
- A cache with no eviction.
- A scope captured by something longer-lived than it.
- A Task that outlives its caller.
- A buffer allocated bigger than 85 KB without pooling.
Three habits that prevent 90% of the pain in this guide:
- Turn on ValidateScopes = true in development. It catches the singleton-captures-scoped bug at app startup every single time.
- Default to NoTrackingWithIdentityResolution for the DbContext. Opt in to tracking on writes only. Read-heavy APIs save hundreds of MB.
- Treat MemoryCache as a bounded primitive, not a "hashtable that auto-evicts somehow." SizeLimit + Size + AbsoluteExpiration on every use. No exceptions.
Apply that, and the next time a colleague says "the API is OOM-restarting every 4 days," you'll have a runbook instead of a guess.
Further reading
- .NET memory leak diagnostics: the canonical Microsoft walk-through.
- dotnet-counters docs: the first tool to reach for.
- dotnet-gcdump docs: heap snapshots.
- PerfView: the open-source CLR profiler that wrote the book on this.
- IHttpClientFactory docs: for Leak #6.
- EF Core change tracker behavior: for Leaks #1 and #4.
- PrepStack — Angular + .NET Core Enterprise Application Architecture — the broader architecture this leak audit sits inside.
Found one of these leaks in your codebase, or chasing one you can't identify? Email randhir.jassal@gmail.com with the dotnet-counters snapshot (or your dotnet-gcdump output) — happy to point at which of the 10 it is.