Parallel.ForEach vs Task.WhenAll vs PLINQ — which scales better and when
| | Threads or tasks? | Best for | Throughput on I/O? | Throughput on CPU? |
|---|---|---|---|---|
| Parallel.ForEach | Thread-pool threads | CPU-bound batches | Wastes threads (they block while waiting) | Best |
| Task.WhenAll | Tasks (no extra threads on I/O) | I/O fan-out | Best | OK but no partitioning |
| PLINQ | Thread-pool threads + partitioning | Functional data pipelines on CPU | Wastes threads | Good (boilerplate-free) |
CPU-bound — Parallel.ForEach
```csharp
Parallel.ForEach(images, img => Process(img));
```
- Uses the thread pool, partitions the work, and scales with `Environment.ProcessorCount`.
- Do not use `await` inside the loop body; see "The async pitfall" below.
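As a rough illustration of the partitioning behaviour described above, the sketch below uses the `Parallel.ForEach` overload with per-worker state, so each worker keeps its own subtotal and merges it once at the end. `CountPrimesBelow` and the input sizes are hypothetical stand-ins for real CPU-bound work, not something taken from the original example.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical CPU-bound work item: count primes below n by trial division.
static int CountPrimesBelow(int n)
{
    int count = 0;
    for (int candidate = 2; candidate < n; candidate++)
    {
        bool isPrime = true;
        for (int d = 2; (long)d * d <= candidate; d++)
        {
            if (candidate % d == 0) { isPrime = false; break; }
        }
        if (isPrime) count++;
    }
    return count;
}

int[] inputs = Enumerable.Range(1, 200).Select(i => i * 100).ToArray();
long total = 0;

Parallel.ForEach(
    inputs,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    () => 0L,                                                    // per-worker subtotal starts at zero
    (n, loopState, subtotal) => subtotal + CountPrimesBelow(n),  // no shared state touched here
    subtotal => Interlocked.Add(ref total, subtotal));           // merge once per worker

Console.WriteLine($"Primes counted across all inputs: {total}");
```

Keeping the subtotal local avoids contending on a shared counter for every item, which is one of the ways parallel loops can end up slower than expected.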
I/O-bound — Task.WhenAll
```csharp
var tasks = urls.Select(u => http.GetStringAsync(u));
var results = await Task.WhenAll(tasks);
```
- One thread can have thousands of in-flight I/O tasks.
- Do not use `Parallel.ForEach` here: each in-flight request would block a thread-pool thread while it waits, and the pool grows to compensate.
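A minimal, self-contained sketch of the same fan-out, assuming a hypothetical `FetchAsync` that simulates an HTTP call with `Task.Delay` so it runs without a network. It shows that the result array follows the order of the input tasks.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for an HTTP call: waits without blocking a thread.
static async Task<string> FetchAsync(string url)
{
    await Task.Delay(200);
    return $"body of {url}";
}

string[] urls = Enumerable.Range(0, 100)
    .Select(i => $"https://example.com/{i}")
    .ToArray();

var stopwatch = Stopwatch.StartNew();
// All 100 simulated calls are started before any of them is awaited.
string[] results = await Task.WhenAll(urls.Select(url => FetchAsync(url)));
stopwatch.Stop();

// results[i] corresponds to urls[i], regardless of completion order.
Console.WriteLine(results[0]);
Console.WriteLine($"{results.Length} calls finished in about {stopwatch.ElapsedMilliseconds} ms");
```

Because the waits overlap, the total elapsed time should be far closer to one delay than to the sum of all of them, though the exact figure depends on the machine.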
Throttled async fan-out — Parallel.ForEachAsync (.NET 6+)
```csharp
await Parallel.ForEachAsync(urls,
    new ParallelOptions { MaxDegreeOfParallelism = 16 },
    async (u, ct) => await http.GetStringAsync(u, ct));
```
The sweet spot for "I have 50 000 HTTP calls and do not want to open them all at once."
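To make the throttling visible, the sketch below counts how many bodies are running at once and records the peak. The delay is an assumed stand-in for the HTTP call, and the counters are illustrative only.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int Limit = 16;
object gate = new();
int inFlight = 0;    // bodies currently between start and finish
int peak = 0;        // highest value of inFlight observed
int completed = 0;

var urls = Enumerable.Range(0, 200).Select(i => $"https://example.com/{i}");

await Parallel.ForEachAsync(
    urls,
    new ParallelOptions { MaxDegreeOfParallelism = Limit },
    async (url, ct) =>
    {
        lock (gate)
        {
            inFlight++;
            peak = Math.Max(peak, inFlight);
        }

        await Task.Delay(20, ct);   // stand-in for http.GetStringAsync(url, ct)

        lock (gate)
        {
            inFlight--;
            completed++;
        }
    });

Console.WriteLine($"completed={completed}, peak concurrency={peak} (limit {Limit})");
```

The observed peak should never exceed `MaxDegreeOfParallelism`, while all 200 items still complete before the awaited call returns.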
Functional CPU pipelines — PLINQ
```csharp
var hashes = files
    .AsParallel()
    .WithDegreeOfParallelism(8)
    .Select(f => Sha256(f))
    .ToArray();
```
- Cleaner than `Parallel.ForEach` when you are composing transformations.
- Ordering: use `.AsOrdered()` only if necessary; it costs throughput.
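A runnable variant of the pipeline above, sketched with in-memory byte arrays in place of files so it needs no disk access. The payloads are made up for illustration, and `.AsOrdered()` is included only to show where it goes when output order matters.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// In-memory stand-ins for file contents (hypothetical data).
byte[][] payloads = Enumerable.Range(0, 1000)
    .Select(i => Encoding.UTF8.GetBytes($"payload-{i}"))
    .ToArray();

string[] hashes = payloads
    .AsParallel()
    .AsOrdered()                   // keep hashes[i] aligned with payloads[i]
    .WithDegreeOfParallelism(8)
    .Select(bytes => Convert.ToHexString(SHA256.HashData(bytes)))
    .ToArray();

Console.WriteLine($"{hashes.Length} hashes, first = {hashes[0]}");
```

If the consumer does not care which hash belongs to which position, dropping `.AsOrdered()` lets PLINQ skip the reordering work.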
The async pitfall
```csharp
// Do not write this
Parallel.ForEach(urls, async u => await http.GetStringAsync(u));
// The lambda is async void (fire-and-forget): Parallel.ForEach returns without waiting for the HTTP calls to complete.
```
Use Parallel.ForEachAsync instead.
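The difference can be observed with a small experiment, sketched here with `Task.Delay` as an assumed stand-in for the HTTP call. It records how many items have finished at the moment each loop returns.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int[] items = Enumerable.Range(0, 20).ToArray();

// Broken: the async lambda is converted to Action<int> (async void), so the
// loop only waits until each body reaches its first await.
int brokenDone = 0;
Parallel.ForEach(items, async _ =>
{
    await Task.Delay(500);                 // stand-in for an HTTP call
    Interlocked.Increment(ref brokenDone);
});
int brokenDoneAtReturn = Volatile.Read(ref brokenDone);

// Fixed: ForEachAsync gets a ValueTask per item and awaits all of them.
int fixedDone = 0;
await Parallel.ForEachAsync(items, async (_, ct) =>
{
    await Task.Delay(500, ct);
    Interlocked.Increment(ref fixedDone);
});
int fixedDoneAtReturn = Volatile.Read(ref fixedDone);

Console.WriteLine($"Parallel.ForEach returned with {brokenDoneAtReturn} of {items.Length} done");
Console.WriteLine($"Parallel.ForEachAsync returned with {fixedDoneAtReturn} of {items.Length} done");
```

The first count is timing-dependent but is normally far below the item count, whereas the second always equals it. An exception thrown inside the async void version would also bypass the loop entirely, which is a further reason to avoid it.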
Choosing matrix
| Work type | Item count | Choice |
|---|---|---|
| CPU work, small N | 4 to 8 | Task.WhenAll over Task.Run calls is fine (no partitioning cost) |
| CPU work, large N | 1000+ | Parallel.ForEach or PLINQ |
| I/O work, any N | * | Task.WhenAll or Parallel.ForEachAsync (throttled) |
Don't forget — measurement first
Measure with BenchmarkDotNet or production profiling before committing to a parallel design. Parallel code often gets slower because of contention, so verify the result on your specific workload.
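One possible shape for such a measurement is sketched below. It assumes the BenchmarkDotNet NuGet package is referenced and the project is built in Release mode; the class name, workload, and sizes are hypothetical and chosen only to compare a sequential sum against PLINQ at a small and a large N.

```csharp
// Requires the BenchmarkDotNet NuGet package; run in Release configuration.
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<SumOfSquareRoots>();

public class SumOfSquareRoots
{
    private double[] _data = Array.Empty<double>();

    [Params(1_000, 1_000_000)]             // small N vs large N
    public int N;

    [GlobalSetup]
    public void Setup() =>
        _data = Enumerable.Range(1, N).Select(i => (double)i).ToArray();

    [Benchmark(Baseline = true)]
    public double Sequential() => _data.Sum(x => Math.Sqrt(x));

    [Benchmark]
    public double Plinq() => _data.AsParallel().Sum(x => Math.Sqrt(x));
}
```

For work this cheap per item, the parallel version may well lose at small N because of partitioning and merge overhead, which is the kind of result that is hard to predict without measuring.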