SHARDING
why big jobs were timing out, explained in ~3 min
A job can be up to 1,000,000 URLs
every URL must be saved before we can say “got it”
Today: one lane
timeline: 0s to deadline · 60s · ✗ timed out
100k URLs · ~100s · over the limit
Spread across 16 lanes
same 100k · ~6s · well inside the limit
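The numbers above can be checked with quick arithmetic. This is a rough sketch, not the code in the PR: it assumes a flat per-URL cost derived from "100k URLs · ~100s", an even split across lanes, and lanes that run fully in parallel. The names `PER_URL_SECONDS` and `estimated_save_seconds` are made up for illustration.

```python
import math

# Assumption: flat cost per URL, derived from "100k URLs · ~100s" (about 1 ms each).
PER_URL_SECONDS = 100 / 100_000
DEADLINE_SECONDS = 60  # from the slide: deadline · 60s


def estimated_save_seconds(url_count: int, lanes: int) -> float:
    """Estimate the wall-clock time of the busiest lane.

    Assumes URLs split evenly and lanes work fully in parallel.
    """
    urls_in_busiest_lane = math.ceil(url_count / lanes)
    return urls_in_busiest_lane * PER_URL_SECONDS


one_lane = estimated_save_seconds(100_000, lanes=1)
sixteen_lanes = estimated_save_seconds(100_000, lanes=16)

print(f"1 lane:   ~{one_lane:.0f}s, over the limit: {one_lane > DEADLINE_SECONDS}")
print(f"16 lanes: ~{sixteen_lanes:.2f}s, over the limit: {sixteen_lanes > DEADLINE_SECONDS}")
```

Under these assumptions one lane lands at about 100s and sixteen lanes at about 6s, which matches the two figures on the slide.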
The trade-off
reading results back now checks every lane & merges — a small tax
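Where the tax comes from, as a minimal sketch. The slide does not say how URLs are assigned to lanes, so hashing the URL with CRC32 is an assumption here, and the in-memory `lanes` list stands in for whatever storage the real system uses. `lane_for`, `save`, and `read_results` are hypothetical names.

```python
import zlib

LANE_COUNT = 16  # from the slide: spread across 16 lanes


def lane_for(url: str) -> int:
    """Pick a lane with a stable hash (assumption: the real rule may differ)."""
    return zlib.crc32(url.encode("utf-8")) % LANE_COUNT


def save(lanes: list, job_id: str, url: str) -> None:
    """Write one URL into the single lane that owns it."""
    lanes[lane_for(url)].setdefault(job_id, []).append(url)


def read_results(lanes: list, job_id: str) -> list:
    """Read a job back: one lookup per lane, then merge. This is the tax."""
    merged = []
    for lane in lanes:
        merged.extend(lane.get(job_id, []))
    return sorted(merged)


lanes = [{} for _ in range(LANE_COUNT)]
for n in range(100):
    save(lanes, "job-1", f"https://example.com/page/{n}")

print(len(read_results(lanes, "job-1")))
```

A write touches one lane, but a read has to visit all 16 even when most hold little or nothing for the job.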
So: spread only when it pays
✓ small job · one lane · no tax
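One way to decide when spreading pays, as a sketch. The slide gives no cutoff, so the 30s budget below is a made-up value (half the 60s deadline), and the per-URL cost is again an assumption derived from "100k URLs · ~100s". `lanes_for_job` is a hypothetical name.

```python
# Assumptions, not values from the PR:
PER_URL_SECONDS = 100 / 100_000   # about 1 ms per URL
SINGLE_LANE_BUDGET_SECONDS = 30   # hypothetical: half of the 60s deadline
SPREAD_LANES = 16                 # from the slide


def lanes_for_job(url_count: int) -> int:
    """Use one lane while a single lane fits the budget, otherwise spread."""
    single_lane_seconds = url_count * PER_URL_SECONDS
    if single_lane_seconds <= SINGLE_LANE_BUDGET_SECONDS:
        return 1  # small job: no merge tax on reads
    return SPREAD_LANES


print(lanes_for_job(1_000))    # small job
print(lanes_for_job(100_000))  # big job
```

With these numbers a 1,000-URL job stays in one lane and a 100,000-URL job is spread across 16.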
And the very largest?
1M still nears the limit — so accept now, save in the background (next step)
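What "accept now, save in the background" could look like, as a rough sketch of the next step rather than anything in this PR. It uses an in-process queue and a worker thread from the Python standard library; a real system would more likely use a durable queue. `accept`, `worker`, and the `saved` dict are hypothetical stand-ins.

```python
import queue
import threading

pending = queue.Queue()  # jobs accepted but not yet saved
saved = {}               # stand-in for real storage


def accept(job_id: str, urls: list) -> dict:
    """Acknowledge right away; the save happens later in the worker."""
    pending.put((job_id, list(urls)))
    return {"job_id": job_id, "status": "accepted"}


def worker() -> None:
    """Drain the queue and save each job in the background."""
    while True:
        job_id, urls = pending.get()
        saved[job_id] = urls  # stand-in for writing every URL to its lane
        pending.task_done()


threading.Thread(target=worker, daemon=True).start()

reply = accept("job-1", ["https://example.com/a", "https://example.com/b"])
print(reply["status"])  # returned before the save is guaranteed to have run
pending.join()          # demo only: wait for the background save to finish
print(len(saved["job-1"]))
```

Note the change in meaning: with this approach the reply says the job was accepted, not that every URL is already saved.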
Spread big jobs across lanes.
Fix the timeouts.
conveyor · PR #117