Tired of shuffling your training data each and every epoch? You really need to check this out!
Paweł from our data science team just open-sourced his NVMe Sampler, a library we use at RTB House while training our PyTorch models. With a bunch of NVMe drives, libaio and some black performance magic, this little tool can generate random batches for you at an astonishing speed: over 6 GB/s (or 5M records/s).
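To get a feel for the idea, here is a toy sketch. It is not NVMe Sampler's API, and every name in it (`RECORD_SIZE`, `write_records`, `random_batch`, `demo`) is made up for illustration. Instead of shuffling the whole dataset, it picks random record indices and reads those records straight from their offsets in the file. It assumes fixed-size records and uses plain synchronous `os.pread` calls on a Unix system, whereas the real library relies on libaio and multiple NVMe drives.

```python
import os
import random
import tempfile

RECORD_SIZE = 16  # bytes per record (hypothetical fixed-size layout)


def write_records(path, num_records):
    """Write num_records fixed-size records; each one stores its own index."""
    with open(path, "wb") as f:
        for i in range(num_records):
            f.write(i.to_bytes(RECORD_SIZE, "little"))


def random_batch(fd, num_records, batch_size, rng):
    """Read batch_size records from random offsets (sampling with replacement)."""
    indices = [rng.randrange(num_records) for _ in range(batch_size)]
    # os.pread reads at an absolute offset without moving the file position.
    records = [os.pread(fd, RECORD_SIZE, i * RECORD_SIZE) for i in indices]
    return indices, records


def demo(num_records=1000, batch_size=8, seed=0):
    """Build a temporary record file and draw one random batch from it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.bin")
        write_records(path, num_records)
        fd = os.open(path, os.O_RDONLY)
        try:
            return random_batch(fd, num_records, batch_size, random.Random(seed))
        finally:
            os.close(fd)
```

Calling `demo()` returns the sampled indices together with the matching records. Reading one small record at a time like this is far slower than the numbers above; the sketch only shows the random-access pattern, not how to make it fast.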
See the README for all the details. Here’s just a little architecture preview:
Never wait for data shuffling again!