Ultrafast NVMe Sampler

Random batch generation at 6 GB/s (or 5M records/s).

Posted by Bartłomiej Romański on March 2, 2018

Tired of shuffling your training data every single epoch? You really need to check this out!

Paweł from our data science team has just open-sourced his NVMe Sampler, a library we use at RTB House to train our PyTorch models. With a bunch of NVMe drives, libaio and a bit of performance black magic, this little tool generates random batches at an astonishing speed: over 6 GB/s (or 5M records/s).
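The core idea can be illustrated with a minimal sketch: instead of reshuffling a dataset on disk every epoch, store fixed-size records and read a random subset of them straight from the drive for each batch. This is not the NVMe Sampler's API. It is a simplified Python illustration under several assumptions: records have a fixed size (RECORD_SIZE = 1200 bytes, roughly what 6 GB/s divided by 5M records/s works out to), the helpers write_records, read_record and sample_batch are hypothetical, and a thread pool issuing os.pread calls stands in for the asynchronous reads the real library does with libaio. os.pread is available on Unix only.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fixed record size in bytes (~6 GB/s / 5M records/s).
RECORD_SIZE = 1200


def write_records(path, num_records, record_size=RECORD_SIZE):
    """Write num_records fixed-size records; record i is filled with byte i % 256."""
    with open(path, "wb") as f:
        for i in range(num_records):
            f.write(bytes([i % 256]) * record_size)


def read_record(fd, index, record_size=RECORD_SIZE):
    """Read one record with a positional read, so threads never share a file offset."""
    data = os.pread(fd, record_size, index * record_size)
    if len(data) != record_size:
        raise IOError(f"short read for record {index}")
    return data


def sample_batch(fd, num_records, batch_size, pool, rng, record_size=RECORD_SIZE):
    """Pick batch_size distinct random record indices and fetch them concurrently.

    Assumption: sampling is without replacement within a batch. Threads are a
    stand-in for kernel-level async I/O (libaio) that keeps many reads in flight.
    """
    indices = rng.sample(range(num_records), batch_size)
    records = list(pool.map(lambda i: read_record(fd, i, record_size), indices))
    return indices, records


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "records.bin")
    write_records(path, num_records=1000)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=32) as pool:
            indices, batch = sample_batch(fd, 1000, 64, pool, random.Random(0))
            # Each record's first byte identifies which index it came from.
            print(len(batch), all(r[0] == i % 256 for i, r in zip(indices, batch)))
    finally:
        os.close(fd)
```

Keeping many reads in flight matters because NVMe drives typically reach their peak random-read throughput only at high queue depths; a production implementation would submit those reads through asynchronous kernel I/O rather than Python threads.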

See the README for all the details. Here's a short preview of the architecture:

[Figure: NVMe Sampler architecture preview]

Never wait for data shuffling again!