
Ultrafast NVMe Sampler

Last Updated on: 13th June 2024, 08:14 pm

Random batch generation at 6 GB/s (or 5M records/s).

Tired of shuffling your training data each and every epoch? You really need to check this out!

Paweł from our data science team just open-sourced his NVMe Sampler, a library we use at RTB House to train our PyTorch models. With a bunch of NVMe drives, libaio, and some black performance magic, this little tool can generate random batches for you at an astonishing speed: over 6 GB/s (or 5M records/s).
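To give a feel for the idea, here is a minimal Python sketch of random-access batch sampling. It is not the NVMe Sampler API and comes nowhere near its speed: it assumes fixed-size records in a single file and uses plain single-threaded `os.pread` calls (Unix only) rather than libaio across several drives. The names `RECORD_SIZE`, `make_demo_file`, and `sample_batch` are made up for this illustration.

```python
import os
import random
import tempfile

RECORD_SIZE = 64  # bytes per record; a hypothetical fixed-size layout


def make_demo_file(num_records):
    """Write num_records records; each starts with its own index (8 bytes, big-endian)."""
    f = tempfile.NamedTemporaryFile(delete=False)
    for i in range(num_records):
        f.write(i.to_bytes(8, "big") + bytes(RECORD_SIZE - 8))
    f.close()
    return f.name


def sample_batch(fd, num_records, batch_size, rng):
    """Pick batch_size record indices uniformly at random (with replacement)
    and fetch each record with a positional read, without shuffling the file."""
    indices = [rng.randrange(num_records) for _ in range(batch_size)]
    records = [os.pread(fd, RECORD_SIZE, i * RECORD_SIZE) for i in indices]
    return indices, records


if __name__ == "__main__":
    path = make_demo_file(1000)
    fd = os.open(path, os.O_RDONLY)
    try:
        indices, records = sample_batch(fd, 1000, 8, random.Random())
        for i, rec in zip(indices, records):
            print(i, int.from_bytes(rec[:8], "big"))
    finally:
        os.close(fd)
        os.unlink(path)
```

Because every batch is drawn by reading random offsets directly, the dataset on disk never has to be reshuffled between epochs. The cost moves to random-read throughput, which is where fast NVMe drives and asynchronous I/O come in.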

See the README for all the details. Here’s just a little architecture preview:

[Figure: NVMe Sampler architecture diagram]

Never wait for data shuffling again!
