Ray Data + Ray Train: Data loading for training

Thanks for taking this survey and helping to grow Ray Data! This survey's results will be used to plan the Ray Data roadmap for v2.10+. The goal is to understand your workload needs in data loading for training.

The survey has two sections and should take less than 10 minutes to complete.
Email *
Workload description

Please describe your workload in the following questions.

If you expect your workload to change (e.g., you will need to handle a larger data scale), please also note a range and an approximate timeline if possible.

What is your role/organization/company?

What is your goal? Ex: “I want to train ResNet-50 on the ImageNet dataset and do it as quickly (or as cheaply) as possible.”

What type of preprocessing do you want to do? Ex: “I want to randomly crop each image.”

What is the cluster scale? Ex: “1-4 GPUs, and I want up to 4 additional CPU-only nodes.”

What is the data format, storage, and scale? Ex: “100GB of JPEG images stored on S3, each image as an individual file. The pathnames are stored in Parquet files on S3. I want to split the dataset into 90% training data and 10% test data.”

Are you using Ray Data and Ray Train already? If not, what are you using and why?
This form was created inside of Anyscale, Inc.