Machine Learning Applications - Behavioural Checks and Failures
In Effective Testing for Machine Learning Systems (https://www.jeremyjordan.me/testing-ml/), we explored how machine learning testing is different from machine learning evaluation.

While ML evaluation is concerned with model performance (e.g., accuracy), ML testing focuses on the model's learned behaviour: does the model behave the way we expect? For example (a sketch of such checks follows this list):

- Sentiment classifier models should be invariant to the names of people mentioned
- Road segmentation models should work regardless of weather conditions
- Phishing probability shouldn't go down if the URL changes from https to http
- Fraud probability on product reviews shouldn't depend on customer gender

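As an illustration, the first and third expectations above, invariance to person names and a directional expectation for https vs. http, might be checked with tests along these lines. This is a minimal sketch: `predict_sentiment` and `predict_phishing_probability` are hypothetical model wrappers, not functions from any particular library, and the tolerance values are arbitrary.

```python
# Sketch of behavioural checks; `predict_sentiment` and
# `predict_phishing_probability` are hypothetical model wrappers.

def predict_sentiment(text: str) -> float:
    """Placeholder: return a sentiment score in [0, 1] for `text`."""
    raise NotImplementedError

def predict_phishing_probability(url: str) -> float:
    """Placeholder: return the phishing probability for `url`."""
    raise NotImplementedError

def test_sentiment_invariant_to_names():
    # Invariance test: swapping the person's name should barely move the score.
    template = "{name} delivered a wonderful presentation today."
    scores = [predict_sentiment(template.format(name=n))
              for n in ("Alice", "Mohammed", "Priya", "Chen")]
    # 0.05 is an illustrative tolerance, not a recommended value.
    assert max(scores) - min(scores) < 0.05, "Sentiment varies with the name mentioned"

def test_phishing_probability_directional():
    # Directional expectation test: downgrading https to http should not
    # lower the phishing probability.
    https_score = predict_phishing_probability("https://example.com/login")
    http_score = predict_phishing_probability("http://example.com/login")
    assert http_score >= https_score, "Phishing probability dropped after https -> http"
```

Run with a test runner such as pytest against a real model wrapper, checks like these flag exactly the kinds of failure modes this survey asks about.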
To better understand how to test for such behaviour, we're collecting these scenarios via this survey and will share what we learn.
What is the application domain? *
What logical rules do you expect your model to follow? *
What is the input data type? *
What unexpected behaviour (e.g., model failure modes) have you come across?
If left unchecked, how often does this unexpected behaviour occur?
Anything else you would like to mention?
If you're open to being contacted about this response, please leave an email below.