Machine Learning Applications - Behavioural Checks and Failures

In Effective Testing for Machine Learning Systems (https://www.jeremyjordan.me/testing-ml/), we explored how machine learning testing is different from machine learning evaluation.

While in ML evaluation we're concerned with model performance (e.g., accuracy), in ML testing we focus on the model's learned behaviour. Is it behaving the way we expect it to behave? For example:

- Sentiment classifier models should be invariant to the names of the people mentioned
- Road segmentation models should work regardless of weather conditions
- Phishing probability shouldn't go down if the URL changes from https to http
- Fraud probability on product reviews shouldn't depend on customer gender
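
Expectations like these can be written as small behavioural checks. The Python sketch below is only an illustration: `toy_sentiment` and `toy_phishing` are hypothetical stand-in models, and the helpers `invariance_failures` and `directional_ok` are names made up for this example. In practice you would pass in your own model's prediction function.

```python
from typing import Callable, List, Sequence


def toy_sentiment(text: str) -> float:
    """Hypothetical stand-in model: share of positive keywords, in [0, 1]."""
    positive = {"great", "love", "helpful"}
    negative = {"rude", "terrible", "hate"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    if pos + neg == 0:
        return 0.5  # no sentiment keywords found: neutral score
    return pos / (pos + neg)


def toy_phishing(url: str) -> float:
    """Hypothetical stand-in model: crude phishing score for a URL."""
    score = 0.2
    if url.startswith("http://"):
        score += 0.3  # unencrypted scheme treated as riskier
    if "login" in url:
        score += 0.2
    return score


def invariance_failures(
    predict: Callable[[str], float],
    template: str,
    names: Sequence[str],
    tol: float = 1e-6,
) -> List[str]:
    """Return the names whose score differs from the first name's score."""
    scores = {name: predict(template.format(name=name)) for name in names}
    baseline = scores[names[0]]
    return [name for name, s in scores.items() if abs(s - baseline) > tol]


def directional_ok(
    predict: Callable[[str], float], before: str, after: str
) -> bool:
    """Check that the score does not go down when `before` becomes `after`."""
    return predict(after) >= predict(before)


# Invariance check: swapping the person's name should not change the score.
name_failures = invariance_failures(
    toy_sentiment,
    "{name} was really helpful and I love the service.",
    ["Alice", "Mohammed", "Mei"],
)

# Directional check: https -> http should not lower the phishing score.
scheme_ok = directional_ok(
    toy_phishing,
    "https://example.com/login",
    "http://example.com/login",
)
```

An empty `name_failures` list and a true `scheme_ok` mean the stand-in models satisfy both rules. A real model would be checked against many templates and URL pairs, not a single case.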

To better understand how to test for this, we're collecting such scenarios through this survey and will share what we learn.

What is the application domain? *

What logical rules do you expect your model to follow? *

What is the input data type? *

Tabular

Text

Image

Other:

What unexpected behaviour (e.g., model failure modes) have you come across?

If left unchecked, how often does this unexpected behaviour occur?

Daily

Weekly

Monthly

Quarterly

Yearly

Anything else you would like to mention?

If you're open to being contacted about this response, please leave an email below.
