Suppose we are performing gradient descent to minimize the empirical risk of the linear regression model y = theta_0 + theta_1*x1 + theta_2*(x1^2) + theta_3*x2 on a dataset with 100 observations. Let D be the number of components in the gradient (e.g., D = 2 for the equation in the previous question). What is D for the gradient used to optimize this linear regression model?
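
One way to check the count is to build the gradient explicitly and look at its length. The sketch below is only an illustration under stated assumptions: it takes mean squared error as the empirical risk (the question does not name a loss), uses randomly generated placeholder data, and the names `design_row` and `mse_gradient` are hypothetical helpers. The gradient holds one partial derivative per parameter, so its length depends on the number of theta terms and not on the 100 observations.

```python
import random


def design_row(x1, x2):
    # One feature per parameter: intercept, x1, x1^2, x2.
    return [1.0, x1, x1 ** 2, x2]


def mse_gradient(theta, X, y):
    # Gradient of (1/n) * sum((theta . row - target)^2) with respect to theta.
    # Assumption: mean squared error is the empirical risk.
    n = len(X)
    grad = [0.0] * len(theta)
    for row, target in zip(X, y):
        residual = sum(t * f for t, f in zip(theta, row)) - target
        for j, f in enumerate(row):
            grad[j] += 2.0 * residual * f / n
    return grad


random.seed(0)
# Placeholder data: 100 observations of (x1, x2) with arbitrary targets.
data = [(random.random(), random.random()) for _ in range(100)]
X = [design_row(x1, x2) for x1, x2 in data]
y = [random.random() for _ in range(100)]

theta = [0.0, 0.0, 0.0, 0.0]  # theta_0 .. theta_3
grad = mse_gradient(theta, X, y)
print(len(grad))  # number of gradient components, D
```

Changing the number of observations in `data` leaves `len(grad)` unchanged, while adding or removing a term in `design_row` (and the matching entry in `theta`) changes it.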