Suppose you have built a neural network and decide to initialize all the weights and biases to zero. Which of the following statements are true? (Check all that apply.)
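For intuition behind this question, here is a minimal sketch (an assumed tiny two-layer sigmoid network, not the quiz's exact setup) of the symmetry problem with zero initialization: every hidden unit computes the same activation, receives the same gradient, and so never differentiates during training.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))              # 4 samples, 3 features
y = rng.integers(0, 2, size=(4, 1))      # binary targets

W1, b1 = np.zeros((3, 5)), np.zeros(5)   # all-zero initialization
W2, b2 = np.zeros((5, 1)), np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(3):                    # a few gradient-descent steps
    h = sigmoid(x @ W1 + b1)             # identical activation in every hidden unit
    out = sigmoid(h @ W2 + b2)
    d_out = out - y                      # gradient of cross-entropy through sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)   # compute before updating W2
    W2 -= 0.1 * (h.T @ d_out)
    W1 -= 0.1 * (x.T @ d_h)              # bias updates omitted for brevity
    # all columns of W1 remain identical: the hidden units are clones
    print(step, np.allclose(W1, W1[:, [0]]))
```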
Out of the given options, what is the best choice for the number of clusters (k)?
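The question's options are not shown here, but a common heuristic for picking k is the elbow method. A small illustrative sketch (synthetic data with 4 true clusters is assumed):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Inertia (within-cluster sum of squares) drops sharply until k reaches
# the true cluster count, then flattens out; pick k at that "elbow".
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(1, 8):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(f"k={k}: inertia={inertia:.1f}")  # look for the elbow near k=4
```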
In general, if you were given a choice of activation function where both sigmoid and Leaky ReLU could be used, which one would you tend to prefer?
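A short sketch of the usual argument for Leaky ReLU: the sigmoid's gradient vanishes for inputs of large magnitude, while Leaky ReLU's gradient never drops below the leak slope (0.01 is assumed here):

```python
import numpy as np

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

sig = 1.0 / (1.0 + np.exp(-z))
sig_grad = sig * (1 - sig)                 # ~0 at |z| = 10 (vanishing gradient)

alpha = 0.01                               # assumed leak slope
leaky_grad = np.where(z > 0, 1.0, alpha)   # never smaller than alpha

print("sigmoid grad:   ", np.round(sig_grad, 5))
print("leaky ReLU grad:", leaky_grad)
```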
What is a network that involves backward links from the output to the input and hidden layers called?
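The backward links this question presumably points at are the defining feature of recurrent architectures. A minimal sketch (sizes and weights are arbitrary, for illustration only) of a state looping back into the next step's computation, unlike a purely feed-forward network:

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(3, 5))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(5, 5))   # hidden -> hidden (the backward link)

h = np.zeros(5)
for t in range(4):                          # unroll over 4 time steps
    x_t = rng.normal(size=3)
    h = np.tanh(x_t @ W_xh + h @ W_hh)      # previous state feeds back in
print(h)
```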