NLP 2021 @ BGU.CS
Lecture #10 / Quiz #09
Consider the basic Viterbi algorithm:

Input: a sentence x_1 ... x_n, parameters q(s | u, v) and e(x | s).

Definitions: Define K to be the set of possible tags. Define K_{-1} = K_0 = {*}, and K_k = K for k = 1 ... n.

Initialization: Set π(0, *, *) = 1.

Algorithm:
For k = 1 ... n, for u ∈ K_{k-1}, v ∈ K_k:
    π(k, u, v) = max_{w ∈ K_{k-2}} ( π(k − 1, w, u) × q(v | w, u) × e(x_k | v) )

Return max_{u ∈ K_{n-1}, v ∈ K_n} ( π(n, u, v) × q(STOP | u, v) )

What is the complexity of this algorithm, which produces a sequence of n tags for a sentence of n words over a tag set of size |K|?
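For reference, a minimal Python sketch of the recurrence above (not part of the quiz): the callables q and e stand in for q(v | w, u) and e(x | v), K is the tag set, and the back-pointer bookkeeping for recovering the tag sequence is an added assumption.

```python
# Minimal sketch of the trigram Viterbi recurrence above.
# q(v, w, u) ~ q(v | w, u); e(x, v) ~ e(x | v); K is the set of tags.
def viterbi(xs, K, q, e):
    n = len(xs)

    def tags(k):                          # K_k: {*} for k <= 0, K otherwise
        return {"*"} if k <= 0 else K

    pi = {(0, "*", "*"): 1.0}             # Initialization: pi(0, *, *) = 1
    bp = {}                               # back-pointers for sequence recovery
    for k in range(1, n + 1):
        for u in tags(k - 1):
            for v in tags(k):
                # pi(k, u, v) = max over w in K_{k-2}
                best_w, best = max(
                    ((w, pi[(k - 1, w, u)] * q(v, w, u) * e(xs[k - 1], v))
                     for w in tags(k - 2)),
                    key=lambda pair: pair[1],
                )
                pi[(k, u, v)] = best
                bp[(k, u, v)] = best_w
    # Return step: max over u in K_{n-1}, v in K_n of pi(n, u, v) * q(STOP | u, v)
    (u, v), score = max(
        (((u, v), pi[(n, u, v)] * q("STOP", u, v))
         for u in tags(n - 1) for v in tags(n)),
        key=lambda pair: pair[1],
    )
    seq = [u, v]
    for k in range(n, 2, -1):             # walk back-pointers to position 1
        seq.insert(0, bp[(k, seq[0], seq[1])])
    return seq[-n:], score                # trim leading '*' when n == 1
```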
In the Maximum Entropy Markov Model (MEMM), we model the sequential tagging problem of finding a sequence of tags given a sequence of words with the following independence assumption:

Chain rule: p(t_1, t_2, ..., t_n | x_1, ..., x_n) = Π_{i=1..n} p(t_i | t_1, ..., t_{i−1}, x_1, ..., x_n)

Independence assumption: p(t_i | t_1, ..., t_{i−1}, x_1, ..., x_n) = p(t_i | t_{i−1}, x_1, ..., x_n)

In addition, we model the conditional distribution using a log-linear distribution with a feature extraction function Φ and parameters w.

What are the parameters of the feature extraction function Φ?
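To make the local, per-position nature of Φ concrete, here is a toy feature extractor in the usual MEMM style; the signature and the feature templates are illustrative assumptions, not the quiz's answer key.

```python
# Toy MEMM-style local feature extractor: one sparse feature map per
# tagging decision. The argument list is an assumption for illustration.
def phi(words, i, t_prev, t):
    return {
        f"word={words[i]}+tag={t}": 1.0,          # emission-like feature
        f"prev_tag={t_prev}+tag={t}": 1.0,        # transition-like feature
        f"suffix3={words[i][-3:]}+tag={t}": 1.0,  # lexical-shape feature
    }
```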
Write the formula of the log-linear model (using the softmax function) with feature function Φ and parameters w:

p(t_i | t_{i−1}, x_1, ..., x_n) = softmax(__________________________________________)
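As a reminder, the softmax itself is a generic normalizer over per-tag scores (this is its definition, not the filled-in blank); a minimal Python version:

```python
import math

# Generic softmax: turns a vector of real-valued scores (one per
# candidate tag) into a probability distribution over the tags.
def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [z / total for z in exps]
```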
Is decoding necessary in the MEMM approach? Why?
In the CRF model, we model the sequence tagging problem with a global feature function and parameters w as follows:

x = (x_1, ..., x_n)
t = (t_1, t_2, ..., t_n)

p(t | x, w) = softmax( w · Φ(x, t) )

What are the two computational challenges that must be overcome to solve such a model?
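To see where the computational difficulty comes from, note that the softmax normalizer in p(t | x, w) sums over every possible tag sequence. The brute-force sketch below (function and variable names are assumptions for illustration) makes the |K|^n blow-up explicit:

```python
import itertools
import math

# Brute-force CRF log-normalizer: log Z(x) = log of the sum over all
# tag sequences t of exp(w . Phi(x, t)). Enumerates |K|**len(x)
# sequences, so it is feasible only for toy inputs; w is a dict of
# feature weights and phi_global returns a sparse global feature map.
def log_Z_bruteforce(x, K, w, phi_global):
    return math.log(sum(
        math.exp(sum(w.get(f, 0.0) * v for f, v in phi_global(x, t).items()))
        for t in itertools.product(K, repeat=len(x))
    ))
```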