Methods In Unsupervised Dependency Parsing PDF Free Download

2y ago

46 Views

1 Downloads

1.72 MB

118 Pages

Report/dmca

Download PDF

Transcription

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionMethods in Unsupervised Dependency ParsingMohammad Sadegh RasooliCandidacy examDepartment of Computer ScienceColumbia UniversityApril 1st, 2016Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionOverview1234IntroductionDependency GrammarDependency ParsingFully Unsupervised Parsing ModelsUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionSyntactic Transfer ModelsApproaches in Syntactic TransferDirect Syntactic TransferAnnotation ProjectionDiscussionConclusionMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionDependency GrammarDependency ParsingDependency GrammarGIEFA formal grammar introduced by[Tesnière, 1959] inspired from thevalency theory in ChemistryDACBIIn a dependency tree, each word has exactly one parentand can have as many dependentsIBenefit: explicit representation of syntactic rolespuncpcnmodsbjobjnmodnmodnmodEconomic news had little effect on financial markets .Mohammad Sadegh RasooliMethods in Unsupervised Dependency ParsingCH3

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionDependency GrammarDependency ParsingDependency ParsingIIState-of-the-art parsing models are very accurateRequirement: large amounts of annotated treesII 50 treebanks available, '7000 languages without anytreebankTreebank development: an expensive and time-consumingtaskIIFive years of work for the Penn Chinese Treebank[Hwa et al., 2005]Unsupervised dependency parsing is an alternativeapproach when no treebank is availableMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionOverview1234IntroductionDependency GrammarDependency ParsingFully Unsupervised Parsing ModelsUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionSyntactic Transfer ModelsApproaches in Syntactic TransferDirect Syntactic TransferAnnotation ProjectionDiscussionConclusionMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionUnsupervised ParsingIIGoal: Develop an accurate parser without annotated dataCommon assumptionsIIPart-of-speech (POS) information is availableRaw data is availableMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionInitial AttemptsLearning LanguageIThe seminal work of[Carroll and Charniak, 1992] and[Paskin, 2002] tried differenttechniques and achieved interestingresultsITheir models could not beat thebaseline of attaching every word tothe next wordSupervised NLPMohammad Sadegh RasooliUnsupervised NLPMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionDMV: the First BreakthroughIDependency model with valence (DMV)[Klein and Manning, 2004] is the first model that could beatthe baselineIMost papers extended the DMV either in the inferencemethod or parameter definitionMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionThe Dependency Model with ValenceIIIIIInput x, output y, p(x, y θ) p(y(0) , θ)θc for dependency attachmentθs for stopping getting dependentsadj(j): true iff xj is adjacent to its parentdepdir (j) set of dependents for xj in direction dirRecursive calculationYP (y(i) xi , θ) ?θs (stop xi , dir, [depdir (i) ])dir { , } Y(1 θs (stop xi , dir, adj(j)))j ydir (i) θc (xj xi , dir) P (y(j) , θ)Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionDMV: A Running ExampleROOT PRN VB DT NNP (yP (y(2)(0)) θc (VB ROOT, ) P (y(2) VB, θ) V B, θ) θs (stop VB, , true) (1 θs (stop VB, , false)) θc (PRN VB, ) P (y(1) PRN, θ) θs (stop VB, , true) (1 θs (stop VB, , false)) θc (NN VB, ) P (yP (yP (y(1)(4)(4) NN,θ) P RN, θ) θs (stop PRN, , false) θs (stop PRN, , false) N N, θ) θs (stop NN, , true) (1 θs (stop NN, , false)) θc (DT N N, ) P (y(3) DT, θ) θs (stop NN, , false)P (y(3) DT, θ) θs (stop DT, , false) θs (stop DT, , false)Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionDMV: A Running ExampleROOT PRN VB DT NNP (yP (y(2)(0)) θc (VB ROOT, ) P (y(2) VB, θ) V B, θ) θs (stop VB, , true) (1 θs (stop VB, , false)) θc (PRN VB, ) P (y(1) PRN, θ) θs (stop VB, , true) (1 θs (stop VB, , false)) θc (NN VB, ) P (yP (yP (y(1)(4)(4) NN, θ) P RN, θ) θs (stop PRN, , false) θs (stop PRN, , false) N N, θ) θs (stop NN, , true) (1 θs (stop NN, , false)) θc (DT N N, ) P (y(3) DT, θ) θs (stop NN, , false)P (y(3) DT, θ) θs (stop DT, , false) θs (stop DT, , false)Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionDMV: Parameter EstimationIParameter estimation based on occurrence counts; e.g.count(wi wj )0w0 V count(wi w )θc (wj wi , ) PIIn an unsupervised setting, we can use dynamicprogramming (the Inside-Outside algorithm[Lari and Young, 1990]) to estimate model parameters θMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionProblems with DMVIA non-convex optimization problem forDMVIILocal optima is not necessarily a globaloptimaVery sensitive to the initializationIEncoding constraints is not embedded in the original modelILack of expressivenessILow supervised accuracy (upperbound)INeeds inductive biasIPost-processing the DMV output byfixing the determiner-noun directiongave a huge improvement[Klein and Manning, 2004]Mohammad Sadegh RasooliDET NOUNMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionExtensions to DMVIChanging the learning algorithm from EMIIContrastive estimation [Smith and Eisner, 2005]Bayesian models [Headden III et al., 2009, Cohen and Smith, 2009a,Blunsom and Cohn, 2010, Naseem et al., 2010,Mareček and Straka, 2013]ILocal optima problemIISwitching between different objectives [Spitkovsky et al., 2013]Lack of expressivenessIIIILexicalization [Headden III et al., 2009]Parameter tying [Cohen and Smith, 2009b, Headden III et al., 2009]Tree substitution grammars [Blunsom and Cohn, 2010]Rereanking with a richer model [Le and Zuidema, 2015]Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionExtensions to DMVIInductive biasIAdding constraintsIIIIPosterior regularization [Gillenwater et al., 2010]Forcing unambiguity [Tu and Honavar, 2012]Universal knowledge [Naseem et al., 2010]Stop probability estimation from raw text[Mareček and Straka, 2013]IAlternatives to DMVIConvex objective based on convex hull of plausible trees[Grave and Elhadad, 2015]Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionCommon Learning Algorithms for DMVIExpectation maximization (EM) [Dempster et al., 1977]IPosterior regularization (PR) [Ganchev et al., 2010]IVariational Bayes (VB) [Beal, 2003]IPR VB [Naseem et al., 2010]Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionExpectation Maximization (EM) AlgorithmIIStart with initial parameters θ(t) in iteration t 1Repeat until θ(t) ' θ(t 1)IE step: Maximize the posterior probability i 1 . . . N ; y Yxi(t)qipθ(t) (xi , y)0y 0 Yx pθ (t) (xi , y ) pθ(t) (y x) PiIM step: Maximize the parameter values θθ(t 1) arg maxθIN XX(t)qi (y) log pθ (xi , y)i 1 y Yxit t 1Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionExpectation Maximization (EM) AlgorithmIIStart with initial parameters θ(t) in iteration t 1Repeat until θ(t) ' θ(t 1)IE step: Maximize the posterior probability i 1 . . . N ; y Yxi(t)qipθ(t) (xi , y)0y 0 Yx pθ (t) (xi , y ) pθ(t) (y x) PiAnother interpretation of the E step [Neal and Hinton, 1998]q (t) arg min KL(q(Y ) pθ(t) (Y X))qIt t 1Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionExpectation Maximization (EM) AlgorithmIIStart with initial parameters θ(t) in iteration t 1Repeat until θ(t) ' θ(t 1)M stepOptimal parameters for a categorical distribution is achieved bynormalization:PN (t)i 1 qi (y x)θ(t 1) (y x) P P(t) 0Ny0i 1 qi (y x)IM step: Maximize the parameter values θθI(t 1) arg maxθN XX(t)qi (y) log pθ (xi , y)i 1 y Yxit t 1Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionCommon Learning Algorithms for DMVIExpectation maximization (EM) [Dempster et al., 1977]IPosterior regularization (PR) [Ganchev et al., 2010]IVariational Bayes (VB) [Beal, 2003]IPR VB [Naseem et al., 2010]Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionPosterior RegularizationIPrior knowledge as constraintIJust affects the E step and the M step remains unchangedMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionPosterior RegularizationOriginal objectiveq (t) arg min KL(q(Y ) pθ(t) (Y X))qModified objectiveq (t) arg min KL(q(Y ) pθ(t) (Y X)) σqXbiis.t. Eq [φi (X, Y )] β biσ is the regularization coefficient and bi is the proposed numericalconstraint for sentence i.Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionPosterior Regularization ConstraintsModified objectiveq (t) arg min KL(q(Y ) pθ(t) (Y X)) σqXbiiTypes of constraints:INumber of unique child-head tag pairs in a sentence (less isbetter) [Gillenwater et al., 2010]INumber of preserved pre-defined linguistic rules in a tree(more is better) [Naseem et al., 2010]IInformation entropy of the sentence (less is better)[Tu and Honavar, 2012]Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionCommon Learning Algorithms for DMVIExpectation maximization (EM) [Dempster et al., 1977]IPosterior regularization (PR) [Ganchev et al., 2010]IVariational Bayes (VB) [Beal, 2003]IPR VB [Naseem et al., 2010]Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionVariational BayesIA Bayesian model that encodes prior informationIJust affects the M step and the E step remains unchangedMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionVariational BayesM stepθ(t 1)(t)i 1 qi (y x)P PN (t) 0y0i 1 qi (y x)PN(y x) Modified M step in VBθ(t 1)P(t)F(αy Ni 1 qi (y x))(y x) PP(t) 0F( y0 αy0 Ni 1 qi (y x))α is the priorF(v) eΨ(v)Ψ is the digamma functionMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionCommon Learning Algorithms for DMVIExpectation maximization (EM) [Dempster et al., 1977]IPosterior regularization (PR) [Ganchev et al., 2010]IVariational Bayes (VB) [Beal, 2003]IPR VB [Naseem et al., 2010]Mohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionVB PRIMakes use of both methods [Naseem et al., 2010]:IIE step as in PRM step as in VBMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionDiscussionISignificant improvements?IIYes!Satisfying performance?INo!IIMostly optimized for EnglishFar less than a supervised modelMohammad Sadegh RasooliMethods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionUnsupervised Parsing Improvement Over Time100[Klein and Manning, 2004]807060504033.635.930.1Mohammad Sadegh 020082009DMVRandom30Adjacentunlabeled dependency accuracy on WSJ testing data90Methods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionUnsupervised Parsing Improvement Over Time100[Cohen et al., 2008]8070605040.54033.635.930.1Mohammad Sadegh 020082009DMVRandom30Adjacentunlabeled dependency accuracy on WSJ testing data90Methods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionUnsupervised Parsing Improvement Over Time100[Cohen and Smith, 2009a]8070605040.5 41.44033.635.930.1Mohammad Sadegh 020082009DMVRandom30Adjacentunlabeled dependency accuracy on WSJ testing data90Methods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionUnsupervised Parsing Improvement Over Time100[Blunsom and Cohn, 2010]80706055.75040.5 41.44033.635.930.1Mohammad Sadegh 020082009DMVRandom30Adjacentunlabeled dependency accuracy on WSJ testing data90Methods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionUnsupervised Parsing Improvement Over Time100[Spitkovsky et al., 2011]807059.16055.75040.5 41.44033.635.930.1Mohammad Sadegh 020082009DMVRandom30Adjacentunlabeled dependency accuracy on WSJ testing data90Methods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionUnsupervised Parsing Improvement Over Time100[Spitkovsky et al., 2012]807059.16061.255.75040.5 41.44033.635.930.1Mohammad Sadegh 020082009DMVRandom30Adjacentunlabeled dependency accuracy on WSJ testing data90Methods in Unsupervised Dependency Parsing

IntroductionFully Unsupervised Parsing ModelsSyntactic Transfer ModelsConclusionUnsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussionUnsupervised Parsing Improvement Over Time100[Spitkovsky et al., 2013]807064.459.16061.255.75040.5 41.44033.635.930.1Mohammad Sadegh 020082009DMVRandom30Adjacentunlab

I State-of-the-art parsing models are very accurate I Requirement: large amounts of annotated trees I 50 treebanks available, ’7000 languages without any treebank I Treebank development: an exp