Stanza: A Python Natural Language Processing Toolkit for Many Human Languages


Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

Peng Qi*  Yuhao Zhang*  Yuhui Zhang  Jason Bolton  Christopher D. Manning
Stanford University, Stanford, CA 94305
{pengqi, yuhaozhang, yuhuiz}@stanford.edu, {jebolton, manning}@stanford.edu

Abstract

We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to the widely used Java Stanford CoreNLP software, which further extends its functionality to cover other tasks such as coreference resolution and relation extraction. Source code, documentation, and pretrained models for 66 languages are available at https://stanfordnlp.github.io/stanza/.

1 Introduction

The growing availability of open-source natural language processing (NLP) toolkits has made it easier for users to build tools with sophisticated linguistic processing. While existing NLP toolkits such as CoreNLP (Manning et al., 2014), FLAIR (Akbik et al., 2019), spaCy,¹ and UDPipe (Straka, 2018) have had wide usage, they also suffer from several limitations. First, existing toolkits often support only a few major languages. This has significantly limited the community's ability to process multilingual text.
Second, widely used tools are sometimes under-optimized for accuracy either due to a focus on efficiency (e.g., spaCy) or use of less powerful models (e.g., CoreNLP), potentially misleading downstream applications and insights obtained from them. Third, some tools assume input text has been tokenized or annotated with other tools, lacking the ability to process raw text within a unified framework. This has limited their wide applicability to text from diverse sources.

We introduce Stanza,² a Python natural language processing toolkit supporting many human languages. As shown in Table 1, compared to existing widely-used NLP toolkits, Stanza has the following advantages:

- From raw text to annotations. Stanza features a fully neural pipeline which takes raw text as input, and produces annotations including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.

- Multilinguality. Stanza's architectural design is language-agnostic and data-driven, which allows us to release models supporting 66 languages, by training the pipeline on the Universal Dependencies (UD) treebanks and other multilingual corpora.

- State-of-the-art performance. We evaluate Stanza on a total of 112 datasets, and find its neural pipeline adapts well to text of different genres, achieving state-of-the-art or competitive performance at each step of the pipeline.

Additionally, Stanza features a Python interface to the widely used Java CoreNLP package, allowing access to additional tools such as coreference resolution and relation extraction.

Stanza is fully open source and we make pretrained models for all supported languages and datasets available for public download. We hope Stanza can facilitate multilingual NLP research and applications, and drive future research that produces insights from human languages.

Figure 1: Overview of Stanza's neural NLP pipeline. Stanza takes multilingual text as input, and produces annotations accessible as native Python objects. Besides this neural pipeline, Stanza also features a Python client interface to the Java CoreNLP software. [The figure shows raw multilingual text flowing through the TOKENIZE, MWT, POS, LEMMA, DEPPARSE, and NER processors into native Python Document, Sentence, Token, and Word objects.]

Table 1: Feature comparisons of Stanza against other popular natural language processing toolkits.

    System    # Human Languages    Programming Language
    CoreNLP   6                    Java
    FLAIR     12                   Python
    spaCy     10                   Python
    UDPipe    61                   C++
    Stanza    66                   Python

    [Further columns comparing raw-text processing and state-of-the-art performance are not legible in this transcription.]

* Equal contribution. Order decided by a tossed coin.
¹ https://spacy.io/
² The toolkit was called StanfordNLP prior to v1.0.0.

2 System Design and Architecture

At the top level, Stanza consists of two individual components: (1) a fully neural multilingual NLP pipeline; (2) a Python client interface to the Java Stanford CoreNLP software. In this section we introduce their designs.

2.1 Neural Multilingual NLP Pipeline

Stanza's neural pipeline consists of models that range from tokenizing raw text to performing syntactic analysis on entire sentences (see Figure 1). All components are designed with processing many human languages in mind, with high-level design choices capturing common phenomena in many languages and data-driven models that learn the differences between these languages from data. Moreover, the implementation of Stanza components is highly modular, and reuses basic model architectures when possible for compactness. We highlight the important design choices here, and refer the reader to Qi et al.
(2018) for modeling details.

Tokenization and Sentence Splitting. When presented raw text, Stanza tokenizes it and groups tokens into sentences as the first step of processing. Unlike most existing toolkits, Stanza combines tokenization and sentence segmentation from raw text into a single module. This is modeled as a tagging problem over character sequences, where the model predicts whether a given character is the end of a token, end of a sentence, or end of a multi-word token (MWT, see Figure 2).³ We choose to predict MWTs jointly with tokenization because this task is context-sensitive in some languages.

³ Following Universal Dependencies (Nivre et al., 2020), we make a distinction between tokens (contiguous spans of characters in the input text) and syntactic words. These are interchangeable aside from the cases of MWTs, where one token can correspond to multiple words.

    (fr) L'Association des Hôtels
    (en) The Association of Hotels
    (fr) Il y a des hôtels en bas de la rue
    (en) There are hotels down the street

Figure 2: An example of multi-word tokens in French. The "des" in the first sentence corresponds to two syntactic words, "de" and "les"; the second "des" is a single word.

Multi-word Token Expansion. Once MWTs are identified by the tokenizer, they are expanded into the underlying syntactic words as the basis of downstream processing. This is achieved with an ensemble of a frequency lexicon and a neural sequence-to-sequence (seq2seq) model, to ensure that frequently observed expansions in the training set are always robustly expanded while maintaining flexibility to model unseen words statistically.

POS and Morphological Feature Tagging. For each word in a sentence, Stanza assigns it a part-of-speech (POS), and analyzes its universal morphological features (UFeats, e.g., singular/plural, 1st/2nd/3rd person, etc.). To predict POS and UFeats, we adopt a bidirectional long short-term memory network (Bi-LSTM) as the basic architecture. For consistency among universal POS (UPOS),

treebank-specific POS (XPOS), and UFeats, we adopt the biaffine scoring mechanism from Dozat and Manning (2017) to condition XPOS and UFeats prediction on that of UPOS.

Lemmatization. Stanza also lemmatizes each word in a sentence to recover its canonical form (e.g., did→do). Similar to the multi-word token expander, Stanza's lemmatizer is implemented as an ensemble of a dictionary-based lemmatizer and a neural seq2seq lemmatizer. An additional classifier is built on the encoder output of the seq2seq model, to predict shortcuts such as lowercasing and identity copy for robustness on long input sequences such as URLs.

Dependency Parsing. Stanza parses each sentence for its syntactic structure, where each word in the sentence is assigned a syntactic head that is either another word in the sentence, or in the case of the root word, an artificial root symbol. We implement a Bi-LSTM-based deep biaffine neural dependency parser (Dozat and Manning, 2017). We further augment this model with two linguistically motivated features: one that predicts the linearization order of two words in a given language, and the other that predicts the typical distance in linear order between them. We have previously shown that these features significantly improve parsing accuracy (Qi et al., 2018).

Named Entity Recognition. For each input sentence, Stanza also recognizes named entities in it (e.g., person names, organizations, etc.). For NER we adopt the contextualized string representation-based sequence tagger from Akbik et al. (2018). We first train a forward and a backward character-level LSTM language model, and at tagging time we concatenate the representations at the end of each word position from both language models with word embeddings, and feed the result into a standard one-layer Bi-LSTM sequence tagger with a conditional random field (CRF)-based decoder.

2.2 CoreNLP Client

Stanford's Java CoreNLP software provides a comprehensive set of NLP tools especially for the English language.
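The dictionary-plus-neural ensemble shared by the multi-word token expander and the lemmatizer above can be illustrated with a small sketch. The toy dictionary, the shortcut rule, and the lowercasing fallback below are illustrative assumptions standing in for Stanza's trained seq2seq models, not its actual implementation:

```python
def lemmatize(word, dictionary, seq2seq):
    """Toy ensemble: dictionary lookup first, shortcuts next, neural model last."""
    # 1. Dictionary-based lemmatizer: forms seen in training win outright.
    if word in dictionary:
        return dictionary[word]
    # 2. Shortcut stand-in: identity-copy non-alphabetic inputs such as URLs,
    #    rather than trusting a seq2seq decoder on them.
    if not word.isalpha():
        return word
    # 3. Neural seq2seq lemmatizer (here, a trivial stand-in) handles the rest.
    return seq2seq(word)

dictionary = {'did': 'do', 'went': 'go'}   # toy frequency lexicon
fallback = lambda w: w.lower()             # stand-in for the seq2seq model

print(lemmatize('did', dictionary, fallback))                  # do
print(lemmatize('https://stanza.run/', dictionary, fallback))  # https://stanza.run/
print(lemmatize('Walking', dictionary, fallback))              # walking
```

The same layering applies to MWT expansion, with the lexicon mapping frequent tokens (e.g., French "des") to their word sequences and the seq2seq model covering unseen tokens.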
However, these tools are not easily accessible with Python, the programming language of choice for many NLP practitioners, due to the lack of official support. To facilitate the use of CoreNLP from Python, we take advantage of the existing server interface in CoreNLP, and implement a robust client as its Python interface.

When the CoreNLP client is instantiated, Stanza will automatically start the CoreNLP server as a local process. The client then communicates with the server through its RESTful APIs, after which annotations are transmitted in Protocol Buffers, and converted back to native Python objects. Users can also specify JSON or XML as the annotation format. To ensure robustness, while the client is being used, Stanza periodically checks the health of the server, and restarts it if necessary.

3 System Usage

Stanza's user interface is designed to allow quick out-of-the-box processing of multilingual text. To achieve this, Stanza supports automated model download via Python code and pipeline customization with processors of choice. Annotation results can be accessed as native Python objects to allow for flexible post-processing.

3.1 Neural Pipeline Interface

Stanza's neural NLP pipeline can be initialized with the Pipeline class, taking a language name as an argument. By default, all processors will be loaded and run over the input text; however, users can also specify the processors to load and run with a list of processor names as an argument.
Users can additionally specify other processor-level properties, such as batch sizes used by processors, at initialization time.

The following code snippet shows a minimal usage of Stanza for downloading the Chinese model, annotating a sentence with customized processors, and printing out all annotations:

    import stanza
    # download Chinese model
    stanza.download('zh')
    # initialize Chinese neural pipeline
    nlp = stanza.Pipeline('zh', processors='tokenize,pos,ner')
    # run annotation over a sentence
    doc = nlp('斯坦福是一所私立研究型大学。')
    print(doc)

After all processors are run, a Document instance will be returned, which stores all annotation results. Within a Document, annotations are further stored in Sentences, Tokens, and Words in a top-down fashion (Figure 1). The following code snippet demonstrates how to access the text and POS tag of each word in a document and all named entities in the document:

    # print the text and POS of all words
    for sentence in doc.sentences:
        for word in sentence.words:
            print(word.text, word.pos)
    # print all entities in the document
    print(doc.entities)

Stanza is designed to be run on different hardware devices. By default, CUDA devices will be used whenever they are visible to the pipeline; otherwise CPUs will be used. However, users can force all computation to be run on CPUs by setting use_gpu=False at initialization time.

3.2 CoreNLP Client Interface

The CoreNLP client interface is designed in a way that the actual communication with the backend CoreNLP server is transparent to the user. To annotate an input text with the CoreNLP client, a CoreNLPClient instance needs to be initialized, with an optional list of CoreNLP annotators. After the annotation is complete, results will be accessible as native Python objects.

This code snippet shows how to establish a CoreNLP client and obtain the NER and coreference annotations of an English sentence:

    from stanza.server import CoreNLPClient
    # start a CoreNLP client
    with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos',
            'lemma', 'ner', 'parse', 'coref']) as client:
        # run annotation over input
        ann = client.annotate('Emily said that she liked the movie.')
        # access all entities
        for sent in ann.sentence:
            print(sent.mentions)
        # access coreference annotations
        print(ann.corefChain)

With the client interface, users can annotate text in 6 languages as supported by CoreNLP.

3.3 Interactive Web-based Demo

To help visualize documents and their annotations generated by Stanza, we build an interactive web demo that runs the pipeline interactively. For all languages and all annotations Stanza provides in those languages, we generate predictions from the models trained on the largest treebank/NER dataset, and visualize the result with the Brat rapid annotation tool.⁴ This demo runs in a client/server architecture, and annotation is performed on the server side. We make one instance of this demo publicly available at http://stanza.run/. It can also be
It can also berun locally with proper Python libraries installed.4https://brat.nlplab.org/Figure 3: Sta n z a annotates a German sentence, as visualized by our interactive demo. Note am is expandedinto syntactic words an and dem before downstreamanalyses are performed.An example of running Sta n z a on a German sentence can be found in Figure 3.3.4Training Pipeline ModelsFor all neural processors, Sta n z a providescommand-line interfaces for users to train theirown customized models. To do this, users needto prepare the training and development data incompatible formats (i.e., CoNLL-U format for theUniversal Dependencies pipeline and BIO formatcolumn files for the NER model). The followingcommand trains a neural dependency parser withuser-specified training and development data: python -m stanza.models.parser \--train file train.conllu \--eval file dev.conllu \--gold file dev.conllu \--output file output.conllu4Performance EvaluationTo establish benchmark results and compare withother popular toolkits, we trained and evaluatedSta n z a on a total of 112 datasets. All pretrainedmodels are publicly downloadable.Datasets. We train and evaluate Sta n z a ’s tokenizer/sentence splitter, MWT expander, POS/UFeatstagger, lemmatizer, and dependency parser withthe Universal Dependencies v2.5 treebanks (Zeman et al., 2019). For training we use 100 treebanks from this release that have non-copyrightedtraining data, and for treebanks that do not includedevelopment data, we randomly split out 20% of

the training data as development data. These treebanks represent 66 languages, mostly European languages, but spanning a diversity of language families, including Indo-European, Afro-Asiatic, Uralic, Turkic, Sino-Tibetan, etc. For NER, we train and evaluate Stanza with 12 publicly available datasets covering 8 major languages as shown in Table 3 (Nothman et al., 2013; Tjong Kim Sang and De Meulder, 2003; Tjong Kim Sang, 2002; Benikova et al., 2014; Mohit et al., 2012; Taulé et al., 2008; Weischedel et al., 2013). For the WikiNER corpora, as canonical splits are not available, we randomly split them into 70% training, 15% dev and 15% test splits. For all other corpora we used their canonical splits.

Training. On the Universal Dependencies treebanks, we tuned all hyper-parameters on several large treebanks and applied them to all other treebanks. We used the word2vec embeddings released as part of the 2018 UD Shared Task (Zeman et al., 2018), or the fastText embeddings (Bojanowski et al., 2017) whenever word2vec is not available. For the character-level language models in the NER component, we pretrained them on a mix of the Common Crawl and Wikipedia dumps, and the news corpora released by the WMT19 Shared Task (Barrault et al., 2019), except for English and Chinese, for which we pretrained on the Google One Billion Word (Chelba et al., 2013) and the Chinese Gigaword corpora, respectively. We again applied the same hyper-parameters to models for all languages.

Table 2: Neural pipeline performance comparisons on the Universal Dependencies (v2.5) test treebanks. For our system we show macro-averaged results over all 100 treebanks. We also compare our system against UDPipe and spaCy on treebanks of five major languages where the corresponding pretrained models are publicly available. All results are F1 scores produced by the 2018 UD Shared Task official evaluation script. [The table body, reporting scores including UAS and LAS for the overall 100-treebank average and for the Arabic-PADT, Chinese-GSD, English-EWT, and Spanish-AnCora treebanks among others, is garbled in this transcription.]

Universal Dependencies Results. For performance on UD treebanks, we compared Stanza (v1.0) against UDPipe (v1.2) and spaCy (v2.2) on treebanks of 5 major languages whenever a pretrained model is available. As shown in Table 2, Stanza achieved the best performance on most scores reported. Notably, we find that Stanza's language-agnostic architecture is able to adapt to datasets of different languages and genres. This is also shown by Stanza's high macro-averaged scores over 100 treebanks covering 66 languages.

NER Results. For performance of the NER component, we compared Stanza (v1.0) against FLAIR (v0.4.5) and spaCy (v2.2). For spaCy we reported results from its publicly available pretrained model whenever one trained on the same dataset can be found; otherwise we retrained its model on our datasets with default hyper-parameters, following the publicly available tutorial.⁶,⁷

⁶ https://spacy.io/usage/training#ner
⁷ Note that, following this public tutorial, we did not use pretrained word embeddings when training spaCy NER models, although using pretrained word embeddings may potentially improve the NER results.

For FLAIR, since their downloadable models were trained

on dataset versions different from canonical ones, we retrained all models on our own dataset splits with their best reported hyper-parameters. All test results are shown in Table 3. We find that on all datasets Stanza achieved either higher or close F1 scores when compared against FLAIR. When compared to spaCy, Stanza's NER performance is much better. It is worth noting that Stanza's high performance is achieved with much smaller models compared with FLAIR (up to 75% smaller), as we intentionally compressed the models for memory efficiency and ease of distribution.

Table 3: NER performance across different languages and corpora. All scores reported are entity micro-averaged test F1. For each corpus we also list the number of entity types. † marks results from publicly available pretrained models on the same dataset, while others are from models retrained on our datasets. [The table body, covering corpora including Arabic AQMAR, CoNLL03, OntoNotes, and French WikiNER, is garbled in this transcription.]

Speed comparison. We compare Stanza against existing toolkits to evaluate the time it takes to annotate text (see Table 4). For GPU tests we use a single NVIDIA Titan RTX card. Unsurprisingly, Stanza's extensive use of accurate neural models makes it take significantly longer than spaCy to annotate text, but it is still competitive when compared against toolkits of similar accuracy, especially with the help of GPU acceleration.

Table 4: Annotation runtime of various toolkits relative to spaCy (CPU) on the English EWT treebank and OntoNotes NER test sets. For reference, on the compared UD and NER tasks, spaCy is able to process 8140 and 5912 tokens per second, respectively.

    Task    Stanza (CPU)    Stanza (GPU)    UDPipe (CPU)    FLAIR (CPU)    FLAIR (GPU)
    UD      10.3x           3.22x           4.30x           -              -
    NER     17.7x           1.08x           -               51.8x          1.17x

5 Conclusion and Future Work

We introduced Stanza, a Python natural language processing toolkit supporting many human languages. We have shown that Stanza's neural pipeline not only has wide coverage of human languages, but also is accurate on all tasks, thanks to its language-agnostic, fully neural architectural design. Simultaneously, Stanza's CoreNLP client extends its functionality with additional NLP tools.

For future work, we consider the following areas of improvement in the near term:

- Models downloadable in Stanza are largely trained on a single dataset. To make models robust to many different genres of text, we would like to investigate the possibility of pooling various sources of compatible data to train "default" models for each language;

- The amount of computation and resources available to us is limited. We would therefore like to build an open "model zoo" for Stanza, so that researchers from outside our group can also contribute their models and benefit from models released by others;

- Stanza was designed to optimize for accuracy of its predictions, but this sometimes comes at the cost of computational efficiency and limits the toolkit's use. We would like to further investigate reducing model sizes and speeding up computation in the toolkit, while still maintaining the same level of accuracy;

- We would also like to expand Stanza's functionality by adding other processors such as neural coreference resolution or relation extraction for richer text analytics.

Acknowledgments

The authors would like to thank the anonymous reviewers for their comments, Arun Chaganty for his early contribution to this toolkit, Tim Dozat for his design of the original architectures of the tagger and parser models, Matthew Honnibal and Ines Montani for their help with spaCy integration and helpful comments on the draft, Ranting Guo for the logo design, and John Bauer and the community contributors for their help with maintaining and improving this toolkit.
This research is funded in part by Samsung Electronics Co., Ltd. and in part by the SAIL-JD Research Initiative.
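As a closing illustration of the character-tagging formulation of tokenization and sentence splitting described in Section 2.1, the decoding step that turns per-character predictions into tokens and sentences can be sketched as follows. The tag values and the whitespace handling here are illustrative assumptions, not Stanza's actual tagging scheme:

```python
TOKEN_END, SENT_END = 1, 2  # 0 = character inside a token (assumed tag set)

def decode(text, tags):
    """Turn per-character tags into a list of sentences, each a list of tokens."""
    sentences, tokens, buf = [], [], []
    for ch, tag in zip(text, tags):
        buf.append(ch)
        if tag in (TOKEN_END, SENT_END):
            tokens.append(''.join(buf).strip())  # drop surrounding whitespace
            buf = []
        if tag == SENT_END:
            sentences.append(tokens)
            tokens = []
    return sentences

text = "Hi there. Bye."
#        H  i     t  h  e  r  e  .     B  y  e  .
tags = [0, 1, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 2]
print(decode(text, tags))  # [['Hi', 'there', '.'], ['Bye', '.']]
```

In Stanza itself the tags come from a neural model over the character sequence, and a third tag class additionally marks multi-word token boundaries for the MWT expander.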

References

Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. 2019. FLAIR: An easy-to-use framework for state-of-the-art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Association for Computational Linguistics.

Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics.

Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, and Marcos Zampieri. 2019. Findings of the 2019 conference on machine translation (WMT19). In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1). Association for Computational Linguistics.

Darina Benikova, Chris Biemann, and Marc Reznicek. 2014. NoSta-D named entity annotation for German: Guidelines and dataset. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14).

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5.

Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. 2013. One billion word benchmark for measuring progress in statistical language modeling. Technical report, Google.

Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In International Conference on Learning Representations (ICLR).

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit.
In Association for Computational Linguistics (ACL) System Demonstrations.

Behrang Mohit, Nathan Schneider, Rishav Bhowmick, Kemal Oflazer, and Noah A. Smith. 2012. Recall-oriented learning of named entities in Arabic Wikipedia. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, and Daniel Zeman. 2020. Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC'20).

Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, and James R. Curran. 2013. Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, 194:151–175.

Peng Qi, Timothy Dozat, Yuhao Zhang, and Christopher D. Manning. 2018. Universal dependency parsing from scratch. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Association for Computational Linguistics.

Milan Straka. 2018. UDPipe 2.0 prototype at CoNLL 2018 UD shared task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Association for Computational Linguistics.

Mariona Taulé, M. Antònia Martí, and Marta Recasens. 2008. AnCora: Multilevel annotated corpora for Catalan and Spanish. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08). European Language Resources Association (ELRA).

Erik F. Tjong Kim Sang. 2002. Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002).

Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition.
In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003.

Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. OntoNotes release 5.0. Linguistic Data Consortium.

Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, and Slav Petrov. 2018. CoNLL 2018 shared task: Multilingual parsing from raw text to universal dependencies. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Association for Computational Linguistics.

Daniel Zeman, Joakim Nivre, Mitchell Abrams, Noëmi Aepli, Željko Agić, Lars Ahrenberg, Gabrielė Aleksandravičiūtė, Lene Antonsen, Katya Aplonova, Maria Jesus Aranzabe, Gashaw Arutie, Masayuki Asahara, Luma Ateyah, Mohammed Attia, Aitziber Atutxa, Liesbeth Augustinus, Elena Badmaeva, Miguel Ballesteros, Esha Banerjee, Sebastian Bank, Verginica Barbu Mititelu, Victoria Basmov, Colin

Batchelor, John Bauer, Sandra Bellato, Kepa Bengoetxea, Yevgeni Berzak, Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Erica Biagetti, Eckhard Bick, Agnė Bielinskienė, Rogier Blokland, Victoria Bobicev, Loïc Boizou, Emanuel Borges Völker, Carl Börstell, Cristina Bosco, Gosse Bouma, Sam Bowman, Adriane Boyd, Kristina Brokaitė, Aljoscha Burchardt, Marie Candito, Bernard Caron, Gauthier Caron, Tatiana Cavalcanti, Gülşen Cebiroğlu Eryiğit, Flavio Massimiliano Cecchini, Giuseppe G. A. Celano, Slavomír Čéplö, Savas Cetin, Fabricio Chalub, Jinho Choi, Yongseok Cho, Jayeol Chun, Alessandra T. Cignarella, Silvie Cinková, Aurélie Collomb, Çağrı Çöltekin, Miriam Connor, Marine Courtin, Elizabeth Davidson, Marie-Catherine de Marneffe, Valeria de Paiva, Elvis de Souza, Arantza Diaz de Ilarraza, Carly Dickerson, Bamba Dione, Peter Dirix, Kaja Dobrovoljc, Timothy Dozat, Kira Droganova, Puneet Dwivedi, Hanne Eckhoff, Marhaba Eli, Ali Elkahky, Binyam Ephrem, Olga Erina, Tomaž Erjavec, Aline Etienne, Wograine Evelyn, Richárd Farkas, Hector Fernandez Alcalde, Jennifer Foster, Cláudia Freitas, Kazunori Fujita, Katarína Gajdošová, Daniel Galbraith, Marcos Garcia, Moa Gärdenfors, Sebastian Garza, Kim Gerdes, Filip Ginter, Iakes Goenaga, Koldo Gojenola, Memduh Gökırmak, Yoav Goldberg, Xavier Gómez Guinovart, Berta González Saavedra, Bernadeta Griciūtė, Matias Grioni, Normunds Grūzītis, Bruno Guillaume, Céline Guillot-Barbance, Nizar Habash, Jan Hajič, Jan Hajič jr., Mika Hämäläinen, Linh Hà M
