The MATE Markup Framework

1y ago

16 Views

1 Downloads

1.03 MB

10 Pages

Report/dmca

Download PDF

Transcription

The MATE Markup FrameworkLaila DYBKJ2,ER and Niels Ole BERNSENNatural Interactive Systems Laboratory, University of Southern Denmark.Science Park 10, 5230 Odense M, Denmarklaila@nis.sdu.dk, nob@nis.sdu.dkused rules. Similarly, programs for dialogue actrecognition and prediction tend to be based onannotated corpus data. Evaluation of usersystem interaction and dialogue success is alsobased on annotated corpus data. As SLDSs andother language products become moresophisticated, the demand will grow for corporawith multilevel and cross-level annotations, i.e.annotations which capture information in theraw data at several different conceptual levels ormark up phenomena which refer to more thanone level. These developments will inevitablyincrease the demand for standard tools insupport of the annotation process.The production (recording, transcription,annotation, evaluation) of corpus data fer spokenlanguage applications continues to be timeconsuming and costly. So is the construction oftools which facilitate annotation and informationextraction. It is therefore desirable that alreadyavailable annotated corpora and tools be usedwhenever possible. Re-use of annotated data andtools, however, confronts systems developerswith numerous problems which basically derivefrom the lack of common standards. So far,language engineering projects usually haveeither developed the needed resources fromscratch using homegrown formalisms and tools,or painstakingly adapted resources fromprevious projects to novel purposes.In recent years, several projects have addressedannotation formats and tools in support ofannotation and information extraction (for ). Some projects have addressed theissue of markup standardisation from differentperspectives. Examples are the Text EncodingInitiative (TEI) S)(http://www.cs.vassar.edu/CES/),andtheEuropean Advisory Group for LanguageEngineering Standards (EAGLES) (http://www.-AbstractSince early 1998, the European Telematicsproject MATE has worked towardsfacilitating re-use of annotated spokenlanguage data, addressing theoretical issuesand implementing practical solutions whichcould serve as standards in the field. Theresulting MATE Workbench for corpusannotation is now available as licensed opensource software.This paper describes the MATE markupframework which bridges between thetheoretical and the practical activities ofMATE and is proposed as a standard for thedefinition and representation of markup forspoken dialogue corpora. We also presentearly experience from use of the framework.1. IntroductionSpokenlanguageengineering productsproliferate in the market, commercial andresearch applications constantly increasing invariety and sophistication. These developmentsgenerate a growing need for tools and standardswhich can help improve the quality andefficiency of product development andevaluation. In the case of spoken languagedialogue systems (SLDSs), for instance, theneed is obvious for standards and standard-basedtools for spoken dialogue corpus annotation andautomatic infomaation extraction. Informationextraction from annotated corpora is used inSLDSs engineering for many different purposes.For several years, annotated speech corpora havebeen used to train and test speech recognisers.More recently, corpus-based approaches arebeing applied regularly to other levels ofprocessing, such as syntax and dialogue. Forinstance, annotated corpora can be used toconstruct lexicons and grammars or train agrammar to acquire preferences for frequently19

ilc.pi.cnr.it/EAGLES96/home.hurd).Whilstthese initiatives have made good progress onwritten language and current coding practice,none of them have focused on the creation ofstandards and tools for cross-level spokenlanguage corpus annotation. It is only recentlythat there has been a major effort in this domain.The project Multi-level anotafion ToolsEngineering ( M A T E ) (http://mate.nis.sdu.dk)was launched in March 1998 in response to theneed for standards and tools in support ofcreating, annotating, evaluating and exploitingspoken language resources. The central idea ofMATE has been to work on both annotationtheory and practice in order to connect the twothrough a flexible framework which can ensure acommon and user-friendly approach acrossannotation levels. On the tools side, this meansthat users are able to use level-independent toolsand an interface representation which isindependent of the internal coding filerepresentation.This paper presents the MATE markupframework and its use in the MATE Workbench.In the following, Section 2 briefly reviews theMATE approach to annotation and toolsstandardisafion. Section 3 presents the MATEmarkup framework. Section 4 concludes thepaper by reporting on early experiences with thepractical use of the markup framework anddiscussing future work.(morpho-)syntax, co-reference, dialogue acts,communication problems, and cross-level issues.Cross-level issues are issues which relate tomore than one annotation level. Thus, forinstance, prosody may provide clues for avariety of phenomena in semantics anddiscourse. The resulting report (Klein et al.,1998) describes more than 60 coding schemes,giving details per scheme on its coding book, thenumber of annotators who have worked with it,the number of annotated dialogues/segments/utterances, evaluation results, the underlyingtask, a list of annotated phenomena, and themarkup language used. Annotation examples areprovided as well.We found that the amount of pre-existing workvaries enormously from level to level. Therewas, moreover, considerable variation in thequality of the descriptions of the individualcoding schemes we analysed. Some did notinclude a coding book, others did not provideappropriate examples, some had never beenproperly evaluated, etc. The differences indescription made it extremely difficult tocompare coding schemes even for the sameannotation level, and constituted a ratherconfused and incomplete basis for the creationof standard re-usable tools within, as well asacross, levels.The collected information formed the startingpoint for the devebpment of the MATE markupframework which is a proposal for a standard forthe definition and representation of markup forspoken dialogue corpora (Dybkjmr et al., 1998).Analysis of the collected information on existingcoding schemes as regards the informationwhich came with the schemes as well as theinformation which was found missing, providedinput to our proposal for a minimal set ofinformation items which should be provided fora coding scheme to make it generallycomprehensible and re-usable by others. Forinstance, a prescriptive coding procedure wasincluded among the information items in theMATE markup framework despite the fact thatmost existing coding schemes did not come withthis information. This list of information itemswhich we call a coding module, is the coreconcept of the MATE markup framework andextends and formalises the concept of a codingscheme. The ten entries which constitute acoding module are shown in Figure 4. Roughly"2The MATE ApproachThis section first briefly describes the creation ofthe MATE markup framework and a set ofexample best practice coding schemes inaccordance with the markup framework. Then itdescribes how a toolbox (the MATEWorkbench) has been implemented to supportthe markup framework by enabling annotationon the basis of any coding scheme expressedaccording to the framework.2.1TheoryThe theoretical objectives of MATE were tospecify a standard markup framework and toidentify or, when necessary, develop a series ofbest practice coding schemes for implementationin the MATE Workbench. To these ends, webegan by collecting information on a largenumber of existing annotation schemes for thelevels addressed in the project, i.e. prosody,20

speaking, a coding module includes or describeseverything that is needed in order to perform acertain kind of markup of spoken languagecorpora. A coding module prescribes whatconstitutes a coding, including the representationof markup and the relations to other codings.Thus, the MATE coding module is a proposalfor a standardised description of codingschemes.The above-mentioned five annotation levelsand the issues to do with cross-level annotationwere selected for consideration in MATEbecause they pose very different markupproblems. If a common framework can beestablished and shown to work for those levelsand across them, it would seem likely that theframework will work for other levels as well.For each annotation level, one or more existingcoding schemes were selected to form the basisof the best practice coding modulesimplemented in the MATE Workbench (Mengelet al., 2000). Common to the selected codingschemes is that these are among the most widelyused coding schemes for their respective levelsin current practice, each having been used byseveral annotators and for the annotation ofmany dialogues. Since all MATE best practicecoding schemes are expressed in terms of codingmodules, they should contain sufficientinformation for use by other annotators. Theiruniform description in terms of coding modulesmakes it easy for the annotator to work onmultiple coding schemes and/or levels, and tocompare schemes since these all contain thesame categories of information. The use ofcoding modules also facilitates use of the sameset of software tools and enables the sameinterface look-and-feel independently of level.2.2The MATE best practice coding modules areincluded as working examples of the state of theart. Users can add new coding modules via theeasy-to-use interface of the MATE codingmodule editor.An audio tool enables listening to speech filesand having sound files displayed as a waveform.For each coding module, a default stylesheetdefines how output to the user is presentedvisually. Phenomena of interest in the corpusmay be shown in, e.g., a certain colour or inboldface. Users can modify style sheets anddefine new ones.The workbench enables information extractionof any kind from annotated corpora. Queryresults are shown as sets of references to thequeried corpora. Extraction of statisticalinformation from corpora, such as the number ofmarked-upnouns,isalsosupported.Computation of important reliability measures,such as kappa values, is enabled.Import of files from XLabels and BAS Partiturto XML format is supported in order todemonstrate the usefulness of importing widelyused annotation formats for further work in theWorkbench. Similarly, a converter fromTranscriber format (http://www.etca.fr/CTA/gip/Projets/Transcriber/) to MATE formatenables transcriptions made using Transcriber tobe annotated using the MATE Workbench.Other converters can easily be added. Export tofile formats other than XML can be achieved byusing style sheets. For example, informationextracted by the query tool may be exported toHTML to serve as input to a browser.On-line help is available at any time.The first release of the MATE Workbenchappeared in November 1999 and was madeavailable to the 80 members of the MATEAdvisory Panel from across the world. Sincethen, several improved versions have appearedand in May 2000 access to executable versionsof the Workbench was made public. The MATEWorkbench is now publicly available both in anexecutable version and as open source softwareat http://mate.nis.sdu.dk. The Workbench is stillbeing improved by the MATE consortium, sonew versions will continue to appear. Adiscussion forum has recently been set up at theMATE web site where colleagues are invited toask questions and provide information from theirexperience with the Workbench, including theToolingThe engineering objective of MATE has been tospecify and implement a genetic annotation toolin support of the markup framework and theselected best practice coding schemes. Severalexisting annotation tools were reviewed early onto gather input for MATE workbenchspecification (Isard et al., 1998). Building onthis specification, the MATE markup frameworkand the selected coding schemes, a java-basedworkbench has been implemented (Isard et al.,2000) which includes the following majorfunctionalities:21

new tools they have added to the MATEWorkbench to enhance its functionality. We have no exact figures on ]how many usersare now using the workbench but we know thatthe MATE workbench is already being used byand is being considered for use in severalEuropean and national research projects.3provide essential informationsemantics, coding purpose etc.3.1The MATE markup framework is a conceptualmodel which basically prescribes (i) how filesare structured, for instance to enable multi-levelannotation, (ii) how tag sets arc; represented interms of elements and attributes, and (iii) how to .I rlicil:dlllrn ;: oomph:lit-,idlor iz Files, elements a n d attributesi rl , l zloguoponglllNIOcommilr ,i,, expiml t.Prc nde Jrnrri :llale " I I llai:ll,"r smarkup,When a coding module has been applied to acorpus, the result is a coding file. The coding filehas a header which documents the codingcontext, such as who annotated the file, when,and the experience of the annotator, and a bodywhich lists the coded elements. Figure 1 showsan example of how annotated communicationproblems are displayed to the user in the MATEWorkbench. Figure 2 shows an excerpt of theinternal representation of the file.The M A T E M a r k u p F r a m e w o r k Clirl'rlilu i r u l o nonDonl .: ,qoo mti4.lhem are e llm .r.,, at pt o!21e etll il l : t I st oill,'lGG3tDOnllla,Nllos m l r t . l,ri ItOn I' W' t / ili' tlliltl l'l Sl t'll IOhra 0 lISII l l l l t!l if'PIGI I i l l l r illlllrf iinll 'l'l l l m a l ilol llrllil ol,t tllil azlmd i : r i n Izt'l 5 l s "ll'izl o I ti,1 "e: 2 lllrOl.l:ls or ll l lile Crom{:I. mdus I mo mr,,t,d:M11!!'. . OIltt' r'l ",i '11 lidP 31 'r,l10ihrg llz ile trlie ! lf? t i I I I It'll, (ll}lth] ilfflll 1 t I'4 15fi'l hll lioilPl ie : l fl l l .'lJ iIYCIrY E: lmdUl h iin .Yfirierfl inNrrnnton p .ided t o u er on o i e n { gloup s o[ pl opk l' 10 1t d l in IrlOl C ! e i ' . Ojlly blflP i' ltll tll Ifllt(l Oul!e ICllOfl tldli lill e r I s linllald ;lltii ll lt li ttot!lll I ,toiYIIlffll n'lIolcl'l lor oiql li cl II1 i pl' %(llll Ilil ranl .Figure 1. A screen shot from the MATE Workbench showing a dialogue annotated withcommunication problems (top left-hand panel). Guidelines for cooperative dialogue behaviour areshown in the top fight-hand panel. Communication problems are categonsed as types of violations ofthe coopemtivity guidelines. Violation types are shown in the bottom fight-hand panel. Notes may beadded as part of the annotation. Notes are shown in the bottom left-hand panel.As shown in Figure 2, the annotated filerepresentation is simply a list of references tothe transcription file. The underlying filestructure idea is depicted in Figure 3 whichshows how coding files (bottom layer) refer to atranscription file and possibly to other codingfiles, cf. entry 5 in the coding module in Figure4. A transcription (which is also regarded as acoding file) refers to a resource file listing themw data resources behind the corpus, such assound files and log files. The resource fileincludes a description of the corpus: purpose ofthedialogues,dialogueparticipants,experimenters, recording conditions, etc. Abasic, sequential tirneline representation of the! t, r r i b iti-'Ct?.l hqa-'! JO.tl m llttr. -7 . :',c' 1t"I t J.,.'CT'1117i"r ,,,',, l 9, ; IK 14 . t01 . ' 1 ! Y.,OTllpt Oh- cm'ri0vo I it "C 317"t- ,,ASt l.,. ;iI,l,,'t4r;67)" ,,e4*' ?i %'17 0 l-vc . ,llbly p l 11 alil O.O l- ' i -Rompi, oo* Figure 2. Excerpt of the internal XMLrepresentation of the annotated dialogue shownin Figure 1. The tags will be explained inSection 3.1.1 below.: .22

spoken language data is defined. The firnelinemay be expressed as real time, e.g. inRawdataSoundfilesmilliseconds, or as numbers indicating, e.g., thesequence of utterance starts.I videoor pictures sOrthographicPhonetic[ pCr o e iCafi n ---- [ Dialogueacts ----- [ProsodyIF i g u r e 3. The raw corpus data are listed in the resource file to which transcriptions refer. Coding filesat levels other than transcription refer to a transcription and only indkectly to the raw data. Codingfiles may refer to each other.Given a coding purpose, such as to identify allcommunication problems in a particular corpus,and a coding module, the actual coding consistsin using syntactic markup to encode the relevantphenomena found in the data. A coding isdefined in terms of a tag set. The tag set isconceptually specified by, and presented to, theuser in terms o f elements and attributes, el. entry6 in the coding module in Figure 4. Importantly,workbench users can use this markup directlywithout having to know about complex formalstandards, such as SGML, XML or TEI.1. Ni: The name of El.Example: u 2.Ei may contain a list o f elements fromM.Example: u may contain t : u t Exarnple /t /u 3. Ei has attributes Aij, where j 1 . n .Example: u has attributes who and id,among others.4.Ei may refer to elements in coding moduleMj,implying thatMreferencesMj.Example: a dialogue act coding may refer3.1.1 ElementsThe basic markup primitive is the dement (aterm inherited from TEI and SGML) whichrepresents a phenomenon such as a particularphoneme, word, utterance, dialogue act, orcommunicationproblem.Elementshaveattributes and relations to each other both withinthe cu ent coding module and across codingmodules. Considering a coding module M, themarkup specification language is described as:* El .E,: The non-empty list of tag elements.to phonetic or syntactic cues.A concrete example is the coding module forcommunication problems which, i.a., has theelement eomprob , el. the XML representationin Figure 2. eomprob has, i.a., the attributes iduref and vtype, uref is a reference to an utterancein the transcription coding, xtype is a referenceto a type of violation of a guideline in theviolation type coding. Due to the inflexibility ofXML, this logical structure has to be representedslightly differently internally in the workbench.Thus, the urcf corresponds to the first href in For each element t , the following propertiesmay be defined:23

Figure 2 while vtype is wrapped up in a newelement and corresponds to the second href.Example: tirne 123200 dur 1280 (these arederived values, with time TimeStart, and dur TimeEnd- TirneStart).3.1.2 AttributesAttributes are assigned values during coding.For each attribute Aij the type of its values mustbe defined. There are standard attributes, userdefined attributes, and hidden attributes, asfollows.Standard attributes are attributes prescribed byMATE.o id [mandatory]: ID. The element id iscomposed of the element name and a machinegenerated number.Example: id r 123 HREF[MODULE, ELEMENTLIST]:HereMODULE is the name o f another codingmodule, and ELEMENTLIST is a list of namesof elements from MODULE. When applied asconcrete attribute values, two parameters mustbe specified:The name o f the referenced coding file whichis an application of the declared MODULEcoding module.-Time start and end are optiorml. Elements musthave time information, possibly indirectly byreferencing other elements (in the same codingmodule or in other modules) which have timeinformation.The id of the element occurrence that isreferred to.The values o f this attribute are of the form:.CodeFileName"#'Elementld'. . . , u) allows an attribute used as,e.g., Occursln "base 123", where base is acoding file using the transcription module andu 123 is the value o f the id attribute o f a t element in that file. TimeStart [optional]: TIME. Start of event. YimeEnd [optional]: TIME. End o f event.User-defined attributes are used to parametriseand extend the semantics of the elements theybelong to. For instance, who is an attribute ofelement u designating by whom the utteranceis spoken. There will be many user-definedattributes (and elements), el., e.g., the uref andvtype mentioned above.Hidden attributes are attributes which the userwill neither define nor see but Which are used forinternal representation purposes. An example isthe following of coding elements which mayrefer to utterances in a transcription but whichdepend on the technical programming choice ofthe underlying, non-user related representation:ModuleRefs CDATA 'href:transcription#u'See Figure 2 for a concrete example from theMATE ription, participant] anactualoccurrence may look like who "#participant2''where the omitted coding file name byconvention generically means the currentcoding file.The concept o f hyper-references together withparameters referencing coding modules (seepoint 5 in Figure 4) is what enables ccodingmodules to handle cross-level markup. 3.1.3 Attribute standard typesThe M A T E markup framework proposes a set ofpredefined attribute value types (attributes aretyped) which are supported by the workbench.By convention, types are written in capitals. Theincluded standard types are:*TIME: in milliseconds, as a sequence ofnumbers, or as named points on the timeline.Values are numbers or identifiers, and thedeclaration o f the timeline states how tointerpret them.ENUM: A finite closed set o f values.Values are of the form: "(" Identifier ( " 1 "Identifier )* ")"Example: time (yearlmonthldaylhour) allowsattributessuchastime--day.The user m a y be anowed to extend the set, butnever to change or delete values from the set. TEXT: Any text not containing . (which isused to delimit the attribute value).Example: The declaration dese TEXT allowsuses such as: event desc "Door is slammed" . 24I13: Automatically generated id for the element.Only available in the automatically addedattribute id.

3.2occur. Other items provide directives andexplanations to users. Thus, (2), (3) and (4)elaborate on the module itself, (7) and (8)elaborate on the markup, and (9) recommendscoding procedure and quality measures. (10)provides information about the creation of thecoding module, such as by whom and when.In the following, we show an abbreviatedversion of a coding module for communicationproblems to illustrate the 10 coding moduleentries.Coding modulesIn order for a coding module and the dialoguesannotated using it to be usable andunderstandable by people other than its creator,some key information must be provided. TheMATE coding module which is the central partof the markup framework, serves to capture thisinformation. A coding module consists of the tenitems shown in Figure 4.1.Name of the module.2.3.Coding purpose of the module.Coding level.Name: Communication problems.Coding purpose: Records the different ways inwhich generic and specific guidelines areviolated in a given corpus. A communicationproblems coding file refers to a problem typecoding file as well as to a transcription.Coding level: Communication problems.Data sources: Spoken human-machine dialoguecorpora.Module references: Module Basic orthographic transcription; Module Violation types.Markup declaration:ELEMENT eornprobATTRIBUTESvtype: REFERENCE(Violation types,vtype)wref: REFERENCE(Basic orthographictranscription, (w,w) )uref: REFERENCE(Basic orthographictranscription, u )caused by: REFERENCE(this,eomprob)temp: TEXTELEMENTnoteATTRIBUTESwref: REFERENCE(Basic orthographictranscription, (w,w) )uref: REFERENCE(Basic orthographictranscription, u sproducedbyinadequate system utterance design we use theelement eomprob. It refers to some kind ofviolation of one of the guidelines listed in Figure1, top fight-hand panel. The comprob elementmay be used to mark up any part of the dialoguewhich caused, or might cause, a communicationproblem. Thus, cornprob may be used to annotateone or more words, an entire utterance, or evenseveral utterances in which an actual or potentialcommunication problem was detected. Theeomprob element has five attributes in addition tothe automatically added id.4. The type of data source scoped by themodule.5. References to other modules, if any. Fortranscriptions, the reference is to a resource.6. A declaration of the markup elements andtheir attributes. An element is a feature, or typeof phenomenon, in the corpus for which a tag isbeing defined.7. A supplementary informal description ofthe elements and their attributes, including:a. Purpose of the element, its attributes,and their values.b. Informal semantics describing how tointerpret the element and attribute values.c. Example of each element and attribute.8. An example of the use of the elements andtheir attributes.9. A coding procedure.10. Creation notes.Figure 4. Main items of the MATE codingmodule.Some coding module items have a formal role,i.e. they can be interpreted and used by theMATE workbench. Thus, items (1) and (5)specify the coding module as a namedparametrised module or class which builds oncertain other predefined modules (no cyclesallowed). Item (6) specifies the markup to beused in the coding. All elements, attributenames, and ids have a name space restricted totheir module and its coding files, but arepublicly referrable by prefixing the name of thecoding module or coding file in which they25

The attribute vtype is mandatory, vtype is areference to a description of a guidelineviolation in a file which contains the differentkinds of violations of the individual guidelines.Either wref or uref must be indicated. Boththese attributes refer to an orthographictranscription, wref delimits the word(s) whichcaused or might cause a communicationproblem, and uref refers to one or more entireutterances which caused or might cause aproblem.We stop the illustration here due to spacelimitations. The full description is available in(Mengel et al. 2000).Example:In the following snippet of a transcription fromthe Sundial corpus: u id "Sl:7-1-sun" who "S" flight informationbritish airways good day can I help you /u communication problems are marked up asfollows:just selects the utterance to nark up and thenclicks on the violation type palette, or, in case itis a new type, clicks on the violatedcooperafivity guideline which means that a newviolation type is added and text can be entered todescribe it, el. Figure 1.Coding procedure: We recommend to use thesame coding procedure for markup ofcornrnunicafion problems as for violation typessince the two actions are tightly connected. As aminimum, the following procedure should befollowed:1. Encode by coders 1 and 2.2. Check and merge codings (performed bycoders 1 and 2 until consensus).Creation notes:Authors: Hans Dybkj er and Laila Dybkj er.Version: 1 (25 November 1998), 2 (19 June1999).Comments: For guidance on how to identifycommunication problems and for a collection ofexamples the reader is invited to look at(Dybkj er 1999).Literature: (Bernsen et al. 1998). comprob id "Y ' vtype "Sundial problems#SG4-1"uref " Sundial#S 1:7-1 -sun '7 We do not exemplify note here and do notshow the violation type coding file due to spacelimitations. However, note that once a codingmodule is specified in the MATE workbench,the user does not have to bother about themarkup shown in the example above. The userThe MATE Workbench allows its users tospecify a coding module via a coding moduleeditor. A screen shot of the coding module editoris shown in Figure 5.Figure 5. The MATE coding module editor.26

4markupframework in the Workbenchimplementation in order to achieve a better userinterface.Other frameworks have been proposed but toour knowledge the MATE markup framework isstill the more comprehensive framework around.An example is the annotation frameworkrecently proposed by Bird and Liberrnan (1999)which is based on annotation graphs. These arenow being used in the ATLAS project (Bird etal., 2000) and in the Transcriber tool (Geoffroiset al., 2000). The annotation graphs serve as anintermediate representation layer in agreementwith the argument above for having anintermediate layer of representation between theuser interface and the intemal representation.Whilst Bird and Liberman do not considercoding modules or discuss the interface from ausability point of view, they present tion and time line reference. The twoframeworks may, indeed, tuna out tocomplement each other nicely.Early Experience and Future WorkThe MATE markup framework has been wellreceived for its transparency and flexibility bythe colleagues on the MATE Advisory Panel.The framework has been used to ensure acommon description of coding modules at theMATE coding levels and has turned out to workwell for all these levels. We therefore concludethat the framework is likely to work for otherannotation levels not addressed by MATE. Theuse of a common representation and a commoninformation structure in all coding modules atthe same level as well as across levels facilitateswilhin-level comparison, creation and use ofnew coding modules, and working at multiplelevels.On the tools side, the markup framework hasnot been fully exploited as intended, i.e. as anintermediate layer between the user interfaceand the internal representation. This means thatthe user interface for adding new codingmodules, in particular for the declaration ofmarkup, and for defining new visualisations isstill sub-optimal from a usability point of view.The coding module editor which is used foradding new coding modules, represents a majorstep forward compared to requiring users towrite DTDs. The coding module editorautomatically generates a DTD from thespecified markup declaration. However, theXML format used for the underlying filerepresentation has not been

The MATE best practice coding modules are included as working examples of the state of the art. Users can add new coding modules via the easy-to-use interface of the MATE coding module editor. An audio tool enables listening to speech files and having sound files displayed as a waveform. .