Toolbox 1 - University Of Kansas


Toolbox 1
Susan Gehr
susan@gehr.info
Cell/text (707) 599-2719

With gratitude
- Albert Bickford, Toolbox instructor for InField 2008, 2010 & CoLang 2012
- Neil Brinneman, Shoebox instructor 2003
- Shoebox/Toolbox Field Linguistʼs Toolbox Google Group
- Schoolmates at University of Oregon, especially Connie Dickinson

Course Goals
- Review of Toolbox for use in creating lexical databases
- Introduction to topics related to dictionary projects (workflow, lexicography, community involvement)
- Information about how to learn more and get help after this class
- Study of how Toolbox is part of the Karuk dictionary project
- Other related software

Getting to know each other
- What languages and projects are you working on with Toolbox?
- Did you bring your own computer and your own data to work with?
- Will you be using Toolbox on the computers in Watson 419? If so, I need to ask Jari to install the New Project Package.
- What are your current Toolbox-related questions or learning goals?
- Will someone else be setting up your project for you, or do you need to do it yourself?

Logistics
- Main class: Tuesday June 19 to Friday June 22, 2:15 to 3:45 PM
- Location of Class: 419 Watson Library
- Possibilities for consultation outside of class

Course Outline
- June 19, Tuesday
  - Introductions to our Toolbox projects
  - Discussion of the parts and preparation behind every Toolbox Project
    - Language Encoding (orthography, sort order)
    - Database Type development
    - Workflow
    - Project Management
    - Toolbox life cycle, considerations for migrating data

Course Outline 2
- June 20, Wednesday
  - Database types, continued
    - Establishing a data structure / cheatsheet
    - What are your fields? What rules do they require?
    - Documenting all decisions that you make about your workflow, whether you work alone or as part of a team
  - Getting to work – Using Toolbox
- June 21, Thursday
  - More on Using Toolbox – Based on class need & interest

Course Outline 3
- Friday, June 22
  - Related software (operating systems, Microsoft Word, Windows virtualization software for Mac users, backup software, etc.)
  - Outputting data
    - For online and print use (dictionaries, wordlists made with filters)
    - Working with a publisher or a printer

Karuk Toolbox Project
- Started in November 2003
- Took 2-week course at JAARS
- William Bright data and participation
- Nailing down the language encoding and revising it in 2009
- Publication of a printed dictionary (2005) and an online dictionary (ongoing with UC Berkeley Linguistics)

Toolbox Project Preparation
- Before data entry can start, you need to ʻteachʼ Toolbox some things about your project.
- Language Encoding
  - Orthography
  - Sort Order
  - Unicode compatibility

Language Encoding Features
- The order for sorting (using that script).
- Upper and lower case forms of the characters (if any).
- Special groupings of characters, variables, which are useful in examining or searching the data.
- A font to represent the character shapes.
- Often, a special keyboard to facilitate entering the characters.

Exercise: Creating your sort order
- What is the order of your writing system?
- Will your audience expect it to sort as English does, or will they expect some other sort order?
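
To make the idea of a custom sort order concrete, here is a minimal Python sketch. It is not how Toolbox sorts internally; it only shows what it means for a word list to follow a writing system's own order rather than English order. The alphabet and the words are hypothetical placeholders, not Karuk data, and the multigraph "ch" is treated as a single letter.

```python
# Hypothetical sort order (an assumption for illustration): replace it with
# the order of your own writing system. "ch" counts as one letter.
SORT_ORDER = ["a", "i", "u", "k", "ch", "t", "h"]


def make_sort_key(order):
    """Build a key function that ranks words by a custom alphabet."""
    rank = {unit: position for position, unit in enumerate(order)}
    longest = max(len(unit) for unit in order)

    def key(word):
        word = word.lower()
        ranks = []
        i = 0
        while i < len(word):
            # Try the longest possible unit first so "ch" wins over "c" + "h".
            for size in range(longest, 0, -1):
                unit = word[i:i + size]
                if unit in rank:
                    ranks.append(rank[unit])
                    i += len(unit)
                    break
            else:
                # Characters outside the sort order go after all known letters.
                ranks.append(len(order) + ord(word[i]))
                i += 1
        return ranks

    return key


# Placeholder words, not real vocabulary.
WORDS = ["tuk", "chak", "kat", "aku", "ika"]
CUSTOM_SORTED = sorted(WORDS, key=make_sort_key(SORT_ORDER))
ENGLISH_SORTED = sorted(WORDS)
```

With this hypothetical order, "chak" sorts after "kat" because "ch" follows "k", whereas English order puts it second. This is the kind of difference your audience may or may not expect.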

Language Encoding
Karuk 2009

Toolbox Project Preparation
- Database type
  - What types of information do you want to keep track of?
  - MDF – the Multi-Dictionary Formatter
    - What is it?
    - A database type that can function as a data structure standard and a data content standard
    - Read ... of MDF 2000.pdf

Toolbox Project Preparation
- Database type
  - What is your data structure?
  - What are your rules about entering data?
  - Do you keep a cheatsheet data structure and a notebook?
  - Karuk examples in Word and in Notebook

What are databases & database types?
In Toolbox, databases can be:
- dictionaries
- one text or a collection of texts
In Toolbox, a database type is a file that:
- includes a collection of properties that defines various fields of the database and some of the methods used for manipulating records.

More about database types
- A Dictionary database type might contain:
  – A recommended set of Field Markers
  – Filters used for finding particular records in a lexicon (e.g., all nouns, a particular morpheme, words with homonyms, etc.)
  – The Date Stamp field marker \dt

Making your own Dictionary type
- You can & should make a database type for your language by copying the MDF type and modifying the copy (see Karuk MDF)

Fields in a Dictionary Database

What fields might I use?
- See the list of all MDF fields, pp. 13-39, Making Dictionaries: A Guide to Lexicography and MDF.
- Take time to settle on your fields before you start making lots of entries in your database; otherwise you might have to either correct them by hand or have someone write a CC Table to correct all the records.

Exercise: Choose field markers & make a cheatsheet
- Basic minimum set
  \lx - Lexeme
  \a - Alternate form
  \u - Underlying form
  \ps - Part of speech
  \ge - Gloss
  \de - Definition
  \sd - Semantic domain
  \nt - Notes
  \dt - Date Last Edited
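
Because Toolbox stores its databases as plain text with backslash field markers, a cheatsheet like the one above maps directly onto what is in the file. The following Python sketch reads such text into records. It assumes that every record starts with \lx and that a line without a marker continues the previous field. The function name and the sample values are hypothetical placeholders, not Karuk data.

```python
def parse_records(text, record_marker="lx"):
    """Split backslash-marked text into records of (marker, value) pairs."""
    records = []
    current = None
    for line in text.splitlines():
        if not line.startswith("\\"):
            # A line without a marker continues the previous field's value.
            if current and line.strip():
                marker, value = current[-1]
                current[-1] = (marker, (value + " " + line.strip()).strip())
            continue
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            # The record marker opens a new record.
            current = []
            records.append(current)
        if current is not None:
            # Marker lines that come before the first record are ignored.
            current.append((marker, value.strip()))
    return records


# Placeholder data using markers from the basic minimum set.
SAMPLE = r"""\lx lexeme1
\ps n
\ge gloss one
\de a longer definition that
wraps onto a second line
\dt 19/Jun/2012

\lx lexeme2
\ps v
\ge gloss two
"""

RECORDS = parse_records(SAMPLE)
```

Keeping each record as an ordered list of pairs, rather than a dictionary, preserves repeated markers and the field order, both of which matter in a dictionary entry.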

An example data structure

Toolbox Project Preparation
- Workflow matters
  - Do you work by yourself?
  - Are you working on a team?

Toolbox Project Preparation
- Project management
  - Printed dictionaries have front and back matter
  - Online dictionaries have websites with supplemental text, graphics and sound files
  - How will you keep track of master files, working files, passwords, contact information, and other valuable records?

Toolbox Project Preparation
- Toolbox life cycle
  – “Toolbox is nearing the end of its life-cycle.” Albert Bickford, SIL, June 2008.
  – So Toolbox must be really near the end here in 2012?
    - “the [Toolbox] programmer is still active working on a kind of large new feature which is taking its own sweet time. Our assignment for the foreseeable future is Toolbox. All programs and people are mortal, but we are doing our best to keep ourselves and Toolbox among the living.” Toolbox Support. May 1, fieldlinguists-toolbox/
  – For now, I will keep using Toolbox, keeping my eye out for the day that it canʼt be run on contemporary computers/operating systems and support dries up.

End of Day 1?
- Suggestions & adjustments for following days based on student need and preparation for Toolbox 2.
- Work on Language Encodings and Database Types

Day 2
- Work on Language Encodings and Database Types
- Database types & workflow matters, continued
  - Establishing a data structure / cheatsheet
  - What are your fields? What rules do they require?
  - What does the MDF book say about your chosen fields?
  - What if, 5,000 records in, you want to add new fields?
  - Documenting all decisions that you make about your workflow, whether you work alone or as part of a team
  - Dictionary work Google Doc
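
One way to think about "what rules do your fields require" is to write the rules down precisely enough that they could be checked. The Python sketch below does that for a hypothetical rule set: which markers every record must have and which markers are allowed at all. The rules, names and sample records are assumptions for illustration; your cheatsheet defines the real ones.

```python
# Hypothetical cheatsheet rules; adjust both sets to match your own decisions.
REQUIRED = {"lx", "ps", "ge"}
ALLOWED = {"lx", "a", "u", "ps", "ge", "de", "sd", "nt", "dt"}


def check_record(record):
    """Return a list of problems found in one record of (marker, value) pairs."""
    markers = [marker for marker, _ in record]
    problems = []
    for marker in sorted(REQUIRED - set(markers)):
        problems.append(f"missing \\{marker}")
    for marker in markers:
        if marker not in ALLOWED:
            problems.append(f"unexpected \\{marker}")
    return problems


# Placeholder records, not real dictionary data.
GOOD = [("lx", "lexeme1"), ("ps", "n"), ("ge", "gloss one")]
BAD = [("lx", "lexeme2"), ("ge", "gloss two"), ("xx", "stray field")]

GOOD_PROBLEMS = check_record(GOOD)
BAD_PROBLEMS = check_record(BAD)
```

If you add a new field 5,000 records in, a check of this kind shows which older records still need it once the marker has been added to the required set.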

On the Computer
- Go to the Project menu > Language Encodings
- Select the vernacular Language Encoding and click the Copy button.
- Type your sort order into the Primary Characters box.

What to include in your Toolbox data structure
From CoLang 2012 course Lexicography, Dwyer & erndwyercolang lexicog1/

Helpful Information
- Scientific Names http://www.itis.gov/
- Toolbox http://www.sil.org/computing/toolbox/
- SuperDuper! – for backup on the erDescription.html
- (Carbon Copy Cloner) another backup solution for Mac

Day 3
- June 21, Thursday
  - Filtering data
    - Helps with analysis and problem solving
    - Letʼs set some up and use them
  - Collaboration documents
  - Data structure matters continued
  - Related software (operating systems, Microsoft Word, Windows virtualization software for Mac users, backup software, etc.)

Filters, Finding and Searching
- Find & Search are two different things in Toolbox

Search

Find

Filter

Workflow Filter: All Records after a certain date
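
The logic behind a date-based workflow filter can be sketched outside Toolbox as well. The Python example below keeps only the records whose \dt value is later than a cutoff date. It assumes date stamps written like 19/Jun/2012 with English month abbreviations; check your own data before relying on that format. The function name and sample records are hypothetical.

```python
from datetime import date, datetime


def edited_after(records, cutoff, marker="dt", fmt="%d/%b/%Y"):
    """Return the records whose date stamp is later than the cutoff date."""
    kept = []
    for record in records:
        stamp = record.get(marker)
        if not stamp:
            # Records without a date stamp cannot match the filter.
            continue
        # Note: %b follows the current locale; English names are assumed here.
        if datetime.strptime(stamp, fmt).date() > cutoff:
            kept.append(record)
    return kept


# Placeholder records, one dictionary of marker -> value per entry.
RECORDS = [
    {"lx": "lexeme1", "dt": "18/Jun/2012"},
    {"lx": "lexeme2", "dt": "20/Jun/2012"},
    {"lx": "lexeme3"},
]

RECENT = edited_after(RECORDS, date(2012, 6, 19))
```

A filter like this is useful for reviewing only what has changed since the last backup or the last team meeting.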

High Interest (and multi-part) Filter: All Personal Names

Day 4
- Friday, June 22
  - Outputting data
    - For online and print use (dictionaries, wordlists made with filters)
    - Working with a publisher or a printer
  - User testing
    - Does your main audience like your work?

Outputting Data: Print
- Word list – using a filter
  – Choose a \sd – Semantic Domain filter and output a word list.
- Whole database
  – Karuk to English
  – English to Karuk
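
As a rough picture of what a semantic domain word list contains, the Python sketch below selects the records with a given \sd value and prints one "lexeme, tab, gloss" line for each. It is not Toolbox's export routine. The domain labels, function name and records are hypothetical placeholders.

```python
def word_list(records, domain, marker="sd"):
    """Return sorted 'lexeme<TAB>gloss' lines for one semantic domain."""
    lines = []
    for record in records:
        if record.get(marker) == domain:
            lines.append(f"{record['lx']}\t{record.get('ge', '')}")
    # Plain sorted() uses Unicode code point order, not a custom sort order.
    return sorted(lines)


# Placeholder records; the semantic domain labels are made up.
RECORDS = [
    {"lx": "lexeme2", "ge": "gloss two", "sd": "fish"},
    {"lx": "lexeme1", "ge": "gloss one", "sd": "fish"},
    {"lx": "lexeme3", "ge": "gloss three", "sd": "plant"},
]

FISH_LIST = word_list(RECORDS, "fish")

if __name__ == "__main__":
    for line in FISH_LIST:
        print(line)
```

For a printed list in your language's own order, the sort step would need to use the sort order defined in your language encoding rather than the default shown here.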

Outputting Data: Filtered Word List

Outputting Data: Filtered Word List
To adjust the styles and formatting:

Outputting Data: English to Karuk

Outputting Data: XML format
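
To show what it means to turn field-marked records into XML, here is a small Python sketch using the standard library. The element names ("database", "entry", and one element per field marker) are assumptions chosen for illustration and may not match the layout of Toolbox's own XML export. The records are placeholders.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialize records of (marker, value) pairs as a simple XML string."""
    root = ET.Element("database")  # hypothetical root element name
    for record in records:
        entry = ET.SubElement(root, "entry")  # hypothetical entry element name
        for marker, value in record:
            # Each field becomes a child element named after its marker.
            ET.SubElement(entry, marker).text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder record, not real dictionary data.
RECORDS = [[("lx", "lexeme1"), ("ge", "gloss one")]]

XML_TEXT = records_to_xml(RECORDS)
```

Once the data is in XML, a web developer can transform it for an online dictionary without needing Toolbox itself.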

Outputting Data: XML to online
http://dictionary.karuk.org/

Working with a publisher or printer
- Find out their technical requirements
- Provide them with the format they require. In the case of my first printer, it was hard copy.
- See also “10. Completing the dictionary” from Making dictionaries: a guide to lexicography and MDF.

User testing
- Tools for Language Revitalization: The Online Karuk and Yurok Dictionaries. Unpublished paper written for SJSUʼs LIBR 202: Information Retrieval.
- Dictionaries can have advisory boards, which could be a communityʼs language committee or a subset of that committee.
  – Reviewing sections of the dictionary, handling questions or disagreements, discussing ʻnew wordsʼ

Yôotva!
Susan Gehr
susan@gehr.info
Cell/text (707) 599-2719
http://dictionary.karuk.org/
