DLI-DEL: Data Management & Archiving - Colleenfitzgerald

Transcription

10/6/20

DLI-DEL: Data Management & Archiving
Dr. Susan Smythe Kung
Archive of the Indigenous Languages of Latin America
University of Texas at Austin

Outline
Part 1: Data Management Plans
Part 2: Archiving

Part 1: Data Management Plans

What is a Data Management Plan?
A DMP is a document created early in a research project that describes the types of data to be generated; how the data will be compiled, analyzed, and stored; who will have access to the data during the project & who will manage the data; the legal and ethical status of the data; and how the data will be handled after the project is complete, including deletion or destruction of some or all of the data, long-term preservation of (a subset of) the data, and how preserved data will be shared.
-- Kung, forthcoming

NSF DLI / NEH DEL DMP Requirements
The DMP requirements are distributed across 3 different documents:
1. The Proposal and Award Policies and Procedures Guide (PAPPG) 20-1
   https://www.nsf.gov/pubs/policydocs/pappg20_1/nsf20_1.pdf
2. Data Management for NSF SBE Directorate Proposals & Awards
   https://www.nsf.gov/sbe/DMP/SBE_DataMgmtPlanPolicy_RevisedApril2018.pdf
3. The NSF DLI / NEH DEL Program Solicitation NSF 20-603
   https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=505705

(1) The Proposal and Award Policies and Procedures Guide (PAPPG) 20-1
Chapter II, Section C.2.j
[The DMP] should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results, and may include:
1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced during the course of the project;
2. the standards to be used for data and metadata format and content;
3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
4. policies and provision for re-use, re-distribution, and the production of derivatives; and
5. plans for archiving data, samples, and other research products, and for the preservation of access to them.

(2) Data Management for NSF SBE Directorate Proposals & Awards
Contents of the Data Management Plan (pp. 2-3)
1. Expected data and how data will be managed prior to sharing with others, how they will be shared, and sharing requirements, protocols
2. Period of data retention – emphasis on timely access
3. Data formats and dissemination, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. Data storage and preservation of access
5. Additional possible data management requirements (specific to the program solicitation)

(3) The NSF DLI / NEH DEL program solicitation NSF 20-603
Chapter V, section A, Special Information and Supplementary Documentation
1) The archiving location should appear in the Project Summary;
2) plans and methodology for the sustainable, long-term archiving of all data and a discussion of interoperability with related materials should appear in the Project Description;
3) PIs and co-PIs with prior awards funded by either or both NSF and NEH should report on data management under "Results from prior NSF support" in accordance with the Data Management Plan for NSF SBE Directorate Proposals and Awards;
4) budgeted costs for archiving, including the ingestion into the archive, should appear in the Budget and Budget Justification under Other Direct Costs line G6;
5) a letter from the archive selected by the project should appear in Supplementary Documents; &
6) the Data Management Plan should appear in Supplementary Documents.

(3) The NSF DLI / NEH DEL program solicitation NSF 20-603
“The DMP should provide evidence that the proposer has contacted an official repository that meets ISO standards to arrange for long-term archiving of documentation generated by the DLI-DEL project. The language archive selected by a DLI-DEL project must have a long-term institutional commitment to data preservation and access. While the DLI-DEL Program does not sponsor or have an official arrangement with any language archive, these services are provided by DELAMAN member archives (https://www.delaman.org/) and by institutions holding the Data Seal of Approval (https://www.datasealofapproval.org/en/). Regular data backup should be an integral part of the DMP, but this is not to be equated with archiving in an official repository. Backing up data on hard drives, servers, optical media, and cloud-based services does not constitute archiving.”

(3) The NSF DLI / NEH DEL program solicitation NSF 20-603
“The DMP should include a timeline for completion of archiving activities. Archiving and execution of the Data Management Plan must be completed prior to the submission of the final project report. Final project report approval is contingent upon successful archiving and execution of the Data Management Plan.”

(3) The NSF DLI / NEH DEL program solicitation NSF 20-603
“Language documentation is of little value if it cannot be accessed. To that extent the DLI-DEL Program expects that the vast majority of data generated by the DLI-DEL project will be publicly accessible with minimal restrictions for non-commercial, educational purposes. (Restrictions on commercial use are acceptable.) The DMP should indicate how archived materials will be accessible to the public. Any restrictions to be placed on access should be clearly indicated. If the proposer expects access to some materials to be restricted to certain user groups, the DMP should indicate the criteria delineating such user groups and provide an estimate of the percentage of materials which will be so restricted. If time limits are to be placed on access to materials, the DMP should indicate the period of time after which access restrictions will be removed.”

(3) The NSF DLI / NEH DEL program solicitation NSF 20-603
“Letter of Collaboration: Proposers should include a letter of collaboration from the archive indicating their willingness to archive project materials and outlining any specific arrangements which have been made. This statement must be uploaded under ‘Other Supplementary Documents’.”
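The solicitation text quoted above asks the DMP to estimate the percentage of materials that will be restricted to certain user groups. One rough way to arrive at such an estimate is to tally a file inventory by access level. The sketch below is only an illustration: the inventory, the file names, the access-level labels ("public", "community_only", "embargoed"), and the function name restricted_share are all hypothetical and are not prescribed by NSF or by any archive.

```python
from collections import Counter

# Hypothetical inventory of planned deposits: (file name, access level).
# The labels are invented for illustration; each archive defines its own levels.
inventory = [
    ("story_01.wav", "public"),
    ("story_01.eaf", "public"),
    ("ceremony_02.wav", "community_only"),
    ("interview_03.wav", "embargoed"),
]


def restricted_share(items, public_label="public"):
    """Return the percentage of files at each non-public access level."""
    counts = Counter(level for _, level in items)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        level: round(100 * count / total, 1)
        for level, count in counts.items()
        if level != public_label
    }


shares = restricted_share(inventory)
for level, percent in sorted(shares.items()):
    print(f"{level}: {percent}% of files")
```

Counting files is only one possible basis for the estimate; hours of recording or number of sessions may be more meaningful for a given project, and the same tally works if the inventory lists those units instead.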

Don’t worry! No 2-page DMP can cover everything!
keep calm and carry on by Jelene Morris (CC BY 2.0), https://flic.kr/p/5w9ybW

Suggested DMP Template
1. Roles & Responsibilities
2. Types of Data
3. Standards for Data & Metadata Format & Content
4. Policies of Access & Sharing (IP, privacy, cultural protocols)
5. Policies for Re-use, Re-distribution & Production of Derivatives
6. Data Storage, Archiving (timeline) & Preservation of Access

DMP Resources
- Preparing a Data Management Plan (University of Texas Libraries) https://guides.lib.utexas.edu/DMP
- Workshop on Data Management Plans for Linguistic Research (Berez-Kroeker, Collister & Kung /ailla%3A254604
- Kung, Susan. In press. Developing a Data Management Plan. In Andrea Berez-Kroeker, Bradley McDonnell & Eve Okura Koller (eds.), The Open Handbook of Linguistic Data Management and Archiving, MIT Open Handbooks in Linguistics series. Cambridge: MIT Press.
- The DMPTool (https://dmptool.org/) is free for anyone regardless of university affiliation; see the Quick start guide at https://dmptool.org/help.
- The ezDMP (https://ezdmp.org/index) tool creates maDMPs specifically for National Science Foundation grants; the user must log in with either a Google or an ORCID (https://orcid.org/) account.

Part 2: Archiving

What is a Digital Archive (aka Digital Repository)?
“An archive is a trusted repository created and maintained by an institution with a demonstrated commitment to permanence and the long-term preservation of archived resources.”
-- Heidi Johnson 2004, p. 142
“A repository is not a piece of software. A repository is the sum of financial resources, hardware, staff time, and ongoing implementation of policies and planning to ensure long-term access to content. Any software system you use to preserve and provide access to digital content is by necessity temporary; it likely will not last forever. Institutions make preservation possible.”
-- Trevor Owens 2018, p. 4

What can a digital repository do for you?
Online digital archives facilitate
- Long-term, digital preservation
- Graded access
- Rights management
OAIS.gif by Poppen / Public domain

Graded access
From s
- All digital archives have Terms or Conditions of Use that visitors must follow when accessing or using the materials.
- Most require users to create accounts.
- Catalog information (metadata) in most digital repositories is usually publicly accessible.
- All digital language archives have some form of graded access for the files.
- The way the graded access works varies greatly between archives.

Rights Management
From https://www.ailla.utexas.org/site/rights/use_conditions
All digital archives that specialize in language documentation data handle rights similarly.
- The original Rights Holders retain all of their intellectual property rights.
- The Rights Holders give non-exclusive licenses to the archive and the archive’s users. Details of the licenses vary between archives and according to specific copyright laws of the country where the archive is located.
- Commercial use of the data is NEVER allowed.

Long-Term Digital Preservation
Digital preservation is more than just backing up data
“Digital Preservation Refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Digital preservation [...] refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological and organisational change.”
-- Digital Preservation Coalition, Digital Preservation Handbook Glossary

These Things are NOT Archives
Things that are not (inherently) archives include
- social media platforms
- websites
- file storage systems
- digital asset management systems (DAMS)
Because
- They do not include digital preservation or
- They are not committed to preserving YOUR data.
AILLA, licensed under CC-BY-SA 4.0
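To make the difference between a backup and a "managed activity" more concrete, one commonly cited preservation activity is fixity checking: recording a checksum for every file and re-verifying it later to detect silent corruption or loss. The sketch below is a minimal illustration using only the Python standard library. The function names (sha256_of, build_manifest, verify_manifest) are hypothetical, and this is not a description of how any particular archive works. Running it on your own copies does not turn them into an archive; it only shows the kind of ongoing check that a plain copy on a hard drive or cloud service lacks.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A typical use would be to call build_manifest once when the files are finalized, store the result alongside the data, and call verify_manifest periodically; an empty list means every recorded file is still present and unchanged.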

Timeline: When to Archive
Archive early & archive often!
Practice progressive archiving:
- Archive primary data as soon as the field trip ends.
- Add transcriptions, translations, and annotations later, as they are finalized.
Don’t wait to archive because there will never be a “convenient” time to do it.
Build progressive archiving into your DMP.
AILLA, licensed under CC-BY 4.0

Archiving for the Future: Simple Steps for Archiving Language Documentation Data
A free online course developed by AILLA staff in collaboration with the DELAMAN archives & with funding from the National Science Foundation (BCS-1653380, A. Woodbury & S. Kung, PIs).
Available at https://archivingforthefuture.teachable.com/

Archiving Resources & Links
- DELAMAN https://www.delaman.org/
- CoreTrustSeal Repository Registry tified-repositories/
- Registry of Research Data Repositories https://www.re3data.org/
- Archiving for the Future: Simple Steps for Archiving Language Documentation Collections (Kung, Sullivant, Pojman, Niwagaba 2020) https://archivingforthefuture.teachable.com/ is a free online course on data management with an archiving endgame.
- Language Data Curation Tutorials (Kung, Sullivant, Niwagaba, Ferreira, 2018) workshop slides (from CoLang 2018) and video tutorials in English and ct/ailla%3A257379
- AILLA’s YouTube IndigenousLanguagesofLatinAmerica/

References
- Archive of the Indigenous Languages of Latin America. 2018. Language Data Curation Tutorials. Archive of the Indigenous Languages of Latin America. a:257379
- Digital Preservation Coalition. 2020. Digital Preservation Handbook: Glossary. https://www.dpconline.org/handbook/glossary
- Johnson, Heidi. 2004. Language documentation and archiving, or how to build a better corpus. In Language documentation and description, ed. by Peter K. Austin, Vol. 2: 140-153. London: SOAS. http://www.elpublishing.org/docs/1/02/ldd02_11.pdf
- Kung, Susan Smythe. 2015, June, revised Aug. 2016. Finding an archive for your (endangered) language research data. Endangered Languages and Their Preservation (CELP) Blog. archive-your-endangeredlanguage-research-data
- Kung, Susan. In press. Developing a Data Management Plan. In Andrea Berez-Kroeker, Bradley McDonnell & Eve Okura Koller (eds.), The Open Handbook of Linguistic Data Management and Archiving, MIT Open Handbooks in Linguistics series. Cambridge: MIT Press.
- National Digital Stewardship Alliance. 2018. Levels of digital preservation. servation/
- NSF. 2020. The Proposal and Award Policies and Procedures Guide (PAPPG) 20-1. https://www.nsf.gov/pubs/policydocs/pappg20_1/nsf20_1.pdf
- NSF. 2018. Data Management for NSF SBE Directorate Proposals & Awards. https://www.nsf.gov/sbe/DMP/SBE_DataMgmtPlanPolicy_RevisedApril2018.pdf
- NSF. 2020. The NSF DLI / NEH DEL Program Solicitation NSF 20-603. https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=505705
- Owens, Trevor. 2018. The theory and craft of digital preservation. Baltimore: Johns Hopkins University Press.

Thank You!
