Continuing To Keep Stuff Safe With Lots Of Copies

Transcription

Continuing to Keep Stuff Safe with Lots of Copies, Communities, and Innovation. Nicholas Taylor (@nullhandle), Program Manager, LOCKSS and Web Archiving, Stanford Libraries. LOCKSS open webinar, 29 January 2019

understand and mitigate threats: long-term data integrity is hard; needs architecture informed by actual leading threats to data; don’t underestimate: people making mistakes, attacks on information, organizational failure. “Fragile” by Garrett Coakley under CC BY-NC 2.0

what is LOCKSS? a widely-accepted principle for the persistence of digital info; a digital library-focused program of Stanford Libraries; a research-informed software app for p2p digital preservation; an international community of institutions and networks. “Cologne Love Padlocks” by orkomedix under CC BY-NC-SA 2.0

more than lots of copies: lots of copies is necessary but not sufficient; central points of failure can undermine all copies at once; LOCKSS provides continual integrity checking and repair between mutually-distrusting, independent peers, on a network that your community controls. “Domino's” by david pacey under CC BY 2.0

routine audit and repair: ensuring long-term data integrity; must read data to know it’s good; easier to repair data sooner; network nodes conduct polls to validate integrity of distributed copies; more nodes, more security: more nodes can be down, more copies can be corrupted, and polls will still conclude. “DSC 4346” by Dennis Jarvis under BY-SA 2.0
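The polling idea on this slide can be illustrated with a deliberately simplified sketch. It is not the LOCKSS polling protocol: it assumes a plain majority vote over SHA-256 digests, and the names `digest`, `run_poll`, and the `node-*` labels are hypothetical, introduced only for illustration.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one node's copy."""
    return hashlib.sha256(data).hexdigest()


def run_poll(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Hypothetical poll: find the majority digest and repair dissenting nodes.

    `copies` maps a node name to the bytes that node currently holds.
    Returns a new mapping in which every node holds the majority content.
    Raises ValueError if there are no copies or no strict majority.
    """
    if not copies:
        raise ValueError("poll inconclusive: no nodes responded")
    # Each node "votes" with the digest of the copy it actually read.
    votes = {node: digest(data) for node, data in copies.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(copies):
        raise ValueError("poll inconclusive: no majority agreement")
    # Any node that voted with the majority can serve as the repair source.
    source = next(node for node, d in votes.items() if d == winner)
    good = copies[source]
    return {
        node: (data if votes[node] == winner else good)
        for node, data in copies.items()
    }


if __name__ == "__main__":
    network = {
        "node-a": b"article v1",
        "node-b": b"article v1",
        "node-c": b"artic\x00e v1",  # simulated corruption
        "node-d": b"article v1",
    }
    repaired = run_poll(network)
    print(all(data == b"article v1" for data in repaired.values()))
```

Under this simplified majority rule, adding nodes raises the number of damaged or missing copies the network can tolerate before a poll becomes inconclusive, which is the intuition behind "more nodes, more security".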

community digital preservation: communities complement LOCKSS (resilience against organizational failure, native heterogeneity); preservation is an active community effort; lots of communities keep stuff safe. “Redwood Canopy” by Floyd Stewart under CC BY-NC-SA 2.0

How LOCKSS Works. “gears” by starfive under CC BY-NC-SA 2.0

distributed digital preservation: align w/ best practice; achieve resilience not possible w/ centralized solution; for use either as dedicated preservation solution or to supplement local preservation (e.g., for most important materials); particular to LOCKSS among service providers: strongly research-based, articulated threat model, supports local custody. “Stone stacks” by Jack Malvern under CC BY-NC 2.0

content lifecycle. ingest content: web harvest, OAI-PMH, direct interconnect, or drag-and-drop via LOCKSS-O-Matic. manage content: web-accessible GUI to monitor preservation activity (and select new content for archiving, in some networks). preserve content: each node retrieves content independently; once stored, audit and repair takes place automatically, on ongoing basis. deliver content: proxy server, web server, OpenURL.

setup, support, costs: organize your community around content of shared concern; we will consult on fit, technical requirements, workflow integration; pilot implementation w/ subset of nodes to validate workflows; production implementation; ongoing support; participants are asked to join the LOCKSS Alliance (annual membership fee). “Planning in progress” by Guillaume Capron under CC BY-SA 2.0

start new or join existing network. start a new network: recommend 4 copies; we can host node(s). also an option to join an existing network: reach out to network communities directly. we are exploring how to better support needs of individual orgs that aren’t aligned w/ a logical community. “Pipes” by Travis Leech under CC BY-NC-ND 2.0

Use Cases. “ROC/Bredero” by Jesper2cv under CC BY-NC-ND 2.0

post-cancellation access for e-resources. networks: Global LOCKSS Network. restore best features of print journal holdings lost w/ online publishing transition: local custody (vice contingent access); lots of decentralized copies (vice fewer, centralized copies). to better assure: preservation of scholarly record; continuing library role as steward.

dark archive for scholarly publications. networks: CLOCKSS Archive; Public Knowledge Project Preservation Network. CLOCKSS: co-governed by libraries and publishers; content triggered OA when no longer available; top CRL TRAC audit score. PKP PN: OA content hosted on OJS; free and seamless to use for folks publishing on OJS.

government information. networks: Canadian Government Information; Digital Federal Depository Library Program. can’t necessarily depend on government for permanent access; save and re-decentralize government information.

institutional repository content. networks: Alabama Digital Preservation Network; MetaArchive Cooperative; WestVault. all types of content. service models: all depositors also host infrastructure; subset of orgs hosts infrastructure but serves whole community. governance and infrastructure both community-based.

web archives. networks: Ivy Plus Libraries Confederation Preservation Network. growing relative importance of web archives for collection development; decentralized, local custody preservation to complement Archive-It.

national / nationally-licensed scholarly publications. networks: Cariniana; German national network. natural national interest in preserving own OA output; national consortia want jurisdictional control over licensed scholarly content.

Software Re-Architecture. “The two bridges” by Frank Schulenburg under BY-SA 2.0

software re-architecture motivation: monolithic Java application; only deployable as end-to-end solution; lacking modern APIs; maintaining functionality on our own that others increasingly address as a community; undertook re-architecture 2017-2019 w/ Mellon Foundation funding. “The SF Bay Bridge reopens” by Benjy Feen under CC BY-NC-SA 2.0

software re-architecture goals: capitalize on work of broader communities; de-silo and enable external integrations; empower community of practice w/ better documentation and well-defined APIs; evolve w/ web and digital preservation ecosystem. “New York Reflection” by Reto Fetz under CC BY-NC-SA 2.0

anticipated outcomes: collaborate to build new hybrid solutions; align better w/ community workflows and interfaces; simplify adaptation of LOCKSS to local needs; support LOCKSS technical community of practice; expand contexts where LOCKSS can contribute to digital preservation. “Bloom” by Jak W! under CC BY-NC 2.0

new integration possibilities. Data Life-Cycle Management (DLCM): Swiss universities collaborative research data management. Software Preservation Network (SPN): promoting best practices; piloting distributed emulation infrastructure. Webrecorder: high-fidelity web capture and replay software.

fixity service: some content too big for lots of copies; instead, make lots of copies of checksums; subject to LOCKSS polling and repair; provide API endpoint; compare w/ hash result generated by external system. “Measure twice, cut once.” by GretaMichelle Joachim under CC BY 2.0
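The checksum comparison described on this slide might look roughly like the sketch below. It assumes SHA-256 and an in-memory dictionary standing in for the preserved checksums; `FixityRegistry`, `register`, `verify`, and `sha256_stream` are hypothetical names, not part of any LOCKSS API, and the slide does not specify the real endpoint.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly very large) object in chunks, as an external system might."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


class FixityRegistry:
    """Hypothetical stand-in for checksums that a network preserves.

    Only the small checksum is kept here; the large object stays elsewhere.
    """

    def __init__(self) -> None:
        self._checksums: dict[str, str] = {}

    def register(self, object_id: str, checksum: str) -> None:
        """Record the checksum computed when the object was first deposited."""
        self._checksums[object_id] = checksum.lower()

    def verify(self, object_id: str, external_checksum: str) -> bool:
        """Compare a hash produced by an external system with the stored one.

        Raises KeyError if no checksum was registered for `object_id`.
        """
        return self._checksums[object_id] == external_checksum.lower()


if __name__ == "__main__":
    registry = FixityRegistry()
    original = b"large research dataset" * 1000
    registry.register("dataset-001", sha256_stream(io.BytesIO(original)))

    # Later, the external system re-hashes its own copy and asks for a comparison.
    print(registry.verify("dataset-001", sha256_stream(io.BytesIO(original))))
    print(registry.verify("dataset-001", sha256_stream(io.BytesIO(original + b"!"))))
```

In this sketch only the checksums would need to be replicated and audited across nodes, while the large object itself stays in the external system that computes the hash.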

cloud friendl(ier): may enable some use cases; improve handling of others; technically feasible, but not economically optimized; explore using cloud in concert w/ fixity service; benchmark cloud costs (revisiting prior research on LOCKSS in the cloud); leverage w/o ceding value of distributed, local content custody. (confidential) “State-of-the-art cloud storage. 2015.” by Samarth Shyamanur under CC BY-NC 2.0

Wrap Up. “Birmingham City Centre - Nov 2013 - The New Library Wrapped and Ready for Christmas” by Gareth Williams under CC BY 2.0

takeaways: LOCKSS is a general-purpose digital preservation platform; re-architecture will provide improved integration and interoperability; learn more at our new website: lockss.org; please contact us with any questions, or ideas on how we can work together. “Partir” by Chiara Conticelli under Fair Use

Questions. “Any Questions?” by Matthias Ripp under CC BY 2.0
