System Design Document - Govinfo


U.S. Government Publishing Office
Federal Digital System
System Design Document
Volume I: System Architecture
NextGen Edition

Prepared by: FDsys Program
Programs, Strategy, and Technology
U.S. Government Publishing Office
September 2016

Revision History

Revision        Date                  Description
Version 1.0     March 3, 2008         Initial draft
Version 1.1     May 12, 2008          Incorporated comments from PMO and development team. Added logical data model and system-level component model.
Version 2.0     September 5, 2008     Restructured SDD to multiple volumes of individual design documents. The current document is updated to focus on high-level system architecture and design. Added details on OAIS model implementation.
Version 2.01    February 12, 2009     Minor updates
Version 3.0     July 11, 2009         Updated with re-factored processing architecture, and metadata edit through an Xform based web app. Section 4.5.6.
Version 3.1     October 10, 2010      Final updates for Release 1.
Version 3.2     October 14, 2010      Final updates for Release 1, incorporated review comments.
Version 3.3     September 12, 2014    Initial update for NextGen.
Version 4       September 3, 2015     Update for NextGen.
Version 5       October 27, 2015      Changes Accepted.
Version 5.1     September 20, 2016    Updated Network Diagram.

Contents

1. Introduction
    1.1 Purpose
    1.2 System Design Document - SDD
    1.3 Audience
    1.4 Document Organization
    1.5 References
2. System Overview
    2.1 Purpose of the FDsys
    2.2 FDsys Conceptual Design
        2.2.1 Three Subsystems View
        2.2.2 User Characteristics
    2.3 Design Considerations
        2.3.1 FDsys and OAIS Model
        2.3.2 XML in FDsys
        2.3.3 FDsys and GPO Enterprise Architecture
3. Scope of Release 1C.2
    3.1 FDsys Infrastructure
    3.2 FDsys Information Packages and Metadata Management
    3.3 Content Submission
    3.4 GPO Access Replacement
        3.4.1 Incremental Approach
        3.4.2 Transition to FDsys
    3.5 Summary
4. System Architecture
    4.1 Application View
        4.1.1 Content Management Subsystem
        4.1.2 Archival Subsystem
        4.1.3 Access Subsystem
        4.1.4 ILS Integration
    4.2 Network and Storage View
    4.3 Application Security View
    4.4 Deployment View
    4.5 System Component Model
        4.5.1 Submission Component
        4.5.2 Ingest Component
        4.5.3 Access Component
        4.5.4 Infrastructure Component
        4.5.5 Data Model
        4.5.6 Architecture Update – June 2009
        4.5.7 Architecture Update – June 2015
5. Content Packaging
    5.1 Conceptual Package Model
    5.2 Package Implementation Approaches
        5.2.1 SIP
        5.2.2 ACP
        5.2.3 AIP
        5.2.4 DIP
    5.3 Package Lifecycle
6. Data Model
    6.1 Logical Data Model
        6.1.1 Content Data Model
        6.1.2 Security Data Model
        6.1.3 Report Data Model
        6.1.4 Publication Linking Data Model
    6.2 Physical Data Model
        6.2.1 Content Data Model Implementation
        6.2.2 Security Data Model Implementation – LDAP integration
        6.2.3 Report Data Model Implementation
        6.2.3 Publication Linking Data Model Implementation
7. OAIS Functional Model Implementation
    7.1 Content Submission
        7.1.1 Interactive Submission
        7.1.2 Folder-Based Submission
        7.1.3 Bulk Submission
    7.2 Ingest
    7.3 Archival Storage
    7.4 Data Management
    7.5 Access
    7.6 Administration
    7.7 Preservation Planning
8. Business Process Implementation
    8.1 Overview
    8.2 Submission
        8.2.1 Folder-Based Submission
        8.2.2 Interactive Submission
    8.3 Ingest Process
    8.4 Access Processing Workflow
    8.5 Package Updating Process
    8.6 Archival Updating Process
    8.7 AIP Deletion Workflow
9. Data Migration
    9.1 Data Analysis
    9.2 Migration Tool
    9.3 Migration Setup
10. Application Integrations
    10.1 Integration Overview
    10.2 External System Integration
11. Hardware Architecture
    11.1 Storage Architecture
    11.2 Hardware Architecture

FDsys System Design Document – NextGen

1. Introduction

1.1 Purpose

This document describes the architecture and system design of the Federal Digital System (FDsys) for the U.S. Government Publishing Office. It is a living document that evolves throughout the design and implementation of each release. As necessary, each significant release will have an edition of the document. The first public release was R1C2, also known as Release 1 (January 15, 2009). Version 5 of this document was updated to account for the Next Generation FDsys (NextGen). Note that information about the original architecture and design has been maintained for historical reference, and the document has been updated as necessary for NextGen.

The goal of this document is to cover the high-level system architecture and design. It provides guidelines on component functionalities and how they relate to each other at the architectural level. The detailed designs that support the architecture are documented separately in the individual component design documents.

The current document starts with the system architecture, followed by various architectural topics, such as the content packaging model, data migration strategy, and business process flows.

1.2 System Design Document - SDD

The system design document (SDD) for FDsys consists of multiple volumes of individual design documents. In addition to the current document (Volume I), which focuses on high-level architecture and design, there are separate detailed design documents for each major component of the system and data management documents for each type of publication managed by the system. The volume numbers for the SDD are in general consecutive, but gaps exist because some of the originally planned documents were either consolidated or superseded as the design and implementation proceeded.

1.3 Audience

The design documentation is in general for anyone who wants to understand the system architecture and design of FDsys.
The following groups in particular are the intended audience of the document:

- FDsys program managers: to evaluate whether the system architecture and design support the requirements
- FDsys development engineers: to understand the architecture and follow the design to build the system
- FDsys administrators: to understand the internal workings of the system in order to administer the system effectively
- FDsys operators: to improve productivity while using the system on a daily basis
- FDsys maintenance engineers: to understand how the system was built in order to be able to perform any enhancement or reengineering work

1.4 Document Organization

The current document is organized as follows.

Section                                            Purpose
Section 2: System Overview                         To describe the purpose of the system, and provide a conceptual design, along with some high-level design considerations.
Section 3: Scope of Release 1C.2                   To describe the scope of the R1C2, and the incremental development approach for FDsys implementation.
Section 4: System Architecture                     To present the system architecture of FDsys, by viewing the system from various perspectives.
Section 5: Content Packaging                       To describe the content packaging concept of FDsys.
Section 6: Data Model                              To describe the conceptual data model of FDsys, and implementation strategy.
Section 7: OAIS Functional Model Implementation    To describe how the OAIS model is implemented in FDsys.
Section 8: Business Process Implementation         To describe how content flows in the system, and the workflows that support daily FDsys operations.
Section 9: Data Migration                          To describe how to migrate the content data to FDsys.
Section 10: Application Integrations               To provide guidelines on application integration with FDsys.
Section 11: Hardware Architecture                  To summarize hardware architecture for FDsys.

1.5 References

- Concept of Operation for the Federal Digital System.
- Federal Digital System Requirements Document 3.2.
- Reference Model for an Open Archival Information System (OAIS).

2. System Overview

2.1 Purpose of the FDsys

As stated in the Concept of Operations, FDsys will enable federal content originators to easily submit to GPO the content that can be authenticated, versioned, preserved, and delivered upon request. FDsys is the content management system for federal publications within the GPO enterprise, and is at the core of one of the three pillars of the GPO information systems: the Content Management Systems (CMS) pillar. Figure 2.1-1 depicts the conceptual three pillars. The CMS pillar interoperates with the Business Information pillar for financial transaction and ordering management, and with the Digital Production pillar for content and printing specification management. In addition to the core content management functionalities, the CMS pillar is also responsible for managing the acceptance of printing orders and the transmission of content and orders to internal and external service providers.

Figure 2.1-1: Three Information Systems Pillars (labels: Enterprise Resource Management; Manage Content and Metadata; Produce Physical Products; Infrastructure for Basic Services)

Within the content management systems pillar, FDsys provides a platform to support business functionalities in the following three areas: content management, content preservation, and content access.

Content Management

This area supports the daily operations of content management, including content submission, versioning, and updates to content renditions and metadata. Beyond content management, FDsys also has the flexibility to collect, if necessary in post-R1C2 releases, content-related business process information such as content ordering. Interactions with applications in the other two information systems pillars in later releases are therefore also supported in this functional area. As of June 2015, this functionality is planned to be handled by the GPO DASH system and not in FDsys.

Content Preservation

While content management is at the heart of its functionality sets, FDsys goes beyond what standard enterprise content management systems provide. One of the critical missions of FDsys is to preserve the content in its original form and to perform preservation processing on the content and technology refreshment to achieve the goal of making the content permanently accessible.

Content Access

In addition to the strategic mission of long-term content preservation, another critical mission for FDsys is to become the next generation of GPO Access for content access and dissemination. The current GPO Access system was built more than a decade ago with a primary focus on making government publications available online. Its architecture and enabling technologies have shown serious weakness in supporting its business functionalities efficiently without frequent and intensive manual interventions.

The access component of FDsys has subsumed the functionalities of the current GPO Access with a new architecture and design supported by modern technologies. Because GPO Access was highly visible yet supported by a failing architecture and complex daily operational processes, replacing it was one of the high-priority features for the first public release of FDsys, the R1C2 release. Note: GPO Access was retired in March 2010.

2.2 FDsys Conceptual Design

2.2.1 Three Subsystems View

To accomplish its missions, FDsys conceptually has three major subsystems, as depicted in Figure 2.2-1. Two separate content repositories are created, one each for the content management and content preservation subsystems.
These two subsystems are accessible only within the GPO intranet. The repository for content management supports the daily operations of FDsys, such as accepting content submissions, updating existing content and metadata in the system, and publishing content and metadata for public access. The archival repository supports content preservation. Preservation processes in post R1C2 will all be performed on the archival repository.

The two repositories communicate with each other when necessary, but each has its own independent storage for the content.

The access subsystem is in the DMZ for public content access and dissemination. The publicly accessible FDsys packages are published from the content management repository to the access subsystem, which processes the content and associated metadata and makes them available online for general public access.

This high-level conceptual view of the three subsystems will be reflected in the system architecture and application designs throughout the system design documentation.
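The relationship between the three subsystems can be illustrated with a minimal Python sketch. It is not taken from the FDsys implementation: the class names (`Package`, `Repository`, `AccessSubsystem`), the `public` flag, and the `publish` function are all hypothetical and only mirror the description above, in which two intranet repositories keep independent storage and only publicly accessible packages are pushed from the content management repository to the access subsystem.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """A content package with its metadata (hypothetical shape)."""
    package_id: str
    metadata: dict
    public: bool = False  # assumed flag marking publicly accessible packages


@dataclass
class Repository:
    """An intranet-only repository with its own independent storage."""
    name: str
    storage: dict = field(default_factory=dict)

    def store(self, package: Package) -> None:
        self.storage[package.package_id] = package


@dataclass
class AccessSubsystem:
    """DMZ-hosted subsystem that serves published packages to the public."""
    published: dict = field(default_factory=dict)

    def receive(self, package: Package) -> None:
        self.published[package.package_id] = package


def publish(cm_repo: Repository, access: AccessSubsystem) -> list:
    """Push only publicly accessible packages from content management to access."""
    pushed = []
    for package_id, package in cm_repo.storage.items():
        if package.public:
            access.receive(package)
            pushed.append(package_id)
    return pushed
```

In this sketch the archival repository would simply be a second `Repository` instance; it never talks to `AccessSubsystem` directly, which matches the statement that packages are published from the content management repository.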

Figure 2.2-1: FDsys Conceptual Design

2.2.2 User Characteristics

FDsys supports two categories of users: authorized users and public users. The content management and preservation subsystems are accessible only to authorized users. Authorized users are further categorized into functional specialists and system administrators or managers. The following list shows the specialists and managers supported in R1C2; due to shifting business priorities, functionality has not been implemented for the user classes marked below.

- Internal Service Provider
- Congressional Publishing Specialist
- Web Management Specialist
- Cataloging Specialist (not implemented)
- Collection/Content Management Specialist
- Collection/Content Manager
- Preservation Specialist
- System Administrator
- ESB Administrator (not implemented)
- Workflow Administrator
- Business Workflow Manager (not implemented)
- System Administrator Manager
- Security Manager
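As an illustration only, the user classes listed above could be captured in a small enumeration that records which classes were implemented. The enumeration name `UserClass`, its attributes, and the helper `implemented_classes` are hypothetical; the class names and the "not implemented" annotations are the only parts taken from the list.

```python
from enum import Enum


class UserClass(Enum):
    """Authorized user classes from the list above: (label, implemented in R1C2)."""

    INTERNAL_SERVICE_PROVIDER = ("Internal Service Provider", True)
    CONGRESSIONAL_PUBLISHING_SPECIALIST = ("Congressional Publishing Specialist", True)
    WEB_MANAGEMENT_SPECIALIST = ("Web Management Specialist", True)
    CATALOGING_SPECIALIST = ("Cataloging Specialist", False)
    COLLECTION_CONTENT_MANAGEMENT_SPECIALIST = ("Collection/Content Management Specialist", True)
    COLLECTION_CONTENT_MANAGER = ("Collection/Content Manager", True)
    PRESERVATION_SPECIALIST = ("Preservation Specialist", True)
    SYSTEM_ADMINISTRATOR = ("System Administrator", True)
    ESB_ADMINISTRATOR = ("ESB Administrator", False)
    WORKFLOW_ADMINISTRATOR = ("Workflow Administrator", True)
    BUSINESS_WORKFLOW_MANAGER = ("Business Workflow Manager", False)
    SYSTEM_ADMINISTRATOR_MANAGER = ("System Administrator Manager", True)
    SECURITY_MANAGER = ("Security Manager", True)

    def __init__(self, label: str, implemented: bool) -> None:
        # Unpack the tuple value into readable attributes.
        self.label = label
        self.implemented = implemented


def implemented_classes() -> list:
    """Return the labels of user classes whose functionality was implemented."""
    return [user_class.label for user_class in UserClass if user_class.implemented]
```

Keeping the implementation status next to each class name makes it easy to check role assignments against the classes that actually exist in a given release.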

Each authorized user is granted roles that determine what the user can do once successfully authenticated by the system. The preservation subsystem further limits its accessibility to authorized users with preservation privileges. The application security, user roles, and content groups that enforce the access security measures are addressed in detail in the Repository Design document.

The access subsystem is open to public users without authentication. In post R1C2, optional personalization will be provided to public users to customize the appearance of their web pages. As of October 2015, this functionality has not been implemented.

2.3 Design Considerations

To meet the requirements, FDsys is designed and implemented with the following considerations:

- To follow the Open Archival Information System (OAIS) reference model
- To manage metadata in XML
- To align with the GPO Enterprise Architecture (Note: System design occurred in 2008.)

2.3.1 FDsys and OAIS Model

2.3.1.1 The OAIS Reference Model

The OAIS reference model was developed by the Consultative Committee for Space Data Systems (CCSDS) to serve as a standard for digital repositories for preservation purposes. The recommendation of the reference model issued by CCSDS in 2002 was adopted by the International Organization for Standardization as ISO 14721:2003. According to ISO 14721:2003, an OAIS is “an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community”. Note: ISO 14721 was updated in 2012.

Aiming to be a context-neutral and high-level reference model, the OAIS model describes the environment, functional and information models, and responsibilities that apply to an OAIS-compliant archival system.

The OAIS environment model is concerned with the producers and consumers of materials that are delivered to and obtained from the OAIS.
It also covers the management of the OAIS itself. A special class of consumers is the Designated Community, which should be the primary user group of the OAIS, and the OAIS must be able to appreciate the community’s knowledge base in order for information supplied by the OAIS to be understandable by this user group.

The OAIS functional model addresses functional activities for the following entities:

- Ingest
- Archival Storage
- Data Management
- Administration
- Preservation Planning
- Access

The OAIS information model describes the concept of the Information Package and defines what should be included in it. The OAIS model proposes three Information Packages: Submission Information Package (SIP), Archival Information Package (AIP), and Dissemination Information Package (DIP). Each information package includes the digital objects to be preserved, the metadata required to describe the digital objects, and the packaging information that associates the digital objects with their describing metadata.

Finally, the responsibilities of an OAIS archive required by the OAIS model are:

- Negotiate and accept information from Producers
- Determine which community should become the Designated Community
- Ensure the Information Packages are independently understandable
- Ensure the Information Packages are preserved
- Make the preserved Information Packages available

2.3.1.2 Impact of OAIS Model on FDsys Design

As clearly indicated in the ConOps and specified by a set of specific requirements, FDsys will follow the OAIS reference model to manage the content lifecycle. While some of the OAIS entities, such as Data Management for the archive, can be mapped to implementations of relevant functionalities from commercially available enterprise content management systems (CMS), commercial CMS products are in general not designed to conform to the OAIS model, and to the OAIS information model for long-term preservation in particular. This presents a challenge for FDsys in implementing the reference model by using the out-of-box features of a commercial CMS product.

Every commercial CMS product has its own proprietary data model for the content it manages. Though the implementation approaches vary from one product to another, the separation between the content and metadata is common to all data models of the COTS CMS products. While the content may be stored in various storage devices such as a file system, the metadata are normally stored in a persistent store such as a relational database.
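The Information Package described in Section 2.3.1.1 (digital objects, describing metadata, and packaging information that ties the two together) can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the FDsys package format: the class `InformationPackage`, its field names, and its methods are hypothetical, and the actual SIP, AIP, and DIP implementations are covered in Section 5.

```python
from dataclasses import dataclass, field
from enum import Enum


class PackageType(Enum):
    SIP = "Submission Information Package"
    AIP = "Archival Information Package"
    DIP = "Dissemination Information Package"


@dataclass
class InformationPackage:
    """Hypothetical container holding the three parts of an OAIS package."""
    package_type: PackageType
    digital_objects: dict = field(default_factory=dict)  # object id -> content bytes
    metadata: dict = field(default_factory=dict)         # metadata id -> descriptive record
    packaging: dict = field(default_factory=dict)        # object id -> list of metadata ids

    def add_object(self, object_id: str, content: bytes) -> None:
        """Add a digital object; it starts with no describing metadata."""
        self.digital_objects[object_id] = content
        self.packaging.setdefault(object_id, [])

    def describe(self, object_id: str, metadata_id: str, record: dict) -> None:
        """Attach a metadata record to an existing digital object."""
        if object_id not in self.digital_objects:
            raise KeyError(f"unknown digital object: {object_id}")
        self.metadata[metadata_id] = record
        self.packaging[object_id].append(metadata_id)

    def metadata_for(self, object_id: str) -> list:
        """Resolve the packaging information into the object's metadata records."""
        return [self.metadata[m] for m in self.packaging.get(object_id, [])]

    def undescribed_objects(self) -> list:
        """List objects that have no describing metadata yet."""
        return [o for o, ids in self.packaging.items() if not ids]
```

Keeping the association in its own `packaging` mapping, rather than inside either the content or the metadata, reflects the separation between content and metadata that the text notes is common to CMS data models.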
How the association between the metadata and content is modeled and managed varies widely between the CMS products, and has become one of the key differentiators between the competing products.

The simplest and easiest implementation of a content management system would be to use the out-of-box content data model of a COTS CMS, and leverage the application tools usually bundled with the CMS offering to manage the content lifecycle with little customization. An implementation of this type, however, creates a total dependence on the underlying CMS and has little flexibility to adapt to technology evolution over time. This approach therefore cannot fully meet the FDsys requirement for independence from the underlying supporting technology.

Through the information package concept, the OAIS model proposes a high-level abstraction that creates the opportunity for an implementation-independent packaging scheme. This is especially beneficial for long-term preservation, which normally has to outlive the lifecycle of the underlying technology that facilitates the preservation process. Through its carefully designed content data model and self-describing archival package, FDsys provides an implementation of the OAIS AIP by leveraging the content management capabilities of the COTS CMS product with an XML-based abstraction layer to minimize the dependence on the underlying CMS product. Details of the FDsys implementation strategy for the OAIS model can be found in Sections 5 and 7.

2.3.2 XML in FDsys

Metadata management is at the heart of all content management systems, and is one of the key functionalities of FDsys. Unfortunately the commercial CMS products, as mentioned earlier, all have their own proprietary metadata models, which in fact have become one of the critical differentiators between the competing products. The non-standards-based metadata models present a problem for FDsys in accomplishing its mission: to preserve and disseminate the content and metadata over an indefinitely long period of time. This mission requires that the FDsys implementation remain flexible and not tied to any proprietary CMS implementation, and that it be able to adapt to technology changes over time.

To achieve this goal for long-term preservation, the FDsys requirements specify that all metadata for FDsys content must be in XML form, promoting an implementation that manages the metadata for FDsys content independ
