SMR - Database Administration


- Database Administration and Monitoring
- Roles and Responsibilities
- System Information
- Database Management System Configuration
- Database Support Software
- Security and Privacy
- Environment Security
- Encryption at Rest
- Role-Based Data Access Security
- Document Element-Level Security
- Logging and Audit
- Performance Monitoring and Database Efficiency
- Performance Monitoring Tools
- OpsDirector – Environment Performance Monitoring
- MarkLogic Monitoring Dashboard
- MarkLogic Administration Console
- Performance Tuning
- Operational Implications
- Scalability
- Availability
- Failover
- Data Transfer Requirements
- Data Formats
- Backup and Recovery
- Backup
- Recovery

Database Administration and Monitoring

This section describes the requirements and strategies to maintain the database operationally, considering the following:

- Required availability and requirements for standby sites of the SMR data stores to satisfy continuity of operations and meet required SLAs.
- Any database-specific application and user support scenarios that are not documented in the SDD.
- Any monitoring and performance goals/requirements, and how the DDD supports them.
- Required maintenance of the SMR data stores to maintain acceptable performance.
- Backup and recovery strategies needed to implement the DDD.
- Any security and/or privacy considerations.

Along with the activities above, the following sections also address the security, performance, and other architecturally significant requirements documented in Jama at these links: Link-1, Link-2.

Roles and Responsibilities

See Section 1.4 for more information.

System Information

This subsection identifies the vendor, version or release date, and targeted hardware for the SMR implementation.

Database Management System Configuration

This subsection identifies the vendor, version or release date, and targeted hardware for the MarkLogic NoSQL database chosen for the initial implementation of the SMR database. It also provides the configurations that are performed on the system to implement the SMR.

The following table identifies all the relevant system configuration information for the various databases within the SMR.

Table 7: Database System Information

Details | Specification
Project | SMR
Purpose | Data stores of RDL, SDS, and Metadata
Database Type | NoSQL
Database Vendor | MarkLogic
Database Version | MarkLogic 9 (version 9.0.7 released September 2018)
Operating System | RedHat Linux
Operating System ... | ... pository-test-content
Number of forests | 2 forests for each of the databases mentioned above
Username | <USR NM>
Password | <PWD>
Connection String | N/A
Schema Name | N/A
Hosting Data Center | DoIT
Access Permissions | Read-only access will be provided to the configured users. Read-write access will be provided to the administrators.
Query Console URL | https://<HOST NAME>:8000/qconsole/
Data Explorer URL | https://<HOST NAME>:7777/login
MarkLogic Monitoring Dashboard UI | https://<HOST NAME>:8002/dashboard/
Admin Console URL | https://<HOST NAME>:8001/

Database Support Software

This subsection lists and references the MarkLogic NoSQL database documentation available to support the use or maintenance of the SMR database.

MarkLogic supports its customers through the following channels:

Table 8: MarkLogic Support

Type | Link
Documentation and ... | ... ase/List
Ticketing ... | ... ort
... c Developer-friendly ... ogic Developer community and discussion forums | https://www.marklogic.com/community/

Security and Privacy

The importance of securing the data stored in the SI Platform increases due to the nature of the data itself. As discussed above, the SMR and MDM store data classified as Personally Identifiable Information (PII), Protected Health Information (PHI), FTI, PCI, and so on. To secure the stored data, the SI Platform design suggests implementing the following layers of security:

- Environment or Network-Level Security
- Encryption at Rest
- Database Role-Based Access Security
- Document Element-Level Security

Environment Security

The SMR and MDM are deployed on servers in the secured Data-Zone, as described in the Infrastructure Design. External users cannot access the Data-Zone.

Access to the MarkLogic cluster additionally requires TLS/SSL authentication.

The figure below shows the option to enable or disable SSL on a MarkLogic cluster.

Figure 6: Enable SSL From MarkLogic Admin Console

Encryption at Rest

In the case of MarkLogic, the cluster data will be encrypted using keys managed by a Key Management Service (KMS). The KMS manages a keystore that holds the encryption keys in a secure location. This keystore can be either the MarkLogic embedded PKCS #11 secured wallet or an external third-party KMS that conforms to the Key Management Interoperability Protocol (KMIP) standard interface.

The figure below shows the option to configure a MarkLogic cluster with an internal KMS.

Figure 7: Data Encryption in MarkLogic Database

Role-Based Data Access Security

The MarkLogic security model is based on the Principle of Least Privilege: users are given only those privileges that are required to perform their jobs efficiently. The privileges are in turn derived from roles. Roles are the central point of authorization in the MarkLogic Server security model. Privileges, users, other roles, and document permissions all relate directly to roles. The following conceptual diagram shows how each of these entities points into one or more roles.

Figure 8: Role-Based Access and Permissions

As suggested in the MarkLogic Security Guide, the SI Platform will use the following two types of privileges:

- URI privileges control the creation of documents with particular URIs.
- Execute privileges protect the execution of functions in the XQuery code.

Document Element-Level Security

The SMR and MDM components will also enable element-level security applied at the document level. Element-level security protects part of a document from being visible to unauthorized users. Elements of a document will be protected from being viewed or updated by a user unless the user has the appropriate role-based authorization. The following figure describes the element security applied to an XML document when viewed by a user with lesser privilege.

Figure 9: Element-Level Security Example

Element-level security works by specifying an 'indexable' path to an element (or JSON property) and configuring permissions on that path. A path that has been configured with permissions in this way is called a protected path. Permissions are defined on an element the same way they are defined on a document. A document path can be protected programmatically or through the admin configuration, as shown in the figure below.

Figure 10: Enable Element-Level Security in MarkLogic
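The protected-path idea can be illustrated with a small toy model. This sketch is not the MarkLogic API: the names PROTECTED_PATHS and visible_view, the role names, and the sample document are all hypothetical, and the code only mimics the visible effect (a protected property is hidden unless the user holds a permitted role) on nested Python dictionaries.

```python
# Toy model of element-level security. This is NOT MarkLogic code:
# PROTECTED_PATHS, visible_view, and the role names are hypothetical.

# Each protected path is a tuple of property names mapped to the set of
# roles that are allowed to read the value at that path.
PROTECTED_PATHS = {
    ("person", "ssn"): {"pii-reader"},
    ("person", "health", "diagnosis"): {"phi-reader"},
}


def visible_view(node, user_roles, path=()):
    """Return a copy of a nested dict without the properties that
    `user_roles` may not read. Non-dict values are returned unchanged."""
    if not isinstance(node, dict):
        return node
    view = {}
    for key, value in node.items():
        child_path = path + (key,)
        allowed = PROTECTED_PATHS.get(child_path)
        if allowed is not None and not (allowed & user_roles):
            continue  # protected path and no matching role: hide it
        view[key] = visible_view(value, user_roles, child_path)
    return view


document = {
    "person": {
        "name": "Jane Doe",
        "ssn": "123-45-6789",
        "health": {"diagnosis": "example", "provider": "Clinic A"},
    }
}

caseworker_view = visible_view(document, {"caseworker"})
pii_reader_view = visible_view(document, {"pii-reader"})
```

In this sketch a user holding only the hypothetical caseworker role sees neither the ssn nor the diagnosis property, while a pii-reader sees ssn but still not diagnosis. The stored document itself is never modified; only the view returned to the user changes.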

In addition to these protected paths, query rolesets must be defined appropriately for the roles in order for element-level security to work. MarkLogic uses query rolesets to determine the search results based on the role(s) of the user running the query, in addition to the term being searched. Like protected paths, query rolesets can be configured programmatically or through the admin interface, as shown in the figure below.

Figure 11: Query Rolesets

The Redaction feature is a read transformation applied on top of XML and JSON documents. Redaction addresses privacy concerns by making it possible to remove or mask information when importing, exporting, or copying data into and out of MarkLogic. This prevents leakage of sensitive information to unauthorized users.

Figure 12: Element Redaction
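The effect of masking and concealment on a single value can be sketched in a few lines of Python. This is an illustration only and does not use MarkLogic's redaction functions: the function names, the '#' mask character, and the SHA-256 based deterministic replacement are assumptions made for the example.

```python
import hashlib

# Illustrative only: these helpers are hypothetical and are not
# MarkLogic's built-in redaction functions.


def mask_full(value, mask_char="#"):
    """Obscure every letter or digit and drop separators."""
    return mask_char * sum(ch.isalnum() for ch in value)


def mask_partial(value, keep_last=4, mask_char="#"):
    """Mask letters and digits except the last `keep_last`; keep separators."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def mask_deterministic(value, length=8):
    """Same input always yields the same replacement (a SHA-256 hex prefix)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def conceal(record, field):
    """Return a copy of `record` with `field` removed entirely."""
    return {key: val for key, val in record.items() if key != field}


full = mask_full("123-45-6789")        # "#########"
partial = mask_partial("123-45-6789")  # "###-##-6789"
```

Deterministic masking is the variant to choose when redacted values must still join or group consistently across documents; random masking gives up that property in exchange for revealing less about repeated values.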

Some of the variations of the redaction mechanism supported by MarkLogic are:

Table 9: Data Redaction in SMR and ...

Masking | Full | The original value is completely obscured. For example, 123-45-6789 becomes #########.
Masking | Partial | A portion of the original value is retained. For example, 123-45-6789 becomes ###-##-6789.
Masking | Deterministic | The same input always results in the same redacted output. For example, the value '12345' becomes '11111' everywhere it appears in content selected for redaction.
Masking | Random | Each input results in a random redacted value. For example, the value '12345' might be masked as '1a2f578' in one place and '30da61b' in another.
Masking | Dictionary-based | A form of random or deterministic masking in which the replacement value is drawn from a user-defined redaction dictionary.
Concealment | N/A | The original value (and potentially the containing XML element or JSON property) is entirely removed. For example, if you conceal the value of /a/b, then <a><b>12345</b></a> might become <a/>.

MarkLogic uses rule-based redaction when exporting a document to determine the redaction logic. A redaction rule tells MarkLogic how to locate the content within a document that should be redacted and how to modify that portion. A rule expresses the business logic, independent of the documents to be redacted.

Figure 13: External KMS Configuration

Logging and Audit

Process logs help investigate and troubleshoot potential failures and errors in data ingestion. In the event of an error, the admin can use the log files to identify the point of failure. Log data will be both comprehensive and easy to analyze. The logs demonstrate compliance with safe data handling by documenting critical information about files, jobs, and data, as well as user activity.

The following table lists the data elements that will be logged.

Table 10: Logging and Audit Data Elements

Job and File Data Elements
- File name
- Source location
- Target location
- Transfer initiating system (source name)
- Initiating procedure name
- Initiation time
- Completion time
- Size of the file
- Transfer status

User Activity Data Elements
- Usernames
- Log-in time
- Session length
- Transfers initiated
- Folders accessed
- Files read/updated/deleted

Performance Monitoring and Database Efficiency

The SMR leverages MarkLogic's performance monitoring and database efficiency-enhancing tools and design guidelines.

For database performance monitoring, MarkLogic provides the OpsDirector tool, which the SMR will configure to monitor the live performance of the various SMR nodes and databases. For application performance, the MarkLogic Monitoring Dashboard and the MarkLogic admin console provide dashboards that give a holistic view into the SMR application. These are explained in the following subsections.

Performance Monitoring Tools

The following subsections describe the performance monitoring and performance tuning options enabled in the SMR. As described above, the SMR leverages the following tools provided by MarkLogic:

- OpsDirector
- Monitoring Dashboard
- Admin Console

OpsDirector – Environment Performance Monitoring

OpsDirector is an easy-to-use plugin that can be configured on the MarkLogic server to provide a clear view of the entire SMR environment. It offers centralized monitoring with dashboards and alerts, as shown in the figure below. It also provides real-time visibility into the SMR assets and access to the event logs, helping database administrators troubleshoot performance impediments.

Figure 14: Sample MarkLogic OpsDirector Dashboard

OpsDirector also provides a view into real-time performance metrics, such as disk I/O, request rate, server latency, CPU and memory utilization, memory I/O, and network availability, as shown in the following figure.

Figure 15: OpsDirector Performance Metrics

Figure 16: OpsDirector Memory Performance Metrics

MarkLogic Monitoring Dashboard

Application and query execution performance can be monitored via the Monitoring Dashboard available in the query console, as shown in the following figure.

Figure 17: MarkLogic Monitoring Dashboard

The Monitoring Dashboard also provides options to monitor the disk space and access rates of the server.

MarkLogic Administration Console

The SMR database can be administered from the MarkLogic administration console, which is available on port 8001 and rendered in the browser. The admin console runs on MarkLogic's built-in HTTP application server.

The database administrator can securely log in to the admin console and configure the database settings to achieve optimal performance. The admin console also lets the DBA perform manual or scheduled backup and restore operations. The following is a partial list of what can be accomplished using the MarkLogic admin console:

- Create and configure forests
- Create and configure databases
- Administer database security
- Enable data encryption
- Manage database indexes and search parameters
- Administer RDF triples
- Configure the database memory parameters (as shown in the figure below)

Figure 18: MarkLogic Memory Configuration

- Build and/or configure thesaurus for global naming recognition
- Thread and connection pool management
- The admin console provides a holistic view of the MarkLogic cluster, its databases, and their statuses (as shown in the figure below)

Figure 19: MarkLogic Admin Console DB Summary

- Monitor database statistics (as shown in the figure below)

Figure 20: Database Stats

- Audit logging and viewing the historical log files

Figure 21: MarkLogic Log Viewer on Admin Console

- Summary of the database

Figure 22: Database Summary

- Configure and manage the nodes in the MarkLogic cluster
- Administer the hosts on a given cluster

Figure 23: MarkLogic Cluster Hosts Summary

Performance Tuning

The SMR uses the performance tuning options available in MarkLogic. All configurations that enhance performance and database efficiency are made using the MarkLogic Admin Console, as described in Section 7.3.1.3.

Some of the elements and tools that can be configured to enhance performance are listed below:

Thread count configuration

The MarkLogic task server can be configured to run a specific number of threads; the default maximum is 16, as shown in the figure below. The database administrator can change the maximum number of threads without restarting the server, based on the efficiency to be achieved.

Figure 24: Task Server - Threads and time limit

Usage of element range index

The SMR will create MarkLogic element range indexes, especially on all date and ID fields, to speed up searches on these frequently searched fields. More information on range indexes can be found here.

The database administrator can create and manage the range indexes on the MarkLogic Admin Console without restarting the server, as shown in the figure below.

Figure 25: Usage of Indexes

Query console – Profiling the queries

The system engineers can control the performance of queries by profiling the requests on the query console. This is a supplementary tool that developers can use to tune their queries for optimal performance. The following figure shows a query profiling example. For details on profiling, please read this information.

Figure 26: Query Profiler

Operational Implications

This subsection describes the operational implications of data transfer, refresh, and update scenarios and expected windows, including security considerations.

As detailed in Section 5.4, the SMR supports both bulk exports as files and real-time data exports. In addition, the SMR leverages the export options available in MarkLogic.

MarkLogic provides the following export options:

- Exporting selected documents as independent XML files to a directory
- Exporting selected documents to a compressed file
- Exporting to an archive
- Exporting an entire collection to a compressed file
- Exporting an entire collection as XML files to a directory
- Exporting the database snapshot
- Scheduled data exports

The performance impact of data exports and transfers will be mitigated by following MarkLogic's scalability, availability, and failover guidelines, explained in the following subsections.

Along with these options, the SMR database administrator can increase the number of threads used by the MarkLogic CORB program to achieve the desired data export or delivery performance.

CORB exposes a range of options that can be configured to suit the application requirements. The THREAD-COUNT option is one of the parameters that database administrators can use to control the number of worker threads used for a particular operation. Based on the load and the size of the content to be exported, the administrator can set it to an optimal number of threads. By default, CORB sets it to 1, but the SMR is designed to use 10 as its default, with the option to increase or decrease the value before the export programs are executed.

Scalability

The SMR uses MarkLogic's capabilities to scale. MarkLogic Server is built on foundations derived from both database and search engine architectures. Consequently, updates become available for querying as soon as they commit, and queries against extremely large content sets return very quickly.

The following information is available in the MarkLogic documentation library.

MarkLogic Server evaluates queries on an evaluator node (e-node), and the e-node gathers any needed content from the data nodes (d-nodes). If the content set is large and spread across several d-nodes, each of those d-nodes is involved in the query (even if only to inform the e-node that it has no content matching the query). The calls to the d-nodes are returned to the e-node in parallel, and because the content is indexed when it is loaded, the calls to each of the d-nodes return very quickly.

As the content grows, the DBA will need to add forests to the database for optimal performance. There is no limit to the number of forests in a database, but there are some guidelines for individual forest sizes.

The numbers in these guidelines are not exact, and they vary considerably based on the content. These numbers are based on average-sized fragments of 10k to 100k. If the data fragments are much larger on average, or if there are many large binary documents, then the forests can be configured to larger numbers.

The rule-of-thumb maximum size for a forest on a 64-bit system is 256 GB per forest or 128 million fragments, whichever comes first. If a forest grows past 128 million fragments, the system will start logging messages to the ErrorLog.txt file warning about the number of fragments in the forest. All binary documents should be considered in fragment count estimations, but large and external binary documents may be excluded from the 256 GB per forest limit since they contribute little to the size of stands within a forest. For example:

- Fragment count: 15M non-binary fragments, 10M small binary fragments, and 10M large/external binary fragments give 35M fragments total. The guideline is within range; no action is needed based on fragment count.
- Size: 100 GB non-binary size, 150 GB small binary size, and 1 TB large/external binary size give 250 GB of non-binary and small binary size. The guideline is at the borderline; consider creating more forests based on size.

Availability

The following information is available in the MarkLogic documentation library.

The SMR leverages MarkLogic Server's high availability features, which enable fast and reliable performance and provide recovery from power outages, application errors, or software failures. Many features in MarkLogic Server are designed to keep the server running and available:

- Fast automatic restart
- Automatic, concurrent forest recovery
- Tunable database parameters, such as memory limit, in memory list size, and in memory range index size
- Online
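Returning to the forest sizing guideline in the Scalability subsection, the arithmetic behind its worked example can be sketched as follows. The two limits come from the guideline itself; the function name, the returned field names, and the 0.9 "borderline" threshold are assumptions introduced only for this illustration.

```python
# Rule-of-thumb limits quoted in the Scalability subsection.
MAX_FOREST_GB = 256
MAX_FOREST_FRAGMENTS = 128_000_000

# Hypothetical threshold for calling a forest "borderline"; the guideline
# itself does not define one.
BORDERLINE_RATIO = 0.9


def forest_guideline(non_binary_frags, small_binary_frags, large_binary_frags,
                     non_binary_gb, small_binary_gb):
    """Compare one forest against the rule-of-thumb limits.

    All binary fragments count toward the fragment total, while the size of
    large/external binaries is left out of the size total.
    """
    total_fragments = non_binary_frags + small_binary_frags + large_binary_frags
    counted_gb = non_binary_gb + small_binary_gb
    fragment_ratio = total_fragments / MAX_FOREST_FRAGMENTS
    size_ratio = counted_gb / MAX_FOREST_GB
    return {
        "total_fragments": total_fragments,
        "counted_gb": counted_gb,
        "fragment_ratio": fragment_ratio,
        "size_ratio": size_ratio,
        "fragments_borderline": fragment_ratio >= BORDERLINE_RATIO,
        "size_borderline": size_ratio >= BORDERLINE_RATIO,
    }


# The worked example: 15M + 10M + 10M fragments, 100 GB + 150 GB counted size
# (the 1 TB of large/external binaries is excluded from the size check).
report = forest_guideline(15_000_000, 10_000_000, 10_000_000, 100, 150)
```

For the example inputs the fragment count sits well below the limit while the counted size is close to 256 GB, which matches the guideline's conclusion that size, not fragment count, is the reason to consider adding forests.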
