What's New In MarkLogic 9


WHITE PAPER, MAY 2017

The best database in the world for data integration is now even better with MarkLogic 9, our most ambitious release yet. MarkLogic 9 includes major new features to improve data integration, security, and manageability, all designed to make integrating data from silos easier and faster, and to ensure that your integrated data is well-governed and managed.

Contents
Introduction
Faster, Easier Data Integration
  Entity Services: Model-driven data integration around real-world entities
  MarkLogic Optic API: Joins and aggregates over documents
  Template Driven Extraction: View your documents with lenses
  ODBC Driver: Improved integration with BI and analytics tools
  Data Movement SDK: Bulk data movement with Java
Industry-Leading Data Security
  Encryption: Advanced encryption at rest
  Element Level Security: Granular security at the sub-document level
  Redaction: Export controls to avoid sharing sensitive data
Improved Manageability
  Ops Director: One clear picture of your MarkLogic system
Even More Enhancements
Getting Started With MarkLogic 9

INTRODUCTION

MarkLogic 9 is the most ambitious release to date of the world’s best database for integrating data from silos. This is true in the literal sense: MarkLogic 9 includes more new features and updates than any previous release. The new and enhanced features focus on three core areas: data integration, security, and manageability. MarkLogic was already the best database for data integration and the most secure NoSQL database, and it already had a robust set of admin tools. MarkLogic 9 elevates these capabilities to a whole new level. These improvements position MarkLogic as the leading new-generation database for integrating all of your data from silos and running modern, transactional applications at scale.

FASTER AND EASIER DATA INTEGRATION BUILT ON MARKLOGIC’S MULTI-MODEL CAPABILITIES

Traditional relational approaches compromise flexibility and governance by limiting you to a rigid, tabular data model when integrating data. MarkLogic 9 introduces a suite of new features that frees you from those constraints by making data integration easier and more iterative, so that you can get more value out of your data, faster. You can bring all your siloed data, whether relational or non-relational, into a single unified platform that is designed for versatility. And with MarkLogic 9, you can expose your data however you want: as documents, as a graph, or as relational data by using the new SQL capabilities. You can even combine models. With this unprecedented flexibility, you can avoid expensive and brittle ETL and better manage the entities and relationships that your business works with.

IMPROVED DATA GOVERNANCE AND ADVANCED SECURITY FEATURES

What would be the purpose of integrating data from silos if that data is not properly governed and secured? MarkLogic 9 is designed with new capabilities to govern your integrated data through the entirety of its lifecycle and to secure it against increasing cybersecurity threats. For over a decade, organizations have run MarkLogic in mission-critical environments, and new security features further solidify MarkLogic’s position as the most trusted NoSQL database for data governance. But MarkLogic 9 goes further than that: it is a major step toward making MarkLogic the most secure database, period.

IMPROVED MANAGEABILITY TO MAKE DBAs MORE EFFICIENT

With the rapid proliferation of applications and services around the globe, DBAs need better tools than ever to keep up with exponential growth. MarkLogic 9 introduces new features that provide a more unified platform for managing globally distributed MarkLogic database clusters throughout their lifecycles. With a clear picture of the system and improved admin tools, DBAs can increase automation and spend more time on tasks that add value to the business.

ADDITIONAL ENHANCEMENTS THAT EXTEND MARKLOGIC CAPABILITIES

Beyond the improvements in the thematic areas of data integration, manageability, and security, MarkLogic 9 includes many other enhancements. Compliance Archives make it easier to store, manage, and secure historical data using time-based and event-based policies. Tiered Storage, a feature for moving data to less expensive storage, is faster and more flexible. Geospatial gets more advanced and precise. Search gets more robust, with expanded language support and other improvements. Query Console gets JavaScript profiling and type-ahead suggestions. And we now support running MarkLogic on the Microsoft Azure cloud, not to mention hundreds of other improvements that continue to set the bar high for Enterprise NoSQL.

FASTER, EASIER DATA INTEGRATION

Two of MarkLogic’s main differentiators are its flexible data model and its sophisticated indexes, which together make integrating data from silos faster and less risky. MarkLogic 9 introduces a suite of new capabilities that take advantage of those differentiators to make data integration even faster, easier, and more powerful.

One new feature, Entity Services, provides a way to catalog and manage business entities (Customers and Orders, Trades and Counter Parties, Providers and Outcomes, etc.). You can describe them in as much or as little detail as needed, and refine them over time as new data is added. The new Optic API provides unprecedented flexibility by giving developers a fluent interface to query all of their data using JavaScript, XQuery, Java, or Node.js. The Optic API also makes it possible to do efficient joins and aggregates over documents, making MarkLogic the only database in the world that can do this. The underlying technology enabling this capability is Template Driven Extraction (TDE). TDE makes it possible to define a relational lens over your document data, so you can query parts of your data using standard SQL or the new Optic API without changing the underlying data. You can also define a triples lens over your document data and query it with SPARQL or the Optic API.

The SQL engine in MarkLogic 9 has been revamped from the ground up to provide faster SQL and ODBC driver access, so that MarkLogic integrates better with your business intelligence, reporting, and analytics tools.

And what would data access be if you were unable to easily get data into and out of MarkLogic? The Data Movement SDK, a new Java library for applications, makes it easy to move large amounts of data into, out of, or within a MarkLogic cluster.

ENTITY SERVICES: MODEL-DRIVEN DATA INTEGRATION AROUND REAL-WORLD ENTITIES

With its flexible data model and ability to store multiple schemas simultaneously, MarkLogic is an excellent database for data integration. But organizations don’t just need the ability to store multiple schemas simultaneously. They also need to query against fixed, predictable aspects of the data that represent real-world things, or entities. Examples of entities include Customers and Orders, Trades and Counter Parties, or Providers and Outcomes.

Figure 1: Entity Services provides a model to design applications around real-world concepts, or entities, like Customers and Products.

The problem with traditional relational databases is that they have a static data model and can store only one schema. This makes managing entities extremely challenging. The context and meaning of data is trapped in database queries, application code, outdated application specs, and entity-relationship diagrams (ERDs): everywhere except in the database. This makes it nearly impossible to make sense of your data:

• What data accurately represents our Customer?
• What are the defining properties?
• How is it related to other entities?
• Which systems can generate customers?
• How are customers represented to applications?
• Which customers do not adhere to the business rules?

Entity Services provides a better way to manage entities and the messy, changing data from which they are derived. You can think of it as a catalog that defines a shared understanding of your entities and relationships, making data easier to govern and program. Data owners can use Entity Services to build a model that captures governance policies and data rules right in the database, where they belong. Developers can use Entity Services as part of a model-driven workflow to automatically generate data transformations, validation rules, index configuration, and SQL views. All of these reduce errors and help accommodate inevitable change.

MARKLOGIC OPTIC API: JOINS AND AGGREGATES OVER DOCUMENTS

Document databases provide a natural, denormalized representation of data, where documents represent holistic entities free of joins and complexity. The benefit of this data model is that it maintains context and supports development and search, because all information about an entity is conveniently in one place for efficient retrieval and update. But many questions also rely on the relationships across entities, and that is when you want to do joins and aggregates across documents.

The revolutionary Optic API unites the relational world and the NoSQL document world by performing joins and aggregates across documents. MarkLogic is the only database in the world that can do this. One of the enabling features, Template Driven Extraction, creates relational “lenses” over documents by using templates to specify the parts of a document that make up a row in a view. A template does not change your documents in any way; it only changes how the indexes are populated based on values in the documents.

Figure 2: The new Optic API provides a native language query interface to perform joins and aggregates over documents. (Diagram: the Optic API spans documents/search, relational/SQL, and semantics/SPARQL.)

You can then access your data, viewed through your lenses, using either SQL or the Optic API. With the Optic API, you can use full-text document search as a filter, perform row operations to join and aggregate data, and retrieve or construct documents on output. You can even use the Optic API to combine multiple lenses, including triples, in a single query.

Developers benefit from the simplicity of a unified query interface and efficient in-database processing. The Optic API is available in all of MarkLogic’s supported interfaces: Java, JavaScript, and REST on the client, and JavaScript and XQuery on the server. Each implementation adopts language-specific patterns. It therefore feels conceptually familiar to developers with relational experience and syntactically natural given their existing programming knowledge. For example, there is no need to compose a string and feed it to a special-purpose query language (SQL or SPARQL). Instead, you build up your query in your chosen programming language.
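The lens-then-join workflow described above can be illustrated with a small plain-Python sketch. It is not MarkLogic’s TDE template syntax or the Optic API. The template format (a column name mapped to a path of keys), the helper functions, and the sample documents are all hypothetical. The sketch shows only the two ideas from the text: rows are projected out of documents without modifying them, and the resulting rows can then be joined and aggregated.

```python
from collections import defaultdict

# Sample JSON-like documents; the lens never modifies them.
customers = [
    {"Customer ID": 1001, "Fname": "Paul", "State": "CA"},
    {"Customer ID": 1002, "Fname": "Ana", "State": "NY"},
]
orders = [
    {"order": {"id": 1, "customer": 1001, "total": 40.0}},
    {"order": {"id": 2, "customer": 1001, "total": 60.0}},
    {"order": {"id": 3, "customer": 1002, "total": 25.0}},
]

# Hypothetical "templates": each view column is read from a path of keys.
CUSTOMER_VIEW = {"customer_id": ("Customer ID",), "state": ("State",)}
ORDER_VIEW = {"customer_id": ("order", "customer"), "total": ("order", "total")}


def get_path(doc, path):
    """Follow a sequence of keys into a nested document."""
    for key in path:
        doc = doc[key]
    return doc


def project(docs, template):
    """Apply the lens: one row per document, documents left untouched."""
    return [{col: get_path(d, p) for col, p in template.items()} for d in docs]


def inner_join(left, right, key):
    """Hash join: index the right-hand rows by key, then probe with the left."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def sum_by(rows, group_col, value_col):
    """Equivalent of GROUP BY group_col with SUM(value_col)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)


joined = inner_join(project(customers, CUSTOMER_VIEW), project(orders, ORDER_VIEW), "customer_id")
print(sum_by(joined, "state", "total"))  # {'CA': 100.0, 'NY': 25.0}
```

In MarkLogic, according to the text, the template is stored in the database and populates the indexes. The projection therefore happens at index time rather than on every query, as it does in this sketch.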

TEMPLATE DRIVEN EXTRACTION: VIEW YOUR DOCUMENTS WITH LENSES

MarkLogic supports the document model, a flexible, schema-agnostic approach that natively handles rich data, including JSON and XML. But you may want to look at some parts of your data through a relational view and query that data using the well-known SQL language.

With Template Driven Extraction (TDE), you can overlay parts of your document data with a relational view without changing the documents themselves. We call this a relational lens. Using a template, you can create a relational lens over documents by specifying which parts of the documents make up the columns in a view. Inserting that template into the database creates a SQL view and populates the indexes, so you can query the view with SQL (server-side SQL or ODBC) or with the Optic API. This is particularly useful when you want to create reports and visualizations with tools that communicate using SQL, or when you want to join entities and perform aggregates across documents.

You can also use TDE to define a semantic lens. A semantic lens specifies which data from a set of documents makes up RDF triples. Those triples are then indexed in the triple index, making them queryable with SPARQL or the Optic API.[1]

With TDE, templates are entirely independent of the documents: they do not change your documents in any way. They only change how MarkLogic’s indexes are populated based on values in the documents.

Figure 3: Template Driven Extraction is an enabling feature that allows you to view your documents through relational and semantic lenses. (Diagram: lens templates applied to XML and JSON data.)

[1] Download the Flexible Data Model Datasheet and the Semantics Datasheet for a better understanding of MarkLogic’s multi-model capabilities.

ODBC DRIVER: IMPROVED INTEGRATION WITH BI AND ANALYTICS TOOLS

MarkLogic 9 enhances its SQL engine to give users a more robust experience with their business intelligence (BI) and reporting tools. The new SQL engine is simpler to set up, scales to more columns (without relying on memory-mapped indexes), and runs queries faster. It also drives improvements for the BI investments that many MarkLogic users have made, since they have often standardized around an ODBC connection. MarkLogic has supported ODBC connectivity since MarkLogic 6. Now, by leveraging the revamped SQL engine, the ODBC driver takes advantage of functional and performance improvements within the MarkLogic architecture.

DATA MOVEMENT SDK: BULK DATA MOVEMENT WITH JAVA

MarkLogic makes it easy to move data into, out of, or within a database. Customers are doing this at scale to load nightly batches from legacy systems, generate sophisticated reports, or bulk transform millions of documents at a time.

MarkLogic 9 introduces a new software development kit (SDK) as part of its existing Java API. This Data Movement SDK allows developers to easily include bulk data movement in their Java applications. Under the covers, the Data Movement SDK handles all of the intricate orchestration required to parallelize database access and maintain availability, even as the MarkLogic cluster changes (for example, when nodes are added).
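The batching and parallelism that the Data Movement SDK handles for you can be sketched in a few lines of Python. This models only the general pattern. The `FakeCluster` class and its `write_batch` method are hypothetical stand-ins for network writes. The sketch also does none of the failover or cluster-topology tracking that the text attributes to the SDK, which is itself a Java library.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(items, size):
    """Yield lists of up to `size` items from any iterable, lazily."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class FakeCluster:
    """Hypothetical in-memory stand-in for a database cluster."""

    def __init__(self):
        self._lock = threading.Lock()
        self.docs = {}

    def write_batch(self, batch):
        # A real client would send one request per batch over the network.
        with self._lock:
            for uri, content in batch:
                self.docs[uri] = content
        return len(batch)


def bulk_load(cluster, docs, batch_size=100, threads=4):
    """Write batches concurrently and return the number of documents written."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(cluster.write_batch, b) for b in batches(docs, batch_size)]
        # result() re-raises any exception raised while writing a batch.
        return sum(f.result() for f in futures)


cluster = FakeCluster()
source = ((f"/customers/{i}.json", {"id": i}) for i in range(1050))
print(bulk_load(cluster, source), len(cluster.docs))  # 1050 1050
```

Batch size and thread count are the main tuning knobs in this pattern. In the Java Data Movement SDK, the corresponding settings live on a `WriteBatcher` (for example `withBatchSize` and `withThreadCount`).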

The Data Movement SDK complements the existing Java Client API with an asynchronous interface for reading, writing, and transforming data in a MarkLogic cluster. It enables integrations with existing ETL-style workflows, such as writing streams of data from a message queue or transferring relational data via JDBC or ORM (object-relational mapping).

Some of the key Data Movement SDK use cases include:

• Ingesting data in bulk or streams from any Java I/O source, leveraging your entire cluster for scale-out performance
• Bulk processing documents in situ, using JavaScript or XQuery code that is invoked remotely from Java and executed within the database, close to the data
• Exporting documents based on a query, optionally aggregating them into a single artifact, such as CSV rows

INDUSTRY-LEADING DATA SECURITY

Data security and data privacy are top of mind in every enterprise, and MarkLogic 9 has significant new capabilities to support both. Encryption allows data, configuration information, and logs to be fully encrypted at rest. Security is also more granular with Element Level Security, which provides access control at the level of XML elements or JSON properties within documents, in addition to the existing document-level security. Furthermore, Redaction addresses privacy concerns by making it possible to remove or mask information when importing, exporting, or copying data. This prevents leakage of sensitive information to unauthorized users.

ENCRYPTION: ADVANCED ENCRYPTION AT REST

Rising internal and external security threats, expanding compliance requirements, and cloud computing are just a few reasons why encryption is critical. Consider three common scenarios: a system administrator who has full access to files on a server, a hacker who unpredictably obtains access, or a database hosted by a cloud provider outside your control. Without encryption, or even with file system encryption, the system administrator, cloud operator, or hacker could access or modify files, including the files that make up your database.

Encryption is a new feature in MarkLogic 9 that allows data, configuration, and logs to be encrypted transparently. This means that MarkLogic files on disk are encrypted at rest, preventing access from outside MarkLogic. The feature requires no modification to applications developed on MarkLogic databases. Encryption in MarkLogic is comprehensive and flexible. The administrator chooses which databases to encrypt, whether configuration should be encrypted, and whether logs should also be encrypted.

Encryption works with keys generated by MarkLogic or by an external Key Management System (KMS). Using an external KMS offers an additional layer of security by separating controls between the MarkLogic administrator and the security administrator who controls the KMS. Either way, because logs and configuration are encrypted, no unauthorized user, not even a MarkLogic administrator, can delete or modify auditing events in log files. This prevents them from erasing traces of wrongdoing.

Figure 4: Advanced Encryption prevents access from outside MarkLogic by ensuring files resting on disk are fully encrypted. (Diagram labels: DBA, security admin, sys admin, file system, disk storage, decryption, protection.)

ELEMENT LEVEL SECURITY: GRANULAR SECURITY AT THE SUB-DOCUMENT LEVEL

Many databases have all-or-nothing data access, which is not sufficient to protect against today’s cybersecurity threats. Organizations now need to define who can see what data at the most granular level. For example, imagine an application that exposes customer credit card information and personally identifiable information to a call center operator. Exposing that information, even to valid application users, may violate privacy regulations and put your organization’s data at unnecessary risk. Controlling data access at the database level eliminates these risks and prevents exploitation of application or network flaws that may disclose sensitive data. The security controls live in the database, not in code that application developers must build at the top of the stack.

MarkLogic has always provided role-based access control at the individual document level. MarkLogic 9 goes further by allowing security administrators to apply these same controls to individual parts of a document. The Element Level Security feature provides access control at the level of JSON properties or XML elements within documents. This means that specific information inside a document may be hidden from a particular user based on the user’s role, while other information in the document remains accessible.

Element Level Security is very flexible in the way it describes which elements are to be protected. For example, an administrator can protect a social security number wherever it happens to appear within the structure of a document (that is, regardless of schema). The administrator does this with a rich, industry-standard path expression rather than a rigid specification. These expressions can include attributes of an element, not just the element’s name. For example, an element may be marked with a classification attribute of secret, and a more restrictive security rule can be applied to any elements that have that attribute.

Figure 5: Element Level Security provides real-time control of who sees what inside documents. Example document from the figure:

    {
      "Customer ID": 1001,
      "Fname": "Paul",
      "Lname": "Jackson",
      "Phone": "415-555-1212",
      "SSN": "123-45-6789",
      "Addr": "123 Avenue",
      "City": "Someville",
      "State": "CA",
      "Zip": 94111
    }

REDACTION: EXPORT CONTROLS TO AVOID SHARING SENSITIVE DATA

Sometimes, cyberattacks are carefully thought out and executed. Other times, data is put at risk by accident. For example, imagine the common scenario of testing applications with sampled data from production. A developer or test engineer may then have access to real credit card information or personally identifiable information. Today, even accidental exposure of that information may violate privacy regulations and put your organization at risk.

Redaction eliminates the exposure of sensitive information by removing it or replacing it with other values. This prevents leakage of information that end users do not need to do their jobs.

Redaction uses rules to define what information to remove when exporting data. The process is simple, flexible, and secure. First, a MarkLogic security admin creates redaction policies that contain rules defining which sensitive information should be redacted and how. Then, the admin chooses which policy to apply when running an export. This process allows admins to generate data sets for different purposes, such as development or data analysis, with different treatment of sensitive data. For example, a data set aimed at developers may get a random credit card number, while a data set used by analysts will not contain credit card information at all. In addition, the rules and actions are logged, ensuring that you can later audit all export activity.

Figure 6: Redaction is similar to Element Level Security, but is designed to control who sees what data when the data is exported from MarkLogic. (Diagram: a patient record with Name, Telefone, SSN, and Doctors Notes is exported with MarkLogic Content Pump; in the exported copy the SSN 345-57-9877 appears as XXX-XX-9877.)

IMPROVED MANAGEABILITY

MarkLogic 9 introduces a number of new features that make MarkLogic easier and more efficient to manage. Ops Director is a single pane of glass to visualize and manage multiple clusters and multiple product versions, across environments and across different user groups. Telemetry makes support more automated by sending diagnostic and system-level information about your MarkLogic cluster to the MarkLogic support team, reducing case resolution time and manual overhead.

OPS DIRECTOR: ONE CLEAR PICTURE OF YOUR MARKLOGIC SYSTEM[2]

While infrastructure costs are falling, personnel costs are rising, and most organizations today have DBAs managing more data and servers than ever before. It is critical to make DBAs more efficient at their jobs, and modern databases must be as easy to manage as possible. For that reason, MarkLogic 9 has a fully redesigned experience for administering clusters.

With MarkLogic 9, Ops Director provides a single pane of glass for MarkLogic administration. This makes MarkLogic easier to manage for both experienced MarkLogic DBAs and those just learning. Ops Director presents a consolidated view for DBAs via rich visuals and dashboards. It streamlines monitoring and troubleshooting of clusters with improved alerting, log searching, and performance reporting. And you still get the benefit of MarkLogic’s enterprise-grade security, with robust Role-Based Access Control and information security. With Ops Director, it is easy to detect problems before they occur. It brings problems to your attention without noise, and its centralized data collection, delivery, and storage provide opportunities for learning and analysis.

Here is a short list of some of the things you can do with Ops Director:

• View a dashboard of events happening in your cluster
• Get real-time alerts on key events
• Monitor clusters, hosts, databases, and app servers
• View and respond to error logs
• Analyze problematic hosts
• Communicate with MarkLogic Support
• View a management dashboard showing resource utilization
• Analyze performance of disk I/O, CPU, memory, network, databases, and servers
• Track current task statuses for common maintenance tasks
• Manage console settings for security and licensing, connectivity, and telemetry

[2] Note: The initial release of Ops Director will follow the 9.0-1 server release.

TELEMETRY: BETTER, FASTER SUPPORT

Telemetry is part of our continuous effort to provide better and more timely support by automating the collection of diagnostic data about your MarkLogic clusters. MarkLogic has a strong track record of providing best-in-class customer support, and Telemetry continues that record by further reducing turnaround time.

Telemetry is an opt-in feature. When it is enabled, it collects, encrypts, and sends diagnostic and system-level information about a MarkLogic cluster to a secure MarkLogic destination. For privacy and security, user data, application logs, and access logs are never sent. You have complete control over which data is sent: log data, metering data, configuration files, and support requests. You can also choose the granularity of the data sent, to accommodate environments with low or spotty bandwidth.

EVEN MORE ENHANCEMENTS

Beyond the improvements in the core areas of data integration, manageability, and security, MarkLogic 9 is full of additional enhancements that you will appreciate every day. Geospatial gets more advanced, with increased precision. Search gets more robust. Tiered Storage and Bitemporal become easier to implement for compliance. Beyond that, hundreds more improvements continue to set the bar high for Enterprise NoSQL.

• Enhanced Tiered Storage: More flexibility and better performance for Tiered Storage. Taking advantage of MarkLogic’s indexes and load balancing, documents can be tiered based on rules (queries). And queries across tiers are more efficient due to query partitioning.
• Compliance Archive (NEW): Protect temporal documents against deletion, updates, and wipes using time-based or event-based policies, and save documents to WORM storage. This feature works with Encryption to ensure that data cannot be changed on disk by a system administrator.
• Geospatial: More powerful and precise with the addition of Geospatial Region Search and Double Precision, to support applications with rigorous geospatial requirements.
• Search: A new plugin API for tokenization and stemming lets you use any external tokenizer or stemmer for any language, so you can build better search across languages. MarkLogic 9 also adds enhancements to near searches and wildcard searches.
• JavaScript: Server-side JavaScript has been updated to ES2015, which provides new syntax and standard libraries that make code easier to write, read, and debug.
• Query Console: Type-ahead suggestions automatically suggest functions, namespaces, in-scope variables, and keywords, depending on the current context. A new Profile mode helps you understand query execution by generating a report about what the input code is doing under the covers, including where time is spent.
• Azure Support (NEW): An Azure image template and recommendations for your deployments on Microsoft’s leading cloud platform.
• Node.js: Updated to track new MarkLogic 9 security and geospatial features.
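The rule-based tiering and query partitioning mentioned under Enhanced Tiered Storage can be illustrated with a small Python sketch. This is not MarkLogic’s Tiered Storage configuration. The tier names, the date boundaries, and the choice to express each rule as a half-open date range are all hypothetical, chosen for the example. The sketch places each document in the first tier whose rule it matches. It then shows how a query restricted to a date range needs to visit only the tiers whose ranges overlap it.

```python
from datetime import date

# Hypothetical tiers: name -> half-open [start, end) range on "updated".
TIERS = {
    "hot-ssd": (date(2017, 1, 1), date.max),
    "warm-hdd": (date(2015, 1, 1), date(2017, 1, 1)),
    "archive": (date.min, date(2015, 1, 1)),
}


def assign_tier(doc, tiers=TIERS):
    """Return the first tier whose rule matches the document."""
    for name, (start, end) in tiers.items():
        if start <= doc["updated"] < end:
            return name
    raise LookupError(f"no tier rule matches {doc['uri']}")


def place(docs, tiers=TIERS):
    """Group document URIs by the tier each one is assigned to."""
    placement = {name: [] for name in tiers}
    for doc in docs:
        placement[assign_tier(doc, tiers)].append(doc["uri"])
    return placement


def tiers_for_query(q_start, q_end, tiers=TIERS):
    """Partition pruning: keep only tiers whose range overlaps [q_start, q_end)."""
    return [name for name, (start, end) in tiers.items() if start < q_end and q_start < end]


docs = [
    {"uri": "/a.json", "updated": date(2017, 3, 1)},
    {"uri": "/b.json", "updated": date(2016, 6, 1)},
    {"uri": "/c.json", "updated": date(2012, 1, 1)},
]
print(place(docs))  # {'hot-ssd': ['/a.json'], 'warm-hdd': ['/b.json'], 'archive': ['/c.json']}
print(tiers_for_query(date(2016, 1, 1), date(2017, 1, 1)))  # ['warm-hdd']
```

The tier ranges here do not overlap, so first-match and only-match give the same result. With overlapping rules, the order of the tiers would decide placement.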

GETTING STARTED WITH MARKLOGIC 9

MarkLogic 9 continues to set the bar high as the only Enterprise NoSQL database, and it is now even more capable of helping you integrate your data from silos.

• Download MarkLogic 9 from the MarkLogic Developer site
• Read the release notes in the documentation for more details about MarkLogic 9
• Take a MarkLogic University course covering everything that is new, or just the new data integration features

MarkLogic 9 is a free upgrade for all customers with an active support contract. For information on how to upgrade from previous releases, please review the release notes. The documentation also provides instructions for running MarkLogic 9 in any environment.

If you need additional support to upgrade to MarkLogic 9, MarkLogic Consulting offers an Upgrade Accelerator that is designed to help existing customers put the powerful features of MarkLogic 9 into action. To get started with your MarkLogic 9 Upgrade Accelerator, contact your MarkLogic Account Representative or Consulting Director. Alternatively, please give us a call at 1-877-992-8885 or email us at consulting@marklogic.com.

New customers interested in purchasing can visit marklogic.com or send an email to sales@marklogic.com.

© 2017 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. This technology is protected by U.S. Patent No. 7,127,469 B2, U.S. Patent No. 7,171,404 B2, U.S. Patent No. 7,756,858 B2, and U.S. Patent No. 7,962,474 B2. MarkLogic is a trademark or registered trademark of MarkLogic Corporation in the United States and/or other countries. All other trademarks mentioned are the property of their respective owners.

MARKLOGIC CORPORATION
999 Skyway Road, Suite 200, San Carlos, CA 94070
+1 650 655 2300 | +1 877 992 8885 | www.marklogic.com | sales@marklogic.com

