Snowflake Data Sharing


EXTENDING THE BUILT-FOR-THE-CLOUD DATA WAREHOUSE BEYOND ORGANIZATIONAL AND APPLICATION BOUNDARIES

WHITEPAPER

Data sharing is crucial to business operations. Retailers need to share sales data with vendors to manage inventory and supply chains. SaaS providers need to share the data they collect on behalf of their clients to support deeper customer and operational analytics. The list goes on.

To date, there has been no technology solution that organizations could turn to for sharing data. Traditional data warehouse platforms were not built to support the constant need to share data. They are too costly, inflexible and complex. As a result, organizations are forced to use a patchwork of solutions that include cumbersome methods such as FTP, APIs, email and file sharing.

Snowflake Data Sharing is a new innovation, available as part of Snowflake’s data warehouse built for the cloud. Organizations can now externally share live data, at any scale, with other organizations while maintaining a single source of truth. Snowflake Data Sharing enables any organization to pursue new and imaginative ways to create insights and value from data.

SNOWFLAKE DATA SHARING

Snowflake Data Sharing is a powerful yet simple-to-use feature of Snowflake for sharing data and for using shared data. In a matter of minutes, you can provide live access to any of your data stored in Snowflake for any number of data consumers, inside or outside your organization, without moving or copying the data.

Share data across corporate divisions, external data consumers, and business partners to easily support richer analytics, new business models and data-driven initiatives.

Fundamentally, traditional methods of data sharing address only one part of the challenge: moving data. Although traditional data warehouses and data lakes were designed to make data usable, they lack an architecture capable of meeting the needs of data sharing. Along with a lack of security and governance, among other things, their architectures cannot support concurrent access without cumbersome unloading and transferring in order to copy and move data from a data provider to its data consumers.

The lack of a comprehensive solution makes it a struggle for data providers and consumers to share data. Cumbersome and incomplete data sharing processes also constrain the development of business opportunities from shared data.

With Snowflake Data Sharing, ready-to-use data is immediately available in real time. Query speeds are exponentially faster thanks to the limitless storage and compute resources of Snowflake’s built-for-the-cloud architecture. Snowflake offers a new way to share data without the limitations and inefficiencies of existing solutions:
• Unlike file transfer approaches, such as FTP and email, Snowflake Data Sharing is far easier to use, provides instant access to live data and eliminates data copying or movement.
• Unlike cloud storage and file sharing services, Snowflake Data Sharing enables immediate querying of data in a secure, governed and controlled environment.
• Unlike electronic data interchange (EDI) and API-based approaches, Snowflake Data Sharing eliminates delays to viewing updated data, supports unlimited scale and allows unlimited concurrent access.

MADE POSSIBLE BY SNOWFLAKE’S BUILT-FOR-THE-CLOUD ARCHITECTURE

In contrast, Snowflake’s patented multi-cluster, shared data architecture is the key to Snowflake Data Sharing. As a result, Snowflake’s data warehouse as a service allows you to store, integrate and analyze all your data, share data, and use shared data, all from a single solution.

Snowflake Data Sharing is built on three key architectural innovations:
• Decoupling of storage and compute
• Global metadata and management
• Unlimited concurrency

Independent Storage and Compute Scaling

The separation of storage and compute resources is a fundamental part of Snowflake’s architecture. All data is stored, in optimized form, in cloud storage built on Amazon S3. Data is managed by the Snowflake service, which capitalizes on the scalability, resiliency and near-infinite capacity of cloud storage. The data in cloud storage can be accessed concurrently by any number of independent compute clusters.

Decoupling of storage and compute is also critical for sharing data. It enables data consumers to directly access shared data. Unlike the monolithic architectures that bind storage and compute together, data in Snowflake can be accessed by any number of independent compute clusters without requiring multiple copies of the data. This unique architecture similarly provides all Snowflake customers the ability to share live data between them. Data providers can also make updates to data without contending for processing resources with other users, or with customers reading data at the same time.

Global Metadata and Management

Making shared data usable requires access to data and coordination across all Snowflake customers and users to ensure consistency, security and performance. Snowflake’s services layer is a key part of Snowflake’s architecture. Global metadata, transactions and security are all managed from here, making it the control tower that tracks, logs and directs access to data for every database element and object contained within Snowflake.

The Snowflake services layer also provides critical services required for data sharing. It provides centrally managed control of access to data and ensures that data is secure at all times. Additionally, the services layer provides transactional consistency across all data providers and data consumers, ensuring that all data users see a consistent view of the data and that it is always up to date. A data provider can update shared data in real time. Likewise, all data consumers can view the data provider’s updates and immediately query the shared data at the same time, all with transactional, ACID-based consistency.

Fig. 1: The Snowflake Built-for-the-Cloud Architecture – Separate Storage, Compute, and Services
[Figure labels: Services, Compute, Storage.]
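The decoupling of storage and compute can be sketched in Snowflake SQL as two virtual warehouses (independent compute clusters) working on one copy of a table. This is a minimal illustration under stated assumptions, not a prescribed setup: the warehouse names etl_wh and bi_wh, their sizes, and the staging table sales_db.staging.aggregate_1_new are hypothetical, and the table sales_db.aggregates_eula.aggregate_1 is borrowed from the share example later in this paper. Running it requires a Snowflake account and privileges to create warehouses.

```sql
-- Two independent compute clusters (virtual warehouses).
-- Names and sizes are hypothetical, chosen only for this sketch.
create warehouse if not exists etl_wh
  warehouse_size = 'MEDIUM'
  auto_suspend = 60      -- suspend after 60 idle seconds so no compute is billed
  auto_resume = true;

create warehouse if not exists bi_wh
  warehouse_size = 'SMALL'
  auto_suspend = 60
  auto_resume = true;

-- Session 1: a loading workload runs on its own cluster.
-- sales_db.staging.aggregate_1_new is a hypothetical staging table
-- assumed to have the same columns as the target table.
use warehouse etl_wh;
insert into sales_db.aggregates_eula.aggregate_1
  select * from sales_db.staging.aggregate_1_new;

-- Session 2: an analyst queries the same single copy of the table on a
-- different cluster; no data is copied, and this query does not draw on
-- etl_wh's compute resources.
use warehouse bi_wh;
select count(*) from sales_db.aggregates_eula.aggregate_1;
```

Each warehouse is sized and billed on its own, so the load on etl_wh does not slow the query on bi_wh. Because a Snowflake query sees only data committed before it started, the count returned on bi_wh reflects the table either before or after the insert, never a partially loaded state.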

Unlimited Concurrency

With Snowflake, shared data can be accessed by large numbers of concurrent users and applications. In contrast, the architecture of traditional data warehouses forces all users to compete for resources, creating a struggle to deliver consistent performance. Snowflake’s automatic concurrency scaling via multi-cluster warehouses takes simultaneous query processing even further, automating concurrency scaling within each Snowflake environment.

USING SNOWFLAKE DATA SHARING

Snowflake Data Sharing allows sharing of a database and any objects contained within the database (schemas, tables, views, etc.) with any other Snowflake environment. When a database object is shared with a data consumer, the object remains in the data provider’s Snowflake environment.

Data sharing is performed at the database level, and all shared data are first-class objects. This means the shared data exists independently and can be managed and queried like any other database within a Snowflake environment. Within a shared database, Snowflake allows granular control of access to the objects through grants. Only objects granted access privileges are shared with other Snowflake users.

The data provider then GRANTS access to a data consumer. Instant access and no data copying or movement are made possible because all database objects are maintained and updated in Snowflake and are orchestrated by Snowflake’s global metadata management, which directs access to the shared data according to the parameters the data provider establishes via SQL semantics.

The data consumer, through their Snowflake environment, now has secure, read-only access to the database objects shared by the data provider. The data consumer can run analytics using whatever Snowflake resources are necessary from within their Snowflake environment. Organizations that do not already have their own Snowflake environment can quickly and easily sign up for the Snowflake service online and gain access to shared data through their new environment.

To share data, a provider pays only for the Snowflake storage and compute resources they use; the act of sharing data is at no cost. To query data, a consumer pays only for the Snowflake compute resources required to query the shared data. No storage costs apply for the data consumer unless they copy the data into a table.

Fig. 2: Comparing alternatives: Snowflake makes data sharing easy
[Figure panels: “Data sharing without Snowflake” (multi-step process to ETL or deconstruct, secure, and email/transmit data) and “Data sharing with Snowflake” (use Snowflake Data Sharing to simply CREATE and GRANT a share); each panel shows a data provider and Companies A, B and C.]
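The consumer side of this flow can be sketched in Snowflake SQL, assuming a provider account identifier provider_acct and the share name sales_s used in the provider steps later in this paper. The database sales_shared, the role analyst and the warehouse share_reader_wh are hypothetical names for this sketch; the statements need a Snowflake account, and the multi-cluster settings need Enterprise Edition or higher.

```sql
-- Run in the data consumer's account. provider_acct, sales_s, sales_shared,
-- analyst and share_reader_wh are placeholders chosen for this sketch.
use role accountadmin;  -- importing a share needs ACCOUNTADMIN or a role granted IMPORT SHARE

-- Shares offered to this account are listed with kind = INBOUND.
show shares;

-- Create a read-only database that references the provider's objects.
-- Nothing is copied, so the consumer pays no storage for it.
create database sales_shared from share provider_acct.sales_s;

-- Let a consumer-side role query the objects in the shared database.
grant imported privileges on database sales_shared to role analyst;

-- The consumer supplies its own compute. A multi-cluster warehouse
-- (Enterprise Edition or higher) starts extra clusters when concurrent
-- queries begin to queue and shuts them down as demand falls.
create warehouse if not exists share_reader_wh
  warehouse_size = 'SMALL'
  min_cluster_count = 1
  max_cluster_count = 4
  scaling_policy = 'STANDARD'
  auto_suspend = 60
  auto_resume = true;
grant usage on warehouse share_reader_wh to role analyst;

-- Query the shared table as the analyst role
-- (assumes the current user has been granted that role).
use role analyst;
use warehouse share_reader_wh;
select count(*) from sales_shared.aggregates_eula.aggregate_1;
```

Because sales_shared is read-only, INSERT, UPDATE or DELETE statements against it fail; a consumer that needs a modifiable copy must copy the data into its own table, which is the point at which consumer storage costs begin.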

UNLIMITED MULTI-TENANCY SCALABILITY

A critical capability of Snowflake’s global metadata management is controlling access with secure views. Any number of data consumers can be granted access to the same database, but individual data consumers can view only the objects within the database for which they’ve been granted access.

With Snowflake, data providers have an easy method to manage multi-tenancy within a single database, as opposed to managing multiple separate databases for each data consumer. An example use case could be a sales CRM SaaS provider that maintains one master database for all CRM activities generated by its customers. When the sales CRM SaaS provider grants access to and shares data with its customers, the provider GRANTs data shares based on customer IDs (e.g., IDs A, B, C), all from within one database. Simultaneously, the data provider can also execute queries on the shared database to support analytics within the data provider’s business environment.

Fig. 3: How Snowflake Data Sharing Works
[Figure labels: Data provider, Shared metadata, Secure view of shared data, Data consumer.]

ESTABLISHING DATA SHARING

The first step in sharing data is to specify what database objects to share with specified consumers. This is done via a data share object, effectively an “empty shell” that will house the references to the actual database and the shared database objects. Data shares are first-class objects in Snowflake, for which Snowflake provides a set of DDL commands for creating and managing shares. Commands include CREATE SHARE, ALTER SHARE, DROP SHARE and others. Access commands include GRANT and REVOKE privileges.

Once a share is created, the data provider grants access to the specific database and database objects it shares. The SQL semantics are as follows:

1. Create the share

The following example creates an empty share named [sales_s]:

    create share sales_s;

2. Add privileges for objects in the share
Grant usage on the primary object before granting usage on any objects within the primary object. For example, grant usage on a database before granting usage on any schemas contained within the database. Complete all grants for the data share before adding Snowflake data consumers.

Fig. 4: Multi-tenancy Data Sharing with Controlled Access and Secure Views
[Figure labels: Data provider; Data consumers A, B, C.]

The following example grants privileges for the [sales_db] database, the [aggregates_eula] schema and the [aggregate_1] table to the data-share object:

    grant usage on database sales_db to share sales_s;
    grant usage on schema sales_db.aggregates_eula to share sales_s;
    grant select on table sales_db.aggregates_eula.aggregate_1 to share sales_s;

3. Confirm the contents of the share

    show grants to share sales_s;

4. Share the database objects in the share object with the desired data consumers

The following example makes the [sales_s] share available to other Snowflake environments:

    alter share sales_s add accounts = data_consumerA, data_consumerB;

The Snowflake environments [data_consumerA] and [data_consumerB] are now able to see the share and create a database from it.

The above steps demonstrate that, with just a few simple commands, a data provider can easily share data with any number of data consumers.

FROM DATA WAREHOUSE TO DATA SHAREHOUSE

With unlimited data sharing and multi-tenancy capabilities, Snowflake extends the capabilities of the Snowflake built-for-the-cloud data warehouse to the Data Sharehouse. Snowflake Data Sharing enables organizations to easily forge one-to-one, one-to-many and many-to-many relationships to share data in new and imaginative ways.

Industry Use Case Examples

Snowflake Data Sharing and the Data Sharehouse approach translate into a powerful but simple and cost-effective data warehouse for driving and expanding business intelligence and business assets from data. Industry examples include:
• Adtech – Share live pageviews, clickstream data, and more, directly with Adtech partners, driving more effective pricing for ad placements and faster responses for services.
• Retail – Share live sales data directly with vendors to assure the fastest possible and most accurate inventory and supply chain analytics and planning.
• Gaming – Share live gaming event data with developers, creative designers and other game production partners to enhance the gamer experience.
• Healthcare – Share ready-to-query information, instantly, with medical groups and practices, hospitals, insurance companies and vendors to scale operations and reduce costs.

CONCLUSION: SHARE AND IMAGINE MORE

From a data warehouse to a Data Sharehouse: with Snowflake Data Sharing, you can create a powerful, easy-to-use solution that enables you to share data with any number of organizations and sustain high levels of data processing concurrency, while maintaining a single source of truth.

Organizations now have a new, never-before-available solution to share data, both internally and externally, that transforms how business assets are created from data.

LET’S GET STARTED

Want to learn more about the benefits of Snowflake Data Sharing? Visit our Data Sharing website.

Snowflake Computing is the only data warehouse built for the cloud. Snowflake delivers the performance, concurrency and simplicity needed to store and analyze all data available to an organization in one location. Snowflake’s technology combines the power of data warehousing, the flexibility of big data platforms, the elasticity of the cloud and true data sharing at a fraction of the cost of traditional solutions.

Snowflake: Your data, no limits. Find out more at snowflake.net.
