Modern Database Systems - Cs.uwaterloo.ca


Modern Database Systems
M. Tamer Özsu
University of Waterloo, Cheriton School of Computer Science

Database technology & market
- Relational DBMS market (IDC/Gartner)
  - 2011: US$24B
  - 1994: US$7B
  - 2010-2011 growth: 16.3%
  - Now supports every Fortune 1000 company
- Stress points
  - Ease of deployment and use
  - Technological changes
    - Networks, multi-cores, storage devices
  - New applications
    - New models & features

Technology push/Application pull
- Many technological changes
  - Networks
  - Parallelism (cluster computing & multicore)
  - Storage systems
  - Processor technology
- New applications
  - Web apps
  - Streaming data apps
  - Business analysis
  - Non-business apps

Outline
- Database fundamentals
- New applications
- Technological changes
- Some important areas of study

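File systems to database systems
[Figure: with file systems, each application program (with its own data semantics) has its own data description and its own file (program 1 / data description 1 / File 1, and so on); with a DBMS, the application programs go through the DBMS, which holds the data description and control semantics for a shared database.]
Example query shown in the figure:
  SELECT ENAME, SAL
  FROM   EMP, ASG, PAY
  WHERE  DUR > 12
  AND    EMP.ENO = ASG.ENO
  AND    PAY.TITLE = EMP.TITLE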

Database Management System
- Abstracts common functions and creates a uniform, well-defined interface for applications accessing data
  ① Data model: all data are stored in a well-defined way
  ② Access control: only authorized people get to see/modify the database
  ③ Concurrency control: multiple concurrent applications access data without inconsistency
  ④ Database recovery: nothing gets accidentally lost
  ⑤ Database maintenance

Three-Level Schema
[Figure: the three-level schema, including the external view and the internal view]

Data Independence
- Applications do not access data directly but through an abstract data model provided by the DBMS.
- Two kinds of data independence
  - Physical: applications are immune to changes in storage structures
  - Logical: applications are immune to changes in data organization

Interfacing to the DBMS
- Data Definition Language (DDL): for specifying the schema
- Data Manipulation Language (DML): for specifying queries and updates
  - Navigational (procedural)
  - Non-navigational (declarative)

Generic DBMS Architecture

Client/Server Architecture

Query Processing
[Figure: high-level user query (SQL) -> Query Decomposition -> algebraic query -> Query Optimization -> low-level data manipulation commands]

Logical Plans
- Represented as an expression in a logical algebra (may be a superset of relational)
- Can be visualized as a tree of logical operators
  SELECT title
  FROM   StarsIn
  WHERE  starName IN (SELECT name
                      FROM   MovieStar
                      WHERE  dob LIKE "%1960")

Logical Plans
- Algebra expressions may be rewritten into semantically equivalent expressions
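A minimal sketch of such a rewriting, using the nested query from the Logical Plans slide and Python's built-in sqlite3 module. The sample rows and the assumption that name is a key of MovieStar are illustrative only; without that key, the join form could return duplicate titles and the two expressions would not be equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StarsIn (title TEXT, starName TEXT);
    CREATE TABLE MovieStar (name TEXT PRIMARY KEY, dob TEXT);  -- key is an assumption
    INSERT INTO StarsIn VALUES ('Movie A', 'Alice'), ('Movie B', 'Bob'), ('Movie C', 'Alice');
    INSERT INTO MovieStar VALUES ('Alice', '12/03/1960'), ('Bob', '01/07/1975');
""")

# Original form: selection with a nested subquery.
nested = """
    SELECT title FROM StarsIn
    WHERE starName IN (SELECT name FROM MovieStar WHERE dob LIKE '%1960')
    ORDER BY title
"""

# Rewritten form: a join followed by a selection on dob.
joined = """
    SELECT title FROM StarsIn JOIN MovieStar ON starName = name
    WHERE dob LIKE '%1960'
    ORDER BY title
"""

nested_rows = conn.execute(nested).fetchall()
joined_rows = conn.execute(joined).fetchall()
print(nested_rows == joined_rows)
```

Both forms return the same rows on this data; which one the optimizer prefers is then a question of cost, not of meaning.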

Physical Plans
- Also often called a query execution plan (QEP)
- Represented as an expression in a physical algebra
- Physical operators come with a specific algorithm
  - E.g., nested loop join

Physical Plan Example

Types of Optimizers
- Exhaustive search
  - Cost-based
  - Optimal
  - Combinatorial complexity in the number of relations
- Heuristics
  - Not optimal
  - Regroup common sub-expressions
  - Perform selection, projection first
  - Replace a join by a series of semijoins
  - Reorder operations to reduce intermediate relation size
  - Optimize individual operations
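The combinatorial cost of exhaustive search can be seen in a small sketch that enumerates every left-deep join order of three relations and picks the cheapest. The cardinalities, selectivities, and the cost model (sum of intermediate result sizes) are hypothetical values chosen for illustration, not figures from the slides.

```python
from itertools import permutations

# Hypothetical statistics: relation cardinalities and join selectivities.
card = {"EMP": 1000, "ASG": 5000, "PAY": 50}
sel = {frozenset(["EMP", "ASG"]): 0.001, frozenset(["EMP", "PAY"]): 0.02}

def plan_cost(order):
    """Cost of a left-deep join order = sum of intermediate result sizes."""
    joined = [order[0]]
    size = card[order[0]]
    cost = 0.0
    for rel in order[1:]:
        factor = 1.0
        for prev in joined:
            # No join predicate between two relations means a Cartesian product.
            factor *= sel.get(frozenset([prev, rel]), 1.0)
        size = size * card[rel] * factor
        cost += size
        joined.append(rel)
    return cost

def exhaustive_best(relations):
    """Try all n! orders; optimal under the cost model, but combinatorial."""
    return min(permutations(relations), key=plan_cost)

best = exhaustive_best(["EMP", "ASG", "PAY"])
print(best, plan_cost(best))
```

With n relations the loop inspects n! orders, which is why practical optimizers prune the search space or fall back on heuristics such as performing selections first.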

Optimization Granularity
- Single query at a time
  - Cannot use common intermediate results
- Multiple queries at a time
  - Efficient if many similar queries
  - Decision space is much larger

Optimization Timing
- Static
  - Compilation => optimize prior to the execution
  - Difficult to estimate the size of the intermediate results => error propagation
  - Can amortize over many executions
  - System R
- Dynamic
  - Run time optimization
  - Exact information on the intermediate relation sizes
  - Have to reoptimize for multiple executions
  - INGRES

Transaction
A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency.
- concurrency transparency
- failure transparency
[Figure: Begin Transaction (database in a consistent state) -> Execution of Transaction (database may be temporarily in an inconsistent state during execution) -> End Transaction (database in a consistent state)]

Properties of Transactions
- ATOMICITY
  - all or nothing
- CONSISTENCY
  - no violation of integrity constraints
- ISOLATION
  - concurrent changes invisible => serializable
- DURABILITY
  - committed updates persist

Atomicity
- Either all or none of the transaction's operations are performed.
- Atomicity requires that if a transaction is interrupted by a failure, its partial results must be undone.
- The activity of preserving the transaction's atomicity in the presence of transaction aborts due to input errors, system overloads, or deadlocks is called transaction recovery.
- The activity of ensuring atomicity in the presence of system crashes is called crash recovery.
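The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module. The account table, the balances, and the transfer function are hypothetical; the failure is an input error (an overdraft rejected by a CHECK constraint), so this illustrates transaction recovery rather than crash recovery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst: both updates take effect or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The credit to dst was a partial result; the rollback has undone it.
        return False

ok = transfer(conn, "A", "B", 30)       # succeeds
failed = transfer(conn, "A", "B", 500)  # overdraft: aborted and undone
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

The credit is issued before the debit on purpose, so that the aborted transfer leaves a partial result behind that the rollback has to undo.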

Consistency
- Internal consistency
  - A transaction which executes alone against a consistent database leaves it in a consistent state.
  - Transactions do not violate database integrity constraints.
- Transactions are correct programs

Isolation
- Serializability
  - If several transactions are executed concurrently, the results must be the same as if they were executed serially in some order.
- Incomplete results
  - An incomplete transaction cannot reveal its results to other transactions before its commitment.
  - Necessary to avoid cascading aborts.

Serial History
- All the actions of a transaction occur consecutively.
- No interleaving of transaction operations.
- If each transaction is consistent (obeys integrity rules), then the database is guaranteed to be consistent at the end of executing a serial history.
T1: Read(x), Write(x), Commit
T2: Write(x), Write(y), Read(z), Commit
T3: Read(x), Read(y), Read(z), Commit
Hs = {[...] R3(z), C3}

Serializable History
- Transactions execute concurrently, but the net effect of the resulting history upon the database is equivalent to some serial history.
- Equivalent with respect to what?
  - Conflict equivalence: the relative order of execution of the conflicting operations belonging to unaborted transactions is the same in the two histories.
  - Conflicting operations: two incompatible operations (e.g., Read and Write) conflict if they both access the same data item.
    - Incompatible operations of each transaction are assumed to conflict; do not change their execution order.
    - If two operations from two different transactions conflict, the corresponding transactions are also said to conflict.

Serializable History
T1: Read(x), Write(x), Commit
T2: Write(x), Write(y), Read(z), Commit
T3: Read(x), Read(y), Read(z), Commit
The following are not conflict equivalent:
  Hs = {[...] R3(z), C3}
  H1 = {W2(x), R1(x), R3(x), W1(x), C1, W2(y), R3(y), R2(z), C2, R3(z), C3}
The following are conflict equivalent; therefore H2 is serializable:
  Hs = {[...] R3(z), C3}
  H2 = {[...] R3(z), C3}
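One common way to test conflict serializability, sketched here as an illustration rather than as the method used in the slides, is to build a precedence graph: an edge Ti -> Tj for every pair of conflicting operations where Ti's operation comes first. The history is conflict serializable if and only if the graph has no cycle. The function names are hypothetical; H1 is the history given above.

```python
from itertools import combinations

def parse(history):
    """Turn tokens such as 'W2(x)' into (op, txn, item); commits are skipped."""
    ops = []
    for token in history:
        if token.startswith("C"):
            continue
        op, rest = token[0], token[1:]
        txn, item = rest.split("(")
        ops.append((op, int(txn), item.rstrip(")")))
    return ops

def precedence_edges(history):
    """Edge (t1, t2) for each conflicting pair where t1's operation is earlier."""
    edges = set()
    for (op1, t1, x1), (op2, t2, x2) in combinations(parse(history), 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))
    return edges

def serial_order(history):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    edges = precedence_edges(history)
    txns = {t for _, t, _ in parse(history)}
    order = []
    while txns:
        # Transactions with no incoming edge from a still-unplaced transaction.
        free = sorted(t for t in txns if not any(b == t and a in txns for a, b in edges))
        if not free:
            return None
        order.append(free[0])
        txns.remove(free[0])
    return order

H1 = ["W2(x)", "R1(x)", "R3(x)", "W1(x)", "C1",
      "W2(y)", "R3(y)", "R2(z)", "C2", "R3(z)", "C3"]
print(precedence_edges(H1), serial_order(H1))
print(serial_order(["R1(x)", "W2(x)", "W1(x)"]))  # cyclic, so None
```

For H1 the only serial order consistent with its conflicts is T2, T3, T1, so H1 can be conflict equivalent only to the serial history that runs the transactions in that order.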

Durability
- Once a transaction commits, the system must guarantee that the results of its operations will never be lost, in spite of subsequent failures.
- Database recovery

Locking-Based Concurrency Control
- Transactions indicate their intentions by requesting locks from the scheduler (called lock manager).
- Locks are either read lock (rl) [also called shared lock] or write lock (wl) [also called exclusive lock]
- Read locks and write locks conflict (because Read and Write operations are incompatible)
        rl    wl
  rl    yes   no
  wl    no    no
- Locking works nicely to allow concurrent processing of transactions.

Two-Phase Locking (2PL)
❶ A transaction locks an object before using it.
❷ When an object is locked by another transaction, the requesting transaction must wait.
❸ When a transaction releases a lock, it may not request another lock.
[Figure: number of locks held between BEGIN and END; locks are obtained during Phase 1 up to the lock point and released during Phase 2]
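The lock compatibility matrix and the two-phase rule can be sketched in a few lines. The function names and the representation of a transaction as a list of ('lock' | 'unlock', item) actions are assumptions made for this illustration.

```python
# Compatibility matrix from the locking slide: only two read locks are compatible.
COMPATIBLE = {
    ("rl", "rl"): True,
    ("rl", "wl"): False,
    ("wl", "rl"): False,
    ("wl", "wl"): False,
}

def can_grant(requested, held_by_others):
    """A lock is granted only if it is compatible with every lock held by others."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)

def obeys_2pl(actions):
    """actions: ('lock' | 'unlock', item) pairs issued by one transaction.

    Two-phase rule: once the first lock is released (the shrinking phase
    has started), no further lock may be requested.
    """
    shrinking = False
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True
        elif shrinking:
            return False
    return True

print(can_grant("rl", ["rl", "rl"]))   # readers share
print(can_grant("wl", ["rl"]))         # writer must wait
print(obeys_2pl([("lock", "x"), ("lock", "y"), ("unlock", "x"), ("unlock", "y")]))
print(obeys_2pl([("lock", "x"), ("unlock", "x"), ("lock", "y")]))
```

Strict 2PL is the special case in which every unlock is deferred to the end of the transaction.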

Strict 2PL
Hold locks until the end.
[Figure: locks are obtained during the period of data item use and all released at END, over the transaction duration from BEGIN to END]

Recovery Management Architecture
- Volatile storage
  - Consists of the main memory of the computer system (RAM).
- Stable storage
  - Resilient to failures and loses its contents only in the presence of media failures (e.g., head crashes on disks).
  - Implemented via a combination of hardware (non-volatile storage) and software (stable-write, stable-read, clean-up) components.
[Figure: secondary storage, the local recovery manager, and the database buffer manager with the database buffers (volatile database) in main memory]

Update Strategies
- In-place update
  - Each update causes a change in one or more data values on pages in the database buffers
- Out-of-place update
  - Each update causes the new value(s) of data item(s) to be stored separate from the old value(s)

In-Place Update Recovery Information
Database Log
Every action of a transaction must not only perform the action, but must also write a log record to an append-only file.
[Figure: an update operation takes the old stable database state to the new stable database state and writes to the database log]

Logging
The log contains information used by the recovery process to restore the consistency of a system. This information may include
- transaction identifier
- type of operation (action)
- items accessed by the transaction to perform the action
- old value (state) of item (before image)
- new value (state) of item (after image)

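Why Logging?
Upon recovery:
- all of T1's effects should be reflected in the database (REDO if necessary due to a failure)
- none of T2's effects should be reflected in the database (UNDO if necessary)
[Figure: timeline from time 0 to the system crash at time t; T1 begins and ends before the crash, T2 begins before the crash and has not ended]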

REDO Protocol
[Figure: REDO uses the database log to take the old stable database state to the new stable database state]
- REDO'ing an action means performing it again.
- The REDO operation uses the log information and performs the action that might have been done before, or not done due to failures.
- The REDO operation generates the new image.

UNDO Protocol
[Figure: UNDO uses the database log to take the new stable database state back to the old stable database state]
- UNDO'ing an action means to restore the object to its before image.
- The UNDO operation uses the log information and restores the old value of the object.
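REDO and UNDO can be sketched as two passes over a log of before and after images. The record layout, the recover function, and the crash scenario are hypothetical, and the sketch assumes that no two transactions updated the same item (which strict 2PL would ensure for uncommitted writes).

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    txn: str
    item: str
    before: int  # before image (old value)
    after: int   # after image (new value)

def recover(stable_db, log, committed):
    """Return the database state after REDO of committed and UNDO of other transactions."""
    db = dict(stable_db)
    # REDO: forward pass, reinstall the after images of committed transactions.
    for rec in log:
        if rec.txn in committed:
            db[rec.item] = rec.after
    # UNDO: backward pass, restore the before images of uncommitted transactions.
    for rec in reversed(log):
        if rec.txn not in committed:
            db[rec.item] = rec.before
    return db

# At the crash, T1 had committed but its update to x never reached stable storage,
# while uncommitted T2's update to y had already been written there.
log = [LogRecord("T1", "x", 1, 5), LogRecord("T2", "y", 2, 9)]
stable = {"x": 1, "y": 9}
recovered = recover(stable, log, committed={"T1"})
print(recovered)
```

Both passes are idempotent: running recover again on its own output gives the same state, which matters if the system crashes during recovery.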

When to Write Log Records Into Stable Store
Assume a transaction T updates a page P
- Fortunate case
  - System writes P in stable database
  - System updates stable log for this update
  - SYSTEM FAILURE OCCURS! (before T commits)
  We can recover (undo) by restoring P to its old state by using the log
- Unfortunate case
  - System writes P in stable database
  - SYSTEM FAILURE OCCURS! (before stable log is updated)
  We cannot recover from this failure because there is no log record to restore the old value.
- Solution: Write-Ahead Log (WAL) protocol

Write-Ahead Log Protocol
- Notice:
  - If a system crashes before a transaction is committed, then all the operations must be undone. Only need the before images (undo portion of the log).
  - Once a transaction is committed, some of its actions might have to be redone. Need the after images (redo portion of the log).
- WAL protocol:
  ❶ Before a stable database is updated, the undo portion of the log should be written to the stable log
  ❷ When a transaction commits, the redo portion of the log must be written to stable log prior to the updating of the stable database.
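The ordering that WAL imposes can be sketched with a toy store that keeps a volatile buffer and log tail next to a stable database and stable log. ToyStore and its methods are hypothetical names for this illustration; real buffer managers track which log records belong to which page instead of flushing the whole tail.

```python
class ToyStore:
    """Toy model: volatile buffer and log tail, stable database and stable log."""

    def __init__(self):
        self.stable_db = {}
        self.stable_log = []
        self.buffer = {}    # volatile copies of pages
        self.log_tail = []  # log records not yet on stable storage

    def update(self, txn, page, value):
        before = self.buffer.get(page, self.stable_db.get(page))
        # Record both images: before for UNDO, after for REDO.
        self.log_tail.append((txn, page, before, value))
        self.buffer[page] = value

    def flush_log(self):
        self.stable_log.extend(self.log_tail)
        self.log_tail.clear()

    def flush_page(self, page):
        # WAL rule 1: the undo information reaches the stable log first.
        self.flush_log()
        self.stable_db[page] = self.buffer[page]

    def commit(self, txn):
        # WAL rule 2: the redo information is stable before the commit completes.
        self.flush_log()
        self.stable_log.append((txn, "COMMIT"))

store = ToyStore()
store.stable_db["P"] = 1
store.update("T", "P", 2)
log_before_flush = list(store.stable_log)  # nothing stable yet
store.flush_page("P")                      # forces the log record out first
print(log_before_flush, store.stable_log, store.stable_db)
```

If the system failed right after flush_page, the stable log would already hold the before image 1 of P, so the "unfortunate case" of an unrecoverable page write cannot arise.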

Outline
✓ Database fundamentals
- New applications
- Technological changes
- Some important areas of study

Web as a database
- Characteristics of the web
  - Publicly indexable web:
    - More than 25 billion static HTML pages.
    - Over 53 billion pages in dynamic web
  - Deep web (hidden web)
    - Over 500 billion documents
  - Very dynamic
    - 23% of web pages change daily.
    - 40% of commercial pages change daily.
- Most Internet users gain access to the web using search engines.

Properties of web data
- Lack of a schema
  - Data is at best "semi-structured"
  - Missing data, additional attributes, "similar" data but not identical
- Volatility
  - Changes frequently
  - May conform to one schema now, but not later
- Scale
  - Does it make sense to talk about a schema for the Web?
  - How do you capture "everything"?
- Querying difficulty
  - What is the user language?
  - What are the primitives?
  - Aren't search engines or metasearch engines sufficient?

Web Data Modeling
- Can't depend on a strict schema to structure the data
- Data are self-descriptive
  {name: {first: "Tamer", last: "Ozsu"}, institution: "University of Waterloo", salary: 300000}
- Usually represented as an edge-labeled graph
  - XML can also be modeled this way
[Figure: the record above drawn as an edge-labeled graph with leaf values such as "Ozsu" and 300000]
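The edge-labeled graph view of the self-descriptive record can be sketched as follows. The function name and the numbering of nodes are assumptions for this illustration: attribute names become edge labels, and atomic values sit at the leaves.

```python
def to_edge_labeled_graph(obj):
    """Flatten a nested dict into (source, label, target) edges plus leaf values."""
    edges = []    # (source node id, edge label, target node id)
    leaves = {}   # node id -> atomic value
    next_id = [0]

    def new_node():
        next_id[0] += 1
        return next_id[0] - 1

    def visit(value):
        node = new_node()
        if isinstance(value, dict):
            for label, child in value.items():
                edges.append((node, label, visit(child)))
        else:
            leaves[node] = value
        return node

    root = visit(obj)
    return root, edges, leaves

record = {
    "name": {"first": "Tamer", "last": "Ozsu"},
    "institution": "University of Waterloo",
    "salary": 300000,
}
root, edges, leaves = to_edge_labeled_graph(record)
for edge in edges:
    print(edge)
```

No schema is consulted anywhere: the structure is read off the data itself, which is what makes the representation suitable for semistructured data.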

Web Querying
- Search engines and metasearchers
  - Keyword-based
  - Category-based
  - More semantics
- Question Answering (QA) Systems
  - Finding answers to natural language questions, e.g., What is a computer?
  - Analyze the question and try to guess what type of information is required.
  - Not only locate relevant documents but also extract answers from them.

Web Querying
- Semistructured data querying
  - Basic principle: Consider the Web as a collection of semistructured data and use those techniques
  - Uses an edge-labeled graph model of data
  - Example systems & languages:
    - Lore/Lorel
    - UnQL
    - StruQL

OEM Model

Lorel Example
- Find the titles of documents written by Patrick Valduriez
- Find the authors of all books whose price is under 100
Query fragments recoverable from the slide:
  bib.doc(.authors)?.author = "Patrick Valduriez"
  D.what = "Books" AND D.price < 100

Web Query Languages
- Basic principle: Take into account the documents' content and internal structure as well as external links.
- First generation
  - Model the web as an interconnected collection of atomic objects.
  - WebSQL
  - W3QS
  - WebLog

WebSQL Examples
- Simple search for all documents about "hypertext"
  SELECT D.URL, D.TITLE
  FROM   DOCUMENT D
         SUCH THAT D MENTIONS "hypertext"
  WHERE  D.TYPE = "text/html"
- Find all links to applets from documents about "Java"
  SELECT A.LABEL, A.HREF
  FROM   DOCUMENT D
         SUCH THAT D MENTIONS "Java"
         ANCHOR A
         SUCH THAT BASE = X
  WHERE  A.LABEL = "applet"
Demonstrates two scoping methods and a search for links.

WebSQL Examples (cont'd)
- Find documents mentioning "Computer Science" and all documents that are linked to them through paths of length 2 containing only local links
  SELECT D1.URL, D1.TITLE, D2.URL, D2.TITLE
  FROM   DOCUMENT D1
         SUCH THAT D1 MENTIONS "Computer Science"
         DOCUMENT D2
         SUCH THAT D1 - - - D2
Demonstrates combination of content and structure specification in a query.

Web Query Languages
- Second generation
  - Model the web as a linked collection of structured objects
  - WebOQL
  - StruQL

WebOQL
- Prime
- Peek
- Hang
- Concatenate
- Head
- Tail
- String pattern match ( )

WebOQL Examples
Find the titles and abstracts of all documents authored by "Ozsu"
  SELECT [y.title, y'.URL]
  FROM   x IN dbDocuments, y IN x'
  WHERE  y.authors ~ "Ozsu"

More Recent Approaches
- Fusion Tables
  - Users contribute data in spreadsheet, CSV, KML format
  - Possible joins between multiple data sets
  - Extensive visualization

More Recent Approaches
- XML
  - Data exchange language
  - Primarily tree-based structure

More Recent Approaches
- RDF (Resource Description Framework) & SPARQL
  - W3C recommendation
  - Simple, self-descriptive model
  - Building block of semantic web & Linked Open Data (LOD)

Growth in mobile web access

Streaming data applications
- Inputs: One or more sources generate data continuously, in real time, and in fixed order
  - Sensor networks: weather monitoring, road traffic monitoring, motion detection
  - Web data: financial trading, news/sports tickers
  - Scientific data: space station data, various experiment data
  - Transaction logs: telecom, point-of-sale purchases
  - Network traffic analysis (IP packet headers): bandwidth usage, routing decisions, security
- Outputs: Want to collect and process the data on-line
  - Environment monitoring
  - Location monitoring
  - Correlations across stock prices
  - Denial-of-service attack detection
- Up-to-date answers generated continuously or periodically

Traditional database applications
- Transient queries: issued once, then forgotten
- Persistent data: stored until deleted by user or application

Streaming data applications
- Transient data: deleted as window slides forward
- Persistent queries: generate up-to-date answers as time goes on

Characteristics
- Streams have unbounded length
- Push-based (data-driven), rather than pull-based (query-driven) computation model
- Single-pass on-line processing
- Uncertain data
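A persistent query over transient data can be sketched as a sliding-window average: each arriving item is seen once, an up-to-date answer is emitted, and items are dropped as the window slides forward. The function name, the count-based window, and the sample readings are assumptions for this illustration.

```python
from collections import deque

def sliding_average(stream, window):
    """Yield the average of the last `window` items after every arrival.

    Single pass: only the current window is kept in memory, so the
    stream itself may be unbounded.
    """
    buf = deque()
    total = 0.0
    for value in stream:
        buf.append(value)
        total += value
        if len(buf) > window:
            total -= buf.popleft()  # the oldest item leaves the window
        yield total / len(buf)

readings = iter([10, 20, 30, 40])  # stands in for a live source
averages = list(sliding_average(readings, window=2))
print(averages)
```

The generator is push-driven in spirit: it produces one answer per arriving item and never looks back at data that has left the window.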

Social networks
[Figure 10: Facebook and LinkedIn have experienced large relative increases in global online reach (social networks ranked by active reach, Dec 08 vs. Dec 07, with relative change). Source: Nielsen Online, Global Index, December 2007 - December 2008. 'Global' refers to AU, BR, CH, DE, ES, FR, IT, UK & USA only.]
- E.g., in Dec 08, 108.3 million people (30% of the world's Internet population) visited Facebook. Facebook's online active reach has increased, relatively, by 168%, from 11.1% to 29.9%.
[Figure 11a: The most popular social networks in countries where Facebook is the leader. Figures do not include tweets or blogs.]
- Facebook has now exceeded 900M users (May 2012)

Social Networking – size
Worldwide social network: Global Phenomenon, Facebook Leading, Though Many Regional Strongholds
Global Social Networking Web Sites*: 830MM Unique Users, 20% Y/Y; 188MM Total Minutes, 25% Y/Y, 10/09
- Facebook: 430MM users, 137% Y/Y
- MySpace: 110MM users, -14% since 10/08
- QQ (Alumni Mini) – Tencent: 68MM users, 138% Y/Y (1)
- Baidu Space: 63MM users, 33% Y/Y
- Twitter: 58MM users, 1238% Y/Y
- Orkut: 54MM users, 20% Y/Y
- Hi5: 47MM users, -18% Y/Y
- Kaixin001: 25MM users, 325% Y/Y
- Vkontakte: 23MM users, 22% Y/Y
- Skyrock: 21MM users, 10% Y/Y
- CyWorld: 21MM users, 4% Y/Y
- Friendster: 18MM users, -47% Y/Y
- Mixi: 14MM users, 4% Y/Y
Note: *Global social networking websites exclude application-based networks such as IM networks. (1) QQ.com social networks (Tencent properties) Y/Y growth since 1/09, data unavailable prior to 1/09 for QQ.com Mini. Usage stats are 'unique visitors', per comScore global 10/09, may differ materially from company-disclosed 'registered accounts' stats. Other notable social networks include Windows Live Profiles, 56.com, Deviantart, Digg, Buzz Media, and Bebo.
Source: comScore 10/09, Morgan Stanley Research.

Looking at Facebook

[Figure 5: Facebook's global audience growth. Source: Nielsen Online, Global Index, December 2007 - December 2008. E.g., between Dec 07 and Dec 08 there was a 3.7 million global increase in the number of 2-17 year old males visiting Facebook.]
[Figure 6: The audience composition of Member Community websites. Source: Nielsen Online, Custom Analytics, December 2007 - December 2008. E.g., between Dec 07 and Dec 08, the share of the online global audience to 'Member Community' sites accounted for by 2-17 year olds decreased relatively by 9%.]
'Global' refers to AU, BR, CH, DE, ES, FR, IT, UK & USA only.
- Facebook started out as a service for university students, but now almost one third of its global audience is aged 35-49 and almost one quarter is over 50.

Some social network applications
- Social network analytics
  - Finding patterns
- Privacy
- Information extraction and storage
- Querying

Business Intelligence (BI) applications
- "Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making" (Forrester Research)
[Figure 1: Business intelligence framework. The BI framework includes two primary activities, getting data in (data warehousing) and getting data out (business intelligence). Source: Watson & Wixom, IEEE Computer, 2007.]

BI applications
- Evolution from decision support systems
  - Deep analysis
- Combination of
  - OLAP
  - Data warehousing
  - Data mining
  - Data visualization
  - Information extraction
  - Predictive analytics (statistics, game theory, mining)

Challenges
- Very large amounts of data
- Very volatile data
- Data as streams
- Data is not structured in the database sense
- Data representation is a graph
- Uncertain data

Implications for Databases - Big Data
- Volume
  - Almost every dataset we use is high volume
- Variety
  - Structured data
  - Text data
  - Multimedia data
  - Graph and RDF data
- Velocity
  - Streaming data
- Veracity
  - Data quality & cleaning
  - Data uncertainty

Uncertain Data
- By 2015, 80% of all available data will be uncertain
- By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
- The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
- Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
[Figure: global data volume in exabytes and aggregate uncertainty %, 2005-2015, with enterprise data shown as a fraction of the total. Multiple sources: IDC, Cisco. From Opher Etzion.]

Implications for databases
Heterogeneity
- of data
  - Structured and non-structured data
  - Forget schema in most cases
  - Missing data, additional attributes, "similar" data but not identical
- of applications and interfaces
  - Application integration
  - Ability to seamlessly access multiple applications
- of environment
  - in terms of network and systems
- of workload

Implications for databases
- Distribution of data
  - Massive distribution
  - Some data will be stored, some data will be streaming
  - Some data will be very volatile
  - Some data may not be available all the time
- Re-architecting the DBMS (more later)
- Access and retrieval
  - Better user interfaces
  - How do we do multi-modal search?
  - How do we connect querying with keyword search?
  - How do we execute these queries over distributed data? Will old techniques work?

Implications for databases
- Information quality
  - How reliable are the answers?
  - How do we query over incomplete data?
  - How do we query over uncertain data?
  - Data lineage & provenance
  - Data cleaning
- Quality of service
  - What guarantees can we give users?
    - in terms of results
    - in terms of DBMS functions
    - in terms of performance
    - in terms of consistency

Outline
✓ Database fundamentals
✓ New applications
- Technological changes
- Some important areas of study

What is happening in the network world?
- IDC prediction: 17 billion traditional networked devices by [...]
[Figure: device categories, including handhelds]

How about non-traditional devices?
- RFIDs and sensors (1 trillion)
- Traditional network devices (17 billion)

Interconnection and data volumes

Pervasive networks
- DLNA: Digital Living Network Alliance

How about network capabilities?
- Bandwidth on fixed-line
  - Today: Garden hose
  - Tomorrow: Fire hose
  (Greg Papadopoulos, Sun Microsystems CTO, 1994-2010)

How about network capabilities?
- Transmission speed (messaging overhead)
[Figure: software overhead and hardware overhead at 10 Mbps, 100 Mbps, and 1 Gbps; new protocols (e.g., Fibre Channel over Ethernet)]

Changing communications
In 10 years, there should be
- a ubiquitous, open infrastructure with location detection;
- an architecture that is secure, robust and trustworthy;
- a local communications architecture that allows the local interconnection of dozens or hundreds of small devices;
- a network ready for the existence of quantum computers;
- automatic diagnosis and configurability mechanisms for the Internet;
- a way for any physical object to tag itself in a way that links it to relevant info and function in cyberspace;
- a reduction in energy cost per bit of data to the level of 1/1000th the cost today.
D.D. Clark et al., Making the World (of Communications) a Different Place, ACM SIGCOMM Computer Communication Review, July 2005.

Mobile Internet Ramping Faster than Desktop Internet Did – Apple Leading Charge
Mobile Systems
[Figure: iPhone + iTouch vs. NTT docomo i-mode vs. AOL vs. Netscape users, first 20 quarters since launch; subscribers (MM) by quarters since launch]
- Mobile Internet: iPhone + iTouch (launched 6/07), 86MM; NTT docomo i-mode (launched 6/99), 31MM
- Desktop Internet: Netscape* (launched 12/94), 18MM; AOL* v2.0 (launched 9/94), 8MM
Note: *AOL subscribers data not available before CQ3:94; Netscape users limited to US only. Morgan Stanley Research estimates 50MM netbooks have shipped in first 10 quarters since launch (10/07). Source: Company Reports, Morgan Stanley Research.

Smartphone > PC Shipments Within 2 Years, Global – Implies Very Rapid Evolution of Internet Access
Growth in numbers
[Figure: global annual unit shipments (MM) of desktop PCs + notebook PCs vs. smartphones, 2005 - 2013E; inflection point in 2012E, when smartphone shipments exceed total PC shipments]
Note: Notebook PCs include Netbooks. Source: IDC, Gartner, Morgan Stanley Research estimates.

Characteristics
- No fixed location for user accesses
  - Mobility is the key
  - Location-dependent queries
- Limited bandwidth/high latency on the [...] power

64-bit Processors
- 2^64 is big!
  - 10,000 sites each with 1,600 Terabytes of address space
  - If the OS allocates address space to processes at the rate of 1 GB/sec
    - a 32-bit address space is consumed in 4.3 seconds
    - a 64-bit address space is consumed in over 500 years (about 585)
- Single-level address space
  - Mapping of large parts of the database to the address space, saving I/O
  - 5 minute rule
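The figures on this slide can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a 365.25-day year; with these assumptions the 64-bit figure comes out closer to 585 years than to 500.

```python
RATE = 10**9                          # bytes per second (1 GB/sec, decimal GB assumed)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

seconds_32 = 2**32 / RATE                        # time to consume a 32-bit space
years_64 = 2**64 / RATE / SECONDS_PER_YEAR       # time to consume a 64-bit space
sites_total = 10_000 * 1_600 * 10**12            # 10,000 sites x 1,600 TB, in bytes

print(f"32-bit space: {seconds_32:.1f} seconds")
print(f"64-bit space: {years_64:.0f} years")
print(f"10,000 sites x 1,600 TB = {sites_total / 2**64:.0%} of 2^64 bytes")
```

The 10,000-site example therefore fits inside a single 64-bit address space with some room to spare.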

Cluster Computing
- A parallel system
- Usually homogeneous
- Each node runs its own OS image
- Scalability and availability

Data Centers

Multi-core Architectures
- Double the number of processors per chip every 18 months
- Potential massive parallelism
- Challenges:
  - Total rethinking of storage engine
  - Concurrency in the presence of massive parallelism
Sources: IBM, solidDB description; Herb Sutter, "The Free Lunch is Over: A Fundamental Turn Toward Concurrency in Software," Dr. Dobb's Journal, 2005

Storage Devices
- 4TB hard disk at US$497 (WD)
  - 1,540 DVD movies, 800,000 digital photos or one million MP3 music files
- 8TB hard disks are US$1,200 - 2,000
- 18TB hard disks are coming
- "Disk is the new tape" (Stonebraker)
- Flash is now mainstream
  - Different storage hierarchy
- More intelligent storage systems
  - Move more processing to storage controllers
    - Simple key-value type queries supported by storage servers
  - Multicore and GPU also support this

Storage – One Server
(Ed Chang, Google, Beijing)

Storage – One Rack
(Ed Chang, Google, Beijing)

Storage – One Center
(Ed Chang, Google, Beijing)

Combining These Technologies
(University of Maryland, College Park)

Technology implications for DBMSs
- Massive parallelization
- Main-memory databases
- High bandwidth/low latency in local connections
- Still limited bandwidth with high latency in mobile connections
[Figure: processors P1..Pn, memories M1..Mn, and disks D1..Dn. Source: IBM, solidDB description]

Outline
✓ Database fundamentals
✓ Technological changes
✓ New applications
- Some important technologies

Some important technologies
- Disclaimer: the list is partial and reflects my biases
- Distributed/Parallel data management
- Data integration
- Column stores
- Main memory data management
- Graph databases
- XML data management
- Cloud computing
- Mobile data management
- Stream data management


Older but a bit more helpful

Types of Data Stores*
- Key/value stores
  - Typed vs untyped
- Document stores
  - Documents consist of attributes: named/typed value pairs
  - Documents can be heterogeneous
- Relational databases
  - Rows and [...]
* Material taken from "Column-oriented databases" by Jan Steemann (search on SlideShare)

Row-oriented DBMSs
- Contiguous organization as rows
- Data file organization
  - Values at specific offsets based on column type

Row-oriented DBMSs
- Data files are organized in pages
- To read data for a specific row, the full page needs to be read
- Pages are not fully filled
  - Allows subsequent in-place updates
  - Tradeoff between data movement and space waste
- Rows must fit onto pages, leaving parts unused

Column-oriented DBMS
- Store data in column-specific files
  - Simplest case: one data file per column
- Row values for each column are stored contiguously
- Usually highly compressed
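The difference between the two layouts can be sketched with plain Python lists standing in for data files. The table, the column names (eno, ename, sal), and the run-length encoder are hypothetical; real systems use page-based files and a range of compression schemes.

```python
from itertools import groupby

rows = [(1, "Ann", 50000), (2, "Bob", 62000), (3, "Cy", 58000)]

# Row-oriented layout: the values of one row are stored next to each other.
row_store = [value for row in rows for value in row]

# Column-oriented layout: one "file" (here, a list) per column.
column_store = {
    name: [row[i] for row in rows]
    for i, name in enumerate(("eno", "ename", "sal"))
}

# Summing one column: the row layout has to step over every other field,
# while the column layout reads a single contiguous list.
total_from_rows = sum(row_store[2::3])
total_from_cols = sum(column_store["sal"])

def run_length_encode(values):
    """Collapse runs of equal values into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

# A sorted or low-cardinality column compresses well because equal values are adjacent.
encoded = run_length_encode(["CA", "CA", "CA", "NY", "NY"])
print(total_from_rows, total_from_cols, encoded)
```

Storing each column contiguously is what makes such compression effective: values of one type and similar distribution sit next to each other instead of being interleaved with other fields.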

Motivation for distributed DBMS
[Figure: distribution and integration come together in the distributed DBMS; integration is not the same as centralization]

Transparent [...]
[Figure: a user query (SELECT ... WHERE DUR > 12 AND EMP.ENO = ASG.ENO AND PAY.TITLE = ...) issued against a distributed database]

Client/Server Architecture

Parallel DBMS
- Loose [...]
