1. Introduction CSEP 545 Transaction Processing Philip A. Bernstein

Transcription

3/24/07Copyright 2007 Philip A. BernsteinCSEP 545 Transaction ProcessingPhilip A. Bernstein1. Introduction1

3/24/071. The Basics2. ACID Properties3 Atomicity and Two-Phase Commit3.4. Performance5. Styles of SystemOutline2

3/24/07 Reserve an airline seat. Buy an airline ticketWithdraw money from an ATM.Verify a credit card sale.Order an item from an Internet retailerPlace a bid at an on-line auctionSubmit a corporate purchase order The execution of a program that performs anadministrative function by accessing a shareddatabase, usually on behalf of an on-line user.Examplesp31.1 The Basics - What’s a Transaction?

3/24/07 Reliability - system should rarely failAvailability - system must be up all the timeResponse time - within 1-2 secondsg p - thousands of transactions/secondThroughputScalability - start small, ramp up to Internet-scaleSecurity – for confidentiality and high financeConfigurability - for above requirements low costAtomicity - no partial resultsDurability - a transaction is a legal contractDistribution - of users and dataThe “ities” are What MakesTransaction Processing (TP) Hard4

3/24/07 It’s a huge slice of the computer systemmarket. One of the largest applications ofcomputers. Most medium-to-large businesses use TP fortheir production systems. The business can’toperate without itit. It’s at the core of electronic commerceWhat Makes TP Important?5

3/24/07 TP system makes it easy to program transactions TP system has tools to make it easy to manage– is an independent unit of work– executes exactly once, and– produces permanent results. TheTh TP systemt ensures thatth t eachh transactiontti– Enter a request from a browser or other display device– The system performs some application-specific work,which includes database accesses– Receive a reply (usually, but not always) User’s viewpointTP System Infrastructure6

3/24/07End-UserTransaction ServerDatabase System(routes requests andsupervises their execution)Request ControllerrequestsFront End ProgramBack-End(Server)Cli tClientDefines System and Application StructureTP System Infrastructure 7

3/24/07– hundreds of thousands of active display devices– plus indirect access via Internet– tens of thousands of transactions per second, peak A large-scale example: airline reservations– 0-30 disk accesses– 10K - 1M instructions executed– 2-20 messages Typically 100 transaction types per application Transaction size has high variance. Typically,System Characteristics8

3/24/07 Contributing factors– failures due to environment, system mgmt, h/w, s/w– recovery time9 Fraction of time system is able to do useful work Some systems are very sensitive to downtime– airline reservation, stock exchange, telephone switching– downtime is front page newsDowntimeAvailabilityy95.8%1 hour/day1 hour/week99.41%1 hour/month99.86%1 hour/year99.9886%1 hour/20years99.99942%Availability

3/24/07servers and 10Ks of displays– App Server helps system engineer deploy it on the Internet,accessible from web browsers– App Server helps system engineer deploy it to 10s/100s of– E.g. application developer writes programs to debit a checkingaccount and verify a credit card purchase. A software product to create, execute and manage TPapplications Formerly called TP monitors. Some people sayApp Server TP monitor web functionality. Programmer writes an app to process a single request.App Server scales it up to a large, distributed systemApplication Servers10

3/24/07 Enterprise Java Beans, IBM Websphere,Microsoft .NET (COM ), BEA Weblogic,Oracle Application Server– an application programming interface (API)(e.g., Enterprise Java Beans)– tools for program development– tools for system management (app deployment,fault & performance monitoring, user mgmt, etc.) Components includeApplication Servers (cont’d)11

3/24/07Transaction ServerNetworkTransaction ServerRequest ControllerQueuesFront End ProgramMessageInputs Boxes below are distributed on an intranetApp Server Architecture, pre-Web12

ControllerATMBank Branch 500Credit CardAccountsATMBank Branch 2RequestControllerATMCIRRUSAccountsATMBank Branch 1Automated Teller Machine(ATM) Application Example

3/24/07Transaction Serverintranetother TPsystemsTransaction ServerRequest ControllerQueuesWeb ServerMessageInputsApplication Server Architecture14

putersRequestControllerInternet Retailer 15

3/24/07MusicElectronicsWebServerWeb ServiceTheInternetComputersRequestControllerToys Web services - interface and protocol standardsto do app server functions over the internet.Service Oriented Architecture (SOA)Weeb Service16

3/24/07 EAI and Application Servers address a similarproblem, with different emphasis IBM Websphere MQ, TIBCO, Vitria, SeeBeyond– A queuing system– A message mappingi systemt– Application adaptors (SAP, PeopleSoft, etc.) A software product to route requests betweenindependent application systems. Often includeEnterprise Application Integration(EAI)17

3/24/07ATMCheckingAccountsATM18LoanAccountsEAI RoutingATMBank Branch 500Credit CardAccountsQueuesATMBank Branch 2EAI RoutingATMCIRRUSAccountsQueuesATMBank Branch 1ATM Examplewith an EAI System

A workflow script languageWorkflow script interpreter and schedulerWorkflow trackingMessage translationApplication and queue system adaptors3/24/07 Transaction-centric vs. document-centric Structured processes vs. case management IBM Websphere MQ Workflow, Microsoft BizTalk, SAP,Vitria, Oracle Workflow, FileNET, Documentum, .––––– A software product that executes multi-transactionlong-running scripts (e.g. process an order) Product componentsWorkflow, or Business Process Mgmt19

LoanAccountsCredit cardAccounts3/24/07 Heterogeneous query systems (mediators).It’s database system software, but It’s similar to EAI with more focus on datatransformations than on message mgmt There are hybrids, e.g., BEA AquaLogicCheckinggAccountsQuery MediatorData Integration Systems(Enterprise Information Integration)20

Application ServerEnterprise Application IntegrationBusiness process management (aka Workflow)Enterprise Server Bus3/24/07 New ones all the time, that defy categorization.–––– In summary, there are many variations thatpackage different combinations of middlewarefeatures.Transactional Middleware21

3/24/07HardwareOperating systemDatabase systemApplication Server This course focuses primarily on theDatabase System and Application Server– Getting all those components to work togetherto produce a system with all those “ilities”. TP is partly a system engineering problem–––– TP is partly a component product problemSystem Software Vendor’s View22

3/24/07!1. The Basics2. ACID Properties3 Atomicity and Two-Phase3.Two Phase Commit4. Performance5. Styles of SystemOutline23

3/24/07––––Atomicity - all or nothingConsistency - preserve database integrityIsolation - execute as if they were run aloneDurability - results aren’t lost by a failure Transactions have 4 main properties1.2 The ACID Properties24

3/24/0725– For database operations, restore the data’s previous valuefrom before the transaction– But some real world operations are not undoable.Examples - transfer money, print ticket, fire missile Commit and abort are irrevocable actions. An Abort undoes operations that already executed– E.g. in a money transfer, debit one account, credit theother. Either debit and credit both run, or neither runs.– Successful completion is called Commit.– Transaction failure is called Abort. All-or-nothing, no partial results.Atomicity

3/24/07Deferred operationnever gets executedT1: Start. . .CommitDispense MoneyT1: Start. . .Dispense MoneyCommitSystem crashesSystem crashesTransaction abortsMoney is dispensedExample - ATM Dispenses Money(a non-undoable operation)26

3/24/07User reads output U entersUsert iinputt27BraintransportT2: StartGet input from display.CommitT1: Start.Display output.If error, AbortReading Uncommitted Output Isn’tUndoable

3/24/07" A well-designed TP application should have acompensation for every transaction type28– E.g. Certain money transfers– E.g. Fire missile, cancel contract– Contract law talks a lot about appropriate compensations Not all transactions have complete compensations– “Adjustment” in a financial system– Annul a marriage A transaction that reverses the effect of anothertransaction (that committed). For example,Compensating Transactions

3/24/07" Consistency preservation is a property of atransaction, not of the TP system(unlike the A, I, and D of ACID) If each transaction maintains consistency,then serial executions of transactions do too.29– Referential integrity - E.g. each order references anexisting customer number and existing part numbers– The books balance (debits credits, assets liabilities)Every transaction should maintain DB consistencyConsistency

3/24/07 ri[x] Read(x) by transaction Tiwi[x] Write(x) by transaction Tici Commit by transaction Tiai Abort by transaction TiA history is a sequence of such operations,in the order that the database systemprocessed them.Some Notation30

3/24/07T2: Start;B Read(x);C Read(y);If (B C 1) then B B - 1;Write(x, B);Commit;– e.g. try it with x 4 and y 2 initially Consistency predicate is x y. Serial executions preserve consistency.Interleaved executions may not. H r1[x] r2[x] r2[y] w2[x] w1[y]T1: Start;A Read(x);A A - 1;Write(y, A);Commit;Consistency Preservation Example31

3/24/07 Intuitively, the effect of a set of transactionsshould be the same as if they ran independently Formally, an interleaved execution oftransactions is serializable if its effect isequivalentqto a serial one. Implies a user view where the system runs eachuser’s transaction stand-alone. Of course, transactions in fact run with lots ofconcurrency, to use device parallelism.Isolation32

3/24/07 T2: Start;B Read(x);B B 1;Write(y, B);Commit;H r1[x] r2[x] w1[x] c1 w2[y] c2H is equivalent to executing T2 followed by T1Note, H is not equivalent to T1 followed by T2Also, note that T1 started before T2 and finishedbefore T2, yet the effect is that T2 ran first.T1: Start;A Read(x);A A 1;Write(x, A);Commit;A Serializability Example33

3/24/0734r1[x] r2[y] w2[y] w1[y] T2 T1 T1 T2 Serializability says the execution is equivalent tosome serial order, not necessarily to all serial ordersr1[y] r2[y] w2[y] w1[x] T1 T2 T2 T1 Client must control the relative order of transactions,using handshakes(wait for T1to commit before submitting T2). Some more serializable executions:r1[x] r2[y] w2[y] w1[x] T1 T2 T2 T1Serializability Examples (cont’d)

3/24/07 Compare to the OS view of synchronization– e.g. T1 is moving 100 from x to y.– T2 sees only half of the result of T1 r1[x] r1[y] w1[x] r2[x] r2[y] c2 w1[y] c1(inconsistent retrieval)– e.g. each transaction is trying to make x y,but the interleaved effect is a swap r1[x] r2[y] w2[x] w1[y]– e.g. T1 and T2 are each adding 100 to x r1[x] r2[x] w2[x] w1[x] (race condition)Non-Serializable Examples35

3/24/07– DB system writes all transaction updates to its log– to commit, it adds a record “commit(Ti)” to the log– when the commit record is on disk, the transaction iscommitted.– system waits for disk ack before acking to user When a transaction commits, its results willsurvive failures (e.g. of the application, OS,DB system even of the disk). Makes it possible for a transaction to be a legalcontractcontract. Implementation is usually via a logDurability36

3/24/07!1. The Basics!2. ACID Properties3 Atomicity and Two-Phase3.Two Phase Commit4. Performance5. Styles of SystemOutline37

3/24/07 Distributed systems make atomicity harder Suppose a transaction updates data managed bytwo DB systems.ycould commit the transaction,, One DB systembut a failure could prevent the other system fromcommitting. The solution is the two-phase commit protocol. Abstract “DB system” by resource manager(could be a SQL DBMS, message mgr, queuemgr, OO DBMS, etc.)381.3 Atomicity and Two-Phase Commit

3/24/0739– Phase 1 - T’s coordinator asks all participant RMs to“prepare the transaction”. Each participant RM replies“prepared” after T’s updates are durable.– Phase 2 - After receiving “prepared” from allparticipant RMs, the coordinator tells all participantRMs to commit. Main idea - all resource managers (RMs) save adurable copy of the transaction’s updates beforeany of them commit. If one RM fails after another commits, the failedRM can still commit after it recovers. The protocol to commit transaction TTwo-Phase Commit

StartCommit, Abort3/24/07ResourceManagerTransactionManager (TM)OtherTransactionManagers1. Start transaction returns a unique transaction identifier2. Resource accesses include the transaction identifier.For each transaction, RM registers with TM3. When application asks TM to commit, the TM runstwo-phase commit.Read,WriteApplication ProgramTwo-Phase CommitSystem Architecture40

3/24/07!1. The Basics!2. ACID Properties!3 Atomicity and Two-Phase!3.Two Phase Commit4. Performance5. Styles of SystemOutline41

3/24/07 TPC A & B (‘89-’95), now TPC C &W– http://www.tpc.org. TP Performance Council (TPC) sets standards42– 10% application server plus application– 30% communications system (not counting presentation)– 50% DB system Measured in max transaction per second (tps) orper minute (tpm), and dollars per tps or tpm. Dollars measured by list purchase price plus 5 yearvendor maintenance (“cost of ownership”) Workload typically has this profile:1.4 Performance Requirements

3/24/07 End of history and branch records are bottlenecks43StartRead message from terminal (100 bytes)Read write account record (random access)Write history record (sequential access)Read write teller record (random access)Read write branch record (random access)Write message to terminal (200 bytes)Commit Obsolete (a retired standard), but interesting Input is 100 byte message requesting deposit/withdrawal Database tables {Accounts, Tellers, Branches, History}TPC-A/B — Bank Tellers

54624854306823/24/07 TPC-C uses heavier weight rderNew-OrderOrderLineStockItemThe TPC-C Order-Entry Benchmark44

Get records describing a warehouse, customer, & districtUpdate the districtIncrement next available order numberInsert record into Order and New-OrderNew Order tablesFor 5-15 items, get Item record, get/update Stock recordInsert Order-Line Record3/24/0745 Payment, Order-Status, Delivery, Stock-Level havesimilar complexity, with different frequencies tpmC number of New-Order transaction per min.–––––– New-OrderTPC-C Transactions

3/24/07– Some high-end system sales require custombenchmarks. Not all vendors optimize for TPC-C. Does not predict how your application will run,or how much hardware you will need,or which system will work best on your workload Enables apples-to-apples comparison of TPsystemsComments on TPC-C46

3/24/07– HP ProLiant, 18K tpmC, 28K, 1.57/tpmC, 10/19/04– Dell, 70K tpmC, 66K, 0.96/tpmC, 3/9/0747 Examples of low cost (MS SQL Server, Windows, COM )– IBM, 4.0M tpmC, 12.0M, 2.97/tpmC(1/22/07 IBM AIX/DB2, MS Windows/COM ) Examples of high throughput (32 dual-core processors) System cost 27K (HP) - 12M (IBM)– Low end numbers are almost all MS SQL Server & Windows.– High end is mostly Oracle and IBM, Linux, BEA Tuxedo All numbers are highly sensitive to date submitted. 1 - 6 / tpmC for results released in 2006-2007.Typical TPC-C Numbers

3/24/07 More realistic disk configuration (smaller % ofprice) A bbrokeragekapplicationli i Replaces TPC-C, it’s database-centric Approved March 07Coming Soon, TPC-E48

3/24/07!1. The Basics!2. ACID Properties!3 Atomicity and Two-Phase!3.Two Phase Commit!4. Performance5. Styles of SystemOutline49

3/24/07 TP is System Engineering Compare TP to other kinds of systemengineering Batch processing - Submit a job and receive fileoutput.t t Real time - Submit requests that have a deadline Data warehouse - Submit queries to a shareddatabase, populated from TP data sources TP - Submit a request to run a transaction1.5 Styles of Systems50

3/24/07– TP gathers the input. BP post-processes work that has weakresponse time requirements– So, TP systems must also do BP well. A BP application is usually uniprogrammed soserializability is trivial. TP is multiprogrammed. BP performance is measured by throughput.TP is also measured by response time. BP can optimize by sorting transactions by the file key.TP must handle random transaction arrivals.arrivals BP produces new output file. To recover, re-run the app. BP has fixed and predictable load, unlike TP. But, where there is TP, there is almost always BP too.TP vs. Batch Processing (BP)51

3/24/07 In RT, response time goals are usually more importantthan completeness or correctness. In TP, correctness isparamount.– usually loose about atomicity and serializability52 RT has more stringent response time requirements. It maycontrol a physical process. RT deals with more specialized devices. RT doesn’t need or use a transaction abstractionTP vs. Real Time (RT)

3/24/07 Often long-running queries, usually with lower dataintegrity requirements than TP. TP systems provide the raw data for DSSs.– Populate the warehouse (extract, transform, load (ETL))– Run queries against the data warehouse Two usage scenariosTP and Data Warehouse53

3/24/07!1. The Basics!2. ACID Properties!3 Atomicity and Two-Phase!3.Two Phase Commit!4. Performance!5. Styles of SystemOutline54

3/24/07 This chapter covered TP system structure andproperties of transactions and TP systems The rest of the course drills deeply into eachof these areas, one by one.What’s Next?55

Workflow, or Business Process Mgmt Ł A software product that executes multi-transaction long-running scripts (e.g. process an order) Ł Product components Œ A workflow script language Œ Workflow script interpreter and scheduler Workflow tracking 3/24/07 19 Œ Workflow tracking Œ Message translation Œ Application and queue system adaptors Ł Transaction-centric vs. document-centric Ł .