Query Optimization Based On Heuristic Rules - IJERT

Transcription

International Journal of Engineering Research & Technology (IJERT)ISSN: 2278-0181Vol. 3 Issue 7, July - 2014Query Optimization Based on HeuristicRulesVishal HatmodeDepartment of Information Technology, SiddhantCollege of Engineering, Pune, India.Professor Sonali RangdaleDepartment of Information Technology, SiddhantCollege of Engineering, Pune, India.result of examined queries good query plan is identified. Thismay or may not be the absolute best of all strategies because thereare many ways of creating plans. There is a trade-off between theamount of time spent figuring out the best plan and the amountrunning the plan. Different database management systems havedifferent quality and ways of balancing these two. The mainperformance advantage is achieved in modern database system isby means of query optimizers. [1] Given a request for dataretrieval; an Optimizer will select an optimal plan for evaluatingwith the given request from among the manifold differentstrategies. The largest and most complex module of databasesystems is query optimizers and it’s been proved its difficulty tomodify and extend to accommodate these areas. Queryoptimization is very large area within the database field. It’s beenstudied in a great variety of contexts and from many divergentangles, giving rise to several different solutions in each case.Over time, SQL has become as the standard for relational querylanguages. Query optimization and query execution are the twokey components for query evaluation of an SQL database system[1][6].Heuristic Optimization is less expensive than that of costbased optimization. It is based on some heuristic rules by whichoptimizer can decide optimized query execution plan [6].IJERTAbstract: In this paper, we will Rules based on Heuristicapproach for query optimization. It is often observed inmany database industries that a lot of time is spent inexecuting inefficient SQL queries. The problem here is thatalthough the DBA knows that the query red are inefficient,the large sections of people who are actually running thesequeries are unable to write efficient queries. As a result, theperformance of the entire system affects because of the majorfall in the system performance i.e. the number of transactionsperformed per unit time is reduced. Typically, to overcomethis problem, most of the people frequently consult the DBAfor writing efficient and optimized queries. This approachresults in a lot of time and money loss. A better solution isusing a Query Optimizer Tool. A Query Optimizer willaccept the user query and automatically generate anequivalent but highly optimized and effective query. This willsave a lot of time, cost and effort. This in turn improves thesystem performance and its overall throughput capability.The Query Optimizer in this paper is based on HeuristicOptimizer. It tries to minimize the number of accesses byreducing the number of tuples and number of columns to beselect operation.Backround NeedSpeed of execution is main factor in huge databases ofbiological, physical or chemical projects. Optimization needed forI. INTRODUCTIONsaving energy and resources [2]. The query optimizer is one ofThe main function of many relational database management the important components in today’s database managementsystems is query optimization in which multiple query plans are systems. This component is responsible for transform userto be prepared for satisfying a query are examined, and based on submitted queries usually written in non procedural language intoKeywords: Heuristic Approach, Query Optimization, tuplesIJERTV3IS070992www.ijert.org(This work is licensed under a Creative Commons Attribution 4.0 International License.)1001

International Journal of Engineering Research & Technology (IJERT)ISSN: 2278-0181Vol. 3 Issue 7, July - 2014Algebraic Expressions for following querySELECT p.pname, d.dname FROM Patients p, Doctors d Wherep.doctor d.dname AND d.gender 'M'Advantages:1. Rather than considering with the time constraints, it adapts toclient requirements.2. Speed of query retrieval increaseDisadvantages:1. Uses cost based optimization hence expensive.Figure 1: Relational Algebra Expression for QueryIJERTefficient query evaluation program that can be executed againstdatabase. In this paper, we will gain a feel for how queryprocessing works in a database management system, specificallyfocusing on a simple Select-Project-Join query engine. We willalso see how various query execution plans have differentperformance results, which will provide some motivation forquery optimization techniques [5].Today all corporations, companies and organizations likebanks maintain their own databases. These databases are used forholding many different kinds of information like customerdetails, employee details, and so on and so forth. The size of thesedatabases is too huge i.e. containing large amount of records.Retrieval of information is done by querying the database usingSQL. Now if the query used is an inefficient one then it consumesa lot of time before producing the result [6]. This is because aninefficient query requires a lot of memory operations i.e. thenumber of I/O operations is very high. Since in any computersystem the main factor that hampers high speed performance isthe I/O operation, an inefficient query is very slow inperformance which consumes lot of time. Thus an inefficientquery reduces the system performance by reducing thethroughput greatly. Hence being able to write an efficient query isa important. An efficient query reduces the number of diskoperations and hence reduces the number of I/O operations diskaccess rate. As a result an efficient query produces the resultsmuch more quickly than its inefficient counterpart. Nowobviously the question that arises is how to write such an efficientquery? This is where Query Optimizer steps in. This allows youto write your own queries and will in turn generate an equivalentbut highly and efficient optimized query. Thus the QueryOptimizer helps in enhancing the system performance byincreasing the throughput.II. LITERATURE SURVEYDifferent Query Optimization ApproachesCost Based OptimizationA cost-based query optimizer works as follows: First, itgenerates all possible query execution plans. Next, the cost ofeach plan is estimated. Finally, based on the estimation, the planwith the lowest estimated cost is chosen. Since the decision ismade using estimated cost values, the plan chosen may not beoptimal. [1] The quality of optimizer decisions depends on thecomplexity and accuracy of cost functions used. It includesdifferent techniques such as the use of dynamic programming fordeciding best plan. Their main disadvantage is that it is verycostly. As a result most of the optimizers do not employ thisstrategy [2] [3]1. Generates all possible query execution plans and then basedon it cost is calculated.2. Quality depends on complexity and accuracy of the costFunction.IJERTV3IS070992Figure 2: Execution PlanSemantic Query OptimizationTwo queries can be called as a semantically equivalent ifthey return the same answer for a database. For this purpose, ituses integrity constraints to match results. Semantic queryoptimization is known as process of determining the set ofsemantic conversion that result in a semantically equivalent querywith a low execution cost. ODB-Optimizer defines morespecialized classes to be accessed and reduces the number offactors by applying different integrity constraint rules. [2][4]www.ijert.org(This work is licensed under a Creative Commons Attribution 4.0 International License.)1002

International Journal of Engineering Research & Technology (IJERT)ISSN: 2278-0181Vol. 3 Issue 7, July - 2014Advantages1. Supports recursive queries and queries havingdisjunction.Disadvantages:1. Suitable for only simple prototypes.2. No commercial implementations exist.Optimized Query: Select * from (select branch.id from branch)negation, t, (select customer.cid from customer) where t.name 'Brooklyn';Performance Enhancement:1) Projection operations reduce size of relations.2) Reduces the number of columns in relation and hence relationsize reduces.3) Better technique is to use selection rule.III.PROPOSED ARCHITECTUREProposed SystemThe Query Optimizer in this project is a HeuristicOptimiser. It tries to minimize the number of accesses byreducing the number of tuples and number of columns to besearched. Heuristic Optimization is less expensive than that ofcost based optimization. It is based on some heuristic rules bywhich optimiser can decide optimized query execution plan.[1][3][6] Important Heuristic Rules used are:1. Performing selection as early as possible.2. Perform projections as early as possible.Constraints1. The optimizer is a Heuristic optimizer only. It does not containanything related to cost based optimization.2. Parser has certain constraints like it takes only DML queries(select queries) and not any DDL queries.3. Also some of the clauses of SQL such as EXISTS, NOTEXIST and ORDER BY is not taken into consideration.4. Before changing the backend database the correspondingdatabase schema has to be specified before operating on it.5. Transparency for user application is not possibleIJERTIt’s been found that Cost-based optimization is moreexpensive, even with dynamic programming. Systems can useheuristics to decrease the number of choices that have to be madein a cost-based fashion. Heuristic optimization transforms thequery into query-tree by using a set of rules that (but not in allcases) improves execution performance [2][6].1. Perform selection early (reduces the number of tuples)2. Perform projection early (reduces the number of attributes)3. Performing most restrictive selection and join operations (i.e.with smaller result size) before other equivalent operations. [4][6]Few systems use only heuristics; while others combineboth heuristics with partial cost-based optimization.Example of two rulesPerform selection as early as possible.Original Query: Select * from branch, customer wherebranch.name 'Brooklyn' and customer. City 'Brooklyn';Transformed Query: Select * from (select * from branchwhere branch.name 'Brooklyn'), (select * from customer wherecustomer. City 'Brooklyn');Performance enhancement: Suppose there are branchand customer tables each has 100 and 100 tuples respectively.Figure 3: Query Optimization Process FlowIV. DESIGN OF THE PROPOSED SYSTEMDesign SpecificationsOriginal query: 100 * 100 tuples fetchedQuery processing refers to the range of activities involvedOptimized Query: Selection performed early hence say only 10in extracting data from a database. The cost of any processing aand 20 tuples selected so 10*20 tuples fetched.query is usually dominated by disk access, which is slowcompared to memory accessPerform projection as early as possible:Main tasks involved are:Original Query: Select branch.id, customer.cid from branch,1. Parsing the given SQL querycustomer where branch.name 'Brooklyn';2. Transforming it in the form of GLLIJERTV3IS070992www.ijert.org(This work is licensed under a Creative Commons Attribution 4.0 International License.)1003

International Journal of Engineering Research & Technology (IJERT)ISSN: 2278-0181Vol. 3 Issue 7, July - 20143. Convert GLL query into relational algebra,4. Optimization5. Regeneration of Queries in SQL format.The steps involved in processing a query are as follows:1. Parsing and translation2. Optimization and EvaluationTransforming query into internal form is the first actionmust be taken by system. The second action is query optimizationthat is it will generate a variety of equivalent plans for a query,and choose the least expensive one. And the third action is toevaluate this query [3].The inner block (SELECT MAX (SAL) FROM EMPLOYEEWHERE DNO 15)Translated in:MAXSALARY (σDNO 5(EMPLOYEE))The Outer blockSELECT LNAME; FNAME FROM EMPLOYEE WHERESALARY CTranslated in:Π (LNAME; FNAME (σSALARY C (EMPLOY EE)))C will represent the result returned by the inner block.1. The query optimizer would then select an execution plan foreach block.2. The inner block needs to be evaluated only once. (Uncorrelatednested query).3. It is much harder to optimize the more complex correlatednested queries.WHEREIJERTNeed for GLL:Single GLL statement can have many nested statementsininside and, therefore, it is necessary that every individual levelthe nested query be optimized independently of the other levelsby using GLL format the different levels of nesting can berepresented by different levels in GLL. Thus, each level can betranslated separately into relational algebra expression and can beprocessed independently.Example 2:SELECT BALANCE FROM ACCOUNTFor example if we have query asBALANCE 2500;Select a from b where c d;Corresponding Relational Expressions are:The corresponding GLL is:σbalance 2500 (π balance (account))orΠbalance (σbalance 2500(account))A relational algebra expression annotated withinstructions on how to evaluate it is called as evaluationprimitive. Several primitives may be grouped together into apipeline, in which several operations are performed in parallel.The Query execution engine picks up a query evaluation plan,executes that plan, and returns the result to the query. Thevarious evaluation plans for a given query can have differentcosts. User will write a query and optimizer executes the moste client evaluation plan [1][2].Figure 4: GLL Representation of QueryRelational algebra conversionTransforming an SQL Queries into Relational Algebra inSQL query can itself be translated into a relational algebraexpression on one of the several ways:1. SQL query is first translated into an equivalent extendedrelational algebra expression.2. SQL queries are divided into query blocks, which form thebasic units that can be transformed into the algebraic operatorsand optimized.3. Each query blocks contains a single SELECT-FROM-WHEREexpression, along with the GROUP BY and HAVING clauses.4. Nested queries within a query are identified as separate queryblocks.Equivalence of expressionsThis phase includes matching of relational algebra withone of the forms in equivalence rules. An equivalence rulestates that expressions of two forms are equivalent. We cantransform either to the other while preserving equivalence.Some important equivalence rules on relational algebra are asfollows:Rule 1:σθ1 θ 2 σθ1 (σ θ 2(E))Sample query:Select * from loan where lid 100 and bid 1200;Translating SQL Queries into Relational Algebra Example:Optimized query is:SELECT LNAME, FNAME FROM EMPLOYEE WHERE Select * from (select * from loan where lid 100) where bid SAL (SELECT MAX (SAL) FROM EMPLOYEE WHERE 1200;DNO 15);IJERTV3IS070992www.ijert.org(This work is licensed under a Creative Commons Attribution 4.0 International License.)1004

International Journal of Engineering Research & Technology (IJERT)ISSN: 2278-0181Vol. 3 Issue 7, July - 2014Rule 2:ΠL1 (ΠL1; L2; (ΠL1 .Ln (E)) .) ΠL1 (E)Sample query:Select lid from (select lid, bid from loan);Optimized query is:Select lid from loan;ΠL (E1 U E2) ΠL1 (E1) U ΠL (E2)Sample query:Select bid from (select from branch where bid 1000) union(select from loan1);Optimized query for RHS is:(Select bid from branch where bid 1000) union (select bid fromloan1 where bid 1000);IJERTRule 3:Optimizationσθ1 (E1 E2) (σθ1 (E)) (E2)In this stage, the query processor applies rules to theSample query:internaldata block structures of the query to translate theseSelect from loan; branch where loan:lid 100 and branch: bid structuresinto equivalent, but more efficient clientloan:lid;representations.The rules are mostly based upon mathematicalOptimized query is:modelsoftherelationalalgebra [2][4]. Expression and treeSelect * from (select * from loan1 where loan1:lid 100) t;(heuristics),uponcostestimatesof various algorithms, appliedbranch t1 where t1:bid t:bid;for operations or upon the semantics within the query and therelations it has. Selecting the correct rules to apply, when toRule 4:apply them and how should be they applied is the function ofσθ1 θ2 (E1 E2) (σθ1 (E1)) (σθ2 (E2))thequery optimizer [3] [6]. This phase includes the use ofSample query:someheuristic rules such as performing selections andSelect * from loan, branch where loan.lid 100 andprojectionoperations as early as possible.branch.name PUNE and branch.bid loan1.lid;Optimized query is:Select * from (select * from loan where loan.lid 100 and Regenerationbranch.name PUNE) t1 where t.lid t1.bid;Finally integrate all the intermediate SQL statements thatwillbepassing to backend database.Rule 5:ΠL1L2 (E1 E2) ΠL1 (E1)) (ΠL2 (E2))V. CONCLUSIONSample query:Select lid, name from loan1, branch where branch.bid loan.bid;Optimized query is:We have proposed a new approach to translating anSelect * from (select lid from loan1) t, (select name from branch)SQL queries into equivalent highly optimized SQL queriest1 where t.bid t1.bid;found in many commercial databases. A test database is builtconsisting of several lacks records. This test database willRule 6:then be used to time the execution speeds of "identical"σθ1 (E1) - (E2) σθ1 (E1)-σθ2 (E2)queries in the existing and new built query optimizer. ItSample query:proves that like to the large amount of data, data structure,(Select bid from branch where bid 1000) minus (select bidcomplex transaction logic and request for high data integrityfrom loan);and security in DBS query optimization is of at mostOptimized query for RHS is:importance. Ability to process queries in a timely manner is(Select bid from branch where bid 1000) minus (select bid fromone of the most critical functional requirements of a DBMS.loan where bid 1000);This is particularly proves for very large, mission criticalapplications such as banking systems, weather forecastingRule 7:and Stoke market applications, which can contain millions orσθ1 (E1) E2 σθ1 (E1) σθ1 (E1))even more than millions of records. The need for faster andSample query:faster results never ceases. Thus, a great amount of research(Select bid from branch where bid 1000) intersect (select bidand resources are spent on creating highly efficient queryfrom loan);optimization engines.Optimized query for RHS is:(Select bid from branch where bid 1000) intersect (select bidfrom loan where bid 1000);Rule 8:IJERTV3IS070992www.ijert.org(This work is licensed under a Creative Commons Attribution 4.0 International License.)1005

International Journal of Engineering Research & Technology (IJERT)ISSN: 2278-0181Vol. 3 Issue 7, July - 2014VI. REFERENCES[1][2][3][4][5]IJERT[6]Jayant R.Haritsa "Query Optimizer plan Diagram: Production, Reductionand Application, Data Engineering (ICDE)", 2011 IEEE 27thInternational conferenceG. R. Bamnote and S. S. Agrawal “Introduction to Query Processing andOptimization” IJRCSSE Volume 3, Issue 7, July 2013.Yannis E.Ioannidis paper on "Query Optimization" Computer SciencesDepartment University of Wisconsin Madison, WI 53706 in 2011Surajit Chaudhuri “An Overview of Query Optimization in RelationalSystems” Microsoft ResearchAnuja K Gaikwad, Rupali A Davalgaonkar and Seema Vaidya. Article:“Pathfinder of X Query for a Relational Database System.” InternationalJournal of Computer Applications NTSACT (1):22-24, August 2011.Classle.net “Query Optimization - Heuristic Approach” Availablethrough: uristicapproach [Accessed 3 March 2014]IJERTV3IS070992www.ijert.org(This work is licensed under a Creative Commons Attribution 4.0 International License.)1006

Query optimization and query execution are the two key components for query evaluation of an SQL database system [1][6]. Heuristic Optimizat. ion is less expensive than that of cost . based optimization. It is based on some heuristic rules by which . optimizer can . decide. optimized query execution plan [6]. Backround Need