An End-to-End Learning-based Cost Estimator - CMU 15-721

Transcription

An End-to-End Learning-based Cost Estimator

Ji Sun, Guoliang Li
Department of Computer Science, Tsinghua University

ABSTRACT

Cost and cardinality estimation is vital to the query optimizer, as it guides query plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimates, because they may not effectively capture the correlation between multiple tables. Recently the database community has shown that learning-based cardinality estimation outperforms the empirical methods. However, existing learning-based methods have several limitations. Firstly, they focus on estimating the cardinality, but cannot estimate the cost. Secondly, they are either too heavy or unable to represent complicated structures, e.g., complex predicates.

To address these challenges, we propose an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously. We propose effective feature extraction and encoding techniques, which consider both queries and physical operations in feature extraction. We embed these features into our tree-structured model. We propose an effective method to encode string values, which can improve the generalization ability for predicate matching. As it is prohibitively expensive to enumerate all string values, we design a pattern-based method, which selects patterns to cover string values and utilizes the patterns to embed string values. We conducted experiments on real-world datasets, and the experimental results showed that our method outperformed baselines.

PVLDB Reference Format:
Ji Sun, Guoliang Li. An End-to-End Learning-based Cost Estimator. PVLDB, 13(3): 307-319, 2019.
DOI: https://doi.org/10.14778/3368289.3368296

(Guoliang Li is the corresponding author. This work was supported by 973 Program (2015CB358700), NSF of China (61632016, 61521002, 61661166012), Huawei, and TAL.)

1. INTRODUCTION

The query optimizer is a vital component of database systems, which aims to select an optimized query plan for a SQL query. However, recent studies show that the classical query optimizer [13, 14, 20] often generates sub-optimal plans due to poor cost and cardinality estimation. First, traditional empirical cost/cardinality estimation techniques cannot capture the correlation between multiple columns, especially for a large number of tables and columns. Second, the cost model needs to be tuned by DBAs.

Recently, the database community has attempted to utilize machine learning models to improve cardinality estimation. MSCN [12] adopts a convolutional neural network to estimate the cardinality. However, this method has three limitations. Firstly, it can only estimate the cardinality, but cannot estimate the cost. Secondly, a deep neural network with average pooling can hardly represent complicated structures, e.g., complex predicates and tree-structured query plans, and thus the model is hard to generalize to support most SQL queries. Thirdly, although MSCN outperforms PostgreSQL in cardinality estimation, its performance can be further improved. For example, on the JOB-LIGHT workload, the mean error is over 50 and the max error is over 1,000. We can improve them to 24.9 and 289 respectively.

There are four challenges in designing an effective learning-based cost estimator. First, it requires an end-to-end model that estimates both cost and cardinality. Second, the learning model should capture the tree-structured information of the query plan, e.g., estimating the cost of a plan based on its sub-plans. Third, it is rather hard to support predicates with string values, e.g., the predicate "not LIKE '%(co-production)%'", if we don't know which values contain the pattern '(co-production)'. As string values are very sparse, it is rather hard to embed them into the model. Fourth, the model should have a strong generalization ability to support a wide variety of SQL queries.

To address these challenges, we propose an end-to-end learning-based cost estimation framework using a deep neural network. We design a tree-structured model that can learn the representation of each sub-plan effectively and can replace the traditional cost estimator seamlessly. The tree-structured model can also represent complex predicates with both numeric values and string values.

In summary, we make the following contributions.

(1) We develop an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously (see Section 3).

(2) We propose effective feature extraction and encoding techniques, which consider both queries and physical execution in feature extraction. We embed these features into

our tree-structured model, which can estimate the cost and cardinality utilizing the tree structure (see Section 4).

(3) For predicates with string values, we propose an effective method to encode string values that improves the generalization ability. As it is prohibitively expensive to enumerate all possible string values, we design a pattern-based method, which selects patterns to cover string values and utilizes the patterns to embed the string values (see Section 5).

(4) We conducted experiments on real-world datasets, and the experimental results showed that our method outperformed existing approaches (see Section 7).

Figure 1: Comparison of Traditional Cost Estimation and Learning-based Cost Estimation.

2. RELATED WORK

Traditional Cardinality Estimation. Traditional cardinality estimation techniques can be broadly classified into three classes. The first is histogram-based methods [11]. The core idea is to divide the cell values into equal-depth or equal-width buckets, keep the cardinality of each bucket, and estimate the cardinality according to the buckets. The method is easy to implement and has been widely used in commercial databases. However, it is not effective at capturing the correlations between different columns. The second is sketching, which aims to solve the distinct cardinality estimation problem, including FM [7], MinCount [1], LinearCount [28], LogLog [5], and HyperLogLog [6]. The basic idea is to first map the tuple values to bitmaps, then count the continuous zeros or the number of hits for each position, and finally infer the approximate number of distinct values. These methods can effectively estimate the number of distinct rows for each dataset. However, they are not suitable for estimating range queries. The third is sampling-based methods [18, 26, 30, 14]. These methods utilize data samples to estimate the cardinality. In order to address the sample vanishing problem (valid samples decrease rapidly for joins), [14] proposed index-based sampling. Sampling methods improve the accuracy of cardinality estimation, but they bring space overhead and are only adopted by in-memory databases like HyPer [25]. Another limitation of these methods is the 0-tuple problem, i.e., when a query is sparse and the bitmap equals 0, the sample is invalid.

Traditional Cost Model. Traditional cost estimation combines multiple factors such as the cost of a sequential page fetch, the cost of a random page fetch, the CPU cost of processing a tuple, and the cost of performing an operation. Firstly, these factors are highly correlated with the cardinality of the data affected by the query. Secondly, the weight of each factor has to be tuned. There are some works focusing on cost model tuning [29, 19, 13], and [13] conducted experiments on the IMDB dataset to show that cardinality estimation is much more crucial than the cost model for cost estimation and query optimization.
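To make the weighted-factor idea concrete, here is a minimal Python sketch of such a cost formula for a sequential scan; the constants mirror PostgreSQL's default cost parameters, but the function and its arguments are illustrative rather than any system's actual implementation.

    # Sketch of a traditional weighted-factor cost model (PostgreSQL-style defaults).
    SEQ_PAGE_COST = 1.0         # cost of fetching one page sequentially
    RANDOM_PAGE_COST = 4.0      # cost of fetching one page randomly (index/bitmap scans)
    CPU_TUPLE_COST = 0.01       # CPU cost of processing one tuple
    CPU_OPERATOR_COST = 0.0025  # CPU cost of evaluating one operator/predicate

    def seq_scan_cost(pages, tuples, ops_per_tuple=1):
        """Estimated cost of a sequential scan with a simple filter."""
        io_cost = SEQ_PAGE_COST * pages
        cpu_cost = (CPU_TUPLE_COST + CPU_OPERATOR_COST * ops_per_tuple) * tuples
        return io_cost + cpu_cost

    # Example: a table of 10,000 pages and 1,000,000 tuples with one predicate.
    print(seq_scan_cost(pages=10_000, tuples=1_000_000))

The weights are exactly the knobs that, as noted above, have to be tuned by hand, and the formula still depends on an accurate cardinality estimate for the number of qualifying tuples.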
Learning-based Cardinality Estimation. The database community has started to address this problem with learning-based methods such as statistical machine learning and deep neural networks. The first learning-based work on cardinality estimation [21] classifies queries according to the query structure (join conditions, attributes in predicates, etc.) and then trains a model on the values of the predicates, but the model is ineffective for queries with unseen structures. The state-of-the-art method [12] trains a multi-set convolutional network on queries, but this method is not suitable for query optimization, because the query-based encoding is too tricky when optimizing on a tree structure, and its generalization is limited. [27] proposed a vision of learning a representation for the join tree with reinforcement learning. Yang et al. [31] proposed deep likelihood models to capture the data distribution of multiple attributes, but the work focuses on cardinality estimation on single tables. Marcus et al. [22] proposed an end-to-end learning-based optimizer, but their focus is not to estimate the cost; rather, they utilize the cost to select a good query plan.

Machine Learning for Database. Many machine learning techniques have recently been proposed for optimizing databases [15], e.g., learned join order selection [32], knob tuning [16, 33], and performance prediction [2, 29, 17, 8, 34], but they all require experts to select the features according to operation properties. A deep learning based approach [23] has also been proposed.

3. OVERVIEW OF COST ESTIMATOR

Cost estimation is to estimate the execution cost of a query plan, and the estimated cost is used by the query optimizer to select physical plans with low cost. Cardinality estimation is to estimate the number of tuples in the result of a (sub)query. In this section we give a system overview of cost and cardinality estimation.

Traditional databases estimate the cost and cardinality using statistics. For filter operations, the cardinality estimator (e.g., PostgreSQL, DB2) estimates the cardinality using histograms; for join operations, the cardinality is estimated by empirical functions that take the selectivities of the joined tables (nodes) as variables. In Figure 1, the numbers on top of each node are the estimated cardinality and the real cardinality. We find that there exist large errors in traditional methods.

In general, we can effectively estimate the cardinality for leaf nodes (like Scan) by using histograms; however, the error can be very large for joins because of the correlations between tables. Usually, the more joins there are, the larger the error is. Unlike traditional cost estimation methods, our learning-based model can learn the correlation among multiple columns and tables, and the representation can retain accurate information on the distribution of results even for queries with dozens of operations.

Moreover, the query plan is a tree structure, and the plan is executed in a bottom-up manner. Intuitively, the cost/cardinality of a plan should be estimated based on its sub-plans. To this end, we design a tree-structured model that matches the plan naturally, where each model can be composed of some sub-models in the same way as a plan is made up of sub-plans. We use the tree-structured model to estimate the cost/cardinality of a plan in a bottom-up manner.

Learning-based Cost Estimator. The end-to-end learning-based tree-structured cost estimator includes three main components, a training data generator, a feature extractor, and a tree-structured model, as shown in Figure 2.

Figure 2: Architecture of the learning-based cost estimator.

Figure 3: Running example of query plan encoding (padding means filling up the corresponding blocks with zeros).

1) Training Data Generator generates training data as follows. It first generates some queries according to the potential join graph of the dataset and the predicates in the workload (see Section 4.3 for details). Then, for each query, it extracts a physical plan using the optimizer and obtains the real cost/cardinality. A training instance is a triple ⟨a physical plan, the real cost of the plan, the real cardinality of the plan⟩.

2) Feature Extractor extracts useful features from the query plan, e.g., query operations and predicates. Each node in the query plan is encoded into feature vectors, and the vectors are organized into tensors. Then the tree-structured vectors are taken as input of the training model. Simple features are encoded using one-hot vectors or bitmaps, while complicated features, e.g., LIKE predicates, are encoded by mapping each tuple ⟨column, operator, operand⟩ into vectors with a one-to-one mapping (see Section 4.1).

3) Tree-structured Model defines a tree-structured model which can learn representations for the (sub)plans, and the representations can be used in cost and cardinality estimation. The model is trained based on the training data, stores the updated parameters, and estimates cost and cardinality for new query plans.

4) Representation Memory Pool stores the mapping from sub-plans to their representations when processing a query. For example, when the optimizer optimizes query A ⋈ B ⋈ C ⋈ D using dynamic programming, it must know the cost of {A, B, C, D, A ⋈ B, B ⋈ C, C ⋈ D, A ⋈ (B ⋈ C), (A ⋈ B) ⋈ C, (B ⋈ C) ⋈ D, B ⋈ (C ⋈ D), ...}. When evaluating A ⋈ (B ⋈ C), the representations of the sub-plans B ⋈ C and A can be extracted from the memory pool directly without re-calculation. Note that we only keep the mappings for the current query, and the mappings are removed once the query has been processed.

Workflow. For offline training, the training data are generated by the Training Data Generator and encoded into tensors by the Feature Extractor. Then the training data are fed into the Training Model, and the model updates its weights by back-propagation based on the current training loss. The details of model training are discussed in Section 4.3.

For online cost estimation, when the query optimizer asks for the cost of a plan, the Feature Extractor encodes it in a top-down manner recursively. If the sub-plan rooted at the current node has been evaluated before, the estimator extracts its representation from the Representation Memory Pool, which stores a mapping from a query plan to its estimated cost. If the current sub-plan is new, the Feature Extractor encodes the root and recurses into its children. We input the encoded plan vector into the Tree-structured Model, and the model evaluates the cost and cardinality of the plan and returns them to the query optimizer. Finally, the estimator puts the representations of all 'new' sub-plans into the Representation Memory Pool.
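A minimal Python sketch of how this caching workflow could be wired together; names such as PlanNode, RepresentationMemoryPool, and model.represent are hypothetical stand-ins for illustration, not the authors' code.

    from dataclasses import dataclass, field

    @dataclass
    class PlanNode:
        # Hypothetical plan node: one physical operator plus its children.
        signature: str              # canonical key identifying this sub-plan
        features: list              # encoded [operation, metadata, predicate, sample bitmap]
        children: list = field(default_factory=list)

    class RepresentationMemoryPool:
        """Caches sub-plan representations for the current query only."""
        def __init__(self):
            self._pool = {}

        def get(self, signature):
            return self._pool.get(signature)

        def put(self, signature, representation):
            self._pool[signature] = representation

        def clear(self):
            # Called once the optimizer has finished processing the query.
            self._pool.clear()

    def represent_plan(node, model, pool):
        """Recursively build a sub-plan representation, reusing cached results."""
        cached = pool.get(node.signature)
        if cached is not None:
            return cached
        child_reps = [represent_plan(child, model, pool) for child in node.children]
        rep = model.represent(node.features, child_reps)  # tree-structured model call
        pool.put(node.signature, rep)
        return rep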

Table 1: Main Plan Operations
    Operation  | Features
    Aggregate  | [Operator, Name_keys]
    Sort       | [Operator, Name_keys]
    Join       | [Operator, Predicate_join]
    Scan       | [Operator, Name_table, Name_index, Predicate_filter, SampleBitmap]

Table 2: Features of Condition Operators
    Operators              | Features
    AND / OR / NOT         | [Operator]
    =, !=, >, <, LIKE, IN  | [Operator, Column, Operand]

4. TREE-BASED LEARNING MODEL

In this section, we introduce a tree-structured deep neural network for end-to-end cost estimation. We first introduce feature extraction in Section 4.1, then discuss the model design in Section 4.2, and finally present model training in Section 4.3.

4.1 Feature Extraction and Encoding

We first encode each query node as a node vector, and then transform the node vectors into a tree-structured vector. There are four main factors that may affect the query cost: the physical query operation, the query predicate, the metadata, and the data. Next we discuss how to extract these features and encode them into vectors.

Operation is the physical operation used in the query node, including Join operations (e.g., Hash Join, Merge Join, Nested Loop Join), Scan operations (e.g., Sequential Scan, Bitmap Heap Scan, Index Scan, Bitmap Index Scan, Index Only Scan), Sort operations (e.g., Hash Sort, Merge Sort), and Aggregation operations (e.g., Plain Aggregation, Hash Aggregation). These operations significantly affect the cost. Each operation is encoded as a one-hot vector. Figure 3 shows the one-hot vectors of different operations.

Predicate is the set of filter/join conditions used in a node. A predicate may contain join conditions like 'movie.movie_id = mi_idx.movie_id' or filter conditions like 'production_year > 1988'. Besides atomic predicates with only one condition, there may exist compound predicates with multiple conditions, like 'production_year > 1988 AND production_year < 1993'. The predicates affect the query cost, because the qualified tuples change after applying the predicates.

Each atomic predicate is composed of three parts: Column, Operator, and Operand. Operator and Column are encoded as one-hot vectors. For the Operand, if it is a numeric value, we encode it as a normalized float; if it is a string value, we encode it with a string representation (see Section 5 for details). The vector of an atomic predicate is then the concatenation of the vectors for column, operator, and operand. Table 2 shows the vector of each predicate.

For a compound predicate, we first generate a vector for each atomic predicate and then transform the multiple vectors into a single vector using a one-to-one mapping strategy. There are multiple ways to transform a tree structure into a sequence with a one-to-one mapping, and here we take depth-first search (DFS) as an example. We first transform the nodes into a sequence using DFS, and then concatenate the vectors following the sequence order. Figure 4 shows an example of encoding a compound predicate. We transform the nodes into a sequence in the visited order, where the solid lines with arrows represent forward search, and each dotted line represents one step of backtracking. We append an empty node to the end of the sequence for each dotted line. Thus, we can encode each distinct complex predicate tree as a unique vector, and the compound predicate can be encoded as a tensor.

MetaData is the set of columns, tables, and indexes used in a query node. We use a one-hot vector for the columns, tables, and indexes respectively. The metadata vector of a node is then the concatenation of the column vector, table vector, and index vector. (Note that a node may contain multiple tables, columns, and indexes, and we compute the union of the individual vectors using OR semantics.) We encode both metadata and predicate, because some nodes may not contain predicates. Figure 3 shows an example.

Sample Bitmap is a fixed-size 0-1 vector where each bit denotes whether the corresponding tuple satisfies the predicate of the query node. If the data tuple matches the predicate, the corresponding bit is 1, and 0 otherwise. This feature is only included in single-table nodes with predicates. As it is expensive to maintain a vector over all tuples, we select some samples for each table and maintain the vector over the samples. Figure 3 shows an example of encoding sample data.

After encoding each node in the plan, we need to encode the tree-structured plan into a vector using a one-to-one mapping strategy. We adopt the same DFS method as for encoding compound predicates.

Figure 3 shows a running example of feature encoding for a plan extracted from the JOB workload. The plan is encoded as a vector using the one-hot encoding scheme, which considers both the query and database samples. The vectors are then taken as input of the training model.
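As a concrete illustration of the encoding above, here is a minimal Python sketch; the vocabularies, normalization bounds, and helper names are made up for the example and are not the paper's actual encoder.

    import numpy as np

    # Toy vocabularies for illustration only; a real encoder is built from the
    # database schema and the physical operators the optimizer can emit.
    OPERATIONS = ["Nested Loop", "Hash Join", "Seq Scan", "Index Scan"]
    COLUMNS = ["t.production_year", "mc.company_type_id", "mi_idx.info_type_id"]
    COMPARATORS = ["=", "!=", ">", "<", "LIKE", "IN"]

    def one_hot(value, vocab):
        vec = np.zeros(len(vocab))
        vec[vocab.index(value)] = 1.0
        return vec

    def encode_atomic_predicate(column, comparator, operand, lo=1880.0, hi=2030.0):
        """Encode <column, operator, operand>; numeric operands are min-max normalized."""
        operand_vec = np.array([(operand - lo) / (hi - lo)])
        return np.concatenate([one_hot(column, COLUMNS),
                               one_hot(comparator, COMPARATORS),
                               operand_vec])

    # Example: encoding a Seq Scan node with the filter "t.production_year > 2010".
    predicate_vec = encode_atomic_predicate("t.production_year", ">", 2010.0)
    node_vec = np.concatenate([one_hot("Seq Scan", OPERATIONS), predicate_vec])
    print(node_vec.shape)  # operation one-hot + column/operator one-hots + operand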
4.2 Model Design

Our model is composed of three layers, an embedding layer, a representation layer, and an estimation layer, as shown in Figure 5. Firstly, the feature vectors of each plan node are large and sparse, so we condense them and extract high-dimensional information from the features; the embedding layer therefore embeds the vector of each plan node. Secondly, the representation layer employs a tree-structured model, where each node is a representation model and the tree structure is the same as the plan. Each representation model learns two vectors (a global vector and a local vector) for the corresponding sub-plan, where the global vector captures the information of the sub-plan rooted at the node and the local vector captures the information of the node itself. Note that each representation model learns the two vectors based on the vectors of its two children and the feature vector of the corresponding node. Finally, based on the vectors of the root node, the estimation layer estimates the cost and cardinality.

4.2.1 Embedding Layer

The embedding layer embeds a sparse vector into a dense vector. As discussed in Section 4.1, there are four types of features: Operation, Metadata, Predicate, and Sample Bitmap. Operation is a one-hot encoded vector, and Metadata and Sample Bitmap are bitmap vectors. We use a one-layer fully connected neural network with a ReLU activation to embed these three vectors. However, the structure of the Predicate vector is complicated because it contains multiple AND/OR semantics, so we design an effective model to learn the representation of predicates.
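For the three simple features, the per-feature embedding is just a fully connected layer with ReLU; a minimal PyTorch sketch follows, with placeholder input and output sizes rather than the paper's hyperparameters.

    import torch
    import torch.nn as nn

    class SimpleEmbedding(nn.Module):
        """One-layer fully connected embedding with ReLU (sizes are placeholders)."""
        def __init__(self, in_dim, emb_dim=128):
            super().__init__()
            self.fc = nn.Linear(in_dim, emb_dim)

        def forward(self, x):
            return torch.relu(self.fc(x))

    embed_operation = SimpleEmbedding(in_dim=10)    # one-hot operation vector
    embed_metadata = SimpleEmbedding(in_dim=60)     # metadata bitmap
    embed_bitmap = SimpleEmbedding(in_dim=1000)     # sample bitmap
    print(embed_operation(torch.rand(10)).shape)    # torch.Size([128])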

Figure 4: Predicates Encoding (padding means filling up the corresponding blocks with zeros).

Figure 5: Two Level Tree Model.

Our goal is to estimate the number of tuples that satisfy a predicate. For an atomic predicate, we can directly use its vector. But for a compound predicate with multiple conditions, we need to learn the semantics of the predicate and the distribution of the results after applying the predicate to the dataset.

Consider a compound predicate that combines two atomic predicates with the AND semantic. We can estimate the number of results satisfying the predicate by the minimum of the estimated numbers of results satisfying the atomic predicates, so we use a min pooling layer to combine the two atomic predicates. Consider a compound predicate that combines two atomic predicates with the OR semantic. We can estimate the number of results satisfying the predicate by the maximum of the estimated numbers of results satisfying the atomic predicates, so we use a max pooling layer to combine the two atomic predicates.

In this way, we use tree pooling to encode a predicate, where the tree structure is the same as the predicate tree structure. In particular, each leaf node is a fully connected neural network, the OR semantic is replaced with a max pooling layer, and the AND semantic is replaced with a min pooling layer. The advantages of this model are twofold. First, only the leaf nodes need to be trained, so it is easy to do efficient batch training. Second, this model converges faster and performs better.

Figure 4 shows a compound predicate and its embedding model. For leaf nodes, we use a fully connected neural network. For conjunction nodes, we use a max pooling layer for 'OR' and a min pooling layer for 'AND', which matches the semantics of 'AND' and 'OR'.

Predicate Embedding Layer: LSTM vs Min-Max Pooling. We could also use an LSTM as the predicate embedding layer. However, an LSTM cannot represent the semantics of predicates explicitly. We compare the two in Section 7.2.

Embedding Formulation. We denote the features Operation, Metadata, Predicate, and Sample Bitmap of node t as O_t, M_t, P_t, B_t respectively, and we denote each node of the feature Predicate as P_t, with P_t^l as its left child and P_t^r as its right child. The embedding model can be formalized as below, where E is the embedding output, W is the weight of a fully connected neural network, and b is a bias.

    E = [embed(O_t), embed(M_t), embed(B_t), embed(P_t)]
    embed(O_t) = ReLU(W_o · O_t + b_o)
    embed(M_t) = ReLU(W_m · M_t + b_m)
    embed(B_t) = ReLU(W_b · B_t + b_b)
    embed(P_t) = min(embed(P_t^l), embed(P_t^r))   if type(P_t) = AND
    embed(P_t) = max(embed(P_t^l), embed(P_t^r))   if type(P_t) = OR
    embed(P_t) = W_p · P_t + b_p                   if type(P_t) = expr

where type(P_t) is the type of a node, which can be AND, OR, or a predicate expression.
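A minimal PyTorch sketch of the predicate tree pooling defined by embed(P_t) above; the tuple-based tree encoding, dimensions, and class name are illustrative choices, not the authors' implementation.

    import torch
    import torch.nn as nn

    class PredicateEmbedding(nn.Module):
        """Min/max tree pooling over a predicate tree.

        A node is ("expr", vector), ("AND", left, right), or ("OR", left, right).
        Only the leaf transformation has trainable parameters.
        """
        def __init__(self, expr_dim, emb_dim=128):
            super().__init__()
            self.leaf = nn.Linear(expr_dim, emb_dim)

        def forward(self, node):
            kind = node[0]
            if kind == "expr":                    # atomic predicate: W_p . P_t + b_p
                return self.leaf(node[1])
            left, right = self.forward(node[1]), self.forward(node[2])
            if kind == "AND":                     # AND -> element-wise min pooling
                return torch.minimum(left, right)
            return torch.maximum(left, right)     # OR -> element-wise max pooling

    # Example: embedding (p1 AND p2) OR p3 with random atomic-predicate vectors.
    model = PredicateEmbedding(expr_dim=16)
    p1, p2, p3 = (("expr", torch.rand(16)) for _ in range(3))
    print(model(("OR", ("AND", p1, p2), p3)).shape)  # torch.Size([128])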

4.2.2 Representation Layer

Cost estimation has two main challenges: information vanishing and space explosion. First, it is easy to estimate the cost of simple operations, e.g., a filtering predicate on a single table, but it is rather hard to estimate the cost of joining multiple tables, because the join space is large and the joined tuples are sparse. In other words, at the leaf nodes we can capture much information about single-table processing, but for upper nodes in the query plan the correlation among nodes may be lost; this is the information vanishing problem. Second, to retain enough information to capture the correlations among tables, much more space and many more intermediate results must be stored, but this space grows exponentially and quickly becomes prohibitive.

4.2.3 Estimation Layer

The output layer is a sigmoid function that predicts the normalized cardinality and cost. The output layer should be able to evaluate the cost or cardinality of any sub-plan from its representation vector. This layer takes the representation R_t of the upper model in the Representation Layer as input, and can be formalized as below:
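A minimal sketch of an estimation layer in this spirit, assuming two small fully connected heads with sigmoid outputs over the final representation R_t; the hidden sizes and the extra hidden layer are assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class EstimationLayer(nn.Module):
        """Two heads over the representation R_t, each ending in a sigmoid that
        predicts a normalized value (assumed sizes; not the paper's exact layers)."""
        def __init__(self, rep_dim=128, hidden=64):
            super().__init__()
            self.cost_head = nn.Sequential(
                nn.Linear(rep_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1), nn.Sigmoid())
            self.card_head = nn.Sequential(
                nn.Linear(rep_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1), nn.Sigmoid())

        def forward(self, r_t):
            return self.cost_head(r_t), self.card_head(r_t)

    cost, card = EstimationLayer()(torch.rand(128))
    print(float(cost), float(card))  # normalized cost and cardinality in (0, 1)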
