Performance Prediction On The Use Of ML For Blackbox

Transcription

On the Use of ML for Blackbox SystemPerformance PredictionSilvery Fu1, Saurabh Gupta2, Radhika Mittal2, Sylvia Ratnasamy1(1: UC Berkeley, 2: UIUC)

Performance prediction is increasingly important! Optimization, capacity planning, SLO-aware ization/F(parameters) performanceE.g., how many workers, size of input, machine configurations JCT, query latency2

Challenges Accurate precise predictions Simple/easy-to-use in-depth understanding of thesystems not required General works across a spectrum ofworkloads and applicationsCan ML provide an accurate,general, and simpleperformance predictor?3

ML for system perf. prediction?This paper: a systematicand broad study onperformance prediction! Accurate precise predictions Simple/easy-to-use in-depth understanding of thesystems not required General works across a spectrum ofworkloads and applicationsCan ML provide an accurate,general, and simpleperformance predictor?4

ML for system perf. prediction?Start with the best-case scenario!The Best-Case (BC) Test Given parameters P1, P2, P3, , Pk, want to learn F(P) Perf. (e.g. JCT)-Dataset: data points of P X, JCT Y ; split into training and testing sets ML assumptions:-One-feature-at-a-time: e.g., vary P2, keeping P1, P3, Pk fixedSeen-configuration: e.g., points where P2 1GB appear in training and testing-sets Systems assumptions:-No-contention: dedicated EC2 instances, isolated experiments;Identical-inputs: same input data for a given input dataset size;5

Applications and ModelsML models:Nearest-neighbors,Linear-regression,Random forest,SVM, SVM-kernelized,Neural networks6

Metrics and Predictors Accuracy metric: rMSREYi: true value ML predictors Best-of-Model/BoM-err rMSRE of the most accurate modelTo obtain O-err: Allow Oracle topeek at both theerror function andtest data! Oracle predictor O-errf(Xi ): predicted value BoM-err O-err7

Best Case Test Results Accuracy metric:- rMSRE ML predictors BoM-err rMSRE of the most accurate model Oracle predictor O-errBoM-err O-err: .a lower boundif O-err high,"impossible toaccurately predictperformance"!8

Best Case Test Results Accuracy metric:- rMSRE ML predictors BoM-err rMSRE of the most accurate model Oracle predictor O-errError 5% for 90%of predictions!Error 15% forBoM-err O-err: 99% predictions! .a lower boundif O-err high,"impossible toaccurately predictperformance"!9

Best Case Test ResultsBoM-err: rMSRE from themost accurate modelO-err: rMSRE from the OracleObservations: Despite best-case assumptions, the BoM often fails to achieve high accuracy. Oracle errors (the lower bound) are high.10

Best Case Test ResultsBoM-err: rMSRE from themost accurate modelO-err: rMSRE from the OracleHigh Oracle error even underour best-case setup!Observations:(Accurate ) Despite best-case assumptions, the BoM often fails to achieve high accuracy. Oracle errors (the lower bound) are high.11

MethodologystartRun BC test12

MethodologystartRun BC testO-errhigh?13

E.g., Spark worker readiness Spark launches ajob once at least80% of targetworkers are on-kubernetes.html14

Root-causesFix?15

Root-causesFix?16

With system modificationsBeforeAfter For all applications,Oracle error is nowwell within 10%! Best-of-Model errorlikewise!17

All root-causes FixesTrade-off between predictability and other design goals!E.g., disabling an optimization can lead to higher prediction accuracy butdegraded performance18

All root-causesFixesThese "fixes" require in-depthunderstanding of the app. andreasoning about trade-offs!(Easy-to-use ) Trade-off between predictability and other design goals!E.g., disabling an optimization can lead to higher prediction accuracy butdegraded performance19

Embrace variability: probabilistic predictions Idea: predicting a mixture distribution instead of a single value;Then, use the "modes" of each distribution as the "top-k" prediction value E.g., identify the worst predicted performance that violates SLAML: Mixture DensityNetworks decreaseand ProbabilisticForestSignificantinRandomBoM-errwith top-3 (k 3) predictions!But top-k predictions may not beuseful to all cases!(General )doi: 0

MethodologystartfailRun BC testYReconfigure appO-errhigh?YNBoM-errhigh?YOption to“fix”systemvariability?YOption touse “top k”preds.?N for both:failYSo far, best-case setup only! one-feature-at-a-time seen-configuration no-contention identical-inputsSwitch to top-kmodels21

What if we go "beyond the best case"? Relaxing the one-feature-at-a-time assumption: Relaxing the seen-configuration assumption: configuration-to-predict is never seen during model training!Run on modifiedsystems with thefixes!Relaxing the no-contention assumption: vary all parameters!use default/shared EC2 instances!Relaxing the identical-inputs assumption: varied datasets (e.g., different random seeds in data generation)22

What if we go "beyond the best case"? Relaxing the one-feature-at-a-time assumption: Run on modifiedhigh configuration-to-predict is never seen during model training!systems with theif the underlying performancefixes!Relaxing the no-contentionassumption:trend isdifficult to learn!Relaxing the seen-configurationassumption:Prediction errorscan remain vary all parameters!use default/shared EC2 instances!(General )Relaxing the identical-inputs assumption: varied datasets (e.g., different random seeds in data generation)23

Methodology BlueprintstartsuccessfailRun BC testPick modelYNReconfigure appO-errhigh?YNBoM-errhigh?NRun BBC testBoM-errhigh?YOption to“fix”systemvariability?Y(Future Work)Option touse “top k”preds.?N for both:failYfailYSwitch to top-kmodels- add features tocharacterize data inputs(if failed relaxation ofidentical-inputs)24

Methodology BlueprintstartsuccessfailRun BC testPick modelYNReconfigure appO-errhigh?YNBoM-errhigh?NRun BBC testBoM-errhigh?YOption to“fix”systemvariability?Y(Future Work)Option touse “top k”preds.?N for both:failYfailYSwitch to top-kmodels- add features tocharacterize data inputs(if failed relaxation ofidentical-inputs)25

Conclusion: Taken "out of the box", many apps exhibit a surprisingly highdegree of irreducible error We can significantly improve the accuracy if we accept the lossof simplicity and/or generality: modify applications modify predictions .but they don't work in all cases Need a more nuanced methodology for applying ML26

Conclusion: Taken "out of the box", many apps exhibit a surprisingly high degreeAccurateof irreducible error precise predictionsCan ML provide an accurate,general,simple We can significantly improve the accuracyif weandacceptthe loss Simple/easy-to-useperformance predictor?of simplicityand/or generality:in-depth understandingof thesystems notrequired modifyapplications modify predictions General .butthey don't work in all casesworks across a spectrum ofNo.workloads and applications Need a more nuanced methodology for applying ML27

Thanks!Datasets: -dataTools: https://github.com/perfd/perfd.gitContact: silvery@eecs.berkeley.edu28

system variability? Run BC test start BoM-err high? O-err high? N fail N for both: Option to use “top k” preds.? Y Y Y Y fail (Future Work) - add features to characterize data inputs (if failed relaxation of identical-inputs) N Y success Reconfigure app Run BBC test Switch to top-k models