Automating the Search for Faster Matrix Multiplication Algorithms


AUTOMATING THE SEARCH FOR FASTER MATRIX MULTIPLICATION ALGORITHMS

A THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF SCIENCE WITH HONORS

Will Monroe
Advised by Virginia Vassilevska Williams and Ryan Williams
May 2013

Abstract

We present an implementation of the automated analysis proposed by Williams in her 2012 generalization of the Coppersmith-Winograd algorithm. We show ω ≤ 2.372772 using the 4th power of the CW construction, an improvement on previous 4th power bounds, in addition to confirming independent work by Williams demonstrating ω ≤ 2.372711 using the 8th power.

Acknowledgements

First and foremost, to Virginia Vassilevska Williams, for her excellent mentorship and finite but unbounded patience; and to Ryan Williams, for being my guide in the undergraduate research labyrinth and opening door after door for me.

To Meredith Hutchin and Yoav Shoham, and to my co-conspirators in the honors program, Dima Brezhnev, Bryce Cronkite-Ratcliff, and Sandy Huang, for their enthusiasm and encouragement, and for filling Friday afternoons with fascinating ideas and delicious sandwiches.

To Mehran Sahami, Eric Roberts, Jerry Cain, Phil Levis, Virginia Williams, and Alexis Wing, for helping ensure that I'll be here doing research for a little while longer. I also would not have been nearly as inclined to be a researcher had it not been for the whirlwind of excitement that was Prof. Levis' CURIS project two summers ago; my thanks to the people who made that project possible, including Ewen Cheslack-Postava, Behram Mistree, and all eight of the fun and friendly undergrads with whom I had the pleasure of working: Alex, Angela, Ben, Elliot, Emily, Jiwon, Kevin, and Kotaro.

And of course, to my parents, Bill and Carol Monroe, for their constant love and support, and for reminding me to keep things in perspective, take care of myself, and have fun. I won't forget my Number One Job.

This is only the teaser drop at the top of the roller coaster; thanks to everyone who powered me up the hill. It's going to be a wild ride.

Contents

Acknowledgements
1 Introduction
2 Williams' program
  2.1 Rank constraint
  2.2 Free variable constraints
  2.3 Computing values
    2.3.1 Even powers algorithm
    2.3.2 General powers algorithm
    2.3.3 "Zero" values
3 Parameters
  3.1 Power
  3.2 q
  3.3 Zero variables
  3.4 Free variables
  3.5 Substitution
4 Implementation
  4.1 Parallelization
  4.2 Improving performance
5 Results
  5.1 Trends
  5.2 Performance
6 Source code
  6.1 Dependencies
  6.2 Getting started

List of Tables

5.1 Variable settings for confirmed bounds
6.1 Useful definitions for interactive searching

List of Figures

5.1 Best exponents by power and q
5.2 Best exponents including 5th power
5.3 Best exponents for the 8th power, varying the number of zero variables
5.4 Performance of the lower powers, as a function of ω
5.5 Performance of 8th power, for each setting of zero variables

Chapter 1

Introduction

Since Volker Strassen's discovery in 1969 that n × n matrix multiplication could be accomplished in better than O(n³) time [Str69], theoretical research in matrix multiplication algorithms has seen a long staircase of gradual improvement in ω, the exponent of matrix multiplication. Progress had appeared to stall after Coppersmith and Winograd reached ω < 2.376 [CW87], their second breakthrough in the field; no lower exponent was found until three years ago, when Stothers analyzed the fourth power of CW's tensor to obtain ω < 2.3737 [Sto10], followed shortly thereafter by Virginia Vassilevska Williams' analysis of the eighth power achieving the current bound of ω < 2.3727 [Wil12].

Previous analyses of Coppersmith-Winograd's algorithm and its higher tensor powers, including the analysis accomplished by Stothers, required separate, arduous derivations for each of a number of values that increased quadratically with tensor power. The major achievement of Williams' paper was the construction of an algorithm to generalize the analysis of CW-like tensors, in a way that is both mechanical and computationally tractable for large powers. A tantalizing challenge presented by Williams in her paper is the prospect of performing future analyses of a similar nature entirely by computer. In this report I present my work towards realizing that prospect, under the advisement of Virginia Vassilevska Williams and Ryan Williams.

In chapter 2, we review the nonlinear program derived in Williams' paper. Chapter 3 lists the parameters that can be varied when searching for better solutions. Chapter 4 describes details of the implementation of the search, and the results obtained using this implementation are presented in chapter 5. Finally, chapter 6 provides instructions for obtaining and running the search program and a guide to the layout of the source code.

Chapter 2

Williams' program

Following Schönhage and Coppersmith-Winograd, we instead minimize a bound on ω/3, conventionally labeled τ.

We choose P, the tensor power of the construction we are analyzing, and construct a constraint program in ⌊P²/3⌋ + P + 1 variables a_{IJK} (the "small variables"), where I, J, and K are integers, 0 ≤ I ≤ J ≤ K, and I + J + K = 2P. (The STOC '12 paper includes a more general form of the program with two sets of variables, a and ā; in our analysis here we elect to set a_{IJK} = ā_{IJK}.)

The program is then: minimize τ, subject to

- Nonnegative: a_{IJK} ≥ 0 for all a_{IJK}.
- Permutations identity: ∑_{I,J,K} perm(I, J, K) a_{IJK} = 1, where perm(I, J, K) is the number of unique permutations of (I, J, K).
- Rank constraint (see section 2.1).
- Free variable constraints (see section 2.2).

Descriptions of the constraints follow.

2.1 Rank constraint

To compute a matrix product is to evaluate a particular trilinear form, a sum ∑_{i,j,k} t_{ijk} x_i y_j z_k, where t is the tensor of the trilinear form. Coppersmith and Winograd [CW87] defined the value V(τ) of a trilinear form to indicate its similarity to a sum of matrix product tensors.
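For concreteness, the index set of the small variables and the perm coefficient can be enumerated directly. This sketch (the helper names are ours, not those of the implementation) checks the variable-count formula ⌊P²/3⌋ + P + 1:

```python
from math import factorial

def small_variable_triples(P):
    """Sorted triples (I, J, K) with 0 <= I <= J <= K and I + J + K = 2P."""
    return [(I, J, 2 * P - I - J)
            for I in range(2 * P + 1)
            for J in range(I, 2 * P + 1)
            if 2 * P - I - J >= J]

def perm(I, J, K):
    """Number of unique permutations of (I, J, K): 1, 3, or 6."""
    n = factorial(3)
    for v in set((I, J, K)):
        n //= factorial((I, J, K).count(v))
    return n

assert perm(2, 2, 2) == 1 and perm(0, 0, 4) == 3 and perm(0, 1, 3) == 6

# The count matches floor(P^2/3) + P + 1; P = 8 gives the 30 variables
# mentioned in chapter 5.
for P in range(1, 9):
    assert len(small_variable_triples(P)) == P * P // 3 + P + 1
```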

The rank constraint is a generalization of Schönhage's asymptotic sum inequality [Sch81]. It relates the values V_{IJK}(τ) of a set of trilinear forms that compose the tensor being analyzed to the border rank r of the tensor.

Williams defines additional variables A_ℓ (the "large variables") to be the sum of all a_{ijk} where ℓ ∈ {i, j, k}, times 2 if the other two indices are unique. (Equivalently, A_ℓ = ∑_{(i,j,k) : i = ℓ} a_{ijk}, where here the sum is over all triples i, j, k that sum to 2P, not just those in sorted order, and a_{ijk} = a_{sort(i,j,k)}.) The rank constraint in its most general form is

    r^P = ∏_{I,J,K} V_{IJK}^{perm(I,J,K) a_{IJK}} / ∏_ℓ A_ℓ^{A_ℓ}.

In analyzing the Pth power of [CW87], we substitute Coppersmith and Winograd's border rank, r = q + 2, and take logs to get

    P log(q + 2) = ∑_{I,J,K} perm(I, J, K) log(V_{IJK}) a_{IJK} − ∑_ℓ A_ℓ log(A_ℓ).

The nonlinearity in this constraint is embedded in the second term, as each A_ℓ is a sum involving small variables a_{ijk}.
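The large variables can be computed mechanically from the small ones. A sketch (function name ours) of the unsorted-triple formulation; note that the permutations identity forces ∑_ℓ A_ℓ = 1, which the example checks:

```python
def large_variables(a, P):
    """A_l = sum of a_sort(l, j, k) over all ordered pairs (j, k)
    with l + j + k = 2P (the first index fixed to l)."""
    return {l: sum(a[tuple(sorted((l, j, 2 * P - l - j)))]
                   for j in range(2 * P - l + 1))
            for l in range(2 * P + 1)}

# P = 2 example chosen to satisfy the permutations identity
# 3*a_{004} + 6*a_{013} + 3*a_{022} + 3*a_{112} = 1.
a = {(0, 0, 4): 0.1, (0, 1, 3): 0.05, (0, 2, 2): 0.1, (1, 1, 2): 0.1 / 3}
A = large_variables(a, 2)
assert abs(sum(A.values()) - 1.0) < 1e-9  # summing over l recovers the identity
assert abs(A[0] - 0.4) < 1e-9             # a_{004} counted twice, as described
```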

2.2 Free variable constraints

When analyzing powers above the third, it is necessary to include a number of additional constraints. These constraints take the form

    a_{IJK} · ∏_{I',J',K'} a_{I'J'K'}^{−p_{IJK}^{I'J'K'}} = 1.        (2.1)

They are constructed by solving the linear system defining the A_ℓ for some subset of the a_{IJK} (referred to in the implementation as the "included" set) in terms of the A_ℓ and the remaining a_{IJK} (the "excluded" set, S). The included set may be chosen arbitrarily, although its size is fixed by the rank of the matrix dA_ℓ/da_{IJK}. (For the second and third powers, it is possible to solve for all a_{IJK} in terms of the A_ℓ, i.e., the matrix has full rank, and as a result, no constraints are needed here.)

With the coefficients of the solved linear system in hand, we create one additional constraint (2.1) for each variable a_{IJK} in S. The coefficients of the solved linear system give the powers p_{IJK}^{I'J'K'} = da_{I'J'K'}/da_{IJK}, where each a_{I'J'K'} is "included" (not in S).

Some of the derivatives p_{IJK}^{I'J'K'} may be negative. To allow for setting certain a_{IJK} to zero when searching for solutions, and to avoid numerical instability in solving the program, Williams rewrote the constraints (2.1) in the form

    a_{IJK} ∏_{p_{IJK}^{I'J'K'} < 0} a_{I'J'K'}^{−p_{IJK}^{I'J'K'}} = ∏_{p_{IJK}^{I'J'K'} ≥ 0} a_{I'J'K'}^{p_{IJK}^{I'J'K'}}.

This is the form we use in our implementation.

2.3 Computing values

In order to complete the constraint program, it is necessary to know a lower bound on the value V_{IJK} of each trilinear form that makes up the construction. Williams gives two algorithms for computing the values. The first algorithm can only be used for even P, while the second can be applied to any tensor power.

2.3.1 Even powers algorithm

Bounds on V_{IJK} are computed recursively in terms of known bounds on the values for lower powers. The structure of the resulting expression is a product of sums:

    V_{IJK} ≥ (∑_ℓ ξ_ℓ)^{1/3} · (∑_ℓ ψ_ℓ)^{1/3} · (∑_ℓ ζ_ℓ)^{1/3} · L        (2.2)

where the first three factors are the "combined products" and L is the "LP result."

The expressions ξ_ℓ, ψ_ℓ, ζ_ℓ, and L are computed using a set of temporary variables α_{ijk} indexed by triples similar to the a_{ijk} variables of the constraint program. (Note that a different set of α_{ijk} is constructed for each value V_{IJK}.) The triples (i, j, k) in this step are subject to the constraints

- that i + j + k = 2P (as in the case of the value triples);
- i ≤ I/2, and if i = I/2, then j ≤ J/2 (that is, (i, j, k) is not lexicographically greater than its "complement" (I − i, J − j, K − k)); and
- j ≤ J, k ≤ K (no index of the triple may be greater than the corresponding index of the value's triple).

(Unlike the a_{IJK} above, these triples do not have to be in sorted order.)
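These index constraints are easy to enumerate mechanically. The sketch below (helper name ours) takes the text at face value: half_P is the lower power whose values are being combined, so the α triples sum to 2·half_P while (I, J, K) sums to twice that; under this reading each triple's complement (I − i, J − j, K − k) is itself a lower-power value triple, as required by the bases W_{ijk} defined next:

```python
def alpha_triples(half_P, I, J, K):
    """Index triples (i, j, k) of the temporary alpha variables for V_IJK:
    i + j + k = 2 * half_P, (i, j, k) at most its complement
    (I - i, J - j, K - k) lexicographically, and j <= J, k <= K."""
    assert I + J + K == 4 * half_P
    n = 2 * half_P
    out = []
    for i in range(n + 1):
        for j in range(min(J, n - i) + 1):
            k = n - i - j
            if k <= K and (i, j, k) <= (I - i, J - j, K - k):
                out.append((i, j, k))
    return out

# For V_112 of the squared construction (half_P = 1): two alpha variables,
# with bases W_002 = V_002 * V_110 and W_011 = V_011 * V_101.
assert alpha_triples(1, 1, 1, 2) == [(0, 0, 2), (0, 1, 1)]
```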

From these, define X_ℓ = ∑_{i,j,k} c(i, I, ℓ) α_{ijk}, where

    c(i, I, ℓ) = 2 if i = ℓ and I − i = ℓ; 1 if i = ℓ or I − i = ℓ (but not both); 0 otherwise.

Similarly, Y_ℓ = ∑_{i,j,k} c(j, J, ℓ) α_{ijk}, and Z_ℓ = ∑_{i,j,k} c(k, K, ℓ) α_{ijk}.

These definitions form a linear system. As before, we will solve the linear system for some of the α_{ijk} in terms of X_ℓ, Y_ℓ, Z_ℓ, and an excluded set of α_{ijk} (S). This yields a matrix of coefficients dα_{ijk}/dX_ℓ, dα_{ijk}/dY_ℓ, dα_{ijk}/dZ_ℓ, and dα_{ijk}/dα_{i'j'k'} for each α_{i'j'k'} in S.

Once we have found these coefficients, we can compute ξ_ℓ, ψ_ℓ, and ζ_ℓ, the "combined products" in the value lower bound. These are computed recursively from values of lower powers: define W_{ijk} = V_{ijk} V_{I−i,J−j,K−k}. (The code calls the W_{ijk} "bases." When evaluating V_{I−i,J−j,K−k}, we can use the fact that V_{IJK} = V_{sort(I,J,K)}.) Then

    ξ_ℓ = ∏_{i,j,k} W_{ijk}^{3 dα_{ijk}/dX_ℓ}  (ℓ < ⌊I/2⌋);    ξ_{⌊I/2⌋} = γ_I ∏_{i,j,k} W_{ijk}^{3 dα_{ijk}/dX_{⌊I/2⌋}},  where γ_I = 1 if I is odd, 1/2 if I is even;

    ψ_ℓ = ∏_{i,j,k} W_{ijk}^{3 dα_{ijk}/dY_ℓ}  (ℓ < ⌊J/2⌋);    ψ_{⌊J/2⌋} = γ_J ∏_{i,j,k} W_{ijk}^{3 dα_{ijk}/dY_{⌊J/2⌋}},  where γ_J = 1 if J is odd, 1/2 if J is even;

    ζ_ℓ = ∏_{i,j,k} W_{ijk}^{3 dα_{ijk}/dZ_ℓ}  (ℓ < ⌊K/2⌋);    ζ_{⌊K/2⌋} = (1/2) ∏_{i,j,k} W_{ijk}^{6 dα_{ijk}/dZ_{⌊K/2⌋}}.

At this point, as in the linear system for the final program above, if the linear system was of full rank (S = ∅), we are finished; the bound on the value is given by (2.2) (with L = 1).

If not, we must compute L. Its value is

    L = ∏_{α_{ijk} ∈ S} δ_{ijk}^{α_{ijk}},

where the δ_{ijk} have a form similar to that of ξ_ℓ, etc.:

    δ_{ijk} = ∏_{i',j',k'} W_{i'j'k'}^{dα_{i'j'k'}/dα_{ijk}}.

Here, L still contains some α variables, unlike the previous expressions. Our goal is to maximize the bound on V_{IJK}, so we should maximize L. We have some freedom to choose the values of the variables α_{ijk}, with some constraints. These constraints, concisely, are that every α_{ijk}, whether excluded (in S) or included, must be nonnegative. The included α_{ijk} are defined in terms of X_ℓ, Y_ℓ, Z_ℓ and the excluded α_{ijk} by the solved linear system above. To make them depend only on the

excluded variables, we give values to X_ℓ, Y_ℓ, and Z_ℓ:

    X_ℓ = ξ_ℓ / ∑_{ℓ'} ξ_{ℓ'},    Y_ℓ = ψ_ℓ / ∑_{ℓ'} ψ_{ℓ'},    Z_ℓ = ζ_ℓ / ∑_{ℓ'} ζ_{ℓ'}.

Assuming we've computed the lower power values, X_ℓ, Y_ℓ, and Z_ℓ are then constants, making all the constraints linear. Furthermore, the δ_{ijk} are also constants. Therefore, maximizing

    log(L) = ∑_{α_{ijk} ∈ S} log(δ_{ijk}) α_{ijk}

subject to these constraints is a linear program, which can be solved with any available LP solver. (We employ GLPK for this purpose; see section 6.1.)

2.3.2 General powers algorithm

The algorithm for finding V_{IJK} for arbitrary P differs from the even powers algorithm only in the indices of the α variables, the computation of the bases W, and some multiplicative factors. The most salient difference in the construction of the programs is that i, j, k in α_{i,j,k}, and in ξ_ℓ, etc., are no longer integers but rather P-tuples of integers in {0, 1, 2}.

For 1 ≤ p ≤ P, define i[p] to be the pth element of the tuple i. The restrictions on the indices i, j, and k for α_{i,j,k} are:

- ∑_p i[p] = I; ∑_p j[p] = J; ∑_p k[p] = K;
- for each p, i[p] + j[p] + k[p] = 2; and
- for each p < P, (i[p], j[p], k[p]) ≤ (i[p+1], j[p+1], k[p+1]), where ≤ is a lexicographic comparison of the tuples.

For example, one valid α for V_{123} (P = 3) is α_{(001),(020),(201)}.

The equations for the large variables X_ℓ, Y_ℓ, Z_ℓ use an extended definition of the perm function, perm_P(i, j, k), which is the number of distinct triples of tuples (i', j', k') one can obtain from all possible permutations π of (1, 2, …, P) by permuting i, j, and k each by π: i'[p] = i[π(p)], j'[p] = j[π(p)], k'[p] = k[π(p)]. We use the following identity to avoid enumerating all possible π:

    perm_P(i, j, k) = perm(i) · perm(j[p] : i[p] = 0) · perm(j[p] : i[p] = 1)

The large variables are

    X_ℓ = (1/perm(ℓ)) ∑_{i,j,k} c(i, ℓ) perm_P(i, j, k) α_{i,j,k}
    Y_ℓ = (1/perm(ℓ)) ∑_{i,j,k} c(j, ℓ) perm_P(i, j, k) α_{i,j,k}
    Z_ℓ = (1/perm(ℓ)) ∑_{i,j,k} c(k, ℓ) perm_P(i, j, k) α_{i,j,k}

with c(i, ℓ) = 1 if i is a permutation of ℓ and 0 otherwise. This linear system may be overconstrained, so before solving it, we remove two of the equations to make sure its rank is equal to the number of equations. (In our implementation, we choose the lexicographically greatest X_ℓ and lexicographically greatest Y_ℓ; Williams' paper removes the lexicographically least X_ℓ and Y_ℓ. The choice is arbitrary.) Solving this system as above gives the derivatives dα_{ijk}/dX_ℓ, dα_{ijk}/dY_ℓ, and dα_{ijk}/dZ_ℓ. In the general powers algorithm, each base W_{ijk} is a product of P values: W_{ijk} = ∏_{p=1}^{P} V_{i[p],j[p],k[p]}. Note that although this appears to be a simple generalization of the product of two values that occurs in the even powers algorithm, here i[p], j[p], k[p] are restricted to {0, 1, 2}, so the general powers algorithm is not fully recursive, instead giving values for arbitrary powers only in terms of the values V_{002} and V_{011}.

From the derivatives and bases, we construct

    ξ_ℓ = perm(ℓ) ∏_{i,j,k} W_{ijk}^{3 perm_P(i,j,k) (dα_{ijk}/dX_ℓ) / perm(ℓ)}
    ψ_ℓ = perm(ℓ) ∏_{i,j,k} W_{ijk}^{3 perm_P(i,j,k) (dα_{ijk}/dY_ℓ) / perm(ℓ)}
    ζ_ℓ = perm(ℓ) ∏_{i,j,k} W_{ijk}^{3 perm_P(i,j,k) (dα_{ijk}/dZ_ℓ) / perm(ℓ)}

and if S is nonempty,

    δ_{ijk} = ∏_{i',j',k'} W_{i'j'k'}^{perm_P(i',j',k') dα_{i'j'k'}/dα_{ijk}}.

In determining L, the setting of the large variables is also modified with perm(ℓ) coefficients:

    X_ℓ = perm(ℓ) ξ_ℓ / ∑_{ℓ'} perm(ℓ') ξ_{ℓ'},    Y_ℓ = perm(ℓ) ψ_ℓ / ∑_{ℓ'} perm(ℓ') ψ_{ℓ'},    Z_ℓ = perm(ℓ) ζ_ℓ / ∑_{ℓ'} perm(ℓ') ζ_{ℓ'}.

We then use an LP solver to maximize log(L) and compute V_{IJK} as in (2.2), identically to the even powers algorithm.
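The perm_P identity can be checked directly against the brute-force definition; this sketch (function names ours) does so for the example α_{(001),(020),(201)}:

```python
from itertools import permutations
from math import factorial

def perm_multiset(vals):
    """Number of distinct orderings of the tuple vals."""
    n = factorial(len(vals))
    for v in set(vals):
        n //= factorial(vals.count(v))
    return n

def perm_P(i, j, k):
    """perm_P(i, j, k) = perm(i) * perm(j[p] : i[p] = 0) * perm(j[p] : i[p] = 1)."""
    j0 = tuple(jp for ip, jp in zip(i, j) if ip == 0)
    j1 = tuple(jp for ip, jp in zip(i, j) if ip == 1)
    return perm_multiset(i) * perm_multiset(j0) * perm_multiset(j1)

def perm_P_brute(i, j, k):
    """Count distinct (i', j', k') over all position permutations pi."""
    cols = list(zip(i, j, k))
    return len({tuple(zip(*(cols[p] for p in pi)))
                for pi in permutations(range(len(i)))})

i, j, k = (0, 0, 1), (0, 2, 0), (2, 0, 1)   # the example alpha for V_123
assert perm_P(i, j, k) == perm_P_brute(i, j, k) == 6
```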

2.3.3 "Zero" values

Our implementation includes another lemma from Williams' paper ("Claim 7") which applies to values V_{0JK} from any power. These have a simpler lower bound that is a polynomial in q, raised to the τ power:

    V_{0JK} ≥ [ ∑_{b ≤ J, b ≡ J (mod 2)} (P choose b, (J−b)/2, (K−b)/2) q^b ]^τ,

where (P choose b, (J−b)/2, (K−b)/2) is a multinomial coefficient.
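The Claim 7 bound is cheap to evaluate. A sketch (function names ours) of the polynomial inside the τ-th power:

```python
from math import factorial

def multinomial(n, parts):
    """n! / (parts[0]! * parts[1]! * ...), with sum(parts) == n."""
    out = factorial(n)
    for p in parts:
        out //= factorial(p)
    return out

def zero_value_poly(P, J, K, q):
    """The polynomial lower-bounding V_0JK, before raising to the tau power."""
    assert J + K == 2 * P
    return sum(multinomial(P, (b, (J - b) // 2, (K - b) // 2)) * q ** b
               for b in range(J % 2, min(J, K) + 1, 2))

assert zero_value_poly(1, 1, 1, q=5) == 5    # V_011 >= q^tau
assert zero_value_poly(2, 0, 4, q=5) == 1    # V_004 >= 1
assert zero_value_poly(2, 2, 2, q=5) == 27   # q**2 + 2 at q = 5
```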

Chapter 3

Parameters

While the nonlinear program itself represents a search over a family of algorithms, parameterized by the a_{IJK} variables that define the search space of the nonlinear constraint solver, there are five configurable parameters that can be changed to influence the construction of the program itself. A feasible solution to the program generated for any setting of these parameters gives an upper bound on ω, so it is necessary to experiment with the settings of these parameters to ensure that one has explored the entire space of CW-like constructions.

3.1 Power

As described in chapter 2, Williams' algorithm is applied by starting with an analysis of a base case of a particular matrix multiplication construction and recursively generalizing the analysis to higher tensor powers P. Varying this parameter has already been a productive source of improvements in ω: successive generalizations to higher powers of two of the Coppersmith-Winograd [CW87] construction by Stothers [Sto10] and Williams [Wil12] have each resulted in a better bound on ω. We have applied our automated analysis to powers 2, 3, 4, 5, 7, and 8 of the CW construction, although we were unable to find a feasible solution to the P = 7 program for any ω ≤ 2.7.

3.2 q

The CW family of constructions, starting with [Str86], are assembled from a sum of some number q of tensor products; Coppersmith and Winograd found ω < 2.376 for q = 6, but Williams achieved ω < 2.3727 using an analysis with q = 5. We allow q to vary in our analysis between 4 and 7. Of these, q = 5 appears to consistently give the best results across most powers (see section 5.1).

3.3 Zero variables

It is possible to reduce the dimensionality of the search space for the nonlinear solver by requiring that some number of the program parameters be exactly zero. This necessarily prevents the solver from finding some optimal solutions, but if the variables to be set to zero are chosen carefully, this restriction does not make solving the program much more difficult. If a feasible solution exists, reducing the number of variables can increase the probability of finding it and makes the solver converge faster (see section 5.2).

3.4 Free variables

Constructing the program involves several steps of solving underconstrained linear systems by choosing a subset of the variables to treat as constants. There are two points in Williams' algorithm where this strategy is employed: constructing the linear programming piece of the values computation (section 2.3), and constructing the derivatives constraint of the final program (section 2.2). Which variables are treated as constants can affect the bounds obtained on the various tensor values V_{IJK}.

Our implementation includes configurations of free variables used by Williams in [Wil12] and the option to specify the free variables used for each value and for the final program; although our search automation does not currently look for improvements gained in this way, this could be a subject of future work. Since higher values loosen the rank constraint (due to the fact that V_{IJK} is nondecreasing in τ), to minimize ω one would need to search for the LP free variable setting that maximizes each value. An exhaustive search would run in exponential time and may be infeasible already for some of the 8th power values, but if a pattern could be uncovered in the optimal free variable choices, this exhaustive search may be avoidable.

3.5 Substitution

Although the nonlinear program includes both equalities and inequalities, the equalities have simple forms that allow them to be eliminated by substitution in closed form. We implemented a search option that allows such substitution; however, we found that NLopt, the nonlinear solver we employed in searching for feasible solutions to the program, was unable to solve the programs constructed this way in a reasonable amount of time. One likely cause of this is that the closed-form substitutions introduce divisions in the constraints, which lead to discontinuities and precision problems in the solver.

Chapter 4

Implementation

Because the "constants" δ_{ijk} can contain ratios of lower-power values, which depend nonlinearly on τ, constructing symbolic solutions for the values of higher powers, and therefore constructing the complete nonlinear program such that τ can be minimized, is not tractable. Instead, our search proceeds as a series of attempts to find a feasible solution for specific values of τ (and q). With τ and q fixed, the values computed are numeric constants that do not have to be recomputed for every evaluation in the nonlinear solver.

4.1 Parallelization

Since each attempt is independent and CPU-bound, it makes sense to run searches in parallel, up to one search per available core. Our search implementation uses the Python multiprocessing library to take advantage of multiple cores, employing a task queue abstraction to schedule searches optimally. The structure of the search parallelization is (Python pseudocode):

def search():
    create inter-process queues input_queue, output_queue
    fill input_queue with num_cores random parameter settings
    start num_cores worker processes running worker
    while jobs_completed < goal:
        block until output_queue is not empty
        extract a result from output_queue and record the result
        add another random parameter setting to input_queue
    kill worker processes

def worker(input_queue, output_queue):
    while True:
        get a parameter setting from input_queue
        attempt to solve the program with the parameters
        place the result (which includes success/failure) in output_queue

4.2 Improving performance

Solving the nonlinear program dominates the running time of the search. Two small optimizations made dramatic differences in the efficiency of the solver.

The first optimization was to use the SymPy lambdify function to preprocess the symbolic expressions manipulated by the procedures that constructed the nonlinear program into Python functions that compute the expressions directly. The original SymPy expressions were runtime data structures that required interpreting; lambdify transforms these into Python bytecode that runs just as fast as a custom function written for each expression.

The second optimization was to replace the equality in Williams' rank constraint with an inequality. Williams' STOC '12 paper had

    r^P = ∏_{I,J,K} V_{IJK}(τ)^{perm(I,J,K) a_{IJK}} / ∏_ℓ A_ℓ^{A_ℓ}.

However, suppose we have a setting of a_{IJK} such that

    r^P ≤ ∏_{I,J,K} V_{IJK}(τ)^{perm(I,J,K) a_{IJK}} / ∏_ℓ A_ℓ^{A_ℓ}.

Since each V_{IJK} is a nondecreasing, continuous function of τ and all a_{IJK} are positive, there exists τ' ≤ τ for which the strict equality holds, so our τ still gives an upper bound on ω/3. It is therefore acceptable to solve the program using the inequality, which is convenient because satisfying an inequality typically requires far fewer iterations than approximating an equality to a high degree of precision.

The Python program is still significantly slower than the C code written by Williams to solve the 8th power program; much of this slowness is likely due to the overhead of calls into and out of the NLopt Python wrappers. Possible future work in improving performance would involve the generation of C or C++ code, which would eliminate this overhead.
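The task-queue pseudocode in section 4.1 can be fleshed out as follows; for illustration, threads stand in for the worker processes of the real multiprocessing implementation, and solve_attempt is a hypothetical stub for a single NLopt feasibility attempt:

```python
import queue
import random
import threading

def solve_attempt(params):
    """Stub for one feasibility attempt at fixed tau and q."""
    return {"params": params, "success": params["tau"] > 0.7909}

def random_params():
    return {"q": 5, "tau": random.uniform(0.7905, 0.7915)}

def worker(in_q, out_q):
    while True:
        params = in_q.get()
        if params is None:              # sentinel: shut down
            break
        out_q.put(solve_attempt(params))

def search(goal, num_workers):
    in_q, out_q = queue.Queue(), queue.Queue()
    for _ in range(num_workers):
        in_q.put(random_params())       # one job per worker to start
    workers = [threading.Thread(target=worker, args=(in_q, out_q))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    results = []
    while len(results) < goal:
        results.append(out_q.get())     # blocks until a result is ready
        in_q.put(random_params())       # keep every worker busy
    for _ in workers:
        in_q.put(None)
    for w in workers:
        w.join()
    return results

results = search(goal=6, num_workers=3)
assert len(results) == 6
```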

Chapter 5

Results

The best exponent bound we obtained solely with our implementation was

    ω ≤ 2.372771003742,

with P = 4, q = 5. We were also able to use parts of our implementation interactively to confirm a solution found by Williams with her Maple and C code, which gives

    ω ≤ 2.3727104061

for P = 8, q = 5. The settings for the variables a_{IJK} that produced these bounds are included in Table 5.1.

5.1 Trends

Figure 5.1 plots the best matrix multiplication exponents obtained as a function of P and q. Powers 4 and 8 gave the best results of the data we collected; the results of the odd-powers attempts were disappointing, giving exponents far higher than the even-power searches. (The fifth power is left out of Figure 5.1 because including it obscures differences between the other powers; Figure 5.2 shows the fifth power in comparison. We were unable to find a feasible solution for the seventh power.) The success of the even powers algorithm suggests that the 16th power could be a fruitful next step in the search.

For all powers except the fifth, q = 5 produced the best exponents. The best second- and third-power exponents for q = 5 do not differ significantly from q = 6, but for the fourth and eighth powers, the difference is notable.

The number of zero variables does impact the exponents found slightly, although it primarily

Figure 5.1: Best exponents by power and q (ω vs. q, for powers 2, 3, 4, and 8)

Figure 5.2: Best exponents including 5th power (ω vs. q, for powers 2, 3, 4, 5, and 8)

helps, achieving better exponents with higher numbers of variables set to zero. This is likely due to the decrease in program complexity, requiring less searching in the global ISRES algorithm. Figure 5.3 shows the best exponents found for each number of variables set to zero. All four exponents shown in this figure were obtained with q = 5.

Figure 5.3: Best exponents for the 8th power, varying the number of zero variables (ω vs. number of zero variables: 6, 9, 12, 16)

5.2 Performance

Figure 5.4 shows the number of calls to the constraint functions required to achieve convergence (or determine that the program is not feasible) for the lower powers. Blue dots in the figure are successes, while red are failures; the red line is the kN² cap. While failed attempts are highly unpredictable in the amount of time they take to admit failure, successes occur reliably below a certain number of NLopt calls for each power. We take advantage of this by automatically aborting the optimization upon seeing that it has taken an unreasonable number of calls. The heuristic we use is to stop after kN² calls, where N is the number of variables in the program (after removing the zero variables), and k is a constant. We found that k = 20000 yields useful but conservative bounds.

Limiting the number of variables of the program decreases the time required for convergence dramatically. Figure 5.5 shows three different configurations of zero variables for the 8th power; the difference between using 6 zero variables and 16 is a factor-of-3 decrease in the number of iterations until convergence for successes.
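The kN² cutoff can be enforced with a thin counting wrapper around the objective and constraint callbacks; this generic sketch (class and exception names ours, not tied to the NLopt API) signals the abort with an exception:

```python
class EvalCapExceeded(Exception):
    """Raised when an optimization attempt exceeds its call budget."""

class CappedObjective:
    """Wrap a callback and abort after k * N**2 calls, where N is the
    number of program variables left after removing the zero variables."""

    def __init__(self, fn, num_vars, k=20000):
        self.fn = fn
        self.cap = k * num_vars ** 2
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        if self.calls > self.cap:
            raise EvalCapExceeded("aborting after %d calls" % self.cap)
        return self.fn(x)

# A tiny budget for demonstration: cap = 2 * 2**2 = 8 calls.
obj = CappedObjective(lambda x: x * x, num_vars=2, k=2)
for _ in range(8):
    obj(3.0)
try:
    obj(3.0)
    aborted = False
except EvalCapExceeded:
    aborted = True
assert aborted
```

A search attempt would catch EvalCapExceeded and record the attempt as a failure.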

Figure 5.4: Performance of the lower powers, as a function of ω (NLopt calls vs. ω; panels: P = 2, 0 zero vars out of 4; P = 3, 0 zero vars out of 7; P = 4, 0 zero vars out of 10; P = 5, 0 zero vars out of 14)

Figure 5.5: Performance of 8th power, for each setting of zero variables (NLopt calls vs. ω; panels: 6, 9, 12, and 16 zero vars, out of 30)

Our result: P = 4, ω ≤ 2.372771003742
  q = 5
  τ = 0.790923667914

Williams' solution: P = 8, ω ≤ 2.3727104061
  q = 5
  τ = 0.7909034687
  a_{0,0,16} = 0
  a_{0,1,15} = 0
  a_{0,2,14} = 0
  a_{0,3,13} = 0
  a_{0,4,12} = 0
  a_{0,5,11} = 5.607656585969684 × 10⁻⁵
  a_{0,6,10} = 0.0003442516415535713
  a_{0,7,9} = 1.0000000000697815 × 10⁻¹¹
  a_{0,8,8} = 1.000000000151792 × 10⁻¹¹
  a_{1,1,14} = 0
  a_{1,2,13} = 0
  a_{1,3,12} = 0
  a_{1,4,11} = 1.0000000000650404 × 10⁻¹¹
  a_{1,7,8} = 0.0014957209858593493
  a_{2,2,12} = 0
  a_{2,3,11} = 1.0000000000434667 × 10⁻¹¹
  a_{2,4,10} = 7.948735480255828 × 10⁻⁷
  a_{2,7,7} = 0.009151018807902839
  a_{3,3,10} = 9.999999987632591 × 10⁻¹²
  a_{5,5,6} = 0.06900945039321159

Table 5.1: Variable settings for confirmed bounds

Chapter 6

Source code

Our Python source is available at http://stanford.edu/~wmonroe4/matrixmult/, downloadable as a zipped archive or through a git repository.

6.1 Dependencies

A Python interpreter (http://python.org/) is needed to run the search program. Python 2.7 is recommended to ensure that all the necessary libraries are supported. Our implementation uses three third-party libraries, all of which are free and open-source.
