ANSYS Solvers: Usage And Performance

Transcription

ANSYS Solvers: Usage and Performance
ANSYS equation solvers: usage and guidelines
Gene Poole, ANSYS Solvers Team, April 2002

Outline
- Basic solver descriptions
  - Direct and iterative methods
  - Why so many choices?
- Solver usage in ANSYS
  - Available choices and defaults
  - How do I choose a solver?
- Practical usage considerations
  - Performance issues
  - Usage rules of thumb
  - Usage examples
  - How do I choose the fastest solver?

Solver Basics: Ax = b
Direct Methods
- Factor: A = L D L^T (compute the matrix L)
- Solve (triangular systems):
  - L z = b
  - z <- D^{-1} z
  - L^T x = z
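The factor/solve split above can be sketched in a few lines of NumPy. This is a dense, no-pivoting illustration of the L D L^T idea only; a production sparse solver such as BCSLIB works on sparse matrices with equation reordering and out-of-core storage.

```python
import numpy as np

def ldlt_factor(A):
    """Factor a symmetric matrix A into L * D * L^T (dense, no pivoting)."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # Diagonal entry: A[j,j] minus contributions of earlier columns
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        for i in range(j + 1, n):
            # Sub-diagonal entries of the unit lower triangle L
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d

def ldlt_solve(L, d, b):
    """Solve A x = b given A = L D L^T, as on the slide."""
    z = np.linalg.solve(L, b)    # L z = b     (forward substitution)
    z = z / d                    # z <- D^{-1} z
    x = np.linalg.solve(L.T, z)  # L^T x = z   (backward substitution)
    return x

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
L, d = ldlt_factor(A)
x = ldlt_solve(L, d, b)
print(np.allclose(A @ x, b))  # True
```

The expensive step is the factorization; once L and D exist, additional right-hand sides cost only the two cheap triangular solves.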

Solver Basics: Ax = b
Direct Methods
- Factor: A = L D L^T (compute the matrix L)
- Solve: L z = b; z <- D^{-1} z; L^T x = z (solve triangular systems)
Iterative Methods
- Stationary methods (guess and go):
  - Choose x_0
  - Iterate: x_{k+1} = G x_k + c, until ||x_{k+1} - x_k|| < eps
- Projection methods (project and minimize):
  - Choose x_0; r_0 = A x_0 - b; p_0 = r_0
  - Iterate: compute A p_{k-1}; update
    x_k = x_{k-1} + alpha_k p_{k-1}
    r_k = r_{k-1} - alpha_k A p_{k-1}
    p_k = r_k + beta_k p_{k-1}
    until ||r_k|| < eps
  - Main costs: the sparse A*p product and vector updates
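The projection-method recurrences on the slide are essentially the conjugate gradient algorithm. A minimal NumPy sketch, written with the more common sign convention r = b - A x (the slide defines the residual with the opposite sign; the iteration is the same):

```python
import numpy as np

def cg(A, b, x0, eps=1e-10, max_iter=1000):
    """Conjugate gradient for a symmetric positive definite A."""
    x = x0.copy()
    r = b - A @ x          # residual
    p = r.copy()           # first search direction
    for _ in range(max_iter):
        Ap = A @ p                     # the one sparse matrix-vector product
        alpha = (r @ r) / (p @ Ap)     # step length
        x = x + alpha * p              # update solution
        r_new = r - alpha * Ap         # update residual
        if np.linalg.norm(r_new) < eps:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p           # new A-conjugate direction
        r = r_new
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cg(A, b, np.zeros(2))
```

Per iteration the only matrix operation is A @ p, which is why the slide notes that iterative methods are cheap per step but limited by memory bandwidth rather than flops.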

Solver Basics: Limitations
Direct methods:
- Factor is expensive: memory and lots of flops; huge file to store L
- Solve is I/O intensive: forward/backward read of the huge L file
Iterative methods:
- Sparse A*x multiply is cheap but slow: memory-bandwidth and cache limited; harder to parallelize
- Preconditioners are not always robust
- Convergence is not guaranteed

ANSYS Direct Advantage
- Enhanced BCSLIB version 4.0
  - Parallel factorization
  - Reduced memory requirements for equation reordering
  - Support for U/P formulation
- Sparse solver interface improvements
  - Dynamic memory uses feedback for optimal I/O performance
  - Sparse assembly, including direct elimination of CEs

Multi-Point Constraints: Direct Elimination Method

Constraint: x_1 = G^T x_2 + g

Partitioned system:
  [ A11    A12 ] [ x_1 ]   [ b_1 ]
  [ A12^T  A22 ] [ x_2 ] = [ b_2 ]

Solve:
  (G A11 G^T + G A12 + A12^T G^T + A22) x_2 = b_2 + G b_1 - A12^T g - G A11 g
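The reduced system above follows from substituting the constraint into the partitioned equations and adding G times the first block row to the second. A small NumPy sketch with randomly generated blocks checks the algebra (all names here are illustrative, not ANSYS code):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 2, 3
# A symmetric positive definite partitioned matrix
M = rng.standard_normal((n1 + n2, n1 + n2))
A = M @ M.T + (n1 + n2) * np.eye(n1 + n2)
A11, A12, A22 = A[:n1, :n1], A[:n1, n1:], A[n1:, n1:]
b1, b2 = rng.standard_normal(n1), rng.standard_normal(n2)
G = rng.standard_normal((n2, n1))   # constraint: x1 = G^T x2 + g
g = rng.standard_normal(n1)

# Reduced system from the slide:
# (G A11 G^T + G A12 + A12^T G^T + A22) x2 = b2 + G b1 - A12^T g - G A11 g
K = G @ A11 @ G.T + G @ A12 + A12.T @ G.T + A22
f = b2 + G @ b1 - A12.T @ g - G @ A11 @ g
x2 = np.linalg.solve(K, f)
x1 = G.T @ x2 + g                   # recover eliminated DOFs

# Check: the G-weighted combination of the two block residuals vanishes,
# i.e. the equilibrium equations hold in the constrained subspace.
res1 = A11 @ x1 + A12 @ x2 - b1
res2 = A12.T @ x1 + A22 @ x2 - b2
print(np.allclose(G @ res1 + res2, 0))  # True
```

Note that res1 alone is not zero: the leftover part of the first block row is the reaction force enforcing the constraint.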

ANSYS Iterative Advantage
- PowerSolver has a proprietary and robust preconditioner
  - Parallel matrix/vector multiply
  - Wide usage, robust
- Many additional iterative solvers for complex systems, non-symmetric systems, etc.
- New high-performance parallel solvers
  - AMG: Algebraic Multigrid
  - DDS: Domain Decomposition Solver
- Ongoing efforts to utilize and enhance the AMG and DDS solvers when applicable

Solver Usage
- Sparse, PCG, and ICCG solvers cover 95% of all ANSYS applications
- The sparse solver is now the default in most cases, for robustness and efficiency reasons

Solver Usage: Choices
- Sparse direct solver (BCSLIB)
- PCG solver (PowerSolver)
- Frontal solver
- ICCG
- JCG
Listed in order of usage popularity. ANSYS now chooses sparse direct in nearly all applications for robustness and efficiency.

Solver Usage: -pp Choices
- AMG: Algebraic Multigrid
  - Good for ill-conditioned problems
  - Best shared-memory parallel performance of the ANSYS iterative solvers
  - Good for nonlinear problems: can solve indefinite matrices
- DDS: Domain Decomposition Solver
  - Exploits MPP cluster computing for the solver portion of the analysis
  - Solver time scales even on many processors
- Both are still under intensive development

Solver Usage: Sparse Solver
- Real and complex, symmetric and non-symmetric matrices
- Positive definite and indefinite (indefinite matrices occur in nonlinear analyses and eigensolvers)
- Supports Block Lanczos
- Supports substructure USE pass
- Substructure generation pass (beta in 6.1)
- Supports ALL physics, including some CFD
- Large numbers of CEs
- Support for mixed U-P formulation with Lagrange multipliers (efficient methods are used to support this)
- Pivoting and partial pivoting (EQSLV,sparse,0.01,-1)

Solver Usage: PCG Solver
- Real symmetric matrices
- Positive definite and indefinite matrices (supporting indefinite matrices is a unique feature in our industry)
- Power Dynamics modal analyses based on PCG subspace
- Substructure USE pass and expansion pass
- All structural analyses and some other field problems
- Large numbers of CEs
- NOT for mixed U-P formulation Lagrange multiplier elements
- NO pivoting or partial pivoting capability

Solver Usage: ICCG Suite
- Collection of iterative solvers for special cases
- Complex symmetric and non-symmetric systems
- Good for multiphysics, e.g. EMAG
- Not good for general usage

Usage Guidelines: Sparse
- Capabilities
  - Adapts to available memory
  - ANSYS interface strives for optimal I/O memory allocation
  - Uses machine-tuned BLAS kernels that operate at near peak speed
  - Uses ANSYS file splitting for very large files
  - Parallel performance: 2X to 3.5X faster on 4- to 8-processor systems
  - 3X to 6X speedup possible on high-end server systems (IBM, HP, SGI, ...)

Usage Guidelines: Sparse
- Resource requirements
  - Total factorization time depends on model geometry and element type
    - Shell models are best
    - Bulky 3-D models with higher-order elements are more expensive
  - System requirements
    - 1 Gbyte of memory per million DOFs
    - 10 Gbytes of disk per million DOFs
  - Eventually runs out of resources:
    - 10 million DOFs -> 100 Gbyte file
    - 100 Gbytes x 3 = 300 Gbytes of I/O
    - 300 Gbytes @ 30 Mbytes/sec = approx. 10,000 seconds of I/O wait time
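The slide's back-of-the-envelope I/O estimate can be reproduced directly. The 10 GB of factor file per million DOFs, the 3x read/write traffic on that file, and the 30 MB/s disk rate are the slide's own rule-of-thumb assumptions, not measured values:

```python
# Rule-of-thumb sparse-solver I/O estimate from the slide.
def sparse_io_estimate(n_dofs, disk_mb_per_sec=30.0):
    file_gb = 10.0 * n_dofs / 1e6             # factor (L) file size, GB
    io_gb = 3.0 * file_gb                     # total I/O traffic, GB
    wait_sec = io_gb * 1024 / disk_mb_per_sec # I/O wait time, seconds
    return file_gb, io_gb, wait_sec

file_gb, io_gb, wait = sparse_io_estimate(10_000_000)
print(file_gb, io_gb, round(wait))  # 100.0 300.0 10240
```

The 10,240-second result matches the slide's "approx. 10,000 seconds" and shows why very large direct-solver jobs become I/O bound long before they run out of flops.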

Usage Guidelines: PCG
- Capabilities
  - Runs in-core; supports out-of-core (you don't need to do this)
  - Parallel matrix/vector multiply achieves 2X on 4- to 8-processor systems
  - Memory-saving element-by-element technology for Solid92 (and Solid95, beta in 6.1)

Usage Guidelines: PCG
- Resource requirements
  - 1 Gbyte of memory per million DOFs
  - Memory grows automatically for large problems
  - I/O requirement is minimal
- Convergence is best for meshes with good aspect ratios
  - 3-D cube elements converge better than thin shells or high-aspect solids
- Over 500k DOFs shows the best performance compared to sparse

Usage Guidelines: Substructuring
- Eqslv,spar in the generation pass
  - Requires pcg or sparse in the expansion pass
- Use pass uses the sparse solver by default
  - May fail in symbolic assembly (try asso,,front)
- Pcg or sparse in the expansion pass
  - Avoids large tri files
- This is a beta feature only in 6.1: no unsymmetric, no damping

Performance Summary
- Where to look
  - PCG solver: file.PCS
  - Sparse solver: output file; add Bcsopt ,,, ,,, -5 (undocumented option)
- What to look for
  - Degrees of freedom
  - Memory usage
  - Total iterations (iterative only)

Usage Guidelines
- Tuning sparse solver performance
  - Bcsopt command (undocumented)
  - Optimal I/O for the largest jobs
  - In-core for large-memory systems and small to medium jobs (< 250,000 DOFs)
  - Use parallel processing

User Control of Sparse Solver Options
Sparse solver control using the undocumented command:

  bcsopt, ropt, mopt, msiz ,,, dbg

- ropt: set the equation reordering method (mmd, metis, sgi, wave)
- mopt, msiz: force or limit solver memory space in Mbytes (forc or limi, with msiz = nnnn Mbytes, up to 2048)
- dbg: -5 prints performance stats

Solvers and Modal Analyses
- Modal analyses are the most demanding in ANSYS
- Block Lanczos is most robust
  - Requires all of the sparse solver resources plus additional space for eigenvectors
  - Requires multiple solves during Lanczos iterations
- Subspace is good for very large jobs and few eigenvalues
  - Uses the PCG solver, or the frontal solver
  - Not as robust as Block Lanczos

Some Solver Examples
- Some benchmarks: 5.7 vs 6.0
- Typical large sparse solver jobs
- Sparse solver memory problem
- PCG solver example
- AMG solver examples

Benchmark Study: Static Analysis
[Chart: Total Solution Time, ANSYS 5.7 vs 6.0 sparse solver, with peak-memory annotations]

Benchmark Study (cont.)
[Charts: Total Solution Time and Peak Memory, ANSYS 5.7 vs 6.0 sparse solver]

Sparse Solver Memory Usage Example 1
- 2 million DOF sparse solver job
- SGI O2000, 16-CPU system

MultiSolution: Sparse Assembly Option. Call No. 1
  ANSYS largest memory block available    10268444 :    9.79 Mbytes
  ANSYS memory in use                   1323917280 : 1262.59 Mbytes
End of PcgEnd
  ANSYS largest memory block available   588214172 :  560.96 Mbytes
  ANSYS memory in use                    256482560 :  244.60 Mbytes
Total Time (sec) for Sparse Assembly: 63.53 cpu, 69.02 wall
Heap space available at start of BCSSL4: nHeap 75619667 D.P. words (576.93 Mbytes)
-> 577 Mbytes available for the sparse solver

Sparse Solver Memory Usage Example 1 (cont.) -- Carrier 2M DOF model

SPARSE MATRIX DIRECT SOLVER
  Number of equations = 2090946, Maximum wavefront = 275

ANSYS 6.0 memory allocation:
  Heap space available at start of bcs_mem0: nHeap 61665329 D.P. words (470.47 Mbytes)
  Estimated work space needed for solver: min_siz 256932078 D.P. words (1960.24 Mbytes)
-> Initial memory increased to 800 Mbytes
  Work space needed for solver: start_siz 110399416 D.P. words (842.28 Mbytes)
  Heap space setting at start of bcs_mem0: nHeap 110399416 D.P. words (842.28 Mbytes)
  Initial BCS workspace memory: 110399416 D.P. words (842.28 Mbytes)
  Total Reordering Time (cpu, wall): 537.670, 542.897
  Increasing memory request for BCS work to 67802738 D.P. words (517.29 Mbytes)
  Initial BCS workspace is sufficient
  Memory available for solver:               842.28 MB
  Memory required for in-core:                 0.00 MB
  Optimal memory required for out-of-core:   517.29 MB
  Minimum memory required for out-of-core:   162.39 MB
-> 800 Mbytes exceeds the optimal I/O setting: the initial guess easily runs in optimal I/O mode

Sparse Solver Memory Usage Example 1 (cont.) -- Carrier2 2M DOF model

Factorization statistics (annotated):
- number of equations (DOFs)
- no. of nonzeroes in lower triangle of A: nonzeros in K (40/1)
- no. of nonzeroes in the factor L: nonzeros in L (1142/29)
- maximum order and size of a front matrix
- no. of floating point ops for factor: trillions of F.P. ops
- times and computational rates (mflops) for structure input, ordering, symbolic factor, value input, numeric factor, and numeric solve
I/O statistics:
- Factored matrix file LN09: 2.4 billion D.P. words (18 Gbytes); 59 Gbytes transferred
- File LN32 not used
Sparse Matrix Solver CP Time (sec):      14468.280
Sparse Matrix Solver ELAPSED Time (sec): 15982.407
-> Elapsed time close to CPU time (about 4.5 hours): good processor utilization, reasonable I/O performance

Engine Block Analysis
- 410,977 Solid45 elements
- 16,304 Combin40 elements
- 1,698,525 equations
- 20,299 multi-point CEs

Engine Block Analysis -- Sparse Solver Interface Statistics

Sparse CE interface matrix (dim / coefs / mxcolmlth):
  Original A22 (dim 1698525), Constraints G, H = G*A11 + A12^T, H G^T, Modified A22 (dim 1698525)
Number of columns modified by direct elimination of CEs: 132849
-> Over 20,000 CEs processed with minimal additional memory required

Memory available for solver:               547.22 MB
Memory required for in-core:              9417.10 MB
Optimal memory required for out-of-core:   527.29 MB
Minimum memory required for out-of-core:   127.25 MB
-> Memory available is sufficient to run in optimal I/O mode

Engine Block Analysis -- Sparse Solver Performance Summary
SGI O2000, 16 x 300 MHz processors, 3-CPU run

Reported statistics:
- time (cpu & wall) for structure input, ordering, symbolic factor, value input, numeric factor, and numeric solve
- computational rate (mflops) for factor and solve
- i/o statistics by unit number (where the I/O always shows up)
-> Good sustained rate on factorization: nearly 600 mflops

Sparse Solver Example 2: What Can Go Wrong
Customer example: excessive elapsed time on a high-performance HP 2-CPU desktop system

Release 6.0    UP20010919    HPPA 8000-64
Maximum Scratch Memory Used = 252053628 Words (961.508 MB)
CP Time (sec)      =  6323.090    Time 23:36:41
Elapsed Time (sec) = 27575.000    Date 01/10/2002

Sparse Solver Example 2 (cont.)
- FEM model of a large radiator
- 650k degrees of freedom
- 68,000 Solid95 elements
- 2089 Surf154 elements
- 3400 constraint equations
- Initial memory setting: -m 1000 -db 300

Sparse Solver Example 2 (cont.)

MultiSolution: Sparse Assembly Option. Call No. 1
  ANSYS largest memory block available   73741452 :  70.33 Mbytes
  ANSYS memory in use                   612110368 : 583.75 Mbytes
Sparse Solver Interface Adding CEs. Call No. 1
  ANSYS largest memory block available   73741164 :  70.33 Mbytes
  ANSYS memory in use                   612110656 : 583.75 Mbytes
-> 584 Mbytes in use during sparse assembly
Sparse CE interface matrix (dim / coefs / mxcolmlth):
  Original A22 (dim 648234), Constraints G (dim 3471), H = G*A11 + A12^T (dim 3471), H G^T, Modified A22 (dim 648234)
The initial memory allocation (-m) has been exceeded.
Supplemental memory allocations are being used.
-> Needs more memory to process CEs
No. of columns modified by direct elimination of CEs: 42558
  ANSYS largest memory block available  288465472 : 275.10 Mbytes
  ANSYS memory in use                   179570288 : 171.25 Mbytes
Total Time (sec) for processing CEs: 38.33 cpu, 61.73 wall
End of PcgEnd
  ANSYS largest memory block available  575083952 : 548.44 Mbytes
  ANSYS memory in use                   133219536 : 127.05 Mbytes
Total Time (sec) for Sparse Assembly: 38.36 cpu, 61.77 wall
-> 548 Mbytes available after sparse assembly

Sparse Solver Example 2
