Memory And Thread Placement Optimization Developer's Guide

Beta
Part No: 820–1691–13
November 2010

Copyright 2007, 2010, Oracle and/or its affiliates. All rights reserved.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related software documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are “commercial computer software” or “commercial technical data” pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications which may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd.

This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.

Contents

Preface

1 Overview of Locality Groups
    Locality Groups Overview
    MPO Observability Tools

2 MPO Observability Tools
    The pmadvise utility
    Using the madv.so.1 Shared Object
        madv.so.1 Usage Examples
    The plgrp tool
        Specifying Lgroups
        Specifying Process and Thread Arguments
    The lgrpinfo Tool
        Options for the lgrpinfo Tool
    The Solaris::lgrp Module
        Functions in the Solaris::lgrp Module
        Object Methods in the Solaris::lgrp Module

3 Locality Group APIs
    Verifying the Interface Version
    Initializing the Locality Group Interface
        Using lgrp_init()
        Using lgrp_fini()
    Locality Group Hierarchy
        Using lgrp_cookie_stale()
        Using lgrp_view()
        Using lgrp_nlgrps()
        Using lgrp_root()
        Using lgrp_parents()
        Using lgrp_children()
    Locality Group Contents
        Using lgrp_resources()
        Using lgrp_cpus()
        Using lgrp_mem_size()
    Locality Group Characteristics
        Using lgrp_latency_cookie()
    Locality Groups and Thread and Memory Placement
        Using lgrp_home()
        Using madvise()
        Using meminfo()
        Locality Group Affinity
    Examples of API Usage

Preface

The Memory and Thread Placement Optimization Developer's Guide provides information on locality groups and the technologies that are available to optimize the use of computing resources in the Oracle Solaris operating system.

Who Should Use This Book

This book is intended for system administrators, performance engineers, systems programmers, support engineers, and developers who are writing applications in an environment with multiple CPUs and a non-uniform memory architecture. The programming interfaces and tools that are described in this book give the developer control over the system's behavior and resource allocation.

Related Third-Party Web Site References

Third-party URLs are referenced in this document and provide additional, related information.

Note: Sun is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused or alleged to be caused by or in connection with use of or reliance on any such content, goods, or services that are available on or through such sites or resources.

Documentation, Support, and Training

See the following web sites for additional resources:

- Documentation (http://docs.sun.com)
- Support ml)
- Training (http://education.oracle.com). Click the Sun link in the left navigation bar.

Oracle Welcomes Your Comments

Oracle welcomes your comments and suggestions on the quality and usefulness of its documentation. If you find any errors or have any other suggestions for improvement, go to http://docs.sun.com and click Feedback. Indicate the title and part number of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.

Oracle Technology Network (http://www.oracle.com/technetwork/index.html) offers a range of resources related to Oracle software:

- Discuss technical problems and solutions on the Discussion Forums (http://forums.oracle.com).
- Get hands-on step-by-step tutorials with Oracle By Example html).
- Download Sample Code (http://www.oracle.com/technology/sample_code/index.html).

Typographic Conventions

The following table describes the typographic conventions that are used in this book.

TABLE P–1 Typographic Conventions

Typeface: AaBbCc123
Meaning:  The names of commands, files, and directories, and onscreen computer output
Example:  Edit your .login file.
          Use ls -a to list all files.
          machine_name% you have mail.

Typeface: AaBbCc123
Meaning:  What you type, contrasted with onscreen computer output
Example:  machine_name% su
          Password:

Typeface: aabbcc123
Meaning:  Placeholder: replace with a real name or value
Example:  The command to remove a file is rm filename.

Typeface: AaBbCc123
Meaning:  Book titles, new terms, and terms to be emphasized
Example:  Read Chapter 6 in the User's Guide.
          A cache is a copy that is stored locally.
          Do not save the file.
          Note: Some emphasized items appear bold online.

Shell Prompts in Command Examples

The following table shows the default UNIX system prompt and superuser prompt for shells that are included in the Oracle Solaris OS. Note that the default system prompt that is displayed in command examples varies, depending on the Oracle Solaris release.

TABLE P–2 Shell Prompts

Shell                                                    Prompt
Bash shell, Korn shell, and Bourne shell                 $
Bash shell, Korn shell, and Bourne shell for superuser   #
C shell                                                  machine_name%
C shell for superuser                                    machine_name#

CHAPTER 1
Overview of Locality Groups

- “Locality Groups Overview”
- “MPO Observability Tools”

Locality Groups Overview

Shared memory multiprocessor computers contain multiple CPUs. Each CPU can access all of the memory in the machine. In some shared memory multiprocessors, the memory architecture enables each CPU to access some areas of memory more quickly than other areas.

When a machine with such a memory architecture runs the Oracle Solaris software, providing information to the kernel about the shortest access times between a given CPU and a given area of memory can improve the system's performance. The locality group (lgroup) abstraction has been introduced to handle this information. The lgroup abstraction is part of the Memory Placement Optimization (MPO) feature.

An lgroup is a set of CPU-like and memory-like devices in which each CPU in the set can access any memory in that set within a bounded latency interval. The value of the latency interval represents the least common latency between all the CPUs and all the memory in that lgroup. The latency bound that defines an lgroup does not restrict the maximum latency between members of that lgroup. The value of the latency bound is the shortest latency that is common to all possible CPU-memory pairs in the group.

Lgroups are hierarchical. The lgroup hierarchy is a Directed Acyclic Graph (DAG) and is similar to a tree, except that an lgroup might have more than one parent. The root lgroup contains all the resources in the system and can include child lgroups. Furthermore, the root lgroup can be characterized as having the highest latency value of all the lgroups in the system. All of its child lgroups will have lower latency values. The lgroups closer to the root have a higher latency while lgroups closer to leaves have lower latency.

A computer in which all the CPUs can access all the memory in the same amount of time can be represented with a single lgroup (see Figure 1–1). A computer in which some of the CPUs can access some areas of memory in a shorter time than other areas can be represented by using multiple lgroups (see Figure 1–2).

FIGURE 1–1 Single Locality Group Schematic
[Figure: a machine with a single latency is represented by one lgroup that contains the CPUs and the memory.]

FIGURE 1–2 Multiple Locality Groups Schematic
[Figure: a machine with multiple latencies is represented by multiple lgroups. Labels: CPU, Memory, lgroup 1, lgroup 2, lgroup 3, lgroup 4, root lgroup.]

The organization of the lgroup hierarchy simplifies the task of finding the nearest resources in the system. Each thread is assigned a home lgroup upon creation. The operating system attempts to allocate resources for the thread from the thread's home lgroup by default. For example, the Oracle Solaris kernel attempts to schedule a thread to run on the CPUs in the thread's home lgroup and allocate the thread's memory in the thread's home lgroup by default.

If the desired resources are not available from the thread's home lgroup, the kernel can traverse the lgroup hierarchy to find the next nearest resources from parents of the home lgroup. If the desired resources are not available in the home lgroup's parents, the kernel continues to traverse the lgroup hierarchy to the successive ancestor lgroups of the home lgroup. The root lgroup is the ultimate ancestor of all other lgroups in a machine and contains all of the machine's resources.
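The traversal described above can be illustrated with a small model. The following Python sketch is not the kernel's algorithm; it only shows the idea of searching the home lgroup first and then its parents and further ancestors. The function name, the dictionaries, and the seven-lgroup machine are all hypothetical.

```python
from collections import deque


def nearest_lgroup_with_memory(home, parents, leaves_in, free_pages):
    """Walk from the home lgroup toward the root, breadth first.

    Returns the first lgroup that contains a leaf with free memory,
    or None if no lgroup on the way to the root has any.
    """
    queue = deque([home])
    seen = {home}
    while queue:
        lgrp = queue.popleft()
        # An lgroup can satisfy the request if any leaf it contains has free pages.
        if any(free_pages.get(leaf, 0) > 0 for leaf in leaves_in[lgrp]):
            return lgrp
        # Otherwise move one level up. An lgroup may have several parents (DAG).
        for parent in parents.get(lgrp, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# Hypothetical machine: root lgroup 0, intermediate lgroups 5 and 6, leaves 1 to 4.
PARENTS = {1: [5], 2: [5], 3: [6], 4: [6], 5: [0], 6: [0], 0: []}
LEAVES_IN = {
    1: {1}, 2: {2}, 3: {3}, 4: {4},
    5: {1, 2}, 6: {3, 4},
    0: {1, 2, 3, 4},
}
```

In this model, a thread homed on lgroup 1 whose home has no free memory is served from lgroup 5 if leaf 2 has memory, and from the root only when neither leaf under lgroup 5 does.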

The Memory Placement Optimization (MPO) tools enable developers to tune the performance of the MPO features in cases where the default MPO behaviors do not yield the desired performance.

The lgroup APIs export the lgroup abstraction for applications to use for observability and performance tuning. A new library, called liblgrp, contains the new APIs. Applications can use the APIs to perform the following tasks:

- Traverse the group hierarchy
- Discover the contents and characteristics of a given lgroup
- Affect the thread and memory placement on lgroups

MPO Observability Tools

The MPO tools help developers to answer questions about system configuration and balance or placement. The tools also provide the basic information and mechanisms that developers need in order to determine whether MPO is successful and to diagnose problems related to MPO.

To determine the degree of success that MPO has in providing useful locality assignments and acceptable performance, it is important to know a given thread's affinities for lgroups, including its home lgroup, and where the thread's memory is allocated.

The MPO observability tools provide developers with the ability to determine the actions taken by the system. The MPO thread and memory placement tools enable developers to act on that information. Developers can also use the dtrace(1M) tool to gain further insights into the system's behavior.

CHAPTER 2
MPO Observability Tools

This chapter describes the tools for using the MPO functionality that is available in the Oracle Solaris operating system.

This chapter discusses the following topics:

- “The pmadvise utility” describes the tool that applies rules that define the memory use of a process.
- “Using the madv.so.1 Shared Object” describes the madv.so.1 shared object and how to use it to configure virtual memory advice.
- “The plgrp tool” describes the tool that can display and set a thread's affinity for a locality group.
- “The lgrpinfo Tool” describes the tool that prints information about the lgroup hierarchy, contents, and characteristics.
- “The Solaris::lgrp Module” describes a Perl interface to the locality group API that is described in Chapter 3, “Locality Group APIs.”

The pmadvise utility

The pmadvise utility applies rules to a process that define how that process uses memory. The pmadvise utility applies the rules, called advice, to the process by using madvise(3C). The pmadvise utility can apply advice to a specific subrange of locations in memory at a specific time. By contrast, the madv.so.1(1) shared object applies the advice throughout the execution of the target program to all segments of a specified type.

The pmadvise utility has the following options:

-f    This option takes control of the target process. This option overrides the control of any other process. See the proc(1) manual page.

-o    This option specifies the advice to apply to the target process. Specify the advice in this format:

      private=advice
      shared=advice
      heap=advice
      stack=advice
      address:length=advice

      The value of the advice term can be one of the following:

      normal
      random
      sequential
      willneed
      dontneed
      free
      access_lwp
      access_many
      access_default

      You can specify an address and length to specify the subrange where the advice applies. Specify the address in hexadecimal notation and the length in bytes.

      If you do not specify the length and the starting address refers to the start of a segment, the pmadvise utility applies the advice to that segment. You can qualify the length by adding the letters K, M, G, T, P, or E to specify kilobytes, megabytes, gigabytes, terabytes, petabytes, or exabytes, respectively.

-v    This option prints verbose output in the style of the pmap(1) tool that shows the value and locations of the advice rules currently in force.

The pmadvise tool attempts to process all legal options. When the pmadvise tool attempts to process an option that specifies an illegal address range, the tool prints an error message and skips that option. When the pmadvise tool finds a syntax error, it quits without processing any options and prints a usage message.

When the advice for a specific region conflicts with the advice for a more general region, the advice for the more specific region takes precedence. Advice that specifies a particular address range has precedence over advice for the heap and stack regions, and advice for the heap and stack regions has precedence over advice for private and shared memory.

The advice rules in each of the following groups are mutually exclusive from other advice rules within the same group:

- MADV_NORMAL, MADV_RANDOM, MADV_SEQUENTIAL
- MADV_WILLNEED, MADV_DONTNEED, MADV_FREE
- MADV_ACCESS_DEFAULT, MADV_ACCESS_LWP, MADV_ACCESS_MANY
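Two of the rules above, the length suffixes and the precedence of more specific regions, can be made concrete with a short model. This Python sketch is an illustration only and is not how pmadvise is implemented. It assumes that each suffix is a power of 1024 and that the length is written in decimal; the guide names the units but states neither point. The names `parse_length`, `effective_advice`, and the `"range"` label are hypothetical.

```python
# Assumption: binary multiples. The guide only says "kilobytes, megabytes, ...".
SUFFIXES = {
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
    "P": 1024 ** 5,
    "E": 1024 ** 6,
}


def parse_length(text):
    """Convert a length such as '8K' or '4096' to a byte count."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


# Precedence classes taken from the text:
# explicit address range > heap and stack > private and shared.
PRECEDENCE = {"range": 2, "heap": 1, "stack": 1, "private": 0, "shared": 0}


def effective_advice(candidates):
    """Pick the advice of the most specific region.

    candidates is a list of (region_kind, advice) pairs that all cover
    the same address.
    """
    return max(candidates, key=lambda pair: PRECEDENCE[pair[0]])[1]
```

For example, if a page in the heap is covered by both `private=random` and `heap=sequential`, the model selects `sequential`, and an explicit address range would override both.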

Using the madv.so.1 Shared Object

The madv.so.1 shared object enables the selective configuration of virtual memory advice for launched processes and their descendants. To use the shared object, the following string must be present in the environment:

    LD_PRELOAD=$LD_PRELOAD:madv.so.1

The madv.so.1 shared object applies memory advice as specified by the value of the MADV environment variable. The MADV environment variable specifies the virtual memory advice to use for all heap, shared memory, and mmap regions in the process address space. This advice is applied to all created processes. The following values of the MADV environment variable affect resource allocation among lgroups:

access_default    This value resets the kernel's expected access pattern to the default.

access_lwp        This value advises the kernel that the next LWP to touch an address range is the LWP that will access that range the most. The kernel allocates the memory and other resources for this range and the LWP accordingly.

access_many       This value advises the kernel that many processes or LWPs will access memory randomly across the system. The kernel allocates the memory and other resources accordingly.

The value of the MADVCFGFILE environment variable is the name of a text file that contains one or more memory advice configuration entries in the form exec-name:advice-opts.

The value of exec-name is the name of an application or executable. The value of exec-name can be a full pathname, a base name, or a pattern string.

The value of advice-opts is of the form region=advice. The values of advice are the same as the values for the MADV environment variable. Replace region with any of the following legal values:

madv          Advice applies to all heap, shared memory, and mmap(2) regions in the process address space.

heap          The heap is defined to be the brk(2) area. Advice applies to the existing heap and to any additional heap memory allocated in the future.

shm           Advice applies to shared memory segments. See shmat(2) for more information on shared memory operations.

ism           Advice applies to shared memory segments that are using the SHM_SHARE_MMU flag. The ism option takes precedence over shm.

dsm           Advice applies to shared memory segments that are using the SHM_PAGEABLE flag. The dsm option takes precedence over shm.

mapshared     Advice applies to mappings established by the mmap() system call using the MAP_SHARED flag.

mapprivate    Advice applies to mappings established by the mmap() system call using the MAP_PRIVATE flag.

mapanon       Advice applies to mappings established by the mmap() system call using the MAP_ANON flag. The mapanon option takes precedence when multiple options apply.

The value of the MADVERRFILE environment variable is the name of the path where error messages are logged. In the absence of a MADVERRFILE location, the madv.so.1 shared object logs errors by using syslog(3C) with LOG_ERR as the severity level and LOG_USER as the facility descriptor.

Memory advice is inherited. A child process has the same advice as its parent. The advice is set back to the system default advice after a call to exec(2) unless a different level of advice is configured using the madv.so.1 shared object. Advice is only applied to mmap() regions explicitly created by the user program. Regions established by the run-time linker or by system libraries that make direct system calls are not affected.

madv.so.1 Usage Examples

The following examples illustrate specific aspects of the madv.so.1 shared object.

EXAMPLE 2–1 Setting Advice for a Set of Applications

This configuration applies advice to all ISM segments for applications with exec names that begin with foo.

    $ LD_PRELOAD=$LD_PRELOAD:madv.so.1
    $ MADVCFGFILE=madvcfg
    $ export LD_PRELOAD MADVCFGFILE
    $ cat $MADVCFGFILE
    foo*:ism=access_lwp

EXAMPLE 2–2 Excluding a Set of Applications From Advice

This configuration sets advice for all applications with the exception of ls.

    $ LD_PRELOAD=$LD_PRELOAD:madv.so.1
    $ MADV=access_many
    $ MADVCFGFILE=madvcfg
    $ export LD_PRELOAD MADV MADVCFGFILE
    $ cat $MADVCFGFILE
    ls:
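The way a configuration file selects advice for an executable can be modeled in a few lines. The following Python sketch is a hedged illustration, not the madv.so.1 implementation. It assumes that entries are tried in file order and the first matching entry wins, that a pattern may match either the full pathname or the base name, and that MADV is used only when no entry matches. The function names `parse_config` and `advice_for` are hypothetical.

```python
import fnmatch
import os


def parse_config(text):
    """Parse 'exec-name:advice-opts' lines into (pattern, {region: advice}) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        pattern, _, opts = line.partition(":")
        advice = {}
        # advice-opts is a comma-separated list of region=advice items.
        for opt in filter(None, opts.split(",")):
            region, _, value = opt.partition("=")
            advice[region] = value
        entries.append((pattern, advice))
    return entries


def advice_for(exec_path, entries, madv=None):
    """Return the advice that would apply to exec_path under this model.

    Assumption: the first matching entry wins, and the MADV value is a
    fallback for executables that no entry matches.
    """
    base = os.path.basename(exec_path)
    for pattern, advice in entries:
        if fnmatch.fnmatchcase(exec_path, pattern) or fnmatch.fnmatchcase(base, pattern):
            return advice
    return {"madv": madv} if madv else {}
```

Under these assumptions, an `ls:` entry with no advice options yields an empty result for ls even when MADV is set, which matches the intent of the exclusion example above.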

EXAMPLE 2–3 Pattern Matching in a Configuration File

Because the configuration specified in MADVCFGFILE takes precedence over the value set in MADV, specifying * as the exec-name of the last configuration entry is equivalent to setting MADV. This example is equivalent to the previous example.

    $ LD_PRELOAD=$LD_PRELOAD:madv.so.1
    $ MADVCFGFILE=madvcfg
    $ export LD_PRELOAD MADVCFGFILE
    $ cat $MADVCFGFILE
    ls:
    *:madv=access_many

EXAMPLE 2–4 Advice for Multiple Regions

This configuration applies one type of advice for mmap() regions and different advice for heap and shared memory regions for applications whose exec() names begin with foo.

    $ LD_PRELOAD=$LD_PRELOAD:madv.so.1
    $ MADVCFGFILE=madvcfg
    $ export LD_PRELOAD MADVCFGFILE
    $ cat $MADVCFGFILE
    foo*:madv=access_many,heap=sequential,shm=access_lwp

The plgrp tool

The plgrp utility can display or set the home lgroup and lgroup affinities for one or more processes, threads, or lightweight processes (LWPs). The system assigns a home lgroup to each thread on creation. When the system allocates a CPU or memory resource to a thread, it searches the lgroup hierarchy from the thread's home lgroup for the nearest available resources to the thread's home.

The thread's affinity for its home lgroup is initially set to none, or no affinity. When a thread sets an affinity for an lgroup in its processor set that is higher than the thread's affinity for its home lgroup, the system moves the thread to that lgroup. The system does not move threads that are bound to a CPU. The system rehomes a thread to the lgroup in its processor set that has the highest affinity when the thread's affinity for its home lgroup is removed (set to none).

For a full description of the different levels of lgroup affinity and their semantics, see the lgrp_affinity_set(3LGRP) manual page.

The plgrp tool supports the following options:

-a lgroup_list
    This option displays the affinities of the processes or threads that you specify for the lgroups in the list.

-A lgroup_list/none|weak|strong[,...]
    This option sets the affinity of the processes or threads that you specify for the lgroups in the list. You can use a comma-separated list of lgroup/affinity assignments to set several affinities at once.

-F
    This option takes control of the target process. This option overrides the control of any other process. See the proc(1) manual page.

-h
    This option returns the home lgroup of the processes or threads that you specify. This is the default behavior of the plgrp tool when you do not specify any options.

-H lgroup_list
    This option sets the home lgroup of the processes or threads that you specify. This option sets a strong affinity for the listed lgroup. If you specify more than one lgroup, the plgrp utility will attempt to home the threads to the lgroups in a round-robin fashion.

Specifying Lgroups

The value of the lgroup_list variable is a comma-separated list of one or more of the following attributes:

- lgroup_ID
- Range of lgroup_IDs, specified as start_lgroup_ID-end_lgroup_ID
- all
- root
- leaves

The all keyword represents all of the lgroup IDs in the system. The root keyword represents the ID of the root lgroup. The leaves keyword represents the IDs of all of the leaf lgroups. A leaf lgroup is an lgroup that does not have any children.

Specifying Process and Thread Arguments

The plgrp utility takes one or more space-separated processes or threads as arguments. You can specify processes and threads in the same syntax that the proc(1) tools use. You can specify a process ID as an integer, with the syntax pid or /proc/pid. You can use shell expansions with the /proc/pid syntax. When you give a process ID alone, the arguments to the plgrp utility include all of the threads of that process.
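The lgroup list syntax described in "Specifying Lgroups" can be expanded mechanically. The following Python sketch shows one way to do so. It is an illustration, not the plgrp parser: the function name and its parameters are hypothetical, the caller must supply the system's lgroup IDs, and returning a sorted, de-duplicated list is an assumption of this model.

```python
def expand_lgroup_list(spec, all_ids, root_id, leaf_ids):
    """Expand a plgrp-style lgroup list into a sorted list of lgroup IDs.

    spec     -- text such as "root,1-3" or "leaves"
    all_ids  -- every lgroup ID in the (hypothetical) system
    root_id  -- ID of the root lgroup
    leaf_ids -- IDs of the lgroups that have no children
    """
    result = set()
    for item in spec.split(","):
        item = item.strip()
        if item == "all":
            result.update(all_ids)
        elif item == "root":
            result.add(root_id)
        elif item == "leaves":
            result.update(leaf_ids)
        elif "-" in item:
            # A range is written as start_lgroup_ID-end_lgroup_ID, inclusive.
            start, end = (int(part) for part in item.split("-", 1))
            result.update(range(start, end + 1))
        else:
            result.add(int(item))
    return sorted(result)
```

With a hypothetical system whose root is lgroup 0 and whose leaves are lgroups 1 to 4, the list `root,1-3` expands to lgroups 0, 1, 2, and 3.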

You can specify a thread explicitly by specifying the process ID and thread ID with the syntax pid/lwpid. You can specify multiple threads of a process at once by using the - character to define a range, or with a comma-separated list. To specify threads 1, 2, 7, 8, and 9 of a process whose process ID is pid, use the syntax pid/1,2,7-9.

The lgrpinfo Tool

The lgrpinfo tool prints information about the lgroup hierarchy, contents, and characteristics. The lgrpinfo tool is a Perl script that requires the Solaris::Lgrp module. This tool uses the liblgrp(3LIB) API to get the information from the system and displays it in a human-readable form.

The lgrpinfo tool prints general information about all of the lgroups in the system when you call it withou
