Address Translation Services - Composter

Transcription

Address Translation Services
Michael Krause (HP, co-chair), Mark Hummel (AMD), David Wooten (Microsoft)
Copyright 2006, PCI-SIG, All Rights Reserved

Topics
- Introduction to Address Translation for DMA
- Problem Statement
- Address Differentiated Memory Requests
- Translation Requests
- Translation Completions
- Invalidation Request
- Invalidation Completions
- Configuration

ATS Specification Status
- The material in this presentation represents the contents of the 0.7 version of the Address Translation Services Specification.
  - ions/pciexpress/specification/draft/ats-spec-07 draft060327.pdf
- The technical details of the 0.7 specification are considered to be stable unless a specific problem is found and needs to be fixed.
- The draft 0.9 specification will arrive shortly.

Introduction to DMA Address Translation
- Address translation and an access check are applied to DMA requests from an I/O device.
- In the Address Translation Services (ATS) specification, the entity doing the translation and checking is called a Translation Agent (TA).

DMA Remapping Illustration
[Figure: block diagram with the labels Driver or OS, MMU, TA, Device, Memory, and OS or Hypervisor, and the example addresses 100 and 305.]

DMA Remapping on High Volume Systems
- Multiple companies have announced support for DMA remapping in future chipsets for the volume market.
- These systems use tree-structured translation tables that are similar to CPU tables.
- A different tree for each Bus/Dev/Func is possible.
  - Devices can share a device address space.
  - A device can have a dedicated address space.

Potential Issues of DMAr
- Increased latency of accesses
  - Might need one or two accesses to find the address of the tree associated with a BDF
  - Might need 3 or 4 accesses to walk the tree
- Translation caches (ATC or IOTLB) will be necessary to reduce overhead.
- Caches may not provide good behavior if not sized correctly
  - Only two possibilities for sizing caches: too large or too small
- “Untimely” latency may cause issues with isochronous devices

ATS to the “rescue”
- ATS attempts to mitigate the impact of DMA Remapping by providing ways for Endpoints to participate in translation cache management
  - Devices can maintain their own cache of translations, an “Address Translation Cache” (ATC)
  - TA provides table-walking services to the device to avoid excess bus traffic; this also means that the translation table format is uniform in a system
- Device manages its ATC using its intimate knowledge of its future access pattern
  - Look-ahead for isochronous devices to avoid “untimely” table walk latencies.
  - High-load devices (graphics) don’t thrash ATC in TA.
  - Application specific caching in devices: ring buffer
  - Enable peer-to-peer in virtualized bus

New Protocols for ATS
- Differentiated memory address type: the device will be issuing Requests that use both translated and non-translated addresses
- Translation Request: the Address Translation Cache (ATC) in the device requests a translation from the central TA
- Translation Completion: the translation is returned in response to a Translation Request
- Invalidation Request: when a change occurs in the central table, the remote ATCs need to be informed
- Invalidation Completion: when the ATC completes the invalidation operation, it needs to tell the TA.

Differentiated Memory Request
- A previously reserved field in the Memory Request header is used to differentiate the Address Type (AT) in the Request.
[Figure: Memory Request header layout with the fields Fmt, Type, TC, TD, EP, Attr, AT, Length, Requester ID, Tag, Last DW BE, 1st DW BE, Address [63:32], and Address [31:02].]
AT    Meaning
00b   Default (un-translated)
01b   Translation Request
10b   Translated
11b   Reserved
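
The AT encoding in the table above can be sketched as a small decoder. This is only an illustration: the names `AddressType` and `decode_at` are hypothetical and do not come from the specification.

```python
from enum import IntEnum


class AddressType(IntEnum):
    """AT field values from the slide (hypothetical Python names)."""
    UNTRANSLATED = 0b00         # Default (un-translated)
    TRANSLATION_REQUEST = 0b01  # Translation Request
    TRANSLATED = 0b10           # Translated


def decode_at(at_bits: int) -> AddressType:
    """Decode the 2-bit AT field; 11b is reserved and rejected here."""
    at_bits &= 0b11
    if at_bits == 0b11:
        raise ValueError("AT value 11b is reserved")
    return AddressType(at_bits)
```

Rejecting 11b with an exception is a choice made for the sketch; the slide only says the value is reserved.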

Translation Request
[Figure: Translation Request header with the fields Fmt, Type, TC, TD, EP, Attr, AT = 01b, Length = 00 000x xxx0b, Requester ID, Tag, Last DW BE = 1111b, 1st DW BE = 1111b, Un-translated Address [63:32], and Un-translated Address [31:02].]
- A Translation Request is a Memory Read Request with the AT field set to 01b.
- The Request is for the TA to return the translated reference for the memory range starting at the location referenced in the Un-translated Address field.

Translation Request (cont.)
- Always uses the 64-bit form of the address
- Length is always an even number of dwords, up to the RCB
- A device can request translations for a range of addresses
  - The range is determined from Length and the Smallest Translation Unit (STU)*
  - Size in bytes of the requested range is (Length / 2) * STU
*Note: The STU is specified in binary multiples of 4KB and is nominally the same size as a page on the system.
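
As a worked illustration of the range formula, the sketch below computes the requested range from Length and the 5-bit STU field described on the ATS Capability Structure slide (STU size = 4KB * 2^STU). The function names are hypothetical.

```python
def stu_bytes(stu_field: int) -> int:
    """Size of one STU in bytes: 4KB times 2**stu_field."""
    if not 0 <= stu_field <= 0x1F:
        raise ValueError("STU field is 5 bits wide")
    return 4096 << stu_field


def request_range_bytes(length_dwords: int, stu_field: int) -> int:
    """Bytes covered by a Translation Request: (Length / 2) * STU."""
    if length_dwords <= 0 or length_dwords % 2:
        raise ValueError("Length must be a positive, even number of dwords")
    return (length_dwords // 2) * stu_bytes(stu_field)
```

For example, a Length of 4 dwords with the STU field set to 0 asks for two 4KB units, that is, 8KB.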

Translation Completion: Error Completion
[Figure: Cpl header for Translation Completion, with the fields Fmt, Type, TC, TD, EP, Attr, Length = 00 0000 0000b, Completer ID, Compl. Status, BCM, Byte Count = 0000 0000 0000b, Requester ID, Tag, and Lower Address = 000 0000b.]
Value        Status                Meaning
000b         Success               In a Cpl, means no translation found. Nominally means that the table walk did not reach a leaf (page table pointer) entry.
001b         Unsupported Request   Translation Requests from this Function are not supported by the TA.
100b         Completer Abort       Error in the TA. The Translation Request may be retried.
All others   Reserved              Reserved; malformed packet

Translation Completion: Normal Completion
[Figure: CplD header for Translation Completion, with the fields Fmt, Type, TC, TD, EP, Attr, Length, Completer ID, Compl. Status, BCM, Byte Count, Requester ID, Tag, and Lower Address.]
[Figure: Translation Completion translation entry (1 of N), with the fields Translated Address [63:32], Translated Address [31:12], S, N, Reserved, U, W, and R.]

Translation Completion (cont.)
Field      Meaning
S          Size of translation: This field is 0b if the translation applies to a 4KB range of memory. If this field is 1b, then the translation applies to a range of memory that is larger than 4KB. See 3.3.1
N          Non-snooped accesses: If this field is 1b, then the read and write requests that use this translation must not set the No Snoop bit in the Attribute field. If it is 0b, then the Endpoint may use other means to determine if No Snoop should be set.
Reserved   These bits shall be ignored by the ATC.
U          Un-translated access only: When this field is set to 1b in a Translation Completion entry, the indicated range may only be accessed using un-translated addresses, and the Translated Address field of this Translation Completion entry may not be used in a subsequent Read/Write Request with AT set to Translated. This value may be cached if R or W is set to 1b.
R, W       Read, Write: These two fields indicate which transaction types are allowed for requests using the translation. If neither field is Set, then the translation is not valid and all the remaining fields of this dword are undefined. A value with R = W = 0b may not be cached in the ATC.
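
The field rules above can be sketched as a decoder for one 64-bit translation entry. The bit positions are an assumption: the transcript only gives the field order (Translated Address [63:12], S, N, Reserved, U, W, R), so this sketch places S at bit 11, N at bit 10, and U, W, R at bits 2, 1, 0. All names are hypothetical.

```python
from typing import NamedTuple


class TranslationEntry(NamedTuple):
    translated_address: int  # bits 63:12, low 12 bits cleared
    s: bool
    n: bool
    u: bool
    w: bool
    r: bool


def decode_entry(qword: int) -> TranslationEntry:
    """Split one translation entry into fields (assumed bit positions)."""
    return TranslationEntry(
        translated_address=qword & 0xFFFF_FFFF_FFFF_F000,
        s=bool((qword >> 11) & 1),  # assumed position
        n=bool((qword >> 10) & 1),  # assumed position
        u=bool((qword >> 2) & 1),   # assumed position
        w=bool((qword >> 1) & 1),   # assumed position
        r=bool(qword & 1),          # assumed position
    )


def is_cacheable(entry: TranslationEntry) -> bool:
    """An entry with R = W = 0b may not be cached in the ATC."""
    return entry.r or entry.w


def usable_as_translated(entry: TranslationEntry) -> bool:
    """U = 1b forbids using the address in a request with AT = Translated."""
    return is_cacheable(entry) and not entry.u
```

When S is 1b, the low bits of `translated_address` also carry the range-size encoding, which this sketch leaves untouched.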

Translation Completion (cont.)
- TA can return 0 or more translations
  - TA decides
  - The TA may not return more than Length dwords
- May take one or two CplDs to close out the request.
- In a CplD, each translation entry covers the same sized range of addresses.
  - Smallest range is STU
  - All ranges will have some overlap with the requested range
  - All ranges in the completion must be the same size
    - If the completion is in two CplDs, then ranges in both must have the same size.
  - Different completions can have different range sizes.

Range Determination
Address Bits                               Translation Range
63:18   17   16   15   14   13   12   S    Size in Bytes
x       x    x    0    1    1    1    1    64K
x       x    0    1    1    1    1    1    128K
- Starting with S, look for the first 0b (bit N)
- Range size is 2^(N+1) bytes (the two rows above are consistent with S being counted as bit 11)
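
The scan described above can be sketched as follows. The sketch assumes S is counted as bit 11, which matches the 64K and 128K examples on the slide; the function name is hypothetical.

```python
def translation_range_bytes(address: int, s: int) -> int:
    """Range size encoded by S and the low address bits.

    Starting with S (treated as bit 11, an assumption), find the first
    0b at bit N; the range size is 2**(N + 1) bytes.
    """
    if not s:
        return 1 << 12  # the first 0b is S itself: N = 11, size 4KB
    n = 12
    while (address >> n) & 1:  # walk upward over the run of 1b bits
        n += 1
    return 1 << (n + 1)
```

With S set, bits 14:12 set, and bit 15 clear, the first 0b is at bit 15, giving 2^16 bytes.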

Translation Completions: Multiple CplDs
- TA may split a translation completion any time Length is greater than 2
- Since Length is no larger than allowed by RCB, a completion will not take more than two CplDs
- In all CplDs:
  - Byte Count indicates the bytes remaining to complete the request, including the bytes in the “current” CplD
  - Length indicates the number of dwords in the current CplD.
- For Translation Completions, in the first CplD, Lower Address is set to: (000 0000b) - (Length * 4)
- Example: if the first CplD contains 2 translations, the Lower Address is: (000 0000b) - (00 0000 0100b * 100b) = (111 0000b)
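
The Lower Address arithmetic is a subtraction that wraps within the 7-bit field. A minimal sketch, with a hypothetical function name:

```python
def first_cpld_lower_address(length_dwords: int) -> int:
    """Lower Address of the first CplD: (0 - Length * 4) modulo 2**7."""
    return (0 - length_dwords * 4) & 0x7F
```

Two translations occupy 4 dwords (16 bytes), so the result is 128 - 16 = 112, which is 111 0000b as in the slide's example.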

Translation Completions: Missing CplDs
- If Byte Count > (Length * 4), then this is the first CplD of a two-CplD Completion.
- Else, if Byte Count < (Length * 4), then this is a malformed TLP.
- Else, if Lower Address + Byte Count is not a multiple of RCB, then this is the second CplD of two; and if the previous Translation Completion CplD with the same Tag was not received, then a CplD is missing.
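
The checks on this slide can be sketched as a classifier. The comparison operators were lost in the transcript, so this sketch assumes they read "Byte Count > Length * 4", "Byte Count < Length * 4", and "Lower Address + Byte Count". The function name, the `first_seen` flag, and the default RCB of 64 bytes are hypothetical.

```python
def classify_cpld(byte_count: int, length_dwords: int, lower_address: int,
                  rcb: int = 64, first_seen: bool = False) -> str:
    """Classify one Translation Completion CplD (assumed operators)."""
    payload = length_dwords * 4
    if byte_count > payload:
        return "first-of-two"   # more bytes remain after this CplD
    if byte_count < payload:
        return "malformed"      # payload exceeds the bytes remaining
    if (lower_address + byte_count) % rcb != 0:
        # Second CplD of two; the first must already have arrived.
        return "second-of-two" if first_seen else "missing-first"
    return "only"
```

In use, `first_seen` would be tracked per Tag by the receiver; here it is simply passed in.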

Invalidation

Invalidation
- Purpose
  - Maintain consistency between TA and ATC
- Mechanism
  - Invalidate Request Message
  - Invalidate Complete Message

Invalidate Request
[Figure: Invalidate Request header and data, with the fields Fmt, Type, TC, TD, EP, Attr, Length, Requester ID, ITag, Message Code, Device ID, Reserved, Un-translated Address [63:32], Un-translated Address [31:12], and S.]
- MsgD packet with 4 DW of header and 2 DW of data
  - Message code 0000 0001b
  - Route by Device ID
- ITag
  - Invalidate tag: uniquely identifies each Invalidate Request
- Un-translated Address
  - Starting address of the block to be invalidated
- S
  - Size bit (same encoding as in the Translation Completion)

Invalidation Range Constraints
- Each Invalidate Request specifies a single memory region
  - Power of 2 in size
  - Naturally aligned
  - Must be at least as big as an STU (Smallest Translation Unit)
  - S field and least significant address bits encode the range size
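
The three size and alignment constraints can be checked with a few bit operations. This is a sketch with a hypothetical function name; `stu_field` is the 5-bit STU value from the ATS capability, where the STU size is 4KB * 2^STU.

```python
def is_valid_invalidation_range(start: int, size: int, stu_field: int) -> bool:
    """Check power-of-2 size, natural alignment, and size >= STU."""
    stu = 4096 << stu_field
    power_of_two = size > 0 and (size & (size - 1)) == 0
    return power_of_two and size >= stu and start % size == 0
```

A 64KB region starting at 0x10000 passes; an 8KB region starting at 0x1000 fails because it is not naturally aligned.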

Invalidate Completion
[Figure: Invalidate Completion header, with the fields Fmt, Type, TC, TD, EP, Attr, Length = 00 0000 0000b, Requester ID, Message Code = 0000 0010b, Device ID, CC, and ITag Vector.]
- Msg packet with 4 DW of header
  - Message code 0000 0010b
  - Route by Device ID
- ITag Vector
  - Invalidation tag vector: uniquely identifies each Invalidate Completion
  - Multiple completions may be compressed into a single response
- CC
  - Completion Count: indicates the number of completions that have been sent for the corresponding request (0 = 8 completions)
  - Always 1 for single TC devices

Invalidation Completion Semantics
- Properties to satisfy
  - Maintain consistency
  - Prevent silent data corruption
  - Prevent data leakage
- Invalidate Complete must not be returned until:
  - Stale address translation is flushed from the ATC
  - Conflicting outstanding writes are pushed to the TA
  - Conflicting outstanding reads are either completed or tagged for discard

Invalidation Flow: Single TC
[Figure: Root Complex (TA) and Device (ATC), with a Posted Write on TC = 1.]
1) Invalidate Request: TC = 0, ITag = 3
2) Flush matching ATC entries. Drain or discard conflicting Reads.
3) Invalidate Completion: TC = 1, ITagV = 1000b, CC = 1

Invalidation Flow: Multi TC
[Figure: Root Complex (TA) and Device (ATC), with Posted Writes on TC = 0 and TC = 1.]
1) Invalidate Request: TC = 0, ITag = 1
2) Flush matching ATC entries. Drain or discard conflicting Reads.
3) Invalidate Completion: TC = 0, ITagV = 10b, CC = 2
4) Invalidate Completion: TC = 1, ITagV = 10b, CC = 2

Request Acceptance Rules
- To avoid deadlock:
  - Endpoints are not allowed to create a dependency in which the acceptance of a posted transaction is dependent upon the transmission of a posted transaction
- Invalidate Requests and Completions both flow in the posted channel
  - Could result in deadlock unless ...

Request Acceptance Rules (cont)
- ATC must support the worst case Invalidate Request queue depth (32 entries)
  - Use input command queuing
    - Must buffer most of the command
  - Use output response queuing
    - Only requires a single bit of state per invalidate
    - Invalidate Completions are collapsible
    - Requester ID of the source is captured once and used to route the Invalidate Response

Request Acceptance Rules (cont)
[Figure: Root Complex (TA) and Device (ATC); the device holds a Captured Requester ID.]
1) A stream of Invalidate Requests is sent to the device with ITag values 0, 1, 3, 6, 8
2) The Requester ID is captured from the first Invalidate Request after reset
3) Internal completion status is captured in the response bit vector
4) Invalidate Completion: ITagV = 1 0100 1011b, Device ID = Captured Requester ID
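
The response bit vector in this flow keeps one bit per outstanding ITag, which is why completions collapse into a single message. A minimal sketch with hypothetical function names; the 32-bit width is an assumption taken from the 32-entry worst-case queue depth.

```python
from typing import Iterable, List


def itag_vector(itags: Iterable[int]) -> int:
    """Set one bit per completed ITag (assumed 32-bit vector)."""
    vector = 0
    for tag in itags:
        if not 0 <= tag < 32:
            raise ValueError("ITag out of range")
        vector |= 1 << tag
    return vector


def completed_itags(vector: int) -> List[int]:
    """List the ITag values whose bits are set in the vector."""
    return [tag for tag in range(32) if (vector >> tag) & 1]
```

Feeding in the slide's ITag values 0, 1, 3, 6, 8 sets bits 0, 1, 3, 6, and 8, which is the ITagV value 1 0100 1011b shown in step 4.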

Invalidate Flow Control
- Invalidation time may vary:
  - Invalidate Requests have variable block size
    - May be larger than the cached page size
  - Dependent upon translation cache architecture
    - Page size
    - Associativity
- Could have a negative impact on performance
  - Posted channel may stall
  - May affect other I/O flows due to credit-based congestion spreading.

Invalidate Flow Control (cont)
[Figure: Root Complex (TA), Switch, Device 1, and Device 2 (ATC), with a legend for empty and occupied queue entries.]
1) ATC invalidate queue reaches capacity
2) Congestion in the posted channel spreads to upstream links
3) Posted traffic in other flows that share links with credit starvation will block

Invalidate Flow Control (cont)
- Enable the TA to flow control Invalidate Requests
  - ATC must publish its Invalidate Queue Depth
- Not required if the endpoint will:
  - Handle invalidations at the maximum arrival rate
  - Rarely cause link backpressure
  - Fully buffer the maximum number of incoming invalidations without backpressure

Invalidation Ordering Semantics
- Properties
  - Translation Completion travels in the completion channel
  - Invalidate Request travels in the posted channel
  - Translation Completion and Invalidate Request travel in the same direction
- Consequence
  - Invalidate Request may bypass Translation Completion
  - The result is that a stale address translation may persist in the ATC.

Invalidation Ordering Semantics (cont)
[Figure: Root Complex (TA), Switching Fabric, and Device (ATC); Invalidate Request A overtakes Translation Response B in the fabric.]
- Invalidate Request and Translation Response correspond to overlapping memory regions
- Invalidate Request passes Translation Response
- Stale address translation gets installed in the ATC

Invalidation Ordering Semantics (cont)
- To eliminate stale entries, the ATC must:
  - Snoop its outstanding Translation Request queue against incoming Invalidate Requests
  - On hit:
    - Mark the Translation Request as invalid
    - Discard the results of the Translation Response before issuing the Invalidate Completion
- Results of the Translation Response may be used in new requests sent prior to transmission of the Invalidate Completion
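
The snoop-and-mark rule can be sketched as an overlap test over the outstanding Translation Request queue. This is a simplified illustration, not the specification's mechanism: the `PendingTranslation` record, its fields, and both function names are hypothetical, and the sketch simply refuses to install a marked result.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingTranslation:
    tag: int
    start: int             # un-translated start address of the request
    size: int              # bytes covered by the request
    invalid: bool = False  # set when an Invalidate Request hits it


def snoop_invalidate(pending: List[PendingTranslation],
                     inv_start: int, inv_size: int) -> int:
    """Mark every outstanding request that overlaps the invalidated range."""
    hits = 0
    for req in pending:
        overlaps = (req.start < inv_start + inv_size
                    and inv_start < req.start + req.size)
        if overlaps:
            req.invalid = True
            hits += 1
    return hits


def should_install(req: PendingTranslation) -> bool:
    """A response for a marked request must not be cached in the ATC."""
    return not req.invalid
```

When the Translation Response for a marked request arrives later, `should_install` returns False and the stale translation never enters the cache.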

Implicit Invalidation Events
- Invalidations are triggered by the following events
  - Fundamental Reset
    - Cold Reset
    - Warm Reset
    - Hot Reset
    - PERST#
  - Function Level Reset
- Invalidate Complete Response is not sent

Configuration

ATS Capability Structure
Bits    Field
31      E (Enable)
30:29   RsvdP
28:24   Invalidate Queue Depth
23:21   RsvdP
20:16   STU
15:8    Next Capability Pointer
7:0     Cap ID
Register description:
- Bits 20:16, Smallest Translation Unit (STU), RW: This value indicates to the Function the minimum number of 4KB blocks that will be indicated in a Translation Completion or Invalidate Request. This is a power of 2 multiplier and the number of blocks is 2^STU. A value of 0 indicates one 4KB block and a value of 1 1111b would indicate an 8TB block. Default value is 0 0000b.
- Bits 28:24, Invalidate Queue Depth, RO: The number of Invalidate Requests that the Endpoint can accept before putting backpressure on the upstream connection. If zero, the Endpoint can accept 32 Invalidate Requests.
- Bit 31, Enable (E), RW: When Set, the Endpoint is enabled to cache translations. Default value is 0b.
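
The register fields above can be sketched as a decoder over the 32-bit value laid out on this slide. The function name and dictionary keys are hypothetical, and the layout follows this slide rather than any later revision of the specification.

```python
def decode_ats_capability(reg: int) -> dict:
    """Decode the ATS capability dword as laid out on the slide."""
    stu_field = (reg >> 16) & 0x1F    # bits 20:16
    queue_depth = (reg >> 24) & 0x1F  # bits 28:24
    return {
        "enable": bool((reg >> 31) & 1),             # bit 31
        "stu_bytes": 4096 << stu_field,              # 4KB * 2**STU
        "invalidate_queue_depth": queue_depth or 32, # zero means 32
        "next_capability_pointer": (reg >> 8) & 0xFF,
        "cap_id": reg & 0xFF,
    }
```

An STU field of 1 1111b gives 4KB * 2^31 = 2^43 bytes, which is the 8TB block mentioned in the register description.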

Questions

Thank you for attending the PCI-SIG Developers Conference 2006.
For more information please go to www.pcisig.com
