Peer-to-Peer (P2P) File Systems - Virginia Tech

Transcription

Peer-to-Peer (P2P) FileSystems

P2P File SystemsPeer-to-Peer Systems Definition: “Peer-to-peer systems can be characterized as distributedsystems in which all nodes have identical capabilities andresponsibilities, and all communication is symmetric.” –Rowstron- Popular Examples: Goals (from Dabek, et. al.)NapsterGnutellaSymmetric and decentralizedOperate with unmanaged voluntary participantsFast location of dataTolerate frequent joining/leaving by serversBalanced loadCS 5204 – Operating Systems2

P2P File SystemsCFS: Properties Decentralized control (use ordinary Internet hosts)Scalability (overhead at most logarithmic in the numberof servers)Availability (placement of replicas on unrelated servers)Load balance (block distribution and caching)Persistence (renewable lifetimes)Quotas (source-limited insertions)Efficiency (comparable to FTP access)CS 5204 – Operating Systems3

P2P File SystemsCFS: Architecture Read-only file system interfaceDHashserverclientCFSChord Block distribution/fetching Caching/replication Quota enforcement Block lookupA generic, distributed block storeCS 5204 – Operating Systems4

P2P File SystemsCFS: Content-hash indexing Each block (except for the root block of a file system) is identified by an indexobtained from a hash (e.g., SHA-1) of its contents A root block is signed by the author; the index of the root block is a hash of theuser’s public keydata blockdirectoryblockroot blockH(D)H(B1)H(F)DH( public key )Inode blockB1FB2H(B2)data blocktimestampsignatureCS 5204 – Operating Systems5

P2P File SystemsChord: MappingsuccessorH(IP address virtual index) server s stores all values indexed by key k for which sis the successor of k (successor(k) is the node whoseidentifier is the smallest one greater than k ) each Chord server maintains two lists:a finger table for searchingr immediate successors and their latencyinformationH(block)blockserverCS 5204 – Operating Systems6

P2P File SystemsChord: Searching (1)Dn3Mfinger table for n1startCintervalsuccessorA n1 20[A,B)n2B n1 21[B,C)n3C n1 22[C,D)n3 M n1 2m-1Bn1n2ACS 5204 – Operating Systems7

P2P File SystemsChord: Searching (2)CS 5204 – Operating Systems8

P2P File SystemsChord: performanceCS 5204 – Operating Systems9

P2P File SystemsChord: Adding Servers (1)Two Invariants maintained: Successor information is correctSuccessor(k) is responsible for key kSteps:1.By out-of-band means, locate an existing server, n2.Update tables 3.Update successor/predecessor linksCreates finger tables for new serverUpdate other server’s finger tablesRedistribute responsibility for keys to n from its successor Call higher (DHash) layerCS 5204 – Operating Systems10

P2P File SystemsChord: Adding Servers (2)Adding a new node at 6 assuming that node 6knows, by out-of-band means, of node 2CS 5204 – Operating Systems11

P2P File SystemsDHash: Interface put h(block) – stores block using content-hashingput s(block, pubkey) – stores block as a root block; key is hash ofpubkeyget(key) – finds/returns block associated with keyCS 5204 – Operating Systems12

P2P File SystemsDHash: replication Places replicas on k servers following successorNote: each Chord server maintains a list of r immediate successors. Bykeeping r k, it is easy for DHash to determine replica locationsExistence of replicas eases reallocation when node leaves the systemBy fetching the successor list from Chord, the DHash layer can selectthe most efficient node from which to access a replica of a desiredblockCS 5204 – Operating Systems13

P2P File SystemsDHash: caching, load balancing, quotas Caching is effective because searches from different clients convergetoward the end of the search Virtual servers hosted on one machine allow for more capable machines tostore a larger portion of the identifier space Each server enforces a fixed, per-IP address quota on publishing nodesCS 5204 – Operating Systems14

P2P File SystemsDHash: replication and cachingservertarget serveridentifierreplicasCS 5204 – Operating Systems15

P2P File Systems Peer-to-Peer Systems Definition: "Peer-to-peer systems can be characterized as distributed systems in which all nodes have identical capabilities and responsibilities, and all communication is symmetric." -Rowstron- Popular Examples: Napster Gnutella Goals (from Dabek, et. al.) Symmetric and decentralized