CSE 124 Networked Services Fall 2009 - Computer Science

Transcription

CSE 124Networked ServicesFall 2009: Lecture 15B. S. Manoj, Ph.Dhttp://cseweb.ucsd.edu/classes/fa09/cse124Some of these slides are adapted from various sources/individuals including but not limited tothe images and text from the text book by Kurose and Ross and publications from theIEEE/ACM digital libraries. Use of these slides other than for pedagogical purpose for CSE 124,may require explicit permissions from the respective sources.11/12/2009CSE 124 Network Services FA 20091

Announcements Programming Project-2:– 1-pager submission deadline: 11/13/2009Project TitleTeam members1-2 paragraph description of the project– Progress report due Deadline: 11/20/2009 Progress made so far What further to do– Project final presentation: Schedule: 12/03/2009 Final report submission: Any time during the finals week Paper discussion session Paper will be posted soon Guest Lecture by experts from Sun micro systems– Tentatively scheduled for 11/24/200911/12/2009CSE 124 Network Services FA 20092

Electronic Mail: SMTP [RFC 2821] uses TCP to reliably transfer email message from client to server,port 25 direct transfer: sending server to receiving server three phases of transfer– handshaking (greeting)– transfer of messages– closure command/response interaction– commands: ASCII text– response: status code and phrase messages must be in 7-bit ASCII11/12/2009CSE 124 Network Services FA 20093

Scenario: Alice sends message to Bob4) SMTP client sends Alice’smessage over the TCPconnection5) Bob’s mail server places themessage in Bob’s mailbox6) Bob invokes his user agent toread message1) Alice uses UA to composemessage and “to”bob@someschool.edu2) Alice’s UA sends message to hermail server; message placed inmessage queue3) Client side of SMTP opens TCPconnection with Bob’s er46useragent5CSE 124 Network Services FA 20094

Sample SMTP 9220 hamburger.eduHELO crepes.fr250 Hello crepes.fr, pleased to meet youMAIL FROM: alice@crepes.fr 250 alice@crepes.fr. Sender okRCPT TO: bob@hamburger.edu 250 bob@hamburger.edu . Recipient okDATA354 Enter mail, end with "." on a line by itselfDo you like ketchup?How about pickles?.250 Message accepted for deliveryQUIT221 hamburger.edu closing connectionCSE 124 Network Services FA 20095

SMTP: final words SMTP uses persistent connections SMTP requires message (header& body) to be in 7-bit ASCII SMTP server uses CRLF.CRLFto determine end of messageComparison with HTTP: HTTP: pull SMTP: push both have ASCIIcommand/response interaction,status codes HTTP: each object encapsulatedin its own response msg SMTP: multiple objects sent inmultipart msg11/12/2009CSE 124 Network Services FA 20096

Mail message formatSMTP: protocol for exchanging emailmsgsRFC 822: standard for text messageformat: header lines, e.g.,– To:– From:– Subject:different from SMTP commands!headerblanklinebody body– the “message”, ASCII charactersonly11/12/2009CSE 124 Network Services FA 20097

Mail access protocolsSMTPSMTPuseragentsender’s mailserveraccessprotocoluseragentreceiver’s mailserver SMTP: delivery/storage to receiver’s server Mail access protocol: retrieval from server– POP: Post Office Protocol [RFC 1939] authorization (agent -- server) and download– IMAP: Internet Mail Access Protocol [RFC 1730] more features (more complex) manipulation of stored msgs on server– HTTP: gmail, Hotmail, Yahoo! Mail, etc.11/12/2009CSE 124 Network Services FA 20098

P2P email Email – a very important communication medium Desired features– High availability– Privacy or Confidentiality– Easiness of operation Client-server Email– Dominant in today’s Internet– Server receives, stores, and provides inbox access to users E-mail servers– Storage stress Large attachments sent over to large number of people– Processor stress Due to additional processing load for virus checking, spam filtering etc– Clustering is done for reliability Moderate improvement in fault tolerance– Expensive for large scales P2P Email may become an alternative– High reliability– No single server11/12/2009CSE 124 Network Services FA 2009– Relatively low complexity of the system9

P2P Email Retrievedmessages arelocal Inbox and Newmessages arestored in theDHT substrate– Many peers maystore yourmessages11/12/2009CSE 124 Network Services FA 200910

P2P Email P2P Email runs over DHT substrate– DHT may be any one of CAN, Chord, or Pastry Resilient to– faults– Disasters– Attacks Scalability– Reduced storage and processor stress Spam filtering and Virus check is done by User Agent Better security and privacy– Inbox is stored by User Agent11/12/2009CSE 124 Network Services FA 200911

Basic Components of P2P Email User– Each user has an email address which is public E.g, alice@alice.com– Certificate from an external certificate authority Email address certificates that binds Email addresses to public keys– Alice has her private key to decrypt messages encrypted using her public key– Inbox contains only notifications of Unread messages Nodes or peers– System nodes Nodes in the DHT substrateObjective is to provide persistent delivery of incoming messages– User Agents Runs Email reader softwareAccess the messages using System nodes– System nodes and User agents can coexist Otherwise, UA must have an IP address of at least one of the System Nodes to access the inbox and messagesDHT substrate space stores– Email address certificates– Email message bodies– Inboxes DHT lookup service11/12/2009CSEservice124 NetworkFA hash2009 function– Use one of the Existing lookupwithServicesa single12

Service Primitives of P2P Email UAs call service primitives to do higher level system functions– Sending– Retrieving– Deleting Each service is directly conducted by a UA on a peer node– IP address of the peer is looked up by the UA Five primitives– Store, Fetch, Delete, Append-inbox, and Read-inbox Store– Used to store Email message bodies and Email address certificates– Arguments an object, object’s ID, and a set of Email address certificates with permission to deletethe object–––––11/12/2009Message object: ID is an EFC 2822 message IDEmail certificate object: Id is an email address appended with certificateStores the object in k replicasLookup provides k closest peers to the Object IDUA requests each peer to store the message objectCSE 124 Network Services FA 200913

Service primitives of P2P Email Delete Service primitive– Used by a requester to reclaim storage space– Arguments: Object ID, Requester’s Email address– Receiver of the request Locates the requester’s certificate listAuthenticates the requesterRemoves the requester’s certificate from the object’s listIf resulting certificate list is empty, then object is discarded– Sometimes, the k closest nodes of the object may notcontain the object New nodes join the DHT with ID closer to the key UA can widen the search to include older nodes if k nodes do nothave the object11/12/2009CSE 124 Network Services FA 200914

Service primitives of P2P Email Append-Inbox– Used by a sender to append Email message header to a receiver’s inbox– Inbox consistency is not guaranteed Inbox is maintained by multiple peers and messages delivered asynchronously As the user reads emails, the inbox gets updated Fetch service– Used to retrieve stored objects– Arguments: Object ID– Returns the object (encrypted message or Email certificate) Read-Inbox– UA calls the Read-Inbox to retrieve a message notifications in the inbox– Arguments: Email address and certificate– Returns Email notifications for the receiver’s inbox Garbage Collection Process– Every peer maintains the objects sorted by key– A node can check if that is k-closest to the object’s original location11/12/2009 Otherwise, it can delete the object Or Least Recently Used object can be replacedCSE 124 Network Services FA 200915

P2P Email message creation Alice’s User Agent, A Creates Message–––––– Appends Bob’s email address with ”-certificate,” and maps this to a keyUses lookup service to obtain the list of k nodes closest to the keyFetches Bob’s certificate from one of these nodesExtracts Bob’s public keyGenerates a session key, and uses it to encrypt the e-mail message body.Generates an RFC 822 message ID that will be used to identify the message body.User Agent A Stores the message– maps the message ID to a key– A uses the lookup service to obtain the k nodes closest to the key– invokes the store operation on each of them using the message ID as identifier and the encrypted message body as objectUser Agent A constructs the e-mail message headers and Updates the inbox– A creates message headers Include the session key, the message ID header, and a digest of the message)Encrypts them with Bob’s public keyMaps Bob’s e-mail address to a key– User Agent Updates Bob’s Inbox 11/12/2009Obtains from the lookup service the k nodes closest to the keyInvokes the append-inbox function on each of them with Bob’s e-mail address and the encrypted messageheadersCSE 124 Network Services FA 200916

P2P Email: Reading the Inbox Bob wants to read his Emails– Retrieve Inbox User Agent B obtains the k-closest nodes with Bob’s emailaddress– Senders append message notifications to these nodes B invokes read-inbox action on all k nodes– To provide consistent inbox Creates a superset of all k message notifications Each of k nodes will delete their copy of notifications– Retrieve Messages UA B looksup k closest nodes to the message ID UA invokes fetch operation on one of the k nodes Verifies the message body by comparing the digest that the senderplaced in the message headers If the message body is not valid, then UA B proceeds to another ofthe k nodes– Repeats until the B gets a satisfactory message11/12/2009124 Network Services FA 2009 UA B decrypts theCSEobjectand peers delete the object17

Persistence of Data/Operation Probability of finding all k peersdown– p is the up probability of a peer Probability of success of peerretrieving an object (reading amessage)11/12/2009CSE 124 Network Services FA 200918

Persistency of Operation in P2PEmail Use of the system requires multipleoperations, n– E.g., read inbox, read message Probability of successful transaction over noperations11/12/2009CSE 124 Network Services FA 200919

Persistency of Operation in P2PEmail What is the optimal k for a certain persistencyof operation (reliability)– p- probability of node up– n- number of operations per transaction– k- number of peers for replicating the data11/12/2009CSE 124 Network Services FA 200920

P2P Email: behavior of k n 2, p 0.9, k 5 or 6; n 2, p 0.5, k 15 n 100, p 0.9, k 5 or 6; n 100, p 0.5, k 2011/12/2009CSE 124 Network Services FA 200921

P2P Search Challenges– Volume of data– P2P may not scale well as of today Google indexes only 2-3 Billion pages– Of a total of 550 Billion pages New players to the search market will face toughchallenges Will P2P search has sufficient capacity?11/12/2009CSE 124 Network Services FA 200922

P2P Search Two popular candidate for search basis– Flood queries over some or all peers E.g., Gnutella– Use DHT based substrates P2P text search Tested upto 100,000 documents P2P search has an estimated 500million documents– Mainly music file related key words– Search is based on the title, authors, and small keywords of objects Much smaller workload than the real Web!11/12/2009CSE 124 Network Services FA 200923

P2P search Fundamental constraints– Google: 3Billion x 1000 words/doc– Google: 1000 queries per second– Total index size of Web: 6x1013 Bytes Storage constraints– Each peer has very limited storage Assume 1GB per peer– To index 6x1013 Bytes 60,000 peer nodes are required Communication constraints––––Assume search could consume 10% of Internet’s capacityAssuming Internet bisection backbone bandwidth is 100Gbps1000 queries/secondPer query bandwidth budget is 100Gbps/1000 10Mbps 1 MBps11/12/2009CSE 124 Network Services FA 200924

P2P search approaches Partition by document– Docs are divided up among the hosts– Each peer maintains a local index of the docs it isresponsible for– Query must be broadcast or flooded to all peers– Peer returns most highly ranked documents Gnutella and KaZaA utilize partitition by document– Flooding query to 60000 packets would be requiredeach of size atleast 100 Bytes Communication cost: 6MBps : 6x higher than the budget11/12/2009CSE 124 Network Services FA 200925

P2P search approaches Partition by keyword– Each peer is responsible for a set of keywords that appear in thedocument– Each peer stores the posting list of the words that it isresponsible for– DHT is used to map keyword to peer that is responsible– Multiple keywords require multiple queries A test of 81,000 queries (at search engine for mit.edu) show that––––40% carries one term35% carries two terms25% carries three or more termsQueries moves about 300,000 bytes for MIT with with 1.7 million web pages For 3 Billion webpage: 530MBps [(3B/1.7M)*300KB]– An expected 530x improvement required Some keywords are particularly expensive: the, who, etc. [4000ximprovement required]11/12/2009CSE 124 Network Services FA 200926

P2P search optimization strategies Caching and precomputation Compression– Bloom filters– Gap compression– Adaptive Set Intersection– Clustering Compromises– Compromising result quality– Compromising P2P structure11/12/2009CSE 124 Network Services FA 200927

Summary Reading material:– IEEE Papers to be listed on the website11/12/2009CSE 124 Network Services FA 200928

SMTP: final words SMTP uses persistent connections SMTP requires message (header & body) to be in 7-bit ASCII SMTP server uses CRLF.CRLF to determine end of message Comparison with HTTP: HTTP: pull SMTP: push both have ASCII command/response interaction, status codes HTTP: each object encapsulated in its own response msg