Transcription
An introduction to Linux IPC
  linux.conf.au 2013
  Canberra, Australia
  2013-01-30
  man7.org
  Michael Kerrisk @lwn.net
Goal
  Limited time!
  Get a flavor of main IPC methods
Me
  Programming on UNIX & Linux since 1987
  Linux man-pages maintainer
    http://www.kernel.org/doc/man-pages/
    Kernel + glibc API
  Author of: [book cover image]
  Further info: http://man7.org/tlpi/
You
  Can read a bit of C
  Have a passing familiarity with common syscalls
    fork(), open(), read(), write()
There’s a lot of IPC
  Pipes
  FIFOs
  Pseudoterminals
  Sockets
    Stream vs Datagram (vs Seq. packet)
    UNIX vs Internet domain
  POSIX message queues
  POSIX shared memory
  POSIX semaphores
    Named, Unnamed
  System V message queues
  System V shared memory
  System V semaphores
  Shared memory mappings
    File vs Anonymous
  Cross-memory attach
    process_vm_readv() / process_vm_writev()
  Signals
    Standard, Realtime
  Eventfd
  Futexes
  Record locks
  File locks
  Mutexes
  Condition variables
  Barriers
  Read-write locks
It helps to classify
  Communication
    Pipes, FIFOs, Pseudoterminals
    Sockets: Stream vs Datagram (vs Seq. packet); UNIX vs Internet domain
    POSIX message queues, System V message queues
    POSIX shared memory, System V shared memory
    Shared memory mappings: File vs Anonymous
    Cross-memory attach: process_vm_readv() / process_vm_writev()
  Signals
    Standard, Realtime
  Synchronization
    Eventfd, Futexes
    Record locks, File locks
    Mutexes, Condition variables, Barriers, Read-write locks
    POSIX semaphores: Named, Unnamed
    System V semaphores
Communication
Synchronization
What we’ll cover: Yes / Maybe
What is not covered
  Signals
    Can be used for communication and sync, but poor for both
  System V IPC
    Similar in concept to POSIX IPC
    But interface is terrible!
    Use POSIX IPC instead
  Thread sync primitives
    Mutexes, condition vars, barriers, R/W locks
    Can be shared between processes, but that is rare (and nonportable)
  Futexes
    Very low level
    Used to implement POSIX sems, mutexes, condvars
  Pseudoterminals
    Specialized use cases
Communication techniques
Pipes
Pipes
  ls | wc -l
Pipes
  Pipe == byte stream buffer in kernel
    Sequential (can’t lseek())
    Multiple readers/writers difficult
  Unidirectional
    Write end ==> read end
Creating and using pipe
  Created using pipe():
    int filedes[2];
    pipe(filedes);
    ...
    write(filedes[1], buf, count);
    read(filedes[0], buf, count);
Sharing a pipe
  Pipes are anonymous
    No name in file system
  How do two processes share a pipe?
Sharing a pipe
    int filedes[2];
    pipe(filedes);
    child_pid = fork();
  fork() duplicates parent’s file descriptors
Sharing a pipe
    int filedes[2];
    pipe(filedes);
    child_pid = fork();
    if (child_pid == 0) {
        close(filedes[1]);
        /* Child now reads */
    } else {
        close(filedes[0]);
        /* Parent now writes */
    }
  (error checking omitted!)
Closing unused file descriptors
  Parent and child must close unused descriptors
    Necessary for correct use of pipes!
  close() write end
    read() returns 0 (EOF)
  close() read end
    write() fails with EPIPE error + SIGPIPE signal
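The EPIPE case above can be checked with a short sketch. The helper name write_to_readerless_pipe() is made up for illustration; SIGPIPE is ignored here only so that the failed write() returns an error instead of terminating the process.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper (not from the slides): write to a pipe whose read
 * end has been closed and return the resulting errno, or 0 if the write
 * unexpectedly succeeds. */
int write_to_readerless_pipe(void)
{
    int pfd[2];

    if (pipe(pfd) == -1)
        return errno;
    assert(pfd[0] != pfd[1]);           /* Two distinct descriptors */

    /* Ignore SIGPIPE so that write() fails with EPIPE rather than the
     * signal's default action killing this process. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);

    close(pfd[0]);                      /* No reader is left */

    int err = 0;
    if (write(pfd[1], "x", 1) == -1)
        err = errno;                    /* Expected: EPIPE */

    close(pfd[1]);
    if (old != SIG_ERR)
        signal(SIGPIPE, old);           /* Restore previous disposition */
    return err;
}
```

Calling write_to_readerless_pipe() should yield EPIPE; without the SIG_IGN step the process would normally be terminated by SIGPIPE before write() returned.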
// http://man7.org/tlpi/code/online/dist/pipes/simple_pipe.c.html
// Create pipe, create child, parent writes argv[1] to pipe, child reads
    pipe(pfd);                          /* Create the pipe */
    switch (fork()) {
    case 0:                             /* Child - reads from pipe */
        close(pfd[1]);                  /* Write end is unused */
        for (;;) {                      /* Read data from pipe, echo on stdout */
            numRead = read(pfd[0], buf, BUF_SIZE);
            if (numRead <= 0)
                break;                  /* End-of-file or error */
            write(STDOUT_FILENO, buf, numRead);
        }
        write(STDOUT_FILENO, "\n", 1);
        close(pfd[0]);
        ...
    default:                            /* Parent - writes to pipe */
        close(pfd[0]);                  /* Read end is unused */
        write(pfd[1], argv[1], strlen(argv[1]));
        close(pfd[1]);                  /* Child will see EOF */
        ...
    }
I/O on pipes
  read() blocks if pipe is empty
  write() blocks if pipe is full
  Writes <= PIPE_BUF guaranteed to be atomic
    Multiple writers > PIPE_BUF may be interleaved
    POSIX: PIPE_BUF at least 512B
    Linux: PIPE_BUF is 4096B
  Can use dup2() to connect filters via a pipe
    http://man7.org/tlpi/code/online/dist/pipes/pipe_ls_wc.c.html
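A minimal sketch of the dup2() technique, assuming a single child whose standard output is redirected into a pipe that the parent reads. The helper name capture_stdout() is hypothetical; the linked pipe_ls_wc.c is the author's full two-filter version.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its stdout connected to a pipe
 * and collect up to cap-1 bytes of its output in 'out' (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t capture_stdout(char *const argv[], char *out, size_t cap)
{
    assert(argv != NULL && out != NULL && cap > 0);

    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    if (pid == 0) {                     /* Child: becomes the filter */
        close(pfd[0]);                  /* Read end is unused */
        if (pfd[1] != STDOUT_FILENO) {
            dup2(pfd[1], STDOUT_FILENO);    /* stdout now refers to pipe */
            close(pfd[1]);              /* Drop the extra descriptor */
        }
        execvp(argv[0], argv);
        _exit(127);                     /* Reached only if exec failed */
    }

    close(pfd[1]);                      /* Parent: write end is unused */

    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(pfd[0], out + total, cap - 1 - total)) > 0)
        total += (size_t) n;
    out[total] = '\0';

    close(pfd[0]);
    waitpid(pid, NULL, 0);              /* Reap the child */
    return (ssize_t) total;
}
```

For example, passing { "echo", "hello", NULL } should leave "hello\n" in the buffer, assuming an echo program is on the PATH. Closing the unused ends matters here too: the parent sees EOF only because no write descriptor stays open once the child exits.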
Pipes have limited capacity
  Limited capacity
    If pipe fills, write() blocks
    Before Linux 2.6.11: 4096 bytes
    Since Linux 2.6.11: 65,536 bytes
  Apps should be designed not to care about capacity
    But, Linux has fcntl(fd, F_SETPIPE_SZ, size) (not portable)
FIFOs (named pipes)
FIFO (named pipe)
  (Anonymous) pipes can only be used by related processes
  FIFOs == pipe with name in file system
  Creation:
    mkfifo(pathname, permissions)
  Any process can open and use FIFO
  I/O is same as for pipes
Opening a FIFO
  open(pathname, O_RDONLY)
    Open read end
  open(pathname, O_WRONLY)
    Open write end
  open() blocks until other end is opened
    Opens are synchronized
  open(pathname, O_RDONLY | O_NONBLOCK) can be useful
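The FIFO calls above can be sketched as follows. The helper fifo_roundtrip() and the path passed to it are hypothetical; a forked child is used only to keep the example in one program, since any unrelated process could open the same pathname.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a FIFO at 'path', have a child write 'msg'
 * into it, and read the message back in the parent.
 * Returns the number of bytes read, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *out, size_t cap)
{
    assert(path != NULL && msg != NULL && out != NULL && cap > 0);

    unlink(path);                       /* Demo only: drop any stale FIFO */
    if (mkfifo(path, 0600) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }

    if (pid == 0) {                     /* Child: the writer */
        int wfd = open(path, O_WRONLY); /* Blocks until a reader opens */
        if (wfd == -1)
            _exit(1);
        write(wfd, msg, strlen(msg));
        close(wfd);                     /* Reader will see EOF */
        _exit(0);
    }

    ssize_t total = -1;
    int rfd = open(path, O_RDONLY);     /* Blocks until the writer opens */
    if (rfd != -1) {
        ssize_t n;
        total = 0;
        while ((size_t) total < cap - 1 &&
               (n = read(rfd, out + total, cap - 1 - (size_t) total)) > 0)
            total += n;
        out[total] = '\0';
        close(rfd);
    } else {
        kill(pid, SIGKILL);             /* Don't leave the writer blocked */
    }

    waitpid(pid, NULL, 0);
    unlink(path);                       /* Remove the name from the filesystem */
    return total;
}
```

The two open() calls rendezvous as described on the slide: whichever side arrives first waits for the other.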
POSIX Message Queues
Highlights of POSIX MQs
  Message-oriented communication
    Receiver reads messages one at a time
      No partial or multiple message reads
    Unlike pipes, multiple readers/writers can be useful
  Messages have priorities
    Delivered in priority order
  Message notification feature
POSIX MQ API
  Queue management (analogous to files)
    mq_open(): open/create MQ, set attributes
    mq_close(): close MQ
    mq_unlink(): remove MQ pathname
  I/O:
    mq_send(): send message
    mq_receive(): receive message
  Other:
    mq_setattr(), mq_getattr(): set/get MQ attributes
    mq_notify(): request notification of msg arrival
Opening a POSIX MQ
  mqd = mq_open(name, flags [, mode, &attr]);
  Open+create new MQ / open existing MQ
  name has form /somename
    Visible in a pseudo-filesystem
  Returns mqd_t, a message queue descriptor
    Used by rest of API
Opening a POSIX MQ
  mqd = mq_open(name, flags [, mode, &attr]);
  flags (analogous to open()):
    O_CREAT – create MQ if it doesn’t exist
    O_EXCL – create MQ exclusively
    O_RDONLY, O_WRONLY, O_RDWR – just like file open
    O_NONBLOCK – non-blocking I/O
  mode sets permissions
  &attr: attributes for new MQ
    NULL gives defaults
Opening a POSIX MQ
  Examples:
    // Create new MQ, exclusive,
    // for writing
    mqd = mq_open("/mymq", O_CREAT | O_EXCL | O_WRONLY,
                  0600, NULL);
    // Open existing queue for reading
    mqd = mq_open("/mymq", O_RDONLY);
Unlink a POSIX MQ
  mq_unlink(name);
  MQs are reference-counted
    MQ removed only after all users have closed
Nonblocking I/O on POSIX MQs
  Message queues have a limited capacity
    Controlled by attributes
  By default:
    mq_receive() blocks if no messages in queue
    mq_send() blocks if queue is full
  O_NONBLOCK:
    EAGAIN error instead of blocking
    Useful for emptying queue without blocking
Sending a message
  mq_send(mqd, msg_ptr, msg_len, msgprio);
    mqd – MQ descriptor
    msg_ptr – pointer to bytes forming message
    msg_len – size of message
    msgprio – priority
      non-negative integer
      0 is lowest priority
Sending a message
  mq_send(mqd, msg_ptr, msg_len, msgprio);
  Example:
    mqd_t mqd;
    mqd = mq_open("/mymq", O_CREAT | O_WRONLY,
                  0600, NULL);
    char *msg = "hello world";
    mq_send(mqd, msg, strlen(msg), 0);
  http://man7.org/tlpi/code/online/dist/pmsg/pmsg_send.c.html
Receiving a message
  nb = mq_receive(mqd, msg_ptr, msg_len, &prio);
    mqd – MQ descriptor
    msg_ptr – points to buffer that receives message
    msg_len – size of buffer
    &prio – receives priority
    nb – returns size of message (bytes)
Receiving a message
  nb = mq_receive(mqd, msg_ptr, msg_len, &prio);
  Example:
    const int BUF_SIZE = 1000;
    char buf[BUF_SIZE];
    unsigned int prio;
    ...
    mqd = mq_open("/mymq", O_RDONLY);
    nbytes = mq_receive(mqd, buf, BUF_SIZE, &prio);
  pmsg_receive.c.html
POSIX MQ notifications
  mq_notify(mqd, notification);
  One process can register to receive notification
  Notified when new msg arrives on empty queue
    & only if another process is not doing mq_receive()
  notification says how caller should be notified
    Send me a signal
    Start a new thread (see mq_notify(3) for example)
  One-shot; must re-enable
    Do so before emptying queue!
POSIX MQ attributes
    struct mq_attr {
        long mq_flags;      // MQ description flags
                            // 0 or O_NONBLOCK
                            // [mq_getattr(), mq_setattr()]
        long mq_maxmsg;     // Max. # of msgs on queue
                            // [mq_open(), mq_getattr()]
        long mq_msgsize;    // Max. msg size (bytes)
                            // [mq_open(), mq_getattr()]
        long mq_curmsgs;    // # of msgs currently in queue
                            // [mq_getattr()]
    };
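A sketch that ties the attributes to send/receive: it sizes the receive buffer from mq_msgsize (mq_receive() fails if the buffer is smaller than that) and shows priority-ordered delivery. The helper name first_priority_received() and the queue name passed to it are hypothetical; on older glibc versions the program may need to be linked with -lrt.

```c
#include <assert.h>
#include <fcntl.h>
#include <mqueue.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: queue two messages with priorities 1 and 7, then
 * return the priority of the message that mq_receive() delivers first.
 * Returns -1 if the queue cannot be set up (e.g. no MQ support). */
int first_priority_received(const char *name)
{
    assert(name != NULL && name[0] == '/');

    mq_unlink(name);                    /* Demo only: start from clean state */
    mqd_t mqd = mq_open(name, O_CREAT | O_EXCL | O_RDWR, 0600, NULL);
    if (mqd == (mqd_t) -1)
        return -1;

    int result = -1;
    unsigned int prio = 0;
    char *buf = NULL;
    struct mq_attr attr;

    if (mq_getattr(mqd, &attr) == -1)
        goto out;

    /* Receive buffer must be at least mq_msgsize bytes */
    buf = malloc((size_t) attr.mq_msgsize);
    if (buf == NULL)
        goto out;

    if (mq_send(mqd, "low", 3, 1) == -1 ||
        mq_send(mqd, "high", 4, 7) == -1)
        goto out;

    /* Highest priority is delivered first, regardless of send order */
    if (mq_receive(mqd, buf, (size_t) attr.mq_msgsize, &prio) == 4)
        result = (int) prio;

out:
    free(buf);
    mq_close(mqd);
    mq_unlink(name);
    return result;
}
```

With working message queue support, the first message received should be the 4-byte one sent with priority 7.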
POSIX MQ details
  Per-process and system-wide limits govern resource usage
  Can mount filesystem to obtain info on MQs:
    # mkdir /dev/mqueue
    # mount -t mqueue none /dev/mqueue
    # ls /dev/mqueue
    mymq
    # cat /dev/mqueue/mymq
    QSIZE:129 NOTIFY:2 SIGNO:0 NOTIFY_PID:8260
  See mq_overview(7)
Shared memory
Shared memory
  Processes share same physical pages of memory
  Communication == copy data to memory
  Efficient; compare
    Data transfer: user space ==> kernel ==> user space
    Shared memory: single copy in user space
  But, need to synchronize access...
Shared memory
  Processes share physical pages of memory
Shared memory
  We’ll cover three types:
    Shared anonymous mappings
      related processes
    Shared file mappings
      unrelated processes, backed by file in traditional filesystem
    POSIX shared memory
      unrelated processes, without use of traditional filesystem
mmap()
  Syscall used in all three shmem types
  Rather complex:
    void *mmap(void *daddr, size_t len, int prot,
               int flags, int fd, off_t offset);
mmap()
  addr = mmap(daddr, len, prot, flags, fd, offset);
  daddr – choose where to place mapping
    Best to use NULL, to let kernel choose
  len – size of mapping
  prot – memory protections (read, write, exec)
  flags – control behavior of call
    MAP_SHARED, MAP_ANONYMOUS
  fd – file descriptor for file mappings
  offset – starting offset for mapping from file
  addr – returns address used for mapping
Using shared memory
  addr = mmap(daddr, len, prot, flags, fd, offset);
  addr looks just like any C pointer
    But, changes to region seen by all processes that map it
Shared anonymous mapping
Shared anonymous mapping
  Share memory between related processes
  mmap() fd and offset args unneeded
    addr = mmap(NULL, length,
                PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS,
                -1, 0);
    pid = fork();
  Allocates zero-initialized block of length bytes
  Parent and child share memory at addr:length
  http://man7.org/tlpi/code/online/dist/mmap/anon_mmap.c.html
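A minimal sketch of the mmap()-then-fork() pattern. The helper name value_seen_by_parent() is hypothetical, and waiting for the child to exit stands in for real synchronization.

```c
#define _DEFAULT_SOURCE                 /* For MAP_ANONYMOUS */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child stores 'value' in a shared anonymous
 * mapping; the parent waits for the child and returns what it then sees
 * in that memory. Returns -1 on error. */
int value_seen_by_parent(int value)
{
    int *addr = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return -1;
    assert(*addr == 0);                 /* Mapping starts zero-initialized */

    pid_t pid = fork();                 /* Child inherits the mapping */
    if (pid == -1) {
        munmap(addr, sizeof(int));
        return -1;
    }

    if (pid == 0) {                     /* Child: update shared memory */
        *addr = value;
        _exit(0);
    }

    waitpid(pid, NULL, 0);              /* Crude sync: wait for child exit */
    int seen = *addr;                   /* Child's store is visible here */
    munmap(addr, sizeof(int));
    return seen;
}
```

With MAP_PRIVATE instead of MAP_SHARED the parent would still read 0, because the child's store would land in its own copy-on-write page.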
Shared file mapping
Shared file mapping
  Share memory between unrelated processes, backed by file
    fd = open(...);
    addr = mmap(..., fd, offset);
Shared file mapping
    fd = open(...);
    addr = mmap(..., fd, offset);
  Contents of memory initialized from file
  Updates to memory automatically carried through to file (“memory-mapped I/O”)
  All processes that map same region of file share same memory
Shared file mapping
    fd = open(pathname, O_RDWR);
    addr = mmap(NULL, length,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                fd, 0);
    ...
    close(fd);      /* No longer need 'fd' */
  Updates are:
    visible to other processes sharing mapping; and
    carried through to file
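The "carried through to file" behavior can be sketched in a single process: store bytes through a MAP_SHARED mapping, then read the file back with ordinary file I/O. The helper name mapped_write_reaches_file() and the temporary file template are hypothetical; msync() is included to force the write-back explicitly.

```c
#define _DEFAULT_SOURCE                 /* For mkstemp(), pread() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map a temporary file MAP_SHARED, store 'msg'
 * through the mapping, then read the file back with pread().
 * Returns 1 if the file contains 'msg', 0 if not, -1 on error. */
int mapped_write_reaches_file(const char *msg)
{
    assert(msg != NULL);
    size_t len = strlen(msg);
    assert(len > 0 && len < 256);

    char path[] = "/tmp/mmap_demo_XXXXXX";
    int fd = mkstemp(path);             /* Opened for reading and writing */
    if (fd == -1)
        return -1;
    unlink(path);                       /* File lives until fd is closed */

    int ok = -1;
    if (ftruncate(fd, (off_t) len) == 0) {  /* Give the file a size to map */
        char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (addr != MAP_FAILED) {
            memcpy(addr, msg, len);     /* Plain memory write... */
            msync(addr, len, MS_SYNC);  /* ...pushed through to the file */
            munmap(addr, len);

            char buf[256];
            ssize_t n = pread(fd, buf, sizeof(buf), 0);
            ok = (n == (ssize_t) len && memcmp(buf, msg, len) == 0);
        }
    }

    close(fd);
    return ok;
}
```

The ftruncate() step matters: accessing pages of a mapping that lie beyond the end of the file raises SIGBUS.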
POSIX shared memory
POSIX shared memory
  Share memory between unrelated processes, without creating file in (traditional) filesystem
    Don’t need to create a file
    Avoid file I/O overhead
POSIX SHM API
  Object management
    shm_open(): open/create SHM object
    mmap(): map SHM object
    shm_unlink(): remove SHM object pathname
  Operations on SHM object via fd returned by shm_open():
    fstat(): retrieve info (size, ownership, permissions)
    ftruncate(): change size
    fchown(), fchmod(): change ownership, permissions
Opening a POSIX SHM object
  fd = shm_open(name, flags, mode);
  Open+create new / open existing SHM object
  name has form /somename
    Can be seen in dedicated tmpfs at /dev/shm
  Returns fd, a file descriptor
    Used by rest of API
Opening a POSIX SHM object
  fd = shm_open(name, flags, mode);
  flags (analogous to open()):
    O_CREAT – create SHM if it doesn’t exist
    O_EXCL – create SHM exclusively
    O_RDONLY, O_RDWR – indicates type of access
    O_TRUNC – truncate existing SHM object to zero length
  mode sets permissions
    MBZ (must be zero) if O_CREAT not specified
Create and map new SHM object
  Create and map a new SHM object of size bytes:
    fd = shm_open("/myshm", O_CREAT | O_EXCL | O_RDWR, 0600);
    ftruncate(fd, size);        // Set size of object
    addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
Map existing SHM object
  Map an existing SHM object of unknown size:
    fd = shm_open("/myshm", O_RDWR, 0);     // No O_CREAT
    // Use object size as length for mmap()
    struct stat sb;
    fstat(fd, &sb);
    addr = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
  http://man7.org/tlpi/code/online/dist/pshm/pshm_read.c.html
But...
  How to prevent two processes updating shared memory at the same time?
Synchronization
Synchronization
  Synchronize access to a shared resource
    Shared memory
      Semaphores
    File
      File locks
POSIX semaphores
POSIX semaphores
  Integer maintained inside kernel
    Kernel blocks attempt to decrease value below zero
  Two fundamental operations:
    sem_post(): increment by 1
    sem_wait(): decrement by 1
      May block
POSIX semaphores
  Semaphore represents a shared resource
    E.g., N shared identical resources ==> initial value of semaphore is N
  Common use: binary value
    Single resource (e.g., shared memory)
Unnamed and named semaphores
  Two types of POSIX semaphore:
    Unnamed
      Embedded in shared memory
    Named
      Independent, named objects
Unnamed semaphores API
  sem_init(semp, pshared, value): initialize semaphore pointed to by semp to value
    sem_t *semp
    pshared: 0, thread sharing; != 0, process sharing
  sem_post(semp): add 1 to value
  sem_wait(semp): subtract 1 from value
  sem_destroy(semp): free semaphore, release resources back to system
    Must be no waiters!
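A small sketch of the unnamed-semaphore API used as a binary lock. It assumes a shared anonymous mapping rather than a POSIX SHM object, and the struct shared_counter layout and helper name locked_increment_total() are hypothetical. Older glibc versions may need -pthread at link time.

```c
#define _DEFAULT_SOURCE                 /* For MAP_ANONYMOUS */
#include <assert.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_counter {                 /* Hypothetical layout */
    sem_t lock;                         /* Binary semaphore guarding 'count' */
    long count;
};

/* Hypothetical helper: parent and child each add 1 to a shared counter
 * 'iters' times, holding the semaphore around every update.
 * Returns the final count, or -1 on error. */
long locked_increment_total(int iters)
{
    assert(iters >= 0);

    struct shared_counter *sc =
        mmap(NULL, sizeof(*sc), PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sc == MAP_FAILED)
        return -1;

    /* pshared != 0: shared between processes; initial value 1: unlocked */
    if (sem_init(&sc->lock, 1, 1) == -1) {
        munmap(sc, sizeof(*sc));
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        sem_destroy(&sc->lock);
        munmap(sc, sizeof(*sc));
        return -1;
    }

    for (int i = 0; i < iters; i++) {   /* Runs in both parent and child */
        sem_wait(&sc->lock);            /* Decrement; blocks while held */
        sc->count++;
        sem_post(&sc->lock);            /* Increment; releases a waiter */
    }

    if (pid == 0)
        _exit(0);                       /* Child is done */

    waitpid(pid, NULL, 0);
    long total = sc->count;
    sem_destroy(&sc->lock);             /* No waiters remain at this point */
    munmap(sc, sizeof(*sc));
    return total;
}
```

With the semaphore in place the total should be exactly twice iters; without the sem_wait()/sem_post() pair, concurrent increments could be lost.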
Unnamed semaphores example
  Two processes, writer and reader
  Sending data through POSIX shared memory
  Two unnamed POSIX semaphores inside shm enforce alternating access to shm
Unnamed semaphores example
Header file
    #define BUF_SIZE 1024
    struct shmbuf {             // Buffer in shared memory
        sem_t wsem;             // Writer semaphore
        sem_t rsem;             // Reader semaphore
        int cnt;                // Number of bytes used in 'buf'
        char buf[BUF_SIZE];     // Data being transferred
    };
Writer
    fd = shm_open(SHM_PATH, O_CREAT | O_EXCL | O_RDWR, OBJ_PERMS);
    ftruncate(fd, sizeof(struct shmbuf));
    shmp = mmap(NULL, sizeof(struct shmbuf),
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    sem_init(&shmp->rsem, 1, 0);
    sem_init(&shmp->wsem, 1, 1);        // Writer gets first turn
    for (xfrs = 0, bytes = 0; ; xfrs++, bytes += shmp->cnt) {
        sem_wait(&shmp->wsem);          // Wait for our turn
        shmp->cnt = read(STDIN_FILENO, shmp->buf, BUF_SIZE);
        sem_post(&shmp->rsem);          // Give reader a turn
        if (shmp->cnt == 0)             // EOF on stdin?
            break;
    }
    sem_wait(&shmp->wsem);              // Wait for reader to finish
    // Clean up
Reader
    fd = shm_open(SHM_PATH, O_RDWR, 0);
    shmp = mmap(NULL, sizeof(struct shmbuf),
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    for (xfrs = 0, bytes = 0; ; xfrs++) {
        sem_wait(&shmp->rsem);          // Wait for our turn
        if (shmp->cnt == 0)             // Writer encountered EOF
            break;
        bytes += shmp->cnt;
        if (write(STDOUT_FILENO, shmp->buf, shmp->cnt) != shmp->cnt)
            ...                         // Partial/failed write
        sem_post(&shmp->wsem);          // Give writer a turn
    }
    sem_post(&shmp->wsem);              // Let writer know we're finished
Named semaphores API
  Object management
    sem_open(): open/create semaphore
    sem_unlink(): remove semaphore pathname
Opening a POSIX semaphore
  semp = sem_open(name, flags [, mode, value]);
  Open+create new / open existing semaphore
  name has form /somename
    Can be seen in dedicated tmpfs at /dev/shm
  Returns sem_t *, reference to semaphore
    Used by rest of API
Opening a POSIX semaphore
  semp = sem_open(name, flags [, mode, value]);
  flags (analogous to open()):
    O_CREAT – create semaphore if it doesn’t exist
    O_EXCL – create semaphore exclusively
  If creating new semaphore:
    mode sets permissions
    value initializes semaphore
Sockets
Sockets
  Big topic
    Just a high-level view
    Some notable features when running as IPC
Sockets
  “A socket is endpoint of communication.”
    ... you need two of them
  Bidirectional
  Created via:
    fd = socket(domain, type, protocol);
Socket domains
  Each socket exists in a domain
  Domain determines:
    Method of identifying socket (“address”)
    “Range” of communication
      Processes on a single host
      Across a network
Common socket domains
  UNIX domain (AF_UNIX)
    Communication on single host
    Address == file system pathname
  IPv4 domain (AF_INET)
    Communication on IPv4 network
    Address == IPv4 address (32 bit) + port number
  IPv6 domain (AF_INET6)
    Communication on IPv6 network
    Address == IPv6 address (128 bit) + port number
Socket type
  Determines semantics of communication
  Two main types available in all domains:
    Stream (SOCK_STREAM)
    Datagram (SOCK_DGRAM)
  UNIX domain (on Linux) also provides
    Sequential packet (SOCK_SEQPACKET)
Stream sockets
  SOCK_STREAM
  Byte stream
  Connection-oriented
    Like a two-party phone call
  Reliable == data arrives “intact” or not at all
    Intact:
      In order
      Unduplicated
  Internet domain: TCP protocol
Datagram sockets
  SOCK_DGRAM
  Message-oriented
  Connection-less
    Like a postal system
  Unreliable; messages may arrive:
    Duplicated
    Out of order
    Not at all
  Internet domain: UDP protocol
Sequential packet sockets
  SOCK_SEQPACKET
  Midway between stream and datagram sockets
    Message-oriented
    Connection-oriented
    Reliable
  UNIX domain
  In INET domain, only with SCTP protocol
Stream sockets API
Stream sockets API
  socket(SOCK_STREAM) – create a socket
  Passive socket:
    bind() – assign address to socket
    listen() – specify size of incoming connection queue
    accept() – accept connection off incoming queue
  Active socket:
    connect() – connect to passive socket
  I/O:
    write(), read(), close()
    send(), recv() – socket specific flags
Datagram sockets API
Datagram sockets API
  socket(SOCK_DGRAM) – create socket
  bind() – assign address to socket
  sendto() – send datagram to an address
  recvfrom() – receive datagram and address of sender
  close()
Sockets: noteworthy points
  Bidirectional communication
  UNIX domain datagram sockets are reliable
  UNIX domain sockets can pass file descriptors
  Internet domain sockets are only method for network communication
  UDP sockets allow broadcast / multicast of datagrams
  socketpair()
    UNIX domain
    Bidirectional pipe
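The "bidirectional pipe" point can be sketched with socketpair(): one connected pair of UNIX domain sockets carries a request and a reply over the same descriptor. The helper name socketpair_echo_plus_one() is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent sends one byte to the child over a
 * UNIX domain socket pair; the child replies on the same socket with
 * that byte plus one. Returns the reply, or -1 on error. */
int socketpair_echo_plus_one(unsigned char c)
{
    assert(c < 255);                    /* Keep the reply within one byte */

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {                     /* Child uses sv[1] */
        unsigned char b;
        close(sv[0]);
        if (read(sv[1], &b, 1) == 1) {
            b++;
            write(sv[1], &b, 1);        /* Reply on the same descriptor */
        }
        _exit(0);
    }

    close(sv[1]);                       /* Parent uses sv[0] */

    int result = -1;
    unsigned char reply;
    if (write(sv[0], &c, 1) == 1 && read(sv[0], &reply, 1) == 1)
        result = reply;

    close(sv[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

Doing the same with pipes would take two of them, one per direction.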
Other criteria affecting choice of an IPC mechanism
Criteria for selecting an IPC mechanism
  The obvious
    Consistency with application design
    Functionality
  Let’s look at some other criteria
IPC IDs and handles
  Each IPC object has:
    ID – the method used to identify an object
    Handle – the reference used in a process to access an open object
File descriptor handles
  Some handles are file descriptors
  File descriptors can be multiplexed via poll() / select() / epoll
    Sockets, pipes, FIFOs
    On Linux, POSIX MQ descriptors are file descriptors
  One good reason to avoid System V message queues
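A sketch of multiplexing with poll(), using pipes as the monitored descriptors; sockets, FIFOs and (on Linux) POSIX MQ descriptors could sit in the same array. The helper name find_readable_pipe() and the count of three pipes are arbitrary choices for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: create three pipes, write one byte into pipe
 * number 'which', and use poll() to discover which read end has data.
 * Returns the index of the readable pipe, or -1 on error/timeout. */
int find_readable_pipe(int which)
{
    enum { NPIPES = 3 };
    assert(which >= 0 && which < NPIPES);

    int pfd[NPIPES][2];
    struct pollfd fds[NPIPES];
    int made = 0;
    int ready = -1;

    for (; made < NPIPES; made++) {
        if (pipe(pfd[made]) == -1)
            break;
        fds[made].fd = pfd[made][0];    /* Monitor the read end */
        fds[made].events = POLLIN;      /* Interested in readable data */
        fds[made].revents = 0;
    }

    if (made == NPIPES &&
        write(pfd[which][1], "x", 1) == 1 &&
        poll(fds, NPIPES, 1000) > 0) {  /* Wait up to 1 second */
        for (int i = 0; i < NPIPES; i++)
            if (fds[i].revents & POLLIN)
                ready = i;
    }

    for (int i = 0; i < made; i++) {    /* Close whatever was created */
        close(pfd[i][0]);
        close(pfd[i][1]);
    }
    return ready;
}
```

A single poll() call watches all three descriptors at once, which is what System V message queue identifiers cannot take part in.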
IPC access permissions
  How is access to IPC controlled?
  Possibilities
    UID/GID + permissions mask
    Related processes (via fork())
    Other
      e.g., Internet domain: application-determined
IPC object persistence
  What is the lifetime of an IPC object?
    Process: only as long as held open by at least one process
    Kernel: until next reboot
      State persists even if no connected process
    Filesystem: persists across reboot
      Memory mapped file
Thanks! And Questions
  (slides up soon at http://man7.org/conf/)
  Michael lwn.net
  http://lwn.net/
  Linux man-pages /doc/man-pages/
  man7.org
  (No Starch Press, 2010)
  Mamaku (Black Tree Fern) image (c) Rob Suisted, naturespic.com