Introduction To Linux Device Drivers - Mulix

Introduction to Linux Device Drivers
Recreating Life One Driver At a Time

Muli Ben-Yehuda
mulix at mulix.org
IBM Haifa Research Labs and Haifux - Haifa Linux Club

Linux Device Drivers, Technion, Jan 2005

Why Write Linux Device Drivers?

- For fun.
- For profit (Linux is hot right now, especially embedded Linux).
- To scratch an itch.
- Because you can!

OK, but why Linux drivers?

- Because the source is available.
- Because of the community's cooperation and involvement.
- Have I mentioned it's fun yet?

klife - Linux kernel game of life

klife is a Linux kernel Game of Life implementation. It is a software device driver, developed specifically for this talk.

- The game of life is played on a square grid, where some of the cells are alive and the rest are dead.
- Each generation, based on each cell's neighbors, we mark the cell as alive or dead.
- With time, amazing patterns develop.
- The only reason to implement the game of life inside the kernel is for demonstration purposes.

Software device drivers are very common on Unix systems and provide many services to the user. Think about /dev/null, /dev/zero, /dev/random, /dev/kmem.
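The "mark each cell alive or dead based on its neighbors" step can be sketched in plain userspace C. This is only an illustration under stated assumptions: it uses Conway's standard rules, an 8x8 grid (`GRID_DIM` is made up; the talk does not give klife's dimensions), and treats cells outside the grid as dead. It is not klife's actual `klife_next_generation`.

```c
#define GRID_DIM 8   /* assumed size; the talk does not give klife's dimensions */

/* Count live neighbors of cell (r, c); cells outside the grid count as dead. */
static int live_neighbors(const unsigned char *g, int r, int c)
{
    int n = 0;

    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            int rr = r + dr, cc = c + dc;

            if ((dr || dc) && rr >= 0 && rr < GRID_DIM && cc >= 0 && cc < GRID_DIM)
                n += g[rr * GRID_DIM + cc];
        }
    }
    return n;
}

/* Compute one generation from cur into next using Conway's standard rules. */
static void next_generation(const unsigned char *cur, unsigned char *next)
{
    for (int r = 0; r < GRID_DIM; r++) {
        for (int c = 0; c < GRID_DIM; c++) {
            int n = live_neighbors(cur, r, c);

            /* a live cell survives with 2 or 3 neighbors; a dead one is born with 3 */
            next[r * GRID_DIM + c] = cur[r * GRID_DIM + c] ? (n == 2 || n == 3)
                                                           : (n == 3);
        }
    }
}
```

Note the two buffers: the next generation is written into a separate grid so that cells updated early do not influence their neighbors within the same step.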

Anatomy of a Device Driver

A device driver has three sides: one side talks to the rest of the kernel, one talks to the hardware, and one talks to the user.

[Figure: diagram with boxes labeled User, Kernel, Device File, Device Driver, and Hardware.]

Kernel Interface of a Device Driver

In order to talk to the kernel, the driver registers with subsystems to respond to events. Such an event might be the opening of a file, a page fault, the plugging in of a new USB device, etc.

[Figure: a kernel event list (File Open, Page Fault, Interrupt, Hotplug) connected to the Device Driver.]

User Interface of a Device Driver

- Since Linux follows the UNIX model, and in UNIX everything is a file, users talk with device drivers through device files.
- Device files are a mechanism, supplied by the kernel, precisely for this direct User-Driver interface.
- klife is a character device, and thus the user talks to it through a character device file.
- The other common kind of device file is a block device file. We will only discuss character device files today.

Anatomy of klife device driver

- The user talks with klife through the /dev/klife device file.
  - When the user opens /dev/klife, the kernel calls klife's open routine.
  - When the user closes /dev/klife, the kernel calls klife's release routine.
  - When the user reads or writes from or to /dev/klife ... you get the idea.
- klife talks to the kernel through
  - its initialization function ...
  - ... and through register_chrdev ...
  - ... and through hooking into the timer interrupt.
- We will elaborate on all of these later.

Driver Initialization Code

    static int __init klife_module_init(void)
    {
        int ret;

        pr_debug("klife module init called\n");

        if ((ret = register_chrdev(KLIFE_MAJOR_NUM, "klife", &klife_fops)) < 0)
            printk(KERN_ERR "register_chrdev: %d\n", ret);

        return ret;
    }

Driver Initialization

- One function (init) is called on the driver's initialization.
- One function (exit) is called when the driver is removed from the system.
- Question: what happens if the driver is compiled into the kernel, rather than as a module?
- The init function will register hooks that will get the driver's code called when the appropriate event happens.
- Question: what if the init function doesn't register any hooks?
- There are various hooks that can be registered: file operations, PCI operations, USB operations, network operations - it all depends on what kind of device this is.
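The "register hooks, get called back on events" idea can be modeled in a few lines of userspace C. Everything here is hypothetical (`demo_ops`, `demo_register`, `demo_dispatch_open` are names invented for this sketch, not kernel APIs); it only mimics the shape of a hook table such as `struct file_operations` and shows that an event reaches the driver only after registration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel hook table such as struct file_operations. */
struct demo_ops {
    int (*open)(void *state);
    int (*release)(void *state);
};

/* Hypothetical registry: a single slot, filled by the driver's init function. */
static const struct demo_ops *registered_ops;

static int demo_register(const struct demo_ops *ops)
{
    if (registered_ops)
        return -1;              /* slot already taken */
    registered_ops = ops;
    return 0;
}

static void demo_unregister(void)
{
    registered_ops = NULL;
}

/* What the "kernel" side does when an open event arrives. */
static int demo_dispatch_open(void *state)
{
    if (!registered_ops || !registered_ops->open)
        return -1;              /* nobody registered: the event is not delivered */
    return registered_ops->open(state);
}

/* A toy driver: its open hook just counts how often it was called. */
static int counting_open(void *state)
{
    ++*(int *)state;
    return 0;
}

static const struct demo_ops counting_ops = { .open = counting_open };
```

In this model, a driver whose init never calls `demo_register` is loaded but never hears about any event, which is one way to think about the question on the slide.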

Registering Chardev Hooks

    struct file_operations klife_fops = {
        .owner   = THIS_MODULE,
        .open    = klife_open,
        .release = klife_release,
        .read    = klife_read,
        .write   = klife_write,
        .mmap    = klife_mmap,
        .ioctl   = klife_ioctl
    };

    ...

    if ((ret = register_chrdev(KLIFE_MAJOR_NUM, "klife", &klife_fops)) < 0)
        printk(KERN_ERR "register_chrdev: %d\n", ret);

User Space Access to the Driver

We saw that the driver registers a character device tied to a given major number, but how does the user create such a file?

    # mknod /dev/klife c 250 0

And how does the user open it?

    if ((kfd = open("/dev/klife", O_RDWR)) < 0) {
        perror("open /dev/klife");
        exit(EXIT_FAILURE);
    }

And then what?
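A slightly more defensive version of the open step might also confirm that the path really is a character device and report its major and minor numbers. This is a sketch, not part of klife: `open_char_device` is a name made up here, and it assumes a Linux system with glibc-style `major()` and `minor()` macros from `<sys/sysmacros.h>`.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open a character device file read-write.
 * Returns the fd on success; -1 if the path cannot be opened or is not a
 * character device. On success *maj and *min receive the device numbers.
 */
static int open_char_device(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        perror(path);
        return -1;
    }
    if (fstat(fd, &st) < 0 || !S_ISCHR(st.st_mode)) {
        close(fd);
        return -1;
    }
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return fd;
}
```

With the device node created by the mknod command above, `open_char_device("/dev/klife", &maj, &min)` would be expected to report major 250 and minor 0.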

File Operations

... and then you start talking to the device. klife uses the following device file operations:

- open for starting a game (allocating resources).
- release for finishing a game (releasing resources).
- write for initializing the game (setting the starting positions on the grid).
- read for generating and then reading the next state of the game's grid.
- ioctl for querying the current generation number, and for enabling or disabling hooking into the timer interrupt (more on this later).
- mmap for potentially faster but more complex direct access to the game's grid.

The open and release Routines

open and release are where you perform any setup not done at initialization time and any cleanup not done at module unload time.

klife_open

klife's open routine allocates the klife structure, which holds all of the state for this game (the grid, starting positions, current generation, etc).

    static int klife_open(struct inode *inode, struct file *filp)
    {
        struct klife *k;
        int ret;

        ret = alloc_klife(&k);
        if (ret)
            return ret;

        filp->private_data = k;

        return 0;
    }

klife_open - alloc_klife

    static int alloc_klife(struct klife **pk)
    {
        int ret;
        struct klife *k;

        k = kmalloc(sizeof(*k), GFP_KERNEL);
        if (!k)
            return -ENOMEM;

        ret = init_klife(k);
        if (ret) {
            kfree(k);
            k = NULL;
        }

        *pk = k;

        return ret;
    }

klife_open - init_klife

    static int init_klife(struct klife *k)
    {
        int ret;

        memset(k, 0, sizeof(*k));

        spin_lock_init(&k->lock);

        ret = -ENOMEM;
        /* one page to be exported to userspace */
        k->grid = (void *)get_zeroed_page(GFP_KERNEL);
        if (!k->grid)
            goto done;

        k->tmpgrid = kmalloc(sizeof(*k->tmpgrid), GFP_KERNEL);
        if (!k->tmpgrid)
            goto free_grid;

klife_open - init_klife (cont'd)

        k->timer_hook.func = klife_timer_irq_handler;
        k->timer_hook.data = k;
        return 0;

    free_grid:
        free_page((unsigned long)k->grid);
    done:
        return ret;
    }
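The goto-based unwinding in init_klife is a general C idiom: each allocation that can fail jumps to a label that releases everything acquired before it. Here is a userspace sketch of the same shape, with `malloc` and `calloc` standing in for the kernel allocators. The struct, the function names, and the 4096-byte grid size are assumptions made for the example.

```c
#include <stdlib.h>

#define DEMO_GRID_BYTES 4096   /* assumption: one page, mirroring the slide's comment */

/* Hypothetical userspace analogue of struct klife: two buffers that must both exist. */
struct demo_life {
    unsigned char *grid;
    unsigned char *tmpgrid;
};

/* Returns 0 on success, -1 on failure; on failure nothing stays allocated. */
static int demo_life_init(struct demo_life *k, size_t tmp_bytes)
{
    int ret = -1;

    k->grid = calloc(1, DEMO_GRID_BYTES);   /* zeroed, like get_zeroed_page() */
    if (!k->grid)
        goto done;

    k->tmpgrid = malloc(tmp_bytes);
    if (!k->tmpgrid)
        goto free_grid;

    return 0;

free_grid:
    /* undo the first allocation, in reverse order of acquisition */
    free(k->grid);
    k->grid = NULL;
done:
    return ret;
}

/* Counterpart for the release path: free both buffers and clear the pointers. */
static void demo_life_destroy(struct demo_life *k)
{
    free(k->tmpgrid);
    free(k->grid);
    k->tmpgrid = k->grid = NULL;
}
```

The labels are ordered so that falling through from one to the next releases resources in reverse order of acquisition, which keeps a single exit path even as more allocations are added.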

klife_release

klife's release routine frees the resources allocated during open time.

    static int klife_release(struct inode *inode, struct file *filp)
    {
        struct klife *k = filp->private_data;

        if (k->timer)
            klife_timer_unregister(k);

        if (k->mapped) {
            /* undo setting the grid page to be reserved */
            ClearPageReserved(virt_to_page(k->grid));
        }

        free_klife(k);

        return 0;
    }

Commentary on open and release

- Beware of races if you have any global data ... many a driver author stumbles on this point.
- Note also that release can fail, but almost no one checks errors from close(), so it's better if it doesn't ...
- Question: what happens if the userspace program crashes while holding your device file open?

write

- For klife, I "hijacked" write to mean "please initialize the grid to these starting positions".
- There are no hard and fast rules to what write has to mean, but it's good to KISS (Keep It Simple, Silly).

klife_write - 1

    static ssize_t klife_write(struct file *filp, const char __user *ubuf,
                               size_t count, loff_t *f_pos)
    {
        size_t sz;
        char *kbuf;
        struct klife *k = filp->private_data;
        ssize_t ret;

        sz = count > PAGE_SIZE ? PAGE_SIZE : count;

        kbuf = kmalloc(sz, GFP_KERNEL);
        if (!kbuf)
            return -ENOMEM;

Not trusting users: checking the size of the user's buffer.

klife_write - 2

        ret = -EFAULT;
        if (copy_from_user(kbuf, ubuf, sz))
            goto free_buf;

        ret = klife_add_position(k, kbuf, sz);
        if (ret == 0)
            ret = sz;

    free_buf:
        kfree(kbuf);
        return ret;
    }

Use copy_from_user in case the user is passing a bad pointer.

Commentary on write

- Note that even for such a simple function, care must be exercised when dealing with untrusted users.
- Users are always untrusted.
- Always be prepared to handle errors!
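The "clamp the size before touching the data" habit carries over to any code that accepts a length from someone else. A userspace sketch, with the caveat that plain `memcpy` cannot detect a bad pointer the way `copy_from_user` can; `bounded_copy` and `DEMO_MAX_WRITE` are names invented for this example.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_WRITE 4096   /* assumed cap, playing the role of PAGE_SIZE */

/*
 * Copy at most DEMO_MAX_WRITE bytes of an untrusted request into dst.
 * Returns the number of bytes accepted, or -1 for a NULL pointer or a
 * destination too small for the clamped size.
 */
static long bounded_copy(char *dst, size_t dst_cap, const char *src, size_t count)
{
    /* never trust the caller's count: clamp it first */
    size_t sz = count > DEMO_MAX_WRITE ? DEMO_MAX_WRITE : count;

    if (!dst || !src || sz > dst_cap)
        return -1;

    memcpy(dst, src, sz);
    return (long)sz;
}
```

As in klife_write, a request larger than the cap is shortened rather than rejected, and the return value tells the caller how much was actually consumed.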

read

- For klife, read means "please calculate and give me the next generation".
- The bulk of the work is done in two other routines:
  - klife_next_generation calculates the next generation based on the current one, according to the rules of the game of life.
  - klife_draw takes a grid and "draws" it as a single string in a page of memory.

klife_read - 1

    static ssize_t
    klife_read(struct file *filp, char *ubuf, size_t count, loff_t *f_pos)
    {
        struct klife *klife;
        char *page;
        ssize_t len;
        ssize_t ret;
        unsigned long flags;

        klife = filp->private_data;

        /* special handling for mmap */
        if (klife->mapped)
            return klife_read_mapped(filp, ubuf, count, f_pos);

        if (!(page = kmalloc(PAGE_SIZE, GFP_KERNEL)))
            return -ENOMEM;

klife_read - 2

        spin_lock_irqsave(&klife->lock, flags);

        klife_next_generation(klife);
        len = klife_draw(klife, page);

        spin_unlock_irqrestore(&klife->lock, flags);

        if (len < 0) {
            ret = len;
            goto free_page;
        }

        /* len can't be negative */
        len = min(count, (size_t)len);

Note that the lock is held for the shortest possible time. We will see later what the lock protects us against.

klife_read - 3

        if (copy_to_user(ubuf, page, len)) {
            ret = -EFAULT;
            goto free_page;
        }

        *f_pos += len;
        ret = len;

    free_page:
        kfree(page);
        return ret;
    }

copy_to_user in case the user is passing us a bad page.
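From the other side of the device file, a userspace caller would fetch each drawn generation with an ordinary read(). A minimal sketch of such a helper follows; `read_text` is a made-up name, and it assumes the driver hands back text that fits in the caller's buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read one chunk from fd into buf, retrying if a signal interrupts the call,
 * and NUL-terminate the result. Returns bytes read (0 at end of file), or -1.
 */
static ssize_t read_text(int fd, char *buf, size_t cap)
{
    ssize_t n;

    if (cap == 0)
        return -1;

    do {
        n = read(fd, buf, cap - 1);   /* leave room for the terminator */
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

Since klife's read calculates the next generation before drawing it, calling `read_text` on a klife file descriptor in a loop would step the game forward once per call.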

klife_read - 4

    static ssize_t
    klife_read_mapped(struct file *filp, char *ubuf, size_t count,
                      loff_t *f_pos)
    {
        struct klife *klife;
        unsigned long flags;

        klife = filp->private_data;

        spin_lock_irqsave(&klife->lock, flags);

        klife_next_generation(klife);

        spin_unlock_irqrestore(&klife->lock, flags);

        return 0;
    }

Again, mind the short lock holding time.

Commentary on read

There's plenty of room for optimization in this code ... can you see where?

ioctl

- ioctl is a "special access" mechanism, for operations that do not cleanly map anywhere else.
- It is considered extremely bad taste to use ioctls in Linux where not absolutely necessary.
- New drivers should use either sysfs (a /proc-like virtual file system) or a driver-specific file system (you can write a Linux file system in less than 100 lines of code).
- In klife, we use ioctl to get the current generation number, for demonstration purposes only ...

klife_ioctl - 1

    static int klife_ioctl(struct inode *inode, struct file *file,
                           unsigned int cmd, unsigned long data)
    {
        struct klife *klife = file->private_data;
        unsigned long gen;
        int enable;
        int ret;
        unsigned long flags;

        ret = 0;
        switch (cmd) {
        case KLIFE_GET_GENERATION:
            spin_lock_irqsave(&klife->lock, flags);
            gen = klife->gen;
            spin_unlock_irqrestore(&klife->lock, flags);
            if (copy_to_user((void *)data, &gen, sizeof(gen))) {
                ret = -EFAULT;
                goto done;
            }

klife_ioctl - 2

            break;
        case KLIFE_SET_TIMER_MODE:
            if (copy_from_user(&enable, (void *)data, sizeof(enable))) {
                ret = -EFAULT;
                goto done;
            }
            pr_debug("user request to %s timer mode\n",
                     enable ? "enable" : "disable");
            if (klife->timer && !enable)
                klife_timer_unregister(klife);
            else if (!klife->timer && enable)
                klife_timer_register(klife);
            break;
        }
    done:
        return ret;
    }

memory mapping

- The read-write mechanism, previously described, involves the overhead of a system call, the related context switching, and memory copying.
- mmap maps pages of a file into memory, thus enabling programs to access the memory directly and save the overhead, ... but:
  - fast synchronization between kernel space and user space is a pain (why do we need it?),
  - and Linux read and write are really quite fast.
- mmap is implemented in klife for demonstration purposes, with read() calls used for synchronization and triggering a generation update.

klife_mmap

        ...
        SetPageReserved(virt_to_page(klife->grid));

        ret = remap_pfn_range(vma, vma->vm_start,
                              virt_to_phys(klife->grid) >> PAGE_SHIFT,
                              PAGE_SIZE, vma->vm_page_prot);

        pr_debug("io_remap_page_range returned %d\n", ret);

        if (ret == 0)
            klife->mapped = 1;

        return ret;
    }

klife Interrupt Handler

- What if we want a new generation on every raised interrupt?
- Since we don't have a hardware device to raise interrupts for us, let's hook into the one piece of hardware every PC has - the clock - and steal its interrupt!

Usual Request For an Interrupt Handler

Usually, interrupts are requested using request_irq():

    /* claim our irq */
    rc = -ENODEV;
    if (request_irq(card->irq, &trident_interrupt,
                    SA_SHIRQ, card_names[pci_id->driver_data],
                    card)) {
        printk(KERN_ERR
               "trident: unable to allocate irq %d\n", card->irq);
        goto out_proc_fs;
    }

klife Interrupt Handler

- It is impossible to request the timer interrupt.
- Instead, we will directly modify the kernel code to call our interrupt handler, if it's registered.
- We can do this, because the code is open ...

Aren't Timers Good Enough For You?

- "Does every driver which wishes to get periodic notifications need to hook the timer interrupt?" - Nope.
- Linux provides an excellent timer mechanism which can be used for periodic notifications.
- The reason for hooking into the timer interrupt in klife is because we wish to be called from hard interrupt context, also known as top half context ...
- ... whereas timer functions are called in softirq bottom half context.
- Why insist on getting called from hard interrupt context? So we can demonstrate deferring work.

The Timer Interrupt Hook Patch

The patch adds a hook which a driver can register for, to be called directly from the timer interrupt handler. It also creates two functions:

- register_timer_interrupt
- unregister_timer_interrupt

Hook Into The Timer Interrupt Routine 1

'+' marks the lines added to the kernel.

    +struct timer_interrupt_hook *timer_hook;
    +
    +static void call_timer_hook(struct pt_regs *regs)
    +{
    +    struct timer_interrupt_hook *hook = timer_hook;
    +
    +    if (hook && hook->func)
    +        hook->func(hook->data);
    +}

    @@ -851,6 +862,8 @@ void do_timer(struct pt_regs *regs)
         update_process_times(user_mode(regs));
     #endif
         update_times();
    +
    +    call_timer_hook(regs);
     }

Hook Into The Timer Interrupt Routine 2

    +int register_timer_interrupt(struct timer_interrupt_hook *hook)
    +{
    +    printk(KERN_INFO "registering a timer interrupt hook %p "
    +           "(func %p, data %p)\n", hook, hook->func, hook->data);
    +
    +    xchg(&timer_hook, hook);
    +    return 0;
    +}
    +
    +void unregister_timer_interrupt(struct timer_interrupt_hook *hook)
    +{
    +    printk(KERN_INFO "unregistering a timer interrupt hook\n");
    +
    +    xchg(&timer_hook, NULL);
    +}

Commentary - The Timer Interrupt Hook

- Note that the register and unregister calls use xchg(), to ensure atomic replacement of the pointer to the handler. Why use xchg() rather than a lock?
- What context (hard interrupt, bottom half, process context) will we be called in?
- Which CPU's timer interrupts would we be called in?
- What happens on an SMP system?
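The publish-by-atomic-swap pattern can be tried out in userspace with C11 atomics, where `atomic_exchange` plays the role the kernel's xchg() plays in the patch. This is a sketch with invented names (`demo_hook`, `demo_register_hook`, and so on), not the patch itself. The reader takes a single snapshot of the pointer and never takes a lock, which matters because in the real patch the reader runs inside the timer interrupt.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace counterpart of the patch's hook structure. */
struct demo_hook {
    void (*func)(void *data);
    void *data;
};

/* The single published hook; NULL means nobody is registered. */
static _Atomic(struct demo_hook *) demo_timer_hook;

/* Publish a hook; returns whatever hook was installed before (possibly NULL). */
static struct demo_hook *demo_register_hook(struct demo_hook *hook)
{
    return atomic_exchange(&demo_timer_hook, hook);
}

/* Withdraw the current hook; returns the hook that was installed. */
static struct demo_hook *demo_unregister_hook(void)
{
    return atomic_exchange(&demo_timer_hook, (struct demo_hook *)NULL);
}

/* Called on every tick: take one snapshot of the pointer, then use only that. */
static void demo_call_hook(void)
{
    struct demo_hook *hook = atomic_load(&demo_timer_hook);

    if (hook && hook->func)
        hook->func(hook->data);
}

/* Example hook body: count how many ticks were delivered. */
static void demo_count_tick(void *data)
{
    ++*(int *)data;
}
```

The swap only makes the pointer update itself atomic. It does not say when it is safe to free a hook that a concurrent caller may still be using, which is worth keeping in mind when thinking about the SMP question above.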

Deferring Work

- You were supposed to learn in class about bottom halves, softirqs, tasklets and other such curse words.
- The timer interrupt (and every other interrupt) has to happen very quickly. Why?
- The interrupt handler (top half, hard irq) usually just sets a flag which says "there is work to be done".
- The work is then deferred to a bottom half context, where it is done by an (old style) bottom half, softirq, or tasklet.
- For klife, we defer the work we wish to do (updating the grid) to a bottom half context by scheduling a tasklet.

Preparing The Tasklet

    DECLARE_TASKLET_DISABLED(klife_tasklet, klife_tasklet_func, 0);

    static void klife_timer_register(struct klife *klife)
    {
        unsigned long flags;
        int ret;

        spin_lock_irqsave(&klife->lock, flags);

        /* prime the tasklet with the correct data - ours */
        tasklet_init(&klife_tasklet, klife_tasklet_func,
                     (unsigned long)klife);

        ret = register_timer_interrupt(&klife->timer_hook);
        if (!ret)
            klife->timer = 1;

        spin_unlock_irqrestore(&klife->lock, flags);

        pr_debug("register_timer_interrupt returned %d\n", ret);
    }

The klife Tasklet

Here's what our klife tasklet does:

- First, it derives the klife structure from the parameter it gets.
- Then, it locks it, to prevent concurrent access on another CPU. What are we protecting against?
- Then, it generates the new generation.
  - What must we never do here?
  - Hint: can tasklets block?
- Last, it releases the lock.

Deferring Work - The klife Tasklet

    static void klife_timer_irq_handler(void *data)
    {
        struct klife *klife = data;

        /* 2 times a second */
        if (klife->timer_invocation++ % (HZ / 2) == 0)
            tasklet_schedule(&klife_tasklet);
    }

    static void klife_tasklet_func(unsigned long data)
    {
        struct klife *klife = (void *)data;

        spin_lock(&klife->lock);
        klife_next_generation(klife);
        spin_unlock(&klife->lock);
    }
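The split between a tiny top half and a deferred bottom half can be simulated in ordinary C. The sketch below is single-threaded and leaves out all locking, so it only shows the control flow. `DEMO_HZ`, the struct, and both function names are assumptions made for illustration.

```c
#include <stdbool.h>

#define DEMO_HZ 100   /* assumed tick rate; the kernel's HZ depends on configuration */

/* Hypothetical state shared between the "top half" and the "bottom half". */
struct demo_deferred {
    unsigned long invocation;   /* ticks seen so far */
    bool pending;               /* set by the top half, cleared by the bottom half */
    unsigned long generation;   /* stands in for the real work: one step per run */
};

/* Top half: runs on every tick and must stay tiny; it only flags work. */
static void demo_top_half(struct demo_deferred *d)
{
    /* twice per simulated second, like the HZ / 2 check in the slide */
    if (d->invocation++ % (DEMO_HZ / 2) == 0)
        d->pending = true;
}

/* Bottom half: runs later, outside the "interrupt", and does the actual work. */
static void demo_bottom_half(struct demo_deferred *d)
{
    if (!d->pending)
        return;
    d->pending = false;
    d->generation++;
}
```

Feeding this simulation `DEMO_HZ` ticks, with the bottom half run after each one, advances the generation counter twice, matching the "2 times a second" comment in the handler.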

Adding klife To The Build System

Building the module in kernel 2.6 is a breeze. All that's required to add klife to the kernel's build system are these tiny patches:

In drivers/char/Kconfig:

    +config GAME_OF_LIFE
    +    tristate "kernel game of life"
    +    help
    +      Kernel implementation of the Game of Life.

In drivers/char/Makefile:

    +obj-$(CONFIG_GAME_OF_LIFE) += klife.o

Summary

- Writing Linux drivers is easy ...
- ... and fun!
- Most drivers do fairly simple things, which Linux provides APIs for.
- The real fun is when dealing with the hardware's quirks.
- It gets easier with practice ...
- ... but it never gets boring.

Questions?

Where To Get Help

- Google.
- Community resources: web sites and mailing lists.
- Distributed documentation (books, articles, magazines).
- Use The Source, Luke!
- Your fellow kernel hackers.

Bibliography

- kernelnewbies - http://www.kernelnewbies.org
- linux-kernel mailing list archives - http://marc.theaimsgroup.com/?l=linux-kernel&w=2
- Understanding the Linux Kernel, by Bovet and Cesati
- Linux Device Drivers, 3rd edition, by Rubini et al.
- Linux Kernel Development, 2nd edition, by Robert Love
- /usr/src/linux-xxx/
