Introduction To NASM Programming


Introduction to NASM Programming
ICS312 Machine-Level and Systems Programming
Henri Casanova (henric@hawaii.edu)

Machine code
Each type of CPU understands its own machine language.
Instructions are numbers that are stored in bytes in memory. Each instruction has its unique numeric code, called the opcode.
Instructions of x86 processors vary in size: some may be 1 byte, some may be 2 bytes, etc.
Many instructions include operands as well (instruction = opcode + operands).
Example: on x86 there is an instruction to add the content of EAX to the content of EBX and to store the result back into EAX. This instruction is encoded (in hex) as: 03C3.
Clearly, this is not easy to read/remember.
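The two bytes 03 C3 carry both the opcode and the operands. The sketch below decodes the second byte (the ModRM byte in Intel's x86 encoding) into its mod, reg and r/m fields. It assumes opcode 03h is the "ADD r32, r/m32" form, where reg names the destination and r/m the source. Only the register-to-register case is handled, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* 32-bit register names, indexed by their 3-bit encoding in ModRM fields. */
static const char *const REG32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Fields of a ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    unsigned mod, reg, rm;
};

struct modrm decode_modrm(uint8_t byte) {
    struct modrm m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm  = byte & 0x7u;
    return m;
}

/* Decodes "03 /r" with mod == 3 (register to register) into text such as
   "add eax, ebx". Returns 0 on success, -1 for anything not handled. */
int decode_add_r32(const uint8_t code[2], char *out, size_t outlen) {
    struct modrm m;
    int n;
    if (code[0] != 0x03) return -1;   /* only ADD r32, r/m32 */
    m = decode_modrm(code[1]);
    if (m.mod != 3) return -1;        /* memory operands not handled */
    /* For opcode 03h: reg = destination, r/m = source. */
    n = snprintf(out, outlen, "add %s, %s", REG32[m.reg], REG32[m.rm]);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With C3 = 11 000 011 in binary, mod is 3 (both operands are registers), reg is 0 (EAX) and r/m is 3 (EBX), which gives "add eax, ebx".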

Assembly code
An assembly language program is stored as text.
Each assembly instruction corresponds to exactly one machine instruction.
This is not true of high-level programming languages: e.g., a function call in C corresponds to many, many machine instructions.
The instruction on the previous slide (EAX = EAX + EBX) is written simply as:
    add eax, ebx
Here "add" is the mnemonic, and "eax" and "ebx" are the operands.
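As a rough C model of what add eax, ebx computes (not how the CPU implements it), each register can be treated as a 32-bit unsigned value. The sum wraps around modulo 2^32 because EAX holds only 32 bits. The struct regs type and the function name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of two 32-bit x86 registers. */
struct regs {
    uint32_t eax;
    uint32_t ebx;
};

/* Models "add eax, ebx": EAX = EAX + EBX, EBX unchanged.
   uint32_t arithmetic wraps modulo 2^32, like the 32-bit register. */
void add_eax_ebx(struct regs *r) {
    r->eax = r->eax + r->ebx;
}
```

For example, with eax = 0xFFFFFFFF and ebx = 1, the result in eax is 0.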

Assembler
An assembler translates assembly code into machine code.
Assembly code is NOT portable across architectures: different ISAs, different assembly languages.
In this course we use the Netwide Assembler (NASM) to write 32-bit assembly code. See Homework #0 for getting NASM installed/running.
Note that different assemblers for the same processor may use slightly different syntaxes for the assembly code. The processor designers specify machine code, which must be adhered to 100%, but not assembly code syntax.

Comments
Before we learn any assembly, it's important to know how to insert comments into a source file.
Uncommented code is a really bad idea. Uncommented assembly is a really, really bad idea. In fact, commenting assembly is necessary.
With NASM, comments are added after a ';'. Example:
    add eax, ebx    ; y = y + b

Assembly directives
Most assemblers provide "directives" to do things that are not part of the machine code but are convenient:
Defining immediate constants
Including files: %include "some file"
If you know the C preprocessor, these are the same ideas as #define SIZE 100 or #include "stdio.h".
Say your code always uses the number 100 for a specific thing, say the "size" of an array. You can just put this in the NASM code:
    %define SIZE 100
Later on in your code you can do things like:
    mov eax, SIZE
Use %define whenever possible to avoid "code duplication", because code duplication is evil.
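For comparison, here is the C preprocessor version of the same idea. The constant is written once and every use refers to the name, so changing the size means editing a single line. The array and the function are illustrative only.

```c
#include <stddef.h>

#define SIZE 100   /* single definition of the "size" constant */

static int values[SIZE];   /* illustrative array sized by SIZE */

/* Fills the array with 0..SIZE-1 and returns its element count,
   computed from the array itself rather than from a repeated 100. */
size_t fill_values(void) {
    size_t n = sizeof values / sizeof values[0];
    for (size_t i = 0; i < n; i++)
        values[i] = (int)i;
    return n;
}
```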

NASM Program Structure
    ; include directives
    segment .data
    ; DX directives
    segment .bss
    ; RESX directives
    segment .text
    ; instructions

C Driver for Assembly code
Creating a whole program in assembly requires a lot of work, e.g., setting up all the segment registers correctly.
You will rarely write something in assembly from scratch, but rather only pieces of programs, with the rest of the programs written in higher-level languages like C.
In this class we will "call" our assembly code from C. The main C function is called a driver:
    int asm_main(void);   // defined in the assembly file

    int main()            // C driver
    {
      int ret_status;
      ret_status = asm_main();
      return ret_status;
    }
(Figure: the driver calls into assembly code such as "add eax, ebx" and "mov ebx, [edi]".)

So what's in the text segment?
The text segment defines the asm_main symbol:
    global asm_main   ; makes the symbol visible
    asm_main:         ; marks the beginning of asm_main
                      ; all instructions go here
On Windows, you need the '_' before asm_main (i.e., _asm_main), although in C the call is simply to "asm_main", not to "_asm_main".
On Linux you do not need the '_'.
I'll assume Linux from now on (e.g., in all the .asm files on the course's Web site).

NASM Program Structure
    ; include directives
    segment .data
    ; DX directives
    segment .bss
    ; RESX directives
    segment .text
    global asm_main
    asm_main:
    ; instructions

More on the text segment
Before and after running the instructions of your program there is a need for some "setup" and "cleanup".
We'll understand this later, but for now, let's just accept the fact that your text segment will always look like this:
    enter 0,0
    pusha
    ;
    ; Your program here
    ;
    popa
    mov eax, 0
    leave
    ret

NASM Skeleton File
    ; include directives
    segment .data
    ; DX directives
    segment .bss
    ; RESX directives
    segment .text
    global asm_main
    asm_main:
        enter 0,0
        pusha
        ; Your program here
        popa
        mov eax, 0
        leave
        ret

Our First Program
Let's just write a program that adds two 4-byte integers and writes the result to memory. Yes, this is boring, but we have to start somewhere.
The two integers are initially in the .data segment, and the result will be written in the .bss segment.

Our First Program
    segment .data
    integer1  dd    15          ; first int
    integer2  dd    6           ; second int
    segment .bss
    result    resd  1           ; result
    segment .text
    global asm_main
    asm_main:
        enter 0,0
        pusha
        mov eax, [integer1]     ; eax = int1
        add eax, [integer2]     ; eax = int1 + int2
        mov [result], eax       ; result = int1 + int2
        popa
        mov eax, 0
        leave
        ret
File ics312_first_v0.asm on the Web site
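A rough C analogue can help map each line. dd reserves and initializes a 4-byte integer, resd 1 reserves one uninitialized 4-byte slot, and the three instructions load, add and store. This is only an analogy: it assumes int32_t for the 4-byte values, and first_program is a made-up name.

```c
#include <stdint.h>

int32_t integer1 = 15;   /* like: integer1 dd 15  (.data, initialized) */
int32_t integer2 = 6;    /* like: integer2 dd 6   (.data, initialized) */
int32_t result;          /* like: result resd 1   (.bss, zero-filled)  */

/* Hypothetical C counterpart of the body of asm_main. */
int first_program(void) {
    int32_t eax;
    eax = integer1;        /* mov eax, [integer1] */
    eax = eax + integer2;  /* add eax, [integer2] */
    result = eax;          /* mov [result], eax   */
    return 0;              /* mov eax, 0: the value returned to the driver */
}
```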

I/O?
This is all well and good, but it's not very interesting if we can't "see" anything. We would like to:
Be able to provide input to the program
Be able to get output from the program
Also, debugging will be difficult, so it would be nice if we could tell the program to print out all register values, or to print out the content of some zones of memory.
Doing all this requires quite a bit of assembly code and requires techniques that we will not see for a while.
The author of our textbook provides a nice I/O package that we can just use, without understanding how it works for now.

asm_io.asm and asm_io.inc
The "PC Assembly Language" book comes with many add-ons and examples.
A very useful one is the I/O package, which comes as two files, downloadable from the course's Web site:
asm_io.asm (assembly code)
asm_io.inc (macro code)
Simple to use:
Assemble asm_io.asm into asm_io.o
Put %include "asm_io.inc" at the top of your assembly code
Link everything together into an executable

Simple I/O
Say we want to print the result integer in addition to having it stored in memory.
We can use the print_int "macro" provided in asm_io.inc/asm_io.asm.
This macro prints the content of the eax register, interpreted as an integer.
We invoke print_int as:
    call print_int
Let's modify our program.
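print_int sees only the 32 bits in EAX, so "interpreted as an integer" matters when the top bit is set. The C sketch below assumes print_int reads EAX as a signed two's-complement value, the way a printf-style "%i" would. That is an assumption about asm_io, not something shown in these slides, and the function name is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Formats the bit pattern held in "eax" as a signed 32-bit integer.
   Assumption: print_int treats EAX as signed two's complement. */
int format_eax_as_int(uint32_t eax, char *out, size_t outlen) {
    int32_t value;
    if (eax <= (uint32_t)INT32_MAX)
        value = (int32_t)eax;
    else
        /* Map 0x80000000..0xFFFFFFFF to INT32_MIN..-1 without relying on
           implementation-defined unsigned-to-signed conversion. */
        value = (int32_t)(eax - 0x80000000u) + INT32_MIN;
    return snprintf(out, outlen, "%" PRId32, value);
}
```

Under this assumption, EAX = 0xFFFFFFFF would be shown as -1 rather than 4294967295.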

Our First Program
    %include "asm_io.inc"
    segment .data
    integer1  dd    15          ; first int
    integer2  dd    6           ; second int
    segment .bss
    result    resd  1           ; result
    segment .text
    global asm_main
    asm_main:
        enter 0,0
        pusha
        mov eax, [integer1]     ; eax = int1
        add eax, [integer2]     ; eax = int1 + int2
        mov [result], eax       ; result = int1 + int2
        call print_int          ; print result
        popa
        mov eax, 0
        leave
        ret
File ics312_first_v1.asm on the Web site

How do we run the program?
Now that we have written our program, say in file ics312_first_v1.asm, using a text editor, we need to assemble it.
When we assemble a program we obtain an object file (a .o file).
We use NASM to produce the .o file:
    % nasm -f elf ics312_first_v1.asm -o ics312_first_v1.o
So now we have a .o file that is a machine code translation of our assembly code.
We also need a .o file for the C driver:
    % gcc -m32 -c driver.c -o driver.o
We generate a 32-bit object (our machines are likely 64-bit).
We also create asm_io.o by assembling asm_io.asm.
Now we have three .o files. We link them together to create an executable, again in 32-bit mode:
    % gcc -m32 driver.o ics312_first_v1.o asm_io.o -o ics312_first_v1
And voila... right?

The Big Picture (figure): Driver.c is compiled by gcc into Driver.o; File1.o, File2.o, File3.o and Driver.o are then linked by ld ("gcc") into an executable.

More I/O
(Figure: the EAX register; its lower 16 bits form AX, which is split into AH (high byte) and AL (low byte).)
print_char: prints out the character corresponding to the ASCII code stored in AL
print_string: prints out the content of the string stored at the address stored in EAX. The string must be null-terminated (last byte 00)
print_nl: prints a new line
read_int: reads an integer from the keyboard and stores it into EAX
read_char: reads a character from the keyboard and stores it into AL
Let us modify our code so that the two input integers are read from the keyboard, and so that more convenient messages are printed to the screen.
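Because print_char and read_char work on AL while the other routines use all of EAX, it helps to see how these registers overlap. The sketch below models EAX as a 32-bit value and extracts AX (low 16 bits), AH (bits 15..8) and AL (low 8 bits). The helper names are hypothetical.

```c
#include <stdint.h>

/* Sub-registers of EAX, modeled as bit fields of a 32-bit value. */
uint16_t ax_of(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  ah_of(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }
uint8_t  al_of(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Models writing only AL (e.g. "mov al, 41h"): the low byte changes,
   the upper 24 bits of EAX are kept. */
uint32_t set_al(uint32_t eax, uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

For EAX = 12345678h, this gives AX = 5678h, AH = 56h and AL = 78h.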

Our First Program
    %include "asm_io.inc"
    segment .data
    msg1      db    "Enter a number: ", 0
    msg2      db    "The sum of ", 0
    msg3      db    " and ", 0
    msg4      db    " is: ", 0
    segment .bss
    integer1  resd  1           ; first integer
    integer2  resd  1           ; second integer
    result    resd  1           ; result
    segment .text
    global asm_main
    asm_main:
        enter 0,0
        pusha
        mov eax, msg1           ; note that this is a pointer!
        call print_string
        call read_int           ; read the first integer
        mov [integer1], eax     ; store it in memory
        mov eax, msg1           ; note that this is a pointer!
        call print_string
        call read_int           ; read the second integer
        mov [integer2], eax     ; store it in memory
        mov eax, [integer1]     ; eax = first integer
        add eax, [integer2]     ; eax = eax + second integer
        mov [result], eax       ; store the result
        mov eax, msg2           ; note that this is a pointer
        call print_string
        mov eax, [integer1]     ; note that this is a value
        call print_int
        mov eax, msg3           ; note that this is a pointer
        call print_string
        mov eax, [integer2]     ; note that this is a value
        call print_int
        mov eax, msg4           ; note that this is a pointer
        call print_string
        mov eax, [result]       ; note that this is a value
        call print_int
        call print_nl
        popa
        mov eax, 0
        leave
        ret
File ics312_first_v2.asm on the Web site. Let's compile/run it.
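If the user enters 15 and 6, the print_string/print_int/print_nl calls above produce one line of output. The C sketch below builds that same line with snprintf to make the pointer-versus-value pattern explicit: the messages are passed by address, the integers by value. The function name is hypothetical. The text comes from msg2, msg3 and msg4, and the final newline stands in for print_nl.

```c
#include <stdio.h>

/* Same text as msg2, msg3, msg4 (C adds the terminating 0 byte). */
static const char msg2[] = "The sum of ";
static const char msg3[] = " and ";
static const char msg4[] = " is: ";

/* Hypothetical helper: builds the line printed by ics312_first_v2.
   Strings go in as pointers (like "mov eax, msg2"),
   integers as values (like "mov eax, [integer1]"). */
int format_sum_line(char *out, size_t outlen, int a, int b) {
    int result = a + b;   /* add eax, [integer2] */
    return snprintf(out, outlen, "%s%d%s%d%s%d\n",
                    msg2, a, msg3, b, msg4, result);
}
```

For inputs 15 and 6 the line is "The sum of 15 and 6 is: 21" followed by a newline.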

Our First Program
In the examples accompanying our textbook there is a very similar example of a first program (called first.asm).
So, this is great, but what if we had a bug to track? We will see that writing assembly code is very bug-prone.
It would be very cumbersome to rely on print statements to print out all registers, etc.
So asm_io.inc/asm_io.asm also provides two convenient macros for debugging!

dump_regs and dump_mem
The macro dump_regs prints out the bytes stored in all the registers (in hex), as well as the bits in the FLAGS register (only if they are set to 1):
    dump_regs 13
'13' above is an arbitrary integer that can be used to distinguish outputs from multiple calls to dump_regs.
The macro dump_mem prints out the bytes stored in memory (in hex). It takes three arguments:
An arbitrary integer for output identification purposes
The address at which memory should be displayed
The number of 16-byte segments that should be displayed, minus one
For instance:
    dump_mem 29, integer1, 3
prints out "29", and then (3+1)*16 = 64 bytes.
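The memory dump in the program output later in this section lays out each 16-byte segment as an address, 16 hex bytes, and the same bytes as characters. The C sketch below approximates that row layout. It is not the asm_io implementation: it assumes non-printable bytes are shown as '?', and the function names are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Formats one 16-byte row as: address, 16 hex bytes, then the bytes as
   quoted characters with '?' for non-printable ones (76 chars + '\0'). */
void format_dump_row(char out[80], uint32_t addr, const unsigned char row[16]) {
    int n = sprintf(out, "%08X", (unsigned)addr);
    for (int i = 0; i < 16; i++)
        n += sprintf(out + n, " %02X", (unsigned)row[i]);
    n += sprintf(out + n, " \"");
    for (int i = 0; i < 16; i++)
        out[n++] = isprint(row[i]) ? (char)row[i] : '?';
    out[n++] = '"';
    out[n] = '\0';
}

/* Bytes shown by "dump_mem id, addr, count": (count + 1) * 16. */
unsigned dump_mem_bytes(unsigned count) {
    return (count + 1) * 16;
}
```

dump_mem_bytes(3) returns 64, which matches the dump_mem 29, integer1, 3 example.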

Using dump_regs and dump_mem
To demonstrate the usage of these two macros, let's just write a program that highlights the fact that Intel x86 processors use Little Endian encoding.
We will do something ugly using 4 bytes:
Store a 4-byte hex quantity that corresponds to the ASCII codes of "live": "l" = 6Ch, "i" = 69h, "v" = 76h, "e" = 65h
Print that 4-byte quantity as a string
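The same experiment can be reproduced in C, assuming it runs on a little-endian machine such as x86. The sketch stores the 32-bit value 6C697665h, then copies its bytes in memory order into a null-terminated string. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copies the in-memory bytes of 0x6C697665 into out and adds a 0 byte,
   like the "bytes dd 06C697665h" / "end db 0" pair. On a little-endian
   CPU the least significant byte (65h = 'e') is stored first. */
void bytes_of_live(char out[5]) {
    uint32_t value = 0x6C697665u;
    memcpy(out, &value, sizeof value);
    out[4] = '\0';
}

/* Returns 1 if this machine stores the least significant byte first. */
int is_little_endian(void) {
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

On an x86 machine bytes_of_live yields "evil"; on a big-endian machine it would yield "live".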

Little-Endian Exposed
    %include "asm_io.inc"
    segment .data
    bytes     dd    06C697665h  ; "live"
    end       db    0           ; null
    segment .text
    global asm_main
    asm_main:
        enter 0,0
        pusha
        mov eax, bytes          ; note that this is an address
        call print_string       ; print the string at that address
        call print_nl           ; print a new line
        mov eax, [bytes]        ; load the 4-byte value into eax
        dump_mem 0, bytes, 1    ; display the memory
        dump_regs 0             ; display the registers
        popa
        mov eax, 0
        leave
        ret
File ics312_littleendian.asm on the site. Let's run it.

Output of the program
    evil
    Memory Dump # 0 Address 0804A020
    0804A020 65 76 69 6C 00 00 00 00 25 69 00 25 73 00 52 65 "evil?%i?%s?Re"
    0804A030 67 69 73 74 65 72 20 44 75 6D 70 20 23 20 25 64 "gister Dump # %d"
    Register Dump # 0
    EAX 6C697665 EBX B7747FF4 ECX BFBCB2C4 EDX BFBCB254
    ESI 00000000 EDI 00000000 EBP BFBCB208 ESP BFBCB1E8
    EIP 080484A4 FLAGS 0282 SF
The program prints "evil" and not "live" (and yes, it's "evil").
The address of "bytes" is 0804A020, so "bytes" starts at the beginning of the dump. The dump starts at address 0804A020 (a multiple of 16).
The bytes in EAX are in the "live" order (6C697665).
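EAX shows 6C697665 even though memory holds 65 76 69 6C. This is because a little-endian load treats the byte at the lowest address as the least significant. The portable C sketch below spells out that rule and gives the same result regardless of the host's byte order. load_le32 is a hypothetical name.

```c
#include <stdint.h>

/* Little-endian 32-bit load: p[0] is the least significant byte and
   p[3] the most significant, as when "mov eax, [bytes]" runs on x86. */
uint32_t load_le32(const unsigned char p[4]) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Feeding it the first four dumped bytes (65 76 69 6C) gives 6C697665h, the value shown for EAX.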

Conclusion
It is paramount for the assembly language programmer to understand the memory layout precisely.
We have seen the basics for creating an assembly language program, assembling it with NASM, linking it with a C driver, and running it.
