CS61c Summer 2014 Final Exam - University Of California .

Transcription

CS61c Summer 2014 Final ExamRead this first: This exam is marked out of 100 points, and amounts to 30% of your final grade. There are 9 questionsacross 16 pages in this exam. The last question is extra credit – it will not be included in our calculation of the final classcurve.You may use two double sided 8.5x11” page of handwritten notes, as well as the MIPS green sheet provided to you asreferences throughout this exam. Otherwise your only resource is what you know or can deduce during the examinationperiod. Answers should be placed inside the spaces provided on the test, and should be no longer than is necessary tocommunicate the solution. Misplaced or exceedingly verbose solutions may be marked down.You may find this exam difficult. If you do, then do not panic. Remember that different questions have different weights,and some may be more difficult than others. Answer those questions and subquestions in which you are most confidentfirst, then move on to more difficult questions. We recommend that you skim over the entire exam before attempting toanswer any questions. Also be sure to read all questions carefully before answering them, and do not hesitate to ask forclarifications if you are unclear on what a question is asking.There are some true-or-false questions in this exam. Correct answers to these questions will receive full credit, but incorrectanswers will full negative credit. E.g. if a T/F question is worth 1 point then a correct solution yields 1 points, butan incorrect answer yields -1 points. Note that you may get negative points in total on such a question if you chooserandomly.Your Name:Your TA:AndrewDavidFredHokeunKevinLogin:Login of the person to your left:Login of the person to your 2121612121101Time (minutes)212121212130212131801Score

2cs61c-M1: The Smorgasbord(a) How many bits do you need to represent all the characters in c?7, there are 127 characters in c, so we just need 7 bits to represent them.(b) If we were to modify the IEEE standard so that floats did not have an implicit leading 1, how would the number ofrepresentable numbers change (more, fewer, or no difference)? Briefly explain why.Fewer because encoding for many numbers are non-unique.(c) In an a.out file’s text section, you see the left six bits of an instruction is 0x04. As a result of executing thisinstruction:What is the most amount of bytes that your PC can change?217 bytes (off by one error is ok)What is the absolute least amount of bytes that it can change?0 bytes(d) Write two git commands that will track the file “final.c” and commit those changes with the message “DONE”.git add final.cgit commit -m “DONE”(e) Lets say we created a new form of MIPS where there are a total of 64 - 32 bit registers, but we still have a 32 bitaddress space. To make this change possible, we remove the rd field for R-types and assume that we will write tors. For example add t2, t1 adds t1 and t2 together and stores the result in t2. We use however many bitswe need for rs, rt and shamt and use the remaining bits for funct.How many different operations can we have with this new version of MIPS? (Leave your answer as an expression)Just for clarification: addu t1, t2 and addu t4, t5 are considered the same instruction29 2 6 1We removed the rd field, so we have an extra 5 bits. We use 1 bit each for rt and rs because we need to represent64 registers. The rest of the 3 bits then go to the funct. I and J types remain the same number of instructions.

3cs61c-M2: Call me once, return to you. Call me twice, return to . who?Here is a MIPS function to be deciphered. Annotate this function with the values of the specified registers after eachinstruction executes. Please use the following formats: R[15 : 0] - gives the less significant half of R. 0 13 - gives 13 consecutive 0s. 1 12 - gives 12 consecutive 1s. 0 28 1 0 3 - gives 0b1000 (8).mystery:addiu v0 ra 0 v0: R[31 : 0]la t0 mystery1addiu t2 0 -1 t2: 1 322sll t2 t2 28 t2: 1 4 0 283nor t2 0 t2 t2: 0 4 1 284and ra t2 v0 ra: 0 4 R[27 : 0]5sra ra ra 2 ra: 0 6 R[27 : 2]6lui t2 0x800 t2: 0 4 1 0 277or ra t2 ra ra: 0 4 1 0 R[27 : 2]sw ra 4( t0)jr v0Here mystery is called for the first time in the following way. Assume that we are not using a delayed-branching MIPS.0x00000100:0x00000104: foo:0x00000108:.jal mystery.(a) If mystery is only called this one time, what address will it return to?0x00000108(b) What value will it return?0x00000108Suppose mystery is called again in the following location.0x0000100C:0x00001010: bar:0x00001014:.jal mystery.

cs61c-4(c) If mystery is called this second time, what address will it return to?0x00000108(d) What value will it return?0x00001014(e) What is the maximum byte address that mystery can be called from for the first time still function as previouslydescribed? Please leave answers relative to powers of 2.228 8

5cs61c-M3: A is for apple, C is for cacheWe would like to apply the concept of caching to computed data, defining a simulated cache in software, to cache someexpensive computations. We will be looking at an implementation of a software N-way set associative LRU cache. Thecache sets are contiguously packed into an array of single-element blocks. Instead of entire cache lines, single computedvalues should be stored./* Defines a single one-element block in the cache. */typedef struct { int valid; int tag; int value; } block;/* Contains the entire cache’s settings and data. */typedef struct { block * blocks; size t size; size t associativity; } cache;/* Resizes the given cache to the given size and associativity* Also initializes the given cache if it was not initialized previously */void cache resize(cache * sim, size t new size, size t new associativity);/* Copies the contents of one cache to another cache. Assumes both caches* have been initialized. Does not necessarily respect the LRU ordering* from the source cache */void cache copy(cache * dst, cache * src);/* Returns the cached value of a tag or 0 if the tag is not in the cache* "contains" is used to alert when a tag was not found */int cache get(cache * sim, int tag, int * contains);/* Puts a value with the given tag into the cache* Also updates the LRU of the set it is put in */void cache put(cache * sim, int tag, int value);/* Updates the LRU of the set starting at array index "base index"* The LRU block at array index "base index way" is shifted up to "base index" */void cache update LRU(cache * sim, size t base index, size t way);Fill in all of the following sections. Each definition is pasted directly into every instance where it is used. You may useassume any memory allocations always succeed.SECTION Acalloc(new size,sizeof(block))SECTION Bsim- blocksSECTION Csim- blocksSECTION Dsim- blocks alt.blocksSECTION Esrc- blocks[n].validSECTION F*containsvoid cache resize(cache * sim, size t new size, size t new associativity) {cache alt;alt.size new size;alt.associativity sim- associativity new associativity;alt.blocks SECTION A;if(SECTION B) {cache copy(&alt, sim);}sim- size new size;free(SECTION C);SECTION D;}void cache copy(cache * dst, cache * src) {for(size t n 0; n src- size; n ) {if(SECTION E) {

6cs61c-cache put(dst, src- blocks[n].tag, src- blocks[n].value);}}}int cache get(cache * sim, int tag, int * contains) {size t base index (tag % (sim- size / sim- associativity)) * sim- associativity;for(size t n 0; n sim- associativity; n ) {if(sim- blocks[base index n].valid && sim- blocks[base index n].tag tag) {cache update LRU(sim,base index,n);SECTION F 1;return sim- blocks[base index].value;}}SECTION F 0;return 0;}The remaining function implementations are provided for your convenience, but have no sections in need of completing.void cache put(cache * sim, int tag, int value) {int contains 0;cache get(sim, tag, &contains);if(!contains) {size t base index (tag % (sim- size / sim- associativity)) * sim- associativity;block * place sim- blocks (base index sim- associativity-1);place- value value;place- tag tag;place- valid 1;cache update LRU(sim, base index, sim- associativity-1);}}void cache update LRU(cache * sim, size t base index, size t way) {block tmp sim- blocks[base index way];for(size t n way; n 0; n--) {sim- blocks[base index n] sim- blocks[base index n-1];}sim- blocks[base index] tmp;}

cs61c-7M4: Cache, gotta hit it! Missing it would be painful.(a) Following statements are related to cache optimization. Specify whether each of the following statements wouldgenerally be true or false. (Reminder: incorrect answers on T/F questions are penalized with negative credit)T / F – Assuming the same cache size and the same block size, increasing set associativity of a cache reducesconflict misses.T / F – Assuming the same set associativity and the same block size, increasing the size of a cache reducescompulsory misses.T / F – Smaller caches have shorter hit time than larger caches.T / F – Adding a lower level cache reduces miss penalty.T / F – Adding a lower level cache increases hit time.T / F – Increasing set associativity increases hit time.For following questions through (b) to (d), we will be working on a 32-bit byte addressed MIPS machine, with a singledirect mapped 1KiB cache, write-through and no-write allocate policies, and 16B blocks. The cache is initially cold foreach question.Consider the function matrix multiply listed below, which multiplies the matrix A by itself and stores the result intothe matrix C. Note that starting addresses of each Matrix are as in comments.#define N4int A[N * N]; // starts at address 0x10000int C[N * N]; // starts at address 0x20000void matrix multiply() {for (int i 0; i N; i ) {for (int j 0; j N; j ) {for (int k 0; k N; k ) {C[i*N j] A[i*N k] * A[k*N j];}}}}(b) What is the cache hit rate of the function matrix multiply while i 0?13/48(c) If our cache is two-way set-associative with LRU replacement policy, what would be the hit rate of the functionmatrix multiply while i 0?43/48(d) If our cache is two-way set-associative with LRU replacement policy and its block size is 32B, what would be thehit rate of the function matrix multiply while i 0?15/16 ( 45/48)

8cs61c-section*F1: True or false, but not the kind you like(a) Sometimes circuits are equivalent, even though they appear to be quite different. There is one group of equivalentcircuits listed below, circle the circuits that belong to that group. For full credit, you must show your work.ABCDOutABOutCDABOutCDABOutCDABOutCDFirst: A · B · C · DSecond: A · B C · D (A · B) · (C · D) A · B · C · DThird: A B · C D A B C DFourth: (A B) · (C D) (A · B) · (C · D)Fifth: A B C D A · B · C · DFirst, second, and fifth are equivalent.

9cs61c-(b) 2-input NOR gates are said to be complete because any Boolean function can be computed with them. Prove thisfact. Hint: implement a subset of the standard gates (AND, NOT, OR, NOR, NAND, XOR, XNOR) using justNOR gates, then apply a standard boolean algebra technique using these gates.NOTORAABOutOutAANDOutBUsing these three gates and applying the sum of products technique we can compute any boolean function.(c) We want to implement a very simple finite state machine that determines its next state by the result of and ANDoperation on the current state and the input. The output is always the current state. Assume registers have a CLKto Q delay of 5ns, a setup time of 2ns, and a hold time of 3ns. To achieve a clock rate of 25MHz, what is themaximum propogation delay that a NOR gate could have, assuming we are implementing AND as a combination ofone or more of the gates built in part (b)?Max Delay CLK-to-Q CL Setup TimeMax Freq 1/Max Delay1 40ns25MHz40ns 5ns x 2nsx 33nsOur implementation of AND has a critical path of 2 NOR gates, so each NOR gate must have a delay less than orequal to 33/2 16.5ns(d) Complete the state diagram for a finite state machine that outputs 1 if and only if it has just seen the input sequence101 and it has never seen the input sequence 001. You may add more arrows or more states as you see fit. Providea brief description of each state.ExampleInput : 1101010100101Output: 0001010100000Here is one example of a working implementation:

cs61c-10

11cs61c-F2: Datapath-ologyWe want to extend the familiar MIPS datapath to expedite memory operations such as memcpy, strcpy, memset, andsuch. We will implement 3 new R-Type instructions, all of which use counters to allow quickly scanning through memoryvalues. We will use two 0-indexed counters named NL and NS, respectively for loads and stores.rnResets both memory counters.lbn rd, rt, ( rs)Computes an address at NL offset from rs. The contents of memory at this address are loadedinto rt. NL is stored into rd. Then NL is incremented.sbn rd, rt, ( rs)Computes an address at NS offset from rs. The contents rt are stored into memory at thisaddress. NS is stored into rd. Then NS is incremented.For sanity’s sake, we will assume that rd and rt are never equal for the duration of this topic.Fill out the RTL descriptions for these three operations.rnNL 0; NS 0lbn rd, rt, ( rs)R[rt] MEM[R[rs] NL] ; R[rd] NLNL NL 1sbn rd, rt, ( rs)MEM[R[rs] NS] R[rt] ; R[rd] NSNS NS 1Before we implement these instructions, let’s review how they work. Complete the strcpy MIPS code below to copythe string at address a0 to address a1 using only TAL MIPS instructions and these new instructions. Write only oneinstruction in each of the three blank lines. Assume we are not using delayed-branch MIPS.strcpy:1loop:rnlbn 0, t0, ( a0)2sbn 0, t0, ( a1)3bne t0, 0, loopjr raNow, let’s implement these instructions in the datapath. First, we want to implement counters for use in the datapath.We are going to base our counters off of

You may use two double sided 8.5x11” page of handwritten notes, as well as the MIPS green sheet provided to you as referencesthroughoutthisexam. uringtheexamination period. Answers should be placed inside the spaces provided on the test, and should be no longer than is necessary to