FIXER: Flow Integrity Extensions For Embedded RISC-V


Trent Jaeger, Swaroop Ghosh, Aditya Basu, Asmit De
School of EECS, The Pennsylvania State University, University Park, USA
978-3-9819263-2-3/DATE19

Abstract: With the recent proliferation of Internet of Things (IoT) and embedded devices, there is a growing need to develop a security framework to protect such devices. RISC-V is a promising open source architecture that targets low-power embedded devices and SoCs. However, there is a dearth of practical and low overhead security solutions in the RISC-V architecture. Programs compiled using RISC-V toolchains are still vulnerable to code injection and code reuse attacks such as buffer overflow and return-oriented programming (ROP). In this paper, we propose FIXER, a hardware implemented security extension to RISC-V that provides a defense mechanism against such attacks. FIXER enforces fine-grained control-flow integrity (CFI) of running programs on backward edges (returns) and forward edges (calls) without requiring any architectural modifications to the RISC-V processor core. We implement FIXER on RocketChip, a RISC-V SoC platform, by leveraging the integrated Rocket Custom Coprocessor (RoCC) to detect and prevent attacks. Compared to existing software based solutions, FIXER reduces energy overhead by 60% at minimal execution time (1.5%) and area (2.9%) overheads.

Keywords: Buffer overflow, ROP, Shadow Stack, RISC-V

I. INTRODUCTION

A. Security Vulnerabilities

Traditional computing systems are inherently vulnerable to a wide attack surface from the topmost application level to the systems architecture level, leading to serious security and integrity concerns such as leaking private SSH keys or launching Denial-of-Service (DoS) attacks. Programming languages like C, which are closer to the hardware, provide a lot of flexibility in terms of memory and IO access to facilitate system and device level programming. However, this also means that such languages often tend to have inherent security deficiencies and can lead to vulnerabilities if not used with proper and secure practices. Buffer overflow is the most commonly exploited vulnerability and can cater to a wide attack surface. In a program without bounds checking, an adversary can overload a user input with excess data that overruns the buffer capacity and overwrites nearby memory locations with potentially malicious data (Fig. 1), leading to several attack scenarios, such as return oriented programming (ROP), VTable hijacking, function pointer manipulation and even violation of data flow in the program.

Fig. 1. Buffer overflow exploit. (Panels: the foo() and bar() code sequences, and the system stack with the foo() and bar() stack frames. foo() pushes the arguments for bar() and the return address on the stack, then jumps to bar(). The adversary injects a malicious payload into the buffer in the bar() stack frame, overwriting the local variables, the saved %ebp and the return address, so that the bar() epilogue jumps to the malicious location.)

B. Defense Mechanisms

Stack canaries [1] are sacrificial words placed on the stack at stack frame boundaries to detect potential return address overwriting. If an adversary overflows a buffer in order to overwrite the return address, the canary word will also be overwritten. Before returning in the call stack, the canary word is checked, and if it has been modified, the return address is assumed to be compromised, and the program is halted.

Data Execution Prevention (DEP) [2] is employed to prevent an adversary from injecting malicious code onto the stack. Memory pages are marked W⊕X, meaning that a page can either be executable (code) or be writable (stack, heap), but not both. This prevents an adversary from executing malicious code from the stack. However, an adversary can return to existing code in the program or functions in the linked library using gadget chains (return-to-libc attack).

Address Space Layout Randomization (ASLR) [3] randomizes the code, stack, heap, and shared library locations in the address space, to make it difficult for the adversary to determine the specific addresses and launch attacks. However, buffer over-read and side-channel vulnerabilities can be used by an adversary to reverse engineer the randomized addresses.

Control Flow Integrity (CFI) [4] involves statically computing a valid control flow graph (CFG) of the program and ensuring that during runtime, the program abides by that CFG. A coarse-grained approach to ensuring control flow integrity while returning from functions is the use of a shadow stack (a separate stack residing in a secure memory location) [5]. On each function call, the return address is saved on the shadow stack alongside being put on the stack normally. While returning from a function, the return address on the stack is validated against the one on the shadow stack. On mismatch, it is assumed that the return address has been compromised and the program execution is halted. However, a shadow stack can be expensive and can hurt performance, since the pages housing the shadow stack may not be present in cache and will require hundreds to thousands of cycles to bring the page into the cache and perform the validation. Several software and compiler level systems have been proposed in the literature for supporting a shadow stack [6-7].
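
The shadow-stack check described under Defense Mechanisms can be sketched in a few lines of C. This is only an illustrative software model, not the scheme of any cited system: the type `shadow_stack`, the functions `shadow_push` and `shadow_check`, and the capacity `SHADOW_DEPTH` are hypothetical names chosen here, and a real shadow stack would reside in memory the adversary cannot write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 1024 /* hypothetical capacity, chosen for illustration */

/* Hypothetical shadow stack; assumed to live in protected memory. */
typedef struct {
    uintptr_t slots[SHADOW_DEPTH];
    size_t top; /* number of return addresses currently saved */
} shadow_stack;

/* On each function call: save a second copy of the return address.
 * Returns false if the shadow stack is full. */
bool shadow_push(shadow_stack *s, uintptr_t ret_addr) {
    assert(s != NULL);
    if (s->top == SHADOW_DEPTH) {
        return false;
    }
    s->slots[s->top++] = ret_addr;
    return true;
}

/* On each return: pop the saved copy and compare it with the return
 * address found on the regular stack. A false result means the return
 * address is assumed compromised and execution should be halted. */
bool shadow_check(shadow_stack *s, uintptr_t ret_addr_on_stack) {
    assert(s != NULL);
    if (s->top == 0) {
        return false; /* nothing saved: treat as a violation */
    }
    return s->slots[--s->top] == ret_addr_on_stack;
}
```

In this sketch a caller would invoke `shadow_push` before transferring control and `shadow_check` just before returning. Every check costs an extra memory access, which is the overhead the paragraph above attributes to software shadow stacks.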

Even with the presence of a shadow stack, an adversary can bend the control flow of a program. To prevent such incorrect control flows for indirect calls, the program is first analyzed to compute a coarse-grained or fine-grained CFG [4]. A control flow policy matrix can then be created from the CFG that specifies the allowed call targets for each call site. During execution of the program, for each indirect call, the policy matrix is looked up to determine the validity of the call target. However, this approach still suffers from similar performance degradation if the policy resides in memory. Compile-time and runtime enforcement of CFI have been shown in [8-9]. Lazy CFI [10-11] can somewhat alleviate the performance loss, but that leaves room for generating false [...]

Secure hardware platforms, e.g., ARM TrustZone [12] and Intel Software Guard Extensions (SGX) [13], isolate the hardware so that access to system assets is restricted. Hardware acceleration of security validation has been proposed to partially address the performance impact while covering a subset of security threats, e.g., Intel CFI Enforcement Technology (CET) [14] to protect against control-flow hijacking. Intel Memory Protection Extensions (MPX) [15], with an extended instruction set architecture, was developed to prevent memory safety violations such as buffer overflow, heap overflow and pointer corruption. Intel Transactional Synchronization Extensions (TSX) [16] exposes and exploits hidden concurrency in multi-threaded applications. Intel PT [17] logs TSX events when a transaction begins, commits or aborts. It has been shown in [18] that tagging code and data with software-defined metadata and processing the tags on a custom designed processor can detect ROP, code reuse, buffer overflow, code injection, memory safety violation and pointer corruption. Although effective, this new architecture cannot be readily deployed due to its lack of re-configurability and its area, energy and performance overhead. Other hardware-assisted techniques to protect forward and backward edges in control flow are proposed in [19-22]. Data flow protection in stack and heap using hardware assistance is also proposed [23-24]. Specialized hardware stack redundancy systems have also been developed for embedded systems [25-28]; however, these are architecture dependent and cannot be updated post-deployment.

The common challenges associated with the existing secure hardware platforms include design overhead, lack of provisions to patch the design and keep pace with rapidly evolving threats, the need for code changes or instrumentation of the program binaries, compiler modifications, and lack of adaptability to adjust the security level at runtime as needed. Furthermore, these platforms are associated with a performance impact. To alleviate these issues, a decoupled architecture using hardware performance monitors implemented on a RISC-V coprocessor has been proposed in [29].

In this work, we propose Flow Integrity eXtensions for Embedded RISC-V (FIXER), a low energy, low overhead security solution that ensures integrity of backward and forward edge control flow of programs running on a RISC-V core. FIXER decouples the security architecture from the RISC-V core architecture, enabling a highly flexible security system design. In the target deployment platform, the unmodified RISC-V core will be a hard IP, while the dynamically reconfigurable FIXER coprocessor will be implemented on an on-chip FPGA. Such an approach has the potential to be scaled to hybrid processor designs, e.g., a Xeon+FPGA core [30]. In such designs, the primary core can be completely unmodified, while the re-configurable FPGA core can be utilized to implement the security architecture. The FPGA also provides the flexibility to change and update the security architecture in response to new threats, without a complete redesign of the primary computing core. With the number of vulnerabilities rapidly increasing, there is a demand for an efficient, low-power, flexible and scalable security solution that is sustainable for long periods of time. FIXER potentially unlocks the design capability to protect our systems from such cybersecurity threats. Software based CFI techniques are also limited by the size of the address space, which can be overcome by FIXER's flexible FPGA implementation. Compared to NILE [29], FIXER achieves better performance. Although NILE uses an unmodified RISC-V core similar to FIXER, the core-coprocessor interface is modified for the coprocessor to tap into more resources of the core. Table I shows a qualitative comparison of FIXER with the state-of-the-art memory protection solutions. The major contributions of this work are: (a) a decoupled and flexible coprocessor based design for security assurance; (b) enforcement of backward edge and forward edge CFI protection; (c) lower energy overhead than [29]; (d) ease of re-configurability to address new security threats and attacks.

TABLE I. QUALITATIVE COMPARISON OF FIXER WITH RELATED WORKS
Works compared: Canary [1], ASLR [3], CFI [4], PUMP [18], HAFIX [20], GRIFFIN [22], HDFI [24], NILE [29], FIXER.
Criteria: control flow hijacking protection; data flow hijacking protection; maintains high performance; low energy overhead; no architecture modifications; no source code pre-processing; no compiler modifications; software flexibility; hardware flexibility; dynamic patching.
(The per-cell marks of the table did not survive extraction.)

The paper is organized as follows: Section II provides an overview of the RocketChip and the Rocket Custom Coprocessor architecture. Section III describes the FIXER design flow and implementation. Experimental results are presented in Section IV. Security implications are discussed in Section V and conclusions are drawn in Section VI.

II. OVERVIEW OF THE ROCKETCHIP ARCHITECTURE

The FIXER architecture is based on Rocket Chip [31] (written in CHISEL [32]), an open source parameterized system-on-chip (SoC) design generator. We use the RocketChip generator to generate synthesizable RTL for the standard Rocket Core SoC, a six-stage single-issue in-order pipeline processor that executes the 64-bit scalar RISC-V ISA (Fig. 2(a)). The Rocket Tile consists of the scalar core, the L1 instruction and data caches, and the Rocket Custom Coprocessor (RoCC). The RoCC acts as a user customizable accelerator for the core and can be triggered by a set of custom instructions capable of communicating between the core and the RoCC over the RoCCIO interface.

Design, Automation And Test in Europe (DATE 2019)

Fig. 2. (a) RocketChip architecture; the FIXER coprocessor is also shown. (b) RoCC instruction encoding, from bit 31 down to bit 0: funct7 [31:25], rs2 [24:20], rs1 [19:15], xd [14], xs1 [13], xs2 [12], rd [11:7], opcode [6:0].

TABLE II. ROCC INSTRUCTION OPCODES
RoCC Instruction    Opcode
custom0             0001011
custom1             0101011
custom2             1011011
custom3             1111011

RoCC Instructions: In general, 32-bit RoCC instructions extend the RISC-V ISA and are encoded as shown in Fig. 2(b). The four custom instructions supported by Rocket Chip are shown in Table II. The xs1, xs2, and xd bits control reads and writes of the core registers by the RoCC instruction. If xs1 is 1, then the 64-bit value in the integer register specified by rs1 is passed to the RoCC. If the xs1 bit is clear, no value is passed over the RoCCIO interface. Similarly, the xs2 bit controls the read of the register specified by rs2. If the xd bit is 1 and rd is not 0, the core will wait for a value to be returned by the coprocessor over the RoCCIO after issuing the instruction to the coprocessor. The value is then written to the register specified by rd. If xd is 0 or rd is 0, the core will not wait for a value from the RoCC. The opcode field specifies the custom instruction for the RoCC, and the funct7 field further specifies a user-defined function implemented in the RoCC. The RoCC is responsible for signaling illegal instructions to the core.

RoCCIO Interface: The RoCC interacts with the Rocket core and the shared memory system via the standard RoCCIO interface (Fig. 2(a)). The core initiates a coprocessor command by passing the RoCC instruction directly to the coprocessor via inst, as well as the relevant register values via rs1 and rs2. If the instruction supplied to the RoCC sets the xd bit, then the RoCC must eventually supply a response value over the RoCC response interface via data.

III. FIXER SECURITY ARCHITECTURE

A. FIXER Design for Backward-Edge CFI

The first security primitive implemented in FIXER to prevent a memory corruption vulnerability is a Shadow Stack. C programs compiled with the GNU GCC toolchain for the RISC-V target architecture are not given any protection against memory corruption vulnerabilities such as buffer overflow. An adversary can provide malicious inputs to a program, overwrite the return address of a function and redirect the control flow of the program. The Shadow Stack security primitive can enforce CFI at the backward edge (returns from functions). The RoCC is used to implement the Shadow Stack, thus avoiding the need to modify the core system architecture. The Shadow Stack is designed as a hardware memory on the RoCC. Fig. 3 shows the steps for detecting a CFI violation using a Shadow Stack. The return address is pushed on the system stack by default when a function call is made in the program. At the same time, the same return address is sent to the RoCC using a RoCC custom instruction, to be pushed on the Shadow Stack as a backup. When returning from a function, the return address is popped from the system stack to the instruction pointer register for execution. During this return, the RoCC Shadow Stack is queried to retrieve the backup return address, which is compared against the one from the system stack. If they match, the program proceeds with normal execution; otherwise a potential memory corruption is detected and program execution is stopped.

Fig. 3. CFI violation detection using a Shadow Stack. (Panels: the system stack with the foo() and bar() stack frames, and the RoCC Shadow Stack. foo(): some code; push args for bar(); push return address on stack; push return address on RoCC Shadow Stack; jump to bar(). bar(): some code (adversary may inject payload here); retrieve return address from RoCC Shadow Stack; compare retrieved address with the return address on stack; on match, proceed with execution; on mismatch, throw a CFI error.)
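
The RoCC instruction layout described under "RoCC Instructions" can be made concrete with a small encoder and decoder. The sketch below is illustrative only: the field positions follow the standard RoCC layout of Fig. 2(b) and the opcodes of Table II, while the names `rocc_fields`, `rocc_encode`, `rocc_decode` and `rocc_core_waits` are hypothetical and are not part of Rocket Chip or of FIXER.

```c
#include <assert.h>
#include <stdint.h>

/* 7-bit opcodes of the four RoCC custom instructions (Table II). */
enum { CUSTOM0 = 0x0B, CUSTOM1 = 0x2B, CUSTOM2 = 0x5B, CUSTOM3 = 0x7B };

/* Hypothetical container for the fields of one RoCC instruction. */
typedef struct {
    uint8_t funct7; /* bits 31:25, user-defined function */
    uint8_t rs2;    /* bits 24:20 */
    uint8_t rs1;    /* bits 19:15 */
    uint8_t xd;     /* bit 14: core expects a response written to rd */
    uint8_t xs1;    /* bit 13: value of register rs1 is passed to the RoCC */
    uint8_t xs2;    /* bit 12: value of register rs2 is passed to the RoCC */
    uint8_t rd;     /* bits 11:7 */
    uint8_t opcode; /* bits 6:0 */
} rocc_fields;

/* Pack the fields into a 32-bit instruction word. */
uint32_t rocc_encode(rocc_fields f) {
    assert(f.funct7 < 128 && f.opcode < 128);
    assert(f.rs2 < 32 && f.rs1 < 32 && f.rd < 32);
    return ((uint32_t)f.funct7 << 25) | ((uint32_t)f.rs2 << 20) |
           ((uint32_t)f.rs1 << 15) | ((uint32_t)(f.xd & 1u) << 14) |
           ((uint32_t)(f.xs1 & 1u) << 13) | ((uint32_t)(f.xs2 & 1u) << 12) |
           ((uint32_t)f.rd << 7) | (uint32_t)f.opcode;
}

/* Unpack a 32-bit instruction word into its fields. */
rocc_fields rocc_decode(uint32_t w) {
    rocc_fields f;
    f.funct7 = (uint8_t)((w >> 25) & 0x7Fu);
    f.rs2 = (uint8_t)((w >> 20) & 0x1Fu);
    f.rs1 = (uint8_t)((w >> 15) & 0x1Fu);
    f.xd = (uint8_t)((w >> 14) & 1u);
    f.xs1 = (uint8_t)((w >> 13) & 1u);
    f.xs2 = (uint8_t)((w >> 12) & 1u);
    f.rd = (uint8_t)((w >> 7) & 0x1Fu);
    f.opcode = (uint8_t)(w & 0x7Fu);
    return f;
}

/* The core waits for a response only if xd is set and rd is not x0. */
int rocc_core_waits(rocc_fields f) {
    return f.xd != 0 && f.rd != 0;
}
```

With these definitions, encoding custom0 with funct7 = 0, rs1 = 5 (register t0) and xs1 = 1 gives 0x0002a00b, and encoding custom0 with funct7 = 1, rd = 5 and xd = 1 gives 0x0200428b. These are the two instruction words the paper quotes for its CFI instructions in Section III.B.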
Note that, compared to HAFIX [20], where the Shadow Stack is part of the core, FIXER implements it in the coprocessor, leaving the core architecture untouched. It is to be noted that FIXER is complementary to existing DEP protection, since the FIXER instructions must be tamperproof to ensure protection.

Fig. 4(a) details the software design flow for FIXER. The source code is first marked with CFI tags (for saving to the shadow stack and for validation) and compiled to intermediate assembly code using the RISC-V GNU toolchain. The assembly code is parsed by expanding the tags and injecting the required RoCC instructions into the assembly. The lifted assembly code is generated using a custom parsing script or a compiler pass and then assembled and linked to produce the fully compiled RISC-V binary. These steps are further elaborated in Section III.B.

Fig. 4(b) shows the hardware design flow for FIXER (coded in CHISEL [32] as a RoCC). The hardware implementation of FIXER in the RoCC is described in Section III.C. The relevant configuration files for the RoCC targeting the FPGA platform are also written. The RocketChip with the RoCC is then compiled with the RocketChip Generator to output the synthesizable Verilog code, from which the FPGA bitstream is compiled. The required RISC-V Linux system image, the FPGA devicetree and the generated bitstream are then deployed to the FPGA to run the RocketChip system. This FIXER assisted RocketChip system can successfully protect against CFI violations in RISC-V programs compiled with the FIXER assisted compilation process.

B. RISC-V Software Design with FIXER

Any program that needs backward-edge CFI enforcement is compiled and processed by the following steps:

Step 1 - Source code annotation: We annotate the function calls and returns with a special tag to indicate the sites where the enforcement needs to take place. We use a CFI_CALL tag before a function call and a corresponding CFI_RET tag just before the return from the called function, as shown in Fig. 5.

    void main() {
        ...
        CFI_CALL
        myFunc();
        ...
    }

    void myFunc() {
        ...
        CFI_RET
        return;
    }

Fig. 5. Source code annotation.

Step 2 - Tag expansion: We expand the CFI tags to actual RISC-V assembly instructions. During compilation, we intercept the intermediate assembly code of the program and inject the RoCC custom instructions to communicate with the RoCC. Fig. 6 shows the assembly instructions corresponding to CFI_CALL and CFI_RET, which are placed just before the call and jr ra (return) instructions respectively.

For CFI_CALL, we first retrieve the current value of the program counter from the instruction pointer register using the auipc instruction and add a 14-byte offset (instructions are variable length) to calculate the target return address. We save the computed return address in the temporary register t0. Then we craft the RoCC instruction cfi_call to push the return address from t0 to the Shadow Stack. A generic 32-bit RoCC instruction extends the RISC-V ISA and is encoded in the format shown in Fig. 2(b). There are four RoCC instructions available (custom0-3), which are identified by the 7-bit opcode field, as shown in Table II. The funct7 field can be used to further specify a particular function of the RoCC instruction. We use custom0 to implement the CFI instructions. We set the funct7 field to b'0000000 (0) for cfi_call and to b'0000001 (1) for cfi_ret. For cfi_call, we set the rs1 field to the t0 register (b'00101), where we temporarily stored the computed return address, and set the corresponding xs1 bit to 1. The final crafted instruction word for cfi_call is 0x0002a00b.

For CFI_RET, we set the funct7 field to b'0000001 (1) and set the rd field to the t0 temporary register (b'00101), along with the xd bit set to 1. The final crafted instruction word for cfi_ret is 0x0200428b. During a return from a function, the saved return address is popped from the system stack into the link register ra. We then use the cfi_ret custom instruction to retrieve the backup return address from the RoCC Shadow Stack into the temporary register t0. The value in t0 is then compared against the value in the register ra using the bne instruction. If they match, the execution proceeds by completing the return (jr ra: jump register); otherwise we throw a CFI error.

    # CFI CALL
    [listing missing in the source]

    # CFI RET
    .word 0x0200428b
    bne   t0, ra, cfi_error
    jr    ra

Fig. 6. Tag expansion.

Step 3 - Compilation: The final CFI enforced assembly code is passed to the compiler to assemble, link and generate the final executable binary of the program. No compiler modifications are necessary to embed the instructions in the final binary, since we provide the custom instruction as a binary instruction word, and the RoCC instruction format is already supported by the GNU toolchain.

Fig. 4. FIXER design flow in (a) software and (b) hardware. (Stages in (a): Source Code Annotation (mark CFI tags, generate assembly); Tag Expansion (parse asm, insert RoCC CFI instructions, lift assembly); Compilation (assemble, link, RISC-V binary). Stages in (b): FIXER Design (FIXER code in CHISEL, FPGA config); Synthesis (generate Verilog, synthesize Verilog, FPGA bitstream); Deployment (pack bin, generate devicetree, compile riscv-linux, flash FPGA).)

C. FIXER Hardware Implementation in RoCC

Fig. 7 shows the FIXER implementation in the RoCC. The program binary runs on the Rocket Core and sends RoCC instructions over the RoCCIO whenever a security validation is required. The RoCC instruction is first passed through the Cmd decoder, which extracts the individual fields of the RoCC instruction, and the contents of the two registers rs1 and rs2 if specified. The opcode field is decoded to the custom0 instruction in our implementation.
The funct7 field is decoded to distinguish a cfi_call from a cfi_ret.

For cfi_call, the content of core register t0 (the return address) is sent through the rs1[63:0] field of the RoCCIO interface. The shadow stack is implemented as an SRAM memory with 64-bit wide words. A top-of-stack register (ToS) holds the address of the top of the shadow stack. If a cfi_call is decoded, the content of the ToS register is incremented by 1. The updated value in the ToS register is used to decode the write address for the shadow stack. The value in the rs1 field is written to this address on the shadow stack. This operation is non-blocking, so the core can continue execution after issuing the cfi_call instruction. There is a command queue at the RoCCIO interface to prevent race conditions. If the instruction function is decoded as cfi_ret, then the ToS register is read to obtain the address for the shadow stack. This address is used to read the saved return address from the shadow stack memory. The value is then sent back to the core by writing to the data[63:0] field of the response interface of the RoCCIO, and the core writes the value to the t0 register, as indicated by the rd field of the RoCC instruction. Our proof-of-concept implementation of the shadow stack can accommodate 1000 addresses. However, this can be updated on demand by simply reconfiguring the FIXER module on the FPGA, a benefit exclusive to our implementation. The size of the shadow stack will be limited by the memory available on the target FPGA.

Fig. 7. FIXER implementation in RoCC.

D. Forward-edge Protection with FIXER

A shadow stack only protects control flow on return boundaries. However, programs often use function pointers to jump to multiple function addresses. To ensure the validity of such function calls using function pointers, a pre-computed call policy is enforced. A static or runtime analysis is performed on the program to construct a control flow graph (CFG). The CFG is represented as a policy matrix that indicates the valid call targets for each function call made using a function pointer. The policy matrix is loaded in memory, and at runtime it is queried to validate the call target for every indirect function call. This forward-edge protection is implemented as another FIXER security module (Fig. 7). The policy matrix memory is created in the RoCC along with peripheral caller and callee address decoders. Our proof-of-concept implementation has 64 rows in the matrix (each represents an originating call site address), and each row holds a 64-bit policy vector (each bit represents a call target address). A set (unset) bit indicates that the call is valid (invalid) for that (caller, callee) pair. A RoCC instruction cfi_matld is used to load the policy bitmap into the FIXER module prior to the program execution. A RoCC instruction cfi_fwd is inserted before every indirect function call in the source code. The cfi_fwd instruction sends the caller and the dereferenced function pointer (callee) addresses to the RoCC for validation.
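
A minimal software model of the forward-edge policy matrix may help make the lookup concrete. It is a sketch under stated assumptions, not the FIXER hardware: the paper does not say how the caller and callee address decoders map addresses to rows and bit positions, so the lookup tables used here, along with the names `policy_matrix`, `policy_add_caller`, `policy_add_callee`, `policy_load_row` and `policy_check`, are hypothetical. Only the 64 x 64 bitmap and the allow/disallow result come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define POLICY_DIM 64 /* 64 call sites x 64 call targets, as in the text */

/* Hypothetical model of the policy memory and its address decoders. */
typedef struct {
    uint64_t caller_addr[POLICY_DIM]; /* row index -> call-site address (assumed) */
    uint64_t callee_addr[POLICY_DIM]; /* bit index -> call-target address (assumed) */
    uint64_t row[POLICY_DIM];         /* one 64-bit policy vector per call site */
    int n_callers;
    int n_callees;
} policy_matrix;

/* Linear search standing in for an address decoder; returns -1 if unknown. */
static int policy_find(const uint64_t *table, int n, uint64_t addr) {
    for (int i = 0; i < n; i++) {
        if (table[i] == addr) {
            return i;
        }
    }
    return -1;
}

/* Register a call-site address; returns its row index. */
int policy_add_caller(policy_matrix *p, uint64_t addr) {
    assert(p->n_callers < POLICY_DIM);
    p->caller_addr[p->n_callers] = addr;
    return p->n_callers++;
}

/* Register a call-target address; returns its bit index. */
int policy_add_callee(policy_matrix *p, uint64_t addr) {
    assert(p->n_callees < POLICY_DIM);
    p->callee_addr[p->n_callees] = addr;
    return p->n_callees++;
}

/* Models loading one row of the policy bitmap (the role of cfi_matld). */
void policy_load_row(policy_matrix *p, int row_index, uint64_t bits) {
    assert(row_index >= 0 && row_index < POLICY_DIM);
    p->row[row_index] = bits;
}

/* Models the validation triggered by cfi_fwd: 1 = allow, 0 = disallow.
 * Addresses unknown to either decoder are disallowed (an assumption). */
int policy_check(const policy_matrix *p, uint64_t caller, uint64_t callee) {
    int r = policy_find(p->caller_addr, p->n_callers, caller);
    int c = policy_find(p->callee_addr, p->n_callees, callee);
    if (r < 0 || c < 0) {
        return 0;
    }
    return (int)((p->row[r] >> c) & 1u);
}
```

In this model each check is a table lookup followed by a single bit test. Growing the matrix means changing `POLICY_DIM` and the width of the row word, which loosely mirrors the reconfiguration the paper describes for the FPGA implementation.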
The forward-edge FIXER module then validates the action using the policy matrix and sends back a 1 or 0, indicating allow or disallow respectively. Similar to the shadow stack implementation, the policy matrix size can also be updated post-deployment by reconfiguring the FPGA.

IV. EXPERIMENTAL RESULTS

We implemented FIXER on a Xilinx Zynq FPGA. The hardware architecture of the security module is coded in CHISEL in the RocketChip Generator. The high level CHISEL code is translated to synthesizable Verilog code using the available tools in the RocketChip Generator. We prepared an FPGA system image using the generated Verilog and ran the system on a Zybo board. A sample program is written with 1 billion iterations of function calls and returns. One version of the code implements a simple software version of the shadow stack (softcfi). The software shadow stack is created as a regular stack in the address space. During function calls, the return address is simultaneously placed on the system stack as well as the shadow stack. Another version instruments the code with the proposed RoCC CFI instructions (FIXER). We compiled the baseline (base code with no CFI checks), the softcfi and the FIXER versions using the RISC-V GNU GCC compiler. The three versions of the program were run on the RocketChip system running on the FPGA. The base code takes 19 seconds to execute, whereas the software enforced CFI code takes 74 seconds. FIXER takes 29 seconds, resulting in a 1.5X overhead over the base code and a 2.55X lower overhead compared to the pure software enforcement. The FPGA draws 370mA of current when idle and 420mA under load (with the program running), a 1.13X increase. The corresponding energy overhead is 3.89X for the pure software enforced CFI and only 1.53X for FIXER (60.52% improvement). The FIXER RoCC module incurs only 2.9% area overhead over the vanilla RocketChip without RoCC.

Fig. 8. RISC-V benchmark evaluation for backward-edge protection w.r.t. (a) execution time (number of cycles), and (b) effective CPI. (Benchmarks legible in the source: median, qsort, vvadd, multiply, dhrystone.)

TABLE III. BENCHMARK INSTRUCTION [...] (The table body did not survive extraction.)

We evaluated FIXER by enforcing it on the set of benchmarks provided for testing the RISC-V architecture. The benchmarks are modified to create three versions for performance comparison: (i) the baseline with no CFI enforcement, (ii) the softcfi with the software based CFI enforcement, and (iii) the FIXER with RoCC based CFI protection. We ensured that the benchmark code remains the same across all three versions except for the CFI enforcement code. We compiled the benchmarks with the RISC-V GNU toolchain without any compiler optimizations and ran the compiled binaries on the Zybo FPGA board. Fig. 8 shows the evaluation results for backward-edge FIXER. The corresponding instruction overheads are shown in Table III. With the backward-edge protection, the execution time overhead with softcfi is 18% on average across the six benchmarks, compared to 1.5% with FIXER. The softcfi increases the CPI (cycles per instruction) by 4.6% over the baseline, while FIXER increases the CPI by only 0.5%. With the forward-edge protection, the execution time overhead with softcfi is 2% on average across the six benchmarks, compared to 0.61% with FIXER, and the CPI decreases by 0.4% on average, which is negligible.

V. SECURITY IMPLICATIONS

Performance vs. Security: FIXER is targeted at hybrid architectures, e.g., CPU+FPGA or ASIC+FPGA. Our current results are based on both the RocketChip and the RoCC accelerator being on the FPGA, since we do not have access to such an architecture. It is true that if the FPGA is off-chip, there could be performance degradation (due to the speed gap between CPU and FPGA) if the checking is performed in a synchronous and fine-grained manner. One way to reduce the performance issues is to make the checking asynchronous, by using interrupts. In such cases the program can continue execution until the FPGA raises an interrupt to halt the program. However, it cannot be guaranteed that the adversary has not been able to take control of the system before the FPGA detects the attack. When the FPGA is on-chip, e.g., Intel Xeon with embedded FPGA, the performance overheads can be alleviated thanks to the QuickPath Interconnect (QPI) interface between the core and the FPGA for fast communication.

Security Vulnerabilities and Limitations: FIXER enforces protection for a single process only. For simultaneous multi-process protection, the FIXER design can be expanded to accommodate multiple shadow stacks and policy memories for different processes. A round-robin scheduler on the FIXER module can assign the shadow stacks and policy memories to each process based on the process ID. The FIXER module on the FPGA also needs to be protected from tampering or data leaks. The current RocketChip implementation allows the entire code containing custom RoCC instructions to be run with supervisor privileges. However, this can be restricted via system calls so that the RoCC instru
