Reconfigurable Computing: Architectures And Design Methods

T.J. Todman, G.A. Constantinides, S.J.E. Wilton, O. Mencer, W. Luk and P.Y.K. Cheung

Abstract: Reconfigurable computing is becoming increasingly attractive for many applications. This survey covers two aspects of reconfigurable computing: architectures and design methods. The paper includes recent advances in reconfigurable architectures, such as the Altera Stratix II and Xilinx Virtex 4 FPGA devices. The authors identify major trends in general-purpose and special-purpose design methods. It is shown that reconfigurable computing designs are capable of achieving up to 500 times speedup and 70% energy savings over microprocessor implementations for specific applications.

© IEE, 2005
IEE Proceedings online no. 20045086
doi: 10.1049/ip-cdt:20045086
Paper first received 14th July and in revised form 9th November 2004
T.J. Todman, O. Mencer and W. Luk are with the Department of Computing, Imperial College London, 180 Queen’s Gate, London SW7 2AZ, UK
G.A. Constantinides and P.Y.K. Cheung are with the Department of Electrical and Electronic Engineering, Imperial College London, Exhibition Rd, South Kensington, London SW7 2BT, UK
S.J.E. Wilton is with the Department of Electrical and Computer Engineering, University of British Columbia, 2356 Main Mall, Vancouver, British Columbia, Canada V6T 1Z4
E-mail: tjt97@doc.ic.ac.uk
IEE Proc.-Comput. Digit. Tech., Vol. 152, No. 2, March 2005

1 Introduction

Reconfigurable computing is rapidly establishing itself as a major discipline that covers various subjects of learning, including both computing science and electronic engineering. Reconfigurable computing involves the use of reconfigurable devices, such as field-programmable gate arrays (FPGAs), for computing purposes. Reconfigurable computing is also known as configurable computing or custom computing, since many of the design techniques can be seen as customising a computational fabric for specific applications [1].

Reconfigurable computing systems often have impressive performance. Consider, as an example, the point multiplication operation in elliptic curve cryptography. For a key size of 270 bits, it has been reported [2] that a point multiplication can be computed in 0.36 ms with a reconfigurable computing design implemented in an XC2V6000 FPGA at 66 MHz. In contrast, an optimised software implementation requires 196.71 ms on a dual-Xeon computer at 2.6 GHz; so the reconfigurable computing design is more than 540 times faster, although its clock is almost 40 times slower than that of the Xeon processors.

This example illustrates a hardware design implemented on a reconfigurable computing platform. We regard such implementations as a subset of reconfigurable computing, which in general can involve the use of runtime reconfiguration and soft processors.

Is this speed advantage of reconfigurable computing over traditional microprocessors a one-off or a sustainable trend? Recent research suggests that it is a trend rather than a one-off for a wide variety of applications: from image processing [3] to floating-point operations [4].

Sheer speed, while important, is not the only strength of reconfigurable computing. Another compelling advantage is reduced energy and power consumption. In a reconfigurable system, the circuitry is optimised for the application, such that the power consumption will tend to be much lower than that for a general-purpose processor. A recent study [5] reports that moving critical software loops to reconfigurable hardware results in average energy savings of 35% to 70% with an average speedup of 3 to 7 times, depending on the particular device used.

Other advantages of reconfigurable computing include a reduction in size and component count (and hence cost), improved time-to-market, and improved flexibility and upgradability. These advantages are especially important for embedded applications.
Indeed, there is evidence [6] that embedded systems developers show a growing interest in reconfigurable computing systems, especially with the introduction of soft cores which can contain one or more instruction processors [7–12].

In this paper, we present a survey of modern reconfigurable system architectures and design methods. Although we also provide background information on notable aspects of older technologies, our focus is on the most recent architectures and design methods, as well as the trends that will drive each of these areas in the near future. In other words, we intend to complement other survey papers [13–17] by:
(i) providing an up-to-date survey of material that appeared after the publication of the papers mentioned above;
(ii) identifying explicitly the main trends in architectures and design methods for reconfigurable computing;
(iii) examining reconfigurable computing from a perspective different from existing surveys, for instance classifying design methods as special-purpose and general-purpose;
(iv) offering various direct comparisons of technology options according to a selected set of metrics from different perspectives.

2 Background

Many of today’s computationally intensive applications require more processing power than ever before. Applications such as streaming video, image recognition and processing, and highly interactive services are placing new demands on the computation units that implement these applications. At the same time, the power consumption targets, the acceptable packaging and manufacturing costs, and the time-to-market requirements of these computation units are all decreasing rapidly, especially in the embedded hand-held devices market. Meeting these performance requirements under the power, cost and time-to-market constraints is becoming increasingly challenging.

In the following, we describe three ways of supporting such processing requirements: high-performance microprocessors, application-specific integrated circuits and reconfigurable computing systems.

High-performance microprocessors provide an off-the-shelf means of addressing the processing requirements described earlier. Unfortunately, for many applications, a single processor, even an expensive state-of-the-art processor, is not fast enough. In addition, the power consumption (100 W or more) and cost (possibly thousands of dollars) of state-of-the-art processors place them out of reach for many embedded applications. Even if microprocessors continue to follow Moore’s Law so that their density doubles every 18 months, they may still be unable to keep up with the requirements of some of the most aggressive embedded applications.

Application-specific integrated circuits (ASICs) provide another means of addressing these processing requirements. Unlike a software implementation, an ASIC implementation provides a natural mechanism for implementing the large amount of parallelism found in many of these applications. In addition, an ASIC circuit does not need to suffer from the serial (and often slow and power-hungry) instruction fetch, decode and execute cycle that is at the heart of all microprocessors. Furthermore, ASICs consume less power than reconfigurable devices.
Finally, an ASIC can contain just the right mix of functional units for a particular application; in contrast, an off-the-shelf microprocessor contains a fixed set of functional units which must be selected to satisfy a wide variety of applications.

Despite the advantages of ASICs, they are often infeasible or uneconomical for many embedded systems. This is primarily due to two factors: the cost of producing an ASIC, which is often due to the cost of the masks (up to $1 million [18]), and the time needed to develop a custom integrated circuit can both be unacceptable. Only for the very highest-volume applications would the improved performance and lower per-unit price warrant the high nonrecurring engineering (NRE) cost of designing an ASIC.

A third means of providing this processing power is a reconfigurable computing system. A reconfigurable computing system typically contains one or more processors and a reconfigurable fabric upon which custom functional units can be built. The processor(s) execute sequential and noncritical code, while code that can be efficiently mapped to hardware can be ‘executed’ by processing units that have been mapped to the reconfigurable fabric. Like a custom integrated circuit, the functions that have been mapped to the reconfigurable fabric can take advantage of the parallelism achievable in a hardware implementation. Also like an ASIC, the embedded system designer can produce the right mix of functional and storage units in the reconfigurable fabric, providing a computing structure that matches the application.

Unlike an ASIC, however, a new fabric need not be designed for each application. A given fabric can implement a wide variety of functional units. This means that a reconfigurable computing system can be built out of off-the-shelf components, significantly reducing the long design-time inherent in an ASIC implementation. Also unlike an ASIC, the functional units implemented in the reconfigurable fabric can change over time.
This means that as the environment or usage of the embedded system changes, the mix of functional units can adapt to better match the new environment. The reconfigurable fabric in a handheld device, for instance, might implement large matrix multiply operations when the device is used in one mode, and large signal processing functions when the device is used in another mode.

Typically, not all of the embedded system functionality needs to be implemented by the reconfigurable fabric. Only those parts of the computation that are time-critical and contain a high degree of parallelism need to be mapped to the reconfigurable fabric, while the remainder of the computation can be implemented by a standard instruction processor. The interface between the processor and the fabric, as well as the interface between the memory and the fabric, are therefore of the utmost importance. Modern reconfigurable devices are large enough to implement instruction processors within the programmable fabric itself: soft processors. These can be general purpose, or customised to a particular application; application-specific instruction processors and flexible instruction processors are two such approaches. Section 4.3.2 deals with soft processors in more detail.

Other devices show some of the flexibility of reconfigurable computers. Examples include graphics processor units and application-specific array processors. These devices perform well on their intended application, but cannot run more general computations, unlike reconfigurable computers and microprocessors.

Despite the compelling promise of reconfigurable computing, it has limitations of which designers should be aware. For instance, the flexible routing on the bit level tends to produce large silicon area and performance overhead when compared with ASIC technology.
Hence for large-volume production of designs in applications without the need for field upgrade, ASIC technology or gate array technology can still deliver higher-performance designs at lower unit cost than reconfigurable computing technology. However, since FPGA technology tracks advances in memory technology and has demonstrated impressive advances in the last few years, many are confident that the current rapid progress in FPGA speed, capacity and capability will continue, together with the reduction in price.

It should be noted that the development of reconfigurable systems is still a maturing field. There are a number of challenges in developing a reconfigurable system. We describe three such challenges below.

First, the structure of the reconfigurable fabric and the interfaces between the fabric, processor(s) and memory must be very efficient. Some reconfigurable computing systems use a standard field-programmable gate array [19–24] as a reconfigurable fabric, while others adopt custom-designed fabrics [25–36].

Another challenge is the development of computer-aided design and compilation tools that map an application to a reconfigurable computing system. This involves determining which parts of the application should be mapped to the fabric and which should be mapped to the processor, determining when and how often the reconfigurable fabric should be reconfigured, which changes the functional units implemented in the fabric, as well as the specification of algorithms for efficient mappings to the reconfigurable system.

In this paper, we provide a survey of reconfigurable computing, focusing our discussion on both the issues described above. In the following Section, we provide a survey of various architectures that are found useful for reconfigurable computing; material on design methods will follow.

3 Architectures

We shall first describe system-level architectures for reconfigurable computing. We then present various flavours of reconfigurable fabric. Finally we identify and summarise the main trends.

3.1 System-level architectures

A reconfigurable system typically consists of one or more processors, one or more reconfigurable fabrics, and one or more memories. Reconfigurable systems are often classified according to the degree of coupling between the reconfigurable fabric and the CPU. Compton and Hauck [14] present the four classifications shown in Fig. 1a–d. In Fig. 1a, the reconfigurable fabric is in the form of one or more stand-alone devices. The existing input and output mechanisms of the processor are used to communicate with the reconfigurable fabric. In this configuration, the data transfer between the fabric and the processor is relatively slow, so this architecture only makes sense for applications in which a significant amount of processing can be done by the fabric without processor intervention. Emulation systems often take on this sort of architecture [37, 38].

Figures 1b, c show two intermediate structures. In both cases, the cost of communication is lower than that of the architecture in Fig. 1a. Architectures of these types are described in [28, 29, 33, 35, 39–42].

Next, Fig. 1d shows an architecture in which the processor and the fabric are very tightly coupled; in this case, the reconfigurable fabric is part of the processor itself, perhaps forming a reconfigurable sub-unit that allows for the creation of custom instructions. Examples of this sort of architecture have been described in [30, 32, 36, 43].

Figure 1e shows a fifth organisation.
In this case, the processor is embedded in the programmable fabric. The processor can either be a ‘hard’ core [44, 45], or can be a ‘soft’ core which is implemented using the resources of the programmable fabric itself [7–12].

A summary of the above organisations can be found in Table 1. Note that the bandwidth is the theoretical maximum available to the CPU: for example, in Chess [30], we assume that each block RAM is being accessed at its maximum rate. Organisation (a) is by far the most common, and accounts for all commercial reconfigurable platforms.

Fig. 1 Five classes of reconfigurable systems. The first four are adapted from [14]
a External stand-alone processing unit
b Attached processing unit
c Co-processor
d Reconfigurable functional unit
e Processor embedded in a reconfigurable fabric

3.2 Reconfigurable fabric

The heart of any reconfigurable system is the reconfigurable fabric. The reconfigurable fabric consists of a set of reconfigurable functional units, a reconfigurable interconnect, and a flexible interface to connect the fabric to the rest of the system. In this Section, we review each of these components, and show how they have been used in both commercial and academic reconfigurable systems.

A common theme runs through this entire section: in each component of the fabric, there is a tradeoff between flexibility and efficiency. A highly flexible fabric is typically much larger and much slower than a less flexible fabric. On the other hand, a more flexible fabric is better able to adapt to the application requirements.

In the following discussions, we will see how this tradeoff has influenced the design of every part of every reconfigurable system. A summary of the main features of various architectures can be found in Table 2.

3.2.1 Reconfigurable functional units: Reconfigurable functional units can be classified as either coarse-grained or fine-grained.
A fine-grained functional unit can typically implement a single function on a single bit (or a small number of bits). The most common kind of fine-grained functional units are the small lookup tables that are used to implement the bulk of the logic in a commercial field-programmable gate array. A coarse-grained functional unit, on the other hand, is typically much larger, and may consist of arithmetic and logic units (ALUs) and possibly even a significant amount of storage. In this Section, we describe the two types of functional units in more detail.

Table 1: Summary of system architectures

Class | System | CPU to memory bandwidth, MB/s | Shared memory size | Fine grained or coarse grained | Example application
------|--------|-------------------------------|--------------------|--------------------------------|--------------------
(a) External stand-alone processing unit | RC2000 [46] | 528 | 152 MB | Fine grained | Video processing
(b), (c) Attached processing unit / co-processor | Pilchard [47] | 1064 | 20 kbytes | Fine grained | DES encryption
(b), (c) Attached processing unit / co-processor | Morphosys [35] | 800 | 2048 bytes | Coarse grained | Video compression
(d) Reconfigurable functional unit | Chess [30] | 6400 | 12288 bytes | Coarse grained | Video processing
(e) Processor embedded in a reconfigurable fabric | Xilinx Virtex II Pro [24] | 1600 | 1172 kB | Fine grained | Video compression

Table 2: Comparison of reconfigurable fabrics and devices

Fabric or device | Fine grained or coarse grained | Base logic component | Routing architecture | Embedded memory | Special features
-----------------|--------------------------------|----------------------|----------------------|-----------------|-----------------
Actel ProASIC+ [19] | Fine | 3-input block | Horizontal and vertical tracks | 256 × 9 bit blocks | Flash-based
Altera Excalibur [44] | Fine | 4-input lookup tables | Horizontal and vertical tracks | 2 kbit memory blocks | ARMv4T embedded processor
Altera Stratix II [20] | Fine/coarse | 8-input adaptive logic module | Horizontal and vertical tracks | 512 bits, 4 kbits, and 512 kbit blocks | DSP blocks
Garp [29] | Fine | Logic or arithmetic functions on four 2-bit input words | 2-bit buses in horizontal and vertical columns | External to fabric |
Xilinx Virtex II Pro [45] | Fine | 4-input lookup tables | Horizontal and vertical tracks | 18 kbit blocks | Embedded multipliers, PowerPC 405 processor
Xilinx Virtex II [24] | Fine | 4-input lookup tables | Horizontal and vertical tracks | 18 kbit blocks | Embedded multipliers
DReAM [48] | Coarse | 8-bit ALUs | 16-bit local and global buses | Two 16 × 8 dual port memory | Targets mobile applications
Elixent D-fabrix [27] | Coarse | 4-bit ALUs | 4-bit buses | 256 × 8 memory blocks |
HP Chess [30] | Coarse | 4-bit ALUs | 4-bit buses | 256 × 8 bit memories |
IMEC ADRES [31] | Coarse | 32-bit ALUs | 32-bit buses | Small register files in each logic component |
Matrix [32] | Coarse | 8-bit ALUs | Hierarchical 8-bit buses | 256 × 8 bit memories |
MorphoSys [35] | Coarse | ALU and multiplier, and shift units | Buses | External to fabric |
Piperench [28] | Coarse | 8-bit ALUs | 8-bit buses | External to fabric | Functional units arranged in ‘stripes’
RaPiD [26] | Coarse | ALUs | Buses | Embedded memory blocks |
Silicon Hive Avispa [34] | Coarse | ALUs, shifters, accumulators and multipliers | Buses | Five embedded memories |

Fig. 2 Fine-grained reconfigurable functional units
a Three-input lookup table
b Cluster of lookup tables

Many reconfigurable systems use commercial FPGAs as a reconfigurable fabric. These commercial FPGAs contain many three- to six-input lookup tables, each of which can be thought of as a very fine-grained functional unit. Figure 2a illustrates a lookup table; by shifting in the correct pattern of bits, this functional unit can implement any single function of up to three inputs; the extension to lookup tables with larger numbers of inputs is clear. Typically, lookup tables are combined into clusters, as shown in Fig. 2b. Figure 3 shows clusters in two popular FPGA families. Figure 3a shows a cluster in the Altera Stratix device; Altera calls these clusters ‘logic array blocks’ [20]. Figure 3b shows a cluster in the Xilinx architecture [24]; Xilinx calls these clusters ‘configurable logic blocks’ (CLBs). In the Altera diagram, each block labelled ‘LE’ is a lookup table, while in the Xilinx diagram, each ‘slice’ contains two lookup tables. Other commercial FPGAs are described in [19, 21–23].

Reconfigurable fabrics containing lookup tables are very flexible, and can be used to implement any digital circuit. However, compared to the coarse-grained structures in Section 3.2.2, these fine-grained structures have significantly more area, delay and power overhead. Recognising that these fabrics are often used for arithmetic purposes, FPGA companies have added additional features such as carry-chains and cascade-chains to reduce the overhead when implementing common arithmetic and logic functions.
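The behaviour of the three-input lookup table of Fig. 2a can be illustrated with a small software model. The following Python sketch is only an illustration under stated assumptions: the class name `LookupTable` and its methods are hypothetical, and the model ignores how the bits are shifted in, timing, and clustering. It shows the key idea that a k-input lookup table is a 2^k-entry memory whose contents select the function, with the inputs acting as the address.

```python
from itertools import product


class LookupTable:
    """Hypothetical model of a k-input lookup table (not a vendor API).

    The table holds 2**k configuration bits, one per input pattern.
    """

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("need exactly 2**num_inputs configuration bits")
        self.num_inputs = num_inputs
        self.config_bits = list(config_bits)

    @classmethod
    def from_function(cls, num_inputs, func):
        """Derive the configuration bits by tabulating func over all inputs."""
        bits = [int(bool(func(*pattern)))
                for pattern in product((0, 1), repeat=num_inputs)]
        return cls(num_inputs, bits)

    def evaluate(self, *inputs):
        """Use the inputs as an address into the configuration memory."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:
            # First input is the most significant address bit.
            address = (address << 1) | (bit & 1)
        return self.config_bits[address]


# The same structure implements different functions purely by loading
# different configuration bits.
majority = LookupTable.from_function(3, lambda a, b, c: (a + b + c) >= 2)
parity = LookupTable.from_function(3, lambda a, b, c: a ^ b ^ c)
```

In this model, reconfiguring the unit amounts to replacing the eight stored bits; the number of stored bits doubles with each additional input, which is one way to see why larger lookup tables cost more area.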
Figure 4 shows how the carry and cascade chains, as well as the ability to break a 4-input lookup table into four two-input lookup tables, can be exploited to efficiently implement carry-select adders [20]. The multiplexers and the exclusive-or gate in Fig. 4 are included as part of each logic array block, and need not be implemented using other lookup tables.

Fig. 3 Commercial logic block architectures
a Altera logic array block [20]
b Xilinx configurable logic block [24]

Fig. 4 Implementing a carry-select adder in an Altera Stratix device [20]. ‘LUT’ denotes ‘lookup table’

The example in Fig. 4 shows how the efficiency of commercial FPGAs can be improved by adding architectural support for common functions. We can go much further than this, though, and embed significantly larger, but far less flexible, reconfigurable functional units. There are two kinds of devices that contain coarse-grained functional units. First, modern FPGAs, which are primarily composed of fine-grained functional units, are increasingly being enhanced by the inclusion of larger blocks. As an example, the Xilinx Virtex device contains embedded 18-bit by 18-bit multiplier units [24]. When implementing algorithms requiring a large amount of multiplication, these embedded blocks can significantly improve the density, speed and power of the device. On the other hand, for algorithms which do not perform multiplication, these blocks are rarely useful. The Altera Stratix devices contain a larger but more flexible embedded block, called a DSP block, shown in Fig. 5 [20]. Each of these blocks can perform accumulate functions as well as multiply operations. The comparison between the two devices clearly illustrates the flexibility and overhead tradeoff: the Altera DSP block may be more flexible than the Xilinx multiplier; however, it consumes more chip area and runs somewhat slower.

Fig. 5 Altera DSP block [20]

The commercial FPGAs described above contain both fine-grained and coarse-grained blocks. Second, there are also devices which contain only coarse-grained blocks [25, 26, 28, 30, 31, 35]. An example of a coarse-grained architecture is the ADRES architecture shown in Fig. 6 [31]. Each reconfigurable functional unit in this device contains a 32-bit ALU which can be configured to implement one of several functions including addition, multiplication and logic functions, with two small register files. Clearly, such a functional unit is far less flexible than the fine-grained functional units described earlier; however, if the application requires functions which match the capabilities of the ALU, these functions can be very efficiently implemented in this architecture.

Fig. 6 ADRES reconfigurable functional unit [31]
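The carry-select adder mentioned in connection with Fig. 4 can be sketched in software to show why multiplexers and carry chains are useful primitives. The Python model below is a minimal sketch, not the Altera mapping: the function names, the 16-bit width and the 4-bit block size are assumptions chosen for illustration. Each block computes its sum twice, once for a carry-in of 0 and once for a carry-in of 1, and a multiplexer picks the right result once the real carry arrives.

```python
def ripple_add(a, b, carry_in, width):
    """Add two width-bit values one bit at a time; return (sum, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i          # sum bit (exclusive-or)
        carry = (x & y) | (carry & (x ^ y))     # carry to the next bit
    return result, carry


def carry_select_add(a, b, width=16, block=4):
    """Carry-select addition of two unsigned width-bit integers.

    width and block are illustrative assumptions, not device parameters.
    Returns (sum modulo 2**width, carry_out).
    """
    total = 0
    carry = 0
    for low in range(0, width, block):
        size = min(block, width - low)
        blk_mask = (1 << size) - 1
        a_blk = (a >> low) & blk_mask
        b_blk = (b >> low) & blk_mask
        # In hardware both candidates are computed in parallel, without
        # waiting for the carry from the lower blocks.
        sum0, cout0 = ripple_add(a_blk, b_blk, 0, size)
        sum1, cout1 = ripple_add(a_blk, b_blk, 1, size)
        # Multiplexer: the incoming carry selects one candidate.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        total |= blk_sum << low
    return total, carry
```

In hardware the benefit is that only the multiplexer selection, rather than a full ripple through every bit, lies on the path of the carry between blocks; the software model reproduces the structure but not the timing.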

3.2.2 Reconfigurable interconnects: Regardless of whether a device contains fine-grained functional units, coarse-grained functional units, or a mixture of the two, the functional units need to be connected in a flexible way. Again, there is a tradeoff between the flexibility of the interconnect (and hence the reconfigurable fabric) and the speed, area and power-efficiency of the architecture.

As before, reconfigurable interconnect architectures can be classified as fine-grained or coarse-grained. The distinction is based on the granularity with which wires are switched. This is illustrated in Fig. 7, which shows a flexible interconnect between two buses. In the fine-grained architecture in Fig. 7a, each wire can be switched independently, while in Fig. 7b the entire bus is switched as a unit. The fine-grained routing architecture in Fig. 7a is more flexible, since not every bit needs to be routed in the same way; however, the coarse-grained architecture in Fig. 7b contains far fewer programming bits, and hence suffers much less overhead.

Fig. 7 Routing architectures
a Fine-grained
b Coarse-grained

Fine-grained routing architectures are usually found in commercial FPGAs. In these devices, the functional units are typically arranged in a grid pattern, and they are connected using horizontal and vertical channels. Significant research has been performed in the optimisation of the topology of this interconnect [49, 50]. Coarse-grained routing architectures are commonly used in devices containing coarse-grained functional units. Figure 8 shows two examples of coarse-grained routing architectures: (a) the Totem reconfigurable system [25]; (b) the Silicon Hive reconfigurable system [34], which is less flexible but faster and smaller.

3.2.3 Emerging directions: Several emerging directions will be covered in the following. These directions include low-power techniques, asynchronous architectures and molecular microelectronics:
• Low-power techniques: Early work explores the use of low-swing circuit techniques to reduce the power consumption in a hierarchical interconnect for a low-energy FPGA [51]. Recent work involves: (a) activity reduction in power-aware design tools, with energy saving of 23% [52]; (b) leakage current reduction methods such as gate biasing and multiple supply-voltage integration, with up to two times leakage power reduction [53]; and (c) dual supply-voltage methods with the lower voltage assigned to noncritical paths, resulting in an average power reduction of 60% [54].
• Asynchronous architectures: There is an emerging interest in asynchronous FPGA architectures. An asynchronous version of Piperench [28] is estimated to improve performance by 80%, at the expense of a significant increase in configurable storage and wire count [55]. Other efforts in this direction include fine-grained asynchronous pipelines [56], quasi delay-insensitive architectures [57], and globally asynchronous locally synchronous techniques [58].
• Molecular microelectronics: In the long term, molecular techniques offer a promising opportunity for increasing the capacity and performance of reconfigurable computing architectures [59]. Current work is focused on developing programmable logic arrays based on molecular-scale nanowires [60, 61].

3.3 Architectures: main trends

The following summarises the main trends in architectures for reconfigurable computing.

3.3.1 Coarse-grained fabrics: As reconfigurable fabrics are migrated to more advanced technologies, the cost (in terms of both speed and power) of the interconnect part of a reconfigurable fabric is growing. Designers are responding to this by increasing the granularity of their logic units, thereby reducing the amount of interconnect needed.
In the Stratix II device, Altera moved away from simple 4-input lookup tables, and used a more complex logic block which can implement functions of up to 7 inputs. We should expect to see a slow migration to more complex logic blocks, even in stand-alone FPGAs.

Fig. 8 Example coarse-grained routing architectures
a Totem coarse-grained routing architecture [25]
b Silicon Hive coarse-grained routing architecture [34]

3.3.2 Heterogeneous functions: As devices are migrated to more advanced technologies, the number of transistors that can be devoted to the reconfigurable logic fabric increases. This provides new opportunities to embed complex nonprogrammable (or semi-programmable) functions, creating heterogeneous architectures with both general-purpose logic resources and fixed-function embedded blocks. Modern Xilinx parts have embedded 18 by 18 bit multipliers, while modern Altera parts have embedded DSP units which can perform a variety of multiply accumulate functions. Again, we should expect to see a migration to more heterogeneous architectures in the near future.

3.3.3 Soft cores: The use of ‘soft’ cores, particularly for instruction processors, is increasing. A ‘soft’ core is one in which the vendor provides a synthesisable version of the function, and the user implements the function using the reconfigurable fabric. Although this is less area- and speed-efficient than a hard embedded core, the flexibility and the ease of integrating these soft cores make them attractive. The extra overhead becomes less of a hindrance as the number of transistors devoted to the reconfigurable fabric increases. Altera and Xilinx both provide numerous soft cores, including soft instruction processors such as NIOS [7] and Microblaze [12]. Soft instruction processors have also been developed by a number of researchers, ranging from customisable JVM and MIPS processors [10] to ones specialised for machine learning [8] and data encryption [9].

4 Design methods

Hardware compilers for high-level descriptions are increasingly recognised to be the key to reducing the productivity gap for advanced circuit development in general, and for reconfigurable designs in particular. This Section looks at high-level design methods from two perspectives: special-purpose design and general-purpose design. Low-level design methods and tools, covering topics such as technology mapping, floor-planning, and place and route, are beyond the scope of this paper; interested readers are referred to [14].

4.1 General-purpose design

This Section describes design methods and tools based on a general-purpose programming language such as C, possibly adapted to facilitate hardware development. Of course, traditional hardware description languages like VHDL and Verilog are widely available, especially on commercial reconfigurable platforms.

A number of compilers from C to
