COMPILER CONSTRUCTION

Transcription

COMPILER CONSTRUCTION

William M. Waite
Department of Electrical Engineering
University of Colorado
Boulder, Colorado 80309
USA
email: William.Waite@colorado.edu

Gerhard Goos
Institut für Programmstrukturen und Datenorganisation
Fakultät für Informatik
Universität Karlsruhe
D-76128 Karlsruhe

Compiler Construction, a modern text written by two leaders in the field, demonstrates how a compiler is built. Describing the necessary tools and how to create and use them, the authors compose the task into modules, placing equal emphasis on the action and data aspects of compilation. Attribute grammars are used extensively to provide a uniform treatment of semantic analysis, competent code generation and assembly. The authors also explain how intermediate representations can be chosen automatically on the basis of attribute dependence. Thus, all aspects of the subject are presented in terms of a uniform model subject to automation. This approach imparts a vivid understanding of compilation and the decisions that must be made when designing a compiler.

From the back page of the printed book.

All known errors from the first and second printing (1994 and 1995) have been fixed. While every precaution has been taken in preparation of this book, the authors assume no responsibility for errors or omissions, or damages resulting from the use of the information contained here.

© 1984-1994 by Springer-Verlag, Berlin, New York Inc. ISBN 0-387-90821-8 and ISBN 3-540-90821
© 1995 by William M. Waite and Gerhard Goos.

All rights reserved. No part of this book may be translated, reproduced, archived or sold in any form without written permission from one of the authors.

The content of Compiler Construction is made available via the Web by permission of the authors as a service to the community and only for educational purposes. The book may be accessed freely via Web browsers. The URL is s/CompilerConstruction.ps.gz.

Karlsruhe, 22nd February 1996

To all who know more than one language

Preface

Compilers and operating systems constitute the basic interfaces between a programmer and the machine for which he is developing software. In this book we are concerned with the construction of the former. Our intent is to provide the reader with a firm theoretical basis for compiler construction and sound engineering principles for selecting alternate methods, implementing them, and integrating them into a reliable, economically viable product. The emphasis is upon a clean decomposition employing modules that can be re-used for many compilers, separation of concerns to facilitate team programming, and flexibility to accommodate hardware and system constraints. A reader should be able to understand the questions he must ask when designing a compiler for language X on machine Y, what tradeoffs are possible, and what performance might be obtained. He should not feel that any part of the design rests on whim; each decision must be based upon specific, identifiable characteristics of the source and target languages or upon design goals of the compiler.

The vast majority of computer professionals will never write a compiler. Nevertheless, study of compiler technology provides important benefits for almost everyone in the field.

- It focuses attention on the basic relationships between languages and machines. Understanding of these relationships eases the inevitable transitions to new hardware and programming languages and improves a person's ability to make appropriate tradeoffs in design and implementation.

- It illustrates application of software engineering techniques to the solution of a significant problem. The problem is understandable to most users of computers, and involves both combinatorial and data processing aspects.

- Many of the techniques used to construct a compiler are useful in a wide variety of applications involving symbolic data. In particular, every man-machine interface constitutes a form of programming language and the handling of input involves these techniques.
We believe that software tools will be used increasingly to support many aspects of compiler construction. Much of Chapters 7 and 8 is therefore devoted to parser generators and analyzers for attribute grammars. The details of this discussion are only interesting to those who must construct such tools; the general outlines must be known to all who use them. We also realize that construction of compilers by hand will remain an important alternative, and thus we have presented manual methods even for those situations where tool use is recommended.

Virtually every problem in compiler construction has a vast number of possible solutions. We have restricted our discussion to the methods that are most useful today, and make no attempt to give a comprehensive survey. Thus, for example, we treat only the LL and LR parsing techniques and provide references to the literature for other approaches. Because we do not constantly remind the reader that alternative solutions are available, we may sometimes appear overly dogmatic although that is not our intent.

Chapters 5 and 8, and Appendix B, state most theoretical results without proof. Although this makes the book unsuitable for those whose primary interest is the theory underlying a compiler, we felt that emphasis on proofs would be misplaced. Many excellent theoretical texts already exist; our concern is reduction to practice.

A compiler design is carried out in the context of a particular language/machine pair. Although the principles of compiler construction are largely independent of this context, the detailed design decisions are not. In order to maintain a consistent context for our major examples, we therefore need to choose a particular source language and target machine. The source language that we shall use is defined in Appendix A. We chose not to use an existing language for several reasons, the most important being that a new language enabled us to control complexity: Features illustrating significant questions in compiler design could be included while avoiding features that led to burdensome but obvious detail. It also allows us to illustrate how a compiler writer derives information about a language, and provides an example of an informal but relatively precise language definition.

We chose the machine language of the IBM 370 and its imitators as our target. This architecture is widely used, and in many respects it is a difficult one to deal with. The problems are representative of many computers, the important exceptions being those (such as the Intel 8086) without a set of general registers. As we discuss code generation and assembly strategies we shall point out simplifications for more uniform architectures like those of the DEC PDP11 and Motorola 68000.

We assume that the reader has a minimum of one year of experience with a block-structured language, and some familiarity with computer organization. Chapters 5 and 8 use notation from logic and set theory, but the material itself is straightforward.
Several important algorithms are based upon results from graph theory summarized in Appendix B.

This book is based upon many compiler projects and upon the lectures given by the authors at the Universität Karlsruhe and the University of Colorado. For self-study, we recommend that a reader with very little background begin with Section 1.1, Chapters 2 and 3, Section 12.1 and Appendix A. His objective should be to thoroughly understand the relationships between typical programming languages and typical machines, relationships that define the task of the compiler. It is useful to examine the machine code produced by existing compilers while studying this material. The remainder of Chapter 1 and all of Chapter 4 give an overview of the organization of a compiler and the properties of its major data structures, while Chapter 14 shows how three production compilers have been structured. From this material the reader should gain an appreciation for how the various subtasks relate to one another, and the important characteristics of the interfaces between them.

Chapters 5, 6 and 7 deal with the task of determining the structure of the source program. This is perhaps the best-understood of all compiler tasks, and the one for which the most theoretical background is available. The theory is summarized in Chapter 5, and applied in Chapters 6 and 7. Readers who are not theoretically inclined, and who are not concerned with constructing parser generators, should skim Chapter 5. Their objectives should be to understand the notation for describing grammars, to be able to deal with finite automata, and to understand the concept of using a stack to resolve parenthesis nesting. These readers should then concentrate on Chapter 6, Section 7.1 and the recursive descent parse algorithm of Section 7.2.2.

The relationship between Chapter 8 and Chapter 9 is similar to that between Chapter 5 and Chapter 7, but the theory is less extensive and less formal. This theory also underlies parts of Chapters 10 and 11.
We suggest that the reader who is actually engaged in compiler construction devote more effort to Chapters 8-11 than to Chapters 5-7. The reason is that parser generators can be obtained "off the shelf" and used to construct the lexical and syntactic analysis modules quickly and reliably. A compiler designer must typically devote most of his effort to specifying and implementing the remainder of the compiler, and hence familiarity with Chapters 8-11 will have a greater effect on his productivity.

The lecturer in a one-semester, three-hour course that includes exercises is compelled to restrict himself to the fundamental concepts. Details of programming languages (Chapter 2), machines (Chapter 3) and formal languages and automata theory (Chapter 5) can only be covered in a cursory fashion or must be assumed as background. The specific techniques for parser development and attribute grammar analysis, as well as the whole of Chapter 13, must be reserved for a separate course. It seems best to present theoretical concepts from Chapter 5 in close conjunction with the specific methods of Chapters 6 and 7, rather than as a single topic. A typical outline is:

1. The Nature of the Problem (4 hours)
   1.1. Overview of compilation (Chapter 1)
   1.2. Languages and machines (Chapters 2 and 3)
2. Compiler Data Structures (Chapter 4) (4 hours)
3. Structural Analysis (10 hours)
   3.1. Formal Systems (Chapter 5)
   3.2. Lexical analysis (Chapter 6)
   3.3. Parsing (Chapter 7)
Review and Examination (2 hours)
4. Consistency Checking (10 hours)
   4.1. Attribute grammars (Chapter 8)
   4.2. Semantic analysis (Chapter 9)
5. Code Generation (Chapter 10) (8 hours)
6. Assembly (Chapter 11) (2 hours)
7. Error Recovery (Chapter 12) (3 hours)
Review (2 hours)

The students do not write a compiler during this course. For several years it has been run concurrently with a practicum in which the students implement the essential parts of a LAX compiler. They are given the entire compiler, with stubs replacing the parts they are to write. In contrast to project courses in which the students must write a complete compiler, this approach has the advantage that they need not be concerned with unimportant organizational tasks. Since only the central problems need be solved, one can deal with complex language properties.
At the same time, students are forced to read the environment programs and to adhere to interface specifications. Finally, if a student cannot solve a particular problem it does not cause his entire project to fail since he can take the solution given by the instructor and proceed.

Acknowledgements

This book is the result of many years of collaboration. The necessary research projects and travel were generously supported by our respective universities, the Deutsche Forschungsgemeinschaft and the National Science Foundation.

It is impossible to list all of the colleagues and students who have influenced our work. We would, however, like to specially thank four of our doctoral students, Lynn Carter, Bruce Haddon, Uwe Kastens and Johannes Röhrich, for both their technical contributions and their willingness to read the innumerable manuscripts generated during the book's gestation. Mae Jean Ruehlman and Gabriele Sahr also have our gratitude for learning more than they ever wanted to know about computers and word processing as they produced and edited those manuscripts.


Contents

Preface
Contents
1 Introduction and Overview
   1.1 Translation and Interpretation
   1.2 The Tasks of a Compiler
   1.3 Data Management in a Compiler
   1.4 Compiler Structure
   1.5 Notes and References
2 Properties of Programming Languages
   2.1 Overview
      2.1.1 Syntax, Semantics and Pragmatics
      2.1.2 Syntactic Properties
      2.1.3 Semantic Properties
   2.2 Data Objects and Operations
      2.2.1 Elementary Types
      2.2.2 Composite Types
      2.2.3 Strings
      2.2.4 Pointers
      2.2.5 Type Equivalence
   2.3 Expressions
   2.4 Control Structures
   2.5 Program Environments and Abstract Machine States
      2.5.1 Constants, Variables and Assignment
      2.5.2 The Environment
      2.5.3 Binding
   2.6 Notes and References
3 Properties of Real and Abstract Machines
   3.1 Basic Characteristics
      3.1.1 Storage Classes
      3.1.2 Access Paths
      3.1.3 Operations
   3.2 Representation of Language Elements
      3.2.1 Elementary Objects
      3.2.2 Composite Objects
      3.2.3
