Computer Architecture

Study Notes

Alberto José Proença

1999/00



 
 

Appendix E: Survey of RISC Architectures
 
 

This Web extension is part of Computer Organization and Design: The Hardware/Software Interface, Second Edition (COD:HSI 2e), by David A. Patterson and John L. Hennessy.


If you believe that you have found an error in this extension, please send email to riscbugs@mkp.com. Include a description of the error, approximate location of error, the original text, and the corrected text. A $1 reward will be given to the first person who reports the error. Rewards are distributed once a year.
 

Contents

 E.1    Introduction
 E.2    Address Modes and Instruction Formats
 E.3    Instructions: The MIPS Core Subset
 E.4    Instructions: Multimedia Extensions of the Desktop/Server RISCs
 E.5    Instructions: Digital Signal Processing Extensions of the Embedded RISCs
 E.6    Instructions: Common Extensions to MIPS Core
 E.7    Instructions Unique to MIPS V
 E.8    Instructions Unique to Alpha
 E.9    Instructions Unique to SPARC v.9
 E.10  Instructions Unique to PowerPC
 E.11  Instructions Unique to PA-RISC 2.0
 E.12  Instructions Unique to ARM
 E.13  Instructions Unique to Thumb
 E.14  Instructions Unique to SuperH
 E.15  Instructions Unique to M32R
 E.16  Instructions Unique to MIPS16
 E.17  Concluding Remarks
 E.18  Acknowledgments
 E.19  References


E.1 Introduction

We cover two groups of reduced instruction set computer (RISC) architectures in this Web extension. The first group is the desktop and server RISCs: Alpha, MIPS, PA-RISC, PowerPC, and SPARC.

The second group is the embedded RISCs: ARM, Thumb, SuperH, M32R, and MIPS16.

There has never been another class of computers so similar. This similarity allows the presentation of 10 architectures in about 50 pages. Characteristics of the desktop RISCs are found in Figure E.1 and the embedded RISCs in Figure E.2.
 
 
 

Characteristic | Alpha | MIPS I | PA-RISC 1.1 | PowerPC | SPARC V8
Date announced | 1992 | 1986 | 1986 | 1993 | 1987
Instruction size (bits) | 32 | 32 | 32 | 32 | 32
Address space (size, model) | 64 bits, flat | 32 bits, flat | 48 bits, segmented | 32 bits, flat | 32 bits, flat
Data alignment | Aligned | Aligned | Aligned | Unaligned | Aligned
Data addressing modes | 1 | 1 | 5 | 4 | 2
Protection | Page | Page | Page | Page | Page
Minimum page size | 8 KB | 4 KB | 4 KB | 4 KB | 8 KB
I/O | Memory mapped | Memory mapped | Memory mapped | Memory mapped | Memory mapped
Integer registers (number, model, size) | 31 GPR x 64 bits | 31 GPR x 32 bits | 31 GPR x 32 bits | 32 GPR x 32 bits | 31 GPR x 32 bits
Separate floating-point registers | 31 x 32 or 31 x 64 bits | 16 x 32 or 16 x 64 bits | 56 x 32 or 28 x 64 bits | 32 x 32 or 32 x 64 bits | 32 x 32 or 32 x 64 bits
Floating-point format | IEEE 754 single, double | IEEE 754 single, double | IEEE 754 single, double | IEEE 754 single, double | IEEE 754 single, double
Figure E.1 Summary of the first version of five recent architectures for desktops and servers. Except for the number of data address modes and some instruction set details, the integer instruction sets of these architectures are very similar. Contrast this with Figure E.34. Later versions of these architectures all support a flat, 64-bit address space.


 
 

Characteristic | ARM | Thumb | SuperH | M32R | MIPS16
Date announced | 1985 | 1995 | 1992 | 1997 | 1996
Instruction size (bits) | 32 | 16 | 16 | 16/32 | 16/32
Address space (size, model) | 32 bits, flat | 32 bits, flat | 32 bits, flat | 32 bits, flat | 32/64 bits, flat
Data alignment | Aligned | Aligned | Aligned | Aligned | Aligned
Data addressing modes | 6 | 6 | 4 | 3 | 2
Integer registers (number, model, size) | 15 GPR x 32 bits | 8 GPR + SP, LR x 32 bits | 16 GPR x 32 bits | 16 GPR x 32 bits | 8 GPR + SP, RA x 32/64 bits
I/O | Memory mapped | Memory mapped | Memory mapped | Memory mapped | Memory mapped
Figure E.2 Summary of five recent architectures for embedded applications. Except for number of data address modes and some instruction set details, the integer instruction sets of these architectures are similar. Contrast this with Figure E.34.

Notice that the embedded RISCs tend to have 8 to 16 general-purpose registers while the desktop/server RISCs have 32, and that the length of instructions is 16 to 32 bits in embedded RISCs but always 32 bits in desktop/server RISCs.

Although shown as separate embedded instruction set architectures, Thumb and MIPS16 are really optional modes of ARM and MIPS invoked by call instructions. When in this mode they execute a subset of the native architecture using 16-bit-long instructions. These 16-bit instruction sets are not intended to be full architectures, but they are enough to encode most procedures. Both machines expect procedures to be homogeneous, with all instructions in either 16-bit mode or 32-bit mode. Programs will consist of procedures in 16-bit mode for density or in 32-bit mode for performance.

One complication of this description is that some of the older RISCs have been extended over the years. We decided to describe the latest version of the architectures: Alpha version 3, MIPS V, PA-RISC 2.0, and SPARC version 9 for the desktop machines; ARM version 4, Thumb version 1, Hitachi SuperH SH-3, M32R version 1, and MIPS16 version 1 for the embedded ones.

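The remaining sections proceed as follows. After discussing the addressing modes and instruction formats of our RISC architectures, we present the survey of the instructions in five steps: instructions in the MIPS core subset (section E.3); multimedia extensions of the desktop/server RISCs (section E.4); digital signal processing extensions of the embedded RISCs (section E.5); common extensions to the MIPS core (section E.6); and instructions unique to each architecture (sections E.7 through E.16).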

We give the evolution of the instruction sets in the final section and conclude with a speculation about future directions for RISCs.

E.2 Addressing Modes and Instruction Formats

Figure E.3 shows the data addressing modes supported by the desktop architectures. Since all have one register that always has the value 0 when used in address modes, the absolute address mode with limited range can be synthesized using zero as the base in displacement addressing. (This register can be changed by ALU operations in PowerPC; it is always 0 in the other machines.) Similarly, register-indirect addressing is synthesized by using displacement addressing with an offset of 0. Simplified addressing modes are one distinguishing feature of RISC architectures.
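The synthesis of absolute and register-indirect addressing from displacement addressing can be sketched in a few lines. This is only an illustrative model: the function name `displacement_address`, the 32-entry register list, and the 32-bit address width are assumptions made for the example, not part of any of the architectures described here.

```python
def displacement_address(regs, base, offset):
    """Effective address for register + offset (displacement) addressing.

    Assumes a 32-bit address space for the sake of the example.
    """
    return (regs[base] + offset) & 0xFFFFFFFF


# Hypothetical register file in which register 0 always reads as 0.
regs = [0] * 32
regs[5] = 0x1000

# Absolute addressing (limited range): use the zero register as the base.
absolute = displacement_address(regs, 0, 0x40)

# Register-indirect addressing: use an offset of 0.
indirect = displacement_address(regs, 5, 0)

# Ordinary displacement addressing with a negative offset.
based = displacement_address(regs, 5, -8)
```

The range of the synthesized absolute mode is limited by the width of the offset field, which is why the text calls it absolute addressing "with limited range."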
 

Addressing mode | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Register + offset (displacement or based) | X | X | X | X | X
Register + register (indexed) | --- | X (FP) | X (Loads) | X | X
Register + scaled register (scaled) | --- | --- | X | --- | ---
Register + offset and update register | --- | --- | X | X | ---
Register + register and update register | --- | --- | X | X | ---
Figure E.3 Summary of data addressing modes supported by the desktop architectures. PA-RISC also has short address versions of the offset addressing modes. MIPS V has indexed addressing for floating-point loads and stores. (These addressing modes are described in Figure 2.5, page 75.) 


 
 

Addressing mode | ARMv4 | Thumb | SuperH | M32R | MIPS16
Register + offset (displacement or based) | X | X | X | X | X
Register + register (indexed) | X | X | X | --- | ---
Register + scaled register (scaled) | X | --- | --- | --- | ---
Register + offset and update register | X | --- | --- | --- | ---
Register + register and update register | X | --- | --- | --- | ---
Register indirect | --- | --- | X | X | ---
Autoincrement, autodecrement | X | X | X | X | ---
PC-relative data | X | X (loads) | X | --- | X (loads)
Figure E.4 Summary of data addressing modes supported by the embedded architectures. SuperH and M32R have separate register indirect and register + offset addressing modes rather than just putting 0 in the offset of the latter mode. This increases the use of 16-bit instructions in the M32R, and it gives a wider set of addressing modes to different data transfer instructions in SuperH. To get greater addressing range, ARM and Thumb shift the offset left 1 or 2 bits if the data size is halfword or word. (These addressing modes are described in Figure 2.5, page 75.)

Figure E.4 shows the data addressing modes supported by the embedded architectures. Unlike the desktop RISCs, these embedded machines do not reserve a register to contain 0. Although most have two to three simple addressing modes, ARM and SuperH have several, including fairly complex calculations. ARM has an addressing mode that can shift one register by any amount, add it to the other registers to form the address, and then update one register with this new address.

References to code are normally PC-relative, although jump register indirect is supported for returning from procedures, for case statements, and for pointer function calls. One variation is that PC-relative branch addresses are shifted left 2 bits before being added to the PC for the desktop RISCs, thereby increasing the branch distance. This works because the length of all instructions for the desktop RISCs is 32 bits and instructions must be aligned on 32-bit words in memory. Embedded architectures with 16-bit-long instructions usually shift the PC-relative address by 1 for similar reasons.
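The effect of shifting the branch field can be illustrated with a small calculation. The sketch below is a simplified model under stated assumptions: the helper names, the 16-bit and 8-bit field widths, and the use of the branch's own address as the base are all chosen for illustration; real architectures differ in field width and in which PC value the offset is added to.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def branch_target(pc, field, field_bits, shift):
    """PC-relative target: sign-extend the field, shift it left, add to the PC."""
    return (pc + (sign_extend(field, field_bits) << shift)) & 0xFFFFFFFF


# Hypothetical 16-bit field, 32-bit instructions: shift left by 2.
forward = branch_target(0x1000, 0x7FFF, 16, 2)
backward = branch_target(0x1000, 0xFFFF, 16, 2)

# Hypothetical 8-bit field, 16-bit instructions: shift left by 1.
short_forward = branch_target(0x1000, 0x7F, 8, 1)

# Without the shift, the same 16-bit field reaches only a quarter as far.
unshifted = branch_target(0x1000, 0x7FFF, 16, 0)
```

Because aligned instruction addresses always end in zero bits, the shift spends no field bits on them, so a field of the same width reaches four times as far with 32-bit instructions and twice as far with 16-bit instructions.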

Figure E.5 shows the format of the desktop RISC instructions, which includes the size of the address in the instructions. Each instruction set architecture uses these four primary instruction formats. Figure E.6 shows the six formats for the embedded RISC machines. The desire to have smaller code size via 16-bit instructions leads to more instruction formats.
 

Figure E.5 Instruction formats for desktop RISC architectures. These four formats are found in all five architectures. (The superscript notation in this figure means the width of a field in bits.) Although the register fields are located in similar pieces of the instruction, be aware that the destination and two source fields are scrambled. Op = the main opcode, Opx = an opcode extension, Rd = the destination register, Rs1 = source register 1, Rs2 = source register 2, and Const = a constant (used as an immediate or as an address). Unlike the other RISCs, Alpha has a format for immediates in arithmetic and logical operations that is different from the data transfer format shown here. It provides an 8-bit immediate in bits 20 to 13 of the RR format, with bits 12 to 5 remaining as an opcode extension. 


 
 


Figure E.6 Instruction formats for embedded RISC architectures. These six formats are found in all five architectures. The notation is the same as Figure E.5. Note the similarities in branch, jump, and call formats, and the diversity in register-register, register-immediate, and data transfer formats. The differences result from whether the architecture has 8 or 16 registers, whether it is a 2- or 3-operand format, and whether the instruction length is 16 or 32 bits. 


 
 

Format: instruction category | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Branch: all | Sign | Sign | Sign | Sign | Sign
Jump/call: all | Sign | --- | Sign | Sign | Sign
Register-immediate: data transfer | Sign | Sign | Sign | Sign | Sign
Register-immediate: arithmetic | Zero | Sign | Sign | Sign | Sign
Register-immediate: logical | Zero | Zero | --- | Zero | Sign
Figure E.7 Summary of constant extension for desktop RISCs. The constants in the jump and call instructions of MIPS are not sign-extended since they only replace the lower 28 bits of the PC, leaving the upper 4 bits unchanged. PA-RISC has no logical immediate instructions. 


 
 

Format: instruction category | ARMv4 | Thumb | SuperH | M32R | MIPS16
Branch: all | Sign | Sign | Sign | Sign | Sign
Jump/call: all | Sign | Sign/Zero | Sign | Sign | ---
Register-immediate: data transfer | Zero | Zero | Zero | Sign | Zero
Register-immediate: arithmetic | Zero | Zero | Sign | Sign | Zero/Sign
Register-immediate: logical | Zero | --- | Zero | Zero | ---
Figure E.8 Summary of constant extension for embedded RISCs. The 16-bit length instructions have much shorter immediates than those of the desktop RISCs, typically only 5 to 8 bits. Most embedded RISCs, however, have a way to get a long address for procedure calls from two sequential halfwords. The constants in the jump and call instructions of MIPS are not sign-extended since they only replace the lower 28 bits of the PC, leaving the upper 4 bits unchanged. The 8-bit immediates in ARM can be rotated right an even number of bits between 2 and 30, yielding a large range of immediate values. For example, all powers of 2 are immediates in ARM. 
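The claim about ARM immediates in the caption above can be checked with a short program. This is a sketch, not ARM's actual encoder: the function names are invented for the example, and it assumes only what the caption states, namely that an immediate is an 8-bit value rotated right by an even number of bits (with a rotation of 0 leaving the value unchanged).

```python
def rotate_right32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def is_arm_immediate(value):
    """True if value equals some 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Rotating right by (32 - rotation) undoes a rotate right by `rotation`.
        undone = rotate_right32(value, (32 - rotation) % 32)
        if undone < 256:
            return True
    return False


# Every power of 2 that fits in 32 bits is representable.
all_powers_ok = all(is_arm_immediate(1 << k) for k in range(32))
```

Values whose set bits span more than eight positions, such as 0x101, or that would need an odd rotation, such as 0x1FE, are not representable in this model.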

Figures E.7 and E.8 show the variations in extending constant fields to the full width of the registers. In this subtle point, the RISCs are similar but not identical.
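The difference between the "Sign" and "Zero" entries matters only when the top bit of the constant field is set. The following sketch shows both extensions for a 16-bit field widened to 32 bits; the function names and the chosen widths are assumptions for illustration.

```python
def zero_extend(field, bits):
    """Widen an immediate by filling the upper register bits with zeros."""
    return field & ((1 << bits) - 1)


def sign_extend(field, bits, width=32):
    """Widen an immediate by copying its top bit into the upper register bits."""
    field &= (1 << bits) - 1
    if field & (1 << (bits - 1)):
        field -= 1 << bits  # negative value in two's complement
    return field & ((1 << width) - 1)


imm = 0x8000  # a 16-bit immediate whose top bit is set
print(hex(zero_extend(imm, 16)))  # 0x8000
print(hex(sign_extend(imm, 16)))  # 0xffff8000
```

Sign extension suits arithmetic and address offsets, where negative constants are common; zero extension suits logical operations, where the constant is a bit mask.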

E.3 Instructions: The MIPS Core Subset

The similarities of each architecture allow simultaneous descriptions, starting with the operations equivalent to the MIPS core.

MIPS Core Instructions

Almost every instruction found in the MIPS core is found in the other architectures, as Figures E.9 through E.13 show. (For reference, definitions of the MIPS core instructions are found in Figure 4.42 of Chapter 4 and on the back inside cover of the book.) Instructions are listed under four categories: data transfer (Figure E.9); arithmetic, logical (Figure E.10); control (Figure E.11); and floating point (Figure E.12). A fifth category (Figure E.13) shows conventions for register usage and pseudoinstructions on each architecture. If a MIPS core instruction requires a short sequence of instructions in other architectures, these instructions are separated by semicolons in Figures E.9 through E.13. (To avoid confusion, the destination register will always be the leftmost operand in this Web extension, independent of the notation normally used with each architecture.) Figures E.14 through E.17 show the equivalent listing for embedded RISCs. Note that floating point is generally not defined for the embedded RISCs.
 

Data transfer (instruction formats) | R-I | R-I | R-I, R-R | R-I, R-R | R-I, R-R
Instruction name | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Load byte signed | LDBU; SEXTB | LB | LDB; EXTRW,S 31,8 | LBZ; EXTSB | LDSB
Load byte unsigned | LDBU | LBU | LDB, LDBX, LDBS | LBZ | LDUB
Load halfword signed | LDWU; SEXTW | LH | LDH; EXTRW,S 31,16 | LHA | LDSH
Load halfword unsigned | LDWU | LHU | LDH, LDHX, LDHS | LHZ | LDUH
Load word | LDLS | LW | LDW, LDWX, LDWS | LW | LD
Load SP float | LDS* | LWC1 | FLDWX, FLDWS | LFS | LDF
Load DP float | LDT | LDC1 | FLDDX, FLDDS | LFD | LDDF
Store byte | STB | SB | STB, STBX, STBS | STB | STB
Store halfword | STW | SH | STH, STHX, STHS | STH | STH
Store word | STL | SW | STW, STWX, STWS | STW | ST
Store SP float | STS | SWC1 | FSTWX, FSTWS | STFS | STF
Store DP float | STT | SDC1 | FSTDX, FSTDS | STFD | STDF
Read, write special registers | MF_, MT_ | MF, MT_ | MFCTL, MTCTL | MFSPR, MF_, MTSPR, MT_ | RD, WR, RDPR, WRPR, LDXFSR, STXFSR
Move integer to FP register | ITOFS | MFC1/ DMFC1 | STW; FLDWX | STW; LDFS | ST; LDF
Move FP to integer register | FTTOIS | MTC1/ DMTC1 | FSTWX; LDW | STFS; LW | STF; LD
Figure E.9 Desktop RISC data transfer instructions equivalent to MIPS core. A sequence of instructions to synthesize a MIPS instruction is shown separated by semicolons. If there are several choices of instructions equivalent to MIPS core, they are separated by commas. For this figure, halfword is 16 bits and word is 32 bits. Note that in Alpha, LDS converts single-precision floating point to double precision and loads the entire 64-bit register. 

Arithmetic, logical (instruction formats) | R-R, R-I | R-R, R-I | R-R, R-I | R-R, R-I | R-R, R-I
Instruction name | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Add | ADDL | ADDU, ADDIU | ADDL, LDO, ADDI, UADDCM | ADD, ADDI | ADD
Add (trap if overflow) | ADDLV | ADD, ADDI | ADDO, ADDIO | ADDO; MCRXR; BC | ADDcc; TVS
Sub | SUBL | SUBU | SUB, SUBI | SUBF | SUB
Sub (trap if overflow) | SUBLV | SUB | SUBTO, SUBIO | SUBF/oe | SUBcc; TVS
Multiply | MULL | MULT, MULTU | SHiADD; ...; (i=1,2,3) | MULLW, MULLI | MULX
Multiply (trap if overflow) | MULLV | --- | SHiADDO; ...; | --- | ---
Divide | --- | DIV, DIVU | DS; ...; DS | DIVW | DIVX
Divide (trap if overflow) | --- | --- | --- | --- | ---
And | AND | AND, ANDI | AND | AND, ANDI | AND
Or | BIS | OR, ORI | OR | OR, ORI | OR
Xor | XOR | XOR, XORI | XOR | XOR, XORI | XOR
Load high part register | LDAH | LUI | LDIL | ADDIS | SETHI (B fmt.)
Shift left logical | SLL | SLLV, SLL | DEPW, Z 31-i,32-i | RLWINM | SLL
Shift right logical | SRL | SRLV, SRL | EXTRW, U 31, 32-i | RLWINM 32-i | SRL
Shift right arithmetic | SRA | SRAV, SRA | EXTRW, S 31, 32-i | SRAW | SRA
Compare | CMPEQ, CMPLT, CMPLE | SLT/U, SLTI/U | COMB | CMP(I)CLR | SUBcc r0,...
Figure E.10 Desktop RISC arithmetic-logical instructions equivalent to MIPS core. Dashes mean the operation is not available in that architecture, or not synthesized in a few instructions. Such a sequence of instructions is shown separated by semicolons. If there are several choices of instructions equivalent to MIPS core, they are separated by commas. Note that in the "Arithmetic, logical" category all machines but SPARC use separate instruction mnemonics to indicate an immediate operand; SPARC offers immediate versions of these instructions but uses a single mnemonic. (Of course these are separate opcodes!) 

Control (instruction formats) | B, J/C | B, J/C | B, J/C | B, J/C | B, J/C
Instruction name | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Branch on integer compare | B_ (<, >, <=, >=, =, not=) | BEQ, BNE, B_Z (<, >, <=, >=) | COMB, COMIB | BC | BR_Z, BPcc (<, >, <=, >=, =, not=)
Branch on floating-point compare | FB_ (<, >, <=, >=, =, not=) | BC1T, BC1F | FSTWX f0; LDW t; BB t | BC | FBPfcc (<, >, <=, >=, =, ...)
Jump, jump register | BR, JMP | J, JR | BL r0, BLR r0 | B, BCLR, BCCTR | BA, JMPL r0,...
Call, call register | BSR | JAL, JALR | BL, BLE | BL, BLA, BCLRL, BCCTRL | CALL, JMPL
Trap | CALL_PAL GENTRAP | BREAK | BREAK | TW, TWI | Ticc, SIR
Return from interrupt | CALL_PAL REI | JR; ERET | RFI, RFIR | RFI | DONE, RETRY, RETURN
Figure E.11 Desktop RISC control instructions equivalent to MIPS core. If there are several choices of instructions equivalent to MIPS core, they are separated by commas. 

Floating point (instruction formats) | R-R | R-R | R-R | R-R | R-R
Instruction name | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Add single, double | ADDS, ADDT | ADD.S, ADD.D | FADD, FADD/dbl | FADDS, FADD | FADDS, FADDD
Subtract single, double | SUBS, SUBT | SUB.S, SUB.D | FSUB, FSUB/dbl | FSUBS, FSUB | FSUBS, FSUBD
Multiply single, double | MULS, MULT | MUL.S, MUL.D | FMPY, FMPY/dbl | FMULS, FMUL | FMULS, FMULD
Divide single, double | DIVS, DIVT | DIV.S, DIV.D | FDIV, FDIV/dbl | FDIVS, FDIV | FDIVS, FDIVD
Compare | CMPT_ (=, <, <=, UN) | C_.S, C_.D (<, >, <=, >=, =, ...) | FCMP, FCMP/dbl (<, =, >) | FCMP | FCMPS, FCMPD
Move R-R | ADDT Fd,F31, Fs | MOV.S, MOV.D | FCPY | FMV | FMOVS/D/Q
Convert (single,double,integer) to (single,double,integer) | CVTST, CVTTS, CVTTQ, CVTQS, CVTQT | CVT.S.D, CVT.D.S, CVT.S.W, CVT.D.W, CVT.W.S, CVT.W.D | FCNVFF,s,d, FCNVFF,d,s, FCNVXF,s,s, FCNVXF,d,d, FCNVFX,s,s, FCNVFX,d,s | ---, FRSP, ---, FCTIW, ---, --- | FSTOD, FDTOS, FSTOI, FDTOI, FITOS, FITOD
Figure E.12 Desktop RISC floating-point instructions equivalent to MIPS core. Dashes mean the operation is not available in that architecture, or not synthesized in a few instructions. If there are several choices of instructions equivalent to MIPS core, they are separated by commas. 

Conventions | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Register with value 0 | r31 (source) | r0 | r0 | r0 (addressing) | r0
Return address register | (any) | r31 | r2, r31 | link (special) | r31
No-op | LDQ_U r31, ... | SLL r0, r0, r0 | OR r0, r0, r0 | ORI r0, r0, #0 | SETHI r0,0
Move R-R integer | BIS ..., r31, ... | ADD ..., r0, ... | OR ..., r0, ... | OR rx, ry, ry | OR ..., r0, ...
Operand order | OP Rs1, Rs2, Rd | OP Rd, Rs1, Rs2 | OP Rs1, Rs2, Rd | OP Rd, Rs1, Rs2 | OP Rs1, Rs2, Rd
Figure E.13 Conventions of desktop RISC architectures equivalent to MIPS core.

Instruction name | ARMv4 | Thumb | SuperH | M32R | MIPS16
Data transfer (instruction formats) | DT | DT | DT | DT | DT
Load byte signed | LDRSB | LDRSB | MOV.B | LDB | LB
Load byte unsigned | LDRB | LDRB | MOV.B; EXTU.B | LDUB | LBU
Load half word signed | LDRSH | LDRSH | MOV.W | LDH | LH
Load half word unsigned | LDRH | LDRH | MOV.W; EXTU.W | LDUH | LHU
Load word | LDR | LDR | MOV.L | LD | LW
Store byte | STRB | STRB | MOV.B | STB | SB
Store half word | STRH | STRH | MOV.W | STH | SH
Store word | STR | STR | MOV.L | ST | SW
Read, write special registers | MRS, MSR | ---¹ | LDC, STC | MVFC, MVTC | MOVE
Figure E.14 Embedded RISC data transfer instructions equivalent to MIPS core. A sequence of instructions to synthesize a MIPS instruction is shown separated by semicolons. Note that floating point is generally not defined for the embedded RISCs. Thumb and MIPS16 are just 16-bit instruction subsets of the ARM and MIPS architectures, so machines can switch modes and execute the full instruction set. We use ---¹ to show sequences that are available in 32-bit mode but not 16-bit mode in Thumb or MIPS16.

Arithmetic, logical (instruction formats) | R-R, R-I | R-R, R-I | R-R, R-I | R-R, R-I | R-R, R-I
Instruction name | ARMv4 | Thumb | SuperH | M32R | MIPS16
Add | ADD | ADD | ADD | ADD, ADDI, ADD3 | ADDU, ADDIU
Add (trap if overflow) | ADDS; SWIVS | ADD; BVC .+4; SWI | ADDV | ADDV, ADDV3 | ---¹
Subtract | SUB | SUB | SUB | SUB | SUBU
Subtract (trap if overflow) | SUBS; SWIVS | SUB; BVC .+1; SWI | SUBV | SUBV | ---¹
Multiply | MUL | MUL | MUL | MUL | MULT, MULTU
Multiply (trap if overflow) |  |  |  |  | ---
Divide | --- | --- | DIV1, DIVoS, DIVoU | DIV, DIVU | DIV, DIVU
Divide (trap if overflow) | --- | --- |  |  | ---
And | AND | AND | AND | AND, AND3 | AND
Or | ORR | ORR | OR | OR, OR3 | OR
Xor | EOR | EOR | XOR | XOR, XOR3 | XOR
Load high part register | --- | --- |  | SETH | ---¹
Shift left logical | LSL³ | LSL² | SHLL, SHLLn | SLL, SLLI, SLL3 | SLLV, SLL
Shift right logical | LSR³ | LSR² | SHRL, SHRLn | SRL, SRLI, SRL3 | SRLV, SRL
Shift right arithmetic | ASR³ | ASR² | SHRA, SHAD | SRA, SRAI, SRA3 | SRAV, SRA
Compare | CMP, CMN, TST, TEQ | CMP, CMN, TST | CMP/cond, TST | CMP/I, CMPU/I | CMP/I², SLT/I, SLT/IU
Figure E.15 Embedded RISC arithmetic-logical instructions equivalent to MIPS core. Dashes mean the operation is not available in that architecture, or not synthesized in a few instructions. Such a sequence of instructions is shown separated by semicolons. If there are several choices of instructions equivalent to MIPS core, they are separated by commas. Thumb and MIPS16 are just 16-bit instruction subsets of the ARM and MIPS architectures, so machines can switch modes and execute the full instruction set. We use ---¹ to show sequences that are available in 32-bit mode but not 16-bit mode in Thumb or MIPS16. The superscript 2 shows new instructions found only in 16-bit mode of Thumb or MIPS16, such as CMP/I². ARM includes shifts as part of every data operation instruction, so the shifts with superscript 3 are just a variation of a move instruction, such as LSR³.

Control (instruction formats) | B, J, C | B, J, C | B, J, C | B, J, C | B, J, C
Instruction name | ARMv4 | Thumb | SuperH | M32R | MIPS16
Branch on integer compare | B/cond | B/cond | BF, BT | BEQ, BNE, BC, BNC, B__Z | BEQZ², BNEZ², BTEQZ², BTNEZ²
Jump, jump register | MOV pc,ri | MOV pc,ri | BRA, JMP | BRA, JMP | B², JR
Call, call register | BL | BL | BSR, JSR | BL, JL | JAL, JALR, JALX²
Trap | SWI | SWI | TRAPA | TRAP | BREAK
Return from interrupt | MOVS pc, r14 | ---¹ | RTS | RTE | ---¹
Figure E.16 Embedded RISC control instructions equivalent to MIPS core. Thumb and MIPS16 are just 16-bit instruction subsets of the ARM and MIPS architectures, so machines can switch modes and execute the full instruction set. We use ---¹ to show sequences that are available in 32-bit mode but not 16-bit mode in Thumb or MIPS16. The superscript 2 shows new instructions found only in 16-bit mode of Thumb or MIPS16, such as BTEQZ².

Conventions | ARMv4 | Thumb | SuperH | M32R | MIPS16
Return address reg. | R14 | R14 | PR (special) | R14 | RA (special)
No-op | MOV r0,r0 | MOV r0,r0 | NOP | NOP | SLL r0, r0
Operands, order | OP Rd, Rs1, Rs2 | OP Rd, Rs1 | OP Rs1, Rd | OP Rd, Rs1 | OP Rd, Rs1, Rs2
Figure E.17 Conventions of embedded RISC instructions equivalent to MIPS core.

Every architecture must have a scheme for compare and conditional branch, but despite all the similarities, each of these architectures has found a different way to perform the operation.

Compare and Conditional Branch

SPARC uses the traditional four condition code bits stored in the program status word: negative, zero, carry, and overflow. They can be set on any arithmetic or logical instruction; unlike earlier architectures, this setting is optional on each instruction. An explicit option leads to fewer problems in pipelined implementation. Although condition codes can be set as a side effect of an operation, explicit compares are synthesized with a subtract using r0 as the destination. SPARC conditional branches test condition codes to determine all possible unsigned and signed relations. Floating point uses separate condition codes to encode the IEEE 754 conditions, requiring a floating-point compare instruction. Version 9 expanded SPARC branches in four ways: a separate set of condition codes for 64-bit operations; a branch that tests the contents of a register and branches if the value is =, not=, <, <=, >, or >= 0 (see MIPS below); three more sets of floating-point condition codes; and branch instructions that encode static branch prediction.
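To see how four condition code bits are enough for both signed and unsigned relations, consider a compare synthesized as a subtract. The sketch below is a generic model rather than the definition of any one architecture: the function names are invented, the width is assumed to be 32 bits, and the carry bit is defined here as the borrow out of the subtraction (some architectures use the inverse convention).

```python
def subtract_flags(a, b, width=32):
    """Return (negative, zero, carry, overflow) for a - b.

    Assumption of this sketch: carry means "borrow occurred".
    """
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    negative = bool(result & sign)
    zero = result == 0
    carry = a < b  # unsigned borrow
    # Signed overflow: operands differ in sign and the result's sign differs from a's.
    overflow = bool((a ^ b) & (a ^ result) & sign)
    return negative, zero, carry, overflow


def signed_less(flags):
    negative, _zero, _carry, overflow = flags
    return negative != overflow


def unsigned_less(flags):
    return flags[2]


def equal(flags):
    return flags[1]


# 0x80000000 is the most negative signed value but a large unsigned value.
flags = subtract_flags(0x80000000, 1)
```

For this pair of operands `signed_less(flags)` is true while `unsigned_less(flags)` is false, which is why a branch needs both the overflow and the carry bit to cover all relations.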

PowerPC also uses four condition codes: less than, greater than, equal, and summary overflow, but it has eight copies of them. This redundancy allows the PowerPC instructions to use different condition codes without conflict, essentially giving PowerPC eight extra 4-bit registers. Any of these eight condition codes can be the target of a compare instruction and any can be the source of a conditional branch. The integer instructions have an option bit that behaves as if the integer op is followed by a compare to zero that sets the first condition "register." PowerPC also lets the second "register" be optionally set by floating-point instructions. PowerPC provides logical operations among these eight 4-bit condition code registers (CRAND, CROR, CRXOR, CRNAND, CRNOR, CREQV), allowing more complex conditions to be tested by a single branch.

MIPS uses the contents of registers to evaluate conditional branches. Any two registers can be compared for equality (BEQ) or inequality (BNE), and then the branch is taken if the condition holds. The set-on-less-than instructions (SLT, SLTI, SLTU, SLTIU) compare two operands and then set the destination register to 1 if less and to 0 otherwise. These instructions are enough to synthesize the full set of relations. Because of the popularity of comparisons to 0, MIPS includes special compare-and-branch instructions for all such comparisons: greater than or equal to zero (BGEZ), greater than zero (BGTZ), less than or equal to zero (BLEZ), and less than zero (BLTZ). Of course, equal and not equal to zero can be synthesized using r0 with BEQ and BNE. Like SPARC, MIPS I uses a condition code for floating point with separate floating-point compare and branch instructions; MIPS IV expanded this to eight floating-point condition codes, with the floating-point comparisons and branch instructions specifying the condition to set or test.
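The statement that set-on-less-than plus BEQ and BNE can synthesize the full set of relations can be made concrete. In the sketch below each relation uses at most one SLT (with operands swapped where needed) followed by a branch that compares the result with the zero register; the function names and the temporary register `t` in the comments are illustrative assumptions.

```python
def slt(a, b):
    """Set on less than: 1 if a < b (signed), else 0."""
    return 1 if a < b else 0


def branch_taken(relation, a, b):
    """Decide a branch on `a relation b` using only SLT, BEQ, and BNE."""
    if relation == "==":
        return a == b            # BEQ a, b
    if relation == "!=":
        return a != b            # BNE a, b
    if relation == "<":
        return slt(a, b) != 0    # SLT t, a, b ; BNE t, r0
    if relation == ">=":
        return slt(a, b) == 0    # SLT t, a, b ; BEQ t, r0
    if relation == ">":
        return slt(b, a) != 0    # SLT t, b, a ; BNE t, r0
    if relation == "<=":
        return slt(b, a) == 0    # SLT t, b, a ; BEQ t, r0
    raise ValueError("unknown relation: " + relation)
```

Swapping the operands of SLT turns "less than" into "greater than," and branching on a zero result negates the relation, so no further compare instructions are needed.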

Alpha compares (CMPEQ, CMPLT, CMPLE, CMPULT, CMPULE) test two registers and set a third to 1 if the condition is true and to 0 otherwise. Floating-point compares (CMPTEQ, CMPTLT, CMPTLE, CMPTUN) set the result to 2.0 if the condition holds and to 0 otherwise. The branch instructions compare one register to 0 (BEQ, BGE, BGT, BLE, BLT, BNE) or its least significant bit to 0 (BLBC, BLBS) and then branch if the condition holds.

PA-RISC has many branch options, which we'll see in section E.11. The most straightforward is a compare and branch instruction (COMB), which compares two registers, branches depending on the standard relations, and then tests the least-significant bit of the result of the comparison.

ARM is similar to SPARC, in that it provides four traditional condition codes that are optionally set. CMP subtracts one operand from the other and the difference sets the condition codes. Compare negative (CMN) adds one operand to the other and the sum sets the condition codes. TST performs logical AND on the two operands to set all condition codes but overflow, while TEQ uses exclusive OR to set the first three condition codes. Like SPARC, the conditional version of the ARM branch instruction tests condition codes to determine all possible unsigned and signed relations. As we shall see in section E.12, one unusual feature of ARM is that every instruction has the option of executing conditionally depending on the condition codes. (This bears similarities to the annulling option of PA-RISC, seen in section E.11.)

Not surprisingly, Thumb follows ARM. The differences are that setting condition codes is not optional, the TEQ instruction is dropped, and there is no conditional execution of instructions.

The Hitachi SuperH uses a single T-bit condition that is set by compare instructions. Two branch instructions decide to branch if either the T bit is 1 (BT) or the T bit is 0 (BF). The two flavors of branch allow fewer comparison instructions.

Mitsubishi M32R also offers a single condition code bit (C) used for signed and unsigned comparisons (CMP, CMPI, CMPU, CMPUI) to see if one register is less than the other or not, similar to the MIPS set-on-less-than instructions. Two branch instructions test to see if the C bit is 1 or 0: BC and BNC. The M32R also includes instructions to branch on equality or inequality of registers (BEQ and BNE) and all relations of a register to 0 (BGEZ, BGTZ, BLEZ, BLTZ, BEQZ, BNEZ). Unlike BC and BNC, these last instructions are all 32 bits wide.

MIPS16 keeps set-on-less-than instructions (SLT, SLTI, SLTU, SLTIU), but instead of putting the result in one of the eight registers, it is placed in a special register named T. MIPS16 is always implemented in machines that also have the full 32-bit MIPS instructions and registers; hence, register T is really register 24 in the full MIPS architecture. The MIPS16 branch instructions test to see if a register is or is not equal to zero (BEQZ and BNEZ). There are also instructions that branch if register T is or is not equal to zero (BTEQZ and BTNEZ). To test if two registers are equal, MIPS added compare instructions (CMP, CMPI) that compute the exclusive OR of two registers and place the result in register T. Compare was added since MIPS16 left out instructions to compare and branch if registers are equal or not (BEQ and BNE).
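The exclusive-OR trick behind the MIPS16 compare can be shown directly: two registers are equal exactly when their XOR is zero, so a compare into T followed by a branch on T replaces the missing BEQ and BNE. The sketch below models T as an ordinary Python value; the function names and the 32-bit register width are assumptions made for the example.

```python
def cmp_into_t(a, b):
    """Model of the compare: T receives the exclusive OR of the two registers."""
    return (a ^ b) & 0xFFFFFFFF


def bteqz_taken(t):
    """Branch if T equals zero, which means the compared registers were equal."""
    return t == 0


def btnez_taken(t):
    """Branch if T is not zero, which means the compared registers differed."""
    return t != 0


same = bteqz_taken(cmp_into_t(0x1234, 0x1234))
different = btnez_taken(cmp_into_t(0x1234, 0x1235))
```

Both `same` and `different` evaluate to true here, so the two-instruction sequence behaves like the single BEQ or BNE of the full MIPS instruction set.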

Figures E.18 and E.19 summarize the schemes used for conditional branches.
 

Characteristic | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Number of condition code bits (integer and FP) | 0 | 8 FP | 8 FP | 8 x 4 both | 2 x 4 integer, 4 x 2 FP
Basic compare instructions (integer and FP) | 1 integer, 1 FP | 1 integer, 1 FP | 4 integer, 2 FP | 4 integer, 2 FP | 1 FP
Basic branch instructions (integer and FP) | 1 | 2 integer, 1 FP | 7 integer | 1 both | 3 integer, 1 FP
Compare register with register/const and branch | --- | =, not= | =, not=, <, <=, >, >=, even, odd | --- | ---
Compare register to zero and branch | =, not=, <, <=, >, >=, even, odd | =, not=, <, <=, >, >= | =, not=, <, <=, >, >=, even, odd | --- | =, not=, <, <=, >, >=
Figure E.18 Summary of five desktop RISC approaches to conditional branches. Floating-point branch on PA-RISC is accomplished by copying the FP status register into an integer register and then using the branch on bit instruction to test the FP comparison bit. Integer compare on SPARC is synthesized with an arithmetic instruction that sets the condition codes using r0 as the destination. 


 
 

Characteristic | ARMv4 | Thumb | SuperH | M32R | MIPS16
Number of condition code bits | 4 | 4 | 1 | 1 | 1
Basic compare instructions | 4 | 3 | 2 | 2 | 2
Basic branch instructions | 1 | 1 | 2 | 3 | 2
Compare register with register/const and branch | --- | --- | =, >, >= | =, not= | ---
Compare register to zero and branch | --- | --- | =, >, >= | =, not=, <, <=, >, >= | =, not=
Figure E.19 Summary of five embedded RISC approaches to conditional branches.

E.4 Instructions: Multimedia Extensions of the Desktop/Server RISCs

Since every desktop microprocessor by definition has its own graphical displays, as transistor budgets increased it was inevitable that support would be added for graphics operations. Many graphics systems use 8 bits to represent each of the three primary colors plus 8 bits for a location of a pixel (see Chapter 1).

The addition of speakers and microphones for teleconferencing and video games suggested support of sound as well. Audio samples need more than 8 bits of precision, but 16 bits are sufficient.

Every microprocessor has special support so that bytes and halfwords take up less space when stored in memory, but due to the infrequency of arithmetic operations on these data sizes in typical integer programs, there is little support beyond data transfers. The architects of the Intel i860, which was justified as a graphical accelerator within the company, recognized that many graphics and audio applications would perform the same operation on vectors of these data. Although a vector unit was beyond the transistor budget of the i860 in 1989, by partitioning the carry chains within a 64-bit ALU (see Chapter 4), it could perform simultaneous operations on short vectors of eight 8-bit operands, four 16-bit operands, or two 32-bit operands. The cost of such partitioned ALUs was small. Applications that lend themselves to such support include MPEG (video), games like DOOM (3D graphics), Adobe Photoshop (digital photography), and teleconferencing (audio and image processing).
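The idea of partitioning the carry chain can be imitated in software. The sketch below adds two 64-bit words as independent lanes by keeping the top bit of each lane out of the main addition, so that no carry crosses a lane boundary, and then restoring the top bits with an exclusive OR. It is an illustration of the principle only, with invented names, and makes no claim about how the i860 or any other processor implements its ALU.

```python
def partitioned_add(x, y, lane_bits, word_bits=64):
    """Add x and y as independent lanes of `lane_bits` bits (wrapping per lane)."""
    word_mask = (1 << word_bits) - 1
    # Mask with only the top bit of every lane set.
    high = 0
    for lane in range(word_bits // lane_bits):
        high |= 1 << (lane * lane_bits + lane_bits - 1)
    low_mask = word_mask & ~high
    # Add the lower bits of each lane; the carry lands in the lane's own top bit.
    partial = (x & low_mask) + (y & low_mask)
    # Top bit of each lane = x_top XOR y_top XOR carry into the top bit.
    return (partial ^ ((x ^ y) & high)) & word_mask


eight_bytes = partitioned_add(0x0000FFFF, 0x00000001, 8)
four_halfwords = partitioned_add(0x0000FFFF, 0x00000001, 16)
two_words = partitioned_add(0x0000FFFF, 0x00000001, 32)
```

With the same operands, the byte lanes give 0x0000FF00, the halfword lanes give 0, and the word lanes give 0x00010000: the only difference is where the carry is allowed to propagate, which is why the hardware cost of such a partitioned ALU is small.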

Like a virus, over time such multimedia support has spread to nearly every desktop microprocessor. HP was the first successful desktop RISC to include such support. As we shall see, this virus spread unevenly. The PowerPC is the only holdout, and rumors are that it is "running a fever."

These extensions have been called subword parallelism, vector, or SIMD (single instruction, multiple data) (see Chapter 9). Since Intel marketing uses SIMD to describe the MMX extension of the 80x86, that may become the popular name. Figure E.20 summarizes the support by architecture.
 

Instruction category | Alpha MAX | MIPS MDMX | PA-RISC MAX2 | PowerPC | SPARC VIS
Add/subtract |  | 8B,4H | 4H |  | 4H,2W
Saturating add/sub |  | 8B,4H | 4H |  | 
Multiply |  | 8B,4H |  |  | 4B/H
Compare | 8B (>=) | 8B,4H (=,<,<=) |  |  | 4H,2W (=,not=,>,<=)
Shift right/left |  | 8B,4H | 4H |  | 
Shift right arithmetic |  | 4H | 4H |  | 
Multiply and add |  | 8B,4H |  |  | 
Shift and add (saturating) |  |  | 4H |  | 
And/or/xor | 8B,4H,2W | 8B,4H,2W | 8B,4H,2W |  | 8B,4H,2W
Absolute difference | 8B |  |  |  | 8B
Max/min | 8B, 4W | 8B,4H |  |  | 
Pack (2n bits --> n bits) | 2W->2B, 4H->4B | 2*2W->4H, 2*4H->8B | 2*4H->8B |  | 2W->2H, 2W->2B, 4H->4B
Unpack/merge | 2B->2W, 4B->4H | 2*4B->8B, 2*2H->4H |  |  | 4B->4H, 2*4B->8B
Permute/shuffle |  | 8B,4H | 4H |  | 
Register sets | Integer | Fl. Pt. + 192b Acc. | Integer |  | Fl. Pt.
Figure E.20 Summary of multimedia support for desktop RISCs. B stands for byte (8 bits), H for halfword (16 bits), and W for word (32 bits). Thus 8B means an operation on 8 bytes in a single instruction. Pack and unpack use the notation 2*2W to mean 2 operands each with 2 words. Note that MDMX has vector/scalar operations, where the scalar is specified as an element of one of the vector registers. This table is a simplification of the full multimedia architectures, leaving out many details. For example, MIPS MDMX includes instructions to multiplex between two operands, HP MAX2 includes an instruction to calculate averages, and SPARC VIS includes instructions to set registers to constants. Also, this table does not include the memory alignment operation of MDMX, MAX, and VIS. 


 

From Figure E.20 you can see that in general MIPS MDMX works on 8 bytes or 4 halfwords per instruction, HP PA-RISC MAX2 works on 4 halfwords, SPARC VIS works on 4 halfwords or 2 words, and Alpha doesn't do much. The Alpha MAX operations are just byte versions of compare, min, max, and absolute difference, leaving it up to software to isolate fields and perform parallel adds, subtracts, and multiplies on bytes and halfwords. MIPS also added operations to work on two 32-bit floating-point operands per cycle, but they are considered part of MIPS V and not simply multimedia extensions (see section E.7).

One feature not generally found in general-purpose microprocessors is saturating operations. Saturation means that when a calculation overflows, the result is set to the largest positive number or most negative number, rather than a modulo calculation as in two's complement arithmetic. Commonly found in digital signal processors (see the next section), these saturating operations are helpful in routines for filtering.
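The difference between modulo and saturating arithmetic can be seen in a small example. The following is a minimal Python sketch for signed 16-bit values; the function names and the default width are assumptions chosen for illustration, not part of any of the instruction sets discussed.

```python
def wrapping_add(a: int, b: int, bits: int = 16) -> int:
    """Two's complement (modulo) addition of signed `bits`-bit integers."""
    mask = (1 << bits) - 1
    s = (a + b) & mask
    # Reinterpret the masked result as a signed value.
    return s - (1 << bits) if s >> (bits - 1) else s


def saturating_add(a: int, b: int, bits: int = 16) -> int:
    """Addition that clamps to the largest or most negative representable value."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, a + b))
```

Adding 30000 and 10000 wraps around to a negative number with `wrapping_add` but stays at 32767 with `saturating_add`, which is the behavior a filter routine wants when a sample clips.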

These machines largely used existing register sets to hold operands: integer registers for Alpha and HP PA-RISC and floating-point registers for MIPS and Sun. Hence data transfers are accomplished with standard load and store instructions. MIPS also added a 192-bit (3*64) wide register to act as an accumulator for some operations. By having 3 times the native data width, it can be partitioned to accumulate either 8 bytes with 24 bits per field or 4 halfwords with 48 bits per field. This wide accumulator can be used for add, subtract, and multiply/add instructions. MIPS claims performance advantages of 2 to 4 times for the accumulator.

Perhaps the surprising conclusion of this table is the lack of consistency. The only operations found on all four are the logical operations (AND, OR, XOR), which do not need a partitioned ALU. If we leave out the frugal Alpha, then the only other common operations are parallel adds and subtracts on 4 halfwords.

Each manufacturer states that these are instructions intended to be used in hand-optimized subroutine libraries, an intention likely to be followed, as a compiler that works well with all desktop RISCs' multimedia extensions would be challenging.

E.5 Instructions: Digital Signal Processing Extensions of the Embedded RISCs

One feature found in every digital signal processor (DSP) architecture is support for integer multiply-accumulate. The multiplies tend to be on shorter words than regular integers, such as 16 bits, and the accumulator tends to be longer words, such as 64 bits. The reason for multiply-accumulate is to efficiently implement digital filters, common in DSP applications. Since Thumb and MIPS16 are subset architectures, they do not provide such support. Instead, programmers should use the DSP or multimedia extensions found in the 32-bit mode instructions of ARM and MIPS V.
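To make the role of multiply-accumulate in a digital filter concrete, here is a minimal Python sketch of one output of a finite impulse response (FIR) filter. The function name `fir_output`, the accumulator width parameter, and the choice to saturate after every step are assumptions made for illustration; real DSP hardware differs in how and when it saturates.

```python
def fir_output(samples, coeffs, acc_bits: int = 64) -> int:
    """One FIR output: the sum of sample*coefficient products in a wide accumulator."""
    hi = (1 << (acc_bits - 1)) - 1
    lo = -(1 << (acc_bits - 1))
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c                 # one multiply-accumulate step
        acc = max(lo, min(hi, acc))  # assumed policy: saturate rather than wrap
    return acc
```

Each loop iteration corresponds to a single multiply-accumulate instruction. The accumulator is wider than the operands so that many 16-bit products can be summed before overflow becomes possible.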

Figure E.21 shows the size of the multiply, the size of the accumulator, and the operations and instruction names for the embedded RISCs. Machines with accumulator sizes greater than 32 and less than 64 bits will force the upper bits to remain as the sign bits, thereby "saturating" the add to set to maximum and minimum fixed-point values if the operations overflow.
 
 

  | ARMv4 | Thumb | SuperH | M32R | MIPS16
Size of multiply | 32B x 32B | --- | 32B x 32B, 16B x 16B | 32B x 16B, 16B x 16B | ---
Size of accumulator | 32B/64B | --- | 32B/42B, 48B/64B | 56B | ---
Accumulator name | Any GPR or pairs of GPRs | --- | MACH, MACL | ACC | ---
Operations | 32B/64B product + 64B accumulate signed/unsigned | --- | 32B product + 42B/32B accumulate (operands in memory); 64B product + 64B/48B accumulate (operands in memory); clear MAC | 32B/48B product + 64B accumulate, round, move | ---
Corresponding instruction names | MLA, SMLAL, UMLAL | --- | MAC, MACS, MAC.L, MAC.LS, CLRMAC | MACHI/MACLO, MACWHI/MACWLO, RAC, RACH, MVFACHI/MVFACLO, MVTACHI/MVTACLO | ---
Figure E.21 Summary of five embedded RISC approaches to multiply-accumulate.

E.6 Instructions: Common Extensions to MIPS Core

Figures E.22 through E.28 list instructions not found in Figures E.9 through E.17 in the same four categories. Instructions are put in these lists if they appear in more than one of the standard architectures. The instructions are defined using the hardware description language defined in Figure E.29.
 

Name | Definition | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Atomic swap R/M (for locks and semaphores) | Temp<---Rd; Rd<---Mem[x]; Mem[x]<---Temp | LDL/Q_L; STL/Q_C | LL; SC | --- (see E.8) | LWARX; STWCX | CASA, CASX
Load 64-bit integer | Rd<---64 Mem[x] | LDQ | LD | LDD | LD | LDX
Store 64-bit integer | Mem[x]<---64 Rd | STQ | SD | STD | STD | STX
Load 32-bit integer unsigned | Rd_32..63 <---32 Mem[x]; Rd_0..31 <---32 0 | LDL; EXTLL | LWU | LDW | LWZ | LDUW
Load 32-bit integer signed | Rd_32..63 <---32 Mem[x]; Rd_0..31 <---32 (Mem[x]_0)^32 | LDL | LW | LDW; EXTRD,S 63,8 | LWA | LDSW
Prefetch | Cache[x]<---hint | FETCH, FETCH_M* | PREF, PREFX | LDD, r0 LDW, r0 | DCBT, DCBTST | PREFETCH
Load coprocessor | Coprocessor<--- Mem[x] | --- | LWCi | CLDWX, CLDWS | --- | ---
Store coprocessor | Mem[x]<--- Coprocessor | --- | SWCi | CSTWX, CSTWS | --- | ---
Endian | (Big/Little Endian?) | Either | Either | Either | Either | Either
Cache flush | (Flush cache block at this address) | ECB | CP0op | FDC, FIC | DCBF | FLUSH
Shared memory synchronization | (All prior data transfers complete before next data transfer may start) | WMB | SYNC | SYNC | SYNC | MEMBAR
Figure E.22 Data transfer instructions not found in MIPS core but found in two or more of the five desktop architectures. The load linked/store conditional pair of instructions gives Alpha and MIPS atomic operations for semaphores, allowing data to be read from memory, modified, and stored without fear of interrupts or other machines accessing the data in a multiprocessor (see Chapter 9). Prefetching in the Alpha to external caches is accomplished with FETCH and FETCH_M; on-chip cache prefetches use LD_Q A, R31, and LD_Y A. F31 is used in the Alpha 21164 (see Bhandarkar [1995], p. 190). 
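The way a load linked/store conditional pair yields an atomic swap can be sketched with a toy model. The Python code below is a single-threaded simulation written for illustration only: the `Memory` class, its single reservation, and the rule that any ordinary store to the reserved address clears the reservation are simplifying assumptions, not the exact behavior of Alpha or MIPS hardware.

```python
class Memory:
    """Toy memory with one load-linked reservation (an assumed simplification)."""

    def __init__(self):
        self.cells = {}
        self.reserved = None  # address currently under reservation, if any

    def load_linked(self, addr):
        self.reserved = addr
        return self.cells.get(addr, 0)

    def store(self, addr, value):
        # An ordinary store by another agent breaks a matching reservation.
        if self.reserved == addr:
            self.reserved = None
        self.cells[addr] = value

    def store_conditional(self, addr, value):
        # Succeeds only if nothing touched the address since load_linked.
        if self.reserved != addr:
            return False
        self.cells[addr] = value
        self.reserved = None
        return True


def atomic_swap(mem, addr, new_value):
    """Retry the load linked/store conditional pair until the store succeeds."""
    while True:
        old = mem.load_linked(addr)
        if mem.store_conditional(addr, new_value):
            return old
```

If another store slips in between the two halves, `store_conditional` reports failure and the loop simply tries again, so the swap appears indivisible to every other agent.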

Name | Definition | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
64-bit integer arithmetic ops | Rd<---64 Rs1 op64 Rs2 | ADD, SUB, MUL | DADD, DSUB, DMULT, DDIV | ADD, SUB, SHLADD, DS | ADD, SUBF, MULLD, DIVD | ADD, SUB, MULX, S/UDIVX
64-bit integer logical ops | Rd<---64 Rs1 op64 Rs2 | AND, OR, XOR | AND, OR, XOR | AND, OR, XOR | AND, OR, XOR | AND, OR, XOR
64-bit shifts | Rd<---64 Rs1 op64 Rs2 | SLL, SRA, SRL | DSLL/V, DSRA/V, DSRL/V | DEPD,Z EXTRD,S EXTRD,U | SLD, SRAD, SRLD | SLLX, SRAX, SRLX
Conditional move | if (cond) Rd<---Rs | CMOV_ | MOVN/Z | SUBc, n; ADD | --- | MOVcc, MOVr
Support for multiword integer add | CarryOut, Rd <--- Rs1 + Rs2 + OldCarryOut | --- | ADDU; SLTU; ADDU, DADDU; SLTU; DADDU | ADDC | ADDC, ADDE | ADDcc
Support for multiword integer sub | CarryOut, Rd <--- Rs1 - Rs2 + OldCarryOut | --- | SUBU; SLTU; SUBU, DSUBU; SLTU; DSUBU | SUBB | SUBFC, SUBFE | SUBcc
And not | Rd <--- Rs1 & ~(Rs2) | BIC | --- | ANDCM | ANDC | ANDN
Or not | Rd <--- Rs1 \| ~(Rs2) | ORNOT | --- | --- | ORC | ORN
Add high immediate | Rd_0..15 <--- Rs1_0..15 + (Const<<16); | --- | --- | ADDIL (R-I) | ADDIS (R-I) | ---
Coprocessor operations | (Defined by coprocessor) | --- | COPi | COPR,i | --- | IMPDEPi
Figure E.23 Arithmetic-logical instructions not found in MIPS core but found in two or more of the five desktop architectures.
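The two styles of multiword add support in the figure can be compared with a short example. The Python sketch below builds a 64-bit add from 32-bit pieces in two ways: once with an explicit carry passed between adds, as an add-with-carry instruction would do, and once by recovering the carry with an unsigned comparison, in the spirit of the add, set-less-than-unsigned, add sequence. All function names are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry(a: int, b: int, carry_in: int = 0):
    """32-bit add returning (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def add64_with_carry_flag(a_lo, a_hi, b_lo, b_hi):
    """64-bit add using a carry passed from the low half to the high half."""
    lo, carry = add_with_carry(a_lo, b_lo)
    hi, _ = add_with_carry(a_hi, b_hi, carry)
    return lo, hi


def add64_without_carry_flag(a_lo, a_hi, b_lo, b_hi):
    """Same result with no carry bit: detect the carry by unsigned comparison."""
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < a_lo else 0  # the sum wrapped around, so a carry occurred
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi
```

Both functions return the same (low, high) pair; the second needs an extra comparison per word, which is the price of having no carry bit in the architecture.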

Name | Definition | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Optimized delayed branches | (Branch not always delayed) | --- | BEQL, BNEL, B_ZL (<, >, <=, >=) | COMBT, n, COMBF, n | --- | BPcc, A, FPBcc, A
Conditional trap | if (COND) {R31<---PC; PC<---0..0#i} | --- | T_, T_I (=, not=, <, >, <=, >=) | SUBc, n; BREAK | TW, TD, TWI, TDI | Tcc
No. control registers | Misc. regs (virtual memory, interrupts,...) | 6 | equiv. 12 | 32 | 33 | 29
Figure E.24 Control instructions not found in MIPS core but found in two or more of the five desktop architectures.

Name | Definition | Alpha | MIPS V | PA-RISC 2.0 | PowerPC | SPARC V9
Multiply and add | Fd <--- (Fs1 x Fs2) + Fs3 | --- | MADD.S/D | FMPYFADD sgl/dbl | FMADD/S | 
Multiply and sub | Fd <--- (Fs1 x Fs2) - Fs3 | --- | MSUB.S/D |  | FMSUB/S | 
Neg mult and add | Fd <--- -((Fs1 x Fs2) + Fs3) | --- | NMADD.S/D | FMPYFNEG sgl/dbl | FNMADD/S | 
Neg mult and sub | Fd <--- -((Fs1 x Fs2) - Fs3) | --- | NMSUB.S/D |  | FNMSUB/S | 
Square root | Fd <--- SQRT(Fs) | SQRT_ | SQRT.S/D | FSQRT sgl/dbl | FSQRT/S | FSQRTS/D
Conditional move | if (cond) Fd<---Fs | FCMOV_ | MOVF/T, MOVF/T.S/D | FTEST; FCPY | --- | FMOVcc
Negate | Fd <--- Fs ^ x80000000 | CPYSN | NEG.S/D | FNEG sgl/dbl | FNEG | FNEGS/D/Q
Absolute value | Fd <--- Fs & x7FFFFFFF | --- | ABS.S/D | FABS/dbl | FABS | FABSS/D/Q
Figure E.25 Floating-point instructions not found in MIPS core but found in two or more of the five desktop architectures.
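The Negate and Absolute value definitions in the figure work purely on the sign bit of the floating-point encoding. The following minimal Python sketch applies the same two masks to IEEE 754 single-precision values; the helper names are illustrative, and the standard `struct` module is used only to reach the bit pattern.

```python
import struct


def float_bits(x: float) -> int:
    """Bit pattern of x as an IEEE 754 single-precision value."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_float(b: int) -> float:
    """Single-precision value whose bit pattern is b."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def fneg(x: float) -> float:
    return bits_float(float_bits(x) ^ 0x80000000)  # flip the sign bit


def fabs_(x: float) -> float:
    return bits_float(float_bits(x) & 0x7FFFFFFF)  # clear the sign bit
```

Because only the sign bit changes, neither operation needs the floating-point adder, and negating zero yields negative zero rather than leaving the value untouched.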

Name | Definition | ARMv4 | Thumb | SuperH | M32R | MIPS16
Atomic swap R/M (for semaphores) | Temp<---Rd; Rd<---Mem[x]; Mem[x]<---Temp | SWP, SWPB | ---1 | (see TAS) | LOCK; UNLOCK | ---1
Memory management unit | Paged address translation | Via coprocessor instructions | ---1 | LDTLB |  | ---1
Endian | (Big/Little Endian?) | Either | Either | Either | Big | Either
Figure E.26 Data transfer instructions not found in MIPS core but found in two or more of the five embedded architectures. We use ---1 to show sequences that are available in 32-bit mode but not 16-bit mode in Thumb or MIPS16.


 
 

Name | Definition | ARMv4 | Thumb | SuperH | M32R | MIPS16
Load immediate | Rd<---Imm | MOV | MOV | MOV, MOVA | LDI, LD24 | LI
Support for multiword integer add | CarryOut, Rd <--- Rd + Rs1 + OldCarryOut | ADCS | ADC | ADDC | ADDX | ---1
Support for multiword integer sub | CarryOut, Rd <--- Rd - Rs1 + OldCarryOut | SBCS | SBC | SUBC | SUBX | ---1
Negate | Rd <--- 0 - Rs1 |  | NEG2 | NEG | NEG | NEG
Not | Rd <--- ~(Rs1) | MVN | MVN | NOT | NOT | NOT
Move | Rd <--- Rs1 | MOV | MOV | MOV | MV | MOVE
Rotate right | Rd <--- Rs >> i, Rd_0...i-1 <--- Rs_31-i...31 | ROR | ROR | ROTR |  | 
And not | Rd <--- Rs1 & ~(Rs2) | BIC | BIC |  |  | 
Figure E.27 Arithmetic-logical instructions not found in MIPS core but found in two or more of the five embedded architectures. We use ---1 to show sequences that are available in 32-bit mode but not in 16-bit mode in Thumb or MIPS16. The superscript 2 shows new instructions found only in 16-bit mode of Thumb or MIPS16, such as NEG2.


 
 

Name  Definition  ARMv4  Thumb  SuperH  M32R  MIPS16 
No. control registers Misc. registers 21 29 9 5 36
Figure E.28 Control information in the five embedded architectures.


 

Although most of the categories are self-explanatory, a few bear comment:


 

Notation | Meaning | Example | Meaning
<-- | Data transfer. Length of transfer is given by the destination's length; the length is specified when not clear. | Regs[R1]<--Regs[R2]; | Transfer contents of R2 to R1. Registers have a fixed length, so transfers shorter than the register size must indicate which bits are used.
M | Array of memory accessed in bytes. The starting address for a transfer is indicated as the index to the memory array. | Regs[R1]<--M[x]; | Place contents of memory location x into R1. If a transfer starts at M[i] and requires 4 bytes, the transferred bytes are M[i], M[i+1], M[i+2], and M[i+3].
<--n | Transfer an n-bit field, used whenever length of transfer is not clear. | M[y] <--16 M[x]; | Transfer 16 bits starting at memory location x to memory location y. The length of the two sides should match.
X_n | Subscript selects a bit. | Regs[R1]_0 <-- 0; | Change sign bit of R1 to 0. (Bits are numbered from MSB starting at 0.)
X_m..n | Subscript selects a field. | Regs[R3]_24..31 <-- M[x]; | Moves contents of memory location x into low-order byte of R3.
X^n | Superscript replicates a bit field. | Regs[R3]_0..23 <-- 0^24; | Sets high-order three bytes of R3 to 0.
## | Concatenates two fields. | Regs[R3] <-- 0^24 ## M[x]; F2##F3 <--64 M[x]; | Moves contents of location x into low byte of R3; clears upper three bytes. Moves 64 bits from memory starting at location x; 1st 32 bits go into F2, 2nd 32 into F3.
*,& | Dereference a pointer; get the address of a variable. | p*<--&x; | Assign to object pointed to by p, the address of the variable x.
<<,>> | C logical shifts (left, right). | Regs[R1] << 5 | Shift R1 left 5 bits.
==, !=, >, <, >=, <= | C relational operators; equal, not equal, greater, less, greater or equal, less or equal. | (Regs[R1]==Regs[R2]) & (Regs[R3]!=Regs[R4]) | True if contents of R1 equal the contents of R2 and contents of R3 do not equal the contents of R4.
&, \|, ^, ! | C bitwise logical operations: and, or, exclusive or, and complement. | (Regs[R1] & (Regs[R2] \| Regs[R3])) | Bitwise AND of R1 and the bitwise OR of R2 and R3.
Figure E.29 Hardware description notation (and some standard C operators).
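Because the notation numbers bits from the most-significant bit starting at 0, field selection differs from the usual C habit of counting from the least-significant bit. The Python sketch below shows one way to read and write such a field; the helper names `field` and `set_field` and the default 32-bit width are assumptions for illustration.

```python
def field(value: int, m: int, n: int, width: int = 32) -> int:
    """Return bits m..n of `value`, where bit 0 is the most-significant bit."""
    length = n - m + 1
    shift = width - 1 - n  # distance of bit n from the least-significant end
    return (value >> shift) & ((1 << length) - 1)


def set_field(value: int, m: int, n: int, new: int, width: int = 32) -> int:
    """Return `value` with bits m..n replaced by `new`; other bits are unchanged."""
    length = n - m + 1
    shift = width - 1 - n
    mask = ((1 << length) - 1) << shift
    return (value & ~mask) | ((new << shift) & mask)
```

In these terms, selecting bits 24..31 of a 32-bit register yields its low-order byte, and writing zero to bits 0..23 clears the high-order three bytes, matching the examples in the figure.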

One difference that needs a longer explanation is the optimized branches. Figure E.30 shows the options. The Alpha and PowerPC offer branches that take effect immediately, like branches on earlier architectures. To accelerate branches, these machines use branch prediction (see section 6.1). All the rest of the desktop RISCs offer delayed branches (again, see section 6.1). The embedded RISCs generally do not support delayed branch, with the exception of SuperH, which has it as an option.

The other three desktop RISCs provide a version of delayed branch that makes it easier to fill the delay slot. The SPARC "annulling" branch executes the instruction in the delay slot only if the branch is taken; otherwise the instruction is annulled. This means the instruction at the target of the branch can safely be copied into the delay slot since it will only be executed if the branch is taken. The restrictions are that the target is not another branch and that the target is known at compile time. (SPARC also offers a nondelayed jump because an unconditional branch with the annul bit set does not execute the following instruction.) Later versions of the MIPS architecture have added a branch likely instruction that also annuls the following instruction if the branch is not taken. PA-RISC allows almost any instruction to annul the next instruction, including branches. Its "nullifying" branch option will execute the next instruction depending on the direction of the branch and whether it is taken (i.e., if a forward branch is not taken or a backward branch is taken). Presumably this choice was made to optimize loops, allowing the instructions following the exit branch and the looping branch to execute in the common case.
 
 

  | (Plain) branch | Delayed branch | Annulling delayed branch (MIPS V, SPARC) | Annulling delayed branch (PA-RISC)
Found in architectures | Alpha, PowerPC, ARM, Thumb, SuperH, M32R, MIPS16 | MIPS V, PA-RISC, SPARC, SuperH | MIPS V, SPARC | PA-RISC
Execute following instruction | Only if branch not taken | Always | Only if branch taken | If forward branch not taken or backward branch taken
Figure E.30 When the instruction following the branch is executed for three types of branches.
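The rules in Figure E.30 can be written as a single predicate. The Python sketch below is a direct transcription of the figure; the function name and the string labels for the branch kinds are illustrative assumptions.

```python
def executes_following(kind: str, taken: bool, backward: bool = False) -> bool:
    """Is the instruction after the branch executed, per Figure E.30?"""
    if kind == "plain":
        return not taken            # fall through only when not taken
    if kind == "delayed":
        return True                 # the delay slot always executes
    if kind == "annulling":         # MIPS V branch likely, SPARC annulling branch
        return taken
    if kind == "nullifying":        # PA-RISC
        return (backward and taken) or (not backward and not taken)
    raise ValueError(f"unknown branch kind: {kind}")
```

For a loop closed by a backward branch, the PA-RISC rule executes the following instruction on every iteration that loops back, while a forward exit branch executes it in the common case of not leaving the loop.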

Now that we have covered the similarities, we will focus on the unique features of each architecture. We first cover the desktop/server RISCs, ordering them by length of description of the unique features from shortest to longest, and then the embedded RISCs.


E.7 Instructions Unique to MIPS V

MIPS has gone through five generations of instruction sets, and this evolution has generally added features found in other architectures. Here are the salient unique features of MIPS, the first several of which were found in the original instruction set.

Nonaligned Data Transfers

MIPS has special instructions to handle misaligned words in memory. Although misaligned access is a rare event in most programs, the support is included for 16-bit minicomputer applications and for doing memcpy and strcpy faster. Most RISCs trap if you try to load or store a word at a misaligned address, but on all architectures misaligned words can be accessed without traps by using four load byte instructions and then assembling the result using shifts and logical ors. The MIPS load and store word left and right instructions (LWL, LWR, SWL, SWR) allow this to be done in just two instructions: LWL loads the left portion of the register and LWR loads the right portion of the register. SWL and SWR do the corresponding stores. Figure E.31 shows how they work. There are also 64-bit versions of these instructions.
 

Figure E.31 MIPS instructions for unaligned word reads. This figure assumes operation in Big Endian mode. Case 1 first loads the 3 bytes 101, 102, and 103 into the left of R2, leaving the least-significant byte undisturbed. The following LWR simply loads byte 104 into the least-significant byte of R2, leaving the other bytes of the register unchanged using LWL. Case 2 first loads byte 203 into the most-significant byte of R4, and the following LWR loads the other 3 bytes of R4 from memory bytes 204, 205, and 206. LWL reads the word with the first byte from memory, shifts to the left to discard the unneeded byte(s), and changes only those bytes in Rd. The byte(s) transferred are from the first byte until the lowest-order byte of the word. The following LWR addresses the last byte, right shifts to discard the unneeded byte(s), and finally changes only those bytes of Rd. The byte(s) transferred are from the last byte up to the highest-order byte of the word. Store word left (SWL) is simply the inverse of LWL, and store word right (SWR) is the inverse of LWR. Changing to Little Endian mode flips which bytes are selected and discarded. (If big-little, left-right, load-store seem confusing, don't worry; they work!) 
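The two cases in the caption can be reproduced with a small simulation. The Python sketch below models LWL and LWR in Big Endian mode only; registers are plain integers passed in and returned, memory is a `bytes` object, and the function names are illustrative assumptions rather than a definitive statement of the MIPS semantics.

```python
def lwl(reg: int, mem: bytes, addr: int) -> int:
    """Load word left (Big Endian): fill the most-significant bytes of reg."""
    k = addr % 4
    n = 4 - k                                  # bytes from addr to end of aligned word
    data = int.from_bytes(mem[addr:addr + n], "big")
    keep_mask = (1 << (8 * k)) - 1             # low k bytes of reg stay unchanged
    return ((data << (8 * k)) | (reg & keep_mask)) & 0xFFFFFFFF


def lwr(reg: int, mem: bytes, addr: int) -> int:
    """Load word right (Big Endian): fill the least-significant bytes of reg."""
    k = addr % 4
    n = k + 1                                  # bytes from start of aligned word to addr
    start = addr - k
    data = int.from_bytes(mem[start:start + n], "big")
    keep_mask = 0xFFFFFFFF & ~((1 << (8 * n)) - 1)  # high bytes of reg stay unchanged
    return (reg & keep_mask) | data


def load_unaligned(mem: bytes, addr: int) -> int:
    """Read the 4-byte word starting at any address with one LWL and one LWR."""
    return lwr(lwl(0, mem, addr), mem, addr + 3)
```

With a memory whose byte at address i holds the value i, `load_unaligned(mem, 101)` assembles bytes 101 through 104 as in Case 1, and `load_unaligned(mem, 203)` assembles bytes 203 through 206 as in Case 2.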

Remaining Instructions

Below is a list of the remaining unique details of the MIPS architecture:

There is no specific provision in the MIPS architecture for floating-point execution to proceed in parallel with integer execution, but the MIPS implementations of floating point allow this to happen by checking to see if arithmetic interrupts are possible early in the cycle (see Appendix A). Normally, exception detection would force serialization of execution of integer and floating-point operations.

E.8 Instructions Unique to Alpha

The Alpha was intended to be an architecture for which it was easy to build high-performance implementations. Toward that goal, the architects originally made two controversial decisions: imprecise floating-point exceptions and no byte or halfword data transfers.

To simplify pipelined execution, Alpha does not require that an exception act as if no instructions past a certain point are executed and that all before that point have been executed. It supplies the TRAPB instruction, which stalls until all prior arithmetic instructions are guaranteed to complete without incurring arithmetic exceptions. In the most conservative mode, placing one TRAPB per exception-causing instruction slows execution by roughly five times but provides precise exceptions (see Darcy and Gay [1996]).

Code that does not include TRAPB does not obey the IEEE 754 floating-point standard. The reason is that parts of the standard (NaNs, infinities, and denormals) are implemented in software on Alpha, as they are on many other microprocessors. To implement these operations in software, however, programs must find the offending instruction and operand values, which cannot be done with imprecise interrupts!

When the architecture was developed, it was believed by the architects that byte loads and stores would slow down data transfers. Byte loads require an extra shifter in the data transfer path, and byte stores require that the memory system perform a read-modify-write for memory systems with error correction codes since the new ECC value must be recalculated. This omission meant that byte stores require the sequence load word, replace desired byte, and then store word. (Inconsistently, floating-point loads go through considerable byte swapping to convert the obtuse VAX floating-point formats into a canonical form.)

To reduce the number of instructions to get the desired data, Alpha includes an elaborate set of byte manipulation instructions: extract field and zero rest of a register (EXTxx), insert field (INSxx), mask rest of a register (MSKxx), zero fields of a register (ZAP), and compare multiple bytes (CMPGE).
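The load word, replace byte, store word sequence can be sketched as follows. This Python example is a simplified model, not Alpha code: the function name is hypothetical, little-endian byte order is assumed, and the mask and insert steps only loosely play the roles that MSKxx and INSxx play in the real instruction set.

```python
def store_byte_via_word(mem: bytearray, addr: int, byte: int) -> None:
    """Emulate a byte store using only aligned 32-bit loads and stores."""
    base = addr & ~3                                     # aligned word address
    word = int.from_bytes(mem[base:base + 4], "little")  # load word
    shift = 8 * (addr & 3)                               # position of the byte in the word
    word &= ~(0xFF << shift)                             # mask out the old byte
    word |= (byte & 0xFF) << shift                       # insert the new byte
    mem[base:base + 4] = word.to_bytes(4, "little")      # store word
```

Three of the four bytes are read and written back unchanged, which is the extra work the original architecture imposed on software in exchange for a simpler memory path.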

Apparently the implementors were not as bothered by load and store byte as were the original architects. Beginning with the shrink of the second version of the Alpha chip (21164A), the architecture does include loads and stores for bytes and halfwords.

Remaining Instructions

Below is a list of the remaining unique instructions of the Alpha architecture:

E.9 Instructions Unique to SPARC v.9

Several features are unique to SPARC.

Register Windows

The primary unique feature of SPARC is register windows, an optimization for reducing register traffic on procedure calls. Several banks of registers are used, with a new one allocated on each procedure call. Although this could limit the depth of procedure calls, the limitation is avoided by operating the banks as a circular buffer, providing unlimited depth. The knee of the cost-performance curve seems to be six to eight banks.

SPARC can have between 2 and 32 windows, typically using eight registers each for the globals, locals, incoming parameters, and outgoing parameters. (Given that each window has 16 unique registers, an implementation of SPARC can have as few as 40 physical registers and as many as 520, although most have 128 to 136, so far.) Rather than tie window changes with call and return instructions, SPARC has the separate instructions SAVE and RESTORE. SAVE is used to "save" the caller's window by pointing to the next window of registers in addition to performing an add instruction. The trick is that the source registers of the addition come from the caller's window, while the destination register is in the callee's window. SPARC compilers typically use this instruction for changing the stack pointer to allocate local variables in a new stack frame. RESTORE is the inverse of SAVE, bringing back the caller's window while acting as an add instruction, with the source registers from the callee's window and the destination register in the caller's window. This automatically deallocates the stack frame. Compilers can also make use of it for generating the callee's final return value.
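The register counts quoted above, and the overlap between a caller's outgoing and a callee's incoming parameters, can be checked with a toy model. The Python sketch below is an illustration under stated assumptions: the class and method names are hypothetical, the window pointer moves in an arbitrary direction, the add performed by SAVE and RESTORE is omitted, and window overflow and underflow traps are not modeled.

```python
def physical_registers(windows: int) -> int:
    """8 globals plus 16 unique registers (locals and outs) per window."""
    return 8 + 16 * windows


class WindowedRegisters:
    """Toy model: the window registers form one circular buffer."""

    OFFSET = {"in": 0, "local": 8, "out": 16}

    def __init__(self, windows: int = 8):
        self.windows = windows
        self.ring = [0] * (16 * windows)
        self.cwp = 0  # current window pointer

    def _index(self, group: str, n: int) -> int:
        # A window spans 24 slots; its outs are the next window's ins.
        return (16 * self.cwp + self.OFFSET[group] + n) % len(self.ring)

    def read(self, group: str, n: int) -> int:
        return self.ring[self._index(group, n)]

    def write(self, group: str, n: int, value: int) -> None:
        self.ring[self._index(group, n)] = value

    def save(self) -> None:
        self.cwp = (self.cwp + 1) % self.windows

    def restore(self) -> None:
        self.cwp = (self.cwp - 1) % self.windows
```

A value written to an "out" register before `save()` is visible as the same-numbered "in" register afterward, so parameters are passed without any data movement. The formula gives 40 registers for 2 windows and 520 for 32 windows, matching the limits in the text.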

The danger of register windows is that the larger number of registers could slow down the clock rate. This was not the case for early implementations. The SPARC architecture (with register windows) and the MIPS R2000 architecture (without) have been built in several technologies since 1987. For several generations the SPARC clock rate has not been slower than the MIPS clock rate for implementations in similar technologies, probably because cache-access times dominate register-access times in these implementations. The current generation machines took different implementation strategies—in-order vs. out-of-order—and it's unlikely that the number of registers by themselves determined the clock rate in either machine.

Another data transfer feature is the alternate space option for loads and stores. This simply allows the memory system to identify memory accesses to input/output devices, or to control registers for devices such as the cache and memory-management unit.

Fast Traps

Version 9 SPARC includes support to make traps fast. It expands the single level of traps to at least four levels, allowing the window overflow and underflow trap handlers to be interrupted. The extra levels mean the handler does not need to check for page faults or misaligned stack pointers explicitly in the code, thereby making the handler faster. Two new instructions were added to return from this multilevel handler: RETRY (which retries the interrupted instruction) and DONE (which does not). To support user-level traps, the instruction RETURN will return from the trap in nonprivileged mode.

Support for Lisp and Smalltalk

The primary remaining arithmetic feature is tagged addition and subtraction. The designers of SPARC spent some time thinking about languages like LISP and Smalltalk, and this influenced some of the features of SPARC already discussed: register windows, conditional trap instructions, calls with 32-bit instruction addresses, and multiword arithmetic (see Taylor et al. [1986] and Ungar et al. [1984]). A small amount of support is offered for tagged data types with operations for addition, subtraction, and hence comparison. The two least-significant bits indicate whether the operand is an integer (coded as 00), so TADDcc and TSUBcc set the overflow bit if either operand is not tagged as an integer or if the result is too large. A subsequent conditional branch or trap instruction can decide what to do. (If the operands are not integers, software recovers the operands, checks the types of the operands, and invokes the correct operation based on those types.) It turns out that the misaligned memory access trap can also be put to use for tagged data, since loading from a pointer with the wrong tag can be an invalid access. Figure E.32 shows both types of tag support.
 

Figure E.32 SPARC uses the two least-significant bits to encode different data types for the tagged arithmetic instructions. (a) Integer arithmetic, which takes a single cycle as long as the operands and the result are integers. (b) The misaligned trap can be used to catch invalid memory accesses, such as trying to use an integer as a pointer. For languages with paired data like LISP, an offset of –3 can be used to access the even word of a pair (CAR) and +1 can be used for the odd word of a pair (CDR). 
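The overflow rule for tagged addition can be expressed in a few lines. The Python sketch below models only the result and the overflow indication described in the text; the function name `taddcc` mirrors the instruction but the code is an illustrative assumption, not a complete model of the SPARC condition codes.

```python
def taddcc(a: int, b: int, bits: int = 32):
    """Return (result, overflow) for a tagged add of two `bits`-bit words."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    # Integers carry the tag 00 in their two least-significant bits.
    tag_fault = (a & 3) != 0 or (b & 3) != 0
    result = (a + b) & mask
    # Signed overflow: both operands differ in sign from the result.
    arith_overflow = ((a ^ result) & (b ^ result) & sign) != 0
    return result, tag_fault or arith_overflow
```

With the integer 5 encoded as 20 and 7 encoded as 28, the sum is 48, the encoding of 12, and no overflow is reported. An operand with a nonzero tag sets the overflow indication, so a following conditional branch or trap can hand the operation to software.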

Overlapped Integer and Floating-Point Operations

SPARC allows floating-point instructions to overlap execution with integer instructions. To recover from an interrupt during such a situation, SPARC has a queue of pending floating-point instructions and their addresses. RDPR allows the processor to empty the queue. The second floating-point feature is the inclusion of floating-point square root instructions FSQRTS, FSQRTD, and FSQRTQ.

Remaining Instructions

The remaining unique features of SPARC are as follows:

E.10 Instructions Unique to PowerPC

PowerPC is the result of several generations of IBM commercial RISC machines—IBM RT/PC, IBM Power-1, and IBM Power-2—plus the Motorola 88x00.

Branch Registers: Link and Counter

Rather than dedicate one of the 32 general-purpose registers to save the return address on procedure call, PowerPC puts the address into a special register called the link register. Since many procedures will return without calling another procedure, link doesn't always have to be saved away. Making the return address a special register makes the return jump faster since the hardware need not go through the register read pipeline stage for return jumps.

In a similar vein, PowerPC has a count register to be used in for loops where the program iterates for a fixed number of times. By using a special register the branch hardware can determine quickly whether a branch based on the count register is likely to branch, since the value of the register is known early in the execution cycle. Tests of the value of the count register in a branch instruction will automatically decrement the count register.

Given that the count register and link register are already located with the hardware that controls branches, and that one of the problems in branch prediction is getting the target address early in the pipeline (see Chapter 6), the PowerPC architects decided to make a second use of these registers. Either register can hold a target address of a conditional branch. Thus PowerPC supplements its basic conditional branch with two instructions that get the target address from these registers (BCLR, BCCTR).

Remaining Instructions

Unlike most other RISC machines, register 0 is not hardwired to the value 0. It cannot be used as a base register—that is, it generates a 0 in this case—but in base+index addressing it can be used as the index. The other unique features of the PowerPC are as follows:

E.11 Instructions Unique to PA-RISC 2.0

PA-RISC was expanded slightly in 1990 with version 1.1 and changed significantly in 2.0 with 64-bit extensions in 1996. PA-RISC perhaps has the most unusual features of any desktop RISC machine. For example, it has the most addressing modes, instruction formats, and, as we shall see, several instructions that are really the combination of two simpler instructions.

Nullification

As shown in Figure E.30, several RISC machines can choose to not execute the instruction following a delayed branch in order to improve utilization of the branch slot. This is called nullification in PA-RISC, and it has been generalized to apply to any arithmetic-logical instruction as well as to all branches. Thus an add instruction can add two operands, store the sum, and cause the following instruction to be skipped if the sum is zero. Like conditional move instructions, nullification allows PA-RISC to avoid branches in cases where there is just one instruction in the then part of an if statement.

A Cornucopia of Conditional Branches

Given nullification, PA-RISC did not need to have separate conditional branch instructions. The inventors could have recommended that nullifying instructions precede unconditional branches, thereby simplifying the instruction set. Instead, PA-RISC has the largest number of conditional branches of any RISC machine. Figure E.33 shows the conditional branches of PA-RISC. As you can see, several are really combinations of two instructions.
 

Name | Instruction | Notation
COMB | Compare and branch | if (cond(Rs1,Rs2)) {PC <--- PC + offset12}
COMIB | Compare imm. and branch | if (cond(imm5,Rs2)) {PC <--- PC + offset12}
MOVB | Move and branch | Rs2 <--- Rs1, if (cond(Rs1,0)) {PC <--- PC + offset12}
MOVIB | Move immediate and branch | Rs2 <--- imm5, if (cond(imm5,0)) {PC <--- PC + offset12}
ADDB | Add and branch | Rs2 <--- Rs1 + Rs2, if (cond(Rs1 + Rs2,0)) {PC <--- PC + offset12}
ADDIB | Add imm. and branch | Rs2 <--- imm5 + Rs2, if (cond(imm5 + Rs2,0)) {PC <--- PC + offset12}
BB | Branch on bit | if (cond(Rs_p,0)) {PC <--- PC + offset12}
BVB | Branch on variable bit | if (cond(Rs_sar,0)) {PC <--- PC + offset12}
Figure E.33 The PA-RISC conditional branch instructions. The 12-bit offset is called offset12 in this table, and the 5-bit immediate is called imm5. The 16 conditions are =, <, <=, odd, signed overflow, unsigned no overflow, zero or no overflow unsigned, never, and their respective complements. The BB instruction selects one of the 32 bits of the register and branches depending if its value is 0 or 1. The BVB selects the bit to branch using the shift amount register, a special-purpose register. The subscript notation specifies a bit field. 

Synthesized Multiply and Divide

PA-RISC provides several primitives so that multiply and divide can be synthesized in software. Instructions that shift one operand 1, 2, or 3 bits and then add, trapping or not on overflow, are useful in multiplies. (Alpha also includes instructions that multiply the second operand of adds and subtracts by 4 or by 8: S4ADD, S8ADD, S4SUB, and S8SUB.) Divide step performs the critical step of nonrestoring divide, adding or subtracting depending on the sign of the prior result. Magenheimer et al. [1988] measured the size of operands in multiplies and divides to show how well the multiply step would work. Using these data for C programs, Muchnick [1988] found that by making special cases the average multiply by a constant takes 6 clock cycles and multiply of variables takes 24 clock cycles. PA-RISC has 10 instructions for these operations.
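The two ideas in this paragraph can be sketched in Python. This is an illustrative model under stated assumptions, not PA-RISC code: `shift_add`, `times10`, and `nonrestoring_divide` are hypothetical helper names, overflow trapping is ignored, and the divide loop works on unsigned operands only.

```python
MASK32 = 0xFFFFFFFF


def shift_add(shift, a, b):
    """Model a shift-and-add primitive: (a << shift) + b, shift in 1..3."""
    if shift not in (1, 2, 3):
        raise ValueError("shift must be 1, 2, or 3")
    return ((a << shift) + b) & MASK32


def times10(x):
    """Multiply by the constant 10 in two steps: 10x = 2 * (4x + x)."""
    five_x = shift_add(2, x, x)        # 4x + x = 5x
    return shift_add(1, five_x, 0)     # 2 * 5x + 0 = 10x


def nonrestoring_divide(dividend, divisor, bits=32):
    """Unsigned nonrestoring division, one quotient bit per step.

    Each step shifts in the next dividend bit, then subtracts the divisor
    if the prior partial remainder was non-negative and adds it otherwise.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    rem, quo = 0, 0
    for i in reversed(range(bits)):
        bit = (dividend >> i) & 1
        if rem >= 0:
            rem = 2 * rem + bit - divisor
        else:
            rem = 2 * rem + bit + divisor
        quo = (quo << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        rem += divisor                 # single final correction
    return quo, rem
```

The point of the nonrestoring form is that every step is exactly one add or one subtract, with the choice driven by the sign of the previous result, which is what makes a single divide-step instruction practical.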

The original SPARC architecture used similar optimizations, but as the number of available transistors increased, the instruction set was expanded to include full multiply and divide operations. PA-RISC gives some support along these lines by putting a full 32-bit integer multiply in the floating-point unit; however, the integer data must first be moved to floating-point registers.

Decimal Operations

COBOL programs compute on decimal values, stored as 4 bits per digit, rather than converting back and forth between binary and decimal. PA-RISC has instructions that will convert the sum from a normal 32-bit add into proper decimal digits. It also provides logical and arithmetic operations that set the condition codes to test for carries of digits, bytes, or halfwords. These operations also test whether bytes or halfwords are zero. These operations would be useful in arithmetic on 8-bit ASCII characters. Five PA-RISC instructions provide decimal support.
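To make the decimal correction concrete, here is a small Python sketch of adding two packed-decimal words digit by digit. It is an emulation of the idea (binary add per digit, then a correction driven by the digit carry), not the PA-RISC instruction sequence; the function name `bcd_add` is hypothetical.

```python
def bcd_add(a, b):
    """Add two 8-digit packed-decimal values (4 bits per digit, 32 bits).

    Returns (sum, carry_out). Each digit is added in binary. When a digit
    sum exceeds 9, adding 6 skips the unused codes 10..15 and produces a
    carry into the next digit.
    """
    result = 0
    carry = 0
    for shift in range(0, 32, 4):
        digit = ((a >> shift) & 0xF) + ((b >> shift) & 0xF) + carry
        if digit > 9:
            digit += 6                 # decimal correction
        carry = digit >> 4             # carry out of this digit
        result |= (digit & 0xF) << shift
    return result, carry
```

For example, `bcd_add(0x1234, 0x5678)` yields the digits of 1234 + 5678 = 6912, that is `(0x6912, 0)`.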

Remaining Instructions

Here are some remaining PA-RISC instructions:

E.12 Instructions Unique to ARM

It's hard to pick the most unusual feature of ARM, but perhaps it is conditional execution of instructions. Every instruction starts with a 4-bit field that determines whether it will act as a nop or as a real instruction, depending on the condition codes. Hence conditional branches are properly considered as conditionally executing the unconditional branch instruction. Conditional execution allows avoiding a branch to jump over a single instruction. It takes less code space and time to simply conditionally execute one instruction.

The 12-bit immediate field has a novel interpretation. The 8 least-significant bits are zero-extended to a 32-bit value, then rotated right the number of bits specified in the first 4 bits of the field multiplied by 2. Whether this split actually catches more immediates than a simple 12-bit field would be an interesting study. One advantage is that this scheme can represent all powers of 2 in a 32-bit word.
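The rotated-immediate rule can be checked with a short Python sketch. It assumes the "first 4 bits" are the high-order bits (bits 11..8) of the 12-bit field; the function names `ror32`, `decode_immediate`, and `encode_immediate` are hypothetical helpers for illustration.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def decode_immediate(field12):
    """Expand a 12-bit field: low 8 bits rotated right by 2 * (high 4 bits)."""
    rotate = (field12 >> 8) & 0xF
    imm8 = field12 & 0xFF
    return ror32(imm8, 2 * rotate)


def encode_immediate(value):
    """Return a 12-bit field that decodes to value, or None if none exists."""
    value &= MASK32
    for rotate in range(16):
        # Rotating left by 2*rotate undoes the decoder's rotate right.
        imm8 = ror32(value, 32 - 2 * rotate)
        if imm8 <= 0xFF:
            return (rotate << 8) | imm8
    return None
```

Under this model, every power of 2 from `1 << 0` to `1 << 31` has an encoding, while a constant such as `0x101`, whose set bits span 9 positions, does not and would need to be built some other way.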

Operand shifting is not limited to immediates. The second register of all arithmetic and logical processing operations has the option of being shifted before being operated on. The shift options are shift left logical, shift right logical, shift right arithmetic, and rotate right. Once again, it would be interesting to see how often operations like rotate-and-add, shift-right-and-test, ..., occur in ARM programs.
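A minimal Python model of the shifted second operand follows. The names `shift_operand` and `add_shifted` and the string mnemonics are assumptions made for this sketch, and shift amounts are assumed to lie in 0..31; it does not model condition codes or the carry out of the shifter.

```python
MASK32 = 0xFFFFFFFF


def shift_operand(value, kind, amount):
    """Apply one of the four shift options to a 32-bit value (amount 0..31)."""
    value &= MASK32
    if not 0 <= amount <= 31:
        raise ValueError("amount must be in 0..31")
    if kind == "lsl":                  # shift left logical
        return (value << amount) & MASK32
    if kind == "lsr":                  # shift right logical
        return value >> amount
    if kind == "asr":                  # shift right arithmetic
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    if kind == "ror":                  # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift kind {kind!r}")


def add_shifted(rn, rm, kind, amount):
    """rn + shifted(rm): a single data-processing operation in this model."""
    return (rn + shift_operand(rm, kind, amount)) & MASK32
```

For instance, `add_shifted(x, x, "lsl", 2)` computes 5x in one operation, the kind of combined shift-and-add that would otherwise take two instructions.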

Remaining Instructions

Below is a list of the remaining unique instructions of the ARM architecture:

E.13 Instructions Unique to Thumb

In the ARM version 4 model, frequently executed procedures will use ARM instructions to get maximum performance, with the less frequently executed ones using Thumb to reduce the overall code size of the program. Since typically only a few procedures dominate execution time, the hope is that this hybrid gets the best of both worlds.

Although Thumb instructions are translated by the hardware into conventional ARM instructions for execution, there are several restrictions. First, conditional execution is dropped from almost all instructions. Second, only the first 8 registers are easily available in all instructions, with the stack pointer, link register, and program counter used implicitly in some instructions. Third, Thumb uses a two-operand format to save space. Fourth, the unique shifted immediates and shifted second operands have disappeared and are replaced by separate shift instructions. Fifth, the addressing modes are simplified. Finally, putting all instructions into 16 bits forces many more instruction formats.

In many ways the simplified Thumb architecture is more conventional than ARM.

Here are additional changes made from ARM in going to Thumb:


E.14 Instructions Unique to SuperH

Register 0 plays a special role in SuperH address modes. It can be added to another register to form an address in indirect indexed addressing and PC-relative addressing. R0 is used to load constants to give a larger addressing range than can easily be fit into the 16-bit instructions of the SuperH. R0 is also the only register that can be an operand for immediate versions of AND, CMP, OR, and XOR.

Below is a list of the remaining unique details of the SuperH architecture:


E.15 Instructions Unique to M32R

The most unusual feature of the M32R is a slight VLIW approach to the pairs of 16-bit instructions. A bit is reserved in the first instruction of the pair to say whether this instruction can be executed in parallel with the next instruction—that is, the two instructions are independent—or if these two must be executed sequentially. (An earlier machine that offered a similar option was the Intel i860.) This feature is included for future implementations of the architecture.

One surprise is that all branch displacements are shifted left 2 bits before being added to the PC and the lower two bits of the PC are set to 0. Since some instructions are only 16 bits long, this shift means that a branch cannot go to any instruction in the program: it can only branch to instructions on word boundaries. A similar restriction is placed on the return address for the branch-and-link and jump-and-link instructions: they can only return to a word boundary. Thus for a slightly larger branch distance, software must ensure that all branch addresses and all return addresses are aligned to a word boundary. The M32R code space is probably slightly larger and it probably executes more NOP instructions than it would if the branch address were only shifted left 1 bit.
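The effect of the 2-bit shift can be seen in a short Python sketch of the target calculation. The function name `branch_target` is hypothetical, the displacement width is passed in as a parameter because the text does not give it, and treating the displacement as a signed value is an assumption.

```python
def branch_target(pc, disp, disp_bits):
    """Target = (pc with low 2 bits cleared) + (sign-extended disp << 2)."""
    sign = 1 << (disp_bits - 1)
    disp = (disp & (sign - 1)) - (disp & sign)   # sign-extend (assumed signed)
    return ((pc & ~0x3) + (disp << 2)) & 0xFFFFFFFF


def reachable(pc, address, disp_bits):
    """True if some displacement of the given width branches to address."""
    return any(
        branch_target(pc, d, disp_bits) == address
        for d in range(1 << disp_bits)
    )
```

With an 8-bit displacement, `reachable(0x1000, 0x1004, 8)` holds, but `reachable(0x1000, 0x1002, 8)` does not: a 16-bit instruction in the second half of a word can never be a branch target, so the compiler or assembler has to align targets, padding with a NOP where needed.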

However, the VLIW feature above means that a NOP can execute in parallel with another 16-bit instruction, so that the padding doesn't take more clock cycles. The code size expansion depends on the ability of the compiler to schedule code and to pair successive 16-bit instructions; Mitsubishi claims that code size overall is only 7% larger than that for the Motorola 680x0 architecture.

Below is the remaining unique detail of the M32R architecture:

E.16 Instructions Unique to MIPS16

MIPS16 is not really a separate instruction set but a 16-bit extension of the full 32-bit MIPS architecture. It is compatible with any of the 32-bit address MIPS architectures (MIPS I, MIPS II) or 64-bit architectures (MIPS III, IV, V). The ISA mode bit determines the width of instructions: 0 means 32-bit-wide instructions and 1 means 16-bit-wide instructions. The new JALX instruction toggles the ISA mode bit to switch to the other ISA. JR and JALR have been redefined to set the ISA mode bit from the least significant bit of the register containing the branch address, and this bit is not considered part of the address. All jump and link instructions save the current mode bit as the least significant bit of the return address.

Hence MIPS16 supports whole procedures containing either 16-bit or 32-bit instructions, but it does not support mixing the two lengths together in a single procedure. The one exception is JAL and JALX: these two instructions need 32 bits even in the 16-bit mode, presumably to get a large enough address to branch to far procedures.

In picking this subset, MIPS decided to include opcodes for some three-operand instructions and to keep 16 opcodes for 64-bit operations. The combination of this many opcodes and operands in 16 bits led the architects to provide only 8 easy-to-use registers, just like Thumb, whereas the other embedded RISCs offer about 16 registers. Since the hardware must include the full 32 registers of the 32-bit ISA mode, MIPS16 includes move instructions to copy values between the 8 MIPS16 registers and the remaining 24 registers of the full MIPS architecture. To reduce pressure on the 8 visible registers, the stack pointer is considered a separate register. MIPS16 includes a variety of separate opcodes to do data transfers using sp as a base register and to increment sp: LWSP, LDSP, SWSP, SDSP, ADJSP, DADJSP, ADDIUSP, and DADDIUSP.

To fit within the 16-bit limit, immediate fields have generally been shortened to 5 to 8 bits. MIPS16 provides a way to extend its shorter immediates into the full width of immediates in the 32-bit mode. Borrowing a trick from the Intel 8086, the EXTEND instruction is really a 16-bit prefix that can be prepended to any MIPS16 instruction with an address or immediate field. The prefix supplies enough bits to turn the 5-bit fields of data transfers and 5- to 8-bit fields of arithmetic immediates into 16-bit constants. Alas, there are two exceptions. ADDIU and DADDIU start with 4-bit immediate fields, but since EXTEND can only supply 11 more bits, the wider immediate is limited to 15 bits. EXTEND also extends the 3-bit shift fields into 5-bit fields for shifts. (In case you were wondering, the EXTEND prefix does not need to start on a 32-bit boundary.)

To further address the supply of constants, MIPS16 added a new addressing mode! PC-relative addressing for load word (LWPC) and load double (LDPC) shifts an 8-bit immediate field by 2 or 3 bits, respectively, adding it to the PC with the lower 2 or 3 bits cleared. The constant word or doubleword is then loaded into a register. Thus 32-bit or 64-bit constants can be included with MIPS16 code, despite the loss of LUI to set the upper register bits. Given the new addressing mode, there is also an instruction (ADDIUPC) to calculate a PC-relative address and place it in a register.
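The PC-relative address calculation described above can be written out as a small Python sketch. The function names are hypothetical, and treating the 8-bit immediate as unsigned (zero-extended) is an assumption, since the text does not say how it is extended.

```python
def lwpc_address(pc, imm8):
    """Load-word address: PC with low 2 bits cleared plus (imm8 << 2)."""
    return (pc & ~0x3) + ((imm8 & 0xFF) << 2)   # imm8 assumed unsigned


def ldpc_address(pc, imm8):
    """Load-double address: PC with low 3 bits cleared plus (imm8 << 3)."""
    return (pc & ~0x7) + ((imm8 & 0xFF) << 3)   # imm8 assumed unsigned
```

Because the PC bits are cleared before the scaled immediate is added, the result is always word aligned for LWPC and doubleword aligned for LDPC, regardless of where the 16-bit instruction itself sits. Under the unsigned assumption, the constant pool must lie within 1020 bytes (LWPC) or 2040 bytes (LDPC) past the aligned PC.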

MIPS16 differs from the other embedded RISCs in that it can subset a 64-bit address architecture. As a result it has 16-bit instruction-length versions of 64-bit data operations: data transfer (LD, SD, LWU), arithmetic operations (DADDU/IU, DSUBU, DMULT/U, DDIV/U), and shifts (DSLL/V, DSRA/V, DSRL/V).

Since MIPS plays such a prominent role in this book, we show all the additional changes made from the MIPS core instructions in going to MIPS16:


E.17 Concluding Remarks

This Web extension covers the addressing modes, instruction formats, and all instructions found in 10 recent RISC architectures. Although the later sections concentrate on the differences, it would not be possible to cover 10 architectures in these few pages if there were not so many similarities. In fact, we would guess that more than 90% of the instructions executed for any of these architectures would be found in Figures E.9 through E.17. To contrast this homogeneity, Figure E.34 gives a summary for four architectures from the 1970s in a format similar to that shown in Figure E.1. (Imagine trying to write a single chapter in this style for those architectures!) In the history of computing, there has never been such widespread agreement on computer architecture.
 

  | IBM 360/370 | Intel 8086 | Motorola 68000 | DEC VAX
Date announced | 1964/1970 | 1978 | 1980 | 1977
Instruction size(s) (bits) | 16, 32, 48 | 8, 16, 24, 32, 40, 48 | 16, 32, 48, 64, 80 | 8, 16, 24, 32, ..., 432
Addressing (size, model) | 24 bits, flat / 31 bits, flat | 4+16 bits, segmented | 24 bits, flat | 32 bits, flat
Data aligned? | Yes 360 / No 370 | No | 16-bit aligned | No
Data addressing modes | 2/3 | 5 | 9 | >= 14
Protection | Page | None | Optional | Page
Page size | 2 KB & 4 KB | --- | 0.25 to 32 KB | 0.5 KB
I/O | Opcode | Opcode | Memory mapped | Memory mapped
Integer registers (size, model, number) | 16 GPR x 32 bits | 8 dedicated data x 16 bits | 8 data & 8 address x 32 bits | 15 GPR x 32 bits
Separate floating-point registers | 4 x 64 bits | Optional: 8 x 80 bits | Optional: 8 x 80 bits | 0
Floating-point format | IBM (floating hexadecimal) | IEEE 754 single, double, extended | IEEE 754 single, double, extended | DEC
Figure E.34 Summary of four 1970s architectures. Unlike the architectures in Figure E.1, there is little agreement between these architectures in any category. (See Chapter 3 for more details on the 80x86 and Web Extension III for a description of the VAX.) 

This style of architectures cannot remain static, however. Like people, instruction sets tend to get bigger as they get older. Figure E.35 shows the genealogy of these instruction sets, and Figure E.36 shows which features were added to or deleted from generations of desktop RISCs over time.

As you can see, all the desktop RISC machines have evolved to 64-bit address architectures, and they have done so fairly painlessly. The only remaining major desktop 32-bit address architecture is the Intel 80x86, which is to be succeeded by the Intel/HP IA-64 in about 1999.

Whether IA-64 is similar to the 80x86, similar to desktop RISCs, or the beginning of a new branch of computer architecture will be known in just a moment on the architecture timeline. If IA-64 proves successful, then microprocessor architectures of the 1970s will finally step into history rather than shape the cost and performance of modern desktop computing.
 
 

Figure E.35 The lineage of RISC instruction sets. Commercial machines are shown in plain text and research machines in bold. The CDC-6600 and Cray-1 were load-store machines with register 0 fixed at 0, and separate integer and floating-point registers. Instructions could not cross word boundaries. An early IBM research machine led to the 801 and America research projects, with the 801 leading to the unsuccessful RT/PC and America leading to the successful Power architecture. Some people who worked on the 801 later joined Hewlett Packard to work on the PA-RISC. The two university projects were the basis of MIPS and SPARC machines. According to Furber [1996], the Berkeley RISC project was the inspiration of the ARM architecture. While ARM1, ARM2, and ARM3 were names of both architectures and chips, ARM version 4 is the name of the architecture used in ARM7, ARM8, and StrongARM chips. (There are no ARMv4 and ARM5 chips, but ARM6 and early ARM7 chips use the ARM3 architecture.) DEC built a RISC microprocessor in 1988 but did not introduce it. Instead, DEC shipped workstations using MIPS microprocessors for three years before they brought out their own RISC instruction set, Alpha 21064, which is very similar to MIPS III and PRISM. The Alpha architecture has had small extensions but they have not been formalized with version numbers; we used the version 3 because that is the version of the reference manual. The Alpha 21164A chip added byte and halfword loads and stores, and the Alpha 21264 includes the MAX multimedia and bit count instructions. Internally, Digital names chips after the fabrication technology: EV4 (21064), EV45 (21064A), EV5 (21164), EV56 (21164A), and EV6 (21264). "EV" stands for "extended VAX." 


 
 

  PA-RISC SPARC MIPS POWER
Feature  1.0  1.1  2.0  v. 8  v. 9  II  III  IV  V PC 
Interlocked loads X " " X "   + " "   X " "
Load/store FP double X " " X "   + " "   X " "
Semaphore X " " X "   + " "   X " "
Square root X " " X "   + " "     + "
Single-precision FP ops X " " X " X " " "       +
Memory synchronize X " " X "   + " "   X " "
Coprocessor X " " X -- X " " "        
Base + index addressing X " " X "       +   X " "
equiv. 32 64-bit FP registers   " "   +     + "   X " "
Annulling delayed branch X " " X "   + " "        
Branch register contents X " "   + X " " "        
Big/Little Endian   + "   + X " " "       +
Branch prediction bit         +   + " "   X " "
Conditional move         +       +   X " --
Prefetch data into cache     +   +       +   X " "
64-bit addressing/ int. ops     +   +     + "       +
32-bit multiply, divide   + "   + X " " "   X " "
Load/store FP quad         +             + --
Fused FP mul/add     +           +   X " "
String instructions X " "               X " --
Multimedia support   X " X           X      
Figure E.36 Features added to desktop RISC machines. X means in the original machine, + means added later, " means continued from prior machine, and -- means removed from architecture. Alpha is not included, but it added byte and word loads and stores, and bit count and multimedia extensions, in version 3. MIPS V added the MDMX instructions and paired single floating-point operations.

E.18 Acknowledgments

We would like to thank the following people for comments on drafts of this Web extension: Professor Steven B. Furber, University of Manchester; Dr. Dileep Bhandarkar, Intel Corporation; Dr. Earl Killian, Silicon Graphics/MIPS; and Dr. Hiokazu Takata, Mitsubishi Electric Corporation.


E.19 References

Bhandarkar, D. P. [1995]. Alpha Architecture and Implementations, Digital Press, Newton, MA.
Darcy, J. D., and D. Gay [1996]. FLECKmarks: Measuring Floating Point Performance using a Full IEEE Compliant Arithmetic BenchmarK, CS 252 class project, U.C. Berkeley (see HTTP.CS.Berkeley.EDU/~darcy/Projects/cs252/).
Digital Semiconductor [1996]. Alpha Architecture Handbook, Version 3, Digital Press, Maynard, MA, Order number EC-QD2KB-TE (October).
Furber, S. B. [1996]. ARM System Architecture, Addison-Wesley, Harlow, England (see www.cs.man.ac.uk/amulet/publications/books/ARMsysArch).
Hewlett Packard [1994]. PA-RISC 2.0 Architecture Reference Manual, 3rd ed.
Hitachi [1997]. SuperH RISC Engine SH7700 Series Programming Manual (see www.halsp.hitachi.com/tech_prod/ and search for title).
IBM [1994]. The PowerPC Architecture, Morgan Kaufmann, San Francisco.
Kane, G., and J. Heinrich [1992]. MIPS RISC Architecture, Prentice Hall, Englewood Cliffs, NJ.
Kane, G. [1996]. PA-RISC 2.0 Architecture, Prentice Hall PTR, Upper Saddle River, NJ.
Kissell, K. D. [1997]. MIPS16: High-Density for the Embedded Market (see www.sgi.com/MIPS/arch/MIPS16/MIPS16.whitepaper.pdf).
Magenheimer, D. J., L. Peters, K. W. Pettis, and D. Zuras [1988]. "Integer Multiplication and Division on the HP Precision Architecture," IEEE Trans. on Computers 37:8, 980–990.
MIPS [1997]. MIPS16 Application Specific Extension Product Description (see www.sgi.com/MIPS/arch/MIPS16/mips16.pdf).
Mitsubishi [1996]. Mitsubishi 32-Bit Single Chip Microcomputer M32R Family Software Manual (September).
Muchnick, S. S. [1988]. "Optimizing Compilers for SPARC," Sun Technology 1:3 (Summer), 64–77.
Silicon Graphics [1996]. MIPS V Instruction Set (see http://www.sgi.com/MIPS/arch/ISA5/#MIPSV_indx).
Sites, R. L., and R. Witek (eds.) [1995]. Alpha Architecture Reference Manual, Second Edition, Digital Press, Newton, MA.
Sun Microsystems [1989]. The SPARC Architectural Manual, Version 8, Part No. 800-1399-09, August 25.
Taylor, G., P. Hilfinger, J. Larus, D. Patterson, and B. Zorn [1986]. "Evaluation of the SPUR LISP Architecture," Proc. 13th Symposium on Computer Architecture (June), Tokyo.
Ungar, D., R. Blau, P. Foley, D. Samples, and D. Patterson [1984]. "Architecture of SOAR: Smalltalk on a RISC," Proc. 11th Symposium on Computer Architecture (June), Ann Arbor, MI, 188–197.
Weaver, D. L., and T. Germond [1994]. The SPARC Architectural Manual, Version 9, Prentice Hall, Englewood Cliffs, NJ.
Weiss, S., and J. E. Smith [1994]. Power and PowerPC, Morgan Kaufmann, San Francisco.