Chapter 3
Arithmetic for Computers

Chapter 3 — Arithmetic for Computers — 2
Arithmetic for Computers (§3.1 Introduction)
• Operations on integers
  - Addition and subtraction
  - Multiplication and division
  - Dealing with overflow
• Floating-point real numbers
  - Representation and operations

Chapter 3 — Arithmetic for Computers — 3
Integer Addition (§3.2 Addition and Subtraction)
• Example: 7 + 6
• Overflow if result out of range
  - Adding +ve and –ve operands: no overflow
  - Adding two +ve operands: overflow if result sign is 1
  - Adding two –ve operands: overflow if result sign is 0

Chapter 3 — Arithmetic for Computers — 4
Integer Subtraction
• Add negation of second operand
• Example: 7 – 6 = 7 + (–6)
    +7: 0000 0000 … 0000 0111
    –6: 1111 1111 … 1111 1010
    +1: 0000 0000 … 0000 0001
• Overflow if result out of range
  - Subtracting two +ve or two –ve operands: no overflow
  - Subtracting +ve from –ve operand: overflow if result sign is 0
  - Subtracting –ve from +ve operand: overflow if result sign is 1
  (These sign rules are sketched in C after slide 6.)

Chapter 3 — Arithmetic for Computers — 5
Dealing with Overflow
• Some languages (e.g., C) ignore overflow
  - Use MIPS addu, addiu, subu instructions
• Other languages (e.g., Ada, Fortran) require raising an exception
  - Use MIPS add, addi, sub instructions
  - On overflow, invoke exception handler
    - Save PC in exception program counter (EPC) register
    - Jump to predefined handler address
    - mfc0 (move from coprocessor reg) instruction can retrieve EPC value, to return after corrective action

Chapter 3 — Arithmetic for Computers — 6
Arithmetic for Multimedia
• Graphics and media processing operates on vectors of 8-bit and 16-bit data
• Use 64-bit adder, with partitioned carry chain
  - Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
  - SIMD (single instruction, multiple data)
• Saturating operations (see the C sketch below)
  - On overflow, result is largest representable value
    (cf. 2's-complement modulo arithmetic)
  - E.g., clipping in audio, saturation in video
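The sign rules from the Integer Addition and Integer Subtraction slides translate directly into software checks, which is roughly what code compiled for addu/subu must do when trapping semantics are wanted. A minimal C sketch under that assumption (function names are mine, not MIPS or textbook names):

  #include <stdbool.h>
  #include <stdint.h>

  /* Addition overflows iff the operands have the same sign and the
     wrapped-around sum has a different sign (slide 3 rules). */
  bool add_overflows(int32_t a, int32_t b) {
      uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
      uint32_t sum = ua + ub;                      /* wraps around, like addu */
      return !((ua ^ ub) & 0x80000000u) && ((ua ^ sum) & 0x80000000u);
  }

  /* Subtraction a - b overflows iff the operands have different signs and
     the result's sign differs from a's sign (slide 4 rules). */
  bool sub_overflows(int32_t a, int32_t b) {
      uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
      uint32_t diff = ua - ub;                     /* wraps around, like subu */
      return ((ua ^ ub) & 0x80000000u) && ((ua ^ diff) & 0x80000000u);
  }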
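Saturating arithmetic from the multimedia slide, sketched in C for a single signed 8-bit lane; real SIMD hardware clips all lanes of the partitioned adder in parallel, so this scalar version is only illustrative:

  #include <stdint.h>

  /* Saturating signed 8-bit add: on overflow, clamp to the largest or
     smallest representable value instead of wrapping around. */
  int8_t sat_add8(int8_t a, int8_t b) {
      int16_t wide = (int16_t)a + (int16_t)b;   /* exact sum fits in 16 bits */
      if (wide > INT8_MAX) return INT8_MAX;     /* clip at +127 */
      if (wide < INT8_MIN) return INT8_MIN;     /* clip at -128 */
      return (int8_t)wide;
  }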
Chapter 3 — Arithmetic for Computers — 7
Multiplication (§3.3 Multiplication)
• Start with long-multiplication approach:

        1000      multiplicand
      × 1001      multiplier
        ----
        1000
       0000
      0000
     1000
     -------
     1001000      product

• Length of product is the sum of operand lengths

Chapter 3 — Arithmetic for Computers — 8
Multiplication Hardware
[Figure: sequential multiplication hardware; the product register is initially 0]

Chapter 3 — Arithmetic for Computers — 9
Optimized Multiplier
• Perform steps in parallel: add/shift (see the C sketch after slide 16)
• One cycle per partial-product addition
  - That's OK, if the frequency of multiplications is low

Chapter 3 — Arithmetic for Computers — 10
Faster Multiplier
• Uses multiple adders
  - Cost/performance trade-off
• Can be pipelined
  - Several multiplications performed in parallel

Chapter 3 — Arithmetic for Computers — 11
MIPS Multiplication
• Two 32-bit registers for product
  - HI: most-significant 32 bits
  - LO: least-significant 32 bits
• Instructions
  - mult rs, rt / multu rs, rt
    - 64-bit product in HI/LO
  - mfhi rd / mflo rd
    - Move from HI/LO to rd
    - Can test HI value to see if product overflows 32 bits
  - mul rd, rs, rt
    - Least-significant 32 bits of product → rd

Chapter 3 — Arithmetic for Computers — 12
Division (§3.4 Division)
• Check for 0 divisor
• Long-division approach
  - If divisor ≤ dividend bits
    - 1 bit in quotient, subtract
  - Otherwise
    - 0 bit in quotient, bring down next dividend bit
• Restoring division (see the C sketch after slide 16)
  - Do the subtract, and if remainder goes < 0, add divisor back
• Signed division
  - Divide using absolute values
  - Adjust sign of quotient and remainder as required
• Example (n-bit operands yield n-bit quotient and remainder):

                1001        quotient
            ---------
     1000 )  1001010        divisor ) dividend
            –1000
             ----
               10
               101
               1010
              –1000
               ----
                 10         remainder

Chapter 3 — Arithmetic for Computers — 13
Division Hardware
[Figure: division hardware; the remainder register initially holds the dividend, and the divisor starts in the left half of the divisor register]

Chapter 3 — Arithmetic for Computers — 14
Division Example
• Divide 0111₂ (7) by 0010₂ (2): n + 1 = 4 + 1 steps
• Long-division check: 0111₂ ÷ 0010₂ = 11₂ remainder 1₂

  Iteration  Step                                Quotient  Divisor    Remainder
  0          Initial values                      0000      0010 0000  0000 0111
  1          1: Rem = Rem – Div                  0000      0010 0000  1110 0111
             2b: Rem < 0 → +Div, sll Q, Q0 = 0   0000      0010 0000  0000 0111
             3: Shift Div right                  0000      0001 0000  0000 0111
  2          1: Rem = Rem – Div                  0000      0001 0000  1111 0111
             2b: Rem < 0 → +Div, sll Q, Q0 = 0   0000      0001 0000  0000 0111
             3: Shift Div right                  0000      0000 1000  0000 0111
  3          1: Rem = Rem – Div                  0000      0000 1000  1111 1111
             2b: Rem < 0 → +Div, sll Q, Q0 = 0   0000      0000 1000  0000 0111
             3: Shift Div right                  0000      0000 0100  0000 0111
  4          1: Rem = Rem – Div                  0000      0000 0100  0000 0011
             2a: Rem ≥ 0 → sll Q, Q0 = 1         0001      0000 0100  0000 0011
             3: Shift Div right                  0001      0000 0010  0000 0011
  5          1: Rem = Rem – Div                  0001      0000 0010  0000 0001
             2a: Rem ≥ 0 → sll Q, Q0 = 1         0011      0000 0010  0000 0001
             3: Shift Div right                  0011      0000 0001  0000 0001

Chapter 3 — Arithmetic for Computers — 15
Optimized Divider
• One cycle per partial-remainder subtraction
• Looks a lot like a multiplier!
  - Same hardware can be used for both

Chapter 3 — Arithmetic for Computers — 16
Faster Division
• Can't use parallel hardware as in the multiplier
  - Subtraction is conditional on the sign of the remainder
• Faster dividers (e.g., SRT division) generate multiple quotient bits per step
  - Still require multiple steps
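The sequential multiplier of slides 8–9 is easy to mimic in software: per cycle, test the low multiplier bit, conditionally add the multiplicand into the upper half of the product register, keep the adder's carry-out, and shift right. A C sketch under those assumptions (function name is mine):

  #include <stdint.h>

  /* Shift-and-add multiplication: the 64-bit product register {hi, lo}
     starts with the multiplier in the low half; each of the 32 steps does
     one conditional 33-bit add into the high half and one right shift. */
  uint64_t shift_add_mul(uint32_t multiplicand, uint32_t multiplier) {
      uint32_t hi = 0, lo = multiplier;
      for (int i = 0; i < 32; i++) {
          uint32_t carry = 0;
          if (lo & 1) {                                  /* multiplier bit 0 set? */
              uint64_t sum = (uint64_t)hi + multiplicand;
              hi = (uint32_t)sum;
              carry = (uint32_t)(sum >> 32);             /* keep the adder's carry-out */
          }
          lo = (lo >> 1) | (hi << 31);                   /* shift {carry, hi, lo} right */
          hi = (hi >> 1) | (carry << 31);
      }
      return ((uint64_t)hi << 32) | lo;                  /* 64-bit product, like HI/LO */
  }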
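Restoring division, as described on slides 12–15, written here in the long-division formulation rather than the exact shifting-divisor datapath: subtract the divisor; if the remainder would go negative, restore it and record a 0 quotient bit, otherwise keep it and record a 1. A hedged C sketch (names are mine):

  #include <stdint.h>

  typedef struct { uint32_t quotient, remainder; } divresult;

  /* Restoring division for unsigned 32-bit operands. */
  divresult restoring_div(uint32_t dividend, uint32_t divisor) {
      divresult r = { 0, 0 };
      if (divisor == 0) return r;        /* like MIPS div, the caller must check for 0 */
      for (int i = 31; i >= 0; i--) {
          r.remainder = (r.remainder << 1) | ((dividend >> i) & 1);  /* bring down next bit */
          r.quotient <<= 1;
          if (r.remainder >= divisor) {  /* "do the subtract; restore if it would go < 0" */
              r.remainder -= divisor;
              r.quotient |= 1;
          }
      }
      return r;
  }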
Chapter 3 — Arithmetic for Computers — 17
MIPS Division
• Use HI/LO registers for result
  - HI: 32-bit remainder
  - LO: 32-bit quotient
• Instructions
  - div rs, rt / divu rs, rt
  - No overflow or divide-by-0 checking
    - Software must perform checks if required
  - Use mfhi, mflo to access result

Chapter 3 — Arithmetic for Computers — 18
Floating Point (§3.5 Floating Point)
• Representation for non-integral numbers
  - Including very small and very large numbers
• Like scientific notation
  - –2.34 × 10^56 (normalized)
  - +0.002 × 10^–4 (not normalized)
  - +987.02 × 10^9 (not normalized)
• In binary
  - ±1.xxxxxxx₂ × 2^yyyy
• Types float and double in C

Chapter 3 — Arithmetic for Computers — 19
Floating Point Standard
• Defined by IEEE Std 754-1985
• Developed in response to divergence of representations
  - Portability issues for scientific code
• Now almost universally adopted
• Two representations
  - Single precision (32-bit)
  - Double precision (64-bit)

Chapter 3 — Arithmetic for Computers — 20
IEEE Floating-Point Format
• Layout: S | Exponent | Fraction
  - Exponent: 8 bits (single), 11 bits (double)
  - Fraction: 23 bits (single), 52 bits (double)
• S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
• Normalize significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the "1." restored
• Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023
• x = (–1)^S × (1 + Fraction) × 2^(Exponent – Bias)

Chapter 3 — Arithmetic for Computers — 21
Single-Precision Range
• Exponents 00000000 and 11111111 reserved
• Smallest value
  - Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–126 ≈ ±1.2 × 10^–38
• Largest value
  - Exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+127 ≈ ±3.4 × 10^+38

Chapter 3 — Arithmetic for Computers — 22
Double-Precision Range
• Exponents 0000…00 and 1111…11 reserved
• Smallest value
  - Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–1022 ≈ ±2.2 × 10^–308
• Largest value
  - Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308

Chapter 3 — Arithmetic for Computers — 23
Floating-Point Precision
• Relative precision
  - All fraction bits are significant
  - Single: approx 2^–23
    - Equivalent to 23 × log₁₀2 ≈ 23 × 0.3 ≈ 6 decimal digits of precision
  - Double: approx 2^–52
    - Equivalent to 52 × log₁₀2 ≈ 52 × 0.3 ≈ 16 decimal digits of precision

Chapter 3 — Arithmetic for Computers — 24
Floating-Point Example
• Represent –0.75
  - –0.75 = (–1)^1 × 1.1₂ × 2^–1
  - S = 1
  - Fraction = 1000…00₂
  - Exponent = –1 + Bias
    - Single: –1 + 127 = 126 = 01111110₂
    - Double: –1 + 1023 = 1022 = 01111111110₂
• Single: 1 01111110 1000…00
• Double: 1 01111111110 1000…00

Chapter 3 — Arithmetic for Computers — 25
Floating-Point Example
• What number is represented by the single-precision float 1 10000001 01000…00?
  - S = 1
  - Fraction = 01000…00₂
  - Exponent = 10000001₂ = 129
• x = (–1)^1 × (1 + .01₂) × 2^(129 – 127) = (–1) × 1.25 × 2² = –5.0
  (A C decoding sketch follows this slide.)
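The –5.0 walk-through above can be reproduced mechanically by slicing the bit pattern into its fields, which is also a convenient way to check hand conversions such as the –0.75 example. A C sketch (helper names are mine; only normalized values are handled here, since denormals, infinities, and NaNs come up on the next slides):

  #include <math.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Decode a normalized single-precision pattern by hand:
     x = (-1)^S * (1 + Fraction/2^23) * 2^(Exponent - 127). */
  static double decode_single(uint32_t bits) {
      uint32_t s        = bits >> 31;             /* sign bit */
      uint32_t exponent = (bits >> 23) & 0xFF;    /* 8-bit biased exponent */
      uint32_t fraction = bits & 0x7FFFFF;        /* 23-bit fraction */
      double significand = 1.0 + fraction / 8388608.0;     /* 1 + fraction / 2^23 */
      return ldexp(s ? -significand : significand, (int)exponent - 127);
  }

  int main(void) {
      printf("%g\n", decode_single(0xC0A00000u));  /* 1 10000001 0100...0  ->  -5.0  */
      printf("%g\n", decode_single(0xBF400000u));  /* 1 01111110 1000...0  ->  -0.75 */
      return 0;
  }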
Chapter 3 — Arithmetic for Computers — 26
Denormal Numbers
• Exponent = 000...0 ⇒ hidden bit is 0
  - x = (–1)^S × (0 + Fraction) × 2^–Bias
• Smaller than normal numbers
  - Allow for gradual underflow, with diminishing precision
• Denormal with Fraction = 000...0:
  - x = (–1)^S × (0 + 0) × 2^–Bias = ±0.0
  - Two representations of 0.0!

Chapter 3 — Arithmetic for Computers — 27
Infinities and NaNs
• Exponent = 111...1, Fraction = 000...0
  - ±Infinity
  - Can be used in subsequent calculations, avoiding need for overflow check
• Exponent = 111...1, Fraction ≠ 000...0
  - Not-a-Number (NaN)
  - Indicates illegal or undefined result
    - e.g., 0.0 / 0.0
  - Can be used in subsequent calculations
  (A short C demo follows slide 33.)

Chapter 3 — Arithmetic for Computers — 28
Floating-Point Addition
• Consider a 4-digit decimal example (sketched in C after slide 33)
  - 9.999 × 10^1 + 1.610 × 10^–1
• 1. Align decimal points
  - Shift number with smaller exponent
  - 9.999 × 10^1 + 0.016 × 10^1
• 2. Add significands
  - 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
• 3. Normalize result & check for over/underflow
  - 1.0015 × 10^2
• 4. Round and renormalize if necessary
  - 1.002 × 10^2

Chapter 3 — Arithmetic for Computers — 29
Floating-Point Addition
• Now consider a 4-digit binary example
  - 1.000₂ × 2^–1 + –1.110₂ × 2^–2 (0.5 + –0.4375)
• 1. Align binary points
  - Shift number with smaller exponent
  - 1.000₂ × 2^–1 + –0.111₂ × 2^–1
• 2. Add significands
  - 1.000₂ × 2^–1 + –0.111₂ × 2^–1 = 0.001₂ × 2^–1
• 3. Normalize result & check for over/underflow
  - 1.000₂ × 2^–4, with no over/underflow
• 4. Round and renormalize if necessary
  - 1.000₂ × 2^–4 (no change) = 0.0625

Chapter 3 — Arithmetic for Computers — 30
FP Adder Hardware
• Much more complex than integer adder
• Doing it in one clock cycle would take too long
  - Much longer than integer operations
  - Slower clock would penalize all instructions
• FP adder usually takes several cycles
  - Can be pipelined

Chapter 3 — Arithmetic for Computers — 31
FP Adder Hardware
[Figure: FP adder datapath, annotated with Steps 1–4 of the addition algorithm]

Chapter 3 — Arithmetic for Computers — 32
Floating-Point Multiplication
• Consider a 4-digit decimal example
  - 1.110 × 10^10 × 9.200 × 10^–5
• 1. Add exponents
  - For biased exponents, subtract bias from sum
  - New exponent = 10 + –5 = 5
• 2. Multiply significands
  - 1.110 × 9.200 = 10.212 ⇒ 10.212 × 10^5
• 3. Normalize result & check for over/underflow
  - 1.0212 × 10^6
• 4. Round and renormalize if necessary
  - 1.021 × 10^6
• 5. Determine sign of result from signs of operands
  - +1.021 × 10^6

Chapter 3 — Arithmetic for Computers — 33
Floating-Point Multiplication
• Now consider a 4-digit binary example
  - 1.000₂ × 2^–1 × –1.110₂ × 2^–2 (0.5 × –0.4375)
• 1. Add exponents
  - Unbiased: –1 + –2 = –3
  - Biased: (–1 + 127) + (–2 + 127) = –3 + 254 – 127 = –3 + 127
• 2. Multiply significands
  - 1.000₂ × 1.110₂ = 1.110₂ ⇒ 1.110₂ × 2^–3
• 3. Normalize result & check for over/underflow
  - 1.110₂ × 2^–3 (no change), with no over/underflow
• 4. Round and renormalize if necessary
  - 1.110₂ × 2^–3 (no change)
• 5. Determine sign: +ve × –ve ⇒ –ve
  - –1.110₂ × 2^–3 = –0.21875
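The four steps of the decimal addition example on slide 28 can be coded up for a toy 4-significant-digit format; this sketch keeps the exact aligned sum and then rounds once to nearest-even, using a sticky flag for the digits it drops (the format and names are mine, purely illustrative):

  #include <stdint.h>
  #include <stdio.h>

  /* Toy decimal FP: value = (sig / 1000) * 10^exp, with 1000 <= sig <= 9999. */
  typedef struct { int64_t sig; int exp; } dec4;

  /* Steps 3-4: normalize and round to 4 significant digits, nearest-even. */
  static dec4 normalize(int64_t sig, int exp) {
      int sticky = 0;
      while (sig > 99999) {                 /* keep one guard digit, truncating */
          if (sig % 10 != 0) sticky = 1;    /* remember any nonzero dropped digit */
          sig /= 10; exp++;
      }
      if (sig > 9999) {                     /* use the guard digit to round */
          int guard = (int)(sig % 10);
          sig /= 10; exp++;
          if (guard > 5 || (guard == 5 && (sticky || (sig & 1)))) sig++;
          if (sig > 9999) { sig /= 10; exp++; }   /* e.g. 9999 rounded up to 10000 */
      }
      while (sig < 1000 && sig > 0) { sig *= 10; exp--; }
      dec4 out = { sig, exp };
      return out;
  }

  /* Steps 1-2 (positive operands only): align to the smaller exponent, add exactly. */
  static dec4 dec4_add(dec4 a, dec4 b) {
      if (a.exp < b.exp) { dec4 t = a; a = b; b = t; }
      int d = a.exp - b.exp;
      if (d > 8) return a;                  /* b is too small to affect the result */
      int64_t scale = 1;
      for (int i = 0; i < d; i++) scale *= 10;
      return normalize(a.sig * scale + b.sig, b.exp);
  }

  int main(void) {
      dec4 a = { 9999, 1 };                 /* 9.999 x 10^1  */
      dec4 b = { 1610, -1 };                /* 1.610 x 10^-1 */
      dec4 s = dec4_add(a, b);
      printf("%lld.%03lld x 10^%d\n", (long long)(s.sig / 1000),
             (long long)(s.sig % 1000), s.exp);    /* 1.002 x 10^2, as on the slide */
      return 0;
  }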
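The special values from the Infinities and NaNs slide, plus a quick check of the two worked binary examples (0.5 + –0.4375 and 0.5 × –0.4375), in a few lines of C:

  #include <math.h>
  #include <stdio.h>

  int main(void) {
      double zero = 0.0;
      double pos_inf = 1.0 / zero;    /* +Infinity: lets computation continue past overflow */
      double not_num = zero / zero;   /* NaN: illegal/undefined result */
      printf("%f %f isinf=%d isnan=%d\n", pos_inf, not_num, isinf(pos_inf), isnan(not_num));
      printf("NaN == NaN? %d\n", not_num == not_num);   /* 0: NaN compares unequal to itself */

      /* The 4-digit binary examples from slides 29 and 33: */
      printf("%g %g\n", 0.5 + -0.4375, 0.5 * -0.4375);  /* 0.0625  -0.21875 */
      return 0;
  }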
Chapter 3 — Arithmetic for Computers — 34
FP Arithmetic Hardware
• FP multiplier is of similar complexity to FP adder
  - But uses a multiplier for significands instead of an adder
• FP arithmetic hardware usually does
  - Addition, subtraction, multiplication, division, reciprocal, square root
  - FP ↔ integer conversion
• Operations usually take several cycles
  - Can be pipelined

Chapter 3 — Arithmetic for Computers — 35
FP Instructions in MIPS
• FP hardware is coprocessor 1
  - Adjunct processor that extends the ISA
• Separate FP registers
  - 32 single-precision: $f0, $f1, … $f31
  - Paired for double precision: $f0/$f1, $f2/$f3, …
  - Release 2 of the MIPS ISA supports 32 × 64-bit FP registers
• FP instructions operate only on FP registers
  - Programs generally don't do integer ops on FP data, or vice versa
  - More registers with minimal code-size impact
• FP load and store instructions
  - lwc1, ldc1, swc1, sdc1
  - e.g., ldc1 $f8, 32($sp)

Chapter 3 — Arithmetic for Computers — 36
FP Instructions in MIPS
• Single-precision arithmetic
  - add.s, sub.s, mul.s, div.s
  - e.g., add.s $f0, $f1, $f6
• Double-precision arithmetic
  - add.d, sub.d, mul.d, div.d
  - e.g., mul.d $f4, $f4, $f6
• Single- and double-precision comparison
  - c.xx.s, c.xx.d (xx is eq, lt, le, …)
  - Sets or clears FP condition-code bit
  - e.g., c.lt.s $f3, $f4
• Branch on FP condition code true or false
  - bc1t, bc1f
  - e.g., bc1t TargetLabel

Chapter 3 — Arithmetic for Computers — 37
FP Example: °F to °C
• C code:

  float f2c (float fahr) {
    return ((5.0/9.0) * (fahr - 32.0));
  }

  - fahr in $f12, result in $f0, literals in global memory space
• Compiled MIPS code:

  f2c: lwc1  $f16, const5($gp)
       lwc1  $f18, const9($gp)
       div.s $f16, $f16, $f18
       lwc1  $f18, const32($gp)
       sub.s $f18, $f12, $f18
       mul.s $f0, $f16, $f18
       jr    $ra

Chapter 3 — Arithmetic for Computers — 38
FP Example: Array Multiplication
• X = X + Y × Z
  - All 32 × 32 matrices, 64-bit double-precision elements
• C code:

  void mm (double x[][32], double y[][32], double z[][32]) {
    int i, j, k;
    for (i = 0; i != 32; i = i + 1)
      for (j = 0; j != 32; j = j + 1)
        for (k = 0; k != 32; k = k + 1)
          x[i][j] = x[i][j] + y[i][k] * z[k][j];
  }

• Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2 (the address arithmetic is sketched in C below)
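Before the MIPS code on the next slides: the sll/addu/sll sequences simply compute the byte address of an element, (i × 32 + j) × 8 past the base. The same arithmetic in C, as an illustrative helper (not from the text):

  #include <stdint.h>

  /* Byte address of x[i][j] for a 32 x 32 matrix of 8-byte doubles:
     offset = (i * 32 + j) * 8 — the "sll by 5, addu, sll by 3" in the MIPS code. */
  static double *element_addr(double *base, int i, int j) {
      uintptr_t offset = (uintptr_t)(i * 32 + j) << 3;
      return (double *)((char *)base + offset);    /* same as &base[i * 32 + j] */
  }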
Chapter 3 — Arithmetic for Computers — 39
FP Example: Array Multiplication
• MIPS code:

      li    $t1, 32        # $t1 = 32 (row size/loop end)
      li    $s0, 0         # i = 0; initialize 1st for loop
  L1: li    $s1, 0         # j = 0; restart 2nd for loop
  L2: li    $s2, 0         # k = 0; restart 3rd for loop
      sll   $t2, $s0, 5    # $t2 = i * 32 (size of row of x)
      addu  $t2, $t2, $s1  # $t2 = i * size(row) + j
      sll   $t2, $t2, 3    # $t2 = byte offset of [i][j]
      addu  $t2, $a0, $t2  # $t2 = byte address of x[i][j]
      l.d   $f4, 0($t2)    # $f4 = 8 bytes of x[i][j]
  L3: sll   $t0, $s2, 5    # $t0 = k * 32 (size of row of z)
      addu  $t0, $t0, $s1  # $t0 = k * size(row) + j
      sll   $t0, $t0, 3    # $t0 = byte offset of [k][j]
      addu  $t0, $a2, $t0  # $t0 = byte address of z[k][j]
      l.d   $f16, 0($t0)   # $f16 = 8 bytes of z[k][j]
      …

Chapter 3 — Arithmetic for Computers — 40
FP Example: Array Multiplication

      …
      sll   $t0, $s0, 5       # $t0 = i * 32 (size of row of y)
      addu  $t0, $t0, $s2     # $t0 = i * size(row) + k
      sll   $t0, $t0, 3       # $t0 = byte offset of [i][k]
      addu  $t0, $a1, $t0     # $t0 = byte address of y[i][k]
      l.d   $f18, 0($t0)      # $f18 = 8 bytes of y[i][k]
      mul.d $f16, $f18, $f16  # $f16 = y[i][k] * z[k][j]
      add.d $f4, $f4, $f16    # $f4 = x[i][j] + y[i][k] * z[k][j]
      addiu $s2, $s2, 1       # k = k + 1
      bne   $s2, $t1, L3      # if (k != 32) go to L3
      s.d   $f4, 0($t2)       # x[i][j] = $f4
      addiu $s1, $s1, 1       # j = j + 1
      bne   $s1, $t1, L2      # if (j != 32) go to L2
      addiu $s0, $s0, 1       # i = i + 1
      bne   $s0, $t1, L1      # if (i != 32) go to L1

Chapter 3 — Arithmetic for Computers — 41
Accurate Arithmetic
• IEEE Std 754 specifies additional rounding control
  - Extra bits of precision (guard, round, sticky)
  - Choice of rounding modes (see the C demo after slide 45)
  - Allows programmer to fine-tune numerical behavior of a computation
• Not all FP units implement all options
  - Most programming languages and FP libraries just use defaults
• Trade-off between hardware complexity, performance, and market requirements

Chapter 3 — Arithmetic for Computers — 42
Interpretation of Data (The BIG Picture)
• Bits have no inherent meaning
  - Interpretation depends on the instructions applied
• Computer representations of numbers
  - Finite range and precision
  - Need to account for this in programs

Chapter 3 — Arithmetic for Computers — 43
Associativity (§3.6 Parallelism and Computer Arithmetic: Associativity)
• Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail
• Example: x = –1.50E+38, y = 1.50E+38, z = 1.0
  - (x + y) + z = 0.00E+00 + 1.0 = 1.00E+00
  - x + (y + z) = –1.50E+38 + 1.50E+38 = 0.00E+00 (z is lost when added to y)
  - (see the C demo after slide 45)
• Need to validate parallel programs under varying degrees of parallelism

Chapter 3 — Arithmetic for Computers — 44
x86 FP Architecture (§3.7 Real Stuff: Floating Point in the x86)
• Originally based on 8087 FP coprocessor
  - 8 × 80-bit extended-precision registers
  - Used as a push-down stack
  - Registers indexed from TOS: ST(0), ST(1), …
• FP values are 32-bit or 64-bit in memory
  - Converted on load/store of memory operand
  - Integer operands can also be converted on load/store
• Very difficult to generate and optimize code
  - Result: poor FP performance

Chapter 3 — Arithmetic for Computers — 45
x86 FP Instructions
• Optional variations
  - I: integer operand
  - P: pop operand from stack
  - R: reverse operand order
  - But not all combinations allowed

  Data transfer      Arithmetic           Compare          Transcendental
  FILD  mem/ST(i)    FIADDP  mem/ST(i)    FICOMP           FPATAN
  FISTP mem/ST(i)    FISUBRP mem/ST(i)    FIUCOMP          F2XM1
  FLDPI              FIMULP  mem/ST(i)    FSTSW AX/mem     FCOS
  FLD1               FIDIVRP mem/ST(i)                     FPTAN
  FLDZ               FSQRT                                 FPREM
                     FABS                                  FSIN
                     FRNDINT                               FYL2X
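The rounding-mode control mentioned on the Accurate Arithmetic slide is reachable from C99 through <fenv.h>. A small demo; whether the mode change is honored for this expression depends on the compiler's FENV_ACCESS support, so treat it as a sketch:

  #include <fenv.h>
  #include <stdio.h>
  #pragma STDC FENV_ACCESS ON        /* we intend to change the FP environment */

  static void show(int mode, const char *name) {
      fesetround(mode);                           /* select an IEEE rounding mode */
      volatile float one = 1.0f, three = 3.0f;
      printf("%-12s 1/3 = %.10f\n", name, one / three);
  }

  int main(void) {
      show(FE_TONEAREST,  "to nearest");
      show(FE_DOWNWARD,   "toward -inf");
      show(FE_UPWARD,     "toward +inf");
      show(FE_TOWARDZERO, "toward 0");
      return 0;
  }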
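And the Associativity example is easy to reproduce with single-precision float, using the values from the slide; the two groupings really do print different results:

  #include <stdio.h>

  int main(void) {
      volatile float x = -1.5e38f, y = 1.5e38f, z = 1.0f;
      printf("(x + y) + z = %e\n", (x + y) + z);   /* 1.000000e+00 */
      printf("x + (y + z) = %e\n", x + (y + z));   /* 0.000000e+00: z is absorbed by y */
      return 0;
  }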
Chapter 3 — Arithmetic for Computers — 46
Streaming SIMD Extension 2 (SSE2)
• Adds 8 × 128-bit registers (XMM)
  - Extended to 16 registers in AMD64/EM64T
• Can be used for multiple FP operands
  - 2 × 64-bit double precision
  - 4 × 32-bit single precision
• Instructions operate on them simultaneously
  - Single-instruction, multiple-data (see the C sketch after slide 47)

Chapter 3 — Arithmetic for Computers — 47
Right Shift and Division (§3.8 Fallacies and Pitfalls)
• Left shift by i places multiplies an integer by 2^i
• Right shift divides by 2^i?
  - Only for unsigned integers
• For signed integers
  - Arithmetic right shift: replicate the sign bit
  - e.g., –5 / 4:
    1111 1011₂ >> 2 = 1111 1110₂ = –2
    Rounds toward –∞
  - cf. logical shift: 1111 1011₂ >>> 2 = 0011 1110₂ = +62
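SSE2's packed-double operations from slide 46 are exposed to C through the <emmintrin.h> intrinsics (x86 targets only); a minimal sketch of one 2 × 64-bit SIMD add:

  #include <emmintrin.h>   /* SSE2 intrinsics */
  #include <stdio.h>

  int main(void) {
      double a[2] = { 1.5, 2.5 }, b[2] = { 10.0, 20.0 }, r[2];
      __m128d va = _mm_loadu_pd(a);       /* load two doubles into one 128-bit register */
      __m128d vb = _mm_loadu_pd(b);
      __m128d vr = _mm_add_pd(va, vb);    /* one instruction adds both lanes */
      _mm_storeu_pd(r, vr);
      printf("%g %g\n", r[0], r[1]);      /* 11.5 22.5 */
      return 0;
  }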
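The Right Shift and Division pitfall shows up directly in C: an arithmetic right shift rounds toward –∞, while C's signed division truncates toward zero (and, strictly, right-shifting a negative value is implementation-defined, though mainstream compilers shift arithmetically):

  #include <stdio.h>

  int main(void) {
      int x = -5;
      printf("-5 / 4  = %d\n", x / 4);     /* -1: division truncates toward zero */
      printf("-5 >> 2 = %d\n", x >> 2);    /* -2: arithmetic shift rounds toward -inf */
      unsigned char u = (unsigned char)x;  /* 1111 1011 = 0xFB */
      printf("0x%02X >>> 2 = %d\n", (unsigned)u, u >> 2);   /* 0x3E = 62, logical shift */
      return 0;
  }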
Chapter 3 — Arithmetic for Computers — 48
Who Cares About FP Accuracy?
• Important for scientific code
  - But for everyday consumer use? "My bank balance is out by 0.0002¢!"
• The Intel Pentium FDIV bug
  - The market expects accuracy
  - See Colwell, The Pentium Chronicles

Chapter 3 — Arithmetic for Computers — 49
Concluding Remarks (§3.9 Concluding Remarks)
• ISAs support arithmetic
  - Signed and unsigned integers
  - Floating-point approximation to reals
• Bounded range and precision
  - Operations can overflow and underflow
• MIPS ISA
  - Core instructions: 54 most frequently used
    - 100% of SPECINT, 97% of SPECFP
  - Other instructions: less frequent

Chapter 3 — Arithmetic for Computers — 50
Exercises
• Answer the following exercises, and send your answers as a PDF attachment to the email address listed below
  - xamiri@fi.muni.cz
• Leave the body of the email blank
• Deadline is March 31st

Chapter 3 — Arithmetic for Computers — 51
Exercise 1
• Calculate the product of the octal unsigned 6-bit integers A = 50 and B = 23 using the hardware described below (adjust the register sizes). You should show the contents of each register on each step.

Chapter 3 — Arithmetic for Computers — 52
Exercise 2
• Calculate the product of the hexadecimal unsigned 8-bit integers A = 66 and B = 04 using the hardware described below (adjust the register sizes). You should show the contents of each register on each step.

Chapter 3 — Arithmetic for Computers — 53
Exercise 3
• Calculate A = 50 divided by B = 23 using the hardware described below. You should show the contents of each register on each step. Assume A and B are octal unsigned 6-bit integers (adjust the register sizes in the hardware).

Chapter 3 — Arithmetic for Computers — 54
Exercise 4
• Calculate A = 50 divided by B = 23 using the hardware described below. You should show the contents of each register on each step. Assume A and B are octal unsigned 6-bit integers (adjust the register sizes in the hardware).

Chapter 3 — Arithmetic for Computers — 55
Exercise 5
• What decimal number does the following bit pattern represent if it is a floating-point number? Use the IEEE 754 standard. 0xAFBF0000

Chapter 3 — Arithmetic for Computers — 56
Exercise 6
• Write down the binary representation of the decimal number –938.8125
  - a) assuming the IEEE 754 single-precision format
  - b) assuming the IEEE 754 double-precision format

Chapter 3 — Arithmetic for Computers — 57
Exercise 7
• NVIDIA has a "half" format, which is similar to IEEE 754 except that it is only 16 bits wide. The leftmost bit is still the sign bit, the exponent is 5 bits wide (exponent bias = 01111₂ = 15), and the mantissa is 10 bits long. A hidden 1 is assumed.
• a) Calculate the sum of the following decimal numbers A and B by hand, assuming A and B are stored in the 16-bit NVIDIA format. Assume one guard bit, one round bit, and one sticky bit, and round to the nearest even. Show all the steps.
  - A = 2.3109375 × 10^1
  - B = 6.391601562 × 10^–1
• b) Calculate the product of the following decimal numbers A and B by hand, assuming A and B are stored in the 16-bit NVIDIA format. Assume one guard bit, one round bit, and one sticky bit, and round to the nearest even. Show all the steps; do the multiplication by hand (longhand) rather than using any special techniques. Write your answer as a 16-bit pattern. How accurate is your result?
  - A = 6.18 × 10^2
  - B = 5.796875 × 10^1
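For checking Exercise 7 answers, a decoder for the 16-bit half format described above may help; this is a hedged sketch that assumes the special cases (denormals, infinities, NaNs) follow the usual IEEE 754 conventions, which the exercise itself does not spell out:

  #include <math.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Half format from Exercise 7: 1 sign bit, 5 exponent bits (bias 15),
     10 mantissa bits, hidden 1. */
  static double half_to_double(uint16_t h) {
      int s = h >> 15, e = (h >> 10) & 0x1F, frac = h & 0x3FF;
      double sign = s ? -1.0 : 1.0;
      if (e == 0)    return sign * ldexp(frac / 1024.0, -14);   /* zero or denormal */
      if (e == 0x1F) return frac ? NAN : sign * INFINITY;       /* NaN or infinity  */
      return sign * ldexp(1.0 + frac / 1024.0, e - 15);         /* normalized value */
  }

  int main(void) {
      printf("%g %g\n", half_to_double(0x3C00), half_to_double(0xC500));  /* 1  -5 */
      return 0;
  }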