#+hugo_base_dir: ../

#+hugo_section: Intro-to-Comp-Systems

How to write a micro-instruction

A micro-instruction is just a binary (often converted to hex) number that represents a list of control signals, and points to the next micro-instruction. Micro-instructions can be changed by changing the A set of micro-instructions forms a micro-routine (otherwise known as an instruction).

Control signals

To understand the micro-instruction, you must understand what the control signals do. I will be referencing the Hard vs Micro paper. It has LA (load accumulator), and EA (enable accumulator) signals. They are essentially just wires connected to the control unit (referenced by the 16 coming out of CONTROL) Each register also generally has a clock signal (CLK), but that is connected directly to the clock circuit and is not required for the micro-instructions. The naming system for the signals is somewhat arbitrary but usually follows the pattern (L, E) (register). This is subverted somewhat by the ALU, but that will most likely be directly defined. The other main difference is the inclusion of I, which just means increment.

Load (L)

A load signal allows the contents of a register to be changed by the state of the bus (a collection of wires). This signal must be paired with an E signal to do anything.

Enable (E)

An enable signal changes the state of the bus to match the contents of the register. Say a register contains the value 1111, when the enable signal is used for that register, the contents of the bus will now be 1111. This signal must be paired with a L signal to do anything.

Micro-instructions

A micro-instruction is a binary number composed of several fields stored at a specific address in the ROM (read only memory) of the control unit.

Control field

This is just a binary representation of the signals. In the hard vs micro document, the letters are signals are abbreviated into the following order:

ILELAWLELELEASEL Expanded: IP LP EP LM R W LD ED LI EI LA EA A S E

A micro instruction will generally contain a pair of signals (L E) that transfers the contents of one register into another. The contents of the enabled register moves into the loaded register through the bus. A 1 means that the signal is on while a 0 means the signal is off.

Next Address field

This field has 4 different sub fields. it is used to determine the next micro instruction.

CD

CD (short for condition) is used for conditional logic. This bit makes the next micro-instruction depend on the value of the negative flag. This is generally off, unless you want conditional logic, see JN micro-routine
MAP

the map bit uses the next micro-instruction in the control ROM, see Fetch-routine . This is generally off.
CRJA field

This field consists of an address in the control ROM

LDA micro-routine

here is a step by step on how to write this instruction with micro-instructions.

Find out where the data is coming from.
Map out the sub-steps:
- IR -> MAR
- RAM -> MDR
- MDR -> ACC
determine the signals for each substep:
- EI, LM
- R
- ED, LA
Link the micro-instructions with the next address field.
- address of next micro-instruction (04) in this case
- address of next micro-instruction (05) in this case
- link back to fetch (00) in this case
Write the codes!
- (0001000001000000)(0)(0)(0)(0100)
- (0000010000000000)(0)(0)(0)(0101)
- (0000000010010000)(0)(0)(0)(0000)
Optional convert to hex
- 82004
- 20005
- 4800

Mock Exam Questions

Question 1 and 2 seem to reference this article: ~~[8086-8088](pdfs/8086.pdf)~~❌

Q1

If a physical branch target address is 5A230 when CS = 5200, what will it be if the CS = 7800 ?

We need to find the offset, where offset + CS(shifted 4 bits) = physical branch target address. CS(C segment register)
Offset = Physical branch target address - CS
Offset = 5A230 - 52000 = 8230 (hex)

The offset is then used to find the physical branch target adress.

78000 + 8230 = 80230 = physical branch target address

Q2

Given that the EA of a data is 2359 and DS = 490B, what is the PA of data?

EA (effective Address), DS (D segment register), PA(Physical Address).

Physical address is given by EA + DS(shifted 4 bits).

DS = 490B0 (hex)
EA = 2359 (hex)
PA = 490B0 + 2359 = 4B409

Q3

Assuming, W, X, Y and Z as memory addresses. Write a program using any machine sequence that will carry out the following: Z ← W + (Z-X).

    .globl main
    .text

main:
    lw $t1, Z #load z into temporary register 1
    lw $t2, X #load x into temporary register 2
    sub $s1, $t1, $t2 #s1 <- t1-t2
    lw $t1, W #load w into temporary register 1
    add $s1, $t1, $s1 #s1 <- t1
    sw $s1, Z # stores s1 into Z

    li $v0, 1 #prints z
    lw $a0, Z
    syscall

    li $v0, 10 #exits
    syscall


.data
Z: .word  12 #arbitrary value
X: .word 10 #arbitrary value
W: .word 5 #arbitrary value

Downloadable solution prints 7

Q4

Assume that the code below is run on a machine with a 2 GHz clock that requires the following number of cycles for each instruction: add, addi, sll, sra take 4cc each, lw takes 5cc, bne, beq take 3cc each. How many seconds will it take to execute this code. The values of registers are $4=0x20, $5= 0x20, $6= 0x30, $7= 0x10.

    .globl  main
    .text

main:
    sra $6, $6, 2   # changes original value of 0x30 / 2^2  = 0xC
    sll $7, $7, 2 # original value = 0x10, 0x10 * 2^2 = 0x40
    add $8, $0, $0 # sets register 8 to 0
L2: add $12, $4, $8 #marks L2, $12 <- 0x20 + $8, $8 is iterator
    lw  $12, 0($12) # $12 = memory at address of ($12) on stack
    add $9, $0, $0 # $9 = 0;
L1: add $11, $5, $9 #marks L1, $11 = 0x20 + $9, $9 is iterator
    lw  $11, 0($11) #$11 = memory at address of ($11) on stack
    addi $9, $9, 4 #$9 = $9 + 4
    bne $9, $7, L1 # goes to L1 if $9 != $7  0x0 + 0x4 * 0x10 = 0x40 = $9 loop executes 0x10 times
    addi $8, $8, 4 # $8 = $8 + 4
    beq $8, $6, L2 # goes to L2 if $8 == $6 exits before  $8 can equal $6

Instruction definitions

sra (shift right arithmetic) (sra destination, origin, shift(in bits)) rounds down
sll (shift left logical) (sll destination, origin, shift)
bne (bne r1, r2, branch address) goes to branch address if r1 != r2
beq (beq, r1, r2, branch address) goest to branch address if r1 == r2

Calculate number of clock cycles

Before loops
- sra 4cc
- sll 4cc
- add 4cc
- total = 12cc
L2
- add 4cc
- lw 5cc
- add 4cc
- L1 clocks
- addi 4cc
- beq 3cc
- total = (20cc + L1 clocks)*1 loop
L3
- add 4cc
- lw 5cc
- addi 4ccc
- bne 3cc
- total = (16cc * 16 loops) = 256 clocks
Total clocks

12cc + 20cc + 256cc = 288cc

Calculate time

288cc/(2*10^9cc/s) = 1.44 * 10^-7 seconds

Q5

X[i] = A[B[i]] + C[i+4]

starting address of A in $1
starting address of B in $2
starting address of C in $3
starting address of X in $4
i value in register $5

Q5 Solution download

.globl main
.text

main:
    sll     $s4, $5, 2          # multiplies i * 4 to conform to address form
    add     $t2, $2, $s4        # gets address of B[i] offsets address of B by i
    lw      $t3, ($t2)          # sets t3 to value at address t3 = B[i]

    sll     $t1, $t3, 2         # sets t1 to t3 * 4 to conform to address form
    add     $t2, $1, $t1        # offsets addres value in $1 by $t1 A[B[i]]
    lw      $s1, ($t2)          # sets t3 to value at address $t2 t3 = A[B[i]]

    addi    $t1, $5, 4          # offsets i by 4  = i+4
    sll     $t1, $5, 2          # multiplies i*4 to conform to address form
    add     $t2, $3, $t1        # offsets C address by i+4
    lw      $s2, ($t2)          # s2 = C[i+4]

    add     $t1, $s1, $s2       # adds A[B[i]] + C[i+4]
    add     $s3, $4, $s4        # offsets address of X by i
    sw      $t1, ($s3)          # stores $t1 to address of ($s3)

Q6

The memory units that follow are specified by the number of words times the number of bits per word. How many address lines and input/output data lines are needed in each case? (a) 8K X 16 (b) 2G X 8 (c) 16M X 32 (d) 256K X 64

Part A

Number of words = 8K
Number of bits per word = 16
log base 2 of words = address lines
2³ * 2¹⁰ = 2¹³ = 13 address lines
I/O lines = address lines + bits per word
13 + 16 = 29 I/O lines

Part B

2¹ * 2³⁰ = 2G
Address lines = 31
I/O lines = 31 + 8 = 39
39 I/O lines

Part C

2⁴ * 2²⁰ = 16M
Address lines = 24
I/O lines = 24 + 32 = 56
56 I/O lines

Part D

2⁸ * 2¹⁰ = 256K
Address lines = 18
I/O lines = 18 + 64
82 I/O lines

Q7

Find the number of bytes that can be stored in the memories: (a) 8K X 16 (b) 2G X 8 (c) 16M X 32 (d) 256K X 64

Part A

8 bits per byte
8K words
16 bits per word
number of bits = 8K*16 = 2¹³ * 2⁴ = 2¹⁷
number of bytes = 2¹⁷/2³ = 2¹⁴ = 16K bytes

Part B

Number of bits = 2G*8 = 2³¹*2³
Number of bytes = 2³¹*2³/2³ = 2³¹ = 2G bytes

Part C

Number of bytes = 16M * 32 / 2³ = 2²⁴ * 32 /2³ = 2²⁶ = 64M bytes

Part D

Number of bytes = 256K*64/2³ = 2¹⁸ * 64 /2³ = 2²⁴ /2³ = 2²¹ = 2M bytes

DONE Q8

How many 128 x 8 memory chips are needed to provide a memory capacity of 4096 x 16?

4096*16 / 128*8 = 64

Q9

Given a 32 x 8 ROM chip with an enable input, show the external connections necessary to construct a 128 x 8 ROM with four chips and a decoder. Useful Video

2⁵ = 32
5 address lines for each ROM
128/32 = 4 chips
4 outputs on decoder (connected to enable inputs)
2² = 4
2 address lines for the decoder
5 + 2 = 7 total lines for complete address

Q10

Assume we have 8GB of word addressable memory with a word size of 64 bits and each refill line of memory stores 16 words. We have a direct-mapped cache with 1024 refill lines. Answer the following questions. ~~[More examples](pdfs/Q10.pdf)~~❌

What is the address format for the cache
If we changed the cache to a 2-way set associative cache, how does the format change?
If we changed the cache to a fully associative cache, how does the format change?

Solution

Part 1
- 8GB = 8*2³⁰ bytes
- 64 bit word = 8 byte word
- 8GB / 8 byte word = 1GW (gigaword)
- 1G = 2³⁰
- 30 bits for line addresses
- 16 words per line = 2⁴
- 4 bits for word number on line
- address format (for MM) = [30-4 bit line address][4 bit word address]
- 1024 refill lines = 2¹⁰
- 10 bits for line number
- 26-10 bits for tag
- [tag][line number][word number]
- 16-10-4 = cache memory format
Part 2
- 2 way set associate cache
- divide refill lines by 2
- 1024/2 = 512 = 2⁹
- 26-9 = tag
- 17-9-4 = 2 way set cache memory format
Part 3
- fully associative cache
- 1 refill line = 2⁰
- 26-0 = tag
- 26-0-4 = fully associate cache

Hard Drive Sample question

This is a writeup of question 1 from ~~[here](pdfs/MagneticDisk.pdf)~~❌.

Consider a disk pack with the following specifications 16 surfaces, 128 tracks per surface, 256 sectors per track, 512 bytes per sector

Capacity of Disk Space

Use the formula Capacity of Disk Space.
16 surfaces * 128 tracks/surface * 256 sectors/track * 512 bytes per sector = $2^{28}$ bytes
256 MB

Bit Required to address sector

Use a modified version of Capacity of Disk Space to find total number of sectors
16 surfaces * 128 tracks/surface * 256 sectors/track = 2¹⁹ sectors
Use the same method as the cache problems to find the number of bits
19 bits required for address

Formatted Disk Space (32 bytes per sector format overhead)

Use the formula Formatting Overhead
2¹⁹ sectors * 32 bytes/sector = 2¹⁹ * 2⁵ bytes = 2²⁴ bytes
16 MB
Use the formula Formatted disk space
Total Disk space - Formatting overhead
256 MB - 16 MB
240 MB

Useful formulas and references

Hard Drives

Hard Drive Website ~~[Hard Drive Document from nitin](pdfs/MagneticDisk.pdf)~~❌

Formulas

Disk access time = Seek Time + Rotational delay + Transfer time + Controller overhead + Queuing delay
Average Disk Access Time = Average seek time + Average rotational delay + Transfer time + Controller overhead + Queuing delay
Average Seek Time = 1/3 * Time taken for one full stroke
- (Time taken to move from track 1 to track 1 + Time taken to move from track 1 to last track)/2
- {0 + (k-1)t}/2
- (k-1)t/2
Average Rotational Latency = 1/2 * Time taken for one full rotation
Capacity of disk pack = Total number of surfaces * Number of tracks per surface * Number of sectors per track * storage capacity of one sector
Formatting Overhead = Number of sectors * Overhead per sector
Formatted Disk Space = Total disk space or capacity - formatting overhead
Recording density or storage density = Capacity of track / circumference of the track
Track Capacity = Recording density of the track * Circumference of the track
Data Transfer Rate = Number of heads * Bytes that can be read in one full rotation * Number of rotations in one second = Number of heads * Capacity of one track * Number of rotations in one second
Tracks per surface = (Outer radius - Inner radius) / Inter Track gap

Circuit reference

Logic Gates
Full Adder

Computer Arithmetic

~~[Computer arithemetic document from nitin](pdfs/Computer Arithemtic.pdf)~~❌

Memory + Cache

Main Memory and Organization : credit to Robbie Schad
~~[Cache Notes](pdfs/notes_cache.pdf)~~❌
~~[Direct Mapping Examples](pdfs/Direct Mapping Examples.pdf)~~❌
~~[Fully Associative Mapping Examples](pdfs/Fully Associative Mapping Examples.pdf)~~❌
~~[Set Associate Mapping Examples](pdfs/Set Associative Mapping Examples.pdf)~~❌

Formulas

Direct mapped cache
- physical address size (bits) = TAG + Line Number + Block/Line Offset
- $\text{MM(size in bytes)} = 2^{\text{number Of Bits In Physical Address}} * 2^3$
- $\text{BlockOffset(size in bytes)} = 2^{\text{Bits In Block Offset}} * 2^3 $
- $\text{number of lines} = 2^{\text{bits in line number}} = \frac{\text{Cache size}}{\text{Line Size}} $
- $\text{tag directory size} = \text{number of lines in cache * Number of bits in tag} = \text{Number of Tags} * \text{Tag size} $
Set Associative Mapped cache
- physical address size (bits) = TAG + Set Number + Block/line Offset
- $2^\text{Set Number (bits)} = \frac{\text{Lines in cache}}{\text{Number of sets}}$
Fully associative cache
- physical address size (bits) = TAG + Block offset

Pipelining

~~[pipelining document from nitin](pdfs/Pipelining in Computer Architecture.pdf)~~❌

Formulas

$\text{Speed Up (S)} = \frac{\text{Non-pipelined execution time}}{\text{Pipelined execution time}}$
$\text{Efficiency} = \frac{\text{Speed Up}}{\text{Number of stages in Pipelined Architecture}}$
$\text{Efficiency} = \frac{\text{Number of boxes utilized in phase time diagram}}{\text{Number of boxes in phase time diagram}}$
$\text{Throughput} = \frac{\text{Number of instructions executed}}{\text{Total time taken}}$
Non-pipelined execution time = Total number of instructions * Time taken to execute one instruction = n * k clock cycles
Pipelined execution time

= Time taken to execute first instruction + Time take to execute remaining instructions

= 1 * k clock cycles + (n-1) * 1 clock cycle

= (k + n-1) clock cycles
Cycle time = Maximum delay due to any stage + Delay due to its register
delay due to its register = latch delay
pipeline time for x tasks = Time taken for 1st task + Time taken for remaining tasks

= number of phases * cycle time + (total tasks -1) * cycle time

MIPS

~~[Mips Theory](pdfs/MIPS-Theory.pdf)~~❌
Mips Reference from Berkeley
Mips Reference from Cburch
MIPS DATAPATH
MIPS PIPELINED DATAPATH

compPosts

How to write a micro-instruction

Control signals

Load (L)

Enable (E)

Micro-instructions

Control field

Next Address field

LDA micro-routine

Mock Exam Questions

Q1

Q2

Q3

Q4

Instruction definitions

Calculate number of clock cycles

Calculate time

Q5

Q6

Part A

Part B

Part C

Part D

Q7

Part A

Part B

Part C

Part D

DONE Q8

Q9

Q10

Solution

Hard Drive Sample question

Capacity of Disk Space

Bit Required to address sector

Formatted Disk Space (32 bytes per sector format overhead)

Useful formulas and references

Hard Drives

Formulas

Circuit reference

Computer Arithmetic

Memory + Cache

Formulas

Pipelining

Formulas

MIPS