#+hugo_base_dir: ../
#+hugo_section: Intro-to-Comp-Systems
How to write a micro-instruction
A micro-instruction is just a binary (often converted to hex) number that represents a list of control signals, and points to the next micro-instruction. Micro-instructions can be changed by changing the A set of micro-instructions forms a micro-routine (otherwise known as an instruction).
Control signals
To understand the micro-instruction, you must understand what the control signals do. I will be referencing the Hard vs Micro paper. It has LA (load accumulator), and EA (enable accumulator) signals. They are essentially just wires connected to the control unit (referenced by the 16 coming out of CONTROL) Each register also generally has a clock signal (CLK), but that is connected directly to the clock circuit and is not required for the micro-instructions. The naming system for the signals is somewhat arbitrary but usually follows the pattern (L, E) (register). This is subverted somewhat by the ALU, but that will most likely be directly defined. The other main difference is the inclusion of I, which just means increment.
Load (L)
A load signal allows the contents of a register to be changed by the state of the bus (a collection of wires). This signal must be paired with an E signal to do anything.
Enable (E)
An enable signal changes the state of the bus to match the contents of the register. Say a register contains the value 1111, when the enable signal is used for that register, the contents of the bus will now be 1111. This signal must be paired with a L signal to do anything.
Micro-instructions
A micro-instruction is a binary number composed of several fields
stored at a specific address in the ROM (read only memory) of the control unit.
Control field
This is just a binary representation of the signals. In the hard vs micro document, the letters are signals are abbreviated into the following order:
- ILELAWLELELEASEL Expanded: IP LP EP LM R W LD ED LI EI LA EA A S E
A micro instruction will generally contain a pair of signals (L E) that transfers the contents of one register into another. The contents of the enabled register moves into the loaded register through the bus. A 1 means that the signal is on while a 0 means the signal is off.
Next Address field
This field has 4 different sub fields. it is used to determine the next micro instruction.
-
CD
CD (short for condition) is used for conditional logic. This bit makes the next micro-instruction depend on the value of the negative flag. This is generally off, unless you want conditional logic, see JN micro-routine
-
MAP
the map bit uses the next micro-instruction in the control ROM, see Fetch-routine . This is generally off.
-
CRJA field
This field consists of an address in the control ROM
LDA micro-routine
here is a step by step on how to write this instruction with micro-instructions.
- Find out where the data is coming from.
-
Map out the sub-steps:
- IR -> MAR
- RAM -> MDR
- MDR -> ACC
-
determine the signals for each substep:
- EI, LM
- R
- ED, LA
-
Link the micro-instructions with the next address field.
- address of next micro-instruction (04) in this case
- address of next micro-instruction (05) in this case
- link back to fetch (00) in this case
-
Write the codes!
- (0001000001000000)(0)(0)(0)(0100)
- (0000010000000000)(0)(0)(0)(0101)
- (0000000010010000)(0)(0)(0)(0000)
-
Optional convert to hex
- 82004
- 20005
- 4800
Mock Exam Questions
Question 1 and 2 seem to reference this article: [8086-8088](pdfs/8086.pdf)❌
Q1
If a physical branch target address is 5A230 when CS = 5200, what will it be if the CS = 7800 ?
- We need to find the offset, where offset + CS(shifted 4 bits) = physical branch target address. CS(C segment register)
- Offset = Physical branch target address - CS
- Offset = 5A230 - 52000 = 8230 (hex)
The offset is then used to find the physical branch target adress.
- 78000 + 8230 = 80230 = physical branch target address
Q2
Given that the EA of a data is 2359 and DS = 490B, what is the PA of data?
-
EA (effective Address), DS (D segment register), PA(Physical Address)
.
Physical address is given by EA + DS(shifted 4 bits).
- DS = 490B0 (hex)
- EA = 2359 (hex)
- PA = 490B0 + 2359 = 4B409
Q3
Assuming, W, X, Y and Z as memory addresses. Write a program using any machine sequence that will carry out the following: Z ← W + (Z-X).
.globl main
.text
main:
lw $t1, Z #load z into temporary register 1
lw $t2, X #load x into temporary register 2
sub $s1, $t1, $t2 #s1 <- t1-t2
lw $t1, W #load w into temporary register 1
add $s1, $t1, $s1 #s1 <- t1
sw $s1, Z # stores s1 into Z
li $v0, 1 #prints z
lw $a0, Z
syscall
li $v0, 10 #exits
syscall
.data
Z: .word 12 #arbitrary value
X: .word 10 #arbitrary value
W: .word 5 #arbitrary value
Downloadable solution prints 7
Q4
Assume that the code below is run on a machine with a 2 GHz clock that requires the following number of cycles for each instruction: add, addi, sll, sra take 4cc each, lw takes 5cc, bne, beq take 3cc each. How many seconds will it take to execute this code. The values of registers are $4=0x20, $5= 0x20, $6= 0x30, $7= 0x10.
.globl main
.text
main:
sra $6, $6, 2 # changes original value of 0x30 / 2^2 = 0xC
sll $7, $7, 2 # original value = 0x10, 0x10 * 2^2 = 0x40
add $8, $0, $0 # sets register 8 to 0
L2: add $12, $4, $8 #marks L2, $12 <- 0x20 + $8, $8 is iterator
lw $12, 0($12) # $12 = memory at address of ($12) on stack
add $9, $0, $0 # $9 = 0;
L1: add $11, $5, $9 #marks L1, $11 = 0x20 + $9, $9 is iterator
lw $11, 0($11) #$11 = memory at address of ($11) on stack
addi $9, $9, 4 #$9 = $9 + 4
bne $9, $7, L1 # goes to L1 if $9 != $7 0x0 + 0x4 * 0x10 = 0x40 = $9 loop executes 0x10 times
addi $8, $8, 4 # $8 = $8 + 4
beq $8, $6, L2 # goes to L2 if $8 == $6 exits before $8 can equal $6
Instruction definitions
-
sra (shift right arithmetic) (sra
destination
,origin
,shift(in bits)
) rounds down -
sll (shift left logical) (sll
destination
,origin
,shift
) - bne (bne r1, r2, branch address) goes to branch address if r1 != r2
- beq (beq, r1, r2, branch address) goest to branch address if r1 == r2
Calculate number of clock cycles
-
Before loops
- sra 4cc
- sll 4cc
- add 4cc
- total = 12cc
-
L2
- add 4cc
- lw 5cc
- add 4cc
- L1 clocks
- addi 4cc
- beq 3cc
- total = (20cc + L1 clocks)*1 loop
-
L3
- add 4cc
- lw 5cc
- addi 4ccc
- bne 3cc
- total = (16cc * 16 loops) = 256 clocks
-
Total clocks
12cc + 20cc + 256cc = 288cc
Calculate time
288cc/(2*109cc/s) = 1.44 * 10-7 seconds
Q5
X[i] = A[B[i]] + C[i+4]
- starting address of A in $1
- starting address of B in $2
- starting address of C in $3
- starting address of X in $4
- i value in register $5
.globl main
.text
main:
sll $s4, $5, 2 # multiplies i * 4 to conform to address form
add $t2, $2, $s4 # gets address of B[i] offsets address of B by i
lw $t3, ($t2) # sets t3 to value at address t3 = B[i]
sll $t1, $t3, 2 # sets t1 to t3 * 4 to conform to address form
add $t2, $1, $t1 # offsets addres value in $1 by $t1 A[B[i]]
lw $s1, ($t2) # sets t3 to value at address $t2 t3 = A[B[i]]
addi $t1, $5, 4 # offsets i by 4 = i+4
sll $t1, $5, 2 # multiplies i*4 to conform to address form
add $t2, $3, $t1 # offsets C address by i+4
lw $s2, ($t2) # s2 = C[i+4]
add $t1, $s1, $s2 # adds A[B[i]] + C[i+4]
add $s3, $4, $s4 # offsets address of X by i
sw $t1, ($s3) # stores $t1 to address of ($s3)
Q6
The memory units that follow are specified by the number of words times the number of bits per word. How many address lines and input/output data lines are needed in each case? (a) 8K X 16 (b) 2G X 8 (c) 16M X 32 (d) 256K X 64
Part A
- Number of words = 8K
- Number of bits per word = 16
- log base 2 of words = address lines
- 23 * 210 = 213 = 13 address lines
- I/O lines = address lines + bits per word
- 13 + 16 = 29 I/O lines
Part B
- 21 * 230 = 2G
- Address lines = 31
- I/O lines = 31 + 8 = 39
- 39 I/O lines
Part C
- 24 * 220 = 16M
- Address lines = 24
- I/O lines = 24 + 32 = 56
- 56 I/O lines
Part D
- 28 * 210 = 256K
- Address lines = 18
- I/O lines = 18 + 64
- 82 I/O lines
Q7
Find the number of bytes that can be stored in the memories: (a) 8K X 16 (b) 2G X 8 (c) 16M X 32 (d) 256K X 64
Part A
- 8 bits per byte
- 8K words
- 16 bits per word
- number of bits = 8K*16 = 213 * 24 = 217
- number of bytes = 217/23 = 214 = 16K bytes
Part B
- Number of bits = 2G*8 = 231*23
- Number of bytes = 231*23/23 = 231 = 2G bytes
Part C
- Number of bytes = 16M * 32 / 23 = 224 * 32 /23 = 226 = 64M bytes
Part D
- Number of bytes = 256K*64/23 = 218 * 64 /23 = 224 /23 = 221 = 2M bytes
DONE Q8
How many 128 x 8 memory chips are needed to provide a memory capacity of 4096 x 16?
- 4096*16 / 128*8 = 64
Q9
Given a 32 x 8 ROM chip with an enable input, show the external connections necessary to construct a 128 x 8 ROM with four chips and a decoder. Useful Video
- 25 = 32
- 5 address lines for each ROM
- 128/32 = 4 chips
- 4 outputs on decoder (connected to enable inputs)
- 22 = 4
- 2 address lines for the decoder
- 5 + 2 = 7 total lines for complete address

Q10
Assume we have 8GB of word addressable memory with a word size of 64 bits and each refill line of memory stores 16 words. We have a direct-mapped cache with 1024 refill lines. Answer the following questions. [More examples](pdfs/Q10.pdf)❌
- What is the address format for the cache
- If we changed the cache to a 2-way set associative cache, how does the format change?
- If we changed the cache to a fully associative cache, how does the format change?
Solution
-
Part 1
- 8GB = 8*230 bytes
- 64 bit word = 8 byte word
- 8GB / 8 byte word = 1GW (gigaword)
- 1G = 230
- 30 bits for line addresses
- 16 words per line = 24
- 4 bits for word number on line
- address format (for MM) = [30-4 bit line address][4 bit word address]
- 1024 refill lines = 210
- 10 bits for line number
- 26-10 bits for tag
- [tag][line number][word number]
- 16-10-4 = cache memory format
-
Part 2
- 2 way set associate cache
- divide refill lines by 2
- 1024/2 = 512 = 29
- 26-9 = tag
- 17-9-4 = 2 way set cache memory format
-
Part 3
- fully associative cache
- 1 refill line = 20
- 26-0 = tag
- 26-0-4 = fully associate cache
Hard Drive Sample question
This is a writeup of question 1 from [here](pdfs/MagneticDisk.pdf)❌.
Consider a disk pack with the following specifications 16 surfaces, 128 tracks per surface, 256 sectors per track, 512 bytes per sector
Capacity of Disk Space
-
Use the formula
Capacity of Disk Space
. - 16 surfaces * 128 tracks/surface * 256 sectors/track * 512 bytes per sector = \(2^{28}\) bytes
- 256 MB
Bit Required to address sector
-
Use a modified version of
Capacity of Disk Space
to find total number of sectors - 16 surfaces * 128 tracks/surface * 256 sectors/track = 219 sectors
- Use the same method as the cache problems to find the number of bits
- 19 bits required for address
Formatted Disk Space (32 bytes per sector format overhead)
-
Use the formula
Formatting Overhead
- 219 sectors * 32 bytes/sector = 219 * 25 bytes = 224 bytes
- 16 MB
-
Use the formula
Formatted disk space
- Total Disk space - Formatting overhead
- 256 MB - 16 MB
- 240 MB
Useful formulas and references
Hard Drives
Hard Drive Website [Hard Drive Document from nitin](pdfs/MagneticDisk.pdf)❌
Formulas
-
Disk access time
= Seek Time + Rotational delay + Transfer time + Controller overhead + Queuing delay -
Average Disk Access
Time = Average seek time + Average rotational delay + Transfer time + Controller overhead + Queuing delay -
Average Seek Time
= 1/3 * Time taken for one full stroke- (Time taken to move from track 1 to track 1 + Time taken to move from track 1 to last track)/2
- {0 + (k-1)t}/2
- (k-1)t/2
-
Average Rotational Latency
= 1/2 * Time taken for one full rotation -
Capacity of disk pack
= Total number of surfaces * Number of tracks per surface * Number of sectors per track * storage capacity of one sector -
Formatting Overhead
= Number of sectors * Overhead per sector -
Formatted Disk Space
= Total disk space or capacity - formatting overhead -
Recording density or storage density
= Capacity of track / circumference of the track -
Track Capacity
= Recording density of the track * Circumference of the track -
Data Transfer Rate
= Number of heads * Bytes that can be read in one full rotation * Number of rotations in one second = Number of heads * Capacity of one track * Number of rotations in one second - Tracks per surface = (Outer radius - Inner radius) / Inter Track gap
Circuit reference
-
Logic Gates
- Full Adder

Computer Arithmetic
[Computer arithemetic document from nitin](pdfs/Computer Arithemtic.pdf)❌
Memory + Cache
- Main Memory and Organization : credit to Robbie Schad
-
[Cache Notes](pdfs/notes_cache.pdf)❌ -
[Direct Mapping Examples](pdfs/Direct Mapping Examples.pdf)❌ -
[Fully Associative Mapping Examples](pdfs/Fully Associative Mapping Examples.pdf)❌ -
[Set Associate Mapping Examples](pdfs/Set Associative Mapping Examples.pdf)❌
Formulas
-
Direct mapped cache
- physical address size (bits) = TAG + Line Number + Block/Line Offset
- \(\text{MM(size in bytes)} = 2^{\text{number Of Bits In Physical Address}} * 2^3\)
- \(\text{BlockOffset(size in bytes)} = 2^{\text{Bits In Block Offset}} * 2^3 \)
- \(\text{number of lines} = 2^{\text{bits in line number}} = \frac{\text{Cache size}}{\text{Line Size}} \)
- \(\text{tag directory size} = \text{number of lines in cache * Number of bits in tag} = \text{Number of Tags} * \text{Tag size} \)
-
Set Associative Mapped cache
- physical address size (bits) = TAG + Set Number + Block/line Offset
- \(2^\text{Set Number (bits)} = \frac{\text{Lines in cache}}{\text{Number of sets}}\)
-
Fully associative cache
- physical address size (bits) = TAG + Block offset
Pipelining
[pipelining document from nitin](pdfs/Pipelining in Computer Architecture.pdf)❌
Formulas
-
\(\text{Speed Up (S)} = \frac{\text{Non-pipelined execution time}}{\text{Pipelined execution time}}\)
-
\(\text{Efficiency} = \frac{\text{Speed Up}}{\text{Number of stages in Pipelined Architecture}}\)
-
\(\text{Efficiency} = \frac{\text{Number of boxes utilized in phase time diagram}}{\text{Number of boxes in phase time diagram}}\)
-
\(\text{Throughput} = \frac{\text{Number of instructions executed}}{\text{Total time taken}}\)
-
Non-pipelined execution time = Total number of instructions * Time taken to execute one instruction = n * k clock cycles
-
Pipelined execution time
= Time taken to execute first instruction + Time take to execute remaining instructions
= 1 * k clock cycles + (n-1) * 1 clock cycle
= (k + n-1) clock cycles
-
Cycle time
= Maximum delay due to any stage + Delay due to its register -
delay due to its register
=latch delay
-
pipeline time for x tasks = Time taken for 1st task + Time taken for remaining tasks
= number of phases * cycle time + (total tasks -1) * cycle time
MIPS
-
[Mips Theory](pdfs/MIPS-Theory.pdf)❌ - Mips Reference from Berkeley
- Mips Reference from Cburch
- MIPS DATAPATH
- MIPS PIPELINED DATAPATH