Friday, 12 February 2016

Instruction Pipelining: Introduction

Early computers executed instructions sequentially, with each instruction going through the following five steps:

1) fetch an instruction from memory

2) decode it to determine what the instruction is

3) read the instruction's inputs from the register file

4) perform the computation

5) write the result to the right location.

The important thing to understand is that each instruction finishes executing before the next one is fetched.

Also, each step listed above requires different hardware, so executing instructions sequentially means that most of the hardware sits idle at any given time, leaving the computational resources underutilized.

To solve this problem, computer architects used the technique of instruction pipelining.

Instruction pipelining is a technique for overlapping the execution of several instructions to reduce the execution time of a set of instructions.
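
To get a feel for the saving, here is a minimal sketch that counts clock cycles for the five-step sequence above, with and without overlap. It assumes an idealized machine in which every step takes exactly one cycle and nothing ever stalls the pipeline; the function names are my own and are only for illustration.

```python
def sequential_cycles(n_instructions, n_stages=5):
    """Cycles needed when each instruction finishes before the next is fetched."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=5):
    """Cycles needed on an ideal pipeline (assumed: one cycle per stage, no stalls).

    The first instruction takes n_stages cycles to pass through; after that,
    one instruction completes every cycle.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


n = 100
seq = sequential_cycles(n)
pipe = pipelined_cycles(n)
print(f"sequential: {seq} cycles, pipelined: {pipe} cycles")
print(f"ideal speedup: {seq / pipe:.2f}")
```

For a long run of instructions, the ideal speedup approaches the number of stages. Real pipelines fall short of this because of hazards.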

Pentium, PowerPC and ARM processors, to name a few, use pipelining.

There is some useful material on instruction pipelining on the internet. Here are links to some of it:

Instruction pipelines and hazards (video)


Computer Architecture:  Pipelining

RISC Pipelining 






Tuesday, 19 January 2016

Little endian and big endian format

Some data types, such as integers and floating-point numbers, need more than one byte to represent. Because the CPU assigns addresses to 8-bit memory locations, such values must be stored across more than one location.

There are two commonly used organizations for multibyte data.

1) Big endian format

In this format, the most significant byte of a value is stored in location L, the following byte in location L+1 and so on for next bytes.

Suppose we need to store the 32-bit value 0x12345678 in the memory of a computer, starting at address 300. The data will then be organized as shown below:

Memory Address    Data (in hex)
300               12
301               34
302               56
303               78

2) Little endian format

In this format, the least significant byte is stored in location L, the next byte in location L+1, and so on. Storing the same value, 0x12345678, this time starting at address 400, the data will be organized as shown below:

Memory Address    Data (in hex)
400               78
401               56
402               34
403               12

As long as the CPU is designed to handle a specific format, neither is better than the other.
However, data must be converted when it is transferred between machines that use different multibyte formats.
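
The two layouts can be reproduced with a short sketch using Python's built-in int.to_bytes and int.from_bytes. The variable names and the helper hex_bytes are my own. The last lines show what goes wrong if little endian bytes are read as big endian without conversion.

```python
value = 0x12345678

# Lay the 32-bit value out in each byte order (lowest address first).
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")


def hex_bytes(data):
    """Format bytes as two-digit hex values separated by spaces."""
    return " ".join(f"{b:02x}" for b in data)


print("big endian:   ", hex_bytes(big))     # 12 34 56 78
print("little endian:", hex_bytes(little))  # 78 56 34 12

# Reading the bytes back with the matching byte order recovers the value.
round_trip = int.from_bytes(little, byteorder="little")

# Reading little endian bytes as if they were big endian gives a wrong value.
misread = int.from_bytes(little, byteorder="big")
print(hex(round_trip), hex(misread))  # 0x12345678 0x78563412
```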

Sunday, 17 January 2016

Von Neumann and Harvard computer models

Von Neumann computer model

It contains three main building blocks.

1) Central Processing Unit (CPU)
   It processes data and ensures that programs are executed in the correct order.

2) Memory
   A single memory stores both data and instructions.

3) Input-Output devices
   Input devices are used to give input to the computer; common examples are the keyboard and mouse. Output devices present results to the user; common examples are the monitor and printer.

These three components are connected by a system bus.

Harvard computer model

The Harvard computer model contains the same building blocks as the Von Neumann model, with one important difference: there are separate memories for data and instructions, namely the data memory and the instruction memory.
Because of this, data and instructions can be fetched from memory simultaneously. This luxury is not available in the Von Neumann model.
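
A rough way to see the benefit is to count memory cycles in a toy model. The sketch below is only an illustration under strong assumptions of my own: each memory serves one access per cycle, and in the Harvard case instruction fetches and data accesses overlap perfectly. Real machines are more complicated, and the function names are hypothetical.

```python
def von_neumann_memory_cycles(n_instruction_fetches, n_data_accesses):
    """Single shared memory: fetches and data accesses take turns (assumed)."""
    return n_instruction_fetches + n_data_accesses


def harvard_memory_cycles(n_instruction_fetches, n_data_accesses):
    """Separate memories: the two access streams overlap fully (assumed ideal)."""
    return max(n_instruction_fetches, n_data_accesses)


# Example: 1000 instructions, 300 of which also read or write data.
print(von_neumann_memory_cycles(1000, 300))
print(harvard_memory_cycles(1000, 300))
```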


Von Neumann and Harvard computer model.
Note that input and output devices are not shown, but they are present in both the computer models.




Saturday, 16 January 2016

World's first microprocessor

Most of you will say that it's the Intel 4004. However, the Intel 4004 was the first commercially available microprocessor. The first microprocessor was the MP944, a classified design used to control the swing wings and flight controls of the F-14 Tomcat fighter.

The Busicom digital calculator used the Intel 4004


Design work on the Intel 4004 started in April 1970 and was completed in January 1971.

F-14 Tomcat used MP944 microprocessor


As for the MP944, the design work started in 1968 and finished in June 1970.
Very few of us know about the MP944 because the US Navy wanted to keep it classified. As a result, the MP944 remained hidden in the history of microprocessors.

Implementation of instructions

Instructions are stored in memory, encoded in binary. How are they encoded? This is decided by the designers of the ISA (Instruction Set Architecture). Executing any instruction involves five stages.

1) Fetch

As we know, instructions are stored in memory as binary numbers. During the Fetch stage, the CPU sends an address to the memory over the address bus and gets the instruction back. The instruction is then stored in a register that is reserved for this purpose and is not accessible to the programmer.

There is another register, the program counter, which stores the address of the next instruction to be fetched. After an instruction is fetched, the program counter is incremented to point to the next instruction.

2) Decode

In this stage, the CPU determines the type of instruction to be executed. It does this by checking some bits of the instruction's binary representation. The instruction can be a load/store, arithmetic or branch instruction.

3) Register read

After the type of instruction is determined in the Decode stage, the operands are loaded into dedicated registers. The operands are held in a small memory called the register file. The size of the register file is determined by the ISA of the processor.

4) Execute

The operands in the registers are operated upon in the Execute stage. The operation can be addition, subtraction, multiplication, division or comparison.

5) Write back

The result computed in the Execute stage is stored in the destination register, as dictated by the binary representation of the instruction. If the instruction is of the store type, the result is stored in data memory instead.

Wednesday, 13 January 2016

Implication of Amdahl's law

We have discussed the theory and some examples based on Amdahl's law in previous posts. Here, we will learn about the implication of Amdahl's law that guides the engineers working round the clock to improve processor performance.

For convenience, we restate Amdahl's law here.

$$\text{Speedup} = \frac{1}{\text{Frac}_{\text{unused}} + \dfrac{\text{Frac}_{\text{used}}}{\text{Speedup}_{\text{used}}}}$$

Frac_unused is the fraction of time (not of instructions) during which the improvement is not in use.

Frac_used is the fraction of time during which the improvement is in use, and speedup_used is the speedup that occurs when the improvement is used.

Let's consider two scenarios in which processor's performance is enhanced.


Enhancement 1: Speedup of 20 on 10% of time

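Here, Frac_used = 0.1
         speedup_used = 20

Frac_unused = 1 - Frac_used = 0.9

$$\text{Speedup} = \frac{1}{0.9 + \dfrac{0.1}{20}} = \frac{1}{0.905} \approx 1.10$$

Hence, the speedup in the first scenario is about 1.10.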


Enhancement 2: Speedup of 1.6 on 80% of time

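Here, Frac_used = 0.8
         speedup_used = 1.6

Frac_unused = 1 - Frac_used = 0.2

$$\text{Speedup} = \frac{1}{0.2 + \dfrac{0.8}{1.6}} = \frac{1}{0.7} \approx 1.43$$

Hence, the speedup in the second scenario is about 1.43.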

We observe that the speedup in the second case is greater than in the first.

Implication of Amdahl's law:

From this, we conclude that a small speedup on a large percentage of the execution time is better than a large speedup on a small percentage of the execution time.
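
The comparison is easy to replay in code. This is a minimal sketch of the law using the two enhancements discussed above; the function name amdahl_speedup and the variable names are my own.

```python
def amdahl_speedup(frac_used, speedup_used):
    """Overall speedup when an improvement is in use for frac_used of the time."""
    frac_unused = 1.0 - frac_used
    return 1.0 / (frac_unused + frac_used / speedup_used)


# Enhancement 1: speedup of 20 on 10% of the time.
large_on_small = amdahl_speedup(0.1, 20)

# Enhancement 2: speedup of 1.6 on 80% of the time.
small_on_large = amdahl_speedup(0.8, 1.6)

print(f"Enhancement 1: {large_on_small:.3f}")
print(f"Enhancement 2: {small_on_large:.3f}")
print("Enhancement 2 wins:", small_on_large > large_on_small)
```

Trying ever larger values of speedup_used with frac_used fixed at 0.1 shows that the overall speedup stays below 1 / 0.9, because the unimproved 90% of the time remains.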

Some examples based on speedup and Amdahl's law

Here are some examples based on speedup and Amdahl's law. Their purpose is to help you understand the concepts by applying them to given situations.

Example 1:

A computer spends 80 percent of its time executing a particular type of instruction. Engineers claim to have made the execution of that instruction type 10 times faster. What is the resulting overall speedup?

Sol:

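We need to apply Amdahl's law here.

Here, Frac_used = 0.8 and speedup_used = 10, so Frac_unused = 1 - Frac_used = 0.2.

$$\text{Speedup} = \frac{1}{0.2 + \dfrac{0.8}{10}} = \frac{1}{0.28} \approx 3.57$$

Hence, the resulting speedup is about 3.57.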

Example 2:

A computer program to be executed by a given processor has the following characteristics:

S.No   Instruction type   % of time   CPI
1      Integer-type       40          1
2      Branch             20          4
3      Load               30          2
4      Store              10          3


The processor is clocked at 2 GHz.

Calculate the speedup obtained in each of the following cases:

1. Branch instruction CPI is changed from 4 to 3

2. Clock frequency is changed from 2 GHz to 2.3 GHz

3. Store instruction CPI is changed from 3 to 2

Sol.

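Let us discuss each case one by one.

Case 1: Branch instruction CPI is changed from 4 to 3

Here, Frac_used = 0.2 (from the table given in the question)
speedup_used = 4/3 ≈ 1.33

Frac_unused = 1 - Frac_used = 0.8

$$\text{Speedup} = \frac{1}{0.8 + \dfrac{0.2}{4/3}} = \frac{1}{0.95} \approx 1.05$$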

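Case 2: Clock frequency is changed from 2 GHz to 2.3 GHz

Execution time is inversely proportional to the frequency at which the processor operates, so the speedup is the ratio of the new frequency to the old one:

$$\text{Speedup} = \frac{T_{\text{old}}}{T_{\text{new}}} = \frac{f_{\text{new}}}{f_{\text{old}}} = \frac{2.3}{2} = 1.15$$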

Case 3: Store instruction CPI is changed from 3 to 2

Here, Frac_used = 0.1 (from the table given in the question)
speedup_used = 3/2 = 1.5

Frac_unused = 1 - Frac_used = 0.9
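
All three cases of Example 2 can be checked with a few lines of code. This is a sketch under the same reading of the table used above, where the "% of time" column gives Frac_used directly; the function and variable names are my own.

```python
def amdahl_speedup(frac_used, speedup_used):
    """Overall speedup when an improvement is in use for frac_used of the time."""
    return 1.0 / ((1.0 - frac_used) + frac_used / speedup_used)


# Case 1: branch instructions (20% of time), CPI goes from 4 to 3.
case1 = amdahl_speedup(0.2, 4 / 3)

# Case 2: the whole program speeds up with the clock, 2 GHz to 2.3 GHz.
case2 = 2.3 / 2.0

# Case 3: store instructions (10% of time), CPI goes from 3 to 2.
case3 = amdahl_speedup(0.1, 3 / 2)

for name, speedup in [("Case 1", case1), ("Case 2", case2), ("Case 3", case3)]:
    print(f"{name}: {speedup:.3f}")
```

The clock frequency change affects every instruction, which is why it gives the largest speedup of the three.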