Pipelining is a particularly effective way of organizing parallel activity in a computer system. The basic idea is very simple. It is frequently encountered in manufacturing plants, where pipelining is commonly known as an assembly-line operation. By laying the production process out in an assembly line, products at various stages can be worked on simultaneously. This process is also referred to as pipelining because, as in a pipeline, new inputs are accepted at one end before previously accepted inputs appear as outputs at the other end.
In simple words we can say:
Pipelining is a technique of decomposing a sequential process into sub-operations, with each sub-operation being executed in a dedicated segment that operates concurrently with the other segments.
Instruction pipelining
Let Fi and Ei refer to the fetch and execute steps for instruction Ii. Execution of a program consists of a sequence of fetch and execute steps, as shown in the figure.
Now consider a CPU that has two separate hardware units, one for fetching instructions and another for executing them.
The instruction fetched by the fetch unit is stored in an intermediate storage buffer.
The results of execution are stored in the destination location specified by the instruction.
For simplicity it is assumed that fetch and execute steps of any instruction can be completed in one clock cycle.
The operation of the computer proceeds as follows:
Figure : Hardware Organization

1. In the first clock cycle, the fetch unit fetches instruction I1 (step F1) and stores it in the buffer at the end of the clock cycle.
2. In the second clock cycle, the fetch unit proceeds with the fetch operation for instruction I2 (step F2).
3. Meanwhile, the execution unit performs the operation specified by instruction I1, which is already fetched and available in the buffer (step E1).
4. By the end of the second clock cycle, the execution of instruction I1 is completed and instruction I2 is available.
5. Instruction I2 is stored in the buffer, replacing I1, which is no longer needed.
6. Step E2 is performed by the execution unit during the third clock cycle, while instruction I3 is being fetched by the fetch unit.
7. Both the fetch and execute units are kept busy all the time, and one instruction is completed after each clock cycle except the first.
8. If a long sequence of instructions is executed, the completion rate of instruction execution will be twice that achievable by sequential operation with a single unit that performs both fetch and execute.

The basic idea of instruction pipelining with this hardware organization is shown in the figure above.

Figure : Pipeline Execution.
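Under the one-cycle-per-step assumption, the overlap described above can be sketched with a small, hypothetical timing helper (names are illustrative, not from the text):

```python
# Hypothetical sketch: completion cycles in a two-stage fetch/execute
# pipeline where each step takes one clock cycle.
def two_stage_completion_cycles(n):
    # Instruction i (1-based) is fetched in cycle i and executed in
    # cycle i + 1, overlapping with the fetch of instruction i + 1.
    return [i + 1 for i in range(1, n + 1)]

print(two_stage_completion_cycles(4))  # [2, 3, 4, 5]
```

After the first cycle, one instruction completes in every cycle, which is where the factor-of-two throughput comes from.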
The processing of an instruction need not be divided into only two steps. To gain further speed up, the pipeline must have more stages.
Let us consider the following decomposition of instruction execution:
- Fetch Instruction (FI): Read the next expected instruction into a buffer.
- Decode Instruction (DI): Determine the opcode and the operand specifiers.
- Calculate Operands (CO): Calculate the effective address of each source operand.
- Fetch Operands (FO): Fetch each operand from memory.
- Execute Instruction (EI): Perform the indicated operation.
- Write Operand (WO): Store the result in memory.
There will be six different stages for these six subtasks. For the sake of simplicity, let us assume that all subtasks take equal time. If the six stages are not of equal duration, there will be some waiting involved at various pipeline stages.
The timing diagram for the execution of instructions in pipelined fashion is shown in the figure.
Figure : Timing sequence in pipeline execution.

From this timing diagram it is clear that the total execution time of 8 instructions in this 6-stage pipeline is 13 time units. The first instruction completes after 6 time units, and thereafter one instruction completes in each time unit. Without a pipeline, the total time required to complete 8 instructions would have been 48 (6 × 8) time units. Therefore, there is a speed-up in pipeline processing, and the speed-up is related to the number of stages.
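The figures from the timing diagram can be checked with a short sketch (the formulas are read off the diagram rather than stated explicitly in the text):

```python
def pipeline_time(k, n):
    # First instruction needs k cycles; each of the remaining n - 1
    # instructions completes one cycle later.
    return k + (n - 1)

def sequential_time(k, n):
    # Without pipelining, every instruction passes through all k stages in turn.
    return k * n

print(pipeline_time(6, 8))    # 13 time units
print(sequential_time(6, 8))  # 48 time units
```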
Performance Issue
Now suppose that n instructions are processed in a k-stage pipeline with a clock cycle time of tp, and these instructions are executed one after another. The first instruction takes k cycles to pass through all stages, and the remaining (n − 1) instructions complete at the rate of one per cycle, so the total time required to execute all n instructions is

Tk = (k + n − 1) tp

Without a pipeline, each instruction takes k cycles, so the time required to execute n instructions is

T1 = n k tp

speed up = T1/Tk = n k / (k + n − 1)
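A quick numeric check of the speed-up ratio (a sketch; the constant tp cancels out of the ratio):

```python
def speedup(k, n):
    # speed up = T1 / Tk = (n * k * tp) / ((k + n - 1) * tp)
    return n * k / (k + n - 1)

print(round(speedup(6, 8), 2))     # 3.69 for 8 instructions in 6 stages
print(round(speedup(6, 10000), 2)) # close to k = 6 for large n
```

For a long sequence of instructions the speed-up approaches k, the number of pipeline stages.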
Primary conflicts for instruction pipelining
1. Resource conflict: Caused by access to memory by two segments at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.
2. Data dependency: Conflicts arise when an instruction depends on the result of a previous instruction, but that result is not yet available.
3. Branch difficulties: Arise from branch and other instructions that change the value of the program counter.
Resolution of conflicts
1. Hardware interlocks: One method of removing data dependency conflicts. An interlock is a circuit that detects instructions whose source operands are destinations of instructions farther up in the pipeline. Detection of this situation causes the instruction whose source is not available to be delayed by enough clock cycles to resolve the conflict.
2. Operand forwarding: Another technique that uses special hardware to detect a conflict and then avoid it by routing the data through special paths between pipeline segments.
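The detection step that both techniques rely on can be sketched as follows (the register names and helper function are hypothetical, for illustration only):

```python
def has_raw_hazard(in_flight_dests, src_regs):
    # An interlock circuit flags an instruction whose source operands
    # are destinations of instructions farther up the pipeline.
    return any(src in in_flight_dests for src in src_regs)

# ADD R1, R2, R3 is still in the pipeline; SUB R4, R1, R5 reads R1.
print(has_raw_hazard({"R1"}, ["R1", "R5"]))  # True -> stall or forward
print(has_raw_hazard({"R1"}, ["R2", "R5"]))  # False -> no conflict
```

On a hit, an interlock inserts stall cycles, while operand forwarding instead routes the result directly to the waiting segment.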
To resolve branch conflicts, we first have to understand what a branch instruction is. A branch instruction is an instruction that alters the sequential flow of the program by loading the program counter with a target address.
3. Prefetch target instruction: In this technique, the target instruction is pre-fetched in addition to the instruction following the branch. Both are saved until the branch is executed; if the branch condition is satisfied, the pipeline continues from the target instruction.
4. Branch target buffer: The BTB is an associative memory included in the fetch segment. Each entry consists of the address of a previously executed branch instruction and its target instruction. When the pipeline decodes a branch instruction, the BTB is searched for the address of that instruction; if it is found, prefetching continues from the new path.
5. Branch prediction: A pipeline with branch prediction uses special additional logic to predict the outcome of a conditional branch and starts fetching the instruction stream from the predicted path.
6. Delayed branch: In this procedure, the compiler detects the branch and rearranges the machine code by inserting useful instructions that keep the pipeline operating without interruptions.
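As an illustration of the prediction logic in technique 5, here is a sketch of a 2-bit saturating-counter predictor, one common implementation of such logic (an assumption for illustration; the text does not name a specific scheme):

```python
class TwoBitPredictor:
    # States 0-1 predict "not taken"; states 2-3 predict "taken".
    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturating counter: two mispredictions in a row are needed
        # to flip a strongly held prediction.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
for outcome in [True, True, True]:
    p.update(outcome)
print(p.predict())  # True after a run of taken branches
```

On a correct prediction the pipeline loses nothing; on a misprediction the wrongly fetched instructions are discarded and fetching restarts from the correct path.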