































| <b>dce</b><br>2011 | CPU Pipelining: Example                                                         |           |         |       |            |          |         |  |  |
|--------------------|---------------------------------------------------------------------------------|-----------|---------|-------|------------|----------|---------|--|--|
| •                  | Assumptions:                                                                    |           |         |       |            |          |         |  |  |
|                    | <ul> <li>Only consider the following instructions:</li> </ul>                   |           |         |       |            |          |         |  |  |
|                    | lw, sw, add, sub, and, or, slt, beq                                             |           |         |       |            |          |         |  |  |
|                    | <ul> <li>Operation times for instruction classes are:</li> </ul>                |           |         |       |            |          |         |  |  |
|                    | <ul> <li>Memory access</li> </ul>                                               |           |         | 2 ns  |            |          |         |  |  |
|                    | <ul> <li>ALU operation</li> </ul>                                               |           |         | 2 ns  |            |          |         |  |  |
|                    | <ul> <li>Register file read or write</li> </ul>                                 |           |         | 1 ns  |            |          |         |  |  |
|                    | <ul> <li>Use a single-cycle (not multi-cycle) model</li> </ul>                  |           |         |       |            |          |         |  |  |
|                    | <ul> <li>Non-pipelined: the clock cycle must accommodate the slowest instruction (lw, 8 ns); pipelined: it must accommodate the slowest stage (2 ns)</li> </ul> |           |         |       |            |          |         |  |  |
|                    | <ul> <li>Both pipelined and non-pipelined approaches use the same hardware components</li> </ul> |           |         |       |            |          |         |  |  |
|                    |                                                                                 |           |         |       |            |          |         |  |  |
|                    | Instr. class                                                                    | Instr. fetch | Reg. read | ALU op. | Data access | Reg. write | Total time |  |  |
|                    | lw                                                                              | 2 ns      | 1 ns    | 2 ns  | 2 ns       | 1 ns     | 8 ns    |  |  |
|                    | sw                                                                              | 2 ns      | 1 ns    | 2 ns  | 2 ns       |          | 7 ns    |  |  |
|                    | add, sub, and, or, slt                                                          | 2 ns      | 1 ns    | 2 ns  |            | 1 ns     | 6 ns    |  |  |
|                    | beq                                                                             | 2 ns      | 1 ns    | 2 ns  |            |          | 5 ns    |  |  |
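The timing table can be turned into a small calculation. The sketch below assumes an ideal five-stage pipeline with no hazards, whose clock is set by the slowest stage. It sums the stage latencies per instruction class, then compares n instructions on the single-cycle and pipelined designs. The dictionary and function names are hypothetical, chosen for illustration.

```python
# Minimal sketch: single-cycle vs. ideal pipelined execution time,
# using the stage latencies (ns) from the table. Names are hypothetical.

STAGE_NS = {"IF": 2, "RegRead": 1, "ALU": 2, "Mem": 2, "RegWrite": 1}

# Stages used by each instruction class (r-type = add, sub, and, or, slt).
CLASS_STAGES = {
    "lw": ["IF", "RegRead", "ALU", "Mem", "RegWrite"],
    "sw": ["IF", "RegRead", "ALU", "Mem"],
    "r-type": ["IF", "RegRead", "ALU", "RegWrite"],
    "beq": ["IF", "RegRead", "ALU"],
}


def instruction_time(cls: str) -> int:
    """Total latency of one instruction class (the 'Total time' column)."""
    return sum(STAGE_NS[stage] for stage in CLASS_STAGES[cls])


def single_cycle_time(n: int) -> int:
    """Every instruction takes one long cycle sized for the slowest one (lw)."""
    clock = max(instruction_time(cls) for cls in CLASS_STAGES)
    return n * clock


def pipelined_time(n: int, depth: int = 5) -> int:
    """Ideal pipeline (assumption: no hazards). The clock is the slowest stage;
    the first instruction needs `depth` cycles, each later one adds one cycle."""
    clock = max(STAGE_NS.values())
    return (depth + n - 1) * clock


if __name__ == "__main__":
    for n in (3, 1_000_000):
        s, p = single_cycle_time(n), pipelined_time(n)
        print(f"n={n}: single-cycle {s} ns, pipelined {p} ns, speedup {s / p:.2f}")
```

For three lw instructions this gives 24 ns versus 14 ns. For long instruction streams the speedup approaches 8 ns / 2 ns = 4 rather than 5, because the 1 ns register-file steps are padded out to the 2 ns clock.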





































| Or alternatively                                                                                                                                                          |    |    |    |       |     |     |    |     |     |     |   |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|----|----|-------|-----|-----|----|-----|-----|-----|---|
| Clock Number                                                                                                                                                              |    |    |    |       |     |     |    |     |     |     |   |
| Inst. #                                                                                                                                                                   | 1  | 2  | 3  | 4     | 5   | 6   | 7  | 8   | 9   | 10  |   |
| LOAD                                                                                                                                                                      | IF | ID | EX | MEM   | WB  |     |    |     |     |     |   |
| Inst. <i>i</i> +1                                                                                                                                                         |    | IF | ID | EX    | MEM | WB  |    |     |     |     |   |
| Inst. <i>i</i> +2                                                                                                                                                         |    |    | IF | ID    | EX  | MEM | WB |     |     |     |   |
| Inst. <i>i</i> +3                                                                                                                                                         |    |    |    | stall | IF  | ID  | EX | MEM | WB  |     |   |
| Inst. <i>i</i> +4                                                                                                                                                         |    |    |    |       |     | IF  | ID | EX  | MEM | WB  |   |
| Inst. <i>i</i> +5                                                                                                                                                         |    |    |    |       |     |     | IF | ID  | EX  | MEM |   |
| Inst. <i>i</i> +6                                                                                                                                                         |    |    |    |       |     |     |    | IF  | ID  | EX  |   |
| <ul> <li>The LOAD's MEM-stage access in clock cycle 4 "steals" the instruction-fetch cycle of Inst. <i>i</i>+3, so the pipeline stalls for one cycle.</li> <li>Thus, no instruction completes in clock cycle 8.</li> </ul> |    |    |    |       |     |     |    |     |     |     |   |
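The diagram can be reproduced with a tiny in-order scheduler. This sketch assumes a single memory port shared by instruction fetch and the MEM stage of loads and stores, which is what the "steals an instruction fetch cycle" wording implies. It also assumes there are no other hazards. The function names are hypothetical.

```python
# Minimal sketch: in-order 5-stage pipeline where instruction fetch and the
# MEM stage of loads/stores share one memory port (assumption).
# Names are hypothetical.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
MEM_OFFSET = STAGES.index("MEM")  # MEM happens 3 cycles after IF
N_CYCLES = 10                     # width of the printed diagram


def fetch_cycles(ops: list[str]) -> list[int]:
    """Return the clock cycle in which each instruction is fetched."""
    port_busy: set[int] = set()  # cycles in which a load/store occupies memory
    cycles: list[int] = []
    next_fetch = 1
    for op in ops:
        cycle = next_fetch
        while cycle in port_busy:  # structural hazard: the fetch must wait
            cycle += 1
        cycles.append(cycle)
        if op in ("LOAD", "STORE"):
            port_busy.add(cycle + MEM_OFFSET)
        next_fetch = cycle + 1
    return cycles


def completion_cycles(ops: list[str]) -> list[int]:
    """WB cycle of each instruction (no other hazards assumed)."""
    return [c + STAGES.index("WB") for c in fetch_cycles(ops)]


if __name__ == "__main__":
    ops = ["LOAD"] + ["ALU"] * 6  # LOAD followed by Inst. i+1 .. i+6
    names = ["LOAD"] + [f"Inst. i+{k}" for k in range(1, 7)]
    for name, start in zip(names, fetch_cycles(ops)):
        row = ["    "] * N_CYCLES
        for k, stage in enumerate(STAGES):
            if start + k <= N_CYCLES:
                row[start + k - 1] = f"{stage:4s}"
        print(f"{name:10s}" + " ".join(row))
    print("completes in:", completion_cycles(ops))
```

Inst. i+3 is fetched in cycle 5 instead of 4, and the completion list skips cycle 8, matching the diagram.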





































| Some example situations                 |                                                                    |                                                                                                                                                                       |  |  |  |  |  |
|-----------------------------------------|--------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|--|
| Situation                               | Example                                                            | Action                                                                                                                                                                |  |  |  |  |  |
| No Dependence                           | LW R1, 45(R2)<br>ADD R5, R6, R7<br>SUB R8, R6, R7<br>OR R9, R6, R7 | No hazard possible because no<br>dependence exists on R1 in the<br>immediately following three instructions.                                                          |  |  |  |  |  |
| Dependence<br>requiring stall           | LW R1, 45(R2)<br>ADD R5, R1, R7<br>SUB R8, R6, R7<br>OR R9, R6, R7 | Comparators detect the use of R1 in the<br>ADD and stall the ADD (and SUB and OR)<br>before the ADD begins EX                                                         |  |  |  |  |  |
| Dependence<br>overcome by<br>forwarding | LW R1, 45(R2)<br>ADD R5, R6, R7<br>SUB R8, R1, R7<br>OR R9, R6, R7 | Comparators detect the use of R1 in SUB<br>and forward the result of LOAD to the ALU<br>in time for SUB to begin with EX                                              |  |  |  |  |  |
| Dependence with<br>accesses in order    | LW R1, 45(R2)<br>ADD R5, R6, R7<br>SUB R8, R6, R7<br>OR R9, R1, R7 | No action is required because the read of<br>R1 by OR occurs in the second half of the<br>ID phase, while the write of the loaded<br>data occurred in the first half. |  |  |  |  |  |
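The four situations differ only in how far the first use of R1 lies from the load. The sketch below models the comparator check that way: it compares the load's destination with the sources of the next three instructions. The tuple layout and the function name are hypothetical. It reports only the nearest use, as the table does.

```python
# Minimal sketch of the load-interlock comparison: compare the load's
# destination with the sources of the next three instructions.
# The tuple layout and function name are hypothetical.

from typing import List, Tuple

Instr = Tuple[str, str, List[str]]  # (opcode, destination, source registers)


def load_use_action(load: Instr, following: List[Instr]) -> str:
    """Return the action for the nearest use of the load's destination."""
    _, dest, _ = load
    for distance, (_, _, sources) in enumerate(following[:3], start=1):
        if dest in sources:
            if distance == 1:
                return "stall"    # user's EX coincides with the load's MEM
            if distance == 2:
                return "forward"  # loaded value bypassed from MEM/WB to the ALU
            return "none"         # WB writes in first half, ID reads in second half
    return "no dependence"


if __name__ == "__main__":
    lw = ("LW", "R1", ["R2"])
    print(load_use_action(lw, [("ADD", "R5", ["R1", "R7"]),
                               ("SUB", "R8", ["R6", "R7"]),
                               ("OR", "R9", ["R6", "R7"])]))
```

Each row of the table maps to one return value: "no dependence", "stall", "forward", and "none".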

























| Evaluating Branch Alternatives                                                                              |                   |      |                        |                  |
|-------------------------------------------------------------------------------------------------------------|-------------------|------|------------------------|------------------|
| Pipeline speedup = $\frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$ |                   |      |                        |                  |
| Assume: 4% unconditional branch,<br>6% conditional branch-untaken,<br>10% conditional branch-taken          |                   |      |                        |                  |
| Scheduling scheme                                                                                           | Branch penalty    | CPI  | Speedup v. unpipelined | Speedup v. stall |
| Stall pipeline                                                                                              | 3                 | 1.60 | 3.1                    | 1.0              |
| Predict not taken                                                                                           | 1×0.04 + 3×0.10   | 1.34 | 3.7                    | 1.19             |
| Predict taken                                                                                               | 1×0.14 + 2×0.06   | 1.26 | 4.0                    | 1.27             |
| Delayed branch                                                                                              | 0.5               | 1.10 | 4.5                    | 1.45             |
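The table's numbers follow directly from the speedup formula. This sketch reproduces them under stated assumptions. Each branch kind has the penalty implied by the table: predict-not-taken charges 1 cycle for unconditional and 3 for taken branches; predict-taken charges 1 for unconditional and taken, 2 for untaken. PIPELINE_DEPTH = 5 is inferred, not given: it is the depth that yields 5 / 1.60 ≈ 3.1 for the stall row. The variable and function names are hypothetical.

```python
# Minimal sketch: CPI and speedups for the branch-scheduling schemes.
# PIPELINE_DEPTH = 5 is an assumption inferred from the "Stall pipeline" row.
# Names are hypothetical.

FREQ = {"uncond": 0.04, "cond_untaken": 0.06, "cond_taken": 0.10}

# Penalty in cycles for each branch kind under each scheme.
SCHEMES = {
    "Stall pipeline":    {"uncond": 3, "cond_untaken": 3, "cond_taken": 3},
    "Predict not taken": {"uncond": 1, "cond_untaken": 0, "cond_taken": 3},
    "Predict taken":     {"uncond": 1, "cond_untaken": 2, "cond_taken": 1},
    "Delayed branch":    {"uncond": 0.5, "cond_untaken": 0.5, "cond_taken": 0.5},
}

PIPELINE_DEPTH = 5


def cpi(penalties: dict) -> float:
    """CPI = 1 + sum over branch kinds of frequency x penalty."""
    return 1 + sum(FREQ[kind] * penalties[kind] for kind in FREQ)


def evaluate() -> dict:
    """Map scheme -> (CPI, speedup v. unpipelined, speedup v. stall)."""
    stall_cpi = cpi(SCHEMES["Stall pipeline"])
    results = {}
    for name, penalties in SCHEMES.items():
        c = cpi(penalties)
        results[name] = (c, PIPELINE_DEPTH / c, stall_cpi / c)
    return results


if __name__ == "__main__":
    for name, (c, vs_unpipelined, vs_stall) in evaluate().items():
        print(f"{name:18s} CPI={c:.2f}  v. unpipelined={vs_unpipelined:.1f}"
              f"  v. stall={vs_stall:.2f}")
```

Speedup v. stall is computed here as the ratio of CPIs (1.60 / CPI). For predict taken, 1.60 / 1.26 ≈ 1.27. Dividing the rounded speedups instead (4.0 / 3.1) would give about 1.29.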




