Saturday, February 13, 2016

Basics of Setup and Hold

We all know about D (Delay) flip-flops. A D flip-flop is an edge-triggered device that transfers its input (D) to its output (Q) on the rising or falling edge of the clock (clk). The truth table and timing diagram of a D flip-flop are:

D Flip Flop and Truth Table
D-Flip Flop Timing Diagram
The simulation timing diagram above does not include the internal delays of the D flip-flop. We can analyze how these delays behave using the following two-mux model of a D flip-flop.
Two MUX diagram of D Flip Flop with Truth Table
Let the delays of mux-1 and mux-2 be t1 and t2 respectively. When clk is '0', mux-1 takes at least t1 to pass the value from 'D' to 'Qn'; when clk is '1', 'Qn' holds its value. Input 'D' must not change during this t1 window before the clock edge. This t1 is called the Setup Time (Ts). So the setup time is the time interval before the clock edge during which input 'D' must be held stable.

Similarly for mux-2: when clk is '1', mux-2 takes at least t2 to pass the value from 'Qn' to 'Qn+1'; when clk is '0', 'Qn+1' holds its value. 'Qn' must not change during this t2 window. This t2 is called the Hold Time (Th). So the hold time is the time interval after the clock edge during which 'Qn' must be held stable.
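The two-mux description can be written almost literally in Verilog. This is a minimal sketch, not a real library cell: the module name mux_dff is hypothetical, and the #1 delays stand in for t1 and t2. Each mux either passes its input or feeds its own output back, exactly as in the truth table of the diagram.

```verilog
// Minimal sketch of the two-mux D flip-flop (hypothetical module name).
// The #1 delays stand in for t1 (mux-1) and t2 (mux-2).
module mux_dff (input wire clk, input wire d, output wire q);
  wire qn;

  // mux-1: passes D to Qn while clk = 0, feeds Qn back (holds) while clk = 1
  assign #1 qn = clk ? qn : d;

  // mux-2: passes Qn to Q (the text's Qn+1) while clk = 1,
  // feeds Q back (holds) while clk = 0
  assign #1 q  = clk ? qn : q;
endmodule
```

Continuous-assignment delays in Verilog are inertial, so in this model a change on D that arrives less than one time unit (t1) before the rising edge should be cancelled when mux-1 switches to feedback, and the flop keeps the old value: the setup requirement in miniature. The clock-to-Q delay of this model is simply the t2 delay of mux-2.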

There is one more timing property that we need to understand along with setup and hold: the 'clock to Q' delay, i.e. the time interval from the clock edge to the data appearing at the flop output.
Timing Diagram of two mux D flip-flop

Summary: the important timing parameters of a flip-flop are:
Setup Time (Ts): the time for which input 'D' must remain stable before the clock edge.
Hold Time (Th): the time for which input 'D' must remain stable after the clock edge.
Clock to Output Time (Tc2q): the time from the clock edge to the change at output 'Q'.
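The three parameters in this summary map directly onto a Verilog specify block. This is a minimal sketch: the module name dff_timed and the 0.3/0.2/0.5 ns values are hypothetical placeholders, not numbers from any cell library. $setup and $hold declare the window around the rising edge in which D must not change, and the path delay from clk to q models Tc2q. Whether the checks are reported and the path delay is applied depends on the simulator and its options; some simulators ignore specify blocks unless asked to honor them.

```verilog
`timescale 1ns/1ps
// Behavioral D flip-flop annotated with Ts, Th and Tc2q (placeholder values)
module dff_timed (input wire clk, input wire d, output reg q);
  always @(posedge clk)
    q <= d;

  specify
    specparam Ts = 0.3, Th = 0.2, Tc2q = 0.5;  // ns, illustrative only
    $setup(d, posedge clk, Ts);  // D must be stable Ts before the rising edge
    $hold(posedge clk, d, Th);   // ...and Th after it
    (clk => q) = Tc2q;           // clock-to-Q path delay
  endspecify
endmodule
```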

SETUP TIME ANALYSIS IN NETWORK
In a synchronous circuit, data must transfer from one flop to the next flop within one clock cycle (Tclk). This transfer time depends on
  1. clock to Q delay of the launch flop (Tc2q)
  2. propagation delay of the combo logic (Tpd)
  3. setup time of the capture flop (Ts).
Setup Time Analysis
Let's introduce the delays into the timing equation one by one. From the diagram above, for 'D' to transfer successfully, the propagation delay of the combo logic must be less than one clock period.
Tpd < Tclk
Tpd is a dynamic delay that depends on the combo logic, while Tc2q and Ts are static delays that are fixed for a given flop. The clock to Q delay of the launch flop is an additional delay in the path, so Tc2q combined with Tpd must be less than one clock period Tclk.
Tpd + Tc2q < Tclk
The data must also arrive at the capture flop a setup time (Ts) before the clock edge.
Tpd + Tc2q < Tclk - Ts
Up to this point we have not included clock uncertainty or clock buffer delays in the network. From the equation above, the data arrival time is (Tpd + Tc2q) and the data required time is (Tclk - Ts).
The difference between the data required time and the data arrival time is called the SLACK.
SLACK = (Tclk - Ts) - (Tpd + Tc2q)
SLACK can be positive or negative. Positive slack means the data arrives at the capture flop before the data required time (Tclk - Ts); negative slack means it arrives after the data required time, which is called a setup violation.
A setup violation occurs if the clock arrives earlier than the data. To avoid setup violations, the 'maximum data delay' should be smaller than the 'minimum clock delay'.
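Plugging illustrative numbers into the slack equation makes the relationship concrete. The values below (Tclk = 10 ns, Ts = 0.3 ns, Tc2q = 0.5 ns, Tpd = 7.0 ns, then a slower Tpd = 9.5 ns) are assumptions chosen for the arithmetic, not data from a real library. Setting the slack to zero also gives the shortest clock period the path can support.

```latex
\begin{aligned}
\text{SLACK} &= (T_{clk} - T_s) - (T_{pd} + T_{c2q}) \\
             &= (10 - 0.3) - (7.0 + 0.5) = 9.7 - 7.5 = 2.2\ \text{ns} \quad (\text{met}) \\
\text{SLACK}\big|_{T_{pd} = 9.5} &= (10 - 0.3) - (9.5 + 0.5) = 9.7 - 10.0 = -0.3\ \text{ns} \quad (\text{setup violation}) \\
T_{clk,min}  &= T_{c2q} + T_{pd} + T_s = 0.5 + 7.0 + 0.3 = 7.8\ \text{ns}
               \;\Rightarrow\; f_{max} = \frac{1}{7.8\ \text{ns}} \approx 128\ \text{MHz}
\end{aligned}
```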

In a synchronous circuit, the transfer time should be greater than the hold time of the capture flop, because the data must not change so soon after the clock edge that it violates the hold time of the capture flop. This transfer time depends on
  1. clock to Q delay of launch flop (Tc2q)
  2. propagation delay of combo logic (Tpd)
Hold Time Analysis
For proper operation, this delay should be greater than the hold time of the capture flop. So
Tc2q + Tpd > Th
Uncertainty in the clock path does not play much of a role here, because hold analysis uses the same clock edge for launch and capture. From the equation above, the data arrival time (Tc2q + Tpd) should be greater than the data required time (Th).
The difference between the data arrival time and the data required time is called the HOLD SLACK.
HOLD SLACK = (Tc2q + Tpd) - Th
HOLD SLACK can be positive or negative. Positive slack means there is enough delay for the capture flop to hold the previous data; negative slack means there is too little delay, so the previous data changes before it is successfully captured by the capture flop. This is called a hold violation at the capture flop.
A hold violation occurs if the data arrives earlier than the clock. To avoid hold violations, the 'maximum clock delay' should be smaller than the 'minimum data delay'.
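The same kind of worked example for hold, again with assumed values (Tc2q = 0.5 ns, a very short combo path with Tpd = 0.1 ns, and Th = 0.8 ns):

```latex
\begin{aligned}
\text{HOLD SLACK} &= (T_{c2q} + T_{pd}) - T_h \\
                  &= (0.5 + 0.1) - 0.8 = -0.2\ \text{ns} \quad (\text{hold violation}) \\
T_{pd}' &= 0.1 + 0.2 = 0.3\ \text{ns} \;\Rightarrow\; (0.5 + 0.3) - 0.8 = 0\ \text{ns}
\end{aligned}
```

Tclk does not appear anywhere in the hold equation, so a hold violation cannot be fixed by slowing down the clock. The fix is to add delay (for example buffers) in the data path, at least 0.2 ns in this example.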

In the next part we will review other timing parameters, such as clock tree delay and clock uncertainty, and their effect on setup and hold in a flop network.

Metastability and the Combinatorial Logic

In digital electronics (and especially in HDL simulation), a signal can take one of four values:
  • 0 - LOW
  • 1 - HIGH
  • x - Unknown - Either 0 or 1.
  • z - High Impedance
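In Verilog simulation all four values can be observed on a single net. A minimal sketch (module and signal names are hypothetical): two tri-state drivers share one wire, so the wire is z when nobody drives it, 0 or 1 when exactly one driver is enabled, and x when the drivers fight; a reg that is never assigned stays x. Note that x is how simulation represents "could be 0 or 1", not a physical voltage level.

```verilog
// Minimal sketch of Verilog's four values on one net (time units arbitrary)
module four_value_demo;
  reg  a, b, en_a, en_b;
  reg  r;                          // never assigned, so it stays x
  wire bus;

  // Two tri-state drivers on the same wire: each outputs z when disabled
  assign bus = en_a ? a : 1'bz;
  assign bus = en_b ? b : 1'bz;

  initial begin
    a = 1'b0; b = 1'b1;
    en_a = 1'b0; en_b = 1'b0;      // t=0: nobody drives the net -> bus = z
    #2 en_a = 1'b1;                // t=2: a drives 0            -> bus = 0
    #2 en_a = 1'b0; en_b = 1'b1;   // t=4: b drives 1            -> bus = 1
    #2 en_a = 1'b1;                // t=6: 0 and 1 fight         -> bus = x
    #1 $display("r = %b (unknown), bus = %b (conflict)", r, bus);
  end
endmodule
```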

Consider a case where we have a D flip-flop and the data signal (D) changes at the positive edge of the clock (clk), as shown in the shaded region of the timing diagram.

D Flip-Flop and Timing Diagram

Here we cannot predict the output (Q), because 'D' is unstable at the positive edge of 'clk', so Q may be neither 0 nor 1 for some time. This state is called the metastable state. It generally occurs when there are setup or hold violations in the design. We should take care of it in advance, otherwise it can make the whole system metastable.

There is one more condition that can make a system unstable: changes in combinational logic. Let us take the example of a 2-input AND gate, whose truth table is:
AND Gate and Truth Table
From the truth table above, the output of the AND gate is '0' for inputs '01' and '10'. When the inputs change from '01' to '10', the two bits rarely switch at exactly the same instant, so the change can take one of the following two paths:
AND Gate and Transitions between 01 to 10 and vice versa
  1.     01 -> 00 -> 10
  2.     01 -> 11 -> 10
When the 1st path is taken, the output remains '0', but when the 2nd path is taken, the output goes from 0 to 1 and back to 0. This HIGH output lasts for a very short duration. It is called a glitch, and it may cause metastability in the system, for example if it reaches a flop input close to a clock edge. The OR gate behaves similarly, except that its output briefly goes LOW.
OR Gate and Transitions between 01 to 10 and vice versa
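The two transition paths can be reproduced in simulation by skewing the input changes slightly. This is a minimal sketch, assuming a 1-time-unit gate delay and a 2-unit skew between the two inputs (module and signal names are hypothetical). It flags any rising edge on the AND output during each path.

```verilog
// AND gate driven through both 01 -> 10 transition orders (delays illustrative)
module and_glitch_demo;
  reg  a, b;
  wire y;
  reg  glitch_path1 = 0;  // set if y pulses high on path 01 -> 00 -> 10
  reg  glitch_path2 = 0;  // set if y pulses high on path 01 -> 11 -> 10
  reg  path = 1;          // which transition order is being exercised

  and #1 g1 (y, a, b);    // gate-level AND with 1 unit propagation delay

  // Record any rising edge on y as a glitch for the current path
  always @(posedge y) begin
    if (path == 1) glitch_path1 = 1;
    else           glitch_path2 = 1;
  end

  initial begin
    // Path 1: B falls before A rises (01 -> 00 -> 10): output stays 0
    a = 0; b = 1;
    #10 b = 0;            // inputs now 00
    #2  a = 1;            // inputs now 10
    #10 a = 0; b = 1;     // back to 01
    #10 path = 2;
    // Path 2: A rises before B falls (01 -> 11 -> 10): short HIGH pulse on y
    #10 a = 1;            // inputs now 11, y rises one unit later
    #2  b = 0;            // inputs now 10, y falls one unit later
    #10 $display("glitch on path 1: %0d, glitch on path 2: %0d",
                 glitch_path1, glitch_path2);
  end
endmodule
```

It should print "glitch on path 1: 0, glitch on path 2: 1". Swapping and for or (and watching negedge y instead of posedge y) shows the LOW pulse of the OR gate, which appears on the other order, 01 -> 00 -> 10.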
We can sometimes see this behavior in netlist (gate-level) simulation. Where a small glitch can affect the design, designers use the Gray code technique: in Gray code only one bit changes at a time, so there is no intermediate input combination to pass through.
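A binary count can be turned into Gray code with one XOR per bit: each Gray bit is the XOR of a binary bit and the next higher bit. A minimal sketch (the module name bin2gray and parameter N are hypothetical):

```verilog
// Binary to Gray code conversion: gray[i] = bin[i] ^ bin[i+1], gray[N-1] = bin[N-1]
module bin2gray #(parameter N = 4) (
  input  wire [N-1:0] bin,
  output wire [N-1:0] gray
);
  assign gray = bin ^ (bin >> 1);
endmodule
```

Counting 0 to 7 in binary gives 0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100 in Gray code: every step changes exactly one bit, so logic that decodes the count never sees two bits switching at once.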


Tuesday, February 9, 2016

High Fanout Synthesis

We often talk about high fanout nets (HFN). This article discusses the concepts related to high fanout nets: what are they, and how do we handle them in synthesis?

What is a High Fanout Net (HFN):
Generally, high fanout nets are nets that drive many loads. In any design we have clock nets, reset nets, test enable nets, etc. Every sequential element in the design needs a connection to the clock and reset, and also to test enable in the case of a scannable flop. Because these nets feed many sequential elements, they are generally referred to as high fanout nets.

How to handle HFN during Synthesis:
Now the question arises: can we set any net as a high fanout net? The answer is yes. During synthesis, one can set a fanout limit with the set_max_fanout command, and the synthesis tool treats every net whose fanout exceeds this value as an HFN. The delay on these nets is large because they drive many loads; you can see this by running a timing report (report_timing) through such a net after synthesis. So the tool will try to buffer these nets.

Please note that this HFN buffering does not make much sense on the front-end (synthesis) side, since the back-end (physical design) tool will remove the buffering and redo it at its end. So buffering the HFN paths during synthesis is unnecessary work. How do we avoid it?

The general practice is to mark these HFNs as ideal networks using the set_ideal_network command, so that the tool treats them as ideal and does not buffer them at all. This speeds up synthesis and avoids redundant work.

Hope I am clear!! Leave a comment if you have any questions.

Wednesday, February 3, 2016

Front End Information - Design to Simulation

Silicon design means designing a system by combining flops or latches with combo logic such as AND and OR gates. These are the smallest units from which an HDL design is built. Flops and logic gates are made of transistors, and their characteristics depend on the library used, but here we will discuss only design using flops and logic gates. Any logic equation can be reduced to an equivalent logic design using a K-map, but as the number of variables increases it becomes difficult to add logic gates and design manually.
To solve this problem we use a higher-level hardware description language (HDL), which makes design coding easy; the conversion from HDL (RTL) code to a gate-level netlist is done by synthesis tools. Two HDLs are mainly used in industry:
  1. Verilog
  2. VHDL

Both languages have a similar way of coding. Both have synthesizable as well as non-synthesizable constructs. A design used for FPGA prototyping should contain synthesizable constructs only. Here is a list of some frequently used Verilog constructs.
  1. Synthesizable Constructs - These constructs are used in designing the DUT (design under test) and the TB (testbench).
    • IO ports: input, output, inout
    • Passing arguments to modules
    • function and task: combo logic only
    • Data flow (assign): wires and combo logic
    • Loops: for, while, forever
    • Conditional statements: if, else, case
    • Define statements: `ifdef, `else, `endif
    • Arithmetic operators: +, -, *, /, %
    • Logical operators: !, &&, ||
    • Concatenation: { }
    • Bitwise operators: &, |, ^, ~

  2. Non-Synthesizable Constructs - These constructs are used in designing the TB (testbench) only.
    • Time and delays: not supported by synthesis; mostly used in testbench
    • real data type: not supported by synthesis
    • force & release: used to force signals in testbench

Verilog and VHDL are similar to the 'C' language; even the names and functionality of some constructs are similar. But a good designer knows the key difference between them: an HDL is not a sequential language like 'C', and what you code in an HDL is finally converted into combo logic and flops. In an HDL design, registered outputs update at each clock edge. So while designing in an HDL, the designer should also think about the hardware equivalent circuit of the design. This also helps implementation tools such as synthesis and Place&Route tools do their job.

Let's look at an example of a 4-bit counter. The counter increments by one at each clock edge while the 'enable' input is HIGH, and is reset to zero while 'enable' is LOW.

First we understand the requirements before writing the Verilog code for this system. The system has 2 inputs, 'enable' and 'clock', and one 4-bit output, 'b4c_out[3:0]'. 'b4c_out[3]' is the most significant bit and 'b4c_out[0]' the least significant bit.

// 4 bit synchronous counter with enable
module bit4_counter (clock, enable, b4c_out);
  input clock, enable;
  output reg [3:0] b4c_out;

  always @(posedge clock) begin
    if (enable == 1'b1) begin
      b4c_out <= b4c_out + 1'b1;   // count up while enable is HIGH
    end else begin
      b4c_out <= 4'b0;             // synchronous clear while enable is LOW
    end
  end
endmodule

Once the design is complete we can take it through the back-end process, but that takes time, and if the design does not work it is difficult to root-cause the problem just by reviewing the RTL (HDL code). So before jumping to synthesis, we simulate the design to check its functional behavior. To simulate the design, or DUT, we need to apply stimulus to the DUT through its interfaces.
Testbench + DUT
We can write the TB (testbench) in Verilog or in SystemVerilog. Since we are not going to synthesize it, we can use any construct to build the testbench. It has the following components:
  • Interface with DUT
  • Stimulus to DUT
  • Checker and wavedump.
It is very important that the stimulus is correct, because it is only against this stimulus that we test our design. Most simulation tools can dump waveforms, which helps in debugging the exact problem.
The testbench for the 4-bit counter and its waveform dump are:

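// testbench for 4 bit counter
`timescale 1ns/1ps
module testbench;
  reg CLOCK;
  reg ENABLE;
  wire [3:0] B4C_OUT;
  // DUT instance
  bit4_counter bc4_cnt (
    .clock   (CLOCK),
    .enable  (ENABLE),
    .b4c_out (B4C_OUT));
  // 20 ns clock (toggles every 10 ns)
  initial begin
    CLOCK = 1'b0;
    forever #10 CLOCK = ~CLOCK;
  end
  // stimulus: keep the counter cleared for 100 ns, then let it count
  initial begin
    ENABLE = 1'b0;
    #100 ENABLE = 1'b1;
    #5000 $finish;
  end
  // dump waveform (standard VCD; the file name is an assumption)
  initial begin
    $dumpfile("bit4_counter.vcd");
    $dumpvars(0, testbench);
  end
endmodule
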
Waveform dump (viewed in Cadence Simvision waveform viewer)
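The testbench above provides stimulus and a waveform dump but no checker. A minimal self-checking sketch, assuming a reference model written directly from the counter spec: the module name testbench_selfcheck is hypothetical, and a compact copy of bit4_counter is included only so the block compiles on its own.

```verilog
`timescale 1ns/1ps
// Compact copy of the 4-bit counter so this sketch compiles on its own
module bit4_counter (input wire clock, input wire enable, output reg [3:0] b4c_out);
  always @(posedge clock)
    if (enable == 1'b1) b4c_out <= b4c_out + 1'b1;  // count while enable is HIGH
    else                b4c_out <= 4'b0;            // clear while enable is LOW
endmodule

// Self-checking testbench: stimulus + reference model + checker
module testbench_selfcheck;
  reg        clock    = 1'b0;
  reg        enable   = 1'b0;
  reg        started  = 1'b0;  // becomes 1 after the first rising edge
  reg  [3:0] expected = 4'b0;  // reference model of the counter value
  integer    errors   = 0;
  wire [3:0] b4c_out;

  bit4_counter dut (.clock(clock), .enable(enable), .b4c_out(b4c_out));

  always #10 clock = ~clock;   // 20 ns clock period

  // Reference model: recomputes the expected count from the spec
  always @(posedge clock) begin
    started  <= 1'b1;
    expected <= enable ? expected + 1'b1 : 4'b0;
  end

  // Checker: compare DUT output with the model on each falling edge
  always @(negedge clock)
    if (started && b4c_out !== expected) begin
      errors = errors + 1;
      $display("%0t ns: got %b, expected %b", $time, b4c_out, expected);
    end

  initial begin
    #100 enable = 1'b1;        // count for 500 ns (25 edges, wraps past 15)
    #500 enable = 1'b0;        // then clear
    #100 $display("checker finished with %0d error(s)", errors);
    $finish;
  end
endmodule
```

Comparing on the falling edge keeps the checker away from the rising edge, where both the DUT and the model update.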