01signal.com

Timing constraints for multi-cycle paths

This page belongs to a series of pages about timing. The previous pages explained the theory behind timing calculations, showed how to write several timing constraints and discussed the principles of timing closure. This page discusses timing constraints for multi-cycle paths.

Introduction

The first thing to know about multi-cycle paths is that using them is usually a bad idea. Even though this page explains how to use multi-cycle path constraints, the conclusion should be to avoid this technique altogether. There are reasons for using this technique in the ASIC world, but in an FPGA design it's usually better to add a clock instead.

That said, let's see when this kind of timing exception is even relevant.

Consider this example of Verilog code:

   reg foo, bar;
   reg en, pre_en;

   always @(posedge clk)
     begin
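	pre_en <= !pre_en; // Toggles on every clock cycle
	en <= pre_en;      // The same pattern, one clock cycle later

	if (en)            // @en is the clock enable of @foo and @bar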
	  begin
	     foo <= !foo;
	     bar <= foo;
	  end
     end

Note that this example is incomplete: You probably need to add synthesis attributes to @pre_en and @en. Otherwise, unexpected things can happen due to the synthesizer's optimizations. More on that below.

@pre_en changes back and forth between '0' and '1' on every clock cycle. @en does the same, one clock cycle later.

The "if (en)" part means that @en takes the role of a clock enable for everything between "begin" and "end": When @en is low, nothing happens in this part of the Verilog code. In other words, @foo and @bar behave as if the clock edge doesn't exist when @en is low.

In this example, @en is high once in every two clock cycles. Therefore, @foo and @bar behave as if the clock's frequency were half of what it really is. The timing requirements can therefore be eased accordingly: The calculations for tsetup can be done with a clock period that is twice as long.
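To see this behavior in numbers, here is a minimal Python model of the Verilog code above, which steps through the clock cycles one at a time. It's a sketch that assumes that all registers start at '0'. It shows that @foo and @bar never change their values on two consecutive clock cycles, which is why the tsetup calculation can use a doubled clock period.

```python
def simulate_first_example(cycles):
    """Cycle-by-cycle model of the first Verilog example on this page.

    Assumes that all registers start at 0. Returns one (pre_en, en, foo, bar)
    tuple per clock cycle.
    """
    pre_en = en = foo = bar = 0
    history = []
    for _ in range(cycles):
        history.append((pre_en, en, foo, bar))
        # Non-blocking assignments: every next value uses the current values
        next_pre_en = 1 - pre_en
        next_en = pre_en
        if en:
            next_foo, next_bar = 1 - foo, foo
        else:
            next_foo, next_bar = foo, bar
        pre_en, en, foo, bar = next_pre_en, next_en, next_foo, next_bar
    return history


def change_intervals(values):
    """Number of clock cycles between consecutive changes of a signal."""
    changes = [t for t in range(1, len(values)) if values[t] != values[t - 1]]
    return [b - a for a, b in zip(changes, changes[1:])]


history = simulate_first_example(40)
foo_trace = [state[2] for state in history]
bar_trace = [state[3] for state in history]
print(min(change_intervals(foo_trace)), min(change_intervals(bar_trace)))  # 2 2
```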

As for thold, there is no change: The calculation for this timing requirement assumes that the same clock edge reaches both flip-flops. The clock period therefore has no significance, as already discussed in the example of a thold analysis. Accordingly, the illusion of a slower clock makes no difference regarding thold.

When to use a clock enable

There are only two good reasons to use a clock enable:

It's not clear if a clock enable reduces the power consumption of an FPGA. Arguably, an extra clock wastes power. But the clock enable is also a signal with a high fan-out. On average, a clock enable changes its value as often as the clock that it replaces. So in terms of changes of the logic state (which is the main contributor to power consumption), there is no difference.
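The claim about the number of changes can be checked by counting them. The following Python sketch counts the transitions of a clock enable that is high once in N clock cycles, and compares this with the two transitions per period of a clock that has 1/N of the frequency. It only counts changes of the logic state: It doesn't model the fan-out or the routing, so it says nothing about the actual power consumption.

```python
def strobe(n, samples):
    """Clock enable that is high during one clock cycle out of every n."""
    return [1 if t % n == 0 else 0 for t in range(samples)]


def transitions(values):
    """Number of times a signal changes its logic level."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


N = 8
CYCLES = 800  # CYCLES + 1 samples cover CYCLES full clock cycles

enable_transitions = transitions(strobe(N, CYCLES + 1))
# A clock with 1/N of the frequency has one rising and one falling edge
# during every N cycles of the original clock.
divided_clock_transitions = 2 * CYCLES // N
print(enable_transitions, divided_clock_transitions)  # 200 200
```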

Feel free to skip this page unless you have a special reason why you must use multi-cycle paths. Even if you have a clock enable in the design that is technically suitable, it might be better not to use the relevant timing exception. In particular, if the timing constraints are achieved easily without this timing exception, it is not worth the risk of making a mistake.

The timing exception

In relation to the Verilog code above, these are the multi-cycle path constraints for Vivado:

set en_regs [all_fanout -endpoints_only -only_cells -flat [get_nets en]]
set_multicycle_path -setup -from $en_regs -to $en_regs 2
set_multicycle_path -hold -from $en_regs -to $en_regs 1

The same for Quartus:

set en_regs [get_fanouts en]
set_multicycle_path -setup -from $en_regs -to $en_regs 2
set_multicycle_path -hold -from $en_regs -to $en_regs 1

The difference between Vivado and Quartus is only in the first line: The command "all_fanout" is used with Vivado, and "get_fanouts" is used with Quartus. The constraints are similar on other tools that work with SDC.

The first line finds all synchronous elements (cells) that are at the end of any path that starts at @en. This list of cell objects is stored in $en_regs. The next two lines change the timing requirements for paths that both start and end at cells that are in this list.

This requires quite a bit of explanation. Why did I write the definition of $en_regs this way? Why are there two set_multicycle_path commands? Why does it say "-hold" on the second command, even though I said that the timing requirements for thold are unaffected by multi-cycle paths?

I shall start with the set_multicycle_path commands, because that's the easier part to explain.

The set_multicycle_path commands

As shown above, the commands are:

set_multicycle_path -setup -from $en_regs -to $en_regs 2
set_multicycle_path -hold -from $en_regs -to $en_regs 1

To generalize the example above somewhat, consider a clock enable that is active once in every eight clock cycles. The Verilog code would then be:

   reg en;
   reg [2:0] pre_en;

   always @(posedge clk)
     begin
	pre_en <= pre_en + 1;  // 3-bit counter: wraps around every 8 clock cycles
	en <= (pre_en == 0);   // High during one clock cycle out of eight
     end

Note that @en behaves like a strobe, and is active during only one clock cycle each time. It is not the MSb of a clock divider.
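This Python sketch models the Verilog code above cycle by cycle, assuming that both registers start at '0'. It confirms that @en is high during exactly one clock cycle out of eight.

```python
def simulate_strobe(cycles, width=3):
    """Model of: pre_en <= pre_en + 1; en <= (pre_en == 0);

    pre_en is a width-bit counter. Both registers are assumed to start at 0.
    Returns the value of @en on each clock cycle.
    """
    pre_en, en = 0, 0
    en_trace = []
    for _ in range(cycles):
        en_trace.append(en)
        # Both right-hand sides use the current value of pre_en
        pre_en, en = (pre_en + 1) % (1 << width), int(pre_en == 0)
    return en_trace


en_trace = simulate_strobe(64)
high_cycles = [t for t, v in enumerate(en_trace) if v]
print(high_cycles[:4])  # [1, 9, 17, 25]
```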

The multi-cycle constraints for this possibility would be:

set_multicycle_path -setup -from $en_regs -to $en_regs 8
set_multicycle_path -hold -from $en_regs -to $en_regs 7

Given these two examples, it's clear that the number in the first set_multicycle_path command is simply the division ratio of the clock enable.

As for the second command, it's the same division ratio, minus one. So it's always N and N-1.

There isn't much to explain regarding the first command: If the clock enable is active in one clock cycle out of N, the allowed delay is multiplied by N. This is relevant for the timing requirement of tsetup.

But why is there a second command? Why is there a need to say anything about thold? The answer is that the first command also changes the requirement for the minimal delay. In other words, the calculation for thold is also affected by the command with the "-setup" option. Why? There is probably no good explanation.

The second command rectifies this: It changes the requirement for the minimal delay back to its original value. So after the second command, the calculation for thold is made as it was before.
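The arithmetic behind N and N-1 can be shown with a small Python model. It assumes the usual SDC convention for paths within a single clock: "-setup N" moves the setup check to the clock edge that is N periods after the launch edge, the hold check is made by default against the clock edge that is one period before the setup check's edge, and "-hold M" moves the hold check M periods earlier. The delays (in ns) are hypothetical, and clock skew is ignored.

```python
def check_edges(setup_mult=1, hold_mult=0):
    """Clock edges (in periods after the launch edge) used for the checks.

    setup_mult and hold_mult are the numbers given to
    set_multicycle_path -setup and -hold (1 and 0 if there is no exception).
    """
    setup_edge = setup_mult                 # edge used for the tsetup check
    hold_edge = setup_edge - 1 - hold_mult  # one period earlier, minus -hold
    return setup_edge, hold_edge


def allowed_delay(period, t_setup, t_hold, setup_mult=1, hold_mult=0):
    """Range for the delay from the launching clock edge until the data
    arrives at the destination, ignoring clock skew and uncertainty."""
    setup_edge, hold_edge = check_edges(setup_mult, hold_mult)
    return hold_edge * period + t_hold, setup_edge * period - t_setup


# Hypothetical numbers in ns: 4 ns clock period, tsetup = 0.5, thold = 0.25
print(allowed_delay(4.0, 0.5, 0.25))                             # (0.25, 3.5)
print(allowed_delay(4.0, 0.5, 0.25, setup_mult=8))               # (28.25, 31.5)
print(allowed_delay(4.0, 0.5, 0.25, setup_mult=8, hold_mult=7))  # (0.25, 31.5)
```

The second line shows the problem: With only the "-setup 8" command, the minimal delay becomes 28.25 ns, which is clearly not the intention. The "-hold 7" command brings the hold check back to the launch edge.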

There are lengthy explanations in the documentation on why N-1 is used in the second command. But truth be told, there is nothing interesting in that extra information. The behavior of set_multicycle_path with regard to thold is weird, and understanding why the number should be N-1 doesn't make it less weird.

But set_multicycle_path was the easy part. Now comes the real difficulty: Generating the correct list of cell objects for $en_regs.

Selecting the registers

In order to select which registers should be listed in $en_regs, it's necessary to understand what makes a path eligible as a multi-cycle path. So this is the rule: It is allowed to ease the timing requirement of a path only if the clock enable controls both sides. This means that when the clock enable is inactive, it is guaranteed that neither of the two sequential elements changes its value after the clock edge.

Think about it in terms of clock domains: All sequential elements that are controlled by the clock enable belong to an imaginary clock domain. The clock inside this imaginary clock domain has a lower frequency, so the timing requirements inside this clock domain can be adjusted.

But if one of the path's sides doesn't belong to this imaginary clock domain, this is an imaginary clock domain crossing between two related clocks. There is no need to do anything special about such a path, because it is already taken care of by the existing timing constraints. But it's incorrect to apply a multi-cycle exception to a path of this sort.

The timing constraints that are shown above reflect this idea: All sequential elements that are controlled by @en are listed in $en_regs as cell objects. Then, the two set_multicycle_path commands are applied to paths that both begin and end at sequential elements that belong to this list.
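The rule can be expressed as a short Python function. This is only a sketch with hypothetical register names: The multiplier is N only when both ends of the path are in $en_regs.

```python
def setup_multiplier(path, en_regs, n):
    """Multiplier applied by the two set_multicycle_path commands: paths that
    both start and end in en_regs get N clock periods for tsetup, others 1."""
    start, end = path
    return n if start in en_regs and end in en_regs else 1


# Hypothetical names: foo and bar are controlled by @en (N = 2), while
# "other" is a register outside of the imaginary clock domain.
en_regs = {"foo", "bar"}
for path in [("foo", "foo"), ("foo", "bar"), ("en", "foo"), ("bar", "other")]:
    print(path, setup_multiplier(path, en_regs, 2))
```

Only the paths from @foo to itself and from @foo to @bar get the multiplier of 2. The path from @en to @foo and the path from @bar to @other keep the regular requirement.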

The most difficult part about multi-cycle paths is to ensure that this list of sequential elements is correct: This list should contain all sequential elements that are controlled by the clock enable. But no other sequential elements should be on this list.

If a sequential element is missing from this list, the timing enforcement for the related paths will be stricter than necessary. This isn't a disaster, but it makes the timing exception less efficient.

But if a sequential element that shouldn't be on the list is added to it by mistake, there can be serious consequences: This will result in paths for which the timing requirements aren't strict enough. In other words, the tools no longer guarantee that these sequential elements' timing requirements are met. And when timing requirements are not met, weird things can happen.

I've chosen to use the "all_fanout" command (or "get_fanouts") for creating this list of sequential elements. This is not always guaranteed to work correctly, which is what I'm going to discuss next. After that, I'll discuss other options for creating this list. These other options are relevant in particular when the FPGA tools don't support "all_fanout" or similar commands.

Possible problems with all_fanout and get_fanouts

The most likely mistake with a multi-cycle constraint is that the clock enable itself (i.e. @en) is included in the list (i.e. $en_regs). If this happens, the multi-cycle exception is applied to all paths from @en itself to the sequential elements that it controls. This effectively means that the timing requirements of these paths are not guaranteed. This can have a visible effect, because the clock enable is often a signal with a high fan-out.

In the example from above, this is avoided with @pre_en. You may have asked yourself why @en wasn't defined just like this:

always @(posedge clk)
  en <= !en; // Wrong!

Had @en been defined this way, there would have been a path from @en to itself. As a result, @en would have been included in $en_regs.
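The following Python sketch is a rough model of how a list like $en_regs is obtained from the fan-out of @en: It follows paths through combinatorial cells (LUTs) and stops at registers. The netlists and cell names are hypothetical and simplified, and this is not how the tools actually implement all_fanout. With "en <= !en", @en appears in its own fan-out; with @pre_en, it doesn't.

```python
from collections import deque


def fanout_endpoints(drives, registers, start):
    """Registers at the end of any path that starts at `start`.

    drives maps each cell to the cells that its output is connected to.
    The search continues through combinatorial cells (LUTs) and stops at
    registers, which are path endpoints.
    """
    endpoints, seen = set(), set()
    queue = deque(drives.get(start, []))
    while queue:
        cell = queue.popleft()
        if cell in seen:
            continue
        seen.add(cell)
        if cell in registers:
            endpoints.add(cell)
        else:
            queue.extend(drives.get(cell, []))
    return endpoints


# "en <= !en": @en drives an inverter LUT that feeds @en itself
wrong = {"en": ["lut_not_en", "foo", "bar"], "lut_not_en": ["en"],
         "foo": ["lut_not_foo", "bar"], "lut_not_foo": ["foo"]}
# With @pre_en: the inverter loop is around @pre_en, not @en
right = {"pre_en": ["lut_not_pre_en", "en"], "lut_not_pre_en": ["pre_en"],
         "en": ["foo", "bar"],
         "foo": ["lut_not_foo", "bar"], "lut_not_foo": ["foo"]}
registers = {"pre_en", "en", "foo", "bar"}

print(sorted(fanout_endpoints(wrong, registers, "en")))  # ['bar', 'en', 'foo']
print(sorted(fanout_endpoints(right, registers, "en")))  # ['bar', 'foo']
```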

So @pre_en solves this problem. But it's important to make sure that the synthesizer doesn't eliminate this register for the sake of optimization. For example, Quartus' synthesizer detects that the only use of @pre_en is to give @en a value (in the Verilog code at the top of this page). Hence the synthesizer removes @pre_en, and continues as if it said "en <= !en". As a result, @en is included in $en_regs. A possible solution is to declare @pre_en as follows:

reg pre_en /* synthesis preserve */;

This simple example demonstrates how an unexpected optimization by the synthesizer can have a disastrous result. Even though the solution is simple, it's easy to overlook the need to prevent this optimization.

Another possible mishap with the clock enable is related to the fact that this signal often has a high fan-out. The tools may therefore replicate the register automatically, so that each replica has a fan-out that is lower than a certain limit. But how will this affect $en_regs? The criterion for inclusion in this list was based upon a specific net. The sequential elements that are controlled by replicas of @en are hence not included.

The results of a situation like this are not disastrous, however: As already mentioned, this only means that the timing enforcement for some paths will be stricter than necessary. The design's reliability is not impacted.

Replication of registers was already discussed in the context of high fan-outs. As mentioned there, it's better to replicate @en manually than to wait for the synthesizer to do it. In order to avoid surprises, always add a synthesis attribute that disallows the replication of this register. If a high fan-out causes problems with timing closure later, solve them with a manual replication. It will be easier to understand the source of the problem this way. If the synthesizer suddenly replicates @en because the project has grown, it will not be as easy to realize why the timing constraints were not achieved.

Either way, the command that defines $en_regs must be updated, so that the replicas of @en are included.

Speaking of which, note that the definition of $en_regs relies on a net's name. As already discussed, this means that $en_regs becomes an empty list if the synthesizer just changes the name of the net to something different from "en". Consequently, the multi-cycle path constraints become completely useless. This possibility isn't a disaster either: The design remains reliable, but it's more difficult to achieve the timing constraints.

Another possible problem is that @en must not be used for anything other than a clock enable, because of the way $en_regs is defined. For example, consider this Verilog code:

reg [7:0] counter;

always @(posedge clk)
  if (en)
    counter <= counter + 1;

In this example, @en is clearly used as a clock enable. It's therefore OK that all paths that are related to @counter are multi-cycle paths. But what about this?

reg [7:0] counter;

always @(posedge clk)
  if (en)
    counter <= counter + 1;
  else
    counter <= counter - 1;

Here, @en is used just like any other register. The value of @counter changes on every clock cycle. So @counter should definitely not be a candidate for a multi-cycle path. And yet, all flip-flops of @counter are included in $en_regs: There are paths from @en to all of these flip-flops.
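A cycle-by-cycle Python sketch shows the difference between the two examples above. It assumes that @en is high once in every two clock cycles and that @counter starts at zero.

```python
def simulate_counter(en_trace, up_down):
    """8-bit @counter from the examples above; returns its value per cycle.

    up_down=False: if (en) counter <= counter + 1;
    up_down=True:  same, plus: else counter <= counter - 1;
    """
    counter, values = 0, []
    for en in en_trace:
        values.append(counter)
        if en:
            counter = (counter + 1) % 256
        elif up_down:
            counter = (counter - 1) % 256
    return values


def min_cycles_between_changes(values):
    """Shortest number of clock cycles between two changes of a signal."""
    changes = [t for t in range(1, len(values)) if values[t] != values[t - 1]]
    return min(b - a for a, b in zip(changes, changes[1:]))


en_trace = [t % 2 for t in range(16)]  # @en is high once in every two cycles
print(min_cycles_between_changes(simulate_counter(en_trace, up_down=False)))  # 2
print(min_cycles_between_changes(simulate_counter(en_trace, up_down=True)))   # 1
```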

This is relatively easy to solve, by creating a replica of @en:

reg [7:0] counter;
reg non_ce_en; // Replica of @en, for use as a regular signal

always @(posedge clk)
  non_ce_en <= pre_en; // Same value as @en on every clock cycle

always @(posedge clk)
  if (non_ce_en)
    counter <= counter + 1;
  else
    counter <= counter - 1;

Note that a synthesis attribute is required to prevent the synthesizer from merging @en and @non_ce_en into one register.

In conclusion, the definition of $en_regs that is based upon all paths that start at @en is simple and concise. But this definition is also a minefield. So let's look at a few alternatives.

Alternative ways for creating $en_regs

The safest way to create a list of sequential elements for a multi-cycle path timing exception is to rely on the design hierarchy: All logic that is controlled by the clock enable should be in a separate module (and possibly in sub-modules). This allows creating $en_regs by finding cells based upon the full name of the cell object. For example, with Vivado:

set all_sync [all_fanout -endpoints_only -only_cells -flat \
  [get_nets -of_objects [get_clocks clk]]]
set en_regs [filter $all_sync {name =~ module_ins/multicycle_ins/* }]

The first command finds all logic elements that are connected to @clk except for the clock buffer itself (this clock is represented by the clock object with the name "clk"). The result is stored in $all_sync. This is one possible way to make a list that includes all synchronous elements that may be relevant. The second command creates a list of all logic elements in $all_sync that are inside the said separate module.
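The filtering step can be illustrated in Python with hypothetical cell names. This sketch assumes that the "=~" operator does glob-style matching in which "*" also matches the "/" hierarchy separator, like Python's fnmatchcase() does. The exact matching rules should be checked in the tool's documentation.

```python
from fnmatch import fnmatchcase

# Hypothetical cell names, as they might appear in $all_sync
all_sync = [
    "module_ins/multicycle_ins/foo_reg",
    "module_ins/multicycle_ins/sub_ins/bar_reg",
    "module_ins/en_reg",          # @en is outside the separate module
    "other_ins/counter_reg[0]",
]

# Counterpart of: filter $all_sync {name =~ module_ins/multicycle_ins/* }
en_regs = [name for name in all_sync
           if fnmatchcase(name, "module_ins/multicycle_ins/*")]
print(en_regs)
```

Registers in sub-modules of the separate module are included, and @en itself is not, because it's outside this module.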

Note that this method relies on the name of the clock object and the names of the instantiations. These are not expected to change. With this method, it doesn't matter if the clock enable is replicated or if its name is changed by the synthesizer.

Another advantage with a separate module is that it's easier to work with the Verilog code: There is a smaller chance for confusion between the sequential elements that are controlled by the clock enable and those that aren't.

However, it's not always natural to separate the logic that depends on the clock enable, and put it in a separate module. Besides, if the Verilog code is already written and is known to work correctly, it may not be a good idea to make changes to it.

I shall also mention another alternative, which may be suitable in some situations: A naming convention for all registers. For example, it's possible to give all registers that are controlled by the clock enable a name that begins with "MC_". This choice makes the command for creating $en_regs simple: It's just a matter of searching for cell objects according to their names. Other logic elements (e.g. block RAMs) can be included as well by choosing their instantiation's name accordingly. Some people will say that this method makes the Verilog code ugly, and some will say that it makes it easier to work with. No method is perfect.

Why not use -of_objects

It might seem appealing to define $en_regs according to a simple criterion: Find the net that has the name "en", and add all registers which are connected to this net. For example, this can be written for Vivado as:

set en_regs [get_cells -of_objects [get_nets en]]

There are several reasons why this is wrong. The first reason is that this includes @en itself. Hence the multi-cycle constraints are applied to all paths from the clock enable itself to the sequential elements that it controls. As mentioned above, this is a serious mistake.

The second reason is that some sequential elements may be overlooked. According to the command above, the criterion for inclusion is that the cell should be connected to a specific net (@en). This works when this net is connected directly to the flip-flop's CE input. But often the synthesizer chooses to use a combinatorial function that is based upon @en instead.

For example, the synthesizer can choose to implement @foo as if the Verilog code was like this:

foo <= foo ^ en;

This is functionally equivalent to the original expression:

if (en)
  foo <= !foo;

An optimization of this sort is legitimate and should be expected: The synthesizer must use a LUT anyhow in order to implement the NOT gate. So why not use this LUT to obtain the next value of the flip-flop directly? Why add another wire to the flip-flop's CE input?
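The equivalence of these two expressions can be verified exhaustively, because there are only four combinations of @foo and @en. A short Python check:

```python
from itertools import product


def next_foo_with_enable(foo, en):
    """if (en) foo <= !foo;  (@foo keeps its value when @en is low)"""
    return 1 - foo if en else foo


def next_foo_with_xor(foo, en):
    """foo <= foo ^ en;"""
    return foo ^ en


equivalent = all(next_foo_with_enable(foo, en) == next_foo_with_xor(foo, en)
                 for foo, en in product((0, 1), repeat=2))
print(equivalent)  # True
```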

The side effect of this optimization is that the flip-flop itself is not connected directly to @en. It will hence not be included in $en_regs. The implications of a situation like this have already been discussed.
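Both problems with -of_objects can be illustrated with a Python sketch of the connections to the net "en". The cell names are hypothetical. The list of cells plays the role of what get_cells -of_objects returns for this net: The driver (@en itself) and the loads, which include a LUT instead of @foo's flip-flop.

```python
# Hypothetical connections to the net "en" after synthesis: the driver first,
# then the loads. @foo was implemented as "foo <= foo ^ en", so the net reaches
# a LUT in front of foo_reg, not foo_reg itself. @bar uses @en as its CE input.
cells_on_net_en = ["en_reg", "lut_foo_xor_en", "bar_reg"]
registers = {"en_reg", "foo_reg", "bar_reg"}
controlled_by_en = {"foo_reg", "bar_reg"}  # what $en_regs should contain

# All cells connected to the net, as in "get_cells -of_objects [get_nets en]"
en_regs = set(cells_on_net_en)

wrongly_included = (en_regs - controlled_by_en) & registers
missing = controlled_by_en - en_regs
print(sorted(wrongly_included), sorted(missing))  # ['en_reg'] ['foo_reg']
```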

It's often possible to tell the synthesizer to use @en only as the synchronous elements' clock enable input. For example, some synthesizers support a synthesis attribute that is called "direct_enable" or something similar. Note that when this feature is used, the synthesizer's freedom to optimize the logic is reduced. So the design's performance can be negatively impacted in order to solve a technical issue with the tools.

On top of everything, if @en is replicated or renamed, the same problems that have been mentioned above occur.

So for all these reasons, it's a bad choice to use the direct connection to a net as a criterion.

Interaction with a reset

Suppose that we add a synchronous reset to the Verilog example from above:

   always @(posedge clk)
     begin
	pre_en <= !pre_en;
	en <= pre_en;
     end

   always @(posedge clk)
     if (reset)
       begin
	  foo <= 0;
	  bar <= 0;
       end
     else if (en)
       begin
	  foo <= !foo;
	  bar <= foo;
       end

Recall that the idea behind a multi-cycle timing exception was that all synchronous elements that are controlled by the clock enable behave as if they were part of an imaginary clock domain. The imaginary clock inside this clock domain has half the frequency of @clk. Hence all of these registers must ignore @clk when @en is low. This is not true in this last example of Verilog code: @reset has an effect regardless of @en.

For example, consider what happens if @reset is defined like this:

assign reset = foo;

This is a legitimate synchronous reset, even though it probably has no practical use. But this definition shows the problem with a multi-cycle exception: When @en is high, @foo becomes high on the clock cycle after that. But that makes @reset high as well. So on the next clock cycle, @foo becomes low again. With @en high once in every two clock cycles, @foo changes its value on every clock cycle. So the multi-cycle exception on the path from @foo to itself results in a timing requirement that is not tight enough for this path.
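This can be confirmed with a Python model of @foo alone, assuming that @en is high once in every two clock cycles (as in the first example) and that all registers start at '0'. The gated variant corresponds to the fix that follows, where @en also controls the reset.

```python
def foo_trace_with_reset(cycles, gated):
    """Model of @foo with "assign reset = foo;" (only @foo is modeled).

    gated=False: if (reset) foo <= 0; else if (en) foo <= !foo;
    gated=True:  if (en && reset) foo <= 0; else if (en) foo <= !foo;
    All registers are assumed to start at 0.
    """
    pre_en = en = foo = 0
    trace = []
    for _ in range(cycles):
        trace.append(foo)
        reset = foo
        if reset and (en or not gated):
            next_foo = 0
        elif en:
            next_foo = 1 - foo
        else:
            next_foo = foo
        pre_en, en, foo = 1 - pre_en, pre_en, next_foo
    return trace


def min_cycles_between_changes(values):
    """Shortest number of clock cycles between two changes of a signal."""
    changes = [t for t in range(1, len(values)) if values[t] != values[t - 1]]
    return min(b - a for a, b in zip(changes, changes[1:]))


print(min_cycles_between_changes(foo_trace_with_reset(20, gated=False)))  # 1
print(min_cycles_between_changes(foo_trace_with_reset(20, gated=True)))   # 2
```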

This is easily solved by making @en control the synchronous reset as well:

   always @(posedge clk)
     if (en && reset)
       begin
	  foo <= 0;
	  bar <= 0;
       end
     else if (en)
       begin
	  foo <= !foo;
	  bar <= foo;
       end

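This is correct. However, @reset now has an effect only if it is active during a clock cycle in which @en is high. A simple solution is to hold @reset high for several clock cycles (at least as many as the clock enable's division ratio).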
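Why several clock cycles? With a clock enable that is active once in N clock cycles, any N consecutive clock cycles include exactly one cycle with @en high. The following Python sketch checks every phase of the clock enable and every start time of the reset, with N = 8 as in the earlier example. It assumes a synchronous reset that is sampled on the same clock edges as @en.

```python
def strobe(n, cycles, phase):
    """Clock enable that is high on cycles phase, phase + n, phase + 2n, ..."""
    return [1 if (t - phase) % n == 0 else 0 for t in range(cycles)]


def reset_is_seen(en_trace, start, length):
    """True if a reset that is high on cycles start .. start + length - 1
    coincides with @en on at least one cycle."""
    return any(en_trace[start:start + length])


N = 8
always_seen = {
    length: all(reset_is_seen(strobe(N, 64, phase), start, length)
                for phase in range(N) for start in range(64 - length))
    for length in (N - 1, N)
}
print(always_seen)  # {7: False, 8: True}
```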

The same principles apply to an asynchronous reset. Everything that is written on the page about asynchronous resets is relevant here too, but it's even more complicated with multi-cycle path constraints. The easiest solution is probably to use a synchronizer, as suggested on a different page.

@en and asynchronous resets

One of the benefits of @pre_en is that it ensures that all replicas of @en have the same logic level all the time. This may sound obvious, but it's not guaranteed if an asynchronous reset is used incorrectly. For example, suppose that the original Verilog code was:

reg en;

always @(posedge clk or posedge reset)
  if (reset)
    en <= 0;
  else
    en <= !en; // This is not recommended!

If @en is replicated, the result can be equivalent to this:

reg en, en_1, en_2;

always @(posedge clk or posedge reset)
  if (reset)
    begin
      en <= 0;
      en_1 <= 0;
      en_2 <= 0;
    end
  else
    begin 
      en <= !en;
      en_1 <= !en_1;
      en_2 <= !en_2;
    end

Note that the next value of @en_1 depends on its own value, not the value of @en. This is a realistic outcome of a replication of a register.

What happens if the asynchronous reset is deactivated in an unsafe way? It's possible that some replicas of @en react to the first clock edge after the reset, while other replicas ignore this clock edge. As a result, the replicas will not have the same logic state until the next reset.

@pre_en solves this, because all replicas copy their next value from the same source. This ensures proper operation in the long run, even if there is a rough start.
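The difference can be demonstrated with a Python sketch in which one replica (out of three) misses the first clock edge after the reset. The replicas that toggle their own value never reach the same state, while those that copy @pre_en agree from the second clock cycle on. The initial value of @pre_en (1 here) is an assumption that makes the rough start visible.

```python
def first_agreement(history):
    """Index of the first cycle on which all replicas hold the same value."""
    return next((i for i, regs in enumerate(history) if len(set(regs)) == 1), None)


def self_toggling(missed, cycles):
    """Replicas of "en <= !en": each one toggles its own value.
    missed[i] is True if replica i ignores the first clock edge after reset."""
    regs, history = [0] * len(missed), []
    for cycle in range(cycles):
        regs = [r if (cycle == 0 and m) else 1 - r for r, m in zip(regs, missed)]
        history.append(tuple(regs))
    return history


def copying_pre_en(missed, cycles, pre_en=1):
    """Replicas of "en <= pre_en": all copy the same source register.
    pre_en starts at 1 here (an assumption) so that the rough start shows."""
    regs, history = [0] * len(missed), []
    for cycle in range(cycles):
        regs = [r if (cycle == 0 and m) else pre_en for r, m in zip(regs, missed)]
        pre_en = 1 - pre_en
        history.append(tuple(regs))
    return history


missed = [False, True, False]   # the second replica misses the first edge
print(first_agreement(self_toggling(missed, 16)))   # None: never the same
print(first_agreement(copying_pre_en(missed, 16)))  # 1
```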

Summary

It's easy to use a clock enable. It's easy to use set_multicycle_path as a command for a timing exception. But to make this work reliably isn't easy at all. There are many things that can go wrong, and sometimes the reason is that the synthesizer changes its behavior in response to a growing logic design.

The result of these unexpected issues could be that the multi-cycle path doesn't fulfill its purpose: If the eased timing requirements aren't applied to some paths, the benefit of this method is questionable. Even worse, a mistake can lead to an unreliable logic design, if the multi-cycle path exception is applied to paths that should not have been affected.

It is therefore crucial to read the timing reports for the relevant paths in order to ensure that nothing unexpected has occurred. Unfortunately, this doesn't prevent surprises in the future: The synthesizer's behavior is difficult to predict as the design evolves.

So if possible, it's much better to generate an additional clock from the same PLL instead of using a clock enable. The clock domain crossings with this new clock are reliable, because the two clocks are related clocks. The timing constraints for this new clock are generated automatically by the tools. There is no risk of surprises this way.

So if you're reading this because you want to add a clock enable and a multi-cycle exception to your design, I hope this page has given you something to think about.


This concludes the part about timing constraints for paths that are inside the FPGA. But what about I/O? That's what the next page starts discussing.

Copyright © 2021-2024. All rights reserved. (b4b9813f)