The objective of VLSI (Very Large Scale Integration) physical design for ASICs (Application-Specific Integrated Circuits) is to transform a logical design description (RTL - Register Transfer Level) into a physical layout that can be fabricated as an integrated circuit. This involves translating the high-level functional representation of the circuit into a physical implementation that meets design constraints, performance targets, and manufacturability requirements.
- Architectural Design
- RTL Design / Behavioral Modeling
- Floorplanning
- Placement
- Clock Tree Synthesis
- Routing
RISC-V Toolchain Installation
https://github.com/kunalg123/riscv_workshop_collaterals/blob/master/run.sh
-
Download the run.sh
-
Open terminal
-
cd Downloads
-
./run.sh
-
If permission denied, then
chmod +x run.sh
./run.sh
-
If error in configure,
cd
cd riscv_toolchain/iverilog
./configure
make
sudo make install
-
To check if riscv-gcc compiler is in the path,
-
gedit ~/.bashrc
-
insert this in the bash file if not present:
export PATH=~/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin:$PATH
-
Yosys with GTKwave Installation
-
cd
-
git clone https://github.com/YosysHQ/yosys.git
-
cd yosys
-
sudo apt install make
-
sudo apt-get update
-
sudo apt-get install build-essential clang bison flex libreadline-dev gawk tcl-dev libffi-dev git graphviz xdot pkg-config python3 libboost-system-dev libboost-python-dev libboost-filesystem-dev zlib1g-dev
-
make config-gcc
-
make
-
sudo make install
-
sudo apt install gtkwave
-
Type
yosys
If the Yosys banner and prompt appear, the installation is successful.
Introduction to RISCV ISA and GNU Compiler Toolchain
- Introduction to Basic Keywords
- Introduction
- From Apps to Hardware
- Detail Description of Course Content
- Labwork for RISCV Toolchain
- C Program
- RISCV GCC Compiler and Disassemble
- Spike Simulation and Debug
- Integer Number Representation
- 64-bit Unsigned Numbers
- 64-bit Signed Numbers
- Labwork For Signed and Unsigned Numbers
Introduction to ABI and Basic Verification Flow
- Application Binary Interface
- Introduction to ABI
- Memory Allocation for Double Words
- Load, Add and Store Instructions
- 32-Registers and their ABI Names
- Labwork using ABI Function Calls
- Algorithm for C Program using ASM
- Review ASM Function Calls
- Simulate C Program using Function Call
- Lab to Run C-Program On RISCV-CPU
Introduction to Verilog RTL design and Synthesis
- Introduction to Open-Source Simulator iVerilog
- Introduction to iVerilog Design Testbench
- Labs using iVerilog and GTKwave
- Introduction to Lab
- iVerilog GTKwave Part-1
- iVerilog GTKwave Part-2
- Introduction to Yosys and Logic synthesis
- Introduction to Yosys
- Introduction to Logic Synthesis
- Labs using Yosys and Sky130 PDKs
- Yosys good mux
Timing Libs, Hierarchical vs Flat Synthesis and Efficient Flop Coding Styles
- Introduction to Timing Dot Libs
- Introduction to Dot Lib
- Hierarchical vs Flat Synthesis
- Hierarchical Synthesis Flat Synthesis
- Various Flop Coding Styles and Optimization
- Why Flops and Flop Coding Styles
- Lab Flop Synthesis Simulations
- Interesting Optimisations
Combinational and Sequential Optimizations
- Introduction to Optimisations
- Combinational Logic Optimisations
- Sequential Logic Optimisations
- Sequential Optimisations for Unused Outputs
GLS, Blocking vs Non-Blocking and Synthesis-Simulation Mismatch
- GLS Synthesis-Simulation Mismatch and Blocking Non-Blocking Statements
- GLS Concepts And Flow Using Iverilog
- Synthesis Simulation Mismatch
- Blocking And Non Blocking Statements In Verilog
- Caveats With Blocking Statements
- Labs on GLS and Synthesis-Simulation Mismatch
- Labs on Synth-Sim Mismatch for Blocking Statement
Introduction
-
ISA (Instruction Set Architecture)
- ISA defines the interface between a computer's hardware and its software, specifically how the processor and its components interact with the software instructions that drive the execution of tasks.
- It encompasses a set of instructions, addressing modes, data types, registers, memory organization, and the mechanisms for executing and managing instructions.
-
RISC-V (Reduced Instruction Set Computing - Five).
- It is an open-source Instruction Set Architecture (ISA) that has gained significant attention and adoption in the world of computer architecture and semiconductor design.
- RISC architectures simplify the instruction set by focusing on a smaller set of simple instructions, most of which are designed to complete in a single clock cycle on a simple implementation. This approach usually leads to faster execution of individual instructions.
data:image/s3,"s3://crabby-images/596f7/596f740f5bcb6b9f7a86312402c46d7f8e85493e" alt="image"
From Apps to Hardware
-
Apps: Application software, often referred to simply as "applications" or "apps," is a type of computer software that is designed to perform specific tasks or functions for end-users.
-
System software: System software refers to a category of computer software that acts as an intermediary between the hardware components of a computer system and the user-facing application software. It provides essential services, manages hardware resources, and enables the execution of application programs. System software plays a critical role in maintaining the overall functionality, security, and performance of a computer system.
-
Operating System: The operating system is a fundamental piece of software that manages hardware resources and provides various services for both users and application programs. It controls tasks such as memory management, process scheduling, file system management, and user interface interaction. Examples of operating systems include Microsoft Windows, macOS, Linux, and Android.
-
Compiler: A compiler is a type of software tool that translates high-level programming code written by developers into assembly-level language.
-
Assembler: An assembler is a software tool that translates assembly language code into machine code or binary code that can be directly executed by a computer's processor.
-
RTL: RTL serves as an abstraction level in the design process that represents the behavior of a digital circuit in terms of registers and the operations that transfer data between them.
-
Hardware: Hardware refers to the physical components of a computer system or any electronic device. It encompasses all the tangible parts that make up a computing or electronic device and enable it to perform various tasks.
Detail Description of Course Content
Pseudo Instructions: Pseudo-instructions are used to simplify programming, improve code readability, and reduce the number of explicit instructions a programmer needs to write. They are especially useful for common programming patterns that involve multiple instructions.
Ex: li, mv.
Base Integer Instructions: The term "base integer instructions" refers to the fundamental set of instructions that form the foundation for performing basic arithmetic, logical, and data movement operations.
Ex: add, sub, and, or, xor, sll.
Multiply Extension Instructions: The RISC-V "M" extension adds integer multiplication and division instructions to the base instruction set.
Ex: mul, mulh, mulhu, mulhsu.
Single and Double Precision Floating Point Extension: The RISC-V architecture includes floating-point extensions that provide support for both single-precision (32-bit) and double-precision (64-bit) floating-point arithmetic operations. These extensions are often referred to as the "F" and "D" extensions, respectively. Floating-point arithmetic is essential for handling real numbers with fractional parts and for performing accurate calculations involving decimal values.
Application Binary Interface: ABI stands for "Application Binary Interface." It is a set of rules and conventions that govern how software components interact with each other at the binary level. The ABI defines various aspects of program execution, including how function calls are made, how parameters are passed and returned, how memory is allocated and managed, and more.
Memory Allocation and Stack Pointer
- Memory allocation refers to the process of assigning and managing memory segments for various data structures, variables, and objects used by a program. It involves allocating memory space from the system's memory pool and releasing it when it is no longer needed to prevent memory leaks.
- The stack pointer is a register that holds the address of the top of the call stack, so the program can keep track of where the current function's data (such as local variables and return addresses) is stored.
C Program
- We wrote a C program for calculating the sum from 1 to n using a text editor, leafpad.
leafpad sum1ton.c
#include <stdio.h>

int main(void) {
    int i, sum = 0, n = 111;
    /* Accumulate 1 + 2 + ... + n. */
    for (i = 1; i <= n; ++i) {
        sum += i;
    }
    printf("Sum of numbers from 1 to %d is %d \n", n, sum);
    return 0;
}
Using the gcc compiler, we compiled the program to get the output.
gcc sum1ton.c
./a.out
- Using the riscv gcc compiler, we compiled the C program.
riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o sum1ton.o sum1ton.c
-
Using ls -ltr sum1ton.o, we can check that the object file is created.
-
To get the disassembled assembly code for the C program, use
riscv64-unknown-elf-objdump -d sum1ton.o | less
-
In order to view the main section, type
/main
-
Here, since we used -O1 optimisation, the number of instructions is 15.
data:image/s3,"s3://crabby-images/6473e/6473e062ca4548f2e31c46700d5131d57bb7a396" alt="image"
- When we use -Ofast optimisation, we can see that the number of instructions has been reduced to 12.
data:image/s3,"s3://crabby-images/ae524/ae52459205422bac4bbf0e844fe76839c7e8e141" alt="image"
- -Onumber : level of optimisation required
- -mabi : specifies the ABI (Application Binary Interface) to be used during code generation according to the requirements
- -march : specifies target architecture
- In order to view the different options available for these fields, use the following commands
go to the directory where riscv64-unknown-elf-gcc is present
- -O1 :
riscv64-unknown-elf-gcc --help=optimizers
- -mabi :
riscv64-unknown-elf-gcc --target-help
- -march :
riscv64-unknown-elf-gcc --target-help
- To quit the pager:
- use
esc :q
to quit
Spike Simulation and Debug
spike pk sum1ton.o
is used to run the program on the Spike simulator and check that the generated instructions give the correct output.
-
spike -d pk sum1ton.o
is used for debugging.
-
The contents of the registers can also be viewed.
- press ENTER : to show the first line and successive ENTER to show successive lines
- reg 0 a2 : to check the content of register a2 on core 0
- q : to quit the debug process
Unsigned Numbers
- Unsigned numbers, also known as non-negative numbers, are numerical values that represent magnitudes without indicating direction or sign.
- Range: [0, (2^n)-1 ]
Signed Numbers
- Signed numbers are numerical values that can represent both positive and negative magnitudes, along with zero.
- Range :
- Positive : [0 , 2^(n-1)-1]
- Negative : [-2^(n-1) , -1]
Labwork
- We wrote a C program that shows the maximum and minimum values of 64bit unsigned numbers.
#include <stdio.h>
#include <math.h>
int main(){
unsigned long long int max = (unsigned long long int) (pow(2,64) -1);
unsigned long long int min = (unsigned long long int) (pow(2,64) *(-1));
printf("lowest number represented by unsigned 64-bit integer is %llu\n",min);
printf("highest number represented by unsigned 64-bit integer is %llu\n",max);
return 0;
}
- We wrote a C program that shows the maximum and minimum values of 64bit signed numbers.
#include <stdio.h>
#include <math.h>
int main(){
long long int max = (long long int) (pow(2,63) -1);
long long int min = (long long int) (pow(2,63) *(-1));
printf("lowest number represented by signed 64-bit integer is %lld\n",min);
printf("highest number represented by signed 64-bit integer is %lld\n",max);
return 0;
}
Introduction to ABI
- An Application Binary Interface (ABI) is a set of rules and conventions that dictate how binary code interacts with other binary code at the level of machine code. In simpler terms, it defines the interface between separately compiled software components, which may be written in different programming languages or built by different compilers, so that they can call each other on the same hardware architecture and operating system.
- The ABI is crucial for enabling interoperability between different software components, such as different libraries, object files, or even entire programs. It allows components compiled independently and potentially on different platforms to work seamlessly together by adhering to a common set of rules for communication and data representation.
Memory Allocation for Double Words
A 64-bit number (or any multi-byte value) can be stored in memory in little-endian or big-endian byte order. The byte order determines how its bytes are arranged across consecutive addresses.
- Little-Endian: In little-endian representation, you store the least significant byte (LSB) at the lowest memory address and the most significant byte (MSB) at the highest memory address.
- Big-Endian: In big-endian representation, you store the most significant byte (MSB) at the lowest memory address and the least significant byte (LSB) at the highest memory address.
In Little-Endian representation, it would be stored as follows in memory:
data:image/s3,"s3://crabby-images/3dec3/3dec38ea8b26f32d3587759eaaa3848fce889072" alt="image"
In Big-Endian representation, it would be stored as follows in memory:
data:image/s3,"s3://crabby-images/65020/650205341eaa9c87ab69dd08dfe103d7a8693e7f" alt="image"
Load, Add and Store instructions
Load, Add, and Store instructions are fundamental operations in computer architecture and assembly programming. They are often used to manipulate data within a computer's memory and registers.
- Load Instructions: Load instructions are used to transfer data from memory to registers. They allow you to fetch data from a specified memory address and place it into a register for further processing.
Example ld x6, 8(x5)
In this example, ld is the load double-word instruction, x6 is the destination register, and 8(x5) is the memory address pointed to by register x5 (base address + offset).
- Store Instructions: Store instructions are used to write data from registers into memory. They store values from registers into memory addresses.
Example sd x8, 8(x9)
In this example, sd is the store double-word instruction, x8 is the source register, and 8(x9) is the memory address pointed to by register x9 (base address + offset).
- Add Instructions: Add instructions are used to perform addition operations on registers. They add the values of two source registers and store the result in a destination register.
Example add x9, x10, x11
In this example, add is the add instruction, x9 is the destination register, and x10 and x11 are the source registers.
32-Registers and their ABI Names
The choice of the number of registers in a processor's architecture, such as the RISC-V RV64 architecture with its 32 general-purpose registers, involves a trade-off between various factors. A processor could provide more registers, but each register operand would then need more bits in the instruction encoding, which could lead to larger instructions that take up more memory and potentially slow down instruction fetch and decode.
ABI Names
ABI names for registers serve as a standardized way to designate the purpose and usage of specific registers within a software ecosystem. These names play a critical role in maintaining compatibility, optimizing code generation, and facilitating communication between different software components.
data:image/s3,"s3://crabby-images/c7e6d/c7e6d99be483078e2a26191cd33aab1c19865d4f" alt="image"
Algorithm for C Program using ASM
- Incorporating assembly language code into a C program can be done using inline assembly or by linking separate assembly files with your C code.
- When you call an assembly function from your C code, the C calling convention is followed, including pushing arguments onto the stack or passing them in registers as required.
- The program executes the assembly function, following the assembly instructions you've provided.
Review ASM Function Calls
- We wrote the C code in one file and the assembly code in a separate file.
- In the assembly file, we declared assembly functions with appropriate signatures that match the calling conventions of the platform.
C Program
custom1to9.c
#include <stdio.h>

/* Implemented in load.s; for x = 0 it returns 0 + 1 + ... + (y - 1). */
extern int load(int x, int y);

int main(void)
{
    int result = 0;
    int count = 9;
    result = load(0x0, count + 1);
    printf("Sum of numbers from 1 to 9 is %d\n", result);
    return 0;
}
Assembly File
load.s
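.section .text
.global load
.type load, @function
load:
    add a4, a0, zero    # a4 = running sum, initialised to a0
    add a2, a0, a1      # a2 = loop limit = a0 + a1
    add a3, a0, zero    # a3 = loop counter, initialised to a0
loop:
    add a4, a3, a4      # sum += counter
    addi a3, a3, 1      # counter += 1
    blt a3, a2, loop    # repeat while counter < limit
    add a0, a4, zero    # return the sum in a0
    ret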
Simulate C Program using Function Call
- Compilation: To compile the C code and the assembly file, use the command
riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o custom1to9.o custom1to9.c load.s
This generates the object file custom1to9.o.
- Execution: To execute the object file run the command
spike pk custom1to9.o
Lab to Run C-Program on RISCV-CPU
-
git clone https://github.com/kunalg123/riscv_workshop_collaterals.git
-
cd riscv_workshop_collaterals
-
ls -ltr
-
cd labs
-
ls -ltr
-
chmod 777 rv32im.sh
-
./rv32im.sh
Introduction to iVerilog Design Testbench
-
Simulator
- It is a tool used for simulating the design. It looks for the changes on the input signals to evaluate the outputs.
- If there is no change in the inputs, the simulator doesn't evaluate the outputs.
- RTL is checked for adherence to the spec by simulating the design.
- The tool used here is iverilog.
-
iVerilog
- It is an open-source Verilog simulator used for testing and simulating digital circuit designs described in the Verilog hardware description language (HDL).
- Both the design and the testbench are fed to the simulator and it produces a vcd (value change dump) file.
- In order to view the vcd file, we use GTKWave, where we can see the waveforms.
-
Design
- It is the actual Verilog code, or set of Verilog files, which has the intended functionality to meet the required specifications.
- Verilog is used to describe the behavior and structure of digital circuits at different levels of abstraction, from high-level system descriptions down to low-level gate-level representations.
-
Testbench
-
A testbench is a specialized Verilog module or program used to verify the functionality and behavior of another Verilog module, circuit, or design. Testbenches are essential for testing and simulating digital designs before they are synthesized or manufactured as physical chips.
-
It is a setup to apply stimulus to the design to check its functionality.
-
Introduction to Lab
-
Make a directory named vsd
mkdir vsd
-
cd vsd
-
git clone https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop.git
-
This creates a folder called sky130RTLDesignAndSynthesisWorkshop in the vsd directory.
-
my_lib : contains all the library files
-
lib : contains sky130 standard cell library used for our synthesis
-
verilog_model : contains all the standard cell verilog modules of the standard cells contained in the .lib
-
verilog_files : contains all the verilog source files and testbench files which are required for labs
-
iVerilog GTKwave Part-1
-
cd vsd/sky130RTLDesignAndSynthesisWorkshop/verilog_files
-
We load the source code along with the testbench code into the iverilog simulator
-
iverilog good_mux.v tb_good_mux.v
-
We can see that an output file a.out has been created.
-
./a.out
-
Running a.out produces a vcd file, which is loaded into the waveform viewer GTKWave.
-
gtkwave tb_good_mux.vcd
iVerilog GTKwave Part-2
-
In order to view the contents in the files,
-
gvim tb_good_mux.v -o good_mux.v
data:image/s3,"s3://crabby-images/9bd6f/9bd6fed20aa68a7fbdce0f39813cf99ece62ba69" alt="image"
Introduction to Yosys
-
Synthesizer
- It is a tool used for converting RTL design code to netlist.
- Here, the synthesizer used is Yosys.
-
Yosys
- It is an open-source framework for Verilog RTL synthesis and formal verification.
- Yosys provides a collection of tools and algorithms that enable designers to transform high-level RTL (Register Transfer Level) descriptions of digital circuits into optimized gate-level representations suitable for physical implementation on hardware.
data:image/s3,"s3://crabby-images/0a9c9/0a9c9cb189e77aa328f453312d1a9e9c1c6353f6" alt="image"
- Design and .lib files are fed to the synthesizer to get a netlist file.
- Netlist is the representation of the design in the form of standard cells in the .lib
-
Commands used to perform different operations:
- read_verilog to read the design
- read_liberty to read the .lib file
- write_verilog to write out the netlist file
-
To verify the synthesis
data:image/s3,"s3://crabby-images/1557e/1557e4d0574a0778f67825d8829ca0f023338a8d" alt="image"
- The netlist along with the testbench is fed to the iverilog simulator.
- The vcd file generated is fed to the GTKWave viewer.
- The output observed must be the same as the output observed during RTL simulation.
- The same RTL testbench can be used, as the primary inputs and primary outputs remain the same between the RTL design and the synthesised netlist.
Introduction to Logic Synthesis
-
Logic Synthesis
- Logic synthesis is a process in digital design that transforms a high-level hardware description of a digital circuit, typically in a hardware description language (HDL) like Verilog or VHDL, into a lower-level representation composed of logic gates and flip-flops.
- The goal of logic synthesis is to optimize the design for various criteria such as performance, area, power consumption, and timing.
-
.lib
- It is a collection of logical modules like AND, OR, NOT etc.
- It has different flavors of the same gate, like 2-input AND gate, 3-input AND gate etc., with different speeds.
-
Why different flavors of gate?
- In order to make a circuit faster, the clock frequency should be high.
- For that, the time period of the clock should be as low as possible.
data:image/s3,"s3://crabby-images/d50ef/d50efaf1297694159dddcfba51c5210a12c702d7" alt="image"
- In a sequential circuit, clock period depends on:
- Clock to Q of flip-flop A.
- Propagation delay of combinational circuit.
- Setup time of flip-flop B.
data:image/s3,"s3://crabby-images/198c2/198c2ba106bee1c91463aed3af5c82665d7e8c19" alt="image"
-
Why do we need fast and slow cells?
- To ensure that there are no HOLD issues at flip-flop B, we require slow cells.
- For a smaller propagation time, we need faster cells.
- The collection forms the .lib
-
Faster Cells vs Slower Cells
- The load in a digital circuit is capacitive.
- The faster the charging or discharging of the capacitance, the lower the cell delay.
- However, for a quick charge/discharge of the capacitor, we need transistors capable of sourcing more current, i.e., we need wide transistors.
- Wider transistors have lesser delay but consume more area and power.
- Narrow transistors have more delay but consume less area and power.
- Faster cells come with a cost of area and power.
-
Selection of the Cells
- We have to guide the synthesizer to choose the flavour of cells that is optimum for the implementation of the logic circuit.
- Overuse of faster cells leads to a circuit that is poor in terms of power and area and may also cause hold time violations.
- Overuse of slower cells leads to sluggish circuits that may not meet the performance needs.
- Hence the guidance is offered to the synthesiser in the form of constraints.
Yosys good_mux
- To invoke yosys
cd
cd vsd/sky130RTLDesignAndSynthesisWorkshop/verilog_files
- Type
yosys
-
To read the library
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
-
To read the design
read_verilog good_mux.v
-
To synthesize the module
synth -top good_mux
- To generate the netlist
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
It gives a report of what cells are used and the number of input and output signals.
-
To see the logic realised
show
The mux is completely realised in the form of sky130 library cells.
-
To write the netlist
-
write_verilog good_mux_netlist.v
-
!gvim good_mux_netlist.v
-
To view a simplified code
write_verilog -noattr good_mux_netlist.v
!gvim good_mux_netlist.v
-
Introduction to Dot Lib
-
To view the contents in the .lib
gvim ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
-
The first line in the file
library ("sky130_fd_sc_hd__tt_025C_1v80")
encodes the process, voltage, and temperature (PVT) corner:
- tt : indicates variations due to process; here it indicates the Typical process.
- 025C : indicates variations due to temperature; here it is the 25 °C corner for the conditions where the silicon will be used.
- 1v80 : indicates variations due to the supply voltage; here it is the 1.80 V corner.
-
-
It also displays the units of various parameters.
-
It gives the features of the cells
-
To enable line number
:se nu
-
To view all the cells
:g//
-
To view any instance
:/instance
-
Since there are 5 inputs, for all the 32 possible combinations, it gives the delay, power and all the other parameters for each cell.
-
The below image shows the power consumption and area comparison.
Hierarchical Synthesis Flat Synthesis
Hierarchical Synthesis
Hierarchical synthesis is an approach in digital design and logic synthesis where complex designs are broken down into smaller, more manageable modules or sub-circuits, and each module is synthesized individually. These synthesized modules are then integrated back into the overall design hierarchy. This approach helps manage the complexity of large designs and allows designers to work on different parts of the design independently.
-
The file we used in this lab is
multiple_modules.v
cd vsd/sky130RTLDesignAndSynthesisWorkshop/verilog_files
gvim multiple_modules.v
-
multiple_modules instantiates sub_module1 and sub_module2
-
Launch
yosys
-
read the library file
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
-
read the verilog file
read_verilog multiple_modules.v
-
synth -top multiple_modules
to set it as top module
data:image/s3,"s3://crabby-images/13134/13134945a3b2a144ec189b2b02dcc5dd3e26ce45" alt="image"
- Here it shows sub_module1 and sub_module2 instead of AND gate and OR gate.
write_verilog -noattr multiple_modules_hier.v
!gvim multiple_modules_hier.v
data:image/s3,"s3://crabby-images/03b87/03b87e91ed51751f456786fc373e846ef69d7b27" alt="image"
data:image/s3,"s3://crabby-images/9664b/9664ba36b790cba71bde6d3ec4da6384d24519b8" alt="image"
Flattened Synthesis
Flattened synthesis is the opposite of hierarchical synthesis. Instead of maintaining the hierarchical structure of the design during synthesis, flattened synthesis combines all modules and sub-modules into a single, flat representation. This means that the entire design is synthesized as a single unit, without preserving the modular organization present in the original high-level description.
- Launch
yosys
- read the library file
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- read the verilog file
read_verilog multiple_modules.v
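- set it as the top module and synthesize
synth -top multiple_modules
- map the logic to the library cells
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- flatten the hierarchy so that a flattened netlist is written out
flatten
- view the result
show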
data:image/s3,"s3://crabby-images/3db4d/3db4df0ef0ec31bd6d3f221ad6e8263a8faf79ce" alt="image"
write_verilog -noattr multiple_modules_flat.v
!gvim multiple_modules_flat.v
data:image/s3,"s3://crabby-images/8b415/8b415160c1ecb73a0852ecd1b19ed6aad888767e" alt="image"
data:image/s3,"s3://crabby-images/fbd79/fbd790ab1059cb66370347775591d0840f5e88d4" alt="image"
Why Flops and Flop Coding Styles
Why do we need a Flop?
- A flip-flop (often abbreviated as "flop") is a fundamental building block in digital circuit design.
- It's a type of sequential logic element that stores binary information (0 or 1) and can change its output based on clock signals and input values.
- In a combinational circuit, the output changes after the propagation delay of the circuit once inputs are changed.
- During the propagation of data, if there are different paths with different propagation delays, then a glitch might occur.
- There will be multiple glitches for multiple combinational circuits.
- Hence, we need flops to store the data from the combinational circuits.
- When a flop is used, the output of combinational circuit is stored in it and it is propagated only at the posedge or negedge of the clock so that the next combinational circuit gets a glitch free input thereby stabilising the output.
- We use control pins like set and reset to initialise the flops.
- They can be synchronous or asynchronous.
D Flip-Flop with Asynchronous Reset
- When the reset is high, the output of the flip-flop is forced to 0, irrespective of the clock signal.
- Else, on the positive edge of the clock, the stored value is updated at the output.
gvim dff_asyncres.v
data:image/s3,"s3://crabby-images/3a627/3a627498727ad37063d46507a560b9c10000e95f" alt="image"
D Flip-Flop with Asynchronous Set
- When the set is high, the output of the flip-flop is forced to 1, irrespective of the clock signal.
- Else, on positive edge of the clock, the stored value is updated at the output.
gvim dff_async_set.v
data:image/s3,"s3://crabby-images/cfde4/cfde4111cbd3077543e2b96be15e64b2ba0c76b2" alt="image"
D Flip-Flop with Synchronous Reset
-
When the reset is high on the positive edge of the clock, the output of the flip-flop is forced to 0.
-
Else, on the positive edge of the clock, the stored value is updated at the output.
gvim dff_syncres.v
data:image/s3,"s3://crabby-images/f101c/f101c1e6eb18c2431c497ead66fafc7608d3d0a9" alt="image"
D Flip-Flop with Asynchronous Reset and Synchronous Reset
- When the asynchronous reset is high, the output is forced to 0.
- When the synchronous reset is high at the positive edge of the clock, the output is forced to 0.
- Else, on the positive edge of the clock, the stored value is updated at the output.
- Here, it is a combination of both synchronous and asynchronous reset DFF.
gvim dff_asyncres_syncres.v
data:image/s3,"s3://crabby-images/92c93/92c93374e31135ff8a76ce7b15f23eeac4ab3692" alt="image"
Lab Flop Synthesis Simulations
D Flip-Flop with Asynchronous Reset
-
Simulation
cd vsd/sky130RTLDesignAndSynthesisWorkshop/verilog_files
iverilog dff_asyncres.v tb_dff_asyncres.v
./a.out
gtkwave tb_dff_asyncres.vcd
-
Synthesis
D Flip-Flop with Asynchronous Set
- Simulation
cd vsd/sky130RTLDesignAndSynthesisWorkshop/verilog_files
iverilog dff_async_set.v tb_dff_async_set.v
./a.out
gtkwave tb_dff_async_set.vcd
- Synthesis
cd vsd/sky130RTLDesignAndSynthesisWorkshop/verilog_files
yosys
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_async_set.v
synth -top dff_async_set
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/f6a74/f6a74b92c58130e9767cc8d683ade3619eb35e70" alt="image"
D Flip-Flop with Synchronous Reset
-
Simulation
cd vsd/sky130RTLDesignAndSynthesisWorkshop/verilog_files
iverilog dff_syncres.v tb_dff_syncres.v
./a.out
gtkwave tb_dff_syncres.vcd
-
Synthesis
cd vsd/sky130RTLDesignAndSynthesisWorkshop/verilog_files
yosys
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_syncres.v
synth -top dff_syncres
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/05f98/05f98a5ac1e9608cdae3b6fedc937f8d49514404" alt="image"
Interesting Optimisations
gvim mult_2.v
data:image/s3,"s3://crabby-images/1be83/1be83d5a476516ec5e19f2b11d6cd9d61381c6c5" alt="image"
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog mult_2.v
synth -top mul2
data:image/s3,"s3://crabby-images/5b69f/5b69ff5b6d665ba9a2a5fb30d67f4608060451fa" alt="image"
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/58ace/58acef72935066c79c317cf22db8915313fd85e6" alt="image"
write_verilog -noattr mul2_netlist.v
!gvim mul2_netlist.v
data:image/s3,"s3://crabby-images/ab1f4/ab1f4447a4ff3773cec0263ae8aad160ada9eee3" alt="image"
-
gvim mult_8.v
-
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
-
read_verilog mult_8.v
-
synth -top mult8
data:image/s3,"s3://crabby-images/2d03b/2d03bd3a1a1d45d18927a75afaf19dce45a11733" alt="image"
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/46991/46991f5142786c972a90943304c7624e4a1f2f28" alt="image"
write_verilog -noattr mult8_netlist.v
!gvim mult8_netlist.v
data:image/s3,"s3://crabby-images/944a3/944a3ef3c86a671dbdf018364740042839ef4e84" alt="image"
Combinational Optimisation
- Combinational logic refers to logic circuits where the outputs depend only on the current inputs and not on any previous states.
- Combinational logic optimisation (not to be confused with combinatorial optimisation, the operations-research field) simplifies this logic without changing the function it computes.
- The goal is to squeeze the logic into the smallest equivalent implementation, so that the final circuit is area and power efficient.
- Techniques for Optimisations:
- Constant propagation is an optimization technique used in compiler design and digital circuit synthesis to improve the efficiency of code and circuit implementations by replacing variables or expressions with their constant values where applicable.
- Boolean logic optimization, also known as logic minimization or Boolean function simplification, is a process in digital design that aims to simplify Boolean expressions or logic circuits by reducing the number of terms, literals, and gates required to implement a given logical function.
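The two techniques above can be illustrated with a small sketch. The module and testbench names here are hypothetical and are not part of the workshop files. A mux with one data input tied to 0 should reduce, after constant propagation and Boolean simplification, to a single AND gate; the testbench checks exhaustively that the two descriptions compute the same function.

```verilog
// Hypothetical modules for illustration; not part of the workshop files.

// Mux with one data input tied to constant 0.
module const_prop_mux (input a, input b, output y);
  assign y = a ? b : 1'b0;   // when a = 0 the output is the constant 0
endmodule

// The form that constant propagation plus Boolean simplification
// is expected to reach: y = a & b.
module const_prop_and (input a, input b, output y);
  assign y = a & b;
endmodule

// Exhaustive check that both descriptions compute the same function.
module tb_const_prop;
  reg a, b;
  wire y_mux, y_and;
  integer i, errors;

  const_prop_mux u_mux (.a(a), .b(b), .y(y_mux));
  const_prop_and u_and (.a(a), .b(b), .y(y_and));

  initial begin
    errors = 0;
    for (i = 0; i < 4; i = i + 1) begin
      {a, b} = i[1:0];        // walk through all four input combinations
      #1;
      if (y_mux !== y_and) errors = errors + 1;
    end
    if (errors == 0) $display("PASS: mux with constant input equals a & b");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

Assuming the block is saved as const_prop_demo.v (a made-up file name), it can be run with iverilog const_prop_demo.v followed by ./a.out.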
Sequential Logic Optimisations
- Sequential logic optimizations involve improving the efficiency, performance, and resource utilization of digital circuits that include memory elements like flip-flops and latches.
- Optimizing sequential logic is crucial in ensuring that digital circuits meet timing requirements, consume minimal power, and occupy the least possible area while maintaining correct functionality.
- Optimisation methods:
- Sequential constant propagation, also known as constant propagation across sequential elements, is an optimization technique used in digital design to identify and propagate constant values through sequential logic elements like flip-flops and registers. This technique aims to replace variable values with their known constant values at various stages of the logic circuit, optimizing the design for better performance and resource utilization.
- State optimization, also known as state minimization or state reduction, is an optimization technique used in digital design to reduce the number of states in finite state machines (FSMs) while preserving the original functionality.
- Sequential logic cloning duplicates a register (flip-flop) so that each copy drives only part of the original fan-out. It is typically used in physically aware synthesis: when the loads of a flop sit far apart on the floorplan, placing a clone near each group of loads shortens the routing and reduces the path delay, provided the path into the flop has enough positive slack. Unlike retiming, it does not move logic across registers or add pipeline stages.
- Retiming is an optimization technique used in digital design to improve the performance of a circuit by repositioning registers (flip-flops) along its paths to balance the timing and reduce the critical path delay. The primary goal of retiming is to achieve a shorter clock period without changing the functionality of the circuit.
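Sequential constant propagation can be sketched with two flops that differ only in their reset value. The module names are hypothetical. In the first, reset and clock both load 1, so the output can only ever be 1 and a synthesis tool is free to replace the flop with a constant. In the second, reset loads 0 and the clock loads 1, so the output changes after reset is released and the flop has to be kept.

```verilog
// Hypothetical modules for illustration; names are not from the workshop files.

// Reset loads 1 and every clock edge loads 1: q never holds any other
// defined value, so the flop can be replaced by a constant.
module seq_const_demo (input clk, input reset, output reg q);
  always @(posedge clk or posedge reset) begin
    if (reset) q <= 1'b1;
    else       q <= 1'b1;
  end
endmodule

// Reset loads 0 but the clock loads 1: q changes after reset is released,
// so the flop must stay in the netlist.
module seq_not_const_demo (input clk, input reset, output reg q);
  always @(posedge clk or posedge reset) begin
    if (reset) q <= 1'b0;
    else       q <= 1'b1;
  end
endmodule

// Prints both outputs so the difference is visible in simulation.
module tb_seq_const;
  reg clk, reset;
  wire q_const, q_not_const;

  seq_const_demo     u_const     (.clk(clk), .reset(reset), .q(q_const));
  seq_not_const_demo u_not_const (.clk(clk), .reset(reset), .q(q_not_const));

  always #5 clk = ~clk;

  initial begin
    $monitor("t=%0t reset=%b q_const=%b q_not_const=%b",
             $time, reset, q_const, q_not_const);
    clk = 0; reset = 0;
    #2  reset = 1;   // assert the asynchronous reset
    #10 reset = 0;   // release it between clock edges
    #20 $finish;
  end
endmodule
```

The dff_const labs later in this section explore the same question on the workshop's own files.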
opt_check
opt_check2
-
gvim opt_check2.v
-
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
-
read_verilog opt_check2.v
-
synth -top opt_check2
-
opt_clean -purge
-
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
-
show
data:image/s3,"s3://crabby-images/513ef/513eff12eddf3f85706d87d13013d244a033bad4" alt="image"
data:image/s3,"s3://crabby-images/1d450/1d45027fc82afbdc9469c187fec4630bad6951f7" alt="image"
opt_check3
gvim opt_check3.v
data:image/s3,"s3://crabby-images/0bbb4/0bbb48c49f8fc579042f1b957ffa344055ca2226" alt="image"
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog opt_check3.v
synth -top opt_check3
opt_clean -purge
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/f124d/f124d0b0661323243e3455b22ff0377a8aaa0dfa" alt="image"
data:image/s3,"s3://crabby-images/441de/441deea0333823c5141d40336c04fd53dbc21081" alt="image"
opt_check4
gvim opt_check4.v
data:image/s3,"s3://crabby-images/8ed7c/8ed7c29fd9c30394de218a739d253bbf57fed6c2" alt="image"
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog opt_check4.v
synth -top opt_check4
opt_clean -purge
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/50de7/50de787499ed9e4c93443e0ce2748a025653661a" alt="image"
data:image/s3,"s3://crabby-images/b2ece/b2eced9a31447f96611ad90185890a26d29411c4" alt="image"
multiple_module_opt
gvim multiple_module_opt.v
data:image/s3,"s3://crabby-images/2b738/2b738d47c504bdc248ce16b871f3fc0d25f3d50e" alt="image"
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog multiple_module_opt.v
synth -top multiple_module_opt
opt_clean -purge
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/b2829/b28292644d70007b666500d506e6b051c7e1accc" alt="image"
data:image/s3,"s3://crabby-images/79a3a/79a3a64e4eeed4d732a251e514b26816a27f45df" alt="image"
dff_const1
gvim dff_const1.v
data:image/s3,"s3://crabby-images/e261c/e261c72192ef599dbc64143a389cb463857d72a7" alt="image"
Simulation
iverilog dff_const1.v tb_dff_const1.v
./a.out
gtkwave tb_dff_const1.vcd
Synthesis
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_const1.v
synth -top dff_const1
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/e4580/e45807987b75fdf76da08a45bc8d04391b2aac6b" alt="image"
data:image/s3,"s3://crabby-images/2785e/2785e48664827d8219653bf600128758e6222265" alt="image"
dff_const2
gvim dff_const2.v
data:image/s3,"s3://crabby-images/d72a5/d72a570314866adebb73c8305107a80c8176ac4d" alt="image"
Simulation
iverilog dff_const2.v tb_dff_const2.v
./a.out
gtkwave tb_dff_const2.vcd
Synthesis
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_const2.v
synth -top dff_const2
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/3fbb9/3fbb999eccb0eeac0c549ed2be2f46797e2ef388" alt="image"
data:image/s3,"s3://crabby-images/1e289/1e289ccdef9fcc7af2d3437602964983815c301c" alt="image"
dff_const3
gvim dff_const3.v
data:image/s3,"s3://crabby-images/1b93e/1b93e12965dcd0fbb189f36cc2ab1eea2a793714" alt="image"
Simulation
iverilog dff_const3.v tb_dff_const3.v
./a.out
gtkwave tb_dff_const3.vcd
Synthesis
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_const3.v
synth -top dff_const3
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/6596b/6596b78e108799384a7615d522a3cf87b1538bb8" alt="image"
data:image/s3,"s3://crabby-images/36351/363515cbab25995dfd3dd7ef8d9e145626c9bfcf" alt="image"
dff_const4
gvim dff_const4.v
data:image/s3,"s3://crabby-images/e5332/e5332dec6630c0b06c168d78b3ed42964c3a9006" alt="image"
Simulation
iverilog dff_const4.v tb_dff_const4.v
./a.out
gtkwave tb_dff_const4.vcd
Synthesis
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_const4.v
synth -top dff_const4
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/863d3/863d30cb4034ad3f1a0433743723c44a8f689e23" alt="image"
data:image/s3,"s3://crabby-images/81725/81725765fc8d1edf2f2da49ee8046c756b8aff94" alt="image"
dff_const5
gvim dff_const5.v
data:image/s3,"s3://crabby-images/0837c/0837cf51d5be10d11c53962a46547810a9813e98" alt="image"
Simulation
iverilog dff_const5.v tb_dff_const5.v
./a.out
gtkwave tb_dff_const5.vcd
Synthesis
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_const5.v
synth -top dff_const5
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/c12cb/c12cb4913d1b696ebb7b4bc9d5de70ca02eeec29" alt="image"
data:image/s3,"s3://crabby-images/edc5b/edc5b60adb9f234b71dad7b5ff250b0f7d3b9a09" alt="image"
counter_opt
gvim counter_opt.v
data:image/s3,"s3://crabby-images/0e678/0e6782c79d40a6e402a7fb7fe429458f1ecea4d5" alt="image"
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog counter_opt.v
synth -top counter_opt
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/64919/64919b54503acad2b2f2a0efc5db96b6244dc43c" alt="image"
data:image/s3,"s3://crabby-images/609f9/609f9c7b321cd37effd0e23493281259c12ec41a" alt="image"
counter_opt2
gvim counter_opt2.v
data:image/s3,"s3://crabby-images/41133/4113315049ca06b24b948f2317b2581845b8475e" alt="image"
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog counter_opt2.v
synth -top counter_opt2
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
data:image/s3,"s3://crabby-images/07756/07756bb7bbe7baa681cd7cf51fdbb42a9ab124a2" alt="image"
data:image/s3,"s3://crabby-images/344c7/344c724d73b9bb6b9b63eae8215fcbd0a0e7c263" alt="image"
GLS Concepts And Flow Using Iverilog
- Gate Level Simulation
- Gate-level simulation is a technique used in digital design and verification to validate the functionality of a digital circuit at the gate-level implementation.
- It involves simulating the circuit using the actual logic gates and flip-flops that make up the design, as opposed to higher-level abstractions like RTL (Register Transfer Level) descriptions.
- This type of simulation is typically performed after the logic synthesis process, where a high-level description of the design is transformed into a netlist of gates and flip-flops.
- GLS is run after synthesis to verify that the netlist is still logically correct. When the gate models are delay-annotated, it can also be used to check that the design meets timing.
data:image/s3,"s3://crabby-images/e30ed/e30edcd1ef5c0e0a26fb72965f8bcaa5ade7d7cd" alt="image"
-
Synthesis-Simulation Mismatch
- A synthesis-simulation mismatch occurs when the behaviour seen in RTL simulation differs from the behaviour of the synthesized netlist seen in gate-level simulation.
- Common causes are an incomplete sensitivity list, misuse of blocking versus non-blocking assignments, and non-standard Verilog coding, all of which the simulator and the synthesis tool can interpret differently.
- This mismatch is a critical concern in digital design because it indicates that the actual hardware implementation might not perform as expected, potentially leading to functional or timing failures in the fabricated chip.
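One common source of such a mismatch is an incomplete sensitivity list. The sketch below uses hypothetical module names (the workshop's own example is bad_mux.v, covered later). The first mux only re-evaluates when sel changes, so in RTL simulation its output is expected to hold a stale value when a data input changes, whereas synthesis infers an ordinary mux either way.

```verilog
// Hypothetical modules for illustration; not the workshop's bad_mux.v.

// Mux whose always block only wakes up when sel changes.
module mux_incomplete_sens (input i0, input i1, input sel, output reg y);
  always @(sel) begin          // i0 and i1 are missing from the list
    if (sel) y = i1;
    else     y = i0;
  end
endmodule

// Same mux with a complete sensitivity list.
module mux_complete_sens (input i0, input i1, input sel, output reg y);
  always @(*) begin
    if (sel) y = i1;
    else     y = i0;
  end
endmodule

// Changes a data input while sel is steady and prints both outputs.
module tb_sens_list;
  reg i0, i1, sel;
  wire y_bad, y_good;

  mux_incomplete_sens u_bad  (.i0(i0), .i1(i1), .sel(sel), .y(y_bad));
  mux_complete_sens   u_good (.i0(i0), .i1(i1), .sel(sel), .y(y_good));

  initial begin
    i0 = 0; i1 = 0; sel = 0;
    #1 sel = 1;                // both blocks evaluate and select i1
    #1 i1  = 1;                // only the complete list reacts to this change
    #1 $display("y_bad=%b y_good=%b", y_bad, y_good);
    $finish;
  end
endmodule
```

Running the same testbench against the synthesized netlist of the first module is what exposes the mismatch, because the netlist behaves like the second module.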
-
Blocking Statements
- Blocking statements are executed sequentially in the order they appear in the code and have an immediate effect on signal assignments.
- Example:
module BlockingExample(input A, input B, input C, output reg Y, output reg Z);
  wire temp;
  // Continuous assignment (this is not a blocking assignment)
  assign temp = A & B;
  always @(posedge C) begin
    // Blocking assignments: executed in order, each takes effect immediately
    Y = temp;
    Z = ~temp;
  end
endmodule
-
Non-Blocking Statements
- Non-blocking assignments are used to model concurrent signal updates, where all assignments are evaluated simultaneously and then scheduled to be updated at the end of the time step.
- Example:
module NonBlockingExample(input clock, input D, input reset, output reg Q);
  always @(posedge clock or posedge reset) begin
    if (reset)
      Q <= 0; // Reset the flip-flop
    else
      Q <= D; // Non-blocking assignment to update Q with D on clock edge
  end
endmodule
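The difference between the two assignment types shows most clearly in a two-stage shift register. This is a minimal sketch with hypothetical module names. With non-blocking assignments the second stage samples the old value of the first stage, giving two flops; with blocking assignments the second stage sees the value just written, so the output follows d after a single clock edge.

```verilog
// Hypothetical modules for illustration; names are not from the workshop files.

// Non-blocking: q samples the old q0, so d needs two clock edges to reach q.
module shift_nonblocking (input clk, input d, output reg q);
  reg q0;
  always @(posedge clk) begin
    q0 <= d;
    q  <= q0;
  end
endmodule

// Blocking: q0 is updated first, so q picks up d on the same clock edge.
module shift_blocking (input clk, input d, output reg q);
  reg q0;
  always @(posedge clk) begin
    q0 = d;
    q  = q0;
  end
endmodule

// Drives two clock edges by hand and prints both outputs after each one.
module tb_shift;
  reg clk, d;
  wire q_nb, q_b;

  shift_nonblocking u_nb (.clk(clk), .d(d), .q(q_nb));
  shift_blocking    u_b  (.clk(clk), .d(d), .q(q_b));

  initial begin
    clk = 0; d = 1;
    #1 clk = 1;                                   // first rising edge
    #1 $display("after edge 1: non-blocking q=%b, blocking q=%b", q_nb, q_b);
    clk = 0;
    #1 clk = 1;                                   // second rising edge
    #1 $display("after edge 2: non-blocking q=%b, blocking q=%b", q_nb, q_b);
    $finish;
  end
endmodule
```

Since q0 starts out unknown, the non-blocking output is expected to still be unknown after the first edge, which is exactly the one-cycle delay the second flop is meant to provide.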
-
Caveats with Blocking Statements
- Blocking statements in hardware description languages like Verilog have their uses, but there are certain caveats and considerations to be aware of when working with them. Here are some important caveats associated with using blocking statements:
- Procedural Execution: Blocking statements are executed sequentially in the order they appear within a procedural block (such as an always block). This can lead to unexpected behavior if the order of execution matters and is not well understood.
- Lack of Parallelism: Blocking statements do not accurately represent the parallel nature of hardware. In hardware, multiple signals can update concurrently, but blocking statements model sequential behavior. As a result, using blocking statements for modeling complex concurrent logic can lead to incorrect simulations.
- Race Conditions: Within a single procedural block, blocking assignments run in a fixed order, so the last assignment to a signal wins. A race arises when different always blocks triggered by the same event read and write the same signal with blocking assignments: the simulator may run those blocks in either order, so the result is unpredictable.
- Limited Representation of Hardware: Hardware systems are inherently concurrent and parallel, but blocking statements do not capture this aspect effectively. Using blocking assignments to model complex combinational or sequential logic can lead to models that are difficult to understand, maintain, and debug.
- Combinatorial Loops: Incorrect use of blocking statements can lead to unintentional combinational logic loops, which can result in simulation or synthesis errors.
- Debugging Challenges: Debugging code with many blocking assignments can be challenging, especially when trying to track down timing-related issues.
- Not Suitable for Flip-Flops: Blocking assignments are not suitable for modeling flip-flop behavior. Non-blocking assignments (<=) are generally preferred for modeling flip-flop updates to ensure accurate representation of concurrent behavior.
- Sequential Logic Misrepresentation: Using blocking assignments to model sequential logic might not capture the intended behavior accurately. Sequential elements like registers and flip-flops are better represented using non-blocking assignments.
- Synthesis Implications: The behavior of blocking assignments might not translate well during synthesis, leading to potential mismatches between simulation and synthesis results.
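The ordering caveat can be sketched as follows. The module names are hypothetical (the workshop's own example is blocking_caveat.v, covered later). The intended function is d = (a | b) & c. In the first module d is computed before x is updated, so in simulation d may use the value of x left over from the previous evaluation, while synthesis still builds plain combinational logic.

```verilog
// Hypothetical modules for illustration; not the workshop's blocking_caveat.v.
// Intended function in both cases: d = (a | b) & c.

// Wrong order: d reads x before x is updated in this evaluation.
module blocking_order_bad (input a, input b, input c, output reg d);
  reg x;
  always @(*) begin
    d = x & c;     // uses the x left over from the previous evaluation
    x = a | b;
  end
endmodule

// Right order: x is computed first, then used.
module blocking_order_good (input a, input b, input c, output reg d);
  reg x;
  always @(*) begin
    x = a | b;
    d = x & c;
  end
endmodule

// Changes one input and prints both outputs for comparison.
module tb_blocking_order;
  reg a, b, c;
  wire d_bad, d_good;

  blocking_order_bad  u_bad  (.a(a), .b(b), .c(c), .d(d_bad));
  blocking_order_good u_good (.a(a), .b(b), .c(c), .d(d_good));

  initial begin
    a = 0; b = 0; c = 1;
    #1 a = 1;      // (a | b) & c is now 1
    #1 $display("d_bad=%b d_good=%b", d_bad, d_good);
    $finish;
  end
endmodule
```

Reordering the two assignments is enough to make the RTL simulation agree with the synthesized logic.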
ternary_operator_mux
gvim ternary_operator_mux.v
data:image/s3,"s3://crabby-images/a2d0c/a2d0c669fdbfa4cb8b41aeeb1c834d707e5293fc" alt="image"
Simulation
iverilog ternary_operator_mux.v tb_ternary_operator_mux.v
./a.out
gtkwave tb_ternary_operator_mux.vcd
Synthesis
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog ternary_operator_mux.v
synth -top ternary_operator_mux
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr ternary_operator_mux_net.v
show
data:image/s3,"s3://crabby-images/747e1/747e1cdb859a5e50f70b1c216f7028a249906758" alt="image"
data:image/s3,"s3://crabby-images/218cf/218cf1f9f60b163a38f8be47e93cfaa1875a6cb6" alt="image"
Gate-Level Simulation (GLS)
iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v ternary_operator_mux_net.v tb_ternary_operator_mux.v
./a.out
gtkwave tb_ternary_operator_mux.vcd
bad_mux
gvim bad_mux.v
data:image/s3,"s3://crabby-images/82225/8222538656e8910f4bc1b97a3f2f2d8cbfdaf266" alt="image"
Simulation
iverilog bad_mux.v tb_bad_mux.v
./a.out
gtkwave tb_bad_mux.vcd
Synthesis
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog bad_mux.v
synth -top bad_mux
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr bad_mux_net.v
show
data:image/s3,"s3://crabby-images/69671/696710b7157364cfe8188a4f3d65933acfb1f20e" alt="image"
data:image/s3,"s3://crabby-images/7fa82/7fa82261cb14203beb08c745e54fdd6bbd298040" alt="image"
Gate-Level Simulation (GLS)
iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v bad_mux_net.v tb_bad_mux.v
./a.out
gtkwave tb_bad_mux.vcd
blocking_caveat
gvim blocking_caveat.v
data:image/s3,"s3://crabby-images/697b1/697b1722eeb097d7e095df8ea7f10b3f6e13816a" alt="image"
Simulation
iverilog blocking_caveat.v tb_blocking_caveat.v
./a.out
gtkwave tb_blocking_caveat.vcd
Synthesis
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog blocking_caveat.v
synth -top blocking_caveat
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr blocking_caveat_net.v
show
Gate-Level Simulation (GLS)
iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v blocking_caveat_net.v tb_blocking_caveat.v
./a.out
gtkwave tb_blocking_caveat.vcd