32-bit RISC-V Implementation for FPGAs

This is a soft-core processor implementing the open-source RISC-V ISA. The project was created in Intel Quartus Prime and tested on the DE10-Lite board. The processor has a single-issue, in-order, 4-stage pipeline. It currently supports the integer, multiply, and single-precision IEEE-compliant floating-point (without floating-point division) instruction sets.

Initially, a proof of concept was developed in Logisim, first as a single-cycle implementation that was later pipelined. The design was then moved to Verilog.


Instructions are fetched with a latency of 1 clock cycle. A separate ALU in the decode stage calculates branch and jump addresses. Because branch addresses are calculated in the decode stage, only a single-cycle bubble is inserted when a branch is taken.

To prevent data hazards, data forwarding is done in the decode and execute stages. The forwarding unit for the decode stage forwards data from the memory/write-back stage. Combining the memory and write-back stages removes the stall otherwise needed by an instruction that immediately follows a load, because the load data can be forwarded from the write-back stage. The one exception is a branch: since branches are resolved in the decode stage, the pipeline is stalled for a single cycle when a branch needs the result of the previous instruction, so that the data is available for forwarding.
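The forwarding and stall rules described above can be sketched as a small behavioral model in C. This is only an illustration, not the project's Verilog: the `stage_result_t` type and the function names `forward_operand` and `branch_needs_stall` are hypothetical, and the model assumes that register x0 is never forwarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the register write pending in a later stage. */
typedef struct {
    bool     writes_reg;  /* the instruction writes a destination register */
    uint8_t  rd;          /* destination register index (0-31) */
    uint32_t value;       /* result, valid once the stage has produced it */
} stage_result_t;

/* Pick the operand for source register rs: take the value from the
 * memory/write-back stage if it is about to write rs, otherwise use the
 * value read from the register file. x0 is never forwarded (assumption). */
static uint32_t forward_operand(uint8_t rs, uint32_t regfile_value,
                                const stage_result_t *mem_wb)
{
    assert(rs < 32);
    if (mem_wb->writes_reg && mem_wb->rd != 0 && mem_wb->rd == rs)
        return mem_wb->value;
    return regfile_value;
}

/* A branch sitting in decode has to wait one cycle when the instruction
 * currently in execute writes one of the branch's source registers. */
static bool branch_needs_stall(bool is_branch, uint8_t rs1, uint8_t rs2,
                               const stage_result_t *execute)
{
    if (!is_branch || !execute->writes_reg || execute->rd == 0)
        return false;
    return execute->rd == rs1 || execute->rd == rs2;
}
```

After the one-cycle stall the producing instruction has moved on to the memory/write-back stage, so `forward_operand` can supply its result to the branch.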

The memory stage features an Avalon master that connects the processor to data memory and other peripherals via the Avalon bus. Currently the processor is equipped with GPIO, UART, and timer peripherals. Read and write latency is 1 cycle, but it can be more than 1 cycle depending on the type of memory or peripheral used. The Avalon wait-request signal is used to stall the core until the load data is available.
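The effect of the wait-request signal on a load can be illustrated with a toy C model. It is a sketch under simplifying assumptions: the `avalon_slave_t` type and `avalon_load` function are hypothetical, the slave is assumed to hold wait-request high for a fixed number of cycles, and the fixed 1-cycle base latency is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slave model: keeps waitrequest asserted for a fixed number
 * of cycles on every access, then presents its read data. */
typedef struct {
    unsigned wait_states;  /* cycles waitrequest stays high per access */
    uint32_t readdata;     /* value returned when the access completes */
} avalon_slave_t;

/* Simulate one load. The core holds the request and stalls for every cycle
 * in which waitrequest is high. Returns the number of stall cycles and
 * writes the loaded value to *out. */
static unsigned avalon_load(const avalon_slave_t *slave, uint32_t *out)
{
    assert(slave != NULL && out != NULL);
    unsigned stall_cycles = 0;
    unsigned remaining = slave->wait_states;
    while (remaining > 0) {   /* waitrequest is high: keep stalling */
        --remaining;
        ++stall_cycles;
    }
    *out = slave->readdata;   /* waitrequest is low: the access completes */
    return stall_cycles;
}
```

A slave with `wait_states` of 0 adds no stall cycles in this model, while a slower memory or peripheral stalls the core for as many cycles as it keeps wait-request asserted.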

Instruction memory and data memory are both 32 KB. They are combined into a single dual-port memory, both to allow ISP in the future and to avoid splitting the hex file generated by the compiler into instruction and data sections.

Interrupts are not supported at the moment.

Logic Utilization (Core only)

[Figure: logic utilization of the core]

Benchmarks and Tests

CoreMark

2K performance run parameters for coremark.
CoreMark Size    : 666u
Total ticks      : 1550779828u
Total time (secs): 31
Iterations/Sec   : 129
Iterations       : 4000u
Compiler version : GCC8.3.0
Compiler flags   : -O2,-DITERATIONS=4000, -mabi=ilp32, -march=rv32im (See Makefile)
Memory location  : STACK
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0x65c5
Correct operation validated. See README.md for run and reporting rules.

CoreMark/MHz = 2.58
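The CoreMark/MHz figure can be reproduced from the run output with a few lines of C. This sketch assumes that one tick in the report equals one core clock cycle, which puts the clock at about 50 MHz; the function names `clock_mhz` and `coremark_per_mhz` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate the clock in MHz from the reported tick count and run time,
 * assuming one tick per clock cycle. */
static double clock_mhz(uint64_t total_ticks, double total_secs)
{
    assert(total_secs > 0.0);
    return (double)total_ticks / total_secs / 1e6;
}

/* CoreMark/MHz is the iteration rate divided by the clock in MHz. */
static double coremark_per_mhz(double iterations_per_sec, double mhz)
{
    assert(mhz > 0.0);
    return iterations_per_sec / mhz;
}
```

With the values from the report, `clock_mhz(1550779828, 31.0)` comes out slightly above 50 because the reported time is a whole number of seconds, and `coremark_per_mhz(129.0, 50.0)` gives the 2.58 quoted above.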


*rv32im here is without hardware division.

Calculation of 100 PI digits:

    #define SCALE 10000
    #define ARRINIT 2000

    int main(void) {
        timer_set_prescaler(PRE_1);

        /* Each outer iteration consumes 14 terms and prints 4 digits. */
        int digits = 100 * 14 / 4;
        int carry = 0;
        int arr[digits + 1];

        for (int i = 0; i <= digits; ++i)
            arr[i] = ARRINIT;

        for (int i = digits; i > 0; i -= 14) {
            int sum = 0;
            for (int j = i; j > 0; --j) {
                sum = sum * j + SCALE * arr[j];
                arr[j] = sum % (j * 2 - 1);
                sum /= j * 2 - 1;
            }
            printf("%04d", carry + sum / SCALE);
            carry = sum % SCALE;
        }

        /* Read the timer counter to report the elapsed cycles. */
        int cnt = timer_read_cnt();
        printf("\n100 PI digits\n");
        printf("Cycles: %d\n", cnt);
    }

Serial Window:

3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067
Cycles: 2282586

Dhrystone


	 DMIPS/MHz = ((1 / execution time in seconds) x runs) / (1757 x frequency in MHz)


I used SiFive's compiler settings for the Dhrystone test and got a higher score. I must say that Dhrystone is more of a compiler benchmark than a processor benchmark.

-O3 -mabi=ilp32 -march=rv32im -mno-div -static -DTIME -DNOENUM -Wno-implicit -save-temps -fno-builtin-printf -fno-common -falign-functions=4 -fno-builtin-printf 
Mcycles = 3930048
DMIPS/MHz = 1.448
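The DMIPS/MHz formula can also be evaluated directly from a cycle count. Since execution time = cycles / (frequency in MHz x 10^6), the frequency cancels and DMIPS/MHz = runs x 10^6 / (cycles x 1757). The C sketch below applies this; the function name `dmips_per_mhz` is hypothetical, and the run count of 10000 used in the note after it is an assumption that is not stated in this README.

```c
#include <assert.h>
#include <stdint.h>

/* Dhrystones per second of the VAX 11/780 reference machine (1 DMIPS). */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* DMIPS/MHz from the number of Dhrystone runs and the clock cycles they
 * took. The clock frequency cancels out of the formula. */
static double dmips_per_mhz(uint32_t runs, uint64_t cycles)
{
    assert(runs > 0 && cycles > 0);
    double cycles_per_run = (double)cycles / (double)runs;
    return 1e6 / (cycles_per_run * VAX_DHRYSTONES_PER_SEC);
}
```

Assuming 10000 runs, `dmips_per_mhz(10000, 3930048)` evaluates to roughly 1.448, which is consistent with the score reported above.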

Project

Folder Structure

.
+-- _images
+-- _DE10 Lite
|	+--RISCV_CORE (Quartus Prime lite 18.1 project)
+-- _Code
+-- _tests 
|	+--Dhrystone
|	+--Coremark
+-- _IPs (Platform Designer)
|	+--Core
|	+--gpio
|	+--timer
|	+--serial

Steps to compile project:

  1. Download and install Quartus Prime Lite 18.1.
  2. Open RISCV_CORE.qpf in Quartus_DE10.
  3. Change the device and pin assignments to match the device you are using.
  4. Open Tools > Platform Designer and edit the system-on-chip RAM IP. Set the path of the hex file generated by the compiler (see the Programming section). Also edit the core and select the features you need, such as support for M or F.
  5. Save and generate.
  6. Compile the Quartus project.
  7. You only need to compile once. After that, just update the memory initialization file and program the device again.

Programming

  1. Download the toolchain from here: https://static.dev.sifive.com/dev-tools/riscv64-unknown-elf-gcc-8.3.0-2020.04.0-x86_64-linux-ubuntu14.tar.gz
  2. Extract the compiler and note its path (opt/riscv/bin/, etc.).
  3. Edit the makefile and set the path of the compiler.
  4. Add your code to main.c and run make.
  5. The hex file is generated inside the mem_init folder. Give the path of this file when editing the system-on-chip RAM IP.

Libraries

**GPIO:**
void gpio_mode(int, int);       // pin (0-31), mode (input, output)
void gpio_write(int, int);      // pin (0-31), HIGH or LOW
int gpio_read(int);             // pin (0-31)
**SERIAL:**
int serial_putc(int);           // print a character
void serial_puts(char*);        // print a string
char serial_getc();             // get a character
int serial_available();         // check whether data has been received
**TIMER:**
void timer_set_prescaler(int);  // configure the timer clock
void timer_set_compare(int);    // set the compare value
int timer_read_conf();          // read the configuration register (OVF, CMPF, PRESCALER)
int timer_read_cnt();           // read the timer counter
**UTIL:**
printf ported to UART (does not support floating point yet)
delay function
**LCD:**
1602 LCD interfaced with GPIOs
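As a usage illustration, the timer functions can be combined to time a piece of code, in the same way the PI benchmark reads the counter. The sketch below is meant to compile on a host PC, so it supplies its own stand-ins for `timer_set_prescaler` and `timer_read_cnt`; the value of `PRE_1`, the fake counter that advances by 100 per read, and the helper names `busy_work` and `measure_counts` are all assumptions made for this example. On the board, the stand-ins would be replaced by the project's timer library.

```c
#include <assert.h>

/* Host-side stand-ins so the sketch compiles off-target. On the board,
 * drop these and use the project's timer library instead. The value of
 * PRE_1 and the 100-counts-per-read behaviour are made up for this sketch. */
#define PRE_1 0
static int fake_counter;
void timer_set_prescaler(int prescaler) { (void)prescaler; }
int timer_read_cnt(void) { fake_counter += 100; return fake_counter; }

/* Example workload to time. */
static void busy_work(void)
{
    volatile int x = 0;
    for (int i = 0; i < 10; ++i)
        x += i;
}

/* Return the number of timer counts that elapse while fn runs. */
static int measure_counts(void (*fn)(void))
{
    assert(fn != 0);
    timer_set_prescaler(PRE_1);
    int start = timer_read_cnt();
    fn();
    return timer_read_cnt() - start;
}
```

Taking the difference of two counter reads avoids relying on the counter starting from zero, which this README does not specify.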

Examples

  • LCD
  • UART
  • TIMER