rvcodec.js: an online encoder/decoder for RISC-V instructions
The first functional version of our project, rvcodec.js, is now available at https://luplab.gitlab.io/rvcodecjs/!
Introduction
I have been working with RISC-V quite a bit these past two years. The first few simulation platforms demoing LupIO devices were based on RISC-V processors, which also triggered the development of a new bootloader, riscv-dbl, and the interactive textbook framework LupBook embeds a RISC-V-based emulator compiled to WebAssembly.
Quite a few times, I had to manually encode or decode RISC-V instructions, that is, find the binary representation of an assembly instruction or vice versa. For example, it took me some time to fully understand what the bootcode in tinyemu truly meant:
Unfortunately, I didn’t find any encoder/decoder online, although some exist for other instruction sets such as MIPS. So in FQ21, I recruited two great undergraduate students, Hikari Sakai and Abhi Sohal, to implement an online instruction converter for RISC-V. Hikari worked on the converter engine while Abhi worked on the web interface.
Our project, called rvcodec.js, currently supports the RV32I instruction set as well as the Zifencei extension. We are planning to add support for more instructions (e.g., RV64I and the MAFDC and Zicsr extensions).
Instruction encoding/decoding
In RISC processors, binary instructions are generally encoded according to a certain format. Here are the existing formats for RISC-V instructions:
The I-type format, one of the most common, is typically used for instructions operating on one register and one immediate value and placing the resulting value in a register. For instance, take assembly instruction addi x5, x6, 7; it instructs the processor to add the value currently contained in register x6 and the immediate value 7 and place the result in register x5.
According to the specifications, addi has a 7-bit opcode field equal to 0010011 and a 3-bit funct3 field equal to 000. The combination of these two fields is what determines that the operation to perform is addi.
Now, if the source register is x6, then the 5-bit rs1 field is equal to 00110. Similarly, if the destination register is x5, then the 5-bit rd field is equal to 00101. Finally, if the immediate is 7, then the 12-bit imm field is equal to 000000000111.
Putting it together, and as confirmed by rvcodec.js, the 32-bit binary representation of assembly instruction addi x5, x6, 7 is 0b00000000011100110000001010010011 (which is 0x00730293 in hexadecimal).
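The field-by-field assembly above can be reproduced with a few shifts and masks. Here is a minimal sketch; the function name encodeIType and its parameter object are made up for illustration and are not taken from the rvcodec.js source.

```javascript
// Hypothetical helper (not part of rvcodec.js): packs the five fields of
// an I-type instruction into a single 32-bit word.
function encodeIType({ imm, rs1, funct3, rd, opcode }) {
  const word =
    ((imm & 0xfff) << 20) |   // imm[11:0] -> bits 31..20
    ((rs1 & 0x1f) << 15) |    // rs1       -> bits 19..15
    ((funct3 & 0x7) << 12) |  // funct3    -> bits 14..12
    ((rd & 0x1f) << 7) |      // rd        -> bits 11..7
    (opcode & 0x7f);          // opcode    -> bits 6..0
  // Bitwise operators yield signed 32-bit values in JavaScript;
  // ">>> 0" reinterprets the result as an unsigned 32-bit integer.
  return word >>> 0;
}

// addi x5, x6, 7
const addiWord = encodeIType({
  imm: 7,
  rs1: 6,
  funct3: 0b000,
  rd: 5,
  opcode: 0b0010011,
});

const addiHex = '0x' + addiWord.toString(16).padStart(8, '0');
const addiBin = '0b' + addiWord.toString(2).padStart(32, '0');
console.log(addiHex); // 0x00730293
console.log(addiBin); // 0b00000000011100110000001010010011
```

Masking each field before shifting keeps an out-of-range value from spilling into a neighboring field, and it also lets a negative immediate such as -1 be stored in its 12-bit two's complement form.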
Decoding is the exact opposite operation. From the binary representation of an instruction, we can reconstitute the corresponding (human-readable) assembly instruction. Try, for example, decoding 0x6ef0b0ef.
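To give an idea of what decoding involves, here is a minimal sketch that splits a 32-bit word back into the I-type fields and rebuilds the addi example from earlier. The names decodeIType and formatAddi are hypothetical, and the sketch only recognizes addi; a real decoder such as the one in rvcodec.js has to handle every format and opcode.

```javascript
// Hypothetical helper (not part of rvcodec.js): extracts the I-type
// fields from a 32-bit instruction word.
function decodeIType(word) {
  return {
    opcode: word & 0x7f,          // bits 6..0
    rd: (word >>> 7) & 0x1f,      // bits 11..7
    funct3: (word >>> 12) & 0x7,  // bits 14..12
    rs1: (word >>> 15) & 0x1f,    // bits 19..15
    // ">>" is an arithmetic shift on the signed 32-bit view of the word,
    // so the 12-bit immediate comes out sign-extended.
    imm: word >> 20,
  };
}

// Rebuilds the assembly text, but only for addi (opcode 0010011, funct3 000).
function formatAddi(word) {
  const f = decodeIType(word);
  if (f.opcode !== 0b0010011 || f.funct3 !== 0b000) {
    throw new Error('not an addi instruction');
  }
  return `addi x${f.rd}, x${f.rs1}, ${f.imm}`;
}

const decodedAddi = formatAddi(0x00730293);
console.log(decodedAddi); // addi x5, x6, 7
```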
rvcodec.js architecture
rvcodec.js is a single-page application with no dependencies, written in simple HTML+CSS+JavaScript. It has an independent decoding/encoding engine which we refer to as the “backend”, while the “frontend” is the web user interface.
The frontend is contained in subdirectory web-ui. It implements the webpage and calls the backend when a new instruction is to be converted.
The backend is contained in subdirectory core. It creates an Instruction object upon receiving an instruction, either in binary or assembly form. According to the instruction’s form, it runs either only the decoder (binary to assembly) or both the encoder (assembly to binary) and the decoder.
The reason for running the decoder when converting an assembly instruction to binary is that the decoder also builds a list of instruction “fragments”, which are used by the frontend in order to implement the colored matching of fields. This matching shows how each token of the assembly instruction is related to each field of the binary representation.
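To make the idea of fragments more concrete, here is a rough sketch of what such a list could look like for an I-type instruction. The function name iTypeFragments and the property names field, bits and asm are assumptions made for this illustration; the actual data structure used by rvcodec.js may well differ.

```javascript
// Hypothetical sketch (not the rvcodec.js implementation): pairs each
// binary field of an I-type instruction with the assembly token it encodes.
function iTypeFragments(word, mnemonic) {
  const bin = (word >>> 0).toString(2).padStart(32, '0');
  const rd = (word >>> 7) & 0x1f;
  const rs1 = (word >>> 15) & 0x1f;
  const imm = word >> 20; // sign-extended 12-bit immediate

  // Fragments are listed from the most significant bit to the least
  // significant, so joining their "bits" gives back the whole word.
  return [
    { field: 'imm[11:0]', bits: bin.slice(0, 12), asm: String(imm) },
    { field: 'rs1', bits: bin.slice(12, 17), asm: `x${rs1}` },
    { field: 'funct3', bits: bin.slice(17, 20), asm: mnemonic },
    { field: 'rd', bits: bin.slice(20, 25), asm: `x${rd}` },
    { field: 'opcode', bits: bin.slice(25, 32), asm: mnemonic },
  ];
}

const addiFragments = iTypeFragments(0x00730293, 'addi');
for (const frag of addiFragments) {
  console.log(`${frag.field.padEnd(10)} ${frag.bits.padEnd(12)} ${frag.asm}`);
}
```

With a list like this, a user interface can give the same color to a group of bits and to the assembly token stored alongside it. Note that both funct3 and opcode point to the mnemonic, since it is their combination that identifies the operation.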
Conclusion
This is a young project with plenty of potential for exciting features: supporting more instructions, having an even more accessible frontend, etc.
We are looking for contributors. If you are interested, take a look at the source code and at the list of current issues!