The first functional version of our project, rvcodec.js, is now available at https://luplab.gitlab.io/rvcodecjs/!
I have been working with RISC-V quite a bit these past two years. The first few simulation platforms demoing LupIO devices were based on RISC-V processors, which also triggered the development of a new bootloader, riscv-dbl, and the interactive textbook framework LupBook embeds a RISC-V based emulator compiled to WebAssembly.
Quite a few times, I had to manually encode or decode RISC-V instructions, that is, find the binary representation of an assembly instruction or vice versa. For example, it took me some time to fully understand what the bootcode in tinyemu truly meant:
I unfortunately didn’t find any encoder/decoder online, although some exist for other instruction sets such as MIPS. So in FQ21, I recruited two great undergraduate students, Hikari Sakai and Abhi Sohal, to implement an online instruction converter for RISC-V. Hikari worked on the converter engine while Abhi worked on the web interface.
Our project, called rvcodec.js, currently supports the RV32I instruction set as well as the Zifencei extension. We are planning to add support for more instructions (e.g., RV64I and the MAFDC and Zicsr extensions).
In RISC processors, binary instructions are generally encoded according to a certain format. Here are the existing formats for RISC-V instructions:
I-type format, one of the most common, is typically used for instructions operating on one register and one immediate value and placing the resulting value in a register. For instance, take assembly instruction addi x5, x6, 7; it instructs the processor to add the value currently contained in register x6 and the immediate value 7 and place the result in register x5.
According to the specifications, addi has a 7-bit opcode field equal to 0010011 and a 3-bit funct3 field equal to 000. The combination of these two fields is what determines that the operation to perform is addi.
Now, if the source register is x6, then the 5-bit rs1 field is equal to 00110. Similarly, if the destination register is x5, then the 5-bit rd field is equal to 00101. Finally, if the immediate is 7, then the 12-bit imm field is equal to 000000000111.
Putting it together, and as confirmed by rvcodec.js, the 32-bit binary representation of assembly instruction addi x5, x6, 7 is 0b00000000011100110000001010010011 (which is 0x00730293 in hexadecimal).
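To make the field packing concrete, here is a minimal sketch of I-type encoding in JavaScript. It is not taken from the rvcodec.js sources: the function name encodeIType and its parameter object are hypothetical, and it assumes the fields have already been parsed out of the assembly text.

```javascript
// Hypothetical helper (not the rvcodec.js API): pack already-parsed
// I-type fields into a 32-bit instruction word.
function encodeIType({ opcode, funct3, rd, rs1, imm }) {
  // I-type immediates are 12-bit signed values.
  if (imm < -2048 || imm > 2047) {
    throw new RangeError(`immediate ${imm} does not fit in 12 bits`);
  }
  const word =
    ((imm & 0xfff) << 20) |  // imm[11:0] -> bits 31..20
    ((rs1 & 0x1f) << 15) |   // rs1       -> bits 19..15
    ((funct3 & 0x7) << 12) | // funct3    -> bits 14..12
    ((rd & 0x1f) << 7) |     // rd        -> bits 11..7
    (opcode & 0x7f);         // opcode    -> bits 6..0
  // JavaScript bitwise operators yield signed 32-bit integers;
  // ">>> 0" reinterprets the result as unsigned.
  return word >>> 0;
}

// addi x5, x6, 7
const addi = encodeIType({ opcode: 0b0010011, funct3: 0b000, rd: 5, rs1: 6, imm: 7 });
const hex = '0x' + addi.toString(16).padStart(8, '0');
const bin = '0b' + addi.toString(2).padStart(32, '0');
console.log(hex); // 0x00730293
console.log(bin); // 0b00000000011100110000001010010011
```

The final unsigned shift matters for negative immediates, which set bit 31 and would otherwise print as a negative number.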
Decoding is the exact opposite operation. From the binary representation of an
instruction, we can reconstitute the corresponding (human-readable) assembly
instruction. Try, for example, decoding
rvcodec.js is a single-page application with no dependencies, written in JavaScript. The converter engine is what we refer to as the “backend”, while the “frontend” is the web user interface.
The frontend is contained in subdirectory web-ui. It implements the webpage and calls the backend when a new instruction is to be converted.
The backend is contained in subdirectory
core. It creates an
Instruction object upon receiving an instruction, either in binary or assembly
form. According to the instruction’s form, it runs either only the decoder
(binary to assembly) or both the encoder (assembly to binary) and the decoder.
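As a rough illustration of that dispatch, the sketch below classifies the input and lists which steps would run. The names detectForm and conversionSteps and the accepted input patterns are assumptions for this example, not the actual rvcodec.js code.

```javascript
// Hypothetical classifier (not taken from rvcodec.js): decide whether the
// input looks like a 32-bit word (hex or binary) or like assembly text.
function detectForm(input) {
  const text = input.trim();
  const isHex = /^(0x)?[0-9a-f]{8}$/i.test(text);  // e.g. 0x00730293
  const isBin = /^(0b)?[01]{32}$/i.test(text);     // 32 binary digits
  return isHex || isBin ? 'binary' : 'assembly';
}

// Binary input only needs decoding; assembly input is encoded first and
// then decoded as well.
function conversionSteps(input) {
  return detectForm(input) === 'binary' ? ['decode'] : ['encode', 'decode'];
}

console.log(conversionSteps('0x00730293'));     // [ 'decode' ]
console.log(conversionSteps('addi x5, x6, 7')); // [ 'encode', 'decode' ]
```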
The reason for running the decoder when converting an assembly instruction to binary is that the decoder also builds a list of instruction “fragments”, which are used by the frontend in order to implement the colored matching of fields. This matching shows how each token of the assembly instruction is related to each field of the binary representation.
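Here is one way such fragments could look, as a minimal sketch restricted to addi. The function decodeAddi and the fragment shape ({ token, field, bits }) are hypothetical and chosen for illustration; the real backend may structure its fragments differently.

```javascript
// Hypothetical sketch (not the rvcodec.js implementation): decode an addi
// word and record which bits produced each assembly token.
function decodeAddi(word) {
  const bits = (word >>> 0).toString(2).padStart(32, '0');
  // bits[0] is bit 31, so bits hi..lo live at string indices 31-hi..31-lo.
  const field = (hi, lo) => bits.slice(31 - hi, 32 - lo);

  const opcode = field(6, 0);
  const funct3 = field(14, 12);
  if (opcode !== '0010011' || funct3 !== '000') {
    throw new Error('this sketch only handles addi');
  }

  const rd = parseInt(field(11, 7), 2);
  const rs1 = parseInt(field(19, 15), 2);
  // Sign-extend the 12-bit immediate.
  const rawImm = parseInt(field(31, 20), 2);
  const imm = rawImm >= 2048 ? rawImm - 4096 : rawImm;

  // Each fragment ties one assembly token to one binary field, which is
  // the information a frontend needs to color matching parts alike.
  const fragments = [
    { token: 'addi', field: 'opcode', bits: opcode },
    { token: 'addi', field: 'funct3', bits: funct3 },
    { token: `x${rd}`, field: 'rd', bits: field(11, 7) },
    { token: `x${rs1}`, field: 'rs1', bits: field(19, 15) },
    { token: String(imm), field: 'imm', bits: field(31, 20) },
  ];
  return { asm: `addi x${rd}, x${rs1}, ${imm}`, fragments };
}

const decoded = decodeAddi(0x00730293);
console.log(decoded.asm); // addi x5, x6, 7
```

Because the fragments come out of the decoder, running it after encoding gives the frontend the same token-to-field mapping regardless of which form the user typed.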
This is a young project with plenty of potential for exciting features: supporting more instructions, having an even more accessible frontend, etc.