Introduction

I have been working with RISC-V quite a bit these past two years. The first few simulation platforms demoing LupIO devices were based on RISC-V processors, which also triggered the development of a new bootloader, riscv-dbl. The interactive textbook framework LupBook also embeds a RISC-V based emulator compiled to WebAssembly.

Quite a few times, I had to manually encode or decode RISC-V instructions, that is, find the binary representation of an assembly instruction or vice versa. For example, it took me some time to fully understand what the bootcode in tinyemu actually does:

temu

Unfortunately, I didn’t find any encoder/decoder online, although some exist for other instruction sets such as MIPS. So in fall quarter 2021 (FQ21), I recruited two great undergraduate students, Hikari Sakai and Abhi Sohal, to implement an online instruction converter for RISC-V. Hikari worked on the converter engine while Abhi worked on the web interface.

The first functional version of our project, rvcodec.js, is now available at https://luplab.gitlab.io/rvcodecjs/

It currently supports the RV32I instruction set as well as the Zifencei extension. We are planning to add support for more instructions (e.g., RV64I and the MAFDC and Zicsr extensions).

Instruction encoding/decoding

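In RISC processors, binary instructions are generally encoded according to a small set of fixed formats, each of which splits the instruction word into fields at known bit positions. Here are the base formats for RISC-V instructions (R, I, S, B, U and J):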

RISC-V instruction formats

The I-type format, one of the most common, is typically used for instructions that operate on one register and one immediate value and place the result in a register. For instance, take the assembly instruction addi x5, x6, 7: it instructs the processor to add the value currently contained in register x6 and the immediate value 7, and to place the result in register x5.

According to the specification, addi has a 7-bit opcode field equal to 0010011 and a 3-bit funct3 field equal to 000. The combination of these two fields is what determines that the operation to perform is addi.

addi instruction format

Now, if the source register is x6, then the 5-bit rs1 field is equal to 00110. Similarly, if the destination register is x5, then the 5-bit rd field is equal to 00101. Finally, if the immediate is 7, then the 12-bit imm field is equal to 000000000111.

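Putting it together by concatenating the imm, rs1, funct3, rd and opcode fields from the most significant bit to the least significant bit, and as confirmed by rvcodec.js, the 32-bit binary representation of the assembly instruction addi x5, x6, 7 is 0b00000000011100110000001010010011 (which is 0x00730293 in hexadecimal).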
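To make the field packing concrete, here is a minimal JavaScript sketch of I-type encoding. The function name encodeIType and its argument shape are made up for this illustration and are not taken from the rvcodec.js source; only the field widths and positions come from the I-type format described above.

```javascript
// Illustrative sketch only: encodeIType is a hypothetical helper,
// not part of the rvcodec.js API.
// I-type layout, from bit 31 down to bit 0:
//   imm[11:0] (12 bits) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)
function encodeIType({ imm, rs1, funct3, rd, opcode }) {
  if (!Number.isInteger(imm) || imm < -2048 || imm > 2047) {
    throw new RangeError(`immediate ${imm} does not fit in 12 signed bits`);
  }
  const word =
    ((imm & 0xfff) << 20) |  // 12-bit immediate (two's complement)
    ((rs1 & 0x1f) << 15) |   // source register number
    ((funct3 & 0x7) << 12) | // operation selector within the opcode
    ((rd & 0x1f) << 7) |     // destination register number
    (opcode & 0x7f);         // major opcode
  return word >>> 0;         // reinterpret as an unsigned 32-bit value
}

// addi x5, x6, 7
const addiWord = encodeIType({
  imm: 7, rs1: 6, funct3: 0b000, rd: 5, opcode: 0b0010011,
});
console.log("0x" + addiWord.toString(16).padStart(8, "0")); // 0x00730293
```

The final `>>> 0` matters because JavaScript bitwise operators work on signed 32-bit integers: a negative immediate sets bit 31, and without the unsigned shift the result would print as a negative number.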

addi conversion

Decoding is the exact opposite operation. From the binary representation of an instruction, we can reconstruct the corresponding (human-readable) assembly instruction. Try, for example, decoding 0x6ef0b0ef.
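As a rough illustration of the reverse direction, the sketch below extracts the I-type fields from a 32-bit word and rebuilds the assembly text. It is deliberately limited: decodeIType is a hypothetical name, it only recognizes addi, and it does not reflect how the rvcodec.js decoder is actually organized.

```javascript
// Illustrative sketch only: decodeIType is a hypothetical helper that
// assumes the word uses the I-type layout and only recognizes addi.
function decodeIType(word) {
  const opcode = word & 0x7f;          // bits 6..0
  const rd = (word >>> 7) & 0x1f;      // bits 11..7
  const funct3 = (word >>> 12) & 0x7;  // bits 14..12
  const rs1 = (word >>> 15) & 0x1f;    // bits 19..15
  const imm = word >> 20;              // bits 31..20, sign-extended by >>

  // The opcode/funct3 pair selects the operation; anything else is left
  // unnamed in this sketch.
  const isAddi = opcode === 0b0010011 && funct3 === 0b000;
  const asm = isAddi ? `addi x${rd}, x${rs1}, ${imm}` : null;
  return { opcode, rd, funct3, rs1, imm, asm };
}

console.log(decodeIType(0x00730293).asm); // addi x5, x6, 7
```

A real decoder would first look at the opcode to choose the format, since the same bit positions mean different things in, say, a J-type instruction.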

rvcodec.js architecture

rvcodec.js is a single-page application with no dependencies, written in plain HTML, CSS and JavaScript. It has an independent decoding/encoding engine which we refer to as the “backend”, while the “frontend” is the web user interface.

The frontend is contained in subdirectory web-ui. It implements the webpage and calls the backend when a new instruction is to be converted.

The backend is contained in subdirectory core. It creates an Instruction object upon receiving an instruction, either in binary or assembly form. According to the instruction’s form, it runs either only the decoder (binary to assembly) or both the encoder (assembly to binary) and the decoder.
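The control flow described above can be sketched as follows. Everything here is hypothetical: the names detectForm, parseWord and convert, the input-detection rules, and the callback-based design are assumptions made for illustration, not the actual structure of the Instruction object in core.

```javascript
// Illustrative sketch only: all names and detection rules are assumptions.

// Guess whether the input is a machine word (hex or binary) or assembly.
function detectForm(input) {
  const text = input.trim();
  const isHex = /^0x[0-9a-f]{1,8}$/i.test(text);
  const isBin = /^0b[01]{1,32}$/i.test(text) || /^[01]{32}$/.test(text);
  return isHex || isBin ? "binary" : "assembly";
}

// Parse a hex ("0x...") or binary ("0b..." or 32 raw bits) string.
function parseWord(input) {
  const text = input.trim().toLowerCase();
  if (text.startsWith("0x")) return parseInt(text.slice(2), 16);
  return parseInt(text.replace(/^0b/, ""), 2);
}

// Binary input: run the decoder only.
// Assembly input: run the encoder, then the decoder on its output.
function convert(input, { encode, decode }) {
  const form = detectForm(input);
  const word = form === "binary" ? parseWord(input) : encode(input.trim());
  return { form, word, decoded: decode(word) };
}

// Usage with stand-in encoder/decoder callbacks that only know one instruction.
const demoCodec = {
  encode: () => 0x00730293,
  decode: () => ({ asm: "addi x5, x6, 7" }),
};
console.log(convert("addi x5, x6, 7", demoCodec).form); // assembly
console.log(convert("0x00730293", demoCodec).form);     // binary
```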

The reason for running the decoder when converting an assembly instruction to binary is that the decoder also builds a list of instruction “fragments”, which the frontend uses to implement the colored matching of fields. This matching shows how each token of the assembly instruction relates to each field of the binary representation.
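To give an idea of what such a fragment list could look like, here is a small sketch for an addi instruction. The fragment shape ({ asm, field, hi, lo, bits }) and the function name addiFragments are invented for this example; the actual fragment structure used by rvcodec.js may well differ.

```javascript
// Illustrative sketch only: the fragment shape and addiFragments are
// assumptions, not the data structure used by rvcodec.js.
function addiFragments(word) {
  // 32-character bit string; index 0 holds bit 31.
  const allBits = (word >>> 0).toString(2).padStart(32, "0");
  // Extract bits hi..lo (inclusive) as a string.
  const slice = (hi, lo) => allBits.slice(31 - hi, 32 - lo);

  const rd = (word >>> 7) & 0x1f;
  const rs1 = (word >>> 15) & 0x1f;
  const imm = word >> 20; // sign-extended 12-bit immediate

  // Each fragment ties one assembly token to one binary field, so a UI
  // can give both the same color. The mnemonic maps to two fields.
  return [
    { asm: "addi", field: "opcode", hi: 6, lo: 0, bits: slice(6, 0) },
    { asm: "addi", field: "funct3", hi: 14, lo: 12, bits: slice(14, 12) },
    { asm: `x${rd}`, field: "rd", hi: 11, lo: 7, bits: slice(11, 7) },
    { asm: `x${rs1}`, field: "rs1", hi: 19, lo: 15, bits: slice(19, 15) },
    { asm: String(imm), field: "imm[11:0]", hi: 31, lo: 20, bits: slice(31, 20) },
  ];
}

for (const f of addiFragments(0x00730293)) {
  console.log(`${f.asm.padEnd(5)} -> ${f.field.padEnd(9)} [${f.hi}:${f.lo}] ${f.bits}`);
}
```

Sorting the fragments by descending bit position and concatenating their bits gives back the full 32-bit word, which is a handy sanity check that no field is missing or overlapping.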

addi fields

Conclusion

This is a young project with plenty of potential for exciting features: supporting more instructions, having an even more accessible frontend, etc.

We are looking for contributors. If you are interested, take a look at the source code and at the list of current issues!