RISC-V ASM unaligned read/writes: alternative assembly#10530
Conversation
|
Jenkins: retest this please |
3b8e981 to
b4611fa
Compare
|
I would like to push back on this implementation a bit. Currently, all instructions that could be unaligned get emulated. I think it makes sense to check if the pointers are actually unaligned and only then use the more costly emulation. There should only be a negligible impact on performance in case the data is aligned. This also only introduces a small size overhead, the check should be a few instructions at most. |
|
On a separate issue: In my opinion readability would be increased if you just used the EDIT: This is probably obsolete in case the suggestion above gets adopted |
b4611fa to
bd46955
Compare
Not all RISC-V chips allow unaligned reads and writes with basic assembly instructions like lw/sw. Add alternative assembly that is turned on with: WOLFSSL_RISCV_ASM_NO_UNALIGNED.
bd46955 to
26c45cf
Compare
There was a problem hiding this comment.
I think you missed some in
int wc_AesSetKey(Aes* aes, const byte* key, word32 keyLen, const byte* iv,
int dir)
{
The pointer to key is not necessarily aligned (Found it out the hard way)
There was a problem hiding this comment.
Are you saying then that none of the fields of Aes will be aligned?
If so, then I will need to change the access to the fields: key, reg and tmp.
There was a problem hiding this comment.
That I do not know, the best solution for me is to align the buffers at its source, so I went into the ssl struct and corrected the placement of byte* key.
This was sufficient, so in this very case only key seems to not be aligned by default
There was a problem hiding this comment.
Also, why did you choose to use ALIGN16, is ALIGN8 not already sufficient?
There was a problem hiding this comment.
Because they are 16 byte buffers in AES, I made it align on 16 bytes.
Looking into it, the vector instructions are 64 bit loads and stores so changing to ALIGN8 will be fine.
|
I've modified the macros to check the alignment and choose the sequence to use. Thanks, |
There was a problem hiding this comment.
I see the following pattern several times:
addi t, p, o
andi t, t, <alignment mask>
bnez t, <label>
----
t: scratch register
p: register holding the base address
o: offset
This checks the alignment of p+o, but since o is always a valid offset (is it not?) it is sufficient to check p only.
One instruction can then be saved by doing:
andi t, p, <alignment mask>
bnez t, <label>
There was a problem hiding this comment.
One issue I see is that there is a lot of double checking for alignments:
The bulk variants call the underlying N times, but since the check for the alignment is done in the underlying only, it also gets checked N times.
In case the data is actually aligned there should only be one check.
I'd argue that most of the time the data is (or at the very least should be) aligned, so I'd try to keep the overhead for this case as small as possible.
If the data is unaligned, performance will take a hit even on hardware that supports misaligned loads and stores, e.g. when accesses cross cache-line boundaries.
Description
Not all RISC-V chips allow unaligned reads and writes with basic assembly instructions like lw/sw.
Add alternative assembly that is turned on with:
WOLFSSL_RISCV_ASM_NO_UNALIGNED.
Fixes #10525
Testing
./configure --disable-shared LDFLAGS=--static --host=riscv64 CC=riscv64-linux-gnu-gcc --enable-riscv-asm CFLAGS=-DWOLFSSL_RISCV_ASM_NO_UNALIGNED