Refactoring of C code for encoding and decoding #8

Open
4 of 6 tasks
kstrohmayer opened this issue Oct 20, 2023 · 15 comments
@kstrohmayer

kstrohmayer commented Oct 20, 2023

CC: @kstrohmayer @mole99 @adam-hrvth

Refactoring of the current C code used to test the custom instruction.

  • Encoding / Decoding algorithm

    • Version without instruction set extensions
    • Version with instruction set extensions
  • Testing

    • Generate data
    • Verify encoding and decoding
  • Implement without instruction set extensions

    • Encoding without custom instruction
    • Decoding without custom instruction
  • Implement with instruction set extensions

    • Encoding with custom instruction
    • Decoding with custom instruction

Status

Current status of the block-wise run-length algorithm.

C-implementation running on the PC

Implementation without any custom instruction. A function is considered working once a self-checking test has verified it in at least 10 runs, each with different data.

| Coding | Status - Adam | Status - Leo |
|---|---|---|
| Encoding fixed number of signals (4), fixed number of samples (64) | completed | ? |
| Decoding fixed number of signals (4), fixed number of samples (64) | completed | ? |
| Encoding variable number of signals - signal width is integer divider of 8 (2,4,8,16), fixed number of samples (64) | ? | ? |
| Decoding variable number of signals - signal width is integer divider of 8 (2,4,8,16), fixed number of samples (64) | ? | ? |
| Encoding variable number of signals - signal width is integer divider of 8 (2,4,8,16), variable number of samples - samples fit fully in a 32bit word (16, 32, 64) | ? | ? |
| Decoding variable number of signals - signal width is integer divider of 8 (2,4,8,16), variable number of samples - samples fit fully in a 32bit word (16, 32, 64) | ? | ? |
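
The acceptance criterion above (a self-checking test, at least 10 runs, different data each run) could be sketched as a round-trip check like the one below. This is only an illustration: `toy_encode`, `toy_decode`, `run_t` and `self_check` are hypothetical names, and the toy codec stands in for the project's real encoder and decoder, whose interfaces are not shown in this issue.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_SAMPLES 64   /* fixed number of samples per signal, as in the table */
#define MAX_RUNS    64   /* worst case: every sample differs from its neighbour */

/* One run of identical bits (hypothetical representation, not the project's). */
typedef struct {
    uint8_t bit;     /* 0 or 1 */
    uint8_t length;  /* 1..64 */
} run_t;

/* Encode 64 samples (LSB first) into runs; returns the number of runs. */
static size_t toy_encode(uint64_t samples, run_t runs[MAX_RUNS])
{
    size_t n = 0;
    for (int i = 0; i < NUM_SAMPLES; i++) {
        uint8_t bit = (uint8_t)((samples >> i) & 1u);
        if (n > 0 && runs[n - 1].bit == bit) {
            runs[n - 1].length++;          /* extend the current run */
        } else {
            runs[n].bit = bit;             /* start a new run */
            runs[n].length = 1;
            n++;
        }
    }
    return n;
}

/* Rebuild the 64 samples from the runs. */
static uint64_t toy_decode(const run_t *runs, size_t n)
{
    uint64_t samples = 0;
    int pos = 0;
    for (size_t r = 0; r < n; r++) {
        for (uint8_t k = 0; k < runs[r].length; k++, pos++) {
            if (runs[r].bit) {
                samples |= (uint64_t)1 << pos;
            }
        }
    }
    return samples;
}

/* Self-checking test: returns the number of iterations whose round trip failed. */
static int self_check(int iterations, unsigned seed)
{
    int failures = 0;
    srand(seed);
    for (int t = 0; t < iterations; t++) {
        uint64_t in = 0;
        for (int b = 0; b < 8; b++) {      /* build 64 random bits, 8 at a time */
            in = (in << 8) | (uint64_t)(rand() & 0xFF);
        }
        run_t runs[MAX_RUNS];
        size_t n = toy_encode(in, runs);
        if (toy_decode(runs, n) != in) {
            failures++;
        }
    }
    return failures;
}
```

A call such as `self_check(10, 1u)` returning 0 would correspond to "working" in the sense used above; swapping the toy codec for the real encode and decode functions keeps the same structure.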

C-implementation running on the CV32E40X using FPGA

tbd

@kstrohmayer changed the title from "Refactoring of C code" to "Refactoring of C code for encoding" on Oct 20, 2023
@mole99
Contributor

mole99 commented Oct 20, 2023

Just one correction: if what we discussed on Thursday hasn't changed, then I will fix the RTL for the custom instruction cntb (#10) and Adam will rework the C library in parallel.

Can you assign Adam to this task then?

@kstrohmayer
Author

Perfectly fine with me.

@kstrohmayer kstrohmayer assigned adam-hrvth and unassigned mole99 Oct 22, 2023
@adam-hrvth
Contributor

@kstrohmayer @mole99
After our discussion with Klaus, we've decided to investigate the root cause of the encoding errors before making significant algorithm changes. I suspect the issue lies in the 'prepare_data' function, in the data buffer, in the place where 'rle_compress' fetches the signal for encoding, or in a combination of these. Even when I feed various random data bits into the 'prepare_data' buffer, the output remains the same, as shown in the attached picture of the first five test runs. It's puzzling that the number of counted consecutive bits is often smaller than the number of bits in 'bits_rle_block', which prevents compression. I'm not entirely certain the output is correct.

I have a separate repository for RLE algorithm testing in Visual Studio. If you want to run a test while I'm away, just check out the 'rle_encoding' branch and run the 'rle.vcxproj' file. This should open Visual Studio for debugging. If you encounter issues, it might be due to the project not being in Console mode, and you can refer to this article for help: https://www.codeproject.com/Questions/5289599/How-do-I-resolve-my-LNK2019-error-in-my-program

[Image: rle_test1]

@adam-hrvth
Contributor

One of the reasons the data in the buffer wasn't changing correctly is that Mario was rewriting the content of the buffer with strcpy((char*)data, "Hello World!"); before passing it to the rle_compress function. I have uploaded some of the results to Google Drive. The results from 0-9 bypass the part of rewriting the buffer. Results from 10-19 use Mario's original code.

Even though I am generating random data bits for each iteration, I only observe some minor changes in the data buffer. However, the data being sent to the rle_compress function remains consistently the same.
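
The effect of that `strcpy` call can be reproduced in isolation. The sketch below is not the project's code: `fill_pattern` is a hypothetical stand-in for `prepare_data`, and the value of `DATA_BYTE_SIZE` is assumed because the thread does not state it. It shows that two differently prepared buffers end up with the same first 13 bytes (12 characters plus the terminating NUL) once the string is copied over them.

```c
#include <stdint.h>
#include <string.h>

#define DATA_BYTE_SIZE 32  /* assumed size; the real value is not given in the thread */

/* Hypothetical stand-in for prepare_data: fills the buffer with a seed-dependent pattern. */
static void fill_pattern(uint8_t *data, size_t len, uint8_t seed)
{
    for (size_t i = 0; i < len; i++) {
        data[i] = (uint8_t)(seed + i);
    }
}

/* The overwrite described above; the buffer must hold at least 13 bytes. */
static void overwrite_with_greeting(uint8_t *data)
{
    strcpy((char *)data, "Hello World!");
}

/* Number of leading bytes that are identical in both buffers. */
static size_t common_prefix(const uint8_t *a, const uint8_t *b, size_t len)
{
    size_t i = 0;
    while (i < len && a[i] == b[i]) {
        i++;
    }
    return i;
}
```

If the compressor only consumes the start of the buffer, an overwrite like this would make every run look the same regardless of the random input, which would be consistent with the observation above.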

@kstrohmayer
Author

> Even though I am generating random data bits for each iteration, I only observe some minor changes in the data buffer. However, the data being sent to the rle_compress function remains consistently the same.

Might be the case because he only changes the "data word" of the SPI transfer he emulates.

@kstrohmayer
Author

> One of the reasons the data in the buffer wasn't changing correctly is that Mario was rewriting the content of the buffer with strcpy((char*)data, "Hello World!"); before passing it to the rle_compress function. I have uploaded some of the results to Google Drive. The results from 0-9 bypass the part of rewriting the buffer. Results from 10-19 use Mario's original code.

Does it work now?

@adam-hrvth
Contributor

Unfortunately, not yet. The prepare_data function uses a data pointer as a buffer to store the clock, sync, data, and clear signals. However, my tests suggest that the data buffer is not handled correctly and that the wrong data reaches the compression algorithm.

---- RLE test run: 0 ----

---- Prepare Data ----
Data bits: 0x3b37
DAC data: 0x103b370
Empty Data Buffer: 00000053798FF6E8

--- Print Data from buffer ---
Data: 00000053798FF6E8
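
The 16-digit value printed for both "Empty Data Buffer" and "Data" has the shape of a 64-bit address as produced by `%p`; that is only a guess from the log, since the print statements are not shown. If it is the pointer rather than the buffer contents, a small hex-dump helper would make the actual bytes visible. `format_hex` is a hypothetical helper, not part of the project.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write the first `len` bytes of `buf` as upper-case hex into `out`.
 * Stops early if `out` is too small. Returns the number of characters
 * written, not counting the terminating NUL.
 */
static int format_hex(const uint8_t *buf, size_t len, char *out, size_t out_size)
{
    size_t used = 0;
    for (size_t i = 0; i < len; i++) {
        if (used + 3 > out_size) {      /* need two digits plus the terminator */
            break;
        }
        snprintf(out + used, out_size - used, "%02X", buf[i]);
        used += 2;
    }
    if (out_size > 0) {
        out[used] = '\0';
    }
    return (int)used;
}
```

Printing the result of `format_hex(data, n, text, sizeof text)` next to `printf("%p\n", (void *)data)` would show whether the buffer contents change between runs while the address stays the same.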

@kstrohmayer
Author

Is the required memory for the data buffer allocated?

@adam-hrvth
Contributor

> Is the required memory for the data buffer allocated?

Not in the original code. I added a memory allocation before calling the prepare_data function, data[DATA_BYTE_SIZE] = malloc(sizeof(uint8_t));, but I haven't seen any difference in the results.
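
As written, that statement appears to allocate a single byte and store the returned pointer in element `DATA_BYTE_SIZE` of `data`, which would be one past the end if `data` has `DATA_BYTE_SIZE` elements; the declaration of `data` is not shown, so this is an assumption. A minimal sketch of allocating the whole buffer instead, with an assumed `DATA_BYTE_SIZE` and a hypothetical helper name:

```c
#include <stdint.h>
#include <stdlib.h>

#define DATA_BYTE_SIZE 32  /* assumed size; the real value is not given in the thread */

/*
 * Allocate a zero-initialised byte buffer of `size` bytes.
 * Returns NULL on failure; the caller releases it with free().
 */
static uint8_t *alloc_data_buffer(size_t size)
{
    return calloc(size, sizeof(uint8_t));
}
```

A caller would then write `uint8_t *data = alloc_data_buffer(DATA_BYTE_SIZE);`, check the result for `NULL`, pass `data` to `prepare_data`, and call `free(data)` afterwards.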

@adam-hrvth
Contributor

The cntb_test skips most of the RLE algorithm by not using the prepare_data function. It generates random values and start positions and correctly returns the number of consecutive bits, in both software and hardware. However, when I ran the rle_test with the same input data as the cntb_test, I obtained different results. It seems the rle_compress function is not handling the input data correctly, leaving most bits uncompressed. I recommend a code review, and I may need additional support. The cntb and rle test results are available on Google Drive.
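
For comparing the two tests on identical input, a plain-C reference for the counting step can help. The sketch below is an assumption about what such a reference might look like: it counts how many bits, starting at `start` and moving towards the MSB, equal the bit at `start`. The real `cntb` semantics (direction, saturation, operand layout) are not specified in this thread, and `count_run` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical software model of a consecutive-bit counter.
 * Returns the length of the run of identical bits that begins at bit
 * position `start` (0 = LSB) and extends towards the MSB of `word`.
 * Returns 0 if `start` is outside the 32-bit word.
 */
static unsigned count_run(uint32_t word, unsigned start)
{
    if (start >= 32u) {
        return 0u;
    }
    uint32_t first = (word >> start) & 1u;
    unsigned count = 0u;
    for (unsigned pos = start; pos < 32u; pos++) {
        if (((word >> pos) & 1u) != first) {
            break;                      /* run ends at the first differing bit */
        }
        count++;
    }
    return count;
}
```

Logging the `(word, start, count)` triples from both cntb_test and rle_test against a model like this would show whether the two tests diverge in the counting itself or in how the start position and input word are derived.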

@kstrohmayer
Author

Hi Adam,
Let's do a call on Monday morning. I'm travelling with Gottfried to Photeon by car. So I don't know exactly when I can do the call.
I'll ping you.

@adam-hrvth
Contributor

> Hi Adam, Let's do a call on Monday morning. I'm travelling with Gottfried to Photeon by car. So I don't know exactly when I can do the call. I'll ping you.

That's fine, but I think it would be better to do this either in person or when I can share my screen so we could go through my test results.

@adam-hrvth
Contributor

adam-hrvth commented Dec 4, 2023

Update: Moved the comment. I mistakenly posted this comment to the wrong issue.

Summary of changes
I have made improvements to the data preparation and loading functions to address issues with loading the data buffer for the encoding process.

There were multiple issues in the encoding process, including incorrect count values, faulty buffer loading, and inaccuracies in tracking the start position and bit values. The incorrect max count values left most signals uncompressed, and the uncompressed signals weren't loaded correctly into the buffer. Inaccuracies in the start position were leading to missing bits and incorrect count values. The handling of last bits between blocks also caused further errors in the count values.

To address these issues, I made changes to the counting function, allowing us to count a total of 32 values, from 0 to 31. The tracking of the start position for cntb was also changed to correctly read all bits for each signal and block. It can now accurately count consecutive bits, determine if the signal is compressed, and update the bit values accordingly.

However, there's a potential issue with the read function. This function returns the encoded data from a bitstream. While it correctly returns the count value, there might be an error in retrieving the bit value and the not_compressed value. This error could affect the decoding process as well. My next steps involve investigating whether the problem lies within the read function or the way we handle the encoded data buffer.
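
One way to narrow down whether the read function or the buffer handling is at fault is a write/read round trip on a small bitstream. The sketch below assumes a token layout of 1 bit `not_compressed`, 1 bit value and a 5-bit count (0 to 31), written LSB first. That layout, and the names `bitstream_t`, `bs_write`, `bs_read`, `token_write` and `token_read`, are hypothetical; the project's actual format is not described in this issue.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *buf;      /* backing storage, zero-initialised by the caller */
    size_t   bit_pos;  /* next bit to write or read */
} bitstream_t;

/* Append the low `nbits` bits of `value`, LSB first. No bounds checking. */
static void bs_write(bitstream_t *bs, uint32_t value, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; i++, bs->bit_pos++) {
        if ((value >> i) & 1u) {
            bs->buf[bs->bit_pos / 8] |= (uint8_t)(1u << (bs->bit_pos % 8));
        }
    }
}

/* Read `nbits` bits (at most 32), LSB first. No bounds checking. */
static uint32_t bs_read(bitstream_t *bs, unsigned nbits)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < nbits; i++, bs->bit_pos++) {
        uint32_t bit = (uint32_t)(bs->buf[bs->bit_pos / 8] >> (bs->bit_pos % 8)) & 1u;
        value |= bit << i;
    }
    return value;
}

/* Assumed token: flag, bit value, and a 5-bit count in the range 0..31. */
typedef struct {
    uint8_t not_compressed;
    uint8_t bit;
    uint8_t count;
} token_t;

static void token_write(bitstream_t *bs, token_t t)
{
    bs_write(bs, t.not_compressed, 1);
    bs_write(bs, t.bit, 1);
    bs_write(bs, t.count, 5);
}

static token_t token_read(bitstream_t *bs)
{
    token_t t;
    t.not_compressed = (uint8_t)bs_read(bs, 1);   /* same field order as token_write */
    t.bit            = (uint8_t)bs_read(bs, 1);
    t.count          = (uint8_t)bs_read(bs, 5);
    return t;
}
```

If a round trip like this passes for every field while the real decoder still sees wrong bit or `not_compressed` values, the field order or widths used by the real read function would be the first thing to compare against the writer.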

@adam-hrvth
Contributor

I have updated the RLE algorithm to achieve a better compression ratio. The algorithm no longer relies on the not_compressed flag to store or decompress data correctly. This change reduces the number of bits stored in the bitstream/memory and therefore improves the compression ratio. With the given sample data, the following reductions have been achieved:

  • CLK: 64 bits reduced to 12 bits, an 81.25% reduction in size.
  • SYNC: 64 bits reduced to 48 bits, a 25% reduction in size.
  • DATA: 64 bits reduced to 42 bits, a 34.38% reduction in size.
  • CLR: 64 bits reduced to 28 bits, a 56.25% reduction in size.

It's important to note that these results are data-dependent: they depend on how many consecutive identical bits each signal contains. The CLK signal shows the best result because it consists entirely of ones.
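
The percentages in the list follow from the bit counts as 100 * (1 - compressed / original). A minimal sketch of that calculation, with `reduction_percent` as a hypothetical helper name:

```c
/*
 * Size reduction in percent for one signal.
 * A negative result means the encoded form is larger than the input.
 * Returns 0 for an empty input to avoid dividing by zero.
 */
static double reduction_percent(unsigned original_bits, unsigned compressed_bits)
{
    if (original_bits == 0u) {
        return 0.0;
    }
    return 100.0 * (1.0 - (double)compressed_bits / (double)original_bits);
}
```

For the CLK figures above, `reduction_percent(64, 12)` gives 81.25, and for CLR, `reduction_percent(64, 28)` gives 56.25.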

@adam-hrvth changed the title from "Refactoring of C code for encoding" to "Refactoring of C code for encoding and decoding" on Jan 3, 2024
@adam-hrvth
Contributor

adam-hrvth commented Jan 9, 2024

The current implementation of the RLE algorithm, running on the ULX3S FPGA, gives the following results, measured with a Saleae Logic 8 logic analyser.
In the following test cases, only the DATA signal differs, to make the comparison easier. In test case three, the signal changes rapidly, so the algorithm compresses it less effectively; the encoded DATA signal is in fact larger than the input. This also increased the run time and lowered the relative improvement of the version with the custom instruction over the version without it.

Test Case 1

| Signal | Uncompressed bits | Compressed bits | Reduction in size |
|---|---|---|---|
| CLK | 64 | 12 | 81.25% |
| SYNC | 64 | 18 | 71.88% |
| DATA | 64 | 42 | 34.38% |
| CLR | 64 | 18 | 71.88% |
| Overall | 256 | 90 | 64.84% |

| Performance improvement | With custom instruction | Without custom instruction |
|---|---|---|
| RLE run time (ms) | 66.05704 | 74.02218 |
| Improvement in ms | 7.96514 | / |
| Improvement in percentage | 10.76% | / |

Test Case 2

| Signal | Uncompressed bits | Compressed bits | Reduction in size |
|---|---|---|---|
| CLK | 64 | 12 | 81.25% |
| SYNC | 64 | 18 | 71.88% |
| DATA | 64 | 45 | 29.69% |
| CLR | 64 | 18 | 71.88% |
| Overall | 256 | 93 | 63.67% |

| Performance improvement | With custom instruction | Without custom instruction |
|---|---|---|
| RLE run time (ms) | 66.84003 | 74.83414 |
| Improvement in ms | 7.99411 | / |
| Improvement in percentage | 10.68% | / |

Test Case 3

| Signal | Uncompressed bits | Compressed bits | Reduction in size |
|---|---|---|---|
| CLK | 64 | 12 | 81.25% |
| SYNC | 64 | 18 | 71.88% |
| DATA | 64 | 73 | -14.06% |
| CLR | 64 | 18 | 71.88% |
| Overall | 256 | 121 | 52.73% |

| Performance improvement | With custom instruction | Without custom instruction |
|---|---|---|
| RLE run time (ms) | 80.85817 | 89.21094 |
| Improvement in ms | 8.35277 | / |
| Improvement in percentage | 9.36% | / |
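
The derived figures in these tables can be recomputed from the raw numbers. The sketch below uses hypothetical helper names and assumes that the overall reduction is taken over the summed bits of all four signals and that the improvement is expressed relative to the run time without the custom instruction; both assumptions reproduce the 64.84% and 10.76% reported for Test Case 1.

```c
#include <stddef.h>

/*
 * Overall size reduction in percent across `n` signals that each had
 * `bits_per_signal` uncompressed bits.
 */
static double overall_reduction_percent(const unsigned *compressed_bits, size_t n,
                                        unsigned bits_per_signal)
{
    unsigned total_compressed = 0;
    for (size_t i = 0; i < n; i++) {
        total_compressed += compressed_bits[i];
    }
    double total_original = (double)n * (double)bits_per_signal;
    if (total_original == 0.0) {
        return 0.0;
    }
    return 100.0 * (1.0 - (double)total_compressed / total_original);
}

/* Run-time improvement in percent, relative to the slower (baseline) run. */
static double runtime_improvement_percent(double with_ms, double without_ms)
{
    if (without_ms <= 0.0) {
        return 0.0;
    }
    return 100.0 * (without_ms - with_ms) / without_ms;
}
```

With the Test Case 1 values, the compressed sizes 12, 18, 42 and 18 sum to 90 of 256 bits, and the run times 66.05704 ms and 74.02218 ms differ by 7.96514 ms.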

3 participants