Add support for AssemblyScript #35
That's a good question! One challenge with AssemblyScript is that the code can no longer be easily extended or modified from the JavaScript realm, and communication between JS and the AssemblyScript code can be quite costly, so we'll want to keep it to a minimum. For starters, I'd suggest creating a version of the demo project that works with AssemblyScript, so we can compare the performance. If AssemblyScript does bring a notable performance improvement that can't be achieved by tuning the TS code, we can probably find a way to include an AssemblyScript binary in the releases, either in this repo or in a dedicated repo. Once we have more data about the performance, we can consider the best course of action.
This sounds like a good plan. I will take a look at it!
Thanks! Another idea that I had would be to come up with an AVR → WebAssembly compiler, that is, to convert the raw AVR binary into WebAssembly code that does the same, so we don't have to pay the overhead of decoding each instruction while the program is executing, and perhaps the JIT will be able to do a better job at optimizing the generated code.
Oh wow, this sounds really interesting! The question would be how to integrate the peripherals.
That's a good question. I'd imagine having a bitmap or so that will indicate which memory addresses are mapped to peripherals. Whenever you update a memory address, you'd check the bitmap. If it has a peripheral mapped to it, then you'd call a WebAssembly function that will resemble the
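The bitmap dispatch described above could be sketched roughly like this. All names (`peripheralBitmap`, `writeData`, `onPeripheralWrite`) are illustrative, not part of the avr8js API:

```typescript
// Hypothetical sketch: one bit per data-memory address marks whether a
// write should be bridged out to a peripheral handler.
const DATA_SIZE = 0x900; // ATmega328p data space size, for illustration
const data = new Uint8Array(DATA_SIZE);
const peripheralBitmap = new Uint8Array(DATA_SIZE >> 3); // 1 bit per address

type PeripheralHook = (addr: number, value: number) => void;
let onPeripheralWrite: PeripheralHook = () => {};

function mapPeripheral(addr: number) {
  peripheralBitmap[addr >> 3] |= 1 << (addr & 7);
}

function writeData(addr: number, value: number) {
  data[addr] = value;
  // Fast path: most addresses are plain RAM, so a single bit test decides
  if (peripheralBitmap[addr >> 3] & (1 << (addr & 7))) {
    onPeripheralWrite(addr, value); // bridge out to the peripheral code
  }
}
```

The idea is that ordinary RAM writes stay on the fast path and only the rare peripheral-mapped addresses pay for the cross-boundary call.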
That should work. Will the peripherals be in WebAssembly too? In that case, everything would run in WebAssembly, except the visuals, which have to stay in JavaScript. But then we again have the problem that all peripherals have to be converted to WebAssembly :/.
We could also mix and match. I believe stuff like the timers, which have to run constantly (in some cases after every CPU instruction or so), will have to live in WebAssembly land. But some other, less frequently used peripherals could possibly be bridged over to JavaScript land. There's definitely much to explore here...
The thing is, it always depends on the workload. So yes, it is really interesting, but we need to start exploring at some point. Maybe we will reach a point where we can say we are already faster than ever expected. On the whole, the CPU doing the simulating will always be a lot faster than the simulated MCU; the gain from optimization is mainly for slow end devices. So, should we create a plan for where to start with testing and exploring the possibilities of each approach?
Yes, and as you say, it's good to have some baseline to compare to. Right now, the JS simulation has some things that can already be improved (e.g. the lookup table for instructions), and it runs pretty okay on modern hardware - achieving between 50% speed on middle-range mobile phones and 160%+ speed on higher-end laptops. However, lower-end devices (such as the Raspberry Pi) only achieve a simulation speed of 5% to 10%. So there is definitely room for improvement, especially if we consider the use case of simulating more than one Arduino board at the same time (e.g. two boards communicating with each other).
Ok, yes. For playgrounds etc. it would be really fun and interesting to have multiple boards running at once. So if this use case ends up locked to higher-end devices, we need to improve it. I am actually not familiar with the benchmark code. Can the benchmark run under all the mentioned approaches, or do we need a new benchmark to make a meaningful comparison?
The current benchmark is pretty minimal - it runs a single-instruction program many, many times to compare different approaches for decoding. I think a better benchmark would need:
I'd probably start with just the 1st, simpler benchmark, to get a feeling for whether the direction seems promising, and if it is, then we can devise a more extensive benchmark that will allow us to do a comprehensive comparison. What do you think?
Starting with the 1st should be the best approach. Should I start by looking at AssemblyScript, or at WebAssembly directly? With WebAssembly directly we also have the decision between AVR instruction translation and full interpretation (like JavaScript) in WebAssembly. Maybe we can focus on the most needed assembler instructions, to reduce the number of initial instructions and focus the benchmark on those. Then we can see first results faster and decide afterwards where to dig deeper, or whether we can already see a clear winner.
I believe that a WebAssembly interpreter (written in C or Rust) wouldn't be much different from AssemblyScript, but it's pretty easy to write one or two instructions, as you suggest, and compare the generated WAT (WebAssembly Text) between the different implementations. Ideally, if there is no significant difference, using AssemblyScript means we can probably keep one code base, which is preferable. Here are some useful resources:
Hey, I've been summoned. Yeah, I'm following along 😄
Yes, I agree. If there is no real breaking point that requires translating the TS code, we should stay with TS and use the AssemblyScript compiler.
And hello @gfeun. Nice to meet you!
Yes, definitely, sounds like a good plan. It's going to be interesting :)
Hey, I've created 3 WebAssembly Studio (WAS) workspaces for C, Rust and AssemblyScript. The code and folder structure is based on the empty template workspaces for each language. I've tried to bring them all together in one workspace, but that is more complicated. I know the example functions are not really hard to interpret differently. So can you take a look at it and tell me what you think? What would be a good example function to implement to find out something more meaningful? Currently, my opinion is that even if AssemblyScript turns out to have an outstanding performance difference at some point, we would always have the possibility to implement that specific feature in a different language. (To see the For some reference, you can access https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format.
Oh, wow. I just now realized that all compilers have pre-evaluated the method calls in the
Yes, the compiler seems to be very good at optimizing - it inlines the functions and then also precalculates the result of the expression. Pretty smart! So far, it seems like for basic arithmetic we get roughly the same number of opcodes. What I'd try next is to implement a complete opcode: e.g. define an array to hold the program data, and then implement an opcode that also reads and updates the data memory, such as Does that make sense?
I would say it makes sense 😄. I will first try it with the AssemblyScript version and do some copy and paste from the original code. After that, I will convert it to the other two. So, for correct understanding:
Indeed, we'd need some basic opcode decoding to extract the target registers out of
Ok, I will do so.
Tonight I had another idea: would it be possible to add some async command prefetching to reduce the time for opcode decoding?
Yes, we can either create a different repo (if the code is entirely different), or a new branch here, in case the code is still the same (or auto-generated from the current code, like I did with the benchmark). As for the IDE, that makes sense. I use VSCode for this repo, so if you open it with VSCode you should get a list of recommended extensions (prettier, eslint, etc). What do you mean by async command prefetching?
I currently mean only for testing things, so the example AssemblyScript, Rust and C code. For the "real" implementation I would prefer a different branch, with the target of bringing it to master. I will try to get comfortable with VSCode for dev purposes ;D. We already discovered the problem with the big if-else statement. The idea would be to evaluate this statement for the next opcode asynchronously. But this requires bringing the code into a format where this is possible.
This is the basic algorithm:
So WASM can export a function that gets the number of cycles, and runs the code for that many cycles (more or less, it doesn't have to be super accurate), then yields control back to JS.
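A minimal sketch of that driver loop, with a stub standing in for the compiled module's exports (the `runCycles` name and signature are assumptions, not an existing API):

```typescript
// Sketch of the JS driver: the compiled module would export runCycles(n);
// here a stub stands in for the real WebAssembly instance exports.
interface SimExports {
  runCycles(cycles: number): number; // returns cycles actually executed
}

function makeStub(): SimExports {
  return { runCycles: (cycles: number) => cycles };
}

const CLOCK_HZ = 16_000_000; // ATmega328p clock frequency

// Run roughly `wallMillis` worth of simulated time, then yield back to JS;
// the caller would reschedule via setTimeout / requestAnimationFrame.
function step(sim: SimExports, wallMillis: number): number {
  const cycles = Math.floor((CLOCK_HZ * wallMillis) / 1000);
  return sim.runCycles(cycles);
}
```

The point is that the WASM side never blocks for long: it executes a bounded cycle budget and returns, so JS-side peripherals and the UI stay responsive.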
Ok. So the HEX files are compiled to WASM, resulting in a WASM program containing all the instructions of the program. Is this correct? I feel like I'm stuck somewhere.
Yes. Then, this may open the door for further optimizations (e.g. skipping costly flag calculation if the next instruction discards the flags anyway), but let's first see if we can get the basic thing going and how much faster it is.
Ok. Does this approach require a rework of the CPU class? I think the first step should be trying to compile the project to WASM, and then adding the compilation of the HEX files. Or do you think another way is better?
This may need a lot more changes than just emitting WASM code directly. You may find the AVR to JS compiler experiment useful as a reference. If I remember correctly, I started from The code that decodes the opcodes and their args will probably stay the same, but the part where you generate the code will probably be different. Probably something along the lines of: ...
```js
} else if ((opcode & 0xfe08) === 0xf800) {
  /* BLD, 1111 100d dddd 0bbb */
  const b = opcode & 7;
  const d = (opcode & 0x1f0) >> 4;
  /* something that will generate the WASM equivalent of

     data[d] = (~(1 << b) & data[d]) | (((data[95] >> 6) & 1) << b);

     i.e. convert the following pseudo-code into WASM:

     temp1 ← data[d]
     temp1 ← temp1 & ~(1 << b)
     temp2 ← data[95]
     temp2 ← temp2 >> 6
     temp2 ← temp2 & 1
     temp2 ← temp2 << b
     data[d] ← temp1 | temp2

     where each line in the above code is probably a single or two WASM
     instructions, and `~(1 << b)` is actually a constant (because we know
     the value of b at compile time) */
}
...
```

I hope this is helpful!
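As an illustration of what "generate the WASM equivalent" could look like, here is a hedged sketch that emits WAT text for BLD following the pseudo-code above. It assumes `data` lives at offset 0 of linear memory (so `data[d]` is a load/store at address `d`, with 95 being SREG's data-space address); the `emitBld` function is made up for this sketch:

```typescript
// Hypothetical WAT emitter for BLD. Assumes the `data` array starts at
// offset 0 of linear memory, so data[d] is a byte access at address d.
function emitBld(opcode: number): string {
  const b = opcode & 7;
  const d = (opcode & 0x1f0) >> 4;
  const mask = ~(1 << b) & 0xff; // constant-folded: we know b at compile time
  return [
    `(i32.store8 (i32.const ${d})`,
    `  (i32.or`,
    `    ;; temp1 = data[d] & ~(1 << b)`,
    `    (i32.and (i32.load8_u (i32.const ${d})) (i32.const ${mask}))`,
    `    ;; temp2 = ((data[95] >> 6) & 1) << b  (the SREG T flag)`,
    `    (i32.shl (i32.and (i32.shr_u (i32.load8_u (i32.const 95)) (i32.const 6)) (i32.const 1)) (i32.const ${b}))))`,
  ].join('\n');
}
```

Note how `b`, `d` and the mask all become `i32.const` immediates in the emitted code, which is exactly the constant-folding opportunity the comment above describes.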
Thank you. I will take a deeper look at it later this week.
Hi @urish, maybe you have some time so that we can talk about it in an online meeting? Best regards!
Hi Derio! Great to read that you've made some progress! Feel free to book some time on my calendar at https://urish.org/calendar
@Dudeplayz are you joining the call?
@urish I am on the way, 3 min. Sorry!
Alright, I'm waiting :-)
@urish thanks for the talk this week. I have found the reason for the failing opcodes. It is mainly caused by an implicit type cast from u16 to u32: the u16 wraps around and is then cast to u32, which doesn't wrap around again because it can handle larger numbers. The discussion directly in AssemblyScript can be found here: AssemblyScript/assemblyscript#2131. It seems that I found a bug in AssemblyScript after all.
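This class of bug can be illustrated in plain TypeScript. Since JS numbers don't wrap on their own, the `u16`/`u32` helpers below emulate AssemblyScript's truncation semantics; they are illustrative, not the portability library's API:

```typescript
// Emulating AssemblyScript integer truncation with plain JS bitwise ops.
const u16 = (v: number) => v & 0xffff; // truncate to 16 bits, like AS u16
const u32 = (v: number) => v >>> 0;    // truncate to 32 bits, like AS u32

// A 16-bit value at its maximum, e.g. a stack pointer (illustrative):
const sp = 0xffff;
const withU16 = u16(sp + 1); // u16 arithmetic wraps around to 0
const withU32 = u32(sp + 1); // widened to u32 first: no wrap, 0x10000
```

The two results differ, which is exactly the discrepancy described: code that relied on u16 wraparound silently changes behavior once the value is widened to u32.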
And here is a link for the portability which you have mentioned. The typecasting is then a little bit different.
I got it! The test program now runs without discrepancies. I also had to update the instruction.ts file, which I had skipped due to the assumption that it hadn't changed, but that was wrong. Here is the instruction.ts file with the applied fixes using the portability approach. If you like, we can merge it into the main project soon. The changes can be found here: wokwi/avr8js-research@42f0b39. The only thing I haven't tested yet is how to import the portability AS library in normal TS. I am now trying to get the jest unit tests running, which are throwing some errors, due to my node.js environment I think.
Hi Dario! Congrats on spotting the issue. Most of the instructions are covered by the unit tests, but not all of them. But I think if the unit tests pass and Blink also eventually works, that's already a very good starting point. Thanks for sharing the modified instructions.ts. I merged your changes into a working branch, as-interop. I added an implementation of the u16/u32 functions, so the code still compiles/runs correctly with TypeScript. See commit b81a21d. If it looks good to you, I'm ready to merge it into master.
I'm having some trouble getting jest running with the assemblyscript/loader; I'm working on getting the unit tests to run. Once I get them working, or if I get stuck, I will try to get the timer working for testing the blink program. I had a look, and the only thing I am not sure about is the import of the types file. The docs don't describe well how to get the portability working; they throw some statements at you and mention some projects to look at. Maybe we should wait with merging the AS things until I have a few more parts finished. I think we have to extend the AssemblyScript library or import it. I had a look at the portability class, and the functions it provides are designed to transform any number in a way that matches the behavior in AS/WASM (overflows, trimming, etc.). So if we start with that, it means WASM would be the preferred/limiting target. It would also be nice if you could mention my contribution somewhere; at the moment it wouldn't be clear, as you make the commits. I hope that is ok :)
If you get hung up on it for too long, remember you can always run the tests without jest. You'd need to create some kind of
I'm not sure - so do you advise merging or waiting? In general, there are two parts which are very sensitive to performance: the instructions and the count() function in the timers. If we introduce code that uses the compatibility library, we need to make sure it doesn't impact performance. My
Of course. I added a comment with your name at the top of instructions.ts. And if you feel like making the commit under your name - then sure, go for it. Then I can merge your commit in place of mine.
Thanks for the hint! I will do this now. I was very busy the last 2 weeks so I had to pause a bit.
If you don't plan to work on the instructions in the next weeks, we can merge. Otherwise, we could run into the problem that the compatibility with AS can't be tested until we have copied/merged it into the research project, to see if the compiler is fine with it.
Ok, that would be nice, so I will make my own commit. I still have the problem that I am not familiar with the GitHub merge process 😅 So let's wait a bit until I get the tests running, and then I will try to create a merge request.
Sounds like a plan! In general, merge can be done in a few ways. The most straightforward one is when your branch places all the commits on top of the last commit in master, then these commits are simply copied over. Otherwise, there are a few options when I merge:
There's a nice book from @PascalPrecht that explains this, in case you want to better understand the process: https://rebase-book.com/
Hi @urish, I wish you a happy new year!
Happy new year Dario, good to have you back!
Probably just a selection. I'm not even sure all instructions are covered.
I'd try to reason about the overflow issues you find, and just look which other instructions might be prone to them. Then we can come up with a few additional test cases to cover these cases specifically. Or -
That could work. In terms of memory size, the difference between u8 and u32 should be negligible. So I see no reason not to go with u32. JavaScript uses doubles internally, but if I'm not wrong, they are truncated to i32/u32 when you apply bitwise operations (depending on the operation). So I say - go for it, and let's find out :)
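That truncation behavior is easy to check. The sketch below shows that JS bitwise operators coerce their operands to 32-bit integers (signed for `|`, unsigned for `>>>`):

```typescript
// JS numbers are doubles, but bitwise operators first convert operands
// to 32-bit integers (ToInt32 / ToUint32 in the spec).
const big = 2 ** 32 + 5;   // does not fit in 32 bits
const asI32 = big | 0;     // ToInt32: high bits dropped, leaves 5
const asU32 = -1 >>> 0;    // ToUint32: reinterpreted as 4294967295
```

This is why u32-style instruction code maps naturally onto JS numbers, as long as every arithmetic result passes through a bitwise operation before it is stored.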
Thanks :)
I have already added some tests for ADD and the SUB variants. I would leave it as it is for now. I think I have discovered most of the reasons for the failing commands - always something with wrong int sizes. Later we should add the rest of the tests. The best approach here is also to cover the edge cases where the underlying data type could cause errors.
The problems I have discovered and solved so far are:
In the fixing process, I looked over and reconsidered some of the commands. I haven't found another cause so far. If you have an idea which command could break, name it and I'll check it.
For now, I think the commands are mostly fixed. The change to u32 is only possible inside the instructions. But there are still some drawbacks. The main problem is that the
Another thing I came across: could we refactor all data array accesses to use the DataView, so that the data type is always made clear by the access method on the data view? Or is the overhead too high? For now, I will continue getting cpu.spec.ts running. This entails implementing the rest of the CPU class, which I had skipped for the callbacks etc., because some glue code is required there as well. In the future, I think we should also update the peripherals to be compiled directly.
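The DataView refactor described above could look roughly like this; the accessor names are made up for the sketch, not the avr8js API:

```typescript
// One backing buffer with width-explicit accessors, so every call site
// states the data type it reads or writes. Names are illustrative.
const buffer = new ArrayBuffer(0x900);
const view = new DataView(buffer);

const readU8 = (addr: number) => view.getUint8(addr);
const writeU8 = (addr: number, v: number) => view.setUint8(addr, v);
// AVR is little-endian, so 16-bit accesses pass littleEndian = true:
const readU16 = (addr: number) => view.getUint16(addr, true);
const writeU16 = (addr: number, v: number) => view.setUint16(addr, v, true);
```

A nice side effect is that 16-bit register pairs (like X/Y/Z or the stack pointer) can be read in one call instead of composing two byte reads by hand.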
Makes sense
Good question. From my experience (I did some profiling a few months ago), it varies greatly between browsers. If I recall correctly, Chrome optimizes DataView access, while Firefox doesn't (I think that's one of the reasons Wokwi runs faster on Chrome). But I guess you could do some quick profiling and see if that has changed since.
When you get to that, the timers and SPI are good candidates. The timers are pretty complex (so they might take a significant amount of work to compile), but they can run as often as every CPU clock tick (so they can greatly affect performance). SPI, on the other hand, is much simpler, but when it's used it can run as fast as half the clock speed, so it can also become a bottleneck.
I will see if I can test it. If not, it shouldn't be a problem; it's just an optimization to unify the calls and make the code a bit cleaner. So I will take a look at the end, if there is time left.
Thanks for the hint! I will do this. I am currently adding the interop for the eventCallbacks of the CPU; then the CPU is finished, including the glue code for it. Now to the next problem I have discovered: closures are currently not supported in AssemblyScript. Because functions can't be passed directly between the JS and WASM worlds, my plan was to use some glue methods that take a callbackId; the WASM code then calls with this id, and the JS code executes the callback on the JS side. Here is my current code:

```ts
// declared JS function to be called from WASM
export declare function callClockEventCallback(callbackId: u32): void;

// exported WASM function to be called from the JS side
export function addClockEvent(cpu: CPU, callbackId: u32, cycles: u64): AVRClockEventCallback {
  return cpu.addClockEvent(() => callClockEventCallback(callbackId), cycles);
}
```

The compilation fails with:

```
ERROR AS100: Not implemented: Closures
return cpu.addClockEvent(() => callClockEventCallback(callbackId), cycles)
```
The implementation status is still open: AssemblyScript/assemblyscript#798.
What do you think? Is one of these the way to go, or do you have another idea or hints?
A similar problem exists with the hooks. Every array access where an anonymous function is passed is unusable, because such functions can't be passed between JS<->WASM. Currently I only see a solution in adding methods for it, like
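On the JS side, the callbackId approach could be backed by a simple registry, so closures only ever live in JS and the WASM module only sees numeric handles. A sketch, with made-up names:

```typescript
// JS-side registry of callbacks keyed by numeric id. The WASM module
// would import callClockEventCallback and invoke it with a stored id.
type ClockEventCallback = () => void;
const callbacks = new Map<number, ClockEventCallback>();
let nextId = 0;

function registerCallback(cb: ClockEventCallback): number {
  const id = nextId++;
  callbacks.set(id, cb);
  return id; // this u32 handle is what crosses the JS<->WASM boundary
}

// Imported by the WASM module; dispatches back to the JS closure.
function callClockEventCallback(callbackId: number): void {
  callbacks.get(callbackId)?.();
}
```

This sidesteps the AS100 closure limitation entirely, at the cost of one boundary call per event; a matching `unregisterCallback` would be needed to avoid leaking entries.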
Hello @urish,
Thank you very much, Dario! I think the next step would be, as you suggested, to benchmark the performance improvements. I would start with some computationally heavy program, like calculating pi or the Mandelbrot set. Also, I recently came across this great article diving into the minutiae of JS vs WebAssembly performance, and how to specifically optimize AssemblyScript code. After reading it, I'm not very optimistic that AssemblyScript will get us a big performance gain - if we are lucky, it might be somewhat faster than JavaScript. If that's the case, we can instead look at translating the AVR opcodes into WebAssembly, which may get us a much more significant performance gain (as we won't have to decode every instruction again and again). In any case, having a benchmark (or a few) will be very helpful to guide our way. So I suggest focusing on that, and once we're certain about the direction, we can start looking into the details of how to package everything (e.g. assemblyscript/loader).
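A computationally heavy benchmark of the kind suggested above could be as small as this (the Leibniz series for pi; the iteration count is arbitrary and the function names are made up):

```typescript
// Minimal CPU-bound benchmark candidate: approximate pi with the Leibniz
// series and time it. The same source could be compiled with both the
// TypeScript and AssemblyScript toolchains for comparison.
function leibnizPi(terms: number): number {
  let sum = 0;
  for (let i = 0; i < terms; i++) {
    sum += (i % 2 === 0 ? 1 : -1) / (2 * i + 1);
  }
  return 4 * sum;
}

function bench(terms: number): { pi: number; millis: number } {
  const start = Date.now();
  const pi = leibnizPi(terms);
  return { pi, millis: Date.now() - start };
}
```

Running the same loop on the simulated AVR and comparing simulated to wall-clock time would then give a speed ratio directly comparable to the 50%/160% figures mentioned earlier in the thread.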
Closing due to inactivity. |
Hey Uri,
We have already spoken about the possibilities for speeding up the simulation. I am interested in adding support for AssemblyScript. I have followed the recent discussions and speed-ups. The question is whether you still think it can bring some performance gains, even after the enhancements that were made.