-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lednew #16
base: main
Are you sure you want to change the base?
Lednew #16
Conversation
I included more optimization from Rodorigo's sites. I see more possibilities around
Consideration of br** command shorter than rjmp is a factor. Since these needs to be checked in assembled code log, I am not yet merging them. (Besides they tends to cause merge conflicts) |
I now included all code shrinks. (More than enough for LED). Mostly Sergey's idea but I made 6 bytes shrink using Y+ for WDT. No rush except one of any patches to get under 512 for any use case. |
Guys, I love the passion that you have working on optimizing this bootloader. It is such a pleasure to (silently) look at how it improves. Great! |
Sorry ... I had a stupid mistake. I used "git push -f". If you ever pulled today before reading this, please clone fresh. |
I think I covered every optimization I did so far by now. Many of them are borrowed from Sergey. Since this series of changes were extensive, there may be some errors in my merge operations. Please check before accepting all. At least this is a linear history.
Cheers! |
I tested on Teensy-type board and lednew did not work. Oops, my last significant commit had a bug of dropping
I will update tree with forth push. |
Hi, Tested with Teensy compatible PCB. It works. I used string-descriptors branch from https://github.com/sigprof/nanoBoot/
Product name seems to use 2 bytes per character. So I trimmed product name to "nanoBt" (4 bytes savings) Now all platform (including promicro type LED) should fit into 512 bytes. I think this is the best we can do for 512 bytes. |
Instead of looping until one of two bits in UEINTX is set, and then retesting the value after the loop, just do `reti` inside the loop if `RXSTPI` is set. This saves 4 bytes by removing both a duplicate bit test instruction and an extra jump instruction.
The `wait_TXINI` and `wait_RXOUTI` subroutines were always called after a call to some other `clear_XXX` subroutine to clear a bit in `UEINTX`. Replace those two subroutines with `clear_bit_and_wait_TXINI` and `clear_bit_and_wait_RXOUTI`, which get the bit to be cleared as a parameter in `r17`, and then inline the remaining `clear_XXX` subroutines (a subroutine with just 2 instructions in its body actually takes more space than inline code if it is called less than 3 times).
Rewrite the part of the GET_DESCRIPTOR handling code that loads the address and length of the requested descriptor to use less jumps; the resulting code consumes 2 bytes less, even though it is actually more correct (no longer replies with some descriptor to requests for an unknown descriptor type). The new code also avoids hardcoding the high address byte, and no longer depends on the fact that all descriptors have the same high address byte (but it depends on the fact that all offsets between adjacent descriptors can fit into the `adiw` constant argument (0...63)).
Signed-off-by: Osamu Aoki <[email protected]>
The original LUFA code performed `UDCON |= (1 << DETACH);` to detach the USB device; however, the other bits of `UDCON` are known to be 0 at this time, therefore a simple write of a constant value could be performed here. In addition, the value of `(1 << DETACH)` is 1 on all AVR chips that could be potentially supported by this code, therefore even the instruction to load the constant value into a register could be omitted. Doing these changes removes 2 instructions, saving 4 bytes.
The initialization of USBCON does not need to read the current register value - just writing the reset value into that register is enough. This removes one instruction, saving 2 bytes. In theory this write could even be omitted completely, saving 4 more bytes, but this would make the code less robust if the application code attempting to enter the bootloader does not initialize some USB controller registers properly, therefore leaving that USBCON write there is safer.
The instruction which applied the mask of `(CONTROL_REQTYPE_TYPE | CONTROL_REQTYPE_RECIPIENT)` to bmRequestType was not actually needed, because the value of this mask is 0x7F, and the only bit which is not covered by the mask (0x80) is already known to be 0, therefore the masking did not actually change anything. Removing this instruction saves 2 bytes. Signed-off-by: Sergey Vlasov <[email protected]> Since following commits are skipped previously when cherry-picking. 97c8d96 ("TEMPORARY: EEPROM code which does not fit", 2021-08-25) e48801a ("WIP: Optimize some more code to make the EEPROM support fit", 2021-08-25) 78b804d ("WIP: Redo the EEPROM write implementation", 2021-08-29) e7e94f3 ("Save 2 bytes by reusing the TXINI clearing mask", 2021-08-29) Adjusted to keep branch to thunk code. Signed-off-by: Osamu Aoki <[email protected]>
The DEVICE_TO_HOST handling code needs to handle only the GET_DESCRIPTOR requests, which may come with two possible bmRequestType values: - 0x80 (IN direction, USB Standard Request, Recipient is the device) if a descriptor which applies to the device as a whole is requested (this code is used for the device and configuration descriptors); - 0x81 (IN direction, USB Standard Request, Recipient is the interface) if a descriptor specific to a particular interface is requested (this code is used for the HID class and HID report descriptors). Because these codes are numerically sequential, and the direction bit has already been tested (therefore it is known that bmRequestType >= 0x80), it is enough to make a single comparison with 0x82 to detemine whether bmRequestType has one of the above values. Removing the bit masking operation which is not needed after that change saves 2 bytes.
Signed-off-by: Osamu Aoki <[email protected]>
Access code to data in the extended IO address range can be made compact by using Y+. Both WDT and USB registers are in extended IO address. Fortunately, calls to these 2 types of addresses do not overlap. * initial code calls WDT * main code calls USB * exit code calls WDT This patch enables to use Y+ for WDT in addtion to USB. * YH is common to USB and WDT and 0 (very first). * YL is different. It's initialization is moved to each subroutine to avoid code duplication. * inside of the set_watchdog_timer subroutine * USB Initialization section Signed-off-by: Osamu Aoki <[email protected]>
Signed-off-by: Osamu Aoki <[email protected]>
No size or behavior changes, just a preparation to save some bytes with fallthrough. Cherry picked from Sergey's repo LED code properly moved. Signed-off-by: Osamu Aoki <[email protected]>
Restructure conditional jumps to use fallthrough for the `SET_HID_REPORT` case; this removes one jump instruction, saving 2 bytes. Cherry picked from Sergey Vlasov <[email protected]>: a358e89 ("Save 2 bytes by using fallthrough for SET_HID_REPORT", 2021-08-24) Comment by Osamu (conflict resolution) UNHANDLED_SETUP_REQUEST_1 was used since br command PC(word) offset is 6 bit offset address 00 00007ede first call ending up at UNHANDLED_SETUP_REQUEST This requires thunk to reach with br** command 4A 00 00007f28 old UNHANDLED_DEVICE_TO_HOST entry ending up at UNHANDLED_SETUP_REQUEST 5E 14 00 00007f3c UNHANDLED_SETUP_REQUEST_1 entry ending up at UNHANDLED_SETUP_REQUEST !B8 6E 5A 00007f96 UNHANDLED_SETUP_REQUEST entry br** commands 6 bit offset PC offset (64) +/- 2^7 byte offset +7E, -80 (126) rjmp command 11 bit offset PC offset (2K) +/- 2^12 byte offset +FFE, -1000 (4094) Signed-off-by: Osamu Aoki <[email protected]>
Due to reordering of code, UNHANDLED_SETUP_REQUEST is within scope of direct jump from br** command now. 00007f3e <DEVICE_TO_HOST>: ; - 0x80 - IN Type Request, USB Standard Request, Recipient is the device ; - 0x81 - IN Type Request, USB Standard Request, Recipient is the interface ; At this step it is known that bmRequestType >= 0x80, therefore checking for bmRequestType < 0x82 ; is enough to detect whether bmRequestType has one of the above values. cpi reg_bmRequestType, 0x82 ; Check whether bmRequestType is less than 0x82 (then it must be either 0x80 or 0x81) 7f3e: 22 38 cpi r18, 0x82 ; 130 brcc UNHANDLED_SETUP_REQUEST ; If bmRequestType >= 0x82, this request type is not handled here (it's not a GET_DESCRIPTOR request) 7f40: 50 f5 brcc .+84 ; 0x7f96 <UNHANDLED_SETUP_REQUEST> cpi reg_bRequest, 0x06 ; Compare bRequest with value 0x06 (REQ_GetDescriptor) 7f42: 36 30 cpi r19, 0x06 ; 6 brne UNHANDLED_SETUP_REQUEST ; jump to UNHANDLED_SETUP_REQUEST through f not equal 7f44: 41 f5 brne .+80 ; 0x7f96 <UNHANDLED_SETUP_REQUEST> Signed-off-by: Osamu Aoki <[email protected]>
Signed-off-by: Osamu Aoki <[email protected]>
Signed-off-by: Osamu Aoki <[email protected]>
br** command has relatively narrow jump range. Since UNHANDLED_SETUP_REQUEST is frequent entry point, moving it up to keep it at the center of code makes sense. This enables to drop thunk. Signed-off-by: Osamu Aoki <[email protected]>
No more need to call through its thunk Signed-off-by: Osamu Aoki <[email protected]>
Signed-off-by: Osamu Aoki <[email protected]>
There is no need to turn off LED Write program message gives the completion confirmation. Signed-off-by: Osamu Aoki <[email protected]>
Borrowed from from 99c4dbf ("WIP: String descriptor implementation", 2021-09-04) I skipped size optimization in USB signal handling since it was problematic. If no LED are used, this is good enough. Signed-off-by: Osamu Aoki <[email protected]>
Although we can further reduce size by skipping cli and some clr commands if we only care about power-on reset or real hardware reset. But if bootloader is called by long jump, this may not be safe thing to do. So reactivating this. Signed-off-by: Osamu Aoki <[email protected]>
Signed-off-by: Osamu Aoki <[email protected]>
This makes it possible to compile with opt-in features without touching the source code. Signed-off-by: Osamu Aoki <[email protected]>
/NUL certainly breaks on non-Windows system and should never be here NUL is ugly since it leaves NUL on other platform I like /dev/null better, but just in case, I am taking safer option. This should be the least invasive change to reduce noise on non-Windows platform without breaking Windows environment. Signed-off-by: Osamu Aoki <[email protected]>
Signed-off-by: Osamu Aoki <[email protected]>
Signed-off-by: Osamu Aoki <[email protected]>
Hi, Thanks. The USB device name thing is working for all platform types and within 512 bytes. Before the rebase, this was tested on my hardware. So this should be in good shape. I think this is as much we can do for nanoBoot. Osamu |
Hi,
You updated source on your led_feature_support branch.
If user enable LED for promicro, binary size will be 514.
I rebased lednew branch to your led_feature_support branch and applied few more size optimization Sergey Vlasov [email protected] was doing in his testing branch. Yes, 12 bytes reduction!
I cherry picked them with some minor trivial touch up to make them cleanly applicable.
Binary size:
I also updated README.md accordingly.