
Open SDK #27

Open
someburner opened this issue Feb 9, 2019 · 21 comments

Comments

@someburner
Contributor

Looks like this is coming along! I'm interested in trying it out with my project, but using esp-open-sdk.

I was able to get it to build with a fresh clone of the PR to esp-open-sdk and then applied the later changes incrementally, but I got stuck at around this commit: ea83a83

From there on I can't get it to compile; I get a message saying that I should be using "lwip 1.4 headers". I tried taking them from Arduino with no luck. I guess I don't have a good enough grasp of which headers should go where. I also tried compiling with what I had, but couldn't figure out what had to be done for ipv4_addr.

If it's not a priority, I don't mind working on it myself; I just need some more direction on the headers and on what I might need to change in my project to resolve ipv4_addr vs. ip_addr. Thanks in advance!

@d-a-v
Owner

d-a-v commented Feb 10, 2019

ip_addr_t is lwIP's IP address structure.
In lwIP v2, this structure changes when IPv6 is enabled.
The nonos-sdk firmware will only ever know IPv4 with the original structure, so we must use that one whenever we talk to the firmware.
So I had to rename all of the SDK's references to ip_addr_t to ipv4_addr_t instead, because the name ipv4_addr_t does not exist in lwIP (I tried several approaches, including using lwIP's ip4_addr_t; in the end ipv4_addr_t was the easiest).

When IPv6 is not enabled, ipv4_addr_t is exactly the same as ip_addr_t (also as ipv4_addr and ip_addr).
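To make the distinction concrete, here is a minimal, self-contained sketch. The struct layouts are simplified mirrors written for illustration, not copied from the lwIP or SDK headers: an IPv4-only address matching what the firmware expects, and a dual-stack address shaped like lwIP v2's when IPv6 is enabled (a union plus a type tag). The helper `to_fw_addr()` and the `ADDR_TYPE_*` names are hypothetical; they only show why the glue has to convert explicitly rather than hand the larger struct to the firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IPv4-only address: the only layout the closed-source firmware understands.
 * Simplified mirror for this sketch, not the SDK's own header. */
typedef struct ipv4_addr {
    uint32_t addr; /* network byte order */
} ipv4_addr_t;

/* Dual-stack address, shaped like lwIP v2's ip_addr_t with IPv6 enabled:
 * union of v4/v6 storage plus a type tag (hypothetical names). */
enum addr_type { ADDR_TYPE_V4 = 0, ADDR_TYPE_V6 = 6 };

typedef struct dual_addr {
    union {
        uint32_t ip6[4];
        ipv4_addr_t ip4;
    } u_addr;
    uint8_t type;
} dual_addr_t;

/* Hypothetical glue helper: copy a dual-stack address into the firmware's
 * IPv4 layout. Returns false for IPv6 addresses, which the firmware cannot use. */
static bool to_fw_addr(const dual_addr_t *in, ipv4_addr_t *out)
{
    assert(in != NULL && out != NULL);
    if (in->type != ADDR_TYPE_V4)
        return false;
    *out = in->u_addr.ip4;
    return true;
}

/* The firmware's structure stays 4 bytes; the dual-stack one does not. */
_Static_assert(sizeof(ipv4_addr_t) == 4, "firmware expects a 4-byte IPv4 address");
_Static_assert(sizeof(dual_addr_t) > sizeof(ipv4_addr_t), "dual-stack address is larger");
```

Without a separate type, any firmware structure that embeds an address (or an array of them) would disagree with the glue about sizes and offsets once IPv6 is enabled; a distinct IPv4-only type keeps both sides consistent.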

@someburner
Contributor Author

Okay, and what about the error in arch/cc.h about "lwip 1.4"? I understand why lwIP with IPv6 requires new structs, but since this build uses relative paths everywhere, it's very difficult to figure out where header files are being sourced from.

@d-a-v
Owner

d-a-v commented Feb 11, 2019

I remember I had to replace an #include "something" with #include <something> in the SDK,
so that the -I paths are considered; otherwise the header in the including file's own directory is used.

Tell me more about the error you have.

@someburner
Contributor Author

#error LWIP_ERR_T definition should come from lwip1.4 from espressif

@d-a-v
Owner

d-a-v commented Feb 11, 2019

That's because Espressif changed this type a year ago.
This script extracts it from the SDK.
You can use -DLWIP_ERR_T=s8 (or whatever lwip_err_t is in your SDK version).
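As background on what the `-D` flag does: the SDK's precompiled code and the lwIP glue have to agree on the size and signedness of `err_t`, because error codes cross the boundary between them, and lwIP 2 lets a port supply that type through `LWIP_ERR_T`. The sketch below shows the pattern under that assumption. The fallback to `int8_t` exists only so the sketch compiles on its own; the real glue deliberately stops with the `#error` quoted above instead of guessing. `glue_err_t`, `GLUE_ERR_TIMEOUT` and `err_is_negative` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Built with e.g. -DLWIP_ERR_T=int8_t, the type comes from the command line.
 * Sketch-only fallback: the real glue raises #error here instead, because a
 * wrong guess silently breaks the ABI shared with the SDK's binary blobs. */
#ifndef LWIP_ERR_T
#define LWIP_ERR_T int8_t
#endif

typedef LWIP_ERR_T glue_err_t; /* hypothetical name for this sketch */

/* Hypothetical error code: negative, as lwIP error codes are. */
#define GLUE_ERR_TIMEOUT ((glue_err_t)-3)

/* If LWIP_ERR_T were an unsigned type, this would always return 0:
 * one visible symptom of a mismatched definition. */
static int err_is_negative(glue_err_t e)
{
    return e < 0;
}
```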

@someburner
Contributor Author

Okay, thanks. Still, which version/release is being used exactly? Or should it not matter?

@d-a-v
Owner

d-a-v commented Feb 11, 2019

On this repository, master is generally the better choice.
The version used in Arduino (which is supposed to be stable) is linked from there: builder/, as a submodule.

@someburner
Copy link
Contributor Author

someburner commented Feb 14, 2019

I deleted my previous comment, since those problems seem to have come from starting with a project that had too much going on dependency-wise. Here is where I'm at now:

I re-compiled esp-open-sdk from the PR and was getting linker errors for free and calloc like this:

/home/jeff/build/esp8266/sdk2/xtensa-lx106-elf/xtensa-lx106-elf/sysroot/usr/lib/liblwip2.a(mem.o): In function `mem_malloc':
/home/jeff/build/esp8266/sdk2/lwip2/lwip2-src/src/core/mem.c:136: undefined reference to `free'
/home/jeff/build/esp8266/sdk2/xtensa-lx106-elf/xtensa-lx106-elf/sysroot/usr/lib/liblwip2.a(mem.o): In function `mem_free':
/home/jeff/build/esp8266/sdk2/lwip2/lwip2-src/src/core/mem.c:151: undefined reference to `free'
collect2: error: ld returned 1 exit status

I think it's looking for those in irom, so I added malloc and free functions in flash in my project that call the os_xxx functions. I was previously getting that to link some other way, and it was crashing things immediately.
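One way to express this kind of redirection without defining `malloc`/`free` globally is to route lwIP's allocator through macros that default to the C library. In lwIP 2.x, when `MEM_LIBC_MALLOC` is enabled, mem.c calls `mem_clib_malloc`/`mem_clib_free`, which fall back to `malloc`/`free` unless the port defines them (verify this against the lwIP version you build). The host-side sketch below reproduces that pattern: `sdk_malloc`/`sdk_free` are hypothetical stand-ins for the SDK's `os_malloc`/`os_free`, `heap_alloc`/`heap_release` are simplified stand-ins for `mem_malloc`/`mem_free`, and the counters exist only so the redirection can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the SDK allocator (os_malloc/os_free on target). */
static int sdk_alloc_calls, sdk_free_calls;

static void *sdk_malloc(size_t n)
{
    sdk_alloc_calls++;
    return malloc(n); /* host only: on target this would be the SDK heap */
}

static void sdk_free(void *p)
{
    sdk_free_calls++;
    free(p);
}

/* Port configuration (e.g. in lwipopts.h): point the allocator macros at
 * the SDK heap before the stack's defaults are considered. */
#define mem_clib_malloc sdk_malloc
#define mem_clib_free   sdk_free

/* Stack side, following the lwIP 2.x mem.c pattern: libc defaults apply
 * only when the port did not define the macros. */
#ifndef mem_clib_malloc
#define mem_clib_malloc malloc
#endif
#ifndef mem_clib_free
#define mem_clib_free free
#endif

/* Simplified heap wrappers playing the role of mem_malloc/mem_free. */
static void *heap_alloc(size_t size) { return mem_clib_malloc(size); }
static void heap_release(void *p) { mem_clib_free(p); }
```

With the port macros in place, no reference to the libc `free` is emitted from the heap code, which is the kind of undefined reference shown in the log above.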

Next was fixing the includes for my project. I think the issue is that esp-open-sdk copies the esp-open-lwip includes into sysroot, so I needed to add the lwip2 folder to CFLAGS.

Now my test project can boot, and including <lwip/init.h> reports major version 2. But it seems something is still broken: calling wifi_station_dhcpc_start() shows the message that DHCP is starting, and after calling wifi_station_connect() I get the EVENT_STAMODE_CONNECTED event. However, after that there is no IP address. I also tried setting a static IP/GW/mask and then calling connect, but I get:

connected with TestWiFi, channel 11
ip:192.168.1.55,mask:255.255.255.0,gw:192.168.1.1
lwESP: netif_set_up is called??ip:192.168.1.55,mask:255.255.255.0,11Wifi connected to ssid TestWiFi, ch 11
Wifi got ip:192.168.1.55,mask:255.255.255.0,gw:192.168.1.1

The router shows the ESP as associated but with no IP. Is this a symptom of the older version in use? Should I be getting an IP with DHCP? Or is there something else I'm missing?

update:

I started working on some other stuff and just let my ESP sit on serial. Amazingly, I finally got an IP address after about 30-35 minutes of nothing. It shows up on the router and ping works. So that's a good sign, but I need to figure out what is up with DHCP. Thoughts? I let another ESP sit and it took exactly the same amount of time; perhaps something to do with a timer somewhere?

update2:

I started working back up through the commits with open-sdk. After c3f36d4 and a rebuild, DHCP works right away.

@someburner
Contributor Author

someburner commented Feb 14, 2019

Okay, I got it all working up to 163bb82. I still need to figure out the deal with linking free and malloc, but overall I'm pretty happy and look forward to testing this on a bigger scale. If there's any interest I can list out my steps for open-sdk.

update3: fixed the malloc/free linking simply by patching core/mem.c. I was able to compile the project I was originally trying and saw immediate improvements. On some routers/networks my project frequently gets TCP disconnects unless it is put on a separate VLAN. Doing just the bare minimum to switch to lwip2 and now no more disconnects. Thanks @d-a-v !

@d-a-v
Owner

d-a-v commented Feb 15, 2019

> Doing just the bare minimum to switch to lwip2 and now no more disconnects. Thanks @d-a-v !

👍 Thanks to you!

Were there any modifications you had to make in my repository?

> If there's any interest I can list out my steps for open-sdk.

There is. You could start by posting your steps in a new issue in esp-open-sdk.
(It would replace the unfinished one in pfalcon/esp-open-sdk#271.)

@andrethomas

Definitely some interest from my side also...

@someburner
Contributor Author

someburner commented Feb 15, 2019

Okay. I would like to maintain compatibility with the work in this repo, but right now my fork branches off from a fairly early commit that introduced all the changes to put debug strings into PROGMEM (or at least that's how I interpreted it). gluedebug.h had a lot going on, so I just kept what it had originally. My project already places all strings into flash, NodeMCU-style, so for me it wasn't a problem. Maybe not much actually needs to change to make it compatible with both; the commit I'm referring to is d45ac2a. That one compiled (lwip2.a), but then trying to actually compile a simple test project produced a ton of errors. The same goes for this one: 1b88267. When I get some time, I'll make a branch that goes back to that version to reproduce the error for you, so we can try to resolve it.

My forked version is here. If you scroll through the commit history on the open branch, you can see the point where the commits by @d-a-v turn into commits I cherry-picked.

I also have my own version of esp-open-sdk that uses a recent GCC/newer crosstool-ng, and I made a branch that should compile without issue on most Linux distros. I tried doing this with gcc 4.8.5 at first, but not having colorized output was killing me, so I am using 7.4.

git clone --recursive https://github.com/someburner/esp-open-sdk.git -b sdk2-lwip2

That said, the later commits on my esp82xx-nonos-linklayer fork add some patches intended to eliminate any manual patching, and should work with the original open-sdk and whichever SDK it currently uses.

So I think the best course of action would be to try to resolve the strings-in-flash thing, and after that a diff between the two repos should show all the other changes I made (there are not many, and they are mostly self-describing). I also want to try to get up to lwIP 2.1.2.

I have a feeling pfalcon is going to want an SNTP implementation to be provided one way or the other.

lastly- I had a question- what is the deal with TCP_MSS 536 vs. 1460? Is 1460 considered stable or what is wrong with it?

@someburner
Contributor Author

someburner commented Feb 16, 2019

Okay, I finished getting it up to master of this repo. Like I said before, I generally see an improvement on the 5 or 6 devices I've run this on. However, I have noticed one thing that I do not see with the vendor lwip1.4: resets due to exceptions. One device in particular gets these constantly, but I saw one on another device running lwip210 too. I am almost positive this must be coming from the new lwIP (the base firmware is stable on more than 1K devices, with no exceptions ever, and all restarts/low-memory conditions are handled gracefully). I am using TCP_MSS = 1460. It's worth noting these are probably the two worst connections I've ever had to deal with, but nonetheless exceptions are no bueno. I suppose it could be due to mishandling further up the stack, since lwip2 might behave a bit differently.

This might warrant a separate issue, but I wanted to get some feedback first on stability at 1460, or on whether I have configured something that is Arduino-only. Unfortunately I don't have a way of recovering a stack frame from any of these... maybe I should work on that next :)

@someburner
Contributor Author

@d-a-v I think I may have found the issue:

When calling espconn_disconnect in my project, the registered disconnect callback never gets called. I put debug prints in glue-lwip/espconn_tcp.c and found that espconn_Task is never entered. It should be, though: at the end of espconn_client_close there is

ets_post(espconn_TaskPrio, SIG_ESPCONN_CLOSE, (uint32_t)pclose);

Seems that espconn_init is never called. Is this on purpose? Calling espconn_init() in my project at init time seems to work, and the tasks run after that.
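The symptom fits a general pattern: a message posted to a task priority that nobody registered is never delivered, so the close handler (and with it the disconnect callback) never runs. A host-side model of that pattern is sketched below. `task_register`, `task_post` and `task_run_pending` are hypothetical stand-ins, not the SDK's real task API, and the model assumes that `espconn_init` is where the espconn task gets registered, which is consistent with the fix reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PRIO 3

typedef void (*task_fn)(uint32_t sig, uint32_t par);

/* One handler and one pending message per priority: enough for this model. */
static task_fn tasks[MAX_PRIO];
static uint32_t pending_sig[MAX_PRIO], pending_par[MAX_PRIO];
static bool pending[MAX_PRIO];

static void task_register(unsigned prio, task_fn fn)
{
    if (prio < MAX_PRIO)
        tasks[prio] = fn;
}

/* Returns false when nothing is registered at prio: the message is dropped. */
static bool task_post(unsigned prio, uint32_t sig, uint32_t par)
{
    if (prio >= MAX_PRIO || tasks[prio] == NULL)
        return false;
    pending[prio] = true;
    pending_sig[prio] = sig;
    pending_par[prio] = par;
    return true;
}

/* Called from the main loop: deliver queued messages, highest priority first. */
static void task_run_pending(void)
{
    for (int p = MAX_PRIO - 1; p >= 0; p--) {
        if (pending[p]) {
            pending[p] = false;
            tasks[p](pending_sig[p], pending_par[p]);
        }
    }
}

/* Model of the espconn side (hypothetical names). */
#define ESPCONN_PRIO 2
#define SIG_CLOSE    1
static int disconnect_callbacks;

static void espconn_task_model(uint32_t sig, uint32_t par)
{
    (void)par;
    if (sig == SIG_CLOSE)
        disconnect_callbacks++; /* where the user's disconnect callback would fire */
}

/* The step that was missing: register the task before anything posts to it. */
static void espconn_init_model(void)
{
    task_register(ESPCONN_PRIO, espconn_task_model);
}
```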

@d-a-v
Owner

d-a-v commented Feb 22, 2019

First, sorry for being silent to your questions,
thanks for your work,

> I had a question- what is the deal with TCP_MSS 536 vs. 1460? Is 1460 considered stable or what is wrong with it?

Only ram occupation. Also, working with 536 let me check that the code (on the Arduino core side) is of good quality and does not rely on this 1460 constant. It is safe to work with 1460, and it allows better bandwidth (less inter-packet latency).
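Where the two numbers come from, and roughly what the RAM difference looks like, can be worked out in a few lines. 536 is the classic default MSS (the 576-byte datagram every IPv4 host must accept, minus 20 bytes of IP header and 20 bytes of TCP header), and 1460 is a 1500-byte Ethernet MTU minus the same 40 bytes. The per-connection buffer figures assume lwIP's opt.h defaults of `TCP_WND = 4 * TCP_MSS` and `TCP_SND_BUF = 2 * TCP_MSS`; this repository's lwipopts.h may size them differently, so treat them as illustrative.

```c
#include <assert.h>
#include <stdio.h>

#define IPV4_HDR 20u /* IPv4 header without options */
#define TCP_HDR  20u /* TCP header without options */

/* MSS = largest TCP payload that fits in one IP datagram of the given MTU. */
static unsigned mss_for_mtu(unsigned mtu) { return mtu - IPV4_HDR - TCP_HDR; }

/* Assumed sizing, following lwIP's opt.h defaults (lwipopts.h may differ). */
static unsigned tcp_wnd(unsigned mss) { return 4u * mss; }
static unsigned tcp_snd_buf(unsigned mss) { return 2u * mss; }

static void print_budget(unsigned mtu)
{
    unsigned mss = mss_for_mtu(mtu);
    printf("MTU %4u -> MSS %4u, TCP_WND %5u, TCP_SND_BUF %4u\n",
           mtu, mss, tcp_wnd(mss), tcp_snd_buf(mss));
}
```

Under these assumptions, `print_budget(576)` and `print_budget(1500)` give a receive window of 2144 vs. 5840 bytes and a send buffer of 1072 vs. 2920 bytes per connection: that extra RAM is what buys fewer, larger segments.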

> Seems that espconn_init is never called.

I have never used espconn (Arduino core does not use it). I have only ported it and included in the library for further test, also because it was requested at some point. I have never had any feedback.
Feel free to fix anything related with espconn with that in mind.
I think espconn was initially (back in lwIP v1.4 times) a rename of lwIP's netconn. I don't know the reason for the rename, or what the additions/differences are.

About my rather complex debug print system: that was long ago, and at that time I couldn't configure my USB/serial adapter to 74880 baud. That is why I am using buffers. I guess this could be completely removed.

> So I think the best course of action would be to try to resolve the strings-in-flash thing

What are the issues?

@someburner
Contributor Author

> sorry for being silent to your questions

Not a problem! But I do hope to get my changes merged in while this is still fresh in my head.

> Only ram occupation.

Okay, good to know. On a related note, there were other changes in lwipopts.h that diverge from Espressif's lwip14 lwipopts.h, like MEM_SIZE. Some are clearly commented as required for ESP8266, but it would be nice to have more comments explaining the changes. I am barely acquainted with lwIP, though, so I have mostly copied the Arduino changes for open-sdk.

> I have never used espconn (Arduino core does not use it). I have only ported it and included in the library for further test, also because it was requested at some point. I have never had any feedback.
> Feel free to fix anything related with espconn with that in mind.

Okay, that makes much more sense. After fixing espconn_init, it appears to function the same as the SDK/lwip14 espconn. I've had this running for a couple of days now on my toughest devices and 20 others with good connections, and I have not seen an exception yet. Hopefully that was indeed the issue, but I can't say for sure for at least a few more days.

> I guess this could be completely removed.

In that case I will try to resolve the issue with some #ifdefs for Arduino vs. other builds, which are already necessary for the SNTP implementation differences. I'll fork and submit a PR soon.
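A minimal sketch of that kind of switch, assuming the Arduino build defines `ARDUINO` and provides `PSTR()`/`snprintf_P()` through its pgmspace header (check the core version in use). `GLUE_FMT`, `glue_snprintf` and `glue_format_err` are hypothetical names, not the ones in gluedebug.h. The non-Arduino branch uses ordinary string literals, which suits projects that already handle placing strings in flash their own way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#if defined(ARDUINO)
/* Assumption: Arduino ESP8266 core, where PSTR() places the literal in flash
 * and snprintf_P() reads its format string from flash. */
#include <pgmspace.h>
#define GLUE_FMT(s) PSTR(s)
/* ##__VA_ARGS__ is a GNU extension (GCC/Clang) allowing zero extra args. */
#define glue_snprintf(buf, n, fmt, ...) \
    snprintf_P((buf), (n), GLUE_FMT(fmt), ##__VA_ARGS__)
#else
/* Other builds (e.g. esp-open-sdk): plain literals; where they end up is
 * left to the project's own linker/section setup. */
#define GLUE_FMT(s) (s)
#define glue_snprintf(buf, n, fmt, ...) \
    snprintf((buf), (n), GLUE_FMT(fmt), ##__VA_ARGS__)
#endif

/* Hypothetical helper: format a debug line into buf and return its length. */
static int glue_format_err(char *buf, size_t n, int err)
{
    int len = glue_snprintf(buf, n, "glue: err=%d", err);
    assert(len >= 0);
    return len;
}
```

Callers only ever write `glue_snprintf(...)` with a literal format, so the same glue source builds in both environments and only the macro definitions differ.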

@someburner
Contributor Author

someburner commented Feb 22, 2019

For the record, here is what happens when trying to link after commit d45ac2a:

/home/jeff/build/esp8266/sdk2.2.2/xtensa-lx106-elf/lib/gcc/xtensa-lx106-elf/7.4.0/../../../../xtensa-lx106-elf/bin/ld: /home/jeff/build/esp8266/sdk2.2.2/sdk/lib/libc.a(lib_a-abort.o):(.literal+0x0): undefined reference to `_exit'
/home/jeff/build/esp8266/sdk2.2.2/xtensa-lx106-elf/lib/gcc/xtensa-lx106-elf/7.4.0/../../../../xtensa-lx106-elf/bin/ld: /home/jeff/build/esp8266/sdk2.2.2/sdk/lib/libc.a(lib_a-abort.o): in function `abort':
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdlib/abort.c:63: undefined reference to `_exit'
/home/jeff/build/esp8266/sdk2.2.2/xtensa-lx106-elf/lib/gcc/xtensa-lx106-elf/7.4.0/../../../../xtensa-lx106-elf/bin/ld: /home/jeff/build/esp8266/sdk2.2.2/sdk/lib/libc.a(lib_a-signal.o):(.literal+0x0): undefined reference to `_getpid_r'
/home/jeff/build/esp8266/sdk2.2.2/xtensa-lx106-elf/lib/gcc/xtensa-lx106-elf/7.4.0/../../../../xtensa-lx106-elf/bin/ld: /home/jeff/build/esp8266/sdk2.2.2/sdk/lib/libc.a(lib_a-signal.o):(.literal+0x4): undefined reference to `_kill_r'
 ...
etc, etc
...

Found the issue: that commit sets ULWIPASSERT=1. Setting it back to 0 fixes it. I wish I had noticed that earlier.
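The log above shows why that setting matters: the unresolved symbols (`_exit`, `_getpid_r`, `_kill_r`) are pulled in by newlib's `abort()` (`lib_a-abort.o`, `lib_a-signal.o`), and the SDK does not provide them. lwIP's default `LWIP_PLATFORM_ASSERT` calls `abort()`, and a port can override that macro in arch/cc.h to route failures to its own handler. I can't tell from this thread exactly what `ULWIPASSERT` switches on in this repository, so the sketch below only illustrates the general idea. `glue_assert_fail`, `GLUE_ASSERT` and `checked_div` are hypothetical; on target, the handler might print the location and restart the chip instead of calling `abort()`.

```c
#include <assert.h>

/* Hypothetical failure handler. On target it might print the location and
 * trigger a restart; here it just records the failure so it can be checked. */
static int assert_failures;
static const char *last_failed_expr;

static void glue_assert_fail(const char *expr, const char *file, int line)
{
    (void)file;
    (void)line;
    assert_failures++;
    last_failed_expr = expr;
}

/* Port override: defined before lwIP's default (which calls abort()) would
 * apply, so abort() and its syscall dependencies are never referenced. */
#define LWIP_PLATFORM_ASSERT(msg) glue_assert_fail((msg), __FILE__, __LINE__)

/* Simplified form of lwIP's LWIP_ASSERT(message, assertion). */
#define GLUE_ASSERT(msg, cond) \
    do { if (!(cond)) LWIP_PLATFORM_ASSERT(msg); } while (0)

static int checked_div(int a, int b)
{
    GLUE_ASSERT("division by zero", b != 0);
    return b != 0 ? a / b : 0; /* keep going after a reported failure */
}
```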

@d-a-v
Owner

d-a-v commented Feb 23, 2019

Nice work. What is preventing you from using master?
Is it only because of ipv4_addr_t?

@someburner
Contributor Author

No, I'm already on master with my open-dev branch, but it consists of cherry-picked commits with some edits, so I'm trying to patch my changes onto your current master.

@d-a-v
Owner

d-a-v commented Jul 16, 2019

@someburner The latest "locking" commit is good for everyone.

@someburner
Contributor Author

@d-a-v Thanks. I will pull that in on my esp-open-sdk.
