Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WeatherService silently stops updating on N4 JACEs #4

Open
scrambledleek opened this issue Mar 9, 2020 · 5 comments
Open

WeatherService silently stops updating on N4 JACEs #4

scrambledleek opened this issue Mar 9, 2020 · 5 comments

Comments

@scrambledleek
Copy link

scrambledleek commented Mar 9, 2020

The issue described below may not be a problem with the OpenWeatherMap provider itself, but Niagara Community user 'roc' suggested I record the issue here too.

Note: The issue was observed using the 4.0.1 release of this module.

I think the idea would be to add optional extras to the OWM provider that:

  • Output a log message whenever it is called by the WeatherService (so we can see when it stops)
  • Can output a log message when the OWN provider has not had a request to update from the WeatherProvider for longer than expected

Anyway, the issue description as posted on Niagara Community:

_We're getting occasional problems with the WeatherService on some of our JACE 8000s.

We are in the UK and using the OpenWeatherMap Provider kindly provided as open source by Neopsis: https://github.com/neopsis/niagara-weather

I've just been examining the issue on a JACE running N4.7.109.20.

What I see is that the WeatherService stops updating automatically, and using the UpdateWeatherReport action either at the service or report level does not update it either. There are no messages in the Application Director other than the regular station saves and NTP time sync updates.

I tried creating another report under the existing WeatherService, but that would not update either.

I tried disabling and re-enabling the WeatherService, but that did not help.

I tried duplicating the WeatherService to create a WeatherService1. The report within this was then able to update.

My guess is that there is some background thread or timer created when the WeatherService is created during station start-up or on creating a new entry under Services when a station is already running. This has silently died and neither automatic or manual updates occur any more. This is not removed and recreated when you disable and then re-enable the WeatherService, so doing that does not fix the issue.

The best workaround I have found is to 'cut' the WeatherService and 'paste' it back in the same location in the Services folder, then renaming it back from WeatherService1 to WeatherService. Getting it running again this way preserves any links in/out of the WeatherService (e.g. we use the current external temperature from the WeatherService in an average of the site's OAT sensor readings).

Can anyone shed any further light on what's going on and why there are no Application Director messages about failures to update?

Perhaps as a first step towards resolving this sort of issue Tridium can think about updating the WeatherService so the disabling and re-enabling the WeatherService would recreate this background thread/timer too?_

@scrambledleek
Copy link
Author

scrambledleek commented Mar 9, 2020

The Application Director Dump Threads output showed this for the unhealthy WeatherService thread:

"WeatherService" #26 daemon prio=5 os_prio=10 tid=0x1b61f3b8 nid=0x14 runnable [0x77ebf000]
   java.lang.Thread.State: RUNNABLE
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
 at java.net.SocketInputStream.read(SocketInputStream.java:171)
 at java.net.SocketInputStream.read(SocketInputStream.java:141)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
 - locked <0x0c7cfce0(a java.io.BufferedInputStream)
 at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
 at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
 - locked <0x0c7cf948(a sun.net.www.protocol.http.HttpURLConnection)
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
 - locked <0x0c7cf948(a sun.net.www.protocol.http.HttpURLConnection)
 at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
 at com.neopsis.envas.weather.util.NvBaseReader.read(NvBaseReader.java:40)
 at com.neopsis.envas.weather.openweathermap.NvOwmReader.getForecast(NvOwmReader.java:55)
 at com.neopsis.envas.weather.openweathermap.BNvOwmProvider.updateReport(BNvOwmProvider.java:160)
 at javax.baja.weather.BWeatherReport.doUpdateWeatherReport(BWeatherReport.java:312)
 at auto.javax_baja_weather_BWeatherReport.invoke(AutoGenerated)
 at com.tridium.sys.schema.ComponentSlotMap.invoke(ComponentSlotMap.java:1890)
 at com.tridium.sys.engine.EngineUtil.doInvoke(EngineUtil.java:62)
 at javax.baja.sys.BComponent.doInvoke(BComponent.java:1258)
 at javax.baja.util.Invocation.run(Invocation.java:47)
 at javax.baja.util.Worker.process(Worker.java:168)
 at javax.baja.util.Worker$Processor.run(Worker.java:141)
 at java.lang.Thread.run(Thread.java:748)

The lines with 'locked' might be suspicious.

I noted that one of the changes between 4.0.1 and 4.0.2 was a change to the API used for fetching URLs, so I have updated the JACE 8000 showing the problem to the 4.0.2 version of this module to see if that helps.

@envas
Copy link
Contributor

envas commented May 31, 2020

Unfortunately, I cannot reproduce the error in my environment. However, I am testing 4.0.2 that uses Tridium HttpsConnection class. How is 4.0.2 running on your site?

@scrambledleek
Copy link
Author

scrambledleek commented Jun 13, 2020

Apologies for the delay!

I do still appear to have a problem with the 4.0.2 module (JACE 8000 with Niagara 4.7.109.20).

Below is the thread dump for the WeatherService thread. It looks a little different as it's using the Tridium class, but still appears to have got stuck in "java.net.SocketInputStream.socketRead0(Native Method)".

"WeatherService" #26 daemon prio=5 os_prio=10 tid=0x1a7cea58 nid=0x14 runnable [0x77f3f000]
   java.lang.Thread.State: RUNNABLE
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
 at java.net.SocketInputStream.read(SocketInputStream.java:171)
 at java.net.SocketInputStream.read(SocketInputStream.java:141)
 at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
 at sun.security.ssl.InputRecord.read(InputRecord.java:503)
 at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
 - locked <0x0c5fa098> (a java.lang.Object)
 at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
 - locked <0x0c5fa068> (a java.lang.Object)
 at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
 at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2301)
 at com.tridium.crypto.core.io.CryptoCoreClientSocketFactory.performHandshake(CryptoCoreClientSocketFactory.java:392)
 at com.tridium.crypto.core.io.CryptoCoreClientSocketFactory.initSocket(CryptoCoreClientSocketFactory.java:351)
 at com.tridium.crypto.core.io.CryptoCoreClientSocketFactory.initSocket(CryptoCoreClientSocketFactory.java:318)
 at com.tridium.crypto.core.io.CryptoCoreClientSocketFactory.createSocket(CryptoCoreClientSocketFactory.java:109)
 - locked <0x0c5f8e40> (a com.tridium.platcrypto.socket.CryptoClientSocketFactory)
 at javax.baja.net.HttpsConnection.createSocket(HttpsConnection.java:189)
 at javax.baja.net.HttpsConnection.connect(HttpsConnection.java:137)
 - locked <0x0c5f8ca0> (a javax.baja.net.HttpsConnection)
 at javax.baja.net.HttpConnection.connect(HttpConnection.java:288)
 - locked <0x0c5f8ca0> (a javax.baja.net.HttpsConnection)
 at com.neopsis.envas.weather.util.NvBaseReader.read(NvBaseReader.java:42)
 at com.neopsis.envas.weather.openweathermap.NvOwmReader.getCurrentConditions(NvOwmReader.java:177)
 at com.neopsis.envas.weather.openweathermap.BNvOwmProvider.updateReport(BNvOwmProvider.java:159)
 at javax.baja.weather.BWeatherReport.doUpdateWeatherReport(BWeatherReport.java:312)
 at auto.javax_baja_weather_BWeatherReport.invoke(AutoGenerated)
 at com.tridium.sys.schema.ComponentSlotMap.invoke(ComponentSlotMap.java:1890)
 at com.tridium.sys.engine.EngineUtil.doInvoke(EngineUtil.java:62)
 at javax.baja.sys.BComponent.doInvoke(BComponent.java:1258)
 at javax.baja.util.Invocation.run(Invocation.java:47)
 at javax.baja.util.Worker.process(Worker.java:168)
 at javax.baja.util.Worker$Processor.run(Worker.java:141)
 at java.lang.Thread.run(Thread.java:748)

@scrambledleek
Copy link
Author

scrambledleek commented Jun 13, 2020

The thread dump for the WeatherService after I've cleared the problem by cutting and then pasting the WeatherService is:

"WeatherService" #10476 daemon prio=5 os_prio=10 tid=0x1b07bd08 nid=0x3d in Object.wait() [0x1d4af000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at javax.baja.util.Queue.dequeue(Queue.java:174)
 - locked <0x0b1f6098> (a javax.baja.util.CoalesceQueue)
 at javax.baja.util.Queue.todo(Queue.java:268)
 at javax.baja.util.Worker$Processor.run(Worker.java:137)
 at java.lang.Thread.run(Thread.java:748)

@scrambledleek
Copy link
Author

scrambledleek commented Jan 18, 2021

I've just been looking at this again a little.

I notice the javax.baja.net.HttpConnection class has two different timeout methods. The one NvBaseReader is using sets up an SO_TIMEOUT (socket timeout) of 30 seconds; setTimeout(). The other sets up a 'connection timeout'; setConnectionTimeout(), which is also possible by calling connect() with a timeout argument.

It looks like setting a general timeout using setTimeout() does not set up a 'connection timeout' as you might think, and the default is for there not to be a timeout.

With the 4.0.1 versions of this module I'm not sure any timeouts were in place.
With the 4.0.2 version we have the SO_TIEMOUT, but no 'connection timeout' for making the initial connection.

My guess is that the Internet connection at the site the JACE 8000 is installed on is occasionally flaky, and that can coincide with making the initial connection, causing the WeatherService to lock-up. Do you think adding a call to setConnectionTimeout() as well as setTimeout() might cure this problem I've been seeing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants