Timeout exceptions, 133 errors, connection negotiation

gummeson · February 2019

I've been working for a while now on an application that requires a persistent and reliable connection to some Metawear-augmented IoT devices; we are using original Google Pixel phones with the most recent February updates to connect to the Metawears. Our app does a BLE scan every 15 seconds, looks for a MAC we have registered with our app, and tries to connect to it. Currently our code handles 2 simultaneous connections ~99% of the time, but 1% of the time the app goes into an unrecoverable state where we usually can't reconnect to the Metawears until we reboot the phone. We know that the Metawears aren't stuck in a zombie connection because our app still finds them in the BLE scan, but it can never connect to them again until the reboot.

Looking at a long debug trace, there are a couple of things that happen before we get into the situation where we have to reboot -- sometimes after about a day of reliable connection. When we start the connection, we initialize the GPIO, Gyroscope, and Accelerometer. Occasionally we will see a 'java.util.concurrent.TimeoutException: Did not receive event id within 1000ms' error. This might happen a couple of times, but eventually the connection establishes. Over time these timeouts become more frequent, and eventually we start seeing 'BTLE service reports connection error: Non-zero onConnectionStateChange status (133)'. After we see this error, we typically can't connect to that device again until after a reboot.

I took the step of doing some additional debugging to reproduce this fault while connected to Android Studio. I found that when the Timeout errors crop up, it's usually when the phone and Metawear end up negotiating a very aggressive connection with a low connection interval and no latency. I managed to nearly eliminate these poorly negotiated connections by using a minimum connection interval of 60, a max of 125, a latency of 15, and a supervisor timeout of 5000. This magic combo seems to always result in a connection with a connection interval of 60-70 and a latency of 13. When these settings are used for the connection, I don't see timeouts.
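One way to sanity-check numbers like these rather than guess: the Bluetooth Core Specification requires the supervision timeout (the "supervisor timeout" above) to be larger than (1 + latency) * max connection interval * 2, and it bounds each parameter to a fixed range. The sketch below checks a parameter set against those rules. It assumes the intervals and timeout above are in milliseconds; the class and method names are made up for illustration and are not part of the MetaWear or Android APIs.

```java
public final class ConnParamCheck {
    // Allowed ranges for LE connection parameters (Bluetooth Core Specification).
    static final double MIN_INTERVAL_MS = 7.5;
    static final double MAX_INTERVAL_MS = 4000.0;
    static final int MAX_LATENCY = 499;
    static final int MIN_TIMEOUT_MS = 100;
    static final int MAX_TIMEOUT_MS = 32000;

    private ConnParamCheck() {}

    /** Value (ms) that the supervision timeout must strictly exceed. */
    static double timeoutFloorMs(double maxIntervalMs, int latency) {
        return (1 + latency) * maxIntervalMs * 2;
    }

    /** True if the parameters are in range and the timeout exceeds the floor. */
    static boolean isValid(double minIntervalMs, double maxIntervalMs,
                           int latency, int timeoutMs) {
        boolean inRange = minIntervalMs >= MIN_INTERVAL_MS
                && maxIntervalMs <= MAX_INTERVAL_MS
                && minIntervalMs <= maxIntervalMs
                && latency >= 0 && latency <= MAX_LATENCY
                && timeoutMs >= MIN_TIMEOUT_MS && timeoutMs <= MAX_TIMEOUT_MS;
        return inRange && timeoutMs > timeoutFloorMs(maxIntervalMs, latency);
    }
}
```

Under the millisecond assumption, `ConnParamCheck.isValid(60, 125, 15, 5000)` passes: the floor is (1 + 15) * 125 * 2 = 4000 ms, which leaves 1000 ms of margin below the 5000 ms timeout. This only says the combination is legal, not that it is optimal for a given device.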

So, long story short, I feel like I have a solution, but don't feel very confident in it because I'm just guessing numbers and observing the behavior of the connection process. Does anyone have experience with this? Any guidance on selecting some connection settings that are known to work well?

Eric · February 2019

Thanks for posting your findings; this is really interesting information to have. You're pretty much exploring new territory here though. Most of our testing has been for aggressive connections as there is a demand for streaming lots of data.

Offhand, your parameters look fine to me. You can refer to the "nRF Connect" app for suggested settings, but since you have values that work for your use case, you probably don't need to tweak them any further.

It is odd that you are running into timeout exceptions. Are the boards in a clean state when you set them up?

gummeson · February 2019

Always happy to share findings to help the community.

Thanks for the tip about nRF Connect -- I'll take a look at what they suggest. By clean state do you mean the metawear HW or android SW? When we get errors, we destroy and recreate the board objects, so in that sense they are clean. Though we only do this for the 133 errors and not the Timeout Exceptions. Maybe we should?

Eric · February 2019

I am referring to the hardware. The specific TimeoutException you are referring to only happens if the board has allocated all its resources or the connection is poor and a response is dropped.

gummeson · February 2019

This is very helpful. Given that these errors build up over time, I am thinking that there must be some corner cases where tearDown() doesn't get called when there is a disconnection between the phone and the Metawears. I think that my better connection settings are making it take much longer for this to happen again. In the previous case, the Metawear may have run out of resources after a string of errors.

I am going to try adding an explicit tearDown() call first thing on connection -- we don't need any of the allocated resources between connection intervals.
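A minimal sketch of that ordering: connect, tear down whatever a previous session may have left allocated, then set up the sensors. The `Board` interface and `FakeBoard` here are hypothetical stand-ins so the example runs on its own; they are not the MetaWear API, where connection is asynchronous and the real calls would need to be chained accordingly.

```java
import java.util.ArrayList;
import java.util.List;

public final class CleanConnect {
    /** Hypothetical stand-in for a board object; not the MetaWear API. */
    interface Board {
        void connect() throws Exception;
        void tearDown();
        void configureSensors() throws Exception;
    }

    private CleanConnect() {}

    /** Connects, clears leftover board-side resources, then configures. */
    static void connectClean(Board board) throws Exception {
        board.connect();
        // Free anything a previous session left allocated before
        // allocating again, even if that session ended abnormally.
        board.tearDown();
        board.configureSensors();
    }

    /** In-memory fake that records the order of calls. */
    static final class FakeBoard implements Board {
        final List<String> calls = new ArrayList<>();

        @Override public void connect() { calls.add("connect"); }
        @Override public void tearDown() { calls.add("tearDown"); }
        @Override public void configureSensors() { calls.add("configureSensors"); }
    }
}
```

The point of putting the cleanup at connect time is that it no longer depends on a disconnect handler having run, which is exactly the corner case suspected above.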
