Buffer overflow

When invoking the device.connect method of a MetaTracker instance, sometimes I get a

*** buffer overflow detected ***: python terminated

Reading some older posts, here is some extra information:

  • it's an Ubuntu machine with 6 GB of RAM
  • RAM is mostly free
  • plenty of swap available, not being used by any program

Here is the btmon output for a simple run with this error. It's a straightforward call to MetaTracker(mac_address).connect:

https://gist.github.com/guilhermesilveira/07df7861741c93e2b55b91ee1bd5bdd7

It happens intermittently with many MetaTracker devices.


One interesting fact is that a guaranteed way to get the buffer overflow error is to run a program and try to connect to two devices. The first connect invocation to a MetaTracker instance works fine, the second one fails with the buffer overflow every time. Perhaps those are two different buffer overflows.

Any suggestions?

Regards

Comments

  • If you do the same thing with our standard MetaBase app, does an error occur?
    Does your custom code work fine with a single device?

  • @Laura , I did the following:

    • removed batteries from all trackers (7 in total)
    • inserted battery in tracker E9, it showed up in MetaBase as the only one on
    • connected, retrieved firmware=1.2.5, was asked for a firmware update, chose LATER
    • "run diagnostic" does not execute anything, so I went back
    • it shows E9 as off for a while then back on; selected it again, 15 times, worked fine
    • closed the app
    • turned off my cell phone bluetooth, just in case

    Open the computer, run the provided scan_connect.py from examples: https://github.com/mbientlab/MetaWear-SDK-Python/blob/master/examples/scan_connect.py

    • it shows E9 as the only metawear around
    • selects, works fine
    • disconnects
    • wait a few seconds, try again, repeat 3 times, all fine

    Which means device.connect works fine for the same device a few times if the disconnection was successful. That already helps isolate the problem.

    I noticed that I would get connection errors if the disconnection was abrupt (i.e. if an error closes the program, the next connection typically fails).

    Next step, now I try the same process with two metatrackers on.

    • inserted battery in tracker FB (E9 is still on)
    • turned on the phone bluetooth, both showed up in MetaBase as on
    • connected to FB: same 1.2.5 firmware, same update prompt, went back. Did the same thing with E9
    • repeated 2 times for each
    • after the second time both show as OFF on Metabase, but when I tried to connect it worked, perhaps just a UI async glitch from the app itself
    • closed the app
    • turned off my cell phone bluetooth, just in case

    Now, on to the scan_connect.py script from github:

    • run the script twice, in two different terminal windows
    • both list both devices
    • choose FB on one, worked fine, quit
    • choose E9 on the second one, got a "Timed out while trying to connect to remote device". tried again, same error
    • just in case, on the second one tried FB, same error
    • tried FB on the first one again, same error
    • tried E9 on the first one, worked
    1. So far I can see I am getting random connection timeouts. Are these random connection timeouts expected? I saw in some post that they were. If so, why do they happen, and how can I mitigate them besides a connection loop?

    Now that I was able to connect to both I go for both terminals and try again, first one on FB

    • first one connects successfully; while it is reading the info I select E9 on the second:
    • second one dies with a *** buffer overflow detected ***
    1. The conclusion is that the script (from mbientlab itself) works most of the time with individual pieces, but seems to fail every time I try to connect to two devices (either from the same python process, from my original program, or in two different python processes running from the same directory, such as this example).

    2. I cannot replicate connecting to two devices from the app, since the app connects to one device at a time.

    Any suggestions on how to tackle the buffer overflow error? Is there any flag I can pass to the C library at compilation or runtime that enables more debug information, so we can better detect where the buffer overflow occurs and what causes it?

  • If it helps, when trying to connect to two devices at the same time, the JSON file in the cache folder is created for the first one (connected successfully), but the buffer overflow occurs before the cache JSON file is created for the second device. It definitely occurs during the device.connect() invocation in the script provided (I debugged it to get there).

  • More specifically, the error occurs on self.warble.connect_async(completed)
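    On the question of getting more debug information without rebuilding anything: one option that might help is Python's standard faulthandler module, which dumps the Python traceback when the process dies from a fatal signal such as SIGABRT (the signal glibc raises after printing "*** buffer overflow detected ***"). The sketch below is only an illustration: the connect() function is a hypothetical stand-in that calls os.abort() so it runs without hardware, and it will not show where inside the C library the overflow happens, only which Python call was active.

```python
import subprocess
import sys
import textwrap

# Child script: enable faulthandler, then abort. os.abort() is an assumed
# stand-in for the real device.connect() crash so the sketch needs no hardware.
CHILD = textwrap.dedent("""
    import faulthandler, os, sys
    faulthandler.enable(file=sys.stderr, all_threads=True)

    def connect():      # hypothetical stand-in for device.connect()
        os.abort()      # raises SIGABRT, like a failed glibc fortify check

    connect()
""")

def run_child():
    """Run the child script and return (returncode, stderr text)."""
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True)
    return proc.returncode, proc.stderr

if __name__ == "__main__":
    code, err = run_child()
    print("exit code:", code)
    print(err)  # includes the Python stack that was active at the abort
```

    In a real script the only lines needed are the import and faulthandler.enable(), placed before the call to device.connect().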

  • @guilhermesilveira said:
    Any suggestions on how to tackle the buffer overflow error? Is there any flag I can pass to the C library at compilation or runtime that enables more debug information, so we can better detect where the buffer overflow occurs and what causes it?

    There's no direct way to install a debug version of libwarble. You can check out the Warble C code, build the debug version, and replace the libwarble.so in the pywarble Python package.

    git clone https://github.com/mbientlab/Warble.git --recurse-submodules
    make CONFIG=debug -C Warble
    

    The debug .so will be in Warble/dist/debug/lib/{arch}/libwarble.so
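    To find the copy of libwarble.so that needs replacing, something like the following sketch may help. It searches the directory of an installed package for matching files. The package name "mbientlab.warble" and the file name pattern are assumptions about how pywarble is laid out on disk, so check the result before overwriting anything.

```python
import importlib.util
from pathlib import Path

def find_shared_libs(package, pattern="libwarble.so*"):
    """Return files under an installed package whose names match pattern."""
    try:
        spec = importlib.util.find_spec(package)
    except ModuleNotFoundError:
        return []  # a parent package is not installed
    if spec is None or not spec.submodule_search_locations:
        return []  # not installed, or a plain module without a directory
    found = []
    for root in spec.submodule_search_locations:
        found.extend(sorted(Path(root).rglob(pattern)))
    return found

if __name__ == "__main__":
    # "mbientlab.warble" is an assumed package name; adjust if yours differs.
    for path in find_shared_libs("mbientlab.warble"):
        print(path)
```

    Keeping a backup of the original .so before copying the debug build over it makes it easy to go back.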

  • @guilhermesilveira said:
    Now, on to the scan_connect.py script from github:

    • run the script twice, in two different terminal windows
    • both list both devices
    • choose FB on one, worked fine, quit
    • choose E9 on the second one, got a "Timed out while trying to connect to remote device". tried again, same error
    • just in case, on the second one tried FB, same error
    • tried FB on the first one again, same error
    • tried E9 on the first one, worked
    1. So far I can see I am getting random connection timeouts. Are these random connection timeouts expected? I saw in some post that they were. If so, why do they happen, and how can I mitigate them besides a connection loop?

    Could be a variety of things. Connections won't always succeed on the first attempt; your app needs to account for that.

    Now that I was able to connect to both I go for both terminals and try again, first one on FB

    • first one connects successfully; while it is reading the info I select E9 on the second:
    • second one dies with a *** buffer overflow detected ***
    1. The conclusion is that the script (from mbientlab itself) works most of the time with individual pieces, but seems to fail every time I try to connect to two devices (either from the same python process, from my original program, or in two different python processes running from the same directory, such as this example).

    2. I cannot replicate connecting to two devices from the app, since the app connects to one device at a time.

    Are you trying to connect to two devices simultaneously? Connection attempts should be done serially not in parallel.
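    A minimal sketch of both pieces of advice, retrying failed attempts and connecting one device after another, could look like this. The helper names are hypothetical, and catching Exception is an assumption since the thread does not say which exception a timed-out connect raises. It only covers failures that surface as Python exceptions; it cannot catch the buffer overflow abort, which kills the process.

```python
import time

def connect_with_retry(connect, attempts=3, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or attempts run out.

    Returns the 1-based attempt number that succeeded and re-raises the
    last exception if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            connect()
            return attempt
        except Exception:  # assumed: timeouts surface as Python exceptions
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # linear backoff before the next try

def connect_all_serially(connectors, **retry_kwargs):
    """Connect devices one after another, never in parallel."""
    return [connect_with_retry(c, **retry_kwargs) for c in connectors]
```

    With the SDK this might be called as connect_all_serially([MetaWear(mac1).connect, MetaWear(mac2).connect]).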

  • edited April 19

    Thanks Eric, I will compile and run with the debug settings.

    Answering your question, the connections are serialized. Think of it as the following code, as if it were just one script:

    m1 = MetaWear(mac1)
    m1.connect()
    m2 = MetaWear(mac2)
    m2.connect()
    

    I suspect it might be the use of a poor USB dongle on Ubuntu. I will receive a new device from another branch tomorrow and try again.

    I will even try with both devices, specifying the HCI MAC address for each one, to see if the problem is with the devices or with some shared memory buffer inside the Bluetooth library itself. Those libraries are usually stable enough, so I believe it will be just a poor-quality dongle issue. I'll get back to it this week.

    Thanks

  • @minousoso can you tell me what the firmware update is?

  • @Eric @Laura my suspicion was correct: first, there is an issue with the bluetooth dongle. Some bluetooth dongles do not support more than one paired connection at a time, so after connecting to one device, the second one always fails. I got a few different models and most of them are quite stable: if they connect to up to 5 devices, they always connect to up to 5 devices.

    The problem with the buffer overflow error is that many situations result in a buffer overflow, and since it's a C-level error (not a Python error), it kills the program. I was able to isolate a few buffer overflow situations:

    First if the bluetooth dongle is removed while trying to make a connection, the buffer overflow occurs and kills the program.

    Second, when the bluetooth pairing limit is reached, it fails with the buffer overflow message. But we can't know beforehand - by code - if it reached the limit or not, since this is not something the bluetooth hardware provides us via code.

    Third, some random connection issues also seem to end up with a buffer overflow, I have not isolated it yet.

    I don't believe it's an issue in mbientlab's code; it is most probably an issue with the Linux bluetooth support. Unfortunately, it does make it unsafe to build a product on Linux that connects to the devices if the product lets end users choose their own bluetooth dongles. The program might crash with no chance of recovery.
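    One possible mitigation for the "no chance of recovery" problem is to run the risky connection work in a child process, so that an abort only kills the child and the parent can notice and react. The following is a rough sketch under that assumption, using only the standard multiprocessing module; run_isolated is a hypothetical helper, and the "fork" start method limits it to POSIX systems such as Linux.

```python
import multiprocessing as mp

def run_isolated(target, args=(), timeout=30.0):
    """Run target(*args) in a child process and report how it ended.

    Returns "ok", "crashed" (killed by a signal such as SIGABRT),
    "failed" (non-zero exit, e.g. an uncaught exception) or "timeout".
    """
    ctx = mp.get_context("fork")  # POSIX only
    proc = ctx.Process(target=target, args=args)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.kill()  # give up on a hung connection attempt
        proc.join()
        return "timeout"
    if proc.exitcode == 0:
        return "ok"
    # A negative exit code means the child was terminated by that signal.
    return "crashed" if proc.exitcode < 0 else "failed"
```

    Note that a connection opened in the child exists only in the child, so all the work on that device (connect, read, disconnect) would have to happen inside the target function, with results passed back through a queue or a file.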

    When required I will run some experiments with two different dongles and mac addresses and see what happens.

    regards

  • @guilhermesilveira,
    Amazing work! Please keep it up and do let us know the make and model of the dongles that performed better. We will be happy to let other users know and this is extremely useful for our community.

  • Is there a way to fix the *** buffer overflow detected ***: python terminated?
    I have issues while connecting to multiple MetaWear sensors.

    example of the code used:
    from __future__ import print_function
    import sys
    from mbientlab.metawear import MetaWear, libmetawear
    from mbientlab.metawear.cbindings import *
    from time import sleep
    from threading import Event

    def reset(MAC):
        # Connect to the board with the given MAC address
        device = MetaWear(MAC)
        device.connect()
        print("Connected")

        # Stop logging, erase log entries and macros, then queue a reset
        libmetawear.mbl_mw_logging_stop(device.board)
        libmetawear.mbl_mw_logging_clear_entries(device.board)
        libmetawear.mbl_mw_macro_erase_all(device.board)
        libmetawear.mbl_mw_debug_reset_after_gc(device.board)
        print("Erase logger and clear all entries")
        sleep(1.0)

        # Ask the board to drop the connection, then disconnect on the host side
        libmetawear.mbl_mw_debug_disconnect(device.board)
        sleep(1.0)

        device.disconnect()
        print("Disconnect")
        sleep(1.0)

    if __name__ == '__main__':
        # Reset each board in turn, pausing between devices
        reset("F4:XX:XX:XX:XX:23")
        sleep(1.0)
        reset("D5:XX:XX:XX:XX:34")
        sleep(1.0)
        reset("D8:XX:XX:XX:XX:B4")
        sleep(1.0)
        reset("DF:XX:XX:XX:XX:AA")

    error 1592506866.952153: Error on line: 296 (src/blestatemachine.cc): Operation now in progress
    *** buffer overflow detected ***: python terminated

  • In my case I:

    • first isolated the problem: does your entire program work with only one device?
    • if yes, try with two; if yes, three
    • find out when it hangs

    If it hangs only with more than one, you might be having the same problem that I did: the chip your USB dongle uses might not support more than N connections at once. In that case you can do as I did, buy a better chip :( Mine currently supports 5 connections, which covers my needs, but it does not handle 6.
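    The "try one, then two, then three" procedure can be sketched as a small probe. This is only an illustration: find_connection_limit is a hypothetical helper that takes the connect and disconnect steps as callables, and it assumes the failure shows up as a Python exception. A hard crash like the buffer overflow would have to be detected from outside the process instead.

```python
def find_connection_limit(macs, connect, disconnect):
    """Connect to devices one by one and return how many succeeded
    before the first failure. Every opened connection is closed again."""
    handles = []
    try:
        for mac in macs:
            try:
                handles.append(connect(mac))
            except Exception:  # assumed: hitting the limit raises
                break
        return len(handles)
    finally:
        # Close in reverse order so the adapter is left clean.
        for handle in reversed(handles):
            disconnect(handle)
```

    With the SDK, connect could be a small function that builds MetaWear(mac), calls its connect() and returns the device, and disconnect could call device.disconnect().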

    The other APIs are said to be more stable than the Python one because of the underlying libraries; I did not test any other library.

    regards

  • The JavaScript APIs are better for apps where you have many sensors because the bluetooth libraries are more reliable.

    As @guilhermesilveira mentioned, the Python libraries we use from third party vendors are very rudimentary and don't support multiple sensors and dongles as well.

    Please do make sure that your code is handling the multiple connections correctly. It should also handle failures: in case one of your sensors isn't reset properly, your code should automatically retry.

  • AKR
    edited July 2

    @guilhermesilveira said:
    In my case I:

    • first isolated the problem: does your entire program work with only one device?
    • if yes, try with two; if yes, three
    • find out when it hangs

    If it hangs only with more than one, you might be having the same problem that I did: the chip your USB dongle uses might not support more than N connections at once. In that case you can do as I did, buy a better chip :( Mine currently supports 5 connections, which covers my needs, but it does not handle 6.

    The other APIs are said to be more stable than the Python one because of the underlying libraries; I did not test any other library.

    regards

    Thanks @guilhermesilveira. What usb dongle are you using?
