
Erroneous data tries to resend forever. #1

Open
ninjatill opened this issue Oct 10, 2018 · 2 comments

Comments


ninjatill commented Oct 10, 2018

I recently started using this library, and in testing today some erroneous data was published. I'm not sure what generated the erroneous data; I am using an RS232 barcode scanner to read barcodes. The erroneous data sits in the queue and resends forever, which blocks any subsequent good data from ever getting published. I would recommend tracking the number of retries and setting a retry limit. Once the limit is reached, the bad data can be deleted from the queue.
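The retry-limit idea above can be sketched as a small counter attached to each queued event. This is only an illustration of the suggested behavior; `RetryTracker` and `maxRetries` are hypothetical names, not part of PublishQueueAsyncRK:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: track failed publish attempts per event and
// signal when the event should be discarded instead of retried again.
struct RetryTracker {
    uint8_t attempts = 0;
    uint8_t maxRetries;

    explicit RetryTracker(uint8_t maxRetries) : maxRetries(maxRetries) {}

    // Call after each failed publish; returns true once the retry
    // limit is reached and the event should be dropped from the queue.
    bool recordFailureAndCheckDiscard() {
        return ++attempts >= maxRetries;
    }
};
```

In the library's retry path, the discard branch would call the same routine that removes a successfully published event, so a poison message can no longer pin the head of the queue.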

I would also suggest a constructor option that would allow the queue to skip over the retrying data and send subsequent messages in the queue, while still retrying the first message within the retry timeout. This would cause published messages to arrive out of order, but in my case that is perfectly acceptable.
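The skip-over behavior could be as simple as choosing the first event that is not waiting out a retry, rather than always the head. The following is a hypothetical sketch, not PublishQueueAsyncRK code; `QueuedEvent` and `pickNext` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of "skip the retrying head": each queued event
// remembers whether its last publish failed and is waiting for retry.
struct QueuedEvent {
    bool waitingForRetry = false;
};

// Returns the index of the next event to publish, or -1 if every
// event is still inside its retry window. A stuck head no longer
// blocks newer events from being sent (at the cost of ordering).
int pickNext(const std::vector<QueuedEvent> &queue) {
    for (size_t i = 0; i < queue.size(); i++) {
        if (!queue[i].waitingForRetry) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```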

In this image, the erroneous data is retrying every 30 seconds. I scanned a new barcode, but it goes into the queue and never gets published because the erroneous data blocks any new publishes.

[screenshot: the erroneous event retrying every 30 seconds]

ninjatill (Author) commented

I missed the fact that this library uses retained memory for the queue, so a simple reset failed to purge the erroneous data. This morning I had to cut power to the device to clear the bad data.


zabaat commented Apr 30, 2023

My corrupted data was causing crashes when it tried to upload. I'm approaching this by modifying checkQueueState() to call a validation function that checks the event for garbage and discards it, as suggested by @atwalsh in #9.
🙏 that it works in the field, since I have 3 devices crashing with over 6,000 events on them.
I tested this against an events.dat pulled from one of the SD cards, and it filters out the bad event, as you can see in this image, while allowing the others to proceed.
[screenshot: log output showing the bad event filtered while subsequent events publish]

#include <cctype>
#include <cstring>
#include <string>

bool isAnomalous(const std::string &s) {
    if (s.empty()) {
        return false; // avoid dividing by zero below
    }
    size_t nonPrintableCount = 0;
    for (const char &c : s) {
        // Cast to unsigned char: passing a negative char to std::isprint is undefined behavior
        if (!std::isprint(static_cast<unsigned char>(c)) && c != '\0') {
            nonPrintableCount++;
        }
    }
    // Threshold 0.0 means any non-printable character marks the event as
    // anomalous; raise it (e.g. 0.1 for 10%) to tolerate some noise.
    double threshold = 0.0;
    pubqLogger.trace("anomalous ratio: %f", static_cast<double>(nonPrintableCount) / static_cast<double>(s.length()));
    return (static_cast<double>(nonPrintableCount) / s.length()) > threshold;
}

// isAsciiPrintable() is a separate helper, not shown here.
bool validateEvent(PublishQueueEventData *eventData) {
    if (eventData == nullptr) {
        return false;
    }

    // The event name and data are packed as NUL-terminated strings
    // immediately after the PublishQueueEventData header.
    const char *buf = reinterpret_cast<const char *>(eventData);
    const char *eventName = &buf[sizeof(PublishQueueEventData)];
    const char *eventDataStr = eventName + strlen(eventName) + 1;

    std::string eventNameStr(eventName);
    std::string dataStr(eventDataStr);

    return !isAnomalous(dataStr) && isAsciiPrintable(dataStr);
}

void PublishQueueAsyncBase::checkQueueState() {
    if (!pausePublishing && Particle.connected() && millis() - lastPublish >= 1010) {
        PublishQueueEventData *data = getOldestEvent();

        if (data && validateEvent(data)) {
            // We have an event and can probably publish
            isSending = true;

            const char *buf = reinterpret_cast<const char *>(data);
            const char *eventName = &buf[sizeof(PublishQueueEventData)];
            const char *eventData = eventName;
            eventData += strlen(eventData) + 1;

            PublishFlags flags(PublishFlag(data->flags));

            pubqLogger.info("publishing %s %s ttl=%d flags=%x", eventName, eventData, data->ttl, flags.value());

            auto request = Particle.publish(eventName, eventData, data->ttl, flags);

            // Use this technique of looping because the future will not be handled properly
            // when waiting in a worker thread like this.
            while (!request.isDone()) {
                delay(1);
                if (!isSending) {
                    pubqLogger.info("publish canceled");
                    return;
                }
            }
            bool bResult = request.isSucceeded();
            if (bResult) {
                // Successfully published
                pubqLogger.info("published successfully");
                discardOldEvent(false);
            } else {
                // Did not successfully transmit, try again after retry time
                // This string is searched for in the automated test suite, if edited the
                // test suite must also be edited
                pubqLogger.info("publish failed, will retry in %lu ms", failureRetryMs);
                stateHandler = &PublishQueueAsyncBase::waitRetryState;
            }
            isSending = false;
            lastPublish = millis();
        } else {
            // No event, or the event failed validation
            // CUSTOM CODE https://github.com/rickkas7/PublishQueueAsyncRK/issues/9
            // There may still be events in the queue even though lookup or validation failed
            if (this->getNumEvents() > 5) {
                pubqLogger.info("DISCARDING BACKLOGGED EVENTS! custom modification to PublishQueueAsyncRK.cpp");
                discardOldEvent(false);
            }
        }
    } else {
        // Not cloud connected or can't publish yet (not connected or published too recently)
    }
}
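To sanity-check the filter off-device, the isAnomalous() heuristic can be exercised on its own with a few sample strings. This is a standalone copy (without the logger call) so it compiles on a desktop toolchain:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Standalone copy of the isAnomalous() heuristic above, for host-side testing.
bool isAnomalous(const std::string &s) {
    if (s.empty()) {
        return false;
    }
    size_t nonPrintableCount = 0;
    for (const char &c : s) {
        if (!std::isprint(static_cast<unsigned char>(c)) && c != '\0') {
            nonPrintableCount++;
        }
    }
    // Any non-printable byte marks the event as anomalous (threshold 0).
    double threshold = 0.0;
    return (static_cast<double>(nonPrintableCount) / s.length()) > threshold;
}
```

With threshold 0.0, a single control character is enough to flag an event, which matches the intent of discarding corrupted barcode data outright.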
