Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing 1 in 256 edge-case that can lead to continuous no-output. #10

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Setcover
Copy link

@Setcover Setcover commented Feb 6, 2021

If the last byte of the checksum is 0x42 it can be mistaken for the first start-byte, causing a byte-shift which leads to checksum mismatch followed by no ouput (continuously in the HEPA edge-case).
Therefore checking the secondy start-byte is necessary.

The probability of this happening is 1 in 256 and in addition it has 100% probability if testing HEPA-filtered air with 0 particles.
Reason: With HEPA-filtered air all bytes are zero except:
start 1 + start 2 + frame-length + reserved
Decimal: 66 + 77 + 28 + 151 checksum: 256 + 66
Hex: 0x42 0x4d 0x00 0x1c ... 0x97 checksum: 0x01 0x42
Testing with hepa-filtered air, an arduino nano, software serial (2,3) the byte-shift with continuous no-output happened repeatedly after less than 10 measurement cycles. With disabled checksum-check the 15 uint16_t components of data then look like this:
Byte-shift.txt

---- considering & disregarding further edge-cases ----
It is not possible for byte 31 (second checksum-byte) to be 0x42 too, as the sum of the first 30 bytes is at most 30*255 which is less than 66 * 256 which is the smallest checksum that would cause byte 31 to be 0x42 (=66).
The sequence 0x42 0x42 0x42 0x4d with the last two being the start bytes is therefore impossible.

Though it is not impossible for the sequence 0x42 0x42 0x4d or 0x42 0x4d to appear at another position in the sequence it can only happen with:

  • particle count above 19712 = 77*256 and next finer particle count being equal to 66 mod 256
  • particle count of precisely 16973 = 66*256 + 77

which is both really unlikely and very likely not a stable condition with a low chance of multiple byte-shift leading to checksum-mismatch and continuous no-output.

…cksum mismatch and therefore no output.

If the last byte of the checksum is 0x42 it can be mistaken for the first start-byte, therefore checking the secondy start-byte is neccessary.
The probability of this happening is 1 in 256 and in addition it has 100% probability if testing hepa-filtered air with 0 particels.
Reason: All non-zero bytes remaining are: 
Decimal: 66 + 77 + 28 + 151 checksum: 256 + 66 
Hex: 0x42 0x4d 0x00 0x1c ... 0x97 checksum: 0x01 0x42

It is not possible for byte 31 (second checksum-byte) to be 0x42 too, as the sum of the first 30 bytes is at most 30*255 which is less than 66 * 256 which is the smallest checksum that would cause byte 31 to be 0x42 (=66). The sequence 0x42 0x42 0x42 0x4d with the last two being the start bytes is therefore impossible.

Though it is not impossible for the sequence 0x42 0x42 0x4d or 0x42 0x4d to appear at another position in the sequence it can only happen with particle count above 16896 = 66*256 and then this event is extremely unlikely itself and will extremely likely lead to a checksum mismatch therefore be discarded.
@Setcover Setcover changed the title Fixing 1 in 256 edge-case that can lead to byte shift followed by che… Fixing 1 in 256 edge-case that can lead to continous no-output. Feb 6, 2021
fixed formatting
this time using clang-format
@ladyada
Copy link
Member

ladyada commented Feb 6, 2021

@dherrada wanna test?

@Setcover Setcover changed the title Fixing 1 in 256 edge-case that can lead to continous no-output. Fixing 1 in 256 edge-case that can lead to continuous no-output. Feb 6, 2021
@ladyada ladyada requested a review from caternuson September 6, 2021 00:45
@caternuson
Copy link

Would switching to passive mode be an easier way to deal with this?

It looks like the issue relates to determining the start byte(s) from a continuous stream of serial data (active mode). Datasheet is a bit sparse, but it looks like passive mode is more of a request/response setup.

image

@brentru
Copy link
Member

brentru commented Jan 23, 2025

@caternuson Hi - we are looking at an issue with this driver on WipperSnapper. It's been a few years since you proposed this fix. Do you still think that switching the read to "passive mode" would resolve this issue?

@caternuson
Copy link

@brentru I think it's at least worth investigating. Just checked and I do not currently have a PID 3686 to try it out though. I'll order one up to have since it looks like the general issue this PR was fixing still exists.

@brentru
Copy link
Member

brentru commented Jan 24, 2025

@caternuson Thanks!

@caternuson
Copy link

@brentru OK, got a PID 3686 and played around with it a bit. It looks like all the current guides and examples are based on this "active mode" which does have the benefit of only needing to connect to the TX pin. My suggestion would require more than a simple code update, since connecting to RX would then be needed. So, no longer suggesting "passive mode". At least not now - maybe for a future major revision.

The general idea for the fix seems simple - just need to check for both start bytes:
image
and not just 0x42.

This PR has merge conflicts now though. :(

What are the two sensors being supported now?

// Check for the start character in the stream for both sensors
if ((serial_dev->peek() != 0x42) && (serial_dev->peek() != 0x16)) {
serial_dev->read();
return false;
}

What has the 0x16 start byte? And might a similar issue exist for that sensors?

@brentru
Copy link
Member

brentru commented Jan 30, 2025

The 0x16 start byte may be the for the Cubic PM1006 air quality sensor present on the IKEA VINDSTYRKA

@caternuson
Copy link

Yah, looks like. It was added with #13. And it looks like it does have more than 0x16 as a start sequence:

// Validate start header is correct if using Cubic PM1006 sensor
if (is_pm1006 &&
(buffer[0] != 0x16 || buffer[1] != 0x11 || buffer[2] != 0x0B)) {
return false;
}

but with 3 bytes less likely to be false detected.

I agree with the general intent of what this PR is attempting. Stick with "active mode" for the sensor and improve the start sequence detection logic. But now there are two sensors in play. It may be easier to break things out in subclasses and override read() in each as needed for the specific sensor.

@brentru
Copy link
Member

brentru commented Jan 30, 2025

It may be easier to break things out in subclasses and override read() in each as needed for the specific sensor.

Yeah, Cubic has more than just the start byte to detect. I think that is a good plan, virtual read() then both classes override it (PM and Cublic).

@brentru
Copy link
Member

brentru commented Jan 31, 2025

@caternuson Aren't we detecting both start bytes for active mode within the existing driver, though:

https://github.com/adafruit/Adafruit_PM25AQI/blob/master/Adafruit_PM25AQI.cpp#L130

@caternuson
Copy link

Hmmm. Maybe? This issue/PR is 4 years old now. And that looks like an update to the logic since then.

The code at the point of time of this PR was much simpler and just checked for the first byte:

if (serial_dev->peek() != 0x42) {
serial_dev->read();
return false;
}
// Now read all 32 bytes
if (serial_dev->available() < 32) {
return false;

Can also see that in the code diff for this PR.

So maybe those updates fixed this issue? But you are currently seeing an issue with Wippersnapper?

@brentru
Copy link
Member

brentru commented Jan 31, 2025

@caternuson Yep, I understand. It appears that we are checking for both start bytes (buffer[0] != 0x42 || buffer[1] != 0x4d). I'm not sure what an appropriate fix would be here without more testing.

But you are currently seeing an issue with Wippersnapper?

@tyeth raised the issue in WipperSnapper in September and I believe he tested it recently. He was able to "patch" it by adding a retry around reading the data, but long-term that's a patch around a driver-specific issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants