
Planning for long duration testing with audio watermarking #46

Closed
cta-source opened this issue Feb 4, 2022 · 10 comments

Comments

@cta-source
Contributor

The current audio watermarking proposal has a 60 s duration "base" pseudo-random sequence. The test content spec (currently being integrated into the DPC spec as an Annex) defines some stereo WAV files, e.g., PN01.wav, with the prescribed noise in the L channel and silence in the R channel. The code structure permits extracting a mediaTime from a 20 ms segment of recorded audio.

For long duration playback testing, if we loop the 60 s PN sequence, mediaTime effectively becomes modulo 60 s, so an actual time of 10m12s will be detected as 12 seconds.
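A minimal sketch of this folding behavior (the loop length is from the proposal above; the function name is hypothetical):

```python
# Sketch of the modulo-60s ambiguity when looping a 60 s PN sequence:
# any actual playback time maps to a detected mediaTime in [0, 60).
LOOP_S = 60.0  # length of the base PN sequence in seconds

def detected_media_time(actual_s: float) -> float:
    """mediaTime recovered from a looped PN sequence folds modulo the loop length."""
    return actual_s % LOOP_S

# 10 minutes 12 seconds of real playback is indistinguishable from 12 s:
print(detected_media_time(10 * 60 + 12.0))  # 12.0
```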

I think this is adequate for our purposes, since the OF can be monitoring time throughout. An error of skipping an integer number of minutes could happen in theory, but it would need to be accurate to the 20 ms level. I'm not sure it's worth the effort to plan for, e.g., 2-hour-long sequences. Plus there are some implementation issues: we do NOT want to process 2 hours of PN sequence in the same manner as we process 60 s.

So--is this OK for long duration playback, that when we compare the recorded time of X min Y seconds, we only look at the Y?

@rbouqueau
Collaborator

> So--is this OK for long duration playback, that when we compare the recorded time of X min Y seconds, we only look at the Y?

The answer to this question is also needed to generate the test content.

@jpiesing

> So--is this OK for long duration playback, that when we compare the recorded time of X min Y seconds, we only look at the Y?

@cta-source Who is in a position to answer this question?

@cta-source
Contributor Author

I can answer, since it's my proposed approach. @rbouqueau , I'm going to over-complicate this in case I don't have your intended question right.

  • If we're doing a Long Duration Playback test, verifying that the audio played out 100% and did not terminate early, I would verify that X and Y are correct, not just Y.

  • If the test is, "all segments played out and in order, through the (e.g.) 2 hour period", there is code in the audio white noise project to verify that each audio segment occurs at the correct location. This code can only check around a local "neighborhood" of length observationperiod, which is usually 20 ms. Because the pre-coded white noise file (e.g., PN01.wav) is only 60 s long, you can only use it to find a segment error to a mod(60 s) resolution. The calling program must repeatedly ask, "Starting at neighborhood, where in the pre-coded white noise file does the given audio for the next observationperiod show up?" In this example, neighborhood might be, e.g., 1h12m18s (but expressed in units of samples at 48 kHz); the observationperiod is 20 ms (also in samples); you must specify PN01.wav, and the return is an array of presentation times (again in samples).
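The repeated "where in the PN file does this window show up?" query might look roughly like the following. This is a hypothetical sketch, not the actual white noise project code: synthetic noise stands in for PN01.wav, the function name `locate` is invented, and for brevity the search steps in 20 ms strides rather than sample-by-sample (real code would search every offset, e.g. via FFT correlation).

```python
import numpy as np

FS = 48_000                    # sample rate assumed from the discussion
OBS = int(0.020 * FS)          # observationperiod: 20 ms = 960 samples

rng = np.random.default_rng(0)
pn = rng.standard_normal(60 * FS).astype(np.float32)  # 60 s stand-in for PN01.wav

def locate(window: np.ndarray, pn: np.ndarray) -> int:
    """Return the sample offset in pn where window best matches (mod 60 s).

    Sliding dot-product search, strided by the window length for brevity.
    """
    best, best_score = -1, -np.inf
    for off in range(0, len(pn) - len(window) + 1, len(window)):
        score = float(np.dot(pn[off:off + len(window)], window))
        if score > best_score:
            best, best_score = off, score
    return best

# A window captured at 1h12m18s of looped playback resolves only modulo 60 s,
# i.e. to the position corresponding to the 18 s mark of the PN file:
actual_start = ((1 * 3600 + 12 * 60 + 18) * FS) % (60 * FS)
window = pn[actual_start:actual_start + OBS]
print(locate(window, pn) == actual_start)  # True
```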

Critically, for generating the test content--it would be much better if the 'looping' consistently puts exactly the first sample of the next block of PN01 file data immediately after the final sample of the prior block of PN01 file data. That is, if PN01 is the array of samples of PN01,

T=00: PN01.wav[0], PN01.wav[1], ... PN01.wav[2879999]
T=60: PN01.wav[0], PN01.wav[1], ... PN01.wav[2879999]
T=120: PN01.wav[0], PN01.wav[1], ... PN01.wav[2879999]
...
T=(1h59m00s): PN01.wav[0], PN01.wav[1], ... PN01.wav[2879999]

...with no extra sample or missing sample between sample 2879999 of one block and sample 0 of the next. (60 s at 48 kHz is 2,880,000 samples per block.)
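The seam condition above could be checked along these lines. This is a sketch under assumptions: a 1-second stand-in block replaces the full 60 s PN01 sample array, and the function name is hypothetical.

```python
import numpy as np

FS = 48_000
# 1 s stand-in for the PN01.wav sample array; the real block is 60 s long.
block = np.arange(FS, dtype=np.int32)
looped = np.tile(block, 10)  # back-to-back copies, no seam samples added or dropped

def loop_is_sample_exact(looped: np.ndarray, block: np.ndarray) -> bool:
    """True iff looped is an exact whole number of back-to-back copies of block."""
    n = len(block)
    if len(looped) % n != 0:
        return False
    # Every block-sized slice must equal the original, so each seam places
    # sample 0 of the next copy immediately after the last sample of the prior one.
    return all(np.array_equal(looped[i:i + n], block)
               for i in range(0, len(looped), n))

print(loop_is_sample_exact(looped, block))                  # True
print(loop_is_sample_exact(np.delete(looped, 100), block))  # False: one sample dropped
```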

If this is not feasible, please let me know. If you can get a test file ready, I can check it.

@rbouqueau
Collaborator

Understood. However, who can confirm that only checking Y in {X,Y} is OK for validation?

@jpiesing

> Understood. However, who can confirm that only checking Y in {X,Y} is OK for validation?

I don't fully understand the PN approach, but it seems to me that it would catch the following:

  • Glitches (repetitions, omissions) where these show as a block of non-standard length
  • A wrong static offset between video and audio less than the 60s block length
  • A wrong dynamic offset between video and audio as long as this is small
  • A consistent drift between video and audio that is not corrected

Are there other kinds of problems that might happen?

If checking Y in {X,Y} would catch the problems we can identify and there's no better solution then go for it.

@cta-source
Contributor Author

Sorry, I wasn't clear. I originally proposed checking only Y in {X, Y} in February. Since then, after more discussion, I don't think that's right. See my 7/22 comment in this issue. To summarize,

  • If we're doing a Long Duration Playback test, verifying that the audio played out 100% and did not terminate early, I would verify that X and Y are correct, not just Y. This can be done in a number of ways, most of which don't require the white noise code.
  • If we are verifying the DPCTF observation requirement that all segments play out in order, then checking for Y does not do that. The white noise code can check that.

So: Checking only Y is not OK for validation. Checking X,Y for length of playback makes sense if that is the observation requirement. Using white noise segment validation makes sense for stricter observation requirements.

If it would help to jump on a call, I can do that.

@jpiesing

I'm still lost about what we're discussing.

I think the question we're trying to answer is "what audio to use in the long duration playback stream"? Am I correct?

If so, then I see 4 choices.

  • Looping PN01v02.wav
  • The original audio from Croatia / Tears of Steel as appropriate with the beeps that @nicholas-fr adds to sync with the flashes
  • Both of the above mixed
  • Nothing

Is there a 5th choice?

I was expecting to go with option 1 ...

@cta-source
Contributor Author

@jpiesing and @rbouqueau ; I agree with the 4 choices except that "nothing" is a bit of a null choice. I don't see a 5th choice. Commenting on the remaining 3:

If we pick one "winner" of these three options, we are betting a bit on what we can or cannot do.

  • Looping PN01v02.wav
    What we get: Most robust version of testing mic-to-speaker
    What we lose: Any manual audio sync testing from beeps and flashes.
  • The original audio from Croatia / Tears of Steel as appropriate with the beeps that @nicholas-fr adds to sync with the flashes
    What we get: Manual audio sync testing
    What we lose: Any automated (white noise) testing
  • Both of the above mixed
    What we get: Manual audio sync testing; wired (line-out to line-in) audio sync testing; maybe mic-to-speaker audio sync testing (depends on environmental noise).
    What we lose: Robustness of the noise-only version for mic-to-speaker; enough environmental noise will overwhelm the system.

(Re "enough" environmental noise: In one mic-to-speaker white noise test, I put the mic next to a TV turned up to a reasonable listening volume, and put the speaker a couple of meters away. No problem.)

If the above three are the options, I recommend we attempt "mixed". Based on my own testing, the white noise should be resolved properly. We'll know when we try the Croatia annotated audio, but my tests were promising.

Could I suggest an intermediate step? Generate two cases, "Looping" and "Mixed" (#1 and #3 above), but for shorter times, like 10 minutes. I can test them and find out things like, does a beep screw up the sync pattern on the receiver, does the noise come through on mixed, is the looping accurate (no dropped/added periods) at the sample level?

@rbouqueau , for mixing, the white noise file should be mixed at -13 dB below the "main" audio file peak signal (and the peak may be the beep?). That is, if the main audio peaks at one level, the white noise should be attenuated by 13 dB before mixing with the main.
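One reading of that -13 dB rule, sketched with synthetic signals. The function name, the peak-referenced gain interpretation, and the test signals are assumptions for illustration, not the project's actual mixing code.

```python
import numpy as np

def mix_with_noise(main: np.ndarray, noise: np.ndarray,
                   rel_db: float = -13.0) -> np.ndarray:
    """Mix noise into main with the noise peak rel_db below the main peak.

    Gain for -13 dB = 10 ** (-13 / 20) ≈ 0.224 relative to the main peak.
    """
    main_peak = np.max(np.abs(main))
    noise_peak = np.max(np.abs(noise))
    # Scale noise so its peak sits exactly rel_db below the main track's peak.
    gain = (main_peak / noise_peak) * 10 ** (rel_db / 20)
    return main + gain * noise

rng = np.random.default_rng(0)
# 1 s of a 1 kHz tone at peak 0.8 stands in for the "main" audio (incl. any beep).
main = 0.8 * np.sin(2 * np.pi * 1000 * np.arange(48_000) / 48_000)
noise = rng.standard_normal(48_000)  # stand-in for the PN01 white noise

mixed = mix_with_noise(main, noise)
```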

@rbouqueau
Collaborator

I see that some audio is included in https://dash.akamaized.net/WAVE/Mezzanine/releases/3/tos_LD1_1920x1080@30_7200.mp4. Does this mean that this issue is implemented? CC @nicholas-fr

@nicholas-fr
Collaborator

Audio in LD content was replaced with looped PN01 in mezzanine release v4, resolving this issue.

Further work to determine whether we can use combined PN + source audio is underway; a separate issue (#55) was raised to track that.
