Try different codecs for Orcasound #19

Open
Molkree opened this issue Jun 15, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@Molkree
Member

Molkree commented Jun 15, 2021

Streaming in flac

Right now aac seems to cut off frequencies above 16-17 kHz:
image

So flac (or even some tweaked aac?) will provide more data.
No nodes stream in flac at the moment but over the summer we can try that and see if it's significantly better.
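
One way to put a number on the cutoff seen in the spectrogram is to look for the highest frequency bin that still carries energy. The snippet below is a minimal sketch, not the method used for the image above: `estimate_cutoff_hz` and its `threshold_db` default are hypothetical, and the input is a synthetic mono signal standing in for a decoded segment.

```python
import numpy as np


def estimate_cutoff_hz(samples, rate, threshold_db=-60.0):
    """Return the highest frequency whose level is within threshold_db of the peak."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    # Level of each bin relative to the strongest bin, in dB.
    # The small constant avoids log10(0) for empty bins.
    level_db = 20.0 * np.log10(spectrum / spectrum.max() + 1e-12)
    above = np.nonzero(level_db > threshold_db)[0]
    return float(freqs[above[-1]])


# Synthetic stand-in for one channel of a decoded segment:
# tones at 1 kHz and 15 kHz, nothing higher.
rate = 48000
t = np.arange(rate) / rate
signal = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 15000 * t)
cutoff = estimate_cutoff_hz(signal, rate)
print(f"estimated cutoff: {cutoff:.0f} Hz")
```

Running the same function on an aac segment and on a flac recording of the same audio would give two numbers to compare, assuming the threshold is chosen above the noise floor of the recording.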

@Molkree Molkree added the enhancement New feature or request label Jun 15, 2021
@Molkree
Member Author

Molkree commented Jun 15, 2021

Related discussion from Slack

@wetdog:

I see that you're calculating the spectrogram directly on the data that comes from the wavfile.read method. I tested it, and matplotlib's specgram function does produce the same spectrogram images, but the scale of the data is different: wavfile.read outputs int16 or int24 data, and it's common to scale waveforms to the -1,1 range.
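
The scaling mentioned here can be sketched as follows. This is an illustration only: `to_unit_range` is a hypothetical helper, and it assumes signed integer PCM such as the int16 arrays discussed in this thread (unsigned 8-bit data would need an offset as well).

```python
import numpy as np


def to_unit_range(samples):
    """Scale signed integer PCM samples to floats in [-1, 1); floats pass through."""
    if np.issubdtype(samples.dtype, np.integer):
        # Full scale is one past the largest positive value, e.g. 32768 for int16.
        full_scale = float(np.iinfo(samples.dtype).max) + 1.0
        return samples.astype(np.float32) / full_scale
    return samples.astype(np.float32)


pcm = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
scaled = to_unit_range(pcm)
print(scaled)
```

Because the scaling is a constant factor, the spectrogram keeps the same shape and only its absolute level shifts.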

@Molkree:

yeah, it's int16 because ffmpeg converts to pcm_s16le

Input #0, mpegts, from 'bush_point/live999.ts':
  Duration: 00:00:09.92, start: 9991.417211, bitrate: 149 kb/s
  Program 1 
    Metadata:
      service_name    : Service01
      service_provider: FFmpeg
    Stream #0:0[0x100]: Audio: aac (LC) ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 124 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))

I haven't put much thought into ffmpeg settings when converting :thinking_face:
So maybe specifying f32le would increase quality, and the output would be in the float -1,1 range like you said
I just assumed that ffmpeg would choose the same/best format as the incoming .ts, but the docs say "The default for muxing into WAV files is pcm_s16le"
I just don't quite understand what the bit depth of the incoming audio is: it says fltp (planar floating point format, which suggests floating point) and aac (LC)/aac (native)
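
For context, AAC is a lossy codec that does not store samples at a fixed bit depth; fltp is the sample format ffmpeg's native decoder emits. The output format can be set explicitly with `-c:a` instead of relying on the WAV default. A minimal sketch, where `build_wav_command`, `convert`, and the output file name are hypothetical:

```python
import subprocess


def build_wav_command(src, dst, codec="pcm_f32le"):
    """Build an ffmpeg argument list that decodes src to WAV with an explicit PCM codec."""
    # -y overwrites dst if it exists; -c:a selects the output audio codec.
    return ["ffmpeg", "-y", "-i", src, "-c:a", codec, dst]


def convert(src, dst, codec="pcm_f32le"):
    """Run the conversion; requires ffmpeg on PATH."""
    subprocess.run(build_wav_command(src, dst, codec), check=True)


cmd = build_wav_command("bush_point/live999.ts", "live999_f32.wav")
print(" ".join(cmd))
```

Passing `codec="pcm_s24le"` or `codec="pcm_s16le"` gives the other variants discussed in this thread.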

So again to the question if it's even worth it to use more bits cause I don't know the incoming source
I've looked at the .ts file through MPC-HC Properties and it lists this:

Audio: PCM 48000Hz stereo 1536kbps [A: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s]
Audio: IEEE Float 48000Hz stereo 3072kbps [A: pcm_f32le, 48000 Hz, stereo, fp32, 3072 kb/s]
Audio: AAC 48000Hz stereo 130kbps [A: aac lc, 48000 Hz, stereo, 130 kb/s]

Both pcm_s16le and pcm_f32le so I'm confused

I've tried converting .ts to pcm_f32le .wav
here's the comparison:
.ts -- 181 KB
.wav pcm_s16le -- 1877 KB
.wav pcm_f32le -- 3753 KB
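
These sizes are about what uncompressed PCM arithmetic predicts. The sketch below assumes a segment of roughly 10 seconds (the actual segment length is not stated in this thread), 1 KB = 1024 bytes, and ignores the small WAV header; both function names are hypothetical.

```python
def pcm_bitrate_kbps(rate, channels, bits):
    """Uncompressed PCM bitrate in kilobits per second."""
    return rate * channels * bits / 1000


def pcm_size_kib(rate, channels, bits, seconds):
    """Uncompressed PCM payload size in KiB (1024 bytes), header not included."""
    return rate * channels * (bits // 8) * seconds / 1024


s16_kbps = pcm_bitrate_kbps(48000, 2, 16)   # matches the 1536 kb/s listed for pcm_s16le
f32_kbps = pcm_bitrate_kbps(48000, 2, 32)   # matches the 3072 kb/s listed for pcm_f32le
s16_kib = pcm_size_kib(48000, 2, 16, 10)    # assumed 10 s segment
f32_kib = pcm_size_kib(48000, 2, 32, 10)    # assumed 10 s segment
print(s16_kbps, f32_kbps, s16_kib, f32_kib)
```

Doubling the bits per sample doubles the file size, which is consistent with the 1877 KB and 3753 KB figures; it does not by itself say anything about whether the extra bits carry more information from the aac source.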

int16 spectrogram
live000_int16_spectrogram

float32 spectrogram
live000_float32_spectrogram

@scottveirs:

FWIW each of the current nodes is using the Pisound ADC to sample the hydrophones at 48kHz and 24 bits, usually in stereo, i.e. two channels from two nearby hydrophones. The ffmpeg command running on each node is something close to:

ffmpeg -f jack -i ffjack -f segment -segment_list "/tmp/$NODE_NAME/hls/$timestamp/live.m3u8" -segment_list_flags +live -segment_time $SEGMENT_DURATION -segment_format mpegts -ar $STREAM_RATE -ac $CHANNELS -threads 3 -acodec aac "/tmp/$NODE_NAME/hls/$timestamp/live%03d.ts"

Note the -acodec aac part which I think suggests we have been using (without much forethought) whatever ffmpeg considers "defaults" for aac encoding -- https://trac.ffmpeg.org/wiki/Encode/AAC

Confirming via remote login to each Rpi just now that Orcasound Lab, Bush Point, and Port Townsend all have STREAM_RATE=48000. They are all sampling the hydrophone signals at 48,000 samples/second, though Orcasound Lab is running a slightly more recent branch of orcanode using Jack that we know could be pushed up to a sample rate of 192,000 (e.g. for experiments in higher-resolution sampling this summer).
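
As a rough illustration of what these sample rates mean for the frequency range, the highest representable frequency is half the sample rate (the Nyquist limit). The 17 kHz figure below is taken from the aac cutoff reported at the top of this issue; the function name is hypothetical.

```python
def nyquist_hz(sample_rate):
    """Highest frequency representable at the given sample rate."""
    return sample_rate / 2


current_limit = nyquist_hz(48000)     # current STREAM_RATE on the nodes
highres_limit = nyquist_hz(192000)    # possible rate on the Jack-based branch
# Bandwidth between an approximate 17 kHz aac cutoff and the 48 kHz Nyquist limit.
lost_to_aac = current_limit - 17000
print(current_limit, highres_limit, lost_to_aac)
```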

@Molkree:

I can't see any difference at all between int16/float32 in Sox (it was noticeable with the previous method)
But if Scott said that the hydrophones are sampled at 24 bits, maybe s24le would be enough anyway
