RFC: Visualization Output Plugin #1449

Draft: wants to merge 1 commit into master
src/output/plugins/visualization/VisualizationOutputPlugin.cxx (250 additions)
/*
* Copyright 2003-2022 The Music Player Daemon Project
* http://www.musicpd.org
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/

/**
* \page vis_out_protocol Visualization Network Protocol
*
 * See \ref vis_out "RFC: Visualization Output Plugin" for background.
*
*
* \section vis_out_protocol_timing Timing
*
* In order to deliver sound data to the client at the proper time, the protocol
* needs to take into account:
*
* - network latency: the delta between writing the sound data to the socket & its
* receipt on the client
*
* - player buffering: the player may buffer sound data (mplayer, for instance,
* buffers half a second's worth of audio before beginning playback)
*
* - render time: the client presumably wishes the current frame to appear on-screen
* at the moment the current sound information is ending
*
 * Throughout, let \e t be "song time" as measured on the server, and T(t) be
* sound information for song time \e t. Let FPS be the frames-per-second at
* which the client would like to render.
*
* Then, at an interval of 1/FPS seconds, the server needs to write
*
\verbatim
T(t - {buffer time} + {render time} + {one way latency})
\endverbatim
*
 * to the client socket. If we denote that offset (-{buffer time} + {render
 * time} + {one way latency}) by tau, then the server should wait max(0, -tau)
 * ms before writing the first frame.
*
* A few examples will illustrate.
*
* \subsection vis_out_protocol_timing_eg_1 Example 1
*
* Let the client render time be 4ms and round-trip network latency be
* 6ms. Assume no player buffering. In order to render a frame corresponding to
* song time \e t, the client would need, at time \e t - 4 ms, sound information
* corresponding to time \e t, or T(t). The server would need to \e send that
* information at time \e t - 7ms (half of one round-trip plus render time).
*
* In other words, on the server side at song time \e t, we would need to write
* T(t + 7ms) to the client socket. If the server writes T(t+7ms) immediately,
 * the client will receive it at \e t + 3ms, take 4ms to render the next frame,
 * and so finish rendering T(t+7) at \e t + 7ms.
*
* \subsection vis_out_protocol_timing_eg_2 Example 2
*
* Imagine we are running the same client on a laptop, connected to an MPD
* server over the internet, and using mplayer as the player. This gives 500ms
* of buffer time. Let us assume the same 4ms render time, but now a 20ms
* round-trip time.
*
* In order to render a frame corresponding to song time \e t, the client would
* need, at time \e t - 4ms, T(t). This would need to be sent from the server at
* time \e t - 14ms. We now need to incorporate the client-side buffering,
 * however: song time \e t will actually be played on the client at \e t + 500ms
 * as measured on the server.
*
* In other words, on the server side at song time \e t, we would need to write
* T(t-486ms) to the client socket.
*
 * Since the sound won't start on the client for 0.5 sec, it would make no sense
 * to begin writing sound information for the first 486ms. Let t(0) be the moment the
* client connects and the player begins buffering. If, at t(0) + 486ms, the
* server writes T(t(0)), the client will receive it at t(0) + 496ms & complete
* rendering it at t(0) + 500ms, which is when the client-side player will
* begin playing song time t(0).
*
* \section vis_out_protocol_proto The Protocol
*
* \subsection vis_out_protocol_proto_design Design
*
 * The author is unaware of any existing network protocols in this area, so he
 * designed his own after reviewing the Shoutcast & Ultravox
* protocols. Experience with the TLS & 802.11 protocols also informed this
* design.
*
* Design goals include:
*
* - client convenience
* - this in particular drove the choice to stream updates; everything
* needed to simply push the data out is knowable at handshake time,
* so why force the client to send a request?
* - efficiency on the wire
* - binary format
* - streaming preferred over request/response
* - future extensibility
* - protocol versioning built-in from the start
* - parsing convenience
* - streaming messages come with a few "magic bytes" at the start
* to assist clients in "locking on" to the stream & recovering from
* corrupted data
* - all messages conform to the "type-length-value" (TLV) format
* beloved of parser writers
*
*
* \subsection vis_out_protocol_proto_overview Overview
*
* The protocol is a combination of request/response as well as streaming. After
* an initial handshake (client goes first) the server will begin streaming
* messages to the client; i.e. at the interval the client specified during the
* initial handshake the server will send FRAME messages containing sound
* information useful for visualizers. Additionally, METADATA messages will be
* sent on receipt of notifications from MPD that the song has changed. The
 * client need not request these messages, nor does the client need to acknowledge
* them in any way.
*
* The client \e may, at any time after handshake completion, initiate two other
* exchanges:
*
 * - PING: the client may want to periodically adjust its estimate of the
 * round-trip latency; the server will respond with a PONG.
* The client can measure the delta between request & response & update
* its estimate.
*
* - ADJBUF: the client may periodically want to adjust the "buffer time";
* that is, the delta between song time as measured on the server and
 * the song time to which each FRAME & METADATA message corresponds; the server
 * will adjust its internal timers & respond with an ADJBUFACK message.
* The server \e may send one more frame at the old setting of tau.
*
* Schematically, a conversation looks like this:
*
\verbatim
Client Server

desired protocol version
tau (buffer offset)
desired sound params --------- CLIHLO --------->
damping parameter
window parameter
...

<-------- SRVHLO --------- offered protocol version
|
sequence number --------- PING ----------> |
<-------- PONG ---------- sequence number | tau ms
|
|
<------- METADATA-------- artist, title &c v
<-------- FRAME --------- samples, spectrum |
bass/mids/trebs |
... | tau ms
|
<------- METADATA-------- artist, title &c v
<-------- FRAME --------- samples, spectrum |
bass/mids/trebs |
... |
sequence number --------- PING ----------> |
<-------- PONG ---------- sequence number | tau ms
|
|
<------- METADATA-------- artist, title &c v
<-------- FRAME --------- samples, spectrum |
bass/mids/trebs |
... | tau ms
|
<------- METADATA-------- artist, title &c v
<-------- FRAME --------- samples, spectrum |
bass/mids/trebs |
... | tau ms
|
tau' (new buffer -------- ADJBUF --------> |
offset) <------ ADJBUFACK ------- |
|
<------- METADATA-------- artist, title &c v
<-------- FRAME --------- samples, spectrum |
bass/mids/trebs |
... | tau' ms
|
<------- METADATA-------- artist, title &c v
<-------- FRAME --------- samples, spectrum
bass/mids/trebs
...
....
(forever)
\endverbatim
*
* There is no formal "close" or "teardown" message; each side simply detects
* when the other has gone away & treats that as the end of the conversation.
*
*
* \subsection vis_out_protocol_proto_msgs Messages
*
* All messages:
*
* - integers use network byte order (i.e. big endian)
* - use TLV format (streaming messages prepend magic bytes)
*
\verbatim

+-----------------------+-----------------+-----------------------+--------+
| TYPE (16-bit unsigned)| LENGTH          | PAYLOAD               | CHECK  |
|-------+---------------+-----------------+-----------------------+--------+
| class | message type  | 16-bit unsigned | LENGTH bytes          | 1 byte |
| 4 bits| 12 bits       | (max len 65535) | format is msg-specific| 00     |
+-------+---------------+-----------------+-----------------------+--------+

\endverbatim
*
* Notes:
*
 * - the message type is composed of two values packed into a uint16_t:
 *
 * - class: (type & 0xf000) >> 12:
 * - 0: handshake
 * - 1: control (PING, e.g.)
 * - 2: streaming (FRAME, e.g.)
 *
 * - message type: (type & 0x0fff); see below for values
*
* - the "length" field is the length of the \e payload \e only
*
* - the "check" byte is intended as a sanity test & shall always be zero
* TODO(sp1ff): replace this with a proper checksum?
*
* TODO(sp1ff): define each message
*
*
*/


const struct AudioOutputPlugin visualization_output_plugin = {
"visualization",
nullptr, // cannot serve as the default output
nullptr, // TODO(sp1ff): Write me!
nullptr, // no particular mixer
};
src/output/plugins/visualization/VisualizationOutputPlugin.hxx (136 additions)
/*
* Copyright 2003-2022 The Music Player Daemon Project
* http://www.musicpd.org
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/

#ifndef MPD_VISUALIZATION_OUTPUT_PLUGIN_HXX
#define MPD_VISUALIZATION_OUTPUT_PLUGIN_HXX

/**
 * \page vis_out RFC: Visualization Output Plugin
*
* \section vis_out_intro Introduction
*
* This plugin started from a conversation on the #mpd IRC channel. I asked
* about the best way to implement a music visualizer as a remote <a
* href="https://musicpd.org">MPD</a> client. All the current MPD visualizers of
* which I'm aware use the <a
* href="https://mpd.readthedocs.io/en/latest/plugins.html#fifo">fifo</a> output
* plugin and hence must be run on the same host as the MPD daemon.
*
* The response I got was a suggestion that I write an output plugin that would
* \e just stream the data needed to implement a remote visualizer. I've begun
* work on such a plugin, but before I spend too much time implementing it I
* would like to lay out my proposal & solicit feedback.
*
 * The codebase uses <a href="https://www.doxygen.nl">Doxygen</a>-style comments,
* so I'm presenting this RFC as a few doxygen pages in the first files I'd be
* adding to the project.
*
*
* \section vis_out_prior Prior Art
*
* Music visualization sources which I consulted before settling on this
* proposal:
*
 * - This <a href="https://github.com/MusicPlayerDaemon/MPD/pull/488">PR</a>
* proposed solving this problem by implementing an output plugin that would
* stream the raw PCM data over TCP, the idea being that the remote visualizer
* would do the sound analysis client-side. The PR was discarded as being
* redundant with the <a
* href="https://mpd.readthedocs.io/en/latest/plugins.html#httpd">httpd</a>
* output plugin. I would also observe that such a solution sends far more
* data on the wire than is needed for visualization (on which more below).
*
* - <a href="https://github.com/ncmpcpp/ncmpcpp">ncmpcpp</a> uses the fifo
* output plugin, and as such can only provide the visualization feature when
* it's being run locally. The sound analysis is limited, as well (see below)
*
* - <a href="https://github.com/dpayne/cli-visualizer">cli-visualizer</a> will
* work with the MPD fifo (again assuming the MPD daemon is running
* locally). Limited sound analysis, as well.
*
* - <a href="http://www.geisswerks.com/about_milkdrop.html">MilkDrop</a>:
* reading the source code was very instructive in terms of sound analysis for
* music visualization; that aspect of this proposal is largely based on it.
*
*
* \section vis_out_proposal The Proposal
*
* A new output plugin "visualization" will be implemented. The plugin will
* cache recent PCM data. The plugin will also be a ServerSocket. When clients
* connect, they will provide the details of the sound analysis they would like
* performed, the frame rate at which they would like to receive updates and the
* offset between client-side render time & server-side song time (to account
* for network lag, client-side buffering & the time needed to render each
* frame). Once that initial handshake is complete, the server will stream
* updates containing sound analysis results at regular intervals to the
* client.
*
* \subsection vis_output_proposal_analysis Sound Analysis
*
* Given audio data in raw PCM format, a number of steps may be taken to
 * analyze that data & produce information useful to visualizers:
*
* - the PCM data may optionally be damped by taking a weighted average between
* the current values & prior values in the time domain; this will have the
* effect of reducing noise in the higher frequency ranges
*
* - the PCM data may have a "window function" applied to it in the time domain
* around the time of interest; such a function has the effect of "dialing
* down" audio samples further from the timestamp of interest and again will
* reduce higher-frequency noise; the size of the window may be configured to
* incorporate more or less data as desired.
*
* - the resulting PCM data will be shifted into the frequency domain by
* application of the Discrete Fourier Transform
*
 * - the human ear can only distinguish frequencies from (about) 20Hz to 20000Hz,
* and in practice musical sound information doesn't show much activity above
* 10000Hz; it is therefore convenient to throw out frequency data outside
* some (client-configurable) range
*
* - it is also convenient to divide the resulting spectrum into a few coarse
* bands, such as bass/mids/trebs. This is computationally non-trivial because
 * perceptually, frequency is not linear, it's logarithmic. A change of one
 * octave corresponds to a doubling in frequency. Intuitively, this means that
 * the difference between 200 & 300Hz is much greater than the difference
 * between 5000 & 5100Hz, e.g. The plugin will perform this service for
* each client.
*
* - it can also be useful to maintain a weighted time average of the activity
* in each frequency range for purposes of beat detection
*
*
* \subsection vis_output_protocol The Protocol
*
 * Note that each update need only be based on relatively few samples (Winamp,
 * e.g., uses 576). This will keep the data transferred on the wire small (at
* least by comparison to, say, the httpd output plugin which of course needs to
* send the entire song). Casting the protocol in terms of client-side FPS
* allows us to avoid a "request/response" protocol & simply stream until the
* client goes away.
*
* I've broken out the detailed protocol specification into its own
* \ref vis_out_protocol "page".
*
*
*/

extern const struct AudioOutputPlugin visualization_output_plugin;

#endif