UDP Sound Sync

Frank edited this page Jan 12, 2025 · 47 revisions

Sound Sync

UDP Sound Sync is a feature that synchronizes (shares) the sound input of a 'master' device with one or more 'slave' devices. All devices must be running WLED audioreactive, and they must be on the same local network.

If you have more than one device, or you wish to use Line Input but your audio source is not right next to your LEDs, then Sound Sync is the feature you need: no need for multiple mics or long cables. Sound Sync is different from direct LED control using protocols like E1.31, DMX, DDP or AdaLight. On the WLED instance that you wish to be the source, simply set the Sync mode in the Audio Reactive module to Send, and then select Receive on all the WLED instances that should use the data.

You might have a setup something like this, or just a simple wired line-in rather than Bluetooth or Chromecast.

UDP Sound Sync does not sync the actual animations, but rather transmits summary audio sampling information to several devices that still run their own animations locally. In a nutshell, it means that several devices can share a single audio source.

Setup

Before configuring UDP Sound Sync, go into the WiFi Preferences and make sure 'Disable WiFi sleep' at the bottom of the page is enabled. WiFi sleep has caused us innumerable problems in the past.

To configure UDP Sound Sync, the ‘master’ needs to be an ESP32 with an audio input.

Then go to the ‘Sync Interfaces’ page and configure 'Audio Sync' at the bottom of the page: Transmit (Send) for the ESP32 with the audio input, and Receive for devices without an audio input (either ESP32s or ESP8266s). Make sure the UDP port is the same on all devices.

Sound Sync does not sync the actual animations; it only transmits summary audio sampling information (as best we can) at 10-50 fps.

After changing the UDP Sync Mode (Disabled/Transmit/Receive), you need to power-cycle the ESP32/ESP8266 for the change to take effect.

Technical

When an ESP32 is configured for audio transmission, it sends UDP multicast packets to a multicast address. Each update is a single packet containing the data used to generate sound-reactive animations, and it reaches any other devices on the same network that are configured to receive. The following information is transmitted:

V1 format, used in SR WLED up to 0.13.x

The V1 format is not recommended for new development, as every WLED variant (since 0.13.3) can read the V2 format.

#define UDP_SYNC_HEADER "00001"
struct audioSyncPacket {
  char header[6] = UDP_SYNC_HEADER;
  uint8_t myVals[32];     //  32 Bytes
  int sampleAgc;          //  04 Bytes
  int sampleRaw;          //  04 Bytes
  float sampleAvg;        //  04 Bytes
  bool samplePeak;        //  01 Bytes
  uint8_t fftResult[16];  //  16 Bytes - FFT results, one byte per GEQ channel
  double FFT_Magnitude;   //  08 Bytes
  double FFT_MajorPeak;   //  08 Bytes
};

UDP_SYNC_HEADER is a versioning number that's defined in audio_reactive.h

  • this is a C language "struct". Due to padding performed by gcc, the actual V1 packet is slightly bigger than the sum of its members; it includes "padding bytes" for aligning struct members to word boundaries.
  • make sure that "reserved" and "gap" fields are initialized to 0.
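The effect of compiler padding on the V1 struct can be checked with a small sketch like the one below. It is only an illustration: the names AudioSyncPacketV1, v1PayloadBytes, v1PaddingBytes and v1SampleAgcOffset are made up for this example, and the actual numbers depend on the compiler and target, so build it with the same toolchain you use for your sender or receiver.

```cpp
#include <cstddef>
#include <cstdint>

// Same members as the V1 struct above, under a hypothetical name.
// No "packed" attribute, so the compiler is free to insert padding.
struct AudioSyncPacketV1 {
  char    header[6];
  uint8_t myVals[32];
  int     sampleAgc;
  int     sampleRaw;
  float   sampleAvg;
  bool    samplePeak;
  uint8_t fftResult[16];
  double  FFT_Magnitude;
  double  FFT_MajorPeak;
};

// Sum of the member sizes, i.e. the struct size if there were no padding.
constexpr std::size_t v1PayloadBytes() {
  return sizeof(char[6]) + sizeof(uint8_t[32]) + 2 * sizeof(int) + sizeof(float)
       + sizeof(bool) + sizeof(uint8_t[16]) + 2 * sizeof(double);
}

// Number of padding bytes added by the compiler used for this build.
constexpr std::size_t v1PaddingBytes() {
  return sizeof(AudioSyncPacketV1) - v1PayloadBytes();
}

// Offset of the first int member. 38 bytes precede it, so the compiler
// has to move it forward to satisfy the alignment of "int".
constexpr std::size_t v1SampleAgcOffset() {
  return offsetof(AudioSyncPacketV1, sampleAgc);
}
```

Printing these three values on both ends is a quick way to confirm that a non-WLED sender and a WLED receiver agree on the V1 layout.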

V2 Format - WLED version >= 0.14.0 (including MoonModules fork)

#define UDP_SYNC_HEADER_V2 "00002"

// new "V2" audiosync struct - 44 Bytes
struct __attribute__ ((packed)) audioSyncPacket {  // WLEDMM "packed" ensures that there are no additional gaps
  char    header[6];      //  06 Bytes  offset 0
  uint8_t gap1[2];        //  02 Bytes  offset 6  - gap added by compiler
  float   sampleRaw;      //  04 Bytes  offset 8  - either "sampleRaw" or "rawSampleAgc" depending on soundAgc setting
  float   sampleSmth;     //  04 Bytes  offset 12 - either "sampleAvg" or "sampleAgc" depending on soundAgc setting
  uint8_t samplePeak;     //  01 Bytes  offset 16 - 0 no peak; >=1 peak detected. In future, this will also provide peak Magnitude
  uint8_t frameCounter;   //  01 Bytes  offset 17 - track duplicate/out of order packets
  uint8_t fftResult[16];  //  16 Bytes  offset 18
  uint8_t gap2[2];        //  02 Bytes  offset 34 - gap added by compiler
  float   FFT_Magnitude;  //  04 Bytes  offset 36
  float   FFT_MajorPeak;  //  04 Bytes  offset 40
};

Caution:

  • this is a C language "packed struct", so the padding bytes show up as explicit gap fields (gap1, gap2).
  • binary formats are the ones used by the ESP32 compiler (float is 4 bytes, uint8_t is one unsigned byte, multi-byte values are little-endian)
  • make sure that "reserved" and "gap" fields are initialized to 0.
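As an illustration of the points above, here is a minimal sketch of how a sender could fill a V2 packet. It assumes a gcc or clang compiler (for the packed attribute); the names AudioSyncPacketV2 and makeV2Packet are hypothetical and not taken from the WLED sources. The static_asserts check the size and offsets listed in the struct comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// V2 layout as documented above, under a hypothetical name.
struct __attribute__((packed)) AudioSyncPacketV2 {
  char    header[6];
  uint8_t gap1[2];
  float   sampleRaw;
  float   sampleSmth;
  uint8_t samplePeak;
  uint8_t frameCounter;
  uint8_t fftResult[16];
  uint8_t gap2[2];
  float   FFT_Magnitude;
  float   FFT_MajorPeak;
};

// Compile-time checks against the documented size and offsets.
static_assert(sizeof(AudioSyncPacketV2) == 44, "V2 packet must be 44 bytes");
static_assert(offsetof(AudioSyncPacketV2, sampleRaw) == 8, "sampleRaw at offset 8");
static_assert(offsetof(AudioSyncPacketV2, fftResult) == 18, "fftResult at offset 18");
static_assert(offsetof(AudioSyncPacketV2, FFT_Magnitude) == 36, "FFT_Magnitude at offset 36");

// Hypothetical helper: builds one packet from already-scaled values.
inline AudioSyncPacketV2 makeV2Packet(float raw, float smth, bool peak,
                                      uint8_t frame, const uint8_t (&fft)[16],
                                      float magnitude, float majorPeakHz) {
  AudioSyncPacketV2 p;
  std::memset(&p, 0, sizeof(p));       // zero everything, including gap1 and gap2
  std::memcpy(p.header, "00002", 6);   // five digits plus the terminating NUL
  p.sampleRaw     = raw;
  p.sampleSmth    = smth;
  p.samplePeak    = peak ? 1 : 0;
  p.frameCounter  = frame;
  std::memcpy(p.fftResult, fft, sizeof(p.fftResult));
  p.FFT_Magnitude = magnitude;
  p.FFT_MajorPeak = majorPeakHz;
  return p;
}
```

Zeroing the whole struct first is a simple way to guarantee that the gap fields are 0 even if fields are added or reordered later.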

The V2 format expects that AGC is performed by the sender, so there is no need to transmit "AGC" and "non-AGC" samples separately. To save bandwidth, the myVals[] array was removed, and all numbers are either float or uint8_t.

SR-WLED 0.13.3 still sends out V1 format, however it is able to receive and decode V2 format, too.
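A receiver that wants to accept both formats could tell them apart by the header string. The following is a sketch only: SyncFormat and detectSyncFormat are hypothetical names, and this is not necessarily how WLED itself decides. It relies on the two header values shown above ("00001" and "00002") and on the documented 44-byte size of a V2 packet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum class SyncFormat { Unknown, V1, V2 };

// Hypothetical helper: classify a received UDP payload by its 6-byte header.
inline SyncFormat detectSyncFormat(const unsigned char* data, std::size_t len) {
  if (data == nullptr || len < 6) return SyncFormat::Unknown;  // too short for a header
  if (std::memcmp(data, "00001", 6) == 0) return SyncFormat::V1;
  if (std::memcmp(data, "00002", 6) == 0) {
    // A complete V2 packet is 44 bytes; anything shorter is truncated.
    return (len >= 44) ? SyncFormat::V2 : SyncFormat::Unknown;
  }
  return SyncFormat::Unknown;
}
```

The comparison covers all 6 header bytes, so the terminating NUL is checked as well.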

Values

  • In general, all sample data is scaled to be in the range of [0...255], to make effects happy. For the values transmitted as float, additional accuracy can be provided by using the fraction part of the number - for example sampleSmth= 127.125. Samples transmitted are the max value from approx 20 milliseconds of sampling, with AGC gain already applied.
  • fftResult[16] are 16 frequency "channels", values in the range of [0...255]
  • uint8_t samplePeak: 0 if no peak, 1 if peak
  • FFT_MajorPeak: strongest frequency from the FFT analysis, in Hz. Minimum frequency is 40 Hz.
  • FFT_Magnitude: amplitude of the strongest frequency, in units that only the ArduinoFFT library knows 🤷 . Typical range is [0... 4096].
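For receivers written outside of WLED, the value ranges above can be applied while decoding. The sketch below reads the fields from a raw 44-byte V2 buffer at the documented offsets. DecodedAudio, readFloat, clampSample, decodeV2 and isNewerFrame are hypothetical names; the code assumes the host stores floats the same way the ESP32 does (4-byte IEEE 754, little-endian), and the frame counter check is just one possible way to spot duplicate or out-of-order packets, not WLED's own logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical container for the decoded values; not a WLED type.
struct DecodedAudio {
  float   sampleRaw;
  float   sampleSmth;
  bool    peak;
  uint8_t frameCounter;
  uint8_t fftResult[16];
  float   magnitude;
  float   majorPeakHz;
};

// Copy 4 bytes into a float (assumes the sender's float format matches the host's).
inline float readFloat(const uint8_t* p) {
  float v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Keep a sample inside the documented [0...255] range.
inline float clampSample(float v) {
  return std::min(255.0f, std::max(0.0f, v));
}

// Decode one V2 packet; returns false if the buffer is too short or the header differs.
inline bool decodeV2(const uint8_t* buf, std::size_t len, DecodedAudio& out) {
  assert(buf != nullptr);
  if (len < 44 || std::memcmp(buf, "00002", 6) != 0) return false;
  out.sampleRaw    = clampSample(readFloat(buf + 8));
  out.sampleSmth   = clampSample(readFloat(buf + 12));
  out.peak         = buf[16] >= 1;            // 0 = no peak, >=1 = peak detected
  out.frameCounter = buf[17];
  std::memcpy(out.fftResult, buf + 18, 16);   // 16 channels, already 0...255
  out.magnitude    = readFloat(buf + 36);
  out.majorPeakHz  = readFloat(buf + 40);
  return true;
}

// True if "current" is ahead of "last", allowing the 8-bit counter to wrap around.
inline bool isNewerFrame(uint8_t last, uint8_t current) {
  const uint8_t delta = static_cast<uint8_t>(current - last);
  return delta != 0 && delta < 128;
}
```

FFT_Magnitude and FFT_MajorPeak are passed through unchanged here, since their ranges are only described as typical values.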

Using a PC as source (PC audio to WLED)

For Windows, there is WledSRServer (https://github.com/Victoare/SR-WLED-audio-server-win), a small application that does the audio capturing, FFT computation and packet sending on the PC. It sends out V2 packets.

What else?

You might want to take a look at this library, which allows sending and receiving WLED Audio Sync data independently of WLED.

When an ESP32 or ESP8266 is configured to receive audio data from another device, the receiver will disable any onboard microphone sampling and FFT processing, in favor of audio data received from the network. Any time a UDP Multicast packet is received from a transmitter, it will be treated as a discrete microphone sample and stored in memory the same way it would be if it were a local microphone.

  • An ESP8266 will not be able to use any FFT data transmitted from an ESP32, as a result of the differences in hardware and software.

  • The UDP multicast IP is 239.0.0.1, and the default UDP port is 11988.

  • UDP port can be changed in WLED config pages, for example to have several groups of devices by assigning different UDP ports to each group.

  • The software sends/receives one packet approximately every 20 milliseconds. An external sender may send less often, but not more often than once every 20 ms (50 fps).

  • UDP multicast is generally not very reliable with typical "consumer grade hardware". Some users found that creating a "port forwarding rule" on their local Wifi router helps. For example, you could create a "dynamic port forwarding rule" for UDP port 11988.
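For an external sender on a PC, the address, port and timing listed above could be used roughly as follows. This is a sketch for POSIX systems (Linux, macOS) using BSD sockets; kSyncGroup, kSyncPort, kMinInterval, makeSyncDestination, intervalElapsed and sendSyncPacket are hypothetical names chosen for this example. The port is the documented default and must match whatever is configured in WLED.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr const char* kSyncGroup = "239.0.0.1";        // multicast IP from this page
constexpr uint16_t    kSyncPort  = 11988;              // default UDP port from this page
constexpr std::chrono::milliseconds kMinInterval{20};  // do not send faster than 50 fps

// Build the destination address for the multicast group.
inline sockaddr_in makeSyncDestination(uint16_t port = kSyncPort) {
  sockaddr_in dest;
  std::memset(&dest, 0, sizeof(dest));
  dest.sin_family = AF_INET;
  dest.sin_port   = htons(port);
  const int ok = inet_pton(AF_INET, kSyncGroup, &dest.sin_addr);
  assert(ok == 1);  // the literal above is a valid IPv4 address
  (void)ok;
  return dest;
}

// True once at least 20 ms have passed since the previous packet.
inline bool intervalElapsed(std::chrono::steady_clock::time_point last,
                            std::chrono::steady_clock::time_point now) {
  return now - last >= kMinInterval;
}

// Send one already-built packet; "sock" is a UDP socket opened by the caller.
inline bool sendSyncPacket(int sock, const void* packet, std::size_t len,
                           const sockaddr_in& dest) {
  const ssize_t sent = sendto(sock, packet, len, 0,
                              reinterpret_cast<const sockaddr*>(&dest), sizeof(dest));
  return sent == static_cast<ssize_t>(len);
}
```

A caller would open the socket with socket(AF_INET, SOCK_DGRAM, 0), build the destination once, and then call sendSyncPacket only when intervalElapsed reports that 20 ms have passed. To run several groups of devices, pass a different port to makeSyncDestination for each group.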

UDP Sound sync brought to you by @spedione on Discord.

Reference: https://github.com/Aircoookie/WLED/wiki/UDP-Realtime-Control
