Sound Localization and Processing for Inducing Synesthetic Experiences in Virtual Reality
Aleksei Tepljakov, Sergei Astapov, Dirk Draheim, Eduard Petlenkov, and Kristina Vassiljeva
Talk Outline
 Why synesthesia and Virtual Reality?
 Problem statement
 Description of the proposed solution
 Experimental results
 Conclusion and future work
Synesthesia and Virtual Reality
 Synesthesia is the act of experiencing one sense modality as another, e.g., a person may vividly experience flashes of colors when listening to a series of sounds.
 Recent technological advances in Virtual Reality make it possible to induce such experiences, thanks to the sense of presence achieved in the virtual environment.
 In this contribution, we focus on localization and visual interpretation of sound in Virtual Reality.
Leibniz' Monad Theory and Applications
 The particular synesthetic scenario was originally prompted by discussions held in the context of the Leibniz anniversary year 2016.
 In Leibniz's theory, monads are the smallest building blocks of mind that interact only via their senses.
 The eventual goal of the present project is thus to create the experience of a full exchange of senses.
 This has numerous medical and artistic applications.
Problem statement
To achieve a synesthetic experience, we need to:
 Precisely localize the sound source.
 Analyze the sound and extract its characteristics.
 Visualize the sound in the 3D virtual space.
Sound localization
 We use a conical array of microphones.
 We propose using a DOA method: compared to SRP-PHAT, it avoids frequency-domain computations and is therefore computationally more efficient.
 Furthermore, the proposed DOA method reduces the number of microphone pairs needed for cross-correlation.
Sets of microphones

For azimuth $\phi$ estimation we have the set of pairs
\[
A_{h}=\left\{ \left(m_{i}^{h},m_{j}^{h}\right)\subseteq S_{2}^{M_{h}}\biggm|\alpha_{ij}<\frac{\pi}{2}\right\} ,
\]
where $S_2^{M_h}$ is the set of all combinations of horizontal microphone pairs.

For elevation $\theta$ estimation we have
\[
A_{v}=\left\{ \left(m_{i}^{h},m_{j}^{v}\right)\bigm| m_{i}^{h}\in A_{act},j=[1,M_{v}]\right\} \cup S_{2}^{M_{v}},
\]
where $S_{2}^{M_{v}}$ is the set of all combinations of vertical microphone pairs, and $A_{act}$ is the set of active horizontal microphones.
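
The pair selection above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not define $\alpha_{ij}$, so it is taken here to be the angular separation between two horizontal microphones on a circle, and the function names, the index-based microphone labels, and the six-microphone layout are all hypothetical.

```python
import math
from itertools import combinations

def angular_separation(a, b):
    """Smallest absolute angle (radians) between two azimuth positions."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def horizontal_pairs(azimuths):
    """A_h: index pairs (i, j) of horizontal microphones whose separation
    (assumed meaning of alpha_ij) is below pi/2."""
    return [
        (i, j)
        for i, j in combinations(range(len(azimuths)), 2)
        if angular_separation(azimuths[i], azimuths[j]) < math.pi / 2
    ]

def vertical_pairs(active_horizontal, num_vertical):
    """A_v: every (active horizontal, vertical) pair, united with all
    vertical-vertical pairs. Microphones are labeled ("h", i) or ("v", j)."""
    mixed = [(("h", i), ("v", j))
             for i in active_horizontal for j in range(num_vertical)]
    vertical = [(("v", i), ("v", j))
                for i, j in combinations(range(num_vertical), 2)]
    return mixed + vertical

# Hypothetical layout: six horizontal microphones spaced 60 degrees apart.
azimuths = [2 * math.pi * i / 6 for i in range(6)]
a_h = horizontal_pairs(azimuths)
a_v = vertical_pairs(active_horizontal=[0, 1], num_vertical=3)
```

For this hypothetical layout only adjacent microphones pass the $\alpha_{ij}<\pi/2$ test, so 6 of the 15 possible horizontal pairs are kept for cross-correlation.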
Sound localization: AOA estimation

Assuming a far-field acoustic source, the angle of arrival for the microphone pair $(i,j)$ is
\[
\hat{\varphi}_{ij}=\sin^{-1}\left(\frac{\tau_{ij}\cdot c}{l}\right)=\sin^{-1}\left(\frac{\Delta k_{ij}/f_{s}\cdot c}{l}\right). \tag{1}
\]

To estimate $\tau_{ij}$ we apply cross-correlation
\[
R_{ij}\left(\mathrm{\Delta}k\right)=\sum_{k=0}^{N-1}x_{m_{i}}[k]\cdot x_{m_{j}}[k-\mathrm{\Delta}k]. \tag{2}
\]

Then, the TDOA is
\[
\mathrm{\Delta}k_{ij}=\arg\max\left(R_{ij}\left(\mathrm{\Delta}k\right)\right). \tag{3}
\]

Finally, AOA estimates $(\phi,\theta)$ are computed using (3)$-$(6).
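
The time-domain cross-correlation, the lag search, and the far-field angle formula can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that $c$ is the speed of sound (343 m/s here), $l$ is the spacing of the microphone pair, and $f_s$ is the sampling frequency, and the function names, the zero-padding at the signal edges, and the synthetic test signal are hypothetical.

```python
import math

def cross_correlation(x_i, x_j, lag):
    """R_ij(lag) = sum_k x_i[k] * x_j[k - lag]; samples outside x_j count as zero."""
    total = 0.0
    for k in range(len(x_i)):
        m = k - lag
        if 0 <= m < len(x_j):
            total += x_i[k] * x_j[m]
    return total

def estimate_tdoa_samples(x_i, x_j, max_lag):
    """Lag in [-max_lag, max_lag] that maximizes the cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(x_i, x_j, lag))

def angle_of_arrival(delta_k, fs, spacing, c=343.0):
    """Far-field angle (radians) from a TDOA given in samples."""
    ratio = (delta_k / fs) * c / spacing
    ratio = max(-1.0, min(1.0, ratio))  # guard against |ratio| > 1 from noise
    return math.asin(ratio)

# Synthetic check: microphone i receives the same burst 5 samples later.
fs = 48000          # sampling frequency in Hz (assumed)
spacing = 0.1       # microphone spacing in metres (assumed)
x_j = [0.0] * 20 + [1.0, -2.0, 3.0, -1.0] + [0.0] * 40
x_i = [0.0] * 5 + x_j[:-5]

max_lag = math.ceil(spacing / 343.0 * fs)  # largest physically possible lag
delta_k = estimate_tdoa_samples(x_i, x_j, max_lag)  # 5 for this signal
phi = angle_of_arrival(delta_k, fs, spacing)
```

Restricting the search to lags that are physically possible for the given spacing keeps the time-domain search short, which is in line with the efficiency argument made for the DOA method.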
Acoustic feature extraction
Sound Visualization

In the VR environment the incoming sound waves are visualized as spheres moving towards the listener.

The color, size, velocity of travel, and sampling rate for generating the spheres can be determined experimentally.

The incoming waveforms are broken down into frames and analyzed as discussed previously.
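
One possible way to turn frames into moving spheres is sketched below. Everything specific here is an assumption made for illustration: the `Sphere` type, the use of frame RMS as the loudness measure, the mapping of loudness to radius, the start distance, and the coordinate convention (x forward, z up) are not taken from the slides, which leave these parameters to experiment.

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Sphere:
    azimuth: float    # radians, direction the sound arrives from
    elevation: float  # radians
    distance: float   # metres from the listener
    radius: float     # metres

def frame_rms(samples, frame_len):
    """Split samples into non-overlapping full frames and return each frame's RMS."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def spawn_sphere(azimuth, elevation, rms, start_distance=5.0, max_radius=0.5):
    """Create a sphere whose radius grows with loudness (rms clipped to [0, 1])."""
    loudness = max(0.0, min(1.0, rms))
    return Sphere(azimuth, elevation, start_distance, max_radius * loudness)

def advance(sphere, speed, dt):
    """Move the sphere towards the listener, stopping at distance zero."""
    return replace(sphere, distance=max(0.0, sphere.distance - speed * dt))

def position(sphere):
    """Cartesian position of the sphere relative to the listener."""
    d, az, el = sphere.distance, sphere.azimuth, sphere.elevation
    return (d * math.cos(el) * math.cos(az),
            d * math.cos(el) * math.sin(az),
            d * math.sin(el))

levels = frame_rms([1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0], frame_len=4)
sphere = spawn_sphere(azimuth=0.0, elevation=0.0, rms=0.5)
moved = advance(sphere, speed=2.0, dt=0.5)
```

In a VR engine, `advance` would be called once per rendered frame and `position` would supply the coordinates of the sphere object.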
Experimental setup: Microphone array
Experimental setup: Full configuration
Experimental setup: Data

A sound source is manually moved with constant velocity within a plane at a distance of about $r=1.5$ m from the conical array.

An audio clip of modern music that has no distinct spectral features is used as the source signal.

The AOA estimation discussed above is carried out with a window of $t_{s}=0.1$ s.

The resulting angles (with an average tolerance of about $3^{\circ}$) are filtered and the trajectory of motion is recovered.
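
The filtering and trajectory recovery step could look like the sketch below. The slides do not name the filter or the coordinate convention, so both are assumptions: a sliding median filter is used as a stand-in, and the plane of motion is taken to be $x=r$ with the array at the origin, azimuth measured from the x axis, and elevation measured from the horizontal. All function names are hypothetical.

```python
import math
from statistics import median

def median_filter(values, window=3):
    """Sliding median; the window shrinks at the edges of the sequence."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def project_to_plane(azimuth, elevation, r=1.5):
    """Intersect the AOA ray with the plane x = r; returns (y, z) in metres.
    Valid for |azimuth| < pi/2 and |elevation| < pi/2."""
    y = r * math.tan(azimuth)
    z = r * math.tan(elevation) / math.cos(azimuth)
    return y, z

def recover_trajectory(azimuths, elevations, r=1.5, window=3):
    """Filter both angle sequences, then map each pair to a point in the plane."""
    az = median_filter(azimuths, window)
    el = median_filter(elevations, window)
    return [project_to_plane(a, e, r) for a, e in zip(az, el)]

# One estimate per 0.1 s window; the third azimuth value is an outlier.
azimuths = [0.0, 0.1, 5.0, 0.3, 0.4]
elevations = [0.0, 0.0, 0.0, 0.0, 0.0]
filtered = median_filter(azimuths)
trajectory = recover_trajectory(azimuths, elevations)
```

A median filter is chosen here only because it suppresses isolated outliers in the angle estimates without smearing them into neighbouring windows; any other smoothing filter could be substituted.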
Experimental configuration
Acoustic localization results
Experimental signal analysis

The mel-frequency cepstral coefficients (MFCC) are calculated for the sound clip recorded by the central microphone of the circular array.

The sound amplitude and dominant spectral features are encoded as color as proposed above.

Thus, all necessary parameters for the VR sound visualization system have been successfully obtained.
Signal analysis results
Conclusions and further research

We have developed a prototype for sound localization, processing, and visualization that induces a synesthetic experience in a VR environment.

Experimental data were successfully processed using the proposed approach, yielding usable results.

Further research is necessary and has several branches: real-time application; implementation and verification in an embedded system; expansion of the microphone array for accurate detection of multiple sound sources; and a study of the induced synesthetic effect in real subjects.
Thank you for your attention!
For more information visit http://recreation.ee/