### Sound Localization and Processing for Inducing Synesthetic Experiences in Virtual Reality

Aleksei Tepljakov, Sergei Astapov, Dirk Draheim, Eduard Petlenkov, and Kristina Vassiljeva

### Talk Outline

• Why synesthesia and Virtual Reality?

• Problem statement

• Description of the proposed solution

• Experimental results

• Conclusion and future work

### Synesthesia and Virtual Reality

• Synesthesia is the act of experiencing one sense modality as another, e.g., a person may vividly experience flashes of colors when listening to a series of sounds.
• Recent technological advances in the Virtual Reality field make it possible to induce such experiences owing to the sense of presence achieved in the virtual environment.
• In this contribution, we focus on localization and visual interpretation of sound in Virtual Reality.

### Leibniz' Monad Theory and Applications

• The original trigger for this particular synesthetic scenario was a series of discussions held against the background of the Leibniz anniversary year 2016.
• In Leibniz's theory, monads are the smallest building blocks of the mind, which interact only via their senses.
• The eventual goal of the present project is thus to create the experience of a full exchange of senses.
• This has numerous medical and artistic applications.

### Problem statement

To achieve a synesthetic experience, we need to:

• Precisely localize the sound source.
• Analyze the sound and extract its characteristics.
• Visualize the sound in the 3D virtual space.

### Sound localization

• We use a conical array of microphones.
• We propose to use a DOA method because, compared to SRP-PHAT, it avoids frequency-domain computations and is thus computationally more efficient.
• Furthermore, the proposed DOA method reduces the number of microphone pairs needed for cross-correlation.

### Sets of microphones

• For azimuth $\phi$ estimation we have the set of pairs
$A_{h}=\left\{ \left(m_{i}^{h},m_{j}^{h}\right)\subseteq S_{2}^{M_{h}}\biggm|\alpha_{ij}<\frac{\pi}{2}\right\} ,$
where $S_2^{M_h}$ is the set of all combinations of horizontal microphone pairs.
• For elevation $\theta$ estimation we have
$A_{v}=\left\{ \left(m_{i}^{h},m_{j}^{v}\right)\bigm|m_{i}^{h}\in A_{act},j=[1,M_{v}]\right\} \cup S_{2}^{M_{v}},$
where $S_{2}^{M_{v}}$ is the set of all combinations of vertical microphone pairs, and $A_{act}$ is the set of active horizontal microphones.
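
For illustration, below is a minimal Python sketch of how these pair sets could be enumerated. The array geometry (8 horizontal microphones evenly spaced on a ring, 3 vertical microphones) and the choice of active microphones $A_{act}$ are assumptions made for the example, not the actual array configuration.

```python
# Sketch: enumerating the microphone pair sets A_h and A_v.
from itertools import combinations
import numpy as np

def horizontal_pairs(mic_angles_h):
    """A_h: horizontal pairs whose angular separation alpha_ij is below pi/2."""
    pairs = []
    for i, j in combinations(range(len(mic_angles_h)), 2):
        alpha = abs(mic_angles_h[i] - mic_angles_h[j])
        alpha = min(alpha, 2 * np.pi - alpha)   # wrap angle to [0, pi]
        if alpha < np.pi / 2:
            pairs.append((i, j))
    return pairs

def vertical_pairs(active_h, n_vertical):
    """A_v: each active horizontal mic paired with every vertical mic,
    plus all combinations of vertical microphone pairs."""
    cross = [(i, j) for i in active_h for j in range(n_vertical)]
    vert = list(combinations(range(n_vertical), 2))
    return cross + vert

# Example: 8 horizontal microphones evenly spaced on a ring, 3 vertical ones.
angles_h = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
A_h = horizontal_pairs(angles_h)
A_v = vertical_pairs(active_h=[0, 1, 7], n_vertical=3)
```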

### Sound localization: AOA estimation

• Assuming a far-field disposition of the acoustic source,
$\hat{\varphi}_{ij}=\sin^{-1}\left(\frac{\tau_{ij}\cdot c}{l}\right)=\sin^{-1}\left(\frac{\Delta k_{ij}\cdot c}{f_{s}\cdot l}\right). \tag{1}$
• To estimate $\tau_{ij}$ we apply cross-correlation
$R_{ij}\left(\mathrm{\Delta}k\right)=\sum_{k=0}^{N-1}x_{m_{i}}[k]\cdot x_{m_{j}}[k-\mathrm{\Delta}k]. \tag{2}$
• Then, the TDOA is
$\mathrm{\Delta}k_{ij}=\arg\max\left(R_{ij}\left(\mathrm{\Delta}k\right)\right). \tag{3}$
• Finally, AOA estimates $(\phi,\theta)$ are computed from the pairwise estimates obtained via (1)–(3).
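
The following is a minimal Python/NumPy sketch of the pairwise procedure (1)–(3): the time-domain cross-correlation is evaluated over the physically admissible lags, the maximizing lag gives the TDOA, and the far-field model converts it to an angle. The sampling rate, microphone spacing $l$, and speed of sound are illustrative values, not those of the actual setup.

```python
# Sketch: pairwise AOA estimate from two microphone signals, Eqs. (1)-(3).
import numpy as np

def pairwise_aoa(x_i, x_j, fs=48_000, l=0.05, c=343.0):
    N = len(x_i)
    max_lag = int(np.ceil(l / c * fs))          # only physically possible delays
    lags = np.arange(-max_lag, max_lag + 1)
    R = np.empty(len(lags))
    for n, dk in enumerate(lags):
        # R_ij(dk) = sum_k x_i[k] * x_j[k - dk], Eq. (2); out-of-range samples
        # of x_j are treated as zero (linear cross-correlation).
        if dk >= 0:
            R[n] = np.dot(x_i[dk:], x_j[:N - dk])
        else:
            R[n] = np.dot(x_i[:N + dk], x_j[-dk:])
    dk_ij = lags[np.argmax(R)]                  # TDOA in samples, Eq. (3)
    tau_ij = dk_ij / fs                         # TDOA in seconds
    return np.arcsin(np.clip(tau_ij * c / l, -1.0, 1.0))   # AOA, Eq. (1)
```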

### Acoustic feature extraction

• Human perception of the frequency content of speech signals does not follow a linear scale.
• We therefore use the Mel scale:
$f_m=2595\log_{10}\left(1+\frac{f}{700}\right).$
• We analyze the audio signal using the MFCC method which has also been successfully applied to modeling music.
• The corresponding algorithm returns several features of the signal; in this work, we consider the auditory spectrum portion, denoted hereinafter as $A_{spec}$.
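
As an illustration, the sketch below implements the Mel mapping and a simplified "auditory spectrum" $A_{spec}$, computed as triangular Mel filterbank energies of a single power-spectrum frame. The frame length, number of bands, and sampling rate are assumptions made for the example; the actual system relies on a full MFCC implementation.

```python
# Sketch: frequency-to-Mel mapping and a simplified auditory spectrum A_spec.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def auditory_spectrum(frame, fs=44_100, n_bands=20):
    spectrum = np.abs(np.fft.rfft(frame)) ** 2           # power spectrum of the frame
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # Band edges equally spaced on the Mel scale
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_bands + 2))
    A_spec = np.empty(n_bands)
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        # Triangular weighting of each FFT bin within the band
        w = np.clip(np.minimum((freqs - lo) / (mid - lo),
                               (hi - freqs) / (hi - mid)), 0.0, None)
        A_spec[b] = np.dot(w, spectrum)
    return A_spec

# The dominant feature index used later for color mapping:
# dominant = int(np.argmax(auditory_spectrum(frame)))
```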

### Sound Visualization

• In the VR environment the incoming sound waves are visualized as spheres moving towards the listener.
• The color, size, travel velocity, and generation rate of the spheres can be determined experimentally.
• The incoming waveforms are broken down into frames and analyzed as discussed previously.
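
A minimal sketch of this per-frame processing, assuming a fixed frame length and a simple amplitude-to-size scaling (both illustrative, not the values used in the actual VR application):

```python
# Sketch: splitting the waveform into frames and deriving per-frame sphere parameters.
import numpy as np

def frames(signal, frame_len=4410):                  # e.g., 0.1 s at 44.1 kHz
    n = len(signal) // frame_len
    return signal[:n * frame_len].reshape(n, frame_len)

def sphere_stream(signal, fs=44_100, frame_len=4410, size_scale=2.0):
    """Yield parameters for spawning one sphere per analysis frame."""
    for frame in frames(signal, frame_len):
        yield {
            "size": size_scale * np.max(np.abs(frame)),  # radius from peak amplitude
            "spawn_interval": frame_len / fs,            # seconds between spheres
            # color is assigned from the dominant A_spec feature (next slide)
        }
```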

### Color Mapping

• The size of a single sphere is determined by the scaled maximum amplitude in a waveform frame.
• The color of the sphere is determined by the dominant feature in the auditory spectrum. The transform is defined as
$\xi:\mathscr{I}\rightarrow\mathscr{C},$
where $\mathscr{I}\subset\mathbb{N}$ is the set of indices of the dominant feature in $A_{spec}$, and $\mathscr{C}\subset\mathbb{R}^{3}$ is the parameterized color specification in a particular color space.
• For this work, we consider the RGB color space.
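
A minimal sketch of one possible realization of $\xi$: the dominant-band index is mapped to a hue and converted to RGB. The hue-based mapping and the number of bands are illustrative assumptions, not the exact mapping used in the system.

```python
# Sketch: xi maps the dominant auditory-spectrum index to an RGB color.
import colorsys

def xi(dominant_index, n_bands=20):
    hue = dominant_index / n_bands               # spread bands over the hue circle
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (r, g, b)                              # RGB components in [0, 1]

# Example: if the 5th Mel band dominates the frame, xi(5) gives a green-ish sphere.
```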

### Experimental setup: Data

• A sound source is manually moved with constant velocity within a plane at a distance of about $r = 1.5$ m from the conical array.
• An audio clip of modern music with no distinct spectral features is used as the source signal.
• The AOA estimation discussed above is carried out with a window of $t_{s}=0.1$s.
• The resulting angles (with an average tolerance of about $3^{\circ}$) are filtered, and the trajectory of motion is recovered, as sketched below.
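
A minimal sketch of this post-processing step, assuming a simple median filter and conversion of the filtered azimuth/elevation angles to Cartesian coordinates at the known range $r = 1.5$ m (the filter type and window width are illustrative choices):

```python
# Sketch: smoothing per-window AOA estimates and recovering the planar trajectory.
import numpy as np

def median_filter(x, width=5):
    pad = width // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + width]) for i in range(len(x))])

def recover_trajectory(phi, theta, r=1.5):
    """Convert filtered azimuth/elevation (radians) to Cartesian positions."""
    phi_f, theta_f = median_filter(phi), median_filter(theta)
    x = r * np.cos(theta_f) * np.cos(phi_f)
    y = r * np.cos(theta_f) * np.sin(phi_f)
    z = r * np.sin(theta_f)
    return np.stack([x, y, z], axis=1)
```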

### Experimental signal analysis

• The MFCC is calculated for the sound clip recorded by the central microphone of the circular array.
• The sound amplitude and dominant spectral features are encoded as color as proposed above.
• Thus, all necessary parameters for the VR sound visualization system have been successfully obtained.

### Conclusions and further research

• We have developed a prototype for acoustic sound localization, processing, and visualization for inducing a synesthetic experience in a VR environment.
• Experimental data was successfully processed with the proposed approach, yielding usable results.
• Further research is necessary and proceeds along several branches:
  • Real-time application;
  • Implementation and verification in an embedded system;
  • Expansion of the microphone array for accurate detection of multiple sound sources;
  • Study of the induced synesthetic effect in real subjects.