Sound Localization and Processing for Inducing Synesthetic Experiences in Virtual Reality
Aleksei Tepljakov, Sergei Astapov, Dirk Draheim, Eduard Petlenkov, and Kristina Vassiljeva
Talk Outline
- Why synesthesia and Virtual Reality?
- Problem statement
- Description of the proposed solution
- Experimental results
- Conclusion and future work
Synesthesia and Virtual Reality
- Synesthesia is the act of experiencing one sense modality as another, e.g., a person may vividly experience flashes of colors when listening to a series of sounds.
- Recent technological advances in the Virtual Reality field make it possible to induce such experiences thanks to the effect of presence achieved in the virtual environment.
- In this contribution, we focus on localization and visual interpretation of sound in Virtual Reality.
Leibniz' Monad Theory and Applications
- This particular synesthetic scenario was originally prompted by discussions held in the context of the Leibniz anniversary year 2016.
- In Leibniz's theory, monads are the smallest building blocks of mind, and they interact only via their senses.
- The eventual goal of the present project is thus to create the experience of a full exchange of senses.
- This has numerous medical and artistic applications.
Problem statement
To achieve a synesthetic experience we need to
- Precisely localize the sound source.
- Analyze the sound and extract its characteristics.
- Visualize the sound in the 3D virtual space.
Sound localization
- We use a conical array of microphones.
- We propose to use a direction-of-arrival (DOA) method: compared to SRP-PHAT, it avoids frequency-domain computations and is thus computationally more efficient.
- Furthermore, the proposed DOA method makes it possible to reduce the number of microphone pairs used for cross-correlation.
Sets of microphones
- For azimuth $\phi$ estimation we have the set of pairs
\[
A_{h}=\left\{ \left(m_{i}^{h},m_{j}^{h}\right)\in S_{2}^{M_{h}}\biggm|\alpha_{ij}<\frac{\pi}{2}\right\} ,
\]
where $S_{2}^{M_{h}}$ is the set of all combinations of horizontal microphone pairs.
- For elevation $\theta$ estimation we have
\[
A_{v}=\left\{ \left(m_{i}^{h},m_{j}^{v}\right)\bigm|m_{i}^{h}\in A_{act},\,j\in[1,M_{v}]\right\} \cup S_{2}^{M_{v}},
\]
where $S_{2}^{M_{v}}$ is the set of all combinations of vertical microphone pairs, and $A_{act}$ is the set of active horizontal microphones.
Sound localization: AOA estimation
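- Assuming a far-field disposition of the acoustic source, the angle estimate for the microphone pair $(m_{i},m_{j})$ is
\[
\hat{\varphi}_{ij}=\sin^{-1}\left(\frac{\tau_{ij}\cdot c}{l}\right)=\sin^{-1}\left(\frac{\Delta k_{ij}/f_{s}\cdot c}{l}\right), \tag{1}
\]
where $\tau_{ij}$ is the time difference of arrival (TDOA), $\Delta k_{ij}$ is the same delay in samples, $c$ is the speed of sound, $l$ is the distance between the two microphones, and $f_{s}$ is the sampling frequency.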
- To estimate $\tau_{ij}$ we apply cross-correlation
\[
R_{ij}\left(\mathrm{\Delta}k\right)=\sum_{k=0}^{N-1}x_{m_{i}}[k]\cdot x_{m_{j}}[k-\mathrm{\Delta}k]. \tag{2}
\]
- Then, the TDOA in samples is
\[
\mathrm{\Delta}k_{ij}=\arg\max_{\mathrm{\Delta}k}\left(R_{ij}\left(\mathrm{\Delta}k\right)\right). \tag{3}
\]
- Finally, AOA estimates $(\phi,\theta)$ are computed using (3)–(6).
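The pairwise steps in (1)-(3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the function names are hypothetical, samples outside the recorded window are treated as zero, the speed of sound defaults to an assumed 343 m/s, and the lag search is limited to the largest delay that is physically possible for the given microphone spacing.

```python
import math


def cross_correlation(x_i, x_j, lag):
    """Eq. (2): R_ij(lag) = sum_k x_i[k] * x_j[k - lag].

    Assumption: samples of x_j outside the recorded window count as zero.
    """
    total = 0.0
    for k in range(len(x_i)):
        idx = k - lag
        if 0 <= idx < len(x_j):
            total += x_i[k] * x_j[idx]
    return total


def estimate_tdoa_samples(x_i, x_j, max_lag):
    """Eq. (3): the lag in samples that maximizes R_ij over [-max_lag, max_lag]."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cross_correlation(x_i, x_j, lag))


def estimate_pair_angle(x_i, x_j, fs, spacing, c=343.0):
    """Eq. (1): far-field angle (radians) for one microphone pair.

    fs is the sampling frequency in Hz, spacing the microphone distance l
    in meters, c the (assumed) speed of sound in m/s.
    """
    # A delay longer than spacing / c cannot come from a plane wave.
    max_lag = int(spacing * fs / c)
    delta_k = estimate_tdoa_samples(x_i, x_j, max_lag)
    # Clamp against rounding so that asin always receives a valid argument.
    ratio = max(-1.0, min(1.0, delta_k / fs * c / spacing))
    return math.asin(ratio)
```

With this sign convention, a positive lag means that the signal reaches microphone $m_{i}$ later than microphone $m_{j}$. Combining the pairwise angles into the final $(\phi,\theta)$ estimate is not covered by this sketch.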
Acoustic feature extraction
Sound Visualization
- In the VR environment the incoming sound waves are visualized as spheres moving towards the listener.
- The color, size, and velocity of travel of the spheres, as well as the rate at which they are generated, can be determined experimentally.
- The incoming waveforms are broken down into frames and analyzed as discussed previously.
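As a rough illustration of the framing step, the sketch below splits a waveform into frames and emits one sphere description per frame. Everything specific here is an assumption made for the example: the `Sphere` record, the use of the frame RMS as the sphere radius, and the `radius_scale` parameter are hypothetical, since the actual mapping is to be determined experimentally.

```python
import math
from dataclasses import dataclass


@dataclass
class Sphere:
    spawn_time: float  # seconds from the start of the signal
    radius: float      # arbitrary VR units (hypothetical mapping)


def split_into_frames(signal, frame_len, hop):
    """Return consecutive frames of frame_len samples, advancing by hop samples."""
    last_start = len(signal) - frame_len
    return [signal[start:start + frame_len]
            for start in range(0, last_start + 1, hop)]


def rms(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def spheres_from_signal(signal, fs, frame_len, hop, radius_scale=1.0):
    """Emit one sphere per frame; louder frames give larger spheres (assumption)."""
    frames = split_into_frames(signal, frame_len, hop)
    return [Sphere(spawn_time=n * hop / fs, radius=radius_scale * rms(frame))
            for n, frame in enumerate(frames)]
```

The hop size controls how often a sphere is generated, so it plays the role of the generation rate mentioned above.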
Experimental setup: Microphone array
Experimental setup: Full configuration
Experimental setup: Data
- A sound source is moved manually at constant velocity within a plane at a distance of about $r=1.5$ m from the conical array.
- An audio clip of modern music with no distinct spectral features is used as the source signal.
- The AOA estimation discussed above is carried out with a window of $t_{s}=0.1$ s.
- The resulting angles (with an average tolerance of about $3^{\circ}$) are filtered, and a trajectory of motion is recovered.
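The filtering and trajectory recovery could look roughly as follows. The slides do not name the filter or the coordinate convention, so both are assumptions in this sketch: a moving median is used as a stand-in filter, the $x$ axis is taken to point from the array towards the plane of motion, $\phi$ is measured from that axis in the horizontal plane, and $\theta$ is the elevation above the horizontal plane. All function names are hypothetical.

```python
import math
from statistics import median


def median_filter(values, window=5):
    """Moving median with a shrinking window at both ends of the sequence.

    Assumption: the angles (radians) do not wrap around +/- pi.
    """
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]


def project_to_plane(phi, theta, r=1.5):
    """Intersect the AOA ray with the plane x = r and return (y, z) in meters."""
    dx = math.cos(theta) * math.cos(phi)
    dy = math.cos(theta) * math.sin(phi)
    dz = math.sin(theta)
    if dx <= 0.0:
        raise ValueError("ray does not hit the plane in front of the array")
    scale = r / dx
    return (dy * scale, dz * scale)


def recover_trajectory(phis, thetas, r=1.5, window=5):
    """Filter both angle sequences, then project every estimate onto the plane."""
    smooth_phi = median_filter(phis, window)
    smooth_theta = median_filter(thetas, window)
    return [project_to_plane(p, t, r) for p, t in zip(smooth_phi, smooth_theta)]
```

With a window of $t_{s}=0.1$ s there is one angle pair per 0.1 s, so a five-point median spans half a second of motion. Whether that is an appropriate amount of smoothing depends on how fast the source moves.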
Experimental configuration
Acoustic localization results
Experimental signal analysis
- The MFCCs are calculated for the sound clip recorded by the central microphone of the circular array.
- The sound amplitude and dominant spectral features are encoded as color, as proposed above.
- Thus, all parameters required by the VR sound visualization system have been obtained.
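One way to turn a frame into a color is sketched below. It is a deliberately simplified stand-in and not the MFCC-based encoding used in the experiments: the strongest DFT bin replaces the MFCC features, the mapping of frequency to hue and of peak amplitude to brightness is hypothetical, and the frequency limits and `full_scale` value are assumed parameters.

```python
import cmath
import colorsys
import math


def dominant_frequency(frame, fs):
    """Frequency (Hz) of the strongest non-DC bin of a naive DFT of the frame."""
    n = len(frame)
    best_bin, best_mag = 1, -1.0
    for b in range(1, n // 2 + 1):
        coeff = sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n)
                    for k in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = b, abs(coeff)
    return best_bin * fs / n


def frame_to_rgb(frame, fs, f_min=50.0, f_max=8000.0, full_scale=1.0):
    """Map one frame to an (r, g, b) tuple with components in [0, 1].

    Hypothetical mapping: log-frequency -> hue (red for low, violet for high),
    peak amplitude -> brightness.
    """
    f = min(max(dominant_frequency(frame, fs), f_min), f_max)
    hue = 0.75 * math.log(f / f_min) / math.log(f_max / f_min)
    value = min(max(abs(s) for s in frame) / full_scale, 1.0)
    return colorsys.hsv_to_rgb(hue, 1.0, value)
```

The naive DFT costs $O(N^{2})$ per frame, which is acceptable for short illustrative frames only. A real-time system would use an FFT.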
Signal analysis results
Conclusions and further research
- We have developed a prototype for sound localization, processing, and visualization for inducing a synesthetic experience in a VR environment.
- Experimental data were successfully processed using the proposed approach, yielding usable results.
- Further research is necessary and has several branches: real-time application; implementation and verification in an embedded system; expansion of the microphone array for accurate detection of multiple sound sources; and a study of the induced synesthetic effect in real subjects.
Thank you for your attention!
For more information visit http://recreation.ee/