Audio-visual speaker verification using continuous fused HMMs
Posted on October 24, 2006
Dean, David and Sridharan, Sridha and Wark, Tim (2006) Audio-visual speaker verification using continuous fused HMMs. In Proceedings HCSNet Workshop on the Use of Vision in HCI, Canberra, Australia.
This paper examines audio-visual speaker verification using a novel adaptation of fused hidden Markov models, in comparison to output fusion of individual classifiers in the audio and video modalities. A comparison of both hidden Markov model (HMM) and Gaussian mixture model (GMM) classifiers in both modalities under output fusion shows that the choice of audio classifier is more important than the choice of video classifier. Although temporal information allows an HMM to outperform a GMM individually in video, this advantage does not carry through to output fusion with an audio classifier, where the difference between the two video classifiers is minor. An adaptation of fused hidden Markov models, designed to be more robust to within-speaker variation, is used to show that the temporal relationship between video observations and audio states can be harnessed to reduce errors in audio-visual speaker verification when compared to output fusion.
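For readers unfamiliar with output fusion, the baseline the abstract compares against, here is a minimal sketch of one common form: a weighted sum of the scores produced separately by the audio and video classifiers. The abstract does not say which fusion rule the paper uses, so the weighted sum, the function names `fuse_scores` and `verify`, the default `audio_weight` of 0.7, and the threshold of 0.0 are all assumptions made for illustration. The scores are assumed to be already normalised to a comparable scale (for example, log-likelihood ratios).

```python
def fuse_scores(audio_score, video_score, audio_weight=0.7):
    """Weighted-sum output fusion of two per-modality verification scores.

    Hypothetical helper: the weight and the fusion rule are illustrative
    assumptions, not values or methods taken from the paper.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    return audio_weight * audio_score + (1.0 - audio_weight) * video_score


def verify(audio_score, video_score, threshold=0.0, audio_weight=0.7):
    """Accept the claimed identity if the fused score reaches the threshold.

    The threshold is an assumed operating point; in practice it would be
    tuned on held-out data.
    """
    return fuse_scores(audio_score, video_score, audio_weight) >= threshold
```

Because each classifier is reduced to a single score per utterance before the two are combined, this kind of fusion cannot use frame-level timing between the modalities, which is the relationship the fused HMM is described as exploiting.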
[ link | paper (pdf) | slides (ppt) ]
» Filed Under fhmm, publications, research, speech