<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>the blog of david dean &#187; research</title>
	<atom:link href="http://www.davidbdean.com/category/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.davidbdean.com</link>
	<description>currently not blogging much at all</description>
	<lastBuildDate>Sat, 21 Jun 2008 15:30:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Interspeech and AVSP 2007</title>
		<link>http://www.davidbdean.com/2007/10/12/interspeech-and-avsp-2008/</link>
		<comments>http://www.davidbdean.com/2007/10/12/interspeech-and-avsp-2008/#comments</comments>
		<pubDate>Fri, 12 Oct 2007 05:33:33 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[biometrics]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[speech]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2007/10/12/interspeech-and-avsp-2008/</guid>
		<description><![CDATA[I recently attended two speech-related conferences over in Europe. It seems I like my international conferences in twos. The first conference was the Interspeech 2007 conference in Antwerp, Belgium, and the second was the International Conference on Auditory-Visual Speech Processing (AVSP) 2007 near Hilvarenbeek in the Netherlands. Both were good experiences and will be [...]]]></description>
			<content:encoded><![CDATA[<p>I recently attended two speech-related conferences over in Europe. <a href="http://www.davidbdean.com/2006/07/04/mmua-and-icassp-2006/">It seems I like my international conferences in twos</a>. The first conference was the <a href="http://www.interspeech2007.org/">Interspeech 2007</a> conference in Antwerp, Belgium, and the second was the <a href="http://foap.uvt.nl/avsp2007/">International Conference on Auditory-Visual Speech Processing (AVSP) 2007</a> near Hilvarenbeek in the Netherlands. Both were good experiences and will be helpful to my research.</p>
<p>After a lovely 20-hour flight from Brisbane, with stop-overs at every corner of the globe (it seemed), I arrived in Antwerp for the eighth annual Interspeech conference. The Interspeech conferences replace the Eurospeech and ICSLP conferences, which used to alternate year-by-year. It is now considered taboo to mention these earlier names, as it is just Interspeech &#8211; at least this is what we were told at the welcome lecture.</p>
<p>Although speech is a fairly focussed area of signal processing, there are still a lot of topics that can be covered under Interspeech&#8217;s umbrella, and some of them weren&#8217;t of much interest to me. However, I did manage to attend a number of sessions on most of the 5 days of the conference. I was a little disappointed that my area of research, multi-modal speech processing, had its only oral and poster sessions <em>on at the same time!</em> That was quite annoying, but I did manage to see most of both sessions, even though I was presenting an oral in one of them. In particular, I found some of the research into recognising speech with infrared sensors by Bo Zhu at MIT interesting.</p>
<p>The social program of Interspeech was quite nice, with lots of free food and Belgian beer available at various social events on most of the nights of the conference. Entrance to the Antwerp Zoo, next door to the conference venue, was also included in the conference registration, although all those animals in such a small area seemed a little sad to me.</p>
<p>On the final day of the Interspeech conference, I had to pack my bags and catch an hour-or-so train to Tilburg, Netherlands, where I could catch an expensive taxi to Kasteel Groenendael in Hilvarenbeek for AVSP 2007. It probably would have been nice if AVSP had arranged a shuttle bus, as the taxi to Hilvarenbeek cost more than the train trip from Belgium, although I did get to share the cost with some other attendees on the way back.</p>
<p>The AVSP 2007 conference was a small workshop-style conference specifically devoted to my area of research, although it covers human perception as well as automatic processing, and the latter is more my style. I got to see a lot of interesting research at the AVSP workshop, although it did seem to be a little perception-heavy. However, I found that the studies of how humans do what I am trying to perform with computers provided a good perspective on my research that I don&#8217;t normally encounter. Although even further away from my area of research, I found invited speaker <a href="http://weblamp.princeton.edu/%7epsych/psychology/research/ghazanfar/index.php">Asif Ghazanfar</a>&#8217;s talk on speech perception in monkeys (that is, monkey-speech perception) to be very well presented and quite interesting.</p>
<p>Kasteel Groenendael is <a href="http://www.philips.com/">Philips Electronics</a>&#8217; executive training centre just outside the small village of Hilvarenbeek. Seeing what their executive training centre is like, I don&#8217;t think I&#8217;d mind working for Philips. Everything was provided for us at the workshop, and I&#8217;d probably even say it was worth losing my weekend. I also got to meet and discuss research with a lot of interesting people over breakfast, lunch and dinner over the two days, and I hope to keep in touch with many of them.</p>
<p>Finally, I&#8217;d like to thank <a href="http://www.qut.edu.au">QUT</a> and <a href="http://www.assta.org/initiatives/conferences/">ASSTA</a> for supplying the funding to travel to Europe and attend these conferences.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2007/10/12/interspeech-and-avsp-2008/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Journal Impact and Eigenfactor</title>
		<link>http://www.davidbdean.com/2007/07/19/journal-impact-and-eigenfactor/</link>
		<comments>http://www.davidbdean.com/2007/07/19/journal-impact-and-eigenfactor/#comments</comments>
		<pubDate>Thu, 19 Jul 2007 06:57:07 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[impact]]></category>
		<category><![CDATA[journals]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2007/07/19/journal-impact-and-eigenfactor/</guid>
		<description><![CDATA[I&#8217;ve been looking for good journals to submit a paper to for a while now, and working out the quality of journals is not the easiest thing to do. Traditionally, the impact factor as calculated and listed in Journal Citation Reports (JCR) has been used, but access to these impact factors is not free, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been looking for good journals to submit a paper to for a while now, and working out the quality of journals is not the easiest thing to do. Traditionally, the impact factor as calculated and listed in <a href="http://scientific.thomson.com/free/essays/journalcitationreports/impactfactor/">Journal Citation Reports (JCR)</a> has been used, but access to these impact factors is not free, and even when you have access it is a pain to use.</p>
<p>However, while perusing the Wikipedia article on <a href="http://en.wikipedia.org/wiki/Impact_factor">impact factors</a>, I stumbled upon <a href="http://www.eigenfactor.org/index.php">eigenfactor.org</a>, which takes a much more <a href="http://en.wikipedia.org/wiki/PageRank">page-ranky</a> approach to journal impact. This apparently does a much better job than the more linear approach of the JCR. And what is even better is that it is easy to use. Here, go look at the journals listed in <a href="http://www.eigenfactor.org/results.php?fulljournalname1=&#038;finecat=EP&#038;resultsperpage=100&#038;issnnumber=&#038;ordering=perarticle&#038;grping=%25&#038;Submit=Search">Computer Science &#8211; AI</a>. Very easy to understand. And what is even <em>more</em> better is that it is completely free and any Joe or Jane on the internet can access it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2007/07/19/journal-impact-and-eigenfactor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Portrait and Sebastien Marcel interview</title>
		<link>http://www.davidbdean.com/2007/06/18/google-portrait-and-sebastien-marcel-interview/</link>
		<comments>http://www.davidbdean.com/2007/06/18/google-portrait-and-sebastien-marcel-interview/#comments</comments>
		<pubDate>Mon, 18 Jun 2007 12:20:16 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[IDIAP]]></category>
		<category><![CDATA[biometrics]]></category>
		<category><![CDATA[face recognition]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2007/06/18/google-portrait-and-sebastien-marcel-interview/</guid>
		<description><![CDATA[Have you seen Google Portrait? It is actually by Sebastien Marcel at IDIAP, not Google, but it is a nice little application of face detection. Basically you can type anything you like into the search box, and the site will search Google Images for your search term, and return any faces it finds in the [...]]]></description>
			<content:encoded><![CDATA[<p>Have you seen <a href="http://www.idiap.ch/googleportrait/">Google Portrait</a>? It is actually by <a href="http://www.idiap.ch/~marcel">Sebastien Marcel</a> at IDIAP, not Google, but it is a nice little application of face detection. Basically you can type anything you like into the search box, and the site will search Google Images for your search term, and return any faces it finds in the images. Sebastien did an <a href="http://blog.outer-court.com/archive/2007-06-18-n53.html">interview</a> on Google Blogoscoped, which is where I have just heard about this. <a href="http://www.bananasecurity.com/">BananaScreen</a> was also mentioned, a fun little application that locks and unlocks your Windows XP computer based on whether it recognises you in the webcam!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2007/06/18/google-portrait-and-sebastien-marcel-interview/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An introduction to audio-visual speech recognition</title>
		<link>http://www.davidbdean.com/2007/04/30/an-introduction-to-audio-visual-speech-recognition/</link>
		<comments>http://www.davidbdean.com/2007/04/30/an-introduction-to-audio-visual-speech-recognition/#comments</comments>
		<pubDate>Mon, 30 Apr 2007 04:30:18 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[audio-visual]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[speech]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2007/04/30/an-introduction-to-audio-visual-speech-recognition/</guid>
		<description><![CDATA[This is from an introduction to my latest paper, and I thought it might be useful to put up here. Feel free to leave any comments on this below.
Audio-visual Speech Recognition
Automatic speech recognition is a very mature area of research, and one that is increasingly becoming involved in our day-to-day lives. While many systems that [...]]]></description>
			<content:encoded><![CDATA[<p>This is from an introduction to my latest paper, and I thought it might be useful to put up here. Feel free to leave any comments on this below.</p>
<h3>Audio-visual Speech Recognition</h3>
<p>Automatic speech recognition is a very mature area of research, and one that is increasingly becoming involved in our day-to-day lives. While many systems that can recognise speech from an audio signal have shown promise when performing well-defined tasks like dictation or call-centre navigation in reasonably controlled environments, automatic speech recognition has certainly not yet reached the stage where a user can seamlessly interact with an automatic speech interface [<a href="#1">1</a>]. One of the major stumbling blocks to speech becoming an alternative human-computer interface is the lack of robustness of present systems to channel or environmental noise, which can degrade performance by many orders of magnitude [<a href="#2">2</a>].</p>
<p>However, speech does not consist of the audio modality alone, and studies of human production and perception of speech have shown that the visual movement of the speaker&#8217;s face and lips is an important factor in human communication. Hiding or modifying one of the modalities independently of the other has been shown to cause errors in human speech perception [<a href="#3">3</a>, <a href="#4">4</a>].</p>
<p>Fortunately many of the sources of audio degradation can be considered to have little effect on the visual signal, for example, a group of people talking out of view of the camera. A similar assumption can also be drawn about many sources of video degradation, such as face movement or minor occlusions. By taking advantage of visual speech in combination with traditional audio speech, automatic speech recognition systems can increase the robustness to degradation in both modalities.</p>
<p>The chosen method of combining these two orthogonal sources of information is still a major area of ongoing research in audio-visual speech recognition (AVSR). Early AVSR systems could generally be divided into two main groups, early or late integration, based on whether the two modalities were combined before or after classification/scoring. Late integration had the advantage that the reliability of each modality&#8217;s classifier could be weighted easily before combination, but was difficult to use on anything but isolated word recognition due to the problem of aligning and fusing two possibly significantly different speech transcriptions. This was not a problem with early integration, where features are combined before using a single classifier, but, on the other hand, it would be very difficult to model the reliability of each modality.</p>
<p>To allow a compromise between these two extremes, middle integration schemes were developed that allow classifier scores to be combined in a weighted manner within the structure of the classifier itself. The simplest of the middle integration methods, and the subject of this paper, is the synchronous multi-stream HMM [<a href="#1">1</a>] (MSHMM). There are more complicated middle integration designs, primarily intended to allow modelling of the asynchronous nature of audio-visual speech, such as asynchronous [<a href="#5">5</a>], product [<a href="#1">1</a>] or coupled HMMs [<a href="#6">6</a>]. These designs can be significantly more complicated to train and test, however, and the small performance increase may not be worth it, especially in embedded environments where processing power or memory might be limited.</p>
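<p>As a rough illustration of the weighted combination described above, the sketch below scores one state of a synchronous MSHMM as a weighted sum of per-stream log-likelihoods. It is a minimal Python sketch and not the implementation used in the paper: the single one-dimensional Gaussian per stream, the function names, the layout of the <code>example_state</code> dictionary and the constraint that the two stream weights sum to one are all assumptions made for the example.</p>

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a one-dimensional Gaussian (assumed stream model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def mshmm_state_log_likelihood(audio_obs, video_obs, state, audio_weight):
    """Weighted sum of the audio and video log-likelihoods for one HMM state.

    audio_weight is assumed to lie in [0, 1]; the video stream receives the
    remainder, so the two stream weights always sum to one.
    """
    video_weight = 1.0 - audio_weight
    audio_mean, audio_var = state["audio"]
    video_mean, video_var = state["video"]
    log_audio = gaussian_log_pdf(audio_obs, audio_mean, audio_var)
    log_video = gaussian_log_pdf(video_obs, video_mean, video_var)
    return audio_weight * log_audio + video_weight * log_video


# Hypothetical state: (mean, variance) of each stream's Gaussian.
example_state = {"audio": (0.0, 1.0), "video": (1.0, 4.0)}

# With clean audio the audio stream is trusted more; with noisy audio a
# lower audio weight lets the video stream dominate the state score.
clean_score = mshmm_state_log_likelihood(0.2, 1.1, example_state, 0.8)
noisy_score = mshmm_state_log_likelihood(3.0, 1.1, example_state, 0.2)
```

<p>Setting the audio weight to one reduces the score to the audio stream alone, and setting it to zero leaves only the video stream. How the weight is chosen for a given noise condition is left open here.</p>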
<h3>References</h3>
<p><a name="1">[1]</a> G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. Senior, “<a href="http://scholar.google.com/scholar?q=Recent+advances+in+the+automatic+recognition+of+audiovisual+speech">Recent advances in the automatic recognition of audiovisual speech</a>,” <em>Proceedings of the IEEE</em>, vol. 91, no. 9, pp. 1306–1326, 2003.</p>
<p><a name="2">[2]</a> Y. Gong, “<a href="http://scholar.google.com/scholar?q=Speech+recognition+in+noisy+environments%3A+A+survey">Speech recognition in noisy environments: A survey</a>,” <em>Speech Communication</em>, vol. 16, no. 3, pp. 261–291, 1995.</p>
<p><a name="3">[3]</a> H. McGurk and J. MacDonald, “<a href="http://scholar.google.com/scholar?hl=en&#038;lr=&#038;q=Hearing+lips+and+seeing+voices&#038;btnG=Search">Hearing lips and seeing voices</a>,” <em>Nature</em>, vol. 264, no. 5588, pp. 746–748, Dec. 1976.</p>
<p><a name="4">[4]</a> S. M. Thomas and T. R. Jordan, “<a href="http://scholar.google.com/scholar?hl=en&#038;lr=&#038;q=Contributions+of+oral+and+extraoral+facial+movement+to+visual+and+audiovisual+speech+perception&#038;btnG=Search">Contributions of oral and extraoral facial movement to visual and audiovisual speech perception</a>,” <em>Journal of Experimental Psychology: Human Perception and Performance</em>, vol. 30, no. 5, pp. 873–888, 2004.</p>
<p><a name="5">[5]</a> S. Bengio, “<a href="http://scholar.google.com/scholar?q=Multimodal+speech+processing+using+asynchronous+hidden+markov+models">Multimodal speech processing using asynchronous hidden Markov models</a>,” <em>Information Fusion</em>, vol. 5, no. 2, pp. 81–9, June 2004.</p>
<p><a name="6">[6]</a> A. Nefian, L. Liang, X. Pi, L. Xiaoxiang, C. Mao, and K. Murphy, “<a href="http://scholar.google.com/scholar?q=A+coupled+hmm+for+audio-visual+speech+recognition">A coupled HMM for audio-visual speech recognition</a>,” in <em>Acoustics, Speech, and Signal Processing, 2002. Proceedings. (ICASSP ’02). IEEE International Conference on</em>, vol. 2, 2002, pp. 2013–2016.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2007/04/30/an-introduction-to-audio-visual-speech-recognition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Audio-visual speech and the McGurk effect</title>
		<link>http://www.davidbdean.com/2007/04/23/audio-visual-speech-and-the-mcgurk-effect/</link>
		<comments>http://www.davidbdean.com/2007/04/23/audio-visual-speech-and-the-mcgurk-effect/#comments</comments>
		<pubDate>Mon, 23 Apr 2007 03:49:55 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[audio-visual]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[speech]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2007/04/23/audio-visual-speech-and-the-mcgurk-effect/</guid>
		<description><![CDATA[It may not be immediately obvious to most, but speech is fundamentally a multimodal interaction. (Multimodal is the fancy-pants way of saying that the interaction occurs through more than one mode or channel of communication &#8211; audio, visual, gestural, etc.).
While we can communicate very well with audio alone, such as during a telephone call, our [...]]]></description>
			<content:encoded><![CDATA[<p>It may not be immediately obvious to most, but speech is fundamentally a <a href="http://en.wikipedia.org/wiki/Multimodal_interaction">multimodal interaction</a>. (Multimodal is the <a href="http://en.wikipedia.org/wiki/Jargon">fancy-pants way</a> of saying that the interaction occurs through more than one mode or channel of communication &#8211; audio, visual, gestural, etc.).</p>
<p>While we can communicate very well with audio alone, such as during a telephone call, our brains make use of many visual cues when we talk face-to-face. As well as broader visual cues such as gestures and facial expressions, it may come as a surprise to learn that the actual motion of the lips plays a very important part in the comprehension of human speech.</p>
<p>A useful demonstration of the impact of the visual modality on speech is the McGurk effect, first published by <a href="http://scholar.google.com/scholar?hl=en&#038;lr=&#038;q=mcgurk+Hearing+lips+and+seeing+voices&#038;btnG=Search">McGurk and MacDonald in 1976</a>. Rather than explain it in too much detail right now, go watch the video below from an <a href="http://www.hackszine.com/blog/archive/2007/02/hear_with_your_eyes_the_mcgurk.html">episode of the Hackszine video podcast</a>.</p>
<p><object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/T4fUi0eG1X4"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/T4fUi0eG1X4" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object></p>
<p>The basic and original McGurk effect was demonstrated by dubbing a video of a person saying &#8216;gah&#8217; with audio of them saying &#8216;bah&#8217;. If you watch the dubbed video, they appear to be saying &#8216;dah&#8217;, but the audio alone clearly says &#8216;bah&#8217;. This shows that even though you may not realise it, the visual lip movements are having an effect on your perception of speech. The Hackszine video extends the McGurk effect to cover bad dubbing in general, but I would only consider the McGurk effect to cover cases where said bad dubbing appears to make the person say something that is in neither the video nor the dubbed audio.</p>
<p>Finally, this (I think) Japanese talk show appears to be <em>very</em> interested in the McGurk effect. It makes for fairly amusing watching.</p>
<p><object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/eD4x_6HBi7E"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/eD4x_6HBi7E" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object></p>
<p>More information:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Mcgurk_effect">McGurk Effect</a> at Wikipedia</li>
<li><a href="http://homepage.ntu.edu.tw/~karchung/Phonetics%20II%20page%20seventeen.htm">Hearing with your eyes: The McGurk Effect</a></li>
<li>McGurk, Harry; and MacDonald, John (1976); &#8220;<a href="http://scholar.google.com/scholar?hl=en&#038;lr=&#038;q=mcgurk+Hearing+lips+and+seeing+voices&#038;btnG=Search">Hearing lips and seeing voices</a>,&#8221; Nature, Vol 264(5588), pp. 746–748</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2007/04/23/audio-visual-speech-and-the-mcgurk-effect/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Audio-visual speaker verification using continuous fused HMMs</title>
		<link>http://www.davidbdean.com/2006/10/24/audio-visual-speaker-verification-using-continuous-fused-hmms/</link>
		<comments>http://www.davidbdean.com/2006/10/24/audio-visual-speaker-verification-using-continuous-fused-hmms/#comments</comments>
		<pubDate>Tue, 24 Oct 2006 03:06:57 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[fhmm]]></category>
		<category><![CDATA[publications]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[speech]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2006/10/24/audio-visual-speaker-verification-using-continuous-fused-hmms/</guid>
		<description><![CDATA[Dean, David and Sridharan, Sridha and Wark, Tim (2006) Audio-visual speaker verification using continuous fused HMMs. In Proceedings  HCSNet Workshop on the Use of Vision in HCI, Canberra, Australia.
This paper examines audio-visual speaker verification using a novel adaptation of fused hidden Markov models, in comparison to output fusion of individual classifiers in the audio [...]]]></description>
			<content:encoded><![CDATA[<p>Dean, David and Sridharan, Sridha and Wark, Tim (2006) Audio-visual speaker verification using continuous fused HMMs. In <em>Proceedings  HCSNet Workshop on the Use of Vision in HCI</em>, Canberra, Australia.</p>
<blockquote><p>This paper examines audio-visual speaker verification using a novel adaptation of fused hidden Markov models, in comparison to output fusion of individual classifiers in the audio and video modalities. A comparison of both hidden Markov model (HMM) and Gaussian mixture model (GMM) classifiers in both modalities under output fusion shows that the choice of audio classifier is more important than video. Although temporal information allows a HMM to out-perform a GMM individually in video, this temporal information does not carry through to output fusion with an audio classifier, where the difference between the two video classifiers is minor. An adaptation of fused hidden Markov models, designed to be more robust to within-speaker variation, is used to show that the temporal relationship between video observations and audio states can be harnessed to reduce errors in audio-visual speaker verification when compared to output fusion.</p></blockquote>
<p>[ <a href="http://eprints.qut.edu.au/archive/00005271/">link</a> | <a href="http://eprints.qut.edu.au/archive/00005271/01/5271.pdf">paper (pdf)</a> | <a href="http://eprints.qut.edu.au/archive/00005390/02/Audio-Visual_Speaker_Verification_using_Continuous_Fused_HMMs.ppt">slides (ppt)</a> ]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2006/10/24/audio-visual-speaker-verification-using-continuous-fused-hmms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An examination of audio-visual fused HMMs for speaker recognition</title>
		<link>http://www.davidbdean.com/2006/10/24/an-examination-of-audio-visual-fused-hmms-for-speaker-recognition/</link>
		<comments>http://www.davidbdean.com/2006/10/24/an-examination-of-audio-visual-fused-hmms-for-speaker-recognition/#comments</comments>
		<pubDate>Tue, 24 Oct 2006 03:02:49 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[fhmm]]></category>
		<category><![CDATA[publications]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[speech]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2006/10/24/an-examination-of-audio-visual-fused-hmms-for-speaker-recognition/</guid>
		<description><![CDATA[Dean, David and Wark, Tim and Sridharan, Sridha (2006) An examination of audio-visual fused HMMs for speaker recognition. In Proceedings Second Workshop on Multimodal User Authentication, Toulouse, France.
Fused hidden Markov models (FHMMs) have been shown to work well for the task of audio-visual speaker recognition, but only in an output decision-fusion configuration of both the [...]]]></description>
			<content:encoded><![CDATA[<p>Dean, David and Wark, Tim and Sridharan, Sridha (2006) An examination of audio-visual fused HMMs for speaker recognition. In <em>Proceedings Second Workshop on Multimodal User Authentication</em>, Toulouse, France.</p>
<blockquote><p>Fused hidden Markov models (FHMMs) have been shown to work well for the task of audio-visual speaker recognition, but only in an output decision-fusion configuration of both the audio- and video-biased versions of the FHMM structure. This paper looks at the performance of the audio and video-biased versions independently, and shows that the audio-biased version is considerably more capable for speaker recognition. Additionally, this paper shows that by taking advantage of the temporal relationship between the acoustic and visual data, the audio-biased FHMM provides better performance at less processing cost than best-performing output decision-fusion of regular HMMs.</p></blockquote>
<p>[ <a href="http://eprints.qut.edu.au/archive/00005343/">link</a> | <a href="http://eprints.qut.edu.au/archive/00005343/01/5269.pdf">paper (pdf)</a> | <a href="http://eprints.qut.edu.au/archive/00005343/02/An_Examination_of_Audio-Visual_Fused_HMMs_for_Speaker_Recognition.ppt">slides (ppt)</a> ]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2006/10/24/an-examination-of-audio-visual-fused-hmms-for-speaker-recognition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comparing Audio and Visual Information for Speech Processing</title>
		<link>http://www.davidbdean.com/2006/10/24/comparing-audio-and-visual-information-for-speech-processing/</link>
		<comments>http://www.davidbdean.com/2006/10/24/comparing-audio-and-visual-information-for-speech-processing/#comments</comments>
		<pubDate>Tue, 24 Oct 2006 02:59:54 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[publications]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[speech]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2006/10/24/comparing-audio-and-visual-information-for-speech-processing/</guid>
		<description><![CDATA[Dean, David and Lucey, Patrick and Sridharan, Sridha and Wark, Tim (2005) Comparing Audio and Visual Information for Speech Processing. In Proceedings The Eighth International Symposium on Signal Processing and Its Applications, pp. 58-61, Sydney, Australia.
This paper examines the utility of audio-visual speech for the two related tasks of speech and speaker recognition. A [...]]]></description>
			<content:encoded><![CDATA[<p>Dean, David and Lucey, Patrick and Sridharan, Sridha and Wark, Tim (2005) Comparing Audio and Visual Information for Speech Processing. In <em>Proceedings The Eighth International Symposium on Signal Processing and Its Applications</em>, pp. 58-61, Sydney, Australia.</p>
<blockquote><p>This paper examines the utility of audio-visual speech for the two related tasks of speech and speaker recognition. A study of the confusion that exists between speaker and speech elements was performed to show that principal component analysis (PCA) based visual speech is considerably better for the task of speaker recognition than for speech. Decision fusion speech and speaker recognition engines were also tested under various levels of acoustic degradation to find that the optimal fusion configuration for speaker recognition was substantially different than that for speech. These results highlight the problem of employing similar visual features for both speech and speaker recognition.</p></blockquote>
<p>[ <a href="http://eprints.qut.edu.au/archive/00005342/">link</a> | <a href="http://eprints.qut.edu.au/archive/00005342/01/4693.pdf">paper (pdf)</a> | <a href="http://eprints.qut.edu.au/archive/00005342/02/Comparing_Audio_and_Visual_Information_for_Speech_Processing.ppt">slides (ppt)</a> ]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2006/10/24/comparing-audio-and-visual-information-for-speech-processing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Audio-visual speaker identification using the CUAVE database</title>
		<link>http://www.davidbdean.com/2006/10/24/audio-visual-speaker-identification-using-the-cuave-database/</link>
		<comments>http://www.davidbdean.com/2006/10/24/audio-visual-speaker-identification-using-the-cuave-database/#comments</comments>
		<pubDate>Tue, 24 Oct 2006 02:56:28 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[publications]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[speech]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2006/10/24/audio-visual-speaker-identification-using-the-cuave-database/</guid>
		<description><![CDATA[Dean, David and Lucey, Patrick and Sridharan, Sridha (2005) Audio-visual speaker identification using the CUAVE database. In Vatikiotis-Bateson, Eric and Burnham, Denis and Fels, Sidney, Eds. Proceedings Auditory-Visual Speech Processing 2005, British Columbia, Canada.
The freely available nature of the CUAVE database allows it to provide a valuable platform to form benchmarks and compare research. This [...]]]></description>
			<content:encoded><![CDATA[<p>Dean, David and Lucey, Patrick and Sridharan, Sridha (2005) Audio-visual speaker identification using the CUAVE database. In Vatikiotis-Bateson, Eric and Burnham, Denis and Fels, Sidney, Eds. <em>Proceedings Auditory-Visual Speech Processing 2005</em>, British Columbia, Canada.</p>
<blockquote><p>The freely available nature of the CUAVE database allows it to provide a valuable platform to form benchmarks and compare research. This paper shows that the CUAVE database can successfully be used to test speaker identification systems, with performance comparable to existing systems implemented on other databases. Additionally, this research shows that the optimal configuration for decision-fusion of an audio-visual speaker identification system relies heavily on the video modality in all but clean speech conditions.
</p></blockquote>
<p>[ <a href="http://eprints.qut.edu.au/archive/00005341/">link</a> | <a href="http://eprints.qut.edu.au/archive/00005341/01/4860.pdf">paper (pdf)</a> | <a href="http://eprints.qut.edu.au/archive/00005341/02/AVSP_Poster.ppt">poster (ppt)</a> ]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2006/10/24/audio-visual-speaker-identification-using-the-cuave-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>QUT ePrints suggestions</title>
		<link>http://www.davidbdean.com/2006/10/24/qut-eprints-suggestions/</link>
		<comments>http://www.davidbdean.com/2006/10/24/qut-eprints-suggestions/#comments</comments>
		<pubDate>Tue, 24 Oct 2006 00:44:07 +0000</pubDate>
		<dc:creator>David Dean</dc:creator>
				<category><![CDATA[QUT]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://www.davidbdean.com/2006/10/24/qut-eprints-suggestions/</guid>
		<description><![CDATA[I recently sent this email to QUT&#8217;s ePrints service, and I thought I&#8217;d post it up here too, in case anyone else is interested.
Hi,
I have been updating my QUT eprints lately, and I would like to give a few suggestions as to how the system could be improved.
1) In lists of publications, make author names [...]]]></description>
			<content:encoded><![CDATA[<p>I recently sent this email to <a href="http://eprints.qut.edu.au/">QUT&#8217;s ePrints</a> service, and I thought I&#8217;d post it up here too, in case anyone else is interested.</p>
<blockquote><p>Hi,</p>
<p>I have been updating my QUT eprints lately, and I would like to give a few suggestions as to how the system could be improved.</p>
<p>1) In lists of publications, make author names link to their list of publications. That is, if you see a list of publications and I am an author in one of them, my name should link to <a href="http://eprints.qut.edu.au/view/person/Dean,_David.html">http://eprints.qut.edu.au/view/person/Dean,_David.html</a> where the rest of my publications can be viewed.</p>
<p>2) BibTeX and EndNote-format files should be available for download on both individual papers and lists of papers.</p>
<p>3) Author pages should have a header detailing contact information about the author, if available (similar to the user pages)</p>
<p>4) RSS or Atom syndication feeds should be available for authors so people can keep an eye on an author if needed. This would also be useful for other lists &#8211; topics, faculties, all papers, etc.</p>
<p>5) I&#8217;m not completely sure how (or if) you can attach other files to a paper, such as slides or datasets, etc.</p>
<p>Thank you,</p>
<p>David Dean</p></blockquote>
<p><strong><br />
UPDATE:</strong> Paula Callan, QUT eResearch Access Coordinator, contacted me and told me that <strike>all</strike> some of these things <strike>and more, presumably,</strike> will be available when the new system goes live in a month or so. I hope the old links still work (whoops, I read the email incorrectly &#8211; just the first two points should be in the new system, the rest are on the drawing board).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidbdean.com/2006/10/24/qut-eprints-suggestions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
