TR#472: Multimodal Person Recognition using Unconstrained Audio and Video

Tanzeem Choudhury, Brian Clarkson, Tony Jebara and Alex Pentland

Submitted to AVBPA'99
Washington, DC

We propose a person identification technique that can recognize and verify people from unconstrained video and audio. We do not require fully frontal face images or clean speech as input. Our recognition algorithm detects and compensates for pose variation and changes in the auditory background, and it selects the most reliable video frame and audio clip to use for recognition. We also use 3D depth information about the human head to detect the presence of an actual person, as opposed to an image of that person. Our system achieves 100% recognition and verification rates on natural real-time input with 26 registered clients.
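
The idea of selecting the most reliable frame and clip before recognizing can be illustrated with a minimal sketch. The abstract does not specify how reliability is measured or how the two modalities are combined, so everything here is an assumption made for illustration: the `Observation` type, the `reliability` value in [0, 1], the per-client `scores`, and the reliability-weighted average used for fusion are all hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Observation:
    """One video frame or audio clip, already scored by some classifier.

    Both fields are hypothetical: the abstract does not define them.
    """
    reliability: float        # assumed confidence in [0, 1]
    scores: Dict[str, float]  # assumed match score per registered client


def most_reliable(observations: List[Observation]) -> Observation:
    """Pick the observation with the highest reliability value."""
    if not observations:
        raise ValueError("no observations to choose from")
    return max(observations, key=lambda o: o.reliability)


def identify(frames: List[Observation],
             clips: List[Observation]) -> Tuple[str, float]:
    """Fuse the best frame and best clip; return (client, fused score).

    Fusion is a reliability-weighted average, chosen here only as a
    simple stand-in for whatever combination rule the system uses.
    """
    face = most_reliable(frames)
    voice = most_reliable(clips)
    total = face.reliability + voice.reliability
    if total <= 0:
        raise ValueError("both modalities have zero reliability")

    # Only clients scored by both modalities can be fused.
    clients = face.scores.keys() & voice.scores.keys()
    if not clients:
        raise ValueError("no client was scored by both modalities")

    fused = {
        c: (face.reliability * face.scores[c]
            + voice.reliability * voice.scores[c]) / total
        for c in clients
    }
    best = max(fused, key=fused.get)
    return best, fused[best]


def verify(claimed: str, frames: List[Observation],
           clips: List[Observation], threshold: float = 0.5) -> bool:
    """Accept an identity claim if it wins and clears a threshold.

    The threshold value is an arbitrary placeholder.
    """
    best, score = identify(frames, clips)
    return best == claimed and score >= threshold
```

In this sketch, a blurry or non-frontal frame would simply carry a low `reliability` and be passed over in favor of a better one, and a noisy audio clip would contribute less to the fused score than a clean face frame.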