Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation

Duangmanee Putthividhya, UCSD, 9500 Gilman Drive, La Jolla, CA 92307, putthi@
Hagai T. Attias, Golden Metallic, Inc., P. O. Box 475608, San Francisco, CA 91147, htattias@
Srikantan S. Nagarajan, UCSF, 513 Parnassus Avenue, San Francisco, CA 94143, sri@

Abstract

We present topic-regression multi-modal Latent Dirichlet Allocation (tr-mmLDA), a novel statistical topic model for the task of image and video annotation. At the heart of our new annotation model lies a novel latent variable regression approach to capture correlations between image or video features and annotation texts. Instead of sharing a set of latent topics between the 2 data modalities as in the formulation of correspondence LDA in [2], our approach introduces a regression module to correlate the 2 sets

one might ask is how to deal with numerous fast-growing user-generated content that often lacks descriptive annotation texts which would enable accurate semantic retrieval to be performed. The traditional solution is to employ manual labeling, a process that is costly and unscalable to large-scale repositories. With recent unprecedented availability of image and video data online, there is a growing demand to bypass the human intervention and develop automated tools that can generate semantic descriptors of multimedia content: automatic annotation systems. Given a database


