Overview of Multimedia Content Analysis and Retrieval Techniques 1.ppt

About 17,500 characters, about 55 pages
Published 2017-07-10 (Hubei)


* (Keep this to 10 pages.)

* The method does not account for the frequency response of the human ear. The necessary equalization can be added by applying the Fletcher-Munson equal-loudness contours. The human ear can hear over a 120-decibel range; the software produces values over a range of approximately 100 decibels from 16-bit audio recordings.

* Because of physiological differences, male and female voices cover different pitch ranges. In general, male voices span about 35 to 72 semitones, corresponding to 62 to 523 Hz, and female voices span about 45 to 83 semitones, corresponding to 110 to 1000 Hz. However, it should be emphasized that pitch alone is not used to identify male or female voices. Information from timbre (more precisely, formants) is also used for this task. More information will be covered in later chapters.

* Example: putting a hand over your mouth as you speak reduces the brightness of the speech as well as its loudness.

* Computed as the magnitude-weighted average of the differences between the spectral components and the centroid.

* These aspects of sound vary over time. The trajectory in time is computed during the analysis but not stored in the database. Each trajectory is summarized by its average value, its variance, and its autocorrelation. The autocorrelation is a measure of the smoothness of the trajectory: it can distinguish between a pitch glissando and a wildly varying pitch, which the simple variance measure cannot. The first three are weighted to emphasize the important sections of the sound. The table shows the resulting analysis for a recording of male laughter, in which some of the important characteristics of the sound can be seen.

* A sound can be specified directly by submitting constraints on the values of the N-vector to the system; for example, the user can ask for a certain range of pitch. Alternatively, the system can be trained by example. M is the number of sounds in the summation. In practice, the off-diagonal elements of R can be ignored if the feature vector elements are reasonably independent of each other. This simplification leads to significant savings in computation time. The mean and covariance together become the system's model of the perceptual property being trained by the user.

* Again, off-dia
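The semitone and frequency figures in the pitch-range note are consistent with each other if one assumes the common MIDI-style mapping, semitone = 69 + 12·log2(f / 440 Hz). The notes do not state this formula, so the mapping, the constants, and the function names below are assumptions; this is a minimal sketch for checking the numbers, not the author's method.

```python
import math

A4_HZ = 440.0       # assumed reference frequency
A4_SEMITONE = 69    # assumed semitone number of the reference (MIDI convention)


def semitone_to_hz(semitone: float) -> float:
    """Convert a semitone number to a frequency in Hz (assumed mapping)."""
    return A4_HZ * 2.0 ** ((semitone - A4_SEMITONE) / 12.0)


def hz_to_semitone(freq_hz: float) -> float:
    """Convert a frequency in Hz to a (possibly fractional) semitone number."""
    return A4_SEMITONE + 12.0 * math.log2(freq_hz / A4_HZ)


if __name__ == "__main__":
    # Range endpoints quoted in the notes: 35-72 (male), 45-83 (female).
    for s in (35, 72, 45, 83):
        print(s, round(semitone_to_hz(s), 1))
```

Under this assumed mapping, semitones 35, 72, and 45 land on roughly 62 Hz, 523 Hz, and 110 Hz, and semitone 83 lands slightly below the rounded 1000 Hz figure.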
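The brightness and centroid notes can be illustrated with a small sketch. It assumes brightness is measured as the magnitude-weighted centroid of the spectrum (the notes only hint at this with the word "centroid") and treats the "magnitude-weighted average of differences" as a spread around that centroid. The function names and the toy spectrum are hypothetical.

```python
def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency (assumed definition of brightness)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total


def spectral_spread(freqs, mags):
    """Magnitude-weighted average of |component frequency - centroid|."""
    total = sum(mags)
    if total == 0:
        return 0.0
    c = spectral_centroid(freqs, mags)
    return sum(abs(f - c) * m for f, m in zip(freqs, mags)) / total


if __name__ == "__main__":
    freqs = [100.0, 200.0, 300.0]   # toy spectrum, Hz
    open_mouth = [1.0, 2.0, 1.0]    # magnitudes
    muffled = [1.0, 2.0, 0.5]       # high component attenuated
    print(spectral_centroid(freqs, open_mouth), spectral_spread(freqs, open_mouth))
    print(spectral_centroid(freqs, muffled))
```

Attenuating the high-frequency component, as a hand over the mouth would, pulls the centroid down, which matches the note that brightness drops along with loudness.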
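The point about autocorrelation separating a glissando from a wildly varying pitch can be seen on two toy trajectories that share the same values, and therefore the same mean and variance, but in a different order. The notes do not define the weighting or the autocorrelation lag, so the weight vector and the lag-1 normalized autocorrelation used here are assumptions chosen for illustration.

```python
def weighted_mean(values, weights):
    """Weighted average of a trajectory (weights might be, e.g., amplitude)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_variance(values, weights):
    """Weighted variance of a trajectory around its weighted mean."""
    mu = weighted_mean(values, weights)
    return sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / sum(weights)


def lag1_autocorrelation(values):
    """Normalized lag-1 autocorrelation of the mean-removed trajectory."""
    mu = sum(values) / len(values)
    dev = [v - mu for v in values]
    energy = sum(d * d for d in dev)
    if energy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dev, dev[1:])) / energy


if __name__ == "__main__":
    glissando = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # smooth pitch glide
    erratic = [1.0, 8.0, 2.0, 7.0, 3.0, 6.0, 4.0, 5.0]     # same values, jumpy
    uniform = [1.0] * 8                                     # assumed equal weights
    for name, traj in (("glissando", glissando), ("erratic", erratic)):
        print(name, weighted_variance(traj, uniform), lag1_autocorrelation(traj))
```

Both trajectories report the same variance, while the smooth one yields a clearly positive autocorrelation and the jumpy one a negative value.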
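Training by example, as described in the notes, can be sketched as computing a mean vector and a covariance R from M example feature vectors, then scoring a new sound by its distance from that model. The sketch below keeps only the diagonal of R, as the notes suggest is acceptable when the features are reasonably independent. The 1/M normalization, the Mahalanobis-style distance, the variance floor, and all names are assumptions, since the notes do not spell out the formulas.

```python
import math


def train_model(examples):
    """Return (mean, diagonal of R) for a list of M equal-length feature vectors."""
    m = len(examples)
    n = len(examples[0])
    mean = [sum(v[i] for v in examples) / m for i in range(n)]
    # Diagonal of R only: per-feature variance, normalized by M (assumed).
    var = [sum((v[i] - mean[i]) ** 2 for v in examples) / m for i in range(n)]
    return mean, var


def distance(features, mean, var, floor=1e-12):
    """Distance of a feature vector from the model, using a diagonal R.

    `floor` is a hypothetical guard against zero variance in a feature.
    """
    return math.sqrt(
        sum((a - mu) ** 2 / max(r, floor) for a, mu, r in zip(features, mean, var))
    )


if __name__ == "__main__":
    examples = [[1.0, 10.0], [3.0, 14.0]]   # M = 2 toy feature vectors
    mean, var = train_model(examples)
    print(mean, var)
    print(distance([3.0, 16.0], mean, var))
```

With a diagonal R the distance costs one pass over the N features rather than a full matrix inversion and product, which is the computational saving the notes refer to.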