[Engineering] 3-2: Linear and Quadratic Classifiers (lecture slides)

Dimensionality reduction: two families of methods.

Subsetting methods: a filter method in which the dimension is reduced by selecting a few of the original features and ignoring the others. The dataset under study consists of samples from known classes, and the objective is to find the subset of features that maintains the class separation.

Feature-space transformation methods: an aggregation method in which the dimension of the sample space is reduced by creating a new set of features that are linear or nonlinear combinations of the original features.

Common algorithms.

Linear mapping algorithms: find a linear transformation matrix A,

    X = A^T Y,

that maximizes some separability criterion J. A is called the feature extractor, where

    Y = [y1, y2, ..., yD]^T  is a sample in the D-dimensional original feature space,
    X = [x1, x2, ..., xd]^T  is a sample in the d-dimensional compressed feature space, with d < D.

The key step in a linear mapping algorithm is to find a criterion for discarding some eigenvectors and eigenvalues of the original eigenspace model. Common rules are:

- Stipulate d as an integer and keep the d largest eigenvectors.
- Keep those eigenvectors whose eigenvalues are larger than an absolute threshold.
- Keep the d eigenvectors such that a specified fraction of the energy in the eigenspectrum (computed as the sum of the eigenvalues) is retained.

Non-linear mapping methodology: see

    D. Sudhanva and K. C. G., "Dimensionality reduction using geometric projections: a new technique," Pattern Recognition, vol. 25, pp. 809-817, 1992.
    W. Dzwinel, "How to make Sammon's mapping useful for multidimensional data structures analysis," Pattern Recognition, vol. 27, pp. 949-959, 1994.

Why separability criteria: there are many mappings that take a high-dimensional feature space to a low-dimensional one, so we need a standard for deciding which mapping is most favorable for classification. The ideal standard is the minimum misclassification rate, but in practice the error rate is very complicated to compute, so class-separability criteria are commonly used as the standard instead. This is a suboptimal approach.

Conditions that a separability criterion should satisfy: see p. 178 of the textbook.

Scatter-based criteria (the slide's formulas were lost in extraction; the standard forms are): the average distance between feature vectors of all classes,

    J_d = (1/2) sum_i P_i sum_j P_j (1/(n_i n_j)) sum_k sum_l delta(x_k^(i), x_l^(j)),

where delta(., .) is a distance between two samples; the Fisher-type criterion

    J_1 = tr(S_w^{-1} S_b);

and others such as J_2 = |S_w + S_b| / |S_w| and J_3 = tr(S_b) / tr(S_w), where S_w and S_b are the within-class and between-class scatter matrices.

The divergence. For the log-likelihood ratio test, the statistic u = ln[ p(x|ω1) / p(x|ω2) ] is itself a random variable; it can be described by the two density functions p(u|ω1) and p(u|ω2). (The slide's figure is lost.) When the two densities deviate widely from each other, the error rate is certainly low; when they overlap heavily, it is high.

One measure of the separability of two pattern classes is the difference of the means of u under the two classes, called the divergence. Define the quantity

    I(i, j) = ∫ p(x|ωi) ln[ p(x|ωi) / p(x|ωj) ] dx,

called the (one-)directed divergence, or the relative information of class i with respect to class j; some authors call it the Kullback-Leibler distance. From the two expressions above,

    J_D = I(1, 2) + I(2, 1),

so when the relative information (average separability information) I(1,2) and I(2,1) is large, J_D is large as well and the separability is good.

Another measure of separability is the Bhattacharyya distance, built from the quantity

    ρ = ∫ sqrt( p(x|ω1) p(x|ω2) ) dx,

which is sometimes called the Bhattacharyya coefficient. Compared with the divergence, these two quantities are harder to interpret intuitively; but if they are written as ... (the equations and the remainder of this slide, beginning "if the original two ...", are missing from the source).
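The three eigenvector-retention rules listed above can be sketched as follows. This is a minimal illustration; the eigenvalue spectrum and the cutoff values are invented examples, not taken from the slides:

```python
def keep_fixed_count(eigvals, d):
    """Rule 1: keep the d largest eigenvalues (input sorted in descending order)."""
    return eigvals[:d]

def keep_above_threshold(eigvals, threshold):
    """Rule 2: keep the eigenvalues larger than an absolute threshold."""
    return [v for v in eigvals if v > threshold]

def keep_energy_fraction(eigvals, fraction):
    """Rule 3: keep the smallest prefix whose energy (sum of eigenvalues)
    reaches the specified fraction of the total eigenspectrum energy."""
    total = sum(eigvals)
    kept, acc = [], 0.0
    for v in eigvals:
        kept.append(v)
        acc += v
        if acc >= fraction * total:
            break
    return kept

# Hypothetical eigenspectrum, sorted in descending order (total energy = 10.0).
spectrum = [5.0, 3.0, 1.0, 0.5, 0.25, 0.25]
print(keep_fixed_count(spectrum, 3))        # the three largest eigenvalues
print(keep_above_threshold(spectrum, 0.4))  # eigenvalues above 0.4
print(keep_energy_fraction(spectrum, 0.8))  # prefix holding 80% of the energy
```

Each rule returns the retained eigenvalues; in a real eigenspace model one would keep the corresponding eigenvectors as the columns of the feature extractor A.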
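As a concrete instance of a separability criterion, here is the Fisher criterion specialized to one-dimensional features: the squared difference of class means divided by the total within-class scatter. The sample values are invented for illustration:

```python
def mean(xs):
    return sum(xs) / len(xs)

def scatter(xs):
    """Within-class scatter: sum of squared deviations from the class mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def fisher_criterion_1d(c1, c2):
    """J_F = (m1 - m2)^2 / (s1 + s2): between-class separation over
    summed within-class scatter, for one-dimensional features."""
    return (mean(c1) - mean(c2)) ** 2 / (scatter(c1) + scatter(c2))

# Hypothetical one-dimensional samples from two classes.
well_separated = fisher_criterion_1d([0.0, 0.1, -0.1], [5.0, 5.1, 4.9])
overlapping = fisher_criterion_1d([0.0, 1.0, -1.0], [0.5, 1.5, -0.5])
print(well_separated > overlapping)  # larger J_F means better separability
```

A mapping that scores a higher J_F keeps the classes further apart relative to their spread, which is exactly the role the criterion J plays when choosing the feature extractor.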
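The directed divergence I(i, j) and the divergence J_D = I(1,2) + I(2,1) have a well-known closed form when the class-conditional densities are univariate Gaussians; the following sketch uses that closed form (the means and standard deviations are made-up examples):

```python
import math

def directed_divergence_gauss(m1, s1, m2, s2):
    """I(1,2) for univariate normals N(m1, s1^2) and N(m2, s2^2):
    the standard closed form of the Kullback-Leibler distance."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

def divergence_gauss(m1, s1, m2, s2):
    """Symmetric divergence J_D = I(1,2) + I(2,1)."""
    return (directed_divergence_gauss(m1, s1, m2, s2)
            + directed_divergence_gauss(m2, s2, m1, s1))

# With equal variances, J_D reduces to (m1 - m2)^2 / sigma^2.
print(divergence_gauss(0.0, 1.0, 2.0, 1.0))  # prints 4.0
```

Note that I(1,2) is generally not equal to I(2,1); summing the two directed divergences is what makes J_D a symmetric separability measure.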
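The Bhattacharyya coefficient ρ can be approximated directly from its defining integral. The sketch below does this for two hypothetical univariate Gaussian class densities with a simple midpoint Riemann sum (the integration range and step count are arbitrary choices, adequate for standard-normal-scale densities):

```python
import math

def normal_pdf(x, m, s):
    """Density of N(m, s^2) at x."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def bhattacharyya_coefficient(m1, s1, m2, s2, lo=-50.0, hi=50.0, n=20000):
    """rho = integral of sqrt(p1(x) * p2(x)) dx, by a midpoint Riemann sum.
    rho = 1 for identical densities and decreases toward 0 as they separate."""
    h = (hi - lo) / n
    return h * sum(
        math.sqrt(normal_pdf(lo + (k + 0.5) * h, m1, s1)
                  * normal_pdf(lo + (k + 0.5) * h, m2, s2))
        for k in range(n))

rho = bhattacharyya_coefficient(0.0, 1.0, 4.0, 1.0)  # two separated classes
B = -math.log(rho)                                   # Bhattacharyya distance
print(rho, B)  # rho is well below 1, so B is clearly positive
```

For equal-variance Gaussians the coefficient has the closed form ρ = exp(-(m1 - m2)^2 / (8 σ^2)), which gives a handy check on the numerical integration.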
