- Published 2017-03-09 from Shanghai
The use of machine translation tools for crosslingual text
Kernel Canonical Correlation Analysis (Language Independent Document Representation)
Blaz Fortuna, Marko Grobelnik, Dunja Mladenić
Jozef Stefan Institute, Ljubljana

Outline
- What is KCCA – intuition and theory
- Preliminary results for AC corpora
- Applications of KCCA
- Related approaches

What is KCCA about?
KCCA makes it possible to represent documents in a "language neutral" way.
Intuition behind KCCA: given a parallel corpus (such as Acquis),
- first, we automatically identify language independent semantic concepts from text,
- then, we re-represent documents with the identified concepts,
- finally, we are able to perform cross-language statistical operations (such as retrieval, classification, clustering).

Input for KCCA
The input is a set of aligned documents:
- For each document we have a version in each language
- Documents are represented as bag-of-words vectors

The Output from KCCA
- The goal: find pairs of semantic dimensions that co-appear in documents and their translations with high correlation
- A semantic dimension is a weighted set of words
- These pairs are pairs of vectors, one from, e.g., the English bag-of-words space and one from the German bag-of-words space

The Algorithm – Theory (1/2)
Formally, KCCA solves:

$$\max_{x,\,y}\ \mathrm{Corr}\big(\langle x, d_E \rangle,\ \langle y, d_G \rangle\big)$$

- $x$, $y$ are the semantic directions for English and German
- $(d_E, d_G)$ is a pair of aligned documents

The Algorithm – Theory (2/2)

Examples of Semantic Dimensions from Acquis corpus: English-French (1/2)
Most important words from semantic dimensions automatically generated from 2000 documents:

Examples of Semantic Dimensions from Acquis corpus: English-Slovene (2/2)
Most important words from semantic dimensions automatically generated from 2000 documents:

Applications of KCCA
- Cross-lingual document retrieval: retrieved documents depend only on the meaning of the query and not its language.
- Automatic document categorization: only one classifier is learned, rather than a separate classifier for each language.
- Document clustering: documents s
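
The objective on the "Theory (1/2)" slide can be illustrated for the plain bag-of-words case with a small sketch. This is not the authors' implementation: it solves ordinary regularized linear CCA in primal form with NumPy rather than the kernelized (dual) problem, and the vocabularies, word counts, the function names `inv_sqrt` and `linear_cca`, and the regularization value `reg` are all assumptions made up for illustration.

```python
import numpy as np

# Hypothetical toy parallel corpus: 6 aligned documents, 4 words per language.
# Columns (English): fish, tax, boat, budget
X = np.array([[2, 0, 1, 0], [3, 0, 2, 0], [0, 2, 0, 1],
              [0, 3, 0, 2], [1, 0, 1, 0], [0, 1, 0, 2]], dtype=float)
# Columns (German): fisch, steuer, boot, haushalt
Y = np.array([[2, 0, 1, 0], [2, 0, 2, 0], [0, 2, 0, 1],
              [0, 3, 0, 1], [1, 0, 1, 0], [0, 1, 0, 2]], dtype=float)


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T


def linear_cca(X, Y, reg=1e-3, k=1):
    """Return the top-k pairs of directions and their regularized correlations."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = Xc.shape[0]
    # Regularized covariance matrices (reg keeps them invertible).
    Cxx = Xc.T @ Xc / n + reg * np.eye(Xc.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Yc.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten both spaces, then take the SVD of the cross-covariance.
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt[:k].T, s[:k]


# A[:, 0] and B[:, 0] play the role of the semantic directions x and y.
A, B, s = linear_cca(X, Y)
print("English word weights:", A[:, 0])
print("German word weights: ", B[:, 0])
print("regularized correlation:", s[0])
```

Each column of `A` and `B` is a weighted set of words, which matches the slide's notion of a semantic dimension. Projecting documents from either language onto these directions gives the shared representation on which retrieval, classification, or clustering could then operate.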