The Goals and Methods of Corpus Linguistics
Introduction to Corpus Linguistics
1.1 What is a corpus?
In the language sciences, a corpus is a body of written text or transcribed speech which can serve as a basis for linguistic analysis and description. In many respects it is the use to which the body of textual material is put, rather than its design features, which defines what a corpus is.
A corpus constitutes an empirical basis not only for identifying the elements and structural patterns which make up the systems we use in a language, but also for mapping out our use of these systems. A corpus can be analyzed and compared with other corpora or parts of corpora to study variation. Most importantly, it can be analyzed distributionally to show how often particular phonological, lexical, grammatical, discoursal or pragmatic features occur, and also where they occur.
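As a rough illustration of distributional analysis, the sketch below counts how often each word form occurs in a tiny corpus and records where it occurs. The two-text toy corpus, the simple regular-expression tokenizer, and the function names `tokenize` and `distribution` are all assumptions made for this example; real corpus work would normally rely on a dedicated tokenizer and much larger texts.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def distribution(corpus):
    """Return (frequency, positions) for a corpus given as {text_name: raw_text}.

    frequency maps each token to its total count across all texts;
    positions maps each token to a list of (text_name, token_index) pairs.
    """
    frequency = Counter()
    positions = defaultdict(list)
    for name, text in corpus.items():
        for index, token in enumerate(tokenize(text)):
            frequency[token] += 1
            positions[token].append((name, index))
    return frequency, positions


# Hypothetical toy corpus, invented purely for illustration.
toy_corpus = {
    "text_a": "The cat sat on the mat.",
    "text_b": "A dog chased the cat.",
}

freq, positions = distribution(toy_corpus)
print(freq["the"])       # how often "the" occurs in the whole corpus
print(positions["cat"])  # in which text, and at which token index, "cat" occurs
```

Keeping the text name alongside each position is what makes comparison across corpora or parts of a corpus possible: the same table can be filtered by text to study variation.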
By the 1990s there were many corpus-making projects in various parts of the world. Lancashire (1991) shows the huge range of corpora, archives and other electronic databases available or being compiled for a wide variety of purposes. Some of the largest corpus projects have been undertaken for commercial purposes by dictionary publishers. Other projects in corpus compilation or analysis are on a smaller scale and do not necessarily become well known. Undertaken as part of graduate theses or undergraduate projects, they enable students to gain original insights into the structure and use of language.
1.2 Categorization of Corpora
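Computerized corpora fall into the following categories:
Raw corpora: collections of real-world spoken and written language gathered in text form, then classified and compiled according to certain principles (register, style, diachronic or synchronic scope, etc.).
Annotated corpora: raw corpora to which part-of-speech, grammatical, phonetic, semantic, discourse or even pragmatic tags have been added.
Parallel corpora: corpora in which two or more languages are aligned as translations of each other at the sentence level or even the word and phrase level, such as CRATER, a parallel corpus of English, French, German, Spanish and other languages (McEnery & Oakes 1996), and the English-Chinese bilingual parallel corpus (China Foreign Language Teaching Research Center 2000).
Learner corpora: corpora of the spoken and written language of non-native learners, including corpora annotated with learners' spelling and grammatical error tags and with correction hints. Examples include ICLE (an international corpus of written learner English), LINDSEI (an international corpus of spoken learner English) (Granger 2000) and CLEC (a corpus of written English by Chinese learners) (Gui Shichun 2001).
Lattice corpora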
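The difference between raw, annotated and parallel corpora can be made concrete with a small sketch. Everything here is an assumption for illustration: the `word_TAG` notation, the tag labels (`DET`, `NOUN`, `VERB`), the helper `parse_tagged`, and the invented sentences. None of it is the format of any corpus named above.

```python
def parse_tagged(line, sep="_"):
    """Split a 'word_TAG word_TAG ...' line into (word, tag) pairs.

    Items without the separator are kept as (word, None).
    """
    pairs = []
    for item in line.split():
        word, found, tag = item.rpartition(sep)
        if not found:
            word, tag = item, None
        pairs.append((word, tag))
    return pairs


# Raw corpus line: plain text only (invented example).
raw_sentence = "The cat sat"

# Annotated corpus line: the same text with a hypothetical part-of-speech tag per word.
annotated_sentence = "The_DET cat_NOUN sat_VERB"

# Parallel corpus entry: the same sentence aligned across two languages (invented example).
parallel_pair = {"en": "The cat sat", "fr": "Le chat s'est assis"}

tagged = parse_tagged(annotated_sentence)
nouns = [word for word, tag in tagged if tag == "NOUN"]
print(tagged)
print(nouns)
```

Annotation is what allows queries such as "all nouns" that a raw corpus cannot answer directly, while sentence alignment is what allows a parallel corpus to be searched for translation equivalents.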