Differentiating Data- and Text-Mining Terminology
JAN H. KROEZE, MACHDEL C. MATTHEE AND THEO J.D. BOTHMA
University of Pretoria
When a new discipline emerges it usually takes some time and lots of academic discussion before concepts and terms get standardised. Such a new
discipline is text mining. In a groundbreaking paper, Untangling text data mining, Hearst [1999] tackled the problem of clarifying text-mining concepts
and terminology. This essay aims to build on Hearst’s ideas by pointing out some inconsistencies and suggesting an improved and extended
categorisation of data- and text-mining techniques. The essay is a conceptual study. A short overview of the problems regarding text-mining concepts
is given. This is followed by a summary and critical discussion of Hearst’s attempt to clarify the terminology. The essence of text mining is found to
be the discovery or creation of new knowledge from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are
used to differentiate between full-text information retrieval, standard text mining and intelligent text mining. The same parameters are also used to
differentiate between related processes for numerical data and text metadata. These distinctions may be used as a road map in the evolving fields of
data/information retrieval, knowledge discovery and the creation of new knowledge.
Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications – Data mining; H.2.4 [Database Management]: Systems
– Textual databases; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.3 [Information Storage and Retrieval]:
Information Search and Retrieval; H.3.6 [Information Storage and Retrieval]: Library Automation – Large text archives; I.2.7