  • About 82,500 characters
  • About 18 pages
  • Published 2025-10-17, Beijing

Differentiating Data- and Text-Mining Terminology

JAN H. KROEZE, MACHDEL C. MATTHEE AND THEO J.D. BOTHMA

University of Pretoria

When a new discipline emerges it usually takes some time and lots of academic discussion before concepts and terms get standardised. Such a new discipline is text mining. In a groundbreaking paper, Untangling text data mining, Hearst [1999] tackled the problem of clarifying text-mining concepts and terminology. This essay aims to build on Hearst’s ideas by pointing out some inconsistencies and suggesting an improved and extended categorisation of data- and text-mining techniques. The essay is a conceptual study. A short overview of the problems regarding text-mining concepts is given. This is followed by a summary and critical discussion of Hearst’s attempt to clarify the terminology. The essence of text mining is found to be the discovery or creation of new knowledge from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are used to differentiate between full-text information retrieval, standard text mining and intelligent text mining. The same parameters are also used to differentiate between related processes for numerical data and text metadata. These distinctions may be used as a road map in the evolving fields of data/information retrieval, knowledge discovery and the creation of new knowledge.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications – Data mining; H.2.4 [Database Management]: Systems – Textual databases; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.3.6 [Information Storage and Retrieval]: Library Automation – Large text archives; I.2.7
