Differentiating Data- and Text-Mining Terminology

JAN H. KROEZE, MACHDEL C. MATTHEE AND THEO J.D. BOTHMA

University of Pretoria

When a new discipline emerges it usually takes some time and lots of academic discussion before concepts and terms get standardised. Such a new discipline is text mining. In a groundbreaking paper, Untangling text data mining, Hearst [1999] tackled the problem of clarifying text-mining concepts and terminology. This essay aims to build on Hearst’s ideas by pointing out some inconsistencies and suggesting an improved and extended categorisation of data- and text-mining techniques. The essay is a conceptual study. A short overview of the problems regarding text-mining concepts is given. This is followed by a summary and critical discussion of Hearst’s attempt to clarify the terminology. The essence of text mining is found to be the discovery or creation of new knowledge from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are used to differentiate between full-text information retrieval, standard text mining and intelligent text mining. The same parameters are also used to differentiate between related processes for numerical data and text metadata. These distinctions may be used as a road map in the evolving fields of data/information retrieval, knowledge discovery and the creation of new knowledge.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications – Data mining; H.2.4 [Database Management]: Systems – Textual databases; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.3.6 [Information Storage and Retrieval]: Library Automation – Large text archives; I.2.7
