Correlation Analysis Between Lexical Dynamic Characteristics and Financial Indices - Computer Science and Technology Thesis.docx

  • About 32,600 characters
  • About 43 pages
  • Published 2018-11-28 in Shanghai



…best. To improve training efficiency and reduce time complexity, principal component analysis (PCA) is used to reduce the feature dimensionality. In experiments on the Shanghai Composite Index (SSE Composite Index), rise/fall prediction accuracy reaches about 72%, and the Pearson correlation coefficient of the index regression results is about 0.5. This shows that analyzing financial indices with natural language processing techniques is feasible and effective, and further indicates a significant positive correlation between the lexical dynamic characteristics of financial texts and the stock market index. Finally, the thesis presents an error analysis of the model and discusses directions for future research.

Keywords: natural language processing, lexical dynamic characteristics, financial text, Adaboost, correlation analysis

ABSTRACT

The stock market is a barometer of the national economy and an important reflection of national economic development. Understanding the financial and stock markets may therefore be an effective way to understand national economic development. However, the financial and stock markets change continuously, which makes them relatively difficult to understand. The main factors that influence them include relevant state policies, financial news, and investor sentiment. Although these underlying factors cannot easily be understood or measured, they are embedded in related online news, so it may be reasonable to use text-based methods to study the relationship between these factors and financial indices.

From the perspective of natural language processing, this thesis aims to find the correlation between lexical dynamics and changes in the stock market index; more specifically, it studies the "Term-Index" correlation. This correlation problem is formalized as two problems: a classification problem, which predicts the rise or fall of the stock market index, and a regression problem, which predicts the likelihood of the index rising or falling. A financial text is represented as a collection of words. The vocabulary of daily financial texts is constantly updated, and this change is called the Term Dynamic Characteristic. The dynamic characteristics of terms in the text are used to identify terms that are highly correlated with the index (highly-index-correlated terms, HICTs). HICTs are identified by analyzing the stock index information together with the frequency distribution of terms over the time series. Taking the weight values of HICTs as features, we trained the for
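The abstract describes identifying HICTs by relating each term's frequency distribution over time to the stock index, then using HICT weights as features for rise/fall classification. The following is a minimal Python sketch of that idea, not the thesis's actual method. It assumes relative daily term frequency as the "weight value", Pearson correlation with the daily index change as the selection criterion, and a hypothetical |r| threshold of 0.5. The function names (`pearson`, `term_frequency_series`, `find_hicts`, `build_dataset`) and the toy tokens are illustrative only.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series has undefined correlation; treat it as 0 here
    return cov / (sx * sy)


def term_frequency_series(daily_docs, vocab):
    """For each term, its relative frequency in each day's tokens (assumed term weight)."""
    series = {t: [] for t in vocab}
    for tokens in daily_docs:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against a day with no tokens
        for t in vocab:
            series[t].append(counts[t] / total)
    return series


def find_hicts(daily_docs, index_changes, threshold=0.5):
    """Hypothetical HICT selection: keep terms whose daily frequency correlates
    with the daily index change at |r| >= threshold (threshold is an assumption)."""
    if len(daily_docs) != len(index_changes):
        raise ValueError("need one index change per day of text")
    vocab = sorted({t for tokens in daily_docs for t in tokens})
    series = term_frequency_series(daily_docs, vocab)
    scores = {t: pearson(s, index_changes) for t, s in series.items()}
    return {t: r for t, r in scores.items() if abs(r) >= threshold}


def build_dataset(daily_docs, index_changes, hicts):
    """One feature row per day (HICT frequencies) and one rise/fall label per day."""
    terms = sorted(hicts)
    series = term_frequency_series(daily_docs, terms)
    X = [[series[t][day] for t in terms] for day in range(len(daily_docs))]
    y = [1 if change > 0 else 0 for change in index_changes]  # 1 = rise, 0 = fall or flat
    return terms, X, y


if __name__ == "__main__":
    # Made-up toy data: four days of tokenized news and the index change on each day.
    daily_docs = [
        ["gain", "policy", "rally"],
        ["loss", "risk"],
        ["gain", "rally"],
        ["loss", "policy", "risk"],
    ]
    index_changes = [1.2, -0.8, 0.9, -1.1]
    hicts = find_hicts(daily_docs, index_changes)
    terms, X, y = build_dataset(daily_docs, index_changes, hicts)
    print(terms, y)  # prints ['gain', 'loss', 'rally', 'risk'] [1, 0, 1, 0]
```

In this toy data, "policy" appears equally often on rising and falling days, so its correlation is about 0 and it is dropped. The resulting `X` and `y` could then be passed to dimensionality reduction and a boosted classifier, for example scikit-learn's `PCA` and `AdaBoostClassifier`, which correspond to the PCA and Adaboost steps the abstract names. For genuine prediction, features from day t should be paired with the index change of day t+1 to avoid look-ahead. The sketch aligns same-day values only for brevity.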
