Research on Text Topic Feature Extraction Based on Neighborhood Rough Set
Affiliation:

1. College of Information and Computer, Taiyuan University of Technology; 2. College of Electrical and Power Engineering, Taiyuan University of Technology

CLC number:

TP301

Fund Project:

Scientific Research Projects for Returned Overseas Students in Shanxi Province (2015-045)

    Abstract:

    The LDA topic model is an effective tool for extracting semantic information from text. It exploits the co-occurrence of terms at the document level to transform the term matrix into a topic matrix, which yields topic features; however, the document-generation process can introduce redundant topics. To address this redundancy, this paper proposes NRS-LDA, a reduction algorithm for the LDA topic model based on neighborhood rough sets. A topic decision system is constructed with the neighborhood rough set; with the number of topics set in advance, the importance of each topic is computed, the topics are ranked by importance, and the topics of low importance are deleted. NRS-LDA is applied to the K-means text clustering problem and compared with traditional text feature extraction algorithms and with improved algorithms. The experimental results show that the proposed NRS-LDA method obtains higher clustering accuracy.
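The reduction step described in the abstract (compute an importance for each topic, rank, delete the least important) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption: the importance measure is the common neighborhood rough set attribute significance (the drop in dependency when one attribute is removed), the decision attribute is a hypothetical class label per document, and the names `neighborhood`, `dependency`, `topic_importance`, `reduce_topics`, the radius `delta`, and the toy matrix `theta` are illustrative only, not the authors' NRS-LDA implementation.

```python
from math import sqrt

def neighborhood(X, i, attrs, delta):
    """Indices of samples within Euclidean distance delta of sample i,
    measured only on the topic columns listed in attrs."""
    return [j for j in range(len(X))
            if sqrt(sum((X[i][a] - X[j][a]) ** 2 for a in attrs)) <= delta]

def dependency(X, y, attrs, delta):
    """Fraction of samples whose whole neighborhood shares their decision
    label (size of the positive region divided by the sample count)."""
    if not attrs:
        return 0.0
    consistent = sum(
        all(y[j] == y[i] for j in neighborhood(X, i, attrs, delta))
        for i in range(len(X))
    )
    return consistent / len(X)

def topic_importance(X, y, delta):
    """Assumed importance of topic k: drop in dependency when k is removed."""
    all_topics = list(range(len(X[0])))
    full = dependency(X, y, all_topics, delta)
    return [full - dependency(X, y, [a for a in all_topics if a != k], delta)
            for k in all_topics]

def reduce_topics(X, y, delta, n_keep):
    """Rank topics by importance and keep the n_keep most important ones."""
    imp = topic_importance(X, y, delta)
    ranked = sorted(range(len(imp)), key=lambda k: imp[k], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [[row[k] for k in kept] for row in X]

# Toy document-topic matrix (hypothetical): topic 0 separates the two
# classes, topics 1 and 2 are constant and therefore redundant.
theta = [
    [0.9, 0.5, 0.1],
    [0.8, 0.5, 0.1],
    [0.1, 0.5, 0.1],
    [0.2, 0.5, 0.1],
]
labels = [0, 0, 1, 1]  # hypothetical decision attribute

kept, reduced = reduce_topics(theta, labels, delta=0.3, n_keep=1)
print(kept)     # indices of the retained topics
print(reduced)  # reduced document-topic matrix
```

In a full pipeline the reduced matrix would then be passed to K-means in place of the original topic features; the sketch stops before that step because the abstract does not specify the clustering configuration.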

Cite this article

JIN Hong-wei, XIE Jun, XU Xin-ying. Research on Text Topic Feature Extraction Based on Neighborhood Rough Set[J]. Science Technology and Engineering, 2019, 19(22): 208-214.

History
  • Received: 2019-01-27
  • Last revised: 2019-04-28
  • Accepted: 2019-05-05
  • Published online: 2019-08-28