Improved K-nearest neighbor algorithm based on weight search tree for high-dimensional classification

CLC number: TP301.6

Funding: National Natural Science Foundation of China (61762031); Guangxi Science and Technology Major Project (桂科AA19046004); Guangxi Key Research and Development Program (桂科AB18126006)

    Abstract:

    Advances in information acquisition technology have produced high-dimensional, large-scale data that pose a major challenge for data mining. To address the low efficiency and high time cost of K-nearest neighbor classification on high-dimensional data, this paper proposes an improved K-nearest neighbor algorithm based on a weight search tree (WSTKnn) for high-dimensional classification. The algorithm selects a subset of attributes, according to the weights of the feature attributes, as nodes for building a search tree. The search tree partitions the data set into distinct matrix regions. An unknown sample traverses the search tree to find the most "similar" matrix region and computes distances only to the data in that region, which reduces the amount of data involved and therefore the time complexity. The paper also studies and discusses which Minkowski distance is best suited to measuring distance in high-dimensional data. Simulation experiments on six standard high-dimensional data sets show that, compared with K-nearest neighbor, decision tree, and SVM, WSTKnn significantly reduces classification time while also achieving higher classification accuracy. These results suggest that WSTKnn may serve as a useful reference for problems involving high-dimensional data.
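The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how feature weights are computed or how tree nodes split the data, so the sketch assumes per-feature variance as a stand-in weight and a median split on each selected feature. The names `minkowski`, `build_regions`, and `classify` are hypothetical.

```python
import numpy as np
from collections import Counter


def minkowski(a, b, p=2.0):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def build_regions(X, weights, depth):
    """Partition the rows of X using the `depth` highest-weight features.

    Assumption: each tree level splits one feature at its median, so a
    region is identified by a tuple of 0/1 branch decisions.
    """
    feats = np.argsort(weights)[::-1][:depth]
    thresholds = np.median(X[:, feats], axis=0)
    keys = (X[:, feats] > thresholds).astype(int)
    regions = {}
    for i, key in enumerate(map(tuple, keys)):
        regions.setdefault(key, []).append(i)
    return feats, thresholds, regions


def classify(x, X, y, feats, thresholds, regions, k=3, p=2.0):
    """Find the region of x, then run plain KNN inside that region only."""
    key = tuple((x[feats] > thresholds).astype(int))
    idx = regions.get(key)
    if not idx:
        # Empty region: fall back to the full training set.
        idx = list(range(len(X)))
    dists = [minkowski(x, X[i], p) for i in idx]
    nearest = [idx[j] for j in np.argsort(dists)[:k]]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


# Toy data: feature 0 separates the two classes, features 1 and 2 are noise.
X = np.array([[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1],
              [10, 0, 0], [11, 1, 0], [10, 1, 1], [11, 0, 1]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

weights = X.var(axis=0)  # stand-in weights, not the paper's weighting
feats, thresholds, regions = build_regions(X, weights, depth=1)
pred_a = classify(np.array([0.5, 0.5, 0.5]), X, y, feats, thresholds, regions)
pred_b = classify(np.array([10.5, 0.5, 0.5]), X, y, feats, thresholds, regions)
```

In this toy case each query is compared against four training rows instead of eight, which is the source of the time savings the abstract describes. The order `p` is left as a parameter because the abstract reports studying which Minkowski distance suits high-dimensional data but does not state the chosen value.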

Cite this article

Liang Shurong, Chen Jili, Xie Xiaolan. Improved k-nearest neighbor algorithm based on weight search tree for high-dimensional classification[J]. Science Technology and Engineering, 2021, 21(7): 2760-2766.

History
  • Received: 2020-06-29
  • Last revised: 2020-12-08
  • Accepted: 2020-08-10
  • Published online: 2021-03-31