The Parallel SVM Algorithm by Using Granularity and Information Entropy

Authors: Mao Yimin, Zhang Liuxin, Lu Xinrong

CLC number: TP311

Fund project: National Key R&D Program of China (No. 2018YFC1504705); National Natural Science Foundation of China (No. 41562019)

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    Abstract:

    Parallel SVM algorithms in a big data environment are sensitive to noisy data and suffer from redundant training samples. To address these problems, this paper proposes GIESVM-MR (the SVM algorithm by using granularity and information entropy based on MapReduce), a parallel SVM algorithm based on granularity and information entropy. First, a noise cleaning (NC) strategy evaluates the importance of each feature attribute and derives the correlation between each sample and its class, so that noisy samples can be identified and deleted. Second, a data compression strategy based on granulation (GDC) screens information granules to retain class-boundary samples and delete non-support vectors. This yields a smaller data set and removes the redundancy of training samples in a big data environment. Finally, the idea of Bagging is combined with the MapReduce computing model to train SVMs in parallel and generate the final classification model. Experimental results show that GIESVM-MR improves classification accuracy and, on large-scale data sets, executes more efficiently, reducing the time complexity of parallel SVM training.
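The abstract names three stages but gives no formulas. What follows is therefore only a minimal sketch of how such a pipeline could be wired together, not the authors' method. All function names are hypothetical, and each stage uses a plainly named stand-in:

- NC is approximated by information-gain feature weights plus a weighted k-nearest-neighbour vote.
- GDC is approximated by equal-size grid granules, where mixed-label granules are kept whole and pure granules collapse to their centroid.
- The SVM is a linear Pegasos-style SGD solver.

The builtin `map()` stands in for the MapReduce map phase, and majority voting stands in for the reduce phase. The code is pure Python with no third-party libraries.

```python
import math
import random
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy H(Y) = -sum_c p_c * log2(p_c) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels, bins=4):
    """Information gain H(Y) - H(Y | A) of one feature A, after equal-width binning."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    groups = defaultdict(list)
    for v, label in zip(column, labels):
        groups[min(int((v - lo) / width), bins - 1)].append(label)
    cond = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - cond

def noise_clean(X, y, k=5):
    """Hypothetical NC stand-in: weight features by information gain, then drop
    samples whose k nearest neighbours (weighted distance) mostly disagree with them."""
    w = [info_gain([row[j] for row in X], y) for j in range(len(X[0]))]

    def dist(a, b):
        return sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b))

    keep = []
    for i, xi in enumerate(X):
        nbrs = sorted((dist(xi, xj), j) for j, xj in enumerate(X) if j != i)[:k]
        if sum(y[j] == y[i] for _, j in nbrs) > k // 2:  # majority agrees: keep
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def granule_compress(X, y, cell=1.0):
    """Hypothetical GDC stand-in: hash samples into grid granules; keep every sample
    of a mixed-label granule (near the class boundary) and only the centroid of a pure one."""
    granules = defaultdict(list)
    for xi, yi in zip(X, y):
        granules[tuple(math.floor(v / cell) for v in xi)].append((xi, yi))
    Xc, yc = [], []
    for members in granules.values():
        if len({yi for _, yi in members}) > 1:
            for xi, yi in members:
                Xc.append(list(xi))
                yc.append(yi)
        else:
            dim = len(members[0][0])
            Xc.append([sum(xi[j] for xi, _ in members) / len(members) for j in range(dim)])
            yc.append(members[0][1])
    return Xc, yc

def decision(w, x):
    """Linear decision value w . [x, 1] (the last weight acts as the bias)."""
    return sum(a * b for a, b in zip(w, list(x) + [1.0]))

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the regularised hinge loss; labels must be +1 / -1."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * decision(w, X[i]) < 1  # margin check uses current w
            w = [(1 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, list(X[i]) + [1.0])]
    return w

def map_train(task):
    """Map step: train one SVM on a bootstrap replicate (the Bagging part)."""
    X, y, seed = task
    rng = random.Random(seed)
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=seed)

def reduce_vote(models, x):
    """Reduce step: combine the sub-models by majority vote."""
    votes = sum(1 if decision(w, x) >= 0 else -1 for w in models)
    return 1 if votes >= 0 else -1

def giesvm_sketch(X, y, n_models=5):
    """NC -> GDC -> bagged training; builtin map() stands in for a MapReduce map phase."""
    X, y = noise_clean(X, y)
    X, y = granule_compress(X, y)
    return list(map(map_train, [(X, y, s) for s in range(n_models)]))
```

Typical use is `models = giesvm_sketch(X, y)` followed by `reduce_vote(models, x)`. Here `X` is a list of feature lists and `y` holds labels in {+1, -1}.

In a real MapReduce job, each `map_train` call would run in its own mapper over a data split, and the vote would happen in the reducer. On a single machine, `multiprocessing.Pool.map` could replace the builtin `map`, since `map_train` is a picklable top-level function.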

Cite this article:

Mao Yimin, Zhang Liuxin, Lu Xinrong. The Parallel SVM Algorithm by Using Granularity and Information Entropy[J]. Science Technology and Engineering, 2021, 21(10): 4124-4132.

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
History
  • Received: 2020-07-10
  • Last revised: 2021-04-02
  • Accepted: 2020-12-03
  • Published online: 2021-04-28