Characterization of Chinese Characters Based on Cross Quadrant Mnemonic Mapping Model


CLC number: TP391
Fund projects: Beijing Social Science Fund Project; research project of the Beijing Municipal Public Security Bureau


Abstract:

To address the out-of-vocabulary (OOV) problem in Chinese natural language processing, fine-grained features of Chinese characters such as strokes, radicals and Pinyin are often used to improve a model's learning ability. To find the best combination of such features, this paper uses statistical methods to study the syllable, first stroke, radical, tone, word frequency, stroke count and other features of Chinese characters, and proposes a cross-quadrant mnemonic mapping model (相码模型) that can integrate multiple character features. The model automatically realizes a reversible mapping between Chinese characters or words and codes written with the 26 Latin letters. In character-level text classification experiments, the model performed well. In addition, the generated codes are of moderate length and remain human-readable, so they can be used for text annotation in special settings and can also turn any Chinese text into a parallel corpus of the same size. The model is therefore a useful auxiliary model for natural language processing.
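The abstract does not spell out the encoding scheme itself. As a rough illustration of what a reversible character-to-letter mapping involves, the sketch below builds a one-to-one codebook from two of the features the abstract lists (syllable and tone), lets an assumed frequency order decide which homophone gets the shortest code, and appends a letter to break ties. The tone-to-letter table `TONE_LETTERS`, the code layout, the space separator, and the names `build_codebook`, `encode` and `decode` are all hypothetical; this is not the cross-quadrant mnemonic mapping model.

```python
from typing import Dict, List, Tuple

# Hypothetical tone-to-letter scheme so codes use only the 26 Latin letters.
# NOT the paper's scheme; the abstract does not describe the actual code layout.
TONE_LETTERS = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}  # 5 = neutral tone


def build_codebook(
    records: List[Tuple[str, str, int]]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """records: (character, toneless Pinyin syllable, tone), assumed sorted by
    descending frequency so more frequent homophones get shorter codes."""
    char_to_code: Dict[str, str] = {}
    code_to_char: Dict[str, str] = {}
    for char, syllable, tone in records:
        base = syllable + TONE_LETTERS[tone]
        code, i = base, 0
        # Append a disambiguating letter until the code is unused,
        # which keeps the mapping one-to-one (and therefore reversible).
        while code in code_to_char:
            if i >= 26:
                raise ValueError(f"too many homophones for {base!r}")
            code = base + chr(ord("a") + i)
            i += 1
        char_to_code[char] = code
        code_to_char[code] = char
    return char_to_code, code_to_char


def encode(text: str, char_to_code: Dict[str, str]) -> str:
    # One code per character, space-separated, so decoding needs no prefix-free property.
    return " ".join(char_to_code[c] for c in text)


def decode(codes: str, code_to_char: Dict[str, str]) -> str:
    # Invert the mapping code by code.
    return "".join(code_to_char[code] for code in codes.split())


if __name__ == "__main__":
    # Illustrative homophones (tā x3); order stands in for a hypothetical frequency ranking.
    records = [("他", "ta", 1), ("她", "ta", 1), ("它", "ta", 1), ("妈", "ma", 1), ("马", "ma", 3)]
    c2c, code2c = build_codebook(records)
    codes = encode("他她它", c2c)
    print(codes)                  # taa taaa taab
    print(decode(codes, code2c))  # 他她它
```

Separating codes with spaces sidesteps the need for a prefix-free code; a scheme that packs codes without delimiters, or that folds in further features such as first stroke or radical, would need a different design.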

Cite this article:

范晓明,王斌君. 基于相码模型的汉字表征[J]. 科学技术与工程, 2021, 21(5): 1937-1947.
Fan Xiaoming, Wang Binjun. Characterization of Chinese Characters Based on Cross Quadrant Mnemonic Mapping Model[J]. Science Technology and Engineering, 2021, 21(5): 1937-1947.

History

Received: 2020-05-06
Last revised: 2021-02-05
Accepted: 2020-07-04
Published online: 2021-03-18
