Computer Engineering

Noun Phrase Recognition Based on Auxiliary Phrase Marks

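  • Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang 110136
Liu Fei (b. 1987), female, from Dalian, Liaoning; master's student; research interests: knowledge management and intelligent human-computer interaction; E-mail: fei_l2011@163.com. Zhang Guiping (b. 1962), female, from Benxi, Liaoning; professor; research interests: natural language processing and machine translation; E-mail: zgp@ge-soft.com.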

Received: 2013-10-24

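Funding

National Key Technology R&D Program of China (Grant No. 2012BAH14F00); General Scientific Research Project of the Education Department of Liaoning Province (Grant No. L2012056)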

Recognition of Chinese noun phrase based on auxiliary phrase mark

  • Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang 110136

Received date: 2013-10-24

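Abstract

Noun phrase recognition is an important subtask in natural language processing, and both its recognition performance and its efficiency have long been a focus of research. To balance the two, this paper proposes a method for recognizing noun phrases based on auxiliary phrase marks. First, after analyzing the different classification systems for phrases, a mapping formula is constructed and used to map phrase categories between those classification systems. Then, auxiliary phrase marks are combined according to the mapping results and the probability distribution of the phrases. Experimental results show that the proposed method raises the F-value while effectively reducing the system's time overhead.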

Cite this article

Liu Fei, Zhou Qiaoli, Zhang Guiping. Noun phrase recognition based on auxiliary phrase marks[J]. Journal of Shenyang Aerospace University, 2014, 31(1): 52-59. DOI: 10.3969/j.issn.2095-1248.2014.01.012

Abstract

Noun phrase recognition is an important subtask in natural language processing. Both its recognition performance and its efficiency have long been a focus of researchers' attention. To balance the two, this paper proposes a method for recognizing noun phrases based on auxiliary phrase marks. First, based on a detailed analysis of the different classification systems for phrases, a mapping formula is constructed and used to map phrase categories between those systems. Then, auxiliary phrase marks are combined according to the mapping results and the probability distribution of the phrases. Experimental results show that this method effectively reduces the time cost of noun phrase recognition without reducing the F-value.
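The F-value mentioned in the abstract can be illustrated with a minimal sketch. It assumes the F-value is the balanced F-measure, F = 2PR / (P + R), computed over exact-match noun phrase spans; the abstract does not state the exact definition used. The function name `f_value` and the example spans are hypothetical and do not come from the paper.

```python
def f_value(gold_spans, predicted_spans):
    """Return (precision, recall, F) for exact-match noun phrase spans.

    Assumption: F is the balanced F-measure 2PR / (P + R).
    Each span is a (start, end) pair of token indices, end exclusive.
    """
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    correct = len(gold & predicted)  # spans whose boundaries match exactly

    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical data: four gold noun phrases, three predicted ones,
# two of which match a gold span exactly.
gold = [(0, 2), (3, 5), (6, 9), (10, 12)]
predicted = [(0, 2), (3, 5), (6, 8)]
precision, recall, f = f_value(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F={f:.3f}")
```

Under this definition a predicted phrase with one wrong boundary, such as `(6, 8)` against the gold `(6, 9)`, counts as an error for both precision and recall.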
