Computer Engineering

Research of Latent Semantic Indexing based on subspace optimization

  • Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang, Liaoning 110136, China

Received date: 2012-09-16

Abstract

Latent Semantic Indexing (LSI) is an unsupervised feature extraction technique whose effectiveness has been demonstrated in several research fields, such as information indexing. Because its performance depends entirely on the feature distribution of the data, optimizing the data can improve its effectiveness. This paper proposes an optimization of LSI, the Augmented Space Model, together with a new subspace division strategy based on document lengths and the distribution of the features' document frequency (DF), so that the favorable structure of a large-scale corpus is inherited by the two subspaces as far as possible. Experiments show that an appropriate subspace division strategy yields higher precision and a shorter running time. Finally, by using the Augmented Space Model to integrate the different subspaces, the method achieves better performance: the precision in the classification experiment is 85.92%.
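The general idea of dividing a corpus into two subspaces and integrating their latent representations can be illustrated with a minimal sketch. The abstract does not give the actual division rule or the integration formula, so everything below is an assumption made for illustration: the median-length split in `split_by_length`, the concatenation of the two bases in `augmented_projection`, the toy matrix `X`, and all function names are hypothetical, and the DF-based part of the strategy is not modeled.

```python
import numpy as np

def lsi_basis(term_doc, k):
    """Top-k left singular vectors of a term-document matrix (terms x docs)."""
    u, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k]

def split_by_length(term_doc):
    """Assumed split rule: documents at or below the median length vs. the rest."""
    lengths = term_doc.sum(axis=0)          # document length = total term count
    short = lengths <= np.median(lengths)
    return term_doc[:, short], term_doc[:, ~short]

def augmented_projection(term_doc, k):
    """Project all documents onto the concatenated bases of the two subspaces."""
    short_docs, long_docs = split_by_length(term_doc)
    if min(short_docs.shape[1], long_docs.shape[1]) < k:
        raise ValueError("each subset needs at least k documents")
    # Assumed integration step: stack the two k-dimensional bases side by side.
    basis = np.hstack([lsi_basis(short_docs, k), lsi_basis(long_docs, k)])
    return basis.T @ term_doc               # (2k) x docs

# Toy term-document matrix: 6 terms (rows) x 6 documents (columns).
X = np.array([
    [2, 0, 1, 0, 3, 0],
    [1, 1, 0, 0, 2, 1],
    [0, 2, 0, 1, 0, 4],
    [0, 1, 0, 2, 1, 3],
    [1, 0, 0, 0, 2, 0],
    [0, 0, 1, 1, 0, 2],
], dtype=float)

Z = augmented_projection(X, k=2)
print(Z.shape)
```

Each document is represented here by 2k coordinates, k from each subspace. Running a separate SVD on each smaller subset is one plausible reason a division strategy could reduce running time, although the paper's own measurements depend on its actual method.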

Cite this article

JI Duo, CHANG Li-wei, CAI Dong-feng. Research of Latent Semantic Indexing based on subspace optimization[J]. Journal of Shenyang Aerospace University, 2013, 30(2): 60-65. DOI: 10.3969/j.issn.2095-1248.2013.02.014
