1.
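Theoretical Research on and Applications of Latent Semantic Analysis  Total citations: 1, self-citations: 0, citations by others: 1
Latent Semantic Analysis (LSA) uses Singular Value Decomposition (SVD) to analyze the relationships among a collection of texts, and is a method for deriving mapping rules between keywords and semantics. Probabilistic Latent Semantic Analysis (PLSA), which appeared later, reinterprets the SVD-based LSA in statistical terms through maximum likelihood estimation. LSA was first applied to text information retrieval; as its range of applications has expanded, it has come to be widely used in information filtering, cross-language retrieval, cognitive science, and information understanding, judgment, and prediction in data mining, among many other fields.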
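As an illustration of the SVD step this abstract describes, here is a minimal sketch of LSA in Python with NumPy. The toy term-document matrix, the function names `lsa` and `cosine`, and the choice of `k = 2` are assumptions made for the example; they do not come from the paper.

```python
import numpy as np

def lsa(term_doc, k):
    """Project terms and documents into a k-dimensional latent space
    using a truncated SVD of the term-document matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]     # one row per term
    doc_vecs = Vt[:k, :].T * s[:k]   # one row per document
    return term_vecs, doc_vecs

def cosine(a, b):
    """Cosine similarity between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical counts: rows are terms, columns are documents.
# Documents 0 and 1 share terms; documents 2 and 3 share other terms.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

term_vecs, doc_vecs = lsa(X, k=2)
print(cosine(doc_vecs[0], doc_vecs[1]))  # documents with shared vocabulary
print(cosine(doc_vecs[0], doc_vecs[2]))  # documents with disjoint vocabulary
```

In this toy case, documents that share vocabulary should end up close together in the latent space, while documents with disjoint vocabulary should not.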
2.
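Using a dataset of papers from authoritative journals in library and information science (LIS), the probabilistic latent semantic analysis (PLSA) algorithm is applied to documents that characterize experts' expertise, in order to locate the research areas of experts in the LIS field. Experimental results show that the method is feasible and performs well.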
3.
Wenjuan Luo, Fuzhen Zhuang, Weizhong Zhao, Qing He, Zhongzhi Shi 《Information processing & management》 2015
Aspect level sentiment analysis is important for numerous opinion mining and market analysis applications. In this paper, we study the problem of identifying and rating review aspects, which is the fundamental task in aspect level sentiment analysis. Previous review aspect analysis methods seldom consider entity or rating but only 2-tuples, i.e., head and modifier pairs; e.g., in the phrase “nice room”, “room” is the head and “nice” is the modifier. To solve this problem, we present a novel Quad-tuple Probability Latent Semantic Analysis (QPLSA), which incorporates the entity and its rating together with the 2-tuples into the PLSA model. Specifically, QPLSA not only generates fine-granularity aspects, but also captures the correlations between words and ratings. We also develop two novel prediction approaches, the Quad-tuple Prediction (from the global perspective) and the Expectation Prediction (from the local perspective). In evaluation, systematic experiments show that Quad-tuple PLSA significantly outperforms 2-tuple PLSA on both aspect identification and aspect rating prediction for publication datasets. Moreover, for aspect rating prediction, QPLSA shows significant superiority over state-of-the-art baseline methods. In addition, the Quad-tuple Prediction and the Expectation Prediction also show strong performance in aspect rating on different datasets.
4.
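Zhao Wei 《安徽职业技术学院学报》 (Journal of Anhui Vocational and Technical College) 2014, (3): 1-3
Probabilistic Latent Semantic Analysis (PLSA) ranks, filters, and classifies documents by converting the document-word relationship into a document-topic-word relationship, which is computationally very expensive. This paper designs an efficient parallel scheme for PLSA based on MPI (Message Passing Interface), optimizes the model system, the processing of training data, and the parallel algorithm, and proposes a parallel PLSA algorithm for big-data settings. It addresses the earlier problem that data sets were too large to compute on; training is considerably faster than before optimization, and the algorithm is scalable and feasible.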
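To make the document-topic-word decomposition in this abstract concrete, here is a minimal single-process sketch of PLSA fitted by EM, written in Python with NumPy. It is not the MPI-based parallel scheme the paper proposes. The function names, the toy count matrix, and the iteration count are assumptions made for the example.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM.

    counts: (n_docs, n_words) matrix of word counts n(d, w).
    Returns P(z|d) with shape (n_docs, n_topics) and
    P(w|z) with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]   # (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + eps
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post           # n(d,w) * P(z|d,w)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

def log_likelihood(counts, p_z_d, p_w_z):
    """Training log-likelihood: sum over d, w of n(d,w) * log P(w|d)."""
    p_w_d = p_z_d @ p_w_z
    return float((counts * np.log(p_w_d + 1e-12)).sum())

# Hypothetical counts: 4 documents over a 4-word vocabulary.
counts = np.array([
    [4, 3, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 4, 3],
    [0, 0, 3, 4],
], dtype=float)

p_z_d, p_w_z = plsa(counts, n_topics=2)
print(log_likelihood(counts, p_z_d, p_w_z))
```

The E-step builds a dense array of size documents × topics × words, which illustrates why the computation grows quickly with corpus size and why a parallel scheme becomes attractive for large data.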
5.
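An efficient method for semantic annotation of Web pages is key to improving the utilization of Web information resources and to knowledge innovation. To address the problems of current Web page semantic annotation methods, and drawing on the structural features, textual features, and topic distribution patterns that Web pages exhibit, a Web page semantic annotation algorithm based on the PLSA topic model is designed. The algorithm builds separate PLSA topic models for the structural features and the textual features of Web pages, integrates and optimizes these separate models with an adaptive asymmetric learning algorithm, and finally forms a new, combined PLSA topic model for automatic semantic annotation of unseen Web pages. Experimental results show that the algorithm significantly improves the accuracy and efficiency of Web page semantic annotation and can effectively handle large-scale Web page semantic annotation.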
6.
7.
Test Data Likelihood for PLSA Models  Total citations: 2, self-citations: 0, citations by others: 2
Probabilistic Latent Semantic Analysis (PLSA) is a statistical latent class model that has recently received considerable attention. In its usual formulation it cannot assign likelihoods to unseen documents. Furthermore, it assigns a probability of zero to unseen documents during training. We point out that one of the two existing alternative formulations of the Expectation-Maximization algorithm for PLSA does not require this assumption. However, even that formulation does not allow calculation of the actual likelihood values. We therefore derive a new test-data likelihood substitute for PLSA and compare it to three existing likelihood substitutes. An empirical evaluation shows that our new likelihood substitute produces the best predictions about accuracies in two different IR tasks and is therefore best suited to determine the number of EM steps when training PLSA models. The new likelihood measure and its evaluation also suggest that PLSA is not very sensitive to overfitting for the two tasks considered. This renders additions like tempered EM that especially address overfitting unnecessary. The work reported here was carried out while the author was at the Palo Alto Research Center (PARC).
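The zero-probability issue mentioned in this abstract can be seen from the standard asymmetric form of PLSA, sketched below. The notation is an assumption for illustration and may differ from the paper's: n(d, w) is the count of word w in document d, n(d) is the total word count of d within the training set, and z ranges over latent topics.

```latex
P(d, w) = P(d) \sum_{z} P(w \mid z)\, P(z \mid d), \qquad
\mathcal{L} = \sum_{d} \sum_{w} n(d, w) \log P(d, w), \qquad
\hat{P}(d) = \frac{n(d)}{\sum_{d'} n(d')}
```

Under this sketch, a document outside the training set has n(d) = 0 there, so the maximum-likelihood estimate of P(d) is zero and log P(d, w) is undefined for it. This is why a substitute measure is needed to score test data.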