
1.

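程志, 黄荣怀 《现代远距离教育》 2008, (2): 71-73

Text mining is the process of extracting valid, novel, potentially useful, and understandable knowledge patterns from unstructured or semi-structured text. This paper first discusses the definition, process, and implementation approaches of text mining. It then explores applications of text mining in education from six angles: information retrieval, efficient browsing of search results, spam filtering, personalized homepage services, document management, and the identification and filtering of BBS posts. Finally, it briefly introduces several text mining application systems.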

2.

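Web Mining Methods and Their Applications in Education

程志, 桂占吉 《中国电化教育》 2006, (7)

Web mining can discover useful knowledge or patterns in large volumes of unstructured, heterogeneous Web information resources, and it has been widely applied in many fields. Its application in education is now also attracting attention. This paper first discusses the categories and methods of Web mining in detail, and then gives a fairly comprehensive account of its applications in education.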

3.

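Research on Text Mining Techniques and Their Application in Information Retrieval

乔良 《教育技术导刊》 2009, 8(4): 160-161

Text mining is the process of analyzing semantically rich text in order to understand its content and meaning, and it has become an increasingly popular and important research area within data mining. The paper gives a definition and framework of text mining, analyzes in detail preprocessing, text summarization, text classification, clustering, association analysis, and visualization techniques, summarizes recent research progress, and points out the role of text mining in information retrieval.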

4.

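Research on Web Mining-Based Feature Extraction for Specialized Texts

吕林霞, 张明新 《兰州石化职业技术学院学报》 2007, 7(3): 33-35

Based on an analysis of text feature extraction methods for the automatic classification of specialized information, the paper proposes extracting feature terms through Web content mining and Web structure mining to build the text feature space. It also uses specialty category vectors and specialized dictionaries to address the problem of high-dimensional feature spaces effectively.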

5.

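Blog Knowledge Mining for Topic Mining and Opinion Analysis

王萍 《中国电化教育》 2011, (2)

Blogs are an important form of user-generated content in the Web 2.0 environment and have become an important source of information and knowledge on the Internet. People urgently need ways to obtain blog information, and the knowledge hidden in it, quickly and accurately. This paper builds a blog knowledge mining framework based on two text analysis methods, text clustering and topic models. The framework mines latent conceptual topics from blog posts and performs opinion analysis on the mined topics, which supports deeper study of domain knowledge. The author applies the method in a case study of e-Learning blog posts.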

6.

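Feature Representation and Feature Extraction Techniques in Web Text Mining (Total citations: 2; self-citations: 0; citations by others: 2)

陈淑珍 《三明高等专科学校学报》 2004, 21(2): 53-57, 87

Web text mining is a new research field in artificial intelligence. Word segmentation, feature representation, and feature subset extraction are foundational early-stage tasks in the text mining process. The paper introduces common techniques for word segmentation, feature representation, and feature subset extraction in text mining, along with their development trends.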

7.

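Implementing CSS-Based Visual Information Hiding in Structured Documents

杨志刚 《教育技术导刊》 2013, 12(1): 154-156

The paper studies the visual redundancy of fonts in structured documents and embeds information by using CSS to modify font appearance. The approach combines encryption with hiding, works with a wide range of carriers, offers high hiding density and strong security, and is easy to implement in software. The proposed embedding scheme offers guidance for information hiding in formatted text generally.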

8.

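Research on Data Mining Algorithms for Text Data

李艳灵, 李刚 《新乡师范高等专科学校学报》 2003, 17(2): 35-37

The paper summarizes the basic methods of data mining and the key techniques of text data mining, discusses the definition of text mining and several forms of text classification, and studies data mining algorithms for text data.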

9.

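Topic Mining of Educational Technology Literature Data (Total citations: 3; self-citations: 0; citations by others: 3)

王萍 《现代教育技术》 2009, 19(5): 46-50

Text mining of the massive scientific literature available online can effectively improve the usability of that literature and uncover hidden knowledge. The LDA (Latent Dirichlet Allocation) model is an unsupervised learning model that extracts latent topics from text. Using the LDA model, the paper performs topic mining and literature analysis on academic articles published in three international educational technology journals between 2004 and 2008.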

10.

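Feature Representation and Feature Extraction Techniques in Web Text Mining

陈淑珍 《三明学院学报》 2004, 21(2): 53-57

Web text mining is a new research field in artificial intelligence. Word segmentation, feature representation, and feature subset extraction are foundational early-stage tasks in the text mining process. The paper introduces common techniques for word segmentation, feature representation, and feature subset extraction in text mining, along with their development trends.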

11.

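Design and Implementation of a Web Document Conversion Algorithm for Data Mining

赵小龙, 佘东 《巢湖学院学报》 2011, (6): 34-38

Web text mining is an important application of data mining to network information processing, and converting Web documents into the format that data mining requires, that is, Web document preprocessing, is an important research topic. The method in this paper is as follows: download a large number of web pages from the Internet and convert them to text files; count word frequencies in these files; delete stop words, remove high-frequency words, and stem the remaining words; build a vocabulary of content words and extract them; sort them alphabetically to generate a word-frequency index; look each word up in a dictionary file to obtain its ID; and finally generate data in the Reuters-21578 database format. This converts Web document data into a standard data set, ready for classification and clustering in data mining.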
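The preprocessing steps this abstract lists (word counting, stop-word removal, high-frequency word removal, stemming, an alphabetically sorted vocabulary, and word IDs) can be sketched roughly as below. This is a minimal illustration and not the paper's implementation: the stop-word list, the `crude_stem` suffix stripper, the `max_doc_ratio` cutoff, and the output shape are all assumptions, and the Reuters-21578 export step is omitted.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption, not taken from the paper).
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(documents, max_doc_ratio=0.9):
    """Return (word -> ID mapping, per-document {word ID: count} vectors)."""
    # Tokenize, drop stop words, and stem each remaining word.
    tokenized = []
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        tokenized.append([crude_stem(w) for w in words if w not in STOP_WORDS])

    # Count in how many documents each stem occurs.
    doc_freq = Counter()
    for stems in tokenized:
        doc_freq.update(set(stems))

    # Drop stems that occur in too many documents (hypothetical cutoff),
    # then sort alphabetically and assign consecutive IDs.
    limit = max_doc_ratio * len(documents)
    kept = sorted(w for w, df in doc_freq.items() if df <= limit)
    word_id = {w: i for i, w in enumerate(kept)}

    # Represent each document as word ID -> term frequency.
    vectors = []
    for stems in tokenized:
        counts = Counter(s for s in stems if s in word_id)
        vectors.append({word_id[w]: c for w, c in sorted(counts.items())})
    return word_id, vectors


docs = [
    "The cats are mining data",
    "Mining the web pages",
    "Data mining of web text",
]
word_id, vectors = build_index(docs)
```

A word that appears in every document (here the stem of "mining") is dropped by the cutoff, which is one simple way to read "remove high-frequency words"; the paper may define the threshold differently.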

12.

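Multi-Class News Text Classification Based on FastText

张超超, 卢新明 《教育技术导刊》 2020, 19(3): 44-47

Text accounts for a large share of today's rapidly growing data. Text classification, the most common text mining technique, can uncover valuable information in large volumes of disordered text data. The primary challenge in text classification is reducing classification time while maintaining accuracy. The paper proposes using the FastText classification model to learn word features to address this, and applies stop-word processing to the data set to reduce the influence of noisy data on the model. Experiments show that the FastText model reaches 96.11% accuracy on the data set, nearly 4% higher than traditional models, with an average processing time of 1.5 ms per text, about one third shorter.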

13.

Web multimedia information retrieval using improved Bayesian algorithm (Total citations: 1; self-citations: 0; citations by others: 1)

余轶军, 陈纯, 余轶民, 林怀忠 《Journal of Zhejiang University. Science. B》 2003, (4)

INTRODUCTION People are becoming more interested not only in text information but also in multimedia information such as image, audio and video. Now more and more attention is being paid to content-based retrieval systems for web use because they play a key role in utilizing information available on…

14.

Web multimedia information retrieval using improved Bayesian algorithm

余轶军, 陈纯, 余轶民, 林怀忠 《浙江大学学报(A卷英文版)》 2003, 4(4): 415-420

The main thrust of this paper is the application of a novel data mining approach to the log of users' feedback to improve web multimedia information retrieval performance. A user space model was constructed based on data mining and then integrated into the original information space model to improve the accuracy of the new information space model. It can remove clutter and irrelevant text information and help to eliminate mismatch between the page author's expression and the user's understanding and expectation. The user space model was also utilized to discover the relationship between high-level and low-level features for assigning weights. The authors proposed an improved Bayesian algorithm for data mining. Experiments showed that the proposed algorithm was efficient. Project (No. 20020335020) supported by the Research Fund for the Doctoral Program, Ministry of Education of China.

15.

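Diverse Characteristics of Mining and Metallurgy Texts and Principles for Translating Them

邵春美 《黄石理工学院学报(人文社科版)》 2011, 28(1): 63-67

Starting from text typology theory, the article first defines the scope of mining and metallurgy texts, analyzes their diverse stylistic features with examples, and proposes an informative translation principle that attends to both terminology and culture, with the aim of improving the translation quality of such texts and better promoting the exchange of mining and metallurgy culture worldwide.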

16.

Detecting substance-related problems in narrative investigation summaries of child abuse and neglect using text mining and machine learning

《Child abuse & neglect》2019

Background: State child welfare agencies collect, store, and manage vast amounts of data. However, they often do not have the right data, or the data is problematic or difficult to inform strategies to improve services and system processes. Considerable resources are required to read and code these text data. Data science and text mining offer potentially efficient and cost-effective strategies for maximizing the value of these data. Objective: The current study tests the feasibility of using text mining for extracting information from unstructured text to better understand substance-related problems among families investigated for abuse or neglect. Method: A state child welfare agency provided written summaries from investigations of child abuse and neglect. Expert human reviewers coded 2956 investigation summaries based on whether the caseworker observed a substance-related problem. These coded documents were used to develop, train, and validate computer models that could perform the coding on an automated basis. Results: A set of computer models achieved greater than 90% accuracy when judged against expert human reviewers. Fleiss kappa estimates among computer models and expert human reviewers exceeded .80, indicating that expert human reviewer ratings are exchangeable with the computer models. Conclusion: These results provide compelling evidence that text mining procedures can be a cost-effective and efficient solution for extracting meaningful insights from unstructured text data. Additional research is necessary to understand how to extract the actionable insights from these under-utilized stores of data in child welfare.
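The agreement statistic cited in this abstract, Fleiss' kappa, can be computed from a table of rating counts. The sketch below uses the standard formula; the function name, the input layout, and the example tables are assumptions made for illustration and are not data from the study.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for table[i][j] = number of raters who put item i in category j.

    Every item must have been rated by the same number of raters (at least 2).
    """
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(table[0])

    # Observed agreement for each item, then averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_items) / n_items

    # Agreement expected by chance, from the overall share of each category.
    p_cat = [
        sum(row[j] for row in table) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # All ratings fall in one category; treat as complete agreement.
        return 1.0
    return (p_bar - p_expected) / (1.0 - p_expected)


# Hypothetical example: 4 summaries, 3 raters, 2 categories (problem / no problem).
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
kappa_perfect = fleiss_kappa(perfect)
```

In a setting like the one described, each row would be one investigation summary and the raters would be the human reviewers together with the computer models.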

17.

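A Web Mining Algorithm Based on Structure Mining and Usage Mining

焦金涛 《南平师专学报》 2008, 27(5): 44-47

Web mining is the use of data mining techniques to discover and extract information and knowledge from Web documents and services. This paper outlines the basics of Web data mining and the core concepts of Web structure mining and Web usage mining. After studying the PageRank algorithm used in Web structure mining and the main steps and algorithms of Web usage mining, the paper proposes a new, integrated Web mining algorithm that combines the two.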
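For orientation, the PageRank algorithm named in this abstract can be sketched with plain power iteration. This shows only textbook PageRank, not the combined structure-and-usage algorithm the paper proposes; the link graph, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over links[page] = list of pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's damped rank evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its damped rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

Page C ends up with the highest score here because it receives links from both other pages, while B receives only half of A's outgoing weight.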

18.

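Application of the EM Algorithm in Text Mining

严华云, 肖良军 《湖州职业技术学院学报》 2008, 6(3): 12-14

In a network environment, the text mining process mainly comprises feature extraction, feature selection, choice of mining method, result evaluation, and a knowledge module. A recent direction is to mine text with the EM algorithm. The comparative mining model based on this algorithm works as follows: first partition the known data set arbitrarily into several classes; then compute the likelihood of each word in the document collection under each class set and the background set; sum these to obtain the likelihood of the whole data set; and repeat the process until convergence. The larger probability values in the results for the classes and the background set then yield the common theme of the texts and the theme of each class.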
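The EM idea in this abstract can be illustrated with a much simpler model: a single topic mixed with a fixed background word distribution. This is a simplified stand-in, not the multi-class comparative model the paper describes; the mixing weight `lam`, the background probabilities, and the word list are hypothetical.

```python
from collections import Counter


def em_topic(words, background, lam=0.5, iterations=30):
    """Estimate a topic word distribution, given a fixed background distribution.

    Each word occurrence is assumed to come from the background with
    probability lam and from the topic with probability 1 - lam.
    """
    counts = Counter(words)
    vocab = sorted(counts)
    topic = {w: 1.0 / len(vocab) for w in vocab}  # uniform starting point

    for _ in range(iterations):
        # E-step: probability that an occurrence of w was drawn from the topic.
        resp = {}
        for w in vocab:
            p_topic = (1.0 - lam) * topic[w]
            p_background = lam * background.get(w, 1e-9)
            resp[w] = p_topic / (p_topic + p_background)

        # M-step: re-estimate the topic from the counts credited to it.
        total = sum(counts[w] * resp[w] for w in vocab)
        topic = {w: counts[w] * resp[w] / total for w in vocab}
    return topic


# Hypothetical data: "the" is frequent in the text but also in the background.
words = ["the"] * 6 + ["mining"] * 3 + ["text"] * 3
background = {"the": 0.8, "mining": 0.1, "text": 0.1}
topic = em_topic(words, background)
```

Although "the" is the most frequent word in the sample, the background explains most of its occurrences, so the estimated topic gives more weight to "mining" and "text". Separating common words from class-specific themes relies on the same effect.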

19.

Research on a PDM-Based Data Mining Prototype System

石勇, 周传宏, 张传强 《上海大学学报(英文版)》 2004, 8(Z1)

Product data management (PDM) is a technique that integrates and manages all applications, information and processes defining a product from design to manufacture, and to end-user support. Exploring valuable information and knowledge in the PDM system has become key to improving efficiency and implementing knowledge management in an enterprise. This paper introduces a data mining prototype system model based on PDM, and emphasizes some important techniques such as design of the prototype system framework, methods of data selection, and integration of the data mining prototype system with PDM. The model basically solves the problem of functional losses in mining and analyzing data in PDM. Application of data mining to PDM is meaningful to the ideas and techniques of PDM, and to the rapid development of data mining application itself. It is also useful in improving the development and usage of enterprise databases.