首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 656 毫秒
1.
The processing of intellectual property documents, such as patents, has been important to the industry, business, and law communities. Recently, the importance of patent processing has also been recognized in academic research communities, particularly by information retrieval and natural language processing researchers. In addition, large test collections that include patents have recently become available, to enable the systematic evaluation of methodologies from a scientific point of view. In the light of these activities, this special issue is intended to collect advanced research papers on patent processing. As an introduction to the special issue on patent processing, this paper surveys the relevant literature and outlines the papers selected for the special issue.  相似文献   

2.
Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select novel pseudo-relevant documents based on Lavrenko’s relevance model approach. The main idea is to use overlapping clusters to find dominant documents for the initial retrieval set, and to repeatedly use these documents to emphasize the core topics of a query.  相似文献   

3.
In information retrieval, cluster-based retrieval is a well-known attempt in resolving the problem of term mismatch. Clustering requires similarity information between the documents, which is difficult to calculate at a feasible time. The adaptive document clustering scheme has been investigated by researchers to resolve this problem. However, its theoretical viewpoint has not been fully discovered. In this regard, we provide a conceptual viewpoint of the adaptive document clustering based on query-based similarities, by regarding the user’s query as a concept. As a result, adaptive document clustering scheme can be viewed as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in language modeling framework, and evaluate them in the context of cluster-based retrieval, comparing with K-means clustering and full document expansion. Evaluation result shows that retrievals based on query-based similarities significantly improve the baseline, while being comparable to other methods. This implies that the newly developed query-based similarities become feasible criterions for adaptive document clustering.  相似文献   

4.
How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based retrieval results presentation is based on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results, are major features of an heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by different retrieval results representations and quality, the cluster hypothesis still holds and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.  相似文献   

5.
Interdocument similarities are the fundamental information source required in cluster-based retrieval, which is an advanced retrieval approach that significantly improves performance during information retrieval (IR). An effective similarity metric is query-sensitive similarity, which was introduced by Tombros and Rijsbergen as method to more directly satisfy the cluster hypothesis that forms the basis of cluster-based retrieval. Although this method is reported to be effective, existing applications of query-specific similarity are still limited to vector space models wherein there is no connection to probabilistic approaches. We suggest a probabilistic framework that defines query-sensitive similarity based on probabilistic co-relevance, where the similarity between two documents is proportional to the probability that they are both co-relevant to a specific given query. We further simplify the proposed co-relevance-based similarity by decomposing it into two separate relevance models. We then formulate all the requisite components for the proposed similarity metric in terms of scoring functions used by language modeling methods. Experimental results obtained using standard TREC test collections consistently showed that the proposed query-sensitive similarity measure performs better than term-based similarity and existing query-sensitive similarity in the context of Voorhees’ nearest neighbor test (NNT).  相似文献   

6.
We study several machine learning algorithms for cross-language patent retrieval and classification. In comparison with most of other studies involving machine learning for cross-language information retrieval, which basically used learning techniques for monolingual sub-tasks, our learning algorithms exploit the bilingual training documents and learn a semantic representation from them. We study Japanese–English cross-language patent retrieval using Kernel Canonical Correlation Analysis (KCCA), a method of correlating linear relationships between two variables in kernel defined feature spaces. The results are quite encouraging and are significantly better than those obtained by other state of the art methods. We also investigate learning algorithms for cross-language document classification. The learning algorithm are based on KCCA and Support Vector Machines (SVM). In particular, we study two ways of combining the KCCA and SVM and found that one particular combination called SVM_2k achieved better results than other learning algorithms for either bilingual or monolingual test documents.  相似文献   

7.
贺秀莲  韩雪莲 《情报探索》2014,(1):79-81,86
PATENTICS能对美国、日本、中国及欧洲、PCT专利进行全文检索和下载,并对检索结果进行专利分析。介绍了使用PATENTICS进行专利检索及对检索结果进行分析的方法,有助于企业实现专利的查新检索以确定研发方向,还有助于企业了解国内外行业的专利态势,从而为专利布局提供有效依据。  相似文献   

8.
本文利用Innography专利检索与分析平台,对全球碳纤维布专利进行了检索及数据统计,从专利的年度发展趋势、专利权人、专利技术热点和该技术的核心专利等多方面分析了全球碳纤维布技术的大概情况和日美等技术重点国的详细情况,以期为我国碳纤维布技术的研发决策提供支持与依据。研究发现,碳纤维布技术近年来发展迅速,日本和美国在碳纤维布技术研究领域占据主导地位,目前正处于新一轮发展阶段。  相似文献   

9.
The term mismatch problem in information retrieval is a critical problem, and several techniques have been developed, such as query expansion, cluster-based retrieval and dimensionality reduction to resolve this issue. Of these techniques, this paper performs an empirical study on query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance by performing experimentations on seven test collections of NTCIR and TREC.  相似文献   

10.
To resolve some of lexical disagreement problems between queries and FAQs, we propose a reliable FAQ retrieval system using query log clustering. On indexing time, the proposed system clusters the logs of users’ queries into predefined FAQ categories. To increase the precision and the recall rate of clustering, the proposed system adopts a new similarity measure using a machine readable dictionary. On searching time, the proposed system calculates the similarities between users’ queries and each cluster in order to smooth FAQs. By virtue of the cluster-based retrieval technique, the proposed system could partially bridge lexical chasms between queries and FAQs. In addition, the proposed system outperforms the traditional information retrieval systems in FAQ retrieval.  相似文献   

11.
吴丹  齐和庆 《现代情报》2009,29(7):215-221
信息检索发展中的一个重要理论问题是如何对查询与文档进行匹配,由此形成了不同的信息检索模型。跨语言信息检索是信息检索研究的一个分支,也是近年来的热点问题。本文主要对信息检索模型的研究进展,及其在跨语言信息检索中的应用进展进行分析与综述。  相似文献   

12.
研究了专利信息中特征词组的半自动获取和标注方法,构建了一个可应用于专利检索系统中的领域知识分类体系,进而利用知识分类体系的语意检索功能为专利信息库构建了一个有效的检索模块,由此产生了完整的专利知识库系统。该系统对支持设计者进行技术创新具有较大的实用意义。  相似文献   

13.
Contextual document clustering is a novel approach which uses information theoretic measures to cluster semantically related documents bound together by an implicit set of concepts or themes of narrow specificity. It facilitates cluster-based retrieval by assessing the similarity between a query and the cluster themes’ probability distribution. In this paper, we assess a relevance feedback mechanism, based on query refinement, that modifies the query’s probability distribution using a small number of documents that have been judged relevant to the query. We demonstrate that by providing only one relevance judgment, a performance improvement of 33% was obtained.  相似文献   

14.
不同数据库法律事件代码的多来源和不一致性给专利法律事件的分类和数据深加工带来较大阻碍,目前尚未形成层次化、系统化的分类体系架构。对此,以欧洲专利局(EPO)提供的全球专利法律状态XML数据为研究对象,详细介绍数据覆盖范围以及不同类型交换数据的数据格式;基于对PatSnap、INNOGRAPHY、incoPat、SooPAT、万象云和佰腾网等国内外知名专利检索分析平台中的专利法律状态对比,把EPO专利法律状态归纳为在审、授权、有效、失效、技术转移和其他六大类,并根据各类别在专利生命周期中出现的位置,对专利当前法律状态的有效性划分为在审、有效和失效3种状态,同时将4 000多个EPO法律事件代码中的转让代码和专利许可代码进行详细分类标引,针对专利法律状态分类体系构建、专利当前有效性判定、专利技术转移分类等关键问题,对EPO专利数据进行结构化解析和深加工,从而构建EPO专利法律状态数据库物理模型,为提高专利检索和分析工作的质量及效率提供支撑。  相似文献   

15.
The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual categorization. Because accurate patent classification is crucial to search for relevant existing patents in a certain field, patent categorization is a very important and useful field. As patent documents are structural documents with their own characteristics distinguished from general documents, these unique traits should be considered in the patent categorization process. In this paper, we categorize Japanese patent documents automatically, focusing on their characteristics: patents are structured by claims, purposes, effects, embodiments of the invention, and so on. We propose a patent document categorization method that uses the k-NN (k-Nearest Neighbour) approach. In order to retrieve similar documents from a training document set, some specific components to denote the so-called semantic elements, such as claim, purpose, and application field, are compared instead of the whole texts. Because those specific components are identified by various user-defined tags, first all of the components are clustered into several semantic elements. Such semantically clustered structural components are the basic features of patent categorization. We can achieve a 74% improvement of categorization performance over a baseline system that does not use the structural information of the patent.  相似文献   

16.
Determining requirements when searching for and retrieving relevant information suited to a user’s needs has become increasingly important and difficult, partly due to the explosive growth of electronic documents. The vector space model (VSM) is a popular method in retrieval procedures. However, the weakness in traditional VSM is that the indexing vocabulary changes whenever changes occur in the document set, or the indexing vocabulary selection algorithms, or parameters of the algorithms, or if wording evolution occurs. The major objective of this research is to design a method to solve the afore-mentioned problems for patent retrieval. The proposed method utilizes the special characteristics of the patent documents, the International Patent Classification (IPC) codes, to generate the indexing vocabulary for presenting all the patent documents. The advantage of the generated indexing vocabulary is that it remains unchanged, even if the document sets, selection algorithms, and parameters are changed, or if wording evolution occurs. Comparison of the proposed method with two traditional methods (entropy and chi-square) in manual and automatic evaluations is presented to verify the feasibility and validity. The results also indicate that the IPC-based indexing vocabulary selection method achieves a higher accuracy and is more satisfactory.  相似文献   

17.
刘明丽 《现代情报》2013,33(12):74-77
本文结合科技查新工作实践,对高校图书馆开展基于科技查新的专利信息服务的模式与策略探讨及研究。对北京工业大学师生进行专利信息服务的抽样调查,深入了解我校师生的专利意识、专利基础知识以及专利信息服务的需求,结合调查问卷结果和工作实践,提出了北京工业大学专利信息服务策略。基于开展的与知识产权相关的科技查新服务实践,对科技查新过程中用户专利意识的培养、专利文献资源检索与利用、用户专利申请前文献调研及知识产权局专利申请流程等方面进行探讨,并对科技查新过程中文献检索、查新结论撰写等进行分析研究。  相似文献   

18.
19.
This paper describes our novel retrieval model that is based on contexts of query terms in documents (i.e., document contexts). Our model is novel because it explicitly takes into account of the document contexts instead of implicitly using the document contexts to find query expansion terms. Our model is based on simulating a user making relevance decisions, and it is a hybrid of various existing effective models and techniques. It estimates the relevance decision preference of a document context as the log-odds and uses smoothing techniques as found in language models to solve the problem of zero probabilities. It combines these estimated preferences of document contexts using different types of aggregation operators that comply with different relevance decision principles (e.g., aggregate relevance principle). Our model is evaluated using retrospective experiments (i.e., with full relevance information), because such experiments can (a) reveal the potential of our model, (b) isolate the problems of the model from those of the parameter estimation, (c) provide information about the major factors affecting the retrieval effectiveness of the model, and (d) show that whether the model obeys the probability ranking principle. Our model is promising as its mean average precision is 60–80% in our experiments using different TREC ad hoc English collections and the NTCIR-5 ad hoc Chinese collection. Our experiments showed that (a) the operators that are consistent with aggregate relevance principle were effective in combining the estimated preferences, and (b) that estimating probabilities using the contexts in the relevant documents can produce better retrieval effectiveness than using the entire relevant documents.  相似文献   

20.
相关概念的关联参照检索是概念检索的重要研究内容。本文提出了一种基于主题的语义关联的参照检索模型,通过融合语义网、本体论的相关知识及信息提取等语言处理技术,提取关于特定主题的文档的主题概念及概念之间的关联构成该主题的语义关联模型,并辅助于参照检索过程。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号