Similar Documents
20 similar documents found (search time: 62 ms)
1.
In this paper, we lay out a relational approach for indexing and retrieving photographs from a collection. The proliferation of digital image acquisition devices, combined with the growth of the World Wide Web, requires information retrieval (IR) models and systems that give users fast access to the images they search for in databases. The aim of our work is to develop an IR model suited to images, integrating rich semantics for representing this visual data and user queries, which can also be applied to large corpora.

2.
This paper reports on the underlying IR problems encountered when dealing with the complex morphology and compound constructions found in the Hungarian language. It describes evaluations carried out on two general stemming strategies for this language, and also demonstrates that a light stemming approach could be quite effective. Based on searches done on the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce better MAP. When compared to an IR scheme without stemming or one based on only a light stemmer, we find the differences to be statistically significant. When compared with probabilistic, vector-space and language models, we find that the Okapi model results in the best retrieval effectiveness. The resulting MAP is found to be about 35% better than the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure for both queries and documents significantly improves IR performance (+10%), compared to word-based indexing strategies.
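The Okapi model referred to above belongs to the BM25 weighting family. As a point of reference only, a minimal Python sketch of a BM25-style term weight next to the classical tf-idf weight is given below; the parameter values (k1, b) and the toy statistics are illustrative assumptions, not figures from the paper.

```python
import math

def bm25_weight(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Okapi BM25 term weight: a saturating tf component times an idf component.

    tf: term frequency in the document
    df: number of documents containing the term
    doc_len / avg_doc_len: length-normalisation inputs
    n_docs: collection size
    k1, b: illustrative defaults, not values taken from the paper
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm_tf = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf

def tf_idf_weight(tf, df, n_docs):
    """Classical tf-idf weight used as the baseline in the comparison above."""
    return tf * math.log(n_docs / df)

# Toy example: one query term, one document.
print(bm25_weight(tf=3, df=120, doc_len=250, avg_doc_len=300, n_docs=100_000))
print(tf_idf_weight(tf=3, df=120, n_docs=100_000))
```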

3.
In this article we examine the effectiveness of consent in data protection legislation. We argue that the current legal framework for consent, which has its basis in the idea of autonomous authorisation, does not work in practice. In practice the legal requirements for consent lead to ‘consent desensitisation’, undermining privacy protection and trust in data processing. In particular we argue that stricter legal requirements for giving and obtaining consent (explicit consent), as proposed in the European Data Protection Regulation, will further weaken the effectiveness of the consent mechanism. Building on Miller and Wertheimer’s ‘Fair Transaction’ model of consent we will examine alternatives to explicit consent.

4.
This paper briefly introduces the major innovative research results achieved by the National Natural Science Foundation of China key project "Research on Flash-Memory Database Technology" in the areas of flash storage management, flash database indexing, flash database buffer management, flash database query processing, and flash database transaction management, and analyzes the main research hotspots and development trends.

5.
The emerging discipline of Machine Ethics is concerned with creating autonomous artificial moral agents that perform ethically significant actions out in the world. Recently, Wallach and Allen (Moral machines: teaching robots right from wrong, Oxford University Press, Oxford, 2009) and others have argued that a virtue-based moral framework is a promising tool for meeting this end. However, even if we could program autonomous machines to follow a virtue-based moral framework, there are certain pressing ethical issues that need to be taken into account, prior to the implementation and development stages. Here I examine whether the creation of virtuous autonomous machines is morally permitted by the central tenets of virtue ethics. It is argued that the creation of such machines violates certain tenets of virtue ethics, and hence that the creation and use of those machines is impermissible. One upshot of this is that, although virtue ethics may have a role to play in certain near-term Machine Ethics projects (e.g. designing systems that are sensitive to ethical considerations), machine ethicists need to look elsewhere for a moral framework to implement into their autonomous artificial moral agents, Wallach and Allen’s claims notwithstanding.

6.
7.
In a dynamic retrieval system, documents must be ingested as they arrive, and be immediately findable by queries. Our purpose in this paper is to describe an index structure and processing regime that accommodates that requirement for immediate access, seeking to make the ingestion process as streamlined as possible, while at the same time seeking to make the growing index as small as possible, and seeking to make term-based querying via the index as efficient as possible. We describe a new compression operation and a novel approach to extensible lists which together facilitate that triple goal. In particular, the structure we describe provides incremental document-level indexing using as little as two bytes per posting and only a small amount more for word-level indexing; provides fast document insertion; supports immediate and continuous queryability; provides support for fast conjunctive queries and similarity score-based ranked queries; and facilitates fast conversion of the dynamic index to a “normal” static compressed inverted index structure. Measurement of our new mechanism confirms that in-memory dynamic document-level indexes for collections into the gigabyte range can be constructed at a rate of two gigabytes/minute using a typical server architecture, that multi-term conjunctive Boolean queries can be resolved in just a few milliseconds each on average even while new documents are being concurrently ingested, and that the net memory space required for all of the required data structures amounts to an average of as little as two bytes per stored posting, less than half the space required by the best previous mechanism.
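The paper's own compression operation is not reproduced here. The hedged sketch below only illustrates the standard d-gap plus variable-byte encoding idea, which is enough to see why roughly two bytes per document-level posting is plausible for dense posting lists.

```python
def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers: 7 data bits per byte,
    with the high bit marking the final byte of each value."""
    out = bytearray()
    for n in numbers:
        while True:
            byte = n & 0x7F
            n >>= 7
            if n == 0:
                out.append(byte | 0x80)  # terminator byte for this value
                break
            out.append(byte)
    return bytes(out)

def gaps(docids):
    """Convert a sorted docid posting list to first id + successive gaps."""
    return [docids[0]] + [b - a for a, b in zip(docids, docids[1:])]

postings = [3, 7, 12, 40, 41, 44, 90]
encoded = vbyte_encode(gaps(postings))
print(len(encoded), "bytes for", len(postings), "postings")
```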

8.
The article analyzes user–IR system interaction from the broad, socio-cognitive perspective of lessons we can learn about human brain evolution when we compare the Neanderthal brain to the human brain before and after a small human brain mutation is hypothesized to have occurred 35,000–75,000 years ago. The enhanced working memory mutation enabled modern humans (i) to decode unfamiliar environmental stimuli with greater focusing power on adaptive solutions to environmental changes and problems, and (ii) to encode environmental stimuli in more efficient, generative knowledge structures. A sociological theory of these evolving, more efficient encoding knowledge structures is given. These new knowledge structures instilled in humans not only the ability to adapt to and survive novelty and/or changing conditions in the environment, but they also instilled an imperative to do so. Present day IR systems ignore the encoding imperative in their design framework. To correct for this lacuna, we propose the evolutionary-based socio-cognitive framework model for designing interactive IR systems. A case study is given to illustrate the functioning of the model.

9.
With the advances in natural language processing (NLP) techniques and the need to deliver more fine-grained information or answers than a set of documents, various QA techniques have been developed corresponding to different question and answer types. A comprehensive QA system must be able to incorporate individual QA techniques as they are developed and integrate their functionality to maximize the system’s overall capability in handling increasingly diverse types of questions. To this end, a new QA method was developed to learn strategies for determining module invocation sequences and boosting answer weights for different types of questions. In this article, we examine the roles and effects of the answer verification and weight boosting method, which forms the core of the automatically generated strategy-driven QA framework, in comparison with a strategy-less, straightforward answer-merging approach and a strategy-driven approach that uses manually constructed strategies.

10.
[Purpose/Significance] To strengthen the ideological development and public-opinion guidance capabilities of mainstream converged media, to resolve the difficulties these media face in managing multimodal information resources in the big-data era, and to build an efficient hot-topic discovery mechanism. [Method/Process] A requirements framework is first built around the hot-topic discovery needs of mainstream converged media; the Scrapy-Redis framework, the HBase database, and MapReduce are then used to achieve accurate data collection, orderly storage, and efficient processing; based on the idea of multimodal information fusion, NLP techniques are applied to extract features from the information resources; finally, the LDA2vec model and the Single-Pass algorithm are used to aggregate information and to discover and update hot topics. [Results/Conclusions] Simulation experiments show that the proposed method aggregates multimodal information and extracts hot topics well, with clearly better results than comparable models. [Innovation/Limitations] However, when NLP techniques are used to process multimodal information, the individual processing stages are not yet smoothly integrated, and further improvement is needed.
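The study pairs LDA2vec features with Single-Pass clustering for topic aggregation. Below is a minimal, generic Single-Pass sketch over pre-computed document vectors; the cosine-similarity threshold, the centroid update rule, and the random vectors standing in for LDA2vec features are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def single_pass_cluster(doc_vectors, threshold=0.8):
    """Generic Single-Pass clustering: assign each incoming document vector to the
    most similar existing cluster centroid, or open a new cluster when no centroid
    reaches the cosine-similarity threshold."""
    centroids, members = [], []
    for i, v in enumerate(doc_vectors):
        v = v / (np.linalg.norm(v) + 1e-12)
        if centroids:
            sims = [float(v @ c) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                members[best].append(i)
                # update centroid as the renormalised running mean direction
                c = centroids[best] * (len(members[best]) - 1) + v
                centroids[best] = c / (np.linalg.norm(c) + 1e-12)
                continue
        centroids.append(v)
        members.append([i])
    return members

# Toy example with random "document embeddings" standing in for LDA2vec features.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(10, 16))
print(single_pass_cluster(vectors, threshold=0.5))
```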

11.
Hashing has been an emerging topic and has recently attracted widespread attention in multi-modal similarity search applications. However, most existing approaches rely on relaxation schemes to generate binary codes, leading to large quantization errors. In addition, many existing approaches embed labels into the pairwise similarity matrix, incurring high time and space costs and losing category information. To address these issues, we propose an Efficient Discrete Matrix factorization Hashing (EDMH). Specifically, EDMH first learns the latent subspaces for individual modalities through a matrix factorization strategy, which preserves the semantic structure information of each modality. In particular, we develop a semantic label offset embedding learning strategy, improving the stability of label embedding regression. Furthermore, we design an efficient discrete optimization scheme to generate compact binary codes discretely. Eventually, we present two efficient learning strategies, EDMH-L and EDMH-S, to pursue high-quality hash functions. Extensive experiments on various widely-used databases verify that the proposed algorithms deliver strong performance and outperform several state-of-the-art approaches, with an average improvement of 2.50% (for Wiki), 2.66% (for MIRFlickr) and 2.25% (for NUS-WIDE) over the best available results, respectively.
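EDMH's discrete optimisation is not reproduced here. The sketch below only shows the surrounding mechanics of hashing-based cross-modal search, i.e. how sign-binarised latent codes are compared by Hamming distance; the random projection stands in for a learned factorization and is exactly the kind of relaxation the paper argues against, so it is an illustration of the setting rather than of EDMH itself.

```python
import numpy as np

def binary_codes(features, n_bits=32, seed=0):
    """Illustrative baseline only: project features into a low-dimensional latent
    space (a random projection standing in for a learned factorization) and
    binarise with sign(); EDMH instead learns the codes discretely."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(features.shape[1], n_bits))
    latent = features @ proj
    return (latent > 0).astype(np.uint8)

def hamming_search(query_code, database_codes, k=5):
    """Rank database items by Hamming distance to the query code."""
    dists = (query_code != database_codes).sum(axis=1)
    return np.argsort(dists)[:k]

# Mechanics-only demo with random features for two modalities (e.g. image/text).
img_feats = np.random.default_rng(1).normal(size=(1000, 512))
txt_feats = np.random.default_rng(2).normal(size=(1000, 300))
img_codes, txt_codes = binary_codes(img_feats), binary_codes(txt_feats)
print(hamming_search(txt_codes[0], img_codes))
```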

12.
Theoretical Foundations of Intelligent Information Processing   (cited 1 time: 0 self-citations, 1 by others)
This paper proposes the idea of combining natural language understanding and computational intelligence as the theoretical foundation of intelligent information processing. Using practical applications of intelligent information processing such as intelligent classification, intelligent indexing, intelligent retrieval, intelligent abstracting, and machine translation, it shows that natural language understanding can provide the theoretical framework while computational intelligence can provide the technical implementation.

13.
In this paper, we present a well-defined general matrix framework for modelling Information Retrieval (IR). In this framework, collections, documents and queries correspond to matrix spaces. Retrieval aspects, such as content, structure and semantics, are expressed by matrices defined in these spaces and by matrix operations applied on them. The dualities of these spaces are identified through the application of frequency-based operations on the proposed matrices and through the investigation of the meaning of their eigenvectors. This allows term weighting concepts used for content-based retrieval, such as term frequency and inverse document frequency, to translate directly to concepts for structure-based retrieval. In addition, concepts such as pagerank, authorities and hubs, determined by exploiting the structural relationships between linked documents, can be defined with respect to the semantic relationships between terms. Moreover, this mathematical framework can be used to express classical and alternative evaluation measures, involving, for instance, the structure of documents, and to further explain and relate IR models and theory. The high level of reusability and abstraction of the framework leads to a logical layer for IR that makes system design and construction significantly more efficient, and thus, better and increasingly personalised systems can be built at lower costs.
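To make the "retrieval as matrix operations" idea concrete, a small illustrative sketch follows: a toy term-document matrix, tf-idf expressed as a matrix operation, and query scoring as a matrix-vector product. This is a simplified reading of the idea, not the paper's formal construction.

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents.
TD = np.array([
    [3, 0, 1],   # term "retrieval"
    [1, 2, 0],   # term "matrix"
    [0, 1, 4],   # term "index"
], dtype=float)

n_terms, n_docs = TD.shape
df = (TD > 0).sum(axis=1)            # document frequency per term
idf = np.log(n_docs / df)            # inverse document frequency
W = TD * idf[:, None]                # tf-idf weighting as a matrix operation

query = np.array([1.0, 0.0, 1.0])    # query vector over the same term space
scores = query @ W                   # one matrix-vector product scores all documents
print(scores, scores.argsort()[::-1])
```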

14.
Prometheus, 2012, 30(1): 75-91

In April 1997, Tasmania (Australia) adopted the reputedly successful New Brunswick (Canada) industrial strategy to build an information technology (IT) industry of significance. The strategy aims to overcome the isolation of a small regional economy and to shift structurally away from declining natural resource industries. Both plans reject neo-classical economics-based industry policy, opting instead for a strong state-based investment planning approach. An analytical framework is set out, using Adolph Lowe's 'Instrumental Analysis', to examine the implementation of both IT strategies. Implications of this analysis are drawn for any attempts at developing regional IT plans and, more generally, as a guide for broad strategy-based national industrial policies.

15.
[Purpose/Significance] The big-data era places high demands on the precision of the retrieval models used by information retrieval systems across domains. Research on traditional retrieval models, however, has reached a bottleneck: models proposed in recent years offer only small precision gains and cannot adequately meet users' needs for precise queries, so high-precision retrieval models urgently need to be explored. Recently, a new retrieval model architecture based on digital signal processing theory (Digital Signal Processing Framework, DSPF) has been proposed, and retrieval models built on this framework have been shown to have a clear precision advantage over traditional models. [Method/Process] Building on the DSP framework, this study incorporates the term-weighting methods of the classical probabilistic models F2LOG and F2EXP and proposes the models DSPF-F2LOG and DSPF-F2EXP. To evaluate their precision, experiments were conducted on several standard test collections of different types, using multiple precision metrics, comparing the proposed models with several classical retrieval models. [Results/Conclusions] The results show that the proposed models generally achieve higher precision than the classical retrieval models and perform at least on a par with the most precise DSP-based retrieval model currently available. The two high-precision DSP models proposed here can effectively improve the precision of information retrieval systems on unstructured text. ...
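For orientation, a hedged sketch of one commonly published form of the F2LOG and F2EXP term scores follows; the smoothing parameters and toy statistics are illustrative assumptions, and the DSPF integration itself is not reproduced here.

```python
import math

def f2log_score(query_tf, doc_tf, doc_len, avg_doc_len, n_docs, df, s=0.5):
    """F2LOG term score in its commonly published form: a length-normalised
    tf component times a logarithmic idf component. Parameter s is illustrative."""
    tf_part = doc_tf / (doc_tf + s + s * doc_len / avg_doc_len)
    return query_tf * tf_part * math.log((n_docs + 1) / df)

def f2exp_score(query_tf, doc_tf, doc_len, avg_doc_len, n_docs, df, s=0.5, k=0.35):
    """F2EXP replaces the logarithmic idf with a power-law idf component ((N+1)/df)^k."""
    tf_part = doc_tf / (doc_tf + s + s * doc_len / avg_doc_len)
    return query_tf * tf_part * ((n_docs + 1) / df) ** k

# Toy single-term example.
print(f2log_score(1, 3, 250, 300, 100_000, 120))
print(f2exp_score(1, 3, 250, 300, 100_000, 120))
```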

16.
The application of natural language processing (NLP) to financial fields is advancing with an increase in the number of available financial documents. Transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) have been successful in NLP in recent years. These cutting-edge models have been adapted to the financial domain by applying financial corpora to existing pre-trained models and by pre-training with the financial corpora from scratch. For Japanese, by contrast, financial terminology cannot be handled with a general-purpose vocabulary without further processing. In this study, we construct language models suitable for the financial domain. Furthermore, we compare methods for adapting language models to the financial domain, such as pre-training methods and vocabulary adaptation. We confirm that adapting the pre-training corpus and tokenizer vocabulary to a corpus of financial text is effective in several downstream financial tasks. No significant difference is observed between pre-training with the financial corpus from scratch and continued pre-training of the general language model with the financial corpus. We have released our source code and pre-trained models.
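A minimal sketch of the continued-pre-training recipe described above, using the Hugging Face transformers and datasets libraries; the base checkpoint bert-base-uncased and the file financial_corpus.txt are placeholders, not the authors' released models or data.

```python
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

# Continue masked-language-model pre-training of a general BERT on domain text.
# The checkpoint and corpus path below are placeholder assumptions.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

dataset = load_dataset("text", data_files={"train": "financial_corpus.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-financial", num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```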

17.
Design of a Processing Scheme for Large-Scale Matrix Multiplication on a Cloud Platform   (cited 1 time: 0 self-citations, 1 by others)
Building on the open-source cloud computing platform Hadoop, this paper uses MapReduce and HDFS to develop a simple application for multiplying large matrices, within the scope of the author's practical experience. Through this development and research, it further explores the significance of MapReduce, a key cloud computing technology, for processing massive data.
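As an illustration of the classic one-pass MapReduce formulation of matrix multiplication that such a Hadoop job would implement, a plain-Python emulation follows; it is a conceptual sketch of the map and reduce phases, not Hadoop code.

```python
from collections import defaultdict

def map_phase(A, B, n_cols_B, n_rows_A):
    """Classic one-pass MapReduce matrix multiply, emulated in plain Python.
    A and B are sparse matrices given as (row, col, value) triples."""
    for (i, j, a) in A:                      # A[i][j] contributes to every C[i][k]
        for k in range(n_cols_B):
            yield ((i, k), ("A", j, a))
    for (j, k, b) in B:                      # B[j][k] contributes to every C[i][k]
        for i in range(n_rows_A):
            yield ((i, k), ("B", j, b))

def reduce_phase(pairs):
    """Group by output cell (i, k) and sum matching A/B contributions over j."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    C = {}
    for (i, k), values in grouped.items():
        a = {j: v for tag, j, v in values if tag == "A"}
        b = {j: v for tag, j, v in values if tag == "B"}
        C[(i, k)] = sum(a[j] * b[j] for j in a.keys() & b.keys())
    return C

A = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0)]   # sparse 2x2 matrix A
B = [(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)]   # sparse 2x2 matrix B
print(reduce_phase(map_phase(A, B, n_cols_B=2, n_rows_A=2)))
```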

18.
Does human intellectual indexing have a continuing role to play in the face of increasingly sophisticated automatic indexing techniques? In this two-part essay, a computer scientist and long-time TREC participant (Pérez-Carballo) and a practitioner and teacher of human cataloging and indexing (Anderson) pursue this question by reviewing the opinions and research of leading experts on both sides of this divide. We conclude that human analysis should be used on a much more selective basis, and we offer suggestions on how these two types of indexing might be allocated to best advantage. Part I of the essay critiques the comparative research, then explores the nature of human analysis of messages or texts and efforts to formulate rules to make human practice more rigorous and predictable. We find that research comparing human versus automatic approaches has done little to change strongly held beliefs, in large part because many associated variables have not been isolated or controlled. Part II focuses on current methods in automatic indexing, its gradual adoption by major indexing and abstracting services, and ways for allocating human and machine approaches. Overall, we conclude that both approaches to indexing have been found to be effective by researchers and searchers, each with particular advantages and disadvantages. However, automatic indexing has the over-arching advantage of decreasing cost, as human indexing becomes ever more expensive.

19.
20.
This paper proposes a strategic model for assessing the coherence of companies’ knowledge strategies with their business strategies as well as with their competitive and organisational contexts. In analysing the knowledge management literature, we locate three principal strategies: (1) knowledge development (internal or external), (2) knowledge sharing (codification or personalisation) and (3) knowledge exploitation (internal or external). We then position the three strategies and six related policies in the context-content-process dimensions of Pettigrew's model to create a useful framework for strategic analysis and a model to assess the coherence of companies’ knowledge strategy. The model can be used to evaluate how an existing knowledge strategy aligns with a company's characteristics and to formulate and implement a coherent knowledge strategy based on the current competitive environment, organisational context and business strategy.
