Similar literature
Found 20 similar documents (search time: 31 ms)
1.
This paper is concerned with some aspects of database interfaces for casual, naive users. A “casual user” is defined as an individual who wishes to execute queries once or twice a month, and a “naive user” is someone who has little or no expertise in operating computers. The study focuses on a specific group of casual, naive users, analyzes their needs and proposes a solution. The proposed interface consists of a graphical display of a model of a database and a natural language query language. One of the unique properties of the database interface is that it allows the user to see local item names within the context of a global structure. The interface was then tested to determine whether it was acceptable to the user population and to discover the level of graphical model that the users would find most comfortable.

2.
One difficult problem in information retrieval (IR) is the proper interpretation of user queries. It is extremely hard for users to express their information needs in a specific yet exhaustive way. In an effort to alleviate this problem, two theoretical models have been proposed to utilize user characteristics maintained in the form of a user profile. Although the idea of integrating user profiles into an IR system is intuitively appealing, and the models seem viable, no research to date has established a foundation for the roles of user profiles in such a system. To investigate the roles of user profiles, this study first identifies and extends various query/profile interaction models to provide a ground upon which the investigation can be undertaken. From a continuum of models characterized on the basis of interaction types, metrics, and parameters, nearly 400 models are chosen to investigate the “model space.” New measures are developed based on the notion of user satisfaction/frustration. In addition, three different criteria are used to guide users in making judgments on the quality of retrieved items. Analysis of the data obtained from the experiments shows that, for a wide variety of criteria and metrics, there are always some query/profile interaction models that outperform the query-alone model. In addition, preferable characteristics for different criteria are identified in terms of interaction types, parameters, and metrics.

3.
Noetica is a tool for structuring knowledge about concepts and the relationships between them. It differs from typical information systems in that the knowledge it represents is abstract, highly connected and includes meta-knowledge (knowledge about knowledge). Noetica represents knowledge using a strongly-typed semantic network. By providing a rich type system it is possible to represent conceptual information using formalised structures. A class hierarchy provides a basic classification for all objects. This allows for a consistency of representation that is not often found in “free” semantic networks and gives the ability to easily extend a knowledge model while retaining its semantics. We also provide visualisation and query tools for this data model. Visualisation can be used to explore complete sets of link-classes, show paths while navigating through the database, or visualise the results of queries. Noetica supports goal-directed queries (a series of user-supplied goals that the system attempts to satisfy in sequence) and path-finding queries (where the system finds relationships between objects in the database by following links).

4.
When public catalog users enter queries that exactly match the catalog's controlled vocabulary, online systems should respond with browsing lists of alphabetically arranged subject headings, because such displays guide users to retrievals based on the assignment of the matched subject headings to bibliographic records. Unfortunately, studies of online catalog searching demonstrate that alphabetical displays are no longer capable of managing large numbers of subdivided forms of subject headings, because searchers exhibit low levels of perseverance when faced with large numbers of retrievals. This paper introduces a new approach to displaying retrieved subject headings in subject searching, the exact-display approach, which is designed to encourage users to browse bibliographic information. The purpose of this paper is to emphasize the importance of the exact-display approach by showing how many user queries would be candidates for this approach, demonstrate an implementation of the exact-display approach in an experimental online catalog, and feature end-user experiences with this approach as implemented in the experimental catalog. End-user experiences gave the authors the opportunity to make several recommendations for enhancing the original design of the exact-display approach so that future implementations of this approach in operational online catalogs are responsive to the needs of online catalog users.

5.
Using genetic algorithms to evolve a population of topical queries   Total citations: 1 (self-citations: 1, citations by others: 0)
Systems for searching the Web based on thematic contexts can be built on top of a conventional search engine and benefit from the huge amount of content as well as from the functionality available through the search engine interface. The quality of the material collected by such systems is highly dependent on the vocabulary used to generate the search queries. In this scenario, selecting good query terms can be seen as an optimization problem where the objective function to be optimized is based on the effectiveness of a query to retrieve relevant material. Some characteristics of this optimization problem are: (1) the high dimensionality of the search space, where candidate solutions are queries and each term corresponds to a different dimension, (2) the existence of acceptable suboptimal solutions, (3) the possibility of finding multiple solutions, and in many cases (4) the quest for novelty. This article describes optimization techniques based on Genetic Algorithms to evolve “good query terms” in the context of a given topic. The proposed techniques place emphasis on searching for novel material that is related to the search context. We discuss the use of a mutation pool to allow the generation of queries with new terms, study the effect of different mutation rates on the exploration of query-space, and discuss the use of a specially developed fitness function that favors the construction of queries containing novel but related terms.
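
The idea of evolving queries with a mutation pool can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the selection and crossover scheme, the elitism step, and the toy fitness function (which rewards related terms and gives a bonus to related terms not already seen) are hypothetical and are not taken from the article.

```python
import random

def evolve_queries(population, mutation_pool, fitness,
                   generations=20, mutation_rate=0.2, seed=0):
    """Evolve queries (tuples of terms); returns the final population, best first."""
    if len(population) < 2:
        raise ValueError("need at least two queries to recombine")
    rng = random.Random(seed)
    population = [tuple(q) for q in population]
    for _ in range(generations):
        # Selection: the fitter half of the population become parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, len(ranked) // 2)]
        # Elitism (an assumption of this sketch): carry the best query over.
        children = [ranked[0]]
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            # One-point crossover between the two parent queries.
            shortest = min(len(a), len(b))
            cut = rng.randint(1, shortest - 1) if shortest > 1 else 1
            child = list(a[:cut] + b[cut:])
            # Mutation: swap a term for one drawn from the mutation pool,
            # which is how terms absent from the initial queries can enter.
            for i in range(len(child)):
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(mutation_pool)
            children.append(tuple(child))
        population = children
    return sorted(population, key=fitness, reverse=True)

def toy_fitness(query,
                related=frozenset({"jaguar", "habitat", "rainforest", "prey"}),
                seen=frozenset({"jaguar"})):
    """Hypothetical fitness: related terms score 1, novel related terms 1.5."""
    terms = set(query)
    return len(terms & related) + 0.5 * len((terms & related) - seen)
```

Raising `mutation_rate` pushes the search toward exploring more of the pool, at the cost of disrupting queries that already score well.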

6.
Query auto completion (QAC) models recommend possible queries to web search users when they start typing a query prefix. Most of today’s QAC models rank candidate queries by popularity (i.e., frequency), and in doing so they tend to follow a strict query matching policy when counting the queries. That is, they ignore the contributions from so-called homologous queries, queries with the same terms but ordered differently or queries that expand the original query. Importantly, homologous queries often express a remarkably similar search intent. Moreover, today’s QAC approaches often ignore semantically related terms. We argue that users are prone to combine semantically related terms when generating queries. We propose a learning to rank-based QAC approach, where, for the first time, features derived from homologous queries and semantically related terms are introduced. In particular, we consider: (i) the observed and predicted popularity of homologous queries for a query candidate; and (ii) the semantic relatedness of pairs of terms inside a query and pairs of queries inside a session. We quantify the improvement of the proposed new features using two large-scale real-world query logs and show that the mean reciprocal rank and the success rate can be improved by up to 9% over state-of-the-art QAC models.
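
As a rough illustration of what counting homologous queries might look like, the sketch below tallies three frequencies for a candidate: exact matches, queries with the same terms in a different order, and queries that expand the candidate. The function name is hypothetical, and reading "expand" as "contains all of the candidate's terms plus more" is an assumption of this sketch; the paper may define it differently.

```python
from collections import Counter

def homologous_counts(candidate, query_log):
    """Return (exact, reordered, expanded) frequencies of `candidate` in the log."""
    counts = Counter(query_log)
    cand_terms = candidate.split()
    cand_sorted = sorted(cand_terms)
    cand_set = set(cand_terms)
    exact = counts[candidate]
    reordered = 0
    expanded = 0
    for query, n in counts.items():
        if query == candidate:
            continue
        terms = query.split()
        if sorted(terms) == cand_sorted:
            # Same terms, different order.
            reordered += n
        elif cand_set < set(terms):
            # Assumption: an expansion is a strict superset of the terms.
            expanded += n
    return exact, reordered, expanded
```

A strict-matching popularity feature would use only the first number; the other two are the kind of extra signal the abstract argues is being ignored.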

7.
We investigated the searching behaviors of twenty-four children in grades 6, 7, and 8 (ages 11–13) in finding information on three types of search tasks in Google. Children conducted 72 search sessions and issued 150 queries. Children's phrase- and question-like queries combined were much more prevalent than keyword queries (70% vs. 30%, respectively). Fifty-two percent of the queries were reformulations (33 sessions). We classified children's query reformulation types into five classes based on the taxonomy by Liu et al. (2010). We found that most query reformulations were by Substitution and Specialization, and that children hardly repeated queries. We categorized children's queries by task facets and examined the way they expressed these facets in their query formulations and reformulations. The oldest children tended to target the general topic of search tasks in their queries most frequently, whereas younger children expressed one of the two facets more often. We assessed children's achieved task outcomes using the search task outcomes measure we developed. Children were mostly more successful on the fact-finding and fully self-generated task and partially successful on the research-oriented task. Query type, reformulation type, achieved task outcomes, and expressing task facets varied by task type and grade level. There was no significant effect of query length in words or of the number of queries issued on search task outcomes. The study findings have implications for human intervention, digital literacy, search task literacy, as well as for system intervention to support children's query formulation and reformulation during interaction with Google.

8.
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system, the Query, Cluster, Summarize (QCS) system, which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence “trimming” and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.

9.
In this paper the features of a microprocessor-based architecture for a bibliographic retrieval system are illustrated. The proposed system consists of the following three functional blocks: the “query processor”, the “simple query executers” and the “answer composer”. The query processor parses the queries and breaks each complex query into simple queries. Each simple query executer is able to perform the operations satisfying a simple query. Finally, the answer composer puts together the results of all simple query executers and produces the response to the query originally raised. This machine will allow the implementation of a very powerful query language. The basic design goals are system modularity and the ability to fulfil arbitrarily complex queries. This is achieved through the proposed query language and by means of the system architecture, which allows high parallelism in the performed operations.
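
The three functional blocks can be mimicked in software with a small sketch. This is only an analogy under stated assumptions: the `field=value AND field=value` query grammar, the in-memory record store, the function names, and the use of a thread pool to stand in for parallel executers are all hypothetical, and the paper's actual query language and hardware are not described here.

```python
from concurrent.futures import ThreadPoolExecutor

def process_query(complex_query):
    """Query processor: split a conjunctive query into (field, value) simple queries."""
    simple_queries = []
    for part in complex_query.split(" AND "):
        field, value = part.split("=", 1)
        simple_queries.append((field.strip(), value.strip()))
    return simple_queries

def execute_simple(simple_query, records):
    """Simple query executer: ids of the records matching one field/value pair."""
    field, value = simple_query
    return {rid for rid, rec in records.items() if rec.get(field) == value}

def compose_answer(partial_results):
    """Answer composer: intersect the partial results of a conjunction."""
    results = list(partial_results)
    return set.intersection(*results) if results else set()

def answer(complex_query, records):
    simple_queries = process_query(complex_query)
    # Each simple query runs independently, so they can be evaluated in parallel.
    with ThreadPoolExecutor() as pool:
        partial = pool.map(lambda sq: execute_simple(sq, records), simple_queries)
        return sorted(compose_answer(partial))
```

Because the executers share no state, adding more of them only changes how the work is scheduled, not the composed answer.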

10.
An imperfect document selection system is represented by analogy with a system in which symbols are selected and transmitted through a noisy channel. Provided that the uncertainties of transmission and reception, and not meaning, are considered, it is suggested that one of Shannon's equations is applicable, and a single-figure measure of system efficiency, Ht, is proposed. Values obtained using this new yardstick are compared with Recall/Precision values obtained for a typical system. Further research is required to test whether system “improvements” resulting in higher values of Ht are perceived as such by users.

11.
Web queries in question format are becoming a common element of a user's interaction with Web search engines. Web search services such as Ask Jeeves – a publicly accessible question and answer (Q&A) search engine – request users to enter question format queries. This paper provides results from a study examining queries in question format submitted to two different Web search engines – Ask Jeeves that explicitly encourages queries in question format and the Excite search service that does not explicitly encourage queries in question format. We identify the characteristics of queries in question format in two different data sets, (1) 30,000 Ask Jeeves queries and (2) 15,575 Excite queries, including the nature, length, and structure of queries in question format. Findings include: (1) 50% of Ask Jeeves queries and less than 1% of Excite queries were in question format, (2) most users entered only one query in question format with little query reformulation, (3) there was a limited range of formats for queries in question format – mainly “where”, “what”, or “how” questions, (4) the most common question query format was “Where can I find …” for general information on a topic, and (5) non-question queries may be in request format. Overall, four types of user Web queries were identified: keyword, Boolean, question, and request. These findings provide an initial mapping of the structure and content of queries in question and request format. Implications for Web search services are discussed.

12.
A transaction log analysis of the Nanyang Technological University (NTU) OPAC was conducted to identify query and search failure patterns with the goal of identifying areas of improvement for the system. One semester’s worth of OPAC transaction logs were obtained and from these, 641,991 queries were extracted and used for this work. Issues investigated included query length, frequency and type of search options and Boolean operators used as well as their relationships with search failure. Among other findings, results indicate that a majority of the queries were simple, with short query lengths and a low usage of Boolean operators. Failure analysis revealed that on average, users had an almost equal chance of obtaining no records or at least one record for a submitted query. We propose enhancements and suggest future areas of work to improve the users’ search experience with the NTU OPAC.

13.
Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences for child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25% of queries are paedophile, and that more than 0.2% of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.
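
One standard way to turn a detector's raw hit rate into an estimate of the true fraction, given its error rates, is sketched below. This is a generic correction and not necessarily the authors' method: it assumes the false positive rate is measured over truly negative items and the false negative rate over truly positive items, so that observed = p(1 - fnr) + (1 - p)fpr. The function name is hypothetical and the numbers in the usage note are made up.

```python
def corrected_fraction(observed, false_positive_rate, false_negative_rate):
    """Estimate the true positive fraction p from the fraction flagged by a detector.

    Solves observed = p * (1 - fnr) + (1 - p) * fpr for p, then clamps to [0, 1].
    """
    denom = 1.0 - false_positive_rate - false_negative_rate
    if denom <= 0:
        raise ValueError("error rates too high for the detector to be informative")
    estimate = (observed - false_positive_rate) / denom
    return min(1.0, max(0.0, estimate))
```

For example, with a made-up true fraction of 0.1, a false positive rate of 0.02 and a false negative rate of 0.2, the detector would flag 0.098 of all items, and `corrected_fraction(0.098, 0.02, 0.2)` recovers approximately 0.1.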

14.
This paper proposes an efficient and effective solution to the problem of choosing the queries to suggest to web search engine users in order to help them rapidly satisfy their information needs. By exploiting a weak function for assessing the similarity between the current query and the knowledge base built from historical users’ sessions, we reduce the suggestion generation phase to the processing of a full-text query over an inverted index. The resulting query recommendation technique is very efficient and scalable, and is less affected by the data-sparsity problem than most state-of-the-art proposals. Thus, it is particularly effective in generating suggestions for rare queries occurring in the long tail of the query popularity distribution. The quality of suggestions generated is assessed by evaluating the effectiveness in forecasting the users’ behavior recorded in historical query logs, and on the basis of the results of a reproducible user study conducted on publicly available, human-assessed data. The experimental evaluation conducted shows that our proposal remarkably outperforms two other state-of-the-art solutions, and that it can generate useful suggestions even for rare and never-seen queries.
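
A toy version of "suggestion as a full-text query over an inverted index" might look as follows. It is a sketch under several assumptions that do not come from the paper: the last query of each historical session is taken as the candidate suggestion, every term seen in that session is indexed under it, and candidates are scored by simple term-weight overlap. All names are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(sessions):
    """Inverted index: term -> Counter of candidate suggestions and weights."""
    index = defaultdict(Counter)
    for session in sessions:
        # Assumption: the final query of a session is the one worth suggesting.
        suggestion = session[-1]
        for query in session:
            for term in query.split():
                index[term][suggestion] += 1
    return index

def suggest(query, index, k=3):
    """Score candidates by summing the weights of the terms they share with `query`."""
    scores = Counter()
    for term in set(query.split()):
        for suggestion, weight in index.get(term, {}).items():
            if suggestion != query:
                scores[suggestion] += weight
    return [s for s, _ in scores.most_common(k)]
```

Because matching happens at the term level rather than on whole queries, a query that never appeared in the log can still receive suggestions as long as some of its terms did.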

15.
The research examines the notion that the principles underlying the procedure used by doctors to diagnose a patient's disease are useful in the design of “intelligent” IR systems because the task of the doctor is conceptually similar to the computer (or human) intermediary's task in “intelligent information retrieval”: to draw out, through interaction with the IR system, the user's query/information need. The research is reported in two parts. In Part II, an information retrieval tool is described which is based on “intelligent information retrieval” assumptions about the information user. In Part I, presented here, the theoretical framework for the tool is set out. This framework is borrowed from the diagnostic procedure currently used in medicine, called “differential diagnosis”. Because of the severe consequences that attend misdiagnosis, the operating principle in differential diagnosis is (1) to expand the uncertainty in the diagnosis situation so that all possible hypotheses and evidence are considered, then (2) to contract the uncertainty in a step by step fashion (from an examination of the patient's symptoms, through the patient's history and a physical (signs), to laboratory tests). The IR theories of Taylor, Kuhlthau and Belkin are used to demonstrate that these medical diagnosis procedures are already present in IR and that it is a viable model with which to design “intelligent” IR tools and systems.

16.
A growing body of research is beginning to explore the information-seeking behavior of Web users. The vast majority of these studies have concentrated on the area of textual information retrieval (IR). Little research has examined how people search for non-textual information on the Internet, and few large-scale studies have investigated visual information-seeking behavior with general-purpose Web search engines. This study examined visual information needs as expressed in users’ Web image queries. The data set examined consisted of 1,025,908 sequential queries from 211,058 users of Excite, a major Internet search service. Twenty-eight terms were used to identify queries for both still and moving images, resulting in a subset of 33,149 image queries by 9,855 users. We provide data on: (1) image queries – the number of queries and the number of search terms per user, (2) image search sessions – the number of queries per user, modifications made to subsequent queries in a session, and (3) image terms – their rank/frequency distribution and the most highly used search terms. On average, there were 3.36 image queries per user containing an average of 3.74 terms per query. Image queries contained a large number of unique terms. The most frequently occurring image related terms appeared less than 10% of the time, with most terms occurring only once. We contrast this to earlier work by P.G.B. Enser, Journal of Documentation 51 (2) (1995) 126–170, who examined written queries for pictorial information in a non-digital environment. Implications for the development of models for visual information retrieval, and for the design of Web search engines are discussed.

17.
18.
Methods of building information queries in SDI systems on the basis of the user's publications are presented in this paper. In most cases the users of an SDI system are scientists whose work is marked by publications resulting from the research they do. It was found that the users' publications may constitute input data for building information queries. The examination of the possible compatibility between the user's information queries and his publications consisted of determining the similarity between a set of keywords indexed from the information query and a set of keywords indexed from the user's publications. Two methods of information query construction, determined by the logical operators AND, OR, NOT and by a set of weighted keywords, are described.
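
To make the keyword-set comparison concrete, here is a minimal sketch. The abstract does not say which similarity measure or weighting scheme was used, so both choices below are assumptions: Jaccard overlap for comparing two keyword sets, and relative frequency across publications for the keyword weights. The function names are hypothetical.

```python
from collections import Counter

def jaccard(query_keywords, publication_keywords):
    """Overlap between two keyword sets: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(query_keywords), set(publication_keywords)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weighted_profile(publications, top_n=3):
    """Weight the top keywords by their share of all keyword occurrences.

    `publications` is a list of keyword lists, one per publication.
    """
    counts = Counter(kw for pub in publications for kw in pub)
    total = sum(counts.values())
    return {kw: n / total for kw, n in counts.most_common(top_n)}
```

The weighted profile corresponds to the weighted-keyword style of query; a Boolean query could be assembled from the same top keywords by joining them with OR.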

19.
In the web environment, most of the queries issued by users are implicit by nature. Inferring the different temporal intents of this type of query enhances the overall temporal part of the web search results. Previous works tackling this problem usually focused on news queries, where the retrieval of the most recent results related to the query is usually sufficient to meet the user's information needs. However, few works have studied the importance of time in queries such as “Philip Seymour Hoffman” where the results may require no recency at all. In this work, we focus on this type of query, named “time-sensitive queries”, where the results are preferably from a diversified time span, not necessarily the most recent one. Unlike related work, we follow a content-based approach to identify the most important time periods of the query and integrate time into a re-ranking model to boost the retrieval of documents whose contents match the query time period. For that purpose, we define a linear combination of topical and temporal scores, which reflects the relevance of any web document both in the topical and temporal dimensions, thus contributing to improve the effectiveness of the ranked results across different types of queries. Our approach relies on a novel temporal similarity measure that is capable of determining the most important dates for a query, while filtering out the non-relevant ones. Through extensive experimental evaluation over web corpora, we show that our model offers promising results compared to baseline approaches. As a result of our investigation, we publicly provide a set of web services and a web search interface so that the system can be graphically explored by the research community.
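
A linear combination of topical and temporal scores can be sketched in a few lines. The mixing weight `alpha`, the document fields, and the temporal score used here (the share of a document's dates that fall in the query's important years) are all illustrative assumptions; the paper's own temporal similarity measure is not reproduced.

```python
def rerank(docs, important_years, alpha=0.7):
    """Re-rank documents by alpha * topical + (1 - alpha) * temporal.

    Each doc is a dict with "id", "topical" (a score in [0, 1]) and
    "years" (the years mentioned in its content).
    """
    def temporal(doc):
        years = doc["years"]
        if not years:
            return 0.0
        # Assumed temporal score: share of the doc's dates that matter to the query.
        return sum(1 for y in years if y in important_years) / len(years)

    scored = [(alpha * d["topical"] + (1 - alpha) * temporal(d), d["id"]) for d in docs]
    return [doc_id for _, doc_id in sorted(scored, key=lambda t: t[0], reverse=True)]
```

Setting `alpha=1.0` falls back to purely topical ranking, which gives a simple baseline to compare against.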

20.
The Web has become a worldwide source of information and a mainstream business tool. It is changing the way people conduct the daily business of their lives. As these changes are occurring, we need to understand what Web searching trends are emerging within the various global regions. What are the regional differences and trends in Web searching, if any? What is the effectiveness of Web search engines as providers of information? As part of a body of research studying these questions, we have analyzed two data sets collected from queries by mainly European users submitted to AlltheWeb.com on 6 February 2001 and 28 May 2002. AlltheWeb.com is a major and highly rated European search engine. Each data set contains approximately a million queries submitted by over 200,000 users and spans a 24-h period. This longitudinal benchmark study shows that European Web searching is evolving in certain directions. There was some decline in query length, with extremely simple queries. European search topics are broadening, with a notable percentage decline in sexual and pornographic searching. The majority of Web searchers view fewer than five Web documents, spending only seconds on a Web document. Approximately 50% of the Web documents viewed by these European users were topically relevant. We discuss the implications for Web information systems and information content providers.
