首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A transaction log analysis of the Nanyang Technological University (NTU) OPAC was conducted to identify query and search failure patterns with the goal of identifying areas of improvement for the system. One semester’s worth of OPAC transaction logs were obtained and from these, 641,991 queries were extracted and used for this work. Issues investigated included query length, frequency and type of search options and Boolean operators used as well as their relationships with search failure. Among other findings, results indicate that a majority of the queries were simple, with short query lengths and a low usage of Boolean operators. Failure analysis revealed that on average, users had an almost equal chance of obtaining no records or at least one record to a submitted query. We propose enhancements and suggest future areas of work to improve the users’ search experience with the NTU OPAC.  相似文献   

2.
To resolve some of lexical disagreement problems between queries and FAQs, we propose a reliable FAQ retrieval system using query log clustering. On indexing time, the proposed system clusters the logs of users’ queries into predefined FAQ categories. To increase the precision and the recall rate of clustering, the proposed system adopts a new similarity measure using a machine readable dictionary. On searching time, the proposed system calculates the similarities between users’ queries and each cluster in order to smooth FAQs. By virtue of the cluster-based retrieval technique, the proposed system could partially bridge lexical chasms between queries and FAQs. In addition, the proposed system outperforms the traditional information retrieval systems in FAQ retrieval.  相似文献   

3.
In ad hoc querying of document collections, current approaches to ranking primarily rely on identifying the documents that contain the query terms. Methods such as query expansion, based on thesaural information or automatic feedback, are used to add further terms, and can yield significant though usually small gains in effectiveness. Another approach to adding terms, which we investigate in this paper, is to use natural language technology to annotate - and thus disambiguate - key terms by the concept they represent. Using biomedical research documents, we quantify the potential benefits of tagging users’ targeted concepts in queries and documents in domain-specific information retrieval. Our experiments, based on the TREC Genomics track data, both on passage and full-text retrieval, found no evidence that automatic concept recognition in general is of significant value for this task. Moreover, the issues raised by these results suggest that it is difficult for such disambiguation to be effective.  相似文献   

4.
To obtain high performances, previous works on FAQ retrieval used high-level knowledge bases or handcrafted rules. However, it is a time and effort consuming job to construct these knowledge bases and rules whenever application domains are changed. To overcome this problem, we propose a high-performance FAQ retrieval system only using users’ query logs as knowledge sources. During indexing time, the proposed system efficiently clusters users’ query logs using classification techniques based on latent semantic analysis. During retrieval time, the proposed system smoothes FAQs using the query log clusters. In the experiment, the proposed system outperformed the conventional information retrieval systems in FAQ retrieval. Based on various experiments, we found that the proposed system could alleviate critical lexical disagreement problems in short document retrieval. In addition, we believe that the proposed system is more practical and reliable than the previous FAQ retrieval systems because it uses only data-driven methods without high-level knowledge sources.  相似文献   

5.
Although most of the queries submitted to search engines are composed of a few keywords and have a length that ranges from three to six words, more than 15% of the total volume of the queries are verbose, introduce ambiguity and cause topic drifts. We consider verbosity a different property of queries from length since a verbose query is not necessarily long, it might be succinct and a short query might be verbose. This paper proposes a methodology to automatically detect verbose queries and conditionally modify queries. The methodology proposed in this paper exploits state-of-the-art classification algorithms, combines concepts from a large linguistic database and uses a topic gisting algorithm we designed for verbose query modification purposes. Our experimental results have been obtained using the TREC Robust track collection, thirty topics classified by difficulty degree, four queries per topic classified by verbosity and length, and human assessment of query verbosity. Our results suggest that the methodology for query modification conditioned to query verbosity detection and topic gisting is significantly effective and that query modification should be refined when topic difficulty and query verbosity are considered since these two properties interact and query verbosity is not straightforwardly related to query length.  相似文献   

6.
Queries submitted to search engines can be classified according to the user goals into three distinct categories: navigational, informational, and transactional. Such classification may be useful, for instance, as additional information for advertisement selection algorithms and for search engine ranking functions, among other possible applications. This paper presents a study about the impact of using several features extracted from the document collection and query logs on the task of automatically identifying the users’ goals behind their queries. We propose the use of new features not previously reported in literature and study their impact on the quality of the query classification task. Further, we study the impact of each feature on different web collections, showing that the choice of the best set of features may change according to the target collection.  相似文献   

7.
This study addresses the question of whether the way in which sets of query terms are identified has an impact on the effectiveness of users’ information seeking efforts. Query terms are text strings used as input to an information access system; they are products of a method or grammar that identifies a set of query terms. We conducted an experiment that compared the effectiveness of sets of query terms identified for a single book by three different methods. One had been previously prepared by a human indexer for a back-of-the-book index. The other two were identified by computer programs that used a combination of linguistic and statistical criteria to extract terms from full text. Effectiveness was measured by (1) whether selected query terms led participants to correct answers and (2) how long it took participants to obtain correct answers. Our results show that two sets of terms – the human terms and the set selected according to the linguistically more sophisticated criteria – were significantly more effective than the third set of terms. This single case demonstrates that query languages do have a measurable impact on the effectiveness of query term languages in the interactive information access process. The procedure described in this paper can be used to assess the effectiveness for information seekers of query terms identified by any query language.  相似文献   

8.
Large-scale search engines have become a fundamental tool to efficiently access information on the Web. Typically, users expect answers in sub-second time frames, which demands highly efficient algorithms to traverse the data structures to return the top-k results. Despite different top-k algorithms that avoid processing all postings for all query terms, finding one algorithm that performs the fastest on any query is not always possible. The fastest average algorithm does not necessarily perform the best on all queries when evaluated on a per-query basis. To overcome this challenge, we propose to combine different state-of-the-art disjunctive top-k query processing algorithms to minimize the execution time by selecting the most promising one for each query. We model the selection step as a classification problem in a machine-learning setup. We conduct extensive experimentation and compare the results against state-of-the-art baselines using standard document collections and query sets. On ClueWeb12, our proposal shows a speed-up of up to 1.20x for non-blocked index organizations and 1.19x for block-based ones. Moreover, tail latencies are reduced showing proportional improvements on average, but a resulting dramatic decrease in latency variance. Given these findings, the proposed approach can be easily applied to existing search infrastructures to speed up query processing and reduce resource consumption, positively impacting providers’ operative costs.  相似文献   

9.
Professional, workplace searching is different from general searching, because it is typically limited to specific facets and targeted to a single answer. We have developed the semantic component (SC) model, which is a search feature that allows searchers to structure and specify the search to context-specific aspects of the main topic of the documents. We have tested the model in an interactive searching study with family doctors with the purpose to explore doctors’ querying behaviour, how they applied the means for specifying a search, and how these features contributed to the search outcome. In general, the doctors were capable of exploiting system features and search tactics during the searching. Most searchers produced well-structured queries that contained appropriate search facets. When searches failed it was not due to query structure or query length. Failures were mostly caused by the well-known vocabulary problem. The problem was exacerbated by using certain filters as Boolean filters. The best working queries were structured into 2–3 main facets out of 3–5 possible search facets, and expressed with terms reflecting the focal view of the search task. The findings at the same time support and extend previous results about query structure and exhaustivity showing the importance of selecting central search facets and express them from the perspective of search task. The SC model was applied in the highest performing queries except one. The findings suggest that the model might be a helpful feature to structure queries into central, appropriate facets, and in returning highly relevant documents.  相似文献   

10.
To improve search engine effectiveness, we have observed an increased interest in gathering additional feedback about users’ information needs that goes beyond the queries they type in. Adaptive search engines use explicit and implicit feedback indicators to model users or search tasks. In order to create appropriate models, it is essential to understand how users interact with search engines, including the determining factors of their actions. Using eye tracking, we extend this understanding by analyzing the sequences and patterns with which users evaluate query result returned to them when using Google. We find that the query result abstracts are viewed in the order of their ranking in only about one fifth of the cases, and only an average of about three abstracts per result page are viewed at all. We also compare search behavior variability with respect to different classes of users and different classes of search tasks to reveal whether user models or task models may be greater predictors of behavior. We discover that gender and task significantly influence different kinds of search behaviors discussed here. The results are suggestive of improvements to query-based search interface designs with respect to both their use of space and workflow.  相似文献   

11.
Both general and domain-specific search engines have adopted query suggestion techniques to help users formulate effective queries. In the specific domain of literature search (e.g., finding academic papers), the initial queries are usually based on a draft paper or abstract, rather than short lists of keywords. In this paper, we investigate phrasal-concept query suggestions for literature search. These suggestions explicitly specify important phrasal concepts related to an initial detailed query. The merits of phrasal-concept query suggestions for this domain are their readability and retrieval effectiveness: (1) phrasal concepts are natural for academic authors because of their frequent use of terminology and subject-specific phrases and (2) academic papers describe their key ideas via these subject-specific phrases, and thus phrasal concepts can be used effectively to find those papers. We propose a novel phrasal-concept query suggestion technique that generates queries by identifying key phrasal-concepts from pseudo-labeled documents and combines them with related phrases. Our proposed technique is evaluated in terms of both user preference and retrieval effectiveness. We conduct user experiments to verify a preference for our approach, in comparison to baseline query suggestion methods, and demonstrate the effectiveness of the technique with retrieval experiments.  相似文献   

12.
Query auto completion (QAC) models recommend possible queries to web search users when they start typing a query prefix. Most of today’s QAC models rank candidate queries by popularity (i.e., frequency), and in doing so they tend to follow a strict query matching policy when counting the queries. That is, they ignore the contributions from so-called homologous queries, queries with the same terms but ordered differently or queries that expand the original query. Importantly, homologous queries often express a remarkably similar search intent. Moreover, today’s QAC approaches often ignore semantically related terms. We argue that users are prone to combine semantically related terms when generating queries.We propose a learning to rank-based QAC approach, where, for the first time, features derived from homologous queries and semantically related terms are introduced. In particular, we consider: (i) the observed and predicted popularity of homologous queries for a query candidate; and (ii) the semantic relatedness of pairs of terms inside a query and pairs of queries inside a session. We quantify the improvement of the proposed new features using two large-scale real-world query logs and show that the mean reciprocal rank and the success rate can be improved by up to 9% over state-of-the-art QAC models.  相似文献   

13.
Query suggestion is generally an integrated part of web search engines. In this study, we first redefine and reduce the query suggestion problem as “comparison of queries”. We then propose a general modular framework for query suggestion algorithm development. We also develop new query suggestion algorithms which are used in our proposed framework, exploiting query, session and user features. As a case study, we use query logs of a real educational search engine that targets K-12 students in Turkey. We also exploit educational features (course, grade) in our query suggestion algorithms. We test our framework and algorithms over a set of queries by an experiment and demonstrate a 66–90% statistically significant increase in relevance of query suggestions compared to a baseline method.  相似文献   

14.
Interactive query expansion (IQE) (c.f. [Efthimiadis, E. N. (1996). Query expansion. Annual Review of Information Systems and Technology, 31, 121–187]) is a potentially useful technique to help searchers formulate improved query statements, and ultimately retrieve better search results. However, IQE is seldom used in operational settings. Two possible explanations for this are that IQE is generally not integrated into searchers’ established information-seeking behaviors (e.g., examining lists of documents), and it may not be offered at a time in the search when it is needed most (i.e., during the initial query formulation). These challenges can be addressed by coupling IQE more closely with familiar search activities, rather than as a separate functionality that searchers must learn. In this article we introduce and evaluate a variant of IQE known as Real-Time Query Expansion (RTQE). As a searcher enters their query in a text box at the interface, RTQE provides a list of suggested additional query terms, in effect offering query expansion options while the query is formulated. To investigate how the technique is used – and when it may be useful – we conducted a user study comparing three search interfaces: a baseline interface with no query expansion support; an interface that provides expansion options during query entry, and a third interface that provides options after queries have been submitted to a search system. The results show that offering RTQE leads to better quality initial queries, more engagement in the search, and an increase in the uptake of query expansion. However, the results also imply that care must be taken when implementing RTQE interactively. Our findings have broad implications for how IQE should be offered, and form part of our research on the development of techniques to support the increased use of query expansion.  相似文献   

15.
XML is a pervasive technology for representing and accessing semi-structured data. XPath is the standard language for navigational queries on XML documents and there is a growing demand for its efficient processing.In order to increase the efficiency in executing four navigational XML query primitives, namely descendants, ancestors, children and parent, we introduce a new paradigm where traditional approaches based on the efficient traversing of nodes and edges to reconstruct the requested subtrees are replaced by a brand new one based on basic set operations which allow us to directly return the desired subtree, avoiding to create it passing through nodes and edges.Our solution stems from the NEsted SeTs for Object hieRarchies (NEASTOR) formal model, which makes use of set-inclusion relations for representing and providing access to hierarchical data. We define in-memory efficient data structures to implement NESTOR, we develop algorithms to perform the descendants, ancestors, children and parent query primitives and we study their computational complexity.We conduct an extensive experimental evaluation by using several datasets: digital archives (EAD collections), INEX 2009 Wikipedia collection, and two widely-used synthetic datasets (XMark and XGen). We show that NESTOR-based data structures and query primitives consistently outperform state-of-the-art solutions for XPath processing at execution time and they are competitive in terms of both memory occupation and pre-processing time.  相似文献   

16.
Over time, researchers have acknowledged the importance of understanding the users’ strategies in the design of search systems. However, when involving users in the comparison of search systems, methodological challenges still exist as researchers are pondering on how to handle the variability that human participants bring to the comparisons. This paper present methods for controlling the complexity of user-centered evaluations of search user interfaces through within-subjects designs, balanced task sets, time limitations, pre-formulated queries, cached result pages, and through limiting the users’ access to result documents. Additionally, we will present our experiences in using three measures – search speed, qualified search speed, and immediate accuracy – to facilitate the comparison of different search systems over studies.  相似文献   

17.
Understanding users’ navigation on the Web is important towards improving the quality of information and the speed of accessing large-scale Web data sources. Clustering of users’ navigation into sessions has been proposed in order to identify patterns and similarities which are then managed in the context of Web users oriented applications (searching, e-commerce, etc.). This paper deals with the problem of assessing the quality of user session clusters in order to make inferences regarding the users’ navigation behavior. A common model-based clustering algorithm is used to result in clusters of Web users’ sessions. These clusters are validated by using a statistical test, which measures the distances of the clusters’ distributions to infer their dissimilarity and distinguishing level. Furthermore, a visualization method is proposed in order to interpret the relation between clusters. Using real data sets, we illustrate how the proposed analysis can be applied in popular application scenarios to reveal valuable associations among Web users’ navigation sessions.  相似文献   

18.
We propose a new query reformulation approach, using a set of query concepts that are introduced to precisely denote the user’s information need. Since a document collection is considered to be a domain which includes latent primitive concepts, we identify those concepts through a local pattern discovery and a global modeling using data mining techniques. For a new query, we select its most associated primitive concepts and choose the most probable interpretations as query concepts. We discuss the issue of constructing the primitive concepts from either the whole corpus or from the retrieved set of documents. Our experiments are performed on the TREC8 collection. The experimental evaluation shows that our approach is as good as current query reformulation approaches, while being particularly effective for poorly performing queries. Moreover, we find that the approach using the primitive concepts generated from the set of retrieved documents leads to the most effective performance.  相似文献   

19.
This study aims to investigate users’ subjective well-being and loyalty towards social network sites (SNSs). Despite the growing role of network externalities in SNS continuance decisions, the SNS usage literature has paid scant attention to the relationship between network externalities, SNS identification, and users’ subjective well-being. In this study, we identify four components of network externalities: perceived network size, external prestige, compatibility, and complementarity. In the research model, both network size and external prestige are hypothesized positively to affect SNS identification. Perceived compatibility and perceived complementarity are hypothesized positively to affect user satisfaction. Satisfaction and SNS identification are hypothesized positively to affect user subjective well-being and loyalty towards the SNS. Users’ subjective well-being is hypothesized positively to affect their loyalty towards the SNS. Data collected from 615 valid users of Facebook provide strong support for most of these hypotheses. The findings indicate that perceived network size negatively affects users’ SNS identifications. Other components of network externalities have positive effects on SNS identification and satisfaction, which in turn have positive effects on users’ subjective well-being and loyalty towards SNS. Implications for theory and practice and suggestions for future research are also discussed.  相似文献   

20.
In this paper, we use time series analysis to evaluate predictive scenarios using search engine transactional logs. Our goal is to develop models for the analysis of searchers’ behaviors over time and investigate if time series analysis is a valid method for predicting relationships between searcher actions. Time series analysis is a method often used to understand the underlying characteristics of temporal data in order to make forecasts. In this study, we used a Web search engine transactional log and time series analysis to investigate users’ actions. We conducted our analysis in two phases. In the initial phase, we employed a basic analysis and found that 10% of searchers clicked on sponsored links. However, from 22:00 to 24:00, searchers almost exclusively clicked on the organic links, with almost no clicks on sponsored links. In the second and more extensive phase, we used a one-step prediction time series analysis method along with a transfer function method. The period rarely affects navigational and transactional queries, while rates for transactional queries vary during different periods. Our results show that the average length of a searcher session is approximately 2.9 interactions and that this average is consistent across time periods. Most importantly, our findings shows that searchers who submit the shortest queries (i.e., in number of terms) click on highest ranked results. We discuss implications, including predictive value, and future research.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号