Similar documents
1.
In information retrieval systems, one of the most important and difficult operations is extracting appropriate keywords from documents. This paper proposes an effective substring search method that extends the multi-keyword pattern matching machine of Aho and Corasick (AC), called the AC machine. The proposed method makes it possible to extract as many keyword candidates as possible and to select the keywords that suit the user's purpose at the retrieval stage. The method supports four types of substring search (exact, prefix, suffix and proper substring search). This paper also proposes an algorithm for constructing the retrieval structure that speeds up the substring search. Simulation results show that the retrieval time of the presented method is as fast as that of the trie-based key retrieval method.
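
The AC machine that this abstract builds on can be sketched as follows. This is a minimal version of the classic Aho-Corasick automaton (goto, failure and output functions), not the paper's extended method with prefix, suffix and proper substring search; the function names `build_ac` and `search` and the sample keywords are assumptions made for illustration.

```python
from collections import deque


def build_ac(keywords):
    """Build a classic AC machine: goto transitions, failure links, outputs."""
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every keyword into a trie (the goto function).
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Phase 2: breadth-first traversal to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, machine):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, out = machine
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


machine = build_ac(["he", "she", "his", "hers"])
print(search("ushers", machine))
```

The text is scanned once regardless of how many keywords are loaded, which is the property that makes the machine attractive as a base for multi-keyword substring search.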

2.
In sponsored search, many advertisers do not achieve their expected performance, while the search engine also has considerable room to improve its revenue. Specifically, because of improper keyword bidding, many advertisers cannot survive the competitive ad auctions to get their desired ad impressions; meanwhile, a significant portion of search queries have no ads displayed in their search result pages, even though many of them have commercial value. We propose recommending a group of relevant yet less competitive keywords to an advertiser. The advertiser then has the chance to win some (originally empty) ad slots and accumulate a number of impressions. At the same time, the revenue of the search engine can also be boosted, since many empty ad slots are filled. Mathematically, we model the problem as a mixed integer programming problem, which maximizes the advertiser revenue and the relevance of the recommended keywords while minimizing the keyword competitiveness, subject to the bid and budget constraints. By solving the problem, we can offer an advertiser an optimal group of keywords and their optimal bid prices. Simulation results show that the proposed method is highly effective in increasing ad impressions, expected clicks, advertiser revenue, and search engine revenue.

3.
In sponsored search advertising (SSA), keywords serve as the basic unit of the business model, linking three stakeholders: consumers, advertisers and search engines. This paper presents an overarching framework for keyword decisions that highlights the touchpoints in search advertising management, including four levels of keyword decisions: domain-specific keyword pool generation, keyword targeting, keyword assignment and grouping, and keyword adjustment. Using this framework, we review the state-of-the-art research literature on keyword decisions with respect to techniques, input features and evaluation metrics. Finally, we discuss evolving issues, identify potential gaps in the literature, and outline novel research perspectives for future exploration.

4.
This paper presents a new approach to query expansion in search engines through the use of general non-topical terms (NTTs) and domain-specific semi-topical terms (STTs). NTTs and STTs can be used in conjunction with topical terms (TTs) to improve precision in retrieval results. In Phase I, 20 topical queries in two domains (Health and the Social Sciences) were carried out in Google and, from the results of the queries, 800 pages were textually analysed. Of 1442 NTTs and STTs identified, 15% were shared between the two domains; 62% were NTTs and 38% were STTs; and approximately 64% occurred before and 36% after their respective topical terms (TTs). Findings of Phase II showed that query expansion through NTTs (or STTs), particularly in the ‘exact title’ and URL search options, resulted in more precise and manageable results. Statistically significant differences were found between Health and the Social Sciences vis-à-vis keyword and ‘exact phrase’ search results; however, there were no significant differences in exact title and URL search results. The ratio of exact phrase, exact title, and URL search result frequencies to keyword search result frequencies also showed statistically significant differences between the two domains. Our findings suggest that web searching could be greatly enhanced by combining NTTs (and STTs) with TTs in an initial query. Additionally, search results would improve if queries were restricted to the exact title or URL search options. Finally, we suggest the development and implementation of knowledge-based lists of NTTs (and STTs) by both general and specialized search engines to aid query expansion.

5.
Content characteristics of a webpage include factors such as keyword position in a webpage, keyword duplication, layout, and their combination. These factors may impact webpage visibility in a search engine. Four hypotheses are presented relating to the impact of selected content characteristics on webpage visibility in search engine results lists. Webpage visibility can be improved by increasing the frequency of keywords in the title, in the full-text and in both the title and full-text.

6.
This paper presents a novel IR-style keyword search model for semantic web data retrieval, distinguished from current retrieval methods. In this model, an answer to a keyword query is a connected subgraph that contains all the query keywords. In addition, the answer is minimal because no proper subgraph of it can be an answer to the query. We provide an approximation algorithm to retrieve these answers efficiently. A special ranking strategy is also proposed so that answers can be appropriately ordered. The experimental results over real datasets show that our model outperforms existing solutions with respect to effectiveness and efficiency.

7.
In this research, we evaluate the effect of gender-targeted advertising on the performance of sponsored search advertising. We analyze nearly 7,000,000 records spanning 33 consecutive months of a keyword advertising campaign from a major US retailer. To determine the effect of demographic targeting, we classify the campaign’s key phrases by the probability of being targeted at a specific gender, and we then compare the key performance indicators among these groupings using the critical sponsored search metrics of impressions, clicks, cost-per-click, sales revenue, orders, items, and return on advertising. Findings from our research show that the gender orientation of the key phrase is a significant determinant in predicting behaviors and performance, with statistically different consumer behaviors for all attributes as the probability of a male or female keyword phrase changes. However, gender-neutral phrases perform the best overall, generating 20 times the return on advertising of any gender-targeted category. Insight from this research could result in sponsored advertising efforts being more effectively targeted to searchers and potential consumers.

8.
As network analysis methods prevail, more metrics are applied to co-word networks to reveal hot topics in a field. However, few studies have examined the relationships among these metrics. To bridge this gap, this study explores the relationships among different ranking metrics, including one frequency-based and six network-based metrics, in order to understand the impact of network structural features on ranking themes on co-word networks. We collected bibliographic data from three disciplines from Web of Science (WoS), and generated 40 simulation networks following the preferential attachment assumption. Correlation analysis on the empirical and simulated networks shows strong relationships among the metrics. Their relationships are consistent across disciplines. The metrics can be categorized into three groups according to the strength of their correlations, where Degree Centrality, H-index, and Coreness are in one group, Betweenness Centrality, Clustering Coefficient, and frequency in another, and Weighted PageRank by itself. Regression analysis on the simulation networks reveals that network topology properties, such as connectivity, sparsity, and aggregation, influence the relationships among selected metrics. In addition, when comparing the top keywords ranked by the metrics in the three disciplines, we found the metrics exhibit different discriminative capacity. Coreness and H-index may be better suited for categorizing keywords rather than ranking keywords. Findings from this study contribute to a better understanding of the relationships among different metrics and provide guidance for using them effectively in different contexts.

9.
Collaborative and co-located information access is becoming increasingly common. However, fairly little attention has been devoted to the design of ubiquitous computing approaches for spontaneous exploration of large information spaces enabling co-located collaboration. We investigate whether an entity-based user interface provides a solution to support co-located search on heterogeneous devices. We present the design and implementation of QueryTogether, a multi-device collaborative search tool through which entities such as people, documents, and keywords can be used to compose queries that can be shared to a public screen or to specific users with easy touch-enabled interaction. We conducted mixed-methods user experiments with twenty-seven participants (nine groups of three people) to compare collaborative search with QueryTogether to a baseline adopting established search and collaboration interfaces. Results show that QueryTogether led to more balanced contribution and search engagement. While the overall s-recall in search was similar, in the QueryTogether condition participants found most of the relevant results earlier in the tasks, and for more than half of the queries avoided text entry by manipulating recommended entities. The video analysis demonstrated a more consistent common ground through increased attention to the common screen, and more transitions between collaboration styles. QueryTogether therefore provided a better fit for the spontaneity of ubiquitous scenarios. QueryTogether and the corresponding study demonstrate the importance of entity-based interfaces in improving collaboration by facilitating balanced participation, flexibility of collaboration styles and social processing of search entities across conversation and devices. The findings promote a vision of collaborative search support in spontaneous and ubiquitous multi-device settings, and better linking of conversation objects to searchable entities.

10.
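Li Hailin, Lin Chunpei. Science Research Management (《科研管理》), 2022, 43(1): 176-183
Traditional methods for studying the keywords of research outputs are strongly affected by subjective judgment and seldom take time into account. To address this, a keyword analysis method for research outputs based on time series clustering is proposed. The method first uses statistical analysis to verify that the order in which keywords appear reflects, to some extent, how important each keyword is in expressing the theme of a paper, and converts keyword importance into time series data. Then, from the two perspectives of the value and the trend of importance, it uses dynamic time warping to measure the similarity between keyword importance time series, and applies affinity propagation to cluster the resulting similarity matrix, thereby achieving keyword analysis of research outputs. A study of the keywords of research papers published from 2008 to 2017 in an important journal in the field of science research management shows that the new method can cluster keywords by both attention level and trend, adaptively finding central keywords that serve as the representative object of each cluster, and that it can also provide a theoretical method and decision support for the thematic analysis of research output keywords.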
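
The similarity step mentioned in this abstract, dynamic time warping (DTW), can be illustrated with a short sketch. This is the textbook DTW recurrence with an absolute-difference local cost, not the authors' implementation; the function name `dtw_distance` and the sample importance series are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b
                cost[i][j - 1],      # b[j-1] also matched to an earlier a
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


# Hypothetical yearly importance scores for two keywords.
keyword_a = [1, 2, 3]
keyword_b = [1, 2, 2, 3]
print(dtw_distance(keyword_a, keyword_b))
```

Because DTW allows one point to align with several points of the other series, two keywords whose importance follows the same shape at different speeds can still receive a small distance; pairwise distances of this kind are what a clustering step such as affinity propagation would then consume.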

11.
Real time search is an increasingly important area of information seeking on the Web. In this research, we analyze 1,005,296 user interactions with a real time search engine over a 190 day period. Using query log analysis, we investigate searching behavior, categorize search topics, and measure the economic value of this real time search stream. We examine aggregate usage of the search engine, including number of users, queries, and terms. We then classify queries into subject categories using the Google Directory topical hierarchy. We next estimate the economic value of the real time search traffic using the Google AdWords keyword advertising platform. Results show that 30% of the queries were unique (used only once in the entire dataset), which is low compared to traditional Web searching. Also, 60% of the search traffic comes from the search engine’s application program interface, indicating that real time search is heavily leveraged by other applications. There are many repeated queries over time via these application program interfaces, perhaps indicating both long term interest in a topic and the polling nature of real time queries. Concerning search topics, the most used terms dealt with technology, entertainment, and politics, reflecting both the temporal nature of the queries and, perhaps, an early adopter user base. However, 36% of the queries indicate some geographical affinity, pointing to a location-based aspect to real time search. In terms of economic value, we calculate this real time search stream to be worth approximately US $33,000,000 (US $33 M) on the online advertising market at the time of the study. We discuss the implications for search engines and content providers as real time content increasingly enters the mainstream as an information source.

12.
Diversification of web search results aims to promote documents with diverse content (i.e., covering different aspects of a query) to the top-ranked positions, to satisfy more users, enhance fairness and reduce bias. In this work, we focus on explicit diversification methods, which assume that the query aspects are known at diversification time, and leverage supervised learning methods to improve their performance in three different frameworks with different features and goals. First, in the LTRDiv framework, we focus on applying typical learning to rank (LTR) algorithms to obtain a ranking where each top-ranked document covers as many aspects as possible. We argue that such rankings optimize various diversification metrics (under certain assumptions), and hence, are likely to achieve diversity in practice. Second, in the AspectRanker framework, we apply LTR for ranking the aspects of a query with the goal of more accurately setting the aspect importance values for diversification. As features, we exploit several pre- and post-retrieval query performance predictors (QPPs) to estimate how well a given aspect is covered among the candidate documents. Finally, in the LmDiv framework, we cast the diversification problem into an alternative fusion task, namely, the supervised merging of rankings per query aspect. We again use QPPs computed over the candidate set for each aspect, and optimize an objective function that is tailored for the diversification goal. We conduct thorough comparative experiments using both the basic systems (based on the well-known BM25 matching function) and the best-performing systems (with more sophisticated retrieval methods) from previous TREC campaigns. Our findings reveal that the proposed frameworks, especially AspectRanker and LmDiv, outperform both non-diversified rankings and two strong diversification baselines (i.e., xQuAD and its variant) in terms of various effectiveness metrics.

13.
Topic evolution has been described by many approaches, from the macro level to the detailed level, by extracting topic dynamics from text in literature and other media types. However, why the evolution happens is less studied. In this paper, we focus on whether and how keyword semantics can invoke or affect topic evolution. We assume that the semantic relatedness among keywords can affect topic popularity during the literature surveying and citing process, thus invoking evolution. However, this assumption needs to be confirmed by an approach that fully considers the semantic interactions among topics. Traditional topic evolution analyses in scientometric domains cannot provide such support because they use limited semantic meanings. To address this problem, we apply Google Word2Vec, a deep learning language model, to enhance the keywords with more complete semantic information. We further develop the semantic space as an urban geographic space. We analyze the topic evolution geographically using measures of spatial autocorrelation, as if keywords were the changing lands in an evolving city. Keyword citations (a keyword receives one citation whenever a paper containing it is cited) are used as an indicator of keyword popularity. Using bibliographical datasets from the geographical natural hazard field, experimental results demonstrate that in some local areas, the popularity of keywords affects that of the surrounding keywords. However, there are no significant impacts on the evolution of all keywords. The spatial autocorrelation analysis identifies the interaction patterns (including High-High leading and High-Low suppressing) among the keywords in local areas. This approach can be regarded as an analysis framework borrowed from geospatial modeling. Moreover, the prediction results in local areas are shown to be more accurate when the spatial autocorrelations are considered.

14.
This paper studies the influence of question-related variables (closed/open and predictable/unpredictable source) on Web users’ choices of search strategy (direct address, subject directory, and search engine) in the initial stage of a search. Subjects are 54 Finnish and American students with about 2.5 years of Web searching experience. Data were gathered via a questionnaire asking for decisions for 16 questions of four types: closed/predictable source; closed/unpredictable source; open/predictable source; open/unpredictable source. The participants not only indicated a fairly high degree of familiarity with the initial search options and used different search strategies but also were influenced in their choice of an initial search strategy by question-related characteristics. Of the two question characteristics in the study, the more influential is the predictable/unpredictable source of the answer. The participants mentioned 24 different types of reasons for selecting the initial search strategy, which were grouped by their focus on questions, sources, and search strategy options. The reasons varied across question types. A model of choice behavior shows relationships among the initial question, the reasons, and the choice of initial search strategies.

15.
Latent semantic indexing (LSI) has been demonstrated to outperform lexical matching in information retrieval. However, the enormous cost associated with the singular value decomposition (SVD) of the large term-by-document matrix becomes a barrier for its application to scalable information retrieval. This work shows that information filtering using level search techniques can reduce the SVD computation cost for LSI. For each query, level search extracts a much smaller subset of the original term-by-document matrix, containing on average 27% of the original non-zero entries. When LSI is applied to such subsets, the average precision can degrade by as much as 23% due to level search filtering. However, for some document collections an increase in precision has also been observed. Further enhancement of level search can be based on a pruning scheme which deletes terms connected to only one document from the query-specific submatrix. Such pruning has achieved a 65% reduction (on average) in the number of non-zeros with a precision loss of 5% for most collections.
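
The pruning scheme described in the last two sentences can be sketched on a toy sparse matrix. The representation used here (a dict mapping each term to a dict of document weights) and the names `prune_singleton_terms` and `count_nonzeros` are assumptions for illustration only; the abstract does not specify a data structure, and the sketch covers just the pruning step, not level search or the SVD.

```python
def prune_singleton_terms(submatrix):
    """Drop terms that occur in only one document of the submatrix."""
    return {term: docs for term, docs in submatrix.items() if len(docs) > 1}


def count_nonzeros(submatrix):
    """Number of stored (term, document) entries."""
    return sum(len(docs) for docs in submatrix.values())


# Hypothetical query-specific submatrix: term -> {document id: weight}.
submatrix = {
    "retrieval": {"d1": 2, "d2": 1, "d3": 1},
    "svd": {"d1": 1},
    "matrix": {"d2": 3, "d3": 1},
    "pruning": {"d3": 1},
}
pruned = prune_singleton_terms(submatrix)
print(count_nonzeros(submatrix), count_nonzeros(pruned))
```

A term that appears in a single document contributes nothing to associations between documents, which is a plausible reason such rows can be removed with little loss in precision.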

16.
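Keyword selection in sci-tech novelty searches (科技查新检索中的关键词选择)   Cited by: 12 (self-citations: 0; citations by others: 12)
A sci-tech novelty search differs from an ordinary literature search: the searcher must query a series of literature databases against the novelty points of the project and bears legal responsibility for the objectivity and impartiality of the novelty report issued on that basis. Achieving both precision and recall for a specific project across the vast body of domestic and international literature is not easy, and keyword selection alone is often perplexing. The widespread use of multiple names for the same thing (synonyms) in scientific and technical literature is a perennial difficulty in novelty searching. To minimize missed results, the searcher should expand the search keywords and use an appropriate search strategy to cover the blind spots of keyword expression; where necessary, classification-based searching should supplement subject-based searching, so that several approaches together resolve the difficulty that synonymy poses for sharing sci-tech literature resources. Novelty searching is itself creative work. Searchers should open up their thinking with confidence and apply creative thinking to exercise their literature search skills thoroughly and freely.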

17.
A user’s single session with a Web search engine or information retrieval (IR) system may consist of seeking information on single or multiple topics, and may involve switching between tasks, that is, multitasking information behavior. Most Web search sessions consist of two queries of approximately two words. However, some Web search sessions consist of three or more queries. We present findings from two studies: first, a study of two-query search sessions on the AltaVista Web search engine, and second, a study of three or more query search sessions on the AltaVista Web search engine. We examine the degree of multitasking search and information task switching during these two sets of AltaVista Web search sessions. A sample of two-query and three or more query sessions was filtered from AltaVista transaction logs from 2002 and qualitatively analyzed. Sessions ranged in duration from less than a minute to a few hours. Findings include: (1) 81% of two-query sessions included multiple topics, (2) 91.3% of three or more query sessions included multiple topics, (3) there is a broad variety of topics in multitasking search sessions, and (4) three or more query sessions sometimes contained frequent topic changes. Multitasking is found to be a growing element in Web searching. This paper proposes an approach to interactive information retrieval (IR) contextually within a multitasking framework. The implications of our findings for Web design and further research are discussed.

18.
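Gao Jinsong, Huang Mei, Fu Jiawei. Journal of Modern Information (《现代情报》), 2021, 40(12): 130-139
[Purpose/Significance] Tracking how the research hotspots of a discipline change over time with a concise visualization is important for understanding where those hotspots are heading. Word frequency analysis is one method for analyzing research hotspots, and many visualization tools based on it exist, but these tools are limited in their ability to present annual hotspots clearly in a concise visual form. [Method/Process] This paper therefore proposes a visualization method that uses the ratio of a field's annual publication count to its total publication count to measure the contribution rate of each year's hot keywords to the overall set of hot keywords. Based on the annual contribution rate and the 80/20 rule, threshold parameters are set and adjusted to control how many high-frequency keywords are shown for each year, and the selected annual high-frequency keywords are then sorted by frequency and by year to visualize the research hotspots. [Result/Conclusion] An empirical study is carried out on the "linked data" field. The feasibility of the method is tested and evaluated by analyzing how well the high-frequency keywords it extracts match those of existing high-frequency word threshold algorithms, and by comparing its visual presentation with CiteSpace co-occurrence maps.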

19.
This paper discusses the impact of metadata implementation in a webpage on its visibility performance in a search engine results list. Influential internal and external factors of metadata implementation were identified. How these factors affect webpage visibility in a search engine results list was examined in an experimental study. Findings suggest that metadata is a good mechanism to improve webpage visibility, that the metadata subject field plays a more important role than any other metadata field, and that keywords extracted from the webpage itself, particularly from the title or full-text, are most effective. To maximize the effects, these keywords should come from both title and full-text.

20.
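Focusing specifically on R&D investment as an application area of real option pricing methods, and drawing on current domestic and international research, this paper surveys the various real option models for R&D investment and analyzes the problems with these models as well as the issues in estimating their parameters. On this basis, it examines three points of concern regarding standard real option models of R&D investment and offers a view on the balance between a model's realism and the number of parameters that must be estimated.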
