Similar Documents
20 similar documents found.
1.
2.
Across the world, millions of users interact with search engines every day to satisfy their information needs. As the Web grows bigger over time, such information needs, manifested through user search queries, also become more complex. However, there has been no systematic study that quantifies the structural complexity of Web search queries. In this research, we attempt to understand and characterize the syntactic complexity of search queries using a multi-pronged approach. We use traditional statistical language modeling techniques to quantify and compare the perplexity of queries with that of natural language (NL). We then use complex network analysis to compare the topological properties of queries issued by real Web users with those of queries generated by statistical models. Finally, we conduct experiments to study whether search engine users can identify real queries when they are presented alongside model-generated ones. The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than NL. Queries, thus, seem to represent an intermediate stage between syntactic and non-syntactic communication.
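
The perplexity comparison above relies on statistical n-gram language models. The following is a minimal sketch of bigram perplexity, assuming add-one (Laplace) smoothing, whitespace tokenization, and a single shared probability for all out-of-vocabulary words; the model order and smoothing used in the paper are not stated here, and the toy queries are invented for illustration.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def train_bigram(sentences):
    """Count bigram contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {EOS}
    for sent in sentences:
        words = sent.split()
        vocab.update(words)
        tokens = [BOS] + words + [EOS]
        contexts.update(tokens[:-1])  # every token that precedes another one
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, vocab


def perplexity(sentences, model):
    """Per-token perplexity under an add-one (Laplace) smoothed bigram model."""
    contexts, bigrams, vocab = model
    outcomes = len(vocab) + 1  # +1 reserves probability mass for unseen words
    log_prob, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = [BOS] + sent.split() + [EOS]
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + outcomes)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


if __name__ == "__main__":
    queries = ["cheap flights to paris", "weather in paris", "cheap hotels in rome"]
    model = train_bigram(queries)
    print(perplexity(["cheap flights to rome"], model))
```

Lower perplexity means the model finds the text more predictable; comparing the same model's perplexity on held-out queries and on natural-language sentences is the kind of contrast the abstract describes.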

3.
4.
The use of non-English Web search engines has become prevalent. Given the popularity of Chinese Web searching and the unique characteristics of the Chinese language, it is imperative to conduct studies focused on the analysis of Chinese Web search queries. In this paper, we report our research on character usage in Chinese search logs from a Web search engine in Hong Kong. By examining the distribution of search query terms, we found that users tended to use more diversified terms and that the usage of characters in search queries was quite different from the character usage of general online information in Chinese. After studying the Zipf distribution of n-grams for different values of n, we found that the unigram curve is the most curved of all, while the bigram curve follows the Zipf distribution best, and that the curves of n-grams with larger n (n = 3–6) have similar structures, with β-values in the range 0.66–0.86. The distribution of combined n-grams was also studied. All analyses were performed on the data both before and after the removal of function terms and incomplete terms, and they revealed similar findings. We believe the findings of this study provide insights for further research on non-English Web searching and will assist in the design of more effective Chinese Web search engines.
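
The β-values above come from fitting Zipf's law, f(r) ∝ r^(−β), where f(r) is the frequency of the n-gram at rank r. A minimal sketch, assuming β is estimated by ordinary least squares on the log-log rank/frequency points and that character n-grams are taken with a sliding window inside each query; the paper's exact fitting procedure is not given here, and the sample queries are invented.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Sliding-window character n-grams, suitable for unsegmented Chinese text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def zipf_beta(counts):
    """Estimate beta in f(r) ~ r**(-beta) by least squares on log-log points."""
    freqs = sorted((f for f in counts.values() if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted slope is -beta


if __name__ == "__main__":
    queries = ["香港天气", "香港大学", "天气预报", "香港天气预报"]
    for n in (1, 2, 3):
        # n-grams are taken within each query, never across query boundaries
        counts = Counter(g for q in queries for g in char_ngrams(q, n))
        print(n, round(zipf_beta(counts), 2))
```

Because Chinese queries are not delimited by spaces, character n-grams sidestep the need for a word segmenter, which is one reason n-gram curves for several values of n are a natural object of study here.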

5.
Identifying the perceived emotional content of music is an important aspect of easy and efficient search, retrieval, and management of the media. One of the most promising use cases of music organization is the emotion-based playlist, where automatic music emotion recognition plays a significant role in providing emotion-related information that is otherwise generally unavailable. Given the importance of the auditory system in emotion recognition and processing, in this study we propose a new cochleogram-based system for detecting affective musical content. To effectively simulate the response of the human auditory periphery, the music audio signal is processed by a detailed biophysical cochlear model, yielding an output that closely matches the characteristics of human hearing. In the proposed approach, a convolutional neural network (CNN) extracts the relevant music features from cochleogram images, which we construct directly from the response of the basilar membrane. To validate the practical implications of the proposed approach with regard to its possible integration into different digital music libraries, an extensive study was conducted to evaluate its predictive performance in different aspects of music emotion recognition. The proposed approach was evaluated on the publicly available 1000 Songs database, and the experimental results showed that it performed better than common musical features (such as tempo, mode, pitch, clarity, and perceptually motivated mel-frequency cepstral coefficients (MFCC)) as well as the official MediaEval challenge results on the same reference database. Our findings clearly show that the proposed approach can improve music emotion recognition performance and can be used as part of a state-of-the-art music information retrieval system.

6.
A growing body of studies is developing approaches to evaluating human interaction with Web search engines, including the usability and effectiveness of Web search tools. This study explores a user-centered approach to evaluating Inquirus, a Web meta-search tool developed by researchers at the NEC Research Institute. The goal of the study reported in this paper was to develop a user-centered evaluation approach covering: (1) effectiveness, based on the impact of users' interactions on their information problem and information-seeking stage, and (2) usability, including screen layout and system capabilities for users. Twenty-two volunteers searched Inquirus on their own personal information topics. The data analyzed included (1) users' pre- and post-search questionnaires and (2) Inquirus search transaction logs. Key findings include: (1) users rated Inquirus highly on various usability measures; (2) all users experienced some level of shift or change in their information problem, information seeking, and personal knowledge as a result of their Inquirus interaction; (3) different users experienced different levels of change or shift; and (4) search precision did not correlate with the other user-based measures. Some users experienced major changes or shifts in user-based variables, such as their information problem or information-seeking stage, after a low-precision search, and vice versa. Implications for the development of user-centered approaches to evaluating Web and information retrieval (IR) systems, and for further research, are discussed.

7.
The cepstrum processing method has been used on power cables (1, 2) to locate regions of damage. The method consists of: (1) observing the spectrum of an original broadband signal source, (2) injecting the signal into the cable, (3) computing the change in the observed spectrum (caused by echoes from regions of cable non-uniformity) when the signal is injected into the cable, and finally (4) computing the power spectrum of that change in the observed spectrum. A limitation of this technique is its use of band-limited spectrum analyzers, whose limited bandwidth reduces the range resolution of the estimate when used with the Fast Fourier Transform (FFT) technique. The maximum entropy method (MEM) is a more useful spectral estimator for this measurement technique. Examples are presented that compare the FFT and MEM techniques applied to practical cables.
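
Steps (1) to (4) amount to computing a cepstrum: an echo of relative amplitude a delayed by d samples multiplies the spectrum by (1 + a·e^(−j2πfd)), which adds a ripple with period 1/d to the log power spectrum, so the power spectrum of that change peaks at quefrency d. A minimal NumPy sketch, assuming a white-noise probe, a single echo of relative amplitude 0.5 modeled as a circular shift, and FFT-based spectra throughout; it illustrates the FFT baseline only and does not implement the maximum entropy method.

```python
import numpy as np


def cepstral_echo_delay(source, observed):
    """Locate an echo delay (in samples) from the probe and the observed signal."""
    n = len(source)
    eps = 1e-12  # avoids log(0) at spectral nulls
    source_power = np.abs(np.fft.rfft(source)) ** 2      # (1) spectrum of the probe alone
    observed_power = np.abs(np.fft.rfft(observed)) ** 2  # (2) spectrum with echoes included
    # (3) change in the observed spectrum, taken as a log ratio
    change = np.log(observed_power + eps) - np.log(source_power + eps)
    # (4) power spectrum of that change, i.e. the cepstrum
    cepstrum = np.abs(np.fft.irfft(change, n)) ** 2
    # The result is symmetric, so search quefrencies 1 .. n/2 - 1 only.
    delay = int(np.argmax(cepstrum[1 : n // 2])) + 1
    return delay, cepstrum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probe = rng.standard_normal(1024)  # broadband probe signal
    echo = 0.5 * np.roll(probe, 37)    # reflection delayed by 37 samples
    delay, _ = cepstral_echo_delay(probe, probe + echo)
    print("estimated echo delay:", delay)
```

Turning the delay into a distance along the cable would also require the propagation velocity, which the abstract does not give.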

8.
This paper presents mycommunityinfo.ca (MCI), an innovative World Wide Web (WWW)-based community information (CI) site, and shows how its unique approach to facilitating online CI searching on the Web reveals, through empirical data, how people use such information and communication technologies (ICTs) to address their everyday information needs. The geographic focus of this study is three communities in Southwestern Ontario. MCI unobtrusively collects query data, logged daily, from its own Web site, the Web sites of three municipal governments, and one municipal agency in this region. One year’s worth of these data was analyzed to determine the types of CI that are sought through Web searching. A content analysis of a large purposive sample of MCI’s query data reveals more specific and diverse conceptual CI needs between and within communities than those reported in other studies employing different data collection methods. As a result, the use of a centralized approach to online CI access via the WWW by other CI providers, such as the 211 network, may be a disservice to their users. Additionally, the findings demonstrate how a thorough analysis of such data may improve the informational content and overall design of municipal government Web sites. The analysis of these data also has the potential to improve current CI taxonomies.

9.
Li Na. Kejiao Wenhui (《科教文汇》), 2011, (10): 157–158
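As many modern artists broke free from the constraints of figurative art and traditional realism, they were drawn, to a greater or lesser degree, to primitive art, especially African sculpture with its primitive character. The unrestrained, soaring sense of mystery in African sculpture delighted modern artists. Such art is often tied to the culture of sorcery; under its influence, primitive art carries symbolic and metaphorical meaning, and its modeling is bold and rough without lacking attention to detail. It was precisely this uninhibited art that led modern artists to analyze, reflect on, and borrow from its style and meaning.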

10.
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem, where documents relevant to a given query might not be retrieved simply because they use different terminology to describe the same concepts. Semantic search techniques therefore aim to address these limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has shown that, while considering semantic information alone might not improve retrieval performance over keyword-based search, it enables the retrieval of relevant documents that keyword-based methods cannot retrieve. Building indices that store and provide access to semantic information during the retrieval process is therefore important. While the process of building and querying keyword-based indices is well understood, incorporating semantic information into search indices is still an open challenge. Existing work has proposed building either one unified index encompassing both textual and semantic information or separate yet integrated indices for each information type, but these approaches face limitations such as increased query processing time. In this paper, we propose to represent terms, semantic entities, semantic types, and documents with neural embeddings in the same embedding space, enabling a unified search index that covers these four information types. We perform experiments on standard and widely used document collections, including ClueWeb09-B and Robust04, to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives.
Based on our experiments, we find that when neural embeddings are used to build inverted indices, thereby relaxing the requirement that the posting-list key be explicitly observed in the indexed document, (a) retrieval efficiency increases compared to a standard inverted index, as both index size and query processing time are reduced, and (b) while retrieval efficiency, the main objective of an efficient indexing mechanism, improves with our proposed method, retrieval effectiveness remains competitive with the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.
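
The idea of building an inverted index without requiring the posting-list key to appear in the document can be illustrated with a toy index in which each key (a term, entity, or type) posts the documents whose embeddings lie closest to its own, whether or not the document contains the key. A minimal sketch, assuming all keys and documents already have vectors in one shared space and that posting membership is decided by cosine similarity with a fixed top-k cutoff; the paper's embedding training and posting criteria are not reproduced, and `build_embedding_index`, `search`, and the `term:`/`entity:`/`type:` key prefixes are hypothetical.

```python
import numpy as np


def build_embedding_index(key_vecs, doc_vecs, k):
    """Post each key (term, entity, or type) to its k most similar documents.

    key_vecs: dict mapping a hypothetical key such as "type:Animal" to a vector.
    doc_vecs: array-like of shape (num_docs, dim) in the same embedding space.
    """
    keys = list(key_vecs)
    K = np.array([key_vecs[key] for key in keys], dtype=float)
    D = np.array(doc_vecs, dtype=float)  # a copy, so the caller's array is untouched
    K /= np.linalg.norm(K, axis=1, keepdims=True)
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    sims = K @ D.T  # cosine similarities, shape (num_keys, num_docs)
    index = {}
    for row, key in enumerate(keys):
        top = np.argsort(-sims[row])[:k]
        # A document can be posted under a key it never literally contains.
        index[key] = [(int(d), float(sims[row, d])) for d in top]
    return index


def search(index, query_keys):
    """Rank documents by summed similarity over the query keys' posting lists."""
    scores = {}
    for key in query_keys:
        for doc_id, sim in index.get(key, []):
            scores[doc_id] = scores.get(doc_id, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    keys = {"term:jaguar": [1.0, 0.0], "entity:Jaguar_Cars": [0.9, 0.1], "type:Animal": [0.0, 1.0]}
    docs = [[1.0, 0.05], [0.1, 1.0], [0.7, 0.7]]
    index = build_embedding_index(keys, docs, k=2)
    print(search(index, ["term:jaguar", "type:Animal"]))
```

Fixing k bounds the length of every posting list, which is one plausible way such an index stays compact; whether this matches the mechanism behind the paper's efficiency gains is not claimed here.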

11.
Recent developments have shown that entity-based models that rely on information from the knowledge graph can improve document retrieval performance. However, given the non-transitive nature of relatedness between entities in the knowledge graph, the use of semantic relatedness measures can lead to topic drift. To address this issue, we propose a relevance-based model for entity selection based on pseudo-relevance feedback, which is then used to systematically expand the input query, leading to improved retrieval performance. We perform our experiments on the widely used TREC Web corpora and empirically show that our proposed approach to entity selection significantly improves ad hoc document retrieval compared to strong baselines. More concretely, the contributions of this work are as follows: (1) we introduce a graphical probability model that captures dependencies between entities within the query and documents; (2) we propose an unsupervised entity selection method based on the graphical model for query entity expansion and, in turn, for ad hoc retrieval; and (3) we thoroughly evaluate our method and compare it with state-of-the-art keyword- and entity-based retrieval methods. We demonstrate that the proposed retrieval model outperforms all the other baselines on ClueWeb09B and ClueWeb12B, two widely used Web corpora, on the reported rank-cutoff evaluation metrics. We also show that the proposed method is most effective on difficult queries. In addition, we compare our proposed entity selection with a state-of-the-art entity selection technique in the context of ad hoc retrieval using a basic query expansion method, and show that it provides more effective retrieval across all expansion weights and different numbers of expansion entities.
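
Entity selection from pseudo-relevance feedback can be pictured with a relevance-model style heuristic: entities are weighted by how often they appear in the top-ranked documents, scaled by those documents' normalized retrieval scores, and the best few are mixed into the query. A minimal sketch, assuming each feedback document comes with a non-negative retrieval score and a bag of linked entities; this is a generic pseudo-relevance-feedback heuristic, not the graphical probability model proposed in the paper, and all function names and entity identifiers are hypothetical.

```python
from collections import defaultdict


def select_expansion_entities(feedback_docs, query_entities, num_entities):
    """Rank entities by score-weighted frequency in pseudo-relevant documents.

    feedback_docs: (retrieval_score, {entity: count}) pairs for the top-ranked
    documents of an initial retrieval run; scores are assumed non-negative.
    """
    total_score = sum(score for score, _ in feedback_docs)
    if total_score <= 0:
        return []
    weights = defaultdict(float)
    for score, entity_counts in feedback_docs:
        doc_len = sum(entity_counts.values())
        if doc_len == 0:
            continue
        for entity, count in entity_counts.items():
            # P(e | d) weighted by the normalized document score, roughly P(d | q)
            weights[entity] += (score / total_score) * (count / doc_len)
    ranked = sorted(
        (e for e in weights if e not in query_entities),
        key=lambda e: (-weights[e], e),  # ties broken alphabetically
    )
    return ranked[:num_entities]


def expand_query(query_weights, expansion, lam):
    """Mix original query weights (1 - lam) with uniform expansion weights (lam)."""
    expanded = {t: (1 - lam) * w for t, w in query_weights.items()}
    for e in expansion:
        expanded[e] = expanded.get(e, 0.0) + lam / len(expansion)
    return expanded


if __name__ == "__main__":
    docs = [(2.0, {"Barack_Obama": 3, "White_House": 1}),
            (1.0, {"White_House": 2, "Joe_Biden": 2})]
    picked = select_expansion_entities(docs, {"Barack_Obama"}, 2)
    print(picked, expand_query({"obama": 0.5, "speech": 0.5}, picked, lam=0.3))
```

In this sketch `lam` plays the role of the expansion weight and `num_entities` the number of expansion entities, the two settings varied in the comparison described above.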

12.
After the significant discovery of superconductivity in the hole-doped nickelate compound Nd0.8Sr0.2NiO2, analyses of its electronic structure, orbital components, Fermi surfaces, and band topology could help in understanding the mechanism of its superconductivity. Based on first-principles calculations, we find that Ni states contribute the largest Fermi surface. The states form an electron pocket at Γ, while 5dxy states form a relatively larger electron pocket at A. These Fermi surfaces and symmetry characteristics can be reproduced by our two-band model, which consists of two elementary band representations: B1g@1a ⊕ A1g@1b. We find a band inversion near A, which gives rise to a pair of Dirac points along M-A below the Fermi level when spin-orbit coupling is included. Furthermore, we perform density functional theory based Gutzwiller (DFT+Gutzwiller) calculations to treat the strong correlation effects of the Ni 3d orbitals. In particular, the bandwidth of has been strongly renormalized. After the renormalization of the correlated bands, the Ni 3dxy states and the Dirac points become very close to the Fermi level. Thus, hole doping could introduce a hole pocket at A, which may be related to the observed sign change of the Hall coefficient. By introducing an additional Ni 3dxy orbital, our modified model captures the hole-pocket band and the band inversion. In addition, the nontrivial band topology of the ferromagnetic two-layer compound La3Ni2O6 is discussed, and its band inversion is associated with Ni and La 5dxy orbitals.

13.
Stochastic simulation has been very effective in many domains but has never been applied to the WWW. This study is the first to use neural networks in a stochastic simulation of the number of rejected Web pages per search query. The evaluation of the quality of search engines should involve not only the resulting set of Web pages but also an estimate of the rejected set of Web pages. The iterative radial basis function (RBF) neural network developed by Meghabghab and Nasr [Iterative RBF neural networks as meta-models for stochastic simulations, in: Second International Conference on Intelligent Processing and Manufacturing of Materials, IPMM’99, Honolulu, Hawaii, 1999, pp. 729–734] was adapted to evaluate the number of rejected Web pages on four search engines, i.e., Yahoo, AltaVista, Google, and Northern Light. Nine input variables were selected for the simulation: (1) precision, (2) overlap, (3) response time, (4) coverage, (5) update frequency, (6) Boolean logic, (7) truncation, (8) word and multi-word searching, and (9) the portion of the Web pages indexed. Typical stochastic simulation meta-modeling uses regression models in response surface methods. RBF networks are a natural candidate for such an attempt because they use a family of surfaces, each of which naturally divides an input space into two regions, X+ and X−, so that the n test patterns are assigned to either class X+ or class X−. This technique divides the set of responses to a query into accepted and rejected Web pages. To test the hypothesis that the evaluation of any search engine query should include an estimate of the number of rejected Web pages, the RBF meta-model was trained on 937 examples from a set of 9000 different simulation runs on the nine input variables. Results show that two of the variables, response time and the portion of the Web indexed, can be eliminated without affecting the evaluation results.
Results also show that the number of rejected Web pages for a specific set of search queries on these four engines is very high. In addition, a goodness measure of a search engine for a given set of queries can be designed as a function of the search engine's coverage and the normalized age of a new document in the result set for the query. This study concludes that unless search engine designers address the issues of rejected Web pages, indexing, and crawling, the use of the Web as a research tool for academic and educational purposes will remain hindered.
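
An RBF network passes each input through Gaussian basis functions centered on prototype points and combines the activations linearly; thresholding the output at zero splits the input space into an accepted region X+ and a rejected region X−. A minimal NumPy sketch, assuming the training points serve as centers, a single shared width `sigma`, and a ridge-regularized least-squares output layer; it is a plain, non-iterative RBF classifier rather than the iterative RBF meta-model of Meghabghab and Nasr, and the nine-input data and accept/reject rule are synthetic.

```python
import numpy as np


def rbf_features(X, centers, sigma):
    """Gaussian activations exp(-||x - c||^2 / (2 sigma^2)) for each center."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def train_rbf(X, y, sigma, ridge=1e-3):
    """Use the training points as centers; fit output weights by ridge least squares.

    y holds +1 (accepted, X+) or -1 (rejected, X-) labels.
    """
    X = np.asarray(X, dtype=float)
    phi = rbf_features(X, X, sigma)
    A = phi.T @ phi + ridge * np.eye(phi.shape[1])
    w = np.linalg.solve(A, phi.T @ np.asarray(y, dtype=float))
    return X, w, sigma


def predict(model, X):
    """Threshold the network output at 0 to assign each pattern to X+ or X-."""
    centers, w, sigma = model
    out = rbf_features(np.asarray(X, dtype=float), centers, sigma) @ w
    return np.where(out >= 0.0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for simulation runs with nine input variables.
    X = rng.uniform(0.0, 1.0, size=(300, 9))
    y = np.where(X[:, 0] - X[:, 2] > 0.0, 1, -1)  # hypothetical accept/reject rule
    model = train_rbf(X, y, sigma=0.5)
    print("training accuracy:", (predict(model, X) == y).mean())
```

Placing a center on every training point keeps the sketch short; with thousands of simulation runs one would normally choose fewer centers, for example by clustering.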

14.
In this paper we describe the design of CIRLab, a groupware framework for experimenting with collaborative information retrieval (CIR) techniques in different search scenarios. The framework was designed using design patterns and an object-oriented middleware platform to maximize its reusability and adaptability to new contexts with minimal programming effort. Our collaborative search application comprises three main modules: the Core, which supports various state-of-the-art CIR techniques that can be reused or extended in a distributed collaborative environment; the Facades Mediator, an event-driven notification service that allows easy integration between the Core and front-end applications; and finally the Actions Tracker, which allows researchers to perform experiments on the different elements involved in collaborative search sessions. The application of this framework is illustrated through a case study of collaborative search-driven development.
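
The Facades Mediator is described as an event-driven notification service between the Core and front-end applications. A minimal publish/subscribe sketch of that role, assuming string topics and synchronous delivery; `FacadesMediator`, `ActionsTracker`, their methods, and the `query_submitted` event are hypothetical names chosen to echo the module names above, not CIRLab's actual API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class FacadesMediator:
    """Decouples the search Core from front ends via topic-based events."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Dict[str, Any]) -> int:
        """Deliver payload to every handler of topic; return how many were called."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


class ActionsTracker:
    """Records every event it receives, e.g. for later analysis of a session."""

    def __init__(self) -> None:
        self.log: List[Dict[str, Any]] = []

    def record(self, payload: Dict[str, Any]) -> None:
        self.log.append(payload)


if __name__ == "__main__":
    mediator = FacadesMediator()
    tracker = ActionsTracker()
    mediator.subscribe("query_submitted", tracker.record)
    mediator.subscribe("query_submitted", lambda p: print("refresh UI for", p["user"]))
    mediator.publish("query_submitted", {"user": "alice", "query": "jaguar speed"})
    print(tracker.log)
```

Because the publishing side never calls front ends directly, a new front end or an experiment logger can be attached simply by subscribing, which is the kind of decoupling a mediator in this position provides.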

15.
This paper develops a typology of information cultures by synthesizing empirical and theoretical research in organization science and information science. Four information culture types are proposed. In a Result-oriented culture, the goal of information management is to enable the organization to compete and succeed in its market or sector. In a Rule-following culture, information is managed to control internal operations and to reinforce rules and policies. In a Relationship-based culture, information is managed to encourage communication, participation, and a sense of identity. In a Risk-taking culture, information is managed to encourage innovation, creativity, and the exploration of new ideas. We expect most organizations to display, to varying degrees, norms and behaviors from all four types, and we expect an organization's information culture profile to be related to its effectiveness. The paper ends by looking at the practical and theoretical value of a systematic examination of information culture and its link to organizational effectiveness.

16.
Several studies of Web server workloads have hypothesized that these workloads are self-similar. The explanation commonly advanced for this phenomenon is that the distribution of Web server requests may be heavy-tailed. However, there is another possible explanation: self-similarity can also arise from deterministic, chaotic processes. To our knowledge, this possibility has not previously been investigated, so existing studies of Web workloads lack an adequate comparison against this alternative. We conduct an empirical study of workloads from two different Web sites, a public university and a private company, using the largest datasets that have been described in the literature. Our study employs methods from nonlinear time series analysis to search for chaotic behavior in the Web logs of these two sites. While we do find that deterministic components (i.e., the well-known “weekend effect”) are significant in these time series, we find no evidence of chaotic behavior. Predictive modeling experiments contrasting heavy-tailed with deterministic models showed that both approaches were equally effective in modeling our datasets.
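
Self-similarity is usually quantified by the Hurst exponent H, where H ≈ 0.5 indicates short-range dependence and 0.5 < H < 1 indicates long-range dependence. A minimal sketch of the aggregated-variance estimator, which uses the fact that for a self-similar series the variance of block means of size m scales as m^(2H−2); it is one standard estimator, not necessarily the one used in the paper, and it does not test for chaos, which calls for nonlinear time series tools such as correlation-dimension or Lyapunov-exponent estimates.

```python
import numpy as np


def hurst_aggregated_variance(x, block_sizes):
    """Aggregated-variance estimate of the Hurst exponent H.

    For a self-similar series, Var(mean of blocks of size m) ~ m**(2H - 2),
    so H = 1 + slope / 2 on a log-log plot.
    """
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue  # too few blocks for a variance estimate
        block_means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(block_means.var()))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    slope = np.polyfit(log_m, log_var, 1)[0]
    return 1.0 + slope / 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Stand-in for requests per minute; independent counts should give H near 0.5.
    requests = rng.poisson(lam=50, size=100_000)
    print(round(hurst_aggregated_variance(requests, [1, 2, 4, 8, 16, 32, 64, 128]), 2))
```

A strong deterministic component such as a weekly cycle also inflates the variance of large blocks, so in practice it is usually removed before estimating H; otherwise it can masquerade as long-range dependence.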

17.
The paper explores the importance of internet-facilitated value co-creation, especially in cultural industries. Through an extensive review of the literature, it shows that in many industries a transformational shift is taking place from value creation to value co-creation, which is fundamentally changing the relationship between consumers and producers. In particular, the paper examines value creation and co-creation in the popular music industry. This reveals that, although much of the research on music and the internet has revolved around music piracy, evidence is now emerging that the internet is enabling some record labels, musicians, and fans to work together to co-create value for mutual benefit. The paper concludes by arguing that value co-creation is an important development that can transform the relationship between consumers and producers, and that in the popular music industry value co-creation can promote new, more positive relationships among record labels, artists, and fans.

18.
A growing body of research is beginning to explore the information-seeking behavior of Web users. The vast majority of these studies have concentrated on textual information retrieval (IR). Little research has examined how people search for non-textual information on the Internet, and few large-scale studies have investigated visual information-seeking behavior with general-purpose Web search engines. This study examined visual information needs as expressed in users’ Web image queries. The data set examined consisted of 1,025,908 sequential queries from 211,058 users of Excite, a major Internet search service. Twenty-eight terms were used to identify queries for both still and moving images, resulting in a subset of 33,149 image queries by 9855 users. We provide data on: (1) image queries – the number of queries and the number of search terms per user, (2) image search sessions – the number of queries per user and the modifications made to subsequent queries in a session, and (3) image terms – their rank/frequency distribution and the most highly used search terms. On average, there were 3.36 image queries per user, containing an average of 3.74 terms per query. Image queries contained a large number of unique terms. The most frequently occurring image-related terms appeared less than 10% of the time, with most terms occurring only once. We contrast this with earlier work by P.G.B. Enser, Journal of Documentation 51 (2) (1995) 126–170, who examined written queries for pictorial information in a non-digital environment. Implications for the development of models for visual information retrieval and for the design of Web search engines are discussed.
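
The per-user and per-query figures above can be computed in one pass over a query log once image queries have been picked out by term matching. A minimal sketch, assuming the log is a list of (user_id, query) pairs, lowercase whitespace tokenization, and that any query containing a listed term counts as an image query; `IMAGE_TERMS` is a hypothetical stand-in for the study's 28 terms, which are not listed here.

```python
from collections import Counter

# Hypothetical stand-in for the study's 28 still- and moving-image terms.
IMAGE_TERMS = {"picture", "pictures", "image", "images", "photo", "photos",
               "video", "videos", "movie", "movies"}


def image_query_stats(log):
    """Summarize image queries in a log of (user_id, query_string) pairs."""
    image_queries = []
    for user, query in log:
        terms = query.lower().split()
        if IMAGE_TERMS.intersection(terms):
            image_queries.append((user, terms))
    users = {user for user, _ in image_queries}
    term_freq = Counter(t for _, terms in image_queries for t in terms)
    n_queries = len(image_queries)
    return {
        "image_queries": n_queries,
        "users": len(users),
        "queries_per_user": n_queries / len(users) if users else 0.0,
        "terms_per_query": (sum(len(t) for _, t in image_queries) / n_queries
                            if n_queries else 0.0),
        "terms_used_once": sum(1 for c in term_freq.values() if c == 1),
        "top_terms": term_freq.most_common(3),
    }


if __name__ == "__main__":
    sample_log = [("u1", "pictures of cats"), ("u1", "cat pictures"),
                  ("u2", "weather today"), ("u3", "movie trailer")]
    print(image_query_stats(sample_log))
```

The `terms_used_once` count corresponds to the observation that most image terms occurred only once, and `term_freq` provides the rank/frequency distribution directly.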

19.
Zheng Haiwen. Kejiao Wenhui (《科教文汇》), 2014, (16): 227–228
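Xu Bing is one of the most active contemporary artists. As an artist of a particular era shaped by the '85 New Wave movement, Xu Bing did not Westernize wholesale as some artists did; instead he forged a more creative and vital art form, extensively combining Chinese and Western cultural forms to produce contradictions and collisions that provoke reflection. The Tobacco Project is a work Xu Bing has pursued since 2000, and it can be interpreted from many angles; as the artist himself sees it, the right of interpretation belongs to the audience. The author finds that Xu Bing's Tobacco Project fits closely with the ambivalence and hybridity in Homi Bhabha's postcolonial thought, which makes it an interesting lens for interpretation. This paper mainly interprets Xu Bing's Tobacco Project: Durham through Homi Bhabha's postcolonial thought.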

20.
POSIE (POSTECH Information Extraction System) is an information extraction system that uses multiple learning strategies, i.e., SmL, user-oriented learning, and separate-context learning, in a question answering framework. POSIE replaces laborious annotation with automatic instance extraction by SmL from structured Web documents, and places the user at the end of the user-oriented learning cycle. Treating information extraction as question answering simplifies the extraction procedures for a set of slots. We introduce techniques verified in the question answering framework, such as domain knowledge and instance rules, into an information extraction problem. To incrementally improve extraction performance, a sequence of user-oriented learning and separate-context learning produces context rules and generalizes them in both the learning and extraction phases. Experiments on the “continuing education” domain initially yield an F1-measure of 0.477 and a recall of 0.748 with no user training; as the size of the training document set grows, the F1-measure exceeds 0.75 with a recall of 0.772. We also obtain an F-measure of about 0.9 for five out of seven slots in the “job offering” domain.
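
The reported figures are linked by the F1-measure, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). Solving for P gives P = F1·R / (2R − F1), so a reported F1 and recall determine the precision. A small sketch, assuming F1 is the balanced (β = 1) harmonic mean; for the initial "continuing education" result (F1 = 0.477, R = 0.748) it gives a precision of about 0.35, a derived value that the abstract itself does not report.

```python
def f1(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def implied_precision(f1_score, recall):
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    denom = 2 * recall - f1_score
    if denom <= 0:
        raise ValueError("F1 must be smaller than twice the recall")
    return f1_score * recall / denom


if __name__ == "__main__":
    # Initial "continuing education" result: F1 = 0.477, recall = 0.748.
    p = implied_precision(0.477, 0.748)
    print(f"implied precision ~ {p:.3f}")
```

The gap between the high recall and the modest F1 therefore points to low initial precision, which is what the later user-oriented learning improves.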


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417