Similar Literature
20 similar documents retrieved.
1.
The retrieval effectiveness of the underlying document search component of an expert search engine can have an important impact on the effectiveness of the generated expert search results. In this large-scale study, we perform novel experiments in the context of the document search and expert search tasks of the TREC Enterprise track, to measure the influence that the performance of the document ranking has on the ranking of candidate experts. In particular, our experiments show that while expert search system performance is related to the relevance of the retrieved documents, surprisingly, it is not always the case that increasing document search effectiveness causes an increase in expert search performance. Moreover, we simulate document rankings designed with expert search performance in mind and, through a failure analysis, show why even a perfect document ranking may not result in a perfect ranking of candidate experts.

2.
Statistical language models have been successfully applied to many information retrieval tasks, including expert finding: the process of identifying experts given a particular topic. In this paper, we introduce and detail language modeling approaches that integrate the representation, association and search of experts using various textual data sources into a generative probabilistic framework. This provides a simple, intuitive, and extensible theoretical framework to underpin research into expertise search. To demonstrate the flexibility of the framework, two search strategies to find experts are modeled that incorporate different types of evidence extracted from the data, before being extended to also incorporate co-occurrence information. The models proposed are evaluated in the context of enterprise search systems within an intranet environment, where it is reasonable to assume that the list of experts is known, and that data to be mined is publicly accessible. Our experiments show that excellent performance can be achieved by using these models in such environments, and that this theoretical and empirical work paves the way for future principled extensions.
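
To make the generative framework concrete, here is a minimal sketch of a candidate-centric scoring function in the spirit of such models: a query is scored against a smoothed language model aggregated over the documents associated with a candidate. The uniform association weights and Dirichlet smoothing are illustrative assumptions, not the authors' exact models.

```python
from collections import Counter

def expert_score(query_terms, candidate_docs, collection, mu=2000):
    """p(q | candidate): product over query terms of term likelihoods
    aggregated over the candidate's associated documents, using
    Dirichlet-smoothed document language models (both choices assumed).
    Documents are represented as lists of tokens."""
    coll_counts = Counter(t for doc in collection for t in doc)
    coll_len = sum(coll_counts.values())
    score = 1.0
    for t in query_terms:
        p_t = 0.0
        for doc in candidate_docs:
            p_td = (doc.count(t) + mu * coll_counts[t] / coll_len) / (len(doc) + mu)
            p_t += p_td / len(candidate_docs)  # uniform doc-candidate association
        score *= p_t
    return score
```

Candidates would then be ranked by this score for a given query; co-occurrence evidence, as in the extended models above, would enter through non-uniform association weights.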

3.
Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, an approach generally known as supervised learning. However, supervised approaches have some problems, most notably that they require a large number of labeled training documents for accurate learning. While unlabeled documents are easily collected and plentiful, labeled documents are difficult to obtain because labeling must be done by humans. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier using bootstrapping and feature-projection techniques. Experimental results showed that the proposed method achieves reasonably useful performance compared to a supervised method. Used in a text classification task, the proposed method makes building text classification systems significantly faster and less expensive.
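
As a rough illustration of the bootstrapping idea, the sketch below seeds each category with unlabeled documents that contain its title word, trains a classifier, and iteratively absorbs high-confidence predictions into the labeled set. The feature-projection step of the paper is omitted, and the vectorizer, classifier and confidence threshold are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def bootstrap_classifier(docs, title_words, rounds=5, conf=0.9):
    """Title-word seeding plus bootstrapping; feature projection omitted."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    labels = -np.ones(len(docs), dtype=int)            # -1 = unlabeled
    for c, word in enumerate(title_words):             # seed with title words
        for i, d in enumerate(docs):
            if word in d.lower():
                labels[i] = c
    clf = MultinomialNB()
    for _ in range(rounds):
        mask = labels >= 0
        clf.fit(X[mask], labels[mask])
        proba = clf.predict_proba(X)
        confident = proba.max(axis=1) >= conf          # absorb confident docs
        labels[confident & ~mask] = proba[confident & ~mask].argmax(axis=1)
    return clf, vec
```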

4.
Currently, many software companies are looking to assemble a team of experts who can collaboratively carry out an assigned project in an agile manner. The most suitable members for an agile team are T-shaped experts, who not only have deep expertise in one skill-area but also general knowledge of a number of related skill-areas. Existing methods have used only heuristic, non-machine-learning models to form an agile team from candidates, even though machine learning has been successful in similar tasks. In addition, they have used only the number of candidates’ documents in various skill-areas to estimate the candidates’ T-shaped knowledge, while the content of those documents is also very important. To this end, we propose a multi-step method that addresses these drawbacks. In this method, we first shortlist the best possible candidates using a state-of-the-art model, then re-estimate their relevant knowledge for working in the team with a deep learning model that uses the content of the candidates’ posts on StackOverflow. Finally, we select the best possible members for the given agile team from among these candidates using an integer linear programming model. We perform our experiments on two large datasets, C# and Java, comprising 2,217,366 and 2,320,883 StackOverflow posts, respectively. On C# and Java, our method selects 68.6% and 55.2% of agile team members from among T-shaped experts, respectively, while the best baseline method selects only 49.1% and 40.2%. In addition, the results show that our method outperforms the best baseline by 8.1% and 11.4% in terms of F-measure on C# and Java, respectively.
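
The final ILP selection step could look roughly like the sketch below, which picks a fixed-size team maximizing the re-estimated knowledge scores while covering every required skill-area. All names (`scores`, `skills`, `required`) and the exact objective and constraints are illustrative assumptions, not the paper's formulation.

```python
import pulp

def select_team(scores, skills, required, size):
    """ILP: choose `size` candidates maximizing total knowledge score,
    covering each required skill-area at least once (assumed constraints)."""
    prob = pulp.LpProblem("agile_team", pulp.LpMaximize)
    x = {c: pulp.LpVariable(f"x_{c}", cat="Binary") for c in scores}
    prob += pulp.lpSum(scores[c] * x[c] for c in scores)     # objective
    prob += pulp.lpSum(x.values()) == size                   # team size
    for s in required:                                       # skill coverage
        prob += pulp.lpSum(x[c] for c in scores if s in skills[c]) >= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [c for c in scores if x[c].value() == 1]

team = select_team(scores={"alice": 0.9, "bob": 0.7, "carol": 0.6},
                   skills={"alice": {"backend"},
                           "bob": {"backend", "frontend"},
                           "carol": {"frontend"}},
                   required={"backend", "frontend"}, size=2)
```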

5.
Professional work is often regulated by procedures that shape the information seeking involved in performing a task. Yet, research on professionals’ information seeking tends to bypass procedures and depict information seeking as an informal activity. In this study we analyze two healthcare tasks governed by procedures: triage and timeouts. While information seeking is central to both procedures, we find that the coordinating nurses rarely engage in information seeking when they triage patients. Conversely, the physicians value convening for timeouts to seek information. To explain these findings we distinguish between junior and expert professionals and between uncertain and equivocal tasks. The triage procedure specifies which information to retrieve, but expert professionals such as the coordinating nurses tend to perform triage, an uncertain task, by holistic pattern recognition rather than information seeking. For timeouts, which target an equivocal task, the procedure facilitates information seeking by creating a space for open-ended collaborative reflection. Both junior and expert physicians temporarily suspend patient treatment in favor of this opportunity to reflect on their actions, though partly for different reasons. We discuss implications for models of professionals’ information seeking.

6.
DNS tunneling is a typical attack adopted by cyber-criminals to compromise victims’ devices, steal sensitive data, or perform fraudulent actions against third parties without their knowledge. The fraudulent traffic is encapsulated into DNS queries to evade intrusion detection. Unfortunately, traditional defense systems based on Deep Packet Inspection cannot always detect such traffic, and DNS tunneling has therefore worried the cybersecurity community for the past decade. In this paper, we propose a robust and reliable Deep Learning-based DNS tunneling detection approach that mines valuable insight from DNS query payloads. More precisely, several features are first extracted from the DNS flow and then arranged as bi-dimensional images. A Convolutional Neural Network automatically and adaptively learns spatial hierarchies of features, which feed a fully connected neural network for traffic classification. The proposed approach is particularly promising for predictive security systems aimed at attack detection. The effectiveness of the proposal is evaluated in several experiments using a real-world traffic dataset. The obtained results show that our approach achieves 99.99% accuracy and performs better than state-of-the-art solutions.
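
A minimal sketch of the image-based CNN classifier described above, assuming 32x32 single-channel feature images and two output classes (benign vs. tunneling); the layer sizes are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class DNSTunnelCNN(nn.Module):
    """Small CNN over bi-dimensional feature images built from DNS flows."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16 x 16 x 16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32 x 8 x 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 2),                             # benign vs tunneling
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = DNSTunnelCNN()(torch.randn(4, 1, 32, 32))        # batch of 4 images
```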

7.
The problem of quality estimation of crowdsourced work is of great importance. Although a variety of aggregation methods have been proposed to find high-quality structured claims in multiple-choice crowdsourcing tasks such as item labeling, they do not apply to more general tasks, such as article writing and brand design, with unstructured submissions. One possibility is to ask another set of crowd workers to review and grade each submission, essentially transforming unstructured submissions into structured ratings. Nevertheless, such an approach incurs unnecessary monetary cost and delay. In this paper, we address this problem by exploiting task requesters’ historical feedback and directly modeling the submission quality. We propose two embedding-based methods: the first learns a worker embedding, and the second learns both a worker embedding and a meta-information embedding, with additional consideration of neighborhood similarity. Experimental results on three large-scale crowdsourcing datasets demonstrate that our embedding-based feature-learning methods perform much better than feature-engineering methods that use popular learning-to-rank algorithms. At the same time, our methods do not require additional crowdsourced grading.
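
The first of the two embedding-based methods might be sketched as below: a learned per-worker embedding, concatenated with meta-information features, regresses the submission quality implied by requesters' historical feedback. The dimensions and the single linear head are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SubmissionQuality(nn.Module):
    """Worker embedding (+ meta features) -> predicted submission quality."""
    def __init__(self, n_workers, dim=32, n_meta=8):
        super().__init__()
        self.worker_emb = nn.Embedding(n_workers, dim)
        self.head = nn.Linear(dim + n_meta, 1)

    def forward(self, worker_ids, meta):
        z = torch.cat([self.worker_emb(worker_ids), meta], dim=1)
        return self.head(z).squeeze(1)

model = SubmissionQuality(n_workers=1000)
quality = model(torch.tensor([3, 42]), torch.randn(2, 8))  # two submissions
```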

8.
9.
Machine learning applications must continually utilize label information from the data stream to detect concept drift and adapt to dynamic behavior. Because label information is expensive to obtain, it is impractical to assume that the data stream is fully labeled, and much research has therefore focused on semi-supervised concept drift detection. Despite the large research effort in the literature, there is a lack of analysis relating the information resources required to the achievable concept drift detection accuracy. Hence, this paper aims to answer the unexplored research question “How many labeled samples are required to detect concept drift accurately?” by proposing an analytical framework to analyze and estimate the information resources required for accurate detection. Specifically, the paper decomposes the distribution-based concept drift detection task into a learning task and a dissimilarity-measurement task, which are analyzed independently. The analysis results are then correlated to estimate the number of labels required within a set of data samples to detect concept drift accurately. The accuracy of this estimate is evaluated empirically; the results suggest that the estimate is accurate when a large amount of information resources is provided. Additionally, estimation results for a state-of-the-art method and a benchmark dataset show the applicability of the proposed framework in benchmarked environments. In general, the estimate can serve as guidance in designing systems with limited information resources. This paper also hopes to assist in identifying research gaps and inspiring new research on the amount of information resources required for accurate concept drift detection.
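
For the dissimilarity-measurement subtask, one common distribution-based check is a two-sample statistical test between a reference window and the current window, as in this sketch (the KS test is our illustrative choice; the framework above is agnostic to the specific measure):

```python
from scipy.stats import ks_2samp

def drift_detected(reference, window, alpha=0.01):
    """Flag concept drift when the two samples' distributions differ
    significantly under a two-sample Kolmogorov-Smirnov test."""
    _statistic, p_value = ks_2samp(reference, window)
    return p_value < alpha
```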

10.
Zero-shot object classification aims to recognize objects of unseen classes for which no supervised data are available at training time. Recent zero-shot learning (ZSL) methods usually generate new supervised data for unseen classes by designing various deep generative networks. In this paper, we propose an end-to-end deep generative ZSL approach that trains the data generation module and the object classification module jointly, rather than separately as in the majority of existing generation-based ZSL methods. Because the ZSL assumption keeps unseen data unavailable during training, the distribution of generated unseen data shifts toward the distribution of seen data, which causes the projection domain shift problem. We therefore design a novel meta-learning optimization model to improve the proposed generation-based approach, in which the parameter initialization and the parameter-update algorithm are meta-learned to assist model convergence. We evaluate the proposed approach on five standard ZSL datasets. The joint training strategy increases average accuracy by 2.7% and 23.0% on the standard and generalized ZSL tasks, respectively, and the meta-learning optimization further improves accuracy by 5.0% and 2.1% on the two tasks. Experimental results demonstrate that the proposed approach has significant superiority in various ZSL tasks.

11.
Cost optimization continues to be a critical concern for many human resources departments; the key is to balance cost against business value. In particular, computer science organizations prefer to hire people who are expert in one skill area and have a degree of superficial knowledge in other areas, which gives them the ability to collaborate across different aspects of a project. Community Question Answering networks provide good platforms for people and organizations to share knowledge and find experts. An important issue in expert finding is that an expert has to keep updating his knowledge, even after becoming established in his field, to still be identified as an expert; a person who fails to preserve his expertise is likely to lose it. This work investigates whether taking the concept of time into account improves the quality of expertise retrieval. We propose a new method for T-shaped expert finding based on temporal expert profiling. The proposed method uses the temporal property of expertise to mine the shape of expertise for each candidate expert from his profile. To this end, for each candidate expert, we take snapshots of his expertise trees at regular time intervals and learn the relation between temporal changes in different expertise trees and the candidate’s profile. Finally, we apply a filtering technique on top of the profiling method to find the shape of expertise for candidate experts. Experimental results on a large test collection show the superiority of the proposed method, in terms of the quality of its results, over the state-of-the-art.

12.
Automatic review assignment can significantly improve the productivity of many people, such as conference organizers, journal editors and grant administrators. A general setup of the review assignment problem involves assigning a set of reviewers on a committee to a set of documents to be reviewed, under a review-quota constraint, so that the reviewers assigned to a document collectively cover its multiple topic aspects. No previous work has addressed this setup of committee review assignment while also matching multiple aspects of topics and expertise. In this paper, we tackle committee review assignment with multi-aspect expertise matching by casting it as an integer linear programming problem. The proposed algorithm can naturally accommodate any probabilistic or deterministic method for modeling multiple aspects, automating committee review assignment. Evaluation on a multi-aspect review assignment test set constructed from ACM SIGIR publications shows that the proposed algorithm is effective and efficient for committee review assignment based on multi-aspect expertise matching.
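
One plausible ILP formulation of this setup (our notation, which may differ from the paper's): let $x_{r,d} \in \{0,1\}$ assign reviewer $r$ to document $d$, let $e_{r,a} \in \{0,1\}$ indicate reviewer $r$'s expertise in aspect $a$, and let $w_{d,a}$ be the weight of aspect $a$ in document $d$:

```latex
\begin{align*}
\max_{x,\,y}\quad & \textstyle\sum_{d}\sum_{a} w_{d,a}\, y_{d,a}
  && \text{(total aspect coverage)}\\
\text{s.t.}\quad & \textstyle y_{d,a} \le \sum_{r} e_{r,a}\, x_{r,d},
  \quad y_{d,a} \in \{0,1\}
  && \text{(aspect covered only if an assigned reviewer has it)}\\
& \textstyle\sum_{d} x_{r,d} \le q_r \quad \forall r
  && \text{(review quota of reviewer $r$)}\\
& \textstyle\sum_{r} x_{r,d} = k \quad \forall d,
  \quad x_{r,d} \in \{0,1\}
  && \text{($k$ reviewers per document)}
\end{align*}
```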

13.
Early time series classification is a variant of the time series classification task in which a label must be assigned to the incoming time series as quickly as possible, without necessarily screening the whole sequence. Algorithmically, it requires fusing a decision-making method that detects the right moment to stop with a classifier that assigns a class label. The contribution of this paper is twofold. First, we present a new method for finding the best moment to perform an action (terminate/continue). Second, we propose a new learning scheme that uses classifier calibration to estimate classification accuracy. The new approach, called CALIMERA, is formalized as a cost minimization problem. Using two benchmark methodologies for early time series classification, we show that the proposed model achieves better results than the current state-of-the-art. CALIMERA’s two most serious competitors are ECONOMY and TEASER; the empirical comparison showed that the new method achieved higher accuracy than TEASER on 35 out of 45 datasets and outperformed ECONOMY on 20 out of 34 datasets.
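
As a toy version of a cost-based stopping rule (not CALIMERA's learned one), one might stop as soon as the calibrated risk of misclassifying now falls below the delay cost still to be paid by waiting out the series; the linear delay cost is an assumption:

```python
def should_stop(calibrated_conf, t, horizon, alpha=0.05):
    """Stop if the estimated misclassification cost now is no larger than
    the remaining delay cost of waiting out the series (toy cost model).
    `calibrated_conf` comes from a calibrated classifier at time step t."""
    misclassification_cost = 1.0 - calibrated_conf
    remaining_delay_cost = alpha * (horizon - t) / horizon
    return misclassification_cost <= remaining_delay_cost
```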

14.
In a multi-agent framework, distributed optimization problems are generally described as the minimization of a global objective function, where each agent can get information only from a neighborhood defined by a network topology. To solve the problem, this work presents an information-constrained strategy based on population dynamics, where payoff functions and tasks are assigned to each node of a connected graph. We prove that the so-called distributed replicator equation (DRE) converges to an optimal global outcome by means of local information exchange subject to the topological constraints of the graph. To show the application of the proposed strategy, we implement the DRE to solve an economic dispatch problem with distributed generation. We also present simulation results that illustrate the optimality and stability of the equilibrium points and the effect of typical network topologies on the convergence rate of the algorithm.
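
For orientation, the classical replicator equation compares each strategy's fitness against the population average; a distributed variant in the spirit of the DRE replaces that global average with pairwise comparisons over graph neighbors only. This is our sketch of the standard forms; the paper's exact equation may differ in normalization:

```latex
\dot{p}_i = p_i\bigl(f_i(p) - \bar{f}(p)\bigr)
\qquad\longrightarrow\qquad
\dot{p}_i = \beta\, p_i \sum_{j \in \mathcal{N}_i} p_j\bigl(f_i(p) - f_j(p)\bigr),
```

where $\mathcal{N}_i$ is the neighborhood of node $i$ in the graph and $\beta > 0$ is a rate constant.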

15.
In recent years, reasoning over knowledge graphs (KGs) has been widely adopted to empower retrieval systems, recommender systems, and question answering systems, generating a surge in research interest. Recently developed reasoning methods usually suffer from poor performance when applied to incomplete or sparse KGs, due to the lack of evidential paths that can reach target entities. To solve this problem, we propose a hybrid multi-hop reasoning model with reinforcement learning (RL) called SparKGR, which implements dynamic path completion and iterative rule guidance strategies to improve reasoning performance over sparse KGs. First, the model dynamically completes missing paths using rule guidance to augment the action space of the RL agent; this strategy effectively reduces KG sparsity, thus increasing path search efficiency. Second, an iterative optimization of rule induction and fact inference incorporates global information from the KG to guide the RL agent’s exploration; this optimization iteratively improves overall training performance. We further evaluated SparKGR on different tasks over five real-world datasets extracted from Freebase, Wikidata and NELL. The experimental results indicate that SparKGR outperforms state-of-the-art baseline models without losing interpretability.

16.
Recently, models based on the Transformer (Vaswani et al., 2017) have yielded superior results in many sequence modeling tasks. The Transformer’s ability to capture long-range dependencies and interactions makes it attractive for portfolio management (PM); however, its built-in quadratic complexity prevents direct application to the PM task. To solve this problem, we propose a deep reinforcement learning-based PM framework called LSRE-CAAN, with two important components: a long sequence representations extractor and a cross-asset attention network. Direct Policy Gradient is used to solve the sequential decision problem in the PM process. We conduct numerical experiments in three aspects using four different cryptocurrency datasets, and the empirical results show that our framework is more effective than both traditional and state-of-the-art (SOTA) online portfolio strategies, achieving a 6x return on the best dataset. In terms of risk metrics, our framework has an average volatility risk of 0.46 and an average maximum drawdown of 0.27 across the four datasets, both lower than those of the vast majority of SOTA strategies. In addition, while most SOTA strategies suffer a poor average turnover rate of roughly 50% or more, our framework enjoys a relatively low turnover rate on all datasets. An efficiency analysis shows that our framework is no longer subject to the quadratic dependency limitation.

17.
Question answering websites are becoming an ever more popular knowledge-sharing platform. On such websites, people may ask any type of question and then wait for someone else to answer it. However, askers may not obtain correct answers from appropriate experts in this manner. Recently, various approaches have been proposed to automatically find experts on question answering websites. In this paper, we propose a novel hybrid approach to effectively find experts for the category of a target question. Our approach considers user subject relevance, user reputation and the authority of a category in finding experts. A user’s subject relevance denotes the relevance of the user’s domain knowledge to the target question; a user’s reputation is derived from the user’s historical question-answering records; and user authority is derived from link analysis. Moreover, we extend the proposed approach into a question-dependent variant that considers the relevance of historical questions to the target question when deriving user domain knowledge, reputation and authority. We used a dataset obtained from Yahoo! Answer Taiwan to evaluate our approach. Our experimental results show that the proposed methods outperform conventional methods.
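
A minimal sketch of how the three evidence sources could be fused into one ranking score; the linear combination and the weights are purely illustrative assumptions, not the paper's hybrid model:

```python
def hybrid_expert_score(subject_rel, reputation, authority, w=(0.5, 0.3, 0.2)):
    """Weighted fusion of subject relevance, reputation and authority."""
    return w[0] * subject_rel + w[1] * reputation + w[2] * authority

candidates = [  # toy per-user evidence scores
    {"user": "u1", "subject_rel": 0.8, "reputation": 0.6, "authority": 0.4},
    {"user": "u2", "subject_rel": 0.5, "reputation": 0.9, "authority": 0.7},
]
ranked = sorted(candidates, reverse=True, key=lambda u: hybrid_expert_score(
    u["subject_rel"], u["reputation"], u["authority"]))
```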

18.
Project management has long suffered from a large number of project tasks, complex task relationships, and low efficiency of parallel execution. Traditional project management methods cannot represent task relationships intuitively and cannot identify and decouple project tasks with high risk coupling, which produces schedule and cost risk losses and severely undermines the effectiveness of parallel task execution. To address this problem, this paper proposes using a numerical design structure matrix (DSM) to express project task relationships, constructs a risk probability matrix and a risk loss matrix to evaluate the project's total expected risk loss, and uses a genetic algorithm to solve for the process flow with the lowest expected risk loss. A case study on process optimization for a concurrent-design space science mission project demonstrates the feasibility of this decoupling and optimization method.
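
A rough sketch of the optimization step: a permutation-encoded genetic search over task orderings that minimizes a DSM-style expected risk loss. The mutation-only GA and the risk model below are simplified assumptions, not the paper's exact formulation.

```python
import random

def expected_risk(order, prob, loss):
    """Sum risk probability x risk loss over feedback dependencies, i.e.
    pairs where task i needs the output of a task j scheduled after it."""
    pos = {t: i for i, t in enumerate(order)}
    return sum(prob[i][j] * loss[i][j]
               for i in order for j in order if pos[j] > pos[i])

def ga_min_risk(n_tasks, prob, loss, pop=50, gens=200, seed=0):
    """Mutation-only GA over task orderings (order crossover omitted)."""
    rng = random.Random(seed)
    population = [rng.sample(range(n_tasks), n_tasks) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda o: expected_risk(o, prob, loss))
        survivors = population[: pop // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.sample(range(n_tasks), 2)         # swap mutation
            child[i], child[j] = child[j], child[i]
            children.append(child)
        population = survivors + children
    return min(population, key=lambda o: expected_risk(o, prob, loss))

# Toy DSM: task 0 needs task 2's output with risk p=0.4 and loss 10,
# so orderings that schedule task 2 before task 0 carry zero risk.
best = ga_min_risk(3, prob=[[0, 0, 0.4], [0, 0, 0], [0, 0, 0]],
                      loss=[[0, 0, 10], [0, 0, 0], [0, 0, 0]])
```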

19.
Personalized recommender systems have been extensively studied in human-centered intelligent systems. Existing recommendation techniques have achieved comparable performance in predictive accuracy; however, the trade-off between recommendation accuracy and diversity poses new challenges: diversification may lead to accuracy loss, even though it can mitigate over-fitting and enhance the user experience. In this study, we propose a heuristic optimization-based recommendation model that jointly optimizes accuracy and diversity by producing a set of optimized solutions. To establish the best accuracy-diversity balance, a novel trajectory-reinforcement-based bacterial colony optimization algorithm was developed. The improved bacterial colony optimization algorithm was comprehensively evaluated against eight popular and state-of-the-art algorithms on ten benchmark testing problems of varying complexity. Furthermore, the optimization-based recommendation model was applied to a real-world recommendation dataset. The results demonstrate that the improved algorithm achieves the best overall performance on the benchmark problems in terms of convergence and diversity. In the real-world recommendation task, the proposed approach improved diversity by 1.62% to 8.62% while maintaining superior (1.88% to 40.32%) accuracy. Additionally, the proposed personalized recommendation model provides a set of nondominated solutions instead of a single solution, accommodating the ever-changing preferences of users and service providers. This work therefore demonstrates the value of an optimization-based recommendation approach for solving the accuracy-diversity trade-off.

20.
Many enterprise employees publish content outside their corporate intranet, making the Web a valuable source for identifying company experts. In this article, we thoroughly investigate the usefulness of Web search engines (WSEs) for expert search. In particular, we claim that the ranking of documentary expertise evidence provided by a WSE should also indicate the importance of that evidence. To investigate this, we mimic the rankings of seven different WSEs, attempting to reproduce their underlying ranking mechanisms, in order to search for candidate experts in the TREC CERC collection. Experimental results show that our approach is effective for expert search and can significantly improve an intranet-based expert search engine. Moreover, when the mimicking of WSEs is further improved by training, expert search performance is also generally enhanced. Finally, we show that WSEs can be mimicked as effectively using only the titles and snippets of their results instead of the full content, drastically reducing network costs.
