Similar literature
 20 similar documents found (search time: 31 ms)
1.
Acquiring information properly through machine learning requires familiarity with the available algorithms, an understanding of how they work, and knowledge of how to address the given problem in the best possible way. However, even machine-learning experts in a specific industrial field need several rounds of trial and error to apply machine learning successfully in a different industrial field. For non-experts, it is much more difficult to make accurate predictions through machine learning. In this paper, we propose an autonomic machine learning platform which provides the decision factors needed during the development of machine learning applications. In the proposed platform, machine learning processes are automated based on the specification of autonomic levels. The platform can be used to derive a high-quality learning result by minimizing experts’ interventions and reducing the number of design selections that require expert knowledge and intuition. We also demonstrate that the proposed platform is suitable for smart cities, which typically require considerable amounts of security-sensitive information.

2.
In this paper, we propose a new learning method for extracting bilingual word pairs from parallel corpora in various languages. In cross-language information retrieval, the system must deal with various languages. Therefore, automatic extraction of bilingual word pairs from parallel corpora in various languages is important. However, previous works based on statistical methods are insufficient because of the sparse data problem. Our learning method automatically acquires rules that are effective in solving the sparse data problem, using only parallel corpora and without any prior preparation of a bilingual resource (e.g., a bilingual dictionary or a machine translation system). We call this learning method Inductive Chain Learning (ICL). Moreover, a system using ICL can extract bilingual word pairs even from bilingual sentence pairs in which the grammatical structure of the source language differs from that of the target language, because the acquired rules carry the information needed to cope with the different word orders of the source and target languages in local parts of bilingual sentence pairs. Evaluation experiments demonstrated that the recall of systems based on several statistical approaches was improved through the use of ICL.

3.
Ranking is a central component in information retrieval systems; as such, many machine learning methods for building rankers have been developed in recent years. An open problem is transfer learning, i.e. how labeled training data from one domain/market can be used to build rankers for another. We propose a flexible transfer learning strategy based on sample selection. Source domain training samples are selected if the functional relationship between their features and labels does not deviate much from that of the target domain. This is achieved through a novel application of recent advances in density ratio estimation. The approach is flexible, scalable, and modular. It allows many existing supervised rankers to be adapted to the transfer learning setting. Results on two datasets (Yahoo’s Learning to Rank Challenge and Microsoft’s LETOR data) show that the proposed method gives robust improvements.

4.
Consumers often display unique habitual behaviors, and knowledge of these behaviors is of great value in predicting future demand. We investigated consumer behavior in bicycle sharing in Beijing, where demand prediction is critical for cost-effective rebalancing of bicycle locations (putting bikes where and when they will be rented) and supply (number of bicycles). We created baseline statistical demand models, borrowing methods from economics, signal processing and animal tracking to find consumption cycles of 7, 12, and 24 h and 7 days. Lorenz curves of bicycle demand revealed significant stratification of consumer behavior and a long tail of infrequent demand. To overcome the limits of traditional statistical models, we developed a deep-learning model to incorporate (1) weather and air quality, (2) time series of demand, and (3) geographical location of demand. Customer segmentation was added at a later stage, to explore the potential for improvement with customer demographics. Our final machine learning model with tuned hyperparameters yielded around 50% improvement in predictions over a discrete wavelet transform model, and 80–90% improvement in predictions over a naïve model that reflects some current industry practice. We assessed causality in the deep-learning model, finding that location and air quality had the strongest causal impact on demand. The extreme market segmentation of customer demand and our relatively short time span of data combined to make it difficult to find sufficient data on all customers for a model fit based on segmentation. We therefore reduced our model data to only the 10 most frequent customers to see whether such segmentation improves our model's predictive success. These results, though limited, suggest that customer behavior within market segments is more stable than across all customers, as was expected.

5.
林萍  吕健超 《情报科学》2023,41(2):135-142
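[Purpose/Significance] We propose a strategy based on Stacking ensemble learning for identifying the adoption of question-and-answer (Q&A) information, in order to make Q&A recommendation in online health communities more precise and to support the high-quality development of digital medical services. [Method/Process] We build a Stacking ensemble learning model that uses ensemble and non-ensemble learning methods as base learners and logistic regression (LR) as the meta-learner, and compare the prediction accuracy of single prediction models with that of Stacking models that combine base models of the same type or of different types. A dataset of chronic-disease Q&A from the “寻医问药” platform is used to verify the superiority of the model, and data from the “快速问医生有问必答120” platform are used to verify its portability. [Result/Conclusion] Compared with single prediction models, the Stacking ensemble model identifies adopted Q&A information more accurately; the model generalizes well and can be applied to different online health communities. [Innovation/Limitation] Based on the Stacking idea, this paper builds a two-stage prediction model and uses machine learning to construct the best combination of prediction models, which significantly improves the accuracy of identifying adopted Q&A information in online health communities. However, as Q&A information accumulates, the Q&A patterns of online health communities keep evolving; dynamic prediction methods that combine historical data with daily updated data are the focus of future work.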

6.
Adequate adherence is a necessary condition for success with any intervention, including computerized cognitive training designed to mitigate age-related cognitive decline. Tailored prompting systems offer promise for promoting adherence and facilitating intervention success. However, developing adherence support systems capable of just-in-time adaptive reminders requires understanding the factors that predict adherence, particularly an imminent adherence lapse. In this study we built machine learning models to predict participants’ adherence at different levels (overall and weekly) using data collected from a previous cognitive training intervention. We built machine learning models to predict adherence using a variety of baseline measures (demographic, attitudinal, and cognitive ability variables), as well as deep learning models to predict the next week's adherence using variables derived from training interactions in the previous week. Logistic regression models with selected baseline variables were able to predict overall adherence with moderate accuracy (AUROC: 0.71), while some recurrent neural network models were able to predict weekly adherence with high accuracy (AUROC: 0.84-0.86) based on daily interactions. Post hoc explanation of the machine learning models revealed that general self-efficacy, objective memory measures, and technology self-efficacy were most predictive of participants’ overall adherence, while time of training, sessions played, and game outcomes were predictive of the next week's adherence. Machine-learning-based approaches revealed that both individual difference characteristics and previous intervention interactions provide useful information for predicting adherence, and these insights can provide initial clues as to whom to target with adherence support strategies and when to provide support. This information will inform the development of technology-based, just-in-time adherence support systems.

7.
In the traditional distributed machine learning scenario, users’ private data is transmitted between clients and a central server, which creates significant potential privacy risks. To balance data privacy against the joint training of models, federated learning (FL) was proposed as a distributed machine learning procedure with privacy protection mechanisms, which can achieve multi-party collaborative computing without revealing the original data. However, in practice, FL faces a variety of challenging communication problems. This review seeks to elucidate the relationship between these communication issues by methodically assessing the development of FL communication research from three perspectives: communication efficiency, communication environment, and communication resource allocation. First, we sort out the current challenges in FL communications. Second, we collate papers related to FL communications and describe the overall development trend of the field based on their logical relationships. Finally, we discuss future research directions for communications in FL.

8.
Recently, sentiment classification has received considerable attention within the natural language processing research community. However, since most recent work on sentiment classification has been done in English, there are not enough sentiment resources in other languages. Manual construction of reliable sentiment resources is a very difficult and time-consuming task. Cross-lingual sentiment classification aims to utilize annotated sentiment resources in one language (typically English) for sentiment classification of text documents in another language. Most existing research relies on automatic machine translation services to directly project information from one language to another. However, differing term distributions between original and translated text documents and translation errors are the two main problems faced when using only machine translation. To overcome these problems, we propose a novel learning model based on active learning and semi-supervised co-training to incorporate unlabelled data from the target language into the learning process in a bi-view framework. This model attempts to enrich the training data by adding the most confident automatically-labelled examples, as well as a few of the most informative manually-labelled examples, from unlabelled data in an iterative process. Further, the model considers the density of unlabelled data so as to select more representative unlabelled examples and avoid outlier selection in active learning. The proposed model was applied to book review datasets in three different languages. Experiments showed that our model can effectively improve cross-lingual sentiment classification performance and reduce labelling effort in comparison with some baseline methods.

9.
In this paper, an adaptive Takagi–Sugeno (T–S) fuzzy controller based on reinforcement learning is proposed for controlling nonlinear dynamical systems. The parameters of the T–S fuzzy system are learned using reinforcement learning based on the actor-critic method. This on-line learning algorithm improves the controller performance over time: it learns from its own faults through the reinforcement signal from the external environment and tries to reinforce the T–S fuzzy system parameters so that they converge. The parameter updates are developed using the Lyapunov stability criterion. The proposed controller learns faster than a T–S fuzzy controller whose parameters are learned using the gradient descent method under the same conditions. Moreover, it is able to handle load changes and system uncertainties. The test is carried out on two mathematical models. In addition, the proposed controller is applied in practice to control a direct current (DC) shunt machine. The results indicate that the response of the proposed controller performs well compared with other controllers.

10.
李静  徐路路 《现代情报》2019,39(4):23-33
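[Purpose/Significance] To analyze, at a fine granularity, how hot topics in a discipline develop, and to use machine learning algorithms to predict their future trends accurately. [Method/Process] We propose a machine-learning-based method and analysis framework for predicting research hotspot trends. Taking genetic engineering as an example, a probabilistic topic model is used to identify hot research topics from the abstracts of papers in the WOS Core Collection and to build topic-evolution links. Three typical machine learning algorithms (a BP neural network, a support vector machine, and an LSTM model) are then selected for predictive analysis. Finally, the RE indicator and a precision indicator are used to evaluate the prediction performance of the algorithms, and research trends of genetic engineering in areas such as medicine and health and agriculture and food are analyzed. [Result/Conclusion] Experiments show that the LSTM model predicts the future trends of hot topics most accurately, followed by the support vector machine; the BP neural network performs worse and its predictions are not stable enough. Combined with expert consultation and a literature survey, the results indicate that the proposed method can quickly identify research topics and development trends in the genetic field, and can provide decision support and reference for assessing disciplinary trends and adjusting disciplinary structure in China.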

11.
High-resolution probabilistic load forecasting can comprehensively characterize both the uncertainties and the dynamic trends of the future load. Such information is key to the reliable operation of the future power grid with a high penetration of renewables. To this end, various high-resolution probabilistic load forecasting models have been proposed in recent decades. It is widely acknowledged that combining different models, known as a model ensemble, can further enhance prediction performance compared with a single model. However, existing model ensemble approaches for load forecasting are based on linear combinations, such as the mean value ensemble, the weighted average ensemble, and quantile regression. Linear combinations may not fully utilize the advantages of the different models, which seriously limits the performance of the model ensemble. We propose a learning ensemble approach that adopts a machine learning model to learn the optimal nonlinear combination directly from data. We theoretically demonstrate that the proposed learning ensemble approach can outperform conventional ensemble approaches. Based on the proposed learning ensemble model, we also introduce a Shapley value-based method to evaluate the contribution of each model to the ensemble. Numerical studies on field load data verify the remarkable performance of our proposed approach.
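The linear baselines named in this abstract (mean value ensemble and weighted average ensemble) can be sketched in a few lines, together with a common score for quantile forecasts. This is an illustration only: the use of the pinball loss as the score, the function names, and all numbers are assumptions and are not taken from the paper, and the paper's nonlinear learning ensemble is not reproduced here.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss of forecasts y_pred at quantile level q."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-forecasts are penalized by q, over-forecasts by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def mean_ensemble(forecasts):
    """Element-wise mean of several forecast series (one list per model)."""
    n = len(forecasts)
    return [sum(vals) / n for vals in zip(*forecasts)]


def weighted_ensemble(forecasts, weights):
    """Element-wise weighted average; weights are assumed to sum to 1."""
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*forecasts)]


# Hypothetical 0.9-quantile forecasts from two models for two time steps.
model_a = [1.0, 3.0]
model_b = [3.0, 5.0]
combined = weighted_ensemble([model_a, model_b], [0.25, 0.75])
```

Both combinations are linear in the member forecasts, which is the limitation the abstract points to; a learned ensemble would replace the fixed weights with a trained model.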

12.
13.
In this article, we focus on Chinese word segmentation by systematically incorporating non-local information based on latent variables and word-level features. Differing from previous work, which captures non-local information by using semi-Markov models, we propose an alternative method for modeling non-local information: a latent variable word segmenter employing word-level features. In order to reduce the computational complexity of learning non-local information, we further present an improved online training method, which can arrive at the same objective optimum with a significantly accelerated training speed. We find that the proposed method can help the learning of long-range dependencies and improve the segmentation quality of long words (for example, complicated named entities). Experimental results demonstrate that the proposed method is effective. With this improvement, evaluations on the data of the second SIGHAN CWS bakeoff show that our system is competitive with the state-of-the-art systems.

14.
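Because they are limited to the information visible at compile time and lack precise information about the input data sets and the target machine, compilers must make conservative assumptions to preserve program correctness and avoid performance degradation, and therefore often fail to achieve the best performance. To overcome the shortcomings of static optimization, this paper builds on a study of runtime optimization techniques in the Java virtual machine and, in combination with the LLVM compiler architecture, describes runtime optimization techniques for C/C++ programs.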

15.
Graph Convolutional Networks (GCNs) have been established as a fundamental approach for representation learning on graphs, based on convolution operations on non-Euclidean domains defined by graph-structured data. GCNs and their variants have achieved state-of-the-art results on classification tasks, especially in semi-supervised learning scenarios. A central challenge in semi-supervised classification is how to exploit as much as possible of the useful information encoded in the unlabeled data. In this paper, we address this issue through a novel self-training approach for improving the accuracy of GCNs on semi-supervised classification tasks. A margin score is used through a rank-based model to identify the most confident sample predictions. Such predictions are exploited as an expanded labeled set in a second-stage training step. Our approach is suitable for different GCN models. Moreover, we also propose a rank aggregation of labeled sets obtained by different GCN models. The experimental evaluation considers four GCN variations and traditional benchmarks extensively used in the literature. Significant accuracy gains were achieved for all evaluated models, reaching results comparable or superior to the state of the art. The best results were achieved for rank aggregation self-training on combinations of the four GCN models.
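The selection step described in this abstract, ranking unlabeled samples by a margin score and promoting the most confident ones to pseudo-labels, can be sketched as follows. This is a minimal sketch under stated assumptions: the margin is taken here as the gap between the two largest predicted class probabilities, which is a common definition but is not confirmed by the abstract, and the function names and probability values are hypothetical.

```python
def margin_score(probs):
    """Gap between the largest and second-largest class probability (assumed definition)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_confident(prob_rows, unlabeled_ids, k):
    """Return (node id, pseudo-label) pairs for the k unlabeled nodes with the largest margin."""
    scored = [(margin_score(prob_rows[i]), i) for i in unlabeled_ids]
    # Rank by decreasing margin; break ties by node id for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    selected = []
    for _, i in scored[:k]:
        row = prob_rows[i]
        label = max(range(len(row)), key=row.__getitem__)  # index of the top class
        selected.append((i, label))
    return selected


# Hypothetical softmax outputs of a first-stage model for three unlabeled nodes.
rows = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
]
pseudo_labeled = select_confident(rows, [0, 1, 2], k=2)
```

In a second-stage training step the returned pairs would be added to the labeled set; node 1, whose top two classes are nearly tied, is left out.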

16.
何晓庆  蔡娜 《软科学》2013,27(1):141-144
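The combination method first uses a support vector machine prediction algorithm and first-order exponential smoothing to forecast an economic time series separately, and then builds a fuzzy adaptive variable-weight combination forecasting model from the two. To assess the forecasting performance of this model for economic time series, two fixed-weight combination models are chosen for comparison: an equal-weight (average) model and a combination model that minimizes the sum of squared errors. The experimental comparison shows that the fuzzy adaptive variable-weight combination exploits the strengths of the individual forecasting methods, is considerably more accurate than either single model, outperforms the fixed-weight combinations, and has high practical value for forecasting economic time series.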
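First-order exponential smoothing and the idea of a combination whose weights change over time can be sketched as follows. Note the assumptions: the weights below are simply set inversely proportional to each model's most recent absolute error, which is a stand-in and not the paper's fuzzy adaptive scheme; the second forecast series is a placeholder for the output of a support vector machine model, which is not implemented here; all names and numbers are hypothetical.

```python
def exp_smooth_forecasts(series, alpha):
    """One-step-ahead forecasts from first-order exponential smoothing.

    forecasts[t] is the forecast of series[t]; the first forecast is
    initialized to series[0].
    """
    forecasts = [series[0]]
    for x in series[:-1]:
        forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
    return forecasts


def variable_weights(err_a, err_b, eps=1e-9):
    """Weights inversely proportional to the latest absolute errors (assumed rule)."""
    inv_a = 1.0 / (abs(err_a) + eps)
    inv_b = 1.0 / (abs(err_b) + eps)
    w_a = inv_a / (inv_a + inv_b)
    return w_a, 1.0 - w_a


def combine(f_a, f_b, actual):
    """Combine two forecast series, re-weighting at each step from the previous errors."""
    combined = [0.5 * f_a[0] + 0.5 * f_b[0]]  # no error history yet: equal weights
    for t in range(1, len(f_a)):
        w_a, w_b = variable_weights(actual[t - 1] - f_a[t - 1],
                                    actual[t - 1] - f_b[t - 1])
        combined.append(w_a * f_a[t] + w_b * f_b[t])
    return combined


actual = [10.0, 20.0, 30.0]
smooth = exp_smooth_forecasts(actual, alpha=0.5)
other = [12.0, 18.0, 33.0]  # placeholder for a second model's forecasts
result = combine(smooth, other, actual)
```

A fixed-weight combination corresponds to returning the same pair of weights at every step, which is the baseline the abstract compares against.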

17.
Similarity search with hashing has become one of the fundamental research topics in computer vision and multimedia. Current research on semantic-preserving hashing mainly focuses on exploring the semantic similarities between pointwise or pairwise samples in the visual space to generate discriminative hash codes. However, such learning schemes fail to explore the intrinsic latent features embedded in the high-dimensional feature space and have difficulty capturing the underlying topological structure of the data, yielding low-quality hash codes for image retrieval. In this paper, we propose an ordinal-preserving latent graph hashing (OLGH) method, which derives the objective hash codes from the latent space and preserves the high-order local topological structure of the data in the learned hash codes. Specifically, we design a triplet-constrained topology-preserving loss to uncover the ordinal-inferred local features in binary representation learning. By virtue of this, the learning system can implicitly capture the high-order similarities among samples during the feature learning process. Moreover, latent subspace learning is built to acquire noise-free latent features based on sparse-constrained supervised learning. As such, the latent under-explored characteristics of the data are fully employed in subspace construction. Furthermore, the latent ordinal graph hashing is formulated by jointly exploiting latent space construction and ordinal graph learning. An efficient optimization algorithm is developed to solve the resulting problem. Extensive experiments conducted on diverse datasets show the effectiveness and superiority of the proposed method when compared to some advanced learning-to-hash algorithms for fast image retrieval. The source code of this paper is available at https://github.com/DarrenZZhang/OLGH .

18.
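Enterprise knowledge management is an emerging management approach of the knowledge-economy era, and one of its important research topics is the knowledge management of organization members and human resources. Traditional knowledge management methods rely on closed historical enterprise management data and cannot mine enterprise management knowledge in depth. This paper explores a new enterprise knowledge management method from the perspective of behavior modeling on network data and multi-source data fusion. With enterprise knowledge management as the research goal, employees of a petrochemical enterprise as the research subjects, natural language analysis and processing as the means of pattern representation, and statistical machine learning as the knowledge analysis tool, the work focuses on how network behavior modeling and multi-source data fusion can be used to build an effective mechanism for analyzing employees' personalities and daily behavior more comprehensively, thereby enabling enterprise knowledge management and performance management in the era of open networks. An experimental trial of knowledge management inside a petrochemical enterprise shows that the proposed method allows deeper analysis and understanding of employee behavior and more effectively builds correlation models between employee behavior and several key human-resource indicators, thus providing strong technical support for the enterprise's knowledge management and decision makers.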

19.
李欣  温阳  黄鲁成  苗红 《科研管理》2021,42(1):20-32
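Research fronts are the most promising and forward-looking research directions in the process of scientific and technological innovation; identifying them as early as possible is crucial for scientific research, for the optimal allocation of corporate R&D resources, and for the forward-looking deployment of government innovation strategies. To address shortcomings in current work on research front identification, we propose a machine-learning-based method for identifying research fronts. The method first builds a machine learning model to identify potential highly cited papers, which addresses the time lag of identifying research fronts through citation analysis, and adds these potential highly cited papers to the core document set of highly cited papers used for identification. Second, taking the core document set as the data source, cluster analysis is used to identify research front topics, which are then compared and evaluated to identify the research fronts. Finally, an empirical study in the field of solar photovoltaic cells verifies the feasibility and effectiveness of the method, which provides a new approach to research front identification.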

20.
Existing approaches to learning path recommendation for online learning communities mainly rely on the individual characteristics of users or the historical records of their learning processes, but pay less attention to the semantics of users’ postings and the context. To facilitate the knowledge understanding and personalized learning of users in online learning communities, it is necessary to conduct a fine-grained analysis of user data to capture their dynamic learning characteristics and potential knowledge levels, so as to recommend appropriate learning paths. In this paper, we propose a fine-grained and multi-context-aware learning path recommendation model for online learning communities based on a knowledge graph. First, we design a multidimensional knowledge graph to solve the problem of monotonous and incomplete presentation of entity information in a single-layer knowledge graph. Second, we use the topic preference features of users’ postings to determine the starting point of learning paths. We then strengthen the distant relationships of knowledge in the global context using the multidimensional knowledge graph when generating and recommending learning paths. Finally, we build a user background similarity matrix to establish user connections in the local context, so as to recommend users with similar knowledge levels and learning preferences and synchronize their subsequent postings. Experimental results show that the proposed model can recommend appropriate learning paths for users, and that the recommended similar users and postings are effective.
