var index_json = [{"summary": " The article is devoted to the problem of small learning samples in machine\nlearning. The flaws of maximum likelihood learning and minimax learning are\nlooked into and the concept of minimax deviation learning is introduced that is\nfree of those flaws.\n", "id": "1707.04849v1", "title": "Minimax deviation strategies for machine learning and recognition with\n short learning samples"}, {"summary": " We conduct an empirical study of machine learning functionalities provided by\nmajor cloud service providers, which we call machine learning clouds.\nMachine learning clouds hold the promise of hiding all the sophistication of\nrunning large-scale machine learning: Instead of specifying how to run a\nmachine learning task, users only specify what machine learning task to run and\nthe cloud figures out the rest. Raising the level of abstraction, however,\nrarely comes free --- a performance penalty is possible. How good, then, are\ncurrent machine learning clouds on real-world machine learning workloads?\n We study this question by presenting mlbench, a novel benchmark dataset\nconstructed with the top winning code for all available competitions on Kaggle,\nas well as the results we obtained by running mlbench on machine learning\nclouds from both Azure and Amazon. 
We analyze the strength and weakness of\nexisting machine learning clouds and discuss potential future directions.\n", "id": "1707.09562v2", "title": "mlbench: How Good Are Machine Learning Clouds for Binary Classification\n with Good Features?"}, {"summary": " Introduction to Machine learning covering Statistical Inference (Bayes, EM,\nML/MaxEnt duality), algebraic and spectral methods (PCA, LDA, CCA, Clustering),\nand PAC learning (the Formal model, VC dimension, Double Sampling theorem).\n", "id": "0904.3664v1", "title": "Introduction to Machine Learning: Class Notes 67577"}, {"summary": " In this paper, we propose AutoCompete, a highly automated machine learning\nframework for tackling machine learning competitions. This framework has been\nlearned by us, validated and improved over a period of more than two years by\nparticipating in online machine learning competitions. It aims at minimizing\nhuman interference required to build a first useful predictive model and to\nassess the practical difficulty of a given machine learning challenge. The\nproposed system helps in identifying data types, choosing a machine learning\nmodel, tuning hyper-parameters, avoiding over-fitting and optimization for a\nprovided evaluation metric. We also observe that the proposed system produces\nbetter (or comparable) results with less runtime as compared to other\napproaches.\n", "id": "1507.02188v1", "title": "AutoCompete: A Framework for Machine Learning Competition"}, {"summary": " We introduce a new method for training deep Boltzmann machines jointly. 
Prior\nmethods require an initial learning pass that trains the deep Boltzmann machine\ngreedily, one layer at a time, or do not perform well on classification\ntasks.\n", "id": "1212.2686v1", "title": "Joint Training of Deep Boltzmann Machines"}, {"summary": " This is the Proceedings of the ICML Workshop on #Data4Good: Machine Learning\nin Social Good Applications, which was held on June 24, 2016 in New York.\n", "id": "1607.02450v2", "title": "Proceedings of the 2016 ICML Workshop on #Data4Good: Machine Learning in\n Social Good Applications"}, {"summary": " In this article, we extend the conventional framework of\nconvolutional-Restricted-Boltzmann-Machine to learn highly abstract features\namong arbitrary number of time related input maps by constructing a layer of\nmultiplicative units, which capture the relations among inputs. In many cases,\nmore than two maps are strongly related, so it is wise to make multiplicative\nunit learn relations among more input maps, in other words, to find the optimal\nrelational-order of each unit. In order to enable our machine to learn\nrelational order, we developed a reinforcement-learning method whose optimality\nis proven to train the network.\n", "id": "1706.08001v1", "title": "Temporal-related Convolutional-Restricted-Boltzmann-Machine capable of\n learning relational order via reinforcement learning procedure?"}, {"summary": " We propose a clustering-based iterative algorithm to solve certain\noptimization problems in machine learning, where we start the algorithm by\naggregating the original data, solving the problem on aggregated data, and then\nin subsequent steps gradually disaggregate the aggregated data. We apply the\nalgorithm to common machine learning problems such as the least absolute\ndeviation regression problem, support vector machines, and semi-supervised\nsupport vector machines. We derive model-specific data aggregation and\ndisaggregation procedures. 
We also show optimality, convergence, and the\noptimality gap of the approximated solution in each iteration. A computational\nstudy is provided.\n", "id": "1607.01400v1", "title": "An Aggregate and Iterative Disaggregate Algorithm with Proven Optimality\n in Machine Learning"}, {"summary": " The engineering of machine learning systems is still a nascent field; relying\non a seemingly daunting collection of quickly evolving tools and best\npractices. It is our hope that this guidebook will serve as a useful resource\nfor machine learning practitioners looking to take advantage of Bayesian\noptimization techniques. We outline four example machine learning problems that\ncan be solved using open source machine learning libraries, and highlight the\nbenefits of using Bayesian optimization in the context of these common machine\nlearning applications.\n", "id": "1612.04858v1", "title": "Bayesian Optimization for Machine Learning : A Practical Guidebook"}, {"summary": " As machine learning systems become ubiquitous, there has been a surge of\ninterest in interpretable machine learning: systems that provide explanation\nfor their outputs. These explanations are often used to qualitatively assess\nother criteria such as safety or non-discrimination. However, despite the\ninterest in interpretability, there is very little consensus on what\ninterpretable machine learning is and how it should be measured. In this\nposition paper, we first define interpretability and describe when\ninterpretability is needed (and when it is not). 
Next, we suggest a taxonomy\nfor rigorous evaluation and expose open questions towards a more rigorous\nscience of interpretable machine learning.\n", "id": "1702.08608v2", "title": "Towards A Rigorous Science of Interpretable Machine Learning"}, {"summary": " Despite incredible recent advances in machine learning, building machine\nlearning applications remains prohibitively time-consuming and expensive for\nall but the best-trained, best-funded engineering organizations. This expense\ncomes not from a need for new and improved statistical models but instead from\na lack of systems and tools for supporting end-to-end machine learning\napplication development, from data preparation and labeling to\nproductionization and monitoring. In this document, we outline opportunities\nfor infrastructure supporting usable, end-to-end machine learning applications\nin the context of the nascent DAWN (Data Analytics for What's Next) project at\nStanford.\n", "id": "1705.07538v2", "title": "Infrastructure for Usable Machine Learning: The Stanford DAWN Project"}, {"summary": " Despite the promise of brain-inspired machine learning, deep neural networks\n(DNN) have frustratingly failed to bridge the deceptively large gap between\nlearning and memory. Here, we introduce a Perpetual Learning Machine; a new\ntype of DNN that is capable of brain-like dynamic 'on the fly' learning because\nit exists in a self-supervised state of Perpetual Stochastic Gradient Descent.\nThus, we provide the means to unify learning and memory within a machine\nlearning framework. We also explore the elegant duality of abstraction and\nsynthesis: the Yin and Yang of deep learning.\n", "id": "1509.00913v3", "title": "On-the-Fly Learning in a Perpetual Learning Machine"}, {"summary": " Mechanical learning is a computing system that is based on a set of simple\nand fixed rules, and can learn from incoming data. A learning machine is a\nsystem that realizes mechanical learning. 
Importantly, we emphasis that it is\nbased on a set of simple and fixed rules, contrasting to often called machine\nlearning that is sophisticated software based on very complicated mathematical\ntheory, and often needs human intervene for software fine tune and manual\nadjustments. Here, we discuss some basic facts and principles of such system,\nand try to lay down a framework for further study. We propose 2 directions to\napproach mechanical learning, just like Church-Turing pair: one is trying to\nrealize a learning machine, another is trying to well describe the mechanical\nlearning.\n", "id": "1602.00198v1", "title": "Discussion on Mechanical Learning and Learning Machine"}, {"summary": " This is an index to the papers that appear in the Proceedings of the 29th\nInternational Conference on Machine Learning (ICML-12). The conference was held\nin Edinburgh, Scotland, June 27th - July 3rd, 2012.\n", "id": "1207.4676v2", "title": "Proceedings of the 29th International Conference on Machine Learning\n (ICML-12)"}, {"summary": " We study the problem of distributed multi-task learning with shared\nrepresentation, where each machine aims to learn a separate, but related, task\nin an unknown shared low-dimensional subspaces, i.e. when the predictor matrix\nhas low rank. We consider a setting where each task is handled by a different\nmachine, with samples for the task available locally on the machine, and study\ncommunication-efficient methods for exploiting the shared structure.\n", "id": "1603.02185v1", "title": "Distributed Multi-Task Learning with Shared Representation"}, {"summary": " We comment on the fact that gradient ascent for logistic regression has a\nconnection with the perceptron learning algorithm. 
Logistic learning is the\n\"soft\" variant of perceptron learning.\n", "id": "1708.07826v1", "title": "Logistic Regression as Soft Perceptron Learning"}, {"summary": " Machine learning based system are increasingly being used for sensitive tasks\nsuch as security surveillance, guiding autonomous vehicle, taking investment\ndecisions, detecting and blocking network intrusion and malware etc. However,\nrecent research has shown that machine learning models are vulnerable to attacks\nby adversaries at all phases of machine learning (eg, training data collection,\ntraining, operation). All model classes of machine learning systems can be\nmisled by providing carefully crafted inputs making them wrongly classify\ninputs. Maliciously created input samples can affect the learning process of a\nML system by either slowing down the learning process, or affecting the\nperformance of the learned mode, or causing the system make error(s) only in\nattacker's planned scenario. Because of these developments, understanding\nsecurity of machine learning algorithms and systems is emerging as an important\nresearch area among computer security and machine learning researchers and\npractitioners. We present a survey of this emerging area in machine learning.\n", "id": "1707.03184v1", "title": "A Survey on Resilient Machine Learning"}, {"summary": " We consider the problem of distributed multi-task learning, where each\nmachine learns a separate, but related, task. Specifically, each machine learns\na linear predictor in high-dimensional space,where all tasks share the same\nsmall support. We present a communication-efficient estimator based on the\ndebiased lasso and show that it is comparable with the optimal centralized\nmethod.\n", "id": "1510.00633v1", "title": "Distributed Multitask Learning"}, {"summary": " The problem of learning automata from example traces (but no equivalence or\nmembership queries) is fundamental in automata learning theory and practice. 
In\nthis paper we study this problem for finite state machines with inputs and\noutputs, and in particular for Moore machines. We develop three algorithms for\nsolving this problem: (1) the PTAP algorithm, which transforms a set of\ninput-output traces into an incomplete Moore machine and then completes the\nmachine with self-loops; (2) the PRPNI algorithm, which uses the well-known\nRPNI algorithm for automata learning to learn a product of automata encoding a\nMoore machine; and (3) the MooreMI algorithm, which directly learns a Moore\nmachine using PTAP extended with state merging. We prove that MooreMI has the\nfundamental identification in the limit property. We also compare the\nalgorithms experimentally in terms of the size of the learned machine and\nseveral notions of accuracy, introduced in this paper. Finally, we compare with\nOSTIA, an algorithm that learns a more general class of transducers, and find\nthat OSTIA generally does not learn a Moore machine, even when fed with a\ncharacteristic sample.\n", "id": "1605.07805v2", "title": "Learning Moore Machines from Input-Output Traces"}, {"summary": " The last decade has seen huge progress in the development of advanced machine\nlearning models; however, those models are powerless unless human users can\ninterpret them. Here we show how the mind's construction of concepts and\nmeaning can be used to create more interpretable machine learning models. By\nproposing a novel method of classifying concepts, in terms of 'form' and\n'function', we elucidate the nature of meaning and offer proposals to improve\nmodel understandability. 
As machine learning begins to permeate daily life,\ninterpretable models may serve as a bridge between domain-expert authors and\nnon-expert users.\n", "id": "1607.00279v1", "title": "Meaningful Models: Utilizing Conceptual Structure to Improve Machine\n Learning Interpretability"}, {"summary": " The emerging field of quantum machine learning has the potential to\nsubstantially aid in the problems and scope of artificial intelligence. This is\nonly enhanced by recent successes in the field of classical machine learning.\nIn this work we propose an approach for the systematic treatment of machine\nlearning, from the perspective of quantum information. Our approach is general\nand covers all three main branches of machine learning: supervised,\nunsupervised and reinforcement learning. While quantum improvements in\nsupervised and unsupervised learning have been reported, reinforcement learning\nhas received much less attention. Within our approach, we tackle the problem of\nquantum enhancements in reinforcement learning as well, and propose a\nsystematic scheme for providing improvements. As an example, we show that\nquadratic improvements in learning efficiency, and exponential improvements in\nperformance over limited time periods, can be obtained for a broad class of\nlearning problems.\n", "id": "1610.08251v1", "title": "Quantum-enhanced machine learning"}, {"summary": " This is the Proceedings of NIPS 2016 Workshop on Interpretable Machine\nLearning for Complex Systems, held in Barcelona, Spain on December 9, 2016\n", "id": "1611.09139v1", "title": "Proceedings of NIPS 2016 Workshop on Interpretable Machine Learning for\n Complex Systems"}, {"summary": " In this paper we present applications of different machine learning\nalgorithms in aquaculture. Machine learning algorithms learn models from\nhistorical data. In aquaculture historical data are obtained from farm\npractices, yields, and environmental data sources. 
Associations between these\ndifferent variables can be obtained by applying machine learning algorithms to\nhistorical data. In this paper we present applications of different machine\nlearning algorithms in aquaculture applications.\n", "id": "1405.1304v1", "title": "Application of Machine Learning Techniques in Aquaculture"}, {"summary": " TF.Learn is a high-level Python module for distributed machine learning\ninside TensorFlow. It provides an easy-to-use Scikit-learn style interface to\nsimplify the process of creating, configuring, training, evaluating, and\nexperimenting a machine learning model. TF.Learn integrates a wide range of\nstate-of-art machine learning algorithms built on top of TensorFlow's low level\nAPIs for small to large-scale supervised and unsupervised problems. This module\nfocuses on bringing machine learning to non-specialists using a general-purpose\nhigh-level language as well as researchers who want to implement, benchmark,\nand compare their new methods in a structured environment. Emphasis is put on\nease of use, performance, documentation, and API consistency.\n", "id": "1612.04251v1", "title": "TF.Learn: TensorFlow's High-level Module for Distributed Machine\n Learning"}, {"summary": " From the point of view of a programmer, the robopsychology is a synonym for\nthe activity is done by developers to implement their machine learning\napplications. This robopsychological approach raises some fundamental\ntheoretical questions of machine learning. Our discussion of these questions is\nconstrained to Turing machines. Alan Turing had given an algorithm (aka the\nTuring Machine) to describe algorithms. If it has been applied to describe\nitself then this brings us to Turing's notion of the universal machine. In the\npresent paper, we investigate algorithms to write algorithms. 
From a pedagogy\npoint of view, this way of writing programs can be considered as a combination\nof learning by listening and learning by doing due to it is based on applying\nagent technology and machine learning. As the main result we introduce the\nproblem of learning and then we show that it cannot easily be handled in\nreality therefore it is reasonable to use machine learning algorithm for\nlearning Turing machines.\n", "id": "1606.02767v2", "title": "Theoretical Robopsychology: Samu Has Learned Turing Machines"}, {"summary": " MM (majorization--minimization) algorithms are an increasingly popular tool\nfor solving optimization problems in machine learning and statistical\nestimation. This article introduces the MM algorithm framework in general and\nvia three popular example applications: Gaussian mixture regressions,\nmultinomial logistic regressions, and support vector machines. Specific\nalgorithms for the three examples are derived and numerical demonstrations are\npresented. Theoretical and practical aspects of MM algorithm design are\ndiscussed.\n", "id": "1611.03969v1", "title": "An Introduction to MM Algorithms for Machine Learning and Statistical"}, {"summary": " Since 2006, deep learning (DL) has become a rapidly growing research\ndirection, redefining state-of-the-art performances in a wide range of areas\nsuch as object recognition, image segmentation, speech recognition and machine\ntranslation. In modern manufacturing systems, data-driven machine health\nmonitoring is gaining in popularity due to the widespread deployment of\nlow-cost sensors and their connection to the Internet. Meanwhile, deep learning\nprovides useful tools for processing and analyzing these big machinery data.\nThe main purpose of this paper is to review and summarize the emerging research\nwork of deep learning on machine health monitoring. 
After the brief\nintroduction of deep learning techniques, the applications of deep learning in\nmachine health monitoring systems are reviewed mainly from the following\naspects: Auto-encoder (AE) and its variants, Restricted Boltzmann Machines and\nits variants including Deep Belief Network (DBN) and Deep Boltzmann Machines\n(DBM), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).\nFinally, some new trends of DL-based machine health monitoring methods are\ndiscussed.\n", "id": "1612.07640v1", "title": "Deep Learning and Its Applications to Machine Health Monitoring: A\n Survey"}, {"summary": " In this position paper, I first describe a new perspective on machine\nlearning (ML) by four basic problems (or levels), namely, \"What to learn?\",\n\"How to learn?\", \"What to evaluate?\", and \"What to adjust?\". The paper stresses\nmore on the first level of \"What to learn?\", or \"Learning Target Selection\".\nTowards this primary problem within the four levels, I briefly review the\nexisting studies about the connection between information theoretical learning\n(ITL [1]) and machine learning. A theorem is given on the relation between the\nempirically-defined similarity measure and information measures. Finally, a\nconjecture is proposed for pursuing a unified mathematical interpretation to\nlearning target selection.\n", "id": "1501.04309v1", "title": "Information Theory and its Relation to Machine Learning"}, {"summary": " There are two common approaches for optimizing the performance of a machine:\ngenetic algorithms and machine learning. A genetic algorithm is applied over\nmany generations whereas machine learning works by applying feedback until the\nsystem meets a performance threshold. Though these are methods that typically\noperate separately, we combine evolutionary adaptation and machine learning\ninto one approach. 
Our focus is on machines that can learn during their\nlifetime, but instead of equipping them with a machine learning algorithm we\naim to let them evolve their ability to learn by themselves. We use evolvable\nnetworks of probabilistic and deterministic logic gates, known as Markov\nBrains, as our computational model organism. The ability of Markov Brains to\nlearn is augmented by a novel adaptive component that can change its\ncomputational behavior based on feedback. We show that Markov Brains can indeed\nevolve to incorporate these feedback gates to improve their adaptability to\nvariable environments. By combining these two methods, we now also implemented\na computational model that can be used to study the evolution of learning.\n", "id": "1705.10201v2", "title": "Machine Learned Learning Machines"}, {"summary": " Recently, increased computational power and data availability, as well as\nalgorithmic advances, have led machine learning techniques to impressive\nresults in regression, classification, data-generation and reinforcement\nlearning tasks. Despite these successes, the proximity to the physical limits\nof chip fabrication alongside the increasing size of datasets are motivating a\ngrowing number of researchers to explore the possibility of harnessing the\npower of quantum computation to speed-up classical machine learning algorithms.\nHere we review the literature in quantum machine learning and discuss\nperspectives for a mixed readership of classical machine learning and quantum\ncomputation experts. Particular emphasis will be placed on clarifying the\nlimitations of quantum algorithms, how they compare with their best classical\ncounterparts and why quantum resources are expected to provide advantages for\nlearning problems. Learning in the presence of noise and certain\ncomputationally hard problems in machine learning are identified as promising\ndirections for the field. 
Practical questions, like how to upload classical\ndata into quantum form, will also be addressed.\n", "id": "1707.08561v2", "title": "Quantum machine learning: a classical perspective"}, {"summary": " In [1], we introduced mechanical learning and proposed 2 approaches to\nmechanical learning. Here, we follow one such approach to well describe the\nobjects and the processes of learning. We discuss 2 kinds of patterns:\nobjective and subjective pattern. Subjective pattern is crucial for learning\nmachine. We prove that for any objective pattern we can find a proper\nsubjective pattern based upon least base patterns to express the objective\npattern well. X-form is algebraic expression for subjective pattern. Collection\nof X-forms form internal representation space, which is center of learning\nmachine. We discuss learning by teaching and without teaching. We define data\nsufficiency by X-form. We then discussed some learning strategies. We show, in\neach strategy, with sufficient data, and with certain capabilities, learning\nmachine indeed can learn any pattern (universal learning machine). In appendix,\nwith knowledge of learning machine, we try to view deep learning from a\ndifferent angle, i.e. its internal representation space and its learning\ndynamics.\n", "id": "1706.00066v1", "title": "Descriptions of Objectives and Processes of Mechanical Learning"}, {"summary": " Using an optimization algorithm to solve a machine learning problem is one of\nmainstreams in the field of science. In this work, we demonstrate a\ncomprehensive comparison of some state-of-the-art first-order optimization\nalgorithms for convex optimization problems in machine learning. We concentrate\non several smooth and non-smooth machine learning problems with a loss function\nplus a regularizer. 
The overall experimental results show the superiority of\nprimal-dual algorithms in solving a machine learning problem from the\nperspectives of the ease to construct, running time and accuracy.\n", "id": "1404.6674v1", "title": "A Comparison of First-order Algorithms for Machine Learning"}, {"summary": " Understanding why machine learning models behave the way they do empowers\nboth system designers and end-users in many ways: in model selection, feature\nengineering, in order to trust and act upon the predictions, and in more\nintuitive user interfaces. Thus, interpretability has become a vital concern in\nmachine learning, and work in the area of interpretable models has found\nrenewed interest. In some applications, such models are as accurate as\nnon-interpretable ones, and thus are preferred for their transparency. Even\nwhen they are not accurate, they may still be preferred when interpretability\nis of paramount importance. However, restricting machine learning to\ninterpretable models is often a severe limitation. In this paper we argue for\nexplaining machine learning predictions using model-agnostic approaches. By\ntreating the machine learning models as black-box functions, these approaches\nprovide crucial flexibility in the choice of models, explanations, and\nrepresentations, improving debugging, comparison, and interfaces for a variety\nof users and models. We also outline the main challenges for such methods, and\nreview a recently-introduced model-agnostic explanation approach (LIME) that\naddresses these challenges.\n", "id": "1606.05386v1", "title": "Model-Agnostic Interpretability of Machine Learning"}, {"summary": " Which topics of machine learning are most commonly addressed in research?\nThis question was initially answered in 2007 by doing a qualitative survey\namong distinguished researchers. In our study, we revisit this question from a\nquantitative perspective. 
Concretely, we collect 54K abstracts of papers\npublished between 2007 and 2016 in leading machine learning journals and\nconferences. We then use machine learning in order to determine the top 10\ntopics in machine learning. We not only include models, but provide a holistic\nview across optimization, data, features, etc. This quantitative approach\nallows reducing the bias of surveys. It reveals new and up-to-date insights\ninto what the 10 most prolific topics in machine learning research are. This\nallows researchers to identify popular topics as well as new and rising topics\nfor their research.\n", "id": "1703.10121v1", "title": "The Top 10 Topics in Machine Learning Revisited: A Quantitative\n Meta-Study"}, {"summary": " Mismatching problem between the source and target noisy corpora severely\nhinder the practical use of the machine-learning-based voice activity detection\n(VAD). In this paper, we try to address this problem in the transfer learning\nprospective. Transfer learning tries to find a common learning machine or a\ncommon feature subspace that is shared by both the source corpus and the target\ncorpus. The denoising deep neural network is used as the learning machine.\nThree transfer techniques, which aim to learn common feature representations,\nare used for analysis. Experimental results demonstrate the effectiveness of\nthe transfer learning schemes on the mismatch problem.\n", "id": "1303.2104v1", "title": "Transfer Learning for Voice Activity Detection: A Denoising Deep Neural\n Network Perspective"}, {"summary": " This paper presents machine learning solutions to a practical problem of\nNatural Language Generation (NLG), particularly the word formation in\nagglutinative languages like Tamil, in a supervised manner. The morphological\ngenerator is an important component of Natural Language Processing in\nArtificial Intelligence. It generates word forms given a root and affixes. 
The\nmorphophonemic changes like addition, deletion, alternation etc., occur when\ntwo or more morphemes or words joined together. The Sandhi rules should be\nexplicitly specified in the rule based morphological analyzers and generators.\nIn machine learning framework, these rules can be learned automatically by the\nsystem from the training samples and subsequently be applied for new inputs. In\nthis paper we proposed the machine learning models which learn the\nmorphophonemic rules for noun declensions from the given training data. These\nmodels are trained to learn sandhi rules using various learning algorithms and\nthe performance of those algorithms are presented. From this we conclude that\nmachine learning of morphological processing such as word form generation can\nbe successfully learned in a supervised manner, without explicit description of\nrules. The performance of Decision trees and Bayesian machine learning\nalgorithms on noun declensions are discussed.\n", "id": "1402.3382v1", "title": "Machine Learning of Phonologically Conditioned Noun Declensions For\n Tamil Morphological Generators"}, {"summary": " Machine learning is the capacity of a computational system to learn\nstructures from datasets in order to make predictions on newly seen data. Such\nan approach offers a significant advantage in music scenarios in which\nmusicians can teach the system to learn an idiosyncratic style, or can break\nthe rules to explore the system's capacity in unexpected ways. In this chapter\nwe draw on music, machine learning, and human-computer interaction to elucidate\nan understanding of machine learning algorithms as creative tools for music and\nthe sonic arts. We motivate a new understanding of learning algorithms as\nhuman-computer interfaces. We show that, like other interfaces, learning\nalgorithms can be characterised by the ways their affordances intersect with\ngoals of human users. 
We also argue that the nature of interaction between\nusers and algorithms impacts the usability and usefulness of those algorithms\nin profound ways. This human-centred view of machine learning motivates our\nconcluding discussion of what it means to employ machine learning as a creative\ntool.\n", "id": "1611.00379v1", "title": "The Machine Learning Algorithm as Creative Musical Tool"}, {"summary": " In this short vision paper, we introduce a machine learning optimizer for\ndata management and describe its architecture and main functionality.\n", "id": "1301.1575v1", "title": "BigDB: Automatic Machine Learning Optimizer"}, {"summary": " Relational logistic regression (RLR) is a representation of conditional\nprobability in terms of weighted formulae for modelling multi-relational data.\nIn this paper, we develop a learning algorithm for RLR models. Learning an RLR\nmodel from data consists of two steps: 1- learning the set of formulae to be\nused in the model (a.k.a. structure learning) and learning the weight of each\nformula (a.k.a. parameter learning). For structure learning, we deploy Schmidt\nand Murphy's hierarchical assumption: first we learn a model with simple\nformulae, then more complex formulae are added iteratively only if all their\nsub-formulae have proven effective in previous learned models. For parameter\nlearning, we convert the problem into a non-relational learning problem and use\nan off-the-shelf logistic regression learning algorithm from Weka, an\nopen-source machine learning tool, to learn the weights. We also indicate how\nhidden features about the individuals can be incorporated into RLR to boost the\nlearning performance. 
We compare our learning algorithm to other structure and\nparameter learning algorithms in the literature, and compare the performance of\nRLR models to standard logistic regression and RDN-Boost on a modified version\nof the MovieLens data-set.\n", "id": "1606.08531v1", "title": "A Learning Algorithm for Relational Logistic Regression: Preliminary\n Results"}, {"summary": " The current processes for building machine learning systems require\npractitioners with deep knowledge of machine learning. This significantly\nlimits the number of machine learning systems that can be created and has led\nto a mismatch between the demand for machine learning systems and the ability\nfor organizations to build them. We believe that in order to meet this growing\ndemand for machine learning systems we must significantly increase the number\nof individuals that can teach machines. We postulate that we can achieve this\ngoal by making the process of teaching machines easy, fast and above all,\nuniversally accessible.\n While machine learning focuses on creating new algorithms and improving the\naccuracy of \"learners\", the machine teaching discipline focuses on the efficacy\nof the \"teachers\". Machine teaching as a discipline is a paradigm shift that\nfollows and extends principles of software engineering and programming\nlanguages. We put a strong emphasis on the teacher and the teacher's\ninteraction with data, as well as crucial components such as techniques and\ndesign principles of interaction and visualization.\n In this paper, we present our position regarding the discipline of machine\nteaching and articulate fundamental machine teaching principles. 
We also\ndescribe how, by decoupling knowledge about machine learning algorithms from\nthe process of teaching, we can accelerate innovation and empower millions of\nnew uses for machine learning models.\n", "id": "1707.06742v3", "title": "Machine Teaching: A New Paradigm for Building Machine Learning Systems"}, {"summary": " Preference learning (PL) is a core area of machine learning that handles\ndatasets with ordinal relations. As the number of generated data of ordinal\nnature is increasing, the importance and role of the PL field becomes central\nwithin machine learning research and practice. This paper introduces an open\nsource, scalable, efficient and accessible preference learning toolbox that\nsupports the key phases of the data training process incorporating various\npopular data preprocessing, feature selection and preference learning methods.\n", "id": "1506.01709v1", "title": "The Preference Learning Toolbox"}, {"summary": " We consider the problems of robust PAC learning from distributed and\nstreaming data, which may contain malicious errors and outliers, and analyze\ntheir fundamental complexity questions. In particular, we establish lower\nbounds on the communication complexity for distributed robust learning\nperformed on multiple machines, and on the space complexity for robust learning\nfrom streaming data on a single machine. These results demonstrate that gaining\nrobustness of learning algorithms is usually at the expense of increased\ncomplexities. As far as we know, this work gives the first complexity results\nfor distributed and online robust PAC learning.\n", "id": "1703.10444v1", "title": "On Fundamental Limits of Robust Learning"}, {"summary": " Machine learning is a quickly evolving field which now looks really different\nfrom what it was 15 years ago, when classification and clustering were major\nissues. 
This document proposes several trends to explore the new questions of\nmodern machine learning, with the strong afterthought that the belief function\nframework has a major role to play.\n", "id": "1504.03874v1", "title": "Bridging belief function theory to modern machine learning"}, {"summary": " In this short paper, the Electre Tri-Machine Learning Method, generally used\nto solve ordinal classification problems, is proposed for solving the Record\nLinkage problem. Preliminary experimental results show that, using the Electre\nTri method, high accuracy can be achieved and more than 99% of the matches and\nnonmatches were correctly identified by the procedure.\n", "id": "1505.06614v1", "title": "Electre Tri-Machine Learning Approach to the Record Linkage Problem"}, {"summary": " This is the Proceedings of the 2016 ICML Workshop on Human Interpretability\nin Machine Learning (WHI 2016), which was held in New York, NY, June 23, 2016.\n Invited speakers were Susan Athey, Rich Caruana, Jacob Feldman, Percy Liang,\nand Hanna Wallach.\n", "id": "1607.02531v2", "title": "Proceedings of the 2016 ICML Workshop on Human Interpretability in\n Machine Learning (WHI 2016)"}, {"summary": " We identify conditional parity as a general notion of non-discrimination in\nmachine learning. In fact, several recently proposed notions of\nnon-discrimination, including a few counterfactual notions, are instances of\nconditional parity. We show that conditional parity is amenable to statistical\nanalysis by studying randomization as a general mechanism for achieving\nconditional parity and a kernel-based test of conditional parity.\n", "id": "1706.08519v1", "title": "On conditional parity as a notion of non-discrimination in machine\n learning"}, {"summary": " This is the Proceedings of the 2017 ICML Workshop on Human Interpretability\nin Machine Learning (WHI 2017), which was held in Sydney, Australia, August 10,\n2017. 
Invited speakers were Tony Jebara, Pang Wei Koh, and David Sontag.\n", "id": "1708.02666v1", "title": "Proceedings of the 2017 ICML Workshop on Human Interpretability in\n Machine Learning (WHI 2017)"}, {"summary": " Machine learning has become pervasive in multiple domains, impacting a wide\nvariety of applications, such as knowledge discovery and data mining, natural\nlanguage processing, information retrieval, computer vision, social and health\ninformatics, ubiquitous computing, etc. Two essential problems of machine\nlearning are how to generate features and how to acquire labels for machines to\nlearn. Particularly, labeling large amount of data for each domain-specific\nproblem can be very time consuming and costly. It has become a key obstacle in\nmaking learning protocols realistic in applications. In this paper, we will\ndiscuss how to use the existing general-purpose world knowledge to enhance\nmachine learning processes, by enriching the features or reducing the labeling\nwork. We start from the comparison of world knowledge with domain-specific\nknowledge, and then introduce three key problems in using world knowledge in\nlearning processes, i.e., explicit and implicit feature representation,\ninference for knowledge linking and disambiguation, and learning with direct or\nindirect supervision. Finally we discuss the future directions of this research\ntopic.\n", "id": "1705.02908v1", "title": "Machine Learning with World Knowledge: The Position and Survey"}, {"summary": " We propose a novel notion of a quantum learning machine for automatically\ncontrolling quantum coherence and for developing quantum algorithms. A quantum\nlearning machine can be trained to learn a certain task with no a priori\nknowledge on its algorithm. 
As an example, it is demonstrated that the quantum\nlearning machine learns Deutsch's task and finds itself a quantum algorithm,\nthat is different from but equivalent to the original one.\n", "id": "0803.2976v2", "title": "Quantum Learning Machine"}, {"summary": " Deep Boltzmann machines are in principle powerful models for extracting the\nhierarchical structure of data. Unfortunately, attempts to train layers jointly\n(without greedy layer-wise pretraining) have been largely unsuccessful. We\npropose a modification of the learning algorithm that initially recenters the\noutput of the activation functions to zero. This modification leads to a better\nconditioned Hessian and thus makes learning easier. We test the algorithm on\nreal data and demonstrate that our suggestion, the centered deep Boltzmann\nmachine, learns a hierarchy of increasingly abstract representations and a\nbetter generative model of data.\n", "id": "1203.3783v1", "title": "Learning Feature Hierarchies with Centered Deep Boltzmann Machines"}, {"summary": " We introduce the hyperparameter search problem in the field of machine\nlearning and discuss its main challenges from an optimization perspective.\nMachine learning methods attempt to build models that capture some element of\ninterest based on given data. Most common learning algorithms feature a set of\nhyperparameters that must be determined before training commences. The choice\nof hyperparameters can significantly affect the resulting model's performance,\nbut determining good values can be complex; hence a disciplined, theoretically\nsound search strategy is essential.\n", "id": "1502.02127v2", "title": "Hyperparameter Search in Machine Learning"}, {"summary": " This is a machine learning application paper involving big data. 
We present\nhigh-accuracy prediction methods of rare events in semi-structured machine log\nfiles, which are produced at high velocity and high volume by NORC's\ncomputer-assisted telephone interviewing (CATI) network for conducting surveys.\nWe judiciously apply natural language processing (NLP) techniques and\ndata-mining strategies to train effective learning and prediction models for\nclassifying uncommon error messages in the log---without access to source code,\nupdated documentation or dictionaries. In particular, our simple but effective\napproach of features preallocation for learning from imbalanced data coupled\nwith naive Bayes classifiers can be conceivably generalized to supervised or\nsemi-supervised learning and prediction methods for other critical events such\nas cyberattack detection.\n", "id": "1510.00772v1", "title": "Machine Learning for Machine Data from a CATI Network"}, {"summary": " Optimal transport distances, otherwise known as Wasserstein distances, have\nrecently drawn ample attention in computer vision and machine learning as a\npowerful discrepancy measure for probability distributions. The recent\ndevelopments on alternative formulations of the optimal transport have allowed\nfor faster solutions to the problem and has revamped its practical applications\nin machine learning. In this paper, we exploit the widely used kernel methods\nand provide a family of provably positive definite kernels based on the Sliced\nWasserstein distance and demonstrate the benefits of these kernels in a variety\nof learning tasks. Our work provides a new perspective on the application of\noptimal transport flavored distances through kernel methods in machine learning\ntasks.\n", "id": "1511.03198v1", "title": "Sliced Wasserstein Kernels for Probability Distributions"}, {"summary": " Distillation (Hinton et al., 2015) and privileged information (Vapnik &\nIzmailov, 2015) are two techniques that enable machines to learn from other\nmachines. 
This paper unifies these two techniques into generalized\ndistillation, a framework to learn from multiple machines and data\nrepresentations. We provide theoretical and causal insight about the inner\nworkings of generalized distillation, extend it to unsupervised, semisupervised\nand multitask learning scenarios, and illustrate its efficacy on a variety of\nnumerical simulations on both synthetic and real-world data.\n", "id": "1511.03643v3", "title": "Unifying distillation and privileged information"}, {"summary": " Multi-hop inference is necessary for machine learning systems to successfully\nsolve tasks such as Recognising Textual Entailment and Machine Reading. In this\nwork, we demonstrate the effectiveness of adaptive computation for learning the\nnumber of inference steps required for examples of different complexity and\nthat learning the correct number of inference steps is difficult. We introduce\nthe first model involving Adaptive Computation Time which provides a small\nperformance benefit on top of a similar model without an adaptive component as\nwell as enabling considerable insight into the reasoning process of the model.\n", "id": "1610.07647v2", "title": "Learning to Reason With Adaptive Computation"}, {"summary": " We introduce and describe the results of a novel shared task on bandit\nlearning for machine translation. The task was organized jointly by Amazon and\nHeidelberg University for the first time at the Second Conference on Machine\nTranslation (WMT 2017). The goal of the task is to encourage research on\nlearning machine translation from weak user feedback instead of human\nreferences or post-edits. On each of a sequence of rounds, a machine\ntranslation system is required to propose a translation for an input, and\nreceives a real-valued estimate of the quality of the proposed translation for\nlearning. 
This paper describes the shared task's learning and evaluation setup,\nusing services hosted on Amazon Web Services (AWS), the data and evaluation\nmetrics, and the results of various machine translation architectures and\nlearning protocols.\n", "id": "1707.09050v1", "title": "A Shared Task on Bandit Learning for Machine Translation"}, {"summary": " This paper introduces Dex, a reinforcement learning environment toolkit\nspecialized for training and evaluation of continual learning methods as well\nas general reinforcement learning problems. We also present the novel continual\nlearning method of incremental learning, where a challenging environment is\nsolved using optimal weight initialization learned from first solving a similar\neasier environment. We show that incremental learning can produce vastly\nsuperior results than standard methods by providing a strong baseline method\nacross ten Dex environments. We finally develop a saliency method for\nqualitative analysis of reinforcement learning, which shows the impact\nincremental learning has on network attention.\n", "id": "1706.05749v1", "title": "Dex: Incremental Learning for Complex Environments in Deep Reinforcement\n Learning"}, {"summary": " We present generalization bounds for the TS-MKL framework for two stage\nmultiple kernel learning. We also present bounds for sparse kernel learning\nformulations within the TS-MKL framework.\n", "id": "1302.0406v1", "title": "Generalization Guarantees for a Binary Classification Framework for\n Two-Stage Multiple Kernel Learning"}, {"summary": " In this paper we study different approaches for time series modeling. The\nforecasting approaches using linear models, ARIMA algorithm, XGBoost machine\nlearning algorithm are described. Results of different model combinations are\nshown. 
For probabilistic modeling the approaches using copulas and Bayesian\ninference are considered.\n", "id": "1703.01977v1", "title": "Linear, Machine Learning and Probabilistic Approaches for Time Series\n Analysis"}, {"summary": " Mutual learning of a pair of tree parity machines with continuous and\ndiscrete weight vectors is studied analytically. The analysis is based on a\nmapping procedure that maps the mutual learning in tree parity machines onto\nmutual learning in noisy perceptrons. The stationary solution of the mutual\nlearning in the case of continuous tree parity machines depends on the learning\nrate where a phase transition from partial to full synchronization is observed.\nIn the discrete case the learning process is based on a finite increment and a\nfull synchronized state is achieved in a finite number of steps. The\nsynchronization of discrete parity machines is introduced in order to construct\nan ephemeral key-exchange protocol. The dynamic learning of a third tree parity\nmachine (an attacker) that tries to imitate one of the two machines while the\ntwo still update their weight vectors is also analyzed. In particular, the\nsynchronization times of the naive attacker and the flipping attacker recently\nintroduced in [1] are analyzed. All analytical results are found to be in good\nagreement with simulation results.\n", "id": "cond-mat/0209234v1", "title": "Mutual learning in a tree parity machine and its application to\n cryptography"}, {"summary": " Machine Learning approaches are good in solving problems that have less\ninformation. In most cases, the software domain problems characterize as a\nprocess of learning that depend on the various circumstances and changes\naccordingly. 
A predictive model is constructed by using machine learning\napproaches and classified them into defective and non-defective modules.\nMachine learning techniques help developers to retrieve useful information\nafter the classification and enable them to analyse data from different\nperspectives. Machine learning techniques are proven to be useful in terms of\nsoftware bug prediction. This study used publicly available data sets of software\nmodules and provides comparative performance analysis of different machine\nlearning techniques for software bug prediction. Results showed most of the\nmachine learning methods performed well on software bug datasets.\n", "id": "1506.07563v1", "title": "Benchmarking Machine Learning Technologies for Software Defect Detection"}, {"summary": " Predictive models are often used for real-time decision making. However,\ntypical machine learning techniques ignore feature evaluation cost, and focus\nsolely on the accuracy of the machine learning models obtained utilizing all\nthe features available. We develop algorithms and indexes to support\ncost-sensitive prediction, i.e., making decisions using machine learning models\ntaking feature evaluation cost into account. Given an item and an online\ncomputation cost (i.e., time) budget, we present two approaches to return an\nappropriately chosen machine learning model that will run within the specified\ntime on the given item. The first approach returns the optimal machine learning\nmodel, i.e., one with the highest accuracy, that runs within the specified\ntime, but requires significant up-front precomputation time. The second\napproach returns a possibly suboptimal machine learning model, but requires\nlittle up-front precomputation time. We study these two algorithms in detail\nand characterize the scenarios (using real and synthetic data) in which each\nperforms well. 
Unlike prior work that focuses on a narrow domain or a specific\nalgorithm, our techniques are very general: they apply to any cost-sensitive\nprediction scenario on any machine learning algorithm.\n", "id": "1408.4072v1", "title": "Indexing Cost Sensitive Prediction"}, {"summary": " As data science continues to grow in popularity, there will be an increasing\nneed to make data science tools more scalable, flexible, and accessible. In\nparticular, automated machine learning (AutoML) systems seek to automate the\nprocess of designing and optimizing machine learning pipelines. In this\nchapter, we present a genetic programming-based AutoML system called TPOT that\noptimizes a series of feature preprocessors and machine learning models with\nthe goal of maximizing classification accuracy on a supervised classification\nproblem. Further, we analyze a large database of pipelines that were previously\nused to solve various supervised classification problems and identify 100 short\nseries of machine learning operations that appear the most frequently, which we\ncall the building blocks of machine learning pipelines. 
We harness these\nbuilding blocks to initialize TPOT with promising solutions, and find that this\nsensible initialization method significantly improves TPOT's performance on one\nbenchmark at no cost of significantly degrading performance on the others.\nThus, sensible initialization with machine learning pipeline building blocks\nshows promise for GP-based AutoML systems, and should be further refined in\nfuture work.\n", "id": "1607.08878v1", "title": "Identifying and Harnessing the Building Blocks of Machine Learning\n Pipelines for Sensible Initialization of a Data Science Automation Tool"}, {"summary": " In order to achieve state-of-the-art performance, modern machine learning\ntechniques require careful data pre-processing and hyperparameter tuning.\nMoreover, given the ever increasing number of machine learning models being\ndeveloped, model selection is becoming increasingly important. Automating the\nselection and tuning of machine learning pipelines consisting of data\npre-processing methods and machine learning models, has long been one of the\ngoals of the machine learning community. In this paper, we tackle this\nmeta-learning task by combining ideas from collaborative filtering and Bayesian\noptimization. Using probabilistic matrix factorization techniques and\nacquisition functions from Bayesian optimization, we exploit experiments\nperformed in hundreds of different datasets to guide the exploration of the\nspace of possible pipelines. In our experiments, we show that our approach\nquickly identifies high-performing pipelines across a wide range of datasets,\nsignificantly outperforming the current state-of-the-art.\n", "id": "1705.05355v1", "title": "Probabilistic Matrix Factorization for Automated Machine Learning"}, {"summary": " We investigate the learning of quantitative structure activity relationships\n(QSARs) as a case-study of meta-learning. 
This application area is of the\nhighest societal importance, as it is a key step in the development of new\nmedicines. The standard QSAR learning problem is: given a target (usually a\nprotein) and a set of chemical compounds (small molecules) with associated\nbioactivities (e.g. inhibition of the target), learn a predictive mapping from\nmolecular representation to activity. Although almost every type of machine\nlearning method has been applied to QSAR learning there is no agreed single\nbest way of learning QSARs, and therefore the problem area is well-suited to\nmeta-learning. We first carried out the most comprehensive ever comparison of\nmachine learning methods for QSAR learning: 18 regression methods, 6 molecular\nrepresentations, applied to more than 2,700 QSAR problems. (These results have\nbeen made publicly available on OpenML and represent a valuable resource for\ntesting novel meta-learning methods.) We then investigated the utility of\nalgorithm selection for QSAR problems. We found that this meta-learning\napproach outperformed the best individual QSAR learning method (random forests\nusing a molecular fingerprint representation) by up to 13%, on average. We\nconclude that meta-learning outperforms base-learning methods for QSAR\nlearning, and as this investigation is one of the most extensive ever\ncomparisons of base and meta-learning methods ever made, it provides evidence\nfor the general effectiveness of meta-learning over base-learning.\n", "id": "1709.03854v1", "title": "Meta-QSAR: a large-scale application of meta-learning to drug design and\n discovery"}, {"summary": " Machine learning is a thriving part of computer science. There are many\nefficient approaches to machine learning that do not provide strong theoretical\nguarantees, and a beautiful general learning theory. Unfortunately, machine\nlearning approaches that give strong theoretical guarantees have not been\nefficient enough to be applicable. 
In this paper we introduce a logical\napproach to machine learning. Models are represented by tuples of logical\nformulas and inputs and outputs are logical structures. We present our\nframework together with several applications where we evaluate it using SAT and\nSMT solvers. We argue that this approach to machine learning is particularly\nsuited to bridge the gap between efficiency and theoretical soundness. We\nexploit results from descriptive complexity theory to prove strong theoretical\nguarantees for our approach. To show its applicability, we present experimental\nresults including learning complexity-theoretic reductions rules for board\ngames. We also explain how neural networks fit into our framework, although the\ncurrent implementation does not scale to provide guarantees for real-world\nneural networks.\n", "id": "1609.02664v1", "title": "Machine Learning with Guarantees using Descriptive Complexity and SMT\n Solvers"}, {"summary": " Complex problems may require sophisticated, non-linear learning methods such\nas kernel machines or deep neural networks to achieve state of the art\nprediction accuracies. However, high prediction accuracies are not the only\nobjective to consider when solving problems using machine learning. Instead,\nparticular scientific applications require some explanation of the learned\nprediction function. Unfortunately, most methods do not come with out of the\nbox straight forward interpretation. Even linear prediction functions are not\nstraight forward to explain if features exhibit complex correlation structure.\n In this paper, we propose the Measure of Feature Importance (MFI). MFI is\ngeneral and can be applied to any arbitrary learning machine (including kernel\nmachines and deep learning). MFI is intrinsically non-linear and can detect\nfeatures that by itself are inconspicuous and only impact the prediction\nfunction through their interaction with other features. 
Lastly, MFI can be used\nfor both --- model-based feature importance and instance-based feature\nimportance (i.e, measuring the importance of a feature for a particular data\npoint).\n", "id": "1611.07567v1", "title": "Feature Importance Measure for Non-linear Learning Algorithms"}, {"summary": " Molecular machine learning has been maturing rapidly over the last few years.\nImproved methods and the presence of larger datasets have enabled machine\nlearning algorithms to make increasingly accurate predictions about molecular\nproperties. However, algorithmic progress has been limited due to the lack of a\nstandard benchmark to compare the efficacy of proposed methods; most new\nalgorithms are benchmarked on different datasets making it challenging to gauge\nthe quality of proposed methods. This work introduces MoleculeNet, a large\nscale benchmark for molecular machine learning. MoleculeNet curates multiple\npublic datasets, establishes metrics for evaluation, and offers high quality\nopen-source implementations of multiple previously proposed molecular\nfeaturization and learning algorithms (released as part of the DeepChem open\nsource library). MoleculeNet benchmarks demonstrate that learnable\nrepresentations, and in particular graph convolutional networks, are powerful\ntools for molecular machine learning and broadly offer the best performance.\nHowever, for quantum mechanical and biophysical datasets, the use of\nphysics-aware featurizations can be significantly more important than choice of\nparticular learning algorithm.\n", "id": "1703.00564v1", "title": "MoleculeNet: A Benchmark for Molecular Machine Learning"}, {"summary": " Much work has been done refining and characterizing the receptive fields\nlearned by deep learning algorithms. A lot of this work has focused on the\ndevelopment of Gabor-like filters learned when enforcing sparsity constraints\non a natural image dataset. 
Little work however has investigated how these\nfilters might expand to the temporal domain, namely through training on natural\nmovies. Here we investigate exactly this problem in established temporal deep\nlearning algorithms as well as a new learning paradigm suggested here, the\nTemporal Autoencoding Restricted Boltzmann Machine (TARBM).\n", "id": "1210.8353v1", "title": "Temporal Autoencoding Restricted Boltzmann Machine"}, {"summary": " In this technical report we presented a novel approach to machine learning.\nOnce the new framework is presented, we will provide a simple and yet very\npowerful learning algorithm which will be benchmarked on various datasets.\n The framework we proposed is based on boolean circuits; more specifically the\nclassifiers produced by our algorithm have that form. Using bits and boolean\ngates instead of real numbers and multiplication enables the learning\nalgorithm and classifier to use very efficient boolean vector operations. This\nenables both the learning algorithm and classifier to be extremely efficient.\nThe accuracy of the classifier we obtain with our framework compares very\nfavorably with those produced by conventional techniques, both in terms of\nefficiency and accuracy.\n", "id": "1409.4044v1", "title": "A new approach in machine learning"}, {"summary": " The restricted Boltzmann machine, an important tool used in machine learning\nin particular for unsupervized learning tasks, is investigated from the\nperspective of its spectral properties. Based on empirical observations, we\npropose a generic statistical ensemble for the weight matrix of the RBM and\ncharacterize its mean evolution, with respect to common learning procedures as\na function of some statistical properties of the data. 
In particular we\nidentify the main unstable deformation modes of the weight matrix which emerge\nat the beginning of the learning and unveil in some way how these further\ninteract in later stages of the learning procedure.\n", "id": "1708.02917v1", "title": "Spectral Learning of Restricted Boltzmann Machines"}, {"summary": " As machine learning is applied to an increasing variety of complex problems,\nwhich are defined by high dimensional and complex data sets, the necessity for\ntask oriented feature learning grows in importance. With the advancement of\nDeep Learning algorithms, various successful feature learning techniques have\nevolved. In this paper, we present a novel way of learning discriminative\nfeatures by training Deep Neural Nets which have Encoder or Decoder type\narchitecture similar to an Autoencoder. We demonstrate that our approach can\nlearn discriminative features which can perform better at pattern\nclassification tasks when the number of training samples is relatively small in\nsize.\n", "id": "1607.01354v1", "title": "Learning Discriminative Features using Encoder-Decoder type Deep Neural\n Nets"}, {"summary": " Meta-learning consists in learning learning algorithms. We use a Long Short\nTerm Memory (LSTM) based network to learn to compute on-line updates of the\nparameters of another neural network. These parameters are stored in the cell\nstate of the LSTM. Our framework allows to compare learned algorithms to\nhand-made algorithms within the traditional train and test methodology. In an\nexperiment, we learn a learning algorithm for a one-hidden layer Multi-Layer\nPerceptron (MLP) on non-linearly separable datasets. The learned algorithm is\nable to update parameters of both layers and generalise well on similar\ndatasets.\n", "id": "1610.06072v1", "title": "Learning to Learn Neural Networks"}, {"summary": " In this paper, the framework of kernel machines with two layers is\nintroduced, generalizing classical kernel methods. 
The new learning methodology\nprovide a formal connection between computational architectures with multiple\nlayers and the theme of kernel learning in standard regularization methods.\nFirst, a representer theorem for two-layer networks is presented, showing that\nfinite linear combinations of kernels on each layer are optimal architectures\nwhenever the corresponding functions solve suitable variational problems in\nreproducing kernel Hilbert spaces (RKHS). The input-output map expressed by\nthese architectures turns out to be equivalent to a suitable single-layer\nkernel machines in which the kernel function is also learned from the data.\nRecently, the so-called multiple kernel learning methods have attracted\nconsiderable attention in the machine learning literature. In this paper,\nmultiple kernel learning methods are shown to be specific cases of kernel\nmachines with two layers in which the second layer is linear. Finally, a simple\nand effective multiple kernel learning method called RLS2 (regularized least\nsquares with two layers) is introduced, and his performances on several\nlearning problems are extensively analyzed. An open source MATLAB toolbox to\ntrain and validate RLS2 models with a Graphic User Interface is available.\n", "id": "1001.2709v1", "title": "Kernel machines with two layers and multiple kernel learning"}, {"summary": " This paper comments on the published work dealing with robustness and\nregularization of support vector machines (Journal of Machine Learning\nResearch, vol. 10, pp. 1485-1510, 2009) [arXiv:0803.3490] by H. Xu, etc. They\nproposed a theorem to show that it is possible to relate robustness in the\nfeature space and robustness in the sample space directly. In this paper, we\npropose a counter example that rejects their theorem.\n", "id": "1308.3750v1", "title": "Comment on \"robustness and regularization of support vector machines\" by\n H. Xu, et al., (Journal of Machine Learning Research, vol. 10, pp. 
1485-1510,\n 2009, arXiv:0803.3490)"}, {"summary": " Even though active learning forms an important pillar of machine learning,\ndeep learning tools are not prevalent within it. Deep learning poses several\ndifficulties when used in an active learning setting. First, active learning\n(AL) methods generally rely on being able to learn and update models from small\namounts of data. Recent advances in deep learning, on the other hand, are\nnotorious for their dependence on large amounts of data. Second, many AL\nacquisition functions rely on model uncertainty, yet deep learning methods\nrarely represent such model uncertainty. In this paper we combine recent\nadvances in Bayesian deep learning into the active learning framework in a\npractical way. We develop an active learning framework for high dimensional\ndata, a task which has been extremely challenging so far, with very sparse\nexisting literature. Taking advantage of specialised models such as Bayesian\nconvolutional neural networks, we demonstrate our active learning techniques\nwith image data, obtaining a significant improvement on existing active\nlearning approaches. We demonstrate this on both the MNIST dataset, as well as\nfor skin cancer diagnosis from lesion images (ISIC2016 task).\n", "id": "1703.02910v1", "title": "Deep Bayesian Active Learning with Image Data"}, {"summary": " For a learning task, data can usually be collected from different sources or\nbe represented from multiple views. For example, laboratory results from\ndifferent medical examinations are available for disease diagnosis, and each of\nthem can only reflect the health state of a person from a particular\naspect/view. Therefore, different views provide complementary information for\nlearning tasks. An effective integration of the multi-view information is\nexpected to facilitate the learning performance. 
In this paper, we propose a\ngeneral predictor, named multi-view machines (MVMs), that can effectively\ninclude all the possible interactions between features from multiple views. A\njoint factorization is embedded for the full-order interaction parameters which\nallows parameter estimation under sparsity. Moreover, MVMs can work in\nconjunction with different loss functions for a variety of machine learning\ntasks. A stochastic gradient descent method is presented to learn the MVM\nmodel. We further illustrate the advantages of MVMs through comparison with\nother methods for multi-view classification, including support vector machines\n(SVMs), support tensor machines (STMs) and factorization machines (FMs).\n", "id": "1506.01110v1", "title": "Multi-view Machines"}, {"summary": " Machine Learning is usually defined as a subfield of AI, which is busy with\ninformation extraction from raw data sets. Despite of its common acceptance and\nwidespread recognition, this definition is wrong and groundless. Meaningful\ninformation does not belong to the data that bear it. It belongs to the\nobservers of the data and it is a shared agreement and a convention among them.\nTherefore, this private information cannot be extracted from the data by any\nmeans. Therefore, all further attempts of Machine Learning apologists to\njustify their funny business are inappropriate.\n", "id": "0911.1386v1", "title": "Machine Learning: When and Where the Horses Went Astray?"}, {"summary": " mlpy is a Python Open Source Machine Learning library built on top of\nNumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of\nstate-of-the-art machine learning methods for supervised and unsupervised\nproblems and it is aimed at finding a reasonable compromise among modularity,\nmaintainability, reproducibility, usability and efficiency. 
mlpy is\nmultiplatform, it works with Python 2 and 3 and it is distributed under GPL3 at\nthe website http://mlpy.fbk.eu.\n", "id": "1202.6548v2", "title": "mlpy: Machine Learning Python"}, {"summary": " Pylearn2 is a machine learning research library. This does not just mean that\nit is a collection of machine learning algorithms that share a common API; it\nmeans that it has been designed for flexibility and extensibility in order to\nfacilitate research projects that involve new or unusual use cases. In this\npaper we give a brief history of the library, an overview of its basic\nphilosophy, a summary of the library's architecture, and a description of how\nthe Pylearn2 community functions socially.\n", "id": "1308.4214v1", "title": "Pylearn2: a machine learning research library"}, {"summary": " EnsembleSVM is a free software package containing efficient routines to\nperform ensemble learning with support vector machine (SVM) base models. It\ncurrently offers ensemble methods based on binary SVM models. Our\nimplementation avoids duplicate storage and evaluation of support vectors which\nare shared between constituent models. Experimental results show that using\nensemble approaches can drastically reduce training complexity while\nmaintaining high predictive accuracy. The EnsembleSVM software package is\nfreely available online at http://esat.kuleuven.be/stadius/ensemblesvm.\n", "id": "1403.0745v1", "title": "EnsembleSVM: A Library for Ensemble Learning Using Support Vector\n Machines"}, {"summary": " This paper introduces the Encog library for Java and C#, a scalable,\nadaptable, multiplatform machine learning framework that was 1st released in\n2008. Encog allows a variety of machine learning models to be applied to\ndatasets using regression, classification, and clustering. 
Various supported\nmachine learning models can be used interchangeably with minimal recoding.\nEncog uses efficient multithreaded code to reduce training time by exploiting\nmodern multicore processors. The current version of Encog can be downloaded\nfrom http://www.encog.org.\n", "id": "1506.04776v1", "title": "Encog: Library of Interchangeable Machine Learning Models for Java and\n C#"}, {"summary": " Artificial intelligence offers superior techniques and methods by which\nproblems from diverse domains may find an optimal solution. The Machine\nLearning technologies refer to the domain of artificial intelligence aiming to\ndevelop the techniques allowing the computers to \"learn\". Some systems based on\nMachine Learning technologies tend to eliminate the necessity of the human\nintelligence while the others adopt a man-machine collaborative approach.\n", "id": "0904.3667v1", "title": "Considerations upon the Machine Learning Technologies"}, {"summary": " Recent machine learning techniques can be modified to produce creative\nresults. Those results did not exist before; it is not a trivial combination of\nthe data which was fed into the machine learning system. The obtained results\ncome in multiple forms: As images, as text and as audio.\n This paper gives a high level overview of how they are created and gives some\nexamples. It is meant to be a summary of the current work and give people who\nare new to machine learning some starting points.\n", "id": "1601.03642v1", "title": "Creativity in Machine Learning"}, {"summary": " We present a comprehensive review of the most effective content-based e-mail\nspam filtering techniques. We focus primarily on Machine Learning-based spam\nfilters and their variants, and report on a broad review ranging from surveying\nthe relevant ideas, efforts, effectiveness, and the current progress. 
The\ninitial exposition of the background examines the basics of e-mail spam\nfiltering, the evolving nature of spam, spammers playing cat-and-mouse with\ne-mail service providers (ESPs), and the Machine Learning front in fighting\nspam. We conclude by measuring the impact of Machine Learning-based filters and\nexplore the promising offshoots of latest developments.\n", "id": "1606.01042v1", "title": "Machine Learning for E-mail Spam Filtering: Review,Techniques and Trends"}, {"summary": " It is commonly believed that increasing the interpretability of a machine\nlearning model may decrease its predictive power. However, inspecting\ninput-output relationships of those models using visual analytics, while\ntreating them as black-box, can help to understand the reasoning behind\noutcomes without sacrificing predictive quality. We identify a space of\npossible solutions and provide two examples of where such techniques have been\nsuccessfully used in practice.\n", "id": "1606.05685v2", "title": "Using Visual Analytics to Interpret Predictive Machine Learning Models"}, {"summary": " In this work, we study the use of logistic regression in manufacturing\nfailures detection. As a data set for the analysis, we used the data from\nKaggle competition Bosch Production Line Performance. We considered the use of\nmachine learning, linear and Bayesian models. For machine learning approach, we\nanalyzed XGBoost tree based classifier to obtain high scored classification.\nUsing the generalized linear model for logistic regression makes it possible to\nanalyze the influence of the factors under study. The Bayesian approach for\nlogistic regression gives the statistical distribution for the parameters of\nthe model. It can be useful in the probabilistic analysis, e.g. 
risk\nassessment.\n", "id": "1612.05740v1", "title": "Machine Learning, Linear and Bayesian Models for Logistic Regression in\n Failure Detection Problems"}, {"summary": " The generation of artificial data based on existing observations, known as\ndata augmentation, is a technique used in machine learning to improve model\naccuracy, generalisation, and to control overfitting. Augmentor is a software\npackage, available in both Python and Julia versions, that provides a high\nlevel API for the expansion of image data using a stochastic, pipeline-based\napproach which effectively allows for images to be sampled from a distribution\nof augmented images at runtime. Augmentor provides methods for most standard\naugmentation practices as well as several advanced features such as\nlabel-preserving, randomised elastic distortions, and provides many helper\nfunctions for typical augmentation tasks used in machine learning.\n", "id": "1708.04680v1", "title": "Augmentor: An Image Augmentation Library for Machine Learning"}, {"summary": " The recent, remarkable growth of machine learning has led to intense interest\nin the privacy of the data on which machine learning relies, and to new\ntechniques for preserving privacy. However, older ideas about privacy may well\nremain valid and useful. This note reviews two recent works on privacy in the\nlight of the wisdom of some of the early literature, in particular the\nprinciples distilled by Saltzer and Schroeder in the 1970s.\n", "id": "1708.08022v1", "title": "On the Protection of Private Information in Machine Learning Systems:\n Two Recent Approaches"}, {"summary": " We present a framework to derive risk bounds for vector-valued learning with\na broad class of feature maps and loss functions. Multi-task learning and\none-vs-all multi-category learning are treated as examples. 
We discuss in\ndetail vector-valued functions with one hidden layer, and demonstrate that the\nconditions under which shared representations are beneficial for multi- task\nlearning are equally applicable to multi-category learning.\n", "id": "1606.01487v1", "title": "Bounds for Vector-Valued Function Estimation"}, {"summary": " Data-target pairing is an important step towards multi-target localization\nfor the intelligent operation of unmanned systems. Target localization plays a\ncrucial role in numerous applications, such as search, and rescue missions,\ntraffic management and surveillance. The objective of this paper is to present\nan innovative target location learning approach, where numerous machine\nlearning approaches, including K-means clustering and supported vector machines\n(SVM), are used to learn the data pattern across a list of spatially\ndistributed sensors. To enable the accurate data association from different\nsensors for accurate target localization, appropriate data pre-processing is\nessential, which is then followed by the application of different machine\nlearning algorithms to appropriately group data from different sensors for the\naccurate localization of multiple targets. Through simulation examples, the\nperformance of these machine learning algorithms is quantified and compared.\n", "id": "1703.00084v1", "title": "Multi-Sensor Data Pattern Recognition for Multi-Target Localization: A\n Machine Learning Approach"}, {"summary": " Algorithms learned from data are increasingly used for deciding many aspects\nin our life: from movies we see, to prices we pay, or medicine we get. Yet\nthere is growing evidence that decision making by inappropriately trained\nalgorithms may unintentionally discriminate people. For example, in automated\nmatching of candidate CVs with job descriptions, algorithms may capture and\npropagate ethnicity related biases. 
Several repairs for selected algorithms\nhave already been proposed, but the underlying mechanisms how such\ndiscrimination happens from the computational perspective are not yet\nscientifically understood. We need to develop theoretical understanding how\nalgorithms may become discriminatory, and establish fundamental machine\nlearning principles for prevention. We need to analyze machine learning process\nas a whole to systematically explain the roots of discrimination occurrence,\nwhich will allow to devise global machine learning optimization criteria for\nguaranteed prevention, as opposed to pushing empirical constraints into\nexisting algorithms case-by-case. As a result, the state-of-the-art will\nadvance from heuristic repairing, to proactive and theoretically supported\nprevention. This is needed not only because law requires to protect vulnerable\npeople. Penetration of big data initiatives will only increase, and computer\nscience needs to provide solid explanations and accountability to the public,\nbefore public concerns lead to unnecessarily restrictive regulations against\nmachine learning.\n", "id": "1708.00754v1", "title": "Fairness-aware machine learning: a perspective"}, {"summary": " Statistical machine learning methods are increasingly used for neuroimaging\ndata analysis. Their main virtue is their ability to model high-dimensional\ndatasets, e.g. multivariate analysis of activation images or resting-state time\nseries. Supervised learning is typically used in decoding or encoding settings\nto relate brain images to behavioral or clinical observations, while\nunsupervised learning can uncover hidden structures in sets of images (e.g.\nresting state functional MRI) or find sub-populations in large cohorts. By\nconsidering different functional neuroimaging applications, we illustrate how\nscikit-learn, a Python machine learning library, can be used to perform some\nkey analysis steps. 
Scikit-learn contains a very large set of statistical\nlearning algorithms, both supervised and unsupervised, and its application to\nneuroimaging data provides a versatile tool to study the brain.\n", "id": "1412.3919v1", "title": "Machine Learning for Neuroimaging with Scikit-Learn"}, {"summary": " Support vector machines have attracted much attention in theoretical and in\napplied statistics. Main topics of recent interest are consistency, learning\nrates and robustness. In this article, it is shown that support vector machines\nare qualitatively robust. Since support vector machines can be represented by a\nfunctional on the set of all probability measures, qualitative robustness is\nproven by showing that this functional is continuous with respect to the\ntopology generated by weak convergence of probability measures. Combined with\nthe existence and uniqueness of support vector machines, our results show that\nsupport vector machines are the solutions of a well-posed mathematical problem\nin Hadamard's sense.\n", "id": "0912.0874v2", "title": "Qualitative Robustness of Support Vector Machines"}, {"summary": " Inspired by a growing interest in analyzing network data, we study the\nproblem of node classification on graphs, focusing on approaches based on\nkernel machines. Conventionally, kernel machines are linear classifiers in the\nimplicit feature space. We argue that linear classification in the feature\nspace of kernels commonly used for graphs is often not enough to produce good\nresults. When this is the case, one naturally considers nonlinear classifiers\nin the feature space. 
We show that repeating this process produces something we\ncall \"deep kernel machines.\" We provide some examples where deep kernel\nmachines can make a big difference in classification performance, and point out\nsome connections to various recent literature on deep architectures in\nartificial intelligence and machine learning.\n", "id": "1001.4019v1", "title": "Classifying Network Data with Deep Kernel Machines"}, {"summary": " We study the typical properties of polynomial Support Vector Machines within\na Statistical Mechanics approach that allows us to analyze the effect of\ndifferent normalizations of the features. If the normalization is adecuately\nchosen, there is a hierarchical learning of features of increasing order as a\nfunction of the training set size.\n", "id": "cond-mat/0010423v1", "title": "Hierarchical learning in polynomial Support Vector Machines"}, {"summary": " We evaluate the following Machine Learning techniques for Green Energy (Wind,\nSolar) Prediction: Bayesian Inference, Neural Networks, Support Vector\nMachines, Clustering techniques (PCA). Our objective is to predict green energy\nusing weather forecasts, predict deviations from forecast green energy, find\ncorrelation amongst different weather parameters and green energy availability,\nrecover lost or missing energy (/ weather) data. We use historical weather data\nand weather forecasts for the same.\n", "id": "1406.3726v1", "title": "Evaluation of Machine Learning Techniques for Green Energy Prediction"}, {"summary": " We investigate Fano schemes of conditionally generic intersections, i.e. of\nhypersurfaces in projective space chosen generically up to additional\nconditions. 
Via a correspondence between generic properties of algebraic\nvarieties and events in probability spaces that occur with probability one, we\nuse the obtained results on Fano schemes to solve a problem in machine\nlearning.\n", "id": "1301.3078v1", "title": "Fano schemes of generic intersections and machine learning"}, {"summary": " Screening is an effective technique for speeding up the training process of a\nsparse learning model by removing the features that are guaranteed to be\ninactive the process. In this paper, we present an efficient screening technique\nfor sparse support vector machine based on variational inequality. The\ntechnique is both efficient and safe.\n", "id": "1310.8320v1", "title": "Safe and Efficient Screening For Sparse Support Vector Machine"}, {"summary": " Abundant accumulation of digital histopathological images has led to the\nincreased demand for their analysis, such as computer-aided diagnosis using\nmachine learning techniques. However, digital pathological images and related\ntasks have some issues to be considered. In this mini-review, we introduce the\napplication of digital pathological image analysis using machine learning\nalgorithms, address some problems specific to such analysis, and propose\npossible solutions.\n", "id": "1709.00786v1", "title": "Machine learning methods for histopathological image analysis"}];