Lifelong machine learning

"This is an introduction to an advanced machine learning paradigm that continuously learns by accumulating past knowledge that it then uses in future learning and problem solving. In contrast, the current dominant machine learning paradigm learns in isolation: given a training dataset, it runs...


Bibliographic Details
Main Authors: Chen, Zhiyuan (Computer scientist) (Author), Liu, Bing, 1963- (Author)
Format: Book
Language: English
Published: Cham, Switzerland : Springer, [2018]
Edition: Second edition
Series: Synthesis lectures on artificial intelligence and machine learning ; #38
Table of Contents:
  • 1. Introduction
  • 1.1 Classic machine learning paradigm
  • 1.2 Motivating examples
  • 1.3 A brief history of lifelong learning
  • 1.4 Definition of lifelong learning
  • 1.5 Types of knowledge and key challenges
  • 1.6 Evaluation methodology and role of big data
  • 1.7 Outline of the book
  • 2. Related learning paradigms
  • 2.1 Transfer learning
  • 2.1.1 Structural correspondence learning
  • 2.1.2 Naïve Bayes transfer classifier
  • 2.1.3 Deep learning in transfer learning
  • 2.1.4 Difference from lifelong learning
  • 2.2 Multi-task learning
  • 2.2.1 Task relatedness in multi-task learning
  • 2.2.2 GO-MTL: multi-task learning using latent basis
  • 2.2.3 Deep learning in multi-task learning
  • 2.2.4 Difference from lifelong learning
  • 2.3 Online learning
  • 2.3.1 Difference from lifelong learning
  • 2.4 Reinforcement learning
  • 2.4.1 Difference from lifelong learning
  • 2.5 Meta learning
  • 2.5.1 Difference from lifelong learning
  • 2.6 Summary
  • 3. Lifelong supervised learning
  • 3.1 Definition and overview
  • 3.2 Lifelong memory-based learning
  • 3.2.1 Two memory-based learning methods
  • 3.2.2 Learning a new representation for lifelong learning
  • 3.3 Lifelong neural networks
  • 3.3.1 MTL net
  • 3.3.2 Lifelong EBNN
  • 3.4 ELLA: an efficient lifelong learning algorithm
  • 3.4.1 Problem setting
  • 3.4.2 Objective function
  • 3.4.3 Dealing with the first inefficiency
  • 3.4.4 Dealing with the second inefficiency
  • 3.4.5 Active task selection
  • 3.5 Lifelong naive Bayesian classification
  • 3.5.1 Naïve Bayesian text classification
  • 3.5.2 Basic ideas of LSC
  • 3.5.3 LSC technique
  • 3.5.4 Discussions
  • 3.6 Domain word embedding via meta-learning
  • 3.7 Summary and evaluation datasets
  • 4. Continual learning and catastrophic forgetting
  • 4.1 Catastrophic forgetting
  • 4.2 Continual learning in neural networks
  • 4.3 Learning without forgetting
  • 4.4 Progressive neural networks
  • 4.5 Elastic weight consolidation
  • 4.6 iCaRL: incremental classifier and representation learning
  • 4.6.1 Incremental training
  • 4.6.2 Updating representation
  • 4.6.3 Constructing exemplar sets for new classes
  • 4.6.4 Performing classification in iCaRL
  • 4.7 Expert gate
  • 4.7.1 Autoencoder gate
  • 4.7.2 Measuring task relatedness for training
  • 4.7.3 Selecting the most relevant expert for testing
  • 4.7.4 Encoder-based lifelong learning
  • 4.8 Continual learning with generative replay
  • 4.8.1 Generative adversarial networks
  • 4.8.2 Generative replay
  • 4.9 Evaluating catastrophic forgetting
  • 4.10 Summary and evaluation datasets
  • 5. Open-world learning
  • 5.1 Problem definition and applications
  • 5.2 Center-based similarity space learning
  • 5.2.1 Incrementally updating a CBS learning model
  • 5.2.2 Testing a CBS learning model
  • 5.2.3 CBS learning for unseen class detection
  • 5.3 DOC: deep open classification
  • 5.3.1 Feed-forward layers and the 1-vs.-rest layer
  • 5.3.2 Reducing open-space risk
  • 5.3.3 DOC for image classification
  • 5.3.4 Unseen class discovery
  • 5.4 Summary and evaluation datasets
  • 6. Lifelong topic modeling
  • 6.1 Main ideas of lifelong topic modeling
  • 6.2 LTM: a lifelong topic model
  • 6.2.1 LTM model
  • 6.2.2 Topic knowledge mining
  • 6.2.3 Incorporating past knowledge
  • 6.2.4 Conditional distribution of Gibbs sampler
  • 6.3 AMC: a lifelong topic model for small data
  • 6.3.1 Overall algorithm of AMC
  • 6.3.2 Mining must-link knowledge
  • 6.3.3 Mining cannot-link knowledge
  • 6.3.4 Extended Pólya Urn model
  • 6.3.5 Sampling distributions in Gibbs sampler
  • 6.4 Summary and evaluation datasets
  • 7. Lifelong information extraction
  • 7.1 NELL: a never-ending language learner
  • 7.1.1 NELL architecture
  • 7.1.2 Extractors and learning in NELL
  • 7.1.3 Coupling constraints in NELL
  • 7.2 Lifelong opinion target extraction
  • 7.2.1 Lifelong learning through recommendation
  • 7.2.2 AER algorithm
  • 7.2.3 Knowledge learning
  • 7.2.4 Recommendation using past knowledge
  • 7.3 Learning on the job
  • 7.3.1 Conditional random fields
  • 7.3.2 General dependency feature
  • 7.3.3 The L-CRF algorithm
  • 7.4 Lifelong-RL: lifelong relaxation labeling
  • 7.4.1 Relaxation labeling
  • 7.4.2 Lifelong relaxation labeling
  • 7.5 Summary and evaluation datasets
  • 8. Continuous knowledge learning in chatbots
  • 8.1 LiLi: lifelong interactive learning and inference
  • 8.2 Basic ideas of LiLi
  • 8.3 Components of LiLi
  • 8.4 A running example
  • 8.5 Summary and evaluation datasets
  • 9. Lifelong reinforcement learning
  • 9.1 Lifelong reinforcement learning through multiple environments
  • 9.1.1 Acquiring and incorporating bias
  • 9.2 Hierarchical Bayesian lifelong reinforcement learning
  • 9.2.1 Motivation
  • 9.2.2 Hierarchical Bayesian approach
  • 9.2.3 MTRL algorithm
  • 9.2.4 Updating hierarchical model parameters
  • 9.2.5 Sampling an MDP
  • 9.3 PG-ELLA: lifelong policy gradient reinforcement learning
  • 9.3.1 Policy gradient reinforcement learning
  • 9.3.2 Policy gradient lifelong learning setting
  • 9.3.3 Objective function and optimization
  • 9.3.4 Safe policy search for lifelong learning
  • 9.3.5 Cross-domain lifelong reinforcement learning
  • 9.4 Summary and evaluation datasets
  • 10. Conclusion and future directions
  • Bibliography
  • Authors' biographies