Machine learning : a probabilistic perspective /

"This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as...

Bibliographic Details
Main Author:	Murphy, Kevin P., 1970-
Format:	Book
Language:	English
Published:	Cambridge, MA : MIT Press, c2012
Series:	Adaptive computation and machine learning series
Subjects:	Machine learning; Probabilities


LEADER	38353nam a2201225 a 4500
001	38368390-14df-4ae8-b607-a77ce0f48c8a
005	20240903000000.0
008	120315s2012 maua b 001 0 eng
010			\|a 2012004558
016	7		\|a 016102606 \|2 Uk
020			\|a 0262018020 (hardcover : alk. paper)
020			\|a 0262018020 \|q hardcover \|q alkaline paper
020			\|a 9780262018029 (hardcover : alk. paper)
020			\|a 9780262018029 \|q hardcover \|q alkaline paper
020			\|a 9780262018029
035			\|a (CaONFJC)cou21606713
035			\|a (DLC) 2012004558
035			\|a (DLC)cou21606713
035			\|a (MdBJ)4292011
035			\|a (NcD)005663550DUK01
035			\|a (NjP)7197888-princetondb
035			\|a (OCoLC)781277861
035			\|a (OCoLC)ocn781277861
035			\|a (OCoLC)ocn857550652
035			\|a (PU)5842857-penndb-Voyager
035			\|a (RPB)b62564298-01bu_inst
035			\|a 11683574
035			\|a 4292011
035			\|a 781277861
035			\|a 7927280
035			\|a 7929714
035			\|a 8529694
035			\|a ybp7646301
035			\|z (NjP)Voyager7197888
040			\|a DLC \|b eng \|c DLC \|d DLC
040			\|a DLC \|b eng \|c DLC \|d YDX \|d BTCTA \|d UKMGB \|d BDX \|d YDXCP \|d OCLCO \|d CDX \|d PUL \|d IXA \|d Z@L \|d COO \|d MYG \|d WEX \|d OCLCF \|d NhCcYME
040			\|a DLC \|b eng \|c DLC \|d YDX \|d BTCTA \|d UKMGB \|d BDX \|d YDXCP \|d OCLCO \|d CDX \|d PUL \|d IXA \|d Z@L
040			\|a DLC \|b eng \|c DLC \|d YDX \|d BTCTA \|d UKMGB \|d BDX \|d YDXCP \|d OCLCO \|d CDX \|d PUL
040			\|a DLC \|b eng \|c DLC \|d YDX \|d BTCTA \|d UKMGB \|d BDX \|d YDXCP \|d OCLCO \|d NhCcYME
040			\|a DLC \|b eng \|c DLC \|d YDX \|d BTCTA \|d UKMGB \|d BDX \|d YDXCP \|d OCLCO
042			\|a pcc
049			\|a JHEE
049			\|a PAUU
049			\|a PULT
049			\|a RBNN
050	0	0	\|a Q325.5 \|b .M87 2012
079			\|a ocn781277861
082	0	0	\|a 006.3/1 \|2 23
090			\|a Q325.5 \|b .M87 2012
100	1		\|a Murphy, Kevin P., \|d 1970-
245	1	0	\|a Machine learning : \|b a probabilistic perspective / \|c Kevin P. Murphy
260			\|a Cambridge, MA : \|b MIT Press, \|c c2012
260			\|a Cambridge, Mass. : \|b MIT Press, \|c c2012
264		1	\|a Cambridge, Mass. : \|b MIT Press, \|c [2012]
264		4	\|c ©2012
300			\|a xxix, 1067 p. : \|b ill. (chiefly col.) ; \|c 24 cm
300			\|a xxix, 1067 p. : \|b ill. (some col.) ; \|c 24 cm
300			\|a xxix, 1067 pages : \|b illustrations (chiefly color) ; \|c 24 cm
336			\|a text \|b txt \|2 rdacontent
337			\|a unmediated \|b n \|2 rdamedia
338			\|a volume \|b nc \|2 rdacarrier
490	0		\|a Adaptive computation and machine learning series
490	1		\|a Adaptive computation and machine learning series
500			\|a This WorldCat-derived record is shareable under Open Data Commons ODC-BY, with attribution to OCLC \|5 CTY
504			\|a Includes bibliographical references (p. [1015]-1045) and indexes
504			\|a Includes bibliographical references and index
505	0		\|a Probability -- Generative models for discrete data -- Gaussian models -- Bayesian statistics -- Frequentist statistics -- Linear regression -- Logistic regression -- Generalized linear models and the exponential family -- Directed graphical models (Bayes nets) -- Mixture models and the EM algorithm -- Latent linear models -- Sparse linear models -- Kernels -- Gaussian processes -- Adaptive basis function models -- Markov and hidden Markov models -- State space models -- Undirected graphical models (Markov random fields) -- Exact inference for graphical models -- Variational inference -- More variational inference -- Monte Carlo inference -- Markov chain Monte Carlo (MCMC) inference -- Clustering -- Graphical model structure learning -- Latent variable models for discrete data -- Deep learning -- Notation
505	0	0	\|a Contents note continued: \|g 11.2.3 \|t Using mixture models for clustering -- \|g 11.2.4. \|t Mixtures of experts -- \|g 11.3. \|t Parameter estimation for mixture models -- \|g 11.3.1. \|t Unidentifiability -- \|g 11.3.2. \|t Computing a MAP estimate is non-convex -- \|g 11.4. \|t EM algorithm -- \|g 11.4.1. \|t Basic idea -- \|g 11.4.2. \|t EM for GMMs -- \|g 11.4.3. \|t EM for mixture of experts -- \|g 11.4.4. \|t EM for DGMs with hidden variables -- \|g 11.4.5. \|t EM for the Student distribution -- \|g 11.4.6. \|t EM for probit regression -- \|g 11.4.7. \|t Theoretical basis for EM -- \|g 11.4.8. \|t Online EM -- \|g 11.4.9. \|t Other EM variants -- \|g 11.5. \|t Model selection for latent variable models -- \|g 11.5.1. \|t Model selection for probabilistic models -- \|g 11.5.2. \|t Model selection for non-probabilistic methods -- \|g 11.6. \|t Fitting models with missing data -- \|g 11.6.1. \|t EM for the MLE of an MVN with missing data -- \|g 12.1. \|t Factor analysis -- \|g 12.1.1. \|t FA is a low rank parameterization of an MVN -- \|g 12.1.2. \|t Inference of the latent factors -- \|g 12.1.3. \|t Unidentifiability -- \|g 12.1.4. \|t Mixtures of factor analysers -- \|g 12.1.5. \|t EM for factor analysis models -- \|g 12.1.6. \|t Fitting FA models with missing data -- \|g 12.2. \|t Principal components analysis (PCA) -- \|g 12.2.1. \|t Classical PCA: statement of the theorem -- \|g 12.2.2. \|t Proof -- \|g 12.2.3. \|t Singular value decomposition (SVD) -- \|g 12.2.4. \|t Probabilistic PCA -- \|g 12.2.5. \|t EM algorithm for PCA -- \|g 12.3. \|t Choosing the number of latent dimensions -- \|g 12.3.1. \|t Model selection for FA/PPCA -- \|g 12.3.2. \|t Model selection for PCA -- \|g 12.4. \|t PCA for categorical data -- \|g 12.5. \|t PCA for paired and multi-view data -- \|g 12.5.1. \|t Supervised PCA (latent factor regression) -- \|g 12.5.2. \|t Partial least squares -- \|g 12.5.3. \|t Canonical correlation analysis -- \|g 12.6. 
\|t Independent Component Analysis (ICA) -- \|g 12.6.1. \|t Maximum likelihood estimation -- \|g 12.6.2. \|t FastICA algorithm -- \|g 12.6.3. \|t Using EM -- \|g 12.6.4. \|t Other estimation principles -- \|g 13.1. \|t Introduction -- \|g 13.2. \|t Bayesian variable selection -- \|g 13.2.1. \|t spike and slab model -- \|g 13.2.2. \|t From the Bernoulli-Gaussian model to l0 regularization -- \|g 13.2.3. \|t Algorithms -- \|g 13.3. \|t l1 regularization: basics -- \|g 13.3.1. \|t Why does l1 regularization yield sparse solutions? -- \|g 13.3.2. \|t Optimality conditions for lasso -- \|g 13.3.3. \|t Comparison of least squares, lasso, ridge and subset selection -- \|g 13.3.4. \|t Regularization path -- \|g 13.3.5. \|t Model selection -- \|g 13.3.6. \|t Bayesian inference for linear models with Laplace priors -- \|g 13.4. \|t l1 regularization: algorithms -- \|g 13.4.1. \|t Coordinate descent -- \|g 13.4.2. \|t LARS and other homotopy methods -- \|g 13.4.3. \|t Proximal and gradient projection methods -- \|g 13.4.4. \|t EM for lasso -- \|g 13.5. \|t l1 regularization: extensions -- \|g 13.5.1. \|t Group Lasso -- \|g 13.5.2. \|t Fused lasso -- \|g 13.5.3. \|t Elastic net (ridge and lasso combined) -- \|g 13.6. \|t Non-convex regularizers -- \|g 13.6.1. \|t Bridge regression -- \|g 13.6.2. \|t Hierarchical adaptive lasso -- \|g 13.6.3. \|t Other hierarchical priors -- \|g 13.7. \|t Automatic relevance determination (ARD)/sparse Bayesian learning (SBL) -- \|g 13.7.1. \|t ARD for linear regression -- \|g 13.7.2. \|t Whence sparsity? -- \|g 13.7.3. \|t Connection to MAP estimation -- \|g 13.7.4. \|t Algorithms for ARD -- \|g 13.7.5. \|t ARD for logistic regression -- \|g 13.8. \|t Sparse coding -- \|g 13.8.1. \|t Learning a sparse coding dictionary -- \|g 13.8.2. \|t Results of dictionary learning from image patches -- \|g 13.8.3. \|t Compressed sensing -- \|g 13.8.4. \|t Image inpainting and denoising -- \|g 14.1. \|t Introduction -- \|g 14.2. 
\|t Kernel functions -- \|g 14.2.1. \|t RBF kernels -- \|g 14.2.2. \|t Kernels for comparing documents -- \|g 14.2.3. \|t Mercer (positive definite) kernels -- \|g 14.2.4. \|t Linear kernels -- \|g 14.2.5. \|t Matern kernels -- \|g 14.2.6. \|t String kernels -- \|g 14.2.7. \|t Pyramid match kernels -- \|g 14.2.8. \|t Kernels derived from probabilistic generative models -- \|g 14.3. \|t Using kernels inside GLMs -- \|g 14.3.1. \|t Kernel machines -- \|g 14.3.2. \|t L1VMs, RVMs, and other sparse vector machines -- \|g 14.4. \|t kernel trick -- \|g 14.4.1. \|t Kernelized nearest neighbor classification -- \|g 14.4.2. \|t Kernelized K-medoids clustering -- \|g 14.4.3. \|t Kernelized ridge regression -- \|g 14.4.4. \|t Kernel PCA -- \|g 14.5. \|t Support vector machines (SVMs) -- \|g 14.5.1. \|t SVMs for regression -- \|g 14.5.2. \|t SVMs for classification -- \|g 14.5.3. \|t Choosing C -- \|g 14.5.4. \|t Summary of key points -- \|g 14.5.5. \|t probabilistic interpretation of SVMs -- \|g 14.6. \|t Comparison of discriminative kernel methods -- \|g 14.7. \|t Kernels for building generative models -- \|g 14.7.1. \|t Smoothing kernels -- \|g 14.7.2. \|t Kernel density estimation (KDE) -- \|g 14.7.3. \|t From KDE to KNN -- \|g 14.7.4. \|t Kernel regression -- \|g 14.7.5. \|t Locally weighted regression -- \|g 15.1. \|t Introduction -- \|g 15.2. \|t GPs for regression -- \|g 15.2.1. \|t Predictions using noise-free observations -- \|g 15.2.2. \|t Predictions using noisy observations -- \|g 15.2.3. \|t Effect of the kernel parameters -- \|g 15.2.4. \|t Estimating the kernel parameters -- \|g 15.2.5. \|t Computational and numerical issues -- \|g 15.2.6. \|t Semi-parametric GPs -- \|g 15.3. \|t GPs meet GLMs -- \|g 15.3.1. \|t Binary classification -- \|g 15.3.2. \|t Multi-class classification -- \|g 15.3.3. \|t GPs for Poisson regression -- \|g 15.4. \|t Connection with other methods -- \|g 15.4.1. \|t Linear models compared to GPs -- \|g 15.4.2. 
\|t Linear smoothers compared to GPs -- \|g 15.4.3. \|t SVMs compared to GPs -- \|g 15.4.4. \|t L1VM and RVMs compared to GPs -- \|g 15.4.5. \|t Neural networks compared to GPs -- \|g 15.4.6. \|t Smoothing splines compared to GPs -- \|g 15.4.7. \|t RKHS methods compared to GPs -- \|g 15.5. \|t GP latent variable model -- \|g 15.6. \|t Approximation methods for large datasets -- \|g 16.1. \|t Introduction -- \|g 16.2. \|t Classification and regression trees (CART) -- \|g 16.2.1. \|t Basics -- \|g 16.2.2. \|t Growing a tree -- \|g 16.2.3. \|t Pruning a tree -- \|g 16.2.4. \|t Pros and cons of trees -- \|g 16.2.5. \|t Random forests -- \|g 16.2.6. \|t CART compared to hierarchical mixture of experts -- \|g 16.3. \|t Generalized additive models -- \|g 16.3.1. \|t Backfitting -- \|g 16.3.2. \|t Computational efficiency -- \|g 16.3.3. \|t Multivariate adaptive regression splines (MARS) -- \|g 16.4. \|t Boosting -- \|g 16.4.1. \|t Forward stagewise additive modeling -- \|g 16.4.2. \|t L2boosting -- \|g 16.4.3. \|t AdaBoost -- \|g 16.4.4. \|t LogitBoost -- \|g 16.4.5. \|t Boosting as functional gradient descent -- \|g 16.4.6. \|t Sparse boosting -- \|g 16.4.7. \|t Multivariate adaptive regression trees (MART) -- \|g 16.4.8. \|t Why does boosting work so well? -- \|g 16.4.9. \|t Bayesian view -- \|g 16.5. \|t Feedforward neural networks (multilayer perceptrons) -- \|g 16.5.1. \|t Convolutional neural networks -- \|g 16.5.2. \|t Other kinds of neural networks -- \|g 16.5.3. \|t brief history of the field -- \|g 16.5.4. \|t backpropagation algorithm -- \|g 16.5.5. \|t Identifiability -- \|g 16.5.6. \|t Regularization -- \|g 16.5.7. \|t Bayesian inference -- \|g 16.6. \|t Ensemble learning -- \|g 16.6.1. \|t Stacking -- \|g 16.6.2. \|t Error-correcting output codes -- \|g 16.6.3. \|t Ensemble learning is not equivalent to Bayes model averaging -- \|g 16.7. \|t Experimental comparison -- \|g 16.7.1. \|t Low-dimensional features -- \|g 16.7.2. 
\|t High-dimensional features -- \|g 16.8. \|t Interpreting black-box models -- \|g 17.1. \|t Introduction -- \|g 17.2. \|t Markov models -- \|g 17.2.1. \|t Transition matrix -- \|g 17.2.2. \|t Application: Language modeling -- \|g 17.2.3. \|t Stationary distribution of a Markov chain -- \|g 17.2.4. \|t Application: Google's PageRank algorithm for web page ranking -- \|g 17.3. \|t Hidden Markov models -- \|g 17.3.1. \|t Applications of HMMs -- \|g 17.4. \|t Inference in HMMs -- \|g 17.4.1. \|t Types of inference problems for temporal models -- \|g 17.4.2. \|t forwards algorithm -- \|g 17.4.3. \|t forwards-backwards algorithm -- \|g 17.4.4. \|t Viterbi algorithm -- \|g 17.4.5. \|t Forwards filtering, backwards sampling -- \|g 17.5. \|t Learning for HMMs -- \|g 17.5.1. \|t Training with fully observed data -- \|g 17.5.2. \|t EM for HMMs (the Baum-Welch algorithm) -- \|g 17.5.3. \|t Bayesian methods for "fitting" HMMs -- \|g 17.5.4. \|t Discriminative training -- \|g 17.5.5. \|t Model selection -- \|g 17.6. \|t Generalizations of HMMs -- \|g 17.6.1. \|t Variable duration (semi-Markov) HMMs -- \|g 17.6.2. \|t Hierarchical HMMs -- \|g 17.6.3. \|t Input-output HMMs -- \|g 17.6.4. \|t Auto-regressive and buried HMMs -- \|g 17.6.5. \|t Factorial HMM -- \|g 17.6.6. \|t Coupled HMM and the influence model -- \|g 17.6.7. \|t Dynamic Bayesian networks (DBNs) -- \|g 18.1. \|t Introduction -- \|g 18.2. \|t Applications of SSMs -- \|g 18.2.1. \|t SSMs for object tracking -- \|g 18.2.2. \|t Robotic SLAM -- \|g 18.2.3. \|t Online parameter learning using recursive least squares -- \|g 18.2.4. \|t SSM for time series forecasting -- \|g 18.3. \|t Inference in LG-SSM -- \|g 18.3.1. \|t Kalman filtering algorithm -- \|g 18.3.2. \|t Kalman smoothing algorithm -- \|g 18.4. \|t Learning for LG-SSM -- \|g 18.4.1. \|t Identifiability and numerical stability -- \|g 18.4.2. \|t Training with
505	0	0	\|a Contents note continued: \|g 19.5.3 \|t Approximate methods for computing the MLEs of MRFs -- \|g 19.5.4. \|t Pseudo likelihood -- \|g 19.5.5. \|t Stochastic maximum likelihood -- \|g 19.5.6. \|t Feature induction for maxent models -- \|g 19.5.7. \|t Iterative proportional fitting (IPF) -- \|g 19.6. \|t Conditional random fields (CRFs) -- \|g 19.6.1. \|t Chain-structured CRFs, MEMMs and the label-bias problem -- \|g 19.6.2. \|t Applications of CRFs -- \|g 19.6.3. \|t CRF training -- \|g 19.7. \|t Structural SVMs -- \|g 19.7.1. \|t SSVMs: a probabilistic view -- \|g 19.7.2. \|t SSVMs: a non-probabilistic view -- \|g 19.7.3. \|t Cutting plane methods for fitting SSVMs -- \|g 19.7.4. \|t Online algorithms for fitting SSVMs -- \|g 19.7.5. \|t Latent structural SVMs -- \|g 20.1. \|t Introduction -- \|g 20.2. \|t Belief propagation for trees -- \|g 20.2.1. \|t Serial protocol -- \|g 20.2.2. \|t Parallel protocol -- \|g 20.2.3. \|t Gaussian BP -- \|g 20.2.4. \|t Other BP variants -- \|g 20.3. \|t variable elimination algorithm -- \|g 20.3.1. \|t generalized distributive law -- \|g 20.3.2. \|t Computational complexity of VE -- \|g 20.3.3. \|t weakness of VE -- \|g 20.4. \|t junction tree algorithm -- \|g 20.4.1. \|t Creating a junction tree -- \|g 20.4.2. \|t Message passing on a junction tree -- \|g 20.4.3. \|t Computational complexity of JTA -- \|g 20.4.4. \|t JTA generalizations -- \|g 20.5. \|t Computational intractability of exact inference in the worst case -- \|g 20.5.1. \|t Approximate inference -- \|g 21.1. \|t Introduction -- \|g 21.2. \|t Variational inference -- \|g 21.2.1. \|t Alternative interpretations of the variational objective -- \|g 21.2.2. \|t Forward or reverse KL? -- \|g 21.3. \|t mean field method -- \|g 21.3.1. \|t Derivation of the mean field update equations -- \|g 21.3.2. \|t Example: mean field for the Ising model -- \|g 21.4. \|t Structured mean field -- \|g 21.4.1. \|t Example: factorial HMM -- \|g 21.5. 
\|t Variational Bayes -- \|g 21.5.1. \|t Example: VB for a univariate Gaussian -- \|g 21.5.2. \|t Example: VB for linear regression -- \|g 21.6. \|t Variational Bayes EM -- \|g 21.6.1. \|t Example: VBEM for mixtures of Gaussians -- \|g 21.7. \|t Variational message passing and VIBES -- \|g 21.8. \|t Local variational bounds -- \|g 21.8.1. \|t Motivating applications -- \|g 21.8.2. \|t Bohning's quadratic bound to the log-sum-exp function -- \|g 21.8.3. \|t Bounds for the sigmoid function -- \|g 21.8.4. \|t Other bounds and approximations to the log-sum-exp function -- \|g 21.8.5. \|t Variational inference based on upper bounds -- \|g 22.1. \|t Introduction -- \|g 22.2. \|t Loopy belief propagation: algorithmic issues -- \|g 22.2.1. \|t brief history -- \|g 22.2.2. \|t LBP on pairwise models -- \|g 22.2.3. \|t LBP on a factor graph -- \|g 22.2.4. \|t Convergence -- \|g 22.2.5. \|t Accuracy of LBP -- \|g 22.2.6. \|t Other speedup tricks for LBP -- \|g 22.3. \|t Loopy belief propagation: theoretical issues -- \|g 22.3.1. \|t UGMs represented in exponential family form -- \|g 22.3.2. \|t marginal polytope -- \|g 22.3.3. \|t Exact inference as a variational optimization problem -- \|g 22.3.4. \|t Mean field as a variational optimization problem -- \|g 22.3.5. \|t LBP as a variational optimization problem -- \|g 22.3.6. \|t Loopy BP vs mean field -- \|g 22.4. \|t Extensions of belief propagation -- \|g 22.4.1. \|t Generalized belief propagation -- \|g 22.4.2. \|t Convex belief propagation -- \|g 22.5. \|t Expectation propagation -- \|g 22.5.1. \|t EP as a variational inference problem -- \|g 22.5.2. \|t Optimizing the EP objective using moment matching -- \|g 22.5.3. \|t EP for the clutter problem -- \|g 22.5.4. \|t LBP is a special case of EP -- \|g 22.5.5. \|t Ranking players using TrueSkill -- \|g 22.5.6. \|t Other applications of EP -- \|g 22.6. \|t MAP state estimation -- \|g 22.6.1. \|t Linear programming relaxation -- \|g 22.6.2. 
\|t Max-product belief propagation -- \|g 22.6.3. \|t Graphcuts -- \|g 22.6.4. \|t Experimental comparison of graphcuts and BP -- \|g 22.6.5. \|t Dual decomposition -- \|g 23.1. \|t Introduction -- \|g 23.2. \|t Sampling from standard distributions -- \|g 23.2.1. \|t Using the cdf -- \|g 23.2.2. \|t Sampling from a Gaussian (Box-Muller method) -- \|g 23.3. \|t Rejection sampling -- \|g 23.3.1. \|t Basic idea -- \|g 23.3.2. \|t Example -- \|g 23.3.3. \|t Application to Bayesian statistics -- \|g 23.3.4. \|t Adaptive rejection sampling -- \|g 23.3.5. \|t Rejection sampling in high dimensions -- \|g 23.4. \|t Importance sampling -- \|g 23.4.1. \|t Basic idea -- \|g 23.4.2. \|t Handling unnormalized distributions -- \|g 23.4.3. \|t Importance sampling for a DGM: likelihood weighting -- \|g 23.4.4. \|t Sampling importance resampling (SIR) -- \|g 23.5. \|t Particle filtering -- \|g 23.5.1. \|t Sequential importance sampling -- \|g 23.5.2. \|t degeneracy problem -- \|g 23.5.3. \|t resampling step -- \|g 23.5.4. \|t proposal distribution -- \|g 23.5.5. \|t Application: robot localization -- \|g 23.5.6. \|t Application: visual object tracking -- \|g 23.5.7. \|t Application: time series forecasting -- \|g 23.6. \|t Rao-Blackwellised particle filtering (RBPF) -- \|g 23.6.1. \|t RBPF for switching LG-SSMs -- \|g 23.6.2. \|t Application: tracking a maneuvering target -- \|g 23.6.3. \|t Application: Fast SLAM -- \|g 24.1. \|t Introduction -- \|g 24.2. \|t Gibbs sampling -- \|g 24.2.1. \|t Basic idea -- \|g 24.2.2. \|t Example: Gibbs sampling for the Ising model -- \|g 24.2.3. \|t Example: Gibbs sampling for inferring the parameters of a GMM -- \|g 24.2.4. \|t Collapsed Gibbs sampling -- \|g 24.2.5. \|t Gibbs sampling for hierarchical GLMs -- \|g 24.2.6. \|t BUGS and JAGS -- \|g 24.2.7. \|t Imputation Posterior (IP) algorithm -- \|g 24.2.8. \|t Blocking Gibbs sampling -- \|g 24.3. \|t Metropolis Hastings algorithm -- \|g 24.3.1. \|t Basic idea -- \|g 24.3.2. 
\|t Gibbs sampling is a special case of MH -- \|g 24.3.3. \|t Proposal distributions -- \|g 24.3.4. \|t Adaptive MCMC -- \|g 24.3.5. \|t Initialization and mode hopping -- \|g 24.3.6. \|t Why MH works -- \|g 24.3.7. \|t Reversible jump (trans-dimensional) MCMC -- \|g 24.4. \|t Speed and accuracy of MCMC -- \|g 24.4.1. \|t burn-in phase -- \|g 24.4.2. \|t Mixing rates of Markov chains -- \|g 24.4.3. \|t Practical convergence diagnostics -- \|g 24.4.4. \|t Accuracy of MCMC -- \|g 24.4.5. \|t How many chains? -- \|g 24.5. \|t Auxiliary variable MCMC -- \|g 24.5.1. \|t Auxiliary variable sampling for logistic regression -- \|g 24.5.2. \|t Slice sampling -- \|g 24.5.3. \|t Swendsen Wang -- \|g 24.5.4. \|t Hybrid/Hamiltonian MCMC -- \|g 24.6. \|t Annealing methods -- \|g 24.6.1. \|t Simulated annealing -- \|g 24.6.2. \|t Annealed importance sampling -- \|g 24.6.3. \|t Parallel tempering -- \|g 24.7. \|t Approximating the marginal likelihood -- \|g 24.7.1. \|t candidate method -- \|g 24.7.2. \|t Harmonic mean estimate -- \|g 24.7.3. \|t Annealed importance sampling -- \|g 25.1. \|t Introduction -- \|g 25.1.1. \|t Measuring (dis)similarity -- \|g 25.1.2. \|t Evaluating the output of clustering methods -- \|g 25.2. \|t Dirichlet process mixture models -- \|g 25.2.1. \|t From finite to infinite mixture models -- \|g 25.2.2. \|t Dirichlet process -- \|g 25.2.3. \|t Applying Dirichlet processes to mixture modeling -- \|g 25.2.4. \|t Fitting a DP mixture model -- \|g 25.3. \|t Affinity propagation -- \|g 25.4. \|t Spectral clustering -- \|g 25.4.1. \|t Graph Laplacian -- \|g 25.4.2. \|t Normalized graph Laplacian -- \|g 25.4.3. \|t Example -- \|g 25.5. \|t Hierarchical clustering -- \|g 25.5.1. \|t Agglomerative clustering -- \|g 25.5.2. \|t Divisive clustering -- \|g 25.5.3. \|t Choosing the number of clusters -- \|g 25.5.4. \|t Bayesian hierarchical clustering -- \|g 25.6. \|t Clustering datapoints and features -- \|g 25.6.1. \|t Biclustering -- \|g 25.6.2. 
\|t Multi-view clustering -- \|g 26.1. \|t Introduction -- \|g 26.2. \|t Structure learning for knowledge discovery -- \|g 26.2.1. \|t Relevance networks -- \|g 26.2.2. \|t Dependency networks -- \|g 26.3. \|t Learning tree structures -- \|g 26.3.1. \|t Directed or undirected tree? -- \|g 26.3.2. \|t Chow-Liu algorithm for finding the ML tree structure -- \|g 26.3.3. \|t Finding the MAP forest -- \|g 26.3.4. \|t Mixtures of trees -- \|g 26.4. \|t Learning DAG structures -- \|g 26.4.1. \|t Markov equivalence -- \|g 26.4.2. \|t Exact structural inference -- \|g 26.4.3. \|t Scaling up to larger graphs -- \|g 26.5. \|t Learning DAG structure with latent variables -- \|g 26.5.1. \|t Approximating the marginal likelihood when we have missing data -- \|g 26.5.2. \|t Structural EM -- \|g 26.5.3. \|t Discovering hidden variables -- \|g 26.5.4. \|t Case study: Google's Rephil -- \|g 26.5.5. \|t Structural equation models -- \|g 26.6. \|t Learning causal DAGs -- \|g 26.6.1. \|t Causal interpretation of DAGs -- \|g 26.6.2. \|t Using causal DAGs to resolve Simpson's paradox -- \|g 26.6.3. \|t Learning causal DAG structures -- \|g 26.7. \|t Learning undirected Gaussian graphical models -- \|g 26.7.1. \|t MLE for a GGM -- \|g 26.7.2. \|t Graphical lasso -- \|g 26.7.3. \|t Bayesian inference for GGM structure -- \|g 26.7.4. \|t Handling non-Gaussian data using copulas -- \|g 26.8. \|t Learning undirected discrete graphical models -- \|g 26.8.1. \|t Graphical lasso for MRFs/CRFs -- \|g 26.8.2. \|t Thin junction trees -- \|g 27.1. \|t Introduction -- \|g 27.2. \|t Distributed state LVMs for discrete data -- \|g 27.2.1. \|t Mixture models -- \|g 27.2.2. \|t Exponential family PCA -- \|g 27.2.3. \|t LDA and mPCA -- \|g 27.2.4. \|t GaP model and non-negative matrix factorization -- \|g 27.3. \|t Latent Dirichlet allocation (LDA) -
505	0	0	\|a Contents note continued: \|g 28.3.2 \|t Deep auto-encoders -- \|g 28.3.3. \|t Stacked denoising auto-encoders -- \|g 28.4. \|t Applications of deep networks -- \|g 28.4.1. \|t Handwritten digit classification using DBNs -- \|g 28.4.2. \|t Data visualization and feature discovery using deep auto-encoders -- \|g 28.4.3. \|t Information retrieval using deep auto-encoders (semantic hashing) -- \|g 28.4.4. \|t Learning audio features using 1d convolutional DBNs -- \|g 28.4.5. \|t Learning image features using 2d convolutional DBNs -- \|g 28.5. \|t Discussion -- \|t Index to code.
505	0	0	\|g 1.1 \|t Machine learning: what and why? -- \|g 1.1.1. \|t Types of machine learning -- \|g 1.2. \|t Supervised learning -- \|g 1.2.1. \|t Classification -- \|g 1.2.2. \|t Regression -- \|g 1.3. \|t Unsupervised learning -- \|g 1.3.1. \|t Discovering clusters -- \|g 1.3.2. \|t Discovering latent factors -- \|g 1.3.3. \|t Discovering graph structure -- \|g 1.3.4. \|t Matrix completion -- \|g 1.4. \|t Some basic concepts in machine learning -- \|g 1.4.1. \|t Parametric vs non-parametric models -- \|g 1.4.2. \|t simple non-parametric classifier: K-nearest neighbors -- \|g 1.4.3. \|t curse of dimensionality -- \|g 1.4.4. \|t Parametric models for classification and regression -- \|g 1.4.5. \|t Linear regression -- \|g 1.4.6. \|t Logistic regression -- \|g 1.4.7. \|t Overfitting -- \|g 1.4.8. \|t Model selection -- \|g 1.4.9. \|t No free lunch theorem -- \|g 2.1. \|t Introduction -- \|g 2.2. \|t brief review of probability theory -- \|g 2.2.1. \|t Discrete random variables -- \|g 2.2.2. \|t Fundamental rules -- \|g 2.2.3. \|t Bayes rule -- \|g 2.2.4. \|t Independence and conditional independence -- \|g 2.2.5. \|t Continuous random variables -- \|g 2.2.6. \|t Quantiles -- \|g 2.2.7. \|t Mean and variance -- \|g 2.3. \|t Some common discrete distributions -- \|g 2.3.1. \|t binomial and Bernoulli distributions -- \|g 2.3.2. \|t multinomial and multinoulli distributions -- \|g 2.3.3. \|t Poisson distribution -- \|g 2.3.4. \|t empirical distribution -- \|g 2.4. \|t Some common continuous distributions -- \|g 2.4.1. \|t Gaussian (normal) distribution -- \|g 2.4.2. \|t Degenerate pdf -- \|g 2.4.3. \|t Laplace distribution -- \|g 2.4.4. \|t gamma distribution -- \|g 2.4.5. \|t beta distribution -- \|g 2.4.6. \|t Pareto distribution -- \|g 2.5. \|t Joint probability distributions -- \|g 2.5.1. \|t Covariance and correlation -- \|g 2.5.2. \|t multivariate Gaussian -- \|g 2.5.3. \|t Multivariate Student t distribution -- \|g 2.5.4. 
\|t Dirichlet distribution -- \|g 2.6. \|t Transformations of random variables -- \|g 2.6.1. \|t Linear transformations -- \|g 2.6.2. \|t General transformations -- \|g 2.6.3. \|t Central limit theorem -- \|g 2.7. \|t Monte Carlo approximation -- \|g 2.7.1. \|t Example: change of variables, the MC way -- \|g 2.7.2. \|t Example: estimating π by Monte Carlo integration -- \|g 2.7.3. \|t Accuracy of Monte Carlo approximation -- \|g 2.8. \|t Information theory -- \|g 2.8.1. \|t Entropy -- \|g 2.8.2. \|t KL divergence -- \|g 2.8.3. \|t Mutual information -- \|g 3.1. \|t introduction -- \|g 3.2. \|t Bayesian concept learning -- \|g 3.2.1. \|t likelihood -- \|g 3.2.2. \|t Prior -- \|g 3.2.3. \|t Posterior -- \|g 3.2.4. \|t Posterior predictive distribution -- \|g 3.2.5. \|t more complex prior -- \|g 3.3. \|t beta-binomial model -- \|g 3.3.1. \|t Likelihood -- \|g 3.3.2. \|t Prior -- \|g 3.3.3. \|t Posterior -- \|g 3.3.4. \|t Posterior predictive distribution -- \|g 3.4. \|t Dirichlet-multinomial model -- \|g 3.4.1. \|t Likelihood -- \|g 3.4.2. \|t Prior -- \|g 3.4.3. \|t Posterior -- \|g 3.4.4. \|t Posterior predictive -- \|g 3.5. \|t Naive Bayes classifiers -- \|g 3.5.1. \|t Model fitting -- \|g 3.5.2. \|t Using the model for prediction -- \|g 3.5.3. \|t log-sum-exp trick -- \|g 3.5.4. \|t Feature selection using mutual information -- \|g 3.5.5. \|t Classifying documents using bag of words -- \|g 4.1. \|t Introduction -- \|g 4.1.1. \|t Notation -- \|g 4.1.2. \|t Basics -- \|g 4.1.3. \|t MLE for an MVN -- \|g 4.1.4. \|t Maximum entropy derivation of the Gaussian -- \|g 4.2. \|t Gaussian discriminant analysis -- \|g 4.2.1. \|t Quadratic discriminant analysis (QDA) -- \|g 4.2.2. \|t Linear discriminant analysis (LDA) -- \|g 4.2.3. \|t Two-class LDA -- \|g 4.2.4. \|t MLE for discriminant analysis -- \|g 4.2.5. \|t Strategies for preventing overfitting -- \|g 4.2.6. \|t Regularized LDA -- \|g 4.2.7. \|t Diagonal LDA -- \|g 4.2.8. 
\|t Nearest shrunken centroids classifier -- \|g 4.3. \|t Inference in jointly Gaussian distributions -- \|g 4.3.1. \|t Statement of the result -- \|g 4.3.2. \|t Examples -- \|g 4.3.3. \|t Information form -- \|g 4.3.4. \|t Proof of the result -- \|g 4.4. \|t Linear Gaussian systems -- \|g 4.4.1. \|t Statement of the result -- \|g 4.4.2. \|t Examples -- \|g 4.4.3. \|t Proof of the result -- \|g 4.5. \|t Digression: The Wishart distribution -- \|g 4.5.1. \|t Inverse Wishart distribution -- \|g 4.5.2. \|t Visualizing the Wishart distribution -- \|g 4.6. \|t Inferring the parameters of an MVN -- \|g 4.6.1. \|t Posterior distribution of μ -- \|g 4.6.2. \|t Posterior distribution of Σ -- \|g 4.6.3. \|t Posterior distribution of μ and Σ -- \|g 4.6.4. \|t Sensor fusion with unknown precisions -- \|g 5.1. \|t Introduction -- \|g 5.2. \|t Summarizing posterior distributions -- \|g 5.2.1. \|t MAP estimation -- \|g 5.2.2. \|t Credible intervals -- \|g 5.2.3. \|t Inference for a difference in proportions -- \|g 5.3. \|t Bayesian model selection -- \|g 5.3.1. \|t Bayesian Occam's razor -- \|g 5.3.2. \|t Computing the marginal likelihood (evidence) -- \|g 5.3.3. \|t Bayes factors -- \|g 5.3.4. \|t Jeffreys-Lindley paradox -- \|g 5.4. \|t Priors -- \|g 5.4.1. \|t Uninformative priors -- \|g 5.4.2. \|t Jeffreys priors -- \|g 5.4.3. \|t Robust priors -- \|g 5.4.4. \|t Mixtures of conjugate priors -- \|g 5.5. \|t Hierarchical Bayes -- \|g 5.5.1. \|t Example: modeling related cancer rates -- \|g 5.6. \|t Empirical Bayes -- \|g 5.6.1. \|t Example: beta-binomial model -- \|g 5.6.2. \|t Example: Gaussian-Gaussian model -- \|g 5.7. \|t Bayesian decision theory -- \|g 5.7.1. \|t Bayes estimators for common loss functions -- \|g 5.7.2. \|t false positive vs false negative tradeoff -- \|g 5.7.3. \|t Other topics -- \|g 6.1. \|t Introduction -- \|g 6.2. \|t Sampling distribution of an estimator -- \|g 6.2.1. \|t Bootstrap -- \|g 6.2.2. \|t Large sample theory for the MLE -- \|g 6.3. 
\|t Frequentist decision theory -- \|g 6.3.1. \|t Bayes risk -- \|g 6.3.2. \|t Minimax risk -- \|g 6.3.3. \|t Admissible estimators -- \|g 6.4. \|t Desirable properties of estimators -- \|g 6.4.1. \|t Consistent estimators -- \|g 6.4.2. \|t Unbiased estimators -- \|g 6.4.3. \|t Minimum variance estimators -- \|g 6.4.4. \|t bias-variance tradeoff -- \|g 6.5. \|t Empirical risk minimization -- \|g 6.5.1. \|t Regularized risk minimization -- \|g 6.5.2. \|t Structural risk minimization -- \|g 6.5.3. \|t Estimating the risk using cross validation -- \|g 6.5.4. \|t Upper bounding the risk using statistical learning theory -- \|g 6.5.5. \|t Surrogate loss functions -- \|g 6.6. \|t Pathologies of frequentist statistics -- \|g 6.6.1. \|t Counter-intuitive behavior of confidence intervals -- \|g 6.6.2. \|t p-values considered harmful -- \|g 6.6.3. \|t likelihood principle -- \|g 6.6.4. \|t Why isn't everyone a Bayesian? -- \|g 7.1. \|t Introduction -- \|g 7.2. \|t Model specification -- \|g 7.3. \|t Maximum likelihood estimation (least squares) -- \|g 7.3.1. \|t Derivation of the MLE -- \|g 7.3.2. \|t Geometric interpretation -- \|g 7.3.3. \|t Convexity -- \|g 7.4. \|t Robust linear regression -- \|g 7.5. \|t Ridge regression -- \|g 7.5.1. \|t Basic idea -- \|g 7.5.2. \|t Numerically stable computation -- \|g 7.5.3. \|t Connection with PCA -- \|g 7.5.4. \|t Regularization effects of big data -- \|g 7.6. \|t Bayesian linear regression -- \|g 7.6.1. \|t Computing the posterior -- \|g 7.6.2. \|t Computing the posterior predictive -- \|g 7.6.3. \|t Bayesian inference when σ2 is unknown -- \|g 7.6.4. \|t EB for linear regression (evidence procedure) -- \|g 8.1. \|t Introduction -- \|g 8.2. \|t Model specification -- \|g 8.3. \|t Model fitting -- \|g 8.3.1. \|t MLE -- \|g 8.3.2. \|t Steepest descent -- \|g 8.3.3. \|t Newton's method -- \|g 8.3.4. \|t Iteratively reweighted least squares (IRLS) -- \|g 8.3.5. \|t Quasi-Newton (variable metric) methods -- \|g 8.3.6. 
\|t l2 regularization -- \|g 8.3.7. \|t Multi-class logistic regression -- \|g 8.4. \|t Bayesian logistic regression -- \|g 8.4.1. \|t Laplace approximation -- \|g 8.4.2. \|t Derivation of the BIC -- \|g 8.4.3. \|t Gaussian approximation for logistic regression -- \|g 8.4.4. \|t Approximating the posterior predictive -- \|g 8.4.5. \|t Residual analysis (outlier detection) -- \|g 8.5. \|t Online learning and stochastic optimization -- \|g 8.5.1. \|t Online learning and regret minimization -- \|g 8.5.2. \|t Stochastic optimization and risk minimization -- \|g 8.5.3. \|t LMS algorithm -- \|g 8.5.4. \|t perceptron algorithm -- \|g 8.5.5. \|t Bayesian view -- \|g 8.6. \|t Generative vs discriminative classifiers -- \|g 8.6.1. \|t Pros and cons of each approach -- \|g 8.6.2. \|t Dealing with missing data -- \|g 8.6.3. \|t Fisher's linear discriminant analysis (FLDA) -- \|g 9.1. \|t Introduction -- \|g 9.2. \|t exponential family -- \|g 9.2.1. \|t Definition -- \|g 9.2.2. \|t Examples -- \|g 9.2.3. \|t Log partition function -- \|g 9.2.4. \|t MLE for the exponential family -- \|g 9.2.5. \|t Bayes for the exponential family -- \|g 9.2.6. \|t Maximum entropy derivation of the exponential family -- \|g 9.3. \|t Generalized linear models (GLMs) -- \|g 9.3.1. \|t Basics -- \|g 9.3.2. \|t ML and MAP estimation -- \|g 9.3.3. \|t Bayesian inference -- \|g 9.4. \|t Probit regression -- \|g 9.4.1. \|t ML/MAP estimation using gradient-based optimization -- \|g 9.4.2. \|t Latent variable interpretation -- \|g 9.4.3. \|t Ordinal probit regression -- \|g 9.4.4. \|t Multinomial probit models -- \|g 9.5. \|t Multi-task learning -- \|g 9.5.1. \|t Hierarchical Bayes for multi-task learning -- \|g 9.5.2. \|t Application to personalized email spam filtering -- \|g 9.5.3. \|t Application to domain adaptation -- \|g 9.5.4. \|t Other kinds of prior -- \|g 9.6. \|t Generalized linear mixed models -- \|g 9.6.1. \|t Example:
505	0	0	\|t - \|g 27.3.1 \|t Basics -- \|g 27.3.2. \|t Unsupervised discovery of topics -- \|g 27.3.3. \|t Quantitatively evaluating LDA as a language model -- \|g 27.3.4. \|t Fitting using (collapsed) Gibbs sampling -- \|g 27.3.5. \|t Example -- \|g 27.3.6. \|t Fitting using batch variational inference -- \|g 27.3.7. \|t Fitting using online variational inference -- \|g 27.3.8. \|t Determining the number of topics -- \|g 27.4. \|t Extensions of LDA -- \|g 27.4.1. \|t Correlated topic model -- \|g 27.4.2. \|t Dynamic topic model -- \|g 27.4.3. \|t LDA-HMM -- \|g 27.4.4. \|t Supervised LDA -- \|g 27.5. \|t LVMs for graph-structured data -- \|g 27.5.1. \|t Stochastic block model -- \|g 27.5.2. \|t Mixed membership stochastic block model -- \|g 27.5.3. \|t Relational topic model -- \|g 27.6. \|t LVMs for relational data -- \|g 27.6.1. \|t Infinite relational model -- \|g 27.6.2. \|t Probabilistic matrix factorization for collaborative filtering -- \|g 27.7. \|t Restricted Boltzmann machines (RBMs) -- \|g 27.7.1. \|t Varieties of RBMs -- \|g 27.7.2. \|t Learning RBMs -- \|g 27.7.3. \|t Applications of RBMs -- \|g 28.1. \|t Introduction -- \|g 28.2. \|t Deep generative models -- \|g 28.2.1. \|t Deep directed networks -- \|g 28.2.2. \|t Deep Boltzmann machines -- \|g 28.2.3. \|t Deep belief networks -- \|g 28.2.4. \|t Greedy layer-wise learning of DBNs -- \|g 28.3. \|t Deep neural networks -- \|g 28.3.1. \|t Deep multi-layer perceptrons --
505	0	0	\|t - \|g 9.6.2 \|t Computational issues -- \|g 9.7. \|t Learning to rank -- \|g 9.7.1. \|t pointwise approach -- \|g 9.7.2. \|t pairwise approach -- \|g 9.7.3. \|t listwise approach -- \|g 9.7.4. \|t Loss functions for ranking -- \|g 10.1. \|t Introduction -- \|g 10.1.1. \|t Chain rule -- \|g 10.1.2. \|t Conditional independence -- \|g 10.1.3. \|t Graphical models -- \|g 10.1.4. \|t Graph terminology -- \|g 10.1.5. \|t Directed graphical models -- \|g 10.2. \|t Examples -- \|g 10.2.1. \|t Naive Bayes classifiers -- \|g 10.2.2. \|t Markov and hidden Markov models -- \|g 10.2.3. \|t Medical diagnosis -- \|g 10.2.4. \|t Genetic linkage analysis -- \|g 10.2.5. \|t Directed Gaussian graphical models -- \|g 10.3. \|t Inference -- \|g 10.4. \|t Learning -- \|g 10.4.1. \|t Plate notation -- \|g 10.4.2. \|t Learning from complete data -- \|g 10.4.3. \|t Learning with missing and/or latent variables -- \|g 10.5. \|t Conditional independence properties of DGMs -- \|g 10.5.1. \|t d-separation and the Bayes Ball algorithm (global Markov properties) -- \|g 10.5.2. \|t Other Markov properties of DGMs -- \|g 10.5.3. \|t Markov blanket and full conditionals -- \|g 10.6. \|t Influence (decision) diagrams -- \|g 11.1. \|t Latent variable models -- \|g 11.2. \|t Mixture models -- \|g 11.2.1. \|t Mixtures of Gaussians -- \|g 11.2.2. \|t Mixture of multinoullis --
505	0	0	\|t semi-parametric GLMMs for medical data -
505	0	0	\|t fully observed data -- \|g 18.4.3 \|t EM for LG-SSM -- \|g 18.4.4. \|t Subspace methods -- \|g 18.4.5. \|t Bayesian methods for "fitting" LG-SSMs -- \|g 18.5. \|t Approximate online inference for non-linear, non-Gaussian SSMs -- \|g 18.5.1. \|t Extended Kalman filter (EKF) -- \|g 18.5.2. \|t Unscented Kalman filter (UKF) -- \|g 18.5.3. \|t Assumed density filtering (ADF) -- \|g 18.6. \|t Hybrid discrete/continuous SSMs -- \|g 18.6.1. \|t Inference -- \|g 18.6.2. \|t Application: data association and multi-target tracking -- \|g 18.6.3. \|t Application: fault diagnosis -- \|g 18.6.4. \|t Application: econometric forecasting -- \|g 19.1. \|t Introduction -- \|g 19.2. \|t Conditional independence properties of UGMs -- \|g 19.2.1. \|t Key properties -- \|g 19.2.2. \|t undirected alternative to d-separation -- \|g 19.2.3. \|t Comparing directed and undirected graphical models -- \|g 19.3. \|t Parameterization of MRFs -- \|g 19.3.1. \|t Hammersley-Clifford theorem -- \|g 19.3.2. \|t Representing potential functions -- \|g 19.4. \|t Examples of MRFs -- \|g 19.4.1. \|t Ising model -- \|g 19.4.2. \|t Hopfield networks -- \|g 19.4.3. \|t Potts model -- \|g 19.4.4. \|t Gaussian MRFs -- \|g 19.4.5. \|t Markov logic networks -- \|g 19.5. \|t Learning -- \|g 19.5.1. \|t Training maxent models using gradient methods -- \|g 19.5.2. \|t Training partially observed maxent models --
520			\|a "This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online"--Back cover
650		0	\|a Machine learning
650		0	\|a Probabilities
650		7	\|a Machine learning \|2 fast
650		7	\|a Probabilities \|2 fast
776	1		\|c Electronic resource \|z 9780262306164
830		0	\|a Adaptive computation and machine learning series
830		0	\|a Adaptive computation and machine learning
999	1	0	\|i 38368390-14df-4ae8-b607-a77ce0f48c8a \|l 11683574 \|s US-CTY \|m machine_learningprobabilistic_perspective__________________________________20121067___mitpra________________________________________murphy__kevin_p____________________p
999	1	0	\|i 38368390-14df-4ae8-b607-a77ce0f48c8a \|l 991039657769707861 \|s US-MDBJ \|m machine_learningprobabilistic_perspective__________________________________20121067___mitpra________________________________________murphy__kevin_p____________________p
999	1	0	\|i 38368390-14df-4ae8-b607-a77ce0f48c8a \|l 990056635500108501 \|s US-NCD \|m machine_learningprobabilistic_perspective__________________________________20121067___mitpra________________________________________murphy__kevin_p____________________p
999	1	0	\|i 38368390-14df-4ae8-b607-a77ce0f48c8a \|l 7929714 \|s US-NIC \|m machine_learningprobabilistic_perspective__________________________________20121067___mitpra________________________________________murphy__kevin_p____________________p
999	1	0	\|i 38368390-14df-4ae8-b607-a77ce0f48c8a \|l 9971978883506421 \|s US-NJP \|m machine_learningprobabilistic_perspective__________________________________20121067___mitpra________________________________________murphy__kevin_p____________________p
999	1	0	\|i 38368390-14df-4ae8-b607-a77ce0f48c8a \|l 9958428573503681 \|s US-PU \|m machine_learningprobabilistic_perspective__________________________________20121067___mitpra________________________________________murphy__kevin_p____________________p
999	1	0	\|i 38368390-14df-4ae8-b607-a77ce0f48c8a \|l 991035706799706966 \|s US-RPB \|m machine_learningprobabilistic_perspective__________________________________20121067___mitpra________________________________________murphy__kevin_p____________________p
999	1	1	\|l 11683574 \|s ISIL:US-CTY \|t BKS \|a sml \|b 39002124030470 \|c Q325.5 .M87X 2012 \|g 0 \|v 1 piece \|x circ \|y 10616633 \|p LOANABLE
999	1	1	\|l 991039657769707861 \|s ISIL:US-MDBJ \|t BKS \|a LSC shmoffs \|b 31151034179683 \|c Q325.5 .M87 2012 \|d 0 \|x jhbooks \|y 23453586290007861 \|p LOANABLE
999	1	1	\|l 990056635500108501 \|s ISIL:US-NCD \|t BKS \|a PERKN PKX2 \|b D04495461X \|d 0 \|x BOOK \|y 23767522200008501 \|p UNLOANABLE
999	1	1	\|l 990056635500108501 \|s ISIL:US-NCD \|t BKS \|a PERKN PKX \|b D03616010H \|c Q325.5 .M87 2012 \|d 0 \|x BOOK \|y 23767522220008501 \|p UNLOANABLE
999	1	1	\|l 7929714 \|s ISIL:US-NIC \|t BKS \|a uris \|b 31924117139356 \|c Oversize Q325.5 .M87 2012 + \|d lc \|k 1 \|x Book \|y 2bb14f22-cb0a-4bc1-96ab-250387ce4e6e \|p LOANABLE
999	1	1	\|l 9971978883506421 \|s ISIL:US-NJP \|t BKS \|a lewis stacks \|b 32101109833903 \|c Q325.5 .M87 2012 \|d 0 \|x Gen \|y 23632108580006421 \|p UNLOANABLE
999	1	1	\|l 9971978883506421 \|s ISIL:US-NJP \|t BKS \|a engineer stacks \|b 32101071111601 \|c Q325.5 .M87 2012 \|d 0 \|x Gen \|y 23632108600006421 \|p UNLOANABLE
999	1	1	\|l 9971978883506421 \|s ISIL:US-NJP \|t BKS \|a engineer stacks \|b 32101097602286 \|c Q325.5 .M87 2012 \|d 0 \|x Gen \|y 23632108560006421 \|p UNLOANABLE
999	1	1	\|l 9971978883506421 \|s ISIL:US-NJP \|t BKS \|a engineer stacks \|b 32101097600124 \|c Q325.5 .M87 2012 \|d 0 \|x Gen \|y 23632108520006421 \|p UNLOANABLE
999	1	1	\|l 9971978883506421 \|s ISIL:US-NJP \|t BKS \|a engineer stacks \|b 32101097602294 \|c Q325.5 .M87 2012 \|d 0 \|x Gen \|y 23632108540006421 \|p UNLOANABLE
999	1	1	\|l 9958428573503681 \|s ISIL:US-PU \|t BKS \|a VanPeltLib vanp \|b 31198060849929 \|c Q325.5 .M87 2012 \|d 0 \|x BOOK \|y 23304155170003681 \|p LOANABLE
999	1	1	\|l 991035706799706966 \|s ISIL:US-RPB \|t BKS \|a SCIENCE STACKS \|b 31236096533131 \|c Q325.5 .M87 2012 \|d 0 \|y 23253166710006966 \|p LOANABLE
999	1	1	\|l 991035706799706966 \|s ISIL:US-RPB \|t BKS \|a RES_SHARE IN_RS_REQ \|b 31236105453040 \|c Q325.5 .M87 2012 \|d 0 \|y 23253166720006966 \|p UNLOANABLE

Similar Items