David MacKay

Information Theory, Pattern Recognition and Neural Networks

Part III Physics Course: Minor option. 12 lectures.

What's new, 2009:

  1. Lecture times: Mondays and Wednesdays, 2pm, starting 26th January, in the Pippard Lecture Theatre.
  2. Course synopsis: | Handout 1 (2 pages postscript) | pdf |
  3. Handout 2: | Handout 2 (2 pages postscript) | pdf |

What's new, 2008:

  1. Course synopsis: | Handout 1 (2 pages postscript) | pdf |
  2. Handout 2: | Handout 2 (2 pages postscript) | pdf |

What's new, 2007:

  1. Course synopsis: | Handout 1 (2 pages postscript) | pdf |
  2. Handout 2: | Handout 2 (2 pages postscript) | pdf |
  3. Handout 3: | Handout 3 (1 page postscript) | pdf |
  4. Past papers: | Papers from 2004, 2005, and 2006, with some worked solutions (postscript) (pdf) | The whole 2006 paper (pdf) | 2001 questions | & solutions |
  5. | Solutions for 2007 |
  6. Slides for lectures 1-12 are available online.
  7. Perl program for the Huffman algorithm: huffman.p
    Python program for the Huffman algorithm: huffman10.py
    Bent coin: a sparse file containing N=10000 bits, of which roughly a fraction 0.01 are 1s.
    (A minimal Python sketch of the Huffman algorithm follows this list.)
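
For readers who want to see the idea in code, here is a minimal Python sketch of the Huffman algorithm. It is not the huffman.p or huffman10.py program above; the function name and structure are my own, for illustration only.

    import heapq
    from collections import Counter

    def huffman_code(symbol_counts):
        """Build a Huffman code (symbol -> bitstring) from a dict of symbol counts."""
        # Each heap entry is (count, tiebreak, tree); a tree is either a symbol
        # or a pair (left, right) of subtrees.
        heap = [(c, i, sym) for i, (sym, c) in enumerate(symbol_counts.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            c1, _, t1 = heapq.heappop(heap)       # the two least-probable nodes ...
            c2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (c1 + c2, tiebreak, (t1, t2)))   # ... are merged
            tiebreak += 1
        code = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                code[tree] = prefix or "0"        # degenerate one-symbol alphabet
        walk(heap[0][2], "")
        return code

    print(huffman_code(Counter("abracadabra")))   # e.g. {'a': '0', 'b': '110', ...}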

The main thing at this site is the free on-line course textbook Information Theory, Inference and Learning Algorithms (which also has its own website).

An old (2006) incarnation of this website is available here.


Want to ask a question?

Please click to see the FAQ about the book, the course, or the software.

Want to give feedback on the book, or report typos?

Great! To ask a question about the book, please use this FAQ; to report a typo, mail me. Thank you! A list of corrections already reported is available.

Mathematical prerequisites for ITPRNN course

The level of Mathematics assumed is 1A Maths for Natural Scientists.

All the following topics will be reviewed.
  • Probability and statistics:
    • definition of conditional probability, joint probability, marginal probability;
    • Bayes' theorem;
    • definition of independence;
    • mean and standard deviation of a sum of independent random variables; law of large numbers.
  • Linear algebra:
    • inner products between vectors;
    • Taylor expansions and gradients of functions over vector spaces;
    • eigenvectors and eigenvalues of symmetric matrices.
  • Differentiation using the chain rule.
  • Statistical physics:
    • definition of partition function;
    • obtaining thermodynamic functions from the partition function;
    • entropy of a set of N binary spins as a function of the fraction x of spins that are +1 (a short sketch of this result follows the list);
    • relationship between the canonical ensemble and the microcanonical ensemble, and the equivalence (for large systems) of the Boltzmann and Gibbs entropies, S = k_B ln Ω and S = k_B Σ_i p_i ln(1/p_i).
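
A short sketch of the spin-entropy result (my notation; not part of the course handouts): the number of configurations of N spins with a fraction x equal to +1 is a binomial coefficient, and Stirling's approximation gives the entropy for large N.

    \Omega(x) = \binom{N}{xN}, \qquad
    S = k_B \ln \Omega(x) \;\approx\; N k_B \bigl[ -x \ln x - (1-x) \ln (1-x) \bigr]
    \qquad (N \gg 1, \ \text{by Stirling's approximation}).

The entropy is maximised at x = 1/2, where S = N k_B ln 2; and when all Ω accessible microstates are equally probable (p_i = 1/Ω), the Gibbs entropy reduces to the Boltzmann form.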

Course Summary


Information Theory, Pattern Recognition and Neural Networks

Minor Option [16 lecture synopsis]
(from 2006, the course is reduced to 12 lectures)
Lecturer: David MacKay
Introduction to information theory [1]
The possibility of reliable communication over unreliable channels. The (7,4) Hamming code and repetition codes.
Entropy and data compression [3]
Entropy, conditional entropy, mutual information, Shannon information content. The idea of typicality and the use of typical sets for source coding. Shannon's source coding theorem. Codes for data compression. Uniquely decodeable codes and the Kraft-McMillan inequality. Completeness of a symbol code. Prefix codes. Huffman codes. Arithmetic coding.
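
As a concrete illustration of these quantities (a sketch of my own, using the bent-coin file mentioned above), the binary entropy function H2(p) sets the best achievable compressed length of a sparse file:

    from math import log2

    def H2(p):
        """Binary entropy function, in bits."""
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    N, p1 = 10000, 0.01        # the bent-coin file: N bits, roughly 1% of them 1s
    print(H2(p1))              # about 0.0808 bits of entropy per source bit
    print(N * H2(p1))          # about 808 bits: roughly the shortest achievable encoding
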
Communication over noisy channels [3]
Definition of channel capacity. Capacity of binary symmetric channel; of binary erasure channel; of Z channel. Joint typicality, random codes, and Shannon's noisy channel coding theorem. Real channels and practical error-correcting codes. Hash codes.
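
A small numerical sketch (my own, for illustration) of the capacities just listed: the binary symmetric and erasure channels have simple closed forms, and the Z channel's capacity can be found by maximising the mutual information over the input distribution:

    from math import log2

    def H2(p):
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    f = 0.15                                   # flip / erasure probability
    C_bsc = 1 - H2(f)                          # binary symmetric channel
    C_bec = 1 - f                              # binary erasure channel

    # Z channel (only the input 1 is corrupted, with probability f):
    # I(X;Y) = H(Y) - H(Y|X) = H2(p1*(1-f)) - p1*H2(f); maximise over p1 numerically.
    C_z = max(H2(p1 * (1 - f)) - p1 * H2(f) for p1 in (i / 10000 for i in range(10001)))

    print(C_bsc, C_bec, C_z)                   # roughly 0.39, 0.85, and 0.685 for f = 0.15
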
Statistical inference, data modelling and pattern recognition [2]
The likelihood function and Bayes' theorem. Clustering as an example.
Approximation of probability distributions [2]
Laplace's method. (Approximation of probability distributions by Gaussian distributions.)
Monte Carlo methods: Importance sampling, rejection sampling, Gibbs sampling, Metropolis method. (Slice sampling, Hybrid Monte Carlo, Overrelaxation, exact sampling. *)
Variational methods and mean field theory. Ising models.
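
To make Laplace's method (the first topic in this block) concrete, here is a tiny sketch on a toy example of my own choosing: the normaliser Z = ∫ exp(-E(x)) dx is approximated by exp(-E(x0)) sqrt(2π / E''(x0)), where x0 minimises E.

    from math import exp, log, pi, sqrt

    # Toy target: p*(x) = x * exp(-x) on x > 0, so E(x) = -ln p*(x) = x - ln x.
    # The exact normaliser is Z = integral_0^inf x e^{-x} dx = 1.
    E   = lambda x: x - log(x)
    dE  = lambda x: 1 - 1 / x            # E'(x); zero at the mode
    d2E = lambda x: 1 / x**2             # E''(x); curvature at the mode

    x0 = 1.0                             # minimum of E (maximum of p*)
    assert abs(dE(x0)) < 1e-12
    Z_laplace = exp(-E(x0)) * sqrt(2 * pi / d2E(x0))
    print(Z_laplace)                     # about 0.922, versus the exact Z = 1
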
Neural networks and content-addressable memories [2]
The Hopfield network. [* = non-examinable]
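
A minimal sketch (my own, for illustration only) of the Hopfield network as a content-addressable memory: patterns are stored with the Hebb rule, and a corrupted pattern is cleaned up by repeated threshold updates.

    import numpy as np

    def hopfield_weights(patterns):
        """Hebbian storage: W = (1/N) * sum_mu x_mu x_mu^T, with zero diagonal."""
        X = np.asarray(patterns, dtype=float)        # shape (num_patterns, N), entries +/-1
        W = X.T @ X / X.shape[1]
        np.fill_diagonal(W, 0.0)
        return W

    def recall(W, s, n_sweeps=10):
        """Asynchronous updates s_i <- sign(sum_j W_ij s_j)."""
        s = np.array(s, dtype=float)
        for _ in range(n_sweeps):
            for i in np.random.permutation(len(s)):
                s[i] = 1.0 if W[i] @ s >= 0 else -1.0
        return s

    rng = np.random.default_rng(0)
    memories = rng.choice([-1.0, 1.0], size=(3, 100))   # three random 100-bit memories
    W = hopfield_weights(memories)
    probe = memories[0].copy()
    probe[:20] *= -1                                     # corrupt 20 of the 100 bits
    print(np.mean(recall(W, probe) == memories[0]))      # typically 1.0: the memory is restored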

Bibliography

The course textbook is Information theory, inference, and learning algorithms, by D.J.C.MacKay (2003, C.U.P.) (Rayleigh library: 39 M 20; Betty & Gordon Moore Library: Q360.M33 2003). This 640-page textbook covers the whole course, and a whole lot more. All students are strongly encouraged to buy or borrow this textbook. If you buy it at the CUP bookshop and show your University ID, you can get a 20% discount. You may download the book for free too. Guarantee: If you buy the book, then decide that you don't want to keep it, I will buy it from you for a good price and sell it on to a future student.

Other online resources

  1. A nice summary of graphical models by Kevin Murphy
  2. Introduction to Statistical Thinking - Michael Lavine offers a free book on statistics, emphasizing the likelihood principle.
  3. computer vision/image analysis/imaging books online

Textbook recommendations

Other highly recommended books are as follows; I especially recommend Goldie and Pinch (1991), Bishop (1995), and Sivia (1996), which are all reasonably priced.

For information theory and coding theory, excellent texts are McEliece (1977) and the original book by Shannon and Weaver (1949), but these are hard to find (ask your library to get Shannon (1993)). Three excellent alternatives are Hamming (1986), Goldie and Pinch (1991), and Welsh (1988). Golomb et al. (1994) is readable and discusses the practical side of coding theory as well as information theory. Gallager (1968) is similar and goes into more depth; it's a good book. Cover and Thomas (1991) is also good, though their approach is theoretical rather than practical. An important journal paper on Arithmetic coding is Witten et al. (1987) (available in the Pt II/III library).

For neural networks and pattern recognition, an excellent text is Bishop (1995); Ripley (1996) is also recommended. Ripley's book is encyclopaedic, covering a wide range of statistical models and giving large numbers of citations of the original literature; he includes a set of practical data sets which are referred to frequently throughout the book, and he also goes into some theoretical depth. Ripley's coverage is from the point of view of the statistician. Bishop's perspective is that of the Physicist-Engineer. Real data sets do not appear in Bishop's book, but simple examples are given throughout, and Bishop includes exercises too. An alternative text which emphasises connections between neural networks and statistical physics is Hertz et al. (1991). This text discusses Hopfield networks at length, unlike Bishop (1995) and Ripley (1996). An older text on pattern recognition is Duda and Hart (1973), recently republished (Duda et al., 2000) - recommended. An older book on neural networks which was written at the start of the latest craze of neural nets is Rumelhart and McClelland (1986). It's an exciting read. An excellent book on the state of the art in supervised neural network methods is Neal (1996).

For pure statistical inference, I highly recommend Sivia (1996); Berger (1985) and Bretthorst (1988) (now out of print) are also very good. Jeffreys (1939) is an important classic, and Box and Tiao (1973) is well worth reading too. Connections between statistical inference and statistical Physics are explored in the essential essays of Jaynes (Rosenkrantz, 1983). For further reading on graphical models and Bayesian belief networks, which have widespread importance in the Artificial Intelligence community, Jensen (1996) is recommended; it includes a floppy disc with the Hugin software for simulating Bayesian networks. A more theoretical text on graphical models is Lauritzen (1996). For further reading about probabilistic modelling of proteins and nucleic acids, I highly recommend Durbin et al. (1998).

Berger, J. (1985) Statistical Decision theory and Bayesian Analysis. Springer.

Bishop, C. M. (1995) Neural Networks for Pattern Recognition. Oxford University Press.

Box, G. E. P., and Tiao, G. C. (1973) Bayesian inference in statistical analysis. Addison-Wesley.

Bretthorst, G. (1988) Bayesian spectrum analysis and parameter estimation. Springer. Also available at bayes.wustl.edu.

Cover, T. M., and Thomas, J. A. (1991) Elements of Information Theory. New York: Wiley.

Duda, R. O., and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley.

Duda, R. O., Hart, P. E., and Stork, D. G. (2000) Pattern Classification. New York: Wiley. 2nd Edition.

Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. (1998) Biological Sequence Analysis. Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.

Gallager, R. G. (1968) Information Theory and Reliable Communication. New York: Wiley.

Goldie, C. M., and Pinch, R. G. E. (1991) Communication theory. Cambridge: Cambridge University Press.

Golomb, S. W., Peile, R. E., and Scholtz, R. A. (1994) Basic Concepts in Information Theory and Coding: The Adventures of Secret Agent 00111 . New York: Plenum Press.

Hamming, R. W. (1986) Coding and Information Theory. Englewood Cliffs, NJ: Prentice-Hall, 2nd edition.

Hertz, J., Krogh, A., and Palmer, R. G. (1991) Introduction to the Theory of Neural Computation. Addison-Wesley.

Jeffreys, H. (1939) Theory of Probability. Oxford Univ. Press. 3rd edition reprinted 1985.

Jensen, F. V. (1996) An Introduction to Bayesian Networks. London: UCL press.

Lauritzen, S. L. (1996) Graphical Models. Number 17 in Oxford Statistical Science Series. Oxford: Clarendon Press.

McEliece, R. J. (1977, recently reprinted in 2nd edn by C.U.P.) The Theory of Information and Coding: A Mathematical Framework for Communication. Reading, Mass.: Addison-Wesley.

Neal, R. M. (1996) Bayesian Learning for Neural Networks. Number 118 in Lecture Notes in Statistics. New York: Springer.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

Rosenkrantz, R. D. (1983) E.T. Jaynes. Papers on Probability, Statistics and Statistical Physics. Kluwer.

Rumelhart, D. E., and McClelland, J. E. (1986) Parallel Distributed Processing. Cambridge Mass.: MIT Press.

Shannon, C. E. (1993) Collected Papers. New York: IEEE Press. Edited by N. J. A. Sloane and A. D. Wyner.

Shannon, C. E., and Weaver, W. (1949) The Mathematical Theory of Communication. Urbana: Univ. of Illinois Press.

Sivia, D. S. (1996) Data Analysis. A Bayesian Tutorial. Oxford University Press.

Welsh, D. (1988) Codes and Cryptography. Clarendon Press.

Witten, I. H., Neal, R. M., and Cleary, J. G. (1987) Arithmetic coding for data compression. Communications of the ACM 30 (6): 520-540.


In 2012, Emli-Mari Nel kindly organised the filming of 16 lectures.
From 2nd November 2012, you can view the videos online (synchronized with snapshots and slides) at the videolectures website. (The original videos of the lectures and slides are also available at http://www.inference.org.uk/itprnn_lectures.)



Supervision information

Supervisions are on Thursdays and Fridays.

Thursday supervisions are in the TCM seminar room (top floor, Mott building); Friday supervisions are at 1.45-2.45pm in the Mott seminar room (Room 531), and at 3-4pm in the Mott seminar room (mostly).

  Thursday                    Friday 1.45-2.45pm   Friday 3-4pm
  Thu 5th February, 4pm-5pm   Fri 6 Feb            Fri 6 Feb (Pippard)
  Thu 12 Feb, 4pm-5pm         Fri 13 Feb           Fri 13 Feb
  Thu 19 Feb, 5pm-6pm         Fri 20 Feb           Fri 20 Feb
  Thu 26 Feb, 5pm-6pm         Fri 27 Feb           Fri 27 Feb
  Thu 5 March, 5pm-6pm        Fri 6 Mar            Fri 6 Mar (Pippard)
  Thu 12 March, 5pm-6pm       Fri 13 Mar           Fri 13 Mar

What work to do

The book contains numerous exercises, complete with worked solutions. A subset of these exercises will be designated as the exercises you should work on each week. [Generally, these exercises are the ones marked `highly recommended' by a marginal rat.] I encourage you not to look at the worked solutions for these exercises before you have attempted them yourself.

In the supervisions, we will focus in detail on just a few of the exercises. (You should work on all the exercises yourselves before supervisions.)

So far

The first week's recommended exercises: 1.3 (p.8), 1.5-7 (p.13), 1.9, & 1.11 (p.14).


Information Theory, Inference, and Learning Algorithms

(Hardback, 640 pages, Published September 2003)

Order your copy

Price: £35.00 / $60.00, from | CUP UK/USA | amazon.co.uk/.com/.ca/.co.jp | Barnes & Noble USA | booksprice | fetchbook.info | allbookstores | biggerbooks | blackwells | directtextbook | kalahari.net (South Africa) | Special paperback edition for South Asia |


Download the book too

You can browse and search the book on Google books.

You may download the book in one file (640 pages):

Mirrors: U.K. (english) | Canada | South Africa

  • PDF (A4): pdf (9M) (fourth printing, March 2005) | pdf | pdf
  • Postscript (A4): postscript (5M) (fourth printing, March 2005) | postscript | postscript
  • EPUB (experimental format): epub file (1.4M) (fourth printing); the experimental epub file was created in the UK on 31 Aug 2014 with:
        ebook-convert --isbn 9780521642989 --authors "David J C MacKay" --book-producer "David J C MacKay" --comments "Information theory, inference, and learning algorithms - experimental epub version 31.8.2014" --language "English" --pubdate "2003" --title "Information theory, inference, and learning algorithms" --cover ~/pub/itila/images/Sept2003Cover.jpg book.pdf book.epub
  • DJVU: djvu file (6M) (2nd printing) | djvu file | djvu file
    (djvu information | Download djView)
  • Just the words (latex) [provided for convenient searching]: (2.4M) | (latex) | (latex)
  • Just the figures (NEW) [provided for use of teachers]: all in one file, postscript (2M), pdf (5M), ps.gz, pdf; also available as individual eps files
  • Individual chapters: postscript and pdf available from this page | mirror | mirror

Notes:

  • Version 6.0 was released Thu 26/6/03; the book is finished. You are welcome to view the book on-screen. Version 6.0 was used for the first printing, published by C.U.P. September 2003.
  • Version 6.6 was released Mon 22/12/03; it will be used for the second printing, to be released January 2004. In this second printing, a small number of typographical errors were corrected, and the design of the book was altered slightly. Page-numbering generally remains unchanged, except in chapters 1, 6, and 28, where a few paragraphs, figures, and equations have moved around. All equation, section, and exercise numbers are unchanged.
  • Version 7.0 is the third printing (November 2004). Its only differences from the 2nd printing are a number of corrections, and the renaming of the chapter `Correlated Random Variables' to `Dependent Random Variables'.

Copyright issues: The book is copyright (c) Cambridge University Press. It has been available in bookstores since September 2003. The cover price in 2003 was 30 pounds (UK) and $50 (USA); in 2006, 35 pounds and $60 (USA).

Now the book is published, these files will remain viewable on this website. The same copyright rules will apply to the online copy of the book as apply to normal books. [e.g., copying the whole book onto paper is not permitted.]

History:
Draft 1.1.1 - March 14 1997.
Draft 1.2.1 - April 4 1997.
Draft 1.2.3 - April 9 1997.
Draft 1.2.4 - April 10 1997. Margins altered so as to print better on North American paper.
Draft 1.3.0 - December 23 1997.
Draft 1.9.0 - Feb 1 1999.
Draft 2.0.0 - Jan 6 2000. New page layout.
Draft 2.2.0 - Dec 23 2000. Fresh draft.
Draft 3.1415 - Jan 12 2003. Nearly finished.
Draft 4.0 - April 15 2003. Chapter sequence finalized.
Draft 4.1 - April 18 2003. Adding new frontmatter (Preface etc.) to book. Corrected printing of letter version.
Version 6.0 - Thu 26 June 2003. Final version.
Version 6.6 - Mon 22 December 2003. Second printing.

  • Here is my method for converting to two-up under linux:
    pstops '4:0L@.67(20cm,1cm)+1L@.67(20cm,15cm),3R@.67(1cm,15.25cm)\
    +2R@.67(1cm,29.25cm)' $*.ps $*.dps 

Errors in the book

I would be grateful to hear about errors in my lecture notes.

A list of corrections is provided.

Please send me new corrections by email. Thank you!

If you have a query about the book that you think other people might also ask, please use one of the following links to submit your query through metaFAQ; you may find the answer is already there:
Query categories
  • Any other query

  • Software, demonstrations, etc.

    Hint: To force download to file, use shift-click.

    The tcl programs and other links marked (*) can be run directly in netscape if you have the tcl/tk plugin. You can also run the tcl programs in the normal way if you have a linux machine with X windows and tcl: download the tcl files, then execute them (this is preferable to the netscape approach, I think). [The top four lines of each tcl file contain flags which switch the program between working with the netscape plugin and working alone under X.] The other programs can be used by people on unix systems that have the relevant software (perl, gnuplot, octave).






    Inference methods




    Online Hamiltonian Monte Carlo Demo

    (formerly known as `Hybrid Monte Carlo')

    Task: sample from the posterior distribution of a bivariate Gaussian distribution given ten data points. The left-hand image shows the details underlying a sequence of HMC transitions. The right-hand image shows 50 of the successive states produced by the Markov chain - the endpoints of the trajectories.
    [Figures: Hamiltonian Monte Carlo demonstration details; Hamiltonian Monte Carlo demonstration]
    fit.tar: a tar file giving the octave source code.
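
    For readers who cannot run the octave/tcl demos, here is a minimal Python sketch of Hamiltonian Monte Carlo (my own code, mirroring the spirit of the demo but not its source), sampling a strongly-correlated bivariate Gaussian with leapfrog trajectories and a Metropolis accept/reject step:

        import numpy as np

        A = np.linalg.inv(np.array([[1.0, 0.95], [0.95, 1.0]]))    # precision matrix of the target
        E      = lambda x: 0.5 * x @ A @ x                         # energy = -log (unnormalised) density
        grad_E = lambda x: A @ x

        def hmc_step(x, rng, eps=0.1, n_leapfrog=20):
            p = rng.standard_normal(x.size)               # resample the momentum
            x_new, p_new = x.copy(), p.copy()
            p_new -= 0.5 * eps * grad_E(x_new)            # leapfrog: half step in p,
            for _ in range(n_leapfrog):
                x_new += eps * p_new                      # full steps in x and p,
                p_new -= eps * grad_E(x_new)
            p_new += 0.5 * eps * grad_E(x_new)            # and undo the extra half step in p
            dH = (E(x_new) + 0.5 * p_new @ p_new) - (E(x) + 0.5 * p @ p)
            return x_new if rng.random() < np.exp(-dH) else x      # Metropolis accept/reject

        rng = np.random.default_rng(0)
        x, samples = np.zeros(2), []
        for _ in range(1000):
            x = hmc_step(x, rng)
            samples.append(x)
        print(np.cov(np.array(samples).T))                # roughly [[1, 0.95], [0.95, 1]]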


    There is also an introductory octave DEMO that explains Hamiltonian Monte Carlo and contrasts it with the Metropolis method, using the same "strongly-correlated Gaussian" example as is used in the Adler Overrelaxation demo.







    Sparse Graph Codes


    Other files

    • Images for the chromatic aberration demo: | 1 | 2 |
      Chromatic aberration demo by David MacKay.

    Links to other people's software

    Haskell (Naive) Bayes library by Frederik Eaton

    Site last modified Sun Aug 31 18:51:05 BST 2014