HW2 NUIM CS401 F2005

The purpose of this assignment is to gain some practical experience with implementing and training linear units and feedforward sigmoidal networks.

Simple sample working code (in C) is available for a linear threshold unit and for a vanilla backpropagation network. You are welcome to use this, either by extending it appropriately, or by referring to it and copying or translating in constructing your own implementations, in whatever language you choose.

Turn in a brief writeup, hopefully interspersed with graphs and other graphical data. Handwritten is fine, if readable. Include your code as an Appendix. Rough-and-ready code is fine: functionality is what I care about.

Part 1

Train a linear threshold unit with the training dataset in the perceptron code directory, and show the evolution of the position of the separation surface. Also train a simple linear unit with a quadratic error measure (i.e., LMS), and show the line in input space that separates positive and negative output of the trained unit. Explain why this is not a good class separation surface.
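
For reference, one LMS (delta rule) step for a simple linear unit might look like the sketch below. This is not the provided sample code: the function names, the learning-rate parameter eta, and the convention of storing the constant term in w[n] are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Output of a linear unit with n inputs.
   Assumed layout: w[0..n-1] are input weights, w[n] is the constant term. */
double linear_output(const double w[], const double x[], size_t n)
{
    double y = w[n];
    for (size_t i = 0; i < n; i++)
        y += w[i] * x[i];
    return y;
}

/* One gradient-descent step on the quadratic error 0.5*(t - y)^2 for a
   single pattern (x, t).  Returns the squared error before the update. */
double lms_update(double w[], const double x[], double t, size_t n, double eta)
{
    assert(eta > 0.0);
    double err = t - linear_output(w, x, n);
    for (size_t i = 0; i < n; i++)
        w[i] += eta * err * x[i];
    w[n] += eta * err;   /* constant term: acts like a weight on an input of 1 */
    return err * err;
}
```

With two inputs, the line separating positive from negative output of the trained unit is the set of points where w[0]*x[0] + w[1]*x[1] + w[2] = 0.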

N.B. the code made available does not include a bias input or equivalently a constant additive term. You should include that missing term in your work.
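
One way to add the missing term to a linear threshold unit is sketched below, again under assumptions: targets are coded as +1 and -1, the constant term lives in w[n], and the names ltu_output and perceptron_update are hypothetical rather than taken from the sample code.

```c
#include <assert.h>
#include <stddef.h>

/* Linear threshold unit: +1 if w.x + bias > 0, otherwise -1.
   Assumed layout: w[0..n-1] are input weights, w[n] is the bias. */
int ltu_output(const double w[], const double x[], size_t n)
{
    double s = w[n];
    for (size_t i = 0; i < n; i++)
        s += w[i] * x[i];
    return s > 0.0 ? 1 : -1;
}

/* Perceptron rule on one pattern with target t in {-1, +1}.
   Returns 1 if the pattern was misclassified (weights changed), else 0. */
int perceptron_update(double w[], const double x[], int t, size_t n, double eta)
{
    assert(t == 1 || t == -1);
    if (ltu_output(w, x, n) == t)
        return 0;
    for (size_t i = 0; i < n; i++)
        w[i] += eta * t * x[i];
    w[n] += eta * t;     /* the bias is updated as if its input were always 1 */
    return 1;
}
```

Printing the weights after each pass through the training set gives the data needed to draw the separation surface as it moves.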

Part 2

Train a backpropagation network on a difficult training set of your choice. (The random data in the backprop code directory would be fine, but finding or generating your own would be more fun.) Plot the training error rate as a function of training time, and also the error rate on a test set. Play around in an attempt to get the network to exhibit a non-monotonic mean error on the testing set despite a monotonically decreasing mean error on the training set.
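
If you record the mean error once per epoch in an array, a small check like the following can confirm whether a curve is monotone and where the test error bottoms out. This is only a sketch; the function names and the one-value-per-epoch layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if err[0..n-1] never rises from one epoch to the next, else 0. */
int never_increases(const double err[], size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (err[i] > err[i - 1])
            return 0;
    return 1;
}

/* Index of the first epoch at which the recorded error is lowest. */
size_t best_epoch(const double err[], size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (err[i] < err[best])
            best = i;
    return best;
}
```

The behaviour asked for above corresponds to never_increases() returning 1 on the training curve and 0 on the test curve.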

N.B. the code made available does not include a bias input or weight. Instead a sleazy hack is used to fake up an extra bias input line. This does the job for the hidden units, but not for the output units. You can remedy that issue or not, as you choose, but you must justify your choice ... in writing.

Bonus points: look inside the network and figure out what the hidden units are doing and how it all works.

Due Date

12:00 Monday 17 October 2005.

Question/Answer Section

Q: What language should I use?
A: Whatever you like: you are free to make this decision yourself. You can choose FORTRAN66 or Emacs Lisp and I might question your judgment but I won't hold it against you. If you would like to use this as an opportunity to learn a new language, I might suggest Octave (a free Matlab-like system), SciPy (Scientific Python, built on numpy: Numeric Python), or perhaps a system extremely popular among machine learning researchers and for good reason: R.

Q: What tool should I use to make graphs and plots?
A: Whatever you like. I would recommend using something canned instead of writing your own. (You can suck the data into a spreadsheet and generate a plot that way, if you're a serious masochist who eschews anything mechanizable and wants a dramatic restriction on the number of points that can be plotted.)

Q: What do you mean by "mean error"?
A: The average error across all training patterns. This can be squared error or classification error, as appropriate for the task and situation.
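
As a rough sketch of the two flavours, assuming one real-valued output y[p] per pattern and targets t[p] coded as +1 or -1 (both assumptions, as are the function names):

```c
#include <assert.h>
#include <stddef.h>

/* Mean squared error over n patterns. */
double mean_squared_error(const double y[], const double t[], size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t p = 0; p < n; p++)
        sum += (t[p] - y[p]) * (t[p] - y[p]);
    return sum / (double)n;
}

/* Mean classification error: the fraction of patterns whose output,
   thresholded at zero, disagrees with the +1/-1 target. */
double mean_classification_error(const double y[], const double t[], size_t n)
{
    assert(n > 0);
    size_t wrong = 0;
    for (size_t p = 0; p < n; p++) {
        double label = y[p] > 0.0 ? 1.0 : -1.0;
        if (label != t[p])
            wrong++;
    }
    return (double)wrong / (double)n;
}
```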

Q: I've used C++ but your sample code is in C. Help!
A: Don't panic: the sample C code is also valid portable C++. I've avoided use of pointers even where they would have been idiomatic C to make it easier for C++ programmers to read. The only exception is in the I/O routines, which use C constructs instead of C++ ones. The output should be self-explanatory. The read_double() routine just reads a double from stdin.

Q: The generate-training-set shell scripts are in a language I don't know (sh) and use utilities I don't have (paste, awk, rl). Help!
A: Don't panic: you don't need to run that script yourself if you don't want to. Instead you can just download the training-set file it generates. Rather than generating fresh, separate training and test sets, or making your data some other way, you can just split the training-set file into training and test portions. Certainly you don't need to worry about generating fresh data until you have everything else working.
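
One possible way to do the split, assuming the file holds one pattern per line (check this against the actual file before relying on it), is to send every k-th line to the test portion. The function name split_lines and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Copies every k-th line of `in` to `test` and all other lines to `train`.
   Returns the number of lines written to `test`.
   Assumes one training pattern per line. */
long split_lines(FILE *in, FILE *train, FILE *test, long k)
{
    assert(k >= 2);
    long line = 0, n_test = 0;
    int c, at_line_start = 1;
    FILE *dst = train;

    while ((c = fgetc(in)) != EOF) {
        if (at_line_start) {          /* pick a destination once per line */
            line++;
            dst = (line % k == 0) ? test : train;
            if (dst == test)
                n_test++;
            at_line_start = 0;
        }
        fputc(c, dst);
        if (c == '\n')
            at_line_start = 1;
    }
    return n_test;
}
```

With k = 5, roughly a fifth of the patterns end up in the test portion. If the patterns in the file are ordered in some systematic way, shuffle them before splitting.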

Barak A. Pearlmutter