A "perceptron" classifies an input vector x as being in class 1 or class -1 based on whether w.x > theta, where w is the weight vector and theta is the threshold. Using y for the output, this is y = sign(w.x - theta) where sign(a) = +/- 1 depending on the sign if a. We can get rid of the threshold "theta" by moving to "homogeneous coordinates". This involves augmenting each input vector x with a new extra first element holding the number 1, and augmenting the weight vector with a new first element called the bias. Using x~ and w~ to denote these, we have w~ = (-theta w) x~ = (1 x) so w~.x~ = -theta + w.x so we can write y = sign(w~.x~) We will usually just write w or x when we mean these "augmented" w~ and x~. We now consider learning. Let d be the desired output for input x. We change the weights using w(new) = w - (y-d) eps x where eps is chosen to be just a little bigger than necessary to make sign(w(new) . x ) = d This is the perceptron learning rule. Given repeated presentations of the training data, it converges to a w that correctly classifies all the data, assuming such a w exists.