A "perceptron" classifies an input vector x as being in class 1 or
class -1 based on whether w.x > theta, where w is the weight vector
and theta is the threshold.  Using y for the output, this is
 y = sign(w.x - theta)
where sign(a) = +1 or -1 depending on the sign of a.
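This decision rule can be sketched directly in Python.  The function
name is my own, and I take sign(0) = +1 since the text leaves the
zero case unspecified -- that choice is an assumption.

```python
import numpy as np

def perceptron_output(w, x, theta):
    # y = sign(w.x - theta); sign(0) is taken as +1 here (an
    # assumption -- the text does not specify the zero case).
    return 1 if np.dot(w, x) - theta >= 0 else -1
```

For example, with w = (2, 1) and theta = 1, the input x = (1, 0)
gives w.x - theta = 1 > 0, so the output is +1.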

We can get rid of the threshold "theta" by moving to "homogeneous
coordinates".  This involves augmenting each input vector x with a
new extra first element holding the number 1, and augmenting the
weight vector with a new first element called the bias.  Using x~ and
w~ to denote these, we have
 w~ = (-theta w)
 x~ = (1 x)
 w~.x~ = -theta + w.x
so we can write
 y = sign(w~.x~)
We will usually just write w or x when we mean these "augmented" w~
and x~.
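The move to homogeneous coordinates can be sketched as below.  The
helper names are mine, and sign(0) = +1 is again an assumption.

```python
import numpy as np

def augment(x):
    # x~ = (1 x): prepend the constant 1 so the threshold can be
    # folded into the weight vector as a bias.
    return np.concatenate(([1.0], x))

def predict(w_aug, x):
    # y = sign(w~.x~), with sign(0) taken as +1 (an assumption).
    return 1 if np.dot(w_aug, augment(x)) >= 0 else -1
```

With w = (2, 1) and theta = 1 as before, the augmented weight vector
is w~ = (-1, 2, 1), and predict(w~, x) reproduces the thresholded
output: the bias term -theta is applied through the constant 1 in x~.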

We now consider learning.  Let d be the desired output for input x.
We change the weights using
 w(new) = w - (y-d) eps x
where eps is chosen to be just a little bigger than necessary to make
 sign(w(new).x) = d
Note that y-d = 0 when the output is already correct, so only
misclassified inputs change the weights.
This is the perceptron learning rule.  Given repeated presentations of
the training data, it converges to a w that correctly classifies all
the data, provided such a w exists (i.e., the training data are
linearly separable).
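The whole procedure can be sketched as follows.  The text says only
that eps is "a little bigger than necessary"; the smallest eps that
flips sign(w.x) to d is |w.x| / (2 x.x), so the margin 0.1 added
below is an arbitrary choice of mine, not from the text.  The cap on
epochs and the sign(0) = +1 convention are likewise assumptions.

```python
import numpy as np

def train_perceptron(X, D, max_epochs=100):
    # X: rows are already-augmented input vectors x~.
    # D: desired outputs, each +1 or -1.
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, d in zip(X, D):
            y = 1 if np.dot(w, x) >= 0 else -1  # sign(0) = +1 (assumption)
            if y != d:
                # Smallest eps that makes sign(w(new).x) = d is
                # |w.x| / (2 x.x); add a small margin (0.1, arbitrary)
                # so it is "just a little bigger than necessary".
                eps = (abs(np.dot(w, x)) + 0.1) / (2.0 * np.dot(x, x))
                w = w - (y - d) * eps * x
                errors += 1
        if errors == 0:
            break  # all training inputs classified correctly
    return w
```

Each update makes the current input come out correct: since y = -d
when an update happens, y - d = 2y, and the chosen eps is just enough
to push w.x past zero in the direction of d.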