INTERMITTENT REINFORCEMENT SCHEDULES
A lot of conventional trainers are opposed to clicker training
because it uses food rewards. They claim, quite correctly, that
a dog who is always rewarded with food will become dependent on
the food reward and will refuse to work unless there is food present.
In clicker training, however, we don't go on rewarding every
correct response forever.
Generally, when you are training a new behaviour, such as targeting
the dresser drawer, it's a good idea to c/t for every correct response.
This is called a continuous reinforcement schedule.
Once the dog knows the behaviour and it's on cue, it's time to
introduce an intermittent reinforcement schedule. Now, there are
whole books written about the subject of reinforcement schedules,
but we're going to use just one - the variable ratio reinforcement
schedule (what a mouthful!). Again, an example will be useful, so
let's suppose you've taught your dog to sit on cue, and you now
want to put the behaviour on a variable ratio reinforcement schedule.
The ratio is the proportion of sits which are rewarded, and the
variable part is the number of sits in between reinforcements, which
changes from one reward to the next. First you decide what ratio,
or proportion, of sits you want to reinforce.
This means that you reinforce 1 in 5 sits, or 1 in 10 sits, or 1
in 20 sits, whatever ratio you decide on. Suppose you decide to
reinforce 1 out of every 5 sits.
You then make sure that you average 1 reinforcement to every 5
sits, but that you vary the number of sits in between reinforcements
(hence variable ratio - these names do make sense, sort of). It
is very important not to reinforce the dog on every 5th sit, as
he will see a pattern emerging. However, after a large number of
sits he should have received on average 1 reward for every
5 sits.
So in 20 sits you would give 4 rewards, but NOT on the 1st, 6th,
11th and 16th sits! You might give one on the 2nd sit, one on the
9th, one on the 12th and one on the 17th. In other words, the number
of sits which don't get rewarded is different each time, and the
dog has no way of working out in advance which sit is going to be
rewarded.
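
If you'd rather not make up the pattern in your head, a few lines
of Python can do it for you. This is only a sketch of one possible
approach, and the function name variable_ratio_schedule is made up
for this example. It assumes you split the session into blocks of
5 sits and pick one sit at random from each block, which keeps the
average at 1 reward in 5 sits while the gaps keep changing.

```python
import random

def variable_ratio_schedule(total_sits, ratio, rng=None):
    """Return the 1-based sit numbers that earn a reward.

    Illustrative helper only: one sit is chosen at random inside
    each complete block of `ratio` sits.
    """
    rng = rng or random.Random()
    rewarded = []
    # Blocks start at sit 1, ratio + 1, 2 * ratio + 1, and so on.
    # Any leftover sits that don't fill a whole block get no reward.
    for block_start in range(1, total_sits - ratio + 2, ratio):
        block_end = block_start + ratio - 1
        rewarded.append(rng.randint(block_start, block_end))
    return rewarded

# 20 sits at a ratio of 1 in 5 gives 4 rewarded sits.
print(variable_ratio_schedule(20, 5))
```

Run it afresh for each session so the dog never meets the same
pattern twice. Once in a while the random picks will come out evenly
spaced by chance; if that happens, simply run it again.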
A reinforcement schedule like this has the effect of making the
behaviour really persistent. It works with people too. Suppose you
know that the Coke machine in the office is very reliable and always
gives you a tin of Coke when you put your money in and push the
button. If one morning you put your money in, push the button and
nothing happens, are you going to try again? You might try once
more, but then you'll probably give up and phone the vending company.
This is exactly what happens when you stop reinforcing a behaviour
which has been trained on a continuous reinforcement schedule -
the behaviour disappears pretty quickly.
Now consider the case where you put money in a slot machine. You
don't know when you're going to win, you don't know how many times
you're going to have to put money in before you get your payoff,
but you keep going, especially if you've won something before, because
you just know that the payoff is around the corner. You've been
put on a variable ratio reinforcement schedule!