
INTERMITTENT REINFORCEMENT SCHEDULES

A lot of conventional trainers are opposed to clicker training because it uses food rewards. They claim, quite correctly, that a dog who is always rewarded with food will become dependent on the food reward and will refuse to work unless there is food present.

That much is true. However, in clicker training we don't go on rewarding every correct response once the behaviour has been learned.

Generally, when you are training a new behaviour, such as targeting the dresser drawer, it's a good idea to c/t for every correct response. This is called a continuous reinforcement schedule.

Once the dog knows the behaviour and it's on cue, it's time to introduce an intermittent reinforcement schedule. Now, there are whole books written about the subject of reinforcement schedules, but we're going to use just one - the variable ratio reinforcement schedule (what a mouthful!). Again, an example will be useful, so let's suppose you've taught your dog to sit on cue, and you now want to put the behaviour on a variable ratio reinforcement schedule.

The "ratio" is the proportion of sits which are rewarded, and "variable" refers to the number of unrewarded sits in between reinforcements, which changes each time. First you decide what ratio, or proportion, of sits you want to reinforce: 1 in 5 sits, or 1 in 10, or 1 in 20, whatever you decide on. Suppose you decide to reinforce 1 out of every 5 sits.

You then make sure that you average 1 reinforcement to every 5 sits, but that you vary the number of sits in between reinforcements (hence variable ratio - these names do make sense, sort of). It is very important not to reinforce the dog on every 5th sit, as he will see a pattern emerging. However, after a large number of sits he should have received on average 1 reward for every 5 sits.

So in 20 sits you would give 4 rewards, but NOT on the 1st, 6th, 11th and 16th sits, or in any other regular pattern! You might give one on the 2nd sit, one on the 9th, one on the 12th and one on the 17th. In other words, the number of sits which don't get rewarded is different each time, and the dog has no way of working out in advance which sit is going to be rewarded.
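
If you'd rather not make up the pattern in your head, you can let a computer pick the rewarded sits for you. The short Python sketch below is only an illustration: the function name variable_ratio_schedule and its parameters are made up for this example, and choosing the sits at random is just one simple way of keeping the gaps uneven. It always gives the right number of rewards (4 in 20 sits for a 1-in-5 ratio), but which sits get them changes from run to run.

```python
import random


def variable_ratio_schedule(total_sits, ratio, seed=None):
    """Pick which sits to reward so that 1 in `ratio` is rewarded overall.

    Hypothetical helper for illustration only. Returns the numbers of the
    rewarded sits (counting from 1), in order.
    """
    rng = random.Random(seed)
    rewards = total_sits // ratio  # e.g. 20 sits at 1-in-5 gives 4 rewards
    # Choose the rewarded sits at random, so the gaps between them vary
    # and there is no fixed pattern for the dog to pick up on.
    return sorted(rng.sample(range(1, total_sits + 1), rewards))


# Plan one session of 20 sits at a ratio of 1 in 5.
schedule = variable_ratio_schedule(20, 5)
print("Click and treat on sits:", schedule)
```

Every now and then a random pick will happen to look regular, or bunch the rewards together. If that happens, just run it again and use the next list.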

A reinforcement schedule like this has the effect of making the behaviour really persistent. It works with people too. Suppose you know that the coke machine in the office is very reliable and always gives you a tin of coke when you put your money in and push the button. If one morning you put your money in, push the button and nothing happens, are you going to try again? You might try once more, but then you'll probably give up and phone the vending company. This is exactly what happens when you stop reinforcing a behaviour which has been trained on a continuous reinforcement schedule - the behaviour disappears pretty quickly.

Now consider the case where you put money in a slot machine. You don't know when you're going to win, and you don't know how many times you're going to have to put money in before you get your payoff, but you keep going, especially if you've won something before, because you just know that the payoff is around the corner. You've been put on a variable ratio reinforcement schedule!