In another post I explained the power of a variable reward schedule and how to use it to your advantage. A variable ratio schedule is the most powerful reward schedule because it takes the longest for a behaviour to become extinct. How can you use this information in re-training undesired behaviour?
‘Extinction’ of behaviour
Extinction means that the behaviour is no longer displayed in a certain situation. There is 0% chance of a reward, so the behaviour has become ‘useless’ in that situation.
This is what we want to accomplish when a horse displays undesired behaviour, like kicking the stall door. We want to ignore the behaviour in order to make clear that this will not get him anywhere.
Why does ignoring undesired behaviour often seem not to work at all? It is because of a natural occurrence in learning that is called an ‘extinction burst’.
Once the owner decides to ignore the undesired behaviour in order to let it become extinct (0% chance of a reward, so displaying the behaviour no longer has any value for the horse), the behaviour will first show an ‘extinction burst’.
During the extinction burst the horse will show an increased amount of effort in the hope of a reward. If one decides to ‘reward’ (read: react to) this undesired behaviour in any way, even by shouting at the horse in an attempt to punish it, chances are that the horse regards this as his reward. After all, it is the receiver (the horse) who determines whether something is a reward.
How to handle it
If the horse kicks a door in order to get your attention and he gets what he wants, it is a reward. Every time an extinction burst is rewarded it takes longer for the behaviour to become extinct.
So if you suspect the horse wants your attention, make sure he doesn’t get it. Every time he kicks his stall door, walk out of sight or turn your back. In this way you make sure you don’t give him attention for kicking the door.
If you want to let a behaviour go extinct, the extinction burst is the most important moment not to reinforce it.
This is also the moment most people are tempted to react. The person interprets the increased undesired behaviour as ‘the horse hasn’t learnt anything’, and because the bad behaviour increased (instead of decreased) they feel the need to intervene in the hope that punishment will solve this.
Further, smaller extinction bursts can occur over time; this is called spontaneous recovery of the behaviour. In the case of our horse kicking the stall door, he might show the behaviour again, but less intensely. As long as the extinction bursts don’t get reinforced, the behaviour will go extinct.
In dealing with undesired behaviour we always want to know what caused the behaviour, so we can work on that too.
Sometimes it is really hard to determine what reinforces a certain undesired behaviour. If the behaviour is ‘self-rewarding’, just ignoring it won’t work. The horse will get his reward regardless of what you are doing. Then you have to figure out how you can reinforce the opposite behaviour more than the undesired behaviour, or find a way to prevent it.
Rewarding the opposite behaviour
In the case of door kicking you can ignore the noise and start rewarding the horse for ‘four hooves on the ground’. In this way you communicate what it is you do want from the horse: standing still. Use the reward he was seeking with the undesired behaviour: your attention or, at feeding time, the food.
This approach works really well, but it takes a lot of effort from the trainer. You must be paying attention when the horse is standing still and is quiet. That can be a bigger challenge than just ignoring the door kicking.
Make sure everybody is on the same page if you want to re-train behaviour like door kicking. Ask everyone to follow two simple rules: go to horses that stand still while looking for attention, and ignore the door kickers.
Every time an extinction burst is rewarded, the behaviour becomes stronger. That is something you want when training desired behaviours, not when re-training undesired ones.