Classical and Operant Conditioning
© 2010
This eText is the property of Toru Sato. All rights reserved © 2010. This eText is not to be copied, distributed, or downloaded without permission of the author. Any violation of copyright found in this eText is unintentional. Please notify the author if copyrighted material is found and not appropriately referenced.
The two most common types of conditioning are called classical and operant conditioning. Let us begin with classical conditioning, a theory that originated from the discoveries of I. P. Pavlov (1927) and was later elaborated on by J. B. Watson (1930), among many others.
Classical Conditioning
There are four important concepts to understand in classical conditioning: unconditioned stimulus, unconditioned response, conditioned stimulus, and conditioned response. Below is a description of each.
An unconditioned stimulus is one that naturally or automatically triggers a response. For example, when you see the person you are in love with (we will call him/her "your lover"), your heart begins to race. In this example, the sight of your lover is the unconditioned stimulus.
An unconditioned response is a response that occurs naturally in response to the unconditioned stimulus. If we use the previous example with your lover, your heart racing in response to seeing your lover is the unconditioned response.
A conditioned stimulus is a previously neutral stimulus that, after being presented together with the unconditioned stimulus numerous times, eventually comes to trigger a new response. For example, suppose you smell a sample of a particular fragrance at a department store. You don't find it special and it certainly does not make your heart race. Now suppose that your lover begins using this fragrance afterwards. Every time you see your lover, you smell the fragrance s/he is wearing. After a while, the fragrance by itself would make your heart start racing. In this case, the fragrance has transformed from being a neutral stimulus to a conditioned stimulus. The heart racing in response to the fragrance has now become the conditioned response. A conditioned response is the learned response to the conditioned stimulus.
Conditioned stimuli and responses may sometimes undergo extinction. Extinction occurs when the conditioned response decreases or disappears. This means that after we have learned something through classical conditioning, we can unlearn it. When the conditioned stimulus and the unconditioned stimulus are no longer presented together, the conditioned response usually decreases or disappears. For example, if your lover (unconditioned stimulus) switched to another fragrance, after a while, your heart would no longer begin racing when you smell the original fragrance.
Operant Conditioning
The other common type of conditioning is known as operant conditioning. The early ideas of operant conditioning came from Edward Thorndike (1932). With his research on animals, Thorndike observed that the most basic form of learning occurs through trial and error. Through his many observations, he eventually developed the most basic idea behind operant conditioning, known as the "Law of Effect." The Law of Effect states that "behaviors are more likely to be repeated if they lead to satisfying consequences and less likely to be repeated if they lead to unsatisfying consequences." Operant conditioning, however, is most commonly associated with the work of B. F. Skinner (1976), a scholar who elaborated on Thorndike's ideas through systematic research and logic. Let us examine the core concepts of operant conditioning (most of which were initially developed by Skinner), known as reinforcement, punishment, and extinction. The following is a brief explanation of each of these concepts.
When we want to increase the likelihood of an organism behaving in a certain way, we use reinforcement. Reinforcement can be divided into two categories: positive and negative.
Positive reinforcement occurs when a certain behavior is followed by the presentation of a pleasant stimulus. This makes the organism more likely to behave in the same way again. For example, let us say that you completed your homework right after arriving home from school and your mother rewards you by giving you a cupcake. The cupcake you received may act as a positive reinforcer, the stimulus that is responsible for the positive reinforcement. This may motivate you to complete your homework right after arriving home from school again in the future.
Negative reinforcement occurs when a certain behavior is followed by the removal of an unpleasant stimulus. This also makes the organism more likely to behave in the same way again. For example, let us say that you have a headache. You take some pain medication to relieve your headache. If, after twenty minutes, your headache is relieved, the headache may have acted as a negative reinforcer, the unpleasant stimulus whose removal is responsible for the negative reinforcement. This may motivate you to take the same medication again the next time you have a headache.
In addition to positive and negative, reinforcers can be divided into two categories in a different way. The two categories are called primary and secondary reinforcers. A primary reinforcer is a stimulus that is directly related to our survival. Food and water might be considered good examples of primary reinforcers. In most cases, obtaining these things makes us feel good because they have a direct effect on our survival. A secondary reinforcer is a stimulus that is not directly related to survival but is associated with a primary reinforcer. A good example of a secondary reinforcer may be money. Money often allows us to buy things such as food to help us survive, but it is only useful because it is associated with being able to buy those things. If we were starving in the middle of the desert with nobody we could contact, having money would be quite useless for our survival.
When we want to decrease the likelihood of an organism behaving in a certain way, we use punishment. Punishment can also be divided into two categories: positive and negative.
Positive punishment occurs when a behavior is followed by the presentation of an unpleasant stimulus. For example, if you pet your neighbor's dog and he bites you, you experience pain. This pain from the dog bite may act as a positive punisher, the stimulus that is responsible for the positive punishment. This may motivate you to avoid petting your neighbor's dog the next time you see him.
Negative punishment occurs when a behavior is followed by the removal of a pleasant stimulus. For example, when your child misbehaves, she may lose the privilege of playing with her favorite toy. This loss of privilege may act as a negative punisher and decrease the likelihood of her misbehaving in the future.
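The four cases described above differ along two dimensions: whether the stimulus is pleasant or unpleasant, and whether it is presented or removed after the behavior. The short Python sketch below is only an illustration of that two-by-two layout; the names Consequence and classify are hypothetical and do not come from Skinner or from this text.

```python
from enum import Enum


class Consequence(Enum):
    """The four consequences of a behavior in operant conditioning."""
    POSITIVE_REINFORCEMENT = "positive reinforcement"
    NEGATIVE_REINFORCEMENT = "negative reinforcement"
    POSITIVE_PUNISHMENT = "positive punishment"
    NEGATIVE_PUNISHMENT = "negative punishment"


def classify(stimulus_is_pleasant: bool, stimulus_is_presented: bool) -> Consequence:
    """Name the consequence from the kind of stimulus and what happens to it.

    "Positive" means a stimulus is presented; "negative" means one is removed.
    Reinforcement makes the behavior more likely; punishment makes it less likely.
    """
    if stimulus_is_presented:
        if stimulus_is_pleasant:
            return Consequence.POSITIVE_REINFORCEMENT  # e.g., receiving a cupcake
        return Consequence.POSITIVE_PUNISHMENT         # e.g., pain from a dog bite
    if stimulus_is_pleasant:
        return Consequence.NEGATIVE_PUNISHMENT         # e.g., losing a favorite toy
    return Consequence.NEGATIVE_REINFORCEMENT          # e.g., a headache going away


if __name__ == "__main__":
    # Losing the privilege of playing with a favorite toy:
    print(classify(stimulus_is_pleasant=True, stimulus_is_presented=False).value)
    # negative punishment
```

Note that "positive" and "negative" here never mean "good" and "bad"; they only describe whether something is added or taken away.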
Anything we have learned through operant conditioning may also go through extinction. In extinction the association between the behavior and its consequence is weakened by not experiencing the consequence we used to experience after the behavior. For example, let us say that every time you complete your homework right after arriving home from school your mother rewards you by giving you a cupcake. After your 10th birthday, she stops giving you cupcakes for completing your homework right away. This makes you less motivated to complete your homework right away and eventually you stop doing it. In this case, you have experienced extinction.
Schedules of Reinforcement
When we think of reinforcement, we often assume that we are reinforced every time we engage in a certain behavior. This is often referred to as continuous reinforcement. In real life, however, it is quite rare that continuous reinforcement occurs in any consistent way. This raises an important question. When we are not reinforced every single time we behave in a certain way, what happens? Depending on the pattern or "schedule of reinforcement," it can have different effects. The four most commonly discussed schedules of reinforcement are fixed ratio, variable ratio, fixed interval, and variable interval. The following is a brief description of each of these.
When we are on a fixed ratio schedule of reinforcement, we are reinforced after a specific number of times we repeat a certain behavior. For example, if you are working in a bakery and you are paid by the number of cookies you make, you are on a fixed ratio schedule of reinforcement. Because the amount of work you do is proportionate to the amount of reinforcement received, people on a fixed ratio schedule tend to repeat the behavior multiple times very quickly.
When we are on a variable ratio schedule of reinforcement, we are reinforced after an unpredictable number of times we repeat a certain behavior. For example, if you are playing on a slot machine in a casino, you do not know how many times you will have to play until you win. Some days it may be five, on other days it may be three hundred. This is a variable ratio schedule of reinforcement. People on a variable ratio schedule tend to repeat the behavior multiple times very quickly. This is because the more times you repeat the behavior, the higher your chances of being reinforced sooner! Furthermore, when we are on a variable ratio schedule of reinforcement, it is difficult to stop repeating the behavior. We always think "Maybe I will win (i.e. be reinforced) the next time!" This is one of the reasons why gambling is so dangerously addictive.
When we are on a fixed interval schedule of reinforcement, we are reinforced on the first response after a specific amount of time has passed. For example, if you receive a weekly paycheck at work, you are on a fixed interval schedule of reinforcement. In contrast to the fixed ratio schedule, it does not matter how many times you repeat a behavior. The important thing is to do it once after a certain time has passed (e.g., go to work on pay day). As you can imagine, when we are on a fixed interval schedule of reinforcement, we do not repeat the behavior very quickly but make sure we do it after a certain amount of time goes by. This is why people who receive weekly paychecks at work are less likely to call in sick on pay day than on any other day of the week.
When we are on a variable interval schedule of reinforcement, we are reinforced on the first response after an unpredictable amount of time has passed. For example, if you are consistently studying in a class that has pop quizzes, you are on a variable interval schedule of reinforcement. You do not know when you will have your next pop quiz. It might be tomorrow or three weeks later. But if you keep studying, you are reinforced by receiving a good grade every time there is a pop quiz. As you can imagine, when we are on a variable interval schedule of reinforcement, we do not repeat the behavior (e.g., reading over your class notes) very quickly but keep repeating it in a steady fashion (e.g., reading over your class notes before every class).
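The rules that define the four schedules can also be written out as a small simulation. The Python sketch below is a rough illustration only: the class names, the run helper, and the use of uniform random draws for the "unpredictable" number of responses or amount of time are all assumptions made for this example, not part of the original descriptions. Each schedule answers one question: is this response, made at this moment, reinforced?

```python
import random


class FixedRatio:
    """Reinforce every n-th response (e.g., pay per batch of cookies)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.count = 0

    def respond(self, now: float) -> bool:
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after an unpredictable number of responses (e.g., a slot machine)."""

    def __init__(self, mean_n: int, rng: random.Random) -> None:
        self.mean_n = mean_n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self) -> int:
        # Assumption: uniform over 1..(2 * mean_n - 1), which averages mean_n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, now: float) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response after a fixed amount of time (e.g., weekly pay)."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.available_at = interval

    def respond(self, now: float) -> bool:
        if now >= self.available_at:
            self.available_at = now + self.interval
            return True
        return False


class VariableInterval:
    """Reinforce the first response after an unpredictable amount of time (e.g., pop quizzes)."""

    def __init__(self, mean_interval: float, rng: random.Random) -> None:
        self.mean_interval = mean_interval
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self) -> float:
        # Assumption: uniform over 0..(2 * mean_interval), which averages mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, now: float) -> bool:
        if now >= self.available_at:
            self.available_at = now + self._draw()
            return True
        return False


def run(schedule, times):
    """Make one response at each time in `times`; return whether each was reinforced."""
    return [schedule.respond(t) for t in times]


if __name__ == "__main__":
    # Six responses on a fixed ratio of 3: every third response is reinforced.
    print(run(FixedRatio(3), range(1, 7)))
    # [False, False, True, False, False, True]
```

In the ratio schedules, reinforcement depends only on how many responses have been made, so responding faster brings reinforcement sooner. In the interval schedules, extra responses before the time has passed earn nothing, which matches the slower, steadier pattern of behavior described above.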
References
Pavlov, I. P. (1927). Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex (trans. & ed. G. V. Anrep). London: Oxford University Press.
Skinner, B. F. (1976). About Behaviorism. New York: Vintage Books.
Thorndike, E. (1932). The Fundamentals of Learning. New York: Teachers College Press.
Watson, J. B. (1930). Behaviorism (revised edition). Chicago: University of Chicago Press.