Operant Conditioning

    Operant conditioning, sometimes called instrumental conditioning, is based on learning the relationship between one’s actions and their consequences. B.F. Skinner is regarded as one of the most famous pioneers of operant conditioning, but it is fair to say that his work was based on Thorndike's Law of Effect.

    B.F. Skinner studied operant conditioning through experiments on animals placed in a 'Skinner Box', which was similar to Thorndike’s puzzle box. He agreed with Thorndike’s contention that environmental consequences affect the probability of a response, but he rejected Thorndike’s stress on terms such as “satisfying” and “annoying”. Skinner further developed the study of operant conditioning, distinguishing four important concepts: positive reinforcement, negative reinforcement, punishment and extinction.

I. Types of reinforcement:

      1.Positive reinforcement: in this type, the probability that the desired response will be performed is increased by giving the subject something it wants (a “reward”) whenever it makes the desired response. Essentially, positive reinforcement refers to the case where you reward someone in order to increase the frequency of a particular behavior. For example, if you give your dog a biscuit as a reward every time he comes when called, your dog will learn to respond in order to get a treat. Therefore, your dog will come when called more often.

      2.Negative reinforcement: in this type, the probability that the desired response will be performed is increased by removing or preventing something undesirable whenever the desired response is made. The undesirable stimulus is “turned off” once the desired response has been made. There are two types of negative reinforcement:

  • Escape:

the behavior removes something undesirable. For example, the loud buzzer that tells you that your car seat belt is not fastened can be taken away by fastening your seat belt.

  • Avoidance:

the subject gets a warning that an aversive stimulus will soon occur, and the appropriate response completely avoids the aversive stimulus. For example, consider heeding the warning inherent in a stop sign: if you stop before entering the intersection, you are likely to avoid a crash. In avoidance, your behavior stops an aversive stimulus from happening.

      3.Punishment: unlike the first two types, in this type the probability that a response will be made is decreased by giving the subject something undesirable whenever the response is made. For example, sending a child to his or her room for starting a fight decreases the probability that he or she will start a fight again in the future. When an undesirable stimulus is applied after a response, it is punishment; when it is removed after a response, it is negative reinforcement.

      4.Extinction: in this type the probability that a response will be made is also decreased. Here, the behavior that used to bring a reward no longer does so, which leads to the extinction of the behavior. This concept can also be found in the theory of classical conditioning.
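
As a compact summary of the four concepts above, the sketch below maps what follows a response to the name of the concept and its effect on the probability of the response. The enum and function names are hypothetical labels chosen for illustration, not standard terminology.

```python
from enum import Enum


class Consequence(Enum):
    """What follows the response (hypothetical labels, for illustration only)."""
    DESIRABLE_GIVEN = 1        # e.g. the dog gets a biscuit
    UNDESIRABLE_REMOVED = 2    # e.g. the seat belt buzzer stops (or is prevented)
    UNDESIRABLE_GIVEN = 3      # e.g. the child is sent to his or her room
    REWARD_WITHHELD = 4        # e.g. a lever press no longer yields food


# Each consequence maps to (concept, effect on the probability of the response).
CONCEPTS = {
    Consequence.DESIRABLE_GIVEN: ("positive reinforcement", "increases"),
    Consequence.UNDESIRABLE_REMOVED: ("negative reinforcement", "increases"),
    Consequence.UNDESIRABLE_GIVEN: ("punishment", "decreases"),
    Consequence.REWARD_WITHHELD: ("extinction", "decreases"),
}


def classify(consequence):
    """Return (concept, effect) for a given consequence of a response."""
    return CONCEPTS[consequence]


concept, effect = classify(Consequence.UNDESIRABLE_REMOVED)
print(f"{concept}: response probability {effect}")
```

Keeping the mapping in a single table makes the contrast drawn in the punishment item easy to see: the same undesirable stimulus appears twice, once given and once removed, with opposite effects.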

II. Partial reinforcement

      Researchers have found an interesting effect when behavior is only partially reinforced. For example, if we trained rat A so that it receives food each time it presses a lever, and trained rat B so that it receives food every other time it presses a lever, both would press the lever fairly often. If we then began extinction training, where neither rat receives food after a lever press, extinction would take longer in rat B. It turns out that it takes longer to extinguish the lever press for the rat that acquired the response while receiving only occasional reinforcement. This is called the partial reinforcement effect.

III. Schedules of reinforcement    

   There are four basic types of partial reinforcement:

      1.Fixed ratio: the subject receives reinforcement only after a fixed number of responses. For example, a worker will receive money for, say, every 100 envelopes stuffed.

      2.Variable ratio: the subject receives reinforcement after a varying number of responses. A classic example is a slot machine: playing the machine is reinforced only after a variable number of pulls.

      3.Fixed interval: the subject will be reinforced on the first response after a fixed period of time has elapsed since the last reinforcement. For example, an animal on a fixed interval of 45 seconds will receive a food pellet for the first lever press after 45 seconds have elapsed since the last reinforcement.

      4.Variable interval: the subject will be reinforced for the first response made after a variable amount of time has elapsed since the last reinforcement. However, the intervals vary around an average length.

   If the subject is reinforced after every response, we call that a continuous reinforcement schedule.

  Of these schedules, variable ratio is generally considered the most resistant to extinction.
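
The four schedules can be sketched as small rules that decide whether a given response is reinforced. This is a minimal illustration, not a standard implementation: the class names, the `respond(t)` method, the uniform distributions used for the variable schedules, and the choice to treat the start of the session as time 0 are all assumptions made for the example.

```python
import random


class FixedRatio:
    """Reinforce every n-th response."""

    def __init__(self, n):
        self.n = n
        self.count = 0  # responses since the last reinforcement

    def respond(self, t):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after a varying number of responses that averages mean_n."""

    def __init__(self, mean_n, rng):
        self.mean_n = mean_n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumed distribution: uniform on 1 .. 2*mean_n - 1 (mean is mean_n).
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made once `interval` seconds have
    elapsed since the last reinforcement."""

    def __init__(self, interval):
        self.interval = interval
        self.last = 0.0  # assumption: the session start counts as time 0

    def respond(self, t):
        if t - self.last >= self.interval:
            self.last = t
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but the required wait varies around an average."""

    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval
        self.rng = rng
        self.last = 0.0
        self.wait = self._draw()

    def _draw(self):
        # Assumed distribution: uniform on 0 .. 2*mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t):
        if t - self.last >= self.wait:
            self.last = t
            self.wait = self._draw()
            return True
        return False


# Lever presses at these times (in seconds) on a 45-second fixed interval:
fi = FixedInterval(45)
fi_results = [fi.respond(t) for t in [10, 30, 46, 50, 95]]
print(fi_results)  # [False, False, True, False, True]

# Six responses on a fixed ratio of 3 (the time argument is ignored):
fr = FixedRatio(3)
fr_results = [fr.respond(t) for t in range(6)]
print(fr_results)  # [False, False, True, False, False, True]
```

Note that the ratio schedules count responses and ignore time, while the interval schedules ignore how many responses occur and only check the time elapsed since the last reinforcement. A continuous reinforcement schedule corresponds to `FixedRatio(1)`.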

IV. Important concepts

  • Discriminative stimulus:

a stimulus condition that indicates that the organism’s behavior will have consequences. Consider one of the classic operant conditioning tasks: a pigeon pecking at a key to get a food pellet. The food pellet functions as a positive reinforcer, increasing the pigeon’s pecking behavior. Above the key, we then place a light that illuminates periodically. When the light is on, the pigeon will get a food pellet whenever it pecks the key. However, when the light is off, it will not get a food pellet. So, the pigeon’s actions can be reinforced only when the light is on. The discriminative stimulus in this case is the light: it sets the occasion for the operant response.

  • Generalization:

Generalization, a concept from classical conditioning, also applies to operant conditioning. Let’s say that we train an animal to peck for food when a green light (the discriminative stimulus) is on. After the training, the animal will peck not only when the green light is on, but also when similarly colored lights are on.

  • Shaping:

Let us say you want to train your dog to fetch your slippers. To reinforce the behavior directly, you would have to wait until your dog actually fetched your slippers for the first time. But how long would that take? You would probably be waiting forever, because fetching slippers is not something dogs naturally do. In shaping, you reinforce successive approximations to the desired behavior. For instance, you might begin by reinforcing your dog every time he looks at your slippers. After he is doing that consistently, you would stop reinforcing him for just looking at your slippers, and start reinforcing him only when he walks towards your slippers. Once he is doing that, you reinforce him only for picking up the slippers, and so on. This is a simplified example: in practice, shaping requires much smaller steps and usually takes longer.
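
The shaping procedure can be sketched as a criterion that is raised step by step. This is only an illustration under stated assumptions: the function name `shape`, the `needed` parameter (a stand-in for "doing that consistently", here a run of consecutive reinforced responses), and the rule that behavior beyond the current criterion is also reinforced are all hypothetical choices, not part of any standard procedure.

```python
# Successive approximations to the target behavior, from the slipper example.
STEPS = ["looks at slippers", "walks toward slippers", "picks up slippers"]


def shape(observed, steps=STEPS, needed=3):
    """Decide which observed behaviors get reinforced while shaping.

    Returns (log, criterion): log is a list of (behavior, reinforced)
    pairs, and criterion is the step required at the end of the session.
    """
    level = 0    # index of the step currently required for reinforcement
    streak = 0   # consecutive reinforced responses at the current level
    log = []
    for behavior in observed:
        # Behaviors not in the list never count as an approximation.
        rank = steps.index(behavior) if behavior in steps else -1
        reinforced = rank >= level
        log.append((behavior, reinforced))
        if reinforced:
            streak += 1
            # Raise the criterion once the current step is consistent.
            if streak >= needed and level < len(steps) - 1:
                level += 1
                streak = 0
        else:
            streak = 0
    return log, steps[level]


session = [
    "looks at slippers",
    "looks at slippers",
    "looks at slippers",      # no longer enough: criterion has moved on
    "walks toward slippers",
    "walks toward slippers",
    "looks at slippers",
]
log, criterion = shape(session, needed=2)
for behavior, reinforced in log:
    print(behavior, "->", "reinforced" if reinforced else "not reinforced")
print("current criterion:", criterion)
```

In this session the third look at the slippers goes unreinforced because the criterion has already been raised to walking toward them, which mirrors the point in the text where you stop reinforcing the earlier approximation.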

References:

  • Skinner, B. F. (1938). The behavior of organisms: An experimental analysis.

  • Skinner, B. F. (1953). Science and human behavior.

  • Watson, J. B. (1913). Psychology as the behaviorist views it.

See Also:

Classical Conditioning

Behavior therapies

Trial and error

Conditioning Watson and little Albert

Behaviorism