V. DIRECTLY RELATED GA TECHNIQUES
Running a genetic algorithm entails setting a number of parameter values. Finding settings that work well on one problem is not a trivial task; if poor settings are used, a genetic algorithm's performance can be severely impacted. This section deals with a new technique for setting the probabilities of applying genetic operators during the course of a run. The technique involves adapting the operator probabilities based on their observed performance as the run takes place. In the sections that follow, we summarize prior work in determining reasonable operator probabilities, describe experiments measuring the performance of the new technique, and describe a use of the technique in its first real-world application.
The problem dealt with in this paper is that of setting the probability that a given genetic operator will be applied during reproduction. Setting operator probabilities correctly can be very important when one has limited resources with which to solve a problem or when one is comparing the performance of a genetic algorithm with the performance of other types of algorithms on the same problem. Two useful strategies have been employed by researchers in the genetic algorithm field to find good operator probabilities. One was contained in Ken DeJong's thesis work (DeJong, 1975); the other was described by John Grefenstette in (Grefenstette, 1986).
The problem DeJong and Grefenstette were attacking was that of determining robust parameter settings for population size, operator probabilities, and evaluation normalization techniques. The problem of adapting genetic algorithm parameters in general was simplified, in that the only chromosomal representation technique used was the bit string, and only two operators were considered - mutation and crossover. This is a simpler problem than that of finding robust parameters for genetic algorithms across all representations, across the range of all possible operators, and so forth. Yet, even stripped down to the simple form just given, the problem is not easy to solve.
A first difficulty is that even in this stripped-down form, the problem is not precisely specified. We are asking for good values for the parameters, but measuring how good such values are is non-trivial, since the range of potential applications of genetic algorithms using binary representations is of infinite size. DeJong solved this difficulty by simplifying the problem more: he used a test suite of function optimization problems compiled by researchers in the field of function optimization theory to stand in for the full range of possible domains. DeJong's choice of functions in the test suite was a good one.
A second difficulty in solving this optimization problem given a test suite is that the evaluation function is noisy. Genetic algorithms are stochastic, and the same parameter settings used on the same problems by the same genetic algorithm generally yield different results. A consequence of this fact is that it can take a tremendous amount of computer time to find good parameter settings across a number of problems. DeJong's research was of immense benefit to people using genetic algorithms because it provided them with parameter values that were robust, in the sense that they had been validated on a number of different problems of different types.
The problem of noise in the evaluation function meant that DeJong was forced to derive his parameter values by hand. Several years later, however, as computing resources had become cheaper, Grefenstette derived new values for the parameters in question by using a genetic algorithm to do so. Grefenstette's approach was to evolve parameter values encoded in binary on a chromosome, evaluating each chromosome by assigning the values it encoded to a genetic algorithm that was run on each problem in DeJong's test suite. The result, at greater cost in computing resources, was a new set of parameter settings that out-performed DeJong's settings.
Given this history of difficulty in optimizing six parameters involved in a genetic algorithm with a simple representation format and two genetic operators, it is understandable that many researchers have shied away from considering representations that are not binary and operators that are different from the two that have been parameterized at such expense. But this raises a third problem suggested by the research described above: a great many real-world optimization problems appear ripe for solution by genetic algorithms, yet the binary representation appears ineffective or inefficient for them. Further, operators other than binary crossover and binary mutation appear to contribute to good performance in those domains. A second, and potentially more important, problem, then, is that of parameterizing a genetic algorithm that differs from the type studied so thoroughly by researchers in the field.
The genetic algorithm that has been used to test adaptive operator probability routines had the following characteristics. It was steady-state; that is, instead of carrying out generational population replacement, children were inserted after each reproduction event. The system used a ranking technique for normalizing the evaluations of each individual, and the normalized evaluations were the result of assigning exponentially decreasing values to each individual in ranked order. Thus, given a value of .95 with which to produce evaluations, the best individual in the population was given 1000 x .95 = 950, the next was given 950 x .95 = 902.5, and so on, with the exception that no individual was given a normalized evaluation less than 1. The parameter that determined the normalized evaluations was interpolated from .95 to .3 over the course of the run, with an interpolation interval of 50 children. Deletion of population members was done by removing the last (worst-ranked) member of the population. The population size was 100. The total number of individuals evaluated in each run was 1000.
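The exponential ranking scheme above can be sketched as follows; this is a minimal illustration, not the paper's implementation, and the function name and the base value of 1000 and floor of 1 are taken directly from the worked numbers in the text.

```python
def normalized_evaluations(pop_size, factor=0.95, base=1000.0, floor=1.0):
    """Assign exponentially decreasing normalized evaluations by rank.

    Index 0 is the best-ranked individual. Each rank receives `factor`
    times the previous rank's value, and no individual's normalized
    evaluation is allowed to fall below `floor`.
    """
    values = []
    value = base * factor          # best individual: 1000 x .95 = 950
    for _ in range(pop_size):
        values.append(max(value, floor))
        value *= factor
    return values

evals = normalized_evaluations(100)
# evals[0] == 950.0 and evals[1] == 902.5, matching the worked example
```

During a run, `factor` would itself be interpolated from .95 down to .3 every 50 children, as the text describes.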
The system was operator-based: that is, each reproduction event was the result of asking a single operator to apply itself. Each reproduction event used exactly one of the following operators:
Here, "guaranteed" refers to a local check that an operator is not producing a child that is a copy of one of its parents.
The idea that one should adapt one's genetic algorithm during the course of a run is not a new one. C. Shaefer (Shaefer, 1986) has built an approach to numerical optimization around this idea in his ARGOT program. The thing adapted in ARGOT is the representation itself, and pursuing this idea has led Shaefer to some interesting results. Schaffer and Morishima (Schaffer and Morishima, 1987) have investigated encoding crossover points on chromosomes, so that information controlling crossover locations evolves during a run. Darrell Whitley (Whitley, 1987) has investigated the adaptation of the reproductive fitness of a parent based on the performance of its offspring. The idea described here is that one adapts the probability that a genetic operator will be applied in reproduction based on the performance of the operator's offspring.
The first intuition underlying the adaptation mechanism is that an adaptive mechanism should alter the probability of applying an operator in proportion to the observed performance of the individuals created by that operator in the course of a run.
A second intuition underlying the mechanism is that purely local measures of performance are not sufficient. One must reward an operator that produces a good child, but one must also reward an operator that set the stage for this production. This intuition is grounded in the sort of situation that often happens toward the end of a run, when one's population is mostly converged.
These intuitions were implemented in the following way. Whenever a new member was added to the population, a pointer was established to its parent or parents, and a pointer was established to the operator that had created the new member. At the same time, a check was made to determine whether the new member was better than the current best member of the population. If so, the amount by which it was better was stored with it as its "local delta". Each of the operators above was given an initial probability. Periodically, these weights were altered, using the following parameters: W (the adaptation window) - how far back to consider individuals in computing delta; S - the amount of weight that will be shifted each time adaptation is done; I - the interval between adaptations; P - the proportion of delta that is passed back to an individual's ancestors when computing delta; M - a limit on the number of generations that delta is passed back.
The technique for computing the delta associated with each operator is as follows. First, each of the last W individuals added to the population passes P times its local delta back to its parents. This quantity is shared among the parents. Each of those parents passes P times the quantity it received back to its parents, and so on until delta has been passed back M generations or until an ancestor outside the adaptation window is about to receive delta. Now each of the last W individuals in the population computes its derived delta by summing its local delta and its inherited delta. The delta for an operator over an adaptation interval, given individuals I1...In that were created by the operator, is the sum of the derived deltas for I1 through In, divided by n. That is, the delta for an operator over an interval is the sum of the derived deltas of the individuals it produced over the interval, divided by the number of individuals it produced. If an operator produced no individuals, its delta is 0.
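The back-propagation of delta described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the dictionary-based representation of individuals, the function name, and the handling of tie details are assumptions.

```python
def operator_deltas(children, P=0.9, M=10):
    """Compute each operator's delta over one adaptation window.

    `children` is the list of the last W individuals added to the
    population. Each individual is a dict with keys 'local_delta',
    'parents' (a list of individuals, or None), and 'operator'.
    Each individual passes P times its delta back, shared among its
    parents; propagation stops after M generations or at an ancestor
    outside the window.
    """
    in_window = set(id(c) for c in children)
    inherited = {id(c): 0.0 for c in children}

    def pass_back(individual, amount, depth):
        if depth >= M or not individual.get('parents'):
            return
        share = amount * P / len(individual['parents'])
        for parent in individual['parents']:
            if id(parent) in in_window:  # ancestors outside the window receive nothing
                inherited[id(parent)] += share
                pass_back(parent, share, depth + 1)

    for child in children:
        pass_back(child, child['local_delta'], 0)

    # derived delta = local delta + inherited delta, averaged per operator
    totals, counts = {}, {}
    for child in children:
        derived = child['local_delta'] + inherited[id(child)]
        op = child['operator']
        totals[op] = totals.get(op, 0.0) + derived
        counts[op] = counts.get(op, 0) + 1
    return {op: totals[op] / counts[op] for op in totals}
```

For example, with P = 0.5, a child with local delta 10 passes 5 back to its single parent, so an operator that produced only that parent earns credit for setting the stage.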
If we have j operators, a current vector of probabilities V of length j, and a vector D of derived deltas associated with each operator, we carry out the adaptation of the probabilities in V in the following way. If there is no delta over the interval, the probabilities are not changed. Otherwise, we form V' = ((100-S)/100) * V (V' is V with S% proportionally removed). Next, we form D', the result of normalizing D so it totals S. Now we set the new vector of operator probabilities V'' = V' + D'.
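The update step can be sketched directly from the formulas above. This sketch assumes the probabilities in V are expressed as percentages summing to 100; the function name and the example value S = 20 are illustrative, not values from the paper.

```python
def adapt_probabilities(V, D, S=20.0):
    """One adaptation step over an interval.

    V: operator probabilities as percentages summing to 100.
    D: derived delta for each operator over the interval.
    S: percentage of probability mass shifted toward operators
       in proportion to their deltas.
    """
    total_delta = sum(D)
    if total_delta == 0:
        return list(V)                                  # no delta: no change
    V_prime = [(100.0 - S) / 100.0 * v for v in V]      # V with S% removed
    D_prime = [S * d / total_delta for d in D]          # D normalized to total S
    return [vp + dp for vp, dp in zip(V_prime, D_prime)]
```

Because exactly S units of probability are removed and exactly S are redistributed, the updated vector still sums to 100, and operators with zero delta lose probability only proportionally.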
The mechanism described here has several features that should be noted. It allows rapid parameterization of operator probabilities across the range of potential genetic algorithms and operator sets. As described here, it is tailored to a steady-state reproduction scheme. It would not be literally applicable to problems with noisy evaluation functions. Further, Davis believes that the feedback rate would be greatly damped using generational replacement. Thus, further work will be required to adapt it to generational genetic algorithms and noisy-reinforcement domains.
Using an adaptive mechanism leads to performance slightly inferior to that of carefully-derived interpolated parameter settings on the test problem that Davis studied. Observation of those cases in which the adaptive mechanism does badly shows that there are many in which an operator's probability varies wildly from the average trajectories.
This suggests that one could derive parameters for interpolating values by doing a manipulation of the adaptive trajectories - by doing a best fit of a line to the adaptive trajectories, for example. The adaptive mechanism uses strictly local information to modify operator probabilities. The interpolation mechanism uses more global information. It appears that we could combine both mechanisms to our profit.
The most significant parameter impacting solution of the test problem was not the relative probabilities of the operators. Rather, it was a parameter having to do with the selection of parents. This parameter, too, can be adapted usefully over the course of a genetic algorithm run.
The mechanism reported here is in its infancy, but it has already provided invaluable assistance in several real-world applications of genetic algorithms to problems with representations and operator sets that are widely different from the representation and operator sets studied by DeJong and Grefenstette. There is a good deal of work to do in order to make the mechanism more robust, to study its properties further, to run it using binary representation on the standard test suite, and so forth. Combined with techniques for adapting the other parameters in a genetic algorithm, it is hoped that it will help us to produce quick and effective applications to real-world problems.