II.

GENETIC ALGORITHMS

Contents of this chapter

II.1 Basic Operations
II.2 Schema Theorem and Theoretical Background
II.3 Solving a Problem: Genotype and Fitness
II.4 Models of Evolution
II.5 Advanced Operators
II.6 Non-Conventional Genotypes
II.7 Important Issues in the Implementation of a GA
II.8 Response to Selection
 


II.1 Basic Operations

The first operations performed by a GA are the generation and evaluation of an initial population. Subsequently, a main loop is executed in which the best individuals are selected and crossed, and mutation is applied to every locus (string position) of the offspring. The solution is usually the best string present in the final population.

The computational effort of a normal GA usually resides in the evaluation of the initial and new strings. The fitness function is normally complex: a mathematical function with many parameters, a full neural network that must inspect dozens of patterns to compute one single (float) fitness value, the simulation of a given system that yields a figure of merit for the string, etc. Operators like selection, mutation, crossover, or replacement are of linear complexity and work at constant rates for a given problem.

A traditional GA works on binary strings of fixed length and applies fitness-proportionate selection, one-point crossover, and bit-flip mutation. The initial population is generated by following a uniform random distribution, that is, every position in every string contains a 0 or a 1 with the same probability (0.5). The evolution model consists of successive generations of populations, i.e., a new population completely replaces the old one. The algorithm stops when a predefined number of generations has been reached. Finally, the evaluation of any given string is made by mapping the genotype to the phenotype by means of a binary-to-decimal operation for every group of bits of the genotype representing one problem parameter.
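The traditional GA just described can be sketched in a few lines of Python. This is a minimal, hypothetical setup (one 10-bit parameter decoded into [0, 1], a toy fitness function x², and arbitrary parameter choices), not a prescription:

```python
import random

def decode(bits, lo, hi):
    """Binary-to-decimal mapping of a group of bits onto [lo, hi]."""
    value = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)

def fitness(bits):
    # Hypothetical fitness: maximize x^2 for a single 10-bit parameter in [0, 1].
    x = decode(bits, 0.0, 1.0)
    return x * x

def roulette(pop, fits):
    """Fitness-proportionate (roulette-wheel) selection of one parent."""
    r = random.uniform(0, sum(fits))
    acc = 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

def one_point_crossover(a, b):
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, pm):
    """Bit-flip mutation applied independently to every locus."""
    return [1 - g if random.random() < pm else g for g in bits]

def simple_ga(length=10, pop_size=30, generations=50, pc=0.8, pm=0.01):
    # Uniform random initial population: each bit is 0 or 1 with probability 0.5.
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = roulette(pop, fits), roulette(pop, fits)
            if random.random() < pc:
                c1, c2 = one_point_crossover(p1, p2)
            else:
                c1, c2 = p1[:], p2[:]
            new_pop += [mutate(c1, pm), mutate(c2, pm)]
        pop = new_pop[:pop_size]  # generational replacement of the whole population
    return max(pop, key=fitness)
```

Note that the stopping criterion is a fixed generation count and the new population fully replaces the old one, as in the generational model described above.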


II.2 Schema Theorem and Theoretical Background

The theoretical basis of genetic algorithms relies on the concept of schema (pl. schemata). Schemata are templates that partially specify a solution (more strictly, a solution in the genotype space). If genotypes are strings built using symbols from an alphabet A, schemata are strings whose symbols belong to A U {*}. This extra symbol must be interpreted as a wildcard, and the loci occupied by it are called undefined. A chromosome is said to match a schema if they agree in the defined positions. A schema can be viewed as a hyperplane in the L-dimensional search space (L being the string length), representing a set of solutions with common properties. Obviously, the number of solutions that match a schema H depends on the number of defined positions in it (its order, represented o(H)). Another related concept is the defining length (represented δ(H)) of a schema, defined as the distance between the first and the last defined positions in it. See Table 1 for examples.
 
Schema H     o(H)   δ(H)
1*******      1      0
**011***      3      2
*1*11***      3      3

Table 1. Examples of schemata.
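Both measures can be computed directly from the schema string; the following Python sketch reproduces the values of Table 1:

```python
def order(schema):
    """o(H): number of defined (non-wildcard) positions."""
    return sum(1 for s in schema if s != '*')

def defining_length(schema):
    """delta(H): distance between the first and last defined positions."""
    defined = [i for i, s in enumerate(schema) if s != '*']
    return defined[-1] - defined[0] if defined else 0

# Values from Table 1:
assert (order("1*******"), defining_length("1*******")) == (1, 0)
assert (order("**011***"), defining_length("**011***")) == (3, 2)
assert (order("*1*11***"), defining_length("*1*11***")) == (3, 3)
```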
The GA works by allocating strings to the best schemata at an exponentially increasing rate through successive generations, the selection mechanism being mainly responsible for this behavior. On the other hand, the crossover operator is in charge of exploring new combinations of the present schemata in order to obtain fitter individuals. Finally, the purpose of the mutation operator is to introduce fresh genotypic material into the population (this is especially important as generations proceed).
All this behavior is analytically described by the so-called schema theorem:

m(H, t+1) ≥ m(H, t) · (f(H) / f̄(t)) · [1 − Pc · δ(H)/(L−1) − Pm · o(H)],

where Pc and Pm are the probabilities of crossover and mutation, m(H, t) is the number of instances (strings) of the schema H in generation t, f(H) is the average fitness of those instances, and f̄(t) is the average fitness of the population.
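As a numeric illustration, the lower bound given by the schema theorem can be evaluated for a hypothetical schema (all figures below are made-up examples, not data from a real run):

```python
def schema_bound(m, f_H, f_avg, delta, o, length, pc, pm):
    """Schema-theorem lower bound on the expected number of instances of
    schema H in the next generation:
    m(H,t+1) >= m * (f_H / f_avg) * (1 - pc*delta/(L-1) - pm*o)."""
    return m * (f_H / f_avg) * (1 - pc * delta / (length - 1) - pm * o)

# Hypothetical schema: 5 instances, 20% fitter than average, delta=2, o=3, L=8.
# schema_bound(5, 1.2, 1.0, 2, 3, 8, pc=0.8, pm=0.01) is about 4.45, i.e. the
# schema is expected to gain instances in the next generation.
```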

II.3 Solving a Problem: Genotype and Fitness

In order to apply a GA to a given problem, the first decisions to make concern the kind of genotype the problem needs. This means that you must decide how the parameters of your problem map onto a binary string (if a binary string is to be used at all). This implies choosing the number of bits per parameter, the range of the decoded binary-to-decimal parameter, whether the strings have constant or changing lengths during the GA operations, etc.

In traditional applications the strings use a binary alphabet and their length is constant during the whole evolution process. Also, all the parameters decode to the same range of values and are allocated the same number of bits in the string.

In more recent applications this situation is changing in a very important way. More and more complex applications need dynamic-length strings or else a different, non-binary alphabet. Typically, integer or float genes are used in many domains such as neural network training, function optimization with a large number of variables, reordering problems such as the Traveling Salesperson Problem, etc.

The traditional concept of a schema advises the use of binary genes because the number of schemata sampled per string is maximized. However, a new interpretation of the schema concept has emerged to explain why many people have found non-binary genes very useful. This interpretation (Antonisse, 1989) considers the use of higher-cardinality alphabets beneficial due to their higher expressive power (as many traditional AI paradigms do). It is based on viewing the wildcard symbol (*) as an instantiation character for different sets of symbols rather than as the traditional wildcard. To be precise, * is reinterpreted as a family of symbols *x, where x can be any subset of symbols of the alphabet. For binary strings these two interpretations coincide, but for higher-cardinality alphabets they do not.

Regarding genotypes, many other developments have pushed forward the utilization of non-string representations such as trees of symbols (e.g., Genetic Programming (Koza, 1992)), which are much more appropriate for a large number of applications. It is also interesting to extend every allele of the string with the position (locus) it occupies, because then we can freely mix the genes of the same string so that the building blocks come together automatically during evolution. This is very useful when we do not know which are the building blocks of our problem and want the GA to discover them. By associating the locus with every gene value we can reorder the string in the correct way before evaluating it. These mechanisms are common to the so-called messy GAs (Goldberg, Deb, Korb, 1991).
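A minimal sketch of this locus-tagged expression step follows. The (locus, allele) pair encoding, the first-occurrence tie-breaking, and the default allele are illustrative assumptions, not the full messy-GA machinery of Goldberg, Deb and Korb:

```python
def express_messy(genes, length, default=0):
    """Reorder (locus, allele) pairs, which may appear in any order in the
    string, into a fixed-length parameter string before evaluation.
    Later occurrences of a locus are ignored (first-come tie-breaking) and
    loci that are missing fall back to a default allele."""
    string = [default] * length
    seen = set()
    for locus, allele in genes:
        if locus not in seen:
            string[locus] = allele
            seen.add(locus)
    return string

# Locus 2 appears twice; only its first allele (1) is expressed:
assert express_messy([(2, 1), (0, 1), (2, 0)], 4) == [1, 0, 1, 0]
```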

Here we arrive at the problem of decoding (called expression) the genotype to yield the phenotype, that is, the string of problem parameters. This is necessary since we need to evaluate the string in order to assign it a fitness value. Thus we use the phenotype as the input to the fitness function and get back a fitness value that helps the algorithm to rank the strings in the population pool relative to each other.

The construction of an appropriate fitness function is very important for the correct working of the GA. This function represents the problem environment, that is, it decides how well the string solves the problem. Many issues arise when the fitness function has to be constructed:

  • The fitness function depends on whether we want to maximize or minimize some criterion.
  • The environment can present noise in the evaluations (partial evaluation for example).
  • The fitness function may change dynamically as the GA proceeds.
  • The fitness function might be so complex that only approximations to fitness values can be computed.
  • The fitness function should allocate very different values to strings in order to facilitate the work of the selection operator.
  • The fitness function must consider the constraints of the problem. Normally, if unfeasible solutions can appear, the fitness function must allocate them a small fitness value (in a maximization problem).
  • The fitness function might incorporate some different sub-objectives. Such a multi-objective function presents non-conventional problems when used in the GA.
  • The fitness function is a black box for the GA. You feed it with the phenotype and get back the fitness value. Internally this may be achieved by a mathematical function, a complex computer simulation program, or a human user who decides how good a string is (this is common when strings encode images, music, or the like).
  • Regarding dynamic fitness functions, a genotype of diploid genes can be used in which every value is determined by some relationship between two alleles encoded in the same string. The extension to K-ploid strings is straightforward. A dominance operator needs to be defined in order to obtain a single gene (parameter) value from the K alleles that determine it.
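For binary diploid genes, one of the simplest possible dominance operators is "1 dominates 0". The sketch below is only an illustrative assumption; many other dominance schemes exist:

```python
def dominance(allele_a, allele_b):
    """A hypothetical dominance operator for binary diploid genes:
    1 dominates 0, so the expressed gene is the OR of the two alleles."""
    return allele_a | allele_b

def express(chrom_a, chrom_b):
    """Expressed (haploid) phenotype string from a diploid chromosome pair."""
    return [dominance(a, b) for a, b in zip(chrom_a, chrom_b)]

# A 1 in either homologous chromosome dominates the expressed gene:
assert express([1, 0, 0, 1], [0, 0, 1, 1]) == [1, 0, 1, 1]
```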

As the evolution proceeds, the fitness values assigned by the fitness function to the strings become very similar, since the strings we are evaluating are also very similar. There exist several scaling operators that help separate the fitness values in order to improve the work of the selection mechanism; the most common ones (Michalewicz, 1992) include linear scaling, sigma truncation, and power-law scaling.
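As an example of such a scaling operator, linear scaling f' = a·f + b is commonly chosen so that the average fitness is preserved and the best string receives c times the average (c around 2 is a usual assumption):

```python
def linear_scaling(fits, c=2.0):
    """Linear scaling f' = a*f + b: keeps the average fitness unchanged and
    maps the best raw fitness to c times the average, spreading very similar
    raw values apart so that selection can distinguish them."""
    avg, mx = sum(fits) / len(fits), max(fits)
    if mx == avg:                      # flat population: nothing to scale
        return list(fits)
    a = (c - 1.0) * avg / (mx - avg)
    b = avg * (1.0 - a)
    return [max(0.0, a * f + b) for f in fits]
```

For instance, the nearly identical raw values [1.0, 1.1, 1.2] are stretched to [0.0, 1.1, 2.2]: the average stays at 1.1 while the best is pushed to twice the average.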

Also, when very different strings get the same fitness values, an important problem arises: crossing them yields very bad individuals (called lethals), incapable of further improvement. Sharing methods (Goldberg, 1989) help to allow the concurrent existence of very different solutions in the same genetic population. Parallel models also help in overcoming this problem.

Sometimes it is very useful to rank the population according to the fitness values of the strings and then apply a reproductive plan that uses the rank position of the strings instead of the fitness values. This is good for ill-designed fitness functions or when very similar fitness values are expected. This mechanism is called ranking (Whitley, 1989), and it keeps a constant selection pressure on the population.
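A common linear-ranking selection scheme (in the spirit of Whitley, 1989, though his original formulation differs in detail) can be sketched as follows; sp is the assumed selection pressure in [1, 2]:

```python
import random

def linear_ranking_select(pop, fits, sp=1.5):
    """Linear ranking: selection probability depends only on the rank of a
    string, not on its raw fitness, so the selection pressure stays constant
    even when fitness values are nearly identical."""
    n = len(pop)
    # Sort individuals by fitness; worst gets rank 0, best gets rank n-1.
    ranked = [ind for _, ind in sorted(zip(fits, pop), key=lambda t: t[0])]
    # Expected copies of rank r: (2 - sp) + 2*(sp - 1)*r/(n - 1); divide by n
    # to turn expected copies into selection probabilities (they sum to 1).
    probs = [((2 - sp) + 2 * (sp - 1) * r / (n - 1)) / n for r in range(n)]
    return random.choices(ranked, weights=probs, k=1)[0]
```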


II.4 Models of Evolution

In traditional GAs the basic evolution step is a generation of individuals; that is, the atomic advance step of the GA is one generation. There exists an interesting alternative called steady-state GAs, in which the atomic step is the computation of one single new individual. In these algorithms, after generating the initial population, the GA proceeds in steps that consist of selecting two parents, crossing them to obtain one (or two) individuals, and mutating the resulting offspring. The new string is inserted back into the population and one of the pre-existing strings (a random string or the current worst string) leaves the pool (normally, if the new string is worse than the current worst, it is not inserted at all).

This one-at-a-time reproduction was initially introduced to overcome problems in the training of neural networks and was extended from there to the rest of present applications. It represents an interesting equilibrium between the exploration and exploitation edges of a GA, because the GA maintains a pool of strings (exploration, schemata, etc.) and at the same time performs some kind of hill-climbing, since the population is only pushed toward better solutions. Elitism and ranking are very common in steady-state GAs.
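One atomic steady-state step, with the replace-worst-if-better policy described above, might look as follows; the parent-selection rule and the operator signatures are assumptions made for the sketch:

```python
import random

def steady_state_step(pop, fits, fitness, crossover, mutate):
    """One atomic step of a steady-state GA: breed a single offspring and let
    it replace the current worst string only if it improves on it."""
    p1, p2 = random.sample(pop, 2)          # simple parent selection (assumption)
    child, _ = crossover(p1, p2)            # keep one of the two offspring
    child = mutate(child)
    f_child = fitness(child)
    worst = min(range(len(pop)), key=fits.__getitem__)
    if f_child > fits[worst]:               # otherwise the offspring is discarded
        pop[worst], fits[worst] = child, f_child
    return pop, fits
```

Because only the worst string is ever replaced, the best fitness in the pool never decreases, which is the hill-climbing flavor mentioned above.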

Finally, there exists an intermediate model of evolution for a sequential GA in which a percentage of the full population is selected for the application of genetic operations. This percentage is called the generation gap, and it represents a generalization of any evolution model, since it includes the two opposite approaches (generations versus one-at-a-time) and any other intermediate model.

Many other models of evolution exist in the field of parallel GAs, depending on their granularity, homogeneity, etc., but we will discuss them in a later chapter.


II.5 Advanced Operators

Besides the traditional operations, a GA can incorporate new techniques at different levels. One of the most frequent innovations is to define new genetic operators that improve in some way the behaviour of the general one-point crossover or bit-flip mutation for certain problems. Normally these new operators are the result of a theoretical study of GAs, or else they are new versions of traditional operators that incorporate problem-dependent knowledge.
Also, advanced genotypes can be used instead of the traditional binary encoding. Finally, new schemes of evolution (steady-state, parallelism, ...) or foreign operators (hill-climbing operators, existing algorithms, ...) can be added to the GA skeleton, being applied with a given probability as the next generation is created from the previous one.

II.6 Non-Conventional Genotypes

As mentioned before, genetic algorithms have traditionally used a pure binary genotype in which every parameter of the problem is encoded in binary by using a given number of bits. The number of bits for every gene (parameter) and the decimal range into which it decodes are usually the same for all genes, but nothing precludes the use of a different number of bits or a different range for every gene.

In order to avoid the so-called Hamming cliffs, a Gray code is sometimes used when decoding a binary string. With this code, any two adjacent decimal values are encoded with binary numbers that differ in only one bit. This helps the GA to approach an optimum gracefully and normally reduces the complexity of the resulting search.
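A standard Gray encoding and its inverse can be written with a couple of XOR operations; the check at the end confirms the one-bit-difference property for adjacent decimal values:

```python
def binary_to_gray(n):
    """Gray-encode an integer: each Gray bit is the XOR of adjacent binary bits."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Decode a Gray-coded integer back to standard binary by accumulating
    the XOR of all right-shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Adjacent decimal values differ in exactly one bit of their Gray codes,
# so there are no Hamming cliffs between neighbouring parameter values:
for i in range(15):
    diff = binary_to_gray(i) ^ binary_to_gray(i + 1)
    assert bin(diff).count('1') == 1
assert all(gray_to_binary(binary_to_gray(i)) == i for i in range(256))
```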

Nevertheless, the plethora of applications that GAs have faced has motivated the design of many different genotypes. This means that the string can be constructed over an integer or real alphabet, for example. Integer alphabets have interesting properties for reordering problems such as the Traveling Salesperson Problem and others. On the other hand, floating-point genes are very useful for numeric optimization (for example, for neural network training). These higher-cardinality alphabets enhance the search by reducing the number of ill-constructed strings resulting from the application of crossover or mutation. Also, when a very large number of parameters has to be optimized, it is helpful to rely on shorter strings (also because populations and the number of steps are smaller).

If the genotype can be expressed as a string, we need to decide for any given application whether the genome length is going to be kept constant or allowed to change. It is also an interesting decision to use diploid individuals in dynamic environments (i.e., environments in which the fitness function changes as the search proceeds). In a diploid individual two -homologous- chromosomes describe the same set of parameters. When decoding the string, some kind of dominance operator needs to be used in order to decide the value of each problem parameter (gene).

    In many recent applications more sophisticated genotypes are appearing:

  • One individual can be a tree of symbols (like in Genetic Programming applications - see Figure 7).
  • The individual is a combination of a string and a tree.
  • Some parts of the string can evolve while others do not at different moments of the search.
  • The individual can be a matrix of symbols.
  • The genotype could be a string of symbols that indicate operations to be performed. A simulator interprets the symbols and allocates fitness values depending on the result of the interpretation of the string.

II.7 Important Issues in the Implementation of a GA

What follows is an initial summary of issues relating to the implementation of a sequential GA:
  • Do not implement the population as an array of fixed length: your implementation will not be flexible, and sorting or insertions/replacements will be slow operations. Instead, use a list of pointers (especially if you are using a steady-state GA).
  • Do not re-evaluate individuals when the fitness value is needed. This error is very common when people develop general GA libraries. It might be negligible for toy applications, but real applications do not admit two or more evaluations of the same individual (because of very high computational costs).
  • Do not implement a library for evolutionary algorithms in traditional imperative languages. Reuse, error detection, comparisons, and many other operations are much easier if you use an object-oriented language (and also an O. O. system design) for the implementation. Please notice that an implementation with an O. O. language is quite different from an O. O. GA system!
  • Do not implement with a logic language; the implementation would require a lot of memory resources and computation time.

II.8 Response to Selection

In this section we summarize the theory presented in Mühlenbein and Schlierkamp-Voosen (1995). Let M(t) be the average fitness of the population at generation t. The response to selection is defined as

R(t) = M(t+1) − M(t).

The amount of selection is measured by the selection differential

S(t) = Ms(t) − M(t),

where Ms(t) is the average fitness of the selected parents. The equation for the response to selection relates R and S:

R(t) = b(t) · S(t).

The value b(t) is called the realized heritability. For many fitness functions and selection schemes, the selection differential can be expressed as a function of the standard deviation σp(t) of the fitness of the population. For truncation selection (selecting the T·N best individuals) and for normally distributed fitness, the selection differential is proportional to the standard deviation:

S(t) = I · σp(t).

The value I is called the selection intensity. For arbitrary distributions one can show the following estimate:

S(t) ≤ σp(t) · sqrt((1 − T) / T).

For normally distributed fitness the famous equation for the response to selection is obtained:

R(t) = I · b(t) · σp(t).

The above equation is valid for a large range of distributions, not just for a normal distribution. The response depends on the selection intensity, the realized heritability, and the standard deviation of the fitness distribution. In order to use the above equation for prediction, one has to estimate b(t). The equation also gives a design goal for genetic operators -- to maximize the product of heritability and standard deviation. In other words, if two recombination operators have the same heritability, the operator creating an offspring population with larger standard deviation is to be preferred.

The equation also defines a design goal for a selection method -- to maximize the product of selection intensity and standard deviation. In simpler terms, if two selection methods have the same selection intensity, the method giving the higher standard deviation of the selected parents is to be preferred. For proportionate selection, as used by the simple genetic algorithm (Goldberg, 1989), it was shown by Mühlenbein and Schlierkamp-Voosen (1993) that

S(t) = σp²(t) / M(t).

The equation shows that the selection intensity of proportionate selection goes to zero as t → ∞, whereas truncation selection has a constant selection intensity. Proportionate selection selects too weakly in the final stages of the search.
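The proportionality between the selection differential and the standard deviation under truncation selection can be checked empirically. For T = 0.5 and normally distributed fitness, the selection intensity is about 0.8; the small experiment below uses made-up Gaussian fitness values:

```python
import random
import statistics

def selection_differential(fits, T):
    """S = mean fitness of the selected top-T fraction minus the population mean."""
    n_sel = max(1, int(T * len(fits)))
    selected = sorted(fits, reverse=True)[:n_sel]
    return statistics.mean(selected) - statistics.mean(fits)

random.seed(1)
fits = [random.gauss(0.0, 1.0) for _ in range(20000)]
sigma = statistics.pstdev(fits)

# For T = 0.5 and normally distributed fitness, I = phi(0)/0.5, which is
# roughly 0.8, so S / sigma should come out close to that value.
S = selection_differential(fits, 0.5)
print(S / sigma)
```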


    This page was last updated on 06-Apr-98