A Modern Genetic Optimization Method (2019)
Genetic Optimization Using Direction-Based Stochastic Search
Review of Multi-Offspring Improved Real-Coded Genetic Algorithm (MOIRCGA)
This is a review of a modern genetic algorithm called multi-offspring improved real-coded genetic algorithm (MOIRCGA). The original paper can be found here. The method used by this algorithm is, formally, called heuristical normal distribution and direction-based crosser (HNDDBX). This algorithm uses generates a direction-based candidate solution and modifies it with some parameterized random variable. The coded algorithm can be found on my github.
Genetic algorithms have been used for optimization since the 1960's. With the increased computational power of modern computers, genetic algorithms have gained attention for solving complex, non-linear objectives. For example, evolutionary strategies are used in neural architecture search, a technique for automating the design of neural networks.
This paper was mainly focused on constrained optimization. This includes linear and non-linear programming problems. Linear programming can be used to find exact solutions for a parameterized function with a set of constraints. However, problems can not always be defined as a parameterized function. For example, if the problem relying on computing some metric on a large dataset, has a large candidate space, or has fewer constraints than parameters, it might be better to use evolutionary strategies.
Scipy has the following global optimization strategies: basin hopping, brute force, differential evolution, SHGO, and dual annealing. The full list can be found here.
- Optimization Components
- Normal Distribution Crossovers
- Uniform Distribution Crossovers
- Combinatorial Mutations
The code for this example can be found here. A test example for constrained optimization is shown below. The problem has both equality and inequality constraints. Constraints cannot be explicitly applied to genetic algorithms due to the stochastic nature of the algorithm. An L2 penalty can be added to the constraint functions to account for constraints, where p1 and p2 are scaling parameters.
Where eq and ineq are the equality and inequality constraints, respectively. The initial population is generated from a random uniform distribution between the upper and lower bound for each parameter.
After the initial population is generated, the losses are calculated for each candidate solution, and the candidates are sorted in ascending order. The population is split into 1…n (top 50th percentile) and n+1…m (bottom 50th percentile). The algorithm will uses these two groups to heuristically calculate the variation or energy of the search space. For example, if there are 100 candidates, the heuristic will use the difference of 1st ranked candidate with the 50th ranked candidate. The algorithm will cycle through each candidate pair and generate new candidate solutions.
The direction components are made of averages between the top candidates. There are four main components:
(1) candidate pair XNi, A
(2) candidate pair XMi, B
(3) the average of the top N candidates, C
(4) the current best candidate, D
(5) an average of (1), (3), and (4), E.
Normal Distribution Crossovers
The first crossover searches the space between the C, D, and A. The search energy is defined as the squared uniform distance between A and B. The larger the difference between the ith ranked candidate and the 50+ith ranked individual, the larger the search space will be. The second crossover searches around the D candidate using the square uniform distance between the D and C as the search energy.
Uniform Distribution Crossovers
The first crossover uses the difference between A and B as surrogate gradient. A standard uniform distribution is used with shift the D in the direction of the surrogate gradient. The second crossover uses a surrogate gradient with (D-E), and it shift E in the direction of the D.
A percentage of the crossover candidates will be mutated by three different operations. The operations can be randomly selected. The standard Cauchy distribution has a median value of 1, but its mean and variance are undefined. The first mutation changes the scale and/or direction of the candidate by some proportion of generated Cauchy number + 1. The second mutation re-samples the candidate around the D by the squared uniform distance of D to the candidate. The third mutation uses Levy flights to simulate a random walk characterized by many small changes and a few large displacements.
The chance of getting duplicate candidate solutions increased with the number of candidates and convergence to the optimal candidate. In order to keep enough variation in the sample of candidates, the duplicate candidates are removed and replaced with randomly generated candidates.
The MOIRCGA uses soft margin constraints with various distribution-based search strategies. The direction-based components of the search space are calculated by averaging different top candidates. The energy of the search space is estimated heuristically by the squared uniform distance between various direction-based components. Various mutations are used to effectively search the space around a large space around a candidate. Steps are taken to preserve the variation in the candidate population.