Optimize Genetic Algorithms in Python*

Ramya Ravi, AI software marketing engineer, Intel | LinkedIn*

Urszula Zofia Gumińska, development tools software engineer, Intel | LinkedIn

Chandan Damannagari, director, AI Software, Intel | LinkedIn

10/3/2024

Genetic algorithms (GA) are computational techniques inspired by the process of natural selection to solve constrained and unconstrained optimization problems. These algorithms are usually applied to various optimization problems, which are NP-hard, where finding a solution by standard methods consumes time and resources. GAs are based on an analogy of biological evolution and how human chromosomes behave.

This article demonstrates a code example on how to implement a general GA and perform an offload computation to a GPU using numba-dpex for Intel® Distribution for Python*.

Operations in GAs

For GAs, it is possible to obtain a high-quality result based on three important biology-inspired operations: selection, crossover, and mutation. To apply GAs to a specific problem, it is important to define the representation of the chromosome and the GA operations.

Selection

This is the process of selecting a mate and recombining it to create offspring. Parent selection is crucial for the convergence rate of GA as good parents drive individuals to better and fitter solutions.

Figure 1. An example of the selection process where half of the chromosomes are removed from the next generation.

The selection process usually requires some additional algorithms that will determine which chromosomes will become parents.

Crossover

This process is the same as biological crossover. Here, more than one parent is selected, and one or more offspring are produced using the parents' genetic materials.

Figure 2. An example of a crossover operation.

The crossover operation creates child genomes from selected parent chromosomes, as shown in figure 2. It can be a one-point crossover and only one child genome is created. The first part of the child genome comes from the first parent and the second half from the second parent.

Mutation

A small, random tweak in the chromosome provides a new solution. It is used to maintain and introduce diversity in the genetic population and is usually applied with a low probability.

Figure 3. An example of a mutation operation where one chromosome value is changed.

The mutation operation can change the chromosome as shown in figure 3.

Optimize Genetic Algorithms with Intel® Distribution for Python*

Intel® Distribution for Python* helps developers achieve near-native code performance using libraries like Intel® oneAPI Math Kernel Library (oneMKL) and Intel® oneAPI Data Analytics Library (oneDAL) . Researchers and developers can scale compute-intensive Python code from laptops to powerful servers using optimized NumPy, SciPy, and Numba*.
To optimize the genetic algorithm with Intel Distribution of Python, use Data Parallel Extension for Numba* ( numba-dpex ) range kernel. This kernel represents the simplest form of parallelism and data-parallel running over a set of work items with each work item representing a logical thread of running.

Let's look at the example implementation of the kernel for a vector-add operation:


import dpnp
import numba_dpex

@numba_dpex.kernel
def vector_add(item: numba_dpex.kernel_api.Item, a, b, c):
    i = item.get_id(0)
    c[i] = a[i] + b[i]

To run this code, create all three vectors, a, b and c, where a and b are filled with the data, and c stores the result of the operation. Then, you need to send the vectors to a specific device, like a GPU, and then run the kernel as follows:


N = 1024
a = dpnp.ones(N, device="gpu")
b = dpnp.ones_like(a, device="gpu")
c = dpnp.zeros_like(a, device="gpu")
 
numba_dpex.call_kernel(vector_add, numba_dpex.kernel_api.Range(N), a, b, c)

In the previous code, the vector-add operation was run on a GPU and the result was stored in vector c. Similarly, for any other function or any other algorithm, the implementation looks the same.

Code Implementation

To create the general GA and optimize the algorithm to run on GPUs using the numba-dpex for Intel Distribution for Python, see the code example . It also explains how to implement different operations of GA such as selection, crossover, and mutation and how to adjust these methods to other optimization problems.

1. Set up the environment and import the libraries:


import numpy as np 
import time 
import random 
import math

2. Initialize the population by setting the following:

Population: 5,000
Chromosome size: 10
Generations: 5.

Each chromosome contains 10 random floats between 0 and 1. To reproduce the results later, set the seed as follows:


random.seed(1111)

pop_size = 5000
chrom_size = 10
num_generations = 5

fitnesses = np.zeros(pop_size, dtype=np.float32)
chromosomes = np.zeros(shape=(pop_size, chrom_size), dtype = np.float32)
for i in range(pop_size):
  for j in range(chrom_size):
    chromosomes[i][j] = random.uniform(0,1)

3. Implement the GA by creating an evaluation method. This function is the baseline and comparison for numba-dpex . Here, the fitness of an individual is computed by an arbitrary set of algebraic operations on the chromosome.


def eval_genomes_plain(chromosomes, fitnesses):
  for i in range(len(chromosomes)):
    num_loops = 3000
    for j in range(num_loops):
      fitnesses[i] += chromosomes[i][1]
    for j in range(num_loops):
      fitnesses[i] -= chromosomes[i][2]
    for j in range(num_loops):
      fitnesses[i] += chromosomes[i][3]

    if (fitnesses[i] < 0):
      fitnesses[i] = 0

4. Implement the crossover operation: The inputs have two different chromosomes as the first and second parent. The output of the function returns one new chromosome.


def crossover(first, second):
  index = random.randint(0, len(first) - 1)
  index2 = random.randint(0, len(second) - 1)

  child_sequence = []

  for y in range(math.floor(len(first) / 2)):
      child_sequence.append( first[ (index + y) % len(first) ] )

  for y in range(math.floor(len(second)/ 2)):
      child_sequence.append( second[ (index2 + y) % len(second) ] )
      
  return child_sequence

5. Implement the mutation operation: This code example has a one percent chance to change every float in the chromosome to a different random number.


def mutation(child_sequence, chance=0.01):
  child_genome = np.zeros(len(child_sequence), dtype=np.float32)

  # Mutation
  for a in range(len(child_sequence)):
    if random.uniform(0,1) < chance:
      child_genome[a] = random.uniform(0,1)
    else:
      child_genome[a] = child_sequence[a]

  return child_genome

6. Implement the selection operation, which is the base for creating a new generation. Within this function, a new population is created after crossover and mutation operations.


def next_generation(chromosomes, fitnesses):
  fitness_pairs = []
  fitnessTotal = 0.0
  for i in range(len(chromosomes)):
    fitness_pairs.append( [chromosomes[i], fitnesses[i]] )
    fitnessTotal += fitnesses[i]

  # Sort fitness in descending order
  fitnesses = list(reversed(sorted(fitnesses)))
  sorted_pairs = list(reversed(sorted(fitness_pairs, key=lambda x: x[1])))

  new_chromosomes = np.zeros(shape=(pop_size, chrom_size), dtype = np.float32)

  # Roulette wheel
  rouletteWheel = []
  fitnessProportions = []
  for i in range(len(chromosomes)):
      fitnessProportions.append( float( fitnesses[i]/fitnessTotal ) )
      if(i == 0):
          rouletteWheel.append(fitnessProportions[i])
      else:
          rouletteWheel.append(rouletteWheel[i - 1] + fitnessProportions[i])

  # New population
  for i in range(len(chromosomes)):

      # Selection
      spin1 = random.uniform(0, 1)
      spin2 = random.uniform(0, 1)

      j = 0
      while( rouletteWheel[j] <= spin1 ):
          j += 1

      k = 0
      while( rouletteWheel[k] <= spin2 ):
          k += 1

      parentFirst = sorted_pairs[j][0]
      parentSecond = sorted_pairs[k][0]

      # Crossover    
      child_sequence = crossover(parentFirst, parentSecond)

      # Mutation
      child_genome = mutation(child_sequence)
      
      # Add new chromosome to next population
      new_chromosomes[i] = child_genome

  return new_chromosomes

7. Run the prepared functions, starting with a baseline on a CPU. As the first population is initialized, each generation contains the following steps:

1. Evaluation of the current population using the eval_genomes_plain function
2. Create the next generation using a next_generation function
3. Wipe fitnesses values, since the new generation was already created

The computation time for those operations is measured and printed. The first chromosome is also printed to show that the computations were the same on both CPU and GPU.


print("CPU:")
start = time.time()

# Genetic Algorithm on CPU
for i in range(num_generations):
  print("Gen " + str(i+1) + "/" + str(num_generations))
  eval_genomes_plain(chromosomes, fitnesses)
  chromosomes = next_generation(chromosomes, fitnesses) 
  fitnesses = np.zeros(pop_size, dtype=np.float32)
end = time.time()

time_cpu = end-start
print("time elapsed: " + str((time_cpu)))
print("First chromosome: " + str(chromosomes[0]))

8. Run on a GPU: Start with a new population initialization (the same as step 2), and then implement an evaluation function for the GPU. The only difference between GPU and CPU implementation is that on GPUs you use a flatten data structure for chromosomes. Also, instead of having to loop over all the chromosomes, use kernels from numba-dpex and a global index.


import numba_dpex
from numba_dpex import kernel_api

@numba_dpex.kernel
def eval_genomes_sycl_kernel(item: kernel_api.Item, chromosomes, fitnesses, chrom_length):
  pos = item.get_id(0)
  num_loops = 3000
  for i in range(num_loops):
    fitnesses[pos] += chromosomes[pos*chrom_length + 1]
  for i in range(num_loops):
    fitnesses[pos] -= chromosomes[pos*chrom_length + 2]
  for i in range(num_loops):
    fitnesses[pos] += chromosomes[pos*chrom_length + 3]

  if (fitnesses[pos] < 0):
    fitnesses[pos] = 0

9. Similar to the CPU, the time for evaluation, creation of a new generation, and a fitness wipe is measured while a GPU is running. Send all the chromosomes and the fitness container to the chosen device. Then you can run a kernel with a defined range.


import dpnp

print("SYCL:")
start = time.time()

# Genetic Algorithm on GPU
for i in range(num_generations):
  print("Gen " + str(i+1) + "/" + str(num_generations))
  chromosomes_flat = chromosomes.flatten()
  chromosomes_flat_dpctl = dpnp.asarray(chromosomes_flat, device="gpu")
  fitnesses_dpctl = dpnp.asarray(fitnesses, device="gpu")
  exec_range = kernel_api.Range(pop_size)
  numba_dpex.call_kernel(eval_genomes_sycl_kernel, exec_range, chromosomes_flat_dpctl, fitnesses_dpctl, chrom_size)
  fitnesses = dpnp.asnumpy(fitnesses_dpctl)
  chromosomes = next_generation(chromosomes, fitnesses)
  fitnesses = np.zeros(pop_size, dtype=np.float32)

end = time.time()
time_sycl = end-start
print("time elapsed: " + str(time_sycl))
print("First chromosome: " + str(chromosomes[0]))

Apply the same steps on other optimization problems. Define the chromosome, selection, crossover, mutation, and evaluation operations. The rest of running the algorithm is the same.

Run the previous code example and compare the performance of this algorithm while performing sequential running on a CPU and parallel running on a GPU. The code output illustrates that the performance speed increases using a numba-dpex parallel implementation that runs on a GPU.

Next Steps

Incorporate Intel Distribution for Python into your workflow and accelerate performance with optimizations for NumPy, SciPy, and Numba that scale from laptops up to powerful servers.
Feel free to contribute to the Intel Distribution for Python project on GitHub* .

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Optimize Genetic Algorithms in Python*

Get the Latest on All Things CODE

Operations in GAs

Selection

Crossover

Mutation

Optimize Genetic Algorithms with Intel® Distribution for Python*

Code Implementation

Next Steps

Additional Resources

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Optimize Genetic Algorithms in Python*

Get the Latest on All Things CODE

Operations in GAs

Selection

Crossover

Mutation

Optimize Genetic Algorithms with Intel® Distribution for Python*

Code Implementation

Next Steps

Additional Resources

Product and Performance Information