bamboocqh 2018-10-07
In machine learning we constantly need to optimize features or parameters, which is why optimization algorithms are so widely used across machine learning techniques. Genetic algorithms have found applications in many areas of machine learning. In this tutorial I will explain what a genetic algorithm is, implement a simple version of one and walk through the intuition behind it, and then look at some of its use cases in machine learning.
A genetic algorithm is an optimization algorithm based on the theory of evolution. At its core, it selects the best parameters or components according to a fitness function that measures how well they satisfy the optimization problem at hand. Those parameters are then mutated in the hope that they produce fitter offspring. That is the basic intuition behind genetic algorithms.
A basic genetic algorithm consists of the following steps: initialize a population of candidate solutions, calculate the fitness of each solution, select the fittest solutions as parents, create offspring through crossover, apply random mutation, and repeat over a number of generations before keeping the best solution found.
[Figure: a simple flowchart of these steps]
We will now implement a genetic algorithm from scratch. We will optimize the simple linear equation F(X) = A1X1 + A2X2 + A3X3 + A4X4, where X1 = 4, X2 = -2, X3 = 3.5 and X4 = -4.2, all chosen at random.
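To make the objective concrete, here is a small standalone sketch that evaluates F(X) for one arbitrary set of A values; the candidate_A values below are made up purely for illustration and are not produced by the algorithm.

# Standalone illustration of the objective function
import numpy as np

# Fixed inputs of the equation (the X values above)
inputs = np.array([4, -2, 3.5, -4.2])
# One arbitrary candidate solution (hypothetical A values, chosen only for this example)
candidate_A = np.array([1.0, -0.5, 2.0, -3.0])
# F(X) = A1X1 + A2X2 + A3X3 + A4X4
f_of_x = np.sum(candidate_A * inputs)
print(f_of_x)  # 4*1.0 + (-2)*(-0.5) + 3.5*2.0 + (-4.2)*(-3.0) = 24.6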
The genetic algorithm will help us find the best multipliers, the A values, that produce the largest F(X) within a given number of generations. We will create a handful of functions to handle fitness calculation, parent selection, crossover and mutation.
We will run the GA for 10 generations, with 6 solutions in each generation.
Our basic genetic algorithm proceeds through the following steps:
Initialize the population: this means initializing our sample solutions. We will initialize the solutions in our population with values between -5 and +5 (chosen arbitrarily), as follows:
# Import numpy library
import numpy as np
# Set seed for reproducibility
np.random.seed(100)
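The population itself can then be created with np.random.uniform, exactly as in the full script further below; it is shown here in isolation so the initialization step is visible on its own (assuming the import above has already been run).

# 6 candidate solutions, each holding 4 weights (A values), drawn uniformly from [-5, 5)
population_size = (6, 4)
new_population = np.random.uniform(low=-5.0, high=5.0, size=population_size)
print(new_population)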
Create a fitness calculation function: we will loop over a number of generations, starting from our initial solutions and calculating the fitness of every solution in each generation. The fitness is computed with the function defined below, which helps us maximize the linear equation F(X). The Python code is as follows:
# Calculate fitness of population
def calculate_fitness(inputs, population):
    '''
    Description: This function calculates the fitness of our solutions
    Input Arguments: input values and a population of weights
    Output: Fitness - a fitness value for each solution
    '''
    fitness = np.sum(population*inputs, axis=1)
    return fitness
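As a quick sanity check, the function can be applied to the population initialized above; each of the six solutions receives one fitness value. This is only a sketch and assumes the variables from the previous snippets are still in scope.

inputs = [4, -2, 3.5, -4.2]
fitness = calculate_fitness(inputs, new_population)
print(fitness)        # one fitness value per solution
print(fitness.shape)  # (6,)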
Select parents: next we select a pool of parents, based on the fitness values, to produce the next generation. In every generation we pick the fittest solutions and use them to create the offspring of the next generation. We do this with the choose_mating function, selecting the three fittest parents from each generation. The Python code is shown below:
# Choose parents to mate
def choose_mating(population, fitness, number_of_parents):
    '''
    Description: This function chooses the most fit solutions to mate in order to yield the next generation
    Input Arguments: a population, the fitness and the number of parents required to mate
    Output: the parents to mate
    '''
    # Initialize empty numpy array to hold parents
    parents = np.empty((number_of_parents, population.shape[1]))
    # Loop to fill the parents array with the fittest solutions and stop at the required number of parents
    for number in range(number_of_parents):
        maximum_fitness_index = np.where(fitness == np.max(fitness))
        maximum_fitness_index = maximum_fitness_index[0][0]
        parents[number, :] = population[maximum_fitness_index, :]
        # Overwrite this solution's fitness so it is not selected again
        fitness[maximum_fitness_index] = -99999999999
    return parents
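Continuing the same running example, the three fittest solutions can be pulled out as parents. One side effect to be aware of: the function overwrites the fitness entries of the chosen parents with a very small number so that the same solution is not picked twice, so recompute the fitness before reusing that array.

# Pick the 3 fittest solutions as parents (this also modifies the fitness array passed in)
parents = choose_mating(new_population, fitness, number_of_parents=3)
print(parents)        # the 3 fittest solutions
print(parents.shape)  # (3, 4)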
Crossover: next, we cross over the parents selected in the previous step to create the next generation. We use single-point crossover with the crossover point at the centre of the chromosome: the first half of an offspring's genes comes from the first parent and the second half comes from the second mating parent. The following Python code performs this:
# Perform cross over
def crossingover(parents, size_of_offspring):
    '''
    Description: This function performs a one point cross over operation on the parents.
    Input Arguments: parents - the parents pool to be crossed over
                     size_of_offspring - the size of offspring required
    Output: offspring - the crossed over offspring
    '''
    # Initialize an empty numpy array to hold the offspring
    offspring = np.empty(size_of_offspring)
    # Specify the point of cross over - at the center in our case
    point_of_crossover = np.uint8(size_of_offspring[1]/2)
    # Loop over the number of offspring required
    for k in range(size_of_offspring[0]):
        # Get index of first parent to be mated
        parent1_index = k % parents.shape[0]
        # Get index of second parent to be mated
        parent2_index = (k+1) % parents.shape[0]
        # Assign the first half of the first parent's genes to the first half of the offspring's genes
        offspring[k, 0:point_of_crossover] = parents[parent1_index, 0:point_of_crossover]
        # Assign the second half of the second parent's genes to the second half of the offspring's genes
        offspring[k, point_of_crossover:] = parents[parent2_index, point_of_crossover:]
    return offspring
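With the three parents from the previous step, we can produce the three offspring needed to refill a population of six; each offspring takes its first two genes from one parent and its last two from the next. This is a sketch that assumes the parents array from the snippet above is available.

# We need 6 - 3 = 3 offspring, each with 4 genes
offspring = crossingover(parents, size_of_offspring=(3, 4))
print(offspring)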
Perform mutation: next, we introduce some random variation into the offspring's genes so that the offspring are not simply copies of segments of their parents. The code is as follows:
# Perform mutation
def mutation(cross_over_offspring):
    '''
    Description: This function performs some random variation to the genes of the solutions by adding a random bias number.
    Input: cross_over_offspring - offspring obtained from the cross over function
    Output: mutated offspring
    '''
    # Loop over the offspring and add a random bias number to one of their genes
    for number in range(cross_over_offspring.shape[0]):
        # Create the random bias number to be used for mutation. The arguments are the limits (low and high) and the size
        random_bias_number = np.random.uniform(-1.5, 1.5, 1)
        # Proceed to mutate by adding the bias number to the gene at index 3
        cross_over_offspring[number, 3] = cross_over_offspring[number, 3] + random_bias_number
    return cross_over_offspring
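Applied to the crossed-over offspring from the previous snippet, the mutation shifts one gene of every offspring by a random amount drawn from [-1.5, 1.5]; note that, as written, only the gene at column index 3 is mutated, and the function modifies its argument in place as well as returning it.

mutated = mutation(offspring)   # offspring is modified in place and also returned
print(mutated)                  # column index 3 of every row has been shifted by a random bias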
Get the best solution: after looping through all the generations, we take the fittest solution from the final generation; this is the best solution found by the genetic algorithm. We will now run the genetic algorithm end to end on our simple example function, with the goal of maximizing it.
The equation is F(X) = A1X1 + A2X2 + A3X3 + A4X4, where X1 = 4, X2 = -2, X3 = 3.5 and X4 = -4.2, all chosen at random.
We want to optimize the values A1, A2, A3 and A4 so as to maximize F(X). We start from the set of X inputs and specify the number of weights, then set the number of generations to evolve through, and finally pick the best solution at the end.
The functions created above are now combined into the full genetic algorithm, which finds the maximum over the specified number of generations. The Python implementation is as follows:
''' Implementing the GA '''
# Specify inputs of the equation (X values)
inputs = [4, -2, 3.5, -4.2]
# Specify the number of weights or multipliers (A values, in our case) for each input which we are looking to optimize
number_of_weights = 4
# Specify the number of solutions in each generation. Each solution will have the number of weights
solutions = 6
# Specify the number of parents mating to yield the specified number of solutions
number_of_parents_mating = 3
# Define the population size based on the number of solutions and the number of weights per solution
population_size = (solutions, number_of_weights)
# Randomly create the initial population according to the specified population size
new_population = np.random.uniform(low=-5.0, high=5.0, size=population_size)
# Print out the initialized population
print(new_population)
# Specify the number of generations to evolve through
number_of_generations = 10

# Loop over the specified number of generations
for generation in range(number_of_generations):
    # Print the generation we are currently in to keep track
    print("Generation : ", generation)
    # Calculate the fitness of the population in the current generation
    fitness = calculate_fitness(inputs, new_population)
    # Proceed to select the most fit parents to mate
    parents = choose_mating(new_population, fitness, number_of_parents_mating)
    # Proceed to create offspring via the one point cross over
    crossover = crossingover(parents, size_of_offspring=(population_size[0]-parents.shape[0], number_of_weights))
    # Proceed to add random bias to the genes of the created offspring via the mutation function
    mutated = mutation(crossover)
    # Proceed to create the new population from the parents and the mutated offspring
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = mutated
    # Get the best or most fit solution in this current generation
    print("Best result : ", np.max(np.sum(new_population*inputs, axis=1)))

# Calculate fitness of final generation
fitness = calculate_fitness(inputs, new_population)
# Get most fit solution of final generation
most_fit_index = np.where(fitness == np.max(fitness))
# Print out the best solution
print("Best solution : ", new_population[most_fit_index, :])
print("Best solution fitness : ", fitness[most_fit_index])
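The same machinery can be pointed at other objectives. For instance, if you wanted to minimize F(X) instead of maximizing it, one common trick (shown here only as a sketch, not part of the original script) is to negate the objective inside the fitness function and pass this variant, given the hypothetical name calculate_fitness_minimize, wherever calculate_fitness is used; selection, crossover and mutation stay exactly the same.

# Hypothetical minimization variant: negate the objective so that
# "maximizing fitness" corresponds to minimizing F(X)
def calculate_fitness_minimize(inputs, population):
    return -np.sum(population * inputs, axis=1)

Because selection always keeps the solutions with the highest fitness, negating the objective is enough to flip the search direction.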
The genetic algorithm is one of the most popular optimization algorithms and, as you have seen, it has a wide range of applications in machine learning; the application areas are limited only by your imagination. Now that you understand the basics of genetic algorithms, you can put them to work optimizing your own machine learning models.