In terms of solving problems, any guideline that guides towards solving a problem, and not an instance of a problem is called an "Algorithm". 

Classical problems are those that classical set theory and logic and therefore classical algorithms can solve. But some problems, especially NP-Hard optimization problems, which are usually finding a maximum or minimum take time to be solved using classical methods. In such cases, we use methods that are a tad Smarter or more efficient. For instance, let's say you have a document with 2 million pages, if you look it up for a phrase or a specific page or paragraph, and you use the classic methods which Guarantee success at some point (consider we know for a fact that the phrase exists), and definitive success, it would take 2 million times N, N being the time it takes to look through one page to find what you're looking for. Again, two million times N is the worst case scenario, which we like to call big O. But if you had multiple people each randomly checking a page, they'd actually have a better shot since their chances of success improves as they go forward, since they would eventually get an idea which way to go, or where to look next. These stochastic methods are known as meta-heuristics and one very early instance of them is the Genetic Algorithm.

As long as we're clear on what needs to be done, let's look at another example. Say you have a multi-dimensional function and you're looking for the maximum or minimum of said function. For this particular example, let's say maximum. Now, because it's hard to compute because one would have to try many points of the function within its domain, and not being able to visualize it either, you can't look through every possible input since it's continuous and if you miss as much as one point or input, you'd be making a mistake by classical problem solving standards. So what GA proposes is we treat the problem like a world of people that play the role of those points, and we generate a number of inputs or points within the domain of the function, and we call them a GENERATION, and these points each have specific values, for instance if your function is 3 dimensional, each point would have a value for x, one for y and one for the z axis, so we call those it's DNA, or genes if you will, and based on those genes and their outcome we calculate a percentage of fitness for the point. Now it depends on lots of measures, but for now, let's just say we have a fitness function and we generate let's say 10 random points on the function F, and each have a fitness level or percentage. Based on their fitness percentage, the likelihood of them being picked increases, just like in life, those who can adjust to their environment have more likeliness to survive. So if one point has a fitness level of 83 percent, it means that it has a 83 percent chance of being picked. After we pick the same number of new candidates as the population, we pick "parents" 2 by 2, they "mate" and generate two offsprings that receive half their genes from parent 1 and half from parent 2, and have a likeliness to be mutated as well. Afterwards, these offsprings are kept. Finally, we replace them with the old generation. Now these new points have new fitness values, and they mate just like their parents and generate new offsprings. The process is repeated from that point.

Now if we know to a certain point what our objective is, or we get to a certain member of the population or child with a 100 percent fitness level, we're done. If not, then we reach a point where we don't reach better results. That is when we can simply assume the best value or member or child our objective or in this case maximum.

Of course, these were all the basics of the classic genetic algorithm. As I mentioned, it has been updated with new methods of selecting parents, selecting offsprings, mutating them, creating offsprings based on parents(crossover). So, this is just as I mentioned in the title, and introduction.

There are many complications when it comes to GA. There are many different ways to encode your problem, represent your problem. There are also many different parameters, different operators for evaluation, selection, crossover, mutation and next generation selection.

These are all things that have to be adjusted for any given problem. After the methods are chosen and set, then you should probably try different parameters for the mutation rate(probability), crossover rate, population size, maximum number of iterations and so on. You can even adjust some of these parameters dynamically while GA is running, using fuzzy inference systems. You can even use another meta-heuristic or even GA itself to optimize its own parameters. The sky's the limit.

I'm working on GA as an extracurricular activity for my GA class and AI class. I've managed to get some work done using other meta-heuristics like PSO and SA, and I'm going to be moving on to ACO soon.

Thank you for reading. Stay tuned!