Home > Genetic Programming, Software Development > ECJ: A First/Simple Tutorial

ECJ: A First/Simple Tutorial


(This is going to be another long one.)

When we last left our heroes they were looking into JGAP as a cool framework for the creation of genetic algorithms. One simple example left them warm and fuzzy and the second left them temporarily bewildered. Taking a better look at the Watchmaker Framework helped clear their minds and made them realize that more thought was needed. They stopped, took a deep breath and ordered cappuccinos.

In the course of human events, research is inevitable. In my research I ran across Mehdi Khoury’s tutorial page which included a comparison of different genetic programming packages as of June 2007. The package he rated the highest? A package named ECJ developed at George Mason University’s Evolutionary Computation Laboratory.

In the course of my research in GP frameworks I decided I wanted to give ECJ a try as well. Nothing like a recommendation from a web site I’ve never heard of to influence my decision making capabilities about cool technologies.

I will be reversing the examples as the Hello World example is really about genetic algorithms while the Simple Math Test, taken from Toby Segaran’s Programming Collective Intelligence, is about genetic programming. It just makes sense to do GA before GP.

If you disagree I look forward to reading your blog posting where you do the reverse.

Hello World – A Genetic Algorithm

Input: the alphabet + a comma + an exclamation point + a space
Output: “Hello, world!”

The ECJ Properties File

The use of properties files is a very Java thing. Not to say that other programming languages don’t use something similar, but Java has made their use a virtue. In some cases, properties files are fantastic, while in other cases XML files are better. Folks who prefer verbose descriptions prefer XML and all others prefer the key=value format of properties; which you decide to use should be based on the problem at hand and not a knee-jerk reaction. Good luck figuring out if you’re having a brain-based knee-jerk reaction right now.

For this ECJ example, I rearranged the various properties so that related key/value pairs would be together. I hope this will make the explanation more coherent.

The first thing to bear in mind with ECJ is that you don’t get to write main(). The properties file tells the controller class ec.Evolve what to do and it does it with great verve and joie de vivre. I am sure you could write your own controller class to execute ECJ within your own applications, but that is left as an exercise for the reader.

The properties file is called from the command line like so:

java ec.Evolve -file [properties file]

As long as the ECJ framework is in the classpath (there is no JAR file to speak of, but you can always make your own), your custom classes are in the classpath, and the path to the properties file is accurate you should be good to go. I have run this example from within Eclipse and on the command line (using Kubuntu) and the results are the same.

verbosity    = 0

breedthreads = 1
evalthreads  = 1
seed.0       = 4357

The first batch of properties are global framework configurations:

  • log file level verbosity (0 – output everything, all the way to 5000 – output nothing). More detail here.
  • number of threads used for crossover/breeding
  • number of threads used for population evaluation
  • random number seed
state       = ec.simple.SimpleEvolutionState

pop         = ec.Population
init        = ec.simple.SimpleInitializer
finish      = ec.simple.SimpleFinisher
breed       = ec.simple.SimpleBreeder
eval        = ec.simple.SimpleEvaluator
stat        = ec.simple.SimpleStatistics
exch        = ec.simple.SimpleExchanger

generations          = 100
quit-on-run-complete = true

Next comes a number of existing classes to accomplish basic tasks. I am eternally grateful to them for the amount of work they saved me. The ones deserving of an explanation are:

  • ec.simple.SimpleEvolutionState – a subclass of EvolutionState. It contains the configuration information needed by the framework to get your genes evolving.
  • ec.Population – contains the current collection of Individuals/chromosomes that have been bred and/or evaluated.
  • ec.simple.SimpleBreeder – creates/breeds Individuals. While ECJ can handle multiple sub-populations the SimpleBreeder does not.

The generations and quit-on-run-complete are complementary; continue to evolve until 100 generations are completed or our fitness function has a perfect match.

checkpoint           = false
prefix               = ec
checkpoint-modulo    = 1

stat.file            = $out.stat

breed.elites.0       = 1

Checkpoint file configurations are defined here. The non-obvious ones are:

  • prefix – the prefix used for the checkpoint file (not used in this example, but necessary to run Evolve)
  • checkpoint-modulo – run a checkpoint every generation or after every N generations?
  • stat.file – name of the output file. The $ means write it in the folder where the Java process started. If a relative path is used then use the folder where the process started as the anchor for the relative path.
pop.subpops  = 1
pop.subpop.0 = ec.Subpopulation

pop.subpop.0.size                   = 100
pop.subpop.0.duplicate-retries      = 0
pop.subpop.0.species                = ec.vector.GeneVectorSpecies
pop.subpop.0.species.ind            = ec.vector.GeneVectorIndividual
pop.subpop.0.species.fitness        = ec.simple.SimpleFitness
#
# Hello, world! is 13 characters long
#
pop.subpop.0.species.genome-size    = 13
pop.subpop.0.species.crossover-type = two
pop.subpop.0.species.crossover-prob = 1.0
pop.subpop.0.species.mutation-prob  = 0.05

pop.subpop.0.species.pipe          = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.0 = ec.vector.breed.VectorCrossoverPipeline
pop.subpop.0.species.pipe.source.0.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.0.source.1 = ec.select.TournamentSelection

select.tournament.size = 2

The next group defines the configuration of the population:

  • Only one sub-population
  • Use the ec.Subpopulation class as its container
  • Create 100 Individuals
  • Use the GeneVectorSpecies and GeneVectorIndividual as the container for my custom gene
  • Use SimpleFitness to determine who are the current winners. Our custom fitness function will configure this for every individual
  • The genome size matches the length of our string…in this case 13. One character per gene
  • The crossover-type defines one of five possible ways to breed between the selected individuals. Type two means that the genes between two points in the chromosome will be swapped out with each other.
  • The crossover probability and mutation probability defines how often a crossover and mutation will occur: a one means all the time, 0.05 means not so often.
  • The mutation and crossover pipelines contain selection objects that decide who gets to breed and who gets mutated. TounamentSelection selects a number of individuals at random (how many is defined in the property select.tournament.size) and picks a winner based on the individual’s fitness value.
# each of these is really on one line
pop.subpop.0.species.gene
                = hiddenclause.example.ecj.CharVectorGene
pop.subpop.0.species.gene.alphabet
                = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXY Z,!

eval.problem = hiddenclause.example.ecj.HelloWorld

Finally, I defined my 2 implementation classes (my gene and my fitness function) and their parameters.

There was no gene/individual type I could use so I defined a subclass of VectorGene and called it (wait for it) CharVectorGene. Since that gene is responsible for creating more genes, a Prototype for you Design Patterns geeks, it will also contain the alphabet containing the allowed characters to be used in this example.

HelloWorld is the fitness function that will push the population into evolving into a greeting.

Fitness Function

ECJ has the concept of an Individual where JGAP has a Chromosome and Watchmaker has AbstractCandidateFactorys to create the chromosomes out of any type you want. In the code I do the same check as before: check that the proper character is at the proper location; give it a point for every hit.

        int fitnessValue = 0;

        GeneVectorIndividual charVectorIndividual
                              = (GeneVectorIndividual) individual;
        long length = charVectorIndividual.size();
        for (int i = 0; i < length; i++) {
            CharVectorGene charVectorGene
                      = (CharVectorGene) charVectorIndividual.genome[i];
            char actual = charVectorGene.getAllele();
            if (actual == _expected[i]) {
                fitnessValue += 1;
            }
        }

In ECJ I am responsible for configuring the Fitness object as we get closer to the perfect message so my fitness function is not the ultimate arbiter of a gene’s fitness.

    SimpleFitness fitness = (SimpleFitness) charVectorIndividual.fitness;
    fitness.setFitness(evolutionState, fitnessValue,
                    fitnessValue == charVectorIndividual.genomeLength());

CharVectorGene – A Gene for Chars

This is where the gene is both created and initialized. The setup() method is only called once for the entire run to load up any parameters, in this case the desired alphabet.

    public void setup(final EvolutionState state, final Parameter base) {
        super.setup(state, base);

        Parameter def = defaultBase();

        String alphabetStr = state.parameters.getStringWithDefault(
                     base.push(P_ALPHABET), def.push(P_ALPHABET), "");
        if (alphabetStr.length() == 0)
            state.output.fatal(
                  "CharVectorGene must have a default alphabet", 
                  base.push(P_ALPHABET));

        alphabet = alphabetStr.toCharArray();
    }

Every gene has to have a character. This method randomly assigned a character to itself.

    public void reset(EvolutionState state, int thread) {
        int idx = state.random[thread].nextInt(alphabet.length);
        allele = alphabet[idx];
    }

There are four standard Java methods that turn out to be quite important to ECJ.

    public boolean equals(Object other)
    public int hashCode()
    public Object clone()
    public String toString()

Standard Java rules apply as to how you should implement them. I probably did not do such a good job.

ECJ’s HelloWorld Output

After running HelloWorld I found that the output to stdout doesn’t do anything more than tell you how many generations have passed and that (maybe) a solution was found. For the really interesting output you want to look at the out.stat file (ECJ automatically adds a space between each letter when it collects the stringified version of each gene):

...
Generation: 45
Best Individual:
Evaluated: T
Fitness: 13.0
 H e l l o ,   w o r l d !

Best Individual of Run:
Evaluated: T
Fitness: 13.0
 H e l l o ,   w o r l d !

The most interesting thing about the above is that ECJ came up with the solution in the least number of generations (JGAP: 233, Watchmaker: 125, ECJ: 45). It would appear that ECJ, though it took me longer to figure out, is the best petri dish so far. I am sure there is fine tuning that could be done to make the various toy example behave similarly, but I think I’ll leave that to someone who cares as an exercise for the reader.

Miscellaneous Comments

Okay, maybe I should have titled this section Miscellaneous Complaints.

The above example took me about a week of on-and-off thought. There was no simple way for me to discover what I did not know about ECJ, but it did a great job of highlighting over and over again that I knew nothing. Since there was no existing class to handle strings I had to figure out, sans documentation, what I needed to do. While intellectually interesting, it was quite frustrating. As you might expect there were a lot of dead-ends in my search.

In addition, due to the flexible method of assigning classes to various categories there should be a lot of checks for class types to insure the run doesn’t crash due to a bad case of downcasting. The ECJ examples used them; I removed them from my code.

While I was happy to finally figure out how to make ECJ work I also have to admit I was exhausted by the time that happened. Hunting through the Javadocs and various README files was decidedly unsatisfying.

Without going into a lot of detail, and I won’t, the ECJ framework appears to be quite powerful. While it has self-admitted warts the most interesting design choice I found was the use of properties files to glue everything together. JGAP and Watchmaker don’t use properties files at all though they might; I just didn’t run into them. ECJ’s use of them is very Spring-like only without the XML and the strict usage of interfaces. I am a big fan of the Spring Framework so ECJ gained points on the use of properties as a pseudo-dependency-injection file, but lost points by not using interfaces properly.

My big suggestion to the ECJ team: port ECJ to Spring and use more interfaces in the implementation code (or more accurately, stop downcasting the interfaces if you know that someone could mess with the properties files). Yeah, the use of XML is ugly, but it makes extending ECJ more consistent and should make writing tests for the various framework components easier.

Or not. I know it is going to be a lot of work and who knows how much code you want to maintain backward compatibility with.

On second thought, leave it alone. How about some more documentation?

The Code

helloworld.params

verbosity    = 0

breedthreads = 1
evalthreads  = 1
seed.0       = 4357

state  = ec.simple.SimpleEvolutionState

pop    = ec.Population
init   = ec.simple.SimpleInitializer
finish = ec.simple.SimpleFinisher
breed  = ec.simple.SimpleBreeder
eval   = ec.simple.SimpleEvaluator
stat   = ec.simple.SimpleStatistics
exch   = ec.simple.SimpleExchanger

generations             = 100
quit-on-run-complete    = true
checkpoint              = false
prefix                  = ec
checkpoint-modulo       = 1

stat.file       = $out.stat

pop.subpops     = 1
pop.subpop.0    = ec.Subpopulation

pop.subpop.0.size                  = 100
pop.subpop.0.duplicate-retries     = 0
pop.subpop.0.species               = ec.vector.GeneVectorSpecies
pop.subpop.0.species.ind           = ec.vector.GeneVectorIndividual
pop.subpop.0.species.fitness       = ec.simple.SimpleFitness

# Place on one line
pop.subpop.0.species.gene          
    = hiddenclause.example.ecj.CharVectorGene
# Place on one line
pop.subpop.0.species.gene.alphabet 
    = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXY Z,!
#
# Hello, world! is 13 characters long
#
pop.subpop.0.species.genome-size    = 13
pop.subpop.0.species.crossover-type = two
pop.subpop.0.species.crossover-prob = 1.0
pop.subpop.0.species.mutation-prob  = 0.05

# Place on one line
pop.subpop.0.species.pipe                   
    = ec.vector.breed.VectorMutationPipeline
# Place on one line
pop.subpop.0.species.pipe.source.0          
    = ec.vector.breed.VectorCrossoverPipeline
pop.subpop.0.species.pipe.source.0.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.0.source.1 = ec.select.TournamentSelection

select.tournament.size = 2

eval.problem = hiddenclause.example.ecj.HelloWorld

breed.elites.0 = 1

HelloWorld.java

/**
 * HelloWorld.java
 *
 * This is an example only! Use it for anything else at your own risk!
 * You have been warned! Coder/user beware!
 */
package hiddenclause.example.ecj;

import ec.EvolutionState;
import ec.Individual;
import ec.Problem;
import ec.simple.SimpleFitness;
import ec.simple.SimpleProblemForm;
import ec.vector.GeneVectorIndividual;

public class HelloWorld extends Problem implements SimpleProblemForm {
    private char[] _expected = "Hello, world!".toCharArray();

    public void evaluate(final EvolutionState evolutionState, 
                                    final Individual individual, 
                                    final int subPopulation, 
                                    final int threadNum) {
        if (individual.evaluated)
            return;

        int fitnessValue = 0;

        GeneVectorIndividual charVectorIndividual = (GeneVectorIndividual) individual;
        long length = charVectorIndividual.size();
        for (int i = 0; i < length; i++) {
            CharVectorGene charVectorGene 
                    = (CharVectorGene) charVectorIndividual.genome[i];
            char actual = charVectorGene.getAllele();
            if (actual == _expected[i]) {
                fitnessValue += 1;
            }
        }

        SimpleFitness fitness 
                         = (SimpleFitness) charVectorIndividual.fitness;
        fitness.setFitness(evolutionState, fitnessValue, 
                fitnessValue == charVectorIndividual.genomeLength());

        charVectorIndividual.evaluated = true;
    }

    public void describe(final Individual individual, 
                         final EvolutionState state, 
                         final int subPopulation, 
                         final int threadNum,
                         final int log, final int verbosity) {
        // Do Nothing
    }
}

CharVectorGene.java

/**
 * CharVectorGene.java
 *
 * This is an example only! Use it for anything else at your own risk!
 * You have been warned! Coder/user beware!
 */
package hiddenclause.example.ecj;

import ec.EvolutionState;
import ec.util.Parameter;
import ec.vector.VectorGene;

/**
 * @author carlos
 */
public class CharVectorGene extends VectorGene {
    public final static String P_ALPHABET = "alphabet";

    private static char[]      alphabet;
    private char               allele;

    @Override
    public void setup(final EvolutionState state, final Parameter base) {
        super.setup(state, base);

        Parameter def = defaultBase();

        String alphabetStr = state.parameters.getStringWithDefault(
                          base.push(P_ALPHABET), def.push(P_ALPHABET), "");
        if (alphabetStr.length() == 0)
            state.output.fatal(
                       "CharVectorGene must have a default alphabet", 
                       base.push(P_ALPHABET));

        alphabet = alphabetStr.toCharArray();
    }

    /*
     * (non-Javadoc)
     * @see ec.vector.VectorGene#reset(ec.EvolutionState, int)
     */
    @Override
    public void reset(EvolutionState state, int thread) {
        int idx = state.random[thread].nextInt(alphabet.length);
        allele = alphabet[idx];
    }

    public char getAllele() {
        return allele;
    }

    /*
     * (non-Javadoc)
     * @see ec.vector.VectorGene#equals(java.lang.Object)
     */
    @Override
    public boolean equals(Object other) {
        if (!this.getClass().isInstance(other)) {
            return false;
        }

        CharVectorGene that = (CharVectorGene) other;

        return allele == that.allele;
    }

    /*
     * @see ec.vector.VectorGene#hashCode()
     */
    @Override
    public int hashCode() {
        int hash = this.getClass().hashCode();

        hash = (hash << 1 | hash >>> 31) ^ allele;

        return hash;
    }

    @Override
    public Object clone() {
        CharVectorGene charVectorGene = (CharVectorGene) (super.clone());

        return charVectorGene;
    }

    @Override
    public String toString() {
        return Character.toString(allele);
    }
}

Next time: the ECJ version of Toby Segaran’s Simple Math Test.

Or maybe I will do that first in Watchmaker.

Advertisements
  1. Aron Kohn
    June 19, 2012 at 11:36 am

    Thank you so much.

  2. Jag
    September 26, 2012 at 11:41 am

    Thanks

  3. gif
    February 21, 2013 at 5:53 pm

    great tutorial ! this is very helpful for me to start using ECJ. thanks !

  4. September 3, 2013 at 10:22 am

    Thank you man! It really helped me in my studies with ECJ…

  5. Julia
    October 23, 2013 at 4:07 pm

    Hello! How can I run the example with eclipse?

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: