Home > Genetic Programming, Software Development > ECJ: A Second Tutorial

ECJ: A Second Tutorial


This is the fifth of my postings on introductory examples in genetic algorithms and genetic programming. If you have been following these posts you already know that I do not go into what is GA or GP or how you would go about implementing your own GA/GP systems. If you want an introduction please read Chapter 11, Evolving Intelligence, from Toby Segaran’s Programming Collective Intelligence. It is a great introduction to GA/GP and I highly recommend it.

Time to look at the ECJ version of the GP example. Let me warn you: there are a lot of steps I will be skipping; look at the code I modified and at the code from the ECJ code drop. This framework isn’t as straightforward as JGAP or Watchmaker, but I am coming to believe it is the more powerful of the three.

Simple Math Test – A Genetic Program

Input: a series of number pairs

Output: the formula the transforms the pair of numbers to a desired output value.

As it turns out ECJ Tutorial 4 is an example of the above. In a testament to the use of highly accurate names the example is called MultiValuedRegression. Toby Segaran calls his multi-valued regression example Simple Math Test.

(Hmm. Which example would you rather try? I am pretty certain that those of us (well, me) who fit in the novice category of the Dreyfus model of skill acquisition would prefer the simpler name…unless you are into math in which case you are already insulted that we would grow code to figure out the solution instead of figuring it out ourselves. But I digress.)

I thought this post was going to be a short one. ECJ is both simple enough and complex enough that I will refer you to the full code from the ECJ download to view all the pieces involved in running this example and to the tutorial documentation to understand how this example works. However, looking at the fitness function and at the custom class I had to write will help in understanding how the ECJ pieces fit.

The following gratuitous diagram is from the ECJ documentation.

High-level Diagram of the ECJ GP Framework from the ECJ Tutorial 4 Documention

While I would have preferred a UML diagram, this will do. It is all about aggregation relationships anyway:

  • Individuals have a Species
  • Individuals contain a GPTree (solution tree)
  • A GPTree has GPTreeConstraints
  • A GPTree has Nodes which have NodeConstraints
  • Nodes may have child nodes. The child nodes in turn have NodeConstraints
  • NodeConstraints contain the child type and the Node return type

Here is my gratuitous UML diagram.

ECJ GP UML Diagram

ECJ GP UML Diagram

I admit it is not as explicit as ECJ’s diagram and it doesn’t use color (I’m not good with color. Except for purple. And maybe salmon. And bone. Or is it eggshell?).

The original ECJ MultiValuedRegression example does not use constants. The fitness function used the formula x^2 * y + x*y + y, but if you recall the Toby Segaran formula was x^2 + 2y + 3x + 5. What’s different? The use of actual numbers: 2, 3 and 5, to be exact.

In order to create the parse tree of code to be executed we have to use operators for addition and multiplication, variable wrappers for x and y, and numeric wrappers for integers (in the ECJ example, the subtraction operator is also used so I left it for old-times sake). In a situation where the formula to be evolved is quite unknown you may find yourself throwing in trig functions, power notation, the division operator and any other functions you think will help your population evolve in the proper direction.

I created a new Eclipse project and called it SimpleMathTest. I added the ECJ installation to the project’s classpath and in addition I copied the following into the src folder from the ECJ code drop:

  • ec.app.tutorial4.*.java. Rename the package to hiddenclause.ec.app.tutorial4 or whatever you like, but any place in the instructions below where I reference the package name you need to substitute the proper name.
  • ec.params
  • simple.params
  • koza.params
  • tutorial4.params

The various param files refer to each other. In order to make them visible to each other I had to change the paths declared using the parent.0 property.

Within tutorial4.params I changed the value of parent.0 to:

parent.0 = koza.params

Within koza.params I changed the value of parent.0 to:

parent.0 = simple.params

Within simple.params I changed the value of parent.0 to:

parent.0 = ec.params

In order to run the example within Eclipse create a run configuration for the project. The Run Configuration tabs for SimpleMathTest should be set as:

  • Main
    • Project: SimpleMathTest
    • Main class: ec.Evolve
  • Arguments
    • Program Arguments: -file tutorial4.params -p gp.tree.c=true
    • Working Directory: ${workspace_loc:SimpleMathTest/src/hiddenclause/ec/app/tutorial4}

No other tabs needed to be changed.

The Fitness Code

I changed the fitness code from:

MultiValuedRegression.java

...
    expectedResult = currentX * currentX * currentY
                   + currentX * currentY
                   + currentY;
...

to this:

...
    expectedResult = currentX * currentX
                   + 2 * currentY
                   + 3 * currentX
                   + 5;
...

The MultiValuedRegression class has a small local API, but a rather large one if you look at its inheritance tree. The three methods I care about within MultiValuedRegression are:

  • setup() – this is where the object gets information from the properties database. It is only called once.
  • clone() – creates a deep copy of the MultiValuedRegression object.
  • evaluate() – the fitness function. Well, technically not the fitness function as the KozaFitness object looks at the fitness score generated by evaluate() and picks who goes into the next generation.

Running the modified example code in ECJ did not find the formula. From a testing perspective the failure was to be expected. Now I could implement a class to handle constants and update the configuration file.

The Custom Data Classes

I implemented two classes using ECJ naming conventions: Int and IntData. Int is a wrapper for an integer value; it has to inherit from the ERC (Ephemeral Random Constants) class which holds a constant value. IntData is a wrapper for the result of the calculation and is checked in the fitness function; its value changes with each individual checked. Int is created and populated in the GPTree as it tries and reverse engineer the formula; IntData is passed in to each Individual as a place to store the result of the code execution (in this case a calculation).

The Int class has 6 local methods. They are all quite shallow so take a look at the code for a peek into what they do (the code is located below). The method I care about is eval(): it takes an incoming GPData object, downcasts it to an object of type IntData, and stores the Int object’s value in it.

    @Override
    public void eval(final EvolutionState state, final int thread,
                     final GPData input,
                     final ADFStack stack,
                     final GPIndividual individual,
                     final Problem problem) {
        IntData rd = ((IntData) (input));
        rd.x = _val;
    }

The IntData class has one local method named copyTo(). All it does is take its current value and assign it to an incoming GPData object.

public class IntData extends GPData {
    public int x; // return value

    @Override
    public GPData copyTo(final GPData gpd)
    {
        ((IntData) gpd).x = x;
        return gpd;
    }
}

The Configuration File

The changes in here were pretty easy: Add the new wrapper as a function, add the result wrapper, and declare the use of the fitness function.

The new wrapper is defined in tutorial4.param as:

...
gp.fs.0.size = 6
...
gp.fs.0.func.2 = hiddenclause.ec.app.tutorial4.Int
gp.fs.0.func.2.nc = nc0
...

The fitness function is declared as:

eval.problem = hiddenclause.ec.app.tutorial4.MultiValuedRegression

The result wrapper is declared twice: once for external use (the value checked within MultiValuedRegression) and once for internal use.

eval.problem.data = hiddenclause.ec.app.tutorial4.IntData
eval.problem.stack.context.data = hiddenclause.ec.app.tutorial4.IntData

The Output

Once all that was done, I was able to run the example and see if I could evolve the result I was looking for. The out.stat file had this to say:

...
Final Statistics
================
Total Individuals Evaluated: 7168

Best Individual of Run:
Evaluated: true
Fitness: Raw=0.0 Adjusted=1.0 Hits=10
Tree 0:
((x - (3 - x)) + (y + y)) + ((8 + x) + (x * x))

The above simplifies to:

x - 3 + x + y + y + 8 + x + x * x

2x - 3 + 2y + 8 + x + x^2

3x + 5 + 2y + x^2

x^2 + 2y + 3x + 5

The cat was alive.

The Code

tutorial4.params

# Copyright 2006 by Sean Luke and George Mason University
# Licensed under the Academic Free License version 3.0
# See the file "LICENSE" for more information

# Modified by Carlos Valcarcel for use as an example on the Hidden Clause blog.

parent.0 = koza.params

# We have one function set, of class GPFunctionSet
gp.fs.size = 1
gp.fs.0 = ec.gp.GPFunctionSet
# We'll call the function set "f0".  It uses the default GPFuncInfo class
#gp.fs.0.name = f0
#gp.fs.0.info = ec.gp.GPFuncInfo

# The function set.
gp.fs.0.size = 6
gp.fs.0.func.0 = hiddenclause.ec.app.tutorial4.X
gp.fs.0.func.0.nc = nc0
gp.fs.0.func.1 = hiddenclause.ec.app.tutorial4.Y
gp.fs.0.func.1.nc = nc0
gp.fs.0.func.2 = hiddenclause.ec.app.tutorial4.Int
gp.fs.0.func.2.nc = nc0
gp.fs.0.func.3 = hiddenclause.ec.app.tutorial4.Add
gp.fs.0.func.3.nc = nc2
gp.fs.0.func.4 = hiddenclause.ec.app.tutorial4.Sub
gp.fs.0.func.4.nc = nc2
gp.fs.0.func.5 = hiddenclause.ec.app.tutorial4.Mul
gp.fs.0.func.5.nc = nc2

eval.problem = hiddenclause.ec.app.tutorial4.MultiValuedRegression
eval.problem.data = hiddenclause.ec.app.tutorial4.IntData
# The following should almost *always* be the same as eval.problem.data
# For those who are interested, it defines the data object used internally
# inside ADF stack contexts
eval.problem.stack.context.data = hiddenclause.ec.app.tutorial4.IntData

Int.java

/*
 * This is a version of the MultiValuedRegression code from the ECJ code drop to
 * present an implementation of Toby Segaran's SimpleMathTest example.
 *
 * This is an example only! Use it for anything else at your own risk!
 * You have been warned! Coder/user beware!
 */

package hiddenclause.ec.app.tutorial4;

import ec.EvolutionState;
import ec.Problem;
import ec.gp.ADFStack;
import ec.gp.ERC;
import ec.gp.GPData;
import ec.gp.GPIndividual;
import ec.gp.GPNode;
import ec.util.Code;
import ec.util.Parameter;

public class Int extends ERC {
    private int _val;

    @Override
    public void checkConstraints(final EvolutionState state,
                                 final int tree,
                                 final GPIndividual typicalIndividual,
                                 final Parameter individualBase) {
        super.checkConstraints(state, tree, typicalIndividual, individualBase);
        if (children.length != 0)
            state.output.error("Incorrect number of children for node "
                    + toStringForError() + " at " + individualBase);
    }

    @Override
    public void eval(final EvolutionState state,
                     final int thread,
                     final GPData input,
                     final ADFStack stack,
                     final GPIndividual individual,
                     final Problem problem) {
        IntData rd = ((IntData) (input));
        rd.x = _val;
    }

    @Override
    public void resetNode(EvolutionState state, int thread) {
        _val = Math.abs(state.random[thread].nextInt() % 10);
    }

    @Override
    public String encode() {
        return Code.encode(_val);
    }

    @Override
    public boolean nodeEquals(GPNode node) {
        if (this.getClass() != node.getClass())
            return false;
        return (((Int) node)._val == _val);
    }

    @Override
    public String toString() {
        return Integer.toString(_val);
    }
}

IntData.java

/*
 * This is a version of the MultiValuedRegression code from
 * the ECJ code drop to present an implementation of Toby
 * Segaran's SimpleMathTest example.
 *
 * This is an example only! Use it for anything else at your own risk!
 * You have been warned! Coder/user beware!
 */

package hiddenclause.ec.app.tutorial4;

import ec.gp.GPData;

public class IntData extends GPData {
    public int x; // return value

    @Override
    public GPData copyTo(final GPData gpd)
    {
        ((IntData) gpd).x = x;
        return gpd;
    }
}
Advertisements
  1. October 28, 2010 at 10:50 am

    Hi,
    is it possible to specify custom train data ?
    E.g. from file:
    1.23
    1.25
    1.35

    thanks in advance

    • cvalcarcel
      March 20, 2011 at 11:01 pm

      Forgive me, but I don’t understand the question. While the code is “learning” it is the fitness function that does all the training. The above fitness function has values that it is looking for given certain inputs so by definition you can specify custom training data by changing those values.

      What exactly are you looking for?

  2. roadtocaliforniaaacation
    May 15, 2012 at 5:35 am

    Hi,

    Great tutorial! Unfortunately, I have troubles getting your example to run. The issue is that the original MultiValuedRegression example uses instances of DoubleData to exchange values. However, when I introduce IntData to the example without any other modifications on MultiValuedRegression (and the functions like Add, Sub and so on), I get ClassCastExceptions because we have now instances of IntData which cannot be casted to DoubleData.

    Did you omit this problem or do you have any clue how to bypass this problem without replacing DoubleData in every class?

    Thanks in advance

  3. roadtocaliforniaaacation
    May 15, 2012 at 6:58 am

    Did you change anything regarding the usage of the DoubleData class? Otherwise, if I don’t do any modifications I will get ClassCastExceptions due to the parallel usage of DoubleData and IntData.

  4. bcruz1
    August 11, 2012 at 1:29 am

    does ECJ have the code for evolving more complex programs? more than (+ – * /)? would i have to hand code the if-else statment or for-loop into ECJ frame work so i can use it as a building block while evolving a program? it seems weird that tutorial 4 is called genetic programming but there is only a math equation.

    • August 11, 2012 at 8:37 am

      I’m not the author but to answer your question I can say two things: Yes, it is possible, but it is very difficult! The framework allows you to combine arbitrary functions and of course, you can also specify logical statements or iteration statements. However, if you want to do this you need a very good fitness function and usually a heck of time. The more complex your problem is, the larger your tree gets. Evaluating huge trees takes a lot of time and since there are a lot of possible combinations of your basic functions, your search space is very large. So, to sum it up, you should be as specific and as restrictive as possible to define your problem, the available function set and of course, your fitness functions.

      By the way, the term “programming” does not mean in general that you create a computer program. Especially tasks in Operations Research or optimization algorithms often have the term “programming” in their names (e.g. linear programming). If you take an even broader look you will see that a computer program is basically just a mathematical equation. There is practically no difference although computer programs usually appear to be more high-level with their control statements. But as every mathematical function, a computer program just transforms inputs with a specific algorithm (== function definition) into output.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: