Home
Please feel free to help improve these pages.
- Quick start: Compiling and Running.
- Note: changes made in each version are recorded in CHANGELOG.md.
- Not familiar with learning classifier systems? See these introductory videos: [1], [2].
XCSF is rule-based and maintains a population of classifiers, where each classifier $cl$ consists of:
- a condition component $cl.C$ that determines whether the rule matches input $\vec{x}$
- an action component $cl.A$ that selects an action $a$ to be performed for a given $\vec{x}$
- a prediction component $cl.P$ that computes the expected payoff for performing $a$ upon receipt of $\vec{x}$
XCSF thus generates rules of the general form:
IF matches $cl.C$ THEN perform $cl.A$ and EXPECT $cl.P$.
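The rule structure above can be sketched as a plain Python class. This is an illustrative sketch only: the field names, the hyperrectangle condition, and the linear prediction are assumptions chosen for concreteness, not the library's internal representation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Classifier:
    """Illustrative sketch of a single XCSF rule (names are assumptions)."""
    lower: np.ndarray     # cl.C: hyperrectangle lower bounds
    upper: np.ndarray     # cl.C: hyperrectangle upper bounds
    action: int           # cl.A: a discrete action
    weights: np.ndarray   # cl.P: linear prediction weights (bias first)
    error: float = 10.0   # epsilon: estimated prediction error
    fitness: float = 0.01 # F
    num: int = 1          # numerosity
    set_size: float = 1.0 # as: action set size estimate

    def matches(self, x: np.ndarray) -> bool:
        # IF matches cl.C: input lies within the hyperrectangle
        return bool(np.all((self.lower <= x) & (x <= self.upper)))

    def predict(self, x: np.ndarray) -> float:
        # EXPECT cl.P: linear payoff prediction w0 + w . x
        return float(self.weights[0] + self.weights[1:] @ x)
```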
For each step within a learning trial, XCSF constructs a match set $[M]$ composed of the classifiers in the population set $[P]$ whose condition matches the current input.
For each possible action $a_k$ advocated in $[M]$, XCSF estimates the expected payoff as a fitness-weighted average of the matching classifiers' predictions, forming the prediction array $P(a_k)$.
A system action is then randomly or probabilistically selected during exploration, and the highest payoff action from the prediction array selected during exploitation.
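A minimal sketch of the prediction array and action selection described above, assuming classifiers expose `action`, `fitness`, and a `predict(x)` method (assumed names, not the library's API):

```python
import random
import numpy as np

def prediction_array(match_set, x, n_actions):
    """Fitness-weighted average prediction P(a_k) per action (sketch)."""
    P = np.zeros(n_actions)  # fitness-weighted prediction sums
    F = np.zeros(n_actions)  # fitness sums per action
    for cl in match_set:
        P[cl.action] += cl.fitness * cl.predict(x)
        F[cl.action] += cl.fitness
    # normalise where at least one classifier advocates the action
    return np.where(F > 0, P / np.maximum(F, 1e-12), 0.0)

def select_action(P, explore, rng=random):
    """Random action during exploration; best predicted payoff otherwise."""
    return rng.randrange(len(P)) if explore else int(np.argmax(P))
```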
Upon reaching a terminal state within the environment (as is always the case in single-step problems), each classifier $cl_j$ in the action set $[A]$ is updated using the reward $r$ as follows:
- Error: $\epsilon_j \leftarrow \epsilon_j + \beta (|r - p_j| - \epsilon_j)$
- Accuracy: $\kappa_j = 1$ if $\epsilon_j < \epsilon_0$, otherwise $\kappa_j = \alpha (\epsilon_j / \epsilon_0)^{-\nu}$.
  With target error threshold $\epsilon_0$, accuracy offset $\alpha \in [0,1]$, and slope $\nu$.
- Relative accuracy: $\kappa_j' = (\kappa_j \cdot num_j) / \sum_j \kappa_j \cdot num_j$.
  Where classifiers are initialised with numerosity $num = 1$.
- Fitness: $F_j \leftarrow F_j + \beta(\kappa_j' - F_j)$
- Set size estimate: $as_j \leftarrow as_j + \beta(|[A]| - as_j)$
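The update equations above can be sketched directly. This assumes each classifier stores its prediction for the current input as `prediction`, and $|[A]|$ counts numerosity; parameter defaults are illustrative assumptions.

```python
def update_action_set(action_set, r, beta=0.1, eps0=0.01, alpha=0.1, nu=5.0):
    """Apply the error, accuracy, fitness, and set size updates (sketch)."""
    # error and accuracy
    kappa = {}
    for cl in action_set:
        cl.error += beta * (abs(r - cl.prediction) - cl.error)
        kappa[id(cl)] = 1.0 if cl.error < eps0 else alpha * (cl.error / eps0) ** -nu
    # relative accuracy, fitness, and set size estimate
    total = sum(kappa[id(cl)] * cl.num for cl in action_set)
    set_num = sum(cl.num for cl in action_set)  # |[A]| including numerosity
    for cl in action_set:
        rel_acc = kappa[id(cl)] * cl.num / total
        cl.fitness += beta * (rel_acc - cl.fitness)
        cl.set_size += beta * (set_num - cl.set_size)
```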
Thereafter, $cl.C$, $cl.A$, and $cl.P$ may be updated according to the representation adopted.
The evolutionary algorithm (EA) is applied to classifiers within $[A]$ if the average time since its previous execution exceeds a threshold. Upon invocation, parents are selected based on their fitness, and offspring are created via crossover and mutation.
Offspring parameters are initialised by setting the error and fitness to the parental average, discounted by reduction parameters for error and fitness.
The resulting offspring are added to $[P]$; if the maximum population size is exceeded, classifiers are removed via roulette wheel selection on a deletion vote.
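A sketch of the offspring initialisation just described; the function name, the reduction parameter names, and their default values are assumptions:

```python
def init_offspring(p1, p2, err_reduction=1.0, fit_reduction=0.1):
    """Offspring error and fitness: parental average, discounted by
    reduction parameters (defaults are illustrative assumptions)."""
    error = err_reduction * (p1.error + p2.error) / 2
    fitness = fit_reduction * (p1.fitness + p2.fitness) / 2
    return error, fitness
```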
The deletion vote is set proportionally to the set size estimate $as$, and is increased for sufficiently experienced classifiers whose fitness is substantially below the population mean.
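A sketch of roulette wheel deletion under the vote just described, in the style commonly used in XCS. The threshold and fraction parameters, and the `experience` attribute, are assumptions:

```python
import random

def deletion_vote(cl, mean_fitness, theta_del=20, delta=0.1):
    """Vote proportional to the set size estimate, boosted for experienced,
    low-fitness classifiers (parameter names and defaults assumed)."""
    vote = cl.set_size * cl.num
    if cl.experience > theta_del and cl.fitness / cl.num < delta * mean_fitness:
        vote *= mean_fitness / (cl.fitness / cl.num)
    return vote

def roulette_delete(pop, rng=random):
    """Select a classifier for deletion with probability proportional to its vote."""
    mean_f = sum(cl.fitness for cl in pop) / sum(cl.num for cl in pop)
    votes = [deletion_vote(cl, mean_f) for cl in pop]
    point = rng.uniform(0, sum(votes))
    for cl, vote in zip(pop, votes):
        point -= vote
        if point <= 0:
            return cl
    return pop[-1]
```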
In multi-step problems, the previous action set $[A]_{-1}$ is instead updated after each step using a $\gamma$-discounted reward, in a manner similar to Q-learning.
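The multi-step payoff target can be written as a one-liner, assuming a Q-learning-style update of the previous action set with the discounted maximum of the current prediction array (names here are illustrative):

```python
def multistep_target(r_prev, P_next, gamma=0.95):
    """Payoff target for updating [A]_{-1}: previous reward plus the
    gamma-discounted maximum of the current prediction array (sketch)."""
    return r_prev + gamma * max(P_next)
```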
A schematic illustration of XCSF is shown above. For supervised learning, a single (dummy) action can be used such that $[A] = [M]$, and the system prediction made directly accessible to the environment; classifier predictions are then updated towards the target output.
A number of interacting pressures have been identified. A set pressure provides more frequent reproduction opportunities for more general rules. In opposition is a fitness pressure which represses the reproduction of inaccurate and over-general rules. Many forms of these interacting pressures have been formally analysed (see Butz, 2006).
Note: since XCSF is a generalisation of XCS, original XCS behaviour can be specified by using Ternary conditions, Integer actions, and Constant predictions. It is a common misconception that "XCSF is for function approximation and XCS is for RL": XCS can perform regression with piecewise constant predictions (see Wilson, 2001) and XCSF can be used for RL where the payoff is computed.
Related Literature:
- S. W. Wilson (1995) Classifier fitness based on accuracy
- S. W. Wilson (2001) Function approximation with a classifier system
- S. W. Wilson (2002) Classifiers that approximate functions
- M. V. Butz (2006) Rule-based evolutionary online learning systems
- R. J. Urbanowicz and J. H. Moore (2009) Learning classifier systems: A complete introduction, review, and roadmap
- L. Bull (2015) A brief history of learning classifier systems: From CS-1 to XCS and its variants
- R. J. Urbanowicz and W. N. Browne (2017) Introduction to learning classifier systems
This project implements both supervised learning, via updating the match set predictions directly, and reinforcement learning, via updating the action set predictions with an environment reward. All mutation rates are self-adapted.
See example notebooks and Python Library API.
Conditions:
- Always matching dummy conditions
- Hyperrectangles (center-spread and unordered bound)
- Hyperellipsoids (center-spread)
- Neural networks
- GP trees
- Dynamical GP graphs
- Ternary bitstrings (binarises inputs)
- Both conditions and actions in single dynamical GP graphs
- Both conditions and actions in single neural networks
Actions:
- Integers
- Neural networks
Predictions:
- Piece-wise constant
- Linear least squares
- Quadratic least squares
- Linear recursive least squares
- Quadratic recursive least squares
- Stochastic gradient descent neural networks
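As an illustration of a computed prediction from the list above, the linear case can be trained with a normalised delta rule over the input augmented by a constant term, in the style of Wilson (2002). The learning rate `eta` and constant `x0` defaults are assumptions:

```python
import numpy as np

def nlms_update(weights, x, target, eta=0.1, x0=1.0):
    """Normalised delta-rule update of a linear payoff prediction (sketch):
    the input is augmented with a constant x0, and the step is scaled by
    the squared norm of the augmented input."""
    xa = np.concatenate(([x0], x))          # augmented input
    pred = weights @ xa                     # current linear prediction
    weights += (eta / (xa @ xa)) * (target - pred) * xa
    return weights
```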
This project is released under the terms of the GNU General Public License v3.0 (GPLv3).