"""
Name: GP_Tree.py
Authors: Ryan Urbanowicz - Written at Dartmouth College, Hanover, NH, USA
Contact: [email protected]
Created: February 20, 2018 (Siddharth Verma - Netaji Subhas Institute of Technology, Delhi, India.)
Description: This module defines an individual GP Tree classifier within the rule population, along with all respective parameters.
Also included are tree-level methods, including matching, subsumption, crossover, and mutation.
Parameter update methods are also included.
---------------------------------------------------------------------------------------------------------------------------------------------------------
ExSTraCS V2.0: Extended Supervised Tracking and Classifying System - An advanced LCS designed specifically for complex, noisy classification/data mining tasks,
such as biomedical/bioinformatics/epidemiological problem domains. This algorithm should be well suited to any supervised learning problem involving
classification, prediction, data mining, and knowledge discovery. This algorithm would NOT be suited to function approximation, behavioral modeling,
or other multi-step problems. This LCS algorithm is most closely based on the "UCS" algorithm, an LCS introduced by Ester Bernado-Mansilla and
Josep Garrell-Guiu (2003) which in turn is based heavily on "XCS", an LCS introduced by Stewart Wilson (1995).
Copyright (C) 2014 Ryan Urbanowicz
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
---------------------------------------------------------------------------------------------------------------------------------------------------------
"""
import math
import numpy as np
import copy
import random
import string
import Functions
from collections import deque
from operator import attrgetter
from exstracs_constants import *
from exstracs_pareto import *
class GP_Tree:
    """Genetic-Programming syntax tree created for performing the common GP operations on trees. This tree is traversed
    and represented according to Depth-First-Search (DFS) traversal.
    Parameters and functions used to describe the tree are described as follows:"""

    # Node class of the tree, which contains the terminal or the function with its children.
    class _Node:
        def __init__(self, data):
            """
            :rtype: _Node object
            """
            # Data is the function or the terminal constant.
            self.data = data
            # Parent of every node will also be provided (not implemented yet).
            self.parent = None
            # The size of children equals the arity of the function. For terminals, it is None.
            self.children = None

        # This is overridden to define the representation of every node. The function is called recursively
        # to build the representation of the whole tree.
        def __str__(self):
            # "retval" is the final string that will be returned. The content of this node is added first.
            retval = str(self.data) + "=>"
            # Content of the children of the node is added to retval here.
            if self.children is not None:
                for child in self.children:
                    retval += str(child.data) + ", "
            retval += "END\n"  # After every node and its children, the string is concatenated with "END".
            # Recursive calls to all the nodes of the tree.
            if self.children is not None:
                for child in self.children:
                    retval += child.__str__()
            return retval
    def __init__(self, function_set=("add", "mul", "sub", "div", "cos", "max", "sin", "neg", "min"),
                 num_features=None, min_depth=2, max_depth=3):
        """The constructor of GP_Tree accepts the function set, terminal set, number of features to keep in the tree
        (e.g. if num_features=1, the tree may contain X0; if num_features=3, it may contain X0, X1 and X2),
        the min and max depth values, and the fitness metric to be used."""

        self.function_set = function_set
        # A list of functions to be used. Custom functions can be created.

        self.terminal_set = [random.randint(-6, 6), random.randint(-6, 6), random.randint(-6, 6)]
        # List of floating-point constants or zero-arity functions acting as the terminals of the tree.

        if num_features is not None:
            self.num_features = num_features
        else:
            self.num_features = int(cons.env.formatData.numAttributes)
        # Specifies the number of features in the input file.

        self.min_depth = min_depth  # Specifies the minimum depth of the tree.
        self.max_depth = max_depth  # Specifies the maximum depth of the tree.

        ###################################################################
        # ExSTraCS parameters                                             #
        ###################################################################
        self.timeStampGA = 0  # Time since the rule was last in a correct set.
        self.initTimeStamp = 0  # Iteration in which the rule first appeared.
        self.aveMatchSetSize = 1
        self.lastMatch = 0  # Experimental - for brief fitness update.

        # Classifier Accuracy Tracking --------------------------------------
        self.matchCount = 0  # Known in many LCS implementations as experience, i.e. the total number of times this
        # classifier was in a match set.
        self.correctCount = 0  # The total number of times this classifier was in a correct set.
        self.matchCover = 0  # The total number of times this classifier was in a match set within a single epoch
        # (value fixed after epochComplete).
        self.correctCover = 0  # The total number of times this classifier was in a correct set within a single epoch
        # (value fixed after epochComplete).

        # Covering sets initially overly optimistic prediction values - this takes care of error with prediction which
        # previously had only zero-value fitness and indFitness scores for covered rules.
        self.indFitness = 1.0
        self.fitness = 1.0

        self.condition = None
        self.errorSum = 0
        self.errorCount = 0
        self.phenCount = 0  # Number of times phenProb was added to the count, for testing reasons.
        self.phenSum = 0  # Sum of phenotype probability calculation values for continuous variables.
        self.totalFreq = 1
        self.id = ''.join(random.choice(string.ascii_lowercase) for i in range(7))
        self.one_count = 0
        self.zero_count = 0
        self.isTree = True
        # Major Parameters --------------------------------------------------
        self.phenotype = None  # Class if the endpoint is discrete, and a continuous phenotype if the endpoint is
        # continuous.
        self.phenotype_RP = None  # NEW - probability of this phenotype occurring by chance.

        # Fitness Metrics ---------------------------------------------------
        self.accuracy = 0.0  # Classifier accuracy - accuracy calculated using only instances in the data set which
        # this rule matched.
        self.accuracyComponent = None  # Accuracy adjusted based on accuracy by random chance, and transformed to give
        # more weight to early changes in accuracy.
        self.coverDiff = 1  # Number of instances correctly covered by the rule beyond what would be expected by chance.
        self.indFitness = 0.0  # Fitness from the perspective of an individual rule (strength/accuracy based).
        self.relativeIndFitness = None
        self.fitness = 1  # CHANGED: originally self.fitness = cons.init_fit
        # Classifier fitness - initialized to a constant initial fitness value.
        self.numerosity = 1  # The number of rule copies stored in the population
        # (indirectly stored as incremented numerosity).
        self.aveMatchSetSize = None  # A parameter used in deletion which reflects the size of the match sets within
        # which this rule has been included.
        self.deletionVote = None  # The current deletion weight for this classifier.

        # Experience Management ---------------------------------------------
        self.epochComplete = False  # Has this rule existed for a complete epoch (i.e. a cycle through the training set)?

        # Fitness Metrics ---------------------------------------------------
        # self.freqComponent = None

        # Fitness Sharing
        # self.adjustedAccuracy = 0.0
        # self.relativeAccuracy = 0.0
        self.lastMatch = 0
        self.lastFitness = 0.0
        self.sumIndFitness = 1.0  # Experimental.
        self.partOfCorrect = True
        self.lastMatchFitness = 1.0
        # self.isMatchSetFitness = False  # For debugging.
        self.aveRelativeIndFitness = None
        self.matchedAndFrontEstablished = False
        self.totalFreq = 1

        ###################################################################
        # Some required parameters for the tree                           #
        ###################################################################
        self.root = None
        # This is the root of the tree. It is from here that the tree is traversed by every member function.

        self.number_of_terminals = 0
        self.number_of_functions = 0
        # These parameters are required for calculating the "terminal_ratio" in the generation methods.

        self._add_features_in_terminal_set(prefix="X")
        # Features are added to the final terminal_set.

        self.generate_half_and_half()
        # Generate the tree with the ramped half-and-half method.

        self.specifiedAttList = self.getSpecifiedAttList()
        # Get the specified attribute list of the tree, containing all the attributes present in the tree.
    ####################################################################

    # Returns the string representation of the root, which builds the representation of the whole tree recursively.
    def __str__(self):
        return self.root.__str__()

    @property
    def terminal_ratio(self):
        # Returns the ratio of the number of terminals to the total number of nodes (terminals plus functions) in the tree.
        return self.number_of_terminals / float(self.number_of_terminals + self.number_of_functions)

    # Adds the number of arguments specified by num_features to the terminal set. Each argument is prefixed
    # with "X" followed by its index number, e.g. X0, X1, X2, ...
    def _add_features_in_terminal_set(self, prefix):
        temp_list = []
        for i in range(self.num_features):
            feature_str = "{prefix}{index}".format(prefix=prefix, index=i)
            temp_list.append(feature_str)
        temp_list.extend(self.terminal_set)
        self.terminal_set = temp_list

    @staticmethod
    def create_node(data):
        return GP_Tree._Node(data)
    #####################################################################################
    #                              Tree Generation Methods                              #
    #####################################################################################

    # The main tree generation function. A recursive function that starts building the tree from the root and
    # returns the root of the constructed tree.
    def _generate(self, condition, depth, height):
        node = None  # The node that will be returned and assigned to the root of the tree.
        # See 'generate_full' and 'generate_grow' for the assignment to the root of the tree.

        # Check whether a function should currently be added. If the condition is false, a terminal
        # is not yet due and a function should be inserted.
        if condition(depth, height) is False:
            node_data = random.choice(self.function_set)  # Randomly choose a function from the function set.
            node_arity = Functions.get_arity(node_data)  # Get the arity of the function to determine the node's children.
            node = GP_Tree._Node(node_data)  # Create the node.
            self.number_of_functions += 1
            node.children = []  # Create the empty children list.
            for _ in range(node_arity):
                child = self._generate(condition, depth + 1, height)
                child.parent = node
                node.children.append(child)  # Children are added recursively.
        else:  # Now a terminal should be inserted.
            node_data = random.choice(self.terminal_set)  # Choose the terminal randomly.
            node = GP_Tree._Node(node_data)  # Create the terminal node.
            self.number_of_terminals += 1
            node.children = None  # Children is None, as the arity of a terminal is 0.
        return node  # Return the node created.

    def generate_full(self):
        # Constructs a full tree. Note that only the nested 'condition' function differs between
        # 'generate_grow()' and 'generate_full()'.
        def condition(depth, height):
            return depth == height

        height = random.randint(self.min_depth, self.max_depth)
        self.root = self._generate(condition, 0, height)

    def generate_grow(self):
        # Constructs a grown tree.
        def condition(depth, height):
            return depth == height or (depth >= self.min_depth and random.random() < self.terminal_ratio)

        height = random.randint(self.min_depth, self.max_depth)
        self.root = self._generate(condition, 0, height)

    def generate_half_and_half(self):
        # Half the time the expression is generated with 'generate_full()'; the other half,
        # it is generated with 'generate_grow()'.
        method = random.choice((self.generate_grow, self.generate_full))  # Select grow or full randomly.
        method()  # Build either a full or a grown tree.
    #####################################################################################
    #       Tree Traversal Methods: Different ways of representing the tree expression  #
    #####################################################################################

    # Depth-first traversal of the tree (pre-order): it reads the node first, then its children from left to right.
    def tree_expression_DFS(self):
        expression = []  # Expression to be built and returned.
        self._tree_expression_DFS_helper(self.root, expression)  # Recursive helper for this function.
        return expression

    # Recursive helper needed by "tree_expression_DFS()".
    def _tree_expression_DFS_helper(self, node, expression):
        expression.append(node.data)  # Expression to be built.
        if node.children is not None:
            for child in node.children:  # Adding children to the expression recursively.
                self._tree_expression_DFS_helper(child, expression)
        return

    # Breadth-first traversal of the tree: it visits the nodes level by level, from left to right.
    def tree_expression_BFS(self):
        q = deque()  # BFS is implemented using a queue (FIFO).
        expression = []  # Expression to be built and returned.
        node = self.root
        q.append(node)  # Add the root to the queue.
        while q:
            popped_node = q.popleft()
            if popped_node.children is not None:
                for child in popped_node.children:
                    q.append(child)
            expression.append(popped_node.data)
        return expression
    #####################################################################################
    #                                     Matching                                      #
    #####################################################################################
    def match(self, state):
        return True
    #####################################################################################
    #                              Tree Evaluation Methods                              #
    #####################################################################################

    # Sets the phenotype of the tree classifier, as taken from the ExSTraCS code.
    def setPhenotype(self, args):
        # Ensure that for every instance we recalculate the tree phenotype prediction. It never stays fixed.
        args = [int(i) for i in args]
        # Tuple-to-numpy-array conversion. This is done because the evaluate function accepts only numpy arrays.
        # Reshaping denotes that this is a single sample whose number of features is the length of the input args.
        args = np.asarray(args).reshape(1, len(args))
        dataInfo = cons.env.formatData
        # -------------------------------------------------------
        # BINARY PHENOTYPE - ONLY
        # -------------------------------------------------------
        if dataInfo.discretePhenotype and len(dataInfo.phenotypeList) == 2:
            if self.evaluate(args) > 0:
                self.phenotype = '1'
                if not self.epochComplete:
                    self.one_count += 1
            else:
                self.phenotype = '0'
                if not self.epochComplete:
                    self.zero_count += 1
            if not self.epochComplete:  # Not sure where Ben came up with this, but it seems to make reasonable sense. May want to examine more carefully.
                self.phenotype_RP = ((cons.env.formatData.classProportions['1'] * self.one_count) +
                                     (cons.env.formatData.classProportions['0'] * self.zero_count)) / (
                                        self.one_count + self.zero_count)
                # For trees there could be one uniform random chance. Ultimately we want to use balanced accuracy for
                # trees (do we do better than by chance?), but for now we will just use 0.5 for simplicity.
        # -------------------------------------------------------
        # MULTICLASS PHENOTYPE
        # -------------------------------------------------------
        elif dataInfo.discretePhenotype and not len(dataInfo.phenotypeList) == 2:
            if self.evaluate(args) < 0:  # Lowest class.
                self.phenotype = dataInfo.phenotypeList[0]
            elif self.evaluate(args) >= len(dataInfo.phenotypeList) - 1:  # Highest class.
                self.phenotype = dataInfo.phenotypeList[len(dataInfo.phenotypeList) - 1]
            else:  # One of the middle classes.
                count = 1
                notfoundClass = True
                while notfoundClass:
                    if self.evaluate(args) < count and self.evaluate(args) >= count - 1:
                        self.phenotype = dataInfo.phenotypeList[count]
                        notfoundClass = False
                    if count > len(dataInfo.phenotypeList):
                        notfoundClass = False
                        print("ERROR:setPhenotype in tree: Failed to find class")
                    count += 1
            if not self.epochComplete:  # Not sure where Ben came up with this, but it seems to make reasonable sense. May want to examine more carefully.
                self.phenotype_RP = 0.5
        # -------------------------------------------------------
        # CONTINUOUS PHENOTYPE
        # -------------------------------------------------------
        else:  # ContinuousCode #########################
            self.phenotype = self.evaluate(args)
            if not self.epochComplete:  # Not sure where Ben came up with this, but it seems to make reasonable sense. May want to examine more carefully.
                self.phenotype_RP = 0.5
    # Evaluates the syntax tree and returns the value computed by the tree.
    def evaluate(self, X_Data):
        # X_Data: shape = [n_samples, num_features]
        #     Training vectors, where n_samples is the number of samples and num_features is the number of features.
        # Return: Y_Pred: shape = [n_samples]
        #     Evaluated value of the n_samples.
        Y_Pred = []
        for features in X_Data:
            if features.size != self.num_features:
                print(self.num_features, features.size)
                raise ValueError("Number of input features in X_Data is not equal to the parameter: 'num_features'.")
            Y_Pred.append(self._evaluate_helper(self.root, features))
        Y_Pred = np.asarray(Y_Pred)
        # float() is only valid for a single sample; return the array for multiple samples.
        return float(Y_Pred) if Y_Pred.size == 1 else Y_Pred

    # Helper function for "evaluate()". This makes the recursive calls for the evaluation.
    def _evaluate_helper(self, node, X_Data):
        # Terminal nodes.
        if node.children is None:
            if isinstance(node.data, str):
                feature_name = node.data
                index = int(feature_name[1:])
                return X_Data[index]
            else:
                return node.data
        args = []  # Will contain the input arguments, i.e. the children of the function in the tree.
        for child in node.children:
            args.append(self._evaluate_helper(child, X_Data))  # Evaluation by recursive calls.
        func = Functions.get_function(node.data)  # Get the function from its alias name.
        return float(func(*args))  # Return the computed value.
    #####################################################################################
    # Genetic operations: These functions exist just for the integration with ExSTraCS  #
    #####################################################################################
    def uniformCrossover(self, classifier, state, phenotype):
        """This method exists only for the sake of integration with ExSTraCS. The real
        crossover takes place in the function "uniformCrossover" declared outside this class."""
        return uniformCrossover(self, classifier, state)

    def Mutation(self, state, phenotype):
        origform = copy.deepcopy(self)
        '''Need to confirm from Ryan which method to call:
        # Option 1: Call this method
        mutation_NodeReplacement(self)
        # Option 2: Call this method
        randomTree = GP_Tree(min_depth=3, max_depth=4)
        randomTree.generate_half_and_half()
        mutation_Uniform(self, randomTree)
        '''
        return str(self) == str(origform)
    #####################################################################################
    #                                  Deletion Method                                  #
    #####################################################################################

    # Calculates the deletion vote of the tree.
    def getDelProp(self, meanFitness):
        """ Returns the vote for deletion of the classifier. """
        if self.fitness / self.numerosity >= cons.delta * meanFitness or self.matchCount < cons.theta_del:
            self.deletionVote = self.aveMatchSetSize * self.numerosity
        elif self.fitness == 0.0 or self.fitness / self.numerosity == 0.0:
            self.deletionVote = self.aveMatchSetSize * self.numerosity * meanFitness / (cons.init_fit / self.numerosity)
        else:
            # print(self.fitness)
            self.deletionVote = self.aveMatchSetSize * self.numerosity * meanFitness / (
                self.fitness / self.numerosity)  # Note: numerosity seems redundant (look into the theory of deletion in LCS).
        return self.deletionVote
    #####################################################################################
    #                                Subsumption Methods                                #
    #####################################################################################

    # TODO: Figure out which booleans these should return.
    def subsumes(self, cl):
        """ Returns whether the classifier (self) subsumes cl. """
        return False

    def isSubsumer(self):
        """ Returns whether the classifier (self) is a possible subsumer. A classifier must have sufficient experience
        (one epoch) and it must also be as accurate as, or more accurate than, the classifier it is trying to subsume. """
        return False

    def isMoreGeneral(self, cl):
        """ Returns whether the classifier (self) is more general than cl. Checks that all attributes specified in
        self are also specified in cl. """
        return False
    #####################################################################################
    #            Some miscellaneous methods needed in the ExSTraCS framework            #
    #####################################################################################
    def setPhenProb(self):
        pass

    # Taken from the ExSTraCS code.
    def calcPhenProb(self, error):
        if self.epochComplete:
            return
        phenRange = [self.phenotype - error, self.phenotype + error]
        count = 0
        ref = 0
        # print(cons.env.formatData.phenotypeRanked)
        # print(self.phenotype)
        while ref < len(cons.env.formatData.phenotypeRanked) and \
                cons.env.formatData.phenotypeRanked[ref] <= phenRange[1]:
            if cons.env.formatData.phenotypeRanked[ref] >= phenRange[0]:
                count += 1
            ref += 1
        self.phenSum += count / float(cons.env.formatData.numTrainInstances)
        self.phenCount += 1

    def equals(self, cl):
        if not cl.isTree:
            return False
        else:
            return str(cl) == str(self)

    def getSpecifiedAttList(self):
        tree_args = set()
        self._getSpecifiedAttList_Helper(self.root, tree_args)
        return list(tree_args)

    def _getSpecifiedAttList_Helper(self, node, tree_args):
        if node.children is None:
            if isinstance(node.data, str):
                nodeAtt = int(float(node.data.split('X')[1]))
                tree_args.add(nodeAtt)
            return
        for child in node.children:
            self._getSpecifiedAttList_Helper(child, tree_args)
    #####################################################################################
    #                                 Parameters Update                                 #
    #####################################################################################
    def updateClonePhenotype(self, phenotype):
        if phenotype == self.phenotype:
            self.correctCount = 1
            self.correctCover = 1
        else:
            self.correctCount = 0
            self.correctCover = 0

    def updateEpochStatus(self, exploreIter):
        """ Determines when a learning epoch has completed (one cycle through the training data). """
        # if not self.epochComplete and (exploreIter - self.initTimeStamp - 1) >= cons.env.formatData.numTrainInstances and cons.offlineData:
        if not self.epochComplete and (
                exploreIter - self.initTimeStamp) >= cons.env.formatData.numTrainInstances and cons.offlineData:
            self.epochComplete = True
            cons.firstEpochComplete = True
            # randomProbClass = cons.env.formatData.classProportions[self.phenotype]  # Put this in as a fix - for rules that become epoch complete after having been extrapolated on a previous run.
            # self.usefulDiff = self.correctCover - randomProbClass * self.matchCover
            self.usefulDiff = (self.correctCover - self.phenotype_RP * self.matchCover)  # * len(self.specifiedAttList)
            # Pareto Fitness - Epoch Complete Front Construction
            if self.accuracyComponent > 0 and self.usefulDiff > 0:
                objectivePair = [self.accuracyComponent, self.usefulDiff]
                changedme = cons.env.formatData.ecFront.updateFront(objectivePair)
                # if changedme:
                #     print('new')
                #     print(self.accuracyComponent)
                #     print(self.usefulDiff)
                #     print(self.initTimeStamp)
                # if self.accuracyComponent == 0.692582281546 and self.usefulDiff == 204.723:
                #     print('I was born and now am complete')
                #     print(self.initTimeStamp)
            return True
        return False
    def updateExperience(self):
        """ Increases the experience of the classifier by one. Once an epoch has completed, rule accuracy can't change. """
        self.matchCount += 1
        if self.epochComplete:  # Once the epoch has completed, the number of matches for a unique rule will not change, so don't repeat the calculation.
            pass
        else:
            self.matchCover += 1
            # MOVE TO NEW PLACE EVENTUALLY
            if not cons.env.formatData.discretePhenotype:
                self.phenotype_RP = 0.5
                # print("gets here")
                """
                self.phenotype_RP = float(self.phenSum) / float(self.phenCount)
                if self.phenotype_RP > 1:
                    print("phenSum: " + str(self.phenSum))
                    print("matchCover: " + str(self.matchCover))
                    print("RP: " + str(self.phenotype_RP))
                    print("Count: " + str(self.phenCount))
                    raise NameError("Problem with phenotype_RP")
                """

    def updateCorrect(self):
        """ Increases the correct phenotype tracking by one. Once an epoch has completed, rule accuracy can't change. """
        self.correctCount += 1
        if self.epochComplete:  # Once the epoch has completed, the number of correct for a unique rule will not change, so don't repeat the calculation.
            pass
        else:
            self.correctCover += 1
    def updateError(self, trueEndpoint):
        if not self.epochComplete:
            # Error calculations are limited to the extremes of observed training data phenotypes in calculating the range centroid.
            if self.phenotype > cons.env.formatData.phenotypeList[1]:
                adjustedError = 1
            elif self.phenotype < cons.env.formatData.phenotypeList[0]:
                adjustedError = 1
            else:
                error = abs(self.phenotype - trueEndpoint)
                adjustedError = error / (cons.env.formatData.phenotypeList[1] - cons.env.formatData.phenotypeList[0])
            self.errorSum += adjustedError  # Error is a fraction of the total phenotype range (i.e. maximum possible error).
            self.errorCount += 1

    def updateIncorrectError(self):
        if not self.epochComplete:
            self.errorSum += 1.0
            self.errorCount += 1
    def updateAccuracy(self, exploreIter):
        """ Updates the accuracy tracker. """
        nonUsefulDiscount = 0.001
        coverOpportunity = 1000
        adjAccuracy = 0
        # -----------------------------------------------------------------------------------
        # CALCULATE ACCURACY
        # -----------------------------------------------------------------------------------
        if cons.env.formatData.discretePhenotype:
            self.accuracy = self.correctCover / float(self.matchCover)
        else:  # ContinuousCode #########################
            self.accuracy = 1 - (self.errorSum / self.matchCover)  # 1 - average error based on the range centroid.
            # There should be natural pressure to achieve a narrow endpoint range.
        # -----------------------------------------------------------------------------------
        # CALCULATE ADJUSTED ACCURACY
        # -----------------------------------------------------------------------------------
        if self.accuracy > self.phenotype_RP:
            adjAccuracy = self.accuracy - self.phenotype_RP
        elif self.matchCover == 2 and self.correctCover == 1 and not self.epochComplete and (
                exploreIter - self.timeStampGA) < coverOpportunity:
            adjAccuracy = self.phenotype_RP / 2.0
        else:
            adjAccuracy = self.accuracy * nonUsefulDiscount
        # -----------------------------------------------------------------------------------
        # CALCULATE ACCURACY COMPONENT
        # -----------------------------------------------------------------------------------
        maxAccuracy = 1 - self.phenotype_RP
        if maxAccuracy == 0:
            self.accuracyComponent = 0
        else:
            self.accuracyComponent = adjAccuracy / float(maxAccuracy)  # Accuracy contribution scaled between 0 and 1,
            # allowing for different maximum accuracies.
        self.accuracyComponent = 2 * ((1 / float(1 + math.exp(-5 * self.accuracyComponent))) - 0.5) / float(0.98661429815)
        self.accuracyComponent = math.pow(self.accuracyComponent, 1)
def updateCorrectCoverage(self):
self.coverDiff = (self.correctCover - self.phenotype_RP * self.matchCover)
def updateIndFitness(self, exploreIter):
""" Calculates the fitness of an individual rule based on it's accuracy and correct coverage relative to the 'Pareto' front """
coverOpportunity = 1000
        if self.coverDiff > 0:
# -----------------------------------------------------------------------------------
# CALCULATE CORRECT COVER DIFFERENCE COMPONENT
# -----------------------------------------------------------------------------------
# NOTES: Coverage is directly comparable when epoch complete, otherwise we want to estimate what coverage might be farther out.
if self.epochComplete:
# Get Pareto Metric
self.indFitness = cons.env.formatData.ecFront.getParetoFitness([self.accuracyComponent, self.coverDiff])
            else:  # Rule not epoch complete
                # EXTRAPOLATE coverDiff up to the number of training instances (i.e. age at epoch complete).
                ruleAge = exploreIter - self.initTimeStamp + 1  # Correct, because we include the current instance we are on.
                self.coverDiff = self.coverDiff * cons.env.formatData.numTrainInstances / float(ruleAge)
objectivePair = [self.accuracyComponent, self.coverDiff]
# BEFORE PARETO FRONTS BEGIN TO BE UPDATED
                if len(cons.env.formatData.necFront.paretoFrontAcc) == 0:  # Nothing stored yet on the epoch-incomplete front
                    # Temporarily use accuracy as fitness in this very early stage.
                    self.indFitness = self.accuracyComponent
if ruleAge >= coverOpportunity: # attempt to update front
cons.env.formatData.necFront.updateFront(objectivePair)
# PARETO FRONTS ONLINE
else: # Some pareto front established.
                    if len(cons.env.formatData.ecFront.paretoFrontAcc) > 0:  # Leave the epoch-incomplete front behind.
                        self.indFitness = cons.env.formatData.ecFront.getParetoFitness(objectivePair)
                    else:  # Keep updating and evaluating with the epoch-incomplete front.
                        if ruleAge < coverOpportunity:  # Very young rules cannot affect bestCoverDiff
                            self.preFitness = cons.env.formatData.necFront.getParetoFitness(objectivePair)  # note: preFitness here, not indFitness
else:
cons.env.formatData.necFront.updateFront(objectivePair)
self.indFitness = cons.env.formatData.necFront.getParetoFitness(objectivePair)
self.matchedAndFrontEstablished = True
# print 'fit path 4'
        else:  # No useful coverage; heavily discount the accuracy contribution
            self.indFitness = self.accuracyComponent / float(1000)
if self.indFitness < 0:
print("negative fitness error")
        if round(self.indFitness, 5) > 1:  # rounding handles an odd floating-point case where a value of 1.0 was treated as slightly above 1.0
            print("big fitness error")
# self.indFitness = math.pow(self.indFitness,cons.nu) #Removed 11/25/15 - seems redundant with accuracy version (use one or the other)
if self.indFitness < 0:
print("CoverDiff: " + str(self.coverDiff))
print("Accuracy: " + str(self.accuracyComponent))
print("Fitness: " + str(self.indFitness))
raise NameError("Problem with fitness")
self.lastIndFitness = copy.deepcopy(self.indFitness)
# NEW
def updateRelativeIndFitness(self, indFitSum, partOfCorrect, exploreIter):
""" Updates the relative individual fitness calculation """
self.sumIndFitness = indFitSum
self.partOfCorrect = partOfCorrect
# self.lastRelativeIndFitness = copy.deepcopy(self.relativeIndFitness) #Is this needed????
if partOfCorrect:
self.relativeIndFitness = self.indFitness * self.numerosity / float(self.sumIndFitness)
            # self.relativeIndFitness = self.indFitness/float(self.sumIndFitness)  # Treat epoch complete and incomplete equally here. This will give young rules a boost (in this method, relative fitness can be larger than 1 for NEC rules).
if self.relativeIndFitness > 1.0:
self.relativeIndFitness = 1.0
# if self.epochComplete: #Fitness shared only with other EC rules.
# self.relativeIndFitness = self.indFitness*self.numerosity / self.sumIndFitness
# else:
# if indFitSum == 0:
# self.relativeIndFitness = 0
# else:
# self.relativeIndFitness = self.indFitness*self.numerosity*(exploreIter-self.initTimeStamp+1) / self.sumIndFitness
else:
self.relativeIndFitness = 0
# NEW
def updateFitness(self, exploreIter):
""" Update the fitness parameter. """
if self.epochComplete:
percRuleExp = 1.0
else:
percRuleExp = (exploreIter - self.initTimeStamp + 1) / float(cons.env.formatData.numTrainInstances)
        # Consider the useful accuracy cutoff; maybe use a less dramatic change here.
        beta = 0.2
        if self.matchCount >= 1.0 / beta:
            self.fitness = self.fitness + beta * percRuleExp * (self.relativeIndFitness - self.fitness)
        elif self.matchCount == 1 or self.aveRelativeIndFitness is None:  # second condition handles the special case after a GA rule is generated but has not yet gone through full matching
# print 'fit B'
self.fitness = self.relativeIndFitness
self.aveRelativeIndFitness = self.relativeIndFitness
# if self.initTimeStamp == 2:
# print self.aveRelativePreFitness
# print 5/0
        else:
            # Note: the last relative prefitness is often 0.
            self.fitness = (self.aveRelativeIndFitness * (self.matchCount - 1) + self.relativeIndFitness) / self.matchCount
            self.aveRelativeIndFitness = (self.aveRelativeIndFitness * (self.matchCount - 1) + self.relativeIndFitness) / self.matchCount
        self.lastMatchFitness = copy.deepcopy(self.fitness)
        self.fitness = self.indFitness  # NOTE: the shared fitness just computed is stored in lastMatchFitness; fitness itself is overridden with indFitness
        if self.fitness < 0 or round(self.fitness, 4) > 1:
            print('Fitness out of range')
            print(self.fitness)
            raise NameError("problem with fitness")
# RULE FITNESS TESTING CODE------------------------------------------------------------------------------------------------------
# ruleTimeID = 3279
# if self.initTimeStamp == ruleTimeID:
# print '-----------------'
# print str(self.condition) + str(self.specifiedAttList)
# print 'sumfitness '+str(self.sumIndFitness)
# print 'correct? '+str(self.partOfCorrect)
# print 'relIndFit ' +str(self.relativeIndFitness)
# print 'fitness ' +str(self.fitness)
# self.isMatchSetFitness = True
# if self.initTimeStamp == 2:
# self.reportClassifier('matchupdate')
# if self.fitness < 0.0001:
# print 'fit= '+str(self.fitness)
# print 'uDiff= '+str(self.usefulDiff)
"""
if self.initTimeStamp == 1171:
print('Iteration: '+str(exploreIter))
print(self.numerosity)
print(self.aveMatchSetSize)
print(self.indFitness)
print(self.fitness)
print(self.deletionVote)
"""
# NEW
    def briefUpdateFitness(self, exploreIter):
        # Activated during matching for all epoch-complete rules.
        # Recalculate NEC rule fitness based on the progression of the algorithm, i.e.
        # recalculate the coverage extrapolation and recheck the Pareto front. We don't
        # need to check for nondominated points, because points can only have a lower
        # extrapolation in between appearances in a match set. This will affect rules
        # that deal with rare states the most.
        # TODO notes:
        # - Adapt ExSTraCS to very large datasets, where epoch completion is rare:
        #   consider weighting prediction votes by experience, and note that the
        #   epoch-incomplete Pareto front might be the most important one (maybe don't
        #   shift entirely over to the EC front).
        # - Revisit compaction for big data: don't automatically remove rules that are
        #   not epoch complete; include a percent-of-data cutoff (could be the same as
        #   for iterations passing), so that for small datasets we always use
        #   epoch-complete rules, but for big ones we use experienced NEC rules.
        # - Revisit adjusting all fitness to be no higher than the best area ratio when
        #   over the front in extrapolation.
        # Recalculate coverDiff (this should not have changed; it could be stored to
        # avoid recalculating):
        # coverDiff = self.correctCover - self.phenotype_RP*self.matchCover
        indFitness = None
        # Criteria for a fitness update: fitness only changes if the rule was last part
        # of a correct set, because otherwise relativeIndFitness stays at 0.
        # (An earlier version of this criterion also required self.partOfCorrect.)
        if self.coverDiff > 0 and self.matchedAndFrontEstablished == True and (
                len(cons.env.formatData.necFront.paretoFrontAcc) > 0 or
                len(cons.env.formatData.ecFront.paretoFrontAcc) > 0):
            # EXTRAPOLATE coverDiff up to the number of training instances (i.e. age at
            # epoch complete). This changes because more iterations have gone by.
            ruleAge = exploreIter - self.initTimeStamp + 1  # Correct, because we include the current instance we are on.
            coverDiff = self.coverDiff * cons.env.formatData.numTrainInstances / float(ruleAge)
# if coverDiff > self.coverDiff and self.coverDiff != None:
# print 'exploreIter= '+str(exploreIter)
# print 'InitTimestamp+1= ' + str(self.initTimeStamp+1)
# print 'ruleAge= '+ str(ruleAge)
# x = 5/0
# Get new pareto fitness
objectivePair = [self.accuracyComponent, coverDiff]
# BEFORE PARETO FRONTS BEGIN TO BE UPDATED
if len(cons.env.formatData.ecFront.paretoFrontAcc) > 0: # Leave epoch incomplete front behind.
indFitness = cons.env.formatData.ecFront.getParetoFitness(objectivePair)
# print 'EC'
else: # Keep updating and evaluating with epoch incomplete front.
indFitness = cons.env.formatData.necFront.getParetoFitness(objectivePair)
# print 'ENC'
# print 'pFit= '+str(self.indFitness
indFitness = math.pow(indFitness, cons.nu)
# if self.lastPreFitness < self.indFitness or self.lastPreFitness < indFitness:
# if self.initTimeStamp == 2:
# print "SWITCHING OVER FROM ACCURCY TO PARETO FRONT"
# self.reportClassifier('update')
# x = 5/0
            # Calculate the new adjusted fitness using the last matching-update score.
            # Originals are preserved so that updates are always based on the originals,
            # not on an updated estimate.
            tempSumIndFitness = copy.deepcopy(self.sumIndFitness)
            tempSumIndFitness = tempSumIndFitness - self.indFitness  # the sum without the current rule's fitness
            if self.lastIndFitness != indFitness:  # OK, because many new rules may still be maxed out at the highest possible fitness
                # Readjust sumIndFitness with the new indFitness information. This is an
                # estimate, because the other rules may have changed as well.
                tempSumIndFitness = tempSumIndFitness + indFitness
                self.relativeIndFitness = indFitness / float(tempSumIndFitness)
percRuleExp = (exploreIter - self.initTimeStamp + 1) / float(cons.env.formatData.numTrainInstances)
                # Consider the useful accuracy cutoff; maybe use a less dramatic change here.
                beta = 0.2
if self.matchCount >= 1.0 / beta:
self.fitness = self.lastMatchFitness + beta * percRuleExp * (
self.relativeIndFitness - self.lastMatchFitness)
                elif self.matchCount == 1 or self.aveRelativeIndFitness is None:  # second condition handles the special case after a GA rule is generated but has not yet gone through full matching
# print 'fit B'
self.fitness = self.relativeIndFitness
# self.aveRelativeIndFitness = self.relativeIndFitness
# if self.initTimeStamp == 2:
# print self.aveRelativePreFitness
# print 5/0
                else:
                    # Note: the last relative prefitness is often 0.
                    self.fitness = (self.aveRelativeIndFitness * (self.matchCount - 1) + self.relativeIndFitness) / self.matchCount
# self.aveRelativeIndFitness = (self.aveRelativeIndFitness*(self.matchCount-1)+self.relativeIndFitness)/self.matchCount
# if cons.env.formatData.discretePhenotype:
# #Consider the useful accuracy cutoff - -maybe use a less dramatic change or ...na...
# beta = 0.2
# if self.matchCount >= 1.0/beta:
# self.fitness = self.lastMatchFitness + beta*(self.relativeIndFitness-self.lastMatchFitness)
# elif self.matchCount == 1:
# self.fitness = self.relativeIndFitness
# else:
# self.fitness = (self.aveRelativeIndFitness*(self.matchCount-1)+self.relativeIndFitness)/self.matchCount
#
#
# #self.fitness = self.indFitness #TEMPORARY
# else: #ContinuousCode #########################
# #UPDATE NEEDED!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# if (self.phenotype[1]-self.phenotype[0]) >= cons.env.formatData.phenotypeRange:
# self.fitness = pow(0.0001, 5)
# else:
# if self.matchCover < 2 and self.epochComplete:
# self.fitness = pow(0.0001, 5)
# else:
# self.fitness = pow(self.accuracy, cons.nu) #- (self.phenotype[1]-self.phenotype[0])/cons.env.formatData.phenotypeRange)
# self.isMatchSetFitness = False
# x= 5/0
# if self.initTimeStamp == 2:
# self.reportClassifier('update')
            else:  # No fitness extrapolation update required
                pass
        else:  # No fitness extrapolation update required
            pass
# self.reportClassifier('update')
        if round(self.fitness, 5) > 1:
            self.fitness = 1.0
            print('FITNESS ERROR - adjust - too high')
        if self.fitness < 0:
            self.fitness = 0.0
            print('FITNESS ERROR - adjust - too low')
        # TARGET: won't storing this prevent the extrapolation branch above from running
        # again with an unchanged indFitness? Also, where is lastIndFitness first stored?
        # Note that indFitness is still None when no update was performed above.
        self.lastIndFitness = copy.deepcopy(indFitness)
        self.fitness = self.indFitness  # NOTE: fitness is overridden with indFitness, mirroring updateFitness()
    def updateNumerosity(self, num):
        """ Alters the numerosity of the classifier. Note that num can be negative! """
        self.numerosity += num
def updateMatchSetSize(self, matchSetSize):
""" Updates the average match set size. """
if self.matchCount == 1:
self.aveMatchSetSize = matchSetSize
elif self.matchCount < 1.0 / cons.beta: # < 5
self.aveMatchSetSize = (self.aveMatchSetSize * (self.matchCount - 1) + matchSetSize) / float(
self.matchCount)