-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DES #490
base: main
Are you sure you want to change the base?
Add DES #490
Conversation
66d00fc
to
dbbf329
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
578470b
to
4edd689
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first part of the review. I will check the rest of the code later.
} | ||
|
||
void NARAlgorithm::LoadDataInternal() { | ||
typed_relation_ = model::ColumnLayoutTypedRelationData::CreateFrom(*input_table_, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the second parameter of CreateFrom
function is is_null_equal_null
. this parameter can be introduced as an algorithm parameter, like this way.
or, maybe, the authors of the algorithm have discussed this case in the paper?
auto domain = domains[feature_index]; | ||
auto decoded = encoded_value_ranges_[feature_index].Decode(domain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, you copying the shared pointer when using auto domain = domains[feature_index]
. Use const reference, no need to increase ref count of shared ptr for no reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, no need to create an additional copy of decoded
.
I'm aware that InsertInCons
wants a shared pointer, not a const ref, but when passing const ref to it it will create a copy automatically, so no problems here.
EncodedNAR::EncodedNAR(FeatureDomains domains, TypedRelation const* typed_relation) { | ||
size_t feature_count = domains.size(); | ||
for (size_t feature_index = 0; feature_index < feature_count; feature_index++) { | ||
encoded_value_ranges_.emplace_back(EncodedValueRange()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a constructor, so we can just do
EncodedNAR::EncodedNAR(FeatureDomains domains, TypedRelation const* typed_relation) { | |
size_t feature_count = domains.size(); | |
for (size_t feature_index = 0; feature_index < feature_count; feature_index++) { | |
encoded_value_ranges_.emplace_back(EncodedValueRange()); | |
} | |
EncodedNAR::EncodedNAR(const FeatureDomains& domains, const TypedRelation* typed_relation) | |
: encoded_value_ranges_(domains.size(), EncodedValueRange()) { |
implication_sign_pos_ = RNG().Next(); | ||
SetQualities(domains, typed_relation); | ||
} | ||
|
||
EncodedNAR::EncodedNAR(size_t feature_count) { | ||
for (size_t feature_index = 0; feature_index < feature_count; feature_index++) { | ||
encoded_value_ranges_.emplace_back(EncodedValueRange()); | ||
} | ||
implication_sign_pos_ = RNG().Next(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing with RNG
, just create a std::random_device
as a field in EncodedNAR
class
EncodedNAR::EncodedNAR(size_t feature_count) { | ||
for (size_t feature_index = 0; feature_index < feature_count; feature_index++) { | ||
encoded_value_ranges_.emplace_back(EncodedValueRange()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EncodedNAR::EncodedNAR(size_t feature_count) { | |
for (size_t feature_index = 0; feature_index < feature_count; feature_index++) { | |
encoded_value_ranges_.emplace_back(EncodedValueRange()); | |
} | |
EncodedNAR::EncodedNAR(size_t feature_count) | |
: encoded_value_ranges_(feature_count, EncodedValueRange()) { |
using model::ValueRange; | ||
|
||
DES::DES() : NARAlgorithm({}) { | ||
using namespace config::names; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using namespace config::names; |
@@ -0,0 +1,89 @@ | |||
#include "des.h" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include-what-you-use is planned to be introduced in the CI, so it would be better to add these lines:
#include <algorithm> | |
#include <cstddef> | |
#include <vector> | |
#include <memory> |
This also applies to other files
|
||
static FeatureDomains FindFeatureDomains(TypedRelation const* typed_relation); | ||
std::vector<EncodedNAR> GetRandomPopulationInDomains(FeatureDomains domains) const; | ||
void EvolvePopulation(std::vector<EncodedNAR>& population) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find definition of DES::EvolvePopulation
method. Shouldn't it's declaration be removed?
|
||
private: | ||
double implication_sign_pos_ = -1; | ||
std::vector<EncodedValueRange> encoded_value_ranges_ = std::vector<EncodedValueRange>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<EncodedValueRange> encoded_value_ranges_ = std::vector<EncodedValueRange>(); | |
std::vector<EncodedValueRange> encoded_value_ranges_{}; |
switch (index) { | ||
case 0: | ||
return permutation; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These break
's are unreachable
std::shared_ptr<model::ValueRange> Decode(std::shared_ptr<model::ValueRange> domain) const { | ||
switch (domain->GetTypeId()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, it's better to add
std::shared_ptr<model::ValueRange> Decode(std::shared_ptr<model::ValueRange> domain) const { | |
switch (domain->GetTypeId()) { | |
std::shared_ptr<model::ValueRange> Decode(std::shared_ptr<model::ValueRange> domain) const { | |
using namespace model; | |
switch (domain->GetTypeId()) { |
and remove model::
further in this method
case model::TypeId::kInt: { | ||
std::shared_ptr<model::IntValueRange> int_domain = | ||
std::static_pointer_cast<model::IntValueRange>(domain); | ||
model::Int span = int_domain->upper_bound - int_domain->lower_bound; | ||
model::Int resulting_lower = int_domain->lower_bound + span * this->bound1; | ||
model::Int resulting_upper = int_domain->lower_bound + span * this->bound2; | ||
if (resulting_lower > resulting_upper) { | ||
std::swap(resulting_lower, resulting_upper); | ||
} | ||
return std::make_shared<model::IntValueRange>( | ||
model::IntValueRange(resulting_lower, resulting_upper)); | ||
} | ||
case model::TypeId::kDouble: { | ||
std::shared_ptr<model::DoubleValueRange> double_domain = | ||
std::static_pointer_cast<model::DoubleValueRange>(domain); | ||
model::Double span = double_domain->upper_bound - double_domain->lower_bound; | ||
model::Double resulting_lower = double_domain->lower_bound + span * this->bound1; | ||
model::Double resulting_upper = double_domain->lower_bound + span * this->bound2; | ||
if (resulting_lower > resulting_upper) { | ||
std::swap(resulting_lower, resulting_upper); | ||
} | ||
return std::make_shared<model::DoubleValueRange>( | ||
model::DoubleValueRange(resulting_lower, resulting_upper)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's better to use private template method here?
template <typename T, typename RangeT>
std::shared_ptr<RangeT> DecodeTypedValueRange(std::shared_ptr<model::ValueRange> const& domain) const {
std::shared_ptr<RangeT> int_domain = std::static_pointer_cast<RangeT>(domain);
T span = int_domain->upper_bound - int_domain->lower_bound;
T resulting_lower = int_domain->lower_bound + span * this->bound1;
T resulting_upper = int_domain->lower_bound + span * this->bound2;
if (resulting_lower > resulting_upper) {
std::swap(resulting_lower, resulting_upper);
}
return std::make_shared<RangeT>(resulting_lower, resulting_upper);
}
and call it here:
case model::TypeId::kInt: { | |
std::shared_ptr<model::IntValueRange> int_domain = | |
std::static_pointer_cast<model::IntValueRange>(domain); | |
model::Int span = int_domain->upper_bound - int_domain->lower_bound; | |
model::Int resulting_lower = int_domain->lower_bound + span * this->bound1; | |
model::Int resulting_upper = int_domain->lower_bound + span * this->bound2; | |
if (resulting_lower > resulting_upper) { | |
std::swap(resulting_lower, resulting_upper); | |
} | |
return std::make_shared<model::IntValueRange>( | |
model::IntValueRange(resulting_lower, resulting_upper)); | |
} | |
case model::TypeId::kDouble: { | |
std::shared_ptr<model::DoubleValueRange> double_domain = | |
std::static_pointer_cast<model::DoubleValueRange>(domain); | |
model::Double span = double_domain->upper_bound - double_domain->lower_bound; | |
model::Double resulting_lower = double_domain->lower_bound + span * this->bound1; | |
model::Double resulting_upper = double_domain->lower_bound + span * this->bound2; | |
if (resulting_lower > resulting_upper) { | |
std::swap(resulting_lower, resulting_upper); | |
} | |
return std::make_shared<model::DoubleValueRange>( | |
model::DoubleValueRange(resulting_lower, resulting_upper)); | |
} | |
case TypeId::kInt: { | |
return DecodeTypedValueRange<Int, IntValueRange>(domain); | |
} | |
case TypeId::kDouble: { | |
return DecodeTypedValueRange<Double, DoubleValueRange>(domain); | |
} |
Also you can define Mixed
case as template specialization:
template <>
std::shared_ptr<model::StringValueRange>
EncodedValueRange::DecodeTypedValueRange<model::String, model::StringValueRange>
(std::shared_ptr<model::ValueRange> const& domain) const {
using namespace model;
std::shared_ptr<StringValueRange> string_domain =
std::static_pointer_cast<StringValueRange>(domain);
std::vector<String> string_vector = string_domain->domain;
size_t span = string_vector.size();
// upper_bound is not used, resulting NARs bind categorical values with a single
// value.
String result;
if (bound1 == 1.0) {
result = string_vector.back();
} else {
result = string_vector[(size_t)(span * this->bound1)];
}
return std::make_shared<StringValueRange>(StringValueRange(result));
}
Note that specialization should appear after template itself and before first instantiation. The easier way to do this is to place template specialization and EncodeValueRange::Decode
definition in .cpp
file -- in that order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also we should avoid using c-style casts: "(size_t)
"
case model::TypeId::kInt: { | ||
std::shared_ptr<model::IntValueRange> int_domain = | ||
std::static_pointer_cast<model::IntValueRange>(domain); | ||
model::Int span = int_domain->upper_bound - int_domain->lower_bound; | ||
model::Int resulting_lower = int_domain->lower_bound + span * this->bound1; | ||
model::Int resulting_upper = int_domain->lower_bound + span * this->bound2; | ||
if (resulting_lower > resulting_upper) { | ||
std::swap(resulting_lower, resulting_upper); | ||
} | ||
return std::make_shared<model::IntValueRange>( | ||
model::IntValueRange(resulting_lower, resulting_upper)); | ||
} | ||
case model::TypeId::kDouble: { | ||
std::shared_ptr<model::DoubleValueRange> double_domain = | ||
std::static_pointer_cast<model::DoubleValueRange>(domain); | ||
model::Double span = double_domain->upper_bound - double_domain->lower_bound; | ||
model::Double resulting_lower = double_domain->lower_bound + span * this->bound1; | ||
model::Double resulting_upper = double_domain->lower_bound + span * this->bound2; | ||
if (resulting_lower > resulting_upper) { | ||
std::swap(resulting_lower, resulting_upper); | ||
} | ||
return std::make_shared<model::DoubleValueRange>( | ||
model::DoubleValueRange(resulting_lower, resulting_upper)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, EncodedValueRange::Decode
should be placed in a .cpp
file
return std::make_shared<model::IntValueRange>( | ||
model::IntValueRange(resulting_lower, resulting_upper)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return std::make_shared<model::IntValueRange>( | |
model::IntValueRange(resulting_lower, resulting_upper)); | |
return std::make_shared<model::IntValueRange>(resulting_lower, resulting_upper); |
The purpose of make_shared
is to forward it's parameters to IntValueRange
's constructor and create IntValueRange
in-place. In your code temporary IntValueRange
is created and being copied (or moved) to shared_ptr
. Also it's not a good idea to repeat type name.
This also applies to Double
and Mixed
cases
std::mt19937 RNG::rng_ = std::mt19937(kSeed); | ||
std::uniform_real_distribution<double> RNG::uni_ = std::uniform_real_distribution(0.0, 1.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::mt19937 RNG::rng_ = std::mt19937(kSeed); | |
std::uniform_real_distribution<double> RNG::uni_ = std::uniform_real_distribution(0.0, 1.0); | |
std::mt19937 RNG::rng_{kSeed}; | |
std::uniform_real_distribution<double> RNG::uni_{0.0, 1.0}; |
@@ -0,0 +1 @@ | |||
#include "des/des.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "des/des.h" | |
#pragma once | |
#include "des/des.h" |
size_t antecounter = 0; | ||
for (auto const& [key, value] : ante_) { | ||
if (antecounter > 0) { | ||
result += ", "; | ||
} | ||
result += std::to_string(key); | ||
result += ": "; | ||
result += value->ToString(); | ||
antecounter++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t antecounter = 0; | |
for (auto const& [key, value] : ante_) { | |
if (antecounter > 0) { | |
result += ", "; | |
} | |
result += std::to_string(key); | |
result += ": "; | |
result += value->ToString(); | |
antecounter++; | |
} | |
for (auto it{ante_.begin()}; it != ante_.end(); ++it) { | |
if (it != ante_.begin()) { | |
result += ", "; | |
} | |
result += std::to_string(it->first); | |
result += ": "; | |
result += it->second->ToString(); | |
} |
size_t conscounter = 0; | ||
for (auto const& [key, value] : cons_) { | ||
if (conscounter > 0) { | ||
result += ", "; | ||
} | ||
result += std::to_string(key); | ||
result += ": "; | ||
result += value->ToString(); | ||
conscounter++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t conscounter = 0; | |
for (auto const& [key, value] : cons_) { | |
if (conscounter > 0) { | |
result += ", "; | |
} | |
result += std::to_string(key); | |
result += ": "; | |
result += value->ToString(); | |
conscounter++; | |
} | |
for (auto it{cons_.begin()}; it != cons_.end(); ++it) { | |
if (it != cons_.begin()) { | |
result += ", "; | |
} | |
result += std::to_string(it->first); | |
result += ": "; | |
result += it->second->ToString(); | |
} |
} else { | ||
result.confidence = num_rows_fit_ante_and_cons / (double)num_rows_fit_ante; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else
here makes no sense -- if
branch has return
at the end.
} else { | |
result.confidence = num_rows_fit_ante_and_cons / (double)num_rows_fit_ante; | |
} | |
result.confidence = num_rows_fit_ante_and_cons / (double)num_rows_fit_ante; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also (double)
is a c-style cast. Prefer using static_cast
value); // is it okay to use a non-long double here? clion says GetValue returns | ||
// just double. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's ok. Double
is just double
(it's defined exactly as using model::Double = double;
), and GetValue<T>
returns T
DoubleValueRange::DoubleValueRange(model::TypedColumnData const& column) { | ||
bool initialized = false; | ||
for (size_t row_index = 0; row_index < column.GetNumRows(); row_index++) { | ||
std::byte const* value = column.GetValue(row_index); | ||
double double_value = model::Type::GetValue<model::Double>( | ||
value); // is it okay to use a non-long double here? clion says GetValue returns | ||
// just double. | ||
if (!initialized) { | ||
lower_bound = double_value; | ||
upper_bound = double_value; | ||
initialized = true; | ||
continue; | ||
} | ||
if (double_value > upper_bound) { | ||
upper_bound = double_value; | ||
} | ||
if (double_value < lower_bound) { | ||
lower_bound = double_value; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method repeats IntValueRange::IntValueRange
. The only difference is int_value
/double_value
type. I think, you can use template function like
template <typename T>
CreateValueRange(...) {
...
T typed_value = model::Type::GetValue<T>(value);
...
and in constructor will be CreateValueRange<model::Int>(...)
and CreateValueRange<model::Double>(...)
.
bool initialized = false; | ||
for (size_t row_index = 0; row_index < column.GetNumRows(); row_index++) { | ||
std::byte const* value = column.GetValue(row_index); | ||
long int int_value = model::Type::GetValue<model::Int>(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You assign value of type model::Int
to int_value
of type long
. Then you assign it to lower_bound
and upper_bound
, both of type model::Int
. Why can't int_value
be of type model::Int
?
Moreover, model::Int
is not the same type as long
-- it's alias for int_64_t
, which is twice as long on some machines (see https://en.cppreference.com/w/cpp/language/types).
long int int_value = model::Type::GetValue<model::Int>(value); | |
model::Int int_value = model::Type::GetValue<model::Int>(value); |
The same for DoubleValueRange
virtual bool Includes(std::byte const* value) const = 0; | ||
virtual std::string ToString() const = 0; | ||
|
||
protected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style guide recommends that protected
section precede public
algorithm->Execute(); | ||
SUCCEED(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add tests, checking that algorithm works fine:
- test that NARs that should hold are found
- test that their qualities are calculated correctly (fitness, support, confidence)
return bound2; | ||
break; | ||
default: | ||
throw std::out_of_range("Index out of range for value range."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove code duplication by creating a static array of four elements in private section:
static constexpr double EncodedValueRange::* member_pointers_[kFieldCount] = {
&EncodedValueRange::permutation,
&EncodedValueRange::threshold,
&EncodedValueRange::bound1,
&EncodedValueRange::bound2
};
Those are pointers to the member values
And use it like this:
double& operator[](size_t index) {
if (index >= kFieldCount) {
throw std::out_of_range("Index out of range for EncodedValueRange.");
}
return this->*(member_pointers_[index]);
}
const double& operator[](size_t index) const {
if (index >= kFieldCount) {
throw std::out_of_range("Index out of range for EncodedValueRange.");
}
return this->*(member_pointers_[index]);
}
But that looks kinda ugly IMO. Maybe your approach is readable enough, decide for yourself
std::shared_ptr<model::IntValueRange> int_domain = | ||
std::static_pointer_cast<model::IntValueRange>(domain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use auto
when specifying type of a variable when you are performing a cast. The reason is that the type can be easily deduced by both a compiler and a person reading the code:
std::shared_ptr<model::IntValueRange> int_domain = | |
std::static_pointer_cast<model::IntValueRange>(domain); | |
auto int_domain = std::static_pointer_cast<model::IntValueRange>(domain); |
std::shared_ptr<model::DoubleValueRange> double_domain = | ||
std::static_pointer_cast<model::DoubleValueRange>(domain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here with an auto
case model::TypeId::kInt: { | ||
std::shared_ptr<model::IntValueRange> int_domain = | ||
std::static_pointer_cast<model::IntValueRange>(domain); | ||
model::Int span = int_domain->upper_bound - int_domain->lower_bound; | ||
model::Int resulting_lower = int_domain->lower_bound + span * this->bound1; | ||
model::Int resulting_upper = int_domain->lower_bound + span * this->bound2; | ||
if (resulting_lower > resulting_upper) { | ||
std::swap(resulting_lower, resulting_upper); | ||
} | ||
return std::make_shared<model::IntValueRange>( | ||
model::IntValueRange(resulting_lower, resulting_upper)); | ||
} | ||
case model::TypeId::kDouble: { | ||
std::shared_ptr<model::DoubleValueRange> double_domain = | ||
std::static_pointer_cast<model::DoubleValueRange>(domain); | ||
model::Double span = double_domain->upper_bound - double_domain->lower_bound; | ||
model::Double resulting_lower = double_domain->lower_bound + span * this->bound1; | ||
model::Double resulting_upper = double_domain->lower_bound + span * this->bound2; | ||
if (resulting_lower > resulting_upper) { | ||
std::swap(resulting_lower, resulting_upper); | ||
} | ||
return std::make_shared<model::DoubleValueRange>( | ||
model::DoubleValueRange(resulting_lower, resulting_upper)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also we should avoid using c-style casts: "(size_t)
"
if (bound1 == 1.0) { | ||
result = string_vector.back(); | ||
} else { | ||
result = string_vector[(size_t)(span * this->bound1)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this->bound1
? just bound1
is enough
} else { | ||
result.confidence = num_rows_fit_ante_and_cons / (double)num_rows_fit_ante; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also (double)
is a c-style cast. Prefer using static_cast
NARQualities result; | ||
if (num_rows_fit_ante == 0) { | ||
result.fitness = 0.0; | ||
result.confidence = 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also think that {0.0, 0.0, 0.0}
will be better. In fact, we can do that for all cases:
if (num_rows_fit_ante == 0) {
return {0.0, 0.0, 0.0};
}
double confidence = static_cast<double>(num_rows_fit_ante_and_cons) / num_rows_fit_ante;
double support = static_cast<double>(num_rows_fit_ante_and_cons) / num_rows;
if (support == 0.0) {
return {0.0, 0.0, 0.0};
}
double inclusion = static_cast<double>(included_features) / feature_count;
double fitness = (confidence + support + inclusion) / 3.0;
return {fitness, confidence, support};
// TODO: this function is way too big and cluttered | ||
void NAR::SetQualities(TypedRelation const* typed_relation) { | ||
if (ante_.size() == 0 || cons_.size() == 0) { | ||
qualities_.fitness = 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about other fields?
qualities_ = {0.0, 0.0, 0.0};
size_t num_rows_fit_ante = 0; | ||
size_t num_rows_fit_ante_and_cons = 0; | ||
for (size_t rowi = 0; rowi < typed_relation->GetNumRows(); rowi++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract typed_relation->GetNumRows()
to a variable since you will be using it later
for (size_t rowi = 0; rowi < typed_relation->GetNumRows(); rowi++) { | ||
bool row_fits_ante = true; | ||
bool row_fits_cons = true; | ||
for (size_t coli = 0; coli < typed_relation->GetNumColumns(); coli++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with GetNumColumns()
model::TypedColumnData const& column = typed_relation->GetColumnData(coli); | ||
auto value = column.GetValue(rowi); | ||
row_fits_ante &= AnteFitsValue(coli, value); | ||
if (!row_fits_ante) { | ||
break; | ||
} | ||
row_fits_cons &= ConsFitsValue(coli, value); | ||
} | ||
num_rows_fit_ante += row_fits_ante; | ||
num_rows_fit_ante_and_cons += (row_fits_ante && row_fits_cons); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe smth like this is easier:
if (row_fits_ante) {
row_fits_ante = AnteFitsValue(coli, value);
} else {
break;
}
if (row_fits_ante) {
row_fits_cons &= ConsFitsValue(coli, value);
}
}
if (row_fits_ante) {
++num_rows_fit_ante;
if (row_fits_cons) {
++num_rows_fit_ante_and_cons;
}
}
bool map_binds_feature = map.find(feature_index) != map.end(); | ||
if (!map_binds_feature) { | ||
return true; | ||
} else if (map.at(feature_index)->Includes(value)) { | ||
return true; | ||
} else { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess find
here saves one extra lookup, so maybe smth like this is acceptable too
auto it = map.find(feature_index);
if (it == map.end()) {
return ture;
}
return it->second->Includes(value);
}
namespace model { | ||
|
||
StringValueRange::StringValueRange(model::TypedColumnData const& column) { | ||
domain = std::vector<std::string>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
domain = std::vector<std::string>(); | |
domain.reserve(column.GetNumRows()); |
domain = std::vector<std::string>(); | ||
for (size_t row_index = 0; row_index < column.GetNumRows(); row_index++) { | ||
std::byte const* value = column.GetValue(row_index); | ||
std::string string_value = model::Type::GetValue<std::string>(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model
not needed here
std::string IntValueRange::ToString() const { | ||
std::string result; | ||
result += "["; | ||
result += std::to_string(lower_bound); | ||
result += " - "; | ||
result += std::to_string(upper_bound); | ||
result += "]"; | ||
return result; | ||
} | ||
|
||
IntValueRange::IntValueRange(model::TypedColumnData const& column) { | ||
bool initialized = false; | ||
for (size_t row_index = 0; row_index < column.GetNumRows(); row_index++) { | ||
std::byte const* value = column.GetValue(row_index); | ||
long int int_value = model::Type::GetValue<model::Int>(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe create a templated class NumericalValueRange
and use smth like:
using IntValueRange = NumericValueRange<Int>;
using DoubleValueRange = NumericValueRange<Double>;
like
template <typename T>
class NumericValueRange : public ValueRange {
public:
explicit NumericValueRange(const TypedColumnData& column) {
bool initialized = false;
for (size_t row_index = 0; row_index < column.GetNumRows(); ++row_index) {
const std::byte* value = column.GetValue(row_index);
T numeric_value = Type::GetValue<T>(value);
if (!initialized) {
lower_bound_ = numeric_value;
upper_bound_ = numeric_value;
initialized = true;
} else {
if (numeric_value < lower_bound_) {
lower_bound_ = numeric_value;
}
if (numeric_value > upper_bound_) {
upper_bound_ = numeric_value;
}
}
}
}
[[nodiscard]] std::string ToString() const override {
return "[" + std::to_string(lower_bound_) + " - " + std::to_string(upper_bound_) + "]";
}
[[nodiscard]] bool Includes(const std::byte* value) const override {
T numeric_value = Type::GetValue<T>(value);
return numeric_value >= lower_bound_ && numeric_value <= upper_bound_;
}
private:
T lower_bound_{};
T upper_bound_{};
};
std::string StringValueRange::ToString() const { | ||
std::string result; | ||
result += "["; | ||
if (domain.size() > 0) { | ||
result += domain[0]; | ||
} | ||
for (size_t i = 1; i < domain.size(); i++) { | ||
result += (", " + domain[i]); | ||
} | ||
result += "]"; | ||
return result; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (domain_.empty()) {
return "[]";
}
std::string result = "[" + domain_.front();
for (size_t i = 1; i < domain_.size(); ++i) {
result += ", " + domain_[i];
}
result += "]";
return result;
case TypeId::kString: | ||
return std::make_shared<StringValueRange>(column); | ||
case TypeId::kMixed: | ||
return std::make_shared<StringValueRange>(column); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use fallthrough:
case TypeId::kString:
case TypeId::kMixed:
return std::make_shared<StringValueRange>(column);
fdac96a
to
a72e340
Compare
std::vector<size_t> ind_vec; | ||
ind_vec.reserve(number_of_indices); | ||
ind_vec.insert(ind_vec.end(), indices.begin(), indices.end()); | ||
return ind_vec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<size_t> ind_vec; | |
ind_vec.reserve(number_of_indices); | |
ind_vec.insert(ind_vec.end(), indices.begin(), indices.end()); | |
return ind_vec; | |
std::vector<size_t> ind_vec {indices.begin(), indices.end()}; | |
return ind_vec; |
} else { | ||
index--; | ||
size_t feature = index / EncodedValueRange().kFieldCount; | ||
size_t feature_field = index % EncodedValueRange().kFieldCount; | ||
return encoded_value_ranges_[feature][feature_field]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else { | |
index--; | |
size_t feature = index / EncodedValueRange().kFieldCount; | |
size_t feature_field = index % EncodedValueRange().kFieldCount; | |
return encoded_value_ranges_[feature][feature_field]; | |
} | |
index--; | |
size_t feature = index / EncodedValueRange::kFieldCount; | |
size_t feature_field = index % EncodedValueRange::kFieldCount; | |
return encoded_value_ranges_[feature][feature_field]; |
return qualities_; | ||
} | ||
|
||
NAR EncodedNAR::Decode(FeatureDomains& domains) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAR EncodedNAR::Decode(FeatureDomains& domains) const { | |
NAR EncodedNAR::Decode(FeatureDomains const& domains) const { |
If FeatureDomains
doesn't have const operator[]
(some of the stl containers may have this problem), use at
-- it must have const
version
Add differential evolution solver algorithm for mining numerical association rules.