Add DES #490

VanyaVolgushev · 2024-11-04T18:51:26Z

Add differential evolution solver algorithm for mining numerical association rules.

github-actions

clang-tidy made some suggestions

src/core/algorithms/nar/des/des.cpp

src/core/algorithms/nar/des/encoded_nar.cpp

src/core/algorithms/nar/des/rng.cpp

src/core/algorithms/nar/des/rng.h

src/core/algorithms/nar/value_range.h

alexandrsmirn

The first part of the review. I will check the rest of the code later.

src/core/algorithms/nar/nar.h

src/core/algorithms/nar/nar_algorithm.h

src/core/algorithms/nar/nar_algorithm.cpp

alexandrsmirn · 2024-11-19T20:35:15Z

src/core/algorithms/nar/nar_algorithm.cpp

+}
+
+void NARAlgorithm::LoadDataInternal() {
+    typed_relation_ = model::ColumnLayoutTypedRelationData::CreateFrom(*input_table_, true);


the second parameter of CreateFrom function is is_null_equal_null. this parameter can be introduced as an algorithm parameter, like this way.

or, maybe, the authors of the algorithm have discussed this case in the paper?

src/core/config/descriptions.h

src/core/algorithms/nar/value_range.h

src/core/algorithms/nar/des/des.cpp

src/core/algorithms/nar/des/encoded_nar.cpp

ol-imorozko · 2024-11-20T11:39:50Z

src/core/algorithms/nar/des/encoded_nar.cpp

+        auto domain = domains[feature_index];
+        auto decoded = encoded_value_ranges_[feature_index].Decode(domain);


again, you copying the shared pointer when using auto domain = domains[feature_index]. Use const reference, no need to increase ref count of shared ptr for no reason

again, no need to create an additional copy of decoded.
I'm aware that InsertInCons wants a shared pointer, not a const ref, but when passing const ref to it it will create a copy automatically, so no problems here.

ol-imorozko · 2024-11-20T11:44:15Z

src/core/algorithms/nar/des/encoded_nar.cpp

+EncodedNAR::EncodedNAR(FeatureDomains domains, TypedRelation const* typed_relation) {
+    size_t feature_count = domains.size();
+    for (size_t feature_index = 0; feature_index < feature_count; feature_index++) {
+        encoded_value_ranges_.emplace_back(EncodedValueRange());
+    }


That's a constructor, so we can just do

Suggested change

EncodedNAR::EncodedNAR(FeatureDomains domains, TypedRelation const* typed_relation) {

size_t feature_count = domains.size();

for (size_t feature_index = 0; feature_index < feature_count; feature_index++) {

encoded_value_ranges_.emplace_back(EncodedValueRange());

}

EncodedNAR::EncodedNAR(const FeatureDomains& domains, const TypedRelation* typed_relation)

: encoded_value_ranges_(domains.size(), EncodedValueRange()) {

ol-imorozko · 2024-11-20T11:45:00Z

src/core/algorithms/nar/des/encoded_nar.cpp

+    implication_sign_pos_ = RNG().Next();
+    SetQualities(domains, typed_relation);
+}
+
+EncodedNAR::EncodedNAR(size_t feature_count) {
+    for (size_t feature_index = 0; feature_index < feature_count; feature_index++) {
+        encoded_value_ranges_.emplace_back(EncodedValueRange());
+    }
+    implication_sign_pos_ = RNG().Next();
+}


same thing with RNG, just create a std::random_device as a field in EncodedNAR class

ol-imorozko · 2024-11-20T11:45:26Z

src/core/algorithms/nar/des/encoded_nar.cpp

+EncodedNAR::EncodedNAR(size_t feature_count) {
+    for (size_t feature_index = 0; feature_index < feature_count; feature_index++) {
+        encoded_value_ranges_.emplace_back(EncodedValueRange());
+    }


Suggested change

EncodedNAR::EncodedNAR(size_t feature_count) {

for (size_t feature_index = 0; feature_index < feature_count; feature_index++) {

encoded_value_ranges_.emplace_back(EncodedValueRange());

}

EncodedNAR::EncodedNAR(size_t feature_count)

: encoded_value_ranges_(feature_count, EncodedValueRange()) {

Petua41 · 2024-11-20T13:50:18Z

src/core/algorithms/nar/des/des.cpp

+using model::ValueRange;
+
+DES::DES() : NARAlgorithm({}) {
+    using namespace config::names;


Suggested change

using namespace config::names;

Petua41 · 2024-11-20T14:18:20Z

src/core/algorithms/nar/des/des.cpp

@@ -0,0 +1,89 @@
+#include "des.h"
+


Include-what-you-use is planned to be introduced in the CI, so it would be better to add these lines:

Suggested change

#include <algorithm>

#include <cstddef>

#include <vector>

#include <memory>

This also applies to other files

Petua41 · 2024-11-20T14:25:52Z

src/core/algorithms/nar/des/des.h

+
+    static FeatureDomains FindFeatureDomains(TypedRelation const* typed_relation);
+    std::vector<EncodedNAR> GetRandomPopulationInDomains(FeatureDomains domains) const;
+    void EvolvePopulation(std::vector<EncodedNAR>& population) const;


I can't find definition of DES::EvolvePopulation method. Shouldn't it's declaration be removed?

Petua41 · 2024-11-20T15:27:59Z

src/core/algorithms/nar/des/encoded_nar.h

+
+private:
+    double implication_sign_pos_ = -1;
+    std::vector<EncodedValueRange> encoded_value_ranges_ = std::vector<EncodedValueRange>();


Suggested change

std::vector<EncodedValueRange> encoded_value_ranges_ = std::vector<EncodedValueRange>();

std::vector<EncodedValueRange> encoded_value_ranges_{};

Petua41 · 2024-11-20T15:31:57Z

src/core/algorithms/nar/des/encoded_value_range.h

+        switch (index) {
+            case 0:
+                return permutation;
+                break;


These break's are unreachable

Petua41 · 2024-11-20T15:44:42Z

src/core/algorithms/nar/des/encoded_value_range.h

+    std::shared_ptr<model::ValueRange> Decode(std::shared_ptr<model::ValueRange> domain) const {
+        switch (domain->GetTypeId()) {


I think, it's better to add

Suggested change

std::shared_ptr<model::ValueRange> Decode(std::shared_ptr<model::ValueRange> domain) const {

switch (domain->GetTypeId()) {

std::shared_ptr<model::ValueRange> Decode(std::shared_ptr<model::ValueRange> domain) const {

using namespace model;

switch (domain->GetTypeId()) {

and remove model:: further in this method

Petua41 · 2024-11-20T16:22:34Z

src/core/algorithms/nar/des/encoded_value_range.h

+            case model::TypeId::kInt: {
+                std::shared_ptr<model::IntValueRange> int_domain =
+                        std::static_pointer_cast<model::IntValueRange>(domain);
+                model::Int span = int_domain->upper_bound - int_domain->lower_bound;
+                model::Int resulting_lower = int_domain->lower_bound + span * this->bound1;
+                model::Int resulting_upper = int_domain->lower_bound + span * this->bound2;
+                if (resulting_lower > resulting_upper) {
+                    std::swap(resulting_lower, resulting_upper);
+                }
+                return std::make_shared<model::IntValueRange>(
+                        model::IntValueRange(resulting_lower, resulting_upper));
+            }
+            case model::TypeId::kDouble: {
+                std::shared_ptr<model::DoubleValueRange> double_domain =
+                        std::static_pointer_cast<model::DoubleValueRange>(domain);
+                model::Double span = double_domain->upper_bound - double_domain->lower_bound;
+                model::Double resulting_lower = double_domain->lower_bound + span * this->bound1;
+                model::Double resulting_upper = double_domain->lower_bound + span * this->bound2;
+                if (resulting_lower > resulting_upper) {
+                    std::swap(resulting_lower, resulting_upper);
+                }
+                return std::make_shared<model::DoubleValueRange>(
+                        model::DoubleValueRange(resulting_lower, resulting_upper));
+            }


Maybe it's better to use private template method here?

template <typename T, typename RangeT> std::shared_ptr<RangeT> DecodeTypedValueRange(std::shared_ptr<model::ValueRange> const& domain) const { std::shared_ptr<RangeT> int_domain = std::static_pointer_cast<RangeT>(domain); T span = int_domain->upper_bound - int_domain->lower_bound; T resulting_lower = int_domain->lower_bound + span * this->bound1; T resulting_upper = int_domain->lower_bound + span * this->bound2; if (resulting_lower > resulting_upper) { std::swap(resulting_lower, resulting_upper); } return std::make_shared<RangeT>(resulting_lower, resulting_upper); }

and call it here:

Suggested change

case model::TypeId::kInt: {

std::shared_ptr<model::IntValueRange> int_domain =

std::static_pointer_cast<model::IntValueRange>(domain);

model::Int span = int_domain->upper_bound - int_domain->lower_bound;

model::Int resulting_lower = int_domain->lower_bound + span * this->bound1;

model::Int resulting_upper = int_domain->lower_bound + span * this->bound2;

if (resulting_lower > resulting_upper) {

std::swap(resulting_lower, resulting_upper);

}

return std::make_shared<model::IntValueRange>(

model::IntValueRange(resulting_lower, resulting_upper));

}

case model::TypeId::kDouble: {

std::shared_ptr<model::DoubleValueRange> double_domain =

std::static_pointer_cast<model::DoubleValueRange>(domain);

model::Double span = double_domain->upper_bound - double_domain->lower_bound;

model::Double resulting_lower = double_domain->lower_bound + span * this->bound1;

model::Double resulting_upper = double_domain->lower_bound + span * this->bound2;

if (resulting_lower > resulting_upper) {

std::swap(resulting_lower, resulting_upper);

}

return std::make_shared<model::DoubleValueRange>(

model::DoubleValueRange(resulting_lower, resulting_upper));

}

case TypeId::kInt: {

return DecodeTypedValueRange<Int, IntValueRange>(domain);

}

case TypeId::kDouble: {

return DecodeTypedValueRange<Double, DoubleValueRange>(domain);

}

Also you can define Mixed case as template specialization:

template <> std::shared_ptr<model::StringValueRange> EncodedValueRange::DecodeTypedValueRange<model::String, model::StringValueRange> (std::shared_ptr<model::ValueRange> const& domain) const { using namespace model; std::shared_ptr<StringValueRange> string_domain = std::static_pointer_cast<StringValueRange>(domain); std::vector<String> string_vector = string_domain->domain; size_t span = string_vector.size(); // upper_bound is not used, resulting NARs bind categorical values with a single // value. String result; if (bound1 == 1.0) { result = string_vector.back(); } else { result = string_vector[(size_t)(span * this->bound1)]; } return std::make_shared<StringValueRange>(StringValueRange(result)); }

Note that specialization should appear after template itself and before first instantiation. The easier way to do this is to place template specialization and EncodeValueRange::Decode definition in .cpp file -- in that order

also we should avoid using c-style casts: "(size_t)"

Petua41 · 2024-11-20T16:24:25Z

src/core/algorithms/nar/des/encoded_value_range.h

+            case model::TypeId::kInt: {
+                std::shared_ptr<model::IntValueRange> int_domain =
+                        std::static_pointer_cast<model::IntValueRange>(domain);
+                model::Int span = int_domain->upper_bound - int_domain->lower_bound;
+                model::Int resulting_lower = int_domain->lower_bound + span * this->bound1;
+                model::Int resulting_upper = int_domain->lower_bound + span * this->bound2;
+                if (resulting_lower > resulting_upper) {
+                    std::swap(resulting_lower, resulting_upper);
+                }
+                return std::make_shared<model::IntValueRange>(
+                        model::IntValueRange(resulting_lower, resulting_upper));
+            }
+            case model::TypeId::kDouble: {
+                std::shared_ptr<model::DoubleValueRange> double_domain =
+                        std::static_pointer_cast<model::DoubleValueRange>(domain);
+                model::Double span = double_domain->upper_bound - double_domain->lower_bound;
+                model::Double resulting_lower = double_domain->lower_bound + span * this->bound1;
+                model::Double resulting_upper = double_domain->lower_bound + span * this->bound2;
+                if (resulting_lower > resulting_upper) {
+                    std::swap(resulting_lower, resulting_upper);
+                }
+                return std::make_shared<model::DoubleValueRange>(
+                        model::DoubleValueRange(resulting_lower, resulting_upper));
+            }


Anyway, EncodedValueRange::Decode should be placed in a .cpp file

Petua41 · 2024-11-20T16:34:50Z

src/core/algorithms/nar/des/encoded_value_range.h

+                return std::make_shared<model::IntValueRange>(
+                        model::IntValueRange(resulting_lower, resulting_upper));


Suggested change

return std::make_shared<model::IntValueRange>(

model::IntValueRange(resulting_lower, resulting_upper));

return std::make_shared<model::IntValueRange>(resulting_lower, resulting_upper);

The purpose of make_shared is to forward it's parameters to IntValueRange's constructor and create IntValueRange in-place. In your code temporary IntValueRange is created and being copied (or moved) to shared_ptr. Also it's not a good idea to repeat type name.
This also applies to Double and Mixed cases

Petua41 · 2024-11-21T15:15:41Z

src/core/algorithms/nar/des/rng.cpp

+std::mt19937 RNG::rng_ = std::mt19937(kSeed);
+std::uniform_real_distribution<double> RNG::uni_ = std::uniform_real_distribution(0.0, 1.0);


Suggested change

std::mt19937 RNG::rng_ = std::mt19937(kSeed);

std::uniform_real_distribution<double> RNG::uni_ = std::uniform_real_distribution(0.0, 1.0);

std::mt19937 RNG::rng_{kSeed};

std::uniform_real_distribution<double> RNG::uni_{0.0, 1.0};

Petua41 · 2024-11-21T15:17:36Z

src/core/algorithms/nar/mining_algorithms.h

@@ -0,0 +1 @@
+#include "des/des.h"


Suggested change

#include "des/des.h"

#pragma once

#include "des/des.h"

Petua41 · 2024-11-21T15:22:21Z

src/core/algorithms/nar/nar.cpp

+    size_t antecounter = 0;
+    for (auto const& [key, value] : ante_) {
+        if (antecounter > 0) {
+            result += ", ";
+        }
+        result += std::to_string(key);
+        result += ": ";
+        result += value->ToString();
+        antecounter++;
+    }


Suggested change

size_t antecounter = 0;

for (auto const& [key, value] : ante_) {

if (antecounter > 0) {

result += ", ";

}

result += std::to_string(key);

result += ": ";

result += value->ToString();

antecounter++;

}

for (auto it{ante_.begin()}; it != ante_.end(); ++it) {

if (it != ante_.begin()) {

result += ", ";

}

result += std::to_string(it->first);

result += ": ";

result += it->second->ToString();

}

Petua41 · 2024-11-21T15:23:35Z

src/core/algorithms/nar/nar.cpp

+    size_t conscounter = 0;
+    for (auto const& [key, value] : cons_) {
+        if (conscounter > 0) {
+            result += ", ";
+        }
+        result += std::to_string(key);
+        result += ": ";
+        result += value->ToString();
+        conscounter++;
+    }


Suggested change

size_t conscounter = 0;

for (auto const& [key, value] : cons_) {

if (conscounter > 0) {

result += ", ";

}

result += std::to_string(key);

result += ": ";

result += value->ToString();

conscounter++;

}

for (auto it{cons_.begin()}; it != cons_.end(); ++it) {

if (it != cons_.begin()) {

result += ", ";

}

result += std::to_string(it->first);

result += ": ";

result += it->second->ToString();

}

Petua41 · 2024-11-21T15:29:40Z

src/core/algorithms/nar/nar.cpp

+    } else {
+        result.confidence = num_rows_fit_ante_and_cons / (double)num_rows_fit_ante;
+    }


else here makes no sense -- if branch has return at the end.

Suggested change

} else {

result.confidence = num_rows_fit_ante_and_cons / (double)num_rows_fit_ante;

}

result.confidence = num_rows_fit_ante_and_cons / (double)num_rows_fit_ante;

Also (double) is a c-style cast. Prefer using static_cast

Petua41 · 2024-11-21T16:56:50Z

src/core/algorithms/nar/value_range.cpp

+                value);  // is it okay to use a non-long double here? clion says GetValue returns
+                         // just double.


Yes, it's ok. Double is just double (it's defined exactly as using model::Double = double;), and GetValue<T> returns T

Petua41 · 2024-11-21T17:03:16Z

src/core/algorithms/nar/value_range.cpp

+DoubleValueRange::DoubleValueRange(model::TypedColumnData const& column) {
+    bool initialized = false;
+    for (size_t row_index = 0; row_index < column.GetNumRows(); row_index++) {
+        std::byte const* value = column.GetValue(row_index);
+        double double_value = model::Type::GetValue<model::Double>(
+                value);  // is it okay to use a non-long double here? clion says GetValue returns
+                         // just double.
+        if (!initialized) {
+            lower_bound = double_value;
+            upper_bound = double_value;
+            initialized = true;
+            continue;
+        }
+        if (double_value > upper_bound) {
+            upper_bound = double_value;
+        }
+        if (double_value < lower_bound) {
+            lower_bound = double_value;
+        }
+    }
+}


This method repeats IntValueRange::IntValueRange. The only difference is int_value/double_value type. I think, you can use template function like

template <typename T> CreateValueRange(...) { ... T typed_value = model::Type::GetValue<T>(value); ...

and in constructor will be CreateValueRange<model::Int>(...) and CreateValueRange<model::Double>(...).

Petua41 · 2024-11-21T17:06:30Z

src/core/algorithms/nar/value_range.cpp

+    bool initialized = false;
+    for (size_t row_index = 0; row_index < column.GetNumRows(); row_index++) {
+        std::byte const* value = column.GetValue(row_index);
+        long int int_value = model::Type::GetValue<model::Int>(value);


You assign value of type model::Int to int_value of type long. Then you assign it to lower_bound and upper_bound, both of type model::Int. Why can't int_value be of type model::Int?
Moreover, model::Int is not the same type as long -- it's alias for int_64_t, which is twice as long on some machines (see https://en.cppreference.com/w/cpp/language/types).

Suggested change

long int int_value = model::Type::GetValue<model::Int>(value);

model::Int int_value = model::Type::GetValue<model::Int>(value);

The same for DoubleValueRange

Petua41 · 2024-11-21T17:14:24Z

src/core/algorithms/nar/value_range.h

+    virtual bool Includes(std::byte const* value) const = 0;
+    virtual std::string ToString() const = 0;
+
+protected:


Style guide recommends that protected section precede public

Petua41 · 2024-11-21T17:42:25Z

src/tests/test_des.cpp

+    algorithm->Execute();
+    SUCCEED();
+}
+


Add tests, checking that algorithm works fine:

test that NARs that should hold are found

test that their qualities are calculated correctly (fitness, support, confidence)

ol-imorozko · 2024-11-22T18:52:50Z

src/core/algorithms/nar/des/encoded_value_range.h

+                return bound2;
+                break;
+            default:
+                throw std::out_of_range("Index out of range for value range.");


I think we can remove code duplication by creating a static array of four elements in private section:

static constexpr double EncodedValueRange::* member_pointers_[kFieldCount] = { &EncodedValueRange::permutation, &EncodedValueRange::threshold, &EncodedValueRange::bound1, &EncodedValueRange::bound2 };

Those are pointers to the member values

And use it like this:

double& operator[](size_t index) { if (index >= kFieldCount) { throw std::out_of_range("Index out of range for EncodedValueRange."); } return this->*(member_pointers_[index]); } const double& operator[](size_t index) const { if (index >= kFieldCount) { throw std::out_of_range("Index out of range for EncodedValueRange."); } return this->*(member_pointers_[index]); }

But that looks kinda ugly IMO. Maybe your approach is readable enough, decide for yourself

ol-imorozko · 2024-11-22T19:00:57Z

src/core/algorithms/nar/des/encoded_value_range.h

+                std::shared_ptr<model::IntValueRange> int_domain =
+                        std::static_pointer_cast<model::IntValueRange>(domain);


You can use auto when specifying type of a variable when you are performing a cast. The reason is that the type can be easily deduced by both a compiler and a person reading the code:

Suggested change

std::shared_ptr<model::IntValueRange> int_domain =

std::static_pointer_cast<model::IntValueRange>(domain);

auto int_domain = std::static_pointer_cast<model::IntValueRange>(domain);

ol-imorozko · 2024-11-22T19:01:45Z

src/core/algorithms/nar/des/encoded_value_range.h

+                std::shared_ptr<model::DoubleValueRange> double_domain =
+                        std::static_pointer_cast<model::DoubleValueRange>(domain);


same here with an auto

ol-imorozko · 2024-11-22T19:03:09Z

src/core/algorithms/nar/des/encoded_value_range.h

+            case model::TypeId::kInt: {
+                std::shared_ptr<model::IntValueRange> int_domain =
+                        std::static_pointer_cast<model::IntValueRange>(domain);
+                model::Int span = int_domain->upper_bound - int_domain->lower_bound;
+                model::Int resulting_lower = int_domain->lower_bound + span * this->bound1;
+                model::Int resulting_upper = int_domain->lower_bound + span * this->bound2;
+                if (resulting_lower > resulting_upper) {
+                    std::swap(resulting_lower, resulting_upper);
+                }
+                return std::make_shared<model::IntValueRange>(
+                        model::IntValueRange(resulting_lower, resulting_upper));
+            }
+            case model::TypeId::kDouble: {
+                std::shared_ptr<model::DoubleValueRange> double_domain =
+                        std::static_pointer_cast<model::DoubleValueRange>(domain);
+                model::Double span = double_domain->upper_bound - double_domain->lower_bound;
+                model::Double resulting_lower = double_domain->lower_bound + span * this->bound1;
+                model::Double resulting_upper = double_domain->lower_bound + span * this->bound2;
+                if (resulting_lower > resulting_upper) {
+                    std::swap(resulting_lower, resulting_upper);
+                }
+                return std::make_shared<model::DoubleValueRange>(
+                        model::DoubleValueRange(resulting_lower, resulting_upper));
+            }


also we should avoid using c-style casts: "(size_t)"

ol-imorozko · 2024-11-22T19:04:09Z

src/core/algorithms/nar/des/encoded_value_range.h

+                if (bound1 == 1.0) {
+                    result = string_vector.back();
+                } else {
+                    result = string_vector[(size_t)(span * this->bound1)];


why this->bound1? just bound1 is enough

ol-imorozko · 2024-11-22T19:36:30Z

src/core/algorithms/nar/nar.cpp

+    } else {
+        result.confidence = num_rows_fit_ante_and_cons / (double)num_rows_fit_ante;
+    }


Also (double) is a c-style cast. Prefer using static_cast

ol-imorozko · 2024-11-22T19:38:18Z

src/core/algorithms/nar/nar.cpp

+    NARQualities result;
+    if (num_rows_fit_ante == 0) {
+        result.fitness = 0.0;
+        result.confidence = 0.0;


Yes, I also think that {0.0, 0.0, 0.0} will be better. In fact, we can do that for all cases:

if (num_rows_fit_ante == 0) { return {0.0, 0.0, 0.0}; } double confidence = static_cast<double>(num_rows_fit_ante_and_cons) / num_rows_fit_ante; double support = static_cast<double>(num_rows_fit_ante_and_cons) / num_rows; if (support == 0.0) { return {0.0, 0.0, 0.0}; } double inclusion = static_cast<double>(included_features) / feature_count; double fitness = (confidence + support + inclusion) / 3.0; return {fitness, confidence, support};

ol-imorozko · 2024-11-22T19:39:58Z

src/core/algorithms/nar/nar.cpp

+// TODO: this function is way too big and cluttered
+void NAR::SetQualities(TypedRelation const* typed_relation) {
+    if (ante_.size() == 0 || cons_.size() == 0) {
+        qualities_.fitness = 0.0;


what about other fields?

qualities_ = {0.0, 0.0, 0.0};

ol-imorozko · 2024-11-22T19:46:03Z

src/core/algorithms/nar/nar.cpp

+    size_t num_rows_fit_ante = 0;
+    size_t num_rows_fit_ante_and_cons = 0;
+    for (size_t rowi = 0; rowi < typed_relation->GetNumRows(); rowi++) {


extract typed_relation->GetNumRows() to a variable since you will be using it later

ol-imorozko · 2024-11-22T19:46:32Z

src/core/algorithms/nar/nar.cpp

+    for (size_t rowi = 0; rowi < typed_relation->GetNumRows(); rowi++) {
+        bool row_fits_ante = true;
+        bool row_fits_cons = true;
+        for (size_t coli = 0; coli < typed_relation->GetNumColumns(); coli++) {


same with GetNumColumns()

ol-imorozko · 2024-11-22T19:53:19Z

src/core/algorithms/nar/nar.cpp

+            model::TypedColumnData const& column = typed_relation->GetColumnData(coli);
+            auto value = column.GetValue(rowi);
+            row_fits_ante &= AnteFitsValue(coli, value);
+            if (!row_fits_ante) {
+                break;
+            }
+            row_fits_cons &= ConsFitsValue(coli, value);
+        }
+        num_rows_fit_ante += row_fits_ante;
+        num_rows_fit_ante_and_cons += (row_fits_ante && row_fits_cons);
+    }


maybe smth like this is easier:

if (row_fits_ante) { row_fits_ante = AnteFitsValue(coli, value); } else { break; } if (row_fits_ante) { row_fits_cons &= ConsFitsValue(coli, value); } } if (row_fits_ante) { ++num_rows_fit_ante; if (row_fits_cons) { ++num_rows_fit_ante_and_cons; } }

ol-imorozko · 2024-11-22T19:59:09Z

src/core/algorithms/nar/nar.cpp

+    bool map_binds_feature = map.find(feature_index) != map.end();
+    if (!map_binds_feature) {
+        return true;
+    } else if (map.at(feature_index)->Includes(value)) {
+        return true;
+    } else {
+        return false;
+    }


I guess find here saves one extra lookup, so maybe smth like this is acceptable too

auto it = map.find(feature_index); if (it == map.end()) { return ture; } return it->second->Includes(value); }

ol-imorozko · 2024-11-22T20:08:18Z

src/core/algorithms/nar/value_range.cpp

+namespace model {
+
+StringValueRange::StringValueRange(model::TypedColumnData const& column) {
+    domain = std::vector<std::string>();


Suggested change

domain = std::vector<std::string>();

domain.reserve(column.GetNumRows());

ol-imorozko · 2024-11-22T20:09:05Z

src/core/algorithms/nar/value_range.cpp

+    domain = std::vector<std::string>();
+    for (size_t row_index = 0; row_index < column.GetNumRows(); row_index++) {
+        std::byte const* value = column.GetValue(row_index);
+        std::string string_value = model::Type::GetValue<std::string>(value);


model not needed here

ol-imorozko · 2024-11-22T20:12:33Z

src/core/algorithms/nar/value_range.cpp

+std::string IntValueRange::ToString() const {
+    std::string result;
+    result += "[";
+    result += std::to_string(lower_bound);
+    result += " - ";
+    result += std::to_string(upper_bound);
+    result += "]";
+    return result;
+}
+
+IntValueRange::IntValueRange(model::TypedColumnData const& column) {
+    bool initialized = false;
+    for (size_t row_index = 0; row_index < column.GetNumRows(); row_index++) {
+        std::byte const* value = column.GetValue(row_index);
+        long int int_value = model::Type::GetValue<model::Int>(value);


Maybe create a templated class NumericalValueRange and use smth like:

using IntValueRange = NumericValueRange<Int>; using DoubleValueRange = NumericValueRange<Double>;

like

template <typename T> class NumericValueRange : public ValueRange { public: explicit NumericValueRange(const TypedColumnData& column) { bool initialized = false; for (size_t row_index = 0; row_index < column.GetNumRows(); ++row_index) { const std::byte* value = column.GetValue(row_index); T numeric_value = Type::GetValue<T>(value); if (!initialized) { lower_bound_ = numeric_value; upper_bound_ = numeric_value; initialized = true; } else { if (numeric_value < lower_bound_) { lower_bound_ = numeric_value; } if (numeric_value > upper_bound_) { upper_bound_ = numeric_value; } } } } [[nodiscard]] std::string ToString() const override { return "[" + std::to_string(lower_bound_) + " - " + std::to_string(upper_bound_) + "]"; } [[nodiscard]] bool Includes(const std::byte* value) const override { T numeric_value = Type::GetValue<T>(value); return numeric_value >= lower_bound_ && numeric_value <= upper_bound_; } private: T lower_bound_{}; T upper_bound_{}; };

ol-imorozko · 2024-11-22T20:14:16Z

src/core/algorithms/nar/value_range.cpp

+std::string StringValueRange::ToString() const {
+    std::string result;
+    result += "[";
+    if (domain.size() > 0) {
+        result += domain[0];
+    }
+    for (size_t i = 1; i < domain.size(); i++) {
+        result += (", " + domain[i]);
+    }
+    result += "]";
+    return result;
+}


if (domain_.empty()) { return "[]"; } std::string result = "[" + domain_.front(); for (size_t i = 1; i < domain_.size(); ++i) { result += ", " + domain_[i]; } result += "]"; return result;

ol-imorozko · 2024-11-22T20:36:18Z

src/core/algorithms/nar/value_range.cpp

+        case TypeId::kString:
+            return std::make_shared<StringValueRange>(column);
+        case TypeId::kMixed:
+            return std::make_shared<StringValueRange>(column);


Use fallthrough:

case TypeId::kString: case TypeId::kMixed: return std::make_shared<StringValueRange>(column);

Petua41 · 2024-11-27T15:56:14Z

src/core/algorithms/nar/des/differential_functions.cpp

+    std::vector<size_t> ind_vec;
+    ind_vec.reserve(number_of_indices);
+    ind_vec.insert(ind_vec.end(), indices.begin(), indices.end());
+    return ind_vec;


Suggested change

std::vector<size_t> ind_vec;

ind_vec.reserve(number_of_indices);

ind_vec.insert(ind_vec.end(), indices.begin(), indices.end());

return ind_vec;

std::vector<size_t> ind_vec {indices.begin(), indices.end()};

return ind_vec;

Petua41 · 2024-11-27T15:58:23Z

src/core/algorithms/nar/des/encoded_nar.cpp

+    } else {
+        index--;
+        size_t feature = index / EncodedValueRange().kFieldCount;
+        size_t feature_field = index % EncodedValueRange().kFieldCount;
+        return encoded_value_ranges_[feature][feature_field];
+    }


Suggested change

} else {

index--;

size_t feature = index / EncodedValueRange().kFieldCount;

size_t feature_field = index % EncodedValueRange().kFieldCount;

return encoded_value_ranges_[feature][feature_field];

}

index--;

size_t feature = index / EncodedValueRange::kFieldCount;

size_t feature_field = index % EncodedValueRange::kFieldCount;

return encoded_value_ranges_[feature][feature_field];

Petua41 · 2024-11-27T16:03:32Z

src/core/algorithms/nar/des/encoded_nar.cpp

+    return qualities_;
+}
+
+NAR EncodedNAR::Decode(FeatureDomains& domains) const {


Suggested change

NAR EncodedNAR::Decode(FeatureDomains& domains) const {

NAR EncodedNAR::Decode(FeatureDomains const& domains) const {

If FeatureDomains doesn't have const operator[] (some of the stl containers may have this problem), use at -- it must have const version

VanyaVolgushev force-pushed the add-des-release branch 2 times, most recently from 66d00fc to dbbf329 Compare November 4, 2024 18:57

github-actions bot reviewed Nov 4, 2024

View reviewed changes

VanyaVolgushev force-pushed the add-des-release branch 2 times, most recently from 578470b to 4edd689 Compare November 9, 2024 18:24

alexandrsmirn requested changes Nov 19, 2024

View reviewed changes

ol-imorozko requested changes Nov 20, 2024

View reviewed changes

Petua41 reviewed Nov 20, 2024

View reviewed changes

Petua41 requested changes Nov 21, 2024

View reviewed changes

ol-imorozko requested changes Nov 22, 2024

View reviewed changes

VanyaVolgushev added 19 commits November 26, 2024 16:09

Add DES

6383a4a

Make global methods and fields go after private

023a3ff

Fix namespace

df4065c

Add newlines at ends of files

b5237f8

Add const to String var

e0decad

Fix FindFeatureDomains

55247ca

Fix GetRandomPopulationInDomains

ba0acc2

Add const ref to MutateIndividual and rename to reflect that

ec347c7

Add unsigned where required

4468f05

Remove double definition

716e82e

Add move to mutants

eb14603

Remove typedef from diff_functions.h

8538902

Remove c-stlye cast

c67c212

Add assert(number_of_indices <= population - 1)

8411e6a

Make GetRandIndices return a vector

f94b2b7

Use [ ] with samples

09a45ce

Remove unreachable break statements

7ace291

Add const to VectorSize and FeatureCount

ebab96a

Remove else from operator[]

bb56794

VanyaVolgushev added 5 commits November 27, 2024 17:01

Constexpr instead of const

f439248

Reference

0701118

Fix types in Decode

ddd6853

Fix compare_by_permutation

f888035

Add const

a72e340

VanyaVolgushev force-pushed the add-des-release branch from fdac96a to a72e340 Compare November 27, 2024 15:26

Petua41 reviewed Nov 27, 2024

View reviewed changes

		auto domain = domains[feature_index];
		auto decoded = encoded_value_ranges_[feature_index].Decode(domain);

+#include <algorithm>
+#include <cstddef>
+#include <vector>
+#include <memory>

	std::vector<EncodedValueRange> encoded_value_ranges_ = std::vector<EncodedValueRange>();
	std::vector<EncodedValueRange> encoded_value_ranges_{};

		std::shared_ptr<model::ValueRange> Decode(std::shared_ptr<model::ValueRange> domain) const {
		switch (domain->GetTypeId()) {

		return std::make_shared<model::IntValueRange>(
		model::IntValueRange(resulting_lower, resulting_upper));

		std::mt19937 RNG::rng_ = std::mt19937(kSeed);
		std::uniform_real_distribution<double> RNG::uni_ = std::uniform_real_distribution(0.0, 1.0);

		value); // is it okay to use a non-long double here? clion says GetValue returns
		// just double.

	long int int_value = model::Type::GetValue<model::Int>(value);
	model::Int int_value = model::Type::GetValue<model::Int>(value);

		std::shared_ptr<model::IntValueRange> int_domain =
		std::static_pointer_cast<model::IntValueRange>(domain);

	std::shared_ptr<model::IntValueRange> int_domain =
	std::static_pointer_cast<model::IntValueRange>(domain);
	auto int_domain = std::static_pointer_cast<model::IntValueRange>(domain);

		std::shared_ptr<model::DoubleValueRange> double_domain =
		std::static_pointer_cast<model::DoubleValueRange>(domain);

	domain = std::vector<std::string>();
	domain.reserve(column.GetNumRows());

	NAR EncodedNAR::Decode(FeatureDomains& domains) const {
	NAR EncodedNAR::Decode(FeatureDomains const& domains) const {

Add DES #490

Are you sure you want to change the base?

Add DES #490

Conversation

VanyaVolgushev commented Nov 4, 2024

github-actions bot left a comment

Choose a reason for hiding this comment

alexandrsmirn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Petua41 Nov 27, 2024 • edited Loading

Choose a reason for hiding this comment

Petua41 Nov 27, 2024 • edited Loading

Choose a reason for hiding this comment

Petua41 Nov 27, 2024 •

edited

Loading

Petua41 Nov 27, 2024 •

edited

Loading