
Commit

build based on 16935cc
Documenter.jl committed Oct 5, 2023
1 parent 9ba3e2c commit b9b6843
Showing 11 changed files with 20 additions and 18 deletions.
2 changes: 1 addition & 1 deletion dev/.documenter-siteinfo.json
@@ -1 +1 @@
{"documenter":{"julia_version":"1.6.7","generation_timestamp":"2023-10-05T03:24:51","documenter_version":"1.1.0"}}
{"documenter":{"julia_version":"1.6.7","generation_timestamp":"2023-10-05T04:04:01","documenter_version":"1.1.0"}}
4 changes: 2 additions & 2 deletions dev/api/index.html

Large diffs are not rendered by default.

8 changes: 5 additions & 3 deletions dev/index.html
@@ -19,15 +19,17 @@
config = EvoTreeRegressor(rowsample=0.5, rng=123)
m2 = fit_evotree(config, df; target_name=&quot;y&quot;);</code></pre><p>However, the following <code>m1</code> and <code>m2</code> models won&#39;t be because the there&#39;s stochasticity involved in the model from <code>rowsample</code> and the random generator in the <code>config</code> isn&#39;t reset between the fits:</p><pre><code class="language-julia hljs">config = EvoTreeRegressor(rowsample=0.5, rng=123)
m1 = fit_evotree(config, df; target_name=&quot;y&quot;);
m2 = fit_evotree(config, df; target_name=&quot;y&quot;);</code></pre><p>Note that in presence of multiple identical or very highly correlated features, model may not be reproducible if features are permuted since in situation where 2 features provide identical gains, the first one will be selected. Therefore, if the identity relationship doesn&#39;t hold on new data, different predictions will be returned from models trained on different features order. </p><p>At the moment, there&#39;s no reproducibility guarantee on GPU, although this may change in the future. </p><h2 id="Missing-values"><a class="docs-heading-anchor" href="#Missing-values">Missing values</a><a id="Missing-values-1"></a><a class="docs-heading-anchor-permalink" href="#Missing-values" title="Permalink"></a></h2><h3 id="Features"><a class="docs-heading-anchor" href="#Features">Features</a><a id="Features-1"></a><a class="docs-heading-anchor-permalink" href="#Features" title="Permalink"></a></h3><p>EvoTrees does not handle features having missing values. Proper preprocessing of the data is therefore needed (and a general good practice regardless of the ML model used).</p><p>This includes situations where values may be all non-missing, but where the <code>eltype</code> is the form <code>Union{Missing,Float64}</code>. A conversion the types using <code>identity</code> is recommended: </p><pre><code class="language-julia hljs">julia&gt; x = Vector{Union{Missing, Float64}}([1, 2])
m2 = fit_evotree(config, df; target_name=&quot;y&quot;);</code></pre><p>Note that in presence of multiple identical or very highly correlated features, model may not be reproducible if features are permuted since in situation where 2 features provide identical gains, the first one will be selected. Therefore, if the identity relationship doesn&#39;t hold on new data, different predictions will be returned from models trained on different features order. </p><p>At the moment, there&#39;s no reproducibility guarantee on GPU, although this may change in the future. </p><h2 id="Missing-values"><a class="docs-heading-anchor" href="#Missing-values">Missing values</a><a id="Missing-values-1"></a><a class="docs-heading-anchor-permalink" href="#Missing-values" title="Permalink"></a></h2><h3 id="Features"><a class="docs-heading-anchor" href="#Features">Features</a><a id="Features-1"></a><a class="docs-heading-anchor-permalink" href="#Features" title="Permalink"></a></h3><p>EvoTrees does not handle features having missing values. Proper preprocessing of the data is therefore needed (and a general good practice regardless of the ML model used).</p><p>This includes situations where values may be all non-missing, but where the <code>eltype</code> is <code>Union{Missing,Float64}</code> or <code>Any</code> for example. A conversion using <code>identity</code> is then recommended: </p><pre><code class="language-julia hljs">julia&gt; x = Vector{Union{Missing, Float64}}([1, 2])
2-element Vector{Union{Missing, Float64}}:
1.0
2.0

julia&gt; identity.(x)
2-element Vector{Float64}:
1.0
2.0</code></pre><p>For dealing with numerical or ordered categorical features containing missing values, a common approach is to first create an <code>Bool</code> indicator variable capturing the info on whether a value is missing:</p><pre><code class="language-julia hljs">transform!(df, :my_feat =&gt; ByRow(ismissing) =&gt; :my_feat_ismissing)</code></pre><p>Then, the missing values can be imputed (replaced by some default values such as <code>mean</code> or <code>median</code>, or using a more sophisticated approach such as predictions from another model):</p><pre><code class="language-julia hljs">transform!(df, :my_feat =&gt; (x -&gt; coalesce.(x, median(skipmissing(x)))) =&gt; :my_feat);</code></pre><p>For unordered categorical variables, a recode of the missing into a non missing level is sufficient:</p><pre><code class="language-julia hljs">julia&gt; x = categorical([&quot;a&quot;, &quot;b&quot;, missing])
2.0</code></pre><p>For dealing with numerical or ordered categorical features containing missing values, a common approach is to first create an <code>Bool</code> variable capturing the info on whether a value is missing:</p><pre><code class="language-julia hljs">using DataFrames
transform!(df, :my_feat =&gt; ByRow(ismissing) =&gt; :my_feat_ismissing)</code></pre><p>Then, the missing values can be imputed (replaced by some default values such as <code>mean</code> or <code>median</code>, or using a more sophisticated approach such as predictions from another model):</p><pre><code class="language-julia hljs">transform!(df, :my_feat =&gt; (x -&gt; coalesce.(x, median(skipmissing(x)))) =&gt; :my_feat);</code></pre><p>For unordered categorical variables, a recode of the missing into a non missing level is sufficient:</p><pre><code class="language-julia hljs">using CategoricalArrays
julia&gt; x = categorical([&quot;a&quot;, &quot;b&quot;, missing])
3-element CategoricalArray{Union{Missing, String},1,UInt32}:
&quot;a&quot;
&quot;b&quot;
@@ -38,4 +40,4 @@
&quot;a&quot;
&quot;b&quot;
&quot;missing value&quot;</code></pre><h3 id="Target"><a class="docs-heading-anchor" href="#Target">Target</a><a id="Target-1"></a><a class="docs-heading-anchor-permalink" href="#Target" title="Permalink"></a></h3><p>Target variable must have its element type <code>&lt;:Real</code>. Only exception is for <code>EvoTreeClassifier</code> for which <code>CategoricalValue</code>, <code>Integer</code>, <code>String</code> and <code>Char</code> are supported.</p><h2 id="Save/Load"><a class="docs-heading-anchor" href="#Save/Load">Save/Load</a><a id="Save/Load-1"></a><a class="docs-heading-anchor-permalink" href="#Save/Load" title="Permalink"></a></h2><pre><code class="language-julia hljs">EvoTrees.save(m, &quot;data/model.bson&quot;)
m = EvoTrees.load(&quot;data/model.bson&quot;);</code></pre></article><nav class="docs-footer"><a class="docs-footer-nextpage" href="models/">Models »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="auto">Automatic (OS)</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.1.0 on <span class="colophon-date" title="Thursday 5 October 2023 03:24">Thursday 5 October 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
m = EvoTrees.load(&quot;data/model.bson&quot;);</code></pre></article><nav class="docs-footer"><a class="docs-footer-nextpage" href="models/">Models »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option><option value="auto">Automatic (OS)</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 1.1.0 on <span class="colophon-date" title="Thursday 5 October 2023 04:04">Thursday 5 October 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>