rebuild and retest
JohnMount committed Sep 28, 2023
1 parent 6d78489 commit 48bae74
Showing 5 changed files with 54 additions and 52 deletions.
2 changes: 1 addition & 1 deletion coverage.txt
@@ -102,4 +102,4 @@ pkg/vtreat/vtreat_impl.py 711 61 91%
-------------------------------------------------------------
TOTAL 1593 126 92%

-================= 45 passed, 15 warnings in 137.81s (0:02:17) ==================
+================== 45 passed, 15 warnings in 81.34s (0:01:21) ==================
101 changes: 51 additions & 50 deletions docs/vtreat.html
@@ -114,57 +114,58 @@ <h1 class="modulename">
</span><span id="L-8"><a href="#L-8"><span class="linenos"> 8</span></a><span class="c1"># noinspection PyUnresolvedReferences</span>
</span><span id="L-9"><a href="#L-9"><span class="linenos"> 9</span></a><span class="kn">import</span> <span class="nn">numpy</span>
</span><span id="L-10"><a href="#L-10"><span class="linenos">10</span></a>
</span><span id="L-11"><a href="#L-11"><span class="linenos">11</span></a><span class="kn">from</span> <span class="nn">vtreat.vtreat_api</span> <span class="kn">import</span> <span class="o">*</span>
</span><span id="L-11"><a href="#L-11"><span class="linenos">11</span></a><span class="kn">from</span> <span class="nn">vtreat.vtreat_api</span> <span class="kn">import</span> <span class="n">unsupervised_parameters</span><span class="p">,</span> <span class="n">vtreat_parameters</span><span class="p">,</span> <span class="n">BinomialOutcomeTreatment</span><span class="p">,</span> <span class="n">MultinomialOutcomeTreatment</span><span class="p">,</span> <span class="n">NumericOutcomeTreatment</span><span class="p">,</span> <span class="n">UnsupervisedTreatment</span>
</span><span id="L-12"><a href="#L-12"><span class="linenos">12</span></a>
</span><span id="L-13"><a href="#L-13"><span class="linenos">13</span></a><span class="n">__docformat__</span> <span class="o">=</span> <span class="s2">&quot;restructuredtext&quot;</span>
</span><span id="L-14"><a href="#L-14"><span class="linenos">14</span></a><span class="n">__version__</span> <span class="o">=</span> <span class="s2">&quot;1.3.0&quot;</span>
</span><span id="L-15"><a href="#L-15"><span class="linenos">15</span></a>
</span><span id="L-16"><a href="#L-16"><span class="linenos">16</span></a><span class="vm">__doc__</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span>
</span><span id="L-17"><a href="#L-17"><span class="linenos">17</span></a><span class="s2">This&lt;https://github.com/WinVector/pyvtreat&gt; is the Python version of the vtreat data preparation system</span>
</span><span id="L-18"><a href="#L-18"><span class="linenos">18</span></a><span class="s2">(also available as an R package&lt;https://winvector.github.io/vtreat/&gt;.</span>
</span><span id="L-19"><a href="#L-19"><span class="linenos">19</span></a>
</span><span id="L-20"><a href="#L-20"><span class="linenos">20</span></a><span class="s2">vtreat is a DataFrame processor/conditioner that prepares</span>
</span><span id="L-21"><a href="#L-21"><span class="linenos">21</span></a><span class="s2">real-world data for supervised machine learning or predictive modeling</span>
</span><span id="L-22"><a href="#L-22"><span class="linenos">22</span></a><span class="s2">in a statistically sound manner.</span>
</span><span id="L-23"><a href="#L-23"><span class="linenos">23</span></a>
</span><span id="L-24"><a href="#L-24"><span class="linenos">24</span></a><span class="s2">vtreat takes an input DataFrame</span>
</span><span id="L-25"><a href="#L-25"><span class="linenos">25</span></a><span class="s2">that has a specified column called &quot;the outcome variable&quot; (or &quot;y&quot;)</span>
</span><span id="L-26"><a href="#L-26"><span class="linenos">26</span></a><span class="s2">that is the quantity to be predicted (and must not have missing</span>
</span><span id="L-27"><a href="#L-27"><span class="linenos">27</span></a><span class="s2">values). Other input columns are possible explanatory variables</span>
</span><span id="L-28"><a href="#L-28"><span class="linenos">28</span></a><span class="s2">(typically numeric or categorical/string-valued, these columns may</span>
</span><span id="L-29"><a href="#L-29"><span class="linenos">29</span></a><span class="s2">have missing values) that the user later wants to use to predict &quot;y&quot;.</span>
</span><span id="L-30"><a href="#L-30"><span class="linenos">30</span></a><span class="s2">In practice such an input DataFrame may not be immediately suitable</span>
</span><span id="L-31"><a href="#L-31"><span class="linenos">31</span></a><span class="s2">for machine learning procedures that often expect only numeric</span>
</span><span id="L-32"><a href="#L-32"><span class="linenos">32</span></a><span class="s2">explanatory variables, and may not tolerate missing values.</span>
</span><span id="L-33"><a href="#L-33"><span class="linenos">33</span></a>
</span><span id="L-34"><a href="#L-34"><span class="linenos">34</span></a><span class="s2">To solve this, vtreat builds a transformed DataFrame where all</span>
</span><span id="L-35"><a href="#L-35"><span class="linenos">35</span></a><span class="s2">explanatory variable columns have been transformed into a number of</span>
</span><span id="L-36"><a href="#L-36"><span class="linenos">36</span></a><span class="s2">numeric explanatory variable columns, without missing values. The</span>
</span><span id="L-37"><a href="#L-37"><span class="linenos">37</span></a><span class="s2">vtreat implementation produces derived numeric columns that capture</span>
</span><span id="L-38"><a href="#L-38"><span class="linenos">38</span></a><span class="s2">most of the information relating the explanatory columns to the</span>
</span><span id="L-39"><a href="#L-39"><span class="linenos">39</span></a><span class="s2">specified &quot;y&quot; or dependent/outcome column through a number of numeric</span>
</span><span id="L-40"><a href="#L-40"><span class="linenos">40</span></a><span class="s2">transforms (indicator variables, impact codes, prevalence codes, and</span>
</span><span id="L-41"><a href="#L-41"><span class="linenos">41</span></a><span class="s2">more). This transformed DataFrame is suitable for a wide range of</span>
</span><span id="L-42"><a href="#L-42"><span class="linenos">42</span></a><span class="s2">supervised learning methods from linear regression, through gradient</span>
</span><span id="L-43"><a href="#L-43"><span class="linenos">43</span></a><span class="s2">boosted machines.</span>
</span><span id="L-44"><a href="#L-44"><span class="linenos">44</span></a>
</span><span id="L-45"><a href="#L-45"><span class="linenos">45</span></a><span class="s2">The idea is: you can take a DataFrame of messy real world data and</span>
</span><span id="L-46"><a href="#L-46"><span class="linenos">46</span></a><span class="s2">easily, faithfully, reliably, and repeatably prepare it for machine</span>
</span><span id="L-47"><a href="#L-47"><span class="linenos">47</span></a><span class="s2">learning using documented methods using vtreat. Incorporating</span>
</span><span id="L-48"><a href="#L-48"><span class="linenos">48</span></a><span class="s2">vtreat into your machine learning workflow lets you quickly work</span>
</span><span id="L-49"><a href="#L-49"><span class="linenos">49</span></a><span class="s2">with very diverse structured data.</span>
</span><span id="L-50"><a href="#L-50"><span class="linenos">50</span></a>
</span><span id="L-51"><a href="#L-51"><span class="linenos">51</span></a><span class="s2">Worked examples can be found `here`&lt;https://github.com/WinVector/pyvtreat/tree/master/Examples&gt;.</span>
</span><span id="L-52"><a href="#L-52"><span class="linenos">52</span></a>
</span><span id="L-53"><a href="#L-53"><span class="linenos">53</span></a><span class="s2">For more detail please see here: `arXiv:1611.09477</span>
</span><span id="L-54"><a href="#L-54"><span class="linenos">54</span></a><span class="s2">stat.AP`&lt;https://arxiv.org/abs/1611.09477&gt; (the documentation describes the R version,</span>
</span><span id="L-55"><a href="#L-55"><span class="linenos">55</span></a><span class="s2">however all of the examples can be found worked in Python </span>
</span><span id="L-56"><a href="#L-56"><span class="linenos">56</span></a><span class="s2">`here`&lt;https://github.com/WinVector/pyvtreat/tree/master/Examples/vtreat_paper1&gt;).</span>
</span><span id="L-57"><a href="#L-57"><span class="linenos">57</span></a>
</span><span id="L-58"><a href="#L-58"><span class="linenos">58</span></a><span class="s2">vtreat is available</span>
</span><span id="L-59"><a href="#L-59"><span class="linenos">59</span></a><span class="s2">as a `Python/Pandas package`&lt;https://github.com/WinVector/vtreat&gt;,</span>
</span><span id="L-60"><a href="#L-60"><span class="linenos">60</span></a><span class="s2">and also as an `R package`&lt;https://github.com/WinVector/vtreat&gt;.</span>
</span><span id="L-61"><a href="#L-61"><span class="linenos">61</span></a><span class="s2">&quot;&quot;&quot;</span>
</span><span id="L-13"><a href="#L-13"><span class="linenos">13</span></a>
</span><span id="L-14"><a href="#L-14"><span class="linenos">14</span></a><span class="n">__docformat__</span> <span class="o">=</span> <span class="s2">&quot;restructuredtext&quot;</span>
</span><span id="L-15"><a href="#L-15"><span class="linenos">15</span></a><span class="n">__version__</span> <span class="o">=</span> <span class="s2">&quot;1.3.0&quot;</span>
</span><span id="L-16"><a href="#L-16"><span class="linenos">16</span></a>
</span><span id="L-17"><a href="#L-17"><span class="linenos">17</span></a><span class="vm">__doc__</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span>
</span><span id="L-18"><a href="#L-18"><span class="linenos">18</span></a><span class="s2">This&lt;https://github.com/WinVector/pyvtreat&gt; is the Python version of the vtreat data preparation system</span>
</span><span id="L-19"><a href="#L-19"><span class="linenos">19</span></a><span class="s2">(also available as an R package&lt;https://winvector.github.io/vtreat/&gt;.</span>
</span><span id="L-20"><a href="#L-20"><span class="linenos">20</span></a>
</span><span id="L-21"><a href="#L-21"><span class="linenos">21</span></a><span class="s2">vtreat is a DataFrame processor/conditioner that prepares</span>
</span><span id="L-22"><a href="#L-22"><span class="linenos">22</span></a><span class="s2">real-world data for supervised machine learning or predictive modeling</span>
</span><span id="L-23"><a href="#L-23"><span class="linenos">23</span></a><span class="s2">in a statistically sound manner.</span>
</span><span id="L-24"><a href="#L-24"><span class="linenos">24</span></a>
</span><span id="L-25"><a href="#L-25"><span class="linenos">25</span></a><span class="s2">vtreat takes an input DataFrame</span>
</span><span id="L-26"><a href="#L-26"><span class="linenos">26</span></a><span class="s2">that has a specified column called &quot;the outcome variable&quot; (or &quot;y&quot;)</span>
</span><span id="L-27"><a href="#L-27"><span class="linenos">27</span></a><span class="s2">that is the quantity to be predicted (and must not have missing</span>
</span><span id="L-28"><a href="#L-28"><span class="linenos">28</span></a><span class="s2">values). Other input columns are possible explanatory variables</span>
</span><span id="L-29"><a href="#L-29"><span class="linenos">29</span></a><span class="s2">(typically numeric or categorical/string-valued, these columns may</span>
</span><span id="L-30"><a href="#L-30"><span class="linenos">30</span></a><span class="s2">have missing values) that the user later wants to use to predict &quot;y&quot;.</span>
</span><span id="L-31"><a href="#L-31"><span class="linenos">31</span></a><span class="s2">In practice such an input DataFrame may not be immediately suitable</span>
</span><span id="L-32"><a href="#L-32"><span class="linenos">32</span></a><span class="s2">for machine learning procedures that often expect only numeric</span>
</span><span id="L-33"><a href="#L-33"><span class="linenos">33</span></a><span class="s2">explanatory variables, and may not tolerate missing values.</span>
</span><span id="L-34"><a href="#L-34"><span class="linenos">34</span></a>
</span><span id="L-35"><a href="#L-35"><span class="linenos">35</span></a><span class="s2">To solve this, vtreat builds a transformed DataFrame where all</span>
</span><span id="L-36"><a href="#L-36"><span class="linenos">36</span></a><span class="s2">explanatory variable columns have been transformed into a number of</span>
</span><span id="L-37"><a href="#L-37"><span class="linenos">37</span></a><span class="s2">numeric explanatory variable columns, without missing values. The</span>
</span><span id="L-38"><a href="#L-38"><span class="linenos">38</span></a><span class="s2">vtreat implementation produces derived numeric columns that capture</span>
</span><span id="L-39"><a href="#L-39"><span class="linenos">39</span></a><span class="s2">most of the information relating the explanatory columns to the</span>
</span><span id="L-40"><a href="#L-40"><span class="linenos">40</span></a><span class="s2">specified &quot;y&quot; or dependent/outcome column through a number of numeric</span>
</span><span id="L-41"><a href="#L-41"><span class="linenos">41</span></a><span class="s2">transforms (indicator variables, impact codes, prevalence codes, and</span>
</span><span id="L-42"><a href="#L-42"><span class="linenos">42</span></a><span class="s2">more). This transformed DataFrame is suitable for a wide range of</span>
</span><span id="L-43"><a href="#L-43"><span class="linenos">43</span></a><span class="s2">supervised learning methods from linear regression, through gradient</span>
</span><span id="L-44"><a href="#L-44"><span class="linenos">44</span></a><span class="s2">boosted machines.</span>
</span><span id="L-45"><a href="#L-45"><span class="linenos">45</span></a>
</span><span id="L-46"><a href="#L-46"><span class="linenos">46</span></a><span class="s2">The idea is: you can take a DataFrame of messy real world data and</span>
</span><span id="L-47"><a href="#L-47"><span class="linenos">47</span></a><span class="s2">easily, faithfully, reliably, and repeatably prepare it for machine</span>
</span><span id="L-48"><a href="#L-48"><span class="linenos">48</span></a><span class="s2">learning using documented methods using vtreat. Incorporating</span>
</span><span id="L-49"><a href="#L-49"><span class="linenos">49</span></a><span class="s2">vtreat into your machine learning workflow lets you quickly work</span>
</span><span id="L-50"><a href="#L-50"><span class="linenos">50</span></a><span class="s2">with very diverse structured data.</span>
</span><span id="L-51"><a href="#L-51"><span class="linenos">51</span></a>
</span><span id="L-52"><a href="#L-52"><span class="linenos">52</span></a><span class="s2">Worked examples can be found `here`&lt;https://github.com/WinVector/pyvtreat/tree/master/Examples&gt;.</span>
</span><span id="L-53"><a href="#L-53"><span class="linenos">53</span></a>
</span><span id="L-54"><a href="#L-54"><span class="linenos">54</span></a><span class="s2">For more detail please see here: `arXiv:1611.09477</span>
</span><span id="L-55"><a href="#L-55"><span class="linenos">55</span></a><span class="s2">stat.AP`&lt;https://arxiv.org/abs/1611.09477&gt; (the documentation describes the R version,</span>
</span><span id="L-56"><a href="#L-56"><span class="linenos">56</span></a><span class="s2">however all of the examples can be found worked in Python </span>
</span><span id="L-57"><a href="#L-57"><span class="linenos">57</span></a><span class="s2">`here`&lt;https://github.com/WinVector/pyvtreat/tree/master/Examples/vtreat_paper1&gt;).</span>
</span><span id="L-58"><a href="#L-58"><span class="linenos">58</span></a>
</span><span id="L-59"><a href="#L-59"><span class="linenos">59</span></a><span class="s2">vtreat is available</span>
</span><span id="L-60"><a href="#L-60"><span class="linenos">60</span></a><span class="s2">as a `Python/Pandas package`&lt;https://github.com/WinVector/vtreat&gt;,</span>
</span><span id="L-61"><a href="#L-61"><span class="linenos">61</span></a><span class="s2">and also as an `R package`&lt;https://github.com/WinVector/vtreat&gt;.</span>
</span><span id="L-62"><a href="#L-62"><span class="linenos">62</span></a><span class="s2">&quot;&quot;&quot;</span>
</span></pre></div>


3 changes: 2 additions & 1 deletion pkg/build/lib/vtreat/__init__.py
@@ -8,7 +8,8 @@
# noinspection PyUnresolvedReferences
import numpy

-from vtreat.vtreat_api import *
+from vtreat.vtreat_api import unsupervised_parameters, vtreat_parameters, BinomialOutcomeTreatment, MultinomialOutcomeTreatment, NumericOutcomeTreatment, UnsupervisedTreatment
+

__docformat__ = "restructuredtext"
__version__ = "1.3.0"
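
Note on the import change above: the substantive edit in this commit replaces a wildcard import with an explicit list of names. The sketch below illustrates, under stated assumptions, what that typically changes for a package namespace. It does not use vtreat itself: `fake_api`, `public_helper`, and `Treatment` are hypothetical stand-ins built in-process, and the stand-in module deliberately defines no `__all__`.

```python
import sys
import types

# Hypothetical stand-in for an API module; none of these names come from vtreat.
fake_api = types.ModuleType("fake_api")
exec(
    "import os\n"
    "def public_helper():\n"
    "    return 'helper'\n"
    "class Treatment:\n"
    "    pass\n",
    fake_api.__dict__,
)
sys.modules["fake_api"] = fake_api  # make the stand-in importable by name

# Wildcard import: with no __all__ defined, every name that does not start
# with an underscore is copied, including the incidental `os` import.
wildcard_ns = {}
exec("from fake_api import *", wildcard_ns)

# Explicit import: only the requested name is copied.
explicit_ns = {}
exec("from fake_api import Treatment", explicit_ns)

# Drop interpreter-added entries such as __builtins__ before comparing.
wildcard_names = sorted(n for n in wildcard_ns if not n.startswith("_"))
explicit_names = sorted(n for n in explicit_ns if not n.startswith("_"))

print(wildcard_names)
print(explicit_names)
```

In the stand-in, the wildcard form re-exports the helper function and the module's own `os` import alongside the class, while the explicit form exposes only the name that was asked for. Whether the real `vtreat.vtreat_api` defines `__all__` is not shown in this commit, so the exact effect on the `vtreat` namespace is not claimed here.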
Binary file modified pkg/dist/vtreat-1.3.0-py3-none-any.whl
Binary file not shown.
Binary file modified pkg/dist/vtreat-1.3.0.tar.gz
Binary file not shown.
