-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression #237
base: master
Are you sure you want to change the base?
Regression #237
Conversation
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed one example so far. I'm not entirely sure that all of these notebooks will adapt well to C++-only programs. Many of them focus on visualization and exploration, which isn't really a thing we can do just in a standalone program. So I think either they need to be really carefully gone over and all irrelevant parts removed, or maybe we leave them as notebook-only examples. Let me know what you think.
//!mkdir -p data && cat avocado.csv | sed 1d > avocado_trim.csv" | ||
//"Drop columns 1 and 2 (\"Unamed: 0\", \"Date\") as these are not required and their presence cause issues while loading the data." | ||
//!rm avocado_trim.csv" | ||
//"!mv avocado_trim2.csv avocado_trim.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this isn't a notebook, we don't have the ability to call out to the shell like this. So maybe we either indicate in the comments that we expect the user to run these commands, or we adapt download_data_set.py
to get the data into the right form first (or, the version we link to on datasets.mlpack.org is pre-adapted).
cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp
Outdated
Show resolved
Hide resolved
cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp
Outdated
Show resolved
Hide resolved
cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp
Outdated
Show resolved
Hide resolved
cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp
Outdated
Show resolved
Hide resolved
cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp
Outdated
Show resolved
Hide resolved
cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp
Outdated
Show resolved
Hide resolved
cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp
Outdated
Show resolved
Hide resolved
cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp
Outdated
Show resolved
Hide resolved
cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp
Outdated
Show resolved
Hide resolved
I agree, I am trying to have at least one usage from the examples at least per method, because we have so many methods and really few examples. I know if someone looks at the docs they would find some examples from them, but I agree, no need anymore to import the old ones, we can be more creative and have new ones. Let us merge these three once I have solved the comments, I have tested them locally and I know they work, and they are old, as I did them one month ago or even more |
…rediction.cpp Co-authored-by: Ryan Curtin <[email protected]>
…rediction.cpp Co-authored-by: Ryan Curtin <[email protected]>
…rediction.cpp Co-authored-by: Ryan Curtin <[email protected]>
Sounds good to me---my primary concern is just making sure that all the comments in the file next to the code (which is what users will actually be looking at) make sense and don't reference things specific to the notebook but not in this version. 👍 |
Also, looks like the steps in |
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
Signed-off-by: Omar Shrit <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. In the longer term I am a little worried that we will have too many similar examples that will cause users to get lost, but I think we can sort that out later. I left a comment to that effect in the review, but up to you how you want to handle it (and if the answer is let's merge all three for now and sort it out later, that's fine with me).
default: all | ||
|
||
$(TARGET): $(OBJS) | ||
$(CXX) $(OBJS) -o $(TARGET) $(LDFLAGS) $(LIBS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to get CXXFLAGS
in here too?
* which can later be unmapped to strings. | ||
*/ | ||
/** PLEASE, delete the header of the dataset once you have downloaded the | ||
* datset to your data/ directory. **/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely we should support headers soon, but for now should we modify the dataset downloader to strip the header?
# default rule | ||
default: all | ||
|
||
$(TARGET): $(OBJS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here with CXXFLAGS
@@ -0,0 +1,138 @@ | |||
/** | |||
* Predicting California House Prices with Linear Regression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-wise, this is basically the same example as the avocado one, but with a different dataset. Do you think it is worth including? I think quality over quantity is better, so if this is not demonstrating some new or different functionality vs. the avocado example, I would vote to only keep one of the two.
# for the BLAS and LAPACK libraries that you are using. | ||
|
||
TARGET := salary_prediction | ||
SRC := salary-prediction.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably rename this to salary_prediction.cpp
for consistency.
ungzip("avocado.csv.gz", "avocado.csv") | ||
avocado_data = pull_csv("avocado.csv") | ||
avocado_data = avocado_data.iloc[:, 2:] | ||
avocado_data.to_csv("avocado.csv", index=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or are you already stripping the header here? If so, it would probably be better to omit the comments from the files and work on adding header support to data::Load()
quickly, then we don't have to come back to the comments here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need the header support for data::Load
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree, but what I mean in this comment is, it looks like you are stripping the header from the avocado dataset and also other datasets. So I think we can remove the warning comments in the code that say 'make sure the header is removed', because it is already removed here---and also because we will be adding header removal support very soon to data::Load()
.
* Once the model is trained, we will be able to do some sample predictions. | ||
*/ | ||
#include <mlpack.hpp> | ||
#include <cmath> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this include is unnecessary.
Hi,
Ported a couple of examples of regression examples mostly linear