Regression #237

shrit · 2024-09-29T18:53:00Z

Hi,

I ported a couple of regression examples, mostly linear regression.

Signed-off-by: Omar Shrit <[email protected]>

github-actions · 2024-09-29T18:53:11Z

👈 Launch a binder notebook on branch shrit/examples/logistic

rcurtin

I only reviewed one example so far. I'm not entirely sure that all of these notebooks will adapt well to C++-only programs. Many of them focus on visualization and exploration, which isn't really a thing we can do just in a standalone program. So I think either they need to be really carefully gone over and all irrelevant parts removed, or maybe we leave them as notebook-only examples. Let me know what you think.

rcurtin · 2024-09-30T19:45:01Z

cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp

+//!mkdir -p data && cat avocado.csv | sed 1d > avocado_trim.csv"
+//"Drop columns 1 and 2 (\"Unamed: 0\", \"Date\") as these are not required and their presence cause issues while loading the data."
+//!rm avocado_trim.csv"
+//"!mv avocado_trim2.csv avocado_trim.csv"


Since this isn't a notebook, we don't have the ability to call out to the shell like this. So maybe we either indicate in the comments that we expect the user to run these commands, or we adapt download_data_set.py to get the data into the right form first (or, the version we link to on datasets.mlpack.org is pre-adapted).
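As a rough illustration of the second option, the preprocessing that the notebook did with shell commands (dropping the header line and the first two columns) could be folded into a Python helper along these lines. This is only a sketch using the standard library: `trim_csv`, its parameters, and the sample rows are hypothetical and are not taken from download_data_set.py.

```python
import csv
import os
import tempfile


def trim_csv(src, dst, drop_leading_columns=2, drop_header=True):
    """Copy src to dst, skipping the header row and the leading columns.

    Hypothetical helper: src and dst must be different paths, because
    opening the same file for reading and writing would truncate it.
    """
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout, lineterminator="\n")
        if drop_header:
            next(reader, None)  # same effect as `sed 1d` in the notebook
        for row in reader:
            writer.writerow(row[drop_leading_columns:])


if __name__ == "__main__":
    # Made-up sample rows, only to show the helper running end to end.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "sample.csv")
    dst = os.path.join(workdir, "sample_trim.csv")
    with open(src, "w", newline="") as f:
        f.write("idx,Date,price,volume\n0,2015-12-27,1.33,64236.62\n")
    trim_csv(src, dst)
    with open(dst) as f:
        print(f.read(), end="")
```

Doing this once in the download step would mean the C++ example only has to load a file that is already in the expected shape.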

shrit · 2024-10-01T17:56:04Z

I agree. I am trying to have at least one usage example per method, because we have so many methods and very few examples.

I know that someone looking at the docs would find some examples there, but I agree that there is no need to import the old ones anymore; we can be more creative and write new ones.

Let us merge these three once I have addressed the comments. I have tested them locally and I know they work, and they are old: I wrote them a month ago or even more.

rcurtin · 2024-10-02T22:13:50Z

> Let us merge these three once I have addressed the comments. I have tested them locally and I know they work, and they are old: I wrote them a month ago or even more.

Sounds good to me---my primary concern is just making sure that all the comments in the file next to the code (which is what users will actually be looking at) make sense and don't reference things specific to the notebook but not in this version. 👍

rcurtin · 2024-10-02T22:15:04Z

Also, looks like the steps in .ci/ just need slight modification to also install Pandas.

rcurtin

Looks good overall. In the longer term I am a little worried that we will have too many similar examples that will cause users to get lost, but I think we can sort that out later. I left a comment to that effect in the review, but up to you how you want to handle it (and if the answer is let's merge all three for now and sort it out later, that's fine with me).

rcurtin · 2024-10-21T21:02:52Z

cpp/linear_regression/avocado_price_prediction/Makefile

+default: all
+
+$(TARGET): $(OBJS)
+	$(CXX) $(OBJS) -o $(TARGET) $(LDFLAGS) $(LIBS)


Do we need to get CXXFLAGS in here too?

rcurtin · 2024-10-21T21:04:05Z

cpp/linear_regression/avocado_price_prediction/avocado_price_prediction.cpp

+   * which can later be unmapped to strings.
+  */
+  /** PLEASE, delete the header of the dataset once you have downloaded the
+   * datset to your data/ directory. **/


Definitely we should support headers soon, but for now should we modify the dataset downloader to strip the header?

rcurtin · 2024-10-21T21:05:44Z

cpp/linear_regression/california_housing_price_prediction/Makefile

+# default rule
+default: all
+
+$(TARGET): $(OBJS)


Same here with CXXFLAGS

rcurtin · 2024-10-21T21:08:39Z

cpp/linear_regression/california_housing_price_prediction/california_house_price_prediction.cpp

@@ -0,0 +1,138 @@
+/**
+* Predicting  California House Prices with  Linear Regression


Code-wise, this is basically the same example as the avocado one, but with a different dataset. Do you think it is worth including? I think quality over quantity is better, so if this is not demonstrating some new or different functionality vs. the avocado example, I would vote to only keep one of the two.

rcurtin · 2024-10-21T21:09:14Z

cpp/linear_regression/salary_prediction/Makefile

+# for the BLAS and LAPACK libraries that you are using.
+
+TARGET := salary_prediction 
+SRC := salary-prediction.cpp


We should probably rename this to salary_prediction.cpp for consistency.

rcurtin · 2024-10-21T21:11:49Z

scripts/download_data_set.py

+    ungzip("avocado.csv.gz", "avocado.csv")
+    avocado_data = pull_csv("avocado.csv")
+    avocado_data = avocado_data.iloc[:, 2:]
+    avocado_data.to_csv("avocado.csv", index=False)


Or are you already stripping the header here? If so, it would probably be better to omit the comments from the files and work on adding header support to data::Load() quickly, then we don't have to come back to the comments here.

shrit · 2024-10-22

We need the header support for data::Load.

rcurtin · 2024-10-25

Totally agree, but what I mean in this comment is, it looks like you are stripping the header from the avocado dataset and also other datasets. So I think we can remove the warning comments in the code that say 'make sure the header is removed', because it is already removed here---and also because we will be adding header removal support very soon to data::Load().
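One thing that may be worth double-checking before removing the warnings: as far as I know, pandas `DataFrame.to_csv(..., index=False)` only omits the row index and still writes the column names as the first line, and `header=False` is what leaves them out. A small sanity check such as the sketch below could confirm that a downloaded file really has no header. The function name and the heuristic are assumptions made for illustration: a first row is treated as a header when none of its fields parses as a number, which only works if every data row has at least one numeric column.

```python
import csv
import io


def first_row_is_header(text):
    """Guess whether the first CSV row is a header.

    Heuristic (an assumption, not an mlpack or pandas rule): the row is a
    header if it has fields and none of them parses as a float.
    """
    first = next(csv.reader(io.StringIO(text)), None)
    if not first:
        return False  # empty input or blank first line: nothing to strip
    for field in first:
        try:
            float(field)
            return False  # at least one numeric field, so it looks like data
        except ValueError:
            continue
    return True


if __name__ == "__main__":
    # Made-up rows: one file with a header line, one without.
    print(first_row_is_header("price,type\n1.33,conventional\n"))
    print(first_row_is_header("1.33,conventional\n"))
```

A check like this could run at the end of the download step and fail loudly if a header is still present, so the example sources would not need to carry the reminder.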

rcurtin · 2024-10-21T21:12:31Z

cpp/linear_regression/salary_prediction/salary-prediction.cpp

+ * Once the model is trained, we will be able to do some sample predictions.
+*/
+#include <mlpack.hpp>
+#include <cmath>


I think this include is unnecessary.

shrit added 10 commits August 4, 2024 19:19

Add avocado price prediction linear regression example

6b69c32

Signed-off-by: Omar Shrit <[email protected]>

Adding the avocado dataset download script

940f0dc

Signed-off-by: Omar Shrit <[email protected]>

Trim the first two columns with pandas

739b4ae

Signed-off-by: Omar Shrit <[email protected]>

California housing

a1bfda1

Signed-off-by: Omar Shrit <[email protected]>

Adding salary prediction

25b616d

Signed-off-by: Omar Shrit <[email protected]>

Add a make file make it compile

d83aafb

Signed-off-by: Omar Shrit <[email protected]>

Adding another example

a0a770c

Signed-off-by: Omar Shrit <[email protected]>

Compiling..

f04f9d4

Signed-off-by: Omar Shrit <[email protected]>

Salary prediction example functional

a283367

Signed-off-by: Omar Shrit <[email protected]>

Adding the california housing price prediction example

76f3ed5

Signed-off-by: Omar Shrit <[email protected]>

Merge branch 'master' into logistic

50b69ec

Signed-off-by: Omar Shrit <[email protected]>

rcurtin reviewed Sep 30, 2024

shrit and others added 3 commits October 2, 2024 17:59

Update cpp/linear_regression/avocado_price_prediction/avocado_price_p…

56064a5

…rediction.cpp Co-authored-by: Ryan Curtin <[email protected]>

Update cpp/linear_regression/avocado_price_prediction/avocado_price_p…

b005eca

…rediction.cpp Co-authored-by: Ryan Curtin <[email protected]>

Update cpp/linear_regression/avocado_price_prediction/avocado_price_p…

529d66b

…rediction.cpp Co-authored-by: Ryan Curtin <[email protected]>

shrit added 5 commits October 4, 2024 12:59

Fix the example and make it run

3a25215

Signed-off-by: Omar Shrit <[email protected]>

Fix the bugs in california housing

f541e41

Signed-off-by: Omar Shrit <[email protected]>

Clean a bit this example

7248b44

Signed-off-by: Omar Shrit <[email protected]>

Add pandas

502b717

Signed-off-by: Omar Shrit <[email protected]>

Change armadillo version

2dbbb97

Signed-off-by: Omar Shrit <[email protected]>

rcurtin reviewed Oct 21, 2024

github-actions bot added the s: stale label Nov 25, 2024
