IMP: Add *_samples_ncv pipelines #177

Closed

Oddant1 wants to merge 1 commit into qiime2:master from Oddant1:ncv_to_pipeline

Member

Oddant1 commented Sep 2, 2019

Closes #160. Initial stab at implementing the *_samples_ncv pipelines.


IMP: Add *_samples_ncv pipelines

d8db176

Oddant1 assigned nbokulich

nbokulich requested changes

Member

nbokulich left a comment

@Oddant1 thanks for putting this together. This is a good start, you have the basic workflow details in place, but a lot more work needs to be done. See the in-line comments.

If you plan to proceed, please also:

- Register these actions in plugin_setup.py.
- Write a basic test for each pipeline (just to make sure they work). Create some toy arrays for this with NumPy; do not use the real datasets we currently have in there, because we need to cut down on runtime.
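As a rough illustration of the toy-data suggestion, the sketch below builds a small random count array plus a categorical and a numeric target with NumPy. The helper name make_toy_data, its sizes, and its seed are hypothetical. Wrapping the array for the real actions is assumed and not shown; with biom it would likely be biom.Table(counts.T, feature_ids, sample_ids), since biom stores features as rows.

```python
import numpy as np


def make_toy_data(n_samples=20, n_features=5, seed=0):
    """Build small random inputs for smoke-testing the *_samples_ncv pipelines.

    Hypothetical helper: returns a samples x features count array, sample
    and feature IDs, a two-class categorical target, and a numeric target.
    """
    rng = np.random.default_rng(seed)  # fixed seed keeps tests reproducible
    counts = rng.integers(0, 50, size=(n_samples, n_features))
    sample_ids = [f"s{i}" for i in range(n_samples)]
    feature_ids = [f"f{j}" for j in range(n_features)]
    # Alternate the two classes so both appear even in tiny datasets.
    classes = np.array(["a", "b"])[np.arange(n_samples) % 2]
    # Numeric target loosely tied to the first feature, plus noise.
    numeric = 0.5 * counts[:, 0] + rng.normal(0.0, 1.0, size=n_samples)
    return counts, sample_ids, feature_ids, classes, numeric


counts, sample_ids, feature_ids, classes, numeric = make_toy_data()
```

Twenty samples by five features keeps each pipeline test fast while still giving both classes several samples.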

q2_sample_classifier/classify.py

		@@ -323,6 +323,68 @@ def regress_samples_ncv(
		return y_pred, importances


		def regress_samples_ncv_piepline(

Member

nbokulich Sep 5, 2019

Misspelled: "piepline" should be "pipeline".

q2_sample_classifier/classify.py

@@ -323,6 +323,68 @@ def regress_samples_ncv(
                   return y_pred, importances
+              def regress_samples_ncv_piepline(
+                      ctx, table: biom.Table, metadata: qiime2.NumericMetadataColumn,

Member

nbokulich Sep 5, 2019

Pipelines should not use type annotations like this.

Member

nbokulich Sep 5, 2019

See classify_samples for an example.
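To make the annotation point concrete: as we understand QIIME 2, annotations on a method tell the framework which view (such as biom.Table) to convert an input artifact into, whereas a pipeline receives artifacts and hands them to other actions, with semantic types declared only at registration in plugin_setup.py. The sketch below contrasts the two signatures and checks them with inspect. The bodies are elided, the annotations are strings so it runs without biom or qiime2 installed, and has_annotations is a hypothetical helper.

```python
import inspect


# Hypothetical *method*: its annotations name the view types the framework
# converts input artifacts into. Strings keep this runnable without biom/qiime2.
def regress_samples_ncv(table: "biom.Table",
                        metadata: "qiime2.NumericMetadataColumn",
                        parameter_tuning: bool = False
                        ) -> ("pd.Series", "pd.DataFrame"):
    ...


# Hypothetical *pipeline*: takes ctx plus artifacts and carries no parameter
# or return annotations; types would be declared only at registration time.
def regress_samples_ncv_pipeline(ctx, table, metadata, parameter_tuning=False):
    ...


def has_annotations(func):
    """Return True if any parameter or the return value is annotated."""
    sig = inspect.signature(func)
    annotated_params = any(p.annotation is not inspect.Parameter.empty
                           for p in sig.parameters.values())
    annotated_return = sig.return_annotation is not inspect.Signature.empty
    return annotated_params or annotated_return
```

A check like has_annotations could sit in a pipeline's unit test to catch annotations slipping back in.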

q2_sample_classifier/classify.py

+                      estimator: str = defaults['estimator_r'], stratify: str = False,
+                      parameter_tuning: bool = False,
+                      missing_samples: str = defaults['missing_samples']
+                      ) -> (pd.Series, pd.DataFrame):

Member

nbokulich Sep 5, 2019

These output annotations do not match what is actually returned.

But more importantly, pipelines should not include return annotations. See classify_samples for an example.

q2_sample_classifier/classify.py

+                      missing_samples: str = defaults['missing_samples']
+                      ) -> (pd.Series, pd.DataFrame):
+                  y_pred, importances, probabilities = nested_cross_validation(

Member

nbokulich Sep 5, 2019

Use ctx.get_action to get classify_samples_ncv; do not call nested_cross_validation directly.
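A minimal sketch of the get-action pattern for the regression pipeline, assuming it would fetch the matching regress_samples_ncv method (classify_samples_ncv being the classification counterpart). A unittest.mock object stands in for QIIME 2's ctx, so the smoke test only checks which action is requested and how it is called; the parameter list and the "error" default for missing_samples are placeholders, not the plugin's actual defaults.

```python
from unittest import mock


def regress_samples_ncv_pipeline(ctx, table, metadata,
                                 parameter_tuning=False,
                                 missing_samples="error"):
    # Fetch the registered method through the pipeline context rather than
    # calling nested_cross_validation() directly.
    ncv = ctx.get_action("sample_classifier", "regress_samples_ncv")
    predictions, importances = ncv(table, metadata,
                                   parameter_tuning=parameter_tuning,
                                   missing_samples=missing_samples)
    return predictions, importances


# Smoke test: a MagicMock stands in for ctx, so no QIIME 2 install is needed.
ctx = mock.MagicMock()
ctx.get_action.return_value = mock.MagicMock(return_value=("preds", "imps"))
result = regress_samples_ncv_pipeline(ctx, "table", "metadata")
```

Going through get_action means the nested cross-validation runs as a registered action inside the pipeline rather than as an untracked internal call.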

q2_sample_classifier/classify.py



		def classify_samples_ncv_pipeline(
		ctx, table: biom.Table, metadata: qiime2.CategoricalMetadataColumn,

Member

nbokulich Sep 5, 2019

Remove the type annotations.

q2_sample_classifier/classify.py

+                      missing_samples: str = defaults['missing_samples']
+                      ) -> (pd.Series, pd.DataFrame, pd.DataFrame):
+                  y_pred, importances, probabilities = nested_cross_validation(

Member

nbokulich Sep 5, 2019

Use ctx.get_action; do not call nested_cross_validation directly.

q2_sample_classifier/classify.py

+                      stratify=True, parameter_tuning=parameter_tuning, classification=False,
+                      scoring=accuracy_score, missing_samples=missing_samples)
+                  split = ctx.get_action('sample_classifier', 'split_table')

Member

nbokulich Sep 5, 2019

We do NOT want split or fit here; that's why classify_samples_ncv should be called. Remove these.
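Putting these review comments together, a classification pipeline might reduce to three actions (classify_samples_ncv, confusion_matrix and heatmap) with no split or fit step. The sketch below is an outline under stated assumptions rather than the eventual implementation: StubContext is a hypothetical stand-in for ctx, the stub actions return placeholder strings, and the keyword arguments mirror the diff in this PR.

```python
class StubContext:
    """Hypothetical stand-in for a QIIME 2 pipeline context.

    Maps (plugin, action) names to plain callables and records which
    actions the pipeline asked for, in order.
    """

    def __init__(self, actions):
        self._actions = actions
        self.requested = []

    def get_action(self, plugin, action):
        self.requested.append(action)
        return self._actions[(plugin, action)]


def classify_samples_ncv_pipeline(ctx, table, metadata,
                                  parameter_tuning=False,
                                  missing_samples="error"):
    ncv = ctx.get_action("sample_classifier", "classify_samples_ncv")
    confusion = ctx.get_action("sample_classifier", "confusion_matrix")
    heat = ctx.get_action("sample_classifier", "heatmap")

    # Nested cross-validation replaces the split + fit steps entirely.
    predictions, importances, probabilities = ncv(
        table, metadata, parameter_tuning=parameter_tuning,
        missing_samples=missing_samples)
    # missing_samples="ignore" downstream, following the review comments.
    accuracy_results, = confusion(predictions, metadata, probabilities,
                                  missing_samples="ignore")
    heatmap, _ = heat(table, importances, sample_metadata=metadata,
                      group_samples=True, missing_samples="ignore")
    return predictions, importances, probabilities, accuracy_results, heatmap


# Placeholder actions that only return fixed strings.
actions = {
    ("sample_classifier", "classify_samples_ncv"):
        lambda table, md, **kw: ("preds", "imps", "probs"),
    ("sample_classifier", "confusion_matrix"):
        lambda preds, md, probs, **kw: ("accuracy_viz",),
    ("sample_classifier", "heatmap"):
        lambda table, imps, **kw: ("heatmap_viz", "second_output"),
}
ctx = StubContext(actions)
outputs = classify_samples_ncv_pipeline(ctx, "table", "metadata")
```

Recording the requested action names makes it easy for a test to confirm that split_table is never fetched.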

q2_sample_classifier/classify.py

+                  X_train, X_test = split(table, metadata, test_size, random_state,
+                                          stratify=True, missing_samples=missing_samples)
+                  sample_estimator, importance = fit(

Member

nbokulich Sep 5, 2019

Remove.

q2_sample_classifier/classify.py

+                  confusion = ctx.get_action('sample_classifier', 'confusion_matrix')
+                  heat = ctx.get_action('sample_classifier', 'heatmap')
+                  X_train, X_test = split(table, metadata, test_size, random_state,

Member

nbokulich Sep 5, 2019

Remove.

q2_sample_classifier/classify.py

+                  accuracy_results, = confusion(y_pred, metadata, probabilities,
+                                                missing_samples='ignore')
+                  _heatmap, _ = heat(table, importance, sample_metadata=metadata,
+                                     group_samples=True, missing_samples=missing_samples)

Member

nbokulich Sep 5, 2019

I think you can set missing_samples='ignore' here, since the importances should always be a subset of the table's features. The same goes for the classify_samples pipeline, if I did not catch that before.
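The reasoning behind missing_samples='ignore' can be illustrated with a hypothetical helper that mimics an "error"/"ignore" policy. When every required ID is already present (as with importance features drawn from the table), both policies give the same result, so "ignore" only skips a check that cannot fail. The helper and the feature IDs are illustrative and are not q2-sample-classifier code.

```python
def check_missing(required_ids, available_ids, missing_samples="error"):
    """Hypothetical mimic of an 'error'/'ignore' policy for missing IDs.

    'error' raises if any required ID is absent; 'ignore' silently keeps
    only the IDs that are present, preserving their original order.
    """
    if missing_samples not in ("error", "ignore"):
        raise ValueError(f"unknown policy: {missing_samples!r}")
    available = set(available_ids)
    missing = [i for i in required_ids if i not in available]
    if missing and missing_samples == "error":
        raise ValueError(f"missing IDs: {missing}")
    return [i for i in required_ids if i in available]


# Importance features come from the table, so nothing can be missing and
# both policies return the same IDs.
table_features = ["f0", "f1", "f2", "f3"]
importance_features = ["f2", "f0"]
strict = check_missing(importance_features, table_features, "error")
lenient = check_missing(importance_features, table_features, "ignore")
```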

nbokulich assigned Oddant1 and unassigned nbokulich

Oddant1 closed this

Member Author

Oddant1 commented Sep 6, 2019

This PR is being closed, and issue #160 is being deferred to @nbokulich.

Oddant1 deleted the ncv_to_pipeline branch

June 25, 2020 20:40
