Skip to content

Commit

Permalink
Create rule S6974: Subclasses of Scikit-Learn's "BaseEstimator" shoul…
Browse files Browse the repository at this point in the history
…d not set attributes ending with "_" in the "__init__" method (#3891)
  • Loading branch information
github-actions[bot] authored May 7, 2024
1 parent 86d6b7c commit 2ad4854
Show file tree
Hide file tree
Showing 3 changed files with 116 additions and 0 deletions.
2 changes: 2 additions & 0 deletions rules/S6974/metadata.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
{
}
24 changes: 24 additions & 0 deletions rules/S6974/python/metadata.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
{
"title": "Subclasses of Scikit-Learn's \"BaseEstimator\" should not set attributes ending with \"_\" in the \"__init__\" method",
"type": "CODE_SMELL",
"status": "ready",
"remediation": {
"func": "Constant\/Issue",
"constantCost": "5min"
},
"tags": [
],
"defaultSeverity": "Major",
"ruleSpecification": "RSPEC-6974",
"sqKey": "S6974",
"scope": "All",
"defaultQualityProfiles": ["Sonar way"],
"quickfix": "partial",
"code": {
"impacts": {
"MAINTAINABILITY": "MEDIUM",
"RELIABILITY": "HIGH"
},
"attribute": "CONVENTIONAL"
}
}
90 changes: 90 additions & 0 deletions rules/S6974/python/rule.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
This rule raises an issue when an attributes ending with ``++_++`` is set in the ``++__init__++`` method of a class inheriting from
Scikit-Learn `BaseEstimator`


== Why is this an issue?

On a Scikit-Learn estimator, attributes that have a trailing underscore represents attributes that are estimated.
These attributes have to be set in the fit method. Their presence are used to verify if an estimator has been fitted.

[source,python]
----
from sklearn.neighbors import KNeighborsClassifier
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
knn.n_samples_fit_
----

In the example above the attributes of the `KNeighborsClassifier`, ``++n_samples_fit_++``
is set only after the estimator's `fit` method is called. Calling ``++n_samples_fit_++`` before the estimator is fitted
would raise an `AttributeError` exception.

When implementing a custom estimator by subclassing Scikit-Learn's `BaseEstimator`,
it is important to follow the above convention and not set attributes with a trailing underscore inside the ``++__init__++`` method.

== How to fix it

To fix this issue, move the attributes with a trailing underscore from the ``++__init__++`` method to the `fit` method.

=== Code examples

==== Noncompliant code example

[source,python,diff-id=1,diff-type=noncompliant]
----
from sklearn.base import BaseEstimator
class MyEstimator(BaseEstimator):
def __init__(self):
self.estimated_attribute_ = None # Noncompliant: an estimated attribute is set in the __init__ method.
----

==== Compliant solution

[source,python,diff-id=1,diff-type=compliant]
----
from sklearn.base import BaseEstimator
class MyEstimator(BaseEstimator):
def fit(self, X, y):
self.estimated_attribute_ = some_estimation(X) # Compliant
----

== Resources
=== Documentation

* Scikit-Learn documentation - https://scikit-learn.org/stable/developers/develop.html#parameters-and-init[Parameters and init]
* S6970 - The Scikit-learn `fit` method should be called before methods yielding results


ifdef::env-github,rspecator-view[]

(visible only on this page)

== Implementation specification

Verify that subclasses of BaseEstimator do not have attributes with a trailing underscore in the __init__ method.
Verify that inherited alongside BaseEstimator do not have attributes with a trailing underscore in the __init__ method.

=== Message

Primary : Move this estimated attribute inside the `fit` method.

Secondary: this attributes is used in this estimator (in the case of mixins)


=== Issue location

Primary : name of the attribute

Secondary : the name of the estimator subclassing the mixin and the BaseEstimator

=== Quickfix

Possible quickfix if the `fit` method does not exist. (Add the fit method with the estimated attribute) and delete the estimated attributes
from the __init__ method. This is only possible in an estimator directly subclassing BaseEstimator.

endif::env-github,rspecator-view[]

0 comments on commit 2ad4854

Please sign in to comment.