Refactor openml_datasets.py #375

Open
anupamamurthi opened this issue Sep 14, 2022 · 5 comments · May be fixed by #471
Labels
datasets (Issue relating to new or existing datasets), easy (Beginner issues), good first issue (Good for newcomers)

Comments

@anupamamurthi
Collaborator

https://github.com/Trusted-AI/AIF360/blob/master/aif360/sklearn/datasets/openml_datasets.py

Add a wrapper around this module so that datasets can be accessed directly through it.

Instead of the current approach, which looks like this:

import os

from sklearn.datasets import fetch_openml

from aif360.sklearn.datasets.utils import standardize_dataset


# cache location
DATA_HOME_DEFAULT = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                                 '..', 'data', 'raw')

def fetch_adult(subset='all', *, data_home=None, cache=True, binary_race=True,
                usecols=None, dropcols=None, numeric_only=False, dropna=True):
    if subset not in {'train', 'test', 'all'}:
        raise ValueError("subset must be either 'train', 'test', or 'all'; "
                         "cannot be {}".format(subset))
    df = fetch_openml(data_id=1590, data_home=data_home or DATA_HOME_DEFAULT,
                      cache=cache, as_frame=True).frame

The proposal is to introduce an OpenMLStore abstraction:

import abc

class OpenMLStore(abc.ABC):
    @abc.abstractmethod
    def __init__(self, **kwargs):
        pass

    def download(self, data_id, data_home=None, cache=True):
        # TODO: decide whether to return a DataFrame or just the output
        # directory location
        return fetch_openml(data_id=data_id,
                            data_home=data_home or DATA_HOME_DEFAULT,
                            cache=cache, as_frame=True).frame

    def upload(self, **kwargs):
        pass

The fetch_adult() function can then be updated to use the OpenMLStore abstraction.
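
For illustration only, here is a minimal sketch of how fetch_adult() might delegate to such a store. It is not the AIF360 implementation. DefaultOpenMLStore, the `fetcher` argument, the `store` keyword, and the `_fetch` attribute are hypothetical names introduced for this example, and DATA_HOME_DEFAULT is a stand-in path. The rest of fetch_adult() (subset handling, standardize_dataset, and so on) is omitted.

```python
import abc
import os

# Stand-in cache location (assumption); the real module derives it from __file__.
DATA_HOME_DEFAULT = os.path.join(os.getcwd(), "data", "raw")


class OpenMLStore(abc.ABC):
    """Base class proposed in this issue. Subclasses must set `self._fetch`."""

    @abc.abstractmethod
    def __init__(self, **kwargs):
        ...

    def download(self, data_id, data_home=None, cache=True):
        # This sketch returns the DataFrame; returning only the output
        # directory is the open design question from the proposal.
        result = self._fetch(data_id=data_id,
                             data_home=data_home or DATA_HOME_DEFAULT,
                             cache=cache, as_frame=True)
        return result.frame


class DefaultOpenMLStore(OpenMLStore):
    """Hypothetical concrete store; `fetcher` can be swapped out in tests."""

    def __init__(self, fetcher=None, **kwargs):
        if fetcher is None:
            # Imported lazily so the class can be defined without scikit-learn.
            from sklearn.datasets import fetch_openml
            fetcher = fetch_openml
        self._fetch = fetcher


def fetch_adult(subset="all", *, store=None, data_home=None, cache=True):
    if subset not in {"train", "test", "all"}:
        raise ValueError("subset must be either 'train', 'test', or 'all'; "
                         "cannot be {}".format(subset))
    if store is None:
        store = DefaultOpenMLStore()
    # 1590 is the OpenML data_id used for the Adult dataset in the snippet above.
    df = store.download(1590, data_home=data_home, cache=cache)
    # Further processing from the existing function is omitted in this sketch.
    return df
```

Passing the store in as an argument keeps the download logic in one place and lets tests substitute a stub fetcher instead of hitting the network.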

@nrkarthikeyan added the good first issue, easy, general, and datasets labels and removed the general label on Sep 14, 2022
@hoffmansc
Collaborator

Can you elaborate on the shortcomings of the current method?

@yoshimii

Hello, I'd like to work on this issue.

@jainsunishka

I am making progress on this issue and would like to continue working on it.

@mnagired linked a pull request on Sep 22, 2023 that will close this issue
@vandanapathare

Hello, I would like to work on this issue.

@jainsunishka

> Hello I would like to work on this issue

Hey @vandanapathare, I have already raised the PR and am finishing up the code review.

sharryhuang added a commit to sharryhuang/AIF360 that referenced this issue Sep 22, 2023
if_delegate_has_method is deprecated, use available_if instead Trusted-AI#375

Signed-off-by: sharryhuang