
Perform meta-features separation by dataset source #29

Closed
MorrisNein opened this issue Jul 1, 2023 · 1 comment
Labels: enhancement (New feature or request), invalid (This doesn't seem right)

Comments

@MorrisNein (Collaborator)

Since #15, the meta-features cache has been stored at the path data/metafeatures/*mf_source*/*dataset_id*.

Because datasets from different sources can have the same ID, meta-feature storage needs an additional level of separation.

There are two possible ways to address this issue:

  1. Add directory separation (e.g., data/metafeatures/*mf_source*/*dataset_source*/*dataset_id*.pkl or data/datasets/*dataset_source*/*dataset_id*/*mf_source*.pkl).
  2. Integrate a lightweight file database (sqlite3, TinyDB, etc.).
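A minimal sketch of option 1, using the first layout suggested above and assuming the meta-features are pickled Python objects (as the `.pkl` extension suggests). The function names and the source names `default`, `openml` and `custom` are hypothetical, not the project's actual API, and the stored values are dummy placeholders.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def metafeatures_path(root: Path, mf_source: str, dataset_source: str, dataset_id: str) -> Path:
    # Hypothetical layout for option 1:
    # <root>/metafeatures/<mf_source>/<dataset_source>/<dataset_id>.pkl
    return root / "metafeatures" / mf_source / dataset_source / f"{dataset_id}.pkl"


def save_metafeatures(root: Path, mf_source: str, dataset_source: str,
                      dataset_id: str, metafeatures: Any) -> Path:
    path = metafeatures_path(root, mf_source, dataset_source, dataset_id)
    # Create the per-source directories on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(metafeatures, f)
    return path


def load_metafeatures(root: Path, mf_source: str, dataset_source: str, dataset_id: str) -> Any:
    path = metafeatures_path(root, mf_source, dataset_source, dataset_id)
    with path.open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "data"
        # Two datasets from different (hypothetical) sources that share the ID "42";
        # the values are dummy placeholders.
        save_metafeatures(root, "default", "openml", "42", {"nr_inst": 150})
        save_metafeatures(root, "default", "custom", "42", {"nr_inst": 1000})
        print(load_metafeatures(root, "default", "openml", "42"))  # {'nr_inst': 150}
        print(load_metafeatures(root, "default", "custom", "42"))  # {'nr_inst': 1000}
```

With the extra *dataset_source* level, the two entries land in different files instead of overwriting each other. The cache stays one file per entry, so invalidating all meta-features for a source is just deleting its directory.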
MorrisNein added the enhancement and invalid labels on Jul 1, 2023.
@DRMPN (Member)

DRMPN commented Jul 3, 2023

In my opinion, option 1 is easier.
There are several reasons for this:

  1. Entire files shouldn't be stored in a database (I think this is a general rule, but I could be wrong).
  2. We would have to design a database schema.
  3. We would have to decide whether to write raw SQL queries in the code or use an ORM such as Peewee.
  4. A database is meant for persisting data (long-term storage), while here we only need a cache.
  5. If we do go with a database, NoSQL solutions like Redis are worth considering. I've seen Redis used for caching, but I haven't studied or implemented it myself.
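For comparison, a minimal sketch of option 2 using Python's built-in `sqlite3` module with raw SQL, assuming the meta-features are pickled objects. It makes points 1 to 3 concrete: a schema has to be designed, the queries are written by hand, and the pickled payload ends up stored as a BLOB inside the database. The table, column and function names are hypothetical.

```python
import pickle
import sqlite3
from typing import Any, Optional

# Hypothetical schema for option 2. The composite primary key is what separates
# datasets that share an ID but come from different sources.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metafeatures (
    mf_source      TEXT NOT NULL,
    dataset_source TEXT NOT NULL,
    dataset_id     TEXT NOT NULL,
    payload        BLOB NOT NULL,  -- pickled meta-features
    PRIMARY KEY (mf_source, dataset_source, dataset_id)
)
"""


def open_cache(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def put(conn: sqlite3.Connection, mf_source: str, dataset_source: str,
        dataset_id: str, metafeatures: Any) -> None:
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        # INSERT OR REPLACE overwrites an existing row with the same key.
        conn.execute(
            "INSERT OR REPLACE INTO metafeatures VALUES (?, ?, ?, ?)",
            (mf_source, dataset_source, dataset_id, pickle.dumps(metafeatures)),
        )


def get(conn: sqlite3.Connection, mf_source: str, dataset_source: str,
        dataset_id: str) -> Optional[Any]:
    row = conn.execute(
        "SELECT payload FROM metafeatures "
        "WHERE mf_source = ? AND dataset_source = ? AND dataset_id = ?",
        (mf_source, dataset_source, dataset_id),
    ).fetchone()
    # A missing row means the meta-features have not been cached yet.
    return None if row is None else pickle.loads(row[0])


if __name__ == "__main__":
    conn = open_cache()
    put(conn, "default", "openml", "42", {"nr_inst": 150})
    put(conn, "default", "custom", "42", {"nr_inst": 1000})
    print(get(conn, "default", "openml", "42"))  # {'nr_inst': 150}
    print(get(conn, "default", "custom", "42"))  # {'nr_inst': 1000}
    print(get(conn, "default", "custom", "7"))   # None
```

Passing a file path such as `"metafeatures.db"` to `open_cache` instead of `":memory:"` would make the cache persist between runs.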
