Sparse matrix #22

Open
severinsimmler opened this issue Aug 23, 2018 · 0 comments
pandas has a SparseDataFrame class, which could be useful for our purposes, especially with very large corpora. Someone should look into the class further. I've experimented with it a bit and found that some methods don't behave as they do on a normal DataFrame, and overall it seems much slower to me.

See also:

class Corpus:
    """Model class for a Corpus.

    Parameters:
        documents (iterable): One or more Document objects.
        sparse (bool): If True, use the sparse DataFrame. NOT IMPLEMENTED.

    Attributes:
        dtm (pd.DataFrame): Document-term matrix with absolute word frequencies.
    """
    def __init__(self, documents, sparse=False):
        if sparse:
            raise NotImplementedError("This feature is not yet implemented.")
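
For whoever picks this up, here is a rough sketch of what a sparse document-term matrix could look like. It is only an illustration under a few assumptions: `build_dtm` is a hypothetical helper (not part of this project), documents are passed as a plain dict of token lists instead of Document objects, and it uses the sparse extension dtype (`pd.SparseDtype`) rather than `SparseDataFrame`, since that class was removed in pandas 1.0.

```python
from collections import Counter

import pandas as pd


def build_dtm(documents, sparse=False):
    """Build a document-term matrix from a {title: list of tokens} mapping.

    Hypothetical helper for illustration, not the project's API.
    """
    # Count absolute word frequencies per document.
    counts = {title: dict(Counter(tokens)) for title, tokens in documents.items()}
    # Rows are documents, columns are terms; terms absent from a document get 0.
    dtm = pd.DataFrame.from_dict(counts, orient="index").fillna(0).astype("int64")
    if sparse:
        # Keep only the non-zero counts in memory; 0 is the implicit fill value.
        dtm = dtm.astype(pd.SparseDtype("int64", fill_value=0))
    return dtm


documents = {
    "doc1": ["sparse", "matrix", "sparse"],
    "doc2": ["dense", "matrix"],
}
dense_dtm = build_dtm(documents)
sparse_dtm = build_dtm(documents, sparse=True)

# Fraction of cells that are actually stored (non-zero counts).
print(sparse_dtm.sparse.density)
# Rough memory comparison; only meaningful on a realistically large corpus.
print(dense_dtm.memory_usage(deep=True).sum(), sparse_dtm.memory_usage(deep=True).sum())
```

On a toy corpus like this the sparse version will not save anything, so any benchmark for this issue should be run on a large corpus, and the methods we rely on should be checked against the sparse columns one by one.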

@severinsimmler severinsimmler added this to the v1.1.0 milestone Aug 23, 2018
@severinsimmler severinsimmler changed the title from "oop: sparse matrix" to "Sparse matrix" Aug 27, 2018