Add Kuzushiji MNIST dataset #22

Merged: 4 commits, Dec 7, 2021


@goodhamgupta goodhamgupta commented Dec 6, 2021

Hi everyone,

Similar to my previous PR (#20), this PR adds support for the Kuzushiji MNIST (KMNIST) dataset, as mentioned in #11 and #16.

For the KMNIST dataset:

  • The training set contains 60k images, each 28x28 pixels.
  • The test set contains 10k images, each 28x28 pixels.
  • Both sets can be downloaded as follows:
{train_images, train_labels} = Scidata.KuzushijiMNIST.download()
{test_images, test_labels} = Scidata.KuzushijiMNIST.download_test()

The API mirrors the mnist.ex file exactly; the only change is the set of URLs the datasets are downloaded from. This PR also adds a quick unit test for the MNIST data loader.

A sample of the images present in the training dataset:

[Image: Screenshot 2021-12-07 at 10 37 59 AM]
[Image: Screenshot 2021-12-07 at 10 38 45 AM]

Visualized using:

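# Images arrive as a raw binary together with its element type and shape
{{bin_images, img_type, img_size}, train_labels} = Scidata.KuzushijiMNIST.download()
# Decode the binary into a tensor, then restore the image dimensions
images = bin_images |> Nx.from_binary(img_type) |> Nx.reshape(img_size)
# Render the first two images as heatmaps
images[0..1] |> Nx.to_heatmap() |> IO.inspect()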
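
The snippet above destructures the images as a `{binary, type, shape}` triple. As a rough sketch of what that layout implies, the check below confirms that a binary holds exactly as many bytes as its type and shape call for. It runs on small synthetic data and needs neither Nx nor a download. The `TripleCheck` module and its functions are hypothetical names introduced for illustration, and treating the labels as a triple of the same layout is an assumption, not something this PR states.

```elixir
# Hypothetical helper, not part of Scidata: sanity-checks a
# {binary, type, shape} triple like the one destructured above.
defmodule TripleCheck do
  # Number of elements implied by a shape tuple, e.g. {2, 28, 28} -> 1568.
  def element_count(shape) do
    shape
    |> Tuple.to_list()
    |> Enum.reduce(1, fn dim, acc -> dim * acc end)
  end

  # True when the binary holds exactly as many bytes as type and shape imply.
  # The type is assumed to be an Nx-style tuple such as {:u, 8}.
  def consistent?({binary, {_kind, bits}, shape}) do
    byte_size(binary) == element_count(shape) * div(bits, 8)
  end
end

# Synthetic stand-ins: two blank 28x28 images and two labels (not real KMNIST data).
images = {:binary.copy(<<0>>, 2 * 28 * 28), {:u, 8}, {2, 28, 28}}
labels = {<<3, 7>>, {:u, 8}, {2}}

IO.inspect(TripleCheck.consistent?(images), label: "images consistent")
IO.inspect(TripleCheck.consistent?(labels), label: "labels consistent")
```

A check like this could run before `Nx.reshape/2`, which raises when the element count and the target shape disagree.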

Thanks!

@t-rutten t-rutten left a comment

Thanks for adding this dataset! It's very cool to see these characters via Nx.heatmap :)

@goodhamgupta

Thanks for your kind review @t-rutten! I almost forgot about the Nx.heatmap functionality! 😅 I've added a few example images to the PR for reference now! 😄

@t-rutten t-rutten left a comment

We appreciate your contributions @goodhamgupta :)

@t-rutten t-rutten merged commit f1fd2c8 into elixir-nx:master Dec 7, 2021