Git repo size #650

Closed
manzt opened this issue Mar 3, 2022 · 6 comments
Labels
maintenance Related to maintenance of project

Comments

@manzt
Member

manzt commented Mar 3, 2022

Not a concern with Gosling itself, but I recently re-cloned the repo and found it to be quite slow (~3 min). Upon inspection, the .git/ folder is very large (1.2 GB). I think this can occur from checking in many large binary objects (e.g., images).

gosling.js on  master is 📦 v0.9.17 via  v17.6.0
❯ du -h -d1

4.0K    ./embed
5.6M    ./img
336K    ./schema
 16K    ./public
 68K    ./logo
8.0K    ./scripts
 28K    ./.github
1.2G    ./.git
 36K    ./notebooks
608K    ./editor
996K    ./src
1.2G    .

For comparison, the .git folder for higlass (with >5000 commits) is 227M.

@manzt added the bug and maintenance labels, then removed the bug label (Mar 3, 2022)
@manzt changed the title from "Cloning repo is very slow" to "Git repo size" (Mar 3, 2022)
@manzt
Member Author

manzt commented Mar 3, 2022

I think at some point the Gosling bundle was checked into this repo, at ~42 MB per copy?

Listing the 40 largest blobs, using the approach from https://stackoverflow.com/a/14329983/11008641:

gosling.js on  master is 📦 v0.9.17 via  v17.6.0
❯ git rev-list --all --objects | \
    sed -n $(git rev-list --objects --all | \
    cut -f1 -d' ' | \
    git cat-file --batch-check | \
    grep blob | \
    sort -n -k 3 | \
    tail -n40 | \
    while read hash type size; do
         echo -n "-e s/$hash/$size/p ";
    done) | \
    sort -n -k1
42667313 main.js
42723724 main.js
42723724 main.js
42724139 main.js
42724200 main.js
42724507 main.js
42724663 main.js
42726288 main.js
42726560 main.js
42726877 main.js
42726964 main.js
42727343 main.js
42727343 main.js
42728427 main.js
42762390 main.js
42763888 main.js
42763888 main.js
42773344 main.js
42773344 main.js
42773344 main.js
42792513 main.js
42792513 main.js
42793592 main.js
42793592 main.js
42793756 main.js
42794184 main.js
42794185 main.js
42832661 main.js
42832665 main.js
42832722 main.js
42854024 main.js
42854024 main.js
42904025 main.js
42904153 main.js
42904311 main.js
42904311 main.js
43096900 main.js
43096901 main.js
43096946 main.js
43097807 main.js
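
The pipeline above runs `git rev-list` twice and builds a long `sed` expression. A shorter variant is sketched below; it is not the linked answer's method. It relies on the `%(rest)` placeholder of `git cat-file --batch-check` to carry each path through. The function name `largest_blobs` and its default of 40 are illustrative assumptions.

```shell
# Sketch: print the N largest blobs reachable from any ref, smallest first.
# Output format: "<size in bytes> <path>". The function name is hypothetical.
largest_blobs() {
    local count="${1:-40}"
    # rev-list prints "<hash> <path>"; cat-file reads the hash and exposes
    # the remainder of each input line as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |   # keep blobs only and drop the type column
        sort -n -k1,1 |
        tail -n "$count"
}
```

Running `largest_blobs 40` inside the clone should give a listing comparable to the one above.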

@manzt
Member Author

manzt commented Mar 3, 2022

OK, so these large files seem to come from the gh-pages branch, which I guess makes sense. A simple fix would be to delete and recreate the gh-pages branch, then run git gc --prune=now --aggressive to clean up the objects that are no longer reachable.

I don't think we need to worry about versioning the gh-pages branch so much, since any build of gosling can be recreated by checking out master and running yarn build.
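
One way to do the "delete and recreate" step locally is sketched below. It assumes nothing else (tags, other branches) still points at the old gh-pages commits, and it is not necessarily what the eventual fix did. The helper name `squash_branch_history` and the temporary `-fresh` branch suffix are made up for illustration.

```shell
# Sketch: replace a branch with a single parentless commit that holds its
# current files, then prune the old objects from the local repository.
# Assumption: the old history of this branch is not needed anywhere else.
squash_branch_history() {
    local branch="$1"
    git checkout -q "$branch"
    # --orphan starts a branch with no parents; index and work tree are kept.
    git checkout -q --orphan "${branch}-fresh"
    git commit -q -m "Reset ${branch} history"
    # Force-rename the new branch over the old one.
    git branch -M "${branch}-fresh" "$branch"
    # Reflog entries still reference the old commits; expire them so that
    # gc can actually delete the now-unreachable objects.
    git reflog expire --expire=now --all
    git gc --quiet --prune=now --aggressive
}
```

After `squash_branch_history gh-pages`, the rewritten branch would still need a `git push --force origin gh-pages` to take effect on the remote. The local `git gc` only shrinks the local .git folder; the hosting side collects garbage on its own schedule.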

@manzt
Member Author

manzt commented May 6, 2023

Just freshly cloned today, and the total download is 1.37 GiB!

❯ gh repo clone gosling-lang/gosling.js
Cloning into 'gosling.js'...
remote: Enumerating objects: 15081, done.
remote: Counting objects: 100% (818/818), done.
remote: Compressing objects: 100% (367/367), done.
remote: Total 15081 (delta 516), reused 719 (delta 451), pack-reused 14263
Receiving objects: 100% (15081/15081), 1.37 GiB | 27.67 MiB/s, done.
Resolving deltas: 100% (11319/11319), done.
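
Until the history is cleaned up, a clone can skip the gh-pages objects by fetching a single branch. The sketch below wraps `git clone --single-branch`; the helper name `clone_only_branch` and its argument order are illustrative assumptions.

```shell
# Sketch: clone only one branch so objects reachable solely from other
# branches (e.g. gh-pages) are never requested. Hypothetical helper name.
clone_only_branch() {
    local url="$1" branch="$2" dest="$3"
    git clone --quiet --single-branch --branch "$branch" "$url" "$dest"
}
```

For example, `clone_only_branch https://github.com/gosling-lang/gosling.js.git master gosling.js`, assuming the default branch is still named master.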

@manzt
Member Author

manzt commented Dec 15, 2023

Using git-sizer:

| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  1.60 k   |                                |
|   * Total size               |   696 KiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  5.77 k   |                                |
|   * Total size               |  3.80 MiB |                                |
|   * Total tree entries       |  94.9 k   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  9.44 k   |                                |
|   * Total size               |  12.2 GiB | *                              |
| * Annotated tags             |           |                                |
|   * Count                    |    86     |                                |
| * References                 |           |                                |
|   * Count                    |   143     |                                |
|     * Branches               |     1     |                                |
|     * Tags                   |    93     |                                |
|     * Remote-tracking refs   |    49     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  2.86 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |   119     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  41.1 MiB | ****                           |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |   795     |                                |
| * Maximum tag depth      [5] |     1     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [6] |    47     |                                |
| * Maximum path depth     [7] |     6     |                                |
| * Maximum path length    [8] |    74 B   |                                |
| * Number of files        [9] |   353     |                                |
| * Total size of files   [10] |  77.6 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

[1]  0517bc58ad2bae89c97c99a76d5f6e36b5980972
[2]  8827b6ad51bc14763b111d31481cd64c9e0dcc85
[3]  8ab2fed13fc9b6addaecca8f085435323d7e96b0 (refs/remotes/origin/gh-pages:assets)
[4]  ebc00e563643bd30a87ee923b494481a174145f2 (4663a3a7d63552a1e46a0ef1d7f141278397d823:main.js)
[5]  0375d73a656f21c26dd81a58c7424d4f78855d6f (refs/tags/v0.0.10)
[6]  fab6680a43e12d2e162d02236953d7cf3a051fff (refs/remotes/origin/sehilyi/2d-axis^{tree})
[7]  5375a1e970e9c2b1aaf0295c369f7d02479ddaf1 (fcbc37e68d6caa63d78e12e528f420a38a8f96b8^{tree})
[8]  b2b48fc37ae92c31f4e912cc11f2c80ee99029a5 (a252f58440120a9c1255959219521d89deb32985^{tree})
[9]  b073d1cde2d26fadba821a7f4d88b7c3194b14ba (refs/remotes/origin/etowahadams/snapshots^{tree})
[10] b17a42e3f7089d202a350f347636eb6a4c7b0301 (21c513832502018d1098d5949fd4580cfe550861^{tree})

@etowahadams
Contributor

etowahadams commented Dec 15, 2023

Woah that's huge! I had no idea.
How would you feel about removing the git history of images?

e.g.

# BFG Repo-Cleaner
bfg --delete-files "MATRIX.jpg"

Edit: Oh, I see most of the size is in gh-pages.

@manzt
Member Author

manzt commented Dec 19, 2023

Fixed in #1015

@manzt closed this as completed Dec 19, 2023