Repository bloat #178
Comments
We can completely delete that benchmark folder.
Best regards
Igor
… On 17 Apr 2017, at 11:48, Ondrej Marsalek ***@***.***> wrote:
Because the repository never forgets, it easily bloats with data that is checked in and then removed again. It is currently around 220 MB, while the working tree is only around 35 MB. I tried looking for some resources that could help and found this:
https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
Running the script, I get a list that starts with the listing below. I think we should try to filter most of these from the history. The end of this list, sorted by size, is around 1 MB, so looking even further might still make sense. If we don't maintain a separate repository for examples, we need to be a bit careful so that we don't make people download hundreds of MB if they want to run a simple simulation.
All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file.
size pack SHA location
30565 8618 0c12d4a625b53bca605d080b3ef03514d614a9c6 examples/ppi/qtip4pf/qtip4pf.pos_7.xyz
30565 8617 19aea3e92732eb3721a38cfcf909a5372ad8c7bc examples/ppi/qtip4pf/qtip4pf.pos_5.xyz
30565 8618 2a1c03bc59f33d6095a663541892668fd93ae724 examples/ppi/qtip4pf/qtip4pf.pos_2.xyz
30565 8617 3041cf398fb10fa955b34971bb88664f334f8965 examples/ppi/qtip4pf/qtip4pf.pos_1.xyz
30565 8618 72ecdcbb1d92a474a12411ffee7097495a40895a examples/ppi/qtip4pf/qtip4pf.pos_4.xyz
30565 8619 c1e61916a1dda9c95133da4c734a4e1d64cdf54a examples/ppi/qtip4pf/qtip4pf.pos_0.xyz
30565 8618 c590357735fca50c993f3973e3bd54ea8010d02f examples/ppi/qtip4pf/qtip4pf.pos_6.xyz
30565 8617 ffc3f0c99c22b9a5dc6c5bffc8de767c19b7d219 examples/ppi/qtip4pf/qtip4pf.pos_3.xyz
30564 9198 12e029354bca5043f93fb8436b5a1ccbfea65c81 examples/ppi/qtip4pf/qtip4pf.force_3.xyz
30564 9197 31d1f02d9a1ce3f0224bcc221e0769b1c51df38d examples/ppi/qtip4pf/qtip4pf.force_0.xyz
30564 9197 6e1ba45f6efc329e1eeae1e90caba4dc5efd45ff examples/ppi/qtip4pf/qtip4pf.force_5.xyz
30564 9196 a11d414c934f211410b606ec6a10b4c6e15ab460 examples/ppi/qtip4pf/qtip4pf.force_6.xyz
30564 9196 a4311bf1b268d6d58e25f259a364d4f7438cda4e examples/ppi/qtip4pf/qtip4pf.force_2.xyz
30564 9197 b9b993ac2df51a380ae5b65da42cc7da389cac13 examples/ppi/qtip4pf/qtip4pf.force_7.xyz
30564 9198 c55b9aba770369b4c6516441b553834202b519d3 examples/ppi/qtip4pf/qtip4pf.force_4.xyz
30564 9197 e7281c1b170259d7f71bacb2ec2c05fe5e6e27d3 examples/ppi/qtip4pf/qtip4pf.force_1.xyz
17721 3605 a79efe62102e38b34e29fed219f00afda8a1892d examples/ppi/qtip4pf/benchmark/qtip4pf.energies.dat
15500 4339 71032e78a592a092937c30b2a6ce771506e8c5ff examples/lammps/h2o-mts/MTS-Ensemble/trial-01/rpc.pos_0.xyz
13785 13162 992bda057bd0e8dd4b9fb3fafa1b39d2f0f5e2f6 data/diss-zurich10.pdf
8995 1411 25347623730ac47702369201032ad73f7aa80cda test/ph2/test_ph2.pdb
6608 6392 fd3d30f6b2a249968f88e2157aebda584e837de2 movies/ice-cage.flv
4345 1760 cdceced9914f6622f14a1878eca1d14791b33a39 examples/lammps/paracetamol-einstein/input.xml
3374 3179 1d1fd08788b781080445763eb970b1c9eb6b5dd5 data/ceri14psik-highlight.pdf
3224 1463 34d41828d417b38a91534326c4acdd7270f4f3f9 examples/lj/nst/reference/lj-nst.pos_0.pdb
3122 2993 cff1c905a7651caa25aac7f44f2e775a50096794 data/i-pi_1.0.zip
2902 2769 8d650c408757239e49ab82138bb63c5ec124028a data/tut-lugano10.pdf
2085 1959 9c8e64ebb9605c8f3849c71cd83ea450a9a703e1 data/lugano10.pdf
1972 266 49a85bc43c83197f25d01fb6d17f9c3638dcf93a examples/lammps/newdyn/nst-ice.xc.pdb
1913 333 3b8aea8aae32e049510f9f41c5ffe8951182bd59 examples/cp2k/basis/dftd3.dat
1859 158 4416e8913a2d64baaa44c126ce9cc43c21ce16bf examples/lammps/h2o-mts/MTS-Ensemble/trial-04/log.lammps
1787 1003 1ae7d88e2ac652f70a6587c768447c0d2e61c25e examples/lj/nst/reference/lj-nst.pos_0.pdb
1736 1011 037343786bd33f7d9e74af45b0ad2512e6b0c7e1 examples/lj/nst/reference/lj-nst.pos_0.pdb
1556 431 57652c560fac09ed25128b1db0058ee993a31202 examples/lammps/h2o-mts/MTS-Ensemble/trial-04/rpc.pos_0.xyz
1344 814 14266545d35648a107bbe12aba9852e466f2a5e9 examples/lj/nst/reference/lj-nst.pos_0.pdb
1228 1197 3e63eeeddbb442bb3040a33195a5a23de7668d64 images/header-homepage.jpg
1213 1183 b82671699bbf75cbe79fcab1dbc88924612282b4 data/i-pi_hands-on.zip
1034 419 21e92a4ed33f15d863ce7798449a506eb53da4f8 examples/lammps/paracetamol-phonons/simulation-fd.dynmat
996 411 5bcaf53662751881fd0f3a261046992301e4aab8 examples/lammps/paracetamol-debye/hessian.data
989 409 da9da9f135199c5f408cd06d1d526f4cf893c843 examples/lammps/paracetamol-phonons/simulation-fd.hess
950 404 2119abc2cd589b225fac326fb5e99cd3d720bc5c examples/lammps/paracetamol-phonons/simulation-fd.mode
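To see where the compressed bulk sits, a listing in the format above can be totalled per top-level directory. The following is only a minimal sketch: the function names `parse_listing` and `pack_kb_by_top_dir` are made up for illustration, and the sample uses just four rows copied from the listing.

```python
from collections import defaultdict

# Four rows copied from the listing above (size kB, pack kB, SHA, location).
LISTING = """\
size pack SHA location
30565 8618 0c12d4a625b53bca605d080b3ef03514d614a9c6 examples/ppi/qtip4pf/qtip4pf.pos_7.xyz
13785 13162 992bda057bd0e8dd4b9fb3fafa1b39d2f0f5e2f6 data/diss-zurich10.pdf
6608 6392 fd3d30f6b2a249968f88e2157aebda584e837de2 movies/ice-cage.flv
3374 3179 1d1fd08788b781080445763eb970b1c9eb6b5dd5 data/ceri14psik-highlight.pdf
"""


def parse_listing(text):
    """Return (size_kb, pack_kb, sha, path) tuples, skipping the header row."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split(None, 3)  # the path is the fourth field
        if len(parts) != 4 or not parts[0].isdigit():
            continue  # header or malformed line
        size_kb, pack_kb, sha, path = parts
        rows.append((int(size_kb), int(pack_kb), sha, path))
    return rows


def pack_kb_by_top_dir(rows):
    """Sum the compressed (pack) size in kB per top-level directory."""
    totals = defaultdict(int)
    for _size_kb, pack_kb, _sha, path in rows:
        totals[path.split("/", 1)[0]] += pack_kb
    return dict(totals)


if __name__ == "__main__":
    print(pack_kb_by_top_dir(parse_listing(LISTING)))
    # prints {'examples': 8618, 'data': 16341, 'movies': 6392}
```

Fed the full listing, the same two functions would show how much of the pack is taken up by examples/ compared with data/ and movies/.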
|
That would certainly be useful and will make the working tree slimmer, but the trickier part is removing it and other large deleted files from the repository. Because this means rewriting history, I want to be careful. Does anyone have experience with |
I did this a few times and it worked just fine. It is just unclear to me whether you then need to delete the whole repository or whether the older branches can stay. So far, I was the only user of the repositories where I did this, so this wasn't an issue back then. |
Let's do this carefully, but I am all in favor of rewriting history and
cleaning up the repo.
…On 17 April 2017 at 12:34, Thomas Spura ***@***.***> wrote:
I did this a few times and it worked just fine. It is just unclear to me
whether you then need to delete the whole repository or whether the older
branches can stay. So far, I was the only user of the repositories where I
did this, so this wasn't an issue back then.
|
I just tried this tool: https://rtyley.github.io/bfg-repo-cleaner/ and it seems to work great. After filtering out all files larger than 1 MB (the specifics can be tweaked for the production run, of course), I get a much more acceptable 25 MB. The push to GitHub will be the most sensitive part of this operation. Once that is done, everyone with write access must update their local clones and never push from a clone of the old, bloated repository. We need to find a way to coordinate this. I suggest setting a date and time well in advance, sending a big fat warning to everyone with push access, and getting explicit agreement that they know about it and will not push the old repository. |
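For reference, the filtering run described above might be scripted roughly as follows. This is a sketch under assumptions, not the exact commands that were used: the jar path `bfg.jar` and the mirror directory `repo.git` are placeholders, and the `1M` threshold simply mirrors the trial run. The flag `--strip-blobs-bigger-than` and the follow-up `git reflog expire` / `git gc` steps are the ones given in the BFG documentation.

```python
import subprocess


def bfg_cleanup_commands(mirror_dir, bfg_jar="bfg.jar", max_blob="1M"):
    """Build the commands for a BFG run on a mirror clone.

    mirror_dir is assumed to come from `git clone --mirror <url>`;
    bfg_jar and max_blob are placeholder defaults, not project settings.
    """
    return [
        # Remove every blob larger than max_blob from the history.
        ["java", "-jar", bfg_jar, "--strip-blobs-bigger-than", max_blob, mirror_dir],
        # Drop reflog entries that still reference the old objects.
        ["git", "-C", mirror_dir, "reflog", "expire", "--expire=now", "--all"],
        # Physically delete the now-unreferenced objects.
        ["git", "-C", mirror_dir, "gc", "--prune=now", "--aggressive"],
    ]


def run_cleanup(mirror_dir, **kwargs):
    """Run the commands in order, stopping at the first failure."""
    for cmd in bfg_cleanup_commands(mirror_dir, **kwargs):
        subprocess.run(cmd, check=True)
```

By default BFG leaves the contents of the latest commit untouched, so large files that are still in the working tree have to be deleted in an ordinary commit first. Nothing is pushed by this sketch; the push to GitHub stays a separate, deliberate step.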
The main problem is that once we rewrite the history, everyone must delete their local repo and download the one with the new history... |
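One way to tell whether a local clone still carries the old history is to ask git whether it contains an object that the rewrite should have removed. A minimal sketch, assuming the blob 0c12d4a625b53bca605d080b3ef03514d614a9c6 (the first entry of the size listing) is among the filtered objects; the helper name `has_object` is made up for illustration.

```python
import subprocess

# Assumption: this blob (examples/ppi/qtip4pf/qtip4pf.pos_7.xyz in the
# listing) is one of the objects removed by the history rewrite.
OLD_BLOB = "0c12d4a625b53bca605d080b3ef03514d614a9c6"


def has_object(repo_dir, sha, run=subprocess.run):
    """Return True if the repository at repo_dir contains object sha.

    `git cat-file -e` exits with status zero only when the object exists.
    The run parameter lets a caller substitute a stub for testing.
    """
    result = run(
        ["git", "-C", repo_dir, "cat-file", "-e", sha],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    if has_object(".", OLD_BLOB):
        print("This clone still contains the old history; do not push from it.")
    else:
        print("Old blob not found; this looks like a fresh clone.")
```

A check like this only detects stale clones; it does not stop anyone from pushing from one, so the coordination discussed here is still needed.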
It requires some coordination, but unless we plan to turn it into a weekly activity, I think it is worth it. The best way to ensure that it stays rare is to be careful when pushing stuff to the repo. |
This could also be the right moment to separate the examples from the repo of the actual code. It would make code revision much, much simpler... |