Why
Developers and artists cloning the repo will clone a few gigabytes less (2.5GB saved in the following tests).
Suggestion
GitHub will perform git gc only on demand. An org admin will need to open a support request with GitHub explicitly requesting git gc --aggressive on the asset repo.
Before this happens, the history should be rewritten so that a .gitattributes file declaring the binaries is present in the initial commit.
Here's the .gitattributes file I used in tests.
* binary
*.cfdg text diff
*.md text diff
*.svg text=auto diff
.git* text diff
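To sanity-check that these rules are actually picked up, git check-attr can query the attributes Git resolves for any tracked path (the path below is only an illustration, not a file from the repo):

# Print every attribute Git applies to the given path; with the rules above,
# a .blend file should report diff, merge and text as "unset" via the binary macro.
git check-attr -a -- models/character.blend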
Background
Git handles binary deltas just fine, but you can improve how it handles binary data if you declare to Git which files are binary.
Git delta compression on binaries:
Original asset repo (no checkout): 8.4GB
Without gitattributes (git gc): 7.7GB
With gitattributes (git gc): 5.9GB
Raw assets size (single checkout without Git): 11GB
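For reference, the packed size can be read straight from a clone with standard Git plumbing (these are stock commands, not part of the original test setup):

# Pack statistics; "size-pack" is the on-disk size of all packfiles
git count-objects -vH
# Per-object view of a pack: deltified objects show a delta depth and base object
git verify-pack -v .git/objects/pack/pack-*.idx | head -n 20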
11GB of assets are tracked across 216 Git commits with automatic delta compression (8.4GB) packed into separate "blob packs". If you do Git maintenance and execute git gc --aggressive, Git will repack all of the commits and recompute their binary deltas (packing becomes more efficient once you have a long history of commits).
If you add a .gitattributes file and reorganize the Git history so that the gitattributes file is the initial commit, Git appears to handle binary assets more efficiently than relying on its automatic heuristics for binary deltas.
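One possible way to do that reorganization (an assumption for illustration, not necessarily how the test history was rewritten) is to copy the attributes file into every commit so it is effectively present from the root commit onward; git filter-branch can do this, though newer Git recommends git filter-repo for history rewrites:

# Run in a throwaway clone; rewriting history changes every commit SHA.
cp .gitattributes /tmp/gitattributes
git filter-branch --tree-filter 'cp /tmp/gitattributes .gitattributes' -- --all
git gc --aggressive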
File size by file extension:
File extension    File size    Type of file
7z                243M         binary
blend             3.6G         binary
cfdg              4.0K         text
jpg               8.8M         binary
kra               516K         binary
md                4.0K         text
odg               32K          binary
png               2.8M         binary
psd               4.0M         binary
svg               3.0M         text/mixed
xcf               6.3G         binary
zip               189M         binary
total             11G
Benchmark
git gc --aggressive benchmark (with the history reordered so that .gitattributes is the initial commit):
70 minutes
$ time git gc --aggressive
Enumerating objects: 4300, done.
Counting objects: 100% (4300/4300), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4296/4296), done.
Writing objects: 100% (4300/4300), done.
Selecting bitmap commits: 163, done.
Building bitmaps: 100% (106/106), done.
Total 4300 (delta 2034), reused 2167 (delta 0), pack-reused 0
real 70m31.830s
user 301m40.752s
sys 0m29.756s
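A rough way to reproduce the before/after numbers on a clone (plain shell; the ~5.9GB figure is the result quoted above, not a guarantee):

du -sh .git                # packed size before the repack
time git gc --aggressive   # the step benchmarked above (~70 minutes on this history)
du -sh .git                # size after; the test above landed around 5.9GB with .gitattributes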
Some source
Size calculation:
# total_size: read `du -c` output and print "<label>: <size>" from its trailing "total" line
total_size() { grep -o '.*total$' | sed "s/\\([^ \\t]\\+\\).*total/${1}: \\1/";}
# For every file extension found in the tree, sum the size of all files with that extension
find * -type f | sed 's/^.*\.//' | sort -u | while read -er ext; do find * -type f -name "*.${ext}" -exec du -sch {} + | total_size "${ext}";done
# Grand total across the whole checkout
du -shc * | total_size total