Documentation is too big #3479
Comments
I looked at de-duplicating files between versions. But every HTML file has a timestamp of when it was compiled 😢. They also contain the navigation tree, so any change to navigation (a new page, a header, etc.) will prevent de-duplication. |
Thoughts @mortenpi? |
Could look into just retaining the latest patch version per minor maybe? I don't have any amazing ideas though. I guess the "official" recommendation here would be to look at deploying to self-hosted S3 buckets or something if the docs get too big? |
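The "latest patch per minor" selection is easy to script. A minimal sketch (the function name and the `vX.Y.Z` folder-name convention are my assumptions, not existing tooling):

```julia
# Hypothetical helper: from the folder names in the gh-pages root, keep
# only the newest patch release of each minor series.
function latest_patch_per_minor(dirnames)
    versions = VersionNumber[]
    for d in dirnames
        m = match(r"^v(\d+\.\d+\.\d+)$", d)
        m === nothing || push!(versions, VersionNumber(m.captures[1]))
    end
    keep = Dict{Tuple{Int,Int},VersionNumber}()
    for v in versions
        key = (Int(v.major), Int(v.minor))
        if !haskey(keep, key) || v > keep[key]
            keep[key] = v
        end
    end
    return sort!(collect(values(keep)))
end

latest_patch_per_minor(["v1.0.0", "v1.0.2", "v1.1.0", "v0.21.5", "previews"])
# → [v"0.21.5", v"1.0.2", v"1.1.0"]
```

Every version folder not in the returned list would be a candidate for deletion or a redirect.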
It's a larger change, but ideally:
There are still problems, like the fact that

Here's a script I tried, but I ran into the nav issue:

```julia
import SHA

function sha256(filename)
    if endswith(filename, ".html")
        # Strip the build timestamp so otherwise-identical pages hash the same:
        # <span class="colophon-date" title="Saturday 2 September 2023 03:04">Saturday 2 September 2023</span>
        contents = read(filename, String)
        io = IOBuffer()
        write(
            io,
            replace(contents, r"<span class=\"colophon-date\".+?</span>" => ""),
        )
        seekstart(io)
        return bytes2hex(SHA.sha256(io))
    else
        return bytes2hex(open(SHA.sha256, filename))
    end
end

dirs = filter!(!isnothing, match.(r"v(\d+\.\d+)\.(\d+)", readdir(".")))
versions = sort([VersionNumber(m.match) for m in dirs])
sha_file_to_version = Dict{Tuple{String,String},String}()
for v in versions
    version = "v$v"
    for (root, dirs, files) in walkdir(version)
        for file in files
            path = joinpath(root, file)
            sha = sha256(path)
            filename = replace(path, "$(version)/" => "")
            first_version = get(sha_file_to_version, (sha, filename), nothing)
            if first_version !== nothing
                # Link to the first version that shipped an identical file. The
                # target must be relative to the symlink's own directory,
                # otherwise the link does not resolve.
                target = relpath(joinpath(first_version, filename), dirname(path))
                run(`ln -fs $target $path`)
            else
                sha_file_to_version[(sha, filename)] = version
            end
        end
    end
end
```

The SciML docs are going to have the same problem at some point. Their new versions are 16 MB (although they have a 3.9 Gb
|
Just to double check -- does the deduplication even work? I wouldn't be surprised if

The timestamp could have an option to disable it (or maybe even populate it dynamically from

Also, I'm not sure we want the de-duplication complexity in Documenter. I understand that JuMP is hitting this edge case, but it is an edge case. That said, I guess it would have to be a

SciML deploys to S3, by the way: https://github.com/SciML/SciMLDocs/blob/b9b5008c1fdca03b9365ec78ff220cefab48b632/.buildkite/aggregate.yml#L26-L31 |
One more note: if your |
Aren't the symlinks how we're doing the
Fresh clones are not very big
The overhead of managing a separate doc repo outweighs the benefit, I think. |
That's fair; it clones reasonably quickly, on a fast connection anyhow. But just as a datapoint, 90% of the time and disk space is spent on
|
I mean, they definitely work, as in they get deployed correctly. But are we sure they actually reduce the size of the tarball that GitHub tries to upload? I guess if they didn't, then the tarball would be much larger though, and so maybe they do work... |
Actually, pretty sure it doesn't. So currently, the whole
But the artifact is 900+ MB. And you can download it, actually -- it's a zip archive (of a |
Oh 😢 I guess we could replace a bunch with explicit re-directs then, like we did for https://github.com/JuliaOpt/juliaopt.github.io/blob/master/JuMP.jl/dev/index.html |
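Writing those redirect stubs could be scripted. A hedged sketch (the function name and page layout are my own, modelled on the juliaopt.github.io page linked above) that replaces an old page with a meta-refresh stub:

```julia
# Hypothetical sketch: overwrite an old HTML page with a tiny stub that
# sends the browser to the equivalent page in a newer version.
function write_redirect(old_path, new_url)
    open(old_path, "w") do io
        write(io, """
        <!DOCTYPE html>
        <html>
        <head>
          <meta http-equiv="refresh" content="0; url=$(new_url)" />
          <link rel="canonical" href="$(new_url)" />
        </head>
        <body><p>This page has moved to <a href="$(new_url)">$(new_url)</a>.</p></body>
        </html>
        """)
    end
    return old_path
end
```

Run over every file in a retired version folder, this leaves a tree of tiny stubs instead of full pages.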
Anyway, I get your point that this is something we can fix. I don't know if documenter needs to do anything. |
For redirects, I have been working on some shared tooling in DocumenterTools: JuliaDocs/DocumenterTools.jl#76 It's still a draft though, since I am not sure what API we want exactly, so would be happy to have feedback there if it's something that you could use 🙂 |
So one problem with the redirects is that pages can move or be deleted between patch releases. There are actually quite a few changes |
One thing that would save a lot is to remove the search index: odow/SDDP.jl#661, https://odow.github.io/SDDP.jl/v0.3.13/search/ It saves quite a lot of space, and the only downside is that someone can't search through old versions of the docs. I didn't realize that |
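Removing the old search indexes is also scriptable. A minimal sketch, assuming version folders named `vX.Y.Z` and Documenter's `search_index.js` filename (older versions may store the index elsewhere, e.g. under `search/`):

```julia
# Hypothetical sketch: delete search_index.js from every version folder
# except those listed in `keep`. Returns the number of bytes reclaimed.
function strip_search_indexes(root; keep = String[])
    saved = 0
    for dir in readdir(root)
        occursin(r"^v\d+\.\d+\.\d+$", dir) || continue
        dir in keep && continue
        idx = joinpath(root, dir, "search_index.js")
        if isfile(idx)
            saved += filesize(idx)
            rm(idx)
        end
    end
    return saved
end
```

You would pass the newest versions in `keep` so that current docs stay searchable.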
I also wonder about just removing all of the |
What about retaining just the

Potentially, this type of post-processing could also be done in the GitHub workflow that actually deploys to GitHub Pages: https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site#creating-a-custom-github-actions-workflow-to-publish-your-site |
I guess the question is to what extent we want to support permalinks. If the docs can change between patch releases, then I'm okay having a script that we periodically run to post-process things. |
This has come up before JuliaDocs/Documenter.jl#1914 |
This was actually slightly non-trivial to get working.

First up, we can change the doc build to use a GitHub action:

Then, because pushes from

We need to upload with an SSH key:

So it can trigger the deploy action:

We still need

I haven't tried yet, but here would be the place to delete or redirect any pages that we want to remove |
The documentation preview from #3478 failed to build because it's very big:
https://github.com/jump-dev/JuMP.jl/actions/runs/6054461407/job/16431976144
I've gone and removed some old previews, but we should consider removing some old versions.
Do we need https://jump.dev/JuMP.jl/v0.19.2/ or as far back as https://jump.dev/JuMP.jl/v0.12/?
Perhaps we could just keep the latest patch release for each minor and redirect.
Each copy of the documentation is ~21 MB, so we can store ~50 copies before we hit 1 GB.
We could also redirect all of the PDFs to the latest copy, rather than storing a 4 MB PDF for every version.
cc @mortenpi