[WIP] Lxml Parsing #891
Thanks for taking a look at this. Unfortunately it is a non-trivial task. We stopped regenerating the XML parsing code when we started manually adjusting it. Regenerating it now will likely require a lot of changes, which is hard to achieve in a Python project that doesn't have much type safety or good test coverage, though clearly these files are at the heart of Breathe's performance and memory problems. In this case, it is hard to know whether this is the first of a small number of potential problems or of a very large number.

If this project were well funded and supported then we might have the time and energy to work through such a large change, but we have not been successful at securing such funding, partly due to a lack of knowledge of how to go about it.

Personally, my preference for languages has also drifted away from Python. I find large Python code bases hard to maintain, and whilst the new optional typing can help with that, it is still not as robust as other languages. I have started on a rewrite of Breathe using Rust as the backend and a slim Python layer connected via PyO3 bindings. The hope is that it will be significantly faster and less memory intensive whilst also being more maintainable due to Rust's type system. I have a simple proof of concept, but it is a long way from matching Breathe's current functionality, though I would like to get it there.

In order to attempt to make it a more sustainable project, I intend to license it under the Parity license, which makes it free to use for open source projects but will require a commercial license for closed source work. I realise this is not optimal for all of Breathe's current users, but it will hopefully give the project a clearer future and the potential to expand in scope to deliver more solutions in this space.
@AnthonyDiGirolamo our team is also interested in contributing to improved performance, since our documentation builds currently take several hours. I tried your branch to see if I could make any progress, but ran into some issues that look more basic than the issue you described in the PR: (backtrace)
It looks like the call to
@igrr Thanks for trying it! Those are the types of exceptions I was working through. I got through the ones caused by the unit tests but not the ones from running an actual Sphinx generation. I started this effort by re-running the latest generate-ds version which provided new The latest generate-ds version uses lxml for parsing, which is what speeds everything up.

For what it's worth, I set up our Breathe config to only run Doxygen on files with Doxygen comment blocks. This reduces the number of XML output files that Breathe has to parse, which cut down our generation time drastically. Here's the change that did that: docs: Support Doxygen style comment blocks
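The filtering idea described above can be sketched roughly as follows. This is not the linked change, only a minimal illustration: the function names, the file suffixes, and the set of comment markers (`/**`, `/*!`, `///`, `//!`) are assumptions, and the regular expression is a heuristic that can produce false positives (for example, a marker inside a string literal).

```python
import re
from pathlib import Path

# Heuristic match for common Doxygen comment openers: /** , /*! , /// , //!
# Assumption: these four markers cover the comment styles used in the project.
DOXYGEN_MARKER = re.compile(r"/\*\*(?!/)|/\*!|///|//!")


def has_doxygen_comment(text: str) -> bool:
    """Return True if the text appears to contain a Doxygen comment block."""
    return DOXYGEN_MARKER.search(text) is not None


def doxygen_inputs(root, suffixes=(".h", ".hpp", ".cc", ".cpp")):
    """Collect source files under `root` that contain Doxygen comments.

    The returned paths could be written into Doxygen's INPUT setting so that
    XML is only generated for files that actually carry documentation.
    """
    selected = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if has_doxygen_comment(text):
                selected.append(path)
    return selected
```

The fewer files Doxygen sees, the fewer XML files Breathe has to parse, which is where the time saving comes from.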
Right, but aren't both I guess I'll also try to run generateDS locally and see how far I can get.
As far as I can tell there was some code in
Alright folks, we need to move forward with this. There is no way we can accept, in 2023, taking an hour to build motherfucking text files. (No one's fault, of course.) I am happy to contribute, but right from the get-go I'm confused about what the ruckus is all about. If my assessment is correct, our goal is to parse XML and build a structure of objects that is native to Sphinx, then either store these in 'doctree' files or pass them to Sphinx so it can run its HTML rendering magic. So what is so hard about switching over to lxml? Both are still XML underneath, so it should be a matter of updating the API calls everywhere, shouldn't it?
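For a hand-written reader, the question above has a fairly simple answer, because `lxml.etree` mirrors most of the standard library's ElementTree API. A minimal sketch, assuming a heavily simplified Doxygen-style XML fragment and a hypothetical `Member` class (neither is taken from Breathe's code):

```python
from dataclasses import dataclass

try:
    from lxml import etree as ET  # C-backed parser, used if installed
except ImportError:
    import xml.etree.ElementTree as ET  # standard library fallback

# Simplified, illustrative fragment; real Doxygen output has many more fields.
SAMPLE = b"""<doxygen>
  <compounddef id="classFoo" kind="class">
    <compoundname>Foo</compoundname>
    <sectiondef kind="public-func">
      <memberdef kind="function" id="classFoo_1a1">
        <name>bar</name>
        <briefdescription><para>Does bar.</para></briefdescription>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>"""


@dataclass
class Member:
    """Hypothetical plain-Python record built from one <memberdef> element."""
    kind: str
    name: str
    brief: str


def parse_members(xml_bytes):
    """Parse XML bytes and return one Member per <memberdef> element."""
    root = ET.fromstring(xml_bytes)
    members = []
    for node in root.iter("memberdef"):
        brief = node.find("briefdescription")
        # itertext() flattens nested <para> content into plain text.
        brief_text = "".join(brief.itertext()).strip() if brief is not None else ""
        members.append(
            Member(
                kind=node.get("kind", ""),
                name=node.findtext("name", default=""),
                brief=brief_text,
            )
        )
    return members
```

In this sketch only the import changes between the two libraries. Going by the maintainer's comment earlier in this thread, the harder part in Breathe is that the parser is generated code that was later adjusted by hand, so regenerating it means redoing or reconciling those adjustments.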
Hey @oxysoft, yep, that's right. This is still a problem for our Sphinx build too. We try to limit Doxygen to only run on the files we care about, to limit what Breathe parses. It worked well when there were fewer files, but now it's getting too big and slow. (List of inputs: https://cs.opensource.google/pigweed/pigweed/+/main:docs/BUILD.gn;l=113-197) Hopefully we can revisit this soon.
Side note: there is also the approach of #962. I am not sure which is preferable at this point, or how far along either implementation is.
Hi all! My team is looking to use Doxygen with Breathe and noticed very long build times, roughly 60 seconds per rst file.
I started hacking on the parser stuff and re-ran generateDS 2.41.1 against doxygen 1.9.6 xsd files.
So far ./tests/runtests.sh passes, but I can't yet run sphinx. The error I'm getting in our sphinx run is
Any ideas?