Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and speed improvement #3

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

eltomjan
Copy link

@eltomjan eltomjan commented Dec 3, 2017

Hello Tim,

I am sending small bugfixes of original rapidxml and my latest speed improvement.
My test (VS2008/Win/i7) spent 1/2 time parsing and on nVidia Tegra T20 (Colibri T20 COM/Ubuntu) aprox. 75% of original time (<10MB xml).

BR,
El Tom

Copy link
Owner

@timniederhausen timniederhausen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my tests using macros wasn't necessary, as all the *_pred::test() functions were inlined by the compiler (MSVC 15+, Clang 3.6)

Can you tell me which compiler you used to benchmark this?

@@ -657,6 +657,7 @@ namespace rapidxml
xml_base()
: m_name(0)
, m_value(0)
, m_value_size(0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code deliberately left m_value_size uninitialized (it's only accessed if m_value != nullptr)

@@ -689,7 +690,7 @@ namespace rapidxml
//! <br><br>
//! Use value_size() function to determine length of the value.
//! \return Value of node, or empty string if node has no value.
Ch *value() const
const Ch *value() const
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least have a non-cost value() overload that returns a non-const Ch*

(Even then I'm not all too happy about breaking the public interface, especially since iterators etc. only return const nodes/attributes.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not good idea to change the value and there is always static_cast or some dirty tricks to break that rule ;-)

@@ -914,10 +915,19 @@ namespace rapidxml
xml_node(node_type type)
: m_type(type)
, m_first_node(0)
, m_last_node(0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. This was left uninitialized deliberately.

if(shorter) {
// Add terminating zero after decoded shorter value 2
if (!(Flags & parse_no_string_terminators))
*dest = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe the issue this is supposed to fix?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some tests for character references in 3ade903.
To me, everything (nul-terminated strings & length) seems to be just fine. Do you have a test-case that shows your issue?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found another bad feature - default unescape affect also ex. CR & LF (&#13; &#10; &#xA; &#xD;) so in multi-platform environment you cannot process XML without unrecoverable CR/LF mismatch sometimes...

@eltomjan
Copy link
Author

eltomjan commented Dec 5, 2017

When there are sequences like &gt; etc, size is ok, but string is not terminated properly (found end repeated). Anyway thanx 4 nice comments... Now I see lot of changes are my project related only... I am using VS2008 Express and some quite old on Ubuntu (has C++11 support yet), consts are because I had reset command and was not sure if I am not changing content I am using directly, anyway you can't extend value if you don't want to break structure... Next step was single allocation and relocate method, during iterating objects W/O recursion I was almost stucked in uninits/assert errors. If you are interested, these 2 are in my fork yet (but relocation was never used nor tested enought yet and Ubuntu can't allocate single big space by malloc even on x86). It takes 1.5/1.2s now to parse 10MB on Colibri T20 ;-( And VS2k8 do not respect even register - changes ptr somewhere in memory instead - maybe because it is Express(?) or as I'm using debug build as it is more stable on Linux, but macro take similar less time in release too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants