Strings with accented characters are being treated as arrays #18
Comments
Oh, that's excellent. I've been waiting for this one to come up but couldn't produce the issue myself with FileMaker as the data source. In SAX parsing, it's considered perfectly legitimate for the parser to send textual data in chunks. It's even OK to send text chunks before AND after a sub-element. The receiving handler is expected to sort it all out. So, this is considered perfectly legal:
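To make the chunking concrete, here is a minimal sketch using Ruby's bundled REXML stream parser. REXML is only a stand-in here, since it is not one of the parsers discussed in this thread, and the `<b>` sub-element, the `ChunkRecorder` class, and the `text_chunks` helper are all made up for illustration. It shows the text of one element arriving in separate callbacks, before and after a sub-element:

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Records every text callback as its own chunk, without joining them.
# (Hypothetical class, for illustration only.)
class ChunkRecorder
  include REXML::StreamListener

  attr_reader :chunks

  def initialize
    @chunks = []
  end

  # Called once for each run of character data the parser reports.
  def text(data)
    @chunks << data
  end
end

# Hypothetical helper: parse +xml+ and return the text chunks in callback order.
def text_chunks(xml)
  recorder = ChunkRecorder.new
  REXML::Document.parse_stream(xml, recorder)
  recorder.chunks
end

chunks = text_chunks("<memotext>caf<b>é</b> au lait</memotext>")
puts chunks.inspect # more than one chunk for a single field
puts chunks.join    # the text only becomes whole once the chunks are joined
```

A handler that stores each callback as a separate value would end up with a list of fragments rather than one string, which may be what the array symptom in the issue title looks like from the outside.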
I've never worked with any data sources that produced this kind of XML, and FileMaker doesn't return any broken text like that. But it looks like Nokogiri and Libxml send the text in two separate callbacks to the SAX parser's text handler whenever special characters are encountered. I'm not sure what the rule is with special characters yet, but it shouldn't matter - the rfm parser needs to concatenate all the text chunks into a single data string. Before the "sax-without-buffer" branch, a buffer collected all the text before sending it to the translator (which produces rfm objects). The buffer was a temporary fix for this chunking issue, and it added unnecessary overhead and complexity. So I yanked it. Now I just need to handle the chunked data in a different way. Here's a parsing of a small data set containing repeating fields and an accented é, for each of the four parsing gems. Note the "memotext" field in the 2nd record of each parsing.
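One way to concatenate chunks without a separate buffering stage is to append each text callback directly to the value of the element that is currently open. The following is only a sketch of that idea under simplifying assumptions, not rfm's actual code: the `ChunkJoiningHandler` class and its method names are hypothetical, and the callbacks are invoked by hand to simulate a parser that splits the text at the accented character.

```ruby
# Hypothetical SAX-style handler that joins text chunks per element.
class ChunkJoiningHandler
  attr_reader :fields

  def initialize
    @fields = {} # element name => full text collected so far
    @stack = []  # names of currently open elements, innermost last
  end

  def start_element(name)
    @stack.push(name)
    @fields[name] ||= "".dup # unfrozen, so chunks can be appended
  end

  # May be called several times for the text of a single element.
  def characters(chunk)
    current = @stack.last
    return if current.nil? # ignore text outside any element
    @fields[current] << chunk
  end

  def end_element(_name)
    @stack.pop
  end
end

# Simulate a parser that delivers the text in two callbacks.
handler = ChunkJoiningHandler.new
handler.start_element("memotext")
handler.characters("caf")
handler.characters("é au lait")
handler.end_element("memotext")
puts handler.fields["memotext"] # prints "café au lait"
```

Because the text is appended in place, the result is a single string no matter how many callbacks the parser uses. This sketch keys values by element name only, so it does not deal with repeating fields or multiple records.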
Just as you reported. Anyway, I'm well on the way to addressing this, and the missing data element issue too.
OK, this should be fixed in the sax-without-buffer and master branches now. I noted that Ox appears to drop an extra line feed from source XML text nodes in Ruby 1.8.7. This doesn't seem to be a problem in 1.9 and beyond.
Using the sax-without-buffer branch.