Fix issue where "CDATA" was incorrectly used as content type in atom entries #123

rpsh · 2023-08-10T07:48:19Z

According to the Atom specification, the acceptable values for the type of <content> are text, html, and xhtml.
This PR fixes the previous code that incorrectly used CDATA as the type value.

…format entries

lkiesow · 2023-12-22T01:15:46Z

If you carefully read the specification, that's actually not quite true:

In the most common case, the type attribute is either text, html, xhtml,

But the fact that these are the most common cases does not mean that they are the only acceptable values. In fact, the specification goes on to describe other values right after that.

That being said, you are right that wanting the content enclosed in <![CDATA[...]]> says nothing about the media type, and the two should probably be separated? Just stating that it is HTML is also not correct. I'm open to suggestions.

taesungh · 2024-02-03T23:02:13Z

While the specification does describe other acceptable values for the type attribute, the <content type="CDATA"> elements created by feedgen fail the W3C Feed Validation Service:

Not a valid MIME type: CDATA (Invalid MIME Type)

I found <content:encoded> is often used for RSS feeds, but I'm not sure what the best approach is for Atom.
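To make the validator message above concrete, here is a rough sketch of the rule in RFC 4287, section 4.1.3.1: the type is either text, html, or xhtml, or it must look like a MIME media type that is not composite. The function name and the regular expression are assumptions for illustration; the pattern is a simplified "type/subtype" check, not the full media type grammar, and it does not handle parameters such as charset.

```python
import re

ATOM_TEXT_TYPES = {"text", "html", "xhtml"}

# Simplified "type/subtype" pattern (an assumption, not the full MIME grammar).
MIME_RE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*$"
)


def is_valid_content_type(value):
    """Hypothetical check for the type attribute of an Atom <content> element."""
    if value in ATOM_TEXT_TYPES:
        return True
    if not MIME_RE.match(value):
        # "CDATA" ends up here: it has no "/" and so is not a media type.
        return False
    # Composite media types are not allowed for atom:content.
    return not value.lower().startswith(("multipart/", "message/"))
```

Under this sketch, "html" and "image/png" pass, while "CDATA" is rejected for the same reason the validator gives.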

michaelnordmeyer · 2024-03-06T19:19:09Z

Content type html wrapped in <![CDATA[...]]> is the proper way to do it, IMHO, if the content should be parsed as HTML.

From my understanding, the type is a hint for clients to know the content's format, which only applies after parsing the XML.

<![CDATA[...]]> tells the XML parser to treat everything within as literal character data rather than markup.

The different content types are text, html, and xhtml. If the content has characters which break XML parsing, this has to be mitigated.

True xhtml content won't break XML parsing, because it's an SGML subset. This type cannot be used anymore, because most people use HTML5, which is not a subset of SGML.

html content can break XML parsing, because it's not a subset of SGML, and has to be XML-escaped or put within <![CDATA[...]]>.

text is the same as html, because any text can contain XML-breaking characters.

After the XML parser has parsed the feed, apps can parse the content according to the content type, which is only necessary for html and xhtml, in order to display it to the user.
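The separation described above (CDATA is an XML serialization detail, the type attribute is a hint applied afterwards) can be checked with a small sketch using Python's standard xml.etree.ElementTree. The sample entries and the helper name content_of are made up for illustration; both documents declare type="html" and differ only in how the markup is protected from the XML parser.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Same HTML payload, once inside a CDATA section and once XML-escaped.
cdata_doc = (
    f'<entry xmlns="{ATOM}">'
    '<content type="html"><![CDATA[<p>Fish &amp; chips</p>]]></content>'
    '</entry>'
)
escaped_doc = (
    f'<entry xmlns="{ATOM}">'
    '<content type="html">&lt;p&gt;Fish &amp;amp; chips&lt;/p&gt;</content>'
    '</entry>'
)


def content_of(xml_text):
    """Return (type attribute, parsed text) of the entry's <content> element."""
    entry = ET.fromstring(xml_text)
    node = entry.find(f"{{{ATOM}}}content")
    return node.get("type"), node.text


cdata_result = content_of(cdata_doc)
escaped_result = content_of(escaped_doc)
```

After parsing, both results are the same pair, so a consumer cannot tell (and does not need to know) which of the two forms the producer used.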

Alkarex · 2024-03-08T21:53:52Z

The spec is clear: <content type="CDATA"> needs to be decoded as Base64
https://www.rfc-editor.org/rfc/rfc4287#section-4.1.3.3
For the record, we (at FreshRSS) are receiving reports of bad/wrong feeds, which I am guessing are likely produced by this library
FreshRSS/FreshRSS#6180
See also another example of wrong decoding with SimplePie, the library used e.g. by WordPress
https://simplepie.org/demo/?feed=https%3A%2F%2Fpeterwunder.de%2Fprojects%2Fachievements%2Fatom.xml
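A rough sketch of why this shows up as garbled text: a consumer that follows the processing rules in RFC 4287, section 4.1.3.3 treats any type that is not text, html, xhtml, an XML media type, or a text/* media type as Base64. The function below is a simplified assumption of that behavior, not the code of FreshRSS or SimplePie, and the name interpret_content is hypothetical.

```python
import base64
import binascii

TEXT_TYPES = {"text", "html", "xhtml"}


def interpret_content(type_attr, text):
    """Simplified reading of the RFC 4287 processing rules for atom:content."""
    kind = type_attr.lower()
    is_xml = kind.endswith(("/xml", "+xml"))
    if kind in TEXT_TYPES or is_xml or kind.startswith("text/"):
        # Textual content is used as parsed.
        return text
    # Every other type, including "CDATA", is expected to be Base64.
    try:
        raw = base64.b64decode(text)
    except binascii.Error:
        return None
    return raw.decode("utf-8", errors="replace")


html_payload = "<p>Hello</p>"
as_html = interpret_content("html", html_payload)
as_cdata = interpret_content("CDATA", html_payload)
```

With type="html" the payload comes back unchanged. With type="CDATA" the same payload is pushed through a Base64 decoder, which either fails or returns bytes unrelated to the original markup.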

Alkarex · 2024-03-19T12:45:23Z

@rpsh or anyone with write access to fix the conflicts here?

Fix issue where "CDATA" was incorrectly used as content type in atom …

cc78b0d

…format entries

Alkarex mentioned this pull request Mar 8, 2024

[BUG]The subscription source has garbled characters. FreshRSS/FreshRSS#6180

Closed

Alkarex approved these changes Mar 8, 2024


Alkarex mentioned this pull request Mar 8, 2024

SimplePie workaround for Atom cdata type bug FreshRSS/FreshRSS#6181

Closed

Alkarex mentioned this pull request Apr 6, 2024

[BUG] CSS Selector Retrieves Garbled Text FreshRSS/FreshRSS#5586

Closed
