Fix RSS 1.0 data modeling #47

Open
matthew-carroll opened this issue Jan 14, 2024 · 2 comments
Labels
area_rss1 bounty_donation Non-compensated work p1 Critical to solve but not immediate type_bug Something isn't working

Comments

@matthew-carroll
Collaborator

While working on RSS 1.0 serialization, I discovered that the data model seems to be wrong and incomplete. We should fix the data model so that it captures all information from an RSS 1.0 document.

Here's a copy of what I found while working on serialization:

I think the existing RSS 1.0 data model is incorrect. Here's an RSS 1.0 basic example from the test directory:

<?xml version="1.0"?>

<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns="http://purl.org/rss/1.0/"
>

    <channel rdf:about="http://www.xml.com/xml/news.rss">
        <title>XML.com</title>
        <link>http://xml.com/pub</link>
        <description>XML.com features a rich mix of information and services for the XML community.</description>

        <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif"/>

        <items>
            <rdf:Seq>
                <rdf:li resource="http://xml.com/pub/2000/08/09/xslt/xslt.html"/>
                <rdf:li resource="http://xml.com/pub/2000/08/09/rdfdb/index.html"/>
            </rdf:Seq>
        </items>

        <textinput rdf:resource="http://search.xml.com"/>

    </channel>

    <image rdf:about="http://xml.com/universal/images/xml_tiny.gif">
        <title>XML.com</title>
        <link>http://www.xml.com</link>
        <url>http://xml.com/universal/images/xml_tiny.gif</url>
    </image>

    <item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html">
        <title>Processing Inclusions with XSLT</title>
        <link>http://xml.com/pub/2000/08/09/xslt/xslt.html</link>
        <description>Processing document inclusions with general XML tools can be problematic. This article proposes a way of preserving inclusion information through SAX-based processing.</description>
    </item>

    <item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
        <title>Putting RDF to Work</title>
        <link>http://xml.com/pub/2000/08/09/rdfdb/index.html</link>
        <description>
            Tool and API support for the Resource Description Framework
            is slowly coming of age. Edd Dumbill takes a look at RDFDB,
            one of the most exciting new RDF toolkits.
        </description>
    </item>

    <textinput rdf:about="http://search.xml.com">
        <title>Search XML.com</title>
        <description>Search XML.com's XML collection</description>
        <name>s</name>
        <link>http://search.xml.com</link>
    </textinput>

</rdf:RDF>

Here's the spec for RSS 1.0: https://validator.w3.org/feed/docs/rss1.html#s5.5

Yet, here's the property list from rss1_feed.dart:

  final String? title;
  final String? description;
  final String? link;
  final String? image;
  final List<Rss1Item> items;
  final UpdatePeriod? updatePeriod;
  final int? updateFrequency;
  final DateTime? updateBase;
  final DublinCore? dc;

The parsing behavior is as follows:

    final document = XmlDocument.parse(xmlString);
    XmlElement rdfElement;
    try {
      rdfElement = document.findAllElements('rdf:RDF').first;
    } on StateError {
      throw ArgumentError('channel not found');
    }

    final channel = rdfElement.findElements('channel');
    return Rss1Feed(
      title: findElementOrNull(rdfElement, 'title')?.innerText,
      link: findElementOrNull(rdfElement, 'link')?.innerText,
      description: findElementOrNull(rdfElement, 'description')?.innerText,
      items: rdfElement.findElements('item').map((element) => Rss1Item.parse(element)).toList(),
      image: findElementOrNull(rdfElement, 'image')?.getAttribute('rdf:resource'),
      updatePeriod: _parseUpdatePeriod(
        findElementOrNull(rdfElement, 'sy:updatePeriod')?.innerText,
      ),
      updateFrequency: parseInt(
        findElementOrNull(rdfElement, 'sy:updateFrequency')?.innerText,
      ),
      updateBase: parseDateTime(
        findElementOrNull(rdfElement, 'sy:updateBase')?.innerText,
      ),
      dc: channel.isEmpty ? null : DublinCore.parse(rdfElement.findElements('channel').first),
    );

We can see that this object parses the whole document, so it should capture enough information to recover the document, but it doesn't.

We can see that the parser pulls the title, description and link from the top-level RDF element, as it should.

We can see that the parser collects and parses all the top-level items within the RDF element, as it should.

However, the top-level image is reduced to a single attribute, even though the image can contain a title, link, and url, so we're losing information. Based on a quick check of the spec, it looks like this parser might be confusing two different image elements. There's an image element directly under the RDF element, which carries the title, link, and url and is the one we want. Then there's an image element under the channel element, which only holds an rdf:resource reference. This parser reads only the rdf:resource attribute, as if it were handling the channel version, but it should be parsing the image as a top-level RDF child with its own child elements.
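To make the distinction concrete, here is a minimal sketch of how the two image elements could relate in the data model. The names `Rss1Image` and `resolveChannelImage` are hypothetical and are not part of the package today. It assumes, per the RSS 1.0 spec, that the channel's `rdf:resource` value matches the `rdf:about` of the top-level image.

```dart
/// Hypothetical model for a top-level <image> element under <rdf:RDF>.
/// This is an illustration, not the package's current API.
class Rss1Image {
  const Rss1Image({required this.about, this.title, this.link, this.url});

  /// Value of the rdf:about attribute, which identifies this image.
  final String about;

  /// Child elements of the top-level <image>.
  final String? title;
  final String? link;
  final String? url;
}

/// Returns the top-level image whose rdf:about matches the channel's
/// <image rdf:resource="..."/> reference, or null if there is no match.
Rss1Image? resolveChannelImage(String? resource, List<Rss1Image> images) {
  if (resource == null) return null;
  for (final image in images) {
    if (image.about == resource) return image;
  }
  return null;
}
```

With a model along these lines, the channel could keep its reference as a plain string while the feed stores the full image, so both pieces of the document can be written back out.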

Also, the textinput top-level element isn't parsed at all, despite being a part of the specification.
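As a rough sketch of what capturing textinput could look like, the class below holds the four child elements from the example document plus the `rdf:about` attribute, and writes them back out. `Rss1TextInput`, `escapeXml`, and `textInputToXml` are hypothetical names, not existing package API. The fields are nullable only to mirror the style of the existing property list; the spec may warrant stricter types.

```dart
/// Hypothetical model for a top-level <textinput> element.
/// This is an illustration, not the package's current API.
class Rss1TextInput {
  const Rss1TextInput({
    required this.about,
    this.title,
    this.description,
    this.name,
    this.link,
  });

  /// Value of the rdf:about attribute.
  final String about;
  final String? title;
  final String? description;
  final String? name;
  final String? link;
}

/// Escapes the characters that are unsafe in XML text and
/// double-quoted attribute values.
String escapeXml(String text) => text
    .replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll('"', '&quot;');

/// Serializes [input] to a <textinput> element, skipping null children.
String textInputToXml(Rss1TextInput input) {
  final buffer = StringBuffer();
  buffer.writeln('<textinput rdf:about="${escapeXml(input.about)}">');

  void child(String tag, String? value) {
    if (value != null) {
      buffer.writeln('  <$tag>${escapeXml(value)}</$tag>');
    }
  }

  child('title', input.title);
  child('description', input.description);
  child('name', input.name);
  child('link', input.link);
  buffer.write('</textinput>');
  return buffer.toString();
}
```

Keeping serialization next to the model makes it easy to check that whatever the parser reads for textinput can be written back without loss.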

@matthew-carroll matthew-carroll added type_bug Something isn't working area_rss1 bounty_donation Non-compensated work p1 Critical to solve but not immediate labels Jan 14, 2024
@matthew-carroll matthew-carroll self-assigned this Jan 14, 2024
@toseefkhan403

Hi @matthew-carroll, can I take up this issue?

@matthew-carroll
Collaborator Author

@toseefkhan403 I'm already working on it. If you also need this work to be done, please be sure to describe the situation you're facing and why this change would be useful for you.
