Bidirectional, data-driven RSS/Atom feed consumer, producer and feeds aggregator

alekseysotnikov/buran

Buran (meaning "Snowstorm" or "Blizzard") was the first spaceplane to be produced as part of the Soviet/Russian Buran programme. Wikipedia

Buran

Buran is a library for consuming and producing RSS/Atom feeds using a data-driven approach. It works as a ROME wrapper, but in Buran feeds are represented as plain data structures.

Buran can be used as an aggregator for various feed formats, converting them into regular Clojure data structures. When consuming a feed, Buran creates a map that can be read or manipulated with regular functions such as filter, sort, assoc, dissoc, and more. After the modifications, Buran can generate a new feed from the map, for example in a different format (RSS 2.0, 1.0, 0.9x or Atom 1.0, 0.3).
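To illustrate what "regular functions" means here, the following minimal sketch works on a hand-written map in the same shape as the feed maps shown in this README. It uses only clojure.core and clojure.string and does not call Buran at all; the entry data, the `clojure-entry?` predicate and the new title are made up for the example.

```clojure
(require '[clojure.string :as str])

;; A hand-written feed map (illustrative data, not the output of a real feed).
(def feed
  {:info    {:feed-type "atom_1.0"
             :title     "Feed title"}
   :entries [{:title "Clojure tips" :description {:value "more clojure"}}
             {:title "Weather"      :description {:value "snowstorm"}}
             {:title "Clojure news" :description {:value "about clojure"}}]})

;; Hypothetical predicate: keep entries whose title mentions Clojure.
(defn clojure-entry? [entry]
  (str/includes? (:title entry) "Clojure"))

(def result
  (-> feed
      ;; Drop entries that do not match the predicate.
      (update :entries #(filterv clojure-entry? %))
      ;; Order the remaining entries by title.
      (update :entries #(vec (sort-by :title %)))
      ;; Change feed-level metadata with an ordinary assoc-in.
      (assoc-in [:info :title] "Clojure only")))

(mapv :title (:entries result))
;; => ["Clojure news" "Clojure tips"]
```

Because the result is still an ordinary map in the same shape, it can be handed on to the API functions described under Usage.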

Installation

  1. Add the dependency to project.clj: [buran "0.1.4"]

  2. Import the API

In your namespace:

(:require [buran.core :refer [consume consume-http produce combine-feeds filter-entries sort-entries-by shrink]])

Or at the REPL:

(require '[buran.core :refer [consume consume-http produce combine-feeds filter-entries sort-entries-by shrink]])

Usage

Regardless of the feed format you are working with, and whether you want to consume a feed or produce a new one, Buran uses the same data structure every time. Buran's API is concise: consume, consume-http and produce, plus helpers for manipulating feeds (combine-feeds, filter-entries, sort-entries-by and shrink). The basic workflow is to pass this data structure from one API function to the next. See the Various options section for details.

Examples

Consume a feed from a String

(def feed "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
           <feed xmlns=\"http://www.w3.org/2005/Atom\">
             <title>Feed title</title>
             <subtitle />
             <entry>
               <title>Entry title</title>
               <author>
                 <name />
               </author>
               <summary>entry description</summary>
             </entry>
           </feed>
           ")
(shrink (consume feed))
=>
{:info    {:feed-type "atom_1.0", 
           :title     "Feed title"},
 :entries [{:title       "Entry title", 
            :description {:value "entry description"}}]}

Produce a feed

(def feed {:info {:feed-type "atom_1.0"
                  :title     "Feed title"}
           :entries [{:title       "Entry title"
                      :description {:value "entry description"}}]})
(produce feed)
=>
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <feed xmlns=\"http://www.w3.org/2005/Atom\">\r
   <title>Feed title</title>\r
   <subtitle />\r
   <entry>\r
     <title>Entry title</title>\r
     <author>\r
       <name />\r
     </author>\r
     <summary>entry description</summary>\r
   </entry>\r
 </feed>
 "

Consume a feed over HTTP

(consume-http "https://stackoverflow.com/feeds/tag?tagnames=clojure")
=>
{:info {...},
 :entries [...],
 :foreign-markup [...]}

Shrink a feed (remove nils, empty collections, empty maps, etc.)

(shrink (consume-http "https://stackoverflow.com/feeds/tag?tagnames=clojure"))
=>
{:info {:description "most recent 30 from stackoverflow.com",
        :feed-type "atom_1.0",
        :published-date #inst"2018-08-20T08:03:33.000-00:00",
        :title "Active questions tagged clojure - Stack Overflow",
        :link "https://stackoverflow.com/questions/tagged/?tagnames=clojure&sort=active",
        :uri "https://stackoverflow.com/feeds/tag?tagnames=clojure",
        :links [{:href "https://stackoverflow.com/questions/tagged/?tagnames=clojure&sort=active",
                 :type "text/html",
                 :rel "alternate",
                 :length 0}, ...]},
 :entries [{:description {:type "html", :value "<p>..."},
            :updated-date #inst"2018-08-20T06:16:12.000-00:00",
            :foreign-markup [...],
            :published-date #inst"2018-08-20T05:54:39.000-00:00",
            :title "Clojure evaluate lazy sequence",
            :author "Constantine",
            :categories [{:name "clojure", :taxonomy-uri "https://stackoverflow.com/tags"}, ...],
            :link "https://stackoverflow.com/questions/51924808/clojure-evaluate-lazy-sequence",
            :uri "https://stackoverflow.com/q/51924808",
            :authors [{:name "Constantine", :uri "https://stackoverflow.com/users/4201205"}],
            :links [{:href "https://stackoverflow.com/questions/51924808/clojure-evaluate-lazy-sequence",
                     :rel "alternate",
                     :length 0}]}, ...],
 :foreign-markup [...]}

Various options

Consume feed

(consume {:from             (java.io.File. (System/getProperty "user.home") "feed.xml")
                                        ; String, File, Reader, W3C DOM document, JDOM document, W3C SAX InputSource
                                        ; (java.io.File does not expand "~", so the home directory is resolved explicitly)
          :validate         false       ; Indicates if the input should be validated
          :locale           java.util.Locale/US ; java.util.Locale
          :xml-healer-on    true        ; Healing trims leading chars from the stream (empty spaces and comments) until the XML prolog.
                                        ; Healing resolves HTML entities (from literal to code number) in the reader.
                                        ; The healing is done only with the File and Reader.
          :allow-doctypes   false       ; Enable only when the feeds you process are fully trusted
          :throw-exception  false       ; false - return map with an exception, throw an exception otherwise
         })
(consume-http {:from             "https://stackoverflow.com/feeds/tag?tagnames=clojure" 
                                                      ; <http url string>, URL, File, InputStream
               :headers          {"X-Header" "Value"} ; Request's HTTP headers map
               :lenient          true                 ; Indicates if the charset encoding detection should be relaxed
               :default-encoding "US-ASCII"           ; Supports: UTF-8, UTF-16, UTF-16BE, UTF-16LE, CP1047, US-ASCII
               ;; ... plus all options accepted by a (consume) call
              })

Beware! Fetching with consume-http from an HTTP URL string or a URL is rudimentary and works only for the simplest cases. For instance, it does not follow HTTP 302 redirects. Consider using a separate library such as clj-http or http-kit to fetch the feed.
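As one way to act on this advice without adding a dependency, the sketch below fetches the feed body with the JDK's own java.net.http.HttpClient (Java 11+), configured to follow redirects, and returns it as a String. This swaps in the JDK client in place of clj-http or http-kit; the name `fetch-feed-body` and the choice to throw on any non-200 status are assumptions made for illustration, not part of Buran.

```clojure
(import '(java.net URI)
        '(java.net.http HttpClient HttpClient$Redirect
                        HttpRequest HttpResponse HttpResponse$BodyHandlers))

;; A shared client that follows redirects (except HTTPS -> HTTP downgrades).
(def ^HttpClient client
  (-> (HttpClient/newBuilder)
      (.followRedirects HttpClient$Redirect/NORMAL)
      (.build)))

(defn fetch-feed-body
  "Hypothetical helper: GETs `url` and returns the response body as a String.
   Throws ex-info when the final status is not 200."
  [^String url]
  (let [^HttpRequest request   (-> (HttpRequest/newBuilder (URI/create url))
                                   (.GET)
                                   (.build))
        ^HttpResponse response (.send client request
                                      (HttpResponse$BodyHandlers/ofString))]
    (if (= 200 (.statusCode response))
      (.body response)
      (throw (ex-info "Unexpected HTTP status"
                      {:status (.statusCode response) :url url})))))

;; Usage (requires network access and Buran on the classpath):
;; (consume (fetch-feed-body "https://stackoverflow.com/feeds/tag?tagnames=clojure"))
```

Since consume accepts a String as input, the fetched body can be passed to it directly, and the fetching concerns (redirects, timeouts, headers) stay in the HTTP client.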

Produce feed

(produce {:feed            {:info {:feed-type "atom_1.0" ; Supports: atom_1.0, atom_0.3, rss_2.0, 
                                                         ; rss_1.0, rss_0.94, rss_0.93, rss_0.92, 
                                                         ; rss_0.91U (Userland), rss_0.91N (Netscape), 
                                                         ; rss_0.9
                                   :title "Feed title"}
                            :entries [{:title       "Entry 1 title"
                                       :description {:value "entry description"}}]
                            :foreign-markup nil}

          :to              :string ; <file path string>, :string, :w3cdom, :jdom, File, Writer
          :pretty-print    true    ; Pretty-print XML output
          :throw-exception false   ; false - return map with an exception, throw an exception otherwise
         })

License

Copyright © 2018-2023 Aleksei Sotnikov

Distributed under the Apache License 2.0