Add external_urls filter
This filter traverses all <a> tags and replaces each
URL with one pointing to a path in existing
documentation.
MasterEnoc committed Mar 5, 2021
1 parent e9d7849 commit 38e2b10
Showing 6 changed files with 59 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/filter-reference.md
@@ -84,6 +84,7 @@ The `call` method must return either `doc` or `html`, depending on the type of f
* [`AttributionFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/attribution.rb) — appends the license info and link to the original document
* [`TitleFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/title.rb) — prepends the document with a title (disabled by default)
* [`EntriesFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/entries.rb) — abstract filter for extracting the page's metadata
* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb) — replaces external URLs with relative URLs pointing to existing DevDocs documentation

## Custom filters

5 changes: 5 additions & 0 deletions docs/scraper-reference.md
@@ -115,6 +115,7 @@ Additionally:

* [`TitleFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/title.rb) is a core HTML filter, disabled by default, which prepends the document with a title (`<h1>`).
* [`EntriesFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/entries.rb) is an abstract HTML filter that each scraper must implement and that is responsible for extracting the page's metadata.
* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb) is an HTML filter that replaces external URLs found in `<a>` tags with URLs pointing to existing DevDocs documentation.

### Filter options

@@ -185,6 +186,10 @@ More information about how filters work is available on the [Filter Reference](.

_Note: this filter is disabled by default._

* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb)

- `:external_urls` [Hash or Proc] If it is a Hash, it maps external hostnames to DevDocs documentation names, and every matching URL found in an `<a>` tag is replaced with a relative URL pointing to that existing DevDocs documentation. If it is a Proc, it is called with a URL (string) as its argument and should return a relative URL pointing to existing DevDocs documentation. See [`backbone.rb`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/scrapers/backbone.rb) for an example.
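
As a rough illustration of the Proc form, the sketch below rewrites links to one external host and returns everything else unchanged. Going by the filter source in this commit, the Proc's return value is assigned to `href` for each link the filter visits, so returning the original URL for unhandled links seems the safe choice. The name `underscore_links` and the single `'../'` prefix are assumptions made for illustration, not part of any existing scraper.

```ruby
require 'uri'

# Hypothetical Proc for options[:external_urls]. It assumes the current page
# sits one level below the DevDocs root, so a single '../' reaches it.
underscore_links = lambda do |url|
  uri = URI(url)
  # Leave links to any other host (or host-less links) untouched.
  return url unless uri.host == 'underscorejs.org'

  fragment = uri.fragment ? "##{uri.fragment}" : ''
  "../underscore#{uri.path}#{fragment}"
rescue URI::InvalidURIError
  url # leave malformed hrefs as they are
end

puts underscore_links.call('https://underscorejs.org/#each')
puts underscore_links.call('https://example.com/page')
```

A scraper would then assign the lambda with `options[:external_urls] = underscore_links`, in the same place where `backbone.rb` assigns its Hash.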

## Keeping scrapers up-to-date

In order to keep scrapers up-to-date the `get_latest_version(opts)` method should be overridden. If `self.release` is defined, this should return the latest version of the documentation. If `self.release` is not defined, it should return the Epoch time when the documentation was last modified. If the documentation will never change, simply return `1.0.0`. The result of this method is periodically reported in a "Documentation versions report" issue which helps maintainers keep track of outdated documentations.
10 changes: 10 additions & 0 deletions lib/docs/core/filter.rb
@@ -96,5 +96,15 @@ def clean_path(path)
      path = path.gsub %r{\+}, '_plus_'
      path
    end

    # Returns the relative prefix ('../' repeated once per slash in the
    # subpath) that leads from the current page to the root.
    def path_to_root
      return '../' if subpath == ''

      '../' * subpath.count('/')
    end
  end
end
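
To make the depth calculation concrete, here is a standalone sketch of the same slash-counting logic. It takes `subpath` as an argument instead of reading it from the filter instance, and the sample subpaths are assumptions: the real values depend on how `subpath` is computed elsewhere in `Filter`.

```ruby
# Standalone version of the slash-counting logic, for illustration only.
# In the filter, `subpath` comes from the Filter instance, not an argument.
def path_to_root(subpath)
  return '../' if subpath == ''

  # One '../' per slash in the subpath.
  '../' * subpath.count('/')
end

# Hypothetical subpaths, assumed to start with '/' for non-root pages.
puts path_to_root('')
puts path_to_root('/guide')
puts path_to_root('/api/events')
```

Under these assumptions, a root page and a first-level page both need one `../`, and each further directory level adds another.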
2 changes: 1 addition & 1 deletion lib/docs/core/scraper.rb
@@ -41,7 +41,7 @@ def stub(path, &block)
self.html_filters = FilterStack.new
self.text_filters = FilterStack.new

- html_filters.push 'apply_base_url', 'container', 'clean_html', 'normalize_urls', 'internal_urls', 'normalize_paths', 'parse_cf_email'
+ html_filters.push 'apply_base_url', 'container', 'clean_html', 'normalize_urls', 'internal_urls', 'normalize_paths', 'parse_cf_email', 'external_urls'
text_filters.push 'images' # ensure the images filter runs after all html filters
text_filters.push 'inner_html', 'clean_text', 'attribution'

38 changes: 38 additions & 0 deletions lib/docs/filters/core/external_urls.rb
@@ -0,0 +1,38 @@
# frozen_string_literal: true

module Docs
  class ExternalUrlsFilter < Filter
    def call
      return doc unless context[:external_urls]

      root = path_to_root

      css('a').each do |node|
        next unless (anchor_url = node['href'])

        # Skip links already converted to internal (relative) links.
        next if anchor_url.include?('..')

        if context[:external_urls].is_a?(Proc)
          node['href'] = context[:external_urls].call(anchor_url)
          next
        end

        # Leave hrefs that cannot be parsed as URIs untouched.
        begin
          url = URI(anchor_url)
        rescue URI::InvalidURIError
          next
        end

        context[:external_urls].each do |host, name|
          next unless url.host.to_s.match?(host)

          # Append the fragment only when the original URL has one.
          fragment = url.fragment ? "##{url.fragment}" : ''
          node['href'] = root + name + url.path.to_s + fragment
        end
      end

      doc
    end
  end
end
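
The Hash branch can be tried outside the scraper with plain strings. The sketch below is a simplified stand-in, not the filter itself: `rewrite_external` is a hypothetical helper, it takes the root prefix as an argument, and it appends the fragment only when the URL has one, which is an assumption about the desired behavior.

```ruby
require 'uri'

# Hypothetical helper sketching the Hash branch of the filter.
# Returns the rewritten href, or the original href when no host matches.
def rewrite_external(href, mapping, root)
  url = URI(href)
  result = href

  mapping.each do |host, name|
    # String#match? treats `host` as a pattern, so this is a loose match.
    next unless url.host.to_s.match?(host)

    fragment = url.fragment ? "##{url.fragment}" : ''
    # As in the filter, a later matching entry overrides an earlier one.
    result = root + name + url.path.to_s + fragment
  end

  result
end

mapping = { 'underscorejs.org' => 'underscore' }
puts rewrite_external('https://underscorejs.org/#each', mapping, '../')
puts rewrite_external('https://example.com/docs', mapping, '../')
```

With the Backbone mapping from this commit, the first call yields a relative link into the `underscore` documentation, while the second URL is returned as given because its host is not in the mapping.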
4 changes: 4 additions & 0 deletions lib/docs/scrapers/backbone.rb
@@ -21,6 +21,10 @@ class Backbone < UrlScraper
Licensed under the MIT License.
HTML

options[:external_urls] = {
  'underscorejs.org' => 'underscore'
}

def get_latest_version(opts)
doc = fetch_doc('https://backbonejs.org/', opts)
doc.at_css('.version').content[1...-1]