Merge pull request #30 from shura71/master
Added the ability to change HTTP request headers, e.g. User-Agent
benbalter authored May 31, 2024
2 parents e3006e8 + 5ab3596 commit fd59b01
Showing 1 changed file with 6 additions and 2 deletions: lib/sitemap-parser.rb
@@ -34,6 +34,8 @@ def urls
       filter_sitemap_urls(urlset.search('url'))
     elsif sitemapindex
       options[:recurse] ? parse_sitemap_index : []
+    elsif raw_sitemap.strip.empty?
+      []
     else
       raise 'Malformed sitemap, no urlset or sitemapindex'
     end
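The first hunk makes an empty or whitespace-only response return no URLs instead of raising the malformed-sitemap error. The sketch below reproduces that branch order in plain Ruby so it runs without Nokogiri. `sitemap_urls` is a hypothetical helper, and detecting the root element with a regex is a simplification for illustration, not how the gem parses XML.

```ruby
# Hypothetical, dependency-free stand-in for SitemapParser#urls.
# The real gem parses with Nokogiri; regexes are used here only to
# illustrate the order of the branches touched by this commit.
def sitemap_urls(raw_sitemap)
  if raw_sitemap =~ /<urlset[\s>]/
    # Collect the text of every <loc> element.
    raw_sitemap.scan(%r{<loc>\s*(.*?)\s*</loc>}m).flatten
  elsif raw_sitemap =~ /<sitemapindex[\s>]/
    # The gem recurses into child sitemaps when options[:recurse] is set;
    # this sketch only models the non-recursive case, which returns [].
    []
  elsif raw_sitemap.strip.empty?
    # New in this commit: an empty body yields no URLs instead of an error.
    []
  else
    raise 'Malformed sitemap, no urlset or sitemapindex'
  end
end

puts sitemap_urls("  \n").inspect # prints []
puts sitemap_urls('<urlset><url><loc>https://example.com/</loc></url></urlset>').inspect
# prints ["https://example.com/"]
```

Because the emptiness check sits after the `urlset` and `sitemapindex` checks, a well-formed sitemap is never affected; only a body that matches neither root element and is blank skips the `raise`.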
@@ -95,14 +97,16 @@ def remote_sitemap?
   end
 
   def local_sitemap?
-    File.exist?(url) && url =~ %r{[\\/]sitemap(_index)?\.xml\Z}i
+    File.exist?(url)
   end
 
   def fetch_remote_sitemap
     return nil unless remote_sitemap?
 
     request_options = options.dup.tap { |opts| opts.delete(:recurse); opts.delete(:url_regex) }
-    request_options[:headers] = { 'User-Agent' => 'Sitemap-Parser' }
+    unless options[:headers] && options[:headers]['User-Agent']
+      request_options[:headers] = { 'User-Agent' => 'Sitemap-Parser' }
+    end
     request = Typhoeus::Request.new(url, request_options)
 
     response = request.run
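The second hunk also loosens `local_sitemap?`: previously a local path counted as a sitemap only if it existed and was named `sitemap.xml` or `sitemap_index.xml`; now any existing file qualifies. The two predicates below are hypothetical standalone copies of the removed and added lines (the gem defines a single instance method that reads `url`), exercised against a temporary file.

```ruby
require 'tmpdir'

# Old check (removed line): the file must exist AND be named
# sitemap.xml or sitemap_index.xml, case-insensitively.
def local_sitemap_before?(url)
  !!(File.exist?(url) && url =~ %r{[\\/]sitemap(_index)?\.xml\Z}i)
end

# New check (added line): any existing file path is accepted.
def local_sitemap_after?(url)
  File.exist?(url)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'pages.xml')
  File.write(path, '<urlset></urlset>')
  puts local_sitemap_before?(path) # prints false
  puts local_sitemap_after?(path)  # prints true
end
```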

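The header change installs the default `User-Agent` only when the caller has not supplied one. `build_request_options` below is a hypothetical extraction of the option-building lines in `fetch_remote_sitemap`; it returns the hash the gem would pass to `Typhoeus::Request.new`, so the logic can run without Typhoeus.

```ruby
# Hypothetical extraction of the option-building step in
# SitemapParser#fetch_remote_sitemap (the Typhoeus call is omitted).
def build_request_options(options)
  # Strip parser-only keys so they are not forwarded as HTTP options.
  request_options = options.dup.tap do |opts|
    opts.delete(:recurse)
    opts.delete(:url_regex)
  end
  # Fall back to the default User-Agent only if the caller did not set one.
  unless options[:headers] && options[:headers]['User-Agent']
    request_options[:headers] = { 'User-Agent' => 'Sitemap-Parser' }
  end
  request_options
end

default = build_request_options({ recurse: true })
puts default[:headers]['User-Agent'] # prints Sitemap-Parser

custom = build_request_options({ headers: { 'User-Agent' => 'MyBot/1.0' } })
puts custom[:headers]['User-Agent']  # prints MyBot/1.0
```

Assuming `options` is the hash given to the constructor, a caller would write something like `SitemapParser.new(url, headers: { 'User-Agent' => 'MyBot/1.0' })`. Note that, as written, a `:headers` hash without a `User-Agent` key (for example `{ 'Accept' => 'application/xml' }`) is replaced wholesale by the default, so its other headers are dropped; merging instead, e.g. `{ 'User-Agent' => 'Sitemap-Parser' }.merge(options[:headers] || {})`, would keep them.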