Small improvements to improve debugging and flexibility #97
Allow the user to intervene in case Readability would include/exclude content that should be excluded/included instead.
Thanks @tuzz, these changes look reasonable!
Released in 0.7.2
I appreciate the test case. Would it be worth elaborating in the README?
@avk I can try to explain. I can add something to the README if it would be helpful.

Basically, readability tries to extract the "useful" content from the page. It does this by scoring elements and then extracting the one with the highest score. Within that element, there might be sub-elements that aren't particularly useful. For example, a news article might have an advert banner in the middle of it. Because of this, readability does a second pass called "clean conditionally" where it tries to remove those sorts of elements based on some hardcoded rules. If you switch on …

In some cases, however, it might remove elements that you don't want it to (or include elements that it shouldn't). The change I introduced allows you to intervene and override readability's decision using your own lambda. The lambda is provided with some context that includes the HTML element, the element's score, and the decision that readability made about whether to remove it. For example:

```ruby
clean_conditionally: lambda do |context|
  !context[:remove] # Invert readability's decision.
end
```

Perhaps a more useful lambda would be one where you force readability to always remove a specific piece of content:

```ruby
clean_conditionally: lambda do |context|
  if context[:el].text.include?("Visit our blog")
    true # Always remove elements that contain 'Visit our blog'.
  else
    context[:remove] # Otherwise, remove the element according to readability's default rules.
  end
end
```

Hopefully that helps!
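To make the override mechanism concrete, here is a minimal sketch in plain Ruby that does not use the gem at all. `FakeElement` and `final_decision` are hypothetical names invented for illustration, and the `:score` context key is an assumption (only `:el` and `:remove` appear in the examples above). It only shows how a user-supplied lambda could take precedence over a default decision, not how readability implements it.

```ruby
# Hypothetical stand-in for an HTML node. In the examples above,
# context[:el] responds to #text, which is all this sketch relies on.
FakeElement = Struct.new(:text)

# Hypothetical helper: returns the final "remove this element?" decision.
# default_remove plays the role of readability's own decision.
def final_decision(el, score, default_remove, override = nil)
  # Without a callable override, the default decision stands.
  return default_remove unless override.respond_to?(:call)

  # The :score key name is an assumption made for this sketch.
  context = { el: el, score: score, remove: default_remove }
  override.call(context) ? true : false
end

# Same logic as the "Visit our blog" lambda above.
always_drop_blog_promo = lambda do |context|
  if context[:el].text.include?("Visit our blog")
    true
  else
    context[:remove]
  end
end

promo = FakeElement.new("Visit our blog for more!")
body  = FakeElement.new("The actual article text.")

puts final_decision(promo, 12.0, false, always_drop_blog_promo) # true: forced removal
puts final_decision(body, 12.0, false, always_drop_blog_promo)  # false: default decision kept
puts final_decision(body, 12.0, true)                           # true: no override given
```

Returning `context[:remove]` in the fallback branch is what keeps the default behaviour intact for every element the lambda does not care about.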
@tuzz thank you; fantastic and thorough! Excited to experiment with this.
Hello, this PR includes a few small improvements that should help users with debugging and give them more flexibility over which attributes are preserved and which nodes are removed by `clean_conditionally`.

- `attributes: ["*"]`
- Setting `options[:debug]` to a function, e.g. so that you can add messages to Rails logging
- Setting `options[:clean_conditionally]` to a function so that you can override the default decision

The above changes won't change Readability's behaviour except for the small bug fix in 3).
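As a rough illustration of what passing a function for `options[:debug]` could look like from the caller's side, here is a sketch in plain Ruby. `emit_debug` is a hypothetical dispatcher written for this example and is not the gem's internal method. It assumes that a callable receives the message and that any other truthy value falls back to printing.

```ruby
# Hypothetical dispatcher, written only for this sketch.
def emit_debug(options, message)
  debug = options[:debug]
  if debug.respond_to?(:call)
    debug.call(message) # Hand the message to the user's function.
  elsif debug
    puts message        # Assumed fallback for a plain truthy flag.
  end
end

# Collect messages in memory. In a Rails app the lambda body could
# instead forward to the logger, e.g. Rails.logger.debug(msg).
messages = []
options = { debug: ->(msg) { messages << "[readability] #{msg}" } }

emit_debug(options, "Removed a div")
emit_debug({ debug: false }, "never recorded")

puts messages.inspect
```

Routing messages through a callable leaves the choice of destination (logger, array, file) to the caller rather than hardcoding standard output.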
Thanks for your consideration.