Hpricot CSS Search
(Part of An Hpricot Showcase.)
CSS selectors are often the shortest and most readable technique for finding elements on an HTML page. Since more people are familiar with CSS than XPath, I recommend this approach.
When calling the search or the slash / methods, you may use CSS selectors as the search string.
doc = Hpricot(open("qwantz.html"))
(doc/'div img[@src^="http://www.qwantz.com/comics/"]') #=> Elements[...]
For a complete list of selectors, see Supported CSS Selectors.
The quickest way to find a specific element is to search by ID. If an element is defined as <div id='menu'>, you can find it by searching for #menu.
puts doc.search('#menu').inner_html
Another common search is to find all the elements with a given tag. The CSS selector for this is just the plain tag name.
To get a count of all span tags:
puts doc.search("span").length
A shortcut for this is to use a symbol, :span, as the search term; it will be converted to a string.
puts (doc/:span).length
Search for elements with a certain class by placing a dot before the class name.
doc.search(".entryTitle").each do |title|
  puts title.inner_html
end
Often the search will run faster if you add the tag name.
(doc/"div.entryTitle").remove
A similar XPath would be //div[@class='entryTitle']. Usually, though, the CSS selector is the better choice: if an element has more than one class, the CSS selector still matches it, whereas the XPath comparison expects the class attribute to hold exactly one class name.
So <div class="entryTitle dark"> is a match for doc.search("div.entryTitle"). You can also search for a class like this:
(doc/"div[@class~='entryTitle']").remove
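The difference between the two kinds of class matching can be modeled in a few lines of plain Ruby. This is only a sketch of the matching rule, not Hpricot's implementation, and the helper names exact_class? and has_class? are made up for illustration.

```ruby
# Hypothetical helpers that model two ways of testing a class attribute.

# Exact comparison: the whole attribute value must equal the name,
# which is how an XPath test like [@class='entryTitle'] behaves.
def exact_class?(class_attr, name)
  class_attr == name
end

# Token comparison: the value is split on whitespace and any one token
# may match, which is how .entryTitle and [@class~='entryTitle'] behave.
def has_class?(class_attr, name)
  class_attr.to_s.split.include?(name)
end

puts exact_class?("entryTitle dark", "entryTitle")  # false
puts has_class?("entryTitle dark", "entryTitle")    # true
puts has_class?("entryTitleDark", "entryTitle")     # false
```

The last call shows that token matching is not a substring test: the class name has to appear as a whole word.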
If you’d like to narrow your search for a certain tag, it often helps to identify its parents. By separating CSS selectors with a space, you can search deeper into the document.
(doc/"div.entryPermalink a").empty
That bit of code finds all links anywhere inside divs of the entryPermalink class and empties each one, removing any HTML inside the link. The links can be at any depth inside the div: its children, children of its children, and so on.
Stacking CSS selector calls also has the same effect:
(doc/"div.entryPermalink"/"a").empty
If you want to limit your search to just the immediate children of an element, use the > combinator.
doc.search("div.entryPermalink > a").
  prepend("<b>found you on the left</b>").
  append("<b>found you on the right</b>")
This code searches for all links which are immediate children of entryPermalink classed divs. It then adds some HTML inside each link, to the beginning and end of its inner_html.
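To make the difference between the space and the > forms concrete, here is a small plain-Ruby model of an element tree. It is a sketch only: Node, child_tags and descendant_tags are hypothetical names for this illustration and are not part of Hpricot.

```ruby
# Hypothetical miniature element tree; Node is not an Hpricot class.
Node = Struct.new(:name, :children) do
  # Immediate children with the given tag, like "div > a".
  def child_tags(tag)
    children.select { |c| c.name == tag }
  end

  # Matches at any depth below this node, like "div a".
  def descendant_tags(tag)
    children.flat_map do |c|
      (c.name == tag ? [c] : []) + c.descendant_tags(tag)
    end
  end
end

# Models <div><a></a><p><a></a></p></div>
DIV = Node.new("div", [
  Node.new("a", []),
  Node.new("p", [Node.new("a", [])])
])

puts DIV.child_tags("a").length       # 1
puts DIV.descendant_tags("a").length  # 2
```

The link nested inside the paragraph is counted by the descendant search but skipped by the child search.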
Most people figure that CSS selectors aren’t as comprehensive for searching attributes when compared to XPath functions. But that’s just not so. There’s quite a palette of ways to search attributes.
For example, you can search for all elements that have a given attribute, whatever its value. To search for all form fields that have a checked attribute:
doc.search("input[@checked]")
If you want to find elements whose attribute is set to a specific value, use the = equals operator. Let’s search for an anchor named ‘part_two’:
doc.at("a[@name='part_two']")
Another common search is to find an attribute containing a bit of search text. For this, use the *= operator. So, to find all elements with onclick handlers which reference document.location:
doc.search("*[@onclick*='document.location']").each do |ele|
  ele.remove_attribute('onclick')
end
You can also stack attribute tests to make a combined search. To find all links that have both an onclick and an href attribute:
doc.search("a[@onclick][@href]")
Other attribute operators are listed among the Supported CSS Selectors.
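The attribute tests that appear on this page can be summarized with a small plain-Ruby model. It follows the standard CSS meanings of the operators and is not Hpricot's own code; attr_match? and the sample attribute values are invented for illustration.

```ruby
# Hypothetical model of the attribute tests used on this page.
# attrs is a Hash of attribute name => value for a single element.
def attr_match?(attrs, name, op = nil, value = nil)
  return false unless attrs.key?(name)         # attribute must be present
  actual = attrs[name].to_s
  case op
  when nil  then true                          # [@name]       presence only
  when "="  then actual == value               # [@name='v']   exact value
  when "*=" then actual.include?(value)        # [@name*='v']  contains text
  when "^=" then actual.start_with?(value)     # [@name^='v']  begins with
  when "~=" then actual.split.include?(value)  # [@name~='v']  whole word
  else
    raise ArgumentError, "unsupported operator: #{op}"
  end
end

# Made-up attributes for one link element.
link = {
  "href"    => "http://www.qwantz.com/comics/1.png",
  "onclick" => "document.location = '/next'"
}

puts attr_match?(link, "onclick")                               # true
puts attr_match?(link, "onclick", "*=", "document.location")    # true
puts attr_match?(link, "href", "^=", "http://www.qwantz.com/")  # true
puts attr_match?(link, "name", "=", "part_two")                 # false
# Stacked tests such as a[@onclick][@href] require every test to pass.
puts(attr_match?(link, "onclick") && attr_match?(link, "href")) # true
```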
If you are having a difficult time tracking down a certain element, it may help to use the :not operator to narrow your search. So, to find paragraphs other than those in the blue class:
doc.search("p:not(.blue)")
Return to An Hpricot Showcase.