paos edited this page Sep 13, 2010 · 3 revisions

(Part of An Hpricot Showcase.)

CSS selectors are often the shortest and most readable technique for finding elements on an HTML page. Since more people are familiar with CSS than XPath, I recommend this approach.

Using CSS Selectors

When calling the search or the slash / methods, you may use CSS selectors as the search string.

 doc = Hpricot(open("qwantz.html"))
 (doc/'div img[@src^="http://www.qwantz.com/comics/"]')
   #=> Elements[...]

For a complete list of selectors, see Supported CSS Selectors.

Selecting by ID

The quickest way to find a specific element is to search by ID. If an element is defined as <div id='menu'>, you can search for the element by searching for #menu.

 puts doc.search('#menu').inner_html

Selecting by Tag Name

Another common search is to find all the elements with a given tag. The CSS selector for this is just the plain tag name.

To get a count of all span tags:

 puts doc.search("span").length

A shortcut for this is to use a symbol :span as the search term, which will be converted to a string.

 puts (doc/:span).length

Selecting by Class

Search for elements with a certain class by placing a dot before the class name.

 doc.search(".entryTitle").each do |title|
   puts title.inner_html
 end

The search is often faster if you add the tag name.

 (doc/"div.entryTitle").remove

A similar XPath would be //div[@class='entryTitle']. The CSS selector is usually the better choice, though. If an element has more than one class, the CSS selector will still match the element, but this XPath matches only when the class attribute is exactly entryTitle.

So <div class="entryTitle dark"> is a match for doc.search("div.entryTitle"). You can also match a single class name inside the class attribute with the ~= operator:

 (doc/"div[@class~='entryTitle']").remove

Selecting by Hierarchy

If you’d like to narrow your search for a certain tag, it often helps to identify its parents. By separating CSS selectors with a space, you can search deeper into the document.

 (doc/"div.entryPermalink a").empty

That bit of code finds all links anywhere inside divs of the entryPermalink class and empties each one, removing any HTML inside the link. The links can be at any depth inside the div: its children, children of its children, and so on.

Stacking search calls has the same effect:

 (doc/"div.entryPermalink"/"a").empty

Selecting Close Children

If you want to limit your search to just the immediate children of an element, use the > child combinator.

 doc.search("div.entryPermalink > a").
   prepend("<b>found you on the left</b>").
   append("<b>found you on the right</b>")

This code searches for all links which are immediate children of entryPermalink classed divs. It then adds some HTML inside each link, to the beginning and end of its inner_html.

Searching Attributes

Most people figure that CSS selectors aren’t as comprehensive for searching attributes when compared to XPath functions. But, that’s just not so. There’s quite a palette of ways to search attributes.

For example, you can search for all elements with an attribute. To search for all form fields that have a checked property:

 doc.search("input[@checked]")

If you want to find elements whose attribute is set to a specific value, use the = equals operator. Let’s search for an anchor named ‘part_two’:

 doc.at("a[@name='part_two']")

Another common search is to find an attribute containing a bit of search text. For this, use the *= operator. So, to find all elements with onclick handlers which reference document.location:

 doc.search("*[@onclick*='document.location']").each do |ele|
   ele.remove_attribute('onclick')
 end

You can also stack the attributes to make a combined search. To find all links that have both an onclick and an href attribute:

 doc.search("a[@onclick][@href]")

Other attribute operators are listed among the Supported CSS Selectors.

Negating Searches

If you are having a difficult time tracking down a certain element, it may help to use the :not operator to narrow your search. So, to find paragraphs other than those in the blue class:

 doc.search("p:not(.blue)")

Return to An Hpricot Showcase.