zhangyiming #120

Open
wants to merge 2 commits into
base: master
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>The blah</title>
<link rel="stylesheet" type="text/css" href="homework.css">
</head>
<body>
<div class="header">
<img src="images/blah.png">
<ul class="nav">
<li><a href="#">Home</a></li>
<li><a href="#">Site</a></li>
<li><a href="#">Other</a></li>
</ul>
</div>
<div class="main-content">
<h2>The Beach</h2>
<hr>
<ul class="photos">
<li><img src="images/0001.jpg" width="150" height="150" alt="Picl"></li>
<li><img src="images/0002.jpg" width="150" height="150" alt="Picl"></li>
<li><img src="images/0003.jpg" width="150" height="150" alt="Picl"></li>
</ul>
<p>
This is a demo page, just for learning how to build web pages.
</p>
</div>
<div class="footer">
<p>&copy;Mugglecoding</p>
</div>
</body>
</html>
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
<name>1.1练习题答案</name>
<comment></comment>
<projects>
</projects>
<buildSpec>
<buildCommand>
<name>com.aptana.ide.core.unifiedBuilder</name>
<arguments>
</arguments>
</buildCommand>
</buildSpec>
<natures>
<nature>com.aptana.projects.webnature</nature>
</natures>
</projectDescription>
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
 from bs4 import BeautifulSoup
 import string

-with open('Desktop/12/index.html', 'r') as web_data:
+with open('index.html', 'r') as web_data:
     soup = BeautifulSoup(web_data, 'lxml')
     titles = soup.select(
         'body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a ')
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
from bs4 import BeautifulSoup

info = []

with open('index.html', 'r') as wb_data:
    Soup = BeautifulSoup(wb_data, 'lxml')

    titles = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
    images = Soup.select('body > div > div > div.col-md-9 > div > div > div > img')
    prices = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
    reviews = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
    stars = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')

    for title, image, price, review, star in zip(titles, images, prices, reviews, stars):
        data = {
            'title': title.get_text(),
            'image': image.get('src'),
            'price': price.get_text(),
            'review': review.get_text(),
            'star': len(star.find_all("span", class_="glyphicon glyphicon-star"))
        }

        info.append(data)


for i in info:
    if i['star'] > 3:
        print(i['title'], i['price'])
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
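 from bs4 import BeautifulSoup

-with open('Desktop/12/index.html', 'r') as wb_data:  # Open the local file with `with open`; replace the path with your own. To find the path, check File Explorer on Windows, or drag the file into the terminal on a Mac
+with open('index.html', 'r') as wb_data:  # Open the local file with `with open`; replace the path with your own. To find the path, check File Explorer on Windows, or drag the file into the terminal on a Mac
     soup = BeautifulSoup(wb_data, 'lxml')  # Parse the page content
     # print(wb_data)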

@@ -20,7 +20,7 @@
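             'review': review.get_text(),
             'price': price.get_text(),
             'star': len(star.find_all("span", class_='glyphicon glyphicon-star'))
-            # Observation: each star appears as one <span class="glyphicon glyphicon-star"></span>, so counting the occurrences tells us how many stars there are;
+            # Observation: each star appears as one <span class="glyphicon glyphicon-star"></span>, so counting the occurrences reveals the pattern and tells us how many stars there are;
             # Use find_all to count how many elements carry the ★ style: the first argument matches the tag name and the second matches the CSS class. See the find_all example in the BeautifulSoup documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#find-all;
             # Because find_all() returns a list, we then use len() to count its elements, which is the number of stars
         }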
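The star-counting comments in the last hunk can be checked in isolation. The following is a minimal sketch, not the code from this pull request: the `SAMPLE_HTML` markup and the `count_stars` helper are hypothetical, modelled on the `div.ratings` block that the selectors target, and it assumes the standard-library `html.parser` backend instead of `lxml` so it runs without the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on one ratings block of the scraped page:
# three filled stars followed by two empty ones.
SAMPLE_HTML = """
<div class="ratings">
  <p class="pull-right">15 reviews</p>
  <p>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
  </p>
</div>
"""


def count_stars(ratings_html):
    """Count the filled-star spans in the second <p> of a ratings block."""
    soup = BeautifulSoup(ratings_html, "html.parser")
    # The second <p> inside div.ratings holds the star icons.
    star_block = soup.select("div.ratings > p:nth-of-type(2)")[0]
    # A multi-class string passed to class_ is compared against the exact
    # class attribute value, so "glyphicon glyphicon-star-empty" is skipped.
    return len(star_block.find_all("span", class_="glyphicon glyphicon-star"))


star_count = count_stars(SAMPLE_HTML)
print(star_count)  # 3
```

Matching on the exact class string depends on the class order in the markup. If the page ever emitted `glyphicon-star glyphicon`, a CSS selector such as `span.glyphicon.glyphicon-star` would be the more robust choice.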