Cuiqingyao / IR-IESystem Public

Information Retrieval & Information Extraction System

Apache-2.0 license

IR-IESystem

Information Retrieval & Information Extraction System

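Program structure

conf: configuration files for the database and the crawler
Crawler--spider.py: the crawler program
db--DBHelper.py: database operations; db--jobs.sql: SQL script containing the database schema and data
mysite: web application that serves the pages viewed in a browser
screenshot: screenshots of intermediate results and of the running project
utils: utility functions
word_cut: word segmentation and statistics programs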

Dependencies

Install each library with pip3 install <library name>:
- pymysql
- requests
- BeautifulSoup (published on PyPI for Python 3 as beautifulsoup4)
- django
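
Before running the project, it can help to confirm that the four libraries are importable. The following is a minimal sketch, not part of the repository: the function name `find_missing` is hypothetical, and the mapping of BeautifulSoup to the import name `bs4` assumes the project uses BeautifulSoup 4.

```python
import importlib.util

# Maps each library listed above to the module name it is imported as.
# "bs4" is an assumption: it is the import name of BeautifulSoup 4.
REQUIRED_MODULES = {
    "pymysql": "pymysql",
    "requests": "requests",
    "BeautifulSoup": "bs4",
    "django": "django",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the library names whose import module cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

Any name the script reports can then be installed with pip3 as described above.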

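How to run

First, set up a Python 3.6 environment locally and install the dependencies.
Next, run the jobs.sql script on your MySQL database.
(Optional) With a stable network connection, run Crawler--spider.py to crawl job postings from the 51job website and store them in the MySQL database. jobs.sql already contains the INSERT statements for the data, so this step is optional. Note: the crawler is slow, so skip this step if you only want to see the project demo.
Then go into the mysite directory and run python manage.py runserver 8080; the server starts on its own.
Open a browser (Chrome is recommended), enter 127.0.0.1:8080 in the address bar, and press Enter to reach the project home page.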
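
The server start step can also be scripted. The sketch below is an illustration only and is not part of the repository: the helper names `runserver_command` and `start_site` are hypothetical, and it assumes the script is launched from the repository root, with manage.py located in the mysite directory as the steps above describe.

```python
import subprocess
import sys
from pathlib import Path


def runserver_command(port=8080):
    """Build the command from the steps above: python manage.py runserver <port>."""
    return [sys.executable, "manage.py", "runserver", str(port)]


def start_site(repo_root, port=8080):
    """Start the Django development server from the mysite directory."""
    site_dir = Path(repo_root) / "mysite"
    if not (site_dir / "manage.py").is_file():
        raise FileNotFoundError(f"manage.py not found in {site_dir}")
    # Blocks until the server process exits (for example after Ctrl+C).
    return subprocess.run(runserver_command(port), cwd=str(site_dir), check=True)


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    start_site(Path.cwd())
```

Using `sys.executable` keeps the server on the same interpreter that has the dependencies installed, which avoids picking up a different `python` from the PATH.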
