Get stock information with a web crawler

Uses Scrapy + MongoDB + proxies + random User-Agent headers to crawl stock data from https://gupiao.baidu.com/stock. For more details, see the official Scrapy documentation (Chinese edition: https://scrapy-chs.readthedocs.io/zh_CN/latest/intro/tutorial.html#intro-tutorial).

Technical approach

requests and re: use the 'requests' and 're' modules to extract every stock code from http://quote.eastmoney.com/stocklist.html
Scrapy: use Scrapy to fetch detailed information for each stock from https://gupiao.baidu.com/stock
Beautiful Soup: use Beautiful Soup to extract the fields of interest from the HTML pages.
MongoDB: use MongoDB to store the data.
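The first step can be sketched as follows. This is a minimal example, not the repository's code: it assumes that codes on the eastmoney list page appear in links as a market prefix (sh for Shanghai, sz for Shenzhen) followed by six digits, and the function names fetch_html and extract_stock_codes are hypothetical. The pattern should be checked against the live page before relying on it.

```python
import re

# Assumed code format: market prefix ("sh" = Shanghai, "sz" = Shenzhen) + 6 digits.
STOCK_CODE_RE = re.compile(r"\b(s[hz]\d{6})\b")


def fetch_html(url, timeout=10):
    """Download a page with requests (imported here so the parser works without it)."""
    import requests  # third-party: pip install requests

    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    # Let requests guess the charset; older Chinese pages are often GBK-encoded.
    resp.encoding = resp.apparent_encoding
    return resp.text


def extract_stock_codes(html):
    """Return the unique stock codes found in html, in order of first appearance."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(STOCK_CODE_RE.findall(html)))
```

For example, extract_stock_codes(fetch_html("http://quote.eastmoney.com/stocklist.html")) would return the codes in page order, ready to be turned into per-stock URLs for the Scrapy spider. Deduplicating with dict.fromkeys keeps the first-seen order, which a plain set would lose.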
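Storage in MongoDB is usually done in a Scrapy item pipeline. The sketch below follows the pattern of the MongoDB pipeline example in the Scrapy documentation, with assumptions of its own: items are dict-like and carry a code field, the collection name stocks is hypothetical, MONGO_URI and MONGO_DATABASE are custom settings, and the optional client_factory argument exists only so the class can be exercised without a running server (by default it uses pymongo.MongoClient).

```python
class MongoPipeline:
    """Store scraped stock items in MongoDB, one document per stock code."""

    collection_name = "stocks"  # hypothetical collection name

    def __init__(self, mongo_uri, mongo_db, client_factory=None):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        # An injected factory lets the pipeline be tested without a MongoDB server.
        self.client_factory = client_factory
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI / MONGO_DATABASE are custom settings defined in settings.py.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "stocks"),
        )

    def open_spider(self, spider):
        if self.client_factory is None:
            import pymongo  # third-party: pip install pymongo

            self.client_factory = pymongo.MongoClient
        self.client = self.client_factory(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)  # works for plain dicts and scrapy.Item objects
        # Upsert on the stock code so a re-crawl updates documents instead of duplicating them.
        self.db[self.collection_name].replace_one({"code": doc["code"]}, doc, upsert=True)
        return item
```

Upserting on the stock code (replace_one with upsert=True) means that re-running the crawl refreshes existing documents instead of inserting duplicates.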

Request headers: many websites reject requests that carry a default header (Scrapy's default User-Agent, for instance, announces itself as Scrapy).
--solution: pick a header at random from a header pool for every request.
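A header pool can be implemented as a Scrapy downloader middleware that overwrites the User-Agent header of every outgoing request. This is a minimal sketch: the class name, the USER_AGENT_LIST setting, and the three browser strings are illustrative assumptions, not values taken from this repository.

```python
import random

# Illustrative pool; a real project would usually keep a longer, up-to-date list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


class RandomUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on each request."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is a hypothetical custom setting; fall back to the pool above.
        return cls(crawler.settings.getlist("USER_AGENT_LIST") or USER_AGENTS)

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # continue normal processing of the request
```

To enable it, its import path (for example the hypothetical stocks.middlewares.RandomUserAgentMiddleware) would go into DOWNLOADER_MIDDLEWARES in settings.py, with scrapy.downloadermiddlewares.useragent.UserAgentMiddleware set to None so the built-in middleware is disabled.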
Proxies: Scrapy sends many requests concurrently, so if they all come from the same IP address, there is a good chance that the IP will be banned.
--solution: to avoid bans, collect proxies from https://free-proxy-list.net, put the ones that work into an IP pool, and switch to another IP whenever a request is rejected.
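The IP-pool idea can likewise be sketched as a downloader middleware. It relies on a real Scrapy mechanism, the proxy key in request.meta that the built-in HttpProxyMiddleware reads, but the class name, the PROXY_LIST setting, and the choice of 403, 429 and 503 as "rejected" statuses are assumptions. Checking which free proxies actually work before adding them to the list is left out.

```python
import random


class RandomProxyMiddleware:
    """Route each request through a random proxy; drop proxies that get rejected."""

    # Status codes treated as "this proxy was rejected" (an assumption; tune per site).
    BAN_STATUSES = {403, 429, 503}

    def __init__(self, proxies):
        # Proxy URLs such as "http://203.0.113.5:8080", e.g. taken from free-proxy-list.net.
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting in settings.py.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        if not self.proxies:
            # Fail loudly instead of silently falling back to the real IP.
            raise RuntimeError("proxy pool is exhausted")
        # Scrapy's built-in HttpProxyMiddleware honours request.meta["proxy"].
        request.meta["proxy"] = random.choice(self.proxies)
        return None

    def process_response(self, request, response, spider):
        if response.status in self.BAN_STATUSES:
            bad = request.meta.get("proxy")
            if bad in self.proxies:
                self.proxies.remove(bad)
            # Returning a Request reschedules it; dont_filter bypasses the duplicate
            # filter, and process_request assigns a fresh proxy on the retry.
            return request.replace(dont_filter=True)
        return response
```

Removing a proxy on its first rejection is deliberately aggressive; free proxies fail often, so a real pool would probably need to be refilled periodically.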

Pipelines: once items have been scraped, item pipelines can filter them before they are stored.
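A filtering pipeline might look like the sketch below. The field names code and name are hypothetical, and the fallback DropItem class only exists so the example runs without Scrapy installed; inside a Scrapy project the real scrapy.exceptions.DropItem is used, which tells Scrapy to discard the item and log it.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:  # fallback so the sketch runs without Scrapy installed
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class StockFilterPipeline:
    """Clean stock items and drop incomplete or duplicate ones."""

    # Hypothetical field names; adjust to the project's Item definition.
    required_fields = ("code", "name")

    def __init__(self):
        self.seen_codes = set()

    def process_item(self, item, spider):
        # Strip surrounding whitespace from string fields first.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        missing = [f for f in self.required_fields if not item.get(f)]
        if missing:
            raise DropItem(f"missing {missing} in {item!r}")
        code = item["code"]
        if code in self.seen_codes:
            raise DropItem(f"duplicate stock code {code}")
        self.seen_codes.add(code)
        return item
```

Registering it in ITEM_PIPELINES with a lower number than the storage pipeline makes it run first, so only clean, unique items reach the database.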

The Scrapy framework offers many configuration settings; choose the ones that suit your own project.
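As an illustration, a settings.py excerpt for a crawler like this one might combine the following. The values are starting points rather than tuned numbers. Every name is a built-in Scrapy setting except MONGO_URI and MONGO_DATABASE, which are custom, and the project name and stocks.* component paths are hypothetical placeholders for the project's own classes.

```python
# settings.py (excerpt)
BOT_NAME = "stocks"  # hypothetical project name

ROBOTSTXT_OBEY = True               # respect robots.txt
CONCURRENT_REQUESTS_PER_DOMAIN = 8
DOWNLOAD_DELAY = 0.5                # seconds between requests to the same site
AUTOTHROTTLE_ENABLED = True         # adapt the delay to observed server latency
RETRY_ENABLED = True
RETRY_TIMES = 3                     # retries in addition to the first attempt
DOWNLOAD_TIMEOUT = 15               # seconds
COOKIES_ENABLED = False

# Lower numbers sit closer to the engine; None disables a built-in component.
DOWNLOADER_MIDDLEWARES = {
    "stocks.middlewares.RandomProxyMiddleware": 350,      # hypothetical path
    "stocks.middlewares.RandomUserAgentMiddleware": 400,  # hypothetical path
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

# Items flow through pipelines from the lowest number to the highest.
ITEM_PIPELINES = {
    "stocks.pipelines.StockFilterPipeline": 300,  # hypothetical path
    "stocks.pipelines.MongoPipeline": 800,        # hypothetical path
}

# Custom settings read by the project's own components.
MONGO_URI = "mongodb://localhost:27017"
MONGO_DATABASE = "stocks"
```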
