forked from scrapy-plugins/scrapy-jsonrpc
-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.rst
120 lines (91 loc) · 3.03 KB
/
README.rst
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
==================
scrapy-jsonrpc-api
==================
scrapy-jsonrpc is an extension to control a running Scrapy web crawler via
JSON-RPC.
Installation
============
Install scrapy-jsonrpc using ``pip``::
$ pip install scrapy-jsonrpc-api -i https://pypi.python.org/simple/
Configuration
=============
First, you need to include the entension to your ``EXTENSIONS`` dict in
``settings.py``, for example::
EXTENSIONS = {
'scrapy_jsonrpc.webservice.WebService': 500,
}
Then, you need to enable the extension with the `JSONRPC_ENABLED`_ setting,
set to ``True``.
The web server will listen on a port specified in `JSONRPC_PORT`_
(by default, it will try to listen on port 6080),
and will log to the file specified in `JSONRPC_LOGFILE`_.
The endpoint for accessing the crawler object is::
http://localhost:6080/crawler
The Api Response:
```text
[
{
"time()-engine.start_time": 4.3701982498168945,
"engine.has_capacity()": false,
"len(engine.downloader.active)": 10,
"engine.scraper.is_idle()": false,
"engine.spider.name": "zhihu",
"engine.spider_is_idle(engine.spider)": false,
"engine.slot.closing": false,
"len(engine.slot.inprogress)": 29,
"len(engine.slot.scheduler.dqs or [])": 0,
"len(engine.slot.scheduler.mqs)": 1,
"len(engine.scraper.slot.queue)": 0,
"len(engine.scraper.slot.active)": 19,
"engine.scraper.slot.active_size": 258036,
"engine.scraper.slot.itemproc_size": 0,
"engine.scraper.slot.needs_backout()": false
},
{
"Request": "30 oldest: 3s ago",
"TextResponse": "19 oldest: 3s ago",
"TestSpider": "1 oldest: 4s ago"
},
{
"log_count/INFO": 27,
"start_time": "2021-01-17 10:39:02",
"log_count/ERROR": 3,
"scheduler/enqueued/memory": 35,
"scheduler/enqueued": 35,
"scheduler/dequeued/memory": 34,
"scheduler/dequeued": 34,
"downloader/request_count": 34,
"downloader/request_method_count/GET": 34,
"downloader/request_bytes": 28923,
"downloader/response_count": 24,
"downloader/response_status_count/200": 24,
"downloader/response_bytes": 100766,
"response_received_count": 24,
"request_depth_max": 3,
"item_scraped_count": 2
}
]
```
Settings
========
These are the settings that control the web service behaviour:
JSONRPC_ENABLED
---------------
Default: ``False``
A boolean which specifies if the web service will be enabled (provided its
extension is also enabled).
JSONRPC_LOGFILE
---------------
Default: ``None``
A file to use for logging HTTP requests made to the web service. If unset web
the log is sent to standard scrapy log.
JSONRPC_PORT
------------
Default: ``[6080, 7030]``
The port range to use for the web service. If set to ``None`` or ``0``, a
dynamically assigned port is used.
JSONRPC_HOST
------------
Default: ``'127.0.0.1'``
The interface the web service should listen on.
.. _JSON-RPC 2.0: http://www.jsonrpc.org/