Improve scrape performance by using lxml parser (home-assistant#71087)
* Improve scrape performance by using lxml parser

* load it

* tweak

* tweak

* ensure libxml2 is installed in dev container
bdraco authored May 2, 2022
1 parent c23866e commit b770ca3
Showing 5 changed files with 9 additions and 2 deletions.
1 change: 1 addition & 0 deletions Dockerfile.dev
@@ -18,6 +18,7 @@ RUN \
         libavfilter-dev \
         libpcap-dev \
         libturbojpeg0 \
+        libxml2 \
         git \
         cmake \
     && apt-get clean \
2 changes: 1 addition & 1 deletion homeassistant/components/scrape/manifest.json
@@ -2,7 +2,7 @@
   "domain": "scrape",
   "name": "Scrape",
   "documentation": "https://www.home-assistant.io/integrations/scrape",
-  "requirements": ["beautifulsoup4==4.11.1"],
+  "requirements": ["beautifulsoup4==4.11.1", "lxml==4.8.0"],
   "after_dependencies": ["rest"],
   "codeowners": ["@fabaff"],
   "iot_class": "cloud_polling"
2 changes: 1 addition & 1 deletion homeassistant/components/scrape/sensor.py
@@ -154,7 +154,7 @@ def __init__(
 
     def _extract_value(self) -> Any:
         """Parse the html extraction in the executor."""
-        raw_data = BeautifulSoup(self.rest.data, "html.parser")
+        raw_data = BeautifulSoup(self.rest.data, "lxml")
         _LOGGER.debug(raw_data)
 
         try:
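
The hunk above swaps the parser name passed to `BeautifulSoup` from the standard-library `"html.parser"` to `"lxml"`. A minimal standalone sketch of the same call follows. The helper names `pick_parser` and `extract_text` are hypothetical and are not part of the commit, and the fallback to `"html.parser"` is an assumption added so the sketch also runs where lxml is absent; the commit itself pins `lxml==4.8.0` and passes `"lxml"` unconditionally.

```python
import importlib.util
from typing import Optional


def pick_parser() -> str:
    """Return "lxml" if the lxml package is importable, else "html.parser".

    Hypothetical helper: the commit does not fall back, it requires lxml.
    """
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


def extract_text(html: str, selector: str) -> Optional[str]:
    """Parse html and return the stripped text of the first CSS match, or None."""
    # Imported lazily so pick_parser() can be used without bs4 installed.
    from bs4 import BeautifulSoup

    # The second argument selects the parser backend, as in the hunk above.
    soup = BeautifulSoup(html, pick_parser())
    tag = soup.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None
```

Calling `extract_text("<p class='t'>21</p>", ".t")` exercises the same parse-then-select path as the sensor. The two parsers can build slightly different trees from malformed markup, so selectors that depend on how broken HTML is repaired may need re-checking after such a switch.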
3 changes: 3 additions & 0 deletions requirements_all.txt
@@ -974,6 +974,9 @@ lupupy==0.0.24
 # homeassistant.components.lw12wifi
 lw12==0.9.2
 
+# homeassistant.components.scrape
+lxml==4.8.0
+
 # homeassistant.components.nmap_tracker
 mac-vendor-lookup==0.1.11
 
3 changes: 3 additions & 0 deletions requirements_test_all.txt
@@ -663,6 +663,9 @@ lru-dict==1.1.7
 # homeassistant.components.luftdaten
 luftdaten==0.7.2
 
+# homeassistant.components.scrape
+lxml==4.8.0
+
 # homeassistant.components.nmap_tracker
 mac-vendor-lookup==0.1.11
 