
Commit

Check for HTML tags and remove them for LaTeX
myla committed Sep 3, 2024
1 parent a6ea9dd commit 4fb2e15
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions data_extraction/latex/build_latex.py
@@ -7,6 +7,7 @@
 import io
 from decimal import Decimal
 import shutil
+import re

 methodsdir = "../../methods"
 imagedir = "../../project-page/static/images/"
@@ -208,6 +209,9 @@ def extract_title_and_text(markdown: str):
     # Who puts hashtags in a title anyway?
     title = lines[0].replace("#", "").strip()
     text = "\n".join(lines[1:]).strip()
+    # check for html
+    clean = re.compile("<.*?>")
+    text = re.sub(clean, "", text)
     return title, text
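
Note on the added lines: the sketch below isolates the same pattern, `<.*?>`, in a standalone helper to show what it removes and where it may surprise. The helper name `strip_html_tags` and the sample strings are assumptions made for illustration and are not part of `build_latex.py`.

```python
import re

# Same pattern as in the commit: a non-greedy match from "<" to the next ">".
TAG_RE = re.compile("<.*?>")


def strip_html_tags(text: str) -> str:
    """Remove anything that looks like an HTML tag (hypothetical helper)."""
    return TAG_RE.sub("", text)


# Ordinary inline tags are removed and the text between them is kept.
print(strip_html_tags("Uses <b>bold</b> text and a<br/>line break."))

# Caveat 1: "." does not match a newline by default, so an opening tag
# that spans two lines is left in place; only the closing tag is removed.
print(repr(strip_html_tags('<a\nhref="x">link</a>')))

# Caveat 2: any "<" followed later on the same line by ">" is treated as
# a tag, so inequalities in plain text are swallowed too.
print(repr(strip_html_tags("x < y and y > z")))
```

If the method descriptions can contain comparisons or multi-line tags, passing `re.DOTALL` or using a real HTML parser such as the standard library's `html.parser` may be worth considering; whether that matters depends on the input files, which this commit does not show.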

