Improve experiments #28

Merged
merged 4 commits into from
Oct 17, 2023
37 changes: 0 additions & 37 deletions test/beautifulSoup_test/extractURLs.py
Yowgf marked this conversation as resolved.

This file was deleted.

@@ -8,12 +8,13 @@
var url2 = "https://www.example.com";
</script>
<p>Este é um exemplo</p>
"https://www.youtube.com/"
<span> https://www.youtube.com/ </span>
<span> "https://www.youtube.com/" </span>
<script>
var url3 = "https://www.example.org";
</script>
<script>
"https://www.kotlin.com"
</script>
</body>
</html>
</html>
37 changes: 37 additions & 0 deletions test/beautiful_soup/extract_urls.py
@@ -0,0 +1,37 @@
from bs4 import BeautifulSoup
import re
import sys

def extract_script_urls(html_content):
    """Return the URL-like strings found inside <script> tags."""
    urls = []

    soup = BeautifulSoup(html_content, 'html.parser')

    script_tags = soup.find_all('script')

    # Raw string, so the regex escapes are not read as string escapes.
    regex = re.compile(r'(https?:\/\/)?([a-zA-Z0-9\.-]+\.[a-z\.]{2,6})([\/\w \.-]*)*\/?')
    for script_tag in script_tags:
        if script_tag.string:
            # findall returns one tuple of groups per match; join them back together.
            urls_found = [''.join(r) for r in regex.findall(script_tag.string)]
            urls.extend(urls_found)

    return urls

def main():
    # Check if an HTML file was passed as a command-line argument
    if len(sys.argv) < 2:
        print("Usage: python extract_urls.py input_file.html")
        return

    # Read the content of the HTML file, closing it once it has been read
    file_path = sys.argv[1]
    with open(file_path) as html_file:
        input_html = html_file.read()

    urls = extract_script_urls(input_html)

    print("Extracted URLs:")
    for url in urls:
        print(url)

if __name__ == "__main__":
    main()
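Note on the script above: the fixture places URLs both inside and outside `<script>` tags, so only the script-level ones should be reported. The sketch below illustrates that behavior using only the standard library, so it runs without installing `bs4`. It is not the code from this PR: the `ScriptTextCollector` class, the `extract_script_urls_stdlib` function, the `SAMPLE` document, and the simplified `URL_RE` pattern (absolute http/https URLs only) are all assumptions made for illustration.

```python
from html.parser import HTMLParser
import re

# Hypothetical, simplified pattern: absolute http(s) URLs only.
# This is NOT the regex used in extract_urls.py.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


class ScriptTextCollector(HTMLParser):
    """Collects the raw text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Text outside <script> (for example inside <span>) is ignored.
        if self.in_script:
            self.chunks.append(data)


def extract_script_urls_stdlib(html_content):
    """Return URLs that appear inside <script> tags only."""
    collector = ScriptTextCollector()
    collector.feed(html_content)
    collector.close()
    urls = []
    for chunk in collector.chunks:
        urls.extend(URL_RE.findall(chunk))
    return urls


# Small document modeled on the test fixture (an assumption, not the real file).
SAMPLE = """
<html><body>
<script>
  var url2 = "https://www.example.com";
</script>
<p>example</p>
<span> "https://www.youtube.com/" </span>
<script>
  "https://www.kotlin.com"
</script>
</body></html>
"""

if __name__ == "__main__":
    for url in extract_script_urls_stdlib(SAMPLE):
        print(url)
```

Run against `SAMPLE`, the sketch reports the two URLs inside `<script>` and skips the one inside `<span>`. The stricter pattern requires a scheme, which the PR's regex treats as optional, so results on other inputs may differ from those of `extract_urls.py`.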
1 change: 1 addition & 0 deletions test/comby/count-for.txt
@@ -0,0 +1 @@
for (:[_]) { for (:[_]) { :[_] } }
7 changes: 7 additions & 0 deletions test/comby/count_loops.sh
@@ -0,0 +1,7 @@
#!/bin/sh
#
# Usage: ./count_loops.sh "my-file.c"

file=$1
[ -z "$file" ] && echo "Please provide a file to process" && exit 1
comby -count -match-only "$(cat count-for.txt)" '' "$file"
File renamed without changes.
7 changes: 7 additions & 0 deletions test/comby/main.c
@@ -0,0 +1,7 @@
for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 20; j++) {
        for (int k = 0; k < 30; k++) {
            printf("%d %d %d\n", i, j, k);
        }
    }
}
5 changes: 0 additions & 5 deletions test/comby_test/count_for_loops.comby

This file was deleted.

27 changes: 0 additions & 27 deletions test/comby_test/count_loops.py

This file was deleted.