From e1103b133ac453b9f2a1d8709af7a9369c02c04a Mon Sep 17 00:00:00 2001
From: beviah
Date: Sat, 30 Mar 2024 02:35:48 -0500
Subject: [PATCH] Create README.md

added basic info and a teaser
---
 thesauruses-co/README.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 thesauruses-co/README.md

diff --git a/thesauruses-co/README.md b/thesauruses-co/README.md
new file mode 100644
index 0000000..bca6c3d
--- /dev/null
+++ b/thesauruses-co/README.md
@@ -0,0 +1,28 @@
+# Thesauruses.co
+
+This is a series of ~20 scripts for parsing and structuring Wiktionary data in a template (locale) agnostic way!
+The same scripts are used to parse all 170+ Wiktionary dumps!
+
+The final dataset was used, and will be used again, at https://www.ezglot.com/
+
+We need to upgrade the server to handle the traffic with all the new data we have created over the years.
+
+## Files
+
+**1. wiki_parse.py**
+ - Loads Wiktionary XML dumps from the ./xmls folder
+ - Detects templates and attempts replacements
+ - Normalizes common wiki markup found in various locales (maybe it is not locale agnostic!)
+   but does not rely on specific language strings (maybe it is!)
+ - Converts content into node-property-relation-like paths with more or less consistent levels,
+   to be dealt with further in later script(s)
+
+I thought I had the problem solved with this script, but oh boy, was I wrong... it took *a few* more processing steps.
+
+**2. finer.py**
+
+ - *Every few dozen stars will motivate me to add one more script ;-)*
+
+**3. polngrams.py**
+
+...
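
For a feel of what that first parsing step might look like in practice, here is a minimal, hypothetical Python sketch that streams pages out of a Wiktionary XML dump and lists the templates each page uses. It is not the actual wiki_parse.py: the function names, the template regex, and the namespace handling are illustrative assumptions, and real template replacement and markup normalization are far more involved.

```python
# Hypothetical sketch, not the actual wiki_parse.py: stream pages from a
# MediaWiki/Wiktionary XML export and list the templates each page uses.
import glob
import re
import xml.etree.ElementTree as ET

# Matches simple, non-nested {{template|...}} calls and captures the template name.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)(?:\|[^{}]*)?\}\}")

def pages(dump_path):
    """Yield (title, wikitext) pairs without loading the whole dump into memory."""
    title, text = None, ""
    for _, elem in ET.iterparse(dump_path, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace so this works across export versions
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            elem.clear()  # free memory held by the finished <page> subtree

def template_names(wikitext):
    """Return the top-level template names used on a page."""
    return sorted({m.group(1).strip() for m in TEMPLATE_RE.finditer(wikitext)})

if __name__ == "__main__":
    for dump in glob.glob("./xmls/*.xml"):  # same ./xmls folder the README mentions
        for title, text in pages(dump):
            print(title, template_names(text)[:5])
```

Streaming with iterparse and clearing each finished page keeps memory usage flat even on multi-gigabyte dumps, which matters when the same code has to run over 170+ locales.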