-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deal with email tags in RIPE db changes #11
Comments
creating test casesUsing fresh temporary database with postgresql and download ripe data as described in the documentation. set -l day 2021-03-03
cp -a $day $day-trunc
cd $day-trunc
gzip -d --stdout ../$day/ripe.db.inet6num.gz | head --lines=100000 | gzip > ripe.db.inet6num.gz
gzip -d --stdout ../$day/ripe.db.inetnum.gz | head --lines=100000 | gzip > ripe.db.inetnum.gz
set -l b /home/bern/dev/certbund-contact-git
env PYTHONPATH=$b: python3 $b/intelmq_certbund_contact/ripe/ripe_import.py -v --restrict-to-country NO --conninfo 'host=localhost dbname=contactdb'
# using approximately 3% of the entries
psql -h localhost contactdb -c "select email from contact_automatic TABLESAMPLE BERNOULLI (3);" --tuples-only | \
head --lines=-1 | cut --characters=2- >sample_emails.txt
# sed 's/\(.*\)/UPDATE contact_automatic SET email=\'x-\1\' WHERE email=\'\1\';/' sample_emails.txt >sample_emails.sql
# psql -h localhost contactdb -f sample_emails.sql
# sed 's/\(.*\)/x-\1/' sample_emails.txt >sample_emails_new.txt
env PYTHONPATH=$b: python3 $b/intelmq_certbund_contact/ripe/ripe_diff.py -v --restrict-to-country NO --conninfo 'host=localhost dbname=contactdb' |
As #11 has a number of cases that we want to detect and possibly handle differently, test cases are needed. To create them reliably and without we use a script that creates sql commands. An alternative is have been to go through the fody-backend, but which would require one more component to run for building the test cases.
creating test casesTo find 10 email address that have been used in several contacts: SELECT * FROM (
SELECT COUNT(email), email FROM contact_automatic GROUP BY email ORDER BY COUNT(email) Desc) AS c
WHERE count >1 LIMIT 10; as single command to produce a list psql -h localhost contactdb -c "SELECT email
FROM (SELECT COUNT(email), email FROM contact_automatic
GROUP BY email ORDER BY COUNT(email) Desc) AS c
WHERE count >1 LIMIT 10;
" --tuples-only | head --lines=-1 | cut --characters=2- > multiple_emails.txt To change the email address of only one contact of several for one email, the sql command is more involved: sed 's/\(.*\)/UPDATE contact_automatic SET email=\'x-\1\' WHERE contact_automatic_id IN (SELECT contact_automatic_id FROM contact_automatic WHERE email=\'\1\' LIMIT 1);/' with the combination of |
There is a difficulty: the ripe_import workflow does not look at the differences, it only completely deletes the old contents and inserts the new database completely. So in order to find the organisation with changed email addresses, a One implementation strategy could be to let the diff run create the necessary changes in the email tags to be saved in a file and executed after the import run. |
If an organisation changes its email address, there is a warning when running
ripe_diff.py
.However, if this email address has had extra information (like being disabled or tags), there maybe additional
steps necessary. This issue should make sure that they are done by either flagging them to the update operator
or doing the changes itself (if possible).
considerations
The table
email_tag
is independent from the manual and automatic tables and has effects on all email addresses when it is attempted to use the address.So if for an organisation the contact with emailA is replaced by a different contact with emailB,
what should happen?
Let us consider the cases. We have two email addresses:
They both maybe used in one or several contacts (both manual or automatic).
Case 1 both have no status info
Nothing to do (solved)
Case 2 emailA has status info, emailB does not
If emailB is to be used for the first time, it could inherit the status info of emailA automatically.
(Though it is unclear if in all cases the new email address' operator actually wants this.)
Else it is not clear because the other organisations that already have emailB did not have that status
so it cannot be transfered without surprising them. -> Flag for manual decision.
Case 3 emailA does not, emailB has status info
If emailB has status infos, it is already used by another organisation, so it is unclear. -> Flag for manual decison.
Case 4 emailA and emailB both have status info
If they are the same status info -> do nothing (solved).
Else (the info differs) and it is undecided which is the better one for the remaining orgs for emailB
and the new one. -> Flag for decision.
-> Flag
As there are only a one subcase (2.1) where the script could automatically do the right thing,
a good approach seems to be to flag the situation.
The text was updated successfully, but these errors were encountered: