Can memory usage be improved? #14

Open
bernhardreiter opened this issue Feb 12, 2021 · 0 comments

Running the ripe importer uses quite a bit of memory (~8GB as of today).
Can this be reduced?

Analysis

Downloading the ripe files for 2021-02-12 gives us the raw size we want to process:

2021-02-12> gzip -l *

gzip: delegated-ripencc-latest: not in gzip format
         compressed        uncompressed  ratio uncompressed_name
            8206950            78779140  89.6% ripe.db.aut-num
           25903102           507574226  94.9% ripe.db.inet6num
          242928983          3628042984  93.3% ripe.db.inetnum
            5843624            95550239  93.9% ripe.db.organisation
            4413327            77870566  94.3% ripe.db.role
          287295986          4387817155  93.5% (totals)

So that is about 4.4 GB (4,387,817,155 bytes) of uncompressed data.
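As a quick check of the arithmetic, the per-file sizes from the `gzip -l` listing can be summed and converted to decimal and binary units. This is only a sketch: the numbers are copied from the listing above, and `delegated-ripencc-latest` is left out because gzip reported it as not being in gzip format.

```python
# Uncompressed sizes in bytes, copied from the `gzip -l` listing above.
uncompressed = {
    "ripe.db.aut-num": 78_779_140,
    "ripe.db.inet6num": 507_574_226,
    "ripe.db.inetnum": 3_628_042_984,
    "ripe.db.organisation": 95_550_239,
    "ripe.db.role": 77_870_566,
}

total = sum(uncompressed.values())
print(f"total: {total} bytes")
print(f"  = {total / 10**9:.2f} GB (decimal)")
print(f"  = {total / 2**30:.2f} GiB (binary)")

# Share of each file in the total, largest first.
for name, size in sorted(uncompressed.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {100 * size / total:5.1f}%")
```

The sum matches the `(totals)` row, and `ripe.db.inetnum` is by far the largest input, so it is probably the first place to look.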

To see where the memory goes, I profiled the importer with memory-profiler (https://pypi.org/project/memory-profiler/):

# Debian Buster
apt-get install python3-memory-profiler python3-matplotlib

Decorating a few functions where memory is likely consumed with @profile:

--- a/intelmq_certbund_contact/ripe/ripe_data.py
+++ b/intelmq_certbund_contact/ripe/ripe_data.py
@@ -78,2 +78,3 @@ def add_common_args(parser):
 
+@profile
 def load_ripe_files(options) -> tuple:
@@ -205,2 +206,3 @@ def read_asn_whitelist(filename, verbose=False):
 
+@profile
 def parse_file(filename, fields, index_field=None, restriction=lambda x: True,
@@ -298,2 +300,3 @@ def parse_file(filename, fields, index_field=None, restriction=lambda x: True,
 
+@profile
 def build_index(obj_list, index_attribute):
@@ -441,2 +444,3 @@ def split_for_known_orgs(obj_list, organisation_index):
 
+@profile
 def sanitize_split_and_modify(obj_list, index, whitelist,
@@ -501,2 +505,3 @@ def sanitize_split_and_modify(obj_list, index, whitelist,
 
+@profile
 def convert_inetnum_to_networks(inetnum_list):
@@ -510,2 +515,3 @@ def convert_inetnum_to_networks(inetnum_list):
 
+@profile
 def convert_inet6num_to_networks(inet6num_list):
@@ -517,2 +523,3 @@ def convert_inet6num_to_networks(inet6num_list):
 
+@profile
 def process_inetnum_contacts(key, inet_list, inet_list_u, restrict_country):
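As a dependency-free cross-check of the same per-function idea, the standard library's `tracemalloc` could be wrapped in a small decorator. This is not what the issue used (that was memory-profiler's `@profile`); it is a swapped-in technique that only counts allocations made through Python's allocator, so its numbers will differ from the process memory that mprof samples. `report_peak` and `build_records` are hypothetical names for illustration.

```python
import functools
import tracemalloc


def report_peak(func):
    """Print the peak Python-level allocation during each call of func.

    Sketch only: not safe for nested decorated calls, because the inner
    call would stop the tracing that the outer call started.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            wrapper.last_peak = peak  # bytes, kept for later inspection
            print(f"{func.__name__}: peak {peak / 2**20:.1f} MiB")
    return wrapper


@report_peak
def build_records(n):
    # Hypothetical stand-in for a parse step that keeps every record in memory.
    return [{"inetnum": str(i), "org": "ORG-X"} for i in range(n)]


records = build_records(10_000)
```

Applied to functions such as `parse_file` or `build_index`, this would give a rough per-call peak without installing anything, at the cost of slower execution while tracing is active.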

Running the import under mprof with a country restriction of NO gives us a plot:

env PYTHONPATH=/home/bern/dev/certbund-contact-git: python3-mprof run /home/bern/dev/certbund-contact-git/intelmq_certbund_contact/ripe/ripe_import.py -v --restrict-to-country NO --conninfo 'host=localhost port=5432 dbname=contactdb'
python3-mprof plot -t "ripe_importer memory profile 2021-02-12"

[Figure: memory usage plot generated from mprofile_20210212110015.dat]

Here is the data file for interactive browsing (rename to remove the .txt suffix):
mprofile_20210212110015.dat.txt
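For a look at the data file without plotting, a short script can pull out the peak and the per-function memory deltas. This sketch assumes the line format that mprof is understood to write (`MEM <MiB> <timestamp>` for samples and `FUNC <name> <mem_start> <t_start> <mem_end> <t_end>` for decorated functions); please check that against the attached file. The sample lines below are made up for illustration and are not taken from the real profile.

```python
def read_mprofile(lines):
    """Parse mprof .dat lines into (samples, functions).

    samples:   list of (timestamp, mem_mib)
    functions: list of (name, t_start, t_end, mem_delta_mib)
    Assumed format: "MEM <MiB> <timestamp>" and
    "FUNC <name> <mem_start> <t_start> <mem_end> <t_end>".
    """
    samples, functions = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "MEM" and len(parts) >= 3:
            samples.append((float(parts[2]), float(parts[1])))
        elif parts[0] == "FUNC" and len(parts) >= 6:
            mem_start, t_start, mem_end, t_end = map(float, parts[2:6])
            functions.append((parts[1], t_start, t_end, mem_end - mem_start))
    return samples, functions


# Made-up example data, NOT from the real profile.
example = """\
CMDLINE python3 ripe_import.py
MEM 10.5 100.0
MEM 250.0 101.0
MEM 180.25 102.0
FUNC ripe_data.parse_file 10.5 100.0 250.0 101.0
""".splitlines()

samples, functions = read_mprofile(example)
t0 = samples[0][0]
peak_time, peak_mem = max(samples, key=lambda s: s[1])
print(f"peak {peak_mem:.1f} MiB at t+{peak_time - t0:.1f}s")
for name, t_start, t_end, delta in functions:
    print(f"{name}: {delta:+.1f} MiB over {t_end - t_start:.1f}s")
```

To use it on the real data, pass an open file instead of `example`, for instance `with open("mprofile_20210212110015.dat") as f: samples, functions = read_mprofile(f)`.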
