Running the ripe importer uses quite a bit of memory (~8GB as of today). Can this be reduced?

Analysis
Downloading the ripe files for 2021-02-12 gives us the raw size we want to process:
    2021-02-12> gzip -l *
    gzip: delegated-ripencc-latest: not in gzip format
             compressed        uncompressed  ratio uncompressed_name
                8206950            78779140  89.6% ripe.db.aut-num
               25903102           507574226  94.9% ripe.db.inet6num
              242928983          3628042984  93.3% ripe.db.inetnum
                5843624            95550239  93.9% ripe.db.organisation
                4413327            77870566  94.3% ripe.db.role
              287295986          4387817155  93.5% (totals)
So there are about 4.4 GB (4,387,817,155 bytes) of uncompressed data to process.
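The total can be cross-checked by summing the per-file figures from the `gzip -l` listing. This is a small sketch for the arithmetic only. The byte counts are copied from the listing, and the variable names are made up for illustration.

```python
# Uncompressed sizes in bytes, copied from the gzip -l listing above.
uncompressed = {
    "ripe.db.aut-num": 78779140,
    "ripe.db.inet6num": 507574226,
    "ripe.db.inetnum": 3628042984,
    "ripe.db.organisation": 95550239,
    "ripe.db.role": 77870566,
}

total_bytes = sum(uncompressed.values())
total_gb = total_bytes / 10**9    # decimal gigabytes
total_gib = total_bytes / 2**30   # binary gibibytes

print(f"{total_bytes} bytes = {total_gb:.2f} GB = {total_gib:.2f} GiB")

# Share of each file in the total, largest first.
for name, size in sorted(uncompressed.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {size / total_bytes:6.1%}")
```

ripe.db.inetnum alone accounts for roughly four fifths of the uncompressed input, so it is the most likely place to look first.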
Using https://pypi.org/project/memory-profiler/
    # Debian Buster
    apt-get install python3-memory-profiler python3-matplotlib
Decorating a few functions where the memory consumption is expected to occur:
    --- a/intelmq_certbund_contact/ripe/ripe_data.py
    +++ b/intelmq_certbund_contact/ripe/ripe_data.py
    @@ -78,2 +78,3 @@ def add_common_args(parser):
    +@profile
     def load_ripe_files(options) -> tuple:
    @@ -205,2 +206,3 @@ def read_asn_whitelist(filename, verbose=False):
    +@profile
     def parse_file(filename, fields, index_field=None, restriction=lambda x: True,
    @@ -298,2 +300,3 @@ def parse_file(filename, fields, index_field=None, restriction=lambda x: True,
    +@profile
     def build_index(obj_list, index_attribute):
    @@ -441,2 +444,3 @@ def split_for_known_orgs(obj_list, organisation_index):
    +@profile
     def sanitize_split_and_modify(obj_list, index, whitelist,
    @@ -501,2 +505,3 @@ def sanitize_split_and_modify(obj_list, index, whitelist,
    +@profile
     def convert_inetnum_to_networks(inetnum_list):
    @@ -510,2 +515,3 @@ def convert_inetnum_to_networks(inetnum_list):
    +@profile
     def convert_inet6num_to_networks(inet6num_list):
    @@ -517,2 +523,3 @@ def convert_inet6num_to_networks(inet6num_list):
    +@profile
     def process_inetnum_contacts(key, inet_list, inet_list_u, restrict_country):
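A bare `@profile` only resolves when the profiler has made that name available. Run without it, the decorated module would fail with a `NameError`. A minimal sketch of a common workaround is a no-op fallback, so the same source runs with and without the profiler. `demo_build_index` is a hypothetical stand-in and not the real `build_index` from `ripe_data.py`.

```python
try:
    # When the script is started through the profiler, `profile` is
    # expected to be available as a builtin name already.
    profile
except NameError:
    def profile(func):
        """No-op stand-in so decorated code still runs unprofiled."""
        return func


@profile
def demo_build_index(obj_list, index_attribute):
    """Hypothetical example: group objects by the value of one attribute."""
    index = {}
    for obj in obj_list:
        index.setdefault(obj[index_attribute], []).append(obj)
    return index


objs = [{"org": "ORG-A"}, {"org": "ORG-B"}, {"org": "ORG-A"}]
index = demo_build_index(objs, "org")
print({key: len(value) for key, value in index.items()})
```

With this guard in place the decorators could stay in a working copy during the investigation without breaking normal runs.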
We can get a plot by running the import with a country restriction of NO:
    env PYTHONPATH=/home/bern/dev/certbund-contact-git: python3-mprof run /home/bern/dev/certbund-contact-git/intelmq_certbund_contact/ripe/ripe_import.py -v --restrict-to-country NO --conninfo 'host=localhost port=5432 dbname=contactdb'
    python3-mprof plot -t "ripe_importer memory profile 2021-12-02"
Here is the data file for interactive browsing (rename to remove the .txt suffix): mprofile_20210212110015.dat.txt
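For a quick look at the data file without plotting, the peak can be extracted with a few lines of Python. This sketch assumes the file contains sample lines of the form `MEM <MiB> <unix timestamp>` and ignores every other line type. That layout is an assumption about the mprof output, so check it against the actual file. The sample values below are invented for illustration.

```python
def peak_memory(dat_lines):
    """Return (peak_mib, seconds_after_first_sample) from mprof-style lines."""
    samples = []
    for line in dat_lines:
        parts = line.split()
        # Assumed sample format: "MEM <memory in MiB> <timestamp>".
        if len(parts) == 3 and parts[0] == "MEM":
            samples.append((float(parts[1]), float(parts[2])))
    if not samples:
        raise ValueError("no MEM samples found")
    start = samples[0][1]
    peak_mib, peak_ts = max(samples, key=lambda sample: sample[0])
    return peak_mib, peak_ts - start


# Invented sample data, not taken from the real profile.
sample = """\
CMDLINE python3 ripe_import.py
MEM 50.0 100.0
MEM 200.5 101.0
MEM 120.0 102.0
""".splitlines()

peak, offset = peak_memory(sample)
print(f"peak {peak:.1f} MiB after {offset:.1f} s")
```

For the real file, pass an open file object instead of the sample list, for example `peak_memory(open("mprofile_20210212110015.dat"))`.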