Awesome project! I'm interested in importing CC web graphs into a DB so I can do processing. My understanding is that each web graph dump on CC's site only has the last 3 months of scraping. I'd like to combine them all so that I can generate a full graph of what hosts have linked to other hosts, ever.
Are node IDs consistent between different web graph dumps? If so, this would simplify merging them together substantially.
No, node IDs are not consistent between webgraphs. Within each graph the nodes are simply enumerated starting from zero, which is a requirement of the webgraph format, so the same ID refers to different hosts in different dumps. Your plan is very similar to a time-aware webgraph, see #17.
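Since the IDs cannot be used as a join key, one way to combine dumps is to translate every edge back to host names first and merge on those. Below is a minimal sketch of that idea. It assumes each dump comes with a tab-separated vertices listing (`id<TAB>host`) and an edges listing (`from_id<TAB>to_id`); check the actual file layout before relying on this. The function names `load_vertices`, `host_edges`, and `merge_dumps` are hypothetical and not part of this project.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Edge = Tuple[str, str]


def load_vertices(lines: Iterable[str]) -> Dict[int, str]:
    """Build the per-dump mapping node ID -> host name.

    Assumed line format: "<id>\t<host>".
    """
    id_to_host: Dict[int, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        node_id, host = line.split("\t", 1)
        id_to_host[int(node_id)] = host
    return id_to_host


def host_edges(vertex_lines: Iterable[str],
               edge_lines: Iterable[str]) -> Iterator[Edge]:
    """Yield the edges of one dump as (source host, target host) pairs.

    Assumed edge line format: "<from_id>\t<to_id>".
    """
    id_to_host = load_vertices(vertex_lines)
    for line in edge_lines:
        line = line.strip()
        if not line:
            continue
        src, dst = line.split("\t")
        yield id_to_host[int(src)], id_to_host[int(dst)]


def merge_dumps(dumps: Iterable[Tuple[Iterable[str], Iterable[str]]]) -> Set[Edge]:
    """Union the host-level edges of several dumps, keyed by host name."""
    merged: Set[Edge] = set()
    for vertex_lines, edge_lines in dumps:
        merged.update(host_edges(vertex_lines, edge_lines))
    return merged


# Toy data: ID 0 refers to a different host in each dump.
dump_a = (["0\tcom.example", "1\torg.wikipedia"], ["0\t1"])
dump_b = (["0\torg.wikipedia", "1\tcom.example", "2\tnet.foo"],
          ["1\t0", "2\t0"])
merged = merge_dumps([dump_a, dump_b])
```

An in-memory set only works for toy inputs. For real dumps the same approach would probably need a database table keyed by host name, with the per-dump ID mapping used only while importing that dump.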