Several people have noted that the deleted tweet reports omit some recent Wayback Machine (WBM) snapshots (e.g. here). I think this is some kind of special handling for manually archived tweets, but I haven't looked in detail. In any case, we need to fix the parser.
This format is fairly rare (for one recent test scrape I found 42 snapshots with this format out of ~20k total).
The digests returned from the CDX index for these snapshots are consistently incorrect (possibly because they are computed in some undocumented, non-standard way).
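For anyone who wants to reproduce the digest mismatch, here is a minimal sketch of the check, assuming the usual CDX convention of a Base32-encoded SHA-1 over the response payload. The function names are hypothetical and not taken from this project's parser, and the sketch doesn't attempt to guess how the odd digests for these snapshots are actually computed.

```python
import base64
import hashlib


def wayback_style_digest(payload: bytes) -> str:
    """Return the Base32-encoded SHA-1 of the payload (assumed CDX digest form)."""
    return base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


def digest_matches(cdx_digest: str, payload: bytes) -> bool:
    """Compare a digest string from the CDX index with one computed locally."""
    return cdx_digest.strip().upper() == wayback_style_digest(payload)
```

A snapshot in the unusual format would be expected to return `False` from `digest_matches` even when the downloaded bytes are intact, so a parser that validates digests probably needs to treat a mismatch as a warning rather than a reason to drop the snapshot.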
These snapshots are a minimal HTML representation that includes Schema.org metadata (something that Twitter doesn't seem to use anywhere else). This seems like an experiment at Twitter, possibly in conversation with the Internet Archive, but I haven't been able to find any documentation or anyone who's able to talk about it.
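One way the parser could recognize this format is by looking for Schema.org markup before falling back to the normal tweet parsing. The sketch below assumes the metadata appears as microdata `itemtype` attributes; I haven't confirmed that here (it could be JSON-LD instead), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class SchemaOrgDetector(HTMLParser):
    """Collect the values of itemtype attributes that mention schema.org."""

    def __init__(self) -> None:
        super().__init__()
        self.item_types: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes such as itemscope arrive with value None.
            if name == "itemtype" and value and "schema.org" in value:
                self.item_types.append(value)


def schema_org_item_types(html: str) -> List[str]:
    """Return every Schema.org itemtype found in the document, in order."""
    detector = SchemaOrgDetector()
    detector.feed(html)
    detector.close()
    return detector.item_types
```

A non-empty result from `schema_org_item_types` would flag a snapshot as a candidate for the minimal format, which could then be routed to a separate extraction path.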