doing applied fediverse research
rra 09d76040eb crawler now scrapes in parallel threads 7 years ago
.gitignore gitignore and readme 7 years ago
README.md more info on methodology and where it is lacking 7 years ago
fedicrawler.py crawler now scrapes in parallel threads 7 years ago
instance_scrape.json scrape with metadata on 30/5/2018 7 years ago
instances.txt first version, crawls only the announced peers 7 years ago

README.md

directorate for applied fediverse research

  • independently tries to verify fediverse statistics
  • draws conclusions from that

methodology

Currently the script starts from https://post.lurk.org and queries /api/v1/instance/peers to find the servers it is peering with. For each peer it hasn't seen before, it repeats that query and also tries to query /api/v1/instance for metadata.
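A minimal sketch of the crawl described above, not the actual contents of fedicrawler.py. The names fetch_json and crawl, the limit parameter and the use of urllib are assumptions made for illustration; only the two API paths come from this README.

```python
import json
from collections import deque
from urllib.request import urlopen


def fetch_json(host, path, timeout=10):
    """GET https://<host><path> and decode the JSON body (hypothetical helper)."""
    with urlopen(f"https://{host}{path}", timeout=timeout) as resp:
        return json.load(resp)


def crawl(start, fetch=fetch_json, limit=100):
    """Breadth-first walk over announced peers, starting from `start`.

    Returns a dict mapping each visited host to its /api/v1/instance
    metadata, or to the string 'error' when that request fails.
    `limit` caps the number of visited hosts (an assumption, to keep
    the sketch from walking the whole network).
    """
    seen = {start}
    queue = deque([start])
    results = {}
    while queue and len(results) < limit:
        host = queue.popleft()
        # Metadata is optional on the server side, so failures are expected.
        try:
            results[host] = fetch(host, "/api/v1/instance")
        except Exception:
            results[host] = "error"
        # The peer list tells us which servers to visit next.
        try:
            peers = fetch(host, "/api/v1/instance/peers")
        except Exception:
            peers = []
        if not isinstance(peers, list):
            peers = []
        for peer in peers:
            # Only queue well-formed hostnames we have not seen before.
            if isinstance(peer, str) and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return results
```

Calling crawl("post.lurk.org", limit=20) would visit at most 20 hosts. The fetch argument is injectable so the traversal can be tried against a fake network without sending real requests.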

This method is a bit lacking because providing /api/v1/instance is voluntary and specific to later versions of the mastodon/activitypub fediverse, so instances that do not expose it yield no metadata. We should study the methodology of fediverse.network for better results.

When a request to a given instance fails, the script currently just logs it as 'error'.

The latest scrape results can be found in instance_scrape.json.

TODO FIXME

  • add detailed error message to json when we get one
  • abstract the functions so we can multithread them
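One way the two items above could be approached, as a rough sketch rather than the repository's implementation. The function names, the shape of the error record and the workers parameter are all hypothetical; fetch stands for any callable that takes a host and a path and returns decoded JSON.

```python
from concurrent.futures import ThreadPoolExecutor


def describe_error(exc):
    """Turn an exception into a JSON-serialisable record (assumed format)."""
    return {"error": type(exc).__name__, "message": str(exc)}


def scrape_one(host, fetch):
    """Fetch metadata for a single host; never raises."""
    try:
        return host, fetch(host, "/api/v1/instance")
    except Exception as exc:
        # Keep the reason for the failure instead of a bare 'error'.
        return host, describe_error(exc)


def scrape_many(hosts, fetch, workers=8):
    """Scrape several hosts in parallel threads and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda h: scrape_one(h, fetch), hosts))
```

Because scrape_one takes everything it needs as arguments and returns its result instead of writing to shared state, it can be handed to a thread pool without locking.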