directorate for applied fediverse research
rra ce613426d0 rerun scrape may 2019 5 months ago
.gitignore gitignore and readme 1 year ago
README.md scraper now uses parallelism 1 year ago
fedicrawler.py small fixes 1 year ago
instance_scrape.json rerun scrape may 2019 5 months ago
instances.txt first version, crawls only the announced peers 1 year ago

README.md

directorate for applied fediverse research

  • independently tries to verify fediverse statistics
  • draws conclusions from that

methodology

Currently the script starts from https://post.lurk.org and queries /api/v1/instance/peers to find the servers it is peering with. For each peer it has not seen before, it repeats that query and also requests /api/v1/instance for metadata.
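The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the code in fedicrawler.py: the names fetch_json and crawl, the timeout, and the optional limit are assumptions made for the example, as is the choice to request metadata for the starting instance as well.

```python
import json
from collections import deque
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Fetch a URL and decode its JSON body (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def crawl(start, fetch=fetch_json, limit=None):
    """Breadth-first crawl over announced peers, starting from one instance.

    Returns a dict mapping each visited host to its /api/v1/instance
    metadata, or to the string "error" when that request failed.
    """
    results = {}
    queue = deque([start])
    seen = {start}  # hosts already queued, so each is visited only once
    while queue and (limit is None or len(results) < limit):
        host = queue.popleft()
        try:
            results[host] = fetch(f"https://{host}/api/v1/instance")
        except Exception:
            # Assumption: any failure is recorded as a plain "error" marker.
            results[host] = "error"
        try:
            peers = fetch(f"https://{host}/api/v1/instance/peers")
        except Exception:
            peers = []  # a host with no peer list simply adds nothing
        for peer in peers:
            if isinstance(peer, str) and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return results
```

Calling crawl("post.lurk.org") would walk the announced peers from the starting instance. Passing a different fetch function makes it possible to try the logic without network access.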

This method is limited because providing /api/v1/instance is voluntary and specific to later versions of the Mastodon/ActivityPub fediverse, so servers that do not offer it yield no metadata. We should study the methodology of fediverse.network for better results.

When a request to a given instance fails, the script currently just logs it as ‘error’.

The latest scrape results can be found in instance_scrape.json.

TODO FIXME

  • add detailed error message to json when we get one
  • abstract the functions so we can multithread them
  • find a way to also scrape for instances that don’t announce themselves
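One possible direction for the first two items is sketched below. It is only an illustration under stated assumptions: scrape_instance and scrape_many are hypothetical names, the shape of the error record is invented for the example, and the fetch argument stands for whatever function the script uses to download and decode JSON.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_instance(host, fetch):
    """Return metadata for one host, or a detailed error record on failure."""
    try:
        return fetch(f"https://{host}/api/v1/instance")
    except Exception as exc:
        # Assumed record shape: exception class name plus its message.
        return {"error": type(exc).__name__, "message": str(exc)}


def scrape_many(hosts, fetch, workers=8):
    """Scrape several hosts concurrently and map each host to its result."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with hosts.
        results = pool.map(lambda host: scrape_instance(host, fetch), hosts)
        return dict(zip(hosts, results))
```

Keeping the per-host work in a single function that takes its inputs as arguments is what lets it be handed to a thread pool. The resulting dictionary can be written out with json.dump, as long as the fetched metadata is itself plain JSON data.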