
directorate for applied fediverse research

  • independently tries to verify fediverse statistics
  • draws conclusions from that

methodology

Currently the script starts from https://post.lurk.org and queries /api/v1/instance/peers to find the servers it peers with. For each peering server it hasn't seen before it does the same, and in addition tries to query /api/v1/instance for metadata.
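
For orientation, here is a minimal sketch of that crawl loop, assuming the requests library; the function names, timeout and output format are illustrative, the actual implementation is in fedicrawler.py.

```python
import json
import requests

SEED = "post.lurk.org"

def fetch_peers(domain):
    """Ask an instance which servers it peers with."""
    r = requests.get(f"https://{domain}/api/v1/instance/peers", timeout=10)
    r.raise_for_status()
    return r.json()  # list of peer domains

def fetch_instance_info(domain):
    """Fetch instance metadata, if the instance exposes it."""
    r = requests.get(f"https://{domain}/api/v1/instance", timeout=10)
    r.raise_for_status()
    return r.json()

def crawl(seed=SEED):
    seen = set()
    results = {}
    queue = [seed]
    while queue:
        domain = queue.pop()
        if domain in seen:
            continue
        seen.add(domain)
        try:
            results[domain] = fetch_instance_info(domain)
            for peer in fetch_peers(domain):
                if peer not in seen:
                    queue.append(peer)
        except Exception:
            # failed requests are simply recorded as an error
            results[domain] = "error"
    return results

if __name__ == "__main__":
    with open("instance_scrape.json", "w") as f:
        json.dump(crawl(), f, indent=2)
```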

This method is somewhat lacking because providing /api/v1/instance is voluntary and specific to later versions of the Mastodon/ActivityPub fediverse, so instances that don't expose it yield no metadata. We should study the methodology of fediverse.network for better results.

When a request to a given instance fails, it is currently just logged as 'error'.

The latest scrape results can be found in instance_scrape.json.
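
For a quick look at the results, something like the following works, assuming the file is a JSON object keyed by instance domain with failed instances stored as plain "error" markers:

```python
import json

with open("instance_scrape.json") as f:
    instances = json.load(f)

# count how many entries ended up as 'error' markers vs. real metadata
errors = sum(1 for v in instances.values() if v == "error")
print(f"{len(instances)} instances recorded, {errors} of them errored")
```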

TODO FIXME

  • add a detailed error message to the json output when we get one
  • abstract the functions so we can multithread them (see the thread-pool sketch below)
  • find a way to also scrape for instances that don't announce themselves
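
The multithreading item could for example be approached with a thread pool that fetches metadata for a batch of newly discovered peers in parallel. This is a rough sketch, not the current implementation; fetch_instance_info is the hypothetical helper from the sketch above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_batch(domains, workers=20):
    """Fetch instance metadata for many domains concurrently."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_instance_info, d): d for d in domains}
        for future in as_completed(futures):
            domain = futures[future]
            try:
                results[domain] = future.result()
            except Exception:
                results[domain] = "error"
    return results
```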