doing frequent applied fediverse research
- a hands-on approach to verifying the fediverse statistics published by fediverse.network and the-federation.info
- draws conclusions from that
methodology
Mapping the network
Currently the script starts from https://post.lurk.org and queries /api/v1/instance/peers to find the servers it is peering with. For each peer it has not seen before, it does the same. This rests on the assumption that peer lists from Mastodon and Pleroma instances give enough of a view of the 'known fediverse' to work with.
initial peer list
all peers of the initial peers
all peers of the peers of the initial peers
------------------------------------------------- +
the known fediverse?
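The mapping step above can be sketched as a breadth-first crawl. This is a minimal illustration, not the code in fedicrawler.py: the names `fetch_peers` and `crawl`, the use of the standard library's `urllib`, and the default of three rounds (one per line of the sum above) are all assumptions.

```python
import json
import urllib.request


def fetch_peers(host, timeout=10):
    """Return the peer domains a host reports, or [] if the request fails."""
    url = f"https://{host}/api/v1/instance/peers"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            peers = json.load(response)
    except Exception:
        # Unreachable host, HTTP error, or a body that is not JSON.
        return []
    if not isinstance(peers, list):
        return []
    return [p for p in peers if isinstance(p, str)]


def crawl(start, get_peers=fetch_peers, max_depth=3):
    """Collect every host reachable from `start` within `max_depth` rounds."""
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for host in frontier:
            for peer in get_peers(host):
                if peer not in seen:  # only follow hosts we have not seen yet
                    seen.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
        if not frontier:
            break
    return seen
```

Calling `crawl("post.lurk.org")` would perform the three rounds over the network. Passing a different `get_peers` callable makes it possible to try the traversal on a small hand-written graph first.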
Instance metadata
We try to query /.well-known/nodeinfo for instance metadata such as the software type. This is what both fediverse.network and the-federation.info do.
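A rough sketch of the metadata lookup follows. In the NodeInfo protocol, /.well-known/nodeinfo returns a small index whose `links` entries point at the actual NodeInfo document, which in turn carries a `software` object with a `name`. The helper names `get_json`, `fetch_nodeinfo` and `software_name` are hypothetical, and picking the last advertised link is an assumption made for brevity.

```python
import json
import urllib.request


def get_json(url, timeout=10):
    """Fetch a URL and decode its body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def fetch_nodeinfo(host, fetch=get_json):
    """Follow the well-known index to the NodeInfo document of `host`."""
    index = fetch(f"https://{host}/.well-known/nodeinfo")
    links = index.get("links", [])
    if not links:
        return None
    # Assumption: take the last advertised schema link.
    return fetch(links[-1]["href"])


def software_name(nodeinfo):
    """Return the software name from a NodeInfo document, or None."""
    if not nodeinfo:
        return None
    return nodeinfo.get("software", {}).get("name")
```

`software_name(fetch_nodeinfo("post.lurk.org"))` would then give the software type of a single instance, provided the host answers both requests.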
When a request to a given instance fails, the script logs the raised Exception; if the failure is an HTTP error, we currently log the response instead.
Latest scrape results can be found in instance_scrape.json
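The error handling described above could look roughly like the following. It is a sketch under assumptions: it uses `urllib.error.HTTPError` from the standard library rather than whatever HTTP library the script actually uses, and the record keys (`error`, `status`, `reason`, `detail`) and the helper names `describe_failure` and `safe_query` are invented for illustration.

```python
import urllib.error


def describe_failure(exc):
    """Turn an exception into a small JSON-serialisable record."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, so keep what it said.
        return {"error": "http", "status": exc.code, "reason": str(exc.reason)}
    # Anything else (DNS failure, timeout, bad JSON, ...): keep the exception.
    return {"error": type(exc).__name__, "detail": str(exc)}


def safe_query(host, query):
    """Run `query(host)` and record either its result or the failure."""
    try:
        return {"host": host, "result": query(host)}
    except Exception as exc:
        return {"host": host, **describe_failure(exc)}
```

Because every record is a plain dictionary, failures can be written to the same JSON file as successful results.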
TODO FIXME
- add detailed error message to json when we get one
- abstract the functions so we can multithread them
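For the multithreading item, one possible direction is sketched below using `concurrent.futures.ThreadPoolExecutor` from the standard library. Nothing here comes from the script itself: `scrape_all`, `scrape_one` and the worker count are hypothetical, and the sketch assumes the per-instance work has already been abstracted into a single function that takes a host.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(hosts, scrape_one, max_workers=8):
    """Run `scrape_one(host)` for every host in a thread pool.

    Returns a dict mapping each host to its result. If `scrape_one`
    raises, the exception propagates when the results are collected,
    so it should catch and record its own errors.
    """
    hosts = list(hosts)  # iterated twice below, so materialise it first
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `hosts`.
        return dict(zip(hosts, pool.map(scrape_one, hosts)))
```

Threads suit this workload because most of the time is spent waiting on network responses rather than computing.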