rra/dafr

doing applied fediverse research

rra c003a6ae96 added new script to document mastodon about pages		2020-05-05 15:24:34 +02:00
.gitignore	added new script to document mastodon about pages	2020-05-05 15:24:34 +02:00
about_collector.py	added new script to document mastodon about pages	2020-05-05 15:24:34 +02:00
fedicrawler.py	now based primarily on nodeinfo2, add socksproxy, filtering out weird stuff	2020-04-30 12:37:57 +02:00
instance_scrape.json	april 30 2020	2020-04-30 12:38:14 +02:00
instances.txt	first version, crawls only the announced peers	2018-05-30 08:20:46 +02:00
README.md	update readme	2020-04-30 12:53:54 +02:00

README.md

doing frequent applied fediverse research

a hands-on approach to verifying the fediverse statistics published by fediverse.network and the-federation.info
draws conclusions from the results

methodology

Mapping the network

Currently the script starts from https://post.lurk.org and queries /api/v1/instance/peers to find the servers it is peering with. For each peer it hasn't seen before, it does the same. This rests on the assumption that the peer lists from Mastodon & Pleroma instances give enough of a view of the 'known fediverse' to work with.

initial peer list

all peers of the initial peers

all peers of the peers of the initial peers

------------------------------------------------- + the known fediverse?
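
The three rounds summed up above can be sketched as a breadth-first walk over peer lists. This is a minimal illustration, not the code in fedicrawler.py: the names `fetch_peers`, `crawl_peers`, `get_peers` and `rounds` are made up for the example, and it assumes plain HTTPS requests through the standard library with no proxy.

```python
import json
from http.client import HTTPException
from urllib.request import urlopen


def fetch_peers(host, timeout=10):
    """Return the peer list announced by one instance, or [] on any failure."""
    url = f"https://{host}/api/v1/instance/peers"
    try:
        with urlopen(url, timeout=timeout) as response:
            peers = json.load(response)
    except (OSError, ValueError, HTTPException):
        return []
    if not isinstance(peers, list):
        return []
    # Keep only non-empty strings; some instances return null or odd entries.
    return [p for p in peers if isinstance(p, str) and p]


def crawl_peers(start, get_peers=fetch_peers, rounds=3):
    """Collect every host found within `rounds` peer-list lookups from `start`."""
    seen = {start}
    frontier = [start]
    for _ in range(rounds):
        next_frontier = []
        for host in frontier:
            for peer in get_peers(host):
                if peer not in seen:  # only follow hosts we have not seen yet
                    seen.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
    return seen
```

Calling `crawl_peers("post.lurk.org")` would hit the network. Passing a different `get_peers` function makes it possible to try the walk on a small hand-written graph first.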

Instance metadata

We try to query /.well-known/nodeinfo for instance metadata such as the software type. This is what both fediverse.network and the-federation.info do.
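
As a rough sketch of that lookup: the discovery document at /.well-known/nodeinfo lists links to one or more NodeInfo documents, and the linked document carries a `software` object with `name` and `version`. The helper names below (`fetch_json`, `pick_nodeinfo_url`, `describe_instance`) are hypothetical, and picking the highest listed schema version is an assumption of this example, not necessarily what the script does.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Fetch a URL and parse the body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def pick_nodeinfo_url(discovery):
    """Return the href of the highest schema version in a discovery document."""
    links = [
        link for link in discovery.get("links", [])
        if isinstance(link, dict) and "rel" in link and "href" in link
    ]
    if not links:
        return None
    # The schema version is the last path segment of "rel", e.g. "2.0".
    # Plain string comparison is enough for single-digit versions like these.
    best = max(links, key=lambda link: link["rel"].rsplit("/", 1)[-1])
    return best["href"]


def describe_instance(host, get_json=fetch_json):
    """Return host, software name and version, or None without a usable link."""
    discovery = get_json(f"https://{host}/.well-known/nodeinfo")
    url = pick_nodeinfo_url(discovery)
    if url is None:
        return None
    software = get_json(url).get("software", {})
    return {
        "host": host,
        "software": software.get("name"),
        "version": software.get("version"),
    }
```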

When a request to an instance fails, the script logs the raised exception. If the failure is an HTTP error, it currently logs the server's answer instead.
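
One way to express that distinction, sketched with the standard library only. The function names, the shape of the result dictionary and the 200-character cut-off are all assumptions made for illustration. The actual script may use a different HTTP client and record format.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def default_open(url, timeout=10):
    """Return the raw response body for a URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def probe(host, path="/.well-known/nodeinfo", open_url=default_open):
    """Query one instance and return a record that is safe to dump as JSON."""
    url = f"https://{host}{path}"
    try:
        return {"host": host, "ok": True, "data": json.loads(open_url(url))}
    except HTTPError as err:
        # The server did answer, just not with success: keep (part of) its answer.
        answer = err.read().decode("utf-8", "replace")[:200]
        return {"host": host, "ok": False,
                "error": f"HTTP {err.code}", "answer": answer}
    except Exception as err:
        # DNS failures, timeouts, TLS problems, invalid JSON: keep the exception.
        return {"host": host, "ok": False, "error": repr(err)}
```

The `HTTPError` branch has to come first because `HTTPError` is itself a subclass of `Exception` and would otherwise be swallowed by the general handler.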

The latest scrape results can be found in instance_scrape.json.

TODO FIXME

~~add detailed error message to json when we get one~~
~~abstract the functions so we can multithread them~~