{% extends "en/base.html" %} {% block title %}{% endblock %} {% block search %} {% endblock %} {% block results %}

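from math import log

def tfidf(query, words, corpus):

        # Term Frequency: how often the query appears in this document,
        # relative to the total number of words in the document
        tf_count = 0
        for word in words:
            if query == word:
                tf_count += 1
        tf = tf_count/len(words)
        
        # Inverse Document Frequency: the number of documents in the corpus
        # that contain the query, inverted and scaled in the usual
        # logarithmic form (assumes the query occurs in at least one document)
        idf_count = 0
        for document in corpus:
            if query in document:
                idf_count += 1
        idf = log(len(corpus)/idf_count)
        
        tfidf_value = tf * idf
        
        return tf_count, idf_count, tfidf_value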



[Note on contrast mappings]

The TF-IDF algorithm, shown above in the programming language Python, weaves a layer of contrast into the text. Not literally, but in the form of numbers. The words with the most contrast are those that the algorithm considers the most important for that text.

These contrast mappings allow for reading across the manifesto and the algorithm.

The TF-IDF values are calculated in two steps. The algorithm first calculates the Term Frequency (TF) by counting how often a word appears in the text, relative to the total number of words in the document. This relative way of counting makes it possible to compare word counts between documents of varying lengths, for example Donna Haraway's long essay A Cyborg Manifesto (1984) and the relatively short text of The Call for Feminist Data written by Caroline Sinders (2018).
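
The relative counting of the first step can be sketched as follows. This is only an illustration: the helper name `term_frequency` and the two word lists are made up for this example and are not taken from the manifestos themselves.

```python
def term_frequency(query, words):
    """Share of the words in one document that are equal to the query."""
    return words.count(query) / len(words)

# Hypothetical toy documents of very different lengths
long_text = ["cyborg"] * 10 + ["filler"] * 990   # 1000 words in total
short_text = ["data"] * 2 + ["filler"] * 98      # 100 words in total

print(term_frequency("cyborg", long_text))   # 10 out of 1000 words
print(term_frequency("data", short_text))    # 2 out of 100 words
```

Counted absolutely, "cyborg" (10 times) seems more present than "data" (2 times). Counted relatively, "data" takes up twice as large a share of its short text.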

In the second step, the algorithm counts relative to all the other documents in the same dataset, using the Inverse Document Frequency (IDF). This part of the algorithm, which is Karen Spärck Jones’ addition, introduces a subtle form of inverse relative counting across all the documents in the dataset. Instead of only counting word frequency within one document, Spärck Jones proposed to count in a relative, inter-document way.

This means that when a word appears in only one or a few documents, its value is greatly enlarged. As a consequence, words such as "the" or "it" are given a very low number, as they appear in all the documents. Specific words, such as "paranodal" in A Feminist Server Manifesto, get a very high value: this word is used only 4 times in the whole dataset, and all 4 occurrences are in this manifesto.

Another example is SCUM. Although the word SCUM is not the most commonly used word in the S.C.U.M. Manifesto, it is the word that gets the highest score: relative to all the other manifestos, SCUM is used mostly in this manifesto. This increases the score a lot.
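
The effect of the second step can be sketched with a toy corpus. The three short word lists below are invented stand-ins for the manifestos, the helper name `inverse_document_frequency` is hypothetical, and the logarithmic form used here is the common textbook one, which may differ in detail from the variant used for the mappings.

```python
from math import log

def inverse_document_frequency(query, corpus):
    """Logarithm of (number of documents / documents containing the query)."""
    document_count = sum(1 for document in corpus if query in document)
    return log(len(corpus) / document_count)

# Hypothetical toy corpus: three very short "documents"
corpus = [
    ["the", "paranodal", "server"],
    ["the", "cyborg", "manifesto"],
    ["the", "feminist", "data"],
]

print(inverse_document_frequency("the", corpus))        # log(3/3) = 0.0
print(inverse_document_frequency("paranodal", corpus))  # log(3/1), about 1.1
```

A word that appears in every document gets the value 0, so its TF-IDF score is 0 no matter how often it is used. A word that appears in only one document gets the largest value this corpus allows.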

{{ manifesto | prettyfilename }}

{% for sentence in mappings %}

{% for word, tfidf in sentence %} {{ word }} {% endfor %}

{% endfor %}
{% endblock %} {% block suggestions %} {% endblock %}