
[text 1] ok

master
Marianne Plano 2 years ago
parent
commit
44f0581504
  1. 00_contributions/FINAL/layout/00_header.tex (8 changes)
  2. 00_contributions/FINAL/layout/01_introduction.tex (53 changes)
  3. pdfs/01_introduction.pdf (binary)
  4. pdfs/01_introduction.tex (61 changes)
  5. vltk.pdf (binary)

8
00_contributions/FINAL/layout/00_header.tex

@ -37,6 +37,8 @@
\usepackage{eso-pic}
\usepackage{xespotcolor}
\usepackage{hyperref}
\usepackage{changepage}
\hypersetup{
hidelinks, % Remove visible links altogether
%urlbordercolor = 1 1 1% Make URL link border white
@ -175,11 +177,11 @@ keepaspectratio]{04_magic-magick/test_toc-03-white.png}%
% CODE STYLE
\lstset{
aboveskip=3mm,
belowskip=3mm,
aboveskip=12pt,
belowskip=12pt,
showstringspaces=false,
columns=flexible,
basicstyle={\small\ttfamily},
basicstyle={\ttfamily\fontsize{9pt}{12pt}\selectfont},
numbers=none,
numberstyle=\tiny,
keywordstyle=\bfseries,

53
00_contributions/FINAL/layout/01_introduction.tex

@ -53,7 +53,7 @@ In this introductory text, we will share some of the thoughts behind \nohyphens{
VLTK takes a dive into the logical operations that are used to process language with a computer to speak back to a range of unassuming habits in the field of computational language processing, and step towards modes of embedded, slow, and vernacular language processing and knowledge organisation. “Vernacular” in this text refers to everyday speech forming at the margins of standardisation; the ephemeral aspects of a culture's particularities that resist or exist alongside dominant systems of institutional aesthetics; or the encapsulation of a specific nowness in time.
\fontdimen3\font=0.2em
Stretching the vocabulary that is commonly used by computational language processing practices is an important part of the work. In these contexts, language is often understood as “natural” or “artificial,” where the natural refers to spoken or written human language and the artificial to formal languages such as programming languages. Questioning these terms enables us to approach language as culturally promiscuous, and constantly in flux. Language is complex, messy, and contains all kinds of structures; some emerge through use, like sayings or expressions, others are imposed by external forces, sometimes even violently.\footnote{ “It is often forgotten that {[}dictionaries{]} are artificial repositories, put together well after the languages they define. The roots of language are irrational and of a magical nature.”\emph{ }Jorge Luis Borges, \emph{El otro, el mismo}. (Buenos Aires: Emecé, 2005), 5. Translation by the authors.} Instead of thinking of language as “raw” data, we prefer to consider it as heavily embedded and dense cultural material, which carries traces of its uses through time and ties to different locations. And instead of speaking of “extracting” keywords or phrases, to think of such actions as \emph{re-formations} or \emph{di-versioning}.
Stretching the vocabulary that is commonly used by computational language processing practices is an important part of the work. In these contexts, language is often understood as “natural” or “artificial,” where the natural refers to spoken or written human language and the artificial to formal languages such as programming languages. Questioning these terms enables us to approach language as culturally promiscuous, and constantly in flux. Language is complex, messy, and contains all kinds of structures; some emerge through use, like sayings or expressions, others are imposed by external forces, sometimes even violently.\footnote{ \fontdimen3\font=0.6em “It is often forgotten that {[}dictionaries{]} are artificial repositories, put together well after the languages they define. The roots of language are irrational and of a magical nature.”\emph{ }Jorge Luis Borges, \emph{El otro, el mismo}. (Buenos Aires: Emecé, 2005), 5. Translation by the~authors.}\fontdimen3\font=0.2em Instead of thinking of language as “raw” data, we prefer to consider it as heavily embedded and dense cultural material, which carries traces of its uses through time and ties to different locations. And instead of speaking of “extracting” keywords or phrases, to think of such actions as \emph{re-formations} or \emph{di-versioning}.
We are curious about undoing and crossing such pervasive terminologies into methods that allow us to rethink how we work with language in code: from dictionaries to \emph{contradictionaries}, from counting via accounting to \emph{ac-count-abilities}, from overlapping to \emph{overlooping}, from formatting to \emph{formatterings}. For example, a \emph{contradictionary} could provide openings to possible interpretations of a word, instead of defining its meaning. A \emph{formattering} could refer to the shaping of matter through formats. And so~on.
@ -65,7 +65,7 @@ The specific focus of VLTK on language playfully blurs the boundaries between to
\fontdimen3\font=0.2em
\looseness=16
\noindent
The acronym VLTK is a response to, and joke on, a ubiquitous programming library called NLTK, which stands for Natural Language ToolKit. It is a well-known project among programmers and people working in the field of computational linguistics, which is also known as the field of Natural Language Processing (NLP). Current, concrete use cases of NLTK include operations such as text classification for customer support, sentiment analysis for marketing purposes, or automated hate speech detection. Our collective work around VLTK started from a discomfort with the expression “natural language,” which is used in the field of computational linguistics to refer to language that has not been structured (yet) for further computational processing. Accepting the premise that language is natural would imply ignoring the procedures through which language becomes naturalised, imposed, or overwritten, and ignoring the political mechanisms that sustain these efforts. We came to VLTK through a word play that scratched the itch of this discomfort and we started replacing the term “natural” with “vernacular” instead. Vernacular, a word close to “vulgar” -- of the people, common -- points towards the processes of language formation, and the context and urgency they require to exist. This small but defiant joke slowly grew through conversations, reflections, and a desire to try to do otherwise. It opened up and stretched generative spaces of interpretation.
The acronym VLTK is a response to, and joke on, a ubiquitous programming library called NLTK, which stands for Natural Language ToolKit. It is a well-known project among programmers and people working in the field of computational linguistics, which is also known as the field of Natural Language Processing (NLP). Current, concrete use cases of NLTK include operations such as text classification for customer support, sentiment analysis for marketing purposes, or automated hate speech detection. Our collective work around VLTK started from a discomfort with the expression “natural language,” which is used in the field of computational linguistics to refer to language that has not been structured (yet) for further computational processing. Accepting the premise that language is natural would imply ignoring the procedures through which language becomes naturalised, imposed, or overwritten, and ignoring the political mechanisms that sustain these efforts. We came to VLTK through a word play that scratched the itch of this discomfort and we started replacing the term “natural” with “vernacular” instead. Vernacular, a word close to “vulgar” -- of the people, common -- points towards the processes of language formation, and the context and urgency they require to exist. This small but defiant joke slowly grew through conversations, reflections, and a desire to try to do otherwise. It opened up and stretched generative spaces of~interpretation.
\fontdimen3\font=0.15em
NLTK allows you to interface with text in many different ways using a programming language, in this case Python. The makers of NLTK introduce the project on their website by describing it as:
@ -81,12 +81,12 @@ NLTK comes with a whole set of interfaces, such as word counters, summarizers, t
\newpage
\vspace*{-4pt} % whenever title at start of page
\vspace*{-16pt} % whenever title at start of page
\section{Vernacular processing\\as mapping}
\noindent
The field of NLP understands mapping as an activity to turn so-called unstructured language into structured linguistic objects, such as a document index, a thesaurus, a dictionary, a comparative word list, or a morph analyser. In the NLTK textbook, \emph{Natural Language Processing with Python},\footnote{\emph{Natural Language Processing with Python }is a textbook, which is often used as a first mediator when working with NLTK tools. Steven Bird, Ewan Klein, and Edward Loper, \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit }(Cambridge: O'Reilly, 2009). The book is also available in a digital form at \url{https://www.nltk.org/book/}.} such mapping activities are introduced in the following way:
The field of NLP understands mapping as an activity to turn so-called unstructured language into structured linguistic objects, such as a document index, a thesaurus, a dictionary, a comparative word list, or a morph analyser. In the NLTK textbook, \emph{Natural Language Processing with Python},\footnote{\emph{Natural Language Processing with Python }is a textbook, which is often used as a first mediator when working with NLTK tools. Steven Bird, Ewan Klein, and Edward Loper, \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit }(Cambridge: O'Reilly, 2009). The book is also available in a digital form at \url{https://www.nltk.org/book}.} such mapping activities are introduced in the following way:
\begin{quoting}
Most often, we are mapping from a “word” to some structured object. For example, a document index maps from a word (which we can represent as a string), to a list of pages (represented as a list of integers). In this section, we will see how to represent such mappings in Python.\footnote{\url{https://www.nltk.org/book/ch05.html#sec-dictionaries}}
@ -95,6 +95,9 @@ The field of NLP understands mapping as an activity to turn so-called unstructur
\noindent
In the same chapter there is also a table that describes the different maps that NLTK comes with:
\interfootnotelinepenalty=10000
\vspace*{2pt}
\begin{tabularx}{1\textwidth}{
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X
@ -115,14 +118,15 @@ In the same chapter there is also a table that describes the different maps that
\hline
\end{tabularx}
\noindent
Figure: NLTK's linguistic objects, From \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit}, “Table 3.1: Linguistic Objects as Mappings from Keys to Values”\footnote{\url{https://www.nltk.org/book/ch05.html#tab-linguistic-objects}}\\
\vspace*{-1\baselineskip}
\vspace*{4pt}
\noindent
NLTK uses the metaphor of mapping to form indexical relations between truth and map. The use of the word mapping was something that caught our attention -- it is this indexical relation that needs questioning and study. Considering that language maps generate a new kind of linguistic matter, one that is processed and transformed through code, how does that mutate language? How can these mutations be studied? What kinds of maps can be made to map language differently? Can mapping be done based on:
\vspace{\baselineskip}
{\fontsize{8pt}{10pt}\selectfont Figure: NLTK's linguistic objects, From \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit “Table 3.1: Linguistic Objects as Mappings from Keys to Values”}\footnote{ \url{https://www.nltk.org/book/ch05.html#tab-linguistic-objects}} \par}\\
\noindent
NLTK uses the metaphor of mapping to form indexical relations between truth and map. The use of the word mapping was something that caught our attention -- it is this indexical relation that needs questioning and study. Considering that language maps generate a new kind of linguistic matter, one that is processed and transformed through code, how does that mutate language? How can these mutations be studied? What kinds of maps can be made to map language differently? Can mapping be done based~on:
\\
\begin{compactitem}[$\bullet$]
\tightlist
\item
@ -144,6 +148,7 @@ NLTK uses the metaphor of mapping to form indexical relations between truth and
\noindent
Below is an attempt to remake the table of linguistic objects keeping these questions in mind:
\vspace*{2pt}
\begin{tabularx}{1\textwidth}{
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X
@ -178,6 +183,15 @@ Below is an attempt to remake the table of linguistic objects keeping these ques
\hline
Formatterings & Aesthetics & Collages\\
\hline
\end{tabularx}
\vspace*{2pt}
\begin{tabularx}{1\textwidth}{
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X |
}
\hline
Transfictions & Generative phrasings & Fictional transcripts\\
\hline
? & IN + NN & Prepositional agents\\
@ -190,10 +204,13 @@ Below is an attempt to remake the table of linguistic objects keeping these ques
\hline
\end{tabularx}
\vspace*{-1\baselineskip}
\vspace*{4pt}
\noindent
Figure: A table of VLTK possibilities. To be versioned and expanded.
{\fontsize{8pt}{10pt}\selectfont Figure: A table of VLTK possibilities. To be versioned and expanded.}
\vspace{\baselineskip}
\vspace{2\baselineskip}
\vspace*{-6pt}
\noindent
The table of VLTK possibilities includes \emph{Complexity matrices} that complexify the understanding of the context of a certain phrase or word\footnote{An example of this is word2complex, a workshop by Manetta Berends and Cristina Cochior. word2complex is a thought experiment to resist the flattening of meaning that is inherent in word2vec, a model commonly used to create “word embeddings.” \url{http://titipi.org/wiki/index.php/Word2complex}}; \emph{Navigation scores} that generate scores based on the path a reader took through the text, ready for a next reader to be used as a guide; and forms of encapsulated close reading, using \emph{Transfictions}, to provide ways to ruminate a set of phrases by dislocating and re-contextualizing them. This last one is interpreted into a script below that wraps expressions from the book \emph{Queer Phenomenology} by Sara Ahmed into a compilation of conversational utterances arranged by chance.\footnote{Sara Ahmed, \emph{Queer Phenomenology: Orientations, Objects, Others} (Durham: Duke University Press, 2006).}
@ -241,12 +258,14 @@ The code output above is a fictional script generated by taking some of Ahmed's
The dislocation and relocation of phrases in the transfiction above draws attention to the aesthetics of knowledge organisation and structuring. By playing with traces of orality, versioning the language by accompanying it with verbal expressions, the example can be read as an invitation to keep Ahmed's phrases close, giving room for generating new understandings of them. The repetition of the dialogical phrases, for example, injects an idea of timing and rhythm, leaving gaps for the reader to fill in with their own interpretations.
In “Eros in the Library,” Melissa Adler introduces the ancient Greek historian Pamphila, who weaved “multiple sources and genres together to create a pleasing set of histories,” through a method she called \emph{poikilia}.\footnote{Adler, “Eros in the library,” 69.} Adler cites Adeline Grand-Clément's definition of \emph{poikilia} as “harmonia that does not unify.”\footnote{Adeline Grand-Clément, “Poikilia,” in \emph{A Companion to Ancient Aesthetics}, ed. Pierre Destrée and Penelope Murray (Hoboken: John Wiley \& Sons, 2015), 410, cited in Adler, “Eros in the library,” 69.} The aesthetic beauty and pleasure in Pamphila's method shifts the purpose of knowledge organisation.
In “Eros in the Library,” Melissa Adler introduces the ancient Greek historian Pamphila, who weaved “multiple sources and genres together to create a pleasing set of histories,” through a method she called \emph{poikilia}.\footnote{Adler, “Eros in the library,” 69.} Adler cites Adeline Grand-Clément's definition of \emph{poikilia} as “\nohyphens{harmonia} that does not unify.”\footnote{Adeline Grand-Clément, “Poikilia,” in \emph{A Companion to Ancient Aesthetics}, ed. Pierre Destrée and Penelope Murray (Hoboken: John Wiley \& Sons, 2015), 410, cited in Adler, “Eros in the library,” 69.} The aesthetic beauty and pleasure in Pamphila's method shifts the purpose of knowledge organisation.
Instead of looking at the text from a distance by counting words and searching for numerical patterns throughout Ahmed's book, which is a very common practice in the field of NLP, these phrases were chosen after closely reading and discussing them. However, there is still an awkwardness in mixing them through programmatic relocations, placing the words in another context than the author intended them to be in, which speaks to their need to be handled with care because a misalignment of contexts can create hurt.
\fontdimen3\font=0.5em
The transfiction is an exercise to think about the relocation and recontextualisation of language, which has started from specific words from the book that resonated with the questions around vernacular language processing. They introduce a thinking around the notion of orientation, which adds a situated dimension to the metaphor of mapping. Ahmed describes orientation as a gesture of being “turned toward certain objects, those that help us to find our way.”\footnote{Sara Ahmed, \emph{Queer Phenomenology: Orientations, Objects, Others} (Durham: Duke University Press, 2006), 1.} If language is seen as a landscape of textual objects, in which one wishes to orient oneself, how do markers of orientation and markers of difference emerge? How do we orient ourselves? And what does it mean to be orientated through linguistic markers?
\fontdimen3\font=0.2em
What does it mean to use the metaphor of mapping when working with language processing tools? Which issues and tensions related to non-metaphorical mapping practices -- such as cartography -- can we learn from when we map language?
\section{\nohyphens{\emph{Where} is the vernacular?}}
@ -262,18 +281,20 @@ In the two examples above, we are speaking about different kinds of vernacular.
\emph{Where} the vernacular is positioned within vernacular language processing is a complex question. How do we differentiate between different forms of informal language, such as dialects, accents, or slang? How do we understand the vernacular in relation to standards, urgencies, access, and time? This is a political question. Due to structural inequalities, it is important for more forms of speech, accents, and grammar, to be included in mainstream ways of doing, such as in the case mentioned by Halcyon Lawrence, which is a request for the vernacular to become standardised. It matters who the standards exclude, who has access, or for what purpose time is optimised.
Optimisation is a term often encountered in technical environments, where it generally refers to maximising the technical performance and minimising the financial costs of a particular technology. Seda Gürses et al argue that “optimization-based systems are developed to capture and manipulate behavior and environments for the extraction of value” and that as a result, “they introduce broader risks and harms for users and environments beyond the outcome of a single algorithm within that system.”\footnote{Seda Gürses et al, “POTs: Protective Optimization Technologies,” \emph{FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency}, 177--188. Paper available at \url{https://arxiv.org/pdf/1806.02711.pdf}. Quote is p1 on uploaded version.} While optimisation has its purpose in specific situations such as the non-static naming system of roads, the way optimisation has been embraced by the sciences as a mode of operating removes the possibility to stay with the uncertainty of what will follow, because the goal is defined within a financial scope. In programming more specifically, code is often written with the projection of what it should do in the world already present.
\fontdimen3\font=0.7em
Optimisation is a term often encountered in technical environments, where it generally refers to maximising the technical performance and minimising the financial costs of a particular technology. Seda Gürses et al argue that “optimization-based systems are developed to capture and manipulate behavior and environments for the extraction of value” and that as a result, “they introduce broader risks and harms for users and environments beyond the outcome of a single algorithm within that system.”\footnote{Seda Gürses et al, “POTs: Protective Optimization Technologies,” \emph{FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency}, 177--188. Paper available at \url{https://arxiv.org/pdf/1806.02711.pdf}. Quote is p1 on uploaded version.} While optimisation has its purpose in specific situations such as the non-static naming system of roads, the way optimisation has been embraced by the sciences as a mode of operating removes the possibility to stay with the uncertainty of what will follow, because the goal is defined within a financial scope. In programming more specifically, code is often written with the projection of what it should do in the world already~present.
What might it mean instead to slow down and re-embed language processing in a messy world,\footnote{Derived from: “Reclaiming operations are never easy. If reclaiming scientific research means re-embedding the sciences in a messy world, it is not only a question of accepting this world as such, but of positively appreciating it, of learning how to foster and strengthen, in Whitehead's words, `the habits of concrete appreciation of the individual facts in their full interplay of emergent values'.” Isabelle Stengers, \emph{Another Science is Possible: A Manifesto for Slow Science }(Cambridge: Polity, 2018), 122.} making space for rethinking the goal of a project or even without aiming for solutions at all? To take the time to develop counter-hegemonic counting techniques that process language otherwise?\footnote{“...referencing and citation in Black studies are what Carmen Kynard calls 'vernacular insurrections': narratives that are 'not only counter hegemonic, but also affirmative of new, constantly mutating languages, identities, political methodologies, and social understandings that communities form in and of themselves both inwardly and outwardly . . . not merely the bits and pieces chipped off or chipping away at dominant culture, but a whole new emergence.”' Katherine McKittrick, \emph{Dear Science and Other Stories}, (Durham: Duke University Press, 2021), 26--27. McKittrick is citing Carmen Kynard, \emph{Vernacular Insurrections: Race, Black Protest, and the New Century in Composition Literacies Studies} (Albany: SUNY Press, 2013), 10--11.} VLTK turns to slow processing as a way to turn and return to the material at hand.
\section{\nohyphens{A conclusion that is\\a beginning}}
\noindent
Although the questions we ask may seem particular to language processing applications in scope, they are still relevant in a broader sense, as the intentions behind human communication are more and more evaluated by algorithms, especially on social media. For example, since 2017 hate speech and harassment recognition on Twitter has been heavily reliant on algorithms, but trolls have come to adopt methods that can circumvent their speech from being flagged. Twitter has what they call a Hate Lab that works on algorithms “to end hate speech and improve healthy conversations online.”\footnote{\url{https://developer.twitter.com/en/community/success-stories/hatelab}} Formally codifying the understanding of what hate speech consists of leaves plenty of ways to work around a detection algorithm, and so methods of pursuing reckless harassment and hate speech have become plentiful. One can, for instance, make use of stylistic devices such as metonymy, antiphrasis, or irony. On French Twitter, immigrants and their children are referred to mockingly as “chances for the country,” which implies that a certain category of immigrants' contribution to society is nefarious. By using such masked language, the messages escape hate speech detection while continuing to spread their harm. Of course, the deciphering of such allusions requires a familiarisation with the vernacular codes of these communities. Emojis are also used as signifiers of a shared universe of references. For the French it might be the map, a signifier of white supremacism that nods towards a map of the average IQs by country.\footnote{This refers to the map in the racist and antihuman book by Richard Lynn and Tatu Vanhanen, \emph{IQ and the Wealth of Nations} (Westport: Praeger, 2002). 
See also Pauline Moullot, “La carte mondiale des QI, relayée par des comptes d'extrême droite, a-t-elle une valeur scientifique?” \emph{Liberation}, November 14, 2019, \url{https://www.liberation.fr/checknews/2019/11/14/la-carte-mondiale-des-qi-relayee-par-des-comptes-d-extreme-droite-a-t-elle-une-valeur-scientifique_1754773/}} As such, it is important to remember that hate speech can also be vernacular language.
Although the questions we ask may seem particular to language processing applications in scope, they are still relevant in a broader sense, as the intentions behind human communication are more and more evaluated by algorithms, especially on social media. For example, since 2017 hate speech and harassment recognition on Twitter has been heavily reliant on algorithms, but trolls have come to adopt methods that can circumvent their speech from being flagged. Twitter has what they call a Hate Lab that works on algorithms “to end hate speech and improve healthy conversations online.”\footnote{\url{https://developer.twitter.com/en/community/success-stories/hatelab}} Formally codifying the understanding of what hate speech consists of leaves plenty of ways to work around a detection algorithm, and so methods of pursuing reckless harassment and hate speech have become plentiful. One can, for instance, make use of stylistic devices such as metonymy, antiphrasis, or irony. On French Twitter, immigrants and their children are referred to mockingly as “chances for the country,” which implies that a certain category of immigrants' contribution to society is nefarious. By using such masked language, the messages escape hate speech detection while continuing to spread their harm. Of course, the deciphering of such allusions requires a familiarisation with the vernacular codes of these communities. Emojis are also used as signifiers of a shared universe of references. For the French it might be the map, a signifier of white supremacism that nods towards a map of the average IQs by country.\footnote{This refers to the map in the racist and antihuman book by Richard Lynn and Tatu Vanhanen, \emph{IQ and the Wealth of Nations} (Westport: Praeger, 2002). 
See also Pauline Moullot, “La carte mondiale des QI, relayée par des comptes d'extrême droite, a-t-elle une valeur scientifique?” \emph{Liberation}, November~14,~2019, \href{https://liberation.fr/checknews/2019/11/14/la-carte-mondiale-des-qi-relayee-par-des-comptes-d-extreme-droite-a-t-elle-une-valeur-scientifique_1754773/}{https://www.liberation.fr/checknews/2019/\linebreak 11/14/la-carte-mondiale-des-qi-relayee-par-des-comptes-d-extreme-droite-a-t-elle-une-\linebreak valeur-scientifique\_1754773/}} As such, it is important to remember that hate speech can also be vernacular language.
On the other hand, vernacular communication can be harmfully misunderstood by algorithms trained with a normative use of language in mind. This is the case with Perspective, the toxic speech detection API from Jigsaw (Google), which has a history of flagging African American Vernacular English (AAVE) as toxic.\footnote{Devin Coldewey, “Racial bias observed in hate speech detection algorithm from Google,” \emph{Techcrunch}, August 15, 2019, \url{https://techcrunch.com/2019/08/14/racial-bias-observed-in-hate-speech-detection-algorithm-from-google/}} Still, as of early 2021, Perspective was processing about 500 million requests daily in online spaces such as the comment sections of \emph{El País}, \emph{Disqus}, \emph{The New York Times}, and others.\footnote{ Kyle Wiggers, “Jigsaw's AI-powered toxic language detector is now processing 500 million requests daily,” \emph{Venturebeat}, February 8, 2021, \url{https://venturebeat.com/2021/02/08/jigsaws-ai-powered-toxic-language-detector-is-now-processing-500-million-requests-daily/}} The risks of these language models reinforcing standards and refusing vernaculars are hard to overstate. Communities that have been and still are marginalised become marginalised further through the rejection of their linguistic expression.
\fontdimen3\font=0.2em
On the other hand, vernacular communication can be harmfully misunderstood by algorithms trained with a normative use of language in mind. This is the case with Perspective, the toxic speech detection API from Jigsaw (Google), which has a history of flagging African American Vernacular English (AAVE) as toxic.\footnote{Devin Coldewey, “Racial bias observed in hate speech detection algorithm from Google,” \emph{Techcrunch}, August 15, 2019, \href{https://techcrunch.com/2019/08/14/racial-bias-observed-in-hate-speech-detection-algorithm-from-google/}{https://techcrunch.com/2019/08/14/racial-bias-observed-\linebreak in-hate-speech-detection-algorithm-from-google/}} Still, as of early 2021, Perspective was processing about 500 million requests daily in online spaces such as the comment sections of \emph{El País}, \emph{Disqus}, \emph{The New York Times}, and others.\footnote{ Kyle Wiggers, “Jigsaw's AI-powered toxic language detector is now processing 500 million requests daily,” \emph{Venturebeat}, February 8, 2021, \url{https://venturebeat.com/2021/02/08/jigsaws-ai-powered-toxic-language-detector-is-now-processing-500-million-requests-daily/}} The risks of these language models reinforcing standards and refusing vernaculars are hard to overstate. Communities that have been and still are marginalised become marginalised further through the rejection of their linguistic expression.
Both cases show that language models are not able to adapt to contexts and that moderation should not be left to automated systems. The vernacular and the systematic are hard to pull apart, resulting in a complex interrelation that is urgent to be thought through, not just in the interest of platforms.
Both cases show that language models are not able to adapt to contexts and that moderation should not be left to automated systems. The vernacular and the systematic are hard to pull apart, resulting in a \nohyphens{complex} interrelation that is urgent to be thought through, not just in the interest of~platforms.
VLTK started from an interest to understand programming in relation to language processing, a practice that both shapes language and is shaped by it. To study this mutual transformation, we are specifically curious about programming practices that stay close to the material they work with, such as code that is written for a specific collection or a specific group of people, or esoteric code that is intentionally weird, peculiar, and not always made to be functional.\footnote{ “Welcome to Esolang, the esoteric programming languages wiki! This wiki is dedicated to the fostering and documentation of programming languages designed to be unique, difficult to program in, or just plain weird.” https://esolangs.org/wiki/Main\_Page} All the while keeping in mind that the constraints of programming languages themselves will also become the constraints of vernacular language processing.

BIN
pdfs/01_introduction.pdf

Binary file not shown.

61
pdfs/01_introduction.tex

@ -37,6 +37,8 @@
\usepackage{eso-pic}
\usepackage{xespotcolor}
\usepackage{hyperref}
\usepackage{changepage}
\hypersetup{
hidelinks, % Remove visible links altogether
%urlbordercolor = 1 1 1% Make URL link border white
@ -175,11 +177,11 @@ keepaspectratio]{04_magic-magick/test_toc-03-white.png}%
% CODE STYLE
\lstset{
aboveskip=3mm,
belowskip=3mm,
aboveskip=12pt,
belowskip=12pt,
showstringspaces=false,
columns=flexible,
basicstyle={\small\ttfamily},
basicstyle={\ttfamily\fontsize{9pt}{12pt}\selectfont},
numbers=none,
numberstyle=\tiny,
keywordstyle=\bfseries,
@ -267,7 +269,7 @@ In this introductory text, we will share some of the thoughts behind \nohyphens{
VLTK takes a dive into the logical operations that are used to process language with a computer to speak back to a range of unassuming habits in the field of computational language processing, and step towards modes of embedded, slow, and vernacular language processing and knowledge organisation. “Vernacular” in this text refers to everyday speech forming at the margins of standardisation; the ephemeral aspects of a culture's particularities that resist or exist alongside dominant systems of institutional aesthetics; or the encapsulation of a specific nowness in time.
\fontdimen3\font=0.2em
Stretching the vocabulary that is commonly used by computational language processing practices is an important part of the work. In these contexts, language is often understood as “natural” or “artificial,” where the natural refers to spoken or written human language and the artificial to formal languages such as programming languages. Questioning these terms enables us to approach language as culturally promiscuous, and constantly in flux. Language is complex, messy, and contains all kinds of structures; some emerge through use, like sayings or expressions, others are imposed by external forces, sometimes even violently.\footnote{ “It is often forgotten that {[}dictionaries{]} are artificial repositories, put together well after the languages they define. The roots of language are irrational and of a magical nature.”\emph{ }Jorge Luis Borges, \emph{El otro, el mismo}. (Buenos Aires: Emecé, 2005), 5. Translation by the authors.} Instead of thinking of language as “raw” data, we prefer to consider it as heavily embedded and dense cultural material, which carries traces of its uses through time and ties to different locations. And instead of speaking of “extracting” keywords or phrases, to think of such actions as \emph{re-formations} or \emph{di-versioning}.
Stretching the vocabulary that is commonly used by computational language processing practices is an important part of the work. In these contexts, language is often understood as “natural” or “artificial,” where the natural refers to spoken or written human language and the artificial to formal languages such as programming languages. Questioning these terms enables us to approach language as culturally promiscuous, and constantly in flux. Language is complex, messy, and contains all kinds of structures; some emerge through use, like sayings or expressions, others are imposed by external forces, sometimes even violently.\footnote{ \fontdimen3\font=0.6em “It is often forgotten that {[}dictionaries{]} are artificial repositories, put together well after the languages they define. The roots of language are irrational and of a magical nature.”\emph{ }Jorge Luis Borges, \emph{El otro, el mismo}. (Buenos Aires: Emecé, 2005), 5. Translation by the~authors.}\fontdimen3\font=0.2em Instead of thinking of language as “raw” data, we prefer to consider it as heavily embedded and dense cultural material, which carries traces of its uses through time and ties to different locations. And instead of speaking of “extracting” keywords or phrases, to think of such actions as \emph{re-formations} or \emph{di-versioning}.
We are curious about undoing and crossing such pervasive terminologies into methods that allow us to rethink how we work with language in code: from dictionaries to \emph{contradictionaries}, from counting via accounting to \emph{ac-count-abilities}, from overlapping to \emph{overlooping}, from formatting to \emph{formatterings}. For example, a \emph{contradictionary} could provide openings to possible interpretations of a word, instead of defining its meaning. A \emph{formattering} could refer to the shaping of matter through formats. And so~on.
@ -279,7 +281,7 @@ The specific focus of VLTK on language playfully blurs the boundaries between to
\fontdimen3\font=0.2em
\looseness=16
\noindent
The acronym VLTK is a response to, and joke on, a ubiquitous programming library called NLTK, which stands for Natural Language ToolKit. It is a well-known project among programmers and people working in the field of computational linguistics, which is also known as the field of Natural Language Processing (NLP). Current, concrete use cases of NLTK include operations such as text classification for customer support, sentiment analysis for marketing purposes, or automated hate speech detection. Our collective work around VLTK started from a discomfort with the expression “natural language,” which is used in the field of computational linguistics to refer to language that has not been structured (yet) for further computational processing. Accepting the premise that language is natural would imply ignoring the procedures through which language becomes naturalised, imposed, or overwritten, and ignoring the political mechanisms that sustain these efforts. We came to VLTK through a word play that scratched the itch of this discomfort and we started replacing the term “natural” with “vernacular” instead. Vernacular, a word close to “vulgar” -- of the people, common -- points towards the processes of language formation, and the context and urgency they require to exist. This small but defiant joke slowly grew through conversations, reflections, and a desire to try to do otherwise. It opened up and stretched generative spaces of interpretation.
The acronym VLTK is a response to, and joke on, a ubiquitous programming library called NLTK, which stands for Natural Language ToolKit. It is a well-known project among programmers and people working in the field of computational linguistics, which is also known as the field of Natural Language Processing (NLP). Current, concrete use cases of NLTK include operations such as text classification for customer support, sentiment analysis for marketing purposes, or automated hate speech detection. Our collective work around VLTK started from a discomfort with the expression “natural language,” which is used in the field of computational linguistics to refer to language that has not been structured (yet) for further computational processing. Accepting the premise that language is natural would imply ignoring the procedures through which language becomes naturalised, imposed, or overwritten, and ignoring the political mechanisms that sustain these efforts. We came to VLTK through a word play that scratched the itch of this discomfort and we started replacing the term “natural” with “vernacular” instead. Vernacular, a word close to “vulgar” -- of the people, common -- points towards the processes of language formation, and the context and urgency they require to exist. This small but defiant joke slowly grew through conversations, reflections, and a desire to try to do otherwise. It opened up and stretched generative spaces of~interpretation.
\fontdimen3\font=0.15em
NLTK allows you to interface with text in many different ways using a programming language, in this case Python. The makers of NLTK introduce the project on their website by describing it as:
@ -295,12 +297,12 @@ NLTK comes with a whole set of interfaces, such as word counters, summarizers, t
\newpage
\vspace*{-4pt} % whenever title at start of page
\vspace*{-16pt} % whenever title at start of page
\section{Vernacular processing\\as mapping}
\noindent
The field of NLP understands mapping as an activity to turn so-called unstructured language into structured linguistic objects, such as a document index, a thesaurus, a dictionary, a comparative word list, or a morph analyser. In the NLTK textbook, \emph{Natural Language Processing with Python},\footnote{\emph{Natural Language Processing with Python }is a textbook, which is often used as a first mediator when working with NLTK tools. Steven Bird, Ewan Klein, and Edward Loper, \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit }(Cambridge: O'Reilly, 2009). The book is also available in a digital form at \url{https://www.nltk.org/book/}.} such mapping activities are introduced in the following way:
The field of NLP understands mapping as an activity to turn so-called unstructured language into structured linguistic objects, such as a document index, a thesaurus, a dictionary, a comparative word list, or a morph analyser. In the NLTK textbook, \emph{Natural Language Processing with Python},\footnote{\emph{Natural Language Processing with Python }is a textbook, which is often used as a first mediator when working with NLTK tools. Steven Bird, Ewan Klein, and Edward Loper, \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit }(Cambridge: O'Reilly, 2009). The book is also available in a digital form at \url{https://www.nltk.org/book}.} such mapping activities are introduced in the following way:
\begin{quoting}
Most often, we are mapping from a “word” to some structured object. For example, a document index maps from a word (which we can represent as a string), to a list of pages (represented as a list of integers). In this section, we will see how to represent such mappings in Python.\footnote{\url{https://www.nltk.org/book/ch05.html#sec-dictionaries}}
@ -309,6 +311,9 @@ The field of NLP understands mapping as an activity to turn so-called unstructur
\noindent
In the same chapter there is also a table that describes the different maps that NLTK comes with:
\interfootnotelinepenalty=10000
\vspace*{2pt}
\begin{tabularx}{1\textwidth}{
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X
@ -329,14 +334,15 @@ In the same chapter there is also a table that describes the different maps that
\hline
\end{tabularx}
\noindent
Figure: NLTK's linguistic objects, From \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit}, “Table 3.1: Linguistic Objects as Mappings from Keys to Values”\footnote{\url{https://www.nltk.org/book/ch05.html#tab-linguistic-objects}}\\
\vspace*{-1\baselineskip}
\vspace*{4pt}
\noindent
NLTK uses the metaphor of mapping to form indexical relations between truth and map. The use of the word mapping was something that caught our attention -- it is this indexical relation that needs questioning and study. Considering that language maps generate a new kind of linguistic matter, one that is processed and transformed through code, how does that mutate language? How can these mutations be studied? What kinds of maps can be made to map language differently? Can mapping be done based on:
\vspace{\baselineskip}
{\fontsize{8pt}{10pt}\selectfont Figure: NLTK's linguistic objects, From \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit “Table 3.1: Linguistic Objects as Mappings from Keys to Values”}\footnote{ \url{https://www.nltk.org/book/ch05.html#tab-linguistic-objects}} \par}\\
\noindent
NLTK uses the metaphor of mapping to form indexical relations between truth and map. The use of the word mapping was something that caught our attention -- it is this indexical relation that needs questioning and study. Considering that language maps generate a new kind of linguistic matter, one that is processed and transformed through code, how does that mutate language? How can these mutations be studied? What kinds of maps can be made to map language differently? Can mapping be done based~on:
\\
\begin{compactitem}[$\bullet$]
\tightlist
\item
@ -358,6 +364,7 @@ NLTK uses the metaphor of mapping to form indexical relations between truth and
\noindent
Below is an attempt to remake the table of linguistic objects keeping these questions in mind:
\vspace*{2pt}
\begin{tabularx}{1\textwidth}{
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X
@ -392,6 +399,15 @@ Below is an attempt to remake the table of linguistic objects keeping these ques
\hline
Formatterings & Aesthetics & Collages\\
\hline
\end{tabularx}
\vspace*{2pt}
\begin{tabularx}{1\textwidth}{
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X |
}
\hline
Transfictions & Generative phrasings & Fictional transcripts\\
\hline
? & IN + NN & Prepositional agents\\
@ -404,10 +420,13 @@ Below is an attempt to remake the table of linguistic objects keeping these ques
\hline
\end{tabularx}
\vspace*{-1\baselineskip}
\vspace*{4pt}
\noindent
Figure: A table of VLTK possibilities. To be versioned and expanded.
{\fontsize{8pt}{10pt}\selectfont Figure: A table of VLTK possibilities. To be versioned and expanded.}
\vspace{\baselineskip}
\vspace{2\baselineskip}
\vspace*{-6pt}
\noindent
The table of VLTK possibilities includes \emph{Complexity matrices} that complexify the understanding of the context of a certain phrase or word\footnote{An example of this is word2complex, a workshop by Manetta Berends and Cristina Cochior. word2complex is a thought experiment to resist the flattening of meaning that is inherent in word2vec, a model commonly used to create “word embeddings.” \url{http://titipi.org/wiki/index.php/Word2complex}}; \emph{Navigation scores} that generate scores based on the path a reader took through the text, ready for a next reader to be used as a guide; and forms of encapsulated close reading, using \emph{Transfictions}, to provide ways to ruminate a set of phrases by dislocating and re-contextualizing them. This last one is interpreted into a script below that wraps expressions from the book \emph{Queer Phenomenology} by Sara Ahmed into a compilation of conversational utterances arranged by chance.\footnote{Sara Ahmed, \emph{Queer Phenomenology: Orientations, Objects, Others} (Durham: Duke University Press, 2006).}
@ -455,12 +474,14 @@ The code output above is a fictional script generated by taking some of Ahmed's
The dislocation and relocation of phrases in the transfiction above draws attention to the aesthetics of knowledge organisation and structuring. By playing with traces of orality, versioning the language by accompanying it with verbal expressions, the example can be read as an invitation to keep Ahmed's phrases close, giving room for generating new understandings of them. The repetition of the dialogical phrases, for example, injects an idea of timing and rhythm, leaving gaps for the reader to fill in with their own interpretations.
In “Eros in the Library,” Melissa Adler introduces the ancient Greek historian Pamphila, who weaved “multiple sources and genres together to create a pleasing set of histories,” through a method she called \emph{poikilia}.\footnote{Adler, “Eros in the library,” 69.} Adler cites Adeline Grand-Clément's definition of \emph{poikilia} as “harmonia that does not unify.”\footnote{Adeline Grand-Clément, “Poikilia,” in \emph{A Companion to Ancient Aesthetics}, ed. Pierre Destrée and Penelope Murray (Hoboken: John Wiley \& Sons, 2015), 410, cited in Adler, “Eros in the library,” 69.} The aesthetic beauty and pleasure in Pamphila's method shift the purpose of knowledge organisation.
In “Eros in the Library,” Melissa Adler introduces the ancient Greek historian Pamphila, who weaved “multiple sources and genres together to create a pleasing set of histories,” through a method she called \emph{poikilia}.\footnote{Adler, “Eros in the library,” 69.} Adler cites Adeline Grand-Clément's definition of \emph{poikilia} as “\nohyphens{harmonia} that does not unify.”\footnote{Adeline Grand-Clément, “Poikilia,” in \emph{A Companion to Ancient Aesthetics}, ed. Pierre Destrée and Penelope Murray (Hoboken: John Wiley \& Sons, 2015), 410, cited in Adler, “Eros in the library,” 69.} The aesthetic beauty and pleasure in Pamphila's method shift the purpose of knowledge organisation.
Instead of looking at the text from a distance by counting words and searching for numerical patterns throughout Ahmed's book, which is a very common practice in the field of NLP, these phrases were chosen after closely reading and discussing them. However, there is still an awkwardness in mixing them through programmatic relocations, placing the words in another context than the author intended them to be in, which speaks to their need to be handled with care because a misalignment of contexts can create hurt.
\fontdimen3\font=0.5em
The transfiction is an exercise to think about the relocation and recontextualisation of language, which has started from specific words from the book that resonated with the questions around vernacular language processing. They introduce a thinking around the notion of orientation, which adds a situated dimension to the metaphor of mapping. Ahmed describes orientation as a gesture of being “turned toward certain objects, those that help us to find our way.”\footnote{Sara Ahmed, \emph{Queer Phenomenology: Orientations, Objects, Others} (Durham: Duke University Press, 2006), 1.} If language is seen as a landscape of textual objects, in which one wishes to orient oneself, how do markers of orientation and markers of difference emerge? How do we orient ourselves? And what does it mean to be orientated through linguistic markers?
\fontdimen3\font=0.2em
What does it mean to use the metaphor of mapping when working with language processing tools? Which issues and tensions related to non-metaphorical mapping practices -- such as cartography -- can we learn from when we map language?
\section{\nohyphens{\emph{Where} is the vernacular?}}
@ -476,18 +497,20 @@ In the two examples above, we are speaking about different kinds of vernacular.
\emph{Where} the vernacular is positioned within vernacular language processing is a complex question. How do we differentiate between different forms of informal language, such as dialects, accents, or slang? How do we understand the vernacular in relation to standards, urgencies, access, and time? This is a political question. Due to structural inequalities, it is important for more forms of speech, accents, and grammar, to be included in mainstream ways of doing, such as in the case mentioned by Halcyon Lawrence, which is a request for the vernacular to become standardised. It matters who the standards exclude, who has access, or for what purpose time is optimised.
Optimisation is a term often encountered in technical environments, where it generally refers to maximising the technical performance and minimising the financial costs of a particular technology. Seda Gürses et al argue that “optimization-based systems are developed to capture and manipulate behavior and environments for the extraction of value” and that as a result, “they introduce broader risks and harms for users and environments beyond the outcome of a single algorithm within that system.”\footnote{Seda Gürses et al, “POTs: Protective Optimization Technologies,” \emph{FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency}, 177--188. Paper available at \url{https://arxiv.org/pdf/1806.02711.pdf}. Quote is p1 on uploaded version.} While optimisation has its purpose in specific situations such as the non-static naming system of roads, the way optimisation has been embraced by the sciences as a mode of operating removes the possibility to stay with the uncertainty of what will follow, because the goal is defined within a financial scope. In programming more specifically, code is often written with the projection of what it should do in the world already present.
\fontdimen3\font=0.7em
Optimisation is a term often encountered in technical environments, where it generally refers to maximising the technical performance and minimising the financial costs of a particular technology. Seda Gürses et al argue that “optimization-based systems are developed to capture and manipulate behavior and environments for the extraction of value” and that as a result, “they introduce broader risks and harms for users and environments beyond the outcome of a single algorithm within that system.”\footnote{Seda Gürses et al, “POTs: Protective Optimization Technologies,” \emph{FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency}, 177--188. Paper available at \url{https://arxiv.org/pdf/1806.02711.pdf}. Quote is p1 on uploaded version.} While optimisation has its purpose in specific situations such as the non-static naming system of roads, the way optimisation has been embraced by the sciences as a mode of operating removes the possibility to stay with the uncertainty of what will follow, because the goal is defined within a financial scope. In programming more specifically, code is often written with the projection of what it should do in the world already~present.
What might it mean instead to slow down and re-embed language processing in a messy world,\footnote{Derived from: “Reclaiming operations are never easy. If reclaiming scientific research means re-embedding the sciences in a messy world, it is not only a question of accepting this world as such, but of positively appreciating it, of learning how to foster and strengthen, in Whitehead's words, `the habits of concrete appreciation of the individual facts in their full interplay of emergent values'.” Isabelle Stengers, \emph{Another Science is Possible: A Manifesto for Slow Science }(Cambridge: Polity, 2018), 122.} making space for rethinking the goal of a project or even without aiming for solutions at all? To take the time to develop counter-hegemonic counting techniques that process language otherwise?\footnote{“...referencing and citation in Black studies are what Carmen Kynard calls 'vernacular insurrections': narratives that are 'not only counter hegemonic, but also affirmative of new, constantly mutating languages, identities, political methodologies, and social understandings that communities form in and of themselves both inwardly and outwardly . . . not merely the bits and pieces chipped off or chipping away at dominant culture, but a whole new emergence.”' Katherine McKittrick, \emph{Dear Science and Other Stories}, (Durham: Duke University Press, 2021), 26--27. McKittrick is citing Carmen Kynard, \emph{Vernacular Insurrections: Race, Black Protest, and the New Century in Composition Literacies Studies} (Albany: SUNY Press, 2013), 10--11.} VLTK turns to slow processing as a way to turn and return to the material at hand.
\section{\nohyphens{A conclusion that is\\a beginning}}
\noindent
Although the questions we ask may seem particular to language processing applications in scope, they are still relevant in a broader sense, as the intentions behind human communication are more and more evaluated by algorithms, especially on social media. For example, since 2017 hate speech and harassment recognition on Twitter has been heavily reliant on algorithms, but trolls have come to adopt methods that can circumvent their speech from being flagged. Twitter has what they call a Hate Lab that works on algorithms “to end hate speech and improve healthy conversations online.”\footnote{\url{https://developer.twitter.com/en/community/success-stories/hatelab}} Formally codifying the understanding of what hate speech consists of leaves plenty of ways to work around a detection algorithm, and so methods of pursuing reckless harassment and hate speech have become plentiful. One can, for instance, make use of stylistic devices such as metonymy, antiphrasis, or irony. On French Twitter, immigrants and their children are referred to mockingly as “chances for the country,” which implies that a certain category of immigrants' contribution to society is nefarious. By using such masked language, the messages escape hate speech detection while continuing to spread their harm. Of course, the deciphering of such allusions requires a familiarisation with the vernacular codes of these communities. Emojis are also used as signifiers of a shared universe of references. For the French it might be the map, a signifier of white supremacism that nods towards a map of the average IQs by country.\footnote{This refers to the map in the racist and antihuman book by Richard Lynn and Tatu Vanhanen, \emph{IQ and the Wealth of Nations} (Westport: Praeger, 2002). 
See also Pauline Moullot, “La carte mondiale des QI, relayée par des comptes d'extrême droite, a-t-elle une valeur scientifique?” \emph{Liberation}, November 14, 2019, \url{https://www.liberation.fr/checknews/2019/11/14/la-carte-mondiale-des-qi-relayee-par-des-comptes-d-extreme-droite-a-t-elle-une-valeur-scientifique_1754773/}} As such, it is important to remember that hate speech can also be vernacular language.
Although the questions we ask may seem particular to language processing applications in scope, they are still relevant in a broader sense, as the intentions behind human communication are more and more evaluated by algorithms, especially on social media. For example, since 2017 hate speech and harassment recognition on Twitter has been heavily reliant on algorithms, but trolls have come to adopt methods that can circumvent their speech from being flagged. Twitter has what they call a Hate Lab that works on algorithms “to end hate speech and improve healthy conversations online.”\footnote{\url{https://developer.twitter.com/en/community/success-stories/hatelab}} Formally codifying the understanding of what hate speech consists of leaves plenty of ways to work around a detection algorithm, and so methods of pursuing reckless harassment and hate speech have become plentiful. One can, for instance, make use of stylistic devices such as metonymy, antiphrasis, or irony. On French Twitter, immigrants and their children are referred to mockingly as “chances for the country,” which implies that a certain category of immigrants' contribution to society is nefarious. By using such masked language, the messages escape hate speech detection while continuing to spread their harm. Of course, the deciphering of such allusions requires a familiarisation with the vernacular codes of these communities. Emojis are also used as signifiers of a shared universe of references. For the French it might be the map, a signifier of white supremacism that nods towards a map of the average IQs by country.\footnote{This refers to the map in the racist and antihuman book by Richard Lynn and Tatu Vanhanen, \emph{IQ and the Wealth of Nations} (Westport: Praeger, 2002). 
See also Pauline Moullot, “La carte mondiale des QI, relayée par des comptes d'extrême droite, a-t-elle une valeur scientifique?” \emph{Liberation}, November~14,~2019, \href{https://liberation.fr/checknews/2019/11/14/la-carte-mondiale-des-qi-relayee-par-des-comptes-d-extreme-droite-a-t-elle-une-valeur-scientifique_1754773/}{https://www.liberation.fr/checknews/2019/\linebreak 11/14/la-carte-mondiale-des-qi-relayee-par-des-comptes-d-extreme-droite-a-t-elle-une-\linebreak valeur-scientifique\_1754773/}} As such, it is important to remember that hate speech can also be vernacular language.
On the other hand, vernacular communication can be harmfully misunderstood by algorithms trained with a normative use of language in mind. This is the case with Perspective, the toxic speech detection API from Jigsaw (Google), which has a history of flagging African American Vernacular English (AAVE) as toxic.\footnote{Devin Coldewey, “Racial bias observed in hate speech detection algorithm from Google,” \emph{Techcrunch}, August 15, 2019, \url{https://techcrunch.com/2019/08/14/racial-bias-observed-in-hate-speech-detection-algorithm-from-google/}} Still, as of early 2021, Perspective was processing about 500 million requests daily in online spaces such as the comment sections of \emph{El País}, \emph{Disqus}, \emph{The New York Times}, and others.\footnote{ Kyle Wiggers, “Jigsaw's AI-powered toxic language detector is now processing 500 million requests daily,” \emph{Venturebeat}, February 8, 2021, \url{https://venturebeat.com/2021/02/08/jigsaws-ai-powered-toxic-language-detector-is-now-processing-500-million-requests-daily/}} The risks of these language models reinforcing standards and refusing vernaculars are hard to overstate. Communities that have been and still are marginalised become marginalised further through the rejection of their linguistic expression.
\fontdimen3\font=0.2em
On the other hand, vernacular communication can be harmfully misunderstood by algorithms trained with a normative use of language in mind. This is the case with Perspective, the toxic speech detection API from Jigsaw (Google), which has a history of flagging African American Vernacular English (AAVE) as toxic.\footnote{Devin Coldewey, “Racial bias observed in hate speech detection algorithm from Google,” \emph{Techcrunch}, August 15, 2019, \href{https://techcrunch.com/2019/08/14/racial-bias-observed-in-hate-speech-detection-algorithm-from-google/}{https://techcrunch.com/2019/08/14/racial-bias-observed-\linebreak in-hate-speech-detection-algorithm-from-google/}} Still, as of early 2021, Perspective was processing about 500 million requests daily in online spaces such as the comment sections of \emph{El País}, \emph{Disqus}, \emph{The New York Times}, and others.\footnote{ Kyle Wiggers, “Jigsaw's AI-powered toxic language detector is now processing 500 million requests daily,” \emph{Venturebeat}, February 8, 2021, \url{https://venturebeat.com/2021/02/08/jigsaws-ai-powered-toxic-language-detector-is-now-processing-500-million-requests-daily/}} The risks of these language models reinforcing standards and refusing vernaculars are hard to overstate. Communities that have been and still are marginalised become marginalised further through the rejection of their linguistic expression.
Both cases show that language models are not able to adapt to contexts and that moderation should not be left to automated systems. The vernacular and the systematic are hard to pull apart, resulting in a complex interrelation that urgently needs to be thought through, not just in the interest of platforms.
Both cases show that language models are not able to adapt to contexts and that moderation should not be left to automated systems. The vernacular and the systematic are hard to pull apart, resulting in a \nohyphens{complex} interrelation that urgently needs to be thought through, not just in the interest of~platforms.
VLTK started from an interest to understand programming in relation to language processing, a practice that both shapes language and is shaped by it. To study this mutual transformation, we are specifically curious about programming practices that stay close to the material they work with, such as code that is written for a specific collection or a specific group of people, or esoteric code that is intentionally weird, peculiar, and not always made to be functional.\footnote{ “Welcome to Esolang, the esoteric programming languages wiki! This wiki is dedicated to the fostering and documentation of programming languages designed to be unique, difficult to program in, or just plain weird.” https://esolangs.org/wiki/Main\_Page} All the while keeping in mind that the constraints of programming languages themselves will also become the constraints of vernacular language processing.

BIN
vltk.pdf

Binary file not shown.