% --- PART 1 ---
% --- INTRODUCTION ---

% HEADER

\setlength{\tabcolsep}{3pt}
\setlength\arrayrulewidth{0.25pt}

\fancyfoot[RE]{\chaptersignone}
\fancyfoot[LO]{\chaptersignone}
\fancyfoot[RO]{\thepage}
\fancyfoot[LE]{\thepage}

% CHAPTER PAGE

\enlargethispage{-1\baselineskip}

\begin{tcolorbox}[boxrule=2pt, arc=24mm, colframe=black, colback=white, spread inwards=-16mm, spread outwards=-8mm, left=8mm, top=16pt, bottom=28pt]
\chapter[From \emph{contradictionaries} to \emph{formatterings}: An introduction to VLTK -- Vernacular Language Toolkit\\Cristina Cochior, Julie Boschat-Thorez, Manetta Berends]{From \emph{contradictionaries}\\to \emph{formatterings}\\An introduction to VLTK\\-- Vernacular Language\\Toolkit\footnotemark\\\\Cristina Cochior\\Julie Boschat-Thorez\\Manetta Berends}
\end{tcolorbox}
\footnotetext{A toolkit among a myriad of other possible vernacular language toolkits.}
\begin{center}
\begin{tikzpicture}[overlay,remember picture,inner sep=0pt, outer sep=0]
\node[anchor=south] at ([yshift=0]current page text area.south) {
\tikz\path[shading=ball, ball color=black] circle (\ballsize);};
\end{tikzpicture}
\end{center}

\addtolength{\skip\footins}{10pt}

\vspace{\baselineskip}

% CONTENT

\begin{quoting}
A language comes into existence by means of brutal necessity, and the rules of the language are dictated by what the language must convey.\footnote{James Baldwin, “If Black English Isn't a Language, Then Tell Me, What Is?” \emph{The New York Times}, July 29, 1979.}
\end{quoting}

\noindent
Despite their ubiquity, the processes of computational language manipulation are largely imperceptible; they envelop, inform, and often standardise intimate interactions with the world. From spam filters, to search optimisation engines, to targeted advertisements, to reshuffled social media timelines, online experiences are mediated by the ordering logics of language processing.

\emph{Vernaculars come to matter} brings together contributions by Cengiz Mengüç, Clara Balaguer, Michael Murtaugh, Ren Loren Britton, and Rosemary Grennan. This publication, whose title derives from Britton’s contribution, reflects on the roles that the vernacular can play in linguistic and technological environments, as well as on the vernacular orders that language could inhabit or create. It is a speculation on what vernacular language processing might mean when considering how and where language is situated. \emph{Vernaculars come to matter} is made in the context of VLTK, a Vernacular Language Toolkit in the making.

In this introductory text, we share some of the thoughts behind the Vernacular Language Toolkit, or VLTK for short, the starting point of this publication. VLTK is an ongoing research project initiated by Cristina Cochior, Julie Boschat-Thorez, and Manetta Berends that aims to connect the vernacular to “language processing,” a term that refers to any kind of computational treatment of language. By combining the two, it explores what forms of “vernacular language processing” there could be.

\addtolength{\skip\footins}{-10pt}

VLTK dives into the logical operations used to process language with a computer, in order to speak back to a range of unassuming habits in the field of computational language processing and to step towards modes of embedded, slow, and vernacular language processing and knowledge organisation. “Vernacular” in this text refers to everyday speech forming at the margins of standardisation; to the ephemeral aspects of a culture's particularities that resist or exist alongside dominant systems of institutional aesthetics; or to the encapsulation of a specific nowness in time.

Stretching the vocabulary commonly used in computational language processing is an important part of the work. In these contexts, language is often understood as either “natural” or “artificial,” where the natural refers to spoken or written human language and the artificial to formal languages such as programming languages. Questioning these terms enables us to approach language as culturally promiscuous and constantly in flux. Language is complex, messy, and contains all kinds of structures: some emerge through use, like sayings or expressions; others are imposed by external forces, sometimes even violently.\footnote{“It is often forgotten that {[}dictionaries{]} are artificial repositories, put together well after the languages they define. The roots of language are irrational and of a magical nature.” Jorge Luis Borges, \emph{El otro, el mismo} (Buenos Aires: Emecé, 2005), 5. Translation by the authors.} Instead of thinking of language as “raw” data, we prefer to consider it as heavily embedded and dense cultural material, which carries traces of its uses through time and ties to different locations. And instead of speaking of “extracting” keywords or phrases, we think of such actions as \emph{re-formations} or \emph{di-versioning}.

We are curious about undoing such pervasive terminologies and crossing them into methods that allow us to rethink how we work with language in code: from dictionaries to \emph{contradictionaries}, from counting via accounting to \emph{ac-count-abilities}, from overlapping to \emph{overlooping}, from formatting to \emph{formatterings}. For example, a \emph{contradictionary} could provide openings to possible interpretations of a word, instead of defining its meaning. A \emph{formattering} could refer to the shaping of matter through formats. And so on.

The specific focus of VLTK on language playfully blurs the boundaries between tool (code as language) and material (language as code). Language processing tools are often made as instruments that can be used for any kind of textual material, which makes them effective for certain tasks, but bombastic, rough, and imprecise on other occasions, as they process text without engaging with its content. This tension between tool and material creates a generative space to formulate questions: How does language change when it undergoes computational processes if we don't rely on these dualisms? How can language processing tools operate with a sensibility for all sorts of different complexities, specificities, and weights of language? How can we develop ways of close reading through and with code? Whose language is being processed by code? And who is affected by the logics of these systems? Can we think of computational operations as transmutational processes if we understand the transformations of language from one thing to another as a form of computational alchemy?

\section{\nohyphens{From the “natural” to the\\“vernacular”}}

\noindent
The acronym VLTK is a response to, and a joke on, a ubiquitous programming library called NLTK, which stands for Natural Language Toolkit. It is a well-known project among programmers and people working in computational linguistics, a field also known as Natural Language Processing (NLP). Current, concrete use cases of NLTK include operations such as text classification for customer support, sentiment analysis for marketing purposes, or automated hate speech detection. Our collective work around VLTK started from a discomfort with the expression “natural language,” which is used in computational linguistics to refer to language that has not (yet) been structured for further computational processing. Accepting the premise that language is natural would mean ignoring the procedures through which language becomes naturalised, imposed, and overwritten, as well as the political mechanisms that sustain these efforts. We came to VLTK through a play on words that scratched the itch of this discomfort: we started replacing the term “natural” with “vernacular.” Vernacular, a word close to “vulgar” -- of the people, common -- points towards the processes of language formation, and the context and urgency they require to exist. This small but defiant joke slowly grew through conversations, reflections, and a desire to try to do otherwise. It opened up and stretched generative spaces of interpretation.

NLTK allows you to interface with text in many different ways using a programming language, in this case Python. The makers of NLTK describe the project on their website as:

\begin{quoting}
(\ldots) a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.\footnote{\url{http://www.nltk.org}}
\end{quoting}

\noindent
NLTK comes with a whole set of interfaces, such as word counters, summarisers, text generators, translators, context inspectors, dictionary functions, classification tools, and more. The toolkit is so extensive, and some of its components have been integrated into so many applications and systems, that it is used by people with backgrounds ranging from the arts and humanities to science and engineering.\footnote{“NLP is important for scientific, economic, social, and cultural reasons. NLP is experiencing rapid growth as its theories and methods are deployed in a variety of new language technologies. For this reason it is important for a wide range of people to have a working knowledge of NLP. Within industry, this includes people in human-computer interaction, business information analysis, and web software development. Within academia, it includes people in areas from humanities computing and corpus linguistics through to computer science and artificial intelligence. (To many people in academia, NLP is known by the name of 'Computational Linguistics.')” \url{https://www.nltk.org/book/ch00.html\#audience}} NLTK was initiated by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. The project is published under an open licence,\footnote{NLTK is published under the Apache License, Version 2.0, \url{https://www.apache.org/licenses/LICENSE-2.0}} which means that anyone can use, modify, and distribute versions of the software for commercial or other purposes.
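
To give a sense of what a word counter does, here is a minimal sketch in plain Python. It is not NLTK code: it uses the standard library's \texttt{collections.Counter} (NLTK's own \texttt{FreqDist} class is a subclass of it), and the crude regular-expression tokeniser and the choice of Baldwin's sentence as sample text are our assumptions for illustration.

\begin{lstlisting}[language=Python]
import re
from collections import Counter


def count_words(text):
    """Count lowercase word tokens in a text.

    The tokeniser is a deliberately crude regular expression (an
    assumption of this sketch): it keeps runs of letters and
    apostrophes and discards everything else, context included.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


sample = ("A language comes into existence by means of brutal necessity, "
          "and the rules of the language are dictated by what the "
          "language must convey.")
counts = count_words(sample)
print(counts["language"], counts["the"], counts["by"])  # 3 3 2
\end{lstlisting}

\noindent
Counting of this kind reads a text from a distance: every occurrence of “language” weighs the same, whatever surrounds it.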
\newpage

\vspace*{-4pt} % whenever title at start of page

\section{Vernacular processing\\as mapping}

\noindent
The field of NLP understands mapping as an activity that turns so-called unstructured language into structured linguistic objects, such as a document index, a thesaurus, a dictionary, a comparative word list, or a morph analyser. In the NLTK textbook, \emph{Natural Language Processing with Python},\footnote{\emph{Natural Language Processing with Python} is a textbook that is often used as a first mediator when working with NLTK tools. Steven Bird, Ewan Klein, and Edward Loper, \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit} (Cambridge: O'Reilly, 2009). The book is also available in digital form at \url{https://www.nltk.org/book/}.} such mapping activities are introduced as follows:

\begin{quoting}
Most often, we are mapping from a “word” to some structured object. For example, a document index maps from a word (which we can represent as a string), to a list of pages (represented as a list of integers). In this section, we will see how to represent such mappings in Python.\footnote{\url{https://www.nltk.org/book/ch05.html\#sec-dictionaries}}
\end{quoting}

\noindent
The same chapter also contains a table that presents different linguistic objects as mappings from keys to values:

\begin{tabularx}{1\textwidth}{
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X |
}
\hline
\textbf{Linguistic Object} & \textbf{Maps From} & \textbf{Maps To}\\
\hline
Document Index & Word & List of pages (where word is found)\\
\hline
Thesaurus & Word sense & List of synonyms\\
\hline
Dictionary & Headword & Entry (part-of-speech, sense definitions, etymology)\\
\hline
Comparative Wordlist & Gloss term & Cognates (list of words, one per language)\\
\hline
Morph Analyzer & Surface form & Morphological analysis (list of component morphemes)\\
\hline
\end{tabularx}

\noindent
Figure: NLTK's linguistic objects, from \emph{Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit}, “Table 3.1: Linguistic Objects as Mappings from Keys to Values”\footnote{\url{https://www.nltk.org/book/ch05.html\#tab-linguistic-objects}}\\
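
The first row of this table, the document index, can be sketched as a Python dictionary that maps a word (a string) to a list of pages (integers), as in the NLTK book passage quoted above. This is a minimal sketch under our own assumptions: the three page texts are hypothetical, and the regular-expression tokeniser and the use of \texttt{collections.defaultdict} are our choices, not code taken from the book.

\begin{lstlisting}[language=Python]
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the sorted list of pages on which it occurs.

    `pages` maps a page number (int) to the text on that page (str).
    """
    index = defaultdict(set)
    for number, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(number)  # a set records each page only once
    return {word: sorted(numbers) for word, numbers in index.items()}


# Hypothetical pages, for illustration only.
pages = {
    1: "A word on a page.",
    2: "The vernacular and the page.",
    3: "A vernacular word.",
}
index = build_index(pages)
print(index["word"])        # [1, 3]
print(index["vernacular"])  # [2, 3]
\end{lstlisting}

\noindent
In this sketch, each word is reduced to a lowercase string and each occurrence to a page number; nothing of what surrounds the word on the page survives the mapping.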

\noindent
NLTK uses the metaphor of mapping to form indexical relations between truth and map. The word “mapping” caught our attention -- it is this indexical relation that needs questioning and study. Considering that language maps generate a new kind of linguistic matter, one that is processed and transformed through code, how does that mutate language? How can these mutations be studied? What kinds of maps can be made to map language differently? Can mapping be done based on:

\vspace{\baselineskip}

\begin{compactitem}[$\bullet$]
\tightlist
\item
\textbf{disorientation} by losing familiarity with a text?
\item
\textbf{thickening of matter, structures or paths} by intersecting text with other texts?
\item
\textbf{revision of markers of orientation} by amending the path over time?
\item
\textbf{following threads} by focusing on one perspective at a time?
\item
\textbf{reparative taxonomies} by “reconfiguring relations according to local and personal vantage points”?\footnote{Melissa Adler, “Eros in the library: Considering the aesthetics of knowledge organization,” \emph{Art Libraries Journal} 44, no. 2 (April 2019): 67--71.}
\item
\textbf{perversion} by operating outside of the normative discourse?\footnote{Melissa Adler, \emph{Cruising the Library: Perversities in the Organization of Knowledge} (New York: Fordham University Press, 2017).}
\end{compactitem}

\vspace{\baselineskip}

\noindent
Below is an attempt to remake the table of linguistic objects with these questions in mind:

\begin{tabularx}{1\textwidth}{
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X
| >{\raggedright\arraybackslash}X |
}
\hline
\textbf{Linguistic Object} & \textbf{Maps From} & \textbf{Maps To}\\
\hline
? & ? & ?\\
\hline
Cross-referencing matrix & Voices & ?\\
\hline
? & Working conditions & ?\\
\hline
Document x-dex & Angles, texts, forms & Traces\\
\hline
? & Sentences & Question (making space)\\
\hline
Navigation score & Words & Paths taken by visitor\\
\hline
\_\_MAGIC\_WORDS\_\_ & ? & Modes of engagement\\
\hline
Complexity matrix & Words & Textual patterns\\
\hline
? & Ideas & Linked data\\
\hline
? & Word-combinations & Word-puns\\
\hline
Contradictionary & Contradictions & ?\\
\hline
? & Differences & Markers of difference\\
\hline
Formatterings & Aesthetics & Collages\\
\hline
Transfictions & Generative phrasings & Fictional transcripts\\
\hline
? & IN + NN & Prepositional agents\\
\hline
Pythonic texts & Cultural traditions & Python-like syntax\\
\hline
Non-linear slowgression & Intuitive correlations & ?\\
\hline
Situated calculations & ? & ?\\
\hline
\end{tabularx}

\noindent
Figure: A table of VLTK possibilities. To be versioned and expanded.

\vspace{\baselineskip}

\noindent
The table of VLTK possibilities includes \emph{Complexity matrices} that complexify the understanding of the context of a certain phrase or word;\footnote{An example of this is word2complex, a workshop by Manetta Berends and Cristina Cochior. word2complex is a thought experiment to resist the flattening of meaning that is inherent in word2vec, a model commonly used to create “word embeddings.” \url{http://titipi.org/wiki/index.php/Word2complex}} \emph{Navigation scores} that generate scores based on the path a reader took through the text, ready to be used as a guide by the next reader; and forms of encapsulated close reading, using \emph{Transfictions}, to provide ways to ruminate on a set of phrases by dislocating and recontextualising them. The last of these is realised below as a script that wraps expressions from the book \emph{Queer Phenomenology} by Sara Ahmed into a compilation of conversational utterances arranged by chance.\footnote{Sara Ahmed, \emph{Queer Phenomenology: Orientations, Objects, Others} (Durham: Duke University Press, 2006).}

\begin{lstlisting}
They looked at the auto-complete suggestions and suddenly said:

"You know, markers of difference!"

Which made me think... boo,

capacity to position ourselves, hmm...

They looked at our variables and suddenly whispered:

"You know, that allow people to move!"

Which made me think... ahem,

anchoring points, no?

They looked at this dataset and suddenly mumbled:

"You know, we are orientated!"

Which made me think... ahem,

inhabiting space, right?

They looked at all our different keyboards and suddenly realised:

"You know, emotional intentionality!"

Which made me think... duh,

we are orientated, no?

They looked at the auto-complete suggestions and suddenly whispered:

"You know, difference as a simple database category!"
\end{lstlisting}

\noindent
The code output above is a fictional script generated by taking some of Ahmed's phrases out of their context and placing them into a wholly different one. The phrases are lifted from their academic register and placed into a colloquial one, which introduces a shift in tone and makes space to relate to the snippets in other ways.
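
The script that produced this output is not reproduced here. The sketch below is a hypothetical reconstruction in Python, assuming only what the printed output shows: a repeating four-line template in which framing words and two of Ahmed's phrases are picked at random. The word lists are copied from the output above; the function name \texttt{transfiction} and its \texttt{seed} parameter are our own.

\begin{lstlisting}[language=Python]
import random

# Phrases copied from the printed transfiction (originally from Ahmed).
PHRASES = [
    "markers of difference",
    "capacity to position ourselves",
    "that allow people to move",
    "anchoring points",
    "we are orientated",
    "inhabiting space",
    "emotional intentionality",
    "difference as a simple database category",
]
# Conversational frames, also copied from the printed output.
OBJECTS = ["the auto-complete suggestions", "our variables",
           "this dataset", "all our different keyboards"]
VERBS = ["said", "whispered", "mumbled", "realised"]
INTERJECTIONS = ["boo", "ahem", "duh"]
TAGS = ["hmm...", "no?", "right?"]


def transfiction(stanzas, seed=None):
    """Return a script of `stanzas` four-line stanzas arranged by chance.

    Passing the same `seed` reproduces the same script.
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(stanzas):
        quoted, echoed = rng.sample(PHRASES, 2)  # two distinct phrases
        lines.append(f"They looked at {rng.choice(OBJECTS)} "
                     f"and suddenly {rng.choice(VERBS)}:")
        lines.append(f'"You know, {quoted}!"')
        lines.append(f"Which made me think... {rng.choice(INTERJECTIONS)},")
        lines.append(f"{echoed}, {rng.choice(TAGS)}")
    return "\n".join(lines)


print(transfiction(3, seed=1))
\end{lstlisting}

\noindent
Fixing the \texttt{seed} makes a given arrangement repeatable, which can help when a chance-generated text is to be discussed or versioned.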

The dislocation and relocation of phrases in the transfiction above draws attention to the aesthetics of knowledge organisation and structuring. By playing with traces of orality, and by versioning the language with accompanying verbal expressions, the example can be read as an invitation to keep Ahmed's phrases close, giving room to generate new understandings of them. The repetition of the dialogical phrases, for example, injects a sense of timing and rhythm, leaving gaps for readers to fill in with their own interpretations.

In “Eros in the Library,” Melissa Adler introduces the ancient Greek historian Pamphila, who wove “multiple sources and genres together to create a pleasing set of histories,” through a method she called \emph{poikilia}.\footnote{Adler, “Eros in the library,” 69.} Adler cites Adeline Grand-Clément's definition of \emph{poikilia} as “harmonia that does not unify.”\footnote{Adeline Grand-Clément, “Poikilia,” in \emph{A Companion to Ancient Aesthetics}, ed. Pierre Destrée and Penelope Murray (Hoboken: John Wiley \& Sons, 2015), 410, cited in Adler, “Eros in the library,” 69.} The aesthetic beauty and pleasure in Pamphila's method shift the purpose of knowledge organisation.

Instead of looking at Ahmed's book from a distance by counting words and searching for numerical patterns, a very common practice in the field of NLP, we chose these phrases after closely reading and discussing them. However, there is still an awkwardness in mixing them through programmatic relocations, which place the words in a context other than the one the author intended. This speaks to the need to handle them with care, because a misalignment of contexts can create hurt.

The transfiction is an exercise in thinking about the relocation and recontextualisation of language. It started from specific words in the book that resonated with the questions around vernacular language processing. These words introduce a line of thought around the notion of orientation, which adds a situated dimension to the metaphor of mapping. Ahmed describes orientation as a gesture of being “turned toward certain objects, those that help us to find our way.”\footnote{Ahmed, \emph{Queer Phenomenology}, 1.} If language is seen as a landscape of textual objects in which one wishes to orient oneself, how do markers of orientation and markers of difference emerge? How do we orient ourselves? And what does it mean to be orientated through linguistic markers?

What does it mean to use the metaphor of mapping when working with language processing tools? Which issues and tensions related to non-metaphorical mapping practices -- such as cartography -- can we learn from when we map language?

\section{\nohyphens{\emph{Where} is the vernacular?}}

\noindent
The vernacular is often depicted in opposition to the standard. However, this relationship is not fixed: the vernacular may influence the standard over time, or the vernacular may emerge as a response to the standard. The ubiquity of standards creates access to systems; as such, standards are valuable technical rule sets. However, they are difficult to change, are defined by a select few, and exclude some groups, languages, or habits. This field of tension introduces many questions around the relation between the vernacular and the standard, without idealising or renouncing either.

Halcyon Lawrence's research makes an important point about the relationship between standards and the vernacular. It demonstrates that English-language information spoken with non-native accents is understood by her interviewees just as well as information spoken with native accents, although listeners take a little longer to process it. She concludes that including non-native accents in the technologies that accompany the everyday is achievable, yet time is a prerequisite for it. It takes time to communicate between vernaculars.\footnote{Halcyon Lawrence, “Inauthentically Speaking: Speech Technology, Accent Bias and Digital Imperialism,” presentation at the Computer History Museum, April 26, 2017, \url{https://www.youtube.com/watch?v=gJCVla9xYUs}, 1:25--17:15.}

The relation between the vernacular and time is also explored in the writings of James C. Scott, who offers the analogy of vernacular road names. A road might be known by locals under different names depending on where the traveller is heading.\footnote{James C. Scott, \emph{Two Cheers for Anarchism} (Princeton: Princeton University Press, 2012), 30--32.} He gives the example of a road between Durham and Guilford whose name changes with the direction of travel: for someone heading to Guilford, it is the Guilford Road, and for someone heading to Durham, it is the Durham Road. Following this logic, several roads might share the same name, making them difficult to tell apart, which matters in an emergency. In that case, the time it takes to find the road would need to be optimised.

In the two examples above, we are speaking about different kinds of vernacular. In the first case, the vernacular appears as vernacular language, and in the second as a vernacular wayfinding system.

\emph{Where} the vernacular is positioned within vernacular language processing is a complex question. How do we differentiate between different forms of informal language, such as dialects, accents, or slang? How do we understand the vernacular in relation to standards, urgencies, access, and time? This is a political question. Due to structural inequalities, it is important for some forms of speech, accents, and grammar to be included in mainstream ways of doing, as in the case described by Halcyon Lawrence, which amounts to a request for the vernacular to become standardised. It matters whom the standards exclude, who has access, and for what purpose time is optimised.

Optimisation is a term often encountered in technical environments, where it generally refers to maximising the technical performance and minimising the financial costs of a particular technology. Seda Gürses et al. argue that “optimization-based systems are developed to capture and manipulate behavior and environments for the extraction of value” and that, as a result, “they introduce broader risks and harms for users and environments beyond the outcome of a single algorithm within that system.”\footnote{Seda Gürses et al., “POTs: Protective Optimization Technologies,” \emph{FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency}, 177--188. Paper available at \url{https://arxiv.org/pdf/1806.02711.pdf}; the quotation is on p. 1 of that version.} While optimisation serves a purpose in specific situations, such as the non-static naming of roads, the way the sciences have embraced it as a mode of operating removes the possibility of staying with the uncertainty of what will follow, because the goal is defined within a financial scope. In programming more specifically, code is often written with a projection of what it should do in the world already in place.

What might it mean instead to slow down and re-embed language processing in a messy world,\footnote{Derived from: “Reclaiming operations are never easy. If reclaiming scientific research means re-embedding the sciences in a messy world, it is not only a question of accepting this world as such, but of positively appreciating it, of learning how to foster and strengthen, in Whitehead's words, `the habits of concrete appreciation of the individual facts in their full interplay of emergent values'.” Isabelle Stengers, \emph{Another Science is Possible: A Manifesto for Slow Science} (Cambridge: Polity, 2018), 122.} making space to rethink the goal of a project, or even to work without aiming for solutions at all? To take the time to develop counter-hegemonic counting techniques that process language otherwise?\footnote{“...referencing and citation in Black studies are what Carmen Kynard calls `vernacular insurrections': narratives that are `not only counter hegemonic, but also affirmative of new, constantly mutating languages, identities, political methodologies, and social understandings that communities form in and of themselves both inwardly and outwardly . . . not merely the bits and pieces chipped off or chipping away at dominant culture, but a whole new emergence.'” Katherine McKittrick, \emph{Dear Science and Other Stories} (Durham: Duke University Press, 2021), 26--27. McKittrick is citing Carmen Kynard, \emph{Vernacular Insurrections: Race, Black Protest, and the New Century in Composition Literacies Studies} (Albany: SUNY Press, 2013), 10--11.} VLTK turns to slow processing as a way to turn and return to the material at hand.

\section{\nohyphens{A conclusion that is\\a beginning}}

\noindent
Although the questions we ask may seem specific to language processing applications, they are relevant in a broader sense, as the intentions behind human communication are increasingly evaluated by algorithms, especially on social media. For example, since 2017 the recognition of hate speech and harassment on Twitter has relied heavily on algorithms, but trolls have adopted methods that keep their speech from being flagged. Twitter's developer site showcases HateLab, a project working on algorithms “to end hate speech and improve healthy conversations online.”\footnote{\url{https://developer.twitter.com/en/community/success-stories/hatelab}} Formally codifying what hate speech consists of leaves plenty of ways to work around a detection algorithm, and so methods of pursuing reckless harassment and hate speech have multiplied. One can, for instance, make use of stylistic devices such as metonymy, antiphrasis, or irony. On French Twitter, immigrants and their children are mockingly referred to as “chances for the country,” which implies that a certain category of immigrants' contribution to society is nefarious. By using such masked language, the messages escape hate speech detection while continuing to spread their harm. Of course, deciphering such allusions requires familiarity with the vernacular codes of these communities. Emojis are also used as signifiers of a shared universe of references. In French far-right circles, one such emoji is the map, a signifier of white supremacism that nods towards a map of average IQs by country.\footnote{This refers to the map in the racist and antihuman book by Richard Lynn and Tatu Vanhanen, \emph{IQ and the Wealth of Nations} (Westport: Praeger, 2002).
See also Pauline Moullot, “La carte mondiale des QI, relayée par des comptes d'extrême droite, a-t-elle une valeur scientifique?” \emph{Libération}, November 14, 2019, \url{https://www.liberation.fr/checknews/2019/11/14/la-carte-mondiale-des-qi-relayee-par-des-comptes-d-extreme-droite-a-t-elle-une-valeur-scientifique\_1754773/}} As such, it is important to remember that hate speech can also be vernacular language.
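
The ease of working around formally codified rules can be shown with a deliberately naive filter that flags a message only when it contains a term from a fixed list. This is a toy sketch, not how Twitter or any other platform actually moderates: the blocklist entries are neutral placeholders standing in for real terms, and the second message borrows the masked phrasing described above.

\begin{lstlisting}[language=Python]
import re

# Placeholder entries standing in for a codified list of banned terms.
BLOCKLIST = {"slur_a", "slur_b"}


def is_flagged(message):
    """Return True if any word token of the message is on the blocklist."""
    tokens = set(re.findall(r"[a-z_]+", message.lower()))
    return not tokens.isdisjoint(BLOCKLIST)


print(is_flagged("This message contains slur_a."))            # True
print(is_flagged("What a chance for the country they are."))  # False
\end{lstlisting}

\noindent
The second message carries its meaning through irony and shared context, neither of which a list of terms can encode.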
|
|
|
|
On the other hand, vernacular communication can be harmfully misunderstood by algorithms trained with a normative use of language in mind. This is the case with Perspective, the toxic speech detection API from Jigsaw (Google), which has a history of flagging African American Vernacular English (AAVE) as toxic.\footnote{Devin Coldewey, “Racial bias observed in hate speech detection algorithm from Google,” \emph{TechCrunch}, August 15, 2019, \url{https://techcrunch.com/2019/08/14/racial-bias-observed-in-hate-speech-detection-algorithm-from-google/}} Still, as of early 2021, Perspective was processing about 500 million requests daily in online spaces such as the comment sections of \emph{El País}, Disqus, \emph{The New York Times}, and others.\footnote{Kyle Wiggers, “Jigsaw's AI-powered toxic language detector is now processing 500 million requests daily,” \emph{VentureBeat}, February 8, 2021, \url{https://venturebeat.com/2021/02/08/jigsaws-ai-powered-toxic-language-detector-is-now-processing-500-million-requests-daily/}} The risks of these language models reinforcing standards and refusing vernaculars are hard to overstate. Communities that have been, and still are, marginalised become further marginalised through the rejection of their linguistic expression.

Both cases show that language models are unable to adapt to context, and that moderation should not be left to automated systems. The vernacular and the systematic are hard to pull apart; their complex interrelation urgently needs to be thought through, and not only in the interest of platforms.

VLTK started from an interest in understanding programming in relation to language processing, a practice that both shapes language and is shaped by it. To study this mutual transformation, we are particularly curious about programming practices that stay close to the material they work with, such as code written for a specific collection or a specific group of people, or esoteric code that is intentionally weird, peculiar, and not always made to be functional.\footnote{“Welcome to Esolang, the esoteric programming languages wiki! This wiki is dedicated to the fostering and documentation of programming languages designed to be unique, difficult to program in, or just plain weird.” https://esolangs.org/wiki/Main\_Page} All the while, we keep in mind that the constraints of programming languages themselves also become the constraints of vernacular language processing.

As non-professional practitioners of language processing, we are curious to understand what it means to stay close to commonly used tools while working with them. As such, we like to think of VLTK as a project for discussing and thinking, rather than for working towards solutions: a project that makes space for programming practices, logics, and methods that depart from local standards, vernacular measurements, and forms of abstracting otherwise. VLTK is, for us, an environment to:

... think about text processing tools, question them, and talk about them, in order to explore their vernacular possibilities\\
... explore social aspects of formats and formal text processing systems\\
... explore textual data as vernacular matter, through reading systems, exercises, small scripts ...\\
... play with standards and taxonomies that shape structured data\\
... stay close to the specifics of the textual material it is processing\\
... gesture to what textual formattering does\\
... prefer the anecdotal to officiating structures\\
... look for the possibilities of movement within existing parameters\\
... question where the vernacular is located and what it is for\\
...
\section{About this publication}
\noindent
This publication came together as a form of resonant publishing: publishing that is not done at the end of a process of thought, but is embedded in the middle of a social process where thoughts develop and unfold.\footnote{Our model of publishing is informed by (among many others) Stevphen Shukaitis, “Toward an Insurrection of the Published? Ten Thoughts on Ticks and Comrades,” \emph{Transversal Texts}, June 2014, \url{https://transversal.at/transversal/0614/shukaitis/en}} The publication holds five contributions written by a group of co-conspirators who work with forms of vernacular languaging, software culture, and textual archives. Their contributions offer us rich ground for understanding different forms of vernacular cultures and technologies.

We invited Rosemary Grennan from MayDay Rooms in London to be in conversation with us about their digital archive Leftovers, a shared online platform of political ephemera such as leaflets, posters, and manifestos. In the interview, Rosemary speaks about the structure of the archive, how they used Optical Character Recognition (OCR) and NLP tools to rethink this structure, and the dissemination tactics they have developed to make the work public.

Clara Balaguer brings together a range of voices and media formats to speak about and through the vernacular. “A high-low mix tape on the subject of the vernacular” combines lyrics, poetry, email snippets, and theoretical writing in the form of a mix tape and lecture performance, understanding the vernacular in relation to the hegemonic position of “correct” English, and writing from an “I” perspective.

Ren Loren Britton's “Turnabouts and deadnames: shapeshifting trans* and disabled vernaculars” speaks about deadnames as haunting matterings filtered through the fixed categories of bureaucratic institutional interfaces. Britton describes the violence and harm that these standardised systems produce, and the potential for resisting a rectangularised spreadsheet logic through the practices of trans* vernacular language.

Cengiz Mengüç shares a selection from a growing archive and research-in-progress around vernacular street typography: photos taken in Turkey in 2019 and 2021, in between travels and family visits. His attention to typography in public space reminds us that written language does not exist only as an abstraction. Street typography is very much shaped by its materiality, such as the encoded information or the sun-faded gradients that appear over time, but also by the traces of “reverse diaspora”\footnote{This is a term that Mengüç used in an email exchange while introducing his contribution, referring to the attempt to “trace back the roots of certain local Rotterdam (diasporic) aesthetics and design cultures, scrolling back and forth in my iphone folders, I decided to work with this selection of photos I took in Turkey in 2019 \& 2021 in between travels and family visits”.} aesthetics that have travelled between the city of Rotterdam and Turkey.

Michael Murtaugh uses the vernacular as a lens to understand the differences between the software projects and programming environments of Processing and ImageMagick. “Torn at the seams: vernacular approaches to teaching with computational tools” introduces both software projects and describes how each of them comes with its own culture, aesthetics, mindset, and connections to specific contexts, including the Bauhaus, minimal art, and the MIT Media Lab. Murtaugh embraces the vernacular and the messiness of software projects, and shows us how such an approach generates a whole range of open invitations for others.

VLTK is produced in the proximity of Varia, a collective-space in the South of Rotterdam that works on questions around everyday technology. This context allows us to unfold programming practices that combine practice-based research with networked publishing, while bridging the fields of software studies and tool making, which we approach with trans*feminist sensibilities.\footnote{“Trans*feminism is certainly a polyhedric dynamic at work, in mutual affection with the previous forces. We refer to the research as such, in order to convoke around that star (*) all intersectional and intra-sectional aspects that are possibly needed. Our trans*feminist lens is sharpened by queer and anti-colonial sensibilities, and oriented towards (but not limited to) trans*generational, trans*media, trans*disciplinary, trans*geopolitical, trans*expertise, and trans*genealogical forms of study. The situated mixing of software studies, media archaeology, artistic research, science and technology studies, critical theory and queer-anticolonial-feminist-antifa-technosciences purposefully counters hierarchies, subalternities, privileges and erasures in disciplinary methods.” From “Volumetric Regimes: Material cultures of quantified presence,” by Possible Bodies (Jara Rocha and Femke Snelting), \url{https://volumetricregimes.xyz/index.php?title=Introduction}}

Our (Cristina Cochior's, Julie Boschat-Thorez's, and Manetta Berends's) shared backgrounds in Media Design and Communication at the Piet Zwart Instituut, together with the hands-on experience we gained while working with language processing tools in art and design projects or commissions, have guided our understanding of the subject.

We orient ourselves through different languages: French, Dutch, Romanian, English, Darija, Spanish, and German, but also Python, HTML, CSS, JavaScript, and Bash, among others, which we learn while watching TV, browsing the internet, or in conversation with family members.

This publication appears in a printed edition and a digital one. The digital version is published on a self-hosted MediaWiki instance, or “wiki” for short, where we aim to unfold this research trajectory further. We see the wiki as a porous place that allows us to do this in close proximity to peers, friends, and other co-conspirators. You can find it at: \url{https://vltk.vvvvvvaria.org/}.

The printed edition was made by Marianne Plano using Free, Libre and Open Source Software (F/LOSS) tools. It came together through the use of ImageMagick and LaTeX, pushing the interplay between standardised layouts and vernacular effects further.

Both the printed edition and the wiki are published under the CC4r open licence,\footnote{\url{https://gitlab.constantvzw.org/unbound/cc4r/}} which allows anyone to use, modify, and distribute versions of the work, under the condition that any derived work will be published under the same licence or one that is permissive in a similar way.
\clearpage{\thispagestyle{empty}\cleardoublepage}