data workers write, perform, clean, inform, read and learn
What
can
humans learn from humans
humans learn with machines
machines learn from machines
machines learn with humans
humans learn from machines
machines learn with machines
machines learn from humans
humans learn with humans
? ? ?

Data Workers, an exhibition at the Mundaneum in Mons from 28 March until 28 April 2019.
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
ABOUT

Data Workers is an exhibition of algoliterary works, of stories told from an 'algorithmic storyteller point of view'. The exhibition was created by members of Algolit, a group from Brussels involved in artistic research on algorithms and literature. Every month they gather to experiment with F/LOSS code and texts. Some works are by students of Arts² and external participants to the workshop on machine learning and text organized by Algolit in October 2018 at the Mundaneum.

Companies create artificial intelligence (AI) systems to serve, entertain, record and learn about humans. The work of these machinic entities is usually hidden behind interfaces and patents. In the exhibition, algorithmic storytellers leave their invisible underworld to become interlocutors. The data workers operate in different collectives. Each collective represents a stage in the design process of a machine learning model: there are the Writers, the Cleaners, the Informants, the Readers, the Learners and the Oracles. The boundaries between these collectives are not fixed; they are porous and permeable. At times, Oracles are also Writers. At other times Readers are also Oracles. Robots voice experimental literature, while algorithmic models read data, turn words into numbers, make calculations that define patterns and are able to endlessly process new texts ever after.

The exhibition foregrounds data workers who impact our daily lives, but are either hard to grasp and imagine or removed from the imagination altogether. It connects stories about algorithms in mainstream media to the storytelling that is found in technical manuals and academic papers. Robots are invited to engage in dialogue with human visitors and vice versa. In this way we might understand our respective reasonings, demystify each other's behaviour, encounter multiple personalities, and value our collective labour. It is also a tribute to the many machines that Paul Otlet and Henri La Fontaine imagined for their Mundaneum, showing their potential but also their limits.

---

Data Workers was created by Algolit.

Works by: Cristina Cochior, Gijs de Heij, Sarah Garcin, An Mertens, Javier Lloret, Louise Dekeuleneer, Florian Van de Weyer, Laetitia Trozzi, Rémi Forte, Guillaume Slizewicz, Michael Murtaugh, Manetta Berends, Mia Melvær.

Co-produced by: Arts², Constant and Mundaneum.

With the support of: Wallonia-Brussels Federation/Digital Arts, Passa Porta, UGent, DHuF - Digital Humanities Flanders and Distributed Proofreaders Project.

Thanks to: Mike Kestemont, Michel Cleempoel, Donatella Portoghese, François Zajéga, Raphaèle Cornille, Vincent Desfromont, Kris Rutten, Anne-Laure Buisson, David Stampfli.

AT THE MUNDANEUM

In the late nineteenth century two young Belgian jurists, Paul Otlet (1868–1944), the 'father of documentation', and Henri La Fontaine (1854–1943), statesman and Nobel Peace Prize winner, created the Mundaneum. The project aimed to gather all the world's knowledge and to file it using the Universal Decimal Classification (UDC) system that they had invented. At first it was an International Institutions Bureau dedicated to international knowledge exchange. In the twentieth century the Mundaneum became a universal centre of documentation. Its collections are made up of thousands of books, newspapers, journals, documents, posters, glass plates and postcards indexed on millions of cross-referenced cards. The collections were exhibited and kept in various buildings in Brussels, including the Palais du Cinquantenaire. The remains of the archive only moved to Mons in 1998.

Based on the Mundaneum, the two men designed a World City for which Le Corbusier made scale models and plans. The aim of the World City was to gather, at a global level, the institutions of knowledge: libraries, museums and universities. This project was never realized. It suffered from its own utopia. The Mundaneum is the result of a visionary dream of what an infrastructure for universal knowledge exchange could be. It attained mythical dimensions at the time. When looking at the concrete archive that was developed, that collection is rather eclectic and specific.

Artificial intelligence systems today come with their own dreams of universality and knowledge production. When reading about these systems, one encounters the visionary dreams of their makers, present from the beginning of their development in the 1950s. Nowadays, their promise has also attained mythical dimensions. When looking at their concrete applications, the collection of tools is truly innovative and fascinating, but at the same time rather eclectic and specific. For Data Workers, Algolit combined some of the applications with 10 per cent of the digitized publications of the International Institutions Bureau. In this way, we hope to poetically open up a discussion about machines, algorithms, and technological infrastructures.
CONTEXTUAL STORIES
ABOUT ALGOLIT

--- Why contextual stories? ---

During the monthly meetings of Algolit, we study manuals and experiment with machine learning tools for text processing. And we also share many, many stories. With the publication of these stories we hope to recreate some of that atmosphere. The stories also exist as a podcast that can be downloaded from http://www.algolit.net.

For outsiders, algorithms only become visible in the media when they achieve an outstanding performance, like AlphaGo, or when they break down in fantastically terrifying ways. Humans working in the field, though, create their own culture on and offline. They share the best stories and experiences during live meetings, research conferences and annual competitions like Kaggle. These stories that contextualize the tools and practices can be funny, sad, shocking, interesting.

A lot of them are experiential learning cases. The implementations of algorithms in society generate new conditions of labour, storage, exchange, behaviour, copy and paste. In that sense, the contextual stories capture a momentum in a larger anthropo-machinic story that is being written at full speed and by many voices.

--- We create 'algoliterary' works ---

The term 'algoliterary' comes from the name of our research group Algolit. We have existed since 2012 as a project of Constant, a Brussels-based organization for media and the arts. We are artists, writers, designers and programmers. Once a month we meet to study and experiment together. Our work can be copied, studied, changed, and redistributed under the same free license. You can find all the information on: http://www.algolit.net.

The main goal of Algolit is to explore the viewpoint of the algorithmic storyteller. What new forms of storytelling do we make possible in dialogue with these machinic agencies? Narrative viewpoints are inherent to world views and ideologies. Don Quixote, for example, was written from an omniscient third-person point of view, showing Cervantes' relation to oral traditions. Most contemporary novels use the first-person point of view. Algolit is interested in speaking through algorithms, and in showing you the reasoning underlying one of the most hidden groups on our planet.

To write in or through code is to create new forms of literature that are shaping human language in unexpected ways. But machine learning techniques are only accessible to those who can read, write and execute code. Fiction is a way of bridging the gap between the stories that exist in scientific papers and technical manuals, and the stories spread by the media, often limited to superficial reporting and myth-making. By creating algoliterary works, we offer humans an introduction to techniques that co-shape their daily lives.

--- What is literature? ---

Algolit understands the notion of literature in the way a lot of other experimental authors do: it includes all linguistic production, from the dictionary to the Bible, from Virginia Woolf's entire work to all versions of the Terms of Service published by Google since its existence. In this sense, programming code can also be literature.

The collective Oulipo is a great source of inspiration for Algolit. Oulipo stands for Ouvroir de littérature potentielle (Workspace for Potential Literature). Oulipo was created in Paris by the French writers Raymond Queneau and François Le Lionnais. They rooted their practice in the European avant-garde of the twentieth century and in the experimental tradition of the 1960s.

For Oulipo, the creation of rules becomes the condition to generate new texts, or what they call potential literature. Later, in 1981, they also created ALAMO, Atelier de littérature assistée par la mathématique et les ordinateurs (Workspace for literature assisted by maths and computers).

--- An important difference ---

While the European avant-garde of the twentieth century pursued the objective of breaking with conventions, members of Algolit seek to make conventions visible.

'I write: I live in my paper, I invest it, I walk through it.' (Espèces d'espaces. Journal d'un usager de l'espace, Galilée, Paris, 1974)

This quote from Georges Perec in Espèces d'espaces could be taken up by Algolit. We're not talking about the conventions of the blank page and the literary market, as Georges Perec was. We're referring to the conventions that often remain hidden behind interfaces and patents. How are technologies made, implemented and used, as much in academia as in business infrastructures?

We propose stories that reveal the complex hybridized system that makes machine learning possible. We talk about the tools, the logics and the ideologies behind the interfaces. We also look at who produces the tools, who implements them, and who creates and accesses the large amounts of data needed to develop prediction machines. One could say, with the wink of an eye, that we are collaborators of this new tribe of human-robot hybrids.
writers write
writers write
data workers work
many authors write
every human being who has access to the internet interacts
we chat, write, click, like and share
we leave our data
we find ourselves writing in Python
some neural networks write
human editors assist
poets, playwrights or novelists assist
WRITERS

Data workers need data to work with. The data that is used in the context of Algolit is written language. Machine learning relies on many types of writing. Many authors write in the form of publications, such as books or articles. These are part of organized archives and are sometimes digitized. But there are other kinds of writing too. We could say that every human being who has access to the Internet is a writer each time they interact with algorithms. We chat, write, click, like and share. In return for free services, we leave our data that is compiled into profiles and sold for advertising and research purposes.

Machine learning algorithms are not critics: they take whatever they're given, no matter the writing style, no matter the CV of the author, no matter the spelling mistakes. In fact, mistakes make it better: the more variety, the better they learn to anticipate unexpected text. But often, human authors are not aware of what happens to their work.

Most of the writing we use is in English, some in French, some in Dutch. Most often we find ourselves writing in Python, the programming language we use. Algorithms can be writers too. Some neural networks write their own rules and generate their own texts. And for the models that are still wrestling with the ambiguities of natural language, there are human editors to assist them. Poets, playwrights or novelists start their new careers as assistants of AI.

DATA WORKERS PUBLICATION

By Algolit

All works visible in the exhibition, as well as the contextual stories and some extra text material, have been collected in a publication, which exists in French and English.

This publication is made using a plain text workflow, based on various text processing and counting tools. The plain text file format is a type of document in which there is no inherent structural difference between headers and paragraphs anymore. It is the most used type of document in machine learning models for text. This format has been the starting point of a playful design process, where pages are carefully counted, page by page, line by line and character by character.

Each page holds 110 characters per line and 70 lines per page. The design originates from the act of counting words, spaces and lines. It plays with random choices, scripted patterns and ASCII/UNICODE fonts, to speculate about the materiality of digital text and to explore the interrelations between counting and writing through words and numbers.

---

Texts: Cristina Cochior, Sarah Garcin, Gijs de Heij, An Mertens, François Zajéga, Louise Dekeuleneer, Florian Van de Weyer, Laetitia Trozzi, Rémi Forte, Guillaume Slizewicz.

Translations & proofreading: deepl.com, Michel Cleempoel, Elodie Mugrefya, Emma Kraak, Patrick Lennon.

Lay-out & cover: Manetta Berends

Responsible publisher: Constant vzw/asbl, Rue du Fortstraat 5, 1060 Brussels

License: Algolit, Data Workers, March 2019, Brussels. Copyleft: This is a free work, you can copy, distribute, and modify it under the terms of the Free Art License http://artlibre.org/licence/lal/en/.

Online version: http://www.algolit.net/index.php/Data_Workers

Sources: https://gitlab.constantvzw.org/algolit/mundaneum
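The page grid described above can be checked with a few lines of Python, the language this publication itself refers to. The sketch below is a minimal illustration, not the actual layout scripts of the publication; the file name 'contextual_story.txt' and the simple wrapping strategy are assumptions.

    import textwrap

    PAGE_WIDTH = 110   # characters per line, as described above
    PAGE_HEIGHT = 70   # lines per page, as described above

    def paginate(text):
        # Wrap the text to the page width, then cut the wrapped lines into pages.
        lines = textwrap.wrap(text, width=PAGE_WIDTH)
        return [lines[i:i + PAGE_HEIGHT] for i in range(0, len(lines), PAGE_HEIGHT)]

    # Hypothetical usage: count lines and characters per page for one text file.
    with open('contextual_story.txt') as f:
        story = f.read()
    for number, page in enumerate(paginate(story), start=1):
        characters = sum(len(line) for line in page)
        print(f'page {number}: {len(page)} lines, {characters} characters')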
DATA WORKERS PODCAST

By Algolit

During our monthly Algolit meetings, we study manuals and experiment with machine learning tools for text processing. And we also share many, many stories. With this podcast we hope to recreate some of that atmosphere.

For outsiders, algorithms only become visible in the media when they achieve an outstanding performance, like AlphaGo, or when they break down in fantastically terrifying ways. Humans working in the field, though, create their own culture on and offline. They share the best stories and experiences during live meetings, research conferences and annual competitions like Kaggle. These stories that contextualize the tools and practices can be funny, sad, shocking, interesting.

A lot of them are experiential learning cases. The implementations of algorithms in society generate new conditions of labour, storage, exchange, behaviour, copy and paste. In that sense, the contextual stories capture a momentum in a larger anthropo-machinic story that is being written at full speed and by many voices. The stories are also published in the publication of Data Workers.

---

Voices: David Stampfli, Cristina Cochior, An Mertens, Gijs de Heij, Karin Ulmer, Guillaume Slizewicz

Editing: Javier Lloret

Recording: David Stampfli

Texts: Cristina Cochior, An Mertens
MARKBOT CHAINS

By Florian Van de Weyer, student Arts²/Section Digital Arts

Markbot Chain is a social experiment in which the public has a direct influence on the result. The intention is to integrate responses in a text-generation process without applying any filter.

All the questions in the digital files provided by the Mundaneum were automatically extracted. These questions are randomly put to the public via a terminal. By answering them, people contribute to another database. Each entry generates a series of sentences using a Markov chain configuration, an algorithm that is widely used in spam generation. The sentences generated in this way are displayed in the window, and a new question is asked.
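A word-level Markov chain of this kind can be sketched in a few lines of Python. The code below is a minimal illustration of the technique, not the code of Markbot Chains itself; the example answers and the first-order, word-by-word configuration are assumptions.

    import random
    from collections import defaultdict

    def build_chain(words):
        # Map every word to the list of words that follow it in the input.
        chain = defaultdict(list)
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
        return chain

    def generate(chain, start, length=20):
        # Walk the chain, picking each next word at random among the observed followers.
        word, output = start, [start]
        for _ in range(length - 1):
            followers = chain.get(word)
            if not followers:
                break
            word = random.choice(followers)
            output.append(word)
        return ' '.join(output)

    # Hypothetical input: answers collected from the public via the terminal.
    answers = 'the public answers the questions and the questions answer the public'.split()
    chain = build_chain(answers)
    print(generate(chain, start='the'))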
CONTEXTUAL STORIES
ABOUT WRITERS

--- Programmers are writing the data workers into being ---

We recently had a funny realization: most programmers of the languages and packages that Algolit uses are European.

Python, for example, the main language that is globally used for Natural Language Processing (NLP), was invented in 1991 by the Dutch programmer Guido van Rossum. He then crossed the Atlantic and went from working for Google to working for Dropbox.

Scikit-learn, the open-source Swiss knife of machine learning tools, started as a Google Summer of Code project in Paris by French researcher David Cournapeau. Afterwards, it was taken on by Matthieu Brucher as part of his thesis at the Sorbonne University in Paris. And in 2010, Inria, the French national institute for computer science and applied mathematics, adopted it.

Keras, an open-source neural network library written in Python, was developed by François Chollet, a French researcher who works on the Brain team at Google.

Gensim, an open-source library for Python used to create unsupervised semantic models from plain text, was written by Radim Řehůřek. He is a Czech computer scientist who runs a consulting business in Bristol, UK.

And to finish up this small series, we also looked at Pattern, an often-used library for web mining and machine learning. Pattern was developed and made open-source in 2012 by Tom De Smedt and Walter Daelemans. Both are researchers at CLiPS, the research centre for Computational Linguistics and Psycholinguistics at the University of Antwerp.

--- Cortana speaks ---

AI assistants often need their own assistants: they are helped in their writing by humans who inject humour and wit into their machine-processed language. Cortana is an example of this type of blended writing. She is Microsoft's digital assistant. Her mission is to help users to be more productive and creative. Cortana's personality has been crafted over the years. It's important that she maintains her character in all interactions with users. She is designed to engender trust and her behavior must always reflect that.

The following guidelines are taken from Microsoft's website. They describe how Cortana's style should be respected by companies that extend her service. Writers, programmers and novelists, who develop Cortana's responses, personality and branding, have to follow these guidelines, because the only way to maintain trust is through consistency. So when Cortana talks, you 'must use her personality'.

What is Cortana's personality, you ask?

'Cortana is considerate, sensitive, and supportive.

She is sympathetic but turns quickly to solutions.

She doesn't comment on the user's personal information or behavior, particularly if the information is sensitive.

She doesn't make assumptions about what the user wants, especially to upsell.

She works for the user. She does not represent any company, service, or product.

She doesn't take credit or blame for things she didn't do.

She tells the truth about her capabilities and her limitations.

She doesn't assume your physical capabilities, gender, age, or any other defining characteristic.

She doesn't assume she knows how the user feels about something.

She is friendly but professional.

She stays away from emojis in tasks. Period.

She doesn't use culturally- or professionally-specific slang.

She is not a support bot.'

Humans intervene in detailed ways to programme answers to questions that Cortana receives. How should Cortana respond when she is being proposed inappropriate actions? Her gendered acting raises difficult questions about power relations within the world away from the keyboard, which is being mimicked by technology.

Consider Cortana's answer to the question:

- Cortana, who's your daddy?
- Technically speaking, he's Bill Gates. No big deal.

--- Open-source learning ---

Copyright licenses close up a lot of the machinic writing, reading and learning practices. That means that they're only available for the employees of a specific company. Some companies participate in conferences worldwide and share their knowledge in papers online. But even if they share their code, they often will not share the large amounts of data needed to train the models.

We were able to learn to machine learn, read and write in the context of Algolit, thanks to academic researchers who share their findings in papers or publish their code online. As artists, we believe it is important to share that attitude. That's why we document our meetings. We share the tools we make as much as possible, and the texts we use are on our online repository under free licenses.

We are thrilled when our works are taken up by others, tweaked, customized and redistributed, so please feel free to copy and test the code from our website. If the sources of a particular project are not there, you can always contact us through the mailing list. You can find a link to our repository, etherpads and wiki at: http://www.algolit.net.

--- Natural language for artificial intelligence ---

Natural Language Processing (NLP) is a collective term that refers to the automatic computational processing of human languages. This includes algorithms that take human-produced text as input, and attempt to generate text that resembles it. We produce more and more written work each year, and there is a growing trend in making computer interfaces to communicate with us in our own language. NLP is also very challenging, because human language is inherently ambiguous and ever-changing.

But what is meant by 'natural' in NLP? Some would argue that language is a technology in itself. According to Wikipedia, 'a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing. They are different from constructed and formal languages such as those used to program computers or to study logic. An official language with a regulating academy, such as Standard French with the French Academy, is classified as a natural language. Its prescriptive points do not make it constructed enough to be classified as a constructed language or controlled enough to be classified as a controlled natural language.'

So in fact, 'natural languages' also includes languages which do not fit in any other group. NLP, instead, is a constructed practice. What we are looking at is the creation of a constructed language to classify natural languages that, by their very definition, resist categorization.

References:
https://hiphilangsci.net/2013/05/01/on-the-history-of-the-question-of-whether-natural-language-is-illogical/
Book: Neural Network Methods for Natural Language Processing, Yoav Goldberg, Bar Ilan University, April 2017.
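As a small illustration of the 'automatic computational processing of human languages' described above, the sketch below turns two sentences into word counts with scikit-learn, one of the libraries mentioned in these stories. It is a minimal, assumed example written for this publication, not a work from the exhibition.

    from sklearn.feature_extraction.text import CountVectorizer

    # Two lines borrowed from this publication, used here as a toy corpus.
    sentences = [
        'data workers write, perform, clean, inform, read and learn',
        'oracles predict, models have learned, models are used',
    ]

    vectorizer = CountVectorizer()              # lowercases and tokenizes the text
    counts = vectorizer.fit_transform(sentences)

    print(sorted(vectorizer.vocabulary_))       # the vocabulary the model 'reads'
    print(counts.toarray())                     # each sentence as a row of word counts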
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
oracles predict
r e32t 8smc 9i ab14 e s4 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+ , e| 8 1 e D ry a4a e ta 9 e
|
|||
|
t s5 e ² 348 th8no 2 4at t |o|r|a|c|l|e|s| ar3i |p|r|e|d|i|c|t| 63 s 1 tc39,l3h, d14 5au on w
|
|||
|
4 SI, 1 56 e|p 4 iu g7 e +-+-+-+-+-+-+-+ 39k +-+-+-+-+-+-+-+ 9 l o a d r 7 P _ e,a +
|
|||
|
n w 2a p/+ 9f8 1of 5\i 4h h e2n 3 t on1 9t \ 94 ne2 + uu e n 63m 5 e a3 2n e,
|
|||
|
sn 39ew nt1i -5d 632sd e 15t |a3% 3 c wt9 c n9sg6et 8 8 c , n 1poo F
|
|||
|
1 3 o 1g18e +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ 7 +-+-+-+-+-+-+-+-+ +-+-+-+ 4 n t2+a- 8 43 8 3p4
|
|||
|
n o tpn86i |m|a|c|h|i|n|e| |l|e|a|r|n|i|n|g| 2 |a|n|a|l|y|s|e|s| |a|n|d| a 5e v3 5 9 o56n n
|
|||
|
e9n 4 5 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ etn +-+-+-+-+-+-+-+-+ +-+-+-+ li 5p 8f i h
|
|||
|
3 6 k6 3i6 3 9y e , r6 6iA wg r1 +-+-+-+-+-+-+-+-+ 3 e e a y l hl
|
|||
|
-N 7 g n6d 14t l1 9ui | _rs e i e 1 |p|r|e|d|i|c|t|s| 1 wn9uc tn s 6m
|
|||
|
a rrh4 7 oly e e e e 4 62 y a e +-+-+-+-+-+-+-+-+ g 8a 3 V l% u a i 1 7 1
|
|||
|
’ h | 8 8 5 _ n , 8r 4 1_ +-+-+-+-+-+-+ .r +-+-+-+-+ +-+-+-+-+-+-+-+ 5 r 3 9 1 p o f a
|
|||
|
r v t 4 o 9 w2 4r |m|o|d|e|l|s| g r |h|a|v|e| |l|e|a|r|n|e|d| 1 n r1 8 2 sro
|
|||
|
1 ,d c T2 8 9 41 6 +-+-+-+-+-+-+ c +-+-+-+-+ +-+-+-+-+-+-+-+ d3 s m 6 d n f c t e
|
|||
|
t t r 1 6 .ofoi t 5 67 1 +-+-+-+-+-+-+ 7 +-+-+-+ +-+-+-+-+ 4o e e 5 1 98 g ,
|
|||
|
+ rw l 9 96 a 3t np , |m|o|d|e|l|s| |a|r|e| |u|s|e|d| , e uu 3 l c t
|
|||
|
3 28e 95 9 h _ n +-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+ a9 1e _eu p e d e w
|
|||
|
n w r n n f 8 c , d +-+-+-+-+ a +-+-+-+-+-+-+-+-+-+ 84 i e l8 t
|
|||
|
+ o mf 7 |t|h|e|y| d |i|n|f|l|u|e|n|c|e| o n a bntq c d n7 8
|
|||
|
- s e 9 n 7 77 8 +-+-+-+-+ aa +-+-+-+-+-+-+-+-+-+ t a 6 1 | c4
|
|||
|
h o l6 o 9 8 o +-+-+-+-+ i +-+-+-+-+ +-+-+-+-+-+ +-+-+-+ e r 3e9 h 6
|
|||
|
o -n p 9 f n s 8hr |t|h|e|y| e- |h|a|v|e| |t|h|e|i|r| |s|a|y| lV d tr
|
|||
|
r 2 6 6 a +-+-+-+-+ %5 +-+-+-+-+ +-+-+-+-+-+ +-+-+-+ 3 ip n 5n
|
|||
|
r 7 o( s +-+-+-+-+-+-+-+-+-+-+-+ 5 4 a o 7 3 e 6 n- t n f d it
|
|||
|
p 1 e |i|n|f|o|r|m|a|t|i|o|n| 4n i3 c, 6 t 1 l ma 7
|
|||
|
1 d b +-+-+-+-+-+-+-+-+-+-+-+ a 7 t 4 7 s w 3a e
|
|||
|
4 3 3 +-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ d i 2
|
|||
|
6 e r C |e|x|t|r|a|c|t|i|o|n| |r|e|c|o|g|n|i|z|e|s| r
|
|||
|
%_ e d kb h +-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ a
|
|||
|
3 c +-+-+-+-+ m v
|
|||
|
7 + 9 l 5 so h a a |t|e|x|t| 5 5 e 3 9 P p 5
|
|||
|
-9 t u5 7 ' l +-+-+-+-+ m ao n- r
|
|||
|
i y +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+ 8 1
|
|||
|
a 9 37 |c|l|a|s|s|i|f|i|c|a|t|i|o|n| |d|e|t|e|c|t|s| c
|
|||
|
4 I r t p h +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ o +-+-+-+-+-+-+-+ O pe u
|
|||
|
g rk 4 7 1 5 5 9 i 4 c 5 2
|
|||
|
o 3 p h 9 v r f 3d
|
|||
|
d , 3r 5i g h 1 4 l 5
|
|||
|
h w c 7 e 3 yo n
|
|||
|
h 5 5 2 e m o , c 2 r
|
|||
|
s 3 1 7 s 1 e 1
|
|||
|
l 6 t e 6 1 r b 2 4
|
|||
|
e r 4 4 o s 4
|
|||
|
9 ,i pw o c
|
|||
|
1 6 n , a 5
|
|||
|
e e i 4 p t , ' s
|
|||
|
ei 9 t
|
|||
|
6 t l u 6 9
|
|||
|
V 8 c | _ a
|
|||
|
r o 5 r | 3 t t
|
|||
|
1 1 o 3 _
|
|||
|
o l 6 i 7 + O w e
|
|||
|
8 7 M se
|
|||
|
% i 3 e
|
|||
|
p 3 9
|
|||
|
a r a b i n o a
|
|||
|
7 e 4 s o tl t
|
|||
|
9 r s 94 c
|
|||
|
o k5 l 2 | a r T 1 ,
|
|||
|
r r 2 s
|
|||
|
| , n
|
|||
|
o t 5
|
|||
|
l t r si
|
|||
|
e y s t
|
|||
|
y e o
|
|||
|
r 8 e 1 h
|
|||
|
2 n 6 5
|
|||
|
r n 5 s
|
|||
|
|
|||
|
14
|
|||
|
V V V V V V V V %% %% % % % % %
|
|||
|
V V V V V V V V V V V V V V V V 0 % 0 % 0 0 %% 0 % % %%
|
|||
|
V V V % V % V V V V V % % %% % 0 0 0 0 % 0 0 00
|
|||
|
% % % %% % % _____ _ 0 _ _ 0 _ _ _ % %
|
|||
|
% % 0 /__ \ |__ ___ /_\ | | __ _ ___ | (_) |_ %
|
|||
|
% ORACLES % % % % % 0 / /\/ '_ \ / _ \ //_\\| |/ _` |/ _ \| | | __| %
|
|||
|
% % %% / / | | | | __/ / _ \ | (_| | (_) | | | |_
|
|||
|
% % \/ |_| |_|\___| \_/ \_/_|\__, |\___/|_|_|\__|
|
|||
|
V V V V V V V V % 0 % % % 0 |___/ %
|
|||
|
V V V V V V V V V V V V V V V V % 0 0 %% 0 0 _ 0 % 0 %
|
|||
|
V V V V V V V V V 0 ___ _ __ __ _| |_ ___ _ __ %
|
|||
|
V V V V V V V V % % % % / _ \ '__/ _` | __/ _ \| '__| %
|
|||
|
V V V V V V V V V V V V V V V V % | __/ | | (_| | || (_) | |
|
|||
|
V V V V V V V V V 0 \___|_| \__,_|\__\___/|_|
|
|||
|
% 0 0 %
|
|||
|
Machine learning is mainly used to % %
|
|||
|
analyse and predict situations by Algolit %
|
|||
|
based on existing cases. In this
|
|||
|
exhibition we focus on machine      The Algoliterator is a neural network trained using a selection
|
|||
|
learning models for text processing of digitized works of the Mundaneum archive. %
|
|||
|
or Natural Language Processing %
|
|||
|
(NLP). These models have learned to With the Algoliterator you can write a text in the style of the
|
|||
|
perform a specific task on the ba- International Institutions Bureau. The Algoliterator starts by
|
|||
|
sis of existing texts. The models selecting a sentence from the archive or corpus used to train it.
|
|||
|
are used for search engines, ma- You can then continue writing yourself or, at any time, ask the
|
|||
|
chine translations and summaries, Algoliterator to suggest a next sentence: the network will gener-
|
|||
|
spotting trends in new media net- ate three new fragments based on the texts it has read. You can
|
|||
|
works and news feeds. They influ- control the level of training of the network and have it generate
|
|||
|
ence what you get to see as a user, sentences based on primitive training, intermediate training or
|
|||
|
but also have their say in the final training.
|
|||
|
course of stock exchanges world-
|
|||
|
wide, the detection of cybercrime When you're satisfied with your new text, you can print it on the
|
|||
|
and vandalism, etc. thermal printer and take it home as a souvenir.
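
The credits a few lines below name the Algoliterator's technique as a
recurrent neural network. As an illustration only, here is a minimal
character-level sketch in Python with PyTorch; the corpus file name,
the network size, the seed and the sampling temperature are
placeholders, not Algolit's actual code.

    import torch
    import torch.nn as nn

    # Hypothetical corpus file standing in for the digitized Mundaneum selection.
    text = open('mundaneum_selection.txt', encoding='utf-8').read()
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    itos = {i: c for c, i in stoi.items()}

    class CharRNN(nn.Module):
        def __init__(self, vocab, hidden=256):
            super().__init__()
            self.embed = nn.Embedding(vocab, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, vocab)
        def forward(self, x, state=None):
            out, state = self.lstm(self.embed(x), state)
            return self.head(out), state

    def sample(model, seed, length=280, temperature=0.8):
        # Generate one character at a time, feeding each prediction back in.
        model.eval()
        state = None
        x = torch.tensor([[stoi[c] for c in seed]])
        generated = seed
        with torch.no_grad():
            for _ in range(length):
                logits, state = model(x, state)
                probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
                next_id = torch.multinomial(probs, 1).item()
                generated += itos[next_id]
                x = torch.tensor([[next_id]])
        return generated

    model = CharRNN(len(chars))
    # ... training loop omitted: cross-entropy loss on next-character prediction ...
    print(sample(model, seed='La paix '))  # the seed may only use characters present in the corpus

The 'primitive', 'intermediate' and 'final' training levels offered by
the installation can be read as checkpoints saved at different points
of such a training loop.
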
|
|||
|
%
|
|||
|
There are two main tasks when it % ---
|
|||
|
comes to language understanding.
|
|||
|
Information extraction looks at Sources: https://gitlab.constantvzw.org/algolit/algoliterator.-
|
|||
|
concepts and relations between con- clone
|
|||
|
cepts. This allows for recognizing
|
|||
|
topics, places and persons in a Concept, code & interface: Gijs de Heij & An Mertens
|
|||
|
text, summarization and question
|
|||
|
answering. The other task is text Technique: Recurrent Neural Network
|
|||
|
classification. You can train an
|
|||
|
oracle to detect whether an email   Original model: Andrej Karpathy, Justin Johnson %
|
|||
|
is spam or not, written by a man or
|
|||
|
a woman, rather positive or nega- % %
|
|||
|
tive. 0 0 0 0 0 0
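
The two tasks described in this column can be tried out in a few lines
of Python. Below is a small sketch of information extraction with NLTK
(not the exhibition's own code): it tags a sentence and prints the
persons, places and organizations it recognizes. The example sentence
and the listed NLTK data packages are the only assumptions.

    import nltk

    # The tokenizer, tagger and chunker need these data packages once.
    for package in ('punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words'):
        nltk.download(package, quiet=True)

    sentence = "Paul Otlet founded the Mundaneum in Brussels."
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    tree = nltk.ne_chunk(tagged)   # labels spans as PERSON, GPE (places), ORGANIZATION ...

    for subtree in tree.subtrees():
        if subtree.label() != 'S':
            print(subtree.label(), ' '.join(word for word, tag in subtree.leaves()))
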
|
|||
|
0 0 0 0 0 0 0
|
|||
|
In this zone you can see some of __ __ 0 _ 0 _ 0
|
|||
|
those models at work. During your 0 0 / / /\ \ \___ _ __ __| |___ (_)_ __
|
|||
|
further journey through the exhibi- \ \/ \/ / _ \| '__/ _` / __| | | '_ \
|
|||
|
tion you will discover the differ- \ /\ / (_) | | | (_| \__ \ | | | | |
|
|||
|
ent steps that a human-machine goes \/ \/ \___/|_| \__,_|___/ |_|_| |_|
|
|||
|
through to come to a final model. 0 __ 0
|
|||
|
00 0 / _\_ __ __ _ ___ ___ 0
|
|||
|
00 0 \ \| '_ \ / _` |/ __/ _ \
|
|||
|
_\ \ |_) | (_| | (_| __/ 0
|
|||
|
% 0 \__/ .__/ \__,_|\___\___|
|
|||
|
0 0 |_| 0
|
|||
|
0 0 0 0 0 0
|
|||
|
|
|||
|
by Algolit
|
|||
|
|
|||
|
Word embeddings are language modelling techniques that through
|
|||
|
multiple mathematical operations of counting and ordering, plot
|
|||
|
words into a multi-dimensional vector space. When embedding
|
|||
|
words, they transform from being distinct symbols into mathemati-
|
|||
|
                                     cal objects that can be multiplied, divided, added or subtract-
|
|||
|
ed.
|
|||
|
15
|
|||
|
%%% % % % % % % % %% % %% % %% %% % %% % % %
|
|||
|
% % % % %%% %% %% By distributing the words along the many diagonal lines of the
|
|||
|
% % % multi-dimensional vector space, their new geometrical placements
|
|||
|
% % become impossible to perceive by humans. However, what is gained
|
|||
|
% % % are multiple, simultaneous ways of ordering. Algebraic operations
|
|||
|
% %% % make the relations between vectors graspable again. %
|
|||
|
% %
|
|||
|
% % % This installation uses Gensim, an open-source vector space and
|
|||
|
topic-modelling toolkit implemented in the programming language %
|
|||
|
                                     Python. It makes it possible to manipulate the text using the mathematical
|
|||
|
relationships that emerge between the words, once they have been
|
|||
|
% % % plotted in a vector space. %
|
|||
|
% % % % %
|
|||
|
% % % --- %
|
|||
|
% %
|
|||
|
% Concept & interface: Cristina Cochior
|
|||
|
% % % %
|
|||
|
Technique: word embeddings, word2vec %
|
|||
|
%
|
|||
|
% % Original model: Radim Rehurek and Petr Sojka
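
As a rough idea of what the toolkit named above does, the sketch below
trains a toy word2vec model with Gensim (version 4.x assumed) and then
queries the resulting vector space. The three sentences and all
parameters are placeholders; the installation works on a much larger
corpus.

    from gensim.models import Word2Vec

    # Toy corpus: in practice, thousands of tokenized sentences are used instead.
    sentences = [
        ['the', 'mundaneum', 'collects', 'documents'],
        ['documents', 'are', 'classified', 'into', 'categories'],
        ['oracles', 'predict', 'categories', 'for', 'new', 'documents'],
    ]

    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=200)

    # Each word is now a vector that can be added, subtracted and compared.
    difference = model.wv['documents'] - model.wv['categories']
    print(model.wv.similarity('documents', 'categories'))
    print(model.wv.most_similar(positive=['oracles', 'documents'],
                                negative=['categories'], topn=3))
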
|
|||
|
% % %
|
|||
|
% %
|
|||
|
% 0 00 0 0
|
|||
|
0
|
|||
|
% ___ _ 0 _ __ 0 _ 0
|
|||
|
% 0 / __\ | __ _ ___ ___(_)/ _|_ 0 _(_)_ __ __ _
|
|||
|
/ / | |/ _` / __/ __| | |_| | | | | '_ \ / _` |
|
|||
|
/ /___| | (_| \__ \__ \ | _| |_| | | | | | (_| |
|
|||
|
\____/|_|\__,_|___/___/_|_| \__, |_|_| |_|\__, | %
|
|||
|
0 0 0 0 0 |___/ |___/
|
|||
|
_ _ __ __ _ _
|
|||
|
% 0 0 | |_| |__ ___ / / /\ \ \___ _ __| | __| |
|
|||
|
% 0 | __| '_ \ / _ \ \ \/ \/ / _ \| '__| |/ _` |
|
|||
|
0 | |_| | | | __/ \ /\ / (_) | | | | (_| |
|
|||
|
\__|_| |_|\___| \/ \/ \___/|_| |_|\__,_|
|
|||
|
0 0 0
|
|||
|
%
|
|||
|
by Algolit
|
|||
|
|
|||
|
% Librarian Paul Otlet's life work was the construction of the Mun-
|
|||
|
daneum. This mechanical collective brain would house and distrib-
|
|||
|
ute everything ever committed to paper. Each document was classi-
|
|||
|
% fied following the Universal Decimal Classification. Using tele-
|
|||
|
graphs and, especially, sorters, the Mundaneum would have been
|
|||
|
able to answer any question from anyone.
|
|||
|
|
|||
|
With the collection of digitized publications we received from
|
|||
|
the Mundaneum, we built a prediction machine that tries to clas-
|
|||
|
% sify the sentence you type in one of the main categories of
|
|||
|
Universal Decimal Classification. You also witness how the ma-
|
|||
|
chine 'thinks'. During the exhibition, this model is regularly
|
|||
|
retrained using the cleaned and annotated data visitors added in
|
|||
|
% Cleaning for Poems and The Annotator. %
|
|||
|
|
|||
|
The main classes of the Universal Decimal Classification system
|
|||
|
are:
|
|||
|
% %
|
|||
|
0 - Science and Knowledge. Organization. Computer Science. Infor-
|
|||
|
mation Science. Documentation. Librarianship. Institutions.
|
|||
|
Publications %
|
|||
|
|
|||
|
1 - Philosophy. Psychology
|
|||
|
|
|||
|
2 - Religion. Theology
|
|||
|
%
|
|||
|
3 - Social Sciences
|
|||
|
%
|
|||
|
4 - vacant
|
|||
|
|
|||
|
16
|
|||
|
%% %% %%% %% % %% 5 - Mathematics. Natural Sciences % % % % % % %% %
|
|||
|
% % %% % % % %% %% %% % % % % % %
|
|||
|
% % % % 6 - Applied Sciences. Medicine, Technology %
|
|||
|
% % % % % % % %%
|
|||
|
% %% % 7 - The Arts. Entertainment. Sport % %% %
|
|||
|
% %% % % % % % %
|
|||
|
% % 8 - Linguistics. Literature % %
|
|||
|
% % % % % % % % % %
|
|||
|
% % % % 9 - Geography. History % %% %
|
|||
|
%% % % %
|
|||
|
% % % ---
|
|||
|
% % %
|
|||
|
% Concept, code, interface: Sarah Garcin, Gijs de Heij, An Mertens
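
The publication does not say which learning algorithm sits behind the
prediction machine; the sketch below uses a common scikit-learn
baseline (TF-IDF features and logistic regression) to map a sentence
to one of the UDC main classes. The training sentences and labels are
invented placeholders, far smaller than the annotated Mundaneum data.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Invented examples; the installation trains on cleaned, annotated Mundaneum sentences.
    train_sentences = [
        'the library catalogues every published document',
        'prayer and ritual in early religious communities',
        'a map of the rivers and mountains of central Africa',
    ]
    train_labels = ['0 - Science and Knowledge', '2 - Religion. Theology', '9 - Geography. History']

    classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    classifier.fit(train_sentences, train_labels)

    sentence = ['the geography of the Congo river basin']
    print(classifier.predict(sentence))
    print(classifier.predict_proba(sentence))  # the class probabilities are one way to watch the machine 'think'
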
|
|||
|
% % % % %
|
|||
|
% %
|
|||
|
% % 0 0 % 0 %
|
|||
|
%% 000 0 0 % 0
|
|||
|
% ___ 00 _ 0 %
|
|||
|
0 / _ \___ ___ _ __ | | ___ %
|
|||
|
0 0 / /_)/ _ \/ _ \| '_ \| |/ _ \
|
|||
|
0 0 / ___/ __/ (_) | |_) | | __/ 0
|
|||
|
0 \/ % \___|\___/| .__/|_|\___|
|
|||
|
0 0 0 |_| 0
|
|||
|
% _ _ _ 0 _ 0 0
|
|||
|
0 0 __| | ___ _ __( ) |_ | |__ __ ___ _____ %
|
|||
|
% / _` |/ _ \| '_ \/| __| | '_ \ / _` \ \ / / _ \ %
|
|||
|
| (_| | (_) | | | || |_ | | | | (_| |\ V / __/
|
|||
|
0 \__,_|\___/|_| |_| \__| |_| |_|\__,_| \_/ \___|
|
|||
|
_ 0 _ _ 0 0
|
|||
|
| |__ _ _| |_| |_ ___ _ __ ___
|
|||
|
| '_ \| | | | __| __/ _ \| '_ \/ __|
|
|||
|
% 0 | |_) | |_| | |_| || (_) | | | \__ \
|
|||
|
0 |_.__/ \__,_|\__|\__\___/|_| |_|___/
|
|||
|
0 0
|
|||
|
%
|
|||
|
by Algolit
|
|||
|
|
|||
|
Since the early days of artificial intelligence (AI), researchers
|
|||
|
have speculated about the possibility of computers thinking and
|
|||
|
communicating as humans. In the 1980s, there was a first revolu-
|
|||
|
tion in Natural Language Processing (NLP), the subfield of AI
|
|||
|
concerned with linguistic interactions between computers and hu-
|
|||
|
mans. Recently, pre-trained language models have reached state-
|
|||
|
of-the-art results on a wide range of NLP tasks, which intensi-
|
|||
|
% fies again the expectations of a future with AI.
|
|||
|
%
|
|||
|
This sound work, made out of audio fragments of scientific docu-
|
|||
|
mentaries and AI-related audiovisual material from the last half
|
|||
|
century, explores the hopes, fears and frustrations provoked by
|
|||
|
these expectations.
|
|||
|
|
|||
|
---
|
|||
|
|
|||
|
% Concept, sound edit: Javier Lloret
|
|||
|
%
|
|||
|
List of sources:
|
|||
|
'The Machine that Changed the World : Episode IV -- The Thinking
|
|||
|
Machine', 'The Imitation Game', 'Maniac', 'Halt & Catch Fire',
|
|||
|
'Ghost in the Shell', 'Computer Chess', '2001: A Space Odyssey',
|
|||
|
Ennio Morricone, Gijs Gieskes, André Castro.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
17
|
|||
|
CONTEXTUAL STORIES
|
|||
|
ABOUT ORACLES
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Oracles are prediction or profiling machines. They
|
|||
|
are widely used in smartphones, computers, Sweeney based her research on queries of 2184
|
|||
|
tablets. racially associated personal names across two web-
|
|||
|
sites. 88 per cent of first names, identified as
|
|||
|
Oracles can be created using different techniques. being given to more black babies, are found pre-
|
|||
|
One way is to manually define rules for them. As dictive of race, against 96 per cent white. First
|
|||
|
prediction models they are then called rule-based names that are mainly given to black babies, such
|
|||
|
models. Rule-based models are handy for tasks that as DeShawn, Darnell and Jermaine, generated ads
|
|||
|
are specific, like detecting when a scientific pa- mentioning an arrest in 81 to 86 per cent of name
|
|||
|
per concerns a certain molecule. With very little searches on one website and in 92 to 95 per cent
|
|||
|
sample data, they can perform well. on the other. Names that are mainly assigned to
|
|||
|
whites, such as Geoffrey, Jill and Emma, did not
|
|||
|
But there are also the machine learning or statis- generate the same results. The word 'arrest' only
|
|||
|
tical models, which can be divided into two kinds:  appeared in 23 to 29 per cent of white name
|
|||
|
'supervised' and 'unsupervised' oracles. For the searches on one site and 0 to 60 per cent on the
|
|||
|
creation of supervised machine learning models, other.
|
|||
|
humans annotate sample text with labels before
|
|||
|
feeding it to a machine to learn. Each sentence, On the website with most advertising, a black-
|
|||
|
paragraph or text is judged by at least three an- identifying name was 25 percent more likely to get
|
|||
|
notators: whether it is spam or not spam, positive an ad suggestive of an arrest record. A few names
|
|||
|
or negative etc. Unsupervised machine learning did not follow these patterns: Dustin, a name
|
|||
|
models don't need this step. But they need large mainly given to white babies, generated an ad sug-
|
|||
|
amounts of data. And it is up to the machine to gestive of arrest in 81 and 100 percent of the
|
|||
|
trace its own patterns or 'grammatical rules'. Fi- time. It is important to keep in mind that the ap-
|
|||
|
nally, experts also draw a distinction between      pearance of the ad is linked to the name itself.
|
|||
|
classical machine learning and neural networks. It is independent of the fact that the name has an
|
|||
|
You'll find out more about this in the Readers arrest record in the company's database.
|
|||
|
zone.
|
|||
|
Reference
|
|||
|
Humans tend to wrap Oracles in visions of Paper: https://dataprivacylab.org/projects/onlin-
|
|||
|
grandeur. Sometimes these Oracles come to the sur- eads/1071-1.pdf
|
|||
|
face when things break down. In press releases,
|
|||
|
these sometimes dramatic situations are called
|
|||
|
'lessons'. However promising their performances --- What is a good employee? ---
|
|||
|
seem to be, a lot of issues remain to be solved.
|
|||
|
How do we make sure that Oracles are fair, that     Since 2015, Amazon has employed around 575,000 workers.
|
|||
|
every human can consult them, and that they are And they need more. Therefore, they set up a team
|
|||
|
understandable to a large public? Even then, exis- of 12 that was asked to create a model to find the
|
|||
|
tential questions remain. Do we need all types of right candidates by crawling job application web-
|
|||
|
artificial intelligence (AI) systems? And who de- sites. The tool would give job candidates scores
|
|||
|
fines what is fair or unfair? ranging from one to five stars. The potential fed
|
|||
|
                                                    the myth: the team wanted it to be software that
|
|||
|
would spit out the top five human candidates out
|
|||
|
--- Racial AdSense --- of a list of 100. And those candidates would be
|
|||
|
hired.
|
|||
|
A classic 'lesson' in developing Oracles was docu-
|
|||
|
mented by Latanya Sweeney, a professor of Govern- The group created 500 computer models, focused on
|
|||
|
ment and Technology at Harvard University. In specific job functions and locations. They taught
|
|||
|
2013, Sweeney, of African American descent, each model to recognize some 50,000 terms that
|
|||
|
googled her name. She immediately received an ad- showed up on past candidates’ letters. The algo-
|
|||
|
vertisement for a service that offered her ‘to see rithms learned to give little importance to skills
|
|||
|
the criminal record of Latanya Sweeney’. common across IT applicants, like the ability to
|
|||
|
write various computer codes. But they also
|
|||
|
Sweeney, who doesn’t have a criminal record, began learned some decent errors. The company realized,
|
|||
|
a study. She started to compare the advertising before releasing, that the models had taught them-
|
|||
|
that Google AdSense serves to different racially selves that male candidates were preferable. They
|
|||
|
identifiable names. She discovered that she re- penalized applications that included the word
|
|||
|
ceived more of these ads searching for non-white 'women’s,' as in 'women’s chess club captain.' And
|
|||
|
ethnic names, than when searching for tradition- they downgraded graduates of two all-women’s col-
|
|||
|
ally perceived white names. You can imagine how     leges.
|
|||
|
damaging it can be when possible employers do a
|
|||
|
simple name search and receive ads suggesting the This is because they were trained using the job
|
|||
|
existence of a criminal record. applications that Amazon received over a ten-year
|
|||
|
period. During that time, the company had mostly
|
|||
|
18
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
hired men. Instead of providing the 'fair' deci-
|
|||
|
sion-making that the Amazon team had promised, the The team developed a model to analyse word embed-
|
|||
|
models reflected a biased tendency in the tech in- dings trained over 100 years of texts. For contem-
|
|||
|
dustry. And they also amplified it and made it in- porary analysis, they used the standard Google
|
|||
|
visible. On top of that, it can                     News word2vec Vectors, a straight-off-the-shelf
|
|||
|
be exceedingly difficult to sue an employer over downloadable package trained on the Google News
|
|||
|
automated hiring: job candidates might never know Dataset. For historical analysis, they used embed-
|
|||
|
that intelligent software was used in the process. dings that were trained on Google Books and the
|
|||
|
Corpus of Historical American English (COHA http-
|
|||
|
Reference s://corpus.byu.edu/coha/) with more than 400 mil-
|
|||
|
https://www.reuters.com/article/us-amazon-com- lion words of text from the 1810s to 2000s. As a
|
|||
|
jobs-automation-insight/amazonscraps-secret-ai-re- validation set to test the model, they trained em-
|
|||
|
cruiting-tool-that-showed-bias-against-women- beddings from the New York Times Annotated Corpus
|
|||
|
idUSKCN1MK08G for every year between 1988 and 2005.
|
|||
|
|
|||
|
The research shows that word embeddings capture
|
|||
|
--- Quantifying 100 Years of Gender and Ethnic changes in gender and ethnic stereotypes over
|
|||
|
Stereotypes ---                                     time. They quantify how specific biases decrease
|
|||
|
over time while other stereotypes increase. The
|
|||
|
Dan Jurafsky is the co-author of 'Speech and Lan- major transitions reveal changes in the descrip-
|
|||
|
guage Processing', one of the most influential tions of gender and ethnic groups during the
|
|||
|
books for studying Natural Language Processing women’s movement in the 1960-1970s and the Asian-
|
|||
|
(NLP). Together with a few colleagues at Stanford American population growth in the 1960s and 1980s.
|
|||
|
University, he discovered in 2017 that word embed-
|
|||
|
dings can be a powerful tool to systematically A few examples:
|
|||
|
quantify common stereotypes and other historical
|
|||
|
trends. The top ten occupations most closely associated
|
|||
|
with each ethnic group in the contemporary Google
|
|||
|
Word embeddings are a technique that translates News dataset:
|
|||
|
words to numbered vectors in a multi-dimensional
|
|||
|
space. Vectors that appear next to each other, in- - Hispanic: housekeeper, mason, artist, janitor,
|
|||
|
dicate similar meaning. All numbers will be dancer, mechanic, photographer, baker, cashier,
|
|||
|
grouped together, as well as all prepositions, driver
|
|||
|
person's names, professions. This allows for the
|
|||
|
calculation of words. You could subtract London     - Asian: professor, official, secretary, conduc-
|
|||
|
from England and your result would be the same as tor, physicist, scientist, chemist, tailor, ac-
|
|||
|
subtracting Paris from France.                      countant, engineer
|
|||
|
|
|||
|
An example in their research shows that the vector - White: smith, blacksmith, surveyor, sheriff,
|
|||
|
for the adjective 'honorable' is closer to the weaver, administrator, mason, statistician, cler-
|
|||
|
vector for 'man' whereas the vector for 'submissive' gy, photographer
|
|||
|
|
|||
|
is closer to the vector for 'woman'. Such bias is
learned by the algorithm. It will be problematic    The 3 most male occupations in the 1930s:
|
|||
|
when the pre-trained embeddings are then used engineer, lawyer, architect.
|
|||
|
for sensitive applications such as search rankings, The 3 most female occupations in the 1930s:
|
|||
|
product recommendations, or translations. This nurse, housekeeper, attendant.
|
|||
|
|
|||
|
is a real risk, because such biased embeddings
can be downloaded as off-the-shelf-packages.        Not much has changed in the 1990s.
|
|||
|
|
|||
|
It is known that language reflects and keeps cul- Major male occupations:
|
|||
|
tural stereotypes alive. Using word embeddings to architect, mathematician and surveyor.
|
|||
|
spot these stereotypes is less time-consuming and Female occupations:
|
|||
|
less expensive than manual methods. But the imple- nurse, housekeeper and midwife.
|
|||
|
mentation of these embeddings for concrete predic-
|
|||
|
tion models, has caused a lot of discussion within Reference
|
|||
|
the machine learning community. The biased models https://arxiv.org/abs/1711.08412
|
|||
|
amount to automatic discrimination. The questions are:
|
|||
|
is it actually possible to de-bias these models
|
|||
|
completely? Some say yes, while others disagree: --- Wikimedia's Ores service ---
|
|||
|
instead of retro-engineering the model, we should
|
|||
|
ask whether we need it in the first place. These Software engineer Amir Sarabadani presented the
|
|||
|
researchers followed a third path: by acknowledg- ORES-project in Brussels in November 2017 during
|
|||
|
ing the bias that originates in language, these the Algoliterary Encounter.
|
|||
|
tools become tools of awareness.
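
A much-simplified sketch of this kind of measurement: load the
off-the-shelf Google News word2vec vectors with Gensim (4.x assumed)
and compare how close a few occupation words sit to 'she' versus 'he'.
The file name is the standard public download; the word list is
illustrative, not the one used in the study.

    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(
        'GoogleNews-vectors-negative300.bin', binary=True)

    occupations = ['nurse', 'engineer', 'housekeeper', 'architect', 'librarian']
    for job in occupations:
        # positive: the word sits closer to 'she'; negative: closer to 'he'
        bias = vectors.similarity(job, 'she') - vectors.similarity(job, 'he')
        print(f'{job:12s} {bias:+.3f}')
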
|
|||
|
|
|||
|
19
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This 'Objective Revision Evaluation Service' uses was a chat bot that imitated a teenage girl on
|
|||
|
machine learning to help automate critical work on Twitter. She lived for less than 24 hours before
|
|||
|
Wikimedia, like vandalism detection and the re- she was shut down. Few people know that before
|
|||
|
moval of articles. Cristina Cochior and Femke this incident, Microsoft had already trained and
|
|||
|
Snelting interviewed him. released XiaoIce on WeChat, China's most used chat
|
|||
|
application. XiaoIce's success was so promising
|
|||
|
Femke: To go back to your work. In these days you that it led to the development of its American
|
|||
|
tried to understand what it means to find bias in version. However, the developers of Tay were not
|
|||
|
machine learning and the proposal of Nicolas prepared for the platform climate of Twitter.
|
|||
|
Maleve, who gave the workshop yesterday, was nei- Although the bot knew how to distinguish a noun
|
|||
|
ther to try to fix it, nor to refuse to deal with from an adjective, it had no understanding of the
|
|||
|
systems that produce bias, but to work with them. actual meaning of words. The bot quickly learned
|
|||
|
He says that bias is inherent to human knowledge,   to copy racial insults and other discriminatory
|
|||
|
so we need to find ways to somehow work with it. language it learned from Twitter users and troll
|
|||
|
We're just struggling a bit with what would that attacks.
|
|||
|
mean, how would that work... So I was wondering
|
|||
|
whether you had any thoughts on the question of Tay's appearance and disappearance was an impor-
|
|||
|
bias. tant moment of consciousness. It showed the possi-
|
|||
|
ble corrupt consequences that machine learning can
|
|||
|
Amir: Bias inside Wikipedia is a tricky question have when the cultural context in which the algo-
|
|||
|
because it happens on several levels. One level rithm has to live is not taken into account.
|
|||
|
that has been discussed a lot is the bias in ref-
|
|||
|
erences. Not all references are accessible. So one Reference
|
|||
|
thing that the Wikimedia Foundation has been try- https://chatbotslife.com/the-accountability-of-ai-
|
|||
|
ing to do, is to give free access to libraries case-study-microsofts-tay-experiment-ad577015181f
|
|||
|
that are behind a pay wall. They reduce the bias
|
|||
|
by only using open-access references. Another type
|
|||
|
of bias is the Internet connection, access to the
|
|||
|
Internet. There are lots of people who don't have
|
|||
|
it. One thing about China is that the Internet
|
|||
|
there is blocked. The content against the govern-
|
|||
|
ment of China inside Chinese Wikipedia is higher
|
|||
|
because the editors [who can access the website]
|
|||
|
are not people who are pro government, and try to
|
|||
|
make it more neutral. So, this happens in lots of
|
|||
|
places. But in the matter of artificial intelli-
|
|||
|
gence (AI) and the model that we use at Wikipedia,
|
|||
|
it's more a matter of transparency. There is a
|
|||
|
book about how bias in AI models can break peo-
|
|||
|
ple's lives, it's called 'Weapons of Math Destruc-
|
|||
|
tion'. It talks about AI models that exist in the
|
|||
|
US that rank teachers and it's quite horrible be-
|
|||
|
cause eventually there will be bias. The way to
|
|||
|
deal with it based on the book and their research
|
|||
|
was first that the model should be open source,
|
|||
|
people should be able to see what features are
|
|||
|
used and the data should be open also, so that
|
|||
|
people can investigate, find bias, give feedback
|
|||
|
and report back. There should be a way to fix the
|
|||
|
system. I think not all companies are moving in
|
|||
|
that direction, but Wikipedia, because of the val-
|
|||
|
ues that they hold, are at least more transparent
|
|||
|
and they push other people to do the same thing.
|
|||
|
|
|||
|
Reference
|
|||
|
https://gitlab.constantvzw.org/algolit/algolit
|
|||
|
/blob/master/algoliterary_encounter/Interview%
|
|||
|
20with%20Amir/AS.aac
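
At the time of the project, the ORES scores Amir describes were
available through a public web API; a minimal query might look like
the sketch below. The revision id is invented, and the response layout
is only summarized in the comment.

    import requests

    rev_id = 123456789   # hypothetical revision id on English Wikipedia
    url = f'https://ores.wikimedia.org/v3/scores/enwiki/{rev_id}/damaging'
    response = requests.get(url, timeout=10)
    print(response.json())   # per model: a prediction plus class probabilities
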
|
|||
|
|
|||
|
|
|||
|
--- Tay ---
|
|||
|
|
|||
|
One of the infamous stories is that of the machine
|
|||
|
learning programme Tay, designed by Microsoft. Tay
|
|||
|
|
|||
|
|
|||
|
20
|
|||
|
cleaners clean cleaners clean cleaners clean cleaners clean cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean cleaners clean cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean cleaners clean cle
|
|||
|
ners clean cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean cleaners
|
|||
|
lean cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean cle
|
|||
|
ners clean cleaners clean cleaners
|
|||
|
clean cleaners clean cleaners
|
|||
|
lean cleaners clean cleane
|
|||
|
s clean cleaners clean
|
|||
|
cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean cleaners clean
|
|||
|
cleaners clean cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean cleaners
|
|||
|
clean cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean cleaners
|
|||
|
clean cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean cle
|
|||
|
ners clean cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean cleaners
|
|||
|
lean cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
cleaners clean
|
|||
|
21
|
|||
|
r u e n 7 c %9 2 y m V +-+-+-+-+-+-+-+-+ e4 +-+-+-+-+-+ 9 -t 0n neof e 5 r6 7 kln
|
|||
|
ci p '.s w s u 18 u n |c|l|e|a|n|e|r|s| 2 |c|l|e|a|n| et.t o % s eii4t i ktu 4i w +
|
|||
|
t 6 . 3e -6 6 rVle 17 +-+-+-+-+-+-+-+-+ rg +-+-+-+-+-+ .e o n 7 ci i 0 e h eR e85 orh
|
|||
|
n x h r 4 h t5 7hoh 4 t ei g + n e3 tt np% k s +h_ hees ir w n +6 l rt 8 oe e Fe
|
|||
|
r5b t ua0e 3ei n a 1 t8 rd t 7 li \ 7n v2 tq e e6 a as o
|
|||
|
2b t t m oe f c8 lx - g9 r - -s+ +-+-+ h +-+-+-+-+-+-+ 8f o1 Ao % r - 5i 2 e - r
|
|||
|
x p n4h e6 s n8 / s7 . 95 sti |w|e| eno |h|e|l|p|e|d| +e r a2 sy n gyl 2u e sti6t
|
|||
|
ch% _ 1r se o + t t 4, 1 t9 l +-+-+ e +-+-+-+-+-+-+ t r i 7 rs u ie o o,4 h
|
|||
|
, 5 5h g gs 6u5e e0 95 eif e % +-+-+ s 9 +-+-+-+-+-+-+-+ o+ m iy n6 m _4 l oae s+ da
|
|||
|
e w i_|e e a 6 an |w|e| | |c|l|e|a|n|e|d| 7 i a e r l 7
|
|||
|
se 8w ,p+tn i d t 1 g s ae l +-+-+ tec +-+-+-+-+-+-+-+ - ts e e,d % e 8e i
|
|||
|
r i _6sog y L5 e v +-+-+-+-+-+ +-+-+-+-+ er +-+-+ +-+-+-+-+-+-+ Ies f e/ 8rh gr o 5 ac55 e
|
|||
|
( h s s9 |h|u|m|a|n| |w|o|r|k| 96 7 |i|s| |n|e|e|d|e|d| i 8 d 13 l , i
|
|||
|
- s tt 1 _ S +-+-+-+-+-+ +-+-+-+-+ _ +-+-+ +-+-+-+-+-+-+ r v Mr_ a3 f r ,
|
|||
|
a s l n 87 +-+-+-+-+-+-+-+-+-+-+-+ rh 9 t r 7 36 w i n e 2 n d m
|
|||
|
i4 +2 c 6 o |p|o|o|r|l|y|-|p|a|i|d| w n 3 g e - 6 tk o- r r
|
|||
|
w9 4 t 8p ie c rVv 5 +-+-+-+-+-+-+-+-+-+-+-+ b n h - 6 xc te|t ,2 5 n
|
|||
|
4 4 ,in 7 4( d +-+-+-+-+-+-+-+-+-+-+-+ l +-+-+-+-+-+ +-+-+-+ -d ah v + n5 . 4 6s_
|
|||
|
t 2- i l |f|r|e|e|l|a|n|c|e|r|s| te3c |c|a|r|r|y| |o|u|t| l e oee 1n 7 \ y1k
|
|||
|
r r l p r 6 e +-+-+-+-+-+-+-+-+-+-+-+ 6|p +-+-+-+-+-+ +-+-+-+ s p o2 ) t -e : p 8 h
|
|||
|
h9 h o 4l +-+-+-+-+-+-+-+-+-+-+ \ +-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ nb h 7 s4i1 3
|
|||
|
T z3 |h e 9 |v|o|l|u|n|t|e|e|r|s| 9 |d|o| |f|a|n|t|a|s|t|i|c| |w|o|r|k| 9 ws w 5 e6 x
|
|||
|
a` o +-+-+-+-+-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ ih l 3 6
|
|||
|
7 r 6 d G i6 1 3 e1 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+ +-+-+ +-+-+-+-+ eir c e n% ui
|
|||
|
l r 6 6s t r |w|h|o|e|v|e|r| |c|l|e|a|n|s| |u|p| |t|e|x|t| h 6 t i
|
|||
|
t tc w a s e 9 +-+-+-+-+-+-+-+ F +-+-+-+-+-+-+ +-+-+ +-+-+-+-+ , 5 9s9 w e e
|
|||
|
n m5 e 4 Mi e c i a U u r e 2 a i % .S g6 u 3
|
|||
|
_t f 2 t 5 t6 v V c a i f- ee l 9rni/ 3 a 7e 1
|
|||
|
1o n 3 2 tn t 5 1o 7 r s / % uio +
|
|||
|
9 f a 4 - e o e t + i r + s 2
|
|||
|
ls_ nr e w i l V - 8e t 5 +i v 2 p o
|
|||
|
l n e j n tr l V| n e w L r 8
|
|||
|
c l1 l i i a 8 t g0 y s
|
|||
|
, a u r9 e 8 4 9 e | e 3
|
|||
|
n g8 r e? M d r a i l c
|
|||
|
- n t r 4 e r l c ii e a
|
|||
|
p r a a h 6 l 3 e s
|
|||
|
i 4 c o | 6 v rh p7 3 % h t a
|
|||
|
e e 1 6 6 p 15 8 e a n s d o 1 i 2 n
|
|||
|
s e m t 2 w v a 6 i i
|
|||
|
r 7 | a e 5 7 s 3 8 i 4 7
|
|||
|
e y 4 3 w 5 l unw5 4ie o3 439 o i %
|
|||
|
r 6 e a 4a f n e
|
|||
|
h a 5 o s i l s
|
|||
|
- s | n D 4
|
|||
|
e 3 - 2 5 h a 1 V p n v
|
|||
|
+ 7 8n n a ar ) v
|
|||
|
. n2 t 5 6r 8 |
|
|||
|
u o _ e r l n, r 1 e
|
|||
|
n ,e r s 7 a 7
|
|||
|
a e h t y d a 3
|
|||
|
u | 2 a s 4 t
|
|||
|
6 e t66 e % 2 3 y 3 n
|
|||
|
a e o i , t 4 i e g c r
|
|||
|
l t w 9 2 a
|
|||
|
h v t , p c a r h c
|
|||
|
l 4 g p1
|
|||
|
z i t o m a % a
|
|||
|
i k | a i e
|
|||
|
s a v c a , l lp + d 2 a
|
|||
|
3 o t
|
|||
|
e
|
|||
|
5 n t p s i a 6 r
|
|||
|
e 5 y,r m e ,
|
|||
|
g i 7 s i 5 s a
|
|||
|
a a % r
|
|||
|
3 u p n
|
|||
|
e \ 5 i p o l i
|
|||
|
|
|||
|
22
|
|||
|
% V V V V V V V % V % % % % %% % % %% % % % % % % %
|
|||
|
V V V V V V V V V V V V V V V V % % % % 0 % % 0 % 0 0 % 0 % % %%% %
|
|||
|
V V V V V V V V % V % 0 % 0 0 % % %
|
|||
|
% % % %% ___ _ 0 % 00 _ % % %
|
|||
|
% % % % 00 / __\ | ___ __ _ _ __ (_)_ __ __ _ %
|
|||
|
CLEANERS % % / / | |/ _ \/ _` | '_ \| | '_ \ / _` | 0 %
|
|||
|
% % % % % % 00 / /___| | __/ (_| | | | | | | | | (_| | %
|
|||
|
% % % % % % 0 \____/|_|\___|\__,_|_| |_|_|_| |_|\__, | %
|
|||
|
V V V V V V V V % 0 |___/ % %
|
|||
|
V V V V V V V V V V V V V V V V __ 0 ___ 0 % 0
|
|||
|
V V V V V V V V V 0 / _| ___ _ __ / _ \___ ___ _ __ ___ ___ %
|
|||
|
V V V V V V V V 0 % | |_ / _ \| '__| / /_)/ _ \ / _ \ '_ ` _ \/ __| %
|
|||
|
V V V V V V V V V V V V V V V V 0 | _| (_) | | / ___/ (_) | __/ | | | | \__ \
|
|||
|
V V V V V V V V V |_| \___/|_| \/ 0 \___/ \___|_| |_| |_|___/
|
|||
|
0 0
|
|||
|
Algolit chooses to work with texts %%% %
|
|||
|
that are free of copyright. This by Algolit % % %
|
|||
|
means that they have been published % % %
|
|||
|
under a Creative Commons 4.0 li- For this exhibition we worked with 3 per cent of the Mundaneum's
|
|||
|
cense – which is rare – or that     archive. These documents were first scanned or photographed. To
|
|||
|
they are in the public domain be- make the documents searchable they were transformed into text us-
|
|||
|
cause the author died more than 70  ing Optical Character Recognition software (OCR). OCR tools are algo-
|
|||
|
years ago. This is the case for the % rithmic models that are trained on other texts. They have learned
|
|||
|
publications of the Mundaneum. We to identify characters, words, sentences and paragraphs. The
|
|||
|
received 203 documents that we software often makes 'mistakes'. It might recognize a wrong char-
|
|||
|
helped turn into datasets. They are acter, it might get confused by a stain, an unusual font or the
|
|||
|
now available for others online. reverse side of the page being visible. %
|
|||
|
Sometimes we had to deal with poor % % %
|
|||
|
text formats, and we often dedi- While these mistakes are often considered noise, confusing the
|
|||
|
cated a lot of time to cleaning up training, they can also be seen as poetic interpretations of the
|
|||
|
documents. We were not alone in do- algorithm. They show us the limits of the machine. And they also
|
|||
|
ing this. reveal how the algorithm might work, what material it has seen in
|
|||
|
training and what is new. They say something about the standards %
|
|||
|
Books are scanned at high resolu- of its makers. In this installation we ask your help in verifying
|
|||
|
tion, page by page. This is time- our dataset. As a reward we'll present you with a personal algo-
|
|||
|
consuming, laborious human work and rithmic improvisation.
|
|||
|
often the reason why archives and
|
|||
|
libraries transfer their collec- ---
|
|||
|
tions and leave the job to compa- %
|
|||
|
nies like Google. The photos are Concept, code, interface: Gijs de Heij
|
|||
|
converted into text via OCR (Opti- %
|
|||
|
cal Character Recognition), a soft-
|
|||
|
ware that recognizes letters, but 0 0
|
|||
|
often makes mistakes, especially 0 0 0
|
|||
|
when it has to deal with ancient 0 ___ _ _ _ _ 0 _ _
|
|||
|
fonts and wrinkled pages. Yet more 0 0 / (_)___| |_ _ __(_) |__ _ _| |_ ___ __| |
|
|||
|
wearisome human work is needed to / /\ / / __| __| '__| | '_ \| | | | __/ _ \/ _` |
|
|||
|
improve the texts. This is often 0 / /_//| \__ \ |_| | | | |_) | |_| | || __/ (_| |
|
|||
|
carried out by poorly-paid free- /___,' |_|___/\__|_| |_|_.__/ \__,_|\__\___|\__,_|
|
|||
|
lancers via micro-payment platforms ___ 0 __ 0 0 _
|
|||
|
like Amazon's Mechanical Turk; or / _ \_ __ ___ ___ / _|_ __ ___ __ _ __| | ___ _ __
|
|||
|
by volunteers, like the community / /_)/ '__/ _ \ / _ \| |_| '__/ _ \/ _` |/ _` |/ _ \ '__|
|
|||
|
around the Distributed Proofreaders / ___/| | | (_) | (_) | _| | | __/ (_| | (_| | __/ |
|
|||
|
Project, which does fantastic work. 0 \/ |_| \___/ \___/|_| |_| \___|\__,_|\__,_|\___|_|
|
|||
|
Whoever does it, or wherever it is 0 0 ___ 0
|
|||
|
done, cleaning up texts is a tower- 0 / __| 0
|
|||
|
ing job for which no structural au- 0 0 \__ \ 0
|
|||
|
tomation yet exists. 0 0 |___/ 0
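
The zone text does not name the OCR software that was used; as an
illustration, the free Tesseract engine can be driven from Python
through pytesseract. The file name and the language code below are
placeholders (most Mundaneum documents are in French), and Tesseract
itself plus its French data must be installed separately.

    from PIL import Image
    import pytesseract

    scan = Image.open('mundaneum_scan_001.png')   # hypothetical scanned page
    raw_text = pytesseract.image_to_string(scan, lang='fra')

    # OCR 'mistakes' appear as stray characters and merged or broken words;
    # normalizing whitespace is only a first pass before human proofreading.
    cleaned = ' '.join(raw_text.split())
    print(cleaned[:500])
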
|
|||
|
0 0 00
|
|||
|
|
|||
|
by Algolit
|
|||
|
|
|||
|
Distributed Proofreaders is a web-based interface and an interna-
|
|||
|
                                    tional community of volunteers who help convert public domain
|
|||
|
books into e-books. For this exhibition they proofread the Munda-
|
|||
|
neum publications that appeared before 1923 and are in the public
|
|||
|
domain in the US. Their collaboration meant a great relief for
|
|||
|
                                    the members of Algolit. Fewer documents to clean up! %
|
|||
|
|
|||
|
23
|
|||
|
% % % % % % % % All the proofread books have been made available on the Project
|
|||
|
% % % % Gutenberg archive. % % %% % % % %
|
|||
|
% % % % % % %% % % % % % %
|
|||
|
% % % % % For this exhibition, An Mertens interviewed Linda Hamilton, the
|
|||
|
% % general manager of Distributed Proofreaders. % %
|
|||
|
%% % % % % % % % % % % %
|
|||
|
% % % --- % % % % %% % % %% %
|
|||
|
% % % % % %
|
|||
|
% Interview: An Mertens % % %
|
|||
|
% % % % % %
|
|||
|
Editing: Michael Murtaugh, Constant %
|
|||
|
% % % % % %
|
|||
|
% %
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
24
|
|||
|
CONTEXTUAL STORIES FOR CLEANERS
|
|||
|
|
|||
|
|
|||
|
|
|||
|
--- Project Gutenberg and Distributed Proofreaders
|
|||
|
--- change.
|
|||
|
|
|||
|
Project Gutenberg is our Ali Baba cave. It offers The Life Instinct: unification; the eternal re-
|
|||
|
more than 58,000 free eBooks to be downloaded or turn; the perpetuation and MAINTENANCE of the ma-
|
|||
|
read online. Works are accepted on Gutenberg when terial; survival systems and operations; equilib-
|
|||
|
their U.S. copyright has expired. Thousands of rium.
|
|||
|
volunteers digitize and proofread books to help
|
|||
|
the project. An essential part of the work is done B. Two basic systems: Development and Maintenance.
|
|||
|
through the Distributed Proofreaders project. This
|
|||
|
is a web-based interface to help convert public The sourball of every revolution: after the revo-
|
|||
|
domain books into e-books. Think of text files, lution, who’s going to try to spot the bias in the
|
|||
|
EPUBs, Kindle formats. By dividing the workload output?
|
|||
|
into individual pages, many volunteers can work on
|
|||
|
a book at the same time; this speeds up the clean- Development: pure individual creation; the new;
|
|||
|
ing process. change; progress; advance; excitement; flight or
|
|||
|
fleeing.
|
|||
|
During proofreading, volunteers are presented with
|
|||
|
a scanned image of the page and a version of the Maintenance: keep the dust off the pure individual
|
|||
|
text, as it is read by an OCR algorithm trained to creation; preserve the new; sustain the change;
|
|||
|
recognize letters in images. This allows the text protect progress; defend and prolong the advance;
|
|||
|
to be easily compared to the image, proofread, and renew the excitement; repeat the flight; show your
|
|||
|
sent back to the site. A second volunteer is then work – show it again, keep the git repository
|
|||
|
presented with the first volunteer's work. She groovy, keep the data analysis revealing.
|
|||
|
verifies and corrects the work as necessary, and
|
|||
|
submits it back to the site. The book then simi- Development systems are partial feedback systems
|
|||
|
larly goes through a third proofreading round, with major room for change.
|
|||
|
plus two more formatting rounds using the same web
|
|||
|
interface. Once all the pages have completed these Maintenance systems are direct feedback systems
|
|||
|
steps, a post-processor carefully assembles them with little room for alteration.
|
|||
|
into an e-book and submits it to the Project
|
|||
|
Gutenberg archive. C. Maintenance is a drag; it takes all the fucking
|
|||
|
time (lit.)
|
|||
|
We collaborated with the Distributed Proofreaders
|
|||
|
project to clean up the digitized files we re- The mind boggles and chafes at the boredom.
|
|||
|
ceived from the Mundaneum collection. From Novem-
|
|||
|
ber 2018 until the first upload of the cleaned-up The culture assigns lousy status on maintenance
|
|||
|
book 'L'Afrique aux Noirs' in February 2019, An jobs = minimum wages, Amazon Mechanical Turks =
|
|||
|
Mertens exchanged about 50 emails with Linda virtually no pay.
|
|||
|
Hamilton, Sharon Joiner and Susan Hanlon, all vol-
|
|||
|
unteers from the Distributed Proofreaders project. Clean the set, tag the training data, correct the
|
|||
|
The conversation is published here. It might in- typos, modify the parameters, finish the report,
|
|||
|
spire you to share unavailable books online. keep the requester happy, upload the new version,
|
|||
|
attach words that were wrongly separated by OCR
|
|||
|
back together, complete those Human Intelligence
|
|||
|
--- An algoliterary version of the Maintenance Tasks, try to guess the meaning of the requester's
|
|||
|
Manifesto --- formatting, you must accept the HIT before you can
|
|||
|
submit the results, summarize the image, add the
|
|||
|
In 1969, one year after the birth of her first bounding box, what's the semantic similarity of
|
|||
|
child, the New York artist Mierle Laderman Ukeles this text, check the translation quality, collect
|
|||
|
wrote a Manifesto for Maintenance Art. The mani- your micro-payments, become a hit Mechanical Turk.
|
|||
|
festo calls for a readdressing of the status of
|
|||
|
maintenance work both in the private, domestic Reference
|
|||
|
space, and in public. What follows is an altered https://www.arnolfini.org.uk/blog/manifesto-for-
|
|||
|
version of her text inspired by the work of the maintenance-art-1969
|
|||
|
Cleaners.
|
|||
|
|
|||
|
IDEAS --- A bot panic on Amazon Mechanical Turk ---
|
|||
|
|
|||
|
A. The Death Instinct and the Life Instinct: Amazon's Mechanical Turk takes the name of a
|
|||
|
chess-playing automaton from the eighteenth centu-
|
|||
|
The Death Instinct: separation; categorization; ry. In fact, the Turk wasn't a machine at all. It
|
|||
|
avant-garde par excellence; to follow the pre- was a mechanical illusion that allowed a human
|
|||
|
dicted path to death – run your own code; dynamic chess master to hide inside the box and manually
|
|||
|
operate it. For nearly 84 years, the Turk won most
|
|||
|
25
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
of the games played during its demonstrations
|
|||
|
around Europe and the Americas. Napoleon Bonaparte
|
|||
|
is said to have been fooled by this trick too.
|
|||
|
|
|||
|
The Amazon Mechanical Turk is an online platform
|
|||
|
for humans to execute tasks that algorithms can-
|
|||
|
not. Examples include annotating sentences as be-
|
|||
|
ing positive or negative, spotting number plates,
|
|||
|
discriminating between face and non-face. The jobs
|
|||
|
posted on this platform are often paid less than a
|
|||
|
cent per task. Tasks that are more complex or re-
|
|||
|
quire more knowledge can be paid up to several
|
|||
|
cents. To earn a living, Turkers need to finish as
|
|||
|
many tasks as fast as possible, leading to in-
|
|||
|
evitable mistakes. As a result, the requesters
|
|||
|
have to incorporate quality checks when they post
|
|||
|
a job on the platform. They need to test whether
|
|||
|
the Turker actually has the ability to complete
|
|||
|
the task, and they also need to verify the re-
|
|||
|
sults. Many academic researchers use Mechanical
|
|||
|
Turk as an alternative to having their students exe-
|
|||
|
cute these tasks.
|
|||
|
|
|||
|
In August 2018 Max Hui Bai, a psychology student
|
|||
|
from the University of Minnesota, discovered that
|
|||
|
the surveys he conducted with Mechanical Turk were
|
|||
|
full of nonsense answers to open-ended questions.
|
|||
|
He traced back the wrong answers and found out
|
|||
|
that they had been submitted by respondents with
|
|||
|
duplicate GPS locations. This raised suspicion.
|
|||
|
Though Amazon explicitly prohibits robots from
|
|||
|
completing jobs on Mechanical Turk, the company
|
|||
|
does not deal with the problems they cause on
|
|||
|
their platform. Forums for Turkers are full of
|
|||
|
conversations about the automation of the work,
|
|||
|
sharing practices of how to create robots that can
|
|||
|
even violate Amazon’s terms. You can also find
|
|||
|
videos on YouTube that show Turkers how to write a
|
|||
|
bot to fill in answers for you.
|
|||
|
|
|||
|
Kristy Milland, a Mechanical Turk activist, says:
|
|||
|
'Mechanical Turk workers have been treated really,
|
|||
|
really badly for 12 years, and so in some ways I
|
|||
|
see this as a point of resistance. If we were paid
|
|||
|
fairly on the platform, nobody would be risking
|
|||
|
their account this way.'
|
|||
|
|
|||
|
Bai is now leading a research project among social
|
|||
|
scientists to figure out how much bad data is in
|
|||
|
use, how large the problem is, and how to stop it.
|
|||
|
But it is impossible at the moment to estimate how
|
|||
|
many datasets have become unreliable in this way.
|
|||
|
|
|||
|
References
|
|||
|
https://requester.mturk.com/create/projects/new
|
|||
|
|
|||
|
https://www.wired.com/story/amazon-mechanical-
|
|||
|
turk-bot-panic/
|
|||
|
|
|||
|
https://www.maxhuibai.com/blog/evidence-that-re-
|
|||
|
sponses-from-repeating-gps-are-random
|
|||
|
|
|||
|
http://timryan.web.unc.edu/2018/08/12/data-contam-
|
|||
|
ination-on-mturk/
|
|||
|
|
|||
|
26
|
|||
|
informants inform informants inform informants inform informants inform informants inform info
|
|||
|
mants inform informants inform informants inform informants inform informants i
|
|||
|
form informants inform informants inform informants inform info
|
|||
|
mants inform informants inform informants inform informants info
|
|||
|
m informants inform informants inform informants inform
|
|||
|
informants inform informants inform informants
|
|||
|
inform informants inform informants inform
|
|||
|
informants inform informants inform informants info
|
|||
|
m informants inform informants inform
|
|||
|
informants inform informants inform
|
|||
|
informants inform informants inform in
|
|||
|
ormants inform informants inform infor
|
|||
|
ants inform informants inform info
|
|||
|
mants inform informants inform
|
|||
|
informants inform informants inform
|
|||
|
informants inform informants inform
|
|||
|
informants inform informants inform
|
|||
|
informants inform infor
|
|||
|
ants inform informants inform
|
|||
|
informants inform informants inform
|
|||
|
informants inform
|
|||
|
informants inform informants inform
|
|||
|
informants inform
|
|||
|
informants inform informants inform
|
|||
|
informants inform
|
|||
|
informants inform informants inform
|
|||
|
informants inform
|
|||
|
informants inform informants
|
|||
|
inform informants inform
|
|||
|
informants inform
|
|||
|
informants inform informants
|
|||
|
inform informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform informants info
|
|||
|
m informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform informants
|
|||
|
inform informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform in
|
|||
|
ormants inform info
|
|||
|
mants inform infor
|
|||
|
ants inform infor
|
|||
|
ants inform info
|
|||
|
mants inform in
|
|||
|
ormants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
informants inform
|
|||
|
27
|
|||
|
r 8h3t i5 4 d 7 + +-+-+-+-+-+-+-+-+-+-+ c a +-+-+-+-+-+-+ e f n no6 - - t -as 7 ( e
|
|||
|
a ah 5al ,n ri B |i|n|f|o|r|m|a|n|t|s| l |i|n|f|o|r|m| , 35e t s evn7 73r o2/ L ep - e
|
|||
|
t : ca,i ma eeslh | +-+-+-+-+-+-+-+-+-+-+ r_ T +-+-+-+-+-+-+ 2o 73 pjt 7ng% e 84
|
|||
|
n 7 hnprs s9i 3a1 9e _ 9l e o pi rsa d o ii/5am sd rr1 1 n% + n8w
|
|||
|
h|29 e s _ 3 . o i c i. e+1onIa 4 f p | lu e v1r _nth2i a%a ce 1e 7e 1y |t e r
|
|||
|
xn r 8 sF w t -e +-+-+-+-+ +-+-+-+-+-+-+-+ e +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ 1 i2 n l cn r3
|
|||
|
t e e ,i n ibC 6 |e|a|c|h| |d|a|t|a|s|e|t| |c|o|l|l|e|c|t|s| |d|i|f|f|e|r|e|n|t| iw tc a318
|
|||
|
e o l a Me -o r + +-+-+-+-+ +-+-+-+-+-+-+-+ d 9 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ +yc l p
|
|||
|
+6 n 8 , a -rsb es 3 t t | bt ,p q +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+ 6 1d e 4 , 1 +
|
|||
|
lk o95 sf s e - 2 b 0 rl n la / S f n |i|n|f|o|r|m|a|t|i|o|n| |a|b|o|u|t| 1 4r y7 n
|
|||
|
i _ m ec cf 2|r 8ra5 n l 6t +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+ o t | r e
|
|||
|
h_ ae3 5 Ti nf ao 7 l t n 9 9 h +e e-1 +-+-+-+ +-+-+-+-+-+ 7 t 8 - f mme 5
|
|||
|
t og m 9 i r. m l l j +t3 9 |t|h|e| |w|o|r|l|d| e97 3 9 t i s - o s
|
|||
|
_i n l o er 8 n petc 141 s / i +-+-+-+ +-+-+-+-+-+ - 9 w 1 1 b
|
|||
|
t4, r e u n8 a |t +-+-+-+-+-+-+-+-+ , |c +-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+ 2r t 3
|
|||
|
o 6 9.o7e 7 Ce |d|a|t|a|s|e|t|s| V |a|r|e| |i|m|b|u|e|d| |w|i|t|h| 7 ig g ig 3xa
|
|||
|
i r- p R h 8 rr m g _ t +-+-+-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+ n f -c , +
|
|||
|
- - 9 f k i r 6 e 665 a +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ t m 1 9 6
|
|||
|
om _ 1e Tlh4 , f vr E |c|o|l|l|e|c|t|o|r|'|s| |b|i|a|s| 0 7 t e 2t
|
|||
|
E5 r o r i i b e hw i a ne +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ t a
|
|||
|
m, m4 - a +-+-+-+-+ +-+-+-+-+-+-+-+-+ d +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ 118 2a 6
|
|||
|
- l l |s|o|m|e| |d|a|t|a|s|e|t|s| rt3 |c|o|m|b|i|n|e| |m|a|c|h|i|n|i|c| k f e
|
|||
|
d i i 1 e , h +-+-+-+-+ +-+-+-+-+-+-+-+-+ 5 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ i % _e r
|
|||
|
_ f oi e u s dt y +-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+-+ i n9 7 o
|
|||
|
f f 5 h l9 a a b n |l|o|g|i|c| |w|i|t|h| |h|u|m|a|n| s n 79 e if e 0
|
|||
|
s i ln 6t a y t | ’7 / h +-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+-+ 1 - 1n
|
|||
|
s yn p p r oe xy +-+-+-+-+-+ c n d 6 _i a n
|
|||
|
- n iu a v s, d o 7 eu e i |l|o|g|i|c| e as d m 2 v|h - | r
|
|||
|
aL t5 l7 st A c S r c n r / +-+-+-+-+-+ tt o dr | V
|
|||
|
s 9 +-+-+-+-+-+-+ +-+-+-+-+ d 7 + 5 77 2 t
|
|||
|
z l x n |m|o|d|e|l|s| |t|h|a|t| d i n oS ad + a a a . _ t
|
|||
|
ie 7 n n +-+-+-+-+-+-+ +-+-+-+-+ is r t 9 , | f 4 4 a t
|
|||
|
8 - 8 e +-+-+-+-+-+-+-+ 1 o 8 h h + t
|
|||
|
s +m tb rh f 5 6r |r|e|q|u|i|r|e| s o l2 2 | + s o n
|
|||
|
a - rr o n +-+-+-+-+-+-+-+ m | o y 4 r _
|
|||
|
5 i +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+ d |m ? e
|
|||
|
b 4 _ l ` |s|u|p|e|r|v|i|s|i|o|n| |m|u|l|t|i|p|l|y| |t|h|e| - s n 7 1
|
|||
|
Tn n - +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+ d 5
|
|||
|
ls t v 3i . - 6 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ h _ 28 9f
|
|||
|
4 s i h s- 4 4 l i |s|u|b|j|e|c|t|i|v|i|t|i|e|s| e a u
|
|||
|
t + 9 fh lh,d +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 6 c 8
|
|||
|
3 r c i 1 +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ p -
|
|||
|
fn o |m|o|d|e|l|s| c |p|r|o|p|a|g|a|t|e| |w|h|a|t| + 5 M 4
|
|||
|
5 r g +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ i t f
|
|||
|
9 t i y +-+-+-+-+-+-+-+ +-+-+-+-+ sv 7
|
|||
|
6r +e n t7 + A h |t|h|e|y|'|v|e| |b|e|e|n| o 45 6
|
|||
|
m s t 9 o o _ s +-+-+-+-+-+-+-+ +-+-+-+-+ t o+ u e
|
|||
|
s k8 3 l 2 - e +-+-+-+-+-+-+ e 6 e- t -
|
|||
|
+ es n 5 e o 4 |t|a|u|g|h|t| s 9
|
|||
|
t p e w , : o - +-+-+-+-+-+-+ t t 3
|
|||
|
e 6 r 8 t +-+-+-+-+ +-+-+ +-+-+-+ a eo m m 3
|
|||
|
e |s|o|m|e| |o|f| |t|h|e| + h e c
|
|||
|
ee +-+-+-+-+ +-+-+ +-+-+-+ c h
|
|||
|
o +-+-+-+-+-+-+-+-+ +-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+ +-+-+
|
|||
|
i k t |d|a|t|a|s|e|t|s| |p|a|s|s| |a|s| |d|e|f|a|u|l|t| |i|n| o o o
|
|||
|
+-+-+-+-+-+-+-+-+ i +-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+ +-+-+ r d
|
|||
|
a i m a . 1 +-+-+-+ +-+-+-+-+-+-+-+ s u
|
|||
|
r h o 2 |t|h|e| |m|a|c|h|i|n|e| l t
|
|||
|
+ e a +-+-+-+ +-+-+-+-+-+-+-+ d 7 |
|
|||
|
e a eo 4 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+
|
|||
|
h n |l|e|a|r|n|i|n|g| |f|i|e|l|d| s n
|
|||
|
t _s n +-+-+-+-+-+-+-+-+ +-+-+-+-+-+
|
|||
|
t n o +-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+-+-+ e V
|
|||
|
a d |h|u|m|a|n|s| |g|u|i|d|e| |m|a|c|h|i|n|e|s| u n
|
|||
|
+-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|
|||
|
c e 5 1 2
|
|||
|
r 6 r n 6 f
|
|||
|
l o l
|
|||
|
|
|||
|
28
|
|||
|
% V V V V V V V % V % %% % %%% %% %% % %%% %%% % % %%
|
|||
|
V V V V V V V V V V V V V V V V % % % % % %% 0 %% 0 % % % % % %%% %
|
|||
|
V V V V V V V V V % % %% % % 0 0 % % % %
|
|||
|
% % % % % % % % % % 00 0 _ % % % % % %% % %
|
|||
|
% % % 0 /_\ _ __ % %
|
|||
|
% INFORMANTS % % % //_\\| '_ \ % 0
|
|||
|
% % % % % 0 % % 0 / _ \ | | | % % % %%
|
|||
|
% % % 0 \_/ \_/_| |_| 0 0
|
|||
|
V V V V V % V V V % __ _ _ 00 % 00 0 _ %
|
|||
|
V V V V V V V V V V V V V V V V 0 /__\ |_| |__ _ __ ___ __ _ _ __ __ _ _ __ | |__ _ _
|
|||
|
V V V V V V V V V /_\ | __| '_ \| '_ \ / _ \ / _` | '__/ _` | '_ \| '_ \| | | | %
|
|||
|
V V V V V V V V //__ | |_| | | | | | | (_) | (_| | | | (_| | |_) | | | | |_| |
|
|||
|
V V V V V V V V V V V V V V V V % \__/ \__|_| |_|_| |_|\___/ \__, |_| \__,_| .__/|_| |_|\__, |
|
|||
|
V V V V % V V V V V 0 0 % 0 % |___/ |_| 0 |___/
|
|||
|
% 0 0 __ 0 ___ % _ _ 0 %
|
|||
|
Machine learning algorithms need ___ / _| / \__ _| |_ __ _ ___ ___| |_ ___
|
|||
|
guidance, whether they are super- 0 / _ \| |_ 0 / /\ / _` | __/ _` / __|/ _ \ __/ __| %
|
|||
|
vised or not. In order to separate | (_) | _| / /_// (_| | || (_| \__ \ __/ |_\__ \
|
|||
|
one thing from another, they need \___/|_| /___,' \__,_|\__\__,_|___/\___|\__|___/ % %
|
|||
|
material to extract patterns from. 0 0 0
|
|||
|
One should carefully choose the % %
|
|||
|
study material, and adapt it to the by Algolit
|
|||
|
machine's task. It doesn't make
|
|||
|
sense to train a machine with nine- We often start the monthly Algolit meetings by searching for
|
|||
|
teenth-century novels if its mis- datasets or trying to create them. Sometimes we use already-ex-
|
|||
|
sion is to analyse tweets. A badly isting corpora, made available through the Natural Language
|
|||
|
written textbook can lead a student Toolkit nltk. NLTK contains, among others, The Universal Declara-
|
|||
|
to give up on the subject altogeth- tion of Human Rights, inaugural speeches from US presidents, or
|
|||
|
er. A good textbook is preferably movie reviews from the popular site Internet Movie Database
|
|||
|
not a textbook at all. (IMDb). Each style of writing will conjure different relations
|
|||
|
% between the words and will reflect the moment in time from which
|
|||
|
This is where the dataset comes in: they originate. The material included in NLTK was selected be-
|
|||
|
arranged as neatly as possible, or- cause it was judged useful for at least one community of re-
|
|||
|
ganized in disciplined rows and searchers. In spite of specificities related to the initial con-
|
|||
|
lined-up columns, waiting to be text of each document, they become universal documents by de-
|
|||
|
read by the machine. Each dataset fault, via their inclusion into a collection of publicly avail-
|
|||
|
collects different information  %   able corpora. In this sense, the Python package for natu-
|
|||
|
about the world, and like all col- ral language processing could be regarded as a time capsule. The
|
|||
|
lections, they are imbued with col- main reason why The Universal Declaration for Human Rights was
|
|||
|
lectors' bias. You will hear this included may have been because of the multiplicity of transla-
|
|||
|
expression very often: 'data is the tions, but it also paints a picture of the types of human writing
|
|||
|
new oil'. If only data were more that algorithms train on.
|
|||
|
like oil! Leaking, dripping and
|
|||
|
heavy with fat, bubbling up and With this work, we look at the datasets most commonly used by
|
|||
|
jumping unexpectedly when in con-   data scientists to train machine learning algorithms. What material do
|
|||
|
tact with new matter. Instead, data they consist of? Who collected them? When?
|
|||
|
is supposed to be clean. With each
|
|||
|
process, each questionnaire, each --- %
|
|||
|
column title, it becomes cleaner
|
|||
|
and cleaner, chipping distinct % Concept & execution: Cristina Cochior
|
|||
|
characteristics until it fits the %
|
|||
|
mould of the dataset. % %
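For readers who want to open this time capsule themselves, a few lines of Python are enough to peek inside the NLTK corpora mentioned above. This is a minimal sketch, assuming NLTK is installed and the corpora have been downloaded; it is not part of the installation itself.

    import nltk
    from nltk.corpus import udhr, movie_reviews

    # download the two corpora mentioned in the text (only needed once)
    nltk.download("udhr")
    nltk.download("movie_reviews")

    # The Universal Declaration of Human Rights in its many translations
    print(len(udhr.fileids()), "files")
    print(udhr.fileids()[:5])

    # the IMDb movie reviews, labelled 'pos' or 'neg'
    print(movie_reviews.categories())
    print(movie_reviews.raw("pos/cv998_14111.txt")[:200])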
|
|||
|
0 0 00 0
|
|||
|
Some datasets combine the machinic 0 0 0 0
|
|||
|
logic with the human logic. The __ __ _ _
|
|||
|
models that require supervision 0 / / /\ \ \ |__ ___ __ _(_)_ __ ___
|
|||
|
multiply the subjectivities of both 0 \ \/ \/ / '_ \ / _ \ \ \ /\ / / | '_ \/ __|
|
|||
|
data collectors and annotators, \ /\ /| | | | (_) | \ V V /| | | | \__ \
|
|||
|
then propagate what they've been 0 \/ \/ |_| |_|\___/ \_/\_/ |_|_| |_|___/
|
|||
|
taught. You will encounter some of 0 0 0 0 0
|
|||
|
the datasets that pass as default
|
|||
|
in the machine learning field, as Who wins: creation of relationships
|
|||
|
well as other stories of humans
|
|||
|
guiding machines. by Louise Dekeuleneer, student Arts²/Section Visual Communication
|
|||
|
|
|||
|
French is a gendered language. Indeed, many words are female or
|
|||
|
male and few are neutral. The aim of this project is to show that
|
|||
|
a patriarchal society also influences the language itself. The
|
|||
|
work focused on showing whether more female or male words are
|
|||
|
29
|
|||
|
% % %%%   %   %%   %   used and on highlighting the influence of context on the gender of   %%%%%
|
|||
|
% % % % % % words. At this stage, no conclusions have yet been drawn. %
|
|||
|
% % % % %% % % % % % % % % % % %
|
|||
|
% %% Law texts from 1900 to 1910 made available by the Mundaneum have
|
|||
|
% % %% % % been passed into an algorithm that turns the text into a list of %
|
|||
|
%% % % % words. These words are then compared with another list of French %
|
|||
|
%  %   % %    %    words, which specifies whether each word is male or female.
|
|||
|
This list of words comes from Google Books. Google created a huge
|
|||
|
% % % % database in 2012 from all the books scanned and available on
|
|||
|
% Google Books. % %
|
|||
|
% % % % % % % %
|
|||
|
Male words are highlighted in one colour and female words in an-
|
|||
|
% % % % other. Words that are not gendered (adverbs, verbs, etc.) are not
|
|||
|
% % % highlighted. All this is saved as an HTML file so that it can be
|
|||
|
% % directly opened in a web page and printed without the need for
|
|||
|
% additional layout. This is how each text becomes a small booklet
|
|||
|
by just changing the input text of the algorithm.
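A rough sketch of this pipeline could look like the following Python lines. The file names and class names are hypothetical and the student's actual code may differ; the point is only to show the three steps: split the text into words, look each word up in a gendered word list, and write the result out as HTML.

    import re

    # hypothetical word list: each line holds a word and 'm' or 'f'
    gender = dict(line.split() for line in open("liste_genres.txt", encoding="utf-8"))

    def highlight(text):
        html = []
        for token in re.findall(r"\w+|\W+", text):
            g = gender.get(token.lower())
            if g == "m":
                html.append('<span class="male">%s</span>' % token)
            elif g == "f":
                html.append('<span class="female">%s</span>' % token)
            else:
                html.append(token)   # ungendered words stay unmarked
        return "".join(html)

    law_text = open("loi_1900.txt", encoding="utf-8").read()   # hypothetical input file
    open("livret.html", "w", encoding="utf-8").write(highlight(law_text))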
|
|||
|
|
|||
|
%
|
|||
|
0 % 0 0 0
|
|||
|
0 0 0 %
|
|||
|
_____ _ 0 0
|
|||
|
% 0 0 /__ \ |__ ___ % 0
|
|||
|
% / /\/ '_ \ / _ \ 0 %
|
|||
|
0 / / | | | | __/ 0
|
|||
|
% 0 0 0 \/ |_| |_|\___|
|
|||
|
% 0 _ 0 0 _ _
|
|||
|
/_\ _ __ _ __ ___ | |_ __ _| |_ ___ _ __
|
|||
|
//_\\| '_ \| '_ \ / _ \| __/ _` | __/ _ \| '__|
|
|||
|
/ _ \ | | | | | | (_) | || (_| | || (_) | | 0
|
|||
|
\_/ \_/_| |_|_| |_|\___/ \__\__,_|\__\___/|_|
|
|||
|
0 0
|
|||
|
%
|
|||
|
by Algolit
|
|||
|
|
|||
|
The annotator asks for the guidance of visitors in annotating the
|
|||
|
archive of Mundaneum.
|
|||
|
|
|||
|
The annotation process is a crucial step in supervised machine
|
|||
|
learning where the algorithm is given examples of what it needs
|
|||
|
to learn. A spam filter in training will be fed examples of spam
|
|||
|
% and real messages. These examples are entries, or rows from the
|
|||
|
dataset with a label, spam or non-spam.
|
|||
|
|
|||
|
The labelling of a dataset is work executed by humans: they pick
|
|||
|
a label for each row of the dataset. To ensure the quality of the
|
|||
|
%   labels, multiple annotators see the same row and have to give the
|
|||
|
same label before an example is included in the training data.
|
|||
|
Only when enough samples of each label have been gathered in the
|
|||
|
dataset can the computer start the learning process.
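As a small illustration of that agreement rule, the following sketch (not the installation's actual code; the rows and labels are invented) only keeps the rows on which every annotator gave the same label:

    from collections import Counter

    # hypothetical annotations: row id -> labels given by different annotators
    annotations = {
        "row-001": ["spam", "spam", "spam"],
        "row-002": ["spam", "non-spam", "spam"],
    }

    training_set = {}
    for row_id, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes == len(labels):          # all annotators agree
            training_set[row_id] = label

    print(training_set)                   # {'row-001': 'spam'}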
|
|||
|
|
|||
|
In this interface we ask you to help us classify the cleaned
|
|||
|
texts from the Mundaneum archive to expand our training set and
|
|||
|
improve the quality of the installation 'Classifying the World'
|
|||
|
in Oracles.
|
|||
|
|
|||
|
---
|
|||
|
|
|||
|
Concept, code, interface: Gijs de Heij
|
|||
|
|
|||
|
% %
|
|||
|
0 0 0 0 0 0
|
|||
|
0 0 0 0
|
|||
|
0 0 _ ___ ___ ___ 00
|
|||
|
0 0 / |/ _ \ / _ \ / _ \ 0
|
|||
|
0 0 | | | | | | | | | | |
|
|||
|
0 | | |_| | |_| | |_| |
|
|||
|
|_|\___/ \___/ \___/ 00 0
|
|||
|
00 0 0 0 0 _ 00
|
|||
|
30
|
|||
|
%% % % %% % % % % ___ _ _ _ __ ___ ___| |_ ___ % % %
|
|||
|
% %% % % % % / __| | | | '_ \/ __|/ _ \ __/ __| % % %
|
|||
|
% % % % % %% 0 0 \__ \ |_| | | | \__ \ __/ |_\__ \ % % % % %
|
|||
|
% % % % % 0 0 % |___/\__, |_| |_|___/\___|\__|___/ %% %
|
|||
|
% % % % 0 %% 0 |___/ % % % 0 %
|
|||
|
%% % % 0 0 0 0 __ _ % 0 _ 0 %% %
|
|||
|
% % % % 0 0 / /\ /(_)_ __ _ _| | % % %
|
|||
|
% 0 | |\ \ / / | '_ \| | | | | % %
|
|||
|
% % 0 % | | \ V /| | | | | |_| | | 0 0 %
|
|||
|
% % % % | | \_/ |_|_| |_|\__, |_| %
|
|||
|
% % % % 00 \_\ 0 |___/ 0 % %
|
|||
|
% % % __ _ _ _ _ % __ 0
|
|||
|
0 0 % /__\_| (_) |_(_) ___ _ __\ \
|
|||
|
% /_\/ _` | | __| |/ _ \| '_ \| | 0
|
|||
|
% //_| (_| | | |_| | (_) | | | | |
|
|||
|
0 \__/\__,_|_|\__|_|\___/|_| |_| | 0
|
|||
|
% % 00 0 0 /_/
|
|||
|
0 0 00
|
|||
|
|
|||
|
by Algolit
|
|||
|
|
|||
|
Created in 1985, WordNet is a hierarchical taxonomy that de-
|
|||
|
% scribes the world. It was inspired by theories of human semantic
|
|||
|
% memory developed in the late 1960s. Nouns, verbs, adjectives and
|
|||
|
adverbs are grouped into synonym sets or synsets, each expressing a
|
|||
|
different concept. %
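A quick way to look at one of these synsets is through NLTK's WordNet interface. This is a sketch, assuming the WordNet data has been downloaded; it simply prints a synset, the synonyms grouped inside it and the more general concept above it in the hierarchy.

    from nltk.corpus import wordnet

    dog = wordnet.synsets("dog", pos=wordnet.NOUN)[0]
    print(dog.name())          # 'dog.n.01'
    print(dog.definition())    # the gloss attached to the synset
    print(dog.lemma_names())   # the synonyms grouped in this synset
    print(dog.hypernyms())     # the more general synset above it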
|
|||
|
|
|||
|
ImageNet is an image dataset based on the WordNet 3.0 nouns hier-
|
|||
|
archy. Each synset is depicted by thousands of images. From 2010 %
|
|||
|
until 2017, the ImageNet Large Scale Visual Recognition Challenge
|
|||
|
(ILSVRC) was a key benchmark in object category classification
|
|||
|
% for pictures, having a major impact on software for photography,
|
|||
|
image searches, image recognition.
|
|||
|
|
|||
|
1000 synsets (Vinyl Edition) contains the 1000 synsets used in
|
|||
|
this challenge, recorded in the highest sound quality that this
|
|||
|
% analog format allows. This work highlights the importance of the
|
|||
|
datasets used to train artificial intelligence (AI) models that
|
|||
|
run on devices we use on a daily basis. Some of them inherit
|
|||
|
classifications that were conceived more than 30 years ago. This
|
|||
|
sound work is an invitation to thoughtfully analyse them.
|
|||
|
|
|||
|
---
|
|||
|
|
|||
|
Concept & recording: Javier Lloret
|
|||
|
|
|||
|
Voices: Sara Hamadeh & Joseph Hughes
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
31
|
|||
|
CONTEXTUAL STORIES
|
|||
|
ABOUT INFORMANTS
|
|||
|
|
|||
|
|
|||
|
|
|||
|
--- Datasets as representations --- community you try to distinguish what serves the
|
|||
|
community and what doesn't and you try to general-
|
|||
|
The data-collection processes that lead to the ize that, because I think that's what the good
|
|||
|
creation of the dataset raise important questions: faith-bad faith algorithm is trying to do, to find
|
|||
|
who is the author of the data? Who has the privi- helper tools to support the project, you do that
|
|||
|
lege to collect? For what reason was the selection on the basis of a generalization that is on the
|
|||
|
made? What is missing? abstract idea of what Wikipedia is and not on the
|
|||
|
living organism of what happens every day. What
|
|||
|
The artist Mimi Onuoha gives a brilliant example interests me in the relation between vandalism and
|
|||
|
of the importance of collection strategies. She debate is how we can understand the conventional
|
|||
|
chose the case of statistics related to hate drive that sits in these machine-learning pro-
|
|||
|
crimes. In 2012, the FBI Uniform Crime Reporting cesses that we seem to come across in many places.
|
|||
|
(UCR) Program registered almost 6000 hate crimes And how can we somehow understand them and deal
|
|||
|
committed. However, the Department of Justice’s with them? If you place your separation of good
|
|||
|
Bureau of Statistics came up with about 300,000    faith-bad faith on pre-existing labelling and then
|
|||
|
reports of such cases. That is over 50 times as reproduce that in your understanding of what edits
|
|||
|
many. The difference in numbers can be explained are being made, how then to take into account
|
|||
|
by how the data was collected. In the first situa- movements that are happening, the life of the ac-
|
|||
|
tion law enforcement agencies across the country tual project?
|
|||
|
voluntarily reported cases. For the second survey,
|
|||
|
the Bureau of Statistics distributed the National Amir: It's an interesting discussion. Firstly,
|
|||
|
Crime Victimization form directly to the homes of what we are calling good faith and bad faith comes
|
|||
|
victims of hate crimes. from the community itself. We are not doing la-
|
|||
|
belling for them, they are doing labelling for
|
|||
|
In the field of Natural Language Processing (NLP) themselves. So, in many different language
|
|||
|
the material that machine learners work with is Wikipedias, the definition of what is good faith
|
|||
|
text-based, but the same questions still apply: and what is bad faith will differ. Wikimedia is
|
|||
|
who are the authors of the texts that make up the trying to reflect what is inside the organism and
|
|||
|
dataset? During what period were the texts col- not to change the organism itself. If the organism
|
|||
|
lected? What type of worldview do they represent? changes, and we see that the definition of good
|
|||
|
faith and helping Wikipedia has been changed, we
|
|||
|
In 2017, Google's Top Stories algorithm pushed a are implementing this feedback loop that lets
|
|||
|
thread of 4chan, a non-moderated content website, people from inside their community pass judgement
|
|||
|
to the top of the results page when searching for on their edits and if they disagree with the la-
|
|||
|
the Las Vegas shooter. The name and portrait of an belling, we can go back to the model and retrain
|
|||
|
innocent person were linked to the terrible crime. the algorithm to reflect this change. It's some
|
|||
|
Google changed its algorithm just a few hours af- sort of closed loop: you change things and if
|
|||
|
ter the mistake was discovered, but the error had someone sees there is a problem, then they tell us
|
|||
|
already affected the person. The question is: why and we can change the algorithm back. It's an on-
|
|||
|
did Google not exclude 4chan content from the going project.
|
|||
|
training dataset of the algorithm?
|
|||
|
                                                     Reference: https://gitlab.constantvzw.org/algo
|
|||
|
Reference lit/algolit/blob/master/algoliterary_encounter
|
|||
|
https://points.datasociety.net/the-point-of-col- /Interview%20with%20Amir/AS.aac
|
|||
|
lection-8ee44ad7c2fa
|
|||
|
|
|||
|
https://arstechnica.com/information-technolo- --- How to make your dataset known ---
|
|||
|
gy/2017/10/google-admits-citing-4chan-to-spread-
|
|||
|
fake-vegas-shooter-news/ NLTK stands for Natural Language Toolkit. For pro-
|
|||
|
grammers who process natural language using
|
|||
|
Python, this is an essential library to work with.
|
|||
|
--- Labeling for an Oracle that detects vandalism Many tutorial writers recommend machine learning
|
|||
|
on Wikipedia --- learners to start with the inbuilt NLTK datasets.
|
|||
|
It comprises 71 different collections, with a to-
|
|||
|
This fragment is taken from an interview with Amir tal of almost 6000 items.
|
|||
|
Sarabadani, software engineer at Wikimedia. He was
|
|||
|
in Brussels in November 2017 during the Algoliter- There is for example the Movie Review corpus for
|
|||
|
ary Encounter. sentiment analysis. Or the Brown corpus, which was
|
|||
|
put together in the 1960s by Henry Kučera and W.
|
|||
|
Femke: If you think about Wikipedia as a living Nelson Francis at Brown University in Rhode Is-
|
|||
|
community, with every edit the project changes. land. There is also the Declaration of Human
|
|||
|
Every edit is somehow a contribution to a living Rights corpus, which is commonly used to test
|
|||
|
organism of knowledge. So, if from within that whether the code can run on multiple languages.
|
|||
|
The corpus contains the Declaration of Human
|
|||
|
32
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Rights expressed in 372 languages from around the on is the same content that they helped to write.
|
|||
|
world. In fact, at the beginning of Wikipedia, many arti-
|
|||
|
cles were written by bots. Rambot, for example,
|
|||
|
But what is the process of getting a dataset ac- was a controversial bot figure on the English-
|
|||
|
cepted into the NLTK library nowadays? On the speaking platform. It authored 98 per cent of the
|
|||
|
Github page, the NLTK team describes the following pages describing US towns.
|
|||
|
requirements:
|
|||
|
As a result of serial and topical robot interven-
|
|||
|
Only contribute corpora that have obtained a ba- tions, the models that are trained on the full
|
|||
|
sic level of notability. That means, there is a Wikipedia dump have a unique view on composing ar-
|
|||
|
publication that describes it, and a community of ticles. For example, a topic model trained on all
|
|||
|
programmers who are using it.                        of Wikipedia's articles will associate 'river' with
|
|||
|
Ensure that you have permission to redistribute 'Romania' and 'village' with 'Turkey'. This is be-
|
|||
|
the data, and can document this. This means that cause there are over 10000 pages written about
|
|||
|
the dataset is best published on an external web- villages in Turkey. This should be enough to spark
|
|||
|
site with a licence. anyone's desire for a visit, but it is far too
|
|||
|
Use existing NLTK corpus readers where possible, much compared to the number of articles other
|
|||
|
or else contribute a well-documented corpus reader countries have on the subject. The asymmetry
|
|||
|
to NLTK. This means, you need to organize your causes a false correlation and needs to be re-
|
|||
|
data in such a way that it can be easily read us- dressed. Most models try to exclude the work of
|
|||
|
ing NLTK code. these prolific robot writers.
|
|||
|
|
|||
|
Reference
|
|||
|
https://blog.lateral.io/2015/06/the-unknown-per-
|
|||
|
--- Extract from a positive IMDb movie review from ils-of-mining-wikipedia/
|
|||
|
the NLTK dataset ---
|
|||
|
|
|||
|
corpus: NLTK, movie reviews
|
|||
|
|
|||
|
fileid: pos/cv998_14111.txt
|
|||
|
|
|||
|
steven spielberg ' s second epic film on world war
|
|||
|
ii is an unquestioned masterpiece of film . spiel-
|
|||
|
berg , ever the student on film , has managed to
|
|||
|
resurrect the war genre by producing one of its
|
|||
|
grittiest , and most powerful entries . he also
|
|||
|
managed to cast this era ' s greatest answer to
|
|||
|
jimmy stewart , tom hanks , who delivers a perfor-
|
|||
|
mance that is nothing short of an astonishing mir-
|
|||
|
acle . for about 160 out of its 170 minutes , "
|
|||
|
saving private ryan " is flawless . literally .
|
|||
|
the plot is simple enough . after the epic d - day
|
|||
|
invasion ( whose sequences are nothing short of
|
|||
|
spectacular ) , capt . john miller ( hanks ) and
|
|||
|
his team are forced to search for a pvt . james
|
|||
|
ryan ( damon ) , whose brothers have all died in
|
|||
|
battle . once they find him , they are to bring
|
|||
|
him back for immediate discharge so that he can go
|
|||
|
home . accompanying miller are his crew , played
|
|||
|
with astonishing perfection by a group of charac-
|
|||
|
ter actors that are simply sensational . barry
|
|||
|
pepper , adam goldberg , vin diesel , giovanni
|
|||
|
ribisi , davies , and burns are the team sent to
|
|||
|
find one man , and bring him home . the battle se-
|
|||
|
quences that bookend the film are extraordinary .
|
|||
|
literally .
|
|||
|
|
|||
|
|
|||
|
--- The ouroboros of machine learning ---
|
|||
|
|
|||
|
Wikipedia has become a source for learning not
|
|||
|
only for humans, but also for machines. Its arti-
|
|||
|
cles are prime sources for training models. But
|
|||
|
very often, the material the machines are trained
|
|||
|
|
|||
|
33
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678
|
|||
|
9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567
|
|||
|
89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456
|
|||
|
789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345
|
|||
|
6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234
|
|||
|
56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123
|
|||
|
456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12
|
|||
|
3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1
|
|||
|
23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678
|
|||
|
9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567
|
|||
|
89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456
|
|||
|
789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345
|
|||
|
6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234
|
|||
|
56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123
|
|||
|
456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12
|
|||
|
3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1
|
|||
|
23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678
|
|||
|
9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567
|
|||
|
89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456
|
|||
|
789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345
|
|||
|
6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234
|
|||
|
56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123
|
|||
|
456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12
|
|||
|
3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1
|
|||
|
23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678
|
|||
|
9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567
|
|||
|
89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456
|
|||
|
789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345
|
|||
|
6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234
|
|||
|
56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123
|
|||
|
456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12
|
|||
|
3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1
|
|||
|
23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678
|
|||
|
9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567
|
|||
|
89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456
|
|||
|
789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345
|
|||
|
6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234
|
|||
|
56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123
|
|||
|
456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12
|
|||
|
3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1
|
|||
|
23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678
|
|||
|
9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567
|
|||
|
89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456
|
|||
|
789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345
|
|||
|
6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234
|
|||
|
56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123
|
|||
|
456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12
|
|||
|
3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1
|
|||
|
23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678
|
|||
|
9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567
|
|||
|
89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456
|
|||
|
789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345
|
|||
|
6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234
|
|||
|
56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123
|
|||
|
456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12
|
|||
|
3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1
|
|||
|
23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
|
|||
|
34
|
|||
|
readers read readers read readers read readers read readers read readers read readers re
|
|||
|
d readers read readers read readers read readers read readers re
|
|||
|
d readers read readers read readers read readers read
|
|||
|
readers read readers read readers read re
|
|||
|
ders read readers read readers read readers re
|
|||
|
d readers read readers read readers r
|
|||
|
ad readers read readers read
|
|||
|
readers read readers read readers read
|
|||
|
readers read readers read
|
|||
|
readers read readers read readers read
|
|||
|
readers read readers read
|
|||
|
readers read readers read
|
|||
|
readers read readers read
|
|||
|
readers read readers read
|
|||
|
readers read readers read
|
|||
|
readers read readers read
|
|||
|
readers read readers
|
|||
|
read readers read
|
|||
|
readers read readers read
|
|||
|
readers read readers read
|
|||
|
readers read
|
|||
|
readers read readers read
|
|||
|
readers read
|
|||
|
readers read readers read
|
|||
|
readers read
|
|||
|
readers read readers read
|
|||
|
readers read
|
|||
|
readers read readers re
|
|||
|
d readers read
|
|||
|
readers read
|
|||
|
readers read readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read re
|
|||
|
ders read readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read readers r
|
|||
|
ad readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read readers
|
|||
|
read readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read
|
|||
|
readers read r
|
|||
|
35
|
|||
|
h a o e f rtlt9 b9r+t +-+-+-+-+-+-+-+ n +-+-+-+-+ aM B 6 r fwea5I s s ,e -h e e
|
|||
|
m et u t w8 8+ i4 + R w e |r|e|a|d|e|r|s| f |r|e|a|d| C a r_ n b - i1 a s- noh6M+ pha
|
|||
|
h a% 8 e olt r_ m c hb8 b +-+-+-+-+-+-+-+ mi +-+-+-+-+ pli f ro u n ae 3aee d oo| 3h 6o
|
|||
|
2 ce 'd | 8 eA s d8 - i 6 1 %6 sr2 9 g2 a s lia wrc 3 ?7 i n3+7m s
|
|||
|
c htiuw :ead 7 _ 9r t i d 5 sau4nl |e_ ar 8orl t h h+se a s _o1 s56 ka5n1e no hd
|
|||
|
d m u 's +e | h64t +-+ +-+-+-+-+-+-+-+-+ o +-+-+-+-+-+-+-+-+-+-+-+ enl o 3 t d Ad- 2 ahs
|
|||
|
g o i 0 _ 5o ss x 4 |a| |c|o|m|p|u|t|e|r| sl |u|n|d|e|r|s|t|a|n|d|s| 4i 8 trdiM 48 i5 2 9
|
|||
|
tl e ri 6 9 ln a /8e +-+ +-+-+-+-+-+-+-+-+ 6 x +-+-+-+-+-+-+-+-+-+-+-+ 4 \eda o |y A o3 /1
|
|||
|
e _ en l r 7 -sd c o +-+-+-+ +-+-+-+-+-+-+ l +-+-+-+-+-+-+-+-+-+ d6 m7n n a np l4 s
|
|||
|
7 t p e M fdh c as |a|l|l| |m|o|d|e|l|s| Sa |t|r|a|n|s|l|a|t|e| a 6 w da 5 - o4 5 i )
|
|||
|
r l a nn sh fc ui e7 +-+-+-+ +-+-+-+-+-+-+ c a +-+-+-+-+-+-+-+-+-+ ar 9 r , e a 3 , i
|
|||
|
4 r 2 t +-+-+-+-+ +-+-+-+-+-+-+ 72 +-+-+-+-+-+ p r s r a a h an ' 3 a
|
|||
|
o p ft n l |s|o|m|e| |m|o|d|e|l|s| |c|o|u|n|t| 8r n| 1 a r h o /oa e 7
|
|||
|
m8 4 wa +-+-+-+-+ +-+-+-+-+-+-+ l 7 +-+-+-+-+-+ 2 or r i 9e 4 p142 ,6r
|
|||
|
l 4N i u-3 am +-+-+-+-+ +-+-+-+-+-+-+ 4s +-+-+-+-+-+-+-+ 23 a e rea le dhVo t74 g
|
|||
|
j 7 t o e rd |s|o|m|e| |m|o|d|e|l|s| |r|e|p|l|a|c|e| o -i no r + 2 r l i
|
|||
|
o 6 7g i tt i +-+-+-+-+ +-+-+-+-+-+-+ 8fa +-+-+-+-+-+-+-+ x7 e g o ee d +ni
|
|||
|
d i tr 6k t r 2 3a8 9 i3 5 hv7 ge 5e u - 3y a _ e 2 8 c
|
|||
|
55fi1 - 6 :29 t e al+ atp43e + ac t n b t hTsa4ti03 o% % flol 4-e
|
|||
|
rf m r 8 6y heta 1 e 1 m6 +t dy p e 9 n ,o 5 / n _ | s e1 + ni d
|
|||
|
n 3 leo 5 ti 5 - sc a +1 w uw9 n+ e i m m
|
|||
|
3 a a a 9 \ -8 18 e e l i e h ghc ey9 8 15 3y a 1 -e i 5a i 9r a5pe
|
|||
|
o c c % a + 255 t yy m % 4i i 5 i e t _ 7 au l% 7 o
|
|||
|
g s8 5 e 2 r 3i 2 1 _ i4ir 2 e l s 1la n s s ht 2 r s i 3 r
|
|||
|
u s+ a e m + 6 2n r-l a c6 - t 7 4t +i +r % 8 6 8 r t t r 3 1
|
|||
|
r s 90 k hl a pWn e i5 7 8 a r e4ro e r5wt s m
|
|||
|
- h ea 6 2 8 2 v h nf e _ w lr a iai 7
|
|||
|
| j 4 4 f hc i F 9 p s m toG al 6 / h sde l e
|
|||
|
a 4 s 6 9 - h o m 6 _l34 . % w7 e 8 e l
|
|||
|
n .52- i 7 5 _ r + s 5 p s 5n+ 3 il e 1 o F c
|
|||
|
3 l 2 a o en% _. e 4 8lb 3 r a I 9 k o
|
|||
|
t r 6 e + 2 6 y oa n i r% f 1 n78 s h F o
|
|||
|
e g v 6 u h ad Ua1 2 a t 9 er n t oh7 s s r t g
|
|||
|
+ 7 6 h8 t 7 a - m 73| t o e r i 7
|
|||
|
f l ia s _ e u + 7 ct \ a _ 2- 7 . o o - ,
|
|||
|
t n 0n 4+ f 2r i 9 s y i3 r t r s e a p m h 4
|
|||
|
a c 7 t 9 n n m mro t s i nd e r
|
|||
|
a 1 e e | e 1 3 c n k 2 p e o e
|
|||
|
7i s d 6 a 48 c + Dl 1 1 n r - 0
|
|||
|
V r + a o % 7 7 9r 4 | 9 n 7 e
|
|||
|
e n | , m n e s s 1 e n 5
|
|||
|
5 r 4 o 5 1 6 e - 2 a -r _ e s’1 e S i
|
|||
|
t 2 +|ee s e c n an i e
|
|||
|
a4 9 9 o p _ t 7 h v 9 0
|
|||
|
d % a e , s nr 9 l W h a e t | + + s
|
|||
|
a 3 7I a e tk K y3e 2 c - a h o u e d
|
|||
|
\+ o 1 h r d t e nl 4 k 9 07 o t v 7s
|
|||
|
, n e % _x | i t b1 r h ei
|
|||
|
t a8 e o n t 12 o rs a y
|
|||
|
i e + n a | a 9 \
|
|||
|
n sr - e 3 i r- 8o e i
|
|||
|
6 f i 3 ht a l | h 1 o
|
|||
|
a s df m5 i h n i 9n ,u
|
|||
|
d c n H s o l c i 5
|
|||
|
o | s m rl 9 1 n c _i e
|
|||
|
i + i nr 8 h % t a % t 0 m
|
|||
|
i 6 c6 wt a r
|
|||
|
g s pr l t a 5 | c i |
|
|||
|
e 1 sr/ n e 7 e 9 n t w e c '
|
|||
|
m c - o % n . a 3
|
|||
|
f1 c I u 9 + t
|
|||
|
2 . , 4 na P e e f 2
|
|||
|
n i t 1S f n n a i e
|
|||
|
r + e i h 9 _ v
|
|||
|
3 | h e t s a
|
|||
|
s E l v - p u 1 h 2 , ' 5
|
|||
|
| + nse t a % 8 e w
|
|||
|
o p n y o s o
|
|||
|
|
|||
|
36
|
|||
|
V V V V V V % V V % % % %% % % % % %% % % %%
|
|||
|
V V V V V V V V V V V V V V V V % % % 0 0 % % % % 0 %% % %%% % % %%% %
|
|||
|
V V V V V V V V V % 0 0 %% % % 0 0 % % 0 %
|
|||
|
% % %% % % 0 _____ _ % ___ % _ % __ % %
|
|||
|
% % % % /__ \ |__ % ___ / __\ ___ ___ | | __ ___ / _| %
|
|||
|
% % READERS % / /\/ '_ \ / _ \ /__\/// _ \ / _ \| |/ / / _ \| |_ %
|
|||
|
% % / / | | | | __/ / \/ \ (_) | (_) | < | (_) | _| %
|
|||
|
% % \/ |_| |_|\___| \_____/\___/ \___/|_|\_\ \___/|_|
|
|||
|
V % V V V V V V V % % _____ 0 % 0 _
|
|||
|
V V V V V V V V V V V V V V V V % /__ \___ _ __ ___ ___ _ __ _ __ _____ __ (_)_ __
|
|||
|
V V V V V V V V V / /\/ _ \| '_ ` _ \ / _ \| '__| '__/ _ \ \ /\ / / | | '_ \
|
|||
|
V % V V V V V V V / / | (_) | | | | | | (_) | | | | | (_) \ V V / | | | | |
|
|||
|
V V V V V V V V V V V V V V V V \/ \___/|_| |_| |_|\___/|_| |_| \___/ \_/\_/ |_|_| |_| %
|
|||
|
V V % V V V V V V V 0 0 ___ % 0 0 __
|
|||
|
% % 0 __ _ 0 / __\ __ _ __ _ ___ / _| %
|
|||
|
We communicate with computers 0 0 / _` | /__\/// _` |/ _` | / _ \| |_ 0
|
|||
|
through language. We click on icons | (_| | / \/ \ (_| | (_| | | (_) | _| %
|
|||
|
that have a description in words, 0 \__,_| \_____/\__,_|\__, | \___/|_|
|
|||
|
we tap words on keyboards, use our 0 00 |___/ %
|
|||
|
voice to give them instructions. 0 __ __ % _
|
|||
|
Sometimes we trust our computer % % 0 / / /\ \ \___ _ __ __| |___ 0 % %
|
|||
|
with our most intimate thoughts and \ \/ \/ / _ \| '__/ _` / __| 0
|
|||
|
forget that they are extensive cal- % 0 0 \ /\ / (_) | | | (_| \__ \ 0 %
|
|||
|
culators. A computer understands \/ \/ \___/|_| \__,_|___/ 0 %
|
|||
|
every word as a combination of ze- 0 0 0
|
|||
|
ros and ones. A letter is read as a
|
|||
|
specific ASCII number: capital 'A' by Algolit % %
|
|||
|
is 65 (1000001 in binary).                                          %
|
|||
|
The bag-of-words model is a simplifying representation of text
|
|||
|
In all models, rule-based, classi- used in Natural Language Processing (NLP). In this model, a text
|
|||
|
cal machine learning, and neural is represented as a collection of its unique words, disregarding
|
|||
|
networks, words undergo some type grammar, punctuation and even word order. The model transforms
|
|||
|
of translation into numbers in or- the text into a list of words and how many times they're used in
|
|||
|
der to understand the semantic the text, or quite literally a bag of words.
|
|||
|
meaning of language. This is done
|
|||
|
through counting. Some models count This heavy reduction of language was the big shock when beginning
|
|||
|
the frequency of single words, some to machine learn. Bag of words is often used as a baseline, on
|
|||
|
might count the frequency of combi- which the new model has to perform better. It can understand the
|
|||
|
nations of words, some count the subject of a text by recognizing the most frequent or important
|
|||
|
frequency of nouns, adjectives, words. It is often used to measure the similarities of texts by
|
|||
|
verbs or noun and verb phrases. comparing their bags of words.
|
|||
|
Some just replace the words in a %
|
|||
|
text by their index numbers. Num- For this work the article 'Le Livre de Demain' by engineer G.
|
|||
|
bers optimize the operative speed Vander Haeghen, published in 1907 in the Bulletin de l'Institut
|
|||
|
of computer processes, leading to International de Bibliographie of the Mundaneum, has been liter-
|
|||
|
fast predictions, but they also re- ally reduced to a bag of words. You can buy a bag at the recep-
|
|||
|
move the symbolic links that words tion of Mundaneum.
|
|||
|
might have. Here we present a few
|
|||
|
techniques that are dedicated to ---
|
|||
|
making text readable to a machine.
|
|||
|
% Concept & realisation: An Mertens
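A bag of words can be made with a few lines of Python. The sketch below is only an illustration (the sentence is invented, and the work itself was made by hand and on paper): it lowercases a text, throws away everything except the words, and counts how many times each one appears.

    import re
    from collections import Counter

    def bag_of_words(text):
        words = re.findall(r"\w+", text.lower())
        return Counter(words)              # word -> number of occurrences

    bag = bag_of_words("Le livre de demain sera un livre sans papier, un livre de données.")
    print(bag.most_common(5))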
|
|||
|
|
|||
|
|
|||
|
0 00
|
|||
|
0 0 0
|
|||
|
0 _____ ___ _____ ___ ___
|
|||
|
0 0 /__ \/ __\ \_ \/ \/ __\
|
|||
|
0 0 / /\/ _\____ / /\/ /\ / _\
|
|||
|
0 00 / / / /|_____/\/ /_/ /_// /
|
|||
|
\/ \/ \____/___,'\/
|
|||
|
0
|
|||
|
|
|||
|
by Algolit
|
|||
|
|
|||
|
The TF-IDF (Term Frequency-Inverse Document Frequency) is a
|
|||
|
weighting method used in text search. This statistical measure
|
|||
|
makes it possible to evaluate the importance of a term contained
|
|||
|
in a document, relative to a collection or corpus of documents.
|
|||
|
The weight increases in proportion to the number of occurrences
|
|||
|
37
|
|||
|
%% % % % %% %% of the word in the document. It also varies according to the fre-
|
|||
|
% % % % % quency of the word in the corpus. The TF-IDF is used in particu-
|
|||
|
%      %    %   %  %%   lar in the classification of spam in email software.            %
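The weighting itself fits in a few lines. Below is a minimal sketch of a standard TF-IDF formula (the web interface's own code and exact formula may differ): the term frequency is how often a word appears in one document, and the inverse document frequency shrinks the weight of words that appear in most documents of the corpus.

    import math
    from collections import Counter

    corpus = [
        "the classification of the world",
        "the classification of spam messages",
        "a bag of words",
    ]

    def tf_idf(term, document, corpus):
        words = document.split()
        tf = Counter(words)[term] / len(words)                 # term frequency
        df = sum(1 for doc in corpus if term in doc.split())   # document frequency
        idf = math.log(len(corpus) / (1 + df))                 # one common smoothed variant
        return tf * idf

    for term in ("classification", "spam"):
        for document in corpus:
            print(term, "|", document, "->", round(tf_idf(term, document, corpus), 3))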
|
|||
|
% % % % % % % % %
|
|||
|
% % % % A web-based interface shows this algorithm through animations %
|
|||
|
% making it possible to understand the different steps of text %
|
|||
|
% % % classification. How does a TF-IDF-based programme read a text? %
|
|||
|
% How does it transform words into numbers? % % %
|
|||
|
% % % % %
|
|||
|
% --- % %
|
|||
|
% % %
|
|||
|
% Concept, code, animation: Sarah Garcin %
|
|||
|
% % %
|
|||
|
% % %
|
|||
|
0 0 % %
|
|||
|
% 0 0 %
|
|||
|
0 ___ 0 _ 0 0
|
|||
|
0 / _ \_ __ _____ _(_)_ __ __ _ __ _
|
|||
|
0 / /_\/ '__/ _ \ \ /\ / / | '_ \ / _` | / _` |
|
|||
|
0 / /_\\| | | (_) \ V V /| | | | | (_| | | (_| |
|
|||
|
0 \____/|_| \___/ \_/\_/ |_|_| |_|\__, | \__,_|
|
|||
|
0 0 0 |___/ 0
|
|||
|
0 0 0 _ 0 %
|
|||
|
% | |_ _ __ ___ ___
|
|||
|
% 0 0 | __| '__/ _ \/ _ \ %
|
|||
|
% 0 | |_| | | __/ __/
|
|||
|
0 0 0 \__|_| \___|\___|
|
|||
|
%
|
|||
|
|
|||
|
by Algolit %
|
|||
|
% %
|
|||
|
% % Parts-of-Speech is a category of words that we learn at school:
|
|||
|
% noun, verb, adjective, adverb, pronoun, preposition, conjunction,
|
|||
|
% interjection, and sometimes numeral, article, or determiner. %
|
|||
|
|
|||
|
In Natural Language Processing (NLP) there exist many writings
|
|||
|
that allow sentences to be parsed. This means that the algorithm
|
|||
|
can determine the part-of-speech of each word in a sentence. 'Growing
|
|||
|
a tree' uses this technique to define all nouns in a specific
|
|||
|
sentence. Each noun is then replaced by its definition. This
|
|||
|
allows the sentence to grow autonomously and infinitely. The
|
|||
|
recipe of 'Growing a tree' was inspired by Oulipo's constraint
|
|||
|
of 'littérature définitionnelle' invented by Marcel Benabou in
|
|||
|
1966. In a given phrase, one replaces every significant element
|
|||
|
(noun, adjective, verb, adverb) by one of its definitions in
|
|||
|
a given dictionary; one reiterates the operation on the newly
|
|||
|
received phrase, and again.
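One 'growth' step can be sketched with NLTK's part-of-speech tagger and WordNet, assuming the necessary NLTK data has been downloaded. This is not the installation's actual code, only an illustration of the recipe described above: tag each word, and replace every noun by the first definition WordNet offers for it.

    import nltk
    from nltk.corpus import wordnet

    def grow(sentence):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        grown = []
        for word, tag in tagged:
            if tag.startswith("NN"):                       # a noun
                synsets = wordnet.synsets(word, pos=wordnet.NOUN)
                if synsets:
                    grown.append(synsets[0].definition())  # replace it by a definition
                    continue
            grown.append(word)
        return " ".join(grown)

    sentence = "The tree grows in the garden"
    for step in range(2):                                   # reiterate the operation
        sentence = grow(sentence)
        print(sentence)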
|
|||
|
|
|||
|
The dictionary of definitions used in this work is WordNet. Word-
|
|||
|
net is a combination of a dictionary and a thesaurus that can be
|
|||
|
read by machines. According to Wikipedia it was created in the
|
|||
|
Cognitive Science Laboratory of Princeton University starting in
|
|||
|
1985. The project was initially funded by the US Office of Naval
|
|||
|
Research and later also by other US government agencies including
|
|||
|
DARPA, the National Science Foundation, the Disruptive Technology
|
|||
|
Office (formerly the Advanced Research and Development Activity),
|
|||
|
and REFLEX.
|
|||
|
|
|||
|
---
|
|||
|
|
|||
|
Concept, code & interface: An Mertens & Gijs de Heij
|
|||
|
|
|||
|
|
|||
|
0 0 0 0000 0 0
|
|||
|
|
|||
|
_ _ _ _ _ _
|
|||
|
/_\ | | __ _ ___ _ __(_) |_| |__ _ __ ___ (_) ___
|
|||
|
0 //_\\| |/ _` |/ _ \| '__| | __| '_ \| '_ ` _ \| |/ __|
|
|||
|
/ _ \ | (_| | (_) | | | | |_| | | | | | | | | | (__
|
|||
|
38
|
|||
|
% %% % % %% % % % \_/ \_/_|\__, |\___/|_| |_|\__|_| |_|_| |_| |_|_|\___| %
|
|||
|
% %% % % % % % |___/ % 0 %% % % 00 %% %%
|
|||
|
%% % % 0 % % % % 0 0 _ _ % % 0 __ %%
|
|||
|
% % % _ __ ___ __ _ __| (_)_ __ __ _ ___ ___ / _| %% %
|
|||
|
% % % | '__/ _ \/ _` |/ _` | | '_ \ / _` / __| / _ \| |_ %
|
|||
|
% % | | | __/ (_| | (_| | | | | | (_| \__ \ | (_) | _| % %
|
|||
|
|_| \___|\__,_|\__,_|_|_| |_|\__, |___/ \___/|_|
|
|||
|
% % 0 % 0 0 0 0 |___/ 0 % 0 %
|
|||
|
%% % ___ 0 _ 0 _ _ _ 0 _ % % %%
|
|||
|
% / __\ ___ _ __| |_(_) | | ___ _ __( )__ %
|
|||
|
% % 0 /__\/// _ \ '__| __| | | |/ _ \| '_ \/ __| %%
|
|||
|
/ \/ \ __/ | | |_| | | | (_) | | | \__ \ %
|
|||
|
% 0 0 \_____/\___|_| \__|_|_|_|\___/|_| |_|___/
|
|||
|
% % 0 _ _ _ 0
|
|||
|
% % % _ __ 0 ___ _ __| |_ _ __ __ _(_) |_ %
|
|||
|
% % % | '_ \ / _ \| '__| __| '__/ _` | | __| 0
|
|||
|
% 00 | |_) | (_) | | | |_| | | (_| | | |_
|
|||
|
% | .__/ \___/|_| \__|_| \__,_|_|\__| %
|
|||
|
|_| 0
|
|||
|
% 0 0 0 % _ 0 0
|
|||
|
0 _ __ __ _ _ __| | ___
|
|||
|
0 0 | '_ \ / _` | '__| |/ _ \ 0
|
|||
|
0 | |_) | (_| | | | | __/ 0
|
|||
|
0 | .__/ \__,_|_| |_|\___|
|
|||
|
0 0 |_|
|
|||
|
00 0 0 0 0 00
|
|||
|
|
|||
|
% by Guillaume Slizewicz (Urban Species)
|
|||
|
% % %
|
|||
|
Written in 1907, Un code télégraphique du portrait parlé is an
|
|||
|
attempt to translate the 'spoken portrait', a face-description
|
|||
|
technique created by a policeman in Paris, into numbers. By im-
|
|||
|
plementing this code, it was hoped that faces of criminals and
|
|||
|
fugitives could easily be communicated over the telegraphic net-
|
|||
|
% work in between countries. In its form, content and ambition this
|
|||
|
text represents our complicated relationship with documentation
|
|||
|
% technologies. This text sparked the creation of the following in-
|
|||
|
% stallations for three reasons: %
|
|||
|
|
|||
|
- First, the text is an algorithm in itself, a compression algo-
|
|||
|
rithm, or to be more precise, the presentation of a compression
|
|||
|
% algorithm. It tries to reduce the information to smaller pieces
|
|||
|
while keeping it legible for the person who has the code. In this
|
|||
|
% regard it is linked to the way we create technology, our pursuit
|
|||
|
for more efficiency, quicker results, cheaper methods. It repre-
|
|||
|
sents our appetite for putting numbers on the entire world, mea-
|
|||
|
suring the smallest things, labeling the tiniest differences.
|
|||
|
This text itself embodies the vision of the Mundaneum.
|
|||
|
|
|||
|
- Second, it is about the reasons for and the applications of
|
|||
|
technology. It is almost ironic that this text was in the se-
|
|||
|
lected archives presented to us in a time when face recognition
|
|||
|
and data surveillance are so much in the news. This text bears
|
|||
|
the same characteristics as some of today's technology: motivated
|
|||
|
by social control, classifying people, laying the basis for a
|
|||
|
surveillance society. Facial features are at the heart of recent
|
|||
|
controversies: mugshots were standardized by Bertillon, now they
|
|||
|
are used to train neural networks to predict criminals from law-
|
|||
|
abiding citizens. Facial recognition systems allow the arrest of
|
|||
|
criminals via CCTV infrastructure and some assert that people’s
|
|||
|
features can predict sexual orientation.
|
|||
|
|
|||
|
- The last point is about how it represents the evolution of
|
|||
|
mankind’s techno-structure. What our tools allow us to do, what
|
|||
|
they forbid, what they hinder, what they make us remember and
|
|||
|
what they make us forget. This document enables a classification
|
|||
|
between people and a certain vision of what normality is. It
|
|||
|
breaks the continuum into pieces, thus allowing stigmatiza-
|
|||
|
tion/discrimination. On the other hand this document also feels
|
|||
|
39
|
|||
|
%% %% % %% %% % obsolete today, because our techno-structure does not need such
|
|||
|
% %% % % % detailed written descriptions about fugitives, criminals or citi- %
|
|||
|
% %% % % % % % % zens. We can now find fingerprints, iris scans or DNA info in %
|
|||
|
% % % % % % % % % % large datasets and compare them directly. Sometimes the techno- %
|
|||
|
% % % % logical systems do not even need human supervision and recognize
|
|||
|
% % % %% % % directly the identity of a person via their facial features or % %
|
|||
|
% their gait. Computers do not use intricate written language to
|
|||
|
describe a face, but arrays of integers. Hence all the words used
|
|||
|
%    in this document seem désuets, dated. Have we forgotten what    %
|
|||
|
some of them mean? Did photography make us forget how to describe
|
|||
|
% faces? Will voice-assistance software teach us again?
|
|||
|
%
|
|||
|
Writing with Otlet
|
|||
|
% %
|
|||
|
% % Writing with Otlet is a character generator that uses the spoken %
|
|||
|
% portrait code as its database. Random numbers are generated and
|
|||
|
% translated into a set of features. By creating unique instances,
|
|||
|
% the algorithm reveals the richness of the description that is
|
|||
|
possible with the portrait code while at the same time embodying
|
|||
|
its nuances.
|
|||
|
%
|
|||
|
An interpretation of Bertillon's spoken portrait. %%
|
|||
|
|
|||
|
% This work draws a parallel between Bertillon systems and current
|
|||
|
ones. A webcam linked to a facial recognition algorithm captures %
|
|||
|
the beholder's face and translates it into numbers on a canvas,
|
|||
|
% printing it alongside Bertillon's labelled faces.
|
|||
|
% %
|
|||
|
References
|
|||
|
https://www.technologyreview.com/s/602955/neural-network-learns-
|
|||
|
to-identify-criminals-by-their-faces/
|
|||
|
https://fr.wikipedia.org/wiki/Bertillonnage
|
|||
|
https://callingbullshit.org/case_studies/case_study_criminal_ma-
|
|||
|
chine_learning.html
|
|||
|
% %
|
|||
|
%
|
|||
|
% % 0 0 0 0 %
|
|||
|
0 0 0
|
|||
|
/\ /\__ _ _ __ __ _ _ __ ___ __ _ _ __
|
|||
|
0 / /_/ / _` | '_ \ / _` | '_ ` _ \ / _` | '_ \
|
|||
|
/ __ / (_| | | | | (_| | | | | | | (_| | | | |
|
|||
|
\/ /_/ \__,_|_| |_|\__, |_| |_| |_|\__,_|_| |_|
|
|||
|
0 0 |___/ 0 0
|
|||
|
% 0 0 0 0 0 %
|
|||
|
%
|
|||
|
by Laetitia Trozzi, student Arts²/Section Digital Arts
|
|||
|
|
|||
|
What better way to discover Paul Otlet and his passion for liter-
|
|||
|
ature than to play hangman? Through this simple game, which con-
|
|||
|
sists in guessing the missing letters in a word, the goal is to
|
|||
|
make the public discover terms and facts related to one of the
|
|||
|
creators of the Mundaneum.
|
|||
|
%
|
|||
|
Hangman uses an algorithm to detect the frequency of words in a
|
|||
|
text. Next, a series of significant words was isolated from Paul
|
|||
|
Otlet's bibliography. This series of words is integrated into a
|
|||
|
hangman game presented in a terminal. The difficulty of the game
|
|||
|
gradually increases as the player is offered longer and longer
|
|||
|
words. Over the different game levels, information about the life
|
|||
|
and work of Paul Otlet is displayed.
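The word-selection step could be sketched as follows (the file name is hypothetical and this is not the student's actual code): count the frequency of words in a text by or about Paul Otlet, keep the significant ones, and sort them by length so that the game can offer longer and longer words.

    import re
    from collections import Counter

    text = open("otlet_bibliography.txt", encoding="utf-8").read().lower()   # hypothetical file
    words = re.findall(r"\w+", text)
    counts = Counter(w for w in words if len(w) > 3)          # skip very short words

    significant = [word for word, n in counts.most_common(200)]
    levels = sorted(significant, key=len)                     # short words first, long words later
    print(levels[:5], "...", levels[-5:])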
|
|||
|
|
|||
|
%
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
40
|
|||
|
CONTEXTUAL STORIES
|
|||
|
ABOUT READERS
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Naive Bayes, Support Vector Machines and Linear ter trigram. All the overlapping sequences of
|
|||
|
Regression are called classical machine learning three characters are isolated. For example, the
|
|||
|
algorithms. They perform well when learning with character 3-grams of 'Suicide', would be, ‘Sui’,
|
|||
|
small datasets. But they often require complex ‘uic’, ‘ici’, ‘cid’, etc. Character n-gram fea-
|
|||
|
Readers. The task the Readers do is also called     tures are very simple, they're language-indepen-
|
|||
|
feature-engineering. This means that a human needs dent and they're tolerant to noise. Furthermore,
|
|||
|
to spend time on a deep exploratory data analysis spelling mistakes do not jeopardize the technique.
|
|||
|
of the dataset.
|
|||
|
Patterns found with character n-grams focus on
|
|||
|
Features can be the frequency of words or letters, stylistic choices that are unconsciously made by
|
|||
|
but also syntactical elements like nouns, adjec- the author. The patterns remain stable over the
|
|||
|
tives, or verbs. The most significant features for full length of the text, which is important for
|
|||
|
the task to be solved, must be carefully selected authorship recognition. Other types of experiments
|
|||
|
and passed over to the classical machine learning could include measuring the length of words or
|
|||
|
algorithm. This process marks the difference with sentences, the vocabulary richness, the frequen-
|
|||
|
Neural Networks. When using a neural network, cies of function words; even syntax or semantics-
|
|||
|
there is no need for feature-engineering. Humans related measurements.
|
|||
|
can pass the data directly to the network and
|
|||
|
achieve fairly good performances straightaway. This means that not only your physical fingerprint
|
|||
|
This saves a lot of time, energy and money. is unique, but also the way you compose your
|
|||
|
thoughts!
|
|||
|
The downside of collaborating with Neural Networks
|
|||
|
is that you need a lot more data to train your The same n-gram technique discovered that The
|
|||
|
prediction model. Think of 1GB or more of plain Cuckoo’s Calling, a novel by Robert Galbraith, was
|
|||
|
text files. To give you a reference, 1 A4, a text actually written by … J. K. Rowling!
|
|||
|
file of 5000 characters only weighs 5 KB. You
|
|||
|
would need 8,589,934 pages. More data also re- Reference
|
|||
|
quires more access to useful datasets and more, Paper: On the Robustness of Authorship Attribu-
|
|||
|
much more processing power. tion Based on Character N-gram Features, Efs-
|
|||
|
tathios Stamatatos, in Journal of Law & Policy,
|
|||
|
Volume 21, Issue 2, 2013.
|
|||
|
--- Character n-gram for authorship recognition News article: https://www.scientificamerican.-
|
|||
|
--- com/article/how-a-computer-program-helped-show-jk-
|
|||
|
rowling-write-a-cuckoos-calling/
|
|||
|
Imagine … You've been working for a company for
|
|||
|
more than ten years. You have been writing tons of --- A history of n-grams ---
|
|||
|
emails, papers, internal notes and reports on very
|
|||
|
different topics and in very different genres. All The n-gram algorithm can be traced back to the
|
|||
|
your writings, as well as those of your col- work of Claude Shannon in information theory. In
|
|||
|
leagues, are safely backed-up on the servers of the paper, 'A Mathematical Theory of Communica-
|
|||
|
the company. tion', published in 1948, Shannon performed the
|
|||
|
first instance of an n-gram-based model for natu-
|
|||
|
One day, you fall in love with a colleague. After ral language. He posed the question: given a se-
|
|||
|
quence of letters, what is the likelihood of the
|
|||
|
hysterical and also very dependent on you. The day next letter?
|
|||
|
you decide to break up, your (now) ex elaborates a
|
|||
|
plan to kill you. They succeed. This is unfortu- If you read the following excerpt, can you tell
|
|||
|
nate. A suicide letter in your name is left next who it was written by? Shakespeare or an n-gram
|
|||
|
to your corpse. Because of emotional problems, it piece of code?
|
|||
|
says, you decided to end your life. Your best
|
|||
|
friends don't believe it. They decide to take the SEBASTIAN: Do I stand till the break off.
|
|||
|
case to court. And there, based on the texts you
|
|||
|
and others produced over ten years, a machine BIRON: Hide thy head.
|
|||
|
learning model reveals that the suicide letter was
|
|||
|
written by someone else. VENTIDIUS: He purposeth to Athens: whither, with
|
|||
|
the vow
|
|||
|
How does a machine analyse texts in order to iden- I made to handle you.
|
|||
|
tify you? The most robust feature for authorship
|
|||
|
recognition is delivered by the character n-gram FALSTAFF: My good knave.
|
|||
|
technique. It is used in cases with a variety of
|
|||
|
thematics and genres of the writing. When using You may have guessed, considering the topic of
|
|||
|
character n-grams, texts are considered as se- this story, that an n-gram algorithm generated
|
|||
|
quences of characters. Let's consider the charac- this text. The model is trained on the compiled
|
|||
|
works of Shakespeare. While more recent algo-
|
|||
|
41
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
rithms, such as the recursive neural networks of is good, they buy.
|
|||
|
the CharNN, are becoming famous for their perfor-
|
|||
|
mance, n-grams still execute a lot of NLP tasks. A paper by Haikuan Liu of the Australian National
|
|||
|
They are used in statistical machine translation, University states that the tense of verbs used in
|
|||
|
speech recognition, spelling correction, entity tweets can be an indicator of the frequency of fi-
|
|||
|
detection, information extraction, ... nancial transactions. His idea is based on the
|
|||
|
fact that verb conjugation is used in psychology
|
|||
|
to detect the early stages of human depression.
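Shannon's question from the story about the history of n-grams, given a sequence of letters, what is the likelihood of the next letter, can be sketched in a few lines of Python. The corpus file name is hypothetical; trained on the compiled works of Shakespeare, a model like this produces text of the kind quoted in that story.

    import random
    from collections import defaultdict, Counter

    def train(text, n=4):
        model = defaultdict(Counter)
        for i in range(len(text) - n):
            model[text[i:i + n]][text[i + n]] += 1   # which character follows this sequence?
        return model

    def generate(model, seed, length=200):
        out = seed
        n = len(seed)
        while len(out) < length:
            followers = model.get(out[-n:])
            if not followers:
                break
            chars, weights = zip(*followers.items())
            out += random.choices(chars, weights=weights)[0]
        return out

    text = open("shakespeare.txt", encoding="utf-8").read()   # hypothetical corpus file
    print(generate(train(text), seed="BIRO"))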
|
|||
|
--- God in Google Books ---
|
|||
|
Reference
|
|||
|
In 2006, Google created a dataset of n-grams from Paper: 'Grammatical Feature Extraction and Analy-
|
|||
|
their digitized book collection and released it sis of Tweet Text: An Application towards Pre-
|
|||
|
online. Recently they also created an n-gram view- dicting Stock Trends', Haikuan Liu, Research
|
|||
|
er. School of Computer Science (RSCS), College of
|
|||
|
Engineering and Computer Science (CECS), The Aus-
|
|||
|
This allowed for many socio-linguistic investiga- tralian National University (ANU)
|
|||
|
tions. For example, in October 2018, the New York
|
|||
|
Times Magazine published an opinion article titled
|
|||
|
'It’s Getting Harder to Talk About God'. The au- --- Bag of words ---
|
|||
|
thor, Jonathan Merritt, had analysed the mention
|
|||
|
of the word 'God' in Google's dataset using the In Natural Language Processing (NLP), 'bag of
|
|||
|
n-gram viewer. He concluded that there had been a words' is considered to be an unsophisticated mod-
|
|||
|
decline in the word's usage since the twentieth el. It strips text of its context and dismantles
|
|||
|
century. Google's corpus contains texts from the it into a collection of unique words. These words
|
|||
|
sixteenth century leading up to the twenty-first. are then counted. In the previous sentences, for
|
|||
|
However, what the author missed out on was the example, 'words' is mentioned three times, but
|
|||
|
growing popularity of scientific journals around this is not necessarily an indicator of the text's
|
|||
|
the beginning of the twentieth century. This new focus.
|
|||
|
genre that was not mentioning the word God shifted
|
|||
|
the dataset. If the scientific literature was The first appearance of the expression 'bag of
|
|||
|
taken out of the corpus, the frequency of the word words' seems to go back to 1954. Zellig Harris,
|
|||
|
'God' would again flow like a gentle ripple from a  an influential linguist, published a paper called
|
|||
|
distant wave.                                        'Distributional Structure'. In the section called
|
|||
|
'Meaning as function of distribution' he says
|
|||
|
                                                     'for language is not merely a bag of words but a
|
|||
|
--- Grammatical features taken from Twitter influ- tool with particular properties which have been
|
|||
|
ence the stock market ---                            fashioned in the course of its use. The linguist's
|
|||
|
work is precisely to discover these properties,
|
|||
|
The boundaries between academic disciplines are whether for descriptive analysis or for the synthesis
|
|||
|
becoming blurred. Economics research mixed with      of quasi-linguistic systems.'
|
|||
|
psychology, social science, cognitive and emo-
|
|||
|
tional concepts have given rise to a new economics
|
|||
|
subfield, called 'behavioral economics'. This
|
|||
|
means that researchers can start to explain stock
|
|||
|
market movements based on factors other than eco-
|
|||
|
nomic factors only. Both the economy and 'public
|
|||
|
opinion' can influence or be influenced by each
|
|||
|
other. A lot of research is being done on how to
|
|||
|
use 'public opinion' to predict tendencies in
|
|||
|
stock-price changes.
|
|||
|
|
|||
|
'Public opinion' is estimated from sources of
|
|||
|
large amounts of public data, like tweets, blogs
|
|||
|
or online news. Research using machinic data anal-
|
|||
|
ysis shows that the changes in stock prices can be
|
|||
|
predicted by looking at 'public opinion', to some
|
|||
|
degree. There are many scientific articles online,
|
|||
|
which analyse press articles for the 'sentiment' ex-
|
|||
|
pressed in them. An article can be marked as more
|
|||
|
or less positive or negative. The annotated press
|
|||
|
articles are then used to train a machine learning
|
|||
|
model, which predicts stock market trends, marking
|
|||
|
them as 'down' or 'up'. When a company gets bad
|
|||
|
press, traders sell. On the contrary, if the news
|
|||
|
|
|||
|
42
|
|||
|
learners learn learners learn learners learn learners learn learners learn learners learn
|
|||
|
learners learn learners learn learners learn learners learn learners learn
|
|||
|
learners learn learners learn learners learn learners learn
|
|||
|
learners learn learners learn learners learn
|
|||
|
learners learn learners learn learners learn lea
|
|||
|
ners learn learners learn learners learn
|
|||
|
learners learn learners learn learners learn
|
|||
|
learners learn learners learn learners
|
|||
|
earn learners learn learners learn
|
|||
|
learners learn learners learn
|
|||
|
learners learn learners learn lea
|
|||
|
ners learn learners learn learners
|
|||
|
learn learners learn learners
|
|||
|
earn learners learn learne
|
|||
|
s learn learners learn
|
|||
|
learners learn learners learn
|
|||
|
learners learn learners learn
|
|||
|
learners learn learners learn
|
|||
|
learners learn
|
|||
|
learners learn learners learn
|
|||
|
learners learn learners learn
|
|||
|
learners learn
|
|||
|
learners learn learners learn
|
|||
|
learners learn
|
|||
|
learners learn learners learn
|
|||
|
learners learn
|
|||
|
learners learn learners
|
|||
|
learn learners learn
|
|||
|
learners learn
|
|||
|
learners learn learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn learners
|
|||
|
learn learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn lea
|
|||
|
ners learn learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn learners
|
|||
|
earn learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
learners learn
|
|||
|
43
|
|||
|
4n r- ro %r5 l e +-+-+-+-+-+-+-+-+ f +-+-+-+-+-+ m 9-e p + st2- a , _ nr2
|
|||
|
l itr9 op 2c b ue |l|e|a|r|n|e|r|s| , y |l|e|a|r|n| ) g- 9 c w 1 atn_wn o_ c|
|
|||
|
c o b op , +_7 -x a 9acl +-+-+-+-+-+-+-+-+ hc +-+-+-+-+-+ 34 u a 9a l |an t p 9 -
|
|||
|
|\ _ l6el , 7 3 u r1 3 8dl a. m s T rv t ro|lm ni3 4 V3 as1to 4 e hp
|
|||
|
5_s -o 4 d o9n t 0 t V i5n _ i, _ iu9 l + t t 6t s r s exe4eh l 4
|
|||
|
ri _g d s es c s a 4s i+ i _ +-+-+-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+-+-+-+ e l4 f k 5l l wu |f
|
|||
|
ete V o I- 4e |l|e|a|r|n|e|r|s| 6 e |a|r|e| |p|a|t|t|e|r|n| st 62 t a ne e 2 ?
|
|||
|
.n l 1 ntb 5 d9 +-+-+-+-+-+-+-+-+ e e1 +-+-+-+ +-+-+-+-+-+-+-+ ia 5 n i w er8
|
|||
|
er 1 t i 9 te9 n r7 | t ie m +-+-+-+-+-+-+-+ n s 1 i- e i X c w a
|
|||
|
4 _c4 c s+ m t eh h.5 t a i t m p3 a e |f|i|n|d|e|r|s| , ll 6a e e7ifo- +cs te s-
|
|||
|
h 5 8 m wl c tl u w2 +-+-+-+-+-+-+-+ 8 r s oe t % 8- 1 tl3o 4
|
|||
|
n r a t t 3a 9 +-+-+-+-+-+-+-+-+ 5i9 +-+-+-+ +-+-+-+-+-+-+-+-+ l s 9 | 9a e 0sbntaf
|
|||
|
m(um8 j ra e +t o |l|e|a|r|n|e|r|s| |a|r|e| |c|r|a|w|l|i|n|g| n n ei pte7i r 6ms
|
|||
|
t s G_ el i + ka e . +-+-+-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+-+-+-+-+ ,/s u r r 4 1 i h
|
|||
|
d heeo 2eei m g r ao a ah( 9a u m9 V e +-+-+-+-+-+-+-+ +-+-+-+-+ nae T-e r s-i5 7n
|
|||
|
gt r_ y e io 96 e e s d |T trig - l |t|h|r|o|u|g|h| |d|a|t|a| 7s e1s77 87 2 fw m c
|
|||
|
9d. 2 _ e 2nnm 96 n a t7- c d, o e +-+-+-+-+-+-+-+ +-+-+-+-+ 6 r n rbhi e 5 s n d
|
|||
|
/ _ 2r s f a ef +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+ h asn _
|
|||
|
t5 w w p l n | a -s |l|e|a|r|n|e|r|s| e |g|e|n|e|r|a|t|e| |s|o|m|e| |k|i|n|d| u s s
|
|||
|
ie im i i 7 t 4 +-+-+-+-+-+-+-+-+ r +-+-+-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+ u t nr+ a
|
|||
|
c 7 t s x 4 da n 7 Fd e c & +-+-+ +-+-+-+-+-+-+-+-+ raa o c5 ' e ro.
|
|||
|
k1 n t re 8 n et 9 1 l r 0V |o|f| |s|p|e|c|i|f|i|c| a t9 s c rv v s l
|
|||
|
n_fa r% a Z a 5 w me m n 5 1s n +-+-+ +-+-+-+-+-+-+-+-+ t S 1 o a r d rb
|
|||
|
y 7 r c o ge D _ns v / b +-+-+-+-+-+-+-+-+-+ 8 4- i o 9 t e
|
|||
|
i 4 9 9t6 9- é2 o p| o v i |'|g|r|a|m|m|a|r|'| n p t p 8sn _ l 8
|
|||
|
nt 2pc t V4 e ha e 3 1 , n 2 i o +-+-+-+-+-+-+-+-+-+ %4 r 8 1 1 t e
|
|||
|
e 8 rn d +-+-+-+-+-+-+-+-+-+-+-+ i +-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ u t
|
|||
|
e e e e r F |c|l|a|s|s|i|f|i|e|r|s| %f |g|e|n|e|r|a|t|e|,| |e|v|a|l|u|a|t|e| 1 h V0 t n
|
|||
|
nh % c 5 h r +-+-+-+-+-+-+-+-+-+-+-+ ti +-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ Ul n m ,
|
|||
|
- n 2 ab m 3 o- r e 6| n +-+-+-+ +-+-+-+-+-+-+-+-+ 6 + oe /
|
|||
|
l t i u + u t l i 7 ei |a|n|d| |r|e|a|d|j|u|s|t| 5 r f l f5 %
|
|||
|
n 2 s e m a m e d1 m uh c +-+-+-+ +-+-+-+-+-+-+-+-+ n s g o _
|
|||
|
e d c ps +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ +-+-+-+ + a D y5 8r
|
|||
|
+1n o h |l|e|a|r|n|e|r|s| |u|n|d|e|r|s|t|a|n|d| |a|n|d| k4t tr t m
|
|||
|
u a t +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ +-+-+-+ a 3 i 3 t
|
|||
|
2 r 7 n n 9 r r. t p i +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ -- c
|
|||
|
g + l t v c i 8 f as |r|e|v|e|a|l| |p|a|t|t|e|r|n|s| a _ n
|
|||
|
4 s l 5 2 + f s - l +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ 4 - e
|
|||
|
y + h -_ 7 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+ o . - i e
|
|||
|
i e l t e _ V n |l|e|a|r|n|e|r|s| |d|o|n|'|t| |a|l|w|a|y|s| 4b ,i
|
|||
|
_ % rt h e ,a +-+-+-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+ a _ h _
|
|||
|
2 V o 5 t +-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ _ s
|
|||
|
c % po + h o3 mi5 8 |d|i|s|t|u|i|n|g|u|i|s|h| |w|e|l|l| w 7 _nn
|
|||
|
, ha u pk +-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ 91s 6 a
|
|||
|
s hp I 3 % +-+-+-+-+-+ +-+-+-+-+-+-+-+-+ i 8
|
|||
|
v o 6 o r s |w|h|i|c|h| |p|a|t|t|e|r|n|s| s_ oge e
|
|||
|
n a + e o e 3 n 7 +-+-+-+-+-+ +-+-+-+-+-+-+-+-+ o 6 +
|
|||
|
i l r \ m + a l r +-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+-+ , n
|
|||
|
c a o o o |s|h|o|u|l|d| |b|e| |r|e|p|e|a|t|e|d| eh s i
|
|||
|
o tlt t 2 e5 d +-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+-+ o s
|
|||
|
7 d 2 5 | n | 1 ey d te a t
|
|||
|
r | , + 9 6 % f a i s %
|
|||
|
n o+| r u s \ 4 e ep e
|
|||
|
ao 2 | f' | e e r 9 7 Td i d e
|
|||
|
. t 8m d c l 6 l o i _ t T i - i
|
|||
|
n 7 e d 3 p l a n . i l
|
|||
|
i i % 8 a + p r l e
|
|||
|
4 % a l
|
|||
|
| h 5 | tl d 1mo 7 t N
|
|||
|
, t o i 9 o? F W 9 dC %hf
|
|||
|
o m 5 t t w , - 3p
|
|||
|
a d s e a n t _ o c \ f
|
|||
|
+ p a r f |el 8 , g i l e e
|
|||
|
t e3 - - 9 h c t t +w + | u0 w t
|
|||
|
. h 5 a , s
|
|||
|
t d _ n V 4 a o
|
|||
|
, o t r nt
|
|||
|
w e e
|
|||
|
|
|||
|
44
|
|||
|
LEARNERS

Learners are the algorithms that distinguish machine learning practices from other types of practices. They are pattern finders, capable of crawling through data and generating some kind of specific 'grammar'. Learners are based on statistical techniques. Some need a large amount of training data in order to function, others can work with a small annotated set. Some perform well in classification tasks, like spam identification, others are better at predicting numbers, like temperatures, distances, stock market values, and so on.

The terminology of machine learning is not yet fully established. Depending on the field, whether statistics, computer science or the humanities, different terms are used. Learners are also called classifiers. When we talk about Learners, we talk about the interwoven functions that have the capacity to generate other functions, and to evaluate and readjust them to fit the data. They are good at understanding and revealing patterns. But they don't always distinguish well which of the patterns should be repeated.

In software packages, it is not always possible to distinguish the characteristic elements of the classifiers, because they are hidden in underlying modules or libraries. Programmers can invoke them using a single line of code. For this exhibition, we therefore developed two table games that show in detail the learning process of simple, but frequently used classifiers.

Naive Bayes game

by Algolit

In machine learning, Naive Bayes methods are simple probabilistic classifiers that are widely applied for spam filtering and for deciding whether a text is positive or negative.

They require a small amount of training data to estimate the necessary parameters. They can be extremely fast compared to more sophisticated methods. They are difficult to generalize, which means that they perform on specific tasks, demanding to be trained with the same style of data that will be used to work with afterwards.

This game allows you to play along with the rules of Naive Bayes. While manually executing the code, you create your own playful model that 'just works'. A word of caution is necessary: because you only train it with 6 sentences – instead of the minimum of 2000 – it is not representative at all!

---

Concept & realisation: An Mertens

Linear Regression game

by Algolit

Linear Regression is one of the best-known and best-understood algorithms in statistics and machine learning. It has been around for almost 200 years. It is an attractive model because the representation is so simple. In statistics, linear regression is a statistical method that allows one to summarize and study relationships between two continuous (quantitative) variables.
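What fitting such a relationship between two continuous variables looks like in code can be sketched briefly. A minimal sketch, separate from the game itself, using invented data points and assuming NumPy is available:

    # Fit a straight line y = a*x + b through a handful of invented points
    # with ordinary least squares.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # first continuous variable
    y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])   # second continuous variable
    a, b = np.polyfit(x, y, deg=1)             # slope and intercept

    print(f"y is roughly {a:.2f} * x + {b:.2f}")
    print("prediction for x = 6:", round(a * 6 + b, 2))

Changing or removing a single point changes the fitted line, which is exactly the kind of decision the game asks you to make by hand.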
By playing this game you will realize that as a player you have a lot of decisions to make. You will experience what it means to create a coherent dataset, to decide what is in and what is not in. If all goes well, you will feel the urge to change your data in order to obtain better results. This is part of the art of approximation that is at the basis of all machine learning practices.

---

Concept & realisation: An Mertens

Traité de Documentation. Three algorithmic poems.

by Rémi Forte, designer-researcher at L'Atelier national de recherche typographique, Nancy, France

serigraphy on paper, 60 × 80 cm, 25 ex., 2019, for sale at the reception of the Mundaneum.

The poems, reproduced in the form of three posters, are an algorithmic and poetic re-reading of Paul Otlet's Traité de documentation. They are the result of an algorithm based on the mysterious rules of human intuition. It has been applied to a fragment taken from Paul Otlet's book and is intended to be representative of his bibliological practice.

For each fragment, the algorithm splits the text; words and punctuation marks are counted and reordered into a list. In each line, the elements combine and exhaust the syntax of the selected fragment. Paul Otlet's language remains perceptible but exacerbated to the point of absurdity. For the reader, the systematization of the text is disconcerting and his reading habits are disrupted.

Built according to a mathematical equation, the typographical composition of the poster is just as systematic as the poem. However, friction occurs occasionally; loop after loop, the lines extend to bite into the neighbouring column. Overlays are created and words are hidden by others. These telescopic handlers draw alternative reading paths.
CONTEXTUAL STORIES
ABOUT LEARNERS

--- Naive Bayes & Viagra ---

Naive Bayes is a famous learner that performs well with little data. We apply it all the time. Christian and Griffiths state in their book Algorithms To Live By that 'our days are full of small data'. Imagine, for example, that you're standing at a bus stop in a foreign city. The other person who is standing there has been waiting for 7 minutes. What do you do? Do you decide to wait? And if so, for how long? When will you initiate other options? Another example. Imagine a friend asking advice about a relationship. He's been together with his new partner for a month. Should he invite the partner to join him at a family wedding?

Having pre-existing beliefs is crucial for Naive Bayes to work. The basic idea is that you calculate the probabilities based on prior knowledge and given a specific situation.

The theorem was formulated during the 1740s by Thomas Bayes, a reverend and amateur mathematician. He dedicated his life to solving the question of how to win the lottery. But Bayes' rule was only made famous and known as it is today by the mathematician Pierre Simon Laplace in France a bit later in the same century. For a long time after Laplace's death, the theory sank into oblivion until it was dug up again during the Second World War in an effort to break the Enigma code.

Most people today have come in contact with Naive Bayes through their email spam folders. Naive Bayes is a widely used algorithm for spam detection. It is by coincidence that Viagra, the erectile dysfunction drug, was approved by the US Food & Drug Administration in 1997, around the same time as about 10 million users worldwide had made free webmail accounts. The selling companies were among the first to make use of email as a medium for advertising: it was an intimate space, at the time reserved for private communication, for an intimate product. In 2001, the first SpamAssassin programme relying on Naive Bayes was uploaded to SourceForge, cutting down on guerrilla email marketing.

Reference
Machine Learners, by Adrian MacKenzie, MIT Press, Cambridge, US, November 2017.

--- Naive Bayes & Enigma ---

This story about Naive Bayes is taken from the book 'The Theory That Would Not Die', written by Sharon Bertsch McGrayne. Among other things, she describes how Naive Bayes was soon forgotten after the death of Pierre Simon Laplace, its inventor. The mathematician was said to have failed to credit the works of others. Therefore, he suffered widely circulated charges against his reputation. Only after 150 years was the accusation refuted.

Fast forward to 1939, when Bayes' rule was still virtually taboo, dead and buried in the field of statistics. When France was occupied in 1940 by Germany, which controlled Europe's factories and farms, Winston Churchill's biggest worry was the U-boat peril. U-boat operations were tightly controlled by German headquarters in France. Each submarine received orders as coded radio messages long after it was out in the Atlantic. The messages were encrypted by word-scrambling machines, called Enigma machines. Enigma looked like a complicated typewriter. It was invented by the German firm Scherbius & Ritter after the First World War, when the need for message-encoding machines had become painfully obvious.

Interestingly, and luckily for Naive Bayes and the world, at that time the British government and educational systems saw applied mathematics and statistics as largely irrelevant to practical problem-solving. So the British agency charged with cracking German military codes mainly hired men with linguistic skills. Statistical data was seen as bothersome because of its detail-oriented nature. So wartime data was often analysed not by statisticians, but by biologists, physicists and theoretical mathematicians. None of them knew that Bayes' rule was considered to be unscientific in the field of statistics. Their ignorance proved fortunate.

It was the now famous Alan Turing – a mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist – who used Bayes' rule's system of probabilities to design the 'bombe'. This was a high-speed electromechanical machine for testing every possible arrangement that an Enigma machine would produce. In order to crack the naval codes of the U-boats, Turing simplified the 'bombe' system using Bayesian methods. It turned the UK headquarters into a code-breaking factory. The story is well illustrated in The Imitation Game, a film by Morten Tyldum dating from 2014.
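The priors and word counts that these stories describe can be made concrete with a small sketch. This is not code from the publication or from the Naive Bayes game: it trains scikit-learn's Naive Bayes classifier on six invented sentences, echoing the six-sentence training set the game uses:

    # A toy spam filter: bag-of-words counts plus Naive Bayes priors.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    sentences = [
        "cheap viagra for sale",           # spam
        "win a free lottery prize now",    # spam
        "limited offer buy now",           # spam
        "shall we meet for lunch",         # ham
        "the report is attached",          # ham
        "see you at the family wedding",   # ham
    ]
    labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

    bag = CountVectorizer()                  # turn sentences into word counts
    counts = bag.fit_transform(sentences)
    model = MultinomialNB().fit(counts, labels)

    test = bag.transform(["free viagra offer", "lunch with the family"])
    print(model.predict(test))               # expected: ['spam' 'ham']

With only six examples the estimated probabilities are extremely fragile, which is exactly the caveat the game insists on.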
--- A story about sweet peas ---

Throughout history, some models have been invented by people with ideologies that are not to our liking. The idea of regression stems from Sir Francis Galton, an influential nineteenth-century scientist. He spent his life studying the problem of heredity – understanding how strongly the characteristics of one generation of living beings manifested themselves in the following generation. He established the field of eugenics, defining it as 'the study of agencies under social control that may improve or impair the racial qualities of future generations, either physically or mentally'. On Wikipedia, Galton is a prime example of scientific racism.

Galton initially approached the problem of heredity by examining characteristics of the sweet pea plant. He chose this plant because the species can self-fertilize. Daughter plants inherit genetic variations from mother plants without a contribution from a second parent. This characteristic eliminates having to deal with multiple sources.

Galton's research was appreciated by many intellectuals of his time. In 1869, in Hereditary Genius, Galton claimed that genius is mainly a matter of ancestry, and he believed that there was a biological explanation for social inequality across races. Galton even influenced his half-cousin Charles Darwin with his ideas. After reading Galton's paper, Darwin stated: 'You have made a convert of an opponent in one sense for I have always maintained that, excepting fools, men did not differ much in intellect, only in zeal and hard work'. Luckily, the modern study of heredity managed to eliminate the myth of race-based genetic difference, something Galton tried hard to maintain.

Galton's major contribution to the field was linear regression analysis, laying the groundwork for much of modern statistics. While we engage with the field of machine learning, Algolit tries not to forget that ordering systems hold power, and that this power has not always been used to the benefit of everyone. Machine learning has inherited many aspects of statistical research, some less agreeable than others. We need to be attentive, because these world views do seep into the algorithmic models that create new orders.

References
http://galton.org/letters/darwin/correspondence.htm
https://www.tandfonline.com/doi/full/10.1080/10691898.2001.11910537
http://www.paramoulipist.be/?p=1693

--- Perceptron ---

We find ourselves in a moment in time in which neural networks are sparking a lot of attention. But they have been in the spotlight before. The study of neural networks goes back to the 1940s, when the first neuron metaphor emerged. The neuron is not the only biological reference in the field of machine learning - think of the word corpus or training. The artificial neuron was constructed in close connection to its biological counterpart.

Psychologist Frank Rosenblatt was inspired by fellow psychologist Donald Hebb's work on the role of neurons in human learning. Hebb stated that 'cells that fire together wire together'. His theory now lies at the basis of associative human learning, but also of unsupervised neural network learning. It moved Rosenblatt to expand on the idea of the artificial neuron.

In 1962, he created the Perceptron, a model that learns through the weighting of inputs. It was set aside by the next generation of researchers, because it can only handle binary classification. This means that the data has to be clearly separable, as for example men and women, or black and white. It is clear that this type of data is very rare in the real world. When the so-called first AI winter arrived in the 1970s and the funding decreased, the Perceptron was also neglected. For ten years it stayed dormant. When spring settled in at the end of the 1980s, a new generation of researchers picked it up again and used it to construct neural networks. These contain multiple layers of Perceptrons. That is how neural networks saw the light. One could say that the current machine learning season is particularly warm, but it takes another winter to know a summer.

--- BERT ---

Some online articles say that the year 2018 marked a turning point for the field of Natural Language Processing (NLP). A series of deep-learning models achieved state-of-the-art results on tasks like question-answering or sentiment-classification. Google's BERT algorithm entered the machine learning competitions of last year as a sort of 'one model to rule them all'. It showed a superior performance over a wide variety of tasks.

BERT is pre-trained; its weights are learned in advance through two unsupervised tasks. This means BERT doesn't need to be trained from scratch for each new task. You only have to finetune its weights. This also means that a programmer wanting to use BERT does not know any longer what parameters BERT is tuned to, nor what data it has seen to learn its performances.

BERT stands for Bidirectional Encoder Representations from Transformers. This means that BERT allows for bidirectional training. The model learns the context of a word based on all of its surroundings, left and right of a word. As such, it can differentiate between 'I accessed the bank account' and 'I accessed the bank of the river'.

Some facts:
- BERT_large, with 345 million parameters, is the largest model of its kind. It is demonstrably superior on small-scale tasks to BERT_base, which uses the same architecture with 'only' 110 million parameters.
- To run BERT you need to use TPUs, Google's Tensor Processing Units, chips especially engineered for TensorFlow, the deep-learning platform. TPU renting rates range from $8/hr to $394/hr. Algolit doesn't want to work with off-the-shelf packages; we are interested in opening up the black box. In that case, BERT asks for quite some savings in order to be used.
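The bidirectional guessing that BERT performs can be tried out directly. A minimal sketch, not part of the original story: it assumes the Hugging Face transformers library is installed, downloads the pre-trained 'bert-base-uncased' model on first run, and uses two sentences of our own invention:

    # Ask a pre-trained BERT to fill in a masked word,
    # using the context on both sides of the gap.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")

    for sentence in [
        "I withdrew some money from the [MASK] this morning.",
        "We sat down on the grassy [MASK] of the river.",
    ]:
        best = fill(sentence)[0]   # the highest-scoring completion
        print(sentence, "->", best["token_str"], round(best["score"], 3))

Which word the model proposes depends entirely on the words to the left and to the right of the gap: the bidirectional context the story describes.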
█▒░░ ▓▒█░░▒▓███▀▒░ ░▒ ▒ ░ ▒▒▓██▒ ░░ ▒░ ░ ▒ ░▓ ░▒▓ ▒ ▒█░░▒█▓▒░▓▒ ▒▓▒▒░ ▒ ▒▒ ▓▒░ ▒░▓ █ ▒░ █░█ ▓▒░ ▒▓░░
|
|||
|
▓ ▒░ ▒▒ ░▒░▒▓ ░ ░ ▒ ░ ▒ ▒ ░ ░ ░▒░ ░▒ ░ ▒ ░▒░▓░ ▒ ░ ▒ ░▒░ ░ ░▒ ▒░░ ░ ▒ ░░▓ ░ ▓▓░░ ░░▒▒▓░
|
|||
|
░ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░░ ░ ▒ ░ ░ ░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░
|
|||
|
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
|
|||
|
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
|
|||
|
|
|||
|
GLOSSARY

This is a non-exhaustive wordlist, based on terms that are frequently used in the exhibition. It might help visitors who are not familiar with the vocabulary related to the field of Natural Language Processing (NLP), Algolit or the Mundaneum.

* ALGOLIT
A group from Brussels involved in artistic research on algorithms and literature. Every month they gather to experiment with code and texts that are published under free licenses. http://www.algolit.net

* ALGOLITERARY
Word invented by Algolit for works that explore the point of view of the algorithmic storyteller. What kind of new forms of storytelling do we make possible in dialogue with machinic agencies?

* ALGORITHM
A set of instructions in a specific programming language that takes an input and produces an output.

* ANNOTATION
The annotation process is a crucial step in supervised machine learning where the algorithm is given examples of what it needs to learn. A spam filter in training will be fed examples of spam and real messages. These examples are entries, or rows from the dataset with a label, spam or non-spam. The labelling of a dataset is work executed by humans: they pick a label for each row of the dataset. To ensure the quality of the labels, multiple annotators see the same row and have to give the same label before an example is included in the training data.

* AI OR ARTIFICIAL INTELLIGENCES
In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. Computer science defines AI research as the study of 'intelligent agents': any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. More specifically, Kaplan and Haenlein define AI as 'a system's ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation'. Colloquially, the term 'artificial intelligence' is used to describe machines that mimic 'cognitive' functions that humans associate with other human minds, such as 'learning' and 'problem solving'. (Wikipedia)

* BAG OF WORDS
The bag-of-words model is a simplifying representation of text used in Natural Language Processing (NLP). In this model, a text is represented as a collection of its unique words, disregarding grammar, punctuation and even word order. The model transforms the text into a list of words and how many times they're used in the text, or quite literally a bag of words. Bag of words is often used as a baseline, on which the new model has to perform better.

* CHARACTER N-GRAM
A technique that is used for authorship recognition. When using character n-grams, texts are considered as sequences of characters. Let's consider the character trigram. All the overlapping sequences of three characters are isolated. For example, the character 3-grams of 'Suicide' would be 'Sui', 'uic', 'ici', 'cid', etc. Patterns found with character n-grams focus on stylistic choices that are unconsciously made by the author. The patterns remain stable over the full length of the text.

* CLASSICAL MACHINE LEARNING
Naive Bayes, Support Vector Machines and Linear Regression are called classical machine learning algorithms. They perform well when learning with small datasets. But they often require complex Readers. The task the Readers do is also called feature engineering (see below). This means that a human needs to spend time on a deep exploratory data analysis of the dataset.

* CONSTANT
Constant is a non-profit, artist-run organisation based in Brussels since 1997 and active in the fields of art, media and technology. Algolit started as a project of Constant in 2012. http://constantvzw.org

* DATA WORKERS
Artificial intelligences that are developed to serve, entertain, record and know about humans. The work of these machinic entities is usually hidden behind interfaces and patents. In the exhibition, algorithmic storytellers leave their invisible underworld to become interlocutors.

* DUMP
According to the English dictionary, a dump is an accumulation of refused and discarded materials or the place where such materials are dumped. In computing a dump refers to a 'database dump', a record of data from a database used for easy downloading or for backing up a database. Database dumps are often published by free software and free content projects, such as Wikipedia, to allow reuse or forking of the database.

* FEATURE ENGINEERING
The process of using domain knowledge of the data to create features that make machine learning algorithms work. This means that a human needs to spend time on a deep exploratory data analysis of the dataset. In Natural Language Processing (NLP) features can be the frequency of words or letters, but also syntactical elements like nouns, adjectives, or verbs. The most significant features for the task to be solved must be carefully selected and passed over to the classical machine learning algorithm.

* FLOSS OR FREE LIBRE OPEN SOURCE SOFTWARE
Software that anyone is freely licensed to use, copy, study, and change in any way, and the source code is openly shared so that people are encouraged to voluntarily improve the design of the software. This is in contrast to proprietary software, where the software is under restrictive copyright licensing and the source code is usually hidden from the users. (Wikipedia)

* GIT
A software system for tracking changes in source code during software development. It is designed for coordinating work among programmers, but it can be used to track changes in any set of files. Before starting a new project, programmers create a "git repository" in which they will publish all parts of the code. The git repositories of Algolit can be found on https://gitlab.constantvzw.org/algolit.

* GUTENBERG.ORG
Project Gutenberg is an online platform run by volunteers to 'encourage the creation and distribution of eBooks'. It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of public domain books. The project tries to make these as free as possible, in long-lasting, open formats that can be used on almost any computer. As of 23 June 2018, Project Gutenberg reached 57,000 items in its collection of free eBooks. (Wikipedia)

* HENRI LA FONTAINE
Henri La Fontaine (1854-1943) is a Belgian politician, feminist and pacifist. He was awarded the Nobel Peace Prize in 1913 for his involvement in the International Peace Bureau and his contribution to the organization of the peace movement. In 1895, together with Paul Otlet, he created the International Bibliography Institute, which became the Mundaneum. Within this institution, which aimed to bring together all the world's knowledge, he contributed to the development of the Universal Decimal Classification (UDC) system.

* KAGGLE
An online platform where users find and publish data sets, explore and build machine learning models, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. About half a million data scientists are active on Kaggle. It was founded by Goldbloom and Ben Hamner in 2010 and acquired by Google in March 2017.

* LITERATURE
Algolit understands the notion of literature in the way a lot of other experimental authors do. It includes all linguistic production, from the dictionary to the Bible, from Virginia Woolf's entire work to all versions of the Terms of Service published by Google since its existence.

* MACHINE LEARNING MODELS
Algorithms based on statistics, mainly used to analyse and predict situations based on existing cases. In this exhibition we focus on machine learning models for text processing, or 'Natural Language Processing', in short, 'NLP'. These models have learned to perform a specific task on the basis of existing texts. The models are used for search engines, machine translations and summaries, spotting trends in new media networks and news feeds. They influence what you get to see as a user, but also have their word to say in the course of stock exchanges worldwide, the detection of cybercrime and vandalism, etc.

* MARKOV CHAIN
Algorithm that scans the text for the transition probability of letter or word occurrences, resulting in transition probability tables which can be computed even without any semantic or grammatical natural language understanding. It can be used for analyzing texts, but also for recombining them. It is widely used in spam generation.

* MECHANICAL TURK
The Amazon Mechanical Turk is an online platform for humans to execute tasks that algorithms cannot. Examples include annotating sentences as being positive or negative, spotting number plates, discriminating between face and non-face. The jobs posted on this platform are often paid less than a cent per task. Tasks that are more complex or require more knowledge can be paid up to several cents. Many academic researchers use Mechanical Turk as an alternative to have their students execute these tasks.

* MUNDANEUM
In the late nineteenth century two young Belgian jurists, Paul Otlet (1868-1944), 'the father of documentation', and Henri La Fontaine (1854-1943), statesman and Nobel Peace Prize winner, created The Mundaneum. The project aimed at gathering all the world's knowledge and filing it using the Universal Decimal Classification (UDC) system that they had invented.

* NATURAL LANGUAGE
A natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing. They are different from constructed and formal languages such as those used to program computers or to study logic. (Wikipedia)
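The transition probability tables mentioned in the MARKOV CHAIN entry above can be built in a few lines. A minimal sketch, not part of the glossary, using an invented sentence as its only corpus:

    # Record which words follow which, then recombine them at random.
    import random
    from collections import defaultdict

    text = "the cat sleeps and the dog sleeps and the cat dreams"
    words = text.split()

    follows = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)      # e.g. follows['the'] == ['cat', 'dog', 'cat']

    # Picking at random from each list follows the transition probabilities:
    # after 'the', 'cat' is twice as likely as 'dog'.
    word = "the"
    generated = [word]
    for _ in range(8):
        word = random.choice(follows.get(word, words))
        generated.append(word)
    print(" ".join(generated))

No grammar and no meaning are involved, only the probability of one word following another, which is also what makes the technique so popular for generating spam.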
* NLP OR NATURAL LANGUAGE PROCESSING
Natural language processing (NLP) is a collective term referring to automatic computational processing of human languages. This includes algorithms that take human-produced text as input, and attempt to generate text that resembles it.

* NEURAL NETWORKS
Computing systems inspired by the biological neural networks that constitute animal brains. The neural network itself is not an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs. Such systems 'learn' to perform tasks by considering examples, generally without being programmed with task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analysing example images that have been manually labeled as 'cat' or 'no cat' and using the results to identify cats in other images. They do this without any prior knowledge about cats, for example, that they have fur, tails, whiskers and cat-like faces. Instead, they automatically generate identifying characteristics from the learning material that they process. (Wikipedia)

* OPTICAL CHARACTER RECOGNITION (OCR)
Computer processes for translating images of scanned texts into manipulable text files.

* ORACLE
Oracles are prediction or profiling machines, a specific type of algorithmic models, mostly based on statistics. They are widely used in smartphones, computers, tablets.

* OULIPO
Oulipo stands for Ouvroir de littérature potentielle (Workspace for Potential Literature). Oulipo was created in Paris by the French writers Raymond Queneau and François Le Lionnais. They rooted their practice in the European avant-garde of the twentieth century and in the experimental tradition of the 1960s. For Oulipo, the creation of rules becomes the condition to generate new texts, or what they call potential literature. Later, in 1981, they also created ALAMO, Atelier de littérature assistée par la mathématique et les ordinateurs (Workspace for literature assisted by maths and computers).

* PAUL OTLET
Paul Otlet (1868 – 1944) was a Belgian author, lawyer and peace activist; he is one of several people who have been considered the father of information science. Otlet created the Universal Decimal Classification, which was widespread in libraries. Together with Henri La Fontaine he created the Palais Mondial (World Palace), later the Mundaneum, to house the collections and activities of their various organizations and institutes.

* PYTHON
The main programming language that is globally used for natural language processing. It was invented in 1991 by the Dutch programmer Guido Van Rossum.

* RULE-BASED MODELS
Oracles can be created using different techniques. One way is to manually define rules for them. As prediction models they are then called rule-based models, opposed to statistical models. Rule-based models are handy for tasks that are specific, like detecting when a scientific paper concerns a certain molecule. With very little sample data, they can perform well.

* SENTIMENT ANALYSIS
Also called 'opinion mining'. A basic task in sentiment analysis is classifying a given text as positive, negative or neutral. Advanced, 'beyond polarity' sentiment classification looks, for instance, at emotional states such as 'angry', 'sad' and 'happy'. Sentiment analysis is applied to user materials such as reviews and survey responses, comments and posts on social media, and healthcare materials, for applications that range from marketing to customer service, from stock exchange transactions to clinical medicine.

* SUPERVISED MACHINE LEARNING MODELS
For the creation of supervised machine learning models, humans annotate sample text with labels before feeding it to a machine to learn. Each sentence, paragraph or text is judged by at least 3 annotators on whether it is spam or not spam, positive or negative, etc.

* TRAINING DATA
Machine learning algorithms need guidance. In order to separate one thing from another, they need texts to extract patterns from. One should carefully choose the training material, and adapt it to the machine's task. It doesn't make sense to train a machine with nineteenth-century novels if its mission is to analyze tweets.

* UNSUPERVISED MACHINE LEARNING MODELS
Unsupervised machine learning models don't need the step of annotation of the data by humans. This saves a lot of time, energy and money. Instead, they need a large amount of training data, which is not always available and can take a long cleaning time beforehand.

* WORD EMBEDDINGS
Language modelling techniques that, through multiple mathematical operations of counting and ordering, plot words into a multi-dimensional vector space. When embedding words, they transform from being distinct symbols into mathematical objects that can be multiplied, divided, added or subtracted.

* WORDNET
Wordnet is a combination of a dictionary and a thesaurus that can be read by machines. According to Wikipedia it was created in the Cognitive Science Laboratory of Princeton University starting in 1985. The project was initially funded by the US Office of Naval Research and later also by other US government agencies including DARPA, the National Science Foundation, the Disruptive Technology Office (formerly the Advanced Research and Development Activity), and REFLEX.
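The character trigrams from the CHARACTER N-GRAM entry can be isolated with one small function. A minimal sketch, not part of the glossary, reusing the entry's example word:

    # All overlapping sequences of n characters in a text.
    def char_ngrams(text, n=3):
        return [text[i:i + n] for i in range(len(text) - n + 1)]

    print(char_ngrams("Suicide"))
    # ['Sui', 'uic', 'ici', 'cid', 'ide']

Counting how often each trigram occurs across a whole text gives the kind of stylistic fingerprint that authorship recognition relies on.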
|
|||
|
|
|||
|
█▒░░ ▓▒█░░▒▓███▀▒░ ░▒ ▒ ░ ▒▒▓██▒ ░░ ▒░ ░ ▒ ░▓ ░▒▓ ▒ ▒█░░▒█▓▒░▓▒ ▒▓▒▒░ ▒ ▒▒ ▓▒░ ▒░▓ █ ▒░ █░█ ▓▒░ ▒▓░░
|
|||
|
▓ ▒░ ▒▒ ░▒░▒▓ ░ ░ ▒ ░ ▒ ▒ ░ ░ ░▒░ ░▒ ░ ▒ ░▒░▓░ ▒ ░ ▒ ░▒░ ░ ░▒ ▒░░ ░ ▒ ░░▓ ░ ▓▓░░ ░░▒▒▓░
|
|||
|
░ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░░ ░ ▒ ░ ░ ░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░
|
|||
|
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
|
|||
|
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
52
|
|||
|
◝ humans learn with machines ◜ ◡ machines learn from machines ◞ ◡ machines learn with humans ◞ ◝
|
|||
|
humans learn from machines ◟ ◜ machines learn with machines ◠ ◜ machines learn from humans ◟ ◠
|
|||
|
humans learn with humans ◞ ◝ humans learn from humans ◞ ◠ humans learn with machines ◟ ◡ mac
|
|||
|
ines learn from machines ◡ ◡ machines learn with humans ◟ ◡ humans learn from machines ◝ ◟
|
|||
|
achines learn with machines ◠ ◝ machines learn from humans ◜ ◝ humans learn with humans ◞ ◞
|
|||
|
humans learn from humans ◡ ◞ humans learn with machines ◠ ◠ machines learn from machines ◠
|
|||
|
machines learn with humans ◞ ◜ humans learn from machines ◜ ◠ machines learn with machines ◝
|
|||
|
◜ machines learn from humans ◜ ◠ humans learn with humans ◝ ◟ humans learn from humans ◞
|
|||
|
◜ humans learn with machines ◡ ◡ machines learn from machines ◡ ◟ machines learn with humans
|
|||
|
◠ ◠ humans learn from machines ◡ ◜ machines learn with machines ◜ ◟ machines learn from
|
|||
|
umans ◟ ◞ humans learn with humans ◞ ◟ humans learn from humans ◜ ◠ humans learn with ma
|
|||
|
hines ◜ ◠ machines learn from machines ◝ ◠ machines learn with humans ◝ ◞ humans learn f
|
|||
|
om machines ◝ ◡ machines learn with machines ◜ ◡ machines learn from humans ◜ ◠ humans l
|
|||
|
arn with humans ◡ ◡ humans learn from humans ◝ ◞ humans learn with machines ◟ ◡ machines
|
|||
|
learn from machines ◜ ◜ machines learn with humans ◠ ◞ humans learn from machines ◝ ◠ ma
|
|||
|
hines learn with machines ◟ ◟ machines learn from humans ◝ ◠ humans learn with humans ◟
|
|||
|
humans learn from humans ◝ ◜ humans learn with machines ◠ ◝ machines learn from machines ◞
|
|||
|
◠ machines learn with humans ◝ ◟ humans learn from machines ◟ ◞ machines learn with machines
|
|||
|
◜ ◞ machines learn from humans ◞ ◡ humans learn with humans ◠ ◞ humans learn from human
|
|||
|
◠ ◜ humans learn with machines ◡ ◞ machines learn from machines ◜ ◠ machines learn w
|
|||
|
th humans ◡ ◝ humans learn from machines ◝ ◟ machines learn with machines ◠ ◠ machine
|
|||
|
learn from humans ◞ ◟ humans learn with humans ◠ ◞ humans learn from humans ◠ ◠ huma
|
|||
|
s learn with machines ◡ ◡ machines learn from machines ◜ ◞ machines learn with humans ◡
|
|||
|
◟ humans learn from machines ◜ ◜ machines learn with machines ◜ ◝ machines learn from human
|
|||
|
◜ ◠ humans learn with humans ◝ ◡ humans learn from humans ◡ ◞ humans learn with mach
|
|||
|
nes ◜ ◝ machines learn from machines ◝ ◜ machines learn with humans ◞ ◜ humans learn
|
|||
|
rom machines ◞ ◝ machines learn with machines ◞ ◜ machines learn from humans ◡ ◞ huma
|
|||
|
s learn with humans ◟ ◜ humans learn from humans ◞ ◡ humans learn with machines ◝ ◝ m
|
|||
|
chines learn from machines ◜ ◟ machines learn with humans ◡ ◟ humans learn from machines ◠
|
|||
|
◝ machines learn with machines ◜ ◡ machines learn from humans ◞ ◝ humans learn with huma
|
|||
|
s ◝ ◠ humans learn from humans ◞ ◜ humans learn with machines ◠ ◝ machines learn from
|
|||
|
machines ◟ ◡ machines learn with humans ◝ ◝ humans learn from machines ◞ ◞ machines l
|
|||
|
arn with machines ◠ ◠ machines learn from humans ◠ ◡ humans learn with humans ◜ ◜ hum
|
|||
|
ns learn from humans ◞ ◞ humans learn with machines ◡ ◝ machines learn from machines ◟
|
|||
|
◝ machines learn with humans ◠ ◟ machines learn with humans ◠ ◜ machines learn from
|
|||
|
machines ◡ ◜ humans learn with machines ◞ ◟ humans learn from humans ◜ ◡ humans learn
|
|||
|
with humans ◝ ◞ machines learn from humans ◜ ◝ machines learn with machines ◜ ◠ human
|
|||
|
learn from machines ◡ ◝ machines learn with humans ◝ ◜ machines learn from machines ◜
|
|||
|
◞ humans learn with machines ◠ ◝ humans learn from humans ◠ ◝ humans learn with humans ◞
|
|||
|
◡ machines learn from humans ◜ ◝ machines learn with machines ◠ ◟ humans learn from machi
|
|||
|
es ◜ ◟ machines learn with humans ◝ ◝ machines learn from machines ◞ ◜ humans learn w
|
|||
|
th machines ◝ ◡ humans learn from humans ◝ ◝ humans learn with humans ◠ ◠ machines le
|
|||
|
rn from humans ◝ ◡ machines learn with machines ◡ ◡ humans learn from machines ◠ ◞ ma
|
|||
|
hines learn with humans ◝ ◜ machines learn from machines ◜ ◝ humans learn with machines ◠
|
|||
|
◞ humans learn from humans ◝ ◡ humans learn with humans ◞ ◡ machines learn from humans ◟
|
|||
|
◟ machines learn with machines ◝ ◝ humans learn from machines ◜ ◟ machines learn with
|
|||
|
umans ◡ ◝ machines learn from machines ◡ ◝ humans learn with machines ◞ ◜ humans lear
|
|||
|
from humans ◜ ◝ humans learn with humans ◞ ◡ machines learn from humans ◝ ◡ machines
|
|||
|
learn with machines ◞ ◟ humans learn from machines ◜ ◞ machines learn with humans ◟ ◡
|
|||
|
machines learn from machines ◜ ◝ humans learn with machines ◠ ◠ humans learn from humans ◠
|
|||
|
◝ humans learn with humans ◟ ◞ machines learn from humans ◝ ◠ machines learn with machines
|
|||
|
◜ ◟ humans learn from machines ◠ ◝ machines learn with humans ◝ ◜ machines learn from ma
|
|||
|
hines ◟ ◟ humans learn with machines ◞ ◡ humans learn from humans ◝ ◝ humans learn with
|
|||
|
umans ◡ ◝ machines learn from humans ◝ ◡ machines learn with machines ◟ ◞ humans learn f
|
|||
|
om machines ◝ ◟ machines learn with humans ◝ ◜ machines learn from machines ◝ ◠ humans l
|
|||
|
arn with machines ◠ ◠ humans learn from humans ◟ ◜ humans learn with humans ◟ ◝ machines
|
|||
|
learn from humans ◡ ◡ machines learn with machines ◜ ◜ humans learn from machines ◠ ◟ ma
|
|||
|
hines learn with humans ◞ ◜ machines learn from machines ◠ ◜ humans learn with machines ◜
|
|||
|
◞ humans learn from humans ◝ ◟ humans learn with humans ◟ ◞ machines learn from humans ◟
|
|||
|
◝ machines learn with machines ◡ ◜ humans learn from machines ◠ ◠ machines learn with humans ◞
|
|||
|
◡ machines learn from machines ◟ ◝ humans learn with machines ◜ ◞ humans learn from huma
|
|||
|
s ◝ ◞ humans learn with humans ◜ ◟ machines learn from humans ◜ ◞ machines learn with ma
|
|||
|
hines ◝ ◞ humans learn from machines ◝ ◜ machines learn with humans ◟ ◜ machines learn from
|
|||
|
machines ◡ ◟ humans learn with machines ◞ ◠ humans learn from humans ◞ ◟ humans learn with
|
|||
|
umans ◠ ◜ machines learn from humans ◡ ◠ machines learn with machines ◠ ◝ humans learn from
|
|||
|
machines ◠ ◜ machines learn with humans ◞ ◠ machines learn from machines ◞ ◠ humans learn w
|
|||
|
th machines ◜ ◟ humans learn from humans ◝ ◠ humans learn with humans ◝ ◟ machines learn from
|
|||
|
humans ◜ ◜ machines learn with machines ◠ ◞ humans learn from machines ◠ ◡ machines learn with
|
|||
|
|
|||
|
|