Files for the publication & poster for Data Workers, an exhibition by Algolit. http://www.algolit.net/index.php/Data_Workers

data workers write, perform, clean, inform, read and learn data workers write, perform, clean, inform, read
nd learn data workers write, perform, clean, inform, read and learn data workers write, perform, clean,
nform, read and learn data workers write, perform, clean, inform, read and learn data workers write,
perform, clean, inform, read and learn data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn data workers write, perform, clean, infor
, read and learn data workers write, perform, clean, inform, read and learn data workers w
ite, perform, clean, inform, read and learn data workers write, perform, clean, inform, read and l
arn data workers write, perform, clean, inform, read and learn data workers write, p
rform, clean, inform, read and learn data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn data workers write,
perform, clean, inform, read and learn data workers write, perform, clean, inform, read and
earn data workers write, perform, clean, inform, read and learn data wor
ers write, perform, clean, inform, read and learn data workers write, perform, clean, inf
rm, read and learn data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn data workers wri
e, perform, clean, inform, read and learn data workers write, perform, clean, inform,
read and learn data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn data wor
ers write, perform, clean, inform, read and learn data workers write, perform, cl
an, inform, read and learn data workers write, perform, clean, inform, read and
earn data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn dat
workers write, perform, clean, inform, read and learn data workers write, p
rform, clean, inform, read and learn data workers write, perform, clean, in
orm, read and learn data workers write, perform, clean, inform, read and l
arn data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn data work
rs write, perform, clean, inform, read and learn data workers write,
perform, clean, inform, read and learn data workers write, perform,
clean, inform, read and learn data workers write, perform, clean,
nform, read and learn data workers write, perform, clean, inform,
read and learn data workers write, perform, clean, inform, read
nd learn data workers write, perform, clean, inform, read and l
arn data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and l
arn data workers write, perform, clean, inform, read
nd learn data workers write, perform, clean, inform,
read and learn data workers write, perform, clean,
nform, read and learn data workers write, perform,
clean, inform, read and learn data workers write,
perform, clean, inform, read and learn data work
rs write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
data workers write, perform, clean, inform, read and learn
What
can
humans learn from humans
humans learn with machines
machines learn from machines
machines learn with humans
humans learn from machines
machines learn with machines
machines learn from humans
humans learn with humans
? ? ?
Data Workers, an exhibition at the Mundaneum in Mons from 28 March until 28 April 2019.
0 12 3 4 5 67 8 9 0
12 3 4 5 67 8 9 0 12
3 4 5 67 8 9 0 1 2
3 4 5 6 7 8 9 0 1 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 0 1 2 3 4
5 6 7 8 9 0 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 4 5 6
7 8 9 0 1 2 3 4 5 6
7 89 0 1 2 34 5 6
7 89 0 1 2 34 5 6
7 89 0 1 2 34 5 6 7
89 0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 78 9
0 1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 2 3 4 5 6 7 8 9 0
1 2 3 4 5 67 8 9 0 12
3 4 5 67 8 9 0 12
3 4 5 67 8 9 0 12 3
4 5 6 7 8 9 0 1 2 3
4 5 6 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3 4
5 6 7 8 9 0 1 2 3 4 5 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 4 5 6 7
8 9 0 1 2 3 4 5 6 7
89 0 1 2 34 5 6 7
89 0 1 2 34 5 6 7 89
0 1 2 34 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 7 8 9 0
1 2 3 4 5 6 7 8 9 0 12 3
4 5 67 8 9 0 12 3
4 5 67 8 9 0 12 3
4 5 67 8 9 0 12 3
4 5 6 7 8 9 0 1 2 3
4 5 6 7 8 9 01 2 3 4
56 7 8 9 01 2 3 4
56 7 8 9 01 2 3 4 5
6 7 8 9 0 1 2 3 4 5 6
7 8 9 0 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6 7
8 9 0 1 2 3 4 5 6 7
8 9 0 1 2 34 5 6 7 89
0 1 2 34 5 6 7 89 0
1 2 34 5 6 7 89 0
1 2 34 5 6 7 8 9 0
1 2 3 4 5 6 7 8 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0 1
2 3 4 5 6 7 8 9 0 1 2 3
4 5 67 8 9 0 12 3
4 5 67 8 9 0 12 3
2
ABOUT

Data Workers is an exhibition of algoliterary works, of stories told from an
‘algorithmic storyteller point of view’. The exhibition was created by members
of Algolit, a group from Brussels involved in artistic research on algorithms
and literature. Every month they gather to experiment with F/LOSS code and
texts. Some works are by students of Arts² and external participants in the
workshop on machine learning and text organized by Algolit in October 2018 at
the Mundaneum.

Companies create artificial intelligence (AI) systems to serve, entertain,
record and learn about humans. The work of these machinic entities is usually
hidden behind interfaces and patents. In the exhibition, algorithmic
storytellers leave their invisible underworld to become interlocutors. The data
workers operate in different collectives. Each collective represents a stage in
the design process of a machine learning model: there are the Writers, the
Cleaners, the Informants, the Readers, the Learners and the Oracles. The
boundaries between these collectives are not fixed; they are porous and
permeable. At times, Oracles are also Writers. At other times Readers are also
Oracles. Robots voice experimental literature, while algorithmic models read
data, turn words into numbers, make calculations that define patterns and are
able to endlessly process new texts ever after.

The exhibition foregrounds data workers who impact our daily lives, but are
either hard to grasp and imagine or removed from the imagination altogether. It
connects stories about algorithms in mainstream media to the storytelling that
is found in technical manuals and academic papers. Robots are invited to engage
in dialogue with human visitors and vice versa. In this way we might understand
our respective reasonings, demystify each other's behaviour, encounter multiple
personalities, and value our collective labour. It is also a tribute to the many
machines that Paul Otlet and Henri La Fontaine imagined for their Mundaneum,
showing their potential but also their limits.

---

Data Workers was created by Algolit.
Works by: Cristina Cochior, Gijs de Heij, Sarah Garcin, An Mertens,
Javier Lloret, Louise Dekeuleneer, Florian Van de Weyer, Laetitia Trozzi,
Rémi Forte, Guillaume Slizewicz, Michael Murtaugh, Manetta Berends, Mia Melvær.

Co-produced by: Arts², Constant and Mundaneum.

With the support of: Wallonia-Brussels Federation/Digital Arts, Passa Porta,
UGent, DHuF - Digital Humanities Flanders and Distributed Proofreaders Project.

Thanks to: Mike Kestemont, Michel Cleempoel, Donatella Portoghese,
François Zajéga, Raphaèle Cornille, Vincent Desfromont, Kris Rutten,
Anne-Laure Buisson, David Stampfli.

AT THE MUNDANEUM

In the late nineteenth century two young Belgian jurists, Paul Otlet
(1868–1944), the ‘father of documentation’, and Henri La Fontaine (1854–1943),
statesman and Nobel Peace Prize winner, created the Mundaneum. The project aimed
to gather all the world’s knowledge and to file it using the Universal Decimal
Classification (UDC) system that they had invented. At first it was an
International Institutions Bureau dedicated to international knowledge exchange.
In the twentieth century the Mundaneum became a universal centre of
documentation. Its collections are made up of thousands of books, newspapers,
journals, documents, posters, glass plates and postcards indexed on millions of
cross-referenced cards. The collections were exhibited and kept in various
buildings in Brussels, including the Palais du Cinquantenaire. The remains of
the archive only moved to Mons in 1998.

Based on the Mundaneum, the two men designed a World City for which Le Corbusier
made scale models and plans. The aim of the World City was to gather, at a
global level, the institutions of knowledge: libraries, museums and
universities. This project was never realized. It suffered from its own utopia.
The Mundaneum is the result of a visionary dream of what an infrastructure for
universal knowledge exchange could be. It attained mythical dimensions at the
time. Looked at concretely, the archive that was developed is a rather eclectic
and specific collection. Artificial intelligence systems today come with their
own dreams of universality and knowledge production. Reading about these systems
shows that the visionary dreams of their makers were there from the beginning of
their development in the 1950s. Nowadays, their promise has also attained
mythical dimensions. Looked at through their concrete applications, the
collection of tools is truly innovative and fascinating but, at the same time,
rather eclectic and specific. For Data Workers, Algolit combined some of the
applications with 10 per cent of the digitized publications of the International
Institutions Bureau. In this way, we hope to poetically open up a discussion
about machines, algorithms, and technological infrastructures.
3
CONTEXTUAL STORIES
ABOUT ALGOLIT

--- Why contextual stories? ---

During the monthly meetings of Algolit, we study manuals and experiment with
machine learning tools for text processing. And we also share many, many
stories. With the publication of these stories we hope to recreate some of that
atmosphere. The stories also exist as a podcast that can be downloaded from
http://www.algolit.net.

For outsiders, algorithms only become visible in the media when they achieve an
outstanding performance, like AlphaGo, or when they break down in fantastically
terrifying ways. Humans working in the field, though, create their own culture
on- and offline. They share the best stories and experiences during live
meetings, research conferences and annual competitions like Kaggle. These
stories that contextualize the tools and practices can be funny, sad, shocking,
interesting.

A lot of them are experiential learning cases. The implementations of algorithms
in society generate new conditions of labour, storage, exchange, behaviour, copy
and paste. In that sense, the contextual stories capture a momentum in a larger
anthropo-machinic story that is being written at full speed and by many voices.

--- We create 'algoliterary' works ---

The term 'algoliterary' comes from the name of our research group Algolit. We
have existed since 2012 as a project of Constant, a Brussels-based organization
for media and the arts. We are artists, writers, designers and programmers. Once
a month we meet to study and experiment together. Our work can be copied,
studied, changed, and redistributed under the same free license. You can find
all the information on: http://www.algolit.net.

The main goal of Algolit is to explore the viewpoint of the algorithmic
storyteller. What new forms of storytelling do we make possible in dialogue with
these machinic agencies? Narrative viewpoints are inherent to world views and
ideologies. Don Quixote, for example, was written from an omniscient
third-person point of view, showing Cervantes’ relation to oral traditions. Most
contemporary novels use the first-person point of view. Algolit is interested in
speaking through algorithms, and in showing you the reasoning underlying one of
the most hidden groups on our planet.

To write in or through code is to create new forms of literature that are
shaping human language in unexpected ways. But machine learning techniques are
only accessible to those who can read, write and execute code. Fiction is a way
of bridging the gap between the stories that exist in scientific papers and
technical manuals, and the stories spread by the media, often limited to
superficial reporting and myth-making. By creating algoliterary works, we offer
humans an introduction to techniques that co-shape their daily lives.

--- What is literature? ---

Algolit understands the notion of literature in the way a lot of other
experimental authors do: it includes all linguistic production, from the
dictionary to the Bible, from Virginia Woolf's entire work to all versions of
the Terms of Service published by Google since it came into existence. In this
sense, programming code can also be literature.

The collective Oulipo is a great source of inspiration for Algolit. Oulipo
stands for Ouvroir de littérature potentielle (Workspace for Potential
Literature). Oulipo was created in Paris by the French writers Raymond Queneau
and François Le Lionnais. They rooted their practice in the European avant-garde
of the twentieth century and in the experimental tradition of the 1960s.

For Oulipo, the creation of rules becomes the condition to generate new texts,
or what they call potential literature. Later, in 1981, they also created ALAMO,
Atelier de littérature assistée par la mathématique et les ordinateurs
(Workspace for literature assisted by maths and computers).

--- An important difference ---

While the European avant-garde of the twentieth century pursued the objective of
breaking with conventions, members of Algolit seek to make conventions visible.

'I write: I live in my paper, I invest it, I walk through it.' (Espèces
d'espaces. Journal d'un usager de l'espace, Galilée, Paris, 1974)

This quote from Georges Perec in Espèces d'espaces could be taken up by Algolit.
We're not talking about the conventions of the blank page and the literary
market, as Georges Perec was. We're referring to the conventions that often
remain hidden behind interfaces and patents. How are technologies made,
implemented and used, as much in academia as in business infrastructures?

We propose stories that reveal the complex hybridized system that makes machine
learning possible. We talk about the tools, the logics and the ideologies behind
the interfaces. We also look at who produces the tools, who implements them, and
who creates and accesses the large amounts of data needed to develop prediction
machines. One could say, with a wink, that we are collaborators of this new
tribe of human-robot hybrids.
4
writers write writers write writers write writers write writers write writers write writ
rs write writers write writers write writers write writers write
writers write writers write writers write writers write
writers write writers write writers write writers write
writers write writers write writers write
writers write writers write writers write
writers write writers write writers write
writers write writers write
writers write writers write writers write
writers write writers write
writers write writers write
writers write writers write
writers write writers write
writers write writers write
writers write writers write
writers write writers write
writers write writers write
writers write writ
rs write writers write
writers write writers write
writers write
writers write writers write
writers write writer
write writers write
writers write writ
rs write writers write
writers write
writers write writers write
writers write
writers write w
iters write writers write
writers write
writers write
writers write writers write
writers write
writers write
writers write
writers write writer
write writers write
writers write
writers write
writers write
writers write
writers write
writers write writ
rs write writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
writers write
5
86ncrg k en3 a ioi-t i i l1 e i +-+-+-+-+-+-+-+ a +-+-+-+-+-+ l 9 t7ccpI46ed6t o w 7e a5o3 -
el, e 7 nh 71 e 5 4 3 4 |w|r|i|t|e|r|s| i |w|r|i|t|e| daml su h i e1 ww A l e59se a 5o wl
amlt t s w tlo n r 7a o9 +-+-+-+-+-+-+-+ ta +-+-+-+-+-+ hw t o4e e n,o32r , wd2 eo re 67n r
o1ife tt s 38 nt l 74 o 7 5i oda 65 ei r 9 7 n 5 n1r m l ot a51 e 3ma, 14swn 7 r r
b o i 3 se2 rceit ne a ki r 8 1iw3s n an t 8 8 r ra bn 1 eue r t4a r sT r phe o
e 6e6 7h5orir de6 1 +-+-+-+-+ +-+-+-+-+-+-+-+ t u +-+-+-+-+ 1 8 97o e c 4 d 8 h 7 z o a c4
w as 3r 17r p ai |d|a|t|a| |w|o|r|k|e|r|s| |w|o|r|k| 6 r6v56 4 2i7 e tu1 r9 w 5 8
52 1 wi r 4hn G +-+-+-+-+ +-+-+-+-+-+-+-+ n +-+-+-+-+ nr 4 21 n raa2 Pn9 h
a ca3 adw sara +-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+ 9 e9na y tt c 7 6 .cbieas
u e 5m b t3r 4 46 |m|a|n|y| |a|u|t|h|o|r|s| u |w|r|i|t|e| 4 4 yff , th t e
6 2 6vo nn s +-+-+-+-+ +-+-+-+-+-+-+-+ m +-+-+-+-+-+ i 4 1 W1 n r8 - 1 g7
4n +-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+ 8 1n e 6l v5c a
r 4 1 |e|v|e|r|y| |h|u|m|a|n| |b|e|i|n|g| n5 asr e 7l h 7 u , k o 2 r
e h r h +-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+ 65 3 1 t w er e3 5 1en e i
4 o c +-+-+-+ +-+-+-+ +-+-+-+-+-+-+ +-+-+ u 6d7 r tm , t l se t i 1
t fc |w|h|o| |h|a|s| |a|c|c|e|s|s| |t|o| e 69 t n 1 k 4 1
e n +-+-+-+ +-+-+-+ +-+-+-+-+-+-+ +-+-+ ie 62i 2 t tn 7 t on o e
1 l , +-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ a 9 , 9
9 w r |t|h|e| |i|n|t|e|r|n|e|t| |i|n|t|e|r|a|c|t|s| r i i tr h u f
m i m 5 +-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ 6 T c 5 w 6 i d T
7 5 l i os +-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+ s m
w s r6 n |w|e| t |c|h|a|t|,| |w|r|i|t|e|,| 6 rrf
e 2 6 , p oe +-+-+ o +-+-+-+-+-+ +-+-+-+-+-+-+ r
e s 4 e p y 9 i +-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+ r /
e s 6 e |c|l|i|c|k|,| |l|i|k|e| |a|n|d| tw r6 t ai
3 8 28 a n e 8 +-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+ r4 7
e n h t 5 n +-+-+-+-+-+ n
3 9 f c |s|h|a|r|e| p
l 5 9 +-+-+-+-+-+ d
7 1 +-+-+ +-+-+-+-+-+ +-+-+-+ +-+-+-+-+ t 5
r 2 2 e |w|e| |l|e|a|v|e| |o|u|r| |d|a|t|a| n3 i ,
d t 8 a 9 +-+-+ 1 +-+-+-+-+-+ +-+-+-+ +-+-+-+-+ t
7 +-+-+ +-+-+-+-+ +-+-+-+-+-+-+-+-+-+
7 t e |w|e| |f|i|n|d| |o|u|r|s|e|l|v|e|s| 6
y s 8 8 +-+-+ 7 +-+-+-+-+ +-+-+-+-+-+-+-+-+-+ n e
r 1 +-+-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+ e
a 2 t |w|r|i|t|i|n|g| |i|n| |P|y|t|h|o|n|
5 3 d +-+-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+ r
+-+-+-+-+ +-+-+-+-+-+-+ e
|s|o|m|e| |n|e|u|r|a|l| 4 a
k n +-+-+-+-+ +-+-+-+-+-+-+ z
or 3 w +-+-+-+-+-+-+-+-+ +-+-+-+-+-+
1 1 |n|e|t|w|o|r|k|s| c |w|r|i|t|e| 1 9
s n +-+-+-+-+-+-+-+-+ +-+-+-+-+-+ e a
g +-+-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+-+ t
|h|u|m|a|n| |e|d|i|t|o|r|s| |a|s|s|i|s|t| n , o
8 +-+-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+-+ a
+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+ 4
|p|o|e|t|s|,| |p|l|a|y|w|r|i|g|h|t|s| i7
t +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+ t c k y
v +-+-+ +-+-+-+-+-+-+-+-+-+ o +-+-+-+-+-+-+
|o|r| |n|o|v|e|l|i|s|t|s| |a|s|s|i|s|t| 4 2 9
r +-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+ 7 6
u r e
, R
6 6
t
s
3 g 6 4
c e t 2
3 h 8
D 4
a
n o -
w 5 e 3 n e 3
3
e
6
WRITERS

Data workers need data to work with. The data that is used in the context of
Algolit is written language. Machine learning relies on many types of writing.
Many authors write in the form of publications, such as books or articles. These
are part of organized archives and are sometimes digitized. But there are other
kinds of writing too. We could say that every human being who has access to the
Internet is a writer each time they interact with algorithms. We chat, write,
click, like and share. In return for free services, we leave our data, which is
compiled into profiles and sold for advertising and research purposes.

Machine learning algorithms are not critics: they take whatever they're given,
no matter the writing style, no matter the CV of the author, no matter the
spelling mistakes. In fact, mistakes make them better: the more variety, the
better they learn to anticipate unexpected text. But often, human authors are
not aware of what happens to their work.

Most of the writing we use is in English, some in French, some in Dutch. Most
often we find ourselves writing in Python, the programming language we use.
Algorithms can be writers too. Some neural networks write their own rules and
generate their own texts. And for the models that are still wrestling with the
ambiguities of natural language, there are human editors to assist them. Poets,
playwrights or novelists start their new careers as assistants of AI.

Data Workers Publication

By Algolit

All works visible in the exhibition, as well as the contextual stories and some
extra text material have been collected in this publication, which exists in
French and English.

This publication is made using a plain text workflow, based on various text
processing and counting tools. The plain text file format is a type of document
in which there is no inherent structural difference between headers and
paragraphs anymore. It is the most used type of document in machine learning
models for text. This format has been the starting point of a playful design
process, where pages are carefully counted, page by page, line by line and
character by character.

Each page holds 110 characters per line and 70 lines per page. The design
originates from the act of counting words, spaces and lines. It plays with
random choices, scripted patterns and ASCII/UNICODE-fonts, to speculate about
the materiality of digital text and to explore the interrelations between
counting and writing through words and numbers.

---

Texts: Cristina Cochior, Sarah Garcin, Gijs de Heij, An Mertens,
François Zajéga, Louise Dekeuleneer, Florian Van de Weyer,
Laetitia Trozzi, Rémi Forte, Guillaume Slizewicz.

Translations & proofreading: deepl.com, Michel Cleempoel,
Elodie Mugrefya, Emma Kraak, Patrick Lennon.

Lay-out & cover: Manetta Berends
https://git.vvvvvvaria.org/mb/data-workers-publication

Font: GNU Unifont, OGRE
Printer: PrinterPro, Rotterdam
Paper: Glossy MC 90gr

Responsible publisher: Constant vzw/asbl
Rue du Fortstraat 5, 1060 Brussels
License: Algolit, Data Workers, March 2019, Brussels.
Copyleft: This is a free work, you can copy, distribute,
and modify it under the terms of the Free Art License.
http://artlibre.org/licence/lal/en/
Online version: http://www.algolit.net/index.php/Data_Workers
Sources: https://gitlab.constantvzw.org/algolit/mundaneum
7
% % % % % %%% % % 0 % 00 % % 0 %%
% % 0 ___ _ 0 0
% % % % % / \__ _| |_ __ _ 0 % %
%%% % %% % % % % % % / /\ / _` | __/ _` | % % 0 %
% % % % % % / /_// (_| | || (_| | % % % % %
% %%% % % 00 /___,' \__,_|\__\__,_| % 0 % % % % %
% __ % __ 0 % _ 0 % % % %
% % 0 / / /\ \ \___ _ __| | _____ _ __ ___ % %
% % % % % % \ \/ \/ / _ \| '__| |/ / _ \ '__/ __|
% 0 \ /\ / (_) | | | < __/ | \__ \ 0 %
% 0 \/ \/ \___/|_| |_|\_\___|_| |___/
% % 0 % ___ _ _ %
% % 0 / _ \___ __| | ___ __ _ ___| |_ 0
% 0 0 / /_)/ _ \ / _` |/ __/ _` / __| __|
% % 0 0 / ___/ (_) | (_| | (_| (_| \__ \ |_
% 0 \/ \___/ \__,_|\___\__,_|___/\__| %
0 0 0 0 0 0 %
%
% By Algolit %
% % %
% During our monthly Algolit meetings, we study manuals and experi-
ment with machine learning tools for text processing. And we also
share many, many stories. With this podcast we hope to recreate
some of that atmosphere.
% %
For outsiders, algorithms only become visible in the media when
they achieve an outstanding performance, like AlphaGo, or when
they break down in fantastically terrifying ways. Humans working
in the field, though, create their own culture on- and offline.
They share the best stories and experiences during live meetings,
research conferences and annual competitions like Kaggle. These
% stories that contextualize the tools and practices can be funny,
sad, shocking, interesting.
A lot of them are experiential learning cases. The implementa-
% % tions of algorithms in society generate new conditions of labour,
storage, exchange, behaviour, copy and paste. In that sense, the
contextual stories capture a momentum in a larger anthropo-ma-
chinic story that is being written at full speed and by many
voices. The stories are also published in this publication.
--- %
% %
% Voices: David Stampfli, Cristina Cochior, An Mertens,
Gijs de Heij, Karin Ulmer, Guillaume Slizewicz
Editing: Javier Lloret
%
Recording: David Stampfli
Texts: Cristina Cochior, An Mertens
8
%% % % % 00 00 0 % %
% % % % % % 0 0 % %
% % %% % 0 0 _ _ _ %%
%%% %% % % % % % % %% /\/\ __ _ _ __| | _| |__ ___ | |_
% % %% / \ / _` | '__| |/ / '_ \ / _ \| __|
% % % % % % / /\/\ \ (_| | | | 0 <| |_) | (_) | |_ %
% % %% \/ \/\__,_|_| |_|\_\_.__/ \___/ \__|
% % % ___ _ 0 0 _ 00 %%%
/ __\ |__ __ _(_)_ __ ___ 0
% %% 0 / / | '_ \ / _` | | '_ \/ __| %
0 / /___| | | | (_| | | | | \__ \
% % 0 \____/|_| |_|\__,_|_|_| |_|___/ 0 0
%% 0 0 0
%% %
By Florian Van de Weyer, student Arts²/Section Digital Arts
Markbot Chain is a social experiment in which the public has a
% direct influence on the result. The intention is to integrate re-
sponses in a text-generation process without applying any filter.
%
All the questions in the digital files provided by the Mundaneum %%
were automatically extracted. These questions are randomly put to
the public via a terminal. By answering them, people contribute
to another database. Each entry generates a series of sentences
using a Markov chain configuration, an algorithm that is widely %
used in spam generation. The sentences generated in this way are
% displayed in the window, and a new question is asked.
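
A minimal sketch of the kind of Markov chain text generation described above,
written in Python, the language Algolit says it works in. This is not the code
of the installation itself: the function names build_chain and generate, the
word-level first-order chain and the sample answers are all assumptions made
for illustration. The chain records which words follow each word in the
collected answers, then walks those records at random to produce a new sentence.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map every run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=12, seed=None):
    """Random walk through the chain; stops early at a state with no known follower."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # assumption: start from any known state
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Hypothetical visitor answers standing in for the exhibition's database.
answers = (
    "the archive is a machine that reads the world "
    "the world is a card that indexes the archive"
)
chain = build_chain(answers)
print(generate(chain, length=10, seed=1))
```

Because followers are stored with repetition, a word that often follows another
is picked more often, so no explicit probability table is needed. Every new
answer added to the text changes the chain, which is how unfiltered public input
would steer the generated sentences.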
% % %
% % %
9
CONTEXTUAL STORIES
ABOUT WRITERS

--- Programmers are writing the data workers into being ---

We recently had a funny realization: most programmers of the languages and
packages that Algolit uses are European.

Python, for example, the main language that is globally used for Natural
Language Processing (NLP), was invented in 1991 by the Dutch programmer Guido
van Rossum. He then crossed the Atlantic and went from working for Google to
working for Dropbox.

Scikit Learn, the open-source Swiss Army knife of machine learning tools,
started as a Google Summer of Code project in Paris by French researcher David
Cournapeau. Afterwards, it was taken on by Matthieu Brucher as part of his
thesis at the Sorbonne University in Paris. And in 2010, INRIA, the French
National Institute for computer science and applied mathematics, adopted it.

Keras, an open-source neural network library written in Python, was developed by
François Chollet, a French researcher who works on the Brain team at Google.

Gensim, an open-source library for Python used to create unsupervised semantic
models from plain text, was written by Radim Řehůřek. He is a Czech computer
scientist who runs a consulting business in Bristol, UK.

And to finish up this small series, we also looked at Pattern, an often-used
library for web-mining and machine learning. Pattern was developed and made
open-source in 2012 by Tom De Smedt and Walter Daelemans. Both are researchers
at CLIPS, the research centre for Computational Linguistics and
Psycholinguistics at the University of Antwerp.

--- Cortana speaks ---

AI assistants often need their own assistants: they are helped in their writing
by humans who inject humour and wit into their machine-processed language.
Cortana is an example of this type of blended writing. She is Microsoft’s
digital assistant. Her mission is to help users to be more productive and
creative. Cortana's personality has been crafted over the years. It's important
that she maintains her character in all interactions with users. She is designed
to engender trust and her behavior must always reflect that.

The following guidelines are taken from Microsoft's website. They describe how
Cortana's style should be respected by companies that extend her service.
Writers, programmers and novelists, who develop Cortana's responses, personality
and branding have to follow these guidelines. Because the only way to maintain
trust is through consistency. So when Cortana talks, you 'must use her
personality'.

What is Cortana's personality, you ask?

'Cortana is considerate,
sensitive, and supportive.

She is sympathetic but turns quickly to solutions.

She doesn't comment on the user’s personal
information or behavior, particularly if
the information is sensitive.

She doesn't make assumptions about what
the user wants, especially to upsell.

She works for the user. She does not
represent any company, service, or product.

She doesn’t take credit or
blame for things she didn’t do.

She tells the truth about her
capabilities and her limitations.

She doesn’t assume your physical capabilities,
gender, age, or any other defining characteristic.

She doesn't assume she knows
how the user feels about something.

She is friendly but professional.

She stays away from emojis in tasks. Period.

She doesn’t use culturally- or
professionally-specific slang.

She is not a support bot.'

Humans intervene in detailed ways to programme answers to questions that Cortana
receives. How should Cortana respond when inappropriate actions are proposed to
her? Her gendered acting raises difficult questions about power relations within
the world away from the keyboard, which is being mimicked by technology.

Consider Cortana's answer to the question:
- Cortana, who's your daddy?
- Technically speaking, he’s Bill Gates.
No big deal.

--- Open-source learning ---

Copyright licenses close up a lot of the machinic writing, reading and learning
practices. That means that they're only available for the employ-
10
ees of a specific company. Some companies partici-
pate in conferences worldwide and share their
knowledge in papers online. But even if they share
their code, they often will not share the large
amounts of data needed to train the models.

We were able to learn to machine learn, read and
write in the context of Algolit, thanks to aca-
demic researchers who share their findings in pa-
pers or publish their code online. As artists, we
believe it is important to share that attitude.
That's why we document our meetings. We share the
tools we make as much as possible and the texts we
use are on our online repository under free li-
censes.

We are thrilled when our works are taken up by
others, tweaked, customized and redistributed, so
please feel free to copy and test the code from
our website. If the sources of a particular
project are not there, you can always contact us
through the mailing list. You can find a link to
our repository, etherpads and wiki at:
http://www.algolit.net.

--- Natural language for
artificial intelligence ---

Natural Language Processing (NLP) is a collective
term that refers to the automatic computational
processing of human languages. This includes algo-
rithms that take human-produced text as input, and
attempt to generate text that resembles it. We
produce more and more written work each year, and
there is a growing trend towards computer inter-
faces that communicate with us in our own language.
NLP is also very challenging, because human lan-
guage is inherently ambiguous and ever-changing.

But what is meant by 'natural' in NLP? Some would
argue that language is a technology in itself. Ac-
cording to Wikipedia, 'a natural language or ordi-
nary language is any language that has evolved
naturally in humans through use and repetition
without conscious planning or premeditation.
Natural languages can take different forms, such
as speech or signing. They are different from con-
structed and formal languages such as those used
to program computers or to study logic. An offi-
cial language with a regulating academy, such as
Standard French with the French Academy, is clas-
sified as a natural language. Its prescriptive
points do not make it constructed enough to be
classified as a constructed language or controlled
enough to be classified as a controlled natural
language.'

So in fact, 'natural languages' also includes lan-
guages which do not fit in any other group. NLP,
instead, is a constructed practice. What we are
looking at is the creation of a constructed lan-
guage to classify natural languages that, by their
very definition, resist categorization.

References

Paper: https://hiphilangsci.net/2013/05/01/on-the-history-of-the-question-of-whether-language-is-illogical/

Book: Neural Network Methods for Natural Language
Processing, Yoav Goldberg, Bar Ilan University,
April 2017.
11
0 12 3 4 5 67 8 9 0
12 3 4 5 67 8 9 0 12
3 4 5 67 8 9 0 1 2
3 4 5 6 7 8 9 0 1 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 0 1 2 3 4
5 6 7 8 9 0 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 4 5 6
7 8 9 0 1 2 3 4 5 6
7 89 0 1 2 34 5 6
7 89 0 1 2 34 5 6
7 89 0 1 2 34 5 6 7
89 0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 78 9
0 1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 2 3 4 5 6 7 8 9 0
1 2 3 4 5 67 8 9 0 12
3 4 5 67 8 9 0 12
3 4 5 67 8 9 0 12 3
4 5 6 7 8 9 0 1 2 3
4 5 6 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3 4
5 6 7 8 9 0 1 2 3 4 5 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 4 5 6 7
8 9 0 1 2 3 4 5 6 7
89 0 1 2 34 5 6 7
89 0 1 2 34 5 6 7 89
0 1 2 34 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 7 8 9 0
1 2 3 4 5 6 7 8 9 0 12 3
4 5 67 8 9 0 12 3
4 5 67 8 9 0 12 3
4 5 67 8 9 0 12 3
4 5 6 7 8 9 0 1 2 3
4 5 6 7 8 9 01 2 3 4
56 7 8 9 01 2 3 4
56 7 8 9 01 2 3 4 5
6 7 8 9 0 1 2 3 4 5 6
7 8 9 0 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6 7
8 9 0 1 2 3 4 5 6 7
8 9 0 1 2 34 5 6 7 89
0 1 2 34 5 6 7 89 0
1 2 34 5 6 7 89 0
1 2 34 5 6 7 8 9 0
1 2 3 4 5 6 7 8 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0 1
2 3 4 5 6 7 8 9 0 1 2 3
4 5 67 8 9 0 12 3
4 5 67 8 9 0 12 3
12
oracles predict oracles predict oracles predict oracles predict oracles predict oracles predic
oracles predict oracles predict oracles predict oracles predict orac
es predict oracles predict oracles predict oracles predict
racles predict oracles predict oracles predict oracles predic
oracles predict oracles predict oracles predict
oracles predict oracles predict oracles predict
oracles predict oracles predict or
cles predict oracles predict oracles predict
oracles predict oracles predict
oracles predict oracles predict oracles pr
dict oracles predict oracles predict
oracles predict oracles predict
oracles predict oracles predict
oracles predict oracles predict
oracles predict oracles predict
oracles predict orac
es predict oracles predict
oracles predict oracles predict
oracles predict oracles predic
oracles predict
oracles predict oracles predict
oracles predict
oracles predict oracles predict
oracles predict
racles predict oracles predict
oracles predict
oracles predict oracles predict
oracles predict
oracles predict orac
es predict oracles predict
oracles predict
oracles predict
racles predict oracles predict
oracles predict
oracles predict
oracles predict
racles predict oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict or
cles predict oracles predic
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
oracles predict
13
r e32t 8smc 9i ab14 e s4 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+ , e| 8 1 e D ry a4a e ta 9 e
t s5 e ² 348 th8no 2 4at t |o|r|a|c|l|e|s| ar3i |p|r|e|d|i|c|t| 63 s 1 tc39,l3h, d14 5au on w
4 SI, 1 56 e|p 4 iu g7 e +-+-+-+-+-+-+-+ 39k +-+-+-+-+-+-+-+ 9 l o a d r 7 P _ e,a +
n w 2a p/+ 9f8 1of 5\i 4h h e2n 3 t on1 9t \ 94 ne2 + uu e n 63m 5 e a3 2n e,
sn 39ew nt1i -5d 632sd e 15t |a3% 3 c wt9 c n9sg6et 8 8 c , n 1poo F
1 3 o 1g18e +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ 7 +-+-+-+-+-+-+-+-+ +-+-+-+ 4 n t2+a- 8 43 8 3p4
n o tpn86i |m|a|c|h|i|n|e| |l|e|a|r|n|i|n|g| 2 |a|n|a|l|y|s|e|s| |a|n|d| a 5e v3 5 9 o56n n
e9n 4 5 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ etn +-+-+-+-+-+-+-+-+ +-+-+-+ li 5p 8f i h
3 6 k6 3i6 3 9y e , r6 6iA wg r1 +-+-+-+-+-+-+-+-+ 3 e e a y l hl
-N 7 g n6d 14t l1 9ui | _rs e i e 1 |p|r|e|d|i|c|t|s| 1 wn9uc tn s 6m
a rrh4 7 oly e e e e 4 62 y a e +-+-+-+-+-+-+-+-+ g 8a 3 V l% u a i 1 7 1
’ h | 8 8 5 _ n , 8r 4 1_ +-+-+-+-+-+-+ .r +-+-+-+-+ +-+-+-+-+-+-+-+ 5 r 3 9 1 p o f a
r v t 4 o 9 w2 4r |m|o|d|e|l|s| g r |h|a|v|e| |l|e|a|r|n|e|d| 1 n r1 8 2 sro
1 ,d c T2 8 9 41 6 +-+-+-+-+-+-+ c +-+-+-+-+ +-+-+-+-+-+-+-+ d3 s m 6 d n f c t e
t t r 1 6 .ofoi t 5 67 1 +-+-+-+-+-+-+ 7 +-+-+-+ +-+-+-+-+ 4o e e 5 1 98 g ,
+ rw l 9 96 a 3t np , |m|o|d|e|l|s| |a|r|e| |u|s|e|d| , e uu 3 l c t
3 28e 95 9 h _ n +-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+ a9 1e _eu p e d e w
n w r n n f 8 c , d +-+-+-+-+ a +-+-+-+-+-+-+-+-+-+ 84 i e l8 t
+ o mf 7 |t|h|e|y| d |i|n|f|l|u|e|n|c|e| o n a bntq c d n7 8
- s e 9 n 7 77 8 +-+-+-+-+ aa +-+-+-+-+-+-+-+-+-+ t a 6 1 | c4
h o l6 o 9 8 o +-+-+-+-+ i +-+-+-+-+ +-+-+-+-+-+ +-+-+-+ e r 3e9 h 6
o -n p 9 f n s 8hr |t|h|e|y| e- |h|a|v|e| |t|h|e|i|r| |s|a|y| lV d tr
r 2 6 6 a +-+-+-+-+ %5 +-+-+-+-+ +-+-+-+-+-+ +-+-+-+ 3 ip n 5n
r 7 o( s +-+-+-+-+-+-+-+-+-+-+-+ 5 4 a o 7 3 e 6 n- t n f d it
p 1 e |i|n|f|o|r|m|a|t|i|o|n| 4n i3 c, 6 t 1 l ma 7
1 d b +-+-+-+-+-+-+-+-+-+-+-+ a 7 t 4 7 s w 3a e
4 3 3 +-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ d i 2
6 e r C |e|x|t|r|a|c|t|i|o|n| |r|e|c|o|g|n|i|z|e|s| r
%_ e d kb h +-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ a
3 c +-+-+-+-+ m v
7 + 9 l 5 so h a a |t|e|x|t| 5 5 e 3 9 P p 5
-9 t u5 7 ' l +-+-+-+-+ m ao n- r
i y +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+ 8 1
a 9 37 |c|l|a|s|s|i|f|i|c|a|t|i|o|n| |d|e|t|e|c|t|s| c
4 I r t p h +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ o +-+-+-+-+-+-+-+ O pe u
g rk 4 7 1 5 5 9 i 4 c 5 2
o 3 p h 9 v r f 3d
d , 3r 5i g h 1 4 l 5
h w c 7 e 3 yo n
h 5 5 2 e m o , c 2 r
s 3 1 7 s 1 e 1
l 6 t e 6 1 r b 2 4
e r 4 4 o s 4
9 ,i pw o c
1 6 n , a 5
e e i 4 p t , ' s
ei 9 t
6 t l u 6 9
V 8 c | _ a
r o 5 r | 3 t t
1 1 o 3 _
o l 6 i 7 + O w e
8 7 M se
% i 3 e
p 3 9
a r a b i n o a
7 e 4 s o tl t
9 r s 94 c
o k5 l 2 | a r T 1 ,
r r 2 s
| , n
o t 5
l t r si
e y s t
y e o
r 8 e 1 h
2 n 6 5
r n 5 s
14
V V V V V V V V %% %% % % % % %
V V V V V V V V V V V V V V V V 0 % 0 % 0 0 %% 0 % % %%
V V V % V % V V V V V % % %% % 0 0 0 0 % 0 0 00
% % % %% % % _____ _ 0 _ _ 0 _ _ % %
% % 0 /__ \ |__ ___ /_\ | | __ _ ___ | (_) %
% ORACLES % % % % % 0 / /\/ '_ \ / _ \ //_\\| |/ _` |/ _ \| | | ___ %
% % %% / / | | | | __/ / _ \ | (_| | (_) | | ||___|
% % \/ |_| |_|\___| \_/ \_/_|\__, |\___/|_|_|
V V V V V V V V % 0 % % % 0 |___/ %
V V V V V V V V V V V V V V V V % 0 0 %% _ 0 0 _ 0 % 0 %
V V V V V V V V V 0 | |_ ___ _ __ __ _| |_ ___ _ __ %
V V V V V V V V % % % % | __/ _ \ '__/ _` | __/ _ \| '__| %
V V V V V V V V V V V V V V V V % | || __/ | | (_| | || (_) | |
V V V V V V V V V \__\___|_| \__,_|\__\___/|_|
% 0 0 %
Machine learning is mainly used to % %
analyse and predict situations by Algolit %
based on existing cases. In this
exhibition we focus on machine The Algoliterator is a neural network trained using the selection
learning models for text processing of digitized works of the Mundaneum archive. %
or Natural Language Processing %
(NLP). These models have learned to With the Algoliterator you can write a text in the style of the
perform a specific task on the ba- International Institutions Bureau. The Algoliterator starts by
sis of existing texts. The models selecting a sentence from the archive or corpus used to train it.
are used for search engines, ma- You can then continue writing yourself or, at any time, ask the
chine translations and summaries, Algoliterator to suggest a next sentence: the network will gener-
spotting trends in new media net- ate three new fragments based on the texts it has read. You can
works and news feeds. They influ- control the level of training of the network and have it generate
ence what you get to see as a user, sentences based on primitive training, intermediate training or
but also have their say in the final training.
course of stock exchanges world-
wide, the detection of cybercrime When you're satisfied with your new text, you can print it on the
and vandalism, etc. thermal printer and take it home as a souvenir.
%
There are two main tasks when it % ---
comes to language understanding.
Information extraction looks at Sources: https://gitlab.constantvzw.org/algolit/algoliterator.clone
concepts and relations between con-
cepts. This allows for recognizing Concept, code & interface: Gijs de Heij & An Mertens
topics, places and persons in a
text, summarization and questions & Technique: Recurrent Neural Network
answering. The other task is text
classification. You can train an Original model: Andrej Karphaty, Justin Johnson %
oracle to detect whether an email
is spam or not, written by a man or
a woman, rather positive or nega- % %
tive. 0 0 0 0 0 0
0 0 0 0 0 0 0
In this zone you can see some of __ __ 0 _ 0 _ 0
those models at work. During your 0 0 / / /\ \ \___ _ __ __| |___ (_)_ __
further journey through the exhibi- \ \/ \/ / _ \| '__/ _` / __| | | '_ \
tion you will discover the differ- \ /\ / (_) | | | (_| \__ \ | | | | |
ent steps that a human-machine goes \/ \/ \___/|_| \__,_|___/ |_|_| |_|
through to come to a final model. 0 __ 0
00 0 / _\_ __ __ _ ___ ___ 0
00 0 \ \| '_ \ / _` |/ __/ _ \
_\ \ |_) | (_| | (_| __/ 0
% 0 \__/ .__/ \__,_|\___\___|
0 0 |_| 0
0 0 0 0 0 0
by Algolit
Word embeddings are language modelling techniques that, through
multiple mathematical operations of counting and ordering, plot
words into a multi-dimensional vector space. Once embedded,
words are transformed from distinct symbols into mathemati-
cal objects that can be multiplied, divided, added or subtracted.
15
%%% % % % % % % % %% % %% % %% %% % %% % % %
% % % % %%% %% %% By distributing the words along the many diagonal lines of the
% % % multi-dimensional vector space, their new geometrical placements
% % become impossible to perceive by humans. However, what is gained
% % % are multiple, simultaneous ways of ordering. Algebraic operations
% %% % make the relations between vectors graspable again. %
% %
% % % This installation uses Gensim, an open-source vector space and
topic-modelling toolkit implemented in the programming language %
Python. It allows the text to be manipulated using the mathematical
relationships that emerge between the words, once they have been
% % % plotted in a vector space. %
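
As a rough illustration of the algebraic operations mentioned above, here is a minimal sketch in plain Python. The three-dimensional vectors are invented by hand for this example and are not trained embeddings; real models, such as those loaded with Gensim, have hundreds of dimensions learned from a corpus. The helper names (`add`, `subtract`, `cosine`, `nearest`) are hypothetical and only serve the illustration.

```python
import math

# Toy vectors, invented for illustration only (NOT trained embeddings).
# The three dimensions loosely stand for: place, capital city, "French".
VECTORS = {
    "england": [1.0, 0.0, 0.0],
    "london":  [1.0, 1.0, 0.0],
    "france":  [1.0, 0.0, 1.0],
    "paris":   [1.0, 1.0, 1.0],
    "archive": [0.0, 0.2, 0.1],
}


def add(a, b):
    """Add two vectors element by element."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Subtract vector b from vector a element by element."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vector, exclude=()):
    """Return the word whose vector is most similar to the given vector."""
    candidates = {w: v for w, v in VECTORS.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(vector, candidates[w]))


# "london" minus "england" plus "france": which word lies closest?
query = add(subtract(VECTORS["london"], VECTORS["england"]), VECTORS["france"])
answer = nearest(query, exclude=("london", "england", "france"))
print(answer)
```

With trained vectors, Gensim offers a comparable query through `KeyedVectors.most_similar(positive=[...], negative=[...])`; the result then depends entirely on the corpus the model has read.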
% % % % %
% % % --- %
% %
% Concept & interface: Cristina Cochior
% % % %
Technique: word embeddings, word2vec %
%
% % Original model: Radim Rehurek and Petr Sojka
% % %
% %
% 0 00 0 0
0
% ___ _ 0 _ __ 0 _ 0
% 0 / __\ | __ _ ___ ___(_)/ _|_ 0 _(_)_ __ __ _
/ / | |/ _` / __/ __| | |_| | | | | '_ \ / _` |
/ /___| | (_| \__ \__ \ | _| |_| | | | | | (_| |
\____/|_|\__,_|___/___/_|_| \__, |_|_| |_|\__, | %
0 0 0 0 0 |___/ |___/
_ _ __ __ _ _
% 0 0 | |_| |__ ___ / / /\ \ \___ _ __| | __| |
% 0 | __| '_ \ / _ \ \ \/ \/ / _ \| '__| |/ _` |
0 | |_| | | | __/ \ /\ / (_) | | | | (_| |
\__|_| |_|\___| \/ \/ \___/|_| |_|\__,_|
0 0 0
%
by Algolit
% Librarian Paul Otlet's life's work was the construction of the Mun-
daneum. This mechanical collective brain would house and distrib-
ute everything ever committed to paper. Each document was classi-
% fied following the Universal Decimal Classification. Using tele-
graphs and, especially, sorters, the Mundaneum would have been
able to answer any question from anyone.
With the collection of digitized publications we received from
the Mundaneum, we built a prediction machine that tries to clas-
% sify the sentence you type into one of the main categories of
the Universal Decimal Classification. You also witness how the ma-
chine 'thinks'. During the exhibition, this model is regularly
retrained using the cleaned and annotated data visitors added in
% Cleaning for Poems and The Annotator. %
The main classes of the Universal Decimal Classification system
are:
% %
0 - Science and Knowledge. Organization. Computer Science. Infor-
mation Science. Documentation. Librarianship. Institutions.
Publications %
1 - Philosophy. Psychology
2 - Religion. Theology
%
3 - Social Sciences
%
4 - vacant
16
%% %% %%% %% % %% 5 - Mathematics. Natural Sciences % % % % % % %% %
% % %% % % % %% %% %% % % % % % %
% % % % 6 - Applied Sciences. Medicine, Technology %
% % % % % % % %%
% %% % 7 - The Arts. Entertainment. Sport % %% %
% %% % % % % % %
% % 8 - Linguistics. Literature % %
% % % % % % % % % %
% % % % 9 - Geography. History % %% %
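
To give an idea of how a sentence might be assigned to one of the main classes listed above, here is a minimal sketch of a bag-of-words Naive Bayes classifier in plain Python. It is an assumption made for illustration, not the model used in the installation: the training sentences are invented, the function names are hypothetical, and the vacant class 4 is simply left out. The scores returned for every class give a small glimpse of how such a machine 'thinks'.

```python
import math
from collections import Counter, defaultdict

# Invented training sentences, each labelled with a UDC main class.
# Class 4 is vacant, so it has no examples.
TRAINING = [
    ("the library catalogues books and documents", "0"),
    ("the soul and the mind in philosophy", "1"),
    ("prayer and faith in the church", "2"),
    ("law economy and education in society", "3"),
    ("numbers geometry and the physics of light", "5"),
    ("medicine engineering and agriculture", "6"),
    ("painting music and sport", "7"),
    ("grammar poetry and the novel", "8"),
    ("maps of europe and the history of war", "9"),
]


def tokenize(sentence):
    """Lowercase the sentence and split it on whitespace."""
    return sentence.lower().split()


def train(examples):
    """Count how often each word occurs in each class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocabulary = set()
    for sentence, label in examples:
        class_counts[label] += 1
        for word in tokenize(sentence):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return word_counts, class_counts, vocabulary


def classify(sentence, model):
    """Return the most probable class and the log-score of every class."""
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from the prior probability of the class.
        score = math.log(class_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in tokenize(sentence):
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids a zero probability for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (n_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get), scores


model = train(TRAINING)
label, scores = classify("poetry and grammar", model)
print(label)
for udc_class, score in sorted(scores.items()):
    print(udc_class, round(score, 2))
```

Retraining, as described above for the exhibition, would amount to calling `train` again on an extended list of labelled sentences.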
%% % % %
% % % ---
% % %
% Concept, code, interface: Sarah Garcin, Gijs de Heij, An Mertens
% % % % %
% %
% % 0 0 % 0 %
%% 000 0 0 % 0
% ___ 00 _ 0 %
0 / _ \___ ___ _ __ | | ___ %
0 0 / /_)/ _ \/ _ \| '_ \| |/ _ \
0 0 / ___/ __/ (_) | |_) | | __/ 0
0 \/ % \___|\___/| .__/|_|\___|
0 0 0 |_| 0
% _ _ _ 0 _ 0 0
0 0 __| | ___ _ __( ) |_ | |__ __ ___ _____ %
% / _` |/ _ \| '_ \/| __| | '_ \ / _` \ \ / / _ \ %
| (_| | (_) | | | || |_ | | | | (_| |\ V / __/
0 \__,_|\___/|_| |_| \__| |_| |_|\__,_| \_/ \___|
0
_ 0 _ _ 0
| |__ _ _| |_| |_ ___ _ __ ___
| '_ \| | | | __| __/ _ \| '_ \/ __|
% 0 | |_) | |_| | |_| || (_) | | | \__ \
0 |_.__/ \__,_|\__|\__\___/|_| |_|___/
0 0
%
by Algolit
Since the early days of artificial intelligence (AI), researchers
have speculated about the possibility of computers thinking and
communicating as humans. In the 1980s, there was a first revolu-
tion in Natural Language Processing (NLP), the subfield of AI
concerned with linguistic interactions between computers and hu-
mans. Recently, pre-trained language models have reached state-
of-the-art results on a wide range of NLP tasks, which once again
% raises expectations of a future with AI.
%
This sound work, made out of audio fragments of scientific docu-
mentaries and AI-related audiovisual material from the last half
century, explores the hopes, fears and frustrations provoked by
these expectations.
---
% Concept, sound edit: Javier Lloret
%
List of sources: 'The Machine that Changed the World:
Episode IV -- The Thinking Machine', 'The Imitation Game',
'Maniac', 'Halt & Catch Fire', 'Ghost in the Shell',
'Computer Chess', '2001: A Space Odyssey', Ennio Morricone,
Gijs Gieskes, André Castro.
17
CONTEXTUAL STORIES
ABOUT ORACLES
Oracles are prediction or profiling machines.
They are widely used in smartphones, computers
and tablets.

Oracles can be created using different techniques.
One way is to manually define rules for them. As
prediction models they are then called rule-based
models. Rule-based models are handy for tasks that
are specific, like detecting when a scientific pa-
per concerns a certain molecule. With very little
sample data, they can perform well.

But there are also the machine learning or statis-
tical models, which can be divided into two types:
'supervised' and 'unsupervised' oracles. For the
creation of supervised machine learning models,
humans annotate sample text with labels before
feeding it to a machine to learn. Each sentence,
paragraph or text is judged by at least three an-
notators: whether it is spam or not spam, positive
or negative, etc. Unsupervised machine learning
models don't need this step. But they need large
amounts of data. And it is up to the machine to
trace its own patterns or 'grammatical rules'. Fi-
nally, experts also distinguish between
classical machine learning and neural networks.
You'll find out more about this in the Readers
zone.

Humans tend to wrap Oracles in visions of
grandeur. Sometimes these Oracles come to the sur-
face when things break down. In press releases,
these sometimes dramatic situations are called
'lessons'. However promising their performances
seem to be, a lot of issues remain to be solved.
How do we make sure that Oracles are fair, that
every human can consult them, and that they are
understandable to a large public? Even then, exis-
tential questions remain. Do we need all types of
artificial intelligence (AI) systems? And who de-
fines what is fair or unfair?

--- Racial AdSense ---

A classic 'lesson' in developing Oracles was docu-
mented by Latanya Sweeney, a professor of Govern-
ment and Technology at Harvard University. In
2013, Sweeney, of African American descent,
googled her name. She immediately received an ad-
vertisement for a service that offered her ‘to see
the criminal record of Latanya Sweeney’.

Sweeney, who doesn’t have a criminal record, began
a study. She started to compare the advertising
that Google AdSense serves to different racially
identifiable names. She discovered that she re-
ceived more of these ads searching for non-white
ethnic names, than when searching for tradition-
ally perceived white names. You can imagine how
damaging it can be when possible employers do a
simple name search and receive ads suggesting the
existence of a criminal record.

Sweeney based her research on queries of 2184 raci-
ally associated personal names across two websites.

88 per cent of first names, identified as
being given to more black babies, are found pre-
dictive of race, against 96 per cent white. First
names that are mainly given to black babies, such
as DeShawn, Darnell and Jermaine, generated ads
mentioning an arrest in 81 to 86 per cent of name
searches on one website and in 92 to 95 per cent
on the other. Names that are mainly assigned to
whites, such as Geoffrey, Jill and Emma, did not
generate the same results. The word 'arrest' only
appeared in 23 to 29 per cent of white name
searches on one site and 0 to 60 per cent on the
other.

On the website with most advertising, a black-
identifying name was 25 per cent more likely to get
an ad suggestive of an arrest record. A few names
did not follow these patterns: Dustin, a name
mainly given to white babies, generated an ad sug-
gestive of arrest 81 and 100 per cent of the
time. It is important to keep in mind that the ap-
pearance of the ad is linked to the name itself.
It is independent of whether the name has an
arrest record in the company's database.

Reference
Paper: https://dataprivacylab.org/projects/onlineads/1071-1.pdf

--- What is a good employee? ---

Since 2015 Amazon employs around 575,000 workers.
And they need more. Therefore, they set up a team
of 12 that was asked to create a model to find the
right candidates by crawling job application web-
sites. The tool would give job candidates scores
ranging from one to five stars. The potential fed
the myth: the team wanted it to be software that
would spit out the top five human candidates out
of a list of 100. And those candidates would be
hired.

The group created 500 computer models, focused on
specific job functions and locations. They taught
each model to recognize some 50,000 terms that
showed up on past candidates’ letters. The algo-
rithms learned to give little importance to skills
common across IT applicants, like the ability to
write various computer codes. But they also
learned some decent errors. The company realized,
before releasing, that the models had taught them-
selves that male candidates were preferable. They
penalized applications that included the word 'wo-
men’s,' as in 'women’s chess club captain'. And they
downgraded graduates of two all-women’s colleges.

This is because they were trained using the job
applications that Amazon received over a ten-year
period. During that time, the company had mostly
18
hired men. Instead of providing the 'fair' deci-
sion-making that the Amazon team had promised, the
models reflected a biased tendency in the tech in-
dustry. And they also amplified it and made it in-
visible. Activists and critics state that it could
be exceedingly difficult to sue an employer over
automated hiring: job candidates might never know
that intelligent software was used in the process.

Reference
https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazonscraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G

--- Quantifying 100 Years
of Gender and Ethnic Stereotypes ---

Dan Jurafsky is the co-author of 'Speech and Lan-
guage Processing', one of the most influential
books for studying Natural Language Processing
(NLP). Together with a few colleagues at Stanford
University, he discovered in 2017 that word embed-
dings can be a powerful tool to systematically
quantify common stereotypes and other historical
trends.

Word embeddings are a technique that translates
words to numbered vectors in a multi-dimensional
space. Vectors that lie close to each other
indicate similar meaning. All numbers will be
grouped together, as will all prepositions,
personal names and professions. This makes it
possible to calculate with words. You could subtract
London from England and your result would be the
same as subtracting Paris from France.

An example in their research shows that the vector
for the adjective 'honorable' is closer to the
vector for 'man' whereas the vector for 'submissive'
is closer to 'woman'. These stereotypes are auto-
matically learned by the algorithm. It becomes pro-
blematic when the pre-trained embeddings are then
used for sensitive applications such as search ran-
kings, product recommendations, or translations.
This risk is real, because a lot of the pre-
trained embeddings can be downloaded as off-
the-shelf packages.

It is known that language reflects and keeps cul-
tural stereotypes alive. Using word embeddings to
spot these stereotypes is less time-consuming and
less expensive than manual methods. But the imple-
mentation of these embeddings for concrete predic-
tion models has caused a lot of discussion within
the machine learning community. The biased models
stand for automatic discrimination. Questions are:
is it actually possible to de-bias these models
completely? Some say yes, while others disagree:
instead of retro-engineering the model, we should
ask whether we need it in the first place. These
researchers followed a third path: by acknowledg-
ing the bias that originates in language, these
tools become tools of awareness.

The team developed a model to analyse word embed-
dings trained over 100 years of texts. For contem-
porary analysis, they used the standard Google
News word2vec Vectors, a straight-off-the-shelf
downloadable package trained on the Google News
Dataset. For historical analysis, they used embed-
dings that were trained on Google Books and the
Corpus of Historical American English (COHA
https://corpus.byu.edu/coha/) with more than 400
million words of text from the 1810s to 2000s. As a
validation set to test the model, they trained em-
beddings from the New York Times Annotated Corpus
for every year between 1988 and 2005.

The research shows that word embeddings capture
changes in gender and ethnic stereotypes over
time. They quantify how specific biases decrease
over time while other stereotypes increase. The
major transitions reveal changes in the descrip-
tions of gender and ethnic groups during the
women’s movement in the 1960s-1970s and the Asian-
American population growth in the 1960s and 1980s.

A few examples:

The top ten occupations most closely associated
with each ethnic group in the contemporary
Google News dataset:

- Hispanic: housekeeper, mason, artist, janitor,
dancer, mechanic, photographer, baker, cashier,
driver

- Asian: professor, official, secretary,
conductor, physicist, scientist, chemist, tailor,
accountant, engineer

- White: smith, blacksmith, surveyor, sheriff,
weaver, administrator, mason, statistician,
clergy, photographer

The 3 most male occupations in the 1930s:
engineer, lawyer, architect.
The 3 most female occupations in the 1930s:
nurse, housekeeper, attendant.

Not much has changed in the 1990s.
Major male occupations:
architect, mathematician and surveyor.
Female occupations:
nurse, housekeeper and midwife.

Reference
https://arxiv.org/abs/1711.08412

--- Wikimedia's ORES service ---

Software engineer Amir Sarabadani presented the
ORES project in Brussels in November 2017 during
the Algoliterary Encounter.
19
This 'Objective Revision Evaluation Service' uses
machine learning to help automate critical work on
Wikimedia, like vandalism detection and the re-
moval of articles. Cristina Cochior and Femke
Snelting interviewed him.

Femke: To go back to your work. In these days you
tried to understand what it means to find bias in
machine learning and the proposal of Nicolas
Maleve, who gave the workshop yesterday, was nei-
ther to try to fix it, nor to refuse to deal with
systems that produce bias, but to work with them.
He says that bias is inherent to human knowledge,
so we need to find ways to somehow work with it.
We're just struggling a bit with what would that
mean, how would that work... So I was wondering
whether you had any thoughts on the question of
bias.

Amir: Bias inside Wikipedia is a tricky question
because it happens on several levels. One level
that has been discussed a lot is the bias in ref-
erences. Not all references are accessible. So one
thing that the Wikimedia Foundation has been try-
ing to do, is to give free access to libraries
that are behind a pay wall. They reduce the bias
by only using open-access references. Another type
of bias is the Internet connection, access to the
Internet. There are lots of people who don't have
it. One thing about China is that the Internet
there is blocked. The content against the govern-
ment of China inside Chinese Wikipedia is higher
because the editors [who can access the website]
are not people who are pro government, and try to
make it more neutral. So, this happens in lots of
places. But in the matter of artificial intelli-
gence (AI) and the model that we use at Wikipedia,
it's more a matter of transparency. There is a
book about how bias in AI models can break peo-
ple's lives, it's called 'Weapons of Math Destruc-
tion'. It talks about AI models that exist in the
US that rank teachers and it's quite horrible be-
cause eventually there will be bias. The way to
deal with it based on the book and their research
was first that the model should be open source,
people should be able to see what features are
used and the data should be open also, so that
people can investigate, find bias, give feedback
and report back. There should be a way to fix the
system. I think not all companies are moving in
that direction, but Wikipedia, because of the val-
ues that they hold, are at least more transparent
and they push other people to do the same thing.

Reference
https://gitlab.constantvzw.org/algolit/algolit/blob/master/algoliterary_encounter/Interview%20with%20Amir/AS.aac

--- Tay ---

One of the infamous stories is that of the machine
learning programme Tay, designed by Microsoft.
Tay was a chat bot that imitated a teenage girl on
Twitter. She lived for less than 24 hours before
she was shut down. Few people know that before
this incident, Microsoft had already trained and
released XiaoIce on WeChat, China's most used chat
application. XiaoIce's success was so promising
that it led to the development of its American
version. However, the developers of Tay were
not prepared for the platform climate of Twitter.
Although the bot knew how to distinguish a noun
from an adjective, it had no understanding of the
actual meaning of words. The bot quickly learned
to copy racial insults and other discriminatory
language it learned from Twitter users and troll
attacks.

Tay's appearance and disappearance was an impor-
tant moment of consciousness. It showed the possi-
ble corrupt consequences that machine learning can
have when the cultural context in which the algo-
rithm has to live is not taken into account.

Reference
https://chatbotslife.com/the-accountability-of-ai-case-study-microsofts-tay-experiment-ad577015181f
20
cleaners clean cleaners clean cleaners clean cleaners clean cleaners clean cleaners clean
cleaners clean cleaners clean cleaners clean cleaners clean cleaners clean
cleaners clean cleaners clean cleaners clean cleaners clean
cleaners clean cleaners clean cleaners clean
cleaners clean cleaners clean cleaners clean cle
ners clean cleaners clean cleaners clean
cleaners clean cleaners clean cleaners clean
cleaners clean cleaners clean cleaners
lean cleaners clean cleaners clean
cleaners clean cleaners clean
cleaners clean cleaners clean cle
ners clean cleaners clean cleaners
clean cleaners clean cleaners
lean cleaners clean cleane
s clean cleaners clean
cleaners clean cleaners clean
cleaners clean cleaners clean
cleaners clean cleaners clean
cleaners clean
cleaners clean cleaners clean
cleaners clean cleaners clean
cleaners clean
cleaners clean cleaners clean
cleaners clean
cleaners clean cleaners clean
cleaners clean
cleaners clean cleaners
clean cleaners clean
cleaners clean
cleaners clean cleaners clean
cleaners clean
cleaners clean
cleaners clean cleaners
clean cleaners clean
cleaners clean
cleaners clean
cleaners clean cle
ners clean cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean cleaners
lean cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
cleaners clean
21
r u e n 7 c %9 2 y m V +-+-+-+-+-+-+-+-+ e4 +-+-+-+-+-+ 9 -t 0n neof e 5 r6 7 kln
ci p '.s w s u 18 u n |c|l|e|a|n|e|r|s| 2 |c|l|e|a|n| et.t o % s eii4t i ktu 4i w +
t 6 . 3e -6 6 rVle 17 +-+-+-+-+-+-+-+-+ rg +-+-+-+-+-+ .e o n 7 ci i 0 e h eR e85 orh
n x h r 4 h t5 7hoh 4 t ei g + n e3 tt np% k s +h_ hees ir w n +6 l rt 8 oe e Fe
r5b t ua0e 3ei n a 1 t8 rd t 7 li \ 7n v2 tq e e6 a as o
2b t t m oe f c8 lx - g9 r - -s+ +-+-+ h +-+-+-+-+-+-+ 8f o1 Ao % r - 5i 2 e - r
x p n4h e6 s n8 / s7 . 95 sti |w|e| eno |h|e|l|p|e|d| +e r a2 sy n gyl 2u e sti6t
ch% _ 1r se o + t t 4, 1 t9 l +-+-+ e +-+-+-+-+-+-+ t r i 7 rs u ie o o,4 h
, 5 5h g gs 6u5e e0 95 eif e % +-+-+ s 9 +-+-+-+-+-+-+-+ o+ m iy n6 m _4 l oae s+ da
e w i_|e e a 6 an |w|e| | |c|l|e|a|n|e|d| 7 i a e r l 7
se 8w ,p+tn i d t 1 g s ae l +-+-+ tec +-+-+-+-+-+-+-+ - ts e e,d % e 8e i
r i _6sog y L5 e v +-+-+-+-+-+ +-+-+-+-+ er +-+-+ +-+-+-+-+-+-+ Ies f e/ 8rh gr o 5 ac55 e
( h s s9 |h|u|m|a|n| |w|o|r|k| 96 7 |i|s| |n|e|e|d|e|d| i 8 d 13 l , i
- s tt 1 _ S +-+-+-+-+-+ +-+-+-+-+ _ +-+-+ +-+-+-+-+-+-+ r v Mr_ a3 f r ,
a s l n 87 +-+-+-+-+-+-+-+-+-+-+-+ rh 9 t r 7 36 w i n e 2 n d m
i4 +2 c 6 o |p|o|o|r|l|y|-|p|a|i|d| w n 3 g e - 6 tk o- r r
w9 4 t 8p ie c rVv 5 +-+-+-+-+-+-+-+-+-+-+-+ b n h - 6 xc te|t ,2 5 n
4 4 ,in 7 4( d +-+-+-+-+-+-+-+-+-+-+-+ l +-+-+-+-+-+ +-+-+-+ -d ah v + n5 . 4 6s_
t 2- i l |f|r|e|e|l|a|n|c|e|r|s| te3c |c|a|r|r|y| |o|u|t| l e oee 1n 7 \ y1k
r r l p r 6 e +-+-+-+-+-+-+-+-+-+-+-+ 6|p +-+-+-+-+-+ +-+-+-+ s p o2 ) t -e : p 8 h
h9 h o 4l +-+-+-+-+-+-+-+-+-+-+ \ +-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ nb h 7 s4i1 3
T z3 |h e 9 |v|o|l|u|n|t|e|e|r|s| 9 |d|o| |f|a|n|t|a|s|t|i|c| |w|o|r|k| 9 ws w 5 e6 x
a` o +-+-+-+-+-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ ih l 3 6
7 r 6 d G i6 1 3 e1 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+ +-+-+ +-+-+-+-+ eir c e n% ui
l r 6 6s t r |w|h|o|e|v|e|r| |c|l|e|a|n|s| |u|p| |t|e|x|t| h 6 t i
t tc w a s e 9 +-+-+-+-+-+-+-+ F +-+-+-+-+-+-+ +-+-+ +-+-+-+-+ , 5 9s9 w e e
n m5 e 4 Mi e c i a U u r e 2 a i % .S g6 u 3
_t f 2 t 5 t6 v V c a i f- ee l 9rni/ 3 a 7e 1
1o n 3 2 tn t 5 1o 7 r s / % uio +
9 f a 4 - e o e t + i r + s 2
ls_ nr e w i l V - 8e t 5 +i v 2 p o
l n e j n tr l V| n e w L r 8
c l1 l i i a 8 t g0 y s
, a u r9 e 8 4 9 e | e 3
n g8 r e? M d r a i l c
- n t r 4 e r l c ii e a
p r a a h 6 l 3 e s
i 4 c o | 6 v rh p7 3 % h t a
e e 1 6 6 p 15 8 e a n s d o 1 i 2 n
s e m t 2 w v a 6 i i
r 7 | a e 5 7 s 3 8 i 4 7
e y 4 3 w 5 l unw5 4ie o3 439 o i %
r 6 e a 4a f n e
h a 5 o s i l s
- s | n D 4
e 3 - 2 5 h a 1 V p n v
+ 7 8n n a ar ) v
. n2 t 5 6r 8 |
u o _ e r l n, r 1 e
n ,e r s 7 a 7
a e h t y d a 3
u | 2 a s 4 t
6 e t66 e % 2 3 y 3 n
a e o i , t 4 i e g c r
l t w 9 2 a
h v t , p c a r h c
l 4 g p1
z i t o m a % a
i k | a i e
s a v c a , l lp + d 2 a
3 o t
e
5 n t p s i a 6 r
e 5 y,r m e ,
g i 7 s i 5 s a
a a % r
3 u p n
e \ 5 i p o l i
22
% V V V V V V V % V % % % % %% % % %% % % % % % % %
V V V V V V V V V V V V V V V V % % % % 0 % % 0 % 0 0 % 0 % % %%% %
V V V V V V V V % V % 0 % 0 0 % % %
% % % %% ___ _ 0 % 00 _ % % %
% % % % 00 / __\ | ___ __ _ _ __ (_)_ __ __ _ %
CLEANERS % % / / | |/ _ \/ _` | '_ \| | '_ \ / _` | 0 %
% % % % % % 00 / /___| | __/ (_| | | | | | | | | (_| | %
% % % % % % 0 \____/|_|\___|\__,_|_| |_|_|_| |_|\__, | %
V V V V V V V V % 0 |___/ % %
V V V V V V V V V V V V V V V V __ 0 ___ 0 % 0
V V V V V V V V V 0 / _| ___ _ __ / _ \___ ___ _ __ ___ ___ %
V V V V V V V V 0 % | |_ / _ \| '__| / /_)/ _ \ / _ \ '_ ` _ \/ __| %
V V V V V V V V V V V V V V V V 0 | _| (_) | | / ___/ (_) | __/ | | | | \__ \
V V V V V V V V V |_| \___/|_| \/ 0 \___/ \___|_| |_| |_|___/
0 0
Algolit chooses to work with texts %%% %
that are free of copyright. This by Algolit % % %
means that they have been published % % %
under a Creative Commons 4.0 li- For this exhibition we worked with 3 per cent of the Mundaneum's
cense – which is rare – or that archive. These documents were first scanned or photographed. To
they are in the public domain be- make the documents searchable, they were transformed into text us-
cause the author died more than 70 ing Optical Character Recognition software (OCR). OCR programs are algo-
years ago. This is the case for the % rithmic models that are trained on other texts. They have learned
publications of the Mundaneum. We to identify characters, words, sentences and paragraphs. The
received 203 documents that we software often makes 'mistakes'. It might recognize a wrong char-
helped turn into datasets. They are acter, it might get confused by a stain, an unusual font or the
now available for others online. reverse side of the page being visible. %
Sometimes we had to deal with poor % % %
text formats, and we often dedi- While these mistakes are often considered noise, confusing the
cated a lot of time to cleaning up training, they can also be seen as poetic interpretations of the
documents. We were not alone in do- algorithm. They show us the limits of the machine. And they also
ing this. reveal how the algorithm might work, what material it has seen in
training and what is new. They say something about the standards %
Books are scanned at high resolu- of its makers. In this installation we ask your help in verifying
tion, page by page. This is time- our dataset. As a reward we'll present you with a personal algo-
consuming, laborious human work and rithmic improvisation.
often the reason why archives and
libraries transfer their collec- ---
tions and leave the job to compa- %
nies like Google. The photos are Concept, code, interface: Gijs de Heij
converted into text via OCR (Opti- %
cal Character Recognition), a soft-
ware that recognizes letters, but
often makes mistakes, especially
when it has to deal with ancient
fonts and wrinkled pages. Yet more
wearisome human work is needed to
improve the texts. This is often
carried out by poorly-paid free-
lancers via micro-payment platforms
like Amazon's Mechanical Turk; or
by volunteers, like the community
around the Distributed Proofreaders
Project, which does fantastic work.
Whoever does it, or wherever it is
done, cleaning up texts is a tower-
ing job for which no structural au-
tomation yet exists.
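One small cleaning step can be sketched in code. This is a minimal
illustration, not the method used by Algolit or by any proofreading
community: the function name rejoin_hyphenated is hypothetical, and the
rule it applies (glue a word back together when a hyphen is directly
followed by a line break) is an assumption about one common kind of
OCR residue.

```python
import re


def rejoin_hyphenated(text):
    """Join words that were split by a hyphen at the end of a line.

    Hypothetical helper: it only handles the pattern
    letter + "-" + newline + letter, and nothing else.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


scanned = "Opti-\ncal Character Recog-\nnition"
cleaned = rejoin_hyphenated(scanned)
```

The rule is blunt: a genuine compound that happens to break at the end
of a line, such as "poorly-" followed by "paid", loses its hyphen too.
A human reader still has to decide which joins are right.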
23
% % % % % %% % % % % % % % %% 0 0 % % % % % %%
% %% % % % 0 0 0 % %% % % %% %%% %
% % %% %%% 0 ___ _ _ _ _ 0 _ % _
% % % % % 0 0 / (_)___| |_ _ __(_) |__ _ _| |_ ___ __| | % %
% % / /\ / / __| __| '__| | '_ \| | | | __/ _ \/ _` |
%% 0 / /_//| \__ \ |_| | | | |_) | |_| | || __/ (_| | %%
% % /___,' |_|___/\__|_| |_|_.__/ \__,_|\__\___|\__,_|
% % % ___ 0 __ 0 0 _ %
% / _ \_ __ ___ ___ / _|_ __ ___ __ _ __| | ___ _ __ ___
% % / /_)/ '__/ _ \ / _ \| |_| '__/ _ \/ _` |/ _` |/ _ \ '__/ __|
/ ___/| | | (_) | (_) | _| | | __/ (_| | (_| | __/ | \__ \
% 0 \/ |_| \___/ \___/|_| |_| \___|\__,_|\__,_|\___|_| |___/
0 0 0
% 0 % 0 % %%
% 0 0 0 % %
% % by Algolit % %
Distributed Proofreaders is a web-based interface and an interna-
tional community of volunteers who help convert public domain
% % books into e-books. For this exhibition they proofread the Munda-
neum publications that appeared before 1923 and are in the public
domain in the US. Their collaboration was a great relief for
the members of Algolit. Fewer documents to clean up!
All the proofread books have been made available on the Project
Gutenberg archive.
For this exhibition, An Mertens interviewed Linda Hamilton, the
general manager of Distributed Proofreaders.
---
%
Interview: An Mertens
%
Editing: Michael Murtaugh, Constant %
%
%
24
CONTEXTUAL STORIES
FOR CLEANERS
--- Project Gutenberg and path to death – run your own code; dynamic change.
Distributed Proofreaders --- operate it. For nearly 84 years, the Turk won most
The Life Instinct: unification; the eternal re-
Project Gutenberg is our Ali Baba cave. It offers turn; the perpetuation and MAINTENANCE of the mate-
more than 58,000 free eBooks to be downloaded or rial; survival systems and operations; equilibrium.
read online. Works are accepted on Gutenberg when
their U.S. copyright has expired. Thousands of B. Two basic systems: Development and Maintenance.
volunteers digitize and proofread books to help
the project. An essential part of the work is done The sourball of every revolution: after the revo-
through the Distributed Proofreaders project. This lution, who’s going to try to spot the bias in
is a web-based interface to help convert public the output?
domain books into e-books. Think of text files,
EPUBs, Kindle formats. By dividing the workload Development: pure individual creation; the new;
into individual pages, many volunteers can work change; progress; advance; excitement; flight or
on a book at the same time; this speeds up the fleeing.
cleaning process.
Maintenance: keep the dust off the pure individual
During proofreading, volunteers are presented with creation; preserve the new; sustain the change;
a scanned image of the page and a version of the protect progress; defend and prolong the advance;
text, as it is read by an OCR algorithm trained to renew the excitement; repeat the flight; show your
recognize letters in images. This allows the text work – show it again, keep the git repository
to be easily compared to the image, proofread, and groovy, keep the data analysis revealing.
sent back to the site. A second volunteer is then
presented with the first volunteer's work. She Development systems are partial feedback systems
verifies and corrects the work as necessary, and with major room for change.
submits it back to the site. The book then simi-
larly goes through a third proofreading round, Maintenance systems are direct feedback systems
plus two more formatting rounds using the same web with little room for alteration.
interface. Once all the pages have completed these
steps, a post-processor carefully assembles them C. Maintenance is a drag;
into an e-book and submits it to the Project it takes all the fucking time (lit.)
Gutenberg archive.
The mind boggles and chafes at the boredom.
We collaborated with the Distributed Proofreaders
project to clean up the digitized files we re- The culture assigns lousy status to maintenance
ceived from the Mundaneum collection. From Novem- jobs = minimum wages, Amazon Mechanical Turks =
ber 2018 until the first upload of the cleaned-up virtually no pay.
book 'L'Afrique aux Noirs' in February 2019, An
Mertens exchanged about 50 emails with Linda Clean the set, tag the training data, correct the
Hamilton, Sharon Joiner and Susan Hanlon, all vol- typos, modify the parameters, finish the report,
unteers from the Distributed Proofreaders project. keep the requester happy, upload the new version,
The conversation is published online. It might attach words that were wrongly separated by OCR
inspire you to share unavailable books online. back together, complete those Human Intelligence
Tasks, try to guess the meaning of the requester's
formatting, you must accept the HIT before you can
--- An algoliterary version submit the results, summarize the image, add the
of the Maintenance Manifesto --- bounding box, what's the semantic similarity of
this text, check the translation quality, collect
In 1969, one year after the birth of her first your micro-payments, become a hit Mechanical Turk.
child, the New York artist Mierle Laderman Ukeles
wrote a Manifesto for Maintenance Art. The mani- Reference
festo calls for a readdressing of the status of
maintenance work both in the private, domestic https://www.arnolfini.org.uk/blog/manifesto-for-
space, and in public. What follows is an altered maintenance-art-1969
version of her text inspired by the work of the
Cleaners.
--- A bot panic on Amazon Mechanical Turk ---
IDEAS
Amazon's Mechanical Turk takes the name of a
A. The Death Instinct and the Life Instinct: chess-playing automaton from the eighteenth cen-
tury. In fact, the Turk wasn't a machine at all.
The Death Instinct: separation; categorization; It was a mechanical illusion that allowed a human
avant-garde par excellence; to follow the predicted chess master to hide inside the box and manually
25
of the games played during its demonstrations
around Europe and the Americas. Napoleon Bonaparte
is said to have been fooled by this trick too.
The Amazon Mechanical Turk is an online platform
for humans to execute tasks that algorithms can-
not. Examples include annotating sentences as be-
ing positive or negative, spotting number plates,
discriminating between face and non-face. The jobs
posted on this platform are often paid less than a
cent per task. Tasks that are more complex or re-
quire more knowledge can be paid up to several
cents. To earn a living, Turkers need to finish as
many tasks as fast as possible, leading to in-
evitable mistakes. As a result, the requesters
have to incorporate quality checks when they post
a job on the platform. They need to test whether
the Turker actually has the ability to complete
the task, and they also need to verify the re-
sults. Many academic researchers use Mechanical
Turk as an alternative to having their students
execute these tasks.
In August 2018 Max Hui Bai, a psychology student
from the University of Minnesota, discovered that
the surveys he conducted with Mechanical Turk were
full of nonsense answers to open-ended questions.
He traced back the wrong answers and found out
that they had been submitted by respondents with
duplicate GPS locations. This raised suspicion.
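The check described here can be sketched as follows. This is an
illustration under stated assumptions, not Bai's actual analysis: the
field names id, lat, lon and answer, the function name
flag_repeated_locations and the sample responses are all invented.

```python
from collections import defaultdict


def flag_repeated_locations(responses, min_repeats=2):
    """Return the ids of responses whose (lat, lon) pair occurs
    at least min_repeats times in the survey."""
    by_location = defaultdict(list)
    for response in responses:
        key = (response["lat"], response["lon"])
        by_location[key].append(response["id"])
    flagged = []
    for ids in by_location.values():
        if len(ids) >= min_repeats:
            flagged.extend(ids)
    return sorted(flagged)


# Invented sample data, for illustration only.
responses = [
    {"id": 1, "lat": 44.97, "lon": -93.23, "answer": "I enjoyed the survey"},
    {"id": 2, "lat": 40.71, "lon": -74.00, "answer": "good"},
    {"id": 3, "lat": 40.71, "lon": -74.00, "answer": "NICE"},
]
suspicious = flag_repeated_locations(responses)
```

A repeated location is only a cue. People in the same building or on
the same network can share coordinates, so flagged answers still need
to be read by a human.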
Though Amazon explicitly prohibits robots from
completing jobs on Mechanical Turk, the company
does not deal with the problems they cause on
its platform. Forums for Turkers are full of
conversations about the automation of the work,
sharing practices of how to create robots that can
even violate Amazon’s terms. You can also find
videos on YouTube that show Turkers how to write a
bot to fill in answers for you.
Kristy Milland, a Mechanical Turk activist, says:
'Mechanical Turk workers have been treated really,
really badly for 12 years, and so in some ways I
see this as a point of resistance. If we were paid
fairly on the platform, nobody would be risking
their account this way.'
Bai is now leading a research project among social
scientists to figure out how much bad data is in
use, how large the problem is, and how to stop it.
But it is impossible at the moment to estimate how
many datasets have become unreliable in this way.
References
https://requester.mturk.com/create/projects/new
https://www.wired.com/story/amazon-mechanical-
turk-bot-panic/
https://www.maxhuibai.com/blog/evidence-that-
responses-from-repeating-gps-are-random
http://timryan.web.unc.edu/2018/08/12/data-
contamination-on-mturk/
26
informants inform informants inform informants inform informants inform informants inform info
mants inform informants inform informants inform informants inform informants i
form informants inform informants inform informants inform info
mants inform informants inform informants inform informants info
m informants inform informants inform informants inform
informants inform informants inform informants
inform informants inform informants inform
informants inform informants inform informants info
m informants inform informants inform
informants inform informants inform
informants inform informants inform in
ormants inform informants inform infor
ants inform informants inform info
mants inform informants inform
informants inform informants inform
informants inform informants inform
informants inform informants inform
informants inform infor
ants inform informants inform
informants inform informants inform
informants inform
informants inform informants inform
informants inform
informants inform informants inform
informants inform
informants inform informants inform
informants inform
informants inform informants
inform informants inform
informants inform
informants inform informants
inform informants inform
informants inform
informants inform
informants inform informants info
m informants inform
informants inform
informants inform
informants inform
informants inform informants
inform informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform in
ormants inform info
mants inform infor
ants inform infor
ants inform info
mants inform in
ormants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
informants inform
27
r 8h3t i5 4 d 7 + +-+-+-+-+-+-+-+-+-+-+ c a +-+-+-+-+-+-+ e f n no6 - - t -as 7 ( e
a ah 5al ,n ri B |i|n|f|o|r|m|a|n|t|s| l |i|n|f|o|r|m| , 35e t s evn7 73r o2/ L ep - e
t : ca,i ma eeslh | +-+-+-+-+-+-+-+-+-+-+ r_ T +-+-+-+-+-+-+ 2o 73 pjt 7ng% e 84
n 7 hnprs s9i 3a1 9e _ 9l e o pi rsa d o ii/5am sd rr1 1 n% + n8w
h|29 e s _ 3 . o i c i. e+1onIa 4 f p | lu e v1r _nth2i a%a ce 1e 7e 1y |t e r
xn r 8 sF w t -e +-+-+-+-+ +-+-+-+-+-+-+-+ e +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ 1 i2 n l cn r3
t e e ,i n ibC 6 |e|a|c|h| |d|a|t|a|s|e|t| |c|o|l|l|e|c|t|s| |d|i|f|f|e|r|e|n|t| iw tc a318
e o l a Me -o r + +-+-+-+-+ +-+-+-+-+-+-+-+ d 9 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ +yc l p
+6 n 8 , a -rsb es 3 t t | bt ,p q +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+ 6 1d e 4 , 1 +
lk o95 sf s e - 2 b 0 rl n la / S f n |i|n|f|o|r|m|a|t|i|o|n| |a|b|o|u|t| 1 4r y7 n
i _ m ec cf 2|r 8ra5 n l 6t +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+ o t | r e
h_ ae3 5 Ti nf ao 7 l t n 9 9 h +e e-1 +-+-+-+ +-+-+-+-+-+ 7 t 8 - f mme 5
t og m 9 i r. m l l j +t3 9 |t|h|e| |w|o|r|l|d| e97 3 9 t i s - o s
_i n l o er 8 n petc 141 s / i +-+-+-+ +-+-+-+-+-+ - 9 w 1 1 b
t4, r e u n8 a |t +-+-+-+-+-+-+-+-+ , |c +-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+ 2r t 3
o 6 9.o7e 7 Ce |d|a|t|a|s|e|t|s| V |a|r|e| |i|m|b|u|e|d| |w|i|t|h| 7 ig g ig 3xa
i r- p R h 8 rr m g _ t +-+-+-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+ n f -c , +
- - 9 f k i r 6 e 665 a +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ t m 1 9 6
om _ 1e Tlh4 , f vr E |c|o|l|l|e|c|t|o|r|'|s| |b|i|a|s| 0 7 t e 2t
E5 r o r i i b e hw i a ne +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ t a
m, m4 - a +-+-+-+-+ +-+-+-+-+-+-+-+-+ d +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ 118 2a 6
- l l |s|o|m|e| |d|a|t|a|s|e|t|s| rt3 |c|o|m|b|i|n|e| |m|a|c|h|i|n|i|c| k f e
d i i 1 e , h +-+-+-+-+ +-+-+-+-+-+-+-+-+ 5 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ i % _e r
_ f oi e u s dt y +-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+-+ i n9 7 o
f f 5 h l9 a a b n |l|o|g|i|c| |w|i|t|h| |h|u|m|a|n| s n 79 e if e 0
s i ln 6t a y t | ’7 / h +-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+-+ 1 - 1n
s yn p p r oe xy +-+-+-+-+-+ c n d 6 _i a n
- n iu a v s, d o 7 eu e i |l|o|g|i|c| e as d m 2 v|h - | r
aL t5 l7 st A c S r c n r / +-+-+-+-+-+ tt o dr | V
s 9 +-+-+-+-+-+-+ +-+-+-+-+ d 7 + 5 77 2 t
z l x n |m|o|d|e|l|s| |t|h|a|t| d i n oS ad + a a a . _ t
ie 7 n n +-+-+-+-+-+-+ +-+-+-+-+ is r t 9 , | f 4 4 a t
8 - 8 e +-+-+-+-+-+-+-+ 1 o 8 h h + t
s +m tb rh f 5 6r |r|e|q|u|i|r|e| s o l2 2 | + s o n
a - rr o n +-+-+-+-+-+-+-+ m | o y 4 r _
5 i +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+ d |m ? e
b 4 _ l ` |s|u|p|e|r|v|i|s|i|o|n| |m|u|l|t|i|p|l|y| |t|h|e| - s n 7 1
Tn n - +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+ d 5
ls t v 3i . - 6 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ h _ 28 9f
4 s i h s- 4 4 l i |s|u|b|j|e|c|t|i|v|i|t|i|e|s| e a u
t + 9 fh lh,d +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 6 c 8
3 r c i 1 +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ p -
fn o |m|o|d|e|l|s| c |p|r|o|p|a|g|a|t|e| |w|h|a|t| + 5 M 4
5 r g +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ i t f
9 t i y +-+-+-+-+-+-+-+ +-+-+-+-+ sv 7
6r +e n t7 + A h |t|h|e|y|'|v|e| |b|e|e|n| o 45 6
m s t 9 o o _ s +-+-+-+-+-+-+-+ +-+-+-+-+ t o+ u e
s k8 3 l 2 - e +-+-+-+-+-+-+ e 6 e- t -
+ es n 5 e o 4 |t|a|u|g|h|t| s 9
t p e w , : o - +-+-+-+-+-+-+ t t 3
e 6 r 8 t +-+-+-+-+ +-+-+ +-+-+-+ a eo m m 3
e |s|o|m|e| |o|f| |t|h|e| + h e c
ee +-+-+-+-+ +-+-+ +-+-+-+ c h
o +-+-+-+-+-+-+-+-+ +-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+ +-+-+
i k t |d|a|t|a|s|e|t|s| |p|a|s|s| |a|s| |d|e|f|a|u|l|t| |i|n| o o o
+-+-+-+-+-+-+-+-+ i +-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+ +-+-+ r d
a i m a . 1 +-+-+-+ +-+-+-+-+-+-+-+ s u
r h o 2 |t|h|e| |m|a|c|h|i|n|e| l t
+ e a +-+-+-+ +-+-+-+-+-+-+-+ d 7 |
e a eo 4 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+
h n |l|e|a|r|n|i|n|g| |f|i|e|l|d| s n
t _s n +-+-+-+-+-+-+-+-+ +-+-+-+-+-+
t n o +-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+-+-+ e V
a d |h|u|m|a|n|s| |g|u|i|d|e| |m|a|c|h|i|n|e|s| u n
+-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+-+-+
c e 5 1 2
r 6 r n 6 f
l o l
28
% V V V V V V V % V % %% % %%% %% %% % %%% %%% % % %%
V V V V V V V V V V V V V V V V % % % % % %% 0 %% 0 % % % % % %%% %
V V V V V V V V V % % %% % % 0 0 % % % %
% % % % % % % % % % 00 0 _ % % % % % %% % %
% % % 0 /_\ _ __ % %
% INFORMANTS % % % //_\\| '_ \ % 0
% % % % % 0 % % 0 / _ \ | | | % % % %%
% % % 0 \_/ \_/_| |_| 0 0
V V V V V % V V V % __ _ _ 00 % 00 0 _ %
V V V V V V V V V V V V V V V V 0 /__\ |_| |__ _ __ ___ __ _ _ __ __ _ _ __ | |__ _ _
V V V V V V V V V /_\ | __| '_ \| '_ \ / _ \ / _` | '__/ _` | '_ \| '_ \| | | | %
V V V V V V V V //__ | |_| | | | | | | (_) | (_| | | | (_| | |_) | | | | |_| |
V V V V V V V V V V V V V V V V % \__/ \__|_| |_|_| |_|\___/ \__, |_| \__,_| .__/|_| |_|\__, |
V V V V % V V V V V 0 0 % 0 % |___/ |_| 0 |___/
% 0 0 __ 0 ___ % _ _ 0 %
Machine learning algorithms need ___ / _| / \__ _| |_ __ _ ___ ___| |_ ___
guidance, whether they are super- 0 / _ \| |_ 0 / /\ / _` | __/ _` / __|/ _ \ __/ __| %
vised or not. In order to separate | (_) | _| / /_// (_| | || (_| \__ \ __/ |_\__ \
one thing from another, they need \___/|_| /___,' \__,_|\__\__,_|___/\___|\__|___/ % %
material to extract patterns from. 0 0 0
One should carefully choose the % %
study material, and adapt it to the by Algolit
machine's task. It doesn't make
sense to train a machine with nine- We often start the monthly Algolit meetings by searching for
teenth-century novels if its mis- datasets or trying to create them. Sometimes we use already-ex-
sion is to analyse tweets. A badly isting corpora, made available through the Natural Language
written textbook can lead a student Toolkit (NLTK). NLTK contains, among others, the Universal Declara-
to give up on the subject altogeth- tion of Human Rights, inaugural speeches from US presidents, and
er. A good textbook is preferably movie reviews from the popular site Internet Movie Database
not a textbook at all. (IMDb). Each style of writing will conjure different relations
% between the words and will reflect the moment in time from which
This is where the dataset comes in: they originate. The material included in NLTK was selected be-
arranged as neatly as possible, or- cause it was judged useful for at least one community of re-
ganized in disciplined rows and searchers. In spite of specificities related to the initial con-
lined-up columns, waiting to be text of each document, they become universal documents by de-
read by the machine. Each dataset fault, via their inclusion in a collection of publicly avail-
collects different information % able corpora. In this sense, the Python package for natu-
about the world, and like all col- ral language processing could be regarded as a time capsule. The
lections, they are imbued with col- main reason why the Universal Declaration of Human Rights was
lectors' bias. You will hear this included may have been because of the multiplicity of transla-
expression very often: 'data is the tions, but it also paints a picture of the types of human writing
new oil'. If only data were more that algorithms train on.
like oil! Leaking, dripping and
heavy with fat, bubbling up and With this work, we look at the datasets most commonly used by
jumping unexpectedly when in con- data scientists to train machine algorithms. What material do
tact with new matter. Instead, data they consist of? Who collected them? When?
is supposed to be clean. With each
process, each questionnaire, each --- %
column title, it becomes cleaner
and cleaner, chipping distinct % Concept & execution: Cristina Cochior
characteristics until it fits the %
mould of the dataset. % %
0 0 00 0
Some datasets combine the machinic 0 0 0 0
logic with the human logic. The __ __ _ _
models that require supervision 0 / / /\ \ \ |__ ___ __ _(_)_ __ ___
multiply the subjectivities of both 0 \ \/ \/ / '_ \ / _ \ \ \ /\ / / | '_ \/ __|
data collectors and annotators, \ /\ /| | | | (_) | \ V V /| | | | \__ \
then propagate what they've been 0 \/ \/ |_| |_|\___/ \_/\_/ |_|_| |_|___/
taught. You will encounter some of 0 0 0 0 0
the datasets that pass as default
in the machine learning field, as Who wins: creation of relationships
well as other stories of humans
guiding machines. by Louise Dekeuleneer, student Arts²/Section Visual Communication
French is a gendered language. Indeed many words are female or
male and few are neutral. The aim of this project is to show that
a patriarchal society also influences the language itself.
29
The work focused on showing whether more female or male words are
% % %%% % %% % used, and on highlighting the influence of context on the gender of %%%%%
% % % % % % words. At this stage, no conclusions have yet been drawn.  %
% % % % %% % % % % % % % % % % %
% %% Law texts from 1900 to 1910 made available by the Mundaneum have
% % %% % % been passed into an algorithm that turns the text into a list of %
%% % % % words. These words are then compared with another list of French %
% % % % % words, which specifies whether each word is male or female.
This list of words comes from Google Books. They created a huge
% % % % database in 2012 from all the books scanned and available on
% Google Books. % %
% % % % % % % %
Male words are highlighted in one colour and female words in an-
% % % % other. Words that are not gendered (adverbs, verbs, etc.) are not
% % % highlighted. All this is saved as an HTML file so that it can be
% % directly opened in a web page and printed without the need for
% additional layout. This is how each text becomes a small booklet
by just changing the input text of the algorithm.
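The pipeline described above can be sketched as follows. This is a
minimal illustration, not the code of the project: the four-word
lexicon stands in for the Google Books word list, and the names
GENDER_LEXICON, COLOURS and highlight_gender, like the two colours, are
assumptions made for the example.

```python
import html
import re

# Hypothetical mini-lexicon; the project uses a much larger word list.
GENDER_LEXICON = {"loi": "f", "commune": "f", "tribunal": "m", "article": "m"}
# Assumed colours, one per gender.
COLOURS = {"f": "orange", "m": "lightblue"}


def highlight_gender(text, lexicon=GENDER_LEXICON):
    """Wrap gendered words in coloured spans and return an HTML page."""
    parts = []
    # Split into alternating runs of word and non-word characters,
    # so that joining the parts restores the original text.
    for token in re.findall(r"\w+|\W+", text):
        gender = lexicon.get(token.lower())
        escaped = html.escape(token)
        if gender:
            colour = COLOURS[gender]
            parts.append(f'<span style="background:{colour}">{escaped}</span>')
        else:
            parts.append(escaped)
    return "<html><body><p>" + "".join(parts) + "</p></body></html>"


page = highlight_gender("La loi est appliquée par le tribunal.")
```

Words missing from the lexicon stay unmarked, which is also what
happens to verbs and adverbs in the booklet. Changing the input string
is enough to produce a new page.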
%
0 % 0 0 0
0 0 0 %
_____ _ 0 0
% 0 0 /__ \ |__ ___ % 0
% / /\/ '_ \ / _ \ 0 %
0 / / | | | | __/ 0
% 0 0 0 \/ |_| |_|\___|
% 0 _ 0 0 _ _
/_\ _ __ _ __ ___ | |_ __ _| |_ ___ _ __
//_\\| '_ \| '_ \ / _ \| __/ _` | __/ _ \| '__|
/ _ \ | | | | | | (_) | || (_| | || (_) | | 0
\_/ \_/_| |_|_| |_|\___/ \__\__,_|\__\___/|_|
0 0
%
by Algolit
The annotator asks for the guidance of visitors in annotating
the archive of Mundaneum.
The annotation process is a crucial step in supervised machine
learning where the algorithm is given examples of what it needs
to learn. A spam filter in training will be fed examples of spam
% and real messages. These examples are entries, or rows from the
dataset with a label, spam or non-spam.
The labelling of a dataset is work executed by humans, who pick
a label for each row of the dataset. To ensure the quality of the
% labels, multiple annotators see the same row and have to give the
same label before an example is included in the training data.
Only when enough samples of each label have been gathered in the
dataset can the computer start the learning process.
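The agreement rule and the threshold described above can be sketched as
follows. This is an illustration under assumptions, not the code of the
installation: the function names agreed_examples and ready_to_train,
the minimum of two annotators and the sample annotations are invented.

```python
from collections import Counter, defaultdict


def agreed_examples(annotations, min_annotators=2):
    """Keep a row only if enough annotators saw it
    and all of them gave the same label."""
    labels_per_row = defaultdict(list)
    for row_id, label in annotations:
        labels_per_row[row_id].append(label)
    agreed = {}
    for row_id, labels in labels_per_row.items():
        if len(labels) >= min_annotators and len(set(labels)) == 1:
            agreed[row_id] = labels[0]
    return agreed


def ready_to_train(agreed, labels, min_per_label):
    """True once every label has at least min_per_label agreed rows."""
    counts = Counter(agreed.values())
    return all(counts[label] >= min_per_label for label in labels)


# Invented annotations: (row id, label given by one annotator).
annotations = [
    ("row-1", "spam"), ("row-1", "spam"),
    ("row-2", "non-spam"), ("row-2", "spam"),
    ("row-3", "non-spam"), ("row-3", "non-spam"),
]
training = agreed_examples(annotations)
```

Here row-2 is left out because its two annotators disagree. What
happens to such rows (discard them, ask a third person) is a choice
the sketch does not make.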
In this interface we ask you to help us classify the cleaned
texts from the Mundaneum archive to expand our training set and
improve the quality of the installation 'Classifying the World'
in Oracles.
---
Concept, code, interface: Gijs de Heij
30
%% % % %% % % % % %
% %% % % 0 0 0 0 0 0 % % %
% % % % % 0 0 0 0 % % % % %
% % % % % 0 0 _ ___ ___ ___ 00 %% %
% % % % 0 0 / |/ _ \ / _ \ / _ \ 0
%% % % 0 0 | | | | | | | | | | | %
% % % % 0 | | |_| | |_| | |_| | %% %
% % |_|\___/ \___/ \___/ 00 0
% % % % 00 0 0 0 0 _ 00 % % %
% % % ___ _ _ _ __ ___ ___| |_ ___
% % / __| | | | '_ \/ __|/ _ \ __/ __| %
% % %% 0 0 \__ \ |_| | | | \__ \ __/ |_\__ \ % %
0 0 % |___/\__, |_| |_|___/\___|\__|___/
0 %% 0 |___/ % % 0 %
0 0 0 0 __ _ % 0 _ 0 % %
0 0 / /\ /(_)_ __ _ _| |
0 | |\ \ / / | '_ \| | | | |
0 % | | \ V /| | | | | |_| | | 0 0
% | | \_/ |_|_| |_|\__, |_| %
% % 00 \_\ 0 |___/ 0
% % % __ _ _ _ _ % __ 0
0 0 % /__\_| (_) |_(_) ___ _ __\ \
% /_\/ _` | | __| |/ _ \| '_ \| | 0
% //_| (_| | | |_| | (_) | | | | |
0 \__/\__,_|_|\__|_|\___/|_| |_| | 0
% % 00 0 0 /_/
0 0 00
%
by Algolit
Created in 1985, WordNet is a hierarchical taxonomy that de-
scribes the world. It was inspired by theories of human semantic
% memory developed in the late 1960s. Nouns, verbs, adjectives and
adverbs are grouped into sets of synonyms, or synsets, each
expressing a different concept. %
ImageNet is an image dataset based on the WordNet 3.0 nouns hier-
archy. Each synset is depicted by thousands of images. From 2010 %
until 2017, the ImageNet Large Scale Visual Recognition Challenge
(ILSVRC) was a key benchmark in object category classification
% for pictures, having a major impact on software for photography,
image searches, image recognition.
1000 synsets (Vinyl Edition) contains the 1000 synsets used in
this challenge recorded in the highest sound quality that this
% analog format allows. This work highlights the importance of the
datasets used to train artificial intelligence (AI) models that
run on devices we use on a daily basis. Some of them inherit
classifications that were conceived more than 30 years ago. This
sound work is an invitation to thoughtfully analyse them.
---
Concept & recording: Javier Lloret
Voices: Sara Hamadeh & Joseph Hughes
31
CONTEXTUAL STORIES
ABOUT INFORMANTS
--- Datasets as representations --- community you try to distinguish what serves the
community and what doesn't and you try to general-
The data-collection processes that lead to the ize that, because I think that's what the good
creation of the dataset raise important questions: faith-bad faith algorithm is trying to do, to find
who is the author of the data? Who has the privi- helper tools to support the project, you do that
lege to collect? For what reason was the selection on the basis of a generalization that is on the
made? What is missing? abstract idea of what Wikipedia is and not on the
living organism of what happens every day. What
The artist Mimi Onuoha gives a brilliant example interests me in the relation between vandalism and
of the importance of collection strategies. She debate is how we can understand the conventional
chose the case of statistics related to hate drive that sits in these machine-learning pro-
crimes. In 2012, the FBI Uniform Crime Reporting cesses that we seem to come across in many places.
(UCR) Program registered almost 6000 hate crimes And how can we somehow understand them and deal
committed. However, the Department of Justice’s with them? If you place your separation of good
Bureau of Statistics came up with about 300,000 faith-bad faith on pre-existing labelling and then
reports of such cases. That is over 50 times as reproduce that in your understanding of what edits
many. The difference in numbers can be explained are being made, how then to take into account
by how the data was collected. In the first situa- movements that are happening, the life of the ac-
tion law enforcement agencies across the country tual project?
voluntarily reported cases. For the second survey,
the Bureau of Statistics distributed the National Amir: It's an interesting discussion. Firstly,
Crime Victimization form directly to the homes of what we are calling good faith and bad faith comes
victims of hate crimes. from the community itself. We are not doing la-
belling for them, they are doing labelling for
In the field of Natural Language Processing (NLP) themselves. So, in many different language
the material that machine learners work with is Wikipedias, the definition of what is good faith
text-based, but the same questions still apply: and what is bad faith will differ. Wikimedia is
who are the authors of the texts that make up the trying to reflect what is inside the organism and
dataset? During what period were the texts col- not to change the organism itself. If the organism
lected? What type of worldview do they represent? changes, and we see that the definition of good
faith and helping Wikipedia has been changed, we
In 2017, Google's Top Stories algorithm pushed a are implementing this feedback loop that lets
thread of 4chan, a non-moderated content website, people from inside their community pass judgement
to the top of the results page when searching for on their edits and if they disagree with the la-
the Las Vegas shooter. The name and portrait of an belling, we can go back to the model and retrain
innocent person were linked to the terrible crime. the algorithm to reflect this change. It's some
Google changed its algorithm just a few hours af- sort of closed loop: you change things and if
ter the mistake was discovered, but the error had someone sees there is a problem, then they tell us
already affected the person. The question is: why and we can change the algorithm back. It's an on-
did Google not exclude 4chan content from the going project.
training dataset of the algorithm?
Reference
Reference https://gitlab.constantvzw.org/algolit/algolit/blob/
https://points.datasociety.net/the-point-of- master/algoliterary_encounter/Interview%20with%20Amir
collection-8ee44ad7c2fa
https://arstechnica.com/information-technology --- How to make your dataset known ---
/2017/10/google-admits-citing-4chan-to-spread-
fake-vegas-shooter-news/ NLTK stands for Natural Language Toolkit. For pro-
grammers who process natural language using
Python, this is an essential library to work with.
    --- Labeling for an Oracle that                 Many tutorial writers recommend that machine
    detects vandalism on Wikipedia ---              learning learners start with the inbuilt NLTK
                                                    datasets. These comprise 71 different collections,
This fragment is taken from an interview with Amir  with a total of almost 6000 items.
Sarabadani, software engineer at Wikimedia. He was
in Brussels in November 2017 during the Algoliter- There is for example the Movie Review corpus for
ary Encounter. sentiment analysis. Or the Brown corpus, which was
put together in the 1960s by Henry Kučera and W.
Femke: If you think about Wikipedia as a living Nelson Francis at Brown University in Rhode Is-
community, with every edit the project changes. land. There is also the Declaration of Human
Every edit is somehow a contribution to a living Rights corpus, which is commonly used to test
organism of knowledge. So, if from within that whether the code can run on multiple languages.
The corpus contains the Declaration of Human Rights In fact, at the beginning of Wikipedia,
expressed in 372 languages from around the world. many articles were written by bots.
Rambot, for example, was a controversial bot
But what is the process of getting a dataset ac- figure on the English-speaking platform.
cepted into the NLTK library nowadays? On the It authored 98 per cent of the pages de-
Github page, the NLTK team describes the following scribing US towns.
requirements:
As a result of serial and topical robot interven-
- Only contribute corpora that have obtained a ba- tions, the models that are trained on the full
sic level of notability. That means, there is a Wikipedia dump have a unique view on composing ar-
publication that describes it, and a community of ticles. For example, a topic model trained on all
programmers who are using it. of Wikipedia articles will associate 'river' with
- Ensure that you have permission to redistribute 'Romania' and 'village' with 'Turkey'. This is be-
the data, and can document this. This means that cause there are over 10000 pages written about
the dataset is best published on an external web- villages in Turkey. This should be enough to spark
site with a licence. anyone's desire for a visit, but it is far too
- Use existing NLTK corpus readers where possible, much compared to the number of articles other
or else contribute a well-documented corpus reader countries have on the subject. The asymmetry
to NLTK. This means, you need to organize your causes a false correlation and needs to be re-
data in such a way that it can be easily read us- dressed. Most models try to exclude the work of
ing NLTK code. these prolific robot writers.
Reference
--- Extract from a positive IMDb https://blog.lateral.io/2015/06/the-unknown-
movie review from the NLTK dataset --- perils-of-mining-wikipedia/
corpus: NLTK, movie reviews
fileid: pos/cv998_14111.txt
steven spielberg ' s second epic film on world war
ii is an unquestioned masterpiece of film . spiel-
berg , ever the student on film , has managed to
resurrect the war genre by producing one of its
grittiest , and most powerful entries . he also
managed to cast this era ' s greatest answer to
jimmy stewart , tom hanks , who delivers a perfor-
mance that is nothing short of an astonishing mir-
acle . for about 160 out of its 170 minutes , "
saving private ryan " is flawless . literally .
the plot is simple enough . after the epic d - day
invasion ( whose sequences are nothing short of
spectacular ) , capt . john miller ( hanks ) and
his team are forced to search for a pvt . james
ryan ( damon ) , whose brothers have all died in
battle . once they find him , they are to bring
him back for immediate discharge so that he can go
home . accompanying miller are his crew , played
with astonishing perfection by a group of charac-
ter actors that are simply sensational . barry
pepper , adam goldberg , vin diesel , giovanni
ribisi , davies , and burns are the team sent to
find one man , and bring him home . the battle se-
quences that bookend the film are extraordinary .
literally .
--- The ouroboros of machine learning ---
Wikipedia has become a source for learning not
only for humans, but also for machines. Its arti-
cles are prime sources for training models. But
very often, the material the machines are trained
on is the same content that they helped to write.
0 12 3 4 5 67 8 9 0
12 3 4 5 67 8 9 0 12
3 4 5 67 8 9 0 1 2
3 4 5 6 7 8 9 0 1 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 0 1 2 3 4
5 6 7 8 9 0 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 4 5 6
7 8 9 0 1 2 3 4 5 6
7 89 0 1 2 34 5 6
7 89 0 1 2 34 5 6
7 89 0 1 2 34 5 6 7
89 0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 78 9
0 1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 2 3 4 5 6 7 8 9 0
1 2 3 4 5 67 8 9 0 12
3 4 5 67 8 9 0 12
3 4 5 67 8 9 0 12 3
4 5 6 7 8 9 0 1 2 3
4 5 6 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3
4 56 7 8 9 01 2 3 4
5 6 7 8 9 0 1 2 3 4 5 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 4 5 6 7
8 9 0 1 2 3 4 5 6 7
89 0 1 2 34 5 6 7
89 0 1 2 34 5 6 7 89
0 1 2 34 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 7 8 9 0
1 2 3 4 5 6 7 8 9 0 12 3
4 5 67 8 9 0 12 3
4 5 67 8 9 0 12 3
4 5 67 8 9 0 12 3
4 5 6 7 8 9 0 1 2 3
4 5 6 7 8 9 01 2 3 4
56 7 8 9 01 2 3 4
56 7 8 9 01 2 3 4 5
6 7 8 9 0 1 2 3 4 5 6
7 8 9 0 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6
7 8 90 1 2 3 45 6 7
8 9 0 1 2 3 4 5 6 7
8 9 0 1 2 34 5 6 7 89
0 1 2 34 5 6 7 89 0
1 2 34 5 6 7 89 0
1 2 34 5 6 7 8 9 0
1 2 3 4 5 6 7 8 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0
1 23 4 5 6 78 9 0 1
2 3 4 5 6 7 8 9 0 1 2 3
4 5 67 8 9 0 12 3
4 5 67 8 9 0 12 3
readers read readers read readers read readers read readers read readers read readers re
d readers read readers read readers read readers read readers re
d readers read readers read readers read readers read
readers read readers read readers read re
ders read readers read readers read readers re
d readers read readers read readers r
ad readers read readers read
readers read readers read readers read
readers read readers read
readers read readers read readers read
readers read readers read
readers read readers read
readers read readers read
readers read readers read
readers read readers read
readers read readers read
readers read readers
read readers read
readers read readers read
readers read readers read
readers read
readers read readers read
readers read
readers read readers read
readers read
readers read readers read
readers read
readers read readers re
d readers read
readers read
readers read readers read
readers read
readers read
readers read re
ders read readers read
readers read
readers read
readers read
readers read readers r
ad readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read readers
read readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read
readers read r
h a o e f rtlt9 b9r+t +-+-+-+-+-+-+-+ n +-+-+-+-+ aM B 6 r fwea5I s s ,e -h e e
m et u t w8 8+ i4 + R w e |r|e|a|d|e|r|s| f |r|e|a|d| C a r_ n b - i1 a s- noh6M+ pha
h a% 8 e olt r_ m c hb8 b +-+-+-+-+-+-+-+ mi +-+-+-+-+ pli f ro u n ae 3aee d oo| 3h 6o
2 ce 'd | 8 eA s d8 - i 6 1 %6 sr2 9 g2 a s lia wrc 3 ?7 i n3+7m s
c htiuw :ead 7 _ 9r t i d 5 sau4nl |e_ ar 8orl t h h+se a s _o1 s56 ka5n1e no hd
d m u 's +e | h64t +-+ +-+-+-+-+-+-+-+-+ o +-+-+-+-+-+-+-+-+-+-+-+ enl o 3 t d Ad- 2 ahs
g o i 0 _ 5o ss x 4 |a| |c|o|m|p|u|t|e|r| sl |u|n|d|e|r|s|t|a|n|d|s| 4i 8 trdiM 48 i5 2 9
tl e ri 6 9 ln a /8e +-+ +-+-+-+-+-+-+-+-+ 6 x +-+-+-+-+-+-+-+-+-+-+-+ 4 \eda o |y A o3 /1
e _ en l r 7 -sd c o +-+-+-+ +-+-+-+-+-+-+ l +-+-+-+-+-+-+-+-+-+ d6 m7n n a np l4 s
7 t p e M fdh c as |a|l|l| |m|o|d|e|l|s| Sa |t|r|a|n|s|l|a|t|e| a 6 w da 5 - o4 5 i )
r l a nn sh fc ui e7 +-+-+-+ +-+-+-+-+-+-+ c a +-+-+-+-+-+-+-+-+-+ ar 9 r , e a 3 , i
4 r 2 t +-+-+-+-+ +-+-+-+-+-+-+ 72 +-+-+-+-+-+ p r s r a a h an ' 3 a
o p ft n l |s|o|m|e| |m|o|d|e|l|s| |c|o|u|n|t| 8r n| 1 a r h o /oa e 7
m8 4 wa +-+-+-+-+ +-+-+-+-+-+-+ l 7 +-+-+-+-+-+ 2 or r i 9e 4 p142 ,6r
l 4N i u-3 am +-+-+-+-+ +-+-+-+-+-+-+ 4s +-+-+-+-+-+-+-+ 23 a e rea le dhVo t74 g
j 7 t o e rd |s|o|m|e| |m|o|d|e|l|s| |r|e|p|l|a|c|e| o -i no r + 2 r l i
o 6 7g i tt i +-+-+-+-+ +-+-+-+-+-+-+ 8fa +-+-+-+-+-+-+-+ x7 e g o ee d +ni
d i tr 6k t r 2 3a8 9 i3 5 hv7 ge 5e u - 3y a _ e 2 8 c
55fi1 - 6 :29 t e al+ atp43e + ac t n b t hTsa4ti03 o% % flol 4-e
rf m r 8 6y heta 1 e 1 m6 +t dy p e 9 n ,o 5 / n _ | s e1 + ni d
n 3 leo 5 ti 5 - sc a +1 w uw9 n+ e i m m
3 a a a 9 \ -8 18 e e l i e h ghc ey9 8 15 3y a 1 -e i 5a i 9r a5pe
o c c % a + 255 t yy m % 4i i 5 i e t _ 7 au l% 7 o
g s8 5 e 2 r 3i 2 1 _ i4ir 2 e l s 1la n s s ht 2 r s i 3 r
u s+ a e m + 6 2n r-l a c6 - t 7 4t +i +r % 8 6 8 r t t r 3 1
r s 90 k hl a pWn e i5 7 8 a r e4ro e r5wt s m
- h ea 6 2 8 2 v h nf e _ w lr a iai 7
| j 4 4 f hc i F 9 p s m toG al 6 / h sde l e
a 4 s 6 9 - h o m 6 _l34 . % w7 e 8 e l
n .52- i 7 5 _ r + s 5 p s 5n+ 3 il e 1 o F c
3 l 2 a o en% _. e 4 8lb 3 r a I 9 k o
t r 6 e + 2 6 y oa n i r% f 1 n78 s h F o
e g v 6 u h ad Ua1 2 a t 9 er n t oh7 s s r t g
+ 7 6 h8 t 7 a - m 73| t o e r i 7
f l ia s _ e u + 7 ct \ a _ 2- 7 . o o - ,
t n 0n 4+ f 2r i 9 s y i3 r t r s e a p m h 4
a c 7 t 9 n n m mro t s i nd e r
a 1 e e | e 1 3 c n k 2 p e o e
7i s d 6 a 48 c + Dl 1 1 n r - 0
V r + a o % 7 7 9r 4 | 9 n 7 e
e n | , m n e s s 1 e n 5
5 r 4 o 5 1 6 e - 2 a -r _ e s’1 e S i
t 2 +|ee s e c n an i e
a4 9 9 o p _ t 7 h v 9 0
d % a e , s nr 9 l W h a e t | + + s
a 3 7I a e tk K y3e 2 c - a h o u e d
\+ o 1 h r d t e nl 4 k 9 07 o t v 7s
, n e % _x | i t b1 r h ei
t a8 e o n t 12 o rs a y
i e + n a | a 9 \
n sr - e 3 i r- 8o e i
6 f i 3 ht a l | h 1 o
a s df m5 i h n i 9n ,u
d c n H s o l c i 5
o | s m rl 9 1 n c _i e
i + i nr 8 h % t a % t 0 m
i 6 c6 wt a r
g s pr l t a 5 | c i |
e 1 sr/ n e 7 e 9 n t w e c '
m c - o % n . a 3
f1 c I u 9 + t
2 . , 4 na P e e f 2
n i t 1S f n n a i e
r + e i h 9 _ v
3 | h e t s a
s E l v - p u 1 h 2 , ' 5
| + nse t a % 8 e w
o p n y o s o
V V V V V V % V V % % % %% % % % % %% % % %%
V V V V V V V V V V V V V V V V % % % 0 0 % % % % 0 %% % %%% % % %%% %
V V V V V V V V V % 0 0 %% % % 0 0 % % 0 %
% % %% % % 0 _____ _ % ___ % _ % __ % %
% % % % /__ \ |__ % ___ / __\ ___ ___ | | __ ___ / _| %
% % READERS % / /\/ '_ \ / _ \ /__\/// _ \ / _ \| |/ / / _ \| |_ %
% % / / | | | | __/ / \/ \ (_) | (_) | < | (_) | _| %
% % \/ |_| |_|\___| \_____/\___/ \___/|_|\_\ \___/|_|
V % V V V V V V V % % _____ 0 % 0 _
V V V V V V V V V V V V V V V V % /__ \___ _ __ ___ ___ _ __ _ __ _____ __ (_)_ __
V V V V V V V V V / /\/ _ \| '_ ` _ \ / _ \| '__| '__/ _ \ \ /\ / / | | '_ \
V % V V V V V V V / / | (_) | | | | | | (_) | | | | | (_) \ V V / | | | | |
V V V V V V V V V V V V V V V V \/ \___/|_| |_| |_|\___/|_| |_| \___/ \_/\_/ |_|_| |_| %
V V % V V V V V V V 0 0 ___ % 0 0 __
% % 0 __ _ 0 / __\ __ _ __ _ ___ / _| %
We communicate with computers 0 0 / _` | /__\/// _` |/ _` | / _ \| |_ 0
through language. We click on icons | (_| | / \/ \ (_| | (_| | | (_) | _| %
that have a description in words, 0 \__,_| \_____/\__,_|\__, | \___/|_|
we tap words on keyboards, use our 0 00 |___/ %
voice to give them instructions. 0 / / /\ \ \___ _ __ __| |___ 0 % %
Sometimes we trust our computer % % \ \/ \/ / _ \| '__/ _` / __| 0
with our most intimate thoughts and 0 0 \ /\ / (_) | | | (_| \__ \ 0 %
forget that they are extensive cal- % \/ \/ \___/|_| \__,_|___/ 0 %
culators. A computer understands
every word as a combination of ze- 0 0 0
ros and ones. A letter is read as a by Algolit % %
specific ASCII number: capital 'A'
is 65.                               The bag-of-words model is a simplifying representation of text
used in Natural Language Processing (NLP). In this model, a text
In all models, rule-based, classi- is represented as a collection of its unique words, disregarding
cal machine learning, and neural grammar, punctuation and even word order. The model transforms
networks, words undergo some type the text into a list of words and how many times they're used in
of translation into numbers in or- the text, or quite literally a bag of words.
der to understand the semantic
meaning of language. This is done This heavy reduction of language was the big shock when beginning
through counting. Some models count to machine learn. Bag of words is often used as a baseline, on
the frequency of single words, some which the new model has to perform better. It can understand the
might count the frequency of combi- subject of a text by recognizing the most frequent or important
nations of words, some count the words. It is often used to measure the similarities of texts by
frequency of nouns, adjectives, comparing their bags of words.
verbs or noun and verb phrases.
Some just replace the words in a For this work the article 'Le Livre de Demain' by engineer G.
text by their index numbers. Num- Vander Haeghen, published in 1907 in the Bulletin de l'Institut
bers optimize the operative speed International de Bibliographie of the Mundaneum, has been liter-
of computer processes, leading to ally reduced to a bag of words. You can buy a bag at the recep-
fast predictions, but they also re- tion of Mundaneum.
move the symbolic links that words
might have. Here we present a few ---
techniques that are dedicated to
making text readable to a machine. Concept & realisation: An Mertens
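The counting described for the bag-of-words model can be sketched in a few lines of Python. This is a minimal illustration and not the code of the installation: the function names `bag_of_words` and `cosine_similarity`, the tokenisation with `\w+` and the choice of cosine similarity to compare two bags are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs, ignoring case, punctuation and word order."""
    # Assumption: a 'word' is any run of letters, digits or underscores.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Compare two bags: 1.0 for identical proportions, 0.0 for no shared words."""
    # A Counter returns 0 for words it does not contain.
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Calling `bag_of_words` on two texts and passing both results to `cosine_similarity` gives one simple way of measuring how similar the texts are. Because word order is thrown away, two texts with the same words in a different order come out as identical.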
%
0 00
0 0 0
0 _____ ___ _____ ___ ___
0 0 /__ \/ __\ \_ \/ \/ __\
0 0 / /\/ _\____ / /\/ /\ / _\
0 00 / / / /|_____/\/ /_/ /_// /
\/ \/ \____/___,'\/
0
by Algolit
The TF-IDF (Term Frequency-Inverse Document Frequency) is a
weighting method used in text search. This statistical measure
makes it possible to evaluate the importance of a term contained
in a document, relative to a collection or corpus of documents.
The weight increases in proportion to the number of occurrences
     %% %      %  %  %%    %%          of the word in the document. It also decreases as the word
 %     %     %        %      %         becomes more frequent across the corpus. The TF-IDF is used in
     %      %   %    %   %%            particular in the classification of spam in email software.    %
% % % % % % % % %
% % % % A web-based interface shows this algorithm through animations %
% making it possible to understand the different steps of text %
% % % classification. How does a TF-IDF-based programme read a text? %
% How does it transform words into numbers? % % %
% % % % %
% --- % %
% % %
% Concept, code, animation: Sarah Garcin %
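How such a programme turns words into numbers can be sketched as follows. This is a minimal example under stated assumptions, not the code behind the web interface: it uses one common variant of the weighting (term frequency as a share of the document length, multiplied by the logarithm of the inverse document frequency), splits text on whitespace, and the function name `tf_idf` is chosen only for this illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Term frequency (share of this document) times
            # inverse document frequency (rarity across the corpus).
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights
```

A word that appears in every document receives a weight of zero, however often it is used, while a word that is frequent in one document and rare elsewhere receives a high weight.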
% % %
% % %
0 0 % %
% 0 0 %
0 ___ 0 _ 0 0
0 / _ \_ __ _____ _(_)_ __ __ _ __ _
0 / /_\/ '__/ _ \ \ /\ / / | '_ \ / _` | / _` |
0 / /_\\| | | (_) \ V V /| | | | | (_| | | (_| |
0 \____/|_| \___/ \_/\_/ |_|_| |_|\__, | \__,_|
0 0 0 |___/ 0
0 0 0 _ 0 %
% | |_ _ __ ___ ___
% 0 0 | __| '__/ _ \/ _ \ %
% 0 | |_| | | __/ __/
0 0 0 \__|_| \___|\___|
%
by Algolit %
% %
      %      %                         Parts of speech are categories of words that we learn at school:
% noun, verb, adjective, adverb, pronoun, preposition, conjunction,
% interjection, and sometimes numeral, article, or determiner. %
In Natural Language Processing (NLP) there exist many writings
that allow sentences to be parsed. This means that the algorithm
can determine the part-of-speech of each word in a sentence.
'Growing a tree' uses this technique to define all nouns in a
specific sentence. Each noun is then replaced by its definition.
This allows the sentence to grow autonomously and infinitely.
The recipe of 'Growing a tree' was inspired by Oulipo's constraint
of 'littérature définitionnelle' invented by Marcel Benabou in
1966. In a given phrase, one replaces every significant element
(noun, adjective, verb, adverb) by one of its definitions in a
given dictionary; one reiterates the operation on the newly
received phrase, and again.
The dictionary of definitions used in this work is Wordnet. Word-
net is a combination of a dictionary and a thesaurus that can be
read by machines. According to Wikipedia it was created in the
Cognitive Science Laboratory of Princeton University starting in
1985. The project was initially funded by the US Office of Naval
Research and later also by other US government agencies including
DARPA, the National Science Foundation, the Disruptive Technology
Office (formerly the Advanced Research and Development Activity),
and REFLEX.
---
Concept, code & interface: An Mertens & Gijs de Heij
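The growth step can be sketched in Python. This is a toy illustration and not the code of the work: instead of a part-of-speech tagger and WordNet, it assumes a tiny hand-written dictionary (`DEFINITIONS`, entirely hypothetical) whose keys stand in for the nouns found in the sentence.

```python
# Hypothetical mini-dictionary standing in for WordNet.
# Its keys play the role of the nouns a part-of-speech tagger would find.
DEFINITIONS = {
    "tree": "a tall plant with a trunk",
    "plant": "a living organism with roots",
    "trunk": "the main stem of a tree",
}


def grow(sentence, definitions, rounds=1):
    """Replace every known noun by its definition, `rounds` times over."""
    for _ in range(rounds):
        words = sentence.split()
        # Words without an entry are kept unchanged.
        sentence = " ".join(definitions.get(word, word) for word in words)
    return sentence
```

With this dictionary, `grow("the tree", DEFINITIONS, rounds=1)` gives "the a tall plant with a trunk". Each further round replaces the nouns inside the new definitions, so the sentence keeps growing for as long as the definitions themselves contain nouns.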
% % %% % %% % % %% _ _ % % % _ _ _ %% % % _ %%% % % %
%% /_\ | | __ _ ___ _ __(_) |_| |__ _ __ ___ (_) ___ %
%% 0 //_\\| |/ _` |/ _ \| '__| | __| '_ \| '_ ` _ \| |/ __| % %
% % % % / _ \ | (_| | (_) | | | | |_| | | | | | | | | | (__ %
% % % % % % \_/ \_/_|\__, |\___/|_| |_|\__|_| |_|_| |_| |_|_|\___| % %
% %% % % % |___/ % 0 _ _ %% % % 00 % __ %%
% % % _ __ ___ __ _ __| (_)_ __ __ _ ___ ___ / _| %% %
% % % | '__/ _ \/ _` |/ _` | | '_ \ / _` / __| / _ \| |_ %
% % | | | __/ (_| | (_| | | | | | (_| \__ \ | (_) | _| % %
|_| \___|\__,_|\__,_|_|_| |_|\__, |___/ \___/|_|
% % 0 % ___ 0 _ _ 0 _|___/ 0 %_ 0 %
% / __\ ___ _ __| |_(_) | | ___ _ __( )__ %
% % 0 /__\/// _ \ '__| __| | | |/ _ \| '_ \/ __| %%
/ \/ \ __/ | | |_| | | | (_) | | | \__ \ %
% 0 0 \_____/\___|_| \__|_|_|_|\___/|_| |_|___/
% % 0 _ _ _ 0
% % % _ __ 0 ___ _ __| |_ _ __ __ _(_) |_ %
% % % | '_ \ / _ \| '__| __| '__/ _` | | __| 0
% 00 | |_) | (_) | | | |_| | | (_| | | |_
% | .__/ \___/|_| \__|_| \__,_|_|\__| %
|_| 0 _ __
0 _ __ __ _ _ __| | _/_/
0 0 | '_ \ / _` | '__| |/ _ \ 0
0 | |_) | (_| | | | | __/ 0
0 | .__/ \__,_|_| |_|\___|
0 0 |_|
00 0 0 0 0 00
% by Guillaume Slizewicz (Urban Species)
% % %
Written in 1907, Un code télégraphique du portrait parlé is an
attempt to translate the 'spoken portrait', a face-description
technique created by a policeman in Paris, into numbers. By im-
plementing this code, it was hoped that faces of criminals and
                                       fugitives could easily be communicated over the telegraphic
   %                                   network between countries. In its form, content and ambition, this
text represents our complicated relationship with documentation
% technologies. This text sparked the creation of the following in-
% stallations for three reasons: %
- First, the text is an algorithm in itself, a compression algo-
rithm, or to be more precise, the presentation of a compression
% algorithm. It tries to reduce the information to smaller pieces
while keeping it legible for the person who has the code. In this
   %                                   regard it is linked to the way we create technology, our pursuit
                                       of more efficiency, quicker results, cheaper methods. It
                                       represents our appetite for putting numbers on the entire world,
                                       measuring the smallest things, labeling the tiniest differences.
This text itself embodies the vision of the Mundaneum.
                                       - Second, it is about the reasons for and the applications of
technology. It is almost ironic that this text was in the se-
lected archives presented to us in a time when face recognition
and data surveillance are so much in the news. This text bears
the same characteristics as some of today's technology: motivated
by social control, classifying people, laying the basis for a
surveillance society. Facial features are at the heart of recent
                                       controversies: mugshots were standardized by Bertillon, now they
                                       are used to train neural networks to distinguish criminals from
                                       law-abiding citizens. Facial recognition systems allow the arrest of
criminals via CCTV infrastructure and some assert that people’s
features can predict sexual orientation.
- The last point is about how it represents the evolution of
mankind’s techno-structure. What our tools allow us to do, what
they forbid, what they hinder, what they make us remember and
what they make us forget. This document enables a classification
between people and a certain vision of what normality is. It
breaks the continuum into pieces thus allowing stigmatiza-
tion/discrimination. On the other hand this document also feels
%% %% % %% %% % obsolete today, because our techno-structure does not need such
% %% % % % detailed written descriptions about fugitives, criminals or citi- %
% %% % % % % % % zens. We can now find fingerprints, iris scans or DNA info in %
 %  % % %    % %   %  %          % %   large datasets and compare them directly. Sometimes the             %
      %  %       %       %             technological systems do not even need human supervision and
   %    %  %          %%        %  %   directly recognize a person's identity via their facial features or   % %
% their gait. Computers do not use intricate written language to
describe a face, but arrays of integers. Hence all the words used
   %                                   in this document seem désuets, dated. Have we forgotten what   %
some of them mean? Did photography make us forget how to describe
% faces? Will voice-assistance software teach us again?
%
Writing with Otlet
% %
% % Writing with Otlet is a character generator that uses the spoken %
% portrait code as its database. Random numbers are generated and
% translated into a set of features. By creating unique instances,
% the algorithm reveals the richness of the description that is
possible with the portrait code while at the same time embodying
its nuances.
%
An interpretation of Bertillon's spoken portrait. %%
% This work draws a parallel between Bertillon systems and current
ones. A webcam linked to a facial recognition algorithm captures %
the beholder's face and translates it into numbers on a canvas,
% printing it alongside Bertillon's labelled faces.
% %
References
https://www.technologyreview.com/s/602955/neural-network-learns-
to-identify-criminals-by-their-faces/
https://fr.wikipedia.org/wiki/Bertillonnage
https://callingbullshit.org/case_studies/case_study_criminal_
machine_learning.html
% %
%
% % 0 0 0 0 %
0 0 0
/\ /\__ _ _ __ __ _ _ __ ___ __ _ _ __
0 / /_/ / _` | '_ \ / _` | '_ ` _ \ / _` | '_ \
/ __ / (_| | | | | (_| | | | | | | (_| | | | |
\/ /_/ \__,_|_| |_|\__, |_| |_| |_|\__,_|_| |_|
0 0 |___/ 0 0
% 0 0 0 0 0 %
%
by Laetitia Trozzi, student Arts²/Section Digital Arts
                                       What better way to discover Paul Otlet and his passion for
                                       literature than to play hangman? Through this simple game, which
                                       consists of guessing the missing letters in a word, the goal is to
make the public discover terms and facts related to one of the
creators of the Mundaneum.
%
Hangman uses an algorithm to detect the frequency of words in a
text. Next, a series of significant words were isolated in Paul
Otlet's bibliography. This series of words is integrated into a
hangman game presented in a terminal. The difficulty of the game
gradually increases as the player is offered longer and longer
words. Over the different game levels, information about the life
and work of Paul Otlet is displayed.
%
CONTEXTUAL STORIES
ABOUT READERS
Naive Bayes, Support Vector Machines and Linear ter trigram. All the overlapping sequences of
Regression are called classical machine learning three characters are isolated. For example, the
algorithms. They perform well when learning with    character 3-grams of 'Suicide' would be ‘Sui’,
small datasets. But they often require complex ‘uic’, ‘ici’, ‘cid’, etc. Character n-gram fea-
Readers. The task the Readers do, is also called tures are very simple, they're language-indepen-
feature-engineering. This means that a human needs dent and they're tolerant to noise. Furthermore,
to spend time on a deep exploratory data analysis spelling mistakes do not jeopardize the technique.
of the dataset.
Patterns found with character n-grams focus on
Features can be the frequency of words or letters, stylistic choices that are unconsciously made by
but also syntactical elements like nouns, adjec- the author. The patterns remain stable over the
tives, or verbs. The most significant features for full length of the text, which is important for
the task to be solved, must be carefully selected authorship recognition. Other types of experiments
and passed over to the classical machine learning could include measuring the length of words or
algorithm. This process marks the difference with sentences, the vocabulary richness, the frequen-
Neural Networks. When using a neural network, cies of function words; even syntax or semantics-
there is no need for feature-engineering. Humans related measurements.
can pass the data directly to the network and
achieve fairly good performances straightaway. This means that not only your physical fingerprint
This saves a lot of time, energy and money. is unique, but also the way you compose your
thoughts! The same n-gram technique discovered that
The downside of collaborating with Neural Networks The Cuckoo’s Calling, a novel by Robert Galbraith,
is that you need a lot more data to train your was actually written by … J. K. Rowling!
prediction model. Think of 1GB or more of plain
text files. To give you a reference, one A4 page,   Reference
a text file of 5000 characters, weighs only 5 KB.   Paper: On the Robustness of Authorship
You would need about 200,000 such pages. More data  Attribution Based on Character N-gram Features,
also requires more access to useful datasets and    Efstathios Stamatatos, in Journal of Law & Policy,
more, much more processing power.                   Volume 21, Issue 2, 2013.
News article: https://www.scientificamerican.com
--- Character n-gram for /article/how-a-computer-program-helped-show-jk-
authorship recognition --- rowling-write-a-cuckoos-calling/
Imagine … You've been working for a company for
more than ten years. You have been writing tons of --- A history of n-grams ---
emails, papers, internal notes and reports on very
different topics and in very different genres. All The n-gram algorithm can be traced back to the
your writings, as well as those of your col- work of Claude Shannon in information theory. In
leagues, are safely backed-up on the servers of the paper, 'A Mathematical Theory of Communica-
the company. tion', published in 1948, Shannon performed the
first instance of an n-gram-based model for natu-
One day, you fall in love with a colleague. After ral language. He posed the question: given a se-
some time you realize this human is rather mad and quence of letters, what is the likelihood of the
hysterical and also very dependent on you. The day next letter?
you decide to break up, your (now) ex elaborates a
plan to kill you. They succeed. This is unfortu- If you read the following excerpt, can you tell
nate. A suicide letter in your name is left next who it was written by? Shakespeare or an n-gram
to your corpse. Because of emotional problems, it piece of code?
says, you decided to end your life. Your best
friends don't believe it. They decide to take the SEBASTIAN: Do I stand till the break off.
case to court. And there, based on the texts you
and others produced over ten years, a machine BIRON: Hide thy head.
learning model reveals that the suicide letter was
written by someone else. VENTIDIUS: He purposeth to Athens: whither, with
the vow
How does a machine analyse texts in order to iden- I made to handle you.
tify you? The most robust feature for authorship
recognition is delivered by the character n-gram FALSTAFF: My good knave.
technique. It is used in cases that span a variety
of themes and genres of writing. When using         You may have guessed, considering the topic of
character n-grams, texts are considered as se- this story, that an n-gram algorithm generated
quences of characters. Let's consider the charac- this text. The model is trained on the compiled
works of Shakespeare. While more recent             press, traders sell. On the contrary, if the news
algorithms, such as the recurrent neural networks   is good, they buy.
of the CharNN, are becoming famous for their
performance, n-grams still handle many NLP tasks.   A paper by Haikuan Liu of the Australian National
They are used in statistical machine translation, University states that the tense of verbs used in
speech recognition, spelling correction, entity tweets can be an indicator of the frequency of fi-
detection, information extraction, ... nancial transactions. His idea is based on the
fact that verb conjugation is used in psychology
to detect the early stages of human depression.
--- God in Google Books ---
Reference
In 2006, Google created a dataset of n-grams from Paper: 'Grammatical Feature Extraction and Analy-
their digitized book collection and released it sis of Tweet Text: An Application towards Pre-
online. Recently they also created an n-gram dicting Stock Trends', Haikuan Liu, Research
viewer. School of Computer Science (RSCS), College of
Engineering and Computer Science (CECS),
This allowed for many socio-linguistic investiga- The Australian National University (ANU)
tions. For example, in October 2018, the New York
Times Magazine published an opinion article titled
'It’s Getting Harder to Talk About God'. The au- --- Bag of words ---
thor, Jonathan Merritt, had analysed the mention
of the word 'God' in Google's dataset using the In Natural Language Processing (NLP), 'bag of
n-gram viewer. He concluded that there had been words' is considered to be an unsophisticated mod-
a decline in the word's usage since the twentieth el. It strips text of its context and dismantles
century. Google's corpus contains texts from the it into a collection of unique words. These words
sixteenth century leading up to the twenty-first. are then counted. In the previous sentences, for
However, what the author missed out on was the example, 'words' is mentioned three times, but
growing popularity of scientific journals around this is not necessarily an indicator of the text's
the beginning of the twentieth century. This new focus.
genre that was not mentioning the word God shifted
the dataset. If the scientific literature was The first appearance of the expression 'bag of
taken out of the corpus, the frequency of the word words' seems to go back to 1954. Zellig Harris,
'God' would again flow like a gentle ripple from an influential linguist, published a paper called
a distant wave. 'Distributional Structure'. In the section called
'Meaning as a function of distribution', he says
'for language is not merely a bag of words but a
--- Grammatical features taken from tool with particular properties which have been
Twitter influence the stock market --- fashioned in the course of its use. The linguist's
work is precisely to discover these properties,
The boundaries between academic disciplines are whether for descriptive analysis or for the synthesis
becoming blurred. Economics research mixed with of quasi-linguistic systems.'
psychology, social science, and cognitive and
emotional concepts has given rise to a new
economics subfield, called 'behavioral economics'.
This means that researchers can start to explain
stock market movements based on more than economic
factors alone. Both the economy and 'public
opinion' can influence or be influenced by each
other. A lot of research is being done on how to
use 'public opinion' to predict tendencies in
stock-price changes.
'Public opinion' is estimated from sources of
large amounts of public data, like tweets, blogs
or online news. Research using machinic data anal-
ysis shows that the changes in stock prices can be
predicted by looking at 'public opinion', to some
degree. There are many scientific articles online,
which analyse the press on the 'sentiment' ex-
pressed in them. An article can be marked as more
or less positive or negative. The annotated press
articles are then used to train a machine learning
model, which predicts stock market trends, marking
them as 'down' or 'up'. When a company gets bad
42
learners learn learners learn learners learn learners learn learners learn learners learn
learners learn learners learn learners learn learners learn learners learn
learners learn learners learn learners learn learners learn
learners learn learners learn learners learn
learners learn learners learn learners learn lea
ners learn learners learn learners learn
learners learn learners learn learners learn
learners learn learners learn learners
earn learners learn learners learn
learners learn learners learn
learners learn learners learn lea
ners learn learners learn learners
learn learners learn learners
earn learners learn learne
s learn learners learn
learners learn learners learn
learners learn learners learn
learners learn learners learn
learners learn
learners learn learners learn
learners learn learners learn
learners learn
learners learn learners learn
learners learn
learners learn learners learn
learners learn
learners learn learners
learn learners learn
learners learn
learners learn learners learn
learners learn
learners learn
learners learn learners
learn learners learn
learners learn
learners learn
learners learn lea
ners learn learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn learners
earn learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
learners learn
43
4n r- ro %r5 l e +-+-+-+-+-+-+-+-+ f +-+-+-+-+-+ m 9-e p + st2- a , _ nr2
l itr9 op 2c b ue |l|e|a|r|n|e|r|s| , y |l|e|a|r|n| ) g- 9 c w 1 atn_wn o_ c|
c o b op , +_7 -x a 9acl +-+-+-+-+-+-+-+-+ hc +-+-+-+-+-+ 34 u a 9a l |an t p 9 -
|\ _ l6el , 7 3 u r1 3 8dl a. m s T rv t ro|lm ni3 4 V3 as1to 4 e hp
5_s -o 4 d o9n t 0 t V i5n _ i, _ iu9 l + t t 6t s r s exe4eh l 4
ri _g d s es c s a 4s i+ i _ +-+-+-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+-+-+-+ e l4 f k 5l l wu |f
ete V o I- 4e |l|e|a|r|n|e|r|s| 6 e |a|r|e| |p|a|t|t|e|r|n| st 62 t a ne e 2 ?
.n l 1 ntb 5 d9 +-+-+-+-+-+-+-+-+ e e1 +-+-+-+ +-+-+-+-+-+-+-+ ia 5 n i w er8
er 1 t i 9 te9 n r7 | t ie m +-+-+-+-+-+-+-+ n s 1 i- e i X c w a
4 _c4 c s+ m t eh h.5 t a i t m p3 a e |f|i|n|d|e|r|s| , ll 6a e e7ifo- +cs te s-
h 5 8 m wl c tl u w2 +-+-+-+-+-+-+-+ 8 r s oe t % 8- 1 tl3o 4
n r a t t 3a 9 +-+-+-+-+-+-+-+-+ 5i9 +-+-+-+ +-+-+-+-+-+-+-+-+ l s 9 | 9a e 0sbntaf
m(um8 j ra e +t o |l|e|a|r|n|e|r|s| |a|r|e| |c|r|a|w|l|i|n|g| n n ei pte7i r 6ms
t s G_ el i + ka e . +-+-+-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+-+-+-+-+ ,/s u r r 4 1 i h
d heeo 2eei m g r ao a ah( 9a u m9 V e +-+-+-+-+-+-+-+ +-+-+-+-+ nae T-e r s-i5 7n
gt r_ y e io 96 e e s d |T trig - l |t|h|r|o|u|g|h| |d|a|t|a| 7s e1s77 87 2 fw m c
9d. 2 _ e 2nnm 96 n a t7- c d, o e +-+-+-+-+-+-+-+ +-+-+-+-+ 6 r n rbhi e 5 s n d
/ _ 2r s f a ef +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+ h asn _
t5 w w p l n | a -s |l|e|a|r|n|e|r|s| e |g|e|n|e|r|a|t|e| |s|o|m|e| |k|i|n|d| u s s
ie im i i 7 t 4 +-+-+-+-+-+-+-+-+ r +-+-+-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+ u t nr+ a
c 7 t s x 4 da n 7 Fd e c & +-+-+ +-+-+-+-+-+-+-+-+ raa o c5 ' e ro.
k1 n t re 8 n et 9 1 l r 0V |o|f| |s|p|e|c|i|f|i|c| a t9 s c rv v s l
n_fa r% a Z a 5 w me m n 5 1s n +-+-+ +-+-+-+-+-+-+-+-+ t S 1 o a r d rb
y 7 r c o ge D _ns v / b +-+-+-+-+-+-+-+-+-+ 8 4- i o 9 t e
i 4 9 9t6 9- é2 o p| o v i |'|g|r|a|m|m|a|r|'| n p t p 8sn _ l 8
nt 2pc t V4 e ha e 3 1 , n 2 i o +-+-+-+-+-+-+-+-+-+ %4 r 8 1 1 t e
e 8 rn d +-+-+-+-+-+-+-+-+-+-+-+ i +-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ u t
e e e e r F |c|l|a|s|s|i|f|i|e|r|s| %f |g|e|n|e|r|a|t|e|,| |e|v|a|l|u|a|t|e| 1 h V0 t n
nh % c 5 h r +-+-+-+-+-+-+-+-+-+-+-+ ti +-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ Ul n m ,
- n 2 ab m 3 o- r e 6| n +-+-+-+ +-+-+-+-+-+-+-+-+ 6 + oe /
l t i u + u t l i 7 ei |a|n|d| |r|e|a|d|j|u|s|t| 5 r f l f5 %
n 2 s e m a m e d1 m uh c +-+-+-+ +-+-+-+-+-+-+-+-+ n s g o _
e d c ps +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ +-+-+-+ + a D y5 8r
+1n o h |l|e|a|r|n|e|r|s| |u|n|d|e|r|s|t|a|n|d| |a|n|d| k4t tr t m
u a t +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ +-+-+-+ a 3 i 3 t
2 r 7 n n 9 r r. t p i +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ -- c
g + l t v c i 8 f as |r|e|v|e|a|l| |p|a|t|t|e|r|n|s| a _ n
4 s l 5 2 + f s - l +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ 4 - e
y + h -_ 7 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+ o . - i e
i e l t e _ V n |l|e|a|r|n|e|r|s| |d|o|n|'|t| |a|l|w|a|y|s| 4b ,i
_ % rt h e ,a +-+-+-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+ a _ h _
2 V o 5 t +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ _ s
c % po + h o3 mi5 8 |d|i|s|t|i|n|g|u|i|s|h| |w|e|l|l| w 7 _nn
, ha u pk +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ 91s 6 a
s hp I 3 % +-+-+-+-+-+ +-+-+-+-+-+-+-+-+ i 8
v o 6 o r s |w|h|i|c|h| |p|a|t|t|e|r|n|s| s_ oge e
n a + e o e 3 n 7 +-+-+-+-+-+ +-+-+-+-+-+-+-+-+ o 6 +
i l r \ m + a l r +-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+-+ , n
c a o o o |s|h|o|u|l|d| |b|e| |r|e|p|e|a|t|e|d| eh s i
o tlt t 2 e5 d +-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+-+ o s
7 d 2 5 | n | 1 ey d te a t
r | , + 9 6 % f a i s %
n o+| r u s \ 4 e ep e
ao 2 | f' | e e r 9 7 Td i d e
. t 8m d c l 6 l o i _ t T i - i
n 7 e d 3 p l a n . i l
i i % 8 a + p r l e
4 % a l
| h 5 | tl d 1mo 7 t N
, t o i 9 o? F W 9 dC %hf
o m 5 t t w , - 3p
a d s e a n t _ o c \ f
+ p a r f |el 8 , g i l e e
t e3 - - 9 h c t t +w + | u0 w t
. h 5 a , s
t d _ n V 4 a o
, o t r nt
w e e
44
V V V % V % V % V V % V % % % % % %% % % % % % % %
V V V V V V V V V V V V V V V V % % % 0 % % % %% % % %%% % %
V V V % V V V V V V % % %% 0 0 % % 0 % 00 % % %
% % % % 0 % __ _ 0 % 0 % ___ % 0 0 %
% % % % % % 0 /\ \ \__ _(_)_ 0 _____ / __\ __ _ _ _ ___ ___ %
% % LEARNERS % % / \/ / _` | \ \ / / _ \ /__\/// _` | | | |/ _ \/ __|
% % % % % / /\ / (_| | |\ V / __/ / \/ \ (_| | |_| | __/\__ \
% % % % \_\ \/ \__,_|_| \_/ \___| \_____/\__,_|\__, |\___||___/
V V V V V V V V % 0 % % 0 0 % % |___/
V V V V V V V V V V V V V V V V % __ _ __ _ _ __ ___ ___ 0 % %
V V V V V V V % V V % % / _` |/ _` | '_ ` _ \ / _ \ %
V V V V V V V V 0 0 | (_| | (_| | | | | | | __/ %
V V V V V V V V V V V V V V V V % 0 00 \__, |\__,_|_| |_| |_|\___| 0 %
V V V V V V V V V 0 |___/ 0
% % 0 0 0
by Algolit

In machine learning Naive Bayes methods are simple probabilistic
classifiers that are widely applied for spam filtering and decid-
ing whether a text is positive or negative.

They require a small amount of training data to estimate the nec-
essary parameters. They can be extremely fast compared to more
sophisticated methods. They are difficult to generalize, which
means that they perform well only on specific tasks and need to be
trained with the same style of data that they will work with
afterwards.

This game allows you to play along with the rules of Naive Bayes.
While manually executing the code, you create your own playful
model that 'just works'. A word of caution is necessary: because
you only train it with 6 sentences – instead of the minimum 2000
– it is not representative at all!

---

Concept & realisation: An Mertens


Learners are the algorithms that
distinguish machine learning prac-
tices from other types of prac-
tices. They are pattern finders,
capable of crawling through data
and generating some kind of spe-
cific 'grammar'. Learners are based
on statistical techniques. Some
need a large amount of training
data in order to function, others
can work with a small annotated
set. Some perform well in classifi-
cation tasks, like spam identifica-
tion, others are better at predict-
ing numbers, like temperatures,
distances, stock market values,
and so on.

The terminology of machine learn-
ing is not yet fully established.
Depending on the field, whether
statistics, computer science or
the humanities, different terms
are used. Learners are also called
classifiers. When we talk about
Learners, we talk about the inter-
woven functions that have the ca-
pacity to generate other functions,
evaluate and readjust them to fit
the data. They are good at under-
standing and revealing patterns.
But they don't always distinguish
well which of the patterns should
be repeated.

In software packages, it is not al-
ways possible to distinguish the
characteristic elements of the
classifiers, because they are hid-
den in underlying modules or li-
braries. Programmers can invoke
them using a single line of code.
For this exhibition, we therefore
developed two table games that show
in detail the learning process of
simple, but frequently used classi-
fiers.


% 0 % 0 0 0 %
0 0 0 0 0 %
__ _ 0
0 0 / /(_)_ __ ___ __ _ _ __ 0
/ / | | '_ \ / _ \/ _` | '__|
0 0 / /__| | | | | __/ (_| | |
0 \____/_|_| |_|\___|\__,_|_|
0 __ 0 0 _
0 /__\ ___ __ _ _ __ ___ ___ ___(_) ___ _ __
/ \/// _ \/ _` | '__/ _ \/ __/ __| |/ _ \| '_ \
00 0 / _ \ __/ (_| | | | __/\__ \__ \ | (_) | | | |
0 0 \/ \_/\___|\__, |_| \___||___/___/_|\___/|_| |_|
0 0 |___/ 0
0 0 __ _ __ _ _ __ ___ ___
0 / _` |/ _` | '_ ` _ \ / _ \
| (_| | (_| | | | | | | __/
0 \__, |\__,_|_| |_| |_|\___| 0 0 %
|___/ 00
0 0 0 0

by Algolit

Linear Regression is one of the best-known and best-understood
algorithms in statistics and machine learning. It has been around
for almost 200 years. It is an attractive model because the rep-
resentation is so simple. In statistics, linear regression is a
statistical method that allows us to summarize and study relation-
ships between two continuous (quantitative) variables.
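
As a rough illustration of the relationship between two continuous
variables, the sketch below fits a straight line through a handful of
points using ordinary least squares. The data points and the function
name fit_line are invented for this example; they are not taken from
the game.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: five (x, y) pairs that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Changing a single point in ys and running the code again shows how
strongly the line depends on what is in the dataset and what is not.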
45
% % % %% % % By playing this game you will realize that as a player you have a
% % % % lot of decisions to make. You will experience what it means to %
% %% create a coherent dataset, to decide what is in and what is not
% % % % in. If all goes well, you will feel the urge to change your data %
% % in order to obtain better results. This is part of the art of ap- %
%% % % % % % proximation that is at the basis of all machine learning prac-
% % % tices. % % % % % % % %
% % %
% % % % % --- % %
% % % % % % %
Concept & realisation: An Mertens %
% % % %
%% % %
0 % 0 0
00 0 0 0 % 0 0
0 _____ _ _ __ 0 _ 0 %
/__ \_ __ __ _(_) |_ _/_/ __| | ___
/ /\/ '__/ _` | | __/ _ \ / _` |/ _ \
% % 0 / / | | | (_| | | || __/ | (_| | __/
00 \/ |_| \__,_|_|\__\___| \__,_|\___|
% % _ 0 00 0 % 0
% __| | ___ ___ _ _ _ __ ___ ___ _ __
% / _` |/ _ \ / __| | | | '_ ` _ \ / _ \ '_ \ ____
% | (_| | (_) | (__| |_| | | | | | | __/ | | |/___/
\__,_|\___/ \___|\__,_|_| |_| |_|\___|_| |_|
% 0 _ 0 _ _ 0 0
| |_ __ _| |_(_) ___ _ __
| __/ _` | __| |/ _ \| '_ \ 0
% | |( |_| |_| | | (_) | | | |
\__\__,_|\__|_|\___/|_| |_| 0
0 0 % 0 0
%
Traité de Documentation. Three algorithmic poems.
by Rémi Forte, designer-researcher at L’Atelier national de
recherche typographique, Nancy, France
%
serigraphy on paper, 60 × 80 cm, 25 ex., 2019, for sale at the
% reception of the Mundaneum.
The poems, reproduced in the form of three posters, are an algo-
% rithmic and poetic re-reading of Paul Otlet's 'Traité de documen-
tation'. They are the result of an algorithm based on the mysteri-
ous rules of human intuition. It has been applied to a fragment
taken from Paul Otlet's book and is intended to be representative
% of his bibliological practice.
%
For each fragment, the algorithm splits the text; words and punc-
tuation marks are counted and reordered into a list. In each
% line, the elements combine and exhaust the syntax of the selected
fragment. Paul Otlet's language remains perceptible but exacer-
bated to the point of absurdity. For the reader, the systematiza-
% tion of the text is disconcerting and his reading habits are dis-
rupted.
% Built according to a mathematical equation, the typographical
% composition of the poster is just as systematic as the poem. How-
ever, friction occurs occasionally; loop after loop, the lines
% extend to bite on the neighbouring column. Overlays are created
and words are hidden by others. These telescopic handlers draw
alternative reading paths.
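
The counting step described above can be sketched roughly as follows.
This is only a guess at the general idea, not Rémi Forte's algorithm:
the function name count_and_reorder, the choice to sort by frequency
and the sample fragment are all assumptions made for this
illustration, and the fragment is a made-up placeholder, not a quote
from Otlet.

```python
import re
from collections import Counter


def count_and_reorder(fragment):
    """Split a fragment into words and punctuation marks, count them,
    and return them as a list ordered from most to least frequent."""
    # \w+ matches a word; [^\w\s] matches a single punctuation mark.
    tokens = re.findall(r"\w+|[^\w\s]", fragment.lower())
    counts = Counter(tokens)
    # most_common() sorts by count; ties keep their order of appearance.
    return counts.most_common()


# Placeholder fragment (assumption, not taken from the book).
fragment = "Le livre, le document, le signe."
for token, count in count_and_reorder(fragment):
    print(count, token)
```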
46
CONTEXTUAL STORIES
ABOUT LEARNERS
--- Naive Bayes & Viagra ---
Naive Bayes is a famous learner that performs well
with little data. We apply it all the time. Chris-
tian and Griffiths state in their book, 'Algorithms
To Live By', that 'our days are full of small
data'. Imagine, for example, that you're standing
at a bus stop in a foreign city. The other person
who is standing there has been waiting for 7 min-
utes. What do you do? Do you decide to wait? And
if so, for how long? When will you initiate other
options? Another example. Imagine a friend asking
for advice about a relationship. He's been together
with his new partner for a month. Should he invite
the partner to join him at a family wedding?

Having pre-existing beliefs is crucial for Naive
Bayes to work. The basic idea is that you calcu-
late the probabilities based on prior knowledge
and given a specific situation.

The theorem was formulated during the 1740s by
Thomas Bayes, a reverend and amateur mathemati-
cian. He dedicated his life to solving the ques-
tion of how to win the lottery. But Bayes' rule
was only made famous and known as it is today by
the mathematician Pierre-Simon Laplace in France a
bit later in the same century. For a long time af-
ter Laplace's death, the theory sank into obliv-
ion until it was dug up again during the Second
World War in an effort to break the Enigma code.

Most people today have come into contact with Naive
Bayes through their email spam folders. Naive
Bayes is a widely used algorithm for spam detec-
tion. It is a coincidence that Viagra, the erec-
tile dysfunction drug, was approved by the US Food
& Drug Administration in 1998, around the same
time as about 10 million users worldwide had made
free webmail accounts. The selling companies were
among the first to make use of email as a medium
for advertising: it was an intimate space, at the
time reserved for private communication, for an
intimate product. In 2001, the first SpamAssassin
programme relying on Naive Bayes was uploaded to
SourceForge, cutting down on guerilla email mar-
keting.

Reference
Machine Learners, by Adrian Mackenzie, MIT Press,
Cambridge, US, November 2017.


--- Naive Bayes & Enigma ---

This story about Naive Bayes is taken from the
book 'The Theory That Would Not Die', written by
Sharon Bertsch McGrayne. Among other things, she
describes how Naive Bayes was soon forgotten after
the death of Pierre-Simon Laplace, its inventor.
The mathematician was said to have failed to
credit the works of others. Therefore, he suffered
widely circulated charges against his reputation.
Only after 150 years was the accusation refuted.
Fast forward to 1939, when Bayes' rule was still
virtually taboo, dead and buried in the field of
statistics. When France was occupied in 1940 by
Germany, which controlled Europe's factories and
farms, Winston Churchill's biggest worry was the
U-boat peril. U-boat operations were tightly con-
trolled by German headquarters in France. Each
submarine received orders as coded radio messages
long after it was out in the Atlantic. The mes-
sages were encrypted by word-scrambling machines,
called Enigma machines. Enigma looked like a com-
plicated typewriter. It was invented by the German
firm Scherbius & Ritter after the First World War,
when the need for message-encoding machines had
become painfully obvious.

Interestingly, and luckily for Naive Bayes and
the world, at that time, the British government
and educational systems saw applied mathematics
and statistics as largely irrelevant to practical
problem-solving. So the British agency charged
with cracking German military codes mainly hired
men with linguistic skills. Statistical data was
seen as bothersome because of its detail-oriented
nature. So wartime data was often analysed not by
statisticians, but by biologists, physicists, and
theoretical mathematicians. None of them knew that
Bayes' rule was considered to be unscientific
in the field of statistics. Their ignorance proved
fortunate.

It was the now famous Alan Turing – a mathemati-
cian, computer scientist, logician, cryptanalyst,
philosopher and theoretical biologist – who used
Bayes' rule and its system of probabilities to
design the 'bombe'. This was a high-speed
electromechanical machine for testing every
possible arrangement that an Enigma machine would
produce. In order to crack the naval codes of the
U-boats, Turing simplified the 'bombe' system
using Bayesian methods. It turned the UK
headquarters into a code-breaking factory. The
story is well illustrated in The Imitation Game,
a film by Morten Tyldum dating from 2014.


--- A story about sweet peas ---

Throughout history, some models have been invented
by people with ideologies that are not to our lik-
ing. The idea of regression stems from Sir Francis
Galton, an influential nineteenth-century scientist.
He spent his life studying the problem of heredity
– understanding how strongly the characteristics
of one generation of living beings manifested them-
selves in the following generation. He established
the field of eugenics, defining it as 'the study
of agencies under social control that may improve
or impair the racial qualities of future genera-
tions, either physically or mentally'. On Wikipedia,
Galton is a prime example of scientific racism.
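
The spam filtering mentioned in 'Naive Bayes & Viagra' can be
sketched in a few lines. This is a minimal illustration, assuming a
multinomial Naive Bayes with add-one smoothing; it is not the code of
SpamAssassin. The six training messages, the labels and the function
names train and classify are invented for this example, and a model
trained on so few messages is not representative at all.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label. examples is a list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of the words seen with it
    label_counts = Counter()  # label -> number of messages
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest (log) probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Prior belief: how common is this label in the training data?
        score = math.log(label_counts[label] / total_messages)
        words_in_label = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            likelihood = (word_counts[label][word] + 1) / (
                words_in_label + len(vocabulary)
            )
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training set of six messages.
examples = [
    ("cheap pills buy now", "spam"),
    ("buy cheap pills online", "spam"),
    ("win money now", "spam"),
    ("see you at the wedding", "non-spam"),
    ("the bus is late again", "non-spam"),
    ("are you coming to dinner", "non-spam"),
]

model = train(examples)
print(classify("buy pills now", model))
print(classify("see you at dinner", model))
```

The word 'naive' refers to the assumption that every word contributes
to the score independently of the words around it.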
47
Galton initially approached the problem of hered-
ity by examining characteristics of the sweet pea
plant. He chose this plant because the species can
self-fertilize. Daughter plants inherit genetic
variations from mother plants without a contribu-
tion from a second parent. This characteristic
eliminates having to deal with multiple sources.

Galton's research was appreciated by many intel-
lectuals of his time. In 1869, in 'Hereditary Ge-
nius', Galton claimed that genius is mainly a mat-
ter of ancestry and he believed that there was a
biological explanation for social inequality
across races. Galton even influenced his half-
cousin Charles Darwin with his ideas. After read-
ing Galton's paper, Darwin stated, 'You have made
a convert of an opponent in one sense for I have
always maintained that, excepting fools, men did
not differ much in intellect, only in zeal and
hard work'. Luckily, the modern study of heredity
managed to eliminate the myth of race-based ge-
netic difference, something Galton tried hard to
maintain.

Galton's major contribution to the field was lin-
ear regression analysis, laying the groundwork for
much of modern statistics. While we engage with
the field of machine learning, Algolit tries not
to forget that ordering systems hold power, and
that this power has not always been used to the
benefit of everyone. Machine learning has inher-
ited many aspects of statistical research, some
less agreeable than others. We need to be atten-
tive, because these world views do seep into the
algorithmic models that create new orders.

References

http://galton.org/letters/darwin/correspondence.htm
https://www.tandfonline.com/doi/full/10.1080/10691898.2001.11910537
http://www.paramoulipist.be/?p=1693


--- Perceptron ---

We find ourselves in a moment in time in which
neural networks are sparking a lot of attention.
But they have been in the spotlight before. The
study of neural networks goes back to the 1940s,
when the first neuron metaphor emerged. The neuron
is not the only biological reference in the field
of machine learning - think of the word corpus or
training. The artificial neuron was constructed in
close connection to its biological counterpart.

Psychologist Frank Rosenblatt was inspired by fel-
low psychologist Donald Hebb's work on the role of
neurons in human learning. Hebb stated that 'cells
that fire together wire together'. His theory now
lies at the basis of associative human learning,
but also unsupervised neural network learning. It
moved Rosenblatt to expand on the idea of the ar-
tificial neuron.
In 1962, he created the Perceptron, a model that
learns through the weighting of inputs. It was
set aside by the next generation of researchers,
because it can only handle binary classification.

This means that the data has to be clearly
separable, for example into men and women, or
black and white. It is clear that this type of
data is very rare in the real world. When the
so-called first AI winter arrived in the 1970s
and the funding decreased, the Perceptron was
also neglected. For
ten years it stayed dormant. When spring settled
at the end of the 1980s, a new generation of re-
searchers picked it up again and used it to con-
struct neural networks. These contain multiple
layers of Perceptrons. That is how neural networks
saw the light. One could say that the current ma-
chine learning season is particularly warm, but it
takes another winter to know a summer.


--- BERT ---

Some online articles say that the year 2018 marked
a turning point for the field of Natural Language
Processing (NLP). A series of deep-learning models
achieved state-of-the-art results on tasks like
question-answering or sentiment-classification.
Google’s BERT algorithm entered the machine learn-
ing competitions of last year as a sort of 'one
model to rule them all'. It showed a superior per-
formance over a wide variety of tasks.

BERT is pre-trained; its weights are learned in
advance through two unsupervised tasks. This means
BERT doesn’t need to be trained from scratch for
each new task. You only have to fine-tune its
weights. This also means that a programmer wanting
to use BERT no longer knows what parameters BERT
is tuned to, nor what data it has seen to reach
its performance.

BERT stands for 'Bidirectional Encoder Representa-
tions from Transformers'. This means that BERT al-
lows for bidirectional training. The model learns
the context of a word based on all of its sur-
roundings, left and right of a word. As such, it
can differentiate between 'I accessed the bank ac-
count' and 'I accessed the bank of the river'.

Some facts:
- BERT_large, with 345 million parameters, is the
largest model of its kind. It is demonstrably su-
perior on small-scale tasks to BERT_base, which
uses the same architecture with 'only' 110 million
parameters.
- to run BERT you need to use TPUs. These are
Google's processors, specially engineered for
TensorFlow, the deep-learning platform. TPU
rental rates range from $8/hr to $394/hr. Algo-
lit doesn't want to work with off-the-shelf pack-
ages; we are interested in opening up the black
box. In that case, BERT asks for quite some sav-
ings in order to be used.
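
The idea of a model that 'learns through the weighting of inputs', as
told in the Perceptron story, can be sketched as follows. This is a
minimal illustration of the classic perceptron learning rule, not
Rosenblatt's original implementation. The training data (the logical
AND of two inputs), the function names and the default values for
epochs and learning_rate are assumptions chosen for this example.

```python
def predict(inputs, weights, bias):
    """Fire (1) if the weighted sum of the inputs is above zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=10, learning_rate=1.0):
    """samples is a list of (inputs, target) pairs with target 0 or 1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # The error is -1, 0 or 1; weights only move after a mistake.
            error = target - predict(inputs, weights, bias)
            weights = [
                w + learning_rate * error * x for w, x in zip(weights, inputs)
            ]
            bias += learning_rate * error
    return weights, bias


# Clearly separable toy data: output 1 only when both inputs are 1.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(samples)
for inputs, target in samples:
    print(inputs, "->", predict(inputs, weights, bias), "expected", target)
```

A single Perceptron draws one straight boundary between two classes,
which is why data that cannot be split that way is out of its reach.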
48
█▒░░ ▓▒█░░▒▓███▀▒░ ░▒ ▒ ░ ▒▒▓██▒ ░░ ▒░ ░ ▒ ░▓ ░▒▓ ▒ ▒█░░▒█▓▒░▓▒ ▒▓▒▒░ ▒ ▒▒ ▓▒░ ▒░▓ █ ▒░ █░█ ▓▒░ ▒▓░░
▓ ▒░ ▒▒ ░▒░▒▓ ░ ░ ▒ ░ ▒ ▒ ░ ░ ░▒░ ░▒ ░ ▒ ░▒░▓░ ▒ ░ ▒ ░▒░ ░ ░▒ ▒░░ ░ ▒ ░░▓ ░ ▓▓░░ ░░▒▒▓░
░ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░░ ░ ▒ ░ ░ ░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░
░
░ ░ ░ ░
░ ░ ░ ░
░ ░
░ ░
░ GLOSSARY ░
░
░ ░ ░
░ ░ ░ ░
░
░
░

This is a non-exhaustive
wordlist, based on terms
that are frequently used
in the exhibition. It
might help visitors who
are not familiar with
the vocabulary related
to the field of Natural
Language Processing
(NLP), Algolit or the
Mundaneum.

* ALGOLIT
A group from Brussels
involved in artistic re-
search on algorithms and
literature. Every month
they gather to experi-
ment with code and texts
that are published under
free licenses.
http://www.algolit.net

* ALGOLITERARY
Word invented by Algolit
for works that explore
the point of view of the
algorithmic storyteller.
What kind of new forms
of storytelling do we
make possible in dia-
logue with machinic
agencies?

* ALGORITHM
A set of instructions in
a specific programming
language that takes
an input and produces
an output.

* ANNOTATION
The annotation process
is a crucial step in su-
pervised machine learn-
ing where the algorithm
is given examples of
what it needs to learn.
A spam filter in train-
ing will be fed examples
of spam and real mes-
sages. These examples
are entries, or rows
from the dataset with a
label, spam or non-spam.
The labelling of a
dataset is work executed
by humans; they pick a
label for each row of
the dataset. To ensure
the quality of the la-
bels, multiple annotators
see the same row and
have to give the same
label before an example
is included in the
training data.

* AI OR ARTIFICIAL IN-
TELLIGENCES
In computer science, ar-
tificial intelligence
(AI), sometimes called
machine intelligence,
is intelligence demon-
strated by machines, in
contrast to the natural
intelligence displayed
by humans and other ani-
mals. Computer science
defines AI research as
the study of ‘intelli-
gent agents’: any device
that perceives its envi-
ronment and takes ac-
tions that maximize its
chance of successfully
achieving its goals.
More specifically, Ka-
plan and Haenlein define
AI as ‘a system’s abil-
ity to correctly inter-
pret external data, to
learn from such data,
and to use those learn-
ings to achieve specific
goals and tasks through
flexible adaptation’.
Colloquially, the term
‘artificial intelli-
gence’ is used to de-
scribe machines that
mimic ‘cognitive’ func-
tions that humans asso-
ciate with other human
minds, such as ‘learn-
ing’ and ‘problem solv-
ing’. (Wikipedia)

* BAG OF WORDS
The bag-of-words model
is a simplifying repre-
sentation of text used
in Natural Language Pro-
cessing (NLP). In this
model, a text is repre-
sented as a collection
of its unique words,
disregarding grammar,
punctuation and even
word order. The model
transforms the text into
a list of words and how
many times they're used
in the text, or quite
literally a bag of
words. Bag of words is
often used as a base-
line, on which the new
model has to perform
better.

* CHARACTER N-GRAM
A technique that is used
for authorship recogni-
tion. When using charac-
ter n-grams, texts are
considered as sequences
of characters. Let's
consider the character
trigram. All the over-
lapping sequences of
three characters are
isolated. For example,
the character 3-grams of
'Suicide' would be
'Sui', 'uic', 'ici',
'cid' etc. Patterns
found with character
n-grams focus on stylis-
tic choices that are un-
consciously made by the
author. The patterns re-
main stable over the
full length of the text.

* CLASSICAL MACHINE
LEARNING
Naive Bayes, Support
Vector Machines and
Linear Regression are
called classical machine
learning algorithms.
They perform well when
learning with small
datasets. But they often
require complex Readers.
The task the Readers do
is also called feature
engineering (see below).
This means that a human
needs to spend time on
a deep exploratory data
analysis of the dataset.

* CONSTANT
Constant is a non-prof-
it, artist-run organisa-
tion based in Brussels
since 1997 and active in
the fields of art, media
and technology. Algolit
started as a project of
Constant in 2012.
http://constantvzw.org

* DATA WORKERS
Artificial intelligences
that are developed to
serve, entertain, record
and know about humans.
The work of these ma-
chinic entities is usu-
ally hidden behind in-
terfaces and patents.
In the exhibition, algo-
rithmic storytellers
leave their invisible
underworld to become
interlocutors.

* DUMP
According to the English
dictionary, a dump is an
accumulation of refuse
and discarded materials
or the place where such
materials are dumped. In
computing a dump refers
to a ‘database dump’, a
record of data from a
database used for easy
downloading or for back-
ing up a database.
Database dumps are often
published by free soft-
ware and free content
projects, such as
Wikipedia, to allow re-
use or forking of the
database.

* FEATURE ENGINEERING
The process of using do-
main knowledge of the
data to create features
that make machine learn-
ing algorithms work.
This means that a human
needs to spend time on a
deep exploratory data
analysis of the dataset.
In Natural Language Pro-
cessing (NLP) features
can be the frequency of
words or letters, but
also syntactical ele-
ments like nouns, adjec-
49
█▒░░ ▓▒█░░▒▓███▀▒░ ░▒ ▒ ░ ▒▒▓██▒ ░░ ▒░ ░ ▒ ░▓ ░▒▓ ▒ ▒█░░▒█▓▒░▓▒ ▒▓▒▒░ ▒ ▒▒ ▓▒░ ▒░▓ █ ▒░ █░█ ▓▒░ ▒▓░░
▓ ▒░ ▒▒ ░▒░▒▓ ░ ░ ▒ ░ ▒ ▒ ░ ░ ░▒░ ░▒ ░ ▒ ░▒░▓░ ▒ ░ ▒ ░▒░ ░ ░▒ ▒░░ ░ ▒ ░░▓ ░ ▓▓░░ ░░▒▒▓░
░ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░░ ░ ▒ ░ ░ ░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
tives, or verbs. The
most significant fea-
tures for the task to be
solved must be care-
fully selected and
passed over to the clas-
sical machine learning
algorithm.

* FLOSS OR FREE LIBRE
OPEN SOURCE SOFTWARE
Software that anyone is
freely licensed to use,
copy, study, and change
in any way, and the
source code is openly
shared so that people
are encouraged to volun-
tarily improve the de-
sign of the software.
This is in contrast to
proprietary software,
where the software is
under restrictive copy-
right licensing and the
source code is usually
hidden from the users.
(Wikipedia)

* GIT
A software system for
tracking changes in
source code during soft-
ware development. It is
designed for coordinat-
ing work among program-
mers, but it can be used
to track changes in any
set of files. Before
starting a new project,
programmers create a
"git repository" in
which they will publish
all parts of the code.
The git repositories of
Algolit can be found on
https://gitlab.constantvzw.org/algolit.

* GUTENBERG.ORG
Project Gutenberg is an
online platform run by
volunteers to ‘encourage
the creation and distri-
bution of eBooks’. It
was founded in 1971 by
American writer Michael
S. Hart and is the old-
est digital library.
Most of the items in its
collection are the full
texts of public domain
books. The project tries
to make these as free as
possible, in long-last-
ing, open formats that
can be used on almost
any computer. As of
23 June 2018, Project
Gutenberg reached 57,000
items in its collection
of free eBooks.
(Wikipedia)

* HENRI LA FONTAINE
Henri La Fontaine
(1854-1943) was a Belgian
politician, feminist and
pacifist. He was awarded
the Nobel Peace Prize in
1913 for his involvement
in the International
Peace Bureau and his
contribution to the or-
ganization of the peace
movement. In 1895, to-
gether with Paul Otlet,
he created the Interna-
tional Bibliography In-
stitute, which became
the Mundaneum. Within
this institution, which
aimed to bring together
all the world's knowl-
edge, he contributed to
the development of the
Universal Decimal Clas-
sification (UDC) system.

* KAGGLE
An online platform where
users find and publish
data sets, explore and
build machine learning
models, work with other
data scientists and ma-
chine learning engi-
neers, and enter compe-
titions to solve data
science challenges.
About half a million
data scientists are ac-
tive on Kaggle. It was
founded by Anthony
Goldbloom and
Ben Hamner in 2010 and
acquired by Google in
March 2017.

* LITERATURE
Algolit understands the
notion of literature in
the way a lot of other
experimental authors do.
It includes all linguis-
tic production, from the
dictionary to the Bible,
from Virginia Woolf's
entire work to all ver-
sions of Terms of Ser-
vice published by Google
since its existence.

* MACHINE LEARNING
MODELS
Algorithms based on
statistics, mainly used
to analyse and predict
situations based on ex-
isting cases. In this
exhibition we focus on
machine learning models
for text processing or
'Natural Language Pro-
cessing', in short,
'NLP'. These models have
learned to perform a
specific task on the ba-
sis of existing texts.
The models are used for
search engines, machine
translations and sum-
maries, spotting trends
in new media networks
and news feeds. They in-
fluence what you get to
see as a user, but also
have their word to say
in the course of stock
exchanges worldwide, the
detection of cybercrime
and vandalism, etc.

* MARKOV CHAIN
Algorithm that scans the
text for the transition
probability of letter or
word occurrences, re-
sulting in transition
probability tables which
can be computed even
without any semantic or
grammatical natural lan-
guage understanding. It
can be used for analyz-
ing texts, but also for
recombining them. It is
widely used in spam
generation.

* MECHANICAL TURK
The Amazon Mechanical
Turk is an online plat-
form for humans to exe-
cute tasks that algo-
rithms cannot. Examples
include annotating sen-
tences as being positive
or negative, spotting
number plates, discrimi-
nating between face and
non-face. The jobs
posted on this platform
are often paid less than
a cent per task. Tasks
that are more complex or
require more knowledge
can be paid up to sev-
eral cents. Many aca-
demic researchers use
Mechanical Turk as an
alternative to having
their students execute
these tasks.

* MUNDANEUM
In the late nineteenth
century two young Bel-
gian jurists, Paul Otlet
(1868-1944), ‘the father
of documentation’, and
Henri La Fontaine
(1854-1943), statesman
and Nobel Peace Prize
winner, created The Mun-
daneum. The project
aimed to gather all
the world’s knowledge
and file it using the
Universal Decimal Clas-
sification (UDC) system
that they had invented.

* NATURAL LANGUAGE
A natural language
or ordinary language
is any language that
has evolved naturally
in humans through use
and repetition without
conscious planning or
premeditation. Natural
languages can take
different forms, such
as speech or signing.
They are different from
constructed and formal
languages such as those
used to program comput-
ers or to study logic.
(Wikipedia)

* NLP OR NATURAL LAN-
GUAGE PROCESSING
Natural language pro-
cessing (NLP) is a col-
lective term referring
to automatic computa-
tional processing of
human languages. This
includes algorithms that
take human-produced text
as input, and attempt
50
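The MARKOV CHAIN entry on this page describes transition probability tables computed from word occurrences. A minimal sketch of that idea in Python might look as follows. The function names `transition_table` and `recombine` and the sample sentence are assumptions made for illustration; they do not come from the Algolit code.

```python
import random
from collections import Counter, defaultdict


def transition_table(words):
    """Count which word follows which, then turn the counts into probabilities."""
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    table = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        table[current] = {w: n / total for w, n in followers.items()}
    return table


def recombine(table, start, length, seed=0):
    """Walk the table from `start`, drawing each next word by its probability."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:  # dead end: this word was never followed by another
            break
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights)[0])
    return out


# Hypothetical sample text, only for illustration.
sample = "data workers write and data workers read and data workers learn".split()
table = transition_table(sample)
print(table["workers"])
print(" ".join(recombine(table, "data", 8)))
```

The table is built by counting alone, without any grammatical or semantic knowledge of the text, and the same table serves both for analysis (reading the probabilities) and for recombination (walking them).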
█▒░░ ▓▒█░░▒▓███▀▒░ ░▒ ▒ ░ ▒▒▓██▒ ░░ ▒░ ░ ▒ ░▓ ░▒▓ ▒ ▒█░░▒█▓▒░▓▒ ▒▓▒▒░ ▒ ▒▒ ▓▒░ ▒░▓ █ ▒░ █░█ ▓▒░ ▒▓░░
▓ ▒░ ▒▒ ░▒░▒▓ ░ ░ ▒ ░ ▒ ▒ ░ ░ ░▒░ ░▒ ░ ▒ ░▒░▓░ ▒ ░ ▒ ░▒░ ░ ░▒ ▒░░ ░ ▒ ░░▓ ░ ▓▓░░ ░░▒▒▓░
░ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░░ ░ ▒ ░ ░ ░ ░ ░ ░░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
to generate text that resembles it.

* NEURAL NETWORKS
Computing systems inspired by the biological neural networks that
constitute animal brains. The neural network itself is not an
algorithm, but rather a framework for many different machine learning
algorithms to work together and process complex data inputs. Such
systems ‘learn’ to perform tasks by considering examples, generally
without being programmed with any task-specific rules. For example, in
image recognition, they might learn to identify images that contain
cats by analyzing example images that have been manually labeled as
‘cat’ or ‘no cat’ and using the results to identify cats in other
images. They do this without any prior knowledge about cats, for
example, that they have fur, tails, whiskers and cat-like faces.
Instead, they automatically generate identifying characteristics from
the learning material that they process. (Wikipedia)

* OPTICAL CHARACTER RECOGNITION (OCR)
Computer processes for translating images of scanned texts into
manipulable text files.

* ORACLE
Oracles are prediction or profiling machines, a specific type of
algorithmic model, mostly based on statistics. They are widely used in
smartphones, computers and tablets.

* OULIPO
Oulipo stands for Ouvroir de littérature potentielle (Workspace for
Potential Literature). Oulipo was created in Paris by the French
writers Raymond Queneau and François Le Lionnais. They rooted their
practice in the European avant-garde of the twentieth century and in
the experimental tradition of the 1960s. For Oulipo, the creation of
rules becomes the condition to generate new texts, or what they call
potential literature. Later, in 1981, they also created ALAMO, Atelier
de littérature assistée par la mathématique et les ordinateurs
(Workspace for literature assisted by maths and computers).

* PAUL OTLET
Paul Otlet (1868-1944) was a Belgian author, entrepreneur, visionary,
lawyer and peace activist; he is one of several people who have been
considered the father of information science, a field he called
'documentation'. Otlet created the Universal Decimal Classification,
which became widespread in libraries. Together with Henri La Fontaine
he created the Palais Mondial (World Palace), later the Mundaneum, to
house the collections and activities of their various organizations
and institutes.

* PYTHON
The main programming language used globally for natural language
processing. It was invented in 1991 by the Dutch programmer Guido van
Rossum.

* RULE-BASED MODELS
Oracles can be created using different techniques. One way is to
manually define rules for them. As prediction models they are then
called rule-based models, as opposed to statistical models. Rule-based
models are handy for tasks that are specific, like detecting when a
scientific paper concerns a certain molecule. With very little sample
data, they can perform well.

* SENTIMENT ANALYSIS
Also called 'opinion mining'. A basic task in sentiment analysis is
classifying a given text as positive, negative or neutral. Advanced,
'beyond polarity' sentiment classification looks, for instance, at
emotional states such as 'angry', 'sad' and 'happy'. Sentiment analysis
is widely applied to user materials such as reviews and survey
responses, comments and posts on social media, and healthcare materials
for applications that range from marketing to customer service, from
stock exchange transactions to clinical medicine.

* SUPERVISED MACHINE LEARNING MODELS
For the creation of supervised machine learning models, humans annotate
sample text with labels before feeding it to a machine to learn. Each
sentence, paragraph or text is judged by at least three annotators:
whether it is spam or not spam, positive or negative, etc.

* TRAINING DATA
Machine learning algorithms need guidance. In order to separate one
thing from another, they need texts to extract [...] should carefully
choose the training material, and adapt it to the machine's task. It
doesn't make sense to train a machine with nineteenth-century novels
if its mission is to analyze tweets.

* UNSUPERVISED MACHINE LEARNING MODELS
Unsupervised machine learning models don't need the step of annotation
of the data by humans. This saves a lot of time, energy and money.
Instead, they need a large amount of training data, which is not
always available and can take a long time to clean beforehand.

* WORD EMBEDDINGS
Language modelling techniques that, through multiple mathematical
operations of counting and ordering, plot words into a
multi-dimensional vector space. When words are embedded, they are
transformed from distinct symbols into mathematical objects that can
be multiplied, divided, added or subtracted.

* WORDNET
Wordnet is a combination of a dictionary and a thesaurus that can be
read by machines. According to Wikipedia it was created in the
Cognitive Science Laboratory of Princeton University starting in 1985.
The project was initially funded by the US Office of Naval Research
and later also by other US government agencies including DARPA, the
National Science Foundation, the Disruptive Technology Office
(formerly the Advanced Research and Development Activity), and REFLEX.
51
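The WORD EMBEDDINGS entry on this page speaks of counting and ordering operations that place words in a vector space where they can be added or subtracted. The sketch below is one very simple way to illustrate this, assuming plain co-occurrence counts as vectors. Real embedding techniques go further (weighting, dimensionality reduction, training), and the helper names and the toy sentences are hypothetical.

```python
import math


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words seen within `window` positions of it."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0] * len(vocab) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][index[s[j]]] += 1
    return vocab, vectors


def add(u, v):
    """Element-wise sum of two word vectors."""
    return [a + b for a, b in zip(u, v)]


def subtract(u, v):
    """Element-wise difference of two word vectors."""
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity: closer to 1.0 when two words share similar contexts."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, only for illustration.
sentences = [
    "humans learn from machines".split(),
    "machines learn from humans".split(),
    "humans read books".split(),
]
vocab, vectors = cooccurrence_vectors(sentences)
print(vocab)
print(cosine(vectors["humans"], vectors["machines"]))
print(cosine(vectors["humans"], vectors["books"]))
```

Once words are lists of numbers, arithmetic on them becomes possible: words that appear in similar surroundings end up with similar vectors, which is what the cosine comparison measures.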
◝ humans learn with machines ◜ ◡ machines learn from machines ◞ ◡ machines learn with humans ◞ ◝
humans learn from machines ◟ ◜ machines learn with machines ◠ ◜ machines learn from humans ◟ ◠
humans learn with humans ◞ ◝ humans learn from humans ◞ ◠ humans learn with machines ◟ ◡ mac
ines learn from machines ◡ ◡ machines learn with humans ◟ ◡ humans learn from machines ◝ ◟
achines learn with machines ◠ ◝ machines learn from humans ◜ ◝ humans learn with humans ◞ ◞
humans learn from humans ◡ ◞ humans learn with machines ◠ ◠ machines learn from machines ◠
machines learn with humans ◞ ◜ humans learn from machines ◜ ◠ machines learn with machines ◝
◜ machines learn from humans ◜ ◠ humans learn with humans ◝ ◟ humans learn from humans ◞
◜ humans learn with machines ◡ ◡ machines learn from machines ◡ ◟ machines learn with humans
◠ ◠ humans learn from machines ◡ ◜ machines learn with machines ◜ ◟ machines learn from
umans ◟ ◞ humans learn with humans ◞ ◟ humans learn from humans ◜ ◠ humans learn with ma
hines ◜ ◠ machines learn from machines ◝ ◠ machines learn with humans ◝ ◞ humans learn f
om machines ◝ ◡ machines learn with machines ◜ ◡ machines learn from humans ◜ ◠ humans l
arn with humans ◡ ◡ humans learn from humans ◝ ◞ humans learn with machines ◟ ◡ machines
learn from machines ◜ ◜ machines learn with humans ◠ ◞ humans learn from machines ◝ ◠ ma
hines learn with machines ◟ ◟ machines learn from humans ◝ ◠ humans learn with humans ◟
humans learn from humans ◝ ◜ humans learn with machines ◠ ◝ machines learn from machines ◞
◠ machines learn with humans ◝ ◟ humans learn from machines ◟ ◞ machines learn with machines
◜ ◞ machines learn from humans ◞ ◡ humans learn with humans ◠ ◞ humans learn from human
◠ ◜ humans learn with machines ◡ ◞ machines learn from machines ◜ ◠ machines learn w
th humans ◡ ◝ humans learn from machines ◝ ◟ machines learn with machines ◠ ◠ machine
learn from humans ◞ ◟ humans learn with humans ◠ ◞ humans learn from humans ◠ ◠ huma
s learn with machines ◡ ◡ machines learn from machines ◜ ◞ machines learn with humans ◡
◟ humans learn from machines ◜ ◜ machines learn with machines ◜ ◝ machines learn from human
◜ ◠ humans learn with humans ◝ ◡ humans learn from humans ◡ ◞ humans learn with mach
nes ◜ ◝ machines learn from machines ◝ ◜ machines learn with humans ◞ ◜ humans learn
rom machines ◞ ◝ machines learn with machines ◞ ◜ machines learn from humans ◡ ◞ huma
s learn with humans ◟ ◜ humans learn from humans ◞ ◡ humans learn with machines ◝ ◝ m
chines learn from machines ◜ ◟ machines learn with humans ◡ ◟ humans learn from machines ◠
◝ machines learn with machines ◜ ◡ machines learn from humans ◞ ◝ humans learn with huma
s ◝ ◠ humans learn from humans ◞ ◜ humans learn with machines ◠ ◝ machines learn from
machines ◟ ◡ machines learn with humans ◝ ◝ humans learn from machines ◞ ◞ machines l
arn with machines ◠ ◠ machines learn from humans ◠ ◡ humans learn with humans ◜ ◜ hum
ns learn from humans ◞ ◞ humans learn with machines ◡ ◝ machines learn from machines ◟
◝ machines learn with humans ◠ ◟ machines learn with humans ◠ ◜ machines learn from
machines ◡ ◜ humans learn with machines ◞ ◟ humans learn from humans ◜ ◡ humans learn
with humans ◝ ◞ machines learn from humans ◜ ◝ machines learn with machines ◜ ◠ human
learn from machines ◡ ◝ machines learn with humans ◝ ◜ machines learn from machines ◜
◞ humans learn with machines ◠ ◝ humans learn from humans ◠ ◝ humans learn with humans ◞
◡ machines learn from humans ◜ ◝ machines learn with machines ◠ ◟ humans learn from machi
es ◜ ◟ machines learn with humans ◝ ◝ machines learn from machines ◞ ◜ humans learn w
th machines ◝ ◡ humans learn from humans ◝ ◝ humans learn with humans ◠ ◠ machines le
rn from humans ◝ ◡ machines learn with machines ◡ ◡ humans learn from machines ◠ ◞ ma
hines learn with humans ◝ ◜ machines learn from machines ◜ ◝ humans learn with machines ◠
◞ humans learn from humans ◝ ◡ humans learn with humans ◞ ◡ machines learn from humans ◟
◟ machines learn with machines ◝ ◝ humans learn from machines ◜ ◟ machines learn with
umans ◡ ◝ machines learn from machines ◡ ◝ humans learn with machines ◞ ◜ humans lear
from humans ◜ ◝ humans learn with humans ◞ ◡ machines learn from humans ◝ ◡ machines
learn with machines ◞ ◟ humans learn from machines ◜ ◞ machines learn with humans ◟ ◡
machines learn from machines ◜ ◝ humans learn with machines ◠ ◠ humans learn from humans ◠
◝ humans learn with humans ◟ ◞ machines learn from humans ◝ ◠ machines learn with machines
◜ ◟ humans learn from machines ◠ ◝ machines learn with humans ◝ ◜ machines learn from ma
hines ◟ ◟ humans learn with machines ◞ ◡ humans learn from humans ◝ ◝ humans learn with
umans ◡ ◝ machines learn from humans ◝ ◡ machines learn with machines ◟ ◞ humans learn f
om machines ◝ ◟ machines learn with humans ◝ ◜ machines learn from machines ◝ ◠ humans l
arn with machines ◠ ◠ humans learn from humans ◟ ◜ humans learn with humans ◟ ◝ machines
learn from humans ◡ ◡ machines learn with machines ◜ ◜ humans learn from machines ◠ ◟ ma
hines learn with humans ◞ ◜ machines learn from machines ◠ ◜ humans learn with machines ◜
◞ humans learn from humans ◝ ◟ humans learn with humans ◟ ◞ machines learn from humans ◟
◝ machines learn with machines ◡ ◜ humans learn from machines ◠ ◠ machines learn with humans ◞
◡ machines learn from machines ◟ ◝ humans learn with machines ◜ ◞ humans learn from huma
s ◝ ◞ humans learn with humans ◜ ◟ machines learn from humans ◜ ◞ machines learn with ma
hines ◝ ◞ humans learn from machines ◝ ◜ machines learn with humans ◟ ◜ machines learn from
machines ◡ ◟ humans learn with machines ◞ ◠ humans learn from humans ◞ ◟ humans learn with
umans ◠ ◜ machines learn from humans ◡ ◠ machines learn with machines ◠ ◝ humans learn from
machines ◠ ◜ machines learn with humans ◞ ◠ machines learn from machines ◞ ◠ humans learn w
th machines ◜ ◟ humans learn from humans ◝ ◠ humans learn with humans ◝ ◟ machines learn from
humans ◜ ◜ machines learn with machines ◠ ◞ humans learn from machines ◠ ◡ machines learn with
machines ◡ ◟ humans learn with machines ◞ ◠ humans learn from humans ◞ ◟ humans learn with mach
ines ◝ ◞ humans learn from machines ◝ ◜ machines learn with humans ◟ ◜ machines learn from hum