Files for the publication & poster for Data Workers, an exhibition by Algolit.
http://www.algolit.net/index.php/Data_Workers
|
data workers write, perform, clean, inform, read and learn data workers write, perform, clean, inform, read
|
||
|
nd learn data workers write, perform, clean, inform, read and learn data workers write, perform, clean,
|
||
|
nform, read and learn data workers write, perform, clean, inform, read and learn data workers write,
|
||
|
perform, clean, inform, read and learn data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn data workers write, perform, clean, infor
|
||
|
, read and learn data workers write, perform, clean, inform, read and learn data workers w
|
||
|
ite, perform, clean, inform, read and learn data workers write, perform, clean, inform, read and l
|
||
|
arn data workers write, perform, clean, inform, read and learn data workers write, p
|
||
|
rform, clean, inform, read and learn data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn data workers write,
|
||
|
perform, clean, inform, read and learn data workers write, perform, clean, inform, read and
|
||
|
earn data workers write, perform, clean, inform, read and learn data wor
|
||
|
ers write, perform, clean, inform, read and learn data workers write, perform, clean, inf
|
||
|
rm, read and learn data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn data workers wri
|
||
|
e, perform, clean, inform, read and learn data workers write, perform, clean, inform,
|
||
|
read and learn data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn data wor
|
||
|
ers write, perform, clean, inform, read and learn data workers write, perform, cl
|
||
|
an, inform, read and learn data workers write, perform, clean, inform, read and
|
||
|
earn data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn dat
|
||
|
workers write, perform, clean, inform, read and learn data workers write, p
|
||
|
rform, clean, inform, read and learn data workers write, perform, clean, in
|
||
|
orm, read and learn data workers write, perform, clean, inform, read and l
|
||
|
arn data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn data work
|
||
|
rs write, perform, clean, inform, read and learn data workers write,
|
||
|
perform, clean, inform, read and learn data workers write, perform,
|
||
|
clean, inform, read and learn data workers write, perform, clean,
|
||
|
nform, read and learn data workers write, perform, clean, inform,
|
||
|
read and learn data workers write, perform, clean, inform, read
|
||
|
nd learn data workers write, perform, clean, inform, read and l
|
||
|
arn data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and l
|
||
|
arn data workers write, perform, clean, inform, read
|
||
|
nd learn data workers write, perform, clean, inform,
|
||
|
read and learn data workers write, perform, clean,
|
||
|
nform, read and learn data workers write, perform,
|
||
|
clean, inform, read and learn data workers write,
|
||
|
perform, clean, inform, read and learn data work
|
||
|
rs write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
data workers write, perform, clean, inform, read and learn
|
||
|
|
||
|
|
||
|
What
|
||
|
can
|
||
|
humans learn from humans
|
||
|
humans learn with machines
|
||
|
machines learn from machines
|
||
|
machines learn with humans
|
||
|
humans learn from machines
|
||
|
machines learn with machines
|
||
|
machines learn from humans
|
||
|
humans learn with humans
|
||
|
? ? ?
|
||
|
|
||
|
Data Workers, an exhibition at the Mundaneum in Mons from 28 March until 28 April 2019.
|
||
|
0 12 3 4 5 67 8 9 0
|
||
|
12 3 4 5 67 8 9 0 12
|
||
|
3 4 5 67 8 9 0 1 2
|
||
|
3 4 5 6 7 8 9 0 1 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 0 1 2 3 4
|
||
|
5 6 7 8 9 0 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 4 5 6
|
||
|
7 8 9 0 1 2 3 4 5 6
|
||
|
7 89 0 1 2 34 5 6
|
||
|
7 89 0 1 2 34 5 6
|
||
|
7 89 0 1 2 34 5 6 7
|
||
|
89 0 1 2 3 4 5 6 7 8 9
|
||
|
0 1 2 3 4 5 6 78 9
|
||
|
0 1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 2 3 4 5 6 7 8 9 0
|
||
|
1 2 3 4 5 67 8 9 0 12
|
||
|
3 4 5 67 8 9 0 12
|
||
|
3 4 5 67 8 9 0 12 3
|
||
|
4 5 6 7 8 9 0 1 2 3
|
||
|
4 5 6 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3 4
|
||
|
5 6 7 8 9 0 1 2 3 4 5 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 4 5 6 7
|
||
|
8 9 0 1 2 3 4 5 6 7
|
||
|
89 0 1 2 34 5 6 7
|
||
|
89 0 1 2 34 5 6 7 89
|
||
|
0 1 2 34 5 6 7 8 9
|
||
|
0 1 2 3 4 5 6 7 8 9
|
||
|
0 1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 7 8 9 0
|
||
|
1 2 3 4 5 6 7 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 6 7 8 9 0 1 2 3
|
||
|
4 5 6 7 8 9 01 2 3 4
|
||
|
56 7 8 9 01 2 3 4
|
||
|
56 7 8 9 01 2 3 4 5
|
||
|
6 7 8 9 0 1 2 3 4 5 6
|
||
|
7 8 9 0 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6 7
|
||
|
8 9 0 1 2 3 4 5 6 7
|
||
|
8 9 0 1 2 34 5 6 7 89
|
||
|
0 1 2 34 5 6 7 89 0
|
||
|
1 2 34 5 6 7 89 0
|
||
|
1 2 34 5 6 7 8 9 0
|
||
|
1 2 3 4 5 6 7 8 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0 1
|
||
|
2 3 4 5 6 7 8 9 0 1 2 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
2
|
||
|
ABOUT AT THE MUNDANEUM
|
||
|
|
||
|
Data Workers is an exhibition of algoliterary works, of stories In the late nineteenth century two young
|
||
|
told from an ‘algorithmic storyteller point of view’. The exhibi- Belgian jurists, Paul Otlet (1868–1944),
|
||
|
tion was created by members of Algolit, a group from Brussels in- the 'father of documentation’, and Henri
|
||
|
volved in artistic research on algorithms and literature. Every La Fontaine (1854-1943), statesman and
|
||
|
month they gather to experiment with F/LOSS code and texts. Some Nobel Peace Prize winner, created the
|
||
|
works are by students of Arts² and external participants to the Mundaneum. The project aimed to gather
|
||
|
workshop on machine learning and text organized by Algolit in Oc- all the world’s knowledge and to file it
|
||
|
tober 2018 at the Mundaneum. using the Universal Decimal Classifica-
|
||
|
tion (UDC) system that they had invent-
|
||
|
Companies create artificial intelligence (AI) systems to serve, ed. At first it was an International In-
|
||
|
entertain, record and learn about humans. The work of these ma- stitutions Bureau dedicated to interna-
|
||
|
chinic entities is usually hidden behind interfaces and patents. tional knowledge exchange. In the twen-
|
||
|
In the exhibition, algorithmic storytellers leave their invisible tieth century the Mundaneum became a
|
||
|
underworld to become interlocutors. The data workers operate in universal centre of documentation. Its
|
||
|
different collectives. Each collective represents a stage in the collections are made up of thousands of
|
||
|
design process of a machine learning model: there are the Writ- books, newspapers, journals, documents,
|
||
|
ers, the Cleaners, the Informants, the Readers, the Learners and posters, glass plates and postcards in-
|
||
|
the Oracles. The boundaries between these collectives are not dexed on millions of cross-referenced
|
||
|
fixed; they are porous and permeable. At times, Oracles are also cards. The collections were exhibited
|
||
|
Writers. At other times Readers are also Oracles. Robots voice and kept in various buildings in Brus-
|
||
|
experimental literature, while algorithmic models read data, turn sels, including the Palais du Cinquante-
|
||
|
words into numbers, make calculations that define patterns and naire. The remains of the archive only
|
||
|
are able to endlessly process new texts ever after. moved to Mons in 1998.
|
||
|
|
||
|
The exhibition foregrounds data workers who impact our daily Based on the Mundaneum, the two men de-
|
||
|
lives, but are either hard to grasp and imagine or removed from signed a World City for which Le Corbu-
|
||
|
the imagination altogether. It connects stories about algorithms sier made scale models and plans. The
|
||
|
in mainstream media to the storytelling that is found in techni- aim of the World City was to gather,
|
||
|
cal manuals and academic papers. Robots are invited to engage in at a global level, the institutions of
|
||
|
dialogue with human visitors and vice versa. In this way we might knowledge: libraries, museums and uni-
|
||
|
understand our respective reasonings, demystify each other's be- versities. This project was never rea-
|
||
|
haviour, encounter multiple personalities, and value our collec- lized. It suffered from its own utopia.
|
||
|
tive labour. It is also a tribute to the many machines that Paul The Mundaneum is the result of a visio-
|
||
|
Otlet and Henri La Fontaine imagined for their Mundaneum, showing nary dream of what an infrastructure for
|
||
|
their potential but also their limits. universal knowledge exchange could be.
|
||
|
It attained mythical dimensions at the
|
||
|
--- time. When looking at the concrete ar-
|
||
|
chive that was developed, that collec-
|
||
|
Data Workers was created by Algolit. tion is rather eclectic and specific.
|
||
|
|
||
|
Works by: Cristina Cochior, Gijs de Heij, Sarah Garcin, Artificial intelligence systems today
|
||
|
An Mertens, Javier Lloret, Louise Dekeuleneer, Florian Van de Weyer,  come with their own dreams of universal-
|
||
|
Laetitia Trozzi, Rémi Forte, Guillaume Slizewicz, ity and knowledge production. When read-
|
||
|
Michael Murtaugh, Manetta Berends, Mia Melvær. ing about these systems, the visionary
|
||
|
dreams of their makers were there from
|
||
|
Co-produced by: Arts², Constant and Mundaneum. the beginning of their development in
|
||
|
the 1950s. Nowadays, their promise has
|
||
|
With the support of: Wallonia-Brussels Federation/Digital Arts, also attained mythical dimensions. When
|
||
|
Passa Porta, UGent, DHuF - Digital Humanities Flanders and looking at their concrete applications,
|
||
|
Distributed Proofreaders Project. the collection of tools is truly innova-
|
||
|
tive and fascinating, but at the same
|
||
|
Thanks to: Mike Kestemont, Michel Cleempoel, Donatella Portoghese, time, rather eclectic and specific. For
|
||
|
François Zajéga, Raphaèle Cornille, Vincent Desfromont, Data Workers, Algolit combined some of
|
||
|
Kris Rutten, Anne-Laure Buisson, David Stampfli. the applications with 10 per cent of the
|
||
|
digitized publications of the Interna-
|
||
|
tional Institutions Bureau. In this way,
|
||
|
we hope to poetically open up a discus-
|
||
|
sion about machines, algorithms, and
|
||
|
technological infrastructures.
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
3
|
||
|
CONTEXTUAL STORIES
|
||
|
ABOUT ALGOLIT
|
||
|
|
||
|
|
||
|
|
||
|
--- Why contextual stories? --- spread by the media, often limited to superficial
|
||
|
reporting and myth-making. By creating algoliter-
|
||
|
During the monthly meetings of Algolit, we study ary works, we offer humans an introduction to
|
||
|
manuals and experiment with machine learning tools techniques that co-shape their daily lives.
|
||
|
for text processing. And we also share many, many
|
||
|
stories. With the publication of these stories we
|
||
|
hope to recreate some of that atmosphere. The sto- --- What is literature? ---
|
||
|
ries also exist as a podcast that can be down-
|
||
|
loaded from http://www.algolit.net. Algolit understands the notion of literature in
|
||
|
the way a lot of other experimental authors do: it
|
||
|
For outsiders, algorithms only become visible in includes all linguistic production, from the dic-
|
||
|
the media when they achieve an outstanding perfor- tionary to the Bible, from Virginia Woolf's entire
|
||
|
mance, like Alpha Go, or when they break down in work to all versions of the Terms of Service pub-
|
||
|
fantastically terrifying ways. Humans working in lished by Google since its existence. In this
|
||
|
the field though, create their own culture on and sense, programming code can also be literature.
|
||
|
offline. They share the best stories and experi-
|
||
|
ences during live meetings, research conferences The collective Oulipo is a great source of inspi-
|
||
|
and annual competitions like Kaggle. These stories ration for Algolit. Oulipo stands for Ouvroir de
|
||
|
that contextualize the tools and practices can be          littérature potentielle (Workspace for Potential
|
||
|
funny, sad, shocking, interesting. Literature). Oulipo was created in Paris by the
|
||
|
French writers Raymond Queneau and François Le
|
||
|
A lot of them are experiential learning cases. The Lionnais. They rooted their practice in the Euro-
|
||
|
implementations of algorithms in society generate pean avant-garde of the twentieth century and in
|
||
|
new conditions of labour, storage, exchange, be- the experimental tradition of the 1960s.
|
||
|
haviour, copy and paste. In that sense, the con-
|
||
|
textual stories capture a momentum in a larger an- For Oulipo, the creation of rules becomes the con-
|
||
|
thropo-machinic story that is being written at dition to generate new texts, or what they call
|
||
|
full speed and by many voices. potential literature. Later, in 1981, they also
|
||
|
created ALAMO, Atelier de littérature assistée par
|
||
|
la mathématique et les ordinateurs (Workspace for
|
||
|
--- We create 'algoliterary' works --- literature assisted by maths and computers).
|
||
|
|
||
|
The term 'algoliterary' comes from the name of our
|
||
|
research group Algolit. We have existed since 2012 --- An important difference ---
|
||
|
as a project of Constant, a Brussels-based organi-
|
||
|
zation for media and the arts. We are artists, While the European avant-garde of the twentieth
|
||
|
writers, designers and programmers. Once a month century pursued the objective of breaking with
|
||
|
we meet to study and experiment together. Our work conventions, members of Algolit seek to make con-
|
||
|
can be copied, studied, changed, and redistributed ventions visible.
|
||
|
under the same free license. You can find all the
|
||
|
information on: http://www.algolit.net. 'I write: I live in my paper, I invest it, I walk
|
||
|
through it.' (Espèces d'espaces. Journal d'un us-
|
||
|
The main goal of Algolit is to explore the view- ager de l'espace, Galilée, Paris, 1974)
|
||
|
point of the algorithmic storyteller. What new
|
||
|
forms of storytelling do we make possible in dia- This quote from Georges Perec in Espèces d'espaces
|
||
|
logue with these machinic agencies? Narrative could be taken up by Algolit. We're not talking
|
||
|
viewpoints are inherent to world views and ideolo- about the conventions of the blank page and the
|
||
|
gies. Don Quixote, for example, was written from literary market, as Georges Perec was. We're re-
|
||
|
an omniscient third-person point of view, showing ferring to the conventions that often remain hid-
|
||
|
Cervantes’ relation to oral traditions. Most con- den behind interfaces and patents. How are tech-
|
||
|
temporary novels use the first-person point of nologies made, implemented and used, as much in
|
||
|
view. Algolit is interested in speaking through academia as in business infrastructures?
|
||
|
algorithms, and in showing you the reasoning un-
|
||
|
derlying one of the most hidden groups on our We propose stories that reveal the complex hy-
|
||
|
planet. bridized system that makes machine learning possi-
|
||
|
ble. We talk about the tools, the logics and the
|
||
|
To write in or through code is to create new forms ideologies behind the interfaces. We also look at
|
||
|
of literature that are shaping human language in who produces the tools, who implements them, and
|
||
|
unexpected ways. But machine learning techniques           who creates and accesses the large amounts of data
|
||
|
are only accessible to those who can read, write needed to develop prediction machines. One could
|
||
|
and execute code. Fiction is a way of bridging the say, with the wink of an eye, that we are collabo-
|
||
|
gap between the stories that exist in scientific rators of this new tribe of human-robot hybrids.
|
||
|
papers and technical manuals, and the stories
|
||
|
|
||
|
4
|
||
|
writers write writers write writers write writers write writers write writers write writ
|
||
|
rs write writers write writers write writers write writers write
|
||
|
writers write writers write writers write writers write
|
||
|
writers write writers write writers write writers write
|
||
|
writers write writers write writers write
|
||
|
writers write writers write writers write
|
||
|
writers write writers write writers write
|
||
|
writers write writers write
|
||
|
writers write writers write writers write
|
||
|
writers write writers write
|
||
|
writers write writers write
|
||
|
writers write writers write
|
||
|
writers write writers write
|
||
|
writers write writers write
|
||
|
writers write writers write
|
||
|
writers write writers write
|
||
|
writers write writers write
|
||
|
writers write writ
|
||
|
rs write writers write
|
||
|
writers write writers write
|
||
|
writers write
|
||
|
writers write writers write
|
||
|
writers write writer
|
||
|
write writers write
|
||
|
writers write writ
|
||
|
rs write writers write
|
||
|
writers write
|
||
|
writers write writers write
|
||
|
writers write
|
||
|
writers write w
|
||
|
iters write writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write writer
|
||
|
write writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write writ
|
||
|
rs write writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
writers write
|
||
|
5
|
||
|
86ncrg k en3 a ioi-t i i l1 e i +-+-+-+-+-+-+-+ a +-+-+-+-+-+ l 9 t7ccpI46ed6t o w 7e a5o3 -
|
||
|
el, e 7 nh 71 e 5 4 3 4 |w|r|i|t|e|r|s| i |w|r|i|t|e| daml su h i e1 ww A l e59se a 5o wl
|
||
|
amlt t s w tlo n r 7a o9 +-+-+-+-+-+-+-+ ta +-+-+-+-+-+ hw t o4e e n,o32r , wd2 eo re 67n r
|
||
|
o1ife tt s 38 nt l 74 o 7 5i oda 65 ei r 9 7 n 5 n1r m l ot a51 e 3ma, 14swn 7 r r
|
||
|
b o i 3 se2 rceit ne a ki r 8 1iw3s n an t 8 8 r ra bn 1 eue r t4a r sT r phe o
|
||
|
e 6e6 7h5orir de6 1 +-+-+-+-+ +-+-+-+-+-+-+-+ t u +-+-+-+-+ 1 8 97o e c 4 d 8 h 7 z o a c4
|
||
|
w as 3r 17r p ai |d|a|t|a| |w|o|r|k|e|r|s| |w|o|r|k| 6 r6v56 4 2i7 e tu1 r9 w 5 8
|
||
|
52 1 wi r 4hn G +-+-+-+-+ +-+-+-+-+-+-+-+ n +-+-+-+-+ nr 4 21 n raa2 Pn9 h
|
||
|
a ca3 adw sara +-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+ 9 e9na y tt c 7 6 .cbieas
|
||
|
u e 5m b t3r 4 46 |m|a|n|y| |a|u|t|h|o|r|s| u |w|r|i|t|e| 4 4 yff , th t e
|
||
|
6 2 6vo nn s +-+-+-+-+ +-+-+-+-+-+-+-+ m +-+-+-+-+-+ i 4 1 W1 n r8 - 1 g7
|
||
|
4n +-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+ 8 1n e 6l v5c a
|
||
|
r 4 1 |e|v|e|r|y| |h|u|m|a|n| |b|e|i|n|g| n5 asr e 7l h 7 u , k o 2 r
|
||
|
e h r h +-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+ 65 3 1 t w er e3 5 1en e i
|
||
|
4 o c +-+-+-+ +-+-+-+ +-+-+-+-+-+-+ +-+-+ u 6d7 r tm , t l se t i 1
|
||
|
t fc |w|h|o| |h|a|s| |a|c|c|e|s|s| |t|o| e 69 t n 1 k 4 1
|
||
|
e n +-+-+-+ +-+-+-+ +-+-+-+-+-+-+ +-+-+ ie 62i 2 t tn 7 t on o e
|
||
|
1 l , +-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ a 9 , 9
|
||
|
9 w r |t|h|e| |i|n|t|e|r|n|e|t| |i|n|t|e|r|a|c|t|s| r i i tr h u f
|
||
|
m i m 5 +-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ 6 T c 5 w 6 i d T
|
||
|
7 5 l i os +-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+ s m
|
||
|
w s r6 n |w|e| t |c|h|a|t|,| |w|r|i|t|e|,| 6 rrf
|
||
|
e 2 6 , p oe +-+-+ o +-+-+-+-+-+ +-+-+-+-+-+-+ r
|
||
|
e s 4 e p y 9 i +-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+ r /
|
||
|
e s 6 e |c|l|i|c|k|,| |l|i|k|e| |a|n|d| tw r6 t ai
|
||
|
3 8 28 a n e 8 +-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+ r4 7
|
||
|
e n h t 5 n +-+-+-+-+-+ n
|
||
|
3 9 f c |s|h|a|r|e| p
|
||
|
l 5 9 +-+-+-+-+-+ d
|
||
|
7 1 +-+-+ +-+-+-+-+-+ +-+-+-+ +-+-+-+-+ t 5
|
||
|
r 2 2 e |w|e| |l|e|a|v|e| |o|u|r| |d|a|t|a| n3 i ,
|
||
|
d t 8 a 9 +-+-+ 1 +-+-+-+-+-+ +-+-+-+ +-+-+-+-+ t
|
||
|
7 +-+-+ +-+-+-+-+ +-+-+-+-+-+-+-+-+-+
|
||
|
7 t e |w|e| |f|i|n|d| |o|u|r|s|e|l|v|e|s| 6
|
||
|
y s 8 8 +-+-+ 7 +-+-+-+-+ +-+-+-+-+-+-+-+-+-+ n e
|
||
|
r 1 +-+-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+ e
|
||
|
a 2 t |w|r|i|t|i|n|g| |i|n| |P|y|t|h|o|n|
|
||
|
5 3 d +-+-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+ r
|
||
|
+-+-+-+-+ +-+-+-+-+-+-+ e
|
||
|
|s|o|m|e| |n|e|u|r|a|l| 4 a
|
||
|
k n +-+-+-+-+ +-+-+-+-+-+-+ z
|
||
|
or 3 w +-+-+-+-+-+-+-+-+ +-+-+-+-+-+
|
||
|
1 1 |n|e|t|w|o|r|k|s| c |w|r|i|t|e| 1 9
|
||
|
s n +-+-+-+-+-+-+-+-+ +-+-+-+-+-+ e a
|
||
|
g +-+-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+-+ t
|
||
|
|h|u|m|a|n| |e|d|i|t|o|r|s| |a|s|s|i|s|t| n , o
|
||
|
8 +-+-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+-+ a
|
||
|
+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+ 4
|
||
|
|p|o|e|t|s|,| |p|l|a|y|w|r|i|g|h|t|s| i7
|
||
|
t +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+ t c k y
|
||
|
v +-+-+ +-+-+-+-+-+-+-+-+-+ o +-+-+-+-+-+-+
|
||
|
|o|r| |n|o|v|e|l|i|s|t|s| |a|s|s|i|s|t| 4 2 9
|
||
|
r +-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+ 7 6
|
||
|
u r e
|
||
|
, R
|
||
|
6 6
|
||
|
t
|
||
|
s
|
||
|
3 g 6 4
|
||
|
|
||
|
c e t 2
|
||
|
3 h 8
|
||
|
D 4
|
||
|
a
|
||
|
n o -
|
||
|
w 5 e 3 n e 3
|
||
|
3
|
||
|
e
|
||
|
|
||
|
6
|
||
|
V V V % V % V % V V V % % %% % %% % %% % % % % % %
|
||
|
V V V V V V V V V V V V V V V V % % 0 %% 0 % %% % % % % %
|
||
|
V V V V V V % V V V % % % % % % 0 % 00 % % 0 %
|
||
|
% %% % 0 0 %% % % ___ _ %% % 0 %
|
||
|
% % % % / \__ _| |_ __ _
|
||
|
WRITERS % % % / /\ / _` | __/ _` | 0 0 % %
|
||
|
% % % % / /_// (_| | || (_| | % % % %
|
||
|
% 0 0 00 /___,' \__,_|\__\__,_| 0
|
||
|
V V V V % V V V % V 0 __ __ _
|
||
|
V V V V V V V V V V V V V V V V 0 0 / / /\ \ \___ _ __| | _____ _ __ ___ 0 0 %
|
||
|
V V V V % V V V V V \ \/ \/ / _ \| '__| |/ / _ \ '__/ __|
|
||
|
V V V V V V V V 0 0 0 \ /\ / (_) | | | < __/ | \__ \ 0
|
||
|
V V V V V V V V V V V V V V V V \/ \/ \___/|_| |_|\_\___|_| |___/ % %
|
||
|
V V V % V V V V V V 0 ___ _ _ _ 0 0 0 _ _ 0 %
|
||
|
% / _ \_ _| |__ | (_) ___ __ _| |_(_) ___ _ __ %
|
||
|
Data workers need data to work 0 / /_)/ | | | '_ \| | |/ __/ _` | __| |/ _ \| '_ \
|
||
|
with. The data that used in the % / ___/| |_| | |_) | | | (_| (_| | |_| | (_) | | | |
|
||
|
context of Algolit is written lan- 0 \/ \__,_|_.__/|_|_|\___\__,_|\__|_|\___/|_| |_|
|
||
|
guage. Machine learning relies on 0 0 % 0 % %
|
||
|
many types of writing. Many authors
|
||
|
write in the form of publications, By Algolit
|
||
|
such as books or articles. These % %
|
||
|
are part of organized archives and All works visible in the exhibition, as well as the contextual
|
||
|
are sometimes digitized. But there stories and some extra text material have been collected in
|
||
|
are other kinds of writing too. We this publication, which exists in French and English.
|
||
|
could say that every human being
|
||
|
who has access to the Internet is a This publication is made using a plain text workflow, based on
|
||
|
writer each time they interact with various text processing and counting tools. The plain text file
|
||
|
algorithms. We chat, write, click, format is a type of document in which there is no inherent struc-
|
||
|
like and share. In return for free tural difference between headers and paragraphs anymore. It is
|
||
|
services, we leave our data that is the most used type of document in machine learning models for
|
||
|
compiled into profiles and sold for text. This format has been the starting point of a playful design
|
||
|
advertising and research purposes. process, where pages are carefully counted, page by page, line by
|
||
|
line and character by character. %
|
||
|
Machine learning algorithms are not %
|
||
|
critics: they take whatever they're Each page holds 110 characters per line and 70 lines per page.
|
||
|
given, no matter the writing style, The design originates from the act of counting words, spaces and
|
||
|
no matter the CV of the author, no lines. It plays with random choices, scripted patterns and
|
||
|
matter the spelling mistakes. In ASCII/UNICODE-fonts, to speculate about the materiality of digi-
|
||
|
fact, mistakes make it better: the tal text and to explore the interrelations between counting and
|
||
|
more variety, the better they learn writing through words and numbers.
|
||
|
to anticipate unexpected text. But
|
||
|
often, human authors are not aware --- %
|
||
|
of what happens to their work.
|
||
|
Texts: Cristina Cochior, Sarah Garcin, Gijs de Heij, An Mertens,
|
||
|
Most of the writing we use is in François Zajéga, Louise Dekeuleneer, Florian Van de Weyer,
|
||
|
English, some in French, some in Laetitia Trozzi, Rémi Forte, Guillaume Slizewicz.
|
||
|
Dutch. Most often we find ourselves
|
||
|
writing in Python, the programming Translations & proofreading: deepl.com, Michel Cleempoel,
|
||
|
language we use. Algorithms can be % Elodie Mugrefya, Emma Kraak, Patrick Lennon.
|
||
|
writers too. Some neural networks
|
||
|
write their own rules and generate Lay-out & cover: Manetta Berends
|
||
|
their own texts. And for the models https://git.vvvvvvaria.org/mb/data-workers-publication
|
||
|
that are still wrestling with the
|
||
|
ambiguities of natural language, Font: GNU Unifont, OGRE
|
||
|
there are human editors to assist Printer: PrinterPro, Rotterdam
|
||
|
them. Poets, playwrights or novel- Paper: Glossy MC 90gr
|
||
|
ists start their new careers as as-
|
||
|
sistants of AI. Responsible publisher: Constant vzw/asbl
|
||
|
Rue du Fortstraat 5, 1060 Brussels
|
||
|
|
||
|
License: Algolit, Data Workers, March 2019, Brussels.
|
||
|
Copyleft: This is a free work, you can copy, distribute,
|
||
|
and modify it under the terms of the Free Art License.
|
||
|
http://artlibre.org/licence/lal/en/
|
||
|
|
||
|
Online version: http://www.algolit.net/index.php/Data_Workers
|
||
|
Sources: https://gitlab.constantvzw.org/algolit/mundaneum
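
The counting behind this layout can be sketched in a few lines of
Python. The sketch below only illustrates the kind of counting tools
mentioned above; it is not one of the actual layout scripts (those
live in the repository linked above), and the file name is a
placeholder.

    # A minimal counting sketch, not one of the actual layout scripts.
    # It checks a plain text file against the grid of 110 characters per
    # line and 70 lines per page described above.
    # 'data_workers.txt' is a placeholder file name.

    LINE_WIDTH = 110
    PAGE_HEIGHT = 70

    with open('data_workers.txt', encoding='utf-8') as source:
        lines = source.read().split('\n')

    pages = [lines[i:i + PAGE_HEIGHT] for i in range(0, len(lines), PAGE_HEIGHT)]
    print(len(lines), 'lines,', len(pages), 'pages')

    for number, line in enumerate(lines, start=1):
        if len(line) > LINE_WIDTH:
            print('line', number, 'has', len(line), 'characters')
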
|
||
|
|
||
|
7
|
||
|
% % % % % %%% % % 0 % 00 % % 0 %%
|
||
|
% % 0 ___ _ 0 0
|
||
|
% % % % % / \__ _| |_ __ _ 0 % %
|
||
|
%%% % %% % % % % % % / /\ / _` | __/ _` | % % 0 %
|
||
|
% % % % % % / /_// (_| | || (_| | % % % % %
|
||
|
% %%% % % 00 /___,' \__,_|\__\__,_| % 0 % % % % %
|
||
|
% __ % __ 0 % _ 0 % % % %
|
||
|
% % 0 / / /\ \ \___ _ __| | _____ _ __ ___ % %
|
||
|
% % % % % % \ \/ \/ / _ \| '__| |/ / _ \ '__/ __|
|
||
|
% 0 \ /\ / (_) | | | < __/ | \__ \ 0 %
|
||
|
% 0 \/ \/ \___/|_| |_|\_\___|_| |___/
|
||
|
% % 0 % ___ _ _ %
|
||
|
% % 0 / _ \___ __| | ___ __ _ ___| |_ 0
|
||
|
% 0 0 / /_)/ _ \ / _` |/ __/ _` / __| __|
|
||
|
% % 0 0 / ___/ (_) | (_| | (_| (_| \__ \ |_
|
||
|
% 0 \/ \___/ \__,_|\___\__,_|___/\__| %
|
||
|
0 0 0 0 0 0 %
|
||
|
%
|
||
|
% By Algolit %
|
||
|
% % %
|
||
|
% During our monthly Algolit meetings, we study manuals and experi-
|
||
|
ment with machine learning tools for text processing. And we also
|
||
|
share many, many stories. With this podcast we hope to recreate
|
||
|
some of that atmosphere.
|
||
|
% %
|
||
|
For outsiders, algorithms only become visible in the media when
|
||
|
they achieve an outstanding performance, like Alpha Go, or when
|
||
|
they break down in fantastically terrifying ways. Humans working
|
||
|
in the field though, create their own culture on and offline.
|
||
|
They share the best stories and experiences during live meetings,
|
||
|
research conferences and annual competitions like Kaggle. These
|
||
|
% stories that contextualize the tools and practises can be funny,
|
||
|
sad, shocking, interesting.
|
||
|
|
||
|
A lot of them are experiential learning cases. The implementa-
|
||
|
% % tions of algorithms in society generate new conditions of labour,
|
||
|
storage, exchange, behaviour, copy and paste. In that sense, the
|
||
|
contextual stories capture a momentum in a larger anthropo-ma-
|
||
|
chinic story that is being written at full speed and by many
|
||
|
voices. The stories are also published in this publication.
|
||
|
|
||
|
|
||
|
--- %
|
||
|
% %
|
||
|
% Voices: David Stampfli, Cristina Cochior, An Mertens,
|
||
|
Gijs de Heij, Karin Ulmer, Guillaume Slizewicz
|
||
|
|
||
|
Editing: Javier Lloret
|
||
|
%
|
||
|
Recording: David Stampfli
|
||
|
|
||
|
Texts: Cristina Cochior, An Mertens
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
8
|
||
|
%% % % % 00 00 0 % %
|
||
|
% % % % % % 0 0 % %
|
||
|
% % %% % 0 0 _ _ _ %%
|
||
|
%%% %% % % % % % % %% /\/\ __ _ _ __| | _| |__ ___ | |_
|
||
|
% % %% / \ / _` | '__| |/ / '_ \ / _ \| __|
|
||
|
% % % % % % / /\/\ \ (_| | | | 0 <| |_) | (_) | |_ %
|
||
|
% % %% \/ \/\__,_|_| |_|\_\_.__/ \___/ \__|
|
||
|
% % % ___ _ 0 0 _ 00 %%%
|
||
|
/ __\ |__ __ _(_)_ __ ___ 0
|
||
|
% %% 0 / / | '_ \ / _` | | '_ \/ __| %
|
||
|
0 / /___| | | | (_| | | | | \__ \
|
||
|
% % 0 \____/|_| |_|\__,_|_|_| |_|___/ 0 0
|
||
|
%% 0 0 0
|
||
|
%% %
|
||
|
By Florian Van de Weyer, student Arts²/Section Digital Arts
|
||
|
|
||
|
Markbot Chain is a social experiment in which the public has a
|
||
|
% direct influence on the result. The intention is to integrate re-
|
||
|
sponses in a text-generation process without applying any filter.
|
||
|
%
|
||
|
All the questions in the digital files provided by the Mundaneum %%
|
||
|
were automatically extracted. These questions are randomly put to
|
||
|
the public via a terminal. By answering them, people contribute
|
||
|
to another database. Each entry generates a series of sentences
|
||
|
using a Markov chain configuration, an algorithm that is widely %
|
||
|
used in spam generation. The sentences generated in this way are
|
||
|
% displayed in the window, and a new question is asked.
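
A first-order Markov chain of this kind can be written in a few lines
of Python. The sketch below illustrates the technique; it is not the
code of Markbot Chain itself, and 'answers.txt' is a placeholder for
the database of collected answers.

    # A minimal Markov chain text generator, illustrating the technique;
    # not the code of Markbot Chain. 'answers.txt' is a placeholder for
    # the database of answers collected at the terminal.
    import random
    from collections import defaultdict

    def build_chain(text):
        # For every word, remember which words have followed it.
        words = text.split()
        chain = defaultdict(list)
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
        return chain

    def generate(chain, length=20):
        # Start from a random word and repeatedly pick one of its
        # recorded successors at random.
        word = random.choice(list(chain.keys()))
        sentence = [word]
        for _ in range(length - 1):
            followers = chain.get(word)
            if not followers:
                break
            word = random.choice(followers)
            sentence.append(word)
        return ' '.join(sentence)

    answers = open('answers.txt', encoding='utf-8').read()
    print(generate(build_chain(answers)))
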
|
||
|
% % %
|
||
|
% % %
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
9
|
||
|
CONTEXTUAL STORIES
|
||
|
ABOUT WRITERS
|
||
|
|
||
|
|
||
|
|
||
|
--- Programmers are writing the only way to maintain trust is through consis-
|
||
|
the dataworkers into being --- tency. So when Cortana talks, you 'must use her
|
||
|
personality'.
|
||
|
We recently had a funny realization: most program-
|
||
|
mers of the languages and packages that Algolit What is Cortana's personality, you ask?
|
||
|
uses are European.
|
||
|
|
||
|
Python, for example, the main language that is 'Cortana is considerate,
|
||
|
globally used for Natural Language Processing sensitive, and supportive.
|
||
|
(NLP), was invented in 1991 by the Dutch program-
|
||
|
mer Guido van Rossum. He then crossed the Atlantic         She is sympathetic but turns quickly to solutions.
|
||
|
and went from working for Google to working for
|
||
|
Dropbox. She doesn't comment on the user’s personal
|
||
|
information or behavior, particularly if
|
||
|
Scikit Learn, the open-source Swiss knife of ma- the information is sensitive.
|
||
|
chine learning tools, started as a Google Summer
|
||
|
of Code project in Paris by French researcher She doesn't make assumptions about what
|
||
|
David Cournapeau. Afterwards, it was taken on by the user wants, especially to upsell.
|
||
|
Matthieu Brucher as part of his thesis at the Sor-
|
||
|
bonne University in Paris. And in 2010, INRIA, the            She works for the user. She does not repre-
|
||
|
French National Institute for computer science and sent any company, service, or product.
|
||
|
applied mathematics, adopted it.
|
||
|
She doesn’t take credit or
|
||
|
Keras, an open-source neural network library writ- blame for things she didn’t do.
|
||
|
ten in Python, was developed by François Chollet,
|
||
|
a French researcher who works on the Brain team She tells the truth about her
|
||
|
at Google. capabilities and her limitations.
|
||
|
|
||
|
Gensim, an open-source library for Python used to She doesn’t assume your physical capabilities, gen-
|
||
|
create unsupervised semantic models from plain der, age, or any other defining characteristic.
|
||
|
text, was written by Radim Řehůřek. He is a Czech
|
||
|
computer scientist who runs a consulting business She doesn't assume she knows
|
||
|
in Bristol, UK. how the user feels about something.
|
||
|
|
||
|
And to finish up this small series, we also looked She is friendly but professional.
|
||
|
at Pattern, an often-used library for web-mining
|
||
|
and machine learning. Pattern was developed and She stays away from emojis in tasks. Period.
|
||
|
made open-source in 2012 by Tom De Smedt and Wal-
|
||
|
ter Daelemans. Both are researchers at CLIPS, the She doesn’t use culturally- or
|
||
|
research centre for Computational Linguistics and professionally-specific slang.
|
||
|
Psycholinguistics at the University of Antwerp.
|
||
|
She is not a support bot.'
|
||
|
|
||
|
--- Cortana speaks ---
|
||
|
Humans intervene in detailed ways to programme
|
||
|
AI assistants often need their own assistants: answers to questions that Cortana receives. How
|
||
|
they are helped in their writing by humans who in- should Cortana respond when she is being proposed
|
||
|
ject humour and wit into their machine-processed inappropriate actions? Her gendered acting raises
|
||
|
language. Cortana is an example of this type of difficult questions about power relations within
|
||
|
blended writing. She is Microsoft’s digital assis- the world away from the keyboard, which is being
|
||
|
tant. Her mission is to help users to be more pro- mimicked by technology.
|
||
|
ductive and creative. Cortana's personality has
|
||
|
been crafted over the years. It's important that Consider Cortana's answer to the question:
|
||
|
she maintains her character in all interactions - Cortana, who's your daddy?
|
||
|
with users. She is designed to engender trust and - Technically speaking, he’s Bill Gates.
|
||
|
her behavior must always reflect that. No big deal.
|
||
|
|
||
|
The following guidelines are taken from Mi-
|
||
|
crosoft's website. They describe how Cortana's --- Open-source learning ---
|
||
|
style should be respected by companies that extend
|
||
|
her service. Writers, programmers and novelists, Copyright licenses close up a lot of the machinic
|
||
|
who develop Cortana's responses, personality and writing, reading and learning practices. That
|
||
|
branding have to follow these guidelines. Because means that they're only available for the employ-
|
||
|
|
||
|
10
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
ees of a specific company. Some companies partici-         very definition, resist categorization.
|
||
|
pate in conferences worldwide and share their
|
||
|
knowledge in papers online. But even if they share References
|
||
|
their code, they often will not share the large Paper: https://hiphilangsci.net/2013/05/01/on-the-
|
||
|
amounts of data needed to train the models. history-of-the-question-of-whether-language
|
||
|
-is-illogical/
|
||
|
We were able to learn to machine learn, read and
|
||
|
write in the context of Algolit, thanks to aca- Book: Neural Network Methods for Natural Language
|
||
|
demic researchers who share their findings in pa- Processing, Yoav Goldberg, Bar Ilan University,
|
||
|
pers or publish their code online. As artists, we April 2017.
|
||
|
believe it is important to share that attitude.
|
||
|
That's why we document our meetings. We share the
|
||
|
tools we make as much as possible and the texts we
|
||
|
use are on our online repository under free li-
|
||
|
censes.
|
||
|
|
||
|
We are thrilled when our works are taken up by
|
||
|
others, tweaked, customized and redistributed, so
|
||
|
please feel free to copy and test the code from
|
||
|
our website. If the sources of a particular
|
||
|
project are not there, you can always contact us
|
||
|
through the mailing list. You can find a link to
|
||
|
our repository, etherpads and wiki at:
|
||
|
http://www.algolit.net.
|
||
|
|
||
|
|
||
|
--- Natural language for
|
||
|
artificial intelligence ---
|
||
|
|
||
|
Natural Language Processing (NLP) is a collective
|
||
|
term that refers to the automatic computational
|
||
|
processing of human languages. This includes algo-
|
||
|
rithms that take human-produced text as input, and
|
||
|
attempt to generate text that resembles it. We
|
||
|
produce more and more written work each year, and
|
||
|
there is a growing trend in making computer inter-
|
||
|
faces to communicate with us in our own language.
|
||
|
NLP is also very challenging, because human lan-
|
||
|
guage is inherently ambiguous and ever-changing.
|
||
|
|
||
|
But what is meant by 'natural' in NLP? Some would
|
||
|
argue that language is a technology in itself. Ac-
|
||
|
cording to Wikipedia, 'a natural language or ordi-
|
||
|
nary language is any language that has evolved
|
||
|
naturally in humans through use and repetition
|
||
|
without conscious planning or premeditation.
|
||
|
Natural languages can take different forms, such
|
||
|
as speech or signing. They are different from con-
|
||
|
structed and formal languages such as those used
|
||
|
to program computers or to study logic. An offi-
|
||
|
cial language with a regulating academy, such as
|
||
|
Standard French with the French Academy, is clas-
|
||
|
sified as a natural language. Its prescriptive
|
||
|
points do not make it constructed enough to be
|
||
|
classified as a constructed language or controlled
|
||
|
enough to be classified as a controlled natural
|
||
|
language.'
|
||
|
|
||
|
So in fact, 'natural languages' also includes lan-
|
||
|
guages which do not fit in any other group. NLP,
|
||
|
instead, is a constructed practice. What we are
|
||
|
looking at is the creation of a constructed lan-
|
||
|
guage to classify natural languages that, by their
|
||
|
|
||
|
11
|
||
|
0 12 3 4 5 67 8 9 0
|
||
|
12 3 4 5 67 8 9 0 12
|
||
|
3 4 5 67 8 9 0 1 2
|
||
|
3 4 5 6 7 8 9 0 1 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 0 1 2 3 4
|
||
|
5 6 7 8 9 0 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 4 5 6
|
||
|
7 8 9 0 1 2 3 4 5 6
|
||
|
7 89 0 1 2 34 5 6
|
||
|
7 89 0 1 2 34 5 6
|
||
|
7 89 0 1 2 34 5 6 7
|
||
|
89 0 1 2 3 4 5 6 7 8 9
|
||
|
0 1 2 3 4 5 6 78 9
|
||
|
0 1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 2 3 4 5 6 7 8 9 0
|
||
|
1 2 3 4 5 67 8 9 0 12
|
||
|
3 4 5 67 8 9 0 12
|
||
|
3 4 5 67 8 9 0 12 3
|
||
|
4 5 6 7 8 9 0 1 2 3
|
||
|
4 5 6 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3 4
|
||
|
5 6 7 8 9 0 1 2 3 4 5 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 4 5 6 7
|
||
|
8 9 0 1 2 3 4 5 6 7
|
||
|
89 0 1 2 34 5 6 7
|
||
|
89 0 1 2 34 5 6 7 89
|
||
|
0 1 2 34 5 6 7 8 9
|
||
|
0 1 2 3 4 5 6 7 8 9
|
||
|
0 1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 7 8 9 0
|
||
|
1 2 3 4 5 6 7 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 6 7 8 9 0 1 2 3
|
||
|
4 5 6 7 8 9 01 2 3 4
|
||
|
56 7 8 9 01 2 3 4
|
||
|
56 7 8 9 01 2 3 4 5
|
||
|
6 7 8 9 0 1 2 3 4 5 6
|
||
|
7 8 9 0 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6 7
|
||
|
8 9 0 1 2 3 4 5 6 7
|
||
|
8 9 0 1 2 34 5 6 7 89
|
||
|
0 1 2 34 5 6 7 89 0
|
||
|
1 2 34 5 6 7 89 0
|
||
|
1 2 34 5 6 7 8 9 0
|
||
|
1 2 3 4 5 6 7 8 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0 1
|
||
|
2 3 4 5 6 7 8 9 0 1 2 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
12
|
||
|
oracles predict oracles predict oracles predict oracles predict oracles predict oracles predic
|
||
|
oracles predict oracles predict oracles predict oracles predict orac
|
||
|
es predict oracles predict oracles predict oracles predict
|
||
|
racles predict oracles predict oracles predict oracles predic
|
||
|
oracles predict oracles predict oracles predict
|
||
|
oracles predict oracles predict oracles predict
|
||
|
oracles predict oracles predict or
|
||
|
cles predict oracles predict oracles predict
|
||
|
oracles predict oracles predict
|
||
|
oracles predict oracles predict oracles pr
|
||
|
dict oracles predict oracles predict
|
||
|
oracles predict oracles predict
|
||
|
oracles predict oracles predict
|
||
|
oracles predict oracles predict
|
||
|
oracles predict oracles predict
|
||
|
oracles predict orac
|
||
|
es predict oracles predict
|
||
|
oracles predict oracles predict
|
||
|
oracles predict oracles predic
|
||
|
oracles predict
|
||
|
oracles predict oracles predict
|
||
|
oracles predict
|
||
|
oracles predict oracles predict
|
||
|
oracles predict
|
||
|
racles predict oracles predict
|
||
|
oracles predict
|
||
|
oracles predict oracles predict
|
||
|
oracles predict
|
||
|
oracles predict orac
|
||
|
es predict oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
racles predict oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
racles predict oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict or
|
||
|
cles predict oracles predic
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
oracles predict
|
||
|
13
|
||
|
r e32t 8smc 9i ab14 e s4 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+ , e| 8 1 e D ry a4a e ta 9 e
|
||
|
t s5 e ² 348 th8no 2 4at t |o|r|a|c|l|e|s| ar3i |p|r|e|d|i|c|t| 63 s 1 tc39,l3h, d14 5au on w
|
||
|
4 SI, 1 56 e|p 4 iu g7 e +-+-+-+-+-+-+-+ 39k +-+-+-+-+-+-+-+ 9 l o a d r 7 P _ e,a +
|
||
|
n w 2a p/+ 9f8 1of 5\i 4h h e2n 3 t on1 9t \ 94 ne2 + uu e n 63m 5 e a3 2n e,
|
||
|
sn 39ew nt1i -5d 632sd e 15t |a3% 3 c wt9 c n9sg6et 8 8 c , n 1poo F
|
||
|
1 3 o 1g18e +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ 7 +-+-+-+-+-+-+-+-+ +-+-+-+ 4 n t2+a- 8 43 8 3p4
|
||
|
n o tpn86i |m|a|c|h|i|n|e| |l|e|a|r|n|i|n|g| 2 |a|n|a|l|y|s|e|s| |a|n|d| a 5e v3 5 9 o56n n
|
||
|
e9n 4 5 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ etn +-+-+-+-+-+-+-+-+ +-+-+-+ li 5p 8f i h
|
||
|
3 6 k6 3i6 3 9y e , r6 6iA wg r1 +-+-+-+-+-+-+-+-+ 3 e e a y l hl
|
||
|
-N 7 g n6d 14t l1 9ui | _rs e i e 1 |p|r|e|d|i|c|t|s| 1 wn9uc tn s 6m
|
||
|
a rrh4 7 oly e e e e 4 62 y a e +-+-+-+-+-+-+-+-+ g 8a 3 V l% u a i 1 7 1
|
||
|
’ h | 8 8 5 _ n , 8r 4 1_ +-+-+-+-+-+-+ .r +-+-+-+-+ +-+-+-+-+-+-+-+ 5 r 3 9 1 p o f a
|
||
|
r v t 4 o 9 w2 4r |m|o|d|e|l|s| g r |h|a|v|e| |l|e|a|r|n|e|d| 1 n r1 8 2 sro
|
||
|
1 ,d c T2 8 9 41 6 +-+-+-+-+-+-+ c +-+-+-+-+ +-+-+-+-+-+-+-+ d3 s m 6 d n f c t e
|
||
|
t t r 1 6 .ofoi t 5 67 1 +-+-+-+-+-+-+ 7 +-+-+-+ +-+-+-+-+ 4o e e 5 1 98 g ,
|
||
|
+ rw l 9 96 a 3t np , |m|o|d|e|l|s| |a|r|e| |u|s|e|d| , e uu 3 l c t
|
||
|
3 28e 95 9 h _ n +-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+ a9 1e _eu p e d e w
|
||
|
n w r n n f 8 c , d +-+-+-+-+ a +-+-+-+-+-+-+-+-+-+ 84 i e l8 t
|
||
|
+ o mf 7 |t|h|e|y| d |i|n|f|l|u|e|n|c|e| o n a bntq c d n7 8
|
||
|
- s e 9 n 7 77 8 +-+-+-+-+ aa +-+-+-+-+-+-+-+-+-+ t a 6 1 | c4
|
||
|
h o l6 o 9 8 o +-+-+-+-+ i +-+-+-+-+ +-+-+-+-+-+ +-+-+-+ e r 3e9 h 6
|
||
|
o -n p 9 f n s 8hr |t|h|e|y| e- |h|a|v|e| |t|h|e|i|r| |s|a|y| lV d tr
|
||
|
r 2 6 6 a +-+-+-+-+ %5 +-+-+-+-+ +-+-+-+-+-+ +-+-+-+ 3 ip n 5n
|
||
|
r 7 o( s +-+-+-+-+-+-+-+-+-+-+-+ 5 4 a o 7 3 e 6 n- t n f d it
|
||
|
p 1 e |i|n|f|o|r|m|a|t|i|o|n| 4n i3 c, 6 t 1 l ma 7
|
||
|
1 d b +-+-+-+-+-+-+-+-+-+-+-+ a 7 t 4 7 s w 3a e
|
||
|
4 3 3 +-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ d i 2
|
||
|
6 e r C |e|x|t|r|a|c|t|i|o|n| |r|e|c|o|g|n|i|z|e|s| r
|
||
|
%_ e d kb h +-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ a
|
||
|
3 c +-+-+-+-+ m v
|
||
|
7 + 9 l 5 so h a a |t|e|x|t| 5 5 e 3 9 P p 5
|
||
|
-9 t u5 7 ' l +-+-+-+-+ m ao n- r
|
||
|
i y +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+ 8 1
|
||
|
a 9 37 |c|l|a|s|s|i|f|i|c|a|t|i|o|n| |d|e|t|e|c|t|s| c
|
||
|
4 I r t p h +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ o +-+-+-+-+-+-+-+ O pe u
|
||
|
g rk 4 7 1 5 5 9 i 4 c 5 2
|
||
|
o 3 p h 9 v r f 3d
|
||
|
d , 3r 5i g h 1 4 l 5
|
||
|
h w c 7 e 3 yo n
|
||
|
h 5 5 2 e m o , c 2 r
|
||
|
s 3 1 7 s 1 e 1
|
||
|
l 6 t e 6 1 r b 2 4
|
||
|
e r 4 4 o s 4
|
||
|
9 ,i pw o c
|
||
|
1 6 n , a 5
|
||
|
e e i 4 p t , ' s
|
||
|
ei 9 t
|
||
|
6 t l u 6 9
|
||
|
V 8 c | _ a
|
||
|
r o 5 r | 3 t t
|
||
|
1 1 o 3 _
|
||
|
o l 6 i 7 + O w e
|
||
|
8 7 M se
|
||
|
% i 3 e
|
||
|
p 3 9
|
||
|
a r a b i n o a
|
||
|
7 e 4 s o tl t
|
||
|
9 r s 94 c
|
||
|
o k5 l 2 | a r T 1 ,
|
||
|
r r 2 s
|
||
|
| , n
|
||
|
o t 5
|
||
|
l t r si
|
||
|
e y s t
|
||
|
y e o
|
||
|
r 8 e 1 h
|
||
|
2 n 6 5
|
||
|
r n 5 s
|
||
|
|
||
|
14
|
||
|
V V V V V V V V %% %% % % % % %
|
||
|
V V V V V V V V V V V V V V V V 0 % 0 % 0 0 %% 0 % % %%
|
||
|
V V V % V % V V V V V % % %% % 0 0 0 0 % 0 0 00
|
||
|
% % % %% % % _____ _ 0 _ _ 0 _ _ % %
|
||
|
% % 0 /__ \ |__ ___ /_\ | | __ _ ___ | (_) %
|
||
|
% ORACLES % % % % % 0 / /\/ '_ \ / _ \ //_\\| |/ _` |/ _ \| | | ___ %
|
||
|
% % %% / / | | | | __/ / _ \ | (_| | (_) | | ||___|
|
||
|
% % \/ |_| |_|\___| \_/ \_/_|\__, |\___/|_|_|
|
||
|
V V V V V V V V % 0 % % % 0 |___/ %
|
||
|
V V V V V V V V V V V V V V V V % 0 0 %% _ 0 0 _ 0 % 0 %
|
||
|
V V V V V V V V V 0 | |_ ___ _ __ __ _| |_ ___ _ __ %
|
||
|
V V V V V V V V % % % % | __/ _ \ '__/ _` | __/ _ \| '__| %
|
||
|
V V V V V V V V V V V V V V V V % | || __/ | | (_| | || (_) | |
|
||
|
V V V V V V V V V \__\___|_| \__,_|\__\___/|_|
|
||
|
% 0 0 %
|
||
|
Machine learning is mainly used to % %
|
||
|
analyse and predict situations by Algolit %
|
||
|
based on existing cases. In this
|
||
|
exhibition we focus on machine The Algoliterator is a neural network trained using the selection
|
||
|
learning models for text processing of digitized works of the Mundaneum archive. %
|
||
|
or Natural Language Processing %
|
||
|
(NLP). These models have learned to With the Algoliterator you can write a text in the style of the
|
||
|
perform a specific task on the ba- International Institutions Bureau. The Algoliterator starts by
|
||
|
sis of existing texts. The models selecting a sentence from the archive or corpus used to train it.
|
||
|
are used for search engines, ma- You can then continue writing yourself or, at any time, ask the
|
||
|
chine translations and summaries, Algoliterator to suggest a next sentence: the network will gener-
|
||
|
spotting trends in new media net- ate three new fragments based on the texts it has read. You can
|
||
|
works and news feeds. They influ- control the level of training of the network and have it generate
|
||
|
ence what you get to see as a user, sentences based on primitive training, intermediate training or
|
||
|
but also have their say in the final training.
|
||
|
course of stock exchanges world-
|
||
|
wide, the detection of cybercrime When you're satisfied with your new text, you can print it on the
|
||
|
and vandalism, etc. thermal printer and take it home as a souvenir.
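
A character-level recurrent network of this kind can be sketched with
Keras, as below. The sketch stands in for the technique credited under
'Technique'; it is not the Algoliterator's own code (see the source
repository below), and the corpus file name and training settings are
placeholders.

    # A minimal character-level recurrent network in Keras, standing in
    # for the technique used by the Algoliterator; the actual code lives
    # in the repository listed below. 'corpus.txt' is a placeholder for
    # the selection of digitized Mundaneum works.
    import numpy as np
    from tensorflow import keras

    text = open('corpus.txt', encoding='utf-8').read()
    chars = sorted(set(text))
    index = {c: i for i, c in enumerate(chars)}
    window = 40

    # Cut the corpus into overlapping windows; the network learns to
    # predict the character that follows each window.
    starts = range(0, len(text) - window, 3)
    x = np.zeros((len(starts), window, len(chars)), dtype=bool)
    y = np.zeros((len(starts), len(chars)), dtype=bool)
    for row, start in enumerate(starts):
        for offset, char in enumerate(text[start:start + window]):
            x[row, offset, index[char]] = True
        y[row, index[text[start + window]]] = True

    model = keras.Sequential([
        keras.Input(shape=(window, len(chars))),
        keras.layers.LSTM(128),
        keras.layers.Dense(len(chars), activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam')

    # More epochs roughly corresponds to moving from 'primitive'
    # towards 'final' training.
    model.fit(x, y, batch_size=128, epochs=10)

Sampling from the softmax output character by character is what
produces new fragments; sampling at a lower temperature keeps the
generated text closer to the training corpus.
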
|
||
|
%
|
||
|
There are two main tasks when it % ---
|
||
|
comes to language understanding.
|
||
|
Information extraction looks at Sources: https://gitlab.constantvzw.org/algolit/algoliterator.clone
|
||
|
concepts and relations between con-
|
||
|
cepts. This allows for recognizing Concept, code & interface: Gijs de Heij & An Mertens
|
||
|
topics, places and persons in a
|
||
|
text, summarization and questions & Technique: Recurrent Neural Network
|
||
|
answering. The other task is text
|
||
|
classification. You can train an            Original model: Andrej Karpathy, Justin Johnson              %
|
||
|
oracle to detect whether an email
|
||
|
is spam or not, written by a man or
|
||
|
a woman, rather positive or nega- % %
|
||
|
tive. 0 0 0 0 0 0
|
||
|
0 0 0 0 0 0 0
|
||
|
In this zone you can see some of __ __ 0 _ 0 _ 0
|
||
|
those models at work. During your 0 0 / / /\ \ \___ _ __ __| |___ (_)_ __
|
||
|
further journey through the exhibi- \ \/ \/ / _ \| '__/ _` / __| | | '_ \
|
||
|
tion you will discover the differ- \ /\ / (_) | | | (_| \__ \ | | | | |
|
||
|
ent steps that a human-machine goes \/ \/ \___/|_| \__,_|___/ |_|_| |_|
|
||
|
through to come to a final model. 0 __ 0
|
||
|
00 0 / _\_ __ __ _ ___ ___ 0
|
||
|
00 0 \ \| '_ \ / _` |/ __/ _ \
|
||
|
_\ \ |_) | (_| | (_| __/ 0
|
||
|
% 0 \__/ .__/ \__,_|\___\___|
|
||
|
0 0 |_| 0
|
||
|
0 0 0 0 0 0
|
||
|
|
||
|
by Algolit
|
||
|
|
||
|
Word embeddings are language modelling techniques that through
|
||
|
multiple mathematical operations of counting and ordering, plot
|
||
|
words into a multi-dimensional vector space. When embedding
|
||
|
words, they transform from being distinct symbols into mathemati-
|
||
|
                                            cal objects that can be multiplied, divided, added or subtracted.
|
||
|
|
||
|
15
|
||
|
%%% % % % % % % % %% % %% % %% %% % %% % % %
|
||
|
% % % % %%% %% %% By distributing the words along the many diagonal lines of the
|
||
|
% % % multi-dimensional vector space, their new geometrical placements
|
||
|
% % become impossible to perceive by humans. However, what is gained
|
||
|
% % % are multiple, simultaneous ways of ordering. Algebraic operations
|
||
|
% %% % make the relations between vectors graspable again. %
|
||
|
% %
|
||
|
% % % This installation uses Gensim, an open-source vector space and
|
||
|
topic-modelling toolkit implemented in the programming language %
|
||
|
Python. It allows to manipulate the text using the mathematical
|
||
|
relationships that emerge between the words, once they have been
|
||
|
% % % plotted in a vector space. %
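
A small vector space of this kind can be built and queried with Gensim
along the following lines. This is a sketch of the technique, not the
installation's own code; the corpus file, the parameter values and the
query words are placeholders, and the query words must occur in the
corpus.

    # A minimal word2vec sketch with Gensim (version 4 API); not the
    # installation's own code. 'sentences.txt' is a placeholder corpus,
    # one tokenized sentence per line.
    from gensim.models import Word2Vec

    sentences = [line.lower().split()
                 for line in open('sentences.txt', encoding='utf-8')]

    # Every word becomes a 100-dimensional vector, learned from the
    # contexts in which it appears.
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=2)

    # Algebraic operations on the vectors make relations graspable again;
    # the query words are examples and must appear in the corpus.
    print(model.wv.most_similar('knowledge', topn=5))
    print(model.wv.most_similar(positive=['library', 'machine'],
                                negative=['book'], topn=5))
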
|
||
|
% % % % %
|
||
|
% % % --- %
|
||
|
% %
|
||
|
% Concept & interface: Cristina Cochior
|
||
|
% % % %
|
||
|
Technique: word embeddings, word2vec %
|
||
|
%
|
||
|
% % Original model: Radim Rehurek and Petr Sojka
|
||
|
% % %
|
||
|
% %
|
||
|
% 0 00 0 0
|
||
|
0
|
||
|
% ___ _ 0 _ __ 0 _ 0
|
||
|
% 0 / __\ | __ _ ___ ___(_)/ _|_ 0 _(_)_ __ __ _
|
||
|
/ / | |/ _` / __/ __| | |_| | | | | '_ \ / _` |
|
||
|
/ /___| | (_| \__ \__ \ | _| |_| | | | | | (_| |
|
||
|
\____/|_|\__,_|___/___/_|_| \__, |_|_| |_|\__, | %
|
||
|
0 0 0 0 0 |___/ |___/
|
||
|
_ _ __ __ _ _
|
||
|
% 0 0 | |_| |__ ___ / / /\ \ \___ _ __| | __| |
|
||
|
% 0 | __| '_ \ / _ \ \ \/ \/ / _ \| '__| |/ _` |
|
||
|
0 | |_| | | | __/ \ /\ / (_) | | | | (_| |
|
||
|
\__|_| |_|\___| \/ \/ \___/|_| |_|\__,_|
|
||
|
0 0 0
|
||
|
%
|
||
|
by Algolit
|
||
|
|
||
|
% Librarian Paul Otlet's life work was the construction of the Mun-
|
||
|
daneum. This mechanical collective brain would house and distrib-
|
||
|
ute everything ever committed to paper. Each document was classi-
|
||
|
% fied following the Universal Decimal Classification. Using tele-
|
||
|
graphs and especially, sorters, the Mundaneum would have been
|
||
|
able to answer any question from anyone.
|
||
|
|
||
|
With the collection of digitized publications we received from
|
||
|
the Mundaneum, we built a prediction machine that tries to clas-
|
||
|
% sify the sentence you type in one of the main categories of
|
||
|
Universal Decimal Classification. You also witness how the ma-
|
||
|
chine 'thinks'. During the exhibition, this model is regularly
|
||
|
retrained using the cleaned and annotated data visitors added in
|
||
|
% Cleaning for Poems and The Annotator. %
|
||
|
|
||
|
The main classes of the Universal Decimal Classification system
|
||
|
are:
|
||
|
% %
|
||
|
0 - Science and Knowledge. Organization. Computer Science. Infor-
|
||
|
mation Science. Documentation. Librarianship. Institutions.
|
||
|
Publications %
|
||
|
|
||
|
1 - Philosophy. Psychology
|
||
|
|
||
|
2 - Religion. Theology
|
||
|
%
|
||
|
3 - Social Sciences
|
||
|
%
|
||
|
4 - vacant
|
||
|
|
||
|
16
|
||
|
%% %% %%% %% % %% 5 - Mathematics. Natural Sciences % % % % % % %% %
|
||
|
% % %% % % % %% %% %% % % % % % %
|
||
|
% % % % 6 - Applied Sciences. Medicine, Technology %
|
||
|
% % % % % % % %%
|
||
|
% %% % 7 - The Arts. Entertainment. Sport % %% %
|
||
|
% %% % % % % % %
|
||
|
% % 8 - Linguistics. Literature % %
|
||
|
% % % % % % % % % %
|
||
|
% % % % 9 - Geography. History % %% %
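
This kind of supervised text classification can be sketched with
scikit-learn as below. The sketch is only an illustration, not the
exhibited model; the training sentences and their class labels are
invented examples, whereas the model in the exhibition learns from the
annotated Mundaneum data described above.

    # A sketch of sentence classification into UDC main classes with
    # scikit-learn; not the exhibited model. The training sentences and
    # labels are invented examples.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    training_sentences = [
        'the library files every document on cross-referenced index cards',  # class 0
        'the telegraph carries messages across the continent',               # class 6
        'the poem counts its syllables line by line',                        # class 8
    ]
    training_labels = [0, 6, 8]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(training_sentences, training_labels)

    # The pipeline predicts the class whose vocabulary the new sentence
    # resembles most.
    print(model.predict(['a machine reads and sorts the cards']))
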
|
||
|
%% % % %
|
||
|
% % % ---
|
||
|
% % %
|
||
|
% Concept, code, interface: Sarah Garcin, Gijs de Heij, An Mertens
|
||
|
% % % % %
|
||
|
% %
|
||
|
% % 0 0 % 0 %
|
||
|
%% 000 0 0 % 0
|
||
|
% ___ 00 _ 0 %
|
||
|
0 / _ \___ ___ _ __ | | ___ %
|
||
|
0 0 / /_)/ _ \/ _ \| '_ \| |/ _ \
|
||
|
0 0 / ___/ __/ (_) | |_) | | __/ 0
|
||
|
0 \/ % \___|\___/| .__/|_|\___|
|
||
|
0 0 0 |_| 0
|
||
|
% _ _ _ 0 _ 0 0
|
||
|
0 0 __| | ___ _ __( ) |_ | |__ __ ___ _____ %
|
||
|
% / _` |/ _ \| '_ \/| __| | '_ \ / _` \ \ / / _ \ %
|
||
|
| (_| | (_) | | | || |_ | | | | (_| |\ V / __/
|
||
|
0 \__,_|\___/|_| |_| \__| |_| |_|\__,_| \_/ \___|
|
||
|
0
|
||
|
_ 0 _ _ 0
|
||
|
| |__ _ _| |_| |_ ___ _ __ ___
|
||
|
| '_ \| | | | __| __/ _ \| '_ \/ __|
|
||
|
% 0 | |_) | |_| | |_| || (_) | | | \__ \
|
||
|
0 |_.__/ \__,_|\__|\__\___/|_| |_|___/
|
||
|
0 0
|
||
|
%
|
||
|
by Algolit
|
||
|
|
||
|
Since the early days of artificial intelligence (AI), researchers
|
||
|
have speculated about the possibility of computers thinking and
|
||
|
communicating as humans. In the 1980s, there was a first revolu-
|
||
|
tion in Natural Language Processing (NLP), the subfield of AI
|
||
|
concerned with linguistic interactions between computers and hu-
|
||
|
mans. Recently, pre-trained language models have reached state-
|
||
|
of-the-art results on a wide range of NLP tasks, which intensi-
|
||
|
% fies again the expectations of a future with AI.
|
||
|
%
|
||
|
This sound work, made out of audio fragments of scientific docu-
|
||
|
mentaries and AI-related audiovisual material from the last half
|
||
|
century, explores the hopes, fears and frustrations provoked by
|
||
|
these expectations.
|
||
|
|
||
|
---
|
||
|
|
||
|
% Concept, sound edit: Javier Lloret
|
||
|
%
|
||
|
List of sources: 'The Machine that Changed the World :
|
||
|
Episode IV -- The Thinking Machine', 'The Imitation Game',
|
||
|
'Maniac', 'Halt & Catch Fire', 'Ghost in the Shell',
|
||
|
'Computer Chess', '2001: A Space Odyssey', Ennio Morricone,
|
||
|
Gijs Gieskes, André Castro.
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
17
|
||
|
CONTEXTUAL STORIES
|
||
|
ABOUT ORACLES
|
||
|
|
||
|
|
||
|
|
||
|
Oracles are prediction or profiling machines. Sweeney based her research on queries of 2184 raci-
|
||
|
They are widely used in smartphones, computers, ally associated personal names across two websites.
|
||
|
tablets.
|
||
|
88 per cent of first names, identified as
|
||
|
Oracles can be created using different techniques. being given to more black babies, are found pre-
|
||
|
One way is to manually define rules for them. As dictive of race, against 96 per cent white. First
|
||
|
prediction models they are then called rule-based names that are mainly given to black babies, such
|
||
|
models. Rule-based models are handy for tasks that as DeShawn, Darnell and Jermaine, generated ads
|
||
|
are specific, like detecting when a scientific pa- mentioning an arrest in 81 to 86 per cent of name
|
||
|
per concerns a certain molecule. With very little searches on one website and in 92 to 95 per cent
|
||
|
sample data, they can perform well. on the other. Names that are mainly assigned to
|
||
|
whites, such as Geoffrey, Jill and Emma, did not
|
||
|
But there are also the machine learning or statis- generate the same results. The word 'arrest' only
|
||
|
tical models, which can be split into two oracles: appeared in 23 to 29 per cent of white name
|
||
|
'supervised' and 'unsupervised' oracles. For the searches on one site and 0 to 60 per cent on the
|
||
|
creation of supervised machine learning models, other.
|
||
|
humans annotate sample text with labels before
|
||
|
feeding it to a machine to learn. Each sentence, On the website with most advertising, a black-
|
||
|
paragraph or text is judged by at least three an- identifying name was 25 percent more likely to get
|
||
|
notators: whether it is spam or not spam, positive an ad suggestive of an arrest record. A few names
|
||
|
or negative etc. Unsupervised machine learning did not follow these patterns: Dustin, a name
|
||
|
models don't need this step. But they need large mainly given to white babies, generated an ad sug-
|
||
|
amounts of data. And it is up to the machine to gestive of arrest in 81 and 100 percent of the
|
||
|
trace its own patterns or 'grammatical rules'. Fi- time. It is important to keep in mind that the ap-
|
||
|
nally, experts also draw a distinction between     pearance of the ad is linked to the name itself.
|
||
|
classical machine learning and neural networks. It is independent of the fact that the name has an
|
||
|
You'll find out more about this in the Readers arrest record in the company's database.
|
||
|
zone.
|
||
|
Reference
|
||
|
Humans tend to wrap Oracles in visions of Paper: https://dataprivacylab.org/projects/
|
||
|
grandeur. Sometimes these Oracles come to the sur- onlineads/1071-1.pdf
|
||
|
face when things break down. In press releases,
|
||
|
these sometimes dramatic situations are called
|
||
|
'lessons'. However promising their performances --- What is a good employee? ---
|
||
|
seem to be, a lot of issues remain to be solved.
|
||
|
How do we make sure that Oracles are fair, that Since 2015 Amazon employs around 575,000 workers.
|
||
|
every human can consult them, and that they are And they need more. Therefore, they set up a team
|
||
|
understandable to a large public? Even then, exis- of 12 that was asked to create a model to find the
|
||
|
tential questions remain. Do we need all types of right candidates by crawling job application web-
|
||
|
artificial intelligence (AI) systems? And who de- sites. The tool would give job candidates scores
|
||
|
fines what is fair or unfair? ranging from one to five stars. The potential fed
|
||
|
the myth: the team wanted it to be a software that
|
||
|
would spit out the top five human candidates out
|
||
|
--- Racial AdSense --- of a list of 100. And those candidates would be
|
||
|
hired.
|
||
|
A classic 'lesson' in developing Oracles was docu-
|
||
|
mented by Latanya Sweeney, a professor of Govern- The group created 500 computer models, focused on
|
||
|
ment and Technology at Harvard University. In specific job functions and locations. They taught
|
||
|
2013, Sweeney, of African American descent, each model to recognize some 50,000 terms that
|
||
|
googled her name. She immediately received an ad- showed up on past candidates’ letters. The algo-
|
||
|
vertisement for a service that offered her ‘to see rithms learned to give little importance to skills
|
||
|
the criminal record of Latanya Sweeney’. common across IT applicants, like the ability to
|
||
|
write various computer codes. But they also
|
||
|
Sweeney, who doesn’t have a criminal record, began learned some severe errors. The company realized,
|
||
|
a study. She started to compare the advertising before releasing, that the models had taught them-
|
||
|
that Google AdSense serves to different racially selves that male candidates were preferable. They
|
||
|
identifiable names. She discovered that she re- penalized applications that included the word 'wo-
|
||
|
ceived more of these ads searching for non-white men’s,' as in 'women’s chess club captain'. And they
|
||
|
ethnic names, than when searching for tradition- downgraded graduates of two all-women’s colleges.
|
||
|
ally perceived white names. You can imagine how
|
||
|
damaging it can be when possible employers do a This is because they were trained using the job
|
||
|
simple name search and receive ads suggesting the applications that Amazon received over a ten-year
|
||
|
existence of a criminal record. period. During that time, the company had mostly
|
||
|
|
||
|
18
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
hired men. Instead of providing the 'fair' deci- tools become tools of awareness.
|
||
|
sion-making that the Amazon team had promised, the
|
||
|
models reflected a biased tendency in the tech in- The team developed a model to analyse word embed-
|
||
|
dustry. And they also amplified it and made it in- dings trained over 100 years of texts. For contem-
|
||
|
visible. Activists and critics state that it could porary analysis, they used the standard Google
|
||
|
be exceedingly difficult to sue an employer over News word2vec Vectors, a straight-off-the-shelf
|
||
|
automated hiring: job candidates might never know downloadable package trained on the Google News
|
||
|
that intelligent software was used in the process. Dataset. For historical analysis, they used embed-
|
||
|
dings that were trained on Google Books and the
|
||
|
Reference Corpus of Historical American English (COHA
|
||
|
https://www.reuters.com/article/us-amazon-com- https://corpus.byu.edu/coha/) with more than 400
|
||
|
jobs-automation-insight/amazonscraps-secret-ai- million words of text from the 1810s to 2000s. As a
|
||
|
recruiting-tool-that-showed-bias-against-women- validation set to test the model, they trained em-
|
||
|
idUSKCN1MK08G beddings from the New York Times Annotated Corpus
|
||
|
for every year between 1988 and 2005.
|
||
|
|
||
|
--- Quantifying 100 Years The research shows that word embeddings capture
|
||
|
of Gender and Ethnic Stereotypes --- changes in gender and ethnic stereotypes over
|
||
|
                                                   time. They quantify how specific biases decrease
|
||
|
Dan Jurafsky is the co-author of 'Speech and Lan- over time while other stereotypes increase. The
|
||
|
guage Processing', one of the most influential major transitions reveal changes in the descrip-
|
||
|
books for studying Natural Language Processing tions of gender and ethnic groups during the
|
||
|
(NLP). Together with a few colleagues at Stanford women’s movement in the 1960-1970s and the Asian-
|
||
|
University, he discovered in 2017 that word embed- American population growth in the 1960s and 1980s.
|
||
|
dings can be a powerful tool to systematically
|
||
|
quantify common stereotypes and other historical A few examples:
|
||
|
trends.
|
||
|
The top ten occupations most closely associated
|
||
|
Word embeddings are a technique that translates with each ethnic group in the contemporary
|
||
|
words to numbered vectors in a multi-dimensional Google News dataset:
|
||
|
space. Vectors that appear next to each other,
|
||
|
indicate similar meaning. All numbers will be - Hispanic: housekeeper, mason, artist, janitor,
|
||
|
grouped together, as well as all prepositions, dancer, mechanic, photographer, baker, cashier,
|
||
|
people's names, professions. This allows for the  driver
|
||
|
calculation of words. You could subtract London
|
||
|
from England and your result would be the same as - Asian: professor, official, secretary,
|
||
|
subtracting Paris from France.                     conductor, physicist, scientist, chemist, tailor,
|
||
|
accountant, engineer
|
||
|
An example in their research shows that the vector
|
||
|
for the adjective 'honorable' is closer to the - White: smith, blacksmith, surveyor, sheriff,
|
||
|
vector for 'man' whereas the vector for 'submissive' weaver, administrator, mason, statistician,
|
||
|
is closer to 'woman'. These stereotypes are auto- clergy, photographer
|
||
|
matically learned by the algorithm. It will be pro-
|
||
|
blematic when the pre-trained embeddings are then The 3 most male occupations in the 1930s:
|
||
|
used for sensitive applications such as search ran- engineer, lawyer, architect.
|
||
|
kings, product recommendations, or translations. The 3 most female occupations in the 1930s:
|
||
|
This risk is real, because a lot of the pre- nurse, housekeeper, attendant.
|
||
|
trained embeddings can be downloaded as off-
|
||
|
the-shelf packages.                                Not much has changed in the 1990s.
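
As a small illustration of this vector arithmetic, a sketch using
the gensim library and one of those off-the-shelf packages, the
Google News vectors mentioned above (a download of more than a
gigabyte; the exact nearest neighbour may vary):

    # Hedged sketch: load pre-trained word2vec vectors and test the
    # London/England versus Paris/France analogy described above.
    import gensim.downloader as api

    vectors = api.load("word2vec-google-news-300")
    print(vectors.most_similar(positive=["France", "London"],
                               negative=["England"], topn=1))
    # expected to land near 'Paris'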
|
||
|
|
||
|
It is known that language reflects and keeps cul- Major male occupations:
|
||
|
tural stereotypes alive. Using word embeddings to architect, mathematician and surveyor.
|
||
|
spot these stereotypes is less time-consuming and Female occupations:
|
||
|
less expensive than manual methods. But the imple- nurse, housekeeper and midwife.
|
||
|
mentation of these embeddings for concrete predic-
|
||
|
tion models has caused a lot of discussion within  Reference
|
||
|
the machine learning community. The biased models https://arxiv.org/abs/1711.08412
|
||
|
stand for automatic discrimination. Questions are:
|
||
|
is it actually possible to de-bias these models
|
||
|
completely? Some say yes, while others disagree: --- Wikimedia's Ores service ---
|
||
|
instead of retro-engineering the model, we should
|
||
|
ask whether we need it in the first place. These Software engineer Amir Sarabadani presented the
|
||
|
researchers followed a third path: by acknowledg- ORES-project in Brussels in November 2017 during
|
||
|
ing the bias that originates in language, these the Algoliterary Encounter.
|
||
|
|
||
|
19
|
||
|
|
||
|
|
||
|
|
||
|
This 'Objective Revision Evaluation Service' uses Twitter. She lived for less than 24 hours before
|
||
|
machine learning to help automate critical work on she was shut down. Few people know that before
|
||
|
Wikimedia, like vandalism detection and the re- this incident, Microsoft had already trained and
|
||
|
moval of articles. Cristina Cochior and Femke released XiaoIce on WeChat, China's most used chat
|
||
|
Snelting interviewed him. application. XiaoIce's success was so promising
|
||
|
that it led to the development of its American
|
||
|
Femke: To go back to your work. In these days you version. However, the developers of Tay were
|
||
|
tried to understand what it means to find bias in not prepared for the platform climate of Twitter.
|
||
|
machine learning and the proposal of Nicolas Although the bot knew how to distinguish a noun
|
||
|
Maleve, who gave the workshop yesterday, was nei- from an adjective, it had no understanding of the
|
||
|
ther to try to fix it, nor to refuse to deal with actual meaning of words. The bot quickly learned
|
||
|
systems that produce bias, but to work with them. to copy racial insults and other discriminative
|
||
|
He says that bias is inherent to human knowledge, language it learned from Twitter users and troll
|
||
|
so we need to find ways to somehow work with it. attacks.
|
||
|
We're just struggling a bit with what would that
|
||
|
mean, how would that work... So I was wondering Tay's appearance and disappearance was an impor-
|
||
|
whether you had any thoughts on the question of tant moment of consciousness. It showed the possi-
|
||
|
bias. ble corrupt consequences that machine learning can
|
||
|
have when the cultural context in which the algo-
|
||
|
Amir: Bias inside Wikipedia is a tricky question rithm has to live is not taken into account.
|
||
|
because it happens on several levels. One level
|
||
|
that has been discussed a lot is the bias in ref- Reference
|
||
|
erences. Not all references are accessible. So one https://chatbotslife.com/the-accountability-of-ai-
|
||
|
thing that the Wikimedia Foundation has been try- case-study-microsofts-tay-experiment-ad577015181f
|
||
|
ing to do, is to give free access to libraries
|
||
|
that are behind a pay wall. They reduce the bias
|
||
|
by only using open-access references. Another type
|
||
|
of bias is the Internet connection, access to the
|
||
|
Internet. There are lots of people who don't have
|
||
|
it. One thing about China is that the Internet
|
||
|
there is blocked. The content against the govern-
|
||
|
ment of China inside Chinese Wikipedia is higher
|
||
|
because the editors [who can access the website]
|
||
|
are not people who are pro government, and try to
|
||
|
make it more neutral. So, this happens in lots of
|
||
|
places. But in the matter of artificial intelli-
|
||
|
gence (AI) and the model that we use at Wikipedia,
|
||
|
it's more a matter of transparency. There is a
|
||
|
book about how bias in AI models can break peo-
|
||
|
ple's lives, it's called 'Weapons of Math Destruc-
|
||
|
tion'. It talks about AI models that exist in the
|
||
|
US that rank teachers and it's quite horrible be-
|
||
|
cause eventually there will be bias. The way to
|
||
|
deal with it based on the book and their research
|
||
|
was first that the model should be open source,
|
||
|
people should be able to see what features are
|
||
|
used and the data should be open also, so that
|
||
|
people can investigate, find bias, give feedback
|
||
|
and report back. There should be a way to fix the
|
||
|
system. I think not all companies are moving in
|
||
|
that direction, but Wikipedia, because of the val-
|
||
|
ues that they hold, are at least more transparent
|
||
|
and they push other people to do the same thing.
|
||
|
|
||
|
Reference
|
||
|
https://gitlab.constantvzw.org/algolit/algolit
|
||
|
/blob/master/algoliterary_encounter/Interview%
|
||
|
20with%20Amir/AS.aac
|
||
|
|
||
|
|
||
|
--- Tay ---
|
||
|
|
||
|
One of the infamous stories is that of the machine
|
||
|
learning programme Tay, designed by Microsoft.
|
||
|
Tay was a chat bot that imitated a teenage girl on
|
||
|
|
||
|
20
|
||
|
cleaners clean cleaners clean cleaners clean cleaners clean cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean cleaners clean cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean cleaners clean cle
|
||
|
ners clean cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean cleaners
|
||
|
lean cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean cle
|
||
|
ners clean cleaners clean cleaners
|
||
|
clean cleaners clean cleaners
|
||
|
lean cleaners clean cleane
|
||
|
s clean cleaners clean
|
||
|
cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean cleaners clean
|
||
|
cleaners clean cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean cleaners
|
||
|
clean cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean cleaners
|
||
|
clean cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean cle
|
||
|
ners clean cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean cleaners
|
||
|
lean cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
cleaners clean
|
||
|
21
|
||
|
r u e n 7 c %9 2 y m V +-+-+-+-+-+-+-+-+ e4 +-+-+-+-+-+ 9 -t 0n neof e 5 r6 7 kln
|
||
|
ci p '.s w s u 18 u n |c|l|e|a|n|e|r|s| 2 |c|l|e|a|n| et.t o % s eii4t i ktu 4i w +
|
||
|
t 6 . 3e -6 6 rVle 17 +-+-+-+-+-+-+-+-+ rg +-+-+-+-+-+ .e o n 7 ci i 0 e h eR e85 orh
|
||
|
n x h r 4 h t5 7hoh 4 t ei g + n e3 tt np% k s +h_ hees ir w n +6 l rt 8 oe e Fe
|
||
|
r5b t ua0e 3ei n a 1 t8 rd t 7 li \ 7n v2 tq e e6 a as o
|
||
|
2b t t m oe f c8 lx - g9 r - -s+ +-+-+ h +-+-+-+-+-+-+ 8f o1 Ao % r - 5i 2 e - r
|
||
|
x p n4h e6 s n8 / s7 . 95 sti |w|e| eno |h|e|l|p|e|d| +e r a2 sy n gyl 2u e sti6t
|
||
|
ch% _ 1r se o + t t 4, 1 t9 l +-+-+ e +-+-+-+-+-+-+ t r i 7 rs u ie o o,4 h
|
||
|
, 5 5h g gs 6u5e e0 95 eif e % +-+-+ s 9 +-+-+-+-+-+-+-+ o+ m iy n6 m _4 l oae s+ da
|
||
|
e w i_|e e a 6 an |w|e| | |c|l|e|a|n|e|d| 7 i a e r l 7
|
||
|
se 8w ,p+tn i d t 1 g s ae l +-+-+ tec +-+-+-+-+-+-+-+ - ts e e,d % e 8e i
|
||
|
r i _6sog y L5 e v +-+-+-+-+-+ +-+-+-+-+ er +-+-+ +-+-+-+-+-+-+ Ies f e/ 8rh gr o 5 ac55 e
|
||
|
( h s s9 |h|u|m|a|n| |w|o|r|k| 96 7 |i|s| |n|e|e|d|e|d| i 8 d 13 l , i
|
||
|
- s tt 1 _ S +-+-+-+-+-+ +-+-+-+-+ _ +-+-+ +-+-+-+-+-+-+ r v Mr_ a3 f r ,
|
||
|
a s l n 87 +-+-+-+-+-+-+-+-+-+-+-+ rh 9 t r 7 36 w i n e 2 n d m
|
||
|
i4 +2 c 6 o |p|o|o|r|l|y|-|p|a|i|d| w n 3 g e - 6 tk o- r r
|
||
|
w9 4 t 8p ie c rVv 5 +-+-+-+-+-+-+-+-+-+-+-+ b n h - 6 xc te|t ,2 5 n
|
||
|
4 4 ,in 7 4( d +-+-+-+-+-+-+-+-+-+-+-+ l +-+-+-+-+-+ +-+-+-+ -d ah v + n5 . 4 6s_
|
||
|
t 2- i l |f|r|e|e|l|a|n|c|e|r|s| te3c |c|a|r|r|y| |o|u|t| l e oee 1n 7 \ y1k
|
||
|
r r l p r 6 e +-+-+-+-+-+-+-+-+-+-+-+ 6|p +-+-+-+-+-+ +-+-+-+ s p o2 ) t -e : p 8 h
|
||
|
h9 h o 4l +-+-+-+-+-+-+-+-+-+-+ \ +-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ nb h 7 s4i1 3
|
||
|
T z3 |h e 9 |v|o|l|u|n|t|e|e|r|s| 9 |d|o| |f|a|n|t|a|s|t|i|c| |w|o|r|k| 9 ws w 5 e6 x
|
||
|
a` o +-+-+-+-+-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ ih l 3 6
|
||
|
7 r 6 d G i6 1 3 e1 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+ +-+-+ +-+-+-+-+ eir c e n% ui
|
||
|
l r 6 6s t r |w|h|o|e|v|e|r| |c|l|e|a|n|s| |u|p| |t|e|x|t| h 6 t i
|
||
|
t tc w a s e 9 +-+-+-+-+-+-+-+ F +-+-+-+-+-+-+ +-+-+ +-+-+-+-+ , 5 9s9 w e e
|
||
|
n m5 e 4 Mi e c i a U u r e 2 a i % .S g6 u 3
|
||
|
_t f 2 t 5 t6 v V c a i f- ee l 9rni/ 3 a 7e 1
|
||
|
1o n 3 2 tn t 5 1o 7 r s / % uio +
|
||
|
9 f a 4 - e o e t + i r + s 2
|
||
|
ls_ nr e w i l V - 8e t 5 +i v 2 p o
|
||
|
l n e j n tr l V| n e w L r 8
|
||
|
c l1 l i i a 8 t g0 y s
|
||
|
, a u r9 e 8 4 9 e | e 3
|
||
|
n g8 r e? M d r a i l c
|
||
|
- n t r 4 e r l c ii e a
|
||
|
p r a a h 6 l 3 e s
|
||
|
i 4 c o | 6 v rh p7 3 % h t a
|
||
|
e e 1 6 6 p 15 8 e a n s d o 1 i 2 n
|
||
|
s e m t 2 w v a 6 i i
|
||
|
r 7 | a e 5 7 s 3 8 i 4 7
|
||
|
e y 4 3 w 5 l unw5 4ie o3 439 o i %
|
||
|
r 6 e a 4a f n e
|
||
|
h a 5 o s i l s
|
||
|
- s | n D 4
|
||
|
e 3 - 2 5 h a 1 V p n v
|
||
|
+ 7 8n n a ar ) v
|
||
|
. n2 t 5 6r 8 |
|
||
|
u o _ e r l n, r 1 e
|
||
|
n ,e r s 7 a 7
|
||
|
a e h t y d a 3
|
||
|
u | 2 a s 4 t
|
||
|
6 e t66 e % 2 3 y 3 n
|
||
|
a e o i , t 4 i e g c r
|
||
|
l t w 9 2 a
|
||
|
h v t , p c a r h c
|
||
|
l 4 g p1
|
||
|
z i t o m a % a
|
||
|
i k | a i e
|
||
|
s a v c a , l lp + d 2 a
|
||
|
3 o t
|
||
|
e
|
||
|
5 n t p s i a 6 r
|
||
|
e 5 y,r m e ,
|
||
|
g i 7 s i 5 s a
|
||
|
a a % r
|
||
|
3 u p n
|
||
|
e \ 5 i p o l i
|
||
|
|
||
|
22
|
||
|
% V V V V V V V % V % % % % %% % % %% % % % % % % %
|
||
|
V V V V V V V V V V V V V V V V % % % % 0 % % 0 % 0 0 % 0 % % %%% %
|
||
|
V V V V V V V V % V % 0 % 0 0 % % %
|
||
|
% % % %% ___ _ 0 % 00 _ % % %
|
||
|
% % % % 00 / __\ | ___ __ _ _ __ (_)_ __ __ _ %
|
||
|
CLEANERS % % / / | |/ _ \/ _` | '_ \| | '_ \ / _` | 0 %
|
||
|
% % % % % % 00 / /___| | __/ (_| | | | | | | | | (_| | %
|
||
|
% % % % % % 0 \____/|_|\___|\__,_|_| |_|_|_| |_|\__, | %
|
||
|
V V V V V V V V % 0 |___/ % %
|
||
|
V V V V V V V V V V V V V V V V __ 0 ___ 0 % 0
|
||
|
V V V V V V V V V 0 / _| ___ _ __ / _ \___ ___ _ __ ___ ___ %
|
||
|
V V V V V V V V 0 % | |_ / _ \| '__| / /_)/ _ \ / _ \ '_ ` _ \/ __| %
|
||
|
V V V V V V V V V V V V V V V V 0 | _| (_) | | / ___/ (_) | __/ | | | | \__ \
|
||
|
V V V V V V V V V |_| \___/|_| \/ 0 \___/ \___|_| |_| |_|___/
|
||
|
0 0
|
||
|
Algolit chooses to work with texts %%% %
|
||
|
that are free of copyright. This by Algolit % % %
|
||
|
means that they have been published % % %
|
||
|
under a Creative Commons 4.0 li- For this exhibition we worked with 3 per cent of the Mundaneum's
|
||
|
cense – which is rare - or that archive. These documents were first scanned or photographed. To
|
||
|
they are in the public domain be- make the documents searchable they were transformed into text us-
|
||
|
cause the author died more than 70  ing Optical Character Recognition software (OCR). These are algo-
|
||
|
years ago. This is the case for the % rithmic models that are trained on other texts. They have learned
|
||
|
publications of the Mundaneum. We to identify characters, words, sentences and paragraphs. The
|
||
|
received 203 documents that we software often makes 'mistakes'. It might recognize a wrong char-
|
||
|
helped turn into datasets. They are acter, it might get confused by a stain, an unusual font or the
|
||
|
now available for others online. reverse side of the page being visible. %
|
||
|
Sometimes we had to deal with poor % % %
|
||
|
text formats, and we often dedi- While these mistakes are often considered noise, confusing the
|
||
|
cated a lot of time to cleaning up training, they can also be seen as poetic interpretations of the
|
||
|
documents. We were not alone in do- algorithm. They show us the limits of the machine. And they also
|
||
|
ing this. reveal how the algorithm might work, what material it has seen in
|
||
|
training and what is new. They say something about the standards %
|
||
|
Books are scanned at high resolu- of its makers. In this installation we ask your help in verifying
|
||
|
tion, page by page. This is time- our dataset. As a reward we'll present you with a personal algo-
|
||
|
consuming, laborious human work and rithmic improvisation.
|
||
|
often the reason why archives and
|
||
|
libraries transfer their collec- ---
|
||
|
tions and leave the job to compa- %
|
||
|
nies like Google. The photos are Concept, code, interface: Gijs de Heij
|
||
|
converted into text via OCR (Opti- %
|
||
|
cal Character Recognition), a soft-
|
||
|
ware that recognizes letters, but
|
||
|
often makes mistakes, especially
|
||
|
when it has to deal with ancient
|
||
|
fonts and wrinkled pages. Yet more
|
||
|
wearisome human work is needed to
|
||
|
improve the texts. This is often
|
||
|
carried out by poorly-paid free-
|
||
|
lancers via micro-payment platforms
|
||
|
like Amazon's Mechanical Turk; or
|
||
|
by volunteers, like the community
|
||
|
around the Distributed Proofreaders
|
||
|
Project, which does fantastic work.
|
||
|
Whoever does it, or wherever it is
|
||
|
done, cleaning up texts is a tower-
|
||
|
ing job for which no structural au-
|
||
|
tomation yet exists.
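
As an illustration of the OCR step described here, a minimal
sketch using the open-source Tesseract engine via pytesseract,
assuming Tesseract, pytesseract and Pillow are installed; the
file name is hypothetical.

    # Hedged sketch: turn one scanned page into raw text. The
    # output still contains the kind of 'mistakes' discussed above
    # and needs human cleaning afterwards.
    from PIL import Image
    import pytesseract

    page = Image.open("mundaneum_scan_001.png")  # hypothetical scan
    raw_text = pytesseract.image_to_string(page, lang="fra")
    print(raw_text)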
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
23
|
||
|
% % % % % %% % % % % % % % %% 0 0 % % % % % %%
|
||
|
% %% % % % 0 0 0 % %% % % %% %%% %
|
||
|
% % %% %%% 0 ___ _ _ _ _ 0 _ % _
|
||
|
% % % % % 0 0 / (_)___| |_ _ __(_) |__ _ _| |_ ___ __| | % %
|
||
|
% % / /\ / / __| __| '__| | '_ \| | | | __/ _ \/ _` |
|
||
|
%% 0 / /_//| \__ \ |_| | | | |_) | |_| | || __/ (_| | %%
|
||
|
% % /___,' |_|___/\__|_| |_|_.__/ \__,_|\__\___|\__,_|
|
||
|
% % % ___ 0 __ 0 0 _ %
|
||
|
% / _ \_ __ ___ ___ / _|_ __ ___ __ _ __| | ___ _ __ ___
|
||
|
% % / /_)/ '__/ _ \ / _ \| |_| '__/ _ \/ _` |/ _` |/ _ \ '__/ __|
|
||
|
/ ___/| | | (_) | (_) | _| | | __/ (_| | (_| | __/ | \__ \
|
||
|
% 0 \/ |_| \___/ \___/|_| |_| \___|\__,_|\__,_|\___|_| |___/
|
||
|
0 0 0
|
||
|
% 0 % 0 % %%
|
||
|
% 0 0 0 % %
|
||
|
% % by Algolit % %
|
||
|
|
||
|
Distributed Proofreaders is a web-based interface and an interna-
|
||
|
tional community of volunteers who help convert public domain
|
||
|
% % books into e-books. For this exhibition they proofread the Munda-
|
||
|
neum publications that appeared before 1923 and are in the public
|
||
|
domain in the US. Their collaboration meant a great relief for
|
||
|
the members of Algolit. Fewer documents to clean up!
|
||
|
|
||
|
All the proofread books have been made available on the Project
|
||
|
Gutenberg archive.
|
||
|
|
||
|
For this exhibition, An Mertens interviewed Linda Hamilton, the
|
||
|
general manager of Distributed Proofreaders.
|
||
|
|
||
|
---
|
||
|
%
|
||
|
Interview: An Mertens
|
||
|
%
|
||
|
Editing: Michael Murtaugh, Constant %
|
||
|
%
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
%
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
24
|
||
|
CONTEXTUAL STORIES
|
||
|
FOR CLEANERS
|
||
|
|
||
|
|
||
|
|
||
|
--- Project Gutenberg and path to death – run your own code; dynamic change.
|
||
|
Distributed Proofreaders --- operate it. For nearly 84 years, the Turk won most
|
||
|
The Life Instinct: unification; the eternal re-
|
||
|
Project Gutenberg is our Ali Baba cave. It offers turn; the perpetuation and MAINTENANCE of the mate-
|
||
|
more than 58,000 free eBooks to be downloaded or rial; survival systems and operations; equilibrium.
|
||
|
read online. Works are accepted on Gutenberg when
|
||
|
their U.S. copyright has expired. Thousands of B. Two basic systems: Development and Maintenance.
|
||
|
volunteers digitize and proofread books to help
|
||
|
the project. An essential part of the work is done The sourball of every revolution: after the revo-
|
||
|
through the Distributed Proofreaders project. This lution, who’s going to try to spot the bias in
|
||
|
is a web-based interface to help convert public the output?
|
||
|
domain books into e-books. Think of text files,
|
||
|
EPUBs, Kindle formats. By dividing the workload Development: pure individual creation; the new;
|
||
|
into individual pages, many volunteers can work change; progress; advance; excitement; flight or
|
||
|
on a book at the same time; this speeds up the fleeing.
|
||
|
cleaning process.
|
||
|
Maintenance: keep the dust off the pure individual
|
||
|
During proofreading, volunteers are presented with creation; preserve the new; sustain the change;
|
||
|
a scanned image of the page and a version of the protect progress; defend and prolong the advance;
|
||
|
text, as it is read by an OCR algorithm trained to renew the excitement; repeat the flight; show your
|
||
|
recognize letters in images. This allows the text work – show it again, keep the git repository
|
||
|
to be easily compared to the image, proofread, and groovy, keep the data analysis revealing.
|
||
|
sent back to the site. A second volunteer is then
|
||
|
presented with the first volunteer's work. She Development systems are partial feedback systems
|
||
|
verifies and corrects the work as necessary, and with major room for change.
|
||
|
submits it back to the site. The book then simi-
|
||
|
larly goes through a third proofreading round, Maintenance systems are direct feedback systems
|
||
|
plus two more formatting rounds using the same web with little room for alteration.
|
||
|
interface. Once all the pages have completed these
|
||
|
steps, a post-processor carefully assembles them C. Maintenance is a drag;
|
||
|
into an e-book and submits it to the Project it takes all the fucking time (lit.)
|
||
|
Gutenberg archive.
|
||
|
The mind boggles and chafes at the boredom.
|
||
|
We collaborated with the Distributed Proofreaders
|
||
|
project to clean up the digitized files we re- The culture assigns lousy status on maintenance
|
||
|
ceived from the Mundaneum collection. From Novem- jobs = minimum wages, Amazon Mechanical Turks =
|
||
|
ber 2018 until the first upload of the cleaned-up virtually no pay.
|
||
|
book 'L'Afrique aux Noirs' in February 2019, An
|
||
|
Mertens exchanged about 50 emails with Linda Clean the set, tag the training data, correct the
|
||
|
Hamilton, Sharon Joiner and Susan Hanlon, all vol- typos, modify the parameters, finish the report,
|
||
|
unteers from the Distributed Proofreaders project. keep the requester happy, upload the new version,
|
||
|
The conversation is published online. It might attach words that were wrongly separated by OCR
|
||
|
inspire you to share unavailable books online. back together, complete those Human Intelligence
|
||
|
Tasks, try to guess the meaning of the requester's
|
||
|
formatting, you must accept the HIT before you can
|
||
|
--- An algoliterary version submit the results, summarize the image, add the
|
||
|
of the Maintenance Manifesto --- bounding box, what's the semantic similarity of
|
||
|
this text, check the translation quality, collect
|
||
|
In 1969, one year after the birth of her first your micro-payments, become a hit Mechanical Turk.
|
||
|
child, the New York artist Mierle Laderman Ukeles
|
||
|
wrote a Manifesto for Maintenance Art. The mani- Reference
|
||
|
festo calls for a readdressing of the status of
|
||
|
maintenance work both in the private, domestic https://www.arnolfini.org.uk/blog/manifesto-for-
|
||
|
space, and in public. What follows is an altered maintenance-art-1969
|
||
|
version of her text inspired by the work of the
|
||
|
Cleaners.
|
||
|
--- A bot panic on Amazon Mechanical Turk ---
|
||
|
IDEAS
|
||
|
Amazon's Mechanical Turk takes the name of a
|
||
|
A. The Death Instinct and the Life Instinct: chess-playing automaton from the eighteenth cen-
|
||
|
tury. In fact, the Turk wasn't a machine at all.
|
||
|
The Death Instinct: separation; categorization; It was a mechanical illusion that allowed a human
|
||
|
avant-garde par excellence; to follow the predicted chess master to hide inside the box and manually
|
||
|
|
||
|
25
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
of the games played during its demonstrations
|
||
|
around Europe and the Americas. Napoleon Bonaparte
|
||
|
is said to have been fooled by this trick too.
|
||
|
|
||
|
The Amazon Mechanical Turk is an online platform
|
||
|
for humans to execute tasks that algorithms can-
|
||
|
not. Examples include annotating sentences as be-
|
||
|
ing positive or negative, spotting number plates,
|
||
|
discriminating between face and non-face. The jobs
|
||
|
posted on this platform are often paid less than a
|
||
|
cent per task. Tasks that are more complex or re-
|
||
|
quire more knowledge can be paid up to several
|
||
|
cents. To earn a living, Turkers need to finish as
|
||
|
many tasks as fast as possible, leading to in-
|
||
|
evitable mistakes. As a result, the requesters
|
||
|
have to incorporate quality checks when they post
|
||
|
a job on the platform. They need to test whether
|
||
|
the Turker actually has the ability to complete
|
||
|
the task, and they also need to verify the re-
|
||
|
sults. Many academic researchers use Mechanical
|
||
|
Turk as an alternative to having their students exe-
|
||
|
cute these tasks.
|
||
|
|
||
|
In August 2018 Max Hui Bai, a psychology student
|
||
|
from the University of Minnesota, discovered that
|
||
|
the surveys he conducted with Mechanical Turk were
|
||
|
full of nonsense answers to open-ended questions.
|
||
|
He traced back the wrong answers and found out
|
||
|
that they had been submitted by respondents with
|
||
|
duplicate GPS locations. This raised suspicion.
|
||
|
Though Amazon explicitly prohibits robots from
|
||
|
completing jobs on Mechanical Turk, the company
|
||
|
does not deal with the problems they cause on
|
||
|
their platform. Forums for Turkers are full of
|
||
|
conversations about the automation of the work,
|
||
|
sharing practices of how to create robots that can
|
||
|
even violate Amazon’s terms. You can also find
|
||
|
videos on YouTube that show Turkers how to write a
|
||
|
bot to fill in answers for you.
|
||
|
|
||
|
Kristy Milland, a Mechanical Turk activist, says:
|
||
|
'Mechanical Turk workers have been treated really,
|
||
|
really badly for 12 years, and so in some ways I
|
||
|
see this as a point of resistance. If we were paid
|
||
|
fairly on the platform, nobody would be risking
|
||
|
their account this way.'
|
||
|
|
||
|
Bai is now leading a research project among social
|
||
|
scientists to figure out how much bad data is in
|
||
|
use, how large the problem is, and how to stop it.
|
||
|
But it is impossible at the moment to estimate how
|
||
|
many datasets have become unreliable in this way.
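
A small sketch of the kind of check Bai describes, assuming the
survey responses sit in a pandas table with latitude and longitude
columns; the column names and values are hypothetical.

    # Hedged sketch: flag responses that share identical GPS
    # coordinates, a hint that they may come from one scripted
    # source.
    import pandas as pd

    responses = pd.DataFrame({
        "answer": ["nice", "good", "nice"],
        "lat": [44.97, 40.71, 44.97],
        "lon": [-93.26, -74.00, -93.26],
    })
    suspicious = responses[responses.duplicated(["lat", "lon"],
                                                keep=False)]
    print(suspicious)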
|
||
|
|
||
|
References
|
||
|
https://requester.mturk.com/create/projects/new
|
||
|
|
||
|
https://www.wired.com/story/amazon-mechanical-
|
||
|
turk-bot-panic/
|
||
|
|
||
|
https://www.maxhuibai.com/blog/evidence-that-
|
||
|
responses-from-repeating-gps-are-random
|
||
|
|
||
|
http://timryan.web.unc.edu/2018/08/12/data-
|
||
|
contamination-on-mturk/
|
||
|
|
||
|
26
|
||
|
informants inform informants inform informants inform informants inform informants inform info
|
||
|
mants inform informants inform informants inform informants inform informants i
|
||
|
form informants inform informants inform informants inform info
|
||
|
mants inform informants inform informants inform informants info
|
||
|
m informants inform informants inform informants inform
|
||
|
informants inform informants inform informants
|
||
|
inform informants inform informants inform
|
||
|
informants inform informants inform informants info
|
||
|
m informants inform informants inform
|
||
|
informants inform informants inform
|
||
|
informants inform informants inform in
|
||
|
ormants inform informants inform infor
|
||
|
ants inform informants inform info
|
||
|
mants inform informants inform
|
||
|
informants inform informants inform
|
||
|
informants inform informants inform
|
||
|
informants inform informants inform
|
||
|
informants inform infor
|
||
|
ants inform informants inform
|
||
|
informants inform informants inform
|
||
|
informants inform
|
||
|
informants inform informants inform
|
||
|
informants inform
|
||
|
informants inform informants inform
|
||
|
informants inform
|
||
|
informants inform informants inform
|
||
|
informants inform
|
||
|
informants inform informants
|
||
|
inform informants inform
|
||
|
informants inform
|
||
|
informants inform informants
|
||
|
inform informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform informants info
|
||
|
m informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform informants
|
||
|
inform informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform in
|
||
|
ormants inform info
|
||
|
mants inform infor
|
||
|
ants inform infor
|
||
|
ants inform info
|
||
|
mants inform in
|
||
|
ormants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
informants inform
|
||
|
27
|
||
|
r 8h3t i5 4 d 7 + +-+-+-+-+-+-+-+-+-+-+ c a +-+-+-+-+-+-+ e f n no6 - - t -as 7 ( e
|
||
|
a ah 5al ,n ri B |i|n|f|o|r|m|a|n|t|s| l |i|n|f|o|r|m| , 35e t s evn7 73r o2/ L ep - e
|
||
|
t : ca,i ma eeslh | +-+-+-+-+-+-+-+-+-+-+ r_ T +-+-+-+-+-+-+ 2o 73 pjt 7ng% e 84
|
||
|
n 7 hnprs s9i 3a1 9e _ 9l e o pi rsa d o ii/5am sd rr1 1 n% + n8w
|
||
|
h|29 e s _ 3 . o i c i. e+1onIa 4 f p | lu e v1r _nth2i a%a ce 1e 7e 1y |t e r
|
||
|
xn r 8 sF w t -e +-+-+-+-+ +-+-+-+-+-+-+-+ e +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ 1 i2 n l cn r3
|
||
|
t e e ,i n ibC 6 |e|a|c|h| |d|a|t|a|s|e|t| |c|o|l|l|e|c|t|s| |d|i|f|f|e|r|e|n|t| iw tc a318
|
||
|
e o l a Me -o r + +-+-+-+-+ +-+-+-+-+-+-+-+ d 9 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ +yc l p
|
||
|
+6 n 8 , a -rsb es 3 t t | bt ,p q +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+ 6 1d e 4 , 1 +
|
||
|
lk o95 sf s e - 2 b 0 rl n la / S f n |i|n|f|o|r|m|a|t|i|o|n| |a|b|o|u|t| 1 4r y7 n
|
||
|
i _ m ec cf 2|r 8ra5 n l 6t +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+ o t | r e
|
||
|
h_ ae3 5 Ti nf ao 7 l t n 9 9 h +e e-1 +-+-+-+ +-+-+-+-+-+ 7 t 8 - f mme 5
|
||
|
t og m 9 i r. m l l j +t3 9 |t|h|e| |w|o|r|l|d| e97 3 9 t i s - o s
|
||
|
_i n l o er 8 n petc 141 s / i +-+-+-+ +-+-+-+-+-+ - 9 w 1 1 b
|
||
|
t4, r e u n8 a |t +-+-+-+-+-+-+-+-+ , |c +-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+ 2r t 3
|
||
|
o 6 9.o7e 7 Ce |d|a|t|a|s|e|t|s| V |a|r|e| |i|m|b|u|e|d| |w|i|t|h| 7 ig g ig 3xa
|
||
|
i r- p R h 8 rr m g _ t +-+-+-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+ n f -c , +
|
||
|
- - 9 f k i r 6 e 665 a +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ t m 1 9 6
|
||
|
om _ 1e Tlh4 , f vr E |c|o|l|l|e|c|t|o|r|'|s| |b|i|a|s| 0 7 t e 2t
|
||
|
E5 r o r i i b e hw i a ne +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ t a
|
||
|
m, m4 - a +-+-+-+-+ +-+-+-+-+-+-+-+-+ d +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ 118 2a 6
|
||
|
- l l |s|o|m|e| |d|a|t|a|s|e|t|s| rt3 |c|o|m|b|i|n|e| |m|a|c|h|i|n|i|c| k f e
|
||
|
d i i 1 e , h +-+-+-+-+ +-+-+-+-+-+-+-+-+ 5 +-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ i % _e r
|
||
|
_ f oi e u s dt y +-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+-+ i n9 7 o
|
||
|
f f 5 h l9 a a b n |l|o|g|i|c| |w|i|t|h| |h|u|m|a|n| s n 79 e if e 0
|
||
|
s i ln 6t a y t | ’7 / h +-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+-+ 1 - 1n
|
||
|
s yn p p r oe xy +-+-+-+-+-+ c n d 6 _i a n
|
||
|
- n iu a v s, d o 7 eu e i |l|o|g|i|c| e as d m 2 v|h - | r
|
||
|
aL t5 l7 st A c S r c n r / +-+-+-+-+-+ tt o dr | V
|
||
|
s 9 +-+-+-+-+-+-+ +-+-+-+-+ d 7 + 5 77 2 t
|
||
|
z l x n |m|o|d|e|l|s| |t|h|a|t| d i n oS ad + a a a . _ t
|
||
|
ie 7 n n +-+-+-+-+-+-+ +-+-+-+-+ is r t 9 , | f 4 4 a t
|
||
|
8 - 8 e +-+-+-+-+-+-+-+ 1 o 8 h h + t
|
||
|
s +m tb rh f 5 6r |r|e|q|u|i|r|e| s o l2 2 | + s o n
|
||
|
a - rr o n +-+-+-+-+-+-+-+ m | o y 4 r _
|
||
|
5 i +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+ d |m ? e
|
||
|
b 4 _ l ` |s|u|p|e|r|v|i|s|i|o|n| |m|u|l|t|i|p|l|y| |t|h|e| - s n 7 1
|
||
|
Tn n - +-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+ d 5
|
||
|
ls t v 3i . - 6 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ h _ 28 9f
|
||
|
4 s i h s- 4 4 l i |s|u|b|j|e|c|t|i|v|i|t|i|e|s| e a u
|
||
|
t + 9 fh lh,d +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 6 c 8
|
||
|
3 r c i 1 +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ p -
|
||
|
fn o |m|o|d|e|l|s| c |p|r|o|p|a|g|a|t|e| |w|h|a|t| + 5 M 4
|
||
|
5 r g +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+ +-+-+-+-+ i t f
|
||
|
9 t i y +-+-+-+-+-+-+-+ +-+-+-+-+ sv 7
|
||
|
6r +e n t7 + A h |t|h|e|y|'|v|e| |b|e|e|n| o 45 6
|
||
|
m s t 9 o o _ s +-+-+-+-+-+-+-+ +-+-+-+-+ t o+ u e
|
||
|
s k8 3 l 2 - e +-+-+-+-+-+-+ e 6 e- t -
|
||
|
+ es n 5 e o 4 |t|a|u|g|h|t| s 9
|
||
|
t p e w , : o - +-+-+-+-+-+-+ t t 3
|
||
|
e 6 r 8 t +-+-+-+-+ +-+-+ +-+-+-+ a eo m m 3
|
||
|
e |s|o|m|e| |o|f| |t|h|e| + h e c
|
||
|
ee +-+-+-+-+ +-+-+ +-+-+-+ c h
|
||
|
o +-+-+-+-+-+-+-+-+ +-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+ +-+-+
|
||
|
i k t |d|a|t|a|s|e|t|s| |p|a|s|s| |a|s| |d|e|f|a|u|l|t| |i|n| o o o
|
||
|
+-+-+-+-+-+-+-+-+ i +-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+ +-+-+ r d
|
||
|
a i m a . 1 +-+-+-+ +-+-+-+-+-+-+-+ s u
|
||
|
r h o 2 |t|h|e| |m|a|c|h|i|n|e| l t
|
||
|
+ e a +-+-+-+ +-+-+-+-+-+-+-+ d 7 |
|
||
|
e a eo 4 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+
|
||
|
h n |l|e|a|r|n|i|n|g| |f|i|e|l|d| s n
|
||
|
t _s n +-+-+-+-+-+-+-+-+ +-+-+-+-+-+
|
||
|
t n o +-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+-+-+ e V
|
||
|
a d |h|u|m|a|n|s| |g|u|i|d|e| |m|a|c|h|i|n|e|s| u n
|
||
|
+-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|
||
|
c e 5 1 2
|
||
|
r 6 r n 6 f
|
||
|
l o l
|
||
|
|
||
|
28
|
||
|
% V V V V V V V % V % %% % %%% %% %% % %%% %%% % % %%
|
||
|
V V V V V V V V V V V V V V V V % % % % % %% 0 %% 0 % % % % % %%% %
|
||
|
V V V V V V V V V % % %% % % 0 0 % % % %
|
||
|
% % % % % % % % % % 00 0 _ % % % % % %% % %
|
||
|
% % % 0 /_\ _ __ % %
|
||
|
% INFORMANTS % % % //_\\| '_ \ % 0
|
||
|
% % % % % 0 % % 0 / _ \ | | | % % % %%
|
||
|
% % % 0 \_/ \_/_| |_| 0 0
|
||
|
V V V V V % V V V % __ _ _ 00 % 00 0 _ %
|
||
|
V V V V V V V V V V V V V V V V 0 /__\ |_| |__ _ __ ___ __ _ _ __ __ _ _ __ | |__ _ _
|
||
|
V V V V V V V V V /_\ | __| '_ \| '_ \ / _ \ / _` | '__/ _` | '_ \| '_ \| | | | %
|
||
|
V V V V V V V V //__ | |_| | | | | | | (_) | (_| | | | (_| | |_) | | | | |_| |
|
||
|
V V V V V V V V V V V V V V V V % \__/ \__|_| |_|_| |_|\___/ \__, |_| \__,_| .__/|_| |_|\__, |
|
||
|
V V V V % V V V V V 0 0 % 0 % |___/ |_| 0 |___/
|
||
|
% 0 0 __ 0 ___ % _ _ 0 %
|
||
|
Machine learning algorithms need ___ / _| / \__ _| |_ __ _ ___ ___| |_ ___
|
||
|
guidance, whether they are super- 0 / _ \| |_ 0 / /\ / _` | __/ _` / __|/ _ \ __/ __| %
|
||
|
vised or not. In order to separate | (_) | _| / /_// (_| | || (_| \__ \ __/ |_\__ \
|
||
|
one thing from another, they need \___/|_| /___,' \__,_|\__\__,_|___/\___|\__|___/ % %
|
||
|
material to extract patterns from. 0 0 0
|
||
|
One should carefully choose the % %
|
||
|
study material, and adapt it to the by Algolit
|
||
|
machine's task. It doesn't make
|
||
|
sense to train a machine with nine- We often start the monthly Algolit meetings by searching for
|
||
|
teenth-century novels if its mis- datasets or trying to create them. Sometimes we use already-ex-
|
||
|
sion is to analyse tweets. A badly isting corpora, made available through the Natural Language
|
||
|
written textbook can lead a student Toolkit (NLTK). NLTK contains, among others, The Universal Declara-
|
||
|
to give up on the subject altogeth- tion of Human Rights, inaugural speeches from US presidents, or
|
||
|
er. A good textbook is preferably movie reviews from the popular site Internet Movie Database
|
||
|
not a textbook at all. (IMDb). Each style of writing will conjure different relations
|
||
|
% between the words and will reflect the moment in time from which
|
||
|
This is where the dataset comes in: they originate. The material included in NLTK was selected be-
|
||
|
arranged as neatly as possible, or- cause it was judged useful for at least one community of re-
|
||
|
ganized in disciplined rows and searchers. In spite of specificities related to the initial con-
|
||
|
lined-up columns, waiting to be text of each document, they become universal documents by de-
|
||
|
read by the machine. Each dataset fault, via their inclusion into a collection of publicly avail-
|
||
|
collects different information %    able corpora. In this sense, the Python package for natu-
|
||
|
about the world, and like all col- ral language processing could be regarded as a time capsule. The
|
||
|
lections, they are imbued with col- main reason why The Universal Declaration for Human Rights was
|
||
|
lectors' bias. You will hear this   main reason why The Universal Declaration of Human Rights was
|
||
|
expression very often: 'data is the tions, but it also paints a picture of the types of human writing
|
||
|
new oil'. If only data were more that algorithms train on.
|
||
|
like oil! Leaking, dripping and
|
||
|
heavy with fat, bubbling up and With this work, we look at the datasets most commonly used by
|
||
|
jumping unexpectedly when in con- data scientists to train machine algorithms. What material do
|
||
|
tact with new matter. Instead, data they consist of? Who collected them? When?
|
||
|
is supposed to be clean. With each
|
||
|
process, each questionnaire, each --- %
|
||
|
column title, it becomes cleaner
|
||
|
and cleaner, chipping distinct % Concept & execution: Cristina Cochior
|
||
|
characteristics until it fits the %
|
||
|
mould of the dataset. % %
|
||
|
0 0 00 0
|
||
|
Some datasets combine the machinic 0 0 0 0
|
||
|
logic with the human logic. The __ __ _ _
|
||
|
models that require supervision 0 / / /\ \ \ |__ ___ __ _(_)_ __ ___
|
||
|
multiply the subjectivities of both 0 \ \/ \/ / '_ \ / _ \ \ \ /\ / / | '_ \/ __|
|
||
|
data collectors and annotators, \ /\ /| | | | (_) | \ V V /| | | | \__ \
|
||
|
then propagate what they've been 0 \/ \/ |_| |_|\___/ \_/\_/ |_|_| |_|___/
|
||
|
taught. You will encounter some of 0 0 0 0 0
|
||
|
the datasets that pass as default
|
||
|
in the machine learning field, as Who wins: creation of relationships
|
||
|
well as other stories of humans
|
||
|
guiding machines. by Louise Dekeuleneer, student Arts²/Section Visual Communication
|
||
|
|
||
|
French is a gendered language. Indeed many words are female or
|
||
|
male and few are neutral. The aim of this project is to show that
|
||
|
a patriarchal society also influences the language itself.
|
||
|
|
||
|
29
|
||
|
The work focused on showing whether more female or male words are
|
||
|
% % %%% % %% % used, and on highlighting the influence of context on the gender of %%%%%
|
||
|
% % % % % % words. At this stage, no conclusions have yet been drawn. %
|
||
|
% % % % %% % % % % % % % % % % %
|
||
|
% %% Law texts from 1900 to 1910 made available by the Mundaneum have
|
||
|
% % %% % % been passed into an algorithm that turns the text into a list of %
|
||
|
%% % % % words. These words are then compared with another list of French %
|
||
|
% % % % % words, which specifies whether each word is male or female.
|
||
|
This list of words comes from Google Books. They created a huge
|
||
|
% % % % database in 2012 from all the books scanned and available on
|
||
|
% Google Books. % %
|
||
|
% % % % % % % %
|
||
|
Male words are highlighted in one colour and female words in an-
|
||
|
% % % % other. Words that are not gendered (adverbs, verbs, etc.) are not
|
||
|
% % % highlighted. All this is saved as an HTML file so that it can be
|
||
|
% % directly opened in a web page and printed without the need for
|
||
|
% additional layout. This is how each text becomes a small booklet
|
||
|
by just changing the input text of the algorithm.
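
A minimal sketch of the procedure described above; the tiny gender
lexicon stands in for the Google Books word list and the colours
are arbitrary.

    # Hedged sketch: wrap gendered French words in coloured HTML
    # spans and write the result to a file that opens in a browser.
    import re

    GENDER = {"loi": "f", "citoyen": "m", "justice": "f"}  # stand-in
    COLOUR = {"f": "crimson", "m": "royalblue"}

    def highlight(text):
        parts = []
        for token in re.findall(r"\w+|\W+", text):
            gender = GENDER.get(token.lower())
            if gender:
                parts.append('<span style="color:%s">%s</span>'
                             % (COLOUR[gender], token))
            else:
                parts.append(token)
        return "".join(parts)

    with open("booklet.html", "w", encoding="utf-8") as out:
        out.write(highlight("La loi protège le citoyen."))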
|
||
|
|
||
|
%
|
||
|
0 % 0 0 0
|
||
|
0 0 0 %
|
||
|
_____ _ 0 0
|
||
|
% 0 0 /__ \ |__ ___ % 0
|
||
|
% / /\/ '_ \ / _ \ 0 %
|
||
|
0 / / | | | | __/ 0
|
||
|
% 0 0 0 \/ |_| |_|\___|
|
||
|
% 0 _ 0 0 _ _
|
||
|
/_\ _ __ _ __ ___ | |_ __ _| |_ ___ _ __
|
||
|
//_\\| '_ \| '_ \ / _ \| __/ _` | __/ _ \| '__|
|
||
|
/ _ \ | | | | | | (_) | || (_| | || (_) | | 0
|
||
|
\_/ \_/_| |_|_| |_|\___/ \__\__,_|\__\___/|_|
|
||
|
0 0
|
||
|
%
|
||
|
by Algolit
|
||
|
|
||
|
The Annotator asks visitors for guidance in annotating
|
||
|
the archive of the Mundaneum.
|
||
|
|
||
|
The annotation process is a crucial step in supervised machine
|
||
|
learning where the algorithm is given examples of what it needs
|
||
|
to learn. A spam filter in training will be fed examples of spam
|
||
|
% and real messages. These examples are entries, or rows from the
|
||
|
dataset with a label, spam or non-spam.
|
||
|
|
||
|
The labelling of a dataset is work executed by humans: they pick
|
||
|
a label for each row of the dataset. To ensure the quality of the
|
||
|
% labels, multiple annotators see the same row and have to give the
|
||
|
same label before an example is included in the training data.
|
||
|
Only when enough samples of each label have been gathered in the
|
||
|
dataset can the computer start the learning process.
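
A small sketch of that agreement rule, with hypothetical texts and
labels; only unanimously labelled rows enter the training set.

    # Hedged sketch: keep a text for training only when all of its
    # annotators picked the same label.
    annotations = {
        "text_001": ["2", "2", "2"],   # all agree: kept
        "text_002": ["3", "8", "3"],   # disagreement: set aside
    }
    training_set = {text: labels[0]
                    for text, labels in annotations.items()
                    if len(set(labels)) == 1}
    print(training_set)   # {'text_001': '2'}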
|
||
|
|
||
|
In this interface we ask you to help us classify the cleaned
|
||
|
texts from the Mundaneum archive to expand our training set and
|
||
|
improve the quality of the installation 'Classifying the World'
|
||
|
in Oracles.
|
||
|
|
||
|
---
|
||
|
|
||
|
Concept, code, interface: Gijs de Heij
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
30
|
||
|
%% % % %% % % % % %
|
||
|
% %% % % 0 0 0 0 0 0 % % %
|
||
|
% % % % % 0 0 0 0 % % % % %
|
||
|
% % % % % 0 0 _ ___ ___ ___ 00 %% %
|
||
|
% % % % 0 0 / |/ _ \ / _ \ / _ \ 0
|
||
|
%% % % 0 0 | | | | | | | | | | | %
|
||
|
% % % % 0 | | |_| | |_| | |_| | %% %
|
||
|
% % |_|\___/ \___/ \___/ 00 0
|
||
|
% % % % 00 0 0 0 0 _ 00 % % %
|
||
|
% % % ___ _ _ _ __ ___ ___| |_ ___
|
||
|
% % / __| | | | '_ \/ __|/ _ \ __/ __| %
|
||
|
% % %% 0 0 \__ \ |_| | | | \__ \ __/ |_\__ \ % %
|
||
|
0 0 % |___/\__, |_| |_|___/\___|\__|___/
|
||
|
0 %% 0 |___/ % % 0 %
|
||
|
0 0 0 0 __ _ % 0 _ 0 % %
|
||
|
0 0 / /\ /(_)_ __ _ _| |
|
||
|
0 | |\ \ / / | '_ \| | | | |
|
||
|
0 % | | \ V /| | | | | |_| | | 0 0
|
||
|
% | | \_/ |_|_| |_|\__, |_| %
|
||
|
% % 00 \_\ 0 |___/ 0
|
||
|
% % % __ _ _ _ _ % __ 0
|
||
|
0 0 % /__\_| (_) |_(_) ___ _ __\ \
|
||
|
% /_\/ _` | | __| |/ _ \| '_ \| | 0
|
||
|
% //_| (_| | | |_| | (_) | | | | |
|
||
|
0 \__/\__,_|_|\__|_|\___/|_| |_| | 0
|
||
|
% % 00 0 0 /_/
|
||
|
0 0 00
|
||
|
%
|
||
|
by Algolit
|
||
|
|
||
|
Created in 1985, WordNet is a hierarchical taxonomy that de-
|
||
|
scribes the world. It was inspired by theories of human semantic
|
||
|
% memory developed in the late 1960s. Nouns, verbs, adjectives and
|
||
|
adverbs are grouped into synonym sets or synsets, each expressing a
|
||
|
different concept. %
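
A brief sketch of how these synsets can be inspected with NLTK's
WordNet interface, assuming the wordnet data has been downloaded.

    # Hedged sketch: list the synonym sets that WordNet groups
    # around one noun, together with their definitions.
    import nltk
    nltk.download("wordnet", quiet=True)
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets("oracle", pos=wn.NOUN):
        print(synset.name(), "-", synset.definition())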
|
||
|
|
||
|
ImageNet is an image dataset based on the WordNet 3.0 nouns hier-
|
||
|
archy. Each synset is depicted by thousands of images. From 2010 %
|
||
|
until 2017, the ImageNet Large Scale Visual Recognition Challenge
|
||
|
(ILSVRC) was a key benchmark in object category classification
|
||
|
% for pictures, having a major impact on software for photography,
|
||
|
image searches, image recognition.
|
||
|
|
||
|
1000 synsets (Vinyl Edition) contains the 1000 synsets used in
|
||
|
this challenge recorded in the highest sound quality that this
|
||
|
% analog format allows. This work highlights the importance of the
|
||
|
datasets used to train artificial intelligence (AI) models that
|
||
|
run on devices we use on a daily basis. Some of them inherit
|
||
|
classifications that were conceived more than 30 years ago. This
|
||
|
sound work is an invitation to thoughtfully analyse them.
|
||
|
|
||
|
---
|
||
|
|
||
|
Concept & recording: Javier Lloret
|
||
|
|
||
|
Voices: Sara Hamadeh & Joseph Hughes
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
31
|
||
|
CONTEXTUAL STORIES
|
||
|
ABOUT INFORMANTS
|
||
|
|
||
|
|
||
|
|
||
|
--- Datasets as representations --- community you try to distinguish what serves the
|
||
|
community and what doesn't and you try to general-
|
||
|
The data-collection processes that lead to the ize that, because I think that's what the good
|
||
|
creation of the dataset raise important questions: faith-bad faith algorithm is trying to do, to find
|
||
|
who is the author of the data? Who has the privi- helper tools to support the project, you do that
|
||
|
lege to collect? For what reason was the selection on the basis of a generalization that is on the
|
||
|
made? What is missing? abstract idea of what Wikipedia is and not on the
|
||
|
living organism of what happens every day. What
|
||
|
The artist Mimi Onuoha gives a brilliant example interests me in the relation between vandalism and
|
||
|
of the importance of collection strategies. She debate is how we can understand the conventional
|
||
|
chose the case of statistics related to hate drive that sits in these machine-learning pro-
|
||
|
crimes. In 2012, the FBI Uniform Crime Reporting cesses that we seem to come across in many places.
|
||
|
(UCR) Program registered almost 6000 hate crimes And how can we somehow understand them and deal
|
||
|
committed. However, the Department of Justice’s with them? If you place your separation of good
|
||
|
Bureau of Statistics came up with about 300.000 faith-bad faith on pre-existing labelling and then
|
||
|
reports of such cases. That is over 50 times as reproduce that in your understanding of what edits
|
||
|
many. The difference in numbers can be explained are being made, how then to take into account
|
||
|
by how the data was collected. In the first situa- movements that are happening, the life of the ac-
|
||
|
tion law enforcement agencies across the country tual project?
|
||
|
voluntarily reported cases. For the second survey,
|
||
|
the Bureau of Statistics distributed the National Amir: It's an interesting discussion. Firstly,
|
||
|
Crime Victimization form directly to the homes of what we are calling good faith and bad faith comes
|
||
|
victims of hate crimes. from the community itself. We are not doing la-
|
||
|
belling for them, they are doing labelling for
|
||
|
In the field of Natural Language Processing (NLP) themselves. So, in many different language
|
||
|
the material that machine learners work with is Wikipedias, the definition of what is good faith
|
||
|
text-based, but the same questions still apply: and what is bad faith will differ. Wikimedia is
|
||
|
who are the authors of the texts that make up the trying to reflect what is inside the organism and
|
||
|
dataset? During what period were the texts col- not to change the organism itself. If the organism
|
||
|
lected? What type of worldview do they represent? changes, and we see that the definition of good
|
||
|
faith and helping Wikipedia has been changed, we
|
||
|
In 2017, Google's Top Stories algorithm pushed a are implementing this feedback loop that lets
|
||
|
thread of 4chan, a non-moderated content website, people from inside their community pass judgement
|
||
|
to the top of the results page when searching for on their edits and if they disagree with the la-
|
||
|
the Las Vegas shooter. The name and portrait of an belling, we can go back to the model and retrain
|
||
|
innocent person were linked to the terrible crime. the algorithm to reflect this change. It's some
|
||
|
Google changed its algorithm just a few hours af- sort of closed loop: you change things and if
|
||
|
ter the mistake was discovered, but the error had someone sees there is a problem, then they tell us
|
||
|
already affected the person. The question is: why and we can change the algorithm back. It's an on-
|
||
|
did Google not exclude 4chan content from the going project.
|
||
|
training dataset of the algorithm?
|
||
|
Reference
|
||
|
Reference https://gitlab.constantvzw.org/algolit/algolit/blob/
|
||
|
https://points.datasociety.net/the-point-of- master/algoliterary_encounter/Interview%20with%20Amir
|
||
|
collection-8ee44ad7c2fa
|
||
|
|
||
|
https://arstechnica.com/information-technology --- How to make your dataset known ---
|
||
|
/2017/10/google-admits-citing-4chan-to-spread-
|
||
|
fake-vegas-shooter-news/ NLTK stands for Natural Language Toolkit. For pro-
|
||
|
grammers who process natural language using
|
||
|
Python, this is an essential library to work with.
|
||
|
--- Labeling for an Oracle that Many tutorial writers recommend machine learning
|
||
|
detects vandalism on Wikipedia --- learners to start with the inbuilt NLTK datasets.
|
||
|
It comprises 71 different collections, with a to-
|
||
|
This fragment is taken from an interview with Amir tal of almost 6000 items.
|
||
|
Sarabadani, software engineer at Wikimedia. He was
|
||
|
in Brussels in November 2017 during the Algoliter- There is for example the Movie Review corpus for
|
||
|
ary Encounter. sentiment analysis. Or the Brown corpus, which was
|
||
|
put together in the 1960s by Henry Kučera and W.
|
||
|
Femke: If you think about Wikipedia as a living Nelson Francis at Brown University in Rhode Is-
|
||
|
community, with every edit the project changes. land. There is also the Declaration of Human
|
||
|
Every edit is somehow a contribution to a living Rights corpus, which is commonly used to test
|
||
|
organism of knowledge. So, if from within that whether the code can run on multiple languages.
|
||
|
|
||
|
32
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
The corpus contains the Declaration of Human Rights In fact, at the beginning of Wikipedia,
|
||
|
expressed in 372 languages from around the world. many articles were written by bots.
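
These built-in collections can be loaded directly; a minimal
sketch, assuming the corpora have been fetched with nltk.download.

    # Hedged sketch: load three of the built-in NLTK corpora named
    # in this story and peek at their contents.
    import nltk
    for name in ("movie_reviews", "brown", "udhr"):
        nltk.download(name, quiet=True)
    from nltk.corpus import movie_reviews, brown, udhr

    print(movie_reviews.categories())   # ['neg', 'pos']
    print(brown.words()[:10])
    print(udhr.fileids()[:5])   # one file per language and encoding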
|
||
|
Rambot, for example, was a controversial bot
|
||
|
But what is the process of getting a dataset ac- figure on the English-speaking platform.
|
||
|
cepted into the NLTK library nowadays? On the It authored 98 per cent of the pages de-
|
||
|
Github page, the NLTK team describes the following scribing US towns.
|
||
|
requirements:
|
||
|
As a result of serial and topical robot interven-
|
||
|
- Only contribute corpora that have obtained a ba- tions, the models that are trained on the full
|
||
|
sic level of notability. That means, there is a Wikipedia dump have a unique view on composing ar-
|
||
|
publication that describes it, and a community of ticles. For example, a topic model trained on all
|
||
|
programmers who are using it. of Wikipedia articles will associate 'river' with
|
||
|
- Ensure that you have permission to redistribute 'Romania' and 'village' with 'Turkey'. This is be-
|
||
|
the data, and can document this. This means that cause there are over 10000 pages written about
|
||
|
the dataset is best published on an external web- villages in Turkey. This should be enough to spark
|
||
|
site with a licence. anyone's desire for a visit, but it is far too
|
||
|
- Use existing NLTK corpus readers where possible, much compared to the number of articles other
|
||
|
or else contribute a well-documented corpus reader countries have on the subject. The asymmetry
|
||
|
to NLTK. This means, you need to organize your causes a false correlation and needs to be re-
|
||
|
data in such a way that it can be easily read us- dressed. Most models try to exclude the work of
|
||
|
ing NLTK code. these prolific robot writers.
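The third NLTK requirement, using an existing corpus reader, can be imagined in a few lines of Python. This is a minimal, hypothetical sketch: the folder name and file pattern are invented, and it simply points NLTK's plain text reader at whatever files you have.

from nltk.corpus.reader.plaintext import PlaintextCorpusReader

# Point an existing NLTK reader at a (hypothetical) folder of plain text files.
corpus = PlaintextCorpusReader('my_dataset/', r'.*\.txt')
print(corpus.fileids())        # the files the reader has found
print(corpus.words()[:20])     # the first twenty tokens, read with standard NLTK code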
|
||
|
|
||
|
Reference
|
||
|
--- Extract from a positive IMDb https://blog.lateral.io/2015/06/the-unknown-
|
||
|
movie review from the NLTK dataset --- perils-of-mining-wikipedia/
|
||
|
|
||
|
corpus: NLTK, movie reviews
|
||
|
|
||
|
fileid: pos/cv998_14111.txt
|
||
|
|
||
|
steven spielberg ' s second epic film on world war
|
||
|
ii is an unquestioned masterpiece of film . spiel-
|
||
|
berg , ever the student on film , has managed to
|
||
|
resurrect the war genre by producing one of its
|
||
|
grittiest , and most powerful entries . he also
|
||
|
managed to cast this era ' s greatest answer to
|
||
|
jimmy stewart , tom hanks , who delivers a perfor-
|
||
|
mance that is nothing short of an astonishing mir-
|
||
|
acle . for about 160 out of its 170 minutes , "
|
||
|
saving private ryan " is flawless . literally .
|
||
|
the plot is simple enough . after the epic d - day
|
||
|
invasion ( whose sequences are nothing short of
|
||
|
spectacular ) , capt . john miller ( hanks ) and
|
||
|
his team are forced to search for a pvt . james
|
||
|
ryan ( damon ) , whose brothers have all died in
|
||
|
battle . once they find him , they are to bring
|
||
|
him back for immediate discharge so that he can go
|
||
|
home . accompanying miller are his crew , played
|
||
|
with astonishing perfection by a group of charac-
|
||
|
ter actors that are simply sensational . barry
|
||
|
pepper , adam goldberg , vin diesel , giovanni
|
||
|
ribisi , davies , and burns are the team sent to
|
||
|
find one man , and bring him home . the battle se-
|
||
|
quences that bookend the film are extraordinary .
|
||
|
literally .
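The review quoted above can be retrieved with a few lines of Python, assuming the movie_reviews corpus has been downloaded through NLTK's data installer. A minimal sketch:

import nltk
nltk.download('movie_reviews')             # fetch the corpus once, if it is not yet present
from nltk.corpus import movie_reviews

fileid = 'pos/cv998_14111.txt'
print(movie_reviews.raw(fileid)[:120])     # the opening of the review quoted above
print(movie_reviews.categories(fileid))    # ['pos'], the label it was filed under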
|
||
|
|
||
|
|
||
|
--- The ouroboros of machine learning ---
|
||
|
|
||
|
Wikipedia has become a source for learning not
|
||
|
only for humans, but also for machines. Its arti-
|
||
|
cles are prime sources for training models. But
|
||
|
very often, the material the machines are trained
|
||
|
on is the same content that they helped to write.
|
||
|
|
||
|
33
|
||
|
0 12 3 4 5 67 8 9 0
|
||
|
12 3 4 5 67 8 9 0 12
|
||
|
3 4 5 67 8 9 0 1 2
|
||
|
3 4 5 6 7 8 9 0 1 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 0 1 2 3 4
|
||
|
5 6 7 8 9 0 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 4 5 6
|
||
|
7 8 9 0 1 2 3 4 5 6
|
||
|
7 89 0 1 2 34 5 6
|
||
|
7 89 0 1 2 34 5 6
|
||
|
7 89 0 1 2 34 5 6 7
|
||
|
89 0 1 2 3 4 5 6 7 8 9
|
||
|
0 1 2 3 4 5 6 78 9
|
||
|
0 1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 2 3 4 5 6 7 8 9 0
|
||
|
1 2 3 4 5 67 8 9 0 12
|
||
|
3 4 5 67 8 9 0 12
|
||
|
3 4 5 67 8 9 0 12 3
|
||
|
4 5 6 7 8 9 0 1 2 3
|
||
|
4 5 6 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3
|
||
|
4 56 7 8 9 01 2 3 4
|
||
|
5 6 7 8 9 0 1 2 3 4 5 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 4 5 6 7
|
||
|
8 9 0 1 2 3 4 5 6 7
|
||
|
89 0 1 2 34 5 6 7
|
||
|
89 0 1 2 34 5 6 7 89
|
||
|
0 1 2 34 5 6 7 8 9
|
||
|
0 1 2 3 4 5 6 7 8 9
|
||
|
0 1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 7 8 9 0
|
||
|
1 2 3 4 5 6 7 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 6 7 8 9 0 1 2 3
|
||
|
4 5 6 7 8 9 01 2 3 4
|
||
|
56 7 8 9 01 2 3 4
|
||
|
56 7 8 9 01 2 3 4 5
|
||
|
6 7 8 9 0 1 2 3 4 5 6
|
||
|
7 8 9 0 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6
|
||
|
7 8 90 1 2 3 45 6 7
|
||
|
8 9 0 1 2 3 4 5 6 7
|
||
|
8 9 0 1 2 34 5 6 7 89
|
||
|
0 1 2 34 5 6 7 89 0
|
||
|
1 2 34 5 6 7 89 0
|
||
|
1 2 34 5 6 7 8 9 0
|
||
|
1 2 3 4 5 6 7 8 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0
|
||
|
1 23 4 5 6 78 9 0 1
|
||
|
2 3 4 5 6 7 8 9 0 1 2 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
4 5 67 8 9 0 12 3
|
||
|
34
|
||
|
readers read readers read readers read readers read readers read readers read readers re
|
||
|
d readers read readers read readers read readers read readers re
|
||
|
d readers read readers read readers read readers read
|
||
|
readers read readers read readers read re
|
||
|
ders read readers read readers read readers re
|
||
|
d readers read readers read readers r
|
||
|
ad readers read readers read
|
||
|
readers read readers read readers read
|
||
|
readers read readers read
|
||
|
readers read readers read readers read
|
||
|
readers read readers read
|
||
|
readers read readers read
|
||
|
readers read readers read
|
||
|
readers read readers read
|
||
|
readers read readers read
|
||
|
readers read readers read
|
||
|
readers read readers
|
||
|
read readers read
|
||
|
readers read readers read
|
||
|
readers read readers read
|
||
|
readers read
|
||
|
readers read readers read
|
||
|
readers read
|
||
|
readers read readers read
|
||
|
readers read
|
||
|
readers read readers read
|
||
|
readers read
|
||
|
readers read readers re
|
||
|
d readers read
|
||
|
readers read
|
||
|
readers read readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read re
|
||
|
ders read readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read readers r
|
||
|
ad readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read readers
|
||
|
read readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read
|
||
|
readers read r
|
||
|
35
|
||
|
h a o e f rtlt9 b9r+t +-+-+-+-+-+-+-+ n +-+-+-+-+ aM B 6 r fwea5I s s ,e -h e e
|
||
|
m et u t w8 8+ i4 + R w e |r|e|a|d|e|r|s| f |r|e|a|d| C a r_ n b - i1 a s- noh6M+ pha
|
||
|
h a% 8 e olt r_ m c hb8 b +-+-+-+-+-+-+-+ mi +-+-+-+-+ pli f ro u n ae 3aee d oo| 3h 6o
|
||
|
2 ce 'd | 8 eA s d8 - i 6 1 %6 sr2 9 g2 a s lia wrc 3 ?7 i n3+7m s
|
||
|
c htiuw :ead 7 _ 9r t i d 5 sau4nl |e_ ar 8orl t h h+se a s _o1 s56 ka5n1e no hd
|
||
|
d m u 's +e | h64t +-+ +-+-+-+-+-+-+-+-+ o +-+-+-+-+-+-+-+-+-+-+-+ enl o 3 t d Ad- 2 ahs
|
||
|
g o i 0 _ 5o ss x 4 |a| |c|o|m|p|u|t|e|r| sl |u|n|d|e|r|s|t|a|n|d|s| 4i 8 trdiM 48 i5 2 9
|
||
|
tl e ri 6 9 ln a /8e +-+ +-+-+-+-+-+-+-+-+ 6 x +-+-+-+-+-+-+-+-+-+-+-+ 4 \eda o |y A o3 /1
|
||
|
e _ en l r 7 -sd c o +-+-+-+ +-+-+-+-+-+-+ l +-+-+-+-+-+-+-+-+-+ d6 m7n n a np l4 s
|
||
|
7 t p e M fdh c as |a|l|l| |m|o|d|e|l|s| Sa |t|r|a|n|s|l|a|t|e| a 6 w da 5 - o4 5 i )
|
||
|
r l a nn sh fc ui e7 +-+-+-+ +-+-+-+-+-+-+ c a +-+-+-+-+-+-+-+-+-+ ar 9 r , e a 3 , i
|
||
|
4 r 2 t +-+-+-+-+ +-+-+-+-+-+-+ 72 +-+-+-+-+-+ p r s r a a h an ' 3 a
|
||
|
o p ft n l |s|o|m|e| |m|o|d|e|l|s| |c|o|u|n|t| 8r n| 1 a r h o /oa e 7
|
||
|
m8 4 wa +-+-+-+-+ +-+-+-+-+-+-+ l 7 +-+-+-+-+-+ 2 or r i 9e 4 p142 ,6r
|
||
|
l 4N i u-3 am +-+-+-+-+ +-+-+-+-+-+-+ 4s +-+-+-+-+-+-+-+ 23 a e rea le dhVo t74 g
|
||
|
j 7 t o e rd |s|o|m|e| |m|o|d|e|l|s| |r|e|p|l|a|c|e| o -i no r + 2 r l i
|
||
|
o 6 7g i tt i +-+-+-+-+ +-+-+-+-+-+-+ 8fa +-+-+-+-+-+-+-+ x7 e g o ee d +ni
|
||
|
d i tr 6k t r 2 3a8 9 i3 5 hv7 ge 5e u - 3y a _ e 2 8 c
|
||
|
55fi1 - 6 :29 t e al+ atp43e + ac t n b t hTsa4ti03 o% % flol 4-e
|
||
|
rf m r 8 6y heta 1 e 1 m6 +t dy p e 9 n ,o 5 / n _ | s e1 + ni d
|
||
|
n 3 leo 5 ti 5 - sc a +1 w uw9 n+ e i m m
|
||
|
3 a a a 9 \ -8 18 e e l i e h ghc ey9 8 15 3y a 1 -e i 5a i 9r a5pe
|
||
|
o c c % a + 255 t yy m % 4i i 5 i e t _ 7 au l% 7 o
|
||
|
g s8 5 e 2 r 3i 2 1 _ i4ir 2 e l s 1la n s s ht 2 r s i 3 r
|
||
|
u s+ a e m + 6 2n r-l a c6 - t 7 4t +i +r % 8 6 8 r t t r 3 1
|
||
|
r s 90 k hl a pWn e i5 7 8 a r e4ro e r5wt s m
|
||
|
- h ea 6 2 8 2 v h nf e _ w lr a iai 7
|
||
|
| j 4 4 f hc i F 9 p s m toG al 6 / h sde l e
|
||
|
a 4 s 6 9 - h o m 6 _l34 . % w7 e 8 e l
|
||
|
n .52- i 7 5 _ r + s 5 p s 5n+ 3 il e 1 o F c
|
||
|
3 l 2 a o en% _. e 4 8lb 3 r a I 9 k o
|
||
|
t r 6 e + 2 6 y oa n i r% f 1 n78 s h F o
|
||
|
e g v 6 u h ad Ua1 2 a t 9 er n t oh7 s s r t g
|
||
|
+ 7 6 h8 t 7 a - m 73| t o e r i 7
|
||
|
f l ia s _ e u + 7 ct \ a _ 2- 7 . o o - ,
|
||
|
t n 0n 4+ f 2r i 9 s y i3 r t r s e a p m h 4
|
||
|
a c 7 t 9 n n m mro t s i nd e r
|
||
|
a 1 e e | e 1 3 c n k 2 p e o e
|
||
|
7i s d 6 a 48 c + Dl 1 1 n r - 0
|
||
|
V r + a o % 7 7 9r 4 | 9 n 7 e
|
||
|
e n | , m n e s s 1 e n 5
|
||
|
5 r 4 o 5 1 6 e - 2 a -r _ e s’1 e S i
|
||
|
t 2 +|ee s e c n an i e
|
||
|
a4 9 9 o p _ t 7 h v 9 0
|
||
|
d % a e , s nr 9 l W h a e t | + + s
|
||
|
a 3 7I a e tk K y3e 2 c - a h o u e d
|
||
|
\+ o 1 h r d t e nl 4 k 9 07 o t v 7s
|
||
|
, n e % _x | i t b1 r h ei
|
||
|
t a8 e o n t 12 o rs a y
|
||
|
i e + n a | a 9 \
|
||
|
n sr - e 3 i r- 8o e i
|
||
|
6 f i 3 ht a l | h 1 o
|
||
|
a s df m5 i h n i 9n ,u
|
||
|
d c n H s o l c i 5
|
||
|
o | s m rl 9 1 n c _i e
|
||
|
i + i nr 8 h % t a % t 0 m
|
||
|
i 6 c6 wt a r
|
||
|
g s pr l t a 5 | c i |
|
||
|
e 1 sr/ n e 7 e 9 n t w e c '
|
||
|
m c - o % n . a 3
|
||
|
f1 c I u 9 + t
|
||
|
2 . , 4 na P e e f 2
|
||
|
n i t 1S f n n a i e
|
||
|
r + e i h 9 _ v
|
||
|
3 | h e t s a
|
||
|
s E l v - p u 1 h 2 , ' 5
|
||
|
| + nse t a % 8 e w
|
||
|
o p n y o s o
|
||
|
|
||
|
36
|
||
|
READERS

We communicate with computers through language. We click on icons that have a description in words, we tap words on keyboards, use our voice to give them instructions. Sometimes we trust our computer with our most intimate thoughts and forget that they are extensive calculators. A computer understands every word as a combination of zeros and ones. A letter is read as a specific ASCII number: capital 'A' is 65, or 01000001 in zeros and ones.

In all models, rule-based, classical machine learning, and neural networks, words undergo some type of translation into numbers in order to understand the semantic meaning of language. This is done through counting. Some models count the frequency of single words, some might count the frequency of combinations of words, some count the frequency of nouns, adjectives, verbs or noun and verb phrases. Some just replace the words in a text by their index numbers. Numbers optimize the operative speed of computer processes, leading to fast predictions, but they also remove the symbolic links that words might have. Here we present a few techniques that are dedicated to making text readable to a machine.

The Book of Tomorrow in a Bag of Words

by Algolit

The bag-of-words model is a simplifying representation of text used in Natural Language Processing (NLP). In this model, a text is represented as a collection of its unique words, disregarding grammar, punctuation and even word order. The model transforms the text into a list of words and how many times they're used in the text, or quite literally a bag of words.

This heavy reduction of language was the big shock when we began to machine learn. Bag of words is often used as a baseline, on which the new model has to perform better. It can understand the subject of a text by recognizing the most frequent or important words. It is often used to measure the similarities of texts by comparing their bags of words.

For this work the article 'Le Livre de Demain' by engineer G. Vander Haeghen, published in 1907 in the Bulletin de l'Institut International de Bibliographie of the Mundaneum, has been literally reduced to a bag of words. You can buy a bag at the reception of the Mundaneum.

---

Concept & realisation: An Mertens

TF-IDF

by Algolit

The TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting method used in text search. This statistical measure makes it possible to evaluate the importance of a term contained in a document, relative to a collection or corpus of documents. The weight increases in proportion to the number of occurrences of the word in the document. It also varies according to the frequency of the word in the corpus. TF-IDF is used in particular in the classification of spam in email software.

A web-based interface shows this algorithm through animations, making it possible to understand the different steps of text classification. How does a TF-IDF-based programme read a text? How does it transform words into numbers?

---

Concept, code, animation: Sarah Garcin

Growing a tree

by Algolit

Parts-of-Speech is a category of words that we learn at school: noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, and sometimes numeral, article, or determiner.

In Natural Language Processing (NLP) there exist many writings that allow sentences to be parsed. This means that the algorithm can determine the part-of-speech of each word in a sentence. 'Growing a tree' uses this technique to define all nouns in a specific sentence. Each noun is then replaced by its definition. This allows the sentence to grow autonomously and infinitely.

The recipe of 'Growing a tree' was inspired by Oulipo's constraint of 'littérature définitionnelle', invented by Marcel Bénabou in 1966. In a given phrase, one replaces every significant element (noun, adjective, verb, adverb) by one of its definitions in a given dictionary; one reiterates the operation on the newly received phrase, and again.

The dictionary of definitions used in this work is Wordnet. Wordnet is a combination of a dictionary and a thesaurus that can be read by machines. According to Wikipedia it was created in the Cognitive Science Laboratory of Princeton University starting in 1985. The project was initially funded by the US Office of Naval Research and later also by other US government agencies including DARPA, the National Science Foundation, the Disruptive Technology Office (formerly the Advanced Research and Development Activity), and REFLEX.

---

Concept, code & interface: An Mertens & Gijs de Heij
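As a small illustration of this translation into zeros and ones, the lines below print the ASCII number of each letter of a word together with its eight-bit binary form; capital 'A' comes out as 65, or 01000001.

# Letters as numbers: ASCII code and binary representation.
for letter in 'Algolit':
    print(letter, ord(letter), format(ord(letter), '08b'))
# A 65 01000001
# l 108 01101100
# ...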
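A minimal sketch of the bag-of-words reduction described above, using only the Python standard library; the sample sentence is ours, not a line from 'Le Livre de Demain':

from collections import Counter
import re

text = "The model transforms the text into a list of words and how many times they are used in the text."
words = re.findall(r"[a-z]+", text.lower())   # lowercase, drop punctuation, ignore word order
bag = Counter(words)
print(bag.most_common(2))                     # [('the', 3), ('text', 2)]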
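A rough sketch of the measure itself, assuming the common formulation tf multiplied by log(N/df); the three miniature 'documents' are invented for the example:

import math

documents = [
    "the mundaneum collects knowledge of the world".split(),
    "a bag of words disregards the order of words".split(),
    "spam filters weigh words by their frequency".split(),
]
N = len(documents)

def tf_idf(term, document):
    tf = document.count(term) / len(document)       # how often the term occurs in this document
    df = sum(1 for d in documents if term in d)     # in how many documents it occurs at all
    return tf * math.log(N / df)                    # words that appear everywhere sink towards zero

print(round(tf_idf('the', documents[0]), 3))        # common word, low weight (about 0.116)
print(round(tf_idf('mundaneum', documents[0]), 3))  # rare word, higher weight (about 0.157)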
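A rough sketch of one growth step, not the exhibition code itself: tag the parts of speech with NLTK, then swap each noun for its first Wordnet definition. It assumes the NLTK tokenizer, tagger and Wordnet data have been downloaded beforehand.

import nltk
from nltk.corpus import wordnet

sentence = 'The sentence grows into a tree'
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))   # [('The', 'DT'), ('sentence', 'NN'), ...]

grown = []
for word, tag in tagged:
    synsets = wordnet.synsets(word, pos=wordnet.NOUN) if tag.startswith('NN') else []
    grown.append(synsets[0].definition() if synsets else word)

print(' '.join(grown))   # every noun is now a definition; repeat the operation and the text keeps growing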
Algorithmic readings of Bertillon's portrait parlé

by Guillaume Slizewicz (Urban Species)

Written in 1907, Un code télégraphique du portrait parlé is an attempt to translate the 'spoken portrait', a face-description technique created by a policeman in Paris, into numbers. By implementing this code, it was hoped that faces of criminals and fugitives could easily be communicated over the telegraphic network in between countries. In its form, content and ambition this text represents our complicated relationship with documentation technologies. This text sparked the creation of the following installations for three reasons:

- First, the text is an algorithm in itself, a compression algorithm, or to be more precise, the presentation of a compression algorithm. It tries to reduce the information to smaller pieces while keeping it legible for the person who has the code. In this regard it is linked to the way we create technology, our pursuit for more efficiency, quicker results, cheaper methods. It represents our appetite for putting numbers on the entire world, measuring the smallest things, labeling the tiniest differences. This text itself embodies the vision of the Mundaneum.

- Second, it is about the reasons for and the applications of technology. It is almost ironic that this text was in the selected archives presented to us in a time when face recognition and data surveillance are so much in the news. This text bears the same characteristics as some of today's technology: motivated by social control, classifying people, laying the basis for a surveillance society. Facial features are at the heart of recent controversies: mugshots were standardized by Bertillon, now they are used to train neural networks to distinguish criminals from law-abiding citizens. Facial recognition systems allow the arrest of criminals via CCTV infrastructure and some assert that people's features can predict sexual orientation.

- The last point is about how it represents the evolution of mankind's techno-structure. What our tools allow us to do, what they forbid, what they hinder, what they make us remember and what they make us forget. This document enables a classification between people and a certain vision of what normality is. It breaks the continuum into pieces, thus allowing stigmatization/discrimination. On the other hand this document also feels obsolete today, because our techno-structure does not need such detailed written descriptions about fugitives, criminals or citizens. We can now find fingerprints, iris scans or DNA info in large datasets and compare them directly. Sometimes the technological systems do not even need human supervision and recognize directly the identity of a person via their facial features or their gait. Computers do not use intricate written language to describe a face, but arrays of integers. Hence all the words used in this document seem désuets, dated. Have we forgotten what some of them mean? Did photography make us forget how to describe faces? Will voice-assistance software teach us again?

Writing with Otlet

Writing with Otlet is a character generator that uses the spoken portrait code as its database. Random numbers are generated and translated into a set of features. By creating unique instances, the algorithm reveals the richness of the description that is possible with the portrait code while at the same time embodying its nuances.

An interpretation of Bertillon's spoken portrait

This work draws a parallel between Bertillon systems and current ones. A webcam linked to a facial recognition algorithm captures the beholder's face and translates it into numbers on a canvas, printing it alongside Bertillon's labelled faces.

References
https://www.technologyreview.com/s/602955/neural-network-learns-to-identify-criminals-by-their-faces/
https://fr.wikipedia.org/wiki/Bertillonnage
https://callingbullshit.org/case_studies/case_study_criminal_machine_learning.html

Hangman

by Laetitia Trozzi, student Arts²/Section Digital Arts

What better way to discover Paul Otlet and his passion for literature than to play hangman? Through this simple game, which consists in guessing the missing letters in a word, the goal is to make the public discover terms and facts related to one of the creators of the Mundaneum.

Hangman uses an algorithm to detect the frequency of words in a text. Next, a series of significant words were isolated in Paul Otlet's bibliography. This series of words is integrated into a hangman game presented in a terminal. The difficulty of the game gradually increases as the player is offered longer and longer words. Over the different game levels, information about the life and work of Paul Otlet is displayed.
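The mechanism can be imagined as follows. This is a hypothetical miniature: the feature table below is invented and far poorer than the actual portrait code.

import random

# An invented, much reduced feature table standing in for the spoken portrait code.
portrait_code = {
    'forehead': ['low', 'average', 'high'],
    'nose': ['concave', 'straight', 'convex'],
    'ear': ['round', 'oval', 'triangular'],
}

portrait = {trait: random.choice(options) for trait, options in portrait_code.items()}
print(portrait)   # e.g. {'forehead': 'high', 'nose': 'straight', 'ear': 'oval'}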
CONTEXTUAL STORIES
ABOUT READERS

Naive Bayes, Support Vector Machines and Linear Regression are called classical machine learning algorithms. They perform well when learning with small datasets. But they often require complex Readers. The task the Readers do, is also called feature-engineering. This means that a human needs to spend time on a deep exploratory data analysis of the dataset.

Features can be the frequency of words or letters, but also syntactical elements like nouns, adjectives, or verbs. The most significant features for the task to be solved must be carefully selected and passed over to the classical machine learning algorithm. This process marks the difference with Neural Networks. When using a neural network, there is no need for feature-engineering. Humans can pass the data directly to the network and achieve fairly good performances straightaway. This saves a lot of time, energy and money.

The downside of collaborating with Neural Networks is that you need a lot more data to train your prediction model. Think of 1GB or more of plain text files. To give you a reference, 1 A4, a text file of 5000 characters, only weighs 5 KB. For a single gigabyte you would need hundreds of thousands of such pages. More data also requires more access to useful datasets and more, much more processing power.

--- Character n-gram for authorship recognition ---

Imagine … You've been working for a company for more than ten years. You have been writing tons of emails, papers, internal notes and reports on very different topics and in very different genres. All your writings, as well as those of your colleagues, are safely backed-up on the servers of the company.

One day, you fall in love with a colleague. After some time you realize this human is rather mad and hysterical and also very dependent on you. The day you decide to break up, your (now) ex elaborates a plan to kill you. They succeed. This is unfortunate. A suicide letter in your name is left next to your corpse. Because of emotional problems, it says, you decided to end your life. Your best friends don't believe it. They decide to take the case to court. And there, based on the texts you and others produced over ten years, a machine learning model reveals that the suicide letter was written by someone else.

How does a machine analyse texts in order to identify you? The most robust feature for authorship recognition is delivered by the character n-gram technique. It is used in cases with a variety of thematics and genres of writing. When using character n-grams, texts are considered as sequences of characters. Let's consider the character trigram. All the overlapping sequences of three characters are isolated. For example, the character 3-grams of 'Suicide' would be 'Sui', 'uic', 'ici', 'cid', etc. Character n-gram features are very simple, they're language-independent and they're tolerant to noise. Furthermore, spelling mistakes do not jeopardize the technique.

Patterns found with character n-grams focus on stylistic choices that are unconsciously made by the author. The patterns remain stable over the full length of the text, which is important for authorship recognition. Other types of experiments could include measuring the length of words or sentences, the vocabulary richness, the frequencies of function words; even syntax or semantics-related measurements.

This means that not only your physical fingerprint is unique, but also the way you compose your thoughts! The same n-gram technique discovered that The Cuckoo's Calling, a novel by Robert Galbraith, was actually written by … J. K. Rowling!

Reference
Paper: On the Robustness of Authorship Attribution Based on Character N-gram Features, Efstathios Stamatatos, in Journal of Law & Policy, Volume 21, Issue 2, 2013.
News article: https://www.scientificamerican.com/article/how-a-computer-program-helped-show-jk-rowling-write-a-cuckoos-calling/

--- A history of n-grams ---

The n-gram algorithm can be traced back to the work of Claude Shannon in information theory. In the paper 'A Mathematical Theory of Communication', published in 1948, Shannon performed the first instance of an n-gram-based model for natural language. He posed the question: given a sequence of letters, what is the likelihood of the next letter?

If you read the following excerpt, can you tell who it was written by? Shakespeare or an n-gram piece of code?

SEBASTIAN: Do I stand till the break off.

BIRON: Hide thy head.

VENTIDIUS: He purposeth to Athens: whither, with the vow
I made to handle you.

FALSTAFF: My good knave.

You may have guessed, considering the topic of this story, that an n-gram algorithm generated this text. The model is trained on the compiled works of Shakespeare. While more recent algorithms, such as the recursive neural networks of the CharNN, are becoming famous for their performance, n-grams still execute a lot of NLP tasks. They are used in statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, ...

--- God in Google Books ---

In 2006, Google created a dataset of n-grams from their digitized book collection and released it online. Recently they also created an n-gram viewer.

This allowed for many socio-linguistic investigations. For example, in October 2018, the New York Times Magazine published an opinion article titled 'It's Getting Harder to Talk About God'. The author, Jonathan Merritt, had analysed the mention of the word 'God' in Google's dataset using the n-gram viewer. He concluded that there had been a decline in the word's usage since the twentieth century. Google's corpus contains texts from the sixteenth century leading up to the twenty-first. However, what the author missed out on was the growing popularity of scientific journals around the beginning of the twentieth century. This new genre that was not mentioning the word God shifted the dataset. If the scientific literature was taken out of the corpus, the frequency of the word 'God' would again flow like a gentle ripple from a distant wave.

--- Grammatical features taken from Twitter influence the stock market ---

The boundaries between academic disciplines are becoming blurred. Economics research mixed with psychology, social science, cognitive and emotional concepts has given rise to a new economics subfield, called 'behavioral economics'. This means that researchers can start to explain stock market movement based on factors other than economic factors only. Both the economy and 'public opinion' can influence or be influenced by each other. A lot of research is being done on how to use 'public opinion' to predict tendencies in stock-price changes.

'Public opinion' is estimated from sources of large amounts of public data, like tweets, blogs or online news. Research using machinic data analysis shows that the changes in stock prices can be predicted by looking at 'public opinion', to some degree. There are many scientific articles online which analyse the press on the 'sentiment' expressed in them. An article can be marked as more or less positive or negative. The annotated press articles are then used to train a machine learning model, which predicts stock market trends, marking them as 'down' or 'up'. When a company gets bad press, traders sell. On the contrary, if the news is good, they buy.

A paper by Haikuan Liu of the Australian National University states that the tense of verbs used in tweets can be an indicator of the frequency of financial transactions. His idea is based on the fact that verb conjugation is used in psychology to detect the early stages of human depression.

Reference
Paper: 'Grammatical Feature Extraction and Analysis of Tweet Text: An Application towards Predicting Stock Trends', Haikuan Liu, Research School of Computer Science (RSCS), College of Engineering and Computer Science (CECS), The Australian National University (ANU)

--- Bag of words ---

In Natural Language Processing (NLP), 'bag of words' is considered to be an unsophisticated model. It strips text of its context and dismantles it into a collection of unique words. These words are then counted. In the previous sentences, for example, 'words' is mentioned three times, but this is not necessarily an indicator of the text's focus.

The first appearance of the expression 'bag of words' seems to go back to 1954. Zellig Harris, an influential linguist, published a paper called 'Distributional Structure'. In the section called 'Meaning as a function of distribution', he says 'for language is not merely a bag of words but a tool with particular properties which have been fashioned in the course of its use. The linguist's work is precisely to discover these properties, whether for descriptive analysis or for the synthesis of quasi-linguistic systems.'
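The character n-gram feature described in this story can be sketched in one line of Python: slide a window of three characters over the text and keep every overlapping sequence.

def character_ngrams(text, n=3):
    # all overlapping sequences of n characters
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(character_ngrams('Suicide'))
# ['Sui', 'uic', 'ici', 'cid', 'ide']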
learners learn learners learn learners learn learners learn learners learn learners learn
|
||
|
learners learn learners learn learners learn learners learn learners learn
|
||
|
learners learn learners learn learners learn learners learn
|
||
|
learners learn learners learn learners learn
|
||
|
learners learn learners learn learners learn lea
|
||
|
ners learn learners learn learners learn
|
||
|
learners learn learners learn learners learn
|
||
|
learners learn learners learn learners
|
||
|
earn learners learn learners learn
|
||
|
learners learn learners learn
|
||
|
learners learn learners learn lea
|
||
|
ners learn learners learn learners
|
||
|
learn learners learn learners
|
||
|
earn learners learn learne
|
||
|
s learn learners learn
|
||
|
learners learn learners learn
|
||
|
learners learn learners learn
|
||
|
learners learn learners learn
|
||
|
learners learn
|
||
|
learners learn learners learn
|
||
|
learners learn learners learn
|
||
|
learners learn
|
||
|
learners learn learners learn
|
||
|
learners learn
|
||
|
learners learn learners learn
|
||
|
learners learn
|
||
|
learners learn learners
|
||
|
learn learners learn
|
||
|
learners learn
|
||
|
learners learn learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn learners
|
||
|
learn learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn lea
|
||
|
ners learn learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn learners
|
||
|
earn learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
learners learn
|
||
|
43
|
||
|
4n r- ro %r5 l e +-+-+-+-+-+-+-+-+ f +-+-+-+-+-+ m 9-e p + st2- a , _ nr2
|
||
|
l itr9 op 2c b ue |l|e|a|r|n|e|r|s| , y |l|e|a|r|n| ) g- 9 c w 1 atn_wn o_ c|
|
||
|
c o b op , +_7 -x a 9acl +-+-+-+-+-+-+-+-+ hc +-+-+-+-+-+ 34 u a 9a l |an t p 9 -
|
||
|
|\ _ l6el , 7 3 u r1 3 8dl a. m s T rv t ro|lm ni3 4 V3 as1to 4 e hp
|
||
|
5_s -o 4 d o9n t 0 t V i5n _ i, _ iu9 l + t t 6t s r s exe4eh l 4
|
||
|
ri _g d s es c s a 4s i+ i _ +-+-+-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+-+-+-+ e l4 f k 5l l wu |f
|
||
|
ete V o I- 4e |l|e|a|r|n|e|r|s| 6 e |a|r|e| |p|a|t|t|e|r|n| st 62 t a ne e 2 ?
|
||
|
.n l 1 ntb 5 d9 +-+-+-+-+-+-+-+-+ e e1 +-+-+-+ +-+-+-+-+-+-+-+ ia 5 n i w er8
|
||
|
er 1 t i 9 te9 n r7 | t ie m +-+-+-+-+-+-+-+ n s 1 i- e i X c w a
|
||
|
4 _c4 c s+ m t eh h.5 t a i t m p3 a e |f|i|n|d|e|r|s| , ll 6a e e7ifo- +cs te s-
|
||
|
h 5 8 m wl c tl u w2 +-+-+-+-+-+-+-+ 8 r s oe t % 8- 1 tl3o 4
|
||
|
n r a t t 3a 9 +-+-+-+-+-+-+-+-+ 5i9 +-+-+-+ +-+-+-+-+-+-+-+-+ l s 9 | 9a e 0sbntaf
|
||
|
m(um8 j ra e +t o |l|e|a|r|n|e|r|s| |a|r|e| |c|r|a|w|l|i|n|g| n n ei pte7i r 6ms
|
||
|
t s G_ el i + ka e . +-+-+-+-+-+-+-+-+ +-+-+-+ +-+-+-+-+-+-+-+-+ ,/s u r r 4 1 i h
|
||
|
d heeo 2eei m g r ao a ah( 9a u m9 V e +-+-+-+-+-+-+-+ +-+-+-+-+ nae T-e r s-i5 7n
|
||
|
gt r_ y e io 96 e e s d |T trig - l |t|h|r|o|u|g|h| |d|a|t|a| 7s e1s77 87 2 fw m c
|
||
|
9d. 2 _ e 2nnm 96 n a t7- c d, o e +-+-+-+-+-+-+-+ +-+-+-+-+ 6 r n rbhi e 5 s n d
|
||
|
/ _ 2r s f a ef +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+ h asn _
|
||
|
t5 w w p l n | a -s |l|e|a|r|n|e|r|s| e |g|e|n|e|r|a|t|e| |s|o|m|e| |k|i|n|d| u s s
|
||
|
ie im i i 7 t 4 +-+-+-+-+-+-+-+-+ r +-+-+-+-+-+-+-+-+ +-+-+-+-+ +-+-+-+-+ u t nr+ a
|
||
|
c 7 t s x 4 da n 7 Fd e c & +-+-+ +-+-+-+-+-+-+-+-+ raa o c5 ' e ro.
|
||
|
k1 n t re 8 n et 9 1 l r 0V |o|f| |s|p|e|c|i|f|i|c| a t9 s c rv v s l
|
||
|
n_fa r% a Z a 5 w me m n 5 1s n +-+-+ +-+-+-+-+-+-+-+-+ t S 1 o a r d rb
|
||
|
y 7 r c o ge D _ns v / b +-+-+-+-+-+-+-+-+-+ 8 4- i o 9 t e
|
||
|
i 4 9 9t6 9- é2 o p| o v i |'|g|r|a|m|m|a|r|'| n p t p 8sn _ l 8
|
||
|
nt 2pc t V4 e ha e 3 1 , n 2 i o +-+-+-+-+-+-+-+-+-+ %4 r 8 1 1 t e
|
||
|
e 8 rn d +-+-+-+-+-+-+-+-+-+-+-+ i +-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ u t
|
||
|
e e e e r F |c|l|a|s|s|i|f|i|e|r|s| %f |g|e|n|e|r|a|t|e|,| |e|v|a|l|u|a|t|e| 1 h V0 t n
|
||
|
nh % c 5 h r +-+-+-+-+-+-+-+-+-+-+-+ ti +-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ Ul n m ,
|
||
|
- n 2 ab m 3 o- r e 6| n +-+-+-+ +-+-+-+-+-+-+-+-+ 6 + oe /
|
||
|
l t i u + u t l i 7 ei |a|n|d| |r|e|a|d|j|u|s|t| 5 r f l f5 %
|
||
|
n 2 s e m a m e d1 m uh c +-+-+-+ +-+-+-+-+-+-+-+-+ n s g o _
|
||
|
e d c ps +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ +-+-+-+ + a D y5 8r
|
||
|
+1n o h |l|e|a|r|n|e|r|s| |u|n|d|e|r|s|t|a|n|d| |a|n|d| k4t tr t m
|
||
|
u a t +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+ +-+-+-+ a 3 i 3 t
|
||
|
2 r 7 n n 9 r r. t p i +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ -- c
|
||
|
g + l t v c i 8 f as |r|e|v|e|a|l| |p|a|t|t|e|r|n|s| a _ n
|
||
|
4 s l 5 2 + f s - l +-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ 4 - e
|
||
|
y + h -_ 7 +-+-+-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+ o . - i e
|
||
|
i e l t e _ V n |l|e|a|r|n|e|r|s| |d|o|n|'|t| |a|l|w|a|y|s| 4b ,i
|
||
|
_ % rt h e ,a +-+-+-+-+-+-+-+-+ +-+-+-+-+-+ +-+-+-+-+-+-+ a _ h _
|
||
|
2 V o 5 t +-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ _ s
|
||
|
c % po + h o3 mi5 8 |d|i|s|t|u|i|n|g|u|i|s|h| |w|e|l|l| w 7 _nn
|
||
|
, ha u pk +-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+ 91s 6 a
|
||
|
s hp I 3 % +-+-+-+-+-+ +-+-+-+-+-+-+-+-+ i 8
|
||
|
v o 6 o r s |w|h|i|c|h| |p|a|t|t|e|r|n|s| s_ oge e
|
||
|
n a + e o e 3 n 7 +-+-+-+-+-+ +-+-+-+-+-+-+-+-+ o 6 +
|
||
|
i l r \ m + a l r +-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+-+ , n
|
||
|
c a o o o |s|h|o|u|l|d| |b|e| |r|e|p|e|a|t|e|d| eh s i
|
||
|
o tlt t 2 e5 d +-+-+-+-+-+-+ +-+-+ +-+-+-+-+-+-+-+-+ o s
|
||
|
7 d 2 5 | n | 1 ey d te a t
|
||
|
r | , + 9 6 % f a i s %
|
||
|
n o+| r u s \ 4 e ep e
|
||
|
ao 2 | f' | e e r 9 7 Td i d e
|
||
|
. t 8m d c l 6 l o i _ t T i - i
|
||
|
n 7 e d 3 p l a n . i l
|
||
|
i i % 8 a + p r l e
|
||
|
4 % a l
|
||
|
| h 5 | tl d 1mo 7 t N
|
||
|
, t o i 9 o? F W 9 dC %hf
|
||
|
o m 5 t t w , - 3p
|
||
|
a d s e a n t _ o c \ f
|
||
|
+ p a r f |el 8 , g i l e e
|
||
|
t e3 - - 9 h c t t +w + | u0 w t
|
||
|
. h 5 a , s
|
||
|
t d _ n V 4 a o
|
||
|
, o t r nt
|
||
|
w e e
|
||
|
|
||
|
44
|
||
|
LEARNERS

Learners are the algorithms that distinguish machine learning practices from other types of practices. They are pattern finders, capable of crawling through data and generating some kind of specific 'grammar'. Learners are based on statistical techniques. Some need a large amount of training data in order to function, others can work with a small annotated set. Some perform well in classification tasks, like spam identification, others are better at predicting numbers, like temperatures, distances, stock market values, and so on.

The terminology of machine learning is not yet fully established. Depending on the field, whether statistics, computer science or the humanities, different terms are used. Learners are also called classifiers. When we talk about Learners, we talk about the interwoven functions that have the capacity to generate other functions, evaluate and readjust them to fit the data. They are good at understanding and revealing patterns. But they don't always distinguish well which of the patterns should be repeated.

In software packages, it is not always possible to distinguish the characteristic elements of the classifiers, because they are hidden in underlying modules or libraries. Programmers can invoke them using a single line of code. For this exhibition, we therefore developed two table games that show in detail the learning process of simple, but frequently used classifiers.

Naive Bayes game

by Algolit

In machine learning, Naive Bayes methods are simple probabilistic classifiers that are widely applied for spam filtering and deciding whether a text is positive or negative.

They require a small amount of training data to estimate the necessary parameters. They can be extremely fast compared to more sophisticated methods. They are difficult to generalize, which means that they perform on specific tasks, demanding to be trained with the same style of data that will be used to work with afterwards.

This game allows you to play along with the rules of Naive Bayes. While manually executing the code, you create your own playful model that 'just works'. A word of caution is necessary: because you only train it with 6 sentences – instead of the minimum 2000 – it is not representative at all!

---

Concept & realisation: An Mertens

Linear Regression game

by Algolit

Linear Regression is one of the best-known and best-understood algorithms in statistics and machine learning. It has been around for almost 200 years. It is an attractive model because the representation is so simple. In statistics, linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables.

By playing this game you will realize that as a player you have a lot of decisions to make. You will experience what it means to create a coherent dataset, to decide what is in and what is not in. If all goes well, you will feel the urge to change your data in order to obtain better results. This is part of the art of approximation that is at the basis of all machine learning practices.

---

Concept & realisation: An Mertens

Traité de Documentation. Three algorithmic poems.

by Rémi Forte, designer-researcher at L'Atelier national de recherche typographique, Nancy, France

serigraphy on paper, 60 × 80 cm, 25 ex., 2019, for sale at the reception of the Mundaneum.

The poems, reproduced in the form of three posters, are an algorithmic and poetic re-reading of Paul Otlet's 'Traité de documentation'. They are the result of an algorithm based on the mysterious rules of human intuition. It has been applied to a fragment taken from Paul Otlet's book and is intended to be representative of his bibliological practice.

For each fragment, the algorithm splits the text, words and punctuation marks are counted and reordered into a list. In each line, the elements combine and exhaust the syntax of the selected fragment. Paul Otlet's language remains perceptible but exacerbated to the point of absurdity. For the reader, the systematization of the text is disconcerting and his reading habits are disrupted.

Built according to a mathematical equation, the typographical composition of the poster is just as systematic as the poem. However, friction occurs occasionally; loop after loop, the lines extend to bite on the neighbouring column. Overlays are created and words are hidden by others. These telescopic handlers draw alternative reading paths.
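For readers who prefer to see the Naive Bayes game in code: a sketch with scikit-learn, trained on six invented sentences, and just as unrepresentative as the paper version.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

sentences = ['I love this film', 'a wonderful performance', 'what a great story',
             'I hate this film', 'a terrible performance', 'what a boring story']
labels = ['positive', 'positive', 'positive', 'negative', 'negative', 'negative']

vectorizer = CountVectorizer()                       # turn each sentence into word counts
counts = vectorizer.fit_transform(sentences)
classifier = MultinomialNB().fit(counts, labels)     # estimate the word probabilities per class

test = vectorizer.transform(['what a wonderful film'])
print(classifier.predict(test))                      # ['positive'], at least with these six sentences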
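The same exercise can be sketched in code with two invented continuous variables; fitting the line is a single function call, the hard decisions are in choosing the data.

import numpy as np

temperature = np.array([3.0, 8.0, 12.0, 17.0, 21.0, 26.0])      # invented measurements
ice_creams_sold = np.array([12, 25, 41, 56, 70, 93])            # invented as well

slope, intercept = np.polyfit(temperature, ice_creams_sold, 1)  # fit y = slope * x + intercept
print(round(slope, 2), round(intercept, 2))
print(round(slope * 30 + intercept))                            # prediction for a 30-degree day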
CONTEXTUAL STORIES
ABOUT LEARNERS

--- Naive Bayes & Viagra ---

Naive Bayes is a famous learner that performs well with little data. We apply it all the time. Christian and Griffiths state in their book 'Algorithms To Live By' that 'our days are full of small data'. Imagine, for example, that you're standing at a bus stop in a foreign city. The other person who is standing there has been waiting for 7 minutes. What do you do? Do you decide to wait? And if so, for how long? When will you initiate other options? Another example. Imagine a friend asking advice about a relationship. He's been together with his new partner for a month. Should he invite the partner to join him at a family wedding?

Having pre-existing beliefs is crucial for Naive Bayes to work. The basic idea is that you calculate the probabilities based on prior knowledge and given a specific situation.

The theorem was formulated during the 1740s by Thomas Bayes, a reverend and amateur mathematician. He dedicated his life to solving the question of how to win the lottery. But Bayes' rule was only made famous and known as it is today by the mathematician Pierre Simon Laplace in France a bit later in the same century. For a long time after Laplace's death, the theory sank into oblivion until it was dug up again during the Second World War in an effort to break the Enigma code.

Most people today have come in contact with Naive Bayes through their email spam folders. Naive Bayes is a widely used algorithm for spam detection. It is by coincidence that Viagra, the erectile dysfunction drug, was approved by the US Food & Drug Administration in 1997, around the same time as about 10 million users worldwide had made free webmail accounts. The selling companies were among the first to make use of email as a medium for advertising: it was an intimate space, at the time reserved for private communication, for an intimate product. In 2001, the first SpamAssassin programme relying on Naive Bayes was uploaded to SourceForge, cutting down on guerrilla email marketing.

Reference
Machine Learners, by Adrian MacKenzie, MIT Press, Cambridge, US, November 2017.

--- Naive Bayes & Enigma ---

This story about Naive Bayes is taken from the book 'The Theory That Would Not Die', written by Sharon Bertsch McGrayne. Among other things, she describes how Naive Bayes was soon forgotten after the death of Pierre Simon Laplace, its inventor. The mathematician was said to have failed to credit the works of others. Therefore, he suffered widely circulated charges against his reputation. Only after 150 years was the accusation refuted.

Fast forward to 1939, when Bayes' rule was still virtually taboo, dead and buried in the field of statistics. When France was occupied in 1940 by Germany, which controlled Europe's factories and farms, Winston Churchill's biggest worry was the U-boat peril. U-boat operations were tightly controlled by German headquarters in France. Each submarine received orders as coded radio messages long after it was out in the Atlantic. The messages were encrypted by word-scrambling machines, called Enigma machines. Enigma looked like a complicated typewriter. It was invented by the German firm Scherbius & Ritter after the First World War, when the need for message-encoding machines had become painfully obvious.

Interestingly, and luckily for Naive Bayes and the world, at that time the British government and educational systems saw applied mathematics and statistics as largely irrelevant to practical problem-solving. So the British agency charged with cracking German military codes mainly hired men with linguistic skills. Statistical data was seen as bothersome because of its detail-oriented nature. So wartime data was often analysed not by statisticians, but by biologists, physicists, and theoretical mathematicians. None of them knew that the Bayes rule was considered to be unscientific in the field of statistics. Their ignorance proved fortunate.

It was the now famous Alan Turing – a mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist – who used the probabilities of Bayes' rule to design the 'bombe'. This was a high-speed electromechanical machine for testing every possible arrangement that an Enigma machine would produce. In order to crack the naval codes of the U-boats, Turing simplified the 'bombe' system using Bayesian methods. It turned the UK headquarters into a code-breaking factory. The story is well illustrated in The Imitation Game, a film by Morten Tyldum dating from 2014.

--- A story about sweet peas ---

Throughout history, some models have been invented by people with ideologies that are not to our liking. The idea of regression stems from Sir Francis Galton, an influential nineteenth-century scientist. He spent his life studying the problem of heredity – understanding how strongly the characteristics of one generation of living beings manifested themselves in the following generation. He established the field of eugenics, defining it as 'the study of agencies under social control that may improve or impair the racial qualities of future generations, either physically or mentally'. On Wikipedia, Galton is a prime example of scientific racism.
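A back-of-the-envelope version of that idea, with invented numbers: how strongly should the word 'viagra' in a message shift our prior belief that the message is spam?

# Bayes' rule with invented numbers.
p_spam = 0.4                    # prior: the share of all mail that is spam (assumed)
p_word_given_spam = 0.05        # 'viagra' appears in 5% of spam messages (assumed)
p_word_given_ham = 0.0001       # and in 0.01% of legitimate messages (assumed)

p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))   # 0.997: the prior belief, updated by one observation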
|
||
|
|
||
|
|
||
|
|
||
|

Galton initially approached the problem of heredity by examining characteristics of the sweet pea plant. He chose this plant because the species can self-fertilize. Daughter plants inherit genetic variations from mother plants without a contribution from a second parent. This characteristic eliminates having to deal with multiple sources.

Galton's research was appreciated by many intellectuals of his time. In 1869, in 'Hereditary Genius', Galton claimed that genius is mainly a matter of ancestry and he believed that there was a biological explanation for social inequality across races. Galton even influenced his half-cousin Charles Darwin with his ideas. After reading Galton's paper, Darwin stated, 'You have made a convert of an opponent in one sense for I have always maintained that, excepting fools, men did not differ much in intellect, only in zeal and hard work'. Luckily, the modern study of heredity managed to eliminate the myth of race-based genetic difference, something Galton tried hard to maintain.

Galton's major contribution to the field was linear regression analysis, laying the groundwork for much of modern statistics. While we engage with the field of machine learning, Algolit tries not to forget that ordering systems hold power, and that this power has not always been used to the benefit of everyone. Machine learning has inherited many aspects of statistical research, some less agreeable than others. We need to be attentive, because these world views do seep into the algorithmic models that create new orders.

References

http://galton.org/letters/darwin/correspondence.htm
https://www.tandfonline.com/doi/full/10.1080/10691898.2001.11910537
http://www.paramoulipist.be/?p=1693

--- Perceptron ---

We find ourselves in a moment in time in which neural networks are sparking a lot of attention. But they have been in the spotlight before. The study of neural networks goes back to the 1940s, when the first neuron metaphor emerged. The neuron is not the only biological reference in the field of machine learning – think of the words corpus or training. The artificial neuron was constructed in close connection to its biological counterpart.

Psychologist Frank Rosenblatt was inspired by fellow psychologist Donald Hebb's work on the role of neurons in human learning. Hebb stated that 'cells that fire together wire together'. His theory now lies at the basis of associative human learning, but also of unsupervised neural network learning. It moved Rosenblatt to expand on the idea of the artificial neuron.

In 1962, he created the Perceptron, a model that learns through the weighting of inputs. It was set aside by the next generation of researchers, because it can only handle binary classification.

This means that the data has to be clearly separable, as for example men and women, or black and white. It is clear that this type of data is very rare in the real world. When the so-called first AI winter arrived in the 1970s and funding decreased, the Perceptron was also neglected. For ten years it stayed dormant. When spring settled at the end of the 1980s, a new generation of researchers picked it up again and used it to construct neural networks. These contain multiple layers of Perceptrons. That is how neural networks saw the light. One could say that the current machine learning season is particularly warm, but it takes another winter to know a summer.

--- BERT ---

Some online articles say that the year 2018 marked a turning point for the field of Natural Language Processing (NLP). A series of deep-learning models achieved state-of-the-art results on tasks like question answering or sentiment classification. Google's BERT algorithm entered the machine learning competitions of last year as a sort of 'one model to rule them all'. It showed superior performance over a wide variety of tasks.

BERT is pre-trained; its weights are learned in advance through two unsupervised tasks. This means BERT doesn't need to be trained from scratch for each new task. You only have to fine-tune its weights. This also means that a programmer wanting to use BERT no longer knows what parameters BERT is tuned to, nor what data it has seen to learn its performance.

BERT stands for 'Bidirectional Encoder Representations from Transformers'. This means that BERT allows for bidirectional training. The model learns the context of a word based on all of its surroundings, left and right of a word. As such, it can differentiate between 'I accessed the bank account' and 'I accessed the bank of the river'.

Some facts:
- BERT_large, with 345 million parameters, is the largest model of its kind. It is demonstrably superior on small-scale tasks to BERT_base, which uses the same architecture with 'only' 110 million parameters.
- To run BERT you need to use TPUs. These are Google's own processors (Tensor Processing Units), especially engineered for TensorFlow, its deep-learning platform. TPU renting rates range from $8/hr to $394/hr. Algolit doesn't want to work with off-the-shelf packages; we are interested in opening up the black box. In that case, BERT asks for quite some savings in order to be used.
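
To make 'learning through the weighting of inputs' concrete, here is a minimal Perceptron sketch in Python. It is not Rosenblatt's implementation: the toy data points, labels and learning rate are invented for illustration.

    # A minimal Perceptron: one weight per input plus a bias,
    # nudged whenever the model misclassifies an example.
    def train_perceptron(examples, labels, epochs=20, learning_rate=0.1):
        weights = [0.0] * len(examples[0])
        bias = 0.0
        for _ in range(epochs):
            for x, target in zip(examples, labels):      # target is 0 or 1
                activation = sum(w * xi for w, xi in zip(weights, x)) + bias
                prediction = 1 if activation > 0 else 0
                error = target - prediction              # -1, 0 or 1
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        return weights, bias

    # Toy, linearly separable data: points above the diagonal get label 1.
    points = [[0.0, 1.0], [1.0, 2.0], [2.0, 0.5], [3.0, 1.0]]
    labels = [1, 1, 0, 0]
    print(train_perceptron(points, labels))

Because the decision is a single weighted sum, such a model can only draw one straight border through the data; stacking layers of these units is what turns Perceptrons into the neural networks mentioned above.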
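
The 'bank account' versus 'bank of the river' example can be tried out. The sketch below assumes the Hugging Face transformers library, which is not mentioned in this publication but distributes the pre-trained BERT weights; it compares the two contextual vectors BERT produces for the word 'bank'.

    # Compare BERT's contextual vectors for 'bank' in two sentences.
    # Assumes: pip install torch transformers (weights download on first run).
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def vector_for(sentence, word):
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)        # last_hidden_state: [1, n_tokens, 768]
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
        return outputs.last_hidden_state[0, tokens.index(word)]

    v1 = vector_for("I accessed the bank account.", "bank")
    v2 = vector_for("I accessed the bank of the river.", "bank")
    print(torch.cosine_similarity(v1, v2, dim=0).item())

Because BERT reads the context to the left and to the right of each word, the two 'bank' vectors are not identical: the printed cosine similarity stays clearly below 1.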

GLOSSARY

This is a non-exhaustive wordlist, based on terms that are frequently used in the exhibition. It might help visitors who are not familiar with the vocabulary related to the field of Natural Language Processing (NLP), Algolit or the Mundaneum.

* ALGOLIT
A group from Brussels involved in artistic research on algorithms and literature. Every month they gather to experiment with code and texts that are published under free licenses. http://www.algolit.net

* ALGOLITERARY
Word invented by Algolit for works that explore the point of view of the algorithmic storyteller. What kind of new forms of storytelling do we make possible in dialogue with machinic agencies?

* ALGORITHM
A set of instructions in a specific programming language that takes an input and produces an output.

* ANNOTATION
The annotation process is a crucial step in supervised machine learning where the algorithm is given examples of what it needs to learn. A spam filter in training will be fed examples of spam and real messages. These examples are entries, or rows from the dataset, with a label: spam or non-spam. The labelling of a dataset is work executed by humans: they pick a label for each row of the dataset. To ensure the quality of the labels, multiple annotators see the same row and have to give the same label before an example is included in the training data.

* AI OR ARTIFICIAL INTELLIGENCES
In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. Computer science defines AI research as the study of 'intelligent agents': any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. More specifically, Kaplan and Haenlein define AI as 'a system's ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation'. Colloquially, the term 'artificial intelligence' is used to describe machines that mimic 'cognitive' functions that humans associate with other human minds, such as 'learning' and 'problem solving'. (Wikipedia)

* BAG OF WORDS
The bag-of-words model is a simplifying representation of text used in Natural Language Processing (NLP). In this model, a text is represented as a collection of its unique words, disregarding grammar, punctuation and even word order. The model transforms the text into a list of words and how many times they're used in the text, or quite literally a bag of words. Bag of words is often used as a baseline on which the new model has to perform better. (A code sketch follows this glossary.)

* CHARACTER N-GRAM
A technique that is used for authorship recognition. When using character n-grams, texts are considered as sequences of characters. Let's consider the character trigram: all the overlapping sequences of three characters are isolated. For example, the character 3-grams of 'Suicide' would be 'Sui', 'uic', 'ici', 'cid', etc. Patterns found with character n-grams focus on stylistic choices that are unconsciously made by the author. The patterns remain stable over the full length of the text. (A code sketch follows this glossary.)

* CLASSICAL MACHINE LEARNING
Naive Bayes, Support Vector Machines and Linear Regression are called classical machine learning algorithms. They perform well when learning with small datasets. But they often require complex Readers. The task the Readers do is also called feature engineering (see below). This means that a human needs to spend time on a deep exploratory data analysis of the dataset. (A code sketch follows this glossary.)

* CONSTANT
Constant is a non-profit, artist-run organisation based in Brussels since 1997 and active in the fields of art, media and technology. Algolit started as a project of Constant in 2012. http://constantvzw.org

* DATA WORKERS
Artificial intelligences that are developed to serve, entertain, record and know about humans. The work of these machinic entities is usually hidden behind interfaces and patents. In the exhibition, algorithmic storytellers leave their invisible underworld to become interlocutors.

* DUMP
According to the English dictionary, a dump is an accumulation of refused and discarded materials or the place where such materials are dumped. In computing, a dump refers to a 'database dump', a record of data from a database used for easy downloading or for backing up a database. Database dumps are often published by free software and free content projects, such as Wikipedia, to allow reuse or forking of the database.

* FEATURE ENGINEERING
The process of using domain knowledge of the data to create features that make machine learning algorithms work. This means that a human needs to spend time on a deep exploratory data analysis of the dataset. In Natural Language Processing (NLP) features can be the frequency of words or letters, but also syntactical elements like nouns, adjectives or verbs. The most significant features for the task to be solved must be carefully selected and passed over to the classical machine learning algorithm.

* FLOSS OR FREE LIBRE OPEN SOURCE SOFTWARE
Software that anyone is freely licensed to use, copy, study, and change in any way, and of which the source code is openly shared so that people are encouraged to voluntarily improve the design of the software. This is in contrast to proprietary software, where the software is under restrictive copyright licensing and the source code is usually hidden from the users. (Wikipedia)

* GIT
A software system for tracking changes in source code during software development. It is designed for coordinating work among programmers, but it can be used to track changes in any set of files. Before starting a new project, programmers create a 'git repository' in which they will publish all parts of the code. The git repositories of Algolit can be found on https://gitlab.constantvzw.org/algolit.

* GUTENBERG.ORG
Project Gutenberg is an online platform run by volunteers to 'encourage the creation and distribution of eBooks'. It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of public domain books. The project tries to make these as free as possible, in long-lasting, open formats that can be used on almost any computer. As of 23 June 2018, Project Gutenberg reached 57,000 items in its collection of free eBooks. (Wikipedia)

* HENRI LA FONTAINE
Henri La Fontaine (1854-1943) was a Belgian politician, feminist and pacifist. He was awarded the Nobel Peace Prize in 1913 for his involvement in the International Peace Bureau and his contribution to the organization of the peace movement. In 1895, together with Paul Otlet, he created the International Bibliography Institute, which became the Mundaneum. Within this institution, which aimed to bring together all the world's knowledge, he contributed to the development of the Universal Decimal Classification (UDC) system.

* KAGGLE
An online platform where users find and publish data sets, explore and build machine learning models, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. About half a million data scientists are active on Kaggle. It was founded by Anthony Goldbloom and Ben Hamner in 2010 and acquired by Google in March 2017.

* LITERATURE
Algolit understands the notion of literature in the way a lot of other experimental authors do. It includes all linguistic production, from the dictionary to the Bible, from Virginia Woolf's entire work to all versions of the Terms of Service published by Google since its existence.

* MACHINE LEARNING MODELS
Algorithms based on statistics, mainly used to analyse and predict situations based on existing cases. In this exhibition we focus on machine learning models for text processing, or 'Natural Language Processing', in short 'NLP'. These models have learned to perform a specific task on the basis of existing texts. The models are used for search engines, machine translations and summaries, spotting trends in new media networks and news feeds. They influence what you get to see as a user, but also have their word to say in the course of stock exchanges worldwide, the detection of cybercrime and vandalism, etc.

* MARKOV CHAIN
Algorithm that scans the text for the transition probability of letter or word occurrences, resulting in transition probability tables which can be computed even without any semantic or grammatical natural language understanding. It can be used for analyzing texts, but also for recombining them. It is widely used in spam generation. (A code sketch follows this glossary.)

* MECHANICAL TURK
The Amazon Mechanical Turk is an online platform for humans to execute tasks that algorithms cannot. Examples include annotating sentences as being positive or negative, spotting number plates, discriminating between face and non-face. The jobs posted on this platform are often paid less than a cent per task. Tasks that are more complex or require more knowledge can be paid up to several cents. Many academic researchers use Mechanical Turk as an alternative to have their students execute these tasks.

* MUNDANEUM
In the late nineteenth century two young Belgian jurists, Paul Otlet (1868-1944), 'the father of documentation', and Henri La Fontaine (1854-1943), statesman and Nobel Peace Prize winner, created the Mundaneum. The project aimed at gathering all the world's knowledge and filing it using the Universal Decimal Classification (UDC) system that they had invented.

* NATURAL LANGUAGE
A natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing. They are different from constructed and formal languages such as those used to program computers or to study logic. (Wikipedia)

* NLP OR NATURAL LANGUAGE PROCESSING
Natural Language Processing (NLP) is a collective term referring to automatic computational processing of human languages. This includes algorithms that take human-produced text as input, and attempt to generate text that resembles it.

* NEURAL NETWORKS
Computing systems inspired by the biological neural networks that constitute animal brains. The neural network itself is not an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs. Such systems 'learn' to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as 'cat' or 'no cat' and using the results to identify cats in other images. They do this without any prior knowledge about cats, for example that they have fur, tails, whiskers and cat-like faces. Instead, they automatically generate identifying characteristics from the learning material that they process. (Wikipedia)

* OPTICAL CHARACTER RECOGNITION (OCR)
Computer processes for translating images of scanned texts into manipulable text files.

* ORACLE
Oracles are prediction or profiling machines, a specific type of algorithmic model, mostly based on statistics. They are widely used in smartphones, computers, tablets.

* OULIPO
Oulipo stands for Ouvroir de littérature potentielle (Workspace for Potential Literature). Oulipo was created in Paris by the French writers Raymond Queneau and François Le Lionnais. They rooted their practice in the European avant-garde of the twentieth century and in the experimental tradition of the 1960s. For Oulipo, the creation of rules becomes the condition to generate new texts, or what they call potential literature. Later, in 1981, they also created ALAMO, Atelier de littérature assistée par la mathématique et les ordinateurs (Workspace for literature assisted by maths and computers).

* PAUL OTLET
Paul Otlet (1868-1944) was a Belgian author, entrepreneur, visionary, lawyer and peace activist; he is one of several people who have been considered the father of information science, a field he called 'documentation'. Otlet created the Universal Decimal Classification, which became widespread in libraries. Together with Henri La Fontaine he created the Palais Mondial (World Palace), later the Mundaneum, to house the collections and activities of their various organizations and institutes.

* PYTHON
The main programming language that is globally used for natural language processing. It was invented in 1991 by the Dutch programmer Guido van Rossum.

* RULE-BASED MODELS
Oracles can be created using different techniques. One way is to manually define rules for them. As prediction models they are then called rule-based models, as opposed to statistical models. Rule-based models are handy for tasks that are specific, like detecting when a scientific paper concerns a certain molecule. With very little sample data, they can perform well.

* SENTIMENT ANALYSIS
Also called 'opinion mining'. A basic task in sentiment analysis is classifying a given text as positive, negative or neutral. Advanced, 'beyond polarity' sentiment classification looks, for instance, at emotional states such as 'angry', 'sad' and 'happy'. Sentiment analysis is widely applied to user materials such as reviews and survey responses, comments and posts on social media, and healthcare materials, for applications that range from marketing to customer service, from stock exchange transactions to clinical medicine.

* SUPERVISED MACHINE LEARNING MODELS
For the creation of supervised machine learning models, humans annotate sample text with labels before feeding it to a machine to learn. Each sentence, paragraph or text is judged by at least three annotators: is it spam or not spam, positive or negative, etc.

* TRAINING DATA
Machine learning algorithms need guidance. In order to separate one thing from another, they need texts to extract patterns from. One should carefully choose the training material, and adapt it to the machine's task. It doesn't make sense to train a machine with nineteenth-century novels if its mission is to analyze tweets.

* UNSUPERVISED MACHINE LEARNING MODELS
Unsupervised machine learning models don't need the step of annotation of the data by humans. This saves a lot of time, energy and money. Instead, they need a large amount of training data, which is not always available and can take a long cleaning time beforehand.

* WORD EMBEDDINGS
Language modelling techniques that through multiple mathematical operations of counting and ordering plot words into a multi-dimensional vector space. When embedding words, they transform from being distinct symbols into mathematical objects that can be multiplied, divided, added or subtracted. (A code sketch follows this glossary.)

* WORDNET
WordNet is a combination of a dictionary and a thesaurus that can be read by machines. According to Wikipedia, it was created in the Cognitive Science Laboratory of Princeton University starting in 1985. The project was initially funded by the US Office of Naval Research and later also by other US government agencies including DARPA, the National Science Foundation, the Disruptive Technology Office (formerly the Advanced Research and Development Activity), and REFLEX.
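
The glossary entries marked with a code sketch are worked out below, as minimal Python sketches with invented example texts, not as the recipes used in the exhibition. First, BAG OF WORDS: counting how often each word appears while forgetting the order in which they came.

    # Bag of words: a text becomes a count of its words; word order is thrown away.
    from collections import Counter
    import re

    text = "the cat sat on the mat and the mat sat still"
    words = re.findall(r"[a-z']+", text.lower())
    print(Counter(words))
    # Counter({'the': 3, 'sat': 2, 'mat': 2, 'cat': 1, 'on': 1, 'and': 1, 'still': 1})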
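
CHARACTER N-GRAM: the trigram example from the glossary, written out. Every overlapping slice of three characters is kept.

    # Character trigrams: all overlapping sequences of three characters.
    def character_ngrams(text, n=3):
        return [text[i:i + n] for i in range(len(text) - n + 1)]

    print(character_ngrams("Suicide"))
    # ['Sui', 'uic', 'ici', 'cid', 'ide']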
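
CLASSICAL MACHINE LEARNING: a small supervised pipeline in the spirit of the glossary entry, a bag-of-words Reader feeding a Naive Bayes model. It assumes the scikit-learn library; the four hand-labelled sentences are invented.

    # Count words as features, then let Naive Bayes learn which counts
    # point to 'spam' and which to 'ham'.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = [
        "win a free prize now", "cheap pills, click here",
        "meeting tomorrow at ten", "please read the attached draft",
    ]
    labels = ["spam", "spam", "ham", "ham"]   # labels picked by human annotators

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    print(model.predict(["free pills now", "draft of the meeting notes"]))
    # should print ['spam' 'ham'] for this toy data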
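
MARKOV CHAIN: a first-order chain over words. It counts which word follows which, then wanders through that transition table to recombine the text. The sample sentence is invented.

    # Build a transition table (word -> observed next words) and walk through it.
    import random
    from collections import defaultdict

    text = "the data workers write and the data workers read and the readers learn"
    words = text.split()

    transitions = defaultdict(list)
    for current, following in zip(words, words[1:]):
        transitions[current].append(following)

    random.seed(0)
    word = "the"
    generated = [word]
    for _ in range(10):
        followers = transitions.get(word)
        if not followers:          # dead end: no observed follower
            break
        word = random.choice(followers)
        generated.append(word)
    print(" ".join(generated))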
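
WORD EMBEDDINGS: the arithmetic the glossary describes, sketched with the gensim library (an assumption, like the toy sentences). On such a tiny corpus the geometry only demonstrates the mechanics, not meaningful semantics.

    # Words become vectors that can be compared, added and subtracted.
    # Assumes: pip install gensim (version 4 or later).
    from gensim.models import Word2Vec

    sentences = [
        "the king rules the land".split(),
        "the queen rules the land".split(),
        "the man walks the dog".split(),
        "the woman walks the dog".split(),
    ]
    model = Word2Vec(sentences, vector_size=20, window=2, min_count=1, seed=1, epochs=200)

    print(model.wv["queen"][:5])                          # first coordinates of one vector
    print(model.wv.similarity("king", "queen"))           # cosine similarity of two words
    print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
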
◝ humans learn with machines ◜ ◡ machines learn from machines ◞ ◡ machines learn with humans ◞ ◝
|
||
|
humans learn from machines ◟ ◜ machines learn with machines ◠ ◜ machines learn from humans ◟ ◠
|
||
|
humans learn with humans ◞ ◝ humans learn from humans ◞ ◠ humans learn with machines ◟ ◡ mac
|
||
|
ines learn from machines ◡ ◡ machines learn with humans ◟ ◡ humans learn from machines ◝ ◟
|
||
|
achines learn with machines ◠ ◝ machines learn from humans ◜ ◝ humans learn with humans ◞ ◞
|
||
|
humans learn from humans ◡ ◞ humans learn with machines ◠ ◠ machines learn from machines ◠
|
||
|
machines learn with humans ◞ ◜ humans learn from machines ◜ ◠ machines learn with machines ◝
|
||
|
◜ machines learn from humans ◜ ◠ humans learn with humans ◝ ◟ humans learn from humans ◞
|
||
|
◜ humans learn with machines ◡ ◡ machines learn from machines ◡ ◟ machines learn with humans
|
||
|
◠ ◠ humans learn from machines ◡ ◜ machines learn with machines ◜ ◟ machines learn from
|
||
|
umans ◟ ◞ humans learn with humans ◞ ◟ humans learn from humans ◜ ◠ humans learn with ma
|
||
|
hines ◜ ◠ machines learn from machines ◝ ◠ machines learn with humans ◝ ◞ humans learn f
|
||
|
om machines ◝ ◡ machines learn with machines ◜ ◡ machines learn from humans ◜ ◠ humans l
|
||
|
arn with humans ◡ ◡ humans learn from humans ◝ ◞ humans learn with machines ◟ ◡ machines
|
||
|
learn from machines ◜ ◜ machines learn with humans ◠ ◞ humans learn from machines ◝ ◠ ma
|
||
|
hines learn with machines ◟ ◟ machines learn from humans ◝ ◠ humans learn with humans ◟
|
||
|
humans learn from humans ◝ ◜ humans learn with machines ◠ ◝ machines learn from machines ◞
|
||
|
◠ machines learn with humans ◝ ◟ humans learn from machines ◟ ◞ machines learn with machines
|
||
|
◜ ◞ machines learn from humans ◞ ◡ humans learn with humans ◠ ◞ humans learn from human
|
||
|
◠ ◜ humans learn with machines ◡ ◞ machines learn from machines ◜ ◠ machines learn w
|
||
|
th humans ◡ ◝ humans learn from machines ◝ ◟ machines learn with machines ◠ ◠ machine
|
||
|
learn from humans ◞ ◟ humans learn with humans ◠ ◞ humans learn from humans ◠ ◠ huma
|
||
|
s learn with machines ◡ ◡ machines learn from machines ◜ ◞ machines learn with humans ◡
|
||
|
◟ humans learn from machines ◜ ◜ machines learn with machines ◜ ◝ machines learn from human
|
||
|
◜ ◠ humans learn with humans ◝ ◡ humans learn from humans ◡ ◞ humans learn with mach
|
||
|
nes ◜ ◝ machines learn from machines ◝ ◜ machines learn with humans ◞ ◜ humans learn
|
||
|
rom machines ◞ ◝ machines learn with machines ◞ ◜ machines learn from humans ◡ ◞ huma
|
||
|
s learn with humans ◟ ◜ humans learn from humans ◞ ◡ humans learn with machines ◝ ◝ m
|
||
|
chines learn from machines ◜ ◟ machines learn with humans ◡ ◟ humans learn from machines ◠
|
||
|
◝ machines learn with machines ◜ ◡ machines learn from humans ◞ ◝ humans learn with huma
|
||
|
s ◝ ◠ humans learn from humans ◞ ◜ humans learn with machines ◠ ◝ machines learn from
|
||
|
machines ◟ ◡ machines learn with humans ◝ ◝ humans learn from machines ◞ ◞ machines l
|
||
|
arn with machines ◠ ◠ machines learn from humans ◠ ◡ humans learn with humans ◜ ◜ hum
|
||
|
ns learn from humans ◞ ◞ humans learn with machines ◡ ◝ machines learn from machines ◟
|
||
|
◝ machines learn with humans ◠ ◟ machines learn with humans ◠ ◜ machines learn from
|
||
|
machines ◡ ◜ humans learn with machines ◞ ◟ humans learn from humans ◜ ◡ humans learn
|
||
|
with humans ◝ ◞ machines learn from humans ◜ ◝ machines learn with machines ◜ ◠ human
|
||
|
learn from machines ◡ ◝ machines learn with humans ◝ ◜ machines learn from machines ◜
|
||
|
◞ humans learn with machines ◠ ◝ humans learn from humans ◠ ◝ humans learn with humans ◞
|
||
|
◡ machines learn from humans ◜ ◝ machines learn with machines ◠ ◟ humans learn from machi
|
||
|
es ◜ ◟ machines learn with humans ◝ ◝ machines learn from machines ◞ ◜ humans learn w
|
||
|
th machines ◝ ◡ humans learn from humans ◝ ◝ humans learn with humans ◠ ◠ machines le
|
||
|
rn from humans ◝ ◡ machines learn with machines ◡ ◡ humans learn from machines ◠ ◞ ma
|
||
|
hines learn with humans ◝ ◜ machines learn from machines ◜ ◝ humans learn with machines ◠
|
||
|
◞ humans learn from humans ◝ ◡ humans learn with humans ◞ ◡ machines learn from humans ◟
|
||
|
◟ machines learn with machines ◝ ◝ humans learn from machines ◜ ◟ machines learn with
|
||
|
umans ◡ ◝ machines learn from machines ◡ ◝ humans learn with machines ◞ ◜ humans lear
|
||
|
from humans ◜ ◝ humans learn with humans ◞ ◡ machines learn from humans ◝ ◡ machines
|
||
|
learn with machines ◞ ◟ humans learn from machines ◜ ◞ machines learn with humans ◟ ◡
|
||
|
machines learn from machines ◜ ◝ humans learn with machines ◠ ◠ humans learn from humans ◠
|
||
|
◝ humans learn with humans ◟ ◞ machines learn from humans ◝ ◠ machines learn with machines
|
||
|
◜ ◟ humans learn from machines ◠ ◝ machines learn with humans ◝ ◜ machines learn from ma
|
||
|
hines ◟ ◟ humans learn with machines ◞ ◡ humans learn from humans ◝ ◝ humans learn with
|
||
|
umans ◡ ◝ machines learn from humans ◝ ◡ machines learn with machines ◟ ◞ humans learn f
|
||
|
om machines ◝ ◟ machines learn with humans ◝ ◜ machines learn from machines ◝ ◠ humans l
|
||
|
arn with machines ◠ ◠ humans learn from humans ◟ ◜ humans learn with humans ◟ ◝ machines
|
||
|
learn from humans ◡ ◡ machines learn with machines ◜ ◜ humans learn from machines ◠ ◟ ma
|
||
|
hines learn with humans ◞ ◜ machines learn from machines ◠ ◜ humans learn with machines ◜
|
||
|
◞ humans learn from humans ◝ ◟ humans learn with humans ◟ ◞ machines learn from humans ◟
|
||
|
◝ machines learn with machines ◡ ◜ humans learn from machines ◠ ◠ machines learn with humans ◞
|
||
|
◡ machines learn from machines ◟ ◝ humans learn with machines ◜ ◞ humans learn from huma
|
||
|
s ◝ ◞ humans learn with humans ◜ ◟ machines learn from humans ◜ ◞ machines learn with ma
|
||
|
hines ◝ ◞ humans learn from machines ◝ ◜ machines learn with humans ◟ ◜ machines learn from
|
||
|
machines ◡ ◟ humans learn with machines ◞ ◠ humans learn from humans ◞ ◟ humans learn with
|
||
|
umans ◠ ◜ machines learn from humans ◡ ◠ machines learn with machines ◠ ◝ humans learn from
|
||
|
machines ◠ ◜ machines learn with humans ◞ ◠ machines learn from machines ◞ ◠ humans learn w
|
||
|
th machines ◜ ◟ humans learn from humans ◝ ◠ humans learn with humans ◝ ◟ machines learn from
|
||
|
humans ◜ ◜ machines learn with machines ◠ ◞ humans learn from machines ◠ ◡ machines learn with
|
||
|
machines ◡ ◟ humans learn with machines ◞ ◠ humans learn from humans ◞ ◟ humans learn with mach
|
||
|
ines ◝ ◞ humans learn from machines ◝ ◜ machines learn with humans ◟ ◜ machines learn from hum
|