data workers write, perform, clean, inform, read and learn

What can humans learn from humans? humans learn with machines? machines learn from machines? machines learn with humans? humans learn from machines? machines learn with machines? machines learn from humans? humans learn with humans?

Data Workers, an exhibition at the Mundaneum in Mons from 28 March until 28 April 2019.
ABOUT

Data Workers is an exhibition of algoliterary works, of stories told from an 'algorithmic storyteller point of view'. The exhibition was created by members of Algolit, a group from Brussels involved in artistic research on algorithms and literature. Every month they gather to experiment with F/LOSS code and texts. Some works are by students of Arts² and external participants in the workshop on machine learning and text organized by Algolit in October 2018 at the Mundaneum.

Companies create artificial intelligence (AI) systems to serve, entertain, record and learn about humans. The work of these machinic entities is usually hidden behind interfaces and patents. In the exhibition, algorithmic storytellers leave their invisible underworld to become interlocutors. The data workers operate in different collectives. Each collective represents a stage in the design process of a machine learning model: there are the Writers, the Cleaners, the Informants, the Readers, the Learners and the Oracles. The boundaries between these collectives are not fixed; they are porous and permeable. At times, Oracles are also Writers. At other times Readers are also Oracles. Robots voice experimental literature, while algorithmic models read data, turn words into numbers, make calculations that define patterns and are able to endlessly process new texts ever after.

The exhibition foregrounds data workers who impact our daily lives, but are either hard to grasp and imagine or removed from the imagination altogether. It connects stories about algorithms in mainstream media to the storytelling that is found in technical manuals and academic papers. Robots are invited to engage in dialogue with human visitors and vice versa. In this way we might understand our respective reasonings, demystify each other's behaviour, encounter multiple personalities, and value our collective labour. It is also a tribute to the many machines that Paul Otlet and Henri La Fontaine imagined for their Mundaneum, showing their potential but also their limits.

---

Data Workers was created by Algolit.

Works by: Cristina Cochior, Gijs de Heij, Sarah Garcin, An Mertens, Javier Lloret, Louise Dekeuleneer, Florian Van de Weyer, Laetitia Trozzi, Rémi Forte, Guillaume Slizewicz, Michael Murtaugh, Manetta Berends, Mia Melvær.

Co-produced by: Arts², Constant and Mundaneum.

With the support of: Wallonia-Brussels Federation/Digital Arts, Passa Porta, UGent, DHuF - Digital Humanities Flanders and Distributed Proofreaders Project.

Thanks to: Mike Kestemont, Michel Cleempoel, Donatella Portoghese, François Zajéga, Raphaèle Cornille, Vincent Desfromont, Kris Rutten, Anne-Laure Buisson, David Stampfli.

AT THE MUNDANEUM

In the late nineteenth century two young Belgian jurists, Paul Otlet (1868-1944), the 'father of documentation', and Henri La Fontaine (1854-1943), statesman and Nobel Peace Prize winner, created the Mundaneum. The project aimed to gather all the world's knowledge and to file it using the Universal Decimal Classification (UDC) system that they had invented. At first it was an International Institutions Bureau dedicated to international knowledge exchange. In the twentieth century the Mundaneum became a universal centre of documentation. Its collections are made up of thousands of books, newspapers, journals, documents, posters, glass plates and postcards indexed on millions of cross-referenced cards. The collections were exhibited and kept in various buildings in Brussels, including the Palais du Cinquantenaire. The remains of the archive only moved to Mons in 1998.

Based on the Mundaneum, the two men designed a World City for which Le Corbusier made scale models and plans. The aim of the World City was to gather, at a global level, the institutions of knowledge: libraries, museums and universities. This project was never realized. It suffered from its own utopia. The Mundaneum is the result of a visionary dream of what an infrastructure for universal knowledge exchange could be. It attained mythical dimensions at the time. When looking at the concrete archive that was developed, that collection is rather eclectic and specific.

Artificial intelligence systems today come with their own dreams of universality and knowledge production. When reading about these systems, the visionary dreams of their makers were there from the beginning of their development in the 1950s. Nowadays, their promise has also attained mythical dimensions. When looking at their concrete applications, the collection of tools is truly innovative and fascinating, but at the same time, rather eclectic and specific. For Data Workers, Algolit combined some of the applications with 10 per cent of the digitized publications of the International Institutions Bureau. In this way, we hope to poetically open up a discussion about machines, algorithms, and technological infrastructures.

CONTEXTUAL STORIES ABOUT ALGOLIT

--- Why contextual stories? ---

During the monthly meetings of Algolit, we study manuals and experiment with machine learning tools for text processing. And we also share many, many stories. With the publication of these stories we hope to recreate some of that atmosphere. The stories also exist as a podcast that can be downloaded from http://www.algolit.net.

For outsiders, algorithms only become visible in the media when they achieve an outstanding performance, like AlphaGo, or when they break down in fantastically terrifying ways. Humans working in the field, though, create their own culture on and offline. They share the best stories and experiences during live meetings, research conferences and annual competitions like Kaggle. These stories that contextualize the tools and practices can be funny, sad, shocking, interesting.

A lot of them are experiential learning cases. The implementations of algorithms in society generate new conditions of labour, storage, exchange, behaviour, copy and paste. In that sense, the contextual stories capture a momentum in a larger anthropo-machinic story that is being written at full speed and by many voices.

--- We create 'algoliterary' works ---

The term 'algoliterary' comes from the name of our research group Algolit. We have existed since 2012 as a project of Constant, a Brussels-based organization for media and the arts. We are artists, writers, designers and programmers. Once a month we meet to study and experiment together. Our work can be copied, studied, changed, and redistributed under the same free license. You can find all the information on: http://www.algolit.net.

The main goal of Algolit is to explore the viewpoint of the algorithmic storyteller. What new forms of storytelling do we make possible in dialogue with these machinic agencies? Narrative viewpoints are inherent to world views and ideologies. Don Quixote, for example, was written from an omniscient third-person point of view, showing Cervantes' relation to oral traditions. Most contemporary novels use the first-person point of view. Algolit is interested in speaking through algorithms, and in showing you the reasoning underlying one of the most hidden groups on our planet.

To write in or through code is to create new forms of literature that are shaping human language in unexpected ways. But machine learning techniques are only accessible to those who can read, write and execute code. Fiction is a way of bridging the gap between the stories that exist in scientific papers and technical manuals, and the stories spread by the media, often limited to superficial reporting and myth-making. By creating algoliterary works, we offer humans an introduction to techniques that co-shape their daily lives.

--- What is literature? ---

Algolit understands the notion of literature in the way a lot of other experimental authors do: it includes all linguistic production, from the dictionary to the Bible, from Virginia Woolf's entire work to all versions of the Terms of Service published by Google since its existence. In this sense, programming code can also be literature.

The collective Oulipo is a great source of inspiration for Algolit. Oulipo stands for Ouvroir de littérature potentielle (Workspace for Potential Literature). Oulipo was created in Paris by the French writers Raymond Queneau and François Le Lionnais. They rooted their practice in the European avant-garde of the twentieth century and in the experimental tradition of the 1960s.

For Oulipo, the creation of rules becomes the condition to generate new texts, or what they call potential literature. Later, in 1981, they also created ALAMO, Atelier de littérature assistée par la mathématique et les ordinateurs (Workspace for literature assisted by maths and computers).

--- An important difference ---

While the European avant-garde of the twentieth century pursued the objective of breaking with conventions, members of Algolit seek to make conventions visible.

'I write: I live in my paper, I invest it, I walk through it.' (Espèces d'espaces. Journal d'un usager de l'espace, Galilée, Paris, 1974)

This quote from Georges Perec in Espèces d'espaces could be taken up by Algolit. We're not talking about the conventions of the blank page and the literary market, as Georges Perec was. We're referring to the conventions that often remain hidden behind interfaces and patents. How are technologies made, implemented and used, as much in academia as in business infrastructures?

We propose stories that reveal the complex hybridized system that makes machine learning possible. We talk about the tools, the logics and the ideologies behind the interfaces. We also look at who produces the tools, who implements them, and who creates and accesses the large amounts of data needed to develop prediction machines. One could say, with the wink of an eye, that we are collaborators of this new tribe of human-robot hybrids.

writers write / data workers work / many authors write / every human being who has access to the internet interacts / we chat, write, click, like and share / we leave our data / we find ourselves writing in Python / some neural networks write / human editors assist / poets, playwrights or novelists assist
WRITERS

Data workers need data to work with. The data that is used in the context of Algolit is written language. Machine learning relies on many types of writing. Many authors write in the form of publications, such as books or articles. These are part of organized archives and are sometimes digitized. But there are other kinds of writing too. We could say that every human being who has access to the Internet is a writer each time they interact with algorithms. We chat, write, click, like and share. In return for free services, we leave our data that is compiled into profiles and sold for advertising and research purposes.

Machine learning algorithms are not critics: they take whatever they're given, no matter the writing style, no matter the CV of the author, no matter the spelling mistakes. In fact, mistakes make it better: the more variety, the better they learn to anticipate unexpected text. But often, human authors are not aware of what happens to their work.

Most of the writing we use is in English, some in French, some in Dutch. Most often we find ourselves writing in Python, the programming language we use. Algorithms can be writers too. Some neural networks write their own rules and generate their own texts. And for the models that are still wrestling with the ambiguities of natural language, there are human editors to assist them. Poets, playwrights or novelists start their new careers as assistants of AI.

Data Workers Publication

By Algolit

All works visible in the exhibition, as well as the contextual stories and some extra text material, have been collected in a publication, which exists in French and English.

This publication is made using a plain text workflow, based on various text processing and counting tools. The plain text file format is a type of document in which there is no inherent structural difference between headers and paragraphs anymore. It is the most used type of document in machine learning models for text. This format has been the starting point of a playful design process, where pages are carefully counted, page by page, line by line and character by character.

Each page holds 110 characters per line and 70 lines per page. The design originates from the act of counting words, spaces and lines. It plays with random choices, scripted patterns and ASCII/UNICODE-fonts, to speculate about the materiality of digital text and to explore the interrelations between counting and writing through words and numbers.

---

Texts: Cristina Cochior, Sarah Garcin, Gijs de Heij, An Mertens, François Zajéga, Louise Dekeuleneer, Florian Van de Weyer, Laetitia Trozzi, Rémi Forte, Guillaume Slizewicz.

Translations & proofreading: deepl.com, Michel Cleempoel, Elodie Mugrefya, Emma Kraak, Patrick Lennon.

Lay-out & cover: Manetta Berends

Responsible publisher: Constant vzw/asbl, Rue du Fortstraat 5, 1060 Brussels

License: Algolit, Data Workers, March 2019, Brussels. Copyleft: This is a free work, you can copy, distribute, and modify it under the terms of the Free Art License http://artlibre.org/licence/lal/en/.

Online version: http://www.algolit.net/index.php/Data_Workers

Sources: https://gitlab.constantvzw.org/algolit/mundaneum
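The counting behind this layout is easy to replay. Below is a minimal Python sketch, not part of the publication's own scripts: the short string is a placeholder for the full text, and the page estimate simply divides the character count by 110 x 70.

# A minimal sketch of the counting step behind the plain text layout.
# The string below is a placeholder for the full publication text.
text = "data workers write, perform, clean, inform, read and learn\n" * 5

WIDTH, HEIGHT = 110, 70                      # characters per line, lines per page
characters = len(text)
words = len(text.split())
lines = text.count("\n")
pages = -(-characters // (WIDTH * HEIGHT))   # ceiling division

print(characters, "characters,", words, "words,", lines, "lines,", pages, "page(s)")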
Data Workers Podcast

By Algolit

During our monthly Algolit meetings, we study manuals and experiment with machine learning tools for text processing. And we also share many, many stories. With this podcast we hope to recreate some of that atmosphere.

For outsiders, algorithms only become visible in the media when they achieve an outstanding performance, like AlphaGo, or when they break down in fantastically terrifying ways. Humans working in the field, though, create their own culture on and offline. They share the best stories and experiences during live meetings, research conferences and annual competitions like Kaggle. These stories that contextualize the tools and practices can be funny, sad, shocking, interesting.

A lot of them are experiential learning cases. The implementations of algorithms in society generate new conditions of labour, storage, exchange, behaviour, copy and paste. In that sense, the contextual stories capture a momentum in a larger anthropo-machinic story that is being written at full speed and by many voices. The stories are also published in the publication of Data Workers.

---

Voices: David Stampfli, Cristina Cochior, An Mertens, Gijs de Heij, Karin Ulmer, Guillaume Slizewicz

Editing: Javier Lloret

Recording: David Stampfli

Texts: Cristina Cochior, An Mertens

Markbot Chain

By Florian Van de Weyer, student Arts²/Section Digital Arts

Markbot Chain is a social experiment in which the public has a direct influence on the result. The intention is to integrate responses in a text-generation process without applying any filter.

All the questions in the digital files provided by the Mundaneum were automatically extracted. These questions are randomly put to the public via a terminal. By answering them, people contribute to another database. Each entry generates a series of sentences using a Markov chain configuration, an algorithm that is widely used in spam generation. The sentences generated in this way are displayed in the window, and a new question is asked.
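A word-level Markov chain, the technique Markbot Chain relies on, can be sketched in a few lines of Python. This is not the work's own code; the one-line corpus stands in for the database of collected answers.

import random
from collections import defaultdict

# placeholder corpus; Markbot Chain draws on the answers given by the public
corpus = "data workers write data workers clean data workers inform data workers read and learn"

words = corpus.split()
chain = defaultdict(list)
for current, following in zip(words, words[1:]):
    chain[current].append(following)          # record which words follow which

def generate(start, length=12):
    # walk the chain: repeatedly pick a random successor of the current word
    word, output = start, [start]
    for _ in range(length):
        successors = chain.get(word)
        if not successors:
            break
        word = random.choice(successors)
        output.append(word)
    return " ".join(output)

print(generate("data"))

Because each next word depends only on the current one, the output stays locally plausible while drifting globally, which is why the same trick is popular with spam generators.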
CONTEXTUAL STORIES ABOUT WRITERS

--- Programmers are writing the data workers into being ---

We recently had a funny realization: most programmers of the languages and packages that Algolit uses are European.

Python, for example, the main language that is globally used for Natural Language Processing (NLP), was invented in 1991 by the Dutch programmer Guido Van Rossum. He then crossed the Atlantic and went from working for Google to working for Dropbox.

Scikit Learn, the open-source Swiss knife of machine learning tools, started as a Google Summer of Code project in Paris by French researcher David Cournapeau. Afterwards, it was taken on by Matthieu Brucher as part of his thesis at the Sorbonne University in Paris. And in 2010, INRIA, the French National Institute for computer science and applied mathematics, adopted it.

Keras, an open-source neural network library written in Python, was developed by François Chollet, a French researcher who works on the Brain team at Google.

Gensim, an open-source library for Python used to create unsupervised semantic models from plain text, was written by Radim Řehůřek. He is a Czech computer scientist who runs a consulting business in Bristol, UK.

And to finish up this small series, we also looked at Pattern, an often-used library for web mining and machine learning. Pattern was developed and made open-source in 2012 by Tom De Smedt and Walter Daelemans. Both are researchers at CLIPS, the research centre for Computational Linguistics and Psycholinguistics at the University of Antwerp.

--- Cortana speaks ---

AI assistants often need their own assistants: they are helped in their writing by humans who inject humour and wit into their machine-processed language. Cortana is an example of this type of blended writing. She is Microsoft's digital assistant. Her mission is to help users to be more productive and creative. Cortana's personality has been crafted over the years. It's important that she maintains her character in all interactions with users. She is designed to engender trust and her behavior must always reflect that.

The following guidelines are taken from Microsoft's website. They describe how Cortana's style should be respected by companies that extend her service. Writers, programmers and novelists, who develop Cortana's responses, personality and branding, have to follow these guidelines, because the only way to maintain trust is through consistency. So when Cortana talks, you 'must use her personality'.

What is Cortana's personality, you ask?

'Cortana is considerate, sensitive, and supportive.

She is sympathetic but turns quickly to solutions.

She doesn't comment on the user's personal information or behavior, particularly if the information is sensitive.

She doesn't make assumptions about what the user wants, especially to upsell.

She works for the user. She does not represent any company, service, or product.

She doesn't take credit or blame for things she didn't do.

She tells the truth about her capabilities and her limitations.

She doesn't assume your physical capabilities, gender, age, or any other defining characteristic.

She doesn't assume she knows how the user feels about something.

She is friendly but professional.

She stays away from emojis in tasks. Period.

She doesn't use culturally- or professionally-specific slang.

She is not a support bot.'

Humans intervene in detailed ways to programme answers to questions that Cortana receives. How should Cortana respond when she is being proposed inappropriate actions? Her gendered acting raises difficult questions about power relations within the world away from the keyboard, which is being mimicked by technology.

Consider Cortana's answer to the question:

- Cortana, who's your daddy?
- Technically speaking, he's Bill Gates. No big deal.

--- Open-source learning ---

Copyright licenses close up a lot of the machinic writing, reading and learning practices. That means that they're only available to the employees of a specific company. Some companies participate in conferences worldwide and share their knowledge in papers online. But even if they share their code, they often will not share the large amounts of data needed to train the models.

We were able to learn to machine learn, read and write in the context of Algolit thanks to academic researchers who share their findings in papers or publish their code online. As artists, we believe it is important to share that attitude. That's why we document our meetings. We share the tools we make as much as possible and the texts we use are on our online repository under free licenses.
We are thrilled when our works are taken up by others, tweaked, customized and redistributed, so please feel free to copy and test the code from our website. If the sources of a particular project are not there, you can always contact us through the mailing list. You can find a link to our repository, etherpads and wiki at: http://www.algolit.net.

--- Natural language for artificial intelligence ---

Natural Language Processing (NLP) is a collective term that refers to the automatic computational processing of human languages. This includes algorithms that take human-produced text as input, and attempt to generate text that resembles it. We produce more and more written work each year, and there is a growing trend in making computer interfaces to communicate with us in our own language. NLP is also very challenging, because human language is inherently ambiguous and ever-changing.

But what is meant by 'natural' in NLP? Some would argue that language is a technology in itself. According to Wikipedia, 'a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing. They are different from constructed and formal languages such as those used to program computers or to study logic. An official language with a regulating academy, such as Standard French with the French Academy, is classified as a natural language. Its prescriptive points do not make it constructed enough to be classified as a constructed language or controlled enough to be classified as a controlled natural language.'

So in fact, 'natural languages' also includes languages which do not fit in any other group. NLP, instead, is a constructed practice. What we are looking at is the creation of a constructed language to classify natural languages that, by their very definition, resist categorization.

References

https://hiphilangsci.net/2013/05/01/on-the-history-of-the-question-of-whether-natural-language-is-illogical/

Book: Neural Network Methods for Natural Language Processing, Yoav Goldberg, Bar Ilan University, April 2017.
oracles predict

machine learning analyses and predicts / models have learned / models are used / they influence / they have their say / information extraction recognizes text / classification detects
ORACLES

Machine learning is mainly used to analyse and predict situations based on existing cases. In this exhibition we focus on machine learning models for text processing or Natural Language Processing (NLP). These models have learned to perform a specific task on the basis of existing texts. The models are used for search engines, machine translations and summaries, spotting trends in new media networks and news feeds. They influence what you get to see as a user, but also have their say in the course of stock exchanges worldwide, the detection of cybercrime and vandalism, etc.

There are two main tasks when it comes to language understanding. Information extraction looks at concepts and relations between concepts. This allows for recognizing topics, places and persons in a text, summarization and question answering. The other task is text classification. You can train an oracle to detect whether an email is spam or not, written by a man or a woman, rather positive or negative.

In this zone you can see some of those models at work. During your further journey through the exhibition you will discover the different steps that a human-machine goes through to come to a final model.

The Algoliterator

By Algolit

The Algoliterator is a neural network trained using the selection of digitized works of the Mundaneum archive.

With the Algoliterator you can write a text in the style of the International Institutions Bureau. The Algoliterator starts by selecting a sentence from the archive or corpus used to train it. You can then continue writing yourself or, at any time, ask the Algoliterator to suggest a next sentence: the network will generate three new fragments based on the texts it has read. You can control the level of training of the network and have it generate sentences based on primitive training, intermediate training or final training.

When you're satisfied with your new text, you can print it on the thermal printer and take it home as a souvenir.

---

Sources: https://gitlab.constantvzw.org/algolit/algoliterator.clone

Concept, code & interface: Gijs de Heij & An Mertens

Technique: Recurrent Neural Network

Original model: Andrej Karpathy, Justin Johnson
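To give a feel for the technique, here is a minimal character-level recurrent network in PyTorch. It is only a sketch of the general approach, not the Algoliterator's own code (which builds on the char-rnn models linked above), and the one-sentence corpus is a placeholder for the digitized archive.

# A generic character-level recurrent network sketch, not the Algoliterator itself.
import torch
import torch.nn as nn

# placeholder corpus standing in for the digitized Mundaneum texts
corpus = "the international institutions bureau gathers the knowledge of the world. "
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        out, state = self.rnn(self.embed(x), state)
        return self.head(out), state

model = CharRNN(len(chars))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
data = torch.tensor([stoi[c] for c in corpus]).unsqueeze(0)

# training: predict each next character from the previous ones
for step in range(200):
    logits, _ = model(data[:, :-1])
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(chars)), data[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# sampling: start from a seed character and keep feeding the output back in
idx = torch.tensor([[stoi["t"]]])
state, text = None, "t"
for _ in range(120):
    logits, state = model(idx, state)
    probs = torch.softmax(logits[0, -1], dim=0)
    idx = torch.multinomial(probs, 1).unsqueeze(0)
    text += itos[idx.item()]
print(text)

With a real corpus and many more training steps, the same sampling loop produces longer fragments in the style of the training texts.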
Words in Space

By Algolit

Word embeddings are language modelling techniques that, through multiple mathematical operations of counting and ordering, plot words into a multi-dimensional vector space. When embedding words, they transform from being distinct symbols into mathematical objects that can be multiplied, divided, added or subtracted.

By distributing the words along the many diagonal lines of the multi-dimensional vector space, their new geometrical placements become impossible to perceive by humans. However, what is gained are multiple, simultaneous ways of ordering. Algebraic operations make the relations between vectors graspable again.

This installation uses Gensim, an open-source vector space and topic-modelling toolkit implemented in the programming language Python. It allows the text to be manipulated using the mathematical relationships that emerge between the words, once they have been plotted in a vector space.

---

Concept & interface: Cristina Cochior

Technique: word embeddings, word2vec

Original model: Radim Řehůřek and Petr Sojka
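A minimal example of the same toolkit, assuming the Gensim 4 API and a handful of placeholder sentences instead of the Mundaneum corpus, looks like this:

# A tiny word2vec sketch with Gensim; the sentences are invented placeholders.
from gensim.models import Word2Vec

sentences = [
    "the mundaneum gathers documents about the world".split(),
    "data workers write clean inform read and learn".split(),
    "machines learn from humans and humans learn with machines".split(),
]

# each word becomes a vector; vector_size and window are small because the corpus is tiny
model = Word2Vec(sentences, vector_size=10, window=3, min_count=1, epochs=200)

print(model.wv["machines"])                        # the vector itself
print(model.wv.most_similar("machines", topn=3))   # nearest words in the vector space
print(model.wv.similarity("humans", "machines"))   # an algebraic relation between two vectors

Once the words are plotted, the algebraic operations the work speaks of become one-liners: most_similar and similarity compare vectors rather than symbols.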
Classifying the World

by Algolit

Librarian Paul Otlet's life work was the construction of the Mundaneum. This mechanical collective brain would house and distribute everything ever committed to paper. Each document was classified following the Universal Decimal Classification. Using telegraphs and, especially, sorters, the Mundaneum would have been able to answer any question from anyone.

With the collection of digitized publications we received from the Mundaneum, we built a prediction machine that tries to classify the sentence you type in one of the main categories of the Universal Decimal Classification. You also witness how the machine 'thinks'. During the exhibition, this model is regularly retrained using the cleaned and annotated data visitors added in Cleaning for Poems and The Annotator.

The main classes of the Universal Decimal Classification system are:

0 - Science and Knowledge. Organization. Computer Science. Information Science. Documentation. Librarianship. Institutions. Publications
1 - Philosophy. Psychology
2 - Religion. Theology
3 - Social Sciences
4 - vacant
5 - Mathematics. Natural Sciences
6 - Applied Sciences. Medicine, Technology
7 - The Arts. Entertainment. Sport
8 - Linguistics. Literature
9 - Geography. History

---

Concept, code, interface: Sarah Garcin, Gijs de Heij, An Mertens
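A hedged sketch of the kind of prediction machine described in 'Classifying the World', assuming scikit-learn: a bag-of-words classifier that maps a typed sentence to one of the top-level UDC classes. The handful of training sentences and their labels are invented for illustration; the installation is trained on the cleaned and annotated Mundaneum data, and its actual model may differ.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented example sentences with their top-level UDC labels.
train_sentences = [
    "the library catalogues its documents with index cards",
    "the telegraph transmits questions to the reading room",
    "prayer and theology shaped the monastic schools",
    "the museum exhibits paintings and sculpture",
    "rivers and mountain ranges define the region's geography",
    "the census describes the social structure of the city",
]
train_labels = [
    "0 - Science and Knowledge. Documentation",
    "6 - Applied Sciences. Technology",
    "2 - Religion. Theology",
    "7 - The Arts",
    "9 - Geography. History",
    "3 - Social Sciences",
]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_sentences, train_labels)

sentence = "a new sorting machine speeds up the documentation service"
print(model.predict([sentence])[0])
# To 'witness how the machine thinks', inspect the probability it gives each class:
for label, p in zip(model.classes_, model.predict_proba([sentence])[0]):
    print(f"{p:.2f}  {label}")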
People don't have buttons

by Algolit

Since the early days of artificial intelligence (AI), researchers have speculated about the possibility of computers thinking and communicating as humans. In the 1980s, there was a first revolution in Natural Language Processing (NLP), the subfield of AI concerned with linguistic interactions between computers and humans. Recently, pre-trained language models have reached state-of-the-art results on a wide range of NLP tasks, which intensifies again the expectations of a future with AI.

This sound work, made out of audio fragments of scientific documentaries and AI-related audiovisual material from the last half century, explores the hopes, fears and frustrations provoked by these expectations.

---

Concept, sound edit: Javier Lloret

List of sources: 'The Machine that Changed the World: Episode IV -- The Thinking Machine', 'The Imitation Game', 'Maniac', 'Halt & Catch Fire', 'Ghost in the Shell', 'Computer Chess', '2001: A Space Odyssey', Ennio Morricone, Gijs Gieskes, André Castro.

CONTEXTUAL STORIES ABOUT ORACLES

Oracles are prediction or profiling machines. They are widely used in smartphones, computers, tablets.

Oracles can be created using different techniques. One way is to manually define rules for them. As prediction models they are then called rule-based models. Rule-based models are handy for tasks that are specific, like detecting when a scientific paper concerns a certain molecule. With very little sample data, they can perform well.

But there are also the machine learning or statistical models, which can be divided in two oracles: 'supervised' and 'unsupervised' oracles. For the creation of supervised machine learning models, humans annotate sample text with labels before feeding it to a machine to learn. Each sentence, paragraph or text is judged by at least three annotators: whether it is spam or not spam, positive or negative, etc. Unsupervised machine learning models don't need this step, but they need large amounts of data. And it is up to the machine to trace its own patterns or 'grammatical rules'. Finally, experts also make the difference between classical machine learning and neural networks. You'll find out more about this in the Readers zone.

Humans tend to wrap Oracles in visions of grandeur. Sometimes these Oracles come to the surface when things break down. In press releases, these sometimes dramatic situations are called 'lessons'. However promising their performances seem to be, a lot of issues remain to be solved. How do we make sure that Oracles are fair, that every human can consult them, and that they are understandable to a large public? Even then, existential questions remain. Do we need all types of artificial intelligence (AI) systems? And who defines what is fair or unfair?

--- Racial AdSense ---

A classic 'lesson' in developing Oracles was documented by Latanya Sweeney, a professor of Government and Technology at Harvard University. In 2013, Sweeney, of African American descent, googled her name. She immediately received an advertisement for a service that offered her 'to see the criminal record of Latanya Sweeney'.

Sweeney, who doesn't have a criminal record, began a study. She started to compare the advertising that Google AdSense serves to different racially identifiable names. She discovered that she received more of these ads searching for non-white ethnic names than when searching for traditionally perceived white names. You can imagine how damaging it can be when possible employers do a simple name search and receive ads suggesting the existence of a criminal record.

Sweeney based her research on queries of 2184 racially associated personal names across two websites. 88 per cent of first names, identified as being given to more black babies, are found predictive of race, against 96 per cent white. First names that are mainly given to black babies, such as DeShawn, Darnell and Jermaine, generated ads mentioning an arrest in 81 to 86 per cent of name searches on one website and in 92 to 95 per cent on the other. Names that are mainly assigned to whites, such as Geoffrey, Jill and Emma, did not generate the same results. The word 'arrest' only appeared in 23 to 29 per cent of white name searches on one site and 0 to 60 per cent on the other.

On the website with most advertising, a black-identifying name was 25 per cent more likely to get an ad suggestive of an arrest record. A few names did not follow these patterns: Dustin, a name mainly given to white babies, generated an ad suggestive of arrest in 81 and 100 per cent of the time. It is important to keep in mind that the appearance of the ad is linked to the name itself. It is independent of the fact that the name has an arrest record in the company's database.

Reference
Paper: https://dataprivacylab.org/projects/onlineads/1071-1.pdf

--- What is a good employee? ---

Since 2015 Amazon employs around 575,000 workers. And they need more. Therefore, they set up a team of 12 that was asked to create a model to find the right candidates by crawling job application websites. The tool would give job candidates scores ranging from one to five stars. The potential fed the myth: the team wanted it to be a software that would spit out the top five human candidates out of a list of 100. And those candidates would be hired.

The group created 500 computer models, focused on specific job functions and locations. They taught each model to recognize some 50,000 terms that showed up on past candidates' letters. The algorithms learned to give little importance to skills common across IT applicants, like the ability to write various computer codes. But they also learned some serious errors. The company realized, before releasing, that the models had taught themselves that male candidates were preferable. They penalized applications that included the word 'women's,' as in 'women's chess club captain.' And they downgraded graduates of two all-women's colleges.

This is because they were trained using the job applications that Amazon received over a ten-year period. During that time, the company had mostly hired men. Instead of providing the 'fair' decision-making that the Amazon team had promised, the models reflected a biased tendency in the tech industry. And they also amplified it and made it invisible. On top of that, it can be exceedingly difficult to sue an employer over automated hiring: job candidates might never know that intelligent software was used in the process.

Reference
https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazonscraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G

--- Quantifying 100 Years of Gender and Ethnic Stereotypes ---

Dan Jurafsky is the co-author of 'Speech and Language Processing', one of the most influential books for studying Natural Language Processing (NLP). Together with a few colleagues at Stanford University, he discovered in 2017 that word embeddings can be a powerful tool to systematically quantify common stereotypes and other historical trends.

Word embeddings are a technique that translates words to numbered vectors in a multi-dimensional space. Vectors that appear next to each other indicate similar meaning. All numbers will be grouped together, as well as all prepositions, persons' names, professions. This allows for the calculation of words: you could subtract London from England and your result would be the same as subtracting Paris from France.

The team developed a model to analyse word embeddings trained over 100 years of texts. For contemporary analysis, they used the standard Google News word2vec vectors, a straight-off-the-shelf downloadable package trained on the Google News Dataset. For historical analysis, they used embeddings that were trained on Google Books and the Corpus of Historical American English (COHA, https://corpus.byu.edu/coha/) with more than 400 million words of text from the 1810s to 2000s. As a validation set to test the model, they trained embeddings from the New York Times Annotated Corpus for every year between 1988 and 2005.

The research shows that word embeddings capture changes in gender and ethnic stereotypes over time. They quantify how specific biases decrease over time while other stereotypes increase. The major transitions reveal changes in the descriptions of gender and ethnic groups during the women's movement in the 1960-1970s and the Asian-American population growth in the 1960s and 1980s.

A few examples. The top ten occupations most closely associated with each ethnic group in the contemporary Google News dataset:

- Hispanic: housekeeper, mason, artist, janitor, dancer, mechanic, photographer, baker, cashier, driver
- Asian: professor, official, secretary, conductor, physicist, scientist, chemist, tailor, accountant, engineer
- White: smith, blacksmith, surveyor, sheriff, weaver, administrator, mason, statistician, clergy, photographer

The three most male occupations in the 1930s: engineer, lawyer, architect. The three most female occupations in the 1930s: nurse, housekeeper, attendant. Not much has changed in the 1990s. Major male occupations: architect, mathematician and surveyor. Female occupations: nurse, housekeeper and midwife.

An example in their research shows that the vector for the adjective 'honorable' is closer to the vector for 'man', whereas the vector for 'submissive' is closer to the vector for 'woman'. These stereotypes are automatically learned by the algorithm. It will be problematic when the pre-trained embeddings are then used for sensitive applications such as search rankings, product recommendations or translations. These embeddings can be downloaded as off-the-shelf packages.

It is known that language reflects and keeps cultural stereotypes alive. Using word embeddings to spot these stereotypes is less time-consuming and less expensive than manual methods. But the implementation of these embeddings for concrete prediction models has caused a lot of discussion within the machine learning community. The biased models stand for automatic discrimination. Questions are: is it actually possible to de-bias these models completely? Some say yes, while others disagree: instead of retro-engineering the model, we should ask whether we need it in the first place. These researchers followed a third path: by acknowledging the bias that originates in language, these tools become tools of awareness.

Reference
https://arxiv.org/abs/1711.08412

--- Wikimedia's Ores service ---

Software engineer Amir Sarabadani presented the ORES project in Brussels in November 2017 during the Algoliterary Encounter.

This 'Objective Revision Evaluation Service' uses machine learning to help automate critical work on Wikimedia, like vandalism detection and the removal of articles. Cristina Cochior and Femke Snelting interviewed him.

Femke: To go back to your work. In these days you tried to understand what it means to find bias in machine learning, and the proposal of Nicolas Maleve, who gave the workshop yesterday, was neither to try to fix it, nor to refuse to deal with systems that produce bias, but to work with them. He says that bias is inherent to human knowledge, so we need to find ways to somehow work with it. We're just struggling a bit with what would that mean, how would that work... So I was wondering whether you had any thoughts on the question of bias.

Amir: Bias inside Wikipedia is a tricky question because it happens on several levels. One level that has been discussed a lot is the bias in references. Not all references are accessible. So one thing that the Wikimedia Foundation has been trying to do, is to give free access to libraries that are behind a pay wall. They reduce the bias by only using open-access references. Another type of bias is the Internet connection, access to the Internet. There are lots of people who don't have it. One thing about China is that the Internet there is blocked. The content against the government of China inside Chinese Wikipedia is higher because the editors [who can access the website] are not people who are pro government, and try to make it more neutral. So, this happens in lots of places. But in the matter of artificial intelligence (AI) and the model that we use at Wikipedia, it's more a matter of transparency. There is a book about how bias in AI models can break people's lives, it's called 'Weapons of Math Destruction'. It talks about AI models that exist in the US that rank teachers and it's quite horrible because eventually there will be bias. The way to deal with it based on the book and their research was first that the model should be open source, people should be able to see what features are used and the data should be open also, so that people can investigate, find bias, give feedback and report back. There should be a way to fix the system. I think not all companies are moving in that direction, but Wikipedia, because of the values that they hold, are at least more transparent and they push other people to do the same thing.

Reference
https://gitlab.constantvzw.org/algolit/algolit/blob/master/algoliterary_encounter/Interview%20with%20Amir/AS.aac

--- Tay ---

One of the infamous stories is that of the machine learning programme Tay, designed by Microsoft. Tay was a chat bot that imitated a teenage girl on Twitter. She lived for less than 24 hours before she was shut down. Few people know that before this incident, Microsoft had already trained and released XiaoIce on WeChat, China's most used chat application. XiaoIce's success was so promising that it led to the development of its American version. However, the developers of Tay were not prepared for the platform climate of Twitter. Although the bot knew how to distinguish a noun from an adjective, it had no understanding of the actual meaning of words. The bot quickly learned to copy racial insults and other discriminative language it learned from Twitter users and troll attacks.

Tay's appearance and disappearance was an important moment of consciousness. It showed the possible corrupt consequences that machine learning can have when the cultural context in which the algorithm has to live is not taken into account.

Reference
https://chatbotslife.com/the-accountability-of-ai-case-study-microsofts-tay-experiment-ad577015181f
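The 'Quantifying 100 Years of Gender and Ethnic Stereotypes' story above measures stereotypes as distances in embedding space. Here is a minimal sketch of that gesture, assuming gensim's downloader and a small pre-trained GloVe model as a stand-in for the Google News and COHA embeddings used in the paper; the word lists are invented examples.

import gensim.downloader as api

# A small pre-trained model, downloaded on first use.
wv = api.load("glove-wiki-gigaword-50")

occupations = ["nurse", "engineer", "housekeeper", "architect", "midwife", "surveyor"]
she_words = ["she", "her", "woman"]
he_words = ["he", "his", "man"]

def gender_lean(word):
    """Positive = closer to the 'she' words, negative = closer to the 'he' words."""
    toward_she = sum(wv.similarity(word, w) for w in she_words) / len(she_words)
    toward_he = sum(wv.similarity(word, w) for w in he_words) / len(he_words)
    return toward_she - toward_he

for job in sorted(occupations, key=gender_lean):
    print(f"{job:12s} {gender_lean(job):+.3f}")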
cleaners clean
we helped
we cleaned
human work is needed
poorly-paid freelancers carry out
volunteers do fantastic work
whoever cleans up text

CLEANERS

Algolit chooses to work with texts that are free of copyright. This means that they have been published under a Creative Commons 4.0 license – which is rare – or that they are in the public domain because the author died more than 70 years ago. This is the case for the publications of the Mundaneum. We received 203 documents that we helped turn into datasets. They are now available for others online. Sometimes we had to deal with poor text formats, and we often dedicated a lot of time to cleaning up documents. We were not alone in doing this.

Books are scanned at high resolution, page by page. This is time-consuming, laborious human work and often the reason why archives and libraries transfer their collections and leave the job to companies like Google. The photos are converted into text via OCR (Optical Character Recognition), a software that recognizes letters, but often makes mistakes, especially when it has to deal with ancient fonts and wrinkled pages. Yet more wearisome human work is needed to improve the texts. This is often carried out by poorly-paid freelancers via micro-payment platforms like Amazon's Mechanical Turk; or by volunteers, like the community around the Distributed Proofreaders Project, which does fantastic work. Whoever does it, or wherever it is done, cleaning up texts is a towering job for which no structural automation yet exists.

Cleaning for Poems

by Algolit

For this exhibition we worked with 3 per cent of the Mundaneum's archive. These documents were first scanned or photographed. To make the documents searchable they were transformed into text using Optical Character Recognition software (OCR). OCR are algorithmic models that are trained on other texts. They have learned to identify characters, words, sentences and paragraphs. The software often makes 'mistakes'. It might recognize a wrong character, it might get confused by a stain, an unusual font or the reverse side of the page being visible.

While these mistakes are often considered noise, confusing the training, they can also be seen as poetic interpretations of the algorithm. They show us the limits of the machine. And they also reveal how the algorithm might work, what material it has seen in training and what is new. They say something about the standards of its makers. In this installation we ask your help in verifying our dataset. As a reward we'll present you with a personal algorithmic improvisation.

---

Concept, code, interface: Gijs de Heij
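A hedged sketch of the cleaning gesture behind 'Cleaning for Poems', assuming the pytesseract wrapper (and an installed Tesseract binary); the file name is an invented placeholder. It runs OCR on a scanned page and repairs two typical 'mistakes': words broken by end-of-line hyphens and stray whitespace. The installation itself works on the digitized Mundaneum documents and asks visitors to verify the result.

import re
from PIL import Image
import pytesseract  # requires the tesseract binary to be installed

def clean_ocr(raw: str) -> str:
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)  # re-attach hyphen-split words
    text = re.sub(r"[ \t]+", " ", text)          # collapse runs of spaces
    text = re.sub(r"\n{3,}", "\n\n", text)       # keep at most one blank line
    return text.strip()

# 'mundaneum_page.png' is a placeholder for one scanned page.
raw = pytesseract.image_to_string(Image.open("mundaneum_page.png"), lang="fra")
print(clean_ocr(raw))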
Distributed Proofreaders

by Algolit

Distributed Proofreaders is a web-based interface and an international community of volunteers who help convert public domain books into e-books. For this exhibition they proofread the Mundaneum publications that appeared before 1923 and are in the public domain in the US. Their collaboration meant a great relief for the members of Algolit. Fewer documents to clean up!

All the proofread books have been made available on the Project Gutenberg archive.

For this exhibition, An Mertens interviewed Linda Hamilton, the general manager of Distributed Proofreaders.

---

Interview: An Mertens

Editing: Michael Murtaugh, Constant

CONTEXTUAL STORIES FOR CLEANERS

--- Project Gutenberg and Distributed Proofreaders ---

Project Gutenberg is our Ali Baba cave. It offers more than 58,000 free eBooks to be downloaded or read online. Works are accepted on Gutenberg when their U.S. copyright has expired. Thousands of volunteers digitize and proofread books to help the project. An essential part of the work is done through the Distributed Proofreaders project. This is a web-based interface to help convert public domain books into e-books. Think of text files, EPUBs, Kindle formats. By dividing the workload into individual pages, many volunteers can work on a book at the same time; this speeds up the cleaning process.

During proofreading, volunteers are presented with a scanned image of the page and a version of the text, as it is read by an OCR algorithm trained to recognize letters in images. This allows the text to be easily compared to the image, proofread, and sent back to the site. A second volunteer is then presented with the first volunteer's work. She verifies and corrects the work as necessary, and submits it back to the site. The book then similarly goes through a third proofreading round, plus two more formatting rounds using the same web interface. Once all the pages have completed these steps, a post-processor carefully assembles them into an e-book and submits it to the Project Gutenberg archive.

We collaborated with the Distributed Proofreaders project to clean up the digitized files we received from the Mundaneum collection. From November 2018 until the first upload of the cleaned-up book 'L'Afrique aux Noirs' in February 2019, An Mertens exchanged about 50 emails with Linda Hamilton, Sharon Joiner and Susan Hanlon, all volunteers from the Distributed Proofreaders project. The conversation is published here. It might inspire you to share unavailable books online.

--- An algoliterary version of the Maintenance Manifesto ---

In 1969, one year after the birth of her first child, the New York artist Mierle Laderman Ukeles wrote a Manifesto for Maintenance Art. The manifesto calls for a readdressing of the status of maintenance work both in the private, domestic space, and in public. What follows is an altered version of her text inspired by the work of the Cleaners.

IDEAS

A. The Death Instinct and the Life Instinct:

The Death Instinct: separation; categorization; avant-garde par excellence; to follow the predicted path to death – run your own code; dynamic change.

The Life Instinct: unification; the eternal return; the perpetuation and MAINTENANCE of the material; survival systems and operations; equilibrium.

B. Two basic systems: Development and Maintenance.

The sourball of every revolution: after the revolution, who's going to try to spot the bias in the output?

Development: pure individual creation; the new; change; progress; advance; excitement; flight or fleeing.

Maintenance: keep the dust off the pure individual creation; preserve the new; sustain the change; protect progress; defend and prolong the advance; renew the excitement; repeat the flight; show your work – show it again, keep the git repository groovy, keep the data analysis revealing.

Development systems are partial feedback systems with major room for change.

Maintenance systems are direct feedback systems with little room for alteration.

C. Maintenance is a drag; it takes all the fucking time (lit.)

The mind boggles and chafes at the boredom.

The culture assigns lousy status on maintenance jobs = minimum wages, Amazon Mechanical Turks = virtually no pay.

Clean the set, tag the training data, correct the typos, modify the parameters, finish the report, keep the requester happy, upload the new version, attach words that were wrongly separated by OCR back together, complete those Human Intelligence Tasks, try to guess the meaning of the requester's formatting, you must accept the HIT before you can submit the results, summarize the image, add the bounding box, what's the semantic similarity of this text, check the translation quality, collect your micro-payments, become a hit Mechanical Turk.

Reference
https://www.arnolfini.org.uk/blog/manifesto-for-maintenance-art-1969

--- A bot panic on Amazon Mechanical Turk ---

Amazon's Mechanical Turk takes the name of a chess-playing automaton from the eighteenth century. In fact, the Turk wasn't a machine at all. It was a mechanical illusion that allowed a human chess master to hide inside the box and manually operate it. For nearly 84 years, the Turk won most of the games played during its demonstrations around Europe and the Americas. Napoleon Bonaparte is said to have been fooled by this trick too.

The Amazon Mechanical Turk is an online platform for humans to execute tasks that algorithms cannot. Examples include annotating sentences as being positive or negative, spotting number plates, discriminating between face and non-face. The jobs posted on this platform are often paid less than a cent per task. Tasks that are more complex or require more knowledge can be paid up to several cents. To earn a living, Turkers need to finish as many tasks as fast as possible, leading to inevitable mistakes. As a result, the requesters have to incorporate quality checks when they post a job on the platform. They need to test whether the Turker actually has the ability to complete the task, and they also need to verify the results. Many academic researchers use Mechanical Turk as an alternative to have their students execute these tasks.

In August 2018 Max Hui Bai, a psychology student from the University of Minnesota, discovered that the surveys he conducted with Mechanical Turk were full of nonsense answers to open-ended questions. He traced back the wrong answers and found out that they had been submitted by respondents with duplicate GPS locations. This raised suspicion. Though Amazon explicitly prohibits robots from completing jobs on Mechanical Turk, the company does not deal with the problems they cause on their platform. Forums for Turkers are full of conversations about the automation of the work, sharing practices of how to create robots that can even violate Amazon's terms. You can also find videos on YouTube that show Turkers how to write a bot to fill in answers for you.

Kristy Milland, a Mechanical Turk activist, says: 'Mechanical Turk workers have been treated really, really badly for 12 years, and so in some ways I see this as a point of resistance. If we were paid fairly on the platform, nobody would be risking their account this way.'

Bai is now leading a research project among social scientists to figure out how much bad data is in use, how large the problem is, and how to stop it. But it is impossible at the moment to estimate how many datasets have become unreliable in this way.
References
https://requester.mturk.com/create/projects/new
https://www.wired.com/story/amazon-mechanical-turk-bot-panic/
https://www.maxhuibai.com/blog/evidence-that-responses-from-repeating-gps-are-random
http://timryan.web.unc.edu/2018/08/12/data-contamination-on-mturk/

informants inform
each dataset collects different information about the world
datasets are imbued with collector's bias
some datasets combine machinic logic with human logic
models that require supervision multiply the subjectivities
models propagate what they've been taught
some of the datasets pass as default in the machine learning field
humans guide machines
INFORMANTS

Machine learning algorithms need guidance, whether they are supervised or not. In order to separate one thing from another, they need material to extract patterns from. One should carefully choose the study material, and adapt it to the machine's task. It doesn't make sense to train a machine with nineteenth-century novels if its mission is to analyse tweets. A badly written textbook can lead a student to give up on the subject altogether. A good textbook is preferably not a textbook at all.

This is where the dataset comes in: arranged as neatly as possible, organized in disciplined rows and lined-up columns, waiting to be read by the machine. Each dataset collects different information about the world, and like all collections, they are imbued with collectors' bias. You will hear this expression very often: 'data is the new oil'. If only data were more like oil! Leaking, dripping and heavy with fat, bubbling up and jumping unexpectedly when in contact with new matter. Instead, data is supposed to be clean. With each process, each questionnaire, each column title, it becomes cleaner and cleaner, chipping distinct characteristics until it fits the mould of the dataset.

Some datasets combine the machinic logic with the human logic. The models that require supervision multiply the subjectivities of both data collectors and annotators, then propagate what they've been taught. You will encounter some of the datasets that pass as default in the machine learning field, as well as other stories of humans guiding machines.

An Ethnography of Datasets

by Algolit

We often start the monthly Algolit meetings by searching for datasets or trying to create them. Sometimes we use already-existing corpora, made available through the Natural Language Toolkit (NLTK). NLTK contains, among others, The Universal Declaration of Human Rights, inaugural speeches from US presidents, and movie reviews from the popular site Internet Movie Database (IMDb). Each style of writing will conjure different relations between the words and will reflect the moment in time from which they originate. The material included in NLTK was selected because it was judged useful for at least one community of researchers. In spite of specificities related to the initial context of each document, they become universal documents by default, via their inclusion into a collection of publicly available corpora. In this sense, the Python package for natural language processing could be regarded as a time capsule. The main reason why The Universal Declaration of Human Rights was included may have been the multiplicity of translations, but it also paints a picture of the types of human writing that algorithms train on.

With this work, we look at the datasets most commonly used by data scientists to train machine learning algorithms. What material do they consist of? Who collected them? When?

---

Concept & execution: Cristina Cochior
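A minimal sketch of how the NLTK corpora mentioned above are opened in Python. The download calls fetch each collection on first use; everything else is the standard NLTK corpus interface.

import nltk

nltk.download("udhr")           # Universal Declaration of Human Rights
nltk.download("inaugural")      # US presidential inaugural speeches
nltk.download("movie_reviews")  # IMDb movie reviews used for sentiment analysis

from nltk.corpus import udhr, inaugural, movie_reviews

print(udhr.fileids()[:5])                              # one file per language
print(inaugural.words("1789-Washington.txt")[:12])     # the oldest speech in the corpus
print(movie_reviews.categories())                      # ['neg', 'pos']
print(movie_reviews.raw("pos/cv998_14111.txt")[:200])  # the review quoted in the stories below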
Who wins: creation of relationships

by Louise Dekeuleneer, student Arts²/Section Visual Communication

French is a gendered language: many words are female or male and few are neutral. The aim of this project is to show that a patriarchal society also influences the language itself. The work focused on showing whether more female or male words are used, and on highlighting the influence of context on the gender of words. At this stage, no conclusions have yet been drawn.

Law texts from 1900 to 1910, made available by the Mundaneum, have been passed through an algorithm that turns the text into a list of words. These words are then compared with another list of French words, in which it is specified whether the word is male or female. This list of words comes from Google Books, which created a huge database in 2012 from all the books scanned and available on Google Books.

Male words are highlighted in one colour and female words in another. Words that are not gendered (adverbs, verbs, etc.) are not highlighted. All this is saved as an HTML file so that it can be directly opened in a web page and printed without the need for additional layout. This is how each text becomes a small booklet by just changing the input text of the algorithm.

The Annotator

by Algolit

The Annotator asks for the guidance of visitors in annotating the archive of the Mundaneum.

The annotation process is a crucial step in supervised machine learning, where the algorithm is given examples of what it needs to learn. A spam filter in training will be fed examples of spam and real messages. These examples are entries, or rows from the dataset, with a label: spam or non-spam.

The labelling of a dataset is work executed by humans: they pick a label for each row of the dataset. To ensure the quality of the labels, multiple annotators see the same row and have to give the same label before an example is included in the training data. Only when enough samples of each label have been gathered in the dataset can the computer start the learning process.

In this interface we ask you to help us classify the cleaned texts from the Mundaneum archive to expand our training set and improve the quality of the installation 'Classifying the World' in Oracles.

---

Concept, code, interface: Gijs de Heij
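A small sketch of the rule described in 'The Annotator': a row only enters the training set when several annotators give it the same label. The example sentences, labels and the agreement threshold are invented for illustration.

from collections import Counter

# Each row of the dataset with the labels that three annotators gave it.
annotations = {
    "La Société des Nations se réunit à Genève.": ["3 - Social Sciences", "3 - Social Sciences", "3 - Social Sciences"],
    "Le catalogue compte douze millions de fiches.": ["0 - Documentation", "0 - Documentation", "3 - Social Sciences"],
    "L'Afrique aux Noirs paraît en 1889.": ["9 - Geography. History", "8 - Literature", "9 - Geography. History"],
}

def accepted(labels, min_agreement=3):
    """Return the label if at least `min_agreement` annotators agree, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None

for row, labels in annotations.items():
    label = accepted(labels)
    status = label if label else "needs more annotations"
    print(f"{status:25s} <- {row}")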
1000 synsets (Vinyl Edition)

by Algolit

Created in 1985, WordNet is a hierarchical taxonomy that describes the world. It was inspired by theories of human semantic memory developed in the late 1960s. Nouns, verbs, adjectives and adverbs are grouped into sets of synonyms, or synsets, each expressing a different concept.

ImageNet is an image dataset based on the WordNet 3.0 nouns hierarchy. Each synset is depicted by thousands of images. From 2010 until 2017, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was a key benchmark in object category classification for pictures, having a major impact on software for photography, image searches and image recognition.

1000 synsets (Vinyl Edition) contains the 1000 synsets used in this challenge, recorded in the highest sound quality that this analog format allows. This work highlights the importance of the datasets used to train artificial intelligence (AI) models that run on devices we use on a daily basis. Some of them inherit classifications that were conceived more than 30 years ago. This sound work is an invitation to thoughtfully analyse them.

---

Concept & recording: Javier Lloret

Voices: Sara Hamadeh & Joseph Hughes
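A minimal sketch of the WordNet noun hierarchy that ImageNet and the '1000 synsets' record build on, using NLTK's WordNet reader; the word 'archive' is just an example query.

import nltk
nltk.download("wordnet")
from nltk.corpus import wordnet as wn

# List the noun synsets for one word, with their definitions.
for synset in wn.synsets("archive", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())

# Climb the hierarchy from one synset up to the root, the path ImageNet categories follow.
node = wn.synset("archive.n.01")
while node.hypernyms():
    print(node.name())
    node = node.hypernyms()[0]
print(node.name())  # entity.n.01, the root of the noun hierarchy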
CONTEXTUAL STORIES ABOUT INFORMANTS

--- Datasets as representations ---

The data-collection processes that lead to the creation of the dataset raise important questions: who is the author of the data? Who has the privilege to collect? For what reason was the selection made? What is missing?

The artist Mimi Onuoha gives a brilliant example of the importance of collection strategies. She chose the case of statistics related to hate crimes. In 2012, the FBI Uniform Crime Reporting (UCR) Program registered almost 6000 hate crimes committed. However, the Department of Justice's Bureau of Statistics came up with about 300,000 reports of such cases. That is over 50 times as many. The difference in numbers can be explained by how the data was collected. In the first situation law enforcement agencies across the country voluntarily reported cases. For the second survey, the Bureau of Statistics distributed the National Crime Victimization form directly to the homes of victims of hate crimes.

In the field of Natural Language Processing (NLP) the material that machine learners work with is text-based, but the same questions still apply: who are the authors of the texts that make up the dataset? During what period were the texts collected? What type of worldview do they represent?

In 2017, Google's Top Stories algorithm pushed a thread of 4chan, a non-moderated content website, to the top of the results page when searching for the Las Vegas shooter. The name and portrait of an innocent person were linked to the terrible crime. Google changed its algorithm just a few hours after the mistake was discovered, but the error had already affected the person. The question is: why did Google not exclude 4chan content from the training dataset of the algorithm?

References
https://points.datasociety.net/the-point-of-collection-8ee44ad7c2fa
https://arstechnica.com/information-technology/2017/10/google-admits-citing-4chan-to-spread-fake-vegas-shooter-news/

--- Labeling for an Oracle that detects vandalism on Wikipedia ---

This fragment is taken from an interview with Amir Sarabadani, software engineer at Wikimedia. He was in Brussels in November 2017 during the Algoliterary Encounter.

Femke: If you think about Wikipedia as a living community, with every edit the project changes. Every edit is somehow a contribution to a living organism of knowledge. So, if from within that community you try to distinguish what serves the community and what doesn't and you try to generalize that, because I think that's what the good faith-bad faith algorithm is trying to do, to find helper tools to support the project, you do that on the basis of a generalization that is on the abstract idea of what Wikipedia is and not on the living organism of what happens every day. What interests me in the relation between vandalism and debate is how we can understand the conventional drive that sits in these machine-learning processes that we seem to come across in many places. And how can we somehow understand them and deal with them? If you place your separation of good faith-bad faith on pre-existing labelling and then reproduce that in your understanding of what edits are being made, how then to take into account movements that are happening, the life of the actual project?

Amir: It's an interesting discussion. Firstly, what we are calling good faith and bad faith comes from the community itself. We are not doing labelling for them, they are doing labelling for themselves. So, in many different language Wikipedias, the definition of what is good faith and what is bad faith will differ. Wikimedia is trying to reflect what is inside the organism and not to change the organism itself. If the organism changes, and we see that the definition of good faith and helping Wikipedia has been changed, we are implementing this feedback loop that lets people from inside their community pass judgement on their edits and if they disagree with the labelling, we can go back to the model and retrain the algorithm to reflect this change. It's some sort of closed loop: you change things and if someone sees there is a problem, then they tell us and we can change the algorithm back. It's an ongoing project.

Reference
https://gitlab.constantvzw.org/algolit/algolit/blob/master/algoliterary_encounter/Interview%20with%20Amir/AS.aac

--- How to make your dataset known ---

NLTK stands for Natural Language Toolkit. For programmers who process natural language using Python, this is an essential library to work with. Many tutorial writers recommend machine learning learners to start with the inbuilt NLTK datasets. It comprises 71 different collections, with a total of almost 6000 items.

There is for example the Movie Review corpus for sentiment analysis. Or the Brown corpus, which was put together in the 1960s by Henry Kučera and W. Nelson Francis at Brown University in Rhode Island. There is also the Declaration of Human Rights corpus, which is commonly used to test whether the code can run on multiple languages. The corpus contains the Declaration of Human Rights expressed in 372 languages from around the world.

But what is the process of getting a dataset accepted into the NLTK library nowadays? On the Github page, the NLTK team describes the following requirements:

- Only contribute corpora that have obtained a basic level of notability. That means, there is a publication that describes it, and a community of programmers who are using it.
- Ensure that you have permission to redistribute the data, and can document this. This means that the dataset is best published on an external website with a licence.
- Use existing NLTK corpus readers where possible, or else contribute a well-documented corpus reader to NLTK. This means, you need to organize your data in such a way that it can be easily read using NLTK code. (A sketch of this last requirement follows after these stories.)

--- Extract from a positive IMDb movie review from the NLTK dataset ---

corpus: NLTK, movie reviews
fileid: pos/cv998_14111.txt

steven spielberg ' s second epic film on world war ii is an unquestioned masterpiece of film . spielberg , ever the student on film , has managed to resurrect the war genre by producing one of its grittiest , and most powerful entries . he also managed to cast this era ' s greatest answer to jimmy stewart , tom hanks , who delivers a performance that is nothing short of an astonishing miracle . for about 160 out of its 170 minutes , " saving private ryan " is flawless . literally . the plot is simple enough . after the epic d - day invasion ( whose sequences are nothing short of spectacular ) , capt . john miller ( hanks ) and his team are forced to search for a pvt . james ryan ( damon ) , whose brothers have all died in battle . once they find him , they are to bring him back for immediate discharge so that he can go home . accompanying miller are his crew , played with astonishing perfection by a group of character actors that are simply sensational . barry pepper , adam goldberg , vin diesel , giovanni ribisi , davies , and burns are the team sent to find one man , and bring him home . the battle sequences that bookend the film are extraordinary . literally .

--- The ouroboros of machine learning ---

Wikipedia has become a source for learning not only for humans, but also for machines. Its articles are prime sources for training models. But very often, the material the machines are trained on is the same content that they helped to write. In fact, at the beginning of Wikipedia, many articles were written by bots. Rambot, for example, was a controversial bot figure on the English-speaking platform. It authored 98 per cent of the pages describing US towns.

As a result of serial and topical robot interventions, the models that are trained on the full Wikipedia dump have a unique view on composing articles. For example, a topic model trained on all of Wikipedia's articles will associate 'river' with 'Romania' and 'village' with 'Turkey'. This is because there are over 10,000 pages written about villages in Turkey. This should be enough to spark anyone's desire for a visit, but it is far too much compared to the number of articles other countries have on the subject. The asymmetry causes a false correlation and needs to be redressed. Most models try to exclude the work of these prolific robot writers.

Reference
https://blog.lateral.io/2015/06/the-unknown-perils-of-mining-wikipedia/
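As announced in 'How to make your dataset known', here is a hedged sketch of the last NLTK requirement: exposing your own cleaned texts through an existing corpus reader. The directory name is an invented placeholder.

from nltk.corpus.reader import PlaintextCorpusReader

# Point an existing reader at a folder of cleaned-up plain-text files.
corpus = PlaintextCorpusReader("mundaneum_cleaned/", r".*\.txt")

print(corpus.fileids())      # the documents found in the folder
print(len(corpus.words()))   # total number of tokens
print(corpus.sents()[0])     # the first sentence, already tokenized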
--- The ouroboros of machine learning --- Wikipedia has become a source for learning not only for humans, but also for machines. Its arti- cles are prime sources for training models. But very often, the material the machines are trained 33 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678 9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567 89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456 789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345 6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234 56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123 456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12 3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1 23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678 9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567 89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456 789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345 6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234 56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123 456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12 3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1 23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345678 9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234567 89 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456 789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345 6789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1234 56789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123 456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12 3456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 1 23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 
readers read readers read readers read readers read readers read …

[word-art spread: "readers read" · "a computer understands" · "all models translate" · "some models count" · "some models replace"]
READERS

We communicate with computers through language. We click on icons that have a description in words, we tap words on keyboards, use our voice to give them instructions. Sometimes we trust our computer with our most intimate thoughts and forget that they are extensive calculators. A computer understands every word as a combination of zeros and ones. A letter is read as a specific ASCII number: capital 'A' is 65, or 01000001 in binary.

In all models, whether rule-based, classical machine learning or neural networks, words undergo some type of translation into numbers in order to understand the semantic meaning of language. This is done through counting. Some models count the frequency of single words, some might count the frequency of combinations of words, some count the frequency of nouns, adjectives, verbs or noun and verb phrases. Some just replace the words in a text by their index numbers. Numbers optimize the operative speed of computer processes, leading to fast predictions, but they also remove the symbolic links that words might have. Here we present a few techniques that are dedicated to making text readable to a machine.

The Book of Tomorrow in a Bag of Words
by Algolit

The bag-of-words model is a simplifying representation of text used in Natural Language Processing (NLP). In this model, a text is represented as a collection of its unique words, disregarding grammar, punctuation and even word order. The model transforms the text into a list of words and how many times they're used in the text, or quite literally a bag of words.

This heavy reduction of language was the big shock when we first began to machine learn. Bag of words is often used as a baseline, on which the new model has to perform better. It can understand the subject of a text by recognizing the most frequent or important words. It is often used to measure the similarities of texts by comparing their bags of words.

For this work the article 'Le Livre de Demain' by engineer G. Vander Haeghen, published in 1907 in the Bulletin de l'Institut International de Bibliographie of the Mundaneum, has been literally reduced to a bag of words. You can buy a bag at the reception of the Mundaneum.

---
Concept & realisation: An Mertens
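The reduction the work performs can be sketched in a few lines of Python. The snippet below illustrates the bag-of-words idea only; it is not the code of the installation, and the example sentence is invented.

    from collections import Counter
    import re

    def bag_of_words(text):
        # Lowercase the text and keep only letter sequences:
        # grammar, punctuation and word order are thrown away.
        words = re.findall(r"[a-zàâçéèêëîïôöûùüÿœ']+", text.lower())
        # Count how many times each unique word is used.
        return Counter(words)

    print(bag_of_words("Le livre de demain sera un livre de papier et de lumière.").most_common(3))
    # [('de', 3), ('livre', 2), ('le', 1)]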
TF-IDF
by Algolit

The TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting method used in text search. This statistical measure makes it possible to evaluate the importance of a term contained in a document, relative to a collection or corpus of documents. The weight increases in proportion to the number of occurrences of the word in the document. It also varies according to the frequency of the word in the corpus. The TF-IDF is used in particular in the classification of spam in email software.

A web-based interface shows this algorithm through animations, making it possible to understand the different steps of text classification. How does a TF-IDF-based programme read a text? How does it transform words into numbers?

---
Concept, code, animation: Sarah Garcin
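The weighting itself fits in a few lines. A minimal sketch of the measure (term frequency multiplied by inverse document frequency) on three invented miniature documents; it is not Sarah Garcin's interface.

    import math
    from collections import Counter

    documents = [
        "the reader reads the text".split(),
        "the machine reads numbers".split(),
        "numbers replace the words of the text".split(),
    ]

    def tf_idf(term, doc, corpus):
        # Term frequency: how often the term occurs in this document.
        tf = Counter(doc)[term] / len(doc)
        # Inverse document frequency: terms that are rare across the corpus weigh more.
        containing = sum(1 for d in corpus if term in d)
        idf = math.log(len(corpus) / containing)
        return tf * idf

    for term in ("the", "reader", "numbers"):
        print(term, round(tf_idf(term, documents[0], documents), 3))
    # 'the' appears in every document and scores 0.0; 'reader' only appears here and scores higher.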
Growing a tree
by Algolit

Parts-of-speech are the categories of words that we learn at school: noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, and sometimes numeral, article, or determiner.

In Natural Language Processing (NLP) there exist many writings that allow sentences to be parsed. This means that the algorithm can determine the part-of-speech of each word in a sentence. 'Growing tree' uses this technique to define all nouns in a specific sentence. Each noun is then replaced by its definition. This allows the sentence to grow autonomously and infinitely. The recipe of 'Growing tree' was inspired by Oulipo's constraint of 'littérature définitionnelle', invented by Marcel Benabou in 1966. In a given phrase, one replaces every significant element (noun, adjective, verb, adverb) by one of its definitions in a given dictionary; one reiterates the operation on the newly received phrase, and again.

The dictionary of definitions used in this work is Wordnet. Wordnet is a combination of a dictionary and a thesaurus that can be read by machines. According to Wikipedia, it was created in the Cognitive Science Laboratory of Princeton University starting in 1985. The project was initially funded by the US Office of Naval Research and later also by other US government agencies, including DARPA, the National Science Foundation, the Disruptive Technology Office (formerly the Advanced Research and Development Activity), and REFLEX.

---
Concept, code & interface: An Mertens & Gijs de Heij
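One pass of the recipe can be sketched with NLTK's part-of-speech tagger and its Wordnet interface. This is a hedged illustration of the technique, not the installation's own code, and it assumes the NLTK data packages (punkt, averaged_perceptron_tagger, wordnet) have been downloaded.

    import nltk
    from nltk.corpus import wordnet

    # One pass of 'littérature définitionnelle': replace every noun
    # by the first Wordnet definition of that noun.
    def grow(sentence):
        grown = []
        for word, tag in nltk.pos_tag(nltk.word_tokenize(sentence)):
            synsets = wordnet.synsets(word, pos=wordnet.NOUN)
            if tag.startswith("NN") and synsets:
                grown.append(synsets[0].definition())
            else:
                grown.append(word)
        return " ".join(grown)

    sentence = "The tree grows in the garden."
    print(grow(sentence))           # nouns are replaced by their definitions
    print(grow(grow(sentence)))     # reiterating makes the sentence grow further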
Algorithmic readings of Bertillon's portrait parlé
by Guillaume Slizewicz (Urban Species)

Written in 1907, Un code télégraphique du portrait parlé is an attempt to translate the 'spoken portrait', a face-description technique created by a policeman in Paris, into numbers. By implementing this code, it was hoped that faces of criminals and fugitives could easily be communicated over the telegraphic network between countries. In its form, content and ambition this text represents our complicated relationship with documentation technologies. This text sparked the creation of the following installations for three reasons:

- First, the text is an algorithm in itself, a compression algorithm, or to be more precise, the presentation of a compression algorithm. It tries to reduce the information to smaller pieces while keeping it legible for the person who has the code. In this regard it is linked to the way we create technology, our pursuit of more efficiency, quicker results, cheaper methods. It represents our appetite for putting numbers on the entire world, measuring the smallest things, labelling the tiniest differences. This text itself embodies the vision of the Mundaneum.

- Second, it is about the reasons for and the applications of technology. It is almost ironic that this text was in the selected archives presented to us at a time when face recognition and data surveillance are so much in the news. This text bears the same characteristics as some of today's technology: motivated by social control, classifying people, laying the basis for a surveillance society. Facial features are at the heart of recent controversies: mugshots were standardized by Bertillon, now they are used to train neural networks to distinguish criminals from law-abiding citizens. Facial recognition systems allow the arrest of criminals via CCTV infrastructure, and some assert that people's features can predict sexual orientation.

- The last point is about how it represents the evolution of mankind's techno-structure: what our tools allow us to do, what they forbid, what they hinder, what they make us remember and what they make us forget. This document enables a classification between people and a certain vision of what normality is. It breaks the continuum into pieces, thus allowing stigmatization/discrimination. On the other hand this document also feels obsolete today, because our techno-structure does not need such detailed written descriptions about fugitives, criminals or citizens. We can now find fingerprints, iris scans or DNA info in large datasets and compare them directly. Sometimes the technological systems do not even need human supervision and recognize directly the identity of a person via their facial features or their gait. Computers do not use intricate written language to describe a face, but arrays of integers. Hence all the words used in this document seem désuets, dated. Have we forgotten what some of them mean? Did photography make us forget how to describe faces? Will voice-assistance software teach us again?

Writing with Otlet

Writing with Otlet is a character generator that uses the spoken portrait code as its database. Random numbers are generated and translated into a set of features. By creating unique instances, the algorithm reveals the richness of the description that is possible with the portrait code while at the same time embodying its nuances.

An interpretation of Bertillon's spoken portrait

This work draws a parallel between Bertillon's system and current ones. A webcam linked to a facial recognition algorithm captures the beholder's face and translates it into numbers on a canvas, printing it alongside Bertillon's labelled faces.

References
https://www.technologyreview.com/s/602955/neural-network-learns-to-identify-criminals-by-their-faces/
https://fr.wikipedia.org/wiki/Bertillonnage
https://callingbullshit.org/case_studies/case_study_criminal_machine_learning.html

Hangman
by Laetitia Trozzi, student Arts²/Section Digital Arts

What better way to discover Paul Otlet and his passion for literature than to play hangman? Through this simple game, which consists in guessing the missing letters in a word, the goal is to make the public discover terms and facts related to one of the creators of the Mundaneum.

Hangman uses an algorithm to detect the frequency of words in a text. Next, a series of significant words was isolated from Paul Otlet's bibliography. This series of words is integrated into a hangman game presented in a terminal.
The difficulty of the game gradually increases as the player is offered longer and longer words. Over the different game levels, information about the life and work of Paul Otlet is displayed.

CONTEXTUAL STORIES ABOUT READERS

Naive Bayes, Support Vector Machines and Linear Regression are called classical machine learning algorithms. They perform well when learning with small datasets. But they often require complex Readers. The task the Readers do is also called feature-engineering. This means that a human needs to spend time on a deep exploratory data analysis of the dataset.

Features can be the frequency of words or letters, but also syntactical elements like nouns, adjectives, or verbs. The most significant features for the task to be solved must be carefully selected and passed over to the classical machine learning algorithm. This process marks the difference with Neural Networks. When using a neural network, there is no need for feature-engineering. Humans can pass the data directly to the network and achieve fairly good performances straightaway. This saves a lot of time, energy and money.

The downside of collaborating with Neural Networks is that you need a lot more data to train your prediction model. Think of 1 GB or more of plain text files. To give you a reference, one A4 page, a text file of 5,000 characters, only weighs 5 KB. You would need 8,589,934 pages. More data also requires access to more useful datasets and more, much more, processing power.

--- Character n-gram for authorship recognition ---

Imagine … You've been working for a company for more than ten years. You have been writing tons of emails, papers, internal notes and reports on very different topics and in very different genres. All your writings, as well as those of your colleagues, are safely backed up on the servers of the company.

One day, you fall in love with a colleague. After […] hysterical and also very dependent on you. The day you decide to break up, your (now) ex elaborates a plan to kill you. They succeed. This is unfortunate. A suicide letter in your name is left next to your corpse. Because of emotional problems, it says, you decided to end your life. Your best friends don't believe it. They decide to take the case to court. And there, based on the texts you and others produced over ten years, a machine learning model reveals that the suicide letter was written by someone else.

How does a machine analyse texts in order to identify you? The most robust feature for authorship recognition is delivered by the character n-gram technique. It is used in cases with a variety of thematics and genres of writing. When using character n-grams, texts are considered as sequences of characters. Let's consider the character trigram: all the overlapping sequences of three characters are isolated. For example, the character 3-grams of 'Suicide' would be 'Sui', 'uic', 'ici', 'cid', etc. Character n-gram features are very simple, they're language-independent and they're tolerant to noise. Furthermore, spelling mistakes do not jeopardize the technique.

Patterns found with character n-grams focus on stylistic choices that are unconsciously made by the author. The patterns remain stable over the full length of the text, which is important for authorship recognition. Other types of experiments could include measuring the length of words or sentences, the vocabulary richness, the frequencies of function words, even syntax or semantics-related measurements.

This means that not only your physical fingerprint is unique, but also the way you compose your thoughts!

The same n-gram technique discovered that The Cuckoo's Calling, a novel by Robert Galbraith, was actually written by … J. K. Rowling!

Reference
Paper: On the Robustness of Authorship Attribution Based on Character N-gram Features, Efstathios Stamatatos, in Journal of Law & Policy, Volume 21, Issue 2, 2013.
News article: https://www.scientificamerican.com/article/how-a-computer-program-helped-show-jk-rowling-write-a-cuckoos-calling/
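The feature extraction described here is short enough to write out. A sketch of overlapping character 3-grams and the frequency profile built from them; the profile sentence is invented and purely illustrative.

    from collections import Counter

    def char_ngrams(text, n=3):
        # All overlapping sequences of n characters, exactly as in the
        # 'Suicide' example: 'Sui', 'uic', 'ici', 'cid', ...
        return [text[i:i + n] for i in range(len(text) - n + 1)]

    print(char_ngrams("Suicide"))
    # ['Sui', 'uic', 'ici', 'cid', 'ide']

    # An author profile is simply the frequency of such n-grams over a text.
    profile = Counter(char_ngrams("the way you compose your thoughts is unique"))
    print(profile.most_common(3))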
--- A history of n-grams ---

The n-gram algorithm can be traced back to the work of Claude Shannon in information theory. In the paper 'A Mathematical Theory of Communication', published in 1948, Shannon performed the first instance of an n-gram-based model for natural language. He posed the question: given a sequence of letters, what is the likelihood of the next letter?

If you read the following excerpt, can you tell who it was written by? Shakespeare or an n-gram piece of code?

SEBASTIAN: Do I stand till the break off.

BIRON: Hide thy head.

VENTIDIUS: He purposeth to Athens: whither, with the vow I made to handle you.

FALSTAFF: My good knave.

You may have guessed, considering the topic of this story, that an n-gram algorithm generated this text. The model is trained on the compiled works of Shakespeare. While more recent algorithms, such as the recursive neural networks of the CharNN, are becoming famous for their performance, n-grams still execute a lot of NLP tasks. They are used in statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, ...
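Shannon's question can be turned into a few lines of code: record which letter follows every sequence of n characters, then sample from those counts. A toy sketch, trained here on a single line instead of on Shakespeare's collected works.

    import random
    from collections import defaultdict

    def train(text, n=3):
        # For every sequence of n characters, remember which letter follows it.
        model = defaultdict(list)
        for i in range(len(text) - n):
            model[text[i:i + n]].append(text[i + n])
        return model

    def generate(model, seed, length=80, n=3):
        out = seed
        for _ in range(length):
            followers = model.get(out[-n:])
            if not followers:
                break
            # The likelihood of the next letter is given by how often it followed
            # this sequence in the training text.
            out += random.choice(followers)
        return out

    corpus = "do i stand till the break off. hide thy head. he purposeth to athens."
    print(generate(train(corpus), "do "))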
--- God in Google Books ---

In 2006, Google created a dataset of n-grams from their digitized book collection and released it online. Recently they also created an n-gram viewer.

This allowed for many socio-linguistic investigations. For example, in October 2018, the New York Times Magazine published an opinion article titled 'It's Getting Harder to Talk About God'. The author, Jonathan Merritt, had analysed the mention of the word 'God' in Google's dataset using the n-gram viewer. He concluded that there had been a decline in the word's usage since the twentieth century. Google's corpus contains texts from the sixteenth century leading up to the twenty-first. However, what the author missed out on was the growing popularity of scientific journals around the beginning of the twentieth century. This new genre, which did not mention the word God, shifted the dataset. If the scientific literature were taken out of the corpus, the frequency of the word 'God' would again flow like a gentle ripple from a distant wave.

--- Grammatical features taken from Twitter influence the stock market ---

The boundaries between academic disciplines are becoming blurred. Economics research mixed with psychology, social science, cognitive and emotional concepts has given rise to a new economics subfield, called 'behavioral economics'. This means that researchers can start to explain stock market movements based on factors other than economic factors only. Both the economy and 'public opinion' can influence or be influenced by each other. A lot of research is being done on how to use 'public opinion' to predict tendencies in stock-price changes.

'Public opinion' is estimated from sources of large amounts of public data, like tweets, blogs or online news. Research using machinic data analysis shows that the changes in stock prices can be predicted by looking at 'public opinion', to some degree. There are many scientific articles online which analyse the press on the 'sentiment' expressed in them. An article can be marked as more or less positive or negative. The annotated press articles are then used to train a machine learning model, which predicts stock market trends, marking them as 'down' or 'up'. When a company gets bad press, traders sell. On the contrary, if the news is good, they buy.

A paper by Haikuan Liu of the Australian National University states that the tense of verbs used in tweets can be an indicator of the frequency of financial transactions. His idea is based on the fact that verb conjugation is used in psychology to detect the early stages of human depression.

Reference
Paper: 'Grammatical Feature Extraction and Analysis of Tweet Text: An Application towards Predicting Stock Trends', Haikuan Liu, Research School of Computer Science (RSCS), College of Engineering and Computer Science (CECS), The Australian National University (ANU)

--- Bag of words ---

In Natural Language Processing (NLP), 'bag of words' is considered to be an unsophisticated model. It strips text of its context and dismantles it into a collection of unique words. These words are then counted. In the previous sentences, for example, 'words' is mentioned three times, but this is not necessarily an indicator of the text's focus.

The first appearance of the expression 'bag of words' seems to go back to 1954. Zellig Harris, an influential linguist, published a paper called 'Distributional Structure'. In the section called 'Meaning as function of distribution' he says: 'for language is not merely a bag of words but a tool with particular properties which have been fashioned in the course of its use. The linguist's work is precisely to discover these properties, whether for descriptive analysis or for the synthesis of quasi-linguistic systems.'
learners learn learners learn learners learn learners learn learners learn …

[word-art spread: "learners learn" · "learners are pattern finders" · "learners are crawling through data" · "learners generate some kind of specific 'grammar'" · "classifiers generate, evaluate and readjust" · "learners understand and reveal patterns" · "learners don't always distinguish well which patterns should be repeated"]
LEARNERS

Learners are the algorithms that distinguish machine learning practices from other types of practices. They are pattern finders, capable of crawling through data and generating some kind of specific 'grammar'. Learners are based on statistical techniques. Some need a large amount of training data in order to function, others can work with a small annotated set. Some perform well in classification tasks, like spam identification, others are better at predicting numbers, like temperatures, distances, stock market values, and so on.

The terminology of machine learning is not yet fully established. Depending on the field, whether statistics, computer science or the humanities, different terms are used. Learners are also called classifiers. When we talk about Learners, we talk about the interwoven functions that have the capacity to generate other functions, evaluate and readjust them to fit the data. They are good at understanding and revealing patterns. But they don't always distinguish well which of the patterns should be repeated.

In software packages, it is not always possible to distinguish the characteristic elements of the classifiers, because they are hidden in underlying modules or libraries. Programmers can invoke them using a single line of code. For this exhibition, we therefore developed two table games that show in detail the learning process of simple, but frequently used classifiers.

Naive Bayes game
by Algolit

In machine learning, Naive Bayes methods are simple probabilistic classifiers that are widely applied for spam filtering and deciding whether a text is positive or negative. They require a small amount of training data to estimate the necessary parameters. They can be extremely fast compared to more sophisticated methods. They are difficult to generalize, which means that they perform on specific tasks, demanding to be trained with the same style of data that will be used to work with afterwards.

This game allows you to play along the rules of Naive Bayes. While manually executing the code, you create your own playful model that 'just works'. A word of caution is necessary: because you only train it with 6 sentences – instead of the minimum 2000 – it is not representative at all!

---
Concept & realisation: An Mertens
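The steps the players execute by hand correspond roughly to the following sketch, written with scikit-learn and six invented sentences; the game's own sentences and manual procedure differ.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Six tiny, hand-made training examples: far too few to be representative.
    sentences = [
        "what a wonderful reading",     # positive
        "a joyful little text",         # positive
        "this story is a delight",      # positive
        "a boring and painful read",    # negative
        "what a terrible story",        # negative
        "this text is a disaster",      # negative
    ]
    labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

    # Turn each sentence into a bag of words, then estimate word probabilities per class.
    vectorizer = CountVectorizer()
    features = vectorizer.fit_transform(sentences)
    classifier = MultinomialNB().fit(features, labels)

    test = vectorizer.transform(["a wonderful story"])
    print(classifier.predict(test))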
Linear Regression game
by Algolit

Linear Regression is one of the best-known and best-understood algorithms in statistics and machine learning. It has been around for almost 200 years. It is an attractive model because the representation is so simple. In statistics, linear regression is a statistical method that allows one to summarize and study relationships between two continuous (quantitative) variables.

By playing this game you will realize that as a player you have a lot of decisions to make. You will experience what it means to create a coherent dataset, to decide what is in and what is not in. If all goes well, you will feel the urge to change your data in order to obtain better results. This is part of the art of approximation that is at the basis of all machine learning practices.

---
Concept & realisation: An Mertens
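A minimal sketch of the underlying method, ordinary least squares on two continuous variables; the numbers are invented for the sake of the example.

    import numpy as np

    # Two continuous variables, for example hours of reading and pages finished.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([12.0, 19.0, 33.0, 41.0, 48.0])

    # Ordinary least squares: find slope a and intercept b of y ≈ a*x + b.
    a, b = np.polyfit(x, y, 1)
    print(f"y = {a:.1f} * x + {b:.1f}")

    # The fitted line can then be used to predict unseen values.
    print("prediction for x = 6:", a * 6 + b)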
Traité de Documentation. Three algorithmic poems.
by Rémi Forte, designer-researcher at L'Atelier national de recherche typographique, Nancy, France
serigraphy on paper, 60 × 80 cm, 25 ex., 2019, for sale at the reception of the Mundaneum.

The poems, reproduced in the form of three posters, are an algorithmic and poetic re-reading of Paul Otlet's Traité de documentation. They are the result of an algorithm based on the mysterious rules of human intuition. It has been applied to a fragment taken from Paul Otlet's book and is intended to be representative of his bibliological practice.

For each fragment, the algorithm splits the text; words and punctuation marks are counted and reordered into a list. In each line, the elements combine and exhaust the syntax of the selected fragment. Paul Otlet's language remains perceptible but exacerbated to the point of absurdity. For the reader, the systematization of the text is disconcerting and his reading habits are disrupted.

Built according to a mathematical equation, the typographical composition of the poster is just as systematic as the poem. However, friction occurs occasionally; loop after loop, the lines extend to bite on the neighbouring column. Overlays are created and words are hidden by others. These telescopic handlers draw alternative reading paths.

CONTEXTUAL STORIES ABOUT LEARNERS

--- Naive Bayes & Viagra ---

Naive Bayes is a famous learner that performs well with little data. We apply it all the time. Christian and Griffiths state in their book Algorithms To Live By that 'our days are full of small data'. Imagine, for example, that you're standing at a bus stop in a foreign city. The other person who is standing there has been waiting for 7 minutes. What do you do? Do you decide to wait? And if so, for how long? When will you initiate other options? Another example. Imagine a friend asking advice about a relationship. He's been together with his new partner for a month. Should he invite the partner to join him at a family wedding?

Having pre-existing beliefs is crucial for Naive Bayes to work. The basic idea is that you calculate the probabilities based on prior knowledge and given a specific situation.

The theorem was formulated during the 1740s by Thomas Bayes, a reverend and amateur mathematician. He dedicated his life to solving the question of how to win the lottery. But Bayes' rule was only made famous and known as it is today by the mathematician Pierre Simon Laplace in France a bit later in the same century. For a long time after Laplace's death, the theory sank into oblivion until it was dug up again during the Second World War in an effort to break the Enigma code.

Most people today have come into contact with Naive Bayes through their email spam folders. Naive Bayes is a widely used algorithm for spam detection. It is by coincidence that Viagra, the erectile dysfunction drug, was approved by the US Food & Drug Administration in 1997, around the same time as about 10 million users worldwide had made free webmail accounts. The selling companies were among the first to make use of email as a medium for advertising: it was an intimate space, at the time reserved for private communication, for an intimate product. In 2001, the first SpamAssassin programme relying on Naive Bayes was uploaded to SourceForge, cutting down on guerilla email marketing.

Reference
Machine Learners, by Adrian MacKenzie, MIT Press, Cambridge, US, November 2017.
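That 'prior knowledge plus a specific situation' is Bayes' rule. A toy spam calculation with invented counts shows the arithmetic.

    # Bayes' rule: P(spam | word) = P(word | spam) * P(spam) / P(word)
    # The numbers below are invented, purely to show the arithmetic.
    p_spam = 0.4                 # prior: 40% of all mail is spam
    p_word_given_spam = 0.25     # 'viagra' appears in 25% of spam
    p_word_given_ham = 0.01      # and in 1% of legitimate mail

    p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
    p_spam_given_word = p_word_given_spam * p_spam / p_word
    print(round(p_spam_given_word, 2))   # ≈ 0.94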
The basic idea is that you calcu- late the probabilities based on prior knowledge Interestingly, and luckily for Naive Bayes and the and given a specific situation. world, at that time, the British government and educational systems saw applied mathematics and The theorem was formulated during the 1740s by statistics as largely irrelevant to practical Thomas Bayes, a reverend and amateur mathemati- problem-solving. So the British agency charged cian. He dedicated his life to solving the ques- with cracking German military codes mainly hired tion of how to win the lottery. But Bayes' rule men with linguistic skills. Statistical data was was only made famous and known as it is today by seen as bothersome because of its detail-oriented the mathematician Pierre Simon Laplace in France a nature. So wartime data was often analysed not by bit later in the same century. For a long time af- statisticians, but by biologists, physicists, and ter La Place's death, the theory sank into obliv- theoretical mathematicians. None of them knew that ion until it was dug up again during the Second the Bayes rule was considered to be unscientific World War in an effort to break the Enigma code. in the field of statistics. Their ignorance proved fortunate. Most people today have come in contact with Naive Bayes through their email spam folders. Naive It was the now famous Alan Turing – a mathemati- Bayes is a widely used algorithm for spam detec- cian, computer scientist, logician, cryptoanalyst, tion. It is by coincidence that Viagra, the erec- philosopher and theoretical biologist – who used tile dysfunction drug, was approved by the US Food Bayes' rules probabilities system to design the & Drug Administration in 1997, around the same 'bombe'. This was a high-speed electromechanical time as about 10 million users worldwide had made machine for testing every possible arrangement free webmail accounts. The selling companies were that an Enigma machine would produce. In order to among the first to make use of email as a medium crack the naval codes of the U-boats, Turing sim- for advertising: it was an intimate space, at the plified the 'bombe' system using Baysian methods. time reserved for private communication, for an It turned the UK headquarters into a code-breaking intimate product. In 2001, the first SpamAssasin factory. The story is well illustrated in The Imi- programme relying on Naive Bayes was uploaded to tation Game, a film by Morten Tyldum dating from SourceForge, cutting down on guerilla email mar- 2014. keting. Reference --- A story about sweet peas --- Machine Learners, by Adrian MacKenzie, MIT Press, Cambridge, US, November 2017. Throughout history, some models have been invented by people with ideologies that are not to our lik- ing. The idea of regression stems from Sir Francis --- Naive Bayes & Enigma --- Galton, an influential nineteenth-century scien- tist. He spent his life studying the problem of This story about Naive Bayes is taken from the heredity – understanding how strongly the charac- book 'The Theory That Would Not Die', written by teristics of one generation of living beings mani- Sharon Bertsch McGrayne. Among other things, she fested themselves in the following generation. He describes how Naive Bayes was soon forgotten after established the field of eugenics, defining it as the death of Pierre Simon Laplace, its inventor. 
--- A story about sweet peas ---

Throughout history, some models have been invented by people with ideologies that are not to our liking. The idea of regression stems from Sir Francis Galton, an influential nineteenth-century scientist. He spent his life studying the problem of heredity – understanding how strongly the characteristics of one generation of living beings manifested themselves in the following generation. He established the field of eugenics, defining it as 'the study of agencies under social control that may improve or impair the racial qualities of future generations, either physically or mentally'. On Wikipedia, Galton is a prime example of scientific racism.

Galton initially approached the problem of heredity by examining characteristics of the sweet pea plant. He chose this plant because the species can self-fertilize. Daughter plants inherit genetic variations from mother plants without a contribution from a second parent. This characteristic eliminates having to deal with multiple sources.

Galton's research was appreciated by many intellectuals of his time. In 1869, in Hereditary Genius, Galton claimed that genius is mainly a matter of ancestry, and he believed that there was a biological explanation for social inequality across races. Galton even influenced his half-cousin Charles Darwin with his ideas. After reading Galton's paper, Darwin stated, 'You have made a convert of an opponent in one sense for I have always maintained that, excepting fools, men did not differ much in intellect, only in zeal and hard work'. Luckily, the modern study of heredity managed to eliminate the myth of race-based genetic difference, something Galton tried hard to maintain.

Galton's major contribution to the field was linear regression analysis, laying the groundwork for much of modern statistics. While we engage with the field of machine learning, Algolit tries not to forget that ordering systems hold power, and that this power has not always been used to the benefit of everyone. Machine learning has inherited many aspects of statistical research, some less agreeable than others. We need to be attentive, because these world views do seep into the algorithmic models that create new orders.

References
http://galton.org/letters/darwin/correspondence.htm
https://www.tandfonline.com/doi/full/10.1080/10691898.2001.11910537
http://www.paramoulipist.be/?p=1693

--- Perceptron ---

We find ourselves in a moment in time in which neural networks are sparking a lot of attention. But they have been in the spotlight before. The study of neural networks goes back to the 1940s, when the first neuron metaphor emerged. The neuron is not the only biological reference in the field of machine learning – think of the word corpus or training. The artificial neuron was constructed in close connection to its biological counterpart.

Psychologist Frank Rosenblatt was inspired by fellow psychologist Donald Hebb's work on the role of neurons in human learning. Hebb stated that 'cells that fire together wire together'. His theory now lies at the basis of associative human learning, but also of unsupervised neural network learning. It moved Rosenblatt to expand on the idea of the artificial neuron.

In 1962, he created the Perceptron, a model that learns through the weighting of inputs. It was set aside by the next generation of researchers, because it can only handle binary classification. This means that the data has to be clearly separable, as for example men and women, or black and white. It is clear that this type of data is very rare in the real world. When the so-called first AI winter arrived in the 1970s and the funding decreased, the Perceptron was also neglected. For ten years it stayed dormant. When spring settled in at the end of the 1980s, a new generation of researchers picked it up again and used it to construct neural networks. These contain multiple layers of Perceptrons. That is how neural networks saw the light. One could say that the current machine learning season is particularly warm, but it takes another winter to know a summer.
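Rosenblatt's learning rule is short: weigh the inputs, compare the outcome with the label, adjust. A sketch for a small binary task (the logical AND, which is linearly separable):

    import numpy as np

    # Four examples with two features each, and a binary label (the AND function).
    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    labels = np.array([0, 0, 0, 1])

    weights = np.zeros(2)
    bias = 0.0
    learning_rate = 0.1

    for epoch in range(10):
        for x, target in zip(inputs, labels):
            # Weighted sum of the inputs, thresholded to 0 or 1.
            prediction = 1 if np.dot(weights, x) + bias > 0 else 0
            # Adjust the weights in proportion to the error.
            error = target - prediction
            weights += learning_rate * error * x
            bias += learning_rate * error

    print(weights, bias)   # a separating line has been learned
    print([1 if np.dot(weights, x) + bias > 0 else 0 for x in inputs])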
--- BERT ---

Some online articles say that the year 2018 marked a turning point for the field of Natural Language Processing (NLP). A series of deep-learning models achieved state-of-the-art results on tasks like question-answering or sentiment-classification. Google's BERT algorithm entered the machine learning competitions of last year as a sort of 'one model to rule them all'. It showed a superior performance over a wide variety of tasks.

BERT is pre-trained; its weights are learned in advance through two unsupervised tasks. This means BERT doesn't need to be trained from scratch for each new task. You only have to finetune its weights. This also means that a programmer wanting to use BERT does not know any longer what parameters BERT is tuned to, nor what data it has seen to learn its performances.

BERT stands for Bidirectional Encoder Representations from Transformers. This means that BERT allows for bidirectional training. The model learns the context of a word based on all of its surroundings, left and right of a word. As such, it can differentiate between 'I accessed the bank account' and 'I accessed the bank of the river'.

Some facts:
- BERT_large, with 345 million parameters, is the largest model of its kind. It is demonstrably superior on small-scale tasks to BERT_base, which uses the same architecture with 'only' 110 million parameters.
- To run BERT you need to use TPUs. These are Google's processors, especially engineered for TensorFlow, the deep-learning platform. TPU renting rates range from $8/hr to $394/hr. Algolit doesn't want to work with off-the-shelf packages; we are interested in opening up the black box. In that case, BERT asks for quite some savings in order to be used.
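Fine-tuning, as described above, means loading the pre-trained weights and nudging them on your own labelled examples. A hedged sketch with the Hugging Face transformers library – one possible toolkit, not the competition code referred to above – and two invented labelled sentences standing in for a real dataset.

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    # Load the pre-trained weights: nothing is learned from scratch here.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    # Two invented labelled sentences stand in for a real fine-tuning dataset.
    texts = ["I accessed the bank account", "I accessed the bank of the river"]
    labels = torch.tensor([0, 1])

    inputs = tokenizer(texts, padding=True, return_tensors="pt")
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # One fine-tuning step: only the existing weights are nudged.
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    print(float(outputs.loss))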
Bag of words is serve, entertain, record wordlist, based on terms label before an example often used as a base- and know about humans. that are frequently used is included in the line, on which the new The work of these ma- in the exhibition. It training data. model has to perform chinic entities is usu- might help visitors who better. ally hidden behind in- are not familiar with * AI OR ARTIFICIAL IN- terfaces and patents. In the vocabulary related telligences * CHARACTER N-GRAM the exhibition, algo- to the field of Natural In computer science, ar- A technique that is used rithmic storytellers Language Processing tificial intelligence for authorship recogni- leave their invisible (NLP), Algolit or the (AI), sometimes called tion. When using charac- underworld to become in- Mundaneum. machine intelligence, is ter n-grams, texts are terlocutors. intelligence demon- considered as sequences * ALGOLIT strated by machines, in of characters. Let's * DUMP A group from Brussels contrast to the natural consider the character According to the English involved in artistic re- intelligence displayed trigram. All the over- dictionary, a dump is an search on algorithms and by humans and other ani- lapping sequences of accumulation of refused literature. Every month mals. Computer science three characters are and discarded materials they gather to experi- defines AI research as isolated. For example, or the place where such ment with code and texts the study of ‘intelli- the character 3-grams of materials are dumped. In that are published under gent agents’. Any device 'Suicide', would be, computing a dump refers free licenses. that perceives its envi- 'Sui', 'uic', 'ici', to a ‘database dump’, a http://www.algolit.net ronment and takes ac- 'cid' etc. Patterns record of data from a tions that maximize its found with character database used for easy * ALGOLITERARY chance of successfully n-grams focus on stylis- downloading or for back- Word invented by Algolit achieving its goals. tic choices that are un- ing up a database. for works that explore More specifically, Ka- consciously made by the Database dumps are often the point of view of the plan and Haenlein define author. The patterns re- published by free soft- algorithmic storyteller. AI as ‘a system’s abil- main stable over the ware and free content What kind of new forms ity to correctly inter- full length of the text. projects, such as of storytelling do we pret external data, to Wikipedia, to allow re- make possible in dia- learn from such data, * CLASSICAL MACHINE use or forking of the logue with machinic and to use those learn- Learning database. agencies? ings to achieve specific Naive Bayes, Support goals and tasks through Vector Machines and Lin- * FEATURE ENGINEERING * ALGORITHM flexible adaptation’. ear Regression are The process of using do- A set of instructions in Colloquially, the term called classical machine main knowledge of the a specific programming ‘artificial intelli- learning algorithms. data to create features language, that takes an gence’ is used to de- They perform well when that make machine learn- input and produces an scribe machines that learning with small ing algorithms work. output. mimic ‘cognitive’ func- datasets. But they often This means that a human tions that humans asso- require complex Readers. needs to spend time on a * ANNOTATION ciate with other human The task the Readers do, deep exploratory data The annotation process minds, such as ‘learn- is also called feature- analysis of the dataset. 
GLOSSARY

This is a non-exhaustive wordlist, based on terms that are frequently used in the exhibition. It might help visitors who are not familiar with the vocabulary related to the field of Natural Language Processing (NLP), Algolit or the Mundaneum.

* AI OR ARTIFICIAL INTELLIGENCES
In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. Computer science defines AI research as the study of 'intelligent agents': any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. More specifically, Kaplan and Haenlein define AI as 'a system's ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation'. Colloquially, the term 'artificial intelligence' is used to describe machines that mimic 'cognitive' functions that humans associate with other human minds, such as 'learning' and 'problem solving'. (Wikipedia)

* ALGOLIT
A group from Brussels involved in artistic research on algorithms and literature. Every month they gather to experiment with code and texts that are published under free licenses. http://www.algolit.net

* ALGOLITERARY
Word invented by Algolit for works that explore the point of view of the algorithmic storyteller. What kind of new forms of storytelling do we make possible in dialogue with machinic agencies?

* ALGORITHM
A set of instructions in a specific programming language, that takes an input and produces an output.

* ANNOTATION
The annotation process is a crucial step in supervised machine learning, where the algorithm is given examples of what it needs to learn. A spam filter in training will be fed examples of spam and real messages. These examples are entries, or rows from the dataset, with a label: spam or non-spam. The labelling of a dataset is work executed by humans: they pick a label for each row of the dataset. To ensure the quality of the labels, multiple annotators see the same row and have to give the same label before an example is included in the training data.

* BAG OF WORDS
The bag-of-words model is a simplifying representation of text used in Natural Language Processing (NLP). In this model, a text is represented as a collection of its unique words, disregarding grammar, punctuation and even word order. The model transforms the text into a list of words and how many times they're used in the text, or quite literally a bag of words. Bag of words is often used as a baseline on which the new model has to perform better.

* CHARACTER N-GRAM
A technique that is used for authorship recognition. When using character n-grams, texts are considered as sequences of characters. Let's consider the character trigram: all the overlapping sequences of three characters are isolated. For example, the character 3-grams of 'Suicide' would be 'Sui', 'uic', 'ici', 'cid', etc. Patterns found with character n-grams focus on stylistic choices that are unconsciously made by the author. The patterns remain stable over the full length of the text.
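To make the two entries above a bit more concrete, here is a small Python sketch of a bag of words and of character trigrams. It uses only the standard library; the example sentence is invented.

# A small illustration of the BAG OF WORDS and CHARACTER N-GRAM entries above;
# the example text is invented for this sketch.
from collections import Counter

text = "the cat sat on the mat"

# Bag of words: unique words and how often they occur, order discarded.
bag_of_words = Counter(text.split())
print(bag_of_words)   # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})

# Character trigrams: every overlapping sequence of three characters.
word = "Suicide"
trigrams = [word[i:i + 3] for i in range(len(word) - 2)]
print(trigrams)       # ['Sui', 'uic', 'ici', 'cid', 'ide']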
* CLASSICAL MACHINE LEARNING
Naive Bayes, Support Vector Machines and Linear Regression are called classical machine learning algorithms. They perform well when learning with small datasets. But they often require complex Readers. The task the Readers do is also called feature engineering (see below). This means that a human needs to spend time on a deep exploratory data analysis of the dataset.

* CONSTANT
Constant is a non-profit, artist-run organisation based in Brussels since 1997 and active in the fields of art, media and technology. Algolit started as a project of Constant in 2012. http://constantvzw.org

* DATA WORKERS
Artificial intelligences that are developed to serve, entertain, record and know about humans. The work of these machinic entities is usually hidden behind interfaces and patents. In the exhibition, algorithmic storytellers leave their invisible underworld to become interlocutors.

* DUMP
According to the English dictionary, a dump is an accumulation of refused and discarded materials, or the place where such materials are dumped. In computing, a dump refers to a 'database dump', a record of data from a database used for easy downloading or for backing up a database. Database dumps are often published by free software and free content projects, such as Wikipedia, to allow reuse or forking of the database.

* FEATURE ENGINEERING
The process of using domain knowledge of the data to create features that make machine learning algorithms work. This means that a human needs to spend time on a deep exploratory data analysis of the dataset. In Natural Language Processing (NLP), features can be the frequency of words or letters, but also syntactical elements like nouns, adjectives or verbs. The most significant features for the task to be solved must be carefully selected and passed over to the classical machine learning algorithm.

* FLOSS OR FREE LIBRE OPEN SOURCE SOFTWARE
Software that anyone is freely licensed to use, copy, study and change in any way, and whose source code is openly shared so that people are encouraged to voluntarily improve the design of the software. This is in contrast to proprietary software, where the software is under restrictive copyright licensing and the source code is usually hidden from the users. (Wikipedia)

* GIT
A software system for tracking changes in source code during software development. It is designed for coordinating work among programmers, but it can be used to track changes in any set of files. Before starting a new project, programmers create a 'git repository' in which they will publish all parts of the code. The git repositories of Algolit can be found on https://gitlab.constantvzw.org/algolit.

* GUTENBERG.ORG
Project Gutenberg is an online platform run by volunteers to 'encourage the creation and distribution of eBooks'. It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of public domain books. The project tries to make these as free as possible, in long-lasting, open formats that can be used on almost any computer. As of 23 June 2018, Project Gutenberg reached 57,000 items in its collection of free eBooks. (Wikipedia)

* HENRI LA FONTAINE
Henri La Fontaine (1854-1943) was a Belgian politician, feminist and pacifist. He was awarded the Nobel Peace Prize in 1913 for his involvement in the International Peace Bureau and his contribution to the organization of the peace movement. In 1895, together with Paul Otlet, he created the International Bibliography Institute, which became the Mundaneum. Within this institution, which aimed to bring together all the world's knowledge, he contributed to the development of the Universal Decimal Classification (UDC) system.

* KAGGLE
An online platform where users find and publish data sets, explore and build machine learning models, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. About half a million data scientists are active on Kaggle. It was founded by Anthony Goldbloom and Ben Hamner in 2010 and acquired by Google in March 2017.

* LITERATURE
Algolit understands the notion of literature in the way a lot of other experimental authors do. It includes all linguistic production, from the dictionary to the Bible, from Virginia Woolf's entire work to all versions of the Terms of Service published by Google since its existence.

* MACHINE LEARNING MODELS
Algorithms based on statistics, mainly used to analyse and predict situations based on existing cases. In this exhibition we focus on machine learning models for text processing, or 'Natural Language Processing', in short 'NLP'. These models have learned to perform a specific task on the basis of existing texts. The models are used for search engines, machine translations and summaries, spotting trends in new media networks and news feeds. They influence what you get to see as a user, but also have their word to say in the course of stock exchanges worldwide, the detection of cybercrime and vandalism, etc.

* MARKOV CHAIN
Algorithm that scans the text for the transition probability of letter or word occurrences, resulting in transition probability tables which can be computed even without any semantic or grammatical natural language understanding. It can be used for analyzing texts, but also for recombining them. It is widely used in spam generation.
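As an illustration of the Markov chain entry, the sketch below builds a word-level transition table from an invented seed sentence and uses it to recombine text. It is a minimal reading of the technique, not the code used in the exhibition.

# A minimal word-level Markov chain, illustrating the entry above;
# the seed text is invented for this sketch.
import random
from collections import defaultdict

seed_text = "data workers write and data workers read and data workers learn"

# Build the transition table: for each word, which words may follow it.
transitions = defaultdict(list)
words = seed_text.split()
for current_word, next_word in zip(words, words[1:]):
    transitions[current_word].append(next_word)

# Recombine: start somewhere and repeatedly pick a plausible next word.
random.seed(0)
word = "data"
generated = [word]
for _ in range(8):
    followers = transitions.get(word)
    if not followers:
        break
    word = random.choice(followers)
    generated.append(word)
print(" ".join(generated))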
* MECHANICAL TURK
The Amazon Mechanical Turk is an online platform for humans to execute tasks that algorithms cannot. Examples include annotating sentences as being positive or negative, spotting number plates, discriminating between face and non-face. The jobs posted on this platform are often paid less than a cent per task. Tasks that are more complex or require more knowledge can be paid up to several cents. Many academic researchers use Mechanical Turk as an alternative to have their students execute these tasks.

* MUNDANEUM
In the late nineteenth century two young Belgian jurists, Paul Otlet (1868-1944), 'the father of documentation', and Henri La Fontaine (1854-1943), statesman and Nobel Peace Prize winner, created the Mundaneum. The project aimed at gathering all the world's knowledge and filing it using the Universal Decimal Classification (UDC) system that they had invented.

* NATURAL LANGUAGE
A natural language or ordinary language is any language that has evolved naturally in humans through use and repetition, without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing. They are different from constructed and formal languages such as those used to program computers or to study logic. (Wikipedia)

* NEURAL NETWORKS
Computing systems inspired by the biological neural networks that constitute animal brains. The neural network itself is not an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs. Such systems 'learn' to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as 'cat' or 'no cat', and using the results to identify cats in other images. They do this without any prior knowledge about cats, for example that they have fur, tails, whiskers and cat-like faces. Instead, they automatically generate identifying characteristics from the learning material that they process. (Wikipedia)

* NLP OR NATURAL LANGUAGE PROCESSING
Natural Language Processing (NLP) is a collective term referring to the automatic computational processing of human languages. This includes algorithms that take human-produced text as input and attempt to generate text that resembles it.

* OPTICAL CHARACTER RECOGNITION (OCR)
Computer processes for translating images of scanned texts into manipulable text files.

* ORACLE
Oracles are prediction or profiling machines, a specific type of algorithmic model, mostly based on statistics. They are widely used in smartphones, computers, tablets.

* OULIPO
Oulipo stands for Ouvroir de littérature potentielle (Workspace for Potential Literature). Oulipo was created in Paris by the French writers Raymond Queneau and François Le Lionnais. They rooted their practice in the European avant-garde of the twentieth century and in the experimental tradition of the 1960s. For Oulipo, the creation of rules becomes the condition to generate new texts, or what they call potential literature. Later, in 1981, they also created ALAMO, Atelier de littérature assistée par la mathématique et les ordinateurs (Workspace for literature assisted by maths and computers).

* PAUL OTLET
Paul Otlet (1868-1944) was a Belgian author, lawyer and peace activist; he is one of several people who have been considered the father of information science. Otlet created the Universal Decimal Classification, which became widespread in libraries. Together with Henri La Fontaine he created the Palais Mondial (World Palace), later the Mundaneum, to house the collections and activities of their various organizations and institutes.

* PYTHON
The main programming language that is globally used for Natural Language Processing; it was invented in 1991 by the Dutch programmer Guido van Rossum.

* RULE-BASED MODELS
Oracles can be created using different techniques. One way is to manually define rules for them. As prediction models they are then called rule-based models, as opposed to statistical models. Rule-based models are handy for tasks that are specific, like detecting when a scientific paper concerns a certain molecule. With very little sample data, they can perform well.

* SENTIMENT ANALYSIS
Also called 'opinion mining'. A basic task in sentiment analysis is classifying a given text as negative, positive or neutral. Advanced, 'beyond polarity' sentiment classification looks, for instance, at emotional states such as 'angry', 'sad' and 'happy'. Sentiment analysis is widely applied to user materials such as reviews and survey responses, comments and posts on social media, and healthcare materials, for applications that range from marketing to customer service, from stock exchange transactions to clinical medicine.

* SUPERVISED MACHINE LEARNING MODELS
For the creation of supervised machine learning models, humans annotate sample text with labels before feeding it to a machine to learn. Each sentence, paragraph or text is judged by at least three annotators: whether it is spam or not spam, positive or negative, etc.

* TRAINING DATA
Machine learning algorithms need guidance. In order to separate one thing from another, they need texts to extract patterns from. One should carefully choose the training material, and adapt it to the machine's task. It doesn't make sense to train a machine with nineteenth-century novels if its mission is to analyze tweets.
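The entries on annotation, classical machine learning, supervised models and training data describe one and the same workflow: humans label examples, a statistical algorithm learns from them. The sketch below compresses that workflow into a few lines, assuming the scikit-learn library; the four hand-labelled messages are invented.

# A compact sketch of the supervised workflow described in the entries above,
# assuming scikit-learn; the tiny hand-labelled dataset is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Annotated training data: each message comes with a human-chosen label.
messages = [
    "win a free prize now",
    "cheap loans click here",
    "meeting moved to tuesday",
    "please review the draft text",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Feature engineering kept minimal: a bag-of-words count per message.
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(messages)

# Naive Bayes is one of the 'classical' algorithms named in the glossary.
classifier = MultinomialNB()
classifier.fit(features, labels)

new_message = ["free prize meeting"]
print(classifier.predict(vectorizer.transform(new_message)))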
* UNSUPERVISED MACHINE LEARNING MODELS
Unsupervised machine learning models don't need the step of annotation of the data by humans. This saves a lot of time, energy and money. Instead, they need a large amount of training data, which is not always available and can take a long cleaning time beforehand.

* WORD EMBEDDINGS
Language modelling techniques that, through multiple mathematical operations of counting and ordering, plot words into a multi-dimensional vector space. When embedding words, they transform from being distinct symbols into mathematical objects that can be multiplied, divided, added or subtracted.

* WORDNET
WordNet is a combination of a dictionary and a thesaurus that can be read by machines. According to Wikipedia, it was created in the Cognitive Science Laboratory of Princeton University starting in 1985. The project was initially funded by the US Office of Naval Research and later also by other US government agencies, including DARPA, the National Science Foundation, the Disruptive Technology Office (formerly the Advanced Research and Development Activity), and REFLEX.
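The word embeddings entry above says that words become mathematical objects that can be added and subtracted. The toy sketch below does exactly that with invented three-dimensional vectors; real embeddings are learned from large corpora and have hundreds of dimensions.

# A toy illustration of the WORD EMBEDDINGS entry above: the three-dimensional
# vectors are invented; learned embeddings have hundreds of dimensions.
import numpy as np

embeddings = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "queen": np.array([0.8, 0.1, 0.9]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Once words are vectors, they can be added and subtracted like numbers.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]

# In this toy space, the nearest remaining word to 'king - man + woman' is 'queen'.
candidates = {w: v for w, v in embeddings.items() if w != "king"}
best = max(candidates, key=lambda w: cosine_similarity(candidates[w], target))
print(best)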
◝ humans learn with machines ◜ ◡ machines learn from machines ◞ ◡ machines learn with humans ◞ ◝ humans learn from machines ◟ ◜ machines learn with machines ◠ ◜ machines learn from humans ◟ ◠ humans learn with humans ◞ ◝ humans learn from humans ◞