
publishing the code of a CC version of wiki-to-print

hacking-docs
mb 1 year ago
commit
40b86ee7dc
115  LICENSE.txt
56  README.md
11  command-line/Makefile
34  command-line/README.md
19  command-line/css/baseline.css
180  command-line/css/pagedjs.css
22  command-line/css/print.css
31061  command-line/js/paged.js
31107  command-line/js/paged.polyfill.js
17  command-line/templates/template.html
12  command-line/templates/template.inspect.html
226  command-line/update.py
53  wiki-to-print.Common.css.example
104  wiki-to-print.Common.js.example
73  wiki-to-print.nginx.example
9  wiki-to-print/Makefile
14  wiki-to-print/README.md
432  wiki-to-print/api.py
19  wiki-to-print/config.json
5  wiki-to-print/config.py
20  wiki-to-print/requirements.txt
19  wiki-to-print/static/css/baseline.css
83  wiki-to-print/static/css/main.css
214  wiki-to-print/static/css/pagedjs.css
74  wiki-to-print/static/css/preview.css
32667  wiki-to-print/static/js/paged.esm.js
32682  wiki-to-print/static/js/paged.js
32726  wiki-to-print/static/js/paged.polyfill.js
20  wiki-to-print/templates/base.html
3  wiki-to-print/templates/dynamic_css.css
28  wiki-to-print/templates/index.html
37  wiki-to-print/templates/inspect.html
26  wiki-to-print/templates/pagedjs.html
126  wiki-to-print/web-interface.py

115
LICENSE.txt

@ -0,0 +1,115 @@
[ Copyleft Attitude with a difference ]
COLLECTIVE CONDITIONS FOR RE-USE (CC4r)
version 1.0
============
REMINDER TO CURRENT AND FUTURE AUTHORS:
The authored work released under the CC4r was never yours to begin with. The CC4r considers authorship to be part of a collective cultural effort and rejects authorship as ownership derived from individual genius. This means to recognize that it is situated in social and historical conditions and that there may be reasons to refrain from release and re-use.
=============
PREAMBLE
The CC4r articulates conditions for re-using authored materials. This document is inspired by the principles of Free Culture – with a few differences. You are invited to copy, distribute, and transform the materials published under these conditions, and to take the implications of (re-)use into account.
The CC4r understands authorship as inherently collaborative and already-collective. It applies to hybrid practices such as human-machine collaborations and other-than-human contributions. The legal framework of copyright ties authorship firmly in property and individual human creation, and prevents more fluid modes of authorial becoming from flourishing. Free Culture and intersectional, feminist, anti-colonial work reminds us that there is no tabula rasa, no original or single author; that authorial practice exist within a web of references.
The CC4r favours re-use and generous access conditions. It considers hands-on circulation as a necessary and generative activation of current, historical and future authored materials. While you are free to (re-)use them, you are not free from taking the implications from (re-)use into account.
The CC4r troubles the binary approach that declares authored works either ‘open’ or ‘closed'. It tries to address how a universalist approach to openness such as the one that Free licenses maintain, has historically meant the appropriation of marginalised knowledges. It is concerned with the way Free Culture, Free Licenses and Open Access do not account for the complexity and porosity of knowledge practices and their circulation, nor for the power structures active around it. This includes extractive use by software giants and commercial on-line platforms that increasingly invest into and absorb Free Culture.
The CC4r asks CURRENT and FUTURE AUTHORS, as a collective, to care together for the implications of appropriation. To be attentive to the way re-use of materials might support or oppress others, even if this will never be easy to gauge. This implies to consider the collective conditions of authorship.
The CC4r asks you to be courageous with the use of materials that are being licensed under the CC4r. To discuss them, to doubt, to let go, to change your mind, to experiment with them, to give back to them and to take responsibility when things might go wrong.
Considering the Collective Conditions for (re-)use involves inclusive crediting and speculative practices for referencing and resourcing. To consider the circulation of materials on commercial platforms as participating in extractive data practices; platform capitalism appropriates and abuses collective authorial practice. To take into account that the defaults of openness and transparency have different consequences in different contexts. To consider the potential necessity for opacity when accessing and transmitting knowledge, especially when it involves materials that matter to marginalized communities.
This document was written in response to the Free Art License (FAL) in a process of coming to terms with the colonial structuring of knowledge production. It emerged out of concerns with the way Open Access and Free Culture ideologies by foregrounding openness and freedom as universal principles might replicate some of the problems with conventional copyright.
DEFINITIONS
-----------
« LEGAL AUTHOR » In the CC4r, LEGAL AUTHOR is used for the individual that is assigned as "author" by conventional copyright. Even if the authored work was never theirs to begin with, he or she is the only one that is legally permitted to license a work under a CC4r. This license is therefore not about liability, or legal implications. It cares about the ways copyright contributes to structural inequalities.
« CURRENT AUTHOR » can be used for individuals and collectives. It is the person, collective or other that was involved in generating the work created under a CC4r license. CURRENT and FUTURE AUTHOR are used to avoid designations that overly rely on concepts of 'originality' and insist on linear orders of creation.
« FUTURE AUTHOR » can be used for individuals and collectives. They want to use the work under CC4r license and are held to its conditions. All future authors are considered coauthors, or anauthors. They are anauthorized because this license provides them with an unauthorized authorization.
« LICENSE » due to its conditional character, this document might actually not qualify as a license. It is for sure not a Free Culture License. see also: UNIVERSALIST OPENNESS.
« (RE-)USE » the CC4r opted for bracketing "RE" out of necessity to mess up the time-space linearity of the original.
« OPEN <-> CLOSED » the CC4r operates like rotating doors... it is a swinging license, or a hinged license.
« UNIVERSALIST OPENNESS » the CC4r tries to propose an alternative to universalist openness. A coming to terms with the fact that universal openness is "safe" only for some.
0. CONDITIONS
The invitation to (re-)use the work licenced under CC4r applies as long as the FUTURE AUTHOR is convinced that this does not contribute to oppressive arrangements of power, privilege and difference. These may be reasons to refrain from release and re-use.
If it feels paralyzing to decide whether or not these conditions apply, it might point at the need to find alternative ways to activate the work. In case of doubt, consult for example https://constantvzw.org/wefts/orientationspourcollaboration.en.html
1. OBJECT
The aim of this license is to articulate collective conditions for re-use.
2. SCOPE
The work licensed under the CC4r is reluctantly subject to copyright law. By applying CC4r, the legal author extends its rights and invites others to copy, distribute, and modify the work.
2.1 INVITATION TO COPY (OR TO MAKE REPRODUCTIONS)
When the conditions under 0. apply, you are invited to copy this work, for whatever reason and with whatever technique.
2.2 INVITATION TO DISTRIBUTE, TO PERFORM IN PUBLIC
As long as the conditions under 0. apply, you are invited to distribute copies of this work; modified or not, whatever the medium and the place, with or without any charge, provided that you:
- attach this license to each of the copies of this work or indicate where the license can be found.
- make an effort to account for the collective conditions of the work, for example what contributions were made to the modified work and by whom, or how the work could continue.
- specify where to access other versions of the work.
2.3 INVITATION TO MODIFY
As long as the conditions under 0. apply, you are invited to make future works based on the current work, provided that you:
- observe all conditions in article 2.2 above, if you distribute future works;
- indicate that the work has been modified and, if possible, what kind of modifications have been made.
- distribute future works under the same license or any compatible license.
3. INCORPORATION OF THE WORK
Incorporating this work into a larger work (i.e., database, anthology, compendium, etc.) is possible. If as a result of its incorporation, the work can no longer be accessed apart from its appearance within the larger work, incorporation can only happen under the condition that the larger work is as well subject to the CC4r or to a compatible license.
4. COMPATIBILITY
A license is compatible with the CC4r provided that:
- it invites users to take the implications of their appropriation into account;
- it invites to copy, distribute, and modify copies of the work including for commercial purposes and without any other restrictions than those required by the other compatibility criteria;
- it ensures that the collective conditions under which the work was authored are attributed unless not desirable, and access to previous versions of the work is provided when possible;
- it recognizes the CC4r as compatible (reciprocity);
- it requires that changes made to the work will be subject to the same license or to a license which also meets these compatibility criteria.
5. LEGAL FRAMEWORK
Because of the conditions mentioned under 0., this is not a Free License. It is reluctantly formulated within the framework of both the Belgian law and the Berne Convention for the Protection of Literary and Artistic Works.
“We recognize that private ownership over media, ideas, and technology is rooted in European conceptions of property and the history of colonialism from which they formed. These systems of privatization and monopolization, namely copyright and patent law, enforce the systems of punishment and reward which benefit a privileged minority at the cost of others’ creative expression, political discourse, and cultural survival. The private and public institutions, legal frameworks, and social values which uphold these systems are inseparable from broader forms of oppression. Indigenous people, people of color, queer people, trans people, and women are particularly exploited for their creative and cultural resources while hardly receiving any of the personal gains or legal protections for their work. We also recognize that the public domain has jointly functioned to compliment the private, as works in the public domain may be appropriated for use in proprietary works. Therefore, we use copyleft not only to circumvent the monopoly granted by copyright, but also to protect against that appropriation.” [Decolonial Media License https://freeculture.org/About/license]
6. YOUR RESPONSIBILITIES
The invitation to use the work as defined by the CC4r (invitation to copy, distribute, modify) implies to take the implications of the appropriation of the materials into account.
7. DURATION OF THE LICENSE
This license takes effect as of the moment that the FUTURE AUTHOR accepts the invitation of the CURRENT AUTHOR. The act of copying, distributing, or modifying the work constitutes a tacit agreement. This license will remain in effect for the duration of the copyright which is attached to the work. If you do not respect the terms of this license, the invitation that it confers is void.
If the legal status or legislation to which you are subject makes it impossible for you to respect the terms of this license, you may not make use of the rights which it confers.
8. VARIOUS VERSIONS OF THE LICENSE
You are invited to reformulate this license by way of new, renamed versions. [link to license on gitlab]. You can of course make reproductions and distribute this license verbatim (without any changes).
USER GUIDE
– How to use the CC4r?
To apply the CC4r, you need to mention the following elements:
[Name of the legal author, title, date of the work. When applicable, names of authors of the common work and, if possible, where to find other versions of the work].
Copyleft with a difference: This is a collective work, you are invited to copy, distribute, and modify it under the terms of the CC4r [link to license].
Short version: Legal author=name, date of work (? ask SD). CC4r [link to license]
– Why use the CC4r?
1. To remind yourself and others that you do not own authored works
2. To not allow copyright to hinder works to evolve, to be extended, to be transformed
3. To allow materials to circulate as much as they need to
4. Because the CC4r offers a legal framework to disallow mis-appropriation by insisting on inclusive attribution. Nobody can take hold of the work as one’s exclusive possession.
– When to use the CC4r?
Any time you want to invite others to copy, distribute and transform authored works without exclusive appropriation but with considering the implications of (re-)use, you can use the CC4r. You can for example apply it to collective documentation, hybrid productions, artistic collaborations or educational projects.
– What kinds of works can be subject to the CC4r?
The Collective Conditions for re-use can be applied to digital as well as physical works.
You can choose to apply the CC4r for any text, picture, sound, gesture, or whatever material as long as you have legal author’s rights.
– Background of this license:
The CC4r was developed for the Constant worksession Unbound libraries (spring 2020) and followed from discussions during and contributions to the study day Authors of the future (Fall 2019). It is based on the Free Art License http://artlibre.org/licence/lal/en/ and inspired by other licensing projects such as The (Cooperative) Non-Violent Public License https://thufie.lain.haus/NPL.html and the Decolonial Media license https://freeculture.org/About/license.
Copyleft Attitude with a difference, 6 October 2020.

56
README.md

@ -0,0 +1,56 @@
# wiki-to-print
Slightly adapted version of <https://github.com/hackersanddesigners/wiki2print>, in continuation of <https://gitlab.constantvzw.org/titipi/wiki-to-pdf> and <https://git.vvvvvvaria.org/mb/volumetric-regimes-book>.
Installed at: <https://cc.vvvvvvaria.org/wiki/Wiki2print>.
The code of the wiki2print instance that is running on the *creative
crowd* server is published at [Varia's Gitea](https://git.vvvvvvaria.org/varia/wiki-to-print) under the
[CC4r](https://constantvzw.org/wefts/cc4r.en.html) license.
## Continuations
This project is inspired by and builds upon several previous iterations
of and experiments with mediawiki-to-pdf workflows:
- [Hackers & Designers](https://hackersanddesigners.nl/)'s work on [Making Matters](https://wiki2print.hackersanddesigners.nl/wiki/Publishing:Making_Matters_Lexicon)
- [TITiPI](http://titipi.org/)'s work on [Infrastructural Interactions](http://titipi.org/wiki-to-pdf/unfold/Infrastructural_Interactions)
- [Manetta](https://git.vvvvvvaria.org/mb)'s work on [Volumetric Regimes](https://volumetricregimes.xyz/index.php?title=Volumetric_Regimes)
- [Constant](https://constantvzw.org/site/)'s and [OSP](https://osp.kitchen/)'s work on [Diversions](https://diversions.constantvzw.org/wiki/index.php?title=Main_Page)
- [many more...](https://constantvzw.org/wefts/webpublications.en.html)
## How does it work?
When you create a page in the `Pdf` namespace on <https://cc.vvvvvvaria.org/wiki/>, it will load the wiki-to-print buttons in the navigation bar:
- `CSS!`
- `View HTML`
- `View PDF`
- `Update text`
- `Update media`
You can transclude pages into this page, structure your publication and edit the CSS.
- When you click `View HTML`: the Flask application returns an HTML version of the page.
- When you click `View PDF`: the Flask application returns an HTML version of the page, loaded with Paged.js. The HTML page is rendered into pages, giving you a preview of the PDF. You can use the inspector to work on the layout.
- When you click `Update text`: the Flask application makes a copy of all the text of the page and saves it to a file on the server (in the `static` folder).
- When you click `Update media`: the Flask application downloads all the images on the page and saves them to a folder on the server (in the `static` folder).
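The buttons are plain links into the Flask application. The following is a minimal sketch of how the four link targets relate to a page title, assuming the URL scheme used in `wiki-to-print.Common.js.example` (`/html/`, `/pdf/`, `/update/` and `/update/...?full=true`). The function name `button_urls` and its `base` parameter are hypothetical and not part of this repository. The `CSS!` button is left out because it reuses the existing talk-page link.

```python
def button_urls(page_title, base="https://cc.vvvvvvaria.org/wiki-to-print"):
    """Return the link targets of the wiki-to-print buttons for a page title.

    page_title is the full wiki title including the namespace, e.g. "Pdf:Example".
    """
    # Like Common.js, use the second colon-separated part as the page name
    pagename = page_title.split(":")[1]
    return {
        "View HTML": f"{base}/html/{pagename}",
        "View PDF": f"{base}/pdf/{pagename}",
        "Update text": f"{base}/update/{pagename}",
        "Update media": f"{base}/update/{pagename}?full=true",
    }


for label, url in button_urls("Pdf:Example").items():
    print(label, "->", url)
```

Passing a different `base` (for example a local development address) shows the same four routes on another host.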
## In this repository
* **command-line**: Python script to work on a local copy of your publication
* **wiki-to-print**: Flask application that renders a wiki page into HTML
## Links
* <https://cc.vvvvvvaria.org/wiki/Wiki2print>
* <https://constantvzw.org/wefts/webpublications.en.html>
* <https://titipi.org/wiki/index.php/Wiki-to-pdf>
* <https://pad.vvvvvvaria.org/wiki-printing>

11
command-line/Makefile

@ -0,0 +1,11 @@
all: run
run:
	python3 -m http.server
wiki:
	# ---
	# update the materials from the wiki, save it as Unfolded.html
	python3 update.py
	@echo "Pulling updates from the wiki: Unfolded (wiki) --> Unfolded.html (file)"

34
command-line/README.md

@ -0,0 +1,34 @@
# CLI for wiki-to-print
The script uses the MediaWiki API to download all content (text + images) from a specified wiki page.
It saves the result as an HTML page, which can be turned into a PDF with Paged.js.
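As an illustration of the API step, here is a minimal sketch of building the `action=parse` request and pulling the HTML and image list out of the decoded JSON response. It mirrors the request and the keys used in `update.py`, but the helper names `parse_url` and `extract` are hypothetical, and the sample response is hand-written rather than real wiki output.

```python
import json
from urllib.parse import urlencode


def parse_url(wiki, pagename):
    """Build the action=parse request URL for a wiki page."""
    query = urlencode({"action": "parse", "page": pagename, "pst": "True", "format": "json"})
    return f"{wiki}/api.php?{query}"


def extract(data):
    """Return (html, images) from a decoded action=parse response, or (None, [])."""
    if "parse" not in data:
        # e.g. the page does not exist and the API returned an error object
        return None, []
    return data["parse"]["text"]["*"], data["parse"]["images"]


# A hand-written stand-in for an API response (not real wiki output)
sample = json.loads('{"parse": {"text": {"*": "<p>Hello</p>"}, "images": ["Cover image.png"]}}')
html, images = extract(sample)
print(parse_url("https://example.com/wiki", "Unfolded"))
print(html, images)
```

Using `urlencode` also escapes page names that contain spaces or other special characters, which plain string formatting would not.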
## Folder structure
```
.
├── css
│   ├── baseline.css
│   ├── pagedjs.css
│   └── print.css
├── fonts
├── images
├── js
│   ├── paged.js
│   └── paged.polyfill.js
├── Makefile
├── templates
│   ├── template.html
│   └── template.inspect.html
└── update.py
```
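Before running the script it can help to check that the working directory matches the tree above. This is a minimal sketch under that assumption; `EXPECTED` and `missing_files` are hypothetical names, and the demo runs against a temporary directory rather than a real checkout.

```python
import os
import tempfile

# Files listed in the folder structure above.
# fonts/ and images/ are directories and are not checked here.
EXPECTED = [
    "css/baseline.css", "css/pagedjs.css", "css/print.css",
    "js/paged.js", "js/paged.polyfill.js",
    "Makefile",
    "templates/template.html", "templates/template.inspect.html",
    "update.py",
]


def missing_files(root="."):
    """Return the expected files that do not exist below root."""
    return [p for p in EXPECTED if not os.path.isfile(os.path.join(root, p))]


# Demo: a temporary folder that only contains css/print.css
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "css"))
    open(os.path.join(root, "css", "print.css"), "w").close()
    demo_missing = missing_files(root)

print(demo_missing)
```

In a real checkout, calling `missing_files()` from inside the `command-line` folder should return an empty list.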
## How to use it?
1. Change the `wiki` and `pagename` variables in `update.py` on lines 221 and 222.
2. Copy and paste your CSS into `css/print.css`
3. Run `$ python3 update.py`
4. Run `$ make`
5. Open `localhost:8000` in your browser

19
command-line/css/baseline.css

@ -0,0 +1,19 @@
/* This baseline.css stylesheet is derived from: https://gist.github.com/julientaq/08d636a7a2b5f2824025256de0fca467 */
/* Thanks a lot to julientaq for publishing it! */
:root {
--baseline: 18px;
--baseline-color: blue;
}
/* grid baseline */
.pagedjs_page {
/* background:
repeating-linear-gradient(
white 0,
white calc(var(--baseline) - 1px), var(--baseline-color) var(--baseline));
background-size: cover;
background-repeat: repeat-y; */
/* start of the first baseline: half of the line-height: 9px */
/* background-position-y: 9px; */
}

180
command-line/css/pagedjs.css

@ -0,0 +1,180 @@
/* CSS for Paged.js interface – v0.2 */
/* Change the look */
:root {
--color-background: whitesmoke;
--color-pageSheet: #cfcfcf;
--color-pageBox: violet;
--color-paper: white;
--color-marginBox: transparent;
--pagedjs-crop-color: black;
--pagedjs-crop-shadow: white;
--pagedjs-crop-stroke: 1px;
}
/* To define how the book looks on the screen: */
@media screen {
body {
background-color: var(--color-background);
}
.pagedjs_pages {
display: flex;
width: calc(var(--pagedjs-width) * 2);
flex: 0;
flex-wrap: wrap;
margin: 0 auto;
}
.pagedjs_page {
background-color: var(--color-paper);
box-shadow: 0 0 0 1px var(--color-pageSheet);
margin: 0;
flex-shrink: 0;
flex-grow: 0;
margin-top: 10mm;
}
.pagedjs_first_page {
margin-left: var(--pagedjs-width);
}
.pagedjs_page:last-of-type {
margin-bottom: 10mm;
}
.pagedjs_pagebox{
box-shadow: 0 0 0 1px var(--color-pageBox);
}
.pagedjs_left_page{
z-index: 20;
width: calc(var(--pagedjs-bleed-left) + var(--pagedjs-pagebox-width))!important;
}
.pagedjs_left_page .pagedjs_bleed-right .pagedjs_marks-crop {
border-color: transparent;
}
.pagedjs_left_page .pagedjs_bleed-right .pagedjs_marks-middle{
width: 0;
}
.pagedjs_right_page{
z-index: 10;
position: relative;
left: calc(var(--pagedjs-bleed-left)*-1);
}
/* show the margin-box */
.pagedjs_margin-top-left-corner-holder,
.pagedjs_margin-top,
.pagedjs_margin-top-left,
.pagedjs_margin-top-center,
.pagedjs_margin-top-right,
.pagedjs_margin-top-right-corner-holder,
.pagedjs_margin-bottom-left-corner-holder,
.pagedjs_margin-bottom,
.pagedjs_margin-bottom-left,
.pagedjs_margin-bottom-center,
.pagedjs_margin-bottom-right,
.pagedjs_margin-bottom-right-corner-holder,
.pagedjs_margin-right,
.pagedjs_margin-right-top,
.pagedjs_margin-right-middle,
.pagedjs_margin-right-bottom,
.pagedjs_margin-left,
.pagedjs_margin-left-top,
.pagedjs_margin-left-middle,
.pagedjs_margin-left-bottom {
box-shadow: 0 0 0 1px inset var(--color-marginBox);
}
/* uncomment this part for recto/verso book : ------------------------------------ */
/*
.pagedjs_pages {
flex-direction: column;
width: 100%;
}
.pagedjs_first_page {
margin-left: 0;
}
.pagedjs_page {
margin: 0 auto;
margin-top: 10mm;
}
.pagedjs_left_page{
width: calc(var(--pagedjs-bleed-left) + var(--pagedjs-pagebox-width) + var(--pagedjs-bleed-left))!important;
}
.pagedjs_left_page .pagedjs_bleed-right .pagedjs_marks-crop{
border-color: var(--pagedjs-crop-color);
}
.pagedjs_left_page .pagedjs_bleed-right .pagedjs_marks-middle{
width: var(--pagedjs-cross-size)!important;
}
.pagedjs_right_page{
left: 0;
}
*/
/*--------------------------------------------------------------------------------------*/
/* uncomment this part to see the baseline : -------------------------------------------*/
/*
.pagedjs_pagebox {
--pagedjs-baseline: 22px;
--pagedjs-baseline-position: 5px;
--pagedjs-baseline-color: cyan;
background: linear-gradient(transparent 0%, transparent calc(var(--pagedjs-baseline) - 1px), var(--pagedjs-baseline-color) calc(var(--pagedjs-baseline) - 1px), var(--pagedjs-baseline-color) var(--pagedjs-baseline)), transparent;
background-size: 100% var(--pagedjs-baseline);
background-repeat: repeat-y;
background-position-y: var(--pagedjs-baseline-position);
} */
/*--------------------------------------------------------------------------------------*/
}
/* Marks (to delete when merge in paged.js) */
.pagedjs_marks-crop{
z-index: 999999999999;
}
.pagedjs_bleed-top .pagedjs_marks-crop,
.pagedjs_bleed-bottom .pagedjs_marks-crop{
box-shadow: 1px 0px 0px 0px var(--pagedjs-crop-shadow);
}
.pagedjs_bleed-top .pagedjs_marks-crop:last-child,
.pagedjs_bleed-bottom .pagedjs_marks-crop:last-child{
box-shadow: -1px 0px 0px 0px var(--pagedjs-crop-shadow);
}
.pagedjs_bleed-left .pagedjs_marks-crop,
.pagedjs_bleed-right .pagedjs_marks-crop{
box-shadow: 0px 1px 0px 0px var(--pagedjs-crop-shadow);
}
.pagedjs_bleed-left .pagedjs_marks-crop:last-child,
.pagedjs_bleed-right .pagedjs_marks-crop:last-child{
box-shadow: 0px -1px 0px 0px var(--pagedjs-crop-shadow);
}

22
command-line/css/print.css

@ -0,0 +1,22 @@
:root{
--font-size: 12px;
--line-height: 18px;
}
@page{
size: A4 portrait;
bleed: 3mm;
marks: crop;
@bottom-center{
content: counter(page);
font-size: 8pt;
}
}
html, body{
font-family: serif;
font-size: var(--font-size);
line-height: var(--line-height);
hyphens: auto;
}

31061
command-line/js/paged.js

File diff suppressed because it is too large

31107
command-line/js/paged.polyfill.js

File diff suppressed because it is too large

17
command-line/templates/template.html

@ -0,0 +1,17 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<script src="./js/paged.js" type="text/javascript"></script>
<script src="./js/paged.polyfill.js" type="text/javascript"></script>
<link href="./css/pagedjs.css" rel="stylesheet" type="text/css">
<link href="./css/print.css" rel="stylesheet" type="text/css" media="print">
<!-- <link href="./css/baseline.css" rel="stylesheet" type="text/css" media="print"> -->
</head>
<body>
<div id="wrapper">
{{ publication_unfolded }}
</div>
</body>
</html>

12
command-line/templates/template.inspect.html

@ -0,0 +1,12 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<link href="./css/print.css" rel="stylesheet" type="text/css" media="print">
</head>
<body>
<div id="wrapper">
{{ publication_unfolded }}
</div>
</body>
</html>

226
command-line/update.py

@ -0,0 +1,226 @@
import urllib.request
import os
import re
import json
import jinja2
STATIC_FOLDER_PATH = '.' # without trailing slash
PUBLIC_STATIC_FOLDER_PATH = '.' # without trailing slash
TEMPLATES_DIR = './templates'
# This uses a low quality copy of all the images
# (using a folder with the name "images-small",
# which stores a copy of all the images generated with:
# $ mogrify -quality 5% -adaptive-resize 25% -remap pattern:gray50 * )
fast = False
def API_request(url, pagename):
    """
        url = API request url (string)
        data = { 'query':
                    'pages' :
                        pageid : {
                            'links' : {
                                '?' : '?'
                                'title' : 'pagename'
                            }
                        }
                    }
                }
    """
    response = urllib.request.urlopen(url).read()
    data = json.loads(response)
    # Save response as JSON to be able to inspect API call
    json_file = f'{ STATIC_FOLDER_PATH }/{ pagename }.json'
    print('Saving JSON:', json_file)
    with open(json_file, 'w') as out:
        out.write(json.dumps(data, indent=4))
        out.close()
    return data
def download_media(html, images, wiki):
    """
        html = string (HTML)
        images = list of filenames (str)
    """
    # check if 'images/' already exists
    if not os.path.exists(f'{ STATIC_FOLDER_PATH }/images'):
        os.makedirs(f'{ STATIC_FOLDER_PATH }/images')
    # download media files
    for filename in images:
        filename = filename.replace(' ', '_') # safe filenames
        # check if the image is already downloaded
        # if not, then download the file
        if not os.path.isfile(f'{ STATIC_FOLDER_PATH }/images/{ filename }'):
            # first we search for the full filename of the image
            url = f'{ wiki }/api.php?action=query&list=allimages&aifrom={ filename }&format=json'
            response = urllib.request.urlopen(url).read()
            data = json.loads(response)
            # we select the first search result
            # (assuming that this is the image we are looking for)
            image = data['query']['allimages'][0]
            # then we download the image
            image_url = image['url']
            image_filename = image['name']
            print('Downloading:', image_filename)
            image_response = urllib.request.urlopen(image_url).read()
            # and we save it as a file
            image_path = f'{ STATIC_FOLDER_PATH }/images/{ image_filename }'
            out = open(image_path, 'wb')
            out.write(image_response)
            out.close()
            import time
            time.sleep(3) # do not overload the server
        # replace src link
        image_path = f'{ PUBLIC_STATIC_FOLDER_PATH }/images/{ filename }' # here the images need to link to the / of the domain, for flask :/// confusing! this breaks the whole idea to still be able to make a local copy of the file
        matches = re.findall(rf'src="/images/.*?px-{ filename }"', html) # for debugging
        if matches:
            html = re.sub(rf'src="/images/.*?px-{ filename }"', f'src="{ image_path }"', html)
        else:
            matches = re.findall(rf'src="/images/.*?{ filename }"', html) # for debugging
            html = re.sub(rf'src="/images/.*?{ filename }"', f'src="{ image_path }"', html)
        # print(f'{filename}: {matches}\n------') # for debugging: each image should have the correct match!
    return html
def add_item_inventory_links(html):
    """
        html = string (HTML)
    """
    # Find all references in the text to the item index
    pattern = r'Item \d\d\d'
    matches = re.findall(pattern, html)
    index = {}
    new_html = ''
    from nltk.tokenize import sent_tokenize
    for line in sent_tokenize(html):
        for match in matches:
            if match in line:
                number = match.replace('Item ', '').strip()
                if not number in index:
                    index[number] = []
                    count = 1
                else:
                    count = index[number][-1] + 1
                index[number].append(count)
                item_id = f'ii-{ number }-{ index[number][-1] }'
                line = line.replace(match, f'Item <a id="{ item_id }" href="#Item_Index">{ number }</a>')
        # the line is pushed back to the new_html
        new_html += line + ' '
    # Also add a <span> around the index nr to style it
    matches = re.findall(r'<li>\d\d\d', new_html)
    for match in matches:
        new_html = new_html.replace(match, f'<li><span class="item_nr">{ match }</span>')
    # import json
    # print(json.dumps(index, indent=4))
    return new_html
def clean_up(html):
    """
        html = string (HTML)
    """
    html = re.sub(r'\[.*edit.*\]', '', html) # remove the [edit]
    html = re.sub(r'href="/index.php\?title=', 'href="#', html) # remove the internal wiki links
    html = re.sub(r'&#91;(?=\d)', '', html) # remove left footnote bracket [
    html = re.sub(r'(?<=\d)&#93;', '', html) # remove right footnote bracket ]
    return html
def fast_loader(html):
    """
        html = string (HTML)
    """
    if fast == True:
        html = html.replace('/images/', '/images-small/')
        print('--- rendered in FAST mode ---')
    return html
def parse_page(pagename, wiki):
    """
        pagename = string
        html = string (HTML)
    """
    parse = f'{ wiki }/api.php?action=parse&page={ pagename }&pst=True&format=json'
    data = API_request(parse, pagename)
    # print(json.dumps(data, indent=4))
    if 'parse' in data:
        html = data['parse']['text']['*']
        images = data['parse']['images']
        html = download_media(html, images, wiki)
        html = clean_up(html)
        html = add_item_inventory_links(html)
        html = fast_loader(html)
    else:
        html = None
    return html
def save(html, pagename):
    """
        html = string (HTML)
        pagename = string
    """
    if __name__ == "__main__":
        # command-line
        # save final page that will be used with PagedJS
        template_file = open(f'{ STATIC_FOLDER_PATH }/{ TEMPLATES_DIR }/template.html').read()
        template = jinja2.Template(template_file)
        doc = template.render(publication_unfolded=html, title=pagename)
        html_file = f'{ STATIC_FOLDER_PATH }/{ pagename }.html'
        print('Saving HTML:', html_file)
        with open(html_file, 'w') as out:
            out.write(doc)
            out.close()
        # save extra html page for debugging (CLI only)
        template_file = open(f'{ STATIC_FOLDER_PATH }/{ TEMPLATES_DIR }/template.inspect.html').read()
        template = jinja2.Template(template_file)
        doc = template.render(publication_unfolded=html, title=pagename)
        html_file = f'{ STATIC_FOLDER_PATH }/{ pagename }.inspect.html'
        print('Saving HTML:', html_file)
        with open(html_file, 'w') as out:
            out.write(doc)
            out.close()
    else:
        # Flask application
        with open(f'{ STATIC_FOLDER_PATH }/Unfolded.html', 'w') as out:
            out.write(html) # save the html to a file (without <head>)
def update_material_now(pagename, wiki):
    """
        pagename = string
        publication_unfolded = string (HTML)
    """
    publication_unfolded = parse_page(pagename, wiki)
    return publication_unfolded
# ---
if __name__ == "__main__":
    wiki = 'https://example.com/wiki' # no tail slash '/'
    pagename = 'Unfolded'
    publication_unfolded = update_material_now(pagename, wiki) # download the latest version of the page
    save(publication_unfolded, pagename) # save the page to file

53
wiki-to-print.Common.css.example

@ -0,0 +1,53 @@
/* CSS placed here will be applied to all skins */
.vector-body h1{
display: inline;
}
h1#firstHeading{
color: white;
border-bottom: 0;
padding: 0.3em 0 0 0.5em;
background: black;
}
/* wiki2print button */
li.wiki2print{
background-image: linear-gradient(to top,fuchsia 0,#ed8ce2 1px,#fff 100%) !important;
}
#ca-talk.wiki2print > a,
#p-views .wiki2print > a{
color: magenta;
background-image: linear-gradient(to bottom,rgba(167,215,249,0) 0,fuchsia 100%);
background-size: 1px 100%;
background-repeat: no-repeat;
}
/* captcha question on the Create Account page -- does not work*/
#userloginForm .mw-input {
font-weight: bold;
margin-top: 1em;
}
/* categorytree styling*/
.CategoryTreeItem::before{
content: "•";
font-size: 17px;
margin-right: -0.6em;
margin-left: 0.6em;
vertical-align: middle;
}
div.pad{
display:block;
height: 1em;
}
div.pad iframe{
float: right;
margin-left: 2em;
margin-bottom: 2em;
}
/* hide the pad in the Visual Editor edit view */
.ve-init-target-visual div.pad{
display: none;
}

104
wiki-to-print.Common.js.example

@ -0,0 +1,104 @@
/* Any JavaScript here will be loaded for all users on every page load. */
console.log('hello from common.js')
// rename 'Discussion' tab or context menu button
// to 'CSS' in the 'Pdf' namespace.
const
url = window.location.href,
NS = 'Pdf', // content namespace
cssNS = NS + 'CSS', // css namespace
pageName = mw.config.get("wgPageName").split(":")[1]
if (url.includes(NS + ':')) {
console.log('this page is in namespace', NS)
// Change Discussion into CSS button
const talkAnchor = document.querySelector('#ca-talk a')
const talkLink = talkAnchor.href
talkAnchor.innerText = 'CSS!'
const talkButton = document.querySelector('#ca-talk')
talkButton.classList.add('wiki2print')
// adding more buttons
const pageViews = document.querySelector('#p-views ul')
// View HTML
const htmlButton = document.createElement('li')
htmlButton.classList.add('collapsible', 'mw-list-item', 'wiki2print')
htmlButton.id = 'ca-html'
htmlButton.innerHTML = '<a href="https://cc.vvvvvvaria.org/wiki-to-print/html/' + pageName + '" target="_blank">View HTML</a>'
pageViews.appendChild(htmlButton)
// View PDF
const pdfButton = document.createElement('li')
pdfButton.classList.add('collapsible', 'mw-list-item', 'wiki2print')
pdfButton.id = 'ca-pdf'
pdfButton.innerHTML = '<a href="https://cc.vvvvvvaria.org/wiki-to-print/pdf/' + pageName + '" target="_blank">View PDF</a>'
pageViews.appendChild(pdfButton)
// UPDATE
const updateButton = document.createElement('li')
updateButton.classList.add('collapsible', 'mw-list-item', 'wiki2print')
updateButton.id = 'ca-update'
updateButton.innerHTML = '<a href="https://cc.vvvvvvaria.org/wiki-to-print/update/' + pageName + '" target="_blank">Update text</a>'
pageViews.appendChild(updateButton)
// FULL UPDATE
const fullupdateButton = document.createElement('li')
fullupdateButton.classList.add('collapsible', 'mw-list-item', 'wiki2print')
fullupdateButton.id = 'ca-full-update'
fullupdateButton.innerHTML = '<a href="https://cc.vvvvvvaria.org/wiki-to-print/update/' + pageName + '?full=true" target="_blank">Update media</a>'
pageViews.appendChild(fullupdateButton)
} else if (url.includes(cssNS + ':')) {
console.log('this page is in namespace', cssNS)
// Change "Page" button into "Content" button
const contentAnchor = document.querySelector('#ca-nstab-pdf a')
const contentLink = contentAnchor.href
contentAnchor.innerText = 'Content'
// Change "Discussion" button into "CSS" button
const talkAnchor = document.querySelector('#ca-talk a')
const talkLink = talkAnchor.href
talkAnchor.innerText = 'CSS!'
const talkButton = document.querySelector('#ca-talk')
talkButton.classList.add('wiki2print')
// adding more buttons
const pageViews = document.querySelector('#p-views ul')
// View HTML
const htmlButton = document.createElement('li')
htmlButton.classList.add('collapsible', 'mw-list-item', 'wiki2print')
htmlButton.id = 'ca-html'
htmlButton.innerHTML = '<a href="https://cc.vvvvvvaria.org/wiki-to-print/html/' + pageName + '" target="_blank">View HTML</a>'
pageViews.appendChild(htmlButton)
// View PDF
const pdfButton = document.createElement('li')
pdfButton.classList.add('collapsible', 'mw-list-item', 'wiki2print')
pdfButton.id = 'ca-pdf'
pdfButton.innerHTML = '<a href="https://cc.vvvvvvaria.org/wiki-to-print/pdf/' + pageName + '" target="_blank">View PDF</a>'
pageViews.appendChild(pdfButton)
// UPDATE
const updateButton = document.createElement('li')
updateButton.classList.add('collapsible', 'mw-list-item', 'wiki2print')
updateButton.id = 'ca-update'
updateButton.innerHTML = '<a href="https://cc.vvvvvvaria.org/wiki-to-print/update/' + pageName + '" target="_blank">Update text</a>'
pageViews.appendChild(updateButton)
// FULL UPDATE
const fullupdateButton = document.createElement('li')
fullupdateButton.classList.add('collapsible', 'mw-list-item', 'wiki2print')
fullupdateButton.id = 'ca-full-update'
fullupdateButton.innerHTML = '<a href="https://cc.vvvvvvaria.org/wiki-to-print/update/' + pageName + '?full=true" target="_blank">Update media</a>'
pageViews.appendChild(fullupdateButton)
}

73
wiki-to-print.nginx.example

@ -0,0 +1,73 @@
server {
listen 80 default_server;
listen [::]:80 default_server;
return 301 https://cc.vvvvvvaria.org$request_uri;
}
server {
listen 443 ssl;
server_name cc.vvvvvvaria.org;
root /var/www/html;
index index.html index.php index.htm index.nginx-debian.html;
location / {
try_files $uri $uri/ =404;
autoindex on;
}
ssl_certificate /etc/letsencrypt/live/cc.vvvvvvaria.org/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/cc.vvvvvvaria.org/privkey.pem;
location ~ \.php$ {
include snippets/fastcgi-php.conf;
fastcgi_buffers 16 16k;
fastcgi_buffer_size 32k;
fastcgi_pass unix:/run/php/php7.4-fpm.sock;
# tip from Michael
include fastcgi_params;
}
# ---------------------------------------------------
# WIKI
# Images
location /wiki/images {
# Separate location for images/ so .php execution won't apply
}
location /wiki/images/deleted {
# Deny access to deleted images folder
deny all;
}
# MediaWiki assets (usually images)
location ~ ^/wiki/resources/(assets|lib|src) {
try_files $uri =404;
add_header Cache-Control "public";
expires 7d;
}
# Assets, scripts and styles from skins and extensions
location ~ ^/wiki/(skins|extensions)/.+\.(css|js|gif|jpg|jpeg|png|svg|wasm)$ {
try_files $uri =404;
add_header Cache-Control "public";
expires 7d;
}
# License and credits files
location ~ ^/wiki/(COPYING|CREDITS)$ {
default_type text/plain;
}
# Handling for Mediawiki REST API, see [[mw:API:REST_API]]
location /wiki/rest.php/ {
try_files $uri $uri/ /wiki/rest.php?$query_string;
}
# Handling for the article path (pretty URLs)
location /wiki/ {
rewrite ^/wiki/(?<pagename>.*)$ /wiki/index.php;
}
# ----------------------------------------------------
# wiki-to-print
location /wiki-to-print/ {
proxy_pass http://localhost:5522;
}
}

9
wiki-to-print/Makefile

@ -0,0 +1,9 @@
all: local
local:
./venv/bin/python3 web-interface.py
server:
@SCRIPT_NAME=/wiki-to-print venv/bin/gunicorn -b localhost:5522 --reload web-interface:APP

14
wiki-to-print/README.md

@ -0,0 +1,14 @@
# wiki-to-print
## How to use it?
* `$ make local`: to work locally
* `$ make server`: to run wiki-to-print on a server with `gunicorn`
## How to install it?
* install a wiki on a server
* install the Flask application on the same server
* edit the `MediaWiki:Common.js` page on the wiki (see `wiki-to-print.Common.js.example`)
* edit the `MediaWiki:Common.css` page on the wiki (see `wiki-to-print.Common.css.example`)
* configure nginx (see `wiki-to-print.nginx.example`)
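
The buttons added by `wiki-to-print.Common.js.example` link to four endpoints of the Flask application. As a rough sketch (not part of the repository), the mapping from a wiki page name to those URLs could look like this. The helper name `endpoint_urls` and the `example.com` base URL are illustrative assumptions.

```python
from urllib.parse import quote


def endpoint_urls(pagename, base='https://example.com/wiki-to-print'):
    """Return the wiki-to-print URLs for one publication (hypothetical helper)."""
    # Same slug rule as api.py: spaces become underscores.
    # quote() is an extra precaution for other special characters.
    slug = quote(pagename.replace(' ', '_'))
    return {
        'html': f'{base}/html/{slug}',
        'pdf': f'{base}/pdf/{slug}',
        'update': f'{base}/update/{slug}',                  # "Update text" button
        'full_update': f'{base}/update/{slug}?full=true',   # "Update media" button
    }


if __name__ == '__main__':
    for name, url in endpoint_urls('My Publication').items():
        print(name, url)
```

With the nginx example in this repository, everything under `/wiki-to-print/` is proxied to the Flask application on port 5522, so these URLs only resolve once that `location` block is in place.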

432
wiki-to-print/api.py

@ -0,0 +1,432 @@
from pprint import pprint
import sys
import urllib.request
import urllib.error
import os
import re
import json
import jinja2
import datetime
from bs4 import BeautifulSoup
STATIC_FOLDER_PATH = './static' # without trailing slash
PUBLIC_STATIC_FOLDER_PATH = '/static' # without trailing slash
TEMPLATES_DIR = None
# This uses a low quality copy of all the images
# (using a folder with the name "images-small",
# which stores a copy of all the images generated with:
# $ mogrify -quality 5% -adaptive-resize 25% -remap pattern:gray50 * )
fast = False
# gets or creates index of publications in namespace
def get_index(wiki, subject_ns):
    """
    wiki = string
    subject_ns = object
    """
    return load_file('index', 'json') or create_index(
        wiki,
        subject_ns
    )

# gets publication's HTML and CSS
def get_publication(wiki, subject_ns, styles_ns, pagename):
    """
    wiki = string
    subject_ns = object
    styles_ns = object
    pagename = string
    """
    return {
        'html': get_html(wiki, subject_ns, pagename),
        'css': get_css(wiki, styles_ns, pagename)
    }

# gets or creates HTML file for a publication
def get_html(wiki, subject_ns, pagename):
    """
    wiki = string
    subject_ns = object
    pagename = string
    """
    return load_file(pagename, 'html') or create_html(
        wiki,
        subject_ns,
        pagename,
    )

# gets or creates CSS file for a publication
def get_css(wiki, styles_ns, pagename):
    """
    wiki = string
    styles_ns = object
    pagename = string
    """
    return load_file(pagename, 'css') or create_css(
        wiki,
        styles_ns,
        pagename
    )
# makes API call to create/update index of publications
def create_index(wiki, subject_ns):
    """
    wiki = string
    subject_ns = object
    """
    url = f'{ wiki }/api.php?action=query&format=json&list=allpages&apnamespace={ subject_ns["id"] }'
    data = do_API_request(url)
    pages = data['query']['allpages']
    # exclude subpages
    pages = [page for page in pages if '/' not in page['title']]
    for page in pages:
        # removing the namespace from title
        page['title'] = page['title'].replace(subject_ns['name'] + ':', '')
        page['slug'] = page['title'].replace(' ', '_') # slugifying title
        pageJSON = load_file(page['slug'], 'json')
        page['updated'] = pageJSON and pageJSON['updated'] or '--'
    now = str(datetime.datetime.now())
    index = {
        'pages': pages,
        'updated': now
    }
    save_file('index', 'json', index)
    return index
# Creates/updates a publication object
def create_publication(wiki, subject_ns, styles_ns, pagename, full_update):
    """
    wiki = string
    subject_ns = object
    styles_ns = object
    pagename = string
    full_update = None or string. Full update when not None
    """
    return {
        'html': create_html(wiki, subject_ns, pagename, full_update),
        'css': create_css(wiki, styles_ns, pagename)
    }

# makes API call to create/update a publication's HTML
# (full_update defaults to None so that get_html() can call this with three arguments)
def create_html(wiki, subject_ns, pagename, full_update=None):
    """
    wiki = string
    subject_ns = object
    pagename = string
    full_update = None or string. Full update when not None
    """
    url = f'{ wiki }/api.php?action=parse&page={ subject_ns["name"] }:{ pagename }&pst=True&format=json'
    data = do_API_request(url, subject_ns["name"]+":"+pagename, wiki)
    # pprint(data)
    now = str(datetime.datetime.now())
    data['updated'] = now
    save_file(pagename, 'json', data)
    update_publication_date( # we add the last updated of the publication to our index
        wiki,
        subject_ns,
        pagename,
        now
    )
    if 'parse' in data:
        html = data['parse']['text']['*']
        # pprint(html)
        imgs = data['parse']['images']
        html = remove_comments(html)
        html = download_media(html, imgs, wiki, full_update)
        html = clean_up(html)
        # html = add_item_inventory_links(html)
        if fast:
            html = fast_loader(html)
        soup = BeautifulSoup(html, 'html.parser')
        soup = remove_edit(soup)
        soup = inlineCiteRefs(soup)
        html = str(soup)
        # html = inlineCiteRefs(html)
        # html = add_author_names_toc(html)
    else:
        html = None
    # only write the file when the wiki returned a parsed page;
    # writing None would raise a TypeError
    if html is not None:
        save_file(pagename, 'html', html)
    return html
# makes API call to create/update a publication's CSS
def create_css(wiki, styles_ns, pagename):
    """
    wiki = string
    styles_ns = object
    pagename = string
    """
    css_url = f'{ wiki }/api.php?action=parse&page={ styles_ns["name"] }:{ pagename }&prop=wikitext&pst=True&format=json'
    css_data = do_API_request(css_url)
    if css_data and 'parse' in css_data:
        css = css_data['parse']['wikitext']['*']
        save_file(pagename, 'css', css)
        return css
# Load file from disk
def load_file(pagename, ext):
    """
    pagename = string
    ext = string
    """
    path = f'{ STATIC_FOLDER_PATH }/{ pagename }.{ ext }'
    if os.path.exists(path):
        print(f'Loading { ext }:', path)
        with open(path, 'r') as out:
            if ext == 'json':
                data = json.load(out)
            else:
                data = out.read()
        return data

# Save file to disk
def save_file(pagename, ext, data):
    """
    pagename = string
    ext = string
    data = object
    """
    path = f'{ STATIC_FOLDER_PATH }/{ pagename }.{ ext }'
    print(f'Saving { ext }:', path)
    try:
        out = open(path, 'w')
    except OSError:
        print("Could not open/write file:", path)
        sys.exit()
    with out: # the file is closed automatically at the end of this block
        if ext == 'json':
            out.write( json.dumps(data, indent = 2) )
        else:
            out.write( data )
    return data
# do API request and return JSON
def do_API_request(url, filename="", wiki=""):
    """
    url = API request url (string)
    data = { 'query': {
        'pages' : {
            pageid : {
                'links' : {
                    '?' : '?'
                    'title' : 'pagename'
                }
            }
        }
    } }
    """
    purge(filename, wiki)
    print('Loading from wiki: ', url)
    response = urllib.request.urlopen(url)
    response_type = response.getheader('Content-Type')
    if response.status == 200 and "json" in response_type:
        contents = response.read()
        data = json.loads(contents)
        return data
# api calls seem to be cached even when called with maxage
# So call purge before doing the api call.
# https://www.mediawiki.org/wiki/API:Purge
def purge(filename, wiki):
    if filename == "" or wiki == "":
        return
    print("purge " + filename )
    import requests
    S = requests.Session()
    URL = f'{ wiki }/api.php'
    # url = f'{ wiki }/api.php?action=query&list=allimages&aifrom={ filename }&format=json'
    PARAMS = {
        "action": "purge",
        "titles": filename,
        "format": "json",
        "generator": "alltransclusions",
    }
    R = S.post(url=URL, params=PARAMS)
    # DATA = R.text
# updates a publication's last updated field in the index
def update_publication_date(wiki, subject_ns, pagename, updated):
    """
    wiki = string
    subject_ns = object
    pagename = string
    updated = string
    """
    index = get_index(wiki, subject_ns)
    for page in index['pages']:
        if page['slug'] == pagename:
            page['updated'] = updated
    save_file('index', 'json', index)

def customTemplate(name):
    path = "custom/%s.html" % name
    if os.path.isfile(os.path.join(os.path.dirname(__file__), "templates/", path)):
        return path
    else:
        return None
# Beautiful Soup seems to have a problem with some comments,
# so let's remove them before parsing.
def remove_comments( html ):
    """
    html = string (HTML)
    """
    pattern = r'(<!--.*?-->)|(<!--[\S\s]+?-->)|(<!--[\S\s]*?$)'
    return re.sub(pattern, "", html)
# Downloading images referenced in the html
def download_media(html, images, wiki, full_update):
    """
    html = string (HTML)
    images = list of filenames (str)
    """
    # check if 'images/' already exists
    if not os.path.exists(f'{ STATIC_FOLDER_PATH }/images'):
        os.makedirs(f'{ STATIC_FOLDER_PATH }/images')
    # download media files
    for filename in images:
        filename = filename.replace(' ', '_') # safe filenames
        # check if the image is already downloaded
        # if not, then download the file
        if (not os.path.isfile(f'{ STATIC_FOLDER_PATH }/images/{ filename }')) or full_update:
            # first we search for the full filename of the image
            url = f'{ wiki }/api.php?action=query&list=allimages&aifrom={ filename }&format=json'
            # url = f'{ wiki }/api.php?action=query&titles=File:{ filename }&format=json'
            data = do_API_request(url)
            # timestamp = data.query.pages.
            # print(json.dumps(data, indent=2))
            if data and data['query']['allimages']:
                # we select the first search result
                # (assuming that this is the image we are looking for)
                image = data['query']['allimages'][0]
                if image:
                    # then we download the image
                    image_url = image['url']
                    image_filename = image['name']
                    print('Downloading:', image_filename)
                    image_response = urllib.request.urlopen(image_url).read()
                    # and we save it as a file
                    image_path = f'{ STATIC_FOLDER_PATH }/images/{ image_filename }'
                    with open(image_path, 'wb') as out:
                        out.write(image_response)
                    print(image_path)
                    import time
                    time.sleep(3) # do not overload the server
        # replace src links
        e_filename = re.escape( filename ) # needed for filename with certain characters
        # here the images need to link to the / of the domain, for flask.
        # Confusing: this breaks the whole idea of still being able to make a local copy of the file
        image_path = f'{ PUBLIC_STATIC_FOLDER_PATH }/images/{ filename }'
        matches = re.findall(rf'src=\"/wiki/mediawiki/images/.*?px-{ e_filename }\"', html) # for debugging
        # pprint(matches)
        if matches:
            html = re.sub(rf'src=\"/wiki/mediawiki/images/.*?px-{ e_filename }\"', f'src=\"{ image_path }\"', html)
        else:
            matches = re.findall(rf'src=\"/wiki/mediawiki/images/.*?{ e_filename }\"', html) # for debugging
            # print(matches, e_filename, html)
            html = re.sub(rf'src=\"/wiki/mediawiki/images/.*?{ e_filename }\"', f'src=\"{ image_path }\"', html)
        print(f'{filename}: {matches}\n------') # for debugging: each image should have the correct match!
    return html
def clean_up(html):
    """
    html = string (HTML)
    """
    # html = re.sub(r'\[.*edit.*\]', '', html) # remove the [edit] # Heerko: this somehow caused problems. Removing it solves it, seemingly without side effects...
    html = re.sub(r'href="/index.php\?title=', 'href="#', html) # remove the internal wiki links
    html = re.sub(r'&#91;(?=\d)', '', html) # remove left footnote bracket [
    html = re.sub(r'(?<=\d)&#93;', '', html) # remove right footnote bracket ]
    return html

def remove_edit(soup):
    """
    soup = BeautifulSoup (HTML)
    """
    es = soup.find_all(class_="mw-editsection")
    for s in es:
        s.decompose()
    return soup

# inline citation references in the html for pagedjs
# Turns: <sup class="reference" id="cite_ref-1"><a href="#cite_note-1">[1]</a></sup>
# into: <span class="footnote">The cite text</span>
def inlineCiteRefs(soup):
    """
    soup = BeautifulSoup (HTML)
    """
    refs = soup.find_all("sup", class_="reference")
    for ref in refs:
        href = ref.a['href']
        res = re.findall('[0-9]+', href)
        if res:
            cite = soup.find_all(id="cite_note-"+res[0])
            text = cite[0].find(class_="reference-text")
            text['class'] = 'footnote'
            ref.replace_with(text)
    # remove the reference from the bottom of the document
    for item in soup.find_all(class_="references"):
        item.decompose()
    return soup

def fast_loader(html):
    """
    html = string (HTML)
    """
    html = html.replace('/images/', '/images-small/')
    print('--- rendered in FAST mode ---')
    return html
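
`api.py` builds its MediaWiki API URLs with f-strings, which works as long as page names contain no characters that need escaping. Below is a minimal sketch of the same `action=parse` URL built with `urllib.parse.urlencode` instead. `build_parse_url` is a hypothetical helper, not part of `api.py`; the namespace dict mirrors the shape used in `config.json`.

```python
from urllib.parse import urlencode


def build_parse_url(wiki, ns, pagename, prop=None):
    """Build an action=parse API URL with properly encoded parameters.

    wiki = string (base URL, no trailing slash)
    ns = dict with a 'name' key, e.g. { 'name': 'Pdf', 'id': 3000 }
    pagename = string
    prop = optional string, e.g. 'wikitext' for the CSS namespace
    """
    params = {
        'action': 'parse',
        'page': f"{ns['name']}:{pagename}",
        'pst': 'True',
        'format': 'json',
    }
    if prop:
        params['prop'] = prop
    # urlencode percent-encodes ':' and '&' and turns spaces into '+'
    return f'{wiki}/api.php?{urlencode(params)}'


if __name__ == '__main__':
    ns = {'name': 'Pdf', 'id': 3000}
    print(build_parse_url('https://example.com/wiki', ns, 'Test'))
    print(build_parse_url('https://example.com/wiki', ns, 'Notes & drafts', prop='wikitext'))
```

The encoded form matters mostly for titles containing `&`, `#` or `?`, which would otherwise be read as part of the query string.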

19
wiki-to-print/config.json

@ -0,0 +1,19 @@
{
"project_name": "wiki-to-print",
"port": 5522,
"dir_path": ".",
"wiki": {
"base_url": "https://example.com/wiki/",
"subject_ns": { "name": "Pdf", "id": 3000 },
"styles_ns": { "name": "PdfCSS", "id": 3001 }
},
"pagename": "Test",
"stylesheet": "print.css",
"replacements": [
{
"type": "regex",
"search": "<h3><span class=\"mw-headline\" id=\"References.*?\">References</span><span class=\"mw-editsection\"><span class=\"mw-editsection-bracket\"></span></span></h3><ul>",
"replace": "<h3 class=\"references\"><span class=\"mw-headline\" id=\"References\">References</span><span class=\"mw-editsection\"><span class=\"mw-editsection-bracket\"></span></span></h3><ul class=\"references\">"
}
]
}
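
Each entry in `replacements` has a `type`, a `search` pattern and a `replace` string. How the application consumes these entries is not shown in this section, so the following is only a sketch of one plausible way to apply them. `apply_replacements`, the plain-string fallback for non-regex types and the sample rule are all assumptions.

```python
import re


def apply_replacements(html, replacements):
    """Apply a list of search/replace rules to an HTML string (hypothetical helper)."""
    for rule in replacements:
        if rule.get('type') == 'regex':
            # 'search' is treated as a regular expression
            html = re.sub(rule['search'], rule['replace'], html)
        else:
            # assumption: any other type is a literal string replacement
            html = html.replace(rule['search'], rule['replace'])
    return html


if __name__ == '__main__':
    rules = [
        {
            'type': 'regex',
            'search': r'<h3 id="References.*?">',
            'replace': '<h3 class="references">',
        }
    ]
    print(apply_replacements('<h3 id="References_2">References</h3>', rules))
```

The non-greedy `.*?` in the sample rule plays the same role as in the `config.json` entry: it absorbs the numeric suffix that MediaWiki appends to repeated heading ids.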

5
wiki-to-print/config.py

@ -0,0 +1,5 @@
import json
import pkg_resources
data = pkg_resources.resource_string(__name__, "config.json")
config = json.loads(data)
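
`pkg_resources` is deprecated in recent setuptools releases. As a hedged alternative, the same file could be read with the standard library only. `load_config` is a hypothetical name, and the sketch assumes that `config.json` sits next to the module, as it does in this repository.

```python
import json
from pathlib import Path


def load_config(path=None):
    """Read config.json and return it as a dict (hypothetical helper)."""
    if path is None:
        # default: the config.json in the same folder as this module
        path = Path(__file__).resolve().parent / 'config.json'
    with open(path, encoding='utf-8') as f:
        return json.load(f)
```

A module using this would write `config = load_config()` where `config.py` currently calls `json.loads(data)`.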

20
wiki-to-print/requirements.txt

@ -0,0 +1,20 @@
attrs==22.2.0
beautifulsoup4==4.11.2
bs4==0.0.1
certifi==2022.12.7
charset-normalizer==3.0.1
click==8.1.3
Flask==2.2.2
gunicorn==20.1.0
idna==3.4
importlib-metadata==6.0.0
itsdangerous==2.1.2
Jinja2==3.1.2
jsonschema==4.17.3
MarkupSafe==2.1.2
pyrsistent==0.19.3
requests==2.28.2
soupsieve==2.3.2.post1
urllib3==1.26.14
Werkzeug==2.2.2
zipp==3.12.0

19
wiki-to-print/static/css/baseline.css

@ -0,0 +1,19 @@
/* This baseline.css stylesheet is derived from: https://gist.github.com/julientaq/08d636a7a2b5f2824025256de0fca467 */
/* Thanks a lot to julientaq for publishing it! */
:root {
--baseline: 18px;
--baseline-color: blue;
}
/* grid baseline */
.pagedjs_page {
/* background:
repeating-linear-gradient(
white 0,
white calc(var(--baseline) - 1px), var(--baseline-color) var(--baseline));
background-size: cover;
background-repeat: repeat-y; */
/* start of the first baseline: half of the line-height: 9px */
/* background-position-y: 9px; */
}

83
wiki-to-print/static/css/main.css

@ -0,0 +1,83 @@
@media screen{
body{
background-color: lavender;
margin: 1vh 5vw 2vh 5vw;
z-index: 1;
}
p {
max-width: 30em;
}
div#nav{
position: fixed;
width: calc(100% - 2em);
margin: 1em;
left: 0;
top: 0;
z-index: 999;
}
div#nav a#home,
div#nav a#notes,
div#nav a#update {
float: left;
padding: 0.25em 0.125em;
}
div#nav div#loading{
display: none;
margin: 0.35em 0;
color: black;
clear: both;
float: right;
background-color: white;
padding: 0.5em 1em;
border-radius: 5px;
opacity: 0;
animation: fade 2s infinite linear;
}
@keyframes fade {
0%,100% { opacity: 0 }
50% { opacity: 1 }
}
table {
border-collapse: collapse;
}
table tr th {
font-weight: normal;
}
table tr th,
table tr td {
border: 1px solid darkgreen;
padding: 0.4em 0.8em;
}
table tr th:first-of-type {
text-align: left;
}
table tr td:first-of-type {
min-width: 15em;
}
span.updated {
font-family: monospace;
font-size: 0.9em;
}
table tr td input[type="checkbox"] {
min-width: 0;
width: unset;
}
div#index{
/* line-height: 2; */
}
div#index ul{
padding: 0;
margin: 0 0 0 2.5em;
width: 750px;
}
div#index ul li{
list-style: none;
}
div#index ul li::before{
content: "-----";
float: left;
margin-left: -2.5em;
}
}

214
wiki-to-print/static/css/pagedjs.css

@ -0,0 +1,214 @@
/* CSS for Paged.js interface – v0.2 */
/* Change the look */
:root {
--color-background: whitesmoke;
--color-pageSheet: #cfcfcf;
--color-pageBox: violet;
--color-paper: white;
--color-marginBox: transparent;
--pagedjs-crop-color: black;
--pagedjs-crop-shadow: white;
--pagedjs-crop-stroke: 1px;
}
/* To define how the book looks on the screen: */
@media screen {
/* adding this here from main.css to style the div#nav */
div#nav{
position: fixed;
width: calc(100% - 2em);
margin: 1em;
text-align: right;
left: 0;
top: 0;
z-index: 999;
}
div#nav a#home,
div#nav a#notes{
float: left;
padding: 0.25em 0.125em;
}
div#nav div#loading{
display: none;
margin: 0.35em 0;
color: black;
clear: both;
float: right;
background-color: white;
padding: 0.5em 1em;
border-radius: 5px;
opacity: 0;
animation: fade 2s infinite linear;
}
@keyframes fade {
0%,100% { opacity: 0 }
50% { opacity: 1 }
}
body {
background-color: var(--color-background);
}
.pagedjs_pages {
display: flex;
width: calc(var(--pagedjs-width) * 2);
flex: 0;
flex-wrap: wrap;
margin: 0 auto;
}
.pagedjs_page {
background-color: var(--color-paper);
box-shadow: 0 0 0 1px var(--color-pageSheet);
margin: 0;
flex-shrink: 0;
flex-grow: 0;
margin-top: 10mm;
}
.pagedjs_first_page {
margin-left: var(--pagedjs-width);
}
.pagedjs_page:last-of-type {
margin-bottom: 10mm;
}
.pagedjs_pagebox{
box-shadow: 0 0 0 1px var(--color-pageBox);
}
.pagedjs_left_page{
z-index: 20;
width: calc(var(--pagedjs-bleed-left) + var(--pagedjs-pagebox-width))!important;
}
.pagedjs_left_page .pagedjs_bleed-right .pagedjs_marks-crop {
border-color: transparent;
}
.pagedjs_left_page .pagedjs_bleed-right .pagedjs_marks-middle{
width: 0;
}
.pagedjs_right_page{