
<h1 align="center">DIY Book Scanner Workflow</h1>

## Getting started

This set of scripts was written for the Text Laundrette workshop.<br>It is a workflow to turn the pictures from the DIY Book Scanner into a final OCRed PDF.

To skip any of the scripts, comment out the corresponding line in the shell script <em>workshop_stream.sh</em>.
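
The contents of <em>workshop_stream.sh</em> are not reproduced in this readme. As a rough sketch only, assuming it runs the steps in the order listed under the additional information further down, skipping a script amounts to leaving it out of the sequence. The names `STEPS`, `build_pipeline` and `run_pipeline` are hypothetical and not part of the repository.

```python
import os
import subprocess

# Assumed step order, taken from the section headings of this readme;
# the real workshop_stream.sh may differ.
STEPS = [
    ["./merge_scans.sh"],
    ["python3", "burstpdf.py"],
    ["python3", "rotation.py"],
    ["python3", "bounding_box.py"],
    ["python3", "mirror_crop.py"],
    ["python3", "tesseract_ocr.py"],
    ["./merge_files.sh"],
]


def build_pipeline(skip=()):
    """Return the commands to run, leaving out any script named in `skip`."""
    skipped = set(skip)
    return [cmd for cmd in STEPS if os.path.basename(cmd[-1]) not in skipped]


def run_pipeline(skip=()):
    """Run each remaining step in order and stop at the first failure."""
    for cmd in build_pipeline(skip):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(skip=["mirror_crop.py"])` would run everything except the mirror crop, which has the same effect as commenting that line out in the shell script.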


## Dependencies
### Brew (macOS) or apt-get (Linux)
<p>On macOS, you’ll need the command-line tools for Xcode installed.</p>

```bash
xcode-select --install
```

<p>Afterwards, install Homebrew.</p>

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

<p>Run the following command once you’re done to ensure Homebrew is installed and working properly:</p>

```bash
brew doctor
```

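<p>On Linux, install the dependencies with apt-get. <em>pdfunite</em> ships with the <em>poppler-utils</em> package, and <em>pytesseract</em> needs the Tesseract binary:</p>

```bash
sudo apt-get install python3 python3-pip imagemagick poppler-utils tesseract-ocr
```

<p>On macOS, install them with Homebrew. pip3 comes with the <em>python3</em> formula and <em>pdfunite</em> with <em>poppler</em>:</p>

```bash
brew install python3 imagemagick poppler tesseract
```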

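### PIP3
<p>Install the Python packages. <em>time</em> and <em>logging</em> are part of the Python standard library and do not need to be installed.</p>

`sudo pip3 install pdf2image Pillow opencv-python pytesseract`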


## How to use
<p>Add your pictures from the book scanner to the folder "/scans"</p>

<p>Make all the files executable.</p>

```bash
chmod +x merge_scans.sh workshop_stream.sh merge_files.sh
```

<p>Run ./workshop_stream.sh</p>

<p>Wait :)</p>


## Additional information
### Create 5 directories

```bash
mkdir split
mkdir rotated
mkdir ocred
mkdir bounding_box
mkdir cropped
```
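
If the directories need to be created from Python instead, a minimal sketch using only the standard library could look like this. The helper `make_work_dirs` is hypothetical and not part of the repository. Unlike plain `mkdir`, it does not fail when a directory already exists.

```python
from pathlib import Path

# The five working directories named in this readme.
WORK_DIRS = ["split", "rotated", "ocred", "bounding_box", "cropped"]


def make_work_dirs(base="."):
    """Create the working directories under `base`; safe to call repeatedly."""
    created = []
    for name in WORK_DIRS:
        path = Path(base) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```
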
### Merge the files in the directory <em>scans</em>
<p>All the scans are appended to a single PDF called out.pdf.</p>

```bash
./merge_scans.sh
```

### Burst the PDF in <em>scans</em>
<p>Burst this PDF into single pages, renaming the files so they can be iterated over later.</p>

```bash
python3 burstpdf.py
```
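
The source of <em>burstpdf.py</em> is not shown here. As an illustration of why the renaming matters, the sketch below uses zero-padded page numbers so that alphabetical order matches page order. It assumes the pages are written as JPEGs into <em>split</em>. The names `page_name` and `burst_pdf` are hypothetical. `convert_from_path` comes from the pdf2image package listed in the dependencies.

```python
from pathlib import Path


def page_name(index, total, ext="jpg"):
    """Zero-padded file name, so sorting by name equals sorting by page."""
    width = max(len(str(total)), 3)
    return f"page_{index:0{width}d}.{ext}"


def burst_pdf(pdf_path, out_dir="split"):
    """Write every page of `pdf_path` as a JPEG into `out_dir` (assumed layout)."""
    # Imported here so page_name() works without pdf2image/poppler installed.
    from pdf2image import convert_from_path

    pages = convert_from_path(str(pdf_path))
    written = []
    for index, page in enumerate(pages, start=1):
        target = Path(out_dir) / page_name(index, len(pages))
        page.save(target, "JPEG")
        written.append(target)
    return written
```

Without the padding, `page_10.jpg` would sort before `page_2.jpg`, and the later scripts would process the pages in the wrong order.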

### Rotate the PDFs
<p>The book scanner takes pictures of the pages in a rotated position. This script iterates through the odd and even pages, rotating them back to their original position.</p>

```bash
python3 rotation.py
```
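
The odd/even logic can be sketched as follows. This is not the code of <em>rotation.py</em>: the angles (90 and 270 degrees), the JPEG input, and the directory names are assumptions to adjust to your scanner setup, and both function names are hypothetical. Pillow's `Image.rotate` turns the image counter-clockwise, and `expand=True` enlarges the output so nothing is cut off.

```python
from pathlib import Path


def rotation_for_page(page_number, odd_angle=90, even_angle=270):
    """Pick the rotation angle by page parity (angles are assumptions)."""
    return odd_angle if page_number % 2 == 1 else even_angle


def rotate_pages(src_dir="split", dst_dir="rotated"):
    """Rotate every JPEG in `src_dir` and save it under the same name in `dst_dir`."""
    # Imported here so rotation_for_page() works without Pillow installed.
    from PIL import Image

    for number, path in enumerate(sorted(Path(src_dir).glob("*.jpg")), start=1):
        angle = rotation_for_page(number)
        with Image.open(path) as img:
            img.rotate(angle, expand=True).save(Path(dst_dir) / path.name)
```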

### Cropping the bounding boxes
<p>The pages are now in their original position, but they have a bounding box. This script iterates through them and crops each page to the highest-contrast area found.</p>

```bash
python3 bounding_box.py
```
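
How <em>bounding_box.py</em> finds the highest-contrast area is not described here. As a simplified stand-in, the sketch below treats the top-left pixel as background and returns the box around every pixel that differs from it by more than a threshold. It works on a plain list of grayscale rows, and the function name and threshold are hypothetical.

```python
def content_bbox(gray, threshold=40):
    """Return (left, top, right, bottom) around pixels that stand out.

    `gray` is a list of rows of 0-255 values. The top-left pixel is
    assumed to be background. Returns None if nothing stands out.
    """
    background = gray[0][0]

    def stands_out(value):
        return abs(value - background) > threshold

    rows = [y for y, row in enumerate(gray) if any(stands_out(v) for v in row)]
    cols = [x for x in range(len(gray[0])) if any(stands_out(row[x]) for row in gray)]
    if not rows or not cols:
        return None
    # Right and bottom are exclusive, matching Pillow's crop box convention.
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)
```

The returned tuple uses the same ordering as the box argument of Pillow's `Image.crop`, so it could be passed on directly after converting a page to grayscale.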

### Cropping the mirror
<p>The pages are now cropped, but the mirror is still visible in the middle. This script crops it out.</p>

```bash
python3 mirror_crop.py
```

### OCR
<p>In this step we OCR the JPGs, turning them into PDFs.</p>

```bash
python3 tesseract_ocr.py
```
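
A minimal sketch of this step, assuming the cropped JPEGs live in <em>cropped</em> and the results go to <em>ocred</em> (the actual <em>tesseract_ocr.py</em> may be organised differently). `pytesseract.image_to_pdf_or_hocr` returns the searchable PDF as bytes. The helper names and the default language `eng` are hypothetical.

```python
from pathlib import Path


def pdf_path_for(image_path, out_dir="ocred"):
    """Map cropped/page_001.jpg to ocred/page_001.pdf."""
    return Path(out_dir) / (Path(image_path).stem + ".pdf")


def ocr_to_pdf(image_path, out_dir="ocred", lang="eng"):
    """OCR one image and write the result as a searchable PDF."""
    # Imported here so pdf_path_for() works without pytesseract installed.
    import pytesseract

    pdf_bytes = pytesseract.image_to_pdf_or_hocr(
        str(image_path), extension="pdf", lang=lang
    )
    target = pdf_path_for(image_path, out_dir)
    target.write_bytes(pdf_bytes)
    return target
```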

### Merge all the files and create the PDF
<p>The OCRed pages are now joined into the final PDF. Your book is ready :)</p>

```bash
./merge_files.sh
```
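
<em>merge_files.sh</em> is not reproduced here. Since <em>pdfunite</em> is listed among the dependencies, the merge presumably comes down to one call that takes the input files in order, followed by the output file. The sketch below builds that call. The function names and the output name `book.pdf` are hypothetical.

```python
import subprocess
from pathlib import Path


def pdfunite_command(pages, output="book.pdf"):
    """Build the pdfunite call: inputs sorted by name, output file last."""
    ordered = sorted(str(page) for page in pages)
    if not ordered:
        raise ValueError("no PDF pages to merge")
    return ["pdfunite", *ordered, str(output)]


def merge_pages(src_dir="ocred", output="book.pdf"):
    """Merge every PDF in `src_dir` into `output` using pdfunite."""
    subprocess.run(pdfunite_command(Path(src_dir).glob("*.pdf"), output), check=True)
```

Sorting by name only gives the right page order if the page files are zero-padded.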

## License
The package is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).