<h1 align="center">DIY Book Scanner Workflow</h1>

## Getting started

This set of scripts was written for the Text Laundrette workshop.<br>It is a workflow to turn the pictures from the DIY Book Scanner into a final OCRed PDF.

To skip any of the scripts, comment out the line that runs it in the shell script <em>workshop_stream.sh</em>.
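
As a hypothetical illustration of what commenting out a step amounts to, here is a minimal Python stand-in for <em>workshop_stream.sh</em>. It is not the real script: the step order is taken from the Additional information section of this README, while the `plan` and `run` helpers and the `skip` parameter are made up for this sketch.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for workshop_stream.sh. The order follows the
# "Additional information" section of this README.
STEPS = [
    ["./merge_scans.sh"],
    [sys.executable, "burstpdf.py"],
    [sys.executable, "rotation.py"],
    [sys.executable, "bounding_box.py"],
    [sys.executable, "mirror_crop.py"],
    [sys.executable, "tesseract_ocr.py"],
    ["./merge_files.sh"],
]

# The five working directories the workflow writes into.
WORK_DIRS = ["split", "rotated", "ocred", "bounding_box", "cropped"]


def plan(skip=()):
    """Return the steps to run, leaving out every script named in `skip`.

    Naming a script in `skip` plays the role of commenting it out.
    """
    return [cmd for cmd in STEPS if os.path.basename(cmd[-1]) not in skip]


def run(skip=()):
    """Create the working directories, then run the remaining steps in order."""
    for d in WORK_DIRS:
        os.makedirs(d, exist_ok=True)  # like `mkdir -p`: no error if it exists
    for cmd in plan(skip):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling `run(skip={"mirror_crop.py"})` from the repository folder would run every step except the mirror crop.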

## Dependencies
### Homebrew (macOS) or apt-get (Linux)
<p>On macOS, you’ll need the Xcode command-line tools installed.</p>

```bash
xcode-select --install
```

<p>Then install Homebrew.</p>

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

<p>Run the following command once you’re done to ensure Homebrew is installed and working properly:</p>

```bash
brew doctor
```

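<p>Install the system packages. On Linux (<em>pdfunite</em> is part of <em>poppler-utils</em>):</p>

```bash
sudo apt-get install python3 python3-pip imagemagick poppler-utils tesseract-ocr
```

<p>On macOS (<em>pdfunite</em> comes with <em>poppler</em>, and pip3 comes with Homebrew's Python):</p>

```bash
brew install python imagemagick poppler tesseract
```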

### PIP3

```bash
sudo pip3 install pdf2image Pillow opencv-python pytesseract
```


## How to use
<p>Copy the pictures from the book scanner into the <em>scans</em> folder.</p>

<p>Make the shell scripts executable.</p>

```bash
chmod +x merge_scans.sh workshop_stream.sh merge_files.sh
```

<p>Run <code>./workshop_stream.sh</code>.</p>

<p>Wait :)</p>


## Additional information
### Create 5 directories

```bash
# -p: no error if a directory already exists, so the step can be rerun
mkdir -p split rotated ocred bounding_box cropped
```
### Merge the files in the directory <em>scans</em>
<p>All the scans are appended into a single PDF called <em>out.pdf</em>.</p>

```bash
./merge_scans.sh
```
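
<em>merge_scans.sh</em> is a shell script whose contents are not shown here. As a hedged sketch of the same step, the Python below collects the images in <em>scans</em> in numeric order (so IMG_9 comes before IMG_10) and appends them into one PDF with Pillow, which is already a pip dependency. The helper names and the list of accepted file extensions are assumptions.

```python
import re
from pathlib import Path

# Assumed set of image types coming off the scanner's cameras.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def natural_key(name):
    """Sort key that compares digit runs as numbers: IMG_9 < IMG_10."""
    # Splitting on a capture group alternates text and digits, so the
    # lists being compared always hold the same type at the same index.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def collect_scans(folder="scans"):
    """List the image files in `folder` in natural (numeric) order."""
    files = [p for p in Path(folder).iterdir()
             if p.suffix.lower() in IMAGE_SUFFIXES]
    return sorted(files, key=lambda p: natural_key(p.name))


def merge_to_pdf(folder="scans", out="out.pdf"):
    """Append every scan in `folder` to a single multi-page PDF."""
    from PIL import Image  # Pillow; imported here so the helpers above need no extras

    paths = collect_scans(folder)
    if not paths:
        raise FileNotFoundError(f"no images found in {folder!r}")
    pages = [Image.open(p).convert("RGB") for p in paths]  # PDF pages need RGB
    pages[0].save(out, save_all=True, append_images=pages[1:])
```

Every opened page stays in memory until the PDF is written, which can add up for a long book.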

### Burst the PDF in <em>scans</em>
<p>This script bursts the PDF into single pages and names the files sequentially, so the next scripts can iterate over them in order.</p>

```bash
python3 burstpdf.py
```
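
The point of the renaming is that the files sort in page order. A minimal sketch, assuming the pages are rendered to JPEG with pdf2image into the <em>split</em> folder; the folder, the `page_0001.jpg` naming scheme and the 300 dpi setting are assumptions, not necessarily what <em>burstpdf.py</em> does.

```python
from pathlib import Path


def page_name(number, width=4, suffix=".jpg"):
    """Zero-padded file name, e.g. page_0007.jpg (hypothetical scheme).

    The padding makes alphabetical order equal page order, so a plain
    sorted() over the folder visits the pages in sequence.
    """
    return f"page_{number:0{width}d}{suffix}"


def burst(pdf="out.pdf", out_dir="split", dpi=300):
    """Render each page of `pdf` to its own numbered JPEG in `out_dir`."""
    from pdf2image import convert_from_path  # needs poppler installed

    Path(out_dir).mkdir(exist_ok=True)
    for number, page in enumerate(convert_from_path(pdf, dpi=dpi), start=1):
        page.save(Path(out_dir) / page_name(number), "JPEG")
```

Without the padding, page_10.jpg would sort before page_2.jpg and the later steps would process the pages out of order.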

### Rotate the pages
<p>The pictures from the book scanner are not upright, so this script iterates through the odd and even pages, rotating each one back to its original orientation.</p>

```bash
python3 rotation.py
```
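
How the odd and even pages need to be turned depends on how the cameras are mounted. The sketch below assumes odd pages need a quarter turn one way and even pages a quarter turn the other; the angles, folder names and helpers are assumptions to adjust, not the values used by <em>rotation.py</em>.

```python
from pathlib import Path

# ASSUMPTION: odd and even pages are rotated in opposite directions.
# Swap the two values if your pages come out upside down.
ODD_ANGLE = 90    # Pillow rotates counter-clockwise for positive angles
EVEN_ANGLE = -90  # i.e. 90 degrees clockwise


def angle_for_page(number):
    """Rotation in degrees (counter-clockwise) for a 1-based page number."""
    return ODD_ANGLE if number % 2 == 1 else EVEN_ANGLE


def rotate_pages(in_dir="split", out_dir="rotated"):
    """Rotate every JPEG in `in_dir` and save it under the same name in `out_dir`."""
    from PIL import Image

    Path(out_dir).mkdir(exist_ok=True)
    pages = sorted(Path(in_dir).glob("*.jpg"))  # zero-padded names sort in page order
    for number, path in enumerate(pages, start=1):
        with Image.open(path) as img:
            # expand=True resizes the canvas, so a landscape picture becomes
            # portrait instead of being clipped to its old dimensions.
            rotated = img.rotate(angle_for_page(number), expand=True)
            rotated.save(Path(out_dir) / path.name)
```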

### Cropping the bounding boxes
<p>The pages are now upright, but each picture still includes more than the page itself. This script iterates through them and crops each picture to the bounding box of the highest-contrast area it finds.</p>

```bash
python3 bounding_box.py
```
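
One way to find a high-contrast area is Otsu's method: choose the grey level that best separates dark pixels from bright ones, then crop to the bounding box of the bright pixels. This sketch uses that technique with Pillow, assuming the page is lighter than its surroundings; <em>bounding_box.py</em> may work differently (for example with OpenCV), and the function names are hypothetical.

```python
def otsu_threshold(histogram):
    """Otsu's method on a 256-bin grey-level histogram.

    Returns the level that maximises the between-class variance, i.e. the
    split between dark and bright pixels with the most contrast.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_bg = sum_bg = 0
    best_level, best_variance = 0, -1.0
    for level, count in enumerate(histogram):
        weight_bg += count             # pixels at or below `level`
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # pixels above `level`
        if weight_fg == 0:
            break
        sum_bg += level * count
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def crop_to_page(src, dst):
    """Crop the picture at `src` to its bright (assumed: paper) area."""
    from PIL import Image

    with Image.open(src) as img:
        gray = img.convert("L")
        level = otsu_threshold(gray.histogram())
        mask = gray.point(lambda p: 255 if p > level else 0)
        box = mask.getbbox()  # (left, upper, right, lower) of non-zero pixels
        (img.crop(box) if box else img).save(dst)
```

A single stray bright pixel outside the page widens the box, so on a noisy background it can help to blur the image or keep only the largest bright region before cropping.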

### Cropping the mirror
<p>The pages are now cropped, but the mirror is still visible in the middle. This script crops the mirror out.</p>

```bash
python3 mirror_crop.py
```

### OCR
<p>In this step the JPG pages are OCRed and each one is turned into a PDF.</p>

```bash
python3 tesseract_ocr.py
```
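
pytesseract can return a PDF directly through `image_to_pdf_or_hocr(..., extension="pdf")`; tesseract's PDF output keeps the page image with an invisible text layer on top, which is what makes the result searchable. A minimal per-page sketch, assuming the input images come from the <em>cropped</em> folder and the PDFs go to <em>ocred</em> (both assumptions, as is `lang="eng"`). The tesseract binary and the matching language data must be installed.

```python
from pathlib import Path


def pdf_path_for(image_path, out_dir="ocred"):
    """Map e.g. cropped/page_0001.jpg to ocred/page_0001.pdf."""
    return Path(out_dir) / (Path(image_path).stem + ".pdf")


def ocr_to_pdf(image_path, out_dir="ocred", lang="eng"):
    """OCR one page image and write a single-page PDF with a text layer."""
    import pytesseract  # wrapper around the tesseract command-line tool
    from PIL import Image

    with Image.open(image_path) as img:
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(img, lang=lang, extension="pdf")
    out = pdf_path_for(image_path, out_dir)
    out.parent.mkdir(exist_ok=True)
    out.write_bytes(pdf_bytes)
    return out
```

Keeping the page name unchanged (only the extension differs) preserves the zero-padded ordering for the final merge.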

### Merge all the files and create the PDF
<p>The OCRed pages are now joined into the final PDF. Your book is ready :)</p>

```bash
./merge_files.sh
```
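
<em>pdfunite</em>, from poppler, takes the source PDFs first and the destination file last. A hedged Python sketch of this step, assuming the per-page PDFs are in <em>ocred</em> and the result is called <em>book.pdf</em> (neither name is confirmed by this README):

```python
import subprocess
from pathlib import Path


def pdfunite_command(in_dir="ocred", out="book.pdf"):
    """Build the pdfunite call: every source PDF in page order, destination last."""
    pages = sorted(str(p) for p in Path(in_dir).glob("*.pdf"))  # zero-padded names
    return ["pdfunite", *pages, out]


def merge_pages(in_dir="ocred", out="book.pdf"):
    """Join the per-page PDFs into one book with pdfunite."""
    cmd = pdfunite_command(in_dir, out)
    if len(cmd) < 3:  # only "pdfunite" and the destination: nothing to merge
        raise FileNotFoundError(f"no PDFs found in {in_dir!r}")
    subprocess.run(cmd, check=True)
```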

## License
The package is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).