
# DIY Book Scanner Workflow

## Getting started

This set of scripts was written for the Text Laundrette workshop. It is a workflow that turns the pictures taken with the DIY Book Scanner into a final OCRed PDF.

To skip any of the scripts, comment it out in the shell script `workshop_stream.sh`.

## Dependencies

### Brew (macOS) or apt-get (Linux)

On macOS, you’ll need the Xcode command-line tools installed:

```bash
xcode-select --install
```

Then install Homebrew:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Run the following command once you’re done to ensure Homebrew is installed and working properly:

```bash
brew doctor
```

On Linux, install the system dependencies with apt-get:

```bash
sudo apt-get install python3 python3-pip imagemagick poppler-utils tesseract-ocr
```

On macOS, install them with Homebrew:

```bash
brew install python3 imagemagick poppler tesseract
```

### PIP3

```bash
sudo pip3 install pdf2image Pillow opencv-python pytesseract
```

## How to use

Add your pictures from the book scanner to the `scans` folder.


Make the shell scripts executable:

```bash
chmod +x merge_scans.sh workshop_stream.sh merge_files.sh
```

Run `./workshop_stream.sh`.


Wait :)

## Additional information

### Create 5 directories

```bash
mkdir split
mkdir rotated
mkdir ocred
mkdir bounding_box
mkdir cropped
```

### Merge the files in the `scans` directory

All the scans are appended into a single PDF called `out.pdf`.
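
For reference, the merge can be sketched in Python with Pillow (already in the pip dependencies). This is a minimal sketch under assumptions, not the contents of `merge_scans.sh`: the extension list, the natural sort order and the output path `scans/out.pdf` are guesses, and `natural_key`, `list_scans` and `merge_scans` are hypothetical names.

```python
import os
import re

# File extensions treated as scans (an assumption; match what your cameras write).
SCAN_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def natural_key(name):
    """Sort key that orders IMG_2.jpg before IMG_10.jpg."""
    # re.split with a capturing group alternates text chunks (even indices)
    # and digit runs (odd indices), so numbers are compared as integers.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def list_scans(scan_dir="scans"):
    """Return the image paths in scan_dir in natural order."""
    names = [n for n in os.listdir(scan_dir) if n.lower().endswith(SCAN_EXTENSIONS)]
    return [os.path.join(scan_dir, n) for n in sorted(names, key=natural_key)]


def merge_scans(scan_dir="scans", out_path="scans/out.pdf"):
    """Append every scan into one multi-page PDF with Pillow."""
    from PIL import Image  # imported here so the helpers above work without Pillow

    paths = list_scans(scan_dir)
    if not paths:
        raise FileNotFoundError(f"no scans found in {scan_dir!r}")
    # PDF pages need a mode without alpha, so normalise everything to RGB.
    images = [Image.open(p).convert("RGB") for p in paths]
    images[0].save(out_path, save_all=True, append_images=images[1:])
```

Natural sorting matters if your files are numbered like `IMG_9.jpg`, `IMG_10.jpg`, which a plain alphabetical sort puts in the wrong order. This sketch holds every scan in memory at once, so very long books may need a batched approach.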

```bash
./merge_scans.sh
```

### Burst the PDF in `scans`

Burst this PDF into single pages, renaming the files so they can be iterated over in order later.
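
A minimal sketch of the burst step using `pdf2image.convert_from_path` (pdf2image is in the pip dependencies and relies on poppler). The `page_0001.jpg` naming scheme, the input path `scans/out.pdf`, the `split` output folder and the 300 dpi setting are assumptions, not necessarily what `burstpdf.py` does; `page_filename` and `burst_pdf` are hypothetical names.

```python
import os


def page_filename(page_number, total_pages, prefix="page_", ext="jpg"):
    """Zero-pad the page number so alphabetical order equals page order."""
    width = max(4, len(str(total_pages)))
    return f"{prefix}{page_number:0{width}d}.{ext}"


def burst_pdf(pdf_path="scans/out.pdf", out_dir="split", dpi=300):
    """Render each page of pdf_path to a numbered JPEG in out_dir."""
    from pdf2image import convert_from_path  # requires poppler (pdftoppm) on PATH

    os.makedirs(out_dir, exist_ok=True)
    pages = convert_from_path(pdf_path, dpi=dpi)
    for number, page in enumerate(pages, start=1):
        page.save(os.path.join(out_dir, page_filename(number, len(pages))), "JPEG")
    return len(pages)
```

Zero-padding is what makes later iteration safe: `page_0010.jpg` sorts after `page_0002.jpg`, whereas `page_10.jpg` would sort before `page_2.jpg`.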

```bash
python3 burstpdf.py
```

### Rotate the pages

The book scanner photographs the pages rotated, so this script iterates through the odd and even pages and rotates each one back to its original, upright position.
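
The odd/even logic can be sketched with Pillow's `Image.rotate`, which rotates counter-clockwise by the given number of degrees. The angles (+90° for odd pages, -90° for even ones), the folder names `split` and `rotated`, and the helper names are hypothetical placeholders, not taken from `rotation.py`; check a couple of pages from your own scanner before relying on them.

```python
import os


def rotation_for_page(page_number, odd_angle=90, even_angle=-90):
    """Counter-clockwise angle, in degrees, that turns a page upright.

    Both defaults are placeholders: the right values depend on how the
    cameras are mounted, so check one odd and one even page first.
    """
    return odd_angle if page_number % 2 == 1 else even_angle


def rotate_pages(in_dir="split", out_dir="rotated"):
    """Rotate every page image in in_dir and save it under the same name in out_dir."""
    from PIL import Image  # imported here so rotation_for_page works without Pillow

    os.makedirs(out_dir, exist_ok=True)
    # Skip hidden files such as .DS_Store; zero-padded names sort in page order.
    names = sorted(n for n in os.listdir(in_dir) if not n.startswith("."))
    for number, name in enumerate(names, start=1):
        with Image.open(os.path.join(in_dir, name)) as page:
            # expand=True enlarges the canvas so the rotated page is not clipped.
            page.rotate(rotation_for_page(number), expand=True).save(
                os.path.join(out_dir, name))
```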

```bash
python3 rotation.py
```

### Cropping the bounding boxes

The pages are now in their original position, but each image still contains a border around the page (the bounding box). This script iterates through the images and crops each one to the highest-contrast area it finds.
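
One simple way to find the "highest contrast area" is shown below as a pure-Python sketch on a grayscale image stored as a list of rows of 0-255 values. It is a stand-in, not necessarily the method `bounding_box.py` uses (with opencv-python installed, thresholding plus contour detection is a common alternative). The rule (keep every row and column whose brightest and darkest pixels differ by at least `threshold`) and the function names are assumptions.

```python
def contrast_crop_box(gray, threshold=100):
    """Return (left, upper, right, lower) around the high-contrast region of gray.

    A row or column counts as part of the page when its brightest and darkest
    pixels differ by at least threshold; a uniform background does not.
    right and lower are exclusive. Returns None if nothing qualifies.
    """
    rows = [i for i, row in enumerate(gray) if max(row) - min(row) >= threshold]
    # zip(*gray) iterates over the columns of the grid.
    cols = [j for j, col in enumerate(zip(*gray)) if max(col) - min(col) >= threshold]
    if not rows or not cols:
        return None
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(gray, box):
    """Slice gray down to box = (left, upper, right, lower)."""
    left, upper, right, lower = box
    return [row[left:right] for row in gray[upper:lower]]
```

The box uses the same `(left, upper, right, lower)` convention as Pillow's `Image.crop`, so it could be applied to an image converted with `img.convert("L")`. Real photos have noisy backgrounds, so `threshold` needs tuning, and at full resolution a NumPy or OpenCV version will be much faster.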

```bash
python3 bounding_box.py
```

### Cropping the mirror

The pages are now cropped, but the mirror is still visible in the middle. This script crops it out.
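
The README does not say how the mirror is detected, so the following is only a hypothetical sketch on a grayscale list-of-rows image: it assumes the mirror shows up as dark columns (mean brightness below `darkness`) somewhere between 40% and 60% of the page width, and removes those columns. The parameters and the names `find_dark_band` and `remove_band` are placeholders, not taken from `mirror_crop.py`.

```python
def find_dark_band(gray, search=(0.4, 0.6), darkness=60):
    """Return (start, end) spanning the dark columns in the central part of gray.

    Only columns inside the `search` fraction of the width are considered.
    end is exclusive. Returns None if no column there is darker than `darkness`.
    """
    width = len(gray[0])
    start, stop = int(width * search[0]), int(width * search[1])
    # Mean brightness of every column (zip(*gray) turns rows into columns).
    means = [sum(col) / len(col) for col in zip(*gray)]
    dark = [j for j in range(start, stop) if means[j] < darkness]
    if not dark:
        return None
    return dark[0], dark[-1] + 1


def remove_band(gray, band):
    """Drop columns band[0] .. band[1] - 1 from every row of gray."""
    start, end = band
    return [row[:start] + row[end:] for row in gray]
```

If the mirror appears brighter than the page rather than darker, the comparison in `find_dark_band` would need to be flipped.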

```bash
python3 mirror_crop.py
```

### OCR

In this step we OCR the JPG images with Tesseract, turning each one into a PDF.
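
A minimal per-page sketch using `pytesseract.image_to_pdf_or_hocr`, which returns the bytes of a PDF (the page image plus an invisible text layer) when called with `extension="pdf"`. The input and output folders (`cropped`, `ocred`), the `eng` language and the helper names `pdf_path_for` and `ocr_to_pdf` are assumptions; `tesseract_ocr.py` may be organised differently. The Tesseract binary itself must be installed, not just the pytesseract wrapper.

```python
import os


def pdf_path_for(image_path, out_dir="ocred"):
    """Map e.g. cropped/page_0001.jpg to ocred/page_0001.pdf (hypothetical layout)."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return os.path.join(out_dir, stem + ".pdf")


def ocr_to_pdf(image_path, out_dir="ocred", lang="eng"):
    """OCR one page image with Tesseract and write a PDF with a text layer."""
    import pytesseract  # wrapper only; needs the tesseract binary on PATH
    from PIL import Image

    os.makedirs(out_dir, exist_ok=True)
    with Image.open(image_path) as page:
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(page, lang=lang, extension="pdf")
    out_path = pdf_path_for(image_path, out_dir)
    with open(out_path, "wb") as fh:
        fh.write(pdf_bytes)
    return out_path
```

Keeping the zero-padded page name from the earlier steps means the OCRed PDFs still sort in page order when they are merged.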

```bash
python3 tesseract_ocr.py
```

### Merge all the files and create the PDF

The OCRed pages are now joined into the final PDF. Your book is ready :)
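
If `merge_files.sh` uses poppler's `pdfunite` (listed among the dependencies), the step amounts to the call sketched below. The output name `book.pdf`, the `ocred` input folder and the function names are hypothetical. `pdfunite` treats its last argument as the output file, so the command builder always appends it after the sorted page PDFs.

```python
import os
import subprocess


def pdfunite_command(page_pdfs, out_path="book.pdf"):
    """Build the pdfunite argument list: input PDFs in name order, output file last."""
    return ["pdfunite", *sorted(page_pdfs), out_path]


def merge_ocred_pages(pdf_dir="ocred", out_path="book.pdf"):
    """Join every PDF in pdf_dir, in name order, with poppler's pdfunite."""
    pages = [os.path.join(pdf_dir, n) for n in os.listdir(pdf_dir)
             if n.lower().endswith(".pdf")]
    if not pages:
        raise FileNotFoundError(f"no PDFs found in {pdf_dir!r}")
    # check=True raises CalledProcessError if pdfunite exits with an error.
    subprocess.run(pdfunite_command(pages, out_path), check=True)
```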

```bash
./merge_files.sh
```

## License

The package is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).