diff --git a/.DS_Store b/.DS_Store
index a1d8211..b363336 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..3e2694a
--- /dev/null
+++ b/readme.md
@@ -0,0 +1,103 @@
+You’ll need the command-line tools for Xcode installed.
+```bash
+xcode-select --install
+```
+
+Afterwards, install Homebrew.
+```bash
+ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+```
+
+Run the following command once you’re done to ensure Homebrew is installed and working properly:
+```bash
+brew doctor
+```
+
+Install the system dependencies. On Debian or Ubuntu (`pdfunite` ships with `poppler-utils`):
+```bash
+sudo apt-get install python3 python3-pip imagemagick poppler-utils tesseract-ocr
+```
+
+On macOS (`pip3` ships with `python3`, and `pdfunite` with `poppler`):
+```bash
+brew install python3 imagemagick poppler tesseract
+```
+
+### PIP3
+```bash
+sudo pip3 install pdf2image Pillow opencv-python pytesseract
+```
+
+
+## How to use
+Add your pictures from the book scanner to the `scans` folder.
+
+Make the scripts executable.
+```bash
+chmod +x merge_scans.sh workshop_stream.sh merge_files.sh
+```
+
+Run `./workshop_stream.sh`
+
+Wait :)
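
The contents of `workshop_stream.sh` are not part of this diff. As a rough sketch, assuming it simply runs the scripts listed in the readme one after another, a Python equivalent might look like the following. `PIPELINE_STEPS` and `run_pipeline` are hypothetical names, and the step order is taken from the order in which the readme describes them.

```python
import subprocess

# Step order copied from the readme. That workshop_stream.sh runs exactly
# these commands in this order is an assumption.
PIPELINE_STEPS = [
    ["./merge_scans.sh"],
    ["python3", "burstpdf.py"],
    ["python3", "rotation.py"],
    ["python3", "bounding_box.py"],
    ["python3", "mirror_crop.py"],
    ["python3", "tesseract_ocr.py"],
    ["./merge_files.sh"],
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each step in order and stop at the first failure.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for command in steps:
        # With check=True, subprocess.run raises CalledProcessError on a
        # non-zero exit code, so later steps never run on broken input.
        runner(command, check=True)
        completed.append(command)
    return completed
```

Passing a different `runner` makes it possible to dry-run the step list without touching any files.
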
+
+
+## Additional information
+### Create 5 directories
+```bash
+mkdir split
+mkdir rotated
+mkdir ocred
+mkdir bounding_box
+mkdir cropped
+```
+### Merge the files in the scans directory
+All the scans will be appended to one PDF called out.pdf.
+```bash
+./merge_scans.sh
+```
+
+### Burst the PDF in scans
+Burst this PDF into single pages, renaming all the files so they can be iterated over later.
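
`burstpdf.py` itself is not shown here. One possible shape for the burst step, assuming it uses `pdf2image` (which is in the pip list) and writes JPEGs into the `split` directory, is sketched below. The `page_filename` and `burst_pdf` names, the `page_` prefix, and the output format are all assumptions. Zero-padding the page number is what keeps the files in page order when they are sorted by name.

```python
from pathlib import Path


def page_filename(index, total, prefix="page", suffix=".jpg"):
    """Return a zero-padded name so that sorting by name gives page order."""
    width = len(str(total))
    return f"{prefix}_{index:0{width}d}{suffix}"


def burst_pdf(pdf_path="out.pdf", out_dir="split"):
    """Write every page of pdf_path into out_dir as a numbered JPEG."""
    # Imported lazily so page_filename can be used without pdf2image.
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path)  # one PIL image per PDF page
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for number, page in enumerate(pages, start=1):
        page.save(target / page_filename(number, len(pages)), "JPEG")
    return len(pages)
```
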
+```bash
+python3 burstpdf.py
+```
+
+### Rotate the PDFs
+The book scanner photographs the pages in a rotated position. This script iterates through the odd and even pages, rotating them back to their original orientation.
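
The rotation step treats odd and even pages differently. A minimal sketch of that idea with Pillow follows. The readme does not state the actual angles, so the 90 and -90 degree defaults are assumptions, as are the function names.

```python
def rotation_for_page(page_number, odd_angle=90, even_angle=-90):
    """Return the counter-clockwise angle in degrees for a 1-based page number.

    The default angles are placeholders, not the scanner's real geometry.
    """
    return odd_angle if page_number % 2 == 1 else even_angle


def rotate_page(src_path, dst_path, page_number):
    """Rotate one page image and save the result to dst_path."""
    from PIL import Image  # imported lazily so the angle logic has no dependency

    with Image.open(src_path) as image:
        # expand=True grows the canvas so the corners are not clipped.
        rotated = image.rotate(rotation_for_page(page_number), expand=True)
        rotated.save(dst_path)
```
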
+```bash
+python3 rotation.py
+```
+
+### Cropping the bounding boxes
+The pages are now in their original orientation, but they still have a bounding box around them. This script iterates through them and crops each one to the highest-contrast area found.
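
How `bounding_box.py` finds the highest-contrast area is not shown in this diff. A simple stand-in for the idea is to threshold the grayscale values and take the box around every pixel above the threshold. The sketch below uses plain nested lists instead of OpenCV so it runs without dependencies. `bright_bounding_box`, `crop`, and the threshold of 128 are hypothetical.

```python
def bright_bounding_box(gray, threshold=128):
    """Return (left, top, right, bottom) around all pixels above threshold.

    gray is a list of rows of 0-255 values. right and bottom are exclusive,
    the same box convention Pillow's Image.crop uses. Returns None when no
    pixel is above the threshold.
    """
    rows = [y for y, row in enumerate(gray) if any(v > threshold for v in row)]
    if not rows:
        return None
    cols = [x for row in gray for x, v in enumerate(row) if v > threshold]
    return (min(cols), rows[0], max(cols) + 1, rows[-1] + 1)


def crop(gray, box):
    """Cut the box out of the nested-list image."""
    left, top, right, bottom = box
    return [row[left:right] for row in gray[top:bottom]]
```

On a real scan the same box could be passed to Pillow's `Image.crop`.
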
+```bash
+python3 bounding_box.py
+```
+
+### Cropping the mirror
+The pages are now cropped, but the mirror is still visible in the middle.
+```bash
+python3 mirror_crop.py
+```
+
+### OCR
+In this step we OCR the JPGs, turning them into PDFs.
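
For the OCR step, `pytesseract` can produce a searchable PDF directly from an image through `image_to_pdf_or_hocr`. The sketch below assumes the results go into the `ocred` directory and keep the page's file name. `ocr_output_path` and `ocr_to_pdf` are hypothetical names, not taken from `tesseract_ocr.py`.

```python
from pathlib import Path


def ocr_output_path(image_path, out_dir="ocred"):
    """Map e.g. cropped/page_001.jpg to ocred/page_001.pdf."""
    return Path(out_dir) / (Path(image_path).stem + ".pdf")


def ocr_to_pdf(image_path, out_dir="ocred"):
    """OCR one image and write the result as a PDF into out_dir."""
    # Imported lazily so the path helper works without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(image, extension="pdf")
    target = ocr_output_path(image_path, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(pdf_bytes)
    return target
```
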
+```bash
+python3 tesseract_ocr.py
+```
+
+### Merge all the files and create the PDF
+The OCRed pages are now joined into the final PDF. Your book is ready :)
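
`pdfunite` takes its input files first and the output file last. Assuming `merge_files.sh` calls it on the PDFs in `ocred` (the script is not part of this diff), the final merge could be sketched in Python as follows. `pdfunite_command`, `merge_pages`, and the `book.pdf` output name are hypothetical.

```python
import subprocess
from pathlib import Path


def pdfunite_command(pdf_paths, output="book.pdf"):
    """Build the pdfunite call: inputs sorted by name, output file last."""
    inputs = sorted(str(p) for p in pdf_paths)
    if not inputs:
        raise ValueError("no PDF pages to merge")
    return ["pdfunite", *inputs, str(output)]


def merge_pages(in_dir="ocred", output="book.pdf"):
    """Merge every PDF in in_dir into a single output file."""
    command = pdfunite_command(Path(in_dir).glob("*.pdf"), output)
    subprocess.run(command, check=True)
    return output
```

Sorting by name only gives the right page order when the page numbers are zero-padded.
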
+```bash
+./merge_files.sh
+```
+
+## License
+The package is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).