various cross-reading prototypes

█▀▀ █▀▀█ █▀▀█ █▀▀ █▀▀ ░░ █▀▀█ █▀▀ █▀▀█ █▀▀▄ ░▀░ █▀▀▄ █▀▀▀ █▀▀ 
█░░ █▄▄▀ █░░█ ▀▀█ ▀▀█ ▀▀ █▄▄▀ █▀▀ █▄▄█ █░░█ ▀█▀ █░░█ █░▀█ ▀▀█ 
▀▀▀ ▀░▀▀ ▀▀▀▀ ▀▀▀ ▀▀▀ ░░ ▀░▀▀ ▀▀▀ ▀░░▀ ▀▀▀░ ▀▀▀ ▀░░▀ ▀▀▀▀ ▀▀▀

cross-reader (TF-IDF)

(a few notes)

Install

$ pip3 install flask 

$ pip3 install nltk

You also need to download the NLTK 'punkt' package. It provides the tokenizer that splits sentences into lists of words.

$ python3

>>> import nltk
>>> nltk.download('punkt')

Start

Start the local Flask server ...

$ python3 start.py

Browse to your localhost on port 5001 ...

> 127.0.0.1:5001

Txt documents

The search engine uses the index.json file to process results.
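As a rough illustration of how such an index can answer a query, here is a minimal sketch. It assumes the index maps each document name to a dictionary of word to TF-IDF value; the actual layout of index.json and the function name `search` are assumptions, not taken from the project.

```python
def search(index, query):
    """Rank documents by the summed TF-IDF of the query words they contain."""
    words = query.lower().split()
    scores = {}
    for name, weights in index.items():
        # Words missing from a document contribute nothing to its score.
        score = sum(weights.get(word, 0.0) for word in words)
        if score > 0:
            scores[name] = score
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data in the assumed layout: {document: {word: tfidf}}.
index = {
    "a.txt": {"cat": 0.23, "sat": 0.0},
    "b.txt": {"dog": 0.17, "cat": 0.05},
}
print(search(index, "cat"))  # [('a.txt', 0.23), ('b.txt', 0.05)]
```

With a real file, the sample dictionary would be replaced by `json.load(open("index.json"))`, provided the file follows the assumed layout.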

The function 'create_index' can be called to generate this file. It reads a set of plain text files and stores each word together with its TF-IDF value.
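The following is a minimal sketch of what such an indexing step could look like, not the project's actual code. The signature of `create_index`, the helper `tfidf_index`, the regex tokenizer (a stand-in for the NLTK tokenizer) and the JSON layout are all assumptions made for illustration. TF-IDF is computed here as (word count / words in document) * log(number of documents / documents containing the word).

```python
import json
import math
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    # Stand-in for the NLTK tokenizer: lowercase alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_index(documents):
    """Map each document name to {word: TF-IDF value}.

    documents: dict of document name -> raw text.
    """
    tokens = {name: tokenize(text) for name, text in documents.items()}
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for words in tokens.values() for word in set(words))
    n_docs = len(documents)
    index = {}
    for name, words in tokens.items():
        counts = Counter(words)
        index[name] = {
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return index


def create_index(txt_dir="txt", out_path="index.json"):
    # Hypothetical signature; the project's create_index may differ.
    files = sorted(Path(txt_dir).glob("*.txt"))
    documents = {p.name: p.read_text(encoding="utf-8") for p in files}
    index = tfidf_index(documents)
    Path(out_path).write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index
```

Note that a word occurring in every document gets a value of 0, since log(1) = 0. That is the intended effect of the IDF term: words that do not distinguish one text from another carry no weight.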

Changing txt documents

If you want to work with another set of texts, make a 'txt/' folder and add a few txt files to it.

To generate a new index.json file:

Remove the existing index.json file (or rename it if you want to keep it)

$ rm index.json

Stop the Python server and start it again ...

ctrl + c

$ python3 start.py
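The steps above suggest that the server rebuilds index.json on startup when the file is missing. Whether start.py works exactly this way is an assumption; the sketch below only illustrates that load-or-build pattern. The name `load_index` and the `build` callback are hypothetical.

```python
import json
from pathlib import Path


def load_index(path="index.json", build=None):
    """Return the index stored at path, building it first if the file is missing."""
    index_file = Path(path)
    if not index_file.exists():
        if build is None:
            raise FileNotFoundError(f"{path} is missing and no builder was given")
        # No index on disk: build a fresh one and save it for the next start.
        index = build()
        index_file.write_text(json.dumps(index), encoding="utf-8")
        return index
    # An index already exists: reuse it without re-reading the txt files.
    return json.loads(index_file.read_text(encoding="utf-8"))
```

In this pattern, deleting index.json is what triggers a rebuild, which matches the "remove, then restart" procedure described above.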