█▀▀ █▀▀█ █▀▀█ █▀▀ █▀▀ ░░ █▀▀█ █▀▀ █▀▀█ █▀▀▄ ░▀░ █▀▀▄ █▀▀▀ █▀▀
█░░ █▄▄▀ █░░█ ▀▀█ ▀▀█ ▀▀ █▄▄▀ █▀▀ █▄▄█ █░░█ ▀█▀ █░░█ █░▀█ ▀▀█
▀▀▀ ▀░▀▀ ▀▀▀▀ ▀▀▀ ▀▀▀ ░░ ▀░▀▀ ▀▀▀ ▀░░▀ ▀▀▀░ ▀▀▀ ▀░░▀ ▀▀▀▀ ▀▀▀
cross-reader (TF-IDF)
(a few notes)
Install
$ pip3 install flask
$ pip3 install nltk
You also need to download the NLTK 'punkt' package. It provides the tokenizer that is used to split sentences into lists of words.
$ python3
>>> import nltk
>>> nltk.download('punkt')
Start
Start the local Flask server:
$ python3 start.py
Then browse to localhost on port 5001:
> 127.0.0.1:5001
Txt documents
The search engine uses the index.json file to process results.
The function 'create_index' can be called to generate this file. It reads a set of plain text files and stores, for each word, its corresponding TF-IDF value.
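As a rough illustration of what such an index can contain, here is a minimal TF-IDF sketch. It is not the code from tfidf.py: the names `tokenize` and `build_tfidf_index` are made up for this example, a simple regex stands in for the NLTK tokenizer, and the weighting (term frequency times log of inverse document frequency) is one common variant that may differ from the one used in this repository.

```python
import json
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: a simple regex tokenizer stands in for the NLTK tokenizer.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_tfidf_index(documents):
    """Map each document name to a {word: tf-idf value} dictionary.

    documents: dict of {name: raw text}. Hypothetical helper, not create_index.
    """
    tokens = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokens)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    index = {}
    for name, words in tokens.items():
        counts = Counter(words)
        total = len(words)
        # tf = share of the document taken up by the word,
        # idf = log(number of documents / documents containing the word)
        index[name] = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return index


if __name__ == "__main__":
    docs = {
        "a.txt": "the cat sat on the mat",
        "b.txt": "the dog sat on the log",
    }
    print(json.dumps(build_tfidf_index(docs), indent=2, sort_keys=True))
```

With this weighting, a word that appears in every document (like "the" above) gets a value of 0, while words that are specific to one document score higher.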
Changing txt documents
If you want to work with another set of texts, make a 'txt/' folder, add a few .txt files to it, and remove the index.json file (or rename it if you want to keep it).
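To check what a new 'txt/' folder contains before indexing, a small helper along these lines can be used. This is only a sketch: `load_texts` is a hypothetical name, not a function from this repository, and it assumes the files are UTF-8 encoded.

```python
from pathlib import Path


def load_texts(folder="txt"):
    """Return {filename: contents} for every .txt file in the folder.

    Hypothetical helper; assumes UTF-8 encoded plain text files.
    """
    texts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        texts[path.name] = path.read_text(encoding="utf-8")
    return texts


if __name__ == "__main__":
    # Print each file name with its length in characters.
    for name, text in load_texts().items():
        print(name, len(text))
```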
To generate a new index.json file:
Remove the index.json file
$ rm index.json
Stop and restart the Python server:
ctrl + c
$ python3 start.py
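The steps above suggest that the index is rebuilt on startup when index.json is missing. How start.py does this is not shown here, so the following is only a sketch of that pattern: `load_or_create_index` and its `build` argument are hypothetical names, and `build` stands for whatever function produces the index (for example 'create_index').

```python
import json
import os


def load_or_create_index(build, path="index.json"):
    """Load the index from path, or build and save it if the file is missing.

    build: a zero-argument callable returning a JSON-serialisable index.
    Hypothetical helper, not taken from start.py.
    """
    if os.path.exists(path):
        # An existing file is reused as-is, which is why it has to be
        # removed (or renamed) before a new set of texts is picked up.
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    index = build()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False, indent=2)
    return index
```

Under this pattern, renaming the old index.json instead of deleting it keeps the previous index around while a fresh one is written on the next start.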