I tried a few search queries today and my initial impression is that it's way too loose with its results. Even when I directly specify a complete sentence from one pdf, like:
"TMS44100, TMS44100P, TMS46100, TMS46100P 4194304-WORD BY 1-BIT DYNAMIC RANDOM-ACCESS MEMORIES"
It should really only return tms44100.pdf, but it returned 5 pdfs.
There should be a way to understand when something should be loose and when something should be strict.
But thinking through what this software should do, I think we only need really strict search. If I want "Hex Schmitt Trigger", I don't want synonyms or pseudonyms of Trigger.
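To make the loose/strict distinction concrete, here is a minimal standard-library sketch (it does not use Bleve). It assumes "strict" means the query tokens must appear consecutively and in order, and "loose" means any shared token counts as a hit. The file names other than tms44100.pdf and all document texts are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// tokenize lowercases text and splits it on every rune that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// containsPhrase reports whether phrase occurs in doc as a consecutive, ordered run of tokens.
func containsPhrase(doc, phrase []string) bool {
	if len(phrase) == 0 {
		return false
	}
	for i := 0; i+len(phrase) <= len(doc); i++ {
		j := 0
		for j < len(phrase) && doc[i+j] == phrase[j] {
			j++
		}
		if j == len(phrase) {
			return true
		}
	}
	return false
}

// containsAnyTerm reports whether doc shares at least one token with the query.
func containsAnyTerm(doc, query []string) bool {
	seen := make(map[string]bool, len(doc))
	for _, t := range doc {
		seen[t] = true
	}
	for _, t := range query {
		if seen[t] {
			return true
		}
	}
	return false
}

// search returns the sorted names of all documents accepted by match.
func search(docs map[string]string, query string, match func(doc, query []string) bool) []string {
	q := tokenize(query)
	var hits []string
	for name, text := range docs {
		if match(tokenize(text), q) {
			hits = append(hits, name)
		}
	}
	sort.Strings(hits)
	return hits
}

func main() {
	// Hypothetical corpus: only the first entry mirrors the title quoted above.
	docs := map[string]string{
		"tms44100.pdf":     "TMS44100, TMS44100P, TMS46100, TMS46100P 4194304-WORD BY 1-BIT DYNAMIC RANDOM-ACCESS MEMORIES",
		"other-dram.pdf":   "1048576-WORD BY 4-BIT DYNAMIC RANDOM-ACCESS MEMORIES",
		"hex-inverter.pdf": "HEX SCHMITT-TRIGGER INVERTERS",
	}
	query := "TMS44100, TMS44100P, TMS46100, TMS46100P 4194304-WORD BY 1-BIT DYNAMIC RANDOM-ACCESS MEMORIES"

	fmt.Println("loose: ", search(docs, query, containsAnyTerm))
	fmt.Println("strict:", search(docs, query, containsPhrase))
}
```

In this toy corpus the loose matcher also returns other-dram.pdf because it shares words like "DYNAMIC", while the strict matcher returns only tms44100.pdf. Bleve's `NewMatchPhraseQuery` looks like the closest built-in equivalent of the strict variant, though whether it fixes the behaviour seen here is untested.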
https://git.vvvvvvaria.org/varia/go-sh-manymanuals/issues/2#issuecomment-955
> I tried a few search queries today and my initial impression is that it's way too loose with its results. Even when I directly specify a complete sentence from one pdf, like:
> "TMS44100, TMS44100P, TMS46100, TMS46100P 4194304-WORD BY 1-BIT DYNAMIC RANDOM-ACCESS MEMORIES"
>
> It should really only return tms44100.pdf, but it returned 5 pdfs.
> There should be a way to understand when something should be loose and when something should be strict.
>
> But thinking through what this software should do, I think we only need really strict search. If I want "Hex Schmitt Trigger", I don't want synonyms or pseudonyms of Trigger.
I suspect it is due to this naive approach to indexing in https://git.vvvvvvaria.org/varia/go-sh-manymanuals/src/commit/9a8ff220d28e41b8351d187e12c43faded43315f/exp/bleve.go#L87-L89, which is just key = filename, value = plain text contents of file. Do we need to process the content of the file a bit and then generate indexes from that? I'm really not good at information retrieval, maybe someone can help us on this 🤔
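One possible shape for "process the content, then index it" is a positional inverted index: instead of storing the whole text under the filename, every normalized term is recorded with the file and token position where it occurs. The sketch below is standard-library only and is not how Bleve or the linked bleve.go works internally; the `posting` and `index` types and the sample file names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// posting records one occurrence of a term: which file, and at which token position.
type posting struct {
	file string
	pos  int
}

// index maps each normalized term to all of its occurrences.
type index map[string][]posting

// tokenize lowercases text and splits it on every rune that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// add tokenizes the text of one file and records every term with its position.
func (idx index) add(file, text string) {
	for pos, term := range tokenize(text) {
		idx[term] = append(idx[term], posting{file: file, pos: pos})
	}
}

// has reports whether term occurs in file at exactly the given position.
func (idx index) has(term, file string, pos int) bool {
	for _, p := range idx[term] {
		if p.file == file && p.pos == pos {
			return true
		}
	}
	return false
}

// phrase returns the sorted files in which the query terms appear consecutively and in order.
func (idx index) phrase(query string) []string {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	found := map[string]bool{}
	// Every occurrence of the first term is a candidate start of the phrase.
	for _, start := range idx[terms[0]] {
		ok := true
		for offset := 1; offset < len(terms) && ok; offset++ {
			ok = idx.has(terms[offset], start.file, start.pos+offset)
		}
		if ok {
			found[start.file] = true
		}
	}
	files := make([]string, 0, len(found))
	for f := range found {
		files = append(files, f)
	}
	sort.Strings(files)
	return files
}

func main() {
	idx := index{}
	// Hypothetical files and contents, for illustration only.
	idx.add("a.pdf", "Hex Schmitt Trigger inverter")
	idx.add("b.pdf", "Schmitt hex trigger")
	fmt.Println(idx.phrase("Hex Schmitt Trigger"))
}
```

The linear scan in `has` keeps the sketch short; a real index would sort or hash postings. The point is only that keeping positions is what makes strict phrase matching possible at query time.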
Just saw this fly by and it's related: https://github.com/PaperCutSoftware/pdfsearch
One interesting part is that it can match terms from an index and then generate a PDF of the relevant pages on-the-fly for review. Unsure if they also include highlighting the actual text, but that could be possible also. Unsure how that could translate to a terminal environment.
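For the terminal side, one rough idea is to keep the text per page and print the matching lines with ANSI reverse video instead of generating a PDF. This is only a sketch under assumptions: the per-page text is taken as already extracted (the `pages` slice is hypothetical), matching is a plain case-insensitive substring test, and the text is assumed to be ASCII so byte offsets survive lowercasing. It says nothing about how pdfsearch itself does highlighting.

```go
package main

import (
	"fmt"
	"strings"
)

// ANSI escape sequences for reverse video; most terminal emulators support them.
const (
	hlOn  = "\x1b[7m"
	hlOff = "\x1b[0m"
)

// pageHit is one matching line on one page.
type pageHit struct {
	page    int    // 1-based page number
	snippet string // the matching line with the query highlighted
}

// findPages scans per-page text and returns every line that contains query
// (case-insensitively), with the first match on that line wrapped in highlight codes.
func findPages(pages []string, query string) []pageHit {
	var hits []pageHit
	if query == "" {
		return hits
	}
	q := strings.ToLower(query)
	for i, text := range pages {
		for _, line := range strings.Split(text, "\n") {
			lower := strings.ToLower(line)
			at := strings.Index(lower, q)
			// Skip lines where lowercasing changed the byte length (non-ASCII edge case).
			if at < 0 || len(lower) != len(line) {
				continue
			}
			end := at + len(q)
			hits = append(hits, pageHit{
				page:    i + 1,
				snippet: line[:at] + hlOn + line[at:end] + hlOff + line[end:],
			})
		}
	}
	return hits
}

func main() {
	// Hypothetical page texts standing in for extracted PDF content.
	pages := []string{
		"General description",
		"Hex Schmitt Trigger inverter\nPin configuration",
	}
	for _, h := range findPages(pages, "hex schmitt trigger") {
		fmt.Printf("page %d: %s\n", h.page, h.snippet)
	}
}
```

The page numbers returned here could also feed whatever step extracts the relevant pages for review.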
And the code [here](https://github.com/peterwilliams97/pdf-search) uses Bleve too.
Should we ever return to hack again 😆
> Should we ever return to hack again 😆
![2024 is the promise](https://thumbs.dreamstime.com/b/hacking-future-hack-concept-hacker-using-laptop-digital-business-interface-double-exposure-136506720.jpg)