Text search in docx and pdf files

Hi,
I am looking for a node that acts like the F3 (find) function in a pdf document. The purpose is to search for keywords in a bunch of technical documents available as pdf or docx.

Sorry, pressed the return key too early.
I searched the forum and found a hint about the switch node and a post regarding text search in log files. I tried to find something in the flows section, but maybe I used the wrong keywords; English is not my first language.

Any hints available?

Thanks in advance!

You can install the node-red-contrib-pdf-reader node.

Use a file read node, set it to output buffer, run it through the pdf node and attach a debug node.

Both of those file types are binary, not text, so you need something that can either peer inside the binary for embedded text or convert it to something simpler.

docx files are actually largely zipped XML. So you could simply unzip them and search as text, but you would likely get a bunch of hard-to-understand XML code as well. Otherwise, you should look for a node.js library capable of parsing a docx and then use its API to search.
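A rough sketch of the "unzip and search as text" idea, in case it helps. It assumes the docx has already been unzipped by some other means and that the contents of word/document.xml are available as a string; the function names are made up for illustration. It simply throws away the tags so that only the text between them is left, which is crude but usually enough for a keyword search.

```javascript
// Assumption: `xml` is the content of word/document.xml from an unzipped docx.
// Hypothetical helper names; this is not a full docx parser.
function docxXmlToText(xml) {
  return xml
    // Each </w:p> closes a paragraph, so turn it into a line break.
    .replace(/<\/w:p>/g, '\n')
    // Drop every remaining tag and keep only the text between tags.
    .replace(/<[^>]+>/g, '')
    // Decode the predefined XML entities (&amp; last to avoid double decoding).
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// Case-insensitive check whether the keyword occurs anywhere in the text.
function docxXmlContains(xml, keyword) {
  return docxXmlToText(xml).toLowerCase().includes(keyword.toLowerCase());
}
```

Because all the tags are removed before searching, a keyword that Word happened to split across several runs inside one paragraph should still be found.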

Alternatively, use something like the Pandoc CLI tool to convert to simpler and more easily parsed text.
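For the Pandoc route, something along these lines might work from plain Node.js. It assumes the pandoc executable is installed and on the PATH; the function names are hypothetical. Note that Pandoc can read docx but not pdf, so this only covers the Word documents. Inside Node-RED itself, an exec node running the same command would probably be the simpler option.

```javascript
const { execFile } = require('child_process');

// Build the Pandoc arguments: read docx, write plain text to stdout,
// and do not hard-wrap lines (wrapping could split a multi-word keyword).
function pandocArgs(inputPath) {
  return ['-f', 'docx', '-t', 'plain', '--wrap=none', inputPath];
}

// Run Pandoc and hand the converted text to the callback.
// Assumption: `pandoc` is installed and reachable via PATH.
function docxToPlainText(inputPath, callback) {
  const options = { maxBuffer: 64 * 1024 * 1024 }; // allow large documents
  execFile('pandoc', pandocArgs(inputPath), options, (err, stdout) => {
    if (err) {
      callback(err);
      return;
    }
    callback(null, stdout);
  });
}
```

The returned text can then be searched with ordinary string functions.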

@bakman2: thank you, I tried it with 3 different pdfs. I always get the message 'FieldError: Missing payload data'.
There are more pdf nodes available, so I will try something else.

@Totallyinformation: Very useful information, indeed! Will follow this branch of development!

@bakman2: pdf hummus does a much better job

There appear to be a few node.js libraries, such as pdf-parse (on npm).

It should be usable with a function node.
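As a starting point, a minimal sketch of how that could look. It assumes the pdf-parse 1.x style API, where the module exports a function that takes a Buffer and resolves to an object with a `text` property; newer releases may differ, so check the README of the version you install. `countKeywords` and `searchPdf` are made-up names.

```javascript
// Count case-insensitive occurrences of each keyword in a block of text.
function countKeywords(text, keywords) {
  const haystack = text.toLowerCase();
  const counts = {};
  for (const keyword of keywords) {
    const needle = keyword.toLowerCase();
    let count = 0;
    // An empty keyword is treated as "no match" to avoid an endless loop.
    let pos = needle ? haystack.indexOf(needle) : -1;
    while (pos !== -1) {
      count += 1;
      pos = haystack.indexOf(needle, pos + needle.length);
    }
    counts[keyword] = count;
  }
  return counts;
}

// Assumption: pdf-parse 1.x, i.e. pdfParse(buffer) resolves to { text, ... }.
// The module is loaded lazily so the helper above works without it.
async function searchPdf(buffer, keywords) {
  const pdfParse = require('pdf-parse');
  const data = await pdfParse(buffer);
  return countKeywords(data.text, keywords);
}
```

In a function node, msg.payload from the file read node (set to output a buffer) would be the buffer to pass in. `require` is not available there, so the module has to be made available first, for example on the function node's Setup tab when functionExternalModules is enabled in settings.js.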