How to use external libraries?

Hi Community,

I'm trying to use the "mupdf-js" library in a function node (because mupdf offers more info than pdfjs).

To do so, I tried to follow the recommendations given in the thread "Problem importing external libraries to node-red", but without success.

What I did:
a) set up a Docker container with Debian and Node.js version 20.8.1

b) install Node-RED in the container with npm install -g node-red

c) declare Node-RED as a service to be started on container start with systemctl

d) install "mupdf-js" with npm install mupdf-js in the Node-RED folder

e) modify settings.js section "functionGlobalContext" as follows:

    // The following property can be used to seed Global Context with predefined
    // values. This allows extra node modules to be made available with the
    // Function node.
    // For example,
    //    functionGlobalContext: { os:require('os') }
    // can be accessed in a function block as:
    //    global.get("os")
    functionGlobalContext: {
        mupdfJS:require('mupdf-js')
        // os:require('os'),
        // jfive:require("johnny-five"),
        // j5board:require("johnny-five").Board({repl:false})
    },

f) restart NodeRed

g) Create a function node
g1) Declare "mupdf-js" as "mupdfJS" in the "SETUP" tab
g2) in the "On Message" tab :

const mudf = global.get('mupdfJS');
var doc = new mupdf.Document.openDocument('filename.pdf','application/pdf');
msg.payload = doc;
return msg;

When launched, the flow returns
TypeError: Cannot read properties of undefined (reading 'Document')

The logs of the Nodered service only showed:

[error] [function:muPDF] TypeError: mupdfJS is not a function

Any idea what step I'm missing?

Looks like there's a typo in the first line. Shouldn't mudf be mupdf?

Hi @ralphwetzel ,

You're right. This is however only a transcription mistake. You should read

const mupdf = global.get('mupdfJS');
var doc = new mupdf.Document.openDocument('filename.pdf', 'application/pdf');

Thx.

mupdf-js seems to be an ES module. You thus would need to import it - not require!

Having done that, follow the usage shown in the mupdf-js documentation, which is different from what you're doing.

Thanks for highlighting that fact. I'm however still unclear about the syntax to use.

1 - I had already tried using the code snippet from the mupdf-js npm page you mentioned directly in the function node, but without success.

2 - Now, with your hints, I tried it in a slightly different way.

import { createMuPdf } from 'mupdf-js';
var file = msg.payload;  // the pdf file is loaded with prior "read-file" node as a buffered array
const mupdf = createMuPdf();
const buf = file.arrayBuffer();
const doc = mupdf.load(buf);
msg.payload = doc;
return msg;

But I only got "SyntaxError: Cannot use import statement outside a module (body:line 1)".

3 - Same behaviour when declaring mupdf-js as mupdfJS in the Setup tab and replacing the first code line with
import { createMuPdf } from mupdfJS;

4 - Same behaviour when using in settings.js

functionGlobalContext: {
        mupdfJS:import('mupdf-js')

5 - When using

var file = msg.payload;  // the pdf file is loaded with prior "read-file" node as a buffered array
const mupdf = mupdfJs.createMuPdf();
const buf = file.arrayBuffer();
const doc = mupdf.load(buf);
msg.payload = doc;
return msg;

the server crashes!

A way out?

Try adding mupdf-js to function node setup. Name it mupdfJs then see what it contains by doing node.warn({mupdfJs})

It is likely you will see mupdfJs contains createMuPdf as an object or function. Just expand it out in the debug node until you see createMuPdf as a function - that'll be what you use.

And in case it is not clear, import and require are not possible directly in the function node code. That's what the setup tab does.
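To illustrate why putting import('mupdf-js') into settings.js does not behave like require: a static import statement is only valid at the top level of an ES module, and the dynamic form import() returns a Promise rather than the module itself. The sketch below runs in plain Node.js outside Node-RED and uses the built-in node:os module as a stand-in for mupdf-js (an assumption made only to keep the example self-contained).

```javascript
// Dynamic import() returns a Promise, not the module namespace.
// 'node:os' stands in for 'mupdf-js' so the sketch runs without extra installs.
const pending = import('node:os');

// A value stored this way is a thenable, not an object with the module's exports.
const isPromise = typeof pending.then === 'function';
console.log('import() returned a Promise-like value:', isPromise);

// The exports only become usable once the Promise resolves.
pending.then((ns) => {
  console.log('platform() available after resolving:', typeof ns.platform === 'function');
});
```

This is consistent with the advice above: let the Setup tab load the module, and use the name declared there inside the function body.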

Hi @Steve-Mcl,

With node.warn({mupdfJs}); the debug window shows 2 functions associated with mupdfJs:

  • createMuPdf
  • default

When now using

var file = msg.payload;
const mupdf =  mupdfJs.createMuPdf();
const doc = mupdf.load(file);
msg.payload = mupdf.getPageText(doc,0);
return msg;

the server crashes.

I tried several ways to invoke createMuPdf, but always resulting in a server crash.

According to the npm readme, this is async, so you'll need to await it:

await mupdfJs.createMuPdf()

Also note that the module uses WebAssembly. You may need a newer version of Node.js or a flag set to use WebAssembly. I've not tried.

Lastly, if you show us logs from around the time it crashes, we will be able to better help.
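As a minimal sketch of what awaiting changes, the snippet below uses a hypothetical stand-in factory, createFakePdfApi, in place of createMuPdf (the real library is not loaded here). Without await the variable holds a Promise, which has no load() method. With await the resolved object is available.

```javascript
// Hypothetical stand-in for an async factory such as createMuPdf():
// it resolves to an object exposing a load() method.
async function createFakePdfApi() {
  return { load: (bytes) => ({ byteLength: bytes.length }) };
}

// Without await, the variable holds a Promise, which has no load() method.
const notAwaited = createFakePdfApi();
const loadWithoutAwait = typeof notAwaited.load;

// With await (inside an async function, similar to a function node body),
// the resolved API object is available.
async function run() {
  const api = await createFakePdfApi();
  return api.load(Buffer.from('%PDF-1.7'));
}

const result = run();
result.then((doc) => console.log(loadWithoutAwait, doc.byteLength));
```

In a function node the body can use await directly, so only the await keyword in front of the factory call is needed.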

Hi,

I inserted the await statement as follows:

const mupdf = await mupdfJs.createMuPdf();
const file = msg.payload;
const buf = file.arrayBuffer;
msg.payload = buf;
return msg;

It unfortunately resulted in a server crash again. Logs gave:

24 Oct 07:37:53 - [warn] Encrypted credentials not found
24 Oct 07:37:53 - [info] Server now running at http://127.0.0.1:1880/
24 Oct 07:37:53 - [info] Starting flows
24 Oct 07:37:53 - [info] Started flows
24 Oct 07:38:54 - [info] Stopping flows
24 Oct 07:38:54 - [info] Stopped flows
24 Oct 07:38:54 - [info] Updated flows
24 Oct 07:38:54 - [info] Starting flows
24 Oct 07:38:54 - [info] Started flows
TypeError: Failed to parse URL from /root/.node-red/node_modules/mupdf-js/dist/libmupdf.wasm
TypeError: Failed to parse URL from /root/.node-red/node_modules/mupdf-js/dist/libmupdf.wasm
24 Oct 07:38:57 - [red] Uncaught Exception:
24 Oct 07:38:57 - [error] RuntimeError: abort(TypeError: Failed to parse URL from /root/.node-red/node_modules/mupdf-js/dist/libmupdf.wasm). Build with -s ASSERTIONS=1 for more info.
    at process.abort (/root/.node-red/node_modules/mupdf-js/dist/libmupdf.js:9:13762)
    at process.emit (node:events:514:28)
    at emit (node:internal/process/promises:150:20)
    at processPromiseRejections (node:internal/process/promises:284:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)

After some reading on the web, I decided to downgrade the Node.js version from 20.8.1 to 16.20.2. The flow doesn't crash anymore. A good start!
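One plausible reading of the "Failed to parse URL" message, offered as an assumption rather than a confirmed diagnosis, is that the loader hands a bare filesystem path to an API that expects a URL. Node.js 18 and later expose a global fetch by default, while Node.js 16 does not, which would be consistent with the downgrade avoiding the crash. The sketch below only shows the difference between a bare path and a file: URL. It is not a fix for mupdf-js.

```javascript
const { pathToFileURL } = require('node:url');

// Path copied from the log above; the file does not need to exist for this demo.
const wasmPath = '/root/.node-red/node_modules/mupdf-js/dist/libmupdf.wasm';

// A bare path is not a valid absolute URL, so URL parsing throws a TypeError.
let parseError = null;
try {
  new URL(wasmPath);
} catch (err) {
  parseError = err;
}
console.log('bare path parses as URL:', parseError === null);

// Converting the path to a file: URL yields something a URL parser accepts.
const wasmUrl = pathToFileURL(wasmPath);
console.log('protocol after conversion:', wasmUrl.protocol);
```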

However, when starting that flow:

const mupdf = await mupdfJs.createMuPdf();
const file = msg.payload;
const doc = mupdf.load(file);

msg.payload = doc;
return msg;

The "doc" object comes back empty.

Looking at the logs again:

24 Oct 08:26:44 - [warn] Encrypted credentials not found
24 Oct 08:26:44 - [info] Server now running at http://127.0.0.1:1880/
24 Oct 08:26:44 - [info] Starting flows
24 Oct 08:26:44 - [info] Started flows
24 Oct 08:26:50 - [info] Stopping flows
24 Oct 08:26:50 - [info] Stopped flows
24 Oct 08:26:50 - [info] Updated flows
24 Oct 08:26:50 - [info] Starting flows
24 Oct 08:26:50 - [info] Started flows
24 Oct 08:27:33 - [info] Stopping flows
24 Oct 08:27:33 - [info] Stopped flows
24 Oct 08:27:33 - [info] Updated flows
24 Oct 08:27:33 - [info] Starting flows
24 Oct 08:27:33 - [info] Started flows
24 Oct 08:28:25 - [info] Stopping flows
24 Oct 08:28:25 - [info] Stopped flows
24 Oct 08:28:25 - [info] Updated flows
24 Oct 08:28:25 - [info] Starting flows
24 Oct 08:28:25 - [info] Started flows
error: cannot recognize xref format
warning: trying to repair broken xref
warning: repairing PDF document
error: zlib error: incorrect header check
warning: read error; treating as end of file
error: corrupt object stream (2 0 R)
warning: ignoring broken object stream (2 0 R)
error: zlib error: incorrect header check
warning: read error; treating as end of file
error: corrupt object stream (67 0 R)
warning: ignoring broken object stream (67 0 R)
24 Oct 08:29:18 - [info] Stopping flows
24 Oct 08:29:18 - [info] Stopped flows
24 Oct 08:29:18 - [info] Updated flows
24 Oct 08:29:18 - [info] Starting flows
24 Oct 08:29:18 - [info] Started flows
24 Oct 08:31:49 - [info] Stopping flows
24 Oct 08:31:49 - [info] Stopped flows
24 Oct 08:31:49 - [info] Updated flows
24 Oct 08:31:49 - [info] Starting flows
24 Oct 08:31:49 - [info] Started flows

The question now is whether the Buffer provided by the "read file" node contains the raw 8-bit bytes that mupdf expects.

If so, I may be facing an issue specifically related to the mupdf-js library. What do you think?
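One way to check this, sketched under the assumption that the payload might have been read as a UTF-8 string instead of a Buffer: verify that msg.payload is binary and starts with the PDF magic bytes. The helper name looksLikePdf is hypothetical. The second part shows how a round trip through a UTF-8 string alters bytes that are not valid UTF-8, which is the kind of damage that could produce errors in compressed streams.

```javascript
// Returns true when the payload is binary data starting with "%PDF-".
function looksLikePdf(payload) {
  if (!(payload instanceof Uint8Array)) {
    return false; // e.g. a string produced by reading the file as UTF-8
  }
  const headLength = Math.min(5, payload.byteLength);
  const head = Buffer.from(payload.buffer, payload.byteOffset, headLength);
  return head.toString('latin1') === '%PDF-';
}

// Sample bytes: "%PDF-" followed by bytes that are not valid UTF-8
// (0x78 0x9c is a common zlib stream header).
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x2d, 0x78, 0x9c, 0xff]);

// Decoding to a UTF-8 string and encoding again replaces the invalid bytes.
const viaString = Buffer.from(original.toString('utf8'), 'utf8');

console.log('Buffer payload accepted:', looksLikePdf(original));
console.log('String payload accepted:', looksLikePdf(original.toString('utf8')));
console.log('Bytes survive UTF-8 round trip:', original.equals(viaString));
```

A Node.js Buffer is a subclass of Uint8Array, so a Buffer from the file node can be checked this way without conversion.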

What settings are in the file read node?

Have you checked this file can actually be read by the lib outside of Node-RED? (I.e. create a small node project that does the same thing - did it work on that file with current node version?)

Hi,

Actually, the file can be read outside Node-RED, and its text can even be extracted with the Python bindings for MuPDF (named fitz).

Settings in Read File node are as below:

[Screenshot of the Read File node settings]

OK, if you managed to get it working, I'll start all over again with another file and a new Node-RED instance. I'll keep you posted. Thanks.

Hi @Steve-Mcl & @ralphwetzel ,

After quite a lot of struggle, I managed to use mupdf in Node-RED. Let me share the issues I encountered and how I solved them.

(1) The mupdf-js package has a known issue with WASM loading. On my system it threw a parsing error on libmupdf.wasm and crashed my Node-RED instance.
More info: "[FEATURE] Alternative way to instantiate mupdf-js instance", Issue #48, andytango/mupdf-js on GitHub

(2) As recommended in that issue, I switched to the mupdf package.
The function node had to be rewritten as follows:

const file = msg.payload;
const doc = mupdf.Document.openDocument(file, 'application/pdf');
msg.payload = doc.loadPage(0).toStructuredText().asJSON();
return msg;

From that node, I got a JSON string in msg.payload that can then be converted into a JS object with a JSON node.

As a result, I have the complete structured text object of mupdf.
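As an alternative to a separate JSON node, the string can also be parsed with JSON.parse. The sketch below collects the text of each line. The sample string and its field names (blocks, lines, text) are assumptions about the shape of the structured-text output used for illustration only, and extractLines is a hypothetical helper.

```javascript
// Hypothetical sample of a structured-text JSON string; the real output
// may contain more fields or a different layout.
const sampleJson = JSON.stringify({
  blocks: [
    { type: 'text', lines: [{ text: 'Hello' }, { text: 'World' }] },
    { type: 'image' },
  ],
});

// Parse the string and collect the text of every line in every block
// that actually has a lines array.
function extractLines(jsonString) {
  const page = JSON.parse(jsonString);
  return (page.blocks || [])
    .filter((block) => Array.isArray(block.lines))
    .flatMap((block) => block.lines.map((line) => line.text));
}

const lines = extractLines(sampleJson);
console.log(lines.join('\n'));
```

Inspecting the real payload in a debug node first is the safest way to confirm which fields are present before relying on them.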

Hope this may help you (and others).

Thanks again for the hints provided.
