Well, it is effective. It absolutely prevents escaping from the required root folder, and it follows the standard that you always validate user input as close to the source as possible. Replacing .. with an empty string and ensuring there is no leading / (or better still, prepending the absolute root folder to the start of the string) gives absolute certainty that an input cannot escape. And it is pretty easy to do as well.
Working much better with the latest version! Do you think it would be appropriate, if you had two files with the same extension but one in lowercase and one in uppercase:
file1.txt
file2.TXT
to delete them both by converting all file names and patterns to lowercase before processing?
No, most file systems are case-sensitive by design, and assumptions should not be made IMO.
You can, however, use glob patterns to achieve this, e.g. *.{log,LOG}
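For anyone unsure what the braces do, here is a minimal sketch of brace expansion. The helper name expandBraces is made up for illustration and is not part of the node or of any glob library; real glob libraries also handle nesting, escaping and ranges, which this does not.

```javascript
// Expand {a,b,...} groups in a glob pattern into separate patterns.
// Hypothetical helper for illustration only: no nesting, escaping or ranges.
function expandBraces(pattern) {
  const match = /\{([^{}]*)\}/.exec(pattern);
  if (!match) return [pattern];
  const before = pattern.slice(0, match.index);
  const after = pattern.slice(match.index + match[0].length);
  // Recurse so that several groups in one pattern are all expanded.
  return match[1]
    .split(',')
    .flatMap((option) => expandBraces(before + option + after));
}

console.log(expandBraces('*.{log,LOG}')); // [ '*.log', '*.LOG' ]
```

So the single pattern behaves like two patterns, one per spelling of the extension, without touching the case of any other file name.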
Try it out here: Glob Tool | DigitalOcean
Bart, gave it a quick go, something funky going on
This is what I expected: Glob Tool | DigitalOcean
EDIT
This may be partly down to how I entered the paths and being on a Windows system, e.g. when I set the base path using the x\y\z format vs the x/y/z format, I get different results.
I think what is happening is that there is no path normalisation going on. To be platform agnostic, I typically convert all paths to POSIX format and resolve them to ensure sane paths. Also (I didn't check, you may already be doing this) I use Node's path.join to ensure there are no double or missing slashes in the final joined paths.
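A rough sketch of that kind of normalisation, using only Node's built-in path module. The helper names toPosix and joinPosix are invented here, and this is not the node's actual code. Note the assumption that a backslash is always a separator: that is wrong for POSIX file names containing a backslash and for backslash-escaped glob characters.

```javascript
const path = require('path');

// Convert a user-supplied path to POSIX form: backslashes become forward
// slashes, then path.posix.normalize collapses "//", "." and ".." segments.
// Assumption: every backslash is a separator, never an escape character.
function toPosix(inputPath) {
  return path.posix.normalize(inputPath.replace(/\\/g, '/'));
}

// Join a base folder and a relative pattern without doubled or missing slashes.
function joinPosix(baseFolder, relative) {
  return path.posix.join(toPosix(baseFolder), toPosix(relative));
}

console.log(toPosix('x\\y\\z'));                 // x/y/z
console.log(toPosix('x/y//z/'));                 // x/y/z/
console.log(joinPosix('x\\y\\z/', '**/*.log'));  // x/y/z/**/*.log
```

With this, x\y\z and x/y/z end up as the same base path before any pattern is applied.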
A few thoughts that struck me:
The node-name File Retention Manager is not very helpful.
I found it a bit confusing that after installing cleanup-filesystem, searching the palette for cleanup or filesystem did not show it.
Someone in need of a node to delete unwanted files rather than the command line is surely going to be intimidated by having to choose between glob and regex, never mind constructing these filters?
A pattern is not mandatory but age is. Why?
The node seems to ignore the age setting for empty directories:
mkdir /home/pi/stuff/brandnew
Immediately run the flow with age set to 5 min
report.folders includes brandnew
[
"/home/pi/stuff/brandnew",
"/home/pi/stuff/directorywithfiles/emptydirectory",
"/home/pi/stuff/emptydirectory"
]
For setup purposes it might be handy if the report array included the modification time for each file.
The debug pane output would be more compact if the base folder was not shown on each line.
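On the empty-directory point above, this is the sort of check I would have expected to apply to folders as well as files. It is only a sketch: isOlderThan is a made-up helper, and how the node really computes age is not shown in this thread.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// True when the entry's last modification is at least minAgeMs in the past.
// Hypothetical helper: the node's real age logic is not shown in this thread.
function isOlderThan(entryPath, minAgeMs, now = Date.now()) {
  const { mtimeMs } = fs.statSync(entryPath);
  return now - mtimeMs >= minAgeMs;
}

// Demo: a directory created just now should not qualify with a 5 minute age.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'brandnew-'));
const fiveMinutes = 5 * 60 * 1000;
console.log(isOlderThan(dir, fiveMinutes)); // false

// Backdate the directory by ten minutes and it qualifies.
const tenMinutesAgo = new Date(Date.now() - 10 * 60 * 1000);
fs.utimesSync(dir, tenMinutesAgo, tenMinutesAgo);
console.log(isOlderThan(dir, fiveMinutes)); // true

fs.rmdirSync(dir);
```

One thing to keep in mind: a directory's modification time is updated whenever an entry is added to or removed from it, so a folder that has only just become empty will look brand new by this measure.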
Edit - another niggle:
The output in msg.payload should be msg.payload.result. I've spent half an hour trying to find any glob pattern that works, only to realise that msg.payload.deletedFiles is empty because the dry run flag is set, not because the pattern is broken.
I know, it is deletedFiles but still it confused me.
Agree! Even though I know what it is (due to experience and being a programmer), I don't think it is ideal in low-code terms.
Kinda on side here. I am usually of the opinion "choice is good", but I am not sure regex is really warranted in this case (though I am happy Bart defaulted to glob).
@BartButenaers, curious as to why you added it to this "filesystem" category. Do you have other file-type nodes using that category? As you can see, your node is all alone in there.
Perhaps the storage category (alongside the file read/write nodes) would be a better fit? (NB: the storage category is internationalised and therefore more discoverable for non-English speakers.)
Using the glob pattern **/*junk* offers these files for deletion (files with a problem underlined):
- From a Linux perspective I would not expect .junk1 to be discovered by this glob. Compare with
  Edit - But I see that find also shows dotted filenames.
- Using the glob pattern **/?junk*, however, which should discover sjunk1, does not do so.
cf
Try ls -a **/*junk*
By default ls does not show hidden files.
Hmm. Thanks Colin.
Also, path.normalize is useful to resolve those; you can then check that they are still valid (or, in this case, still start with the base dir).
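A minimal sketch of that resolve-then-check idea, using path.resolve and path.relative from Node's path module. The function name resolveInside is invented for this example. It compares via path.relative rather than a plain string startsWith, because a sibling folder such as /home/pi/stuff-other also "starts with" /home/pi/stuff.

```javascript
const path = require('path');

// Resolve a user-supplied path against the base folder and return it only if
// the result is still inside that folder; otherwise return null.
// Hypothetical helper, not the node's actual implementation.
function resolveInside(baseFolder, userPath) {
  const base = path.resolve(baseFolder);
  const target = path.resolve(base, userPath);
  const rel = path.relative(base, target);
  const escapes =
    rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  return escapes ? null : target;
}

console.log(resolveInside('/home/pi/stuff', '../other'));       // null
console.log(resolveInside('/home/pi/stuff', '../stuff-other')); // null
```

Anything that resolves outside the base folder comes back as null, so the caller can reject it before any file is touched.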
Hey Steve,
Thanks for the tip! I was already using path.join, but had not used path resolving before.
I have added it to Github. Hopefully that solves it...
I think this is because during this discussion I changed the node name to 'node-red-file-retention-manager'. So when I enter "file" or "retention" I see it popping up:
That might be the case. Not sure. For me it is very clear, but it might be unclear for others. No idea. That is why I asked for some renaming proposals above...
Personally I like regex much more, but I have added glob because I assumed from the above discussion that it is more widely used for file-related searches. I will add some extra info to the documentation to make the distinction a bit clearer, plus some example patterns to get people started. I have made "glob" the default option, but I am going to keep regex because I like it more. And I am the most important customer for this node.
That was not the intention, because I added a check to make sure a pattern is defined. I need to work tomorrow evening, so I will try to fix it in a couple of days from now.
Hmm, that might indeed be the case. My use case for this node was to remove the video footage from my Reolink doorbell. So the focus is on removing files, and then checking whether to remove the folder that has become empty. Will need to have a look at that also.
Yes indeed, that might be useful. I had not thought about that, because for Reolink the directory structure already contains the year, month and day, so I already had all that info in my report indirectly. Need to have a look at that also...
Yes true. On the other hand if you have multiple instances of this node (each with their own base folder), it is not clear anymore which files have been removed...
Yes, I had added that to the documentation, because @zenofmud was also confused in the beginning. I have already considered calculating deletedFiles and deletedFolders in case of a dry run, but I didn't do that for two reasons:
- Now it is clear to me that the dry-run did not delete anything.
- Otherwise it would be some kind of redundant data in the output msg, because it contains the same info as the length of the msg.payload.report.files and msg.payload.report.folders arrays.
Perhaps I need to explain it more clearly both in the info panel and readme page.
My node has not been complaining about feeling lonely...
Choosing a category is always a creative exercise. There is no global list of categories to choose from, and I don't have time to start searching for all related nodes across the web to figure out which category they have used. Moreover, I don't always like the categories that other developers have chosen, and they often use "function" because they have copied the code from another node and forgot to change it... So I have chosen "filesystem" simply because I liked it.
But yes, I can change it to "storage"...
Thanks for the assistance @Colin !
@BartButenaers - As a test, I let your node loose on my music folder earlier, which contained a large number of sub-directories and .mp3 files, where the names of both contained spaces, such as the folder Alter Bridge and the track 11.Clear Horizon.mp3.
I successfully deleted a number of files & folders using glob (music that I uploaded for my grandson 8 years ago!), which all worked fine!
All of the nursery rhymes are now gone
You seem to be applying the regex pattern to the full pathname of each file, thus the regex .{15}junk.* finds 6 of my test files to delete and ^junk.* finds none.
This seems a little surprising, especially so if at some future time "base folder" can be passed to the node as a message property and is thus of indeterminate length.
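To illustrate what I would expect instead, here is a small sketch that tests the regex against either the file name or the path relative to the base folder. Both helper names and the file junk1 are made up for the example; /home/pi/stuff stands in for the base folder (its 15 characters, including the trailing slash, are what .{15} was skipping over).

```javascript
const path = require('path');

// Test a regex against the file name only, so anchors such as ^ do not
// depend on how long the base folder is.
function matchesFileName(regex, fullPath) {
  return regex.test(path.basename(fullPath));
}

// Alternative: test against the path relative to the base folder,
// with separators normalised to forward slashes.
function matchesRelativePath(regex, baseFolder, fullPath) {
  const rel = path.relative(baseFolder, fullPath).split(path.sep).join('/');
  return regex.test(rel);
}

const file = '/home/pi/stuff/junk1';
console.log(/^junk.*/.test(file));                                    // false
console.log(matchesFileName(/^junk.*/, file));                        // true
console.log(matchesRelativePath(/^junk.*/, '/home/pi/stuff', file));  // true
```

Either way the pattern no longer depends on the length of the base folder. Matching on the relative path keeps the option of filtering by sub-folder, while matching on the file name is the simpler thing to explain.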
Yes indeed you are right. Will need to change it. Thanks for testing!
Hi guys,
Did quite some refactoring, and hopefully most of the feedback is now implemented in one way or another. The readme page is up to date, in case anybody wants to test this new version...
Got some feedback that the current name "node-red-file-retention-manager" doesn't clearly explain what the node does. So I am willing to rename it one more time, to explain more clearly that it is only used to remove files (and optionally folders) that have exceeded a specified age.
So let's do a poll:
- node-red-file-retention-manager
- node-red-file-expiry-manager
- node-red-aged-file-cleaner
- node-red-file-purger
- node-red-aged-file-eraser
- node-red-aged-file-purger
You can select 2 options maximum, and I will close the poll in a couple of days.
Thanks!!
- node-red-rm (to keep the name simple so that the experts will find it)
- node-red-delete-file-by-age (chatgpt helped with this one)
The name with eraser makes me think of the contents of the file being erased as opposed to the file being unlinked. The name with retention leads me to think that it focuses on retaining more than on removing, such as moving the files to another location for more permanent long-term storage.
How about node-red-delete-aged-files?
When I have worked with retention files in a database, what should I look for in NodeRed?
The InfluxDB retention enforcement service checks for and removes
data with timestamps beyond the defined retention period of the
bucket the data is stored in. This service is designed to automatically
delete "expired" data and optimize disk usage without any user
intervention.
What about node-red-file-blaster
Or... node-red-file-remover?