I'm not going to talk about Docker; I've made my personal feelings on the issues it brings known previously. You need to balance those against any benefits.
What I would say is that, if you need resilience - and I suggest that you really do - you need to look at more than a single Pi.
I think that you have two basic approaches. I'm sure other people can chip in with other approaches.
One would be to run a cluster - which would be fine if everything you were running were cluster-aware and so would continue to work if one of the Pis in the cluster fails. However, I don't believe this is very feasible with the collection of software that you are using (which, incidentally, is very similar to mine).
The alternative is to run two independent instances of everything on two separate Pis and then get them to work in concert. This is not hard to do and doesn't require any fancy OS-level software, but it does require some careful planning of how things communicate.
With this approach, you need the Pis to have their own identities/IP addresses, and to set up Node-RED, Mosquitto and InfluxDB on both.
Then you need flows on both that detect whether Node-RED is still able to talk to the outside world. You can use these to auto-reboot the device, which may fix whatever the problem was.
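As a sketch of the kind of check such a flow (or a small script run from cron) could perform - the failure threshold and the idea of counting consecutive failures before rebooting are my own assumptions, not anything prescribed:

```python
import subprocess

REBOOT_THRESHOLD = 3  # consecutive failures before rebooting (assumed value)

def host_reachable(host: str) -> bool:
    """Return True if a single ping to `host` succeeds."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

class FailureTracker:
    """Counts consecutive failures and says when the reboot threshold is hit."""
    def __init__(self, threshold: int = REBOOT_THRESHOLD):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one check result; return True when a reboot is warranted."""
        self.failures = 0 if ok else self.failures + 1
        return self.failures >= self.threshold

# In a real deployment this would run periodically and, when record()
# returns True, trigger e.g. subprocess.run(["sudo", "reboot"]).
```

Requiring several failures in a row avoids rebooting on a single dropped ping.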
You should also have flows from each NR instance that publish an MQTT heartbeat to both instances of Mosquitto, along with a last will and testament so that the broker itself will mark Node-RED as being offline. This is important because it means the other instance of Node-RED can reliably detect when the first goes offline, without any other software or hardware. This lets you send out your alerts - personally, I use Telegram rather than SMS. If you want to get clever, you could use one of the circuits you can find online that will let you cut power and forcibly restart the other Pi, though there are, of course, some risks with that.
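A minimal sketch of the heartbeat/LWT idea, here written against the paho-mqtt client library rather than as a Node-RED flow; the topic layout, payload shape and interval are my own choices for illustration:

```python
import json
import socket
import time

try:
    import paho.mqtt.client as mqtt  # assumed installed: pip install paho-mqtt
except ImportError:
    mqtt = None  # allows the payload helper below to be used standalone

# Assumed topic layout - pick whatever convention suits your flows.
STATUS_TOPIC = f"status/{socket.gethostname()}/nodered"

def heartbeat_payload(online: bool) -> str:
    """Build the retained status message the other Pi watches for."""
    return json.dumps({"online": online, "ts": time.time()})

def run_heartbeat(broker_host: str, interval: int = 30) -> None:
    # paho-mqtt 2.x needs mqtt.CallbackAPIVersion.VERSION2 as the first argument.
    client = mqtt.Client()
    # The LWT: the broker publishes this on our behalf if we vanish without
    # a clean disconnect, so the other instance sees us go offline reliably.
    client.will_set(STATUS_TOPIC, heartbeat_payload(False), retain=True)
    client.connect(broker_host)
    client.loop_start()
    while True:
        client.publish(STATUS_TOPIC, heartbeat_payload(True), retain=True)
        time.sleep(interval)
```

Retained messages mean a subscriber that reconnects immediately sees the last known state rather than waiting for the next heartbeat.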
Then you need to decide how far you want to go with resilience. The above is fairly easy to set up and work with and keeps things simple. It is easy to extend such that both Pis will normally be recording data, should you wish that. But all it really does is give you notifications and let you reboot the offline device.
The next step would be to go for a simple failover approach. In this scenario, you designate a "main" Pi. The alternate monitors the main Pi and, if it goes offline, probably tries to restart it first. If that fails, it runs a script that makes it the "main" device and swaps from running the monitoring flow to the live flow, along with firing up the supporting services. This is probably where Docker can help. You will lose some data with this approach since you haven't got a synced copy of your InfluxDB data. You could probably fix that, but I'm not sure a normal Pi would have the resources to run a suitable Influx cluster, and I don't know how difficult it is to set up. In any case, such data is generally a nice-to-have rather than a critical element. A periodic copy (nightly, maybe) of the Influx data to the backup Pi may help with that.
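A sketch of the shape such a failover script might take; the systemd unit names and the idea of tracking whether a restart has already been attempted are assumptions for illustration, not a recipe:

```python
import subprocess

SERVICES = ["nodered", "mosquitto", "influxdb"]  # assumed systemd unit names

def next_action(main_online: bool, restart_tried: bool) -> str:
    """Decide the backup Pi's next step from the main Pi's state."""
    if main_online:
        return "monitor"        # all fine, keep watching
    if not restart_tried:
        return "restart_main"   # first try power-cycling the main Pi
    return "promote"            # restart didn't help: become the main device

def promote() -> None:
    """Become the main device: start the live services locally."""
    for svc in SERVICES:
        # Hypothetical: assumes the services are installed but left
        # stopped/disabled on the backup until promotion.
        subprocess.run(["sudo", "systemctl", "start", svc], check=False)
    # ...then switch Node-RED from the monitoring flow to the live flow,
    # e.g. by setting a flag the flows read on startup.
```

Keeping the decision logic separate from the actions makes the script much easier to test before you trust it with a real failover.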
Other than getting the failover script right, this again is fairly straightforward to achieve, and again requires no particularly special software or knowledge.
On the ESP hardware side of things, you will want to set them up so that they talk to both MQTT brokers and have flows that detect when they go offline so that you can take action. You should also make them restart themselves once if they can't reach the broker, but don't let them go into a reboot loop as that will kill them quite quickly.
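On an ESP running MicroPython, for example, the "restart once but don't loop" rule might look something like the sketch below (the umqtt.simple client, the broker addresses and the RTC-memory flag are all assumptions; an Arduino sketch would use an equivalent persistent retry flag):

```python
try:
    import machine                       # only exists on the ESP itself
    from umqtt.simple import MQTTClient  # bundled with most MicroPython builds
except ImportError:
    machine = None  # lets the decision logic below run off-device

BROKERS = ["192.168.1.10", "192.168.1.11"]  # hypothetical addresses of the two Pis

def should_restart(connect_failed: bool, already_restarted: bool) -> bool:
    """Restart once on failure, but never enter a reboot loop."""
    return connect_failed and not already_restarted

def connect_any():
    """Try each broker in turn; restart at most once if none are reachable."""
    for host in BROKERS:
        try:
            client = MQTTClient("esp-sensor", host)
            client.connect()
            return client
        except OSError:
            continue
    # Persist the "already restarted" flag across the reset in RTC memory,
    # which survives machine.reset() but not a power cycle.
    flag = machine.RTC().memory()
    if should_restart(True, flag == b"restarted"):
        machine.RTC().memory(b"restarted")
        machine.reset()
    return None
```

On a successful connect you would clear the flag again so the device gets one fresh restart attempt per outage.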
There are lots more things that could be done but this post is already quite long.
You mention a single IP name; to touch on that, I don't think it will help you massively here, since you would really need a reverse proxy to handle proper failover, and that becomes another element that you need to make resilient - it has to run somewhere. You would need to proxy at least Node-RED (http(s) and websockets) and MQTT. Proxying the InfluxDB wouldn't help unless both instances have the same data.
One other thing. Don't forget that it isn't just the Pi and its services that might fail. WiFi is another element that commonly fails. Your Pis should be hard wired (switches rarely fail) and should monitor WiFi availability, reporting on failure. I would also recommend using a separate access point rather than one built into a router.
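As a sketch of the kind of WiFi check a wired Pi could run (the access point address is a hypothetical value, and pinging the AP over the wire only proves the AP is up, not that clients can associate):

```python
import subprocess

AP_ADDRESS = "192.168.1.2"  # hypothetical address of the WiFi access point

def ap_reachable(host: str = AP_ADDRESS, timeout_s: int = 2) -> bool:
    """True if the access point answers a single ping over the wired network."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def wifi_status_message(reachable: bool) -> str:
    """Text to push out (e.g. via Telegram) when the state changes."""
    if reachable:
        return "WiFi AP is up"
    return "WiFi AP is DOWN - check the access point"
```

You would only send the message on a state change, not on every check, to avoid alert noise.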
So to summarise:
- You need two Pis, or at least something else that can run Node-RED and MQTT
- A minimum setup would simply report when the main Pi goes offline. It might trigger a restart of that Pi (e.g. via a SONOFF switch)
- An extension to that would be a failover script that turns the backup into the primary (some Influx data will be lost)
- You can go further but it gets very complex, very quickly.
- You also need to report on WiFi, not just the Pi and its services. Again, you could have a SONOFF or similar switch on your WiFi AP to restart it. Truly resilient WiFi is very hard.
And finally, many Pi-related issues are caused either by poor power supplies or by overloading the Pi. Get a really good power supply and put your router, switches, AP and Pis on a filtered power supply - a PC UPS is ideal, but at least use a protected extension board from a reputable make. On the Pi, disable the GUI desktop and remove any software that isn't really needed.