Seeking Help with Challenges in Visualizing Massive Data

Hello, everyone.
Due to project requirements, I need to create multiple line charts on the dashboard to display, in real time, the values returned by approximately 1500 sensors. Such a large amount of data will inevitably cause problems on the dashboard - excessive load, visual congestion, and so on - so I would appreciate any suggestions for mitigating these issues.

Show us a sketch of what you would like to see. I cannot imagine how you can possibly display that much information at one time.

The above image is just a small portion of the sensor data that I need to visualize, and it’s only a static display using Excel. In the actual scenario, I hope to display more data in real-time. Are there any methods to handle this?

Nothing beats dygraphs for this type of line chart - it can handle hundreds of thousands of data points. I can set up a demo later today.
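In the meantime, this is roughly the shape of it - an untested sketch where the CDN path/version, element id and data layout are just placeholders to adapt:

```html
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/dygraph/2.2.1/dygraph.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/dygraph/2.2.1/dygraph.min.js"></script>
<div id="graphdiv" style="width:100%;height:300px;"></div>
<script>
  // Each row is [Date, series1, series2, ...] - one column per sensor you actually chart
  var rows = [
    [new Date("2024-01-01T00:00:00"), 21.3, 22.1],
    [new Date("2024-01-01T00:01:00"), 21.4, 22.0]
  ];
  var g = new Dygraph(document.getElementById("graphdiv"), rows, {
    labels: ["time", "sensor-001", "sensor-002"]
  });
  // When new data arrives, push it into rows and redraw:
  // g.updateOptions({ file: rows });
</script>
```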

I suspect it depends on whether you can classify the sensors - geographically, or by production line/workflow etc. - and whether you can identify sensors which might need attention, e.g. abnormally low/high values.

You can't possibly display 1500 lines on a chart. How many pixels high is your monitor?

There are websites which visualise massive quantities of data - for example weather reports and aeroplane locations.

And WinDirStat displays a million or more files grouped by location, sized proportionally and colour coded according to some other measure. It's easy to see those big red splodges, and you can click on them to see further details.

In fact, the project I'm currently working on does involve monitoring the temperature of a large machine, using many temperature sensors. I did a quick search with ChatGPT, and it suggested that I could use a template node to bring in specialised big-data visualisation software like Dygraphs and WinDirStat to achieve this goal. Is that correct?

WinDirStat is a Windows native application - it cannot be included in a web page.

Do NOT rely on ChatGPT for things like this - it aims to please by giving credible sounding answers that are nonsense!

You can, however, use ECharts in a ui-template (Dashboard 2.0); ECharts has a treemap chart type.
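As a very rough, untested sketch of the ECharts side (the element id and the zone/sensor data are made up, and loading the library inside a Dashboard 2.0 ui-template works a little differently from plain HTML):

```html
<script src="https://cdn.jsdelivr.net/npm/echarts@5/dist/echarts.min.js"></script>
<div id="treemap" style="width:100%;height:400px;"></div>
<script>
  var chart = echarts.init(document.getElementById("treemap"));
  chart.setOption({
    series: [{
      type: "treemap",
      // One block per zone; the area is proportional to "value",
      // and children let you drill down to individual sensors.
      data: [
        { name: "Zone A", value: 120, children: [{ name: "sensor-001", value: 1 }] },
        { name: "Zone B", value: 300 }
      ]
    }]
  });
</script>
```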

Haha, there's no doubt that ChatGPT is not entirely reliable, and I've never fully trusted its idealistic responses. However, when we're unfamiliar with something, it can more or less provide some suggestions worth trying. :crazy_face:
As for WinDirStat, although I still doubt whether it can achieve the results I expect (the task I have planned is indeed quite demanding), I will give it a try later. Thank you for your reply!

I only gave WinDirStat as an example of displaying large quantities of data graphically. I have no expectation of it being applicable to different kinds of data, nor of embedding it in Node-RED.

Sifting out the outliers in big data before actually displaying anything is more effective, though.

Just a quick reminder that not everything needs Node-RED as the answer. :wink:

If you are open to additional tools, note that Grafana excels (no pun intended!) at this kind of thing. While it is most commonly used with a time-series DB, you can also use MQTT, HTTP or websockets as data sources, which would work just fine in conjunction with Node-RED.
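For example, a function node in front of an mqtt out node could publish each reading on a per-sensor topic that Grafana (or anything else) can subscribe to - a minimal sketch where the topic layout and the incoming message shape are just assumptions:

```javascript
// Assumes incoming messages look like { payload: { sensorId: "s0042", value: 73.2 } }.
// The mqtt out node uses msg.topic when its own topic field is left blank.
const reading = msg.payload;
msg.topic = "factory/temperature/" + reading.sensorId;
msg.payload = JSON.stringify({ value: reading.value, ts: Date.now() });
return msg;
```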

Hi,

Based on the various responses I think one has to rather be practical and use some common sense.

Trying to manage so many sensors in a chart does not make sense.

Rather, one should be asking: what is it that I am trying to detect? Surely it would be better to only detect and display outliers, where a particular sensor has exceeded some boundary.

You are using the temperature sensors not to do run-of-the-mill logging and visualisation, but rather you appear to be trying to see whether the measured variable is staying within expected or prescribed bounds.

It would be far better and easier to define an upper and lower bound for each sensor and then only display those that fall outside of this boundary.

This will apply to all other charting tools as well.

One could use historical data to generate the upper and lower limit profile and then use these to detect or isolate any outliers.
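In a Node-RED function node that could be as simple as this - an untested sketch where the message shape and the context key are just examples:

```javascript
// Assumes messages like { topic: "sensor/123", payload: 73.2 } and that per-sensor
// limits (e.g. derived from historical data) were loaded into flow context beforehand:
// flow.set("limits", { "sensor/123": { lo: 20, hi: 70 } })
const limits = flow.get("limits") || {};
const band = limits[msg.topic];

if (!band) return null;                              // unknown sensor - ignore, or alert if you prefer

if (msg.payload < band.lo || msg.payload > band.hi) {
    msg.outOfRange = true;                           // flag for downstream chart/alert nodes
    return msg;                                      // only outliers reach the dashboard
}
return null;                                         // in-range readings never hit the display path
```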

Anyway, my two pennies' worth.

Cheers

Greg Diana


I would also echo this. As the guy that built Dashboard 2.0, I can recommend it, but I'd be asking a more fundamental question - why are you trying to visualise so many sensors on a single chart, and what question is that chart meant to answer each time you look at it?

Excuse the self-promotion, but I do have an article about this you may find interesting: Designing with Data. Five considerations to take when… | by Joe Pavitt | Medium

In particular, Point 5 which stresses:

Just because you have the data, doesn’t mean you should visualise it


If you're interested in how these weather visualisations are built - I also have an article about that :grin: Developing a Data-Driven Game & Wind Animation with Canvas’, Vue & d3.js | by Joe Pavitt | Medium

In particular this section.

These modern weather dashboards all use canvases and interpolate between the data points they have in order to render pseudo data points in between, e.g. when animating wind direction. All of this was inspired by Cameron Beccario, who built Earth Nullschool because he wanted to learn JS after being made redundant from his Java developer job, and was a weather nerd.
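The interpolation part is conceptually very simple - a rough sketch (not the article's actual code) of generating pseudo points between two known samples before drawing them to a canvas:

```javascript
// Linear interpolation between a and b, t in [0, 1]
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// Expand a sorted list of {x, y} samples into `steps` pseudo points per gap
function expand(points, steps) {
  const out = [];
  for (let i = 0; i < points.length - 1; i++) {
    for (let s = 0; s < steps; s++) {
      const t = s / steps;
      out.push({
        x: lerp(points[i].x, points[i + 1].x, t),
        y: lerp(points[i].y, points[i + 1].y, t)
      });
    }
  }
  out.push(points[points.length - 1]);
  return out;
}

// e.g. draw each (pseudo) point as a 1px dot on a <canvas id="c">
// const ctx = document.getElementById("c").getContext("2d");
// expand(samples, 10).forEach(p => ctx.fillRect(p.x, p.y, 1, 1));
```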


I think this is the key question, but probably the hardest one as well - because sometimes you don't know what the data provides and yet you want to capture it :') I deal with big data (billions of records) professionally, and answering 'the question' is extremely hard when the data is variable; if it is normalised, it becomes somewhat easier.

I also fully agree with this;

Just because you have the data, doesn’t mean you should visualise it

I would go even further - don't even store it all; just store the outliers, as the known stuff is not relevant.
Nice challenges nevertheless :slight_smile:

Personally, I fully understand and agree with your viewpoint that using Node-RED or other tools to monitor such a large volume of temperature data in real-time is neither practical nor useful.
In fact, my colleagues have previously developed a similar watchdog service system using different software to monitor temperature changes in various areas. This fully data-driven approach required us to spend too much time locating specific anomalies in practice, which was inefficient. Therefore, while developing data visualization in Node-RED this time, we are re-discussing the feasibility of visualizing these temperature data.
Besides directly creating a real-time line chart from such a large amount of data, we are also trying to export these data as Excel sheets and charts at regular intervals. However, that approach raises its own questions of timeliness.
Actually, I haven't been using Node-RED for long and lack sufficient depth of understanding. The reason I brought up this discussion is not only to seek advice from others with more experience in Node-RED development, but also to explore whether the Node-RED framework can handle large volumes of data. If it can, that opens up better prospects for large-scale real-time visualisation and, further down the line, digital twins.
In short, I really appreciate everyone's sharing.

I would strongly recommend NOT using Excel for such data; it is inefficient for that. Use a time-series database such as InfluxDB instead. This will let you easily analyse very large sets of time-related data, and it will make time-related analysis, such as hourly average/max/min or anomalies-per-hour measurements, very much easier. It is then easy to chart these using Grafana.

You can still use Node-RED to help manage the incoming data and to orchestrate alerting, etc.
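As a flavour of what that looks like, a function node feeding an influxdb-in (query) node from node-red-contrib-influxdb might build an hourly-aggregation query like this - an untested sketch using InfluxQL (InfluxDB 1.x), with the measurement, field and tag names made up:

```javascript
// Hourly mean/max/min per sensor over the last 24 hours.
// The influxdb in node will run msg.query if its own query field is left blank.
msg.query =
    'SELECT MEAN("value") AS avg_c, MAX("value") AS max_c, MIN("value") AS min_c ' +
    'FROM "temperature" WHERE time > now() - 24h ' +
    'GROUP BY time(1h), "sensor_id"';
return msg;
```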


Node-RED can absolutely handle these workloads, and I echo @TotallyInformation - avoid Excel for this.

If you have to visualise this, then Grafana may be more appropriate, but I would be looking at a use case whereby Node-RED monitors the values and emits an event to a table if a device peaks over a given threshold. Clicking that entry in the table could then take you to a page showing all of the raw data for that particular sensor, rather than rendering all 1,500 in one hit. All of which could be built in Dashboard 2.0.
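Something like this in a function node in front of the table widget would cover the "emit an event" part - an untested sketch where the threshold, message shape and context key are all just examples (check the ui-table docs for the exact payload it expects):

```javascript
const THRESHOLD = 80;                         // example limit, e.g. degrees C
if (msg.payload <= THRESHOLD) return null;    // in-range readings are ignored

// Keep a rolling list of breach events in flow context
const events = flow.get("breaches") || [];
events.unshift({
    sensor: msg.topic,
    value: msg.payload,
    time: new Date().toISOString()
});
const recent = events.slice(0, 100);          // only the 100 most recent rows
flow.set("breaches", recent);

msg.payload = recent;                         // an array of row objects for the table
return msg;
```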

The question of "are there anomalies" is answered quickly and clearly, and you can still deep-dive into the data when required.