As a long-time exporter of flow data, I've run into the issue of data underuse. I usually export from edge devices: WAN routers and firewalls. I've not personally found much use in exporting flow data from core devices, as that's rarely where I need to do flow analysis. The edge devices have been the key in my experience, answering questions such as:
- Who's using all the bandwidth? (see the sketch after this list)
- How long has a given flow been alive?
- What does the traffic mix look like across this link?
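Just to make that first question concrete, here's a minimal sketch of the kind of aggregation I mean. It assumes the collector has already dumped flow records to a CSV with columns `src_addr`, `dst_addr`, and `bytes`; those names are made up for illustration and don't correspond to any particular collector's export format.

```python
# Rough "who's using all the bandwidth?" sketch: sum bytes per source address
# from a CSV of already-collected flow records (hypothetical column names).
import csv
from collections import defaultdict

def top_talkers(csv_path, n=10):
    """Return the n source addresses responsible for the most bytes."""
    bytes_by_src = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            bytes_by_src[row["src_addr"]] += int(row["bytes"])
    return sorted(bytes_by_src.items(), key=lambda kv: kv[1], reverse=True)[:n]

if __name__ == "__main__":
    for src, total in top_talkers("flows.csv"):
        print(f"{src:>15}  {total / 1_000_000:10.1f} MB")
```

Nothing fancy, but it's essentially what every top-talkers view boils down to: group flow records by some key and sum the byte counts.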
What occurs to me is that I usually only look at flow data when I've got a problem, most often when a link is saturated and I need to quickly nail down the culprit. I know that's a waste of interesting historical data, if only I'd take the time to look at it. But there's so much of it to plow through, and there always seem to be more pressing things to do, that I rarely get around to making large chunks of historical flow data digestible.
To kick off a discussion, I'm curious to hear answers to a few questions.
- Other than troubleshooting, how do you make use of your flow data?
- How often do you look at flow data summaries, and in what form (on-demand via HTML form, automated reporting, CLI top-talkers, etc.)?
- Summaries have the side effect of masking more granular data, i.e. smaller flows that might be interesting. In your view, is this a concern, and if so, how do you work around it?
- How old does flow data have to get before it's no longer useful? To keep the SQL database manageable, I was only keeping 7 days' worth of flow data in NTA, assuming it would be very unlikely I'd need to go back further than that. That was true most of the time, but there were times I wished I could dig back further.
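One way I could imagine working around that short retention window is to roll raw flows up into hourly per-host summaries before purging them, so coarse history sticks around much longer than the full-detail records. The sketch below uses SQLite with made-up table and column names purely to illustrate the idea; it is not how NTA actually stores or ages its data.

```python
# Sketch of a rollup-then-purge retention scheme: keep raw flow records for a
# short window, keep hourly summaries much longer. Table/column names are
# hypothetical and only serve to illustrate the approach.
import sqlite3

RAW_RETENTION_DAYS = 7        # keep full-detail flow records this long
ROLLUP_RETENTION_DAYS = 180   # keep hourly summaries this long

def rollup_and_purge(db_path="flows.db"):
    con = sqlite3.connect(db_path)
    with con:
        # Aggregate raw flows older than the raw retention window into hourly buckets.
        con.execute("""
            INSERT INTO flow_hourly (hour, src_addr, dst_addr, bytes)
            SELECT strftime('%Y-%m-%d %H:00', start_time), src_addr, dst_addr, SUM(bytes)
            FROM flow_raw
            WHERE start_time < datetime('now', ?)
            GROUP BY 1, 2, 3
        """, (f"-{RAW_RETENTION_DAYS} days",))
        # Drop the raw records that were just summarized.
        con.execute("DELETE FROM flow_raw WHERE start_time < datetime('now', ?)",
                    (f"-{RAW_RETENTION_DAYS} days",))
        # Expire summaries that have aged past the long retention window.
        con.execute("DELETE FROM flow_hourly WHERE hour < datetime('now', ?)",
                    (f"-{ROLLUP_RETENTION_DAYS} days",))
    con.close()
```

The trade-off is the same one raised in the earlier question about summaries: the hourly rollups keep the big picture around for months, but the small, short-lived flows are exactly what gets averaged away.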