The science of network troubleshooting (updated)
By stretch | Monday, May 19, 2008 at 1:06 a.m. UTC
I'm really sick of hearing about the art of this or the art of that. The "art" of security, the "art" of intrusion detection, the "art" of exploitation (with apologies to Jon Erickson). But what really irritates me is the idea of troubleshooting as an art. Troubleshooting, performed correctly, is essentially the correlation of cause and effect. This is the scientific process at its core, and this is the point I hope to make in a new paper, The Science of Network Troubleshooting.
From the paper:
Consider walking into a dark room. The light is off, but you don't know why. This is the observed effect for which we need to identify a cause. Instinctively, you'll reach for the light switch. If the light switch is on, you'll search for another cause. Maybe the power's out. Maybe the breaker's been tripped. Maybe someone stole all the light bulbs (it happens). Without much thought, you investigate each of these possible causes in order of convenience or likelihood. Subconsciously, you're applying a process to resolve the problem.
I approach troubleshooting as a generic five-step process, developed from my own experience. The paper is written with the network engineer in mind, but its methods are general enough to be applied to a variety of fields.
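As a rough illustration (not taken from the paper), the dark-room example could be sketched in Python. The Cause class, the find_cause function, the likelihood numbers, and the room dictionary are all made up for this sketch; it only shows the idea of testing candidate causes in order of likelihood until one explains the observed effect.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Cause:
    name: str
    likelihood: float               # rough guess between 0 and 1 (made-up values)
    is_culprit: Callable[[], bool]  # test: True if this cause explains the effect


def find_cause(causes: List[Cause]) -> Optional[str]:
    """Test candidate causes from most to least likely; return the first one confirmed."""
    for cause in sorted(causes, key=lambda c: c.likelihood, reverse=True):
        if cause.is_culprit():
            return cause.name
    return None  # every known cause was eliminated; time to look for new ones


# Hypothetical state of the dark room: the switch is on, but the breaker has tripped.
room = {"switch_on": True, "power_on": True, "breaker_ok": False, "bulbs_present": True}

causes = [
    Cause("light switch is off", 0.60, lambda: not room["switch_on"]),
    Cause("power is out", 0.20, lambda: not room["power_on"]),
    Cause("breaker has tripped", 0.15, lambda: not room["breaker_ok"]),
    Cause("light bulbs are missing", 0.05, lambda: not room["bulbs_present"]),
]

print(find_cause(causes))  # prints "breaker has tripped"
```

Sorting by likelihood alone is a simplification. The example in the paper also weighs convenience, which could be folded in by sorting on a combined score instead.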
Posted in Announcements
May 20, 2008 at 3:22 p.m. UTC
I couldn't agree more!
May 22, 2008 at 1:44 a.m. UTC
In defense of "The Art of ...", I liken your approach to troubleshooting to an artist's paintbrush.
Art may not be the best word. Creativity would be a much better fit, but it wouldn't look as good in a book title.
May 22, 2008 at 1:54 a.m. UTC
If that were the case, all paintings would look the same.
The whole point of the paper is that troubleshooting is a systematic process.
May 22, 2008 at 1:56 p.m. UTC
I think the exact opposite. Creative people have always made better paintings than others.
Following a flowchart is the beginning. Being able to combine or omit steps by using more specific tests is the "next step."
May 30, 2008 at 11:31 a.m. UTC
Because not everyone has the same experience and knowledge, troubleshooting has to be considered an Art. If we all had complete knowledge of every device in the chain, then troubleshooting could be distilled into a flowchart set in stone. At a very high level, perhaps you can consider it a scientific step-by-step process, but those steps will be only marginally helpful in solving an actual issue.
May 30, 2008 at 1:26 p.m. UTC
The point of having a scientific approach is that no intimate knowledge of a specific system is necessary. For example, I've never worked with HP switches, but I am confident I can apply the same science (identify effects, eliminate causes, etc.) in troubleshooting one just as I could any other device. I decide which steps to take based on the outcome of prior experiments (science), in a logical order, rather than examining or altering whatever I feel (art).
May 31, 2008 at 4:17 a.m. UTC
I find your "cheat sheets" artistic. Not necessarily art in and of themselves, but artistic nonetheless. Would you agree? Why or why not?
I think the answer is going to be very telling.