How Much Can Go Wrong with a Cross-Connect?
By stretch | Friday, May 11, 2012 at 1:12 a.m. UTC
In data centers, customers are interconnected with carriers and other customers using dedicated physical connections called cross-connects. These can be just about any type of medium, though most cross-connects run inside a single building are copper or fiber Ethernet. In the simplest of cases, a cross-connect can be a single CAT 5e or 6 cable running from a patch panel in one cage to a patch panel in another cage. Very little can go wrong, or so one might think.
I ordered two such Gigabit Ethernet cross-connects recently. Two lines of CAT 6 cable from a cage leased by my company to a customer's cabinet down the hall. Simple. One of the cross-connects came up fine, as usual. The other did not.
This was odd, because the data center tech who installs the cross-connect is responsible for certifying its operation before making it available to the customer. I check it out with my Fluke and see that pairs 3/6 and 4/5 are crossed. No big deal, probably just needs an end re-terminated. I disconnect the patch cables from the panels at either end of the cross-connect so a tech can re-terminate it, open a trouble ticket with the data center, and go on about my day.
The next day, I get an email confirmation that my ticket with the data center has been closed. Awesome. I go to plug the cables back in thinking the issue has been resolved. Same problem as before: a short on 3/6 and 4/5. Annoyed, I call up the data center help desk and ask what's up.
It seems that the technician dispatched to handle my ticket immediately concluded that my cross-connect was not working because "the patch cables were unplugged at both ends."
The data center has a very responsible policy of never disconnecting a plugged-in cable without first contacting the customer it belongs to for verification, which is why I had left the cross-connect patches disconnected. After conveying this logic to the help desk representative, I scheduled a second visit, this time requesting a tech to meet me in the cage at a specific time should any hand-holding be necessary.
The appointment comes around the next week and I head out to the data center. I wait in the cage for about ten minutes, and no one else shows up. I call up the help desk, again, and ask them to send someone out. The tech shows up maybe ten minutes later. In the cage behind mine. I suppose they do all look alike, though only mine has the cage number I gave him ten minutes ago.
The tech makes it into the cage and starts working. Breaks out his Fluke cable tester (one of the fancy models) and runs a TDR test on the cable. The Fluke randomly reboots halfway through the test. He tries again, it reboots again. He leaves to grab another tester. This one completes the test and the cross-connect fails verification.
He dismounts the jack from the panel and finds the original installer neglected to trim the wire ends after punching down the CAT 6 cable. Sloppy work, but an easy fix, and likely the cause of the short. The tech trims the stray wires from both ends of the cross-connect. Hooray, progress! Or so I thought.
The tech goes to mount the jack back in the panel, which is at a difficult angle, and breaks the jack. Oops. He reterminates the cable into a new jack and successfully mates it back to the patch panel. Now we're getting somewhere. I thank him and plug in the patch cable on that end. I head over to the cabinet where the other end of the cross-connect terminates, intent on plugging in the patch at that end as well and calling it a day.
Except I can't open the cabinet door. At some point in the last few days, the electronic locking mechanism on the cabinet door died. The cabinets on either side of this one open fine, as does the door on the opposite side of this cabinet. But not this door. Not the only door I need to open for ten seconds so I can plug in a cable and put this all behind me.
I call up the help desk again. If the same person answered the phone every time, we'd be on a first-name basis by now. I explain my predicament. She says it's the first time she's ever encountered this issue in her employment there. Lucky me. She puts in a high-priority smart hands ticket and I chill for a bit in the pseudo-office space they have for customers.
A different data center tech pops in to verify the problem and cabinet number. I hesitantly inquire as to how they even set about getting open a door with a failed lock, and my fears are confirmed: a crowbar. This explains the bent cabinet doors I see leaning up against cage walls here and there throughout the data center.
I received confirmation this afternoon that my cabinet door issue has been... resolved. But to be completely honest, I'm no longer sure I want to plug in this cross-connect. The amount of hassle this one particular cross-connect has conjured cannot be coincidental. It's as if some divine power is forbidding this cross-connect from ever carrying data. But dammit, I'm determined to win this battle. We'll see how it goes tomorrow.
Posted in Humor
May 11, 2012 at 1:47 a.m. UTC
Ahahaha this seems like such a typical week in a datacenter. Those electronic locks are the worst too. I much prefer the old-school Master locks that 1: have a second spare key and 2: can be cut.
In all honesty it does seem like you've been dealing with some lazy workers. This is much too common in datacenters and represents a huge percentage of the cause of "Sysadmin Rage".
I feel your pain.
May 11, 2012 at 2:32 a.m. UTC
Sadly I get to deal with this on an almost daily basis, as I work for a telecom company that does business all around the Western states.
Fiber, copper, coax, they all suck and pretty much it seems like the level of ability of the remote hands guys and the guys who run cross connects is about zero.
You're lucky you can physically go and see the problem; being on the phone thousands of miles away is horrible.
May 11, 2012 at 4:35 a.m. UTC
Does your hosting provider's name start with an S and end in an s?
May 11, 2012 at 12:25 p.m. UTC
Oh, I could tell you some utter horror stories about dealing with the data center staff at the Telx building at 56 Marietta in Atlanta....
Wiring the wrong cabinet for power, needing some of my cross-connects changed to a different floor of the building and having them cut over the one I needed to stay upstairs, the nightmare of getting a phone line installed for out-of-band management... It finally got to the point where I made sure I was on site for any work they were performing for me so I could validate and test it immediately.
May 11, 2012 at 1:20 p.m. UTC
Does that data center start with a P and end with a zero? Lol
May 11, 2012 at 1:28 p.m. UTC
Last time I had a cabinet get locked like that, I was lucky it was only the front door on the cabinet. We were able to open the back door and reach through to remove the pins from the hinges on the door with the failed lock. I am assuming you are in a DC with a provider that starts with E and ends with X since you reference Smarthands.
May 11, 2012 at 5:50 p.m. UTC
You're too funny. Great post. What's the provider?
May 11, 2012 at 5:51 p.m. UTC
I hate it when some cable monkey is called out to fix a line that you've proven to be at fault, only for them to tell me it's not the line at fault and actually something to do with my switch.
One such case last time I was in Afghanistan was a cable for a VoIP phone. There are errors on the interface, the phone simply displays "network initialising", and if I plug it directly into the switch port using a short patch cable, it works. I book this out to the relevant team explaining why I believe it to be the cable at fault, only to have them come back to me a day later telling me that it's not working because I'd missed the fact it was not getting an IP address. What am I like! Silly me. They didn't even bother testing the line, which was only 50m long and a 1 minute job at worst.
I go back to them to explain that the reason it's not getting an IP address is due to a bad line. I won't bore you with the rest of the story as it's a blog post in itself, but they eventually worm out of replacing the cable on a silly technicality that the user was good enough to accept, and we had to go about a different route to fix it... that meant they still had to install a new cable anyway.
Some of the jobs I've had to go to due to bad installations are soul destroying...
May 11, 2012 at 5:57 p.m. UTC
After being told the problem was the patch cables were not plugged in did you laugh or cry? I think I probably would have done both though not sure in which order. haha
May 13, 2012 at 2:33 p.m. UTC
Reminds me of the time when a large carrier punched down a cross-connect wrong on both ends, causing the 100 Mbit/s connection to work up to a few Mbit/s and to fail miserably above that. Took some weeks of troubleshooting and the good old blame game until they finally checked their own work and found the mistake...
It did not help that their techs were so old-school they insisted on disabling auto-negotiation on anything Ethernet...
May 13, 2012 at 11:16 p.m. UTC
This reminds me of a tech I was working with on Friday. We had a shelf go down in a DSLAM. I dispatched the tech and he insisted that I was looking at the wrong DSLAM and that the shelf was perfectly operational. After arguing with him, I convinced him to power cycle the shelf. After he did, the shelf synced with the head node and returned to service.
Is there a reason that field/cable/install techs don't trust us?
May 15, 2012 at 1:31 p.m. UTC
Aha, I completely understand how it must have felt when something you could have done yourself in a few seconds took a few sloppy hours.
I know the processes are sometimes time-wasting; however, the irony is that we can't bypass them.
May 16, 2012 at 11:41 a.m. UTC
Basic troubleshooting - check the cable. When you're done, check it again. Also, troubleshoot going up the OSI model. When I've skipped layer 1, it's bitten me.
May 18, 2012 at 2:18 a.m. UTC
I just had two cross-connects installed at a major colo provider. One was for a 100 Mbps CenturyLink iQ MPLS circuit (delivered as 1000BaseLX, single-mode fiber) and one was for a 100 Mbps AT&T AVPN MPLS circuit (delivered as 1000BaseSX, multimode fiber). The distance for these cross-connects was about 450 feet from the carrier meet-me-room to our racks.
Both cross-connects were bad.
After a day and a half of troubleshooting with CenturyLink, the CenturyLink tech discovers the colo tech doesn't know how to properly use an OTDR and was actually testing the patch cable, not the full length of the cross-connect. When properly tested, the 1000BaseLX single-mode cross-connect was showing over 50 dB of loss. That much loss at 450 feet means the ends were terminated incorrectly or the fiber was damaged. Since both cross-connects were bad and came from different spools (single-mode and multimode), it was definitely the installation/termination that was the issue.
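To put that 50 dB figure in perspective, here is a rough loss-budget sketch for a 450-foot single-mode run. The per-kilometer attenuation (1.0 dB/km) and per-connector loss (0.75 dB per mated pair) are commonly quoted worst-case allowances that I am assuming here, not values from the colo or the carrier, and the count of two connector pairs is also an assumption.

```python
FEET_PER_METER = 3.28084

def expected_loss_db(length_ft, fiber_db_per_km, connector_pairs,
                     db_per_connector=0.75):
    """Worst-case end-to-end loss estimate for a passive fiber run.

    All default allowances are assumed, illustrative values.
    """
    length_km = length_ft / FEET_PER_METER / 1000
    fiber_loss = length_km * fiber_db_per_km
    connector_loss = connector_pairs * db_per_connector
    return fiber_loss + connector_loss

# 450 ft of single-mode fiber, one mated connector pair at each end (assumed)
budget_db = expected_loss_db(450, fiber_db_per_km=1.0, connector_pairs=2)
measured_db = 50  # figure reported by the carrier tech

print(f"expected worst case: {budget_db:.2f} dB, measured: {measured_db} dB")
print("within budget" if measured_db <= budget_db else "way over budget")
```

Even with pessimistic allowances, the run should lose well under 2 dB, so a reading of 50 dB points at the termination or a damaged fiber rather than at the distance.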
May 18, 2012 at 5:39 p.m. UTC
You know what, I had the same experience with an optical cable. It was flapping; sometimes I'd get replies to only about 60% of my pings.
The cabling guy just didn't show up until a month later, when the connection was completely down.
The cable needed to be cleaned, but it took them a month to come and check on my optical cable. Damn, I hate it so much!!
May 20, 2012 at 6:32 p.m. UTC
Hey, the guy in the picture (YOU DON'T SAY) looks like Castor Troy...
May 23, 2012 at 11:35 p.m. UTC
sounds like Equinix! "smart" hands
May 27, 2012 at 9:01 a.m. UTC
Good Post .....
May 29, 2012 at 4:22 p.m. UTC
Wow, I have hundreds of these stories. Life of a packet pusher. Smart Hands definitely sounds like Equ***x. I am so not a fan of that colo provider. Give me TelX any day. I had an issue with them at 56 Marietta. First, it's hell opening up a ticket with them. I have had multiple occasions where they did not even know their own site. Last issue in ATL, I asked them to run a cross-connect for me and call me once it was installed. I never received a phone call, and when I attempted to turn up the interface, I was getting hard down. Had to open up another ticket with them for troubleshooting the cross-connect. (New ticket, new charge; yes, they freakin charged me again to troubleshoot the cross-connect they installed.) Turns out that they did not have the MMR panel properly labeled. They told me they ran the cable from 17 and 18 on a 48-port panel. Turns out they had 4x12-port panels, marked A B C D, and my circuit was on 5 and 6 on panel B. Took 2 days for them to figure out their own panel labeling. Another time, I ended up working with a tech and told him to pull a red CAT 5 cable. The tech told me that all he sees is an Ethernet cable. (Huh?) How do you work in this field and not realize that these are one and the same? I ended up having to walk the same tech through loading up HyperTerminal, as he had never used it before, nor heard of an RS-232 dongle.
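For what it's worth, the mapping between the two labeling schemes is simple arithmetic. A minimal sketch, assuming the ports are numbered sequentially across the four panels (the function name and defaults are made up for illustration):

```python
def flat_to_panel(port, ports_per_panel=12, labels="ABCD"):
    """Translate a 1-based flat port number into (panel letter, port on panel).

    Assumes panels are numbered consecutively: A holds 1-12, B holds 13-24, etc.
    """
    total = ports_per_panel * len(labels)
    if not 1 <= port <= total:
        raise ValueError(f"port must be between 1 and {total}")
    panel_index, offset = divmod(port - 1, ports_per_panel)
    return labels[panel_index], offset + 1

for flat in (17, 18):
    panel, port = flat_to_panel(flat)
    print(f"flat port {flat} -> panel {panel}, port {port}")
```

Under that assumption, ports 17 and 18 land on panel B, ports 5 and 6, which matches where the circuit actually turned up.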
June 27, 2012 at 1:16 p.m. UTC
Maybe we'd care about our work if we got paid more and weren't doing the work loads of three people.
If you want the cheap guys, you get what you pay for.
April 24, 2013 at 4:38 p.m. UTC
Some days you're the dog; most days you're the fire hydrant...
Episodes like this are what led me into Buddhism. I realized that the coping skills I had developed over the years in response to incidents such as these were very Zen-like in their makeup, so I figured -- "why not? I'm halfway there already."
Seems the standard rebuttal from my Army Tech Controller days has become de rigueur in this industry --
"The problem is leaving here fine."