Working day and night…

Last night, five of us were on deck after a rough wake-up call in the middle of the night. Greg, Pierre-Laurent, Dumè, Seb and I were up from 2 am until 4 am, dealing with the aftermath of a BGP incident.

When BGP breaks down

What sets the internet apart is its ability to reconfigure itself when a connection breaks down. This feature, inherited from the network's early military days, was designed to keep it functional at all times. To achieve this, each router announces the list of networks it manages or can reach to the routers it is connected to: this is BGP (Border Gateway Protocol).
On the night of Tuesday to Wednesday, our network provider OVH carried out maintenance on our routers, which consisted of removing obsolete or unused BGP rules. Gregory Giannoni explains:

"Major network failures have a lot in common with police investigations: it is very difficult to learn the specifics before the case is actually closed. But let’s just say that the cleanup of our routers’ configuration was a little too thorough, so the lines directing traffic towards our networks were deleted, isolating our servers from the rest of the world."

Pierre-Laurent Medori, who still hasn’t had a night's sleep, approves!
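
For the curious, here is a toy sketch in Python of the mechanism described above. It is not real BGP and it is not our actual configuration: the router names, the topology and the prefix 203.0.113.0/24 (a range reserved for documentation) are all made up for illustration. The idea is simply that each router passes the networks it knows about on to its neighbours, and that once the original announcement is removed, nobody knows how to reach the network anymore.

```python
from collections import deque

# Hypothetical topology, for illustration only (not our real network).
LINKS = {
    "our_router": {"provider"},
    "provider": {"our_router", "transit"},
    "transit": {"provider", "visitor_isp"},
    "visitor_isp": {"transit"},
}
PREFIX = "203.0.113.0/24"  # documentation-only range standing in for our network


def propagate(links, originated):
    """Return {router: {prefix: path}} once all announcements have spread.

    links:      {router: set of neighbouring routers}
    originated: {router: set of prefixes that router announces itself}
    """
    tables = {router: {} for router in links}
    queue = deque()

    # Each router starts by knowing only the prefixes it announces itself.
    for router, prefixes in originated.items():
        for prefix in prefixes:
            tables[router][prefix] = [router]
            queue.append((router, prefix))

    # Every router re-announces what it learned to its neighbours.
    while queue:
        router, prefix = queue.popleft()
        path = tables[router][prefix]
        for neighbour in links[router]:
            if neighbour in path:  # never accept a path that loops back
                continue
            candidate = [neighbour] + path
            known = tables[neighbour].get(prefix)
            # Keep the shortest path seen so far (a big simplification).
            if known is None or len(candidate) < len(known):
                tables[neighbour][prefix] = candidate
                queue.append((neighbour, prefix))
    return tables


# Normal situation: our router announces the prefix.
before = propagate(LINKS, {"our_router": {PREFIX}})
print(before["visitor_isp"].get(PREFIX))

# After an overly thorough cleanup: the announcement is gone.
after = propagate(LINKS, {"our_router": set()})
print(after["visitor_isp"].get(PREFIX))  # None: no route left to our servers
```

In the first case the visitor's provider learns a path that leads all the way to our router. In the second case its table has no entry for the prefix at all, which is roughly what "isolated from the rest of the world" means. Real routers apply far richer rules than "shortest path wins", so take this as a picture of the principle only.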

It took an hour, in the middle of the night, to get most of the system operational again. A few connectivity issues still remain for some service providers and countries, and they are currently being resolved. We are still in close contact with our provider's housing support.

Morning maintenance

Yes, we also had a morning maintenance scheduled from 7 to 10 am (GMT+1), which was pure coincidence. We considered postponing it after the complicated night, but it had to go ahead. It went well and was completed by 9:30 am. The two operations are distinct: the maintenance had nothing to do with the breakdown, and it was carried out successfully.
To sum up, we are now back on track, with a stable environment. We are still on the lookout for a few glitches here and there, and we are actively working to stamp them out for good. We will keep you posted once everything is 100% resolved and back to normal.