analysis | dialtone | Life | news | outages | Skype | statistics

Internet Outage Kicks 3.4 Million Skype Users Offline for 50-90 Minutes

Chart showing timeline of the effect of the North American Level 3/TimeWarnerCable Internet outage on 7 November 2011. cc-by Phil Wolff.

More than 3.4 million North American Skype users, about 12% of those online at the time, were affected by an ISP service fumble, with reduced or no access to Skype dialtone for up to 90 minutes today. Phyber Communications reports ISPs appear to have been affected by Juniper routers on Level3 networks, including TimeWarnerCable Internet.

fiction | fun | Life | outages

Seven Fake Reasons the Skype network went down

Fame. Because a rumor Justin Bieber tweeted his Skype name caused a swarm of new Skype downloads, new user accounts, and millions of futile IM and call attempts.

Technology. Because Skype ran out of IPv4 addresses.

Credit. Because American Express ran a credit check when they hired former Skype CEO Josh Silverman to head their US consumer services businesses.

Deals. Because Canadian mobile operator TELUS tried too hard to order a SkypeIn phone number. Not available in Canada.

IPOs. Because Avaya startled Skype’s supernodes when it filed to go public.

Business. Because Microsoft tried changing Skype’s servers from Linux to Windows.

Envy. Because phone companies hired hackers to sabotage the network.

financials | outages | Skype | statistics

Skype network rebounds (CHART)


Skype’s network crashed from 23.7 million active users to 1.8 million today, far more than a small dip. It’s rebounding, with around 12.7 million people now logged in. A spurt of software downloads came with the crash, although Skype’s blog says a fresh download is not necessary. Darned misconfigurations.

downloads | outages | Skype

Download: Skype for Windows 5.3.0.116 Hotfix


Download Skype for Windows 5.3.0.116 Hotfix. It addresses problems some users had connecting to the Skype network. A Mac hotfix is coming soon but Linux users should follow the manual procedure to repair the shared.xml file.

UPDATE: Corrected release number.

analysis | dialtone | outages | Skype | statistics

Did you notice Wednesday’s short Skype login and SkypeOut outage?

Skype’s tweet: https://twitter.com/#!/Skype/status/53231840756109312

I didn’t notice when Skype’s login servers and SkypeOut services went offline but several people Skyped and tweeted it at the time.


Andy Abramson can’t believe Skype’s servers don’t have the redundancy enterprise customers expect. I’m sure all hands showed up.

And yet I’m unable to see an outage in Skype’s stats showing the number of users online. I can only imagine the service interruption was short, the interruption was intermittent, it affected only a few people, or the data we’re seeing is incorrect. 

Dan York wants transparency in the form of prompt, ongoing, and more explanatory communication.

I agree with Dan. Skype’s Heartbeat reports could have been much more specific.

  • What exactly is the problem?
  • What are the symptoms?
  • Does it apply to everyone?
  • How can I tell if it affects me?
  • What caused the problem?
  • What should users do?
  • How long might this continue?
  • Why do you think it happened? (I’m hoping for ninjas).
  • How is Skype responding?

And an after-action report, summing things up.

So, what really happened, Skype?

design | outages | Skype | Technology

Wishlist: Skype Heartbeat Alert

I wish Skype communicated outages and other issues more clearly. Tell me:

  • what’s going on,
  • how it affects me,
  • what I can do about it for now,
  • that you’ll let me know about status and resolution,
  • what happens when the event is over.

Here’s a mockup of a dialog Skype desktop clients could offer.

One of those dialogs we hope never to see, always glad when we do.
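The five items on that wishlist map naturally onto the fields of an alert message. Here is a minimal sketch of what such a message might carry. The class name, field names, and sample text are all hypothetical; this is not a Skype API or a real Heartbeat format.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatAlert:
    """Hypothetical outage alert; one field per wishlist item above."""

    whats_going_on: str
    how_it_affects_me: str
    what_i_can_do: str
    will_notify_on_updates: bool
    after_the_event: str

    def render(self) -> str:
        """Format the alert as plain text for a dialog body."""
        follow_up = (
            "We'll let you know about status and resolution."
            if self.will_notify_on_updates
            else "Check the status page for updates."
        )
        return "\n".join(
            [
                f"What's going on: {self.whats_going_on}",
                f"How it affects you: {self.how_it_affects_me}",
                f"What you can do for now: {self.what_i_can_do}",
                follow_up,
                f"When it's over: {self.after_the_event}",
            ]
        )


# Placeholder wording, made up for illustration only.
alert = HeartbeatAlert(
    whats_going_on="Some users cannot sign in.",
    how_it_affects_me="You may be signed out or unable to place calls.",
    what_i_can_do="Stay signed in if you are connected; retry in a few minutes.",
    will_notify_on_updates=True,
    after_the_event="This notice will close and a summary will be posted.",
)
print(alert.render())
```

Keeping each answer in its own field would make it harder to ship an alert that skips one of the questions.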

Chat with me on Skype. Call me at +1-510-316-9773 (my mobile), follow on twitter @evanwolf (everything) and @SkypeJournal (just the posts). Visit our Skype Journal private technologist roundtable, one of the longest running public Skype chats, where we’re talking about this right now.

analysis | architecture | design | news | outages | Skype | Technology

Complexity: Monoculture or Biodiversity in Software Architecture?

Lars Rabbe, Skype’s CIO, reports today “around 50% of all Skype users globally were running the 5.0.0.152 version of Skype for Windows.” Build .152 is the one that crashed when another part of the network, offline instant messaging, became overloaded. So half of Skype’s users were using other versions and were not immediately affected. This slowed the onset of the outage.

Sven Tigane blogged about the importance of updating just the week before the Great Skype Supernode Crash of 2010. “At Skype, we regularly release updates to our software, and with every update, we introduce new features, improve existing ones and fix bugs. Keeping Skype up to date is the only way you can take advantage of these improvements, and is vital to making sure that Skype remains safe and secure.” Sven is programme manager for Skype for Windows and long time Skype employee; his views matter.

Skype’s deployed client inventory (the software we all use) is highly diverse. Millions of users are on dozens of versions of multiple operating systems with many kinds of hardware and very different network configurations. Simplifying, unifying, getting everyone to use the latest version will make life much easier, faster, and cheaper for Skype’s customer support, security, network administration and software development teams.

Absolute uniformity is not ideal. Here’s Aswath Rao answering “Was the Skype outage preventable?” on Quora:

Realistically, no system is totally fail safe. One can only design to reduce the chances and minimize the impact. The problem with the current design and market condition is that most of the supernodes will be running the same OS and the same version of the app. At this stage, Skype can easily deploy their own supernodes, but maintain the P2P architecture as a way to handle growth, failover etc. They just have to make sure that the supernodes run diff OSs and they upgrade their app in a staggered fashion. In other words, they need to come up with an operational plan.

Aswath is saying Skype is suffering from monoculture: too much uniformity.

Like farming wheat in massive farms, you want the efficiency that comes with having your whole crop be one species. You get better deals on seed, only have to think about the nutrient needs of the one species, and you can harvest your whole crop at the same time. Following the simile: it’s cheaper to make the software, less complex so more reliable, and improvements propagate to your whole network.

Monoculture has its downsides. One pest can ruin your entire crop. We saw that problem in both of Skype’s major outages. Small code or settings adjustments in a popular version of the program spread like wheat rust. When they create problems, those spread quickly too.

Planned biodiversity is a response. In farming you introduce slightly different strains of your crop. In software and networks, you keep multiple versions going in the field. You keep multiple strains of protocols and architectures running at the same time. So problems, when they spread, hit limited populations. When some varieties are affected by a blight or an attack, and others aren’t, you know those differences can lead you to the disease or infestation vectors.
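One way to picture the staggered upgrades Aswath describes is to sort nodes into rollout waves, so a bad build reaches only part of the field before anyone notices. The sketch below is my own illustration, not how Skype distributes updates. The function names, the node identifiers, and the choice of four waves are all assumptions.

```python
import hashlib


def rollout_wave(node_id: str, waves: int = 4) -> int:
    """Map a node to a stable wave number in [0, waves) by hashing its id."""
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % waves


def should_upgrade(node_id: str, current_wave: int, waves: int = 4) -> bool:
    """A node takes the new build once the rollout has reached its wave."""
    return rollout_wave(node_id, waves) <= current_wave


# Hypothetical node ids, used only to show how the population splits.
nodes = [f"node-{i}" for i in range(10_000)]
for wave in range(4):
    upgraded = sum(should_upgrade(n, wave) for n in nodes)
    print(f"wave {wave}: {upgraded / len(nodes):.0%} of nodes on the new build")
```

Because the wave comes from a hash of the node's id, each node lands in the same wave every time, and a blight in the new build is capped at whatever share of the field has been reached so far.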

So here’s to keeping your last version of Skype around. And waiting a day or two for updates.

What tools did you try when Skype went down? Did you come back to Skype? Chat with me on Skype. Call me at +1-510-343-5664 (Google Voice), follow on twitter @SkypeJournal (just the posts) and @evanwolf (everything). Visit our Skype Journal private technologist roundtable, one of the longest running public Skype chats, where we’re talking about this right now.

photo by NightThree via Wikimedia commons.

corrections | dialtone | outages | P2P | Skype | statistics | Technology

1.4 million Skype supernodes crashed (chart)(video)

UPDATE:

From Julian Cain at 1:33 PM Pacific.

Hi Phil. I’d like to clear up the node to supernode relationship. Each supernode determines its client connection list to be full at 350 active tcp connections. This means a supernode is considered at maximum capacity at roughly ~350 client connections however they will and do handle more client connections up to a degree depending on system variables. ATM there are ~58K supernodes "active" and ~116K "idle". My data during the outage listed clients and supernodes however I truncated the dataset to begin at about ~1.5 MM nodes. The decline is linear with online supernodes and clients and represents a "live" graph of the overlay undergoing "segmentation" due to a failure in the main supernode backbone.

CORRECTIONS:

First, while Cain’s data is correct, I mislabeled it. These are 1.4 million nodes, not supernodes. By 3:46PM Eastern in the US, the Skype network crash had been going on for many hours. So you’re seeing the last leg of the collapse. Cain labeled the data correctly; it was my transcription error.

Second, I should have recognized the error. I was confused by two things. All the other information and videos focused on supernode behavior, so I was myopic. I completely missed out on the time zone differences so I didn’t notice that this data really fit into a different place on the event timeline.

Thanks to several folks at Skype who urged me to dig a bit deeper and to anonymous commenter Chupacabra who wrote “1.4 million supernodes? Thats rubbish!”

My response to Chupacabra:

@Chupacabra, I thought that at first too. And you may be right since nobody at Skype is talking. So let’s do some back of the napkin calculations. In the olden days, say 2005 or so, it was said that a supernode could support between 500 and 1000 active nodes easily. On that fateful day last week, there were about 25 million accounts online when things began to go wrong. Very few people have more than one copy of Skype running, so let’s leave that to rounding error for the moment and say there were 25 million nodes running. So 1000 goes into 25,000,000 twenty-five thousand times. 50K supernodes if we say each supports only 500 nodes. 25 to 50K feel right to me.

Are we dividing into the right population? Skype has about 150 million active users over the course of a few weeks, about six times peak levels. So that would make the number of supernodes about 150K to 300K. Still far short of 1400K.

1.4 million supernodes is not what we’d expect.
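The napkin math is easy to check. Here is a small sketch that reruns it with the figures quoted above (500 to 1000 nodes per supernode, 25 million online at peak, 150 million active users). The function and constant names are mine.

```python
def supernodes_needed(nodes: int, nodes_per_supernode: int) -> int:
    """Supernodes required to serve a population, rounding up."""
    return -(-nodes // nodes_per_supernode)


PEAK_ONLINE = 25_000_000      # accounts online when things went wrong
ACTIVE_USERS = 150_000_000    # active users over a few weeks
REPORTED = 1_400_000          # the figure under dispute

for label, population in (("peak online", PEAK_ONLINE), ("active users", ACTIVE_USERS)):
    low = supernodes_needed(population, 1000)
    high = supernodes_needed(population, 500)
    print(f"{label}: {low:,} to {high:,} supernodes")

# What 1.4 million supernodes would imply for the peak population.
implied = PEAK_ONLINE / REPORTED
print(f"1.4 million supernodes would mean about {implied:.1f} nodes each")
```

That prints 25,000 to 50,000 for the peak population, 150,000 to 300,000 for all active users, and about 17.9 nodes per supernode if the 1.4 million figure were right.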

So,

- Has the math changed? If Cain’s observations were correct, does a supernode now support, say, just 20 other nodes? That would be a massive drop in efficiency.

- Did Cain observe something else, not a supernode but, perhaps, a node that was capable of becoming a supernode?

- Did Cain only find nodes – not supernodes – that were visible from his vantage point in the network? A large but limited subset?

- Did Cain make a fundamental mistake in data collection and processing?

- Did I report it wrong?

I’d like to test his rig, but his response to a request for an interview:

“Hi, Thanks for the offer but I am not speaking to the press for the foreseeable future. Regards, Julian”


Almost 25 million accounts were online when Skype’s 1.4 million supernodes started leaving the cloud. It took 330 minutes for 98% of the supernodes to go offline, cutting off nearly all Skype users. Researcher Julian Cain set up a UDP probe to look at the Skype network as it crashed last week. At the bottom of the blackout, Cain demonstrated how his reverse engineered Skype client attempted to connect to the network and was rejected, like all the other Skype clients struggling to reconnect.

98% of Skype Supernodes Crash Over 5.5 Hours

The chart shows 98% of the Skype supernodes leaving the Skype network over 5.5 hours (data). Cain puts minimum viability for the p2p mesh at about 75K nodes.
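For a rough sense of the pace, here is a sketch that treats the collapse as a straight line, which is how Cain describes the decline in his note. It starts from the 1.4 million nodes in the dataset (nodes, per the correction above), loses 98% over 330 minutes, and asks when the count would cross Cain's 75K viability floor. The straight-line model and all names are my assumptions, not Cain's data.

```python
START_NODES = 1_400_000   # nodes at the start of the dataset
FRACTION_LOST = 0.98      # share that went offline
DURATION_MIN = 330        # 5.5 hours
MIN_VIABLE = 75_000       # Cain's estimate for a viable p2p mesh

# Average loss per minute if the decline were perfectly linear.
rate_per_min = START_NODES * FRACTION_LOST / DURATION_MIN


def nodes_online(minute: float) -> float:
    """Nodes still online at a given minute under the linear model."""
    return max(START_NODES - rate_per_min * minute, 0.0)


minutes_to_threshold = (START_NODES - MIN_VIABLE) / rate_per_min

print(f"average loss: {rate_per_min:,.0f} nodes per minute")
print(f"left at minute {DURATION_MIN}: {nodes_online(DURATION_MIN):,.0f}")
print(f"crosses {MIN_VIABLE:,} at about minute {minutes_to_threshold:.0f}")
```

Under those assumptions the network sheds a bit over four thousand nodes a minute and dips below the viability floor shortly before the end of the window.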

Cain shot a short video after the network crash. He is showing his “fully functioning 3rd party Skype peer-to-peer stack during the global overlay outage.”

10:42 PM Eastern, still unable to connect with Skype. He shows traceroutes to Luxembourg supernodes operated by Skype, mostly pooled within the same IP ranges. Here’s a list of hard-coded Skype IP addresses from earlier this year. He shows his own reverse engineered, third-party implementation of a Skype client. “It’s the first ever.” He uses Skype’s login server as a bootstrap supernode. “Here I perform a UDP probe of all of the supernodes that ship with the Skype binary to check responsiveness.” He makes his own SkyLib (a core Skype messaging component) and pings the Skype mega-supernode. He receives a NACK (negative acknowledgement) from the Skype supernode and connects to the network. The network drops his TCP/IP connection, just like any other Skype client in the outage.

Skype promised to report on its internal post-mortem this week. Let’s hope we get this level of disclosure.

More from Skype Journal on the outage:

What tools did you try when Skype went down? Did you come back to Skype? Chat with me on Skype. Call me at +1-510-343-5664 (Google Voice), follow on twitter @SkypeJournal (just the posts) and @evanwolf (everything). Visit our Skype Journal private technologist roundtable, one of the longest running public Skype chats, where we’re talking about this right now.

outages

Skype Outage Coverage: What The Tech (12/22/2010)

The Guys From Queens Podcast Network team switched to Microsoft Live Messenger 2011 and Google Talk to get through the outage. “Nobody has delivered for our needs, anyway, the quality and stability that Skype is normally famous for.” Tom: “My business relies on it.”

They talk through alternatives and why they use Skype for their podcasts. They make fun of the Cisco umi. From the start through minute 14:00.

analysis | Business | marketing | outages | Skype | Technology

Quora Questions the 2010 Skype Blackout

Quora is hosting questions about Skype’s pre-Christmas 2010 outage. See how people are answering, clarify the questions, answer a few or, please, ask better questions.

Explain what happened:

What if?

What’s next?

What questions lead us to the truth? To a better understanding? To action? Chat with me on Skype. Call me at +1-510-316-9773, follow on twitter @SkypeJournal (just the posts) and @evanwolf (everything). Visit our Skype Journal private technologist roundtable, one of the longest running public Skype chats, where we’re talking about this right now.

community | complaints | outages | Skype | Twitter

Complaining is your duty, not just a right


Quora asks:

“Is it reasonable to complain about a free service being down? Why or why not? A lot of people complaining about it being down, can you really complain about something that you don’t pay for? In my opinion of course you can be annoyed but complaining to the company about it? I’m afraid this is the risk you take when you decide to use a service that you don’t financially support.”

Complaining about the outage of a free service is not just a right, it’s a duty.

  1. Complaining is feedback to the host of the free service that an outage matters. Seriously, there are sites that could go down for a week and nobody would notice.
  2. Complaining signals fellow users of a problem’s existence. We don’t all use a service at the same time. Skype, for example, rarely has more than one sixth of its active users logged in. So those who experience a service interruption spread the warning to the rest of the users.
  3. Complaining builds community. You are not alone in the dark during this outage; you are sharing that experience with others.
  4. Complaining characterizes the interruption’s scope. It is useful to know when the problem is local (and if it affects you); in Skype’s outages, among the first alerts are "I’m having trouble in Singapore" and "Skype 5 is not loading properly in Florida."
  5. Complaining adds the emotional overlay to contextualize the problem. Did you miss a service because it was inconvenient or a little slow? Did the downtime threaten your safety, livelihood, or freedom? Has it cost you money? Do you feel betrayed? All the stakeholders need to understand why customers care, individually and collectively.
  6. Complaining triggers survival instincts among users. People may swarm to help a service (like throttling down consumption if a service is overwhelmed or upgrading to a new client), seek out temporary solutions (working offline, downshifting from group video calls to group audio calls), or design exit strategies (sharing how to extract your information assets, how to migrate to alternative services, how to notify your contacts that you’re moving).
  7. Complaining about an outage can provide useful data for recovery from the outage. Sometimes.
  8. Most people never complain. So complainants speak for the user majority.
  9. Complaining defines this moment in history. Our kvetching about service interruptions informs a service’s reputation and is part of its legend.

Complaining is a productive contribution to business, social and technological ecosystems. Services like GetSatisfaction try to channel complaining, as do other kinds of feedback analytic tools.

Beyond duty to your community, kvetching can feel exquisite. In many cultures, you can heal by getting your strong feelings of frustration and disappointment off your chest. Is complaint a therapy for long term service separation anxiety? Let’s leave that to the Fail Whale.

Photo credit: “rage” cc-by-sharealike by how will i ever, December 10, 2008. 

I’m always open to complaints. Chat with me on Skype (when it’s back up for you). Call me at +1-510-343-5664 (Google Voice), follow on twitter @SkypeJournal (just the posts) and @evanwolf (everything). Visit our Skype Journal private technologist roundtable, one of the longest running public Skype chats, where we’re talking about this right now.

dialtone | news | outages | Skype | statistics

The Skype Blackout of 2010 is Over (chart)(recap)

All systems are back online, according to today’s Skype blog update. Here’s a link to the full size of the chart below.


Skypers were active the week before the crash, near Skype’s all-time dial tone high-water mark, peaking near 25 million accounts logged in. You can see from the inset in the chart that activity stayed high right through the previous weekend.

Skype’s CIO texted CEO Tony Bates at 4pm Wednesday London time that the network was crashing. Assess, assess, assess. Troubleshoot, troubleshoot, troubleshoot. P2P team steals servers from other departments. Thousands of MegaSuperNodes to the rescue. Core network comes back steadily. Logins throttled and other Skype services turned off. Network continues to come back online. Other departments get their servers back (or near enough) and group video, presence, Skype Manager, and offline instant messaging come back online. Most of the Skype team rushes home for Christmas or Shabbos. Users start their Christmas calling. Fah who foraze! Dah who doraze!

Serious critiques inside and outside of Skype start next week after the eggnog wears off.


Do you have great stories about your holiday Skyping? Pictures? Chat with me on Skype (when it’s back up for you). Call me at +1-510-343-5664 (Google Voice), follow on twitter @SkypeJournal (just the posts) and @evanwolf (everything). Visit our Skype Journal private technologist roundtable, one of the longest running public Skype chats, where we’re talking about this right now.

dialtone | fun | outages | TonyBates

Skype CEO: We apologize and will top you up. Caption contest.

Skype CEO Tony Bates is back on YouTube and Skype’s blog.

  • Number of people logged in with Skype dial tone: 90% of normal. #skypemebaby
  • Offline IM: still unavailable. #Groan
  • Group video calling: still unavailable. #ooVoo
  • Skype rules out malicious causes for the outage. #wheresJulianAssange
  • Skype will send 30 minute anywhere-landline calling cards to Pay As You Go and Pre-Pay customers. #iwannabeabillionaire
  • Skype will extend active subscriptions for a week. #AndOnTheSeventhDaySkypeRested

Caption Contest.

Twenty words or less, no swearing, and no sex-related captions please!

Skype CEO Tony Bates reporting on the 2010 outage

US winner gets a freetalk everyman headset, made for Skyping. Skype and contractor employees may play but aren’t eligible for prizes. Leave your captions in the comments or tweet to @SkypeJournal.

Business | dialtone | outages | SkypeKit | Technology | wishlist

Wishlist: Enterprise MegaSuperNode Appliances

Do you have 100 employees using Skype at one location?

Worried you won’t have enough Skype supernodes to go around?

Worry no more! Now you can buy the Reef9 Node Master for only $495. Just plug it in outside your firewall and watch it go. It will spin up dozens of Skype supernodes. Near your users and always on, so you always have the best access to the Skype network money can buy for just pennies a day. For an extra $45 per year, Reef9 will update your Skype clients with the latest in Skype P2P technology.

It’s fiction, at the moment.

But not unreasonable.

It’s the kind of solution SkypeKit was made for.


If you’re interested in putting together a business plan for the Reef9 Node Master, chat with me on Skype (when it’s back up for you). Call me at +1-510-343-5664 (Google Voice), follow on twitter @SkypeJournal (just the posts) and @evanwolf (everything). Visit our Skype Journal private technologist roundtable, one of the longest running public Skype chats, where we’re talking about this right now.

dialtone | fun | outages | Skype | Skype News

First picture of a Skype #MegaSuperNode

First photo of a MegaSuperNode

Skypelandic engineers turned ordinary blog and accounting servers into powerful superheroes, harnessing Cloud Power to restore conversation to peaceful Skypelandia. Rolf, one of the Mighty MegaSuperNodes, posed for this portrait.

For the best illustrations of how Skype’s nodes and supernodes work, see Dan York’s readable Understanding Today’s Skype Outage: Explaining Supernodes.

Thanks to Pixartica for the Skype chest icon, Marvel Comics for the toolkit, and Skype for the scenery.


Are you a Skype hero? Come out of the phone booth (like we have phone booths in the Skype era). Chat with me on Skype (when it’s back up for you). Call me at +1-510-343-5664 (Google Voice), follow on twitter @SkypeJournal (just the posts) and @evanwolf (everything). Visit our Skype Journal private technologist roundtable, one of the longest running public Skype chats, where we’re talking about this right now.

7 years and 12 days since Skype Journal launched as a stand-alone blog.
