Late last year, IMS Research reported that we’d passed the 5 billion mark – 5 billion internet connected devices – and predicted we’d connect 22 billion devices by 2020. That said, internet connectivity is no longer enough to differentiate your product. So, what will be different about the next 17 billion devices? Video.
We’re finally at a place where we’re moving from the PC to non-PC devices being rich communication endpoints – from your mobile phone and TV to your in-car navigation system. Internet + voice + video is emerging as the ultimate trifecta for cutting-edge devices, and increasingly what consumers will expect from their electronics. However, as demand for video-enabled electronics continues to increase, development bottlenecks caused by closed API standards will continue to plague the industry and hinder the growth process. During his presentation, Skype’s Jonathan Christensen will discuss:
The historical context and early communications pioneers that took VoIP mainstream
The inefficiencies and hurdles that spurred industry-wide change in the early 2000s
How broadband penetration, multimedia PCs and P2P file sharing set the stage for rich mainstream IP communications and the proliferation of video calling
The future of communications in which devices are no longer isolated, and open standards will shorten development cycles
Christensen will also give an update on how Skype is working to enable developers to leverage the power of voice and video to create a new generation of communications experiences, gaining access to a huge market potential by bringing video to a wide-range of consumer devices. With this next wave of connected devices, ‘video anywhere’ becomes a real possibility.
The line “Skype is working to enable developers to leverage the power of voice and video to create a new generation of communications experiences, gaining access to a huge market potential by bringing video to a wide-range of consumer devices.” is what brings SkypeKit to mind.
The session I want to host: a panel on Skype@Microsoft. Good for Skype? Good for VoIP? Brand collision? Borg avoidance? What could this look like in 2012, 2014? What new opportunities will this create? Will there still be a Skype Journal?
On stage or in the hallway, I’ll see you there this year.
Interpersonal interaction started in the real world. The software community started by modeling observed behavior. Postal service became email, writing became word processing, meetings became conferencing, bulletin boards, instant messaging, and Skype calls. Services experimented with a mix of media choices; time/space/channel shifting; participant discovery, invitation, and scheduling; defining participant roles and moderating behavior; backchannels and voyeur streams; and the archiving and distribution of meeting work product.
The first generation of AR for conversation will reverse the flow, bringing the benefits of newly defined online social experiences to face-to-face real life encounters.
II. A dream platform
Let’s take a walk with a conversation charged by augmented reality. Dream with me that we reach technical and economic feasibility in a few years.
In this world, eyeware becomes AR’s delivery platform, freeing your hands. Gestures and speech are our mouse and keyboard. I’ll assume a relatively thin client that senses locally and works with information via computing clouds.
III. Aug’d Talk
Before our meeting, gatekeeper bots help you screen, find, schedule, and negotiate with people to visit. Think services like Tungle.
Flickr recognizes their faces at a distance, tells you to turn around to greet them, and makes them glow as they approach.
Fashionista scores the other person’s wardrobe and accessories. “Gucci knockoff”, “UC Berkeley Store: $34.95”, “Amazon Wishlist: Pearl Earrings, learn more.”
Equifax shows a hovering frame with their latest credit score, criminal background check, and public updates. "Experian: 4 of 5 stars."
Seesmic dampens your heads-up social peripheral vision so you can pay attention to each other.
Plaxo reminds you of who your closest mutual friends are with a two-second montage of faces.
Systran interpreter bots overlay speakers with live subtitles in your first language.
Nuance and Bing listen in as you talk, whispering private tips and reminders in your ear. "Amy’s mother’s name is Gail." The social secretary serving up social objects and avoiding faux pas.
SecondLife adds a third participant’s avatar as we walk and talk down the street, giving a virtual participant a medium to participate in a real-world conversation.
Zoho Planner tells us we have three more minutes on this topic.
I see my RSAbot flash a caution halo over your head when it thinks your voice, facial expressions, and body language show deceit or that you are very likely to become violent.
Yelp Monocle suggests a nearby lunch place we might like and books a table for us, including a chair for the avatar and a blank wall for projections.
Google AdWalls places poster ads on blank walls as we walk by, some optimized by our mutual Buzz history, others unique to each participant.
UStream.ar makes a composite video stream from our respective Logitech PoVcams. Hundreds of voyeurs join our conversation, chatting in their own instant messaging backchannel.
Zemanta links to related conversations you might like. “Carol and David – same spot, same day, last year.” “Ed and Faith – similar topic (live).” “Gary, Harry, Ike – same café (live)."
Meetup reminds us there’s a flashshmooze in five minutes. "Our World’s A Conversation®."
Basecamp helps us recap our action items and commitments for our next meeting.
IV. Augmenting conversation requires technical architecture
We don’t have what it takes to deliver this experience in 2010. Let’s start with the technology.
A. The AR community opens its architecture.
Most parts of augmented reality software are bundled, intertwined, and often proprietary. I heard more than one AR researcher say at eComm they build AR browsers from scratch for each project.
Developers need a stack architecture that isolates components from one another, that defines how the parts talk to one another. A well-crafted stack means technologists and businesses can compete to become best-of-class within their part of the stack without breaking dependencies with other parts. For example, improvements in how bits are routed over the Internet don’t require an update to every web page; the two components are isolated, each doing their part.
This is how our telephones, local networks, and the Internet work. A stack speeds engineering through focus, reduces risk through compartmentalization, and allocates resources well as each layer of the stack sends signals about its technical and business opportunity.
For a stack architecture to work, it must become a de facto standard, used by all.
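Here is one way to picture that isolation in code. This is only a minimal sketch, not anyone's actual AR stack: the names Transport, InMemoryTransport, CachingTransport, AugmentBrowser, and the ar:// address are all made up for illustration. The point is that the upper layer depends only on an agreed contract, so the lower layer can be swapped or improved without touching it.

```python
from typing import Protocol


class Transport(Protocol):
    """The agreed contract between layers: move bytes, nothing more."""

    def fetch(self, address: str) -> bytes: ...


class InMemoryTransport:
    """Hypothetical lower layer that serves bytes from a dict."""

    def __init__(self, store: dict[str, bytes]) -> None:
        self._store = store

    def fetch(self, address: str) -> bytes:
        return self._store[address]


class CachingTransport:
    """A competing lower layer: same contract, different internals."""

    def __init__(self, inner: Transport) -> None:
        self._inner = inner
        self._cache: dict[str, bytes] = {}
        self.misses = 0

    def fetch(self, address: str) -> bytes:
        if address not in self._cache:
            self.misses += 1
            self._cache[address] = self._inner.fetch(address)
        return self._cache[address]


class AugmentBrowser:
    """Upper layer: turns bytes into a label. Knows nothing about routing."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def label_for(self, address: str) -> str:
        return self._transport.fetch(address).decode("utf-8").strip()


# Made-up content at a made-up address.
store = {"ar://cafe/42": b"  Espresso, 2 min walk \n"}

plain_browser = AugmentBrowser(InMemoryTransport(store))
caching_transport = CachingTransport(InMemoryTransport(store))
cached_browser = AugmentBrowser(caching_transport)

plain_label = plain_browser.label_for("ar://cafe/42")
cached_label = cached_browser.label_for("ar://cafe/42")
cached_label_again = cached_browser.label_for("ar://cafe/42")
```

Both browsers produce the same label even though the transports differ, and the browser code never changed. That is the property a real stack would have to guarantee through published protocols rather than a shared Python interface.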
B. The AR community separates browsers from content.
Most AR systems bundle content and features with a browser. This lets programmers tinker with the nuance of user experience, to exploit hardware, to code within an agile and iterative design process.
AR must split this atom to lower the bar for those who want to augment reality with information, behavior, communication and interaction.
Some companies, like Layar, aspire to be AR’s Netscape. That’s the right direction. Layar’s architecture is closed (you need their permission to use it), hosted (a single point of failure), and proprietary (only Layar defines how to use it). AR investors and consumers need a less risky architecture, so content and services survive when Layar closes (remember Netscape?) or acts capriciously (remember Apple?).
As with a protocol stack architecture, delaminating content creation and service from content players lets both evolve more quickly and with greater competition. At the moment, Layar offers a point-of-interest annotation browser; remember the web before forms, CGI scripts, Javascript and AJAX. AR has the potential for much more. That only happens when public and open protocols for serving and playing content emerge from the AR community.
Perhaps this is a call for Layar to follow in Netscape’s footsteps and further open its protocols and code, so its browsers can serve augments from any server, not just Layar’s. And so others can compete to serve AR content better, just as Firefox, Chrome, Opera, and Internet Explorer vie for web browser share.
C. AR browsers must offer concurrent experiences
I can only experience one reality at a time. I can’t see Wikipedia entries or the morning local news stories while looking for coffee. I want to. People need the power to blend content in our reality browsers. We need to be able to see content from multiple, independent sources at the same time.
Let’s assume your RayBan RealWare(TM) browser is letting you run a hundred views at once. Browser plug-ins and other services will help manage that experience to avoid overloading the user and to increase relevance. To do this, we need a few technology standards comparable to what’s evolved on the web. A secure way to deliver a service layer. A standard model of a layer as a container. A standard model of things within a layer and a way for your software to discover its properties and behaviors. Universal baseline user interaction methods so users can navigate and manipulate and control their environment. Agreement, on a limited basis, of when and how objects in one layer may interact with objects from another layer and how to give power over that choice to each user.
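To make that wish list a little more concrete, here is a rough sketch of what a layer container, the things inside it, and a user-held interaction policy might look like. Every name here (Augment, Layer, UserPolicy, blend) is an assumption for illustration; no such standard exists yet, which is the problem.

```python
from dataclasses import dataclass, field


@dataclass
class Augment:
    """A thing within a layer; software can discover its properties and behaviors."""

    name: str
    properties: dict[str, str] = field(default_factory=dict)
    behaviors: tuple[str, ...] = ()


@dataclass
class Layer:
    """A container of augments published by one independent source."""

    source: str
    augments: list[Augment] = field(default_factory=list)


@dataclass
class UserPolicy:
    """The user, not the publisher, decides which layers may touch which."""

    allowed: set[tuple[str, str]] = field(default_factory=set)

    def allow(self, actor: str, target: str) -> None:
        self.allowed.add((actor, target))

    def may_interact(self, actor: Layer, target: Layer) -> bool:
        same_layer = actor.source == target.source
        return same_layer or (actor.source, target.source) in self.allowed


def blend(layers: list[Layer]) -> list[str]:
    """Show augments from several independent layers at the same time."""
    return [f"{layer.source}: {a.name}" for layer in layers for a in layer.augments]


# Made-up layers and augments.
wiki = Layer("wikipedia", [Augment("Old clock tower", {"kind": "landmark"}, ("open_article",))])
coffee = Layer("coffee-finder", [Augment("Cafe, 200 m", {"kind": "cafe"}, ("get_directions",))])

policy = UserPolicy()
policy.allow("coffee-finder", "wikipedia")  # one direction only

view = blend([wiki, coffee])
```

Note that permission is directional in this sketch: the coffee layer may act on Wikipedia's objects, but not the reverse, unless the user says so.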
D. Layer Discovery and Stores
Which brings me to distribution. Build interoperability so anyone and everyone can publish with minimal opportunity for governmental or corporate restriction. Decentralization is a defense against consolidated power. More on this later.
V. What we don’t know
Despite more than twenty years of augmented reality research, AR is still early and immature. There’s a lot we don’t know.
A. Managing field of view
I was astonished by the 4K video displays at CES. So many pixels that you keep discovering new levels of detail as you walk toward the 152 inch monitor with four times 1080p HDTV resolution. At some point you can’t see the whole screen. You focus your attention, and your eyeballs, on a small portion of the whole. Just like real life.
When there’s a hard limit to what’s visible, you have scarcity. The economics of a zero sum game come to play. Those who value your attention (including you) will fight for prominent positioning within your field of view.
It’s happened before.
Billboards filled United States highways, advertisers bidding for the best locations to reach the most drivers. The public fought back after fifty years with the Highway Beautification Act promoted by Lady Bird Johnson.
Desktop operating systems and browsers also fight for scarce default pixels. Governments accused Microsoft of anticompetitive behavior for favoring its own Internet Explorer browser with an icon on the default Windows desktop. Google’s position as Firefox’s default search engine is worth more than $50 million annually. Skype agreed to exclusivity and to lower its privacy expectations in exchange for scarce Verizon mobile desktop "on deck" placement.
For everyone else who wants your attention, there’s the $500 billion web browsing economy. Like America’s byways, the web brings a torrent of intrusive and distracting ads. Again, the public fights back with ad-blocking software, users seizing control of their pixels.
So the technology reveals a political question: who has power over what you see?
B. Interruption and Alerting
If you thought distracted driving was a problem… Interruption overload of your reality will be huge. AR interfaces can take your eyes from where they are needed, your mind off what you are doing, and pull your hands from what they are controlling.
I predict the first death from an AR interruption will be reported in a top news service by 1 January 2017.
Rapid contextual filtering will save lives.
Workshop to explore human cognitive limits and approaches to filtering well?
C. Gestural Language Tower of Babel
All of today’s gestural interfaces are limited in their scope and vocabulary. And no two of them are the same. When you are living with multiple layers, you need universal primitive gestures. With the web we have "click this link", "go back", "change context", "type something here", "press this button to make something happen" and "close the browser." Let’s avoid having the "get me out of here" gesture from one service interpreted as the "place my order for Viagra" gesture by another.
Workshop to start harmonizing gesture languages, anyone?
D. Layer Discovery Protocols
Many web pages hide little notes that point to alternative versions of that page. That’s how feed readers discover RSS, Atom, and ActivityStream feeds.
We’ll want our layer discovery protocols to be useful for people, helping them choose what layer services and objects to consume, and for systems, describing layers in ways that let software understand how to interact with that layer’s capabilities.
This technical disclosure powers the realtime web and enables our mashup economy. Disclosure protocols can fuel a world of AR mashups and interplay.
Workshop to start prototyping and testing discovery mechanisms, anyone?
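For anyone who hasn't looked under the hood, the "little notes" that feed readers rely on are link elements with rel="alternate" in a page's head. Here is a small sketch of how a reader finds them, using Python's standard html.parser. The RSS and Atom media types are real; the application/x-ar-layer+json type and the sample page are invented to show how an AR layer could be advertised the same way.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collects <link rel="alternate" ...> hints from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. bare attributes); normalize to "".
        attributes = {key: (value or "") for key, value in attrs}
        if "alternate" in attributes.get("rel", "").lower().split():
            self.found.append(
                {
                    "type": attributes.get("type", ""),
                    "title": attributes.get("title", ""),
                    "href": attributes.get("href", ""),
                }
            )


# Sample page. The last link uses a hypothetical media type for an AR layer.
PAGE = """
<html><head>
<link rel="stylesheet" href="/site.css">
<link rel="alternate" type="application/rss+xml" title="Posts" href="/feed.rss">
<link rel="alternate" type="application/atom+xml" title="Posts" href="/feed.atom">
<link rel="alternate" type="application/x-ar-layer+json" title="Cafe layer" href="/layer.json">
</head><body></body></html>
"""

finder = AlternateLinkFinder()
finder.feed(PAGE)
hrefs_by_type = {item["type"]: item["href"] for item in finder.found}
```

The stylesheet link is ignored, and the three alternates are picked up with their type, title, and location. A layer discovery protocol would need to say a good deal more than this (capabilities, permissions, behaviors), but the advertising mechanism could be this simple.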
E. Designing for variable infrastructure capacity
Cyberspace isn’t flat. It’s lumpy, twisted, sporadic, and changing. AR will challenge the ability of each part of the Internet’s infrastructure.
Mobile CPU/Storage/Battery
Bandwidth bottlenecks.
Wireless coverage
PAN bandwidth
Relevant content density
F. Social conventions
What is the proper way to meet an augment? To agree on mutual augments? To include a non-augment in your conversation? To behave in the presence of offline people?
What is forbidden? Is it OK for me to alter how I see you without telling you?
OK to share our conversations and augments with others without disclosure? Would it be OK to do that if I felt you are dangerous to me? If I was paid to broadcast? Could you claim an equity stake in my broadcast revenue?
Should I automatically disclose to you the analysis and metadata I collect about you, so you understand how I perceive you?
We have an entirely new body of social behavior to evolve, adapt and codify. An augment etiquette that defines new conventions to support our new virtual realities.
G. Public Policy
We need to start a conversation about society’s interest in encouraging good behavior and discouraging the bad.
Privacy for Public Conversations. I live in California where phone conversations may only be recorded if all the parties consent (California Penal Code 632). Should we extend this to conversations held in semi-public places now we wear recording media and as the streets and architecture become sensor-rich?
Data Portability Mandates. Companies hold our data hostage, the way landlords might hold your deposit and furniture hostage. This is a strong imbalance of power. As we come to rely on our augments for basic services all day long, their power will grow. Communities passed laws to protect renters from abusive landlords. Will we pass laws to protect people from sites that won’t let our data go?
AR-Free Zones. There may be a public good in defining some areas free of some or all augmentation. Courtrooms, perhaps. Some public parks. Would it be OK for a restaurant to define itself as an AR-Free Zone and require you to turn yours off before coming in? We have precedent in smoke-free, phone-free, and pet-free restrictions. Is access to your view of the world a fundamental right, to be protected by law?
Net Neutrality for AR era. Reality is too precious to let someone distort it. You don’t want any of the companies between you and your experience to have a say in what you see/hear/feel and what you don’t. In what you say and do, and what you can’t. To pick for you what is important and what isn’t. Their interests may not be yours. Should we legislate that ISPs serving AR experiences be forbidden from treating some bits differently from others based on their content or source? Should we apply net neutrality principles to companies like Layar who have the power to ban a publisher? To the companies that make your AR devices?
AR Carterfone. The Carterfone ruling said you may connect any device to the telephone network so long as it doesn’t harm the network. This led to a world of fax machines, private telephones, mobile phones, telephone switches, and eventually Skype. We need an AR Carterfone. So industry innovates and consumers choose and power doesn’t lie with those who connect us to each other.
Disclosure. "This is not real. Push here for details." "This is an old version of this place." "I am wearing an avatar."
Equity and Property. Does your employer get a copy of your AR life when you leave the company? Does your employer get to cut you off from the people and memories you created through AR? Who gets the layers you created with the kids once you divorce?
Freedoms. Politically, AR is speech; everyone will be a publisher. We want our technology to promote free speech. Socially, AR is conversation, so we want technology to help people organize and assemble themselves. Economically, AR is land; everyone will homestead and grow and build on it. We want everyone to build well without stifling the AR economy. We need to map AR against our ideas of civil liberty, human rights, jurisdiction, citizenship, protections for the weak and disenfranchised.
VI. In short…
We will augment walking and talking. Enhancing our face-to-face conversations will make money. Industries that now help us organize our time and relationships will compete to be part of our world view.
The technology comes quickly but huge obstacles remain. The biggest is a lack of an IT architecture.
The future depends on research and conversations. Design research into vision, attention, interruption, and gestural language. Consensus on how to behave. Debate on how to set things up so we build an augmented society we want to live in.
For a start, let’s talk about an Augmented Reality Stack. Get started. Let’s target a workshop early this fall.
Social sharing through sites like Facebook and Twitter has seen meteoric rise in the last year. Exciting as that may be, it only scratches the surface of what social sharing can mean on the Web. In this talk, Jonathan Rosenberg will explore the next phase of social sharing – real-time communications using voice and video. Through it, a whole new set of online interaction models open up for Web publishers, going well beyond the mere posting of links on walls and in tweets. Jonathan will detail several potential use cases to see how they can drive increased value for users and content providers alike. [emphasis mine.]
Speculating madly, dear reader, what will Rosenberg announce?
Skype voice and video conferencing you can embed in web sites (like bloggers embed YouTube videos).
A Skype platform you can build into mobile, desktop and web apps.
Live voice/video conversations you can attach to threads of textual conversation in mailing lists and blogs.
Skype conversations you can trigger through links in social media, a la bit.ly.
Syndicating your Skype conversations and activities through RSS, Atom, and ActivityStream feeds that can be consumed by sites and news readers.
Hosting recordings of your Skype conversations for discussion and sharing a la Facebook and YouTube.
Anything less will disappoint.
Frankly, I’d love for Skype to publish user behavior as an activity stream that can be consumed by other systems. Beyond online/offline/availability presence, let me show friend/follow/block, chat room join/leave, conference call join/leave, mood changes, profile updates, contact group updates, and my other in-Skype activity. If you’re in San Francisco this weekend, join MySpace, Google, Ericsson, Facebook, Microsoft and me at StreamCamp.
He’s in a wave. Adds a gadget. Passes a Skype name to a gadget. Browser-to-Skype call starts.
They talk. As each person talks for a bit, their bit is encoded and linked-to.
So you have a play-by-play record of a call.
Inside a Google wave.
Under the covers: Jason Goecke said "it is a Google Wave Gadget with his PhoneFromHere.com IAX2 Java softphone as the client. Then, the IAX2 Java phone connects to Asterisk with Skype for Asterisk installed. Then, there is a server-side element, Ibook, that is breaking apart utterances into individual files. So that as each person speaks, it captures it into its own file. Then, as that happens, a text frame is sent from Asterisk to the softphone with the file details. The gadget then uses some Javascript to embed a link. IAX2 supports text frames."
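To illustrate the server-side step, here is a toy sketch of breaking a call into per-speaker utterances and producing the "file details" message for each one. This is not the demo's code: the frame tuples, the file naming scheme, and the pipe-delimited message format are all assumptions of mine, and nothing here talks to Asterisk or IAX2.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Utterance:
    """One uninterrupted stretch of speech by a single speaker."""

    index: int
    speaker: str
    start_ms: int
    end_ms: int

    @property
    def filename(self) -> str:
        # Hypothetical naming scheme: one file per utterance.
        return f"call-{self.index:04d}-{self.speaker}.wav"


def split_utterances(frames):
    """Merge consecutive frames from the same speaker into utterances.

    frames: (speaker, start_ms, end_ms) tuples in time order.
    """
    utterances = []
    for speaker, run in groupby(frames, key=lambda frame: frame[0]):
        run = list(run)
        utterances.append(Utterance(len(utterances), speaker, run[0][1], run[-1][2]))
    return utterances


def text_frame(utterance: Utterance) -> str:
    """Build the made-up 'file details' message a gadget could turn into a link."""
    return f"{utterance.speaker}|{utterance.start_ms}|{utterance.filename}"


# Made-up call: alice talks, bob replies, alice talks again.
frames = [
    ("alice", 0, 20),
    ("alice", 20, 40),
    ("bob", 40, 60),
    ("alice", 60, 80),
]

utterances = split_utterances(frames)
messages = [text_frame(u) for u in utterances]
```

Four frames become three utterances, each with its own file and its own message. That per-utterance record is what turns one long recording into a play-by-play you can link into.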
This is cool (like I really had to tell you).
First, it shows what it’s like to build Skype calls into other applications. Without a Skype client running. (Pardon my drooling.)
Second, it deconstructs a long talk into directly referenceable snippets. (Still needs permalinks in addition to the playable links). This means you can annotate live calls with transcripts, pictures, etc. So the call’s Binary Large Object becomes binary tiny objects.
Third, because the snippets are referred to by a wave, other gadgets and bots can enhance the archive. Add or remove background noise. Translate and provide voiceovers in your language. Highlight statistically improbable phrases. Detect stress in a voice. Visualize the data in a timeline or a relationship scorecard (who talked more?). Add tags to help you find this wave again.
Fourth, no phone numbers were called in the making of this demo. Phone companies weren’t bothered. Internet all the way.
Fifth, because this is within the context of a wave, it should be possible to use wave member data to look up Skype names and bring people into an open conference room.
It’s almost therapy for your phone company: be true to yourself. Find joy in simplifying your customer relationships so you can be marvelous at simply helping people talk. From Rudolf van der Berg’s talk at the Emerging Communications Conference in Amsterdam. Too bad Lee Dryburgh won’t stream the high-quality video he’s paying to record.
This follows through on Skype’s pledge to make superwideband audio cheap and ubiquitous.
On the business side, the SILK codec eliminates one of Skype’s three outside software dependencies: audio codecs from Global IP Solutions (GIPS). The two remaining are Skype’s high quality video codec, from On2, and Skype’s peer-to-peer directory, the Global Index from Joltid. Skype’s commitment to free themselves from dependencies should comfort investors and others worried about the Joltid/Joost litigation.
Here’s Jonathan Christensen speaking about the evolution of codecs (the software that turns your voice into bits and back) at the March 2009 Emerging Communications conference (slides, podcast).
eBay is preparing to spin-out Skype, setting it free to steer its own course. Almost six years ago Skype redefined realtime communications and changed the industry. Lee Dryburgh, the man behind the Emerging Communications Conference, shared some thoughts with me about his vision for what comes next. – Phil Wolff
I spent many years thinking about telephony, seven days a week. In a way it “destroyed” my life, in a mental health sense, during those years of trying to ascertain where it was going between 2005 and 2020. It was clear to me that what had existed for over a century, and which today generates revenues that dwarf the Internet, was going to be surpassed and that we had already put one foot on the cliff edge. It’s the big reason I kicked off the Emerging Communications Conference & Awards, because no other event seemed to have enough inherent vision.
Where is it going?
First you’ve got the telephony application itself. Because of the exceptional widespread deployment of the telephone, its century-long cultural embedment, extreme ease of use and very low barriers to usage, it’s not going away in a big way, at least not any time soon. It’s far too big and you’ve got far too much inertia in and around it.
Relationships replaces Voice as the substrate in clients.
However, because its substantial list of deficiencies grows, what we are seeing emerging and what will gain ever further traction is software-based, voice-enabled communication technologies. Interestingly, voice may not be the “substrate” of these clients; “relationships” will be, both between people and things.
Second, we’ve got the economic model behind it. Even today, well over a hundred years since its original inception, we still have the same usage paradigms and economic models put in place at the time of the first electro-mechanical switches.
Now the keyword in all of this is “software.” Six years ago, the Skype software client was released. It was the harbinger of change to come. It called into question the need for very expensive dedicated underlying transport networks by pushing edge intelligence into the Codec layer to deal with less than ideal networks. It called into question the need for dedicated telecom hardware in the core network, by using the edge-clients to perform the work in a decentralised fashion. It called into question the inherent limited geographical structuring of telecom operators themselves; software does not face such physical and regulatory boundaries; distribution is relatively zero-cost; and worse still for the operator model, by its global footprint, it achieves unprecedented scale.
Looking forwards, we can consider Skype phase one.
Phase two is emerging on the horizon and it will have deeper impact yet. In fact, played out it will change social governance, market economics, how humans relate to each other and even the nature of geo-politics. It’s likely to have ramifications on all social order. In the long-term view, it will also be the “new” multi-trillion dollar market replacing much of what today is the multi-trillion-telephony market.
Phase two is built around an economic model that puts human time and attention at a premium as opposed to dedicated circuits, specialist hardware and personnel. It’s the opposite of what we experience today with telephony, where human time and attention is wasted; ringing, call queues, voice mail boxes, IVR trees, repetitious verbal transfer of static information such as credit card numbers, call transfers and such like.
And that’s just a quick C2B example. C2C has similar lunacy, for example needing to place a telephone call to request a single piece of discrete information or the other person’s location. The economic crisis experienced worldwide is likely to highlight such sources of great inefficiency.
Here is another angle to get you thinking: more and more calls originate from a number noted on a Website and yet when the call is placed, no information is passed with the call about the context of the call. It’s lost, so each end has to orally work more at the beginning than would otherwise be necessary. Billions of minutes are needlessly wasted every day globally.
Phase two is about intention-based economics. It’s focused on fulfilling intentions and desires. Another way of putting it is we no longer need to care about network availability (i.e. “dial tone”), and reaching an endpoint (i.e. a telephone). Network availability and endpoint reachability is assumed. What we care about with intention-based economics is human psychology and behaviour, both individual and in aggregate. I’m not saying we need to become psychologists and anthropologists. But what we need to build for is access to ever more personal information, i.e. about the human behind the endpoint. Privacy does not exist looking long-term. Ever more personal information is the new currency, which underlies intention-based economics, and people will increasingly trade it for free access to services.
If any of this seems abstract at the moment, think about what makes Google money, Ad Words. Google provides search free to the consumer in order to gain eyeballs (mass attention) and takes the search parameter to try and deduce intention. It then sells that attention and intention data upstream to advertisers. Google even has machines reading your emails in order to deduce your possible intentions and desires, which is why you may often find an eerily relevant ad above your Gmail account inbox. The underlying reason for the Android initiative surely has to be to gain access to better intention deriving data in order to sell upstream to advertisers.
Yet telecom networks receive vastly more human attention coming in from the edges and transit much more “intention data” than Google, in the form of telecom signaling. But it’s latent, not acted upon and thrown away. They actually throw away their most precious asset and plan to continue charging for their long-term least worthy asset (voice transmission).
To make the situation even worse, telecoms today is still charging downstream to the consumer and ignores the money and wishes of upstream parties (like retailers and media companies, for example). Because the telecom business model and regulation is pretty much hard nailed like the network itself, the bulk of telecom operators are not likely to be able to transition in time before other entrants move in who appreciate the new economics and who don’t have ball and chain legacy. New entrants and probably a third of telecom operators will transition successfully around phase two.
You’re probably wondering what phase two looks like from the point of view of applications? This is where things get very abstract and potentially the prose could get long-winded. But this is not to be unexpected since the foundation is in the abstract with the word “intention.” To try and get a flavour of the phase two application direction, imagine for a start that the demarcation lines between content, information access, entertainment, ecommerce unravel ever further and the result is intrinsically tied to an ever smarter fusion of more communication modalities. Now underpin that with attention and intention based economics.
Digital identity barriers to mobile community. Products that get it and grow, and those that don’t and fail the leap to community.
Lessons from six months of Skype on the iPhone, three months on Nokia smartphones. What worked, what didn’t. What was hot in some markets and not in others. What Skype changed for OS3 and the new model iPhones.
Mobile programmers emulating the music business ecosystem. iTunes and the other mobile stores are baiting small teams to form garage bands, craft apps the way musicians make songs, market themselves to followers the way bands do, and trade off publishing/producing themselves or getting signed to a major label. Store optimization changes mobile software design, software engineering practices, and business models.
Mobile data portability: the new privacy policy. Can you move, get, sync, and use your data (profile, contacts, conversations, media, and history) among mobile applications? Across phones? Between carriers? Between your PCs, web sites and your mobile? Not likely. Let’s look at the technologies and companies working in this area.
Friend Of A Friend: Guanxi and the need for introductions. Instant friending isn’t for everyone. Mobile, VoIM, and social apps designed in the West are losing to services where a third-person introduces and guides two people from strangers into relationship.
What mobile collaboration learns from war. Emergency medicine improves with each war; so does mobile communications, collaboration, coordination, and control. What have we learned from the last five years?
Handicapping the race to talkify the web. Odds-on favorites? Dark horses?
From asynch to synch. Blurring voice messaging, voice mail and live talk.
Undermining WebEx. Who is disrupting the leading seller of collaboration, conferencing, and other meeting services? Who is cheaper, faster, easier, and more fun? How is Cisco changing WebEx in response?
Real world Mobile Net Neutrality. Should your carrier limit citizen access to the Internet based on content? Based on device? Based on carrier’s competitive interests? Let’s hear from Deutsche Telekom and AT&T, from Skype and Google.
Running out of mobile bandwidth. Has demand for mobile data outstripped world and local supplies of capital to build out the data infrastructure? Are there regulatory hurdles? With today’s capital markets, where is the money coming from to pay for the buildout?
Rural Stimulus. Who got government money to build access to the Internet? Is it being spent wisely?
I’m in the New York Times coverage of Google Voice. Quoted correctly (yay!) but before my own column on the subject came out (d’oh!). Google has some truly delightful advantages in the race to become the world’s largest communications company.
Foresight Institute gets a new president. Skype me (evanwolf) if you want to come to Dr. Hall’s Sunday reception in Palo Alto. We’ll all be talking molecular manufacturing, nanotechnology and the singularity.
Nokia shares its vision. Smartphones rising. Death of patience. Rewarding engagement. Personal expression. New learning economy. Clickable world. Personal relevance. A good summary of forces driving the interplay between mobile technology, industry dynamics, and human behavior.
Benjamin Leviton seeks VoIP help: "I have a Brekeke SIP proxy server. I am looking for someone to remote on to my desktop, log into its interface and config my carriers with the proxy server. Also check the interface of Polycom phone and make sure it is working properly with the SIP proxy server." Contact: +1-917-273-5808, ben@capitalfinanceusa.com, yahoo IM gcc644@yahoo.com, or skype:levtop.
Skype’s been saying its SILK audio codec is better than others. They released some data today supporting their claim.
The key measure is the Mean Opinion Score (MOS), which rates how the sound is perceived after processing compared with the original. Higher is better: greater fidelity.
In this chart, the codecs are tested at low bitrates (hard, on the left) to high bitrates (easy, on the right). Lots of bandwidth makes it easy to replicate sounds. SILK does better even at dial-up speeds, and SILK climbs in quality with even a little extra freedom.
It’s written in fixed point ANSI C, so it will run efficiently nearly anywhere.
It quickly adapts to changes in sample rate, network quantity/quality, and CPU resources. This minimizes audio artifacts and preserves quality.
Low delay frees up other parts of a system, cutting latency. SILK only needs 25 ms (20 ms frame size + 5 ms look-ahead).
SILK does double duty with non-speech media. Skype’s codec also works at music quality. Systems that stream music, television, movies, or ambient audio (games) will be able to use SILK.
Signal processing takes up huge overhead on mobile phones. As SILK moves from software to firmware, Skype suddenly takes up less memory, CPU, and power. Users get longer battery life, less heat, less latency. This would be a big win for Skype’s mobile strategy. Skype would work on much dumber, cheaper, ubiquitous smartphones: a vastly larger market.
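The 25 ms figure above is simple arithmetic: frame size plus look-ahead. Here is a minimal sketch of that sum, and of what a 20 ms frame means in samples and packets. The 16 kHz sample rate in the usage note is my assumption for a wideband call, not a number from Skype's data sheet, and the function names are made up for illustration.

```python
# Delay arithmetic for a frame-based codec, using the figures quoted above.
FRAME_MS = 20      # frame size, from the text
LOOKAHEAD_MS = 5   # look-ahead, from the text


def algorithmic_delay_ms(frame_ms=FRAME_MS, lookahead_ms=LOOKAHEAD_MS):
    """Audio the encoder must have in hand before it can emit a frame."""
    return frame_ms + lookahead_ms


def samples_per_frame(sample_rate_hz, frame_ms=FRAME_MS):
    """How many audio samples one frame holds at a given sample rate."""
    return sample_rate_hz * frame_ms // 1000


def frames_per_second(frame_ms=FRAME_MS):
    """How many frames the codec produces each second."""
    return 1000 // frame_ms
```

At an assumed 16 kHz sample rate, `samples_per_frame(16000)` works out to 320 samples per frame, at 50 frames per second. Network and buffering delay come on top of the 25 ms.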
Notes from the data sheet:
MOS (Mean Opinion Score) listening test was performed for Wideband speech signals by Dynastat, an independent 3rd party laboratory. Confidence intervals (95%) are +/- 0.1 MOS. All bitrates are measured and averaged over frames containing active speech. SILK and Speex were run in the highest complexity mode. Packet Loss and Office Noise tests were done with all codecs running at 18.25 kbps.
The Karaka libraries manage Skype farms (many instances of Skype running in a data center) and bridge chat users to the Skype network through XMPP applications.
Skype farming is part of building a gateway. Fring, iSkoot, Eqo, Ribbit, IM+ and anyone else who wants to offer Skype chat, Skype presence, Skype profiles and other Skype data must have a gateway. Karaka helps you build your farm management system.
Neil Stratford, Vipadia’s CEO, said "we needed the gateway to support our ClackPoint service – as a building block it seemed that it would be more widely useful, so we decided to release it publicly."
Scope of a generic Skype gateway?
Instance lifecycle management: creating, monitoring, and closing instances of Skype.
Instance virtualization: running your Skype instances on many servers/blades so you scale to meet demand.
Multisite hosting: minimizing latency (speeding up round trips) by routing conversations to the closest server with available resources.
Skype client configuration: streamlining instances to avoid using a computer’s memory, CPU and bandwidth, and to avoid memory leaks.
Session management: mapping outside clients to sessions in your gateway, even when they have flaky connectivity.
Security: the usual, but more so.
Modeling: associating Skype’s data models for people, groups, chats, calls, to your own software and APIs.
What Karaka does and doesn’t do:
Instance lifecycle management: Yes.
Instance virtualization: Yes.
Multisite hosting: No. You can use DNS SRV record load balancing to different sites.
Skype client configuration: Defaults to a basic config, but you can script your own.
Session management: Yes.
Security: Up to you. "We have an API to enable encrypted transmission of credentials, but otherwise we rely on the security of the associated XMPP infrastructure."
Modeling: Yes for those elements in the XMPP definition, No for SIP call elements.
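On the multisite point: Karaka doesn't route between sites, so the DNS SRV suggestion means the choosing happens on the resolving side. Below is a rough sketch of how a client might pick a site from a set of SRV records: take the lowest priority group, then choose within it in proportion to weight. This is a simplification of the selection rule in RFC 2782, not Karaka code. The hostnames and the port are hypothetical, and the records are hard-coded rather than looked up from DNS.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is tried first
    weight: int    # relative share of traffic within one priority
    port: int
    target: str


def pick_srv(records, rng=random):
    """Pick one record: lowest priority group, then weighted random choice."""
    if not records:
        raise ValueError("no SRV records to choose from")
    lowest = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == lowest]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights zero: treat the candidates as equals.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical sites: two primaries sharing load 60/40, plus one backup.
RECORDS = [
    SrvRecord(10, 60, 5222, "gateway-eu.example.com"),
    SrvRecord(10, 40, 5222, "gateway-us.example.com"),
    SrvRecord(20, 0, 5222, "gateway-backup.example.com"),
]
```

Calling `pick_srv(RECORDS)` repeatedly spreads sessions across the two primaries. The backup is only chosen if the caller drops the primaries from the list, for example after failed connection attempts.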
In English:
Look at Vipadia’s GPL’d libraries when you want to build a gateway to Skype, to have Skype inside your product or service.
The news release.
Vipadia is pleased to announce the release under the GPLv2 of Karaka, the open-source XMPP-Skype Gateway.
Existing Skype interconnect solutions focus on bridging voice even though the primary use of Skype is for instant messaging and associated presence data. Interconnecting with Skype messaging and presence has been a major stumbling block for many who wish to offer Skype interconnection to their network. Karaka bridges the XMPP and Skype clouds, removing this stumbling block by converting Skype messaging and presence to the popular XMPP protocol as used by, e.g., Google Talk.
Karaka is a scalable distributed XMPP transport that bridges instant messaging and presence between a user’s XMPP and Skype accounts. In addition to full presence and instant messaging exchange, it also automatically detects Skype multi-party conversations, elevating them into XMPP conference rooms.
Karaka implements the XMPP standards XEP-0100 for gateway support, XEP-0045 for multi-user chats and XEP-0144 for roster exchange.
Vipadia <http://vipadia.com/> is a Cambridge, UK based startup that creates and innovates in the field of IP communications, specialising in Voice, Video, Messaging and Presence over IP.
Karaka uses the Skype API but is not endorsed or certified by Skype.
Codec Evolution and Industry Proposal (Plus Skype Announcement)
The PSTN has been bandwidth limited from its inception. This was done to keep equipment costs down. But is 3kHz really enough to get your point across? Wideband audio has emerged in services like Skype and with today’s low cost, silicon based manufacturing and the move to all IP transmission there is an opportunity to finally break through the POTS bandwidth barrier. Jonathan will discuss the complex audio codec landscape and put forth a proposal for how we [the Industry] can make wideband audio ubiquitous.
Let’s parse this and madly speculate where Jonathan’s going.
The PSTN has been bandwidth limited from its inception. This was done to keep equipment costs down.
The public switched telephone network (PSTN) cuts off your speech’s top (high notes) and bottom (low notes). While some microphones and speakers, like those used by musicians, capture everything, most equipment in mobile phones, landline phones, speakerphones, or even Skype phones captures just enough of your sound to be understood.
But is 3kHz really enough to get your point across? Wideband audio has emerged in services like Skype
Wideband audio restores the lifelike quality of sound by capturing and playing more of your sound’s natural highs and lows. Skype’s new SILK codec, which moves sound between Skype and your computer, and between Skype and other Skype users, is a wideband codec. Incredibly vivid sound.
and with today’s low cost, silicon based manufacturing
Putting software into a chip… SILK codecs as semiconductor "cores"? A core is a readily usable bit of software already rendered in the software language of chip programming. Everything electronic has some sort of chip in it, from radios to cars. Pre-built cores make it fast, cheap, and easy to drop new features into your product. "SILK Inside"?
and the move to all IP transmission
Most mobile and landline phone companies have switched their plumbing from analog to digital to Internet Protocol.
there is an opportunity to finally break through the POTS bandwidth barrier.
POTS (plain old telephone service) is basic phone service, the one with the 3kHz bandwidth limits. Could the breakthrough be offering SILK Inside in the routers PSTN services use? In mobile phones?
Jonathan will discuss the complex audio codec landscape
Ummm. I haven’t a clue. But Jonathan should know; he’s been working in the codec business for years.
and put forth a proposal for how we [the Industry] can make wideband audio ubiquitous.
If you want something ubiquitous, you have to take away cost and risk. Sounds like open source to me.
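To put rough numbers on the 3kHz limit discussed above, here is a small sketch comparing a telephone-style passband with a wideband one. The band edges are conventional textbook figures I'm assuming for illustration (roughly 300 to 3400 Hz for narrowband telephony, 50 to 7000 Hz for wideband speech); they are not taken from Jonathan's abstract or from Skype.

```python
# Assumed, conventional band edges in Hz (illustrative, not from the abstract).
NARROWBAND = (300, 3400)  # classic telephone passband
WIDEBAND = (50, 7000)     # typical wideband speech passband


def bandwidth_hz(band):
    """Width of the passband in Hz."""
    low, high = band
    return high - low


def survives(freq_hz, band):
    """True if a tone at freq_hz falls inside the passband."""
    low, high = band
    return low <= freq_hz <= high


def min_sample_rate_hz(band):
    """Nyquist minimum: sample at least twice the highest frequency kept."""
    return 2 * band[1]
```

With these assumed edges the narrowband channel is about 3.1 kHz wide, which is where the "3kHz" shorthand comes from, and a 5 kHz component of a voice passes the wideband channel but is cut by the narrowband one.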
So, again, this is me guessing what Skype will announce and all errors are mine:
Skype will release SILK with an open source license.
Skype will partner with an ASIC semiconductor manufacturer to release SILK in VHDL (or another chip design language).
Skype has partnerships with Cisco, Motorola, Nokia and other companies to use the chips in networking products and mobile handsets.
Let me make another assumption. Skype will announce a public platform in 2009. So people could make their own Skype clients or build Skype into their own products/services. To make that work, Skype needs to share codecs and encryption with developers. Licenses could be for packaged software or for open source libraries. I’m betting on open source for the codecs and shrinkwrapped for the encryption.