Google Talk: 24 hours later

Aug. 25th, 2005 | 12:00 am
music: Thievery Corporation - Heart of Lonely Hunter

Jabber’s biggest problems have always been evangelism and, more recently, features. No one pushes it and there’s no standard way of handling VOIP (voice chat) or webcams. Then Google set up their Talk service and things looked really bright. They already have VOIP going and they’ve drawn a lot of attention simply because it’s Google. Heck, I already have more contacts with Google Talk accounts than I’ve ever had on Jabber.

Sadly, the current version is crippled. Their server doesn’t:
  • connect to other Jabber servers
  • support chat rooms (conferences)
  • have transports (so you could connect to AIM, MSN, etc.)
The first two are really basic features that Jabber’s had forever. Missing them really irritates me, especially with Google talking about “server choice” and “openness.” Transports would be fantastic, but I’m not surprised that they’re unsupported. The existing implementations are notoriously buggy and Google is big enough that they need to make arrangements with the other services.

Here’s a direct link to the client.

Now, for the interesting bits they added on to XMPP:

Request your roster, with GMail contacts:
<iq type="get" id="9">
  <query xmlns="jabber:iq:roster" xmlns:gr="google:roster" gr:ext="true" gr:include="all"/>
</iq>

Roster, with GMail contacts and Google Talk status:
<iq to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" id="9" type="result">
  <query gr:ext="true" xmlns="jabber:iq:roster" xmlns:gr="google:roster">
    <item jid="atrus@atrus.org" subscription="none" ask="subscribe" gr:t="H" gr:w="1" gr:mc="1" gr:emc="2"/>
    <item jid="mwittig@gmail.com" subscription="both" gr:w="1" gr:mc="5">
      <group>Buddies</group>
      <group>talk.google.com</group>
    </item>
    <item jid="atrus@bloodyxml.com" subscription="none" ask="subscribe" gr:t="N" gr:w="0" gr:mc="1">
      <group>Me</group>
    </item>
    <item jid="atrus@umd.edu" subscription="none" ask="subscribe" gr:t="H" gr:w="0" gr:mc="3" gr:emc="1"/>
    <item jid="dcrookston@gmail.com" subscription="both" gr:t="A" gr:w="0" gr:mc="3" gr:emc="1"/>
    <item jid="kendric.beachey@gmail.com" subscription="none" name="Kendric Beachey" gr:t="H"/>
  </query>
</iq>
Here’s a table with all those properties for each contact (“-” means the attribute is absent):
jid                   gr:t  gr:w  gr:mc  gr:emc  description
atrus@atrus.org       H     1     1      2       invited e-mail address
mwittig@gmail.com     -     1     5      -       GTalk user via Gaim
atrus@bloodyxml.com   N     0     1      -       Jabber account
atrus@umd.edu         H     0     3      1       another invited e-mail address
dcrookston@gmail.com  A     0     3      1       another GTalk user, with official client
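
For anyone who wants to pull those attributes out of a roster result from a script, here’s a rough Python sketch using only the standard library. It assumes the stanza uses plain straight quotes. The parse_roster name and the trimmed sample are made up for illustration, and what gr:t, gr:w, gr:mc and gr:emc actually mean is still a guess.

```python
import xml.etree.ElementTree as ET

ROSTER_NS = "jabber:iq:roster"
GOOGLE_NS = "google:roster"

# Trimmed copy of the roster result captured above.
SAMPLE = """
<iq to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" id="9" type="result">
  <query gr:ext="true" xmlns="jabber:iq:roster" xmlns:gr="google:roster">
    <item jid="atrus@atrus.org" subscription="none" ask="subscribe" gr:t="H" gr:w="1" gr:mc="1" gr:emc="2"/>
    <item jid="mwittig@gmail.com" subscription="both" gr:w="1" gr:mc="5">
      <group>Buddies</group>
    </item>
  </query>
</iq>
"""

def parse_roster(xml_text):
    """Return one dict per roster item with its jid and Google extension attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("{%s}item" % ROSTER_NS):
        row = {"jid": item.get("jid")}
        for key in ("t", "w", "mc", "emc"):
            # Namespaced attributes are addressed as {namespace}name; missing ones give None.
            row[key] = item.get("{%s}%s" % (GOOGLE_NS, key))
        rows.append(row)
    return rows

if __name__ == "__main__":
    for row in parse_roster(SAMPLE):
        print(row)
```

Values are left as strings on purpose, since it isn’t clear yet which of them are really numbers and which are flags.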

Presence notification that announces Google’s VOIP capability:
<presence>
  <status/>
  <priority>0</priority>
  <c xmlns="http://jabber.org/protocol/caps" node="http://www.google.com/xmpp/client/caps" ver="1.0.0.64" ext="voice-v1"/>
  <x xmlns="jabber:x:delay" stamp="20050825T01:59:56"/>
</presence>
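
A client that wants to know whether a contact can take voice calls could check that caps element. Here’s a minimal Python sketch, assuming ext is a space-separated list of extension names and that "voice-v1" is the one that matters; the supports_voice helper is hypothetical.

```python
import xml.etree.ElementTree as ET

CAPS_NS = "http://jabber.org/protocol/caps"

# Presence stanza as captured above (the delay element is left out).
PRESENCE = """
<presence>
  <status/>
  <priority>0</priority>
  <c xmlns="http://jabber.org/protocol/caps" node="http://www.google.com/xmpp/client/caps" ver="1.0.0.64" ext="voice-v1"/>
</presence>
"""

def supports_voice(presence_xml, wanted="voice-v1"):
    """True if the presence carries a caps element whose ext list includes `wanted`."""
    root = ET.fromstring(presence_xml)
    caps = root.find("{%s}c" % CAPS_NS)
    if caps is None:
        return False
    # Assumption: ext holds extension names separated by whitespace.
    return wanted in caps.get("ext", "").split()

if __name__ == "__main__":
    print(supports_voice(PRESENCE))
```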

Poll for GMail updates:
<iq type="get" id="13">
  <query xmlns="google:mail:notify" newer-than-time="1124933545351"/>
</iq>

Response to GMail polling:
<iq to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" id="13" type="result">
  <mailbox total-matched="0" result-time="1124935189786" xmlns="google:mail:notify"/>
</iq>
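
The newer-than-time and result-time values look like milliseconds since the Unix epoch: read that way, the result-time above comes out as 25 August 2005, 01:59:49 UTC, a few seconds before the stamp in the presence stanza. That reading is an inference from the numbers, not something Google has documented here. A small Python sketch under that assumption (build_poll, parse_mailbox and ms_to_utc are made-up helper names):

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

NOTIFY_NS = "google:mail:notify"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Polling response as captured above.
RESPONSE = """
<iq to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" id="13" type="result">
  <mailbox total-matched="0" result-time="1124935189786" xmlns="google:mail:notify"/>
</iq>
"""

def ms_to_utc(ms):
    """Interpret a value as milliseconds since the Unix epoch (an assumption)."""
    return EPOCH + timedelta(milliseconds=int(ms))

def build_poll(iq_id, newer_than_ms):
    """Build the polling request as an element tree."""
    iq = ET.Element("iq", {"type": "get", "id": str(iq_id)})
    ET.SubElement(iq, "{%s}query" % NOTIFY_NS, {"newer-than-time": str(newer_than_ms)})
    return iq

def parse_mailbox(xml_text):
    """Return (total matched, result time as a UTC datetime) from a polling response."""
    root = ET.fromstring(xml_text)
    mailbox = root.find("{%s}mailbox" % NOTIFY_NS)
    return int(mailbox.get("total-matched")), ms_to_utc(mailbox.get("result-time"))

if __name__ == "__main__":
    print(ET.tostring(build_poll(13, 1124933545351), encoding="unicode"))
    print(parse_mailbox(RESPONSE))
```

ElementTree will write the namespace with a generated prefix rather than a default xmlns, which is equivalent XML but looks different from the official client’s output.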

Google’s proprietary settings:
<iq type="set" to="nikolas.coukouma@gmail.com" id="8">
  <usersetting xmlns="google:setting">
    <autoacceptsuggestions value="true"/>
    <autoacceptrequests value="false"/>
    <mailnotifications value="true"/>
  </usersetting>
</iq>
Reply received:
<iq from="nikolas.coukouma@gmail.com" to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" id="8" type="result"/>

Invite to voice chat:
<iq to="dcrookston@gmail.com/Talk.v64E5E0360E" type="set" id="19">
  <session xmlns="http://www.google.com/session" type="candidates" id="2396058376"
           initiator="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F">
    <candidate name="rtp" address="192.168.4.27" port="4561" username="KvNBwuDHpopXqO9O"
               password="v76FBqEi0nG01DaA" preference="1" protocol="udp" type="local"
               network="0" generation="0"/>
  </session>
</iq>

Reply received:
<iq type="result" to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" id="19" from="dcrookston@gmail.com/Talk.v64E5E0360E"/>
Note that the id and initiator fields will be repeated in future messages for tracking. The network and generation numbers remain a mystery. Perhaps they're meant for future use?
Invite to voice chat (with STUN):
<iq to="dcrookston@gmail.com/Talk.v64E5E0360E" type="set" id="20">
  <session xmlns="http://www.google.com/session" type="candidates" id="2396058376"
           initiator="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F">
    <candidate name="rtp" address="128.8.244.15" port="24655" username="CMpV68TnXxRa5dYc"
               password="l/B4NSooSERrHpSY" preference="0.9" protocol="udp" type="stun"
               network="0" generation="0"/>
  </session>
</iq>

Reply received:
<iq type="result" to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" id="20" from="dcrookston@gmail.com/Talk.v64E5E0360E"/>

The person you requested sends back some candidates. Here’s the local one:
<iq to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" type="set" id="30" from="dcrookston@gmail.com/Talk.v64E5E0360E">
  <session type="candidates" id="2396058376" initiator="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F"
           xmlns="http://www.google.com/session">
    <candidate name="rtp" address="192.168.0.100" port="2086" username="DrXck04CSHo1vQ1b"
               password="Z+dHlskCJBjLYVjc" preference="1" protocol="udp" type="local"
               network="0" generation="0"/>
  </session>
</iq>
Reply sent:
<iq type="result" to="dcrookston@gmail.com/Talk.v64E5E0360E" id="30"/>

And here’s the STUN candidate:
<iq to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" type="set" id="31" from="dcrookston@gmail.com/Talk.v64E5E0360E">
  <session type="candidates" id="2396058376" initiator="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F"
           xmlns="http://www.google.com/session">
    <candidate name="rtp" address="67.161.243.72" port="61851" username="0Vqv/oVBSSnWyZh9"
               password="/nTVvwxQ7Up3YH0u" preference="0.9" protocol="udp" type="stun"
               network="0" generation="0"/>
  </session>
</iq>
Reply sent:
<iq type="result" to="dcrookston@gmail.com/Talk.v64E5E0360E" id="31"/>
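
Each candidate carries a preference attribute, so a client presumably collects them and tries the most preferred one first. Here’s a rough Python sketch of that ordering. The candidates_by_preference helper is hypothetical, the sample stanzas are trimmed copies of the two above (username, password, network and generation left out), and treating a higher preference as "try first" is an assumption.

```python
import xml.etree.ElementTree as ET

SESSION_NS = "http://www.google.com/session"

# Trimmed copies of the two candidate stanzas received above, STUN one first.
STUN_CANDIDATE = """
<iq type="set" id="31">
  <session type="candidates" id="2396058376" xmlns="http://www.google.com/session">
    <candidate name="rtp" address="67.161.243.72" port="61851" preference="0.9" protocol="udp" type="stun"/>
  </session>
</iq>
"""
LOCAL_CANDIDATE = """
<iq type="set" id="30">
  <session type="candidates" id="2396058376" xmlns="http://www.google.com/session">
    <candidate name="rtp" address="192.168.0.100" port="2086" preference="1" protocol="udp" type="local"/>
  </session>
</iq>
"""

def candidates_by_preference(stanzas):
    """Collect every candidate from the given stanzas, highest preference first."""
    found = []
    for xml_text in stanzas:
        root = ET.fromstring(xml_text)
        for cand in root.iter("{%s}candidate" % SESSION_NS):
            found.append({
                "type": cand.get("type"),
                "address": cand.get("address"),
                "port": int(cand.get("port")),
                "preference": float(cand.get("preference")),
            })
    # Assumption: a larger preference value means the candidate should be tried earlier.
    return sorted(found, key=lambda c: c["preference"], reverse=True)

if __name__ == "__main__":
    for cand in candidates_by_preference([STUN_CANDIDATE, LOCAL_CANDIDATE]):
        print(cand)
```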

After the candidates are sorted out (try connecting via the highest-preference candidate first, then work down), you’ll receive an acceptance:
<iq to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" type="set" id="33" from="dcrookston@gmail.com/Talk.v64E5E0360E">
  <session type="accept" id="2396058376" initiator="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" xmlns="http://www.google.com/session">
    <description xmlns="http://www.google.com/session/phone">
      <payload-type id="103" name="ISAC" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="97" name="IPCMWB" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="102" name="iLBC" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="4" name="G723" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="100" name="EG711U" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="101" name="EG711A" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="0" name="PCMU" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="8" name="PCMA" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="13" name="CN" xmlns="http://www.google.com/session/phone"/>
    </description>
  </session>
</iq>
Reply sent:
<iq type="result" to="dcrookston@gmail.com/Talk.v64E5E0360E" id="33"/>
I'm guessing that you pick your codec of choice from the list and then start sending RTP, starting with an ID for the codec. It's not clear whether the username and password are used in RTP or during the connection negotiation (candidate stuff).
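
If that guess is right, picking a codec comes down to matching the payload types in the accept against whatever the local side can handle. Here’s a speculative Python sketch of that step. The payload_types and pick_codec helpers are made up, the sample is a trimmed accept, and nothing here is confirmed by captured RTP traffic.

```python
import xml.etree.ElementTree as ET

PHONE_NS = "http://www.google.com/session/phone"

# Trimmed copy of the accept stanza above, with three of its payload types.
ACCEPT = """
<iq type="set" id="33">
  <session type="accept" id="2396058376" xmlns="http://www.google.com/session">
    <description xmlns="http://www.google.com/session/phone">
      <payload-type id="102" name="iLBC" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="0" name="PCMU" xmlns="http://www.google.com/session/phone"/>
      <payload-type id="8" name="PCMA" xmlns="http://www.google.com/session/phone"/>
    </description>
  </session>
</iq>
"""

def payload_types(xml_text):
    """Map payload-type id (int) to codec name for every payload-type in the stanza."""
    root = ET.fromstring(xml_text)
    return {int(pt.get("id")): pt.get("name")
            for pt in root.iter("{%s}payload-type" % PHONE_NS)}

def pick_codec(offered, preferred_names):
    """Return (id, name) of the first locally preferred codec the peer listed, else None."""
    by_name = {name: pt_id for pt_id, name in offered.items()}
    for name in preferred_names:
        if name in by_name:
            return by_name[name], name
    return None

if __name__ == "__main__":
    offered = payload_types(ACCEPT)
    print(pick_codec(offered, ["PCMU", "PCMA"]))
```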

Ending a voice chat:
<iq to="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" type="set" id="34" from="dcrookston@gmail.com/Talk.v64E5E0360E">
  <session type="terminate" id="2396058376" initiator="nikolas.coukouma@gmail.com/Talk.v64E5E0ED0F" xmlns="http://www.google.com/session"/>
</iq>
Reply sent:
<iq type="result" to="dcrookston@gmail.com/Talk.v64E5E0360E" id="34"/>


They seem to have a pretty nice system for using STUN and RTP in conjunction with Jabber. STUN provides NAT (router) busting and RTP is a generic real-time protocol on top of UDP. Note that IPCMWB, iLBC, G.723, EG711U, EG711A, PCMU (part of G.711), PCMA (also part of G.711), and CN (comfort noise) are well-established audio codecs. RFC 1890 specifies how to wrap most of those in RTP, and RFC 3389 covers comfort noise. I don’t have any packets captured for the RTP bit, so I’m not sure if they’re actually using any of it.

The above info is sorta covered by the developer page.

Because Google is talking about adding SIP support, I think they’ll end up re-implementing Gizmo. I’m not terribly excited because Gizmo is very friendly. The big advantage, of course, is that they are Google and Gizmo is not. There’s always the obvious speculation that Google will simply buy SIPphone, the creator of Gizmo.

[edit: To clarify a bit:
Google’s work on VOIP+Jabber is important because there’s no standard for it yet. There’s a long and interesting thread from February titled “VOIP and Jabber” on the standards mailing list. JEP 0111 is the current attempt at a standard. Google’s is simpler but may be missing something ... I might try to whip up a JEP-like summary of Google’s implementation later, but first I’ll prod them about it.]

Comments {7}

Тимофій

(no subject)

from: [info]tymofiy
date: Aug. 25th, 2005 01:06 am (UTC)

Did they say anything about other Jabber servers? It would be the greatest thing in IM ever.

Atrus

(no subject)

from: [info]nikolasco
date: Aug. 25th, 2005 10:39 am (UTC)

I'm not sure I understand the question ... at the moment they're not connecting to other Jabber servers and haven't said anything about it. Everybody is crying out for it, but there hasn't been a response (that I've seen).

Missing the initiator

from: anonymous
date: Aug. 26th, 2005 06:55 pm (UTC)

Using your reverse engineered specifications I was able to initiate a call (only to have it fail when they picked up, but that was to be expected). However you missed the initial "initiation":
<iq to='xxx1@gmail.com/Talk.v64EC3D4A77' type='set' id='24'
        from='xxx2@gmail.com/tkabber'>
<session type='initiate' id='3400396481'
        initiator='xxx2@gmail.com/tkabber'
        xmlns='http://www.google.com/session'>
<description xmlns='http://www.google.com/session/phone'>
<payload-type id='103' name='ISAC' xmlns='http://www.google.com/session/phone'/>
<payload-type id='97' name='IPCMWB' xmlns='http://www.google.com/session/phone'/>
<payload-type id='102' name='iLBC' xmlns='http://www.google.com/session/phone'/>
<payload-type id='100' name='EG711U' xmlns='http://www.google.com/session/phone'/>
<payload-type id='101' name='EG711A' xmlns='http://www.google.com/session/phone'/>
<payload-type id='0' name='PCMU' xmlns='http://www.google.com/session/phone'/>
<payload-type id='8' name='PCMA' xmlns='http://www.google.com/session/phone'/>
<payload-type id='13' name='CN' xmlns='http://www.google.com/session/phone'/>
</description>
</session>
</iq>
The remote end then gets a dialog asking if they want to accept the call, and it progresses as you said above. It looks like it's a fairly straightforward conversion of SIP to XMPP, so I'm hoping to follow the SIP/XMPP specifications and implement a Linux client.

Evan Martin

(no subject)

from: [info]evan
date: Sep. 5th, 2005 09:05 pm (UTC)

http://www.eightypercent.net/Archive/2005/08/23.html#a249
has comments from a developer discussing it.

STUN, ICE-5 (6?)

from: anonymous
date: Oct. 27th, 2005 01:23 pm (UTC)

They might be something like this, which incorporates STUN:
http://www.networkworld.com/news/tech/2005/080105techupdate.html

IETF draft:
http://www.ietf.org/internet-drafts/draft-ietf-mmusic-ice-06.txt

jingle

from: anonymous
date: Jan. 19th, 2006 08:45 am (UTC)

There's now an 'open standard' for VOIP over XMPP. It's called Jingle, and the specs can be found here:
  • Jingle Signaling (http://www.jabber.org/jeps/jep-0166.html)
  • Jingle Audio (http://www.jabber.org/jeps/jep-0167.html)

Atrus

Re: jingle

from: [info]nikolasco
date: Jan. 19th, 2006 10:59 am (UTC)

Yes, it was filed by Google in December. I suppose I should have added a note to this old entry at the time.
