Understanding exactly how to implement Web Real-Time Communication (WebRTC) can be difficult. Experts often describe WebRTC as a peer-to-peer (P2P) communication protocol because it enables direct communication between browsers or devices without the need for data to be relayed through a central server. But despite what you might think, that doesn’t mean a server is never involved when you use WebRTC.

Making sense of the role of a WebRTC signaling server in establishing a P2P WebRTC connection is what this article is all about. Here’s what you need to know about what these servers do and how they assist in video streaming and IoT processes.

Understanding Signaling in WebRTC

Let’s say you have a message you want to give to your friend, whom you’ve only met online. You want to talk to them in person because the message is particularly sensitive, but you don’t know their home address, and they don’t know yours. So you set up a third location you’re both familiar with so that you can meet and exchange information directly.

A WebRTC signaling server provides that third location. It doesn’t forward all communication, but it does provide an initial method to set up communication so two browsers can “talk” to each other directly.

Now let’s talk about how this works in WebRTC’s most common use case, which is video conferencing. The two participants in a P2P video conversation are, unsurprisingly, called peers. In a WebRTC session, Peer One (the browser of the first participant) sends an offer to connect with Peer Two. Since Peer One doesn’t know Peer Two’s home address, or rather, its IP address, it sends that offer to a signaling server. The offer includes Peer One’s IP address and the other details needed to set up a video stream. The signaling server forwards that offer to Peer Two.

Peer Two receives the offer and sends an answer, letting Peer One know its own IP address, which formats it will use, and so on. The signaling server forwards the answer along to Peer One, and finally the two peers can establish a direct connection. That’s when the video conference happens, and the two peers can talk with each other in real time.
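The offer/answer exchange described above can be sketched as a toy in-memory relay. This is only an illustration under simplifying assumptions: the `SignalingServer` class, the peer IDs, and the placeholder payload strings are all hypothetical, and a real deployment would carry these messages over a network transport.

```python
from collections import defaultdict, deque


class SignalingServer:
    """Toy in-memory relay: it forwards messages and never touches media."""

    def __init__(self):
        # One inbox (a queue of pending messages) per peer ID.
        self.inboxes = defaultdict(deque)

    def send(self, sender, recipient, kind, payload):
        # Forward a signaling message into the recipient's inbox.
        self.inboxes[recipient].append(
            {"from": sender, "type": kind, "payload": payload}
        )

    def receive(self, peer_id):
        # Return the next pending message for peer_id, or None if empty.
        inbox = self.inboxes[peer_id]
        return inbox.popleft() if inbox else None


def run_handshake(server):
    # Peer One posts an offer addressed to Peer Two.
    server.send("peer-one", "peer-two", "offer", "session details of peer one")
    offer = server.receive("peer-two")
    # Peer Two replies with an answer addressed to whoever sent the offer.
    server.send("peer-two", offer["from"], "answer", "session details of peer two")
    answer = server.receive("peer-one")
    return offer, answer


server = SignalingServer()
offer, answer = run_handshake(server)
print(offer["type"], "->", answer["type"])  # prints "offer -> answer"
```

Note that the relay only passes the two messages along. Once both peers hold each other's details, the media itself would flow directly between them.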

Perhaps the best part of WebRTC is that it’s not just for sending and receiving video in real time. You can also use it to send files, images, and other data in various formats. WebRTC is very flexible in terms of the types of data it can handle, and what’s more it doesn’t just work with browsers. You can also use WebRTC with mobile apps for Android and iOS. Since many people want to be able to view video feeds from their phones in, for example, home security IoT applications, the protocol is gaining ground as an important IoT communication method.

WebRTC Signaling Server Communication Methods

Now that you have a high-level view of how WebRTC sets up video calls, let’s take a deeper look at the exact methods that peers in a WebRTC interaction can use to send information like IP addresses and codecs (formats) for a call. First, you should understand that the messages passing through a WebRTC signaling server are typically written in a format called the Session Description Protocol (SDP). Despite its name, SDP is just a format that other protocols use to describe a session; it’s not an actual communication protocol itself.

An SDP message will communicate information like:

  • Media type – Whether the media that the peers are transmitting will be audio only or include video, images, etc.
  • Media format – What type of encoding the interaction will use to compress and decompress files
  • Transport protocol – The protocol stack that will carry the media; in WebRTC, typically secure RTP (SRTP) keyed via DTLS and running over UDP
  • Media attributes – Which include bandwidth requirements
  • Connection information – Including IP addresses and information about the network connection
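To make the list above concrete, here is a minimal sketch that pulls those fields out of a small SDP fragment. The sample SDP is made up for illustration (the addresses are documentation-only values), and `summarize_sdp` is a hypothetical helper that handles only a handful of line types, not a full SDP parser.

```python
SAMPLE_SDP = """\
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=video 9 UDP/TLS/RTP/SAVPF 96
c=IN IP4 203.0.113.7
b=AS:500
a=rtpmap:96 VP8/90000
"""


def summarize_sdp(sdp_text):
    """Collect media type, format, transport, bandwidth and address."""
    summary = {"codecs": {}, "bandwidth": {}}
    for line in sdp_text.splitlines():
        # Every SDP line has the shape "<single letter>=<value>".
        if len(line) < 2 or line[1] != "=":
            continue
        key, _, value = line.partition("=")
        if key == "m":
            # m=<media> <port> <transport profile> <payload types...>
            media, _port, transport, *payload_types = value.split()
            summary["media_type"] = media
            summary["transport"] = transport
            summary["payload_types"] = payload_types
        elif key == "c":
            # c=<network type> <address type> <address>
            summary["connection_address"] = value.split()[2]
        elif key == "b":
            # b=<modifier>:<value>; the AS modifier is in kilobits per second.
            modifier, _, amount = value.partition(":")
            summary["bandwidth"][modifier] = int(amount)
        elif key == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload_type, encoding = value[len("rtpmap:"):].split(maxsplit=1)
            summary["codecs"][payload_type] = encoding
    return summary


summary = summarize_sdp(SAMPLE_SDP)
print(summary["media_type"], summary["transport"], summary["codecs"])
```

In this sample, the `m=` line carries the media type and transport profile, `a=rtpmap` names the codec, `b=` states a bandwidth attribute, and `c=` holds the connection address.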

There are two main ways that SDP messages cross the internet to and from signaling servers: via the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP). In the context of WebRTC, TCP’s reliability makes it suitable for signaling. Basically, TCP ensures reliable transmission of data by checking for errors, retransmitting lost packets, and delivering everything in the correct order.

On the other hand, UDP does not guarantee the reliability of data transmission. It sends packets without waiting for confirmation, which can result in packet loss. So while developers can use UDP for the media stream itself due to its low latency, they probably won’t use it as often for signaling in WebRTC because losing signaling data can lead to failure in setting up the communication channel.

You can see a basic comparison in the table below:

| Feature | UDP (User Datagram Protocol) | TCP (Transmission Control Protocol) |
| --- | --- | --- |
| Reliability | Less reliable, no guarantee of packet delivery | Highly reliable, ensures packet delivery |
| Connection | Connectionless, no handshake | Connection-oriented, uses handshake |
| Speed | Faster due to minimal overhead | Slower, due to acknowledgments and retransmissions |
| Data Order | No guarantee of data order | Ensures data is ordered correctly |
| Use Cases | Real-time applications (e.g., video streaming, gaming) | Applications requiring reliability (e.g., file transfers, web pages) |
| Complexity | Simpler, easier to implement | More complex due to error checking and flow control |
| Overhead | Lower overhead, less data control | Higher overhead, more data control mechanisms |
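
The reliability difference in the comparison above can be illustrated with a small simulation. This is a sketch, not real networking code: `lossy_channel` is a hypothetical stand-in for the network, and the "reliable" sender assumes acknowledgments themselves are never lost, which real TCP does not assume.

```python
import random


def lossy_channel(packets, loss_rate, rng):
    """Drop each packet with probability loss_rate and shuffle the rest."""
    delivered = [p for p in packets if rng.random() >= loss_rate]
    rng.shuffle(delivered)
    return delivered


def send_best_effort(messages, loss_rate, rng):
    # UDP-style: send once, never retransmit, accept whatever arrives.
    packets = list(enumerate(messages))
    return [payload for _seq, payload in lossy_channel(packets, loss_rate, rng)]


def send_reliable(messages, loss_rate, rng):
    # TCP-style: number every packet, resend until acknowledged,
    # and let the receiver put the packets back in order.
    received = {}
    pending = list(enumerate(messages))
    rounds = 0
    while pending:
        rounds += 1
        for seq, payload in lossy_channel(pending, loss_rate, rng):
            received[seq] = payload
        pending = [(seq, p) for seq, p in pending if seq not in received]
    ordered = [received[seq] for seq in sorted(received)]
    return ordered, rounds


rng = random.Random(7)
messages = ["offer", "answer", "candidate-1", "candidate-2"]
best_effort = send_best_effort(messages, loss_rate=0.3, rng=rng)
ordered, rounds = send_reliable(messages, loss_rate=0.3, rng=rng)
print("best effort:", best_effort)
print("reliable:", ordered, "after", rounds, "round(s)")
```

The best-effort result may be missing messages or arrive out of order, while the reliable sender always ends with the full, ordered list at the cost of extra rounds. That trade-off is why losing an offer or answer matters more than losing a single video frame.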

 

WebRTC Signaling Server Protocols

WebRTC doesn’t standardize the exact exchange of offers in a signaling process. But here we will run through some ways the exchange can be handled. Just a few of the common choices are SIP over WebSocket, XMPP, and proprietary protocols/platforms. Many developers also choose to create their own custom signaling protocols, since that allows them greater flexibility in how they use signaling in WebRTC scenarios.
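
As a rough idea of what a custom signaling protocol might look like, here is a minimal sketch of a JSON message envelope. The `SignalMessage` class, its field names, and the set of message types are assumptions made for this example; WebRTC itself does not define any of them.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical message types for this sketch.
ALLOWED_TYPES = {"offer", "answer", "candidate", "bye"}


@dataclass
class SignalMessage:
    type: str        # one of ALLOWED_TYPES
    sender: str      # ID of the peer sending the message
    recipient: str   # ID of the peer that should receive it
    payload: str     # e.g. an SDP blob or a candidate line

    def to_json(self):
        # Serialize to the text that would travel over the signaling channel.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw):
        # Parse and validate an incoming message before acting on it.
        data = json.loads(raw)
        if data.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"unknown signaling message type: {data.get('type')!r}")
        return cls(**data)


wire = SignalMessage("offer", "peer-one", "peer-two", "v=0 ...").to_json()
decoded = SignalMessage.from_json(wire)
print(decoded.type, decoded.sender, "->", decoded.recipient)
```

An envelope like this could be carried over WebSocket or any other reliable channel. The point is only that the application decides the message shape.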

Note that what follows is a very simplified explanation, as these protocols are pretty complex in terms of how they work within WebRTC.

1. Session Initiation Protocol (SIP) over WebSocket

WebSocket is a communication protocol that provides two-way communication over a single connection. Developers like to use WebSocket for WebRTC signaling because it allows real-time communication to and from the WebRTC signaling server. Its advantages are that it is pretty low latency and fairly energy efficient. But WebSocket isn’t compatible with every WebRTC-compatible application or protocol, so it may not be possible to use it for all WebRTC interactions. Devs may also need to do some extra work to help WebSocket communications traverse firewalls properly.

The Session Initiation Protocol (SIP) is a signaling protocol that works with WebSocket to provide signaling in web applications, like a WebRTC video conference. While SIP is a rather complex protocol for some simple WebRTC applications, like a basic video surveillance application, SIP over WebSocket is still a common choice.

2. Extensible Messaging and Presence Protocol (XMPP)

While XMPP is widely used in IoT applications, it’s not as common for WebRTC. That’s because it’s pretty complicated and, while it can be compatible with WebRTC, it wasn’t originally designed for that use. Instead, XMPP is a communication protocol for instant messaging. Still, it’s useful for signaling in applications that already use XMPP. For example, suppose you run a telehealth platform that allows instant messaging via XMPP, and at some point you want to add video conferencing so patients can talk to their doctors directly via WebRTC. In that case, you would also use XMPP for signaling in the WebRTC video conference interactions, because that would be simpler than adding a whole new protocol to your telehealth platform.

3. Proprietary protocols – Nabto

There are also IoT communication and connectivity platforms that provide proprietary protocols and methods for signaling. Nabto’s WebRTC offering is another option that uses Nabto Edge Streams to securely and reliably perform signaling without relying on a central mediator. In Nabto’s WebRTC solution, the Nabto base station establishes an end-to-end encrypted P2P connection, and the signaling takes place directly between the two peers rather than on the base station itself. This enhances the security and reliability of the connection beyond traditional signaling processes.

STUN/NAT Traversal vs. TURN/ICE

Developers need to make sure communications can get through firewalls for WebRTC connections to work properly. STUN-based NAT traversal is preferred for this purpose, but sometimes STUN cannot determine the correct IP address and port because a strong firewall randomizes the assignment. A “strong” firewall is normally one where you cannot find out in advance which address (IP and port) you will be assigned, even by using the STUN service.

Every time you communicate out through such a NAT firewall, you get a new IP and random port assigned, so you cannot determine what address to put in the offer you send to the other peer (you have no way to check which address and port you will be assigned). By contrast, with a “relaxed” firewall you will be assigned the same port (or a nearby one, such as the assigned port + 1) when communicating with the peer as you were when you tested using the STUN server.
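
The difference between the two firewall types can be sketched with a toy NAT model. The class names `RelaxedNat` and `StrongNat` follow this article's terminology and are hypothetical, as are the addresses. The ports are handed out sequentially rather than randomly so the example stays deterministic.

```python
import itertools


class RelaxedNat:
    """Reuses one public port per internal socket, whatever the destination."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._mappings = {}

    def map_outbound(self, internal_addr, destination):
        # The mapping depends only on the internal address.
        if internal_addr not in self._mappings:
            self._mappings[internal_addr] = next(self._ports)
        return (self.public_ip, self._mappings[internal_addr])


class StrongNat(RelaxedNat):
    """Allocates a fresh public port for every distinct destination."""

    def map_outbound(self, internal_addr, destination):
        # The mapping depends on the destination as well.
        key = (internal_addr, destination)
        if key not in self._mappings:
            self._mappings[key] = next(self._ports)
        return (self.public_ip, self._mappings[key])


def stun_result_is_usable(nat):
    device = ("192.168.1.20", 5000)
    # First the device asks a STUN server what its public address looks like...
    seen_by_stun = nat.map_outbound(device, ("stun.example.org", 3478))
    # ...then it talks to the actual peer from the same local socket.
    seen_by_peer = nat.map_outbound(device, ("198.51.100.9", 6000))
    return seen_by_stun == seen_by_peer


print(stun_result_is_usable(RelaxedNat("203.0.113.5")))  # True
print(stun_result_is_usable(StrongNat("203.0.113.5")))   # False
```

With the strong variant, the address learned from the STUN server is not the one the peer would actually see, so putting it into an offer does not help.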

When STUN fails, ICE falls back on TURN, which basically means relaying through a third-party server. So now let’s look at the methods themselves in more detail.

1. Session Traversal Utilities for NAT (STUN)

More than four in five of all WebRTC-based video calls rely on Session Traversal Utilities for NAT, which is commonly known as STUN. And before you can understand STUN, you need to know about Network Address Translation (NAT). Basically, NAT takes the private IP address of a device and turns it into a public IP address that all of the devices on the same local network can share so that address is discoverable to the WebRTC signaling server. NAT also protects the individual device’s IP address from being visible to any old passing server.

The problem with NAT is that, while a public IP address is necessary to establish a connection, the translation itself hides the device. Peer Two can see the shared public address, but not the actual address of the device behind it, and the device itself doesn’t automatically know which public address and port the NAT has given it.

Let’s go back to that previous example. If you wanted to set up a meeting with your friend so you could talk directly to them, but you could only give your friend the name of the cross streets for the area in which you planned to meet, that person would have to check every building in that area in order to actually meet up with you. The cross streets give your friend a place to start, but now you need the actual address or name of the building you want to meet in.

So the device needs to learn how its private address maps onto the public one. With STUN, the Peer One device sends a request to a STUN server on the public internet, which takes a look at the public IP address and port the request arrived from and sends them back to the Peer One device. The Peer One device can then share that public address, alongside its private IP address, with other peers and establish the direct P2P connection.

All in all, STUN plays a crucial role in NAT environments by facilitating the discovery of public IP addresses and ports, enabling devices behind NAT to set up P2P connections more effectively.
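
For readers curious about what this looks like on the wire, here is a minimal sketch of the STUN message layout defined in RFC 5389: building a Binding request and decoding the XOR-MAPPED-ADDRESS attribute from a response. It handles IPv4 only and does not contact any server; `encode_fake_response` is a hypothetical helper that stands in for a server reply so the example can run offline.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442       # fixed value from RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    # 20-byte header: type, attribute length (0), magic cookie, 12-byte ID.
    transaction_id = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_binding_response(data):
    """Return the (ip, port) the server saw, from XOR-MAPPED-ADDRESS."""
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _reserved, family, xport = struct.unpack("!BBH", value[:4])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            # Port and address are XOR-ed with the magic cookie.
            port = xport ^ (MAGIC_COOKIE >> 16)
            (xaddr,) = struct.unpack("!I", value[4:8])
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")


def encode_fake_response(ip, port, transaction_id):
    # Hypothetical stand-in for what a STUN server would send back.
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


request = build_binding_request()
response = encode_fake_response("203.0.113.5", 40000, request[8:20])
print(parse_binding_response(response))  # ('203.0.113.5', 40000)
```

In a real client, the request would be sent over UDP to a STUN server and the response read from the same socket. The decoded address is what the device would then advertise to its peer.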

2. Traversal Using Relays around NAT (TURN)

Now suppose a strong firewall prevents STUN from discovering an address the other peer can actually reach. In that case, the system will fall back on Traversal Using Relays around NAT, or TURN. TURN relays all traffic, not just the initial signaling process, through a server on the public internet. This requires more bandwidth and can increase latency, since the messages have to make a stop at the server on their way to the peers. So ICE only uses TURN when it simply can’t create a P2P connection.
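
One reason relays end up as the last resort is that ICE ranks relay candidates lowest. The sketch below uses the candidate priority formula and the recommended type preferences from RFC 8445. It is a simplification: real ICE agents rank candidate pairs rather than single candidates, and the `order_candidates` helper and sample addresses are hypothetical.

```python
# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {
    "host": 126,   # the device's own local address
    "prflx": 110,  # peer reflexive, discovered during connectivity checks
    "srflx": 100,  # server reflexive, discovered via STUN
    "relay": 0,    # relayed through a TURN server
}


def candidate_priority(candidate_type, local_preference=65535, component_id=1):
    # priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)
    return (
        (TYPE_PREFERENCE[candidate_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


def order_candidates(candidates):
    # Highest priority first, so relayed candidates are tried last.
    return sorted(
        candidates,
        key=lambda c: candidate_priority(c["type"]),
        reverse=True,
    )


candidates = [
    {"type": "relay", "address": "198.51.100.50:3478"},
    {"type": "srflx", "address": "203.0.113.5:40000"},
    {"type": "host", "address": "192.168.1.20:5000"},
]
for candidate in order_candidates(candidates):
    print(candidate["type"], candidate_priority(candidate["type"]))
```

With these defaults, the host candidate gets priority 2130706431 and the server reflexive candidate 1694498815, while the relay candidate lands far below both.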

Developers can configure ICE to rely on TCP tunneling to get traffic to and from the TURN server. TCP tunneling makes the connection more secure and is especially useful for video streaming WebRTC applications like smart video surveillance. To learn more about how Nabto’s TCP tunneling works in video surveillance, take a look at our recent blog post.

Final Thoughts

WebRTC signaling servers are a complicated technology, and there are a lot of protocols that come into play to make sure two peers can communicate with each other. And despite the fact that WebRTC communication is meant to be P2P, a session can involve various servers as well. If you want more information on how to implement a secure and reliable WebRTC IoT video streaming application, contact us and request a consultation.
