Web Real-Time Communication (WebRTC) is a popular open-source technology for real-time video, voice, and data communication. Originally released by Google and since standardized by the W3C and the IETF, it is a well-known option for video surveillance and video conferencing. This blog post addresses such questions as: What is WebRTC? How does it work? And what are its top applications and use cases?
The Basics of WebRTC
WebRTC is open source and enables real-time communication directly in web browsers and mobile applications. It lets users engage in voice chat, video chat, and peer-to-peer data sharing without installing any plugins or external software. One of its most important features is that it is browser-agnostic, so you can use it with Chrome, Firefox, or any other browser that supports WebRTC. It’s also operating system-agnostic, so you can use it whether you’re using Windows, Linux, or Mac.
Another important feature of WebRTC is that it is fully adaptive to network conditions. Say you’re trying to have a Google Meet video conference at work and you’re in the middle of an important presentation. Your network starts to slow down because too many people are using it at once. Instead of overwhelming the network by requiring too much bandwidth and then shutting down your call, WebRTC is able to make the video more grainy or reduce the sound quality to keep you connected even with poor network speeds.
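As a rough illustration of that trade-off, the sketch below picks a quality tier from an estimated bandwidth figure. The tier names and bitrate thresholds are invented for the example, and WebRTC’s real congestion control is considerably more involved; this only shows the idea of lowering quality instead of dropping the call.

```typescript
// Hypothetical quality tiers. The thresholds are illustrative only and are
// not values that WebRTC itself uses.
interface QualityTier {
  label: string;
  minKbps: number; // minimum estimated bandwidth needed for this tier
}

const AUDIO_ONLY: QualityTier = { label: "audio-only", minKbps: 0 };

// Ordered from best to worst quality.
const TIERS: QualityTier[] = [
  { label: "720p", minKbps: 1500 },
  { label: "360p", minKbps: 600 },
  { label: "180p", minKbps: 200 },
  AUDIO_ONLY,
];

// Pick the best tier that still fits within the estimated bandwidth.
function pickTier(estimatedKbps: number): QualityTier {
  for (const tier of TIERS) {
    if (estimatedKbps >= tier.minKbps) return tier;
  }
  return AUDIO_ONLY; // fallback for nonsensical (negative) estimates
}

const goodNetwork = pickTier(2000); // the "720p" tier
const busyNetwork = pickTier(700);  // the "360p" tier
```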
WebRTC also has some nice security features built in. For example, it encrypts all communication through Datagram Transport Layer Security (DTLS). Think of DTLS as a convoy of armored trucks that’s responsible for creating a safe and secure route between the sender and the receiver. Before the journey starts, DTLS checks the credentials of both the sender and receiver (like verifying IDs) to make sure the parties are who they say they are. DTLS provides authentication in WebRTC to verify that all participants in the media stream are legitimate.
Once DTLS confirms identities, it creates a secure, encrypted path for the packages to travel through. DTLS encryption makes sure that the “trucks” of the convoy are locked up tight with a key that only the intended recipient can use to unlock them.
WebRTC also uses the Secure Real-time Transport Protocol (SRTP) to make packages of data even more secure. SRTP is like a team of elite guards who are specifically trained to protect the packages themselves, meaning your video and audio streams, and not just the route or method of communication. These guards accompany the packages inside the DTLS convoy trucks. SRTP makes sure that every package is properly sealed, and it has tools to detect any tampering during transit.
Developers commonly use WebRTC with Hypertext Transfer Protocol Secure (HTTPS) as well. HTTPS makes sure that the processes that send information – about when the session will start and end and other important aspects – are also encrypted and can’t be tampered with.
The last and most important feature of WebRTC that I’ll mention is one that also contributes to more secure communication, as well as reduced latency and faster speeds on calls. WebRTC is a peer-to-peer (P2P) protocol, meaning that you can use it to communicate directly between two devices without an intermediary like a cloud server. Video can go straight from a video camera to a browser, which means it’s harder for any malicious actors or server problems like breakdowns or bottlenecks to interfere with the stream. And because the stream doesn’t have to take any detours, it also gets to its destination faster. All of these features make WebRTC an excellent option for a lot of different P2P video streaming applications.
How a WebRTC interaction works
During a video call or any other kind of WebRTC interaction, there are some steps that your device will follow to make sure the call gets through.
The first thing that happens is something called signaling. Signaling in WebRTC is when the participating browsers exchange the information they need to establish a connection, usually by way of a server that both of them can reach. That information includes the IP addresses of the relevant devices so they can find each other on the Internet, the types of data that the browsers will use during communication, and any other configuration information the devices need to communicate smoothly.
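WebRTC does not prescribe how signaling messages are formatted or transported, so every application defines its own. The sketch below shows one possible set of message shapes and an in-memory stand-in for a signaling server. The names `SignalMessage` and `SignalingRelay`, the peer IDs, and the placeholder SDP strings are all assumptions made for the example.

```typescript
// Hypothetical message shapes for a signaling channel. WebRTC leaves the
// format and transport up to the application.
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string };

type SignalHandler = (msg: SignalMessage) => void;

// In-memory stand-in for a signaling server. It only forwards messages to
// the addressed peer and never handles the media itself.
class SignalingRelay {
  private readonly peers = new Map<string, SignalHandler>();

  register(peerId: string, onMessage: SignalHandler): void {
    this.peers.set(peerId, onMessage);
  }

  // Returns false when the recipient is not connected.
  send(msg: SignalMessage): boolean {
    const handler = this.peers.get(msg.to);
    if (!handler) return false;
    handler(msg);
    return true;
  }
}

// Usage: the tablet sends an offer and the friend's browser answers.
const relay = new SignalingRelay();
const tabletInbox: SignalMessage[] = [];
const friendInbox: SignalMessage[] = [];
relay.register("tablet", (m) => { tabletInbox.push(m); });
relay.register("friend", (m) => { friendInbox.push(m); });

// The SDP strings are placeholders, not real session descriptions.
relay.send({ kind: "offer", from: "tablet", to: "friend", sdp: "<offer sdp>" });
relay.send({ kind: "answer", from: "friend", to: "tablet", sdp: "<answer sdp>" });
```

In a real application the relay would typically sit behind a WebSocket or HTTPS endpoint; only these small setup messages pass through it.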
The next step is Network Address Translation (NAT) traversal, which allows the participants to communicate through firewalls. To understand NAT traversal, you first need to understand how network address translation works. Let’s say you’ve got two laptops and a tablet in your home and you want to use your tablet for a video call to your friend. Your laptops and your tablet all have different private IP addresses that are only visible to your home network.
Those private IP addresses need to be translated into a public address that other devices outside of your network can see. But there’s a limit to how many public addresses are available, so a NAT session takes the IP addresses of your two laptops and your tablet and gives them just one public address that’s visible to anyone. It’s sort of like being in a hotel and making a call; each room has its own number, but the building itself has only one phone number that’s publicly available.
NAT is necessary because there’s a limited number of public IP addresses, so not every device can have its own. But it causes some issues as well. After all, your friend doesn’t want to call all the devices in your house; your friend wants to call you. But your friend only has access to the public number. So the NAT device also keeps track of an additional number, called a port number, which it assigns whenever a device on your network opens a connection to the outside. Back to the hotel analogy: when a guest in room 101 (a device on your network) makes a call, the hotel operator (NAT) notes down that a call made from room 101 is using line 1 (assigns a port number). This is similar to how NAT assigns a unique port number to each outgoing internet connection.
Now, if your friend wants to call you back, they can’t directly dial room 101 since they only see the hotel’s main phone number. Instead, they call the hotel’s main number and mention they want to speak with you again. The hotel operator (NAT), using the earlier note, knows that line 1 is connected to room 101 and routes the call accordingly.
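The operator’s notebook can be modeled as a small lookup table. The sketch below is a toy model, not how any particular router is implemented; the class name `ToyNat`, the addresses, and the starting port number are assumptions chosen for illustration.

```typescript
// An IP address plus port, like a room number plus phone line.
interface Endpoint {
  ip: string;
  port: number;
}

// Toy model of a NAT translation table (the hotel operator's notes).
class ToyNat {
  private readonly publicIp: string;
  private nextPort = 40000; // arbitrary starting port for the example
  private readonly publicPortToPrivate = new Map<number, Endpoint>();
  private readonly privateToPublicPort = new Map<string, number>();

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // Outgoing connection: note which private endpoint uses which public port.
  translateOutbound(source: Endpoint): Endpoint {
    const key = `${source.ip}:${source.port}`;
    let publicPort = this.privateToPublicPort.get(key);
    if (publicPort === undefined) {
      publicPort = this.nextPort++;
      this.privateToPublicPort.set(key, publicPort);
      this.publicPortToPrivate.set(publicPort, source);
    }
    return { ip: this.publicIp, port: publicPort };
  }

  // Incoming packet: look up the note. Ports without a note lead nowhere.
  translateInbound(publicPort: number): Endpoint | undefined {
    return this.publicPortToPrivate.get(publicPort);
  }
}

const nat = new ToyNat("203.0.113.7");
const tablet: Endpoint = { ip: "192.168.1.23", port: 50000 };

const mapped = nat.translateOutbound(tablet);        // { ip: "203.0.113.7", port: 40000 }
const routedTo = nat.translateInbound(mapped.port);  // the tablet's private endpoint
const unknown = nat.translateInbound(41234);         // undefined: nobody used this line
```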
STUN, TURN, and ICE
To work around the problem that NAT creates, NAT traversal acts like an operator to make sure your friend gets to the right room number and connects directly with your tablet. There are two different methods that WebRTC can use for NAT traversal.
The first is Session Traversal Utilities for NAT (STUN). A device behind NAT doesn’t know which public address and port the outside world sees for it, so it asks a STUN server, which simply reports back the address and port the request appeared to come from. In the hotel analogy, STUN acts like a call center operator at the hotel who tells you which main number and line your call appears to come from, so you can pass that information on to your friend. STUN doesn’t carry the call itself; once both sides know their public addresses, they try to connect to each other directly.
But some NATs and firewalls are so restrictive that a direct connection still isn’t possible, in which case NAT traversal will require the help of Traversal Using Relays around NAT (TURN). TURN relies on an external server to relay the connection so communication can be established. All traffic between the two devices passes through that server, which increases latency and bandwidth costs but ensures a connection. Again, for our hotel analogy, TURN is like an external call center instead of a call center located at the hotel. Both you and your friend call in to it, and it passes the conversation along in both directions.
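In browser code, STUN and TURN servers are usually supplied through the `iceServers` option of `RTCPeerConnection`. The sketch below shows what such a configuration might look like. The interface is a local copy of the relevant fields so the example is self-contained, and the server hostnames, username, and credential are placeholders, not real services.

```typescript
// Local copy of the fields browsers accept for each entry in the
// `iceServers` option, declared here so the sketch is self-contained.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Placeholder servers and credentials: replace them with your own.
const iceServers: IceServer[] = [
  // STUN: lets the device learn its public address and port.
  { urls: "stun:stun.example.org:3478" },
  // TURN: relays traffic when a direct connection is not possible.
  {
    urls: [
      "turn:turn.example.org:3478?transport=udp",
      "turn:turn.example.org:3478?transport=tcp",
    ],
    username: "demo-user",
    credential: "demo-secret",
  },
];

// Hypothetical helper: checks whether a relay fallback is configured.
function hasRelay(servers: IceServer[]): boolean {
  return servers.some((server) => {
    const urls = Array.isArray(server.urls) ? server.urls : [server.urls];
    return urls.some((url) => /^turns?:/.test(url));
  });
}

// In a browser, the list would be passed as:
//   new RTCPeerConnection({ iceServers })
const relayConfigured = hasRelay(iceServers); // true
```

Listing a TURN server alongside STUN is a common choice because, without one, calls across the most restrictive networks simply fail to connect.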
To choose between those methods and decide on the best way to connect, WebRTC uses Interactive Connectivity Establishment (ICE). ICE acts like a smart assistant that first tries to call the hotel directly, then tries the call center at the hotel. If neither works, ICE falls back to the TURN method and relays the call through an external server.
Once the devices have completed NAT traversal to establish a connection, WebRTC will actually stream the video, voice, and any other data between the two parties until it’s time for the call to end, at which point WebRTC closes the connection.
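That preference order falls out of how ICE ranks candidate addresses. The sketch below uses the priority formula and the recommended type preferences from RFC 8445, then applies a deliberately simplified selection step. Real ICE checks pairs of local and remote candidates, so `selectCandidate` and its `isReachable` callback are assumptions made for illustration, as are the addresses.

```typescript
// "host" = the device's own address, "srflx" = the public address learned
// via STUN, "relay" = an address on a TURN server.
type CandidateType = "host" | "srflx" | "relay";

interface Candidate {
  type: CandidateType;
  address: string; // illustrative addresses only
}

// Type preferences recommended by RFC 8445 (higher is preferred).
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  srflx: 100,
  relay: 0,
};

// RFC 8445: priority = 2^24 * typePref + 2^8 * localPref + (256 - componentId)
function candidatePriority(type: CandidateType, localPref = 65535, componentId = 1): number {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPref + (256 - componentId);
}

// Simplified selection: walk the candidates from highest to lowest priority
// and keep the first one that passes a (hypothetical) connectivity check.
function selectCandidate(
  candidates: Candidate[],
  isReachable: (c: Candidate) => boolean,
): Candidate | undefined {
  const ordered = [...candidates].sort(
    (a, b) => candidatePriority(b.type) - candidatePriority(a.type),
  );
  for (const candidate of ordered) {
    if (isReachable(candidate)) return candidate;
  }
  return undefined;
}

const candidates: Candidate[] = [
  { type: "relay", address: "198.51.100.10:3478" },
  { type: "host", address: "192.168.1.23:50000" },
  { type: "srflx", address: "203.0.113.7:40000" },
];

// Behind a strict firewall, only the relayed path passes the check.
const chosen = selectCandidate(candidates, (c) => c.type === "relay");
```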
What is WebRTC used for?
The fact that you can use WebRTC to share data files in various formats means it’s not limited to video surveillance or video conferencing applications. You can use it to send static images, pre-recorded videos, and anything else you want to send. That means it’s ideal for Internet of Things (IoT) applications because you can use it to send commands to IoT devices directly in a browser, or send data to devices, or stream video from IoT devices to a browser. In addition, developers can adapt WebRTC for mobile applications, so if you want to monitor a security camera feed from your phone, you could potentially do that with WebRTC.
And there are plenty of other uses as well. For example, WebRTC enables telehealth visits. Patients don’t even have to download any specialized software and can meet with doctors virtually directly via a web browser. And, of course, teachers found out just how useful WebRTC could be when they were forced to teach online during the pandemic.
Beyond one-on-one communication, WebRTC can also allow one-to-many communication. For example, a gamer can stream plays to a wide audience via WebRTC or connect with several friends in a multiplayer scenario. The same goes for sports broadcasting, webinars, or sales events.
WebRTC isn’t merely a video protocol; it also allows participants to send data files, so you can use it in teleoperations to control IoT devices like smart cameras or even smart cars and smart machinery. And it has some interesting potential applications in smart security and smart industrial operations as a result.
Does WebRTC have a downside?
Just because you can use WebRTC for a lot of different applications doesn’t mean it’s always the best option. It’s not the most scalable protocol, so even though you can use it to broadcast to a big audience, you may find it uses up a lot of bandwidth and the stream runs slowly with a lot of latency as a result.
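A quick calculation shows why. When the sender delivers a separate copy of the stream to every viewer over direct connections, its upload requirement grows linearly with the audience. The function names and the 2 Mbps stream bitrate below are assumptions chosen for the example.

```typescript
// Uplink needed by a sender that delivers a separate copy of the stream to
// every viewer over direct P2P connections.
function senderUplinkMbps(viewers: number, streamMbps: number): number {
  return viewers * streamMbps;
}

// In a full-mesh call, each of n participants sends to the other n - 1.
function meshUplinkMbps(participants: number, streamMbps: number): number {
  return Math.max(participants - 1, 0) * streamMbps;
}

const broadcastUplink = senderUplinkMbps(100, 2); // 200 Mbps for 100 viewers
const meshUplink = meshUplinkMbps(6, 2);          // 10 Mbps per participant
```

At an assumed 2 Mbps per stream, a six-person mesh call is manageable on many home connections, while a direct broadcast to 100 viewers is not.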
WebRTC is also highly sensitive to network speeds, and those will affect the quality of the stream. And, unlike some other protocols, WebRTC tolerates some data loss in order to keep the stream timely. That means that certain bits of data might not arrive, which isn’t a big problem for gamers who are streaming their game play, but it can become a huge problem if you’re using WebRTC to control a drone, a smart car, or an industrial machine that needs to be able to shut off in a split second if it becomes a danger to a human being.
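For data sent over WebRTC data channels, as opposed to audio and video, the trade-off between reliability and timeliness can be set per channel. The sketch below shows two possible configurations using the `ordered` and `maxRetransmits` fields that `createDataChannel` accepts. The interface is a local copy so the example is self-contained, and the channel purposes are assumptions for an imagined IoT device.

```typescript
// Local copy of the relevant data channel options, declared here so the
// sketch is self-contained.
interface ChannelOptions {
  ordered?: boolean;
  maxRetransmits?: number;
  maxPacketLifeTime?: number; // milliseconds
}

// Safety-critical commands: ordered, and retransmitted until delivered.
const commandChannel: ChannelOptions = { ordered: true };

// Frequent sensor readings: a stale value is useless, so never retransmit.
const telemetryChannel: ChannelOptions = { ordered: false, maxRetransmits: 0 };

// A data channel is reliable when neither retransmission limit is set.
function isReliable(options: ChannelOptions): boolean {
  return options.maxRetransmits === undefined && options.maxPacketLifeTime === undefined;
}

// In a browser, the options would be passed as:
//   peerConnection.createDataChannel("commands", commandChannel)
const commandsReliable = isReliable(commandChannel);    // true
const telemetryReliable = isReliable(telemetryChannel); // false
```

Reliable delivery still isn’t the same as timely delivery: a retransmitted stop command can arrive late on a poor network, so the concern about split-second control remains.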
The protocol may also not be ideal for highly energy-constrained applications – for example, for smaller IoT devices that operate on a battery – as WebRTC can consume a lot of power. Still, WebRTC remains a top option for voice and video chat communication over the internet, and with time, it may become more common in IoT as well.
WebRTC can be pretty complicated to understand, and this article barely covers the basics of what it’s capable of or how it works. But one of the fundamental features of WebRTC, namely that it is P2P, makes it well suited for IoT. Unfortunately, its high bandwidth requirements may make it less suitable for certain IoT use cases.
To learn how to use Nabto’s upcoming WebRTC solution with all the benefits of the Nabto Edge platform, which is low-latency and ideal for resource constrained applications, contact us and request a consultation.
Read our other resources
We’ve also published a range of IoT device resources for our community, including:
- Our P2P explainer, which lays out the many benefits of P2P communication for IoT
- IoT and the Future of Video Surveillance, which covers what you can expect to see in smart security applications
- Why Use P2P for IoT Video Streaming?, a guide to why P2P is often the best option for IoT video applications