A Complete Guide to the Real-Time Streaming Protocol (RTSP)

With video surveillance becoming a top application of smart technology, video streaming protocols are getting a lot more attention. We’ve recently spent a lot of time in our blog posts discussing real-time communication, both to and from video devices, and that has finally led to an examination of the Real-Time Streaming Protocol (RTSP) and its place in the Internet of Things (IoT).

What is the Real-Time Streaming Protocol?

The Real-Time Streaming Protocol is an application-level network control protocol designed for use in entertainment and communications systems to establish and control media streaming sessions. It lets you play, record, and pause media in real time. Basically, it acts like a digital version of the remote control you use on your TV at home.

We can trace the origins of RTSP back to 1996, when a collaborative effort between RealNetworks, Netscape, and Columbia University developed it with the intent to create a standardized protocol for controlling streaming media over the Internet. These groups designed the protocol to be compatible with existing network protocols, such as HTTP, but with a focus specifically on the control aspects of streaming media, which HTTP did not adequately address at the time.

The Internet Engineering Task Force (IETF) officially published RTSP as RFC 2326 in April of 1998. Since then, developers have used it for various applications, including streaming media over the Internet, IP surveillance cameras, and other systems that require real-time delivery of streaming content.

It’s important to note that RTSP does not actually transport the streaming data itself; rather, it controls the connection and the streaming, often working in conjunction with other protocols like the Real-time Transport Protocol (RTP) for the transport of the actual media data.

RTSP works on a client-server architecture, in which a software application or media player, called the client, sends requests to a second party, the server. In a typical IoT interaction, the client software runs on your smartphone or computer, and you send commands to a smart video camera or other smart device that acts as the server. The server responds to each request by performing a specific action, like playing or pausing a media stream or starting a recording, so you can choose what the device does in real time.

Understanding RTSP requests

So, the client in an RTSP connection sends requests. But what exactly does that mean?

Basically, the setup process for streaming via RTSP involves a media player or feed monitoring platform on your computer or smartphone sending requests to the camera’s URL to establish a connection. The client uses the “SETUP” command to set up the streaming session and the “PLAY” command to start the stream. The camera responds to “SETUP” with the session details that RTP needs to send the media data, including which transport protocol it will use.

Once the camera receives the “PLAY” command through RTSP, it begins to stream packets of video data in real time via RTP, possibly through a TCP tunnel (more on this later). The media player or monitoring software then receives these packets and decodes them into viewable video.

Here’s a fuller list of RTSP requests and their meanings:

OPTIONS: Queries the server for the supported commands. It’s used to request the available options or capabilities of a server.
DESCRIBE: Requests a description of a media resource, typically in SDP (Session Description Protocol) format, which includes details about the media content, codecs, and transport information.
SETUP: Initializes the session and establishes a media transport, specifying how the media streams should be sent. This command also prepares the server for streaming by allocating necessary resources.
PLAY: Starts the streaming of the media. It tells the server to start sending data over the transport protocol defined in the SETUP command.
PAUSE: Temporarily halts the stream without tearing down the session, allowing it to be resumed later with another PLAY command.
TEARDOWN: Ends the session and stops the media stream, freeing up the server resources. This command effectively closes the connection.
GET_PARAMETER: Used to query the current state or value of a parameter on the session or media stream.
SET_PARAMETER: Allows the client to change or set the value of a parameter on the session or media stream.
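
To make the requests above concrete, here’s a minimal sketch of what they look like on the wire. RTSP requests are plain text laid out much like HTTP: a request line, a CSeq header that numbers the request, any further headers, and a blank line. The camera address, the trackID=0 path, the client ports, and the session ID are all assumptions for illustration; a real client takes the track path from the DESCRIBE response and the session ID from the SETUP response.

```python
def build_rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request as text."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line ends the header section, as in HTTP.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical camera address, for illustration only.
CAMERA_URL = "rtsp://192.0.2.10:554/stream1"


def typical_session(url, session_id="12345678"):
    """Return the requests a client commonly sends, in order.

    session_id is a placeholder: the server assigns the real value
    in its reply to SETUP.
    """
    return [
        build_rtsp_request("OPTIONS", url, 1),
        build_rtsp_request("DESCRIBE", url, 2, {"Accept": "application/sdp"}),
        # The track path below is an assumption; it normally comes from the SDP.
        build_rtsp_request(
            "SETUP", f"{url}/trackID=0", 3,
            {"Transport": "RTP/AVP;unicast;client_port=5000-5001"},
        ),
        build_rtsp_request("PLAY", url, 4, {"Session": session_id, "Range": "npt=0.000-"}),
        build_rtsp_request("TEARDOWN", url, 5, {"Session": session_id}),
    ]


if __name__ == "__main__":
    for request in typical_session(CAMERA_URL):
        print(request)
```

This only formats the messages. Sending them means writing each one to a TCP socket connected to the camera and reading the reply before sending the next.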

The server answers each request with a response that carries a status code. For example, a “200 OK” response indicates a successful completion of the request, while “401 Unauthorized” indicates that the client must authenticate before the server will fulfill the request. And “404 Not Found” means the specified resource does not exist. If that looks familiar, it’s because RTSP borrows its status codes from HTTP, and you’ve probably seen 404 errors and a message like “Web page not found” at least once in the course of navigating the internet.
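
As a rough illustration of how a client might read these responses, the sketch below splits a response into its status code, reason phrase, and headers. The sample response text, including the realm and nonce values, is made up for the example.

```python
def parse_rtsp_response(raw):
    """Split an RTSP response into (status_code, reason, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # The status line looks like: RTSP/1.0 200 OK
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            # Header names are case-insensitive, so store them lowercased.
            headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


# Made-up response from a camera that wants credentials.
sample = (
    "RTSP/1.0 401 Unauthorized\r\n"
    "CSeq: 2\r\n"
    'WWW-Authenticate: Digest realm="camera", nonce="abc123"\r\n'
    "\r\n"
)

code, reason, headers = parse_rtsp_response(sample)
if code == 401:
    print("Server wants authentication:", headers["www-authenticate"])
```

The CSeq header in the response echoes the number from the request, which is how a client matches responses to the requests it sent.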

The Real-Time Transport Protocol

As I said earlier, RTSP doesn’t directly transmit the video stream. Instead, developers use the protocol in conjunction with a transport protocol. The most common is the Real-time Transport Protocol (RTP). RTP delivers audio and video over networks from the server to the client so you can, for example, view the feed from a surveillance camera on your phone. The protocol is widely used in streaming media systems and video conferencing to transmit real-time data, such as audio, video, or simulation data.

Some of the key characteristics of RTP include:

Payload Type Identification: RTP headers include a payload type field, which allows receivers to interpret the format of the data, such as the codec being used.
Sequence Numbering: Each RTP data packet is assigned a sequence number. This helps the receiver detect data loss and reorder packets that arrive out of sequence.
Timestamping: RTP packets carry timestamp information to enable the receiver to reconstruct the timing of the media stream, maintaining correct pacing of audio and video playback.
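
All three of these fields sit in the 12-byte fixed header at the start of every RTP packet. Here’s a minimal sketch of reading them with Python’s struct module. The sample packet is fabricated, and the loss helper assumes packets arrive in order, which a real receiver cannot take for granted.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # top 2 bits, 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,     # identifies the media format
        "sequence_number": seq,        # 16 bits, wraps around at 65535
        "timestamp": timestamp,        # media clock, used to pace playback
        "ssrc": ssrc,                  # identifies the stream source
    }


def packets_lost_between(prev_seq, seq):
    """Count missing packets between two sequence numbers.

    Assumes in-order arrival; the mask handles the 16-bit wraparound.
    """
    return (seq - prev_seq - 1) & 0xFFFF


# Fabricated packet: version 2, payload type 96, sequence 1000, timestamp 90000.
sample = struct.pack("!BBHII", 0x80, 96, 1000, 90000, 0x1234ABCD) + b"payload"
header = parse_rtp_header(sample)
print(header["payload_type"], header["sequence_number"], header["timestamp"])
```

Payload type 96 is in the dynamic range, so its meaning (for example, which video codec) comes from the session description rather than from RTP itself.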

RTP and RTSP are still not enough on their own to handle all the various tasks involved in streaming video data. Typically, a streaming session will also involve the Real-time Transport Control Protocol (RTCP), which provides feedback on the quality of the data distribution, including statistics and information about participants in the streaming session.

And finally, RTP itself does not provide any mechanism for ensuring timely delivery or protecting against data loss; instead, it relies on underlying network protocols such as the User Datagram Protocol (UDP) or Transmission Control Protocol (TCP) to handle data transmission. To put it all together, RTP puts data in packets and transports it via UDP or TCP, while RTCP helps with quality control and RTSP only comes in to set up the stream and act like a remote control.
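
The choice between UDP and TCP shows up in the Transport header the client sends with SETUP. The sketch below, with made-up port numbers and function names, builds that header for both cases. For the TCP case it also shows the interleaved framing RTSP defines for carrying RTP on the same connection: a “$” byte, a one-byte channel number, a two-byte length, then the packet.

```python
import struct


def transport_header(use_tcp, client_rtp_port=5000):
    """Build the Transport header value for a SETUP request."""
    if use_tcp:
        # RTP on channel 0 and RTCP on channel 1 of the RTSP connection.
        return "RTP/AVP/TCP;unicast;interleaved=0-1"
    # RTP over UDP: the client listens on an RTP/RTCP port pair.
    return f"RTP/AVP;unicast;client_port={client_rtp_port}-{client_rtp_port + 1}"


def read_interleaved_frame(buffer):
    """Pull one interleaved frame off the front of a TCP byte buffer.

    Returns (channel, payload, remaining_bytes), or None if the buffer
    does not yet hold a complete frame.
    """
    if len(buffer) < 4 or buffer[:1] != b"$":
        return None
    channel, length = struct.unpack("!BH", buffer[1:4])
    if len(buffer) < 4 + length:
        return None
    return channel, buffer[4:4 + length], buffer[4 + length:]


# Fabricated frame: channel 0, 3-byte payload, followed by unrelated bytes.
frame = b"$" + struct.pack("!BH", 0, 3) + b"abc" + b"rest"
print(transport_header(use_tcp=True))
print(read_interleaved_frame(frame))
```

Because TCP delivers a byte stream rather than discrete packets, the length field is what lets the receiver find where one RTP packet ends and the next begins.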

RTSP via TCP tunneling

While I said you can use both UDP and TCP to deliver a media stream, I usually recommend RTSP over TCP, specifically using TCP tunneling like what Nabto provides. Basically, TCP tunneling makes it easier for RTSP commands to get through network firewalls and Network Address Translation (NAT) systems.

This is necessary because RTSP out of the box has certain deficiencies when it comes to authentication and privacy, and its features were not built for today’s internet, where firewalls sit on all sides. RTSP was originally designed for streaming data from central services, not for devices on local home networks behind NAT systems. For that reason, it struggles to get through firewalls or to locate and access cameras behind them, which limits its possible applications.

However, using TCP tunneling allows RTSP to get through firewalls and enables easy NAT traversal while maintaining strong authentication. It basically allows you to use an existing protocol and just “package” it in TCP for enhanced functionality.

The tunnel can wrap RTSP communication inside a NAT traversal layer to get through the firewall. This is important because it can be difficult to set up a media stream between devices that are on different networks: for example, if you’re trying to monitor your home surveillance system while you’re on vacation.
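
From the media player’s point of view, a TCP tunnel commonly appears as a port on the local machine that forwards to the camera’s RTSP port. The sketch below assumes that setup and simply rewrites a camera URL to point at the local end of the tunnel. The function name, addresses, and port 8554 are hypothetical, and this is not Nabto’s API; opening the tunnel itself is left to whatever tunneling client you use.

```python
from urllib.parse import urlsplit, urlunsplit


def via_local_tunnel(camera_url, local_port, local_host="127.0.0.1"):
    """Point an RTSP URL at the local end of a TCP tunnel.

    Keeps the path, query, and any user:password part; replaces only
    the host and port.
    """
    parts = urlsplit(camera_url)
    userinfo, at, _hostport = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{local_host}:{local_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical camera URL and local tunnel port.
print(via_local_tunnel("rtsp://192.0.2.10:554/stream1", 8554))
```

The player then opens the rewritten URL as if the camera were on the same machine, and the tunnel carries the RTSP traffic the rest of the way.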

Another benefit of TCP tunneling is enhanced security. Whereas RTSP and RTP don’t have the out-of-box security features of some other protocols, like WebRTC, you can fully encrypt all data that goes through the TCP tunnel. These important factors have made RTSP via TCP tunneling a top option for video streaming within IoT.

Final thoughts

If you’re an IoT developer, the process of choosing between video streaming protocols and services can quickly become overwhelming. Nabto can help you integrate real-time communication (RTC) into IoT products. For more information on TCP tunneling and other services we can provide, contact us and request a consultation.

Read our other resources:

We’ve also published a range of IoT resources for our community, including:

Our RTC explainer, which lays out the many benefits of real-time communication for IoT
Our blog post that covers IoT and the future of video surveillance
Our guide to IoT protocols for developers in 2023