Standards and protocols for real-time text

Many implementations of real-time text (RTT) systems exist only as proprietary or undocumented code. For a survey of such systems, see "A survey of real-time text systems".

WM: Archived via the Internet Archive Wayback Machine


1. T.140-based RTT over RTP/SIP

This family of standards takes a keystroke-streaming approach to real-time text. T.140 defines the core text format and semantics, RFC 4103 provides the RTP transport with redundancy, and RFC 5194 supplies the SIP-based session framework.

1.1. ITU-T T.140 (1998)

T.140 (Protocol for multimedia application text conversation) defines a foundational protocol for real-time text conversation in multimedia applications. Text is sent as it is typed, one character or a small group of characters at a time. Editing is supported via control codes embedded in the text stream (e.g., backspace, which erases the last character). T.140 is transport-agnostic: it specifies text semantics and character-level streaming and leaves delivery to a lower layer.
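The receiver's side of keystroke streaming can be sketched in a few lines of Python. This is a minimal illustration, not a conforming T.140 implementation: it handles only backspace and the line separator, and the function name apply_t140 is hypothetical.

```python
BACKSPACE = "\u0008"       # control code: erase the last character
LINE_SEPARATOR = "\u2028"  # Unicode line separator, used as a new-line marker


def apply_t140(display: str, chunk: str) -> str:
    """Apply one received chunk of text to the receiver's display buffer.

    Assumption: only backspace and line separator are interpreted;
    every other character is appended unchanged.
    """
    out = list(display)
    for ch in chunk:
        if ch == BACKSPACE:
            if out:            # ignore a backspace on an empty buffer
                out.pop()
        elif ch == LINE_SEPARATOR:
            out.append("\n")
        else:
            out.append(ch)
    return "".join(out)


# The sender types "helo", notices the typo, backspaces once, and types "lo".
display = ""
for chunk in ["h", "e", "l", "o", BACKSPACE, "l", "o"]:
    display = apply_t140(display, chunk)
print(display)
```

Because corrections travel as ordinary characters in the stream, the receiver needs no knowledge of message boundaries: it simply replays what the sender typed.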

Source: itu.int/rec/T-REC-T.140/en
Mirrors: [PDF] [PDF of addendum]

1.2. IETF RFC 4103 (2005)

RFC 4103 (RTP Payload for Text Conversation) defines an RTP payload format that carries T.140 text over IP networks. Each RTP packet contains one T.140 block. UDP packet loss is mitigated by adding redundancy (RFC 2198), typically including up to two previous text generations in every packet. Timestamps and sequence numbers allow synchronization and loss detection.
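The redundancy mechanism can be illustrated by packing and unpacking the RFC 2198 payload layout: one 4-byte header per redundant block (F bit, 7-bit payload type, 14-bit timestamp offset, 10-bit length), a final 1-byte header for the primary block, then the block data. The sketch below covers only the payload, not the RTP header; the payload type 98 and the names pack_red and unpack_red are assumptions chosen for illustration.

```python
import struct

T140_PT = 98  # assumed dynamic payload type; the real value comes from SDP


def pack_red(primary: bytes, redundant: list) -> bytes:
    """Build a redundant payload.

    redundant is a list of (timestamp_offset, block) pairs, oldest first.
    """
    headers = b""
    for offset, block in redundant:
        if not (0 <= offset < (1 << 14)) or len(block) >= (1 << 10):
            raise ValueError("offset or block length does not fit the header")
        # F=1 | payload type (7 bits) | timestamp offset (14) | length (10)
        word = (1 << 31) | (T140_PT << 24) | (offset << 10) | len(block)
        headers += struct.pack("!I", word)
    headers += bytes([T140_PT])  # final header: F=0, primary payload type
    return headers + b"".join(block for _, block in redundant) + primary


def unpack_red(payload: bytes) -> list:
    """Return [(timestamp_offset, block), ...], oldest first, primary last."""
    meta = []
    pos = 0
    while payload[pos] & 0x80:  # F bit set: another redundant header follows
        (word,) = struct.unpack_from("!I", payload, pos)
        meta.append(((word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
    pos += 1  # skip the 1-byte primary header
    blocks = []
    for offset, length in meta:
        blocks.append((offset, payload[pos:pos + length]))
        pos += length
    blocks.append((0, payload[pos:]))  # the primary block takes the rest
    return blocks


# Third packet of a stream "a", "b", "c" sent 300 ms apart: it still
# carries "a" and "b", so losing the first two packets loses no text.
packet = pack_red(b"c", [(600, b"a"), (300, b"b")])
print(unpack_red(packet))
```

A receiver that detects a gap in RTP sequence numbers can take the missing text from the redundant blocks of the next packet that arrives, using the timestamp offsets to tell which generations it has already displayed.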

Source: rfc-editor.org/rfc/rfc4103.txt
Mirrors: [TXT]

1.3. IETF RFC 5194 (2008)

RFC 5194 (Framework for Real-Time Text over IP Using the Session Initiation Protocol (SIP)) provides the architectural framework (Text-over-IP or ToIP) for deploying real-time text in SIP-based multimedia sessions. It defines session setup, SDP negotiation for the “text” media type, mid-call modality switching, and integration with other media (voice/video). It relies on T.140 encoding carried via RFC 4103 RTP payloads, with requirements for low latency (≤300 ms per character) and support for interworking with legacy TTY systems.
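As a rough illustration of the SDP negotiation step, the following Python sketch builds the media description an endpoint might offer for a text stream with redundancy. The payload type numbers (98 and 100), the port, the preference order, and the function name text_media_offer are assumptions; a real offer also needs the session-level SDP lines, which are omitted here.

```python
def text_media_offer(port: int, t140_pt: int = 98, red_pt: int = 100,
                     generations: int = 2) -> str:
    """Return an SDP media section offering T.140 text, with and without
    redundancy. All default values are illustrative assumptions."""
    # The fmtp line for "red" lists the primary payload type followed by
    # one entry per redundant generation.
    red_chain = "/".join([str(t140_pt)] * (generations + 1))
    lines = [
        # Listing the redundant format first marks it as preferred.
        f"m=text {port} RTP/AVP {red_pt} {t140_pt}",
        f"a=rtpmap:{t140_pt} t140/1000",  # 1000 Hz clock: timestamps in ms
        f"a=rtpmap:{red_pt} red/1000",
        f"a=fmtp:{red_pt} {red_chain}",
    ]
    return "\r\n".join(lines) + "\r\n"


print(text_media_offer(11000))
```

Offering plain t140 alongside the redundant format lets the answering side fall back to text without redundancy if it does not support RFC 2198.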

Source: rfc-editor.org/rfc/rfc5194.txt
Mirrors: [TXT]

2. Real-time text over XMPP

Extensible Messaging and Presence Protocol (XMPP), formerly Jabber, is an open XML-based standard for communication. It defines a number of XMPP Extension Protocols (XEPs).

One notable extension protocol is XEP-0301, which specifies how real-time text can be implemented in an XMPP system. XEP-0301 is designed to work alongside other XEPs, so XMPP can be used to build, for instance, a multi-user real-time text system with presence indicators, rich text, and attachments.

2.1. XEP-0301: In-Band Real Time Text (2013)

XEP-0301 defines an in-band Real-Time Text protocol for XMPP. Unlike the T.140 family's keystroke-streaming model, it uses a diff-based approach: updates are sent as XML action elements inside an <rtt/> element carried in <message/> stanzas. The action elements are <t/> for insertions at a position, <e/> for erasures, and <w/> for timing waits that preserve natural typing rhythm. Updates are typically sent about every 700 ms. A periodic <rtt event='reset'/> retransmits the full current state for synchronization. This design is more bandwidth-efficient, supports complex mid-message edits, and reduces the impact of packet loss compared to pure keystroke transmission.
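How a receiver applies these action elements can be sketched with Python's standard XML parser. This is a simplified illustration: it ignores the seq attribute and other event values, treats <w/> as a no-op instead of pausing, and the function name apply_rtt is hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_rtt(text: str, rtt_xml: str) -> str:
    """Apply the action elements of one <rtt/> element to the message text."""
    rtt = ET.fromstring(rtt_xml)
    if rtt.get("event") in ("new", "reset"):
        text = ""  # both events replace the message from scratch
    for action in rtt:
        tag = action.tag.split("}")[-1]  # strip the XML namespace prefix
        if tag == "t":    # insert text at position p (default: end)
            p = max(0, min(int(action.get("p", len(text))), len(text)))
            text = text[:p] + (action.text or "") + text[p:]
        elif tag == "e":  # erase n characters before position p
            p = max(0, min(int(action.get("p", len(text))), len(text)))
            n = min(int(action.get("n", 1)), p)
            text = text[:p - n] + text[p:]
        elif tag == "w":  # a real client would pause for n milliseconds
            pass
    return text


NS = "xmlns='urn:xmpp:rtt:0'"
msg = apply_rtt("", f"<rtt {NS} seq='0' event='new'><t>Helo</t></rtt>")
msg = apply_rtt(msg, f"<rtt {NS} seq='1'><t p='3'>l</t><w n='200'/>"
                     f"<t> there</t></rtt>")
msg = apply_rtt(msg, f"<rtt {NS} seq='2'><e n='6'/></rtt>")
print(msg)
```

Because each action carries an explicit position, the sender can fix a typo in the middle of the message with a single insertion, where a keystroke stream would need to backspace over and retype everything after it.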

Source: xmpp.org/extensions/xep-0301.html
Mirrors: [WM] [PDF] [MHTML]