For our new XMPP server we selected the APR library (with Tomcat Native as the Java wrapper) as the basis of the network I/O layer. The new server handles several times more connections per node, reducing the number of servers in production and the frequency of JVM restarts.
Odnoklassniki.ru (Russian одноклассники, “schoolmates”) is one of the most popular social networks in the Russian-speaking part of the Internet. More than 150 million users are registered, and more than 70 million people use Odnoklassniki every month. The three main offices are located in Moscow (Russia), Saint Petersburg (Russia) and Riga (Latvia).
Along with the website, we have a mobile version, mobile applications and an instant messenger application. The IM service uses the XMPP protocol and accepts connections from third-party clients (Psi, Miranda, etc.). The number of concurrent connections has been growing steadily, from 150k (January 2012) to 300k (July 2012), and is expected to reach 500k just before the New Year holidays.
We had 12 Java XMPP servers based on the Tigase XMPP server, each handling about 25k connections. During normal operation everything was fine, but occasionally bad things happened: Full GC pauses and problems with off-heap memory. It is also very hard to use that codebase as a “proxy”: we do not need the XMPP server to store messages in a database or maintain contact lists, it should simply pass all requests to our EJB servers and the responses back. That was very hard to do with Tigase.
We still prefer Java as a platform because it is our corporate standard, has a garbage collector, and for many other reasons; moving to C or Erlang was not an option. So we decided to create our own XMPP server, but with a JNI-based network layer. We considered using one of our home-grown libraries, which already serves as the base for a very fast NIO HTTP server. But I insisted that we needed a library that is not only fast but also cross-platform, i.e. it should be possible to design and test the application on Windows and run it under Linux. So “Apache Portable Runtime” and “Tomcat Native” were selected. It turned out that Tomcat Native already has OpenSSL wrappers (the home-grown library does not), and SSL support is required for SSL/TLS in XMPP.
(illustration of server design here)
Our application uses a single write poll thread, a pool of read poll threads and an additional async operation thread pool; every operation, including reading from and writing to a socket, runs on the async pool. When a poll signals activity on a socket, the socket is removed from the poll set and passed to the async operation pool. When the operation is complete, the socket is returned to the poll. Because the polls and the async thread pools are separate, one slow socket will not block other sockets from reading or writing.
(details of implementation here: Read/Write poll as an example; per-socket “operation queue”)
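To make the design more concrete, here is a minimal sketch of how such a read-poll loop could look on top of the Tomcat Native bindings (org.apache.tomcat.jni.Poll). The ReadableHandler interface, the pool size and the timeout are assumptions made for the example, not our actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.tomcat.jni.Poll;

/** Called on an async pool thread for every socket the poll reports as readable. */
interface ReadableHandler {
    void onReadable(long socket);
}

final class ReadPollLoop implements Runnable {
    private final long pollSet;         // from Poll.create(maxSockets, aprPool, 0, -1)
    private final long[] signalled;     // 2 slots per socket: [event mask, socket handle]
    private final ExecutorService asyncPool = Executors.newFixedThreadPool(16);
    private final ReadableHandler handler;

    ReadPollLoop(long pollSet, int maxSockets, ReadableHandler handler) {
        this.pollSet = pollSet;
        this.signalled = new long[maxSockets * 2];
        this.handler = handler;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // remove=true takes a signalled socket out of the poll set, so it cannot
            // fire again while an async thread is still working on it.
            int n = Poll.poll(pollSet, 100_000 /* microseconds */, signalled, true);
            for (int i = 0; i < n; i++) {           // n < 0 means timeout or error
                final long socket = signalled[i * 2 + 1];
                asyncPool.execute(() -> handler.onReadable(socket));
            }
        }
    }
}
```

The key point is the remove=true flag: a socket leaves the poll set as soon as it is signalled and only comes back after the async pool is done with it.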
As soon as we read some data from a socket, we pass the ByteBuffer to the XML parser, which is based on the Jasper XMPP implementation. As soon as we have a complete XML element (“stanza” in XMPP terms), it is passed to one of the processors. It does not matter how long the parser or a processor runs, because all of this happens on a thread from the “async” pool, not in the “poll-signal-process-poll” cycle.
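Below is a hedged sketch of what the READ operation could look like on the async pool: it pulls bytes through the Tomcat Native Socket API, feeds them to an incremental parser and dispatches every completed stanza. ReadCommand, XmppStreamParser and StanzaProcessor are illustrative names invented for the example, not our real classes.

```java
import java.nio.ByteBuffer;

import org.apache.tomcat.jni.Socket;

interface XmppStreamParser {
    void feed(ByteBuffer data);          // incremental parse, a stanza may span many reads
    boolean hasCompleteStanza();
    Object nextStanza();
}

interface StanzaProcessor {
    void process(Object stanza);
}

final class ReadCommand {
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);

    /** Runs on an async pool thread; returns true if the socket should be closed. */
    boolean execute(long socket, XmppStreamParser parser, StanzaProcessor processor) {
        buffer.clear();
        int read = Socket.recvb(socket, buffer, 0, buffer.capacity());
        if (read <= 0) {
            return true;                            // error or EOF: close the socket
        }
        buffer.position(0).limit(read);
        parser.feed(buffer);
        while (parser.hasCompleteStanza()) {
            processor.process(parser.nextStanza()); // heavy work is fine on this thread
        }
        return false;
    }
}
```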
Not only the client can initiate socket activity, but the server can as well (for example, a ping operation every 10 minutes). We must not mix up read operations, write operations and other socket operations (such as a TLS handshake), so each socket has an “operation queue”: the commands that need to be executed for that socket. A command can be READ, WRITE (with a link to its buffer), a TLS handshake, or, for example, closing the socket. All commands are executed in FIFO order. Technically, when the poll signals that new data is available, it adds a READ command to the socket’s queue and passes the socket to the queue execution method. If the server needs the socket to do something while the socket is currently in the poll, the server code adds a new command to the queue and asks the poll thread to remove the socket from the poll and start the operation queue. When no commands are left in the queue and the socket is not closed, it is returned to the read poll.
(short code examples for read poll cycle, operations queue handling, read operation, write operation, TLS handshake operation)
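As a rough sketch of the idea described above (again with invented names, not our real classes): commands run strictly in FIFO order, only one async thread drains the queue at a time, and the socket goes back to the read poll once the queue is empty and the socket is still open.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** One operation on a socket: READ, WRITE (with its buffer), TLS handshake, close, ... */
interface Command {
    /** @return true if this command closed the socket */
    boolean execute(long socket);
}

final class SocketOperationQueue {
    private final Queue<Command> commands = new ArrayDeque<>();
    private boolean draining;            // true while an async thread owns this socket
    private volatile boolean closed;

    /** Adds a command; returns true if the caller must schedule drain() on the async pool. */
    synchronized boolean enqueue(Command cmd) {
        commands.add(cmd);
        if (draining) {
            return false;                // the thread already draining will pick it up
        }
        draining = true;
        return true;
    }

    /** Runs on an async pool thread, never on the poll thread. */
    void drain(long socket) {
        while (true) {
            Command cmd;
            synchronized (this) {
                cmd = commands.poll();
                if (cmd == null) {
                    draining = false;
                    if (!closed) {
                        returnToReadPoll(socket);   // e.g. Poll.add(readPollSet, socket, Poll.APR_POLLIN)
                    }
                    return;
                }
            }
            if (cmd.execute(socket)) {
                closed = true;           // CLOSE command, or a failed read/write
            }
        }
    }

    private void returnToReadPoll(long socket) {
        // illustrative stub: hand the socket back to a read poll thread, which re-adds it
    }
}
```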
As a result, we now have only 4 servers, each handling 90k simultaneous connections (actual uptime data placeholder), and each has been tested to scale up to 200k. They run for weeks without a restart. (Actual uptime data placeholder.)
(Graphics here: compare of old and new server CPU usage, GC count and times)
Some problems and tips:
Working with SSL (the TLS handshake) still has several issues with socket timeouts.
Consider carefully which design is best for you. For “short” requests, “accept-poll-signal-respond-close” is much simpler and still perfectly usable. For long-living sockets the more complicated design is fine, but be careful not to “lose” sockets somewhere between threads.