I removed all the fcntl() statements from the TCPConnection class. Surprisingly, Gush still worked fine, albeit slowly at first, when I ran “connect slice williams_kudzu”. I’m still rather confused about the purpose of fcntl(). If it is there to set flags on fds that are inherited during forking, it seems that a completely separate set of local fds (like 0, 1, 2) is used for the interprocess communication.
Important precondition: before the fcntl() statements were removed, Gush had been run on a set of clean nodes. As a result, those nodes already had the Gush clients installed by the time Gush was recompiled without the fcntl() statements. I don’t know what role this plays; I’m recording it here for future reference.
Anyway, this is good news. The fcntl() statements are the only socket-related calls that cannot be emulated in UDT, and since they’re not absolutely essential, I can leave them out of the UDTConnection class. Ideally, the new UDTConnection class will closely mirror TCPConnection: every socket-related call will be replaced by its UDT equivalent (along with some nasty hacks just to get things working).
This echoes my earlier concern:
I have been looking for an application-layer protocol and came across UDT (http://udt.sourceforge.net). Although it is intended for high-speed networks, it could plausibly be used in the TCPConnection class: it provides an interface similar to BSD sockets, and it runs over UDP. Right now, I’m trying to convert BSD-socket calls into UDT-socket calls, e.g. from “int sock = socket(…)” to “UDTSOCKET sock = UDT::socket(…)”. This is a hello-world application in UDT: http://udt.sourceforge.net/udt4/doc/t-hello.htm. It is not very different from a typical BSD-socket application.
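To make the comparison concrete, here is a typical BSD-socket hello world that runs entirely over loopback in one process (the kernel completes the TCP handshake into the listen backlog, so connect() can return before accept() is called). The comments name the UDT4 call that would take each step’s place, following the pattern of the UDT tutorial linked above. The function name loopback_hello is my own, and the UDT mapping is a sketch that has not been compiled against the UDT library.

```cpp
#include <arpa/inet.h>
#include <cstddef>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Server and client in one process over 127.0.0.1; returns the received text.
std::string loopback_hello() {
    std::string result;
    // UDT: UDTSOCKET serv = UDT::socket(AF_INET, SOCK_STREAM, 0);
    int serv = ::socket(AF_INET, SOCK_STREAM, 0);
    if (serv < 0) return result;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = 0;  // let the kernel pick a free port
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t alen = sizeof addr;

    // UDT: UDT::bind(serv, ...); UDT::listen(serv, ...);
    if (::bind(serv, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == 0 &&
        ::listen(serv, 1) == 0 &&
        ::getsockname(serv, reinterpret_cast<sockaddr*>(&addr), &alen) == 0) {
        // UDT: UDTSOCKET client = UDT::socket(...); UDT::connect(client, ...);
        int client = ::socket(AF_INET, SOCK_STREAM, 0);
        if (client >= 0 &&
            ::connect(client, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == 0) {
            // UDT: UDTSOCKET conn = UDT::accept(serv, ...);
            int conn = ::accept(serv, nullptr, nullptr);
            if (conn >= 0) {
                const char msg[] = "hello world!";
                const std::size_t len = sizeof msg - 1;  // drop the trailing '\0'
                // UDT: UDT::send(client, msg, len, 0);
                if (::send(client, msg, len, 0) == static_cast<ssize_t>(len)) {
                    char buf[64];
                    // UDT: UDT::recv(conn, buf, sizeof buf, 0);
                    ssize_t n = ::recv(conn, buf, sizeof buf, 0);
                    if (n > 0) result.assign(buf, static_cast<std::size_t>(n));
                }
                ::close(conn);  // UDT: UDT::close(conn);
            }
        }
        if (client >= 0) ::close(client);  // UDT: UDT::close(client);
    }
    ::close(serv);  // UDT: UDT::close(serv);
    return result;
}
```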
Yet UDT is not a panacea. Some fd-related operations simply have no counterpart in the UDT world. For instance, “fcntl(sock, F_SETFD, FD_CLOEXEC)” and “fcntl(sock, F_SETFL, O_NONBLOCK)” are not possible if the UDT library is used. If I’m not wrong, they are meant for interprocess communication when proxy installers are spawned. Since we still need fds for interprocess communication, Gush may still run out of fds if processes are spawned too quickly, or if they don’t exit quickly enough because of network latency (which is usually the case). These slow processes eat up the available fds, starving new connections. Since this is where most fds are consumed, I do not know how effective the TCP-to-UDT conversion will be against the running-out-of-fds problem. Of course, adopting UDT may improve performance (or so the paper claims).
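The fd-starvation worry can be reproduced in miniature: every socket (like every pipe set up for a spawned child) takes an entry in the same per-process descriptor table, and once the RLIMIT_NOFILE soft limit is reached, socket() fails with EMFILE. The sketch below lowers the soft limit temporarily to show this; exhaust_fds and ExhaustionResult are hypothetical helpers for illustration, not part of Gush.

```cpp
#include <cerrno>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

struct ExhaustionResult {
    int opened;      // sockets created before socket() failed
    int last_errno;  // errno from the failing call (expected: EMFILE)
};

// Temporarily lowers the soft RLIMIT_NOFILE to `limit`, opens sockets until
// socket() fails, then closes them all and restores the original limit.
ExhaustionResult exhaust_fds(rlim_t limit) {
    ExhaustionResult r{0, 0};
    rlimit old{};
    if (getrlimit(RLIMIT_NOFILE, &old) != 0) { r.last_errno = errno; return r; }

    rlimit lowered = old;
    lowered.rlim_cur = limit < old.rlim_cur ? limit : old.rlim_cur;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) { r.last_errno = errno; return r; }

    std::vector<int> fds;
    for (;;) {
        int s = ::socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0) { r.last_errno = errno; break; }  // descriptor table is full
        fds.push_back(s);
    }
    r.opened = static_cast<int>(fds.size());

    for (int fd : fds) ::close(fd);
    setrlimit(RLIMIT_NOFILE, &old);  // raising back up to the old soft limit is allowed
    return r;
}
```

Calling exhaust_fds(32) in a fresh process should report fewer than 32 opened sockets, since stdin, stdout and stderr normally already occupy three slots.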
For now, I’m hacking the source code of UDT and trying to fit it into Gush. As for the “fcntl” cases, I may eventually leave them out in the UDT version and figure out an alternative.