UDT issue

March 4th, 2010 Danny Y. Huang No comments

So here is the setup. I use the gush server to bootstrap a client. Because the client and server don’t communicate properly, the client sits and listens for incoming connections. Next, I used the appclient program that comes with the UDT package to check that the client is listening properly. It is.

yh1@sysnet1:~/gush/udt4/app$ ./appclient 137.165.1.113 15000
SendRate(Mb/s)	RTT(ms)	CWnd	PktSndPeriod(us)	RecvACK	RecvNAK
59.6142		13.026	170	149.114			79	5
37.1614		8.06	132	142.323			57	0

So I guess the problem must be inherent to Gush, then. Stay tuned.

Categories: research Tags:

Preventing client from launching twice

March 3rd, 2010 Danny Y. Huang No comments

Right now UDT is not really working, so the client is launched twice. The first time, the gush server tries to communicate with the client via TCP. When that fails, it makes a second attempt over an SSH connection. While this is a failsafe mechanism in Gush, it results in the client launching twice, which prevents UDT from binding. To solve the issue, add the following line to gush.prefs:

false
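
Why a second client instance cannot bind can be sketched with plain UDP sockets, which is the transport UDT runs over. This is only an illustration under stated assumptions: the helper names bindUdpLoopback and boundPort are hypothetical, and a raw UDP bind stands in for whatever UDT does internally when the client starts listening.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstring>

// Hypothetical helper: bind a new UDP socket to 127.0.0.1:port.
// Port 0 lets the kernel pick a free port.
// Returns the descriptor, or -1 with errno set by socket() or bind().
int bindUdpLoopback(unsigned short port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0) {
        int saved = errno;   // close() may overwrite errno
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

// Hypothetical helper: report the local port a bound socket ended up on,
// or 0 if the lookup fails.
unsigned short boundPort(int fd) {
    assert(fd >= 0);  // caller must pass a valid descriptor
    sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) < 0) return 0;
    return ntohs(addr.sin_port);
}
```

Calling bindUdpLoopback(0), reading the port back with boundPort, and then calling bindUdpLoopback again with that same port should fail the second time with EADDRINUSE on Linux, which is the same situation a second client copy runs into on its listening port.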
Categories: research Tags:

Segmentation fault with UDT file descriptors

January 24th, 2010 Danny Y. Huang No comments

When Gush is first started, a segmentation fault occurs. Here is the backtrace of the stack.

From the following line in TCPConnection::listen

loop->setHandlers(sock, _accept_handler, NULL, _accept_handler, true, false);

the program jumps to line #166 in event_loop_select.cc. The listing below covers lines 144 to 185 of that file; by that numbering, line 166 is the first FD_SET call, FD_SET( fd, &_read_fds ):

void SelectEventLoop::setHandlers(int fd, ReadHandler *rh, WriteHandler *wh,
	ErrorHandler *eh, bool reading, bool writing)
{
    debugn(6, "setHandlers(" << fd << ", rh=" << (void *)rh << ", wh=" <<
	    (void *)wh << ", eh=" << (void *)eh << ", r=" << reading <<
	    ", w=" << writing << ")" );
    gush_assert( fd >= 0, "fd < 0 in setHandlers()");
 
    const unsigned int ufd = fd;
    MutexHolder lock_it(_lock);
 
    if (_read_handlers.size() <= ufd) {
	_read_handlers.resize(fd+1);
    }
    if (_write_handlers.size() <= ufd) {
	_write_handlers.resize(fd+1);
    }
    if (_error_handlers.size() <= ufd) {
	_error_handlers.resize(fd+1);
    }
    _read_handlers[fd] = rh;
    if ( reading ) {
	FD_SET( fd, &_read_fds );
    } else {
	FD_CLR( fd, &_read_fds );
    }
 
    _write_handlers[fd] = wh;
    if ( writing ) {
	FD_SET( fd, &_write_fds );
    } else {
	FD_CLR( fd, &_write_fds );
    }
 
    _error_handlers[fd] = eh;
    if ( eh != NULL ) {
	FD_SET( fd, &_error_fds );
    } else {
	FD_CLR( fd, &_error_fds );
    }
    updateMaxFD( fd );
}

It turns out that I need to replace all the FD_* macros (FD_SET, FD_CLR and friends) with UDT4’s very own UD_* equivalents.
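
A likely reason the FD_* macros crash here: fd_set is a fixed-size bitmap with FD_SETSIZE slots, so FD_SET on a handle outside that range writes out of bounds. UDT socket handles are not kernel descriptors and need not be small integers. The sketch below uses hypothetical stand-in names (SocketId, SocketSet, socketSetAdd and so on) to show the set-based alternative; it assumes, without quoting the header, that UDT4's UD_* macros work along similar lines on a set of socket handles.

```cpp
#include <sys/select.h>  // fd_set, FD_SETSIZE
#include <cassert>
#include <set>

// Hypothetical stand-ins for illustration only; the real UDT4 types and
// macros are declared in its own header and may differ in detail.
typedef int SocketId;
typedef std::set<SocketId> SocketSet;

// fd_set can only represent descriptors in [0, FD_SETSIZE).
// Passing anything else to FD_SET / FD_CLR is undefined behavior.
inline bool fitsInFdSet(SocketId id) {
    return id >= 0 && id < FD_SETSIZE;
}

// Set-based counterparts of FD_SET, FD_CLR and FD_ISSET.
// Any non-negative handle value is fine because nothing is indexed by it.
inline void socketSetAdd(SocketSet &s, SocketId id) {
    assert(id >= 0);  // mirrors the fd >= 0 check in setHandlers()
    s.insert(id);
}

inline void socketSetRemove(SocketSet &s, SocketId id) {
    s.erase(id);
}

inline bool socketSetContains(const SocketSet &s, SocketId id) {
    return s.count(id) != 0;
}
```

With a handle such as 1 << 20, fitsInFdSet returns false on systems where FD_SETSIZE is 1024, while the set-based functions handle it without trouble. The same concern applies to the _read_handlers.resize(fd+1) calls, which would allocate a very large vector for a large handle.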

Categories: research Tags:

Statically compile the gush client

January 24th, 2010 Danny Y. Huang No comments
CLIENT_LDFLAGS = $(CFLAGS) $(LDFLAGS) $(STATIC_LDFLAGS)
CLIENT_LDFLAGS_POST = $(LDFLAGS_POST)

Use “ldd client” to check its dependencies. If the client is statically linked, ldd lists no shared libraries (on Linux it typically reports “not a dynamic executable”). Static linking should solve the problem that the client cannot run on Planet-Lab machines.

Categories: research Tags:

XML-RPC Library not working on Planet-Lab

January 16th, 2010 Danny Y. Huang No comments

I modified the TCPConnection class to incorporate the UDT libraries. It was a pain to compile and link, but the server seems to work, in the sense that the gush server runs without runtime errors. No changes were made in client.cc, but since it was statically compiled, the client executable has also changed. Yet when run on a PL machine, the client reported:

“./client: error while loading shared libraries: libxmlrpc_client++.so.3: cannot open shared object file: No such file or directory”

Since I was not able to run the client at all, there is no way to test for correctness. All I see right now is that gush can bootstrap but cannot make “TCP” connections (I didn’t change the interfaces yet). Installing the XML-RPC library on PL is impossible because gcc is not available. Installing from the RPM yielded some weird dependency problem. Basically, I’ve been fighting the linking issue for a while now. I feel it is hard to proceed unless I can get the client to work.

Categories: research Tags:

UDT

January 16th, 2010 Danny Y. Huang No comments

Getting the UDT library to compile on Gush is a PAIN! More details coming up.

Categories: research Tags:

Gush without fcntl

January 13th, 2010 Danny Y. Huang No comments

I removed all the fcntl() statements from the TCPConnection class. Gush appeared to work fine, albeit slowly at the start when I ran “connect slice williams_kudzu”. I’m still rather confused about the purpose of fcntl(). If it is to set flags on the fds used in forking, then apparently a totally new set of local fds (like 0, 1, 2) is used for the interprocess communication.

Important precondition: Prior to the removal of the fcntl() statements, Gush was run on a set of clean nodes. As a result, these nodes already had the Gush client by the time Gush was recompiled without the fcntl() statements. I don’t know what role this fact plays; it is recorded here for future reference.

Anyway, this is good news. The fcntl() statements are the only socket-related calls that cannot be emulated in UDT. Since they’re not absolutely essential, I can leave them out of the UDTConnection class. Ideally, the new UDTConnection class will be similar to TCPConnection, with all socket-related calls replaced by equivalent UDT calls (along with some nasty hacks just to get things to work).

This echoes my earlier concern:

I have been looking for an application-layer protocol and I came across UDT (http://udt.sourceforge.net). Though it is intended for high-speed networks, it looks like a plausible fit for the TCPConnection class: it provides an interface similar to BSD sockets, and it runs over UDP. Right now, I’m trying to replace BSD-socket statements with UDT-socket statements, e.g. from “int sock = socket(…)” to “UDTSOCKET sock = UDT::socket(…)”. This is a hello-world application in UDT: http://udt.sourceforge.net/udt4/doc/t-hello.htm. It is not very different from a typical BSD-socket application.

Yet UDT is not a panacea. Some fd-related operations simply have no counterparts in the UDT world. For instance, “fcntl(sock, F_SETFD, FD_CLOEXEC)” and “fcntl(sock, F_SETFL, O_NONBLOCK)” are not possible if the UDT library is used. If I’m not wrong, they are meant for interprocess communication as proxy installers are spawned. Given that we still need fds to facilitate interprocess communication, there is still a chance that Gush may run out of fds if processes are spawned too quickly, or if these processes don’t exit fast enough because of network latency (which is usually the case). These slow processes eat up the available fds, starving new connections. Since this is where most fds are consumed, I do not know how effective the TCP-to-UDT conversion will be against the running-out-of-fds problem. Of course, adopting UDT has the potential to increase performance (or so the paper claims).

For now, I’m hacking the source code of UDT and trying to fit it into Gush. As for the “fcntl” cases, I may eventually leave them out in the UDT version and figure out an alternative.
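
For reference, here is what the two fcntl() calls quoted above do on an ordinary POSIX descriptor. The helper names are hypothetical, and nothing here shows how Gush actually depends on either flag; it is only a sketch of the standard semantics. FD_CLOEXEC controls whether a descriptor survives into a program started with exec, which matters when helpers such as SSH are spawned. O_NONBLOCK makes I/O calls return immediately instead of blocking, which is what a select-style event loop usually wants.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>

// Mark fd close-on-exec: the kernel closes it automatically in any program
// started via exec*(), so spawned helpers do not inherit the descriptor.
bool setCloseOnExec(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) return false;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0;
}

// Put fd in non-blocking mode: read(), write() and accept() fail with
// EAGAIN/EWOULDBLOCK instead of waiting when they cannot proceed.
bool setNonBlocking(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Query helpers, useful for checking what a descriptor is currently set to.
bool isCloseOnExec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}

bool isNonBlocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    return flags >= 0 && (flags & O_NONBLOCK) != 0;
}
```

Note that the original calls pass the flag directly, which overwrites any other flags already set; reading the current flags first and OR-ing the new bit in, as above, is the more conservative pattern. Dropping FD_CLOEXEC would mean spawned children inherit the socket, which may itself contribute to descriptors staying open longer than expected.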

Categories: research Tags:

Application layer transport protocols over UDP

January 7th, 2010 Danny Y. Huang No comments

See UDT: http://udt.sourceforge.net/ Looks very promising.

Categories: research Tags:

TCP over UDP literature

December 11th, 2009 Danny Y. Huang No comments

I haven’t found any articles on TCP over UDP at the application layer.

Will work on a small test program in C before hacking Gush.

Categories: research Tags:

Varying ulimit

December 11th, 2009 Danny Y. Huang No comments

I varied the maximum number of open fds (ulimit -n) to see the effect on the number of successful connections.

ulimit -n    successful connections
70           9
71           8*
72           10
73           11
74           13
75           13

* on second trial, 11 successful connections

The results of the tests, however, cannot be replicated over multiple trials. Apparently, each connection uses at least two file descriptors: one for the actual TCP connection, the other for the proxy (which essentially spawns an SSH process and copies the client over). My guess is that the proxies are created too fast, so that they eat up too many file descriptors and leave none for the TCP connections.
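
The arithmetic behind this guess can be sketched as follows. The function names are hypothetical, and the inputs in the note after the code (three descriptors already open for stdin, stdout and stderr, two per connection) are assumptions taken from the estimate above, not measurements of Gush.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <climits>

// Soft limit on open descriptors for this process, i.e. the value that
// `ulimit -n` reports. Returns -1 on error, LONG_MAX if unlimited.
long softFdLimit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return -1;
    if (lim.rlim_cur == RLIM_INFINITY) return LONG_MAX;
    return static_cast<long>(lim.rlim_cur);
}

// Rough upper bound on simultaneous connections: descriptors left over
// after the ones already in use, divided by the cost of one connection.
long maxConnections(long fdLimit, long fdsInUse, long fdsPerConnection) {
    assert(fdsPerConnection > 0);
    if (fdLimit <= fdsInUse) return 0;
    return (fdLimit - fdsInUse) / fdsPerConnection;
}
```

Under those assumptions, maxConnections(70, 3, 2) gives 33 and maxConnections(75, 3, 2) gives 36. The table shows only 8 to 13 successes in that range, which would be consistent with each connection holding noticeably more than two descriptors at its peak, for example while the proxy's SSH process is still running, though that remains speculation.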

Sample output:

gush> connect slice williams_gush
Found 30 hosts
Initiated connections to 30 of 30 hosts.
gush> For an unknown reason, we were unable to invite williams_gush@planetlab1.ucsd.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab01.cs.washington.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab02.cs.washington.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab03.cs.washington.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab04.cs.washington.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab05.cs.washington.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab2.ucsd.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab3.ucsd.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab7.cs.duke.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab2.cs.duke.edu:15001.
For an unknown reason, we were unable to invite williams_gush@planetlab3-dsl.cs.cornell.edu:15001.
williams_gush@planetlab-04.cs.princeton.edu:15001 has joined the mesh.
williams_gush@planetlab-9.cs.princeton.edu:15001 has joined the mesh.
williams_gush@planetlab-02.cs.princeton.edu:15001 has joined the mesh.
williams_gush@planetlab5.cs.duke.edu:15001 has joined the mesh.
williams_gush@planetlab6.cs.cornell.edu:15001 has joined the mesh.
williams_gush@planetlab4-dsl.cs.cornell.edu:15001 has joined the mesh.
williams_gush@planetlab1.cs.cornell.edu:15001 has joined the mesh.
williams_gush@planetlab3.williams.edu:15001 has joined the mesh.
williams_gush@planet1.scs.stanford.edu:15001 has joined the mesh.
williams_gush@planet2.scs.stanford.edu:15001 has joined the mesh.
williams_gush@planetlab2.williams.edu:15001 has joined the mesh.
williams_gush@planetlab4.williams.edu:15001 has joined the mesh.
williams_gush@planetlab5.williams.edu:15001 has joined the mesh.
Categories: research Tags: