portability/tcp_sockets.hpp
Platform-Independent TCP Internet Classes

Introduction

This package provides simple, clean and platform-independent interfaces to the Internet protocols provided by TCP. The motivation for writing it was to make these protocols more accessible to the developer. The underlying mechanisms for TCP are very low level and unnecessarily complicated. These classes encapsulate the essence of TCP in three very simple to use classes.

There are two 'ends' to an internet connection, known as the client and the server. A client is a program that requests a connection whilst a server provides that connection. In most cases, developers will only be interested in the client side, since that is all that is needed to build a web browser, news reader, mail reader etc.. However, this package also provides the server classes that would be required to build a web server, news server, mail server etc..

The TCP classes only provide the underlying transport mechanism of TCP. It does not provide the information protocols such as HTTP, FTP, Telnet etc. These information protocols could be provided as additional classes layered onto the TCP classes. For example, an HTTP client class could be layered onto the TCP client class.

The TCP classes are derived from the IP_socket class. This class is not documented separately because it is not intended to be used directly. IP_socket implements low-level socket operations in a platform-independent, object-orientated form. It hides the operating-system differences and the sheer ugliness of the socket interfaces that implement the internet protocol (IP). The TCP classes then inherit from that layer, implementing the TCP functionality using this platform-independent interface to simplify the task. The intention is that you only use the TCP classes and not the base class.

TCP Client

The TCP client is used to create a connection to a TCP server anywhere on the internet. The client is responsible for initiating an internet connection which the server then responds to.

The interface to the TCP_client class is shown below.

class TCP_client
{
public:
  TCP_client(void);
  TCP_client(const std::string& remote_address, unsigned short remote_port, unsigned timeout = 0);
  TCP_client(unsigned long remote_address, unsigned short remote_port, unsigned timeout = 0);
  ~TCP_client(void);
  TCP_client(const TCP_client&);
  TCP_client& operator=(const TCP_client&);

  unsigned long ip_lookup(const std::string& remote_address);
  bool initialise(const std::string& remote_address, unsigned short remote_port, unsigned timeout = 0);
  bool initialise(unsigned long remote_address, unsigned short remote_port, unsigned timeout = 0);
  bool initialised(void) const;
  bool connected(void);
  bool close(void);

  bool send_ready(unsigned wait = 0);
  bool send (std::string& data);
  bool receive_ready(unsigned wait = 0);
  bool receive (std::string& data);

  unsigned short local_port(void) const;
  unsigned long remote_address(void) const;
  unsigned short remote_port(void) const;

  void clear_error (void) const;
  int error(void) const;
  std::string message(void) const;
};

There are two ways of creating a TCP client connection. You can either create an uninitialised client (calling the void constructor) and then initialise it later, or you can initialise the connection at construction time and then later test to see if the connection initialised correctly. In the former case, the initialise function will return true if the initialisation succeeded and false otherwise. In the latter case, constructors cannot return a result so you must call the initialised test to see if the connection initialised correctly. If the connection fails, the error function will give an error code corresponding to the cause for the failure and the message function will give a textual representation of the error.

The first initialise function (and the second constructor) takes three arguments. The first is a string containing the internet address to connect to, either as a name (e.g. stlplus.sourceforge.net) or as a number (e.g. 195.0.0.1) and the second is the port to connect to (e.g. 80 for the HTTP service). The third argument is a timeout clause: if the connection can't be made in this time (measured in micro-seconds) then the connection fails. However, if the timeout is set to 0 (the default), then the client implements a non-blocking connection which then needs to be polled by calling the connected method to determine when it is connected. Non-blocking operation is suited to applications such as GUIs where other operations need to be performed in parallel with TCP operations.

The second initialise function (and the third constructor) also takes three arguments. The first is the internet address to connect to represented as an integer (actually an unsigned long). This integer is the result from the ip_lookup method. It enables the name lookup to be separated out from the actual connection process. The second and third argument, and indeed the operation of the function, is the same as the previous initialisation function.

The TCP client class supports assignment. This is done using a smart pointer to the underlying data structures. Thus, when you assign TCP clients, you simply create multiple pointers to the same data structure. There is no way of having more than one client class for a single connection.

This support for assignment makes it easy to create data structures of TCP clients. For example, to support multiple simultaneous connections, you could have an STL vector of TCP_client objects. Each object could be created outside the vector, initialised and then only copied into to the vector if initialisation worked.

Once a client connection is finished with, it should be closed by calling close. The destructor also calls close, so allowing the client object to be destroyed will close the connection automatically. This would happen if you erased the client from a vector for example.

The next block of functions actually perform the communication phase of the connection. All communications are done in non-blocking mode, so these functions will not block and therefore will not lock up the program. However, this means that you have to be prepared to retry.

To send data down the connection to the server, use the send function. The string passed to send is consumed by the send function. In other words, if you pass a string of, say, 1k bytes and only 100 bytes can be sent at once, the first 100 bytes of the string will be erased to indicate that they've been sent. You should repeat the send until the whole string has been consumed. The send function returns false only if an error occurred - failure to send because the server is not ready does not class as an error. All that will happen there is that the function will return true but the string will still contain the same data. You should retry, possibly after a short delay.

You can use the send_ready function to determine whether the server is ready to receive what you send and then call send to actually send the data. Note that you can call send on its own and it will check whether the server is ready for you. The send_ready function allows you to specify a time delay to wait for the server to become ready if it is not at the time of the call. This delay is specified in micro-seconds (so giving the delay the value 1000000 will cause a delay of 1s). Setting the delay to zero will simply test whether the server is ready now. This is a non-blocking test which will be the best choice if you are using the TCP classes as part of a bigger program such as a GUI where there are other things to deal with, or if you are trying to poll multiple connections. If the server is not ready after this time delay has expired, the send_ready function will return false to indicate not ready. If the server becomes available at any time during the time delay, the timer will be stopped and the function will return true immediately, it will not wait for the full delay before returning.

A similar argument applies to the receive and receive_ready functions. Calling receive_ready checks whether there is any server data ready to be received by the client. If it is, then receive will collect it. You may need to keep calling receive to get all the data (the exact method for determining whether all data has been received depends on the protocol). Bear in mind that network delays mean that, even if receive_ready returns false, that doesn't mean all the data has been received, simply that no data has arrived within the timeout period. Again, receive_ready can be used with little or no timeout to prevent blocking of the program and once again, there is no point calling receive_ready with zero timeout since you might as well just call receive.

The receive function appends data received to the string passed as its argument, so you can allow the data to accumulate over many calls to receive. On reading the contents of the string, you can then clear the string ready to receive the next batch of data.

If the server closes the connection, the client will detect the closure either when a send fails or at the end of receiving data from the server.The connected function should be called to detect this situation.

The remote_address and remote_port methods allow access to the connection details once the connection has been made. The local_port method allows the local port being used for the connection to be retrieved. For TCP client connections, this is usually one picked from a pool of port numbers set aside by the operating system for use by clients, and will not be one of the service numbers such as 80 for HTTP. Each open connection uses a different local port number.

Finally, there are three error handling functions. The error function returns the latest error code, zero if there hasn't been an error. This is the test for no error. Then, if there is an error, the message function returns a string description of the error. Finally, if the error is non-fatal, it can be cleared by clear_error and communications can continue.

In typical use, the connection should be first initialised and then any acceptance message received from the server. Whether there is such a message depends on the protocol. For example, the NNTP protocol for news sends a welcome message to confirm that the connection is present and that the news server is responding okay. Then most protocols use a two-way communication, with the client sending commands to the server and the server responding with either status codes or data. At the end of the communication, it is usually the client's job to close the connection, although some protocols such as HTTP allow the server to close the connection as soon as a request has been fulfilled.

Here's a simple Telnet-like client program:

#include "tcp_sockets.hpp"
#include <iostream>
#include <string>

int main (int argc, char* argv[])
{
  if (argc != 3)
    std::cerr << "usage: " << argv[0] << " <host> <port>" << std::endl;
  else
  {
    // create a client connection
    // the address is specified by command argument 1 and the port
    // specified by argument 2. Use a timeout of 10s.
    stlplus::TCP_client client(argv[1],(unsigned short)atoi(argv[2]), 10000000);

    // test to see if the connection completed OK within the timeout
    if (!client.initialised())
    {
      std::cerr << "client failed to initialise" << std::endl;
      return -1;
    }
    if (client.error())
    {
      std::cerr << "client initialisation failed with error " << client.error() << std::endl;
      return -1;
    }

    // start processing
    std::cerr << "type in messages, end the connection with EOF" << std::endl;
    for(;;)
    {
      while (client.receive_ready(1000000))
      {
        std::string returned;
        if (!client.receive(returned))
          std::cerr << "receive failed" << std::endl;
        else
          std::cerr << "received \"" << returned << "\"" << std::endl;
      }
      if (!client.initialised())
      {
        std::cerr << "connection has closed" << std::endl;
        break;
      }
      std::string message;
      std::cerr << "command:" << flush;
      if (!std::getline(fin,message))
      {
        std::cerr << "done" << std::endl;
        client.close();
      }
      else
      {
        message += "\r\n";
        while(!message.empty())
        {
          if (!client.send(message))
          {
            std::cerr << "failed to send message: " << message << std::endl;
            break;
          }
        }
      }
    }
  }
  return 0;
}

It is common with most internet protocols to use the carriage-return/line-feed ("\r\n") termination of a request as shown in this example.

TCP Server

A TCP Server is a program that spends most of its time doing nothing very much. It listens to a port, waiting for a client to attempt a connection. When a connection is attempted, the server creates a new socket to manage that connection, reallocates it to a different port (so the original port is made available again for further connections), and then goes back to listening again. The connection itself is then managed separately by a TCP_connection class.

The server interface is shown below:

class TCP_server
{
public:
  TCP_server(void);
  TCP_server(unsigned short port, unsigned short queue = 0);
  ~TCP_server(void);
  TCP_server(const TCP_server&);
  TCP_server& operator=(const TCP_server&);

  bool initialise(unsigned short port, unsigned short queue = 0);
  bool initialised(void) const;
  bool close(void);

  bool accept_ready(unsigned wait);
  TCP_connection accept(void);

  void clear_error (void) const;
  int error(void) const;
  std::string message(void) const;
};

There are two ways of creating a TCP server. You can either create an uninitialised server (calling the void constructor) and then initialise it later, or you can initialise the server at construction time and then later test to see if the server initialised correctly. In the former case, the initialise function will return true if the initialisation succeeded and false otherwise. In the latter case, constructors cannot return a result so you must call the initialised test to see if the server initialised correctly. If the initialisation fails, the error function will give an error code corresponding to the cause for the failure and the message function will give a textual representation of the error.

The initialise function (and the second constructor) takes two arguments. The first is the port number to listen to and the second is the number of client requests to allow to be queued before rejecting connections. This may be zero, in which case any incoming requests while processing a connection will be rejected.

The TCP server class supports assignment. This is done using a smart pointer to the underlying data structures. Thus, when you assign TCP servers, you simply create multiple pointers to the same data structure as with the TCP client.

This support for assignment makes it easy to create data structures of TCP servers. For example, to listen to many different ports, you could have a vector of TCP_server objects, one for each port being listened to. Each object could be created outside the vector, initialised and then only copied into to the vector if initialisation worked.

Once a server is finished with, it should be closed by calling close. The destructor also calls close, so allowing the server object to be destroyed will close the listening port automatically.

To accept a connection, you first need to poll the function accept_ready. If this returns false, then no connections are incoming on the port so there is nothing to do, but if it returns true it means that at least one client is trying to connect. In this case the connection can be accepted by calling the accept function. This returns an object of type TCP_connection which will be described in the next section. The server object does not itself handle the connection, that task is entirely delegated to the TCP_connection object and the server has no further part to play in the connection.

The error handling functions have the same functionality as for the TCP_client

TCP Connection

A TCP Connection object is created by a TCP server when accepting a connection from a client. The connection object manages the server end of that connection. The interface to the TCP connection is:

class TCP_connection
{
public:
  TCP_connection(void);
  ~TCP_connection(void);

  TCP_connection(const TCP_connection&);
  TCP_connection& operator=(const TCP_connection&);

  bool initialised(void) const;
  bool close(void);

  bool send_ready(unsigned wait);
  bool send (std::string& data);
  bool receive_ready(unsigned wait);
  bool receive (std::string& data);

  unsigned short local_port(void) const;
  unsigned long remote_address(void) const;
  unsigned short remote_port(void) const;

  void clear_error(void) const;
  int error(void) const;
  std::string message(void) const;
};

You will notice that there is no method or constructor provided for initialising a TCP_connection. This is because these objects can only be created by a TCP_server. However, if the server creates a connection but that somehow fails, the initialised function will return false. This is the way to test whether a connection was accepted properly. Futhermore, if the connection is ended either by the client disconnected or by closing the connection then the initialised function will also return false. In any case, if the connection failed due to an error, the error function will return a non-zero error code and the message function will give a text representation of that error.

The remote_address and remote_port functions allow details of the connected client to be examined. The address is an IP address packed into a 32-bit word. The port is the number of the port of the remote client making the connection, whilst the rlocal_port is the port on the local computer that is serving the connection (remember that the server listens on a pre-defined port but transfers accepted connections onto another, arbitrarily chosen port for the duration of that connection.

The connection can be closed at the server end by calling the close function. This is also called by the destructor, so allowing the connection to be destroyed also is an effective way of closing the connection.

The remainder of the functions mirror the client data exchange functions. The send function sends data to the client. If the send succeeds, then the sent data is removed from the string passed as the argument. You may need to call send more than once to send the whole string. If the client is not ready to receive, then send returns true (since no error has occurred), but no data is removed from the string. You can optionally call send_ready to check whether calling send would result in something actually being sent. The send_receive function also gives a mechanism for allowing the program to wait a time delay before sending, since network traffic is typically slow. This would be appropriate if you are writing a program that has nothing else to do but process the connection. However, if you are in a GUI or in a program with many connections to serve, either no wait or a very small wait should be used. If no wait is required there is really no point in calling send_ready since send does not fail if the client is not ready.

A similar argument applies to the receive and receive_ready functions. Calling receive_ready checks whether there is any client data ready to be received by the server. If it is, then receive will collect it. You may need to keep calling receive to get all the data (the exact method for determining whether all data has been received depends on the protocol). Bear in mind that network delays mean that, even if receive_ready returns false, that doesn't mean all the data has been received, simply that no data has arrived within the timeout period. Again, receive_ready can be used with little or no timeout to prevent blocking of the program and once again, there is no point calling receive_ready with zero timeout since you might as well just call receive.

If the client closes the connection, the TCP_connection will detect the closure either when a send fails or at the end of receiving data from the client. The connected function should be called to detect this situation.

Example

The following example shows the server and connection classes in use:

int server(std::vector<std::string> values, int port, const std::string& command)
{
  int errors = 0;
  std::cerr << "server: " << stlplus::build() << std::endl;
  // this is the first-run instance and acts as the server
  // setup a server on the shared port
  std::cerr << "server: started" << std::endl;
  stlplus::TCP_server server(port);
  std::cerr << "server: set up TCP server on port " << port << std::endl;
  // now spawn a second copy of this program
  stlplus::async_subprocess client;
  client.spawn(command);
  std::cerr << "server: spawned client with command: " << command << std::endl;
  // now wait for a client connection to come in
  std::cerr << "server: waiting for incoming connection" << std::endl;
  if (!server.accept_ready(timeout))
  {
    std::cerr << "server: ERROR: wait for incoming connection timed out" << std::endl;
    ++errors;
  }
  else
  {
    std::cerr << "server: accepting incoming connection" << std::endl;
    stlplus::TCP_connection connection = server.accept();
    unsigned request = 0;
    while(connection.initialised())
    {
      std::cerr << "server: checking for receive request" << std::endl;
      // check connection for incoming request
      if (!connection.receive_ready(timeout))
      {
        std::cerr << "server: ERROR: wait for receive timed out" << std::endl;
        ++errors;
      }
      else
      {
        // receive the request
        std::string data;
        if (!connection.receive(data))
        {
          std::cerr << "server: ERROR: connection receive failed after receive_ready was true" << std::endl;
          ++errors;
        }
        else if (data.size() == 0)
        {
          // legitimate (?) last receive of an empty string
          std::cerr << "server: received empty string \"" << data << "\"" << std::endl;
        }
        else
        {
          // report the value and check it's correct
          std::cerr << "server: received \"" << data << "\"" << std::endl;
          std::string value = std::string("error");
          if (request < values.size())
            value = values[request];
          if (data != value)
          {
            std::cerr << "server: ERROR: received data \"" << data << "\" != \"" << value << "\"" << std::endl;
            ++errors;
          }
          // now throw the same data back
          if (!connection.send_ready(timeout))
          {
            std::cerr << "server: ERROR: wait for send timed out" << std::endl;
            ++errors;
          }
          else
          {
            std::cerr << "server: sending \"" << data << "\"" << std::endl;
            if (!connection.send(data))
            {
              std::cerr << "server: ERROR: connection send failed after send_ready was true" << std::endl;
              ++errors;
            }
          }
          request++;
        }
      }
      // just check for client having exited
      if (!client.tick())
      {
        std::cerr << "server: client has exited - closing connection" << std::endl;
        connection.close();
      }
    }
    std::cerr << "server: exited server loop" << std::endl;
    // wait for client to exit
    while (client.tick())
    {
      std::cerr << "server: client has not exited - waiting" << std::endl;
      sleep(1);
    }
    if (client.exit_status() != 0)
    {
      std::cerr << "server: ERROR: client has exited with status " << client.exit_status() << std::endl;
      errors += client.exit_status();
    }
  }
  std::cerr << "server: " << (errors ? "ERROR:" : "SUCCESS:") << " exiting with " << errors << " errors" << std::endl;
  return errors;
}

In this example, which is a test program, the server is set up to listen to the specified port, then it uses the subprocess classes to spawn a client program. The client program then sends strings to the server which sends them back again. When the client disconnects, the server closes and the function exits.

The other part of this program is the client which is sending the requests in the first place:

int client(std::vector<std::string> values, int port)
{
  int errors = 0;
  // this is the second-run instance and acts as the client
  std::cerr << "client: " << stlplus::build() << std::endl;
  std::cerr << "client: creating connection" << std::endl;
  stlplus::TCP_client connection("localhost", port, timeout);
  unsigned request = 0;
  while(connection.connected())
  {
    std::string value = values[request];
    // send this value
    std::cerr << "client: sending data" << std::endl;
    if (!connection.send_ready(timeout))
    {
      std::cerr << "client: ERROR: wait for send timed out" << std::endl;
      ++errors;
    }
    else
    {
      // avoid the data being consumed by the send
      std::string data = value;
      std::cerr << "client: sending \"" << data << "\"" << std::endl;
      if (!connection.send(data))
      {
        std::cerr << "client: ERROR: connection send failed after send_ready was true" << std::endl;
        ++errors;
      }
    }
    // check connection for incoming request
    if (!connection.receive_ready(timeout))
    {
      std::cerr << "client: ERROR: wait for receive timed out" << std::endl;
      ++errors;
    }
    else
    {
      // receive the request
      std::string data;
      if (!connection.receive(data))
      {
        std::cerr << "client: ERROR: connection receive failed after receive_ready was true" << std::endl;
        ++errors;
      }
      else
      {
        // report the value and check it's correct
        std::cerr << "client: received \"" << data << "\"" << std::endl;
        if (data != value)
        {
          std::cerr << "client: ERROR: received data \"" << data << "\" != \"" << value << "\"" << std::endl;
          ++errors;
        }
        request++;
      }
    }
    if (request >= values.size())
    {
      std::cerr << "client: all values sent, closing connection" << std::endl;
      connection.close();
    }
  }
  std::cerr << "client: exited communication loop" << std::endl;
  std::cerr << "client: " << (errors ? "ERROR:" : "SUCCESS:") << " exiting with " << errors << " errors" << std::endl;
  return errors;
}

Finally, these two functions are actually part of one program. The program is run with no parameters, which becomes the server. This then spawns another copy of itself but with a parameter, which makes it into a client. Here is the remaining code to make this work:

#include "build.hpp"
#include "tcp_sockets.hpp"
#include "string_utilities.hpp"
#include "subprocesses.hpp"
#include <vector>
#include <string>
#include <iostream>

////////////////////////////////////////////////////////////////////////////////
// TCP test: this program is designed to be run twice - once without an
// argument as the server, which itself runs a second copy with an argument as
// the client. The two instances then swap information using TCP
////////////////////////////////////////////////////////////////////////////////

unsigned timeout = 10000000;

////////////////////////////////////////////////////////////////////////////////
// Server code
// sets up a TCP server then spawns this program again in client mode

<server function>

////////////////////////////////////////////////////////////////////////////////
// client

<client function>

////////////////////////////////////////////////////////////////////////////////
// main program

int main (int argc, char* argv[])
{
  // code common to both instances to ensure both have the same information
  std::vector<std::string> values = stlplus::split("one:two:three", ":");
  int port = 3000;

  // now branch for the two instances
  int errors = 0;
  if (argc == 1)
  {
    std::string command = std::string(argv[0]) + std::string(" client");
    errors = server(values, port, command);
  }
  else
  {
    errors = client(values, port);
  }
  return errors;
}