Tuesday 17 July 2012

Sockets and Serialisation

Introduction

If you have looked at some of the previous posts, you will see that I use XML-serialisation.  The reason is simple: it's easy to do, handled automatically by the .Net framework and the least error-prone solution I know.  There are of course a few issues: you are tied to the .Net framework, and XML is not the most efficient storage format.
However, whenever I start sending the data down a socket, I never seem to get it working well and I end up hitting various issues.  So I decided to have another go and write down some of my experiences.

XML-Serialisation and TCP

The basic XML-serialisation to a string is easy.  If you want to know about it, see my past post (http://codethegame.blogspot.com/2011/11/xml-serialisation-of-hierarchy-in-c.html).  Once the XML is generated, it's fairly simple to send, or is it?
When I work with TCP, I normally send data one line at a time, so you see a lot of ReadLine/WriteLine in my code.  Formatting the XML onto a single line takes a bit of work (you need to use an XmlWriter and XmlWriterSettings; you can look it up easily).  So your code ends up a bit like this:
// Pseudo C# to send
Message m=....  // whatever
string msg=DoSerialise(m); // convert it to a string
socketWriter.WriteLine(msg); // send it
// Pseudo C# to receive
string msg=socketReader.ReadLine(); // get a line from the socket
Message m=DoDeserialise(msg);  // convert back to an object
// do your message handling here
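For reference, here is a minimal sketch of the single-line XML step, i.e. one possible body for the DoSerialise/DoDeserialise helpers in the pseudocode above.  XmlMessage is a hypothetical stand-in for a real message class.  The settings that matter are Indent = false and OmitXmlDeclaration = true, which keep the output on one line as long as none of the field values themselves contain a newline.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical message type: XmlSerializer needs a public class with a
// public parameterless constructor and public fields/properties.
public class XmlMessage
{
    public string Text;
    public int Id;
}

public static class SingleLineXml
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(XmlMessage));

    // Serialise with no indentation and no <?xml ...?> declaration, so the
    // whole object ends up on one line, ready for WriteLine.
    public static string DoSerialise(XmlMessage m)
    {
        var settings = new XmlWriterSettings
        {
            Indent = false,
            OmitXmlDeclaration = true
        };
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            Serializer.Serialize(writer, m);
        }
        return sb.ToString();
    }

    // Turn one line read from the socket back into an object.
    public static XmlMessage DoDeserialise(string line)
    {
        using (var reader = new StringReader(line))
        {
            return (XmlMessage)Serializer.Deserialize(reader);
        }
    }
}
```

With these, socketWriter.WriteLine(DoSerialise(m)) and DoDeserialise(socketReader.ReadLine()) slot straight into the pseudocode.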
This is all fine and simple.  But what happens if I want to switch to binary instead?

Binary-Serialisation and UDP

Binary serialisation obviously turns the object into an array of bytes.  This works very easily with UDP, as it's also designed to work with arrays of bytes.  The code ends up looking like this:
// Pseudo C# to send
Message m=....  // whatever
IFormatter formatter = new BinaryFormatter();
MemoryStream mem=new MemoryStream();
formatter.Serialize(mem,m); // serialise
byte[] data=mem.ToArray(); // get the bytes
SendUdp(data); // send it
// Pseudo C# to receive
byte[] data=RecvUdp(); // get the data from the socket
IFormatter formatter = new BinaryFormatter();
MemoryStream mem=new MemoryStream(data);
Message m=(Message)formatter.Deserialize(mem); // deserialise
// do your message handling here
Overall it's quite simple: Message=>Binary Array=>UDP=>Binary Array=>Message.
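To see the ‘one send, one receive’ behaviour for yourself, here is a small loopback sketch using UdpClient, with plain UTF-8 strings standing in for the serialised Message.  The class and method names are my own, for illustration only.  Each Send becomes exactly one datagram, and each Receive hands back exactly one whole datagram, which is why the bytes can go straight into the deserialiser.

```csharp
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class UdpBoundaryDemo
{
    // Sends each string as its own datagram over loopback, then reads them
    // back with one Receive() call per datagram.
    public static string[] RoundTrip(params string[] messages)
    {
        // Port 0 lets the OS pick a free port for the receiver.
        using (var receiver = new UdpClient(new IPEndPoint(IPAddress.Loopback, 0)))
        using (var sender = new UdpClient())
        {
            receiver.Client.ReceiveTimeout = 2000; // give up after 2 s rather than hang
            var target = (IPEndPoint)receiver.Client.LocalEndPoint;

            foreach (string text in messages)
            {
                byte[] data = Encoding.UTF8.GetBytes(text);
                sender.Send(data, data.Length, target); // one Send = one datagram
            }

            var received = new string[messages.Length];
            for (int i = 0; i < messages.Length; i++)
            {
                var from = new IPEndPoint(IPAddress.Any, 0);
                byte[] data = receiver.Receive(ref from); // exactly one whole datagram
                received[i] = Encoding.UTF8.GetString(data);
            }
            return received;
        }
    }
}
```

UDP itself promises neither delivery nor ordering; on loopback both hold in practice, which is why the sketch gets away with a simple timeout.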
However, if you try this simple approach in TCP, it fails very quickly.
// BROKEN Pseudo C# to send
Message m=....  // whatever
byte[] data=Serialise(m); // get the bytes
SendTcp(data); // send it
// BROKEN Pseudo C# to receive
byte[] data=RecvTcp(); // get the data from the socket
Message m=Deserialise(data); // deserialise
// do your message handling here
‘Why?’ you ask.  Well, it's because of the way TCP and UDP handle data.

TCP vs UDP

Without boring you all senseless with theory which you hopefully already know: UDP is packet based and TCP is stream based.  Or to put it another way:

UDP
This is UDP.
As you can see.
Each ‘box’ is a separate packet of data.
You can clearly see each item separately.

TCP 1
This is TCP. As you can see. It's all together in one continuous stream.  It's still quite easy to follow what is going on, because this is English. All you have to do is look for the full stops and you can separate the sentences from each other.

TCP 2
Butnowlookatthismessthisissomemoretextwithoutspacesorpunctuationanditsquiteanightmaretofigureoutwhatsgoingonunlessyoureadverycarefullytomakemattersworseifyougetconfusedyouprobablyendupbacktrackinguntilyoucanpickupthemessageproperly

The above example (TCP 2) is why stuffing binary down TCP becomes quite hard.  Unless you can cleanly extract the messages (words/sentences), it's hard to process.
With UDP, you can be sure that each packet of data contains a single message.  With TCP, you cannot be sure.  A single TCP read call might give you a message, half a message, 1.5 messages, 2 or even more messages.  You are then stuck trying to break the binary array down, before you feed it into the deserialiser.
Or if you are like me: you just stuff the data into the deserialiser and then it fails because it has no idea what to do with it!
This problem of sending chunks of binary data down a TCP socket annoyed me for quite a while.  The proper way to do it is to use message framing (http://nitoprograms.blogspot.com/2009/04/message-framing.html).  If you want to know what framing is: think of it as adding the spaces and full stops between the letters, so you know where one word ends and the next begins.
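A minimal length-prefix framing sketch, assuming a 4-byte length header written in the machine's native byte order (fine when both ends run on the same kind of hardware; pick a fixed order otherwise).  WriteFrame, ReadFrame and ReadFully are hypothetical helpers, not framework APIs.  The important part is ReadFully: a single Read on a NetworkStream may return only part of what was sent, so it keeps reading until the buffer is full.

```csharp
using System;
using System.IO;

public static class Framing
{
    // Frame = 4-byte length header (native byte order) + payload bytes.
    public static void WriteFrame(Stream stream, byte[] payload)
    {
        byte[] header = BitConverter.GetBytes(payload.Length);
        stream.Write(header, 0, header.Length);
        stream.Write(payload, 0, payload.Length);
        stream.Flush();
    }

    // Returns the next payload, or null if the stream ended cleanly between frames.
    public static byte[] ReadFrame(Stream stream)
    {
        byte[] header = new byte[4];
        int got = ReadFully(stream, header);
        if (got == 0)
            return null;                                   // clean end of stream
        if (got < header.Length)
            throw new EndOfStreamException("Stream ended inside a frame header");

        int length = BitConverter.ToInt32(header, 0);
        if (length < 0)
            throw new InvalidDataException("Negative frame length");

        byte[] payload = new byte[length];
        if (ReadFully(stream, payload) < length)
            throw new EndOfStreamException("Stream ended inside a frame payload");
        return payload;
    }

    // Stream.Read may return fewer bytes than requested (the "half a message"
    // case), so loop until the buffer is full or the stream ends.
    static int ReadFully(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int n = stream.Read(buffer, offset, buffer.Length - offset);
            if (n == 0)
                break;                                     // end of stream
            offset += n;
        }
        return offset;
    }
}
```

To combine this with the binary serialiser, the sender would call WriteFrame(stream, mem.ToArray()) instead of SendTcp(data), and the receiver would deserialise from new MemoryStream(ReadFrame(stream)).  Since the length comes off the network, real code should also reject absurdly large values before allocating the buffer.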

Binary-Serialisation and TCP

I considered writing a framing algorithm to send/receive the data in correctly sized packets and so on.  But the trouble is that, with all this complexity, I might as well just go for a simpler solution, like switching back to UDP or using XML instead.

That was until I suddenly realized I was going about this the wrong way.

I asked myself: ‘why are you using a memory-stream to serialize the object, then taking the memory and pushing it down a second stream (the network-stream)?  Why not serialize directly to the network-stream?’
So I hacked up the following code:
// Pseudo C# to send
Message m=....  // whatever
IFormatter formatter = new BinaryFormatter();
NetworkStream stream=socket.GetStream();
formatter.Serialize(stream,m); // serialise directly to the network
// Pseudo C# to receive
IFormatter formatter = new BinaryFormatter();
NetworkStream stream=socket.GetStream();
Message m=(Message)formatter.Deserialize(stream); // deserialise
// do your message handling here
I looked at the code and thought, “it’s too simple, it won’t work, the serialiser will not be able to frame the messages properly”.  But I tried it anyway.
And it worked!
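Why does this work?  My understanding is that the binary format is self-describing: the deserialiser reads exactly the bytes that make up one object and then stops, leaving the next message in the stream for the next call, so it is effectively doing the framing for you.  The sketch below shows the same principle without BinaryFormatter (which Microsoft has since marked obsolete as insecure and removed from recent .NET versions), swapping in a hand-written format on BinaryWriter/BinaryReader.  TinyMessage is a hypothetical type made up for this example.

```csharp
using System.IO;

// Hypothetical message with a hand-written, self-delimiting binary format.
public class TinyMessage
{
    public string Text = "";
    public byte[] Payload = new byte[0];

    // Write straight to the (network) stream; no intermediate MemoryStream.
    public void WriteTo(BinaryWriter w)
    {
        w.Write(Text);            // BinaryWriter prefixes strings with their length
        w.Write(Payload.Length);  // explicit length for the byte array
        w.Write(Payload);
        w.Flush();
    }

    // Reads exactly the bytes WriteTo produced and no more, so the next
    // message stays in the stream for the next call.
    public static TinyMessage ReadFrom(BinaryReader r)
    {
        var m = new TinyMessage();
        m.Text = r.ReadString();
        int length = r.ReadInt32();
        m.Payload = r.ReadBytes(length); // only returns fewer bytes at end of stream
        if (m.Payload.Length != length)
            throw new EndOfStreamException("Stream ended inside a message");
        return m;
    }
}
```

Over TCP you would wrap the NetworkStream once per connection (new BinaryWriter(stream) and new BinaryReader(stream)) and call WriteTo/ReadFrom exactly where the formatter calls sit above.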

Conclusion

Is this a perfect solution?  Well, almost.  I wrote some test routines which had many clients sending messages to and from the server.  I managed 10 clients, each sending 200 2KB messages per second.  That's 10 × 200 × 2KB ≈ 4MB/second going into the server, and another 4MB/second going out of the server.
The only time I hit an issue was when I sent so much data in a single burst that the entire serialiser hung.  I'm not able to explain why yet.  It occurs in the ‘serialise’ function when I send lots of data down the network.
My best guess is that both sides were writing at the same time and neither was reading: once the socket buffers filled up, each Serialize call blocked waiting for the other side to read, deadlocking the whole system.  If this is the case, having a separate thread just to read from the network might fix this.
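A rough sketch of that separate-reader-thread idea, not tested against the hang described above: each connection gets a background thread that does nothing but deserialise incoming messages into a thread-safe queue, and the main loop just polls the queue instead of checking DataAvailable.  BackgroundReader and its deserialise delegate are hypothetical names; with the code below you would pass something like s => (Message)formatter.Deserialize(s).

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public sealed class BackgroundReader<T>
{
    readonly BlockingCollection<T> inbox = new BlockingCollection<T>();
    readonly Thread thread;

    public BackgroundReader(Stream stream, Func<Stream, T> deserialise)
    {
        thread = new Thread(() =>
        {
            try
            {
                // Block inside deserialise until a whole message has arrived, queue it, repeat.
                while (true)
                    inbox.Add(deserialise(stream));
            }
            catch (IOException)
            {
                // Covers EndOfStreamException and socket errors: the connection is finished.
                // Other formats may signal a closed stream with a different exception type.
            }
            finally
            {
                inbox.CompleteAdding(); // no more messages will arrive
            }
        });
        thread.IsBackground = true; // don't keep the process alive on exit
        thread.Start();
    }

    // Non-blocking: returns false if no message is waiting right now.
    public bool TryGetMessage(out T message) => inbox.TryTake(out message);

    // Waits for the reader thread to finish (i.e. the connection has closed).
    public void WaitForEnd() => thread.Join();
}
```

Writes can still block briefly with this design, but because each side now keeps draining its receive buffer, a write should no longer wait forever on a peer that is itself stuck writing.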
Overall this is looking quite good. I will be testing more in the future and will let you know.

Anyway, hope this is useful,
Mark

Code

Here is the code, feel free to mess about with it.

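    using System;
    using System.Collections.Generic;
    using System.Net;
    using System.Net.Sockets;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Formatters.Binary;
    using System.Threading;

    // SimpleClient and SimpleServer each have their own Main, so build them as two
    // separate programs that share the Message class. BinaryFormatter is obsolete
    // in modern .NET, so this is .NET Framework-era code.
    [Serializable]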
    public class Message
    {
        public string s;
        public byte[] megaArray = new byte[1 * 1024];
    }
    // Client: sends 200 messages per second to the server and counts the messages it receives.
    class SimpleClient
    {
        static void Main(string[] args)
        {
            new SimpleClient().Go();
        }
        TcpClient sock;
        IFormatter formatter;
        NetworkStream stream;
        public int messageIn, messageOut;

        double Now() { return DateTime.Now.Ticks / 1e7; } // time in seconds
        private void Go()
        {
            sock = new TcpClient("127.0.0.1", 3000);
            stream = sock.GetStream();
            formatter= new BinaryFormatter();
            const int SEND_PER_SECOND=200;
            const double TIME_TO_NEXT_SEND=1.0 / SEND_PER_SECOND;
            double nextSend = Now() + TIME_TO_NEXT_SEND;
            double nextDisplay = Now() + 1;
            while (true)
            {
                if (Now() >= nextSend)
                {
                    SendData();
                    nextSend += TIME_TO_NEXT_SEND;
                }
                RecvData();
                if (Now() >= nextDisplay)
                {
                    Console.WriteLine(string.Format("Messages out {0} Messages In {1}", messageOut, messageIn));
                    messageIn = messageOut = 0;
                    nextDisplay++;
                }
                Thread.Sleep(1);
            }            
        }
        private void SendData()
        {
            Message m = new Message();
            formatter.Serialize(stream, m);
            stream.Flush();
            messageOut++;
        }
        private void RecvData()
        {
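            // DataAvailable only means some bytes have arrived; Deserialize then
            // blocks until the rest of that message has been read.
            while (stream.DataAvailable)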
            {
                Message m = (Message)formatter.Deserialize(stream);
                messageIn++;
            }
        }
    }
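    // Server: accepts any number of clients, sends 200 messages per second to each
    // and counts the messages it receives.
    class SimpleServer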
    {
        static void Main(string[] args)
        {
            new SimpleServer().Go();
        }
        class Connection
        {
            public TcpClient sock;
            public IFormatter formatter = new BinaryFormatter();
            public NetworkStream stream;
        }
        List<Connection> connections = new List<Connection>();
        TcpListener server;
        public int messageIn, messageOut;

        double Now() { return DateTime.Now.Ticks / 1e7; } // time in seconds
        private void Go()
        {
            server = new TcpListener(IPAddress.Parse("127.0.0.1"), 3000);
            server.Start();
            const int SEND_PER_SECOND = 200;
            const double TIME_TO_NEXT_SEND = 1.0 / SEND_PER_SECOND;
            double nextSend = Now() + TIME_TO_NEXT_SEND;
            double nextDisplay = Now() + 1;
            while (true)
            {
                AcceptConnections();
                if (Now() >= nextSend)
                {
                    SendData();
                    nextSend += TIME_TO_NEXT_SEND;
                }
                RecvData();
                if (Now() >= nextDisplay)
                {
                    Console.WriteLine(string.Format("Messages out {0} Messages In {1}", messageOut, messageIn));
                    messageIn = messageOut = 0;
                    nextDisplay++;
                }
                Thread.Sleep(1);
            }
        }
        private void AcceptConnections()
        {
            if (server.Pending())
            {
                Connection conn = new Connection();
                conn.sock = server.AcceptTcpClient();
                conn.stream = conn.sock.GetStream();
                conn.formatter = new BinaryFormatter();
                connections.Add(conn);
            }
        }
        private void SendData()
        {
            Message m = new Message();
            // send to all
            foreach (Connection c in connections)
            {
                c.formatter.Serialize(c.stream, m);
                c.stream.Flush();
                messageOut++;
            }
        }
        private void RecvData()
        {
            foreach (Connection c in connections)
            {
                while (c.stream.DataAvailable)
                {
                    Message m = (Message)c.formatter.Deserialize(c.stream);
                    messageIn++;
                }
            }
        }
    }