Serial Data
How serial protocols encode text and binary data, and how to generate data in your driver.
Serial protocols transmit data one "piece" at a time: each piece is generally the smallest divisible unit of information. In most computer and electronic systems, this is called a byte (or octet), comprising 8 bits (binary digits). A byte can hold a decimal value of 0-255, or the hexadecimal equivalent 0x00-0xFF. Hexadecimal, also called hex, is a base-16 number system: decimal values 0 through 15 are represented by the digits 0 through 9 and the letters A through F.
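As a quick illustration of the byte range and its hex notation, here is a minimal sketch in Python (chosen only for brevity; the variable names are ours and are not part of any driver API):

```python
# A byte holds 8 bits, so its value ranges from 0 to 255 (0x00 to 0xFF).
value = 0b11111111               # all eight bits set

print(value)                     # decimal: 255
print(f"0x{value:02X}")          # hexadecimal: 0xFF
print(f"{value:08b}")            # binary: 11111111

# A sequence of bytes, shown in hex the way protocol documents usually write it.
packet = bytes([0, 90, 255])
print(packet.hex(" ").upper())   # 00 5A FF
```

Values outside 0-255 do not fit in a single byte: in Python, `bytes([256])` raises a `ValueError`.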
In the introductory section of the Write my Own Driver sample, the following aspects of protocol design are discussed:
Transport: the communication method used, e.g. serial, TCP/IP, UDP, SNMP, HTTP.
Port Configuration: port settings, e.g. baud rate, IP address, TCP or UDP ports.
Command Format: whether the command data is transmitted as text or binary, encoding schemes, line terminators, etc.
Command Structure: how commands are put together, e.g. start characters, lengths, delimiters, parameters, end characters, checksums, etc.
The Transport and Port Configuration aspects deal with the low-level connection between computers and other electronic equipment. Command Format and Command Structure are the aspects that we're most interested in when implementing communication protocols.
One of the most critical aspects of a protocol is the format of the commands and parameters that are sent to or received from the device. A command and its parameters will typically be a specific group of bytes, perhaps of a fixed length, called a packet.
There are basically two ways of representing, or encoding, commands and values using packets of bytes: text or binary.
Text encoding uses a numeric representation of alphabetic letters, digits and other symbols, that conforms to a chosen encoding format, so that each end of the communication link can correctly interpret the encoded text. A common encoding format is ASCII, which can represent any upper- or lower-case character from the Latin alphabet, digits, punctuation symbols and a few other characters, 128 in total, all in the space of one byte.
For example, the letter Z is represented in ASCII by the number 90 (0x5A), the digit 9 by the number 57 (0x39) and the symbol $ by 36 (0x24). In Microsoft .NET, text can be stored as individual characters or as groups of characters called strings.
ASCII is the most common and readily supported format for text-based control protocols.
The String to Byte Array pattern illustrates how to display each character of a string and the hex value of the character's ASCII code.
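The idea can be sketched in a few lines. This is not the String to Byte Array pattern itself, only a Python illustration of the same conversion; the function name `describe_ascii` is hypothetical. In .NET, the equivalent conversion is typically done with `Encoding.ASCII.GetBytes`.

```python
def describe_ascii(text: str) -> list:
    """Return (character, hex code) pairs for an ASCII string."""
    data = text.encode("ascii")   # raises UnicodeEncodeError for non-ASCII text
    return [(chr(b), f"0x{b:02X}") for b in data]

for char, code in describe_ascii("Z9$"):
    print(char, code)
# Z 0x5A
# 9 0x39
# $ 0x24
```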
ASCII can only represent text from languages that use the basic Latin alphabet: it can't be used to encode Japanese or Arabic, for example. It is also limited in that it contains only 128 different characters.
This was addressed in the early 1990s by the development of the Unicode standard, which aims to accommodate most written languages and their variants. Unicode covers well over 100 modern and historic writing systems, plus a large number of other symbols: punctuation, mathematical, industrial and graphical. It defines far more than 100,000 characters. Unicode does come at a cost, however: depending on the encoding form used, a single character may need more than one byte. UTF-8 uses one to four bytes per character (ASCII characters still take one), UTF-16 uses two or four, and UTF-32 always uses four, compared to ASCII's one.
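How many bytes a Unicode character occupies depends on which encoding form is chosen. The following Python sketch (the helper name `encoded_sizes` is ours) compares the three common forms for a few sample characters:

```python
def encoded_sizes(ch: str) -> tuple:
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),   # "-le" variants omit the byte order mark
        len(ch.encode("utf-32-le")),
    )

# Latin Z, e with acute accent, a Japanese kanji, an emoji
for ch in ["Z", "\u00e9", "\u65e5", "\U0001F600"]:
    utf8, utf16, utf32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8={utf8}, UTF-16={utf16}, UTF-32={utf32}")
# U+005A: UTF-8=1, UTF-16=2, UTF-32=4
# U+00E9: UTF-8=2, UTF-16=2, UTF-32=4
# U+65E5: UTF-8=3, UTF-16=2, UTF-32=4
# U+1F600: UTF-8=4, UTF-16=4, UTF-32=4
```

Note that plain ASCII text is byte-for-byte identical when encoded as UTF-8, which is one reason UTF-8 is a common choice when a protocol needs more than ASCII.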
There are many other encoding schemes that have been developed at different times, and for different languages and purposes. However, beyond ASCII and Unicode, language-specific encoding becomes complicated very quickly: just take a look at your favorite web browser's "Encoding" options to see how ridiculous things get.
Note
ASCII can also be used to encode binary data; an example of this is email attachments: binary files are converted to text using an encoding format called Base64.
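As a small illustration, Base64 turns arbitrary bytes into a string made only of ASCII characters, and the process is reversible. This sketch uses Python's standard `base64` module; the sample bytes are arbitrary.

```python
import base64

raw = bytes([0x01, 0x02, 0xFF, 0x00])          # arbitrary binary data

encoded = base64.b64encode(raw).decode("ascii")
print(encoded)                                  # AQL/AA==

decoded = base64.b64decode(encoded)
print(decoded == raw)                           # True
```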
Binary encoding just uses numeric values to represent commands or functions. For example, the value 1 (0x01) may mean "start", the value 2 (0x02) may mean "stop", etc.
In Microsoft .NET we can store a complete binary encoded packet in a byte array, which is a group of bytes in a particular order: any byte can be individually read or changed.
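The same idea can be sketched with Python's `bytearray`, which behaves much like a .NET byte array for this purpose. The command values and the three-byte layout below are hypothetical and do not belong to any real device protocol.

```python
# Hypothetical command codes, for illustration only.
CMD_START = 0x01
CMD_STOP = 0x02

# A three-byte packet: command, parameter, trailing value.
packet = bytearray([CMD_START, 0x00, 0x10])

first = packet[0]            # read an individual byte
packet[1] = 0x05             # change an individual byte in place

print(first == CMD_START)    # True
print(packet.hex(" "))       # 01 05 10
```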
Note
The ASCII values of digit characters within text strings (e.g. "The time is 10 o'clock") are not the same as binary encoded numeric values. That is, the string "10" is encoded as two character bytes: 0x31 0x30, whereas the number 10 has a single byte value of 0x0A.
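The difference is easy to see when both forms are printed as hex. A minimal Python sketch:

```python
as_text = "10".encode("ascii")    # two character bytes
as_binary = bytes([10])           # one byte holding the numeric value

print(as_text.hex(" "))           # 31 30
print(as_binary.hex(" ").upper()) # 0A

# To compare them, the text must first be parsed back into a number.
print(int(as_text.decode("ascii")) == as_binary[0])   # True
```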
Even some binary protocols may still require text information to be included in a command packet, and therefore a mix of text and binary encoding may be used in the same protocol.
The Append String to Byte Array pattern illustrates how to append an ASCII encoded string to the end of a byte array.
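A rough Python equivalent of that idea is shown below. It is not the pattern itself: the helper name `append_ascii`, the two-byte binary header and the command text are all made up for illustration.

```python
def append_ascii(packet: bytearray, text: str) -> bytearray:
    """Append the ASCII bytes of text to the end of packet (modified in place)."""
    packet.extend(text.encode("ascii"))
    return packet

packet = bytearray([0x02, 0x01])      # hypothetical binary header
append_ascii(packet, "PWR ON")        # hypothetical text command

print(packet.hex(" ").upper())        # 02 01 50 57 52 20 4F 4E
```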
The Byte Array to String pattern retrieves an ASCII encoded string from an array of bytes.
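Going the other way can be sketched as follows, again in Python rather than as the pattern itself. The helper `extract_ascii` and the packet layout (a two-byte header, six text bytes, one end byte) are assumptions made for the example; in .NET, `Encoding.ASCII.GetString` is typically used for this conversion.

```python
def extract_ascii(packet: bytes, start: int, length: int) -> str:
    """Decode length bytes of packet, beginning at index start, as ASCII text."""
    return packet[start:start + length].decode("ascii")

# Hypothetical packet: header 02 01, the text "PWR ON", end byte 03.
packet = bytes([0x02, 0x01, 0x50, 0x57, 0x52, 0x20, 0x4F, 0x4E, 0x03])

print(extract_ascii(packet, 2, 6))    # PWR ON
```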
Refer to Unidirectional Drivers to learn more about command structures.