From ASCII, UTF-8 and Bit to Byte – Computer basics


In this blog, the operator ^ denotes exponentiation unless noted otherwise. So 2^3 is just 2 * 2 * 2.

From Bits to Byte – computer basics:

Most computer systems are built to work with Byte-sized operands. A Byte is defined as eight binary digits, Bits for short. These are digits with a base of 2, hence each of them can be either zero or one. Such a Byte is always written with leading zeros, for example 00110101. This is important to see the length of the data immediately.

Number ordering:

A number is the sum of all its digits, each multiplied by the base to the power of its position in the number, counted from the right and starting with zero. So the decimal number 123 is just 1 * 10^2 + 2 * 10^1 + 3 * 10^0 = 1 * 100 + 2 * 10 + 3 * 1.
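The rule above can be sketched in a few lines of Python. The function name digits_to_value is made up for this illustration; it is a minimal sketch, not a library function.

```python
def digits_to_value(digits, base):
    """Sum each digit times base ** position; position 0 is the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += digit * base ** position
    return total


# The decimal number 123 and the Byte 00110101 from the text above.
decimal_example = digits_to_value([1, 2, 3], 10)
binary_example = digits_to_value([0, 0, 1, 1, 0, 1, 0, 1], 2)
print(decimal_example, binary_example)
```

The same function works for any base, which is why the Byte example needs no special handling.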

Most and least significant Bit:

The most significant Bit (MSB) is the Bit with the highest order in the number, in a Byte 2^7. The least significant Bit (LSB) is always 2^0. In the decimal number 123, the 1 is the most significant digit, so people usually write the most significant Bit on the left and the least significant Bit on the right, but in reality this is not strictly defined. If you swapped the data lines in a computer, the Byte would be read in the opposite order, so any ordering is possible.
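As a small sketch, the MSB and LSB of a Byte can be picked out with shifts and masks, and reversing the Bit order imitates what swapped data lines would do. The helper names are assumptions chosen for this example.

```python
def msb(byte):
    """Return the Bit with the order 2^7."""
    return (byte >> 7) & 1


def lsb(byte):
    """Return the Bit with the order 2^0."""
    return byte & 1


def reverse_bits(byte):
    """Read the eight Bits in the opposite order."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result


example = 0b00110101
print(msb(example), lsb(example), format(reverse_bits(example), "08b"))
```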

Bytes to decimal number translation:

It is possible to construct 256 different combinations from eight digits with two possible states (two to the power of eight). There is no fixed definition of how to translate a Byte to a decimal number, but there are two commonly used ways:


All Byte numbers on this homepage are unsigned unless noted otherwise, hence without negative numbers. The decimal value is just the binary value of the eight Bits. So one Byte is just a number from 0 to 255, two Bytes together are a number from 0 to 65535 (two to the power of 16, minus one) and so on.
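A minimal sketch of the unsigned reading: each further Byte multiplies the value so far by 256. The function name unsigned_value is made up here, and the sketch assumes the most significant Byte comes first, which the text does not prescribe.

```python
def unsigned_value(data):
    """Combine Bytes into one unsigned number, most significant Byte first."""
    value = 0
    for byte in data:
        value = value * 256 + byte
    return value


one_byte_max = unsigned_value(bytes([0xFF]))
two_bytes_max = unsigned_value(bytes([0xFF, 0xFF]))
print(one_byte_max, two_bytes_max)
```

Python's built-in int.from_bytes(data, "big") computes the same value.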


In the search for the most abstract way of storing positive and negative numbers in just eight Bits, it turned out that a smart way is to store a negative number by complementing all Bits of the positive one. Complementing a Byte means negating all of its Bits, hence each output Bit is zero when the input Bit was one, and one otherwise. When the most significant Bit is zero, the Bits of the Byte represent the number with the base of two; if the MSB is one, the number is the negative of the complemented number. 0000 0011 is 3, 1111 1100 is -3.

Numbers stored this way can be added and subtracted with the ordinary binary adder (in One's complement, a carry out of the MSB has to be added back in at the LSB); only multiplying and dividing need two different commands. To subtract, just add the first operand to the negation of the second one. Keep in mind that a Byte never becomes longer than eight Bits, but if you copy such a Byte into a number of multiple Bytes, the new Bits must all be set to the MSB of the old Byte. 0000 0000 1011 1010 is not the same as 1011 1010!

This representation, the One's complement, has just one bad side effect: calculations may yield zero as well as negative zero. The solution was to negate numbers by complementing them first and then adding one. When you negate zero this way, the result is zero again. The old negative zero becomes -1, the old -1 becomes -2 and so on. The number that is now free becomes the lowest negative number, which cannot be negated either. That is the Byte where just the MSB is set. This system, called Two's complement, is the preferred way to store negative numbers in binary digits.
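The following sketch plays through both negation schemes on single Bytes. All function names are assumptions made for this example, and every result is masked with 0xFF because Python integers are not limited to eight Bits.

```python
def ones_complement(byte):
    """Negate every Bit of the Byte."""
    return ~byte & 0xFF


def twos_complement_negate(byte):
    """Complement first, then add one; the result stays within eight Bits."""
    return (ones_complement(byte) + 1) & 0xFF


def to_signed(byte):
    """Read a Byte as a Two's complement number from -128 to 127."""
    return byte - 256 if byte & 0x80 else byte


def sign_extend_to_16(byte):
    """Copy a Byte into two Bytes; the new Bits take the value of the old MSB."""
    return byte | 0xFF00 if byte & 0x80 else byte


minus_three = twos_complement_negate(0b00000011)
difference = (5 + twos_complement_negate(3)) & 0xFF  # 5 - 3 as an addition
print(format(minus_three, "08b"), to_signed(minus_three), difference)
```

Negating 0b10000000 this way returns 0b10000000 again, which is the lowest negative number that cannot be negated.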

Hexadecimal view:

The hexadecimal view is the most common user-friendly representation of one or more Bytes. Hexa is a Greek prefix for six and decimal means ten, so hexadecimal means 16. Since two to the power of four is 16, a hexadecimal digit needs exactly four Bits for storage. We have just ten Arabic digits in use, so we additionally use the six letters A to F in their usual alphabetical order. A is ten, B is eleven and F is fifteen. Upper- and lowercase don't matter. To convert a Byte to its hexadecimal view, split it into two groups of four Bits and convert each of them independently to the number system described above. You may have seen the MAC address of a network interface card. It contains six Bytes and is usually listed in hexadecimal, each Byte separated by a colon. Hence, it looks like 01:23:45:67:89:AB. You may find such addresses on stickers on devices with network cards (also WLAN), like your router.
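The conversion described above can be sketched like this. The names byte_to_hex and format_mac are invented for the illustration.

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(byte):
    """Split the Byte into two groups of four Bits and look up each digit."""
    high_group = byte >> 4
    low_group = byte & 0x0F
    return HEX_DIGITS[high_group] + HEX_DIGITS[low_group]


def format_mac(data):
    """List each Byte in hexadecimal, separated by colons."""
    return ":".join(byte_to_hex(byte) for byte in data)


mac_text = format_mac(bytes([0x01, 0x23, 0x45, 0x67, 0x89, 0xAB]))
print(byte_to_hex(0b00110101), mac_text)
```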

Each group of four columns lists the unsigned decimal value (Dec), the signed Two's complement value (Sgn), the Bits and the hexadecimal view.

Dec   Sgn  Binary     Hex  | Dec   Sgn  Binary     Hex  | Dec   Sgn  Binary     Hex  | Dec   Sgn  Binary     Hex
  0     0  0000 0000  0x00 |  64    64  0100 0000  0x40 | 128  -128  1000 0000  0x80 | 192   -64  1100 0000  0xC0
  1     1  0000 0001  0x01 |  65    65  0100 0001  0x41 | 129  -127  1000 0001  0x81 | 193   -63  1100 0001  0xC1
  2     2  0000 0010  0x02 |  66    66  0100 0010  0x42 | 130  -126  1000 0010  0x82 | 194   -62  1100 0010  0xC2
  3     3  0000 0011  0x03 |  67    67  0100 0011  0x43 | 131  -125  1000 0011  0x83 | 195   -61  1100 0011  0xC3
  4     4  0000 0100  0x04 |  68    68  0100 0100  0x44 | 132  -124  1000 0100  0x84 | 196   -60  1100 0100  0xC4
  5     5  0000 0101  0x05 |  69    69  0100 0101  0x45 | 133  -123  1000 0101  0x85 | 197   -59  1100 0101  0xC5
  6     6  0000 0110  0x06 |  70    70  0100 0110  0x46 | 134  -122  1000 0110  0x86 | 198   -58  1100 0110  0xC6
  7     7  0000 0111  0x07 |  71    71  0100 0111  0x47 | 135  -121  1000 0111  0x87 | 199   -57  1100 0111  0xC7
  8     8  0000 1000  0x08 |  72    72  0100 1000  0x48 | 136  -120  1000 1000  0x88 | 200   -56  1100 1000  0xC8
  9     9  0000 1001  0x09 |  73    73  0100 1001  0x49 | 137  -119  1000 1001  0x89 | 201   -55  1100 1001  0xC9
 10    10  0000 1010  0x0A |  74    74  0100 1010  0x4A | 138  -118  1000 1010  0x8A | 202   -54  1100 1010  0xCA
 11    11  0000 1011  0x0B |  75    75  0100 1011  0x4B | 139  -117  1000 1011  0x8B | 203   -53  1100 1011  0xCB
 12    12  0000 1100  0x0C |  76    76  0100 1100  0x4C | 140  -116  1000 1100  0x8C | 204   -52  1100 1100  0xCC
 13    13  0000 1101  0x0D |  77    77  0100 1101  0x4D | 141  -115  1000 1101  0x8D | 205   -51  1100 1101  0xCD
 14    14  0000 1110  0x0E |  78    78  0100 1110  0x4E | 142  -114  1000 1110  0x8E | 206   -50  1100 1110  0xCE
 15    15  0000 1111  0x0F |  79    79  0100 1111  0x4F | 143  -113  1000 1111  0x8F | 207   -49  1100 1111  0xCF
 16    16  0001 0000  0x10 |  80    80  0101 0000  0x50 | 144  -112  1001 0000  0x90 | 208   -48  1101 0000  0xD0
 17    17  0001 0001  0x11 |  81    81  0101 0001  0x51 | 145  -111  1001 0001  0x91 | 209   -47  1101 0001  0xD1
 18    18  0001 0010  0x12 |  82    82  0101 0010  0x52 | 146  -110  1001 0010  0x92 | 210   -46  1101 0010  0xD2
 19    19  0001 0011  0x13 |  83    83  0101 0011  0x53 | 147  -109  1001 0011  0x93 | 211   -45  1101 0011  0xD3
 20    20  0001 0100  0x14 |  84    84  0101 0100  0x54 | 148  -108  1001 0100  0x94 | 212   -44  1101 0100  0xD4
 21    21  0001 0101  0x15 |  85    85  0101 0101  0x55 | 149  -107  1001 0101  0x95 | 213   -43  1101 0101  0xD5
 22    22  0001 0110  0x16 |  86    86  0101 0110  0x56 | 150  -106  1001 0110  0x96 | 214   -42  1101 0110  0xD6
 23    23  0001 0111  0x17 |  87    87  0101 0111  0x57 | 151  -105  1001 0111  0x97 | 215   -41  1101 0111  0xD7
 24    24  0001 1000  0x18 |  88    88  0101 1000  0x58 | 152  -104  1001 1000  0x98 | 216   -40  1101 1000  0xD8
 25    25  0001 1001  0x19 |  89    89  0101 1001  0x59 | 153  -103  1001 1001  0x99 | 217   -39  1101 1001  0xD9
 26    26  0001 1010  0x1A |  90    90  0101 1010  0x5A | 154  -102  1001 1010  0x9A | 218   -38  1101 1010  0xDA
 27    27  0001 1011  0x1B |  91    91  0101 1011  0x5B | 155  -101  1001 1011  0x9B | 219   -37  1101 1011  0xDB
 28    28  0001 1100  0x1C |  92    92  0101 1100  0x5C | 156  -100  1001 1100  0x9C | 220   -36  1101 1100  0xDC
 29    29  0001 1101  0x1D |  93    93  0101 1101  0x5D | 157   -99  1001 1101  0x9D | 221   -35  1101 1101  0xDD
 30    30  0001 1110  0x1E |  94    94  0101 1110  0x5E | 158   -98  1001 1110  0x9E | 222   -34  1101 1110  0xDE
 31    31  0001 1111  0x1F |  95    95  0101 1111  0x5F | 159   -97  1001 1111  0x9F | 223   -33  1101 1111  0xDF
 32    32  0010 0000  0x20 |  96    96  0110 0000  0x60 | 160   -96  1010 0000  0xA0 | 224   -32  1110 0000  0xE0
 33    33  0010 0001  0x21 |  97    97  0110 0001  0x61 | 161   -95  1010 0001  0xA1 | 225   -31  1110 0001  0xE1
 34    34  0010 0010  0x22 |  98    98  0110 0010  0x62 | 162   -94  1010 0010  0xA2 | 226   -30  1110 0010  0xE2
 35    35  0010 0011  0x23 |  99    99  0110 0011  0x63 | 163   -93  1010 0011  0xA3 | 227   -29  1110 0011  0xE3
 36    36  0010 0100  0x24 | 100   100  0110 0100  0x64 | 164   -92  1010 0100  0xA4 | 228   -28  1110 0100  0xE4
 37    37  0010 0101  0x25 | 101   101  0110 0101  0x65 | 165   -91  1010 0101  0xA5 | 229   -27  1110 0101  0xE5
 38    38  0010 0110  0x26 | 102   102  0110 0110  0x66 | 166   -90  1010 0110  0xA6 | 230   -26  1110 0110  0xE6
 39    39  0010 0111  0x27 | 103   103  0110 0111  0x67 | 167   -89  1010 0111  0xA7 | 231   -25  1110 0111  0xE7
 40    40  0010 1000  0x28 | 104   104  0110 1000  0x68 | 168   -88  1010 1000  0xA8 | 232   -24  1110 1000  0xE8
 41    41  0010 1001  0x29 | 105   105  0110 1001  0x69 | 169   -87  1010 1001  0xA9 | 233   -23  1110 1001  0xE9
 42    42  0010 1010  0x2A | 106   106  0110 1010  0x6A | 170   -86  1010 1010  0xAA | 234   -22  1110 1010  0xEA
 43    43  0010 1011  0x2B | 107   107  0110 1011  0x6B | 171   -85  1010 1011  0xAB | 235   -21  1110 1011  0xEB
 44    44  0010 1100  0x2C | 108   108  0110 1100  0x6C | 172   -84  1010 1100  0xAC | 236   -20  1110 1100  0xEC
 45    45  0010 1101  0x2D | 109   109  0110 1101  0x6D | 173   -83  1010 1101  0xAD | 237   -19  1110 1101  0xED
 46    46  0010 1110  0x2E | 110   110  0110 1110  0x6E | 174   -82  1010 1110  0xAE | 238   -18  1110 1110  0xEE
 47    47  0010 1111  0x2F | 111   111  0110 1111  0x6F | 175   -81  1010 1111  0xAF | 239   -17  1110 1111  0xEF
 48    48  0011 0000  0x30 | 112   112  0111 0000  0x70 | 176   -80  1011 0000  0xB0 | 240   -16  1111 0000  0xF0
 49    49  0011 0001  0x31 | 113   113  0111 0001  0x71 | 177   -79  1011 0001  0xB1 | 241   -15  1111 0001  0xF1
 50    50  0011 0010  0x32 | 114   114  0111 0010  0x72 | 178   -78  1011 0010  0xB2 | 242   -14  1111 0010  0xF2
 51    51  0011 0011  0x33 | 115   115  0111 0011  0x73 | 179   -77  1011 0011  0xB3 | 243   -13  1111 0011  0xF3
 52    52  0011 0100  0x34 | 116   116  0111 0100  0x74 | 180   -76  1011 0100  0xB4 | 244   -12  1111 0100  0xF4
 53    53  0011 0101  0x35 | 117   117  0111 0101  0x75 | 181   -75  1011 0101  0xB5 | 245   -11  1111 0101  0xF5
 54    54  0011 0110  0x36 | 118   118  0111 0110  0x76 | 182   -74  1011 0110  0xB6 | 246   -10  1111 0110  0xF6
 55    55  0011 0111  0x37 | 119   119  0111 0111  0x77 | 183   -73  1011 0111  0xB7 | 247    -9  1111 0111  0xF7
 56    56  0011 1000  0x38 | 120   120  0111 1000  0x78 | 184   -72  1011 1000  0xB8 | 248    -8  1111 1000  0xF8
 57    57  0011 1001  0x39 | 121   121  0111 1001  0x79 | 185   -71  1011 1001  0xB9 | 249    -7  1111 1001  0xF9
 58    58  0011 1010  0x3A | 122   122  0111 1010  0x7A | 186   -70  1011 1010  0xBA | 250    -6  1111 1010  0xFA
 59    59  0011 1011  0x3B | 123   123  0111 1011  0x7B | 187   -69  1011 1011  0xBB | 251    -5  1111 1011  0xFB
 60    60  0011 1100  0x3C | 124   124  0111 1100  0x7C | 188   -68  1011 1100  0xBC | 252    -4  1111 1100  0xFC
 61    61  0011 1101  0x3D | 125   125  0111 1101  0x7D | 189   -67  1011 1101  0xBD | 253    -3  1111 1101  0xFD
 62    62  0011 1110  0x3E | 126   126  0111 1110  0x7E | 190   -66  1011 1110  0xBE | 254    -2  1111 1110  0xFE
 63    63  0011 1111  0x3F | 127   127  0111 1111  0x7F | 191   -65  1011 1111  0xBF | 255    -1  1111 1111  0xFF

ASCII Codepage – from numbers to letters:

The usual basic way to store letters as Bytes is to use translation tables, called Codepages. The best known is ASCII, the American Standard Code for Information Interchange. It has a size of seven Bits, so it contains 128 elements. The table contains uppercase characters from A to Z, lowercase characters from a to z, number characters from 0 to 9, symbols like #, $ and % and a lot of control characters. When somebody speaks about ASCII, they usually mean UTF-8. As the name suggests, UTF-8 works in units of eight Bits. The Bytes 0x00 to 0x7F of UTF-8 are the same as ASCII, so it is fully compatible; every other character is stored as a sequence of two to four Bytes that all have the MSB set. Many languages contain extra letters that are missing in ASCII, and on most computer systems it is not possible, or takes a lot of extra effort, to store seven-Bit numbers efficiently. The consequence is to store ASCII characters in eight Bits, where the most significant Bit is always zero. So ASCII has no advantage over UTF-8, which is why only few systems are left that use plain ASCII.
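A short sketch of the translation from letters to numbers, using Python's built-in str.encode. The sample text is an arbitrary choice for this example.

```python
text = "Byte 42"

# Translate each character to its number with both encodings.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

codes = list(ascii_bytes)
# Every ASCII character fits in seven Bits, so the MSB of each Byte is zero.
all_msb_zero = all(byte < 0x80 for byte in ascii_bytes)

print(codes, ascii_bytes == utf8_bytes, all_msb_zero)
```

For text that contains only ASCII characters, both encodings produce exactly the same Bytes.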

ASCII table:

Wrapped in hexadecimal style:

When you view the ASCII table below, wrapped in hexadecimal style, you will see the different groups of elements as well. It starts with 32 control characters (including line feed), followed by numbers, uppercase and lowercase letters, all separated by a few symbol characters. If you prefer the decimal ordering, look at the second table or at the image further below, which includes both.

A space is not easy to display, since it is a space filled with nothing. It has the number 0x20 (dec. 32).

2x:  " "  !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /

Wrapped in decimal style:

3x:  RS  US  " "  !  "  #  $  %  &  '

As image:

ASCII Table wrapped hexadecimal and decimal.

UTF-8 table:

Wrapped in hexadecimal style:

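The upper part of the table (0x80 to 0xFF) is sorted in a similar pattern as ASCII. As mentioned above, the Bytes 0b0xxxxxxx, hence 0x00 to 0x7F, are identical to ASCII. Characters are usually noted as U+<hexadecimal number>, the Unicode code point. So U+DF is the character ß. Often leading zeros are prepended (U+00DF) to make room for longer code points, but the value stays the same. Keep in mind that a code point above 0x7F is not stored as a single Byte in UTF-8: ß is stored as the two Bytes 0xC3 0x9F.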
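To see how the character ß relates to its number and to the Bytes actually stored, here is a small sketch using only Python built-ins. The escape "\u00df" is used so the example does not depend on the encoding of the source file.

```python
sharp_s = "\u00df"  # the character ß

code_point = ord(sharp_s)                 # the character's number
utf8_bytes = sharp_s.encode("utf-8")      # the Bytes UTF-8 stores for it
latin1_bytes = sharp_s.encode("latin-1")  # a one-Byte Codepage for comparison

print(hex(code_point), utf8_bytes.hex(), latin1_bytes.hex())
```

The number 0xDF only appears as a single Byte in a one-Byte Codepage such as Latin-1; UTF-8 needs two Bytes for it, and both have the MSB set.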


Wrapped in decimal style:


As image:



A stack is a memory that usually has just two access methods, one to write (push) and one to read (pop). A pop returns and removes the most recently pushed element. Such memories are called LIFO (last-in-first-out) or FILO (first-in-last-out). So it is similar to a stack of heavy cardboard boxes, where you can only put a new box on top of the uppermost one and only take the uppermost one away.
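A Python list already behaves like such a stack when only append (push) and pop are used. The box names below are placeholders for this sketch.

```python
stack = []

# Push three boxes, one on top of the other.
stack.append("box 1")
stack.append("box 2")
stack.append("box 3")

# Pop them again: the most recently pushed box comes back first.
popped = [stack.pop(), stack.pop(), stack.pop()]
print(popped)
```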

Usages of stacks:

The advantage of a stack is that it can store a lot of data while managing them only by the order of accesses. It is possible to push ten values and get them back later, as long as each part pops exactly as many values as it pushed. A frequent usage is to store a call hierarchy. When a method is called, the memory for the local method variables, called the stack frame, is pushed; after leaving the method, the last stack frame is thrown away. Nearly all computer processors have instructions to push and pop data at the address held in a register called the stack pointer, including incrementing or decrementing it, but some machines have just a stack as their only storage. A usual instruction pops its arguments from the stack and pushes the result back, some just push new values onto the stack. Two famous examples of such stack machines are the Java Virtual Machine (part of the Java Runtime Environment) and Bitcoin Script.
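A stack machine of this kind can be sketched in a few lines. The instruction names ADD and MUL and the function run are invented for this illustration; they are not taken from the Java Virtual Machine or from Bitcoin Script.

```python
def run(program):
    """Execute a list of instructions on a stack and return the top value."""
    stack = []
    for instruction in program:
        if instruction == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif instruction == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            # Anything else is a plain value that is pushed onto the stack.
            stack.append(instruction)
    return stack.pop()


result = run([2, 3, "ADD", 4, "MUL"])  # computes (2 + 3) * 4
print(result)
```

Each instruction pops exactly the arguments it needs and pushes one result, so no variable names or addresses are required.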


Base is an algorithm to translate a stream of Bytes into an alphabet of displayable characters. The best known one is Base64 with 64 symbols. So it reads the incoming data in parts of six Bits. Such a representation is considerably shorter than the hexadecimal view used here, but only few people are able to decode or encode it in their head. Bitcoin defines its own version, Base58, for various reasons; for example, it contains only symbols that don't break the selection by double-clicking in most implementations.
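The six-Bit grouping can be sketched for one group of three Bytes, which yields exactly four symbols. The helper encode_three_bytes is made up for this example and ignores padding; the standard library module base64 is used only to cross-check the result.

```python
import base64
import string

# The standard Base64 alphabet: 26 + 26 + 10 + 2 = 64 symbols.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_three_bytes(chunk):
    """Read 24 Bits as four parts of six Bits and look up one symbol for each."""
    value = int.from_bytes(chunk, "big")
    return "".join(ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0))


manual = encode_three_bytes(b"Man")
library = base64.b64encode(b"Man").decode("ascii")
hex_view = b"Man".hex()
print(manual, library, hex_view)
```

Three Bytes need four Base64 symbols but six hexadecimal digits.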


There are two usual ways to encrypt data: symmetric encryption and asymmetric encryption. Hashing is sometimes called encryption, but this is technically incorrect. Encryption means changing a stream of data, the plain data, by an algorithm that uses a second Byte argument, the key. A modern encryption algorithm is called safe if there is no realistic known way to convert parts of the encrypted data back as long as the key is unknown. A corresponding decryption algorithm changes the encrypted data back to the plain data. It can be the same as the encryption algorithm, it can do the same steps in reverse order, or it can be completely different. The key doesn't need to be the same either.

Symmetric encryption:

Symmetric encryption algorithms like Rijndael, Blowfish and Serpent are the ones users work with most directly. They are primarily used to encrypt local data. You have one Byte sequence as key, like a passphrase, a file filled with random Bytes or something else. The encryption algorithm changes the plain data based on your key, so hopefully nobody can read them. If you need your data back, the corresponding decryption algorithm uses your key to calculate the encrypted data back to their initial form.
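The principle of one key for both directions can be shown with a toy XOR cipher. This is deliberately not Rijndael, Blowfish or Serpent and it is not safe for real data; the function name and the sample key are assumptions for this sketch.

```python
from itertools import cycle


def xor_cipher(data, key):
    """Toy cipher: XOR every data Byte with the repeating key Bytes."""
    if not key:
        raise ValueError("the key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


key = b"secret"
plain = b"plain data"

encrypted = xor_cipher(plain, key)
decrypted = xor_cipher(encrypted, key)  # the same key restores the plain data
print(encrypted.hex(), decrypted)
```

Here the decryption algorithm happens to be identical to the encryption algorithm, which is one of the possibilities mentioned above.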

Asymmetric encryption:

Asymmetric encryption algorithms like RSA generate a key pair. A key pair contains a public and a private key. In the case of RSA, each key is just two very big numbers, an exponent and a modulus, where the modulus is the same for both keys and the public exponent is usually 0x10001 (65537). Keys are usually just written to files, since it would be laborious to transfer a key by hand. Everything you encrypt with one of the keys in a key pair can only be decrypted with the other key. That allows…

  1. encrypted communication: Symmetric encryption allows encrypted communication only if the key is transferred over an absolutely safe channel, like in person. To set up a secured tunnel between Kim and Sarah, both girls generate a key pair and share their public keys. When Kim sends Sarah encrypted data, she first encrypts the data with Sarah's public key, so Sarah can decrypt the data with her private key. In fact, a hybrid encryption is usually used. The content of the transmitted data is symmetrically encrypted with a securely generated random password, and only this password is transferred asymmetrically encrypted. That is because symmetric encryption is faster by orders of magnitude and easier to use.
  2. safer logins (than password-based authentication): This is mostly seen with SSH, to log into a remote server. The server holds a list of public keys that are allowed to log in. When the user wants to log in, the server generates a random password, called the challenge. It is sent to the user encrypted with the public key, and the user has to send the decrypted challenge back. That makes it virtually impossible for attackers to guess the password. The private key file is usually symmetrically encrypted with a password for additional protection.
  3. signatures: A signature is a proof of owning the private key. There are different kinds of signatures, for example encrypting the hash of the data. When everybody knows my public key and I publish a text article, I could hash the text and encrypt the hash with my private key. So it would be easy to validate that the article was created, or at least signed, by me: decrypt the hash with my public key and compare it with the real hash of the text.
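Encryption and signing with a key pair can be sketched with textbook RSA on tiny numbers. The primes 61 and 53, the exponents, the helper names and the sample text are assumptions for illustration only; real keys are far larger and real implementations add padding, so this is not safe for actual use.

```python
import hashlib

# Toy key pair built from the tiny primes 61 and 53.
MODULUS = 61 * 53         # 3233, shared by both keys
PUBLIC_EXPONENT = 17
PRIVATE_EXPONENT = 2753   # chosen so that 17 * 2753 % (60 * 52) == 1


def apply_key(number, exponent):
    """Textbook RSA: raise the number to the exponent modulo the modulus."""
    return pow(number, exponent, MODULUS)


def toy_hash(text):
    """SHA-256 reduced modulo MODULUS so that it fits the toy key size."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % MODULUS


# Encrypted communication: encrypt with the public key, decrypt with the private key.
message = 65
encrypted = apply_key(message, PUBLIC_EXPONENT)
decrypted = apply_key(encrypted, PRIVATE_EXPONENT)

# Signature: encrypt the hash with the private key, check it with the public key.
article = "My text article"
signature = apply_key(toy_hash(article), PRIVATE_EXPONENT)
signature_is_valid = apply_key(signature, PUBLIC_EXPONENT) == toy_hash(article)

print(encrypted, decrypted, signature_is_valid)
```

Both directions use the same function; only the exponent changes, which is why what one key encrypts only the other key can decrypt.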

It is always very important to confirm that the public key you received really is the public key of this person. A public key written on a website would not be very trustworthy, especially for verifying a signature on the same page. The Pidgin plugin Off-the-Record, for example, implements a way to mark a public key as trustworthy by means of a question whose answer only the two people involved know.
