From ASCII, UTF-8 and Bit to Byte – Computer Basics


Preamble:

In this blog, the operator `^` denotes exponentiation, as in most other texts. So 2^3 is just 2 * 2 * 2.

From Bits to Byte – computer basics:

Most computer systems are built to work with Byte-sized operands. A Byte is defined as eight binary digits, Bits for short. These are digits with a base of 2, hence each can be either zero or one. A Byte is always written with leading zeros, for example `00110101`. This is important so that the length of the data is immediately visible.
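As a minimal sketch of this notation, the following Python snippet writes a value as an eight-digit Byte with leading zeros. The function name `byte_to_bits` is made up for this illustration.

```python
def byte_to_bits(value: int) -> str:
    """Return the eight-digit binary notation of an unsigned Byte value."""
    if not 0 <= value <= 255:
        raise ValueError("a Byte holds values from 0 to 255")
    # "08b" means: binary digits, padded with zeros to a width of eight
    return format(value, "08b")


print(byte_to_bits(53))  # 00110101
print(byte_to_bits(5))   # 00000101
```

Without the padding, 5 would be printed as `101`, and the reader could no longer tell whether the data is three Bits or a full Byte.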

Number ordering:

A number is the sum of all its digits, each multiplied by the base to the power of the digit's position in the number, counting positions from the right and starting with zero. So the decimal number 123 is just `1 * 10^2 + 2 * 10^1 + 3 * 10^0 = 1 * 100 + 2 * 10 + 3 * 1`.
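The same sum can be sketched in Python for any base. Note that Python writes exponentiation as `**`, because `^` means bitwise XOR there. The helper name `positional_value` is an assumption for this example.

```python
def positional_value(digits: list[int], base: int) -> int:
    """Sum digit * base**position, where position 0 is the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += digit * base ** position
    return total


print(positional_value([1, 2, 3], 10))                # 123
print(positional_value([0, 0, 1, 1, 0, 1, 0, 1], 2))  # 53
```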

Most and least significant Bit:

The most significant Bit (MSB) is the Bit with the highest order in the number, in a Byte `2^7`. The least significant Bit (LSB) is always `2^0`. In the decimal number 123, the 1 is the most significant digit, so by convention the most significant Bit is written on the left and the least significant Bit on the right, but in reality this is not strictly defined. If you swapped the data lines in a computer, the Byte would be read in the other order, so any ordering is possible.
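A small sketch of both ideas, assuming the usual convention that the MSB is written on the left. The three function names are hypothetical; `swap_bit_order` mimics what reading a Byte over reversed data lines would produce.

```python
def most_significant_bit(value: int) -> int:
    """Return the Bit with weight 2**7 of a Byte."""
    return (value >> 7) & 1


def least_significant_bit(value: int) -> int:
    """Return the Bit with weight 2**0 of a Byte."""
    return value & 1


def swap_bit_order(value: int) -> int:
    """Read the eight Bits of a Byte in the opposite order."""
    bits = format(value, "08b")
    return int(bits[::-1], 2)


value = 0b00110101
print(most_significant_bit(value))           # 0
print(least_significant_bit(value))          # 1
print(format(swap_bit_order(value), "08b"))  # 10101100
```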

Bytes to decimal number translation:

Eight digits with two possible states each allow 256 different combinations (two to the power of eight). There is no fixed definition of how to translate a Byte to a decimal number, but two ways are widely used:

Unsigned:

All Byte numbers on this homepage are unsigned unless otherwise noted, hence without negative numbers. The decimal value is just the binary value of the eight Bits. So one Byte is a number from 0 to 255, two Bytes together are a number from 0 to 65535 (two to the power of 16, minus one) and so on.
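The following sketch combines several Bytes into one unsigned number. It assumes the most significant Byte comes first (big-endian), which the text above does not specify; `unsigned_value` is a name chosen for this example.

```python
def unsigned_value(data: bytes) -> int:
    """Interpret a Byte sequence as one unsigned number, first Byte most significant."""
    total = 0
    for byte in data:
        # each additional Byte shifts the previous ones up by 2**8
        total = total * 256 + byte
    return total


print(unsigned_value(bytes([0xFF])))        # 255
print(unsigned_value(bytes([0xFF, 0xFF])))  # 65535
print(unsigned_value(bytes([0x01, 0x00])))  # 256
```

Python's built-in `int.from_bytes(data, "big")` gives the same result; the loop only spells out what happens.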

Two's complement:

In the search for a simple way to store positive and negative numbers in just eight Bits, a first idea was to store a negative number by complementing all Bits of the positive one. Complementing a Byte means negating each of its Bits, hence each output Bit is zero when the input was one, and one otherwise. When the most significant Bit is zero, the Bits of the Byte represent the number in base two; if the MSB is one, the number is the negative of the complemented number. `0000 0011` is `3`, `1111 1100` is `-3`.

This representation, the One's complement, had one bad side effect: calculations may produce both zero (`0000 0000`) and negative zero (`1111 1111`) as results. The solution was to negate numbers by complementing them first and then adding one. When you negate zero this way, the result is zero again. The old negative zero becomes -1, the old -1 becomes -2 and so on, so `1111 1101` is now `-3`. The freed bit pattern becomes the lowest negative number, -128, which cannot be negated. That is the pattern where only the MSB is set, `1000 0000`. This system, called Two's complement, is the preferred way to store negative numbers in binary digits.

Numbers stored this way can be added and subtracted with the same instructions as unsigned numbers; only multiplication and division generally need separate signed and unsigned instructions. To subtract, just add the first operand to the negation of the second one. Keep in mind that a Byte never becomes larger than eight Bits, but if you copy such a Byte into a number of multiple Bytes, the new Bits must all be set to the MSB of the old Byte. `0000 0000 1011 1010` is not the same as `1011 1010`; the correct extension is `1111 1111 1011 1010`.
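The negation rules can be tried out with a few lines of Python. This is only a sketch: Python integers are not limited to eight Bits, so the code masks with `0xFF` to imitate a Byte, and all function names are invented for the example.

```python
def ones_complement(value: int) -> int:
    """Flip every Bit of a Byte."""
    return value ^ 0xFF


def negate(value: int) -> int:
    """Two's complement negation: complement, add one, keep eight Bits."""
    return (ones_complement(value) + 1) & 0xFF


def to_signed(value: int) -> int:
    """Read a Byte as a two's complement number from -128 to 127."""
    return value - 256 if value & 0x80 else value


def sign_extend_to_16(value: int) -> int:
    """Copy a Byte into 16 Bits, filling the new Bits with the old MSB."""
    return value | 0xFF00 if value & 0x80 else value


print(format(ones_complement(3), "08b"))            # 11111100
print(format(negate(3), "08b"))                     # 11111101
print(to_signed(negate(3)))                         # -3
print(negate(0))                                    # 0
print(to_signed((5 + negate(3)) & 0xFF))            # 2, i.e. 5 - 3
print(format(sign_extend_to_16(0b10111010), "016b"))  # 1111111110111010
```

Negating `1000 0000` with `negate` returns `1000 0000` again, which is the lowest negative number that cannot be negated.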

The hexadecimal view is the most common user-friendly representation of one or more Bytes. Hexa is a Greek prefix for six and decimal means ten, so hexadecimal means 16. Since two to the power of four is 16, a hexadecimal digit needs exactly four Bits of storage. We only have ten Arabic digits, so we additionally use the six letters A to F in alphabetical order: A is ten, B is eleven and F is fifteen. Upper- and lowercase do not matter. To convert a Byte to its hexadecimal view, split it into two groups of four Bits and convert each of them independently to the number system described above. You may have seen the MAC address of a network interface card. It contains six Bytes and is usually written in hexadecimal, with the Bytes separated by colons. Hence, it looks like 01:23:45:67:89:AB. You may find such addresses on stickers on devices with network cards (including WLAN), like your router.
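The split into two groups of four Bits can be sketched as follows. The names `byte_to_hex` and `format_mac` are assumptions for this illustration, and the MAC address is the example value from the text, not a real device.

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(value: int) -> str:
    """Convert one Byte to two hexadecimal digits."""
    high = value >> 4    # upper four Bits
    low = value & 0x0F   # lower four Bits
    return HEX_DIGITS[high] + HEX_DIGITS[low]


def format_mac(data: bytes) -> str:
    """Write six Bytes in the usual colon-separated MAC notation."""
    if len(data) != 6:
        raise ValueError("a MAC address has six Bytes")
    return ":".join(byte_to_hex(byte) for byte in data)


print(byte_to_hex(0b00110101))  # 35
print(format_mac(bytes([0x01, 0x23, 0x45, 0x67, 0x89, 0xAB])))  # 01:23:45:67:89:AB
```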

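Each row lists four Bytes, one from each of the ranges 0x00-0x3F, 0x40-0x7F, 0x80-0xBF and 0xC0-0xFF, as unsigned decimal, signed (two's complement) decimal, binary and hexadecimal.

| Unsigned | Signed | Binary | Hex | Unsigned | Signed | Binary | Hex | Unsigned | Signed | Binary | Hex | Unsigned | Signed | Binary | Hex |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000 | 000 | 0000 0000 | 0x00 | 64 | 64 | 0100 0000 | 0x40 | 128 | -128 | 1000 0000 | 0x80 | 192 | -64 | 1100 0000 | 0xC0 |
| 001 | 001 | 0000 0001 | 0x01 | 65 | 65 | 0100 0001 | 0x41 | 129 | -127 | 1000 0001 | 0x81 | 193 | -63 | 1100 0001 | 0xC1 |
| 002 | 002 | 0000 0010 | 0x02 | 66 | 66 | 0100 0010 | 0x42 | 130 | -126 | 1000 0010 | 0x82 | 194 | -62 | 1100 0010 | 0xC2 |
| 003 | 003 | 0000 0011 | 0x03 | 67 | 67 | 0100 0011 | 0x43 | 131 | -125 | 1000 0011 | 0x83 | 195 | -61 | 1100 0011 | 0xC3 |
| 004 | 004 | 0000 0100 | 0x04 | 68 | 68 | 0100 0100 | 0x44 | 132 | -124 | 1000 0100 | 0x84 | 196 | -60 | 1100 0100 | 0xC4 |
| 005 | 005 | 0000 0101 | 0x05 | 69 | 69 | 0100 0101 | 0x45 | 133 | -123 | 1000 0101 | 0x85 | 197 | -59 | 1100 0101 | 0xC5 |
| 006 | 006 | 0000 0110 | 0x06 | 70 | 70 | 0100 0110 | 0x46 | 134 | -122 | 1000 0110 | 0x86 | 198 | -58 | 1100 0110 | 0xC6 |
| 007 | 007 | 0000 0111 | 0x07 | 71 | 71 | 0100 0111 | 0x47 | 135 | -121 | 1000 0111 | 0x87 | 199 | -57 | 1100 0111 | 0xC7 |
| 008 | 008 | 0000 1000 | 0x08 | 72 | 72 | 0100 1000 | 0x48 | 136 | -120 | 1000 1000 | 0x88 | 200 | -56 | 1100 1000 | 0xC8 |
| 009 | 009 | 0000 1001 | 0x09 | 73 | 73 | 0100 1001 | 0x49 | 137 | -119 | 1000 1001 | 0x89 | 201 | -55 | 1100 1001 | 0xC9 |
| 010 | 010 | 0000 1010 | 0x0A | 74 | 74 | 0100 1010 | 0x4A | 138 | -118 | 1000 1010 | 0x8A | 202 | -54 | 1100 1010 | 0xCA |
| 011 | 011 | 0000 1011 | 0x0B | 75 | 75 | 0100 1011 | 0x4B | 139 | -117 | 1000 1011 | 0x8B | 203 | -53 | 1100 1011 | 0xCB |
| 012 | 012 | 0000 1100 | 0x0C | 76 | 76 | 0100 1100 | 0x4C | 140 | -116 | 1000 1100 | 0x8C | 204 | -52 | 1100 1100 | 0xCC |
| 013 | 013 | 0000 1101 | 0x0D | 77 | 77 | 0100 1101 | 0x4D | 141 | -115 | 1000 1101 | 0x8D | 205 | -51 | 1100 1101 | 0xCD |
| 014 | 014 | 0000 1110 | 0x0E | 78 | 78 | 0100 1110 | 0x4E | 142 | -114 | 1000 1110 | 0x8E | 206 | -50 | 1100 1110 | 0xCE |
| 015 | 015 | 0000 1111 | 0x0F | 79 | 79 | 0100 1111 | 0x4F | 143 | -113 | 1000 1111 | 0x8F | 207 | -49 | 1100 1111 | 0xCF |
| 016 | 016 | 0001 0000 | 0x10 | 80 | 80 | 0101 0000 | 0x50 | 144 | -112 | 1001 0000 | 0x90 | 208 | -48 | 1101 0000 | 0xD0 |
| 017 | 017 | 0001 0001 | 0x11 | 81 | 81 | 0101 0001 | 0x51 | 145 | -111 | 1001 0001 | 0x91 | 209 | -47 | 1101 0001 | 0xD1 |
| 018 | 018 | 0001 0010 | 0x12 | 82 | 82 | 0101 0010 | 0x52 | 146 | -110 | 1001 0010 | 0x92 | 210 | -46 | 1101 0010 | 0xD2 |
| 019 | 019 | 0001 0011 | 0x13 | 83 | 83 | 0101 0011 | 0x53 | 147 | -109 | 1001 0011 | 0x93 | 211 | -45 | 1101 0011 | 0xD3 |
| 020 | 020 | 0001 0100 | 0x14 | 84 | 84 | 0101 0100 | 0x54 | 148 | -108 | 1001 0100 | 0x94 | 212 | -44 | 1101 0100 | 0xD4 |
| 021 | 021 | 0001 0101 | 0x15 | 85 | 85 | 0101 0101 | 0x55 | 149 | -107 | 1001 0101 | 0x95 | 213 | -43 | 1101 0101 | 0xD5 |
| 022 | 022 | 0001 0110 | 0x16 | 86 | 86 | 0101 0110 | 0x56 | 150 | -106 | 1001 0110 | 0x96 | 214 | -42 | 1101 0110 | 0xD6 |
| 023 | 023 | 0001 0111 | 0x17 | 87 | 87 | 0101 0111 | 0x57 | 151 | -105 | 1001 0111 | 0x97 | 215 | -41 | 1101 0111 | 0xD7 |
| 024 | 024 | 0001 1000 | 0x18 | 88 | 88 | 0101 1000 | 0x58 | 152 | -104 | 1001 1000 | 0x98 | 216 | -40 | 1101 1000 | 0xD8 |
| 025 | 025 | 0001 1001 | 0x19 | 89 | 89 | 0101 1001 | 0x59 | 153 | -103 | 1001 1001 | 0x99 | 217 | -39 | 1101 1001 | 0xD9 |
| 026 | 026 | 0001 1010 | 0x1A | 90 | 90 | 0101 1010 | 0x5A | 154 | -102 | 1001 1010 | 0x9A | 218 | -38 | 1101 1010 | 0xDA |
| 027 | 027 | 0001 1011 | 0x1B | 91 | 91 | 0101 1011 | 0x5B | 155 | -101 | 1001 1011 | 0x9B | 219 | -37 | 1101 1011 | 0xDB |
| 028 | 028 | 0001 1100 | 0x1C | 92 | 92 | 0101 1100 | 0x5C | 156 | -100 | 1001 1100 | 0x9C | 220 | -36 | 1101 1100 | 0xDC |
| 029 | 029 | 0001 1101 | 0x1D | 93 | 93 | 0101 1101 | 0x5D | 157 | -99 | 1001 1101 | 0x9D | 221 | -35 | 1101 1101 | 0xDD |
| 030 | 030 | 0001 1110 | 0x1E | 94 | 94 | 0101 1110 | 0x5E | 158 | -98 | 1001 1110 | 0x9E | 222 | -34 | 1101 1110 | 0xDE |
| 031 | 031 | 0001 1111 | 0x1F | 95 | 95 | 0101 1111 | 0x5F | 159 | -97 | 1001 1111 | 0x9F | 223 | -33 | 1101 1111 | 0xDF |
| 032 | 032 | 0010 0000 | 0x20 | 96 | 96 | 0110 0000 | 0x60 | 160 | -96 | 1010 0000 | 0xA0 | 224 | -32 | 1110 0000 | 0xE0 |
| 033 | 033 | 0010 0001 | 0x21 | 97 | 97 | 0110 0001 | 0x61 | 161 | -95 | 1010 0001 | 0xA1 | 225 | -31 | 1110 0001 | 0xE1 |
| 034 | 034 | 0010 0010 | 0x22 | 98 | 98 | 0110 0010 | 0x62 | 162 | -94 | 1010 0010 | 0xA2 | 226 | -30 | 1110 0010 | 0xE2 |
| 035 | 035 | 0010 0011 | 0x23 | 99 | 99 | 0110 0011 | 0x63 | 163 | -93 | 1010 0011 | 0xA3 | 227 | -29 | 1110 0011 | 0xE3 |
| 036 | 036 | 0010 0100 | 0x24 | 100 | 100 | 0110 0100 | 0x64 | 164 | -92 | 1010 0100 | 0xA4 | 228 | -28 | 1110 0100 | 0xE4 |
| 037 | 037 | 0010 0101 | 0x25 | 101 | 101 | 0110 0101 | 0x65 | 165 | -91 | 1010 0101 | 0xA5 | 229 | -27 | 1110 0101 | 0xE5 |
| 038 | 038 | 0010 0110 | 0x26 | 102 | 102 | 0110 0110 | 0x66 | 166 | -90 | 1010 0110 | 0xA6 | 230 | -26 | 1110 0110 | 0xE6 |
| 039 | 039 | 0010 0111 | 0x27 | 103 | 103 | 0110 0111 | 0x67 | 167 | -89 | 1010 0111 | 0xA7 | 231 | -25 | 1110 0111 | 0xE7 |
| 040 | 040 | 0010 1000 | 0x28 | 104 | 104 | 0110 1000 | 0x68 | 168 | -88 | 1010 1000 | 0xA8 | 232 | -24 | 1110 1000 | 0xE8 |
| 041 | 041 | 0010 1001 | 0x29 | 105 | 105 | 0110 1001 | 0x69 | 169 | -87 | 1010 1001 | 0xA9 | 233 | -23 | 1110 1001 | 0xE9 |
| 042 | 042 | 0010 1010 | 0x2A | 106 | 106 | 0110 1010 | 0x6A | 170 | -86 | 1010 1010 | 0xAA | 234 | -22 | 1110 1010 | 0xEA |
| 043 | 043 | 0010 1011 | 0x2B | 107 | 107 | 0110 1011 | 0x6B | 171 | -85 | 1010 1011 | 0xAB | 235 | -21 | 1110 1011 | 0xEB |
| 044 | 044 | 0010 1100 | 0x2C | 108 | 108 | 0110 1100 | 0x6C | 172 | -84 | 1010 1100 | 0xAC | 236 | -20 | 1110 1100 | 0xEC |
| 045 | 045 | 0010 1101 | 0x2D | 109 | 109 | 0110 1101 | 0x6D | 173 | -83 | 1010 1101 | 0xAD | 237 | -19 | 1110 1101 | 0xED |
| 046 | 046 | 0010 1110 | 0x2E | 110 | 110 | 0110 1110 | 0x6E | 174 | -82 | 1010 1110 | 0xAE | 238 | -18 | 1110 1110 | 0xEE |
| 047 | 047 | 0010 1111 | 0x2F | 111 | 111 | 0110 1111 | 0x6F | 175 | -81 | 1010 1111 | 0xAF | 239 | -17 | 1110 1111 | 0xEF |
| 048 | 048 | 0011 0000 | 0x30 | 112 | 112 | 0111 0000 | 0x70 | 176 | -80 | 1011 0000 | 0xB0 | 240 | -16 | 1111 0000 | 0xF0 |
| 049 | 049 | 0011 0001 | 0x31 | 113 | 113 | 0111 0001 | 0x71 | 177 | -79 | 1011 0001 | 0xB1 | 241 | -15 | 1111 0001 | 0xF1 |
| 050 | 050 | 0011 0010 | 0x32 | 114 | 114 | 0111 0010 | 0x72 | 178 | -78 | 1011 0010 | 0xB2 | 242 | -14 | 1111 0010 | 0xF2 |
| 051 | 051 | 0011 0011 | 0x33 | 115 | 115 | 0111 0011 | 0x73 | 179 | -77 | 1011 0011 | 0xB3 | 243 | -13 | 1111 0011 | 0xF3 |
| 052 | 052 | 0011 0100 | 0x34 | 116 | 116 | 0111 0100 | 0x74 | 180 | -76 | 1011 0100 | 0xB4 | 244 | -12 | 1111 0100 | 0xF4 |
| 053 | 053 | 0011 0101 | 0x35 | 117 | 117 | 0111 0101 | 0x75 | 181 | -75 | 1011 0101 | 0xB5 | 245 | -11 | 1111 0101 | 0xF5 |
| 054 | 054 | 0011 0110 | 0x36 | 118 | 118 | 0111 0110 | 0x76 | 182 | -74 | 1011 0110 | 0xB6 | 246 | -10 | 1111 0110 | 0xF6 |
| 055 | 055 | 0011 0111 | 0x37 | 119 | 119 | 0111 0111 | 0x77 | 183 | -73 | 1011 0111 | 0xB7 | 247 | -9 | 1111 0111 | 0xF7 |
| 056 | 056 | 0011 1000 | 0x38 | 120 | 120 | 0111 1000 | 0x78 | 184 | -72 | 1011 1000 | 0xB8 | 248 | -8 | 1111 1000 | 0xF8 |
| 057 | 057 | 0011 1001 | 0x39 | 121 | 121 | 0111 1001 | 0x79 | 185 | -71 | 1011 1001 | 0xB9 | 249 | -7 | 1111 1001 | 0xF9 |
| 058 | 058 | 0011 1010 | 0x3A | 122 | 122 | 0111 1010 | 0x7A | 186 | -70 | 1011 1010 | 0xBA | 250 | -6 | 1111 1010 | 0xFA |
| 059 | 059 | 0011 1011 | 0x3B | 123 | 123 | 0111 1011 | 0x7B | 187 | -69 | 1011 1011 | 0xBB | 251 | -5 | 1111 1011 | 0xFB |
| 060 | 060 | 0011 1100 | 0x3C | 124 | 124 | 0111 1100 | 0x7C | 188 | -68 | 1011 1100 | 0xBC | 252 | -4 | 1111 1100 | 0xFC |
| 061 | 061 | 0011 1101 | 0x3D | 125 | 125 | 0111 1101 | 0x7D | 189 | -67 | 1011 1101 | 0xBD | 253 | -3 | 1111 1101 | 0xFD |
| 062 | 062 | 0011 1110 | 0x3E | 126 | 126 | 0111 1110 | 0x7E | 190 | -66 | 1011 1110 | 0xBE | 254 | -2 | 1111 1110 | 0xFE |
| 063 | 063 | 0011 1111 | 0x3F | 127 | 127 | 0111 1111 | 0x7F | 191 | -65 | 1011 1111 | 0xBF | 255 | -1 | 1111 1111 | 0xFF |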
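Any entry of the table above can be recomputed with a short sketch like the following. The function name `table_row` and the tuple layout are choices made for this example only.

```python
def table_row(value: int) -> tuple[int, int, str, str]:
    """Return the unsigned, signed, binary and hexadecimal view of one Byte."""
    if not 0 <= value <= 255:
        raise ValueError("a Byte holds values from 0 to 255")
    signed = value - 256 if value >= 128 else value   # two's complement reading
    bits = format(value, "08b")
    binary = bits[:4] + " " + bits[4:]                # grouped in four Bits
    hexadecimal = "0x" + format(value, "02X")
    return value, signed, binary, hexadecimal


print(table_row(0xBA))  # (186, -70, '1011 1010', '0xBA')
print(table_row(0x35))  # (53, 53, '0011 0101', '0x35')
```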

ASCII Codepage – from numbers to letters:

The usual basic way to store letters as Bytes is to use translation tables, called Codepages. The best known is ASCII, the American Standard Code for Information Interchange. It has a size of seven Bits, so it contains 128 elements. The table contains uppercase characters from A to Z, lowercase characters from a to z, number characters from 0 to 9, symbols like #, \$ and % and a lot of control characters. When somebody speaks about ASCII today, they usually mean an eight-Bit extension of it or UTF-8. An eight-Bit Codepage such as ISO 8859-1 (Latin-1) contains double the number of elements, and its lower half is the same as ASCII, so it is fully compatible. UTF-8 is compatible in the same way: every Byte whose most significant Bit is zero is the ASCII character with that number, while characters beyond ASCII are stored as sequences of two to four Bytes whose most significant Bit is one. Many languages contain extra letters that are missing in ASCII, and on most computer systems it is impossible, or takes a lot of extra effort, to store seven-Bit numbers efficiently. So ASCII text is stored in eight Bits anyway, with the most significant Bit always zero. Hence there is no advantage in using pure ASCII instead of UTF-8, which is why only few systems are left that use ASCII.
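A short sketch of the translation from letters to numbers, using Python's built-in `ord` and the standard `ascii` codec. The sample text `"Hi!"` is arbitrary.

```python
text = "Hi!"

# ord() returns the table number of a single character
codes = [ord(char) for char in text]
print(codes)  # [72, 105, 33]

# encoding as ASCII yields one Byte per character
data = text.encode("ascii")
print([format(byte, "08b") for byte in data])
# ['01001000', '01101001', '00100001']

# every ASCII Byte has its most significant Bit set to zero
print(all(byte < 128 for byte in data))  # True
```

Trying to encode a character outside the table, for example `"ß".encode("ascii")`, raises a `UnicodeEncodeError`.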

ASCII table:

When you view the ASCII table below, wrapped in hexadecimal style, you will see the different groups of elements as well. It starts with 32 control characters (including line feed), followed by digits, uppercase and lowercase letters, all separated by a few symbol characters. If you prefer decimal numbering, look at the second table or the image including both further below.

A space is not easy to display, since it is a space filled with nothing. It has the number 0x20 (dec. 32).

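|  | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | xA | xB | xC | xD | xE | xF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2x | " " | ! | " | # | \$ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \\ | ] | ^ | _ |
| 6x | \` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |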

Wrapped in decimal style:

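|  | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT |
| 1x | LF | VT | FF | CR | SO | SI | DLE | DC1 | DC2 | DC3 |
| 2x | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS |
| 3x | RS | US | " " | ! | " | # | \$ | % | & | ' |
| 4x | ( | ) | * | + | , | - | . | / | 0 | 1 |
| 5x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; |
| 6x | < | = | > | ? | @ | A | B | C | D | E |
| 7x | F | G | H | I | J | K | L | M | N | O |
| 8x | P | Q | R | S | T | U | V | W | X | Y |
| 9x | Z | [ | \\ | ] | ^ | _ | \` | a | b | c |
| 10x | d | e | f | g | h | i | j | k | l | m |
| 11x | n | o | p | q | r | s | t | u | v | w |
| 12x | x | y | z | { | \| | } | ~ | DEL |  |  |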
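The groups visible in the tables above can also be found programmatically. The following is a rough sketch; the function `ascii_group` and its group names are invented for this example and are not part of any standard.

```python
from collections import Counter


def ascii_group(code: int) -> str:
    """Name the group an ASCII number belongs to."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII only covers 0 to 127")
    if code < 32 or code == 127:
        return "control"      # 0x00 to 0x1F plus DEL
    if code == 32:
        return "space"
    char = chr(code)
    if char.isdigit():
        return "digit"
    if char.isupper():
        return "uppercase"
    if char.islower():
        return "lowercase"
    return "symbol"


print(ascii_group(0x0A))      # control (line feed)
print(ascii_group(ord("7")))  # digit
print(ascii_group(ord("~")))  # symbol

# how many elements each group contains
group_sizes = Counter(ascii_group(code) for code in range(128))
print(group_sizes["uppercase"], group_sizes["lowercase"], group_sizes["digit"])  # 26 26 10
```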

UTF-8 table:

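The table below lists the upper half of the eight-Bit range, 0x80 to 0xFF, as defined by ISO 8859-1 and by the Unicode code points of the same numbers. It is sorted in a similar pattern to ASCII. As mentioned above, values of the form 0b0xxxxxxx, hence 0x00 to 0x7F, are identical to ASCII. Keep in mind that UTF-8 itself stores each character of this upper half as two Bytes; the table shows the character numbers, not the stored Bytes. Unicode characters are usually noted as `U+<hexadecimal number>`. So U+DF is the character `ß`. Leading zeros are often prepended, as in U+00DF, but the number stays the same.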

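|  | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | xA | xB | xC | xD | xE | xF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 8x | PAD | HOP | BPH | NBH | IND | NEL | SSA | ESA | HTS | HTJ | VTS | PLD | PLU | RI | SS2 | SS3 |
| 9x | DCS | PU1 | PU2 | STS | CCH | MW | SPA | EPA | SOS | SGC | SCI | CSI | ST | OSC | PM | APC |
| Ax | NBSP | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | SHY | ® | ¯ |
| Bx | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
| Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| Dx | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
| Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| Fx | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |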

Wrapped in decimal style:

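|  | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 12x |  |  |  |  |  |  |  |  | PAD | HOP |
| 13x | BPH | NBH | IND | NEL | SSA | ESA | HTS | HTJ | VTS | PLD |
| 14x | PLU | RI | SS2 | SS3 | DCS | PU1 | PU2 | STS | CCH | MW |
| 15x | SPA | EPA | SOS | SGC | SCI | CSI | ST | OSC | PM | APC |
| 16x | NBSP | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © |
| 17x | ª | « | ¬ | SHY | ® | ¯ | ° | ± | ² | ³ |
| 18x | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ |
| 19x | ¾ | ¿ | À | Á | Â | Ã | Ä | Å | Æ | Ç |
| 20x | È | É | Ê | Ë | Ì | Í | Î | Ï | Ð | Ñ |
| 21x | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û |
| 22x | Ü | Ý | Þ | ß | à | á | â | ã | ä | å |
| 23x | æ | ç | è | é | ê | ë | ì | í | î | ï |
| 24x | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù |
| 25x | ú | û | ü | ý | þ | ÿ |  |  |  |  |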
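Python's standard codecs make it easy to check how a character from these tables is actually stored. The sketch below compares the `latin-1` codec (ISO 8859-1, one Byte per character) with `utf-8`, which uses two Bytes for every character number from 0x80 to 0xFF. The character `ß` is the example from the text.

```python
char = "ß"

code_point = ord(char)
print(hex(code_point))                     # 0xdf

# ISO 8859-1 stores the number directly in one Byte
print(char.encode("latin-1").hex())        # df

# UTF-8 spreads the same number over two Bytes: 110xxxxx 10xxxxxx
utf8_bytes = char.encode("utf-8")
print(utf8_bytes.hex())                    # c39f
print([format(byte, "08b") for byte in utf8_bytes])
# ['11000011', '10011111']

# in the ASCII range both encodings agree with ASCII
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```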

Stack:

A stack is a memory that usually has just two access methods, one to write (push) and one to read (pop). A pop returns and removes the most recently pushed element. Such memories are called LIFO (last in, first out) or FILO (first in, last out). So it is similar to a stack of heavy cardboard boxes, where you can only add a new box on top of the uppermost one or take the uppermost one away.
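A minimal sketch of such a memory, built on a Python list. The class name `Stack` and its methods are illustrative; in everyday Python code a plain list with `append` and `pop` does the same job.

```python
class Stack:
    """A LIFO memory with only two access methods."""

    def __init__(self) -> None:
        self._items: list[int] = []

    def push(self, value: int) -> None:
        """Put a new value on top of the stack."""
        self._items.append(value)

    def pop(self) -> int:
        """Remove and return the most recently pushed value."""
        if not self._items:
            raise IndexError("pop from an empty stack")
        return self._items.pop()


boxes = Stack()
boxes.push(1)
boxes.push(2)
boxes.push(3)
first_out = boxes.pop()
second_out = boxes.pop()
print(first_out, second_out)  # 3 2
```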

Usages of stacks:

The advantage of a stack is that it can store a lot of data while managing it purely by the order of accesses. It is possible to push ten values and get them back later, as long as each part of the program pops exactly as many values as it pushed. A frequent use is to store a call hierarchy. When a method is called, the memory for its local variables, called the stack frame, is pushed; after leaving the method, the last stack frame is thrown away. Nearly all computer processors have instructions that push and pop data at the address held in a register called the stack pointer, incrementing or decrementing it along the way, but some machines have a stack as their only storage. On such a machine, a typical instruction pops its arguments from the stack and pushes the result back; some just push new values onto the stack. Two famous examples of such stack machines are the Java Virtual Machine inside the Java Runtime Environment and Bitcoin Script.
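To make the idea of a stack machine concrete, here is a toy interpreter. It is only a sketch: the instruction names `add`, `sub` and `mul` are invented for this example and do not correspond to real Java bytecode or Bitcoin Script opcodes.

```python
def run(program: list) -> list[int]:
    """Execute a toy stack program and return the final stack."""
    stack: list[int] = []
    for instruction in program:
        if isinstance(instruction, int):
            # a plain number just pushes a new value
            stack.append(instruction)
        elif instruction in ("add", "sub", "mul"):
            # an operation pops its two arguments and pushes the result
            right = stack.pop()
            left = stack.pop()
            if instruction == "add":
                stack.append(left + right)
            elif instruction == "sub":
                stack.append(left - right)
            else:
                stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
    return stack


# (2 + 3) * 4
print(run([2, 3, "add", 4, "mul"]))  # [20]
# 10 - 7
print(run([10, 7, "sub"]))           # [3]
```

Because the arguments are taken from the stack in reverse order of pushing, `right` is popped before `left`; this matters for `sub`.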
