The ASCII text encoding is a character set. It contains 128 characters, numbered 0 to 127.
34 characters are non-printing control characters. 94 characters are printable (they have individual visible glyphs).
Notes:
- Many of the control characters are no longer used.
- The horizontal tab (HT), the line feed (LF), and the space (SPA) characters are whitespace characters. They are non-printing characters, but they affect the visible layout of the printable text around them.
- The line feed character is often known as the "newline" character. The vertical tab has fallen out of use, so the horizontal tab character is often known simply as the "tab" character.
- The other whitespace characters are vertical tab (VT), carriage return (CR), and form feed (FF). They are rarely used on their own. The main exception is CR: Windows uses the pair CR+LF as its line termination sequence.
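As a quick check of the notes above, the sketch below uses Python's standard `string.whitespace` constant. It contains exactly these six characters. The abbreviation dictionary is my own helper for printing. Python's `str.isspace()` uses a broader Unicode definition that also includes codes 28-31 (FS, GS, RS, US), so the constant is the closer match to the list above.

```python
import string
from typing import List

# Abbreviations (as used in this document) for the six whitespace characters.
ABBREVIATIONS = {9: "HT", 10: "LF", 11: "VT", 12: "FF", 13: "CR", 32: "SPA"}

def whitespace_codes() -> List[int]:
    """Return the decimal codes of the characters in string.whitespace, sorted."""
    return sorted(ord(c) for c in string.whitespace)

for code in whitespace_codes():
    print(code, ABBREVIATIONS[code])
```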
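The first 33 characters (characters 0-32) and the last character (character 127) are the non-printing control characters. Characters 33-126 are the printable ASCII characters. Note that most references classify only codes 0-31 and 127 as control characters (33 in total; these are the characters Unicode assigns to its Cc category). They treat the space (32) as a printable character, which gives 95 printable characters. This document groups the space with the control characters because it has no visible glyph.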
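The counts in the opening lines can be checked with a short Python sketch. `is_control` is a hypothetical helper that encodes this document's convention (0-32 and 127). For comparison, Python's built-in `str.isprintable()` treats the space as printable.

```python
def is_control(code: int) -> bool:
    """Control character under this document's convention: codes 0-32 and 127."""
    return code <= 32 or code == 127

codes = range(128)
control = [c for c in codes if is_control(c)]
printable = [c for c in codes if not is_control(c)]
print(len(control), len(printable))  # 34 94

# Python counts the space (32) as printable, so it finds one more printable character.
python_printable = [c for c in codes if chr(c).isprintable()]
print(len(python_printable))  # 95
```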
Table 1: ASCII text encoding
Note: The binary and hex values are 8-bit.
Decimal | Character | Binary | Hex |
0 | NUL | 0000 0000 | 00 |
1 | SOH | 0000 0001 | 01 |
2 | STX | 0000 0010 | 02 |
3 | ETX | 0000 0011 | 03 |
4 | EOT | 0000 0100 | 04 |
5 | ENQ | 0000 0101 | 05 |
6 | ACK | 0000 0110 | 06 |
7 | BEL | 0000 0111 | 07 |
8 | BS | 0000 1000 | 08 |
9 | HT | 0000 1001 | 09 |
10 | LF | 0000 1010 | 0a |
11 | VT | 0000 1011 | 0b |
12 | FF | 0000 1100 | 0c |
13 | CR | 0000 1101 | 0d |
14 | SO | 0000 1110 | 0e |
15 | SI | 0000 1111 | 0f |
16 | DLE | 0001 0000 | 10 |
17 | DC1 | 0001 0001 | 11 |
18 | DC2 | 0001 0010 | 12 |
19 | DC3 | 0001 0011 | 13 |
20 | DC4 | 0001 0100 | 14 |
21 | NAK | 0001 0101 | 15 |
22 | SYN | 0001 0110 | 16 |
23 | ETB | 0001 0111 | 17 |
24 | CAN | 0001 1000 | 18 |
25 | EM | 0001 1001 | 19 |
26 | SUB | 0001 1010 | 1a |
27 | ESC | 0001 1011 | 1b |
28 | FS | 0001 1100 | 1c |
29 | GS | 0001 1101 | 1d |
30 | RS | 0001 1110 | 1e |
31 | US | 0001 1111 | 1f |
32 | SPA | 0010 0000 | 20 |
33 | ! | 0010 0001 | 21 |
34 | " | 0010 0010 | 22 |
35 | # | 0010 0011 | 23 |
36 | $ | 0010 0100 | 24 |
37 | % | 0010 0101 | 25 |
38 | & | 0010 0110 | 26 |
39 | ' | 0010 0111 | 27 |
40 | ( | 0010 1000 | 28 |
41 | ) | 0010 1001 | 29 |
42 | * | 0010 1010 | 2a |
43 | + | 0010 1011 | 2b |
44 | , | 0010 1100 | 2c |
45 | - | 0010 1101 | 2d |
46 | . | 0010 1110 | 2e |
47 | / | 0010 1111 | 2f |
48 | 0 | 0011 0000 | 30 |
49 | 1 | 0011 0001 | 31 |
50 | 2 | 0011 0010 | 32 |
51 | 3 | 0011 0011 | 33 |
52 | 4 | 0011 0100 | 34 |
53 | 5 | 0011 0101 | 35 |
54 | 6 | 0011 0110 | 36 |
55 | 7 | 0011 0111 | 37 |
56 | 8 | 0011 1000 | 38 |
57 | 9 | 0011 1001 | 39 |
58 | : | 0011 1010 | 3a |
59 | ; | 0011 1011 | 3b |
60 | < | 0011 1100 | 3c |
61 | = | 0011 1101 | 3d |
62 | > | 0011 1110 | 3e |
63 | ? | 0011 1111 | 3f |
64 | @ | 0100 0000 | 40 |
65 | A | 0100 0001 | 41 |
66 | B | 0100 0010 | 42 |
67 | C | 0100 0011 | 43 |
68 | D | 0100 0100 | 44 |
69 | E | 0100 0101 | 45 |
70 | F | 0100 0110 | 46 |
71 | G | 0100 0111 | 47 |
72 | H | 0100 1000 | 48 |
73 | I | 0100 1001 | 49 |
74 | J | 0100 1010 | 4a |
75 | K | 0100 1011 | 4b |
76 | L | 0100 1100 | 4c |
77 | M | 0100 1101 | 4d |
78 | N | 0100 1110 | 4e |
79 | O | 0100 1111 | 4f |
80 | P | 0101 0000 | 50 |
81 | Q | 0101 0001 | 51 |
82 | R | 0101 0010 | 52 |
83 | S | 0101 0011 | 53 |
84 | T | 0101 0100 | 54 |
85 | U | 0101 0101 | 55 |
86 | V | 0101 0110 | 56 |
87 | W | 0101 0111 | 57 |
88 | X | 0101 1000 | 58 |
89 | Y | 0101 1001 | 59 |
90 | Z | 0101 1010 | 5a |
91 | [ | 0101 1011 | 5b |
92 | \ | 0101 1100 | 5c |
93 | ] | 0101 1101 | 5d |
94 | ^ | 0101 1110 | 5e |
95 | _ | 0101 1111 | 5f |
96 | ` | 0110 0000 | 60 |
97 | a | 0110 0001 | 61 |
98 | b | 0110 0010 | 62 |
99 | c | 0110 0011 | 63 |
100 | d | 0110 0100 | 64 |
101 | e | 0110 0101 | 65 |
102 | f | 0110 0110 | 66 |
103 | g | 0110 0111 | 67 |
104 | h | 0110 1000 | 68 |
105 | i | 0110 1001 | 69 |
106 | j | 0110 1010 | 6a |
107 | k | 0110 1011 | 6b |
108 | l | 0110 1100 | 6c |
109 | m | 0110 1101 | 6d |
110 | n | 0110 1110 | 6e |
111 | o | 0110 1111 | 6f |
112 | p | 0111 0000 | 70 |
113 | q | 0111 0001 | 71 |
114 | r | 0111 0010 | 72 |
115 | s | 0111 0011 | 73 |
116 | t | 0111 0100 | 74 |
117 | u | 0111 0101 | 75 |
118 | v | 0111 0110 | 76 |
119 | w | 0111 0111 | 77 |
120 | x | 0111 1000 | 78 |
121 | y | 0111 1001 | 79 |
122 | z | 0111 1010 | 7a |
123 | { | 0111 1011 | 7b |
124 | | | 0111 1100 | 7c |
125 | } | 0111 1101 | 7d |
126 | ~ | 0111 1110 | 7e |
127 | DEL | 0111 1111 | 7f |
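The Binary and Hex columns of Table 1 follow mechanically from the decimal code. The Python sketch below regenerates a row in the same format. `table_row` and `CONTROL_NAMES` are hypothetical names for illustration. The binary value is padded to 8 bits, so the leading bit is always 0 for ASCII.

```python
# Abbreviations for codes 0-32, as used in Table 1 (code 127 is DEL).
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
    "SPA",
]

def table_row(code: int) -> str:
    """Format one row of Table 1: decimal | character | binary | hex |"""
    if not 0 <= code <= 127:
        raise ValueError("ASCII defines only codes 0-127")
    if code < len(CONTROL_NAMES):
        char = CONTROL_NAMES[code]
    elif code == 127:
        char = "DEL"
    else:
        char = chr(code)
    bits = f"{code:08b}"               # 8 binary digits, including the leading 0 bit
    binary = f"{bits[:4]} {bits[4:]}"  # split into two 4-bit nibbles
    return f"{code} | {char} | {binary} | {code:02x} |"

print(table_row(65))   # 65 | A | 0100 0001 | 41 |
print(table_row(127))  # 127 | DEL | 0111 1111 | 7f |
```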
Table 2: Names of the non-printing control characters in ASCII
Decimal | Character | Name |
0 | NUL | Null character |
1 | SOH | Start of Heading |
2 | STX | Start of Text |
3 | ETX | End of Text |
4 | EOT | End of Transmission |
5 | ENQ | Enquiry |
6 | ACK | Acknowledgement |
7 | BEL | Audible Bell or Alarm |
8 | BS | Backspace |
9 | HT | Horizontal Tab |
10 | LF | Line Feed |
11 | VT | Vertical Tab |
12 | FF | Form Feed |
13 | CR | Carriage Return |
14 | SO | Shift Out |
15 | SI | Shift In |
16 | DLE | Data Link Escape |
17 | DC1 | Device Control 1: Resume Transmission. Also known as XON. |
18 | DC2 | Device Control 2 |
19 | DC3 | Device Control 3: Suspend Transmission. Also known as XOFF. |
20 | DC4 | Device Control 4 |
21 | NAK | Negative Acknowledgement |
22 | SYN | Synchronous Idle |
23 | ETB | End of Transmission Block |
24 | CAN | Cancel |
25 | EM | End of Medium |
26 | SUB | Substitute |
27 | ESC | Escape |
28 | FS | File Separator |
29 | GS | Group Separator |
30 | RS | Record Separator |
31 | US | Unit Separator |
32 | SPA | Space |
127 | DEL | Delete |
Table 3: Names of the printable characters in ASCII
Decimal | Character | Name |
33 | ! | Exclamation Mark |
34 | " | Quotation Mark |
35 | # | Number Sign |
36 | $ | Dollar Sign |
37 | % | Percent Sign |
38 | & | Ampersand |
39 | ' | Apostrophe |
40 | ( | Left Parenthesis |
41 | ) | Right Parenthesis |
42 | * | Asterisk |
43 | + | Plus Sign |
44 | , | Comma |
45 | - | Hyphen-Minus (also known as Hyphen or Minus) |
46 | . | Full Stop (also known as Period or Dot) |
47 | / | Solidus (also known as Slash) |
48 | 0 | Digit Zero |
49 | 1 | Digit One |
50 | 2 | Digit Two |
51 | 3 | Digit Three |
52 | 4 | Digit Four |
53 | 5 | Digit Five |
54 | 6 | Digit Six |
55 | 7 | Digit Seven |
56 | 8 | Digit Eight |
57 | 9 | Digit Nine |
58 | : | Colon |
59 | ; | Semicolon |
60 | < | Less-Than Sign |
61 | = | Equals Sign |
62 | > | Greater-Than Sign |
63 | ? | Question Mark |
64 | @ | Commercial At (also known as At Sign) |
65 | A | Latin Capital Letter A |
66 | B | Latin Capital Letter B |
67 | C | Latin Capital Letter C |
68 | D | Latin Capital Letter D |
69 | E | Latin Capital Letter E |
70 | F | Latin Capital Letter F |
71 | G | Latin Capital Letter G |
72 | H | Latin Capital Letter H |
73 | I | Latin Capital Letter I |
74 | J | Latin Capital Letter J |
75 | K | Latin Capital Letter K |
76 | L | Latin Capital Letter L |
77 | M | Latin Capital Letter M |
78 | N | Latin Capital Letter N |
79 | O | Latin Capital Letter O |
80 | P | Latin Capital Letter P |
81 | Q | Latin Capital Letter Q |
82 | R | Latin Capital Letter R |
83 | S | Latin Capital Letter S |
84 | T | Latin Capital Letter T |
85 | U | Latin Capital Letter U |
86 | V | Latin Capital Letter V |
87 | W | Latin Capital Letter W |
88 | X | Latin Capital Letter X |
89 | Y | Latin Capital Letter Y |
90 | Z | Latin Capital Letter Z |
91 | [ | Left Square Bracket |
92 | \ | Reverse Solidus (also known as Backslash) |
93 | ] | Right Square Bracket |
94 | ^ | Circumflex Accent |
95 | _ | Low Line (also known as Underscore) |
96 | ` | Grave Accent (also known as Backtick) |
97 | a | Latin Small Letter A |
98 | b | Latin Small Letter B |
99 | c | Latin Small Letter C |
100 | d | Latin Small Letter D |
101 | e | Latin Small Letter E |
102 | f | Latin Small Letter F |
103 | g | Latin Small Letter G |
104 | h | Latin Small Letter H |
105 | i | Latin Small Letter I |
106 | j | Latin Small Letter J |
107 | k | Latin Small Letter K |
108 | l | Latin Small Letter L |
109 | m | Latin Small Letter M |
110 | n | Latin Small Letter N |
111 | o | Latin Small Letter O |
112 | p | Latin Small Letter P |
113 | q | Latin Small Letter Q |
114 | r | Latin Small Letter R |
115 | s | Latin Small Letter S |
116 | t | Latin Small Letter T |
117 | u | Latin Small Letter U |
118 | v | Latin Small Letter V |
119 | w | Latin Small Letter W |
120 | x | Latin Small Letter X |
121 | y | Latin Small Letter Y |
122 | z | Latin Small Letter Z |
123 | { | Left Curly Bracket (also known as Left Brace) |
124 | | | Vertical Line (also known as Vertical Bar) |
125 | } | Right Curly Bracket (also known as Right Brace) |
126 | ~ | Tilde |
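The names in Table 3 are the Unicode character names (from source 6), written in title case and without the "also known as" alternatives. Python's standard `unicodedata` module exposes these names. The helper `unicode_name` is hypothetical. Control characters have no Unicode name, so the helper returns `None` for them.

```python
import unicodedata
from typing import Optional

def unicode_name(char: str) -> Optional[str]:
    """Return the Unicode name of a character in title case, or None if it has none."""
    name = unicodedata.name(char, None)  # control characters have no Name property
    return name.title() if name is not None else None

print(unicode_name("!"))     # Exclamation Mark
print(unicode_name("\\"))    # Reverse Solidus
print(unicode_name("\x00"))  # None
```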
[start of notes]
ASCII stands for "American Standard Code for Information Interchange".
Here are my notes on ASCII, taken from various sources.
Source 1:
rabbit.eng.miami.edu/info/ascii.html
Author: First name appears to be Stephen. Did not find a surname. Seems to be a professor (?) of computer science (?) at the University of Miami.
Excerpts:
ASCII is a seven bit code, it only defines codes from 0 to 127. Codes outside this range are not part of ASCII, and vary in meaning considerably. The version in use today is more completely called ASCII-1967 (it was adopted in 1967), and there are two slightly different earlier versions documented below.
ASCII uses only seven bits. Although it was communicated in eight-bit bytes, normal communication channels were unreliable. The 8th bit was used for error checking (parity). Typically the 8th bit was set to ensure that there was always an odd number of 1's in each byte transmitted (e.g. '$' is binary 0100100 which has an even number of 1's so is transmitted as 10100100; 'F' is binary 1000110 with an odd number of 1's so is transmitted as 01000110), but even parity systems were also used. The receiving equipment would simply check the parity of each byte; any single-bit inversion would be detected, and large errors were very likely to be noticed.
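The odd-parity scheme in the excerpt can be sketched in Python as follows. This is my own illustration, not part of the excerpt, and the function names are hypothetical. The sender sets bit 8 only when the 7-bit code has an even number of 1 bits. The receiver checks that every byte has an odd count, and clearing bit 8 recovers the ASCII code.

```python
def add_odd_parity(code: int) -> int:
    """Set bit 8 so that the transmitted byte has an odd number of 1 bits."""
    if not 0 <= code <= 127:
        raise ValueError("expected a 7-bit ASCII code")
    ones = bin(code).count("1")
    parity_bit = 0 if ones % 2 == 1 else 1  # count already odd: leave bit 8 clear
    return (parity_bit << 7) | code

def check_odd_parity(byte: int) -> bool:
    """Receiver side: a single flipped bit makes the count of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2 == 1

def strip_parity(byte: int) -> int:
    """Recover the 7-bit ASCII code by clearing bit 8."""
    return byte & 0x7F

print(f"{add_odd_parity(ord('$')):08b}")  # 10100100
print(f"{add_odd_parity(ord('F')):08b}")  # 01000110
```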
Seven bit code was not considered strange; it is only comparatively recently that computers with eight-bit byte based memory became an accidental standard. The Dec-system-10 had 36-bit memory, the ICL-1900 had 24-bit memory, and the CDC-6600 had 60-bit memory, to name but three.
[...]
Older ASCII versions
ASCII-1963 was the same as the current (1967) version except:
- What is now usually rendered as a hat ^ was rendered as an arrow pointing up,
- What is now rendered as an underline was rendered as an arrow pointing left,
- The last 32 codes were not assigned: lower-case letters did not exist,
- The invisible control codes (1 to 31) had different official abbreviations.
ASCII-1965 was the same as the current (1967) version except:
- What is now the backwards-divide sign \ was the wiggle sign ~,
- What is now the vertical line | was the logical-not sign ¬,
- What is now the wiggle sign ~ was the vertical line |,
- The at-sign @ and the backwards-single-quote ` swapped places.
The symbols (e.g. NUL, SOH) and names for the non-printing control characters come from source 1.
From source 1, I also learned that in C / Unix:
- The NUL character is used to signify "end of string".
- The EOT character (Control-D), when typed at a terminal, is used to signify "end of file". It is not stored in files.
- The LF character is used to signify "end of line". (I already knew this, but included it here for the completeness of the pattern).
Additionally:
- The SUB character (Control-Z), when typed at a terminal, suspends the current foreground process in Unix shells with job control, such as the C shell.
Source 2:
www.computinghistory.org.uk/det/5942/First-edition-of-the-ASCII-standard-was-published
Author: None listed.
Excerpt:
The American Standard Code for Information Interchange (acronym: ASCII) is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text. Most modern character-encoding schemes, which support many more characters than did the original, are based on ASCII.
US-ASCII is the IANA preferred charset name for ASCII.
Historically, ASCII developed from telegraphic codes. Its first commercial use was as a seven-bit teleprinter code promoted by Bell data services. Work on ASCII formally began October 6, 1960, with the first meeting of the American Standards Association's (ASA) X3.2 subcommittee. The first edition of the standard was published during 1963, a major revision during 1967, and the most recent update during 1986. Compared to earlier telegraph codes, the proposed Bell code and ASCII were both ordered for more convenient sorting (i.e., alphabetization) of lists, and added features for devices other than teleprinters.
Source 3:
nemesis.lonestar.org/reference/telecom/codes/ascii.html
Author: Frank Durda IV
Excerpts:
The United States of America Standard Code for Information Interchange (USACII, later renamed American Standard Code for Information Interchange, or simply "ASCII") describes a communications system where 7-bit words represent printable symbols and control codes. The 1963 USACII standard went through numerous revisions between 1963 and 1968, when it was formally adopted in 1968 by the American National Standards Institute (ANSI).
[...]
ASCII Code Divisions and Categories
The ASCII code is divided into three main divisions and five categories as shown in this table:
[table altered to be the following list]
Division: Control
- ASCII Range (Decimal): 0 to 31, 127
Division: Basic Printable
- ASCII Range (Decimal): 32 to 95
-- Subcategory: Symbols and Punctuation (32-47, 58-64, 91-95)
-- Subcategory: Numbers (48-57)
-- Subcategory: Uppercase Letters (65-90)
Division: Extended Printable
- ASCII Range (Decimal): 96 to 126
-- Subcategory: Lowercase Letters (97-122)
-- Subcategory: Extended Symbols and Punctuation (96, 123-126)
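The divisions and subcategories listed above translate directly into range tests. The Python sketch below is my own illustration and uses a hypothetical `division` function. The control division has no subcategory in the source's list, so the function returns `None` for it.

```python
from typing import Optional, Tuple

def division(code: int) -> Tuple[str, Optional[str]]:
    """Classify an ASCII code by the divisions and subcategories listed above."""
    if not 0 <= code <= 127:
        raise ValueError("not an ASCII code")
    if code <= 31 or code == 127:
        return ("Control", None)
    if 48 <= code <= 57:
        return ("Basic Printable", "Numbers")
    if 65 <= code <= 90:
        return ("Basic Printable", "Uppercase Letters")
    if code <= 95:  # remaining basic codes: 32-47, 58-64, 91-95
        return ("Basic Printable", "Symbols and Punctuation")
    if 97 <= code <= 122:
        return ("Extended Printable", "Lowercase Letters")
    return ("Extended Printable", "Extended Symbols and Punctuation")  # 96, 123-126
```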
The extended printable character set was deliberately arranged so that if a symbol was received in this range and could not be displayed due to limitations of the printing or display device, the symbol in the basic printable range exactly 32 (0x20) positions earlier could be substituted and would provide reasonable results. In such situations, "{" and "}" would be displayed or printed as "[" and "]", while lowercase letters would be displayed or printed in uppercase.
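The 32-position substitution described in the excerpt can be sketched in Python as follows. `fold_to_basic` is a hypothetical name, and this is a model of what such a device would print, not code from any real terminal. Every code from 96 to 126 maps to the code 0x20 lower, and everything else passes through unchanged.

```python
def fold_to_basic(text: str) -> str:
    """Replace each extended printable character (96-126) with the character
    32 positions earlier, as a basic-printable-only device might."""
    out = []
    for ch in text:
        code = ord(ch)
        if 96 <= code <= 126:
            code -= 0x20  # e.g. 'a' (97) -> 'A' (65), '{' (123) -> '[' (91)
        out.append(chr(code))
    return "".join(out)

print(fold_to_basic("if (x) { y = 1; }"))  # IF (X) [ Y = 1; ]
```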
[...]
This design of ASCII was intentionally organized to allow simpler display devices to be produced that only had to print 62 of the 94 ASCII printable codes and could substitute something "close" when asked to display an ASCII character that the device was incapable of producing, such as using the uppercase letter when the lowercase letter could not be printed.
[...]
Early Uses of ASCII and alternate coding systems
One of the earliest 7-bit ASCII devices was an improved line of electro-mechanical printers made by the Teletype corporation. With an operational speed of up to 10 characters per second, these devices were used worldwide for message transmission by Western Union, various news wire services and the military. Later, these devices found new uses as input/output devices connected to computer systems that also communicated using the ASCII character set.
The most widely-manufactured Teletype model was number 33, which was sold under a variety of model names such as the KSR-33 and ASR-33. These devices could only print the basic printable character portion of the ASCII character set (64 characters). This limited these devices to uppercase letters, numbers and most punctuation characters as shown in the table above. Some early video terminals and computers (such as the Digital Equipment Corporation VT50 and the Radio Shack TRS-80 Model I) supported only the basic printable set of characters, despite being designed and manufactured years after the ASCII extended character set was adopted. Some manufacturers did offer upgrades that allowed for the display of all ASCII printable characters.
Prior to the introduction of the ASCII-based teletype printers, the Teletype corporation produced teleprinters that used Baudot or "5-Level" character codes, operating at speeds between 40 and 75 baud. These were widely used for over thirty years, but were largely removed from service by the mid 1960s.
IBM's earlier mainframe computers (notably the IBM 360 and 370 families) did not use ASCII. Instead, they used an alternate character coding system called EBCDIC which was devised by IBM as a way to ensure that any peripherals to be connected to IBM computers were also made by IBM. IBM eventually lost this battle and by the late 1970s, it was common to see IBM systems that used EBCDIC internally, but had external communication processors that translated transmissions between IBM's EBCDIC and what other equipment makers were using, which was ASCII.
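The EBCDIC-to-ASCII translation described above can be imitated with Python's built-in codecs. This is a minimal sketch. The choice of IBM code page 037 (codec name `cp037`) is an assumption: EBCDIC existed in several variants, and this sketch does not claim to reproduce the tables used by any particular IBM system. With the default strict error handling, a character that has no ASCII equivalent raises an error instead of being silently replaced.

```python
def ebcdic_to_ascii(data: bytes) -> bytes:
    """Translate EBCDIC bytes (assumed to be code page 037) to ASCII."""
    return data.decode("cp037").encode("ascii")

def ascii_to_ebcdic(data: bytes) -> bytes:
    """Translate ASCII bytes to EBCDIC (code page 037)."""
    return data.decode("ascii").encode("cp037")

print(ascii_to_ebcdic(b"HELLO").hex())  # c8c5d3d3d6
```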
From source 3, I also learned that:
- "Paper Advance" is another term for "Line Feed".
- LF = Paper Advance one line or move cursor down one line. If VDT is at the bottom of screen already, scroll screen one line or wrap to top, depending on settings.
- UNIX system display routines treat LF as though it received CR and LF in most situations. However, TCP communication software on UNIX systems running in the default "cooked" mode must use the proper CR/LF sequence to end a given line of ASCII text that is transmitted or received.
- Vertical Tabulation = Paper Advance by number of lines dictated by the control tape or similar mechanism.
- Form Feed = Paper Advance to next page, screen clear and/or position to top or bottom line on some VDTs.
- Carriage Return = Move print head or cursor to column 1.
Source 4:
www.cs.mcgill.ca/~rwest/wikispeedia/wpcd/wp/a/ASCII.htm
Author: The content found at the hyperlink www.cs.mcgill.ca/~rwest indicates that the site author is Robert West. However, this content originally comes from Wikipedia, and has been curated / hand-selected (by whom?).
Excerpts:
Like other character representation computer codes, ASCII specifies a correspondence between digital bit patterns and the symbols / glyphs of a written language, thus allowing digital devices to communicate with each other and to process, store, and communicate character-oriented information.
ASCII is, strictly, a seven-bit code, meaning that it uses the bit patterns representable with seven binary digits (a range of 0 to 127 decimal) to represent character information. At the time ASCII was introduced, many computers dealt with eight-bit groups (bytes or, more specifically, octets) as the smallest unit of information; the eighth bit was commonly used as a parity bit for error checking on communication lines or other device-specific functions. Machines which did not use parity typically set the eighth bit to zero, though some systems such as Prime machines running PRIMOS set the eighth bit of ASCII characters to one.
ASCII only defines a relationship between specific characters and bit sequences; aside from reserving a few control codes for line-oriented formatting, it does not define any mechanism for describing the structure or appearance of text within a document. Such concepts are within the realm of other systems such as the markup languages.
[...]
ASCII developed from telegraphic codes and first entered commercial use as a seven-bit teleprinter code promoted by Bell data services in 1963. The Bell System had previously planned to use a six-bit code, derived from Fieldata, that added punctuation and lower-case letters to the earlier five-bit Baudot teleprinter code, but was persuaded instead to join the ASA subcommittee that had started to develop ASCII. Baudot helped in the automation of sending and receiving telegraphic messages, and took many features from Morse code; however, unlike Morse code, Baudot used constant-length codes. Compared to earlier telegraph codes, the proposed Bell code and ASCII both underwent re-ordering for more convenient sorting (especially alphabetization) of lists, and added features for devices other than teleprinters. Bob Bemer introduced features such as the 'escape sequence'. His British colleague Hugh McGregor Ross helped to popularize this work, as Bemer said, "so much so that the code that was to become ASCII was first called the Bemer-Ross Code in Europe".
[...]
Many more of the control codes have taken on meanings quite different from their original ones. The "escape" character (code 27), for example, was originally intended to allow sending other control characters as literals instead of invoking their meaning. This is the same meaning of "escape" encountered in URL encodings, C language strings, and other systems where certain characters have a reserved meaning. Over time this meaning has been coopted and has eventually drifted. In modern use, an ESC sent to the terminal usually indicates the start of a command sequence, usually in the form of an ANSI escape code. An ESC sent from the terminal is most often used as an "out of band" character used to terminate an operation, as in the TECO and vi text editors.
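The modern use of ESC described in the excerpt is illustrated by the sketch below (my own illustration). It builds an ANSI "Select Graphic Rendition" sequence: ESC, then `[`, then semicolon-separated parameters, then `m`. Parameter 1 is bold, 31 is red foreground, and 0 resets all attributes. Whether the text actually appears bold or red depends on the terminal. The helper name `sgr` is hypothetical.

```python
ESC = "\x1b"  # code 27

def sgr(text: str, *params: int) -> str:
    """Wrap text in an ANSI Select Graphic Rendition sequence: ESC [ params m."""
    start = f"{ESC}[{';'.join(str(p) for p in params)}m"
    reset = f"{ESC}[0m"  # parameter 0 resets all attributes
    return start + text + reset

print(sgr("warning", 1, 31))  # bold red "warning" on terminals that support it
```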
The inherent ambiguity of many control characters, combined with their historical usage, has also created problems when transferring "plain text" files between systems. The clearest example of this is the newline problem on various operating systems. On printing terminals there is no question that you terminate a line of text with both "Carriage Return" and "Linefeed". The first returns the printing carriage to the beginning of the line and the second advances to the next line without moving the carriage. However, requiring two characters to mark the end of a line introduced unnecessary complexity and questions as to how to interpret each character when encountered alone. To simplify matters, plain text files on Unix systems use line feeds alone to separate lines. Similarly, older Macintosh systems, among others, use only carriage returns in plain text files. Various DEC operating systems used both characters to mark the end of a line, perhaps for compatibility with teletypes, and this de facto standard was copied in the CP/M operating system and then in MS-DOS and eventually Microsoft Windows. The DEC operating systems, along with CP/M, tracked file length only in units of disk blocks and used Control-Z (SUB) to mark the end of the actual text in the file (also done for CP/M compatibility in some cases in MS-DOS, though MS-DOS has always recorded exact file-lengths). Control-C (ETX, End of TeXt) might have made more sense, but was already in wide use as a program abort signal. UNIX's use of Control-D (EOT, End of Transmission) appears on its face similar, but is used only from the terminal and never stored in a file.
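The newline problem and the CP/M end-of-text marker can be handled as in the following Python sketch. The function names are hypothetical. CR LF must be replaced before lone CR; otherwise, each CR LF pair would turn into two line feeds. Python's own `open()` in text mode performs a similar newline translation by default ("universal newlines").

```python
def normalize_newlines(text: str) -> str:
    """Convert CR LF (DOS/Windows) and lone CR (older Macintosh) line endings to LF (Unix)."""
    # Replace CR LF first so that its CR is not turned into a second LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")

def strip_cpm_eof(text: str) -> str:
    """Drop everything from the first Control-Z (SUB, code 26), which CP/M-style
    files used to mark the end of the actual text."""
    end = text.find("\x1a")
    return text if end == -1 else text[:end]

print(repr(normalize_newlines(strip_cpm_eof("one\r\ntwo\r\n\x1a\x1a"))))  # 'one\ntwo\n'
```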
While the codes mentioned above have retained some semblance of their original meanings, many of the codes originally intended for stream delimiters or for link control on a terminal have lost all meaning except their relation to a letter. Control-A is almost never used to mean "start of header" except on an ANSI magnetic tape. When connecting a terminal to a system, or asking the system to recognize that a logged-out terminal wants to log in, modern systems are much more likely to want a carriage return or an ESCape than Control-E (ENQuire, meaning "is there anybody out there?").
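The "relation to a letter" mentioned above is the usual caret-notation convention. Control plus a character from `@` to `_` gives that character's code with the 0x40 bit cleared, so Control-A is 1 (SOH), Control-C is 3 (ETX), Control-D is 4 (EOT), Control-E is 5 (ENQ), and Control-Z is 26 (SUB). By the same flip, `^?` denotes 127 (DEL). The Python sketch below uses a hypothetical `ctrl` helper; how a given keyboard or terminal actually generates these codes may vary.

```python
def ctrl(key: str) -> int:
    """Code for caret notation ^key: the key's code with the 0x40 bit flipped.
    Covers ^@ through ^_ (codes 0-31) and ^? (code 127, DEL)."""
    code = ord(key.upper())
    if not (64 <= code <= 95 or code == 63):
        raise ValueError("caret notation covers @, A-Z, [, \\, ], ^, _ and ?")
    return code ^ 0x40

print(ctrl("A"), ctrl("C"), ctrl("D"), ctrl("E"), ctrl("Z"))  # 1 3 4 5 26
```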
[...]
Structural features:
- The digits 0-9 are represented with their values in binary prefixed with 0011 (this means that converting BCD to ASCII is simply a matter of taking each BCD nibble separately and prefixing 0011 to it).
- Lowercase and uppercase letters only differ in bit pattern by a single bit simplifying case conversion to a range test (to avoid converting characters that are not letters) and a single bitwise operation. Fast case conversion is important because it is often used in case-ignoring search algorithms.
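Both structural features above can be shown with bit operations in a short Python sketch. The function names are hypothetical. A BCD digit becomes an ASCII digit when OR-ed with 0x30 (the 0011 prefix). Letters change case through the 0x20 bit, but only after a range test. Without the range test, `@` (64) would wrongly become `` ` `` (96).

```python
def bcd_digit_to_ascii(nibble: int) -> str:
    """Prefix a BCD digit (0-9) with the bits 0011, i.e. OR it with 0x30."""
    if not 0 <= nibble <= 9:
        raise ValueError("not a BCD digit")
    return chr(0x30 | nibble)

def packed_bcd_to_ascii(data: bytes) -> str:
    """Convert packed BCD (two digits per byte) to ASCII, one nibble at a time."""
    return "".join(bcd_digit_to_ascii(b >> 4) + bcd_digit_to_ascii(b & 0x0F) for b in data)

def to_upper(ch: str) -> str:
    """ASCII-only upper-casing: range test, then clear the 0x20 bit."""
    return chr(ord(ch) & ~0x20) if "a" <= ch <= "z" else ch

def to_lower(ch: str) -> str:
    """ASCII-only lower-casing: range test, then set the 0x20 bit."""
    return chr(ord(ch) | 0x20) if "A" <= ch <= "Z" else ch

print(packed_bcd_to_ascii(bytes([0x19, 0x67])))  # 1967
```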
[...]
The blend word ASCIIbetical has evolved to describe the collation of data in ASCII-code order rather than "standard" alphabetical order.
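Python compares strings by code point, which for ASCII text is the same as ASCII order. Sorting with no key therefore gives ASCIIbetical order, in which every uppercase letter (65-90) sorts before every lowercase letter (97-122). The word list below is an invented example.

```python
words = ["banana", "Cherry", "apple", "Apple"]

# Code-point (ASCIIbetical) order: all capitals come first.
asciibetical = sorted(words)
print(asciibetical)  # ['Apple', 'Cherry', 'apple', 'banana']

# Case-insensitive order for comparison; sorted() is stable, so ties keep input order.
alphabetical = sorted(words, key=str.lower)
print(alphabetical)  # ['apple', 'Apple', 'banana', 'Cherry']
```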
The abbreviation ASCIIZ or ASCIZ refers to a null-terminated ASCII string.
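An ASCIIZ string is just the ASCII bytes followed by a NUL (code 0), which is also how C marks the end of a string. The Python sketch below shows reading and writing such strings in a byte buffer, using hypothetical helper names. Because NUL ends the string, the text itself cannot contain NUL.

```python
def write_asciiz(text: str) -> bytes:
    """Encode text as ASCII followed by a single NUL terminator."""
    if "\x00" in text:
        raise ValueError("an ASCIIZ string cannot contain NUL")
    return text.encode("ascii") + b"\x00"

def read_asciiz(buffer: bytes, offset: int = 0) -> str:
    """Read the NUL-terminated ASCII string that starts at offset."""
    end = buffer.find(b"\x00", offset)
    if end == -1:
        raise ValueError("no NUL terminator found")
    return buffer[offset:end].decode("ascii")

buf = write_asciiz("ASCII") + write_asciiz("1967")
print(read_asciiz(buf), read_asciiz(buf, 6))  # ASCII 1967
```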
[...]
This reference article is mainly selected from the English Wikipedia with only minor checks and changes (see www.wikipedia.org for details of authors and sources) and is available under the GNU Free Documentation License.
Source 5:
www.aivosto.com/articles/charsets-7bit.html
Author: Unknown
Excerpts:
When computers were young in the early 1960s, it was decided that text should be represented with 7 bits for each character. Seven bits would be enough to represent 128 different characters, including letters, numbers, symbols and required control codes. 6 bits were too few. 8 bits were considered too much. The standard became 7.
ASCII (American Standard Code for Information Interchange) was the first 7-bit character set to be standardized. During the years, several revisions of ASCII were published. ASCII based character sets became immensely widespread. Most character sets in current use are based on ASCII in a way or another.
[...]
Revisions of ASCII
ASCII has undergone several revisions to become the character set we know today. The history of ASCII is not always fully understood. As an example, IANA lists ASCII as the same thing as ANSI_X3.4-1968 and ANSI_X3.4-1986. This is not entirely accurate. The 1968 revision was ambiguous. The ambiguities were fixed later, making the 1986 revision different from the 1968 revision.
[...]
ASCII-1963 (ASA standard X3.4-1963) was the initial release of ASCII. It was in many ways different from the ASCII in current use. ASCII-1963 didn't yet gain wide acceptance. One of the reasons is that IBM chose to use EBCDIC, an IBM proprietary character set, in its successful SYSTEM/360 series of computers released in 1964.
ASCII-1965 was an unpublished major revision. It looked a lot like the current ASCII, even though there were differences with certain characters. ASCII-1965 was accepted as a standard, but it went unpublished and unused.
ASCII-1967 (USAS X3.4-1967) was a major revision of the previous versions of ASCII. This was the version that eventually evolved to the ASCII we know today.
ASCII-1967 was not exactly what we currently think of as ASCII. The differences are as follows. ASCII-1967 offered some options for certain characters, and one character was totally ambiguous. The Number Sign (#) could be replaced by the symbol £. Two characters could be stylized. The Exclamation Point (!) could be stylized as a logical OR (|) and the Circumflex (^) could be stylized as a logical NOT (¬). Character 7C, even though called a Vertical Line, looked like a broken vertical bar (¦). It looked that way to avoid confusion with a solid vertical bar (|) used as a logical OR. In other words, since character 21 could sometimes look like (|), 7C had to look like (¦).
Character 7E was ambiguous. This character had three functions. It was 1) Overline when used as punctuation, 2) Tilde when used as a diacritic, and 3) General Accent, yet another diacritic which could be used for other accents not specifically provided. The character appeared in two shapes, upper tilde ([deleted]) and midline tilde (~), interchangeably. No explanation was provided as to which shape to use and when. The character did not look like an overline (¯), even when it was called Overline. As if they couldn't decide what this character really was for. The midline shape (~) may have been unintentional. The midline position conflicts with the intended use either as a diacritic or as an overline. Ambiguity regarding the shape seems to have originated in ASCII-1965, where it may have been a typographical error or restriction.
ASCII-1968 (USAS X3.4-1968) was a minor revision. It didn't change any of the graphic characters. The only change was to the "newline" function. LF could now be used alone as a newline. The previous versions required the use of CR LF (or LF CR). The 1968 standard also gave the code its name ASCII or USASCII.
ASCII-1977 (ANSI X3.4-1977) fixed some of the ambiguities of ASCII-1967 and ASCII-1968. The Number Sign (#) could no longer be replaced by the Pound (£). Character 7C was now a Vertical Line (|) that no longer looked like a broken vertical bar. One could no longer stylize the Exclamation Point (!) as a (|) or the Circumflex (^) as a logical NOT (¬). Overline was no longer present; it was simply a Tilde ([deleted], not ~). That character could no longer be used as a General Accent either. ASCII-1977 also changed the definitions of several control characters. The changes did not necessarily change the intended use of these characters. An essential change was with VT and FF: it was now possible to allow an "optional implicit CR" after VT and FF the same way it was already possible with LF. More changes can be found in Control characters in ASCII and Unicode [ www.aivosto.com/articles/control-characters.html ].
ASCII-1986 (ANSI X3.4-1986) did not change the character set nor the control characters.
- Question: "Overline was no longer present; it was simply a Tilde ([deleted], not ~)." - but today, in ASCII, the tilde is the midline tilde. Why?
-- Alternatively: Is this statement a mistake? Should it be "it was simply a Tilde (~, not [deleted])."?
- Note: I have replaced the upper tilde Unicode character with "[deleted]".
Source 6:
www.unicode.org/charts/PDF/U0000.pdf
Author: Unknown.
This source supplied the names of the printable characters.
Changes to the original text:
- I have not always preserved the format of any excerpts from webpages on other sites (e.g. not preserving the original bold/italic styles, changing the list structures, not preserving hyperlinks).
- I have not always preserved the spellings in excerpts from webpages on other sites (e.g. I may change "execpt" to "except").
[end of notes]