Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters?

April 15, 2011

I'm looking at the following code, which tests for a hexadecimal digit and converts it to an integer. The code is kind of clever in that it takes advantage of the difference between capital and lower-case letters being 32, and that's bit 5. So the code performs one extra OR, but saves one JMP and two CMPs.

    static const int bit_five = (1 << 5);
    static const char str[] = "0123456789abcdefABCDEF";

    /* countof() is assumed to be an array-length macro defined elsewhere */
    for (unsigned int i = 0; i < countof(str); i++) {
        int digit, ch = str[i];

        if (ch >= '0' && ch <= '9')
            digit = ch - '0';
        else if ((ch |= bit_five) >= 'a' && ch <= 'f')
            digit = ch - 'a' + 10;
        ...
    }

Do C and C++ guarantee the ASCII values of the [a-f] and [A-F] characters? Here, "guarantee" means that the upper and lower character sets will always differ by a constant value that can be represented by a single bit (for the trick above). If not, what does the standard say about them?

(Sorry for the C and C++ tag. I'm interested in both languages' position on the subject.)

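There are no guarantees about the particular values, but you shouldn't care, because your software is unlikely ever to encounter a system that is not compatible with ASCII in this way. Assume that space is always 32 and that A is always 65; this works fine in the modern world.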
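If you do decide to assume ASCII, one way to make that assumption explicit is to check it at compile time. The following is a minimal sketch, assuming a C11 compiler; the function name `hex_value_ascii` is made up for illustration, and the checks only encode the assumptions that the trick from the question depends on.

```c
#include <assert.h>  /* static_assert (C11) */

/* Fail the build if the execution character set is not ASCII-like
 * in the ways the bit-5 trick relies on. */
static_assert('A' == 65 && 'F' == 70, "expected ASCII upper-case hex letters");
static_assert('a' == 97 && 'f' == 102, "expected ASCII lower-case hex letters");
static_assert(('A' | 32) == 'a', "cases must differ by bit 5 only");

/* Hypothetical helper: returns 0..15 for a hex digit, -1 otherwise. */
static int hex_value_ascii(int ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';
    ch |= 1 << 5;               /* fold 'A'..'F' onto 'a'..'f' */
    if (ch >= 'a' && ch <= 'f')
        return ch - 'a' + 10;
    return -1;
}
```

On a system where the assumptions do not hold, the build stops with a readable message instead of the function silently misparsing input.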

The C standard only guarantees that the letters A-Z and a-z exist and that they fit within a single byte.

It does guarantee that 0-9 are sequential:

"In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous."
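Since only the digits are guaranteed to be consecutive, a conversion that makes no assumption about letter values can look the letters up in a table instead. This is a minimal sketch using only the standard `strchr` and `tolower`; the name `hex_value_portable` is hypothetical.

```c
#include <ctype.h>   /* tolower */
#include <limits.h>  /* UCHAR_MAX */
#include <string.h>  /* strchr */

/* Hypothetical helper: returns 0..15 for a hex digit, -1 otherwise.
 * Relies only on '0'..'9' being consecutive, which the standard guarantees. */
static int hex_value_portable(int ch)
{
    static const char letters[] = "abcdef";
    const char *p;

    if (ch >= '0' && ch <= '9')
        return ch - '0';

    /* Reject values tolower() may not accept, and '\0', which strchr()
     * would otherwise match against the string terminator. */
    if (ch <= 0 || ch > UCHAR_MAX)
        return -1;

    p = strchr(letters, tolower(ch));
    return p ? (int)(p - letters) + 10 : -1;
}
```

This gives up the single-OR trick in exchange for a short table search, so it is the slower but assumption-free end of the trade-off.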

Justification

There are a lot of character encodings out there in the world. If you care about portability, you can either make your program portable to different character sets, or you can choose one character set to use everywhere (e.g. Unicode). I'll go ahead and loosely categorize most existing character encodings for you:

Single-byte character encodings compatible with ISO/IEC 646. Digits 0-9 and letters A-Z and a-z always occupy the same positions.
Multibyte character encodings (Big5, Shift JIS, ISO 2022-based). In these encodings, your program is probably already broken and you'll need to spend time fixing it if you care. However, parsing numbers will still work as expected.
Unicode encodings. Digits 0-9 and letters A-Z, a-z always occupy the same positions. You can work with either code points or code units freely and you will get the same result, as long as you are working with code points below 128 (which you are). (Are you working with UTF-7? No, you should only use that for email.)
EBCDIC. Digits and letters are assigned different values than in ASCII; however, 0-9 and A-F, a-f are still contiguous. Even then, the chance that your code will ever run on an EBCDIC system is essentially zero.
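The hard-coded bit 5 is the ASCII-specific part of the trick. In EBCDIC the two cases of a letter also differ by a single bit, but a different one (0x40), so a variant can derive the mask from the character constants instead. The sketch below assumes that a-f and A-F are each contiguous and that the two cases differ in exactly one bit, which holds for ASCII and, to my knowledge, for EBCDIC; `CASE_BITS` and `hex_value_casebit` are made-up names.

```c
/* Bit(s) that distinguish 'a' from 'A' in the execution character set:
 * 0x20 under ASCII. Assumed (not guaranteed) to be a single bit. */
enum { CASE_BITS = 'a' ^ 'A' };

/* Hypothetical helper: returns 0..15 for a hex digit, -1 otherwise. */
static int hex_value_casebit(int ch)
{
    int folded;

    if (ch >= '0' && ch <= '9')
        return ch - '0';

    /* Setting the case bit maps both cases of a letter to one value,
     * whichever case that happens to be in this character set. */
    folded = ch | CASE_BITS;
    if (folded >= ('a' | CASE_BITS) && folded <= ('f' | CASE_BITS))
        return folded - ('a' | CASE_BITS) + 10;
    return -1;
}
```

This keeps the one-OR shape of the original code while removing the literal 32, though it still leans on properties the standard does not promise.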

So the question here is: do you think that a hypothetical fifth option will be invented in the future, somehow less compatible with or more difficult to use than Unicode?

Do you care about EBCDIC?

We could dream up bizarre systems all day... suppose CHAR_BIT is 11, or sizeof(long) = 100, or suppose we use one's complement arithmetic, or malloc() always returns NULL, or suppose the pixels on your monitor are arranged in a hexagonal grid. Suppose your floating-point numbers aren't IEEE 754, suppose all of your data pointers are different sizes. At the end of the day, this does not get us closer to our goal of writing working software on actual modern systems (with the occasional exception).
