Do C and C++ guarantee the ASCII values of the [a-f] and [A-F] characters?


I'm looking at the following code, which tests for a hexadecimal digit and converts it to an integer. The code is kind of clever in that it takes advantage of the difference between capital and lowercase letters being 32, which is bit 5. So the code performs one extra OR, but saves one JMP and two CMPs.

    static const int bit_five = (1 << 5);
    static const char str[] = "0123456789ABCDEFabcdef";

    /* countof is assumed to be an array-length macro defined elsewhere */
    for (unsigned int i = 0; i < countof(str); i++) {
        int digit, ch = str[i];

        if (ch >= '0' && ch <= '9')
            digit = ch - '0';
        else if ((ch |= bit_five) >= 'a' && ch <= 'f')
            digit = ch - 'a' + 10;
        ...
    }
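For anyone who wants to try the trick in isolation, here is a minimal sketch of it wrapped in a function. The name hex_value, the BIT_FIVE constant and the -1 return for non-hex input are my own choices for illustration, not part of the code above, and the sketch assumes an ASCII-compatible execution character set in which upper and lower case letters differ by exactly 32.

```c
/* Hypothetical helper: returns 0..15 for a hex digit, -1 otherwise.
   Assumes an ASCII-compatible encoding, where upper and lower case
   letters differ only in bit 5 (value 32). */
enum { BIT_FIVE = 1 << 5 };

int hex_value(int ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';

    ch |= BIT_FIVE;                 /* fold 'A'..'F' onto 'a'..'f' */
    if (ch >= 'a' && ch <= 'f')
        return ch - 'a' + 10;

    return -1;                      /* not a hexadecimal digit */
}
```

Calling hex_value('A') and hex_value('a') should both give 10 on an ASCII system, since setting bit 5 maps the upper case letter onto the lower case one and leaves the lower case one unchanged.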

Do C and C++ guarantee the ASCII values of the [a-f] and [A-F] characters? Here, "guarantee" means that the upper and lower case character sets will always differ by a constant value that can be represented by a single bit (for the trick above). If not, what does the standard say about them?

(Sorry for the C and C++ tags. I'm interested in both languages' position on the subject.)

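There are no guarantees about the particular values, but you shouldn't care, because your software will probably never encounter a system that is not compatible with ASCII in this way. Assume that space is always 32 and that A is always 65; this works fine in the modern world.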
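If you do decide to assume ASCII, one option is to state the assumption at compile time so that an incompatible system fails the build instead of misbehaving at run time. This is only a sketch of that idea using C11's static_assert (the messages are my own wording); in C++11 and later, static_assert is a keyword and the include is not needed.

```c
#include <assert.h>   /* provides static_assert in C11 */

/* Refuse to compile on an execution character set that is not
   ASCII-compatible in the ways the code relies on. */
static_assert(' ' == 32, "space is not 32");
static_assert('A' == 65, "'A' is not 65");
static_assert(('A' ^ 'a') == 32, "upper and lower case do not differ by bit 5 alone");
```

These checks cost nothing at run time, and they document exactly which properties of the encoding the rest of the code depends on.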

The C standard only guarantees that the letters A-Z and a-z exist and that they fit within a single byte.

It does guarantee that 0-9 are sequential:

In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
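If you want a conversion that leans only on what the standard promises, a sketch like the following uses arithmetic for the digits and a position lookup for the letters, so the encoded values of a-f and A-F never matter. The function name hex_value_portable and the -1 error return are assumptions made for this example.

```c
#include <string.h>

/* Hypothetical helper: returns 0..15 for a hex digit, -1 otherwise.
   Relies only on '0'..'9' being consecutive; letters are located by
   their position in a string, not by their encoded values.
   ch is expected to be a value that fits in a char. */
int hex_value_portable(int ch)
{
    static const char lower[] = "abcdef";
    static const char upper[] = "ABCDEF";
    const char *p;

    if (ch >= '0' && ch <= '9')
        return ch - '0';
    if (ch == '\0')
        return -1;                  /* strchr would match the terminator */
    if ((p = strchr(lower, ch)) != NULL)
        return (int)(p - lower) + 10;
    if ((p = strchr(upper, ch)) != NULL)
        return (int)(p - upper) + 10;
    return -1;
}
```

This trades the single OR for up to two short string searches, which is the price of not assuming anything about how the letters are encoded.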

Justification

There are a lot of character encodings out in the world. If you care about portability, you can either make your program portable to different character sets, or you can choose one character set to use everywhere (e.g. Unicode). I'll go ahead and loosely categorize most existing character encodings for you:

  1. Single byte character encodings compatible with ISO/IEC 646. Digits 0-9 and letters A-Z and a-z always occupy the same positions.

  2. Multibyte character encodings (Big5, Shift JIS, ISO 2022-based). In these encodings, your program is probably already broken and you'll need to spend time fixing it if you care. However, parsing numbers will still work as expected.

  3. Unicode encodings. Digits 0-9 and letters A-Z, a-z always occupy the same positions. You can work with either code points or code units freely and you will get the same result, as long as you are working with code points below 128 (which you are). (Are you working with UTF-7? No, you should only use that for email.)

  4. EBCDIC. Digits and letters are assigned different values than their values in ASCII; however, 0-9 and A-F, a-f are still contiguous. Even then, the chance that your code will run on an EBCDIC system is essentially zero.
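To make the EBCDIC case concrete: there, upper and lower case letters differ by 0x40 rather than 0x20, and it is the upper case letters that have the extra bit set ('a' is 0x81, 'A' is 0xC1). The sketch below hard-codes those code points, which I am taking from the common EBCDIC code pages such as IBM-037, and evaluates the bit-5 trick as if it had been compiled for such a system. The function and constant names are made up for this illustration.

```c
/* EBCDIC code points, hard-coded for illustration (assumed from the
   common EBCDIC code pages, e.g. IBM-037). */
enum {
    EBCDIC_LOWER_A = 0x81, EBCDIC_LOWER_F = 0x86,
    EBCDIC_UPPER_A = 0xC1, EBCDIC_UPPER_F = 0xC6
};

/* The bit-5 trick as it would behave on EBCDIC: returns 1 if ch
   would be accepted as a hex letter, 0 otherwise. */
int ebcdic_bit5_accepts(int ch)
{
    ch |= 1 << 5;
    return ch >= EBCDIC_LOWER_A && ch <= EBCDIC_LOWER_F;
}

/* A fold that would work on EBCDIC: clearing bit 6 (0x40) maps
   upper case letters onto lower case ones. */
int ebcdic_fold_accepts(int ch)
{
    ch &= ~0x40;
    return ch >= EBCDIC_LOWER_A && ch <= EBCDIC_LOWER_F;
}
```

With these values the bit-5 version rejects both 0xC1 ('A') and 0x81 ('a'), because OR-ing in 0x20 moves either one out of the a-f range, while the bit-6 version accepts both. So the contiguity survives on EBCDIC, but the specific constant does not.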

So the question here is: do you think that a hypothetical fifth option will be invented in the future, somehow less compatible with or more difficult to use than Unicode?

Do you care about EBCDIC?

We could dream up bizarre systems all day... suppose CHAR_BIT is 11, or sizeof(long) = 100, or suppose we use one's complement arithmetic, or malloc() always returns NULL, or suppose the pixels on your monitor are arranged in a hexagonal grid. Suppose your floating-point numbers aren't IEEE 754, suppose all of your data pointers are different sizes. At the end of the day, this does not get us closer to our goals of writing working software on actual modern systems (with the occasional exception).

