Do C and C++ guarantee the ASCII values of [a-f] and [A-F] characters?
I'm looking at the following code, which tests for a hexadecimal digit and converts it to an integer. The code is kind of clever in that it takes advantage of the fact that the difference between capital and lower-case letters is 32, and that's bit 5. So the code performs one extra OR, but saves one JMP and two CMPs.
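    /* countof is assumed to be an array-length macro defined elsewhere */
    static const int bit_five = (1 << 5);
    static const char str[] = "0123456789ABCDEFabcdef";

    for (unsigned int i = 0; i < countof(str); i++)
    {
        int digit, ch = str[i];

        if (ch >= '0' && ch <= '9')
            digit = ch - '0';
        /* setting bit 5 folds 'A'-'F' onto 'a'-'f' (true in ASCII) */
        else if ((ch |= bit_five) >= 'a' && ch <= 'f')
            digit = ch - 'a' + 10;
        ...
    }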
Do C and C++ guarantee the ASCII values, or any values, of the [a-f] and [A-F] characters? Here, "guarantee" means that the upper- and lower-case character sets will always differ by a constant value that can be represented by a single bit (for the trick above). If not, what does the standard say about them?
(Sorry for the C and C++ tags. I'm interested in both languages' positions on the subject.)
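There are no guarantees about the particular values, but you shouldn't care, because your software will probably never encounter a system that is not compatible with ASCII in this way. Assume that space is always 32 and that A is always 65; this works fine in the modern world.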
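If you do take that advice, one option is to state the assumption in the code so that a build on an incompatible system fails instead of misbehaving. The following is only a sketch: it assumes a C11 compiler (for static_assert), and hex_value_ascii is a hypothetical name, not something from the question.

```c
#include <assert.h>  /* provides the static_assert macro in C11 */

/* Build-time checks: compilation fails if the execution character set
   does not lay out these characters the way ASCII does. */
static_assert(' ' == 32, "space is not 32");
static_assert('A' == 65, "'A' is not 65");
static_assert(('a' ^ 'A') == (1 << 5), "case does not differ by bit 5 only");
static_assert('f' - 'a' == 5 && 'F' - 'A' == 5, "a-f or A-F is not a run of six");

/* Hypothetical helper: returns 0-15 for a hex digit, -1 otherwise. */
static int hex_value_ascii(int ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';
    ch |= 1 << 5;                 /* fold 'A'-'F' onto 'a'-'f' */
    if (ch >= 'a' && ch <= 'f')
        return ch - 'a' + 10;
    return -1;
}
```

The assertions cost nothing at run time, and the bit-5 trick from the question is kept unchanged behind them.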
The C standard only guarantees that the letters A-Z and a-z exist and that they fit within a single byte.
It does guarantee that 0-9 are sequential. The relevant wording in the standard is:
"In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous."
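For comparison, here is a minimal sketch of a conversion that leans only on that guarantee: digits are handled by subtraction, and letters are matched one by one, so nothing is assumed about how a-f and A-F are encoded. The name hex_value_portable is hypothetical.

```c
/* Hypothetical helper: returns 0-15 for a hex digit, -1 otherwise.
   Only the digit branch uses arithmetic on character values, which the
   standard permits because '0'..'9' are consecutive. */
static int hex_value_portable(int ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';

    switch (ch) {
    case 'a': case 'A': return 10;
    case 'b': case 'B': return 11;
    case 'c': case 'C': return 12;
    case 'd': case 'D': return 13;
    case 'e': case 'E': return 14;
    case 'f': case 'F': return 15;
    default:            return -1;   /* not a hexadecimal digit */
    }
}
```

Whether this is slower than the OR trick depends on the compiler, which may well turn the switch into a range check or a table on its own.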
Justification
There are a lot of character encodings out in the world. If you care about portability, you can either make your program portable to different character sets, or you can choose one character set to use everywhere (e.g. Unicode). I'll go ahead and loosely categorize most existing character encodings for you:
1. Single-byte character encodings compatible with ISO/IEC 646. Digits 0-9 and letters A-Z and a-z always occupy the same positions.
2. Multibyte character encodings (Big5, Shift JIS, ISO 2022-based). In these encodings, your program is probably already broken and you'll need to spend time fixing it if you care. However, parsing numbers will still work as expected.
3. Unicode encodings. Digits 0-9 and letters A-Z, a-z always occupy the same positions. You can work with either code points or code units freely and you will get the same result, as long as you are working with code points below 128 (which you are). (Are you working with UTF-7? No, you should only use that for email.)
4. EBCDIC. Digits and letters are assigned different values than in ASCII; however, 0-9 and A-F, a-f are still contiguous. Even then, the chance that your code will run on an EBCDIC system is essentially zero.
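To make the EBCDIC case concrete, here is a small sketch that works on raw code values, so it can be run on an ordinary ASCII machine. The four constants are the values commonly tabulated for EBCDIC (for example code page 037) and should be treated as an assumption of this sketch; the function names are hypothetical. In that table the two cases differ by 0x40 rather than 0x20, so the question's OR with bit 5 does not land on a-f, while a plain range check still works because each run is contiguous.

```c
/* Code values as commonly tabulated for EBCDIC (e.g. code page 037);
   treat them as an assumption of this sketch. */
enum {
    EBCDIC_LOWER_A = 0x81, EBCDIC_LOWER_F = 0x86,
    EBCDIC_UPPER_A = 0xC1, EBCDIC_UPPER_F = 0xC6
};

/* The question's test, applied to a raw EBCDIC code value:
   set bit 5, then check for the lower-case a-f range. */
static int bit5_fold_finds_hex_letter(int code)
{
    code |= 1 << 5;
    return code >= EBCDIC_LOWER_A && code <= EBCDIC_LOWER_F;
}

/* A range check per case, which relies only on contiguity.
   Returns 10-15 for a hex letter, -1 otherwise. */
static int ebcdic_hex_letter_value(int code)
{
    if (code >= EBCDIC_LOWER_A && code <= EBCDIC_LOWER_F)
        return code - EBCDIC_LOWER_A + 10;
    if (code >= EBCDIC_UPPER_A && code <= EBCDIC_UPPER_F)
        return code - EBCDIC_UPPER_A + 10;
    return -1;
}
```

With these values, the bit-5 test rejects both 0xC1 and 0x81, since setting bit 5 moves each of them outside the 0x81-0x86 range.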
So the question here is: do you think that a hypothetical fifth option will be invented in the future, somehow less compatible or more difficult to use than Unicode?
Do you care about EBCDIC?
We could dream up bizarre systems all day... suppose CHAR_BIT is 11, or sizeof(long) = 100, or suppose we use one's complement arithmetic, or malloc() always returns NULL, or suppose the pixels on your monitor are arranged in a hexagonal grid. Suppose your floating-point numbers aren't IEEE 754, suppose all of your data pointers are different sizes. At the end of the day, this does not get us closer to our goal of writing working software on actual modern systems (with the occasional exception).