Unicode truly is amazing.
Like that fake apple site that uses the Cyrillic A instead of the Latin A.
Or the Greek question mark being a different code to Latin question marks.
That would be a homograph attack: https://en.wikipedia.org/wiki/IDN_homograph_attack
Actually the Greek question mark (;) looks like the Latin semi-colon (;)!
Last time I looked it up I think I found they are the same characters, and I tried compiling C with a Greek question mark instead of a semi-colon and it compiled fine! But I’m curious if it was because of something else, like my computer’s keyboard layout, or the compiler simply being able to handle them 🤔
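For what it’s worth, you can see the difference by printing how many bytes each character takes up. A minimal sketch, assuming the compiler uses a UTF-8 execution character set (the GCC/Clang default); U+037E is the Greek question mark:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* U+003B SEMICOLON and U+037E GREEK QUESTION MARK render the same
         * in most fonts, but they are distinct code points, and U+037E
         * needs two bytes in UTF-8. */
        printf("U+003B: %zu byte(s)\n", strlen(";"));      /* 1 */
        printf("U+037E: %zu byte(s)\n", strlen("\u037E")); /* 2, assuming UTF-8 */
        return 0;
    }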
Wait, does C read like valley girl speech in Greek?
Shit - the next five weeks I’ll read C++ lines in upspeak in my head :(
Something somewhere was definitely doing the conversion for you, but it could have been your editor, the compiler, or something in between, like a C preprocessor directive getting loaded in by your configuration.
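If you still have the file, one way to narrow it down is to check whether the Greek character actually survived on disk. A rough sketch that scans a file for the UTF-8 byte sequence of U+037E (0xCD 0xBE):

    #include <stdio.h>

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s file.c\n", argv[0]);
            return 1;
        }
        FILE *f = fopen(argv[1], "rb");
        if (!f) {
            perror("fopen");
            return 1;
        }
        /* U+037E GREEK QUESTION MARK encodes to 0xCD 0xBE in UTF-8. */
        long pos = 0;
        int prev = EOF, c, found = 0;
        while ((c = fgetc(f)) != EOF) {
            if (prev == 0xCD && c == 0xBE) {
                printf("U+037E at byte offset %ld\n", pos - 1);
                found = 1;
            }
            prev = c;
            pos++;
        }
        fclose(f);
        if (!found)
            puts("no U+037E in this file");
        return 0;
    }

If the byte sequence isn’t there, the editor (or keyboard layout/input method) already substituted a plain semicolon before the compiler saw anything; if it is there and the build still succeeded, the compiler itself accepted it.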
I’d be pissed if it was my editor. For a compiler used on a global scale, it would make sense.
Nah, I would absolutely want my compiler to error out hard on characters that are not allowed per the standard.
In C and C++, the source character set is implementation defined. This means that each compiler sets its own rules about what characters are accepted. For example, compilers could choose to accept ASCII or EBCDIC or Unicode, or some combination, etc.
So the ISO standard will say that the ; character is the end-of-statement punctuation. But it is up to the compiler to say which character(s) or code point(s) represent the ISO ;.
The ISO standards also require compilers to define a separate execution character set to specify values that can be stored in char and used with the string library functions. The execution character set doesn’t have to be the same as the source character set.
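To make that split concrete, here’s a rough sketch, assuming C17 (the u8 rules change again in C23, see the edit below): the same character can occupy a different number of bytes depending on the execution character set, while the u8 prefix always means UTF-8.

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* An ordinary literal is converted to the implementation-defined
         * execution character set, so "\u00E9" (é) might be 2 bytes
         * (UTF-8, the GCC/Clang default) or 1 byte (e.g. with GCC's
         * -fexec-charset=latin1). */
        printf("ordinary literal: %zu byte(s)\n", strlen("\u00E9"));

        /* A u8 literal (C11 and later) is always UTF-8: 2 bytes for é. */
        printf("u8 literal      : %zu byte(s)\n", strlen(u8"\u00E9"));
        return 0;
    }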
Edit: I should also mention that the rules for this stuff are changing a lot in ISO C23 and C++23. (Which standards I haven’t yet personally adopted.) Basically the ISO 23 standards mandate compilers to support UTF-8 source files, and they map every source character in the ISO standard to its corresponding Unicode character.
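As a small taste of the C23 side of this, u8 string literals get a dedicated char8_t element type and their UTF-8 encoding is guaranteed, so something like this should hold on any conforming compiler (a sketch, assuming a compiler with -std=c23 support, e.g. recent GCC/Clang):

    #include <stdio.h>
    #include <uchar.h>   /* char8_t lives here in C23 */

    int main(void) {
        /* u8 string literals have element type char8_t in C23, and the
         * encoding is guaranteed to be UTF-8: U+037E is 0xCD 0xBE. */
        const char8_t greek[] = u8"\u037E";
        static_assert(sizeof greek == 3, "two UTF-8 bytes plus the NUL");
        printf("%02X %02X\n", (unsigned)greek[0], (unsigned)greek[1]);  /* CD BE */
        return 0;
    }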
Mhh today I learned. That’s wild. I would have thought that any sane person would allow only 7-bit ASCII for the source code, and forward-compatible character sets in strings (every standard iteration being allowed to add characters, but not remove them).
At the time that C was designed, ASCII was not a universal standard. It was one encoding competing with other encodings.
Ok that’s a fair point I had overlooked. Thanks for explaining.