Deep-Dive System Documentation
Unicode Characters in Rust
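Unlike languages where a character is a single 8-bit byte or a 16-bit UTF-16 code unit, Rust's char type holds exactly one Unicode scalar value and always occupies 4 bytes (32 bits). A scalar value is any code point from U+0000 to U+10FFFF except the surrogates U+D800 to U+DFFF, so a char can never hold half of a surrogate pair. A single char can therefore represent an emoji, a Cyrillic or Hanzi character, or a keyboard modifier symbol such as ⌘.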
let is_cmd: bool = true;
let modifier: char = if is_cmd { '⌘' } else { '⌃' };
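As a minimal sketch of the size and validity rules above, the snippet below checks the in-memory size of char and converts between char and its numeric scalar value. The helper name scalar_value is hypothetical and exists only for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the Unicode scalar value of `c` as an integer.
fn scalar_value(c: char) -> u32 {
    c as u32
}

fn main() {
    // A char always occupies 4 bytes in memory, whatever it holds.
    assert_eq!(size_of::<char>(), 4);

    // '⌘' is U+2318 and '🚀' is U+1F680.
    assert_eq!(scalar_value('⌘'), 0x2318);
    assert_eq!(char::from_u32(0x1F680), Some('🚀'));

    // Surrogates and values above U+10FFFF are not scalar values,
    // so the checked conversion rejects them.
    assert!(char::from_u32(0xD800).is_none());
    assert!(char::from_u32(0x11_0000).is_none());
}
```

Because char::from_u32 returns an Option, invalid values have to be handled explicitly instead of producing a malformed character.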
Useful Methods
.is_ascii() -> bool
Checks if the character is within the 7-bit ASCII range (U+0000 to U+007F).
let ascii_char = 'A';
let emoji_char = '🚀';
assert!(ascii_char.is_ascii());
assert!(!emoji_char.is_ascii());
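The boundary of the ASCII range can be illustrated with a short sketch. The function count_non_ascii is a hypothetical helper, not part of the standard library.

```rust
/// Hypothetical helper: counts the characters in `text` that fall outside ASCII.
fn count_non_ascii(text: &str) -> usize {
    text.chars().filter(|c| !c.is_ascii()).count()
}

fn main() {
    // U+007F (DELETE) is the last ASCII character; U+0080 is the first non-ASCII one.
    assert!('\u{7F}'.is_ascii());
    assert!(!'\u{80}'.is_ascii());

    // Only '⌘' is non-ASCII in this shortcut label.
    assert_eq!(count_non_ascii("⌘+C"), 1);
}
```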
.is_numeric() -> bool
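Checks if the character belongs to one of the Unicode numeric general categories (Nd, Nl, or No). This covers decimal digits from other scripts (e.g., Devanagari or Arabic) as well as non-digit numerals such as '¾' or 'Ⅷ'. To match only '0' through '9', use .is_ascii_digit() instead.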
let c = '7';
assert!(c.is_numeric());
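Since is_numeric accepts more than the ASCII digits, it can help to see it side by side with is_ascii_digit. The following is a sketch only; classify is a hypothetical helper and its return strings are arbitrary labels.

```rust
/// Hypothetical helper: labels a character by how "numeric" it is.
fn classify(c: char) -> &'static str {
    if c.is_ascii_digit() {
        "ascii digit"
    } else if c.is_numeric() {
        "other numeric"
    } else {
        "not numeric"
    }
}

fn main() {
    assert_eq!(classify('7'), "ascii digit");
    // Arabic-Indic digit three and Devanagari digit three.
    assert_eq!(classify('٣'), "other numeric");
    assert_eq!(classify('३'), "other numeric");
    // A vulgar fraction and a Roman numeral also count as numeric.
    assert_eq!(classify('¾'), "other numeric");
    assert_eq!(classify('Ⅷ'), "other numeric");
    assert_eq!(classify('R'), "not numeric");

    // to_digit only understands ASCII digits and letters, so a character
    // that passes is_numeric is not guaranteed to convert.
    assert_eq!('7'.to_digit(10), Some(7));
    assert_eq!('٣'.to_digit(10), None);
}
```

When parsing user input into integers, is_ascii_digit is usually the safer check, because to_digit does not convert digits from other scripts.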
.is_alphabetic() -> bool
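Checks if the character has the Unicode Alphabetic property. This covers letters from any script (e.g., 'я' or '京'), not only A-Z and a-z; digits, symbols, and emoji return false.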
let c = 'R';
assert!(c.is_alphabetic());
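A short sketch of how is_alphabetic behaves across scripts follows. The helper starts_with_letter is hypothetical and only shows one way the method might be used.

```rust
/// Hypothetical helper: true if the first character of `s` is alphabetic.
fn starts_with_letter(s: &str) -> bool {
    s.chars().next().map_or(false, char::is_alphabetic)
}

fn main() {
    // Letters from non-Latin scripts are alphabetic too.
    assert!('я'.is_alphabetic());
    assert!('京'.is_alphabetic());

    // Digits, symbols, and emoji are not.
    assert!(!'7'.is_alphabetic());
    assert!(!'⌘'.is_alphabetic());
    assert!(!'🚀'.is_alphabetic());

    assert!(starts_with_letter("Rust"));
    assert!(!starts_with_letter("⌘K"));
    // An empty string has no first character.
    assert!(!starts_with_letter(""));
}
```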
.to_lowercase() -> ToLowercase
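Returns the lowercase mapping of the character as an iterator rather than a single char, because one Unicode character can lowercase to several (e.g., 'İ' becomes 'i' followed by the combining dot above, U+0307). Characters with no lowercase mapping are yielded unchanged.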
let c = 'R';
let lower: String = c.to_lowercase().to_string();
assert_eq!(lower, "r");
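The multi-character case can be made concrete by collecting the iterator. This is a minimal sketch; lowercase_chars is a hypothetical helper name.

```rust
/// Hypothetical helper: collects the lowercase mapping of `c` into a Vec.
fn lowercase_chars(c: char) -> Vec<char> {
    c.to_lowercase().collect()
}

fn main() {
    // The common case: one character in, one character out.
    assert_eq!(lowercase_chars('R'), vec!['r']);

    // 'İ' (U+0130) lowercases to 'i' plus U+0307 COMBINING DOT ABOVE.
    assert_eq!(lowercase_chars('İ'), vec!['i', '\u{307}']);

    // Characters without a lowercase mapping come back unchanged.
    assert_eq!(lowercase_chars('7'), vec!['7']);

    // Collecting straight into a String works as well.
    let lower: String = 'İ'.to_lowercase().collect();
    assert_eq!(lower, "i\u{307}");
}
```

Code that assumes lowercasing preserves the number of characters (for example, when writing into a fixed-size buffer) should account for this expansion.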
.len_utf8() -> usize
Returns the number of bytes needed to encode this character in UTF-8. The result is always between 1 and 4, inclusive.
let ascii = 'A';
let emoji = '🚀';
assert_eq!(ascii.len_utf8(), 1);
assert_eq!(emoji.len_utf8(), 4);
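The 2-byte and 3-byte cases are not shown above, so the sketch below covers all four lengths and relates len_utf8 to the byte length of a string. The helper utf8_len_by_chars is hypothetical and exists only for illustration.

```rust
/// Hypothetical helper: sums len_utf8 over every character in `s`.
/// The result should equal `s.len()`, which counts bytes, not characters.
fn utf8_len_by_chars(s: &str) -> usize {
    s.chars().map(char::len_utf8).sum()
}

fn main() {
    assert_eq!('A'.len_utf8(), 1); // U+0041
    assert_eq!('é'.len_utf8(), 2); // U+00E9
    assert_eq!('⌘'.len_utf8(), 3); // U+2318
    assert_eq!('🚀'.len_utf8(), 4); // U+1F680

    // 1 + 2 + 3 + 4 bytes for four characters.
    let text = "Aé⌘🚀";
    assert_eq!(text.chars().count(), 4);
    assert_eq!(utf8_len_by_chars(text), 10);
    assert_eq!(text.len(), 10);

    // A 4-byte buffer is always large enough to encode any single char.
    let mut buf = [0u8; 4];
    let encoded = '⌘'.encode_utf8(&mut buf);
    assert_eq!(encoded.as_bytes(), &[0xE2u8, 0x8C, 0x98]);
}
```

This is why indexing a string by byte offset differs from counting characters: str::len reports bytes, while chars().count() reports scalar values.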
Quick Reference Guide
let letter: char = 'A';
let symbol: char = '⌘';