So letters are numbers?

Recently, whilst reading a book on Java programming, I discovered that chars are actually numbers! Or at least, they are in Java.

Before I go on, I’m going to explain a couple of fundamentals about strings and chars… First of all, WordPress, and the whole gang of server-side applications that go with it, treat this entire post as a string. What is a string? Just a bunch of text. In Java, a string is a whole lot of chars lined up one after the other! And so, we get into chars… In Java, “char” is the name of a type; in plain English, we’d just call them characters, or letters. (JavaScript works a little differently: it has no separate char type, so a single character is simply a string of length one.)

Anyway, I’ve basically dived into the science behind how Java stores strings! Think of a string as a list of chars, or, as we programmers often call that kind of list, an array.
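In Java you can see that array directly. The small sketch below (the class name `StringAsChars` is just an illustrative choice) uses the standard `String.toCharArray()` method to copy a string’s chars into a `char[]` and prints each one with its position. It also uses `charAt()`, which reads a single char without building the whole array.

```java
public class StringAsChars {
    public static void main(String[] args) {
        String greeting = "Hello, World";
        // toCharArray() copies the string's characters into a new char[]
        char[] letters = greeting.toCharArray();
        for (int i = 0; i < letters.length; i++) {
            // Print each position alongside the char stored there
            System.out.println(i + ": '" + letters[i] + "'");
        }
        // charAt(i) reads a single char without copying the whole array
        System.out.println(greeting.charAt(4)); // prints o
    }
}
```

Note that the comma and the space count as chars too, so “Hello, World” is twelve chars long.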

The sentence “Hello, World” is really represented as:

'H', 'e', 'l', 'l', 'o', ',',

And so on. Each item on the list is a char. Interestingly enough, a char is really two things at once: a number and a letter. In your computer’s memory, a char is stored as a binary number (in Java, a 16-bit unsigned integer between 0 and 65,535). Programmers often write those binary numbers in octal (base eight), or even more often in hexadecimal (base sixteen), rather than in decimal (base ten), for two reasons:

1. Computer memory is base two (numbers, and all other data, consist of only ones and zeroes). Eight is a power of two, but ten is not, so each octal digit stands for exactly three binary digits, which makes converting between the two easy for the programmer. Why eight, and not, say, four? Because eight is the power of two closest to ten. Had computers used 0s, 1s, and 2s, the equivalent would probably be base 9, the power of three closest to ten. With 0s, 1s, 2s, and 3s, though, we would jump to base 16, since 4 and 16 are equally far from ten, and 16 packs more information into each digit.

2. Computer memory is divided into groups of eight! There are eight bits to a byte (which is also why hexadecimal is so popular: one byte is exactly two hex digits). After that, sizes go up in steps of roughly 1000: about 1000 bytes to a kilobyte, 1000 kilobytes to a megabyte, 1000 megabytes to a gigabyte, and 1000 gigabytes to a terabyte. (When I wrote this, a typical computer came with around 500 GB, or gigabytes, of storage.) Note that these figures aren’t exact: in the binary convention there are 1024 MB in a GB, not 1000, because 1024 is a power of two (2^10).
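To see that binary, octal, decimal, and hexadecimal are just different ways of writing the same stored bits, here is a minimal Java sketch. It relies only on standard `Integer` methods (`toBinaryString`, `toOctalString`, `toHexString`, `parseInt`); the class name `NumberBases` is an illustrative choice. It prints the char 'a' in all four bases, and then shows that 26 in decimal is written 32 in octal.

```java
public class NumberBases {
    public static void main(String[] args) {
        char a = 'a';
        int code = a; // widening conversion: char to int, no cast needed
        System.out.println("decimal: " + code);                         // 97
        System.out.println("binary:  " + Integer.toBinaryString(code)); // 1100001
        System.out.println("octal:   " + Integer.toOctalString(code));  // 141
        System.out.println("hex:     " + Integer.toHexString(code));    // 61

        // The same value written in two different bases
        System.out.println(Integer.toOctalString(26)); // 32
        System.out.println(Integer.parseInt("32", 8)); // 26
        // Java also accepts octal literals written with a leading 0
        System.out.println(032);                       // 26
    }
}
```

The memory holds the same bits in every case; only the way we print them changes.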

So, now that I’ve schooled you on why a programmer might write the number 32 (in octal) to mean 26 (in decimal), I’ll explain what this all has to do with my interests… As stated earlier, chars can act as numbers! So, theoretically, someone could (call me crazy) apply math to text, and thus encrypt it!
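Here is a small Java sketch of chars behaving as numbers (the class name `CharMath` is illustrative). Doing arithmetic on a char promotes it to an `int`, so you get a number back; casting that number to `char` turns it back into a letter.

```java
public class CharMath {
    public static void main(String[] args) {
        char c = 'a';
        // Arithmetic on a char promotes it to int, so this prints a number
        System.out.println(c + 1);          // 98
        // Casting the result back to char turns the number into a letter again
        System.out.println((char) (c + 1)); // b
        // Compound assignment (+=) on a char variable keeps the char type
        c += 2;
        System.out.println(c);              // c
    }
}
```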

Imagine that the number for a is 36 (in Unicode it’s actually 97, but the idea is the same), and I want to encrypt it… I multiply 36 by a power of two and then divide it by two, generating a semi-random character. This a could now be ¥, or ®, or something else entirely! The secret to decryption is to apply the reverse math in reverse order: multiply the current char’s number by two, then divide it by the same power of two. Back to normal! :)
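The scheme above can be sketched in Java roughly as follows. This is a toy, not real encryption: anyone who knows the trick can undo it. The class name `ShiftCipher` and the choice of `KEY = 3` (multiply by 2^3 = 8, then divide by 2) are hypothetical. Because a Java char only holds values up to 65,535, the sketch refuses any char whose encrypted number wouldn’t fit.

```java
public class ShiftCipher {
    // Hypothetical key: multiply by 2^KEY, then divide by two
    static final int KEY = 3;

    static String encrypt(String plain) {
        StringBuilder out = new StringBuilder();
        for (char c : plain.toCharArray()) {
            int n = c * (1 << KEY) / 2; // e.g. 'a' (97) * 8 / 2 = 388
            if (n > Character.MAX_VALUE) {
                throw new IllegalArgumentException("char too large to encrypt: " + c);
            }
            out.append((char) n);
        }
        return out.toString();
    }

    static String decrypt(String cipher) {
        StringBuilder out = new StringBuilder();
        for (char c : cipher.toCharArray()) {
            // Undo the steps in reverse order: multiply by two, then divide by 2^KEY
            out.append((char) (c * 2 / (1 << KEY)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String secret = encrypt("a");
        System.out.println((int) secret.charAt(0)); // 388
        System.out.println(decrypt(secret));        // a
    }
}
```

Printing the encrypted char’s number rather than the char itself avoids depending on which symbols your console can display.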

JavaScript is not so lucky. :( It has no char type at all: a character is just a one-character string, and you have to call functions like charCodeAt() and String.fromCharCode() to convert between characters and their Unicode numbers. Unicode is the standard that determines which number each character gets. If different computers used different character codes, then if I wrote “I love you” on my computer, your computer might display it as “I hate yau”. Unicode is today’s standard character numbering system, and it’s what makes sure that my love letters carry out their purpose. ;)
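For comparison, here is a short Java sketch that asks Unicode for the number and official name of every character in “I love you” (the class name `UnicodeLookup` is illustrative). It uses the standard `String.codePoints()` and `Character.getName(int)` methods. Because every Unicode-aware computer agrees on these numbers, the message comes out the same everywhere.

```java
public class UnicodeLookup {
    public static void main(String[] args) {
        String message = "I love you";
        // codePoints() yields each character's Unicode number as an int;
        // Character.getName() looks up that number's official Unicode name
        message.codePoints().forEach(cp ->
            System.out.println(cp + " " + Character.getName(cp)));
    }
}
```

The first line printed is `73 LATIN CAPITAL LETTER I`.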

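Fun fact: the ^ symbol is a mathematical operator in JS, but it has no relation to exponents. It turns out to be the bitwise XOR (exclusive or) operator, which is documented on MDN: it compares two numbers bit by bit and outputs a 1 wherever the bits differ, so 5 ^ 3 is 6. For actual exponents, JS uses Math.pow() or the ** operator.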
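Java uses ^ for the same bitwise XOR, and it happens to tie back to the encryption idea: XORing a char with a key twice gives you the original char back. Here is a minimal sketch; the class name `XorDemo` and the key 'K' are arbitrary, hypothetical choices.

```java
public class XorDemo {
    public static void main(String[] args) {
        // ^ compares bits: 1 where the bits differ, 0 where they match
        //   5 = 101
        //   3 = 011
        // 5^3 = 110 = 6
        System.out.println(5 ^ 3);          // 6
        // Java has no exponent operator; Math.pow is used instead
        System.out.println(Math.pow(5, 3)); // 125.0

        // XORing twice with the same key gives back the original char
        char key = 'K'; // hypothetical key
        char secret = (char) ('a' ^ key);
        System.out.println((char) (secret ^ key)); // a
    }
}
```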
