I’ve received feedback that this binary stuff is too complex to understand. So let me start at the very foundation and build up from there.
In this post, I’ll talk about how we can use a smaller set of binary digits to represent large counts (aka “numbers”) and illustrate how endianness works.
Computers only serve two broad purposes: to identify things and to count things (I’m going to work hard to avoid using the term “number” and use “count” instead).
In our computer systems, we use counts (alright, numbers) to identify things, so let’s see how we can get counts into our computers. In the following “dummies” posts, I’ll talk about how we use the counts to identify things.
Let’s start with something we can all understand. Let’s use our hands as our computer, since our fingers and hands are similar to a computer’s configuration. In computers, we have only two states on which to base all of our counts: on or off. Likewise, with our hands we have two states: finger up or finger down.
With our two hands, we can see that we can count up to ten. But what if we want to count more than that? Let’s ask a friend to help by bringing their two hands.
Together we can now count to twenty. But we need more. We could just add more friends (more memory, in computer terms), but we need to be able to count to larger numbers using just our two sets of hands.
What if we devise a system where my friend stands to my right, and every time I reach a count of ten, I tell him, he raises one finger to represent that ten, and I lower my fingers and start counting again?
For example, you might stand across the street as we count cars, and by looking at our fingers you will know how many cars we have counted. Let’s say he has nine fingers up and I have seven: you will know that we have his ninety (his nine fingers count tens, so nine times ten) PLUS my seven fingers, or ninety-seven.
But we have a very busy intersection and we need to count hundreds of cars. What then?
We need another friend, and this one will count how many times my friend to the right fills up with tens. In other words, every time my friend to the right has all his fingers up, representing a count of one hundred, he drops his fingers and our new friend raises one finger.
Now, from across the street, you might see our new friend on my far right showing three fingers (three hundreds), my friend to my right showing six fingers (six tens), and me showing nine fingers (nine ones), so you would know that we have counted 369 cars. That is far more than the 30 cars we could have shown by simply adding up all of our fingers!
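To make the rule concrete, here is a minimal Python sketch of what you do from across the street, assuming we list each person’s fingers big end first (our newest friend, then my friend, then me). The name `count_from_hands` is made up for illustration.

```python
def count_from_hands(fingers, base=10):
    """Turn a list of finger counts, big end first, into a single count.

    Each step multiplies what we have so far by the base (everyone already
    counted moves up one position) and then adds the next person's fingers.
    """
    total = 0
    for f in fingers:
        total = total * base + f
    return total

# Newest friend: 3 (hundreds), my friend: 6 (tens), me: 9 (ones)
print(count_from_hands([3, 6, 9]))  # 369
# My friend: 9 (tens), me: 7 (ones)
print(count_from_hands([9, 7]))     # 97
```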
KEY POINT: If we know the maximum number of cars we will ever need to count, we know how many friends we need to help us. If we want to keep our friends, we won’t waste their time by asking them to stand with us when we know we’ll never count enough cars for them to raise a finger! In computers, we use “type containers” (such as Integer, Integer32, etc.) to hold our counts, and these containers are just that: containers that hold up to some maximum count. We’ll use these counts later for identification. For now, understand that we waste resources (friends) if we use containers that are larger than our counts need.
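Here is a rough sketch of that key point, jumping ahead a little to bytes (eight binary digits each). It assumes the common unsigned container sizes of 1, 2, 4, and 8 bytes; the names and exact sizes of integer types vary by language, and signed types give up half their range to negative counts. The helper `smallest_container` is hypothetical.

```python
def smallest_container(max_count):
    """Return the smallest common unsigned size (in bytes) that can hold 0..max_count."""
    for size in (1, 2, 4, 8):
        # A container of `size` bytes has 8 * size bits, so it can hold 2 ** (8 * size) counts.
        if max_count < 2 ** (8 * size):
            return size
    raise ValueError("count too large for an 8-byte container")

for max_count in (200, 369, 70_000, 4_000_000_000):
    print(f"{max_count:,} cars -> {smallest_container(max_count)} byte(s)")
```

Asking for an 8-byte container to count a few hundred cars would be like calling in seven friends who never raise a finger.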
So what about the Endians?
What if you, across the street, misunderstand our instructions that the friends to MY right indicate the ever-increasing sizes, and you think they are on YOUR right as you face us? Then instead of starting with my friend on my far right and seeing 3 – 6 – 9, you start with me, on your far right, and see 9 – 6 – 3, or 963 cars!
We must agree in advance which end of our line holds the big numbers and which end holds the little numbers: the BigEnd and the LittleEnd. In computers, this is known as Big Endian and Little Endian.
OK, so now we understand Endianness: it is an agreement about which end of a sequence of digit containers (people in our example, but bytes in a computer) is the big end and which is the little end.
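Python’s built-in `int.to_bytes` and `int.from_bytes` take the byte order (`"big"` or `"little"`) as an argument, so the agreement is spelled out right in the code. This small sketch stores our 369 cars in two bytes and shows what happens when the reader uses the wrong end.

```python
count = 369  # 369 = 1 * 256 + 113, so two bytes: 1 and 113

big = count.to_bytes(2, "big")        # big end first
little = count.to_bytes(2, "little")  # little end first
print(list(big), list(little))        # [1, 113] [113, 1]

# Reading with the agreed-upon order gets the right count back...
print(int.from_bytes(little, "little"))  # 369
# ...but reading little-endian bytes as if they were big-endian does not.
print(int.from_bytes(little, "big"))     # 28929
```

The bytes themselves are identical in both reads; only the agreement about which end is big changed, just like 369 turning into 963 across the street.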
Base 10, Base 8
In virtually all modern computers, the primary container size is not ten (I have no idea why) but eight: the fundamental container is called a byte, and it holds eight binary digits (called bits). We could simulate that with our friends as long as we used only fingers, no thumbs, and so had only eight fingers per person.
KEY POINT: in virtually all computer systems, we use this 8-digit (8-bit) container called a byte, and we build up our counting containers (types) from there! All of our counts are stored in these bytes, and the same count can be written as different numbers depending on the base we use to write it.
Fortunately, no matter what “Base” we use, the concepts are the same:
As we move from the Little End to the Big End, each new container counts how many times the previous container has filled up, so each position is worth the base times the position before it. So naturally, as we change the size of the first container (the “base”), our numeric representations change.
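The rule works in reverse too: starting from a count, we can find what each person should hold up. This is a minimal sketch (the name `to_digits` is made up for illustration) that repeatedly divides by the base; each remainder is what stays in the current container, and the quotient is what rolls over to the next friend.

```python
def to_digits(count, base):
    """Split a count into digits for the given base, big end first.

    Dividing by the base peels off the little-end digit (the remainder)
    and leaves the count that rolls over into the next container up.
    """
    if count == 0:
        return [0]
    digits = []
    while count > 0:
        count, remainder = divmod(count, base)
        digits.append(remainder)
    return digits[::-1]  # we collected the little end first, so flip it

print(to_digits(369, 10))  # [3, 6, 9]
print(to_digits(369, 8))   # [5, 6, 1]
print(to_digits(4, 2))     # [1, 0, 0]
```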
Let’s use our hands and friends’ hands once again to illustrate. But this time, we decide to not use our thumbs so we can only show a Base count of eight.
As we count cars, I will raise a finger for each car until I hit a count of eight. At that point, I lower my fingers and my friend to the right raises one finger. See? It’s just what we did with all ten fingers and thumbs, but now our base is eight.
So, assume I have one finger up, the friend to my right has six fingers up, and our newest friend has five fingers up. You would see the counts for each of us as: 561
But this number, 561, represents a count of cars and it’s NOT five hundred and sixty one cars since it’s in Base 8. Let’s see what the count is:
- My one count (1): one, PLUS
- My friend’s six times my full count of 8: forty-eight, PLUS
- Our new friend’s five times my friend’s full count of sixty-four (8 × 8): three-hundred-twenty
- Total: 1 + 48 + 320 = three-hundred-sixty-nine
Our count of three-hundred-sixty-nine cars, represented with all ten fingers on both hands (Base 10), is the number 369.
Our count of three-hundred-sixty-nine cars, represented with just the eight fingers on both hands (no thumbs, Base 8), is the number 561.
So the same count is represented by two very different numbers!
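Python can show this directly with built-ins: `int()` reads a string of digits in whatever base you tell it, and `format()` writes one count out in several bases. A quick sketch using our 369 cars:

```python
# Reading: the same count written two different ways.
print(int("369", 10))  # 369
print(int("561", 8))   # 369  (5*64 + 6*8 + 1)

# Writing: one count, four different numbers (base 10, 8, 2, and 16).
print(format(369, "d"), format(369, "o"), format(369, "b"), format(369, "x"))
# 369 561 101110001 171
```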
Fortunately, inside our computer systems this makes no difference, since numbers are mostly a convenience for us humans anyway. But as simple programmers, we need to convert and display numbers, so we need to understand how the counts are represented in the underlying system!
Interesting but What about Binary?
Remember when we started, we used ten fingers each and two people, and we could count up to twenty by holding up one finger for each car? If we had counted seven cars, we could hold up any fingers between us as long as they added up to seven, and the count would be correct.
But when we put a rule around the placement of the digits, we were able to greatly increase our counts since we could then use multiples of the previous container (person).
But still, in our Base 10 example, I could raise any two of my fingers and my friend could raise any two of his, and together they would represent twenty-two, no matter which fingers we raised.
But what if we made every position important? What if we made our container size two?
Since this is how computers work, let’s look at it.
Let’s line up eight chairs and I’ll ask seven friends to help me. We stand in front of the chairs and you go across the street as we count our cars. Now, instead of holding up fingers, we either stand or sit. When we stand, we indicate a count of one.
The first car passes and I stand up.
The second car passes: I sit down and the friend to my right stands up.
Now this is where it gets a little strange, so pay attention. Binary is typically written as one or zero (1 or 0), but each digit is really a switch that is either “on” or “off”. The key point is that both states carry meaning! So even though we show the data as zeros and ones, each “on” or “off” tells you whether that position’s count is included.
So back to our cars: two cars have passed, I’m sitting down, and the friend to my right is standing. You see: 10. Our rule of counting still holds if we remember that in binary we are really in Base 2. So our first place can be zero or one (two values). The second place can also be zero or one, but its one represents a full count of the base container, which is two.
So the binary 10 (big end on the left) is my zero PLUS my friend’s one times the full count of the base container (my position), which is two. And that’s correct.
When the third car comes, I stand up and you see 11.
When the fourth car comes, we need the next friend in line. He stands up, the two of us sit down, and you see 100, which is how binary represents a count of four. See? The concepts of counting remain the same! Cool.
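Here is a small simulation of the chairs, assuming three of our eight chairs are enough for four cars. The helper `add_one` is hypothetical; it adds one car at the little end and carries to the next person whenever a position fills up, which is the same rollover we used with ten fingers and with eight.

```python
def add_one(digits, base):
    """Add one car to a row of positions (big end first), rolling over as needed."""
    digits = digits[:]              # work on a copy
    i = len(digits) - 1             # start at the little end (me)
    while i >= 0:
        if digits[i] + 1 < base:    # room left in this position
            digits[i] += 1
            return digits
        digits[i] = 0               # this position is full: reset it...
        i -= 1                      # ...and carry one to the next friend
    raise OverflowError("not enough friends (positions) for this count")

chairs = [0, 0, 0]                  # three chairs, everyone sitting
for car in range(1, 5):
    chairs = add_one(chairs, base=2)
    print(car, "".join(str(d) for d in chairs))
# 1 001
# 2 010
# 3 011
# 4 100
```

The same function handles the no-thumbs example: `add_one([0, 7], 8)` rolls over to `[1, 0]`.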
So let’s summarize all we’ve learned with a final bogus example. I’m a geek and only have three friends. However, being a smart geek, I want to use my friends efficiently and still be able to count many, many cars. Unfortunately, one of my friends was born with no thumbs, so we decide to use eight fingers for each of us. We have 32 digits between us (sound familiar?). How many cars can we count? Let’s see.
We have to count in order from left to right, since the location of the digits makes a huge difference. As we just saw, we can represent four cars with only three digits, but now the math kicks in: our rule can be reduced to a formula.
If our base is “N”, we can determine the maximum counts we can show with each place as follows:
- Place 1: N
- Place 2: N * N (N squared)
- Place 3: N * N * N (N cubed)
- ...see the pattern?
So the number of counts we can show is the base raised to the number of places (positions) in the “number”.
In our Base 10 example, you can see that this works: the first place (little end) gives 10 counts, two places give 10 * 10, and three places give 10 * 10 * 10.
So the same rule applies to our binary. Since I use eight fingers, by myself I can count:
The base count of 2 raised to the eighth power, or two-hundred-fifty-six different counts using just eight fingers, including a count of zero (so up to two-hundred-fifty-five cars).
When I add my first friend, we can get a count of two raised to the 16th power or over sixty-five-thousand cars.
When all three friends join me, we can raise up to eight fingers each, for a total of 32 positions, which gives us 4,294,967,296 different counts (zero through 4,294,967,295 cars). I can do a lot with just a few good friends, even if they don’t have thumbs.
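A quick check of the formula, assuming eight fingers (bits) per person. The helper `counts_for` is just an illustrative name for “base 2 raised to the number of positions”.

```python
def counts_for(bits):
    """Number of different counts a row of `bits` binary positions can show, zero included."""
    return 2 ** bits

for people in (1, 2, 4):
    bits = 8 * people  # eight fingers (bits) per person, no thumbs
    n = counts_for(bits)
    print(f"{people} person(s), {bits} bits: {n:,} counts (0 to {n - 1:,})")
# 1 person(s), 8 bits: 256 counts (0 to 255)
# 2 person(s), 16 bits: 65,536 counts (0 to 65,535)
# 4 person(s), 32 bits: 4,294,967,296 counts (0 to 4,294,967,295)
```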
Think about this for a moment. With only four bytes of eight bits each (which matches our example), I can identify over four billion things on a computer, if we use counts to identify our things.
Summary – The Key Points
- We don’t care about numbers, we only care about counts.
- We use the counts to identify stuff in our programs.
- In virtually all computers, every count is built from 8-digit (8 “bit”) containers called bytes, and each bit is a Base 2 digit, so all counts are ultimately represented as binary numbers.
- When our counts exceed what our base container (a byte, in computers) can hold, we need to roll over into larger containers (just as we added friends in the examples above). However, when we line up these containers, we must know which end holds the big numbers and which end holds the little numbers, or our counts will be far off.
- To be efficient, we should group our bytes into the smallest containers that can hold our counts.
Application to Our DNA Data
- We don’t need ASCII containers, which are designed to identify every character in the alphabet (and more), when we only need to identify four items (A, C, G, T) or a few more.
- We should design a custom container where we define in advance which end is big, and build a custom loader in our program that always loads the data so its big-to-little-end order is always the same. This will make our programs platform independent.
- Once we identify our DNA with these very small counts, we simply look for matches by comparing the counts. For example, if we have an 11-mer, we get the count that represents that unique sequence of “letters” and compare it to our test count. We don’t care about the individual digits (unless we want to) when we do the comparisons, so this is extremely efficient.
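Here is a minimal sketch of those three ideas together, under loudly stated assumptions: the 2-bit letter codes in `CODE` are a hypothetical assignment (any fixed one works if writer and reader agree), and `kmer_to_count`, `save`, and `load` are made-up names, not an existing library. Each letter is one Base 4 digit (two bits), so an 11-mer needs 22 bits and fits in a 4-byte container, which we always store big end first.

```python
# Hypothetical 2-bit code for each letter; writer and reader must agree on it.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_to_count(kmer):
    """Pack a DNA k-mer into one count, first letter at the big end (Base 4)."""
    count = 0
    for letter in kmer:
        count = count * 4 + CODE[letter]  # move everything up one position, add the new letter
    return count

def save(count):
    """Store an 11-mer count in 4 bytes, always big-endian, whatever the platform."""
    return count.to_bytes(4, "big")

def load(data):
    """Read the 4 bytes back using the same big-endian agreement."""
    return int.from_bytes(data, "big")

a = kmer_to_count("ACGTACGTACG")
b = kmer_to_count("ACGTACGTACG")
c = kmer_to_count("ACGTACGTACT")
print(a == b, a == c)      # True False
print(load(save(a)) == a)  # True
```

Matching two 11-mers is then a single comparison of two counts rather than eleven letter-by-letter checks.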
Our next post will move to how we use our counts to represent everything in computers. This will help us understand fundamental things such as storage and data transmittal; but also help us with things such as data compression and encryption. Once you understand these concepts, even encryption becomes pretty simple.