This chapter has the following sections within this page:
First, let's start with a little philosophy relating to the meaning of Discrete Mathematics and Numbers.
Think for a moment about your algebra, trigonometry, or calculus classes. The basic notion there was a continuum. You would have
So, what makes it discrete? We assign numbers – integers – to particular points on or within those. All each is is a point on an otherwise continuous number line or within a continuous graph.
On the basic number line, the distance between the point 0 and 1, and then 1 and 2, and -1 to 0 are identically the same, a distance of 1. Each of those is just an infinitely tiny point on their lines. And between each of those numbered points are an infinite number of infinitely tiny points on the same line. This is what makes it discrete, those numbered points. With these integer values, we really only mean those distinct points with equal distance between them. With simple integers, you do not get to even describe those in between points unless you scale down tighter the distance between these numbered points. (And, having done that, you still have infinitely many points still between each of these numbered points.)
Whatever the precision we choose for assigning these numbers to points, there are also infinitely many of these such points, whether their scope is from negative infinity to positive infinity or from zero to positive infinity. Discrete Mathematics allows for these infinitely many points. Computer science, as you will see shortly, does not really. Where the scope in mathematics might be these infinities, in computer science we must also deal with real limitations which limit the scope we can represent. You will see more of this shortly, but, as an example, with a byte's worth of bits, you only get to represent up to the value of 255.
It is handy, usually, to just think of these integer values as counters. We can count from 0 to 1 to 2 to 3 and so on for as large as an integer can handle. All representing equal distance, or equal time, or equal entities, or what have you between them. Not fractional values, just discrete count values as integers.
To represent some of the values between these pointers, we can enter the realm of fractions, really just an integer divided by another integer, a dividend over a divisor. And we can leave it at that. Our computers also have a means of representing what we think of as fractional values via a single "number" as well. (Much more on that later.) But even these are little more than redefining the format of a very finite number of bits allowing us to measure in units smaller than an integer or the values between integers. But, no matter how precise we can define these numbers with fractions, there still remains an infinite set of values between even those. Again, just discrete values representable on a number line or within a graph. Discrete Numbers.
Whatever the format, for each type of number format that we have or can invent in a computer, it has a precision, a scope, a range, a maximum value and corresponding minimum. We can have remarkable precision with these numbers and wide scope, but the precision is certainly not absolutely precise and the range is definitely not infinite. As programmers, never forget that. Scope has limits and precision can get lost.
In your computers from a physical perspective, you likely know that practically everything in computers is based on binary, here binary numbers based on bits, 0 or 1, Yes or No, True or False. We hang multiple of these bits together in strange and wonderful formats, with many of these understood by the computer hardware, with those abstractions further abstracted by we programmers with the intent of representing most everything. All just strings of bits with an abstraction applied to those bit strings per how we want to use them. But yet, as numbers, they remain discrete in nature. No matter what our other mathematics and science courses tell us, we simply cannot enter the full realm of all that we know, only discrete points within it.
Just philosophy really. It will make more sense as we get into these numbers in more detail.
So, let's talk about what we can represent with those numbers. We start by counting.Ten. Ten fingers. To each we defined a number, 1 - X , X being also the Roman numeral for 10. Later, as humanity started understanding that we needed to track even having nothing, that became 0-9. And then what? 10, 11, 12, ⋯ 20, 21,⋯ with these requiring more digits of value 0-9 on the left. Conceptually, ultimately, to more digits than we can count. That is called Base 10. We could have done the same thing if we had used only five fingers; 0-4, 10-14, 20-24. Base 5. Or the magical number 12 as in our clocks; Base 12 (or Base 60). Or just one. With one finger 0-1, a second finger 10-11 (representing so far 0-3 in decimal), a third 000-111 (0-7), and so on. Base 2, Binary. All of them for counting. Counting using Binary Numbers and power-of-two-based fractions.
More formally for binary numbers
which we represent in a bit-string format as
where the letters here merely represent positionally either the value 0 or 1; examples being 10101010, 11111111, 10000000, to whatever number of bits. For example, the binary number 01010101 is understood, when interpreted as a decimal number to be
These are Binary Numbers for as many bits (each only 0 or 1) as we choose to string together in this format.
For Decimal (Base 10), it's largely the same concept, but using more digit values in each position. For Base 10, as you know, we have
And even that Decimal number being represented as a string of characters as
where M and F-A can now take on the values 0 – 9; 12345678, 11111111, 90000000 being examples.
Whether Decimal or Binary (or Base 3, 4, 5, ⋯, 8 ⋯, 16, ⋯) they are all just different formats for your counters. And they can be converted from one to another simply by recognizing that they have the same basic format based on powers of their base number.
So, back to binary. There is no theoretical limit to the number of bits you can append in this way. BUT the reality in your computers is a bit different as we will be seeing. The predefined sizes also define the scope, the range of values you can represent. We cover that scope, that range, next.
These are just numbers for as many bits (0 or 1) that we choose to hang together. As you can see in the table, the range of numbers that they represent (shown as decimal) is dependent on the size of that string of bits. Each with its own range, but nonetheless a limited range. And also only a counter, no fractions.
Bit Count | Name | Signed Binary | Unsigned Binary |
---|---|---|---|
8 bits | Byte | -128 to 127 | 0 to 255 |
16 bits | Halfword | -32786 to +32767 | 0 to 65535 |
32 bits | Word | -2147483648 to +2147483647 | 0 to 4,294,967,297 |
64 bits | Doubleword | -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | 0 to 18,446,744,073,709,551,615 |
You will have noticed that each has two different ranges. The difference is like those two number lines you saw above. One – called Signed Binary – assumes 0 is the center of those ranges and includes both positive and negative values, with range out to its full scope (for roughly half of the bits). The other – called Unsigned Binary – has the value of 0 at the left end and roughly doubles the range of positive values.
Each computer language has names for each of these sizes and representations. The most typical, almost a default, is the term INTEGER or INT, representing the 32-bit format and understood to be signed binary. We are not going to get into all of the variations for every language, but just know that there are typically types for all of these.
So, same number of bits, for all of these just a different interpretation and meaning. But why signed and unsigned and how do we tell the difference? It starts with the interpretation of the leftmost bit. If, say, a 32-bit value is to be interpreted as signed binary, the leftmost bit is 0 for its positive range and 1 or its negative range. As just an example for now, this is how you would interpret a set of different 32-bit strings.
32 bits | Signed Binary | Unsigned Binary |
---|---|---|
00000000 00000000 00000000 00000000 | 0 | 0 |
00000000 00000000 00000000 00000001 | 1 | 1 |
11111111 11111111 11111111 11111111 | -1 | Largest value ⋯ 4,294,967,297 |
10000000 00000000 00000000 00000000 | Most negative value ⋯ -2,147,483,648 | +2,147,483,648 (about half way) |
01111111 11111111 11111111 11111111 | Most positive value ⋯ +2,147,483,647 | +2,147,483,647 (one less than above) |
Notice again the range, something very important as we do operations on any of them; there is clearly a limit to all of these. We can count either direction to quite large numbers, but then what? We will see shortly.
As to the operations possible on these, the operations are most anything that we could possibly do on a number or even a sequence of bits. We add them, subtract them, multiply and divide them, compare them, shift them, truncate them, and more. And processor hardware logic exists to do a lot of it in well less than a nanosecond.
Let us take a look at a simple increment/addition by one. And to demonstrate about scope, we will add that one to the largest (positive) 32-bit signed binary value which just happens to be (as you saw above)
Again, that is 32 bits in size. Just as when you add one to decimal 999 (to produce decimal 1000), you get a carry to the next higher digit, overflowing the carry on the left. Same sort of thing with binary. We add the 1 starting with the least significant bit (the rightmost), with the result on that one bit being 0 and a carry of a 1. That carry gets propagated to the next bit, with that carry propagated ultimately through all 32 bits until we reach the leftmost 0; there we add the carry to the one-bit 0, producing the most significant 1 bit. By incrementing, by adding that one, our initial number
becomes
From a binary addition point of view, this makes sense right? If both numbers were interpreted as unsigned binary, all we really did was to add one to +2,147,483,647 to get +2,147,483,648. You can see that in the table above.
But as a 32-bit signed binary interpretation, there is a problem. The first number just happens to represent the most positive 32-bit signed binary number, there is nothing larger in 32 bits, and we want to add 1 to it.
Stepping back, then, go look for those two binary numbers in the table above. We are adding one to the most positive 32-bit signed binary number and the result has us rolling around to the most negative 32-bit signed binary number. Not really what you intended. What we really wanted was one more, but because of the limited number of bits – and there are always a limited number of bits – you get this as a result.
Reversing all of that, decrementing one from the latter number, we roll back to the former. Again, not what you want. That's scope, that's range. Bottom line ⋯ Be Careful. When defining your data types, be sure that you understand the limits of what you want to represent.
As an exercise for you, using 32-bit unsigned binary numbers, what happens when you add one to the largest number? (Hint: You get it wrapping back 0.)
Hexadecimal is a number base in its own right (Base 16) and follows all of the same rules of operations as on any base. For most of your experience in computer science, though, you will find it useful mostly as a shorthand for binary. You will understand in a moment, but as shorthand, it avoids you having to write 32 or 64 bits in a sequence and instead allows you to easily represent those same binary values in a quarter of the characters. In some programming development environments, even early programmers can have already seen hexadecimal in what appear to be error messages associated with addresses.
So, Hexadecimal. In Base 10 (Decimal) you have ten distinct characters, 0-9. Base 2 (Binary) has its two distinct characters, 0-1. For Base 16 (Hexadecimal), we need another six characters over and above that available in decimal. I do not know the history on it, but those 16 distinct characters are characters 0-9 and then additionally A-F.
The number 16 of Base 16 is also 2^{4}, meaning that each Hexadecimal digit (0-F) can represent the same values as four binary bits. And that mapping is as follow:
A time will come when this is second nature to you. For now, just use this table.
Again, though, for most of your purposes, you use this as a shorthand. Our number, the most positive 32-bit signed binary number is as bits
which when represented in hexadecimal is just
Each four bits is just a hexadecimal character. Reversing that, going from hexadecimal to 32 bits, the hexadecimal number
is just
You can, of course, do Base 16 math with those hexadecimal characters, just as you can do with decimal, but it is sometimes just easier to quickly reformat it as binary.
Aside from error message or accidentally printing an address from your programs, you will not often see hexadecimal. It tends to be seen more frequently when viewing a dump of memory when the organization of the contents of that memory is unknown like you might see when working in the bowels of an operating system.
Processor architectures also understand a type of binary data which most of us consider to be numbers with fractions. Here, though, the fractional part is integrated into this data type. It, too, though, is just a string of binary bits, with a format understood by the hardware and program compilers to be this thing with an integrated fraction. It is the format of those bits that carries the information.
There are a number of such formats of bit strings that do just that, some 16 bits, 32, 64 and 128 bits in size. We will look at just two of them here. With these sizes and formats we can interpret them to represent either very very large numbers or smaller and very very small numbers with fractional parts. In many cases with these formats, absolute precision is possible, but not always. These are Binary Floating-Point Numbers. Below are shown the typical 32-bit (single precision) and 64-bit (double precision) formats.
Both of these formats are interpreted as
Fortunately, you do not really need to memorize this, only note that there are a range of power of two exponents – both positive and negative exponents – a fractional part typically led by an implicit value of 1, and overall positive or negative sign.
To help you get your head around this suppose this format had an exponent equal to the bias and a fraction field of all zeros. So,
To represent the number 2, the fraction remains the same but the exponent becomes one larger as in
The number 3? In Binary that would be '11'B. In this format it becomes
For fractional additions on any of these, just add non-zero bits in the fraction field. For example, the number 1 1/2 becomes
Now go back and look at the difference between the numbers 3 and 1 1/2. The difference is only in the exponent; the latter is half – a power of two – of the former.
A bit strange, perhaps, but it works. For negative numbers, just set the format's "s" bit to 1. For positive numbers smaller than one, the "Exp" field is just set less than the bias.
With an ever increasing "Exp" field, we can produce some very large numbers. Decreasing that "Exp" field down to zero, we can also have some very small positive numbers as well such as
or, if you like, the same as
Of course, you also have quite a few "fraction" bits. But, as we will discuss shortly, with these fixed number of "fraction" bits, there is a tradeoff occurring with the precision of the number you want to represent and the range controlled by the "Exp" field. As with straight up binary integers, these are still just numbers – with fractions – on the number line there still remains even with these an infinite number of values between any two even consecutive versions of them. As a programmer, please be aware of that.
These binary floating-point numbers can, of course, be manipulated at least using the operations of addition, subtraction, multiplication, and division, but there are more.
Bottom line, you can count with these and you can estimate, some of very large numbers, very small numbers, including between 0 and 1, but always with discrete values, all limited by the number of bits you decide to string together to do that. After all, there is nothing particularly special about any of the formats shown above, except that the hardware itself is capable of very quickly manipulating them.
Hardware or not, you, yourself, can similarly string together bits, format them in various ways, write functions that work with them, and have them still execute relatively quickly. For example, consider formatting – say – 1024 bits using a structure like that above, and with that you have even larger numbers (and smaller) and fractional parts with even more precision. Against these you can execute the usual operations of at least addition, subtraction, multiplication, and division. If that is what you need to meet your needs, just do it. For example, if doing a multiplication, you would add the exponents and multiply the fractional parts. If addition, you would right shift the fraction of the smaller of them until the exponents are equal and then add the resulting fractions. It is all just strings of bits with the format carrying the information.
We programmers get to do a lot with these numbers. But that which we call "variables" in our programs are not that which is called a variable in algebra. Our variables are just boxes which get to hold only a finite set of different values because of their size as strings of binary bits. Floating-point variables get to hold a larger set, trading off resolution to do it, but it is still a finite set. Each of your program variables gets to hold only discrete values.
Next we dig down further into the notion of precision.
With counters, Integers (32 bits or 64 bits), you get the precision of a counter; no fractional parts along the number line. You also get the range of each; up to 2^{32} or 2^{64}, respectively. Big, but keep in mind that 32 bits, interpreted as signed, only gets you the ability to count up to about positive 2 billion.
With floating-point, when using smaller exponents, you are entering the realm of fractions along the number line. But as with practically all things in computers, we are still talking about binary and powers of two. With the 32-bit form – this being float in at least C++ and JAVA – it is possible to have fractional precision down to less than 1/2^{23}. Very tiny. For 64-bit form, double in C++ and JAVA, the precision moves down to below 1/2^{52}. Really very tiny. Still, binary. The fact that you can convert these formats to and from some decimal number with digits to the right of the decimal point is not relevant to the precision nor can it necessarily even be done precisely. For floating-point numbers between 0 and 1 - for example - it is still just power of two fractional values along the number line between 0 and 1. Just discrete values. And, as you'll be seeing, that precision changes as we move up the number line.
Let's do that and move up the number line from representing number between 0 and 1 up one power of two to represent numbers between 2 and 4; we did that by having merely increased the exponent by 1. Notice that the range doubled when we went from a power of two exponent of 0 to 1. What happens with the fractional part's precision when we did that? We after all still only get 23 or 52 bits to represent the fraction. Between 2 and 4, we get the same number of discrete fractional values, but those are spread over twice the range (from 2-4, up from 0-1). The precision between consecutive values of these fractional parts is lessened by a power of two. Keep incrementing the exponent and so moving your way up the number line, 4-8, 8-16, 16-32, all the way up to 2^{Exp-bias} for the 8 or 11-bit exponent values. The precision gets worse by a power of two for each jump up the number line. Again, we only have 23 or 52-bit fractional parts. It is not that this is bad, it just is, for you to be aware of. In fact, if you look at the range of the exponent values, for ever larger exponents, after a while you will notice that the numbers you are representing do not really even have a sub-integer fractional part. (At some point, integers have the greater precision than these floating-point numbers.) Same number of bits, different format, different interpretation and precision. It's all a trade-off.
To help clarify this, let's work with the 32-bit form, again shown below:
and recall how this is interpreted:
So, in the table below, let's work our way up in powers of two where the Exp-bias is shown as the Power column in the following table.
For numbers in the range of 1 through – say – 16384, we will just call precision of the fractional part of the number as being small. But, as our floating-point number increases above that, you can easily see that the precision worsens, well above what the precision you even have with basic integers. Said differently, the larger your number, the poorer the precision.
That has nothing at all to do with your ability to measure precisely, only with this format's ability to represent what you measured as a number. Depending on the size of the number, some precision might be lost transitioning from the precision of your tools into this format. This is merely a tradeoff, a tradeoff resulting simply from that fact that you are using only 32 bits of information. With 64 bits and the resulting larger fraction, the effect is the same, but the range of "small" precision is also quite a bit wider.
Of course, we can reverse that. The lowest "Power" is not 0, it is a negative number because the bias can be greater than the Exp field. Let's decrease the exponent (i.e., the Exp-bias) into the negative power of two range, and move ever closer to 0 on the number line. Each decrease of the exponent by one (from 2^{0} to 2^{-1} to 2^{-2} to 2^{-3}, and so on) effectively improves the precision by a factor of two. The precision for numbers between 1/2 and 1/4 is worse than that of numbers between 1/4 and 1/8 and so on Increasingly precise, but nonetheless discrete. And its for exactly the same reason; the range on the number line shortens, but we have the same number of fraction bits representing values between. Still, it is not absolutely precise; you only have 32 or 64 bits to work with after all.
Slightly changing subjects we are next covering Rounding. Just as you – and your calculators – can do with your traditional powers of 10, when doing mathematical operations on binary floating-point, results can and do get rounded if absolute precision cannot be maintained. With decimal, instead of rounding up or down to the next power of 10 value based on the desired number of fractional digits, we are of course talking about rounding up - or down - based on powers of two. It happens that the floating-point hardware is capable of multiple forms of rounding. Fortunately (unfortunately?) there is a default, round-to-nearest. From Wikipedia,
Java provides no way to alter this default. C++ does. The options are
What happens in the hardware when doing the usual operations on these numbers is that, while doing so, that hardware keeps a few more bits of significance on the low order bits. With the completion of the operation, these soon to be lost significant bits are used as part of the process of rounding. Cool, but it is necessary simply because absolute precision is just not possible.
Whether 32-bit or 64-bit numbers, processor hardware supports the rapid multiplication of binary numbers, up to a point. As has been pointed out, all that these numbers really are is just contiguous strings of bits perceived as such. Your numbers can be any length string of these bits. The trick, of course, is to provide operations – like multiplication – which understand and operate upon these bit strings. So, how do we execute a 64-bit multiply on our own?
Let's start with what you already know about decimal multiplication of any length. We will multiply 321 (the multiplier) by 123 (the multiplicand) manually. What do you do?
Well, that is sort of what we do.
When we do it ourselves, we tend to actually shift the multiplicand to the left by one and then two digits, padding a zero or two on the right, multiply each part, and then add. Although correct, we don't really think about padding each digit of the multiplier on the right with one or two zeros. We instead think in terms of shifting the multiplicand.
So, why did I create that mental image for you about decimal multiply? That happens to be a way of implementing binary multiplication. And, fortunately, all that we are going to be doing is shifting for each bit of the multiplier and adding when the next bit of the multiplier is 1. Afterall, you don't really need to multiply by each '1'b.
Consider the following examples:
In the first, since every bit of the multiplier is '1'b, we are going to just shift the multiplicand left for each bit and then add those. Another way we can think of this is that – like the decimal multiply where we multiplied by powers of 10 – here we are multiplying the multiplicand by powers of two (i.e., 2^{0}, 2^{1}, 2^{2}, 2^{3}) to account for the position of each bit in the multiplier.
In the second, two of the multiplier bits are 0. Those we are still effectively shifting, but we can ignore them in the addition, just as you would with a decimal digit of 0.
Of course, the multiplier is here only four bits, but you can see that this same algorithm can be used for any number of bits, as wide as you want to make this. And, of course, with each step there is a a shift and then add, creating a partial product along the way until we reach the most significant '1'b of the multiplier. The point? You do not need to limit yourself to – say – a 64-bit string of bits. You can create binary strings as large as you need them to be and implement for yourself all of the operations you implicitly use in existing programming languages.
The previous section dealt with fractions and the precision of fractions given some binary-based number format (a.k.a., binary floating-point) representing both. With the calculation of post-division remainder – and in computing the MOD operation – we are largely saying that we will treat a division result's integer part and fractional part as two separate things. Usually, we think of a remainder as just the stuff that is left over after dividing discrete integer values when the result is not again strictly an integer. You know, 7 divided by 2 is 3 1/2; the 1 above the 2 is the remainder. If there is a remainder, the 1/2 is sort of like saying "We did what we could, but you divided by 2 and the 1 is what we have left over". And this post-division representation is precise.
But it is also more subtle than that. It is also what you have when – say – you are interested only in the hours of a 12-digit clock. It is modulo counting, with whatever the number system is. For the clock it is modulo 12 or perhaps modulo 60. For example, it's now 6:00 and I add 10 hours to that; you know that the clock has 12 hours, so the result is 4:00, a Base 12 (Modulo 12) operation. Or I can divide 50 hours by 2 and I get 2 days (48 hours) with 2 hours as a remainder. This one is a Base 24 (Mudulo 12) operation. 10-based counters (Base 10) are just one of them. So, let's start there, with Base 10 (Modulo 10) counting.
With Base 10, the only possible numbers are the integer values 0-9. Negatives of these also exist, of course. If you divide 20 by 10 you get precisely 2. No question. Divide -31 by 10 and you get -3 with something left over, specifically a -1 as a remainder, so -1/10. Or we could use the more traditional form of -3 1/10. You know they mean the same thing. You understand that to be a precise value either way. (Keep the negative remainder in mind for later; it will be important.)
If you are familiar with traditional Long Division, no matter how big the numbers, the Remainder – and usually MOD – is whatever is left over; it is the value remaining and less than the Divisor after the step-by-step division is done between the Dividend and the Divisor.
I trust you are familiar with that more formal terminology of integer division as in
Nothing special here, right? Or said differently, having multiplied both sides by the divisor, we get
Now let's think about that equation another way, from the point of view of your program. Suppose you have an integer dividend and an integer divisor and let's have your program precisely divide these. That means, when done, you have an integer quotient and potentially a non-zero remainder. But, as integers, you have no idea whether there was a remainder or not. So, what is that integer remainder, even if it is zero?
So, Step 1, you do the only thing you can do and that is to create a quotient simply be dividing.
Step 2, given that integer quotient, turn around and multiply it with the by divisor, producing what we’re calling here an integer partial dividend.
Step 3, you get the integer remainder by subtracting that from the actual dividend.
As an example, recalling that everything is an integer, let's divide 11 by 5 to find the remainder:
Let's do that again, but this time with a negative quotient (as a signed binary integer):
This is the grounding for modulo arithmetic. But there are a couple of quirks.
Let's talk, for example, about MOD 24 and a 24-hour clock. Given that it is 12:00 midnight right now, what time is it in hours if we are interested in a time 48 hours in the future?
48 hours – our dividend – is divided by 24 – our divisor – and we get 2 – as in two days – as our quotient and 0 as our remainder representing hours – as in 12:00 midnight. Easy enough.
OK, next, a subtle change. What time is it 49 hours in the future given now is 12:00 midnight? Same number of days, of course, but the remainder is 1 hour, meaning 1:00 in the morning. Modulo 24 on 49 is 1 as the remainder.
Play the same game with a 12-hour clock, and again starting at 12:00.
still meaning 1:00. Or, saying the same thing,
Let's next get a bit weird with – say – a 10-hour clock, Mod 10. Midnight is 10:00. What time is it 42 hours in the future starting at 10:00?
So, the answer is 2:00 on a 10-hour clock. We care only about the remainder.
You can see the same as a picture here:
Modulo arithmetic, Base 10.
So, how should we be perceiving negative Modulo arithmetic? We saw what it does for remainder, but what of Modulo arithmetic? Is it the same? After all, everything we have discussed so far with Modulo arithmetic has assumed all positive values. So, what are the rules when our dividend is negative and the divisor is positive? Again, is it the same as for remainder? (We can ask the same question in the case where the divisor is negative and the dividend positive, but you can easily see that they are the same.)
In such a case, doing simple division, you know we would end up with a negative quotient. But what of the sign of the remainder and by extension of MOD operation, and what is its value in any case. Said differently, if doing a MOD operation on a negative number – with a positive base – what is the result?
Again, this might seem weird, but you, as programmers, might need to be aware. It is not what you might expect.
To give you a partial answer right up front, if dealing with dividends and divisors with different signs, you will get different values for remainder and MOD. So, be careful. Let's take a look at why.
Picture a number line, for now just on the positive side of 0. And on it we are going to divide 11 and -11 (both dividends) by positive 5 (divisor). Easily enough, you know that the quotient will be 2 and -2, respectively. Similarly, you know that the remainder for +11 / 5 is 1, and, additionally, for MOD base 5 as well.
If you did that same division using long division, you would get the same quotient value of 2 and a remainder of 1. Same results, given positive values.
So, what of -11?
Doing simple division, you would get -2 as the quotient. You might also presume that the remainder ought to be -1. After all,
But, now, check that out both positive and negative 11 with a tool like MS Excel as in
Or write a short program using the MOD operation - % in C/JAVA - to get these out.
You will indeed get 1 for the positive 11 as expected.
But what about latter as in
It is not 1; it is 4. The MS Excel function running MOD produced 4. Other languages supporting the MOD operation will produce the same thing. Why?
Keep in mind that we are doing this to help drive home what a Mod operation really is.
We will picture both of these, first on a number line and then with a clock with 5 digits (0-4), a MOD(5) clock.
Again, we are dividing 11 by 5. On the positive number line we get two regions of 5, totaling 10. To get to 11, we have to add in the remainder value of 1. Nothing special, right?
Now divide -11 by 5. What is the remainder? From above, we might want it to be -1, and it is, as a remainder.
We again have two regions of 5 (actually -5), totaling -10. To get to -11 we add in a remainder of -1. Again, pretty much what we would expect for a remainder operation. But what of that claim of +4 for
Let's look at a 5-digit clock as in the following and redo the positive case first as in +11 MOD(5).
Starting at 0 – at the top – we work our way around the clock in a clockwise direction 5 units and then 5 more units (i.e., twice for a quotient of 2) and then one more unit (i.e., the remainder of 1) to get to 11. Again, pretty much what we have believed for forever. Now, back to the -11, -11 MOD(5).
Again, the dividend is negative. So, we reverse the direction of the clock, moving counterclockwise. Ten units in that direction is two times counterclockwise around the clock, five at a time, starting at 0, brings us back around to 0. To get to -11, we go one more still counterclockwise. The result is 4, not 1.
See the effect? For positive numbers, the notions of remainder and modulo arithmetic are practically identical. Be careful, though, when dealing with negative numbers.
Still, even using our number line as a model, we can get the same result for MOD. Interestingly, if, having done manual division shown above to get the remainder and you do produce negative remainder (like -1), you can find the MOD value by simply adding in the Base value (here Base 5).
Producing what is the correct answer for MOD, again when dealing with negative numbers. You can see why this works by looking again at the clock-like graphic as well.
Perhaps you are familiar with the data structure called a Hash Table. If not, its purpose is to allow you to more rapidly search a list for some key value by essentially breaking the one big list into multiple lists, and from there only processing one of them, now considerably shorter. With a Hash Table, it ensures that all of the keys in any one of the multiple lists share the same Congruence Class. We will define that shortly. This, too, relates directly to the MOD operation.
To explain, suppose that you have a simple data structure – like a linked list – representing the key values of 10 through 59 as in the following. You could search the list, one item after the other for your key value. Once found, you would use the data found there in that node. For this tiny list, worst case would result in having to touch 50 items. No big deal here, of course. Up into the realm of thousand and millions, it matters. You want to find your item faster than having to touch most if not all of them.
We break that one long list up into multiple lists. How? By fronting each list with a pointer found in an array. The trick is to determine which list has your key value. You will notice that I have arbitrarily decided to make it 10 lists, just for clarity. This is 10 lists fronted by a simple array of – here – ten elements, each pointing to a different list. So, what made the decision of which list?
This figure shows the input Key Value as feeding some function, with the function's sole purpose to spit out an index into that array. Again, what we want first in an index value into this 10-entry array.
Take a look at the values of the keys in each individual linked list. What, here, is that function on our Key Value?
Momentarily, back to the term Congruence Class. Given any one list, the key values within a congruence class happen to be the key values represented in any one list. The set of key values in any one linked list are the congruence class. The "class" is a set. Array entry 0 is addressing nodes of the set where the keys are all divisible (with remainder of 0) by 10. Those keys make up this one congruence class, this set. Array entry 1 addresses a set where, if divided by 10, has a remainder of 1. Array entry 2 ⋯ Remainder 2 and so on.
So, again, what is that function on the Key Value in the blue box?
Here it happens to be MOD(KeyValue, 10). It is the remainder resulting from the division of the Key Value by 10, a value which is the array size. The array size was arbitrarily chosen to be 10, so we are using Base 10. Change the array size to be – say – a prime number and the base is that prime number. Of course, if that is your choice, the sets that make up each congruence class change with it. An array size of – say – 31 merely means that the base becomes 31. Doing so, and the set of keys in the nodes in the linked list addressed by array element 0 will be those keys that happen to be exactly divisible by 31 (with remainder 0).
In choosing that size and that function, your intent is to both limit that search time by limiting each list length and to keep the list lengths roughly equal. That's up to you and the likely distribution of your keys.
Back to Congruence Class; the key values in the set of nodes in any one linked list all share the same remainder value, the same MOD value. In this Hash Table, the "function" generates nothing more than the MOD value, which is in turn the index into the array. The indexed array entry anchors the linked list, each entry of the list holding a key value of the same Congruence Class. Of course, each list entry contains more than just the key value; the key merely represents the search parameter. By reorganizing the one list as many lists, we have not really changed what this data structure represents.
In this section we are going to use some interesting effects of MOD operations as a bit of a trick to find MODs on numbers that are really huge.
First, we start with an interesting fact about exponents. We know that
On MOD, the useful facts are
In words, the MOD on the sum and product of two (or more) numbers is the sum and product of the MOD of those same numbers.
For now, just keep these in mind.
The typical integer size in computer languages is 32 bits. If that is a signed binary quantity, its maximum size is 2^{31}-1, a positive integer. Big, but not all that big, a couple billion. Fortunately, most processors support 64-bit binary numbers. Because languages treat that size differently across languages, I will type the signed version of that as INT64 and the unsigned version of that as UINT64. Maximum sizes for those, respectively, are 2^{63}-1 and 2^{64}-1. (The -1 is to account for the value of 0.) Huge, but sometimes (rarely?) not huge enough.
These are just sizes of binary numbers understood by the processors. But you need not stop there, even on a computer. It happens that you can string these big binary numbers together to create still larger numbers by just treating one string of bits as being to the left of the other, and on and on. Addition/subtraction operations on these – say – 128-bit monstrosities can be handled by simply adding each half, taking into account the carry/borrow between them. If, for example, an addition of these has a carry from the lower half, we simply need to add one to the upper half. I could go on and talk about multiply/divide on these as well, but in this section it turns out that we do not have to.
In this section, we are interested in doing MOD operations on such very wide numbers of your making. Some operations in cryptography fall into this category. You will see shortly, that the base on these large numbers does not matter. In the coming example, to help get the point across, we use binary numbers.
Let's, again for this example, have a number as in – say – 2^{70} and we want to take the MOD (the remainder) of that with an arbitrary base with an arbitrary base, but our example number could be anything.
You will remember from above that you can do MOD operations by manually calculating a remainder. You would do a division (of divisor into dividend) to get a quotient, integer multiplied that quotient by the divisor, and subtracted that result from the dividend to produce the remainder. Works great for something that a processor can do quickly, but we are here interested in far bigger numbers. Even so, even for those a manual form of multiplication and division can be done as a function, doing something along the lines of what you do manually in Base 10. Each one would, though, take a while. Still, you do not have to for this. We are going to use the hardware's ability on fewer bits to do MOD operations.
So, we are using our at first simple example: 2^{70}
Again, from exponent math,
we can break our 2^{70} into multiple smaller pieces as in
Whatever mix works.
Nice, and it will be useful, but we are interested in the MOD operation in whatever base, so that alone does not help much.
Here is where we use
This is modulo multiplication. Said differently, if our huge number can be factored into smaller numbers – here A and B – we can take the MOD M of that huge number by taking individually the MOD M operation on A and B and multiplying those results.
So, our simple example of 2^{70} (MOD M) is the same as
or perhaps
I chose for this example a simple power of two number to get the point across. It is easy to see how to factor such a number. But our factors could be most anything. The trick is to break the number into products of numbers small enough for you to process within a computer as those smaller pieces.
Even so, in what's coming, we are counting on our ability to think of any number as being a binary number. Afterall, that is what they often are in a computer in any case.
Next, we are also going to use the fact that
With that in mind, let's picture ANY number, pretty much no matter how large, and present it in binary. What is a binary number after all? It is the sum of powers of 2 as in the binary number
where each letter is either a 0 or a 1. Our number of interest is
A * 2^{0} + B * 2^{1} + C * 2^{2} + D * 2^{3} + E * 2^{4} + F * 2^{5} + ⋯ + X * 2^{N}Remembering the MOD over addition above, this binary number representation is a sum, and there is no limit to N. For each '1'B, we find its power of two number and add those together.
If N is small enough for the computer to process the binary number directly (less than 64 bits for instance), we can to take the MOD M against that entire number.
OR, for something larger like we are working with, we can use the MOD M as in the above formulas on the value of each '1'b bit individually. For example, '1010'b is really just 2^{3} + 2^{1} so we use MOD M on both of these and use the modulo arithmetic from above. We can do this for whatever number of '1'b are in our number. You can easily picture a simple loop, finding the location of each '1'B in that binary number, take the MOD M on the corresponding power of two number, and then add those together. And, for really big power of two numbers, we also use the multiplication rule from above. Perhaps it is even quite fast enough.
We could do it this way, but why stop there? There is another way. As long as the binary number we are intending to take the MOD M against is small enough for the processor to calculate, let it do it. That will be explained next.
So, let's continue to suppose the entire number is too big. We can still break it into pieces. We can scale the higher order parts of that and use the multiplication formula above to do the scaling. The scaling shifts the high order bits right into a range where the processor can do the MOD.
As a case in point, let's for now just think about an arbitrary relatively small 32-bit integer as in the following:
That is no different than its four component bytes, each appropriately shifted – read that multiplied – and added. The four shown 8-bit pieces represent exactly the same thing as our 32-bit number above.
So, repeating our modulo formulas for multiplication and addition, we have
For the multiplication part of this, the A is our smallish byte value and B is our power of two number. In this example, our A just happens to always be '01010101'B (but could be anything) and B varies across the values 2^{24}, 2^{16}, 2^{8}, and the ignorable 2^{0}. Of each of these pieces, we take MOD M using the hardware and then multiply and add up all the pieces. Remember, for really large powers of two, that can be repeated as in our example