This chapter has the following sections within this page:
You have dealt with Sets for, well, forever. It is nothing more than a collection of stuff, typically unordered and with different elements. Every data repository that you have ever created, be it an array, a linked list, a graph, a Collection object, or a database with its table is a set of items with a common meaning between its members, its elements. From each of those you ask questions, creating subsets, occasionally subsets with one or no elements. You have very likely been exposed to Venn Diagrams as in the following figure, with one with two sets intersecting. There too you ran into the notion of operations on sets such as OR (and Union), AND (and Intersection), and even variations such as Exclusive OR.
Although these figures appear continuous in nature, you know them to just be packages of stuff, all representing discrete items. Nothing special.
Even so, as with other sections, this is about a language, the standard notation, a formalism, here covering the concept of sets. What makes it special is the new formalisms you will be exposed to. So, we start with formal Representations of a Set, which come in three forms.
The Statement Form is not much more than a well-defined word-based description of the elements of the set. For example:
Simple enough, this is just a listing of the elements. An example is the output of an SQL SELECT statement executed against tables of a database. Here, though, that listing is enclosed in braces and separated by commas, sort of like the definition of a static array in some programming languages.
These are defined by their common property in a format as in the following:
In words: The set of all elements x associated with a function p(x).
You also might be familiar with the For Each in some languages as in
Like that, where "cars" represents a set of car brands. Of course, as here the set has a common property, here being car brands.
Again, much like the rows of your database table or your array of objects.
A variation on these is the Interval Notation. We are still talking about discrete members, but – say – with an x limited as integers, given real numbers a and b we can say
Notice that this is your basic range notation from algebra and number lines. As we will see in a moment, it might be more precise, though, to write this as
meaning in words:
Set theory also uses a still tighter notation via the symbols ∈ and ∉.
Using our set of vowels earlier,
Said differently, given a set
Next, some more set symbology ⋯
Cardinality is nothing more than the size of the set, the count of the number of discrete elements in the set. The notation used for the cardinality of a Set S is
These sizes, being just integers, you can compare the cardinality of two sets X and Y as in the following:
Equality for cardinality does not necessarily mean that the sets themselves are the same, only that they are of the same size.
But, if
X ⊆ Y and if
|X| = |Y|, sets X and Y are equivalent.
It also follows that if
X ⊂ Y, a proper subset, then
|X| < |Y|.
As a couple of cases in point of cardinality,
We have hinted earlier that sets can be defined via functions such as one finding the set of all integers between two real values;
The concept is much more broad than that. It is instead most any function for which there are solutions. You can phrase that is being statements which are either True or False, but producing sets from the results when the statement is true.
For example, consider the statement for the set
If we were to ignore the integer test for x^{2}, we have a continuum of possible solutions between 2 and 10, not really a set of discrete values. Add the integer test, and we find that from within this limited number line we have only a few values for which this statement is true; for all the rest of the real numbers the statement would be false. Notice also that this does not require any x to be an integer, only x^{2} is an integer. X, for example, could be √5 or √7, neither of which are integers, but their squares are (i.e., 5 and 7). In fact, if you work at it, you might notice for this reason the cardinality of this set is 95.
As usual, Discrete Mathematics has a term for these functions: Quantifiers. It also has associated symbols:
The symbol ∀ can be read as "for every" or "for all" or "for any". So, our statement above could be written as
or more formally:
Don't get hung up on the symbology, it merely means the same thing as the sentence preceding it.
Choose your function. It could be most anything of your choosing. So, still more generally
called the Universal Statements (or Univeral Quanitifiers) where U is any value and p(x) is your function on that x. The function p(x) is mapping each of your x values to another value. And, again, only those discrete x values for which the function is true are in the set. A universal quantifier states that an entire set of things share a characteristic. Words that describe an entire set, such as "all", "every", or "none" are called universal quantifiers because that set could be considered a universal set.
Another symbol is that used for Existential Statements (or Existential Quantifiers). The symbol used is ∃.
The words going along with ∃ include "there exists" or "there is" or "there are". An existential quantifier states that a set contains at least one element. As with the Universal Statements the general form reads like
is defined to be True if and only if there exists such an x where the function p(x) holds.
For example:
The cardinality of this set is 8. Why? (The values x must be integers and cannot include 10.)
So, again, the symbols
Next, we will be playing with a quirk of these statements. Recall that quantified statements refer to the statements being True or False. Something interesting happens when we negate – or state the opposite of – a quantified statement. This means that you can reverse the logic on your statements; you can Negate Quantified Statements.
As some examples of negating quantified statements:
It will take some thought, but formally, and using the following symbols
the following statements are true:
A simplified form, perhaps more understandable form of this reads
As an example, suppose that we want to negate the statement
meaning, "there exist integers of value less than 5"
and which after negation means "there do not exist any integers less than 5" which is written as
Now, based on the equivalencies above, we are allowed to rewrite this as
meaning, for this negated statement, "all of the integers are greater than or equal to 5".
Or, as a conclusion, notice the format of the following statement in the formal statement below:
I trust that you will not be to upset if I do not walk through this in depth, but it feels intuitively reasonable. If in a statement it is not true that everything is OK, then there exist some things for which it is not OK. Similarly, if it is not true that there exist somethings that are OK, then everything is not OK. As in your programming, sometimes you are more interested in your ELSE cases than in the THEN cases. All of the above is just telling you, be careful of what you think of as the ELSE, the otherwise because in subtle ways it is not what you might think it is.
Finally, if we were to take the above to two parameters, x and y, the equivalences outlined about get extended as
Venn Diagrams, in their various relationships, show all the possible logical relationships between different sets, all for our purposes containing discrete elements.
We start with the Universal Set (U), which is not so much a set as a concept. It is a collection of all elements in a particular context – the universe of that context. Think of the box shown below as defining that entire context. Within that we have sets, here A, B, and C. Notice in this instance that set A has no overlap with sets B or C. This is the definition of a Disjoint Set; A is disjoint from the other two. There is no member of set A which is also a member of set B or C.
(Please, in the following, work through what follows nice and slow to absorb each of the concepts. They all refer to the figure above.)
Sets B and C, though, do overlap. We also see that all the members of set Y are those members that are in both B and C, these being members of both B AND C. Sets B and C intersect, creating Set Y.
This notion of AND and so also Intersect is represented by the symbol ∩. So, Y = B ∩ C.
The concept of an Intersection normally implies some kind of an overlap between sets. Looking at the overlap – the Intersection - of Sets B and C, we see that the Intersection is the members in Set Y above.
Using our set notion to describe Intersections,
In words, Set Y is the set of unique members common to both Sets B and C.
On cardinality, notice that, if there is an overlap between Sets B and C, there a non-zero cardinality to Set Y,
But what of the subset A ∩ B when Sets A and B are disjoint, having no overlap? No overlap, no intersect, no members. This is called the Empty Set, identified as ∅. And, of course, the cardinality of any empty set | ∅ | = 0.
Next, the notion of Union. A Union is a set consisting of the distinct members of the totality of multiple source sets. If we took the union of the sets A, B and C above, the resulting set would include the members found in all three taken together, but only distinct members. It is a set in its own right, consisting of unique members found in A, B, and C. The members in set Y would be included, but only because those members already exist in set B or set C; members are not double-counted. It follows that set created from unions on non-empty sets will never themselves be empty.
The symbol used for Union is ∪. Please do not confuse this Union symbol ∪ with the U from Universe; think of Union ( ∪ ) as an operation in the same way as Intersect. Sets can be produced by a union whether or not they intersect. New sets can be created as in any of the following:
Using our set notion to describe Unions,
In words,
the set produced by the union of two sets is all elements of Set A or Set B.
This OR, unlike Exclusive OR, means those values of x which are in Set A, Set B, or perhaps both. (Exclusive OR would mean elements in Set A, Set B, but also excluding those residing in both.)
As to the cardinality of sets produced by unions, notice in the above diagram that
but
The reason? Doing so would double count | Y |. Be careful.
As part of becoming formal, let's just create a quick and perhaps obvious definition of the term Inclusion. As in the following figure, Set B is a subset of Set A if all element of B are also elements of A. Conversely, Set A is a superset of Set B. Set A and B can be the same, but if B is the smaller, you know B as the proper subset of A. This is the notion of Inclusion, where Set B is a proper subset of Set A.
Symbolically, this is B ⊆ A. Getting formal as before we say
meaning for every element in Set B, that element is also in Set A. This set of elements in B can include every element from A.
We have been looking at logical operations as we look at sets;
These, though, need not be the only operations. Perhaps we want to add or subtract sets or find the complement of sets, operations which we normally think of as mathematical operations. What would those mean in the context of sets?
Taking the Difference of sets is like taking Set A minus Set B. More formally, the identifier for difference is – unfortunately -
To explain, suppose we have a Set A which consists partly of the members in Set B as in the following figure:
What then is Another way of thinking of Difference is by defining Complement. A \ B? It is just removing all the members in Set B from Set A, leaving a hole in A where B had been?
Or as another take on this same subtraction, consider A \ B in the following figure:
A \ B is Set A without Set B's portion of A, like a bite out of A, again ensuring that Set A has none of the elements found in Set B. A subtraction.
Another way of thinking of Difference is by defining Complement.
In the entire universe of members of a class, the complement of a set B is the set of elements which are not in set B. We will call that B^{c}. Formally,
Which effectively means, the set of elements in U that are not in Set B.
Subtracting Set B from Set A becomes
which first means the intersection of Set A with Set B-complement and second means the intersectiom of Set A with everything that is not in Set B.
It is also nearly intuitively obvious that the intersection of a Set B and its complement B^{c} is empty.
If you are familiar with database and SQL, you know that we are talking about operations used to create all sorts of Result Sets with all sorts of ANDs and ORs and NOTs, and occasionally very explicitly UNIONs. You fight to create queries using these all the time. All that we talked about here is the formalization of those concepts. As it relates to subtraction even, in SQL we would be saying that we want to delete all the records from the totality of a database table, records associated with some Set B, leaving everything else as it was. Or, perhaps you could create from some query a temporary table and using that subtract out some Set B from there.
Even in some arbitrary data structure, really just a temporary data repository, it means to search for and then remove elements in the data structure associated with an attribute making it part of Set B. In an object-based programming language where you can override the subtraction operation (i.e., the – or even \), such subtraction means a new method on the universe of objects representing A, with the intent of removing objects with the right attributes normally associated with B.
Just a lot of formalism used to describe things that you already know, or will know.
At its most simple, Symmetric Difference is the set of those members that are in Set A and Set B, but not both. Think Exclusive OR( ⊕ ). It is the union of Sets A and B ( A ∪ B ) less the intersection of A and B ( A ∩ B ). It is, using the figure below,
Having looked at other operations on sets, what about the the operation of multiplication? Here, as with the preceding operations, we define what this operation means for sets. So, what is a product for sets?
Database gives us a hint. It tends to get glossed over, but in relational database support, it is the basis of SQL JOINS, the conceptual merging of the rows of multiple tables.
In database, a Cartesian Product is another temporary table which is, yes, a set. It is a set in which for every single row of a Table A, it is matched up every row of a Table B.
Think of it this way ⋯ Suppose that both tables A and B contain one column each, Table A with single lower-case letters, and Table B of single-digit numbers. For example,
Yes, these are sets as well. A Cartesian Product given these two sets, these two tables, becomes the set
Every single row of a Table A is matched up with every row of a Table B. A Cartesian Product.
It is a set, a 2-column temporary table, in its own right. It is made by matching all the columns of every row of Set A to all of the columns of every row of Set B. For every row in Table A, we just tag it with every single row of Table B. And, in terms of the resulting temporary table's attributes, it is just the columns of both. That is effectively the definition of a Cartesian Product. Same thing for the Cartesian Product of sets.
In the database management system, when we ask for a JOIN, does this temporary table really ever get created? No, at least not normally. If all you specified on a Join was only a JOIN, yes you do. Try it. But you also normally add additional logic around the Join to filter out most of the uninteresting noise, the excess rows that come with a full Cartesian Product. Even so, that database management system executing your query does not actually first create a tempoary table representing the full Cartesian Product and later filters out the excess. So, think of Cartesian Product as a concept, the Join; how it gets done is actually far more interesting. Still, even after that filtering, say between matching primary and foreign keys, you do have a temporary table containing columns from the tables comprising the Join(s), just like this. Another set.
As to our little example here, is that particularly useful? Not really. We normally do not want everything with everything. With database JOINs we normally add some form of a function to create only what we need. So, let's do that with our little example, add a filter to create something that could be useful. Let's add a quantifier which says that we want Cartesian Product but only where we can group a letter with the location of that letter in alphabetical order. Notice also that the number in Set B are only odd numbers. The result ought then become
We have filtered out our excess from a complete Cartesian Product by defining a function where, when true, we choose that new element in our set.
With all that in mind, ask yourself what you would want to happen if you were to define the operation “X” (multiplication) between the members of two of your own data repositories. For example, given two linked lists of objects, even of different types of objects. Each list is just a set. So what would a Cartesian Product look like for those data repositorites. It is all just sets of objects, as are databsae rows, producing what amounts to a data repository of objects with the attributes of both classes. You determine what the "X" means in that context if such makes some sense. It is all just your game to play, here defining operations as you want or need them over Sets.
Suppose that there exists a database table – a Set – holding the letter grades for each student taking a class in Discrete Mathematics. The set of students who received "A"s is a subset of that overall set. So is the set of students who received a "B" and so on. Altogether, those grade-based subsets taken together as a Union, totally account for all members in that database table. As such, those subsets are each called a Partition.
Let's formalize that. Given a Set S, with a number of disjoint subsets, it can be thought of as being made up of Partitions P_{1}, P_{2}, ⋯, P_{n}. They must, though, satisfy the following three conditions:
As another way to think about it, consider what can be thought of as the three partitions associated with the following figure:
This leads us to some formal nomenclature. A relation is both the connection and the function that describes how the connected elements are related. For our Sets A and B, a relation R consists of ordered pairs (a,b) where a ∈ A and b ∈ B. This relation is sometimes written as a R b. In this way you can think of R as the function relating the two elements. For example, you can replace R in a R b by a symbol for the function such as a < b. More formally we might write that asEach is clearly a subset of Set U, but together they also account for everything in U. They are Partitions.
For those of you with a future in really huge databases, SQL relational and otherwise, keep this in mind. If you want speed, you also want that database to be able to fit into a system's DRAM, it's local memory. Often enough, the memory available is smaller than your database. A performance problem. Instead, making use of multiple systems, each with their own memory, it is often possible to partition that huge database and place each partition in its own system, accessed by its own local DBMS. For example, if the set A-B fits in a single-system's memory, any access from there will be done without much delay. Same thing for the other partitions if they fit in a system's memory. So, now picture a query sent to a DBMS responsible for this entire universe of data. From the ourside world, it is just one very large databse. The DBMS knows otherwise, and also knows how its universe is partitioned. It in turn, sends the request to each local DBMS, split or sent only to one, based on the query proper.
Relations can exist between members of the same sets or between the members of multiple sets. For example, within a single set, a relation can be defined which orders the contents of the set in some way. Picture the data structures of an ordered list or of a tree or a graph for these relations. What makes them ordered? That is a relation. Or, across multiple sets, relations can exist to create an association between members in different sets, these crossing set boundaries. Think of relational databases for instance, with their relation between rows enabled by the connections between primary keys in one table and foreign keys in another. Perhaps think of relations generally as connection lines between members. The connections, though, are more than just lines or pointers, these have a meaning relating the members.
As with a lot of what we have covered on other subjects, you are familiar with much of what constitutes a relation already. We are largely just talking Discrete Mathematics, of course, so need to work through some of the formalisms involved. But let's remind you first of what you do know. For example, all the Keyed List Items in the following ordered list are part of a set, an Ordered Set at that. The fact that this is a linked list is largely irrelevant except that the connections not only connect objects so they can be found, but also imply the operation of Less-Than-First. Ordered. We have a Relation with Less-Than-First and the connections enforcing that relation. Greater-Than-First also enables an ordered list. Or name your relation. All just functions of your choosing, all meeting a programming need of yours.
Notice, in this list, that if we were to remove one of these nodes – say Key = 3 – there would still be a relation between what had been the surrounding nodes. The node with Key = 1 would still precede the node with Key = 5.
Or, as another form of relation, consider this tree structure. Yes, clearly, nodes are connected in some predefined manner. But the relation also means here that from any given node we must decide left to go to the left or to the right. We could make the relation be "Less Than" takes us to the left and "Greater Than" takes us to the right. Again, a relation between the members of the sets based on the connections and the operation used for ordering these nodes this way.
Or, what about the items in a single database table? Within the table proper, there is usually no guarantee of order within the entire set found there, but you can certainly create a subset, a temporary table, where you can perceive the order as being ascending or descending per any column(s) of that table. In this way rows have a relationship between the members (rows) of a set.
But, with database, your mental picture might be that of the connections between multiple joined tables, the relationships found typically between the foreign keys in one table and the corresponding primary keys of another. These connections, based on equal key values, can be one-to-one, but one-to-N relations are frequent as well. Relations between keys in sets. In the following the relationship is between equal keys, where they are a primary key in Table A and foreign keys in Table B.
Sets and Relations are associated with each other. When two sets are used, you define a relation between them. And, in many cases, you get to define what those relational operations are. So, let's now get more formal.
First, one of the things to keep in mind is that the relation is not just the connection itself. It is the function that defines whether two elements have a relation. In our database table join, the function is normally that the foreign key is equal to the primary key. In our unique ordered linked list example, it is that every node to the right of a given node has a key value greater than the node of interest. In the tree, a search might proceed down the true and to the left if the remaining key value is less than some reference value and proceeds to the right if that key value is greater than that value. You define the relation – the function - for the set(s) based on your need.
This leads us to some formal nomenclature. A relation is both the connection and the function that describes how the connected elements are related. For our Sets A and B, a relation R consists of ordered pairs (a,b) where a ∈ A and b ∈ B. This relation is sometimes written as a R b. In this way you can think of R as the function relating the two elements. For example, you can replace R by a symbol for the function such as a < b. More formally we might write that as
Or perhaps as a different example
and formally as
As another example, consider a couple of simple sets where Set A = {1,2,3,4,5,6} and Set B = {1,2,3,4}, let's define our relation as one where the sum of an element from Set A with an element in Set B is divisible by 2, in effect an even sum. So, formally,
we would have a Relation R = { 1,1}, (1,3), (2,2), (2,4), (3,1), (3,3), (4,2), (4,4), (5,1), (5,3), (6,2), (6,4) }
All that we did to produce this was to go through the entire Set A, element by element, match up each with the elements of Set B, and ask whether their sum was even. In a manner of speaking, we created a Cartesian Product of all possible elements in Set A with all in set B, and threw out those where the sum was odd. Or using our connection analogy,
all because their sum must be even.
Empty Relation | An Empty Relation (a.k.a., Void Relation) is one in which there is no relation between the elements within or across sets. An unordered list or two database tables independent of each other might be examples. Or suppose that you try to impose a relationship such as R={x,y} where x and y must be even when the key values in the set are only odd. That would produce no connections and so an Empty Relation. |
Universal Relation | A Universal Relation (or Full Relation) is one in which every element of a set is related to every other. An ordered list might be an example; the list remains ordered even when members are added or removed. A full Cartesian Product relating all of the rows of one table with another tables rows might be another. |
Identity Relation | In an Identity Relation, every element of a set is related to itself only. Key equality as in equal primary and foreign keys across database tables might be an example. |
Inverse Relation | Recalling that a relation is a function, an Inverse Relation is like an inverse function. For example, if you have a relation R which maps as R = {(a,b), (c,d)}, perhaps with many more such, then its inverse maps {(b,a), (d,c)}. |
Reflexive Relation | Reflexivity for a set is a set which is a subset of itself; A ⊆ A. A Reflexive Relation is a relation where every element maps to itself. For example, given the very simple set A={1,2} representing – say – key values, the relation R={(1,1),(2,2),(1,2),(2,1)}. That function would continue to ever more as A's key membership increases. |
Symmetric Relation | A relation R is symmetric if for a mapping (a,b) ∈ R when (b,a) ∈ R is also true. |
Transitive Relation | If the mapping (a, b) ∈ R, and the mapping (b, c) ∈ R, then the mapping (a, c) ∈ R. |