A Discrete Mathematics course would show you the theory and use of graphs, trees, and their variations. Here, though, in this Discrete Mathematics for Programmers, we are going to come at them from a different direction: as data structures, structures for organizing data, which is ultimately what they really are.
So, let's start at the bottom, with a graph's nodes and edges, or, using the corresponding computer science terms, objects and links. We will start with the simplest of these structures, a Linked List.
These two figures can represent exactly the same data repository. In the case of this linked list, we have three objects, each holding the same type of data (Payload) and search key (Key), and we see a clear ordering of those objects, one perhaps based on the value of the Key. To find Object B, we must go through Object A. To find Object C, Object A and then Object B must be accessed. Given C, we cannot find A unless we start over from the beginning. There is a directional order to this, implicitly clear.
So, what enables these lines making the connections between nodes? Various names are used, such as Pointers, Addresses, and References. At the level of your executing program, the value residing in each object – the Link – is just a number, typically the unique address of the first byte of the referenced object in program memory. Every byte of anything in your program memory has a unique address. Aside from being a program-memory value, that "address" can also be something of your own making, like an index into an array of these nodes. It is these links that create the actual order of these objects, not the index into the array; each object can reside most anywhere within its address space. Also notice the value NULL, the name NULL merely representing a special value meaning the end. However this addressing is done, for our purposes it enables nothing more than this ordered set of objects.
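The list just described can be sketched in a few lines of Python. Object references stand in for raw addresses, and None plays the role of NULL. The names here (Node, key, payload, link, find) are our own for illustration, not anything from the figures:

```python
class Node:
    """One object in the list: a search key, a payload, and one link."""
    def __init__(self, key, payload):
        self.key = key
        self.payload = payload
        self.link = None  # None plays the role of NULL: the end of the list

def find(head, key):
    """Walk forward from the head; to reach C we must pass A, then B."""
    node = head
    while node is not None:
        if node.key == key:
            return node
        node = node.link
    return None  # fell off the end without finding the key

# Build the three-object list A -> B -> C
a, b, c = Node(1, "A"), Node(2, "B"), Node(3, "C")
a.link, b.link = b, c
```

Note that given `c` alone there is no way back to `a`; the only order is the one the links create.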
And so it also is with the second figure, the Directed Graph, just an abstraction meaning the same thing. Whatever those nodes represent, there is an implied ordering to them, here in just one direction. Same basic concept. Like a one-way road system, you can only get from Node A to Node C by way of Node B. Rather than the term Node, Discrete Mathematics also calls them Vertices. The connections between them are called Edges. A directed edge relates two vertices, a single source and a single target.
Graphs, though, need not be directed, one node leading to the next in only one direction. A road – an edge – can take you in either direction between cities. A computer network link can allow two computers to communicate with each other in either direction. For now, let's keep such a bidirectional graph simple and look at it as in the following figures. Arrows pointing in both directions are included in the graph proper, but they need not be; a simple line between vertices implies both directions. There is, of course, a corresponding concept in a linked list, but not as a single link. It is a Doubly-linked List. With it, we can go in either direction within the list. We still have our link – here called F Link – taking us forward in the list. No change there. To go backward, the new B Link takes us back to the preceding node object. Notice that Node B is addressed by the F Link in Node A and by the B Link in Node C. These linked objects represent the same concept as the graph; they just take some overhead to maintain it.
The graph is clear enough, and implicitly so. There is still an order to it, but not in just one direction. We can go either way along the edges, but we cannot go directly from Vertex A to Vertex C. That is the graph's abstraction. The slightly more real linked list also maintains the notion of order. It is similarly bidirectional because of the forward and backward pointers, but here too we have no pointer taking us directly from Node A to Node C, or from Node C back to Node A, except via B. Those two links – forward and backward – together represent the same thing as an edge of an (undirected) graph. No difference. The graph is just an abstraction of something a bit more real, the doubly-linked list.
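The doubly-linked list can be sketched as follows. The field names (f_link, b_link) echo the F Link and B Link of the figure; everything else is our own illustrative naming:

```python
class DNode:
    """A doubly-linked node: f_link goes forward, b_link goes back."""
    def __init__(self, key):
        self.key = key
        self.f_link = None  # "F Link": the next node, or None (NULL) at the end
        self.b_link = None  # "B Link": the previous node, or None at the front

def connect(left, right):
    """One undirected edge costs two pointers to maintain."""
    left.f_link = right
    right.b_link = left

# Build A <-> B <-> C
a, b, c = DNode("A"), DNode("B"), DNode("C")
connect(a, b)
connect(b, c)
```

As in the figure, B is addressed twice: by the F Link in A and by the B Link in C.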
From those graphs as one-to-one lists we can head off into a subcategory of a graph called a Tree. Only to keep the pictures cleaner, we will limit ourselves for now to single-direction trees, allowing us to work our way only downward within this tree. So, a Tree, here a Directed Tree. (Although we could draw it otherwise, the top of these trees is called the Root.)
From some Root Node A, given some arbitrary function applied to some search value, our search takes us to either Node B or Node C and nowhere else, and only, as here, downward. The function making that choice – here only left or right – can be most anything. For example, given a simple binary number as the key value, the choice of B or C might be based on the state of the least significant bit of that number; 0 takes us left, 1 takes us right. The choice at the next level might be based on the next bit up from there. A simple binary search. Your choice. Or maybe it is a text search where the decision is based on a string being lexically less than (left) or lexically greater than (right) some text in the node; if lexically equal, the search could be done. But it is your choice, based on what this directed tree needs to represent. It is just a data structure of your making, after all.
Continuing downward, once at Node B, the same type of question gets asked in determining whether the next vertex ought to be D or E. Sooner or later, if the search does not end elsewhere, we reach the bottom, finding no more vertices and ending our search.
Which brings us to our more real object-based data structure supporting the same concept as a tree.
Given the two edges at each new level of the Tree, instead of just the one pointer of the linked list, we here need two links in each object, each node, both pointing downward. The function choosing whether we go left or right, thereby selecting one of the two addresses, is the same as in our Directed Tree above. If our search fails to find what we are looking for, the link values at the bottom level tell us that we are at the bottom, because they too are our NULL value(s).
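A minimal sketch of that downward search, using the lexical comparison mentioned above as the choosing function (the node names and keys are invented for illustration):

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None   # both links point downward; None (NULL) means bottom
        self.right = None

def search(root, key):
    """Descend one level at a time: lexically less goes left, greater goes
    right, equal means found. Falling off the bottom ends the search."""
    node = root
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None  # both links at the last node were NULL

# A small hand-built tree for illustration
root = TreeNode("M")
root.left, root.right = TreeNode("D"), TreeNode("T")
root.left.left, root.left.right = TreeNode("B"), TreeNode("G")
```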
So far, a simple Tree, implemented with two pointers in one downward direction.
Can an abstraction of a tree be understood to mean either direction, up or down as well? Yes. Remove the arrows on those edges and it now means that – say – we can find our way up the tree. Of course, for our pointer-based implementation of that abstraction, we need to think about what it takes to support that. As an example, all that we need is to be able, at Node B, to find Node A, or from Node D to find Node B, conceptually "higher" in the tree. We need another pointer, another link, just one, this time pointing backward up the tree, much like we pointed backward in our doubly-linked list. In short, each node object needs three links: two down, one up.
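Those three links per node can be sketched directly (the names left, right, and parent are our own choices):

```python
class UpDownNode:
    """Three links per node: two down (left, right) and one up (parent)."""
    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = parent  # the single upward link; None at the root

def attach_left(parent, key):
    """Add a left child and wire its upward link at the same time."""
    parent.left = UpDownNode(key, parent)
    return parent.left

a = UpDownNode("A")
b = attach_left(a, "B")
d = attach_left(b, "D")
# From D we can now climb: D -> B -> A
```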
We have been slowly adding complexity to what we had, so let's add another.
There is no requirement that there be only two choices leading to the next level. It is two because so far we have only needed two. If necessary, for what we require our tree to do, a Vertex A might be required to have edges to Vertices B through E, and those to still others, or not. A generic tree, here still a directed tree. As before, we still need a function which decides which node at the next level(s) to visit. It could be anything, like your decision to take one highway versus another to get from City A to City H.
So, if that is what we need, what is the new news for our linked node objects? Since we are back to talking about working our way just down the tree, two links will not be enough. Node A here needs to be capable of addressing node objects B through E: four pointers, at least. Node D needs no such capability at all, for the moment. To explain in more detail now – in this Discrete Math tutorial – requires an understanding of what programmers learn in a Data Structures class, but suffice it to say that this can easily be done.
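One easy way to do it, sketched here, is to replace the fixed pair of links with a growable list of links, so each node carries exactly as many downward pointers as it needs (the names are ours):

```python
class NaryNode:
    """A generic tree node: instead of a fixed two links, a growable
    list of child links. How many varies node to node."""
    def __init__(self, key):
        self.key = key
        self.children = []  # Node A may hold four links; Node D, none

    def add_child(self, key):
        child = NaryNode(key)
        self.children.append(child)
        return child

# Node A with edges to B through E, as in the text
a = NaryNode("A")
b, c, d, e = (a.add_child(k) for k in "BCDE")
```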
From this new abstraction, let's add the complexity of bi-direction edges back in so we can work our way in either direction. The edges are no longer directed only downward. Given, for example, that we wanted to travel from City H to City A, Node A is no longer our root node, but now H is.
But, not only that, let's add the complexity that we want a way to allow any vertex to be our starting city, our tree root. All the edges are as they were before, no more, but travel is possible in either direction.
So, what's new? Not only does each node need a means of travelling down the tree, from one node to an arbitrary number of lower-level nodes, but we need a way of travelling up the tree, again from any one node to an arbitrary number of what appear here to be higher-level nodes. Drawn like this, Node A appears to be the root, and so at the highest level. But any node could be that top node, that Tree Root, one like Node H. We have just flipped this tree completely over. Node H is the top level. You can see how we would navigate such a tree from Node H to Node A, by way of Node C. Notice also that we can only reach Node D from H by way of nodes C and A, but we can.
Additionally, we want any node to potentially be our root node. To enable that, first using the array on the right, we can search through that array for whatever we consider to be the name of our root node, and that found array entry addresses for us the needed root node.
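That name-to-node lookup can be sketched with a dictionary standing in for the array on the right; searching the array by name becomes a key lookup. The node names here are arbitrary:

```python
class NamedNode:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # undirected edges: each one listed on both ends

def add_edge(x, y):
    """Record the edge on both endpoints so travel works in either direction."""
    x.neighbors.append(y)
    y.neighbors.append(x)

# The "array" of all nodes, searchable by name
nodes = {name: NamedNode(name) for name in "ACDH"}
add_edge(nodes["H"], nodes["C"])
add_edge(nodes["C"], nodes["A"])
add_edge(nodes["A"], nodes["D"])

def set_root(directory, name):
    """Any vertex can be the root: look its name up, start the search there."""
    return directory[name]

root = set_root(nodes, "H")
```

From `root` we can reach D only by way of C and then A, exactly as the figure shows.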
It's still sort of a tree, but the root redefines how you perceive it as such. The root, whichever it is, is the starting point for a search over a set of connections.
Which brings us back to Graphs and then to Weighted Graphs.
All that a graph really is is just an arbitrary set of vertices (nodes) along with an arbitrary set of edges between pairs of those vertices. Again, pairs of those vertices. There is no limit on the size of the set containing those vertices, nor really any limit on the number of edges – the connections – between pairs of nodes, nor on which vertices get connected. Any of those edges can be bidirectional or unidirectional. Look at any map with roads and cities. It is a graph. Or consider the connections of the nodes in a computer network. It is a graph.
A graph. Just an abstraction really. For any kind of data between which there are relations, a graph might be capable of representing that data. There is even a database type capable of efficiently representing graphs (i.e., Graph Databases).
However, all that we have discussed so far are arbitrary nodes with links. Aside from their directionality, one link is being treated as good as any other. Often, they are not. The distance between two linked cities varies. The number of lanes and speed limits may vary. In networked computers, the bandwidths of the links between them or their relative reliability may differ. In short, the links – the edges – themselves can have some weighting factor. Whether that is in miles, or bandwidth, or latency, or probability, that is up to whatever need you have. It gets factored into decisions that you make pertaining to that graph.
The following is one such, showing both bidirectional and unidirectional links:
This is a graph, consisting of vertices and edges, nodes and links. The weights – the numbers between the nodes – could represent most anything, but let's say that here they are miles between cities. We are even using infinity to represent a temporarily shut down road.
So, a question ⋯ As of the state of the graph at this moment, what is the fastest way to travel between any two cities? The "root" node – the source or starting node, the starting city – can be any node, as can the destination city. It happens that the direct route between linked cities is not always the best one. As a case in point, if we want to go from A to D, the shorter way is via B. Another set of weights representing current speed, not shown here but also representable within our data structure, could likewise show that the direct route from A to D, although longer, is nonetheless faster.
We could represent all of this much like the data structures shown in the previous section. The new news here is that each node not only knows each of the links for which it is the sourcing node, and the names of the nodes at the other end of each link, but also the weight(s) of each such link. Think of that information as being carried within each node object.
For example, Node A is the source for three edges, with the edge to Node C having weight 10, the edge to node B weight 5, and to node D weight 20. (The incoming edge is not sourced by Node A.) The node object would carry not just the list of these nodes to which it can directly connect, but the weight of each such connection – that edge – as well.
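Node A as just described might be sketched like this, the weights 10, 5, and 20 coming from the example above and everything else being our own naming:

```python
class WeightedNode:
    """Each node carries the links it sources plus each link's weight."""
    def __init__(self, name):
        self.name = name
        self.edges = {}  # target node name -> weight (miles, in this example)

    def add_edge(self, target, weight):
        self.edges[target] = weight

a = WeightedNode("A")
a.add_edge("C", 10)  # edge to Node C, weight 10
a.add_edge("B", 5)   # edge to Node B, weight 5
a.add_edge("D", 20)  # edge to Node D, weight 20
# The incoming edge is not sourced by A, so it does not appear here.
```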
There are algorithms for finding shortest or fastest paths between arbitrary nodes within graphs just like this, but that is a subject for your Data Structures class. (See Dijkstra's Algorithm for example.)
Do not think of these as static graphs or even as static data structures. New edges – new roads – might be added, the weights of existing ones might change – including broken edges making the distance infinite – or even new vertices added, representing cities newly accessible via their own edges and associated weights. The associated data structures you have to represent all of that, of course, need to follow suit. A node might find that it is the source for a new link to a pre-existing node, an existing link%apos;s edge needs to change, or there is a node new node and link(s) being added. The data structures of the nodes must be capable of these types of modifications.
Still another way to represent this same information, rather than within the nodes themselves, is via something external called an Adjacency Matrix. In it, we define the rows of the matrix as source cities, the columns as destination cities, and the value in the matrix as the weight. An N/A (i.e., Not Available) entry here represents a non-existent link between cities; relatedly, an entry of INF can mean a link that is temporarily broken.
You will notice a certain symmetry here. Focus first on the yellow along the diagonal; there are no explicit links between a city and itself (although there could be), so the distance in these boxes is shown as zero. Elsewhere, where the link is bidirectional – and equally weighted – the entries are marked in orange, with the intent being to call out the symmetry across that diagonal. Again, the symmetry is because of the bidirectional links, these by definition having equal weights. But suppose they are not all like that, as in that graph. In that case the boxes in red are the weights of links between the same city-pairs whose two directions differ, and there the symmetry across the diagonal breaks down.
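An adjacency matrix along these lines can be sketched as a two-dimensional array, with None standing in for N/A and floating-point infinity for INF. The cities and weights below are invented for illustration, not the figure's actual values:

```python
import math

NA = None        # no link at all between these cities
INF = math.inf   # a link that exists but is temporarily broken

cities = ["A", "B", "C", "D"]
# matrix[row][col]: weight of the edge from source (row) to destination (col)
matrix = [
    [0,   5,   10,  20 ],   # A sources edges to B, C, and D
    [5,   0,   NA,  6  ],   # B: no link from B to C
    [10,  NA,  0,   INF],   # C: the C-to-D link is temporarily broken
    [NA,  6,   INF, 0  ],   # D: no D-to-A link, so A-to-D is one-way
]

def weight(src, dst):
    return matrix[cities.index(src)][cities.index(dst)]
```

Note the symmetry where links are bidirectional and equally weighted (A-B is 5 both ways), the zeros on the diagonal, and the asymmetry where they are not (A-to-D is 20, but D-to-A does not exist).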
Here, too, though, changes can occur. For example, a new road between these existing cities means an N/A becomes a weight value. Or perhaps a broken road gets fixed, the INF similarly becomes a weight value. Conversely, a newly broken road is marked with INF.
However, these straightforward changes do not extend to the case where a completely new road is created between cities that already have one. We need a way of adding edges, bidirectional or not. In such a case, one solution is to define each entry of this matrix as an object holding a list of weights.
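A sketch of that solution, with each entry holding a list of weights so parallel roads can accumulate (the city pairs and weights are made up for illustration):

```python
# Each matrix entry is a list of weights, one per parallel edge between
# the same city-pair; a missing or empty entry means no link at all.
multi = {
    ("A", "B"): [5],
    ("A", "D"): [20],
}

def add_road(matrix, src, dst, weight, bidirectional=True):
    """Append a new parallel edge; record it in both directions if asked."""
    matrix.setdefault((src, dst), []).append(weight)
    if bidirectional:
        matrix.setdefault((dst, src), []).append(weight)

add_road(multi, "A", "B", 7)   # a brand-new second road between A and B
```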
However it is represented, it is all still just a weighted graph, an abstraction. How you implement it is up to you and the requirements on that abstraction.
Another, very different, portion of this "book" on Discrete Mathematics spoke of Conditional Probabilities and Bayes' Theorem. There we covered the notion of the probability of something being true (or false) given that something else was true (or false). You do not need a deep understanding of that for what we will be covering next, but it is related. All that you really need to know for what is coming is that the weights of the edges on a graph – discussed in the previous section – can be values representing probabilities as well.
With that, we can go on to say that the models associated with much of artificial intelligence are roughly that: truly massive graphs, with vertices having edges between them and those edges weighted by probabilities. We are not going to get deep into AI models here – you are welcome – but it happens that the training of those models is partly a matter of adjusting those probabilities – the weights of the edges – until, given some input, the output of the overall graph is consistent with what you would expect. The AI model is largely this graph with weights as probabilities, just enormously larger and more complex than what we will be seeing here.
So, next, a very simple such graph. A set of values come into the model – into the graph – from the left, get munged by the entire graph, and produce a set of values on the right. In reality, the number of nodes comprising the height of each level can be unbelievably high, and the number of levels many times what is shown, and so the number of edges is similarly huge, but ultimately the difference is merely about computer memory usage to hold all that and about processing capacity to quickly get results. It is a weighted graph, with probabilities on the edges.
Consider the following, but first just the graph on the left:
Let's start with Node A, really just an input number called I0, one of many in the input. Node A here has an edge to every node in the second level. So do nodes B through E. Same thing in Level 2 – Nodes F through I – and their connections to Level 3's nodes – J through K – and continuing on all the way until the graph reaches the right output side.
We are going to now change focus a bit. Given a Node F (here at Level 2), it follows that it receives some sort of value as input from all of the nodes of the preceding level. Each of nodes A through E has some arbitrary value, those values somehow being passed into this Node F. So, what does Node F do with all that input?
To answer that we need to now introduce the edge weights. Each weight is a probability. So, now let's look at the Adjacency Matrices at the right. There we see that the probability weight of the A-to-F edge just happens to be 50%. It got set to that in a much earlier process of training this entire model, loading this entire graph. For our purposes, this edge's weight is an arbitrary 50%.
More formally, to determine the value of Node F alone, or for that matter any internal node alone, we would execute the following function, with weights and input and bias values particular to a node:
In words, we are multiplying the weight value (Wi) associated with each edge leading into Node F by the corresponding node's value (which at the first level happens to be the input value Ii), summing those products, adding a bias for Node F, and then executing some range-scaling function against that result. Do not get hung up on the actual function used here; determining just the right function for an AI model is a bit of a science in its own right.
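Those words can be sketched as code. The 0.50 weight for the A-to-F edge comes from the example above; every other weight, the inputs, and the bias are made up, and we use the logistic sigmoid as one common choice of range-scaling function (the text leaves the actual function open):

```python
import math

def node_value(weights, inputs, bias):
    """Weighted sum of the incoming values, plus the node's bias, squashed
    by a range-scaling function. The logistic sigmoid used here is just one
    common choice; picking the right function is a design decision."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # maps any total into (0, 1)

# Edge weights into Node F from nodes A through E. The 0.50 for A-to-F is
# from the text; the rest, the inputs, and the bias are illustrative only.
w_into_F = [0.50, 0.10, 0.30, 0.20, 0.40]
inputs   = [1.0, 0.0, 1.0, 1.0, 0.0]
f_value = node_value(w_into_F, inputs, bias=-0.5)
```

Repeating this one function per node, level by level, is the whole forward pass.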
Now repeat all that for every node in the level (think repeating that function for thousands of nodes in a level) and as each level's node values are all determined, do the same thing all over again for the next level, and the next. That is the essence of it.
From a program's point of view, you might have noticed that this is not just one pass of one path through the graph; instead, every node and every edge is repeatedly being accessed. It is no surprise, then, that specialized processing hardware is created just for this purpose alone. But it is not just the new hardware that makes this fast, it is how you lay out that data, a layout under your control. For example, notice Node F – in orange – and the array for it in the adjacency matrix, also in orange. Again, picture that array as consisting of thousands of entries. It is therefore – say – a one-thousand-entry array of weight values used just for calculating Node F. Also, conceptually there in orange is another equal-sized array of the current node values (again, representing in reality thousands of such values). So, when executing the function above, all that the processor is doing is efficiently vacuuming up the contents of those two arrays, bringing them more or less all at once into the processor, and executing the function. And, again, specialized hardware exists for doing exactly that very efficiently.