Large networks and graph limits
László Lovász
Institute of Mathematics, Eötvös Loránd University, Budapest, Hungary
2010 Mathematics Subject Classification. Primary 05C99; Secondary 05C25, 05C35, 05C80, 05C82, 05C85, 90B15.
Key words and phrases. Graph homomorphism, graph algebra, graph limit, graphon, graphing, property testing, regularity lemma.
To Kati, as all my books
Contents

Preface

Part 1. Large graphs: an informal introduction

Chapter 1. Very large networks
1.1. Huge networks everywhere
1.2. What to ask about them?
1.3. How to obtain information about them?
1.4. How to model them?
1.5. How to approximate them?
1.6. How to run algorithms on them?
1.7. Bounded degree graphs

Chapter 2. Large graphs in mathematics and physics
2.1. Extremal graph theory
2.2. Statistical physics

Part 2. The algebra of graph homomorphisms

Chapter 3. Notation and terminology
3.1. Basic notation
3.2. Graph theory
3.3. Operations on graphs

Chapter 4. Graph parameters and connection matrices
4.1. Graph parameters and graph properties
4.2. Connection matrices
4.3. Finite connection rank

Chapter 5. Graph homomorphisms
5.1. Existence of homomorphisms
5.2. Homomorphism numbers
5.3. What hom functions can express
5.4. Homomorphism and isomorphism
5.5. Independence of homomorphism functions
5.6. Characterizing homomorphism numbers
5.7. The structure of the homomorphism set

Chapter 6. Graph algebras and homomorphism functions
6.1. Algebras of quantum graphs
6.2. Reflection positivity
6.3. Contractors and connectors
6.4. Algebras for homomorphism functions
6.5. Computing parameters with finite connection rank
6.6. The polynomial method

Part 3. Limits of dense graph sequences

Chapter 7. Kernels and graphons
7.1. Kernels, graphons and stepfunctions
7.2. Generalizing homomorphisms
7.3. Weak isomorphism I
7.4. Sums and products
7.5. Kernel operators

Chapter 8. The cut distance
8.1. The cut distance of graphs
8.2. Cut norm and cut distance of kernels
8.3. Weak and L^1-topologies

Chapter 9. Szemerédi partitions
9.1. Regularity Lemma for graphs
9.2. Regularity Lemma for kernels
9.3. Compactness of the graphon space
9.4. Fractional and integral overlays
9.5. Uniqueness of regularity partitions

Chapter 10. Sampling
10.1. W-random graphs
10.2. Sample concentration
10.3. Estimating the distance by sampling
10.4. The distance of a sample from the original
10.5. Counting Lemma
10.6. Inverse Counting Lemma
10.7. Weak isomorphism II

Chapter 11. Convergence of dense graph sequences
11.1. Sampling, homomorphism densities and cut distance
11.2. Random graphs as limit objects
11.3. The limit graphon
11.4. Proving convergence
11.5. Many disguises of graph limits
11.6. Convergence of spectra
11.7. Convergence in norm
11.8. First applications

Chapter 12. Convergence from the right
12.1. Homomorphisms to the right and multicuts
12.2. The overlay functional
12.3. Right-convergent graphon sequences
12.4. Right-convergent graph sequences

Chapter 13. On the structure of graphons
13.1. The general form of a graphon
13.2. Weak isomorphism III
13.3. Pure kernels
13.4. The topology of a graphon
13.5. Symmetries of graphons

Chapter 14. The space of graphons
14.1. Norms defined by graphs
14.2. Other norms on the kernel space
14.3. Closures of graph properties
14.4. Graphon varieties
14.5. Random graphons
14.6. Exponential random graph models

Chapter 15. Algorithms for large graphs and graphons
15.1. Parameter estimation
15.2. Distinguishing graph properties
15.3. Property testing
15.4. Computable structures

Chapter 16. Extremal theory of dense graphs
16.1. Nonnegativity of quantum graphs and reflection positivity
16.2. Variational calculus of graphons
16.3. Densities of complete graphs
16.4. The classical theory of extremal graphs
16.5. Local vs. global optima
16.6. Deciding inequalities between subgraph densities
16.7. Which graphs are extremal?

Chapter 17. Multigraphs and decorated graphs
17.1. Compact decorated graphs
17.2. Multigraphs with unbounded edge multiplicities

Part 4. Limits of bounded degree graphs

Chapter 18. Graphings
18.1. Borel graphs
18.2. Measure preserving graphs
18.3. Random rooted graphs
18.4. Subgraph densities in graphings
18.5. Local equivalence
18.6. Graphings and groups

Chapter 19. Convergence of bounded degree graphs
19.1. Local convergence and limit
19.2. Local-global convergence

Chapter 20. Right convergence of bounded degree graphs
20.1. Random homomorphisms to the right
20.2. Convergence from the right

Chapter 21. On the structure of graphings
21.1. Hyperfiniteness
21.2. Homogeneous decomposition

Chapter 22. Algorithms for bounded degree graphs
22.1. Estimable parameters
22.2. Testable properties
22.3. Computable structures

Part 5. Extensions: a brief survey

Chapter 23. Other combinatorial structures
23.1. Sparse (but not very sparse) graphs
23.2. Edge-coloring models
23.3. Hypergraphs
23.4. Categories
23.5. And more...

Appendix A. Appendix
A.1. Möbius functions
A.2. The Tutte polynomial
A.3. Some background in probability and measure theory
A.4. Moments and the moment problem
A.5. Ultraproduct and ultralimit
A.6. Vapnik–Chervonenkis dimension
A.7. Nonnegative polynomials
A.8. Categories

Bibliography
Author Index
Subject Index
Notation Index
Preface

Within a couple of months in 2003, in the Theory Group of Microsoft Research in Redmond, Washington, three questions were asked by three colleagues. Michael Freedman, who was working on some very interesting ideas to design a quantum computer based on methods of algebraic topology, wanted to know which graph parameters (functions on finite graphs) can be represented as partition functions of models from statistical physics. Jennifer Chayes, who was studying internet models, asked whether there was a notion of “limit distribution” for sequences of graphs (rather than for sequences of numbers). Vera T. Sós, a visitor from Budapest interested in the phenomenon of quasirandomness and its connections to the Regularity Lemma, suggested generalizing results about quasirandom graphs to multitype quasirandom graphs.

It turned out that these questions were very closely related, and the ideas we developed while answering them have motivated much of my research in the years since. Jennifer’s question recalled some old results of mine characterizing graphs through homomorphism numbers, and another paper with Paul Erdős and Joel Spencer in which we studied normalized versions of homomorphism numbers and their limits. Using homomorphism numbers, Mike Freedman, Lex Schrijver and I found the answer to Mike’s question in a few months. The method of solution, the use of graph algebras, provided a tool to answer Vera’s. With Christian Borgs, Jennifer Chayes, Lex Schrijver, Vera Sós, Balázs Szegedy, and Kati Vesztergombi, we started to work out an algebraic theory of graph homomorphisms and an analytic theory of convergence of graph sequences and their limits. This book will try to give an account of where we stand.

Finding unexpected connections between the three questions above was stimulating and interesting, but soon we discovered that these methods and results are connected to many other studies in many branches of mathematics. A couple of years earlier Itai Benjamini and Oded Schramm had defined convergence of graph sequences with bounded degree and constructed limit objects for them (our main interest was, at least initially, the convergence theory of dense graphs). Similar ideas had been raised even earlier by David Aldous. The limit theories of dense and bounded-degree graphs have led to many analogous questions and results, and each of them is better understood thanks to the other.

Statistical physics deals with very large graphs and their local and global properties, and it turned out to be extremely fruitful to have two statistical physicists (Jennifer and Christian) on the (informal) team along with graph theorists. This put the burden of understanding the other side’s goals and approaches on all of us, but in the end it was the key to many of the results.
Another important connection that was soon discovered was the theory of property testing in computer science, initiated by Goldreich, Goldwasser and Ron several years earlier. This can be viewed as statistics done on graphs rather than on numbers, and probability and statistics became a major tool for us.

One of the most important application areas of these results is extremal graph theory. A fundamental tool in the extremal theory of dense graphs is Szemerédi’s Regularity Lemma, and this lemma turned out to be crucial for us as well. Graph limit theory, we hope, repaid some of this debt by providing the shortest and most general formulation of the Regularity Lemma (“compactness of the graphon space”). Perhaps the most exciting consequence of the new theory is that it allows the precise formulation of, and often the exact answer to, some very general questions concerning algorithms on large graphs and extremal graph theory. Independently and at about the same time as we did, Razborov developed the closely related theory of flag algebras, which has led to the solution of several long-standing open problems in extremal graph theory.

Speaking about limits means, of course, analysis, and for some of us graph theorists it meant hard work learning the necessary analytic tools (mostly measure theory and functional analysis, but even a bit of differential equations). Involving analysis has advantages even for some of the results that can be stated and proved purely graph-theoretically: many definitions and proofs are shorter and more transparent in the analytic language. Of course, combinatorial difficulties don’t just disappear: sometimes they are replaced by analytic difficulties. Several of these are of a technical nature: Are the sets we consider Lebesgue/Borel measurable? In a definition involving an infimum, is it attained? Often this is not really relevant for the development of the theory. Quite often, on the other hand, measurability carries combinatorial meaning, which makes this relationship truly exciting.

There were some interesting connections with algebra too. Balázs Szegedy solved a problem that arose as a dual to the characterization of homomorphism functions, and through his proof he established, among other things, a deep connection with the representation theory of algebras. This connection was later further developed by Schrijver and others. Another one of these generalizations has led to a combinatorial theory of categories which, apart from some sporadic results, had not been studied before. The limit theory of bounded degree graphs also has very strong connections to algebra: finitely generated infinite groups yield, through their Cayley graphs, infinite bounded degree graphs, and representing these as limits of finite graphs had been studied in group theory (under the name of sofic groups) earlier.

These connections with very different parts of mathematics made it quite difficult to write this book in a readable form. One way out could have been to focus on graph theory, not to talk about issues whose motivation comes from outside graph theory, and to sketch or omit proofs that rely on substantial mathematical tools from other areas. I felt that such an approach would hide what I find the most exciting feature of this theory, namely its rich connections with other parts of mathematics (classical and non-classical).
So I decided to explain as many of these connections as I could fit in the book; the reader will probably skip several parts if he/she does not like them or does not have the appropriate background, but perhaps the flavor of these parts can be remembered.
The book has five main parts. The first is an informal introduction to the mathematical challenges posed by large networks. We ask the “general questions” mentioned above and try to give informal answers, using relatively elementary mathematics and motivating the need for the more advanced methods developed in the rest of the book.

The second part contains an algebraic treatment of homomorphism functions and other graph parameters. The two main algebraic constructions (connection matrices and graph algebras) will play an important role later as well, but they also shed some light on the seemingly completely heterogeneous set of “graph parameters”.

In the third part, which is the longest and perhaps the most complete within its own scope, the theory of convergent sequences of dense graphs is developed, and applications to extremal graph theory and graph algorithms are given.

The fourth part contains an analogous theory of convergent sequences of graphs with bounded degree. This theory is more difficult and less well developed than the dense case, but it has even more important applications, not only because most networks arising in real-life applications have low density, but also because of connections with the theory of finitely generated groups. Research on this topic has been perhaps the most active during the last months of my work, so the topic was a “moving target”, and it was here that I had the hardest time drawing the line where to stop understanding and explaining new results.

The fifth part deals with extensions. One could try to develop a limit theory for almost any kind of finite structure. Making a somewhat arbitrary selection, we only discuss extensions to edge-coloring models and categories, and say a few words about hypergraphs, in much less depth than graphs are discussed in Parts 3 and 4. I included an Appendix about several diverse topics that are standard mathematics, but due to the broad nature of the connections of this material within mathematics, few readers will be familiar with all of them.

One of the factors that contributed to the (perhaps too large) size of this book was that I tried to work out many examples of graph parameters, graph sequences, limit objects, etc. Some of these may be trivial for some readers and tough for others, depending on one’s background. Since this is the first monograph on the subject, I felt that such examples would help the reader digest this quite diverse material. In addition, I included quite a few exercises. This is a good trick for squeezing a lot of material into a book, but (honestly) I did try to find exercises that, say, a graduate student of mathematics could solve with not too much effort.
Acknowledgements. I am very grateful to my coauthors of those papers that form the basis of this book: Christian Borgs, Jennifer Chayes, Michael Freedman, Lex Schrijver, Vera Sós, Balázs Szegedy, and Kati Vesztergombi, for sharing their ideas, knowledge, and enthusiasm during our joint work, and for their advice and extremely useful criticism in connection with this book. The creative atmosphere and collaborative spirit at Microsoft Research made the successful start of this research project possible. It was a pleasure to do the last finishing touches on the book in Redmond again. The author acknowledges the support of ERC Grant
No. 227701, OTKA grant No. CNK 77780, and the hospitality of the Institute for Advanced Study in Princeton while writing most of this book. My wife Kati Vesztergombi has not only contributed to the content, but has provided invaluable professional, technical and personal help all the time. Many other colleagues have very unselfishly offered their expertise and advice during various phases of our research and while writing this book. I am particularly grateful to Miklós Abért, Noga Alon, Endre Csóka, Gábor Elek, Guus Regts, Svante Janson, Dávid Kunszenti-Kovács, Gábor Lippner, Russell Lyons, Jarik Nešetřil, Yuval Peres, Oleg Pikhurko, the late Oded Schramm, Miki Simonovits, Vera Sós, Kevin Walker, and Dominic Welsh. Without their interest, encouragement and help, I would not have been able to finish my work.
Part 1
Large graphs: an informal introduction
CHAPTER 1
Very large networks

1.1. Huge networks everywhere

In the last decade it became apparent that a large number of the most interesting structures and phenomena of the world can be described by networks: often the system consists of discrete, well separable elements, with connections (or interactions) between certain pairs of them. To understand the behavior of the whole system, one has to study the behavior of the individual elements as well as the structure of the underlying network. Let us see some examples.

• Among very large networks, probably the best known and the most studied is the internet. Moreover, the internet (as the underlying physical network) gives rise to many other networks: the network of hyperlinks (the web, or logical internet), internet-based social networks, distributed databases, etc. The size of the internet is growing fast: currently the number of web pages may be 30 billion (3 × 10^10) or more, and the number of interconnected devices is probably more than a billion. The graph-theoretic structure of the internet determines, to a large degree, how communication protocols should be designed, how likely certain parts are to get jammed, how fast computer viruses spread, etc.

• Social networks are basic objects of many studies in sociology, history, epidemiology and economics. They are not necessarily formally established, like Facebook and other internet networks: the largest social network is the acquaintance graph of all living people, with about 7 billion nodes. The structure of this acquaintance graph determines, among other things, how fast news, inventions, religions and diseases spread over the world, now and throughout history.

• Biology contributes ecological networks, networks of interactions between proteins, and the human brain, just to mention a few. The human brain, a network of neurons, is really large for its mass, having about a hundred billion nodes. One of the greatest challenges is, of course, to understand ourselves.

• Statistical physics studies the interactions between large numbers of discrete particles, where the underlying structure is often described by a graph. For example, a crystal can be thought of as a graph whose nodes are the atoms and whose edges represent chemical bonds. A perfect crystal is a rather boring graph, but impurities and imperfections create interesting graph-theoretical digressions. Ten grams of diamond has about 5 × 10^23 nodes. The structure of a crystal influences important macroscopic properties, like whether the material is magnetizable or how it melts.

• Some of the largest networks in engineering occur in chip design. There can be more than a billion transistors on a chip nowadays. Even though these networks are man-made and carefully designed, many of their properties, like
the exact time they will need to perform some computation, are difficult to determine from their design, due to their huge size.

• To be pretentious, we can say that the whole universe is a single (really huge, possibly infinite) network, where the nodes are events (interactions between elementary particles) and the edges are the particles themselves. This is a network with perhaps 10^80 nodes. It is an ongoing debate in physics how much additional structure the universe has, but perhaps understanding the graph-theoretical structure of this graph can help with understanding the global structure of the universe.

These huge networks pose exciting challenges for the mathematician. Graph theory (the mathematical theory of networks) has been one of the fastest developing areas of mathematics in the last decades; with the appearance of the internet, however, it faces fairly novel, unconventional problems. In traditional graph-theoretical problems the whole graph is exactly given, and we are looking for relationships between its parameters or for efficient algorithms computing its parameters. On the other hand, very large networks (like the internet) are never completely known, and in most cases they are not even well defined. Data about them can be collected only by indirect means like random local sampling, or by monitoring the behavior of various global processes.

Dense networks (in which a node is adjacent to a positive percentage of the other nodes) and very sparse networks (in which a node has a bounded number of neighbors) show very different behavior. From a practical point of view, sparse networks are more important, but at present we have more complete theoretical results for dense networks. In this introduction, most of the discussion will focus on dense graphs; we will survey the additional challenges posed by sparse networks in Section 1.7.

1.2. What to ask about them?

Think of a really large graph, say the internet, and try to answer the following four simple questions about it.

Question 1. Does the graph have an odd or even number of nodes?

This is a very basic property of a graph in the classical setting. For example, it is one of the first theorems or exercises in a graph theory course that every graph with an odd number of nodes must have a node with even degree. But for the internet, this question is clearly nonsense. Not only does the number of nodes change all the time, with devices going online and offline, but even if we fix a specific time like 12:00am today, it is not well-defined: there will be computers just in the process of booting up, breaking down, etc.

Question 2. What is the average degree of the nodes?

This, on the other hand, is a meaningful question. Of course, the average degree can only be determined with a certain error, and it will change as technology or the social composition of users changes; but at a given time, a good approximation can be sought (I am not speaking now about how to find it).

Question 3. Is the graph connected?

To this question, the answer is almost certainly no: somewhere in the world there will be a faulty router with some unhappy users on the wrong side of it. But this is not the interesting way to interpret the question: we should consider the
internet “disconnected” if, say, an earthquake combined with a solar flare severs all connections between the Old and New Worlds. So we want to ignore small components that are negligible in comparison with the whole graph, and consider the graph “disconnected” only if it decomposes into two parts which are commensurable with the whole. On the other hand, we may want to allow that the two large parts be connected by very few edges, and still consider the graph “disconnected”.

Question 4. Where is the largest cut in the graph? (This means finding the partition of the nodes into two classes that maximizes the number of edges connecting the two classes.)

This example shows that even if the question is meaningful, it is not clear in what form we can expect the answer. We can ask for the fraction of edges contained in the largest cut (depending on the model, this can be determined relatively easily, with an error that is small with high probability, although it is not easy to prove that the algorithm works). But suppose we want to “compute” the largest cut itself; how do we return the result, i.e., how do we specify the largest cut (or even an approximate version of it)? We cannot just list all the nodes and tell for each on which side it belongs: this would take too much time and memory. Is there a better way to answer the question?

1.3. How to obtain information about them?

If we face a large network (think of the internet), the first challenge is to obtain information about it. Often, we don’t even know the number of nodes.

1.3.1. Sampling. Properties of very large graphs can be studied by randomly sampling small subgraphs. The theory of this, a sort of statistics where we work with graphs instead of numbers, is called property testing in computer science. Initiated by Goldreich, Goldwasser and Ron [1998], this theory emerged in the last 15-20 years, and it will be one of the main areas of application of the methods developed in this book.

In the case of dense graphs G, it is simple to describe a reasonably realistic sampling process: we select independently a fixed number k of random nodes, and determine the edges between them, to get a random induced subgraph (Figure 1.1). We have to assume, of course, that we have methods to select a uniformly distributed random node of the graph, and to determine whether two nodes are adjacent. We’ll call this subgraph sampling. For each graph F on [k] = {1, 2, . . . , k}, there is a certain probability of seeing F when k nodes are sampled, which we denote by σ_{G,k}(F). So every graph G defines a probability distribution σ_{G,k} on all graphs with k nodes.

It turns out that this sample contains enough information to determine many properties and parameters of the graph, with some error of course. This error can be made arbitrarily small with high probability if we choose the sample size k sufficiently large, depending on the error bound (and only on the error bound, not on the graph!).

One may try to strengthen this and allow the sampling process to be repeated a bounded number of times. This would not give anything new, however: sampling k nodes r times gives less information than sampling kr nodes once. For clarity, it is sometimes better to describe algorithms saying that we repeat a certain sampling process, but this could always be replaced by taking a single sample (larger, but still of bounded size).
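To make subgraph sampling concrete, here is a minimal Python sketch — our own code, not from the book. The graph representation (a dict mapping each node to its set of neighbors) and the function names are assumptions of ours; for simplicity, samples are keyed by their number of induced edges, which determines the sample up to isomorphism only when k ≤ 3.

```python
import random
from itertools import combinations

def sample_induced_subgraph(adj, k, rng=random):
    """Select k distinct nodes uniformly at random from the graph `adj`
    (a dict: node -> set of neighbors) and return them together with
    the edges they induce."""
    nodes = rng.sample(sorted(adj), k)
    edges = [(u, v) for u, v in combinations(nodes, 2) if v in adj[u]]
    return nodes, edges

def empirical_sample_distribution(adj, k, trials=10_000, rng=random):
    """Empirical version of sigma_{G,k}; samples are keyed by their number
    of induced edges (a complete isomorphism invariant when k <= 3)."""
    freq = {}
    for _ in range(trials):
        _, edges = sample_induced_subgraph(adj, k, rng)
        freq[len(edges)] = freq.get(len(edges), 0) + 1
    return {m: c / trials for m, c in sorted(freq.items())}

# Example: for an Erdos-Renyi graph G(300, 0.3), the k = 3 sample
# distribution should be close to Binomial(3, 0.3) over the edge counts.
n, p = 300, 0.3
adj = {i: set() for i in range(n)}
for i, j in combinations(range(n), 2):
    if random.random() < p:
        adj[i].add(j); adj[j].add(i)
print(empirical_sample_distribution(adj, 3))
```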
[Figure 1.1. Sampling from a dense graph and from a graph with bounded degree.]

1.3.2. Global observables. Instead of taking a random subset of the nodes (sampling) and studying the subgraph induced by them, we can take a random partition of the nodes into a small number of classes, and study the “quotient”, the small graph obtained by merging the classes of the partition. (This will have very large edge multiplicities, which we have to normalize appropriately.) These quotients carry information about global measurements (like the number of stable sets, the maximum cut, various quantities in statistical physics, etc.). The remarkable fact is that under the right conditions, these “global” observables carry the same information as “local” sampling (see Sections 12.3 and 20.2).

Another source of information about a very large network is the observation of the behavior of various processes on the graph through a longer time interval. The observation can be global (measurement of some global parameter) or local (at one node, or a few neighboring nodes). Observing heat propagation through a material is an example of the first kind of approach; web crawlers can be considered examples of the second, and in a sense so is our observation of the universe. There are some sporadic results about the local observation of simple random processes (Benjamini and Lovász [2002], Benjamini, Kozma, Lovász, Romik, and Tardos [2006]), but a general theory of such local observation of global processes has not emerged yet.

1.3.3. Left and right homomorphisms. In theoretical studies, it is often more convenient to talk about homomorphisms (adjacency-preserving maps) between graphs, instead of looking at randomly chosen induced subgraphs. For two finite simple graphs F and G, let hom(F, G) denote the number of homomorphisms of F into G (adjacency-preserving maps from V(F) to V(G)). We often normalize these homomorphism numbers, to get homomorphism densities:

(1.1)    t(F, G) = hom(F, G) / v(G)^{v(F)}.
This number is the probability that a random map of V(F) into V(G) preserves adjacency. (We denote by V(F) and E(F) the sets of nodes and edges of the graph F, respectively, and their cardinalities by v(F) = |V(F)| and e(F) = |E(F)|.) Homomorphisms will be basic tools throughout the book. We introduce them in Chapter 5 (where we survey some of our knowledge about them), but use them all the time thereafter. There will be different versions, like homomorphisms into weighted graphs, which play an important role in statistical physics (we will return to them at the end of the Introduction).
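As a computational illustration of definition (1.1), the following brute-force sketch (our own code, under our own conventions) counts homomorphisms by enumerating all v(G)^{v(F)} maps — feasible only for very small F and G:

```python
from itertools import product

def hom(F_edges, F_nodes, G_adj):
    """Count homomorphisms of F into G by brute force: try every map
    V(F) -> V(G) and keep those sending every edge of F to an edge of G."""
    G_nodes = list(G_adj)
    count = 0
    for image in product(G_nodes, repeat=len(F_nodes)):
        phi = dict(zip(F_nodes, image))
        if all(phi[v] in G_adj[phi[u]] for u, v in F_edges):
            count += 1
    return count

def t(F_edges, F_nodes, G_adj):
    """Homomorphism density (1.1): hom(F, G) / v(G)**v(F)."""
    return hom(F_edges, F_nodes, G_adj) / len(G_adj) ** len(F_nodes)

# Example: the triangle K3 has no homomorphisms into the 5-cycle C5
# (C5 contains no closed walk of length 3), so t(K3, C5) = 0.
C5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(hom([(0, 1), (1, 2), (2, 0)], [0, 1, 2], C5))  # 0
```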
Homomorphism densities can be expressed in terms of the distribution of samples, and vice versa (at least asymptotically, as the size of G tends to infinity). For example, let us consider the homomorphism density of the quadrilateral C_4 in a large graph G. If we map the four nodes of C_4 into G, it may be that the images are all different, and so the image is a quadrilateral. It could also happen that the images of two nodes coincide. This cannot happen to adjacent nodes, because the image of an edge must be an edge; but it may happen to two opposite nodes. In this case, the image is a “V” (a path with two edges). Or it can happen that both pairs of opposite nodes have the same image, and the image is a single edge. If we know the numbers of edges, V’s and quadrilaterals in G, then we can compute the number of homomorphisms of the quadrilateral into G. (Warning: the same quadrilateral in G can be the image in 8 different ways, the same V in 4 ways, and the same edge in 2 ways! A computational check of this counting identity follows at the end of this section.) The numbers of quadrilaterals, V’s and edges can be estimated by sampling. In fact, the last two will not matter much for very large graphs G, since a random map of 4 elements into v(G) elements will be one-to-one with high probability.

So homomorphism densities and sampling distributions carry the same information; why bother to introduce both? Using homomorphisms has several advantages (and some disadvantages).

• Homomorphism numbers are better behaved algebraically, and they have been used before to study various algebraic questions concerning the direct product of graphs, like cancellation laws (see Section 5.4.2). Furthermore, a lot is known about other issues concerning homomorphisms: existence, structure, etc.

• When looking at a (large) graph G, we may try to study its local structure by counting homomorphisms from various “small” graphs F into G; we can also study its global structure by counting homomorphisms from G into various small graphs H. The first type of information is closely related (in many cases, equivalent) to sampling, while the second is related to global observables. In this way homomorphisms point to a certain duality between sampling and global observation. We can sum up our framework for studying large graphs in the following formula:

F −→ G −→ H.

We will informally talk about “left-homomorphisms” and “right-homomorphisms” to refer to these two kinds of mappings.

• We will characterize which distributions come from sampling k nodes from a (large) graph G, and we will characterize homomorphism densities as well. It turns out that the characterization of sample distributions is simpler and more natural; put another way, the characterization of homomorphism densities is more surprising, and therefore has more interesting applications.

• Using homomorphisms leads us to looking at things through the spectacles of category theory, and this point of view is very fruitful. For example, sometimes one can simply “turn arrows around” and get new results almost for free. We will say more about this generalization to categories near the end of the book, in Section 23.4.
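The counting argument for C_4 above amounts to the identity hom(C_4, G) = 8·(number of quadrilaterals) + 4·(number of V’s) + 2·(number of edges). The following sketch — our own code; the Petersen graph data is transcribed from Figure 1.3 — verifies it numerically:

```python
from itertools import permutations, product

def hom_C4(adj):
    """hom(C4, G): closed walks v0 ~ v1 ~ v2 ~ v3 ~ v0, by brute force."""
    V = list(adj)
    return sum(1 for a, b, c, d in product(V, repeat=4)
               if b in adj[a] and c in adj[b] and d in adj[c] and a in adj[d])

def counting_identity(adj):
    """8*(#quadrilaterals) + 4*(#V's) + 2*(#edges)."""
    V = list(adj)
    m = sum(len(adj[v]) for v in V) // 2                          # edges
    vees = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in V)   # 2-edge paths
    quads = sum(1 for a, b, c, d in permutations(V, 4)            # 4-cycles,
                if b in adj[a] and c in adj[b]                    # each counted
                and d in adj[c] and a in adj[d]) // 8             # 8 times
    return 8 * quads + 4 * vees + 2 * m

# Sanity check on the Petersen graph (rows of its adjacency matrix):
rows = ["0100110000", "1010001000", "0101000100", "0010100010", "1001000001",
        "1000000110", "0100000011", "0010010001", "0001011000", "0000101100"]
petersen = {i: {j for j, b in enumerate(r) if b == "1"} for i, r in enumerate(rows)}
assert hom_C4(petersen) == counting_identity(petersen)  # both equal 150
```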
1.4. How to model them?

1.4.1. Random graphs. We celebrated the 50th birthday of random graphs recently: the simplest random graph model was developed by Erdős and Rényi [1959] and Gilbert [1959]. Given a positive integer n and a real number 0 ≤ p ≤ 1, we generate a random graph G(n, p) by taking n nodes, say [n] = {1, . . . , n}, and connecting any two of them with probability p, making an independent decision about each pair.

There are alternative models, which are essentially equivalent from the point of view of many properties. Two of these were introduced in the early papers of Erdős and Rényi [1959, 1960]: We could fix the number of edges m, and then choose a random m-element subset of the set of pairs in [n], uniformly from all such subsets. This random graph G(n, m) has very similar properties to G(n, p) when m = p·(n choose 2). Another model, closer to some of the more recent developments, is that of evolving random graphs, where edges are added one by one, always choosing uniformly from the set of unconnected pairs. Stopping this process after m steps, we get G(n, m).

Random graphs have many interesting, often surprising properties, and a huge literature; see Bollobás [2001], Janson, Łuczak and Ruciński [2000], or Alon and Spencer [2000]. One piece of conventional wisdom about random graphs with a given edge density is that they are all alike: their basic parameters, like chromatic number, maximum clique, triangle density, spectrum, etc., are highly concentrated. This fact will be an important motivation when defining the right measure of global similarity of two graphs in Chapter 8.

Many generalizations of this random graph model have been studied. For example, we can consider a “template” for the random graph in the form of a weighted graph H on q nodes, with a weight α_i > 0 associated with each node and a weight 0 ≤ β_ij ≤ 1 associated with each edge ij. We assume that the nodeweights sum to 1. We may also assume that H is complete with a loop at every node, since the missing edges can be added with weight 0. A multitype random graph G(n; H) with template H is generated as follows: We take [n] = {1, . . . , n} as its node set, where we think of n as a much larger number than q. We partition [n] into q sets V_1, . . . , V_q by putting node u in V_i with probability α_i, and connect each pair u ∈ V_i and v ∈ V_j with probability β_ij (all these decisions are made independently). While multitype random graphs are too close to the original Erdős–Rényi model to be useful as, say, internet models, they play an extremely important role by serving as simple objects approximating arbitrarily large graphs (see Section 1.5.2 and Chapter 9).

More generally, one could assign different probabilities to different edges (this was suggested by Erdős and Rényi already in their second paper [1960]). The construction we will use a lot, namely constructing a random graph from a symmetric measurable function [0, 1]^2 → [0, 1], is a related idea, discovered independently several times — first, probably, by Diaconis and Freedman [1981]. These random graphs, which we call W-random, will be discussed in Section 10.1 and will play an important role throughout the book.
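A minimal generator for the multitype random graph G(n; H) just described might look as follows (a sketch under our own conventions; the function name and the dense double loop are our choices):

```python
import random

def multitype_random_graph(n, alpha, beta, rng=random):
    """Generate G(n; H) for a template H with node weights alpha[i]
    (summing to 1) and edge weights 0 <= beta[i][j] <= 1: assign each
    node a type with distribution alpha, then join u, v independently
    with probability beta[type(u)][type(v)]."""
    q = len(alpha)
    types = rng.choices(range(q), weights=alpha, k=n)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < beta[types[u]][types[v]]:
                edges.add((u, v))
    return types, edges

# Example template: two equal-weight types, dense within, sparse across.
alpha = [0.5, 0.5]
beta = [[0.8, 0.1],
        [0.1, 0.8]]
types, edges = multitype_random_graph(500, alpha, beta)
```

In the modern literature this template construction is known as the stochastic block model.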
1.4.2. Quasirandom graphs.

Deterministic objects that look and behave like randomly generated ones are important in various branches of science. For
example, pseudorandom number generators are basic algorithms in computer science, with many applications in Monte Carlo algorithms, computer security and elsewhere. Exact definitions are usually difficult to give, and they vary according to need. It is very remarkable that in graph theory it is possible to give a very robust definition of quasirandom graphs, where many related or even seemingly quite different formalizations of properties of random graphs capture the same notion. We know that random graphs have a variety of quite strict properties (with high probability); it turns out that for several of these basic properties, the exceptional graphs are the same. In other words, any of these properties implies the others, regardless of any stochastic consideration. A measure of quasirandomness of a graph was introduced by Thomason [1987]; the theory of quasirandom graph sequences, which has been an important example for the convergent graph sequences central to this book, was developed by Chung, Graham and Wilson [1989].

To make this idea precise, we consider a sequence of graphs (G_n) with v(G_n) → ∞. For simplicity of notation, assume that v(G_n) = n. Let 0 < p < 1 be a real number. Consider the following properties of the sequence of graphs:

(QR1) Almost all degrees are asymptotically pn, and almost all codegrees (numbers of common neighbors of two nodes) are asymptotically p^2·n.

(QR2) For every fixed graph F, the number of homomorphisms of F into G_n is asymptotically p^{e(F)}·n^{v(F)}.

(QR3) The number of edges is asymptotically p·n^2/2, and the number of 4-cycles is asymptotically p^4·n^4/8. (We have to divide by 2 and 8 because we are counting unlabeled copies rather than homomorphisms.)

(QR4) The number of edges induced by any set of n/2 nodes is asymptotically p·n^2/8.

(QR5) For any two disjoint sets X, Y of nodes, the number of edges between X and Y is p|X||Y| + o(n^2).

All these properties hold with probability 1 if G_n = G(n, p). However, more is true: if a graph sequence satisfies any one of them, then it satisfies all. Perhaps the most surprising fact along these lines is the equivalence of the second and third: prescribing the right asymptotic number of copies in G_n for just two small graphs (the edge and the 4-cycle) forces every other simple graph to have (asymptotically) the right number of copies. Such graph sequences are called quasirandom. The five properties above are only a sampler; there are many other properties of random graphs that are also equivalent to these (Chung, Graham and Wilson [1989], Simonovits and Sós [1991, 1997, 2003]).

Many interesting deterministic graph sequences are quasirandom. We mention an important example from number theory:

Example 1.1 (Paley graphs). Let q be any prime congruent to 1 modulo 4, and let us define a graph on {0, . . . , q − 1} by connecting i and j if and only if i − j is a quadratic residue modulo q. We construct this graph for every such prime, and order them in a sequence. This graph sequence is quasirandom with density 1/2. (The first property above is perhaps the easiest to verify; see Exercise 1.2.)

This example also illustrates how some of the equivalent conditions above may be much easier to verify than others: in this case, the third would be as easy as the verification of the first, but the second and fourth would be quite difficult. How would you count the number of copies of, say, the Petersen graph in a Paley graph? How would you count the number of those differences that are quadratic residues between, say, square-free integers in {0, . . . , q − 1}? When posed directly, these questions sound formidable; but the equivalence of the above conditions provides answers to them. A computational illustration follows below.
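For instance, property (QR1) with p = 1/2 can be checked directly for a single Paley graph. A small sketch, assuming only the definition in Example 1.1; the exact degree and codegree values printed below are classical facts about Paley graphs:

```python
def paley_graph(q):
    """Paley graph on Z_q for a prime q = 1 (mod 4): i ~ j iff i - j is a
    nonzero quadratic residue mod q.  (q = 1 mod 4 makes -1 a residue,
    so the relation is symmetric.)"""
    residues = {(x * x) % q for x in range(1, q)}
    return {i: {j for j in range(q) if j != i and (i - j) % q in residues}
            for i in range(q)}

# Check (QR1) with p = 1/2 for q = 101: every degree is exactly (q-1)/2,
# and every codegree is within 1 of q/4.
q = 101
G = paley_graph(q)
print({len(G[i]) for i in range(q)})           # {50}: exactly (q-1)/2
print({len(G[0] & G[j]) for j in range(1, q)}) # {24, 25}: close to q/4
```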
We should emphasize that in this setting, quasirandomness is a property of a sequence of graphs, not of a single graph. Of course, one could introduce a measure of deviation from the “ideal” quasirandomness in each of the conditions (QR1)–(QR5), and prove explicit relationships between them. Since our interest is the limit theory, we will not go in this direction.

Sometimes we need to consider quasirandom bipartite graphs, which can be defined, mutatis mutandis, by any of the properties above. More generally, just as there are multitype random graphs, there are also multitype quasirandom graph sequences. Similarly to random graphs, a multitype quasirandom graph sequence (G_n) is defined by a “template” weighted graph H on q nodes, with nodeweights α_i > 0 and edgeweights β_ij. The sequence is multitype quasirandom with template H if the node set V(G_n) can be partitioned into q sets V_1, . . . , V_q such that |V_i| ∼ α_i·v(G_n), the subgraphs G_n[V_i] induced by the V_i form a quasirandom sequence for every i ∈ [q], and the bipartite subgraphs G_n[V_i, V_j] between V_i and V_j form a quasirandom bipartite graph sequence for each pair i ≠ j ∈ [q]. The same remark applies as for multitype random graphs: they play an extremely important role by serving as simple objects approximating arbitrarily large graphs. The equivalence of conditions (QR1)–(QR5) can be generalized appropriately (with a larger, but finite, set of graphs in (QR3) instead of just two), as will be discussed in Section 16.7.1. The main topic of the book, the theory of convergent graph sequences, can be considered a further, rather far-reaching generalization of quasirandom sequences.

1.4.3. Randomly growing graphs. Random graph models on a fixed set of nodes, discussed above, fail to reproduce important properties of real-life networks. For example, the degrees of Erdős–Rényi random graphs follow a binomial distribution, and so they are asymptotically normal if the edge probability p is a constant, and asymptotically Poisson if the expected degree is constant (i.e., p = p(n) ∼ c/n). In either case, the degrees are highly concentrated around the mean, while the degrees of real-life networks tend to obey the “Zipf phenomenon”: the tail of the distribution decreases according to a power law (unlike the most familiar distributions like the Gaussian, geometric or Poisson, whose tail probabilities drop exponentially; see Figure 1.2).

In 1999 Albert and Barabási created a new random network model. Perhaps its main new feature compared with the Erdős–Rényi graph evolution model is that not only edges, but also nodes are added, by natural rules of growing. When a new node is added, it connects itself to a given number d of old nodes, where each neighbor is selected randomly, with probability proportional to its degree. (This random selection is called preferential attachment.) The Albert–Barabási graphs reproduce the “heavy tail” behavior of the degree sequences of real-life graphs.
Since then, a great variety of growing network models have been introduced, reproducing these and other empirical properties of real-life networks. A minimal version of preferential attachment is sketched below.
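This sketch is our own simplification (the seed graph and the way repeated neighbors are handled are details the various published models treat differently):

```python
import random
from collections import Counter

def preferential_attachment(n, d, rng=random):
    """Grow a graph in the spirit of the Albert-Barabasi model: start from
    a complete graph on d + 1 nodes; each later node attaches to d distinct
    old nodes chosen with probability proportional to their current degree.
    The list `urn` holds each node once per incident edge, so a uniform
    draw from it is exactly a degree-proportional draw."""
    edges = [(i, j) for i in range(d + 1) for j in range(i + 1, d + 1)]
    urn = [v for e in edges for v in e]
    for new in range(d + 1, n):
        targets = set()
        while len(targets) < d:          # resample until d distinct neighbors
            targets.add(rng.choice(urn))
        for t in targets:
            edges.append((t, new))
            urn += [t, new]
    return edges

# The degree sequence is heavy-tailed; compare with G(n, p) of equal density.
g = preferential_attachment(10_000, 3)
deg = Counter(v for e in g for v in e)
print(max(deg.values()))  # typically in the hundreds, far above the mean ~6
```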
[Figure 1.2. Degree distributions of an Erdős–Rényi random graph on 100 nodes with edge density 0.1 (left) and of a real-life graph with similar parameters (right). The main feature to observe about the latter is not that the largest frequency is 1, but that it is much more stretched out.]

This is perhaps the first point that suggests one of our main tools, namely assigning limits to sequences of graphs. Just as the Law of Large Numbers tells us that adding up more and more independent random variables yields an increasingly deterministically behaving number, these growing graph sequences tend to have a well-defined structure for almost all of the possible random choices along the way. In the limit, the randomness disappears, and the asymptotic behavior of the sequence can be described by a well-defined limit object. We will return to this in Sections 1.5.3 and 11.3.

Exercise 1.2. Prove that the sequence of Paley graphs is quasirandom.
1.5. How to approximate them?

If we want to experiment with a large network (say, try out a new protocol for the internet), then it is good to have a “scaled down” version of it. In other words, we want a compact approximate description of a very large network, from which a network similar to the original, but of suitable size, can be generated. To make this mathematically precise, we need to define what we mean by two graphs being “similar” or “close”, and describe what kind of structures we use for approximation.

1.5.1. The distance of two graphs. There are many ways of defining the distance of two graphs G and G′. Suppose that the two graphs have a common node set [n]. Then a natural notion of distance is the edit distance, defined as the number of edges to be changed to get from one graph to the other. This could also be viewed as the Hamming distance |E(G) △ E(G′)| of the edge sets (△ denotes symmetric difference). Since our graphs are very large, we want to normalize this. If the graphs are dense, then a natural normalization is

d_1(G, G′) = |E(G) △ E(G′)| / n^2.

While this distance plays an important role in the study of testable graph properties, it does not reflect structural similarity well. To raise one objection, consider two
random graphs on [n] with edge density 1/2. As mentioned in the introduction, these graphs are very similar from almost every aspect, but their normalized edit distance is large (about 1/2 with high probability). One might try to decrease this by relabeling one of them to get the best overlay minimizing the edit distance, but the improvement would be marginal (tending to 0 as n tends to infinity).

Another trouble with the notion of edit distance is that it is defined only when the two graphs have the same set of nodes. We want to define a notion of distance for two graphs that are so large that we don’t even know the number of their nodes, and these numbers might be very different. For example, we want to find that two large random graphs are “close” even if they have different numbers of nodes.

One useful way to overcome these difficulties is to base the measurement of distance on sampling. Recall that for a graph G, σ_{G,k} is the probability distribution on graphs on [k] = {1, 2, . . . , k} obtained by selecting a random ordered k-subset of nodes and taking the subgraph induced by them. Strictly speaking, this is only defined when k ≤ v(G); but we are interested in taking a small sample from a large graph, not the other way around. To make the definition precise, let us say that the sampling returns the edgeless k-node graph if k > v(G). (In this case it would be a better solution to sample with repetition, but sampling without repetition is better in other cases, so let us stick to it.)

Now if we have two graphs G and G′, we can compare the distributions of k-node samples for any fixed k. We use the variation distance between distributions α and β on the same set, defined by

d_var(α, β) = sup_X |α(X) − β(X)|,

where the supremum is taken over all measurable subsets (observable events). If we want to measure the distance of two graphs by a single number, we use a simple trick known from analysis: We define the sampling distance of two dense graphs G and G′ by

(1.2)    δ_samp(G, G′) = Σ_{k=1}^{∞} (1/2^k) · d_var(σ_{G,k}, σ_{G′,k}).

(Here the coefficients 1/2^k are quite arbitrary; they are there only to make the sum convergent, but the above is a convenient choice.)
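In practice the sum (1.2) must be truncated and the distributions σ_{G,k} estimated empirically. The following sketch is our own code, not the book's: samples are keyed by their induced edge count, which identifies them up to isomorphism only for k ≤ 3, and total variation distance is computed as half the L1 distance between the empirical distributions.

```python
import random
from itertools import combinations

def empirical_sigma(adj, k, trials, rng=random):
    """Empirical sample distribution sigma_{G,k}, keyed by the number of
    induced edges (a complete isomorphism invariant for k <= 3)."""
    freq, nodes = {}, sorted(adj)
    for _ in range(trials):
        S = rng.sample(nodes, k)
        m = sum(1 for u, v in combinations(S, 2) if v in adj[u])
        freq[m] = freq.get(m, 0) + 1
    return {m: c / trials for m, c in freq.items()}

def sampling_distance(adj1, adj2, kmax=3, trials=20_000, rng=random):
    """Truncation of (1.2): sum over k <= kmax of 2**-k times the total
    variation distance between the empirical sample distributions."""
    total = 0.0
    for k in range(1, kmax + 1):
        s1 = empirical_sigma(adj1, k, trials, rng)
        s2 = empirical_sigma(adj2, k, trials, rng)
        keys = set(s1) | set(s2)
        dvar = sum(abs(s1.get(m, 0) - s2.get(m, 0)) for m in keys) / 2
        total += dvar / 2 ** k
    return total
```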
This distance notion is very suitable for our general goals, since two graphs are close in this distance if and only if random sampling of “small” induced subgraphs does not distinguish them reliably. However, the sampling distance has one drawback: it does not directly reflect any structural similarity. In Chapter 8 we will define a notion of distance between graphs, called the cut distance, which will be satisfactory from all these points of view: it will be defined for two graphs with possibly different numbers of nodes, the distance of two random graphs with the same edge density will be very small, and it will reflect global structural similarity. The definition involves too many technical details to be given here, unfortunately. But it will turn out (and this is one of the main results in this book) that the cut distance is equivalent to the sampling distance in a topological sense.

1.5.2. Approximation by smaller: Regularity Lemmas. Let us return to the question of “scaling down” a huge graph, first in the dense case. The main tool for doing so is the “Szemerédi Partition” or “Regularity Lemma”. Szemerédi developed the first version of the Regularity Lemma for his celebrated proof of the Erdős–Turán Conjecture on arithmetic progressions in dense sets of integers in 1975. Since then, the Lemma has emerged as a fundamental tool in graph theory, with many applications in extremal graph theory, combinatorial number theory, graph property testing, etc., and it became a true focus of research in recent years.

Informally, the Regularity Lemma says that every graph can be approximated by a multitype quasirandom graph, where the number of classes depends only on the error of the approximation. This lemma can be viewed as an archetypal example of the dichotomy between randomness and structure, where we try to decompose a (large and complicated) object A into a more highly structured object A′ with a (quasi)random perturbation (cf. Tao [2006c]). The highly structured part may be easier to handle because of the structure, and the quasirandom part will often be easier to handle due to Laws of Large Numbers.

Pixel pictures. In this introductory part, we want to illustrate the idea of a regularity partition visually. To this end, let us introduce a non-standard way of visualizing graphs. On the left of Figure 1.3 we see a graph (the Petersen graph). In the middle, we see its adjacency matrix. On the right, we see another version of its adjacency matrix, where the 0’s are replaced by white squares and the 1’s are replaced by black squares. We think of the whole picture as the unit square, so the little squares have side length 1/n, where n is the number of nodes. The origin is in the upper left corner, following the convention of indexing matrix elements.
0 1 0 0 1 1 0 0 0 0
1 0 1 0 0 0 1 0 0 0
0 1 0 1 0 0 0 1 0 0
0 0 1 0 1 0 0 0 1 0
1 0 0 1 0 0 0 0 0 1
1 0 0 0 0 0 0 1 1 0
0 1 0 0 0 0 0 0 1 1
0 0 1 0 0 1 0 0 0 1
0 0 0 1 0 1 1 0 0 0
0 0 0 0 1 0 1 1 0 0

[Figure 1.3. The Petersen graph, its adjacency matrix (shown above), and its pixel picture.]

It is not clear that this pixel picture reveals more about small graphs than the usual way of drawing them (probably less), but it can be suggestive for large graphs. Figure 1.4 shows the usual drawing and the pixel picture of a half-graph, a bipartite graph defined on the set {1, . . . , n, 1′, . . . , n′}, where the edges are the pairs (i, j′) with i ≤ j. For large n, the pixel picture of a half-graph may be more informative, as we will see in the next section.

The left square in Figure 1.5 is the pixel picture of a (reasonably large) random graph. We don’t see much structure — and we shouldn’t. From a distance, this picture is more-or-less uniformly grey, similar to the second square. The 100 × 100 chessboard in the third picture is also uniformly grey, or at least it would become so if we increased the number of pixels sufficiently.
[Figure 1.4. A half-graph and its pixel picture.]

One might think that the chessboard represents a graph that is close to the random graph. But rearranging the rows and columns so that the odd-indexed columns come first, we get the 2 × 2 chessboard on the right! So we see that both the middle and the right-hand pictures represent a complete bipartite graph. The pixel picture of a graph depends on the ordering of the nodes. We can be reassured, however, that a random graph remains random no matter how we order the nodes, and so the picture on the left remains uniformly grey, no matter how the nodes are ordered.

[Figure 1.5. A random graph with 100 nodes and edge density 1/2, a random graph with very many nodes and edge density 1/2, a chessboard, and the pixel picture obtained by rearranging the rows and columns.]

Remark 1.3. Using pixel pictures to represent graphs, in particular random graphs, goes in a sense in the opposite direction to what was studied in the psychology of vision. Of course, processing images given by pixel pictures has been a fundamental issue in connection with computer graphics and related areas, and we are not going into this issue in this book. But we should mention the work of Julesz, who studied the question of how well the human eye can distinguish random noise (like Figure 1.5(a)) from images that are also uniformly grey but more structured (textured). The chessboard in Figure 1.5(b) would be a trivial example of such an image. Disproving some of his conjectures, Diaconis and Freedman [1981] constructed pixel pictures that are very closely related to our W-random graphs.

The Regularity Lemma. We illustrate the Regularity Lemma by Figure 1.6. The graph on the left side (given by its pixel picture) looks quite random. In the middle we see the same graph, with its nodes ordered differently. In this picture, we see some structure in the graph (even though it is not as clear-cut as in Figure 1.5); what we see is that the upper left corner is denser and the lower right corner is sparser. If we cut the picture into four equal parts and average the “blackness” in each, we get the picture on the right (a small computational sketch of this averaging follows below). Inside each of the four parts, the arrangement is quite random-like, and further rearrangement would not reveal any additional structure.

Still informally, the Regularity Lemma says the following: The nodes of every graph can be partitioned into a “small” number of “almost equal” parts in such a way that for “almost all” pairs of partition classes, the bipartite graph between them is “quasirandom”.

[Figure 1.6. A random-looking pixel picture, an informative rearrangement, and its regularity partition.]

Some of the expressions in quotation marks are easy to explain. For the whole theorem, we have an error bound 0 < ε < 1 specified in advance. The condition that the parts are “almost equal” means that their sizes differ by at most one: if the graph has n nodes partitioned into k classes, then the size of each class is either ⌊n/k⌋ or ⌈n/k⌉. The condition that the number of classes is “small” means that it can be bounded by an explicit function f(ε) of ε; to exclude trivialities, we also assume that k ≥ 1/ε. “Almost all” pairs of classes means that we allow ε·(k choose 2) exceptional pairs about which we don’t claim anything (we can include the subgraphs induced by the classes among these exceptions). Finally, we need to define what it means to be “random-like”: one way to put it is that this bipartite graph is quasirandom with some density p_ij (which may be different for different pairs of classes) and with error ε, in the sense introduced (informally) in Section 1.4.2.

Regularity partitions and quasirandomness have a lot to do with each other. Not only is quasirandomness part of the statement of the Regularity Lemma, but the Regularity Lemma can be used to characterize quasirandomness: Simonovits and Sós [1991] proved that a graph sequence is quasirandom with density p if and only if the graphs have regularity partitions for arbitrarily small ε > 0 such that the densities p_ij between the partition classes tend to p.

I have to come back to the “small” number of partition classes. The proof gives a tower of twos 2^2^···^2 of height 1/ε^5, which is a very large number, and which unfortunately cannot be improved too much, since Gowers [1997] constructed graphs for which the smallest number of classes in a Szemerédi partition is at least a tower of height log(1/ε). So the tower behavior is a sad fact of life. There are related partitions with a more decent number of classes, as we shall see in Chapter 9, where regularity partitions will be defined formally. We will also discuss situations where the regularity partitions have a very decent size, like 1/ε^const (Sections 13.4 and 16.7). Implicitly or explicitly, regularity partitions will be used throughout this book.
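The “averaging of blackness” over the parts is just the computation of the density matrix of the partition — the quotient already mentioned in Section 1.3.2. A minimal sketch, with graph representation and function name of our own choosing:

```python
from itertools import combinations

def block_densities(adj, classes):
    """For a partition of the nodes into classes V_1..V_q (lists of nodes),
    return the q x q matrix of edge densities d(V_i, V_j).  Informally, a
    regularity partition is one for which, within almost every block, the
    graph looks quasirandom with exactly this density."""
    q = len(classes)
    dens = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            if i == j:
                pairs = list(combinations(classes[i], 2))
            else:
                pairs = [(u, v) for u in classes[i] for v in classes[j]]
            if pairs:
                dens[i][j] = sum(v in adj[u] for u, v in pairs) / len(pairs)
    return dens
```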
1.5.3. Approximation by infinite: convergence and limits. This idea can be motivated by how we look at a large piece of metal. This is a crystal, that is, a really large graph consisting of atoms and bonds between them. But from many points of view (e.g., the use of the metal in building a bridge), it is more useful to consider it as a continuum with a few important parameters (density, elasticity, etc.). Its behavior is governed by differential equations relating these parameters. Can we consider a more general very large graph as some kind of continuum?

Our way to make this intuition precise is to consider a growing sequence (G_n) of graphs whose number of nodes tends to infinity, to define when such a sequence is convergent, and to assign a limit object to convergent graph sequences which somehow incorporates all the properties we want to be remembered. (We have mentioned this idea in connection with randomly growing graphs, but now we don’t assume anything about how the graphs in the sequence are obtained.) This plan is the backbone of this book: we will carry it out both for dense graphs and for graphs with bounded degree. There will be a good collection of applications of this work.

Our discussion of sampling from a graph suggests a general principle leading to a definition: we consider samples of a fixed size k from G_n, and their distribution. We say that the sequence is locally convergent (with respect to the given sampling method) if this distribution tends to a limit as n → ∞ for every fixed k. For dense graphs, this notion of convergence was defined by Borgs, Chayes, Lovász, Sós, and Vesztergombi [2006, 2008]; some elements of this definition go back to Erdős, Lovász and Spencer [1979]. This notion has many useful properties. Perhaps the most important of these is that it can be characterized in terms of the cut distance of graphs. It is not hard to see that the above notion of convergence is equivalent to saying that the graph sequence is a Cauchy sequence in the sampling distance. One of the main results presented in this book is Theorem 11.3, which can be stated informally as follows: The same graph sequences are convergent (Cauchy sequences) for both the cut distance and the sampling distance.

If we have a notion of convergence, the question arises naturally: what does it converge to? Can we describe a limit object for every convergent graph sequence? The family of limiting sample distributions (one for each k) can be considered as a limit object of the sequence (we call this the “weak limit”). This is not always a helpful representation of the limit object, and a more explicit description is desirable. A next step is to represent the family of distributions on finite graphs (the samples) by a single probability distribution on countable graphs: we get a certain notion of random graphs on the countable set N* = {1, 2, 3, . . . } (see Theorem 11.52). More explicit descriptions of these limit objects can also be given, in the form of a two-variable measurable function W : [0, 1]^2 → [0, 1], called a graphon (Lovász and Szegedy [2006]; see Chapter 7). These limit objects can be considered as weighted graphs with an underlying set of continuum cardinality. (If you wish, you can also think of these graphons as unweighted graphs on a non-standard model of the unit interval, where W(x, y) is the density of edges between an infinitesimal neighborhood of x and an infinitesimal neighborhood of y; this approach will be explained in Section 11.3.2.)

Random graphs with edge density 1/2 converge to the
Random graphs with edge density 1/2 converge to the
identically 1/2 function (have a look at the two squares on the left of Figure 1.5). Figure 1.7 illustrates that the sequence of half-graphs (discussed in Section 1.5.2) converges to a limit, the function W(x, y) = 1(y ≥ x + 1/2 or x ≥ y + 1/2). It has been observed and used before (see e.g. Sidorenko [1991]) that such functions can be used as generalizations of graphs, and this gives certain arguments a greater analytic flexibility.
Figure 1.7. A half-graph, its pixel picture, and the limit function

Let us describe another example here (more to follow in Section 11.4.2). The picture on the left side of Figure 1.8 is the adjacency matrix of a graph G with 100 nodes, where the 1’s are represented by black squares and the 0’s, by white squares. The graph itself is constructed by a simple randomized growing rule: starting with a single node, we repeatedly create a new node, and then connect every pair of nonadjacent nodes with probability 1/n, where n is the current number of nodes. (This construction will be discussed in detail in Section 11.4.2.)
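The growing rule is easy to simulate. The following sketch reflects our reading of the rule above (one new node per round, then independent coin flips for all currently nonadjacent pairs) and is not code from the book.

```python
import random

def uniform_attachment_graph(N):
    """Starting from a single node, repeatedly add a node and then
    connect every currently nonadjacent pair with probability 1/n,
    where n is the current number of nodes."""
    edges = set()
    for n in range(1, N + 1):      # after this round there are n nodes
        for u in range(n):
            for v in range(u + 1, n):
                if (u, v) not in edges and random.random() < 1.0 / n:
                    edges.add((u, v))
    return edges
```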
Figure 1.8. A randomly grown uniform attachment graph with 100 nodes, and a (continuous) function approximating it

The picture on the right side is a grayscale image of the function U(x, y) = 1 − max(x, y). (Recall that the origin is in the upper left corner!) The similarity with the picture on the left is apparent, and suggests that the limit of the graph sequence on the left is this function. This turns out to be the case in a well defined sense. It follows that to approximately compute various parameters of the graph on the left side, we can compute related parameters of the function on the right side. For example, the triangle density of the graph on the left tends (as n → ∞) to the integral

(1.3)    \int_{[0,1]^3} U(x, y)\, U(y, z)\, U(z, x)\, dx\, dy\, dz
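A quick way to get a numerical value for this integral, without doing the calculus, is Monte Carlo integration; the following sketch is ours, not the book’s. (Carrying out the integration by hand, we get 1/15 ≈ 0.0667, which the estimate should reproduce.)

```python
import random

def triangle_density_of_limit(samples=10**6):
    """Monte Carlo estimate of (1.3) for U(x, y) = 1 - max(x, y)."""
    U = lambda x, y: 1.0 - max(x, y)
    total = 0.0
    for _ in range(samples):
        x, y, z = random.random(), random.random(), random.random()
        total += U(x, y) * U(y, z) * U(z, x)
    return total / samples   # should be close to 1/15
```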
The evaluation of this integral is a boring but easy task. It is easy to see how to generalize this formula to express the limiting density of any fixed graph F. We hope that the examples above provide motivation for the following fact, which is one of the key results to be discussed in the book (Theorem 11.21): The limit of any convergent graph sequence can be represented by a graphon, in the sense that the limiting density of any fixed simple graph F is given by an integral of the type (1.3). Of course, a graphon can be infinitely complicated; but in many cases, limits of growing graph sequences have a limit graphon that is a continuous function described by a simple formula (see some further examples in Section 11.4.2). Such a limit graphon provides a very useful approximation of a large dense graph.

Graphons can be considered as generalizations of graphs, and this way of looking at them is very fruitful. In fact, many results can be stated and proved for graphons in a more natural and cleaner way. In particular, regularity lemmas can be extended to graphons, where we will see that they are statements about approximating general measurable functions by stepfunctions. Approximating graphs by multitype quasirandom graphs is as basic a tool in graph theory as approximating functions by stepfunctions is in analysis.

Remark 1.4. Much of this book is about finite, countable and uncountable graphs and the connections between them. There are two technical limitations of measure theory that we have to work our way around. (a) One cannot construct more than countably many independent random variables (in a nontrivial way, with none of them concentrated on a single value). This is the reason why we cannot define a random graph on an uncountable set like [0, 1], only on finite and countable subsets of it. (b) There is no uniform distribution on a countable set (while there is one on every finite set and then again on sets with continuum cardinality like [0, 1]). This limitation is connected to the fact that the limit objects for convergent graph sequences will be graphons (which could be considered as graphs defined on a continuum) rather than graphs on a countable set, as one would first expect. I want to emphasize that these difficulties are not just annoying technicalities: they reflect the fact, for example, that the limit object of a convergent graph sequence carries a lot more information than what could be squeezed into a countable graph. Both measure theory and combinatorics force us into the same realm.

1.6. How to run algorithms on them?

1.6.1. Parameter estimation. What can we learn about a huge graph G from sampling? There are several related questions here, depending on what we need as a result. The easiest setup is when we want to compute a numerical parameter of the graph; say, how large is the maximum cut, or what fraction of the triples induce a triangle. We call this problem parameter estimation. Most of the time we normalize the parameter to be between 0 and 1. Since, as discussed above, we get information about the graph through random sampling, any answer we can possibly compute will, with some probability, be in error. So we will have to specify an error parameter ε > 0, and will have to accept an answer which, with probability at least 1 − ε, will be closer than ε to the true value of the parameter. An easy example is to estimate the triangle density (the number of triangles divided by \binom{n}{3}). A trivial algorithm is to pick many random triples of nodes independently,
and count how many of them form triangles in the graph. Elementary statistics tells us that if we sample O(ε^{−2} |log ε|) triples, then with probability at least 1 − ε, our estimate will be closer than ε to the truth.

A much more interesting and difficult example is that of estimating the density a of the maximum cut (its size divided by \binom{n}{2}) in a graph G. One thing we can try is to choose N random nodes (where N depends on the error bound ε), and compute the density X of the maximum cut in the subgraph H they induce. Is X a good estimate for a? The inequality X ≥ a − ε (for every ε > 0 if N is large enough, with high probability) is relatively easy to prove. The graph G has a cut C with density a, and this cut provides a cut C′ in the random induced subgraph H. It is easy to see that the density of C′ is the same as the density a of C in expectation, and it takes some routine computation in probability theory to show that it is highly concentrated around this value. The density X of the largest cut in H is at least the density of C′, and so with high probability it is at least a − ε (Figure 1.9).
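Returning for a moment to the triangle-density example: the “trivial algorithm” above is short enough to write out in full. The sketch below is our illustration (the constant hidden in the O(·) sample bound is chosen arbitrarily).

```python
import math
import random

def estimate_triangle_density(adj, n, eps):
    """Sample O(eps^-2 |log eps|) random triples and return the
    fraction that span a triangle; with probability >= 1 - eps the
    answer should be within eps of the true triangle density."""
    trials = int(math.ceil(eps ** -2 * abs(math.log(eps))))
    hits = 0
    for _ in range(trials):
        u, v, w = random.sample(range(n), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]:
            hits += 1
    return hits / trials
```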
Figure 1.9. A dense cut in the large graph gives a dense cut in the sample.

The reverse inequality is much more difficult to prove, at least from scratch, and in fact it is rather surprising. We can phrase the question like this: Suppose that most random induced subgraphs H on N nodes have a cut that is denser than b. Does it follow that G has a cut that is denser than b − ε? It is not clear why this should be so: why should these cuts in these small subgraphs “line up” to give a dense cut in G? We will see that it does follow that the estimate is correct, once N is large enough (about ε^{−4} |log ε|). In fact, one can give general necessary and sufficient conditions under which parameters can be estimated by sampling, as we will see in Section 15.1.

1.6.2. Property testing. A more complicated issue is property testing: we want to determine whether the graph has some given property; for example, can it be decomposed into two connected components of equal size, is it planar, or does it contain any triangle? We could consider this as a 0-1 valued parameter, but computing this parameter approximately would not make sense (or rather, it would be requiring too much, since this would be equivalent to exact computation). A good way of posing this problem was developed by Rubinfeld and Sudan [1996] and Goldreich, Goldwasser and Ron [1998]. In the slightly different context of “additive approximation”, closely related problems were studied by Arora, Karger
and Karpinski [1995] (see e.g. Fischer [2001] for a survey and the volume edited by Goldreich [2010] for a collection of more recent surveys). This approach acknowledges that any answer is only approximate. Suppose that we want to test for a property P, and we get information about the graph by taking a bounded size random sample of the nodes, and inspecting the subgraph induced by them. We interpret the answer of the algorithm as follows: If it concludes that the graph has property P, this means that we can change εn² edges so that we get a graph with property P; if it concludes that the graph does not have property P, this means that we can change εn² edges so that we get a graph without property P. Again, we have to specify an error parameter ε > 0 in advance, and will have to accept an answer which may be wrong with probability ε, and even if it is “right”, it only means that we can change εn² edges in the graph so that the answer becomes correct.

Sometimes we can do better and eliminate either false positives or false negatives. As an example, let us try to test whether a given (dense) graph contains a triangle. We take a sample of size f(ε) (the best function f which is known to work is outrageously large, but let’s not worry about this), and check whether the subgraph the sampled nodes induce contains a triangle. If it does, then we know that the graph has a triangle. If it doesn’t, then one can prove (see Section 15.3) that with high probability, we can delete εn² edges from the graph so that no triangle remains.

Remark 1.5. We will not be concerned with the sample size as a function of the error bound ε. Sometimes it is polynomial (as in the examples above), but in other cases one uses the Regularity Lemma, which forces tower-size samples, making the algorithms of theoretical interest only. Goldreich [2010], in his survey of property testing, emphasizes the importance of testing with samples of manageable size, and I could not agree more; but this book, being about limit theory, does not address this issue. Another caveat: Many extensions deal with testing models where we are allowed to sample more than a constant number of nodes of the large graph G. For this, we have to take the number of nodes into account, but usually it is enough to know the order of magnitude of the number of nodes, which in practical situations is easy to do. We do not discuss these important methods in our book.

1.6.3. Computation of a structure. Perhaps the most complex algorithmic task is the computation of a structure, where the structure is of size comparable with the graph itself. For example, we want to find a perfect matching in the graph, or a maximum cut (not just its density, but the cut itself), or a regularity partition in a huge dense graph. The conceptual difficulty is that the output of the algorithm is too large to be explicitly produced. What we can do is to carry out some preprocessing whose result can be stored (e.g., label a bounded number of nodes or edges), and give an algorithm which, for given input node or nodes, determines the local part of the structure we are looking for. Usually, this algorithm returns the “status” of a node or edge in the output structure (for example, whether the given edge belongs to the matching, or which side of the cut the given node belongs to).

As an example, we will describe in Section 15.4.3 how to compute a maximum cut. We can access the graph by taking a bounded size sample of the nodes, and inspecting the subgraph induced by them. For a given ε > 0, we precompute a
“representative set” (see the next section) together with a bipartition of this set. In addition, we describe a “Placing Algorithm” which has an arbitrary node v as its input, and tells us on which side of the cut v is located. This Placing Algorithm can be called any number of times with different nodes v, and the answers it gives should be consistent with an approximately maximum cut. For example, calling this algorithm many times, we can estimate the density of the maximum cut (but this can be done in an easier way, as we have seen). The parameter ε is an error bound: the cut computed may be off the true maximum cut by εn² edges, the precomputation may be wrong with probability at most ε, and for each query, the answer may be in error with probability at most ε.

1.6.4. Representative set. Szemerédi partitions are closely related to the main ingredient in these algorithms, namely a “representative set”. We want to select a (fairly large, but bounded size) subset R of the nodes such that every node is “similar” to one of the nodes in R. To be economical, we don’t want to include similar points in R. We must start with defining what “similar” means; we will do so by defining a “similarity distance” between two nodes of a graph. A first idea would be to use their distance in the graph (the length of the shortest path connecting them). However, this measures something else (the prime minister and the doorman in his office know each other, but their positions in the society are certainly not similar). We could try considering two nodes similar if their neighborhoods differ by little. This is certainly a reasonable thing to do, but it is too restrictive for our purposes. For example, if we consider a random graph on n nodes with edge density 1/2, then the neighborhoods of any two nodes are very different (they have about n/2 elements and overlap in about n/4), but all nodes of a random graph are alike, so we would like them to be close in the similarity distance.

It turns out (somewhat surprisingly) that it suffices to consider second neighborhoods: we consider two nodes s and t similar if for most other nodes v, the number of paths of length two from s to v is about the same as the number of paths of length two from t to v. The similarity distance defined this way (for the exact definition, see Section 15.4.1) has many nice properties:

• The similarity distance can be computed by sampling.
• For every ε > 0, every graph has a “representative set” R of nodes, whose size depends on ε only; nodes in this set are at least ε apart, and almost every node is at a distance less than ε from the representative set.
• The representative set can be computed by sampling.
• Borrowing a phrase from geometry, we define the Voronoi cell of a node v of the representative set R as the set of all nodes in the whole graph that are closer to v than to any other node of R. The Voronoi cells of the representative set give a Weak Regularity Partition, and vice versa: every Weak Regularity Partition, after deletion of a fraction ε of the nodes, consists of sets with small diameter in the similarity distance.

The key to many structural computational problems is that first a representative set is computed, and then the status of any node or edge can be computed using the representative set. For example, if we want to compute a Weak Regularity Partition, we compute a representative set, which we consider as a set of representative nodes of the partition classes, which are the Voronoi cells of the nodes. We
cannot compute all the Voronoi cells; but if we want to know which class (cell) a given node belongs to, all we need to do is to compute its distance to the nodes in R.
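As an illustration only, the following sketch computes a crude exact version of the similarity distance (the book’s definition in Section 15.4.1 differs in detail and is computed by sampling rather than exactly) and the greedy construction of a representative set. With exact distances, greedy maximality even puts every node within ε of R; the sampled version guarantees this only for almost every node.

```python
def two_path_counts(A):
    """P[s][v] = number of paths of length two from s to v,
    from a 0-1 adjacency matrix A given as a list of lists."""
    n = len(A)
    return [[sum(A[s][z] * A[z][v] for z in range(n))
             for v in range(n)] for s in range(n)]

def similarity_distance(P, n, s, t):
    """Nodes are similar when their 2-path counts to most other
    nodes agree; normalizing by n^2 is one natural choice."""
    return sum(abs(P[s][v] - P[t][v]) for v in range(n)) / n ** 2

def representative_set(A, eps):
    """Greedily collect nodes that are pairwise at least eps apart."""
    n, P, R = len(A), two_path_counts(A), []
    for v in range(n):
        if all(similarity_distance(P, n, v, r) >= eps for r in R):
            R.append(v)
    return R
```

Assigning each node to its nearest element of R then plays the role of the Voronoi cells above.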
1.7. Bounded degree graphs

Let us discuss briefly how, and to what degree, the above considerations carry over to graphs with bounded degree. (We are doing injustice here to a rich and very active research area; I hope some of this will be rectified in Part 4 of the book. One of the reasons is that the technicalities in the bounded degree case are deeper, and so it is more difficult to state key results, even informally.)

Sampling. In the case of graphs with bounded degree, the subgraph sampling method gives a trivial result: the sampled subgraph will almost certainly be edgeless. Probably the most natural way to fix this is to consider neighborhood sampling (Figure 1.1). Let GD denote the class of finite graphs with all degrees bounded by D. For G ∈ GD, select a random node and explore its neighborhood to a given depth r. This provides a probability distribution ρG,r on graphs in GD with a specified root node, such that all nodes are at distance at most r from the root. We will briefly refer to these rooted graphs as r-balls. Note that the number of possible r-balls is finite if D and r are fixed.

The situation for bounded degree graphs is, however, less satisfactory than for dense graphs, for two reasons. First, a full characterization of what distributions of r-balls the neighborhood sampling procedure can result in is not known (cf. Conjecture 19.8). Second, neighborhood sampling misses some important global properties of the graph, like expansion. In Section 19.2 we will introduce a notion of convergence, called local-global, which is better from this point of view, but it is not based on any implementable sampling procedure.

This suggests looking at further possibilities. Suppose, for example, that instead of exploring the neighborhood of a single random node, we could select two (or more) random nodes and determine simple quantities associated with pairs of nodes, like pairwise distances, maximum flow, electrical resistance, hitting times of random walks (studies of this nature have been performed, for example, on the internet; see e.g. Kallus, Hága, Mátray, Vattay and Laki [2011]). What information can be gained by such tests? Is there a “complete” set of tests that would give enough information to determine the global structure of the graph to a reasonable accuracy? Such questions could lead to different theories of large graphs and their limit objects; at this time, however, they are unexplored.

Remark 1.6. It is interesting to note that our two sampling methods correspond to the two basic data structures for graph algorithms, the adjacency matrix and neighborhood lists. To be more specific, both methods assume that we can choose a uniformly distributed random node, and repeat this a constant number of times. In subgraph sampling, we must be able to determine whether two given nodes are adjacent or not. For a graph that is explicitly given, this is easy if the graph is given by its adjacency matrix. For neighborhood sampling, we have to be able to find all neighbors of a given node. This is easy if the graph is given by neighborhood lists. It would be very time consuming to perform these sampling operations on a graph given by the wrong data structure.
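Neighborhood sampling itself is just a breadth-first search from a random root; here is a minimal sketch (ours), using neighborhood lists as Remark 1.6 suggests.

```python
import random
from collections import deque

def sample_r_ball(adj, n, r):
    """Pick a uniform random root and return the rooted subgraph
    induced by the nodes at distance at most r from it."""
    root = random.randrange(n)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] < r:                 # do not expand beyond depth r
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    ball = sorted(dist)
    edges = {(u, v) for u in ball for v in adj[u]
             if v in dist and u < v}
    return root, ball, edges
```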
Sampling distance. The construction of the sampling distance can be carried over to graphs with bounded degree, by replacing in (1.2) the sampling distributions σG,k by the neighborhood distributions ρG,k. We must point out, however, that it seems to be difficult to define a notion of distance between two graphs with bounded degree (in analogy with the cut distance) that would reflect global similarity.

Regularity Lemma. This is one of the big unsolved problems for graphs with bounded degree. If we consider regularity lemmas as providing “approximation by the smaller”, then there is a simple non-constructive result (Proposition 19.10), which should be proved in a constructive way to be really useful. One can start from many other facets of the Regularity Lemma, but a satisfactory version for bounded degree graphs has turned out most elusive.

Convergence. The notion of a convergent sequence of bounded degree graphs was in fact the first among such convergence notions, introduced by Benjamini and Schramm [2001], motivated in part by earlier work of Aldous [1998]. Our discussion of local convergence of dense graphs above, based on the convergence of the distribution of samples, was modeled on the Benjamini–Schramm definition of convergence of bounded degree graphs. There are, however, good reasons to try to strengthen this notion. Unlike in the dense case, neighborhood sampling cannot distinguish between bipartite graphs and graphs that are far from being bipartite, cannot estimate the maximum cut etc., which means that locally convergent graph sequences must lose this information in the limit. We will introduce and study a stronger notion of convergence, which we call local-global, which passes on these properties and parameters to the limit. However, we don’t know if there is any natural and practical algorithmic setup that would correspond to local-global convergence.

Limit objects. For bounded degree graphs, Benjamini and Schramm provide a notion of a limit object (see Chapter 18). The Benjamini–Schramm limit object can be described as a distribution on rooted countable graphs with a special property called “involution invariance”. Another way of describing a limit object is a “graphing”. In a sense, this latter object is what we expect: a bounded degree graph on an infinite (typically uncountable) set, with appropriate measurability and measure preserving conditions. This construction was folklore in an informal way; the first exact statements were published by Aldous and Lyons [2007] and Elek [2007a].

Graphings were invented by group theorists. The idea is to consider a finitely generated group acting on a probability space (for example, rotations of a circle by integer multiples of a given angle). One can construct a graph on the underlying space by connecting each point to its images under the generators of the group. This construction gives a graph with bounded degree (the set of points is typically of continuum cardinality). It is a beautiful fact that graphings, representing groups this way, are just right to describe the limit objects of convergent graph sequences with bounded degree. Depending on personal taste, a graphing may be considered more complicated or less complicated than an involution-invariant random countable rooted graph. But graphings have an important advantage: they can express a richer structure, the limits of graph sequences convergent in the local-global sense.
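The circle-rotation example can be made concrete in a few lines; this sketch is ours. For an irrational α every point has degree 2 (its two neighbors are its images under the rotation and its inverse), so every r-ball is a path — which already makes this graphing a plausible limit of long cycles or paths.

```python
import math
import random

def rotation_ball(alpha, r):
    """In the graphing on [0,1) where x is adjacent to x + alpha and
    x - alpha (mod 1), return the r-ball around a random point."""
    x = random.random()
    ball = [(x + k * alpha) % 1.0 for k in range(-r, r + 1)]
    edges = [(ball[i], ball[i + 1]) for i in range(2 * r)]
    return x, ball, edges

# e.g. rotation_ball(math.sqrt(2) - 1, 3) returns a 7-node path
```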
Algorithms. Here is finally an area where the study of bounded degree graphs can be considered at least as advanced as the study of dense graphs. Let us discuss the task of computing a structure. Selecting random nodes and exploring their neighborhoods, we see (with high probability) disjoint parts of the graph, and so there is no method to build up a global structure. Still, very nontrivial algorithms can be designed in this model. For example, in Section 22.3.1 we describe an algorithm due to Nguyen and Onak [2008] that constructs an almost maximum matching. The way the output can be described is similar to how the output of a maximum cut algorithm was described in the dense setting: for any node we can tell which other node it is matched to, inspecting a bounded neighborhood only; these assignments will be consistent throughout the graph; and the difference in size from the true maximum matching is only εn, where ε > 0 is an error bound and n is the number of nodes.

There is an equivalent way to describe such algorithms, which may be easier to follow, and this is the model of distributed computing (going back to the 1980s). In this case, an agent (or processor) is sitting at each node of the graph, and they cooperate in exploring various properties of it. They can only communicate along the edges. In the case we are interested in (which is in a sense extreme), they are restricted to exchanging a bounded number of bits (where the bound may depend on the degree D, on an error bound ε, and of course on the task they are performing, but not on the number of nodes). In some other versions of the model (cellular automata), the amount of communication is not restricted, but the computing power of the agents is. Since in our model communication between the agents is restricted to a bounded number of bits, they may be assumed to be very stupid, even finite automata. There is a large literature on distributed computing, both from the practical and the theoretical aspect. We will not be able to cover this; we will restrict ourselves to the discussion of the strong connection of this computation model with our approach to large graphs and graph limits.
CHAPTER 2
Large graphs in mathematics and physics

The algorithmic treatment of very large networks is not the only area where the notions of very large graphs and their limits can be applied successfully. Many of the problems and methods in graph limit theory come from extremal graph theory or from statistical physics. Let us give a very brief introduction to these theories.

2.1. Extremal graph theory

Extremal graph theory is one of the oldest areas of graph theory; it has some elegant general results, but also many elementary extremal problems that are still unsolved. Graph limit theory (mostly the related theory of flag algebras by Razborov) has provided powerful tools for the solution of some of these problems. Furthermore, graph limits, along with the algebraic tools that will be introduced soon, will enable us to formulate and (at least partially) answer some very general questions in extremal graph theory (similarly to the general questions for very large graphs posed in the previous chapter).

2.1.1. Edges vs. triangles. Perhaps the first result in extremal graph theory was found by Mantel [1907]. This says that if a graph on n nodes has more than n²/4 edges, then it contains a triangle. Another way of saying this is that if we want to squeeze in the largest number of edges without creating a triangle, then we should split the nodes into two equal classes (if n is odd, then their sizes differ by 1) and insert all edges between the two classes. As another early example, Erdős [1938] proved a bound on the number of edges in a C4-free bipartite graph (see (2.9) below), as a lemma in a paper about number theory.

Mantel’s result is a special case of Turán’s Theorem [1941], which is often considered as the work that started the systematic development of extremal graph theory. Turán solved the generalization of Mantel’s problem for any complete graph in place of the triangle. We define the Turán graph T(n, r) (1 ≤ r ≤ n) as follows: we partition [n] into r classes as equitably as possible, and connect two nodes if and only if they belong to different classes. Since we are interested in large n and fixed r, the complication that the classes cannot be exactly equal in size (which causes the formula for the number of edges of T(n, r) to be a bit ugly) should not worry us. It will be enough to know that the number of edges in a Turán graph is

e(T(n, r)) ∼ \binom{r}{2} (n/r)²,

and in terms of the homomorphism densities defined in the previous chapter in (1.1), we have t(K2, T(n, r)) ∼ 1 − 1/r. For the triangle density we have the similar formula t(K3, T(n, r)) ∼ (1 − 1/r)(1 − 2/r).
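These asymptotics are easy to check numerically; the following throwaway computation (ours, not the book’s) builds T(n, r) and compares the two densities with their limits.

```python
from itertools import combinations

def turan_graph(n, r):
    """T(n, r): classes as equitable as possible, all edges
    between different classes."""
    cls = [i % r for i in range(n)]
    return {(u, v) for u, v in combinations(range(n), 2)
            if cls[u] != cls[v]}

n, r = 300, 3
E = turan_graph(n, r)
t_K2 = 2 * len(E) / n ** 2                  # edge density ~ 1 - 1/r
tri = sum(1 for u, v, w in combinations(range(n), 3)
          if (u, v) in E and (u, w) in E and (v, w) in E)
t_K3 = 6 * tri / n ** 3                     # ~ (1 - 1/r)(1 - 2/r)
print(t_K2, 1 - 1 / r)
print(t_K3, (1 - 1 / r) * (1 - 2 / r))
```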
Theorem 2.1 (Turán’s Theorem). Among all graphs on n nodes containing no complete k-graph, the Turán graph T(n, k − 1) has the maximum number of edges.

Let us return to triangles, however, and ask for not just their existence, but for their number, when the number of edges is known. All of a sudden, we get to a rather difficult problem with some unexpected complications (which makes the subject fascinating). It is really difficult to think of a simpler question about small subgraphs of a large graph! Since we are interested in large n, it is natural to normalize, and use homomorphism densities. The Mantel–Turán Theorem says, in this language, that

(2.1)    t(K2, G) > 1/2 ⟹ t(K3, G) > 0.

Every graph G produces a pair of numbers (t(K2, G), t(K3, G)) this way, which we can consider as a point in the plane. If we plot this point for every graph G, we get a picture as in Figure 2.1(a). To be more precise, we get a countably infinite set of points; the figure shows its closure, which we denote by D2,3. (Another motivation for introducing convergent graph sequences and their limit objects: they give a meaning to all points of this figure.)
Figure 2.1. (a) The closure D2,3 of the set of pairs of edge density and triangle density. (b) Goodman’s bound. (c) Bollobás’s bound. The picture is a little distorted in order to show its special features better.

Some features of this picture are easy to explain. The lower edge means that there are triangle-free graphs with edge density up to 1/2, and the Mantel–Turán Theorem says that for larger edge density, the triangle density must be positive. A lower bound for the triangle density was proved by Goodman [1959]:
(2.2)    t(K3, G) ≥ t(K2, G)(2t(K2, G) − 1),
which corresponds to the parabola shown in Figure 2.1(b). The upper boundary curve turns out to be given by the equation y = x^{3/2}, which is a very special case of the Kruskal–Katona Theorem in extremal hypergraph theory (the full theorem gives the precise value, not just asymptotics, and concerns uniform hypergraphs, not just graphs). In other words, this says that
(2.3)    t(K3, G) ≤ t(K2, G)^{3/2}.
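The two bounds are easy to test empirically. This little experiment (ours) computes the point (t(K2, G), t(K3, G)) for random graphs and checks that it lies between the parabola (2.2) and the curve (2.3); both inequalities hold exactly for every graph, so the assertions should never fail.

```python
import random
from itertools import combinations

def edge_and_triangle_density(n, p):
    """Sample G(n, p) and return (t(K2, G), t(K3, G))."""
    E = {(u, v) for u, v in combinations(range(n), 2)
         if random.random() < p}
    tri = sum(1 for u, v, w in combinations(range(n), 3)
              if (u, v) in E and (u, w) in E and (v, w) in E)
    return 2 * len(E) / n ** 2, 6 * tri / n ** 3

for p in (0.3, 0.5, 0.8):
    x, y = edge_and_triangle_density(200, p)
    assert y >= x * (2 * x - 1)     # Goodman's bound (2.2)
    assert y <= x ** 1.5            # Kruskal-Katona (2.3)
```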
Both (2.2) and (2.3) are sharp in a sense: Goodman’s Theorem is sharp if the edge density is of the form 1/2, 2/3, 3/4, . . . (Turán graphs give equality). In this form of the Kruskal–Katona Theorem equality is not attained except at the points (0, 0) and (1, 1), but for every point (x, x^{3/2}) of the upper boundary curve there are points representing a graph arbitrarily close (just use graphs consisting of a complete graph and isolated nodes).

From our perspective, there is nothing to improve on the upper bound, but can we get arbitrarily close to the lower bound between two consecutive special edge density values of the form 1 − 1/k? Surprisingly, the answer is no. Bollobás [1976] proved that the triangle density of a graph with edge density x ∈ (1 − 1/(k − 1), 1 − 1/k) is not only above the parabola, but also above the chord of the parabola connecting the special points corresponding to T(n, k − 1) and T(n, k). Lovász and Simonovits [1976, 1983] formulated a conjecture about the exact bounding curve, and proved it in very small neighborhoods of the special edge density values above. One way to state the conjecture is that the minimum number of triangles is attained by a complete k-partite graph with unequal color classes. The sizes of the color classes can be determined by solving an optimization problem, which leads to a cubic concave curve connecting the two special points. This conjecture turned out quite hard. Lovász and Simonovits proved it in the special case when the edge density x is close to one of the endpoints of the interval. Fisher [1989] proved the conjecture for the first interval (1/2, 2/3). After quite a while, Razborov [2007, 2008] proved the general conjecture. His work was extended by Nikiforov [2011] to bounding the number of complete 4-graphs, and by Reiher [2012] to all complete graphs.

So we know what the lower and upper bounding curves are. Luckily, math plays no further tricks on us: it is easy to see that for every point between the two curves there are points representing graphs arbitrarily close. I dwelt quite long on this very simple special problem not only to show how complicated it gets (and yet remains solvable), but also because Razborov’s methods for the solution fit quite well in the framework developed in this book, and they will be presented in Chapter 16.

2.1.2. A sampler of classical results. Let us start with some remarks to simplify and to some degree unify the statements of these results. Every algebraic inequality between subgraph densities can be “linearized”, using the following multiplicativity of t(·, G):
(2.4)    t(F1F2, G) = t(F1, G) t(F2, G),
where F1F2 denotes the disjoint union of F1 and F2. (This property will play a very important role in the sequel, but right now it is just a convenient simplification.) For example, we can replace (2.2) by
(2.5)    t(K3, G) ≥ 2t(K2K2, G) − t(K2, G).
We can make the statements (and their proofs, as we will see below) more transparent by two further tricks: first, if a linear inequality between the densities of certain subgraphs F1, . . . , Fk holds for all graphs, then we write it as an inequality between F1, . . . , Fk; and for specific small graphs Fi, we use little pictograms. Goodman’s Inequality (2.2) can be expressed as follows:
(2.6)    K3 ≥ 2K2² − K2
or, in pictograms,

(2.7)    [triangle] ≥ 2 [pair of disjoint edges] − [edge]

(the little drawings of the original cannot be reproduced here; we indicate them in brackets).
The Kruskal–Katona Theorem for triangles is:

(2.8)    [triangle] ≤ [edge]^{3/2}.
Let us describe some further classical results. Instead of counting complete graphs, we can consider the density of some other graph F in G. Erdős proved the inequality
(2.9)    t(C4, G) ≥ t(K2, G)⁴,
or in pictograms

(2.10)    [4-cycle] ≥ [edge]⁴.
Graphs with asymptotic equality here are quasirandom graphs (Section 1.4.2). Bounding from below the homomorphism density of paths is a more difficult question, but it turns out to be equivalent to theorems of Mulholland and Smith [1959], Blakley and Roy [1965], and London [1966] in matrix theory (applied to the adjacency matrix). If Pk denotes the path with k nodes, then for all k ≥ 2,
(2.11)    t(Pk, G) ≥ t(K2, G)^{k−1}.
Regular graphs give equality here. The first nontrivial case of inequality (2.11) is

(2.12)    [path with 3 nodes] ≥ [edge]².
Translating to homomorphisms, this means that v(G)·hom(P3, G) ≥ hom(K2, G)². If we count the homomorphisms on the left side by the image of the middle node, we see that hom(P3, G) is the sum of the squared degrees of G. Since hom(K2, G) = 2e(G) is the sum of the degrees, this inequality is just the inequality between the arithmetic and quadratic means, applied to the sequence of degrees. Bounding the P3-density from above in terms of the edge density is more difficult, but it was solved by Ahlswede and Katona [1978]; we formulate this as Exercise 2.4 below. The next case of inequality (2.11) is
(2.13)    [path with 4 nodes] ≥ [edge]³,
and this is already quite hard, although short proofs with a tricky application of the Cauchy–Schwarz inequality are known. In Chapter 16 we will return to the question of how far the application of such elementary inequalities takes us in proving inequalities between subgraph densities.

2.1.3. An algebraic “proof” of an extremal theorem. We illustrate the use of the formalism with pictograms for an algebraic proof of Goodman’s Inequality (2.2). This will motivate a basic tool to be introduced in Chapter 6, namely graph algebras. To describe this proof, we extend the pictogram formalism from Section 2.1.2. If we fill a node, this indicates that this node is labeled. We should write the label on the node, but to keep the picture simple, let us agree that the labels are 1, 2, . . . ,
starting from the lower left corner, and going counterclockwise. (It does not really matter.) The role of the labels is that when taking a “product” of two graphs, we take the disjoint union, but identify nodes with the same label. With this convention, it is easy to check two pictogram identities (the displayed drawings of the original cannot be reproduced here): the first expresses a certain ±1-combination of 2-labeled graphs as equal to its own square (this combination is “idempotent”), and the second expands the square of another such combination. Forgetting the labels, adding up, and deleting isolated nodes, we get that the first square plus twice the second square equals [triangle] − 2 [pair of disjoint edges] + [edge]. So the right side is a sum of squares, which implies that it is nonnegative:

[triangle] − 2 [pair of disjoint edges] + [edge] ≥ 0,
which is just (2.7). Is this a valid argument? It turns out that it is, and the method can be formalized using the notion of graph algebras. These will be very useful tools in the proofs of characterization theorems of homomorphism functions, and also in some other studies of graph parameters.

2.1.4. General results. Moving from special extremal graph problems to the more general, let us describe some quite general results about extremal graphs, which were obtained quite a long time ago in several papers of Erdős, Stone and Simonovits [1946, 1966, 1968]. We exclude an arbitrary graph L as a subgraph of a simple graph G, and want to determine the maximum number of edges of G, given the number of nodes n. Turán’s Theorem 2.1 is a special case, when L is a complete graph. It turns out that the key quantity that governs the answer is the chromatic number r = χ(L). The Turán graph T(n, r − 1) is certainly one of the candidates for the extremal graph, since it cannot contain any graph as a subgraph that has chromatic number r. For certain excluded graphs L it is easy to construct examples that have slightly more edges than this Turán graph; however, the gain is negligible: for every graph G on n nodes that does not contain L as a subgraph, we have

(2.14)    e(G) ≤ (1 + o(1)) e(T(n, r − 1)) = (1 − 1/(r − 1) + o(1)) \binom{n}{2}.

There is also a “stability” result: For every ε > 0 there is an ε′ > 0 (depending on L and ε, but not on G) such that if G is a graph not containing L with at least (1 − 1/(r − 1) − ε′)\binom{n}{2} edges, then we can change at most ε\binom{n}{2} edges of G to get a Turán graph T(n, r − 1). We will see that graph limit theory gives very short and elegant proofs for these facts.

The idea that extremal graph problems have “continuous versions” (in a sense quite similar to our use of graphons), which are often cleaner and easier to handle, goes back to around 1980, when Katona [1978, 1980, 1985] and Sidorenko [1980, 1982] used this method to generalize graph and hypergraph problems, and also to give applications in probability theory.
Remark 2.2. If r = 2 (which means that L is bipartite), then the main term in (2.14) disappears, and all we get is that the number of edges is o(n²). Of course, one would like to know the precise order of magnitude of the best upper bound. This is known in several cases (e.g., small complete bipartite graphs and cycles), but in general it seems to be a difficult unsolved problem. The extremal graphs in this case are sparse, and quite complex: for example, C4-free graphs with maximum edge density are constructed from finite projective planes. Extremal problems for graphs with excluded bipartite graphs do not seem to fit in with the framework developed in this book, but perhaps they can serve as motivation for extending it to sparser graphs.

2.1.5. General questions. We have brought up the idea of introducing graphons (graph limits) in Section 1.5.3, motivated by the goal to approximate very large networks by simpler analytic objects. We have seen that graphons provide cleaner formulations, with no error terms, of some results in graph theory (for example, about quasirandom graphs). We will see in Section 16.7 that extremal graph theory provides another, also quite compelling motivation: Graphons provide a way to state, in an exact way, general questions about the nature of extremal graphs, and also help answering them, at least in some cases. (They have similar uses in the theory of computing; cf. Chapter 15.)

Which inequalities between subgraph densities are valid? Given a linear inequality between subgraph densities (like (2.7) above), is it valid for all graphs G? Hatami and Norine [2011] proved recently that this question is algorithmically undecidable. We will describe the proof of this fundamental result in Section 16.6.1. On the other hand, it follows from the results of Lovász and Szegedy [2012a] that if we allow an arbitrarily small “slack”, then it becomes decidable (see Section 16.6.2).

Can all linear inequalities between subgraph densities be proved using just Cauchy–Schwarz? We described above a proof of the simple inequality (2.12) using the inequality between arithmetic and quadratic means, or equivalently, the Cauchy–Schwarz Inequality. Many other extremal problems can be solved by using the Cauchy–Schwarz Inequality (often repeatedly and in nontrivial ways). Exercise 2.5 shows that Goodman’s Inequality can also be proved by this method. How general a tool is the Cauchy–Schwarz Inequality in this context? Using the notions of graphons and graph algebras we will be able to give an exact formulation of this question. It will turn out that the answer is negative (Hatami and Norine [2011], Section 16.6.1), but it becomes positive if we allow an arbitrarily small error (Lovász and Szegedy [2012a], Section 16.6.2).

Is there always an extremal graph? Let us consider extremal problems of the form “maximize a linear combination of subgraph densities, subject to fixing other such combinations”. For example, “maximize the triangle density subject to a given edge density” (the answer is given by the first nontrivial case of the Kruskal–Katona Theorem (2.8)). To motivate our approach, consider the following two optimization problems.

Classical optimization problem. Find the minimum of x³ − 6x over all numbers x ≥ 0.

Graph optimization problem. Find the minimum of t(C4, G) over all graphs G with t(K2, G) ≥ 1/2.
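Both problems are easy to experiment with numerically; the sketch below (ours) uses a grid search for the classical problem and, for the graph problem, the identity hom(C4, G) = tr(A⁴) (closed walks of length 4), so that t(C4, G) = tr(A⁴)/n⁴.

```python
import numpy as np

# Classical problem: minimize x^3 - 6x over x >= 0.
xs = np.linspace(0, 3, 10**6)
print(xs[np.argmin(xs**3 - 6 * xs)])      # ~ 1.41421 = sqrt(2)

# Graph problem: t(C4, G) for a random graph with edge density 1/2.
n = 500
A = np.triu(np.random.rand(n, n) < 0.5, 1).astype(float)
A = A + A.T                               # symmetric 0-1 adjacency
t_C4 = np.trace(np.linalg.matrix_power(A, 4)) / n**4
print(t_C4)                               # close to the optimum 1/16
```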
The solution of the classical optimization problem is of course x = √2. This means that it has no solution in rationals, but we can find rational numbers that are arbitrarily close to being optimal. If we want an exact solution, we have to go to the completion of the rationals, i.e., to the reals. The graph optimization problem may take a bit more effort to solve, but (2.9) shows that if the edge density is 1/2, then the 4-cycle density is at least 1/16. With a little effort one can show that equality is never attained here. Furthermore, the 4-cycle density gets arbitrarily close to 1/16 for appropriate families of graphs: the simplest example is a random graph with edge density 1/2 (cf. also Section 1.4.2).

The analogy with the classical optimization problem above suggests that we should try to enlarge the set of (finite) graphs with new objects so that the appropriate extension of our optimization problem has a solution among the new objects. Furthermore, we want these new objects to be approximable by graphs, just like real numbers are approximable by rationals. As it turns out, graphons are just the right objects for this. One can prove that there is always an extremal graphon, which then gives a “template” for asymptotically extremal graphs. This follows from another fact that can be considered one of the basic results treated in this book: The space of graphons is compact in the cut-distance metric. (This notion of distance was mentioned in Section 1.5.1, and will be defined in Chapter 8; the compactness of the graphon space will be proved in Section 9.3.)

Which graphs are extremal? This is not a good question (every graph is extremal for some sufficiently complicated extremal graph problem), but replacing “graph” by “graphon” makes it mathematically meaningful. Every extremal graphon gives a “template” for asymptotically extremal graphs.
Figure 2.2. Templates for optimal solutions to some classical extremal graph results: (a) Turán’s Theorem 2.1 and Goodman’s Inequality (2.2); (b) the Kruskal–Katona Theorem (2.3); (c) Erdős’s inequality (2.9)

In classical extremal graph results, these templates are quite simple (Figure 2.2). A natural guess would be that all templates have the form of a stepfunction, like the rightmost square in Figure 1.6. All of these are indeed templates for appropriate extremal problems, but they are not all the templates: we will see that the limit of half-graphs (the rightmost square in Figure 1.7) is also the template for the extremal graph of a quite simple extremal problem, and there are many other, more complicated, templates. We will prove several results about the structure of these extremal templates (Section 16.7), but no full characterization is known.
Exercise 2.3. Prove inequality (2.13).

Exercise 2.4. Let G be a simple graph with edge density d = t(K2, G). Prove that t(P3, G) ≤ max(d^{3/2}, 1 − 2d + d^{3/2}).

Exercise 2.5. Translate the “proof” of Goodman’s Inequality (2.2) above into a valid proof using the Cauchy–Schwarz inequality twice.
2.2. Statistical physics

One area of research where graph homomorphisms play an important role, and where the study of the asymptotic behavior of parameters as the size of a graph tends to infinity is a main goal, is statistical physics. I am afraid this book will not do justice to this connection; my excuse is that statistical physics is such a large area, with such advanced special methods, that any reasonable treatment would double the size of the book. Nevertheless, I must give a very short introduction to the subject here.

To describe a basic model in statistical physics, suppose that we have a piece of a crystal, where the spin of every atom can point either up or down. We model this by an n × n grid G = G_{n×n} (for simplicity, in two dimensions). If we assign to every node of G (every “site”) a “state”, which can be UP or DOWN, we get a “configuration”. The atoms are changing their spins randomly all the time, but not independently of each other: depending on the spins of adjacent atoms, one direction of the spin of an atom may be less likely than the other, or even entirely impossible. We would like to know what a typical configuration looks like: is it random-like as in the first picture in Figure 2.3, is it homogeneous as in the second (well, maybe with a few exceptions here and there), or is it structured in other ways, as in the third?
Figure 2.3. Three configurations of the Ising model.

Two atoms that are adjacent in the grid have an “interaction energy”, which depends on their states. In the simplest version of the basic Ising model, the interaction energy is some number −J if the atoms are in the same state, and J if they are not. The states of an atom can be described by the integers 1 and −1, and so a configuration is a mapping σ : V(G) → {1, −1}. If σ_u denotes the state of atom u, then the total energy of a given configuration is

H(σ) = −∑_{uv∈E(G)} J σ_u σ_v.
Basic physics (going back to Boltzmann) tells us that the system is more likely to be in states with low energy. In formula, the probability of a given configuration
is proportional to e^{−H(σ)/T}, where T is the temperature (from the point of view of the mathematician, just a parameter). Since probabilities must add up to 1, these values must be normalized:

P(σ) = e^{−H(σ)/T} / Z,

where the normalizing factor Z is called the partition function of the system (it is called a “function” because it depends on the temperature). This is perhaps the most important quantity to know, and it implicitly contains many important physical parameters. The partition function is simple to describe:

Z = ∑_σ e^{−H(σ)/T} = ∑_σ exp( (1/T) ∑_{uv∈E(G)} J σ_u σ_v ),

but since the number of terms is enormous, partition functions can be very hard to compute or analyze.

The behavior of the system depends very much on the sign of J. If J > 0, then adjacent pairs that are in the same state contribute less to the total energy than those that are in different states, and so the configuration with the lowest energy is attained when all atoms are in the same state. The typical configuration of the system will be close to this, at least as long as the temperature T is small. This is called the ferromagnetic Ising model, because it gives an explanation of how materials like iron get magnetized. If J < 0 (the antiferromagnetic case), then the behavior is different: the chessboard-like pattern minimizes the energy, and no magnetization occurs at any temperature.

One may notice that the temperature T emphasizes the difference between the energies of different configurations when T → 0 (and de-emphasizes it when T → ∞). In the limit when T → 0, all the probability will be concentrated on the states with minimum energy, which are called ground states. In the simplest ferromagnetic Ising model, there are two ground states: either all atoms are in state UP, or all of them are in state DOWN. If the temperature increases, disordered states like the left picture in Figure 2.3 become more likely. The transition from the ordered state to the disordered may be gradual (in dimension 1), or it may happen suddenly at a given temperature (in dimensions 2 and higher, for large graphs G); this is called a phase transition. This leads us to one of the central problems in statistical physics; alas, we cannot go deeper into the discussion of this issue in our book.

To make the connection to graph homomorphisms, we generalize the Ising model a little. First, we replace the grid by an arbitrary graph G. (From the point of view of physics, other lattices, corresponding to crystals with other structure, are certainly natural. Other materials don’t have a simple periodic crystal structure.) Second, we introduce a “magnetic field”, which prefers one state over the other: in the simplest case it adds −∑_u h σ_u to the energy function, with some parameter h. Third, we consider not two, but q possible states for every atom, which we label by 1, 2, . . . , q (unlike 1 and −1 before, these should not be considered as numbers: they are just labels). We have to specify an interaction energy J_{ij} for any two states i and j, and a magnetic field energy h_i for every state i. A configuration is now a map σ : V(G) → [q], and its energy is

H(σ) = −∑_{v∈V(G)} h_{σ(v)} − ∑_{uv∈E(G)} J_{σ(u),σ(v)}.
The partition function is

Z = ∑_{σ : V(G)→[q]} exp( −(1/T) ( ∑_{v∈V(G)} h_{σ(v)} + ∑_{uv∈E(G)} J_{σ(u),σ(v)} ) ).
We are almost at homomorphisms! For i, j ∈ [q], let

α_i = exp(−(1/T) h_i)  and  β_{ij} = exp(−(1/T) J_{ij});

then the partition function can be expressed as

(2.15)    Z = ∑_{σ : V(G)→[q]} ∏_{v∈V(G)} α_{σ(v)} ∏_{uv∈E(G)} β_{σ(u)σ(v)}.
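For tiny graphs, (2.15) can be evaluated by brute force; and with α_i = 1 and 0–1 edgeweights β it counts homomorphisms, as the next paragraph explains. The sketch is ours.

```python
import math
from itertools import product

def partition_function(nodes, edges, alpha, beta):
    """Brute-force evaluation of (2.15); exponential in the number
    of nodes, so only for very small graphs."""
    q = len(alpha)
    Z = 0.0
    for sigma in product(range(q), repeat=len(nodes)):
        term = math.prod(alpha[sigma[v]] for v in nodes)
        for u, v in edges:
            term *= beta[sigma[u]][sigma[v]]
        Z += term
    return Z

# With alpha_i = 1 and beta the adjacency matrix of H = K3, this
# counts homomorphisms G -> K3, i.e. proper 3-colorings of G:
nodes = range(4)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # G = C4
beta = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(partition_function(nodes, edges, [1, 1, 1], beta))  # 18
```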
Consider the case when α_i = 1 for all i, and β_{ij} is 0 or 1 (in the Ising model β_{ij} cannot be zero, but (2.15) allows this substitution). Then every term in (2.15) is either 0 or 1, and a term is 1 if and only if β_{σ(u)σ(v)} = 1 for every uv ∈ E(G). Let us build a graph H with node set V(H) = [q], in which i, j ∈ [q] are adjacent if and only if β_{ij} = 1. Then a term in (2.15) is 1 if and only if σ is a homomorphism G → H, and so the sum simply counts these homomorphisms, and gives the value Z = hom(G, H). In the case of general values for the α and β, we can define a weighted graph H with nodeweights α_i and edgeweights β_{ij}. Formula (2.15) can then serve as the definition of hom(G, H), which will be very important for us.

We don’t discuss the connections between statistical physics and graph theory (homomorphisms and limits) any further; for an introduction to these connections, with more examples, see de la Harpe and Jones [1993].

Exercise 2.6. Define a model in statistical physics in which the ground state corresponds to the maximum cut of a graph.
Part 2
The algebra of graph homomorphisms
CHAPTER 3
Notation and terminology

In this book, different areas of mathematics come together (graph theory, probability, algebra, functional analysis), and this makes it difficult to find good notation, and impossible in some cases to stick to standard notation. I tried to find notation that helps readability. For example, when doing computations with small graphs, I often use pictograms instead of introducing dozens of notations for them. When labeling one or more nodes of a graph G, I use G• or G••, and when adding some loops at the nodes, I use G◦. These graphs must still be defined, but perhaps the meaning of the notation is easier to remember keeping this in mind.
3.1. Basic notation

Let R, C, Z denote the sets of real, complex and integer numbers. We denote by N the set of nonnegative integers, by N∗ the set of positive integers, by Zq the set of integers modulo q, and by R+ the set of nonnegative reals. We use the notation [n] = {1, 2, . . . , n} and (n)_k = n(n − 1) · · · (n − k + 1).

If A is a statement, then 1(A) = 1 if A is true, and 1(A) = 0 otherwise. If A is a set, then 1_A is its indicator function: 1_A(x) = 1(x ∈ A).

If A is a real matrix, then A ≽ 0 means that A is positive semidefinite (in particular, symmetric), while A ≥ 0 means that all its elements are nonnegative. For two matrices A, B ∈ R^{m×n}, their dot product is defined by
A · B = ∑_{i=1}^{m} ∑_{j=1}^{n} A_{ij} B_{ij}.
The natural logarithm will be denoted by ln; the logarithm of base 2, by log. (There is a recurring dilemma about which logarithm to use. Base 2 is used in information theory, and it is often better suited for combinatorial problems; the natural logarithm has simpler analytical formulas. Luckily, the two differ in a constant factor only, so the difference is usually irrelevant.) We denote by log∗ x the least n for which the n-times iterated logarithm of x is less than 1.

The Lebesgue measure on R will be denoted by λ. We will consider partitions of both finite sets and the interval [0, 1]. A partition of [0, 1] will be called an equipartition if it has a finite number of measurable classes with the same measure. A partition of a finite set V will be called equitable if ||S| − |T|| ≤ 1 for any two partition classes S and T.
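As a small illustration of the iterated-logarithm notation just defined (a sketch of ours, taking the logarithm to be base 2):

```python
import math

def log_star(x):
    """Least n for which the n-times iterated (base 2) logarithm
    of x is less than 1."""
    n = 0
    while x >= 1:
        x = math.log2(x)
        n += 1
    return n

# log_star(2) == 2 and log_star(65536) == 5: the function grows
# extremely slowly.
```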
3.2. Graph theory

We denote by v(G) = |V(G)| the number of nodes and by e(G) = |E(G)| the number of edges. The subgraph induced by S ⊆ V(G) is denoted by G[S]. For X, Y ⊆ V(G), let eG(X, Y) denote the number of edges with one endnode in X and the other in Y; edges with both endnodes in X ∩ Y are counted twice. We denote by NG(v) the set of neighbors of v in the graph G (which we abbreviate as N(v) if the graph G is understood). We denote by ∇(v) the set of edges incident with the node v. For every r ≥ 0 and v ∈ V(G), we denote by B_{G,r}(v) the subgraph of G induced by those nodes that are at a distance at most r from v. We also call this graph the r-ball about v.

For any family C of sets, we denote by L(C) the intersection graph of C, i.e. the graph with node set C, where two nodes (sets in C) are connected if and only if they have a nonempty intersection. As a special case, the intersection graph of E(G) (where G is any multigraph) is called the line-graph of G, and denoted by L(G).

We have to introduce many types of graphs. A simple graph is a finite graph without loops and multiple edges. A looped-simple graph is a finite graph without multiple edges, in which any subset of the nodes can carry a loop; equivalently, this is a symmetric binary relation on a finite set. A multigraph is a finite graph (in which loops and multiple edges are allowed). Let F_k^{simp} denote the set of simple graphs on node set [k], and F_k^{mult} the set of multigraphs on node set [k]. We denote by G_{simp} the simple graph obtained from a multigraph G by deleting loops as well as all but one edge from every parallel class.

Some special graphs need special names: Pn denotes the path with n nodes (note the somewhat unusual indexing; we usually put the number of nodes in the subscript); Cn denotes the cycle with n nodes (this is mostly used for n ≥ 3, but C2 and even C1 (a node with a loop) will be useful occasionally); Kn is the complete graph with n nodes (including the graph K0 with no nodes and edges); Kn◦ is the complete graph with n nodes, with a loop added at every node; Sn is the star with n nodes; On is the graph on [n] with no edges. The m-bond B^m consists of two nodes connected by m edges. For a simple graph G, we denote by Conn(G) the set of connected subgraphs, and by Csp(G) the set of connected spanning subgraphs (note: spanning, not induced!).
Weighted graphs. A weighted graph H is a looped-simple graph, with a positive real weight αi (H) associated with each node i and a real weight βi,j (H) associated with each edge ij. It is often convenient to assume that H is a complete graph with a loop at all nodes; the missing edges can be added with weight 0. Then the weighted graph H is completely described (up to isomorphism) by a nonnegative integer q = v(H), a positive real vector a = (α1 , . . . , αq ) ∈ Rq of nodeweights and the real symmetric matrix B = (βij ) ∈ Rq×q of edgeweights. We denote this weighted graph by H(a, B). An edge-weighted graph is a weighted graph with nodeweights 1. A simple graph can be considered as a special edge-weighted graph in which all edge-weights are 0 or 1, and all loops have weight 0. Multigraphs can be considered as edge-weighted graphs in which the nodeweights are 1 and the edgeweights are nonnegative integers (but this is not always the best).
Signed graphs. Suppose that the edges of a graph F are partitioned into two sets E+ and E−. The triple F = (V, E+, E−) will be called a signed graph. (We don’t consider this as a weighted graph with edge weights ±1, because these signs will play a quite different role!)

Partially labeled graphs. This less standard type of graphs will play a crucial role in this book. A simply k-labeled graph is a graph in which k of the nodes are labeled by 1, . . . , k (there may be any number of unlabeled nodes). A k-multilabeled graph is a graph in which labels 1, . . . , k are attached to some nodes; the same node may carry more than one label (but a label occurs only once). So a k-multilabeled graph is a graph F together with a map [k] → V(F), and it is k-labeled if this map is injective. We omit “simply” from k-labeled, unless we want to emphasize that it is simply k-labeled. The set of isomorphism types of k-labeled multigraphs will be denoted by Fk•. More generally, for every finite set S ⊆ N of labels we can talk about S-labeled and S-multilabeled graphs. A partially labeled graph is an S-labeled graph for some finite set S. A 0-labeled graph (or equivalently an ∅-labeled graph) is just an unlabeled graph. The set of S-labeled multigraphs will be denoted by FS•. A partially labeled graph in which all nodes are labeled will be called fully labeled or flat.

For every partially labeled graph G and S ⊆ N, let [[G]]S denote the partially labeled graph obtained by removing the labels not in S. For S = ∅, we denote [[G]]∅ simply by [[G]]; this is the unlabeled version of the graph G.

We need some notation for differently labeled versions of some basic graphs (Figure 3.1). We denote by Kn, Kn•, Kn••, . . . the complete graph with 0, 1, 2, . . . nodes labeled. We denote by Pn, Pn•, Pn•• the path on n nodes with 0, 1, 2 endnodes labeled. The m-bond labeled at both nodes will be denoted by B^m••. We denote by Ka,b, Ka,b•, Ka,b•• the complete bipartite graph with a nodes in the “first” bipartition class and b nodes in the “second”, with no node labeled, the first bipartition class labeled, and all nodes labeled, respectively. In figures, the labeled nodes are denoted by black circles, the labels ordered left-to-right or up-down. The 2-multilabeled graph consisting of a single node will be denoted by K1••.

Figure 3.1. The most often used partially labeled graphs

The adjacency matrix of a multigraph G is the V(G) × V(G) matrix AG where (AG)ij is the number of edges connecting nodes i and j. In the case of a simple graph, this is a 0-1 matrix. For a weighted graph, we let (AG)ij denote the weight of the edge ij (the nodeweights can be encoded in a separate vector in R^{V(G)}).

Colored graphs. We will use graphs in which all the edges and all the nodes are colored (so they are colorful objects indeed). To be precise, a colored graph of type (b, c) (where b and c are positive integers) is a multigraph (possibly with loops) G = (V, E), which is node-colored with b colors and edge-colored with c colors.

3.3. Operations on graphs

For the standard notions of edge-deletion, contraction, subdivision, and minor, we refer to any textbook. We will need some less standard operations on graphs.

Twin reduction. Let H be a weighted graph and let i, j ∈ V(H) be two nodes such that βik = βjk for every node k; in particular, this includes that βii = βjj = βij (but we allow that αi ≠ αj). Such a pair of nodes will be called twin nodes. In the case when H is a simple graph, twin nodes are nonadjacent and have the same neighborhood.
Figure 3.1. The most often used partially labeled graphs

Interchanging a twin pair is an automorphism in the unweighted case, but not necessarily in the weighted case, since their nodeweights may be different. Let H′ be obtained by identifying two twin nodes i and j, which means that we delete j and add α_j(H) to α_i(H). We can repeat this operation until we get a weighted graph with no twins. The construction leading to this twin-free weighted graph is called twin reduction. It is not hard to see that the twin-free graph obtained from a given graph by twin reduction is uniquely determined.

Quotient. Let P be a partition of V(G). We denote by G/P the graph obtained by merging each class of P into a single node. This definition is not precise; in different parts of the book, we need it with edge multiplicities summed, averaged, or maximized. If G is a simple graph (or a looped-simple graph), then one natural interpretation is that G/P is a looped-simple graph, in which two nodes are adjacent if and only if they have adjacent pre-images. We will call this the simple quotient. Instead of introducing a different notation for each of these versions, we will define how the edges are mapped whenever we use this notation.

Blow-up. We define the m-blowup G(m) of a graph G as the graph obtained by replacing each node of G by m twin copies (m ≥ 1). Sometimes we need a blow-up of G with a given number of nodes, and so we need a slightly more general notion: we say that a graph G′ is a near-blowup of G if it is obtained by replacing each node of G by m or m + 1 twin copies for some m ≥ 1.

Product of graphs. For two looped-simple graphs G1 and G2, their categorical (weak) product G1 × G2 is defined by V(G1 × G2) = V(G1) × V(G2) and E(G1 × G2) = {((u1, u2), (v1, v2)) : u1v1 ∈ E(G1), u2v2 ∈ E(G2)}. We denote by G^{×k} the k-fold categorical product of G with itself. If G1 and G2 are simple, then so is G1 × G2. The strong product G1 ⊠ G2 of two simple graphs can be defined by adding a loop at every node, taking the categorical product, and then removing the loops. A further operation on graphs is the Cartesian sum G1 □ G2, defined by V(G1 □ G2) = V(G1) × V(G2) and E(G1 □ G2) = {((u1, u2), (v1, v2)) : u1v1 ∈ E(G1) and u2 = v2, or u1 = v1 and u2v2 ∈ E(G2)}.
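To make these product definitions concrete, here is a minimal Python sketch (ours, not from the book); the representation of a graph as a vertex list together with a set of ordered pairs closed under reversal is an assumption made purely for illustration.

```python
from itertools import product

def categorical_product(V1, E1, V2, E2):
    # Edges are stored as ordered pairs closed under reversal, so loops
    # and adjacency tests are uniform for looped-simple graphs.
    V = list(product(V1, V2))
    E = {((u1, u2), (v1, v2)) for (u1, v1) in E1 for (u2, v2) in E2}
    return V, E

def strong_product(V1, E1, V2, E2):
    # As in the text: add a loop at every node, take the categorical
    # product, then remove the loops.
    E1L = E1 | {(v, v) for v in V1}
    E2L = E2 | {(v, v) for v in V2}
    V, E = categorical_product(V1, E1L, V2, E2L)
    return V, {(u, v) for (u, v) in E if u != v}

# K2 x K2: the categorical product is a perfect matching on 4 nodes,
# while the strong product is K4.
V1, E1 = [0, 1], {(0, 1), (1, 0)}
Vc, Ec = categorical_product(V1, E1, V1, E1)
Vs, Es = strong_product(V1, E1, V1, E1)
assert len(Ec) == 4 and len(Es) == 12   # ordered pairs: 2 resp. 6 edges
```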
CHAPTER 4
Graph parameters and connection matrices

4.1. Graph parameters and graph properties

A graph parameter is a function defined on isomorphism types of multigraphs with loops. We will mostly consider real valued graph parameters; we'll say explicitly when complex values are also allowed. A graph parameter f is called simple if its value is not changed when loops are removed and edge multiplicities are reduced to 1. Equivalently, we can think of a simple graph parameter as being defined on simple graphs only, but it is often convenient to extend it to multigraphs G by f(G) = f(G^{simp}).

A graph parameter f is additive if f(G) = f(G1) + f(G2) whenever G is the disjoint union of G1 and G2; it is multiplicative if f(G) = f(G1)f(G2), and maxing if f(G) = max{f(G1), f(G2)}. We say that a graph parameter is normalized if its value on K1, the graph with one node and no edge, is 1. Note that if a graph parameter is multiplicative and not identically 0, then its value on K0 (the graph with no nodes and no edges) is 1. We call a graph parameter isolate-indifferent if its value does not change when isolated nodes are removed. Every multiplicative and normalized graph parameter is isolate-indifferent.

There are, of course, too many graph parameters to be treated in a unified way. We will need (and say a few words about) those which are everybody's favorites:
• the maximum size of a stable set of nodes α(G): this is additive;
• the maximum number of independent edges (matching number) ν(G) (additive);
• the number of perfect matchings pm(G) (multiplicative);
• the size of the maximum clique ω(G) (maxing);
• the chromatic number χ(G) (maxing).

Our main objects of study will be graph parameters defined by counting homomorphisms into, and from, given graphs. We discuss those in detail in the next chapter.

A graph property is a class of graphs that is invariant under isomorphism. We identify every graph property P with its indicator function 1_P, so for us, graph properties are just 0-1 valued graph parameters. Of course, there are almost as many graph properties in the literature as there are graph parameters. For our purposes, some "properties of properties" will be important. In particular we will often consider the following special types of properties.
• Monotone property: inherited by subgraphs, i.e., G ∈ P implies that G′ ∈ P for every subgraph G′ of G. Being bipartite, triangle-free, or planar are examples of monotone properties.
• Hereditary property: inherited by induced subgraphs. All monotone properties are also hereditary. Further (non-monotone) hereditary properties are being perfect, or triangulated, or a line-graph.
• Minor-closed property: inherited by minors. Being planar, or series-parallel, or linklessly embeddable in 3-space are such properties.

These monotonicity conditions can be extended to real valued graph parameters in a natural way. For example, a graph parameter f is called minor-monotone if f(G′) ≤ f(G) whenever G′ is a minor of G.

An important operation on graph parameters is the Möbius transformation. It is best to introduce this here, because we will need to use more than one kind. Appendix A.1 introduces the Möbius transformation on a general finite lattice. We will need three special cases for graphs:
• The upper Möbius inverse of a simple graph parameter f (with respect to the lattice of simple graphs on the given node set) is defined by

(4.1)  f^↑(F) = ∑_{F′} (−1)^{e(F′)−e(F)} f(F′)

(the summation ranges over all simple graphs F′ ⊇ F with V(F′) = V(F)).
• The lower Möbius inverse of a multigraph parameter f is defined by

(4.2)  f^↓(F) = ∑_{F′} (−1)^{e(F)−e(F′)} f(F′)

(the summation ranges over all subgraphs F′ ⊆ F with V(F′) = V(F)).
• The Möbius inverse of a graph parameter f, relative to the partition lattice, is defined by

(4.3)  f^⇓(F) = ∑_P µ_P f(F/P),

where P ranges over all partitions of V(F), and µ is the Möbius function of the partition lattice, given by (A.2) (the actual value of µ_P will not be important to us, just that such integers exist). This is in fact the "lower" Möbius inverse on the partition lattice, but thankfully we don't need the upper one in this book.

By the general properties of Möbius inversion, we have the relations

(4.4)  f(F) = ∑_{F′⊇F, V(F′)=V(F)} f^↑(F′),   f(F) = ∑_{F′⊆F, V(F′)=V(F)} f^↓(F′),   f(F) = ∑_P f^⇓(F/P).
Exercise 4.1. Let f be a multiplicative graph parameter. Prove that f^↓ is multiplicative as well.

Exercise 4.2. Let f be an additive graph parameter. Prove that f^↓(G) = 0 if G is disconnected with at least two non-singleton components.
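As a sanity check on (4.1) and the first relation in (4.4), the following Python sketch (ours; the test parameter f is chosen arbitrarily, and any parameter would do) verifies the inversion by brute force on a fixed node set.

```python
from itertools import combinations

def supergraphs(F_edges, n):
    """All simple graphs F' on n nodes with F' >= F (same node set)."""
    all_pairs = list(combinations(range(n), 2))
    free = [p for p in all_pairs if p not in F_edges]
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            yield F_edges | set(extra)

def f(E):                  # an arbitrary graph parameter: edge count squared
    return len(E) ** 2

def f_up(E, n):            # upper Mobius inverse, formula (4.1)
    return sum((-1) ** (len(Ep) - len(E)) * f(Ep) for Ep in supergraphs(E, n))

# the first relation in (4.4): f(F) = sum of f_up over all supergraphs of F
n = 4
F = {(0, 1), (1, 2)}
assert f(F) == sum(f_up(Ep, n) for Ep in supergraphs(F, n))
```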
4.2. Connection matrices

Let F1 and F2 be two partially multilabeled graphs. We define their gluing product (or often just product, if there is no danger of confusion with any other product notion) F1F2 by taking their disjoint union, and then identifying nodes with the same label. Note that this may force further identifications, since labels i and j may occur on the same node u in F1, but on different nodes v and v′ in F2, in which case u, v and v′ must be identified (Figure 4.1). (If F1 and F2 are simply labeled, then this does not happen, and F1F2 is also simply labeled. If F1 and F2 are k-labeled, then F1F2 is also k-labeled.) Another way to describe this construction: form the disjoint union of F1 and F2, add edges between nodes with the same label, and contract the new edges. So the new labeled nodes will correspond to the connected components of the graph on the original labeled nodes, formed by the new edges.

Even if F1 and F2 are simple graphs which are k-multilabeled, their product may have loops and parallel edges. If F1 and F2 are simply k-labeled and have no loops, then F1F2 has no loops, but may have multiple edges. For two 0-labeled (i.e., unlabeled) graphs, F1F2 is their disjoint union. Clearly this multiplication is associative and commutative.
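For the simply k-labeled case, where no extra identifications occur, the gluing product is easy to implement; the following Python sketch (ours; the convention that the labeled nodes are 0, . . . , k−1 is an assumption) may help fix the definition.

```python
def glue(k, G1, G2):
    """Gluing product of two simply k-labeled multigraphs.

    A graph is a pair (m, edges): m nodes, the first k of them carrying
    labels 1..k, and edges a list of (i, j) pairs (multiplicities allowed).
    The product keeps the shared labeled nodes 0..k-1 and places the
    unlabeled nodes of G2 after those of G1.
    """
    m1, E1 = G1
    m2, E2 = G2
    shift = lambda v: v if v < k else v + (m1 - k)
    E = list(E1) + [(shift(i), shift(j)) for (i, j) in E2]
    return (m1 + (m2 - k), E)

# two 1-labeled paths P2 glued at their labeled endpoint give a path P3:
P2 = (2, [(0, 1)])            # node 0 is the labeled node
m, E = glue(1, P2, P2)
assert m == 3 and sorted(E) == [(0, 1), (0, 2)]
```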
Figure 4.1. Top: the product of two simply partially labeled graphs. Bottom: the product of two 4-multilabeled graphs

Example 4.3. Consider edgeless fully k-multilabeled graphs. Such a graph is given by a partition of the label set [k]. The product of two such graphs is also edgeless, and it corresponds to the join of the partitions in the partition lattice (see Appendix A.1). The (unique) simply labeled graph in this class corresponds to the discrete partition; the graph with one node corresponds to the indiscrete partition.

Our basic tool to study a graph parameter will be the sequence of its connection matrices. These are infinite matrices, one for every integer k ≥ 0, whose linear algebraic properties are closely related to graph-theoretic properties of graph parameters.

Let f be any multigraph parameter and fix an integer k ≥ 0. We define the k-th multilabeled connection matrix of the graph parameter f as the (infinite) symmetric matrix M^{mult}(f, k), whose rows and columns are indexed by (isomorphism types of) k-multilabeled multigraphs, and the entry in the intersection of the row corresponding to F1 and the column corresponding to F2 is f([[F1F2]]). The submatrix corresponding to the simply k-labeled graphs is denoted by M^{simp}(f, k) or just M(f, k), and will be called simply the k-th connection matrix (Figure 4.2). The submatrix of M(f, k) formed by rows and columns that are fully labeled (flat) will be called the flat connection matrix and denoted by M^{flat}(f, k). If f is a simple graph parameter, then in M(f, k) the rows indexed by graphs with loops and/or multiple edges are just copies of rows indexed by simple graphs, and similarly for the columns.
Sometimes it is convenient to work with a single connection matrix M (f, N), whose rows and columns are indexed by all partially labeled graphs, and (as before) the entry in row F1 and column F2 is f ([[F1 F2 ]]). Trivially, M (f, N) contains all the connection matrices M (f, k) as submatrices, but it does not carry substantially more information, at least for parameters that will be most interesting for us: if f is isolate-indifferent, then every finite submatrix of M (f, N) is also a submatrix of M (f, k) for every sufficiently large k.
Figure 4.2. Some rows and columns of the second connection matrix. The entries are obtained by applying a graph parameter to the graphs shown. If the graph parameter is the number of edges, we get the infinite matrix on the right.

Two possible properties of connection matrices will be particularly important for us. We call the graph parameter f reflection positive if all the corresponding connection matrices M(f, k) are positive semidefinite. For isolate-indifferent parameters, this is equivalent to saying that M(f, N) is positive semidefinite. (To be precise, we have to talk about reflection positivity with respect to multilabeled or simply labeled connection matrices. If not said explicitly, we mean simply labeled.) We call the parameter flatly reflection positive if its flat connection matrices are positive semidefinite. For simple graphs, these matrices are finite for every fixed k, so flat reflection positivity is a much friendlier notion than general reflection positivity. Nevertheless, they will turn out to be equivalent under mild conditions (Proposition 14.60).

We define the connection rank function of a graph parameter as the rank r(f, k) = rk(M(f, k)), as a function of k (again, simply/multilabeled and simple graph/multigraph versions can be defined). This number is infinite in general, but it is finite in a surprisingly large number of cases. Those parameters for which it is finite for all k, which we call parameters of finite rank, are of particular interest, and will be discussed next.

Exercise 4.4. (a) Show that a multigraph parameter f is multiplicative if and only if M(f, 0) is positive semidefinite and has rank 0 or 1. (b) Characterize graph parameters for which M(f, 0) has rank 1.
4.3. Finite connection rank

Connection matrices are infinite, and typically they have infinite rank. However, surprisingly many multigraph parameters, including some large classes, have finite connection rank, which makes this finiteness a combinatorially important property. Finite connection rank will have an important algorithmic consequence: every such parameter can be computed efficiently (in polynomial time) for graphs with bounded treewidth. The exact statement of this fact and the description of the algorithm will be given in Section 6.5.

As a warm-up, we make a few simple observations about operations on multigraph parameters preserving finite connection rank. (These also hold for simple graph parameters, which can be considered as a special case here.)

Lemma 4.5. Let f and g be graph parameters and c ∈ C. If f and g have finite connection rank, then so do cf, f + g and fg.

Proof. The first two assertions are trivial. The third one follows from the observation that every connection matrix M(fg, k) is a submatrix of the Kronecker product of M(f, k) and M(g, k), and so its rank is at most the product of their ranks.

Lemma 4.6. Let f and g be graph parameters with finite connection rank, and suppose that f − g has finite range. Then max(f, g) and min(f, g) have finite connection rank.

Proof. First, we prove the case when both f and g have finite range. Lemma 4.5 implies that p(f, g) has finite connection rank for every polynomial p in two variables. Since over a finite range, every function of f and g can be expressed as a polynomial of them, the assertion follows. Second, in the general case we use that max(f, g) = f + max(0, g − f). By what we just proved, max(0, g − f) is a parameter with finite rank, and hence so is max(f, g). The assertion for the minimum follows similarly.

4.3.1. Many parameters with finite connection rank. In this section, we describe connection matrices for a variety of multigraph parameters. Our main concern will be their rank (and sometimes whether they are semidefinite).

Example 4.7 (Nodes and edges). The number of edges e(G) in G is an additive parameter: e(F1F2) = e(F1) + e(F2) for two unlabeled multigraphs F1 and F2. In fact, this holds for two k-labeled graphs as well, and so M(e, k) is the sum of two matrices of rank 1. Thus M(e, k) has rank 2, so r(e, k) = 2 for all k ≥ 0. Similarly, the number of nodes has finite connection rank r(v, k) = 2 for all k.

Example 4.8 (Non-parallel edges). Let e′(G) = e(G^{simp}) denote the number of different (i.e., non-parallel) edges in G. For two k-labeled graphs G1 and G2, we have e′(G1G2) = e′(G1) + e′(G2) − (1/2) A1 · A2, where Ai is the adjacency matrix of the subgraph of Gi induced by the labeled nodes. Hence M(e′, k) can be written as the sum of two matrices of rank 1 and one matrix of rank at most \binom{k}{2}. Thus r(e′, k) ≤ \binom{k}{2} + 2, and one can check that this is the exact value.

Example 4.9 (Subgraphs). Let sg(G) = 2^{e(G)} denote the number of spanning subgraphs of G. Then sg(G1G2) = sg(G1)sg(G2), and so M(sg, k) has rank 1. Thus r(sg, k) = 1 for all k. These matrices are trivially positive semidefinite.
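A quick numerical illustration of Example 4.7 (ours; assumes numpy is available): build a finite submatrix of M(e, 2), using the additivity of the edge count under gluing, and check that its rank is 2.

```python
import numpy as np

def glued_edge_count(F1, F2):
    """Edge count of the gluing product of two simply k-labeled
    multigraphs (m, edges); multiplicities are summed, so the count
    is simply additive: e(F1F2) = e(F1) + e(F2)."""
    return len(F1[1]) + len(F2[1])

# a few 2-labeled multigraphs (nodes 0 and 1 are the labeled ones):
graphs = [
    (2, []),                         # two labeled isolated nodes
    (2, [(0, 1)]),                   # one edge between the labeled nodes
    (2, [(0, 1), (0, 1)]),           # a double edge
    (3, [(0, 2), (1, 2)]),           # a path through an unlabeled node
    (4, [(0, 2), (2, 3), (3, 1)]),   # a longer such path
]
M = np.array([[glued_edge_count(F1, F2) for F2 in graphs] for F1 in graphs])
print(np.linalg.matrix_rank(M))      # 2, matching r(e, k) = 2
```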
Example 4.10 (Simple subgraphs). Let sg′(G) = 2^{e′(G)} denote the number of simple subgraphs of G. Then

sg′(G1G2) = sg′(G1) sg′(G2) / sg′(G1 ∩ G2).

The first two factors don't change the rank, and the rows of the matrix given by the third factor are determined by the edges induced by the labeled nodes, so the corresponding matrix has at most 2^{\binom{k}{2}} different rows. Hence r(sg′, k) ≤ 2^{\binom{k}{2}}. Again one can check that this is the exact value.

Next we look at some of the less trivial but still common graph parameters, which make more complicated examples.

Example 4.11 (Stability number). The maximum size α(G) of a stable set of nodes is additive, and has finite connection rank (Godlin, Kotek and Makowsky [2008]). This is more difficult to prove. First, we split the rows of the matrix M(α, k) into 2^{\binom{k}{2}} classes, according to the subgraph Hi of Fi induced by the labeled nodes. This splits the matrix M(α, k) into 2^{k(k−1)} submatrices, and it suffices to show that each of these has finite rank. So let us fix H1 and H2. Let I denote the set of stable sets of nodes in H1 ∪ H2, and let Fi′ = Fi ∖ [k] and Fi^S = Fi′ ∖ N_{Fi}(S). For two k-labeled graphs F1 and F2 whose labeled nodes induce H1 and H2, respectively, we have α(F1F2) = max_{S∈I} α_S(F1F2), where

α_S(F1F2) = |S| + α(F1^S) + α(F2^S)

is the maximum size of a stable set in F1F2 intersecting [k] in S. The rank of the matrix (α_S(F1F2)) is at most 3. Unfortunately, we cannot apply Lemma 4.6 directly, since α_S(F1F2) is not bounded. But we can use that α(F1F2) ≥ α_∅(F1F2) = α(F1′) + α(F2′), and hence those sets S for which α(F1^S) < α(F1′) − k or α(F2^S) < α(F2′) − k play no role in the maximum. In other words, we can replace α_S by

α′_S(F1F2) = |S| + max{α(F1^S), α(F1′) − k} + max{α(F2^S), α(F2′) − k},

and still have that α(F1F2) = max_S α′_S(F1F2). The matrices (α′_S(F1F2)) have rank at most 3, and for different sets S the corresponding entries differ by at most 3k, so the same argument as in the proof of Lemma 4.6 implies that α has finite connection rank.

Example 4.12 (Node cover number). The minimum number of nodes covering all edges, τ(G) = v(G) − α(G), has finite connection rank as well, since every connection matrix of τ is the difference of the corresponding connection matrices of the parameters v and α, which both have finite rank.

Example 4.13 (Number of stable sets). Let stab(G) denote the number of stable sets in G. This parameter is multiplicative, and has finite connection rank: this can be verified easily by distinguishing stable sets according to their intersection with the set of labeled nodes.

Example 4.14 (Number of perfect matchings). Let pm(G) denote the number of perfect matchings in the multigraph G. It is trivial that pm is multiplicative. Let G be a k-labeled multigraph, let X ⊆ [k], and let pm(G, X) denote the number of matchings in G that match all the unlabeled nodes and the nodes with
label in X, but not any of the other labeled nodes. Then we have for any two k-labeled multigraphs G1 and G2

pm(G1G2) = ∑_{X⊆[k]} pm(G1, X) pm(G2, [k]∖X).
Hence the matrix M(pm, k) can be written as the sum of 2^k matrices of rank 1, and its rank is at most 2^k (it is not hard to see that in fact equality holds). If we consider the number of perfect matchings as a simple graph parameter (in terms of multigraphs, this means that we don't care which edge in a parallel class matches a given pair of nodes), then the above argument has to be modified to arrive at a similar conclusion. The details of this are left to the reader as an exercise.

Example 4.15 (Number of Hamilton cycles). Let ham(G) be the number of Hamilton cycles in G. For two k-labeled multigraphs G1 and G2, every Hamilton cycle H in G1G2 defines a cyclic ordering (i1, . . . , ik) of the nodes in [k], and for any two consecutive nodes ir and ir+1, it defines an index jr ∈ [2] which tells us whether the arc of H between ir and ir+1 uses G1 or G2. Let us call the cyclic ordering (i1, . . . , ik), together with the indices (j1, . . . , jk), the trace of H on the labeled nodes. (If you are living in the set of labeled nodes, and cannot see farther than a small neighborhood of the nodes, then the trace is all that you can see from a Hamilton cycle.) Given a possible trace T = (i1, . . . , ik; j1, . . . , jk), we denote by ham(Gj; T) the number of systems of edge-disjoint paths in Gj which connect ir and ir+1 for all r with jr = j, and which cover all unlabeled nodes in Gj. Then

ham(G1G2) = ∑_T ham(G1; T) ham(G2; T),
showing that the rank of M(ham, k) is bounded by the number of possible traces (which is 2^{k−1}(k − 1)! by standard combinatorial calculation).

Example 4.16 (Chromatic polynomial). Every substitution into the chromatic polynomial chr(G, x) gives a graph parameter (see Appendix A.2). If we substitute a nonnegative integer q for the variable x, we get the number of q-colorings, which is a special case of homomorphism functions (to be discussed in the next chapter). What about evaluations at other values? The rank of the connection matrices for the general case was determined by Freedman, Lovász and Welsh (see Lovász [2006a]). Let B_k denote the number of partitions of a k-set (the k-th Bell number), and let B_{k,q} denote the number of its partitions into at most q parts.

Proposition 4.17. For every fixed x, chr(., x) is a multiplicative graph parameter. For every k ≥ 0,

r(chr(., x), k) = B_{k,x} if x is a nonnegative integer, and B_k otherwise.

Furthermore, M(chr(., x), k) is positive semidefinite if and only if x is a nonnegative integer or x ≥ k − 1.

Proof. We prove that the right hand side is an upper bound even for the rank of the multi-connection matrix, and a lower bound for the rank of the simple
connection matrix. The case x = 0 is trivial, so suppose that x ≠ 0. Using the deletion-contraction relation

(4.5)  chr(F, x) = chr(F − e, x) − chr(F/e, x),
we see that for every k-multilabeled multigraph F with at least one edge, the row of M(chr(., x), k) corresponding to F is the difference of two earlier rows (where we order the rows so that the number of edges is non-decreasing). So the rank of the whole matrix is the same as the rank of the submatrix formed by k-multilabeled graphs with no edges. Since deleting an unlabeled isolated node just divides the row by x, we may assume that all nodes are labeled. So the rows and columns of the remaining matrix M′ correspond to partitions of the label set [k]. In the intersection of the row indexed by P ∈ Π(k) and the column indexed by Q ∈ Π(k), we find the k-multilabeled graph corresponding to P ∨ Q; the chromatic polynomial of this graph is x^{|P∨Q|}. Let D denote the diagonal matrix in which the entry in row and column P is (x)_{|P|}; then by identities (A.3) and (A.1), we have M′ = ZDZ^T. This implies that the rank of M′ is the same as the rank of D, and it is positive semidefinite if and only if D is positive semidefinite. If x is not a nonnegative integer, then D has full rank. The same conclusion holds for positive integers x ≥ k. If x < k is a nonnegative integer, then the number of nonzero diagonal entries in D is B_{k,x}. Finally, D is positive semidefinite if and only if (x)_j ≥ 0 for all 1 ≤ j ≤ k, which is clearly equivalent to x being a nonnegative integer or x ≥ k − 1.

Note the nontrivial fact that the rank is always finite. If x is a nonnegative integer, then the connection rank is bounded by x^k, but otherwise, as a function of k, it grows faster than c^k for every c.

Example 4.18 (Tutte polynomial). The cluster expansion version cep(G; u, v) of the Tutte polynomial generalizes the chromatic polynomial (see again Appendix A.2), and it behaves similarly. It is not hard to show that for v ≠ 0,

r(cep, k) = B_{k,u} if u is a nonnegative integer, and B_k otherwise

(the case v = 0 is trivial). Furthermore, cep(G; u, v) is reflection positive if and only if u is a nonnegative integer. For other versions of the Tutte polynomial (e.g., tut) similar conclusions hold, since they are related to cep by scaling and substitution in the variables (except when the expressions we scale with are 0).

Example 4.19 (Number of spanning trees). The number of spanning trees tree(G) of a graph G is obtained by substitution into the Tutte polynomial tut with x = y = 1. Since u = (x − 1)(y − 1) = 0, this falls under the exception at the end of the last example. Nevertheless, the arguments can be adjusted appropriately, and we get that r(tree, k) = B_k.

We conclude with a couple of examples of parameters whose connection matrices have infinite rank, but which are still "interesting".

Example 4.20 (Maximum clique). The size of a maximum clique, ω(G), is maxing. It does not have finite connection rank. In fact, consider the connection matrix M(ω, 0), and its submatrix M whose rows and columns are indexed by
cliques K1, K2, . . . . This looks like

M = ( 1 2 3 4 ⋯ )
    ( 2 2 3 4 ⋯ )
    ( 3 3 3 4 ⋯ )
    ( 4 4 4 4 ⋯ )
    ( ⋮ ⋮ ⋮ ⋮ ⋱ )
and has infinite rank. A similar argument shows that no unbounded maxing graph parameter has finite connection rank (Exercise 4.34). In particular, the chromatic number has infinite connection rank.

Example 4.21 (Eulerian orientations). For an undirected multigraph F, let eul⃗(F) denote the number of eulerian orientations of F (i.e., orientations in which every node has the same outdegree as indegree; for simplicity, let's exclude loops). By Euler's theorem, eul⃗(F) = 0 if and only if F has a node with odd degree. It is clear that the parameter eul⃗ is multiplicative.

Let us define eul⃗(G; a1, . . . , ak) as the number of orientations of a k-labeled graph G such that the unlabeled nodes have equal outdegree and indegree, while for a node i ∈ [k], the difference between its indegree and outdegree is ai. Then we have:

(4.6)  eul⃗(G1G2) = ∑_a eul⃗(G1; a1, . . . , ak) eul⃗(G2; −a1, . . . , −ak) = ∑_a eul⃗(G1; a1, . . . , ak) eul⃗(G2; a1, . . . , ak).
(The sum is finite for every G1 and G2, but the number of nonzero terms is not bounded.) This implies that the connection matrices M(eul⃗, k) are positive semidefinite, but it does not follow that they have finite rank; and in fact, they have infinite rank for k ≥ 2 (see Exercise 4.33).

4.3.2. Minor-closed graph properties. We have seen many examples of graph parameters with finite connection rank. In the next sections, we will describe some very general classes of such parameters. A challenging problem is to determine all such graph parameters.

Theorem 4.22. Every minor-closed multigraph property has finite connection rank.

Proof. We use the very deep theorem of Robertson and Seymour [2004] that such a property can be characterized by a finite number of excluded minors. Let P be the multigraph property that G does not contain any of H1, . . . , Hm as a minor, and let Pi be the property that G does not contain Hi as a minor. Then 1_P = 1_{P1} · · · 1_{Pm}, and so it suffices to prove that each Pi has finite connection rank. In other words, we may assume that P is the property of not containing a given graph H as a minor.

Fix k, and consider all graphs H′ on at most v(H) + k − 1 nodes that can be contracted to H. In H′, select a subset of at most k nodes in all possible ways, and label them by different numbers from [k]. Finally, 2-color the edges of H′ red and blue so that only labeled nodes can be incident with edges of both colors. Call every partially labeled 2-colored graph obtained this way a pre-minor. It is clear that the number of pre-minors is finite.
Let G and G′ be two partially labeled edge-colored graphs. We say that G′ is a minor of G if G′ can be obtained from G by deleting edges and/or nodes and contracting edges so that no edge with both endnodes labeled is ever contracted. (The remaining edges keep their colors and the remaining labeled nodes keep their labels.)

Consider a product G1G2 of two k-labeled graphs, and color the edges of G1 red and the edges of G2 blue. It is easy to see that G1G2, as an unlabeled uncolored graph, contains H as a minor if and only if it contains, as a labeled edge-colored graph, at least one pre-minor as a minor. For a pre-minor H′, let H1′ be the subgraph of H′ formed by all red edges, their endpoints, and the labeled nodes. We define H2′ similarly using the blue edges. Then G1G2 contains H′ as a minor if and only if G1 contains H1′ and G2 contains H2′ as a minor.

Let G1 and G′1 be two k-labeled graphs, and suppose that for every pre-minor H′, H1′ is a minor of G1 if and only if it is a minor of G′1. Then the rows of M(1_P, k) indexed by G1 and G′1 are equal. This means that M(1_P, k) has only a finite number of different rows, and hence its rank is finite.

Corollary 4.23. Every nonnegative integer valued bounded minor-monotone multigraph parameter has finite connection rank.

Proof. Let f be such a parameter, and assume that f ≤ K. Then f(G) = 1(f(G) ≥ 1) + 1(f(G) ≥ 2) + · · · + 1(f(G) ≥ K). Since the graph property that f(.) ≤ i is minor-closed, each parameter 1(f(G) ≤ i) has finite connection rank by Theorem 4.22; hence so does each 1(f(G) ≥ i + 1) = 1 − 1(f(G) ≤ i), and so does f, by Lemma 4.5.

4.3.3. Monadic second order formulas. To describe a very rich class of graph properties with finite connection rank (at least for looped-simple graphs), we consider properties defined by certain logical formulas. A first order formula in graph theory is composed of primitives "x = y" and "x ∼ y", using logical operations "∧" (AND), "∨" (OR) and "¬" (NEGATION), and logical quantifiers "∀" and "∃". Every such formula, properly composed, with all variables quantified, defines a property of looped-simple graphs, if we interpret the quantified variables as nodes, and the relation x ∼ y as x and y being adjacent. For example, the property of being a 2-regular loopless graph can be expressed as

(∀x)(∀y)(x = y ⇒ x ≁ y) ∧ (∀x)(∃y)(∃z)(y ≠ z ∧ x ∼ y ∧ x ∼ z ∧ (∀u)(x ∼ u ⇒ (u = y ∨ u = z)))

(to facilitate reading these formulas, we will use some standard conventions like writing A ⇒ B instead of ¬A ∨ B and x ≠ y instead of ¬(x = y)).

First order formulas can define rather simple graph properties only, but we get a real jump in generality if we allow quantifying over subsets of nodes and edges. A monadic second order formula has three types of variables, which we distinguish using different fonts. Lower case letters denote nodes, upper case letters denote subsets of nodes, and upper case boldface letters denote subsets of the edges. The primitives then also include x ∈ X and xy ∈ Y. We call the formula node-monadic if quantifying over subsets of edges is not allowed. This way we get a quite powerful language to express graph formulas, as the following examples show (see also the exercises at the end of the section).
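Before those examples, it may help to see that such a sentence can be evaluated mechanically: each quantifier becomes a loop over the node set. The Python sketch below (ours, purely illustrative) transcribes the 2-regularity formula above.

```python
from itertools import product as cart

def adj(E):
    # symmetric adjacency test for an edge set stored one way round
    return lambda x, y: (x, y) in E or (y, x) in E

def is_2_regular_loopless(V, E):
    """Direct transcription of the first-order sentence in the text:
    no loops, and every node x has neighbors y != z covering all of
    its adjacencies."""
    a = adj(E)
    no_loops = all(not a(x, x) for x in V)
    degree_two = all(
        any(y != z and a(x, y) and a(x, z)
            and all((not a(x, u)) or u in (y, z) for u in V)
            for y, z in cart(V, V))
        for x in V)
    return no_loops and degree_two

C4 = ([0, 1, 2, 3], {(0, 1), (1, 2), (2, 3), (3, 0)})
P3 = ([0, 1, 2], {(0, 1), (1, 2)})
assert is_2_regular_loopless(*C4) and not is_2_regular_loopless(*P3)
```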
Example 4.24. The property of being bipartite (2-colorable) can be expressed as

(∃X)(∃Y)( (∀x)((x ∈ X ∨ x ∈ Y) ∧ ¬(x ∈ X ∧ x ∈ Y)) ∧ (∀x)(∀y)(x ∼ y ⇒ ((x ∈ X ∧ y ∈ Y) ∨ (x ∈ Y ∧ y ∈ X))) ).
Colorability by any given number of colors can be expressed similarly.

Example 4.25. The existence of a perfect matching can be expressed as

(∃M)( (∀x)(∀y)((xy ∈ M) ⇒ (x ∼ y)) ∧ (∀x)(∃y)(xy ∈ M) ∧ (∀x)(∀y)(∀z)(((xy ∈ M) ∧ (xz ∈ M)) ⇒ z = y) ).

The existence of a Hamilton cycle can also be expressed (see Exercise 4.38).
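Second-order quantification over edge subsets can likewise be evaluated by brute force on small graphs; the following sketch (ours) transcribes the perfect-matching formula of Example 4.25, assuming graphs given as vertex and edge sets.

```python
from itertools import chain, combinations

def edge_subsets(E):
    E = list(E)
    return chain.from_iterable(combinations(E, r) for r in range(len(E) + 1))

def has_perfect_matching(V, E):
    """Brute-force semantics of Example 4.25: there exists a set M of
    edges such that every node meets some edge of M and no node meets
    two of them."""
    for M in edge_subsets(E):
        covered = [v for e in M for v in e]
        if set(covered) == set(V) and len(covered) == len(set(covered)):
            return True
    return False

C4 = ([0, 1, 2, 3], {(0, 1), (1, 2), (2, 3), (3, 0)})
P3 = ([0, 1, 2], {(0, 1), (1, 2)})
assert has_perfect_matching(*C4) and not has_perfect_matching(*P3)
```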
Example 4.26. Planarity of a graph can be expressed by a node-monadic second-order formula. First we construct a formula expressing the property of a graph G that it contains a subdivision of K5. One way to do so is to look for 5 nodes v1, . . . , v5 and 10 subsets of nodes P12, P13, . . . , P45 such that every Pij induces a connected subgraph containing vi and vj, and the sets Pij ∖ {vi, vj} are disjoint. For a given i < j, the required properties of Pij can be expressed by

Ψij : (vi ∈ Pij) ∧ (vj ∈ Pij) ∧ (∀S)( ((vi ∈ S) ∧ (vj ∉ S)) ⇒ (∃u)(∃w)((u ∈ Pij) ∧ (w ∈ Pij) ∧ (u ∈ S) ∧ (w ∉ S) ∧ (u ∼ w)) ).

For every pair of pairs {{i, j}, {k, l}} with i < j, k < l, and {i, j} ∩ {k, l} = ∅ we write

Φi,j,k,l : (∀u)((u ∉ Pij) ∨ (u ∉ Pkl)),

and for every pair of pairs {{i, j}, {k, l}} with i < j, k < l, and {i, j} ∩ {k, l} = {m} we write

Φi,j,k,l : (∀u)( ((u ∈ Pij) ∧ (u ∈ Pkl)) ⇒ (u = vm) ).

Then the formula

Θ1 = (∃v1) . . . (∃v5)(∃P12)(∃P13) . . . (∃P45)( ⋀_{i<j} Ψij ∧ ⋀ Φi,j,k,l )

. . . λk > −D (as G is non-bipartite).
To handle this product, we take the logarithm and expand it:

(5.43)  ln ∏_{k=1}^{n} (x − λk) = n ln x + ∑_{k=1}^{n} ln(1 − λk/x) = n ln x − ∑_{k=1}^{n} ∑_{r=1}^{∞} (1/r)(λk^r / x^r) = n ln x − ∑_{r=1}^{∞} (1/(r x^r)) ∑_{k=1}^{n} λk^r.
We can express the last sum using (5.31), to get (5.40). By the Matrix Tree Theorem, n·tree(G) is the coefficient of the linear term in the determinant det(yI + DI − A), and hence

ln tree(G) = lim_{y→0} ( ln det(yI + DI − A) − ln y − ln n ).

Using (5.40),

ln det(yI + DI − A) = n ln(y + D) − ∑_{r=1}^{∞} hom(Cr, G) / (r(y + D)^r)
  = n ln(y + D) − ∑_{r=1}^{∞} (hom(Cr, G) − D^r) / (r(y + D)^r) + ln(y / (y + D))
  = (n − 1) ln(y + D) − ∑_{r=1}^{∞} (hom(Cr, G) − D^r) / (r(y + D)^r) + ln y.
Substituting this in the formula for ln tree(G) and letting y → 0, we get (5.41).
Exercise 5.25. Prove identity (5.32).

Exercise 5.26. Let H = H(a, B) be a weighted graph, where

a = (1, −1)^T,   B = ( 2 1 )
                     ( 1 1 )

(an illegal weighting, because there is a negative nodeweight, but the formula defining the hom function makes sense). Prove that hom(F, H) is the number of those subsets of edges that cover every node.

Exercise 5.27. Verify that (5.41) yields the Cayley formula tree(Kn) = n^{n−2}.

Exercise 5.28. Prove that n·tree(G) = ∏_{k=2}^{n} (D − λk), and show that this implies (5.41).
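A quick numerical check of the identity in Exercise 5.28 (ours; assumes numpy): compute tree(G) from the Matrix Tree Theorem and compare with the eigenvalue product for a small regular graph.

```python
import numpy as np

def tree_count_vs_spectrum(A):
    """For a connected D-regular graph with adjacency matrix A, compare
    n*tree(G), with tree(G) from the Matrix Tree Theorem (a cofactor of
    the Laplacian), against prod_{k=2}^{n} (D - lambda_k)."""
    n = A.shape[0]
    D = int(A[0].sum())
    L = D * np.eye(n) - A
    trees = round(np.linalg.det(L[1:, 1:]))       # delete one row and column
    lams = np.sort(np.linalg.eigvalsh(A))[::-1]   # lambda_1 = D comes first
    rhs = np.prod(D - lams[1:])
    assert abs(n * trees - rhs) < 1e-6 * rhs
    return trees

A = np.ones((4, 4)) - np.eye(4)                   # K4 is 3-regular
print(tree_count_vs_spectrum(A))                  # 16 = 4^{4-2}, Cayley
```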
5.4. Homomorphism and isomorphism

5.4.1. Homomorphism profiles. We start with a simple but useful observation (Lovász [1967]); various less trivial extensions and generalizations of this fact will play an important role a number of times (cf. Theorems 5.33, 13.9 and 17.5, and Corollaries 5.45 and 10.34).

Theorem 5.29. Either one of the simple graph parameters hom(., G) and hom(G, .) determines the simple graph G. By the same argument, these parameters defined on looped-simple graphs determine a looped-simple graph G.
Proof. We prove that hom(., G) determines G; the argument for hom(G, .) is similar. The analogous statement for injective homomorphisms is trivial: if G and G′ are two simple graphs such that inj(F, G) = inj(F, G′) for every simple graph F, then in particular inj(G′, G) = inj(G′, G′) > 0 and inj(G, G′) = inj(G, G) > 0, so G and G′ have injective homomorphisms into each other, and hence they are isomorphic. Now (5.18) expresses injective homomorphism numbers in terms of ordinary homomorphism numbers, which implies that if hom(F, G) = hom(F, G′) for every simple graph F, then inj(F, G) = inj(F, G′) for every simple graph F, and hence G ≅ G′.

We see from the proof of Theorem 5.29 that in fact G is determined by the values hom(F, G) where v(F) ≤ v(G), as well as by the values hom(G, F) where v(F) ≤ v(G). It is a long-standing open problem whether, up to trivial exceptions, strictly smaller graphs F are enough:

Conjecture 5.30 (Reconstruction Conjecture). If G is a simple graph with v(G) ≥ 3, then the numbers hom(F, G) with v(F) < v(G) determine G.

There is a weaker version, which is also unsolved:

Conjecture 5.31 (Edge Reconstruction Conjecture). If G is a simple graph with e(G) ≥ 4, then the numbers hom(F, G) with e(F) < e(G) determine G.

It is known that the Edge Reconstruction Conjecture holds for graphs G with e(G) ≥ v(G) log v(G) (Müller [1977]). We will prove an "approximate" version of the Reconstruction Conjecture (Theorem 10.32): for an arbitrarily large graph G, the numbers hom(F, G) with v(F) ≤ k determine G up to an error of O(1/√(log k)) (measured in the cut distance, which was mentioned in the Introduction but will be formally defined in Chapter 8). Unfortunately, this does not seem to bring us closer to the resolution of the Reconstruction Conjecture.

The normalized homomorphism density function t(., G) does not determine a simple graph G: if G(p) is obtained from G by replacing every node by p twin nodes, then t(F, G(p)) = t(F, G). But this is all that can go wrong:

Theorem 5.32. If G1 and G2 are simple graphs such that t(F, G1) = t(F, G2) for every simple graph F, then there is a third simple graph G and positive integers p1, p2 such that G1 ≅ G(p1) and G2 ≅ G(p2).

Proof. Let ni = v(Gi), and consider the blowups G′1 = G1(n2) and G′2 = G2(n1). These have the same number of nodes, and hence t(F, G′1) = t(F, G1) = t(F, G2) = t(F, G′2) implies that hom(F, G′1) = hom(F, G′2). So by Theorem 5.29, we have G′1 ≅ G′2. It follows that the number of elements in every class of twin nodes of G′1 ≅ G′2 is divisible by both n1 and n2, and so it is also divisible by m = lcm(n1, n2). So G′1 ≅ G′2 ≅ G(m) for some simple graph G, and hence pi = m/ni satisfies the requirements in the theorem.

For weighted graphs, one must be a little careful. Let H be a weighted graph and let H′ be obtained from H by twin reduction. Then hom(F, H′) = hom(F, H) for every multigraph F, even though H and H′ are not isomorphic. Restricting our attention to twin-free graphs, we have an analogue of Theorem 5.29 (Lovász [2006b]):
Theorem 5.33. Let H1 and H2 be twin-free weighted graphs such that hom(F, H1) = hom(F, H2) holds for all simple graphs F with at most 2(v(H1) + v(H2) + 3)^8 nodes. Then H1 ≅ H2.

The proof of this theorem is substantially more complicated than that of its unweighted version. A proof without the bound on the size of F will be described in Section 6.4.1, where it will follow easily from the general tools developed there, and the full proof will be postponed until Section 6.4.2.

5.4.2. Algebraic properties of graph multiplication. The fact that homomorphism numbers (into it or from it) determine the graph will motivate much in the sequel. Here we describe a few old applications of this fact to some basic algebraic properties of the categorical product (Lovász [1967, 1971]).

First, we show that taking k-th root is unique (if it exists at all).

Theorem 5.34. If G1 and G2 are looped-simple graphs such that G1^{×k} ≅ G2^{×k} for some k ≥ 1, then G1 ≅ G2.

Proof. For every looped-simple graph F, we have

hom(F, G1) = hom(F, G1^{×k})^{1/k} = hom(F, G2^{×k})^{1/k} = hom(F, G2),

whence by Theorem 5.29, G1 ≅ G2.

Next we turn to the question of the Cancellation Law: does G1 × H ≅ G2 × H (where G1, G2 and H are looped-simple graphs) imply that G1 ≅ G2? This is false in general:

(5.44)  K2 × C6 ≅ K2 × (K3 ∪ K3).

But the proof method of Theorem 5.34 almost goes through: for every looped-simple graph F, we have

hom(F, G1)hom(F, H) = hom(F, G1 × H) = hom(F, G2 × H) = hom(F, G2)hom(F, H);

if hom(F, H) ≠ 0, then this implies that hom(F, G1) = hom(F, G2). What to do if hom(F, H) = 0? We can find several simple conditions under which this difficulty can be handled:
Next we turn to the question of Cancellation Law: does G1 × H ∼ = G2 × H (where G1 , G2 and H are looped-simple graphs) imply that G1 ∼ = G2 ? This is false in general: (5.44) K2 × C6 ∼ = K2 × (K3 K3 ). But the proof method of Theorem 5.34 almost goes through: for every looped-simple graph F , we have hom(F, G1 )hom(F, H) = hom(F, G1 × H) = hom(F, G2 × H) = hom(F, G2 )hom(F, H); if hom(F, H) ̸= 0, then this implies that hom(F, G1 ) = hom(F, G2 ). What to do if hom(F, H) = 0? We can find several simple conditions under which this difficulty can be handled: Proposition 5.35. Let G1 , G2 and H be looped-simple graphs such that G1 × H G2 × H. (a) If H has a loop, then G1 ∼ = G2 . (b) If both G1 and G2 have a homomorphism into H, then G1 ∼ = G2 . ′ (c) If a looped-simple graph H has a homomorphism into H, then G1 × H ′ G2 × H ′ .
∼ =
∼ =
Since the strong product corresponds to having a loop at every node, we get:

Corollary 5.36. Let G1, G2 and H be simple graphs such that G1 ⊠ H ≅ G2 ⊠ H. Then G1 ≅ G2.

The proof of Proposition 5.35 is left to the reader as an exercise. With a little more effort, we can characterize cancelable graphs (Lovász [1971]). If H is bipartite, then hom(H, K2) > 0, and so (5.44) and Proposition 5.35 imply

H × C6 ≅ H × (K3 ∪ K3),
so H is not cancelable. On the other hand, nonbipartite graphs are cancelable:

Theorem 5.37. Let G1, G2 and H be looped-simple graphs such that G1 × H ≅ G2 × H. If H is not bipartite, then G1 ≅ G2.

The proof depends on the following lemma:

Lemma 5.38. Suppose that G1 × H ≅ G2 × H. Then there is an isomorphism σ : G1 × H → G2 × H such that σ(V(G1) × {v}) = V(G2) × {v} for every v ∈ V(H).

Proof. We consider graphs G together with a homomorphism π : G → H. We call the pair G = (G, π) an H-colored graph. For every graph G, the product G × H is H-colored in the natural way by the projection onto H. We denote this H-colored graph by G_H. Two H-colored graphs F = (F, ρ) and G = (G, π) are isomorphic if there is an isomorphism η : F → G which commutes with the projections to H, i.e., π(η(i)) = ρ(i) for every i ∈ V(F). In this language, we want to prove that G1 × H ≅ G2 × H implies that (G1)_H ≅ (G2)_H.

For two H-colored graphs F = (F, ρ) and G = (G, π), let hom(F, G) denote the number of those homomorphisms η from F to G that satisfy π(η(i)) = ρ(i) for every i. Let inj(F, G) denote the number of injective homomorphisms with this property. We can define the product F × G of two H-colored graphs F = (F, ρ) and G = (G, π) as the subgraph of F × G induced by those nodes (i, j) with ρ(i) = π(j), together with the homomorphism σ(i, j) = ρ(i) into H. The case when H consists of a single node with a loop is equivalent to just ordinary homomorphism numbers. Two identities extend quite easily to this more general notion:
(5.45)  hom(F, G1 × G2) = hom(F, G1) hom(F, G2),

and

(5.46)  inj(F, G) = ∑_{F′} µ(F, F′) hom(F′, G),

where we sum over all H-colored graphs F′ on at most v(F) nodes, with appropriate coefficients µ(F, F′). Let us add the easy identity

(5.47)  (F × G)_H ≅ F_H × G_H.

From G1 × H ≅ G2 × H it follows that (G1 × H)_H ≅ (G2 × H)_H (as H-colored graphs). By (5.47), this implies that (G1)_H × H_H ≅ (G2)_H × H_H, and hence by (5.45),

hom(F, (G1)_H) hom(F, H_H) = hom(F, (G2)_H) hom(F, H_H).

But notice that hom(F, H_H) > 0: if F = (F, σ), then (σ, σ) is a homomorphism F → H_H. Thus we can divide by hom(F, H_H) to get

hom(F, (G1)_H) = hom(F, (G2)_H)

for every H-colored graph F. From here (G1)_H ≅ (G2)_H follows just like in the proof of Theorem 5.29.

Proof of Theorem 5.37. By Proposition 5.35, we may assume that H is an odd cycle with V(H) = [2r + 1] and E(H) = {ij : j ≡ i + 1 (mod 2r + 1)}. By Lemma 5.38 there exist bijections φ1, . . . , φ_{2r+1} : V(G1) → V(G2) such that for
every ij ∈ E(H), φi(u)φj(v) ∈ E(G2) if and only if uv ∈ E(G1). (Note that this means a different condition if we interchange i and j.) We show that φ1 is an isomorphism between G1 and G2. Indeed, we have

φ1(u)φ1(v) ∈ E(G2) ⟺ φ2^{−1}(φ1(u)) v ∈ E(G1) ⟺ φ1(u)φ3(v) ∈ E(G2)
⟺ φ4^{−1}(φ1(u)) v ∈ E(G1) ⟺ φ1(u)φ5(v) ∈ E(G2)
⟺ . . . ⟺ φ1(u)φ_{2r+1}(v) ∈ E(G2) ⟺ uv ∈ E(G1).

This completes the proof.
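The failure of cancellation in (5.44) is easy to confirm by machine; the following sketch (ours) assumes the networkx package, whose tensor_product implements the categorical product.

```python
import networkx as nx

K2 = nx.complete_graph(2)
C6 = nx.cycle_graph(6)
two_K3 = nx.disjoint_union(nx.complete_graph(3), nx.complete_graph(3))

lhs = nx.tensor_product(K2, C6)          # categorical product K2 x C6
rhs = nx.tensor_product(K2, two_K3)      # categorical product K2 x (K3 u K3)
assert nx.is_isomorphic(lhs, rhs)        # both are two disjoint 6-cycles
assert not nx.is_isomorphic(C6, two_K3)  # yet the two factors differ
```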
Remark 5.39. You may have noticed that the proof of Lemma 5.38 followed the lines of the proof of Proposition 5.35, only restricting the notion of homomorphisms to those respecting the H-coloring. This suggests that there is a more general formulation for categories. This is indeed the case, as we will see in Section 23.4.

A further natural question about multiplication is whether prime factorization is unique. This is clearly a stronger property than the Cancellation Law, so let us restrict our attention to the strong product, which satisfies the Cancellation Law. The following example shows that prime factorization is not unique in general. We start with an algebraic identity:

(5.48)  (1 + x + x^2)(1 + x^3) = (1 + x)(1 + x^2 + x^4).

If we substitute any connected graph G for x, and interpret "+" as disjoint union, we get a counterexample. For example,

(5.49)  (K1 ∪ K2 ∪ K4) ⊠ (K1 ∪ K8) = (K1 ∪ K2) ⊠ (K1 ∪ K4 ∪ K16).
But there is a very nice positive result of Dörfler and Imrich [1970] and McKenzie [1971]. (The proof uses different techniques, and we don't reproduce it here.)

Theorem 5.40. Prime factorization is unique for the strong product of connected graphs.

Exercise 5.41. (a) Prove that the strong product of two graphs is connected if and only if both graphs are connected. (b) Show by an example that the categorical product of two connected graphs is not always connected. (c) Characterize all counterexamples in (b).

Exercise 5.42. Given two looped-simple digraphs F and G, we define the digraph G^F as follows: V(G^F) = V(G)^{V(F)}, and E(G^F) = {(φ, ψ) : φ, ψ ∈ V(G)^{V(F)}, (φ(u), ψ(v)) ∈ E(G) for all (u, v) ∈ E(F)}. (a) Prove the following identities: (G1 × G2)^F ≅ G1^F × G2^F, G^{F1×F2} ≅ (G^{F1})^{F2}, G^{F1F2} ≅ G^{F1} × G^{F2}. (b) Show that hom(F, G) is the number of loops in G^F. (c) Prove that if adjacency is symmetric both in G and in F, then it is also symmetric in G^F (Lovász [1967]).
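For part (b) of Exercise 5.42, a brute-force check on a small pair of digraphs may be instructive; the sketch below (ours; representations assumed) builds G^F explicitly and compares its loop count with a direct homomorphism count.

```python
from itertools import product

def homs(F_V, F_E, G_V, G_E):
    """Brute-force homomorphism count for looped-simple digraphs."""
    count = 0
    for img in product(G_V, repeat=len(F_V)):
        phi = dict(zip(F_V, img))
        if all((phi[u], phi[v]) in G_E for (u, v) in F_E):
            count += 1
    return count

def exp_graph(F_V, F_E, G_V, G_E):
    """The digraph G^F of Exercise 5.42: nodes are all maps
    V(F) -> V(G); (phi, psi) is an edge iff (phi(u), psi(v)) in E(G)
    for every (u, v) in E(F)."""
    V = [tuple(img) for img in product(G_V, repeat=len(F_V))]
    idx = {v: i for i, v in enumerate(F_V)}
    E = {(phi, psi) for phi in V for psi in V
         if all((phi[idx[u]], psi[idx[v]]) in G_E for (u, v) in F_E)}
    return V, E

# part (b): hom(F, G) equals the number of loops in G^F
F_V, F_E = [0, 1], {(0, 1)}                      # a single directed edge
G_V, G_E = [0, 1, 2], {(0, 1), (1, 2), (2, 0)}   # a directed triangle
V, E = exp_graph(F_V, F_E, G_V, G_E)
loops = sum((phi, phi) in E for phi in V)
assert loops == homs(F_V, F_E, G_V, G_E) == 3
```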
5.5. Independence of homomorphism functions

How independent are homomorphism functions hom(F, .) (in an algebraic sense)? We know that hom(F1F2, G) = hom(F1, G)hom(F2, G) for two (unlabeled) graphs F1 and F2; is this the only identity relating these functions? We start with excluding linear relations. For a set of (non-isomorphic) simple graphs A = {F1, . . . , Fm}, we define the matrix

M^A_hom = (hom(Fi, Fj))_{i,j=1}^{m}.
The matrices M^A_inj and M^A_surj are defined analogously. Finally, we also define M^A_aut as the matrix with aut(Fi) = surj(Fi, Fi) = inj(Fi, Fi) in the i-th entry of the diagonal and 0 outside the diagonal.

Clearly, M^A_aut is a diagonal matrix; if we order the graphs Fi according to increasing number of edges (and arbitrarily for graphs with the same number of edges), then the matrices M^A_inj and M^A_surj become triangular. All diagonal entries are positive in each case. Hence the matrices M^A_aut, M^A_inj and M^A_surj are nonsingular. With M^A_hom the situation is more complicated: it may be singular (Exercise 5.46). However, we have the following simple but useful fact, observed by Borgs, Chayes, Kahn and Lovász [2012]:
Proposition 5.43. Let A be a family of simple graphs closed under surjective homomorphisms. Then M^A_hom is nonsingular. In particular, this holds if A consists of all graphs with at most k nodes, or at most k edges, for some k ≥ 0.

Proof. Under the conditions of the Proposition, the matrices introduced above are related by the following identity:

(5.50)  M^A_hom = M^A_surj (M^A_aut)^{−1} M^A_inj.
Indeed, every homomorphism can be decomposed as a surjective homomorphism followed by an (injective) embedding. By our assumption, the image F of the surjective homomorphism is in A. The decomposition is uniquely determined except for the automorphisms of F. This gives the equation

hom(Fi, Fj) = ∑_{k=1}^{m} surj(Fi, Fk) inj(Fk, Fj) / aut(Fk),
which is just (5.50) written out in coordinates. It follows that M^A_hom is the product of three nonsingular matrices, and hence it is also nonsingular.

If A is just an arbitrary set of simple graphs, we can still create a nonsingular matrix related to M^A_hom (Erdős, Lovász and Spencer [1979]).

Proposition 5.44. Let F1, . . . , Fk be nonisomorphic simple graphs.
(a) Let Hi be obtained from Fi by weighting its nodes, and suppose that all the weights used are algebraically independent. Then the matrix [hom(Fi, Hj)]_{i,j=1}^k is nonsingular.
(b) There are simple graphs G1, . . . , Gk such that the matrix [hom(Fi, Gj)]_{i,j=1}^k is nonsingular.
(c) If F1, . . . , Fk have no isolated nodes, then there are simple graphs G1, . . . , Gk such that the matrix [t(Fi, Gj)]_{i,j=1}^k is nonsingular.

We could use nodeweights in (a) chosen randomly and independently from the uniform distribution on [0, 1] (or from any other atomless distribution); the matrix will be nonsingular with probability 1.
Proof. (a) Considering the node weights as variables, the determinant of the matrix [hom(Fi, Hj)]_{i,j=1}^k is a polynomial p with integral coefficients. The multilinear part of p is just the determinant of [inj(Fi, Hj)]_{i,j=1}^k, which is non-zero, since this matrix is upper triangular and the diagonal entries are nonzero polynomials. Hence p is not the zero polynomial, which shows that for an algebraically independent substitution it does not vanish.

(b) Instead of algebraically independent weights, we can also substitute appropriate positive integers in p to get a nonsingular matrix [hom(Fi, Hj)]_{i,j=1}^k, since a nonzero polynomial cannot vanish for all positive integer substitutions. For a graph Hj and a node v ∈ V(Hj) with weight mv, we replace v by mv twin copies of weight 1. Let Gj be the graph obtained this way; then hom(Fi, Gj) = hom(Fi, Hj) for all i, and hence [hom(Fi, Gj)]_{i,j=1}^k = [hom(Fi, Hj)]_{i,j=1}^k is nonsingular.

(c) Let n = max_i v(Fi), and let us add n − v(Fi) isolated nodes to every Fi. The resulting graphs Fi′ are non-isomorphic, and hence there are simple graphs G1, . . . , Gk such that the matrix [hom(Fi′, Gj)]_{i,j=1}^k is nonsingular. Since hom(Fi′, Gj) = v(Gj)^n t(Fi′, Gj), we can scale the columns and get that the matrix [t(Fi′, Gj)]_{i,j=1}^k is nonsingular. Since clearly t(Fi, Gj) = t(Fi′, Gj), this proves the proposition.

The following corollary of these constructions goes back to Whitney [1932]. We have seen that the homomorphism functions satisfy the multiplicativity relations hom(F1F2, G) = hom(F1, G)hom(F2, G) (where F1F2 denotes disjoint union). Is there any other algebraic relation between them? Using multiplicativity, we can turn any algebraic relation into a linear relation, so the question is: are the graph parameters hom(F, .) linearly independent (in the sense that any finite number of them are)? Thus (b) above implies:

Corollary 5.45. The simple graph parameters hom(F, .) (where F ranges over simple graphs) are linearly independent. Equivalently, the simple graph parameters hom(F, .) (where F ranges over connected simple graphs) are algebraically independent.

What about non-algebraic relations? Such relations sound unlikely, and in fact it can be proved (Erdős, Lovász and Spencer [1979]) that they don't exist. To be more precise, for any finite set of distinct connected graphs A = {F1, . . . , Fk}, if we construct the set T(A) of points (t(F1, G), . . . , t(Fk, G)) ∈ R^k, where G ranges over all finite graphs, then the closure of T(A) has an internal point. We will talk more about these sets T(A) in Chapter 16.

Exercise 5.46. Show by an example that M^A_hom may be singular.

Exercise 5.47. Prove a version of part (a) of Proposition 5.44 in which the edges are weighted (instead of the nodes).

Exercise 5.48. Find an upper bound on the number of nodes in the graphs Gi in parts (b) and (c) of Proposition 5.44.

Exercise 5.49. For every m ≥ 1, construct a family A of m simple graphs such that the matrix M^A_hom is the identity matrix.

Exercise 5.50. For every m ≥ 1 there exist simple graphs F1, . . . , Fm such that for every integer vector a ∈ N^m there is a simple graph G such that hom(Fi, G) = ai for all i ∈ [m].
Exercise 5.51. Let H1, . . . , Hm be non-isomorphic simple graphs. Prove that there are no linear relations between the graph parameters hom(., Hi).

Exercise 5.52. (a) Let H1, . . . , Hm be non-isomorphic simple connected graphs. Prove that there are no linear relations between the graph parameters hom(., Hi), even when they are restricted to connected graphs. (b) Show that this is no longer true if we don't assume the connectivity of the Hi.

Exercise 5.53. (a) Let H1, . . . , Hm be simple nonisomorphic connected nonbipartite graphs. Prove that there is a simple connected graph F such that the homomorphism numbers hom(F, Hi) are distinct. (b) Show that for every simple graph F, at least two of the numbers hom(F, C6), hom(F, K2) and hom(F, K3 ∪ K3) are equal.
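Returning for a moment to Proposition 5.43: identity (5.50) can be verified numerically for a small family A; the sketch below (ours, assuming numpy) does so for all simple graphs with at most 3 nodes, which form a family closed under surjective homomorphisms.

```python
from itertools import product
import numpy as np

# isomorphism types of all simple graphs with at most 3 nodes
A = [
    (1, set()),
    (2, set()), (2, {(0, 1)}),
    (3, set()), (3, {(0, 1)}),
    (3, {(0, 1), (1, 2)}), (3, {(0, 1), (1, 2), (0, 2)}),
]

def maps(F, G, kind):
    """Count homomorphisms F -> G of the given kind ('hom'/'inj'/'surj');
    'surj' means surjective on both the nodes and the edges of G."""
    nF, EF = F
    nG, EG = G
    total = 0
    for phi in product(range(nG), repeat=nF):
        if not all(tuple(sorted((phi[u], phi[v]))) in EG for (u, v) in EF):
            continue
        if kind == 'inj' and len(set(phi)) != nF:
            continue
        if kind == 'surj':
            image_edges = {tuple(sorted((phi[u], phi[v]))) for (u, v) in EF}
            if set(phi) != set(range(nG)) or image_edges != EG:
                continue
        total += 1
    return total

def mat(kind):
    return np.array([[maps(F, G, kind) for G in A] for F in A], dtype=float)

M_hom, M_inj, M_surj = mat('hom'), mat('inj'), mat('surj')
D_aut = np.diag([maps(F, F, 'inj') for F in A])   # aut(F) = inj(F, F)

# identity (5.50), which makes M_hom a product of nonsingular matrices
assert np.allclose(M_hom, M_surj @ np.linalg.inv(D_aut) @ M_inj)
print(round(np.linalg.det(M_hom)))                # nonzero
```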
5.6. Characterizing homomorphism numbers

In the previous sections (e.g. in the proof of Theorem 5.34 and related results) our key tool was to associate, with every graph G, the graph parameter hom(., G). What else can be said about these graph parameters? It turns out that they have an interesting characterization, which will play an important role throughout this book. There are different versions of this characterization, of which we state a sample. Multigraph parameters of the form hom(., H), where H is a weighted graph, were characterized by Freedman, Lovász and Schrijver [2007].

Theorem 5.54. Let f be a graph parameter defined on multigraphs without loops. Then f is equal to hom(., H) for some weighted graph H on q nodes if and only if it is reflection positive, f(K0) = 1, and r(f, k) ≤ q^k for all k ≥ 0.

Let us note that the condition for k = 0 says that r(f, 0) ≤ 1, which implies that f is multiplicative (Exercise 4.4). In terms of statistical physics, this theorem can be viewed as a characterization of partition functions of vertex coloring models.

Theorem 5.54 implies that those graph parameters that can be expressed as homomorphism numbers into fixed weighted graphs are all reflection positive and have exponentially bounded connection rank. It may be instructive to see directly why this is so for the number of nowhere-zero flows.

Example 5.55. For two k-labeled graphs G1 and G2, the value flo(G1G2, q) can be computed by a simple formula provided we know, for all a1, . . . , ak ∈ Z_q, the number flo(Gi; a1, . . . , ak) of nowhere-zero q-flows in Gi with "surplus" ai at each node i ∈ [k]; then we have a formula similar to (4.6), except that the summation ranges over a ∈ Z_q^k instead of Z^k:

(5.51)  flo(G1G2, q) = ∑_a flo(G1; a1, . . . , ak) flo(G2; −a1, . . . , −ak) = ∑_a flo(G1; a1, . . . , ak) flo(G2; a1, . . . , ak).
From this, we see that M(flo, k) is positive semidefinite and has rank at most q^k. Schrijver [2009] gave the following characterization of graph parameters representable as homomorphism functions into weighted graphs with nodeweights 1 and complex edgeweights. Recalling the Möbius inverse on the partition lattice (4.3), we can state the result as follows:
Theorem 5.56. Let f be a complex valued graph parameter defined on looped multigraphs. Then f = hom(., H) for some edge-weighted graph H on q nodes with complex edgeweights if and only if f is multiplicative, f(K1) = q, and f^⇓(G) = 0 for every graph G with more than q nodes.

Using this theorem, Schrijver gave a real-valued version, which is more similar to Theorem 5.54.

Theorem 5.57. Let f be a real valued graph parameter defined on looped multigraphs. Then f = hom(., H) for some edge-weighted graph H with real edgeweights if and only if f is multiplicative and, for every integer k ≥ 0, the multilabeled connection matrix M^{mult}(f, k) is positive semidefinite.

Every graph parameter f defined on looped-simple graphs can be extended to looped multigraphs so that it is invariant under adding parallel edges. Every homomorphism function hom(., H) where all edgeweights are 0 or 1 defines such a multigraph parameter. Conversely, if f = hom(., H) (where H is a weighted graph) is invariant under adding parallel edges, then every edge of H must have weight 0 or 1 (Exercise 5.66). In particular, if all nodeweights of H are 1, then H can be viewed as a looped-simple graph itself. Hence Theorem 5.57 implies the following characterization of homomorphism numbers into looped-simple graphs, as noticed by Lovász and Schrijver [2010, 2009]:

Corollary 5.58. Let f be a graph parameter defined on looped-simple graphs. Then f = hom(., H) for some looped-simple graph H if and only if f is multiplicative and, for every integer k ≥ 0, the connection matrix M^{mult}(f, k) is positive semidefinite.

Note that in Theorem 5.57 and Corollary 5.58 no bound on the connection rank is assumed; in fact (somewhat surprisingly), it follows from the multiplicativity and reflection positivity conditions that f has finite connection rank, and r(f, k) ≤ f(K1)^k for all k. Furthermore, in Corollary 5.58 it also follows from the conditions that the values of f are integers.

Next, we state an analogous (dual) characterization of graph parameters of the form hom(F, .), defined on looped-simple graphs, where F is also a looped-simple graph (Lovász and Schrijver [2010]). To state the result, we need some definitions. Recall the notion of H-colored graphs and their products from the proof of Lemma 5.38; we need only the rather trivial version where H = K_q^◦ is a fully looped complete graph. We define dual connection matrices N(f, q) of a graph parameter f: the rows and columns are indexed by K_q^◦-colored graphs, and the entry in row G1 and column G2 is f(G1 × G2).

Theorem 5.59. Let f be a graph parameter defined on looped-simple graphs. Then f = hom(F, .) for some looped-simple graph F if and only if f is multiplicative over the direct product, and for each k ≥ 1, the dual connection matrix N(f, k) is positive semidefinite.

It is interesting to note that "primal" connection matrices of these "dual" homomorphism numbers hom(F, .) also have finite rank (see Exercise 5.67). However, no characterization of homomorphism numbers in terms of these "primal" connection matrices is known.
5.6.1. Randomly weighted graphs. The last result to be presented in this line is a characterization of multiplicative and reflection positive multigraph parameters with finite connection rank (Lovász and Szegedy [2012c]). To state the result, we have to generalize the notion of weighted graphs. A randomly weighted graph is a finite graph H (which we may assume to be a looped complete graph) in which the nodes are weighted with positive real numbers (just like in the case of ordinary weighted graphs) and each edge ij is weighted by a random variable Bij taking values from a finite set of reals. Ordinary weighted graphs can be regarded as randomly weighted graphs in which the edgeweights are random variables concentrated on a single value.

To define homomorphism numbers into a randomly weighted graph takes a little care. A first idea is to define it as the expectation of hom(F, H), where H is the weighted graph whose edgeweights are generated randomly and independently from the corresponding distributions. However, this quantity would not be multiplicative. We could start with taking the expectation separately for each edge; this would then give nothing new relative to the homomorphism numbers into weighted graphs. We therefore take a middle ground:

(5.52)  hom(F, H) = Σ_{φ: V(F)→V(H)} ∏_{i∈V(F)} α_{φ(i)} ∏_{ij∈E(Fsimp)} E(B_{φ(i)φ(j)}^{Fij}),
where Fij is the multiplicity of the edge ij in F. This quantity is multiplicative, and it specializes to the previously defined homomorphism number when the edge weights are deterministic. We note two special cases. If F is simple, then we could take the expectation all the way in; in other words, homomorphisms into randomly weighted graphs give no new simple graph parameters. On the other hand, if we consider inj(F, H) (restricting the summation in (5.52) to injections), then we can take the expectation all the way out, i.e., inj(F, H) = E(inj(F, H)), where on the right H stands for the weighted graph with randomly generated edgeweights.

Example 5.60. Consider the multigraph parameter f(G) = p^{e(Gsimp)}, where 0 < p < 1 is fixed. It is not hard to see that this is reflection positive and its connection rank is 2^{k(k−1)/2}. We can characterize it as hom(G, K1°[p]), where K1°[p] is the randomly weighted graph on a single node with a loop, where the loop is decorated by the probability distribution on {0, 1} in which 1 has probability p. We can also characterize it as the expectation of t_inj(G, G(n, p)), where n ≥ v(G). This parameter is multiplicative, reflection positive, and its connection rank is r(f, k) = 2^{k(k−1)/2}, which is finite for every k, but has superexponential growth.

With this generalized notion of homomorphism numbers, we are able to state the theorem announced above:

Theorem 5.61. A multigraph parameter f is equal to hom(., H) for some randomly weighted graph H if and only if it is multiplicative, reflection positive and r(f, 2) is finite.

It would not be enough to assume that r(f, 1) is finite instead of r(f, 2) (see Exercise 5.65). While in Theorem 5.61 we don't have to assume anything about the higher connection ranks, it does follow that they are all finite. In fact, we have the following "Theorem of Alternatives":
Supplement 5.62. Let f be a multiplicative and reflection positive parameter defined on multigraphs without loops. Then one of three alternatives must occur:
(i) r(f, k) is infinite for all k ≥ 2;
(ii) r(f, k) is finite for all k, and log r(f, k) = Θ(k);
(iii) r(f, k) is finite for all k, and log r(f, k) = Θ(k²).

It follows that alternative (ii) obtains iff f = hom(., H) for some weighted graph H, and alternative (iii) obtains when f = hom(., H) for some randomly weighted graph H in which at least one edgeweight has a proper distribution. It is possible to give a more precise description of the asymptotic behavior of log r(hom(., H), k), but we have to refer to the paper of Lovász and Szegedy [2012c] for details. Let us note that no such conclusion can be drawn without assuming reflection positivity. For example, the chromatic polynomial chr(., x) satisfies log r(chr(., x), k) = Θ(k log k).

Remark 5.63. Several further improvements, versions and extensions of these results have been obtained, extending them to directed graphs, hypergraphs, semigroups, and indeed, to all categories satisfying reasonable conditions. In this book, related characterizations will be described for homomorphisms into graphons and random graphons (Theorem 11.52 and Proposition 14.60), morphisms in categories (Theorem 23.16), and edge coloring models (Theorem 23.5). One would wish to derive all of these from a single "Master Theorem"; alas, this has not yet been found.

The least appealing feature of these theorems is that the necessary and sufficient condition involves infinite matrices, and in most cases infinitely many of them. While this is clearly unavoidable in a sense (the condition must involve the value of the parameter on all graphs), one can formulate conditions that involve only submatrices with a simpler structure. For example, in Theorem 5.61, it suffices to consider fully and simply labeled graphs, fully labeled edgeless graphs, and fully labeled bonds (cf. also the proof of Theorem 5.57 given in Section 6.6).

5.6.2. About the proofs. The proofs of the theorems above follow at least three different lines. To be more precise, the necessity of the conditions is easy to prove; below we prove the "easy" direction of Theorem 5.54, and the others follow by essentially the same argument. The sufficiency parts will be postponed until some further techniques have been developed:

—The completion of the proof of Theorem 5.54 will be given in Section 6.2.2, after the development of graph algebras. (These algebras will be useful to study other related properties of homomorphism functions, and the technique will also be applied in extremal graph theory.) Corollary 5.58 and its dual, Theorem 5.59, can be proved by a similar technique. This technique extends to a much more general setting, to categories, as we will sketch in Section 23.4.

—The proofs of Theorems 5.56 and 5.57 will be described in Section 6.6, where a general connection to the Nullstellensatz and invariant theory will be developed. This method extends to edge coloring models (see Section 23.2).

—The proof of Theorem 5.61 will use a lot of the analytic machinery to be developed in Part 3 of the book, and will be sketched at the end of that part (Section 17.1.4). For the details of this proof, and for the proof of Supplement 5.62, we refer to Lovász and Szegedy [2012c].

We conclude this section by proving the "easy" direction in Theorem 5.54:
Proposition 5.64. For every weighted graph H, the graph parameter hom(., H) is reflection positive and r(hom(., H), k) ≤ v(H)^k.

Proof. For any two k-labeled graphs F1 and F2 and φ : [k] → V(H), we have

(5.53)  homφ(F1F2, H) = homφ(F1, H) homφ(F2, H)

(recall the definition of homφ from (5.10)). Let F = [[F1F2]]; then the decomposition

hom(F, H) = Σ_{φ: [k]→V(H)} αφ homφ(F, H)

writes the matrix M(hom(., H), k) as the sum of v(H)^k matrices, one for each mapping φ : [k] → V(H); (5.53) shows that these matrices are positive semidefinite and have rank 1. This implies Proposition 5.64. □

Exercise 5.65. For a multigraph G on [n], let X1, . . . , Xn be random points on the unit circle, and let f(G) denote the probability that Xi · Xj ≥ 0 for every edge ij of G. Prove that f is reflection positive, r(f, 0) = r(f, 1) = 1, but r(f, 2) = ∞.

Exercise 5.66. Let H be a weighted graph for which the looped-multigraph parameter hom(., H) is invariant under adding parallel edges. Prove that all edgeweights of H are 0 or 1.

Exercise 5.67. Prove that the connection rank r(hom(F, .), k) is bounded by (k + 2)^{v(F)}.
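The "easy" direction is also easy to see numerically. The following sketch (illustrative Python; whom, glue and the (n, edges) encoding of k-labeled multigraphs, with labels on nodes 0, . . . , k−1, are our own ad hoc choices) assembles a small submatrix of M(hom(., H), 2) for an arbitrary 2-node weighted graph H and confirms that it is positive semidefinite:

    import numpy as np
    from itertools import product

    def whom(n, edges, alpha, beta):
        # hom(G, H) for a weighted graph H with node weights alpha and
        # symmetric edge weights beta, computed by brute force.
        q = len(alpha)
        total = 0.0
        for phi in product(range(q), repeat=n):
            w = np.prod([alpha[i] for i in phi])
            for u, v in edges:
                w *= beta[phi[u]][phi[v]]
            total += w
        return total

    def glue(k, g1, g2):
        # Gluing product of two k-labeled graphs (n, edges); the labels
        # sit on nodes 0..k-1, which the two graphs share.
        n1, e1 = g1
        n2, e2 = g2
        shift = lambda x: x if x < k else x + n1 - k
        return n1 + n2 - k, e1 + [(shift(u), shift(v)) for u, v in e2]

    alpha = [1.0, 2.0]
    beta = [[0.5, 1.0], [1.0, 0.3]]     # an arbitrary 2-node weighted graph H
    k = 2
    graphs = [(2, [(0, 1)]),                           # the labeled edge
              (3, [(0, 2), (2, 1)]),                   # a path through one extra node
              (4, [(0, 2), (2, 1), (0, 3), (3, 1)]),   # two parallel paths
              (2, [])]                                 # the edgeless graph O2
    M = np.array([[whom(*glue(k, g1, g2), alpha, beta) for g2 in graphs]
                  for g1 in graphs])
    print(np.linalg.eigvalsh(M))   # nonnegative (up to rounding): M is PSD

By Proposition 5.64 the full (infinite) connection matrix is positive semidefinite and has rank at most v(H)^k = 4 here; any finite submatrix inherits both properties.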
5.7. The structure of the homomorphism set

We have discussed the existence of homomorphisms between two graphs F and G, i.e., the emptiness or non-emptiness of the homomorphism set Hom(F, G). We considered the size of this set (in fact, much of this book turns around this number). This set has further structure, which is quite interesting and which can be exploited to obtain combinatorial results about graphs. We only give a glimpse of these questions.

5.7.1. The graph of homomorphisms. Let F and G be two simple graphs. The set Hom(F, G) can be endowed with a graph structure. Brightwell and Winkler [2004] define a graph Hom(F, G) by connecting two nodes (homomorphisms φ, ψ : F → G) if they differ on only one node of F.

Example 5.68 (Linegraph). If F = K2, then we get a version of the line-graph of G: every edge of G will be represented by two nodes in Hom(K2, G), corresponding to the two orientations of the edge, and two nodes (oriented edges ij→ and uv→) will be connected if either i = u or j = v. (The ordinary linegraph is obtained by merging the two copies of each edge.)

Example 5.69 (Colorings). Let the target graph G be a complete graph Kq. In this case, nodes are legitimate q-colorings of F, and two of them are adjacent if only one node of F is recolored to get one coloring from the other. This construction is important when analyzing the "heat bath" or "Glauber dynamics" Markov chain in statistical physics, which corresponds to a random walk on this graph. We will need this Markov chain in Section 20.1.2.
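These recoloring walks are easy to simulate. The sketch below (illustrative Python; encodings ours) builds the graph Hom(G, Kq) of proper q-colorings, with two colorings adjacent when they differ at a single node, and tests connectivity. For C5, whose maximum degree is d = 2, the graph is connected for q = 4 = d + 2 (in line with Exercise 5.73 below) but not for q = 3, where single-node recolorings preserve the winding of the coloring around the odd cycle:

    from itertools import product

    def recoloring_connected(edges, n, q):
        # Build Hom(G, K_q): nodes are proper q-colorings of G, two colorings
        # adjacent if they differ at exactly one node; test connectivity by DFS.
        cols = [c for c in product(range(q), repeat=n)
                if all(c[u] != c[v] for u, v in edges)]
        index = {c: i for i, c in enumerate(cols)}
        seen, stack = {0}, [0]
        while stack:
            c = cols[stack.pop()]
            for v in range(n):
                for new in range(q):
                    d = c[:v] + (new,) + c[v + 1:]
                    j = index.get(d)
                    if d != c and j is not None and j not in seen:
                        seen.add(j)
                        stack.append(j)
        return len(seen) == len(cols)

    c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # maximum degree d = 2
    print(recoloring_connected(c5, 5, 4))   # True  (q = d + 2, cf. Exercise 5.73)
    print(recoloring_connected(c5, 5, 3))   # False (winding around C5 is invariant)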
Brightwell and Winkler relate properties of the Hom graph to a number of important issues in statistical physics, like long-range actions and phase transitions. These would be too difficult to state here, but a corollary of their main result is worth formulating:

Theorem 5.70. Suppose that G is a graph such that the graph Hom(F, G) is connected for every connected graph F with maximum degree d. Then the chromatic number of G is at least d/2 + 1.

(They conjecture that d/2 + 1 can be replaced by d.)

5.7.2. The complex of homomorphisms. The set Hom(F, G) can also be equipped with a topological structure. We say that a set of homomorphisms φ1, . . . , φk : F → G is a cluster if for every edge uv ∈ E(F) and any 1 ≤ i < j ≤ k, we have φi(u)φj(v) ∈ E(G). It is clear that these clusters form a simplicial complex Hom(F, G) (i.e., they are closed under taking subsets). It is quite surprising that topological properties of this complex have graph-theoretic consequences.

What is important about this construction is that it is "functorial", which means that every homomorphism ψ : G1 → G2 induces a simplicial (and hence continuous) map ψ̂ : Hom(F, G1) → Hom(F, G2) in a canonical way: for every homomorphism φ : F → G1, we define ψ̂(φ) = φψ. It is trivial that this map from V(Hom(F, G1)) = Hom(F, G1) to V(Hom(F, G2)) = Hom(F, G2) maps clusters onto clusters. We also note that the automorphism group of F acts on Hom(F, G): if α is an automorphism of F, then α̌ : φ ↦ αφ is an automorphism of Hom(F, G).

We quote two theorems relating properties of these topological spaces to colorability of the graph; they are important tools in determining the chromatic number of certain graph families. (See Kozlov [2008] and Matoušek [2003] for detailed treatments of this topic.) The first is a re-statement in this language of a result of Lovász [1978].

Theorem 5.71. If Hom(K2, G) is k-connected as a topological space, then the chromatic number of G is at least k + 3.

The second theorem is due to Babson and Kozlov [2003, 2006, 2007].

Theorem 5.72. If Hom(C2r+1, G) is k-connected as a topological space for some r ≥ 1, then the chromatic number of G is at least k + 4.

These results suggest the more general assertion that if Hom(F, G) is k-connected as a topological space, then χ(G) ≥ k + χ(F) + 1. This is, however, false, as shown by Hoory and Linial [2005]. But the relationship between the chromatic numbers of F and G and the topology of the complex Hom(F, G) is mostly unexplored.

Exercise 5.73. Prove that if G is a simple graph with maximum degree d, then for all q ≥ d + 2 the graph Hom(G, Kq) is connected, i.e., we can transform any q-coloring of G into any other, changing the color of one node at a time, going through legitimate q-colorings.

Exercise 5.74. Prove that the graph Hom(F, G) is connected if and only if the simplicial complex Hom(F, G) is connected.

Exercise 5.75. Prove that Hom(., .) is a contravariant functor in its first variable: every homomorphism ξ : F1 → F2 induces a simplicial map ξ̌ : Hom(F2, G) → Hom(F1, G). Analogously, Hom(., .) is a covariant functor in its second variable.
Exercise 5.76. Let α be an automorphism of F which has an orbit on V(F) that is not a stable set. Assume that G has no loops. Then the simplicial map α̌ is fixed-point-free on the geometric realization of Hom(F, G).

Exercise 5.77. Prove that Hom(K2, Kp) is homotopy equivalent to the (p − 2)-dimensional sphere.
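For small p, Exercise 5.77 can be probed numerically via the Euler characteristic, a homotopy invariant: χ(S^{p−2}) = 1 + (−1)^p, so one should get 0 for p = 3 and 2 for p = 4. The sketch below (illustrative Python; it brute-forces over all subsets of homomorphisms, so it is feasible only for very small p) enumerates the clusters of Hom(K2, Kp) straight from the definition:

    from itertools import combinations, permutations

    def hom_complex_euler(p):
        # Euler characteristic of the simplicial complex Hom(K2, K_p):
        # vertices are the p(p-1) homomorphisms (a, b) with a != b, and a set
        # of them is a cluster iff f(u) != g(v) for all members f, g (both orders).
        homs = list(permutations(range(p), 2))
        chi = 0
        for size in range(1, len(homs) + 1):
            for cl in combinations(homs, size):
                if all(f[0] != g[1] for f in cl for g in cl):
                    chi += (-1) ** (size - 1)
        return chi

    for p in (3, 4):
        print(p, hom_complex_euler(p))   # 0 and 2, as chi(S^{p-2}) = 1 + (-1)^p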
CHAPTER 6
Graph algebras and homomorphism functions

6.1. Algebras of quantum graphs

A quantum graph is defined as a formal linear combination of a finite number of multigraphs with real coefficients. To be pedantic, let's add that these coefficients can be zero, but terms with zero coefficient can be deleted without changing the quantum graph. Those graphs that occur in a quantum graph x with non-zero coefficient are called the constituents of x. Quantum graphs form an infinite dimensional linear space, which we denote by Q0.

Every graph parameter f can be extended to quantum graphs linearly: if x = Σ_{i=1}^n λi Fi, then f(x) = Σ_{i=1}^n λi f(Fi). In particular, the definition of hom(F, G) and t(F, G) extends to quantum graphs bilinearly: if x = Σ_{i=1}^n αi Fi and y = Σ_{j=1}^m βj Gj, then we define

hom(x, y) = Σ_{i=1}^n Σ_{j=1}^m αi βj hom(Fi, Gj),
and similarly for t(x, y). (Most of the time we will use linearity in the first argument only.)

Quantum graphs are useful in expressing various combinatorial situations. For example, for any signed graph F, we consider the quantum graph

(6.1)  x = Σ_{F′} (−1)^{e(F′)−|E+|} F′,
where the summation extends over all simple graphs F′ such that V(F′) = V(F) and E+ ⊆ E(F′) ⊆ E+ ∪ E−. By inclusion-exclusion we see that for any simple graph G, hom(F, G) = hom(x, G) is the number of maps V(F) → V(G) that map positive edges onto edges and negative edges onto non-edges. The equation hom(F, G) = hom(x, G) remains valid if G is a weighted graph (one way to see it is to expand the parentheses in definition (5.9)). Due to these nice formulas, we will denote the quantum graph x by F; this will not cause any confusion.

The relationships between homomorphism numbers and injective homomorphism numbers, equations (5.16) and (5.18), can be expressed as follows: For every graph G, let ZG = Σ_P G/P and MG = Σ_P µP G/P, where P ranges over all partitions of V(G). Here the quotient graph G/P is defined by merging every class into a single node, and adding up the multiplicities of pre-images of an edge to get its multiplicity in G/P. Then

(6.2)  hom(F, G) = inj(ZF, G)  and  inj(F, G) = hom(MF, G).
More generally, for any graph parameter f, we have f(MG) = f⇓(G). The operators Z and M extend to linear operators Z, M : Q0 → Q0. Clearly, they are
inverses of each other: ZM = MZ = id_{Q0}. (In Appendix A.1 these operators are discussed for general lattices.) We will see that other important facts, like the contraction/deletion relation of the chromatic polynomial (4.5), can also be conveniently expressed by quantum graphs (cf. Section 6.3).

For any k ≥ 0, a k-labeled quantum graph is a formal linear combination of k-labeled graphs. We say that a k-labeled quantum graph is simple [loopless] if all its constituents are simple [loopless].

6.1.1. The gluing algebra. Let Qk denote the (infinite dimensional) vector space of k-labeled quantum graphs. We can turn Qk into an algebra by using the gluing product F1F2 introduced in Section 4.2 as the product of two generators, and then extending this multiplication to the other elements of the algebra by linearity. Clearly Qk is associative and commutative. The fully labeled graph Ok on [k] with no edges is the multiplicative unit in Qk. Every graph parameter f can be extended linearly to quantum graphs, and defines an inner product on Qk by

(6.3)  ⟨x, y⟩ = f(xy).

This inner product has nice properties; for example, it satisfies the Frobenius identity

(6.4)  ⟨x, yz⟩ = ⟨xy, z⟩.
Let Nk(f) denote the kernel (annihilator) of this inner product, i.e., Nk(f) = {x ∈ Qk : f(xy) = 0 for all y ∈ Qk}. Note that it would be equivalent to require this condition for (ordinary) k-labeled graphs only in place of y. Sometimes we write this condition as x ≡ 0 (mod f), and then use x ≡ y (mod f) if x − y ≡ 0 (mod f). We define the factor algebra Qk/f = Qk/Nk(f). Formula (6.3) still defines an inner product on Qk/f, and identity (6.4) remains valid. While the algebra Qk is infinite dimensional, the factor algebra Qk/f is finite dimensional for many interesting graph parameters f.

Proposition 6.1. The dimension of Qk/f is equal to the rank of the connection matrix M(f, k). The inner product (6.3) is positive semidefinite on Qk if and only if M(f, k) is positive semidefinite.

So if the parameter f is reflection positive, then the inner product is positive semidefinite on every Qk; equivalently, it is positive definite on Qk/f. It follows that the examples in Section 4.3 provide several graph parameters for which the algebras Qk/f have finite dimension. This means that in these cases our graph algebra Qk/f is a Frobenius algebra (see Kock [2003]). For a reflection positive parameter, the inner product is positive definite on Qk/f, so it turns Qk/f into an inner product space.

Example 6.2 (Number of perfect matchings). Consider the number pm(G) of perfect matchings in the graph G. It is a basic property of this value that subdividing an edge by two nodes does not change it. This can be expressed as

K2•• ≡ P4••  (mod pm),

where K2•• is the edge with both endnodes labeled and P4•• is the path of length three between the two labeled nodes (the original displays this identity in pictograms, with the labeled nodes drawn black).
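The congruence in Example 6.2 can be tested by machine. The following sketch (illustrative Python; pm is computed by brute force, and 2-labeled graphs are encoded as pairs (n, edges) with the labels on nodes 0 and 1) glues the labeled edge and its twice-subdivided version onto random 2-labeled multigraphs F and checks that the two numbers of perfect matchings always agree, i.e. that pm(K2••F) = pm(P4••F):

    from itertools import combinations
    import random

    def pm(n, edges):
        # Number of perfect matchings of a multigraph (n, edges), by brute force.
        if n % 2:
            return 0
        count = 0
        for sub in combinations(range(len(edges)), n // 2):
            ends = [x for i in sub for x in edges[i]]
            if len(set(ends)) == n:
                count += 1
        return count

    def glue2(g1, g2):
        # Gluing product of 2-labeled graphs (n, edges), labels on nodes 0 and 1.
        n1, e1 = g1
        n2, e2 = g2
        shift = lambda x: x if x < 2 else x + n1 - 2
        return n1 + n2 - 2, e1 + [(shift(u), shift(v)) for u, v in e2]

    edge = (2, [(0, 1)])                         # K2**
    subdivided = (4, [(0, 2), (2, 3), (3, 1)])   # the edge subdivided by two nodes

    random.seed(1)
    for _ in range(20):
        n = random.randint(2, 6)
        e = [(random.randrange(n), random.randrange(n)) for _ in range(8)]
        F = (n, [(u, v) for u, v in e if u != v])
        assert pm(*glue2(edge, F)) == pm(*glue2(subdivided, F))
    print("K2** and its double subdivision agree modulo pm on all tests")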
Sometimes it will be convenient to put all k-labeled graphs into a single structure as follows. Recall the notion of partially labeled graphs from Section 4.2, and also the notion of their gluing product. Let QN denote the (infinite dimensional) vector space of formal linear combinations (with real coefficients) of partially labeled graphs. We can turn QN into an algebra by using the product G1G2 introduced above (gluing along the labeled nodes) as the product of two generators, and then extending this multiplication to the other elements linearly. Clearly QN is associative and commutative, and the empty graph is a unit element.

A graph parameter f defines an inner product on the whole space QN by (6.3), and we can consider the kernel N(f) = {x ∈ QN : ⟨x, y⟩ = 0 for all y ∈ QN} of this inner product. It is not hard to see that Nk(f) = Qk ∩ N(f). For every finite set S ⊆ N, the set of all formal linear combinations of S-labeled graphs forms a subalgebra QS of QN. We set QS/f = {x/f : x ∈ QS}. Clearly QS/f is a subalgebra of QN/f, and it is not hard to see that QS/f ≅ Q|S|/f. The graph with |S| nodes labeled by the elements of S and no edges, which we denote by OS, is a unit in the algebra QS.

6.1.2. The concatenation algebra. There is another algebra on these vector spaces. For two 2-multilabeled multigraphs F and G, we define their concatenation by identifying node 2 of F with node 1 of G, and unlabeling this merged node. We denote the resulting 2-labeled graph by F ∘ G. It is easy to check that this operation is associative (but not commutative). We extend this operation linearly over Q2. This algebra has a ∗ (conjugate) operation: for a 2-labeled graph F, we define F∗ by interchanging the two labels. Clearly (F ∘ G)∗ = G∗ ∘ F∗. We can also extend this linearly over Q2. Let f be a graph parameter. It is easy to see that if x ≡ 0 (mod f) then x∗ ≡ 0 (mod f), so the ∗ operator is well defined on elements of Q2/f. A further important property of concatenation is that for any three 2-labeled graphs F, G and H,

(F ∘ G)H ≅ F(H ∘ G∗),

and hence

(6.5)  f((x ∘ y)z) = f(x(z ∘ y∗)),
for any three elements x, y, z ∈ Q2 . It follows that if x ≡ 0 (mod f ) then x ◦ y ≡ 0 (mod f ) for every y ∈ Q2 and thus concatenation is well defined on the elements of Q2 /f . It is easy to see that (Q2 /f, ◦) is an associative (but not necessarily commutative) algebra. We can think of a 2-labeled graph as a graph having one labeled node on its left side and one on its right side. Then concatenation means that we identify the right labeled node of one graph with the left labeled node of another. This suggests a generalization: Instead of a single node, we consider graphs that have k labeled nodes on each side. Let’s say the labels are 1, . . . , k on both sides, so each label occurs twice, once on the left and once on the right. It is convenient to allow that one and the same node gets a left label and a right label. Such a graph will be called (k, k)-labeled, and we denote their set by Fk,k . We can define a multiplication on Fk,k , denoted by ◦, in which we identify each right labeled node of the first graph with the left labeled node of the second graph with the same label. We can take the space Qk,k of “quantum bi-labeled graphs”,
i.e., formal linear combinations of graphs in Fk,k. The graph Ok on [k] with no edges, with its nodes labeled 1, . . . , k from both sides, is a unit in the algebra. This algebra is associative, but not commutative. It has a "conjugate" operation, which we denote by ∗, of interchanging "left" and "right". This is related to multiplication through the identity (A ∘ B)∗ = B∗ ∘ A∗.

Given a graph parameter f defined on looped-multigraphs, we can define the inner product of two (k, k)-labeled graphs as before: we consider them as multilabeled graphs (where left label i is different from right label i), form their gluing product, and evaluate the parameter on the resulting multigraph. (We have to work with multilabeled graphs, since a node is allowed to have two labels. As a consequence, the gluing product can have loops.)

A further natural generalization involves graphs with possibly different numbers of labeled nodes on the left and on the right. Let Fk,m denote the set of multigraphs with k labeled nodes on the left and m labeled nodes on the right. We cannot form the product of any two graphs, but we can multiply a graph F ∈ Fk,m with a graph G ∈ Fm,n to get a graph F ∘ G ∈ Fk,n. So bi-labeled graphs form the morphisms of a category, in which the objects are the natural numbers. The star operation (interchanging left and right) maps Fk,m onto Fm,k. Any graph parameter f defines a scalar product on every Fk,m by ⟨F, G⟩ = f(FG), where FG is defined by identifying nodes with the same left-label as well as nodes with the same right-label in the disjoint union of F and G. Just as above, the operations ∘, ∗, and ⟨., .⟩ extend linearly to the linear spaces Qk,m of formal linear combinations of graphs in Fk,m. This leads us to semisimple categories and topological quantum field theory (see Witten [1988]), topics that are beyond the scope of this book.

6.1.3. Unlabeling. Having defined the graph algebras we need, we are going to describe the relationship between algebras of labeled graphs using different label sets. There is nothing terribly deep or surprising here; but it might serve as a warm-up, illustrating how combinatorial and algebraic constructions correspond to each other.

The unlabeling operator G ↦ [[G]]S extends to Q by linearity. We note that for any two partially labeled graphs G and H,

[[ [[G]]S H ]] ≅ [[ [[G]]S [[H]]S ]] ≅ [[ G [[H]]S ]],

and hence we get the identity

(6.6)  ⟨[[x]]S, y⟩ = ⟨[[x]]S, [[y]]S⟩ = ⟨x, [[y]]S⟩  (x, y ∈ Q).
By a similar argument we get that if S, T ⊂ N are finite sets, then

(6.7)  ⟨x, y⟩ = ⟨[[x]]S∩T, [[y]]S∩T⟩  (x ∈ QS, y ∈ QT).
One consequence of identity (6.6) is that if some x ∈ Q is congruent modulo f to some S-labeled quantum graph y ∈ QS, then such a y can be obtained by simply removing the labels outside S:

(6.8)  x − y ∈ N(f) ⟹ x − [[x]]S ∈ N(f).
Indeed, for any z ∈ Q, we have ⟨x − [[x]]S , z⟩ = ⟨x, z⟩ − ⟨[[x]]S , z⟩ = ⟨y, z⟩ − ⟨[[x]]S , z⟩ = ⟨y, [[z]]S ⟩ − ⟨x, [[z]]S ⟩ = ⟨y − x, [[z]]S ⟩ = 0.
As a special case, we get that [[x]]S ∈ N(f) for all x ∈ N(f). This implies that the operator x ↦ [[x]]S is defined on the factor algebra Q/f, and in fact it gives the orthogonal projection of Q/f to the subalgebra QS/f. Indeed, by (6.6),

⟨[[x]]S, x − [[x]]S⟩ = ⟨x, [[x − [[x]]S]]S⟩ = ⟨x, [[x]]S − [[x]]S⟩ = 0.

Another consequence of (6.8) is that for every x ∈ Q there is a unique smallest set S ⊂ N such that x ≡ [[x]]S (mod f).

For the rest of this section, we assume that f is multiplicative and normalized so that f(K1) = 1. (This latter condition is usually easily achieved by replacing f(G) by f(G)/f(K1)^{v(G)}.) One important consequence of this assumption is that deleting isolated nodes (labeled or unlabeled) from a graph G does not change f(G). This implies that it does not change G/f either. Indeed, let F denote the graph obtained from G by deleting some isolated nodes; then for every partially labeled graph H, the products FH and GH differ only in isolated nodes, and hence f(FH) = f(GH), showing that F/f = G/f. In particular, every graph with no edges has the same image in Q/f, which is the unit element of Q/f.

Lemma 6.3. Let f be a multiplicative and normalized graph parameter, and let S and T be finite subsets of N.
(a) If S ⊆ T, then QS/f has a natural embedding into QT/f.
(b) QS/f ∩ QT/f = QS∩T/f.
(c) If S ∩ T = ∅, then QSQT ≅ QS ⊗ QT and (QS/f)(QT/f) ≅ QS/f ⊗ QT/f.

Proof. (a) Every S-labeled graph G can be turned into a T-labeled graph G′ by adding |T \ S| new isolated nodes, labeled by the elements of T \ S. (This is equivalent to multiplying it by UT.) As remarked above, G − G′ ∈ N(f), and so G/f = G′/f.

(b) The containment ⊇ follows from (a). To prove the other direction, we consider any z ∈ QS/f ∩ QT/f. Then we have an x ∈ QS with x/f = z, and a y ∈ QT with y/f = z. So x − y = x − [[y]]T ∈ N(f), and so by (6.8), x − [[x]]T ∈ N(f). But we can write this as [[x]]T − [[x]]S ∈ N(f), and then by the same reasoning [[x]]T − [[ [[x]]T ]]S = [[x]]T − [[x]]T∩S ∈ N(f), showing that x − [[x]]T∩S ∈ N(f), and so z = x/f ∈ QS∩T/f.

(c) The first relation is trivial, since the partially labeled graphs FG, F ∈ FS•, G ∈ FT•, are different generators of QS∪T. To prove the second, let a1, a2, . . . be any basis of QS/f and b1, b2, . . . any basis of QT/f. Consider the map ai ⊗ bj ↦ aibj (which is defined on a basis of QS/f ⊗ QT/f), and extend it linearly to a map Φ : QS/f ⊗ QT/f → (QS/f)(QT/f). We show that Φ is an isomorphism between QS/f ⊗ QT/f and (QS/f)(QT/f). It is straightforward to check that Φ preserves the product in the algebra and also the unit element. It is also clear that (QS/f)(QT/f) is generated by the elements aibj, so Φ is surjective. To prove that Φ is injective, suppose that there are real numbers cij, finitely many but not all of them zero, such that Σ_{i,j} cij aibj = 0. Then for every x ∈ QS/f and y ∈ QT/f, we have by multiplicativity

Σ_{i,j} cij f(xai) f(ybj) = Σ_{i,j} cij f(xy aibj) = Σ_{i,j} cij ⟨xy, aibj⟩ = ⟨xy, Σ_{i,j} cij aibj⟩ = 0.
Writing this equation as ⟨y, Σ_{i,j} cij f(xai) bj⟩ = 0, we see that Σ_{i,j} cij f(xai) bj = 0. Since the bj are linearly independent, this means that for every j,

⟨x, Σ_i cij ai⟩ = Σ_i cij f(xai) = 0.

This implies that Σ_i cij ai = 0, and since the ai are linearly independent, it follows that cij = 0 for all i and j. □

Corollary 6.4. For every multiplicative graph parameter f with finite rank, r(f, k) is a supermultiplicative function of k, in the sense that r(f, k + l) ≥ r(f, k) r(f, l).

Proof. It follows from Lemma 6.3(c) that for any two disjoint finite sets S and T there is an embedding

(6.9)  QS/f ⊗ QT/f ↪ QS∪T/f.
Considering the dimensions, the assertion follows.
Exercise 6.5. Prove that if all nodes of a simple graph F are labeled, then both F and the quantum graph F̂ introduced above are idempotent in the algebra of simple partially labeled graphs: F² = F and F̂² = F̂.

Exercise 6.6. Let f be a graph parameter for which r(f, 2) = r is finite. (a) Prove that every path labeled at its endpoints can be expressed, modulo f, as a linear combination of paths of length at most r. (b) Prove that a 2-labeled m-bond Bm•• can be expressed, modulo f, as a linear combination of 2-labeled k-bonds with k ≤ r − 1. (c) A series-parallel graph is a 2-labeled graph obtained from K2•• by repeated application of the gluing and concatenation operations. Prove that every series-parallel graph can be expressed, modulo f, as a linear combination of series-parallel graphs with at most 2^{r−1} edges.
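One way to see the concatenation operation at work: over a target graph with all node weights 1 it acts as a transfer matrix, hom_ij(F ∘ G, H) = Σ_k hom_ik(F, H) hom_kj(G, H). The sketch below (illustrative Python; encodings as in the earlier sketches, with the two labels at nodes 0 and 1; concat is our own helper) verifies this matrix identity for two small 2-labeled graphs mapped into C5:

    import numpy as np
    from itertools import product

    def hom_matrix(g, adj):
        # M[i, j] = number of homomorphisms of the 2-labeled graph g = (n, edges)
        # into the graph with adjacency matrix adj, sending label 0 to i, label 1 to j.
        n, edges = g
        q = len(adj)
        M = np.zeros((q, q), dtype=int)
        for phi in product(range(q), repeat=n):
            if all(adj[phi[u]][phi[v]] for u, v in edges):
                M[phi[0], phi[1]] += 1
        return M

    def concat(g1, g2):
        # F o G: identify node 1 of F with node 0 of G and unlabel the merged node;
        # in the result, node 0 is the left label and node 1 the right label.
        n1, e1 = g1
        n2, e2 = g2
        m1 = lambda x: {1: 2}.get(x, x if x < 2 else x + 1)
        m2 = lambda x: {0: 2, 1: 1}.get(x, x + n1 - 1)
        return n1 + n2 - 1, [(m1(u), m1(v)) for u, v in e1] + [(m2(u), m2(v)) for u, v in e2]

    adj = [[0, 1, 0, 0, 1], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0],
           [0, 0, 1, 0, 1], [1, 0, 0, 1, 0]]   # C5
    F = (3, [(0, 2), (2, 1)])   # 2-labeled path with one internal node
    G = (2, [(0, 1)])           # 2-labeled edge
    lhs = hom_matrix(concat(F, G), adj)
    rhs = hom_matrix(F, adj) @ hom_matrix(G, adj)
    print(np.array_equal(lhs, rhs))   # True

Here concat(F, G) is the 2-labeled path of length three, and both sides equal the cube of the adjacency matrix of C5.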
6.2. Reflection positivity

In this section we assume that f is reflection positive, multiplicative, normalized, and has finite connection rank. We are going to prove that the gluing algebras (modulo f) have a very tight structure.

6.2.1. The idempotent basis. Let S be a finite subset of N. If f is reflection positive and the dimension of QS/f is finite, then the factor algebra QS/f is a finite dimensional commutative Frobenius algebra: it has a commutative and associative product as well as a positive definite inner product, related by the Frobenius identity (6.4). This implies that QS/f has a very simple structure.

Every element x ∈ QS/f defines a linear transformation Ax : QS/f → QS/f by Ax y = xy. Clearly x ↦ Ax is an algebra homomorphism, and the fact that the inner product is definite on QS/f implies that x ↦ Ax is injective. Commutativity and the Frobenius identity imply that Ax is symmetric, and that any two transformations Ax commute. This implies that there is an orthonormal basis in which all the Ax are simultaneously diagonal. Counting dimensions shows that every diagonal matrix is of the form Ax, and so QS/f is isomorphic with the algebra of diagonal matrices. Another way of saying this is that QS/f is isomorphic to R^m endowed with the coordinate-wise product and the usual inner product (where m = dim(QS/f) = r(f, |S|)).
The algebra elements corresponding to the standard basis vectors form a (uniquely determined) basis BS = {p_1^S, . . . , p_r^S} such that (p_i^S)² = p_i^S (the basis elements are idempotent in the algebra) and p_i^S p_j^S = 0 for i ≠ j. We call this the idempotent basis of QS/f. If p ∈ Q/f is any nonzero idempotent, then f(p) = f(pp) = ⟨p, p⟩ > 0. In particular, f(p_i^S) > 0. Note, however, that f(p_i^S) ≠ 1 in general, so the algebra isomorphism between QS/f and R^m does not preserve the inner product.

This purely algebraic construction carries a lot of combinatorial information, as we shall see. But first, let us work out an example.
Example 6.7 (Eulerian property). Let us compute the algebras associated with Eul(G), the indicator function of G being eulerian (Example 4.21). We know that this is a homomorphism function into a 2-node weighted graph, hence it is reflection positive and r(Eul, k) ≤ 2^k, but this last inequality will not hold with equality for k ≥ 1.

The space Q0/Eul is 1-dimensional. To determine Q1/Eul, we note that for two 1-labeled graphs G1 and G2, the product G1G2 is eulerian if and only if both G1 and G2 are eulerian (G1 and G2 cannot have a single node with odd degree!), and so Eul(G1G2) = Eul(G1)Eul(G2) holds for 1-labeled graphs as well. This shows that Q1/Eul is also 1-dimensional.

Next, for general k, let Odd(G) denote the set of nodes of G with odd degree. Clearly G ≡ 0 (mod Eul) for any graph with Odd(G) ⊄ [k]. Since |Odd(G)| is even, the set Odd(G) is uniquely determined by the intersection Odd′(G) = [k−1] ∩ Odd(G). For two k-labeled graphs G1 and G2, the product G1G2 is eulerian if and only if Odd′(G1) = Odd′(G2) (which implies that the unlabeled nodes have even degree). Furthermore, Odd′(G1G2) = Odd′(G1) △ Odd′(G2), and hence G ↦ Odd′(G) induces an algebra isomorphism between Qk/Eul and the group algebra of Z2^{k−1}. Hence r(Eul, k) = 2^{k−1} for k ≥ 1. The idempotents of the group algebra of a finite abelian group are determined by its characters, which in this simple case means that they are indexed by subsets S ⊆ [k−1], and the idempotent is the group algebra element

pS = (1/2^{k−1}) Σ_{X⊆[k−1]} (−1)^{|S∩X|} X.

The group element Odd′(G) can be expressed in this basis by discrete Fourier inversion:

Odd′(G) = Σ_{S⊆[k−1]} (−1)^{|S∩Odd′(G)|} pS.

It follows that

G ↦ ((−1)^{|S∩Odd(G)|} : S ⊆ [k−1])

defines an algebra isomorphism between Qk/Eul and R^{2^{k−1}}.
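The computation in Example 6.7 can be replayed numerically. The sketch below (illustrative Python) restricts to multigraphs on three labeled nodes, for which Odd(G) ⊆ [k] holds automatically, evaluates Eul on gluing products through the parity argument above, and finds that the resulting submatrix of M(Eul, 3) has rank 4 = 2^{k−1}:

    import numpy as np
    from itertools import product

    k = 3
    pairs = [(0, 1), (0, 2), (1, 2)]
    # All multigraphs on three labeled nodes with edge multiplicities 0, 1 or 2.
    graphs = list(product(range(3), repeat=3))

    def parity(mults):
        # Parity vector of the degrees; Eul(G1 G2) = 1 iff the parity
        # vectors of G1 and G2 coincide (the argument of Example 6.7).
        deg = [0, 0, 0]
        for (u, v), m in zip(pairs, mults):
            deg[u] += m
            deg[v] += m
        return tuple(d % 2 for d in deg)

    M = np.array([[int(parity(g1) == parity(g2)) for g2 in graphs] for g1 in graphs])
    print(np.linalg.matrix_rank(M))   # 4, in line with r(Eul, k) = 2^(k-1)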
For two idempotents p and q in Q/f, we say that q resolves p if pq = q. It is clear that this relation is transitive.

Lemma 6.8. Let r be any idempotent element of QS/f. Then r is the sum of those idempotents in BS that resolve it.
Proof. Indeed, we can write r = Σ_{p∈BS} µp p with some scalars µp. Using that r is idempotent, we get that

r = r² = Σ_{p,p′∈BS} µp µp′ pp′ = Σ_{p∈BS} µp² p,

which shows that µp² = µp for every p, and so µp ∈ {0, 1}. So r is the sum of some subset X ⊆ BS. It is clear that rp = p for p ∈ X and rp = 0 for p ∈ BS \ X, so X consists of exactly those elements of BS that resolve r. □
As a special case, we see that

(6.10)  u = Σ_{p∈BS} p
is the unit element of QS (this is the image of the edgeless graph US), and also the unit element of the whole algebra Q.

Lemma 6.9. Let S ⊂ T be two finite sets. Then every q ∈ BT resolves exactly one element of BS.

Proof. We have by (6.10) that

u = Σ_{p∈BS} p = Σ_{p∈BS} Σ_{q∈BT: q resolves p} q,

and also

u = Σ_{q∈BT} q,
so by the uniqueness of the representation we get that every q must resolve exactly one p. □

Lemma 6.10. If p ∈ BS and q resolves p, then [[q]]S = (f(q)/f(p)) p.

Proof. Clearly (f(q)/f(p)) p ∈ QS/f. Furthermore,

⟨q − (f(q)/f(p)) p, p⟩ = f(qp) − (f(q)/f(p)) f(p²) = f(q) − (f(q)/f(p)) f(p) = 0,

and for every other basic idempotent p′ ∈ BS, we have

⟨q − (f(q)/f(p)) p, p′⟩ = f(qp′) − (f(q)/f(p)) f(pp′) = f(qpp′) − (f(q)/f(p)) f(pp′) = 0.

This shows that (f(q)/f(p)) p is the orthogonal projection of q to QS/f. Since [[q]]S has the same characterization, the lemma follows. □
Lemma 6.11. Let S, T ⊂ N be finite sets, let p ∈ BS∩T, and let q ∈ BS resolve p. Then for any x ∈ QT/f we have f(p)f(qx) = f(q)f(px).

Indeed, by Lemma 6.10 and (6.7),

f(qx) = f([[q]]S∩T x) = (f(q)/f(p)) f(px). □
Lemma 6.12. If both idempotents q ∈ BS and r ∈ BT resolve the same idempotent p ∈ BS∩T, then qr ≠ 0.
Indeed, by Lemma 6.11,

f(qr) = (f(q)/f(p)) f(pr) = (f(q)/f(p)) f(r) > 0. □
Let S ⊂ N and p ∈ BS. For u ∈ N \ S, let q_1^u, . . . , q_D^u denote the elements of B_{S∪{u}} resolving p. Note that for u, v ∈ N \ S, there is a natural isomorphism between Q_{S∪{u}}/f and Q_{S∪{v}}/f (induced by the map that fixes S and maps u onto v), and we may choose the labeling so that q_i^u corresponds to q_i^v under this isomorphism. Let T ⊃ S and V = T \ S. For every map φ : V → [D], let

(6.11)  qφ = ∏_{v∈V} q^v_{φ(v)}.
Lemma 6.13. The algebra elements qφ are nonzero idempotents in QT/f such that qφqψ = 0 if φ ≠ ψ.

It is clear that the qφ are idempotents. By Lemma 6.11,

(6.12)  f(qφ) = f(∏_{v∈V} q^v_{φ(v)}) = (∏_{v∈V} f(q^v_{φ(v)})/f(p)) f(p) ≠ 0,
and so qφ ≠ 0. Finally, if φ ≠ ψ, then there is a v ∈ V such that φ(v) ≠ ψ(v), and then q^v_{φ(v)} q^v_{ψ(v)} = 0, which implies that qφqψ = 0. □

If p ∈ BS and S ⊂ T, |T| = |S| + 1, then the number of elements in BT that resolve p will be called the degree of p, and denoted by deg(p). Obviously this value is independent of which (|S| + 1)-element superset T of S we are considering.

Lemma 6.14. If S ⊂ T, and q ∈ BT resolves p ∈ BS, then deg(q) ≥ deg(p).

It suffices to show this in the case when |T| = |S| + 1. Let T = S ∪ {u} and v ∈ N \ T. Let q_1^u, . . . , q_D^u denote the elements of BT resolving p, where (say) q = q_1^u, and let q_1^v, . . . , q_D^v be the basic idempotents in B_{S∪{v}} resolving p. Then by Lemma 6.13, the elements q_1^u q_i^v, i ∈ [D], are nonzero idempotents in Q_{S∪{u,v}}/f resolving q, such that the product of any two of them is 0. Writing every such q_1^u q_i^v as a sum of basic idempotents, we see that deg(q) ≥ D. □

6.2.2. Homomorphisms into weighted graphs. After all these preparations, we are able to complete the proof of Theorem 5.54. Our goal is to construct a weighted graph H for which f(G) = hom(G, H) for every loopless multigraph G.

The non-degenerate case. We start with sketching the proof in the non-degenerate case, when dim(Qk/f) = q^k for all k. (This is in fact the generic case, in the sense that it occurs with probability 1 if f = hom(., H) for a weighted graph H with randomly chosen edge- and nodeweights; see Section 6.4.1.) This implies that the embedding QS/f ⊗ QT/f ↪ QS∪T/f in (6.9) is an isomorphism. In this case, the idempotent bases of the algebras Qk/f are easy to construct explicitly: if p1, . . . , pq is the idempotent basis of Q1/f, then the elements p_{i1} ⊗ · · · ⊗ p_{ik} (ij ∈ [q]) form the idempotent basis of Qk/f.

We can define a weighted complete graph H on [q] as follows: let αi = f(pi) = f(pi²) > 0, and define βij by expressing the graph k2 (a single edge with both nodes
labeled) in the idempotent basis:

(6.13)  k2 = Σ_{i,j∈[q]} βij (pi ⊗ pj).
To show that the weighted graph H obtained this way satisfies f(G) = hom(G, H) for any multigraph G, we may assume that V(G) = [k] and all nodes of G are labeled. Then we can write

(6.14)  G = ∏_{uv∈E(G)} Kuv,
where Kuv is the graph on k labeled nodes with a single edge connecting u and v. Defining pφ = p_{φ(1)} ⊗ · · · ⊗ p_{φ(k)} for φ : [k] → [q], the k-labeled quantum graphs pφ form a basis of Qk/f consisting of idempotents, and hence

(6.15)  Ok = Σ_{φ: [k]→[q]} pφ.
By the definition of the βij, we have

(6.16)  Kuv pφ = β_{φ(u)φ(v)} pφ.
We want to evaluate f(G) = ⟨G, Ok⟩. Substituting from (6.14) and (6.15), and using (6.16) and the Frobenius identity (6.4) repeatedly, we get that

f(G) = Σ_{φ: [k]→[q]} ∏_{uv∈E(G)} β_{φ(u)φ(v)} ∏_{u∈V(G)} α_{φ(u)} = hom(G, H).
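The non-degeneracy assumption dim(Qk/f) = q^k is visible in small computations. The sketch below (illustrative Python; whom computes homomorphism numbers into a randomly weighted 3-node graph H by brute force, with the encodings of the earlier sketches) samples six 1-labeled graphs and finds that the resulting submatrix of M(hom(., H), 1) already has rank 3 = q, as expected for generic weights:

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(0)
    q = 3
    alpha = rng.uniform(0.5, 1.5, size=q)     # random node weights
    beta = rng.uniform(-1.0, 1.0, size=(q, q))
    beta = (beta + beta.T) / 2                # random symmetric edge weights

    def whom(n, edges):
        # hom(G, H) for the weighted graph H above, by brute force.
        total = 0.0
        for phi in product(range(q), repeat=n):
            w = np.prod(alpha[list(phi)])
            for u, v in edges:
                w *= beta[phi[u], phi[v]]
            total += w
        return total

    def glue(k, g1, g2):
        # Gluing product of k-labeled graphs (n, edges), labels on nodes 0..k-1.
        n1, e1 = g1
        n2, e2 = g2
        shift = lambda x: x if x < k else x + n1 - k
        return n1 + n2 - k, e1 + [(shift(u), shift(v)) for u, v in e2]

    ones = [(1, []), (2, [(0, 1)]), (3, [(0, 1), (1, 2)]), (2, [(0, 1), (0, 1)]),
            (3, [(0, 1), (0, 2)]), (3, [(0, 1), (1, 2), (0, 2)])]   # 1-labeled graphs
    M = np.array([[whom(*glue(1, a, b)) for b in ones] for a in ones])
    print(np.linalg.matrix_rank(M, tol=1e-8))   # 3 = q^1 for generic weights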
The general (and degenerate) case takes more work, but we are ready to give it now.

Proof of Theorem 5.54. The idea is that we find a basic idempotent p ∈ BS for a sufficiently large finite set S ⊆ N, with the property that the subalgebra pQ/f behaves like the whole algebra behaved in the generic case. So the idempotent bases in it, and from these the weighted graph H, can be constructed explicitly.

Bounding the expansion. If a basic idempotent p ∈ BS has degree D, then by Lemma 6.14, there are D basic idempotents in BT with |T| = |S| + 1 with degree ≥ D that resolve p. Hence if T ⊇ S, then the dimension of QT/f is at least D^{|T\S|}. It follows that the degrees of basic idempotents are bounded by q. Let us choose S and p ∈ BS so that D = deg(p) is maximal. Then it follows by Lemma 6.14 that all basic idempotents resolving p have degree exactly D.

Describing the idempotents. Let us fix a set S and a basic idempotent p ∈ BS with maximum degree D. For u ∈ N \ S, let q_1^u, . . . , q_D^u denote the elements of B_{S∪{u}} resolving p. We can describe, for a finite set T ⊃ S, all basic idempotents in BT that resolve p. Let V = T \ S, and for every map φ : V → {1, . . . , D}, let

(6.17)  qφ = ∏_{v∈V} q^v_{φ(v)}.
Note that by Lemma 6.11,

(6.18)  f(qφ) = f(∏_{v∈V} q^v_{φ(v)}) = (∏_{v∈V} f(q^v_{φ(v)})/f(p)) f(p) ≠ 0,

and so qφ ≠ 0.
Claim 6.15. The basic idempotents in QT/f resolving p are exactly the algebra elements of the form qφ, φ ∈ {1, . . . , D}^V.

We prove this by induction on |T \ S|. For |T \ S| = 1 the assertion is trivial. Suppose that |T \ S| > 1. Let u ∈ T \ S, U = S ∪ {u} and W = T \ {u}; thus U ∩ W = S. By the induction hypothesis, the basic idempotents in BW resolving p are the elements of the form qψ (ψ ∈ {1, . . . , D}^{V\{u}}). Let r be one of these. By Lemma 6.12, r q_i^u ≠ 0 for any 1 ≤ i ≤ D, and it clearly resolves r. We can write r q_i^u as the sum of basic idempotents in BT, and it is easy to see that these also resolve r. Furthermore, the basic idempotents occurring in the expressions of r q_i^u and r q_j^u (i ≠ j) are different. But r has degree D, so each r q_i^u must be a basic idempotent in BT itself. Since the sum of the basic idempotents r q_i^u (r ∈ B_{W,p}, 1 ≤ i ≤ D, where B_{W,p} denotes the set of elements of BW resolving p) is p, it follows that these are all the elements of B_{T,p}. This proves the Claim.

It is immediate from the definition that an idempotent qφ resolves q_i^v if and only if φ(v) = i. Hence it also follows that

(6.19)  q_i^v = Σ_{φ: φ(v)=i} qφ.
Constructing the target graph. Now we can define H as follows. Let H be the looped complete graph on V(H) = {1, . . . , D}. We have to define the node weights and edge weights. Fix any u ∈ N \ S. For every i ∈ V(H), let αi = f(q_i^u)/f(p) be the weight of the node i. Clearly αi > 0. Let u, v ∈ N \ S, v ≠ u, and let W = S ∪ {u, v}. Let Kuv denote the graph on W which has only one edge, connecting u and v, and let kuv denote the corresponding element of QW. We can express p kuv as a linear combination of elements of B_{W,p} (since for any r ∈ BW \ B_{W,p} one has rp = 0 and hence r p kuv = 0):

p kuv = Σ_{i,j} βij q_i^u q_j^v.
This defines the weight βij of the edge ij. Note that βij = βji, since p kuv = p kvu.

Verifying the target graph. We prove that this weighted graph H gives the right homomorphism function: f(G) = hom(G, H) for every multigraph G. By (6.19), we have for each pair u, v of distinct elements of V(G)

p kuv = Σ_{i,j∈V(H)} βij q_i^u q_j^v = Σ_{i,j∈V(H)} βij Σ_{φ: φ(u)=i, φ(v)=j} qφ = Σ_{φ∈V(H)^V} β_{φ(u),φ(v)} qφ.
Consider any V-labeled graph G with V(G) = V ⊆ N \ S, and let g be the corresponding element of Q/f. Then

p g = ∏_{uv∈E(G)} p kuv = ∏_{uv∈E(G)} ( Σ_{φ∈V(H)^V} β_{φ(u),φ(v)} qφ )
    = Σ_{φ: V→V(H)} ( ∏_{uv∈E(G)} β_{φ(u),φ(v)} ) qφ.
Since p ∈ QS/f and g ∈ QV/f where S ∩ V = ∅, we have f(p)f(g) = f(pg), and so by (6.12),

f(p) f(g) = f(pg) = Σ_{φ∈V(H)^V} ( ∏_{uv∈E(G)} β_{φ(u),φ(v)} ) f(qφ)
          = Σ_{φ: V→V(H)} ( ∏_{uv∈E(G)} β_{φ(u),φ(v)} ) ( ∏_{v∈V(G)} α_{φ(v)} ) f(p) = hom(G, H) f(p).
The factor f(p) > 0 can be cancelled from both sides, completing the proof of the theorem. □

6.3. Contractors and connectors

We study the existence and properties of two special elements in graph algebras. These will serve as important building blocks for other constructions, like a more explicit description of the idempotent basis.

6.3.1. Contractors and connectors for general graph parameters. In the algebra Q2 of 2-multilabeled graphs, multiplication by the single node with two labels (denoted by K1••) results in identifying the nodes labeled 1 and 2. In the factor algebra modulo a graph parameter, it may be the case that multiplication by some other graph has essentially the same result. For example, for the chromatic polynomial chr(., x), the contraction-deletion identity (A.10) can be written like this:

(6.20)  K1•• ≡ O2•• − K2••  (mod chr(., x)),
or in pictograms,

(6.21)  [the identity (6.20) drawn in pictograms: a single node carrying labels 1 and 2 ≡ two isolated labeled nodes − a labeled edge]  (mod chr(., x)).
We say that the 2-labeled quantum graph on the right side is a contractor for the graph parameter chr(., x). Starting with any multilabeled quantum graph, we can apply this identity repeatedly to construct a simply labeled quantum graph which represents the same element in Q/f . A consequence of the contraction-deletion relation is the identity (6.22)
(6.22)  [a pictogram identity derived from (6.20), writing the labeled edge K2•• as a difference of 2-labeled quantum graphs in which the two labeled nodes are nonadjacent]  (mod chr(., x)),
which does not involve multiple labels. One way to read this is that the edge (with both endnodes labeled) can be replaced by the difference of two simple graphs in which the two labeled nodes are nonadjacent. We say that the 2-labeled quantum graph on the right is a connector for the graph parameter chr(., x). One important consequence of this identity is that applying it repeatedly, we can eliminate all edge multiplicities. To state the definition formally, let us say that a quantum graph x is a proper expansion of a partially labeled graph F modulo a graph parameter f , if x ≡ F (mod f ), and no constituent of x is of the form F G for some partially labeled graph G. Then a contractor is a proper expansion of K1•• , and a connector is a proper expansion of K2•• . Also, the remarks after (6.20) and (6.22) can be formalized like this:
Proposition 6.16. A graph parameter f has a contractor if and only if every multilabeled quantum graph is congruent modulo f to a simply labeled quantum graph. A graph parameter f has a simple connector if and only if every simply labeled quantum graph is congruent modulo f to a simply labeled simple quantum graph with no edge connecting the labeled nodes.

We can put the notion of a contractor in a different context. Let F_k^stab denote the set of k-labeled multigraphs in which the labeled nodes form an independent (stable) set, and let Q_k^stab denote the subalgebra of Qk generated by them. For a 2-labeled graph F ∈ F_2^stab, let F′ denote the graph obtained by identifying the two labeled nodes and labeling the resulting node by 1. (So F′ is obtained from the product F K1•• by removing the label 2.) The map F ↦ F′ maps 2-labeled graphs to 1-labeled graphs. We can extend it linearly to get an algebra homomorphism x ↦ x′ from Q_2^stab into Q1.

The map x ↦ x′ does not in general preserve the inner product or even its kernel; we say that the graph parameter f is contractible if for every x ∈ Q_2^stab, x ≡ 0 (mod f) implies x′ ≡ 0 (mod f); in other words, if x ↦ x′ factors to a linear map Q_2^stab/f → Q1/f. With this notation, z ∈ Q2 is a contractor for f if and only if for every x ∈ Q_2^stab, we have f(xz) = f(x′).

Contractors also relate to the algebra of concatenations:

Proposition 6.17. A contractor for f is the multiplicative identity for the operation ∘ modulo f.

Proof. We have to verify that if z is a contractor, then
(6.23)  z ∘ x ≡ x  (mod f)

for all x ∈ Q2. This is equivalent to f((z ∘ x)y) = f(xy) for all x, y ∈ Q2. Using (6.5) and that in x ∘ y the labeled nodes are nonadjacent, we obtain

f((z ∘ x)y) = f(z(y ∘ x∗)) = f((y ∘ x∗)′) = f(xy),

which proves (6.23). □
Note that in every constituent of x ∘ y the labeled nodes are nonadjacent, for all x, y ∈ Q2. It follows that if the algebra (Q2/f, ∘) has a multiplicative identity (in particular, if f has a contractor), then every y ∈ Q2/f can be represented by a 2-labeled quantum graph with nonadjacent labeled nodes.

While the existence of a contractor, the existence of a connector, and contractibility are three different properties of a graph parameter, there is some connection, as expressed in the following propositions.

Proposition 6.18. If a graph parameter has a contractor, then it is contractible.

Proof. Let w be a contractor for f. Suppose that x ∈ Q_2^stab satisfies x ≡ 0 (mod f), and let y ∈ Q1. Choose a z ∈ Q2 such that z′ = y. Then

f(x′y) = f(x′z′) = f((xz)′) = f((xz)w) = f(x(zw)) = 0,

showing that x′ ≡ 0 (mod f). □
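Returning to the chromatic polynomial, the contractor identity (6.20) can be confirmed at integer points, which suffices since both sides are polynomials in x. The sketch below (illustrative Python; chrom counts proper x-colorings by brute force) checks chr(F/01, x) = chr(F, x) − chr(F + 01, x) on random 2-labeled multigraphs F, which is (6.20) glued with F and unlabeled:

    from itertools import product
    import random

    def chrom(n, edges, x):
        # chr(G, x) at a positive integer x: the number of proper x-colorings.
        return sum(
            all(c[u] != c[v] for u, v in edges)
            for c in product(range(x), repeat=n)
        )

    def contract01(n, edges):
        # Identify nodes 0 and 1 (the effect of gluing K1** and unlabeling);
        # an edge between 0 and 1 becomes a loop, which kills all proper colorings.
        m = lambda v: 0 if v <= 1 else v - 1
        return n - 1, [(m(u), m(v)) for u, v in edges]

    random.seed(2)
    for _ in range(10):
        n = random.randint(2, 5)
        edges = [(random.randrange(n), random.randrange(n)) for _ in range(6)]
        edges = [(u, v) for u, v in edges if u != v]
        for x in (2, 3, 4):
            lhs = chrom(*contract01(n, edges), x)
            rhs = chrom(n, edges, x) - chrom(n, edges + [(0, 1)], x)
            assert lhs == rhs
    print("(6.20) verified at integer points x = 2, 3, 4")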
Proposition 6.19. If a graph parameter has a contractor, then it has a connector; if it has a simple contractor, then it has a simple connector.
Proof. Let z be a contractor. We claim that z ∘ P2•• is a connector. Indeed, for any 2-labeled graph G, we have (z ∘ P2••)G ≅ z(G ∘ P2••), and by the definition of a contractor, f(z(G ∘ P2••)) = f(K1••(G ∘ P2••)) = f(P2••G). So z ∘ P2•• ≡ P2•• (mod f), and thus z ∘ P2•• is a connector. The second assertion is trivial by the same construction. □

Proposition 6.20. If f is contractible, has a connector, and r(f, 2) is finite, then f has a contractor.

Proof. Since ⟨x, y⟩ = f(xy) is a symmetric (possibly indefinite) bilinear form that is not singular on Q2/f, there is a basis p1, . . . , pN of Q2/f such that f(pipj) = 0 if i ≠ j and f(pipi) ≠ 0. By the assumption that f has a connector, we may represent this basis by quantum graphs with nonadjacent labeled nodes; then the contracted quantum graphs pi′ have no loops. Let

z = Σ_{i=1}^N (f(pi′)/f(pi²)) pi.
We claim that z is a contractor. Indeed, let x ∈ Q2 be a quantum graph with nonadjacent labeled nodes, and write x ≡ Σ_{i=1}^N ai pi (mod f). Then we have

f(xz) = Σ_{i=1}^N ai (f(pi′)/f(pi²)) f(pi²) = Σ_{i=1}^N ai f(pi′).

On the other hand, contractibility implies that x′ ≡ Σ_{i=1}^N ai pi′ (mod f), and so

f(x′) = Σ_{i=1}^N ai f(pi′) = f(xz). □
Proposition 6.21. If M(f, 2) is positive semidefinite and has finite rank r, and f is contractible, then f has a connector whose constituents are paths of length at most r + 1.

Proof. Since Q2/f is finite dimensional, there is a linear dependence between P2••, P3••, . . . , P_{r+2}•• in Q2/f. Hence there is a (smallest) k ≥ 2 such that Pk•• can be expressed as

(6.24)  Pk•• ≡ Σ_{i=1}^r ai P_{k+i}••  (mod f)
with some real numbers a1, . . . , ar. The assertion is equivalent to saying that k = 2. Let x = P2•• − Σ_{i=1}^r ai P_{2+i}••. Then (6.24) can be written as x ∘ P_{k−1}•• ≡ 0 (mod f). If k = 3, then this implies that x ∘ P_{2+i}•• ≡ 0 (mod f) for all i ≥ 0, and hence x ∘ x ≡ 0 (mod f). Using contractibility we obtain that 0 = f((x ∘ x)′) = f(x²). Now semidefiniteness of M(f, 2) shows that x ≡ 0 (mod f). So (6.24) holds with k = 2 as well, a contradiction. Suppose that k > 3; then

(x ∘ P_{k−2}••)² = (x ∘ P_{k−1}••)(x ∘ P_{k−3}••) ≡ 0  (mod f),
and so by the assumption that M(f, 2) is positive semidefinite, we get that x ∘ P_{k−2}•• ≡ 0 (mod f), which contradicts the minimality of k again. □

The following statement is a corollary of Propositions 6.20 and 6.21.
Corollary 6.22. If M(f, 2) is positive semidefinite and has finite rank, and f is contractible, then f has a contractor.

We conclude this section with a number of examples of connectors and contractors.

Example 6.23 (Perfect matchings). Recall that pm(G) denotes the number of perfect matchings in the graph G. We have seen that r(pm, k) = 2^k is exponentially bounded, but pm is not reflection positive, and thus pm(G) cannot be represented as a homomorphism function. On the other hand, pm has a contractor (a path of length 2) and also a connector (a path of length 3).

Example 6.24 (Number of triangles). The graph parameter hom(K3, .) has no connector. Indeed, suppose that x ∈ Q2 is a connector; then we must have

hom(K3, xP3••) = hom(K3, K2••P3••) = hom(K3, K3) = 6,

and also
hom(K3, xP4••) = hom(K3, K2••P4••) = hom(K3, C4) = 0.

On the other hand, hom(K3, xP3••) = hom(K3, xP4••), since x has nonadjacent labeled nodes, so no homomorphism from K3 touches the edges of the P3•• factor. This contradiction shows that hom(K3, .) has no connector. A similar argument shows that hom(K3, .) is not contractible (and so it has no contractor).

Example 6.25 (S-Flows). The number flo(G) of flows on G with values from a given subset of a finite abelian group can be described as a homomorphism function (Example 5.16). It has a trivial connector, a path of length 2 (which is an algebraic way of saying that if we subdivide an edge, then the flows don't change essentially). In the case of nowhere-0 flows, K2•• + U2•• is a contractor (which amounts to the contraction-deletion identity for the flow polynomial). In general, it is more difficult to describe the contractors, but it is possible (Garijo, Goodall and Nešetřil [2011]).

Example 6.26 (Density in a random graph). Recall the multigraph parameter f(G) = p^{e(Gsimp)} (0 < p < 1) from Example 5.60, which is reflection positive, multiplicative, and has finite connection rank. This parameter has neither a contractor nor a connector; it is not even contractible. We have, for instance (the original shows the congruence in pictograms), that the 2-labeled path of length two is congruent modulo f to the 2-labeled graph formed by two disjoint edges, one pendant at each labeled node; but identifying the labeled nodes produces a pair of parallel edges in the first graph but not in the second, so the contracted graphs do not remain congruent.

Example 6.27 (Eulerian orientations). Recall that eul⃗(G) denotes the number of eulerian orientations of the graph G. We have seen that the graph parameter eul⃗ is reflection positive, but has infinite connection rank (so it is not a homomorphism function). Similarly as in Example 6.25, a path of length 2 is a connector. Furthermore, this graph parameter is contractible, but has no contractor (see Exercise 6.33).
6.3.2. Contractors and connectors for homomorphism functions. Many graph parameters have contractors and/or connectors, as we have seen. Our next theorem asserts this for homomorphism functions f = hom(., H) for any weighted graph H. It is easy to check from the definitions that a 2-labeled quantum graph z is a contractor for hom(., H) if and only if

(6.25)  hom_ij(z, H) = (1/αi) 1(i = j)
for every i, j ∈ V(H). It is a connector for hom(., H) if and only if z is simple, it has nonadjacent labeled nodes, and

(6.26)  hom_ij(z, H) = βij
for every i, j ∈ V(H).

Theorem 6.28. Let f = hom(., H) for some weighted graph H. Then f has a simple contractor and a simple connector.

We can in fact construct connectors and contractors of a rather simple form. To state the result, we define classes of 2-labeled multigraphs as follows. We start with K = {K2••}. The (gluing) products of at most a members of K form the class K(a) (so these are 2-labeled bonds). Concatenations of at most b bonds form the class K(a, b). The gluing products of at most c members of K(a, b) form the class K(a, b, c). Concatenations of at most d members of K(a, b, c) form the class K(a, b, c, d), etc. (Fig. 6.1).
Figure 6.1. Graphs in classes K Supplement 6.29. Let q = v(H). (a) The graph parameter f = hom(., H) has a connector whose constituents are paths Pk•• with 3 ≤ k ≤ q + 3. (b) The graph parameter f = hom(., H) has a contractor whose constituents are in the class K(q 2 − 1, 2, q 2 − 1, 2, q). In particular, the number of nodes in this contractor is at most 2q 3 . All graphs in the classes K(a, b, . . . ) are series-parallel. So in particular, the graph parameter f = hom(., H) has a contractor whose constituents are seriesparallel (Lov´ asz and Szegedy [2009]). The Supplement above is a refinement of this statement. For us, the bound on the number of nodes will be more important than the structure. If we get rid of the parallel edges in the contractor by replacing each edge by the simple connector constructed above, every constituent will still be a series-parallel graph, and the number of nodes in any constituent will be bounded by 2q 6 .
6.3. CONTRACTORS AND CONNECTORS
99
Proof of Theorem 6.28 and Supplement 6.29. We may assume that H is twin-free, since identifying twins does not change the graph parameter hom(., H). (a) To construct a simple connector, let B = (βij ) be the (weighted) adjacency √ √ matrix of H, and let D = diag( α1 , . . . , αm ). Let λ1 , . . . , λt be the nonzero eigenvalues of the matrix DBD (which are real as DBD is symmetric), and consider ∏t the polynomial ρ(z) = z i=1 (1 − z/λi ). Then ρ(DBD) = 0 (since the eigenvalues of DBD are roots of ρ). Since the constant term in ρ(z) is 0 and the linear term is z, ∑tthis expressess DBD as a linear combination of higher powers of DBD: DBD = s=2 as (DBD) , or (6.27)
B=
t ∑
as (BD2 )s−1 B.
s=2
For all i, j ∈ V (H), we have (6.28)
( ) homij (Ps•• , H) = (BD2 )s−2 B ij .
∑t •• Let y = s=2 as Ps+1 , then (6.27) and (6.28) imply that y is a connector, with the structure described in the Supplement. (b) The existence of a contractor, a 2-labeled quantum graph z satisfying (6.25), follows by Theorem 6.38. We can replace each edge of the contractor by a simple connector, to make it simple. The construction of a contractor of the special type described in the Supplement takes more work. We consider the values βij = homij (K2•• , H). Replacing K2 by the •• m m-bond, we get homij (Bm , H) = βij . (Note that this relation also holds for m = 0.) Hence for every real polynomial p ∈ R[x] we get a 2-labeled quantum graph yp that is a quantum bond (a linear combination of bonds), such that homij (yp , H) = p(βij ). Since every real function equals to a polynomial of degree less than q 2 on the q 2 values βij , we get that for every real function g there is quantum graph yg with constituents from K(q 2 − 1), such that homij (yg , H) = g(βij ). Let K denote the field obtained from Q by adjoining the node weights of H. We choose the function g above so that the values γij = g(βij ) are algebraically independent over K for different values of βij . Next, we form the concatenation ug = yg ◦ yg∗ (which is a linear combination of graphs in K(q 2 − 1, 2)), and consider the values homij (ug , H). We claim that a “diagonal” value homii (ug , H) can never be equal to an “off-diagonal” value homjk (ug , H), i ̸= j. Indeed, if they are equal, then we have q ∑ r=1
2 αr γir
=
q ∑
αr γjr γkr .
r=1
Since different γ's are algebraically independent over K (which contains the coefficients α_r), the two sides must be equal formally. In particular, every product γ_jr γ_kr must occur on the left side, which implies that γ_jr = γ_kr. By the definition of γ, this means that β_jr = β_kr for every r, and so nodes j and k are twins, which was excluded. It follows that we can find a polynomial h of degree at most q^2 − 1 such that h(γ_ii) = 1 but h(γ_ij) = 0 if i ≠ j. Hence we get a quantum graph w such that

(6.29)    hom_ij(w, H) = 1(i = j).
Every constituent of w is the (gluing) product of at most q^2 − 1 graphs in K(q^2 − 1, 2), so it is in class K(q^2 − 1, 2, q^2 − 1). Consider the quantum graph w′ = w ◦ w. The constituents of w′ are graphs in class K(q^2 − 1, 2, q^2 − 1, 2). Using (6.29), we get

(6.30)    hom_ij(w′, H) = ∑_k α_k hom_ik(w, H) hom_kj(w, H) = α_i 1(i = j).
Expressing the function i(t) = (1/t)1(t ≠ 0) on the values of hom_ij(w′, H) by a polynomial of degree at most q as above, we can construct a quantum graph z in class K(q^2 − 1, 2, q^2 − 1, 2, q) satisfying (6.25).

Using the notion of a contractor, we can give the following characterization of homomorphism functions, which does not involve the finiteness of the connection rank.

Theorem 6.30. A graph parameter f can be represented in the form f = hom(., H) for some weighted graph H if and only if it is multiplicative, reflection positive and has a contractor.

Proof. The necessity of the conditions follows by Theorem 6.28. To prove the sufficiency of the conditions, it suffices to prove that there exists a q > 0 such that r(f, k) ≤ q^k for all k ≥ 0, and then invoke Theorem 5.54. Note that reflection positivity is used twice: the existence of a contractor does not in itself imply an exponential bound on the connection rank (cf. our introductory example of the chromatic polynomial).

Let g_0 be a contractor for f; we show that q = f(g_0^2) provides the appropriate upper bound on the connection rank. Since f is multiplicative, we already know this for k = 0. We may normalize f so that f(K_1) = 1. If the conclusion is false (with q = f(g_0^2)), then for some integer k > 0 we have (possibly infinite) r(f, k) > q^k, and hence for N = ⌊q^k + 1⌋ there are mutually orthogonal unit vectors q_1, . . . , q_N in the algebra Q_k/f. Let q_i ⊗ q_i denote the (2k)-labeled quantum graph obtained from 2k labeled nodes by attaching a copy of q_i at {1, . . . , k} and another copy of q_i at {k + 1, . . . , 2k}. Let h denote the (2k)-labeled quantum graph obtained from 2k labeled nodes by attaching a copy of g_0 at {i, k + i} for each i = 1, . . . , k. Consider the quantum graph

x = ∑_{i=1}^N q_i ⊗ q_i − h.
By reflection positivity, we have f(x^2) ≥ 0. But

f(x^2) = ∑_{i=1}^N ∑_{j=1}^N ⟨q_i ⊗ q_i, q_j ⊗ q_j⟩ − 2 ∑_{i=1}^N ⟨q_i ⊗ q_i, h⟩ + ⟨h, h⟩.

Here ⟨q_i ⊗ q_i, q_j ⊗ q_j⟩ = ⟨q_i, q_j⟩^2 = 1(i = j), and so

∑_{i=1}^N ∑_{j=1}^N ⟨q_i ⊗ q_i, q_j ⊗ q_j⟩ = N.
Furthermore, by the definition of g_0 and h, we have

∑_{i=1}^N ⟨q_i ⊗ q_i, h⟩ = ∑_{i=1}^N ⟨q_i, q_i⟩ = N.
Finally, by the definition of h and the multiplicativity of f, we have ⟨h, h⟩ = f(g_0^2)^k. Thus f(x^2) ≥ 0 implies that N ≤ f(g_0^2)^k = q^k, a contradiction.

Exercise 6.31. Let z be a contractor for a graph parameter f, and let F be a k-labeled graph. Let us delete an edge 1i from F, and add the edge 2i, to obtain another k-labeled graph F′. Prove that zF ≡ zF′ (mod f).

Exercise 6.32. Let z be a contractor for a graph parameter f, and let z̄ = O_2 − z. Construct a 3-labeled quantum graph x by gluing a copy of z on nodes 1 and 2, a copy of z on nodes 2 and 3, and a copy of z̄ on nodes 1 and 3. Prove that x ≡ 0 (mod f).

Exercise 6.33. Prove that the number of eulerian orientations, eul, is contractible, but has no contractor.

Exercise 6.34. For every multigraph G, let Ğ be obtained by repeatedly contracting edges with multiplicity larger than 1 until a simple graph is obtained, and define f(G) = hom(K_3, Ğ). (a) Prove that Ğ is uniquely determined. (b) Show that f has a contractor. (c) Prove that f has no connector.
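The connector construction in the proof of Theorem 6.28(a) is concrete enough to compute. The following sketch (in Python; the weighted graph H is a made-up example, and the variable names are illustrative) recovers the coefficients a_s with DBD = ∑_{s=2}^{t+1} a_s (DBD)^s from the nonzero eigenvalues of DBD, which is exactly what determines the path coefficients of the simple connector y.

```python
import numpy as np

# Hypothetical weighted graph H: nodeweights alpha, symmetric edgeweights B.
alpha = np.array([1.0, 2.0, 1.5])
B = np.array([[0.0, 1.0, 2.0],
              [1.0, 3.0, 0.5],
              [2.0, 0.5, 1.0]])

D = np.diag(np.sqrt(alpha))
M = D @ B @ D                                  # the symmetric matrix DBD

lam = np.linalg.eigvalsh(M)
lam = np.unique(np.round(lam[np.abs(lam) > 1e-9], 9))   # distinct nonzero eigenvalues

# rho(z) = z * prod_i (1 - z/lam_i); rho(M) = 0, constant term 0, linear term z
coeffs = np.array([1.0])                       # coefficients of prod (1 - z/lam_i),
for l in lam:                                  # in ascending order of powers
    coeffs = np.convolve(coeffs, [1.0, -1.0 / l])
a = {s: -coeffs[s - 1] for s in range(2, len(coeffs) + 1)}   # DBD = sum a_s (DBD)^s

recon = sum(a_s * np.linalg.matrix_power(M, s) for s, a_s in a.items())
print(np.allclose(M, recon))                   # True
```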
6.4. Algebras for homomorphism functions

We have proved that reflection positive graph parameters with exponentially bounded connection rank are homomorphism functions into weighted graphs. In this section we continue the study of such parameters, now using their representation as homomorphism functions. A key step in the proof was to construct an idempotent basis in the appropriate graph algebra. What is the exact size of this basis? How large graphs do we need to represent the basis elements as quantum graphs? Our goal is to prove an exact answer to the first question, and to give good bounds for the second.

6.4.1. The connection rank of homomorphism functions. One can give an exact formula for the connection rank r(f, k) of homomorphism functions f = hom(., H) (Lovász [2006b]); to state it, we need a definition. Two nodes i and j in the weighted graph H are twins, if β_H(i, k) = β_H(j, k) for every node k ∈ V (H). (Note that this applies also to k = i and k = j. On the other hand, twin nodes may have different nodeweights.) Twin nodes can be merged (adding up their weights) without changing the homomorphism functions t(., H) and hom(., H).

If H is a weighted graph, then we denote the factor algebra Q_k/hom(., H) simply by Q_k/H. The proof of the following characterization of the kernels of the algebras is left to the reader as an exercise:

Proposition 6.35. If H is a weighted graph and x is a k-labeled quantum graph, then x ≡ 0 (mod hom(., H)) if and only if hom_φ(x, H) = 0 for every φ : [k] → V (H).

This fact allows us to consider the functions hom_φ as functions on the factor algebra Q_k/H.

Theorem 6.36. If H is a twin-free weighted graph, then r(hom(., H), k) is the number of orbits of the automorphism group of H on the ordered k-tuples of nodes.
Corollary 6.37. Let H be a weighted graph that has no twins and no proper automorphisms. Then r(hom(., H), k) = v(H)^k for every k.

Theorem 6.36 has a number of essentially equivalent formulations, which are interesting in their own right. One of these characterizes homomorphism functions of the form hom_φ(F, H).

Theorem 6.38. Let H be a twin-free weighted graph and h : V (H)^k → R. Then there exists a k-labeled quantum graph z such that hom_φ(z, H) = h(φ) for every φ ∈ V (H)^k if and only if h is invariant under the automorphisms of H: for every φ ∈ V (H)^k and every automorphism σ of H, h(σ ◦ φ) = h(φ).

Another variant of these theorems gives a combinatorial description of the basic idempotents p_1, . . . , p_n in the algebra Q_k/H, which played an important role in the proof of the characterization theorem. For every φ ∈ V (H)^k, we have hom_φ(p_i, H) = hom_φ(p_i^2, H) = hom_φ(p_i, H)^2, and hence hom_φ(p_i, H) ∈ {0, 1}. Furthermore, for i ≠ j, we have hom_φ(p_i, H)hom_φ(p_j, H) = hom_φ(p_i p_j, H) = 0, and hence the sets Φ_i = {φ ∈ V (H)^k : hom_φ(p_i, H) = 1}, which we call idempotent supports, are disjoint. Since

∑_φ α_φ(H) hom_φ(p_i, H) = hom(p_i, H) = hom(p_i^2, H) > 0,

the idempotent supports are nonempty. We have ∑_i p_i = O_k, and hence

∑_{i=1}^n hom_φ(p_i, H) = hom_φ(O_k, H) = 1,
and so the idempotent supports form a partition of V (H)^k. Since p_1, . . . , p_n form a basis of Q_k/H, it follows that the functions φ ↦ hom_φ(x, H) (x ∈ Q_k) are exactly those that are constant on every idempotent support. The following characterization of the partition B_k into idempotent supports is a further equivalent version of Theorem 6.36.

Theorem 6.39. For a twin-free weighted graph H, the idempotent supports are exactly the orbits of the automorphism group of H on V (H)^k.

We call two maps φ, ψ ∈ V (H)^k equivalent (in notation φ ∼ ψ) if hom_φ(F, H) = hom_ψ(F, H) for every k-labeled graph F. It follows from the discussion above that this means that they belong to the same idempotent support. With this notion we finally come to the version of these equivalent theorems that we are going to prove first.

Theorem 6.40. Two maps φ, ψ ∈ V (H)^k are equivalent if and only if there exists an automorphism σ of H such that ψ = σ ◦ φ.

Proof. The "if" part is trivial, so let us do the "only if" part. For any map φ : [k] → [q], let φ′ denote its restriction to [k − 1]. We start with some easy facts about equivalence of maps.

Claim 6.41. If two maps φ, ψ are equivalent, then so are φ′ and ψ′.
Indeed, for any (k − 1)-labeled graph F, and the graph F_1 obtained from F by adding a new isolated node labeled k, we have hom_{φ′}(F, H) = hom_φ(F_1, H) = hom_ψ(F_1, H) = hom_{ψ′}(F, H).

Claim 6.42. Suppose that φ, ψ ∈ [q]^k are equivalent. Then for every μ ∈ [q]^{k+1} such that φ = μ′ there exists a ν ∈ [q]^{k+1} such that ψ = ν′ and μ and ν are equivalent.

Indeed, let μ belong to the support of the basic idempotent p ∈ Q_{k+1}/H; then for every ν ∈ V (H)^{k+1} we have hom_ν(p, H) = 1(ν ∼ μ). Let p′ be obtained by unlabeling k + 1 in p. Then

(6.31)    hom_φ(p′, H) = ∑_{η: η′=φ} α_{η(k+1)}(H) hom_η(p, H) = ∑_{η: η′=φ, η∼μ} α_{η(k+1)}(H),

and similarly

(6.32)    hom_ψ(p′, H) = ∑_{η: η′=ψ, η∼μ} α_{η(k+1)}(H).

These two numbers are equal since φ ∼ ψ. Since the right side of (6.31) is positive, this implies that the sum in (6.32) is nonempty, and hence there is a map ν such that ν′ = ψ and ν ∼ μ.

The next observation makes use of the twin-free assumption.

Claim 6.43. Every map σ : [q] → [q] such that β_{σ(i)σ(j)} = β_{ij} for every i, j ∈ [q] is bijective.

To prove this, note that the mapping σ has some power γ = σ^s that is idempotent. Then for all i, j ∈ [q], we have β_{ij} = β_{γ(i)γ(j)} = β_{γ^2(i)γ(j)} = β_{γ(i)j}, which shows that i and γ(i) are twins for all i ∈ [q]. Since H is twin-free, this implies that γ is the identity, and so σ must be bijective.

After this preparation, we prove the theorem for larger and larger classes of mappings.
Case 1: φ is bijective. Then k = q. We may assume that the nodes of H are labeled so that φ is the identity, and then we want to prove that ψ (viewed as a map of V (H) into itself) is an automorphism of H. First, we show that

(6.33)    β_{ij} = β_{ψ(i)ψ(j)}

for every i, j ∈ [k]. Indeed, let k_{ij} be the k-labeled graph consisting of k nodes and a single edge connecting nodes i and j. Then β_{ij} = hom_φ(k_{ij}, H) = hom_ψ(k_{ij}, H) = β_{ψ(i)ψ(j)}. It follows by Claim 6.43 that ψ is also bijective. Second, we show that for every j ∈ [k],

(6.34)    α_j = α_{ψ(j)}.

It suffices to prove this for the case j = k. For the graph O_{k−1} consisting of k − 1 isolated labeled nodes,

hom_{φ′}(O_{k−1}, H) = ∏_{j=1}^{k−1} α_j,
and since ψ is bijective,

hom_{ψ′}(O_{k−1}, H) = ∏_{j=1}^{k−1} α_{ψ(j)} = (1/α_{ψ(k)}) ∏_{j=1}^k α_j.
Since ψ′ ∼ φ′ by Claim 6.41, equation (6.34) follows.

Case 2: φ is surjective. By permuting the labels 1, . . . , k if necessary, we may assume that φ(1) = 1, . . . , φ(q) = q. Claim 6.41 implies that the restriction of ψ to [q] is equivalent to the restriction of φ to [q], and so by Case 1, there is an automorphism σ of H such that ψ(i) = σ(i) for i = 1, . . . , q.

Consider any q + 1 ≤ j ≤ k, and let φ(j) = r. We claim that ψ(j) = ψ(r). Indeed, the restriction of φ to {1, . . . , r − 1, r + 1, . . . , q, j} is bijective, and equivalent to the restriction of ψ to this set; hence the restriction of ψ to this set must be bijective, which implies that ψ(j) = ψ(r). This implies that for every 1 ≤ i ≤ k, ψ(i) = σ(φ(i)), as claimed.

Case 3: φ is arbitrary. We can extend φ to a mapping μ : [ℓ] → [q] (ℓ ≥ k) which is surjective. By Claim 6.42, there is a mapping ν : [ℓ] → [q] extending ψ such that μ and ν are equivalent. Then by Case 2, there is an automorphism σ of H such that ν = σ ◦ μ. Restricting this map to [k], the assertion follows.

The other theorems stated above are easy to derive now.

Proof of Theorems 6.36, 6.38, and 6.39. Theorem 6.39 is trivially equivalent to Theorem 6.40 by the description of idempotent supports. The "only if" part of Theorem 6.38 is also trivial. To prove the "if" part, notice that every function h : V (H)^k → R invariant under automorphisms can be written as a linear combination of indicator functions of the orbits of the automorphism group. By Theorem 6.39, this means that it is a linear combination of the functions hom_φ(p_i, H), and hence it is of the form hom_φ(z, H) with some z ∈ Q_k. Finally, it follows that the number of orbits of the automorphism group of H on V (H)^k is the number of the idempotents p_i, which is r(f, k), which proves Theorem 6.36.

These results describe an interesting isomorphism between the graph algebras defined by a homomorphism function hom(., H) and algebras of functions on a twin-free weighted graph H: Q_k/H is isomorphic to the algebra of those functions V (H)^k → R that are invariant under the automorphisms of H. The isomorphism is defined by the map F ↦ hom_{v_1...v_k}(F, H), where hom_{v_1...v_k}(F, H) is viewed as a function of v_1, . . . , v_k. This correspondence between quantum graphs and functions on V (H) is useful in constructing quantum graphs with special properties.

As an application of the tools developed in this section, we are now able to prove a weaker version of Theorem 5.33, without the bounds on the size of the graph.

Corollary 6.44. If H_1 and H_2 are twin-free weighted graphs such that hom(F, H_1) = hom(F, H_2) holds for all simple graphs F, then H_1 ≅ H_2.

Proof. Let H be the graph obtained by taking the disjoint union of H_1 and H_2, creating two new nodes v_1 and v_2, and connecting v_i to all nodes of H_i. Also add loops at both v_i. The new nodes and new edges have weight 1, except for the
loops at v_1 and v_2, which get some weight β different from all other edgeweights. This last trick is needed to make sure that the graph H is twin-free. We claim that for every 1-labeled graph F

(6.35)    hom_{v_1}(F, H) = hom_{v_2}(F, H).
Indeed, if F is not connected, then those components not containing the labeled node contribute the same factors to both sides. So it suffices to prove (6.35) when F is connected. Then we have

hom_{v_1}(F, H) = ∑_{v_1∈S⊆V(F)} β^{e_F(S)} hom(F \ S, H_1).

Indeed, if we fix the set S = φ^{−1}(v_1), then the restriction φ′ of φ to V (F) \ S is a map into V (H_1) (else, the contribution of the map to hom(F, H) is 0), and the contribution of φ to hom_{v_1}(F, H) is the product of contributions from the edges induced by S and the contribution of φ′ to hom(F \ S, H). Since hom_{v_2}(F, H) can be expressed by a similar formula, and the sums on the right hand sides are equal by hypothesis, this proves (6.35).

Now (6.35) can be phrased as: the maps 1 ↦ v_1 and 1 ↦ v_2 are equivalent, and so Theorem 6.40 implies that there is an automorphism of H mapping v_1 to v_2. This automorphism gives an isomorphism between H_1 and H_2.

6.4.2. The size of basis graphs. Every element of the factor algebra Q_k/H has many representations as a quantum graph in Q_k. The following theorem asserts that it has a representation whose constituents are (in a sense) small.

Theorem 6.45. Let H be a weighted graph with V (H) = [q]. The algebra Q_k/H is generated by simple k-labeled graphs with at most 2(k + q^2)q^6 nodes, in which the labeled nodes form a stable set.

Proof. Let F = (V, E) be any k-labeled graph; we construct a simple k-labeled quantum graph x, where each constituent has no more than 2(k + q^2)q^6 nodes, and F ≡ x (mod H).

Let z be a 2-labeled quantum graph such that hom_φ(z, H) = 1(φ(1) = φ(2)) for all φ : {1, 2} → [q]. (So z is very similar to a contractor. We have z^2 = z, but z ◦ z ≠ z.) We may assume that every constituent of z has at most 2q^6 nodes (Supplement 6.29 and the Remark after it). Let z̄ = O_2 − z. Let w be a simple connector; we can assume that w is a linear combination of paths of length at least 3 and at most q + 3 by Exercise 6.46.

Let us glue a copy of O_2 on every pair of distinct nodes of V; this does not change F. But we can expand every O_2 as O_2 = z + z̄, and obtain a representation of F as a sum of quantum graphs x_ℓ (ℓ = 1, . . . , 2^{\binom{|V|}{2}}), each of which is obtained from F by gluing either z or z̄ on every pair of nodes in V.

Many of these terms will be 0. For any term x_ℓ, let G_ℓ denote the graph on V in which two nodes are connected if and only if they have a copy of z glued on. If (i, j) and (j, k) have z glued on, but (i, k) has z̄, then the union of these three is 0 as a quantum graph in Q_3 (this is easy to check; cf. Exercise 6.32). Hence if x_ℓ is a nonzero term, then adjacency must be transitive in G_ℓ, and so G_ℓ consists of disjoint complete graphs. If G_ℓ has more than q components, then any map V → V (H) will collapse two nodes of V on which a z̄ is glued, and hence x_ℓ = 0. So we are left with only those terms in which G_ℓ consists of at most q disjoint
complete graphs. Let V = V_1 ∪ · · · ∪ V_r be the partition into the node sets of these components (r ≤ q). Let us select a representative node v_i from every V_i. It is easy to see that deleting the copies of z except those which are attached to a v_i, and also the copies of z̄ except those connecting two nodes v_i, does not change x_ℓ. If uv ∈ E with u ∈ V_i and v ∈ V_j (i ≠ j), then we can "slide" this edge to v_i v_j without changing x_ℓ (cf. Exercise 6.31). If u, v ∈ V_i, then we replace the edge uv by a simple connector w in which the labeled nodes are at a distance at least 3 (cf. Exercise 6.46), and then slide both attachment nodes to v_i, to get a copy of w′ hanging from v_i.

Each constituent of the resulting quantum graph consists of a "core", the set of the nodes v_i and the set of labeled nodes, at most k + q nodes altogether. What is not bounded is the sets of edges connecting a v_i and a v_j, the sets of copies of w′ hanging from a v_i, and the copies of z connecting v_i to other nodes in V_i. However, we can get rid of these unbounded multiplicities. First, a set of q^2 or more parallel edges can be replaced by a linear combination of sets of parallel edges with multiplicity at most q^2 − 1, by Exercise 6.6(b). By a similar argument, a set of q or more copies of w′ hanging from the same node v_i can be expressed as a linear combination of sets of at most q − 1 copies. Finally, again by the same argument, a set of q or more copies of z connecting v_i to unlabeled nodes can be expressed as a linear combination of sets of at most q − 1 copies. So we are left with at most \binom{q}{2}(q^2 − 1) edges that may be parallel to others, at most q(q − 1) hanging copies of w′, and at most k + q(q − 1) copies of z.

We get rid of the edge multiplicities by replacing each edge between core nodes by a simple connector w. After that, each constituent will be a simple graph. By the choice of z and q, the number of nodes in each constituent will be bounded by

k + q + (q + 2)\binom{q}{2}(q^2 − 1) + (q + 2)q(q − 1) + (k + q(q − 1))(2q^6) < 2(k + q^2)q^6.

As an application of the previous theorem, we prove Theorem 5.33 in its full strength, including the bounds on the sizes of the graphs needed.

Proof of Theorem 5.33. Following the proof of Corollary 6.44, we have to show that (6.35) holds. We do know that it holds for every F with at most 2(v(H_1) + v(H_2) + 3)^8 nodes. Since this includes all basis graphs of Q_1/H by Theorem 6.45, it follows that (6.35) holds for all simple 1-labeled graphs F. From here, the proof is unchanged.

Exercise 6.46. Prove that for every weighted graph H with q nodes and every t ≥ 2, hom(., H) has a connector whose constituents are P_t^{••}, P_{t+1}^{••}, . . . , P_{t+q}^{••}.

Exercise 6.47. Prove that for every weighted graph H, hom(., H) has a contractor whose constituents are series-parallel graphs.
6.5. Computing parameters with finite connection rank

As an application of graph algebras, we prove the theorem announced before, namely that graph parameters with finite connection rank can be computed in polynomial time for certain classes of graphs. Of course, the algorithm could be described without reference to algebras, but I feel that the essence is better shown through this tool.
Suppose you want to evaluate a graph parameter on a graph G. There is a cutset of k nodes in the graph, and while you know everything about one side of the cut, you have to pay for information about the other side. How much information do you need about the other side? To avoid the trivial solution "just tell me the value of the parameter, if my side looks like this", let us assume that the information about the other side must be independent of what is on our side, and it is encoded in the form of an m-tuple of real numbers. Furthermore, the answer must be obtained by taking an appropriate linear combination of these m numbers, with coefficients that depend only on the graph on our side.

As an example, let k = 1 (so we have a cutset {v} with one node), and suppose that we want to compute the number of independent sets in the whole graph. Then we need to know two numbers about the other side: the number a_0 of independent sets not containing v, and the number a_1 of independent sets containing v. We determine the analogous numbers b_0, b_1 for our side, and then the number of independent sets in the whole graph is a_0 b_0 + a_1 b_1.

One reason to be interested in the finiteness of connection rank is the fact that such a graph parameter can be evaluated in polynomial time for graphs with bounded treewidth, based on the idea explained above. The treewidth of a graph G is defined as follows. A tree-decomposition of a graph G is determined by a tree T and a family (G_i)_{i∈V(T)} of induced subgraphs of G such that G = ∪_i G_i and whenever i is on the path from j to k in T (i, j, k ∈ V (T)), then V (G_i) ⊇ V (G_j) ∩ V (G_k). The tree-width of a graph G is the smallest integer k such that G has a tree-decomposition into subgraphs of size at most k + 1.

Theorem 6.48. Let f be a graph parameter and k ≥ 0. If r(f, k) is finite, then f can be computed in polynomial time for graphs with treewidth at most k.

Proof. We describe a dynamic programming algorithm to compute the parameter. If the connection matrix M(f, k) has finite rank m, then the graph algebra Q_m/f is finite dimensional for all m ≤ k (Exercise 4.30). We need to do some (large, but finite) precomputation. First, we compute a basis B_m, consisting of (ordinary) m-labeled graphs, for each of the algebras Q_m/f. We also express the product of any two basis elements in this basis (the "Schur constants"). Second, let H be an l-labeled graph with at most k + 1 nodes (l ≤ k), and for every ordered subset S ⊆ V (H) with |S| ≤ k, let a basis graph F_S ∈ B_{|S|} be assigned. Let us glue the labeled nodes of F_S onto the set S (erasing the labels in F_S at the same time), to get an l-labeled graph H′. We compute the representation of H′ in the basis B_l. We do this precomputation for every H and every assignment of basis graphs F_S. Third, we compute the values f(G) for every G ∈ B_0.

Let G be a graph with treewidth at most k, and let (G_i)_{i∈V(T)} be a tree-decomposition of the graph with v(G_i) ≤ k + 1. Designate any leaf r of T as its root, and for i ∈ V (T) \ {r}, let i′ denote its parent. For every node i ∈ V (T) \ {r}, the set S_i = V (G_i) ∩ V (G_{i′}) is a cutset in G with k_i ≤ k nodes. Let F_i denote the union of all graphs G_j where j is a descendant of i (including i), in which the k_i nodes of S_i are labeled. The algorithm will consist of expressing every F_i in the basis B_{k_i}, starting from the leaves and working our way up to the root.
Suppose that such an expression has been computed for every proper descendant of i. The k_i-labeled graph F_i is obtained from G_i by attaching different branches F_j at the sets S_j. We already know how to express each F_j in the basis B_{k_j}; let us substitute this expression for F_j, to get a representation of F_i as a linear combination of graphs, each of which consists of G_i with some number of basis graphs attached at various subsets S ⊆ V (G_i) with |S| ≤ k. If two or more basis graphs are attached at the same set S, we can replace them by one, since we have precomputed products of basis graphs. But then we have a linear combination of k_i-labeled graphs of the type we have already expressed in the basis B_{k_i}. When we get to the root, we consider it 0-labeled, and get an expression for G in the basis B_0, which yields the value f(G).

6.6. The polynomial method

In this section we describe a method of proving representation theorems for graph parameters, which depends on commutative algebra (properties of multivariate polynomials). This method was developed by B. Szegedy [2007] for the proof of the characterization of graph parameters that are partition functions of edge coloring models (we will describe this result without proof in Section 23.2). Here we use an adaptation of this method to prove Theorem 5.57, due to Schrijver [2009].

The basic idea is to treat the edge weights of the target graph H as variables. Then homomorphism numbers into H will be polynomials in these variables. One treats this as a polynomial-valued graph parameter, works out the corresponding graph algebras, and then proves that one can find a substitution for the variables that reproduces the given graph parameter.

Let H be a weighted graph on [q], in which the nodeweights are 1, and the edgeweights are different variables x_ij (where x_ij = x_ji). It will be convenient to arrange these variables into a symmetric q × q matrix X. Every substitution of complex numbers for the x_ij gives a complex valued homomorphism function into an edge-weighted graph on [q], and vice versa. The homomorphism number

hom(G, H) = ∑_{φ:V(G)→[q]} ∏_{ij∈E(G)} x_{φ(i)φ(j)}

is a polynomial in the \binom{q}{2} + q variables x_ij, which we denote by hom(G, X). We define the polynomial inj(G, X) analogously. Clearly hom(K_0, X) = 1, hom(K_1, X) = q, and hom(G_1 G_2, X) = hom(G_1, X)hom(G_2, X). In particular, hom(GK_1, X) = q hom(G, X) for every graph G. We extend the definition linearly to quantum graphs, to get polynomials hom(g, X), inj(g, X) ∈ C[X] associated with every quantum graph g.

We start with describing the range and kernel of the map g ↦ hom(g, X) (where q is fixed). Clearly, hom(g, X) is invariant under the permutations of [q]. To be more precise, if σ ∈ S_q and we define X^σ = (x_{σ(i),σ(j)} : i, j ∈ [q]), then trivially hom(., X) = hom(., X^σ). Let C[X]^{S_q} denote the space of polynomials in C[X] that are invariant under S_q in this sense.

Lemma 6.49. The polynomials hom(g, X), where g ∈ Q_0, form the space C[X]^{S_q}.

Proof. Let X^a = ∏_{i≤j} x_{ij}^{a_{ij}} be any monomial, and let G denote the multigraph on [q] in which nodes i and j are connected by a_{ij} edges. Then

inj(G, X) = ∑_{σ∈S_q} (X^σ)^a.

Since every polynomial in C[X]^{S_q} can be written as a linear combination (with constant coefficients) of such special polynomials, it follows that every polynomial in C[X]^{S_q} can be written as inj(g, X) for some quantum graph g. By identity (5.18) (which remains valid if the graph G is replaced by the matrix X), this implies the Lemma.
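The polynomial hom(G, X) is easy to generate symbolically. The following sketch (Python with sympy; the graphs and the choice q = 2 are made-up illustrations) verifies on tiny cases two basic facts about these polynomials stated above: invariance under S_q and multiplicativity over disjoint unions.

```python
import itertools
import sympy as sp

q = 2
x = {(i, j): sp.Symbol(f"x{min(i, j)}{max(i, j)}")       # x_ij = x_ji
     for i in range(q) for j in range(q)}

def hom_poly(edges, n_nodes):
    """hom(G, X): sum over maps phi: V(G) -> [q] of prod_{ij in E} x_{phi(i)phi(j)}."""
    total = sp.Integer(0)
    for phi in itertools.product(range(q), repeat=n_nodes):
        term = sp.Integer(1)
        for (i, j) in edges:
            term *= x[(phi[i], phi[j])]
        total += term
    return sp.expand(total)

edge = hom_poly([(0, 1)], 2)
triangle = hom_poly([(0, 1), (1, 2), (0, 2)], 3)

# invariance under the S_q action on X (here: swapping the two node classes)
swap = {x[(0, 0)]: x[(1, 1)], x[(1, 1)]: x[(0, 0)]}
print(sp.expand(triangle.subs(swap, simultaneous=True) - triangle) == 0)  # True

# multiplicativity over disjoint union: hom(G1 G2, X) = hom(G1, X) hom(G2, X)
two_edges = hom_poly([(0, 1), (2, 3)], 4)
print(sp.expand(two_edges - edge * edge) == 0)                            # True
```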
Next, we describe quantum graphs g with hom(g, X) = 0 (identically 0 as a polynomial in the entries of X). Note that if we remove an isolated node from a constituent of any quantum graph g, and multiply its coefficient by q, then we get a quantum graph g′ such that hom(g, X) = hom(g′, X). Let us call the repeated application of this operation an isolate removal.

Lemma 6.50. A quantum graph g satisfies hom(g, X) = 0 if and only if there is a quantum graph h in which all constituents have more than q nodes such that removing isolates from g we obtain M h.

Proof. If g = M h, where all constituents of h have more than q nodes, then hom(g, X) = inj(h, X) = 0. Isolate removal does not change the value of hom(g, X). Conversely, suppose that hom(g, X) = 0. We may assume that the constituents of g have no isolated nodes. We have inj(Zg, X) = hom(g, X) = 0. If Zg has a constituent with at most q nodes, then this produces in inj(Zg, X) a term which does not cancel (here we use that the constituent has no isolated nodes). So all constituents of Zg have more than q nodes, and we can take h = Zg.

Now we are ready to prove Theorems 5.56 and 5.57.

Proof of Theorem 5.56. Multiplicativity implies that f(K_0) = 1 (since f is not identically 0), and f(GK_1) = qf(G). We want to prove that f = hom(., A) for an appropriate symmetric complex matrix A; in other words, we want to show that the polynomial equations

(6.36)    hom(G, X) − f(G) = 0    (for all looped multigraphs G)

are solvable for the variables x_ij (1 ≤ i, j ≤ q) over the complex numbers. We are going to use Hilbert's Nullstellensatz for this, but we need some preparation. We begin with relating the kernel of the map hom(., X) to the kernel of f.

Claim 6.51. If hom(g, X) = c (a constant polynomial) for some quantum graph g, then f(g) = c.

First we consider the case when c = 0. We may assume that g has no isolated nodes, since isolate removal does not change the values hom(g, X) and f(g). By Lemma 6.50, g = M h for some quantum graph h in which all constituents have more than q nodes. But then f(M h) = 0 by the hypothesis of the Theorem. The case of general constant c follows easily: we have hom(g − cK_0, X) = hom(g, X) − c = 0, and hence f(g) = f(g − cK_0) + c = c.

Claim 6.52. The ideal generated by the polynomials hom(g, X) with f(g) = 0 does not contain the constant polynomial 1.

Suppose that we have a representation

1 = ∑_{i=1}^N p_i(X) hom(g_i, X),
where f(g_i) = 0, and the p_i are arbitrary polynomials in C[X]. Let us apply a permutation σ ∈ S_q to the variables, and sum over all σ. We get:

q! = ∑_{i=1}^N ∑_{σ∈S_q} p_i(X^σ) hom(g_i, X^σ) = ∑_{i=1}^N (∑_{σ∈S_q} p_i(X^σ)) hom(g_i, X).

The expression in the large parenthesis is a polynomial in C[X]^{S_q}, and hence by Lemma 6.49, it is a polynomial of the form hom(h_i, X). Hence we get

q! = ∑_{i=1}^N hom(h_i, X) hom(g_i, X) = hom(∑_{i=1}^N h_i g_i, X).
By Claim 6.51, this implies that

f(∑_{i=1}^N h_i g_i) = q!.

But we have, using the multiplicativity of f,

f(∑_{i=1}^N h_i g_i) = ∑_{i=1}^N f(h_i g_i) = ∑_{i=1}^N f(h_i) f(g_i) = 0,
a contradiction. So Claim 6.52 is proved.

Now it is easy to finish the proof of the theorem. Claim 6.52 and the Nullstellensatz imply that there are complex numbers a_ij such that a_ij = a_ji (1 ≤ i, j ≤ q), and hom(g, A) = 0 for every quantum graph g for which f(g) = 0 (where A is the matrix with entries a_ij). Applying this to the quantum graph G − f(G)K_0 (where G is an arbitrary multigraph), we get that hom(G − f(G)K_0, A) = 0, and hence f(G) = hom(G, A).

Proof of Theorem 5.57. We will apply Theorem 5.56, but first we need to show a couple of properties of f following from reflection positivity.

Claim 6.53. f(K_1) is a nonnegative integer.

Let q = f(K_1). For k ≥ 1, and for a partition P = (S_1, . . . , S_m) of [k], let U_P denote the k-multilabeled graph on [m] with no edges, where node i is labeled by the elements of S_i. Consider the submatrix M of M^{mult}(f, k) formed by those rows and columns indexed by the graphs U_P. Let h_k = ∑_P μ_P U_P. Then using identity (A.5) for the Möbius function, we get

(6.37)    ⟨h_k, h_k⟩ = ∑_{P,Q} μ_P μ_Q f(U_{P∨Q}) = ∑_{P,Q} μ_P μ_Q q^{|P∨Q|} = q(q − 1) · · · (q − k + 1).
Since f is reflection positive, this value must be nonnegative for every k, which implies that q is a nonnegative integer.

Claim 6.54. If G is a multigraph with k = v(G) > q, then f(M G) = 0.

Let us label the nodes of G by [k], to get a k-labeled graph. Then M G = [[h_k G]], and so f(M G) = ⟨h_k, G⟩. Equation (6.37) implies that ⟨h_k, h_k⟩ = 0, which (using reflection positivity again) implies that ⟨h_k, G⟩ = 0, which proves the Claim.

So Theorem 5.56 applies, and we get that there exists a symmetric matrix A ∈ C^{q×q} for which f = hom(., A). To complete the proof, we have to show:
Claim 6.55. The matrix A is real.

This does not follow just from the assumption that f is real (see Exercise 6.56); we have to use reflection positivity again. Suppose that A has an entry a_uv which is not real. There is a polynomial p = ∑_{j=0}^N c_j z^j ∈ C[z] such that

p(z) = i if z = a_uv,    p(z) = −i if z = ā_uv,    p(z) = 0 if z ∈ {a_st, ā_st} for some entry a_st ≠ a_uv, ā_uv

(so p takes pure imaginary values on the entries of A and on their conjugates). This polynomial may have complex coefficients, but it is easy to see that its complex conjugate p̄(z) satisfies the same conditions, and hence, replacing p by (p + p̄)/2 if necessary, we may assume that p has real coefficients.

Consider the 2-labeled quantum graph g = ∑_j c_j B_j^{••}. We have

⟨g, g⟩ = ∑_{k,j} c_k c_j hom(B_{k+j}^{••}, A) = ∑_{k,j} ∑_{u,v} c_k c_j a_uv^{k+j} = ∑_{u,v} p(a_uv)^2 < 0,

which contradicts the assumption that f is reflection positive.

Exercise 6.56. Show by an example that hom(G, Z) can be real for every multigraph G for a non-real matrix Z.
Part 3
Limits of dense graph sequences
CHAPTER 7
Kernels and graphons

The aim of this Chapter is to introduce certain analytic objects, which will serve as limit objects for graph sequences in the dense case. In the Introduction (Section 1.5.3) we already gave an informal description of how these graphons enter the picture as limit objects; however, for the next few chapters we will not talk about graph sequences, but we treat graphons as generalizations of graphs, to which many graph-theoretic definitions and results can be extended. Quite often, the formulation and even the proof of these more general facts are easier in this analytic setting. We will define the cut norm and cut-distance of these objects, state and prove regularity lemmas for them, and prove basic properties of sampling from them. These results will enable us to show that these are just the right objects to represent the limits of convergent dense graph sequences.

7.1. Kernels, graphons and stepfunctions

Let W denote the space of all bounded symmetric measurable functions W : [0, 1]^2 → R. The elements of W will be called kernels (the name refers to the fact that they give rise to kernel operators on function spaces on [0, 1]; we will return to this connection in Section 7.5). Let W_0 denote the set of all kernels W ∈ W such that 0 ≤ W ≤ 1. The elements of W_0 will be called graphons (the name comes from the contraction of graph-function). Sometimes we will also need to consider the set of all functions W ∈ W such that −1 ≤ W ≤ 1; this will be denoted by W_1.

As usual, we will not distinguish functions that are almost everywhere equal (. . . most of the time). Then the space W is just the space of symmetric functions in L^∞([0, 1]^2), which we could identify with the space L^∞(T), where T is the triangle {(x, y) ∈ [0, 1]^2 : x ≤ y}. We introduced a separate notation for it because we want to consider a number of different norms on W, of which the L^∞ norm will play a relatively minor role.

A graphon whose values are 0 and 1 can be considered as a graph on node set [0, 1]. In this case, we can talk about its subgraphs, induced subgraphs, complement, and so on. Such 0-1 valued graphons will come up in our discussions repeatedly; however, they would not be sufficient for our main goal, namely, describing limit objects for convergent graph sequences.

Kernels generalize weighted graphs in the following sense. A function W ∈ W is called a stepfunction, if there is a partition S_1 ∪ · · · ∪ S_k of [0, 1] into measurable sets such that W is constant on every product set S_i × S_j. The sets S_i are the steps of W. For every weighted graph H (on node set V (H) = [n]), we define a stepfunction W_H ∈ W as follows: Split [0, 1] into n intervals J_1, . . . , J_n of length λ(J_i) = α_i/α_H, and for x ∈ J_i and y ∈ J_j, let W_H(x, y) = β_ij(H). Note that the function W_H depends on how the nodes of H are labeled.
Conversely, every stepfunction U corresponds to a weighted graph: if S_1, . . . , S_k are its steps, then the graph is defined on [k], and the edge ij has weight U(x, y), where x ∈ S_i and y ∈ S_j. If H is a weighted graph with nodeweights 1 and weighted adjacency matrix A, then we write W_A = W_H. If the edgeweights of H are from the interval [0, 1], then W_H is a graphon. In particular, for every simple (unweighted) graph G, W_G is a 0-1 valued graphon (recall Figure 1.3). In this sense, simple graphs can be considered as special 0-1 valued graphons.

This correspondence with simple graphs suggests how to extend some basic quantities associated with graphs to kernels (or at least to graphons). Most important of these is the (normalized) degree function

(7.1)    d_W(x) = ∫_0^1 W(x, y) dy.

(If the graphon is associated with a simple graph G, this corresponds to the scaled degree d_G(x)/v(G).) We will see more such quantities in the next sections.

Instead of the interval [0, 1], we can consider any probability space (Ω, A, π) with a symmetric measurable function W : Ω × Ω → [0, 1]. This would not provide substantially greater generality, but it is sometimes useful to represent graphons by probability spaces other than [0, 1]. We'll discuss this in detail in Chapter 13, but will use this different way of representing a graphon throughout.

Graphons will come up in several quite different forms in our discussions. In Theorem 11.52 we will collect the many disguises in which they occur.
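To make the stepfunction W_H concrete, here is a small sketch (Python; the weighted graph below, with nodeweights 1, 1, 2, is a made-up example) that realizes W_H and evaluates the degree function (7.1) by quadrature.

```python
import numpy as np

def step_graphon(alpha, B):
    """The stepfunction W_H for a weighted graph H with nodeweights alpha
    and edgeweight matrix B: constant B[i][j] on the box J_i x J_j."""
    alpha = np.asarray(alpha, dtype=float)
    cuts = np.cumsum(alpha) / alpha.sum()        # right endpoints of J_1,...,J_n
    def W(x, y):
        return B[np.searchsorted(cuts, x)][np.searchsorted(cuts, y)]
    return W

W = step_graphon([1, 1, 2], [[0.0, 1.0, 0.5],
                             [1.0, 0.0, 0.2],
                             [0.5, 0.2, 0.0]])

# degree function (7.1): d_W(x) = int_0^1 W(x, y) dy; here x = 0.1 lies in J_1,
# so d_W(x) = 0.25*0 + 0.25*1 + 0.5*0.5 = 0.5
ys = np.linspace(0, 1, 100001)
print(np.trapz([W(0.1, y) for y in ys], ys))     # ~ 0.5
```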
7.2. Generalizing homomorphisms

Homomorphism densities in graphs extend to homomorphism densities in graphons and, more generally, in kernels. For every W ∈ W and multigraph F = (V, E) (without loops), define

t(F, W) = ∫_{[0,1]^V} ∏_{ij∈E} W(x_i, x_j) ∏_{i∈V} dx_i.
We can think of the interval [0, 1] as the set of nodes, and of the value W(x, y) as the weight of the edge xy. Then the formula above is an infinite analogue of weighted homomorphism numbers. We get weighted graph homomorphisms as a special case when W is a stepfunction: For every unweighted multigraph F and weighted graph G,

(7.2)    t(F, G) = t(F, W_G).
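Subgraph densities in kernels are easy to estimate by sampling, which also previews the sampling point of view of later chapters. A minimal Monte Carlo sketch (Python; the kernel W(x, y) = xy is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def t_density(edges, n_nodes, W, samples=200_000):
    """Monte Carlo estimate of t(F, W): average of prod_{ij in E} W(x_i, x_j)
    over independent uniform points x_1, ..., x_n in [0, 1]."""
    x = rng.random((samples, n_nodes))
    vals = np.ones(samples)
    for (i, j) in edges:
        vals *= W(x[:, i], x[:, j])
    return vals.mean()

W = lambda x, y: x * y
print(t_density([(0, 1)], 2, W))                    # t(K2, W) = 1/4
print(t_density([(0, 1), (1, 2), (0, 2)], 3, W))    # t(K3, W) = 1/27 ~ 0.037
```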
Of the two modified versions of homomorphism densities (5.12) and (5.13), the notion of the injective density t_inj has no significance in this context, since a random assignment i ↦ x_i (i ∈ V (F), x_i ∈ [0, 1]) is injective with probability 1. In other words, t_inj(F, W) = t(F, W) for any kernel W and any graph F. But the induced subgraph density is worth defining, and in fact it can be expressed by a rather
simple integral:

(7.3)    t_ind(F, W) = ∫_{[0,1]^V} ∏_{ij∈E} W(x_i, x_j) ∏_{ij∈\binom{V}{2}\setminus E} (1 − W(x_i, x_j)) ∏_{i∈V} dx_i.
We have an analogue of the inclusion-exclusion formula (5.20), which follows by expanding the parentheses in the integrand (7.3):

(7.4)    t_ind(F, W) = ∑_{F′⊇F, V(F′)=V(F)} (−1)^{e(F′)−e(F)} t(F′, W) = t^↑(F, W).
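As a quick sanity check of (7.4), here is a sketch reusing t_density and W from the snippet above, with F = P_3, the path on three nodes: the only supergraph of P_3 on the same node set is the triangle, so t_ind(P_3, W) = t(P_3, W) − t(K_3, W).

```python
t_P3 = t_density([(0, 1), (1, 2)], 3, W)          # exact value 1/12
t_K3 = t_density([(0, 1), (1, 2), (0, 2)], 3, W)  # exact value 1/27
print(t_P3 - t_K3, 1/12 - 1/27)                   # both ~ 5/108 ~ 0.0463
```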
We should point out that t_inj(F, W_H) ≠ t_inj(F, H) and t_ind(F, W_H) ≠ t_ind(F, H) in general. We have seen that t_inj(F, W_H) = t(F, W_H) = t(F, H). For the induced density, t_ind(F, W_H) has a combinatorial meaning if H is a looped-simple graph: it is the probability that a random map V (F) → V (H) (not necessarily injective) preserves both adjacency and nonadjacency.

Many other basic properties of homomorphism numbers extend to graphons, often to kernels, in a straightforward way, like (5.19) generalizes to

(7.5)    t(F, W) = ∑_{F′⊇F} t_ind(F′, W),
and (5.28) generalizes to the identity

(7.6)    t(F_1 F_2, W) = t(F_1, W) t(F_2, W).
We can also generalize homomorphism numbers from partially labeled graphs. Let F = (V, E) be a k-labeled multigraph. Let V_0 = V \ [k] be the set of unlabeled nodes. For W ∈ W and x_1, . . . , x_k ∈ [0, 1], we define

t_{x_1,...,x_k}(F, W) = ∫_{[0,1]^{V_0}} ∏_{ij∈E} W(x_i, x_j) ∏_{i∈V_0} dx_i
(this is a function of x_1, . . . , x_k). In particular, we have t_x(K_2^•, W) = d_W(x). It is often convenient to use the notation t_x, where x = (x_1, . . . , x_k). The product of two k-labeled graphs F_1 and F_2 satisfies

(7.7)    t_x(F_1 F_2, W) = t_x(F_1, W) t_x(F_2, W).

If F′ arises from F by unlabeling node k (say), then

(7.8)    t_{x_1,...,x_{k−1}}(F′, W) = ∫_{[0,1]} t_{x_1,...,x_k}(F, W) dx_k.
By repeated application of this equation, we get that if F is a k-labeled multigraph, then

(7.9)    t([[F]], W) = ∫_{[0,1]^k} t_x(F, W) dx.
Further versions of homomorphism densities treated before can be extended to homomorphism densities in kernels in a straightforward way. Homomorphism densities of quantum graphs in kernels are defined simply by linearity. Densities of signed graphs can be defined by generalizing expression (7.3) for the induced subgraph
densities. Explicitly, let F = (V, E^+, E^−) be a signed graph and W ∈ W; then we define

(7.10)    t(F, W) = ∫_{[0,1]^V} ∏_{ij∈E^+} W(x_i, x_j) ∏_{ij∈E^−} (1 − W(x_i, x_j)) ∏_{i∈V} dx_i.
From this definition, it follows that if W is a graphon, then 0 ≤ t(F, W) ≤ 1 for every signed graph F. We can also express t(F, W) as

(7.11)    t(F, W) = ∑_{Y⊆E^−} (−1)^{|Y|} t((V, E^+ ∪ Y), W).
This shows that we can still identify a signed graph F = (V, E^+, E^−) with the quantum graph ∑_{Y⊆E^−} (−1)^{|Y|} (V, E^+ ∪ Y). If all edges are signed "+", then t(F, W) is the same as for unsigned graphs. If F̂ is the signed complete graph, obtained from an unsigned simple graph F on the same node set, in which the edges of F are signed positive and the edges of the complement are signed negative, then we get the following identity, equivalent to (7.4):

(7.12)    t(F̂, W) = t_ind(F, W).
We define the induced density t_ind,x(F, W) of a k-labeled graph F, and the density of a k-labeled signed graph or quantum graph, in the obvious way.

The following proposition states some main properties of subgraph densities in kernels.

Proposition 7.1. The graph parameter t(., W) is multiplicative and reflection positive for every kernel W ∈ W. The corresponding simple graph parameter is also multiplicative, and it is reflection positive if W ∈ W_0.

Proof. The second assertion is more difficult to prove, and we describe the proof in this case only. Multiplicativity is trivial. To prove that t(., W) is reflection positive, consider any finite set F_1, . . . , F_m of k-labeled graphs, and real numbers y_1, . . . , y_m. We want to prove that

∑_{p,q=1}^m y_p y_q t([[F_p F_q]], W) ≥ 0.
For every k-labeled graph F with node set [n], let F′ denote the subgraph of F induced by the labeled nodes, and F′′ denote the graph obtained from F by deleting the edges spanned by the labeled nodes. Then we have

(7.13)    ∑_{p,q=1}^m y_p y_q t([[F_p F_q]], W) = ∫_{[0,1]^k} ∑_{p,q=1}^m y_p y_q t_x(F_p′′, W) t_x(F_q′′, W) t_x(F_p′ ∪ F_q′, W) dx.

We substitute t_x(F_p′ ∪ F_q′, W) = ∑_H t_ind,x(H, W), where the summation extends over all graphs on [k] containing F_p′ ∪ F_q′ as a subgraph. Interchanging summation,
we get

(7.14)    ∑_{p,q=1}^m y_p y_q t([[F_p F_q]], W) = ∫_{[0,1]^k} ∑_H ∑_{F_p,F_q⊆H} y_p y_q t_x(F_p′′, W) t_x(F_q′′, W) t_ind,x(H, W) dx.

For a fixed H, the integrand can be written as

∑_{F_p,F_q⊆H} y_p y_q t_x(F_p′′, W) t_x(F_q′′, W) t_ind,x(H, W) = (∑_{F_p⊆H} y_p t_x(F_p′′, W))^2 t_ind,x(H, W),

which is nonnegative by the assumption that 0 ≤ W ≤ 1.
We will see (Theorem 11.52) that multiplicativity and reflection positivity, together with the trivial condition that t(K_1) = 1, characterize simple graph parameters of the form t(F, W). For a general graphon W, the graph parameter t(., W) cannot be represented as a homomorphism number into a (finite) weighted graph: it is multiplicative and reflection positive, but it may have infinite connection rank. We will see (Corollary 13.48) that t(., W) has finite connection rank if and only if W is equal to a stepfunction almost everywhere. The multigraph parameter t(., W) is contractible, but has no contractor. This will follow from Theorem 6.30 together with the uniqueness of representation of a parameter in the form t(., W) (Theorem 13.10).

Example 7.2 (Eulerian orientations revisited). We have seen that the number of eulerian orientations eul(G) is not a homomorphism function. However, it can be expressed as a homomorphism density in a kernel:

(7.15)    eul(F) = t(F, 2cos(2π(x − y))).

Indeed, we can write 2cos(2π(x − y)) = e^{2πi(x−y)} + e^{2πi(y−x)}, so if we expand the product

∏_{uv∈E(F)} (e^{2πi(x_u−x_v)} + e^{2πi(x_v−x_u)}),

then every term corresponds to an orientation F⃗ of F, where selecting e^{2πi(x_v−x_u)} corresponds to orienting the edge uv from u to v. Thus

∏_{uv∈E(F)} 2cos(2π(x_u − x_v)) = ∑_{F⃗} ∏_{uv∈E(F⃗)} e^{2πi(x_v−x_u)} = ∑_{F⃗} ∏_{u∈V(F)} e^{2πi(d^−_{F⃗}(u)−d^+_{F⃗}(u))x_u}.

If we integrate over all the x_u, every term cancels in which the orientation is not eulerian, i.e., where any of the nodes u has d^+_{F⃗}(u) − d^−_{F⃗}(u) ≠ 0. Those terms corresponding to eulerian orientations contribute 1. So the sum counts eulerian orientations.
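Identity (7.15) can be checked numerically on small graphs. The sketch below (Python; a brute-force count over the 2^{e(F)} orientations, and a plain Monte Carlo estimate of the density, so the second number is only approximate):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def eul_count(edges, n):
    """Count orientations in which every node has equal in- and out-degree."""
    count = 0
    for signs in itertools.product((1, -1), repeat=len(edges)):
        imbalance = [0] * n
        for s, (u, v) in zip(signs, edges):
            imbalance[u] += s
            imbalance[v] -= s
        count += all(d == 0 for d in imbalance)
    return count

def t_cos(edges, n, samples=400_000):
    """Monte Carlo estimate of t(F, 2 cos(2 pi (x - y)))."""
    x = rng.random((samples, n))
    vals = np.ones(samples)
    for (u, v) in edges:
        vals *= 2 * np.cos(2 * np.pi * (x[:, u] - x[:, v]))
    return vals.mean()

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(eul_count(C4, 4), t_cos(C4, 4))   # 2 and roughly 2
```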
Remark 7.3. There is probably no good way to define homomorphism numbers from graphons into graphs or into other graphons. The parameters related to such homomorphisms that extend naturally to graphons are defined by maximization, like the normalized maximum cut, and more generally, restricted maximum multicuts. We will discuss these in Chapter 12.

We can generalize the functional t(F, W) further (believe me, not for the sake of generality). Let A be a set of kernels. An A-decorated graph is a finite simple graph F = (V, E) in which every edge e ∈ E is labeled by a function W_e ∈ A. We write w = (W_e : e ∈ E). For every W-decorated graph (F, w) we define

(7.16)    t(F, w) = ∫_{[0,1]^V} ∏_{ij∈E} W_{ij}(x_i, x_j) ∏_{i∈V} dx_i.
For a fixed graph F, the functional t(F, w) is linear in every edge decoration W_e. So it may be considered as a linear functional on the tensor product W ⊗ · · · ⊗ W (one factor for every edge of F), or equivalently, as a tensor on W with e(F) slots. This definition contains some of the previous variations on homomorphism numbers, and it can be used to express homomorphism densities in sums of kernels.

Example 7.4. Let F = (V, E^+, E^−) be a signed graph and W ∈ W_0. Let us decorate each edge in E^+ by W, and each edge in E^− by 1 − W. Let F_0 be the unsigned version of F. Then for the W-decorated graph (F, w) obtained this way, we have t(F_0, w) = t(F, W).

Example 7.5. For W_1, . . . , W_k ∈ W, we have

t(F, W_1 + · · · + W_k) = ∑_w t(F, w),

where w ranges over all {W_1, . . . , W_k}-decorations of F.

Exercise 7.6. Let F and G be two simple graphs, and let W be a graphon such that t(F, G) > 0 and t(G, W) > 0. Prove that t(F, W) > 0. [Hint: Use the Lebesgue Density Theorem.]

Exercise 7.7. Prove that for any two simple graphs F and G with v(F) ≤ v(G) we have

|t_ind(F, G) − t_ind(F, W_G)| ≤ \binom{v(F)}{2} / v(G).

Exercise 7.8. Let us generalize the construction of graph integrals by adding "nodeweights": for every graph F and bounded measurable functions α : [0, 1] → R and W : [0, 1]^2 → R (where W is symmetric), we define

t(F, α, W) = ∫_{[0,1]^{V(F)}} ∏_{i∈V(F)} α(x_i) ∏_{ij∈E(F)} W(x_i, x_j) dx.

Show that if we require that α ≥ 0, then t(F, α, W) can be expressed as c^{v(F)} t(F, U) with some c ≥ 0 and U : [0, 1]^2 → R, where c and U depend on α and W, but not on F.

Exercise 7.9. Prove that the number of perfect matchings in a graph G = (V, E) can be expressed as t(G, e^{−2πix}, 1 + e^{2πi(x+y)}).
7.3. Weak isomorphism I

One complication caused by moving to infinite objects is that isomorphism does not have an obvious (and unique) definition any more. We can of course talk about two kernels U, W being equal as functions, but this is not very useful. More in the spirit of functional analysis, we will talk about the two kernels being equal almost everywhere, i.e., W(x, y) = U(x, y) for almost all (x, y) ∈ [0, 1]^2 (with respect to the Lebesgue measure). This notion, however, is not what we mean by two kernels being "essentially the same": it corresponds to the equality of labeled graphs, not to isomorphism of unlabeled graphs, which involves finding the right bijection between the node sets. In terms of graphons (or kernels), we can define this as follows: two kernels U, W ∈ W are isomorphic up to a null set if there is an invertible measure preserving map φ : [0, 1] → [0, 1] such that U(φ(x), φ(y)) = W(x, y) almost everywhere. (See Appendix A.3 and the book of Sinai [1976] for the basics of measure preserving maps.) Since the inverse of an invertible measure preserving map φ : [0, 1] → [0, 1] is also measure preserving, isomorphism up to a null set is an equivalence relation.

However, there is a weaker notion of isomorphism, which will be more important for us. The motivation for this notion is the fact that a measure preserving map need not be invertible. Let W ∈ W and let φ : [0, 1] → [0, 1] be a measure preserving map. We define a kernel W^φ by

W^φ(x, y) = W(φ(x), φ(y)).

From the point of view of using these functions as continuous analogues of graphs, the functions W and W^φ are not essentially different. For example, we have the following important fact:

Proposition 7.10. Let W ∈ W and let φ : [0, 1] → [0, 1] be a measure preserving map. Then for every multigraph F = (V, E), we have t(F, W^φ) = t(F, W).

Proof. This follows from the fact that (x_1, . . . , x_n) ↦ (φ(x_1), . . . , φ(x_n)) is a measure preserving map [0, 1]^n → [0, 1]^n, and hence for every integrable function f : [0, 1]^n → R we have

∫_{[0,1]^n} f(φ(x_1), . . . , φ(x_n)) dx_1 . . . dx_n = ∫_{[0,1]^n} f(x_1, . . . , x_n) dx_1 . . . dx_n

by (A.16) in the Appendix. Applying this equation to the function f(x_1, . . . , x_n) = ∏_{ij∈E} W(x_i, x_j), we get the assertion.

We want to say that W and W^φ are "weakly isomorphic". One has to be a little careful though, because measure preserving maps are not necessarily invertible, and so the relationship between W and W^φ in Proposition 7.10 is not symmetric (see Example 7.11). For the time being, we take the easy way out, and call two kernels U and W weakly isomorphic if t(F, U) = t(F, W) for every simple graph F. We will come back to a characterization of weakly isomorphic kernels in terms of measure preserving maps (in other words, proving a certain converse of Proposition 7.10) in Sections 10.7 and 13.2. It will also follow that in this case the equation t(F, U) = t(F, W) holds for all multigraphs F (see Exercise 7.18 for a direct proof). Weak isomorphism of kernels is clearly an equivalence relation, and we can identify kernels that are weakly isomorphic. This identification will play an important role in our discussions.
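Proposition 7.10 is also easy to test numerically; a sketch (reusing t_density from Section 7.2's snippet, with an illustrative graphon and the measure preserving map φ(x) = 2x mod 1 of Example 7.11 below):

```python
W2 = lambda x, y: (x * y + (1 - x) * (1 - y)) / 2       # an illustrative graphon
W2_phi = lambda x, y: W2((2 * x) % 1, (2 * y) % 1)      # the pullback W^phi

tri = [(0, 1), (1, 2), (0, 2)]
print(t_density(tri, 3, W2), t_density(tri, 3, W2_phi))  # agree up to sampling error
```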
Example 7.11. The map φ_2 : x ↦ 2x (mod 1) is measure preserving. For every kernel W, the kernel W^{φ_2} consists of four "copies" of W (see Figure 7.1). Similarly, φ_3 : x ↦ 3x (mod 1) is measure preserving, and W^{φ_3} consists of nine "copies" of W. The kernels W, W^{φ_2} and W^{φ_3} are weakly isomorphic, but there is no measure preserving map transforming W^{φ_2} to W^{φ_3} (Exercise 7.13).
Figure 7.1. Gray-scale images of the three graphons W, W^{φ_2} and W^{φ_3}, which are weakly isomorphic, but not isomorphic up to a null set. Recall that the origin is in the upper left corner.

This example illustrates that weak isomorphism is not a very easy notion. We will return to it and develop more and more information about it when we introduce distances between graphons, sampling, twin reduction, and other tools in the theory of graphons.

Exercise 7.12. Suppose that two kernels U and W are weakly isomorphic. Prove that so are the kernels aU + b and aW + b (a, b ∈ R).

Exercise 7.13. Prove that the kernels W, W^{φ_2} and W^{φ_3} in Example 7.11 are weakly isomorphic, but not isomorphic up to a null set.
7.4. Sums and products

Perhaps the first tool we use in graph theory is the decomposition into connected components. For kernels, a similar decomposition exists, but one must be a bit careful with 0-sets. This was worked out by Janson [2008].

Let W_1, W_2, . . . be a finite or countably infinite family of kernels, and let a_1, a_2, . . . be positive real numbers with ∑_i a_i = 1. We define the direct sum of the W_i with weights a_i, in notation W = a_1 W_1 ⊕ a_2 W_2 ⊕ . . . , as follows. We split the interval [0, 1] into intervals J_1, J_2, . . . of length a_1, a_2, . . . , consider the monotone affine maps φ_i mapping J_i onto [0, 1], and let

W(x, y) = W_i(φ_i(x), φ_i(y)) if x, y ∈ J_i for some i = 1, 2, . . . , and W(x, y) = 0 otherwise.

A kernel will be called connected, if it is not isomorphic up to a null set to the direct sum of two kernels. This is equivalent to saying that for every subset S ⊆ [0, 1] with 0 < λ(S) < 1, we have

∫_{S×([0,1]\S)} |W(x, y)| dx dy > 0.
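A direct sum is straightforward to realize as a function on [0, 1]^2; here is a minimal sketch (Python; the two summand kernels are made-up, and the weights are assumed to sum to 1):

```python
import numpy as np

def direct_sum(kernels, weights):
    """W = a_1 W_1 (+) a_2 W_2 (+) ...: block i lives on the interval J_i of
    length a_i, rescaled to [0,1] by the affine map phi_i; 0 between blocks."""
    cuts = np.cumsum(weights)
    starts = np.concatenate(([0.0], cuts[:-1]))
    def W(x, y):
        i, j = np.searchsorted(cuts, x), np.searchsorted(cuts, y)
        if i != j:
            return 0.0
        return kernels[i]((x - starts[i]) / weights[i],
                          (y - starts[i]) / weights[i])
    return W

W = direct_sum([lambda x, y: x * y, lambda x, y: 0.5], [0.5, 0.5])
print(W(0.1, 0.2), W(0.1, 0.9), W(0.6, 0.9))   # 0.08, 0.0, 0.5
```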
Every kernel can be written as the direct sum of connected kernels and perhaps the 0 kernel. (We have to allow the 0 kernel, which cannot be written as the sum of
connected kernels.) This decomposition is unique (up to zero sets); see Bollobás, Janson and Riordan [2007] and Janson [2008] for more.

Somewhat confusingly, we can introduce three "product" operations on kernels, and we will need all three of them. Let U, W ∈ W. We denote by UW their (pointwise) product as functions, i.e., (UW)(x, y) = U(x, y)W(x, y). We denote by U ◦ W their operator product (the name refers to the fact that this is the product of U and W as kernel operators, see Section 7.5):

(U ◦ W)(x, y) = ∫_0^1 U(x, z)W(z, y) dz.
We note that U ◦ W is not symmetric in general, but it will be in the cases we use this operation (for example, when U = W). Finally, we denote by U ⊗ W their tensor product; this is defined as a function [0, 1]^2 × [0, 1]^2 → [0, 1] by

(U ⊗ W)(x_1, x_2, y_1, y_2) = U(x_1, y_1)W(x_2, y_2).

This function is not defined on [0, 1]^2 and hence it is not in W; however, we can consider any measure preserving map φ : [0, 1] → [0, 1]^2, and define the kernel

(U ⊗ W)^φ(x, y) = (U ⊗ W)(φ(x), φ(y)).

It does not really matter which particular measure preserving map we use here: the kernels obtained from different maps φ are weakly isomorphic by the same computation as used in the proof of Proposition 7.10, and so we can call any of them the tensor product of U and W. We note that the tensor product has the nice property that

(7.17)    t(F, U ⊗ W) = t(F, U) t(F, W)

for every multigraph F. We denote the n-th power of a kernel according to these three multiplications by W^n (pointwise power), W^{◦n} (operator power), and W^{⊗n} (tensor power).

There are many other properties and constructions for graphs that can be generalized to graphons in a natural way. For example, we call a graphon W bipartite, if there is a partition [0, 1] = V_1 ∪ V_2 into measurable sets such that W(x_1, x_2) = 0 for almost all (x_1, x_2) ∈ (V_1 × V_1) ∪ (V_2 × V_2). We can define k-colorable kernels similarly. We call a graphon triangle-free, if t(K_3, W) = 0. Simple facts like "every bipartite graphon is triangle-free" can be proved easily. Often one faces minor complications because of exceptional nullsets; a rather general remedy for this problem, called pure graphons, will be introduced in Section 13.3.

Exercise 7.14. Show that for every simple graph F, t(F, W^{◦n}) = t(F′, W), where F′ is obtained from F by subdividing each edge by n − 1 new nodes.

Exercise 7.15. Prove that connectivity of a graphon is invariant under weak isomorphism.

Exercise 7.16. Prove that a graphon W is bipartite if and only if t(C_{2k+1}, W) = 0 for all k ≥ 1.
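On a discretization of [0, 1] the three products become matrix operations, and both (7.17) and Exercise 7.14 can be checked mechanically. A sketch (Python; the grid size and the kernels min(x, y) and xy are illustrative choices):

```python
import numpy as np

n = 40
xs = (np.arange(n) + 0.5) / n
U = np.minimum.outer(xs, xs)            # U(x,y) = min(x,y)
W = np.outer(xs, xs)                    # W(x,y) = xy

op = lambda A, B: A @ B / n             # operator product on the grid
t_tri = lambda A: np.trace(A @ A @ A) / A.shape[0]**3   # t(K3, A) on the grid

# (7.17): the tensor product corresponds to a Kronecker product on the grid
print(t_tri(np.kron(U, W)), t_tri(U) * t_tri(W))        # equal

# Exercise 7.14 (each edge subdivided once): t(K3, W o W) = t(C6, W)
print(t_tri(op(W, W)), np.trace(np.linalg.matrix_power(W, 6)) / n**6)  # equal
```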
7.5. Kernel operators

Every function W ∈ W defines an operator T_W : L^1[0, 1] → L^∞[0, 1], by

(7.18)    (T_W f)(x) = ∫_0^1 W(x, y) f(y) dy.
Sometimes it will be useful to consider T_W as an operator L^∞[0, 1] → L^1[0, 1] or L^2[0, 1] → L^2[0, 1]; the formula is meaningful in each of these cases. If we consider T_W : L^2[0, 1] → L^2[0, 1], then it is a Hilbert-Schmidt operator, and the rich theory of such operators can be applied. It is a compact operator, which has a discrete spectrum, i.e., a countable multiset Spec(W) of nonzero (real) eigenvalues {λ_1, λ_2, . . . } such that λ_n → 0. In particular, every nonzero eigenvalue has finite multiplicity. Furthermore, it has a spectral decomposition

(7.19)    W(x, y) ∼ ∑_k λ_k f_k(x) f_k(y),
where f_k is the eigenfunction belonging to the eigenvalue λ_k with ∥f_k∥_2 = 1. The series on the right may not be almost everywhere convergent (only in L^2), but one has

∑_{k=1}^∞ λ_k^2 = ∫_{[0,1]^2} W(x, y)^2 dx dy = ∥W∥_2^2 ≤ ∥W∥_∞^2.
A useful consequence of this bound is that if we order the λ_i by decreasing absolute value: |λ₁| ≥ |λ₂| ≥ . . . , then

(7.20)    |λ_k| ≤ ∥W∥₂ / √k.

It also follows that for every other kernel U on the same probability space, the inner product can be computed from the spectral decomposition:

(7.21)    ⟨U, W⟩ = ∫_{[0,1]²} U(x, y)W(x, y) dx dy = ∑_k λ_k ∫_{[0,1]²} U(x, y)f_k(x)f_k(y) dx dy = ∑_k λ_k ⟨f_k, U f_k⟩
(where the series on the right is absolutely convergent).
The spectral decomposition is particularly useful if we need to express operator powers: The spectral decomposition of the n-th operator power is

W^{◦n}(x, y) = ∑_k λ_k^n f_k(x)f_k(y),

and the series on the right hand side converges to the left hand side almost everywhere if n ≥ 2.

Proposition 7.17. The eigenfunctions f_k belonging to a nonzero eigenvalue λ_k of any function W ∈ W are bounded.

Proof. Indeed,

|f_k(x)| = (1/|λ_k|) |∫_0^1 W(x, y)f_k(y) dy| ≤ (1/|λ_k|) ∥W∥_∞ ∥f_k∥₁.
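The spectral facts above are easy to see numerically: discretizing W on an n-point grid turns T_W into the symmetric matrix W/n, whose eigenvalues approximate Spec(W). A minimal sketch (our own, with a rank-one test kernel W(x, y) = xy):

```python
import numpy as np

n = 200
x = (np.arange(n) + 0.5) / n
W = np.outer(x, x)                  # the kernel W(x,y) = x*y on the grid

# T_W f(x) = int W(x,y) f(y) dy acts on grid values as f -> (W/n) f,
# so the eigenvalues of W/n approximate Spec(W) (here: 1/3, 0, 0, ...)
lam = np.linalg.eigvalsh(W / n)

# sum_k lambda_k^2 = ||W||_2^2, the identity below (7.19); both are ~1/9
assert np.isclose((lam**2).sum(), (W**2).mean(), atol=1e-8)

# (7.20): after ordering by absolute value, |lambda_k| <= ||W||_2 / sqrt(k)
lam = lam[np.argsort(-np.abs(lam))]
norm2 = np.sqrt((W**2).mean())
assert all(abs(lam[k]) <= norm2 / np.sqrt(k + 1) + 1e-12 for k in range(n))
```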
Some subgraph densities have nice expressions in terms of this spectrum. Generalizing (5.31), we have

(7.22)    t(C_n, W) = ∫_{[0,1]^n} W(x₁, x₂) · · · W(x_{n−1}, x_n)W(x_n, x₁) dx₁ . . . dx_n = ∑_k λ_k^n.

This expression is also valid for n = 2:

(7.23)    t(C₂, W) = ∫_{[0,1]²} W(x, y)² dx dy = ∥W∥₂² = ∑_k λ_k².

Furthermore, for every n ≥ 3,

(7.24)    t_{xy}(P_n^{••}, W) = ∑_k λ_k^{n−1} f_k(x)f_k(y)

almost everywhere. For n = 2 the left side is just W, and so we don't always get pointwise equality, only convergence in L².
For a general multigraph F = (V, E), we can express its density in a kernel W by a rather hairy spectral formula (Lovász and Szegedy [2011]), which is nevertheless useful. Substituting (7.19) in the definition of t(F, W) and expanding, we get

(7.25)    t(F, W) = ∑_{χ: E→ℕ*} ∏_{e∈E} λ_{χ(e)} ∏_{v∈V} M_χ(v),

where

(7.26)    M_χ(v) = ∫_0^1 ∏_{u: uv∈E} f_{χ(uv)}(x) dx.
(One has to be careful, since (7.19) only converges in L², not necessarily almost everywhere. But using (7.21) we can substitute for the values W(x_i, x_j) one by one.) This representation expresses t(F, W) in an infinite “edge-coloring model”, which is analogous to homomorphism numbers with the role of nodes and edges interchanged (see Section 23.2 for a discussion of finite edge-coloring models): we sum over all colorings of the edges with ℕ*; for every coloring, we take the product of nodeweights and the product of edgeweights; the edgeweights are just the eigenvalues, and the weight of a node is computed from the colors of the edges incident with it.
One consequence of (7.22) is that the cycle densities in W determine the spectrum of T_W and vice versa. In fact, we don't have to know all cycle densities: any “tail” (t(C_k, W) : k ≥ k₀) is enough. This follows from Proposition A.21 in the Appendix. In particular, we see that t(C₂, W) = ∥W∥₂² is determined by the cycle densities t(C_k, W), k ≥ 3.

Exercise 7.18. (a) Let F = (V, E) be a multigraph without loops, and let us subdivide each edge e ∈ E by m(e) ≥ 0 new nodes, to get a multigraph F′. Show that using (7.24) the density of F′ in W can be expressed by a formula similar to (7.25). (b) Show that the densities of simple graphs in a kernel determine the densities of multigraphs.
Exercise 7.19. Let W be a graphon. Prove that (a) all eigenvalues of T_W are contained in the interval [−1, 1]; (b) the largest eigenvalue is also largest in absolute value; (c) at least one of the eigenvectors belonging to the largest eigenvalue is nonnegative almost everywhere.
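Formula (7.22) is convenient computationally: on a discretized kernel, t(C_n, W) is a normalized trace, and the spectral and direct evaluations agree. A small sketch (ours, for illustration only):

```python
import numpy as np

n = 300
x = (np.arange(n) + 0.5) / n
W = np.minimum.outer(x, x)          # the graphon W(x,y) = min(x,y)

# direct: t(C3, W) = int W(x1,x2) W(x2,x3) W(x3,x1) dx1 dx2 dx3
t_c3_direct = np.trace(np.linalg.matrix_power(W, 3)) / n**3

# spectral, eq. (7.22): t(C3, W) = sum_k lambda_k^3, with Spec(W) ~ eig(W/n)
lam = np.linalg.eigvalsh(W / n)
t_c3_spectral = (lam**3).sum()

assert np.isclose(t_c3_direct, t_c3_spectral)
# eq. (7.23): t(C2, W) = ||W||_2^2 = sum_k lambda_k^2
assert np.isclose((lam**2).sum(), (W**2).mean())
```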
CHAPTER 8
The cut distance

We have announced in the Introduction that we are going to define the distance of two arbitrary graphs, so that this distance will reflect structural similarity. The definition is quite involved, and we will approach the problem in several steps: starting with two graphs on the same node set, then moving to graphs with the same number of nodes (but on unrelated sets of nodes), then moving to the general case. Finally, we extend the definition to kernels, where it will turn out simpler (at least in words) than in the finite case.
In this section we consider dense graphs. The definitions are of course valid for all graphs, but they give a distance of o(1) between two graphs with edge density o(1), so they are not useful in that setting.

8.1. The cut distance of graphs

8.1.1. Norms of a matrix. Let A be an n × n matrix. There are a number of norms that come up in various studies. We will need the ℓ₁-norm

(8.1)    ∥A∥₁ = (1/n²) ∑_{i,j=1}^n |A_ij|,

the ℓ₂ or Frobenius norm

(8.2)    ∥A∥₂ = ((1/n²) ∑_{i,j=1}^n A_ij²)^{1/2},

and the ℓ_∞-norm

(8.3)    ∥A∥_∞ = max_{i,j} |A_ij|.

(Note the normalization for the ℓ₁ and ℓ₂ norms: when A is an adjacency matrix, all these norms are between 0 and 1.) Our main tool will be a less standard norm, called the cut norm, which was introduced by Frieze and Kannan [1999]. This is defined by

(8.4)    ∥A∥_□ = (1/n²) max_{S,T⊆[n]} |∑_{i∈S, j∈T} A_ij|.

It is clear that

(8.5)    ∥A∥_□ ≤ ∥A∥₁ ≤ ∥A∥₂ ≤ ∥A∥_∞.
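For small matrices all four norms, including the cut norm, can be computed by brute force; the sketch below (our own illustration; the 2^n-subset enumeration is of course exponential) checks the chain (8.5) on a random ±1 matrix as in Example 8.1.

```python
import numpy as np
from itertools import product

def cut_norm(A):
    """||A||_box of (8.4) by brute force: enumerate all subsets S, T of [n]
    as 0-1 vectors; exponential in n, so small examples only."""
    n = A.shape[0]
    M = np.array(list(product([0, 1], repeat=n)), dtype=float)  # 2^n x n
    return np.abs(M @ A @ M.T).max() / n**2

n = 8
rng = np.random.default_rng(1)
A = rng.choice([-1.0, 1.0], size=(n, n))   # the matrix of Example 8.1

l1 = np.abs(A).mean()                      # (8.1)
l2 = np.sqrt((A**2).mean())                # (8.2)
linf = np.abs(A).max()                     # (8.3)
cut = cut_norm(A)                          # (8.4)

# (8.5); for the random sign matrix l1 = l2 = linf = 1 while cut is small
assert cut <= l1 <= l2 <= linf
print(cut, l1, l2, linf)
```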
Example 8.1. Let A be an n × n matrix, whose entries are independent random ±1's (with expectation 0). Then ∥A∥₁ = ∥A∥₂ = ∥A∥_∞ = 1. On the other hand, the expectation of ∑_{i∈S,j∈T} A_ij is 0, and the variance is Θ(n²), and so the expectation of |∑_{i∈S,j∈T} A_ij| is Θ(n). The expectation of the maximum in (8.4)
is more difficult to compute, but using the Chernoff–Hoeffding inequality, one gets that ∥A∥_□ < 4n^{−1/2} with high probability.
Alon and Naor [2006] relate the cut norm of a symmetric matrix to its Grothendieck norm (well known in functional analysis). It follows by the results of Grothendieck that the cut norm is between two absolute constant multiples of the Grothendieck norm. The Grothendieck norm can be viewed as a semidefinite relaxation of the cut norm, and it is polynomial time computable to an arbitrary precision. So we can compute, in polynomial time, an approximation of the cut norm with a multiplicative error less than 2. We don't go into the details of these results here; in our setting it will be more important to approximate the cut norm by a randomized sampling algorithm, to be described in Section 10.3. We'll say more about approximation of the cut norm in the more general setting of graphons in Section 14.1.

8.1.2. Two graphs on the same set of nodes. Let G and G′ be two graphs with a common node set [n]. From any of the matrix norms introduced above, the norm of the difference of their adjacency matrices defines a distance between two graphs. Two of these distances have special significance. The ℓ₁ distance

d₁(G, G′) = |E(G)△E(G′)| / n² = ∥A_G − A_{G′}∥₁

is also called the edit distance (usually without the normalization). It can be thought of as the fraction of pairs of nodes whose adjacency we have to toggle to get from one graph to the other.
The cut metric derived from the cut norm can be described combinatorially as follows. For an unweighted graph G = (V, E) and sets S, T ⊆ V, let e_G(S, T) denote the number of edges in G with one endnode in S and the other in T (the endnodes may also belong to S ∩ T; so e_G(S, S) = 2e_G(S) is twice the number of edges spanned by S). For two graphs G and G′ on the same node set [n], we define their cut distance (as labeled graphs) by

d_□(G, G′) = max_{S,T⊆V(G)} |e_G(S, T) − e_{G′}(S, T)| / n² = ∥A_G − A_{G′}∥_□.
In this setting dividing by |S|×|T| instead of n² might look more natural. However, dividing by |S|×|T| would emphasize small sets too much, and the maximum would be attained when |S| = |T| = 1. With our definition, the contribution of a pair S, T is at most |S||T|/n² (for simple graphs).
It is easy to see that d_□(G, G′) ≤ d₁(G, G′), and in general the two distances are quite different. For example, if G and G′ are two independent random graphs on [n] with edge probability 1/2, then with high probability d₁(G, G′) ≈ 1/2 but d_□(G, G′) = O(1/√n).
We will have to define the distance of two weighted graphs G and G′ on the same node set V, but with possibly different nodeweights. In this case, we have to add a term accounting for the difference in their node weighting. To simplify notation, let α_i = α_i(G)/α_G, α_i′ = α_i(G′)/α_{G′}, β_ij = β_ij(G) and β_ij′ = β_ij(G′). Then we define

(8.6)    d₁(G, G′) = ∑_{i∈V} |α_i − α_i′| + ∑_{i,j∈V} |α_i α_j β_ij − α_i′ α_j′ β_ij′|

and

(8.7)    d_□(G, G′) = ∑_{i∈V} |α_i − α_i′| + max_{S,T⊆V} |∑_{i∈S, j∈T} (α_i α_j β_ij − α_i′ α_j′ β_ij′)|.
It is easy to check that these formulas define metrics, and they specialize to the “old” definitions when the nodeweights are 1 and the edgeweights are 0 or 1. Another special case worth mentioning is when the nodeweights of the two graphs are the same: in this case, the first term in both definitions disappears, and inside the second term, we get the slightly simpler expression α_i α_j (β_ij − β_ij′). We note, furthermore, that since G and G′ can be represented as points in the same finite dimensional space, all usual distance functions on the set of weighted graphs on the same set of nodes would give the same topology.

Example 8.2. Let H_n denote the complete graph on [n], where all nodes have weight 1 and all edges have weight 1/2. Then for a random graph G = G(n, 1/2) on the same node set, we have d_□(G, H_n) = o(1) with high probability.

8.1.3. Two graphs with the same number of nodes. If G and G′ are unlabeled unweighted graphs on possibly different node sets but of the same cardinality n, then we define their distance by

(8.8)    δ̂_□(G, G′) = min_{Ĝ,Ĝ′} d_□(Ĝ, Ĝ′),
where Ĝ and Ĝ′ range over all labelings of G and G′ by 1, . . . , n, respectively. (Of course, it would be enough to fix a labeling for one of the graphs and minimize over all labelings of the other.) The hat above the δ indicates that the “ultimate” definition will be somewhat different. Indeed, handling of this quantity δ̂_□(G, G′) is quite difficult, due to the min-max in the definition.

8.1.4. Two arbitrary graphs. Let G = (V, E) and G′ = (V′, E′) be two graphs with (say) V = [n] and V′ = [n′]. To define their distance, recall that for every graph G and positive integer m, the graph G(m) is obtained from G by replacing each node of G by m nodes, where two new nodes are connected if and only if their predecessors were. Using this operation, we can change the graphs so that they have the same number of nodes, by replacing them with G(n′) and G′(n), or more generally, by G(kn′) and G′(kn) for any k ∈ ℕ. Now we can use the distance δ̂_□ to define the distance

δ_□(G, G′) = lim_{k→∞} δ̂_□(G(kn′), G′(kn)).

A more complicated but “finite” definition of the same quantity can be given as follows (cf. Exercise 8.5). A fractional overlay of G and G′ is a nonnegative n × n′ matrix X = (X_iu) such that ∑_{u=1}^{n′} X_iu = 1/n and ∑_{i=1}^{n} X_iu = 1/n′. If n = n′ and σ : V → V′ is a bijection, then X_iu = (1/n)·1(σ(i) = u) is a fractional overlay (which in this case is an honest-to-goodness overlay). We denote by X(G, G′) the set of all fractional overlays. Fixing a fractional overlay X, we can define a generalization of the labeled cut distance:

(8.9)    d_□(G, G′, X) = max_{Q,R⊆V×V′} |∑_{iu∈Q, jv∈R} X_iu X_jv (1(ij ∈ E) − 1(uv ∈ E′))|.
The distance of the two graphs can be described by optimizing over fractional overlays:

(8.10)    δ_□(G, G′) = min_{X∈X(G,G′)} d_□(G, G′, X).

One can generalize this to weighted graphs. Let G = (V, E) and G′ = (V′, E′) be two weighted graphs with normalized nodeweights α_i = α_i(G) and α_u′ = α_u(G′) (so that α_G = α_{G′} = 1), and edgeweights β_ij = β_ij(G) and β_ij′ = β_ij(G′). A fractional overlay of G and G′ is defined as a nonnegative n × n′ matrix X such that ∑_{u=1}^{n′} X_iu = α_i(G) and ∑_{i=1}^{n} X_iu = α_u(G′). We define

(8.11)    d_□(G, G′, X) = max_{Q,R⊆V×V′} |∑_{iu∈Q, jv∈R} X_iu X_jv (β_ij − β_uv′)|

and then δ_□(G, G′) can be defined by the same formula (8.10). This formula can be rephrased as follows, using two more V × V′ matrices Y and Z:

(8.12)    δ_□(G, G′) = min_{X∈X(G,G′)} max_{0≤Y,Z≤X} |∑_{i,j∈V, u,v∈V′} Y_iu Z_jv (β_ij − β_uv′)|.
Indeed, the absolute value on the right is a convex function of the entries of Y and Z, and so it is maximized when every entry is equal to either 0 or to the corresponding entry of X.
To illuminate definition (8.10) a little, we can think of a fractional overlay as a probability distribution χ on V × V′ whose marginals are uniform. In other words, it is a coupling of the uniform distribution on V with the uniform distribution on V′. Select two pairs (i, u) and (j, v) from the distribution χ. Then (8.9) expresses some form of correlation between ij being an edge and uv being an edge.
One word of warning: δ_□ is only a pseudometric, not a true metric, because δ_□(G, G′) may be zero for different graphs G and G′. This is the case e.g. if G′ = G(k) for some k (cf. Exercise 8.6).
We have to discuss a technical problem, for which only partial results are available (but these will be enough for our purposes). If G and G′ have the same number of nodes, then the definition of δ_□ may give a value different from their δ̂_□ distance. It is trivial that δ_□(G, G′) ≤ δ̂_□(G, G′), but how much larger can the right side be? It may be larger (see Exercise 8.8). Perhaps the increase is never larger than a factor of 2, but this is open. To prove anything nontrivial requires tools to be developed later; in Section 9.4 we are going to prove, among others, the (rather weak) inequality

δ̂_□(G, G′) ≤ 45 / √(−log δ_□(G, G′)).

(One important consequence of this weak inequality will be that any Cauchy sequence of graphs in the δ_□ distance is also a Cauchy sequence in the δ̂_□ distance.)

Example 8.3. Let K denote the graph with a single node of weight 1, endowed with a loop with weight 1/2. Then for a random graph G = G(n, 1/2), we have δ_□(G, K) = o(1) with high probability.
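For graphs with a handful of nodes, δ̂_□ of (8.8) can be computed exactly by minimizing the labeled cut distance over all bijections; the brute-force sketch below (ours; loops are encoded as diagonal ones of the adjacency matrix) reproduces the value δ̂_□(H, K₂) = 1/4 from Exercise 8.8(a).

```python
import numpy as np
from itertools import permutations, product

def cut_norm(A):
    """Brute-force cut norm (8.4); exponential, small n only."""
    n = A.shape[0]
    M = np.array(list(product([0, 1], repeat=n)), dtype=float)
    return np.abs(M @ A @ M.T).max() / n**2

def delta_hat(A, B):
    """delta-hat_box of (8.8): minimize d_box over all relabelings of B."""
    n = A.shape[0]
    return min(cut_norm(A - B[np.ix_(p, p)]) for p in permutations(range(n)))

H = np.array([[1.0, 0.0], [0.0, 1.0]])     # two looped, nonadjacent nodes
K2 = np.array([[0.0, 1.0], [1.0, 0.0]])
print(delta_hat(H, K2))                    # 0.25, while delta_box = 1/8
```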
Exercise 8.4. Let A be a symmetric matrix. Show that restricting the pairs (S, T) in the definition (8.4) of the cut norm in any of the following ways will decrease it by a small factor only: (a) T = S, by at most 2; (b) T ∩ S = ∅, by at most 4; (c) T = [n] \ S, by at most 6; (d) |S|, |T| ≥ n/2, by at most 4.
Exercise 8.5. Prove that the definitions of δ_□(G, G′) through blow-ups and through fractional overlays lead to the same value.
Exercise 8.6. Let G₁ and G₂ be two simple graphs with δ_□(G₁, G₂) = 0. Prove that there is a simple graph G and n₁, n₂ ≥ 1 such that G_i ≅ G(n_i).
Exercise 8.7. Let A be a symmetric n × n matrix with all entries in [−1, 1]. Let A′ be obtained from A by deleting a row and the corresponding column. Prove that

|∥A∥_□ − ∥A′∥_□| ≤ 2/n.

Exercise 8.8. (a) Let H denote the graph on two nonadjacent nodes, with a loop at each of them. Prove that δ̂_□(H, K₂) = 1/4 but δ_□(H, K₂) = 1/8. (b) Prove that if n is odd, then δ̂_□(K_{n,n}, K̄_{n,n}) > δ_□(K_{n,n}, K̄_{n,n}).
8.2. Cut norm and cut distance of kernels

After the rather heavy going with the cut distance for graphs, it sounds frightening that we want to extend all this to kernels. But in fact, the definitions become simpler and more transparent. (This is not the last time when graphons will provide a more user-friendly environment.)

8.2.1. Cut norm. We define the cut norm on the linear space W of kernels by

(8.13)    ∥W∥_□ = sup_{S,T⊆[0,1]} |∫_{S×T} W(x, y) dx dy|,

where the supremum is taken over all measurable subsets S and T. It is sometimes convenient to use the corresponding metric d_□(U, W) = ∥U − W∥_□. The cut norm is a norm; this is easy to prove using standard analysis.
Similarly as in the case of matrices, we have the trivial inequalities between the most important norms of a kernel in W₁:

(8.14)    ∥W∥_□ ≤ ∥W∥₁ ≤ ∥W∥₂ ≤ ∥W∥_∞ ≤ 1.

In the opposite direction, we have trivially ∥W∥₂ ≤ ∥W∥₁^{1/2} (showing that ∥.∥₁ and ∥.∥₂ define the same topology on W₁), but the other two norms in the formula above define different topologies. However, for a stepfunction U with k steps we have the trivial inequality

(8.15)    ∥U∥₁ ≤ k² ∥U∥_□.

It can be shown, in fact, that the coefficient k² can be replaced by 2√k (see Janson [2010], Remark 9.8, and also our Exercise 8.18); but the inequality above will be enough for us.
There is some natural notation that goes with this norm. For every set R ⊆ W₀, we define its ε-neighborhood in the cut-norm

B_□(R, ε) = {W ∈ W₀ : d_□(W, R) < ε} = {W ∈ W₀ : (∃U ∈ R) d_□(W, U) < ε}.
We define the ε-neighborhood B₁(R, ε) in the L¹-norm analogously. (We defined all this in the graphon space W₀, where we need this notation. One could of course take other sets of kernels as the universe.)

8.2.2. Cut distance of unlabeled kernels. Kernels, defined on the fixed set [0, 1], correspond to labeled graphs. Just as for graphs, we introduce an “unlabeled” version of the cut norm, by finding the best overlay of the underlying sets. Let S̄_{[0,1]} denote the set of measure preserving maps [0, 1] → [0, 1], and let S_{[0,1]} denote the set of all invertible measure preserving maps [0, 1] → [0, 1] (the inverse of such a map is known to be measure preserving as well, so S_{[0,1]} is a group; see Appendix A.3.2). We define the cut distance of two kernels by

(8.16)    δ_□(U, W) = inf_{φ∈S_{[0,1]}} d_□(U, W^φ)

(where W^φ(x, y) = W(φ(x), φ(y))). It is easy to see that either one of the following expressions could be used to define the cut distance:

(8.17)    δ_□(U, W) = inf_{φ∈S_{[0,1]}} d_□(U^φ, W) = inf_{φ∈S̄_{[0,1]}} d_□(U, W^φ) = inf_{φ,ψ∈S̄_{[0,1]}} d_□(U^ψ, W^φ).
We will prove the much less trivial fact that in the last expression the infimum is attained: Theorem 8.13 below establishes this in larger generality, for all norms satisfying some natural conditions.
The distance δ_□ of kernels is only a pseudometric, since different kernels can have distance zero. (Such pairs of kernels will turn out to be exactly the weakly isomorphic pairs, but this will take more work to prove.) We can identify two kernels whose cut distance is 0, to get the set W̃ of unlabeled kernels. We define the sets W̃₀ and W̃₁ analogously.
Going into all the complications with using the cut norm and then minimizing over measure preserving transformations is justified by the important fact that the metric δ_□ defines a compact metric space on graphons. We will state and prove this fact in Section 9.3.
One main advantage in using graphons instead of graphs is that many formulas and proofs become much simpler and more transparent. (Just compare the definition (8.16) of the distance of two graphons with the definition (8.12) of the analogous quantity for two weighted graphs!) When going from graphs to graphons via the correspondence G ↦ W_G, we may pay a price by having to estimate how much error we make by this. This will indeed require extra work in some cases, but in other cases we will be lucky, and no error will be made. For example, equation (7.2) shows that homomorphism numbers “from the left” don't change when we replace G by W_G. The next lemma shows that the situation is similar with the δ_□ distance. (We will not always be so lucky; Section 12.4.4 will be devoted to estimating this kind of error for multicuts.)

Lemma 8.9. For any two weighted graphs H and H′,

δ_□(H, H′) = δ_□(W_H, W_{H′}).
Proof. Let φ : [0, 1] → [0, 1] be a measure preserving map. Let (S_i : i ∈ V(H)) and (T_u : u ∈ V(H′)) be the partitions of [0, 1] into the steps of W_H and
W_{H′}. Define X_iu = λ(S_i ∩ φ(T_u)); then the matrix (X_iu) is a fractional overlay of H and H′. Conversely, every fractional overlay can be obtained from a measure preserving map this way. We claim that for this measure preserving map and the corresponding fractional overlay we have

(8.18)    max_{Q,R⊆V×V′} |∑_{iu∈Q, jv∈R} X_iu X_jv (β_ij − β_uv′)| = sup_{Y,Z⊆[0,1]} |∫_{Y×Z} (W_H − W_{H′}^φ)|.

For every Q ⊆ V × V′, let S_Q = ∪_{(i,u)∈Q} S_i ∩ φ(T_u). Then for a fixed Q, R ⊆ V × V′, it is easy to check that

∑_{iu∈Q, jv∈R} X_iu X_jv (β_ij − β_uv′) = ∫_{S_Q×S_R} (W_H − W_{H′}^φ).

On the other hand, if Z_iu = λ(Z ∩ S_i ∩ φ(T_u)) and Y_iu = λ(Y ∩ S_i ∩ φ(T_u)), then 0 ≤ Y_iu, Z_iu ≤ X_iu, and

∑_{i,j∈V, u,v∈V′} Y_iu Z_jv (β_ij − β_uv′) = ∫_{Y×Z} (W_H − W_{H′}^φ).
So the definition (8.10) of δ_□(H, H′) implies the direction ≤ in (8.18), while formula (8.12) implies the reverse direction. This proves (8.18), from which the Lemma follows.

8.2.3. Maxima versus suprema: cut norm. One price we have to pay for working with infinite objects like graphons is that when maximizing a function over an infinite set of objects (e.g. subsets), we don't necessarily have a maximum, only a supremum; hence we have to work with approximate optima. With two important definitions, the cut norm and the cut distance, we don't have this difficulty. (The Compactness Theorem 9.23 will provide another powerful tool to avoid such problems in many cases.) Next we prove this for the cut norm, and at the end of this chapter, for the cut distance. This would not be absolutely necessary: in most cases, we could just carry along an arbitrarily small error term. Nevertheless, it makes sense to include these facts in this book: if you want to work with these notions, you might as well work with them as conveniently as possible. The next lemma also provides a useful expression for the cut norm.

Lemma 8.10. For any kernel W ∈ W, the optima

(8.19)    sup_{S,T⊆[0,1]} |∫_{S×T} W(x, y) dx dy|

and

(8.20)    sup_{f,g: [0,1]→[0,1]} |∫_{[0,1]²} f(x)g(y)W(x, y) dx dy|

are attained, and they are both equal to ∥W∥_□.

The sets S, T and the functions f, g are tacitly assumed to be measurable. We can write the expression to be maximized in (8.19) as ⟨1_S, T_W 1_T⟩, and in (8.20), as ⟨f, T_W g⟩ (where T_W is the operator defined by (7.18)). The assertion of the lemma is equivalent to saying that the optimum in (8.20) is attained, and it is attained
by 0-1 valued functions f and g. I am grateful to Svante Janson for suggesting a simplification of the proof that follows.

Proof. Let D = sup_{f,g} ⟨f, T_W g⟩. We start with proving that this supremum is attained by appropriate functions f and g. Let f_n, g_n : [0, 1] → [0, 1] (n = 1, 2, . . . ) be functions such that ⟨f_n, T_W g_n⟩ → D. The set of functions [0, 1] → [0, 1] is weak*-compact, which means that by selecting a subsequence, we may assume that (f_n) tends to a limit f : [0, 1] → [0, 1] in the sense that ⟨f_n, h⟩ → ⟨f, h⟩ for every h ∈ L¹[0, 1]. Similarly, we can go to a further subsequence to assume that g_n converges to a function g in the same sense. It is easy to see that f and g are bounded (perhaps after changing them on a null set). Now we claim that

∫_{[0,1]²} f_n(x)g_n(y)W(x, y) dx dy → ∫_{[0,1]²} f(x)g(y)W(x, y) dx dy.
This convergence is trivial when W = 1_{S×T} for two measurable sets S, T ⊆ [0, 1]. Hence it follows when W is a stepfunction, since stepfunctions are linear combinations of a finite number of functions of the type 1_{S×T}. Hence it follows for every kernel, since every kernel can be approximated by stepfunctions in L¹([0, 1]²), and the factors f_n, g_n, f, g are bounded. This implies that ⟨f, T_W g⟩ = D.
Next we show that the maximizing functions f and g can be chosen to be 0-1 valued. Let S = {x : 0 < f(x) < 1}, and suppose that λ(S) > 0. Define

f_s(x) = f(x) + s·min(f(x), 1 − f(x)).

Then for −1 ≤ s ≤ 1, the function f_s satisfies 0 ≤ f_s ≤ 1, and hence, by the maximality property of f, we have ⟨f_s, T_W g⟩ ≤ ⟨f, T_W g⟩. Since ⟨f_s, T_W g⟩ is a linear function of s and equality holds for s = 0, we must have equality for all values of s, in particular for s = 1, and so we can replace f by f₁(x) = min(1, 2f(x)). Repeating this construction, we get a sequence of optimizing functions that converges monotonically to the 0-1 valued function f̄ = 1(f > 0). So we can replace f by f̄, and similarly we can replace g by a 0-1 valued function.

8.2.4. Operator norms and cut norm. While the cut norm is best suited for combinatorial purposes, it is equivalent to more traditional norms, such as the operator norm of T_W as an operator L^∞ → L¹, as the following simple lemma shows:

Lemma 8.11. For every kernel W, we have ∥W∥_□ ≤ ∥T_W∥_{∞→1} ≤ 4∥W∥_□.

Proof. By definition,

∥T_W∥_{∞→1} = sup_{−1≤g≤1} ∥T_W g∥₁ = sup_{−1≤f,g≤1} |⟨f, T_W g⟩| = sup_{−1≤f,g≤1} ⟨f, T_W g⟩.
Comparing this expression with (8.20), we get the first inequality. For the second, we write

∥T_W∥_{∞→1} = sup_{0≤f,f′,g,g′≤1} ⟨f − f′, T_W(g − g′)⟩.

Here

⟨f − f′, T_W(g − g′)⟩ = ⟨f, T_W g⟩ − ⟨f′, T_W g⟩ − ⟨f, T_W g′⟩ + ⟨f′, T_W g′⟩ ≤ 4∥W∥_□.
There are many other variations on the definition which give norms that are some constant factor away from the cut norm; these are useful since in some proofs they come up more directly than the cut norm. Some of these are stated as exercises at the end of this section.
There are other well-studied operator norms that are topologically equivalent to the cut norm (even though they are not equivalent up to a constant factor). The Schatten p-norm S_p(T_W) of a kernel operator T_W is defined as the ℓ^p-norm of the sequence of its eigenvalues. For an even integer p, these can be expressed in terms of homomorphism densities: S_p(T_W) = t(C_p, W)^{1/p}. (It is not trivial that t(C_{2r}, U)^{1/(2r)} is a norm, i.e., that it is subadditive; the other defining properties of a norm are easy. In Proposition 14.2 we'll describe a method to prove that Schatten norms are indeed norms, along with certain more general norms defined by graphs.) These norms define the same topology on W₁ as the cut norm. We prove the explicit relationship for the case p = 4, which we need.

Lemma 8.12. For every graphon U ∈ W₁,

∥U∥_□⁴ ≤ t(C₄, U) ≤ 4∥U∥_□.

Proof. The second inequality is a special case of Lemma 10.23. To prove the first inequality, we use

∥U∥_□ = sup_{0≤f,g≤1} ⟨f, T_U g⟩,

where

⟨f, T_U g⟩ ≤ ∥f∥₂ ∥T_U g∥₂ ≤ ∥T_U g∥₂ = ⟨T_U g, T_U g⟩^{1/2} = ⟨g, T_U² g⟩^{1/2} = ⟨g, T_{U◦U} g⟩^{1/2} ≤ ∥g∥₂ ∥T_{U◦U}∥_{2→2}^{1/2} ≤ ∥T_{U◦U}∥_{2→2}^{1/2} ≤ ∥U ◦ U∥₂^{1/2} = t(C₄, U)^{1/4}.
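Lemma 8.12 can be tested numerically on a discretized kernel: t(C₄, U) is the normalized trace of the fourth matrix power, and the cut norm is computed by brute force (a sketch with our own exponential-time helper, so the grid must stay small):

```python
import numpy as np
from itertools import product

def cut_norm(A):
    """Brute-force cut norm of the step kernel given by the matrix A."""
    n = A.shape[0]
    M = np.array(list(product([0, 1], repeat=n)), dtype=float)
    return np.abs(M @ A @ M.T).max() / n**2

n = 8
rng = np.random.default_rng(2)
U = rng.uniform(-1, 1, (n, n)); U = (U + U.T) / 2   # a kernel in W_1

t_c4 = np.trace(np.linalg.matrix_power(U, 4)) / n**4  # t(C4,U) = sum lam^4
c = cut_norm(U)
assert c**4 <= t_c4 <= 4 * c                          # Lemma 8.12
```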
8.2.5. Minima versus infima: cut distance. The last result in this section is of a similar nature as Lemma 8.10: we prove that the “inf” in the last quantity in formula (8.17) above is in fact a “min”. This was proved by Bollobás and Riordan [2009]. An analogous result for the L¹-norm was proved by Pikhurko [2010]. With later applications in mind, we prove it in greater generality.
The construction that gives the cut distance δ_□ from the cut norm can be applied to any other norm on W that is invariant under maps W ↦ W^φ for all φ ∈ S_{[0,1]}. We will call such a norm invariant. For an invariant norm N on the linear space W, we define

δ_N(U, W) = inf_{φ∈S_{[0,1]}} N(U − W^φ).
We call this function the distance derived from N. The distances δ_N will be interesting for us mainly in the cases when N = ∥.∥_□, N = ∥.∥₁ and N = ∥.∥₂. The corresponding unlabeled distances are δ_□, δ₁ and δ₂.
Since the norm is invariant under measure preserving bijections, we have N(U − W^φ) = N(U^{φ⁻¹} − W), implying that δ_N(U, W) = δ_N(W, U). It is trivial that the triangle inequality holds for δ_N, so it is a semimetric (and clearly it is not a true metric, since δ_N(U, U^φ) = 0 for every measure preserving map φ ∈ S_{[0,1]}).
We call a norm N smooth, if it is continuous in the topology of pointwise convergence in W. In other words, for every sequence of kernels (W_n) such that
W_n ∈ W₁ and W_n → 0 almost everywhere, we have N(W_n) → 0. This implies that if W_n → W almost everywhere, then N(W_n) → N(W). The L¹, L² and cut norms are smooth, but L^∞ is not.
We have defined invariance of a norm using measure preserving bijections, but (at least for smooth norms) this implies invariance under all measure preserving maps φ : [0, 1] → [0, 1]. This is easy to see for stepfunctions W, since for any measure preserving map φ the function W^φ is a stepfunction with the same number of steps, same size of steps, and same function value on these steps as W, and hence there is a bijective measure preserving map ψ such that W^φ = W^ψ. For a general kernel W ∈ W, we have a sequence of stepfunctions W_n such that W_n → W almost everywhere, and then also W_n^φ → W^φ almost everywhere. By the smoothness of N this implies that N(W_n) → N(W) and N(W_n^φ) → N(W^φ). Since we know that N(W_n^φ) = N(W_n), it follows that N(W^φ) = N(W).
Let us note that an invariant norm N on W defines a norm on bounded symmetric measurable functions on any standard probability space. Indeed, if (Ω, A, π) is such a space, then there is a measure preserving map ψ : [0, 1] → Ω, and then we can define N(W) = N(W^ψ) for every bounded symmetric measurable function W : Ω × Ω → ℝ. This value will not depend on the choice of ψ, which follows easily from the invariance of N.
One can also give a more probabilistic description of the distance δ_N, using coupling measures (see Appendix A.3). For every coupling measure µ between two copies of [0, 1], the two projection maps π, ρ : [0, 1]² → [0, 1] (where [0, 1]² is equipped with the measure µ and [0, 1], with the Lebesgue measure) are measure preserving. So for every kernel U, the function U^π is a kernel on the probability space ([0, 1]², B, µ), and similarly for the projection ρ. As remarked above, N defines a norm on kernels on ([0, 1]², B, µ); we denote this norm by N_µ. It is easy to see that for every kernel U on [0, 1], we have

(8.21)    N_µ(U^π) = N(U).

After this explanation, we can state the theorem:

Theorem 8.13. Let N be a smooth invariant norm on W. Then we have the following alternate expressions for the unlabeled distance derived from N:
(8.22)    δ_N(U, W) = inf_{φ∈S_{[0,1]}} N(U − W^φ) = inf_{φ∈S̄_{[0,1]}} N(U − W^φ)
    = inf_{ψ∈S_{[0,1]}} N(U^ψ − W) = inf_{ψ∈S̄_{[0,1]}} N(U^ψ − W)
    = inf_{φ,ψ∈S_{[0,1]}} N(U^ψ − W^φ) = min_{φ,ψ∈S̄_{[0,1]}} N(U^ψ − W^φ),

and

(8.23)    δ_N(U, W) = min_µ N_µ(U^π − W^ρ),
8.2. CUT NORM AND CUT DISTANCE OF KERNELS
137
which implies that in each line of 8.22, the two expressions are equal. Equation (8.23) follows similarly easily in this case. Second, we consider arbitrary functions U, W ∈ W, and prove the formulas with the two occurrences of “min” replaced by “inf”. Let (Un ) and (Wn ) be sequences of stepfunctions converging almost everywhere to U and W , respectively. Then N (Un − U ) → 0 by the smoothness of N , and similarly for W . Since N (Unφ − U φ ) = N (Un − U ) for every measure preserving map φ, this implies that inf φ∈S [0,1]
N (Un − Wnφ ) =
inf
φ∈S[0,1]
N (Un − Wnφ ) →
inf
φ∈S[0,1]
N (U − W φ ) = δN (U, W ),
and also that inf φ∈S [0,1]
N (Un − Wnφ ) →
inf φ∈S [0,1]
N (U − W φ ),
which proves the equality in the first line of (8.22). The other equations follow similarly. However, this argument only gives an “inf” in the last two expressions for δN . To prove that it is in fact a minimum, we begin with (8.23). The space of coupling measures is compact in the weak topology, so it suffices to show that Nµ (U π − W ρ ), as a function of µ, is lower semicontinuous. This means that if µn → µ weakly (where µ and µn are coupling measures), then for every two kernels U and W , we have (8.24)
lim inf Nµn (U π − W ρ ) ≥ Nµ (U π − W ρ ). n
As a first step, we prove that N_{µ_n}(V) → N_µ(V) for every continuous function V. Let f_n and f be the functions representing the measures µ_n and µ as in Proposition A.6(iv). Then N_{µ_n}(V) = N(V^{f_n}), and N_µ(V) = N(V^f). Since V is continuous, we have V^{f_n}(x, y) = V(f_n(x), f_n(y)) → V(f(x), f(y)) = V^f(x, y) for almost all (x, y) ∈ [0, 1]². By our assumption on the norm N, this implies that N_{µ_n}(V) → N_µ(V). As a special case, we get (8.24) for continuous kernels U and W.
Let U, W : [0, 1] × [0, 1] → ℝ be arbitrary kernels, and fix any ε > 0. There are continuous kernels U_k and W_k (k = 1, 2, . . . ) such that U_k → U and W_k → W almost everywhere. By the smoothness of N, we can fix k large enough so that N(U_k − U) ≤ ε and N(W_k − W) ≤ ε. By the special case proved above, we know that

N_{µ_n}(U_k^π − W_k^ρ) → N_µ(U_k^π − W_k^ρ)    (n → ∞),
and we can fix n so that |N_{µ_n}(U_k^π − W_k^ρ) − N_µ(U_k^π − W_k^ρ)| ≤ ε. Then, using (8.21),

N_µ(U^π − W^ρ) ≤ N_µ(U_k^π − W_k^ρ) + N_µ(U_k^π − U^π) + N_µ(W_k^ρ − W^ρ)
    = N_µ(U_k^π − W_k^ρ) + N(U_k − U) + N(W_k − W) ≤ N_µ(U_k^π − W_k^ρ) + 2ε.

Here, by the choice of n,

N_µ(U_k^π − W_k^ρ) ≤ N_{µ_n}(U_k^π − W_k^ρ) + ε
    ≤ N_{µ_n}(U^π − W^ρ) + N_{µ_n}(U_k^π − U^π) + N_{µ_n}(W_k^ρ − W^ρ) + ε
    = N_{µ_n}(U^π − W^ρ) + N(U_k − U) + N(W_k − W) + ε ≤ N_{µ_n}(U^π − W^ρ) + 3ε.
Combining these inequalities, we get that N_µ(U^π − W^ρ) ≤ N_{µ_n}(U^π − W^ρ) + 5ε if n is large enough. This proves (8.24) and thereby the existence of the minimum in (8.23).
The existence of the minimum in (8.22) follows easily now. Let µ be a coupling measure such that δ_N(U, W) = N_µ(U^π − W^ρ). Let σ be a measure preserving bijection from [0, 1] with the Lebesgue measure into [0, 1]² with the measure µ, and let π and ρ be the projections of [0, 1]² to the two coordinates. The fact that µ is a coupling measure implies that the compositions φ = σπ and ψ = σρ are measure preserving, and N(U^φ − W^ψ) = N_µ(U^π − W^ρ) = δ_N(U, W).

This theorem has an important corollary:

Corollary 8.14. For any smooth and invariant norm N on W, we have δ_N(U, W) = 0 if and only if there exist maps φ, ψ ∈ S̄_{[0,1]} such that U^ψ = W^φ almost everywhere.

This corollary allows us to consider the distances δ₁ and δ₂ as defined on W̃ (just as δ_□). In other words, the condition δ_N(U, W) = 0 is independent of N, and identifying such pairs of kernels gives the same space for every smooth and invariant norm N.

Exercise 8.15. Let H and H′ be two weighted graphs on the same node set with α_H = α_{H′} = 1, with the same edgeweights, but different nodeweights. Prove that δ₁(H, H′) ≤ ∥α(H) − α(H′)∥₁.
Exercise 8.16. Prove that for every kernel W ∈ W₁ and k ≥ 2, we have t(C₄, W) ≤ t(C_{2k}, W)^{1/(2k)} ≤ t(C₄, W)^{1/4}.
Exercise 8.17. Let σ : [0, 1] → [0, 1] range over maps that can be obtained as follows: we split (0, 1] into the intervals I_k = ((k−1)/n, k/n] (k = 1, . . . , n) and permute these intervals arbitrarily. Prove that for every smooth and invariant norm N we have δ_N(U, W) = inf_σ N(U − W^σ).
Exercise 8.18. Improve the coefficient in (8.15) to (a) 2k, (b) 2√k (this is not easy!).
Exercise 8.19. Show that if we use the formula sup_S |∫_{S×S} W| to define a norm (which is only a constant factor off the cut norm), then the supremum is not always attained (Laczkovich [1995]).
Exercise 8.20. Show by examples that one could not replace any of the “inf”-s by “min” in Theorem 8.13.
Exercise 8.21. Show that if N is the L^∞ norm on W, then even the “easy” part of Theorem 8.13 fails: there are two kernels U and W such that inf_{φ∈S̄_{[0,1]}} ∥U − W^φ∥_∞ = 0 but inf_{φ∈S_{[0,1]}} ∥U − W^φ∥_∞ > 0.
8.3. Weak and L¹-topologies

We end this discussion of graphon distances with a further somewhat technical issue. The topology on W defined by the cut norm is certainly different from the topology defined by the L¹-norm; there are, however, some nontrivial relationships between them. We will discuss these in larger generality and detail in Section 14.2, but a few simple facts can be proved here easily, and we will need some of them soon.
The key to relating the cut norm to other topologies is the following lemma.

Lemma 8.22. Suppose that ∥W_n∥_□ → 0 as n → ∞ (W_n ∈ W₁). Then for every function Z ∈ L¹([0, 1]²), ∥ZW_n∥_□ → 0. In particular, ⟨Z, W_n⟩ → 0 and ∫_S W_n → 0 for every measurable set S ⊆ [0, 1]².

Proof. If Z is the indicator function of a rectangle, these conclusions follow from the definition of the ∥.∥_□ norm. Hence the conclusion follows for stepfunctions, since they are linear combinations of a finite number of indicator functions of rectangles. Then it follows for all integrable functions, since they are approximable in L¹([0, 1]²) by stepfunctions.

A uniformly bounded sequence of kernels W_n ∈ W is called weak* convergent to a kernel W if ⟨W_n, U⟩ → ⟨W, U⟩ for every integrable function U : [0, 1]² → ℝ. This is equivalent to requiring that ∫_{S×T} W_n → ∫_{S×T} W for all measurable sets S and T. This sounds almost like convergence in the cut norm, but it is not the same! Lemma 8.22 implies that convergence in the ∥.∥_□ norm implies weak* convergence. However, weak* convergence does not imply convergence in the cut norm (Exercise 8.26; an interesting counterexample follows from Example 11.41).
Since ∥.∥_□ ≤ ∥.∥₁, the cut norm is continuous with respect to the L¹-norm. The converse is not true (recall the example of random graphs from the Introduction, Figure 1.5), but the following fact, proved and used by Lovász and Szegedy [2010a], shows that it is at least lower semicontinuous:

Proposition 8.23. Let W_n → W in the cut norm (W_n, W ∈ W₁). Then

lim inf_{n→∞} ∥W_n∥₁ ≥ ∥W∥₁.

Proof. Let Y = sgn(W). Then by Lemma 8.22,

∥W_n∥₁ ≥ ⟨W_n, Y⟩ → ⟨W, Y⟩ = ∥W∥₁.
{W =1}
∫
∫ (1 − W ) = 0.
W+ {W =0}
{W =1}
Proposition 8.25. Suppose that Un → U in the cut norm as n → ∞ (U, Un ∈ W0 ). Then for every W ∈ W0 there is a sequence of graphons Wn ∈ W0 such that Wn → W in the cut norm, and ∥Un − Wn ∥1 → ∥U − W ∥1 . It is important that we want Wn ∈ W0 ; if we only wanted kernels, we could take simply Wn = W + Un − U .
140
8. THE CUT DISTANCE
Proof. First we consider the case when U ≥ W . Let { W (x, y)/U (x, y) if U (x, y) > 0, Z(x, y) = 0 otherwise. and define Wn = ZUn . Trivially Wn ∈ W0 , W = ZU , and ∥W − Wn ∥ = ∥Z(U − Un )∥ → 0 by Lemma 8.22. Furthermore, using that Un ≥ Wn and U ≥ W , we get ∥Un −Wn ∥1 −∥U −W ∥1 = ∥Un −Wn ∥ −∥U −W ∥ ≤ ∥U −Un ∥ +∥W −Wn ∥ . This implies that ∥Un − Wn ∥1 → ∥U − W ∥1 as n → ∞. The case when U ≤ W follows by a similar argument, replacing U, W, Un by 1 − U, 1 − W, 1 − Un . Finally, in the general case, consider the graphon V = max(U, W ). Then clearly ∥U − V ∥1 + ∥V − W ∥1 = ∥U − W ∥1 . Since U ≤ V , there exists a sequence (Vn ) of graphons such that ∥Vn − V ∥ → 0 and ∥Vn − Un ∥1 → ∥V − U ∥1 . Since V ≥ W , there is a sequence (Wn ) of graphons such that ∥Wn − W ∥ → 0 and ∥Wn − Vn ∥1 → ∥W − V ∥1 . Hence lim sup ∥Un − Wn ∥1 ≤ lim sup ∥Un − Vn ∥1 + lim sup ∥Vn − Wn ∥1 n→∞
n→∞
n→∞
= ∥U − V ∥1 + ∥V − W ∥1 = ∥U − W ∥1 . Using Proposition 8.23, the lemma follows. Exercise 8.26. Show that weak* convergence of a sequence of graphons does not imply convergence in the cut norm. Exercise 8.27. Show that ∥Wn ∥ → 0 (Wn ∈ W1 ) does not imply that ∥Wn ∥1 → 0.
CHAPTER 9
Szemerédi partitions

One of the most important tools in understanding large dense graphs is the Regularity Lemma of Szemerédi [1975, 1978] and its extensions. This lemma has many interesting connections to other areas of mathematics, including analysis and information theory (see Lovász and Szegedy [2007], Bollobás and Nikiforov [2008], Tao [2006a]). It also has weaker (but more effective) and stronger versions. Here we survey as much as we need from this rich theory, extend it to graphons (as it happens quite often, this leads to simpler, more elegant formulations), and prove a very general version of it using the space of graphons.

9.1. Regularity Lemma for graphs

9.1.1. Homogeneous bipartite graphs and the original lemma. For a graph G = (V, E) and for X, Y ⊆ V, let e_G(X, Y) denote the number of edges with one endnode in X and another in Y; edges with both endnodes in X ∩ Y are counted twice. We denote by

d_G(X, Y) = e_G(X, Y) / (|X||Y|)

the density of edges between X and Y. If X and Y are disjoint, we denote by G[X, Y] the bipartite graph on X ∪ Y obtained by keeping just those edges of G that connect X and Y.
Let P = {V₁, . . . , V_k} be a partition of V. We define the weighted graph G_P on V by taking the complete graph and weighting its edge uv by d_G(V_i, V_j) if u ∈ V_i and v ∈ V_j. A related, but different construction is that of the template graph of the partition P. This weighted quotient graph G/P is defined on [k]: node i gets nodeweight |V_i|/|V|, and the edge ij gets edgeweight e_G(V_i, V_j)/(|V_i||V_j|) (we allow loops here).
The Regularity Lemma says, roughly speaking, that the node set of every graph has an equitable partition P into a “small” number of classes such that G_P is “close” to G. Various (non-equivalent) forms of this lemma can be proved, depending on what we mean by “close”.
Let G be a bipartite graph with bipartition {U, W}. On the average, we expect that for X ⊆ U and Y ⊆ W, e_G(X, Y) ≈ d_G(U, W)|X||Y|. For two arbitrary subsets of the nodes, e_G(X, Y) may be very far from this “expected value”, but if G is a random graph, or at least “random-like”, then it will be close; random graphs are very “homogeneous” in this respect. We say that G is ε-homogeneous, if

(9.1)    |e_G(X, Y) − d_G(U, W)|X||Y|| ≤ ε|U||W|
holds for all subsets X ⊆ U and Y ⊆ W.

Remark 9.1. Here we diverge from the usual statement of the Regularity Lemma: one usually considers ε-regular bipartite graphs, where the stronger condition

(9.2)    |e_G(X, Y) − d_G(U, W)|X||Y|| ≤ ε|X||Y|

is required for all X ⊆ U and Y ⊆ W such that |X| > ε|U| and |Y| > ε|W|. (Clearly we could not require condition (9.2) to hold for small X and Y: for example, if both have one element, then the quotient e_G(X, Y)/(|X||Y|) is either 0 or 1.) The properties of ε-homogeneity and ε-regularity are essentially equivalent (see Exercise 9.6). In either version, this property can be viewed as a quantitative version of quasirandomness discussed in the Introduction.

With these definitions, the Regularity Lemma can be stated as follows:

Lemma 9.2 (Regularity Lemma, almost original form). For every ε > 0 there is an S(ε) ∈ ℕ such that every graph G = (V, E) has an equitable partition {V₁, . . . , V_k} (1/ε ≤ k ≤ S(ε)) such that for all but εk² pairs of indices 1 ≤ i < j ≤ k, the bipartite graph G[V_i, V_j] is ε-homogeneous.

The important point is that the bound S(ε) on the number of classes is independent of the graph G. Note that the Regularity Lemma does not say anything about the internal structure of the classes V_i. The lower bound k ≥ 1/ε guarantees that the number of edges inside the classes is bounded by k(n/k)² ≤ εn². The exceptional pairs of classes contain at most εk²(n/k)² = εn² edges, so all these edges can be considered as “error terms”. If we need information about the internal structure of the classes, we have to appeal to the Strong Regularity Lemma to be discussed below.
One feature of the Regularity Lemma, which unfortunately forbids practical applications, is that the upper bound S(ε) it provides on the number of classes is very large: standard proofs give a tower 2^{2^{⋯}} of height about 1/ε², and unfortunately this is not far from the truth, as was shown by Gowers [2006] (for a simpler recent construction, see Conlon and Fox [2011]).

9.1.2. Weak Regularity Lemma. A version of the Regularity Lemma with a weaker conclusion but with a more reasonable error bound was proved by Frieze and Kannan [1999]. This is the form that we use most of the time in this book.

Lemma 9.3 (Weak Regularity Lemma). For every k ≥ 1 and every graph G = (V, E), V has a partition P into k classes such that

d_□(G, G_P) ≤ 2/√(log k).

Note that we do not require here that P be an equitable partition; it is not hard to see that this version implies that there is also an equitable partition with a similar property, just we have to increase the error bound to 4/√(log k) (see Exercise 9.7).
To see the connection with the original lemma, we note that if G₀ is an ε-homogeneous bipartite graph, and H is the weighted complete bipartite graph with the same bipartition {U, W} and with edge weights d = e(G₀)/(|U||W|), then (9.1) says that d_□(G₀, H) ≤ ε. Hence if P is a Szemerédi partition in the sense of Lemma 9.2, then the distance between the bipartite subgraph of G induced by V_i and V_j,
and the corresponding weighted bipartite subgraph of G_P, is at most ε for all but εk² pairs (i, j), and at most 1 for the remaining εk² pairs. This implies that the cut distance between G and G_P is at most 2ε. So the partition in Lemma 9.3 has indeed weaker properties than the partition in Lemma 9.2. This is compensated for by the relatively decent number of partition classes.
The Weak Regularity Lemma implies that there is a partition P such that the template graph satisfies

(9.3)    δ_□(G, G/P) ≤ d_□(G, G_P) ≤ 2/√(log k).

9.1.3. Strong Regularity Lemma. Other versions of the Regularity Lemma strengthen, rather than weaken, the conclusion (of course, at the cost of replacing the tower function by an even more formidable value). Such a “super-strong” Regularity Lemma was proved by Alon, Fischer, Krivelevich and Szegedy [2000]. To state this lemma, we need a further definition. Let P be an equitable partition of V(G), and let Q be an equitable refinement of it. Following Conlon and Fox [2011], we say that Q is ε-close to P, if for almost every pair S ≠ T ∈ P (with at most ε|P|² exceptions), for almost every pair X, Y ∈ Q (with at most (|Q|/|P|)² exceptions), we have

|e_G(X, Y)/(|X||Y|) − e_G(S, T)/(|S||T|)| ≤ ε.

Lemma 9.4 (Very Strong Regularity Lemma). For every sequence ε = (ε₀, ε₁, ...) of positive numbers there is a positive integer S(ε) such that for every graph G = (V, E), the node set V has an equitable partition P and an equitable refinement Q of P such that |Q| ≤ S(ε), P is ε₀-regular, Q is ε_{|P|}-regular, and Q is ε₀-close to P.

While this Very Strong Regularity Lemma has many important applications, it is not easy to explain its significance at this point. One important feature is that through the second partition Q, it carries information about the inside of the partition classes of P.
A somewhat weaker (but essentially equivalent) version, which is simpler to state but more difficult to apply, was proved by Tao [2006b] and by Lovász and Szegedy [2007].

Lemma 9.5 (Strong Regularity Lemma). For every sequence ε = (ε₀, ε₁, ...) of positive numbers there is a positive integer S(ε) such that for every graph G = (V, E), there is a graph G′ on V, and V has a partition P into k ≤ S(ε) classes such that
(9.4)    d₁(G, G′) ≤ ε₀    and    d_□(G′, (G′)_P) ≤ ε_k.
Note that the first inequality involves the normalized edit distance, and so it is stronger than a similar condition with the cut distance would be. The second error bound ε_k in (9.4) can be thought of as being very small.
If we choose ε_k = ε/2 for all k, we get the Weak Regularity Lemma 9.3 (without an explicit bound on the number of classes). Choosing ε_k = ε₀²/k², the partition obtained satisfies the requirements of the Original Regularity Lemma 9.2.
We can replace ε_k by the much smaller number ε_k/(k²S(ε_k)²), where S is the bound in the Original Regularity Lemma. Then we can apply the Original Regularity Lemma to each of the partition classes obtained in Lemma 9.5, to get
the Very Strong Regularity Lemma 9.4. (The details of this derivation are left to the reader as an exercise.)
We will formulate the Strong Regularity Lemma for kernels, and prove it in that version, in Section 9.3.

Exercise 9.6. Show that (a) if a bipartite graph is ε-regular, then it is ε-homogeneous; (b) if a bipartite graph is ε³-homogeneous, then it is ε-regular.
Exercise 9.7. Prove that for every k ≥ 1 and every graph G = (V, E), V has an equitable partition P into k classes such that d_□(G, G_P) ≤ 4/√(log k).
9.2. Regularity Lemma for kernels

The Weak Regularity Lemma extends to kernels, and this is the form we are going to prove first. While stating and proving the Lemma directly would be quite easy, we make a detour by introducing the “stepping operator” formally and stating some basic properties. These will be useful later on.

9.2.1. The stepping operator. Let W ∈ W and let P = (S₁, . . . , S_q) be a partition of [0, 1] into a finite number of measurable sets. (When we speak of a partition of [0, 1], we always mean such a partition.) We define the function W_P by

W_P(x, y) = (1/(λ(S_i)λ(S_j))) ∫_{S_i×S_j} W(x, y) dx dy    (x ∈ S_i, y ∈ S_j).
So W_P is obtained by averaging W over the “steps” S_i × S_j; it is a stepfunction with steps in P. If λ(S_i) = 0 or λ(S_j) = 0, then we define W_P(x, y) = 0 (this is just to have a complete definition; sets of measure zero in the partition can usually be ignored). We call this construction a stepping of W; it will be used throughout this book. Analytically, the stepping operator is the orthogonal projection of the Hilbert space L²([0, 1]²) onto the subspace of stepfunctions with P-steps. In probability language, it is the conditional expectation relative to the (finite) sigma-algebra generated by the sets in P. These remarks may make some of the basic properties below easier to understand, but we will use the more elementary direct formulation.
Most of the information about the stepfunction W_P is contained in a finite weighted graph, the quotient graph W/P. This is a weighted graph on [q], with nodeweights α_i(W/P) = λ(S_i) and edgeweights β_ij(W/P) = W_P(x, y) for any x ∈ S_i, y ∈ S_j.
On the space W, the stepping operator W ↦ W_P is a linear operator which is idempotent and symmetric:

(9.5)    ⟨U_P, W_P⟩ = ⟨U_P, W⟩ = ⟨U, W_P⟩.

Stepfunctions with steps in a fixed partition P form a finite dimensional linear space, and the stepping operator is the orthogonal projection onto this space, which is shown by the simple identity

(9.6)    ⟨W_P, W − W_P⟩ = 0.

This implies that the stepping operator is contractive with respect to the L² norm:

(9.7)    ∥W_P∥₂² = ∥W∥₂² − ∥W − W_P∥₂² ≤ ∥W∥₂².
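In the discretized picture the stepping operator is plain block averaging, and the Pythagorean identity behind (9.6)–(9.7) can be verified directly; a minimal sketch (our own names and setup):

```python
import numpy as np

def stepping(W, parts):
    """Average the grid kernel W over the blocks S_a x S_b, where parts[i]
    is the class of grid point i -- the discrete analogue of W_P."""
    W_P = np.empty_like(W)
    for a in np.unique(parts):
        for b in np.unique(parts):
            block = np.ix_(parts == a, parts == b)
            W_P[block] = W[block].mean()
    return W_P

n = 12
rng = np.random.default_rng(4)
W = rng.random((n, n)); W = (W + W.T) / 2
parts = rng.integers(0, 3, size=n)          # a random 3-partition of the grid

W_P = stepping(W, parts)
sq = lambda M: (M**2).mean()                # discretized ||.||_2^2
# (9.7): stepping is an orthogonal projection, so the norms split exactly
assert np.isclose(sq(W_P) + sq(W - W_P), sq(W))
```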
It is not hard to see that the stepping operator is also contractive with respect to the cut norm (Exercise 9.17). In fact, we will see in Section 14.2.1 that stepping is contractive with respect to any other reasonable norm on W.

9.2.2. Weak Regularity Lemma. It is a basic fact from analysis that every kernel W can be approximated arbitrarily well by stepfunctions in the L¹ norm. The approximating stepfunctions can be obtained by averaging over “steps”:

Proposition 9.8. Let (P_n) be a sequence of measurable partitions of [0, 1] such that every pair of points is separated by all but a finite number of partitions P_n. Then W_{P_n} → W almost everywhere for every W ∈ W.

The Weak Regularity Lemma for kernels, proved by Frieze and Kannan [1999] (and in particular its Corollary 9.13 below), is a related statement about approximation by stepfunctions in the cut norm (instead of in the sense of almost everywhere convergence).

Lemma 9.9 (Weak Regularity Lemma for Kernels). For every function W ∈ W and k ≥ 1 there is a stepfunction U with k steps such that

∥W − U∥_□ < (2/√(log k)) ∥W∥₂.

Roughly speaking, this Lemma says that every kernel can be approximated well in the cut norm by stepfunctions (in fact, by its steppings). Proposition 9.8 asserts something similar about approximating in the L¹-norm. Since ∥W∥_□ ≤ ∥W∥₁, approximating in the L¹ norm seems to be a stronger result. However, the error in the L¹-norm approximation depends not only on the number of steps, but on W as well. The crucial fact about Lemma 9.9 is that the error tends to 0 as k → ∞, uniformly in W.
The error bound in Lemma 9.9 is only attractive when compared with the error bound in the stronger versions; for a prescribed error ε, the number of partition classes we need is still exponential in 1/ε². Frieze and Kannan give a stronger form of this result that provides a polynomial size description of the approximating stepfunction.

Lemma 9.10. For every kernel U ∈ W₁ and k ≥ 1 there are k pairs of subsets S_i, T_i ⊆ [0, 1] and k real numbers a_i such that

∥U − ∑_{i=1}^k a_i 1_{S_i×T_i}∥_□ < 1/√k.

It is clear that the function ∑_i a_i 1_{S_i×T_i} is a stepfunction; we can make it symmetric by taking the average with ∑_i a_i 1_{T_i×S_i}, getting 2k terms. This symmetric stepfunction has at most 2^{2k} steps, so Lemma 9.9 follows from Lemma 9.10 (replacing k by 2^{2k}).
We have mentioned the significance of the interplay between the cut norm and other kernel norms. The proof of the Regularity Lemma is the first point where this is apparent. For later reference, we state the key observation in the proof of the Weak Regularity Lemma separately, in two versions.

Lemma 9.11. (a) For every U ∈ W there are two sets S, T ⊆ [0, 1] and a real number 0 ≤ a ≤ ∥U∥_∞ such that

∥U − a1_{S×T}∥₂² ≤ ∥U∥₂² − ∥U∥_□².
(b) Let U ∈ W and let P be a measurable k-partition of [0, 1]. Then there is a partition Q refining P with at most 4k classes such that

∥U − U_P∥_□ = ∥U_Q − U_P∥_□.

Proof. Let S and T be measurable subsets of [0, 1] such that

∥U∥_□ = |∫_{S×T} U| = |⟨U, 1_{S×T}⟩|,

where we may assume that ⟨U, 1_{S×T}⟩ ≥ 0. Let a = ∥U∥_□/(λ(S)λ(T)). Then

(9.8)    ∥U − a1_{S×T}∥₂² = ∥U∥₂² − ∥U∥_□²/(λ(S)λ(T)) ≤ ∥U∥₂² − ∥U∥_□².

This proves (a).
The proof of (b) is similar. The inequality ∥U − U_P∥_□ ≥ ∥U_Q − U_P∥_□ follows by the contractivity of the stepping operator (Exercise 9.17). To prove the other direction, let S and T be measurable subsets of [0, 1] such that ⟨U − U_P, 1_{S×T}⟩ = ∥U − U_P∥_□, and let Q denote the partition generated by P, S and T. Clearly Q has at most 4k classes. Using (9.5), we get ⟨U, 1_{S×T}⟩ = ⟨U_Q, 1_{S×T}⟩, and hence

∥U − U_P∥_□ = ⟨U − U_P, 1_{S×T}⟩ = ⟨U_Q − U_P, 1_{S×T}⟩ ≤ ∥U_Q − U_P∥_□.
This completes the proof.
Proof of Lemma 9.10. We apply Lemma 9.11(a) repeatedly, to get pairs of sets S_i, T_i and real numbers a_i such that the “remainders” W_j = U − ∑_{i=1}^j a_i 1_{S_i×T_i} satisfy

∥W_j∥₂² ≤ ∥U∥₂² − ∑_{i=0}^{j−1} ∥W_i∥_□².

Since the right hand side remains nonnegative, it follows that for every k there is a 0 ≤ i < k with ∥W_i∥_□² ≤ 1/k. Changing a_{i+1}, . . . , a_k to 0, we get the lemma.

The stepfunction approximating a given graphon W in Lemma 9.9 is usually not a stepping of W. Is the optimally approximating stepfunction necessarily a stepping of W? While this looks plausible, the answer is negative (see Exercise 9.18). But, as noted by Frieze and Kannan [1999], such steppings are almost optimal:

Lemma 9.12. Let W ∈ W₁, let U be a stepfunction, and let P denote the partition of [0, 1] into the steps of U. Then

∥W − W_P∥_□ ≤ 2∥W − U∥_□.

Proof. Using that U = U_P and the contractivity of the stepping operator with respect to the cut norm, we get

∥W − W_P∥_□ ≤ ∥W − U∥_□ + ∥U − W_P∥_□ = ∥W − U∥_□ + ∥U_P − W_P∥_□ = ∥W − U∥_□ + ∥(U − W)_P∥_□ ≤ 2∥W − U∥_□.

Lemmas 9.9 and 9.12 imply:
Corollary 9.13. For every function W ∈ W₁ and k ≥ 1 there is a partition P of [0, 1] into at most k sets with positive measure for which

∥W − W_P∥_□ ≤ 2/√(log k).

A partition P of [0, 1] such that ∥W − W_P∥_□ ≤ ε will be called a weak regularity partition of W with error ε.
Lemma 9.11(b) provides an alternative way of getting this corollary. It is easy to check that ⟨U_Q − U_P, U_P⟩ = ⟨U − U_P, U_P⟩ = 0 if Q is a refinement of P. Hence

∥U − U_P∥_□² = ∥U_Q − U_P∥_□² ≤ ∥U_Q − U_P∥₂² = ∥U_Q∥₂² − ∥U_P∥₂² = ∥U − U_P∥₂² − ∥U − U_Q∥₂².

So U_Q is a better approximation of U in L² than U_P, and the gain is at least as large as ∥U − U_P∥_□². From here we can conclude just as in the proof above.
Coming back to the approximation described in Lemma 9.10, it is often useful to have a bound on the numbers a_i. With the notation of its proof, looking at the proof carefully, we see that

a_i = ∥W_i∥_□/(λ(S_i)λ(T_i))

and

∥W_i∥_□²/(λ(S_i)λ(T_i)) = ∥W_i∥₂² − ∥W_{i+1}∥₂²,

whence

∑_{i=1}^k λ(S_i)λ(T_i)a_i² = ∑_{i=1}^k (∥W_i∥₂² − ∥W_{i+1}∥₂²) = ∥U∥₂² − ∥W_k∥₂² ≤ ∥U∥₂².

This bound allows a_i to be large when λ(S_i)λ(T_i) is small, but it is easy to fix the argument to get a more useful bound. Instead of choosing the optimal sets S_i and T_i, we choose a pair S_i, T_i such that λ(S_i), λ(T_i) ≥ 1/2, and

⟨W_i, 1_{S_i×T_i}⟩ ≥ (1/4) ∥W_i∥_□.

It is easy to see that such a pair exists (cf. Exercise 8.4). Then we get the following:

Lemma 9.14. For every kernel W ∈ W₁ and k ≥ 1 there are k pairs of subsets S_i, T_i ⊆ [0, 1] and k real numbers a_i such that ∑_i a_i² ≤ 4 and

∥W − ∑_{i=1}^k a_i 1_{S_i×T_i}∥_□ < 4/√k.

We can easily add other requirements in Lemma 9.9.

Lemma 9.15. Let W ∈ W₁ and 1 ≤ m < k. (a) For every m-partition Q of [0, 1] there is a k-partition P refining Q such that

∥W − W_P∥_□ ≤ 2/√(log(k/m)).

(b) For every m-partition Q of [0, 1] there is an equipartition P with k classes such that

∥W − W_P∥_□ ≤ 2∥W − W_Q∥_□ + 2m/k.
Proof. Statement (a) follows by the same argument as Lemma 9.9, just starting with Q instead of the indiscrete partition. To prove (b), we partition each class of Q into classes of measure 1/k, with at most one exceptional class of size less than 1/k. Keeping all classes of size 1/k, let us take the union of the exceptional classes, and repartition it into classes of size 1/k, to get a partition P. To analyze this construction, let us also consider the common refinement R = P ∧ Q. Then W_R and W_P differ on a set of measure less than 2(m/k), and so

∥W − W_P∥_□ ≤ ∥W − W_R∥_□ + 2m/k.
Lemma 9.12 implies that ∥W − WR∥ ≤ 2∥W − WQ∥, which completes the proof.

9.2.3. Strong Regularity Lemma. The Strong Regularity Lemma too has a "continuous" version:

Lemma 9.16 (Strong Regularity Lemma for Kernels). For every sequence ε = (ε0, ε1, ...) of positive numbers there is a positive integer S(ε) such that for every graphon W, there is another graphon W′, and a stepfunction U ∈ W0 with k ≤ S(ε) steps such that

(9.9)    ∥W − W′∥1 ≤ ε0    and    ∥W′ − U∥ ≤ εk.
We will give a proof of this lemma, deriving it from an even more general theorem, in the next section. Here we sketch how to derive the graph version 9.5 from the kernel version.

Let (ε0, ε1, ...) be a sequence of positive numbers, which we may assume is monotone decreasing. Let G be a simple graph on [n]. We apply Lemma 9.16 with the sequence (εk/2) to WG, to get a threshold S′ (depending only on (ε0, ε1, ...)), a kernel W′ and a partition P of [0,1] such that |P| ≤ S′, ∥WG − W′∥1 ≤ ε0/2 and ∥W′ − W′P∥ ≤ εk/2, where k = |P|.

First, we have to turn W′ into a graph G′. This can be done by randomization. Let Ii = ((i−1)/n, i/n] and Rij = Ii × Ij. We connect i and j with probability n² ∫_{Rij} W′. The probability that this edge will be in the symmetric difference of E(G) and E(G′) is at most n² ∫_{Rij} |WG − W′|, and hence the expected (normalized) edit distance between G and G′ is at most ∥WG − W′∥1 ≤ ε0/2. Markov's inequality gives that with probability at least 1/2, the distance d1(G, G′) ≤ ε0.

Next, we have to turn the partition P of [0,1] into a partition Q of [n]. We do this randomly again, by selecting a uniform random point Xi ∈ Ii (i = 1, ..., n), and putting i into the m-th class of Q if Xi is in the m-th class of P. A bit trickier computation with second moments (which is similar to the proof of Proposition 12.19, but simpler, and is not given here) shows that with high probability, d(G′, (G′)Q) ≤ εk/2 + 10/√n.

Now we choose k0 = max(S′, 400/ε_{S′}²). If n ≤ k0, then we can take G′ = G and partition [n] into singletons. If n > k0, then with positive probability the partition Q constructed above satisfies |Q| = k ≤ S′ ≤ k0, d1(G, G′) ≤ ε0 and d(G′, (G′)Q) ≤ εk/2 + 10/√n ≤ εk.

Exercise 9.17. Prove that the stepping operator is contractive with respect to the L1 norm and the cut norm.
Exercise 9.18. Show by an example that the best approximation in the cut norm of a function W ∈ W1 by a stepfunction with a given number of steps is not necessarily a stepping of W. Is stepping the best approximation in the L2 or in the L1 norm?

Exercise 9.19. Are analogues of Lemma 9.12 valid for the L1, L2 and L∞ norms?

Exercise 9.20. Formulate and prove the original Regularity Lemma for kernels.

Exercise 9.21. Give a proof of the Strong Regularity Lemma 9.16 along the lines of the proof of Lemma 9.9.

Exercise 9.22. Let K1, K2, ... be arbitrary nonempty subsets of a Hilbert space H. Prove that for every ε > 0 and f ∈ H there is an integer 1 ≤ m ≤ ⌈1/ε²⌉ and a vector f0 = α1 f1 + ··· + αm fm (αi ∈ ℝ, fi ∈ Ki) such that for every g ∈ Km+1 we have |⟨g, f − f0⟩| ≤ ε∥g∥∥f∥. Derive the weak, original, and strong lemmas by choosing the sets Ki appropriately.
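For experimenting with these exercises, the stepping operator has a simple matrix analogue: average each block determined by a partition of the index set. The sketch below is an illustration under that assumption (a matrix standing in for the kernel, partition classes given as index lists), with a numerical check of the L1 contractivity claim of Exercise 9.17.

```python
import numpy as np

def stepping(W, parts):
    """Replace each block W[P_i x P_j] by its average: the matrix analogue of W_P."""
    WP = np.empty_like(W, dtype=float)
    for Pi in parts:
        for Pj in parts:
            WP[np.ix_(Pi, Pj)] = W[np.ix_(Pi, Pj)].mean()
    return WP

rng = np.random.default_rng(1)
n = 12
W = rng.uniform(0, 1, (n, n)); W = (W + W.T) / 2
U = rng.uniform(0, 1, (n, n)); U = (U + U.T) / 2
parts = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]

lhs = np.abs(stepping(W, parts) - stepping(U, parts)).mean()  # L1 after stepping
rhs = np.abs(W - U).mean()                                    # L1 before stepping
print(lhs <= rhs + 1e-12)    # stepping never increases the normalized L1 distance
```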
9.3. Compactness of the graphon space

In this section we prove a theorem of Lovász and Szegedy [2007] that is equivalent (at least in a non-effective sense) to all versions of the Regularity Lemma. To be more precise, we will derive the theorem from the Weak Regularity Lemma, and then we will show that the Strong Regularity Lemma can be derived from it quite easily. In a sense, this theorem can be considered as the strongest form of regularity.

Theorem 9.23. The space (W̃0, δ) is compact.

Proof. In a metric space, it suffices to prove that every sequence W1, W2, ... of graphons has a convergent subsequence. For every n ≥ 1, we can construct the partitions Pn,k of [0,1] (k = 1, 2, ...), using Lemma 9.15, such that these partitions and the corresponding stepfunctions Wn,k = (Wn)_{Pn,k} ∈ W0 satisfy the following conditions:
(i) ∥Wn − Wn,k∥ ≤ 1/k,
(ii) the partition Pn,k+1 refines Pn,k,
(iii) |Pn,k| = mk depends only on k.
Once we have such partitions, we can rearrange the points of [0,1] for every fixed n by a measure preserving bijection so that every partition class in every Pn,k is an interval.

Claim 9.24. We can replace the sequence (Wn) by a subsequence so that for every k, the sequence Wn,k converges almost everywhere to a stepfunction Uk with mk steps as n → ∞.

Indeed, we can select a subsequence of the Wn for which the length of the i-th interval of Wn,1 converges for every i, and also the value of Wn,1 on the product of the i-th and j-th intervals converges for every i and j (as n → ∞). It follows then that the sequence Wn,1 converges to a limit U1 almost everywhere, which itself is a stepfunction with m1 steps that are intervals. We repeat this for k = 2, 3, ..., to get subsequences for which Wn,k → Uk almost everywhere, where Uk is a stepfunction with mk steps that are intervals. As usual, we always keep the k-th function after the k-th step. This yields the subsequence with the properties in the Claim.
Let Pk denote the partition of [0,1] into the steps of Uk. For every k < l, the partition Pn,l is a refinement of the partition Pn,k, and hence Wn,k = (Wn,l)_{Pn,k}. It is easy to see that this kind of relation is inherited by the limiting stepfunctions:

(9.10)    Uk = (Ul)_{Pk}.
Let (X, Y) be a random point in [0,1]² chosen uniformly; then (9.10) implies that the sequence (U1(X, Y), U2(X, Y), ...) is a martingale. Since the random variables Ui(X, Y) remain bounded, the Martingale Convergence Theorem A.12 implies that this sequence is convergent with probability 1. In other words, the sequence of functions (U1, U2, ...) is convergent almost everywhere. Let U be its limit; we show that δ(U, Wn) → 0.

Fix any ε > 0. Then there is a k > 3/ε such that ∥U − Uk∥1 < ε/3. Fixing this k, there is an n0 such that ∥Uk − Wn,k∥1 < ε/3 for all n ≥ n0. Then (using (i) and k > 3/ε for the last term)

δ(U, Wn) ≤ δ(U, Uk) + δ(Uk, Wn,k) + δ(Wn,k, Wn) ≤ ∥U − Uk∥1 + ∥Uk − Wn,k∥1 + δ(Wn,k, Wn) ≤ ε/3 + ε/3 + ε/3 = ε.
This completes the proof of Theorem 9.23.
The theorem remains valid if we replace W̃0 by any uniformly bounded subset of W̃, closed in the δ distance. For example, the space (W̃1, δ) is also compact. This can be proved by the same argument, or by noticing that W ↦ 2W − 1 is a mapping (W̃0, δ) → (W̃1, δ) that is continuous and surjective, and so it preserves compactness.

An easy consequence of Theorem 9.23 is the following:

Corollary 9.25. For every ε > 0 there is an integer k(ε) ≥ 1 such that simple graphs with k(ε) nodes form an ε-net in (W0, δ).

We conclude this section with showing that Theorem 9.23 implies the Strong Regularity Lemma quite easily.

Proof of Lemma 9.16. Every graphon W is the limit of stepfunctions in the ∥.∥1 norm, hence there is a stepfunction U ∈ W0 with ∥W − U∥1 ≤ ε0. This means that the sets B1(U, ε0), where U is a stepfunction, cover the whole space W0. Unfortunately, these sets are not open in the cut norm. Therefore, we take slightly larger sets. Let k(U) denote the number of steps of a stepfunction U, and define

AU = {W ∈ W0 : (∃V ∈ W0) ∥U − V∥ < ε_{k(U)}, ∥V − W∥1 < ε0}.

Claim 9.26. The set AU is open in the cut norm.

Indeed, let W ∈ AU and Wn ∈ W0 such that ∥W − Wn∥ → 0. By the definition of AU, there is a graphon V such that ∥U − V∥ < ε_{k(U)} and ∥V − W∥1 < ε0. By Proposition 8.25, there are graphons Vn such that ∥V − Vn∥ → 0 and ∥Wn − Vn∥1 → ∥V − W∥1 < ε0. So if n is large enough, we have ∥U − Vn∥ < ε_{k(U)} and ∥Vn − Wn∥1 < ε0, showing that Wn ∈ AU.

Now we have to go to the factor space W̃0. For every stepfunction U, we consider the sets

ÃU = {W ∈ W̃0 : (∃V ∈ W̃0) δ(U, V) < ε_{k(U)}, δ1(V, W) < ε0}.
It is easy to see, using Claim 9.26, that ÃU is open in (W̃0, δ). The sets ÃU cover the whole space, so by the compactness of the space (Theorem 9.23), we obtain a finite set of stepfunctions U1, ..., Ut such that ∪_{i=1}^t ÃUi = W̃0.

We claim that we can choose S(ε) = max_{i≤t} k(Ui) to satisfy the requirements of the lemma. Indeed, for every graphon W there is a stepfunction Ui (1 ≤ i ≤ t) such that W ∈ ÃUi, which means that there is a graphon V such that δ(Ui, V) < ε_{k(Ui)} and δ1(V, W) < ε0. We can apply measure preserving bijections φ, ψ to get d1(V^φ, W) < ε0 and d(Ui^{ψ∘φ}, V^φ) < ε_{k(Ui)}. Since Ui^{ψ∘φ} is a stepfunction with k(Ui) steps, we can take U = Ui^{ψ∘φ} and W′ = V^φ to complete the proof.

Exercise 9.27. Prove that for every ε > 0 there is a positive integer S(ε) such that for every graphon W there is another graphon W′ and a stepfunction U ∈ W0 with k ≤ S(ε) steps such that ∥W − W′∥ ≤ ε/k! and ∥W′ − U∥1 ≤ ε.
9.4. Fractional and integral overlays

Using the Regularity Lemma, we are now ready to discuss the problem of comparing the two distances δ and δ̂, as raised in Section 8.1.4. If two graphs G1 and G2 have the same number of nodes, then the inequality δ(G1, G2) ≤ δ̂(G1, G2) is easy, but what can we say in the other direction? (This is admittedly a very technical issue, but it comes up in all sorts of arguments.) Perhaps the following very close connection is true:

Conjecture 9.28. For any two simple graphs G and G′ on n nodes, δ̂(G, G′) ≤ 2δ(G, G′).

Unfortunately, I can only offer some weaker bounds (but these will be sufficient for the applications later). We will need these results for weighted graphs too. The following theorem is a combination of results of Borgs, Chayes, Lovász, Sós and Vesztergombi [2008] and Alon [unpublished].

Theorem 9.29. For any two edge-weighted graphs H1 and H2 with the same number n of nodes, with edgeweights in [0,1], we have the following inequalities:

(9.11)    δ̂(H1, H2) ≤ n⁶ δ(H1, H2),
(9.12)    δ̂(H1, H2) ≤ δ(H1, H2) + 17/√(log n),
(9.13)    δ̂(H1, H2) ≤ 45/√(−log δ(H1, H2)).
Proof. The first inequality is quite easy. Let (Xui) be an optimal fractional overlay of H1 and H2. We claim that there is a bijection π : V1 → V2 such that X_{u,π(u)} ≥ 1/n³ for all u ∈ V(H1). This follows from the Marriage Theorem: if there is no such bijection, then there are two sets S ⊆ V1 and T ⊆ V2 such that |S| + |T| > n and Xst < 1/n³ for all s ∈ S and t ∈ T. Then X(S, T) ≤ |S||T|/n³ < 1/n. On the other hand,

X(S, T) = X(S, V2) − X(S, V2 \ T) ≥ |S|/n − |V2 \ T|/n = (|S| + |T| − n)/n ≥ 1/n,

a contradiction.
Let Y be the fractional overlay corresponding to the bijection π. Then Y ≤ n³X, and hence

δ̂(H1, H2) ≤ d(H1, H2, Y) ≤ n⁶ d(H1, H2, X) = n⁶ δ(H1, H2).

This proves (9.11).

To prove the second inequality (9.12), let k = ⌊n^{1/3}⌋. By the Weak Regularity Lemma 9.3, there are partitions P = {V1, ..., Vk} of V(H1) and Q = {U1, ..., Uk} of V(H2) into k almost equal classes so that

d(H1, (H1)P), d(H2, (H2)Q) ≤ 4/√(log k).

For the weighted k-node "template" graphs H1/P and H2/Q we have

δ(H1/P, H2/Q) = δ((H1)P, (H2)Q) ≤ δ(H1, H2) + δ(H1, (H1)P) + δ(H2, (H2)Q) ≤ δ(H1, H2) + 8/√(log k).

Let (Xij)_{i,j=1}^k be an optimal fractional overlay of H1/P and H2/Q. We define a bijection φ : V(H1) → V(H2) by mapping ⌊Xij n⌋ nodes of Vi to Uj arbitrarily. This is possible, since

∑_{j=1}^k ⌊Xij n⌋ ≤ ∑_{j=1}^k Xij n = |Vi|    and    ∑_{i=1}^k ⌊Xij n⌋ ≤ ∑_{i=1}^k Xij n = |Uj|.
The nodes left in the two graphs are matched with each other arbitrarily. The bijection φ between V(H1) = V((H1)P) and V(H2) = V((H2)Q) defines a fractional overlay Y between H1/P and H2/Q, such that d(φ((H1)P), (H2)Q) = d(H1/P, H2/Q, Y). The fractional overlays X and Y are very close: |Xij − Yij| ≤ 1/n for every 1 ≤ i, j ≤ k. Hence it follows that

d(H1/P, H2/Q, Y) ≤ d(H1/P, H2/Q, X) + k²/n = δ(H1/P, H2/Q) + k²/n.
Combining, we get

δ̂(H1, H2) ≤ d(φ(H1), H2) ≤ d(φ((H1)P), (H2)Q) + 8/√(log k) = d(H1/P, H2/Q, Y) + 8/√(log k) ≤ δ(H1/P, H2/Q) + k²/n + 8/√(log k) ≤ δ(H1, H2) + k²/n + 16/√(log k).

Recalling the choice of k, we get (9.12).

Finally, the third inequality (9.13) follows easily from the first two. If n < δ(H1, H2)^{−1/7}, then (9.11) implies that

δ̂(H1, H2) ≤ δ(H1, H2)^{1/7} < 45/√(−log δ(H1, H2)),
while for n ≥ δ(H1, H2)^{−1/7}, (9.12) gives that

δ̂(H1, H2) ≤ δ(H1, H2) + 17/√(log(δ(H1, H2)^{−1/7})) ≤ 45/√(−log δ(H1, H2)).

9.5. Uniqueness of regularity partitions

Theorem 9.31. Let ε > 0, let W, W1, W2 be graphons, and let P1, P2 be equipartitions of [0,1] into k parts so that ∥W − Wi∥1 ≤ ε and ∥Wi − (Wi)_{Pi}∥ ≤ ε/k⁴ (i = 1, 2). Then δ1(W1/P1, W2/P2) ≤ 8ε.
Figure 9.1. Proving uniqueness of strong regularity partitions. Heavy lines mean small cut distance, broken lines mean small L1 distance.

Proof. Let Q = P1 ∨ P2 denote the common refinement of P1 and P2; clearly Q has at most k² classes, and ∥W1 − W2∥1 ≤ 2ε. By the contractivity of the stepping operator (Exercise 9.17 or Proposition 14.13), we have

∥(W1)_{P1} − (W2)_{P1}∥1 ≤ 2ε,  ∥(W1)_{P2} − (W2)_{P2}∥1 ≤ 2ε,  ∥(W1)_Q − (W2)_Q∥1 ≤ 2ε,

and

∥(W1)_{P2} − (W1)_Q∥ ≤ ∥W1 − (W1)_{P1}∥ ≤ ε/k⁴,  ∥(W2)_{P1} − (W2)_Q∥ ≤ ∥W2 − (W2)_{P2}∥ ≤ ε/k⁴

(see Figure 9.1 for the chain of small distances followed by the proof). Hence

δ1(W1/P1, W2/P2) = δ1((W1)_{P1}, (W2)_{P2}) ≤ ∥(W1)_{P1} − (W2)_{P2}∥1
≤ ∥(W1)_{P1} − (W2)_{P1}∥1 + ∥(W2)_{P1} − (W2)_Q∥1 + ∥(W2)_Q − (W1)_Q∥1 + ∥(W1)_Q − (W1)_{P2}∥1 + ∥(W1)_{P2} − (W2)_{P2}∥1.

By the trivial inequality (8.15), ∥(W2)_{P1} − (W2)_Q∥1 ≤ k⁴∥(W2)_{P1} − (W2)_Q∥ ≤ ε, and similarly ∥(W1)_{P2} − (W1)_Q∥1 ≤ ε. Substituting these bounds, the theorem follows.
The proof above bounds the fractional edit distance of the two template graphs W1/P1 and W2/P2. Using Pikhurko's Theorem 9.30, we could replace it by the integral version of the edit distance δ̂1, at the cost of another factor of 3. Using Exercise 8.18, we could replace the bound ∥Wi − (Wi)_{Pi}∥ ≤ ε/k⁴ by ∥Wi − (Wi)_{Pi}∥ ≤ ε/(2k). Also, we could derive similar bounds for the weighted graphs W/P1 and W/P2.
CHAPTER 10
Sampling

We turn to the analysis of sampling from a graph, our basic method of gathering information about very large dense graphs. In fact, most of the time we prove our results in the framework of sampling from a graphon. We start with describing what it means to sample from a graphon.

10.1. W-random graphs

A graphon W gives rise to a way of generating random graphs that are more general than the Erdős–Rényi graphs. This construction was introduced independently by Diaconis and Freedman [1981], Boguñá and Pastor-Satorras [2003], Lovász and Szegedy [2006], and Bollobás, Janson and Riordan [2007], and quite probably implicitly by others.

Given a graphon W and an ordered set S = (x1, ..., xn), where xi ∈ [0,1], we define a weighted graph H(S, W) on node set [n] by assigning weight W(xi, xj) to edge ij (i, j ∈ [n], i ≠ j). We give weight 0 to the loops. Every weighted graph H with edgeweights βij(H) ∈ [0,1] gives rise to a random simple graph G(H) on V(H): we connect nodes i and j with probability βij(H), making an independent decision for distinct pairs (i, j) (i, j ∈ [n], i ≠ j). In particular, we can construct a random simple graph G(S, W) = G(H(S, W)). For an integer n > 0, we define the random weighted graph H(n, W) = H(S, W), and the random simple graph G(n, W) = G(S, W), where S is an ordered n-tuple of independent uniform random points from [0,1].

To mention some special cases, if W is identically equal to p, we get "ordinary" random graphs G(k, p). If W = WG for some simple graph G, then G(k, WG) = H(k, WG) is "almost" the same as the random induced subgraph G(k, G) of G. To be more precise, if we condition on x1, ..., xk belonging to different steps of WG, then G(k, WG) is a random k-node induced subgraph. The set this condition excludes, namely sequences x1, ..., xk containing repetitions, has measure at most k(k−1)/(2v(G)). Hence

(10.1)    dvar(G(k, G), G(k, WG)) ≤ k(k−1)/(2v(G)).

It is straightforward to extend this construction to generating a countable random graph G(W) on N: we generate an infinite sequence X1, X2, ... of independent uniformly distributed random points from [0,1], and (as before) connect nodes i and j with probability W(Xi, Xj).

Remark 10.1. There are two ways of thinking about a graphon as a generalized graph. First, we can consider it as a weighted graph with node set [0,1]. Second, we may think of each element x ∈ [0,1] as an infinite set Sx of nodes with infinitesimally
small measure, where there is a random bipartite graph Gx,y between Sx and Sy with density W(x, y). These random bipartite graphs must be independent as random variables, which makes this impossible to construct in standard measure theory (one can construct such an object in non-standard analysis, cf. Section 11.3.2). But often this is a useful informal way of thinking of a graphon. The two random samples H(n, W) and G(n, W) correspond to these two ways of looking at graphons.

The definition of the sampling distance can also be extended from simple graphs to graphons (recall (1.2) for graphs):

(10.2)    δsamp(U, W) = ∑_{k=1}^∞ (1/2^k) dvar(G(k, U), G(k, W)).
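As an aside, both sampling procedures defined above take only a few lines to simulate. The sketch below is an illustration, assuming the graphon is given as a vectorized symmetric function of two variables; the names H_random, G_of_H and G_random are ad hoc, not from the text.

```python
import numpy as np

def H_random(n, W, rng):
    """H(n, W): edge ij gets weight W(X_i, X_j), X_1,...,X_n i.i.d. uniform; loops get 0."""
    X = rng.uniform(0, 1, n)
    Hw = W(X[:, None], X[None, :])
    np.fill_diagonal(Hw, 0.0)
    return Hw

def G_of_H(Hw, rng):
    """G(H): connect i and j independently with probability beta_ij(H)."""
    n = Hw.shape[0]
    A = np.triu(rng.uniform(0, 1, (n, n)) < Hw, 1)   # independent coins above the diagonal
    return (A | A.T).astype(int)

def G_random(n, W, rng):
    """G(n, W) = G(H(n, W))."""
    return G_of_H(H_random(n, W, rng), rng)

rng = np.random.default_rng(2)
W = lambda x, y: x * y            # an arbitrary example graphon
print(G_random(6, W, rng))
```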
Using the fact that for any graphon U and simple graph F on node set [k], the probability that G(k, U) = F is just tind(F, U), we have for all U, W ∈ W0

(10.3)    dvar(G(k, U), G(k, W)) = (1/2) ∑_{F ∈ F_k^simp} |tind(F, U) − tind(F, W)|.

Hence

(10.4)    δsamp(U, W) = ∑_F 2^{−v(F)−1} |tind(F, U) − tind(F, W)|,
where F ranges through all finite graphs with V(F) = {1, ..., v(F)}. By (10.1) the distributions of G(k, G) and G(k, WG) are almost the same if v(G) is large, and hence

(10.5)    |δsamp(F, G) − δsamp(WF, WG)| ≤ 4/v(G).

While the sampling procedure described above is the most natural and most often used, we sometimes need to sample in other ways. In Lemma 10.18 we will describe a sampling method where the random selection of the nodes is more restricted, but which is still good enough to get the same information about W (however, we need much larger samples).

There are other uses of graphons and kernels in generating random graphs. Bollobás, Borgs, Chayes and Riordan [2010] and Bollobás, Janson and Riordan [2007] study sparse random graphs generated from a nonnegative kernel W by constructing a (W/n)-random graph on n nodes. Bollobás, Janson and Riordan [2010] and Bollobás and Riordan [2009] study random trees generated from a graphon. Palla, Lovász and Vicsek [2010] construct sparse random graphs as (W^{⊗n})-random graphs with n′ nodes, where n and n′ are chosen so as to keep the average degree constant. We will not go into the details of these constructions.

10.2. Sample concentration

If we take a bounded size sample from a graph, we can see very different graphs. For a sufficiently large random graph, for example, we can see anything. The natural way to use the sample G[S] is to compute some graph parameter f(G[S]). But this parameter can vary wildly with the choice of the sample, so what information do we get?
The following theorem asserts that every reasonably smooth parameter of a sample is highly concentrated. (Note: we don't say anything here about the connection between the value of the parameter on the whole graph and on the sample! We return to this question in Chapter 15.) Let us define a reasonably smooth graph parameter as a parameter f satisfying |f(G) − f(G′)| ≤ 1 for any two graphs G and G′ on the same node set whose edge sets differ only in edges incident with a single node. More generally, we define a parameter of edge-weighted graphs as reasonably smooth if |f(H) − f(H′)| ≤ 1 for any two edge-weighted graphs on the same node set that differ only in the weights of edges incident with a single node.

Theorem 10.2 (Sample Concentration for Graphs). Let f be a reasonably smooth graph parameter, let G be a graph, and let 1 ≤ k ≤ v(G). Let f0 = E(f(G(k, G))); then for every t ≥ 0,

P( f(G(k, G)) ≥ f0 + √(2tk) ) ≤ e^{−t}.

The result extends to graphons. We formulate two versions, corresponding to the two sampling methods defined above.

Theorem 10.3 (Sample Concentration for Graphons). (a) Let f be a reasonably smooth simple graph parameter, let W ∈ W0, and let k ≥ 1. Let f0 = E(f(G(k, W))); then for every t ≥ 0,

P( f(G(k, W)) ≥ f0 + √(2tk) ) ≤ e^{−t}.

(b) Let f be a reasonably smooth parameter of edge-weighted graphs. Let W ∈ W, let k ≥ 1, and let f0 = E(f(H(k, W))); then for every t > 0,

P( f(H(k, W)) ≥ f0 + √(2tk) ) ≤ e^{−t}.

In both theorems, we can apply the same inequality to the function −f, to obtain a bound on the probability of a large deviation from the mean in the other direction.

Proof. The function f(G({x1, ..., xk}, W)) (as a function of x1, ..., xk ∈ [0,1]) satisfies the conditions of Corollary A.15 of Azuma's Inequality, and hence applying the inequality with n = k and ε = (2t/k)^{1/2}, the inequality in (a) follows. The proof of (b) is essentially the same.

Applying this theorem with f(G) = (v(G)/v(F)) tinj(F, G) (which is reasonably smooth), and combining it with (5.21), we get the following concentration inequalities for subgraph densities:

Corollary 10.4. Let W ∈ W0, n ≥ 1, 0 < ε < 1, and let F be a simple graph; then the W-random graph G = G(n, W) satisfies

P( |tinj(F, G) − t(F, W)| > ε ) ≤ 2 exp(−ε²n/(2v(F)²))

and

P( |t(F, G) − t(F, W)| > ε ) ≤ 2 exp(−ε²n/(8v(F)²)).

We will see in Section 10.4 that not only numerical parameters of subgraph samples are concentrated, but the samples themselves are concentrated in the cut distance.
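Corollary 10.4 is easy to watch in a simulation. The sketch below is an illustration only: it uses the graphon W(x, y) = (x + y)/2, for which t(K3, W) = 5/32 by direct integration (an assumption of this example, not computed in the text), and observes that sampled triangle densities cluster around this value.

```python
import numpy as np

rng = np.random.default_rng(3)
W = lambda x, y: 0.5 * (x + y)

def G_random(n):
    X = rng.uniform(0, 1, n)
    P = W(X[:, None], X[None, :]); np.fill_diagonal(P, 0)
    A = np.triu(rng.uniform(0, 1, (n, n)) < P, 1)
    return (A | A.T).astype(float)

def tinj_triangle(A):
    # trace(A^3) counts injective homomorphisms of K3 into a simple graph exactly
    n = len(A)
    return np.trace(A @ A @ A) / (n * (n - 1) * (n - 2))

n, reps = 200, 50
vals = [tinj_triangle(G_random(n)) for _ in range(reps)]
print("mean:", np.mean(vals), " std:", np.std(vals))
# mean is near t(K3, W) = 5/32 = 0.15625; the spread shrinks like O(1/sqrt(n))
```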
10.3. Estimating the distance by sampling

10.3.1. The main sampling lemma. Among the main technical tools used in this book are a couple of probabilistic theorems, which relate sampling to cut distance. The first of these theorems is due to Alon, Fernandez de la Vega, Kannan and Karpinski [2003], with an improvement by Borgs, Chayes, Lovász, Sós and Vesztergombi [2008]. Its proof will be quite involved. Its main implication is that the d-distance of two graphs on the same set of nodes can be estimated by sampling.

Lemma 10.5 (First Sampling Lemma for Graphs). Let G and H be weighted graphs with V(G) = V(H), with the same node weights, and with edge weights in [0,1]. Let k ≤ v(G) be a positive integer, and let S be chosen uniformly from all subsets of V(G) of size k. Then with probability at least 1 − 4e^{−√k/10},

|d(G[S], H[S]) − d(G, H)| ≤ 8/k^{1/4}.

This lemma extends to kernels, and this is the form which we prove. For U ∈ W and X = (X1, ..., Xk) ⊆ [0,1], let U[X] denote the symmetric k × k matrix defined by (U[X])ij = U(Xi, Xj).

Lemma 10.6 (First Sampling Lemma for Kernels). Let U ∈ W1 and let X be a random ordered k-subset of [0,1]. Then with probability at least 1 − 4e^{−√k/10},

−3/k ≤ ∥U[X]∥ − ∥U∥ ≤ 8/k^{1/4}.

Not only are the lower and upper bounds in this lemma different, they are also quite different in difficulty. To prove the lower bound is rather straightforward, but the proof of the upper bound will need a couple of lemmas about a tricky sampling procedure estimating the sum of entries of a matrix.

It will be more convenient to work with the following one-sided version of the cut norm:

∥A∥⁺ = (1/n²) max_{S,T⊆[n]} ∑_{i∈S, j∈T} Aij

for an n × n matrix A, and

∥W∥⁺ = sup_{S,T⊆[0,1]} ∫_{S×T} W(x, y) dx dy

for a kernel W. We note that ∥A∥ = max{∥A∥⁺, ∥−A∥⁺}, and similarly for the cut norm of kernels. In terms of this norm, we are going to prove the following similar bounds:
Lemma 10.7. Let U ∈ W1 and let X be a random ordered k-subset of [0,1]. Then with probability at least 1 − 2e^{−√k/10},

−3/k ≤ ∥U[X]∥⁺ − ∥U∥⁺ ≤ 8/k^{1/4}.

Let B = U[X]. For any set Q1 of rows and any set Q2 of columns, we set B(Q1, Q2) = ∑_{i∈Q1, j∈Q2} Bij. We denote by Q1⁺ the set of columns j ∈ [k] for which B(Q1, {j}) > 0. We define the set of columns Q1⁻ and the sets of rows Q2⁺, Q2⁻ analogously. Note that B(Q1, Q1⁺), B(Q2⁺, Q2) ≥ 0 by this definition.
We start with proving an inequality for the case when only a random subset Q of columns is selected.

Lemma 10.8. Let S1, S2 ⊆ [k], and let Q be a random q-subset of [k] (1 ≤ q ≤ k). Then

B(S1, S2) ≤ E_Q( B((Q ∩ S2)⁺, S2) ) + k²/√q.

Proof. The inequality is clearly equivalent to the following:

(10.6)    E_Q( B((Q ∩ S2)⁻, S2) ) ≤ k²/√q.

Note that there is no absolute value on the left side: the expectation of B((Q ∩ S2)⁻, S2) can be very negative, but not very positive. The lemma says that the set (Q ∩ S2)⁻ tends to pick out those rows whose sum is small.

Consider row i of B. Let m = |S2|, bi = ∑_{j∈S2} Bij, ci = ∑_{j∈S2} Bij², and Ai = ∑_{j∈Q∩S2} Bij. The contribution of row i to the left side is bi if Ai ≤ 0 (i.e., i ∈ (Q ∩ S2)⁻), and 0 otherwise. So the expected contribution of row i is P(Ai ≤ 0) bi. If bi ≤ 0, then this contribution is nonpositive. Else, we use Chebyshev's inequality to estimate the probability of Ai ≤ 0. We have E(Ai) = qbi/k and Var(Ai) < qci/k. Hence

P(Ai ≤ 0) ≤ P( |Ai − qbi/k| ≥ qbi/k ) ≤ k² Var(Ai)/(q² bi²) < kci/(q bi²).

The probability on the left is at most 1, and so we can bound it from above by its square root:

P(Ai ≤ 0) ≤ √(P(Ai ≤ 0)) ≤ √(kci)/(√q bi).

So the contribution of row i to E_Q( B((Q ∩ S2)⁻, S2) ) is P(Ai ≤ 0) bi ≤ √(kci/q) ≤ k/√q. Summing over all i ∈ S1, inequality (10.6) follows.

The following lemma gives an upper bound on the one-sided cut norm, using the sampling procedure from the previous lemma.

Lemma 10.9. Let Q1 and Q2 be random q-subsets of [k] (1 ≤ q ≤ k). Then

∥B∥⁺ ≤ E_{Q1,Q2}( max_{Ri⊆Qi} (1/k²) B(R2⁺, R1⁺) ) + 2/√q.

The lemma estimates the (one-sided) cut norm by maximizing only over certain rectangles (at the cost of averaging these estimates). The main point for our purposes will be that (for fixed Q1 and Q2) the number of rectangles to consider is only 4^q, as opposed to 4^k in the definition of the cut norm.

Proof. Fix any two sets S1, S2 ⊆ [k]. By Lemma 10.8,

(10.7)    B(S1, S2) ≤ E_{Q2}( B((Q2 ∩ S2)⁺, S2) ) + k²/√q.
We apply Lemma 10.8 again, interchanging the roles of rows and columns:

B((Q2 ∩ S2)⁺, S2) ≤ E_{Q1}( B((Q2 ∩ S2)⁺, (Q1 ∩ (Q2 ∩ S2)⁺)⁺) ) + k²/√q ≤ E_{Q1}( max_{Ri⊆Qi} B(R2⁺, R1⁺) ) + k²/√q.

Substituting in (10.7), the lemma follows.
Now we can turn to the main part of the proof.

Proof of Lemma 10.7. To bound the difference ∥B∥⁺ − ∥U∥⁺, we first bound its expectation. For any two measurable subsets S1, S2 ⊂ [0,1], we have

∥B∥⁺ ≥ (1/k²) U(S1 ∩ X, S2 ∩ X)

(where U(Z1, Z2) = ∑_{x∈Z1, y∈Z2} U(x, y) for finite subsets Z1, Z2 ⊂ [0,1]). Choosing the set X randomly, we get

E_X(∥B∥⁺) ≥ (1/k²) E_X( U(S1 ∩ X, S2 ∩ X) ) = ((k−1)/k) ∫_{S1×S2} U(x, y) dx dy + (1/k) ∫_{S1∩S2} U(x, x) dx ≥ ∫_{S1×S2} U(x, y) dx dy − 2/k.

Taking the supremum of the right side over all measurable sets S1, S2 we get

E_X(∥B∥⁺) ≥ ∥U∥⁺ − 2/k.

From here, the bound follows by sample concentration (Theorem 10.3).

To prove an upper bound on the difference ∥B∥⁺ − ∥U∥⁺, let Q1 and Q2 be random q-subsets of [k], where q = ⌊√k/4⌋. Lemma 10.9 says that for every X,

∥B∥⁺ ≤ E_{Q1,Q2}( max_{Ri⊆Qi} (1/k²) B(R2⁺, R1⁺) ) + 2/√q.

Next we take expectation over the choice of X. More precisely, we fix the sets Ri ⊆ Qi ⊆ [k], and also those points Xi ∈ [0,1] for which i ∈ Q = Q1 ∪ Q2. Define Y1 = {y ∈ [0,1] : ∑_{i∈R1} U(Xi, y) > 0}, and define Y2 analogously. Let X′ = (Xi : i ∈ [k] \ Q); then for every i, j ∈ [k] \ Q, the contribution of the term U(Xi, Xj) to E_{X′} B(R2⁺, R1⁺) is ∫_{Y1×Y2} U ≤ ∥U∥⁺. The contribution of the remaining terms U(Xi, Xj) with either i ∈ Q or j ∈ Q is at most 2k|Q| ≤ 4kq in absolute value. Hence

(10.8)    E_{X′} B(R2⁺, R1⁺) ≤ k² ∥U∥⁺ + 4kq.
Next we show that the value of B(R2+ , R1+ ) is highly concentrated around its expectation. This is a function of the independent random variables Xi , i ∈ [k] \ Q, and if we change the value of one of these Xi , the sum B(R2+ , R1+ ) changes by at most 4k (there are fewer than 2k entries that may change, and each of them by at
most 2). We can apply Corollary A.15 of Azuma's Inequality, and conclude that with probability at least 1 − e^{−1.9q}, we have

B(R2⁺, R1⁺) ≤ E_{X′} B(R2⁺, R1⁺) + 7.9k√(kq) ≤ k² ∥U∥⁺ + 4kq + 7.9k√(kq).

The number of possible pairs of sets R1 and R2 is 4^q, and hence with probability at least 1 − 4^q e^{−1.9q} > 1 − e^{−q/2}, this holds for all R1 ⊆ Q1 and R2 ⊆ Q2, and so it holds for the maximum. Taking expectation over Q1 and Q2 does not change this, so we get that with probability (over X) at least 1 − e^{−q/2}, we have

∥B∥⁺ ≤ ∥U∥⁺ + 2/√q + 4q/k + 7.9√q/√k.

This implies the upper bound in the lemma by a simple computation (if k is large enough).

Proof of Lemma 10.6. Applying Lemma 10.7 to both kernels U and −U, with probability at least 1 − 4e^{−√k/10} all four inequalities will hold, and in this case so do the inequalities in the lemma.

10.3.2. First applications. We can apply the First Sampling Lemma when U = W1 − W2 is a difference of two graphons. Considering Wi[X] as the edge-weighted graph H(X, Wi), Lemma 10.6 implies the following:

Corollary 10.10. Let W1, W2 ∈ W0 and let X be a sequence of k ≥ 1 random points of [0,1] chosen independently from the uniform distribution. Then with probability at least 1 − 4e^{−√k/10},

|d(H(X, W1), H(X, W2)) − ∥W1 − W2∥| ≤ 8/k^{1/4}.

In terms of the random weighted graphs H(k, W1) and H(k, W2) this means that they can be coupled so that d(H(k, W1), H(k, W2)) ≈ δ(W1, W2) with high probability. We will see that more is true: H(k, W) will be close to W in the cut distance with high probability. (However, quantitatively "closeness" will be much weaker.)

We have seen that the cut distance of two samples H(k, W1) and H(k, W2) is close to the distance of W1 and W2 (if coupled appropriately). How about the simple graphs G(k, W1) and G(k, W2)? The following simple lemma shows that if k is large enough, then G(k, W) is close to H(k, W), so similar conclusions hold.

Lemma 10.11. For every edge-weighted graph H with edgeweights in [0,1], and for every ε ≥ 10/√q,

P( d(G(H), H) > ε ) ≤ e^{−ε²q²/100}.

Applying this inequality with ε = 10/√q and bounding the distance by 1 in the exceptional cases, we get the inequality

(10.9)    E( d(G(H), H) ) ≤ 11/√q.
Note that no similar assertion would hold for the distances d1 or d2 . For example, if all edgeweights of H are 1/2, then d1 (G(H), H) = d2 (G(H), H) = 1/2 for any instance of G(H).
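This contrast is easy to verify numerically. In the toy check below (brute-forcing the cut norm, so q must stay very small; all parameter choices are arbitrary), the normalized L1 distance of G(H) from the all-1/2 weighting stays near 1/2 while the cut distance is already small:

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
q = 12
A = np.triu(rng.uniform(0, 1, (q, q)) < 0.5, 1); A = (A | A.T).astype(float)
D = A - 0.5                       # G(H) minus H, for H with all edgeweights 1/2
np.fill_diagonal(D, 0)            # both graphs have zero loops

print("d1 :", np.abs(D).mean())   # stays near 1/2, however large q is

best = 0.0
for S_bits in itertools.product([0, 1], repeat=q):   # brute-force cut norm of D
    S = np.array(S_bits, dtype=bool)
    col = D[S].sum(axis=0)
    best = max(best, col[col > 0].sum(), -col[col < 0].sum())
print("d_cut:", best / q**2)      # decays like O(1/sqrt(q))
```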
Proof. For i, j ∈ [q], define the random variable Xij = 1(ij ∈ E(G(H))). Let S and T be two disjoint subsets of [q]. Then the Xij (i ∈ S, j ∈ T) are independent, and E(Xij) = βij(H), which gives that

e_{G(H)}(S, T) − e_H(S, T) = ∑_{i∈S, j∈T} (Xij − E(Xij)).
Let us call the pair (S, T) bad if |e_{G(H)}(S, T) − e_H(S, T)| > εq²/4. The probability of this can be estimated by the Chernoff–Hoeffding Inequality:

P( |∑_{i∈S, j∈T} (Xij − E(Xij))| > (1/4)εq² ) ≤ 2 exp(−ε²q⁴/(32|S||T|)) ≤ 2 exp(−ε²q²/32).
The number of disjoint pairs (S, T) is 3^q, and so the probability that there is a bad pair is bounded by 2·3^q e^{−ε²q²/32} < e^{−ε²q²/100}. If there is no bad pair, then it is easy to see that d(G(H), H) ≤ ε (cf. Exercise 8.4). This completes the proof.

This lemma implies that the weighted sample in the First Sampling Lemma can be replaced by a simple graph at little cost. We state one corollary:

Corollary 10.12. Let W1, W2 ∈ W0 and k ≥ 1. Then the random graphs G(k, W1) and G(k, W2) can be coupled so that with probability at least 1 − 5e^{−√k/10},

|d(G(k, W1), G(k, W2)) − ∥W1 − W2∥| ≤ 10/k^{1/4}.

Exercise 10.13. Derive the First Sampling Lemma for graphs (Lemma 10.5) from the graphon version (Lemma 10.6). Attention: sampling from a graph G and sampling from WG does not quite give the same distribution!

Exercise 10.14. Prove the (much easier) analogue of the First Sampling Lemma for the edit distance: Let G and H be simple graphs with V(G) = V(H). Let k ≤ v(G) be a positive integer, and let S be chosen uniformly from all ordered subsets of V(G) of size k. Then

E( d1(G[S], H[S]) ) = ((k−1)n/(k(n−1))) d1(G, H),

and for every ε > 0, with probability at least 1 − 2e^{−kε²/2},

|d1(G[S], H[S]) − d1(G, H)| ≤ ε.
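The expectation formula in Exercise 10.14 can be sanity-checked by simulation. The sketch below is a toy verification under arbitrary parameter choices (it works with adjacency matrices normalized by n², matching the convention above):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, reps = 60, 15, 4000

def rand_adj(p):
    A = np.triu(rng.uniform(0, 1, (n, n)) < p, 1)
    return (A | A.T).astype(float)

G, H = rand_adj(0.3), rand_adj(0.3)
d1 = np.abs(G - H).sum() / n**2                       # normalized edit distance

est = 0.0
for _ in range(reps):
    S = rng.choice(n, size=k, replace=False)          # a uniform random k-subset
    est += np.abs(G[np.ix_(S, S)] - H[np.ix_(S, S)]).sum() / k**2
est /= reps

print(est, (k - 1) * n / (k * (n - 1)) * d1)          # the two nearly agree
```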
10.4. The distance of a sample from the original

A second lemma about sampling that will be used very often, due to Borgs, Chayes, Lovász, Sós and Vesztergombi [2008], shows that a sample is close to the original graph (or graphon) with high probability. Note that here we have to use the δ distance, rather than the d distance, since the graphs have different numbers of nodes, and no overlaying is given a priori. Also note that the bound on the distance is much weaker than in the previous lemma (but it does tend to 0 with the sample size).

Lemma 10.15 (Second Sampling Lemma for Graphs). Let k ≥ 1, and let G be a simple graph on at least k nodes. Then with probability at least 1 − exp(−k/(2 log k)),

δ(G, G(k, G)) ≤ 20/√(log k).
The Second Sampling Lemma also extends to graphons, and can be stated in terms of the W-random graphs H(k, W) and G(k, W).

Lemma 10.16 (Second Sampling Lemma for Graphons). Let k ≥ 1, and let W ∈ W0 be a graphon. Then with probability at least 1 − exp(−k/(2 log k)),

δ(H(k, W), W) ≤ 20/√(log k),

and

δ(G(k, W), W) ≤ 22/√(log k).

Proof. First we prove that these inequalities hold in expectation. Let m = ⌈k^{1/4}⌉. By Lemma 9.15, there is an equipartition P = {V1, ..., Vm} of [0,1] into m classes such that

d(W, WP) ≤ 8/√(log k).

Let S be a random k-subset of [0,1]; then by the First Sampling Lemma 10.6, we have

|d(W[S], WP[S]) − d(W, WP)| ≤ 8/k^{1/4}

with high probability. This implies that

E( |d(W[S], WP[S]) − d(W, WP)| ) ≤ 10/k^{1/4}

(k is large enough for this, else the bound in the lemma is trivial), and so

E( d(W[S], WP[S]) ) ≤ E( |d(W[S], WP[S]) − d(W, WP)| ) + d(W, WP) ≤ 9/√(log k).

So it suffices to prove that δ(WP, WP[S]) is small on the average. Let H = WP[S]. The graphons WP and WH are almost the same: both are stepfunctions with m steps, with the same function values on corresponding steps. The only difference is that the measure of the i-th step Vi in WP is 1/m, while the measure of the i-th step in WH is |Vi ∩ S|/k, which is expected to be close to 1/m if k is large enough. Write |Vi ∩ S|/k = 1/m + ri; then it is easy to see that δ(WP, WH) ≤ ∑_i |ri|. Hence it is easy to estimate the expectation of this distance, using elementary probability theory:

E( δ(WP, WH) ) ≤ E( ∑_i |ri| ) = m E(|r1|) ≤ m √(E(r1²)) = √((m−1)/k) < 1/k^{3/8}.

Hence

E( δ(W, W[S]) ) ≤ δ(W, WP) + E( δ(WP, WP[S]) ) + E( δ(WP[S], W[S]) ) ≤ 8/√(log k) + 1/k^{3/8} + 9/√(log k) ≤ 18/√(log k).

A similar estimate for δ(W, G(k, W)) follows if we invoke inequality (10.9):

E( δ(W, G(k, W)) ) ≤ E( δ(W, H(k, W)) ) + E( δ(H(k, W), G(k, W)) ) ≤ 18/√(log k) + 11/√k ≤ 20/√(log k).

We may assume that r > 25, else there is nothing to prove. The Second Sampling Lemma implies that with probability at least 1 − 2 exp(−r/(2 log r)), we have

δ(G′[T], G′) ≤ 22/√(log r).

Now we can generate G′[T] = G(r, G′) in the following way: we choose a random sequence X of r independent uniform points in [0,1]; if they belong to different intervals Ji = [(i−1)/k, i/k], then we return G(X, W); else, we try again. This gives us a coupling between G(r, W) and G(r, G′) such that

P( G(r, W) ≠ G(r, G′) ) ≤ P( ∃i : |X ∩ Ji| ≥ 2 ) ≤ r(r−1)/k.

Invoking the Second Sampling Lemma again, with probability at least 1 − 2 exp(−r/(2 log r)) we have

δ(G(r, W), W) ≤ 22/√(log r),

and hence with probability at least

1 − 4 exp(−r/(2 log r)) − r(r−1)/k ≥ 1 − 5/√k

we have

δ(G′, W) ≤ δ(G′, G′[T]) + δ(G(r, W), W) ≤ 44/√(log r) ≤ 176/√(log k).
Exercise 10.19. Consider the template graph H of a weak regularity partition, with k almost equal classes, of a large graph G, and turn it into a simple graph by the method of Lemma 10.11. Prove that the (random) simple graph G(H) obtained this way satisfies, with high probability, δ(G, G(H)) ≤ 10/√(log k).

Exercise 10.20. Let k ≥ 1, let W be a graphon, and let S1 and S2 be two independent random k-subsets of [0,1]. Then with probability at least 1 − 2^{1−k},

δ̂(G(S1, W), G(S2, W)) ≤ 22/√(log k)

(note the "hat" over the δ).

Exercise 10.21. Let f be a graph parameter and assume that |f(G) − f(G′)| ≤ d(G, G′) for any two graphs on the same node set. Then for every graph G and 1 ≤ k ≤ v(G) there is a value f0 such that if S ⊆ V(G) is a random k-subset, then

|f(G[S]) − f0| < 22/√(log k)

with probability at least 1 − o(1).
10.5. Counting Lemma

It is time to relate the two main quantities we introduced to study large dense graphs and graphons: homomorphism densities (which are equivalent to sample distributions) and the cut distance. The following simple but fundamental relation between them, due to Lovász and Szegedy [2006], is a generalization of the "Counting Lemma" in the theory of Szemerédi partitions. (Lemma 10.32, which will be more difficult to prove, will state a certain converse of this fact.) We start with a combinatorial formulation.

Lemma 10.22 (Counting Lemma for Graphs). For any three simple graphs F, G and G′,

|t(F, G) − t(F, G′)| ≤ e(F) δ(G, G′).

The lemma extends to graphons:

Lemma 10.23 (Counting Lemma for Graphons). Let F be a simple graph and let W, W′ ∈ W0. Then

|t(F, W) − t(F, W′)| ≤ e(F) δ(W, W′).

This lemma shows that for any simple graph F, the function W ↦ t(F, W) is Lipschitz-continuous on W0 in the metric δ. At the end of this section, we state several further versions of the Counting Lemma as exercises.

The proof of the lemma will be given in the more general setting of W0-decorated graphs (which actually makes the proof simpler!). Recall that a W0-decorated graph is a simple graph in which a graphon We is assigned to each edge e. Also recall the definition (7.16) of homomorphism density of such a decorated graph.

Lemma 10.24 (Counting Lemma for decorated graphs). Let (F, w) and (F, w′) be two W0-decorated graphs with the same underlying simple graph, where w = (We : e ∈ E) and w′ = (We′ : e ∈ E). Then

|t(F, w) − t(F, w′)| ≤ ∑_{e∈E(F)} ∥We − We′∥.
Proof. It suffices to prove this bound for the case when We = We′ for all edges but one. Let F = (V, E), and let uv be the edge with Wuv ≠ W′uv. Then

t(F, w) − t(F, w′) = ∫_{[0,1]^V} ∏_{ij∈E(F)\{uv}} Wij(xi, xj) ( Wuv(xu, xv) − W′uv(xu, xv) ) dx = ∫_{[0,1]^V} f(x) g(x) ( Wuv(xu, xv) − W′uv(xu, xv) ) dx,

where

f(x) = ∏_{ij∈∇(u)\{uv}} Wij(xi, xj)

does not depend on xv, and satisfies 0 ≤ f ≤ 1. Similarly,

g(x) = ∏_{ij∈E\∇(u)} Wij(xi, xj)

does not depend on xu, and satisfies 0 ≤ g ≤ 1. Fixing all variables except xu and xv, we get the following estimate by Lemma 8.10:

| ∫_{[0,1]²} f(x) g(x) ( Wuv(xu, xv) − W′uv(xu, xv) ) dxu dxv | ≤ ∥Wuv − W′uv∥.

Integrating over the remaining variables, we get that

|t(F, w) − t(F, w′)| ≤ ∥Wuv − W′uv∥.
From this lemma, along with (10.3) and (7.4), it is easy to derive a relationship between the variation distance of the distributions of the random graphs G(k, U) and G(k, W), and the cut distance of U and W.

Corollary 10.25. Let U and W be two graphons; then for every k ≥ 2, we have

dvar(G(k, U), G(k, W)) ≤ 2^{k²} δ(U, W).

Exercise 10.26. Show that the Counting Lemma does not hold for multigraphs, not even for F = C2.

Exercise 10.27. Let F be a simple graph with m edges and let W, W′ ∈ W1. Then |t(F, W) − t(F, W′)| ≤ 4mδ(W, W′).

Exercise 10.28. Let F be a simple graph with m edges and let W ∈ W1. Then |t(F, W)| ≤ 4m∥W∥.

Exercise 10.29. For every W1-decorated graph (F, w),

t(F, w) ≤ 4 min_{e∈E(F)} ∥We∥.
Exercise 10.30. Prove the following "induced" version of the Counting Lemma: If F is a simple graph on k nodes and U, W ∈ W0, then

|tind(F, U) − tind(F, W)| ≤ 2k(k−1) ∥U − W∥.

Use this to improve the coefficient 2^{k²} in Corollary 10.25.
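For small F, the homomorphism densities appearing in these statements can be computed by brute-force enumeration over all maps (exponential in v(F), so an illustration only). The sketch below, with all graphs and parameters chosen arbitrarily, compares t(C4, ·) on two nearby weighted graphs:

```python
import itertools
import numpy as np

def t_hom(F_edges, vF, H):
    """t(F, H) for a weighted graph H: average over all maps V(F) -> V(H)
    of the product of edge weights."""
    n = H.shape[0]
    total = 0.0
    for phi in itertools.product(range(n), repeat=vF):
        p = 1.0
        for (u, v) in F_edges:
            p *= H[phi[u], phi[v]]
        total += p
    return total / n ** vF

rng = np.random.default_rng(6)
n = 7
H1 = rng.uniform(0, 1, (n, n)); H1 = (H1 + H1.T) / 2
H2 = np.clip(H1 + rng.uniform(-0.05, 0.05, (n, n)), 0, 1); H2 = (H2 + H2.T) / 2

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]        # the 4-cycle, e(F) = 4
print(abs(t_hom(C4, 4, H1) - t_hom(C4, 4, H2)))   # small when H1, H2 are close
```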
10.6. Inverse Counting Lemma

Our goal is to establish a converse to the Counting Lemma: if two "large" graphs are locally close (in the sense of sampling or homomorphism densities), then they are globally close (in the sense of cut distance). This treatment is based on Borgs, Chayes, Lovász, Sós and Vesztergombi [2008]. We prove two versions, both of which will play an important role later on.

Lemma 10.31. Let U and W be two graphons and suppose that for some k ≥ 2, we have

dvar(G(k, U), G(k, W)) < 1 − 2 exp(−k/(2 log k)).

Then

δ(U, W) ≤ 50/√(log k).

Note that the bound on the variation distance of the distributions of the random subgraphs G(k, U) and G(k, W) is very weak: a tiny overlap between them already implies that the graphons U and W are close. Applying the lemma to WG1 and WG2 gives a similar result for two large graphs.

Proof. The assumption implies that we can couple G(k, U) and G(k, W) so that G(k, U) = G(k, W) with probability larger than 2 exp(−k/(2 log k)). The Second Sampling Lemma 10.16 implies that with probability at least 1 − exp(−k/(2 log k)), we have

δ(U, G(k, U)) ≤ 22/√(log k),

and a similar assertion holds for W. It follows that with positive probability all three happen, and then we get

δ(U, W) ≤ δ(U, G(k, U)) + δ(W, G(k, W)) ≤ 50/√(log k).

Lemma 10.32 (Inverse Counting Lemma). Let k be a positive integer, let U, W ∈ W0, and assume that for every simple graph F on k nodes, we have

|t(F, U) − t(F, W)| ≤ 2^{−k²}.

Then

δ(U, W) ≤ 50/√(log k).

Proof. Assume that U, W ∈ W0 satisfy |t(F, U) − t(F, W)| ≤ 2^{−k²} for every graph F with k nodes. This implies (by inclusion-exclusion) that

|tind(F, U) − tind(F, W)| ≤ 2^{k(k−1)/2} 2^{−k²} = 2^{−k(k+1)/2}.

In terms of the W-random graphs G(k, U) and G(k, W),

|P(G(k, U) = F) − P(G(k, W) = F)| ≤ 2^{−k(k+1)/2}.
10. SAMPLING
Hence ) ( ) ( ) ∑ ( k k+1 P G(k, U ) = F − P G(k, W ) = F ≤ 2(2) 2−( 2 ) dvar G(k, U ), G(k, W ) = F
( = 2−k < 1 − 2 exp −
k ) . 2 log k An application of Lemma 10.31 completes the proof.
Exercise 10.33. Prove that for any two graphons U and W,

√( (1/log 2) log(1/δ(U, W)) ) ≤ log( 1/δsamp(U, W) ) ≤ exp( 400/δ(U, W) ).
10.7. Weak isomorphism II

An important consequence of (10.4) is that two weakly isomorphic graphons have sampling distance 0, i.e., they are indistinguishable by sampling. The converse of this assertion follows by the same kind of argument. The significance of this easy remark is that it allows us to relate weak isomorphism to the cut distance via the sampling distance. We start with a more general fact, showing the topological equivalence of the sampling distance and the cut distance on the space of graphons.

We have noted that two graphons U and W are weakly isomorphic (i.e., t(F, U) = t(F, W) for every simple graph F) if and only if their sampling distance is 0. The Counting Lemma and the Inverse Counting Lemma imply that two graphons are weakly isomorphic if and only if their cut distance is 0. It is easy to see that this implies the same conclusion for general kernels:

Corollary 10.34. Two kernels U and W are weakly isomorphic if and only if δ(U, W) = 0.

Since δ(U, W) = 0 expresses the existence of a correspondence between the points of the two graphons, this theorem can be considered as a generalization of Theorem 5.29 (which can be derived from it with some effort). The proof of Corollary 10.34, if we include the proofs of the Counting Lemma and the Inverse Counting Lemma, is quite long, and in particular the proof of the Inverse Counting Lemma, which builds on the First Sampling Lemma, is quite involved. One can get a more direct proof using only rather standard analysis; see Exercise 11.27.

Theorem 8.13 and its Corollary 8.14, for the special case of the cut norm and cut-distance 0, imply the following further characterizations of weak isomorphism:

Corollary 10.35. (a) Two kernels U and W are weakly isomorphic if and only if there exist measure preserving maps φ, ψ : [0,1] → [0,1] such that U^φ = W^ψ almost everywhere.
(b) Two kernels U and W are weakly isomorphic if and only if there exists a coupling measure µ on [0,1]² such that for two random samples (x1, y1) and (x2, y2) from µ, we have U(x1, x2) = W(y1, y2) with probability 1.

As a further corollary we get the following fact (stated before as Exercise 7.18):

Corollary 10.36. If two kernels U, W ∈ W are weakly isomorphic, then t(F, U) = t(F, W) holds for all multigraphs F.
Exercise 10.37. Construct the coupling measures in Theorem 8.13 for the cut distance of the three weakly isomorphic graphons in Example 7.11. Exercise 10.38. Show by an example that the sampling distance and the cut distance do not define the same topology on the set of finite graphs.
CHAPTER 11
Convergence of dense graph sequences

Finally we have come to the central topic of this book: convergent graph sequences and their limits. The two key elements, namely sampling and graphons, have been introduced in the Introduction. Here we take our time to look at them from various aspects.

11.1. Sampling, homomorphism densities and cut distance

Recall from the introduction that we can define a notion of convergence if we fix a sampling method. For dense graphs, we use subgraph sampling: we select uniformly a random k-element subset of V(G), and return the subgraph induced by it. The probability that we see a given graph F is the quantity tind(F, G) introduced in (5.13). A sequence of graphs (Gn) with v(Gn) → ∞ is convergent if the induced subgraph densities tind(F, Gn) converge for every finite graph F.

It is often more convenient to define convergence using the homomorphism densities t(F, Gn) or the subgraph densities tinj(F, Gn). This does not change the notion of convergence as introduced above in terms of sampling. Indeed, subgraph densities can be expressed as linear combinations of induced subgraph densities and vice versa (we have discussed such relations in Section 5.2.3), and hence tinj(F, Gn) tends to a limit as n → ∞ if and only if tind(F, Gn) does. For the homomorphism densities the argument is a bit more involved: we know that t(F, G) − tinj(F, G) = O(1/v(G)), and so this difference tends to 0 if v(G) → ∞. Hence t(F, Gn) tends to a limit as n → ∞ if and only if tinj(F, Gn) does.

This notion of convergence of dense graphs is often called left-convergence, since it is based on homomorphisms "from the left". In the case of dense graphs, this notion is rather robust (it seems to be the only reasonable way to define convergence), and hence we call it simply "convergence". The parallel notion of right-convergence (which will turn out to be equivalent, at least if defined properly) will be discussed in Chapter 12.

Many examples of convergent graph sequences will be shown in Section 11.4.2, but let us describe a couple of very simple ones here.

Example 11.1. The sequence of complete graphs is convergent, since a random induced k-node subgraph is always a complete graph itself.

Example 11.2. Fix any 0 ≤ p ≤ 1, and generate a random graph Gn = G(n, p) for every n. This sequence will be convergent with probability 1. Indeed, for large n, a random induced k-subgraph Gn[S] of Gn will be very close in distribution to G(k, p), for most choices of Gn. (Note that what we mean here is that Gn is fixed, the randomness comes from the choice of the k-subset S.) This is not hard to verify directly, using elementary probability theory; it also follows from the much stronger results in Chapter 10.
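Example 11.2 can be watched happening. The toy simulation below (using the injective triangle density as the test statistic; all sizes are arbitrary) samples G(n, 1/2) independently for growing n, and the densities settle toward 1/8, the triangle density of the limit:

```python
import numpy as np

rng = np.random.default_rng(7)

def tinj_triangle(A):
    # trace(A^3) counts injective homomorphisms of the triangle exactly
    n = len(A)
    return np.trace(A @ A @ A) / (n * (n - 1) * (n - 2))

for n in [20, 80, 320, 1280]:
    A = np.triu(rng.uniform(0, 1, (n, n)) < 0.5, 1)
    A = (A | A.T).astype(float)
    print(n, tinj_triangle(A))    # approaches 1/8 as n grows
```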
The definition of convergence can be reformulated using the notion of sampling distance (1.2): a sequence (Gn) of simple graphs with v(Gn) → ∞ is convergent if for every graph F, (tind(F, Gn) : n = 1, 2, ...) is a Cauchy sequence (equivalently, (t(F, Gn) : n = 1, 2, ...) is a Cauchy sequence). This is equivalent to saying that the graph sequence is Cauchy in the δsamp metric (1.2). The following theorem of Borgs, Chayes, Lovász, Sós and Vesztergombi [2006, 2008], which is one of the main results in this theory, justifies the use of the cut metric δ.

Theorem 11.3. A sequence (Gn) of simple graphs with v(Gn) → ∞ is convergent if and only if it is a Cauchy sequence in the metric δ.

Proof. The Counting Lemma 10.22 implies that every Cauchy sequence in the metric δ is convergent. The Inverse Counting Lemma 10.32 (applied to the graphons WGn) implies the converse.

Remark 11.4. This proof builds on a fairly long chain of previous results, some of which, like the First Sampling Lemma 10.6, were quite involved. The advantage of this proof is that it gives a quantitative form of the equivalence of the two convergence notions. As pointed out by Schrijver, a weaker qualitative form is easier to prove, inasmuch as we can replace the use of the Inverse Counting Lemma by the characterization of weak isomorphism (for which a simple direct proof is sketched in Exercise 11.27). Indeed, consider the two metric spaces (W̃0, δ) (the graphon space) and [0,1]^F (the space of graph parameters with values in [0,1]). Both of these are compact (one by the Compactness Theorem 9.23, the other by Tychonoff's Theorem). The map W ↦ t(., W) is continuous by the Counting Lemma, and injective by Corollary 10.34, and hence its inverse is also continuous. For a convergent sequence of graphs, this means precisely that the graphons WGn form a convergent sequence in (W̃0, δ).

Theorem 11.3 can be generalized to characterize convergence in the space W. The proof is the same, except that the graphon versions of the Counting Lemmas must be used.

Theorem 11.5. Let (Wn) be a sequence of graphons in W0 and let W ∈ W0. Then t(F, Wn) converges for all finite simple graphs F if and only if Wn is a Cauchy sequence in the δ distance. Furthermore, t(F, Wn) → t(F, W) for all finite simple graphs F if and only if δ(Wn, W) → 0.

11.2. Random graphs as limit objects

Once we have defined convergence of a graph sequence, we would like to answer the question: what does it converge to? I have told you (and justified by some pictures) that the answer is "graphons", but let us dwell on this question in a more abstract setting for a while. In an abstract sense, we can assign a "limit object" to every convergent sequence: we say that two convergent sequences are "equivalent" if interlacing them we get a convergent sequence, and the limit objects can be defined as equivalence classes of convergent sequences. This abstract definition is not of much help; we are going to describe much more explicit representations for the limit objects. Our favorite one is the graphon, but we'll see other, equivalent representations in the form of random graph models, reflection positive graph parameters, and more.
11.2.1. Finite random graph models. The first construction of a limit object, which we call the weak limit, is actually quite general: it can be constructed for convergence of any reasonable sequence of structures for which we have any reasonable sampling process. (We will see later, for example, how it works for graphs with bounded degree.)

Given a simple graph G and k ∈ [v(G)], the random sample G(k, G) is a random graph on k labeled nodes; we denote its distribution by σG,k. Clearly σG,k(F) = tind(F, G). If (G1, G2, ...) is a convergent graph sequence, then the distributions σGn,k tend to some distribution σk on k-node labeled graphs. Conversely, if the distributions σGn,k tend to a limit for every k, then the graph sequence is convergent. (The distribution σGn,k may be undefined for a finite number of indices n for every fixed k.) So the sequence of limit distributions (σ1, σ2, ...) encodes the "limit" of the convergent graph sequence. Which sequences of distributions arise this way?

A random graph model is a probability distribution σk on simple graphs on node set [k], for every k ≥ 1, which is invariant under the reordering of the nodes. In other words, it is a sequence of random variables Gk, whose values are simple graphs on [k], and isomorphic graphs have the same probability. We say that a random graph model is consistent if deleting node k from Gk, the distribution of the resulting graph is the same as the distribution of Gk−1. In formulas, this means that for every graph H on k − 1 nodes,

(11.1)    σ_{k−1}(H) = ∑_{F : F′ = H} σ_k(F),
where F′ denotes the graph obtained by deleting node k from F. We say that the model is local if for two disjoint subsets S, T ⊆ [k], the subgraphs of Gk induced by S and T are independent as random variables. We note that consistency, together with the invariance under reordering the nodes, implies that for every simple graph F on k nodes, the expectation E(tind(F, Gn)) = σk(F) is independent of n once n ≥ k.

Example 11.6. For every graphon W, the random graph model Gk = G(k, W) is both consistent and local, which is trivial to check. (It will turn out that this example represents all such models.)

Theorem 11.7. If a graph sequence (G1, G2, ...) is convergent, then the distributions σk = lim_{n→∞} σGn,k form a consistent and local random graph model. Conversely, every consistent and local random graph model arises this way.

Before proving this theorem, we need some preparation. Let G be a graph and k ≤ v(G). The sequence of distributions (σG,1, σG,2, ...) is not quite consistent, because it breaks down for k > v(G); but it is consistent for the values of k for which it is defined. There is a more serious problem with locality: selecting i distinct random nodes of G will bias the selection of the remaining k − i, if we insist on selecting distinct points. So locality will be only approximately true.

We can fix both problems if we consider the slightly modified distributions σ′G,k(F) = tind(F, WG). The sequence (σ′G,1, σ′G,2, ...) is consistent. The random graphs corresponding to this sequence of distributions are G(k, WG). (We could also generate this from G by selecting the k random nodes with replacement.) The difference between σG,k and σ′G,k is very small if G is large: if we sample G(k, WG)
and keep it iff the sampled points correspond to different nodes of G, and otherwise resample, then we get a sample from the distribution G(k, G). This shows that

(11.2)    dvar(σG,k, σ′G,k) ≤ 1 − n(n−1)···(n−k+1)/n^k < k(k−1)/(2n).

As discussed in the introduction, random graphs satisfy quite strong laws of large numbers in the sense that two large random graphs are very much alike; this translates to the fact that a sequence of independently generated random graphs G(n, p) is convergent with probability 1. The next lemma shows that all local and consistent random graph models have a similar property.

Lemma 11.8. Let (σ1, σ2, ...) be a local consistent random graph model, and generate a graph Gn from every σn, independently for different values of n. Then the sequence (G1, G2, ...) is convergent with probability 1.

Proof. First we note that for every simple graph F on [k] and n ≥ k, we have

(11.3)    E(tind(F, Gn)) = σk(F).

Indeed, consider any injective map φ : V(F) → V(Gn). It follows from the isomorphism invariance of σn that the probability that φ is an induced embedding is the same for every map φ, so it suffices to compute this probability when φ is the identity map on [k]. By the consistency of the model, this probability is P(Gk = F) = σk(F).

Next we show that tind(F, Gn) is concentrated around its expectation σk(F). We could compute second moments, but this would not give a sufficiently good bound. So (sigh!) we compute the fourth moment. Let S1, S2, S3, S4 be independent random ordered k-subsets of [n] (we assume that n > k²). Define Xi = 1(Gn[Si] = F) − σk(F). Note that E(Xi) = 0 by (11.3), even if we condition on the choice of the Si, since the distribution of Gn[S] is the same for every ordered k-set S ⊆ [n]. Furthermore,

(11.4)    E(X1 X2 X3 X4) = E( (tind(F, Gn) − σk(F))⁴ ),

since for a fixed Gn the variables Xi are independent, and E(Xi | Gn) = tind(F, Gn) − σk(F).

Let A denote the event that every Si meets at least one other Sj, and let Ā denote its complement. The key observation is that E(X1 X2 X3 X4 | Ā) = 0. This follows since if the Si are fixed so that (say) S4 does not meet the others, then X4 is independent of {X1, X2, X3}, and its expectation is 0. (This is where we use the assumption that our random graph model is local!) Thus

E(X1 X2 X3 X4) = E(X1 X2 X3 X4 | A) P(A) + E(X1 X2 X3 X4 | Ā) P(Ā) ≤ P(A) ≤ 7k⁴/n².

(The last inequality follows by elementary combinatorics.) Thus we get that

E( (tind(F, Gn) − σk(F))⁴ ) ≤ 7k⁴/n²,
11.2. RANDOM GRAPHS AS LIMIT OBJECTS
177
and hence by Markov’s Inequality (11.5)
( ) P(|tind (F, Gn ) − σk (F )| > ε) = P (tind (F, Gn ) − σk (F ))4 > ε4 ) 1 ( 7k 4 ≤ 4 E (tind (F, Gn ) − σk (F ))4 ≤ 4 2 . ε ε n
If we sum (11.5) for n ≥ 1 with a fixed ε > 0, then the sum of the right hand sides is convergent, so it follows by the Borel–Cantelli Lemma that with probability 1, |tind (F, Gn ) − σk (F )| > ε holds for a finite number of values of n only, and so tind (F, Gn ) → σk (F ) with probability 1. Hence the graph sequence Gn converges with probability 1. With this lemma at hand, our theorem is easy to prove. Proof of Theorem 11.7. First, consider a convergent graph sequence (G1 , G2 , . . . ) and the probability distributions σk defined by it. Consistency and ′ locality follow by the consistency and locality of the distributions σk,G . n Second, consider consistent and local random graph model (σ1 , σ2 , . . . ), and a sequence of random graphs Gn (n = 1, 2, . . . ), which are independently generated from distribution σn for different indices n. It follows from Lemma 11.8 that with probability 1, this graph sequence is convergent. Equation 11.3 implies that it reproduces the right random graph model. 11.2.2. Countable random graph models. We can arrange all labeled simple graphs in a locally finite rooted tree, where the empty graph is the root, and F ′ is the parent of F . If (σ1 , σ2 , . . . ) is a consistent sequence of distributions, then σk is a probability distribution on the k-th level of the tree, and the probability of each node is the sum of probabilities of its children. From this setup, we can combine all the distributions σk into a single probability distribution on all infinite paths starting at the root. To be more precise, let Ω denote the set of such paths, and let ΩF denote the set of paths passing through the node F . Then the sets ΩF generate a sigma-algebra A on Ω. The Kolmogorov Extension Theorem implies that there is a (unique) probability measure σ on (Ω, A) such that σ(ΩF ) = σk (F ) for every F . This is so far an abstract construction. We can, however, make explicit sense of the elements of Ω. A path in the tree starting at the root is a sequence (F0 , F1 , . . . ) ′ of graphs such that Fk = Fk+1 . Hence the path gives rise to the countable graph F = ∪n Fn on the set of positive integers N∗ . Conversely, every graph on N∗ corresponds to a path in the tree starting at the origin. Thus the points of Ω can be identified with the graphs on N∗ . The sets ΩF are obtained by fixing adjacency between a finite number of nodes. Thus σ can be thought of as a probability distribution on graphs on N∗ . A countable random graph model is a probability distribution σ on (Ω, A), invariant under permutations of N∗ . Such a random graph can also be considered as a symmetric exchangeable array of 0-1 valued random variables (we will come back to this way of looking at them in Section 11.3.3). The countable random graph model is local if for any two finite disjoint subsets S1 , S2 ⊆ N∗ , the subgraphs induced by S1 and S2 are independent (as random variables). The discussion above shows that every consistent random graph model defines a countable random graph model.
178
11. CONVERGENCE OF DENSE GRAPH SEQUENCES
There is a way to go back from a countable random graph model σ to finite consistent random graph models, which is even simpler: to get a random graph on [n], generate a random graph G from σ, and take the induced subgraph G[n]. It is easy to see that this random graph model is consistent. Furthermore, a consistent random graph model is local if and only if the corresponding countable random graph model is local. To sum up, Proposition 11.9. There is a bijection between consistent random graph models and countable random graph models. This bijection preserves locality. It follows that local countable random graph models can serve as representations of the limit objects for convergent graph sequences. Example 11.10 (Cliques and stable sets). Let G be either the complete graph or the edgeless graph on N∗ , each with probability 1/2. Clearly G does not depend on the ordering of N∗ , so it is a countable random graph model. The subgraph G[n] is also complete with probability 1/2 and edgeless with probability 1/2 (at least for n > 1). This defines a consistent random graph model. However, this model is not local: the subgraph induced by {0, 1} is not independent (in the probabilistic sense) from the subgraph induced by {2, 3}; to the contrary, they are always the same. Example 11.11 (The Rado graph). We construct a random graph on N∗ by connecting a pair of distinct integers i, j ∈ N∗ with probability 1/2, independently for different pairs. The resulting random graph G(N∗ , 1/2), called the Rado graph, has many interesting properties (some of these are stated in the exercises at the end of the section), but right now, what is important for us is that it is local by construction. The corresponding local and consistent random graph model is the ordinary random graph G(n, 1/2). Example 11.12 (Infinite W -random graph). We can extend the definition of W -random graphs (Section 10.1) to get a countable random graph G(N∗ , W ). Given a graphon W , we select a sequence of independent random points (X1 , X2 , . . . ) from [0, 1], and connect i and j (i, j ∈ N∗ ) with probability W (Xi , Xj ). This construction generalizes the Rado graph. It is immediate that the distribution of the infinite W -random graph is invariant under permutations of N∗ , and it is also local: for two disjoint sets S, T ⊆ N∗ , the graph we construct on S is independent (in the probability sense) from the graph we construct on T . We will see that all local countable random graph models can be constructed this way. Example 11.13 (Triangle-free random graph model). It is easy to construct a countable triangle-free graph that contains every finite triangle-free graph as an induced subgraph (Exercise 11.17. In a theory developed independently from ours, Petrov and Vershik [2010] prove that there exist a local countable random graph model that is triangle-free with probability 1. They also construct an appropriate graphon, which turns out 0-1 valued. We conclude this section with a construction of a convergent graph sequence from a countable random graph model (without the locality assumption) given independently by Lov´ asz and Szegedy [2012a] and Diaconis and Janson ([2008], Theorem 5.3).
11.2. RANDOM GRAPHS AS LIMIT OBJECTS
179
Proposition 11.14. Let G be a random graph on N∗ drawn from a countable random graph model. Let G[n] denote the subgraph induced by [n]. Then the sequence (G[1], G[2], . . . ) is convergent with probability 1. Proof. For every fixed simple graph F , the sequence (tinj (F, G[n]) : n = ( 1, 2, . . . ) )is a reverse martingale for n ≥ v(F ) in the sense that E tinj (F, G[n − 1]) | G[n] = tinj (F, G[n]) (this follows by the simple averaging principle (5.27)). By the Reverse Martingale Convergence Theorem A.17, it follows that this sequence is convergent with probability 1. Hence with probability 1, (tinj (F, G[n]) : n = 1, 2, . . . ) is convergent for every F . This last proposition may sound similar to the construction in Lemma 11.8, but there is a significant difference: in this construction, locality is not needed. Unlike in Lemma 11.8, G[n] and G[m] are not independently generated. If we apply the construction in Lemma 11.8 twice, and then pick the even-indexed graphs from one sequence and interlace them with the odd-indexed graphs from the other, we get a sequence that is constructed in the same way, and so it is convergent with probability 1. This means that almost all sequences generated by Lemma 11.8 (for a fixed consistent and local random graph model) have the same limit. In contrast to this, running the construction in Proposition 11.14 twice we could not necessarily interlace the resulting sequences into a single convergent sequence: in Example 11.10, we get a sequence of growing cliques with probability 1/2 and a sequence of growing edgeless graphs with probability 1/2. Both of these sequences are convergent, but they don’t have the same limit. Sequences constructed from one and the same countable random graph model are almost always convergent, but they may converge to different limits. In view of Examples 11.6 and 11.12, we also get: Corollary 11.15. For every graphon W , generating a W -random graph G(n, W ) for n = 1, 2, . . . we get a convergent sequence with probability 1, whose limiting countable random graph model is G(N∗ , W ). Exercise 11.16. (a) Prove that the Rado graph almost surely has the extension property: for any two disjoint finite subsets S, T ⊆ N∗ there is a node connected to all nodes in S but to no node in T . (b) Prove that every countable graph with the extension property is isomorphic to the Rado graph. (c) Prove that if you generate two Rado graphs independently, they will be isomorphic with probability 1. Exercise 11.17. Construct a universal triangle-free graph: a countable graph containing every finite triangle-free graph as an induced subgraph. Exercise 11.18. (a) We can define a random graph G(N∗ , p) for all 0 < p < 1. Prove that with probability 1, this random graph will be isomorphic to the Rado graph for any p. (b) More generally, if W is a graphon with 0 < W (x, y) < 1 for all x, y ∈ [0, 1], then G(n, W ) is almost always isomorphic to the Rado graph. (c) Construct a graphon W such that two independent countable W -random graphs are almost surely non-isomorphic [G´ abor Kun]. Exercise 11.19. Show that without the assumption of locality, Lemma 11.8 does not remain valid.
180
11. CONVERGENCE OF DENSE GRAPH SEQUENCES
Exercise 11.20. Prove that if we generate two sequences as in Proposition 11.14 from a local countable random graph model, then interlacing the two sequences, we get a convergent sequence.
11.3. The limit graphon In this section we give a more explicit description of limit objects for convergent graph sequences: we show that graphons (up to weak isomorphism) are precisely the structures that are needed. 11.3.1. Existence. Let (Gn ) be a convergent graph sequence, so that the densities t(F, Gn ) tend to a limit t(F ) for every finite simple graph F . We know that a limit object can be described as a consistent and local random graph model (finite or countable). The main motivation behind introducing graphons is that they provide a much more explicit representation for this limit object, as the following theorem shows (Lov´ asz and Szegedy [2006]). Theorem 11.21. For any convergent sequence (Gn ) of simple graphs there exists a graphon W such that t(F, Gn ) → t(F, W ) for every simple graph F . We say that this graphon W is the limit of the graph sequence, and write Gn → W . The reader might wonder if one really needs complicated objects like integrable functions to describe limits of graph sequences; would perhaps piecewise linear, or monotone, or continuous functions suffice? It turns out that (up to weak isomorphism) all measurable functions are needed: Every graphon W can be obtained as the limit of a convergent sequence of simple graphs; this follows by Corollary 11.15. There are three quite different ways to prove Theorem 11.21. The original one by Lov´ asz and Szegedy [2006] uses Szemer´edi partitions and the Martingale Convergence Theorem. This is very closely related to the proof of the compactness of the graphon space (Theorem 9.23). Below we will use compactness to prove Theorem 11.21, but we could as well go the other way, since Theorem 11.21 easily implies the compactness of the graphon space (Exercise 11.28). A more recent proof by Elek and Szegedy [2012] constructs a different limit object first, in the form of a graph on a very large sigma-algebra, by taking an ultraproduct; then obtains the graphon as an appropriate projection of this. This proof technique is quite general, it extends to hypergraphs and many other structures. As a third route to prove Theorem 11.21, it was shown by Diaconis and Janson [2008] that it can be derived from results of Aldous [1981] and Hoover [1979] on exchangeable random variables, pointing out a basic connection to probability theory. We will sketch these alternative proofs in the next two sections. f0 , δ ) is Proof of Theorem 11.21. By Theorem 9.23, the metric space (W compact, and hence the sequence (Wn = WGn : n = 1, 2, . . . ) has a convergent f0 . By the Counting Lemma subsequence (Wnj : j = 1, 2, . . . ) with limit W ∈ W 10.23, we have for every simple graph F |t(F, Wnj ) − t(F, W )| ≤ e(F ) δ (Wnj , W ) −→ 0
(j → ∞),
and so t(F, Wnj ) = t(F, Gnj ) → t(F, W ). Since (t(F, Gn ) : n = 1, 2, . . . ) is a Cauchy sequence, this implies that t(F, Gn ) → t(F, W ) for every simple graph F.
11.3. THE LIMIT GRAPHON
181
There may be several graphons W representing the limit of a convergent graph sequence. Of course, we may change the value of W on a set of measure 0; but more generally, we can replace W by any other kernel weakly isomorphic with W . Conversely, of W and W ′ both represent the limit of a convergent graph sequence, then they are weakly isomorphic by the definition of weak isomorphism. (Weak isomorphism has been discussed in Sections 7.3 and 10.7, and we will say more about it in Section 13.2.) Convergence to the limit object can also be characterized by the distance function: Theorem 11.22. For a sequence (Gn ) of graphs with v(Gn ) → ∞ and graphon W , we have Gn → W if and only if δ (WGn , W ) → 0. Proof. If δ (WGn , W ) → 0, then Gn → W follows by the Counting Lemma just like in the proof of Theorem 11.21 above. Conversely, suppose that Gn → W , so t(F, Gn ) → t(F, W ) for every simple graph F , and hence for every fixed k, the Inverse Counting Lemma 10.32 implies that δ (WGn , W ) ≤ √
20 log k
if n is large enough. Hence δ (WGn , W ) → 0 as claimed. (This proof gives an explicit connection between the rates of convergence in Gn → W and δ (WGn , W ) → 0. If we don’t care about this, the theorem follows by f0 , δ ) is compact, the map W 7→ (t(F, W ) : F ∈ abstract arguments: the space (W F) is continuous and injective, hence its inverse is also continuous.) 11.3.2. Ultralimit and limit. The theory of ultraproducts and ultralimits provides a general way to construct limit objects. This proof technique has some drawbacks: it is non-constructive, and requires advanced special techniques from model theory (outlined in Appendix A.5). On the other hand, its advantage is that it is very flexible: one can define the limit of virtually any kind of sequence of structures this way, and only have to wonder later whether this limit object can be “brought down to earth” to have combinatorial (or algebraic, or arithmetic) significance. Let ω be an ultrafilter on N∗ . Recall that we call the sets in ω “Large”, the other subsets of N∗ , “Small”. As a first ∏ try, let us define the limit of a graph sequence (G1 , G2 , . . . ) as the ultraproduct ω Gn . This assigns a limit object to every graph sequence, not just to those that are convergent. Unfortunately, this ultraproduct will depend on the ultrafilter we use, even for convergent graph sequences (so the situation is not as simple as for numerical sequences, see Exercise 11.31). For example, let Hn be the edgeless graph √ and let it be a graph consisting of a clique of size ⌊ n⌋ on n nodes if n is even, √ together with n − ⌊ n⌋ isolated nodes if n is odd. This graph sequence is clearly ∏ convergent. On the other hand, ω Gn has no edge if the set of odd integers is Small, and it consists of a clique of continuum size and continuum many isolated nodes if the set of odd integers is Large (and either can happen). While it is of the same cardinality as the whole ultraproduct, this clique should occupy a negligible part of this huge product graph. To make this precise, we have to introduce a measure on the product. Let (G1 , G2 , . . . ) be a sequence of finite graphs. For every graph Gn = (Vn , En ), we can consider the (finite) sigma-algebra
182
11. CONVERGENCE OF DENSE GRAPH SEQUENCES
An = 2Vn of all subsets of Vn , and ∏ the uniform probability measure πn on Vn . Then the ultraproduct G = (V, E) = is also equipped ω Gn is a graph whose node set V∏ ∏ with a sigma-algebra A = ω An and a probability measure π = ω πn on ∏ it. This answers our concern with the example above: the ultraproduct ω Hn becomes edgeless if we delete an appropriate set of nodes of measure zero. However, the construction is still not satisfactory. First, the sigma-algebra A is an ugly one: for example, it is not separable in general. Second, the set E of edges of the product, as a subset of V × V , may not be measurable with respect to the product sigma-algebra A × A. This is really serious: we would like to be able to assert, for example, that if the graphs Gn have edge density 1/2, then the limit has edge density 1/2, which means that (π × π)(E) = 1/2. However, this assertion does not even make sense! The way out of this trouble is to do∏some fiddling with the ultraproduct. Let us start with taking the ultraproduct ω (Vn × Vn ). This ultraproduct can be identified with V × V in a natural way: if an , bn ∈ Vn , then (11.6)
[(a1 , b1 ), (a2 , b2 ), . . . ] ↔ ([a1 , a2 , . . . ], [b1 , b2 , . . . ]).
It is easy to see that if the bracket on the left side is represented by another equivalent sequence of pairs, then the brackets on the right don’t change, and vice versa. ∏ As before, sets of the form ω Sn (Sn ⊆ Vn × Vn ) form a set algebra on V × V , which generates a sigma-algebra A′ . In particular, ∏ we can take the ultraproduct of the edge sets of the graphs Gn , to get E = ω En ∈ A′ . Furthermore, the ultraproduct η of the uniform measures on Vn × Vn is a probability measure on (V × V, A′ ). by sets of the form ∏ It is easy to see that the sigma-algebra A × A is generated ′ (S × T ), where S , T ⊆ V . Hence A × A ⊆ A . In general, we don’t have n n n n n ω equality (this is why E is not measurable in A×A). But the following lemma shows that in an important way, A′ is not too much larger than A × A: Lemma 11.23. For every set B ∈ A′ and every x ∈ V , the neighborhood B(x) = {y ∈ V : (x, y) ∈ B} belongs to A. ∏ Proof. First, we consider the case when B = ω Bn , and let x = [x1 , x2 , . . . ]. ∏ We claim that B(x)∏= ω Bn (xn ) (which of course implies that B(x) ∈ A). Indeed: y = [y1 , y2 , . . . ] ∈ ω Bn (xn ) if and only if yn ∈ Bn (xn ) for a large set of indices n, if and only if (xn , yn ) ∈ Bn for a large set of indices n, if and only if [x, y] ∈ B. Now taking complements and countable intersections of sets B ∈ A′ corresponds to carrying out the same operations on the sets B(x) (where x ∈ V is fixed), so these sets stay in A. The measure µ, when restricted to A × A, gives the probability measure π × π. Hence we can take the conditional expectation W = E(1E | A × A), which is a function on V × V , measurable with respect to A × A, and has the property that ∫ W (x, y) dπ(x) dπ(y) = µ(X ∩ E) (11.7) X
for every set X ∈ A × A. This results in a probability space (V, A, π) with a symmetric measurable function W : V × V → [0, 1].
11.3. THE LIMIT GRAPHON
183
We show that W can serve as the limit graphon (at least as long as we ignore the ugliness of the underlying sigma-algebra): Proposition 11.24. For every simple graph F and any sequence (Gn : 1, 2, . . . ) of graphs, lim t(F, Gn ) = t(F, W ).
n =
ω
If the graph sequence (G1 , G2 , . . . ) is convergent, then the ultralimit on the left side is equal to limn→∞ t(F, Gn ), independently from ω. Proof. Let V (F ) = [k]. As a first step, we express the left hand side in the ultraproduct space. We have to introduce more sigma-algebras for ∏ this. For every set U ⊆ [k], we take the set VnU of all maps∏U → Vn . The set ω VnU can be identified with V U , just like (11.6) identifies ω Vn × Vn with V × V . The ultraproduct of the Boolean algebra of all subsets of VnU gives a sigma-algebra AU on V U . The ultraproduct of the uniform measures on the sets VnU gives a measure τU on the product sigma-algebra on V U . We abbreviate τ[k] by τk . The∏set Hom(F, Gn ) of homomorphisms from F into Gn is ∏ a subset of Vnk , ∏ k k and so ω Hom(F, Gn ) is a subset of ω Vn = V . The set ω Hom(F, Gn ) can be identified with the set Hom(F, G) ⊆ V k of homomorphisms of F into G. Furthermore, (∏ ) ( ) lim t(F, Gn ) = τk Hom(F, Gn ) = τk Hom(F, G) ω
ω
by the definition of the ultraproduct of measures. So it suffices to prove that ( ) (11.8) τk Hom(F, G) = t(F, W ). Let (X1 , . . . , Xk ) ∈ V k be a random node chosen from the distribution τk . Then we can rephrase the equality to be proved as ( ∏ ) ( ∏ ) (11.9) E 1E (Xi , Xj ) = E W (Xi , Xj ) . ij∈E(F )
ij∈E(F )
(It is easy to check that the functions 1E (Xi , Xj ) are measurable with respect to the sigma-algebra A[k] .) If the random variables 1E (Xi , Xj ) were independent, we could take the expectation factor-by-factor, and we would be done. But of course they are not. The trick in the proof is to replace the factors 1E (Xi , Xj ) by W (Xi , Xj ) one by one. Consider any edge uv of F ; we show that ( ∏ ) ( ∏ ) (11.10) E 1E (Xi , Xj ) = E 1E (Xi , Xj )W (Xu , Xv )) . ij∈E(F )
ij̸=uv
This will show that we can replace 1E (Xu , Xv ) by W (Xu , Xv ) without changing the expectation, and repeating a similar argument for all edges of F , we get (11.9). For notational convenience, assume that u = 1 and v = 2. The main difficulty in the rest of the argument is to be careful about measurability, because we have several sigma-algebras floating around. Using Lemma 11.23, it is not hard to argue that fixing X1 and X2 , the functions 1E (Xi , Xj ) are measurable with respect to A[k] , and so the expectation ∏ f (X1 , X2 ) = EX3 ,...,Xk 1E (Xi , Xj ) ij∈E(F ){i,j}̸={1,2}
184
11. CONVERGENCE OF DENSE GRAPH SEQUENCES
is well defined. Furthermore, again by Lemma 11.23, if we fix X3 , . . . , Xk , then every function 1E (Xi , Xj ) ({i, j} ̸= {1, 2}) becomes either constant, or A{1} -measurable (if the edge ij is incident with 1) or A{2} -measurable (if ij is incident with 2). Hence it follows that f (x1 , x2 ) is A{1} × A{2} -measurable. By the definition of W , this implies that ∫ ∫ f (x1 , x2 )1E (x1 , x2 ) dτ{1,2} (x1 , x2 ) = f (x1 , x2 )W (x1 , x2 ) dπ(x1 ) dπ(x2 ), V ×V
which proves (11.10).
V ×V
To finish the construction of the limit graphon, one has to map this big sigmaalgebra (V, A) onto [0, 1], define the appropriate image of W , and show that it represents the same subgraph densities. We refer to the paper of Elek and Szegedy [2012] for these details. Remark 11.25. If you compare this construction of the limit object with the construction given before, some parallelism between their elements is apparent. For example, (11.7), applied to a generator set X = S × T , asserts that the density of edges of G between S and T is the same as the integral of W on S × T , so W and 1E have distance 0 in the cut norm. Perhaps further exploration of this parallelism could shed some light on the nature of the use of non-constructive infinite methods like ultraproducts in the theory of (very large, but) finite graphs. In the last proof, one needs to handle various sigma-algebras; indeed, a little “calculus of sigma-algebras” was used. Elek and Szegedy push this much further, and develop more from this combinatorial theory of sigma-algebras, which enables them to extend this construction to hypergraphs; cf. Section 23.3. 11.3.3. Exchangeable random variables. Let (G1 , G2 , . . . ) be a convergent graph sequence. We start with the weak limit of the sequence in the form of a local countable random graph model σ. Let G be a graph from this distribution, ( ) and let (Xij )∞ i,j=1 be its adjacency matrix; in other words, Xij = 1(ij ∈ E G) . It follows from the invariance of σ under permutations that the random variables Xij have the same distribution: P(Xij = 1) = limn→∞ t(K2 , Gn ). They are not independent, but have the following property, which is called symmetrically exchangeable: if α is a permutation of N∗ , then for every k ≥ 1, the joint distribution of (Xij : 1 ≤ i, j ≤ n) is the same as the joint distribution of (Xα(i)α(j) : 1 ≤ i, j ≤ n). This is of course just a reformulation of the condition that σ is invariant. Now symmetrically exchangeable random variables have a representation theorem due to Aldous [1981] and Hoover [1979]: every such system can be represented as a mixture of random variables of the form Xij = W (Yi , Yj ), where Yi (i ∈ N∗ ) are independent random variables with values in [0, 1] and W : [0, 1]2 → [0, 1] is a symmetric measurable function in two variables. Furthermore, as Diaconis and Janson [2008] show, if the random graph G is local, then you don’t get a real mixture: local exchangeable distributions are precisely the extreme points of the space of symmetrically exchangeable random variables, in other words, random variables of the form Xij = W (Yi , Yj ). So the representation theorem of Aldous and Hoover provides the limit graphon W directly! For the details of this theory and its application here, we refer to the monograph of Kallenberg [2005] and the paper [2008]. See also Austin [2008] for a survey of the many other applications of this approach.
11.4. PROVING CONVERGENCE
185
Exercise 11.26. Prove that if a graph sequence (Gn ) satisfies v(Gn ) → ∞, then it is Cauchy in the cut distance if and only if it is Cauchy in the sampling distance. Exercise 11.27. Prove the following facts: (a) For every stepfunction W , δ1 (W, H(n, W )) → 0 as n → ∞ with probability 1. (b) For every graphon W , δ1 (W, H(n, W )) → 0 as n → ∞ with probability 1. (c) For every graphon W , δ (G(n, W ), H(n, W )) → 0 as n → ∞ with probability 1. (d) If U and W are weakly isomorphic graphons, then G(n, U ) and G(n, W ) have the same distribution. (e) If U and W are weakly isomorphic graphons, then δ (U, W ) = 0 (A. Schrijver). Exercise 11.28. Show that Theorems 11.21, 11.22 and Corollary 11.15 imply f0 , δ ) is compact. that the space (W Exercise 11.29. Prove that the following properties of graphs are inherited to their ultraproduct: (a) 3-regular; (b) all degrees bounded by 10; (c) triangle-free; (d) containing a triangle; (e) bipartite; (f) disconnected. Exercise 11.30. Prove that the following properties of graphs are not inherited to their ultraproduct: (a) connected; (b) all degrees are even; (c) non-bipartite. Exercise 11.31. (a) Prove that every bounded sequence of real numbers has a unique ultralimit. (b) Prove that the limω (ai + bi ) = limω ai + limω bi . (c) Prove that the ultralimit limω ai is independent of the choice of the ultrafilter ω if and only if the sequence is convergent in the classical sense.
11.4. Proving convergence It is not always easy to show that a certain graph sequence is convergent. We have several characterizations (in terms of subgraph densities, sample distribution, cut distance) and sometimes one, sometimes the other condition is easier to apply. First we develop some useful sufficient conditions for convergence, and then apply these to give a number of examples of interesting convergent graph sequences. 11.4.1. Convergence of sampling methods. We start with a supplement to Corollary 11.15: Proposition 11.32. For every graphon W , generating a W -random graph G(n, W ) for n = 1, 2, . . . we get a graph sequence such that G(n, W ) → W with probability 1. Proof. It is straightforward to check that for every simple graph F and n ≥ v(F ), E(tinj (F, G(n, W )) = t(f, W ). Since tinj (F, G(n, W ) is highly concentrated around its expectation (by Theorem 10.2 or by the proof of Lemma 11.8), it follows that tinj (F, G(n, W ) → t(f, W ) with probability 1. Sometimes we need that other sequences constructed by similar, but different sampling from a graphon are convergent. We describe one lemma of this type by Borgs, Chayes, Lov´ asz, S´os and Vesztergombi [2008] [2011]. We start with a definition. For every n ≥ 1, let Sn ⊆ [0, 1] be a finite set such that |Sn | → ∞. We say that the sequence (Sn ) is well distributed , if |Sn ∩J|/|Sn | → λ(J) for every interval J as n → ∞. Equivalently, the uniform measure on Sn converges weakly to the uniform measure on [0, 1] (see Billingsley [1999] for more on related notions like uniform distribution of sequences).
186
11. CONVERGENCE OF DENSE GRAPH SEQUENCES
Lemma 11.33. Let W ∈ W0 be almost everywhere continuous, and let Sn be a well distributed sequence of sets. Then G(Sn , W ) → W with probability 1. It is clear that such a conclusion cannot hold without some assumption on W , since a general measurable function could be changed on the sets Sn ×Sn arbitrarily without changing its subgraph densities. For an extension to graphons on general metric spaces, see Exercise 13.8. Note that there need not be any randomness in the sequence Sn . Proof. Consider a partition {Z1 , . . . , Zm } of [0, 1] into m intervals of equal length. Since (Sn ) is well distributed, we have 1/(m+1) ≤ |Sn ∩Zj |/|Sn | ≤ 1/(m−1) for every j if n ≥ n0 (m). Given any n that is large enough, we choose the largest m for which n ≥ n0 (m), and partition each set Zj into |Sn ∩ Zj | sets of equal measure, each containing exactly one point of Sn ∩ Zj , to get the partition Qn . This partition has the properties that |Qn | = |Sn |, every partition class contains exactly { one point of Sn , the maximum diameter of partition classes tends to 0, and max λ(Q)|Sn | − 1 : Q ∈ Qn } → 0. For s ∈ Sn , let Qns be the partition class of Qn containing s. Define the function Wn as follows: for s, s′ ∈ Sn and (x, y) ∈ Qns × Qns′ , let Wn (x, y) = W (s, s′ ). Then Wn (x, y) → W (x, y) in every point (x, y) where W is continuous, in particular Wn → W almost everywhere. This implies that (11.11)
∥Wn − W ∥1 −→ 0
(n → ∞).
We can view Wn as the graphon WHn associated with a weighted graph Hn with V (Hn ) = Sn , where the weight of node s ∈ Sn is λ(Qns ), and the weight of edge ss′ (s, s′ ∈ S) is W (s, s′ ). Note that Hn is almost the same weighted graph as Hn = H(Sn , W ): they are defined on the same set of nodes, the edges have the same weights, and the nodeweight λ(Qns ) is asymptotically 1/|Sn | by the construction of Qn . Given ε > 0, we have |λ(Qns ) − 1/|Sn || < ε/|Sn | if n is large enough. Hence there is a measure preserving bijection φ : [0, 1] → [0, 1] and a set R ⊆ [0, 1] of measure ε such that WHn (x, y) = WHφn (x, y)
(x, y ∈ / R).
This implies that (11.12)
δ1 (Hn , Hn ) −→ 0
(n → ∞).
Formulas (11.11) and (11.12) imply that H(Sn , W ) → W , which in turn implies that G(Sn , W ) → W with probability 1 (cf. Lemma 10.11). Corollary 11.34. Let W be a graphon that is almost everywhere continuous. Then G(Sn , W ) → W with probability 1, where (a) Sn ⊆ [0, 1] is obtained by selecting a uniform random point from every interval [j/n, (j + 1)/n], n = 0, . . . , n − 1; (b) Sn = {1/n, 2/n, . . . , n/n}. 11.4.2. Examples: Convergent graph sequences. We discuss a variety of examples of convergent graph sequences. It turns out that it is not always easy to prove that these are convergent and determine the limit graphon. In some examples, the limit can be guessed and the proof of convergence in the cut distance is easy. In other examples, it is quite tricky to guess the limit graphon. In other cases, the convergence of subgraph densities can be proved. We will also see a randomly
11.4. PROVING CONVERGENCE
187
growing graph sequence which is convergent with probability 1, but if we run it again, it converges to a different limit! We start with two easy examples. Example 11.35 (Complete bipartite graphs). It is natural to guess, and easy to prove, that complete bipartite graphs Kn,n converge to the graphon W (x, y) = 1(0 ≤ x ≤ 1/2 ≤ y ≤ 1) + 1(0 ≤ y ≤ 1/2 ≤ x ≤ 1). Example 11.36 (Simple threshold graphs). These graphs are defined on the set {1, . . . , n} by connecting i and j if and only if i + j ≤ n. These graphs converge to the graphon defined by 1(x + y ≤ 1), which we call the simple threshold graphon.
Figure 11.1. Simple threshold graphs and their limits A more interesting example is the following. Example 11.37 (Quasirandom graphs). A sequence of graphs tending to the identically-p function is exactly what we called a “quasirandom sequence” with density p (by the second property in Section 1.4.2). In particular, the Paley graphs (Example 1.1) converge to the graphon W ≡ 1/2. Example 11.38 (Multitype quasirandom graphs). Generalizing the previous example, we consider a multitype quasirandom graph sequence (Gn ) with a template graph H. This means that (assuming that V (H) = [q] and V (Gn ) = [n]) V (Gn ) has a partition (V1 , . . . , Vq ) such that |Vi | = αi (H)n + o(n) and for every fixed i, j ∈ [q], the bipartite graphs Gn [Vi , Vj ] form a quasirandom bipartite graph sequence with edge density βij (H) (in the case when i = j, the induced subgraphs Gn [Vi ] form a quasirandom graph sequence). A multitype quasirandom graph sequence with template graph H tends to the graphon WH , and vice versa. The first statement follows easily, since δ (WGn , WH ) → 0 by the definition of a multitype quasirandom sequence. The converse is less trivial; if δ (WGn , WH ) → 0, then there is a way to label the nodes of Gn so that ∥WGn − WH ∥ → 0 (this follows from Theorem 11.59 below), and then we can partition the nodes of Gn by putting node u in class i (u ∈ [n], i ∈ [q]) iff α1 (H) + · · · + αi−1 (H) ≤ u/n < α1 (H) + · · · + αi (H). It is not hard to verify that this partition has the right properties to guarantee that the sequence (Gn ) is multitype quasirandom with template H. We continue with several examples, given by Borgs, Chayes, Lov´asz, S´os and Vesztergombi [2011], of convergent graph sequences obtained by random growing processes.
188
11. CONVERGENCE OF DENSE GRAPH SEQUENCES
Example 11.39 (Growing uniform attachment graphs). We generate a randomly growing graph sequence Gua n as follows. We start with a single node. At the n-th iteration, a new node is born, and then every pair of nonadjacent nodes is connected with probability 1/n. We call this graph sequence a uniform attachment graph sequence; see Figure 1.8. Let us do some simple calculations. After n steps, let {0, 1, . . . , n − 1} be the nodes (born in this order). The probability that nodes i < j are not connected j n−1 · j+1 = nj . These events are independent for all pairs (i, j). The is j+1 j+2 · · · n expected degree of j is j−1 ∑ n−j
n
i=0
+
n−1 ∑
n−i n − 1 j(j − 1) = − . n 2 2n i=j+1
The expected number of edges is ) n−1 ( 1 ∑ n − 1 j(j − 1) n2 − 1 − = . 2 j=0 2 2n 6 To figure out the limit graphon, note that the probability that nodes i and j are connected is 1 − max(i, j)/n. If i = xn and j = yn, then this is 1 − max(x, y). This motivates the following: Proposition 11.40. The sequence Gua n tends to the limit function 1 − max(x, y) with probability 1. Proof. For a fixed n, the events that nodes i and j are connected are independent for different i, j, and so) by the computation above, Gua n has the same ( distribution as G Sn , 1 − max(x, y) , where Sn = {0, 1/n, . . . , (n − 1)/n}. It is easy to see that this sequence is well distributed in [0, 1], and so the Proposition follows by Lemma 11.33. One can get a good explicit bound on the convergence rate by estimating the cut-distance of WGua and 1 − max(x, y), using the Chernoff-Hoeffding bound. n Example 11.41 (Prefix attachment graphs). In this construction, it will be more convenient to label the nodes starting with 1. At the n-th iteration, a new node n is born, a node z is selected at random, and node n is connected to nodes 1, . . . , z − 1. We denote the n-th graph in the sequence by Gpfx n , and call this graph sequence a prefix attachment graph sequence (Figure 11.2). Again we start with some simple calculations. The probability that nodes i < j are connected is j−i j (but these events are not independent in this case!). The expected degree of j is therefore j−1 ∑ j−i i=1
j
+
n ∑ i−j j n = n − + j ln + o(n). i 2 j i=j+1
The expected number of edges is n(n − 1)/4. Looking at the picture, it seems that it tends to some function, which we can try to figure out similarly as in the case of uniform attachment graphs. The probability that i and j are connected can be written in a symmetric form as |j − i|/ max(i, j). If i = xn and j = yn, then this is |x − y|/ max(x, y).
11.4. PROVING CONVERGENCE
189
Figure 11.2. A randomly grown prefix attachment graph with 100 nodes, and the same graph with nodes ordered by their degrees. Does this mean that the graphon U (x, y) = |x − y|/ max(x, y) is the limit? Somewhat surprisingly, the answer is negative, which we can see by computing triangle The probability that three nodes i < j < k form a triangle is ( )(densities. ) 1 − kj 1 − ji (since if k is connected to j, then it is also connected to i). Hence the expected number of triangles is )( ) ( ) ∑ ( j i 1 n 1− 1− = . k j 6 3 i 0 such that if a simple graph G with n nodes has at most ε′ n3 triangles, then we can delete εn2 edges from G so that the remaining graph has no triangles. This lemma sounds innocent, almost like a trivial average computation. This is far from the truth! No simple proof is known, and (worse) all the known proofs give a terrible dependence of ε′ on ε. The best bound, due to Fox [2011] gives an 2··· ε′ such that 1/ε′ is a tower 22 of height about log(1/ε). The original proof gives a tower about 1/ε2 . Perhaps this looks friendlier (?) if we write it as √of height ∗ ′ ε ≈ 1/ log (1/ε ). The proof given below does not give any explicit bound, but it illustrates the way graph limit theory can be used. Proof. Suppose that the lemma is false. This means that there is an ε > 0 and a sequence of graphs (Gn ) such that t(K3 , Gn ) → 0 but deleting any set of εn2 edges, the remaining graph will contain a triangle. By selecting a subsequence, we may assume that t(F, Gn ) is convergent for every simple graph F , and then there is a graphon W such that Gn → W . We have then t(K3 , W ) = limn→∞ t(K3 , Gn ) = 0. The condition on the deletion of edges is harder to deal with, because it does not translate directly to any property of the limit graphon W . What we can do is to “pull back” information from W to the graphs Gn . By Theorem 11.59, we may assume that ∥WGn − W ∥ → 0. (This step is not absolutely necessary, but convenient.) Let S = {(x, y) ∈ [0, 1]2 : W (x, y) > 0}. By Lemma 8.22, we have ∫ ∫ (1 − 1S )WGn → (1 − 1S )W = 0, [0,1]2
[0,1]2
∫ so we can choose n large enough so that (1 − 1S )WGn < ε/4. Let V (Gn ) = [N ], Ji = [(i − 1)/N, i/N ], and Rij = Ji × Jj . We modify Gn by deleting the edge ij if λ(S ∩ Rij ) < 3/(4N 2 ). Claim 11.65. The remaining graph G′n is triangle-free. Indeed, suppose that i, j, k are three nodes such that λ(S ∩ Rij ) ≥ 3/(4N 2 ), λ(S ∩ Rjk ) ≥ 3/(4N 2 ) and λ(S ∩ Rik ) ≥ 3/(4N 2 ). Observe that t(K3 , W ) = 0
11.8. FIRST APPLICATIONS
199
implies that t(K3 , 1S ) = 0. But we have ∫ t(K3 , 1S ) ≥ 1S (x, y)1S (y, z)1S (x, z) dx dy dz Ji ×Jj ×Jk
1 1 1 1 − λ(Rij \ S) − λ(Rjk \ S) − λ(Rik \ S) 3 N N N N 1 1 1 ≥ 3 −3 = > 0, N 4N 3 4N 3 which is a contradiction. What is left is to bound the number m of edges deleted. This is easy: if edge ij is deleted, then WGn = 1 on Rij , and so ∫ ∫ 1 (1 − 1S )WGn = (1 − 1S ) = λ(Rij \ S) ≥ , 4N 2 ≥
Rij
Rij
We know that the number of deleted edges must be at least εN 2 , and so ∫ εN 2 ε (1 − 1S )WGn ≥ = , 4N 2 4 [0,1]2
which contradicts the choice of n.
If you know the usual derivation of the Removal Lemma from the Regularity Lemma, or have worked it out yourself by solving Exercise 11.66, you may feel that the two proofs are analogous; and you are quite right. One may even say that it is the same proof told in a different language, or at least, that it points out a nontrivial connection between the Regularity Lemma and measure theory. Indeed, other much deeper versions and applications of the Regularity Lemma (like the Regularity Lemma for hypergraphs or Szemer´edi’s Theorem on arithmetic progressions) can be proved using mainly measure theoretic arguments; see Elek and Szegedy [2012] [2012] for details. Exercise 11.66. (a) Let G be a graph on n nodes, and consider a Szemer´edi partition of G with k classes and error bound ε as in the Original Regularity Lemma. Let us delete all edges within the classes, between exceptional pairs of classes, and between classes where the edge-density is less than 100ε1/3 . Prove that if the remaining graph contains any triangle, then it contains at least ε(n/k)3 triangles. (b) Prove the Removal Lemma, based on (a). Exercise 11.67. Extend the proof of the Removal Lemma above to the following more general theorem: For every simple graph F and every ε > 0 there is an ε′ > 0 such that if a simple graph G with n nodes has t(F, G) ≤ ε′ , then we can delete εn2 edges from G so that F has no homomorphism into the remaining graph.
CHAPTER 12
Convergence from the right Recall from the Introduction the formula F −→
G −→ H,
which referred to the framework in which we study large graphs from “both sides”: by mapping small graphs into them, and mapping them into small graphs. So far, we have used homomorphisms into the large graph to define convergence. We have seen many examples, however, of homomorphism numbers from the large graph into fixed target graphs H which were very interesting: the number of k-colorings of a graph; Ising and Potts models in statistical physics; approximating maximum cuts. We have seen that homomorphism functions hom(G, .) have a characterization (Theorem 5.59) perfectly analogous to the characterization of homomorphism functions hom(., G) (Corollary 5.58). Does this duality extend to convergence? In this chapter we set out to characterize convergence of a graph sequence (G1 , G2 , . . . ) in terms of mappings “to the right”, using homomorphisms from the graphs Gn into some fixed graph H, based on the results of Borgs, Chayes, Lov´asz, S´os and Vesztergombi [2012]. The first and most natural approach is not going to work. It is clear that considering simple graphs H would not give sufficient information: if the chromatic number of the graphs in the sequence tends to infinity, then hom(Gn , H) is eventually 0 for every fixed simple graph H. We will see that even counting homomorphisms into weighted graphs H would not suffice to characterize convergence. On the other hand, we show that a modification of the notion of homomorphism numbers, or replacing counting by maximization (in other words, considering restricted multicuts), does lead to a characterizations of convergence. It will turn out that the convergence conditions hold for general weighted target graphs H, but it is enough to require them for simple graphs (in the maximization version) of for weighted graphs with just two positive edgeweights (in the counting case). We will talk informally about left-convergence and right-convergence. Leftconvergence of a sequence (Gn ) means our notion of convergence as defined and studied in the previous chapter. Right-convergence means a number of possible convergence notions defined in terms of homomorphisms from the graph Gn into fixed smaller graphs. 12.1. Homomorphisms to the right and multicuts 12.1.1. Naive right-convergence. Consider homomorphisms G → H, where we think of G as a very large simple graph and H is a small weighted graph. It will be convenient to scale the nodeweights of H so that αH = 1; this only scales the values hom(G, H) by an easily computable factor. We will assume that V (G) = [n] and V (H) = [q]. (We refer to Borgs, Chayes, Lov´asz, S´os and 201
202
12. CONVERGENCE FROM THE RIGHT
Vesztergombi [2012] for a treatment of the case when G is also weighted, and also for consequences of these results in statistical physics.) Recall that for a fixed weighted graph H, the value hom(G, H) grows exponentially with n2 , and so a reasonable normalization is to consider the (dense) homomorphism entropy ent(G, H) =
log hom(G, H) . v(G)2
The first, “naive” notion of right-convergence would be to postulate that these homomorphism entropies converge for all weighted graphs H with (say) positive edgeweights. This is at least a necessary condition for convergence: Proposition 12.1. Let (Gn ) be a convergent graph sequence. Then for every weighted graph H with positive edgeweights, the sequence ent(Gn , H) is convergent. To prove this proposition, let us recall from Example 5.19 that the homomorphism entropy can be approximated by the maximum weighted multicut density: Let G be a simple graph on [n], and H, a weighted graph on [q] with positive edgeweights and αH = 1. Define Bij = log βij (H), then log hom(G, H) log q ≤ cut(G, B) + , n2 n where cut(G, B) is the maximum weighted multicut density 1 ∑ Bij eG (Si , Sj ). cut(G, B) = max (S1 ,...,Sq )∈Πn n2 (12.1)
cut(G, B) ≤
i,j∈[q]
We need a couple of facts about weighted multicut densities. First, they are invariant under blow-ups: Lemma 12.2. For a simple graph G, symmetric matrix B ∈ Rq×q and integer k ≥ 1, we have cut(G(k), B) = cut(G, B). Proof. The inequality cut(G(k), B) ≥ cut(G, B) is clear, since every qpartition of V (G) can be lifted to a q-partition of V (G(k)), contributing the same value to the maximization in the definition of cut(G(k), B). To prove the reverse inequality, let (S1 , . . . , Sq ) be the q-partition of V (G(k)) attaining the maximum in the definition of cut(G(k), B). For every node v ∈ V (G), we pick a random element v ′ ∈ V (G(k)) uniformly from the set of twins of v created when blowing it up, and let T = {v ′ : v ∈ V (G)}. Let G′ = G(k)[T ]. Then G′ ∼ = G, and (1 ∑ ) ∑ 1 E 2 Bij eG′ (Si ∩ T, Sj ∩ T ) = Bij eG(k) (Si , Sj ) 2 n (nk) i,j∈[q]
i,j∈[q]
= cut(G(k), B). It follows that for at least one choice of the nodes v ′ , we have 1 ∑ cut(G′ , B) ≥ 2 Bij eG′ (Si ∩ T, Sj ∩ T ) ≥ cut(G(k), B). n
i,j∈[q]
The following lemma is superficially similar to the Counting Lemma 10.22; it is in fact quite a bit simpler.
12.1. HOMOMORPHISMS TO THE RIGHT AND MULTICUTS
203
Lemma 12.3. For two simple graphs G and G′ and symmetric matrix B ∈ Rq×q , we have |cut(G, B) − cut(G′ , B)| ≤ q 2 δ (G, G′ ). Proof. We start with proving the weaker inequality (12.2) |cut(G, B) − cut(G′ , B)| ≤ q 2 δb (G, G′ ) in the case when v(G) = v(G′ ) = n. We may assume that G and G′ are optimally overlayed, so that V (G) = V (G′ ) = [n] and δb (G, G′ ) = d (G, G′ ). Then for every partition (S1 , . . . , Sq ) ∈ Πn , we have 1 ∑ 1 ∑ Bij eG (Si , Sj ) − 2 Bij eG′ (Si , Sj ) 2 n n i,j∈[q]
i,j∈[q]
1 ∑ ≤ 2 Bij |eG (Si , Sj ) − eG′ (Si , Sj )| ≤ q 2 d (G, G′ ). n i,j∈[q]
This proves (12.2). To get the more general inequality in the lemma, we apply (12.2) to the graphs G(n′ k) and G′ (nk), where n = v(G), n′ = v(G′ ), and k is a positive integer. The left side equals |cut(G, B) − cut(G′ , B)| for any k by Lemma 12.2, while the right side tends to q 2 δ (G, G′ ) if k → ∞ by the definition of δ . Proof of Proposition 12.1. By the Theorem 11.3, we have δ (Gn , Gm ) → 0 as n, m → ∞; by Lemma 12.3, this implies that the sequence of numbers cut(Gn , B) is a Cauchy sequence; by (12.1), it follows that the values ent(Gn , H) form a Cauchy sequence. It would be a natural idea here to define convergence of a graph sequence in terms of the convergence of the homomorphism entropies ent(Gn , H). However, this notion of convergence would not be equivalent to left-convergence, and it would allow sequences that we would not like to consider “convergent”, as Example 12.4 below shows. (Some suspicion could have been raised by (12.1) already: the nodeweights of H disappeared, which indicated loss of information.) Example 12.4. Let (Fn ) be a quasirandom graph sequence with edge density p, and let (Gn ) be a quasirandom graph sequence of density 2p, where (to keep the notation simple), we assume that v(Fn ) = v(Gn ) = n. Then we have, for every weighted graph H with positive edgeweights, ∑ 1 1 Bij eFn (Si , Sj ) + O( ) ent(Fn , H) = 2 max n (S1 ,...,Sq )∈Pn n i,j∈[q]
=
∑
max (S1 ,...,Sq )∈Pn
i,j∈[q]
( |S | |S | ) 1 i j Bij p + o(1) + O( ) n n n
= p max{x Bx : x ∈ Rq+ , xT 1 = 1} + o(1). T
Applying the same computation to Gn , we get that for the graphs G2n (disjoint union of two copies of Gn ), we have log(hom(Gn , H)2 ) log hom(Gn , H) log hom(G2n , H) = = (2n)2 4n2 2n2 1 = ent(Gn , H) = p max{xT Bx : x ∈ Rq+ , xT 1 = 1} + o(1). 2
ent(G2n , H) =
204
12. CONVERGENCE FROM THE RIGHT
So merging the sequences (Fn ) and (G2n ) we get a graph sequence for which the quantities ent(Gn , H) converge for every H, but which is clearly not convergent (check the triangle density!). 12.1.2. Typical homomorphisms. Let us try to take the nodeweights of H ∏ into account. The values αφ = v∈V (G) αφ(v) form a probability distribution on the maps φ : [n] → [q], where (by the Law of Large Numbers) we have |φ−1 (i)| ≈ αi n with high probability, if n is large. However, this information becomes irrelevant as n → ∞, and only the largest term will count rather than the “typical”. It turns out that it is often advantageous to restrict ourselves to maps that are ”typical”, by forcing φ to divide the nodes in the given proportions. Let Π(n, α) denote the set of partitions (V1 , . . . , Vq ) of [n] into q parts with ⌊αi n⌋ ≤ |Vi | ≤ ⌈αi n⌉, , and consider the set of maps { } Φ(n, α) = φ ∈ [q]n : ⌊αi n⌋ ≤ |φ−1 (i)| ≤ ⌈αi n⌉ for all i ∈ [q] . [ √ √ ] (We could be less restrictive and allow, say φ−1 (i) ∈ αi n − n, αi n + n . This would not change the considerations below in any significant way.) We define a modified homomorphism number, by summing only over the “typical” homomorphisms: ∑ ∏ hom∗ (G, H) = αφ βφ(u)φ(v) . φ∈Φ(n,α)
uv∈E(G)
From this, we get the typical homomorphism entropy log hom∗ (G, H) . n2 12.1.3. Restricted multicuts. In maximum multicut problems, it is quite natural to fix the proportion into which a multicut separates the node set. For example, the “maximum bisection problem” asks for the maximum size of a cut that separates the nodes into two almost equal parts. We can formulate the restricted multicut problem as follows. We specify (in addition to the coefficients Bij ), q further numbers α1 , . . . , αq > 0 with α1 + · · · + αq = 1. It will convenient to consider the parameters αi and Bij as the nodeweights and edge weights of a weighted graph H ′ with V (H ′ ) = [q]. Then we are interested in the value 1 ∑ (12.3) cut(G, H ′ ) = max 2 Bij eG (Si , Sj ), n i,j ent∗ (G, H) =
where {S1 , . . . , Sq } ranges over all partitions of V (G) such that (12.4)
⌊αi n⌋ ≤ |Si | ≤ ⌈αi n⌉ (i = 1, . . . , q).
This quantity is called a maximum restricted multicut, or in terms of statistical physics, a microcanonical ground state energy. This quantity can be defined for all graphs H with positive nodeweights, by scaling the nodeweights so that they sum to 1. The same simple computation as in Example 5.19 gives the following formula: 1 (12.5) ent∗ (G, H) = cut(G, H ′ ) + O( ). n Here H ′ is obtained from H by replacing all edgeweights by their binary logarithms (while keeping the same nodeweights).
12.2. THE OVERLAY FUNCTIONAL
205
In the study of subgraph densities, the identity t(F, G) = t(F, WG ) guaranteed an easy transition between graphs ( ) and graphons. A finite consequence of this is the identity t(F, G) = t F, G(m) (where G(m) is the m-blowup of the graph G). Lemma 12.2 above shows that a similar identity holds for multicuts. However, for restricted multicuts or homomorphisms to the right such a simple identity does not hold any more. We will have to estimate the error we are making when we replace a graph G by the associated graphon WG (luckily, this error will be small if G is large enough). Exercise 12.5. Show that if αi = 1/q for all i and n < q, then hom∗ (G, H) = tinj (G, H). Exercise 12.6. Let (Gn ) be a quasirandom sequence with edge density p, and let F be a simple graph such that 2e(F [S]) ≤ q|S|2 for every subset S ⊆ V (F ). Prove that cut(Gn , F ) ≤ pq + o(1) (n → ∞).
12.2. The overlay functional The main advantage of the maximum-cut type functions introduced in the previous section is that it is easy to extend their definitions to the case when the graph G is replaced by a graphon. (Let me repeat that there does not seem to be any reasonable extension of the hom function to graphons.) For a probability distribution α on [q], let Π(α) denote the set of partitions (S1 , . . . , Sq ) of [0, 1] into q measurable sets with λ(Si ) = αi . For every graphon U and weighted graph H on node set [q], we define ∫ ∑ Bij U (x, y) dx dy C(U, H) = sup (S1 ,...,Sq )∈Π(α)
i,j∈[q]
Si ×Sj
This notion does not quite extend the maximum restricted weighted multicut of graphs G; the reason is that in a graph, we cannot partition the set of nodes in exactly the desired proportions. But the difference is small; we will come back to this question in Section 12.4.1. We can generalize even further and define, for two kernels U and W , ∫ ( ) U (x, y)W φ(x), φ(y) dx dy. C(U, W ) = sup ⟨U, W φ ⟩ = sup φ∈S[0,1]
φ∈S[0,1]
[0,1]2
It is easy to see that this extends the definition of maximum restricted weighted multicuts in the sense that if U is any graphon and H is a weighted graph, then C(U, H) = C(U, WH ).
(12.6)
The functional C(U, W ), which we call the overlay functional, has many good properties. It follows just like the similar statement for norms in Theorem 8.13 that (12.7)
C(U, W ) = sup ⟨U, W φ ⟩ = sup ⟨U φ , W ⟩ = φ∈S[0,1]
φ∈S[0,1]
sup
⟨U φ , W ψ ⟩
φ,ψ∈S [0,1]
{ } = sup ⟨U0 , W0 ⟩ : (∃φ, ψ ∈ S [0,1] ) U = U0φ , W = W0ψ . Hence it follows that the overlay functional is invariant under measure preserving f0 × W f0 . It is also transformations of the kernels, i.e., it is a functional on the space W
206
12. CONVERGENCE FROM THE RIGHT
immediate from the definition that this quantity has the (somewhat unexpected) symmetry property C(U, W ) = C(W, U ), and satisfies the inequalities (12.8)
⟨U, W ⟩ ≤ C(U, W ) ≤ ∥U ∥2 ∥W ∥2 ,
C(U, W ) ≤ ∥U ∥∞ ∥W ∥1 .
This suggests that C(., .) behaves like some kind of inner product. This analogy is further supported by the following identity, reminiscent of the cosine theorem, relating it to the distance δ2 derived from the L2 -norm: ) 1( (12.9) C(U, W ) = ∥U ∥22 + ∥W ∥22 − δ2 (U, W )2 2 ) 1( = δ2 (U, 0)2 + δ2 (W, 0)2 − δ2 (U, W )2 . 2 Indeed, δ2 (U, W )2 = =
inf
φ∈S[0,1]
∥U ∥22
∥U − W φ ∥22 = ∥U ∥22 + ∥W ∥22 − 2 sup ⟨U, W φ ⟩
+
φ∈S[0,1]
∥W ∥22
− 2C(U, W ).
We have to be a bit careful: the functional C(U, W ) is not bilinear, only subadditive in each variable: (12.10)
C(U + V, W ) ≤ C(U, W ) + C(V, W ).
It is homogeneous for positive scalars: if λ > 0, then (12.11)
C(λU, W ) = C(U, λW ) = λC(U, W ).
We have C(U, W ) = C(−U, −W ), but C(U, W ) and C(−U, W ) are not related in general. A less trivial property of the overlay functional is that it is continuous in each variable (with respect to the δ distance). This does not follow from (12.9), since the distance δ2 (U, W ) is not continuous with respect to δ (only lower semicontinuous; see Section 14.2.1). Lemma 12.7. If δ (Un , U ) → 0 as n → ∞ (U, Un ∈ W1 ), then for every W ∈ W1 we have C(Un , W ) → C(U, W ). Proof. By subadditivity (12.10), we have −C(U − Un , W ) ≤ C(Un , W ) − C(U, W ) ≤ C(Un − U, W ), and hence it is enough to prove that C(Un − U, W ), C(U − Un , W ) → 0. In other words, it suffices to prove the lemma in the case when U = 0. By definition, we have C(Un , W ) ≥ ⟨Un , W ⟩, and the right side tends to 0 by Lemma 8.22. Hence lim inf n C(Un , W ) ≥ 0. To prove the opposite inequality, we start with the case when W is a stepfunc∑m tion. Write W = i=1 ai 1Si ×Ti , then using (12.10) and (12.11), we get C(Un , W ) ≤ ≤
m ∑ i=1 m ∑ i=1
C(Un , ai 1S×T ) =
m ∑
C(ai Un , 1S×T )
i=1
∥ai Un ∥ =
m ∑ i=1
|ai |∥Un ∥ → 0.
12.3. RIGHT-CONVERGENT GRAPHON SEQUENCES
207
Now if W is an arbitrary kernel, then for every ε > 0 we can find a stepfunction W ′ such that ∥W −W ′ ∥1 ≤ ε/2. We know that C(Un , W ′ ) → 0, and hence C(Un , W ′ ) ≤ ε/2 if n is large enough. But then C(Un , W ) ≤ C(Un , W − W ′ ) + C(Un , W ′ ) ≤ ∥Un ∥∞ ∥W − W ′ ∥1 + ε/2 ≤ ε. This shows that lim supn C(Un , W ) ≤ 0, and completes the proof.
While the functional C(U, W ) is continuous in each variable, it is not continf1 , δ ) × (W f1 , δ ). Let (Gn ) be any uous as a functional on the product space (W quasirandom graph sequence and let Wn = Un = 2WGn − 1. Then Un , Wn → 0 in the cut norm (and so also in δ ), but C(Un , Wn ) = 1 for all n. Exercise 12.8. Define the following functional on W0 × W0 : C ∗ (U, W ) =
sup
⟨U, W φ ⟩
φ: [0,1]→[0,1]
where φ is measurable, but not necessarily measure preserving. Prove the formulas maxcut(G) = C ∗ (WG , WK2 ), and
( ) ∥U ∥ ≈ C ∗ U, 1(x, y ≤ 1/2) , where the ≈ sign means equality up to a factor of 2.
12.3. Right-convergent graphon sequences As we have experienced before, many results are easier and cleaner when formulated for graphons. This is particularly true for results about right-convergence. the goal of this section is to formulate various characterizations of convergent graphon sequences in terms of quantities defined by maps from the underlying set of a graphon. Then in the next section we give a characterization of convergent graph sequences, which will sound almost identical, but whose prof will be much more tedious. 12.3.1. Quotient sets of graphons. If W is a kernel and P = (S1 , . . . , Sq ) is a measurable q-partition of [0, 1], we have defined the template (quotient) graph W/P: it is a weighted graph on [q], with node weights αi (W/P) = λ(Si ) and edge weights ∫ 1 βij (W/P) = W. λ(Si )λ(Sj ) Si ×Sj The Regularity Lemma (Lemma 9.13 and its versions) said that there is always a template that is close to the original graphon (or graph in the finite case) in the cut norm. In order to get right-convergence criteria, we will study all templates of a given graphon. For a kernel W and probability distribution a on [q], we denote by Qa (W ) the set of templates L = W/P with α(L) = a. We denote by Qq (W ) the set of all q-partitions. () We will consider these quotient sets as subsets of the 2q + q dimensional real space. The quotient set Qq (W ) is not always closed, but it is closed if W is a stepfunction (see Exercise 12.13). The closure can be described in terms of “fractional partitions”, also discussed in exercises. Quotient sets can be used to express the overlay functional, at least if one of the kernels involved is a weighted graph. For every weighted graph H with α(H) = a
208
12. CONVERGENCE FROM THE RIGHT
and every kernel W , we have C(W, H) =
(12.12)
∑
max L∈Qa (W )
αi (L)αj (L)βij (H)βij (L).
i,j∈[q]
In this case, we have αi (L) = αi (H) for all i. We need a classical definition: if (X, d) is a metric space, then the Hausdorff metric is defined on the set of subsets of X by the formula dHaus (A, B) = inf{c : d(a, B) ≤ c ∀a ∈ A and d(b, A) ≤ c ∀b ∈ B}
(12.13)
(here d(a, B) denotes the distance of point a from set B). The special case we need is when A, B are sets of weighted graphs on [q]. we can use the edit distance or the cut distance, and denote the corresponding Hausdorff distance of two sets A and B by dHaus (A, B) or dHaus (A, B). Most of the time it does not make much difference 1 which one we use, because of the trivial inequalities dHaus (A, B) ≤ dHaus (A, B) ≤ q 2 dHaus (A, B). 1
(12.14)
We start with a lemma relating quotient sets with different node weights. Lemma 12.9.( For any W ∈ W)1 and any two probability distributions a, a′ on [q], Qa (W ), Qa′ (W ) ≤ 3∥a − a′ ∥1 . we have dHaus 1 Proof. Let L = W/P ∈ Qa (W ), where P = {S1 , . . . , Sq } ∈ Π(a). It is easy to construct a partition P ′ = {S1′ , . . . , Sq′ } ∈ Π(a′ ) such that either Si ⊆ Si′ or Si′ ⊆ Si for every i. Let L′ = W/P ′ ∈ Qa′ (W ). By definition, ∫ ∑ ∫ d1 (L, L′ ) = ∥a − a′ ∥1 + W − W . i,j∈[q]
Here
∫
Si ×Sj
∫ W−
Si′ ×Sj′
Si′ ×Sj′
Si ×Sj
( ) W ≤ λ (Si × Sj )△(Si′ × Sj′ ) ≤ |ai − a′i | max(aj , a′j ) + |aj − a′j | min(ai , a′i ).
Summing over all i and j, we get ) ( ∑ ∑ max(aj , a′j ) + min(ai , a′i ) d1 (L, L′ ) ≤ ∥a − a′ ∥1 1 + = ∥a − a′ ∥1
(
j
i
) ∑ (aj + a′j ) = 3∥a − a′ ∥1 . 1+ j
This proves the Lemma.
We need a couple of lemmas about the Hausdorff distance of quotient sets of different graphons. Lemma 12.10. For any two graphons U and W and any integer q ≥ 1, we have ( ) dHaus Qq (U ), Qq (W ) ≤ δ (U, W ). Proof. The quotient sets are invariant under weak isomorphisms, and hence we may assume that U and W are optimally overlayed, so that δ (U, W ) = ∥U − W ∥ . The contractivity of the stepping operator (Exercise 9.17) asserts
12.3. RIGHT-CONVERGENT GRAPHON SEQUENCES
209
that d (U/P, W/P) ≤ ∥U − W ∥ for any q-partition P of [0, 1]. By the definition of Hausdorff distance, this implies that ( ) dHaus Qq (U ), Qq (W ) ≤ ∥U − W ∥ = δ (U, W ). Lemma 12.11. For any two graphons U and W and any integer q ≥ 1, we have ( ) ( ) ( ) dHaus Qq (U ), Qq (W ) ≤ sup dHaus Qa (U ), Qa (W ) ≤ 4dHaus Qq (U ), Qq (W ) a
(where a ranges over all probability distributions on [q]). Proof. The first inequality is easy: let H ∈ Qq (U ), then H ∈ Qb (U ) for the distribution b = α(H). Hence ( ) ( ) ( ) Qb (U ), Qb (W ) d H, Qq (W ) ≤ d H, Qb (W ) ≤ dHaus ( ) Qa (U ), Qa (W ) . ≤ sup a dHaus Since this holds for every H ∈ Qq (U ), and analogously for every graph in Qq (W ), the inequality follows by the definition of the Hausdorff distance. To prove the second inequality, let a be any probability distribution on [q] and H ∈ Qa (U ).( For every ε > ) 0, there is a quotient L ∈ Qq (W ) such that Q (U ), Q (W ) + ε. Let L ∈ Qb (W ), then ∥a − b∥1 ≤ d (H, L) d (H, L) ≤ dHaus q q by the definition of d (H, L). By Lemma 12.9, there is a quotient L′ ∈ Qa (W ) such that d (L, L′ ) ≤ d1 (L, L′ ) ≤ 3|a − b| + ε ≤ 3d (H, L) + ε. Thus ( ) Qq (U ), Qq (W ) + 5ε. d (H, L′ ) ≤ d1 (H, L) + d (L, L′ ) ≤ 4d (H, L) + ε ≤ 4dHaus Since ε was arbitrary, this proves the lemma.
12.3.2. Graphon convergence from the right. After this preparation, we are ready to characterize convergence of a graphon sequence in terms of homomorphisms into fixed weighted graphs. Theorem 12.12. For any sequence (Wn ) of graphons, the following are equivalent: (i) the sequence (Wn ) is convergent in the cut distance δ ; (ii) the overlay functional values C(Wn , U ) are convergent for every kernel U ; (iii) the restricted multicut densities C(Wn , H) are convergent for every simple graph H; (iv) the quotient sets Qq (Wn ) form a Cauchy sequence in the dHaus Hausdorff metric for every q ≥ 1. It follows from conditions (ii) and (iii) that it would be equivalent to assume the convergence of the sequence C(Wn , H) for every weighted graph H. Lemma 12.11 implies that we could require in (iv) the convergence of Qa (Wn ) for every q ≥ 1 and probability distribution a on [q]. In fact, it would be enough to require this for the uniform distribution (see Exercise 12.24). In (iv), we could use the dHaus Hausdorff metric as well. 1 Proof. (i)⇒(ii) by Lemma 12.7. (ii)⇒(iii) is trivial. (i)⇒(iv) by Lemma 12.10. (iii)⇒(i): Let (Wn ) be a sequence of graphons that is not convergent in the cut distance. By the compactness of the graphon space, it has two subsequences (Wni ) and (Wmi ) converging to different unlabeled graphons W and W ′ . There is a graphon U such that C(W, U ) ̸= C(W ′ , U ); in fact, (12.9) implies ( ) ( ) C(W ′ , W ′ ) − C(W ′ , W ) + C(W, W ) − C(W ′ , W ) = δ2 (W ′ , W )2 > 0,
210
12. CONVERGENCE FROM THE RIGHT
and so either C(W ′ , W ′ ) ̸= C(W, W ′ ) or C(W, W ′ ) ̸= C(W, W ), and we can take either U = W or U = W ′ . Furthermore, we can choose U of the form U = WH , where H is a simple graph. This follows using the fact that simple graphs are dense in the graphon space, and the continuity of the overlay functional (Lemma 12.7). Since C(Wni , H) → C(W, H) and C(Wmi , H) → C(W ′ , H) by Lemma 12.7, it follows that the values C(Wn , H) cannot form a convergent sequence, contradicting (ii). (iv)⇒(iii): Fix any simple graph H on [q], and let a be the uniform distribution on [q]. Let n, m ≥ 1, then we have 1 ∑ 1 ∑ C(Wn , H) = sup βij (L) = max βij (L). 2 2 L∈Qa (Wn ) q L∈Qa (Wn ) q i,j i,j ij∈E(H)
ij∈E(H)
Let Ln ∈ Qa (Wn ) attain the maximum. By the definition ( of Hausdorff distance, ) there is an L′ ∈ Qa (Wm ) such that d (Ln , L′ ) ≤ dHaus Qa (Wn ), Qa (Wn ) . The ∑ definition of the C functional implies that C(Wm , H) ≥ (1/q 2 ) ij∈E(H) βij (L′ ). Hence 1 ∑ 1 ∑ C(Wn , H) − C(Wm , H) ≤ 2 βij (Ln ) − 2 βij (L′ ) q q i,j ij∈E(H)
ij∈E(H)
1 ∑ ≤ 2 |βij (Ln ) − βij (L′ )| = d1 (Ln , L′ ) ≤ q 2 d (Ln , L′ ) q i,j ( ) Qa (Wn ), Qa (Wn ) . ≤ q 2 dHaus By Lemma 12.11, we have ( ) ( ) Qq (Wn ), Qq (Wn ) , Qa (Wn ), Qa (Wn ) ≤ 4dHaus dHaus which tends to 0 as n, m → ∞ by hypothesis. This implies that ( ) lim sup C(Wn , H) − C(Wm , H) ≤ 0. n
Since a similar conclusion holds with n and m interchanged, we get that ( ) C(Wn , H) : n = 1, 2, . . . is a Cauchy sequence. Some of the arguments in the proof of Theorem 12.12, most notably the proof of (iii)⇒(i), were not effective. One can in fact prove explicit inequalities between the different distance measures that occur. We refer to Borgs, Chayes, Lov´asz, S´os and Vesztergombi [2012] for the details. Exercise 12.13. Show by an example that the set Qq (W ) is not closed in general, but it is closed if W is a stepfunction. Exercise 12.14. Show by an example that Qa (W ) is not convex in general, even if W is a stepfunction. Exercise 12.15. A fractional partition of [0, 1] into q parts is an ordered q-tuple of measurable functions ρ1 , . . . , ρq : S → [0, 1] such that for all x ∈ [0, 1], we have ρ1 (x) + · · · + ρq (x) = 1. For a fractional partition ρ of [0, 1] and a kernel W ∈ W, we define the fractional quotient graph W/ρ as a weighted graph on [q] with αi (W/ρ) = ∥ρi ∥1 and ∫ 1 βij (W/ρ) = ρi (x)ρj (y)W (x, y) dx dy. ∥ρi ∥1 ∥ρj ∥1 [0,1]2
Prove that the operation W 7→ W/ρ is contractive for the L1 and L2 .
12.4. RIGHT-CONVERGENT GRAPH SEQUENCES
211
Exercise 12.16. Let ρ be a fractional q-partition of [0, 1]. Prove that W/ρ ∈ Qq (W ). Also proved that every weighted graph in Qq (W ) can be represented this way.
12.4. Right-convergent graph sequences Our basic plan is to apply theorem 12.12 to graphons WGn to get characterizations of convergent graph sequences in terms of homomorphisms to the right. There are two difficulties in the way. First, in the restricted multicut density condition (iii), thee is no way in general to partition the nodes of a graph in given proportions; we have to allow some rounding of the prescribed sizes for the partition classes. Second, a partition of [0, 1] obtained when we overlay WGn and H optimally may not correspond to any partition of V (Gn ). (It corresponds to a “fractional partition”, which we will have to define because of this.) This problem is more serious, and we have to work harder to obtain true partitions of V (Gn ). 12.4.1. Restricted quotients. Let G be a simple graph with nodeset [n] and let P = (S1 , . . . , Sq ) be a partition of [n]. We consider the quotient graph G/P as a weighted graph on [q], with node weights αi (G/P) = |Si |/n (i ∈ [q]), and edge weights βij (G/P) = eG (Si , Sj )/|Si ||Sj | (i, j ∈ [q]). The set of all weighted graphs G/P, where P ranges over all q-partitions of [n], will be called the quotient set of G (of size q), and will be denoted by Qq (G). For a graph G and a probability distribution a on [q], to define the restricted quotient set Qa (G), we have to allow the relative sizes of the partition classes to deviate a little from the prescribed values a: we consider the set of quotients G/P, where P ∈ Π(n, a). Quotient sets can be used to express multicut functions. For every weighted graph H, ∑ αi (L)αj (L)βij (L)βij (H). (12.15) cut(G, H) = max L∈Qa (G)
i,j∈[q]
Note that the nodeweights of H and L are not the same in general, but almost: |αi (L) − αi (H)| ≤ n1 . Remark 12.17. The quotient sets are in a sense dual to the (multi)sets of induced subgraphs of a given size, which was one of the equivalent ways of describing what we could see by sampling. Instead of gaining information about a large graph by taking a small subgraph, we take a small quotient. However, there are substantial differences. On the set of induced subgraphs of a given size, we had a probability distribution, which carried the relevant information. We can also introduce a probability distribution on quotients of a given size of a graph G, by taking a random partition. This would be quite relevant to statistical physics, but we would run into difficulties when tending to infinity with the size of G. The probability distributions would concentrate more and more on boring average quotients, while the real information would be contained in the outliers. To be more specific, a random induced subgraph (of a fixed, but sufficiently large size) approximates the original graph well, but a random quotient does not carry this information. In other words, it is the set of quotients that characterizes the convergence of a graph sequence, and not the distribution on it.
212
12. CONVERGENCE FROM THE RIGHT
12.4.2. Fractional partitions. A fractional partition of a set S is an ordered q-tuple of functions ρ1 , . . . , ρq : S → [0, 1] such that for all x ∈ [0, 1], we have ρ1 (x) + · · · + ρq (x) = 1. An ordinary partition corresponds to the special case when every ρi is 0-1 valued. For every vector α ∈ [0, 1]q , we denote by Π∗ (n, α) the set of fractional partitions ρ of [n] with ∥ρi ∥1 = αi n. We extend the notion of quotients to fractional partitions. For every fractional partition ρ of [n], we consider the fractional quotient graph G/ρ, which is a weighted graph on [q], with node weights αi (G/ρ) =
1 ∑ 1 |ρi | = ρi (u) n n
(i ∈ [q])
u∈[n]
and edge weights βij (G/ρ) =
∑
/ ρi (u)ρj (v) |ρi ||ρj |
(i, j ∈ [q])
u,v∈[n]
In the special case when every value ρi (u) is 0 or 1, then the supports of the functions ρi form a partition P, and G/P = G/ρ. We also introduce fractional quotient sets, replacing partitions by fractional partitions. The set of all fractional q-quotients of a graph G is denoted by Q∗q (G), and the set of all fractional q-quotients G/ρ for which α(G/ρ) is a fixed distribution a on [q], by Q∗a (G). 12.4.3. Relations between quotient sets. Our goal is to use quotient sets to characterize convergence of a graph sequence. But before doing so, we have to formulate and prove a number of rather technical relationships between different quotient sets. For any simple graph G and positive integer q, we have two quotient sets: the set Qq (G) of quotients G/P, and the set Q∗q (G) of fractional quotients G/ρ. In addition, we have the restricted versions Qa of both of these. The quotient sets Qq (WG ) and Qa (WG ) will also come up; but it is easy to see that these are just the same as Q∗q (G) and Q∗a (G). Turning to the quotient set Qq (G) (which is of course the most relevant from the combinatorial point of view), it follows immediately from the definition that Qq (G) ⊆ Q∗q (G). Note, however, that Qa (G) and Q∗a (G) are in general not comparable. The first set is finite, the second is typically infinite. On the other hand, Qa (G) contains graphs whose nodeweight vector is only approximately equal to a, and so it is not contained in Q∗a (G). In the rest of this section we are going to prove that the “true” quotient sets and their fractional versions are not too different, at least if the graph is large. We will need the following version of Lemma 12.9, which can be proved along the same lines. Lemma 12.18. For (any simple graph) G and any two probability distributions a, a′ on [q], we have dHaus Qa (G), Qa′ (G) ≤ 3∥a − a′ ∥1 . 1 The two kinds of quotient sets of the same graph are related by the following proposition.
12.4. RIGHT-CONVERGENT GRAPH SEQUENCES
213
Proposition 12.19. For every simple graph G on [n], integer q ≥ 1, and probability distribution a on [q], ( ∗ ) ( ∗ ) 16q 4q dHaus Qq (G), Qq (G) ≤ √ and dHaus Qa (G), Qa (G) ≤ √ . 1 1 n n Proof. We start with the first inequality, whose proof gets somewhat technical. Since Qq (G) ⊆ Q∗q (G), it suffices to prove that if H is a fractional q-quotient of G, √ then there exists a q-quotient L of G such that d1 (H, L) ≤ 4q/ n. We may assume that q ≥ 2 and n > 9q 2 ≥ 36 (else, the assertion is trivial). Let ρ ∈ Π∗ (n, α) be a fractional partition such that G/ρ = H. We want to “round” the values ρi (u) to ri (u) ∈ {0, 1} so that we get an integer partition in Π(n, α) with “almost” ∑ the same quotient. Let A denote the adjacency matrix of G, and define Fij (r) = u,v∈[n] Auv ri (u)rj (v), then we want ∑ ∑ (12.16) ri (u) = 1, ⌊αi n⌋ ≤ ri (u) ≤ ⌈αi n⌉, Fij (r) ≈ Fij (ρ) u
i
for all i, j ∈ [q]. We do random ( rounding: for) each u ∈ [n], let Zu be chosen randomly from the distribution ρ1 (u), . . . , ρq (u) , and let Ri (u) = 1(Zu = i). So P(Ri (u) = 1) = ρi (u), and for∑ different nodes u of G the random variables Ri (u) are independent. We have i Ri (u) = 1 for every u ∈ [n], and so the Ri define a partition P. Let L = G/P.( For the other two conditions we get the right value at least in ) ( ) ∑ expectation: E u Ri (u) = αi n and E Fij (R) = Fij (ρ) (in the last equation we use that Ri (u) and Rj (v) ∑ are independent if u ̸= v, and Auv = 0 if u = v). We consider the errors Xi = u Ri (u) − αi n and Yij = Fij (R) − Fij (ρ), and use a second moment argument to show that these are small. We have ∑ ( ) ∑ Var(Xi ) = Var Ri (u) = (ρi (u) − ρi (u)2 ) < αi n, u
and hence E(
u
∑
Xi2 ) =
i
∑
Var(Xi ) < n.
i
Furthermore, (12.17)
( ) Var(Yij ) = Var Fij (R) =
∑
( ) Auv Au′ v′ cov Ri (u)Rj (v), Ri (u′ )Rj (v ′ ) .
u,v,u′ ,v ′ ∈[n]
Each covariance in this sum depends on which of u, v, u′ , v ′ and also which of i and j are equal, but each case is easy to treat. The covariance term is 0 if the edges uv and u′ v ′ are disjoint. If i ̸= j, we get: ρi (u)ρj (v) − ρi (u)2 ρj (v)2 < ρi (u)ρj (v), if u = u′ and v = v ′ , −ρ (u)ρ (v)ρ (u)ρ (v) < 0, if u = v ′ , v = u′ , i i j j ρj (v)ρj (v ′ )(ρi (u) − ρi (u)2 ) < ρi (u)ρj (v)ρj (v ′ ), if u = u′ and v ̸= v ′ , −ρi (u)ρj (v)ρi (v)ρj (v ′ ) < 0, if u = v ′ , u′ ̸= v. (The other possibilities are covered by symmetry.) Summing over u, v, u′ , v ′ , we get that the sum in (12.17) is at most (αi n)(αj n) + 0 + (αi n)(αj n)2 + (αi n)2 (αj n) + 0.
214
12. CONVERGENCE FROM THE RIGHT
The case when i = j can be treated similarly, and we get that the sum in (12.17) is at most 2(αi n)2 + 4(αi n)3 . Hence, summing over all i and j, we get ∑ ∑ ∑ ∑ E(Yij2 ) = Var(Yij ) ≤ n2 + (n2 + 2n3 ) αi2 + 4n3 αi3 ≤ 6n3 + 2n2 . i,j
i,j
i
i
By Cauchy–Schwarz, (∑ |X | ∑ |Y | )2 (1 ∑ 1 ∑ 2) i ij 2 2 d1 (H, L)2 = + ≤ (q + q) X + Y , i n n2 n2 i n4 i,j ij i i,j and so
(1
6 2 ) 16q 2 + 2 < . n n n √ n Hence with positive probability, d1 (H, L) ≤ 4q/ n. The second inequality in the proposition is quite easy to prove now, except that we cannot use containment in either direction, and so we have to prove two “almost containments”. Let H ∈ Qa (G) and let b = α(H), ∥a − b∥1 ≤ q/n, ∗ and H ∈ Q∗b (G). By Lemma √ 12.18, there is an L ∈ Qa (G) such that d1 (H, L) ≤ 3∥a − b∥1 ≤ 3q/n < 16q/ n. ′ Conversely, let H ∈ Q∗a (G), then √ by part (a), there exists a q-quotient H ∈ ′ Qq (G) such that d1 (H, H ) ≤ 4q/ n. Lemma 12.18 implies that there exists an L ∈ Q∗a (G) such that E(d1 (H, L)2 ) = (q 2 + q)
+
d1 (L, H ′ ) ≤ 3|a − α(H ′ )| = 3|α(H) − α(H ′ )| ≤ 3d1 (H, H ′ ), √ and so d1 (L, H) ≤ d1 (L, H ′ ) + d1 (H ′ , H) ≤ 4d1 (H, H ′ ) ≤ 16q/ n.
12.4.4. Right-convergent graph sequences. In a sense, right-convergence of a graph sequence is a special case of right-convergence of a graphon sequence. However, quantities like multicuts associated with a graphon of the form WG are only approximations of the analogous combinatorial quantities associated with the corresponding graph G. (This is in contrast with the homomorphism densities from the left, recall e.g. (7.2).) In this section we prove that these approximations are good enough for the equivalent characterizations of convergence of graphon sequences to carry over to graph sequences. We prove the following characterization of convergence of a dense graph sequence, analogous to the characterization of convergence of a graphon sequence given in Theorem 12.12. Theorem 12.20. Let (Gn ) be a sequence of simple graphs such that v(Gn ) → ∞ as n → ∞. Then the following are equivalent: (i) the sequence (Gn ) is convergent; (ii) the overlay functional values C(WGn , U ) are convergent for every kernel U ; (iii) the restricted multicut densities cut(Gn , H) are convergent for every simple graph H; (iv) the quotient sets Qq (Gn ) are Cauchy in the Hausdorff metric for every q ≥ 1. Clearly, conditions (ii) and (iii) are also equivalent to the convergence of cut(Gn , H) for every weighted graph H. By (12.5), this is equivalent to the convergence of typical homomorphism entropies ent∗ (G, J) for every weighted graph J with positive edgeweights. By our discussion in Section 2.2, we could talk about
12.4. RIGHT-CONVERGENT GRAPH SEQUENCES
215
microcanonical ground state energies instead of restricted multicuts. By the results of the previous section, we could replace Qq (Gn ) by Q∗q (Gn ), or we could require the convergence of Qa (Gn ) for every q ≥ 1 and probability distribution a on [q]. On the other hand, it would be enough to require this for the uniform distribution (see Exercise 12.24). Proof. If we replace Gn by WGn , then (i) and (ii) do not change. In (iii), we have ∑ αi (L)αj (L)βij (L)βij (H), C(Gn , H) = max L∈Qa (G)
i,j∈[q]
and C(WGn , H) =
max ∗
L∈Qa (G)
∑
αi (L)αj (L)βij (L)βij (H).
i,j∈[q]
( ∗ ) √ Since dHaus Qa (G), Qa (G) ≤ 9q/ n by Proposition 12.19, it follows that C(Gn , H) 1 is convergent as n → ∞ if and only if C(WGn , H) is. By a similar argument, the validity of (iv) does not change if we replace Gn by WGn . So the theorem follows by Theorem 12.12. The last theorem in this chapter describes the limiting values in Theorem 12.20. The proof is contained in our previous considerations. Theorem 12.21. Let Gn be a convergent sequence of simple graphs such that Gn → W (W ∈ W0 ). Let F be a simple graph, let H be a weighted graph with positive edgeweights, and let J be obtained from H by replacing every edgeweight by its binary logarithm. Then cut(Gn , J) → C(W, J) and ent∗ (Gn , H) → C(W, J). Remark 12.22. How far can we push convergence results for right-homomorphism parameters of a convergent graph sequence? Suppose that (Gn ) is a convergent graph sequence. Does it follow that log t(Gn , W )/v(Gn )2 tends to a limit for every graphon W ? Or at least, for every graphon W > 0? or at least for every graphon W with 1/2 ≤ W ≤ 1? Theorem 12.20(ii) suggests an affirmative answer, but this is false: the example in Exercise 12.25 (which is hard!) shows that right-convergence in this more general sense does not follow from left-convergence. Exercise 12.23. Prove that maxH |C(U, H) − C(W, H)|, where the maximum is taken over all weighted graphs H on [q] with nodeweight vector a) and edgeweights ( ( ) in [−1, 1], is equal to the Hausdorff distance dHaus conv(Qa (U ) , conv Qa (U ) ). 1 Exercise 12.24. Prove that a sequence (Wn ) of graphons is convergent if and only if the quotient sets Qu (Wn ) are convergent in the Hausdorff metric for every q ≥ 1, where u is the uniform distribution on [q]. Exercise 12.25. Let (Gn ) be a quasirandom graph sequence with edge density = kn is a sufficiently fast increasing sequence of integers. 1/2, such that v(Gn ) ∑ Define W (x, y) = 1 + ∞ / [0, 1] n=1 WG2n (k2n x, k2n y) (where WGn (x, y) = 0 if x ∈ or y ∈ / [0, 1]). Prove that the kernel W is 1-2 valued, and log t(G2n , W ) 1 = + o(1), 2 k2n 2
log t(G2n+1 , W ) 1 = + o(1). 2 k2n+1 4
CHAPTER 13
On the structure of graphons 13.1. The general form of a graphon A probability space J = (Ω, A, π) together with a symmetric function W : J × J → R, measurable with respect to the completion of the sigma-algebra A × A, will be called a kernel, and if the range of W is contained in [0, 1], a graphon. So far, we have assumed that J is the unit interval [0, 1] with the Lebesgue measure, and this was good enough to get a limit object for every convergent graph sequence. But recall Example 11.41, where it took an artificial step of applying a measure preserving map to carry the graphon structure over to the unit interval. A similar step was needed in the definition of the tensor product of two kernels (Section 7.4). In both cases, allowing more general underlying sets leads to a simpler and cleaner situation. If we have to make this distinction, a graphon on [0, 1] with the Lebesgue measure will be called a graphon on [0, 1]. Subgraph densities can be defined in any graphon by the same formula as in the special case of [0, 1]: For a multigraph F = (V, E), we define ∫ ∏ ∏ W (xi , xj ) dπ(xi ). t(F, W ) = ΩV
ij∈E
i∈V
Hence we can define weak isomorphism of kernels as before. We can sample H(n, W ) and G(n, W ) from any graphon. If we fix the underlying space, we can talk about kernel norms as before. If (Ω, A, π) and (Ω′ , A′ , π ′ ) are two probability spaces, (Ω, A, π, W ) is a kernel, and φ : Ω′ → Ω is a measure preserving map, then the pullback W φ can be defined as before: ( ) W φ (x, y) = W φ(x), φ(y) . It is clear that (Ω′ , A′ , π ′ , W φ ) will be a kernel, which we call the pullback of (J ′ , W ′ ). Simple computation shows that for every graph F , (13.1)
t(F, W ) = t(F, W φ ),
so (Ω′ , A′ , π ′ , W φ ) is weakly isomorphic to (Ω, A, π, W ). To define the δ distance of two kernels, we have to be a little careful, since at this point we allow the underlying spaces to have atoms, and then the difficulties in the definition of the cut-distance for graphs (Sections 8.1.2–8.1.4) resurface. Let A1 = (J1 , W1 ) and A2 = (J2 , W2 ) be two kernels. To avoid the difficulty with atoms, we consider all kernels (J, W ) and all pairs of measure preserving maps φi : J → Ji . We define δ (A1 , A2 ) = inf ∥W1φ1 − W2φ2 ∥ , where the infimum ranges over all choices of J, W, φ1 and φ2 . 217
218
13. ON THE STRUCTURE OF GRAPHONS
At this point, there are some rather technical questions to address. Do we want to assume that J is a standard probability space (see Appendix A.3)? Do we want to consider A as a complete sigma-algebra with respect to the probability measure (like Lebesgue measurable sets in [0, 1]), or to the contrary, do we want to assume that it is countably generated (like Borel sets in [0, 1])? It was shown by Borgs, Chayes and Lov´asz [2010] that a kernel on an arbitrary probability space can be transformed, by very simple steps, into a kernel on a standard probability space, which is equivalent for all practical purposes (in particular, weakly isomorphic). The steps of such a transformation are described in Exercises 13.11–13.13 below. This implies that we can work with standard probability spaces whenever necessary (or just convenient). We call a kernel standard, if the underlying probability space is standard. From the point of view of subgraph densities, multicuts, etc. the underlying space does not matter much, as we shall see; but choosing the underlying probability space appropriately may lead to a simpler form for the function W and to simpler computations. If the space J is finite, we get just a weighted graph with normalized nodeweights. Let us see a number of further examples where allowing this more general form is very useful (and so Example 11.41 was not an isolated occurrence). Example 13.1 (Limits of interval graphs). An interval graph is a graph obtained from a finite set of intervals on a line, where the intervals are the nodes, and two intervals are connected by an edge if and only if they have a point in common. Diaconis, Holmes and Janson [2011] give the following description of limits of interval graphs. ( ) ( Let J =) {(x, y) ∈ [0, 1]2 : x ≤ y}, and define W (x1 , y1 ), (x2 , y2 ) = 1 [x1 , y1 ]∩ [x2 , y2 ] ̸= ∅ . So (J, W ) can be considered as an infinite interval graph, whose nodes are all sub-intervals of [0, 1]. We can take any probability measure on the Borel sets in J: we always obtain a graphon that is the limit of interval graphs, and all interval graph limits arise this way. Example 13.2. Fix some d ≥ 2, and let Vn be a set of n unit vectors in Rd , chosen independently from the uniform distribution on the unit sphere. Connect two elements x, y ∈ Vn by an edge if and only if xT y ≥ 0, to get a graph Gn = (Vn , En ). The sequence (Gn : n = 1, 2, . . . ) is convergent, and its limit is the graphon whose underlying set is S d−1 , with the uniform distribution, and W (x, y) = 1(xT y ≥ 0). 13.1.1. Atomfree and twin-free kernels. There are still ways to further simplify a kernel (J, W ) on a standard probability space (Ω, A, π). One possibility is to get rid of the atoms by a procedure generalizing the construction of the kernel WH from a weighted graph H, by assigning to each atom a an interval Ia of length π(a), and an interval I to the atom-free part of Ω, so that these intervals partition [0, 1]. This defines a measure preserving map ψ : [0, 1] → Ω, and the pullback W φ will define a kernel on [0, 1] that is weakly isomorphic to (J, W ). This procedure takes us to a very familiar domain (two-variable real functions), but the kernel on [0, 1] is still not uniquely determined by its weak isomorphism class, as we have seen in Example 7.11. To really standardize a kernel, we go the opposite way, by creating and merging atoms as much as we can. To be more precise, we need some definitions.
13.1. THE GENERAL FORM OF A GRAPHON
219
Let (J, W ) be a kernel. Two points x, x′ ∈ J are called twins if W (x, y) = W (x′ , y) for almost all y ∈ J. This defines an equivalence relation on J. We call the kernel H twin-free if no two distinct points in J are twins. Proposition 13.3. For every kernel (J, W ) there is a twin-free kernel (J1 , W1 ) and a measure preserving map φ : J → J1 such that W = W1φ almost everywhere. If J is standard, then we can require that J1 be standard as well. Proof. The twin-free kernel and the measure preserving map φ will be easy to define, but we have to work on verifying their properties. First, we only modify the sigma-algebra of the probability space J = (Ω, A, π). Let us define a new sigmaalgebra A′ consisting of those sets in A that do not separate any twin points. Let J ′ = (Ω, A′ , π) (this is a little abuse of notation, since we should restrict π to A′ ). Define W ′ = E(W | A′ × A′ ). Claim 13.4. W = W ′ almost everywhere. It suffices to show that ∫ ∫ (13.2) W dπ × dπ = A×B
W ′ dπ × dπ
A×B
for all A, B ∈ A. (Note that this holds for A, B ∈ A′ by the definition of conditional probability.) Consider the functions ∫ ∫ UA = W (., y) dπ(y), gA = E(1A | A′ ), VA = W (., y)gA (y) dπ(y). A
Ω ′
These functions are A -measurable by the definition of twins. Furthermore, gA is the orthogonal projection of 1A into the space of A′ -measurable functions, gAB (x, y) = gA (x)gB (y) is the orthogonal projection of 1A×B into the space of A′ × A′ -measurable functions, and by definition, W ′ is the orthogonal projection of W into this space. Using these observations, we have ∫ W dπ × dπ = ⟨1B , UA ⟩ = ⟨gB , UA ⟩ = ⟨VB , 1A ⟩ = ⟨VB , gA ⟩ A×B
= ⟨W, gAB ⟩ = ⟨W , gAB ⟩ = ⟨W , 1A×B ⟩ = ′
′
∫
W ′ dπ × dπ.
A×B
This proves (13.2) and the Claim. It seems that instead of eliminating twin points, we made them even more “twin-like”: they are now not separated by any set from A′ . But this is good, because then identifying them does not change anything: Let Ω1 denote the set of equivalence classes of “being twins” on Ω, let φ(x) be the equivalence class) ( containing x ∈ Ω, let A1 = {φ(X) : X ∈ A′ }, and define π1 (X) = π φ−1 (X) for X ∈ A1 . Then J1 = (Ω1 , A1 , π1 ) is a probability space. Furthermore, W ′ is constant on S × T for two equivalence classes, and hence W1 (S, T ) = W ′ (x, y) (x ∈ S, y ∈ T ) is well defined. Trivially, W1φ = W ′ , so by Claim 13.4, we have W1φ = W almost everywhere. For the proof of the second statement of the Proposition, we need the following. Claim 13.5. If A is countably generated, then A1 is countably separated.
220
13. ON THE STRUCTURE OF GRAPHONS
Let R be a countable generating set in A. For every R ∈ R and rational number r, consider the set ∫ { } SR,r = x ∈ Ω : W (x, y) dπ(y) ≥ r . R
Clearly SR,r ∈ A′ . Furthermore, if x and x′ are not twins, then W (x, .) and ′ W measure, and so there is ∫ (x , .) differ on a ∫set of positive ∫ a set R ∈ R ∫ such that ′ W (x, y) dπ(y) = ̸ W (x , y) dπ(y). Assume that (say) W (x, .) > W (x′ , .), R R R R ∫ ∫ ′ then for any rational number r with R W (x, .) > r > R W (x , .) we have x ∈ SR,r but x′ ∈ / SR,r . So the countable family of sets SR,r separates any two points of Ω that are not twins. It follows that the sets φ(SR,r ) ∈ A1 separate any two points of Ω1 . Claim 13.5 and Proposition A.4 in the Appendix complete the proof. Exercise 13.6. Show that for the set J and function W in Example 13.1, several different measures on J can yield the same—isomorphic—graphons. Exercise 13.7. Consider two graphons U = (Ω, A, π, W ) and U ′ = (Ω, A, π ′ , W ) which only differ in their probability measures. Prove that δ1 (U, U ′ ) ≤ 2dvar (π, π ′ ). [Hint: use Exercise 8.15.] Exercise 13.8. Suppose that a graphon (Ω, A, π, W ) is defined on a metric space (Ω, d), where A is the set of Borel sets, π is atom-free, and W is almost everywhere continuous. Suppose that the sequence Sn ⊆ J is well distributed in the sense that |Sn ∩ U |/|Sn | → π(U ) for every open set U . Then t(F, G(Sn , W )) → t(F, W ) for every simple graph F with probability 1.
13.2. Weak isomorphism III We give a characterization of weakly isomorphic kernels first in the twin-free case, and then use this to complete our arsenal of characterizations in general. Theorem 13.9. If two standard twin-free kernels are weakly isomorphic, then they are isomorphic up to a nullset. Proof. Let (J1 , W1 ) and (J2 , W2 ) be two weakly isomorphic twin-free kernels, where Ji = (Ωi , Ai , πi ) (i = 1, 2) are standard probability spaces. By Corollary 10.35, there is a third kernel (J, W ) (J = (Ω, A, φ)) and measure preserving maps φi : Ji → J such that Wi = W φi almost everywhere. Let Ω′i be the set of all elements u ∈ Ωi for which Wi (u, v) = W φi (u, v) for almost all v ∈ Ωi . By the definition of W and φi , we must have πi (Ωi \ Ω′i ) = 0. So we can delete the elements of Ωi \ Ω′i from Ωi for i = 1, 2. In other words, we may assume that for every u ∈ Ωi , Wi (u, v) = W φi (u, v) for almost all v ∈ Ωi . This implies that φi is injective. Indeed, the set φ−1 i (x) consists of twins in the kernel (Ji , W φi ), which remain twins in (Ji , Wi ). Since (Ji , Wi ) is twin-free, it follows that φ−1 i (x) has only one element. An injective measure preserving map from a standard probability space into another one is almost bijective: the set of points with no inverse image has measure ∗ 0 (Proposition A.4 in the Appendix). Hence Ω∗ = φ1 (Ω1 ) ∩ φ2 (Ω2 ), Ω∗1 = φ−1 1 (Ω ) −1 ∗ ∗ and Ω2 = φ2 (Ω ) have measure 1 in the corresponding graphons, and we can restrict these kernels to them. But then φ1 and φ2 are isomorphisms between these three kernels.
13.2. WEAK ISOMORPHISM III
221
This theorem adds a further characterization of weak isomorphism of standard kernels, formulated before. The following theorem summarizes these characterizations: Corollaries 10.34, 10.35 and 10.36 established the equivalence of (a) with (c), (e), (f) and (b); Proposition 13.3 and Theorem 13.9 imply the nontrivial part of (d). Theorem 13.10. For two standard kernels (J1 , W1 ) and (J2 , W2 ) the following are equivalent: (a) t(F, W1 ) = t(F, W2 ) for every simple graph F (i.e., (J1 , W1 ) and (J2 , W2 ) are weakly isomorphic); (b) t(F, W1 ) = t(F, W2 ) for every loopless multigraph F ; (c) δN (W1 , W2 ) = 0 for every (or just for one) smooth invariant norm; (d) there exist a standard kernel (J0 , W0 ) and measure preserving maps φi : J0 → Ji such that Wi = W0φi almost everywhere; (e) there exist a standard kernel (J0 , W0 ) and measure preserving maps φi : Ji → J0 such that Wiφi = W0 almost everywhere; (f) there exists a coupling measure µ between J1 and J2 such that W1 (X1 , Y1 ) = W2 (X2 , Y2 ) for almost all pairs (X1 , X2 ) and (Y1 , Y2 ) selected independently from the distribution µ. There are (at least) four quite different ways to prove Theorem 13.10 (not counting trivial differences like the order in which various conditions are proved). • The main steps in the route followed here was to establish the equivalence of (a) and (c) (which was proved by Borgs, Chayes, Lov´asz, S´os and Vesztergombi [2008]) and then use Theorem 8.13 (due to Bollob´as and Riordan [2009]). • The original proof by Borgs, Chayes and Lov´asz [2010] is more direct but a lot longer; it is built on the natural idea to bring every kernel to a “canonical form”, so that weakly isomorphic kernels would have identical canonical forms. In the case of functions in a single variable, a canonical form that works in many situations is “monotonization” (see the Monotone Reordering Theorem A.19 in the Appendix). For kernels there does not seem to exist such a canonical form, but one can construct, for every kernel W , a “canonical ensemble”: a probability distribution πW on the set of kernels such that two kernels U and W are weakly isomorphic if and only if the distributions πU and πW are identical. • Diaconis and Janson [2008] showed that theorem 13.10 also follows from results of Kallenberg [2005] in the theory of exchangeable random variables [2005]. • The proof of Janson [2010] is based on the idea of pure kernels (see Section 13.3) and Theorem 13.9. Exercise 13.11. Let J = (Ω, A, π) be a probability space and W : Ω × Ω → R, a symmetric function measurable with respect to the completion of A × A. Show that W can be changed on a set of measure 0 so that it becomes measurable with respect to A × A. Exercise 13.12. Let (J, W ) be a kernel on a (non-standard) probability space J = (Ω, A, π). Show that there is a countably generated σ-algebra A0 ⊆ A on Ω such that W is measurable with respect to the completion of A0 × A0 . Exercise 13.13. Let (J, W ) be a kernel on a (non-standard) countably generated separating probability space J = (Ω, A, π). Show that there is a standard probability space J0 = (Ω0 , A0 , π0 ), a kernel (J0 , W0 ), and an injective measure preserving map φ : J → J0 such that W = W0φ almost everywhere.
222
13. ON THE STRUCTURE OF GRAPHONS
Exercise 13.14. Consider two probability spaces (Ω, A, π) and (Ω′ , A′ , π ′ ), a kernel (Ω, A, π, W ), and a measure preserving map φ : Ω → Ω′ (note: it goes in the opposite direction than in the definition of the pullback!). (a) Show that one can construct a “push-forward” kernel (Ω′ , A′ , π ′ , Wφ such that ( ) (Wφ )φ = E W | φ−1 (A′ ) × φ−1 (A′ ) . (b) show by an example that t(F, Wφ ) = t(F, W ) does not hold in general.
13.3. Pure kernels In the spirit of classical analysis, there is no room to further standardize a kernel: We have seen that every kernel is equivalent to a twin-free kernel, and two weakly isomorphic twin-free kernels are isomorphic up to a nullset, and usually we don’t care about nullsets. But it turns out that cleaning up these nullsets is worth the trouble. 13.3.1. Purifying kernels. We introduce a distance notion on the points of a kernel. Let (J, W ) be a kernel. We can endow the space J with the distance function ∫ rW (x, y) = ∥W (x, .) − W (y, .)∥1 = |W (x, z) − W (y, z)| dz. J
This function is defined for almost all pairs x, y; we can delete those points from J where W (x, .) ∈ / L1 (J) (a set of measure 0), to have rW defined on all pairs. It is clear that rW is a pseudometric (it is symmetric and satisfies the triangle inequality). We call rW the neighborhood distance on W . Example 13.15 (Stepfunctions). For stepfunctions, the underlying metric space is finite. Example 13.16 (Spherical distance). Let S d denote the unit sphere in Rd+1 , consider the uniform probability measure on it, and let W (x, y) = 1 if x · y ≥ 0 and W (x, y) = 0 otherwise. Then (S d , W ) is a graphon, in which the neighborhood distance of two points a, b ∈ S d is just their spherical distance (normalized by dividing by π). Example 13.17. Let (M, d) be a metric space, and let π be a Borel probability measure on M . Then d can be viewed as a kernel on (M, d). For x, y ∈ M , we have ∫ ∫ rd (x, y) = |d(x, z) − d(y, z)| dπ(z) ≤ d(x, y) dπ(z) = d(x, y), M
M
so the identity map (M, d) → (M, rd ) is contractive. This implies that if (M, d) is compact, and/or finite dimensional (in many senses of dimension), then so is (M, rd ). For most ”everyday” metric (spaces )(like segments, spheres, or balls) rd (x, y) can be bounded from below by Ω d(x, y) , in which case (M, d) and (M, rd ) are homeomorphic. ( More)generally, if F : [0, 1] → R is a continuous function, then W (x, y) = F d(x, y) defines a kernel, and the identity map (M, d) → (M, rW ) is continuous. A kernel (J, W ) is pure if (J, rW ) is a complete separable metric space and the probability measure has full support (i.e., every open set has positive measure).
13.3. PURE KERNELS
223
This definition includes that rW (x, y) is defined for all x, y ∈ J and rW (x, y) > 0 if x ̸= y, i.e., the kernel has no twins. Theorem 13.18. Every twin-free kernel is isomorphic, up to a null set, to a pure kernel. Proof. Let (J, W ) be a twin-free kernel. Let T be the set of functions f ∈ L1 (J) such that for every L1 -neighborhood U of f , the set {x ∈ J : W (x, .) ∈ U } has positive measure. Clearly T is a closed subset of L1 (J), and it is complete and separable in the L1 -metric. Let J ′ be the set of points in J for which W (x, .) ∈ T , and let T ′ = {W (x, .) : x ∈ J ′ }. The map φ : J ′ → T ′ defined by x 7→ W (x, .) is bijective, since (J, W ) is twin-free. The set T inherits a probability measure π ′ = π ◦ φ−1 from J. It is easy to see from the construction that J ′ and T ′ are measurable. We claim that π(J \ J ′ ) = 0.
(13.3)
It is clear that for almost all x ∈ J, W (x, .) ∈ L1 (J). Every function g ∈ L1 (J) \ T has an open ∪ neighborhood Ug in L1 (J) such that π{x ∈ J : W (x, .) ∈ Ug } = 0. Let U = g∈T / Ug . Since L1 (J) is separable, U equals the union of some countable subfamily {Ugi : i ∈ N} and thus π{x ∈ J : W (x, .) ∈ U } = 0. Since J \ J ′ ⊆ U , this proves (13.3). The functions W (x, .) (x ∈ J ′ ) are everywhere dense in T and have measure 1. So T is a complete separable metric space with a probability measure on its Borel sets. It also follows from the definition of T that every open set has positive measure, and (13.3) implies that π ′ (T \ T ′ ) = 0. We define a kernel W ′ : T × T → [0, 1] as follows. Let f, g ∈ T . If f ∈ T ′ , then f = W (x, .) for some x ∈ J, and we define W ′ (f, g) = g(x). Similarly, if g ∈ T ′ , then g = W (., y) and we define W ′ (f, g) = f (y). Note that if both f, g ∈ T ′ , then this definition is consistent: W ′ (f, g) = f (y) = g(x) = W (x, y). If f, g ∈ / T ′ , then ′ we define W (f, g) = 0. We note that f and g are determined up to a zero set only; we can choose any function representing them, and since we are changing W on a set of measure 0 only, it remains measurable. The kernel (T, W ′ ) is pure; indeed, we just have to check that rW ′ coincides with the L1 metric on T ; then T will have all the right properties. For f, g ∈ T , we have ∫ ∫ rW ′ (f, g) = |W ′ (f, y) − W ′ (g, y)| dπ ′ (y) = |W ′ (f, y) − W ′ (g, y)| dπ ′ (y) ′ T ∫T ∫ = |f (y) − g(y)| dπ(y) = |f (y) − g(y)| dπ(y) = ∥f − g∥1 . J′
J ′
The kernels (J, W ) and (T, W ) are isomorphic up to a 0-set; indeed, we can get the kernel (J ′ , W |J ′ ) from (J, W ) and the kernel (T ′ , W ′ |T ′ ) from the kernel (T, W ′ ) by deleting appropriate 0-sets, and φ is an isomorphism between (J ′ , W |J ′ ) and (T ′ , W ′ |T ′ ). This proves the theorem. There is still some freedom left: given a pure kernel (J, W ), we can change the value of W on a symmetric subset of J × J that intersects every fiber J × {v} in a set of measure 0. We can take the integral of W (which is a measure ω on J × J), and then the derivative of ω with respect to π × π wherever this exists. This way we get back W almost everywhere, and a well defined value for some further points.
224
13. ON THE STRUCTURE OF GRAPHONS
After all these changes, where W is left undefined is the set of “essential discontinuities” of W (of measure 0). It would be interesting to relate this set to combinatorial properties of W . 13.3.2. Density functions on pure kernels. Now we come to the utilization of our work with purifying kernels. The following technical lemma will be very useful in the study of homomorphism densities on pure graphons. Lemma 13.19. Let (J, W ) be a pure graphon and let F = (V, E) be a k-labeled multigraph with nonadjacent labeled nodes. Then |tx (F, W ) − tx (F, W )| ≤ e(F ) max rW (xu , x′u ) u∈[k]
′
for all x, x ∈ J . k
It follows that the functions t are Lipschitz (and hence continuous). Proof. Let E = {u1 v1 , . . . um vm }, where we may assume that vi is unlabeled. For each v ∈ V \ [k], let yv = xv = x′v be a variable. Then, using a telescoping sum, ∫ tx (F, W ) − tx′ (F, W ) =
=
m ∑
∫
J V \[k]
∏
m ∏
∫ W (xui , xvi ) dy −
i=1
J V \[k]
m ∏
W (x′ui , x′vi ) dy
i=1
( )∏ W (xui , xvi ) W (xuj , xvj ) − W (x′uj , x′vj ) W (x′ui , x′vi ) dy
j=1 V \[k] ii
and hence |tx (F, W ) − tx′ (F, W )| ≤
∫ m ∑
|W (xuj , xvj ) − W (x′uj , x′vj )| dy.
j=1 V \[k] J
By the assumption that vi is unlabeled, we have xvj = x′vj for every j, and so ∫ m ∑ |tx (F, W ) − tx′ (F, W )| ≤ |W (xuj , xvj ) − W (x′uj , xvj )| dy ≤
j=1 V \[k] J m ∑
rW (xuj , x′uj ) ≤ e(F ) max rW (xu , x′u ),
j=1
u∈[k]
which proves the assertion.
Corollary 13.20. Let (J, W ) be a pure kernel, and let F = (V, E) be a k-labeled graph with nonadjacent labeled nodes. Then tx (F, W ) is a continuous function of x ∈ J S with respect to the metric rW . In the case when F is a path of length 2, we get a corollary that will be important in the next section. Corollary 13.21. For every pure kernel (J, W ), W ◦ W is a continuous function (in two variables) on the metric space (J, rW ). Most applications of Corollary 13.20 use the following consequence:
13.4. THE TOPOLOGY OF A GRAPHON
225
Corollary 13.22. Let (J, W ) be a pure kernel, and let F1 , . . . , Fm be (k + n)labeled multigraphs with nonadjacent labeled nodes. Let a1 , . . . , am be real numbers and x ∈ J k , such that the equation (13.4)
m ∑
ai tx,y (Fi , W ) = 0
i=1
holds for almost all y ∈ J k . Then it holds for all y ∈ J k . Proof. By Corollary 13.20, the left side of (13.4) is a continuous function of (x, y), and so it remains a continuous function of y if we fix x. Hence the set where it is not 0 is an open subset of J k . Since the graphon is pure, it follows that this set is either empty or has positive measure. Going to pure graphons is a good proof method, which can lead to nontrivial results. We illustrate this by discussing properties of t(., W ) (W ∈ W) from the point of view of Section 6.3.2. This parameter is multiplicative and reflection positive. If W is not a stepfunction, then t(., W ) has no contractor (else, 6.30 would imply that it is a homomorphism function). On the other hand, it is contractible. We prove this in a more general form. Let F be a k-labeled multigraph, and let P = {S1 , . . . , Sm } be a partition of [k]. We say that P is legitimate for F , if each set Si is stable in F . If this is the case, then the m-labeled multigraph F/P (obtained by identifying the nodes in each Si , and labeling the obtained node with i) has no loops. For a k-labeled quantum graph g, we say that the partition P of [k] is legitimate for g if it is legitimate for every constituent. Then we can define g/P by linear extension. Proposition 13.23. Let g be a k-labeled quantum graph and P, a legitimate partition for g. Let W ∈ W, and suppose that tx (g, W ) = 0 almost everywhere on [0, 1]k . Then ty (g/P, W ) = 0 for almost all y ∈ [0, 1]|P| . Proof. We may assume that W is pure. If tx (g, W ) = 0 almost everywhere, then it holds everywhere by Corollary 13.20. In particular, it holds for every substitution where the variables corresponding to the same class of P are identified, which means that ty (g/P, W ) is identically 0 on [0, 1]|P| . Corollary 13.24. The multigraph parameter t(., W ) is contractible for every kernel W. Exercise 13.25. Show that using properties of pure kernels, one gets a very short proof of the statement of Exercise 7.6.
13.4. The topology of a graphon We have seen that every graphon is weakly isomorphic to a pure graphon (which is unique up to changing the function on special nullsets), which has an underlying complete metric space J. So we could ask: what topological properties does J have for special graphons, and are they related to the combinatorial properties of graph sequences converging to W ? It turns out that these questions are more interesting if we ask them for a different, but related topology on the underlying set, and this is what we are going to introduce now.
226
13. ON THE STRUCTURE OF GRAPHONS
13.4.1. The similarity distance. It was noted by Lov´asz and Szegedy [2007, 2010b] that for a pure graphon (J, W ), the distance function rW = rW ◦W defined by the operator square of W is also closely related to combinatorial properties of a graphon. We call this the similarity distance. In the special case of finite graphs, this notion was defined in the Introduction, where the motivation for its name was also explained. We will use the graph version to design algorithms in Section 15.4.1. In explicit terms, we have ∫ ∫ ∫ rW (a, b) = rW ◦W (a, b) = W (a, y)W (y, x) dy − W (b, y)W (y, x) dy dx (13.5)
J J J ∫ ∫ ( ) = W (x, y) W (a, y) − W (b, y) dy dx . J
J
(We write here and in the sequel dx instead of dπ(x), where π is the probability measure of the graphon.) Lemma 13.26. If (J, W ) is a pure graphon, then the similarity distance rW is a metric. Proof. The only nontrivial part of this lemma is that rW (a, b) = 0 implies that a = b. The condition rW (a, b) = 0 implies that for almost all x ∈ J we have ∫ ( ) W (x, y) W (a, y) − W (b, y) dy = 0. J
Using that (J, W ) is pure, Corollary 13.22 implies that this holds for every x ∈ J. In particular, it holds for x = a and x = b. Substituting these values and taking the difference, we get that ∫ ( )2 W (a, y) − W (b, y) dy = 0, J
and hence W (a, y) = W (b, y) for almost all z. Using again that (J, W ) is pure, we conclude that a = b. So (J, rW ) is a metric space, and hence Hausdorff. We have to be careful though, since the metric space (J, rW ) is not necessarily complete. We will work with its completion (J, rW ), but first we define it differently. Let us say that a sequence of points xn ∈ J is weakly convergent if ∫ ∫ W (xn , y) dy → W (x, y) dy A
A
for every measurable set A ⊆ J. We call this the weak topology on J. (We need this name only temporarily, since we are going to show that rW gives a metrization of the weak topology.) It is well known that this topology is metrizable. Let J denote the completion of J in the weak topology. The map x 7→ W (x, .) embeds J into L1 (J), and weak convergence corresponds to weak* convergence of functions in L1 (J). Hence J corresponds to the weak* closure of J. It follows in particular that J is a compact separable metric space (compactness follows by Aleoglu’s Theorem, since J is a closed subset of the unit ball of L1 (J)).
13.4. THE TOPOLOGY OF A GRAPHON
227
We can extend the definition of W to J × J: for x ∈ J and y ∈ J \ J, we define W (x, y) = x(y) (recall that the elements of J can be identified with functions on J), and we define W (x, y) = 0 if x, y ∈ J \ J. We also extend the measure π by π(J \ J) = 0. Theorem 13.27. For any pure graphon, the metric rW defines exactly the weak topology on J. Proof. First we show that the weak topology is finer than the topology of (J, rW ). Suppose that xn → x in the weak topology, this means that the functions W (xn , .) converge weakly to W (x, .). Consider ∫ ∫ ) ( rW (xn , x) = W (xn , y) − W (x, y) W (y, z) dy dz. J
J
Here the inner integral tends to 0 for every z, by the weak convergence xn → x. Since it also remains bounded, it follows that the outer integral tends to 0. This implies that xn → x in (J, rW ). (Let us note that since π(J \ J) = 0, it does not matter whether we integrate over J or over J.) From here, the equality of the two topologies follows by general arguments: the weak topology on J is compact, and the coarser topology of rW is Hausdorff, which implies that they are the same. Corollary 13.28. For every pure graphon (J, W ), the space (J, rW ) is compact. Another useful corollary of these considerations concerns continuity in the similarity metric. We have seen that W ◦ W is continuous in the metric rW ; one might hope that the function W is continuous as a function on (J, rW ), but this would be too much to ask for (the half-graphon is an easy example). However, integrating out one of the variables we get a continuous function. To be more precise (and more general): Corollary 13.29. For every pure graphon (J, W ), and every function g ∈ L1 (J), the function ∫ (TW g)(.) = W (., y)g(y) dy J
is continuous on (J, rW ). In particular, it follows that every eigenfunction of TW is continuous on (J, rW ). Proof. Let xn → x in the rW metric. Then W (xn , .) → W (x, .) in the weak topology by Theorem 13.27, and hence ∫ ∫ W (Xn , y)g(y) dy → W (x, y)g(y) dy. J
J
We conclude this section with an example in which the topologies defined by rW and rW are different. Note that for any two points x, y ∈ J, we have (13.6)
rW (x, y) ≤ rW (x, y),
which implies that the topology of (J, rW ) is finer than the topology of (J, rW ). The two topologies may be different. Graphons for which the finer space (J, rW ) is also compact seem to have special importance in combinatorics.
228
13. ON THE STRUCTURE OF GRAPHONS
Example 13.30. For y ∈ [0, 1), let y = 0.y1 y2 . . . be the binary expansion of y. Let us decompose [0, 1) into the intervals Ik = [1 − 2−k , 1 − 2−k−1 ). Define U (x, y) = yk for 0 ≤ y ≤ 1 and x ∈ Ik . Define W (1, y) = 1/2 for all y. This function is not symmetric, so we put it together with a reflected copy to get a graphon: U (2x, 2y − 1), if x ≤ 1/2 and y ≥ 1/2, W (x, y) = U (2y, 2x − 1), if x ≥ 1/2 and y ≤ 1/2, 0, otherwise. (This is rather difficult to parse, but it is an important example. Perhaps Figure 13.1 helps.) Selecting one point from each interval [1 − 2−k , 1 − 2−k−1 ), we get an infinite number of points in [0, 1) mutually at rW -distance 1/4; so this sequence is not convergent even in the completion of this graphon. (In particular, (J, rW ) is not compact.) On the other hand, this same sequence converges in (J, rW ). So the two topologies are different.
Figure 13.1. A graphon defined on a space whose completion in rW is not compact. The picture on the left shows just one half. 13.4.2. Similarity distance and regularity partitions. Now we come to the results that first motivated the introduction of a second distance notion defined by a graphon. Let (J, d) be a metric space and let π be a probability measure on its Borel ∫ sets. We say that a set S ⊆ J is an average ε-net, if J d(x, S) dπ(x) ≤ ε. Let S ⊆ J be a finite set and s ∈ S. The Voronoi cell of S with center s is the set of all points x ∈ J for which d(x, s) ≤ d(x, y) for all y ∈ S. Clearly, the Voronoi cells of S cover J. (We can break ties arbitrarily to get a partition.) Theorem 13.31. Let (J, W ) be a pure graphon, and ε > 0. (a) Let S be an average ε-net in the metric space (S, rW ). Then √ the Voronoi cells of S form a weak regularity partition P with error at most 8 ε. (b) Let P = {J1 , . . . , Jk } be a weak regularity partition with error ε. Then there are points vi ∈ Ji such that the set S = {v1 , . . . , vk } is an average (4ε)-net in the metric space (S, rW ). Proof. (a) Let P be the partition into the√Voronoi cells of S. Let us write R = W − WP . We want to show that ∥R∥ ≤ 8 ε. It suffices to show that for any 0-1 valued function f , √ (13.7) ⟨f, Rf ⟩ ≤ 2 ε.
13.4. THE TOPOLOGY OF A GRAPHON
229
Let us write g = f − fP , where fP (x) is obtained by replacing f (x) by the average of f over the class of P containing x. Clearly ⟨fP , RfP ⟩ = 0, and so (13.8)
⟨f, Rf ⟩ = ⟨g, Rf ⟩ + ⟨fP , Rf ⟩ = ⟨f, Rg⟩ + ⟨fP , Rg⟩ ≤ 2∥Rg∥1 ≤ 2∥Rg∥2 .
For each x ∈ J, let (φ(x) ∈ )S be the center of the Voronoi ( cell )containing x, and define W ′ (x, y) = W x, φ(y) and similarly R′ (x, y) = R x, φ(y) . Then using that (W − R)g = WP g = 0, W − W ′ = R − R′ and R′ g = 0, we get ∥Rg∥22 = ⟨Rg, Rg⟩ = ⟨W g, (R − R′ )g⟩ = ⟨W g, (W − W ′ )g⟩ = ⟨g, W (W − W ′ )g⟩ ∫ ∫ ( ) ′ ≤ ∥W (W − W )∥1 = W (x, y) W (y, z) − W (y, φ(z)) dy dx dz ∫ =
J2 J
( ) ( ) rW z, φ(z) = Ex (rW x, S) ≤ ε.
J
This proves (13.7). (b) Suppose that P is a weak regularity partition with error ε. Let R = W −WP , then we know that ∥R∥ ≤ ε. For every x ∈ [0, 1], define ∫ ∫ ∫ F (x) = R(x, y)W (y, z) dy dz = s(x, z)R(x, y)W (y, z) dy dz, J
J
J2
∫
where s(x, z) is the sign of R(x, y)W (y, z) dy. Lemma 8.10 implies that for every z ∈ J, ∫ s(x, z)R(x, y)W (y, z) dx dy ≤ 2∥R∥ ≤ 2ε, J2
and hence
∫ F (x) dx ≤ 2ε.
(13.9) J
Let $x,y\in J$ be two points in the same partition class of $\mathcal P$. Then $W_{\mathcal P}(x,s)=W_{\mathcal P}(y,s)$ for every $s\in J$, and hence
(13.10)
$$r_W(x,y)=\int_J\Big|\int_J\big(W(x,s)-W(y,s)\big)W(s,z)\,ds\Big|\,dz=\int_J\Big|\int_J\big(R(x,s)-R(y,s)\big)W(s,z)\,ds\Big|\,dz$$
$$\le\int_J\Big|\int_J R(x,s)W(s,z)\,ds\Big|\,dz+\int_J\Big|\int_J R(y,s)W(s,z)\,ds\Big|\,dz=F(x)+F(y).$$
For every set $T\in\mathcal P$, let $v_T\in T$ be a point "below average" in the sense that
$$F(v_T)\le\frac{1}{\pi(T)}\int_T F(x)\,dx,$$
and let $S=\{v_T:\,T\in\mathcal P\}$. Then using (13.9),
$$\mathbb{E}_x\big(r_W(x,S)\big)\le\sum_{T\in\mathcal P}\int_T r_W(x,v_T)\,dx\le\sum_{T\in\mathcal P}\int_T\big(F(x)+F(v_T)\big)\,dx$$
$$\le\int_J F(x)\,dx+\sum_{T\in\mathcal P}\pi(T)F(v_T)\le 2\int_J F(x)\,dx\le 4\varepsilon.$$
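To see the construction of part (a) in action, here is an illustrative sketch (our construction, not the book's): we discretize a stepfunction graphon, compute the similarity distances of (13.10) in their discrete form, greedily grow an average $\varepsilon$-net, and take its Voronoi cells; for a stepfunction, the cells essentially recover the steps. The farthest-point heuristic and all parameters are ad hoc choices.

```python
import numpy as np

def similarity_dists(W):
    """Discrete analogue of (13.10):
    d[x, y] = (1/n) sum_z | (1/n) sum_s (W[x,s] - W[y,s]) W[s,z] |."""
    n = W.shape[0]
    P = W @ W / n
    return np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2) / n

# toy graphon: a 3-step stepfunction, viewed as an n x n matrix
rng = np.random.default_rng(1)
n, steps = 120, 3
labels = rng.integers(0, steps, size=n)
B = rng.random((steps, steps)); B = (B + B.T) / 2
W = B[labels][:, labels]

D = similarity_dists(W)
eps, centers = 0.05, [0]
while D[:, centers].min(axis=1).mean() > eps:      # grow an average eps-net
    centers.append(int(D[:, centers].min(axis=1).argmax()))
cells = np.argmin(D[:, centers], axis=1)           # Voronoi cells of the net
print(len(centers), np.bincount(cells))            # few cells; they recover the steps
```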
By the Weak Regularity Lemma, we get that every graphon has an average $\varepsilon$-net of size $2^{O(1/\varepsilon^2)}$. How about a "true" $\varepsilon$-net? By a standard trick, if we take a maximal set $R$ of points such that any two are at a distance at least $\varepsilon$, then every point is at a distance of at most $\varepsilon$ from $R$. But is such a set necessarily finite? And if so, how can we bound its size? It turns out that one can give a bound that is similar to the bound on the size of an average $\varepsilon$-net derived from Theorem 13.31. The following result is due to Alon [unpublished].

Proposition 13.32. Let $(J,W)$ be a graphon and let $R\subseteq J$ be a set such that $r_W(s,t)\ge\varepsilon$ for all $s,t\in R$ ($s\ne t$). Then $|R|\le(16/\varepsilon^2)^{257/\varepsilon^2}$.

The bound on the size of $R$ is somewhat worse than for the average $\varepsilon$-net, but the main point is that it depends on $\varepsilon$ only. There are examples showing that an exponential dependence on $1/\varepsilon$ is unavoidable (Exercise 13.41).

Proof. Consider the Frieze–Kannan decomposition of $W$ provided by Lemma 9.14:
(13.11) $W=\sum_{i=1}^k a_i\mathbf 1_{S_i\times T_i}+U,$
where $k=\lceil 256/\varepsilon^2\rceil$, the sets $S_i,T_i\subseteq J$ are measurable, $\sum_i a_i^2\le 4$, and $\|U\|_\square\le 4/\sqrt k\le\varepsilon/4$. For $s,t\in R$, we have
$$r_W(s,t)=\int_J\Big|\int_J\big(W(s,y)-W(t,y)\big)W(y,z)\,dy\Big|\,dz=\int_{J\times J}\sigma_{st}(z)\big(W(s,y)-W(t,y)\big)W(y,z)\,dy\,dz,$$
where $\sigma_{st}(z)$ is the sign of $\int\big(W(s,y)-W(t,y)\big)W(y,z)\,dy$. Substituting from (13.11) for the last occurrence of $W$, we get
(13.12)
$$r_W(s,t)=\sum_{i=1}^k a_i\int_{J\times J}\sigma_{st}(z)\big(W(s,y)-W(t,y)\big)\mathbf 1_{S_i\times T_i}(y,z)\,dy\,dz+\int_{J\times J}\sigma_{st}(z)\big(W(s,y)-W(t,y)\big)U(y,z)\,dy\,dz.$$
Here the last term is small by Lemma 8.10 and the choice of $U$:
$$\Big|\int_{J\times J}\sigma_{st}(z)\big(W(s,y)-W(t,y)\big)U(y,z)\,dy\,dz\Big|\le 2\|U\|_\square\le\frac\varepsilon2.$$
To bound a term from the first sum, we do a little computation:
$$\int_{T_i}\int_{S_i}\sigma_{st}(z)\big(W(s,y)-W(t,y)\big)\,dy\,dz=\Big(\int_{T_i}\sigma_{st}(z)\,dz\Big)\Big(\int_{S_i}\big(W(s,y)-W(t,y)\big)\,dy\Big),$$
and hence, with $f_i(s)=\int_{S_i}W(s,y)\,dy$, we have
$$\Big|\int_{T_i}\int_{S_i}\sigma_{st}(z)\big(W(s,y)-W(t,y)\big)\,dy\,dz\Big|\le\Big|\int_{S_i}\big(W(s,y)-W(t,y)\big)\,dy\Big|=|f_i(s)-f_i(t)|.$$
Now if $|R|>(16/\varepsilon^2)^{257/\varepsilon^2}>(4\sqrt k/\varepsilon)^k$, then there is a pair of points $s,t\in R$ such that $|f_i(s)-f_i(t)|\le\varepsilon/(4\sqrt k)$ for all $i$, and for this choice of $s$ and $t$ we have
$$\Big|\sum_{i=1}^k a_i\int_{J\times J}\sigma_{st}(z)\big(W(s,y)-W(t,y)\big)\mathbf 1_{S_i\times T_i}(y,z)\,dy\,dz\Big|\le\sum_{i=1}^k|a_i|\,\frac{\varepsilon}{4\sqrt k}\le\frac\varepsilon2.$$
(In the last step we used that $\sum_i a_i^2\le 4$ and the inequality between the arithmetic and quadratic means, which give $\sum_i|a_i|\le\sqrt{k\sum_i a_i^2}\le 2\sqrt k$.) By (13.12) this implies that $r_W(s,t)<\varepsilon$, a contradiction.

13.4.3. Finite dimensional graphons. The main reason to be interested in this topology is the following consequence of Theorem 13.31. We define the (upper) Minkowski dimension of a metric space $(M,d)$ as
$$\limsup_{\varepsilon\to0}\frac{\log N(\varepsilon)}{\log(1/\varepsilon)},$$
where $N(\varepsilon)$ is the maximum number of points in $M$ mutually at distance at least $\varepsilon$. This dimension is finite if and only if there is a $d\ge0$ such that every set of points mutually at distance at least $\varepsilon$ has at most $\varepsilon^{-d}$ elements.

Corollary 13.33. If $(J,W)$ is a graphon for which the space $(J,r_W)$ has finite Minkowski dimension $d$, then for every $\varepsilon>0$ the graphon has a weak regularity partition with $O\big((1/\varepsilon)^d\big)$ classes.

Which graphons are finite dimensional in this sense? Almost all of those we have met so far are 1- or at most 2-dimensional (see Exercise 13.44). We will formulate some conjectures later, in Section 16.7.1; right now we describe an interesting combinatorially defined class with this property.

We say that a graphon $W$ misses a signed graph $F$ if $t(F,W)=0$. Trivially, if $W$ misses $F$, then the complementary graphon $1-W$ misses the signed graph $F^-$ obtained from $F$ by negating the signs of the edges. We will only consider bipartite graphs $F$; the extension of these results to non-bipartite graphs is open.

Graphons missing a signed bipartite graph can be characterized in terms of the Vapnik–Chervonenkis dimension (see Appendix A.6 for basic information about the VC-dimension). This characterization is not much more than a reformulation of the definition, but useful nonetheless.
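As an illustration of the Minkowski dimension (ours, not the book's): for $W(x,y)=xy$ the neighborhood distance has the closed form $r_W(x,y)=|x-y|/2$, so a greedy $\varepsilon$-separated set gives a lower bound on $N(\varepsilon)$, and the ratio $\log N(\varepsilon)/\log(1/\varepsilon)$ approaches 1, matching the fact that this graphon is 1-dimensional. The grid size and scales below are arbitrary choices.

```python
import numpy as np

def separated_set(D, eps):
    """Greedily pick points pairwise at distance >= eps; the size lower-bounds N(eps)."""
    chosen = []
    for i in range(D.shape[0]):
        if all(D[i, j] >= eps for j in chosen):
            chosen.append(i)
    return chosen

# For W(x, y) = xy: r_W(x, y) = int_0^1 |x - y| s ds = |x - y| / 2.
n = 400
x = (np.arange(n) + 0.5) / n
D = np.abs(x[:, None] - x[None, :]) / 2

for eps in (0.1, 0.05, 0.025, 0.0125):
    N = len(separated_set(D, eps))
    print(eps, N, round(np.log(N) / np.log(1 / eps), 2))   # ratio creeps toward 1
```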
Proposition 13.34. A pure graphon $(J,W)$ misses some signed bipartite graph with $k$ nodes in the smaller bipartition class if and only if $W$ is 0-1 valued almost everywhere and the VC-dimension of the family of neighborhoods $\mathcal R_W=\big\{\operatorname{supp}W(x,\cdot):\,x\in J\big\}$ is less than $k$.

Proof. First, suppose that $(J,W)$ misses a signed bipartite graph $F$ with bipartition $V_1\cup V_2$, where $V_1=[k]$ and $V_2=\{1',\dots,m'\}$. We start with showing that $W$ is 0-1 valued almost everywhere. Let $F^\bullet$ be obtained by labeling all nodes of $V_1$. Then for almost all $\mathbf x\in J^k$, we have $t_{\mathbf x}(F^\bullet,W)=0$. By Corollary 13.22, it follows that $t_{\mathbf x}(F^\bullet,W)=0$ for every $\mathbf x\in J^k$. In particular, $t_{z\dots z}(F^\bullet,W)=0$ for all $z\in J$. But for this substitution,
$$t_{z\dots z}(F^\bullet,W)=\int_{J^m}\prod_{j=1}^m W(z,y_j)^{d^+(j)}\big(1-W(z,y_j)\big)^{d^-(j)}\,dy_1\dots dy_m$$
(where $d^+(j)$ and $d^-(j)$ are the numbers of positive and negative edges of $F$ incident with $j$, respectively). If there is a $z\in J$ such that $0<W(y,z)<1$ for all $y\in Y$, where $Y$ has positive measure, then the part of the integral over $Y^m$ is already positive, so $t_{z\dots z}(F^\bullet,W)>0$, a contradiction.

Next, we show that the VC-dimension of $\mathcal R_W$ is less than $k$. Suppose not; then there is a set $S=\{x_1,\dots,x_k\}\subseteq J$ with $|S|=k$ such that the family $\mathcal H=\big\{\operatorname{supp}W(x,\cdot):\,x\in S\big\}$ is qualitatively independent (this means that for every $\mathcal H'\subseteq\mathcal H$ there is a point contained in all sets of $\mathcal H'$ but in no set of $\mathcal H\setminus\mathcal H'$). This implies that $t_{x_1\dots x_k}(F^\bullet,W)>0$. By the purity of $(J,W)$ and Corollary 13.22, the set of points $(y_1,\dots,y_k)\in J^k$ for which $t_{y_1\dots y_k}(F^\bullet,W)>0$ has positive measure. Hence $t(F,W)>0$.

Conversely, suppose that $W$ is 0-1 valued (we may assume everywhere), and $\dim_{VC}(\mathcal R_W)<k$. Let $F$ denote the signed complete bipartite graph with $k$ nodes in one class $U$ and $2^k$ nodes in the other class $U'$, in which each node in $U'$ is connected to a different set of nodes in $U$ by positive edges. Let $F^\bullet$ be obtained by labeling the nodes in $U$. Then any choice of $x_1,\dots,x_k$ for which $t_{x_1\dots x_k}(F^\bullet,W)>0$ gives $k$ points with qualitatively independent neighborhoods, which is impossible. So we must have $t(F,W)=0$.

Our main goal is to connect the VC-dimension of neighborhoods to the dimension of $J$. The following theorem was proved (in a slightly more general form) by Lovász and Szegedy [2010b].

Theorem 13.35. If a pure graphon $(J,W)$ misses some signed bipartite graph $F$, then (a) $W$ is 0-1 valued almost everywhere, (b) $(J,r_W)$ is compact, and (c) it has Minkowski dimension at most $10v(F)$.

Proof. (a) is just repeated from Proposition 13.34. To prove (b), we start with studying weakly convergent sequences of functions $W(x,\cdot)$. Let $(x_1,x_2,\dots)$ be a sequence of points in $J$ and suppose that there is a function $f\in L^1(J)$ such that
$$\int_S W(x_n,y)\,dy\longrightarrow\int_S f(y)\,dy$$
for every measurable set $S\subseteq J$.
Claim 13.36. The weak limit function $f$ is almost everywhere 0-1 valued.

Suppose not; then there is an $\varepsilon>0$ and a set $Y\subseteq J$ with positive measure such that $\varepsilon\le f(x)\le1-\varepsilon$ for $x\in Y$. Let $S_n=\operatorname{supp}\big(W(x_n,\cdot)\big)\cap Y$. We select, for every $k\ge1$, $k$ indices $n_1,\dots,n_k$ so that the Boolean algebra generated by $S_{n_1},\dots,S_{n_k}$ (as subsets of $Y$) has $2^k$ atoms of positive measure. If we have this for some $k$, then for every atom $A$ of the Boolean algebra
$$\lambda(A\cap S_n)=\int_A W(x_n,y)\,dy\longrightarrow\int_A f(y)\,dy\qquad(n\to\infty),$$
and so if $n$ is large enough, then
$$\frac\varepsilon2\,\lambda(A)\le\lambda(A\cap S_n)\le\Big(1-\frac\varepsilon2\Big)\lambda(A).$$
If $n$ is large enough, then this holds for all atoms $A$, and so $S_n$ cuts every previous atom into two sets with positive measure, and we can choose $n_{k+1}=n$. But this means that the VC-dimension of the supports of the $W(x,\cdot)$ is infinite, contradicting Proposition 13.34. This proves Claim 13.36.

Claim 13.37. The convergence $W(x_n,\cdot)\to f$ also holds in $L^1$.

Indeed, we know that $f(x)\in\{0,1\}$ for almost all $x$, and hence
$$\|f-W(x_n,\cdot)\|_1=\int_{\{f=1\}}\big(1-W(x_n,y)\big)\,dy+\int_{\{f=0\}}W(x_n,y)\,dy\longrightarrow0.$$

Now it is easy to prove that $(J,r_W)$ is compact. Consider any infinite sequence $(x_1,x_2,\dots)$ of points of $J$. By Alaoglu's Theorem, this has a subsequence for which the functions $W(x_n,\cdot)$ converge weakly to a function $f\in L^1(J)$. By Claim 13.37, they converge to $f$ in $L^1$. This implies that they form a Cauchy sequence in $L^1$, and so $(x_1,x_2,\dots)$ is a Cauchy sequence in $(J,r_W)$. Since $(J,r_W)$ is a complete metric space, this sequence has a limit in $J$.

To prove (c), let $F$ be a signed bipartite graph such that $t(F,W)=0$, and let $(V_1,V_2)$ be a bipartition of $F$ with $|V_1|=k$, where we may assume that $k\le v(F)/2$. We may assume that $F$ is complete bipartite, since adding edges (with any signs) does not change the condition that $t(F,W)=0$. Let $F^\bullet$ be obtained from $F$ by labeling the nodes in $V_1$. We want to show that the Minkowski dimension of $(J,r_W)$ is at most $20k$. It suffices to show that every finite set $Z\subseteq J$ such that the $r_W$-distance of any two elements is at least $\varepsilon$ is bounded by $|Z|\le c(k)\varepsilon^{-20k}$. Let $\mathcal H=\big\{\operatorname{supp}W(x,\cdot):\,x\in Z\big\}$. Since $W$ is 0-1 valued, the condition on $Z$ means that
(13.13) $\pi(X\triangle Y)\ge\varepsilon$
for any two distinct sets $X,Y\in\mathcal H$.

We do a little clean-up: Let $A$ be the union of all atoms of the set algebra generated by $\mathcal H$ that have measure 0. Clearly $A$ itself has measure 0, and hence the family $\mathcal H'=\{X\setminus A:\,X\in\mathcal H\}$ still has property (13.13). We claim that $\mathcal H'$ has VC-dimension less than $k$. Indeed, suppose that $J\setminus A$ contains a shattered $k$-set $S$. To each $j\in V_1$, we assign a point $q_j\in S$ bijectively. To each $i\in V_2$, we assign a point $p_i\in Z$ such that $q_j\in\operatorname{supp}W(p_i,\cdot)$ if and only if $ij\in E^+$. (This is possible since $S$ is shattered.) Now fixing the $p_i$, for each $j$ there
is a subset of $J$ of positive measure whose points are contained in exactly the same members of $\mathcal H'$ as $q_j$, since $q_j\notin A$. This means that the function $t_{x_1\dots x_k}(F^\bullet,W)$ is positive for $x_i=p_i$. Corollary 13.22 implies that $t_{x_1\dots x_k}(F^\bullet,W)>0$ for a positive fraction of the choices of $x_1,\dots,x_k\in J$, and hence $t(F,W)>0$, a contradiction.

Applying Proposition A.30 we conclude that $|Z|=|\mathcal H|\le(80k)^{10k}\varepsilon^{-20k}$. This proves that the Minkowski dimension of $(J,r_W)$ is bounded by $20k$.

The results in this section do not remain true if the signed graph we exclude is nonbipartite. For example, if we exclude any non-bipartite graph, then every bipartite graphon satisfies the condition, but some bipartite graphs are known to need an exponential (in $1/\varepsilon$) number of classes in their weak regularity partitions.

Exercise 13.38. Two metrics $d_1$ and $d_2$ on the same set are called uniformly equivalent if there is a function $f:\mathbb R_+\to\mathbb R_+$ such that $f(x)\searrow0$ if $x\searrow0$, $d_1(x,y)\le f\big(d_2(x,y)\big)$ and $d_2(x,y)\le f\big(d_1(x,y)\big)$. Prove that for a pure kernel $(J,W)$, the space $(J,r_W)$ is compact if and only if the metrics $r_W$ and $\overline r_W$ are uniformly equivalent.

Exercise 13.39. Figure out the completions of the spaces $([0,1],r_W)$ and $([0,1],\overline r_W)$ for the graphon $W$ in Example 13.30.

Exercise 13.40. For the graphon $(S^d,W)$ defined in Example 13.16, show that the similarity distance of two points $a,b\in S^d$ is $\Omega(\angle(a,b)/\sqrt d)$.

Exercise 13.41. Show that the graphon in the previous exercise, with an appropriate choice of $d$, contains $2^{\Omega(1/\varepsilon^2)}$ points mutually at least $\varepsilon$ apart in the similarity distance.

Exercise 13.42. Let $W$ be a graphon such that $(J,r_W)$ can be covered by $m$ balls of radius $\varepsilon$. Prove that there exists a stepfunction $U$ with $m(1/\varepsilon)^m$ steps such that $\|W-U\|_1\le2\varepsilon$.

Exercise 13.43. Let $M(\varepsilon)$ denote the minimum number of sets of diameter at most $\varepsilon$ covering a metric space $(S,d)$, and define the covering dimension of $(S,d)$ by $\limsup_{\varepsilon\to0}\log M(\varepsilon)/\log(1/\varepsilon)$. Prove that this is the same as the Minkowski dimension.

Exercise 13.44. (a) Check that all graphons constructed in Section 11.4.2 are at most 2-dimensional. (b) Prove that a graphon $W$ on $[0,1]$ that is a continuous function is at most 1-dimensional. (c) Find the dimension of the graphon in Example 13.16. (d) Construct an infinite dimensional graphon.

Exercise 13.45. Let $W$ be a graphon such that $t(F,W)=0$ for a signed bipartite graph $F=(V,E)$. Prove that for every $0<\varepsilon<1$, there exists a 0-1 valued stepfunction $U$ with $O(\varepsilon^{-10v(F)^2})$ steps such that $\|W-U\|_1\le\varepsilon$.
13.5. Symmetries of graphons

An automorphism of a graphon $W$ on $[0,1]$ is an invertible measure preserving map $\sigma:[0,1]\to[0,1]$ such that $W^\sigma=W$ almost everywhere. Clearly, the automorphisms of $W$ form a group $\mathrm{Aut}(W)$. An example with many automorphisms is a stepfunction $W$: here $\mathrm{Aut}(W)$ contains the group of all invertible measure preserving transformations that leave the steps invariant, and it contains all the automorphisms of the corresponding weighted graph. Note, however, that if we purify a stepfunction, then we get a finite weighted graph, so the large and ugly subgroups consisting of measure preserving transformations of the steps disappear.

We can endow $\mathrm{Aut}(W)$ with the topology of pointwise convergence in the $r_W$ metric. Szegedy observed that if $W$ is pure, then $\mathrm{Aut}(W)$ is compact in this topology. This follows from the facts that every automorphism of a graphon is an
isometry of the compact metric space $(J,r_W)$ (this is trivial), and that those isometries that correspond to automorphisms form a closed subgroup (this takes some work to prove; see Lovász [Notes]). We will not go into the detailed study of $\mathrm{Aut}(W)$ in this book, even though it has interesting and nontrivial properties. We restrict our treatment to generalizing the easy direction of Theorem 6.36, and to an application of the results of this chapter to characterizing when $t(\cdot,W)$ has finite connection rank.

The group $\mathrm{Aut}(W)$ acts on $J^k$ for any $k$. The number of orbits of this action can be estimated from below as follows.

Proposition 13.46. The number of orbits of the automorphism group of $W$ on $[0,1]^k$ is at least $r(t(\cdot,W),k)$.

Proof. Suppose that $\mathrm{Aut}(W)$ has a finite set of orbits $O_1,\dots,O_m$ on $[0,1]^k$. Let $F$ and $F'$ be two $k$-labeled graphs. Then
$$t([[FF']],W)=\int_{[0,1]^k}t_{x_1\dots x_k}(F,W)\,t_{x_1\dots x_k}(F',W)\,dx_1\dots dx_k.$$
The functions $t_{x_1\dots x_k}(F,W)$ and $t_{x_1\dots x_k}(F',W)$ are constant on every orbit, and hence
$$t([[FF']],W)=\sum_{j=1}^m\lambda(O_j)\,t_{x_{j,1}\dots x_{j,k}}(F,W)\,t_{x_{j,1}\dots x_{j,k}}(F',W),$$
where $(x_{j,1},\dots,x_{j,k})$ is any representative point of $O_j$. This shows that $M(t(\cdot,W),k)$ is the sum of $m$ matrices of rank 1, and so it has rank at most $m$.

To be able to say something about the finiteness of the number of orbits of the automorphism group on $k$-tuples of points of a graphon, we need the following theorem.

Theorem 13.47. Let $W$ be a graphon such that $r(t(\cdot,W),2)$ is finite. Then $W$ is a stepfunction.

Proof. Write $f=t(\cdot,W)$. First we show that $T_W$ has finite rank. It is clear that a kernel $W$ has at most $m$ different nonzero eigenvalues if and only if there are real numbers $a_0,\dots,a_m$, not all 0, such that
(13.14) $\sum_{k=0}^m a_kW^{\circ(k+2)}=0$
almost everywhere (so that all eigenvalues of $W$ will be roots of the polynomial $\sum_k a_kx^{k+2}$). We claim that this is equivalent to requiring that
(13.15) $\sum_{k=0}^m a_k\big\langle W^{\circ(k+2)},W^{\circ(l+2)}\big\rangle=0\qquad(l=0,\dots,m).$
Indeed, (13.14) clearly implies (13.15) for every $l$; on the other hand, (13.15) implies that
(13.16) $\Big\langle\sum_{k=0}^m a_kW^{\circ(k+2)},\,\sum_{k=0}^m a_kW^{\circ(k+2)}\Big\rangle=0,$
which implies (13.14). Using (7.22), equation (13.15) can be rewritten as
(13.17) $\sum_{k=0}^m a_k\,t(C_{k+l+4},W)=0\qquad(l=0,\dots,m).$
This is a system of $m+1$ homogeneous linear equations in the $m+1$ variables $a_k$, which has a nonzero solution if and only if its determinant vanishes:
(13.18)
$$\begin{vmatrix} t(C_4,W) & t(C_5,W) & \dots & t(C_{m+4},W)\\ t(C_5,W) & t(C_6,W) & \dots & t(C_{m+5},W)\\ \vdots & \vdots & & \vdots\\ t(C_{m+4},W) & t(C_{m+5},W) & \dots & t(C_{2m+4},W) \end{vmatrix}=0.$$
Since this matrix is a submatrix of $M(f,2)$, this determinant will certainly vanish if $m\ge r(f,2)$. It follows that the number of distinct nonzero eigenvalues of the operator $T_W$ is at most $r(f,2)$. Since every eigenvalue has finite multiplicity, it follows that $T_W$ has finite rank.

Next, we show that the range of $W\circ W$ is finite (up to a set of measure 0). Consider its moments as a single variable function on the probability space $[0,1]^2$: $M_k(W\circ W)=\int_{[0,1]^2}(W\circ W)^k$, and the corresponding moment matrix $M(W\circ W)=\big(M_{k+l}(W\circ W)\big)_{k,l=0}^\infty$. Note that $M_k(W\circ W)=t(K_{2,k},W)$, and so $M_{k+l}(W\circ W)=t([[K_{2,k}^{\bullet\bullet}K_{2,l}^{\bullet\bullet}]],W)$. It follows that $M(W\circ W)$ is a submatrix of $M(f,2)$, and hence its rank is finite. By Theorem A.22, the range of $W\circ W$ is finite (up to a set of measure 0).

The fact that $T_W$ has finite rank implies that $T_{W\circ W}=T_W^2$ has finite rank. This, together with the fact that the range of $W\circ W$ is finite, implies that $W\circ W$ is a stepfunction. Indeed, the row space of $W\circ W$ is finite dimensional, so we can select a finite set of points $x_1,\dots,x_r$ so that every row $(W\circ W)(x,\cdot)$ is a linear combination of the functions $(W\circ W)(x_i,\cdot)$. Since $W\circ W$ has finite range, the functions $(W\circ W)(x_i,\cdot)$ are stepfunctions. There is a finite partition $[0,1]=S_1\cup\dots\cup S_p$ such that every function $(W\circ W)(x_i,\cdot)$ is constant on every $S_i$, and hence every row is constant on every $S_i$. By symmetry, this implies that $W\circ W$ is constant on every rectangle $S_i\times S_j$, i.e., it is a stepfunction. Finally, every eigenfunction $h$ of $T_W$ with nonzero eigenvalue $\lambda$ satisfies $h=\lambda^{-2}T_{W\circ W}h$, and since $W\circ W$ is a stepfunction with steps $S_1,\dots,S_p$, such an $h$ is constant on every $S_i$. Expanding $W$ in terms of these finitely many eigenfunctions, we conclude that $W$ itself is a stepfunction.

It is easy to derive from this the following analogue of Theorem 5.54.

Corollary 13.48. Let $f$ be a reflection positive, multiplicative, and normalized simple graph parameter. Then either $r(f,k)$ is infinite for all $k\ge2$, or there is a twinfree weighted graph $H$ such that $f=t(\cdot,H)$, $r(f,k)$ is finite for all $k\ge0$, and $r(f,k)^{1/k}\to v(H)$.

This result is not a strengthening of Theorem 5.54, because it concerns simple graph parameters. The extension to multigraph parameters is more complicated, and we will return to it later, in Chapter 17.

Proof. By Theorem 11.52 and Proposition 14.61, there is a graphon $W$ such that $f=t(\cdot,W)$. If $r(f,2)=\infty$, then trivially $r(f,k)=\infty$ for all $k\ge2$. Suppose that $r(f,2)<\infty$; then by Theorem 13.47, $W$ is a stepfunction, and so there is a weighted graph $H$ such that $f=t(\cdot,H)$. The conclusion follows by Theorem 6.36.
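The matrix in (13.18) is easy to watch numerically. In the following sketch (ours, not the book's), we take an equal-step stepfunction, for which $t(C_k,W)=\sum_i\lambda_i^k$ with $\lambda_i$ the eigenvalues of $T_W$; the matrix $\big(t(C_{k+l+4},W)\big)_{k,l}$ is then a Hankel matrix whose rank equals the number of distinct nonzero eigenvalues. All parameters and the tolerance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
q = 3
B = rng.random((q, q)); B = (B + B.T) / 2   # a q-step stepfunction with equal steps
lam = np.linalg.eigvalsh(B / q)             # spectrum of the operator T_W

def t_cycle(k):
    """t(C_k, W) = sum_i lam_i^k for a stepfunction kernel."""
    return float(np.sum(lam ** k))

m = 6
M = np.array([[t_cycle(k + l + 4) for l in range(m + 1)] for k in range(m + 1)])
print(np.round(lam, 4))                      # generically three distinct eigenvalues
print(np.linalg.matrix_rank(M, tol=1e-12))   # rank of the cycle-density matrix: 3
```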
For the graph parameter f = t(., W ), we have r(f, 0) = 1 by multiplicativity, so this is always finite. The rank r(f, 1) may be finite; this happens if W has an automorphism group that has a finite number of orbits with positive measure. Proposition 13.46 and Theorem 13.47 imply that if Aut(W ) has a finite number of orbits on pairs, then W is a stepfunction. It follows in particular that Aut(W ) has a finite number of orbits on k-tuples of points for every k. This is somewhat surprising in view of the fact that there are arbitrarily large finite graphs whose automorphism group has only three orbits on pairs, but an unbounded number of orbits on k-tuples for k ≥ 3 (for example, the Paley graphs in Example 1.1).
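For the Paley graphs just mentioned, the three orbits on pairs are easy to exhibit explicitly. A minimal sketch (ours, not the book's): the maps $x\mapsto ax+b$ with $a$ a nonzero square are automorphisms of the Paley graph on $\mathbb Z_q$, and already this subgroup has exactly three orbits on ordered pairs (the diagonal, the edges, and the non-edges).

```python
q = 13                                 # a prime congruent to 1 mod 4; Paley graph on Z_q
squares = {(x * x) % q for x in range(1, q)}
autos = [(a, b) for a in squares for b in range(q)]   # maps x -> a x + b, a a square

def apply(f, x):
    a, b = f
    return (a * x + b) % q

# each map preserves adjacency: (a u + b) - (a v + b) = a (u - v), and multiplying
# by a nonzero square keeps squares squares and nonsquares nonsquares
pairs = {(u, v) for u in range(q) for v in range(q)}
orbits = 0
while pairs:
    u, v = pairs.pop()
    pairs -= {(apply(f, u), apply(f, v)) for f in autos}
    orbits += 1
print(orbits)                          # 3 orbits on ordered pairs
```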
CHAPTER 14
The space of graphons

The space of graphons is the stage where many acts of interaction between graph theory and analysis take place. This chapter collects a number of questions about the structure of this space that arise naturally and that have at least partial answers.

14.1. Norms defined by graphs

We mentioned, and through the results in the last chapters also illustrated, that the cut norm and the cut distance are best suited for measuring structural similarity of two graphons. However, other norms are also important; the connection between norms on the graphon space and our theory is twofold: first, homomorphism densities give rise to interesting norms, and second, norms with some natural properties are closely related to the cut norm.

We start with a discussion of norms defined by homomorphism densities, based on the work of Hatami [2010]. We have seen that many Schatten norms of a kernel operator can be expressed by the homomorphism densities of even cycles. Here we consider a more general question: for which graphs $F$ is $|t(F,W)|^{1/e(F)}$ a norm? We call such a graph $F$ norming. This condition can be relaxed in two directions:

(1) We can ask whether the functional $W\mapsto|t(F,W)|^{1/e(F)}$ is a seminorm on $\mathcal W$ (i.e., whether it is subadditive, but could be 0 even if $W$ is not identically 0; homogeneity is trivial). We call $F$ seminorming if this holds.

(2) We can ask whether $W\mapsto t(F,|W|)^{1/e(F)}$ is a norm (we moved the absolute value signs in); we call $F$ weakly norming if this holds. This is equivalent to asking: which graphs $F$ have the property that the subadditivity inequality
$$t(F,W_1+W_2)^{1/e(F)}\le t(F,W_1)^{1/e(F)}+t(F,W_2)^{1/e(F)}$$
holds for all $W_1,W_2\in\mathcal W_0$?

(There is no fourth version: if the functional $W\mapsto t(F,|W|)^{1/e(F)}$ is a seminorm, then it is a norm; see Exercise 14.9.)

A related property is that the graph $F$ satisfies $t(F,W)\ge0$ for every kernel $W$. Such graphs are called positive. If $F=[[F_1^2]]$ for some $k$-labeled graph $F_1$ with nonadjacent labeled nodes, then $F$ is positive, but the converse is not known. Exercises 14.3 and 14.4 state some of the known properties of positive graphs.

Returning to graphs that are norming in one sense or the other, we collect some (easy) facts about them, due to Hatami [2010] and Kunszenti-Kovács [unpublished]. Every seminorming or weakly norming graph is bipartite (Exercise 14.7). Being seminorming is almost equivalent to being norming: every seminorming graph that is not norming is a star with an even number of edges. Every seminorming graph is positive (and hence we don't need the absolute value in the definition; Exercise 14.8), but not every positive graph is seminorming. Graphs with an odd number of edges cannot be seminorming, but they can be weakly norming, as the example of $K_2$ shows.
There are several classes of graphs $F$ with norming properties. Besides cycles, complete bipartite graphs with an even number of nodes in each bipartition class are norming, and all complete bipartite graphs are weakly norming. For more properties and examples of graphs with norming properties, see Exercises 14.5–14.8.

Norming properties are closely related to Hölder-type inequalities for homomorphism densities, which can be stated using the notion of $\mathcal W$-decorated graphs introduced in Section 7.2. A simple graph $F=(V,E)$ has the Hölder property if for every $\mathcal W$-decoration $w=(w_e:\,e\in E(F))$ of $F$,
(14.1) $t(F,w)^{e(F)}\le\prod_{e\in E}t(F,w_e).$
It has the weak Hölder property if this inequality holds for every $\mathcal W_0$-decoration of $F$ (equivalently, for every $\mathcal W$-decoration with nonnegative functions). Hatami [2010] gives the following characterizations of seminorming and weakly norming graphs in terms of Hölder properties.

Theorem 14.1. A simple graph is seminorming if and only if it has the Hölder property. It is weakly norming if and only if it has the weak Hölder property.

Proof. We prove the second assertion; the proof of the first is similar. In the "if" direction, suppose that a simple graph $F=(V,E)$ with $m$ edges has the weak Hölder property. Let $W_1,W_2\in\mathcal W_0$. We have
$$t(F,W_1+W_2)=\sum_w t(F,w),$$
where the summation extends to all $\{W_1,W_2\}$-decorations $w$ of $F$. So by the weak Hölder property,
$$t(F,W_1+W_2)\le\sum_w\prod_{e\in E}t(F,w_e)^{1/m}=\sum_{k=0}^m\binom mk\,t(F,W_1)^{k/m}t(F,W_2)^{(m-k)/m}=\big(t(F,W_1)^{1/m}+t(F,W_2)^{1/m}\big)^m,$$
which shows that the functional $t(F,W)^{1/m}$ is subadditive on $\mathcal W_0$.

The proof of the "only if" direction is trickier. Suppose that $F$ is weakly norming, and let $(F,w)$ be a $\mathcal W_0$-decoration of $F$. Inequality (14.1) is homogeneous of degree $m$ in each graphon $w_e$, so we may scale those and assume that $t(F,w_e)=1$ for every edge. We want to prove that $t(F,w)\le1$, but first we prove the weaker inequality
(14.2) $t(F,w)\le m^m.$
Indeed, if $W=\sum_e w_e$, then using subadditivity, we get
$$t(F,w)\le t(F,W)\le\Big(\sum_e t(F,w_e)^{1/m}\Big)^m=m^m.$$
To conclude, we use a method called tensoring. Let $n\ge1$, and let us decorate every edge $e$ by the tensor product $w_e^{\otimes n}$. Then one has
$$t(F,w^{\otimes n})=t(F,w)^n,\qquad t(F,w_e^{\otimes n})=t(F,w_e)^n=1,$$
and hence (14.2) applied to this decoration gives $t(F,w)^n\le m^m$, i.e., $t(F,w)\le m^{m/n}$. Since this holds for every $n$, it follows that $t(F,w)\le1$.
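As a quick numerical companion to Theorem 14.1 (our sketch, not from the book): for an equal-step kernel, $t(C_4,W)=\mathrm{tr}(T_W^4)$ is the fourth power of a Schatten norm, so $t(C_4,\cdot)^{1/4}$ should be subadditive on kernels, i.e., $C_4$ is seminorming. The test below checks this on random symmetric step kernels; the step count and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
q = 5

def t_C4(B):
    """t(C4, W) for the equal-step kernel with step values B: trace((B/q)^4) >= 0."""
    return float(np.trace(np.linalg.matrix_power(B / q, 4)))

for _ in range(1000):
    A = rng.uniform(-1, 1, (q, q)); A = (A + A.T) / 2
    B = rng.uniform(-1, 1, (q, q)); B = (B + B.T) / 2
    assert t_C4(A + B) ** 0.25 <= t_C4(A) ** 0.25 + t_C4(B) ** 0.25 + 1e-12
print("subadditivity of t(C4, .)^(1/4) held on all samples")
```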
Using this theorem, one can prove about some graphs that they are norming (Hatami [2010]). A characterization of such graphs is open.

Proposition 14.2. (a) Even cycles are norming. (b) Hypercubes are weakly norming. (c) Deleting a perfect matching from a complete bipartite graph $K_{n,n}$, we get a weakly norming graph.

Proof. We describe the proof of (b); the proofs of (a) and (c) are similar (in fact, simpler). Consider the $d$-dimensional hypercube graph $Q^d$. We consider its node set as $V=\{0,1\}^d$, and its edge set as $E=\{xy:\,x,y\in V,\ x_i=y_i\text{ for all but one }i\}$. By Theorem 14.1, it is enough to prove that (14.1) holds for $F=Q^d$ and any decoration with graphons. Let $\mathcal A$ be the set of graphons that occur. We may assume that $\mathcal A$ does not contain the graphon that is almost everywhere 0 (else, the inequality is trivial). We say that an $\mathcal A$-decoration $W$ of $Q^d$ is pessimal if
$$t(Q^d,W)^{e(Q^d)}\Big/\prod_{e\in E}t(Q^d,W_e)$$
is maximal among all $\mathcal A$-decorations. Since there are only a finite number of such decorations, at least one of them is pessimal. In these terms, inequality (14.1) means that there is a pessimal $\mathcal A$-decoration with all decorating graphons equal.

Let $S_1$ denote the set of nodes $x$ of $Q^d$ with $x_1=1$, $x_2=0$; let $S_2$ be the set of nodes $x$ with $x_1=0$, $x_2=1$; and let $T=V\setminus S_1\setminus S_2$. Note that $T$ separates $S_1$ and $S_2$. Let $E_i$ be the set of edges incident with any node in $S_i$, and let $E_0$ be the set of edges spanned by $T$. We can write
$$t(Q^d,W)=\int_{[0,1]^V}\prod_{ij\in E}W_{ij}(x_i,x_j)\,dx=\int_{[0,1]^V}\prod_{ij\in E_0}W_{ij}(x_i,x_j)\prod_{ij\in E_1}W_{ij}(x_i,x_j)\prod_{ij\in E_2}W_{ij}(x_i,x_j)\,dx.$$
Considering the first factor as a weight function (here we use that $W\ge0$), we can apply the Cauchy–Schwarz Inequality to get
$$t(Q^d,W)\le\Big(\int_{[0,1]^V}\prod_{ij\in E_0}\Big(\prod_{ij\in E_1}\Big)^2\Big)^{1/2}\Big(\int_{[0,1]^V}\prod_{ij\in E_0}\Big(\prod_{ij\in E_2}\Big)^2\Big)^{1/2}.$$
Interchanging $x_1$ and $x_2$ in every $x\in V$ (in other words, reflecting in the hyperplane $x_1=x_2$) is an automorphism $\sigma$ of $Q^d$ which maps $E_1$ onto $E_2$, and therefore
$$\int_{[0,1]^V}\prod_{ij\in E_0}\Big(\prod_{ij\in E_1}\Big)^2=t(Q^d,W'),$$
where $W'_e=W_e$ if $e\in E_1\cup E_0$, and $W'_e=W_{\sigma(e)}$ if $e\in E_2$. Similarly,
$$\int_{[0,1]^V}\prod_{ij\in E_0}\Big(\prod_{ij\in E_2}\Big)^2=t(Q^d,W''),$$
where $W''_e=W_e$ if $e\in E_2\cup E_0$, and $W''_e=W_{\sigma(e)}$ if $e\in E_1$. Thus we get
$$t(Q^d,W)\le t(Q^d,W')^{1/2}\,t(Q^d,W'')^{1/2}.$$
By the definition of pessimal decoration, we must have equality here, and the decorations $W'$ and $W''$ must also be pessimal. Here (say) $W'$ is a pessimal decoration
which is invariant under interchanging the first two entries in every $x\in V$. We call this symmetrization with respect to the hyperplane $x_1=x_2$. We can symmetrize similarly with respect to the hyperplane $x_1+x_2=1$.

Now consider a pessimal decoration $W$ and a face $Z$ of the cube such that all edges of $Z$ are decorated by the same graphon $U$. Suppose that $Z$ is not the whole cube. We may assume that $Z$ is the face defined by $x_1=x_2=\dots=x_k=0$, where $0<k<d$. Let $Z'$ be the face obtained by reflecting $Z$ in the hyperplane $x_k=x_{k+1}$. The intersection of $Z$ and $Z'$ is the face defined by $x_1=x_2=\dots=x_k=x_{k+1}=0$. The smallest face $Z''$ containing both $Z$ and $Z'$ is defined by $x_1=x_2=\dots=x_{k-1}=0$. Let us symmetrize with respect to the hyperplane $x_k=x_{k+1}$. The decoration of the edges of $Z$ does not change, but the decoration of the edges of $Z'$ also becomes $U$. Symmetrizing with respect to $x_k+x_{k+1}=1$, we get a pessimal decoration in which all edges of $Z''$ have the same decoration. Repeating this procedure, we get a pessimal decoration with all edges decorated by the same graphon, and we are done.

Exercise 14.3. A graph $F$ is positive if and only if every connected component of $F$ that is not positive occurs with even multiplicity.

Exercise 14.4. Let $F$ be a positive simple graph. (a) $F\times G$ is positive for every simple graph $G$. (b) There is a homomorphism $F\to F$ such that every edge has an even number of pre-images. (c) [Harder] If $F$ is positive, then there is a homomorphism $F\to G$ into a simple graph $G$ with $v(G)\ge v(F)/2$ such that every edge has an even number of pre-images (Camarena, Csóka, Hubai, Lippner and Lovász [2012]).

Exercise 14.5. Prove that (a) complete bipartite graphs with an even number of nodes in both color classes are norming, (b) complete bipartite graphs are weakly norming, (c) stars with an even number of edges are seminorming, (d) $K_{2,3}$ is not seminorming (but weakly norming).

Exercise 14.6. Let $T$ be a tree that is seminorming. (a) Prove that if $U,W\in\mathcal W$ and $\int_0^1U(x,y)\,dx=\int_0^1W(x,y)\,dx$ for every $y$, then $t(T,U)=t(T,W)$. (b) Prove that $T$ is a star.

Exercise 14.7. (a) Every seminorming or weakly norming graph is bipartite. (b) Every seminorming graph is either a star, or eulerian. (c) Every seminorming graph that is not norming is a star.

Exercise 14.8. Let $F$ be a seminorming graph. Prove that (a) kernels with $t(F,W)=0$ form a linear space; (b) $e(F)$ is even; (c) $F$ is positive.

Exercise 14.9. Prove that if the functional $W\mapsto t(F,|W|)^{1/e(F)}$ is a seminorm, then it is a norm.
14.2. Other norms on the kernel space The topologies on W1 defined by the cut norm, L1 -norm, weak convergence etc. are different, but there are some subtle, nonobvious relationships between them. This turns out to be quite important for graph-theoretic applications: the interplay between the cut norm and L2 -norm is crucial in the proof of the Regularity Lemma (Section 9.1.2), and the relationship between the cut norm and L1 -norm is the key to the analytic theory of property testing (Section 15.3) and to the stability theory of extremal graphs (Section 16.4). We are not going to explore all the connections between these norms, just those that have graph theoretical significance.
Almost all norms on $\mathcal W$ (in short, norms for this section) that we need have some natural properties. Recall from Section 8.2 that a norm $N$ is called invariant if $N(W^\varphi)=N(W)$ for every measure preserving transformation $\varphi\in S_{[0,1]}$, and smooth if for every sequence $W_n\in\mathcal W_1$ of kernels such that $W_n\to0$ almost everywhere, we have $N(W_n)\to0$. The norms $L^1$, $L^2$, the cut norm, and the graph norms from the previous section share these properties. (But the $L^\infty$-norm is not smooth!) Recall the obvious inequalities
(14.3) $\|W\|_\square\le\|W\|_1\le\|W\|_2.$
For $W\in\mathcal W_1$, we have $\|W\|_2\le\|W\|_1^{1/2}$, and hence these two norms define the same topology on $\mathcal W_1$. Trivially, the cut norm is continuous in this topology. How about the other way around? There are easy examples showing that $\|W_n\|_\square\to0$ does not imply that $\|W_n\|_1\to0$ or $\|W_n\|_2\to0$: let $(G_n)$ be a quasirandom graph sequence with edge density 1/2, and $W_n=2W_{G_n}-1$. Then $\|W_n\|_\square\to0$, but $\|W_n\|_1=\|W_n\|_2=1$.

The main goal in this section is to establish the following picture about smooth invariant norms.

Theorem 14.10. (a) Every smooth invariant norm (as a function on $\widetilde{\mathcal W}_1$) is continuous with respect to the $L^1$ norm, and the cut norm is continuous with respect to any smooth invariant norm. (b) Any smooth invariant norm is lower semicontinuous with respect to any other smooth invariant norm.

We also prove an analogous (but not equivalent!) theorem about the distances $\delta_N$ on $\widetilde{\mathcal W}$ defined by smooth invariant norms $N$. Let us call these, for brevity, delta-metrics.

Theorem 14.11. (a) Every delta-metric is continuous (as a function on $\widetilde{\mathcal W}_1\times\widetilde{\mathcal W}_1$) with respect to $\delta_1$, and $\delta_\square$ is continuous with respect to any delta-metric. (b) Any delta-metric is lower semicontinuous with respect to any other delta-metric.

The fact that we prove continuity (or lower semicontinuity) as a function in two variables, and not just separately in each variable, is significant. As an example of a different nature, recall that the overlay functional $C(U,W)$ is continuous in each variable, but not as a 2-variable function (Section 12.2). Some of the above statements are trivial, and some follow easily from each other. Along the lines, we are going to prove a couple of facts that will be useful in other contexts too.

14.2.1. Smooth and invariant norms. As a technical preparation, we have to prove some simple facts about smooth and invariant norms.

Lemma 14.12. Every smooth norm $N$ is uniformly continuous with respect to the $L^1$ norm on $\mathcal W_1$.

Proof. Suppose not; then there exists an $\varepsilon>0$ and a sequence of kernels $W_n\in\mathcal W_1$ such that $\|W_n\|_1\to0$ but $N(W_n)>\varepsilon$. By selecting a subsequence, we may assume that $W_n\to0$ almost everywhere, contradicting the assumption that $N$ is smooth.
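The quasirandom example above can be imitated on stepfunctions, where the cut norm is computable by brute force (our sketch, not the book's; the random $\pm1$ step values play the role of $2W_{G_n}-1$): as the number of steps grows, the cut norm of a random $\pm1$ step kernel decays while its $L^1$ norm stays 1.

```python
import numpy as np
from itertools import product

def cut_norm_step(B):
    """Cut norm of the equal-step kernel W_B. For a stepfunction the supremum
    over measurable S, T is attained on unions of steps (the functional is
    bilinear in fractional memberships), so brute force over steps is exact."""
    q = B.shape[0]
    best = 0.0
    for S in product([0, 1], repeat=q):
        row = np.array(S, float) @ B
        best = max(best, row[row > 0].sum(), -row[row < 0].sum())  # best T for this S
    return best / q**2

rng = np.random.default_rng(4)
for q in (4, 8, 16):
    B = np.triu(rng.choice([-1.0, 1.0], (q, q)))
    B = B + B.T - np.diag(np.diag(B))           # random symmetric +-1 step values
    print(q, round(cut_norm_step(B), 3), np.abs(B).mean())  # cut norm shrinks, L1 = 1
```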
Next we prove a basic property of the stepping operator.

Proposition 14.13. The stepping operator is contractive with respect to any smooth invariant norm.

Proof. Let $\mathcal P$ be any finite measurable partition of $[0,1]$, and let $N$ be an invariant norm. We want to prove that $N(W_{\mathcal P})\le N(W)$ for every $W\in\mathcal W$. By the invariance of $N$, we may assume that the partition classes of $\mathcal P$ are intervals. We may also assume that $W\in\mathcal W_1$.

For every interval $I=[a,b)\in\mathcal P$, let $\varphi_I:I\to I$ denote a measure preserving map $x\mapsto a+2(x-a)\pmod{b-a}$. Then every map $(x,y)\mapsto\big(\varphi_I(x),\varphi_J(y)\big)$ is ergodic on $I\times J$. Let $\varphi:[0,1]\to[0,1]$ denote the map that acts on $I\in\mathcal P$ as $\varphi_I$. For $n\ge1$, define
$$U_n(x,y)=\frac1n\sum_{k=0}^{n-1}W^{\varphi^k}(x,y)=\frac1n\sum_{k=0}^{n-1}W\big(\varphi^k(x),\varphi^k(y)\big).$$
Using the subadditivity and invariance of $N$, we get
$$N(U_n)\le\frac1n\sum_{k=0}^{n-1}N(W^{\varphi^k})=N(W).$$
On the other hand, the Ergodic Theorem implies that $U_n\to W_{\mathcal P}$ almost everywhere as $n\to\infty$. Since, trivially, $U_n\in\mathcal W_1$ and $N$ is smooth, this implies that $N(W_{\mathcal P})=\lim_{n\to\infty}N(U_n)\le N(W)$.

Next we give a useful representation of smooth invariant norms. By the Hahn–Banach Theorem, we can represent any norm on $\mathcal W$ that is continuous in the $L^\infty$ norm as
(14.4) $N(W)=\sup_{\ell\in L}\ell(W),$
where $L$ is an appropriate set of linear functionals on $\mathcal W$, continuous in the $L^\infty$ norm. We show that for our norms, the linear functionals in $L$ can be represented as inner products with functions in $\mathcal W$.

Proposition 14.14. For every smooth and invariant norm $N$ there is a set $K\subseteq\mathcal W$ such that
$$N(W)=\sup_{U\in K}\langle U,W\rangle$$
for every $W\in\mathcal W$.

Proof. Define $K=\{Y\in\mathcal W:\,\langle Y,U\rangle\le N(U)\ \forall U\in\mathcal W\}$. Let $W\in\mathcal W$; we want to prove that $N(W)=\sup_{Y\in K}\langle Y,W\rangle$. Suppose not; then we may assume that $N(W)>1>\sup_{Y\in K}\langle Y,W\rangle$.

First, we assume that $W$ is a stepfunction. Let $\mathcal P$ be the partition of $[0,1]$ into the steps of $W$. The linear space $\mathcal W_{\mathcal P}$ of kernels with steps in $\mathcal P$ is finite dimensional, and $B=\{U\in\mathcal W_{\mathcal P}:\,N(U)\le1\}$ is a convex set in it. Since $W\notin B$, and we are working in a finite dimensional space, there is a hyperplane of the form $\langle Y,\cdot\rangle=1$ ($Y\in\mathcal W_{\mathcal P}$) through the point $W$ such that $\langle Y,X\rangle\le1$ for all $X\in B$. Then for any $U\in\mathcal W$, using Proposition 14.13, we get
$$\langle Y,U\rangle=\langle Y,U_{\mathcal P}\rangle=N(U_{\mathcal P})\Big\langle Y,\frac{U_{\mathcal P}}{N(U_{\mathcal P})}\Big\rangle\le N(U_{\mathcal P})\le N(U),$$
which shows that $Y\in K$. Since $\langle Y,W\rangle=1$, this is a contradiction.

Second, let $W$ be an arbitrary kernel. Proposition 9.8 implies that there is a stepfunction $W'$ such that $N(W-W')<N(W)-1$, and hence $N(W')>1$. We know already that there is a stepfunction $Y\in K$ with the same steps as $W'$ such that $\langle Y,W'\rangle=1$. Since $\langle Y,W\rangle=\langle Y,W'\rangle$, this contradicts the choice of $W$, and completes the proof.

The following lemma is a special case of the theorem, but it is best to formulate and prove it separately.

Lemma 14.15. Let $N$ be any smooth invariant norm, and let $W_n\to W$ in the cut norm ($W_n,W\in\mathcal W_1$). Then
$$\liminf_{n\to\infty}N(W_n)\ge N(W).$$
Proof. By Proposition 14.14, the norm $N$ can be represented as $N(X)=\sup_{Y\in K}\langle X,Y\rangle$ for some $K\subseteq\mathcal W$. Let $\varepsilon>0$ and choose a function $Y\in K$ such that $\langle Y,W\rangle\ge N(W)-\varepsilon$. Then by Lemma 8.22,
$$N(W_n)\ge\langle Y,W_n\rangle\to\langle Y,W\rangle\ge N(W)-\varepsilon.$$
Since $\varepsilon>0$ was arbitrary, this proves the lemma.

We can now give the proof of the first main theorem in this section.

Proof of Theorem 14.10. (a) Lemma 14.12 proves the first statement. To prove the second, let $N$ be a smooth invariant norm. Suppose that the cut norm is not continuous with respect to $N$; then there is a sequence of kernels $W_n\in\mathcal W_1$ such that $N(W_n)\to0$ but $\|W_n\|_\square\ge c>0$ for all $n$. By the compactness of the graphon space, we may also assume that $\delta_\square(W_n,U)\to0$ for some nonzero kernel $U$. This means that there are invertible measure preserving transformations $\varphi_n$ such that $\|W_n^{\varphi_n}-U\|_\square\to0$. Lemma 14.15 implies that $\liminf_n N(W_n^{\varphi_n})=\liminf_n N(W_n)\ge N(U)>0$. Since $N$ is invariant, this contradicts the assumption $N(W_n)\to0$.

(b) Let $N_1$ and $N_2$ be two smooth invariant norms on $\mathcal W$, and let $W_1,W_2,\dots,W\in\mathcal W_1$ be such that $N_1(W_n-W)\to0$. Then $\|W_n-W\|_\square\to0$ by (a). Hence $\liminf_n N_2(W_n)\ge N_2(W)$ by Lemma 14.15.

14.2.2. Delta-distances. We start with a fact similar to Lemma 14.15, but more difficult to prove, with the distances $\delta_N$ and $\delta_\square$ replacing the norm $N$ and the cut norm. For the case when $N$ is the $L^1$-norm, this was proved by Lovász and Szegedy [2010a].

Lemma 14.16. Let $N$ be a smooth invariant norm on $\mathcal W$. Let $\delta_\square(U_n,U)\to0$ and $\delta_\square(W_n,W)\to0$ as $n\to\infty$ ($U,W,U_n,W_n\in\mathcal W_1$). Then
$$\liminf_{n\to\infty}\delta_N(W_n,U_n)\ge\delta_N(W,U).$$
In other words, the distance $\delta_N$ is lower semicontinuous on the compact metric space $(\widetilde{\mathcal W}_1,\delta_\square)$.

Proof. Applying appropriate measure preserving transformations to the kernels $U_n$ and $W_n$, we may assume that $\|U_n-U\|_\square\to0$ and $\|W_n-W\|_\square\to0$ when $n\to\infty$. Fix an $\varepsilon>0$. Let $\mathcal P$ and $\mathcal Q$ denote finite partitions of $[0,1]$ such that $N(W-W_{\mathcal P})\le\varepsilon$ and $N(U-U_{\mathcal Q})\le\varepsilon$. For any positive integer $n$, there are measure preserving transformations $\varphi_n,\psi_n:[0,1]\to[0,1]$ such that $\delta_N(W_n,U_n)=N(W_n^{\varphi_n}-U_n^{\psi_n})$. The difficulty (why we cannot apply Lemma 14.15) is that these transformations $\varphi_n$ and $\psi_n$ may depend on $n$.
But as a next step, we fix $n$ as follows. Let $\mathcal R=\{R_1,\dots,R_m\}$ denote the common refinement of the partitions $\mathcal P$ and $\mathcal Q$. We claim that if $n$ is large enough then $N\big((W_n)_{\mathcal R}-W_{\mathcal R}\big)\le\varepsilon$. By Lemma 14.12, there is an $\varepsilon'>0$ such that it is enough to guarantee that $\|(W_n)_{\mathcal R}-W_{\mathcal R}\|_1\le\varepsilon'$. By (8.15), this follows if we have $\|(W_n)_{\mathcal R}-W_{\mathcal R}\|_\square\le\varepsilon'/m^2$. By Proposition 14.13, this will hold if $\|W_n-W\|_\square\le\varepsilon'/m^2$, which holds for every $n$ that is large enough by the assumption that $\|W_n-W\|_\square\to0$ when $n\to\infty$. Similarly, $N\big((U_n)_{\mathcal R}-U_{\mathcal R}\big)\le\varepsilon$ holds if we choose $n$ large enough. We consider $n$ fixed from now on, and so we can replace $W_n$, $W$ and $\mathcal P$ by $W_n^{\varphi_n}$, $W^{\varphi_n}$ and $\varphi_n(\mathcal P)$, and replace $U_n$, $U$ and $\mathcal Q$ by $U_n^{\psi_n}$, $U^{\psi_n}$ and $\psi_n(\mathcal Q)$. With this new notation, we have $\delta_N(W_n,U_n)=N(W_n-U_n)$. Then
$$\delta_N(W,U)\le N(W-U)\le N(W-W_{\mathcal R})+N(W_{\mathcal R}-U_{\mathcal R})+N(U_{\mathcal R}-U).$$
By the choice of $\mathcal P$ and by Proposition 14.13,
$$N(W-W_{\mathcal R})\le N(W-W_{\mathcal P})+N(W_{\mathcal P}-W_{\mathcal R})=N(W-W_{\mathcal P})+N\big((W_{\mathcal P}-W)_{\mathcal R}\big)\le2\varepsilon,$$
and using the analogous estimate for $U$, we get that $\delta_N(W,U)\le N(W_{\mathcal R}-U_{\mathcal R})+4\varepsilon$. Using Proposition 14.13 again and the fact that the kernels $W_{\mathcal P}$ and $U_{\mathcal Q}$ are both constant on the rectangles $R_i\times R_j$, we get
$$\delta_N(W_n,U_n)=N(W_n-U_n)\ge N\big((W_n)_{\mathcal R}-(U_n)_{\mathcal R}\big)\ge N(W_{\mathcal R}-U_{\mathcal R})-N\big((W_n)_{\mathcal R}-W_{\mathcal R}\big)-N\big(U_{\mathcal R}-(U_n)_{\mathcal R}\big)$$
$$\ge N(W_{\mathcal R}-U_{\mathcal R})-2\varepsilon\ge\delta_N(W,U)-6\varepsilon.$$
Since this holds for every $\varepsilon>0$ if $n$ is large enough, the assertion follows.
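The role of the measure preserving transformations in $\delta_N$ can be illustrated on stepfunctions (a toy sketch of ours, not from the book): minimizing over permutations of equal steps gives an upper bound on $\delta_1$, and it already detects that a relabeled copy of a kernel is at distance 0 even when the naive $L^1$ distance is large.

```python
import numpy as np
from itertools import permutations

def delta1_upper(A, B):
    """Upper bound on delta_1(W_A, W_B) for two equal-step kernels: minimize
    ||A^sigma - B||_1 over step permutations. (General measure preserving maps
    are richer than permutations, so this is only a bound on delta_1.)"""
    best = np.inf
    for perm in permutations(range(A.shape[0])):
        P = list(perm)
        best = min(best, float(np.abs(A[np.ix_(P, P)] - B).mean()))
    return best

rng = np.random.default_rng(5)
q = 5
A = rng.random((q, q)); A = (A + A.T) / 2
sigma = rng.permutation(q).tolist()
B = A[np.ix_(sigma, sigma)]                   # a relabeled copy of A
print(np.abs(A - B).mean(), delta1_upper(A, B))   # positive L1 distance vs. 0
```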
Proof of Theorem 14.11. The theorem follows from Lemma 14.16 similarly as Theorem 14.10 followed from Lemma 14.15. The details are not repeated.

A consequence of Lemma 14.16 (or Theorem 14.11) is worth formulating.

Corollary 14.17. Let $N$ be a smooth invariant norm, and let $\mathcal R\subseteq\widetilde{\mathcal W}_1$ be compact with respect to the $\delta_\square$ distance. Then the functional $\delta_N(\cdot,\mathcal R)$ is lower semicontinuous on $(\widetilde{\mathcal W}_1,\delta_\square)$.

Proof. Suppose that $U_n\to U$ in the $\delta_\square$ distance; we want to prove that
$$\liminf_{n\to\infty}\delta_N(U_n,\mathcal R)\ge\delta_N(U,\mathcal R).$$
By selecting an appropriate subsequence, we may assume that the limes inferior is actually a limit. For every $n$, let $W_n\in\mathcal R$ be such that $\delta_N(U_n,W_n)\le\delta_N(U_n,\mathcal R)+1/n$. Again by going to a subsequence, using the compactness of $\mathcal R$, we may assume that $W_n\to W$ for some $W\in\mathcal R$. By Lemma 14.16,
$$\liminf_{n\to\infty}\delta_N(U_n,\mathcal R)\ge\liminf_{n\to\infty}\Big(\delta_N(U_n,W_n)-\frac1n\Big)\ge\delta_N(U,W)\ge\delta_N(U,\mathcal R).$$

Exercise 14.18. Prove that the statement of Lemma 14.12 remains valid if the assumption of smoothness is replaced by the assumption that $N$ has a representation as in Proposition 14.14.

Exercise 14.19. Construct a norm $N$ such that the stepping operator is not contractive with respect to $N$.
Exercise 14.20. Let $W_1$ and $W_2$ be graphons that are monotone increasing in both variables. Prove that (a) $\delta_1(W_1,W_2)=\|W_1-W_2\|_1$; (b) $\delta_\square(W_1,W_2)=\|W_1-W_2\|_\square$; (c) $\|W_1-W_2\|_1\le10\|W_1-W_2\|_\square^{2/3}$ (Bollobás, Janson and Riordan [2012]).
14.3. Closures of graph properties A class of graphons closed under weak isomorphism is called a graphon property. Since every graphon is weakly isomorphic to a graphon on [0, 1], we can usually restrict our attention to graphons on [0, 1] when studying graphon properties. In this section, we study graphon properties obtained as closures of graph properties, and graphon properties defined by equations. A graph property P is a class of finite graphs closed under isomorphism. Let P is the set of graphons (J, W ) that arise as limits of graph sequences in P. Geometric and topological properties of the closure reveal important information about the graph property, as we shall see. 14.3.1. Hereditary properties. Recall that a graph property is called hereditary, if whenever G ∈ P, then every induced subgraph is also in P. For every graphon W , let I(W ) denote the set of its “induced subgraphs”, i.e., the set of those graphs F for which tind (F, W ) > 0. Clearly, I(W ) is a hereditary graph property. Let P be a hereditary property of graphs. Then (14.5)
∪W ∈P I(W ) ⊆ P.
Indeed, if F ∈ / P, then tind (F, G) = 0 for every G ∈ P, since P is hereditary. This implies that tind (F, W ) = 0 for all W ∈ P, and so F ∈ / I(W ). Equality does not always hold in (14.5). For example, we can always add a graph G and all its induced subgraphs to P without changing P. As a less trivial example, consider all graphs with degrees bounded by 10. This property is hereditary, and P consists of a single graphon (the identically 0 function), so the left hand side of (14.5) consists of edgeless graphs only. Equality in (14.5) can be characterized by assuming that P is closed not only under induced subgraphs, but also under a certain version of multiplying points (Exercise 14.28). The closure of the set of triangle-free graphs is the set of triangle-free graphons, which can be characterized by the property t(K3 , W ) = 0. More generally: Proposition 14.21. Let P be a hereditary graph property. Then its closure is characterized by the (infinitely many) equations tind (F, W ) = 0 for all F ∈ / P. Proof. By the definition of hereditary properties, the equations tind (F, G) = 0 hold for all G ∈ P and F ∈ / P, which implies that tind (F, W ) = 0 for every W ∈ P. Conversely, suppose that W has the property that tind (F, W ) = 0 for all F ∈ / P. This means that P(G(k, W ) ∈ P) = 1 for every k. Since G(k, W ) → W with probability 1, this implies that W ∈ P. 14.3.2. Random-free properties. A graph property P is random-free , if every W ∈ P is 0-1 valued almost everywhere. (For an explanation of the name, see Exercise 14.29.) By Proposition 8.24, if (Gn ) is a convergent random-free sequence of graphs with limit graphon W , then WGn → W in the L1 norm. Example 11.41 illustrated how random-freeness was related to a small amount of randomness in a randomly generated random-free graph sequence. We can also point out that if W
is 0-1 valued, then $\mathbb G(n,W)=\mathbb H(n,W)$, and so to generate $\mathbb G(n,W)$, we don't need randomness to get the edges (of course, we still need randomness to generate the nodes).

Among hereditary properties, it is quite easy to characterize random-free properties.

Lemma 14.22. A hereditary graph property $\mathcal P$ is random-free if and only if there is a signed bipartite graph $F$ such that $t(F,W)=0$ for all $W\in\overline{\mathcal P}$.

The proof will show that it would be enough to assume that for every graphon $W\in\overline{\mathcal P}$ there is a signed bipartite graph $F$ with $t(F,W)=0$.

Proof. Suppose that for every signed bipartite graph $F$ there is a graphon $W\in\overline{\mathcal P}$ such that $t(F,W)>0$. Let $(F_n)$ be a quasirandom sequence of bipartite graphs with bipartition $V(F_n)=V_n'\cup V_n''$, with edge density 1/2, and with $|V_n'|=|V_n''|$. Consider the signed bipartite graphs $\widehat F_n$, obtained from $K_{n,n}$ by signing the edges of $F_n$ with $+$, the other edges with $-$. Let $W_n\in\overline{\mathcal P}$ be a graphon such that $t(\widehat F_n,W_n)>0$. It is easy to see that this means that there is a simple graph $G_n$ obtained from $F_n$ by adding edges within the color classes such that $t_{\mathrm{ind}}(G_n,W_n)>0$. By Proposition 14.21, this implies that $G_n\in\mathcal P$.

By selecting a subsequence we may assume that the graph sequences $G_n'=(G_n[V_n'])$ and $G_n''=(G_n[V_n''])$ are convergent. By Theorem 11.59, we can order the nodes in $V_n'$ and in $V_n''$ so that $W_{G_n'}$ converges to a graphon $W'$ on $[0,1]$ in the cut norm, and similarly $W_{G_n''}$ converges to a graphon $W''$ on $[0,1]$. If we order the nodes of $G_n$ so that the nodes in $V_n'$ precede the nodes in $V_n''$, and keep the above ordering inside $V_n'$ and $V_n''$, then $W_{G_n}$ converges to the graphon
$$U(x,y)=\begin{cases} W'(2x,2y) & \text{if } x,y<1/2,\\ W''(2x-1,2y-1) & \text{if } x,y>1/2,\\ 1/2 & \text{otherwise.}\end{cases}$$
So $U\in\overline{\mathcal P}$ is not 0-1 valued, and hence $\mathcal P$ is not random-free.

Conversely, suppose that $\mathcal P$ is not random-free, and let $W\in\overline{\mathcal P}$ be a graphon that is not 0-1 valued almost everywhere. Then by Theorem 13.35(a), $t(F,W)>0$ for every signed bipartite graph $F$.

Corollary 14.23. If a hereditary property of bipartite graphs does not contain all bipartite graphs, then it is random-free.

Using Theorem 13.35(c), we can associate a finite dimension with every nontrivial hereditary property of bipartite graphs. It would be interesting to find further combinatorial properties of this dimension. The natural analogue of this corollary for properties of nonbipartite graphs fails to hold.

Example 14.24. Let $\mathcal P$ be the property of a graph that it is triangle-free. Then every bipartite graphon is in its closure, but such graphons need not be 0-1 valued.

For more characterizations of hereditary and random-free properties, and for more on their connection, see Janson [2011c].
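Example 14.24 can be checked directly; the following is an illustrative sketch of ours, not from the book. The bipartite-type graphon below has $t(K_3,W)=0$ (every triangle would need an edge inside one side of the bipartition), so by Proposition 14.21 it lies in the closure of the triangle-free graphs, although it takes the value 0.7.

```python
import numpy as np

rng = np.random.default_rng(6)

def W_bip(x, y):
    """A bipartite-type graphon: density 0.7 between [0, 1/2) and [1/2, 1), 0 inside."""
    return 0.7 if (x < 0.5) != (y < 0.5) else 0.0

def t_K3(W, samples=50000):
    """Monte Carlo estimate of the triangle density t(K3, W)."""
    x, y, z = rng.random((3, samples))
    w = lambda u, v: np.array([W(a, b) for a, b in zip(u, v)])
    return float((w(x, y) * w(y, z) * w(z, x)).mean())

print(t_K3(W_bip))   # exactly 0, although the graphon is not 0-1 valued
```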
14.3.3. Flexible properties. A graphon $U$ is called a flexing of a graphon $W$ if $U(x,y)=W(x,y)$ for all $x,y$ with $W(x,y)\in\{0,1\}$ (so we may change the values of $W$ that are strictly between 0 and 1; we may change them to 0 or to 1, so the relation is not symmetric). We say that a graphon property is flexible if it is preserved under flexing. Every random-free graphon property is trivially flexible. It is also clear that the intersection and union of any set of flexible graphon properties is flexible. If $\mathcal R$ is a flexible graphon property, then so are its "complement" $\{1-W:\,W\in\mathcal R\}$, its "downward closure" $\{U\in\mathcal W_0:\,(\exists W\in\mathcal R)\ U\le W\}$, and its "upward closure" defined analogously.

For every signed graph $F$, the graphon property $\{W\in\mathcal W_0:\,t_{\mathrm{ind}}(F,W)=0\}$ is flexible, since the condition means that for almost all vectors $\big(x_i:\,i\in V(F)\big)$, at least one of the factors in $\prod_{ij\in E_+}W(x_i,x_j)\prod_{ij\in E_-}\big(1-W(x_i,x_j)\big)$ is 0, which is preserved if values strictly between 0 and 1 are changed. By Proposition 14.21, the closure of any hereditary property can be defined by conditions of the form $t_{\mathrm{ind}}(F,W)=0$. This implies:

Proposition 14.25. The closure of a hereditary property is flexible.
There are other, non-hereditary graph properties whose closure is flexible. Some of these are described in Exercise 14.31.

Every flexible property has the following interesting geometric feature:

Proposition 14.26. If $\mathcal R\subseteq\mathcal W_0$ is flexible, then $\mathcal W_0\setminus\mathcal R$ is convex.

Proof. Indeed, let $W_1,W_2\in\mathcal W_0\setminus\mathcal R$, and suppose that a convex combination $W=\alpha_1W_1+\alpha_2W_2\in\mathcal R$. Then for every $x,y\in[0,1]$ with $W(x,y)\in\{0,1\}$ we have $W_1(x,y)=W_2(x,y)=W(x,y)$, and so by the definition of flexibility, we must have $W_1,W_2\in\mathcal R$, a contradiction.

Corollary 14.27. If $\mathcal P$ is a hereditary graph property, then $\mathcal W_0\setminus\overline{\mathcal P}$ is convex.

We will discuss an application of this fact in extremal graph theory in Section 16.5.1.

Exercise 14.28. Prove that for a hereditary property $\mathcal P$ of graphs equality holds in (14.5) if and only if for every graph $G\in\mathcal P$ and $v\in V(G)$, if we add a new node $v'$ and connect it to all neighbors of $v$, then at least one of the two graphs obtained by joining or not joining $v$ and $v'$ has property $\mathcal P$.

Exercise 14.29. Prove that a graph property is not random-free if it contains large quasirandom bipartite graphs in the following sense: for every $\varepsilon>0$ there is $\delta>0$, a sequence of graphs $G_1,G_2,\dots\in\mathcal P$, and disjoint sets $S_n,T_n\subseteq V(G_n)$ with $|S_n|=|T_n|\ge\delta v(G_n)$ such that the bipartite graphs $G_n[S_n,T_n]$ form a quasirandom sequence with error $\varepsilon$.

Exercise 14.30. Prove that a graph property $\mathcal P$ is random-free if and only if for every $\varepsilon>0$ there is an $n\in\mathbb N$ such that for every graph $G\in\mathcal P$ with $v(G)\ge n$ and every $\varepsilon$-regular $k$-partition of $G$, all but $\varepsilon k^2$ pairs of partition classes span bipartite graphs whose edge density is at most $\varepsilon$ or at least $1-\varepsilon$.

Exercise 14.31. Prove that the closure of the following graph properties is flexible: (a) $G$ is a clique of size $\lceil|V(G)|/2\rceil$ together with $\lfloor|V(G)|/2\rfloor$ isolated nodes; (b) $\omega(G)\ge|V(G)|/2$; (c) $\alpha(G)\ge|V(G)|/2$; (d) there is a labeling of the nodes by $\{1,\dots,n\}$ such that all $(i,j)$ with $i+j\le n$ are connected by an edge.
14.4. Graphon varieties

The topic of this section is reminiscent of the setup of algebraic geometry: we study subsets of $\mathcal W$, called varieties, defined by equations specifying linear (equivalently, algebraic) dependence between subgraph densities. These equations are invariant under weak isomorphism, so we may work in $\widetilde{\mathcal W}$ if we wish. Conditions like this play a role in extremal graph theory, and a better understanding of these varieties seems to be an important direction in the study of graphons.

A set of kernels satisfying a condition of the form $t(g,W)=0$, where $g$ is a quantum graph, will be called a kernel variety. We get, of course, different versions by putting restrictions on $g$ and on $W$. A simple variety is defined by a simple quantum graph $g$. A graphon variety is the intersection of a kernel variety with $\mathcal W_0$. It is often convenient to restrict our attention to pure graphons; since every graphon is weakly isomorphic to a pure graphon, this is not an essential restriction.

It is clear that every kernel/graphon variety is closed under weak isomorphism. Every simple graphon variety is closed in the cut distance, and hence it can be considered as a closed (and hence compact) subset of the graphon space $(\widetilde{\mathcal W}_0,\delta_\square)$. However, this does not hold for non-simple varieties (see Example 14.36). The union and intersection of two [simple] graphon varieties are [simple] graphon varieties (Exercise 14.50).

We could try to be more general and consider the common solutions (in $\mathcal W$) of a system of constraints $t(f_1,W)=0,\dots,t(f_m,W)=0$. However, this could always be replaced by the single condition $t(f_1^2+\dots+f_m^2,W)=0$.

While a general theory of graphon varieties is not at hand, there are some interesting examples, which will be needed later on. We will see some less trivial varieties in Section 14.4.2, but this will need some preparation in Section 14.4.1.

Example 14.32 (Constants). As an immediate application of Claim 11.63, we get that every constant function $W=J_p$ forms a simple kernel variety. Indeed, this kernel can be defined by the equations $t(K_2,W)=p$ and $t(C_4,W)=p^4$.

Example 14.33 (Complete graphs). We have mentioned in the introduction that the densities of triangles and edges in a graph $G$ satisfy the inequality $t(K_3,G)\ge2t(K_2,G)^2-t(K_2,G)$, and equality holds if and only if $G$ is a blow-up of the complete graph. This is equivalent to saying that (up to weak isomorphism) the graphon variety defined by the equation $t(K_3-2K_2K_2+K_2,W)=0$ consists of the countably many graphons $W_{K_n}$ ($n=1,2,\dots$) and the identically-1 graphon.

Example 14.34 (Regularity). We call a kernel $d$-regular if $\int_0^1W(x,y)\,dy=d$ for almost all $0\le x\le1$. This kernel variety can be defined by two subgraph density constraints: $t(K_2,W)=d$ and $t(P_3,W)=d^2$. (This can be shown by a simpler version of the argument in the proof of Claim 11.63.) Regular kernels (without specifying the degree $d$) can be defined by the constraint $t(P_3-K_2K_2,W)=0$.

Example 14.35 (Hadamard kernels). Our next examples show that graphon varieties can encode quite substantial combinatorial complications. A symmetric $n\times n$ Hadamard matrix $B$ gives rise to a kernel $W_B$, which we alter a little to get a graphon $U_B=(W_B+1)/2$. We call $U_B$ an Hadamard graphon.
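Here is a quick sanity check (ours, not from the book) of the objects just defined, using the standard symmetric 4×4 Hadamard matrix; the identity $B^2=nI$ verified below is exactly what the variety argument that follows exploits.

```python
import numpy as np

H = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1],
              [1, -1, -1,  1]])    # a symmetric 4x4 Hadamard matrix
n = H.shape[0]
U = (H + 1) / 2                    # step values of the Hadamard graphon U_B

W = 2 * U - 1                      # back to the +-1 kernel W_B
print(np.array_equal(H @ H, n * np.eye(n)))   # B^2 = n I
print(np.allclose(W @ W / n, np.eye(n)))      # so W o W corresponds to W_I
```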
Hadamard graphons, together with the graphon $J_{1/2}$, form a simple graphon variety. Indeed, the condition
(14.6) $t\big(K_3-2K_2K_2+K_2,\ 1-(2U-1)\circ(2U-1)\big)=0$
implies that $1-(2U-1)\circ(2U-1)$ is a kernel that is either identically 1 or corresponds to a complete graph (Example 14.33). Let $W=2U-1$; then either $W\circ W=0$ or $W\circ W=W_I$, where $I$ is an identity matrix of some size $n$. In the first case, $W=0$ and so $U=J_{1/2}$. In the second case, we note that every eigenvector of $T_{W\circ W}=T_W^2$ is a stepfunction with steps $[0,1/n),\dots,[(n-1)/n,1)$, and hence so are the eigenvectors of $T_W$. It follows that $W$ is a stepfunction with these steps, and so $W=W_B$ for some $n\times n$ matrix $B$. Furthermore, $W\circ W=W_I$ implies that $B^2=nI$. Since $U$ is a graphon, we have $-1\le W\le1$, and so every entry of $B$ is in $[-1,1]$. The condition $B^2=nI$ implies that $\sum_iB_{ij}^2=n$ for every $j$, which implies that every entry of $B$ is either 1 or $-1$, and so $B$ is an Hadamard matrix. We must add that (14.6) can be expanded into a subgraph density condition on $U$, using the fact that $t(F,U\circ U)=t(F',U)$ (where $F'$ is the subdivision of $F$).

Example 14.36 (Zero-one valued graphons). It is not hard to see that $W\in\mathcal W$ is 0-1 valued almost everywhere if and only if $t(B_4-2B_3+B_2,W)=0$ (one approach is to note that $W$ is 0-1 valued iff $t_{xy}(B_2^{\bullet\bullet},W)=t_{xy}(B_1^{\bullet\bullet},W)$, and use Lemma 14.37 below). Hence 0-1 valued graphons form a kernel variety. However, the variety of 0-1 valued graphons is not simple, because it is not closed in the cut distance: for a quasirandom graph sequence $(G_n)$ the associated graphons $W_{G_n}$ are 0-1 valued, but $W_{G_n}\to J_{1/2}$ in the $\|\cdot\|_\square$ norm.

14.4.1. Unlabeling. Before describing more complicated graphon varieties, we introduce a tool that is very useful in constructing varieties. Instead of prescribing subgraph densities, we can try to define graphon or kernel varieties by a (seemingly) more general condition on the density function of a $k$-labeled graph or quantum graph: such conditions can be written as $t_{\mathbf x}(g,W)=0$ (for all $\mathbf x\in[0,1]^k$) for some $k$-labeled quantum graph $g$. However, there is a way to translate labeled constraints to unlabeled constraints. This fact will be convenient in constructions, since it is often easier to describe a property by the density of a labeled quantum graph.

Lemma 14.37. For every $k$-labeled quantum graph $f$ there is an unlabeled quantum graph $g$ such that for any $W\in\mathcal W$, $t(g,W)=0$ if and only if $t_{x_1\dots x_k}(f,W)=0$ almost everywhere. If $f$ is simple, then we can require that $g$ is simple, and the labeled nodes form a stable set in every constituent of $g$.

Proof. The first assertion is trivial: $t_{x_1\dots x_k}(f,W)=0$ almost everywhere if and only if $t([[f^2]],W)=0$. This construction works for the second statement as well, provided the labeled nodes form a stable set in every constituent of $f$.

To prove the second statement for every simple $k$-labeled quantum graph $f$, define $\mathrm{Lb}(f)$ as the disjoint union of the subgraphs of the constituents induced by the labeled nodes (note that these subgraphs all have the same node set $[k]$). We use induction on the chromatic number $\chi\big(\mathrm{Lb}(f)\big)$. If $\chi\big(\mathrm{Lb}(f)\big)=1$, then the labeled nodes are nonadjacent in every constituent, and the trivial construction above works.
Suppose that $\chi\big(\mathrm{Lb}(f)\big)=r>1$, let $[k]=S_1\cup\dots\cup S_r$ be an $r$-coloring of $\mathrm{Lb}(f)$, and let $q=|S_r|$. We may suppose that $S_r=\{k-q+1,\dots,k\}$. We glue together two copies of $f$ along $S_r$. Formally, let $f_1$ be obtained from $f$ by increasing the labels in $S_r$ by $k-q$ (the labels not in $S_r$ are not changed). Let $f_2$ be obtained from $f$ by increasing all labels by $k-q$. So the product $f_1f_2$ is a $(2k-q)$-labeled quantum graph, in which the nodes of $S_r$ are labeled $2k-2q+1,\dots,2k-q$. Let $h$ be obtained from $f_1f_2$ by unlabeling the nodes in $S_r$.

Claim 14.38. For every $W\in\mathcal W$, $t_{x_1\dots x_k}(f,W)=0$ almost everywhere if and only if $t_{x_1\dots x_{2k-2q}}(h,W)=0$ almost everywhere.

The "only if" part is obvious, since
$$t_{x_1\dots x_k}(f,W)=0\ \Rightarrow\ t_{x_1\dots x_{2k-q}}(f_1,W)=t_{x_1\dots x_{2k-q}}(f_2,W)=0\ \Rightarrow\ t_{x_1\dots x_{2k-q}}(f_1f_2,W)=0\ \Rightarrow\ t_{x_1\dots x_{2k-2q}}(h,W)=0.$$
To prove the "if" part, note that two labeled nodes whose labels correspond to the same label in f are never adjacent, so we can identify these labels in h to get f² (with the labels in Sr removed). So t_{x1...x2k−2q}(h, W) = 0 almost everywhere implies by Proposition 13.23 that t([[f²]], W) = 0, and hence we get that t_{x1...xk}(f, W) = 0 almost everywhere. This proves the Claim.

Thus it suffices to express the constraint t_{x1...x2k−2q}(h, W) = 0 by an appropriate unlabeled constraint. This can be done by induction, since χ(Lb(h)) ≤ r − 1.

In some cases, the following simple observation suffices to go between labeled and unlabeled conditions.

Lemma 14.39. Let F be a k-labeled signed graph. Then in W0, the constraints t_{x1...xk}(F, W) = 0 and t([[F]], W) = 0 define the same graphon variety.

Proof. Clearly t_{x1...xk}(F, W) = 0 implies that t([[F]], W) = 0. Conversely, in the constraint

t([[F]], W) = ∫_{[0,1]^{V(F)}} ∏_{ij∈E+} W(x_i, x_j) ∏_{ij∈E−} (1 − W(x_i, x_j)) dx = 0
the integrand is nonnegative, so it must be 0 almost everywhere. Thus integrating only over the unlabeled nodes, we also get 0 almost everywhere.

Example 14.40. The identically-p graphon Jp is defined by the constraint t_{xy}(K2••, U) = p. Following the above construction, we get that it can be defined by the constraint t(C4(p), U) = 0, where C4(p) is obtained from C4 by replacing each edge by the quantum graph K2•• − pO2. It is a good exercise to verify that this is equivalent to the conditions t(K2, U) = p and t(C4, U) = p⁴.

We have seen that 0-1 valued graphons do not form a simple variety. On the other hand, there are many interesting varieties whose elements are 0-1 valued. As a first application of Lemma 14.37, we describe a rather general sufficient condition, generalizing Theorem 13.35(a) from graphons to kernels.

Lemma 14.41. Let F be a signed bipartite graph on n nodes, all labeled. Suppose that for some W ∈ W we have

(14.7)  t_{x1x2...xn}(F, W) = 0

almost everywhere. Then W(x, y) ∈ {0, 1} almost everywhere.
Proof. By Proposition 13.23, (14.7) implies that for the 2-labeled signed bond B•• obtained by identifying each color class of F, we have t_{xy}(B••, W) = 0. This clearly implies that W is 0-1 valued almost everywhere.

It follows by Lemma 14.39 that if W ∈ W0, then it is enough to assume that t([[F]], W) = 0, and we get another proof of Theorem 13.35(a).

Example 14.42 (Threshold graphons). Let α : [0, 1] → [0, 1] be a "weight function", and consider the graphon Uα(x, y) = 1(α(x) + α(y) < 1). We call these graphons threshold graphons. They have been studied by Diaconis, Holmes and Janson [2008] as limits of threshold graphs. Every such weight function α has a "monotone reordering" in the form of a measure preserving function φ : [0, 1] → [0, 1] such that α∘φ is monotone increasing (see Proposition A.19). Then U_{α∘φ} = (Uα)^φ is a graphon that is weakly isomorphic to Uα. Furthermore, U_{α∘φ} is monotone decreasing, and clearly 0-1 valued. Conversely, every monotone decreasing 0-1 valued graphon is almost everywhere equal to a threshold graphon (see Exercise 14.53).

Threshold graphons form a simple variety. Let Ĉ4 denote a signed 4-labeled 4-cycle, with two opposite edges signed "+", the other two signed "−". Then, a kernel W ∈ W is a threshold graphon if and only if

(14.8)  t_{x1x2x3x4}(Ĉ4, W) = 0

almost everywhere. The necessity of this condition is trivial; for the (elementary) proof of the sufficiency, see Exercise 14.55. It follows by Lemma 14.39 that if W ∈ W0, then it is enough to assume that t([[Ĉ4]], W) = 0.

Example 14.43 (Excluded induced subgraphs). From any class of graphs that is characterized by a finite number of excluded induced subgraphs, we get a graphon variety by taking the closure (this is immediate by Proposition 14.21). It seems that the study of this closure often leads to quite interesting questions, which are mostly unexplored. As an interesting special case, we can consider graphs not containing an induced path on 4 nodes. These graphs are called complement reducible, or cographs, and have many interesting properties and characterizations into which we don't go here. The closure consists of graphons W satisfying the equation t_ind(P4, W) = 0. While a complete characterization of such graphons is awkward, it turns out that the regular ones among them (those satisfying t(P3, W) = t(K2K2, W)) have a quite pretty characterization (we refer to Lovász and Szegedy [2011] for details).

14.4.2. Stepfunctions and finite rank kernels. Let Sk denote the set of kernels that are almost everywhere equal to a stepfunction with k steps.

Proposition 14.44. The set Sk is a simple kernel variety, defined by an equation t(f, .) = 0, where f is a simple quantum graph whose constituents have at most (k + 1)(k + 2) nodes.

Proof. It is clear that the set Sk is closed under weak isomorphism. Every function U ∈ Sk satisfies the following equation:

(14.9)  ∏_{1≤i<j≤k+1} (U(x_{ij}, x_i) − U(x_{ij}, x_j)) = 0.

(Indeed, among any k + 1 points x_1, . . . , x_{k+1} of [0, 1], two must fall into the same step of U, and then the corresponding factor vanishes for every choice of the x_{ij}.)

This shows that the events G[S] = F and G[T] = F are not independent, and hence µ is not local.

Exercise 14.63. Prove that the sigma-algebra of Borel sets in the metric space (W̃0, δ□) is generated by the "semivarieties" S(F, a) = {W ∈ W̃0 : t(F, W) ≥ a}, where F is a simple graph and a is a rational number.
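The density criterion of Example 14.36 above is easy to probe numerically: t(B4 − 2B3 + B2, W) = ∫∫ W²(1 − W)², which vanishes exactly when W is 0-1 valued almost everywhere. Here is a minimal Python sketch (our own illustration, not from the text; it discretizes a graphon given as a function on a grid):

```python
def zero_one_defect(W, grid=400):
    """Approximate t(B4 - 2B3 + B2, W) = integral of W^2 (1 - W)^2 over [0,1]^2."""
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            w = W((i + 0.5) / grid, (j + 0.5) / grid)
            total += (w * (1 - w)) ** 2
    return total / grid**2

print(zero_one_defect(lambda x, y: 1.0 if x + y < 1 else 0.0))  # ~0: a threshold graphon
print(zero_one_defect(lambda x, y: 0.5))                        # 1/16: the constant 1/2
```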
14.6. Exponential random graph models

In this last section of our study of the graphon space, we sketch some results in the theory of random graphs where viewing the graphon space as a single compact
metric space is crucial. Chatterjee and Varadhan [2011] applied the theory of graph limits to the theory of large deviations for Erdős–Rényi random graphs. This was extended by Chatterjee and Diaconis [2012] to more general distributions on graphs, which they call exponential random graph models. We summarize their ideas without going into the details of the proofs.

Let f be a bounded graph parameter such that for every convergent graph sequence (Gn), the numerical sequence f(Gn) is convergent. Such parameters are called estimable. The canonical examples of such parameters are subgraph densities t(F, .), but there are many others. We will return to them in Section 15.1 to study their estimation through sampling and other characterizations. Right now we only need the fact that every such parameter can be extended to the graphon space W̃0 so that if Gn → W then f(Gn) → f(W), and the extension is continuous in the distance δ□ (in particular, the extension is invariant under weak isomorphism). These facts are immediate consequences of the definition.

Suppose that we want to understand the structure of a random graph, but under the condition that f(G) is small. For example, Chatterjee and Varadhan were interested in random graphs G(n, 1/2) in which the triangle density is much less than 1/8 (the expectation). To this end, we consider a weighting of all simple graphs on n nodes by e^{−f(G)n²}; this will emphasize those graphs for which f(G) is small. The factor n² in the exponent is needed to make the logarithms of the weights have the same order of magnitude as the logarithm of the total number of simple graphs on [n] (which is just (n choose 2), if you take binary logarithm). We introduce the probability distribution φn on F_n^{simp} by

φn(G) = e^{−f(G)n²} / ∑_{G′∈F_n^{simp}} e^{−f(G′)n²}.
Let ψn denote the normalizing factor in the denominator. It looks quite hairy, but Chatterjee and Diaconis derived an asymptotic formula for it. To state their result, we need some notation. For W ∈ W0, consider the entropy-like functional

I(W) = (1/2) ∫_{[0,1]²} (W(x, y) log W(x, y) + (1 − W(x, y)) log(1 − W(x, y))) dx dy.

Chatterjee and Varadhan proved that this functional is invariant under weak isomorphism and lower semicontinuous on the space (W̃0, δ□) (this fact is quite similar to Lemma 14.16). The formula of Chatterjee and Diaconis can be stated as follows:

Theorem 14.64. If f is an estimable graph parameter, then

lim_{n→∞} ψn = sup_{W∈W0} (f(W) − I(W)).
Using this, they prove the following result about the behaviour of a random graph drawn from the distribution φn. Since f(W) − I(W) is upper semicontinuous on the compact space (W̃0, δ□), the supremum in the above formula is in fact a maximum, and it is attained on a compact set Kf ⊆ W0.

Theorem 14.65. Let f be an estimable graph parameter, and let Gn be a random graph from the distribution φn. Then for every η > 0 there are C, ε > 0 such that

P(δ□(WGn, Kf) > η) ≤ C e^{−εn²}.
This implies that if we choose a random Gn from φn for every n, then δ□(WGn, Kf) → 0 with probability 1. If Kf consists of a single graphon W0 (which is in a sense the "generic" case), then Gn → W0 with probability 1.

Theorems 14.64 and 14.65 provide a framework for analyzing the behavior of exponential random graph models. This is by no means easy, and the results are interesting. Most work has been done for parameters of the form f(G) = β1 t(K2, G) + β2 t(K3, G) (extending, in a sense, our discussions in Sections 2.1.1 and 16.3.2). We refer to Chatterjee and Diaconis [2012], Aristoff and Radin [2012], and Radin and Yin [2012] for recent results.
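The variational formula of Theorem 14.64 becomes concrete in the simplest case, f(G) = β1 t(K2, G). Then f(W) − I(W) is an integral over [0,1]² of a function of W(x, y) that is concave in W(x, y), so by Jensen's inequality the supremum is attained at a constant graphon, i.e., at an Erdős–Rényi model. The following minimal Python sketch (our own illustration, not from the text, and assuming the sign conventions of Theorem 14.64 as stated above) finds the optimal constant by a grid search and compares it with the stationary point 1/(1 + e^{−2β1}).

```python
import math

def I0(u):
    """Integrand of I(W) at a constant: (1/2)(u log u + (1-u) log(1-u))."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return 0.5 * (u * math.log(u) + (1.0 - u) * math.log(1.0 - u))

def optimal_edge_density(beta1, steps=10**5):
    """Grid-maximize g(u) = beta1*u - I0(u); g is concave, so this is safe."""
    return max((i / steps for i in range(steps + 1)),
               key=lambda u: beta1 * u - I0(u))

beta1 = 0.3
print(optimal_edge_density(beta1))            # ~0.6457
print(1.0 / (1.0 + math.exp(-2.0 * beta1)))   # stationary point; same value
```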
CHAPTER 15
Algorithms for large graphs and graphons

We have seen in the Introduction that different kinds of algorithmic questions can be asked for a very large graph: we may want to estimate a parameter, test a property, or compute (in some sense) an additional structure for the graph. We sneak in a fourth one, making a distinction between "property distinction" and "property testing". We will see that the theory of graph limits and other methods developed in this book provide valuable tools for the theoretical understanding of all these types of algorithmic problems.
15.1. Parameter estimation

We want to determine some parameter of a very large graph G. Of course, we'll not be able to determine the exact value of this parameter; the best we can hope for is that if we take a sufficiently large sample, we can find the approximate value of the parameter with high probability. To be precise, a graph parameter f is estimable, if for every ε > 0 there is a positive integer k such that if G is a graph with at least k nodes and we select a random k-set X of nodes of G, then from the subgraph G[X] induced by them we can compute an estimate g(G[X]) of f such that

(15.1)
P(|f (G) − g(G[X])| > ε) < ε.
We call the parameter g a test parameter for f. However, we don't really need this notion: we can always use g = f (cf. Goldreich and Trevisan [2003]). Indeed, (15.1) implies that P(|f(G[X]) − g(G[X])| > ε) < ε, and so

P(|f(G) − f(G[X])| > 2ε) ≤ P(|f(G) − g(G[X])| > ε) + P(|g(G[X]) − f(G[X])| > ε) < ε + ε = 2ε,

so we can choose the threshold k belonging to ε/2 in the original definition to get the condition obtained by replacing g by f.

It is easy to see that estimability is equivalent to saying that for every convergent graph sequence (Gn), the sequence of numbers (f(Gn)) is convergent. (So graph parameters of the form t(F, .) are estimable by the definition of convergence.) Using this, for any estimable parameter f we can define a functional f̂ on W0, where f̂(W) is the limit of f(Gn) for any sequence of simple graphs Gn → W. It is also immediate that this functional f̂ is continuous on (W̃0, δ□). The functional f̂ does not determine the graph parameter f: defining f0(G) = f̂(WG) we get a graph parameter with f̂0 = f̂, but f could be any parameter of the form f0 + h, where h(G) → 0 if v(G) → ∞.
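To see the definition in action with g = f, the sketch below (our own illustration, not from the text) takes f to be the triangle density and estimates f(G) by averaging f(G[X]) over random k-sets X; by the discussion above, this is the generic shape of any estimator.

```python
import random
from itertools import combinations

def triangle_density(adj, nodes):
    """Density of triangles among 3-subsets of `nodes`; this injective
    density differs from t(K3, G[nodes]) only by O(1/|nodes|)."""
    triples = list(combinations(nodes, 3))
    hits = sum(1 for u, v, w in triples
               if adj[u][v] and adj[u][w] and adj[v][w])
    return hits / len(triples)

def estimate(adj, k, reps=50):
    """Average f(G[X]) over random k-sets X; concentrates around f(G)."""
    n = len(adj)
    return sum(triangle_density(adj, random.sample(range(n), k))
               for _ in range(reps)) / reps

# toy instance: G(n, 1/2), whose triangle density is about 1/8
n = 300
adj = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        adj[i][j] = adj[j][i] = int(random.random() < 0.5)
print(estimate(adj, k=25))
```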
All this is, however, more-or-less just a reformulation of the definition. Borgs, Chayes, Lovász, Sós and Vesztergombi [2008] gave a number of more useful conditions characterizing estimability of a graph parameter. We formulate one, which is perhaps easiest to verify for concrete parameters.

Theorem 15.1. A graph parameter f is estimable if and only if the following three conditions hold:
(i) If Gn and G′n are simple graphs on the same node set (n = 1, 2, . . .) and d□(Gn, G′n) → 0, then f(Gn) − f(G′n) → 0.
(ii) For every simple graph G, f(G(m)) has a limit as m → ∞ (recall that G(m) denotes the graph obtained from G by blowing up each node into m twins).
(iii) f(GK1) − f(G) → 0 if v(G) → ∞ (recall that GK1 is obtained from G by adding a single isolated node).

Note that all three conditions are special cases of the statement that
(iv) if |V(Gn)|, |V(G′n)| → ∞ and δ□(Gn, G′n) → 0, then f(Gn) − f(G′n) → 0.
This condition is also necessary, so it is equivalent to its own three special cases (i)–(iii) in the Theorem.

Proof. The necessity of condition (iv) (which implies (i)–(iii)) is easy: Suppose that there are two sequences of graphs (Gn) and (G′n) such that |V(Gn)|, |V(G′n)| → ∞ and δ□(Gn, G′n) → 0, but f(Gn) − f(G′n) ↛ 0. By selecting a subsequence, we may assume that |f(Gn) − f(G′n)| > ε for all n for some ε > 0. Going to a further subsequence, we may assume that the sequences (G1, G2, . . .) and (G′1, G′2, . . .) are convergent. But then δ□(Gn, G′n) → 0 implies that the interlaced graph sequence (G1, G′1, G2, G′2, . . .) is convergent as well. However, the numerical sequence (f(G1), f(G′1), f(G2), f(G′2), . . .) is not convergent, a contradiction.

To prove the sufficiency of (i)–(iii), we start with proving the following stronger form of (i):
(i′) If Gn and G′n are simple graphs with the same number of nodes (n = 1, 2, . . .) and δ□(Gn, G′n) → 0, then f(Gn) − f(G′n) → 0.
This follows by Theorem 9.29, which implies that one can overlay the graphs Gn and G′n so that d□(Gn, G′n) → 0.

Consider a convergent graph sequence (G1, G2, . . .); we prove that the sequence (f(G1), f(G2), . . .) is convergent. Let ε > 0. Using (i′), we can choose an ε1 > 0 so that if δ□(G, G′) ≤ ε1, then |f(G) − f(G′)| ≤ ε. Since the graph sequence is convergent, we can choose and fix an integer n ≥ 1 so that δ□(Gn, Gm) ≤ ε1/2 for m ≥ n. By (ii), the sequence (f(Gn(p)) : p = 1, 2, . . .) is convergent. Let a be its limit; then we can choose a threshold p0 ≥ 1 so that |f(Gn(p)) − a| ≤ ε for every integer p ≥ p0. We may assume that p0 ≥ 4/ε1. Finally, based on (iii), we can choose a threshold q ≥ 1 such that |f(GK1) − f(G)| ≤ ε/v(Gn) whenever v(G) ≥ q.

Now consider a member Gm of the sequence for which m ≥ n and v(Gm) ≥ max(q, p0 v(Gn), 4v(Gn)/ε1). We can write v(Gm) = p v(Gn) + r, where p ≥ p0 and 0 ≤ r < v(Gn). Then Gm and G′ = Gn(p)K1^r have the same number of nodes. Furthermore,

δ□(Gm, G′) ≤ δ□(Gm, Gn(p)) + δ□(Gn(p), G′) ≤ δ□(Gm, Gn) + 2r/(p v(Gn)) ≤ ε1.
Hence |f(Gm) − f(G′)| ≤ ε, and so

|f(Gm) − a| ≤ |f(Gm) − f(G′)| + |f(G′) − f(Gn(p))| + |f(Gn(p)) − a| ≤ ε + r·ε/v(Gn) + ε < 3ε.

So for any two indices m1 and m2 that are large enough, we have |f(Gm1) − f(Gm2)| < 6ε, which proves that f is estimable.

Example 15.2 (Maximum cut). As a basic example, consider the density of maximum cuts (recall Example 5.18). One of the first substantial results on property testing (Goldreich, Goldwasser and Ron [1998], Arora, Karger and Karpinski [1995]) is that this parameter is estimable. In the introduction we gave an argument (which can be made precise using high concentration results like Azuma's inequality) that if S is a sufficiently large random subset of nodes of G, then maxcut(G[S]) ≥ maxcut(G) − ε: a large cut in G, when restricted to S, gives a large cut in G[S]. It is harder, and in fact quite surprising, that if most subgraphs G[S] have a large cut, then so does G. This follows from Theorem 15.1 above, since conditions (i)–(iii) are easily verified for f = maxcut.

Example 15.3 (Free energy). The "free energy" is a statistical physical quantity. Recall the definition of the energy of a map σ : V(G) → [q] (a configuration) from the introduction:

(15.2)  H(σ) = − ∑_{uv∈E(G)} J_{σ(u),σ(v)},
and also the partition function

(15.3)  Z = ∑_{σ:V(G)→[q]} e^{−H(σ)/T},
where T is the temperature (for simplicity, we don't consider an external field). The mean field partition function of G can be obtained (formally) by considering a very high temperature:

(15.4)  Z_mean = ∑_{σ:V(G)→[q]} e^{−H(σ)/v(G)}.
The free energy is defined by

(15.5)  F(G, H) = − ln Z(G, H) / v(G).

It would exceed the framework of this book to explain the physics behind these names; let us just treat them as graph parameters related to homomorphism numbers. Note that the normalization is different from (2.15) in the exponent and therefore we only divide by v(G) (as opposed to (5.33), for example). For more about this connection, we refer to Borgs, Chayes, Lovász, Sós and Vesztergombi [2012].

The free energy (for a fixed weighted graph H) is a more complicated example of an estimable parameter, which illustrates the power of Theorem 15.1. It is difficult to verify directly either the definition, or say condition (iv). The theorem splits this task into three: condition (i) is easy by the definition of d□(G, G′); (ii) is an exercise in classical combinatorics, in which we have to count mappings that split the twin classes in given proportions; finally, (iii) is trivial.
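For a small graph, the quantities (15.2)–(15.5) can be checked by brute force over all q^{v(G)} configurations. The following Python sketch is our own illustration (the coupling matrix J and the toy graph are made-up inputs, and we use the mean field normalization of (15.4)):

```python
import math
from itertools import product

def energy(edges, sigma, J):
    """H(sigma) = -sum over edges uv of J_{sigma(u), sigma(v)}, as in (15.2)."""
    return -sum(J[sigma[u]][sigma[v]] for u, v in edges)

def mean_field_free_energy(n, edges, J, q):
    """-ln(Z_mean)/v(G), with Z_mean from (15.4)."""
    Z = sum(math.exp(-energy(edges, sigma, J) / n)
            for sigma in product(range(q), repeat=n))
    return -math.log(Z) / n

# made-up toy input: a 4-cycle with a two-state "agree/disagree" coupling
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
J = [[1.0, -1.0], [-1.0, 1.0]]
print(mean_field_free_energy(4, edges, J, q=2))
```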
Exercise 15.4. Show that neither one of the three conditions in Theorem 15.1 can be dropped. Exercise 15.5. Use Theorem 15.1 to prove that the i-th largest eigenvalue of a graph is an estimable parameter for every fixed i ≥ 1. Give a new proof of Theorem 11.54 based on this argument. Exercise 15.6. Fix a graphon W , then cut(W, F ), as a function of F , defines a simple graph parameter. Prove that it is estimable.
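As a numerical companion to Exercise 15.5: the largest adjacency eigenvalue, normalized by the number of nodes, stabilizes as the sample grows. A minimal sketch (our own illustration, using numpy; for the constant graphon W ≡ 1/2 the limit is 1/2):

```python
import numpy as np

def top_eigenvalue_density(A):
    """Largest eigenvalue of the adjacency matrix, normalized by the node count."""
    return float(np.linalg.eigvalsh(A).max()) / len(A)

rng = np.random.default_rng(0)
for n in (50, 200, 800):
    U = rng.random((n, n))
    A = np.triu((U < 0.5).astype(float), 1)
    A = A + A.T                          # symmetric 0-1 adjacency matrix of G(n, 1/2)
    print(n, top_eigenvalue_density(A))  # tends to 1/2
```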
15.2. Distinguishing graph properties

An algorithmic task of different nature is to test whether a given graph G has a certain property (e.g., is it connected, bipartite, or perfect). Before (e.g. in Section 4.3) we considered graph properties as 0-1 valued graph parameters. In this case, this would not be a useful approach, at least not if we wanted to reduce the question to parameter estimation: getting a 0-1 valued parameter with less than 50 percent error is tantamount to getting it exactly, which is clearly too much to require for very large graphs. Therefore we have to modify the definition to allow an error (which is then equal to 1, so very large in our setting), but with small probability.

We start with a version of the problem that turns out simpler: how to distinguish two properties? Deciding whether a graph has a given property P will then be treated as distinguishing graph property P from the set of graphs that are far from having this property.

Let P1 and P2 be two graph properties that are exclusive (i.e., P1 ∩ P2 = ∅). We want to design a method that, given an arbitrarily large graph G, looks at a sample of some fixed size k, and guesses whether the graph has property P1 or P2. If the graph has property P1 or property P2, we would like to guess right with rather high probability, say at least 2/3. If the graph does not have either one of the properties, then we don't care what the guess was.

The guess will be based on a third graph property Q, which we call the test property. If the sample G(k, G) has property Q, we guess that G has property P1; else, we guess that G has property P2. So the precise definition is the following: we call properties P1 and P2 distinguishable by sampling, if there exists a positive integer k and a test property Q such that for every graph G with at least k nodes

P(G(k, G) ∈ Q) ≥ 2/3 if G ∈ P1,  and  P(G(k, G) ∈ Q) ≤ 1/3 if G ∈ P2.

The following lemma shows that the numbers 1/3 and 2/3 in this definition are arbitrary; we could replace them with any two numbers a and b with 0 < a < b < 1. Furthermore, we can replace the number k by every sufficiently large integer.

Lemma 15.7. Let P1 and P2 be two graph properties with P1 ∩ P2 = ∅. Let 0 < a < b < 1 and 0 < c < d < 1. Suppose that there is a positive integer k and a test property Q such that for every graph G with at least k nodes

P(G(k, G) ∈ Q) ≥ b if G ∈ P1,  and  P(G(k, G) ∈ Q) ≤ a if G ∈ P2.
Then for every positive integer k′ that is large enough there is a test property Q′ such that for every graph G with at least k′ nodes

P(G(k′, G) ∈ Q′) ≥ d if G ∈ P1,  and  P(G(k′, G) ∈ Q′) ≤ c if G ∈ P2.

Proof. Let k′ > k. For every graph F on k′ nodes, let f(F) = P(G(k, F) ∈ Q), and define the property Q′ as the set of graphs F on k′ nodes such that f(F) ≥ (a + b)/2.

Let G ∈ P1, v(G) ≥ k′. Since G(k, G(k′, G)) is a random k-node subgraph of G, we have

f0 = E(f(G(k′, G))) = P(G(k, G) ∈ Q) ≥ b.

Furthermore, the graph parameter f has the property that if we change edges in F incident with a given node v, then the validity of the event G(k, F) ∈ Q changes only if the random k-subset contains v, which happens with probability k/k′. So the value f(F) changes by at most k/k′. We can apply the Sample Concentration Theorem 10.2 to the parameter (k′/k)f, and get that

P(G(k′, G) ∉ Q′) = P(f(G(k′, G)) ≤ (a + b)/2) ≤ P(f(G(k′, G)) ≤ f0 − (b − a)/2) ≤ e^{−t},

where t = (b − a)² k′/(8k²). Choosing k′ large enough, this will be less than 1 − d, proving that Q′ and k′ satisfy the first condition in the lemma. The second condition follows similarly.

Using our Sampling Lemmas, we can give the following characterization of distinguishable properties.

Theorem 15.8. For two graph properties P1 and P2, the following are equivalent:
(a) P1 and P2 are distinguishable by sampling;
(b) there exists a positive integer k such that for any Gi ∈ Pi with v(Gi) ≥ k, we have δ□(G1, G2) ≥ 1/k;
(c) there exists a positive integer k such that for any Gi ∈ Pi with v(Gi) ≥ k, we have

d_var(G(k, G1), G(k, G2)) ≥ 1/3.

Note that (b) could be phrased as P̄1 ∩ P̄2 = ∅, where the bar denotes the closure in the graphon space.

Proof. (a)⇒(c): Let P1 and P2 be distinguishable with sample size k and test property Q. Then for any two graphs Gi ∈ Pi, we have

P(G(k, G1) ∈ Q) − P(G(k, G2) ∈ Q) ≥ 1/3.
On the other hand,

P(G(k, G1) ∈ Q) − P(G(k, G2) ∈ Q) = ∑_{F∈Q} (P(G(k, G1) = F) − P(G(k, G2) = F))
  ≤ ∑_F (P(G(k, G1) = F) − P(G(k, G2) = F))_+ = d_var(G(k, G1), G(k, G2)).

This proves (c).

(c)⇒(b): This follows immediately from the Counting Lemma 10.22.

(b)⇒(a): Let δ□(P1, P2) = c > 0. Let k be large enough, and define the test property Q by

(15.6)  Q = {F : v(F) = k, δ□(F, P1) ≤ c/2}.

To see that this is a valid test property, consider any graph G ∈ P1 with v(G) ≥ k. By the Second Sampling Lemma 10.15, we have with probability at least 2/3 that

δ□(G, G(k, G)) ≤ 20/√(log k) < c/2,

and if this happens, then G(k, G) ∈ Q. The condition on graphs in P2 follows similarly.

As a consequence of this proof, we can make the following observation: While the test property Q looks like the key to this testing method, it can in fact be chosen in a very specific way, in the form (15.6).

We can also talk about distinguishing two graphon properties by testing. We assume that we can get information about a given graphon W by generating a W-random graph G(k, W). All the above can be repeated in this model, and we will not go through the details.

15.3. Property testing

Testing for a single property is a more complicated business than distinguishing two properties. There is a large literature on this subject; here we restrict our attention to aspects that are related to graph limit theory, based on Lovász and Szegedy [2010a]. We start with a discussion of testing for a graphon property (which is easily handled using the results about distinguishing two properties), and then add the necessary work to apply this to testing graph properties.

15.3.1. Testable graphon properties. We call a graphon property R testable if it is closed in the δ□ metric, and there is a graph property R′ (called the test property for R), such that
(a) for every graphon W ∈ R and every k ≥ 1, we have G(k, W) ∈ R′ with probability at least 2/3, and
(b) for every ε > 0 there is a kε ≥ 1 such that for every graphon W with d1(W, R) > ε and every k ≥ kε we have G(k, W) ∉ R′ with probability at least 2/3.
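In both the graph and graphon settings, the sampling protocol behind these definitions is the same: draw a k-sample, check the test property, and amplify the 2/3 guarantee by a majority vote over independent samples. A minimal Python sketch (our own framing, not from the text; `in_Q` stands for a membership test of the test property, assumed to be supplied):

```python
import random

def distinguish(nodes, adj, k, in_Q, reps=101):
    """Guess 'P1' or 'P2' by majority vote over repeated k-node samples."""
    votes = 0
    for _ in range(reps):
        X = random.sample(nodes, k)
        induced = {(u, v) for u in X for v in X if (u, v) in adj}
        if in_Q(X, induced):      # does the sampled induced subgraph lie in Q?
            votes += 1
    return 'P1' if votes * 2 > reps else 'P2'
```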
The definition above is clearly one-sided; there is no symmetry between the property and its complement. Several other versions could be defined, but we'll restrict our attention to this one. We could require (a) only for k ≥ k0, but we can put all smaller graphs into R′ for free, so this would not give anything different. On the other hand, we need the threshold in (b) to depend on ε: for a fixed k, if d1(W, R) is very small, then there is a graphon U ∈ R such that d1(W, U) is very small, and the distributions of G(k, U) and G(k, W) are almost the same, so no test property R′ can distinguish them.

Example 15.9 (Complete or edgeless). Let R be the graphon property satisfied by the identically-0 and identically-1 graphons. Then the graph property "complete or edgeless" is a good test property for R, so R is testable.

Example 15.10 (Constant graphon). Let R be the graphon property satisfied by the identically-1/2 graphon U. We show that this property is not testable. Consider a random graph Gn = G(n, 1/2); then ∥WGn − U∥□ → 0 with probability 1. Fix a sequence for which this happens. The distribution of G(k, WGn) tends to the distribution of G(k, 1/2) for every fixed k, so for every possible test property R′, if G(k, 1/2) ∈ R′ with probability at least 2/3, then G(k, WGn) ∈ R′ with probability at least 1/2 (for every k, if n is large enough). On the other hand, d1(WGn, R) = 1/2 for every n, and hence we should have G(k, WGn) ∉ R′ with probability at least 2/3 if k is large enough.

We note that the complementary property W0 \ R is testable: since d1(W, Rc) = 0 for every graphon W, the identically true property serves as a good test property.

Similarly as for property distinguishing, one may feel that the tricky choice of the test property R′ is crucial, but in fact, once a property is testable, we can use a very simple test property:

Proposition 15.11. Let R be a testable graphon property. Then

R′ = {F : v(F) = 1 or δ□(WF, R) ≤ 20/√(log v(F))}

is a valid test property for R.

Proof. First, suppose that W ∈ R, and let k ≥ 2. By the Second Sampling Lemma 10.16, we have

δ□(W, G(k, W)) ≤ 20/√(log k)

with probability at least 2/3. Thus G(k, W) ∈ R′ with probability larger than 2/3.

Second, let R′′ be any valid test property for R. By definition, for every ε > 0 there is a k ≥ 1 (depending on ε, R and R′′) such that whenever d1(W, R) > ε for some graphon W, then P(G(k, W) ∈ R′′) ≤ 1/3. Let U ∈ R; then P(G(k, U) ∈ R′′) ≥ 2/3. This implies that the variation distance of the distributions of G(k, W) and G(k, U) is at least 1/3. By Corollary 10.25, this implies that δ□(W, U) ≥ (1/3)·2^{−k²}. This holds for all U ∈ R, so δ□(W, R) ≥ (1/3)·2^{−k²}. Let n be large enough (depending on k), and consider the W-random graph G(n, W), and the corresponding graphon Wn = W_{G(n,W)}. Then with high probability

δ□(Wn, R) ≥ (1/3)·2^{−k²} − δ□(Wn, W) ≥ (1/3)·2^{−k²} − 20/√(log n) > 20/√(log n).
So G(n, W) ∉ R′ with high probability, and this shows that R′ is a valid test property.

It is not always easy to decide about a graphon property whether it is testable, and we will have to develop some theory to prove properties of further, more interesting examples. We start with showing that testability and distinguishability are closely related. Let Rcε denote the set of graphons W such that d1(W, R) ≥ ε.

Proposition 15.12. A graphon property R is testable if and only if it is distinguishable by sampling from the property Rcε for every ε > 0.

Proof. The "only if" part is straightforward to check. To verify the "if" part, suppose that R and Rcε are distinguishable by sampling for every ε > 0. Using Lemma 15.7, this means that for every ε > 0 there is a positive integer kε and a test property Qε such that for every k ≥ kε, we have

P(G(k, W) ∈ Qε) ≥ 2/3 if W ∈ R,  and  P(G(k, W) ∈ Qε) ≤ 1/3 if W ∈ Rcε.

We may assume that kε increases if ε decreases. One difficulty is that we need to define a single test property Q, not the family {Qε}. But this is easy. Let F ∈ Q if and only if k_{1/m} ≤ v(F) < k_{1/(m+1)} and F ∈ Q_{1/m} for some positive integer m. Then Q works for every ε > 0. The second difficulty is that we want the conditions on P(G(k, W) ∈ Q) for all k, not just for k ≥ kε. But this property holds for k ≥ k1 by definition, so all we have to do is to include all graphs with fewer than k1 nodes in Q.

The following corollary to Theorem 15.8 and Proposition 15.12 provides an analytic characterization of testable graphon properties. Recall that the distances d1 and d□ are related trivially by d□ ≤ d1. Testability of a property means, in a sense, an inverse relation:

Corollary 15.13. A closed graphon property R is testable if and only if δ□(R, Rcε) > 0 for every ε > 0.

The condition given in this corollary can be rephrased in various ways, for example: For every sequence of graphons (Wn) such that d□(Wn, R) → 0, we have d1(Wn, R) → 0.

We show that this condition is equivalent to a seemingly weaker condition. (Recall the definition of flexing from Section 14.3.3.)

Lemma 15.14. A closed graphon property R is testable if and only if for every U ∈ R and every sequence of graphons (Wn) such that Wn → U and every Wn is a flexing of U, we have d1(Wn, R) → 0.

Proof. Let (Wn) be a sequence of graphons such that d□(Wn, R) → 0. If d1(Wn, R) ↛ 0, then we may take a subsequence for which lim inf d1(Wn, R) > 0, and then a further subsequence such that δ□(Wn, U) → 0 for some U ∈ R. Define

W′n(x, y) = U(x, y) if U(x, y) ∈ {0, 1}, and W′n(x, y) = Wn(x, y) otherwise.
Then W′n is a flexing of U. Furthermore, we have

∥Wn − W′n∥1 = ∫_{U=0} Wn + ∫_{U=1} (1 − Wn) → 0

by Lemma 8.22. This implies that δ□(Wn, W′n) → 0, and hence δ□(W′n, U) → 0. By the hypothesis of the lemma, this implies that d1(W′n, R) → 0. Hence

d1(Wn, R) ≤ d1(W′n, R) + ∥Wn − W′n∥1 → 0.
Since the condition of the last lemma is trivially fulfilled if R is flexible, we get a useful corollary:

Corollary 15.15. Every closed flexible graphon property is testable. In particular, the closure of every hereditary property is testable.

The following result can be viewed as the graphon analogue of the theorem of Fischer and Newman [2005] (from which the finite theorem can be derived).

Theorem 15.16. A closed graphon property R is testable if and only if the functional d1(., R) is continuous in the cut norm.

Proof. If d1(., R) is continuous, then Corollary 15.13 implies that R is testable. Suppose that R is testable. The functional d1(., R) is lower semicontinuous in the cut norm by Lemma 14.15. To prove upper semicontinuity, let W, Wn ∈ W0 and let ∥Wn − W∥□ → 0. We claim that lim sup_n d1(Wn, R) ≤ d1(W, R). Let ε > 0, and let U ∈ R be such that ∥W − U∥1 ≤ d1(W, R) + ε. By Proposition 8.25, there is a sequence of graphons Un such that ∥Un − U∥□ → 0 and ∥Un − Wn∥1 → ∥U − W∥1. By Corollary 15.13 (in the form in the remark after its statement), it follows that d1(Un, R) → 0, and so

d1(Wn, R) ≤ ∥Wn − Un∥1 + d1(Un, R) → ∥U − W∥1.

Hence

lim sup_{n→∞} d1(Wn, R) ≤ ∥U − W∥1 ≤ d1(W, R) + ε.
Since ε > 0 is arbitrary, this implies that d1 (., R) is upper semicontinuous.
Example 15.17 (Neighborhood of a property). Let S ⊆ W0 be an arbitrary graphon property and let a > 0 be an arbitrary number. Then the property R = {U ∈ W0 : δ□(U, S) ≤ a} is testable.

To show this, we use Corollary 15.13. For ε > 0 define ε′ = aε/2. Let W ∈ B□(R, ε′). Then W ∈ B□(S, a + ε′), and so there is a U ∈ S such that ∥U − W∥□ ≤ a + 2ε′. Consider Y = (1 − ε)W + εU. Then

∥Y − U∥□ = ∥(1 − ε)(U − W)∥□ ≤ (1 − ε)(a + 2ε′) < a,

so Y ∈ R. Furthermore, ∥W − Y∥1 = ∥ε(W − U)∥1 ≤ ε, and so W ∈ B1(R, ε). Since W was an arbitrary element of B□(R, ε′), this implies that δ□(R, Rcε) > ε′.

Example 15.18 (Subgraph density). For every fixed graph F and 0 < c < 1, the property R of a graphon W that t(F, W) = c is testable. Let us verify that for every ε > 0 there is an ε′ > 0 such that d1(W, R) ≥ ε implies that d□(W, R) ≥ ε′. Assume that d1(W, R) ≥ ε; then t(F, W) ≠ c; let (say) t(F, W) > c. The graphons Us = (1 − s)W, 0 ≤ s ≤ ε, are all in B1(W, ε), and hence not in R. It follows that t(F, Us) > c for all 0 ≤ s ≤ ε. Since t(F, Uε) = (1 − ε)^{e(F)} t(F, W), this implies that t(F, W) > (1 − ε)^{−e(F)} c. Thus for every U ∈ R we have t(F, W) − t(F, U) ≥ ((1 − ε)^{−e(F)} − 1)c. By the Counting Lemma 10.23 this implies that δ□(U, W) ≥ ((1 − ε)^{−e(F)} − 1)c/e(F). Choosing the right hand side in this inequality as ε′, we get that δ□(W, R) ≥ ε′, which we wanted to verify.

Fixing two subgraph densities, however, may yield a non-testable property: for example, t(K2, W) = 1/2 and t(C4, W) = 1/16 imply that W ≡ 1/2 (see Section 1.4.2), and we have seen that this graphon property is not testable.

15.3.2. Testable graph properties. We call a graph property P testable, if for every ε > 0, graphs in P can be distinguished from graphs farther than ε from P in the edit distance. To be more precise, recall that d1(F, G) is defined for two graphs on the same node set, and it denotes their normalized edit distance. So for a graph G on n nodes, d1(G, P) is the minimum number of edges to be changed in order to get a graph with property P, divided by n². If there is no graph in P on n nodes, then we define d1(G, P) = 1. Let Pεc denote the set of simple graphs F such that d1(F, P) ≥ ε; then we want to distinguish P from Pεc by sampling. This notion of testability is usually called oblivious testing, which refers to the fact that no information about the size of G is assumed.

Using our analytic language, we can give several reformulations of the definition of testability of a graph property, which are often more convenient to use.

(T1) if (Gn) is a sequence of graphs such that v(Gn) → ∞ and δ□(Gn, P) → 0, then d1(Gn, P) → 0;
(T2) for every ε > 0 there is an ε′ > 0 such that if G and G′ are simple graphs such that v(G), v(G′) ≥ 1/ε′, G ∈ P and δ□(G, G′) < ε′, then d1(G′, P) < ε;
(T3) P̄ ∩ P̄εc = ∅ for every ε > 0, where the bar denotes the closure in the graphon space.

The equivalence of these with testability follows by Theorem 15.8.

From this characterization of testability it follows that if P is a testable property such that infinitely many graphs have property P, then for every n that is large enough, it contains a graph on n nodes. Indeed, suppose that for infinitely many n, P contains a graph Gn on n nodes but none on n + 1 nodes. We may assume that Gn → W ∈ W0. Then GnK1 ∈ Pεc with ε = 1/2, and GnK1 → W, so W ∈ P̄ ∩ P̄εc, a contradiction.

It is surprising that this rather restrictive definition allows many testable graph properties: for example, bipartiteness, triangle-freeness, every property definable by a first order formula (Alon, Fischer, Krivelevich and Szegedy [2000]). Let us begin with some simple examples.

Example 15.19 (Nonempty). Let P be the graph property that "G has at least one edge". This is testable with the identically true test property. This example sounds like playing in a trivial way with the definition. The following examples are more substantial.

Example 15.20 (Large clique). Let P be the graph property ω(G) ≥ v(G)/2. Then P is testable. This can be verified using (T2): we show that for every ε > 0 there is an ε′ > 0 such that δ□(G, P) ≤ ε′ implies that d1(G, P) ≤ ε. We show that ε′ = exp(−10000/ε⁶) does the job.

Indeed, if δ□(G, P) ≤ ε′, then there is a graph H ∈ P such that δ□(G, H) ≤ 2ε′. Let V(H) = [q] and V(G) = [p]; then δ□(G(q), H(p)) ≤ 2ε′, and since v(G(q)) = v(H(p)) = pq, Theorem 9.29 implies that δ̂□(G(q), H(p)) ≤ 45/√(−log(2ε′)).
Then q_{r+1}, . . . , q_p < εq, and so (r − k)q + (r + k)εq ≥ ∑_i q_i = rq. This implies the bound on k. Let G1 = G[1, . . . , r]; then

(1/2)ε³p²q² ≥ ∑_{ij∈E(G1)} q_i q_j ≥ (e(G1) − kr)(εq)²,

whence

e(G1) ≤ kr + (ε/2)p² ≤ (ε/2)p² + (ε/2)p² = εp².

So adding at most εp² edges to G, we can create a complete subgraph with p/2 nodes, showing that d1(G, P) ≤ ε.

Example 15.21 (Triangle-free). Let P be the property of being triangle-free. Then P is a valid test property for itself. It is trivial that if G is triangle-free, then any sample G(k, G) is also triangle-free. The other condition is, however, far from being trivial. If G(k, G) is triangle-free with probability at least 2/3, and k is large enough, then G has very few triangles. Hence by the Removal Lemma 11.64, we get that we can change (in this case, delete) a small number of edges so that we get rid of all triangles. In fact, the Removal Lemma is equivalent to the testability of triangle-freeness. Theorem 15.24 below will give a general sufficient condition for testability, which will imply the Removal Lemma.

There is a tight connection between testability of graph properties and the testability of their closures. To formulate it, we need a further definition. A graph property P is robust, if for every ε > 0 there is an ε0 > 0 such that if G is a graph with v(G) ≥ 1/ε0 and d1(WG, P̄) ≤ ε0, then d1(G, P) ≤ ε. Another way of stating this is that if (Gn) is a sequence of graphs such that d1(WGn, P̄) → 0, then d1(Gn, P) → 0. (For a more combinatorial formulation of this property, see Exercise 15.30.)

Theorem 15.22. (a) A graphon property is testable if and only if it is the closure of a testable graph property. (b) A graph property is testable if and only if it is robust and its closure is testable.

It is not the first time in this book that results about graphons are nice and easy, while describing the connection between the notions for graphons and the corresponding notions for graphs is the hard part. This is true in this case too, and we will omit some details of the rather long proof of this theorem; see Lovász and Szegedy [2010a] for these details. Before going into the proof, let us look at an example.
Example 15.23 (Alternating). A graph property with a testable closure need not be itself testable. Let P be the graph property that the graph is complete if the number of nodes is even, but edgeless if the number of nodes is odd. Clearly, this is not testable. The closure of P is the graphon property valid for W ≡ 0 and W ≡ 1, which is testable (Example 15.9).

Proof. We start with proving that the closure of a testable graph property is testable. It suffices to prove that if (Wn) is a sequence of graphons such that d□(Wn, P̄) → 0, then d1(Wn, P̄) → 0. We may assume that the sequence (Wn) is convergent, so Wn → U for some U ∈ W0 (in the δ□ distance). Clearly U ∈ P̄, so by the definition of closure, there are graphs Hn ∈ P such that Hn → U.

Fix any ε > 0. By Theorem 15.16, there is an ε′ > 0 such that if |V(G)|, |V(H)| are large enough, H ∈ P, and δ□(G, H) < ε′, then d1(G, P) < ε. Furthermore, there is an nε ≥ 1 such that if n ≥ nε, then δ□(WHn, U), δ□(Wn, U) ≤ ε′/3. Fix any n ≥ nε, and let Gn,m (m = 1, 2, . . .) be a sequence of graphs such that Gn,m → Wn as m → ∞. Then, provided m is large enough,

δ□(Hn, Gn,m) ≤ δ□(WHn, U) + δ□(U, Wn) + δ□(Wn, WGn,m) < ε′.

Hence by the choice of ε′, we have d1(Gn,m, P) ≤ ε. This means that there are graphs Jn,m ∈ P with V(Jn,m) = V(Gn,m) such that d1(Gn,m, Jn,m) ≤ ε. By choosing a subsequence, we can assume that Jn,m → Un as m → ∞ for some Un ∈ P̄. Applying Lemma 14.16 we obtain that

d1(Wn, P̄) ≤ δ1(Wn, Un) ≤ lim inf_{m→∞} δ1(WGn,m, WJn,m) ≤ lim inf_{m→∞} d1(Gn,m, Jn,m) ≤ ε.

The proof of the converse in (a) (which we will not use in this book) is omitted.

Next we show that every testable graph property P is robust. Let (Gn) be a sequence of graphs such that d1(WGn, P̄) → 0; then d□(WGn, P̄) → 0. This implies that there are graphons Un ∈ P̄ such that d□(WGn, Un) → 0. By the definition of P̄, this implies that there are simple graphs Hn ∈ P such that δ□(Gn, Hn) → 0. This means that δ□(Gn, P) → 0. Since P is testable, this implies that d1(Gn, P) → 0.

Finally, we show that if a graph property P is robust and its closure is testable, then P is testable. Let (Gn) be a sequence of graphs such that δ□(Gn, P) → 0. Then δ□(WGn, P̄) → 0. Since P̄ is testable, this implies that d1(WGn, P̄) → 0. By robustness, we get that d1(Gn, P) → 0, which proves that P is testable.
have this property.) Clearly this theorem implies Theorem 15.24. The proof will be quite involved, using much of the material developed earlier in this chapter.

Theorem 15.25. A graph property P is testable if and only if for every ε > 0 there is an ε′ > 0 such that if H ∈ P and G is an induced subgraph of H with v(G) ≥ 1/ε′ and δ□(G, H) < ε′, then d1(G, P) < ε.

Another way to state the condition is that if Hn ∈ P (n = 1, 2, . . .) is a sequence of simple graphs, Gn is an induced subgraph of Hn, and δ□(Gn, Hn) → 0, then d1(Gn, P) → 0. Informally, induced subgraphs inherit the property of the big guy, but they have to pay an inheritance tax; the tax is however small if the descendants are also big and they are close to the big guy.

Proof. The "only if" part is trivial, since the condition is a special case of the reformulation (T2) of testability. By Theorem 15.22, it suffices to prove that P̄ is testable and P is robust. We start with proving a graphon version of the condition in the theorem.

Claim 15.26. Let U ∈ P̄ and let (Gn) be a sequence of simple graphs with Gn → U. Also assume that t_ind(Gn, U) > 0. Then d1(Gn, P) → 0.

Since U ∈ P̄, there is a sequence of simple graphs Hm ∈ P (m = 1, 2, . . .) such that Hm → U. The condition t_ind(Gn, U) > 0 implies that t_ind(Gn, Hm) > 0 for every n if m is large enough. Furthermore, both Gn → U and Hm → U, and hence δ□(Gn, Hm) → 0 if n, m → ∞. So if n is large enough, we can select an m(n) such that Gn is an induced subgraph of H_{m(n)} and δ□(Gn, H_{m(n)}) → 0. The condition in the Theorem implies the claim.

To prove that P̄ is testable, we use Lemma 15.14. So let us consider a graphon U ∈ P̄ and a sequence of graphons Wn → U where every Wn is a flexing of U. We want to prove that d1(Wn, P̄) → 0. For each n, we choose a simple graph Gn such that

(15.7)   v(Gn) ≥ n,
(15.8)   t_ind(Gn, Wn) > 0,
(15.9)   δ□(Gn, U) ≤ δ□(Wn, U) + 1/n,
(15.10)  d1(Gn, P) ≥ d1(Wn, P̄) − 1/n.
This is not difficult: Gn = Gnk = G(k, Wn) will satisfy these conditions with high probability if k is sufficiently large. Indeed, (15.7) and (15.8) are essentially trivial, and (15.9) follows by Lemma 10.16. To verify (15.10), we select a graph Hnk ∈ P with V(Hnk) = [k] such that d1(Gnk, P) = d1(Gnk, Hnk) ≥ δ1(WGnk, WHnk). Let k → ∞; then WGnk → Wn in the δ□-distance with probability 1, and (by selecting an appropriate subsequence) WHnk → Un ∈ P̄. By Lemma 14.16, we get that

lim inf_{k→∞} δ1(WGnk, WHnk) ≥ δ1(Wn, Un) ≥ δ1(Wn, P̄),
which proves that (15.10) is satisfied if k is large enough.
Claim 15.26 implies that d1(Gn, P) → 0. Indeed, condition (15.8) implies that t_ind(Gn, U) > 0 (here we use that Wn is a flexing of U). Furthermore, Gn → U by (15.9), so Claim 15.26 applies. From here, the testability of P̄ follows easily: by (15.10),

d1(Wn, P̄) ≤ d1(Gn, P) + 1/n → 0.

Our second task is to prove that P is robust: if (Gn) is a sequence of simple graphs such that d1(WGn, P̄) → 0, then d1(Gn, P) → 0. Let Wn ∈ P̄ be such that ∥WGn − Wn∥1 → 0. By selecting an appropriate subsequence, we may assume that δ□(Wn, U) → 0 for some graphon U. Clearly U ∈ P̄ and Gn → U. Consider the random graph G′n = G(v(Gn), U). We have t_ind(G′n, U) > 0 with probability 1, and by Lemma 10.18, with probability tending to 1, G′n → U. Furthermore, an easy computation (cf. Exercise 10.14) gives that E(d1(Gn, G′n)) = E(∥WGn − WG′n∥1) = ∥WGn − Wn∥1 → 0, and so with high probability, we have d1(Gn, G′n) → 0. By Claim 15.26, this implies that d1(G′n, P) → 0. Hence

d1(Gn, P) ≤ d1(Gn, G′n) + d1(G′n, P) → 0,

which proves that P is robust.

Other characterizations of testable graph properties are known. Alon, Fischer, Newman and Shapira [2006] characterized testable graph properties in terms of Szemerédi partitions (we refer to their paper for the formulation). Fischer and Newman [2005] connected testability to estimability. We already stated a version of this result for graphons (Theorem 15.16), from which it can be derived (we don't go into the details):

Theorem 15.27. A graph property is testable if and only if the normalized edit distance from the property is an estimable parameter.

Exercise 15.28. Prove that the graph property ω(G) ≥ v(G)/2 satisfies the condition given in Theorem 15.25.

Exercise 15.29. Prove the following analogue of Proposition 15.11 for finite graphs: If P is a testable graph property, then

P′ = {F : v(F) = 1 or δ□(F, P) ≤ 20/√(log v(F))}

is a valid test property for P.

Exercise 15.30. Prove that graph property P is robust if and only if for every ε > 0 there is an ε0 > 0 such that if G is a graph with v(G) ≥ 1/ε0 and G has infinitely many near-blowups G′ with d1(G′, P) ≤ ε0, then d1(G, P) ≤ ε.
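Example 15.21 above translates into a one-line oblivious tester: accept iff a random k-node induced subgraph is itself triangle-free; soundness for graphs that are far from the property is exactly the content of the Removal Lemma. A minimal Python sketch (our own illustration):

```python
import random
from itertools import combinations

def is_triangle_free(adj, nodes):
    """Brute-force triangle check on the induced subgraph (fine for small k)."""
    return not any(adj[u][v] and adj[u][w] and adj[v][w]
                   for u, v, w in combinations(nodes, 3))

def test_triangle_freeness(adj, k):
    """Oblivious tester: the property serves as its own test property."""
    return is_triangle_free(adj, random.sample(range(len(adj)), k))
```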
15.4. Computable structures

15.4.1. Similarity distance and representative sets. Let us recall from the Introduction that if we want to run algorithms on a very large graph G, it is very useful to define and also compute a "representative set" of nodes, a (fairly large, but bounded size) subset R ⊆ V(G) such that every node is "similar" to one of the nodes in R. We want to define a distance that reflects for two nodes how "similar" their positions in the graph are. We define the similarity distance of two nodes s, t ∈ V(G) as follows: for any node w, compute

a(s, t; w) = (1/n) | |N(w) ∩ N(s)| − |N(w) ∩ N(t)| |,
and let

(15.11)  d_sim(s, t) = (1/n) ∑_{w∈V(G)} a(s, t; w).
We can think of a(s, t; w) as a measure of how different s and t are from the point of view of w; then d_sim(s, t) is an average measure of this difference. Of course, w could be more myopic and not look for neighbors of s and t among its own neighbors, but look only for s and t; then d_sim(s, t) would measure the size of the symmetric difference of the neighborhoods of s and t. As explained in the Introduction, this is a perfectly reasonable definition, but it would not measure what we want. The node w could also look for second or third neighbors of s and t, but this would not give anything more useful than this definition, at least for dense graphs.

There are many ways to rephrase this definition. We can pick three random nodes w, v, u ∈ V(G), and define

(15.12)  d_sim(s, t) = E_w |E_v(a_{sv} a_{vw}) − E_u(a_{tu} a_{uw})|,

where (a_{ij}) is the adjacency matrix of G. We could use v = u here; I just used different variables to make the correspondence with the definition clearer. We can also notice that d_sim(s, t) is the L1 distance of rows s and t of the square of the adjacency matrix, normalized by n². Finally, the similarity distance is quite closely related to the distance r_{WG}, discussed in Section 13.4: d_sim(s, t) = r_{WG}(x, y), where x and y are arbitrary points of the intervals representing s and t in WG.

There is an easy algorithm to compute (approximately) the similarity distance of two nodes.

Algorithm 15.31. Input: A graph G given by a sampling oracle, two nodes s, t ∈ V, and an error bound ε > 0.
Output: A number D(s, t) ≥ 0 such that with probability at least 1 − ε, D(s, t) − ε ≤ d_sim(s, t) ≤ D(s, t) + ε.

The algorithm is based on (15.12). Select a random node w and fix it temporarily. Select O(1/ε²) random nodes v and compute the average of a_{sv} a_{vw}, to get a number that is within an additive error of ε/4 of E_v(a_{sv} a_{vw}). Estimate E_v(a_{tv} a_{vw}) similarly. This gives an estimate for |E_v(a_{vw}(a_{sv} − a_{tv}))| with error at most ε/2 with high probability. Repeat this O(1/ε²) times and take the average to get D(s, t).

Next, we specialize Theorem 13.31 to graphs.

Theorem 15.32. Let G = (V, E) be a graph.
(a) If P = {S1, . . . , Sk} is a partition of V(G) such that d□(G, GP) = ε, then we can select a node vi ∈ Si from each partition class such that the average d_sim distance from S = {v1, . . . , vk} is at most 4ε.
(b) If S = {v1, . . . , vk} ⊆ V is a subset such that the average d_sim-distance from S is ε, then the Voronoi cells of S form a partition P such that d□(G, GP) ≤ 8√ε.

We define a representative set with error ε > 0 as a subset R ⊆ V(G) such that any two elements of R are at a (similarity) distance at least ε/2, and the average distance of nodes from R is at most 2ε. (The first condition is not crucial for the
applications we want to give, but it guarantees that the set is chosen economically.) Theorem 13.31 implies that such a set R exists with |R| ≤ 2^{32/ε²}. Furthermore, such a set can be constructed in our model.

Algorithm 15.33. Input: A graph G given by a sampling oracle, and an error bound ε.
Output: A random set R ⊆ V(G) such that |R| ≤ (64/ε²)·10^{28/ε²}, and with probability at least 1 − ε, R is a representative set with error ε.

The set R is grown step by step, starting with the empty set. At each step, a new uniform random node w of G is generated, and the approximate distances D(w, v) are computed for all v ∈ R with error less than ε/4 with high probability. If all of these are larger than 3ε/4, then w is added to R. Else, w is discarded and a new random node is generated. If R is not increased in k = ⌈2000 (1/ε) log(1/ε)⌉ steps, the algorithm halts.

We have to make sure that we don't make the mistake of stopping too early. It is clear that as long as the average distance from R is larger than 2ε, the probability that a sample has distance at least ε from R is at least ε, and so the probability that in k iterations we don't pick a node whose distance from R is at least ε is less than e^{−kε}. If we find a good node u, then with high probability the approximate distance satisfies D(u, R) > 3ε/4, and so we add u to R. Hence the probability that we stop prematurely is less than e^{−kε} E(|R|) ≤ ε. The size of R can be bounded using Proposition 13.32, which gives the bound on the output size.

We can say more. Suppose that there exists a representative set R with error ε. Then only a fraction of 2√ε of the nodes of G (call these nodes "remote") are at a distance more than √ε from R. Let us run the above algorithm with ε replaced by 2√ε, to get a representative set R′ with error 2√ε. The set R′ will contain at most one non-remote node from every Voronoi cell of R. We have little control over how many remote nodes we selected, but we can post-process the result. The √ε-balls around the non-remote nodes in R′ cover all the non-remote nodes, so they only leave out a fraction of 2√ε of all nodes. By sampling and brute force, we can select the smallest subset R′′ ⊆ R′ with this property. This way we have constructed a representative set R′′ with |R′′| ≤ |R| and error at most 3√ε. As a special case, if there is a representative set whose size is polynomially bounded in the error ε, our algorithm will find one with a somewhat worse polynomial bound.

Remark 15.34. One could try to work with a stronger notion: define a strong representative set with distance ε > 0 as a subset R ⊆ V(G) such that any two elements of R are at a (similarity) distance at least ε, and any other node of G is at a distance at most ε from R. It is trivial that every graph contains a strong representative set: just take a maximal set of nodes any two of which are at least ε apart. Furthermore, Proposition 13.32 shows that the size of such a set can be bounded by a function of ε. There are, however, several problems with the idea of computing and using it. First, in our very large graph model, the similarity distance cannot be computed exactly; second (and more importantly) the graph can have a tiny remote part which no sampling will discover but a representative of which should be included in the strong representative set.
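The following Python sketch (our own illustration, operating on an explicit adjacency matrix rather than a sampling oracle, with the sample sizes and patience constant chosen arbitrarily) combines Algorithms 15.31 and 15.33: a Monte Carlo estimate of d_sim via (15.12), and the greedy growth of a representative set.

```python
import random

def d_sim_estimate(A, s, t, outer=200, inner=200):
    """Monte Carlo estimate of (15.12): E_w | E_v(a_sv a_vw) - E_v(a_tv a_vw) |."""
    n = len(A)
    total = 0.0
    for _ in range(outer):
        w = random.randrange(n)
        inner_sum = 0.0
        for _ in range(inner):
            v = random.randrange(n)
            inner_sum += A[s][v] * A[v][w] - A[t][v] * A[v][w]
        total += abs(inner_sum / inner)
    return total / outer

def representative_set(A, eps, patience=50):
    """Grow R greedily: add a sampled node if it is far from everything in R."""
    n, R, idle = len(A), [], 0
    while idle < patience:
        w = random.randrange(n)
        if all(d_sim_estimate(A, w, v) > 3 * eps / 4 for v in R):
            R.append(w)
            idle = 0
        else:
            idle += 1
    return R
```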
15.4.2. Computing regularity partitions. As an easy application of Theorem 15.32, we give an algorithm to compute a weak Szemerédi partition in a huge graph. Our goal is to illustrate how an algorithm works in the pure sampling model, as well as in what form the result can be returned. (This way of presenting the output of an algorithm for a large graph was proposed by Frieze and Kannan [1999].) A polynomial time algorithm to compute a regularity partition in the traditional setting of graph algorithms was given by Alon, Duke, Lefmann, Rödl and Yuster [1994].

Algorithm 15.33 enables us to encode a partition of V(G) as a subset R ⊆ V(G): for each r ∈ R, we can define the partition class Vr as the Voronoi cell of r. Ties will be broken arbitrarily, and nodes to which there are several "almost closest" nodes may be misclassified, but this is the best one can hope for. To formalize:

Algorithm 15.35. Input: A graph G given by a sampling oracle, a subset R ⊆ V(G), a node u ∈ V, and an error bound ε > 0.
Output: A (random) node v ∈ R almost closest to u in the sense that with probability at least 1 − ε, d_sim(u, v) ≤ (1 + ε) d_sim(u, R).

This algorithm uses Algorithm 15.31 to compute (approximately) the distances d_sim(u, r), r ∈ R, and returns the node r ∈ R that it finds closest to u. In other words, we compute the Voronoi cells of the set R. Theorem 15.32 says in this context that the partition determined by Algorithms 15.33 and 15.35 satisfies d□(G, GP) ≤ ε with high probability. We omit the details of the error analysis. So we get a weak regularity partition. It is not known whether stronger regularity partitions can be computed in the sampling model.

15.4.3. Computing a maximum cut. The algorithm to approximately compute the maximum cut is similar.

Algorithm 15.36. Input: A graph G given by a sampling oracle, a subset R ⊆ V(G), and an error bound ε > 0.
Output: A partition R = R1 ∪ R2.

For the partition P implicitly determined above, we can also compute the edge densities between the partition classes, which we use to weight the edges of the complete graph on R, so that we get a weighted graph H. We find the maximum cut in H by brute force, to get a partition R = R1 ∪ R2. This gives an implicit definition of a cut in G, where a node u is put on the left side of the cut iff D(u, R1) < D(u, R2) for the approximate distances computed by Algorithm 15.31.

Remark 15.37. 1. A notion closely related to testability and to computing a structure, called "local reparability", was introduced by Austin and Tao [2010]. We only give an informal definition. We start with a graph property P and a graph for which almost all induced subgraphs of size N have property P; we say that G has the property N-locally. Now we want to repair G to have property P itself, by changing a small fraction of the edges. So far, this is essentially the same as testability, but we have to do the repair by a local algorithm as follows. We select a random sample A ⊆ V(G) of size k (where k ≤ N is chosen appropriately), and
for every pair of nodes u, v ∈ V(G), we compute whether they should be adjacent, knowing only the induced subgraph G[A ∪ {u, v}]. We could simply output their adjacency in G, but then we would not do any repair. Taking the subgraph G[A] and its connections to u and v also into account, our algorithm defines a modified graph G′. This graph G′ should have property P, and its edit distance from G should be arbitrarily small if N and k are large enough. It may or may not be possible to do so. We say that P is locally reparable if it is always possible. Austin and Tao prove, among other results, that every hereditary property is reparable. For the exact definitions, formulation, proofs, generalizations to hypergraphs and other results we refer to the paper.

2. There is a natural nondeterministic version of testability, introduced by Lovász and Vesztergombi [2012]. A property of finite graphs is called nondeterministically testable if it has a "certificate" in the form of a coloring of the nodes and edges with a bounded number of colors, adding new edges with other colors, and orienting the edges, such that once the certificate is specified, its correctness can be verified by random local testing. Here are a few examples of properties that are nondeterministically testable in a natural way: "the graph is 3-colorable;" "the graph contains a clique on half of its nodes;" "the graph is transitively orientable"; "one can add at most v(G)²/100 new edges to make the graph perfect." Using the theory of graph limits, it is proved that every nondeterministically testable property is deterministically testable. In a way, this means that P = NP in the world of property testing for dense graphs. (Many, but not all, of the properties described above are also covered by Theorem 15.24.) We will see that for bounded-degree graphs, the analogous statement does not hold. In fact, the study of nondeterministic certificates will lead to a new interesting notion of convergence (Section 19.2).
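As a concrete companion to Example 15.2 and Algorithm 15.36, here is a minimal Python sketch (our own illustration, not the book's algorithm: it estimates the maximum cut density directly, by brute-force maxcut on small random induced subgraphs).

```python
import random
from itertools import product

def maxcut_density(A, nodes):
    """Brute-force maximum cut of G[nodes], normalized by |nodes|^2."""
    k = len(nodes)
    best = 0
    for signs in product((0, 1), repeat=k):
        cut = sum(A[u][v]
                  for i, u in enumerate(nodes)
                  for j, v in enumerate(nodes)
                  if i < j and signs[i] != signs[j])
        best = max(best, cut)
    return best / k**2

def estimate_maxcut(A, k=12, reps=20):
    """Average maxcut density over random k-node induced subgraphs."""
    n = len(A)
    return sum(maxcut_density(A, random.sample(range(n), k))
               for _ in range(reps)) / reps
```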
CHAPTER 16
Extremal theory of dense graphs Extremal graph theory was one of the motivating fields for graph limit theory, as described in the Introduction. It is also one of the most fertile fields of applications of graph limits. In this chapter we give an exposition of some of the main directions. We start with two sections developing some technical tools, reflection positivity and variational calculus. Then we discuss extremal problems for complete graphs and some other specific problems. We re-prove some classical general results in extremal graph theory, and finally, we treat some very general questions (formulated in the introduction) about decidability of extremal graph problems and the possible structure of extremal graphs.
16.1. Nonnegativity of quantum graphs and reflection positivity We are interested in linear inequalities between the densities of some subgraphs. (Why just linear? We have seen in the Introduction that, using the multiplicativity of subgraph densities, algebraic inequalities between them can be replaced by equivalent linear inequalities. To be sure, there are nontrivial non-algebraic inequalities that hold between subgraph densities; see Exercise 16.21. But their theory is virtually completely unexplored.) As we have seen in the Introduction, many results in extremal graph theory can be stated in this form. We have also seen an example of a proof that relied on simple computations with quantum graphs. To make the problem and the methods precise, for a set U ⊆ W and for a quantum graph x we write x ≥ 0 (for U) if t(x, W ) ≥ 0 for every graphon W ∈ U. Most of the time (but not always) we will be concerned with U = W0 , and then we will suppress the “(for U)” part of the notation. The condition x ≥ 0 is equivalent to requiring that t(x, G) ≥ 0 for every simple graph G. Reflection positivity of the subgraph densities (Proposition 7.1) implies many inequalities of this type. (In a sense it implies all, as we will see in Theorem 16.41.) For every k ≥ 0, every graphon W , every set of simple k-labeled graphs {F1 , . . . , Fm }, and every vector a ∈ Rm , we have m ∑
ai aj t([[Fi Fj ]], W ) ≥ 0,
i,j=1
or, using our notation, m ∑
ai aj [[Fi Fj ]] ≥ 0.
i,j=1 281
282
16. EXTREMAL THEORY OF DENSE GRAPHS
This is equivalent to (16.1)
m (∑
[[
)2 ai Fi
]] ≥ 0.
i,j=1
So we get the fact, used implicitly in the Introduction (Section 2.1.3), that unlabeling the square of a k-labeled quantum graph, we get a nonnegative quantum graph. Let us also recall the trivial fact that adding or deleting isolated nodes to a graph F does not change the homomorphism densities t(F, .). We call a quantum graph g a square-sum if there are k-labeled quantum graphs y1 , . . . , ym for some k such ∑ that g can be obtained from i yi2 by unlabeling and adding or deleting isolated nodes. We have just shown that every square-sum satisfies g ≥ 0. Another important property of semidefinite matrices is that their determinant is nonnegative: for any set {F1 , . . . , Fm } of graphs and any graphon W , we have t([[F1 F1 ]], W ) . . . t([[F1 Fm ]], W ) .. .. ≥ 0. . . t([[Fm F1 ]], W ) . . . t([[Fm Fm ]], W ) Expanding this determinant, we get a polynomial inequality, which can be turned into a linear inequality involving subgraph densities, using multiplicativity of the parameter t(., W ). This can be expressed as nonnegativity of a quantum graph: [[F1 F1 ]] . . . [[F1 Fm ]] .. .. (16.2) ≥ 0. . . [[Fm F1 ]] . . . [[Fm Fm ]] This is still a rather complicated inequality, but one special case will be useful: (16.3)
[[F1 F1 ]][[F2 F2 ]] ≥ [[F1 F2 ]]2 .
Another consequence of reflection positivity is the following: let (aij )m i,j=1 be a symmetric positive semidefinite matrix, then (16.4)
m ∑
aij [[Fi Fj ]] ≥ 0.
i,j=1
We can add two relations related to subgraphs. First, adding an isolated node to a graph F does not change its density in any graph or graphon, and so F K1 − F ≥ 0,
but also
F − F K1 ≥ 0.
Furthermore, if F ′ is a subgraph of F , then t(F ′ , W ) ≥ t(F, W ) for every graphon W , and hence (16.5)
F ′ − F ≥ 0.
Exercise 16.1. Show that inequalities (16.2) (16.4) and (16.5) can be derived from the inequalities (16.1). Exercise 16.2. Prove the following “supermodularity” inequality: if F1 and F2 are two simple graphs on the same node set, then F1 ∪ F2 + F1 ∩ F2 ≥ F1 + F2 .
16.2. VARIATIONAL CALCULUS OF GRAPHONS
283
16.2. Variational calculus of graphons One advantage of working in the large space of graphons instead of the much simpler (discrete) space of graphs is that we can do continuous deformations of a graphon. This is useful when trying to optimize some functional, or when studying conditions that uniquely determine graphons. The following considerations are conceptually not too difficult, but technically more involved. To express the variation of graphon functionals, we need some notation. For every multigraph F = (V, E) and node i ∈ V , let F i denote the 1-labeled quantum graph obtained by labeling i by 1. For every edge ij ∈ E, let F ij denote the 2labeled quantum graph obtained from F by deleting the edge ij, and labeling i by 1 and j by 2. The 2-labeled quantum graph Fij is∑ constructed similarly,∑ but the edge ij is not deleted. So Fij = K2•• F ij . Let F † = i∈V F i and F ‡ = 21 i,j: ij∈E F ij (each edge contributes two terms, since its endpoints can be labeled∑ in two ways; this is why it will be convenient to divide by 2). Similarly, let F ♮ = 12 i,j: ij∈E Fij . We extend the operators F 7→ F † , F 7→ F ‡ and F 7→ F ♮ linearly to all quantum graphs. We have x♮ = K2•• x‡ . Example 16.3. Clearly Cn‡ = nPn•• , where Pn•• denotes the path on n nodes with its endpoints labeled. So txy (Cn‡ , W ) = ntxy (Pn•• , W ) = nW ◦(n−1) . The Edge Reconstruction Conjecture 5.31 says in this language that if F and G are simple graphs that are large enough (both have at least four non-isolated nodes), then [[F ‡ ]] = [[G‡ ]] implies that F ∼ = G. We study two kinds of variations of a kernel W : in the more general version, we change the values of W at every node. However, it is often easier to construct variations in which the measure on [0, 1] is rescaled. This simpler kind variation will have the advantage that if we start with a graphon, then we don’t have to worry about the values of W running out of the interval [0, 1]. We start with describing the variation of the measure. Consider a family αs : [0, 1] → R+ (s ∈ [0, 1]) of weight functions such that ∫1 α s (x) dx = 1 for every s. Every such function defines a probability measure µs 0 on [0, 1] by ∫ µs (A) = αs (x) dx. A
Every kernel W on [0, 1]2 gives rise to a family of kernels Ws , where Ws = ([0, 1], µs , W ). For every finite graph F , we have ∫ ∏ ∏ t(F, Ws ) = W (xi , xj ) dµs (xi ) ∫ =
[0,1]V (F ) ij∈E(F )
∏ [0,1]V (F ) ij∈E(F )
i∈V (F )
W (xi , xj )
∏ i∈V (F )
αs (xi )
∏
dxi .
i∈V (F )
We say that the family (αs ) has uniformly bounded derivative, if for every x ∈ [0, 1] d the derivative α˙ s (x) = ds αs (x) exists, and there is a constant M > 0 such that |α˙ s (x)| ≤ M for all x and s. If αs is a family of weight functions with uniformly bounded derivative, then by elementary analysis it follows that the function
284
16. EXTREMAL THEORY OF DENSE GRAPHS
t(F, Ws ) is differentiable as a function of s, and ⟨ ⟩ d (16.6) t(F, Ws ) = α˙ s , tx (F † , Ws ) . ds In the second type of variation, we consider a family Us ∈ W (0 ≤ s ≤ 1) of kernels. We say that the family Us has uniformly bounded derivative, if for every d x, y ∈ [0, 1] the derivative U˙ s (x, y) = ds Us (x, y) exists, and there is a constant ˙ M > 0 such that |Us (x)| ≤ M for all x and s. If Us is a uniformly bounded family of kernels with uniformly bounded derivative, then the function t(F, Ws ) is differentiable as a function of s, and ⟨ ⟩ d t(F, Us ) = U˙ s , txy (F ‡ , Us ) . (16.7) ds As an application of these formulas, we derive versions of the Kuhn–Tucker conditions in optimization for a graphon minimizing a smooth function of a finite number of homomorphism densities (rephrasing conditions of Razborov [2007, 2008] in our language). For a functional ω on W0 , we say that a graphon W is a local minimizer, if there is an ε > 0 such that ω(U ) ≥ ω(W ) for every graphon U with ∥U − W ∥1 < ε. (We could define this notion with respect to other norms, but this will be the version we use.) Lemma 16.4. Let g be a simple quantum graph, and suppose that W is a local minimizer of t(g, W ) over W ∈ W0 . (a) For almost all x ∈ [0, 1], tx (g † , W ) = t([[g † ]], W ). (b) For almost all x, y ∈ [0, 1], = 0 if 0 < W (x, y) < 1, ‡ txy (g , W ) ≥ 0, if W (x, y) = 0, ≤ 0, if W (x, y) = 1. (c) For almost all x, y ∈ [0, 1], txy (g ♮ , W ) ≤ 0. We will prove a more general lemma. Lemma 16.5. Let Φ : Rm → R be a differentiable function, and let Φi = ∂ Let F1 , . . . , Fm , simple graphs, and suppose that W is a local minimizer ∂xi Φ. ( ) ( ) of Φ t(F1 , W ), . . . , t(Fm , W ) over W ∈ W0 . Set ai = Φi t(F1 , W ), . . . , t(Fm , W ) . (a) For almost all x ∈ [0, 1], m ∑
( ) ai tx (Fi† , W ) − v(Fi )t(Fi , W ) = 0.
i=1
(b) For almost all x, y ∈ [0, 1],
= 0 ai txy (Fi‡ , W ) ≥ 0, i=1 ≤ 0,
m ∑
if 0 < W (x, y) < 1, if W (x, y) = 0, if W (x, y) = 1.
(c) For almost all x, y ∈ [0, 1], m ∑ i=1
ai txy (Fi♮ , W ) ≤ 0.
16.3. DENSITIES OF COMPLETE GRAPHS
285
∫ Proof. (a) Let φ : [0, 1] → [−1, 1] be a measurable function such that φ = 0. For s ∈ [−1, 1], we re-weight the points of [0, 1] by αs (x) = 1 + sφ(x) to get the graphon Ws . Using (16.6), we get ∫1 m ∑ ) d ( Φ t(F1 , Ws ), . . . , t(Fm , Ws ) = ai φ(x)tx (Fi† , W ) dx ds s=0 i=1 0
∫1 =
φ(x) 0
m ∑
ai tx (Fi† , W ) dx.
i=1
Since W is a local minimizer, and ∥Ws −W ∥1 is arbitrarily small if s is small (see ∑ Exercise 13.7), this derivative must be 0 for all φ. This implies that i ai tx (Fi† , W ) is a constant function of x almost everywhere. Integrating over x, we recover the value of the constant, which proves (a). (b) Let U ∈ W1 be a function such that U (x, y) ≥ 0 if W (x, y) = 0 and U (x, y) ≤ 0 if W (x, y) = 1. Then for every s ≥ 0 in a small neighborhood of 0, W + sU ∈ W0 . Using (16.7), ) d ( Φ t(F1 , W + sU ), . . . , t(Fm , W + sU ) ds s=0 ∫ m ∑ = U (x, y) a′i txy (Fi‡ , W ) dx dy. i=1
[0,1]2
Since W is a local minimizer, we must have ∫ U (x, y) [0,1]2
m ∑
a′i txy (Fi‡ , W ) dx dy ≥ 0.
i=1
This must hold for all functions U ∈ W1 such that U (x, y) ≥ 0 if W (x, y) = 0 and U (x, y) ≤ 0 if W (x, y) = 1, which implies (b). (c) This follows from (b) by multiplying by W (x, y).
Exercise 16.6. For W ∈ W0 , define δ(W ) = minx∈[0,1] tx (K2• , W ) and ∆(W ) = maxx∈[0,1] tx (K2• , W ) (minimum degree and maximum degree). Prove that for any tree T , δ(W )v(T )−1 ≤ t(T, W ) ≤ ∆(W )v(T )−1 . Exercise 16.7. For a simple graph F , prove that sup{t(K2 , W ) : W ∈ W0 , t(F, W ) = 0} = sup{δ(W ) : W ∈ W0 , t(F, W ) = 0}.
16.3. Densities of complete graphs The problem of describing relationships between densities of complete graphs in a graph G has received special attention. In this section we survey results about these questions in the framework of our book.
286
16. EXTREMAL THEORY OF DENSE GRAPHS
16.3.1. Linear inequalities. We start with a result that characterizes linear inequalities between complete subgraph densities. This was proved in a special case by Bollob´as [1976], but, as observed by Schelp and Thomason [1998], his proof extends to the proof of the general theorem. Schelp and Thomason give further applications of the method (see Exercise 16.16). Theorem 16.8. Let g be a quantum graph whose constituents are complete graphs. Then g ≥ 0 if and only if t(g, Kn ) ≥ 0 for every n ≥ 1. Note that the case n = 1 is included, which means that t(g, 0) ≥ 0; in other words, the coefficient of K0 in g is nonnegative. Since Kn tends to the all-one function as n → ∞, the condition implies that t(g, 1) ≥ 0. Stating the result more directly: an inequality of the form (16.8)
m ∑
ai t(Ki , G) ≥ 0
i=1
holds for every graph G if and only if (16.9)
m ∑ i=1
ai
(n)i ≥0 ni
holds for every integer n ≥ 1. Proof. The “only if” direction is trivial. To prove the “if” direction, suppose that t(g, Kn ) ≥ 0 for all n; we want to prove that t(g, W ) ≥ 0 for every W ≥ 0. It suffices to prove this for any dense set of graphons W , and we choose the set graphons WH , where H is a node-weighted simple graph (all edgeweights are 0 or 1). Let V (H) = [q], and let α1 , . . . , αq ≥ 0 be the ∑ nodeweights (we allow 0 nodeweights for this argument). We may assume that i αi = 1. Supposing that there is an H with t(g, H) < 0, choose one with minimum number of nodes, and choose the nodeweights so as to minimize t(g, H). Then all the nodeweights must be positive, since a node with weight 0 could be deleted without changing any subgraph density, contradicting the minimality of q. Clearly t(g, H) is a polynomial in the nodeweights αi . Furthermore, the assumptions that the constituents of g are complete and H has no loops imply that every homomorphism contributing to t(g, H) is injective, and so t(g, H) is multilinear. Next we prove that H must be complete. Indeed, if (say) nodes 1, 2 ∈ V (H) are nonadjacent, then t(g, H) has no term containing the product α1 α2 , i.e., fixing the remaining variables, t(g, H) is a linear function of α1 and α2 . Since only the sum of α1 and α2 is fixed, we can shift them keeping the sum fixed and not increasing the value of t(g, H) until one of them becomes 0. This is a contradiction, since we know that all weights must be positive. To show that all weights are equal, let us push the argument above a bit further. Fixing all variables but α1 and α2 , we can write t(g, H) = a + b1 α1 + b2 α2 + cα1 α2 . Since H is complete, we know that t(g, H) is a symmetric multilinear polynomial in α1 , . . . , αq , and so b1 = b2 . Since α1 + α2 is fixed, we get t(g, H) = a′ + cα1 α2 , where a′ does not depend on α1 or α2 . If c ≥ 0, then this is minimized when α1 = 0 or α2 = 0, which is a contradiction as above. Hence we must have c < 0, and in this case t(g, H) is minimized when α1 = α2 . Since this holds for any two variables, all the αi are equal.
16.3. DENSITIES OF COMPLETE GRAPHS
287
But this means that t(g, H) = t(g, Kq ), which is impossible since t(g, Kq ) ≥ 0 by hypothesis. This completes the proof. As a corollary (which is in fact equivalent to the theorem) we get the following. Fix ( an integer m ≥ 1, )and associate with every graphon W the vector tW = t(K2 , W ), . . . , t(Km , W ) . Let Tm denote the set of the vectors tW . It follows from Theorem 11.21 and Corollary 11.15 that Tm is the closure of the points tG , where G is a simple graph (we write tG for tWG ). Corollary 16.9. The extreme points of the convex hull of Tm are the vectors tKn (n = 1, 2, . . . ) and (1, . . . , 1). The following corollary is interesting to state in view of the undecidability result of Hatami and Norine [2011] already mentioned in the Introduction (which will be proved as Theorem 16.34 a little later). It is easy to design an algorithm to check whether (16.9) holds for every n, and hence: Corollary 16.10. For quantum graphs g with rational coefficients whose constituents are complete graphs, the property g ≥ 0 is algorithmically decidable. As a further corollary, we derive Tur´an’s Theorem for graphons. Corollary 16.11. For every r ≥ 2, we have f0 , t(Kr , W ) = 0} = 1 − max{t(K2 , W ) : W ∈ W
1 , r−1
and the unique optimizer is W = WKr−1 . One could prove this result along the lines of several well-known proofs of Tur´ an’s Theorem; we could also prove a generalization of Goodman’s inequality 2.2. Specializing the proof of Theorem 16.8 above just to this case, we get the proof by “symmetrization”, due to Zykov [1949]. Proof. Let us prove the inequality (16.10)
rr t(Kr , W ) − (r − 1)t(K2 , W ) + r − 2 ≥ 0.
By Theorem 16.8, it suffices to verify this inequality when W = WKn for some n ≥ 1. This is straightforward, and we also see that equality holds for n = r − 1 only. Corollary 16.9 implies that equality holds in (16.10) only if W = WKr−1 . In the special case with t(Kr , W ) = 0, we get Corollary 16.11. 16.3.2. Edges vs. triangles. In the introduction (Section 2.1.1) we mentioned several results about the number of triangles in a graph, if the number of edges is known: Goodman’s bound and its improvements, and the Kruskal–Katona Theorem. In this Section we describe the exact relationship between the edge density and triangle density in a graph, i.e., we describe the set D2,3 ; for convenience, we recall how it looks (Figure 16.1; also recall that the figure is distorted to be able to see some features better). As a special case of Corollary 16.9, we get the result of Bollob´as [1976] mentioned in the introduction: Corollary 16.12. The set D2,3 is contained in the convex hull of the points (1, 1) and ( ) ( n − 1 (n − 1)(n − 2) ) , (n = 1, 2, . . . ). tn = t(K2 , Kn ), t(K3 , Kn ) = n n2
288
16. EXTREMAL THEORY OF DENSE GRAPHS
Figure 16.1. However, a quick look at Figure 2.1 shows that Corollary 16.12 does not tell the whole story: between any two special points (including the endpoints of the upper boundary), the domain D2,3 is bounded by a curve that appears to be concave. It turns out that these curves are indeed concave, which can be proved by the same kind of argument as used in the proof of Theorem 16.8 (see Exercise 16.18). The formula for these curves (cubic equations) is more difficult to obtain, and this will be our main concern in the rest of this section. The Kruskal–Katona bound. Let us start with the curve bounding the domain T from above, which is not hard to determine: its equation is y = x2/3 . As we mentioned in the introduction, this follows from (a very special case of) the Kruskal–Katona Theorem in extremal hypergraph theory. Here we give a short direct proof using the formalism of graph algebras. Applying (16.3) with F1 = P3•• and F2 = P2•• , we get ( )( ) 2 2 2 = = ≤ ≤
[[
]]
[[ ]][[ ]]
(the last step uses the trivial monotonicity (16.5)). This shows that t(K3 , W ) ≤ t(K2 , W )3/2 for every graphon W , what we wanted to prove. We also want to prove that this upper bound on the triangle density is sharp. For n ≥ 1, let G consist of a complete graph on k nodes and n − k isolated nodes. Then t(K2 , G) =) (k)2 /n2 and t(K3 , G) = (k)3 /n3 . Clearly, points of the form ( (k)2 /n2 , (k)3 /n3 get arbitrarily close to any point on the curve y = x2/3 . Razborov’s Theorem. To determine the lower bounding curve of D2,3 is much harder (Razborov [2008]); even the result is somewhat lengthy to state. Perhaps the best way to remember it is to describe a family of extremal graphons (which are all node-weighted complete graphs). Theorem 16.13. For all 0 ≤ d ≤ 1, the minimum of t(K3 , W ) subject to W ∈ W0 and t(K2 , W ) = d, is attained by the stepfunction W = WH , where H is a weighted 1 complete graph on k = ⌈ 1−d ⌉ nodes with edgeweights 1 and appropriate nodeweights: k − 1 of the nodeweights are equal and the last one is at most as large as these. One indication of the difficulty of the proof is that the extremal graphon is not unique, except for the special values d = 1 − 1/k. Let us consider the interval I representing the smallest weighted node and the interval J representing any other node. Restricted to I ∪ J, the graphon is bipartite and hence triangle-free. If we
16.3. DENSITIES OF COMPLETE GRAPHS
289
replace the function WH on (I ∪J)×(I ∪J) by any other triangle-free function with the same integral, then neither the edge density nor the triangle density changes, but we get a different extremal graphon. The nodeweights can be determined by simple computation. With a convenient parametrization suggested by Nikiforov [2011], they can be written as (1 + u)/k, . . . , (1 + u)/k, (1 − (k − 1)u)/k. The edge density in the extremal graph is k−1 k−1 (16.11) t(K2 , H) = d = (1 − u2 ) = (1 + u)(1 − u), k k and the triangle density is (k − 1)(k − 2) (k − 1)(k − 2) (1−3u2 −2u3 ) = (1+u)2 (1−2u). 2 k k2 This gives a parametric for the cubic curve bordering the domain in Figure [ k−2 equation ] 2.1 in the interval k−1 , k−1 . We can solve (16.11) for u as a function of d, and then k substitute this in (16.12) to get an explicit expression for t(K3 , H) as a function of d (this hairy formula is not the way to understand or remember the result; but we need it in the proof): (16.12) t(K3 , H) =
(16.13)
(k − 1)(k − 2) (1 + u)2 (1 − 2u) k2 √ √ (k − 1)(k − 2) ( kd )2 ( kd ) = 1 + 1 − 1 − 2 1 − . k2 k−1 k−1
t(K3 , H) = f (d) =
1 (where k = ⌈ 1−d ⌉). This function is rather complicated, and I would not even bother to write it out, except that we need its explicit form in the proof below. Perhaps the following form says more: ( )2 ( 3kd k 2 f (d) kd )3 (16.14) 1− + = 1− . 2(k − 1) 2(k − 1)(k − 2) k−1
This shows that after an appropriate affine transformation, every concave piece of the boundary of the region D2,3 looks alike (including the curve bounding the region from above). Perhaps this is trying to tell us something—I don’t know. These considerations allow us to reformulate Theorem 16.13 in a more direct form: Theorem 16.14. If G is a graph with t(K2 , G) = d, then t(K3 , G) ≥ f (d). The original proof of this theorem uses Razborov’s flag algebra technique, which is basically equivalent to the methods developed in this book. Since then, the result has been extended by Nikiforov [2011] to the number of K4 ’s and by Reiher [2012] to all complete graphs. We describe the proof of Razborov’s Theorem in our language. (Reiher’s proof can be viewed as a generalization of this argument to all complete graphs; the generalization is highly nontrivial.) ( ) Proof. Let W ∈ W0 minimize t(K3 , W ) − f t(K2 , W ) subject to k−2 k−1 ≤ k−1 t(K2 , W ) ≤ k , and suppose (by way of contradiction) that the minimum value is negative. Set d = t(K2 , W ) and √ 3(k − 2) ( kd ) 3(k − 2) ′ (16.15) λ = f (d) = 1+ 1− = (1 + u). k k−1 k
290
16. EXTREMAL THEORY OF DENSE GRAPHS
(The representation in terms of the parameter u will be useful if you want to follow some of the computations below). Since the objective function is 0 at the endpoints k−1 of the interval, we must have k−2 k−1 < d < k , and so W is a local minimizer in W0 . Let us simplify notation by writing g1 ≡ g2 (for W ) and g1 ≤ g2 (for W ) for two quantum graphs g1 and g2 if t(g1 , W ) = t(g2 , W ) and t(g1 , W ) ≤ t(g2 , W ), respectively. We invoke the formulas obtained by variational calculus on graphons, and get by Lemma 16.5(a) that (16.16)
− 2λ
3
− 2λ
=3
(for W )
(if a graph pictogram has a black node, this means that the equation or inequality holds for every choice of the image of the black node). Multiplying (16.16) by the edge with one node labeled, and then unlabeling, we get (16.17)
− 2λ
3
=3
− 2λ
−λ
≤ 0 (for W ).
(for W ).
By Lemma 16.5(c), (16.18)
3
Multiplying with the signed 3-node path with both edges negative and with both endpoints labeled, we get (16.19)
≤λ
3
(for W ),
which can be written as −6
3
+3
≤λ
− 2λ
+λ
(for W ),
or, simplifying and leaving just the 4-node graphs on the left, (16.20)
−6
3
≤ (λ − 3)
− 2λ
+λ
(for W ).
We get a third inequality from inclusion-exclusion: =
−3
−
+3
,
whence (16.21)
−3
3
−
=
−
≤
−
(for W ).
Adding (16.17), (16.20) and (16.21), we get 0 ≤ (λ − 2) We can replace every
+λ
1 2
− 2λ
−
(for W ).
by d, to get
(λ + 3d − 2)
(16.22)
+3
≥ λ(2d2 − d) +
(for W ).
2 3 ),
we can just ignore the K4 term (it is nonnegative), and For k = 3 (i.e., < d < hence (upon verifying that λ + 3d − 2 > 0) t(K3 , W ) ≥
λ(2d2 − d) f ′ (d)(2d2 − d) = ′ = f (d) λ + 3d − 2 f (d) + 3d − 2
(where the last equality is easy to check). However, if k ≥ 4, then we need a nontrivial lower bound for t(K4 , W ) (note that the extremal graph H contains many K4 -s). Let ∫ 1 • (16.23) µ = 2λd − 3t(K3 , W ) and w(z) = tz (K2 , W ) = W (z, y) dy. 0
16.3. DENSITIES OF COMPLETE GRAPHS
291
Then we can write (16.16) as (16.24)
3tz (K3• , W ) = 2λw(z) − µ.
The numbers λ and µ will play an important role in our computations, so let us have a closer look at them. Recall that λ = f ′ (d) is given by (16.15), which yields the bounds 3(k − 2) 3(k − 2) (16.25) ≤λ≤ . k k−1 For µ we don’t have an explicit formula in terms of d alone, but we can get rather tight bounds: from Goodman’s Theorem and the indirect hypothesis we get d(2d − 1) ≤ t(K3 , W ) < f (d), which in turn implies 2df ′ (d) − 3f (d) < µ ≤ 2df ′ (d) − 3d(2d − 1). From here, it takes straightforward computation to verify that 3µ k−2 k−1 < 2 ≤ k−2 λ k−3 We see that µ > 0. Furthermore, inequality 16.18 implies that (16.26)
(16.27)
3tz (K3• , W ) ≤ λw(z),
for almost all z ∈ [0, 1], and then (16.27) and (16.24) imply that µ µ ≤ w(z) ≤ . 2λ λ For any point z ∈ [0, 1], the density of K4 -s containing it is just the density of triangles in its neighborhood. To be precise, for any point z ∈ [0, 1], we define a new graphon Wz on [0, 1], by keeping the same W but adding a nodeweight function W (z, .)/w(z). Then t(K2 , Wz ) =
tz (K3• , W ) 2λw(z) − µ = . 2 w(z) 3w(z)2
µ µ The right hand side is a monotone increasing function of w(z) for w(z) ∈ [ 2λ , λ ], and hence 2λw(z) − µ λ2 k−2 t(K2 , Wz ) = ≤ ≤ . 2 3w(z) 3µ k−1 ( ) So by induction on k, we know that t(K3 , Wz ) ≥ f t(K2 , Wz ) , and ( 2λw(z) − µ ) tz (K4• , W ) = t(K3 , Wz )w(z)3 ≥ f w(z)3 . 3w(z)2
Hence t(K4 , W ) ≥
∫1 ( 2λw(z) − µ ) f w(z)3 dz. 3w(z)2 0
The ugly integrand is hopeless to evaluate directly; the trick is to find a (lower bound ) w3 . that is linear in w(z) and approximates it well enough. Set g(w) = f 2λw−µ 2 3w µ µ Let w0 ∈ [ 2λ , λ ] be chosen so that (16.28)
2λw0 − µ k−3 = . 3w02 k−2
292
16. EXTREMAL THEORY OF DENSE GRAPHS
This is possible, since the left side, as a function of w0 , ranges from 0 to λ2 /(3µ) on this interval, and the target value (k − 3)/(k − 2) is in this range by (16.26). Then (k − 3) 3 (k − 3)(k − 4) 3 g(w0 ) = f w = w0 . k−2 0 (k − 2)2 ( ) The linear function we will use goes through the point w0 , g(w0 ) and has appropriate slope: Claim 16.15. Let λ and µ be real numbers satisfying (16.25) and (16.26), and let µ µ , λ ], we have w0 satisfy (16.28). Then for every w ∈ [ 2λ 1 (2λ2 − 3µ)(w − w0 ). 3 All functions in this claim are explicit, which makes it a (hard and tedious) exercise in first year calculus; we do not reproduce the details of its proof. Using this claim, we have ∫1 ∫1 ( ) 1 1 2 2 t(K4 , W ) ≥ g w(z) dz ≥ g(w0 ) − (2λ − 3µ)w0 + (2λ − 3µ) w(z) dz 3 3 g(w) − g(w0 ) ≥
0
0
(k − 3)(k − 4) 3 1 = w0 + (2λ2 − 3µ)(d − w0 ). (k − 2)2 3 Hence, returning to (16.22), (k − 3)(k − 4) 3 1 w0 + (2λ2 −3µ)(d−w0 ). (k − 2)2 3 This is another messy formula, but we can express the variables in terms of u and y = w0 /(1 + u) (the latter is, of course, chosen with hindsight): we already have expressions for λ and d; we have w0 = y(1 + u), and then µ can be expressed using (16.28). With these substitutions, the difference of the two sides looks like ( k − 2) (16.30) (1 + u)2 y − k ( ) (k − 3 k−1 3k − 7 ) 2(k − 1)(k − 3) 2 × (1 − u) − u+ y+ (1 + u)y . k k−2 k−2 k(k − 2) (16.29) (λ+3d−2)t(K3 , W ) ≥ λ(2d2 −d)+
2
(k−2) 1 We know that 0 < u < k−1 , and (16.28) and (16.26) imply that k−2 k ≤ y ≤ k(k−3) . Then we face another exercise in calculus, to show that in this range (16.30) is negative (we don’t describe the details). This contradicts (16.29), and completes the proof.
It would of course be important to find a more “conceptual” proof of Theorem 16.13. As a couple of examples of the kind of general question that arises, is an algebraic inequality between densities of complete graphs decidable? Does such an inequality hold true if it holds true for all node-weighted complete graphs? Exercise 16.16. Let g be a quantum graph such that every constituent with negative coefficient is complete. Prove that tinj (g, W ) ≥ 0 for every graphon W if and only if tinj (g, Kn ) ≥ 0 for all n ≥ 1 (Schelp and Thomason [1998]). Exercise 16.17. Prove that for quantum graphs g with rational coefficients whose constituents are complete graphs, the property g ≥ 0 is in P . (The input length is the total number of digits in the numerators and denominators of the coefficients, and complete k-graphs (0 ≤ k ≤ m) contribute 1 even it their coefficient is 0.)
16.4. THE CLASSICAL THEORY OF EXTREMAL GRAPHS
293
Exercise 16.18. Adopt the proof of Theorem 16.8 to prove that the boundary of the domain D2,3 (Fig. 2.1) is concave between any two special points tn and tn+1 . Exercise 16.19. (a) Let Kr′ denote the graph obtained by deleting an edge from Kr . Prove that t(Kr , G)2 ′ t(Kr+1 , G) ≥ . t(Kr−1 , G) (b) Prove that ) ( ′ , G) . t(Kr+1 , G) − t(Kr , G) ≤ r t(Kr+1 , G) − t(Kr+1 (c) Prove that t(Kr+1 , G) t(Kr , G) ≤ (r − 1) + 1. t(Kr−1 , G) t(Kr , G) (d) Prove the following result of Moon and Moser [1962]: If Nr denotes the number of complete r-graphs in G, then ( ) Nr Nr+1 1 ≥ 2 r2 − N1 . Nr r −1 Nr−1 Exercise 16.20. Prove the following generalization of Goodman’s Theorem (2.2): if d = t(K2 , W ) is the edge density of a graph G, then ( ) t(Kr , G) ≥ d(2d − 1)(3d − 2) · · · (r − 1)d − (r − 2) . Exercise 16.21. Prove that ( ) t(K4′ , G) ≥ t(K3 , G)2 log∗ 1/t(K3 , G) . r
[T. Tao; hint: use the Removal Lemma.]
16.4. The classical theory of extremal graphs Following the exposition of this topic in the introduction, let us state now in more general and more precise terms the extremal graph results of Erd˝os and Stone [1946], Erd˝os and Simonovits [1966], and Simonovits [1968]. In this more general setting, we exclude several graphs L1 , . . . , Lk as subgraphs of a simple graph G, and we want to determine the maximum number of edges of G, given the number of nodes n. In our formalism, we want to solve (16.31)
max{t(K2 , G) : tinj (L1 , G) = · · · = tinj (Lk , G) = 0}.
Tur´ an’s Theorem (Corollary 16.11) is a special case when k = 1 and L1 = Kr . The key results are summed up in the following theorem. Theorem 16.22. Let L1 , . . . , Lk be simple graphs and let r = mini χ(Li ). Suppose that a simple graph G does not contain any Li as a subgraph. Then 1 (16.32) t(K2 , G) ≤ 1 − + o(1) (v(G) → ∞). r−1 Asymptotic equality holds G = T (n, r − 1) is the Tur´ an graph on n = v(G) nodes with r − 1 color classes. Furthermore, this extremal graph is stable in the following sense: For every ε > 0 there is an ε′ > 0 (depending on L1 , . . . , Lk and ε, but not on G) such that if t(K2 , G) ≥ 1 − 1/(r − 1) − ε′ , then δb1 (Gn , T (n, r − 1)) ≤ ε. Theorem 16.22 can be translated to the language of graphons, and proved quite easily using our general results on graph limits. As in almost all of our applications of graph limit theory, the original treatment has the advantage that it provides explicit bounds for the o(1) term as well as the dependence of ε′ on ε in the theorem above.
294
16. EXTREMAL THEORY OF DENSE GRAPHS
Theorem 16.23. Let L1 , . . . , Lk be simple graphs and let r = mini χ(Li ). Then max{t(K2 , W ) : W ∈ W0 , t(L1 , W ) = · · · = t(Lk , W ) = 0} = 1 −
1 , r−1
and the unique optimizer (up to weak isomorphism) is W = WKr−1 . Proof. Let, say χ(L1 ) = r. Then t(L1 , Kr ) > 0, and hence it follows easily that t(Kr , W ) = 0 (Exercises 7.6, 13.25). Application of Tur´an’s Theorem for graphons (Corollary 16.11) completes the proof. It is not quite trivial, but not very hard either, to derive the classical results mentioned above from Theorem 16.23. Let us illustrate this by the derivation of stability. If stability fails, then there exists a sequence Gn of simple graphs such that tinj (L1 , Gn ) = · · · = tinj (Lk , Gn ) = 0 and t(K2 , Gn ) → 1 − 1/(r − 1), but δb1 (Gn , T (n, r−1)) ̸→ 0. By Theorem 9.30, this implies that δ1 (Gn , T (n, r−1)) ̸→ 0. In graphon language, this means that δ1 (WGn , WKr−1 ) ̸→ 0. By choosing a subsequence and than a subsequence of that, we may assume that δ1 (WGn , WKr−1 ) > a for some a > 0 for all n, and Gn → U for some graphon U . Then t(L1 , U ) = · · · = t(Lk , U ) = 0 and t(K2 , U ) = 1 − 1/(r − 1), so by Theorem 16.23, U is weakly isomorphic to WKr−1 . So Gn → WKr−1 . Since WKr−1 is 0-1 valued, it follows by Proposition 8.24 that δ1 (WGn , WKr−1 ) → 0, a contradiction. 16.5. Local vs. global optima One advantage of embedding the set of graphs to the large space of graphons is that for every optimization problem, we can define local optima, and study then with the methods of analysis. We describe three problems where this treatment gives interesting results. In our first example, every local optimum is also global by convexity, and this fact can be exploited to get a short proof. In the second example, we don’t know whether local and global optima are the same; we can determine the local ones, and perhaps these are also global. In the third example, local and global optima are quite different, and the consideration of local optima leads to interesting results. 16.5.1. The distance from a hereditary graph property. A surprisingly general result in extremal graph theory is the theorem of Alon and Stav [2008], proving that for every hereditary property, a random graph with appropriate density is asymptotically the farthest from the property in edit distance. Theorem 16.24 (Alon and Stav). For every hereditary graph property P there is a number p, 0 ≤ p ≤ 1, such that for every graph G with v(G) = n, ( ) d1 (G, P) ≤ E d1 (G(n, p), P) + o(1) (n → ∞). The following theorem of Lov´asz and Szegedy [2010a] states a graphon version of this fact. (Recall that, by Proposition 14.25, the closure of a hereditary graph property is flexible. We omit the details of the derivation of Theorem 16.24 from Theorem 16.25.) Theorem 16.25. If R is a flexible graphon property, then the maximum d1 -distance from R is attained by a constant function.
16.5. LOCAL VS. GLOBAL OPTIMA
295
Proof. This proof illustrates the power of extending graph problems to a continuum. By Proposition 14.26, the set W0 \ R is convex. Hence it follows that the d1 distance from R is a concave function on W0 \ R. We also know by Proposition f0 , δ ), and 15.15 and Theorem 15.16(b) that d1 (., R) is a continuous function on (W hence it assumes its maximum. Let M be the set of maximizing graphons in W0 ; this is a convex, closed subset of W0 . Since W0 \ R is invariant under the group of invertible measure preserving transformations of [0, 1], so is M , and hence M is also compact in the pseudometric δ . This implies in many ways that M contains a constant function; here is a fast argument. Let W ∈ M have minimum L2 -norm (such a graphon exists, since by Lemma 14.15 the L2 -norm is lower semicontinuous with respect to the cut norm). For every measure preserving transformation φ, we have (W + W φ )/2 ∈ M . Furthermore,
W + Wφ
≤ 1 (∥W ∥2 + ∥Wφ ∥2 ) = ∥W ∥2 .
2 2 2 By the choice of W , we must have equality here, which implies that W and W φ are proportional, but since they have the same integral, they must be equal almost everywhere. Since this holds for any φ, W must be constant almost everywhere. 16.5.2. The Sidorenko Conjecture. Sidorenko [1991, 1993] conjectured that the inequality (16.33)
t(F, W ) ≥ t(K2 , W )e(F ) .
holds for all bipartite graphs F and all W ∈ W, W ≥ 0. Several special cases of this inequality were mentioned in the Introduction, Section 2.1.2. Sidorenko in fact formulated this not only for graphs but for graphons, being perhaps the first to use the integral expression for t(., .) as a generalization of subgraph counting. (The conjecture extends to non-symmetric functions W , but we restrict our attention to the symmetric case here.) A closely related conjecture in extremal graph theory was raised earlier by Simonovits [1984]. In spite of its very simple form and a lot of effort, this conjecture is unproven in general. It is easy to see that every graph satisfying Sidorenko’s Conjecture must be bipartite. Indeed, if W = WK2 , then the right side of (16.33) is positive, but the left side is positive only if F is bipartite. We can view this as an extremal problem in two ways: (1) for every nonnegative W ∈ W, matchings minimize t(F, W ) among all bipartite graphs with a given number of edges; (2) for every bipartite graph F , constant functions W minimize t(F, W ) among all nonnegative kernels W with a given integral. Since both sides of (16.33) are homogeneous in W of the same degree, we can scale W and assume that t(K2 , W ) = 1. Then we want to conclude that t(F, W ) ≥ 1 for every bipartite graph F . There are partial results in the direction of the conjecture. Sidorenko proved it for a fairly large class of graphs, including trees, complete bipartite graphs, and all bipartite graphs with at most 4 nodes in one of the color classes. After a long period of little progress, several new (but unfortunately still partial) results were obtained recently. Each of these is in one way or other related to the material in this book, so we discuss them in some detail. We have defined weakly norming graphs in Section 14.1. Hatami [2010] gives a proof of the following (easy) fact, attributing it to B. Szegedy : If a bipartite graph
296
16. EXTREMAL THEORY OF DENSE GRAPHS
F is weakly norming, then it satisfies the Sidorenko conjecture. Combined with the result of Hatami (Proposition 14.2) that all cubes are weakly norming, it follows that all cubes satisfy Sidorenko’s conjecture. In another direction, Conlon, Fox and Sudakov [2010] proved that the conjecture is satisfied by every bipartite graph that contains a node connected to all nodes on the other side. Their proof uses a sophisticated probabilistic argument. Li and Szegedy [2012] give a shorter analytic proof, which extends to a larger class of graphs. Szegedy [unpublished] uses entropy arguments to prove the conjecture for an even larger class, which includes of all previously settled special cases. The smallest graph for which the conjecture is not known is the M¨obius ladder of length 5 (equivalently, a 10-cycle with the longest diagonals added). 16.5.3. Local Sidorenko Conjecture. We can ask for conditions on W , rather than on F , that suffice to prove the conjectured inequality (16.33). Lov´asz [2011] proved that for every bipartite graph F , the constant 1 kernel minimizes t(F, W ) at least locally: Theorem 16.26. Let F be a bipartite graph with m edges. Let W be a kernel with ∫ W = 1 and ∥W − 1∥∞ ≤ 1/(4m). Then t(F, W ) ≥ 1. The proof of this theorem is not given here; instead, we prove the following easier related result. Let us say that a graph F has the local Sidorenko property if for every ( kernel W ≥ 0) there is an εW > 0 such that for every 0 ≤ ε ≤ εW , we have t F, 1 + ε(W − 1) ≥ 1. (For a graph satisfying the Sidorenko conjecture, we have εW = 1 for every W ≥ 0.) With this weaker notion, a graph does not have to be bipartite to satisfy it. In fact, we have the following characterization of these graphs: Proposition 16.27. A graph has the local Sidorenko property if and only if either it is a forest or its girth is even. In particular, every bipartite graph has the local Sidorenko property. Proof. Let us start with the “if” part. Replacing W by 1+(W −1)/∥W −1∥∞ , we may assume that 0 ≤ W ≤ 2. Let U = W −1, then U ∈ W1 . The homomorphism density t(F, W ) = t(F, 1 + U ) can be expanded in terms of the subgraphs of F , and so what we want to prove is ∑ t(F ′ , εU ) ≥ 1 F ′ ⊆F
for a sufficiently small ε > 0. (Let us agree that, for the rest of this section, F ′ ⊆ F means that F ′ is a subgraph of F without isolated nodes.) The term with F ′ = ∅ is 1, and so (pulling out the ε factors) we want to prove that ∑ ′ (16.34) t(F ′ , U )εe(F ) ≥ 0. ∅̸=F ′ ⊆F
It follows from the definition of U that t(K2 , U ) = 0, and so every term in (16.34) where F ′ is a matching cancels. If F itself is a matching, we have nothing to prove. Otherwise, the next smallest term is t(P3 , U ), which is nonnegative, since P3 = [[(K2• )2 ]] is a square. If t(P3 , U ) > 0, then for every sufficiently small ε > 0 it dominates the sum (16.34), and we are done. So suppose that t(P3 , U ) = 0, then
16.5. LOCAL VS. GLOBAL OPTIMA
297
tx (K2• , U ) = 0 for almost all x. This implies that t(F ′ , U ) = 0 whenever U has a node of degree 1. In particular, if F is a forest, we are done. Suppose that F is not a forest, then the nonzero term in 16.34 with the smallest number of edges is t(C2r , U ), where C2r is the shortest cycle in F . Since t(C2r , U )1/(2r) is a norm by Proposition 14.2, this term is nonzero if U ̸= 0, and so for a sufficiently small ε > 0, it dominates the remaining terms. To prove the “only if” part, suppose of F is odd. Let U be ( 1) that the girth • the graphon defined by the matrix −1 . Then t (K , U ) = 0 for every x, and x 2 1 −1 hence all those terms in (16.34) are 0 in which F has a node with degree 1. So the nonzero terms with the smallest exponent of ε correspond to the shortest (odd) cycles. Trivially t(F ′ , W ) = −1 for such a term, and so for a sufficiently small ε the whole expression (16.34) will be negative. The proof of Theorem 16.26 expands the idea of the proof above, but one has to do much more careful estimations. Many steps in the proof can be viewed as using a version of the calculus “for W0 ” developed before, but this time “for W1 ”. Some steps are described in Exercises 16.29-16.31 to illustrate this potentially useful technique. 16.5.4. Common graphs. The following inequality is closely related to Goodman’s Theorem (2.2), and it can be proved along the same lines: 1 (16.35) t(K3 , G) + t(K3 , G) ≥ , 4 and equality holds asymptotically if G is a random graph with edge density 1/2. Erd˝os conjectured that a similar inequality will hold for K4 in place of K3 , but this was disproved by Thomason [1998]. More generally, one can ask which graphs F satisfy ( ) tinj (F, G) + tinj (F, G) ≥ 1 + o(1) 21−e(F ) , where the o(1) refers to v(G) → ∞. Going to the limit, we get a formulation free of remainder terms: Which simple graphs F satisfy 1 (16.36) t(F, W ) + t(F, 1 − W ) ≥ 21−e(F ) = 2t(F, ) 2 for every graphon W ? Such graphs F are called common graphs. So the triangle is common, but K4 is not. Are there any other common graphs? Sidorenko [1996] studied graphs with this and other “convexity” properties. Let F be a graph satisfying Sidorenko’s conjecture. Then t(F, W ) + t(F, 1 − W ) ≥ t(K2 , W )e(F ) + t(K2 , 1 − W )e(F ) ( t(K , W ) + t(K , 1 − W ) )e(F ) 2 2 ≥2 = 21−e(F ) , 2 so F is common. Sidorenko’s conjecture would imply that all bipartite graphs are common, and all bipartite graphs mentioned above for which Sidorenko’s conjecture is verified are common. Among non-bipartite graphs, not many common graphs are ˇ ıˇcek and Thomason [1996] showed that no graph containing known. Jagger, Stov´ K4 is common. Franek and R¨odl [1992] showed that if we delete an edge from K4 , the obtained graph K4′ is common. Recently Hatami, Hladky, Kr´al, Norine and Razborov [2011] proved that the 5-wheel is common, using computers to find appropriate nonnegative expressions in the flag algebra. We cannot reproduce their proof here; instead,
298
16. EXTREMAL THEORY OF DENSE GRAPHS
let us give the proof of the fact that K4′ is common, which should give a feeling for this technique. We do our computations in the graph algebra, instead of the flag algebra. We start with rewriting (16.36) as follows. Let U = 2W − 1, then substituting U in (16.36) and multiplying by 2e(F ) , we get t(F, 1 + U ) + t(F, 1 − U ) ≥ 2,
(16.37)
which should hold for every U ∈ W1 . In other words, F is common iff the left side is minimized by U = 0. The subgraph densities on the left side of (16.37) can be expanded as before, and we get ∑ t(F, 1 + U ) + t(F, 1 − U ) = 2 t(F ′ , U ). F ′ ⊆F e(F ′ ) even
The term with e(F ′ ) = 0 gives 2, the value on the right side of (16.37), so F is common if and only if ∑ (16.38) t(F ′ , U ) ≥ 0 F ′ ⊆F e(F ′ )>0, even
for every U ∈ W1 . Note that inequality (16.38) has to be true for all U ∈ W1 , not just for U ∈ W0 , so the fact that all terms on the left have nonnegative coefficient does not make this relation trivial. As an example, in the case F = K4′ we get from (16.38) that its commonness follows if we can show that (16.39)
2
+8
Here we can write the left side as ( 2 2 2 +4 +
[[
+
≥ 0 (for W1 ).
+4 )2 +2
]] + 4(
−
) .
It is easy to see (see Exercise 16.30) that the last term is nonnegative. The other terms are squares, which proves (16.39). Locally common graphs. We say that a graph F is locally common, if for every U ∈ W1 there is a 0 < εU ≤ 1 such that if 0 < ε < εU , then t(F, 1 + εU ) + t(F, 1 − εU ) ≥ 2. Franek and R¨odl [1992] proved that K4 is locally common. In fact, the following more general result holds, and can be proved along the lines of the proof of Proposition 16.27, using formula 16.38. Proposition 16.28. Let G be a graph in which the subgraph with the minimum number of edges such that all degrees are at least 2 and the number of edges is even is an even cycle. Then G is locally common. In particular, every bipartite graph is locally common (this follows by Proposition 16.27 as well), and so is every simple graph containing a 4-cycle. Combining ˇ ıˇcek and Thomason [1996] mentioned above, it with the theorem of Jagger, Stov´ follows that every graph that contains a K4 is not common but locally common. Not all graphs are locally common (see Exercise 16.33). Exercise 16.29. Prove that C2 ≥ C4 ≥ C6 ≥ . . . (for W1 ).
16.6. DECIDING INEQUALITIES BETWEEN SUBGRAPH DENSITIES
299
Exercise 16.30. Prove that [[F12 F2 ]] ≤ [[F12 ]] (for W1 ) for any two k-labeled multigraphs F1 and F2 . Exercise 16.31. Suppose that a bipartite graph F contains a 4-cycle. Prove that F ≤ C4 (for W1 ). More generally, if F is not a forest, then F ≤ C2r (for W1 ), where C2r is the shortest cycle in F . Exercise 16.32. Prove that triangles are common (16.35). • Exercise 16.33. Prove that [[C7• C11 ]] is not locally common.
16.6. Deciding inequalities between subgraph densities Now we turn to more general questions in extremal graph theory, as we already indicated in the Introduction. 16.6.1. Undecidability of density inequalities. In analogy with Artin’s Theorem for real polynomials (see Appendix A.7), we may try to represent quantum graphs g with g ≥ 0 as sums of squares or, more generally, as quotients of sums of squares: if y and z are square-sums and y = zg, then g ≥ 0. The following theorem by Hatami and Norine [2011] tells us once and forever that no such “certificate” of nonnegativity can be given. Theorem 16.34. It is algorithmically undecidable whether an inequality g ≥ 0 holds (where g is a quantum graph with rational coefficients). It is a related, but easier, fact that the following is algorithmically undecidable: given simple graphs F1 , . . . , Fm and rational coefficients a1 . . . , am , decide whether ∑ a hom(F, Gi ) ≥ 0 holds for every simple graph G. This follows from a result i i of Ioannidis and Ramakrishnan [1995], proved in a completely different setting of databases; for the adaptation to the graph case, see Exercise 16.43, based on the lecture notes of Kopparty [2011]. The proof will consist of a reduction of the problem of deciding whether a given polynomial p ∈ Z[x1 , . . . , xk ] is nonnegative for every x1 , . . . , xk ∈ N, to the problem of deciding whether x ≥ 0 for a quantum graph g with rational coefficients. Since the latter problem is undecidable by the Theorem of Matiyasevich (see Section A.7 in the Appendix), this will imply that so is the former. To this end, we need a version of Problem A.33, which asks for nonnegativity on another set of numbers, namely on the set { } 1 A = 1 − : n = 1, 2, . . . . n The variables will be represented by edge densities and triangle densities in appropriate graphs, and the fact that Goodman’s bound is only attained for edge densities in A (Corollary 16.12) will then be used to force the edge-densities to be of this form. Proof of Theorem 16.34. We start with some algebraic reductions. Claim 16.35. It is algorithmically undecidable whether an inequality p ≥ 0 (p ∈ Z[x1 , . . . , xk ]) holds on Ak . Indeed, one can reduce deciding if p ≥ 0 on Nk to deciding whether p ≥ 0 on A by a straightforward change of variables. The connection to graph theory is established )by the following reduction. Con( sider the set D2,3 of all pairs t(K2 , W ), t(K3 , W ) where W is a graphon (Figures k
300
16. EXTREMAL THEORY OF DENSE GRAPHS
2 2.1 and 16.1). ( Let D ⊆ D2,3 consist ) of the points (x, y) where x ∈ A and y = 2x −x (i.e., pairs t(K2 , Kn ), t(K3 , Kn ) together with (0, 0) and (1, 1)). For a polynomial p ∈ Z[x1 , . . . , xk ], construct the polynomial in 2k variables:
p∗ (x1 , y1 , . . . , xk , yk ) =
k ∏
(1 − xi )2 p(x1 , . . . , xk ) +
i=1
k ∑
M (yi − 2x2i + xi ),
i=1
where M = 2 max{∥(gradp)(x)∥∞ : x ∈ [0, 1] }. We also construct a polynomial in 3k variables: ( v1 w1 vk wk ) p∗∗ (u1 , v1 , w1 , . . . , uk , vk , wk ) = (u1 . . . uk )N p∗ 2 , 3 , . . . , 2 , 3 , u1 u1 uk uk k
(where N = 5 deg(p∗ ) is large enough to cancel all denominators). k Claim 16.36. The following are equivalent: (i) p ≥ 0 on Ak ; (ii) p∗ ≥ 0 on D2,3 ; ∗ k (iii) p ≥ 0 on D .
Trivially, (ii)⇒(iii)⇒(i). To prove that (i)⇒(ii), assume that p ≥ 0 on Ak , and let (xi , yi ) ∈ D2,3 . We want to prove that p∗ (x1 , y1 , . . . , xk , yk ) ≥ 0. We start with bounding the first term in the definition of p∗ . Let zi ∈ A be closest to xi . Then p(z1 , . . . , zk ) ≥ 0, and so p(x1 , . . . , xk ) ≥ p(x1 , . . . , xk ) − p(z1 , . . . , zk ) = (x − z) · (gradp)(ξ), where ξ ∈ [0, 1]k . By the definition of M , we get that (16.40)
k ∏
(1 − xi )2 p(x1 , . . . , xn ) ≥ −
i=1
k ∏
(1 − xi )2
i=1
≥−
k M∑ |xi − zi | 2 i=1
k M∑ (1 − xi )2 |xi − zi |. 2 i=1
We show that each term is compensated for by the corresponding term in the other part of p∗ , i.e., 1 (16.41) (1 − xi )2 |xi − zi | ≤ yi − 2x2i + xi . 2 Let us assume e.g. that xi ≤ zi (the other case is similar). Let wi ∈ A be the closest point to xi with wi < xi . By Corollary 16.12, yi is above the chord between zi and wi of the parabola 2x2 + x. On the other hand, 2x2i + xi is below the chord between zi and (zi + wi )/2. The slope of the first chord is 2zi + 2wi − 1; the slope of the second, 3zi + wi − 1. The difference in slopes is zi − wi , and so yi − 2x2i + xi ≥ (zi − wi )(zi − xi ). Simple computation shows that (1 − wi )2 1 ≥ (1 − xi )2 . 2 − wi 2 This proves (16.41) and thereby also Claim 16.36. zi − wi =
As a preparation for the rest of the proof, we need to construct some special graphs. We fix a simple graph F with node set [k] that has no automorphisms. For any set of positive integers n1 , . . . , nk , let F (n1 , . . . , nk ) denote the (unlabeled) graph obtained from F by replacing every node i by a set of ni twins. We will call these ni nodes the clones of i.
16.6. DECIDING INEQUALITIES BETWEEN SUBGRAPH DENSITIES
301
As a related construction, we define Fir (i ∈ [k], r ≥ 0) by adding r new twins of node i, making them mutually adjacent and adjacent to i, and labeling the original nodes 1, . . . , k. The node i and the new nodes will be called the clones of i. So F1,r is a k-labeled version of F (r + 1, 1, . . . , 1).
Figure 16.2. Auxiliary graphs for the proof of theorem 16.34. As a further step, we add all missing edges to Fir with negative sign, to get the k-labeled signed graph Fbir (as we have seen, this can be considered as a k-labeled quantum graph). Recall that Fb is defined analogously. Claim 16.37. Every homomorphism of Fb into any graph G is an induced embedding. Indeed, every homomorphism preserves both edges and non-edges by the definition of signed graphs. Suppose that two nodes u, v ∈ V (F ) are mapped onto the same node of G. Then every further node of F must be connected to u and v in the same way, and so interchanging u and v is an automorphism of F , which contradicts the choice of F . Claim 16.38. For every homomorphism of Fb into any of the special graphs F (n1 , . . . , nk ), each node i ∈ V (F ) is mapped onto a clone of i. The proof is similar to the previous one. We already know that the map is injective. For u ∈ V (F ), let σ(u) be defined as the node of F whose clones in F (n1 , . . . , nk ) contain the image of u. No two nodes u, v ∈ V (F ) have σ(u) = σ(v): similarly as before, interchanging two such nodes would be an automorphism of F . Hence σ is an automorphism of F , and hence σ must be the identity. This proves the Claim. Our next observation is that homomorphism densities from the signed graphs Fir into any simple graph G can be expressed quite simply. Let G be any simple graph, and let φ : [k] → V (G), and let S = φ([k]) be its range. Let Uφ,i be the set of nodes in V (G) \ S which are connected to φ(i) and all the neighbors of φ(i) in S, but to no other node in S. We claim that { hom(Kr , G[Uφ,i ]) if φ is an induced embedding of F , (16.42) homφ (Fbir , G) = 0 otherwise. Assume first that φ is an induced embedding of F into G. It is clear that if ψ is any homomorphism of Fbir into G extending φ, then all the clones of i in Fbir must be mapped onto nodes in Uφ,i . Since these twins form a complete graph Kr , the number of ways to map these twins into G[Uφ,i ] homomorphically is hom(Kr , G[Uφ,i ]), and every such map, together with φ, forms a homomorphism of Fbir into G. Claim
302
16. EXTREMAL THEORY OF DENSE GRAPHS
16.37 implies that homφ (Fbir ) = 0 if φ is not an induced embedding of F . Hence (16.42) follows. Normalizing the homomorphism numbers in (16.42), we get that if φ is an induced embedding, then |Uφ,i |r (16.43) tφ (Fbir , G) = t(Kr , G[Uφ,i ]). v(G)r We want to reduce the problem of deciding whether p ≥ 0 on Ak (for a given polynomial p ∈ Z[x1 , . . . , xk ]) to deciding whether g ≥ 0 for a quantum graph g. Given p, we construct the polynomials p∗ ∈ Z[x1 , y1 , . . . , xk , yk ] and p∗∗ ∈ Z[u1 , v1 , w1 , . . . , uk , vk , wk ] as above, and define the k-labeled quantum graph g = p∗∗ (Fb11 , Fb12 , Fb13 , . . . , Fbk1 , Fbk2 , Fbk3 ). The key step in the proof is the following: Claim 16.39. We have [[g]] ≥ 0 if any only if p ≥ 0 on Ak . To start with the “if” direction, assume that p ≥ 0 on Ak . Then p∗ ≥ 0 on by Claim 16.36. We want to prove that t([[g]], G) ≥ 0 for every graph G. This follows if we show that tφ (g, G) ≥ 0 for every map φ : [k] → V (G). To simplify notation, set tir = tφ (Fbir , G). Then k D2,3
tφ (g, G) = p∗∗ (t11 , t12 , t13 , . . . , tk1 , tk2 , tk3 ) (t tk2 tk3 ) 12 t13 = (t11 . . . tk1 )N p∗ 2 , 3 , . . . , 2 , 3 . t11 t11 tk1 tk1 ( ) 3 2 By (16.43), we have (t12 /t11 , t13 /t11 ) = t(K2 , G[Uφ,i ]), t(K3 , G[Uφ,i ]) ∈ D2,3 , and hence it follows that tφ (g, G) ≥ 0. To prove the converse, assume that [[g]] ≥ 0. By Claim 16.36, it suffices to prove that p∗ ≥ 0 on Dk . Let xi = (ni − 1)/ni ∈ A and yi = 2x2i − xi = (n2i − 3ni + 2)/n2i , and consider the graph G = F (n1 +1, . . . , nk +1). By our assumption, t([[g]], G) ≥ 0. We can write ∑ 1 t([[g]], G) = tφ (g, G). v(G)k φ:[k]→V (G)
In every constituent of g, the labeled nodes induce a copy of Fb , which implies by (16.42) that only those terms where φ is an induced embedding are nonzero. By Claim 16.38, such mappings φ map every node of F onto a clone of it, and hence tφ (g, G) is the same for every induced embedding. This implies that tφ (g, G) ≥ 0 for any such map φ. Let us fix an induced embedding φ of F into G. Now we can apply (16.43): the set Uφ,i induces a complete graph with ni nodes, and hence |ni |r (ni )r tφ (Fbir , G) = r t(Kr , Kni ) = . n nr This implies that ( ) n1 (n1 )2 (n1 )3 nk (nk )2 (nk )3 tφ (g, G) = p∗∗ , 2 , 3 ,..., , , n n n n n2 n3 ( n . . . n )N 1 k = p∗ (x1 , y1 , . . . , xk , yk ). nk Since the left side is nonnegative, this implies that p∗ (x1 , y1 , . . . , xk , yk ) ≥ 0 as claimed.
16.6. DECIDING INEQUALITIES BETWEEN SUBGRAPH DENSITIES
303
This proves Claim 16.39, and together with Claim 16.35, it completes the proof of the theorem.
As mentioned in the introduction, an inequality g ≥ 0, where g is a quantum graph, is decidable with an arbitrarily small error:

Proposition 16.40. There is an algorithm that, given a quantum graph g with rational coefficients and an error bound ε > 0, decides either that g ≱ 0 or that g + εK_1 ≥ 0 (if both inequalities are true, then it may return either answer).

Proof. This will follow from Theorem 16.41 in the next section, but let us describe a simple direct proof suggested by Pikhurko. Let g = Σ_F a_F F be the given quantum graph, let a = Σ_F |a_F| e(F), and let ε_1 = ε/a. By Corollary 9.25, there is an integer k ≥ 1 such that all simple graphs with k nodes form an ε_1-net in (W_0, δ_□). Let us check the inequality Σ_F a_F t(F, G) ≥ 0 for all simple graphs G with at most k nodes. If we find a graph that violates it, we know that g ≱ 0. Else, let G be any simple graph. By the definition of k, there is a simple graph G′ on k nodes such that δ_□(G, G′) ≤ ε_1, and hence by the Counting Lemma 10.22, we have

Σ_F a_F t(F, G) ≥ Σ_F a_F t(F, G′) − Σ_F |a_F| e(F) ε_1 ≥ 0 − ε = −ε t(K_1, G),

so we can conclude that g + εK_1 ≥ 0.
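This procedure is exhaustive but easy to mechanize. Below is a minimal Python sketch of its two ingredients: brute-force computation of t(F, G), and the finite check over all graphs with at most k nodes. The encoding of a quantum graph as (edge list, node count, coefficient) triples is our own choice, and the enumeration is of course only feasible for very small k.

```python
from itertools import combinations, product

def hom_density(F_edges, nF, G_edges, nG):
    """t(F, G): fraction of maps V(F) -> V(G) sending edges to edges."""
    Gset = set(G_edges) | {(v, u) for (u, v) in G_edges}
    good = sum(all((phi[u], phi[v]) in Gset for (u, v) in F_edges)
               for phi in product(range(nG), repeat=nF))
    return good / nG ** nF

def all_graphs(k):
    """All simple graphs on at most k labeled nodes (duplicates up to
    isomorphism are harmless for a feasibility check)."""
    for n in range(1, k + 1):
        pairs = list(combinations(range(n), 2))
        for mask in range(2 ** len(pairs)):
            yield [p for i, p in enumerate(pairs) if mask >> i & 1], n

def find_violation(quantum_graph, k):
    """Return a graph G with sum_F a_F t(F, G) < 0 if one exists on
    at most k nodes (certifying that g >= 0 fails), else None."""
    for G_edges, nG in all_graphs(k):
        if sum(aF * hom_density(F, nF, G_edges, nG)
               for F, nF, aF in quantum_graph) < 0:
            return G_edges, nG
    return None

# g = P3 - K2K2, i.e. t(P3,G) >= t(K2,G)^2 (Cauchy-Schwarz): no violation.
g = [([(0, 1), (1, 2)], 3, 1.0), ([(0, 1), (2, 3)], 4, -1.0)]
print(find_violation(g, 4))   # -> None
```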
16.6.2. Positivstellensatz for graphs. Is there a quantum graph g ≥ 0 which is not a square-sum? Hatami and Norine [2011] constructed such a quantum graph. In fact, the existence of such a quantum graph follows from Theorem 16.34, stating that it is algorithmically undecidable whether a quantum graph with rational coefficients is nonnegative. To see this, consider two Turing machines, both working on an input which is a quantum graph g with rational coefficients. We may assume that the constituents of g have no isolated nodes. One of them will look for a graph G with t(g, G) < 0; the other, for a representation of g as a square-sum. If for every input one of them halted, then we could decide whether or not g ≥ 0. So there must be an input g on which both Turing machines run forever; then we have g ≥ 0, and g is not a square-sum. (To be precise, we must add that if g is a square-sum, then it is a square-sum where the coefficients in the quantum graphs y_i in the definition are algebraic real numbers; then there are only a countable number of possibilities, and the second Turing machine can check them in an appropriate order. One needs a method to check, for given k-labeled quantum graphs y_1, . . . , y_k with algebraic coefficients, whether g is obtained from Σ_i y_i^2 by unlabeling and deleting isolated nodes. Such an algorithm follows from Tarski's Theorem on the decidability of the first order theory of the real numbers.)
But not all is lost: the following weaker result was proved by Lovász and Szegedy [2012a].

Theorem 16.41. Let f be a quantum graph. Then f ≥ 0 if and only if for every ε > 0 there is a square-sum g such that ∥f − g∥_1 < ε.

An analogous theorem for nonnegative polynomials was proved by Lasserre [2007].

Proof. The "if" part is trivial. The idea of the proof of the "only if" part is the following. Consider the (unlabeled) quantum graph g = Σ_F a_F F (where only a finite number of the a_F are nonzero). We may assume that no graph F with a_F ≠ 0 contains an isolated node, since removing isolated nodes does not change t(g, W). The condition g ≥ 0 means that h(g) ≥ 0 for every graph parameter h of the form h = t(., W) with W ∈ W_0. This constraint is linear, so we can equivalently require the inequality for every graph parameter of the form h = E(t(., W)), where the expectation is over some probability distribution on graphons (see Section 14.5). By Proposition 14.60, this is equivalent to requiring that h is normalized, isolate-indifferent and reflection positive. We can forget about the normalization, since the condition Σ_F a_F h(F) ≥ 0 is homogeneous. So the question is: does the inequality

(16.44)    Σ_F a_F h(F) ≥ 0

hold for every isolate-indifferent and reflection positive graph parameter h?
This problem can be rephrased in terms of the connection matrix X = M(h, N), whose entries we consider as unknowns. These unknowns are not all different: if (for this proof only) F′ denotes the graph obtained from F by removing its isolated nodes, then we have X_{F_1,F_2} = X_{G_1,G_2} whenever [[F_1F_2]]′ ≅ [[G_1G_2]]′. The reflection positivity conditions mean that X ≽ 0. The question is: do these constraints imply the inequality

(16.45)    Σ_F a_F X_{F,K_0} ≥ 0?
(In this last sum, all the graphs F are unlabeled.) This is just a feasibility problem in semidefinite programming, apart from the "minor" problem that the unknowns form an infinite matrix, and have to satisfy an infinite number of constraints. We will have to cut the program to finite size, which will bring in the error in the theorem. But let us ignore the problems with infinities, and apply the Semidefinite Farkas Lemma: we have to assign, as a Lagrange multiplier, a matrix Y ≽ 0, which has to satisfy

(16.46)    Σ_{F_1,F_2: [[F_1F_2]]′ ≅ F} Y_{F_1,F_2} = a_F

for every F ∈ F^simp (where the summation extends over all partially labeled simple graphs F_1 and F_2). We can rewrite this as

g = Σ_F a_F F = Σ_{F_1,F_2} Y_{F_1,F_2} [[F_1F_2]]′.

Let us write Y = ZZ^T with some matrix Z; this takes care of the semidefiniteness condition (remember, we are ignoring the problem that these matrices are infinite). Then

g = Σ_{F_1,F_2} Σ_m Z_{F_1,m} Z_{F_2,m} [[F_1F_2]]′ = Σ_m [[ ( Σ_F Z_{F,m} F )^2 ]]′,
showing that g is a square-sum.
Now we have to make this argument precise. Let F_k′ denote the set of fully labeled graphs on [k]. Let M denote the linear space of all symmetric matrices indexed by partially labeled simple graphs, let P be the subset of M consisting of positive semidefinite matrices, and let L denote the subspace of matrices satisfying X_{F_1,F_2} = X_{G_1,G_2} whenever [[F_1F_2]]′ ≅ [[G_1G_2]]′. Clearly, P is a convex cone. Let
Φ_k denote the operator mapping a matrix in M to its restriction to F_k′ × F_k′ (this is a finite matrix!). Then M_k = Φ_k M is the space of all symmetric F_k′ × F_k′ matrices, and P_k = Φ_k P is the positive semidefinite cone in M_k. It is also clear that L_k = Φ_k L consists of those matrices X ∈ M_k for which X_{F_1,F_2} = X_{G_1,G_2} whenever [[F_1F_2]]′ ≅ [[G_1G_2]]′. Clearly,

(16.47)    Φ_k(P ∩ L) ⊆ P_k ∩ L_k,

but equality may not hold in general.
Figure 16.3. Spaces and cones in the proof of Theorem 16.41.

We note that the entries of every matrix X ∈ P_k ∩ L_k are in the interval [0, X_{∅,∅}]. Indeed, for any F ∈ F_k′ and the fully labeled edgeless graph U_k ∈ F_k′, the condition X ∈ L_k implies that X_{U_k,F} = X_{F,F} = X_{F,U_k}, and so X ∈ P_k implies that 0 ≤ X_{F,F} ≤ X_{∅,∅}. Since X_{F,G} = X_{FG,FG}, the claim follows.
For k ≤ m, we embed F_k′ into F_m′ by adding m − k isolated nodes labeled k + 1, . . . , m. The corresponding operator restricting F_m′ × F_m′ matrices to F_k′ × F_k′ will be denoted by Φ_{m,k}. We claim that the following weak converse of (16.47) holds:

(16.48)    Φ_k(P ∩ L) = ∩_{m≥k} Φ_{m,k}(P_m ∩ L_m).
Indeed, let A be a matrix that is contained in the right hand side. Then for every m ≥ k we have a matrix B_m ∈ P_m ∩ L_m such that A is a restriction of B_m. Now let m → ∞; by selecting a subsequence, we may assume that all entries of B_m tend to a limit. This limit defines a graph parameter f, which is normalized, isolate-indifferent and flatly reflection positive. By Proposition 14.60, f is reflection positive, and so the matrix M(f) is in P ∩ L and Φ_k M(f) = A.
We may assume that |V(F)| = k whenever a_F ≠ 0. Let A ∈ M_k denote the diagonal matrix with

A_{F,G} = a_F if F = G, and A_{F,G} = 0 otherwise.

Then g ≥ 0 means that A · Z ≥ 0 for all Z ∈ Φ_k(P ∩ L) (where the inner product A · Z of two matrices is defined as Σ_{i,j} A_{ij} Z_{ij}). In other words, A is in the dual cone of Φ_k(P ∩ L). From (16.48) it follows that there are diagonal matrices A_m ∈ M_k such that A_m → A and A_m · Y ≥ 0 for all Y ∈ Φ_{m,k}(P_m ∩ L_m). In other words, A_m · Φ_{m,k} Z ≥ 0 for all Z ∈ P_m ∩ L_m, which can also be written as Φ*_{m,k} A_m · Z ≥ 0, where Φ*_{m,k} : M_k → M_m is the adjoint of the linear map Φ_{m,k} : M_m → M_k. (This adjoint acts by adding 0-s.) So Φ*_{m,k} A_m is in the polar cone of P_m ∩ L_m,
306
16. EXTREMAL THEORY OF DENSE GRAPHS
which is P_m^* + L_m^*. The positive semidefinite cone is self-polar. The linear space L_m^* consists of those matrices B ∈ M_m for which Σ_{F_1,F_2} B_{F_1,F_2} = 0, where the summation extends over all pairs F_1, F_2 ∈ F_m′ with F_1F_2 ≃ F_0, for every fixed graph F_0. Thus we have Φ*_{m,k} A_m = P + L, where P is positive semidefinite and L ∈ L_m^*. Since P is positive semidefinite, we can write it as P = Σ_{j=1}^N v_j v_j^T, where v_j ∈ R^{F_m′}. We can write this as

Σ_{j=1}^N Σ_{F_1,F_2: F_1F_2 ≃ F_0} v_{j,F_1} v_{j,F_2} = (A_m)_{F_0,F_0} if F_0 ∈ F_k′, and 0 otherwise.

In other words,

Σ_{j=1}^N ( Σ_F v_{j,F} F )^2 = Σ_{F_0} (A_m)_{F_0,F_0} F_0.

So the quantum graph on the right side is a sum of squares. Furthermore, if m → ∞, then A_m → A, and so

Σ_{F_0} (A_m)_{F_0,F_0} F_0 → Σ_{F_0} A_{F_0,F_0} F_0 = g = f.
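The step Y = ZZ^T (and likewise P = Σ_j v_j v_j^T) is, in finite dimensions, nothing more than a factorization of a positive semidefinite matrix. As a purely numerical illustration of this step (our own sketch, not part of the proof):

```python
import numpy as np

def square_sum_factor(Y):
    """Return Z with Y = Z Z^T for a symmetric PSD matrix Y, via the
    eigendecomposition; each column of Z yields one square in the sum."""
    w, V = np.linalg.eigh(Y)
    w = np.clip(w, 0.0, None)      # suppress tiny negative round-off
    return V * np.sqrt(w)

Y = np.array([[2.0, 1.0], [1.0, 2.0]])
Z = square_sum_factor(Y)
print(np.allclose(Z @ Z.T, Y))     # True
```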
In view of the usefulness of extending graphs to graphons, it seems natural to define graph algebras of infinite formal linear combinations of graphs with appropriate convergence properties; in other words, of graph parameters. It has not been worked out, however, what the structure of the resulting algebra is, and how it is related to graphons.
Proposition 14.61 suggests that it should be enough to use fully labeled quantum graphs when approximating a nonnegative quantum graph as a square-sum. This is indeed true (see Exercise 16.44), but the approximation is very inefficient, as shown by the following example.

Example 16.42. Consider the quantum graph
g = P_3 − K_2K_2 ≥ 0,

the path with two edges minus two disjoint edges (nonnegativity is the Cauchy–Schwarz inequality t(P_3, W) ≥ t(K_2, W)^2). If we allow unlabeled nodes, then g can be represented (up to labels and isolated nodes) as a square-sum, in fact as a single square: g = [[(K_2^• − K_2K_1^•)^2]]′. But g cannot be represented as a square-sum of fully labeled graphs. To see this, let S_k denote the set of quantum graphs obtained by deleting isolates and unlabeling square-sums Σ_i y_i^2, where every constituent of y_i is a fully labeled graph on k nodes. Consider the graph parameter (for a graph G on k nodes)

f(G) = 3·C(k,4), if e(G) = 0,
       C(k−2,2), if e(G) = 1,
       1,         if E(G) consists of two disjoint edges,
       0,         otherwise

(where C(·,·) denotes the binomial coefficient). Then f(g) < 0. On the other hand, we claim that f(y^2) ≥ 0 if every constituent of y is a fully labeled graph on k nodes. This means that the flat connection matrix M^flat(f, k) is positive semidefinite. By the Lindström–Wilf Formula (A.1), this
is equivalent to saying that the upper Möbius inverse f↑ (see (4.1)) is nonnegative, which is easy to check. Since f is isolate-indifferent, this proves that f is nonnegative on every quantum graph in S_k. It follows that g ∉ S_k for any k.
But g can be approximated by members of S_k for large k. To construct such an approximation, let H_{ij} (1 ≤ i < j ≤ k) be the graph with node set [k] and a single edge connecting i to j. Expanding and unlabeling the quantum graph

h_k^2 = ( (1/(k−1)) Σ_{2≤i≤k} H_{1i} − C(k−1,2)^{−1} Σ_{2≤i<j≤k} H_{ij} )^2,

we get a quantum graph in S_k, and a straightforward computation shows that it tends to g as k → ∞.

Threshold graphons. For a polynomial p in two variables, we call the 0-1 valued graphon U defined by U(x, y) = 1(p(x, y) > 0) the p-threshold graphon.

Proposition 16.49. If p is monotone decreasing on [0, 1]^2, then the p-threshold graphon U is finitely forcible among kernels.

The proof will show that U is determined by the equations
(16.51)    t_{x_1x_2x_3x_4}(Ĉ_4, W) = 0

and

(16.52)    t(K_{a,b}, W) = t(K_{a,b}, U)    (1 ≤ a, b ≤ 2 deg(p) + 4)

(here Ĉ_4 is the signed 4-cycle defined in Example 14.42). It can be conjectured that the monotonicity condition is not needed.

Proof. The condition on the monotonicity of p implies that W = U satisfies (16.51). It is trivial that the equations (16.52) are satisfied by W = U.
Let W ∈ W be any graphon satisfying (16.51)–(16.52). As discussed in Example 14.42, we may assume that W is a threshold graphon, i.e., 0-1 valued and monotone
decreasing. Let S_W = {(x, y) : W(x, y) = 1}. We have

t(K_{a,b}, W) = ∫_{[0,1]^a} ∫_{[0,1]^b} ∏_{i=1}^a ∏_{j=1}^b W(x_i, y_j) dy dx.

Split this integral according to which x_i and which y_j is the largest. Restricting the integral to, say, the domain where x_1 and y_1 are the largest, we have that whenever W(x_1, y_1) = 1 then also W(x_i, y_j) = 1 for all i and j, and hence

∫_{x_1∈[0,1]} ∫_{x_2,...,x_a≤x_1} ∫_{y_1∈[0,1]} ∫_{y_2,...,y_b≤y_1} ∏_{i=1}^a ∏_{j=1}^b W(x_i, y_j) dy dx
  = ∫_{x_1∈[0,1]} ∫_{y_1∈[0,1]} W(x_1, y_1) x_1^{a−1} y_1^{b−1} dy_1 dx_1 = ∫_{(x,y)∈S_W} x^{a−1} y^{b−1} dy dx.

Since there are a choices for the largest x_i and b choices for the largest y_j, this implies that

(16.53)    t(K_{a,b}, W) = ab ∫_{(x,y)∈S_W} x^{a−1} y^{b−1} dy dx.
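Formula (16.53) is easy to sanity-check numerically. For the monotone decreasing threshold graphon U(x, y) = 1(x + y < 1), both sides can be estimated by Monte Carlo sampling; the sketch below (our own illustration) compares the direct density with the right hand side of (16.53), and for a = 2, b = 3 both come out near a!b!/(a+b)! = 0.1.

```python
import random

def t_Kab_direct(a, b, n=200_000):
    """Monte Carlo estimate of t(K_{a,b}, U) for U(x,y) = 1 if x + y < 1:
    all a*b constraints hold iff max(x_i) + max(y_j) < 1."""
    hits = sum(max(random.random() for _ in range(a)) +
               max(random.random() for _ in range(b)) < 1
               for _ in range(n))
    return hits / n

def t_Kab_via_16_53(a, b, n=200_000):
    """Monte Carlo estimate of ab * integral of x^(a-1) y^(b-1) over S_U."""
    s = 0.0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x + y < 1:
            s += x ** (a - 1) * y ** (b - 1)
    return a * b * s / n

print(t_Kab_direct(2, 3), t_Kab_via_16_53(2, 3))   # both approx 0.1
```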
Let us approximate W by a threshold graphon V for which the boundary curve ∂S_V is smooth (except for the endpoints of the intersection of S_V with the boundary of the square), and ε = λ(S_W △ S_V) is very small. Let ds be the arc length of the boundary curve ∂S_V, and let n(x, y) = (n_1(x, y), n_2(x, y)) denote the outward normal of ∂S_V at the point (x, y). By the Gauss–Ostrogradsky Theorem, we can rewrite (16.53) as an integral along the boundary:

t(K_{a,b}, V) = b ∫_{∂S_V} x^a y^{b−1} n_1(x, y) ds.
Interchanging the roles of x and y, and adding, we get

(16.54)    ∫_{∂S_V} x^a y^b ( n_1(x, y) + n_2(x, y) ) ds = (1/(a+1)) t(K_{a+1,b}, V) + (1/(b+1)) t(K_{a,b+1}, V).
Consider the following integral:

I(V) = ∫_{∂S_V} x(1 − x)y(1 − y) p(x, y)^2 ( n_1(x, y) + n_2(x, y) ) ds.
By (16.54), this can be expressed as a linear combination

I(V) = Σ_{a,b=1}^{2 deg(p)+4} c_{ab} t(K_{a,b}, V),
where the coefficients c_{ab} depend only on a, b and p. Furthermore, we have

I(V) = Σ_{a,b=1}^{2 deg(p)+4} c_{ab} t(K_{a,b}, V) ≈ Σ_{a,b=1}^{2 deg(p)+4} c_{ab} t(K_{a,b}, W) = Σ_{a,b=1}^{2 deg(p)+4} c_{ab} t(K_{a,b}, U) = I(U) = 0

(the last equality holds since the factor x(1 − x)y(1 − y) p(x, y)^2 vanishes on ∂S_U),
where the error at the ≈ sign is bounded by Cε for some C that depends only on p. On the other hand, the integrand in I(V) is nonnegative everywhere. Letting ε → 0, we see that x(1 − x)y(1 − y) p(x, y)^2 = 0 on ∂S_W. Since p is monotone decreasing, this implies that ∂S_W = ∂S_U, and hence W = U except perhaps on the boundary (which has measure 0).

Complement reducible graphs. We have seen that complement reducible graphs form a variety (Example 14.43). Many of them are finitely forcible, and they are of a rather different nature: they are sort of fractal-like. Here is a specific example.

Proposition 16.50. For x, y ∈ [0, 1], let U(x, y) = 1 if the first bit where the binary expansions of x and y differ is at an odd position, and let U(x, y) = 0 otherwise. Then U is finitely forcible.

Our examples of finitely forcible graphons so far have had finite range (the function W assumed only a finite set of values). We refer to the paper of Lovász and Szegedy [2011] for the proof of Proposition 16.50 and for further constructions of finitely forcible graphons, where the range of the function W contains an interval.

16.7.3. Not too many finitely forcible graphons. Are there any graphons that are not finitely forcible? A natural extension of the class of stepfunctions is the class of kernels with finite rank. However, we don't get any new finitely forcible graphons in this class. In fact, Theorem 14.48 implies:

Corollary 16.51. Every finitely forcible kernel with finite rank is a stepfunction.

In view of Proposition 16.49, the following further corollary may be surprising:

Corollary 16.52. Assume that W ∈ W_0 can be expressed as a non-constant polynomial in x and y. Then W is not finitely forcible.

We want to derive more general necessary conditions for being finitely forcible. We start with a rather strong property of finitely forcible functions. Recalling the definition of the 2-labeled graph F^‡ from Section 16.2, let L(W) be the linear space generated by all 2-variable functions t_{xy}(F^‡, W), where F ranges over all simple graphs.

Lemma 16.53. Suppose that W ∈ W is forced (in W) by the simple graphs F_1, . . . , F_m. Then either the functions t_{xy}(F_1^‡, W), . . . , t_{xy}(F_m^‡, W) are linearly dependent, or they generate L(W) (or both).

Proof. Suppose not; then there is a simple graph F_{m+1} such that the functions t_{xy}(F_i^‡, W) (i = 1, . . . , m + 1) are linearly independent. For U ∈ W, set
h_k(U; x, y) = t_{xy}(F_k^‡, U). So h_k(U) ∈ W, and h_k : W → W is a (non-linear) operator. Let Φ(U) ∈ W denote the component of h_{m+1}(U) orthogonal to the subspace of W generated by h_1(U), . . . , h_m(U). We need the following technical claim.

Claim 16.54. There is an open L^∞-ball U in W centered at the function W such that Φ : W → W is Lipschitz (in the L^∞ norm) on U.

The operators h_k are L^∞-Lipschitz on W. This follows from the inequality ∥t_{xy}(F, U) − t_{xy}(F, U′)∥_∞ ≤ e(F)∥U − U′∥_∞, which is a simple exercise. Define the (numerical) matrix

B(U) = ( ⟨h_i(U), h_j(U)⟩ )_{i,j=1}^m,

and the matrices B_i(U) obtained from B by replacing the i-th column by (⟨h_1(U), h_{m+1}(U)⟩, . . . , ⟨h_m(U), h_{m+1}(U)⟩)^T. By elementary linear algebra, we have

Φ(U) = h_{m+1}(U) − Σ_{i=1}^m ( det B_i(U) / det B(U) ) h_i(U).

The denominator is bounded away from 0 in a neighborhood U of W, and all the other functions are Lipschitz in this neighborhood, proving that Φ is Lipschitz. This completes the proof of the Claim.
By classical results on differential equations in Banach spaces (see e.g. Zeidler [1985]), there exist a b > 0 and a differentiable family {U_s : s ∈ [−b, b]} of functions in U satisfying the differential equation U̇_s = Φ(U_s), U_0 = W. By (16.7), we have

(d/ds) t(F_i, U_s) = ⟨Φ(U_s), t_{xy}(F_i^‡, U_s)⟩ = 0

for i = 1, . . . , m, and hence t(F_i, U_s) = t(F_i, U_0) = t(F_i, W) for all s ∈ [−b, b]. Since the graphs F_i force W, it follows that the U_s are weakly isomorphic to W, and so t(F_{m+1}, U_s) = t(F_{m+1}, W). But then ⟨Φ(W), t_{xy}(F_{m+1}^‡, W)⟩ = (d/ds) t(F_{m+1}, U_s)|_{s=0} = 0, and so ⟨Φ(W), Φ(W)⟩ = ⟨Φ(W), t_{xy}(F_{m+1}^‡, W)⟩ = 0, which is a contradiction, since Φ(W) ≠ 0.

A corollary of the previous lemma is that every finitely forcible kernel satisfies a "nontrivial" relation of the form t_{xy}(f, W) = 0. To specify what we mean by "nontrivial", let us say that a connected component of a partially labeled graph F is a floating component if it contains no labeled nodes.

Corollary 16.55. If the kernel W ∈ W is finitely forcible, then there is a nonzero simple 2-labeled quantum graph g with nonadjacent labeled nodes and no floating components such that t_{xy}(g, W) = 0 almost everywhere.

Proof. The linear dependence of the functions t_{xy}(F^‡, W) gives a simple 2-labeled quantum graph with nonadjacent labeled nodes of the form f = Σ_i α_i F_i^‡ that satisfies t_{xy}(f, W) = 0 almost everywhere. To see that f ≠ 0, it suffices to
note that from any constituent of F_i^‡ we can recover F_i by connecting its labeled nodes, so no cancellation can occur.

As an application of Corollary 16.55, we prove:

Theorem 16.56. The set of finitely forcible graphons is of first category in (W̃_0, δ_□).

Proof. For a fixed set {F_1, . . . , F_k} of simple 2-labeled graphs with no floating components, let T(F_1, . . . , F_k) denote the set of graphons W for which there is a nonzero quantum graph of the form f = Σ_{i=1}^k a_i F_i satisfying t_{xy}(f, W) = 0 for all x, y ∈ [0, 1]. Corollary 16.55 implies that every finitely forcible graphon belongs to one of the sets T(F_1, . . . , F_k), so it suffices to prove that these sets are nowhere dense. We do so in two steps.

Claim 16.57. Let F_1, . . . , F_k be simple 2-labeled graphs with no floating components, and let W ∈ W_0. Then every neighborhood of W contains a graphon W′ such that t_{xy}(F_1, W′), . . . , t_{xy}(F_k, W′) are linearly independent.

It is easy to see that there is a simple 2-labeled graph G such that [[GF_1]], . . . , [[GF_k]] are mutually non-isomorphic (a large complete graph K_n^{••}, with an edge incident with the node labeled 1 removed, suffices). Proposition 5.44 implies that there are graphons U_1, . . . , U_k such that the matrix ( t([[GF_i]], U_j) )_{i,j=1}^k is nonsingular. For 0 < ε < 1/k, define

W^ε = (1 − kε)W ⊕ (ε)U_1 ⊕ · · · ⊕ (ε)U_k

(so the components of W^ε are W, U_1, . . . , U_k, scaled by 1 − kε, ε, . . . , ε).
First we show that W^ε → W in (W̃_0, δ_□) as ε → 0. Indeed, for every connected simple graph F, we have

t(F, W^ε) = (1 − kε)^{v(F)} t(F, W) + ε^{v(F)} ( t(F, U_1) + · · · + t(F, U_k) ),

and hence t(F, W^ε) → t(F, W) as ε → 0.
Next, we show that t_{xy}(F_1, W^ε), . . . , t_{xy}(F_k, W^ε) are linearly independent for all ε > 0. If not, then there are real numbers a_i, not all zero, such that

Σ_{i=1}^k a_i t_{xy}(F_i, W^ε) = 0

for all x, y ∈ [0, 1]. Suppose that 1 − kε + (j − 1)ε ≤ x, y ≤ 1 − kε + jε; then every choice of the variables for which one of the unlabeled nodes has value outside the interval [1 − kε + (j − 1)ε, 1 − kε + jε] contributes 0 to t_{xy}(F_i, W^ε). Hence

Σ_{i=1}^k a_i ε^{v(F_i)} t_{xy}(F_i, U_j) = 0    (j = 1, . . . , k)

for all x, y ∈ [0, 1]. Multiplying by t_{xy}(G, U_j) and integrating, we get

Σ_{i=1}^k a_i ε^{v(F_i)} t([[GF_i]], U_j) = 0    (j = 1, . . . , k).

But this contradicts the nonsingularity of the matrix ( t([[GF_i]], U_j) ), and proves the claim.
Claim 16.58. If F_1, . . . , F_k are simple 2-labeled graphs with no floating components, then T(F_1, . . . , F_k) is nowhere dense in (W̃_0, δ_□).

Indeed, Claim 16.57 implies that every nonempty open set in (W̃_0, δ_□) contains a graphon U such that t_{xy}(F_1, U), . . . , t_{xy}(F_k, U) are linearly independent. Then their Gram determinant det( t([[F_iF_j]], U) )_{i,j=1}^k is positive. But this determinant is continuous in U, and so there is a neighborhood of U in which it does not vanish; hence t_{xy}(F_1, U′), . . . , t_{xy}(F_k, U′) are linearly independent for every U′ in this neighborhood. This proves the claim, and thereby the theorem.

16.7.4. Infinitesimally finitely forcible graphons. Lemma 16.53 suggests the following notion. We say that W is infinitesimally finitely forcible if L(W) has finite dimension. To explain the name, suppose that the functions t_{xy}(F_1^‡, W), . . . , t_{xy}(F_k^‡, W) generate L(W). Informally, this means that every infinitesimal change in W that preserves t(F_1, W), . . . , t(F_k, W) also preserves t(F, W) for every F.
The following observation, contrasted with Corollary 16.51, shows that infinitesimal finite forcibility and finite forcibility behave quite differently:

Lemma 16.59. Every infinitesimally finitely forcible kernel has finite rank.

Proof. If L(W) has finite dimension d, then consider the functions

t_{xy}(C_k^‡, W) = k t_{xy}(P_k^{••}, W) = k W^{◦k} ∈ L(W),    k = 3, . . . , d + 3.

These are linearly dependent, and so W satisfies a polynomial equation as an operator. This means that it has a finite number of different nonzero eigenvalues. Since every nonzero eigenvalue has finite multiplicity, W has finite rank.

The following corollary shows that the (false) conjecture mentioned above, that only stepfunctions are finitely forcible, is true in a weaker sense.

Corollary 16.60. Graphons that are both finitely forcible and infinitesimally finitely forcible are exactly the stepfunctions.

This corollary implies that our examples of finitely forcible non-stepfunctions (e.g., the simple threshold graphon) are finitely forcible but not infinitesimally finitely forcible. We do not know any examples in the other direction.

Proof. If a graphon is both finitely forcible and infinitesimally finitely forcible, then it has finite rank by Lemma 16.59, and so it is a stepfunction by Corollary 16.51. Conversely, we know by Theorem 16.46 that every stepfunction W is finitely forcible. Since every function t_{xy}(F^‡, W) is itself a stepfunction with the same steps, L(W) is finite dimensional, so W is infinitesimally finitely forcible.

Summarizing Lemma 16.53 and Corollary 16.60, we get the following.

Corollary 16.61. Suppose that W ∈ W is forced (in W) by the simple graphs F_1, . . . , F_m. Then either W is a stepfunction, or the functions t_{xy}(F_i^‡, W) (i = 1, . . . , m) are linearly dependent.
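The stepfunction side of Corollary 16.60 rests on the concrete fact that a stepfunction with q steps acts as an operator of rank at most q. A quick numeric illustration of this rank bound on a discretized kernel (our own sketch, not from the text):

```python
import numpy as np

# A stepfunction kernel with q steps, sampled on n points, is B indexed by
# step labels, i.e. a q x q symmetric value matrix blown up to n x n;
# its rank is therefore at most q.
rng = np.random.default_rng(0)
q, n = 3, 300
B = rng.random((q, q)); B = (B + B.T) / 2   # symmetric step values
steps = rng.integers(0, q, size=n)          # which step each point falls in
W = B[np.ix_(steps, steps)]                 # discretized kernel matrix
print(np.linalg.matrix_rank(W) <= q)        # True
```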
Remark 16.62. 1. All the examples of finitely forcible graphons discussed above (indeed, all the examples we know of) have dimension at most 1 (in the sense of the topology of the graphon discussed in Section 13.4). Most likely this is just due to the lack of more involved constructions; but it is not too far-fetched to ask: does every finitely forcible graphon have finite dimension? Together with Proposition 13.34 this would imply that finitely forcible graphons have polynomial-size weak regularity partitions. Together with Conjecture 16.45 and the properties of finitely forcible graphons proved above, this would provide nontrivial "templates" for extremal graphs, and possibly provide some help in finding the extremal graphs for specific extremal graph problems by imposing limitations on them.
2. We have seen a number of "finiteness" conditions on a graphon W:
(a) W is a stepfunction;
(b) W has finite rank;
(c) W is finitely forcible;
(d) W is infinitesimally finitely forcible;
(e) the graph parameter t(., W) has finite connection rank, or equivalently, the corresponding gluing algebras Q_k/W have finite dimension;
(f) the spaces (J, r_W) and/or (J, r̄_W) are finite dimensional.
We could add further such conditions, like (g) the algebras Q_k/W are finitely generated (this is true not only for stepfunctions, but also for the simple threshold function, for example). Several implications between these finiteness properties have been proved in this book, but several others are only conjectured.

Exercise 16.63. Prove that the simple threshold graphon (Example 11.36) is forced by the conditions (16.51) and t(P_3, W) − t(K_2, W) + 1/6 = 0.
Exercise 16.64. Show that for every kernel W there is a nonzero simple 2-labeled quantum graph g with nonadjacent labeled nodes (which may have floating components) such that t_{xy}(g, W) = 0.
Exercise 16.65. Which implications among the finiteness conditions (a)–(g) in Remark 16.62 are proved in this book? Which others are trivial/easy/possible?
CHAPTER 17
Multigraphs and decorated graphs Limit objects can be defined for multigraphs, directed graphs, colored graphs, hypergraphs etc. In many cases, like directed graphs without parallel edges, or graphs with nodes colored with a fixed number of colors, this can be done along the same lines as for simple graphs. Turning to multigraphs, even the definition of homomorphisms is not unique, as we have discussed in Chapter 5. In one version, a homomorphism F → G is a map V (F ) → V (G) where the image of any edge has at least as large multiplicity as the edge itself (node-homomorphism); in another version, to specify a homomorphism between multigraphs, we have to tell the image of every node as well as the image of every edge (node-and-edge homomorphism). We also mentioned homomorphisms that preserve edge-multiplicities (induced homomorphisms). But this is not the main complication. To illuminate the content of this chapter, let us discuss informally convergence of multigraphs. We get to the most general question in several steps. We want to define convergence of a multigraph sequence (G1 , G2 , . . . ) in terms of the convergence of the homomorphism densities t(F, Gn ) for every F , and want to construct a limit object that appropriately reflects the limiting values. (1) In the previous chapters, this program was carried out in detail (maybe even in more detail than you wished to see) in the case when the graphs Gn as well as the graphs F were simple. (2) Suppose that the graphs Gn are multigraphs, but we care about the densities of simple graphs F only. In this case, node-homomorphisms mean nothing new, but node-and-edge homomorphisms do. Let us assume for the time being that the edge multiplicities in the graphs Gn remain uniformly bounded by a fixed constant d. This case is quite easy, and it has been settled (even in greater generality) by Borgs, Chayes, Lov´ asz, S´os and Vesztergombi [2008]: the limit object can be described by a kernel with values in [0, d], and the proofs are rather straightforward generalizations of the proofs from case (1). (3) Let the graphs Gn be multigraphs with bounded edge multiplicities as before, but we want the limit object to correctly reflect densities of multigraphs F . This case is more interesting. It turns out that whether we consider nodehomomorphisms or node-and-edge homomorphisms does not matter much (this is not obvious at the first sight). Nor do the numerical values of the edge multiplicities: we can think of them just as decorations of the edges from the set K = {0, 1, . . . , d}, and the only relevant property of this set is that it is finite. Here comes the first surprise: the limit object can again be defined as a function on [0, 1]2 , but its values are not numbers, but probability distributions on K (in other words, d-tuples of numbers). The second surprise is that one can generalize the results to decorations from a set K that is any compact Hausdorff space. Once the right statement of 317
318
17. MULTIGRAPHS AND DECORATED GRAPHS
the results is found, the proofs can be obtained by essentially the same techniques as before. These results of B. Szegedy and the author will be discussed in Section 17.1. (4) Let us backtrack and generalize in another direction: we allow unlimited edge multiplicities for the graphs Gn , but are only interested in densities of simple graphs F . The limit object is, not too surprisingly, an unbounded kernel. But the treatment becomes more technical; one needs appropriate bounds on the growth of edge multiplicities, and even then, one has to modify the definition of the cut norm and strengthen the Regularity Lemma to get the proofs. Some preliminary results of L. Szak´acs and the author [unpublished] are described in the internet notes [Notes]. (5) Finally, if we have sequences of graphs with unbounded edge-multiplicities and we want a limit object that correctly reflects densities of multigraphs, then we have to combine the ideas of questions (3) and (4). Here the cases of node-homomorphism densities and node-and-edge homomorphism densities diverge: there will be graph sequences that are convergent in the node-homomorphism sense but not in the node-and-edge homomorphism sense. Kolossv´ary and R´ath [2011] showed how to assign limit objects if we work with node-homomorphisms; these results can also be derived from the results mentioned in point (3) above, by compactifying the set of integers. The limit object is a function defined on [0, 1]2 , whose values are probability distributions on N. One expects that under appropriate bounds on the growth of edge multiplicities, these limit objects will also be valid for the node-and-edge homomorphism densities. However, as far as I know, no details have been worked out here. 17.1. Compact decorated graphs 17.1.1. Sampling and homomorphism numbers. We often encounter graphs with a special decoration: most often we color nodes or edges with a finite number of colors, but in some cases the objects used for decoration are more complicated, like kernels in W. Multiplicities of edges can be thought of as decorations from N, and nodeweights and edgeweights can be thought of as decorations from R. In this section we sketch how to extend the results about convergence and limit of simple graphs to the more general setting when we decorate every edge ij G of a simple graph G by an element βij of an arbitrary, but fixed compact Hausdorff space K. Most of this is based on the work of Lov´asz and Szegedy [2012b]. It will be convenient to assume that K contains a special element called 0, where an edge decorated with 0 means that it is missing (one can always add an element to K to play this role). This way we may assume that the underlying simple graph is Kn◦ , the complete graph on [n] with a loop edge on every node. We denote by Fn (K) the set of all K-decorated graphs on [n], and by F(K), the set of all K-decorated graphs. If K is finite, then so is Fn (K). If K is endowed with a topology, then Fn (K) is a topological space, endowed with the product topology of a finite number of copies of K. Compactness of K implies that Fn (K) is compact. We can identify graphs in Fn (K) that are isomorphic (in the obvious sense: the nodes can be permuted so that we get the same decoration for every edge); we get a topological space Fn (K)/Sn , which is also compact.
17.1. COMPACT DECORATED GRAPHS
319
To define subgraph sampling for K-decorated graphs is straightforward: For G ∈ F(K) and k ∈ [v(G)], let G(k, G) denote the K-decorated graph obtained by selecting a random ordered subset (v1 , v2 , . . . , vk ) of V (G) uniformly, and decorating the edge ij of Kn◦ by βvGi ,vj . While G(k, G) comes with labeled nodes, it is clear that this graph with any other labeling of its nodes arises with the same probability. Here comes the first (little) surprise: to define homomorphism numbers and homomorphism densities for K-decorated graphs is not straightforward. There is no natural way to define hom(F, G) for two K-decorated graphs F and G. What we can do is the following. Let C denote the space of continuous real valued functions on K. For every map φ : V (F ) → V (G), where F is a C-decorated graph and G is a K-decorated graph, we define a real value ∏ F G homφ (F, G) = βij (βφ(i)φ(j) ). 1≤i 0 for every function f > 0 for all (x, y) ∈ [0, 1], then there is a K-graphon represented by Wf (x, y). Indeed, for every fixed (x, y) ∈ [0, 1]2 , the functional f 7→ Wf (x, y) is a linear functional on C that is positive on positive functions. The Riesz representation ∫ theorem implies that there is a probability measure ω(x, y) such that Wf (x, y) = K f dω(x, y), and it is not hard to check that ω(x, y) defines a K-graphon. It is enough to know the values Wf (x, y) for f ∈ B, where B is a generating system; this determines the values Wf (x, y) for all f ∈ C, and through this, the probability distributions ω(x, y). We call the system of functions (Wf : f ∈ B) the B-moment representation of ω. (The name refers to the fact that for various natural choices of K and B, the numbers t(F, Wf ) (F ∈ B) behave similarly to the moments of a single-variable function. See Example 17.3). The construction assigning a graphon to every simple graph extends in a straightforward manner: every K-decorated graph G gives rise to a K-graphon ωG as follows. Let V (G) = [n]. We split the unit interval into n intervals J1 , . . . , Jn G for x ∈ Ji , y ∈ Jj (here we identify the of length 1/n, and let ωG (x, y) = βij G G element βij ∈ K with the probability distribution concentrated on βij ). It is also straightforward to extend homomorphism densities. For every Kgraphon ω and C-decorated graph F on V (F ) = [k], we introduce the homomorphism density t(F, ω) by ∫ ∏ F (xi , xj ) dx1 . . . dxk . t(F, ω) = Wβi,j [0,1]k
1≤i 1 is not allowed here, since these values don’t form a convergent sequence. So counting node-and-edge homomorphisms between multigraphs with unbounded edge multiplicities does not fit in this model. We know that convergence of a sequence does not depend on the choice of the generating system, so we can characterize it either through induced homomorphisms or node-homomorphisms. It is easy to see that sequence of multigraphs (Gn : n = 1, 2, . . . ) is convergent if and only if for every k ≥ 0, truncating the edge multiplicities at k gives a convergent sequence (now graphs with bounded edge multiplicities). This is a reasonable definition, but if we want to describe convergence of nodeand-edge homomorphism densities, or convergence of a sequence of weighted graphs with unbounded edge-weights, then we have to work more, as the following example shows. Example 17.10. Let Gn be the multigraph on [n] where the multiplicity of the edge connecting 1 to 2k is 4k for 1 ≤ k ≤ log n, and all other edge multiplicities are 1. This graph sequence is convergent in the compactification sense. However, the edge
densities t(K_2, G_n) do not form a convergent sequence: we have t(K_2, G_n) ∼ 11/3 if n is a power of 2, but t(K_2, G_n) ∼ 5/3 if n is one less than a power of 2.
Lovász and Szakács [unpublished] gave a different definition of convergence for multigraph sequences with unbounded edge multiplicities and constructed an appropriate limit object such that the densities of simple graphs converge to a limit for every convergent multigraph sequence; see Lovász [Notes].
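The two limit values in Example 17.10 are easy to reproduce numerically. The sketch below (our own illustration) uses the density convention t(K_2, G_n) = hom(K_2, G_n)/n^2, where hom(K_2, G_n) sums the multiplicities over ordered pairs of distinct nodes; along n = 2^m the special edges contribute about (8/3)n^2, along n = 2^m − 1 only about (2/3)n^2.

```python
def t_K2(n):
    """t(K2, G_n) for the multigraph of Example 17.10: the edge {1, 2^k}
    has multiplicity 4^k (as long as 2^k <= n), all other pairs 1."""
    total = n * (n - 1)               # ordered pairs with multiplicity 1
    k = 1
    while 2 ** k <= n:
        total += 2 * (4 ** k - 1)     # raise the pair {1, 2^k} from 1 to 4^k
        k += 1
    return total / n ** 2

for m in (10, 15, 20):
    print(t_K2(2 ** m), t_K2(2 ** m - 1))   # approx 11/3 and 5/3
```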
Part 4
Limits of bounded degree graphs
CHAPTER 18
Graphings This next part of the book treats convergence and limit objects of bounded degree graphs. We fix a positive integer D, and consider graphs with all degrees bounded by D. Unless explicitly said otherwise, this degree bound will be tacitly assumed. In this chapter we introduce infinite graphs that generalize finite bounded degree graphs. Their main role will be to serve as limit objects for sequences of bounded degree graphs, analogous to the role of graphons in the previous part. Graphons (symmetric functions in two variables) are very common objects and of course they have been studied for many reasons since the dawn of analysis. Graphings are less common; however, they are interesting on their own right, and in fact, they too have been studied in other contexts, mainly in connection with group theory. The situation will be more complex than in the dense case, and there will be no single “true” limit object. But the connection between these objects is quite interesting. As a further warning, it is not known whether the objects to be discussed in this chapter are all limit objects of sequences of finite graphs. This makes it even more justified to treat them separately from convergent finite graph sequences. In this part we will consider finite graphs, countably infinite graphs, and even larger graphs (typically of continuum cardinality). To keep notation in check, we will denote finite graphs by F, F ′ , G, G′ , G1 . . . , and families of finite graphs by calligraphic letters. In particular, we denote by G the family of all finite graphs (with all degrees bounded by D). We denote countable graphs by H, H ′ , H1 , . . . , and their families by Gothic letters. In particular, G denotes the family of all countable graphs (with all degrees bounded by D). Graphs with larger cardinality will be denoted by boldface letters like G, G′ , . . . ; we will not talk about families of them. 18.1. Borel graphs We start with a quite general notion. Let (Ω, B) be a Borel sigma-algebra; then (Ω, B) is separating and generated by a countable family J = {J1 , J2 , . . . } of subsets of Ω. It will be convenient to assume that the generator set J is closed under complementation and finite intersections, which implies that it is a Boolean algebra (not a sigma-algebra!). We call the sets in B and also in B × B etc. Borel sets. (The reader who likes more concrete structures can think of this as the sigmaalgebra of Borel sets in [0, 1], with J consisting of finite unions of open intervals with rational endpoints.) Let G be a graph with node set V (G) = Ω. We call G a Borel graph, if its edge-set is a Borel set in B × B. For example, the complete graph on Ω is Borel, 329
330
18. GRAPHINGS
but we will be interested in graphs with all degrees bounded by D, and will tacitly assume this condition for these infinite graphs too. Example 18.1. For a fixed a ∈ (0, 1), we define a graph Pa on [0, 1] by connecting two points x and y if |x − y| = a. This defines a Borel graph. The graph structure of this Borel graph is quite simple: it is the union of finite paths. If a > 1/2, then it is just a matching together with isolated nodes. Of course, the Borel structure adds additional structure. We can make the example more interesting, if we wrap the interval [0, 1] around, and consider the graph Ca on [0, 1) in which a node x is connected to x+a (mod 1) and x−a (mod 1). If a is irrational, we get a graph that consists of two-way infinite paths; if a is rational the graph will consist of cycles. The following lemma is very useful and it also motivates some of the definitions in the sequel. Lemma 18.2. A graph G on a Borel space (Ω, B) is a Borel graph if and only if for every Borel set B ∈ B, the neighborhood NG (B) is Borel. Proof. Suppose that G is a Borel graph, and let B ∈ B. Then B ′ = E(G) ∩ (B × Ω) is a Borel set. Furthermore, if we project B ′ to the second coordinate, then the inverse image of any point is finite. A classical theorem of Lusin [1930] implies that the projection is also Borel; but this projection is just NG (B). Conversely, assume that G has the property that the neighborhood of any Borel set is also Borel. Let Pi (i = 1, 2, . . . ) range over all partitions of Ω into a finite number of sets in J . We claim that ∩ ∪ ( ) J × NG (J) ; (18.1) E(G) = i J∈Pi
this will prove that G is Borel. First, let (x, y) ∈ E(G). If x ∈ J ∈ Pi , then y ∈ NG (J) and so (x, y) ∈ J × NG (J). Hence it follows that (x, y) is contained in the right hand side of (18.1). Second, let (x, y) be a pair contained in the right hand side of (18.1), then for every i ≥ 1 there is a set J ∈ Pi for which (x, y) ∈ J × NG (J). Hence y ∈ NG (J), and so there is a point zi ∈ J such that y ∈ NG (zi ), which means that zi ∈ NG (y). But NG (y) is a finite set, and so this can hold for all sets J ∈ Pi , J ∋ x only if x ∈ NG (y). This shows that (x, y) ∈ E(G). There is a rather rich theory of Borel graphs (see e.g. Kechris and Miller [2004]). We state and prove only a few results that we need. The following theorem, which extends Brooks’ Theorem from finite graphs to Borel graphs, was proved by Kechris, Solecki and Todorcevic [1999]. We say that a coloring of the nodes of a Borel graph is a Borel coloring, if nodes with any given color form a Borel set. Theorem 18.3. Every Borel graph has a Borel coloring with D + 1 colors. Proof. We start with constructing a countable Borel coloring. Consider the countable Boolean algebra J = {J1 , J2 , . . . } generating Borel sets. For every node v ∈ Ω there is a set Ji such that N (v) ⊆ Ji but v ∈ / Ji . Let ψ(v) denote the smallest index i for which Ji has this property. Then trivially the sets ψ −1 (i) (i = 1, 2, . . . ) are disjoint. Two adjacent nodes u and v cannot have ψ(u) = ψ(v),
so the sets ψ^{−1}(i) (i = 1, 2, . . . ) consist of nonadjacent points. It is easy to check (using Lemma 18.2) that these sets are Borel sets.
Next, for each i = 1, 2, . . . , we recolor each node in ψ^{−1}(i) with the least color that does not occur among its neighbors. We do this in order, so the nodes in ψ^{−1}(2) get recolored before the nodes in ψ^{−1}(3), etc. Using Lemma 18.2, it is easy to prove that those points that switch their color to a given j form a stable Borel set. Hence the final coloring is Borel. It is trivial that every new color is at most D + 1.

Now we turn to coloring the edges. Shannon's Theorem asserts that the edges of a multigraph with maximum degree D can be colored by at most 3D/2 colors. For simple graphs, Vizing's Theorem gives the better bound of D + 1. For Borel graphs, only a weaker result is known (which is quite trivial in the finite case):

Theorem 18.4. Every Borel graph has a Borel edge coloring with 2D − 1 colors.

Proof. We define a new Borel graph, the "line-graph" L(G) of G, defined on the set E(G) (which, as a subset of Ω × Ω, is equipped with a sigma-algebra of Borel sets), where two edges of G are adjacent if and only if they share a common endpoint. It is straightforward to see that this graph L(G) is Borel, and has degrees at most 2D − 2. Applying Theorem 18.3 to L(G), we get the theorem.

If we have a Borel graph, then various basic constructions lead to Borel sets and functions. We state and prove this fact for the degree function, but many other similar assertions can be proved (see the Exercises below). For every set A ⊆ Ω and x ∈ Ω, let deg^G_A(x) denote the number of neighbors of x in A. We suppress the superscript G if there is only one graph around.

Lemma 18.5. Let G be a Borel graph. Then for every Borel set A ⊆ V(G), deg_A(x) is a Borel function of x.

Proof. Let P_i (i = 1, 2, . . . ) range over all partitions of V(G) into D sets from the generator set J. Then

deg_A(x) = max_{i∈N} Σ_{J∈P_i} 1_{N_G(A∩J)}(x),

showing that this function is Borel.
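The recoloring step in the proof of Theorem 18.3 is the measurable counterpart of a completely finite procedure. As a plain finite illustration (our own sketch, for a graph given by adjacency lists), the following code first partitions the nodes into independent classes, playing the role of the coloring ψ, and then recolors class by class with the least color missing from the neighborhood; since degrees are at most D, no color beyond D + 1 is ever needed.

```python
def coloring_in_the_style_of_18_3(adj):
    """Greedy (D+1)-coloring mimicking the proof of Theorem 18.3.
    `adj`: dict mapping each node to a list of its neighbors."""
    # Step 1: a proper partition into independent classes (the role of psi).
    classes = []
    for v in adj:
        for cls in classes:
            if not any(u in cls for u in adj[v]):
                cls.add(v)
                break
        else:
            classes.append({v})
    # Step 2: recolor class by class with the least color unused around v;
    # nodes within a class are nonadjacent, so the order inside a class
    # is immaterial, exactly as in the measurable argument.
    color = {}
    for cls in classes:
        for v in cls:
            used = {color[u] for u in adj[v] if u in color}
            color[v] = min(c for c in range(len(adj) + 1) if c not in used)
    return color

print(coloring_in_the_style_of_18_3({1: [2, 3], 2: [1, 3], 3: [1, 2], 4: []}))
```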
Let G be a graph (of any cardinality). We define the local distance of two nodes u, v ∈ V(G) by

d_◦(u, v) = inf{ 2^{−r} : B_{G,r}(u) ≅ B_{G,r}(v) }.

This turns V(G) into a semimetric space. Unfortunately, two different nodes may be at distance 0: this happens exactly when there is an automorphism of G moving one onto the other. (This is the first occurrence of the "curse of symmetry" in this part; it has caused difficulties in Chapter 6, and it will haunt us when constructing graphings or designing local algorithms for large graphs.) We call the topology defined by this semimetric the local topology. This also defines a local topology on V(G) × V(G). Assuming that there are no points at distance 0, the local distance defines an ultrametric space (i.e., the triangle inequality holds in a very strong sense: d_◦(x, y) ≤ max{d_◦(x, z), d_◦(z, y)}). This implies (or it is easy to see directly) that the set of nodes whose r-neighborhood has a fixed isomorphism type is both closed and open, and such sets form an open basis. The space is totally disconnected.
Proposition 18.6. Let G be a bounded degree graph (of any cardinality), and suppose that G has no automorphism. Then E(G) is closed in the local topology, and hence G is a Borel graph with respect to the Borel space defined by the local topology.

Proof. Let xy ∉ E(G), and let y_1, . . . , y_d be the neighbors of x. Since G has no automorphism, there is an r ≥ 1 such that B_{G,r}(y) ≇ B_{G,r}(y_1), . . . , B_{G,r}(y_d). We claim that if u, v ∈ V(G) are such that d_◦(u, x) < 2^{−r} and d_◦(v, y) < 2^{−r}, then uv ∉ E(G). Assume that uv ∈ E(G). By the definition of the distance function, d_◦(u, x) < 2^{−r} implies that B_{G,r+1}(x) ≅ B_{G,r+1}(u). Let, say, y_1 correspond to v under this isomorphism; then B_{G,r}(y_1) ≅ B_{G,r}(v). But d_◦(v, y) < 2^{−r} implies that B_{G,r}(v) ≅ B_{G,r}(y), so B_{G,r}(y_1) ≅ B_{G,r}(y), a contradiction.

What to do if G has automorphisms? One possibility is to decorate the nodes from some set K of "colors", in order to break all automorphisms. A similar construction will be described in Section 18.3.4, and here we don't go into the details.

Exercise 18.7. Let G be a Borel graph, and let us add all edges that connect nodes at distance 2. Prove that the resulting graph G^2 is Borel.
Exercise 18.8. Let G be a Borel graph. Prove that for every 1-labeled simple graph F, the quantity hom_u(F, G) is well-defined, and it is a Borel function of u ∈ V(G).
Exercise 18.9. Let G be a Borel graph and let V_k denote the set of nodes with degree k. Prove that V_k is a Borel set.
Exercise 18.10. Let G be a Borel graph and let V_k denote the union of its finite components with k nodes. Prove that V_k is a Borel set.
Exercise 18.11. Prove that every Borel graph has a maximal stable set of nodes that is Borel.
Exercise 18.12. Prove that if a graph with bounded degree has no automorphism, then its cardinality is at most continuum.
18.2. Measure preserving graphs

Now we come to the definition of graphs that will serve as limit objects for convergent sequences of bounded degree graphs. We endow the sigma-algebra (Ω, B) with a probability measure λ. We say that a graph G with node set Ω is measure preserving, or a graphing, if it is Borel and for any two measurable sets A and B, we have

(18.2)    ∫_A deg_B(x) dλ(x) = ∫_B deg_A(x) dλ(x).
In other words, “counting” the edges between A and B from A, we get the same as counting them from B. To be precise, a graphing is a quadruple G = (Ω, B, λ, E), where Ω = V (G) is a set, B = B(G) is a Borel sigma-algebra on Ω, λ = λG is a probability measure on B, and E = E(G) ∈ B × B is a Borel set satisfying the measure preserving condition (18.2). Remark 18.13 (On terminology). The name “graphing” was introduced by Adams [1990], and it refers to the representation of the classes of an equivalence relation as the connected components of a Borel graph. It seems, however, that the usage is shifting to the one above. Since these objects are analogous to our
“graphons” in the dense case (whose name comes from the contraction of graphfunction), I like the parallel “graphon–graphing”, and will adopt the above meaning. Besides the probability measure λ on the points, there are two (related) measures that often play a role. The integral measure of the degree function is often called the volume: ∫ deg(x) dλ(x). (18.3) Vol(A) = A
The volume of the whole underlying set is the average degree: ∫ (18.4) d0 = Vol(Ω) = deg(x) dλ(x). Ω
We can normalize the volume to get a probability measure λ*(A) = Vol(A)/Vol(Ω). We call the distribution λ* the stationary distribution of G; the name refers to the random walk on G.
We can define a finite measure η = η_G on (Ω × Ω, B × B) by

(18.5)    η(A × B) = ∫_A deg_B(x) dλ(x)

for product sets (A, B ∈ B). It is not hard to see that Caratheodory's Theorem applies, and we can extend η to the sigma-algebra B × B. If we want a probability measure on the edges, we can normalize by the average degree: the measure η/d_0 can be considered as the uniform probability measure on E(G). Equation (18.2) implies that η is invariant under interchanging the coordinates. Both marginals of η give the volume measure.

Lemma 18.14. The measure η is concentrated on E(G).

Proof. Let J = {J_1, J_2, . . . }. We claim that

(18.6)    E(G) = (Ω × Ω) \ ∪_{i=1}^∞ ( J_i × (Ω \ N_G(J_i)) ).
It is clear that E(G) is contained in the right hand side. Conversely, if (x, y) ∉ ∪_i ( J_i × (Ω \ N_G(J_i)) ), then for each i for which J_i ∋ x, we have y ∈ N_G(J_i). So there is a z_i ∈ J_i adjacent to y. Since y has finite degree, this can hold for each J_i ∋ x only if x is adjacent to y. This proves (18.6). Since η( J_i × (Ω \ N_G(J_i)) ) = 0 by the definition of η, equation (18.6) implies that η(Ω × Ω \ E(G)) = 0.

Assuming that the average degree is positive, one way to generate a random edge from the distribution η/d_0 is to select a point x from the distribution λ*, and then select an edge incident with x uniformly at random. Conversely, selecting a random edge from the distribution η/d_0, and then selecting randomly one of its endpoints, we get a point from the distribution λ*. To describe the connection between λ and λ* in this language, we can generate a point from λ* by generating a random point x according to λ, and keeping it with probability deg(x)/D (else, rejecting it and generating a new one). If there are no isolated nodes, we can generate a point from λ by generating a random point x according to λ*, and keeping it with probability 1/deg(x).
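These acceptance–rejection recipes are directly implementable. The following sketch (our own illustration; a graphing is represented abstractly by a sampler for λ, a degree oracle, and the degree bound D) converts samples of λ into samples of λ* and back; the second direction assumes deg(x) ≥ 1 almost everywhere.

```python
import random

def sample_stationary(sample_lambda, deg, D):
    """Sample from lambda* (density proportional to deg) by rejection:
    draw x from lambda and accept it with probability deg(x)/D."""
    while True:
        x = sample_lambda()
        if random.random() < deg(x) / D:
            return x

def sample_lambda_from_stationary(sample_stat, deg):
    """Sample from lambda by rejection from lambda*: accept x with
    probability 1/deg(x) (no isolated nodes assumed)."""
    while True:
        x = sample_stat()
        if random.random() < 1.0 / deg(x):
            return x

# The cyclic graphing C_a is 2-regular, so lambda* = lambda and the
# acceptance test always succeeds in expectation:
print(sample_stationary(random.random, lambda x: 2, D=2))
```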
Example 18.15. If D = 1, then every graphing G is the graph of an involution φ : S → S for some set S ⊆ V(G). Since S = N_G(V(G)), it is measurable. Furthermore, for any measurable A ⊆ S we have

λ( φ^{−1}(A) ) = ∫_{φ^{−1}(A)} deg_A(x) dλ(x) = ∫_A deg_{φ^{−1}(A)}(x) dλ(x) = λ(A),

and so φ is a measure preserving map.
Example 18.16 (Graphings from graphs). For any finite graph F, we define a graphing G_F as follows. Let V(F) = [n], and let us split the unit interval [0, 1) into n intervals J_i = [(i − 1)/n, i/n). For every edge ij ∈ E(F) with i < j, let us connect every point x ∈ J_i to x + (j − i)/n ∈ J_j. It is not hard to verify that the resulting graph G_F is measure preserving. Every connected component of G_F is isomorphic to F. See Figure 18.1(a) for the graphing on [0, 1] representing the pentagon. The picture is similar to the pixel picture of the graphon associated with a simple graph, except that instead of a black square, we have a white square with a diagonal.

Example 18.17 (Cyclic graphing). Consider the graph C_a introduced in Example 18.1. Endowing it with the uniform measure on [0, 1] turns it into a graphing. If a is rational, then every connected component of C_a is a cycle; else, every connected component is a 2-way infinite path. In this latter case, we call C_a an irrational cyclic graphing (Figure 18.1(b)).
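Both examples are concrete enough to code: the neighbors of a point are obtained by explicit translations. The sketch below (our own illustration, with an interface of our choosing) returns the neighbor lists in G_F and in C_a; measure preservation is visible in the fact that each rule is a piecewise shift.

```python
def neighbors_GF(x, edges, n):
    """Neighbors of x in the graphing G_F of Example 18.16; `edges` is the
    edge list of a graph F on [n], given as 1-based pairs (i, j), i < j."""
    i = int(x * n) + 1                      # x lies in the interval J_i
    nbrs = []
    for a, b in edges:
        if a == i:
            nbrs.append(x + (b - a) / n)    # shifted copy in J_b
        elif b == i:
            nbrs.append(x - (b - a) / n)    # shifted copy in J_a
    return nbrs

def neighbors_Ca(x, a):
    """Neighbors of x in the cyclic graphing C_a of Example 18.17."""
    return [(x + a) % 1.0, (x - a) % 1.0]

C5 = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]       # the pentagon
print(neighbors_GF(0.1, C5, 5))   # a point of J_1: neighbors in J_2 and J_5
```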
Figure 18.1. Three graphings: (a) the pentagon as a graphing, (b) a cyclic graphing, (c) the union of a symmetric finite set of segments with slopes ±1; every such picture defines a graphing. The slopes of ±1 correspond to the measure preservation.

Remark 18.18. It may be useful to allow graphs that are measurable with respect to the completion A of B with respect to the probability measure λ (in the case of the interval [0, 1], this means allowing Lebesgue measurable sets instead of Borel sets). We call a graph Lebesgue measurable if for every set A ∈ A, its neighborhood N_G(A) ∈ A. The correspondence between graphings and Lebesgue measurable graphs is described in Exercises 18.30–18.32.

18.2.1. Verifying measure preservation. Suppose that we have a measurable graph G with a probability measure λ on the node set. How can we verify that this graph is measure preserving? Let us describe some methods to do so.
Edge measure. The simplest method, which often works well, is to specify a measure η on the edge set satisfying (18.5). To be more precise, we consider a probability space (Ω, B, λ) and a Borel graph G on it. Suppose that there exists a finite measure η on the Borel sets in Ω × Ω which is invariant under interchanging the coordinates, is concentrated on E, and whose marginal is the volume measure Vol. This trivially implies that (18.2) holds.

Borel subgraphs. Every subgraph of a graphing that is in a sense explicitly definable is itself a graphing: there is no constructive way to violate (18.2). The following lemma makes this precise.

Lemma 18.19. Let G = (Ω, B, λ, E) be a graphing, and let L ⊆ E be a symmetric Borel set. Then G′ = (Ω, B, λ, L) is a graphing.

Proof. Let A, B ⊆ Ω be Borel sets. We want to show that

(18.7)    ∫_A d^{G′}_B(x) dλ(x) = ∫_B d^{G′}_A(x) dλ(x).
First we prove that this equation holds when L = E ∩ [ (S × T) ∪ (T × S) ] with two disjoint Borel sets S, T. Indeed, for any two Borel sets A and B,

∫_A d^{G′}_B(x) dλ(x) = ∫_{A∩S} d^G_{B∩T}(x) dλ(x) + ∫_{A∩T} d^G_{B∩S}(x) dλ(x)
  = ∫_{B∩T} d^G_{A∩S}(x) dλ(x) + ∫_{B∩S} d^G_{A∩T}(x) dλ(x) = ∫_B d^{G′}_A(x) dλ(x).
A similar computation shows that (18.7) holds if L = E ∩ (S × S) for any Borel set S.
To prove the lemma in general, we use induction on the degree bound D. For D = 1 the assertion is trivial. Let ε > 0, let J = {J_1, J_2, . . . } be a countable generator set of B, and let P_n be the partition of V(G) into the atoms generated by J_1, . . . , J_n. Let X_n be the set of points with degree D all of whose neighbors belong to the same class of P_n. Since any two points are separated by P_n if n is large enough, we have ∩_n X_n = ∅, and hence λ(X_n) ≤ ε if n is large enough. Let us fix such an n. For S ∈ P_n, let S′ = S \ X_n.
For every S, T ∈ P_n, the graph G(S, T) obtained by restricting the edge set of G to (S′ × T′) ∪ (T′ × S′) is measure preserving by the special case proved above. In G(S, T), each point has degree at most D − 1, by the definition of S′ and T′. Hence by the induction hypothesis, restricting the edge set to E(G(S, T)) ∩ L we get a graphing G′(S, T), which means that

∫_A d^{G′(S,T)}_B(x) dλ(x) = ∫_B d^{G′(S,T)}_A(x) dλ(x).

Since the graphings G′(S, T) are edge-disjoint, it follows that G_1 = ∪_{S,T} G′(S, T) is measure preserving. We get G_1 from G′ by deleting all edges incident with X_n, and hence for any two measurable sets A and B, we have

∫_A d^{G′}_B(x) dλ(x) = ∫_A d^{G_1}_B(x) dλ(x) + ∫_A d^{G′}_{B∩X_n}(x) dλ(x) + ∫_{A∩X_n} d^{G′}_{B\X_n}(x) dλ(x).

Here

∫_A d^{G′}_{B∩X_n}(x) dλ(x) ≤ ∫_A d^G_{B∩X_n}(x) dλ(x) = ∫_{B∩X_n} d^G_A(x) dλ(x) ≤ Dλ(X_n) ≤ Dε,

and

∫_{A∩X_n} d^{G′}_{B\X_n}(x) dλ(x) ≤ Dλ(X_n) ≤ Dε.

Hence

| ∫_A d^{G′}_B(x) dλ(x) − ∫_B d^{G′}_A(x) dλ(x) | ≤ | ∫_A d^{G_1}_B(x) dλ(x) − ∫_B d^{G_1}_A(x) dλ(x) | + 2Dε = 2Dε.

Since ε was arbitrarily small, this proves (18.7).
Corollary 18.20. The intersection and union of two graphings on the same probability space are graphings.

Proof. Let G_1 and G_2 be the two graphings, and consider G_1 ∩ G_2 (we keep the underlying point set and do the set operation on the edge set). This is a Borel subgraph of G_1, and hence it is a graphing. The assertion about the union is trivial if the graphings are edge-disjoint. In the general case, consider the graphs G_1 \ G_2, G_2 \ G_1 and G_1 ∩ G_2. These three graphs are Borel subgraphs of one of the graphings G_1 and G_2, and hence they are graphings. But then so is their union, which is just G_1 ∪ G_2.

Measure preserving families. Another way to "certify" the measure preservation condition is to use the simpler notion of invertible measure preserving maps. Let A_1, . . . , A_k, B_1, . . . , B_k be measurable subsets of a Borel space (Ω, B), and let φ_i : A_i → B_i be invertible measure preserving maps. The tuple H = (φ_1, . . . , φ_k) will be called a measure preserving family (see Gaboriau [2002], Kechris and Miller [2004]). From every measure preserving family H we get a directed multigraph on Ω by connecting x to y for every i such that y = φ_i(x). The edges of this digraph are colored with k colors in such a way that each color class defines a measure preserving bijection between two measurable subsets of Ω. Forgetting the orientation and the edge-colors of this digraph, we get a graph G with degrees bounded by 2k, which we call the support graph of the measure preserving family.
It is perhaps more natural to assume that the maps φ_1, . . . , φ_k are involutions, in which case we get an undirected graph right away. We say that such a measure preserving family is involutive. A little advantage of working with involutions is that we could extend the maps φ_i to measure preserving involutions Ω → Ω, and would not have to worry about the domains A_i and ranges B_i. A graphing with its edges colored and oriented so that each color defines an invertible measure preserving map is equivalent to a measure preserving family. Conversely, every graphing can be "certified" by an appropriate measure preserving family.
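Anticipating Example 18.22 below, here is what such a certificate looks like in code for the cyclic graphing C_a with irrational a < 1/2: three explicit involutions whose graphs together give C_a. The block-indexing helper and the function names are our own; each map is a piecewise translation, hence measure preserving.

```python
import math

def block(x, a):
    """Index j with x in (j*a, (j+1)*a]."""
    return math.ceil(x / a) - 1

def make_phi(a, parity):
    """Involution pairing x with x + a when x's block index has the given
    parity (and x + a <= 1); partner points are mapped back down."""
    def phi(x):
        j = block(x, a)
        if j % 2 == parity and x + a <= 1:
            return x + a
        if j % 2 != parity and x - a > 0:
            return x - a
        return None          # x is outside the domain of this involution
    return phi

def phi_wrap(x, a):
    """Third involution: pairs x with x + a - 1 across the wrap-around."""
    if 1 - a < x <= 1:
        return x + a - 1
    if 0 < x <= a:
        return x + 1 - a
    return None

a = 2 ** 0.5 - 1                         # an irrational a < 1/2
phis = (make_phi(a, 0), make_phi(a, 1), lambda x: phi_wrap(x, a))
x = 0.3
print(sorted(p(x) for p in phis if p(x) is not None))  # the two neighbors of x
```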
Theorem 18.21. A graph supporting a measure preserving family is a graphing. Conversely, for every graphing G there is an involutive measure preserving family with at most 2D − 1 parts supported by G. This measure preserving family is not unique in general.

Proof. If the family consists of a single measure preserving map φ, then it is easy to check that its support graph is measure preserving. For a general measure preserving family, this follows by Corollary 18.20. To prove the converse, Theorem 18.4 implies that the edges of G can be split into at most 2D − 1 Borel sets that are matchings. By Lemma 18.19, the involutions that these matchings define are measure preserving.

While it is nicer to work with measure preserving involutions, the generality allowed in the first statement of the previous theorem has some merits. It is often easier to construct a family of non-involutory measure preserving maps to support the graph. Furthermore, while the minimum number of maps in a measure preserving family with a given support graph and the minimum number of maps in an involutive family can be mutually bounded by factors of 2, the former may be smaller. Both of these merits are illustrated by the following example.

Example 18.22. Consider an irrational cyclic graphing Ca as in Example 18.17; we may assume without loss of generality that 0 < a < 1/2. This graphing is 2-regular, and as a graph it consists of disjoint two-way infinite paths. We claim that Ca cannot be represented by two involutions. Suppose that the involutions φ1 and φ2 define Ca. Then each point x ∈ V(Ca) is matched with x − a (mod 1) by φ1 and with x + a (mod 1) by φ2, or the other way around. Let A1 denote the set of points matched the first way, and A2 the rest. Then trivially A1 and A2 are Borel sets, A1 ∪ A2 = V(Ca), and A2 = A1 + a (mod 1). But this is impossible by basic results in ergodic theory, since the map x ↦ x + a is ergodic.

On the other hand, we can represent Ca by three involutions: one matches points x and x + a if 0 < x ≤ 1 − a and x ∈ (2ka, (2k + 1)a] for some k ∈ N; the other matches points x and x + a if 0 < x ≤ 1 − a and x ∈ ((2k + 1)a, (2k + 2)a] for some k ∈ N; the third matches points x and x + a − 1 if 1 − a < x ≤ 1. (A numerical sketch of these three involutions is given after the exercises below.)

Example 18.23 (Squaring the circle). Answering a problem of Tarski, Laczkovich [1990] proved that a circular disc D can be partitioned into a finite number of sets, and these can be translated so that they form a partition of a square S with the same area. (It is not known whether this can be achieved by measurable pieces.) This result gives rise to interesting graphings. Let X1, X2, ..., Xm be the pieces of D, and let v1, ..., vm be the translation vectors (so that X1 + v1, ..., Xm + vm form a partition of S). We may assume that D and S are disjoint, and λ(D) = λ(S) = 1/2. We define a bipartite graph G on D ∪ S by connecting x ∈ D to y ∈ S iff y − x ∈ {v1, ..., vm}. Clearly G is a Borel graph, and every point has degree at most m. Furthermore, the edges that are defined by the same vector vi define a measure preserving map between D ∩ (S − vi) and (D + vi) ∩ S. Hence by Theorem 18.21, G is a graphing.

The theorem of Laczkovich is equivalent to saying that the vectors v1, ..., vm can be chosen so that the resulting graphing G has a perfect matching. It is not
known whether (for an appropriately rich family of translations) it has a Borel perfect matching. (Exercise 18.29 shows that a graphing can have a perfect matching, but no Borel perfect matching.)

Exercise 18.24. Let G be a graphing, and let us add all edges that connect nodes at distance 2. Prove that the resulting graph G^2 is a graphing.

Exercise 18.25. Let G be a graphing in which every connected component has at most k nodes. Let S ⊆ V(G) be a measurable set that intersects every connected component. Prove that λ(S) ≥ 1/k.

Exercise 18.26. Let G be a graphing on [0, 1], let E′ ⊆ E(G) be a symmetric Borel set, and E′′ = E(G) \ E′. Consider the graphings G′ and G′′ on [0, 1] defined by the edge sets E′ and E′′ (cf. Lemma 18.19). Prove that η_G = η_{G′} + η_{G′′}.

Exercise 18.27. Let G be a graphing, and let S ⊆ E(G) be a (not necessarily symmetric) Borel set. For x ∈ V(G), let d_S^+(x) denote the number of pairs (x, y) ∈ S, and let d_S^−(x) denote the number of pairs (y, x) ∈ S. Prove that ∫_{V(G)} d_S^+ dλ = ∫_{V(G)} d_S^− dλ.

Exercise 18.28. Let Gi (i = 1, 2) be a graphing on [0, 1]. Define the categorical product G1 × G2 (as a graph on [0, 1] × [0, 1]), and prove that it is a graphing (cf. Aldous and Lyons [2007]).

Exercise 18.29. Let Ca be an irrational cyclic graphing. (a) Show that it contains a perfect matching, but no Borel measurable perfect matching (Laczkovich [1988]). (b) Show that if M is any Borel measurable matching in Ca, then there is an augmenting path: a path of odd length such that its endpoints are not covered by M, but every second edge on the path belongs to M (Elek and Lippner [2010]).

Exercise 18.30. Prove that for every Lebesgue measurable graph G there is a set T ⊆ V(G) with measure 1 such that the subgraph G[T] obtained by restricting the edge set to T × T is Borel.

Exercise 18.31. Show by an example that there is a Borel graph on [0, 1] that is not Lebesgue measurable.

Exercise 18.32. (a) Let G be a Borel graph such that λ(N_G(A)) = 0 for all A ⊆ V(G) with λ(A) = 0. Prove that G is Lebesgue measurable. (b) Prove that every graphing is Lebesgue measurable.
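To make the three involutions of Example 18.22 concrete, here is a small numerical sketch (our illustration, not part of the book's text). The choice a = √2 − 1 is an arbitrary irrational, floating-point boundary cases (a set of measure zero) are ignored, and the checks are random spot tests rather than a proof.

```python
import math, random

a = math.sqrt(2) - 1          # an arbitrary irrational with 0 < a < 1/2

def phi12(x, parity):
    # phi1 (parity = 0) matches x with x + a for 0 < x <= 1 - a and
    # x in (2ka, (2k+1)a]; phi2 (parity = 1) does so for x in ((2k+1)a, (2k+2)a]
    k = math.floor(x / a)
    if x <= 1 - a and k % 2 == parity:
        return x + a          # x is the lower point of a matched pair
    if x - a > 0 and (k - 1) % 2 == parity:
        return x - a          # x is the upper point of a matched pair
    return x                  # elsewhere the involution acts as the identity

def phi3(x):
    # matches x with x + a - 1 for 1 - a < x <= 1 (the "wrap-around" edges)
    if x > 1 - a:
        return x + a - 1
    if x <= a:
        return x + 1 - a
    return x

maps = [lambda t: phi12(t, 0), lambda t: phi12(t, 1), phi3]
random.seed(0)
for _ in range(10000):
    x = random.random()
    for f in maps:
        assert abs(f(f(x)) - x) < 1e-9          # each map is an involution
    nbrs = sorted(y for y in (f(x) for f in maps) if abs(y - x) > 1e-9)
    expected = sorted(((x + a) % 1.0, (x - a) % 1.0))
    assert len(nbrs) == 2
    assert all(abs(p - q) < 1e-9 for p, q in zip(nbrs, expected))
print("the three involutions support exactly the edges of C_a")
```

Each sampled point is moved by exactly two of the three maps, and the two images are x + a and x − a (mod 1), so the support graph of this involutive family is indeed Ca.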
18.3. Random rooted graphs

The construction of Benjamini and Schramm [2001] for limit objects of bounded degree graph sequences is different from graphings, but closely related to them. (It is interesting to note that the relationship of these objects with convergent graph sequences is not completely known, but their relationship with each other is quite well understood.) They will be helpful throughout, in particular in extending the notion of weak isomorphism to graphings and characterizing it.

Let us pick a random point x of a graphing G. (When talking about a random point of a graphing G = (Ω, B, E, λ), we mean that it is selected according to the probability measure λ.) Consider the connected component Gx containing it: this is a countable graph with bounded degree, and it has a special “root”, namely the node x. So we have generated a random rooted connected graph with bounded degrees. We start by making precise sense of this, by constructing an interesting Borel graph, the “graph of graphs”.
18.3.1. The graph of graphs. Let G• denote the set of connected countable graphs (with all degrees bounded by D) that also have a specified node called their root. We denote the root of a graph H ∈ G• by root(H). (We could consider these graphs as 1-labeled, but in this context calling the single labeled node the “root” is common.) Sometimes we will also write H = (H′, v), where v = root(H), and H′ = [[H]] is the unrooted graph underlying H. For every rooted graph H, we denote by deg(H) the degree of its root. The set of finite graphs in G• will be denoted by G•_f. We consider two graphs in G• the same if there is an isomorphism between them that preserves the root.

Let B_r ⊆ G•_f denote the set of r-balls, i.e., the set of finite rooted graphs in which every node is at a distance at most r from the root. (Since we keep the degree bound D fixed, the set B_r is finite.) For a rooted countable graph (H, v) ∈ G•, let B_{H,r} = B_{H,r}(v) ∈ B_r denote the neighborhood of the root with radius r. For every r-ball F, let G•_F denote the set of “extensions” of F, i.e., the set of those graphs H ∈ G• for which B_{H,r} ≅ F (as rooted graphs).

With all this notation, we can define something more interesting. First we define a graph H on the set G•. Let (H, v) ∈ G•. For every edge e = vv′ ∈ E(H), connect (H, v) by an edge to the rooted graph (H, v′) ∈ G•. So every edge of H incident with v gives rise to an edge of H incident with (H, v). In particular, all degrees in H are bounded by D. We call H the “Graph of Graphs”.

The r-neighborhood of a rooted graph H in the graph of graphs is almost the same as the r-neighborhood of the root in H. To be precise, if [[H]] has no automorphism, then the r-ball in the graph of graphs about the node H is isomorphic to B_{H,r}(root(H)). The image of v ∈ V(H) under this isomorphism is obtained by moving the root of H to v. However, if there is an automorphism of H moving root(H) to v, then the “curse of symmetry” strikes again, and this map is not one-to-one.

We endow the set G• with a metric: for two graphs H1, H2 ∈ G•, define their ball distance by

d•(H1, H2) = inf{2^{−r} : B_{H1,r} ≅ B_{H2,r}}.

(This is reminiscent of the semimetric defining the local topology of a graph, but it is defined on a different set.) This turns G• into a metric space. It is easy to see (Exercise 18.43) that the sets G•_F are both closed and open, that they form an open basis, and that the space (G•, d•) is compact and totally disconnected. (For finite graphs, a computational sketch of this distance is given at the end of Section 18.3.) The sigma-algebra of Borel sets of (G•, d•) will be denoted by A. As usual, every subset of G• you will ever need, and every function G• → R you will ever define, will be Borel. The graph H is Borel with respect to the sigma-algebra A; this follows by the same argument as Proposition 18.6.

18.3.2. Invariant measures. You may have noticed that we have not defined any measure on the set G•. We will in fact consider many probability measures on it; these measures will carry the real information. Let σ be any probability measure on (G•, A). It is easy to see that the degree deg of the root is a measurable function on G•. Nodes with different degrees cause some complication here, and it will be best to introduce right away another probability measure on G•: we define

σ*(A) = (∫_A deg dσ) / (∫_{G•} deg dσ).
Clearly these integrals are finite (at most D). If the denominator is 0, then σ is concentrated on the graph consisting of a single node (the only connected graph with average degree 0). In this trivial case, we set σ* = σ.

Next, we introduce a very important condition on the distribution, which expresses that all possible roots of a graph are taken into account judiciously. (The meaning of this condition will be clearer when we get to limits of graph sequences.) Select a rooted graph H according to the distribution σ*, and then select a uniformly random edge e incident with the root. We consider e as oriented away from the root. This way we get a probability distribution σ^→ on the set G^→ of graphs in G• with an oriented edge (the “root edge”) from the root also specified. We say that σ is involution invariant (another name commonly used is unimodular) if the map G^→ → G^→ obtained by reversing the orientation of the root edge is measure preserving with respect to σ^→. By an involution invariant random graph we mean a random rooted connected graph drawn from an involution invariant probability measure on G•.

Example 18.33. Let G ∈ G be a connected finite graph. Selecting a root from V(G) uniformly at random defines a probability distribution σ_G on G• (concentrated on rooted copies of G). If we select the root v with probability proportional to the degree of v, and a root edge e incident with v uniformly, then a simple computation shows that the edge is uniformly distributed among all oriented edges, and so the distribution σ_G is involution invariant. (A small sketch of this computation follows Figure 18.2.)

Example 18.34 (Path). Let P denote the two-way infinite path with any node chosen as a root. The distribution on G• concentrated on P is involution invariant, since selecting any root edge we still get a distribution concentrated on a single graph, so reversing the edge preserves this distribution.

Example 18.35 (Triangular Ribbon). Let P be the two-way infinite path and let R be the “ribbon” obtained from P by connecting every pair of nodes at distance 2 (Figure 18.2(a)). If we specify any node as its root, we get a connected countable 4-regular rooted graph R•. The distribution on G• (where D = 4) concentrated on R• is involution invariant. To see this, note that if we select an oriented edge as a root, we get only two edge-rooted graphs H′ and H′′ (up to isomorphism): either an edge of P is selected, or an edge not on P. Furthermore, reversing the edge yields an isomorphic edge-rooted graph, so the distribution on {H′, H′′} is preserved, and the distribution is involution invariant.
Figure 18.2. (a) The triangular ribbon. (b) The grandmother graph. Note that the neighborhood of a node reveals the orientation of the tree.
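The finite case of Example 18.33 is easy to verify exactly. The following sketch (our illustration, not from the book; the example graph is arbitrary) computes the distribution of the oriented root edge and checks that it is uniform, and hence preserved by reversal.

```python
from fractions import Fraction

# an arbitrary small connected graph, as an adjacency dict
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
total_degree = sum(len(nbrs) for nbrs in adj.values())   # = 2 * (number of edges)

# root chosen with probability deg(v)/2m, then a root edge uniform at the root
prob = {}
for u, nbrs in adj.items():
    p_root = Fraction(len(nbrs), total_degree)
    for v in nbrs:
        prob[(u, v)] = p_root * Fraction(1, len(nbrs))

# every oriented edge is reached with the same probability 1/(2m), ...
assert all(p == Fraction(1, total_degree) for p in prob.values())
# ... so reversing the root edge leaves the distribution unchanged
assert all(prob[(u, v)] == prob[(v, u)] for (u, v) in prob)
print("sigma_G is involution invariant for this graph")
```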
Example 18.36 (Grandmother graph). Let T be a two-way infinite binary tree, and let us connect every node to its grandparent (Figure 18.2(b)). The resulting 8-regular connected graph H has a node-transitive automorphism group, and hence if we specify any node as its root, we get isomorphic rooted graphs. The distribution on G• (where D = 8) concentrated on H is, however, not involution invariant. To see this, we again determine the possible edge-rooted graphs that we obtain from H. We get 4 types: an edge of T oriented “up”, an edge of T oriented “down”, an edge not in T oriented “up”, and an edge not in T oriented “down”. It is not hard to check that these are non-isomorphic, and the probabilities they are obtained with are (in the above order) 2/8, 1/8, 4/8, 1/8. Reversing the root edge interchanges the first two and the last two probabilities, so the distribution is not involution invariant.

18.3.3. Graphings and random rooted graphs. Let us recall the simple construction at the beginning of this section, providing a link between graphings and involution invariant distributions. Let G be a graphing and choose a random point x ∈ V(G). The connected component Gx of G containing x, with root x, is a graph in G•, which we call a random rooted component of G. The map x ↦ Gx, which we will call the component map, is measurable as a map (V(G), A(G)) → (G•, A), and thus every graphing G defines a probability distribution σ = σ_G on (G•, A). Selecting x from the distribution λ*, the graph Gx will be a random rooted connected graph from the distribution σ*. Selecting an edge of Gx incident with x, we get an edge of G from the probability distribution η_G/d0, together with an orientation. Since η_G is symmetric, shifting the root to the other endpoint does not change the distribution. Hence σ is involution invariant.

So every graphing gives rise to a (well-defined) involution invariant random graph; we also say that the graphing represents this distribution. The following converse to this statement was known in various related contexts for some time; for written versions, see Aldous and Lyons [2007] and Elek [2007a].

Theorem 18.37. Every involution invariant probability distribution on G• can be represented by a graphing.

Here we don't claim uniqueness any more. This will be quite relevant a little later! Before proving this theorem, let us consider a couple of examples.

Example 18.38 (Grid). Consider the involution invariant random graph concentrated on the infinite planar grid (with any root). We can construct a graphing representing this by taking two irrational reals α and β such that 1, α and β are linearly independent over the rationals, and connecting every x ∈ [0, 1) to x + α (mod 1), x − α (mod 1), x + β (mod 1) and x − β (mod 1). There are many other constructions; for example, we could take the unit square [0, 1)^2 as the underlying probability space, and connect (x, y) to (x ± α (mod 1), y) and to (x, y ± β (mod 1)). Every connected component of this graphing will be an infinite grid.

Example 18.39. Consider the involution invariant random graph that is concentrated on the D-regular tree with a root (which is unique up to isomorphism). How can we represent this by a graphing?

For D = 2, an irrational cyclic graphing is a graphing representation. Here is another one: let us randomly 2-color the two-way infinite path with colors 0 and 1. The sequence of colors to the right from the root (including the root) can be thought of as a number
x ∈ [0, 1]. (Let us ignore the ambiguity that one can write rational numbers whose denominator is a power of two in two different ways; this involves a set of measure 0 anyway.) Similarly, the sequence of colors to the left of the root (this time excluding the root) gives a number y ∈ [0, 1]. So every point of the unit square corresponds to a 2-colored two-way infinite path (with a root), and this correspondence is bijective. Shifting the root to the right by one step corresponds to replacing x by 2x (mod 1), and y by y/2 if x < 1/2 and by y/2 + 1/2 if x ≥ 1/2. (This map, as a transformation of {0, 1}^Z, is called a Bernoulli shift. In its other incarnation as a transformation of the unit square, it is sometimes called the dough folding map.) The graphing will be defined on [0, 1]^2 (with the Lebesgue measure), and every point (x, y) will be connected to its image and to its inverse image under the dough folding map.

For D > 2 a geometric construction of a representing graphing is more complicated. We can start from the fact that a D-regular tree is the Cayley graph of a group freely generated by D involutions. This group can be represented, for example, by reflections in D generic hyperplanes through the origin in D-space. If we take the surface of the unit sphere in R^D with the uniform probability distribution, and connect every point to its images and inverse images under these reflections, we get a graphing representing the infinite D-regular tree. (Points on the D hyperplanes in which we reflect will have lower degree than D; but we can delete these points and all their images under the group, which is a set of measure 0, and then we get a graphing in which every connected component is a D-regular tree.)

As a preparation for the proof of Theorem 18.37, we describe a rather simple construction of a measurable graph from an involution invariant random graph (which, unfortunately, does not always represent the right measure). Consider the “graph of graphs” H constructed in Section 18.3.1. We have seen that H is a Borel graph. Unfortunately, this Borel graph with the measure σ does not represent the involution invariant distribution σ in general. For example, in Example 18.34 the measure is concentrated on a single node of H (the rest has measure 0), and this is too simple to represent anything nontrivial. Exercise 18.47 describes an even worse example, showing that an involution invariant measure does not necessarily turn the graph H into a graphing. The problem is clearly caused by the symmetries in the graphs σ is concentrated on. This motivates the following lemma.

Lemma 18.40. Let σ be an involution invariant distribution on G• such that almost all graphs from σ have no nontrivial automorphism (as unrooted graphs). Then (H, σ) is a graphing that represents σ.

Proof. First, we prove that (H, σ) is a graphing. Choose a rooted graph (H, v) ∈ G• from the distribution σ*, and a random neighbor u of v (uniformly from the neighbors). By the assumption that almost surely H has no nontrivial automorphism, the graph (H, u) is almost surely different from (H, v). The pair {(H, v), (H, u)} is an edge of H, and selecting another neighbor of v, we would get a different edge of H. This describes a way to generate a random edge of H (with an orientation). It follows from the involution invariance of σ that this distribution on edges of H is invariant under flipping the orientation. Since the marginal of this edge distribution is σ*, this implies that (H, σ) is measure preserving.
Let (H, v) be any rooted graph from σ with no nontrivial automorphism. We want to argue that the connected component of H containing (H, v), rooted at (H, v), is isomorphic to (H, v). Indeed, assigning the role of the root to different nodes of H gives non-isomorphic rooted graphs, and so we get an injection V(H) → G•. From the definition of adjacency in H, this embedding preserves adjacency and nonadjacency, and the range is a connected component of H. This proves that (H, σ) represents σ.

18.3.4. The Bernoulli Graphing. To prove Theorem 18.37, we have to break the symmetries of the graphs from σ. For this, we generalize the “graph of graphs” construction. Let G+ denote the set of triples (H, v, α), where (H, v) ∈ G•, and α : V(H) → [0, 1] is a weighting of the nodes of H. Two such rooted, weighted graphs are considered the same if there is an isomorphism between the graphs that preserves the root and preserves the weights. Let A+ be the sigma-algebra on G+ generated by the following cylinder sets: for an r ≥ 0, we fix the isomorphism type of the ball B ∈ B_r with radius r about the root, and also for every node in B, we specify a Borel set in [0, 1] from which the weight is to be chosen. (The choice of the interval [0, 1] to use for weighting is arbitrary; we could have decorated the nodes by the points of any other Borel probability space. In fact, other decorations will be needed later.) It is easy to see that (G+, A+) is a Borel space.

We can define a graph on G+, the Graph of Weighted Graphs H+, as follows: we connect two nodes (G, α) and (G′, α′) by an edge if G′ arises from G by shifting the root to one of its neighbors (while keeping all the node weights); in other words, if there is an isomorphism ι from G to G′ (as unrooted graphs) such that α′(ι(u)) = α(u) for every u ∈ V(G), and ι(root(G)) is a neighbor of root(G′).

Given a probability distribution σ on (G•, A), we can define a probability distribution σ+ on (G+, A+) as follows: select a random graph H ∈ G• from the distribution σ, and assign independent, uniform random weights from [0, 1] to the nodes.

Lemma 18.41. If σ is an involution invariant probability distribution on (G•, A), then Bσ = (G+, A+, σ+) is a graphing, and it represents σ.

This construction associates a graphing with every involution-invariant distribution, which we call the Bernoulli graphing representing σ. (The name refers to its close relationship with the Bernoulli shift in Example 18.39.) This lemma also provides the proof of Theorem 18.37.

Proof. The proof is essentially the same as the proof of Lemma 18.40, since assigning the random weights to the nodes of a graph G chosen from σ almost surely destroys all automorphisms.

Exercise 18.42. Prove that for r > 2, an r-ball has at most D^r nodes, and the number of non-isomorphic r-balls is bounded by D^{D^r}.

Exercise 18.43. Prove that the sets G•_F are closed and open in the metric space (G•, d•), that they form an open basis, and that the space is compact and totally disconnected.

Exercise 18.44. Show that a function f : G• → R is continuous if and only if for every ε > 0 there is an r ∈ N such that for every r-ball F ∈ B_r and all graphs H ∈ G• with B_{H,r} ≅ F, we have |f(H) − f(F)| < ε.
Exercise 18.45. Prove that the following functions, defined for H ∈ G•, are Borel: (a) 1(H ≅ H0), where H0 ∈ G and isomorphism is meant as isomorphism of unlabeled graphs; (b) ω(H); (c) χ(H); (d) f(H) = lim sup_{r→∞} e(B_{H,r})/v(B_{H,r}).

Exercise 18.46. Let H ∈ G•, and consider the probability distribution on G• concentrated on H. Prove that this distribution is involution invariant if and only if the automorphism group of H is transitive on the nodes, and for every oriented edge e, the orbit of e (as a directed graph) has equal indegrees and outdegrees.

Exercise 18.47. Let G be a countably infinite graph consisting of a two-way infinite path with two nodes of degree 1 hanging from every node of the path. Let G1 and G2 be the two rooted graphs obtained from G by selecting a node of degree 4 and a node of degree 1 as its root, respectively. Let π be the probability distribution on G• in which π(G1) = 1/3 and π(G2) = 2/3. Show that π is involution invariant, but (H, π) is not measure preserving.
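For two finite rooted graphs, the ball distance d• of Section 18.3.1 can be computed directly from the definition. Here is a sketch (our illustration; it relies on the networkx library, and it truncates the search at a radius rmax, which is enough once rmax exceeds both diameters).

```python
import networkx as nx
from networkx.algorithms import isomorphism

def rooted_ball(G, root, r):
    # the r-ball B_{G,r}(root), with the root marked by a node attribute
    B = nx.ego_graph(G, root, radius=r)
    nx.set_node_attributes(B, False, "is_root")
    B.nodes[root]["is_root"] = True
    return B

def ball_distance(G1, v1, G2, v2, rmax=20):
    # d(H1, H2) = inf{2^-r : B_{H1,r} isomorphic to B_{H2,r}}, truncated at rmax
    nm = isomorphism.categorical_node_match("is_root", False)
    d = 1.0                 # the 0-balls (a single root) always agree
    for r in range(1, rmax + 1):
        if nx.is_isomorphic(rooted_ball(G1, v1, r), rooted_ball(G2, v2, r),
                            node_match=nm):
            d = 2.0 ** (-r)
        else:
            break
    return d

# a cycle and a path look alike from the middle, up to radius 4
print(ball_distance(nx.cycle_graph(10), 0, nx.path_graph(10), 5))  # 0.0625
```

The printed value is 2^{−4}: the 5-ball in C10 already closes the cycle, while the 5-ball in the path is still a path.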
18.4. Subgraph densities in graphings

Our next goal is to generalize our central notion, graph homomorphism, to involution-invariant random graphs and to graphings. Following our general framework, we consider homomorphisms from a small graph into (say) an involution-invariant random graph, as well as homomorphisms from an infinite graph into small graphs. In both cases, some nontrivial preparation is needed, including proving some results that are important in their own right.

In this section we address the easier task, defining homomorphism densities in involution-invariant random graphs and graphings. This takes some preparation, discussing an important consequence of involution invariance.

18.4.1. Mass Transport Principle. Let us consider the set G•• of 2-labeled connected countable graphs (again, graphs that are isomorphic as 2-labeled graphs are identified). We can endow this set with a compact topology just like we did for G•, and then Borel functions are defined. The following very useful characterization of involution invariance was proved by Aldous and Lyons [2007] (it was in fact in this form that Benjamini and Schramm first defined involution-invariant measures).

Proposition 18.48 (Mass Transport Principle). Let σ be a probability distribution on G•. Then σ is involution invariant if and only if for every Borel function f : G•• → R+ the following identity holds:

(18.8)    E(∑_u f(H, v, u)) = E(∑_u f(H, u, v)),
where (H, v) ∈ G• is randomly chosen from the distribution σ.

Equation (18.8) allows both sides to be infinite. One can generalize the principle to functions without the nonnegativity condition, by applying it separately to the negative and positive parts (however, one has to make sure that no infinite expectations occur). The name refers to the following interpretation: if we transport f(H, u, v) amount of mass from node u to node v in the countable graph H (where this amount depends only on the isomorphism type of (H, u, v), and it depends on it in a Borel measurable way), then on average, no node gains or loses. This is trivial for finite graphs, but it does not automatically hold for countable graphs, since there are distributions on G• that are not involution invariant.
One can formulate a related identity for graphings, which shows that the Mass Transport Principle is in a sense a form of Fubini's Theorem. To illustrate that we can vary the conditions, let us say that a function f : S × S → R (where S is a set of any cardinality) is locally finite if the sums ∑_{x∈S} f(x, y) and ∑_{y∈S} f(x, y) are absolutely convergent (this includes that they have a countable number of nonzero terms).

Proposition 18.49. Let G be a graphing, and let f : V(G) × V(G) → R be a locally finite Borel function. Assume that f(x, y) = 0 unless y ∈ V(Gx). Then

∫_{V(G)} ∑_y f(x, y) dx = ∫_{V(G)} ∑_x f(x, y) dy.
If f is the indicator function of edges between two Borel sets A and B, then this identity gives the basic measure preservation identity (18.2). The Mass Transport Principle can be used to prove properties of “typical” graphs from an involution invariant distribution; see Exercises 18.52 and 18.53. We describe the proof of the graphing version; Proposition 18.48 can be proved along the same lines.

Proof. It suffices to prove the identity for nonnegative Borel functions, since we can write a general f as the difference of two such functions, which will also be locally finite. It suffices to prove it for bounded Borel functions, since we can obtain an unbounded nonnegative f as the limit of an increasing sequence of bounded Borel functions. By scaling, we may assume that the range of f is contained in [0, 1]. We may assume that there is an r ∈ N such that f(x, y) = 0 unless y ∈ B_{G,r}(x), since we can obtain f as the limit of an increasing sequence of such functions. Finally, it suffices to consider 0-1 valued Borel functions, since we can write f as

f(x, y) = ∫_0^1 1(f(x, y) ≥ t) dt,
and here the function 1(f(x, y) ≥ t) is a 0-1 valued Borel function for every t.

A 0-1 valued Borel function corresponds to a Borel subset S ⊆ V(G) × V(G). Consider the graphing G_r obtained from G by connecting any two nodes at distance at most r. (This is indeed a graphing by Exercise 18.7.) The set S is a Borel subset of E(G_r), and hence by Exercise 18.27 we have ∫ d_S^+ dλ = ∫ d_S^− dλ. But this is just the identity to be proved.

18.4.2. Homomorphism frequencies. Recall that in a finite graph G, t*(F, G) can be interpreted as the expectation of hom_u(F′, G), where F′ is obtained from F by labeling one of its nodes, and u is a random node of G. This can be generalized to homomorphisms into an involution-invariant random graph. Indeed, let σ be an involution-invariant distribution, and let (H, v) denote a random rooted graph from σ. Then hom_v(F′, H) is a bounded nonnegative integer, and since it depends only on a bounded neighborhood of the root v, it is a Borel function of (H, v). So t*(F′, σ) = E(hom_v(F′, H)) is well defined. Based on the finite case, we expect that t*(F′, σ) is independent of the node labeled in F, and so we can define t*(F, σ) = t*(F′, σ). This is correct, but not obvious.

Proposition 18.50. Let F′ and F′′ be two 1-labeled graphs obtained from the same unlabeled connected graph F. Let σ be an involution-invariant distribution; then t*(F′, σ) = t*(F′′, σ).
Proof. Let F* be the 2-labeled graph obtained by labeling both nodes that are labeled in F′ or F′′. Then for every rooted graph (H, u) generated according to σ, we have hom_u(F′, H) = ∑_v hom_{uv}(F*, H) and hom_u(F′′, H) = ∑_v hom_{vu}(F*, H). Applying the Mass Transport Principle to the function f(H, u, v) = hom_{uv}(F*, H), we get the assertion.

If we want to define t*(F, G) for a graphing G, it suffices to note that the graphing defines a unique involution invariant distribution σ, and so we can define t*(F, G) = t*(F, σ). Explicitly, hom_u(F′, G) is a bounded measurable function of u ∈ V(G), so the definition t*(F, G) = ∫ hom_u(F′, G) dλ(u) makes sense. By the argument above, it follows that this value is independent of the choice of the labeled node in F. (A brute-force sketch for finite graphs follows the exercises below.)

Exercise 18.51. Show that the distribution concentrated on the Grandmother Graph (Example 18.36) violates the Mass Transport Principle.

Exercise 18.52. Suppose that an involution-invariant random rooted graph is almost always infinite. Prove that the expected degree of the root is at least 2 (Aldous and Lyons [2007]).

Exercise 18.53. If G ∈ G• is a random graph from an involution invariant distribution, then with probability 1, G has 0, 1, 2 or infinitely many ends. (An end of a graph is defined as an equivalence class of one-way infinite paths, where two paths are equivalent if they cannot be separated by a finite set of nodes.)
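For finite graphs all of this is elementary counting, and the root-independence asserted by Proposition 18.50 can be observed directly. A brute-force sketch (our illustration; exponential in v(F), so only suitable for tiny graphs F):

```python
from itertools import product

def hom_rooted(F_adj, root, image_root, G_adj):
    # number of homomorphisms F -> G sending the labeled node to image_root
    nodes = [u for u in F_adj if u != root]
    count = 0
    for img in product(G_adj, repeat=len(nodes)):
        phi = dict(zip(nodes, img))
        phi[root] = image_root
        # check that every edge of F is mapped to an edge of G
        if all(phi[v] in G_adj[phi[u]] for u in F_adj for v in F_adj[u]):
            count += 1
    return count

def t_star(F_adj, root, G_adj):
    # t*(F', G): the average of hom_u(F', G) over a uniform random node u
    return sum(hom_rooted(F_adj, root, u, G_adj) for u in G_adj) / len(G_adj)

# F = a path on 3 nodes; G = a 4-cycle
F = {0: [1], 1: [0, 2], 2: [1]}
G = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}
# the value does not depend on which node of F carries the label
print(t_star(F, 0, G), t_star(F, 1, G))   # both print 4.0
```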
18.5. Local equivalence

Two graphings G1 and G2 are locally equivalent if they have the same subgraph densities: t*(F, G1) = t*(F, G2) for every connected simple graph F. We can formulate this notion in terms of the sample distributions. Recall that if G is a finite graph, then ρ_{G,r}(B) (B ∈ B_r) is the probability that the r-neighborhood of a random node of G is isomorphic to the r-ball B. This definition extends to graphings verbatim. Two graphings G1 and G2 are locally equivalent if and only if the neighborhood distributions ρ_{Gi,r} are the same for every radius r. This is also equivalent to saying that they represent the same involution invariant distribution on G•.

Our goal is to characterize locally equivalent pairs of graphings. To this end, we have to introduce some special maps that certify local equivalence. Let G1 and G2 be two graphings. We call a measure preserving map φ : V(G1) → V(G2) a local isomorphism if its restriction to almost every connected component of G1 is an isomorphism with one of the connected components of G2. To be more precise, if x is a random point of V(G1), then φ(x) is a random point of V(G2), and (G1)x ≅ (G2)_{φ(x)} with probability 1 (as rooted graphs). So the involution invariant distributions defined by G1 and G2 are the same. Note, however, that a local isomorphism may not be invertible.

Example 18.54 (Cycles vs. bicycles). Consider the cyclic graphing Ca for an irrational real number a, and let C′a be defined by connecting every x ∈ [0, 1] to (x ± a/2) mod 1/2 if x < 1/2, and to 1/2 + ((x ± a/2) mod 1/2) if x ≥ 1/2. Informally, C′a consists of two disjoint copies of Ca, shrunk by a factor of 2. We claim that the map φ : x ↦ 2x (mod 1) is a local isomorphism from C′a to Ca. It is easy to see that this map is measure preserving. Furthermore, any connected component G of C′a lies either entirely in [0, 1/2) or entirely in [1/2, 1),
and therefore φ, restricted to G, is an isomorphism with an appropriate component of Ca.

Example 18.55 (Grid II). Let G1 and G2 be the two graphings defined in Example 18.38 representing the infinite grid, where G1 is defined on [0, 1) and G2 on [0, 1)^2. Then the map (x, y) ↦ x + y (mod 1) defines a local isomorphism from G2 to G1: it is trivially measure preserving, and it is an isomorphism when restricted to any connected component of G2.

The relation “G1 has a local isomorphism into G2” is transitive (since local isomorphisms can be composed), but not symmetric (since a local isomorphism may not be invertible). To make it symmetric, we define (temporarily, as we shall see) two graphings to be bi-locally isomorphic if there is a third graphing that has local isomorphisms into both. This is now a symmetric relation, but don't we lose transitivity? No, we don't; the next lemma will be a main step in proving this.

Lemma 18.56. Let G1 and G2 be two graphings that both have a local isomorphism into a third graphing G0. Then they are bi-locally isomorphic.

Proof. Let φi : Gi → G0 be a local isomorphism. Consider the set

Ω = {(x1, x2) : xi ∈ V(Gi), φ1(x1) = φ2(x2)}.

We have Ω ∈ A(G1) × A(G2), which follows from the easy-to-check formula

(18.9)    (V(G1) × V(G2)) \ Ω = ∪_{i=1}^∞ φ1^{−1}(Ji) × φ2^{−1}(V(G0) \ Ji),
where {J1, J2, ...} is a countable generating Boolean algebra for A(G0). Let A denote the sigma-algebra obtained by restricting A(G1) × A(G2) to Ω.

Next, we define an appropriate probability measure on (Ω, A), or more conveniently phrased, a coupling measure on V(G1) × V(G2) concentrated on Ω. Proposition A.7 in the Appendix implies that such a measure exists. We denote by λ its restriction to (Ω, A).

Finally, we have to define an appropriate graph on the probability space (Ω, A, λ). We first define it on the product V(G1) × V(G2) as the categorical product G1 × G2, and then take the induced subgraph on Ω. (The graph G1 × G2 has degrees bounded by D^2, but its restriction to Ω has degrees at most D, which will follow from the proof below, but can be checked directly.)

The projection map from V(G1) × V(G2) onto V(Gi) can be restricted to Ω to get a map ψi : Ω → V(Gi). We claim that ψi is a local isomorphism. First, we show that ψi is measure preserving (we spell out the case i = 1; the case i = 2 is symmetric). Indeed, for any set A1 ∈ A(G1), we have ψ1^{−1}(A1) = (A1 × V(G2)) ∩ Ω, and hence

λ(ψ1^{−1}(A1)) = λ((A1 × V(G2)) ∩ Ω) = λ(A1 × V(G2)) = λ1(A1)

(where λi is the probability measure of the graphing Gi).

Let (x1, x2) ∈ Ω be a random point from the distribution λ, and let H denote its connected component. Let Hi = (Gi)_{xi} denote the connected component of Gi containing xi, and let H0 be the connected component of G0 containing φ1(x1) = φ2(x2). With probability 1, these three graphs are isomorphic, and the maps φi give isomorphisms H1 ≅ H0 ≅ H2. For every node v ∈ V(H0), let αi(v) ∈ V(Hi)
be the node with φi(αi(v)) = v. (Note that φi^{−1}(v) is not necessarily a singleton, but exactly one of its elements belongs to Hi.) Let α(v) = (α1(v), α2(v)) ∈ Ω; then α(u) and α(v) are adjacent in G if and only if u and v are adjacent in H0, by the definition of the product graph. So α is an embedding of H0 into H as an induced subgraph.

We want to argue that α(H0) = H. Indeed, if not, then there is a node α(u) that is connected by an edge of H to a node (w1, w2) ∈ V(H) \ α(V(H0)). Now α1(u) is connected to w1 in H1 by the definition of the product graph, and hence z = φ1(w1) is connected to u in H0, and so it is a node in H0. Similarly, φ2(w2) is in H0. Furthermore, (w1, w2) ∈ V(H) ⊆ Ω, and hence φ1(w1) = φ2(w2) = z. But then (w1, w2) = α(z), a contradiction.

It follows that with probability 1, G_{(x1,x2)} ≅ (Gi)_{xi}, and ψi provides this isomorphism, which proves that ψi is a local isomorphism. Thus G1 and G2 are bi-locally isomorphic.

Corollary 18.57. Bi-local isomorphism is a transitive relation.

Proof. Figure 18.3 tells the whole story: composing two bi-local isomorphisms, the middle part can be “flipped up” by Lemma 18.56 to get a single bi-local isomorphism.
Figure 18.3. (a) Bi-local isomorphism is transitive. (b) The relationship between graphings, Bernoulli graphings and Bernoulli lifts.

We need a construction, introduced by Hatami, Lovász and Szegedy [2012], which is similar to the Bernoulli graphing defined in Section 18.3.3. For every graphing G, we define the graphing G+, which we call the Bernoulli lift of G. The points of G+ will be pairs (x, ξ), where x ∈ V(G) and ξ : V(Gx) → [0, 1]. We connect (x, ξ) to (y, ζ) if y is a neighbor of x and ξ = ζ (note that if y is a neighbor of x, then Gx = Gy, so ξ and ζ are weightings of the same graph). Let Ω be the set of such pairs. We define a sigma-algebra A on Ω generated by the sets φ^{−1}(A) (A ∈ A(G)) and ψ^{−1}(A) (A ∈ A+), where φ and ψ are the natural maps defined below. To define a measure on (Ω, A), it is perhaps easiest to describe how a random element is generated: we pick a random point x of G, and then assign independent random weights ξ(u) to the nodes u of Gx. It is easy to see that G+ is a graphing.

There is a natural map φ : V(G+) → V(G), which simply forgets the weighting. There is also a natural map ψ : V(G+) → V(Bσ), where σ is the involution-invariant distribution represented by G, which forgets all about G except the distribution σ: it is defined by ψ(x, ξ) = (Gx, ξ). The following lemma is straightforward to verify.
Lemma 18.58. The maps φ : V(G+) → V(G) and ψ : V(G+) → V(Bσ) defined above are local isomorphisms.

Now we are able to prove the main result of this section.

Theorem 18.59. Two graphings are locally equivalent if and only if they are bi-locally isomorphic.

Proof. The “if” part is trivial by the discussion above. To prove the “only if” part, let G1 and G2 be two locally equivalent graphings; we want to prove that they are bi-locally isomorphic. They define the same involution invariant distribution σ on G•, and so they are both locally equivalent to the Bernoulli graphing Bσ. Lemma 18.58 implies that they are both bi-locally isomorphic to Bσ. Corollary 18.57 implies that they are bi-locally isomorphic.

Exercise 18.60. Let 0 ≤ a ≤ 1 be an irrational number, and define a graphing C′′a (related to the graphings Ca and C′a in Example 18.54): we connect every x ∈ [0, 1] to 1/2 + ((x ± a/2) mod 1/2) if x < 1/2, and to (x ± a/2) mod 1/2 if x ≥ 1/2. Informally, we consider two circles of circumference 1/2, and connect every point on one to the two points a/2 away from the corresponding point on the other circle. Prove that C′′a is locally equivalent to Ca, and construct a local isomorphism C′′a → Ca.
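The doubling map of Example 18.54 (and, mirrored, of Exercise 18.60) can be spot-checked numerically. In this sketch (our illustration; a = √2 − 1 is an arbitrary irrational) we verify that φ(x) = 2x mod 1 carries the neighbors of x in C′a exactly onto the neighbors of φ(x) in Ca.

```python
import math, random

a = math.sqrt(2) - 1    # an arbitrary irrational in (0, 1)

def nbrs_Ca(x):
    return {(x + a) % 1.0, (x - a) % 1.0}

def nbrs_Ca_prime(x):
    # two shrunken copies of C_a, living on [0, 1/2) and [1/2, 1)
    half = 0.0 if x < 0.5 else 0.5
    return {half + ((x + a / 2) % 0.5), half + ((x - a / 2) % 0.5)}

phi = lambda x: (2 * x) % 1.0   # the claimed local isomorphism

random.seed(1)
for _ in range(10000):
    x = random.random()
    image_of_nbrs = sorted(phi(y) for y in nbrs_Ca_prime(x))
    nbrs_of_image = sorted(nbrs_Ca(phi(x)))
    assert all(abs(p - q) < 1e-9
               for p, q in zip(image_of_nbrs, nbrs_of_image))
print("phi maps edges of C'_a onto edges of C_a")
```

That φ is measure preserving is immediate: the pushforward of the Lebesgue measure under x ↦ 2x mod 1 is again the Lebesgue measure.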
18.6. Graphings and groups

Let (Ω, φ1, ..., φm) be a measure preserving family, where the φi : Ω → Ω are measure preserving maps defined on the whole of Ω. The maps φi generate a group Γ of measure preserving maps.

Conversely, let Γ be a finitely generated group, with generators g1, ..., gm. Let us assume, for simplicity, that together with each gi, its inverse gi^{−1} is also among the generators. Let H be the Cayley graph of Γ: V(H) = Γ, and for every x ∈ Γ and 1 ≤ i ≤ m, we connect x to xgi by an edge. We get every edge in both directions, so we may consider H as an undirected graph. Consider the random rooted graph model which is concentrated on H with root 1 (the identity element of Γ). This is involution invariant (see Exercise 18.46), so it can be represented by a graphing G (Theorem 18.37). In fact, every oriented edge of H is marked by a generator of the group, this marking is inherited by G, and we can use this marking to construct the graphing through Theorem 18.21.

This correspondence between finitely generated groups, graphings and measure preserving families explains the interest of group theorists in the limit theory of bounded degree graphs. We do not elaborate on this quite broad and very active area in this book; see e.g. Kechris and Miller [2004].
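As a small illustration of this correspondence (ours, not the book's): the group freely generated by D involutions has the D-regular tree as its Cayley graph, and its balls can be enumerated as reduced words.

```python
from collections import deque

def cayley_ball(D, r):
    # BFS ball of radius r around the identity in the Cayley graph of the
    # group freely generated by D involutions; elements are reduced words,
    # i.e., tuples of generator indices with no two equal consecutive letters
    dist = {(): 0}
    queue = deque([()])
    while queue:
        w = queue.popleft()
        if dist[w] == r:
            continue
        for g in range(D):
            # multiplying by g either cancels the last letter (g*g = 1)
            # or appends it
            nxt = w[:-1] if w and w[-1] == g else w + (g,)
            if nxt not in dist:
                dist[nxt] = dist[w] + 1
                queue.append(nxt)
    return dist

for r in range(1, 5):
    print(r, len(cayley_ball(3, r)))   # 4, 10, 22, 46: the 3-regular tree
```

The ball sizes 1 + 3(2^r − 1) confirm that every node has degree 3 and there are no cycles.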
CHAPTER 19
Convergence of bounded degree graphs Convergence of a graph sequence with bounded degree was perhaps the first which was formally defined (Benjamini and Schramm [2001]), but it is a more complex notion than convergence in the dense case. There are more than one nonequivalent reasonable definitions, which capture different aspects of the notion that graphs in a sequence are becoming “more and more similar” to each other. We treat two such notions in this Chapter. 19.1. Local convergence and limit 19.1.1. Distances. Just like in the dense case, we need to introduce some notions of a distance between two bounded degree graphs before starting the treatment of convergence. We don’t have a good analogue of the cut distance, and therefore we will have to do with the sampling distance. This is a simpler notion, but of course less powerful, since knowing that two graphs are close in sampling distance does not translate into information about their global structure. Recall that we have defined (in the introduction, informally) the sampling distance of two graphs F, F ′ ∈ G. To make the definition precise, we start with the sampling distance of depth r, which is just the variational distance of neighborhood distributions, and we simply sum these with convenient (but ad hoc) weights: (19.1)
r (F, F ′ ) = dvar (ρF,r , ρF ′ ,r ), δ⊙
δ⊙ (F, F ′ ) =
∞ ∑ 1 r δ (F, F ′ ). r ⊙ 2 r=0
Note that in the second expression, the term with r = 0 is 0. Lacking a good analogue of the cut distance, this sampling distance will be our main tool when comparing graphs. Since we can sample from a graphing just as well as we can sample from a graph, this distance is defined for two graphings, and also for a graphing and a graph. Since the sample distributions ρF,r and ρGF ,r are the same (where GF is the graphing on [0, 1] representing the finite graph F ), we have (19.2)
r r δ⊙ (F, F ′ ) = δ⊙ (GF , GF ′ ),
and
δ⊙ (F, F ′ ) = δ⊙ (GF , GF ′ ).
r+1 r Using the trivial inequality δ⊙ (G, G′ ) ≤ δ⊙ (G, G′ ), we get that for every r ≥ 1,
1 r 1 r δ⊙ (G, G′ ) ≤ δ⊙ (G, G′ ) ≤ r + δ⊙ (G, G′ ). r 2 2 An easy consequence of inequalities 19.3 is that if we want to estimate δ⊙ (G, G′ ) of two finite graphs (which by definition depends on infinitely many radii r) with an error less than ε > 0, then we can do the following. We choose a positive integer k > log(3/ε), and then we take sufficiently many samples from both G and G′ so that the empirical distributions φr and φ′r of r-balls in these samples satisfy (19.3)
351
352
19. CONVERGENCE OF BOUNDED DEGREE GRAPHS
dvar (ρG,r , φr ) ≤ ε/6 and dvar (ρG′ ,r , φ′r ) ≤ ε/6 with high probability. We claim ∑k that A = r=0 2−r dvar (φr , φ′r ) is a good estimate of δ⊙ (G, G′ ). Indeed, with high probability, |δ⊙ (G, G′ ) − A| ≤ +
k k ∑ ∑ 1 1 d (ρ , φ ) + d (ρ ′ , φ′r ) var G,r r r r var G ,r 2 2 r=0 r=0 ∞ ∑ r=k+1
1 dvar (ρG,r , ρG′ ,r ). 2r
Here the first term is bounded by (1+1/2+· · ·+1/2k )ε/6 < ε/3, and similar bound applies for the second term. The last term is bounded by 1/2k+1 + 1/2k+2 + · · · = 1/2k ≤ ε/3. So the total error is less than ε. Often we have to compare two graphings G1 , G2 that are defined on the same Borel graph G, and only differ in the invariant distributions π1 , π2 on them. In this case the sampling distance can be bounded by the variational distance of π1 and π2 ; it is easy to see that for every r ≥ 1, we have (19.4)
r δ⊙ (G1 , G2 ) ≤ dvar (π1 , π2 ),
and
δ⊙ (G1 , G2 ) ≤ dvar (π1 , π2 ).
We will also need the edit distance of graphs/graphings on the same node set. For two graphs G, G′ ∈ G with V (G) = V (G′ ) = [n], this is defined as 1 d1 (G, G′ ) = |E(G)△E(G′ )|. n The difference from the dense case is in the normalization. (We will not need the “best overlay” version δ1 .) To extend the edit distance to two graphings G, G′ with V (G) = V (G′ ) = [0, 1], there is a little subtlety. To “count” the edges to be edited, we use the edge measure defined by (18.5); but these two graphings have different edge measures, which edge measure to use? After a little thought, the solution is natural: d1 (G, G′ ) = ηG (E(G) \ E(G′ )) + ηG′ (E(G′ ) \ E(G)). We note that ηG (E(G) ∩ E(G′ )) = ηG′ (E(G′ ) ∩ E(G)) (Exercise 18.26). An easy inequality between the edit distance and sampling distances is stated in the following proposition. Proposition 19.1. For any two graphings G and G′ on the same underlying probability space and r ∈ N, we have r δ⊙ (G, G′ ) ≤ 2Dr d1 (G, G′ ),
and
δ⊙ (G, G′ ) ≤ 3d1 (G, G′ )1/ log(2D) . In particular, these bounds hold for finite graphs. Proof. Let S = E(G) \ E(G′ ) and S ′ = E(G′ ) \ E(G).
Claim 19.2. Let x be a random point in G, then the probability that the ball BG,r (x) contains any edge in S is bounded by 2Dr ηG (S). The number of points x for which BG,r (x) contains a given edge is bounded by 2Dr (this follows by an elementary computation). In the finite case, this implies the Claim by an easy double counting. For graphings, this double counting can be justified using the Mass Transport Principle for graphings, Proposition 18.49. For
19.1. LOCAL CONVERGENCE AND LIMIT
353
( ) two nodes of G, let f (x, y) = degS (y)1 x ∈ BG,r (y) (where degS (y) denotes the number of edges in S incident with y). Let x be a random point of V (G). Then ) ( ∑ { ( ) } ( ) degS (y) λ x : E BG,r (x) ∩ S ̸= 0 ≤ E |BG,r (x) ∩ S| ≤ E =E
(∑ y
) f (x, y) = E
(∑
y∈BG,r (x)
) ( ) f (y, x) ≤ 2Dr E degS (x) = 2Dr η(S).
y
This implies the Claim.

Applying the inequality in Claim 19.2 to S′ as well, we see that with probability at least 1 − 2D^r η(S) − 2D^r η′(S′), the ball B_{G,r}(x) contains no edge of S and the ball B_{G′,r}(x) contains no edge of S′. In this case the two balls are isomorphic, proving that

δ⊙^r(G, G′) = d_var(ρ_{G,r}, ρ_{G′,r}) ≤ 2D^r η(S) + 2D^r η′(S′) = 2D^r d_1(G, G′).

This proves the first inequality. The second inequality follows from the first by using (19.3) with r = ⌈−log(2d_1(G, G′))/log(2D)⌉.

19.1.2. Locally convergent sequences. A sequence of graphs Gn with v(Gn) → ∞ is locally convergent if the r-neighborhood densities ρ_{Gn}(F) converge for every r and every r-ball F. Similarly to subgraph sampling in the dense case, there are equivalent frequency-type parameters whose convergence could be used instead of the neighborhood densities: we could stipulate the convergence of t*(F, Gn) for every connected graph F, as proved in Proposition 5.6, and also the convergence of t*_inj(F, Gn) or t*_ind(F, Gn) for every connected graph F. All these versions would lead to the same convergent sequences.

We can also describe convergent sequences of graphs as Cauchy sequences in the sampling distance. From (19.3) it is easy to check that a graph sequence (Gn) with bounded degrees and with v(Gn) → ∞ is convergent if and only if it is Cauchy in the sampling distance. Of course, this is essentially just a reformulation of the definition, and not a structural characterization of convergence as Theorem 11.3 was in the dense case.

For every Gn and every positive integer r, neighborhood sampling provides a probability distribution ρ_{Gn,r} on the set B_r of r-balls. By the definition of convergence, for every fixed r, this distribution tends to a limit distribution σ_r. The sequence of these distributions has some special properties. First of all, it is consistent, in the sense that selecting a random r-ball from σ_r, and deleting from it the nodes at distance r from the root, we get an (r − 1)-ball from distribution σ_{r−1}.

There is another, more subtle consistency property, which is a finite version of involution invariance for a distribution on rooted countable graphs. Note that an r-ball contains other (r − 1)-balls, centered at the neighbors of the original root, and from these balls we should also be able to recover σ_{r−1}. Since there are several of these in any given r-ball, we have to be a bit careful with the counting. As done before, we bias the distribution by the degree of the root: for F ∈ B_r, define

σ_r*(F) = deg(F)σ_r(F) / ∑_{H∈B_r} deg(H)σ_r(H).

Select a random r-ball F from σ_r*, and a random edge uv from the root u of F. We can create two random (r − 1)-balls with a root edge: one, we delete from F the
nodes at distance more than r − 1 from the root u; two, we delete all the nodes at distance more than r − 1 from v, and consider v the root and vu the root edge. If we get the same distribution on (r − 1)-balls with a root edge with both constructions, and this holds for every r ≥ 1, we say that the sequence (σ_1, σ_2, ...) is involution invariant.

To sum up, every convergent graph sequence gives rise to an involution invariant and consistent sequence of probability measures on the sets B_r. We have defined involution invariance for measures on the “graph of graphs”, and of course the two notions are closely related. From every probability distribution σ on (G•, A), we get a probability distribution σ_r on B_r by selecting a random countable graph from σ and taking the r-ball about its root. It is trivial that this sequence (σ_1, σ_2, ...) is consistent. Conversely, from every consistent sequence (σ_1, σ_2, ...) we get a distribution σ on (G•, A), by defining σ(G•_F) = σ_r(F) for every r-ball F. It is also straightforward to check that (σ_1, σ_2, ...) is involution invariant if and only if σ is. So there is a bijective correspondence between consistent involution invariant sequences (σ_1, σ_2, ...), where σ_r is a distribution on B_r, and involution invariant probability distributions on (G•, A).

Through this correspondence, every locally convergent graph sequence gives rise to an involution invariant distribution σ on the sigma-algebra (G•, A). This is the Benjamini–Schramm limit or local limit of the sequence. By Theorem 18.37, it follows that there is a graphing G such that ρ_{Gn,r} → ρ_{G,r} for every r ≥ 1. We write Gn → G, and say that this graphing “represents” the limit; but one should be careful not to call it “the” limit: all locally equivalent graphings represent the same limit object.

Example 19.3 (Cycles III). Consider the sequence of cycles (Cn). It is easy to see that the Benjamini–Schramm limit is the involution invariant distribution concentrated on the two-way infinite path (with any node specified as the root). The graphing Ca constructed in Example 18.1 represents the limit of this sequence for any irrational number a. All connected components of this graphing are two-way infinite paths, so generating a random point x ∈ [0, 1], its connected component (Ca)x has the Benjamini–Schramm limit distribution. Every graphing locally equivalent to Ca (i.e., in which almost all connected components are two-way infinite paths) provides a representation of the limit object. Example 18.54 shows two different graphings representing this limit.

Example 19.4 (Grids). Let Gn be the n×n grid in the plane. The r-neighborhood of a node v is a (2r + 1) × (2r + 1) grid (rooted in the middle), provided v is farther than r − 1 from the boundary. This holds for (n − 2r)^2 of the nodes, which means almost all nodes if n → ∞. So in the weak limit, every r-neighborhood is a (2r + 1) × (2r + 1) grid. Hence the Benjamini–Schramm limit of this sequence is concentrated on the infinite square grid (with a root). We have seen (Example 18.38) how to represent this involution invariant distribution as a graphing.

Example 19.5 (Penrose tilings). This is a more elaborate example, but interesting in many respects. We can tile the plane with the two rhomboids of the left side of Figure 19.1.
This is no big deal if we can use them periodically (for example, as in the middle of Figure 19.1); but we put decorations on the edges, and impose the restriction that these decorations must match along every common edge (as on the right side of Figure 19.1); in particular, we are not allowed to combine two of
Figure 19.1. The Penrose rhomboids, an illegal tiling, and how they should be attached. the same kind into a single parallelogram. It turns out that you can tile the whole plane this way (in fact, in continuum many ways), but there is no periodic tiling. Figure 19.2 shows the graph obtained from a Penrose tiling of the plane. There is a related (in fact, equivalent) version, in which we use two deltoids instead of two rhomboids; such a tiling is also shown in Figure 19.2. A deltoid tiling can be obtained from a rhomboid tiling by cutting up the rhomboids into a few pieces and recombining these to form deltoids. (To figure out the details is left as a challenge to the reader.) One of the interesting (and nontrivial) features of such tilings is that every one of them contains each of the two rhomboids with the same frequency. Similar property holds for every configuration of rhomboids: if a finite configuration F of tiles can be completed to a tiling at all, then this configuration occurs in every Penrose tiling with the same frequency. To be precise, if we take a K × K square about the origin in the plane, and count how many copies of F it contains, then this number, divided by K 2 , tends to a limit if K → ∞. Moreover, this limit is independent of the Penrose tiling that we are studying.
Figure 19.2. A piece of a Penrose rhomboid tiling and of a deltoid tiling.

We are not going to dive into the fascinating theory of Penrose tilings, but point out that their basic properties can be translated into graph limits. Let Gn be the graph obtained by restricting the graph of a Penrose rhomboid tiling to the n×n square about the origin. The above properties of the Penrose tiling imply that this sequence is convergent, and in fact it remains convergent if we interlace it with a sequence obtained from a different Penrose tiling. In other words, these finite pieces of any Penrose tiling converge to the same limit. The Benjamini–Schramm limit will be not the original Penrose tiling, but a probability distribution on all Penrose tilings. (This illuminates that in Example 19.4 of grids we end up with a single limiting grid only because grids are periodic.)
A graphing representation of the limit of Penrose rhomboid tilings can be described based on their characterization by de Bruijn [1981] (Figure 19.3); this was pointed out by M. Bárász.
Figure 19.3. A graphing describing the limit of Penrose rhomboid tilings. The underlying set is the union of parallel slices of a rhombic icosahedron through its vertices. The edges of the graphing are all translates of the edges of the polytope that connect two points on these planes.

Example 19.6 (Large girth graphs). Let Gn be a sequence of D-regular graphs whose girth (the length of the shortest cycle) tends to infinity (it is well known that such graph sequences exist). For every r ≥ 0 and sufficiently large n, the r-neighborhood B_{Gn,r}(v) of any node v is a rooted tree T_{r,D} of depth r, in which all the nodes closer to the root than r have degree D. So the limiting sequence of distributions is concentrated on these trees T_{r,D}. The Benjamini–Schramm limit of this sequence is concentrated on a single countable graph, namely the D-regular tree (it does not matter where we put the root). We have seen (Example 18.39) how to construct a graphing representation of this involution invariant distribution.

Example 19.7 (Various random D-regular graphs). Let G = G(n, D) denote a random D-regular multigraph. This notion itself is a bit tricky. We could of course define it via the uniform distribution over all D-regular graphs on n nodes; but this definition is quite difficult to handle. A more useful definition is called the configuration model. We start with a set S of nD nodes, partitioned into n sets S1, ..., Sn of size D. We take a random perfect matching on S (we had better assume that nD is even), and then identify every Si into a single node labeled i. This way we obtain a random D-regular multigraph. If we want a random simple graph, we reject it if we get a graph with loops or multiple edges, and try again. (A code sketch of this construction is given at the end of this example.)

It is easy to compute that the expected number of loops, as well as the expected number of multiple edges, in G(n, D) is bounded by a function of D. More generally, the expected number of k-cycles is bounded by (D − 1)^k/(2k) + o(1) (when n → ∞). Hence it follows that for every fixed r and k, almost all nodes will be farther than r from any cycle of length k or less. In other words, almost all r-neighborhoods will be D-ary trees of depth r. So this sequence is locally convergent with probability 1, and its local limit is the same infinite D-regular tree as in the previous example.

Finally, random bipartite D-regular graphs will be interesting for us. These can be generated just like above, except that we assume that n = 2m is even, and we use a random perfect matching between S1 ∪ ··· ∪ Sm and S_{m+1} ∪ ··· ∪ Sn.
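A minimal sketch of the configuration model just described (our illustration, not from the book): each node gets D stubs, a uniformly random perfect matching of the stubs is drawn, and loops and multiple edges are counted; rejection until the sample is simple is then one more loop around this.

```python
import random
from collections import Counter

def configuration_model(n, D, rng):
    # n stub sets of size D; a uniformly random perfect matching of all
    # stubs; contracting each stub set S_i to node i gives the multigraph
    assert (n * D) % 2 == 0
    stubs = [v for v in range(n) for _ in range(D)]
    rng.shuffle(stubs)                      # random perfect matching
    return list(zip(stubs[::2], stubs[1::2]))

rng = random.Random(42)
edges = configuration_model(2000, 3, rng)
loops = sum(1 for u, v in edges if u == v)
mult = sum(c - 1 for c in Counter(frozenset(e) for e in edges
                                  if e[0] != e[1]).values() if c > 1)
print("loops:", loops, "extra parallel edges:", mult)  # both O(1) on average
```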
By computations similar to the above, we can see that random D-regular bipartite graphs tend to the same local limit as random D-regular graphs, the D-regular rooted tree.

19.1.3. Which distributions are limits? A big difference from the dense case is that there is no easy way to construct a sequence of finite graphs that converges to a given graphing (or involution invariant distribution). In fact, we don't know whether all involution invariant distributions arise as limit objects:

Conjecture 19.8 (Aldous–Lyons [2007]). Every involution invariant distribution on (G•, A) is the limit of a locally convergent bounded-degree graph sequence.

Since every involution invariant distribution can be represented by a graphing (Theorem 18.37), this is equivalent to asking whether every graphing is the local limit of a locally convergent sequence of bounded-degree graphs. This conjecture, which is a central unsolved problem in the limit theory of bounded-degree graphs, generalizes a long-standing open problem about sofic groups. It is known in some special cases: when the distribution is concentrated on trees (Bowen [2004], Elek [2010b]; see Exercise 19.12), and also when the graphing is "hyperfinite" (to be discussed in Section 21.1).

The following is an interesting reformulation of this conjecture. Let A_r ⊆ R^{B_r} denote the set of all probability distributions ρ_{G,r}, where G ranges through all finite graphs. Let A′_r ⊆ R^{B_r} denote the set of probability distributions ρ_{G,r}, where G ranges through all graphings. Equivalently, A′_r consists of the probability distributions on B_r induced by an involution invariant probability distribution on G•. Clearly A_r ⊆ A′_r.

Proposition 19.9. (a) The closure A̅_r of A_r is a compact convex set. (b) A′_r is a compact convex set.

While most of the time the limit theory of graphs with bounded degree is more complicated than the dense theory, Proposition 19.9 represents an opposite case: in the dense case, even the set D_{2,3} discussed in Section 16.3.2 was non-convex with a complicated structure.

Proof. (a) Let G1 and G2 be two finite graphs, and consider the graph G = G1^{v(G2)} ∪ G2^{v(G1)}, consisting of v(G2) copies of G1 and v(G1) copies of G2. Then

ρ_{G,r}(B) = (1/2) ( ρ_{G1,r}(B) + ρ_{G2,r}(B) )

for every r-ball B. This implies that A̅_r is convex. Since it is a bounded closed set in a finite dimensional space, it is compact.

(b) The fact that A′_r is closed follows from general considerations: the set M of involution-invariant measures, as a subset of the set of all probability measures on the compact metric space G•, is closed in the weak topology, and so it is compact. Using that each of the cylinders G•_F is open-closed, the projection of M onto R^{B_r} is continuous, and hence the image, which is just A′_r, is compact. The convexity of A′_r follows by a construction similar to that in (a).
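The averaging identity in part (a) is easy to check by computer on small examples. The sketch below is hedged: it uses networkx's Weisfeiler–Lehman graph hash as a practical stand-in for the isomorphism type of a rooted r-ball (the hash can in principle conflate non-isomorphic balls, but it is exact enough for toy graphs), and the helper name ball_distribution is ours, not from the text.

```python
import networkx as nx
from collections import Counter

def ball_distribution(G, r):
    """Empirical distribution rho_{G,r} of r-balls, keyed by a WL hash.

    The root is marked with a node attribute so that rooted balls with
    different roots get different keys.  WL hashing is only a proxy for
    the isomorphism type, but it suffices for small examples.
    """
    counts = Counter()
    for v in G.nodes:
        near = nx.single_source_shortest_path_length(G, v, cutoff=r)
        ball = G.subgraph(near).copy()
        tags = {u: ('root' if u == v else 'node') for u in ball}
        nx.set_node_attributes(ball, tags, 'tag')
        counts[nx.weisfeiler_lehman_graph_hash(ball, node_attr='tag')] += 1
    n = G.number_of_nodes()
    return {key: c / n for key, c in counts.items()}

# G = v(G2) copies of G1 plus v(G1) copies of G2, as in the proof of (a)
G1, G2 = nx.cycle_graph(5), nx.path_graph(4)
G = nx.disjoint_union_all([G1] * G2.number_of_nodes()
                          + [G2] * G1.number_of_nodes())
rho, rho1, rho2 = (ball_distribution(H, 2) for H in (G, G1, G2))
for key in set(rho) | set(rho1) | set(rho2):
    avg = (rho1.get(key, 0) + rho2.get(key, 0)) / 2
    assert abs(rho.get(key, 0) - avg) < 1e-9
```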
The Aldous–Lyons Conjecture is equivalent to saying that A̅_r = A′_r for every r. So if the conjecture fails to hold, then there is an r ∈ N and a linear inequality on R^{B_r} that is valid for A_r but not for A′_r. This would be a linear inequality between
r-neighborhood densities that holds for every finite graph, but fails for some graphings, a "positive" consequence of a "negative" fact.

There is a finite version of the Aldous–Lyons conjecture, which was raised by this author at a conference, and was proved, at least in a non-effective sense, quickly by Alon [unpublished]:

Proposition 19.10. For every ε > 0 there is a positive integer n such that for every graph G ∈ G there is a graph G′ ∈ G such that v(G′) ≤ n and δ⊙(G, G′) ≤ ε.

Proof. Let r = ⌈log(2/ε)⌉, and let G1, . . . , Gm be any maximal family of graphs in G such that d⊙^r(Gi, Gj) > ε/2 for all 1 ≤ i < j ≤ m. Such a family is finite, since every graph is represented by a point in A_r, which is a bounded set in a finite dimensional space, and these points are at least ε/2 apart in the total variation distance. It follows that n = max_i v(Gi) is finite. By the maximality of the family, for every graph G there is an i ≤ m such that d⊙^r(G, Gi) ≤ ε/2. We have v(Gi) ≤ n, and by (19.3)

δ⊙(G, Gi) ≤ 1/2^r + d⊙^r(G, Gi) ≤ 1/2^r + ε/2 ≤ ε.
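The maximal family in this proof can be mimicked greedily on any concrete finite collection of graphs. The following sketch assumes the ball_distribution helper from the earlier sketch and uses the total variation distance of r-ball statistics as a stand-in for d⊙^r; all names are illustrative.

```python
def dtv(p, q):
    """Total variation distance between two finite distributions (dicts)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2

def representatives(graphs, r, eps):
    """Greedy analogue of the maximal family in Proposition 19.10:
    keep a graph only if its r-ball statistics are more than eps/2 away
    from every representative chosen so far."""
    reps = []  # list of (graph, ball distribution) pairs
    for G in graphs:
        rho = ball_distribution(G, r)  # from the previous sketch
        if all(dtv(rho, rho_rep) > eps / 2 for _, rho_rep in reps):
            reps.append((G, rho))
    # every input graph is now within eps/2 of some representative
    return [G for G, _ in reps]
```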
Unfortunately, no effective bound on n follows from the proof (one can easily get an explicit bound on m, the number of graphs in the representative family, but not on the size of these graphs). It would be very interesting to give any explicit bound (as a function of D and ε), or to give an algorithm to construct the small graph G′ from G. Ideally, one would like to design an algorithm that would work locally, in the sampling framework, similarly to the algorithm in Section 15.4.2 in the dense case.

Proposition 19.10 is related to the Aldous–Lyons Conjecture 19.8. Indeed, the Aldous–Lyons Conjecture implies that for any graphing G there is a finite graph whose neighborhood distribution is arbitrarily close; Proposition 19.10 says that for any finite graph G there is a finite graph H of bounded size whose neighborhood distribution is arbitrarily close. Suppose that we have a constructive way of finding, for an arbitrary (possibly very large) graph G with bounded degree, a graph H of size bounded by a function of r and ε that approximates the distribution of r-neighborhoods in G with error ε. With luck, the same construction could also work with a graphing in place of G, proving the Aldous–Lyons Conjecture.

One route to disproving the Aldous–Lyons Conjecture could be to explicitly determine the sets A_r and A′_r for some r, and see that they are different. Since the dimension of A_r grows very fast with r, it seems useful to consider even simpler questions. Instead of looking at A_r and A′_r, we could fix a finite set {F1, . . . , Fm} of simple graphs, assign the vector (t*(F1, G), . . . , t*(Fm, G)) to every graph G ∈ G, and consider the set T(F1, . . . , Fm) of all such vectors. We define the set T′(F1, . . . , Fm) analogously, replacing graphs by graphings. By the same argument as above, the sets T(F1, . . . , Fm) and T′(F1, . . . , Fm) are convex. The Aldous–Lyons Conjecture is equivalent to saying that T(F1, . . . , Fm) = T′(F1, . . . , Fm) for every F1, . . . , Fm.

This leads us to the problem, very interesting in its own right, of determining the sets T(F1, . . . , Fm) and T′(F1, . . . , Fm), and more generally, to extremal problems for bounded degree graphs. This could be the subject of a whole chapter, but very little has been done in this direction. There are, of course, many results in extremal graph theory that concern graphs with bounded degree; but the limit theory of bounded degree graphs has not been applied to extremal graph theory in the sense in which the limit theory of dense graphs has been. One notable exception is the
result of Harangi [2012], who determined the sets T(K3, K4) and T′(K3, K4) for D-regular graphs. He found the same answer in both cases (so this did not give a counterexample to the conjecture).

19.1.4. On colored graphs. It will be useful at various points to extend our constructions and results to colored graphs, where the nodes are colored with b node-colors and the edges are colored with c edge-colors (where b and c are fixed positive integers). Colored graphings can be defined analogously, where every node set and edge set with a given color is Borel. Colored graphs and graphings can be used to express some properties and additional structures which we want to pass to the limit. For example, we can express measure preserving families (used in Section 18.2.1 to certify that a measurable graph is measure preserving) by edge-colorings.

We could have formulated all our arguments in the previous chapter and this one in the more general context of colored graphs and graphings. Alternatively, we could repeat these arguments now for colored graphs. We will do neither; we point out in a few sentences how these generalizations would work, and leave it to the interested reader to check that the arguments extend to colored graphs. Sampling from a colored graph results in a distribution of colored balls, and since there is only a finite number of them, all the arguments above remain valid.

We can extend the notion of convergence to colored graphs of a fixed type (b, c), i.e., to graphs that are node-colored with b colors and edge-colored with c colors. The sampling process returns a colored r-ball, which is node-colored with b colors, edge-colored with c colors, and has a specified root. As before, we denote by ρ_{G,r} the probability distribution of colored r-balls about a random node (where the type (b, c) is understood). We say that a colored graph sequence is locally convergent if the sequence (ρ_{Gn,r}(F) : n = 1, 2, . . .) converges for every r and every colored r-ball F.

We can define the "graph of colored graphs": its nodes will be all connected colored rooted countable graphs (with the same degree bound as always). Adjacency is defined as before; we color the node (H, v) with the color of v, and we color the edge (H, v)(H, u) with the color of the edge vu. Every convergent colored graph sequence has a limit object in the form of an involution invariant probability distribution on the "graph of colored graphs", which in turn can be represented by a colored graphing.

One could go a step further, and decorate every node and/or every edge by an element of a fixed compact Hausdorff space K. (For the dense case, a similar extension was treated in Section 17.1.) One could extend the notions of involution invariant distributions and measure preserving graphs to this case, but it would take more effort, and would have fewer applications. One example of an application would be the assignment of weights α to the nodes of graphs in G• in the proof of Theorem 18.37, which could be phrased as using a node-decoration from the compact space [0, 1]. We will use this more general construction in the next section, where colored graphs will play an important role.

Exercise 19.11. Let G ∈ G and let S ⊆ E(G), |S| = εv(G). Prove that δ⊙(G, G \ S) ≤ 4ε^{1/log D}.

Exercise 19.12. Prove that if σ is an involution-invariant distribution such that a rooted graph chosen from σ is almost always a tree, then σ is the local limit of a finite graph sequence (Elek [2010b]).
Exercise 19.13. Prove that if we merge two node colors or two edge colors, every convergent colored graph sequence remains convergent.
19.2. Local-global convergence

Are the notion of convergence and the limit object constructed above informative enough? The limit graphon of a dense sequence of graphs contains a great deal of information about the asymptotic properties of the sequence. Unfortunately, this is not quite so in the bounded degree case. Let us illustrate this by a simple example.

Example 19.14. Let Gn be a sequence of random 3-regular bipartite graphs. Let Hn consist of two disjoint copies of Gn. The Benjamini–Schramm limit of both sequences is a distribution concentrated on a single 3-regular rooted tree T3. This limit graphing is not uniquely determined. Among others, we have the Bernoulli graphing T3 associated with the tree T3, but one could also take the disjoint union T3^2 of two such graphings (with the node measure scaled down by 2). It seems that T3 represents the limit of Gn "better", while T3^2 represents the limit of Hn "better". As another example, if we consider the free group F3 with three generators acting without fixed points on a probability space, then the corresponding graphing (obtained by connecting every point to its images under any of the generators) represents the Benjamini–Schramm limit. One feels that the limit of the sequence (Gn) is "better" represented if the action of the free group is ergodic, while for the limit of Hn, the space should be split into two invariant subsets of measure 1/2.

This example suggests that in the limit object, the underlying σ-algebra carries combinatorial information. This is in stark contrast with the dense case (cf. Remark 10.1 and the discussion in that section).

In this section we define a notion of convergence for graphs with bounded degree that is stronger than local convergence (Hatami, Lovász and Szegedy [2012]). Among other things, if a sequence of graphs is convergent in this stronger sense, then we can read off from the limit whether the graphs are expanders (up to a non-expanding part of negligible size).

19.2.1. Nondeterministic sampling distance. First, we define a new version of the sampling distance. Let G1, G2 ∈ G; their nondeterministic sampling distance of depth r for k colors is defined as the least c > 0 with the following property: for every k-coloring α1 of V(G1) there exists a k-coloring α2 of V(G2) such that δ⊙^r((G1, α1), (G2, α2)) ≤ c, and vice versa. (The sampling distance of (G1, α1) and (G2, α2) means their sampling distance as colored graphs.) We denote this distance by δ⊙^{(r,k)}(G1, G2). We then take, similarly as before,

δ⊙^{nd}(G1, G2) = ∑_{k=0}^∞ ∑_{r=0}^∞ (1/2^{r+k}) δ⊙^{(r,k)}(G1, G2).
It is easy to see that these formulas define a metric on finite graphs. We can define the non-deterministic sampling distance of two graphings G1 and G2 similarly, except that we only allow Borel k-colorings, and have to use infimum
instead of a minimum:

(19.5) δ⊙^{(r,k)}(G1, G2) = inf{ c : ∀α1 ∃α2 δ⊙^r((G1, α1), (G2, α2)) ≤ c, and ∀α2 ∃α1 δ⊙^r((G1, α1), (G2, α2)) ≤ c }.
The quantity δ⊙^{nd}(G, G′) is defined from this just like in the case of graphs. We say that two graphings G and G′ are locally-globally equivalent if δ⊙^{nd}(G, G′) = 0. A sequence of graphs (Gn) is locally-globally convergent if it is a Cauchy sequence in the nondeterministic distance, i.e., δ⊙^{nd}(Gn, Gm) → 0 as n, m → ∞. We say that its local-global limit is the graphing G if δ⊙^{nd}(Gn, G) → 0. It is clear that we could replace δ⊙^{nd} by δ⊙^{(r,k)} in these definitions, and require the conditions for all k, r ≥ 1.

We have defined the nondeterministic distance and local-global equivalence in terms of coloring the nodes. We could allow coloring of the edges as well without changing the notion of equivalence. Let me elaborate on this for local-global equivalence.
Proposition 19.15. Suppose that two graphings G and G′ are locally-globally equivalent. Then for any ε > 0, k ≥ 1, and any Borel k-edge-coloring α and Borel k-point-coloring β of G, there exists a Borel k-edge-coloring α′ and a Borel k-point-coloring β′ of G′ such that δ⊙((G, α, β), (G′, α′, β′)) ≤ ε.

Proof. We want to encode the edge-coloring into a node-coloring. The first trick is to construct (independently of the coloring α) another Borel edge-coloring γ with 2D − 1 colors such that no two adjacent edges have the same γ-color (Theorem 18.4). Using this, we define a point-coloring ζ: we color a point x with the pair (β(x), σ(x)), where σ(x) is the set of all pairs (α(x, y), γ(x, y)) with (x, y) ∈ E(G). This point-coloring uses the finite set K = [k] × 2^{[k]×[2D−1]} of colors. From this point-coloring, we can recover the original colorings easily: for every point x, β(x) is the first element of ζ(x), and for every edge (x, y), σ(x) ∩ σ(y) has a unique element (a, b), and the original color of (x, y) was a. (The only role of the coloring γ was to make sure that this common element of σ(x) and σ(y) is unique.)

By the definition of local-global equivalence, there is a K-coloring ζ′ of the points of G′ such that δ⊙((G, ζ), (G′, ζ′)) ≤ ε. We define an edge-coloring α′ and a point-coloring β′ of G′ as follows. Since ζ′(x) ∈ K, we can write ζ′(x) = (b(x), s(x)), where b(x) ∈ [k] and s(x) ⊆ [k] × [2D − 1]. Let β′(x) = b(x); let α′(x, y) be the smallest a ∈ [k] for which there is a g ∈ [2D − 1] such that (a, g) ∈ s(x) ∩ s(y); if no such g exists, then let α′(x, y) = 1 (this will happen only for a small set of edges).

Now comes the key observation: whenever the r-neighborhoods of a point x in (G, ζ) and of a point x′ in (G′, ζ′) are isomorphic, the r-neighborhoods of x in (G, α, β) and of x′ in (G′, α′, β′) are also isomorphic. Indeed, the rules for obtaining α and β from ζ, and α′ and β′ from ζ′, work the same way in both graphs. Hence

δ⊙((G, α, β), (G′, α′, β′)) ≤ δ⊙((G, ζ), (G′, ζ′)) ≤ ε.

19.2.2. Graphings as local-global limits. We have seen that limits of locally convergent graph sequences can be described as involution-invariant distributions, and this representation of the limit is unique. We could also represent the limit by a graphing, but this was not unique, which means that graphings are more complicated objects than necessary. Why bother with graphings at all, why not use involution invariant distributions only? One justification for considering graphings is the following result of Hatami, Lovász and Szegedy [2012].
Theorem 19.16. For every locally-globally convergent sequence of finite graphs there is a graphing that is its local-global limit.

For the proof, we need a couple of lemmas about k-colorings. First, let us discuss continuous k-colorings of a graphing. For this to make sense, we have to fix a topology on V(G). Of course, we should not use the standard topology of (say) [0, 1]: this would not admit nontrivial continuous k-colorings. But if we use the local topology as defined in Section 18.1, we get interesting continuous colorings. Recall that this topology can be defined by a metric in which two points are 2^{−r}-close if their r-neighborhoods are isomorphic. This topology is totally disconnected, so there will be nontrivial continuous k-colorings. A k-coloring will be continuous if and only if the color of a node can be determined just from the isomorphism type of its r-neighborhood for some finite r.

We know by Lusin's Theorem that every Borel function α on a compact probability space can be approximated by a continuous function γ in the sense that the set {α ≠ γ} has arbitrarily small measure. Here, in general, γ will not be a k-coloring (if the underlying space of G is the unit interval, for example, then its range will be an interval, not [k]). However, with an appropriate topology on V(G), the approximating function can be chosen to be a coloring itself.

Lemma 19.17. Let K be a compact metric space that is totally disconnected, let π be a probability measure on K, and let α : K → [k] be a Borel k-coloring of K. Then for every ε > 0 there is a continuous k-coloring δ : K → [k] such that π{α ≠ δ} ≤ ε.

Proof. By Lusin's Theorem, there is a continuous function β on K such that T = {x ∈ K : α = β} has measure at least 1 − ε. Open-closed sets in K separate any two points, hence by the Stone–Weierstrass Theorem, there is a stepfunction γ whose steps are open-closed (i.e., γ is continuous), and |β(x) − γ(x)| < 1/3 for every x. If a step S of γ contains a point y ∈ T, then we fix one such point, and define δ(x) = α(y) = β(y) for all x ∈ S; else, we define δ(x) = 1. This way we get a continuous k-coloring δ.

Let x ∈ T and let S be the step of γ containing x. Then α(x) = β(x), and so, for the point y ∈ S ∩ T used in the definition of δ(x) (which may or may not be x), we have

|α(x) − δ(x)| ≤ |β(x) − γ(x)| + |γ(x) − δ(x)| = |β(x) − γ(x)| + |γ(y) − β(y)| ≤ 2/3.

Since α(x) and δ(x) are integers, this implies that α(x) = δ(x) for x ∈ T.

The second lemma we need is similar to Proposition 19.10. It shows that a uniformly bounded number of k-colorings can approximate all k-colorings (in the sense of neighborhood statistics) of an arbitrarily large graph.

Lemma 19.18. For every k, r ≥ 1 and ε > 0 there is an integer M = M(k, r, ε) ≥ 1 such that every graph G ∈ G has M k-colorings α1, . . . , αM such that for every k-coloring β of G there is an i (1 ≤ i ≤ M) with

δ⊙^r((G, β), (G, αi)) ≤ ε.

Of course, M depends on D too, but this is tacitly assumed to be a constant in all our discussions in this part.
Proof. Let {F1, . . . , FM} be any maximal family of k-colored graphs in G such that δ⊙^r(Fi, Fj) > ε/2 for all 1 ≤ i < j ≤ M. Since the distributions ρ_{G,r} on k-colored r-balls belong to a bounded set in a finite dimensional space, such a family is finite.

Let G be any finite graph. For every i ≤ M, select a k-coloring αi of G such that δ⊙^r((G, αi), Fi) ≤ ε/2, if such a k-coloring exists (call such an i relevant); else, let αi be an arbitrary k-coloring of G. We claim that the k-colorings αi constructed this way have the required property.

For every k-coloring β of G, there is an i ≤ M such that δ⊙^r((G, β), Fi) ≤ ε/2, by the maximality of the family {F1, . . . , FM}. Then clearly this i is relevant, and so for the corresponding αi, we have

δ⊙^r((G, β), (G, αi)) ≤ δ⊙^r((G, β), Fi) + δ⊙^r(Fi, (G, αi)) ≤ ε/2 + ε/2 = ε.
Proof of Theorem 19.16. We apply Lemma 19.18 with ε = 2^{−r}, and denote M(k, r, 2^{−r}) by M(k, r). We fix a set of M(k, r) k-colorings as in Lemma 19.18 for every graph G ∈ G, and call them its representative k-colorings.

Consider the product space K = ∏_{k,r=1}^∞ [k]^{M(k,r)}; this is compact and totally disconnected. We start by constructing a decoration χ = χG : V(G) → K for every G ∈ G. Given a node v ∈ V(G), we consider the representative k-colorings α1, . . . , α_{M(k,r)} of G, and concatenate the sequences (α1(v), . . . , α_{M(k,r)}(v)) for k, r = 1, 2, . . . to get χ(v).

Using the decoration χG and the projection map φ_{k,r} : K → [k]^{M(k,r)}, we can manufacture many k-colorings of G as β = ψ ∘ φ_{k,r} ∘ χ, where ψ : [k]^{M(k,r)} → [k] is any map. We call these k-colorings "special". It follows from the construction of χ that the representative k-colorings of G are special. Hence for every graph G, every k, r ≥ 1, and every k-coloring α of V(G), there is a special k-coloring β close to α, in the sense that δ⊙^r((G, α), (G, β)) ≤ 2^{−r}.

The graphing H^K we construct is similar to the "Graph of Weighted Graphs" H+ introduced in Section 18.3.3, but instead of [0, 1], we use weights from K. We construct probability measures on H^K to get representations of finite graphs and then of the limit. With the decoration χG, and any choice of a root v ∈ V(G), the triple (G, v, χG) is a point of H^K. The map τG : v ↦ (G, v, χG) defines an embedding of G onto a connected component of H^K (the fact that this map is injective is clear, since for any two nodes u, v ∈ V(G) one of the k-colorings in Lemma 19.18 must distinguish them once r is large enough). Let ζG be the uniform distribution on τG(V(G)). Since G is finite, this distribution is involution-invariant on H^K.

Let (Gn) be a locally-globally convergent graph sequence. By Prokhorov's Theorem (see Appendix A.3.3), we can replace our graph sequence by a subsequence such that the distributions ζ_{Gn} converge weakly to a distribution ζ on H^K. Since every ζ_{Gn} is involution-invariant, so is ζ, and hence G = (H^K, ζ) is a graphing. We claim that Gn → G in the local-global sense. To prove this convergence, we need the following auxiliary fact.

Claim 19.19. Let β be a continuous k-coloring of G, and let βn = β ∘ τ_{Gn} be the k-coloring it induces on Gn. Then δ⊙((Gn, βn), (G, β)) → 0 as n → ∞.
To prove this, fix r ≥ 1, and express the frequency of a k-colored r-ball B0 in (G, β) as an integral:

ρ_{G,β,r}(B0) = ∫_{H^K} 1( B_{G,β,r}(x) ≅ B0 ) dζ(x).

By the definition of βn, we have a similar expression for the frequency of B0 in (Gn, βn):

ρ_{Gn,βn,r}(B0) = ∫_{H^K} 1( B_{G,β,r}(x) ≅ B0 ) dζ_{Gn}(x).

The main observation we need is that the integrand is continuous. Indeed, suppose that Hn → H in the topology of H^K, where Hn, H ∈ H^K are rooted K-decorated countable graphs. Then for sufficiently large n, the balls B_{Hn,r} and B_{H,r} are isomorphic, and moreover, there is an isomorphism σn : B_{H,r} → B_{Hn,r} such that χ_{Hn}(σn(x)) → χ_H(x) for every x ∈ V(B_{H,r}). This means that χ_H(x) and χ_{Hn}(σn(x)) agree in more and more coordinates as n grows, which implies that β(σn(x)) → β(x), since β is continuous. Since β has finite range, this implies that β(σn(x)) = β(x) if n is large enough. But then 1( B_{G,β,r}(Hn) ≅ B0 ) = 1( B_{G,β,r}(H) ≅ B0 ) if n is large enough, which proves that the integrand is continuous. Hence it follows by the weak convergence ζ_{Gn} → ζ that

∫_{H^K} 1( B_{G,β,r} ≅ B0 ) dζ_{Gn} → ∫_{H^K} 1( B_{G,β,r} ≅ B0 ) dζ,

which proves that ρ_{Gn,βn,r}(B0) → ρ_{G,β,r}(B0) for every k-colored r-ball B0. This proves the claim.

Let us return to the proof of the local-global convergence Gn → G. By the definition of the nondeterministic sampling distance, we have to verify two things for every r, k ≥ 1: every k-coloring of Gn can be "matched" by a k-coloring of G so that the distributions of r-neighborhoods are close, and vice versa. Let ε > 0; we may assume that ε ≥ 2^{−r}, since larger neighborhoods are more difficult to match.

First, let α be a Borel k-coloring of G. Then by Lemma 19.17, there is another Borel k-coloring β such that β is continuous in the topology of H^K and α = β on a set of measure at least 1 − ε(2D)^{−r}. Then δ⊙^r((G, α), (G, β)) ≤ ε/2 by (19.4). For every n, the k-coloring β gives a k-coloring βn of the nodes of Gn, under the embedding τ_{Gn}. By Claim 19.19, we have δ⊙((Gn, βn), (G, β)) ≤ ε/2 if n is large enough. This implies that δ⊙^r((Gn, βn), (G, α)) ≤ ε.

Second, let n be large enough so that for all m ≥ n, we have δ⊙^{(r,k)}(Gn, Gm) ≤ ε/3, and let αn be a k-coloring of Gn. Then for every m ≥ n there is a k-coloring αm of Gm such that δ⊙^r((Gn, αn), (Gm, αm)) ≤ ε/3. Furthermore, there is a special k-coloring βm = ψm ∘ φ_{k,r} ∘ χ_{Gm} of Gm (with an appropriate ψm : [k]^{M(k,r)} → [k]) such that δ⊙^r((Gm, αm), (Gm, βm)) ≤ ε/3. It follows that δ⊙^r((Gn, αn), (Gm, βm)) ≤ 2ε/3. We can select an infinite subsequence such that ψm = ψ is independent of m, so that βm(v) depends only on the decoration χ_{Gm}(v) of the node v ∈ V(Gm). We can use the same map ψ to get a k-coloring of G: we color every x ∈ H^K with β(x) = ψ(φ_{k,r}(χ(x))), where χ(x) is the decoration of root(x). This coloring
is continuous, and on τ_{Gm}(v) it coincides with βm(v). Claim 19.19 implies that δ⊙^r((Gm, βm), (G, β)) → 0. Hence δ⊙^r((Gn, αn), (G, β)) ≤ ε if n is large enough.

Exercise 19.20. Let F1 and F2 be two finite graphs and let G_{F1} and G_{F2} denote the associated graphings. Prove that δ⊙^{nd}(F1, F2) = δ⊙^{nd}(G_{F1}, G_{F2}).

Exercise 19.21. For an (uncolored) graph G, let Q_{r,k}(G) denote the set of all neighborhood distributions ρ_{G*,r}, where G* is a k-colored version of G. Prove that

δ⊙^{(r,k)}(G, G′) = d_{var}^{Haus}( Q_{r,k}(G), Q_{r,k}(G′) ),

the Hausdorff distance (with respect to the total variation distance) of the two sets.

Exercise 19.22. (a) Let (Gn) be a locally-globally convergent graph sequence. Prove that the numerical sequences α(Gn)/v(Gn) and Maxcut(Gn)/v(Gn) are convergent. (b) Show by an example that this does not hold for every locally convergent sequence.
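For very small graphs, the identity in Exercise 19.21 (and the distance δ⊙^{(r,k)} itself, for r = 1) can be checked by sheer brute force over all k-colorings. The sketch below reuses the dtv helper from the sketch after Proposition 19.10; adjacency is given as a dict mapping each node to the set of its neighbors, all names are ours, and everything is exponential in v(G), so this is for toy examples only.

```python
from itertools import permutations, product
from collections import Counter

def ball1_signature(adj, colors, v):
    """Exact canonical form of the node-colored 1-ball around v: the
    root's color, the neighbor colors, and the edges among neighbors,
    minimized over all orderings of the neighbors (tiny degrees only)."""
    best = None
    for perm in permutations(sorted(adj[v])):
        edges = tuple(sorted((i, j) for i, a in enumerate(perm)
                             for j, b in enumerate(perm)
                             if i < j and b in adj[a]))
        sig = (colors[v], tuple(colors[u] for u in perm), edges)
        best = sig if best is None or sig < best else best
    return best

def ball1_distribution(adj, colors):
    n = len(adj)
    cnt = Counter(ball1_signature(adj, colors, v) for v in adj)
    return {s: c / n for s, c in cnt.items()}

def nd_distance(adj1, adj2, k):
    """Brute-force delta^{(1,k)}: the Hausdorff distance (w.r.t. total
    variation) between the sets Q_{1,k} of colored-ball distributions,
    as in Exercise 19.21."""
    def Q(adj):
        return [ball1_distribution(adj, dict(zip(adj, c)))
                for c in product(range(k), repeat=len(adj))]
    Q1, Q2 = Q(adj1), Q(adj2)
    return max(max(min(dtv(p, q) for q in Q2) for p in Q1),
               max(min(dtv(p, q) for p in Q1) for q in Q2))

# triangle vs path on 3 nodes, 2 colors
tri = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(nd_distance(tri, path, 2))
```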
CHAPTER 20
Right convergence of bounded degree graphs

20.1. Random homomorphisms to the right

Homomorphisms from a large bounded-degree graph G into a small weighted graph H are the bread and butter of statistical physics, as we have illustrated in the Introduction (Section 2.2). What happens if we go to the limit with G through bounded degree graphs? Does it make sense to talk about a random homomorphism from a countable graph into a weighted graph H? Or from a graphing? It is natural that statistical physicists have worked out a theory that is able to answer these questions. In this section we reproduce some of these results. We will need these, among others, in the next section, where we discuss right-convergence.

To start with a trivial example, let G be a countably infinite graph with bounded degree, and let H be the looped complete graph Kq°. Then we can map the nodes of G independently and uniformly at random, which defines a perfectly fine probability distribution on maps V(G) → V(H). Unfortunately, if we delete any edge from Kq°, then the probability that a random map V(G) → [q] remains a homomorphism is 0. So we cannot define a random homomorphism G → H by taking a random map V(G) → V(H) and conditioning on its being a homomorphism. It turns out that random homomorphisms from countable graphs into weighted graphs can be defined in some cases: when the maximum degree of G is small and the edgeweights of H are close to 1. (We will not attempt to define a random homomorphism from a graphing.)

It turns out that the construction of a random homomorphism G → H for infinite graphs G is made possible by another important phenomenon, this time for finite graphs. In its simplest version, let u and v be two nodes of G that are far from each other, and consider a random homomorphism G → H. Are the images of u and v essentially independently distributed? This is not always so; for example, if G is a connected bipartite graph, then there are two homomorphisms G → K2, and the image of one node determines the images of all of the others. We will start by showing that under similar conditions as we mentioned above (the maximum degree of G is small and the edgeweights of H are close to 1), the images of distant nodes will be essentially independent. This important result, called the Dobrushin Uniqueness Theorem, will be stated and proved first. There is of course a lot more in the literature about this theorem and its applications, see e.g. Georgii [1988] or Simon [1993]. (We have to postpone the explanation of the word "uniqueness" to the end of the next section.) Since we have graphons at hand, we can replace H by a graphon W and get more general results with only a small amount of additional hassle.

20.1.1. Homomorphisms and Markov chains. We start by defining random homomorphisms from a finite bounded-degree graph into a weighted graph and into a
graphon. Let G = (V, E) be a simple graph and let H be a weighted graph with nonnegative edgeweights. We have considered random maps φ : V → V(H) where the probability of φ was proportional to α_φ. It is also quite natural to bias these with the product of the edgeweights. In other words, let the probability of φ be
(20.1) π_{G,H}(φ) = α_φ hom_φ(G, H) / hom(G, H).
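For tiny G and H, the distribution (20.1) can be tabulated by enumerating all maps; the following sketch (Python, names illustrative) does exactly that, with α_φ and hom_φ(G, H) computed as the products of node- and edgeweights.

```python
from itertools import product

def pi_GH(edges, n, alpha, beta):
    """Distribution (20.1) over maps phi: [n] -> [q], by brute force.

    edges : list of pairs (u, v) of G's edges, nodes 0..n-1
    alpha : nodeweights of H (length q)
    beta  : q x q symmetric matrix of edgeweights of H
    Returns a dict mapping each phi (as a tuple) to pi_{G,H}(phi).
    """
    q = len(alpha)
    weight = {}
    for phi in product(range(q), repeat=n):
        a = 1.0
        for i in phi:
            a *= alpha[i]              # alpha_phi: product of nodeweights
        for u, v in edges:
            a *= beta[phi[u]][phi[v]]  # hom_phi: product of edgeweights
        weight[phi] = a
    Z = sum(weight.values())           # Z = hom(G, H)
    return {phi: w / Z for phi, w in weight.items()}

# Ising-type example: path on 3 nodes, two states, ferromagnetic coupling
dist = pi_GH([(0, 1), (1, 2)], 3, alpha=[1, 1], beta=[[2, 1], [1, 2]])
```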
In the special case when H is a looped-simple unweighted graph, this is the uniform distribution on the set Hom(G, H).

Example 20.1 (Ising model). Recall the example from the Introduction (Section 2.2). There is a very large graph G (most often, a grid) whose nodes are the atoms and whose edges are bonds between these atoms. There is a small graph H, whose nodes represent the possible states of an atom. (In the case of the Ising model, H has two nodes only, representing the spins "UP" and "DOWN".) The nodeweights α_i = e^{−h_i} represent the influence of an external field on an atom in state i, and the edgeweights β_{ij} = e^{−J_{ij}} represent the interaction energy between two adjacent atoms in states i and j (we ignore the dependence on the temperature for this discussion). A possible configuration is a map σ : V(G) → V(H), and its energy is

H(σ) = − ∑_{u∈V(G)} h_{σ(u)} − ∑_{uv∈E(G)} J_{σ(u),σ(v)}.
In the introduction we were focusing on the partition function Z of the system, which turned out to be hom(G, H). But the exponential of the energy, e^{−H(σ)} = α_σ hom_σ(G, H), is also very important, because the system will be in state σ with probability

e^{−H(σ)} / Z = α_σ hom_σ(G, H) / hom(G, H) = π_{G,H}(σ).

So the distribution on homomorphisms introduced above expresses the fundamental physical state of a material.

It is not hard to generalize these constructions to the case when we replace H by a graphon W ≠ 0 on [0, 1] (and of course the formulas become simpler). We will call maps V → [0, 1] "weightings". For a measurable set X ⊆ [0, 1]^V, we define

(20.2) π_{G,W}(X) = ( ∫_X t_x(G, W) dx ) / t(G, W).

We call a random map drawn from the distribution π_{G,W} a random W-weighting of G. We could of course replace W by a weakly isomorphic graphon on some other probability space, and this could be more natural in some cases (think of the generalization of the Ising model, where the spins can be arbitrary unit vectors in R^3).

Let S ⊆ V, X ⊆ [0, 1]^S, and suppose that we fix a partial weighting y ∈ [0, 1]^{V\S}. We can define a kind of conditional distribution

(20.3) π_y(X) = ( ∫_X t_{y,x}(G, W) dx ) / t_y(G, W).
(The condition x|_{V\S} = y may have probability 0, but the formula works.) In the special case when S = V \ {v}, the distribution π_y can be identified with a distribution on [0, 1], which we denote by π_{y,v}. It will be important to notice that in this case the distribution π_y is determined by the restriction of y to N_G(v).

Is there a more tangible way of defining this distribution? A general technique for generating random elements of complicated distributions and studying their properties is to construct a Markov chain with the given stationary distribution. In this case, there is a rather simple Markov chain M on weightings in [0, 1]^V with this property. (In the special case when W = W_{Kq}, this will specialize to the "heat-bath" chain, or "Glauber dynamics", on q-colorings of G.) One step of this Markov chain is described as follows: given a weighting x, we select a uniform random node v ∈ V (which we call the pivot node) and reweight it from the distribution π_{x,v}. All other nodeweights remain unchanged. It is not hard to check that π_{G,W} is a stationary distribution of this Markov chain.

Let us fix a set U ⊆ V and its complement Z = V \ U. We can modify the Markov chain M by selecting the pivot node v from Z only. This modified Markov chain preserves the weighting of U; if we restrict it to the extensions of a partial weighting a ∈ [0, 1]^U, then we get a Markov chain M_a, whose stationary distribution is π_a.

Next, we define a Markov chain M2 on pairs (x, y) ∈ [0, 1]^V × [0, 1]^V. Given (x, y), we generate a random pivot node v ∈ Z and modify both x and y according to M, separately but not independently: using the same pivot node, we generate a random weight x̄ from the distribution π_{x,v} and a random weight ȳ from the distribution π_{y,v}, and couple x̄ and ȳ optimally, so that P(x̄ ≠ ȳ) = d_tv(π_{x,v}, π_{y,v}). We change the weight of v in x to x̄, and in y to ȳ. Note that for fixed a, b ∈ [0, 1]^U, the set of pairs of weightings (x, y) with x|_U = a and y|_U = b is invariant. Let M_{a,b} denote the Markov chain restricted to such pairs. The stationary distribution of this Markov chain is difficult to construct directly, but at least it exists:

Lemma 20.2. The Markov chain M_{a,b} has a stationary distribution with marginals π_a and π_b.

This is trivial if M_{a,b} has a finite number of states (which happens if W is a stepfunction, i.e., we are studying homomorphisms into a finite weighted graph). For the general case, the proof follows by more advanced arguments in probability theory, and is not given here (see Lovász [Notes]).

These Markov chains (especially the simplest chain M) are quite important in simulations in statistical physics and also in theoretical studies. A lot of work has been done on their mixing times and other properties. For us, however, the main consequence of their introduction will be the existence of the stationary distribution of M_{a,b}.

20.1.2. Correlation decay. Our next goal is to state and prove the fact, mentioned above, that π_{G,W} has no long-range interaction: under appropriate conditions, the weights of two distant nodes in a random W-weighting from π_{G,W} are essentially independent. We start with an easy observation (the verification is left to the reader as an exercise).

Proposition 20.3. Let G = (V, E) be a simple graph, and let W be a graphon of rank 1. Then π_{G,W} is a product measure on [0, 1]^V. In other words, if x is a
random W-weighting of G, then the weights x_u (u ∈ V(G)) are independent (as random variables).

The Dobrushin Uniqueness Theorem, in its combinatorial form, will imply that if the adjacency matrix of H is close to a rank-1 matrix, then there is almost no correlation between x_u and x_v (where x is a random W-weighting of G), provided the degrees of G are small and the distance of u and v is large. In fact, the theorem is stronger: it implies that conditioning on the weights of all nodes far from a given set U ⊆ V has very little influence on the weighting of U. To state this result, we need the following parameter of graphons. Let r ≤ D, and consider the star S_{r+1} on {0, 1, . . . , r} with center 0. Let us define the Dobrushin value of W as
(20.4) dob(W) = sup_{r≤D} sup_{x,y} d_tv(π_x, π_y),
where x, y ∈ [0, 1]^{[r]} range through weightings of the leaves of S_{r+1} that differ only at a single node u ∈ [r]. If dob(W) is small, then changing the weight of a neighbor of a node has little influence on the weight of the node. Changing the weights of neighbors one by one, we get by induction that for any graph G ∈ G, node v ∈ V(G) and x, y ∈ [0, 1]^V, we have

(20.5) d_tv(π_{x,v}, π_{y,v}) ≤ dob(W) |{u ∈ N(v) : x_u ≠ y_u}|.

Theorem 20.4. Let G = (V, E) be a (finite) graph with all degrees bounded by D, and let W be a graphon. Then for any partition V = Z ∪ U and any two maps a, b ∈ [0, 1]^U, the distributions π_a and π_b have a coupling κ such that for every node v ∈ Z and every pair (x, y) of random W-weightings from the distribution κ, we have

P(x_v ≠ y_v) ≤ (dob(W)D)^{d(v,U)},

where d(v, U) denotes the distance of v from U in G.

What is important in this theorem is that it gives an exponentially decaying correlation between the weight of v and the weights of nodes far away, provided dob(W) < 1/D.

Proof. We assume that dob(W) < 1/D (else, there is nothing to prove). Let κ be the stationary distribution of the Markov chain M_{a,b} with marginals π_a and π_b. So κ is a coupling of these distributions. Let x, y ∈ [0, 1]^V, and let (x′, y′) be obtained from (x, y) by making one step of M_{a,b}, using a random pivot node v ∈ Z. Let n = |Z|. Then for any node w ∈ Z,

(20.6) P(x′_w ≠ y′_w) = ((n−1)/n) P(x′_w ≠ y′_w | v ≠ w) + (1/n) P(x′_w ≠ y′_w | v = w).
Here P(x′_w ≠ y′_w | v ≠ w) = 1(x_w ≠ y_w) (since nothing changes at w under this condition), and, by the optimal coupling in the definition of M_{a,b},

P(x′_w ≠ y′_w | v = w) = d_tv(π_{x,w}, π_{y,w}).
Substituting in (20.6), we get

(20.7) P(x′_w ≠ y′_w) = ((n−1)/n) 1(x_w ≠ y_w) + (1/n) d_tv(π_{x,w}, π_{y,w}).
Now let (x, y) be a random pair from κ, and average (20.7) over x and y, to get

(20.8) P(x′_w ≠ y′_w) = ((n−1)/n) P(x_w ≠ y_w) + (1/n) E( d_tv(π_{x,w}, π_{y,w}) ).

By the definition of stationary distribution, (x′, y′) has the same distribution as (x, y), and hence P(x′_w ≠ y′_w) = P(x_w ≠ y_w). Substituting in (20.8), we get

(20.9) P(x_w ≠ y_w) = E( d_tv(π_{x,w}, π_{y,w}) ).
So far, we have not used the Dobrushin parameter dob(W). By (20.5), we get

(20.10) E( d_tv(π_{x,w}, π_{y,w}) ) ≤ dob(W) ∑_{u∈N(w)} P(x_u ≠ y_u).

Define f(u) = P(x_u ≠ y_u). We have f(u) ∈ {0, 1} if u ∈ U, and (20.9) and (20.10) imply that

(20.11) f(w) ≤ dob(W) ∑_{u∈N(w)} f(u)
holds for all w ∈ Z. Inequality (20.11) says that the function f is strictly subharmonic at the nodes of Z. It is easy to derive from this fact an estimate on f. Let us start a random walk (v^0 = v, v^1, . . .) on G from v ∈ Z, and let T be the (random) time when this random walk hits U (if the connected component of v does not intersect U, then f = 0 on this connected component and the conclusion below is trivial). Consider the random variables X_t = f(v^t)(dob(W)D)^t. It follows from (20.11) that these form a submartingale, and hence by the Martingale Stopping Theorem A.11, we get

f(v) = X_0 ≤ E(X_T) = E( (dob(W)D)^T f(v^T) ) ≤ E( (dob(W)D)^T ).

Since trivially T ≥ d(v, U), this completes the proof.
It is important that the coupling κ constructed above is independent of the node v. This means that if we want to estimate the probability that x|_S ≠ y|_S for some subset S ⊆ Z, then we get the same coupling distribution κ, and so we can use the union bound:

Corollary 20.5. Under the conditions of Theorem 20.4, every S ⊆ Z satisfies

P(x|_S ≠ y|_S) ≤ (dob(W)D)^{d(S,U)} |S|.

Let us formulate some other consequences. First, consider proper q-colorings of G, i.e., homomorphisms G → Kq. For S_{r+1} in the definition of the Dobrushin parameter, let φ and ψ be two q-colorings of the leaves that differ at node 1 only. Then π_{φ,0} is the uniform distribution on the set [q] \ φ([r]), and π_{ψ,0} has an analogous description. These sets differ in at most two elements and have at least q − D elements each, hence their total variation distance is at most 1/(q − D). So dob(W_{Kq}) < 1/D is satisfied if q > 2D, and we get:

Corollary 20.6. Let G = (V, E) be a graph with all degrees bounded by D, and let q > 2D. Then for any U ⊆ V, any two proper q-colorings α and β of G[U], and any v ∈ V \ U, the random extensions φ and ψ of α and β to proper q-colorings of G satisfy

d_tv(φ(v), ψ(v)) ≤ ( D/(q − D) )^{d(v,U)}.
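The chain M for W = W_{Kq} (the heat-bath chain, or "Glauber dynamics", mentioned above) is easy to simulate, and the decay in Corollary 20.6 can be observed numerically. A hedged sketch (Python, names illustrative; burn-in and autocorrelation are ignored for simplicity):

```python
import random

def heat_bath_step(adj, coloring, q, frozen=frozenset()):
    """One step of the chain M for W = W_{Kq}: pick a random pivot node
    outside `frozen` and resample its color uniformly from the colors
    not used by its neighbors (keeps the coloring proper)."""
    v = random.choice([u for u in adj if u not in frozen])
    allowed = sorted(set(range(q)) - {coloring[u] for u in adj[v]})
    coloring[v] = random.choice(allowed)

def color_law_at(adj, q, fixed_node, fixed_color, target, steps=200000):
    """Empirical law of the color of `target`, conditioned on the color
    of `fixed_node`.  By Corollary 20.6, two such laws for different
    fixed colors are within (D/(q-D))^d in total variation, where d is
    the distance of `target` from `fixed_node`."""
    coloring = {u: None for u in adj}
    coloring[fixed_node] = fixed_color
    for u in adj:  # greedy proper initial coloring (needs q > D)
        if coloring[u] is None:
            used = {coloring[w] for w in adj[u] if coloring[w] is not None}
            coloring[u] = min(set(range(q)) - used)
    counts = [0] * q
    for _ in range(steps):
        heat_bath_step(adj, coloring, q, frozen={fixed_node})
        counts[coloring[target]] += 1
    return [c / steps for c in counts]

# Path on 10 nodes, D = 2, q = 7 > 2D; compare two boundary conditions.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
law0 = color_law_at(adj, 7, fixed_node=0, fixed_color=0, target=9)
law1 = color_law_at(adj, 7, fixed_node=0, fixed_color=1, target=9)
```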
We can generalize this corollary to homomorphisms into any looped-simple graph H, assuming the maximum degree ∆(H̄) of its complement H̄ (among looped-simple graphs) is small:

Corollary 20.7. Let G = (V, E) be a simple graph with all degrees bounded by D, and let H be a looped-simple graph with 2D∆(H̄) < v(H). Then for any subset U ⊆ V, any two homomorphisms α, β : G[U] → H, and any v ∈ V \ U, the uniform random extensions φ and ψ of α and β to homomorphisms G → H, restricted to the node v, satisfy

d_tv(φ(v), ψ(v)) ≤ ( D∆(H̄) / (v(H) − D∆(H̄)) )^{d(v,U)}.
20.1.3. The Dobrushin value. Which graphons W have small Dobrushin value? This property is related to the approximability of W by rank-1 graphons (see Exercises 20.10 and 20.11, and also Proposition 20.3), but for us, the case when W is close to the special rank-1 function 1 will be important. For a graphon W, define

(20.12) ∆(W) = sup_{x∈[0,1]} ∫_0^1 (1 − W(x, y)) dy
(the "maximum degree" of the complement; note that ∆(W_H) = ∆(H̄)/v(H) if H is a looped-simple graph). This quantity provides a useful upper bound on the Dobrushin value.

Lemma 20.8. Every graphon W satisfies

dob(W) ≤ ∆(W) / (1 − D∆(W)).
In particular, the Dobrushin condition dob(W) < 1/D is satisfied if ∆(W) < 1/(2D).

Proof. The proof is just computation (although a bit tedious). Let z, w ∈ [0, 1]^r be two weightings of the leaves of S_{r+1} that differ only at node 1. Let
g(x) = ∏_{i=2}^r W(x, z_i) = ∏_{i=2}^r W(x, w_i)   and   s(x) = ∫_0^1 g(y) W(x, y) dy.
The density functions of the distributions π_{z,0} and π_{w,0} are g(x)W(x, z_1)/s(z_1) and g(x)W(x, w_1)/s(w_1), respectively, and hence

(20.13) d_tv(π_z, π_w) = (1/2) ∫_0^1 g(x) | W(x, z_1)/s(z_1) − W(x, w_1)/s(w_1) | dx.
We may assume without loss of generality that s(z_1) ≥ s(w_1). Then

d_tv(π_z, π_w) = (1/2) ∫_0^1 g(x) | W(x, z_1)/s(z_1) − W(x, w_1)/s(w_1) | dx
 = (1/2) ∫_0^1 g(x) | W(x, z_1)/s(z_1) − W(x, z_1)/s(w_1) + (W(x, z_1) − W(x, w_1))/s(w_1) | dx
 ≤ (1/2) ∫_0^1 g(x) ( W(x, z_1)/s(w_1) − W(x, z_1)/s(z_1) + |W(x, z_1) − W(x, w_1)|/s(w_1) ) dx
 = (1/2) ∫_0^1 g(x) ( W(x, z_1)/s(w_1) − W(x, w_1)/s(w_1) + |W(x, z_1) − W(x, w_1)|/s(w_1) ) dx
 = (1/2) ∫_0^1 g(x) ( W(x, z_1) − W(x, w_1) + |W(x, w_1) − W(x, z_1)| ) / s(w_1) dx

(we have used here that ∫ g(x)W(x, z_1)/s(z_1) dx = 1 = ∫ g(x)W(x, w_1)/s(w_1) dx). It is easy to check that W(x, z_1) − W(x, w_1) + |W(x, w_1) − W(x, z_1)| ≤ 2 − 2W(x, w_1), and using the trivial fact that g(x) ≤ 1, we get

d_tv(π_z, π_w) ≤ (1/s(w_1)) ∫_0^1 g(x)(1 − W(x, w_1)) dx ≤ (1/s(w_1)) ∫_0^1 (1 − W(x, w_1)) dx ≤ ∆(W)/s(w_1).

To estimate the denominator, note that

g(x)W(x, w_1) = ∏_{i=1}^r W(x, w_i) ≥ 1 − ∑_{i=1}^r (1 − W(x, w_i)),

and so

s(w_1) = ∫_0^1 g(x)W(x, w_1) dx ≥ ∫_0^1 ( 1 − ∑_{i=1}^r (1 − W(x, w_i)) ) dx ≥ 1 − D∆(W).
This proves the lemma.
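For a stepfunction graphon, the quantity (20.12) and the bound of Lemma 20.8 are one-liners to compute. The sketch below assumes the reconstruction of (20.12) used above (∆ measures the deviation of W from the constant-1 graphon); for W = W_{Kq} it reproduces the bound dob(W_{Kq}) ≤ 1/(q − D) used before Corollary 20.6.

```python
def delta_W(B, weights):
    """Delta(W) = max_i sum_j weights[j] * (1 - B[i][j]) for a
    stepfunction graphon with steps of the given probabilities and
    values B[i][j] in [0, 1]."""
    return max(sum(w * (1 - b) for w, b in zip(weights, row)) for row in B)

def dobrushin_bound(B, weights, D):
    """Upper bound on dob(W) from Lemma 20.8; needs D * Delta(W) < 1."""
    d = delta_W(B, weights)
    assert D * d < 1, "Lemma 20.8 gives no bound here"
    return d / (1 - D * d)

# W_{K_q} (proper q-colorings): B[i][j] = 0 iff i == j.
q, D = 7, 2
B = [[0 if i == j else 1 for j in range(q)] for i in range(q)]
w = [1 / q] * q
print(delta_W(B, w))             # 1/q
print(dobrushin_bound(B, w, D))  # 1/(q - D)
```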
Exercise 20.9. For a random 3-coloring of the path Pn, determine the correlation between the colors of the endnodes. Does it decay exponentially?

Exercise 20.10. Prove that for every graphon W there is a kernel U of rank 1 such that ∥U − W∥_1 ≤ dob(W).

Exercise 20.11. Let U > 0 be a kernel of rank 1, and let W be an arbitrary kernel. Prove that dob(W) ≤ 4∥(W/U) − 1∥_∞.
20.1.4. Random homomorphisms from infinite graphs into graphons. Our goal is to define a random homomorphism G → W, where G is a countable graph with degrees bounded by D, and W is a graphon. First, some technicalities: the W-weightings of V(G) form the product space [0, 1]^{V(G)}, which we endow with the product sigma-algebra A. A random homomorphism will be defined by a probability measure on A. To specify such a measure, it suffices to specify its values on cylinder sets obtained by restricting the weights of a finite number of nodes of G to given Borel sets. In other words, we can specify a distribution π on W-weightings of G by specifying its restriction π|_S to every finite set S ⊆ V(G). Of course, these restrictions must satisfy appropriate consistency conditions: if S ⊆ T, then (π_T)|_S = π_S. Once we have a family
(π_S) of distributions satisfying these consistency relations, then the Extension Theorem of Kolmogorov gives us a probability distribution π on all W-weightings such that π|_S = π_S for all finite sets S ⊆ V(G). So far, this is quite simple. There are many ways to specify such a family (π_S) of distributions. However, we would like other conditions to be satisfied. Let us formulate two:

• Markov property. If G1 and G2 are two finite k-labeled graphs, φ is a random W-weighting of G1G2, and we condition on φ|_{[k]}, then φ|_{V(G1)} and φ|_{V(G2)} become independent. This is just another way to express the product formula (5.53). This property can be generalized to infinite graphs. Of course, we have to exercise some care, since G1 and G2 may be infinite. Let S ⊆ V(G) be a finite set and suppose that G \ S is the disjoint union of two graphs G1 and G2. Let z be a W-weighting of S, and let x denote a random W-weighting of V \ S, drawn from the distribution obtained by conditioning on the weighting z on S. We require that the random weightings x|_{V(G1)} and x|_{V(G2)} be independent. We say that the distribution of x has the Markov property if this condition holds for every finite subset S ⊆ V(G) and every W-weighting z of S.

• Locality. For a finite set S ⊆ V(G), we would like to get a good idea of the distribution π_S by looking at a sufficiently large neighborhood of S. Let B(S, r) = {v ∈ V(G) : d(v, S) ≤ r} be the r-neighborhood of S, and let x_r denote a random W-weighting of G[B(S, r)]. Then we want that x_r|_S → x|_S in distribution as r → ∞. We call the distribution of x local if this holds.

These conditions are not too strong, as the following classical theorem shows (see Georgii [1988] and Simon [1993] for slightly different statements of this fact).

Theorem 20.12. Let G be a countable graph with degrees bounded by D, and let W be a graphon such that dob(W) < 1/D. Then there is a unique local probability distribution π_{G,W} on W-weightings of G with the Markov property.

Proof. Let S ⊆ V(G) be a finite set, and let x_r be a random W-weighting of G[B(S, r)].

Claim 20.13. The distribution of x_r|_S tends to a limit as r → ∞.

We show that these distributions form a Cauchy sequence in the total variation distance. Let ε > 0. Since dob(W) < 1/D, we can choose r large enough so that (D dob(W))^r ≤ ε/|S|. Let m, n > r; we claim that the distributions of x_m|_S and x_n|_S are ε-close in total variation distance. Let z_n be the restriction of x_n to B(S, n) \ B(S, r), and let x′_n be the random weighting of G[B(S, n)] obtained by conditioning on z_n. By the Markov property (we are using it for finite graphs here!), x′_n has the same distribution as x_n. We define z_m and x′_m analogously.

Now we fix any two weightings z_n of B(S, n) \ B(S, r) and z_m of B(S, m) \ B(S, r), and let y_n and y_m be obtained by conditioning x_n and x_m on these partial weightings. By Theorem 20.4, y_n and y_m can be coupled so that P(y_n(v) ≠ y_m(v)) ≤ ε/|S| for every v ∈ S. This implies that d_tv(y_n|_S, y_m|_S) ≤ ε. Since this holds for fixed z_n and z_m, it also holds if they are random restrictions of x_n and x_m, so it holds for x′_n and x′_m. Since these weightings have the same distribution as x_n and x_m, the claim follows.
Now we are able to define the distribution on W-weightings. For a finite set S ⊆ V(G), let π_S be the limit of the distributions of x_r|_S as r → ∞. It is easy to check (using arguments similar to those in the proof of Claim 20.13 above) that the family (π_S) of distributions is consistent, and the distribution π_{G,W} they define has the Markov property. Uniqueness follows immediately from locality.

A probability distribution on W-weightings of G is called a Gibbs state if it is invariant under the Markov chain M of local re-weightings (as used in the proof of Theorem 20.4 in the finite case). It can be proved that under the condition that dob(W) < 1/(2D), the Gibbs state is unique.

Remark 20.14. In a sense, the construction of a random homomorphism can be extended to graphings. The method is similar to the Bernoulli lift of a graphing (Section 18.5). Given a graphing G and a graphon W on [0, 1] such that dob(W) < 1/(2D), we define a graphing G[W] on the Graph of Weighted Graphs H+. To describe the underlying probability distribution, we generate a random element from it as follows: we pick a random point x ∈ V(G) and generate a random W-weighting of G_x as described above. If W ≡ 1, we get the Bernoulli lift.

We cannot randomly map all points of a graphing into [0, 1] in any reasonable way; this is impossible even if the graphing has no edges. But if we select any countable subset, this can be mapped, and the graphing G[W] contains the necessary information. I don't know of any applications of this construction, but I like the fact that our two basic limit objects, graphings and graphons, can be combined this way.

20.2. Convergence from the right

While the theory of convergent sequences of bounded degree graphs lacks some of the key facts and constructions that apply in the dense case (most notably a good notion of distance), it is nicer in at least one respect: convergence of a graph sequence can be characterized by convergence of (appropriately normalized) homomorphism numbers into certain fixed graphs, so we don't have to switch to maximization of multicuts as in the dense case. This result is due to Borgs, Chayes, Kahn and Lovász [2012]. We note that the necessity of the right-convergence condition follows only for target graphs that satisfy the Dobrushin condition (but under this condition, it follows more generally for homomorphism numbers into graphons).

To state this result, let us define, for a simple graph G with bounded degrees and a graphon W, the (sparse, normalized) homomorphism entropy

ent*(G, W) = log t(G, W) / v(G).

In the case when W = W_H for some weighted graph H on q nodes, we write ent*(G, H). In this special case, we could replace t(G, H) by hom(G, H) in this definition: this would mean simply adding log α_H to the value, so it is a matter of taste which version one uses in the definition.

To see the meaning of ent*(G, W), consider the case when W = W_H for some simple graph H. Then log hom(G, H)/v(G) expresses the freedom (entropy) we have in choosing the image of a node v ∈ V(G) in a homomorphism G → H, and ent*(G, H) (which is always nonpositive) expresses the loss of entropy per node due to taking homomorphisms instead of all maps. The main result in this section is the following.
Theorem 20.15. For any sequence (Gn) of graphs in G, the following are equivalent: (i) (Gn) is locally convergent; (ii) for every graphon W with dob(W) ≤ 1/D, the sequence ent*(Gn, W) is convergent; (iii) there is an ε > 0 such that for every looped-simple graph H with ∆(H̄) ≤ εv(H) the sequence ent*(Gn, H) is convergent.

The equivalence of conditions (ii) and (iii) is analogous to the equivalence of conditions (ii) and (iii) in Theorem 12.20, and similarly as there, we could replace them by any condition "in between", like weighted graphs satisfying the Dobrushin condition.

In the special case when H = Kq, we have ∆(K̄q) = 1, and hom(G, Kq) is the number of q-colorings of G. So it follows that if (Gn) is convergent and q > 2D, then the number of q-colorings grows as c^{v(Gn)} for some c > 1. It is easy to see that some condition on q is needed: for example, if Gn is the n-cycle and q = 2, then ent*(Gn, K2) oscillates between −∞ and ≈ 0 as a function of n.

Lemma 20.8 says that ∆(W) < 1/(2D) is sufficient for (ii) to apply. This condition could not be relaxed by more than a constant factor, as the following example shows.

Example 20.16. Let Gn be a random D-regular graph on 2n nodes, and G′n be a random bipartite D-regular graph on 2n nodes. The interlaced sequence (G1, G′1, G2, G′2, . . .) is locally convergent with high probability (almost all r-neighborhoods are D-regular trees if r is fixed and n is large enough). Let H be obtained from K2° by weighting the non-loop edge by 2^c and the loops by 1. Inequality (5.33) can be generalized to give the bounds

c maxcut′(G) ≤ ent*(G, H) ≤ c maxcut′(G) + 1.

(Here maxcut′(G) = Maxcut(G)/v(G) is normalized differently from the normalization in (5.33).) The maximum cut of the bipartite graph G′n contains all Dn edges, so maxcut′(G′n) = D/2; but the maximum cut in Gn has at most 2Dn/3 edges with high probability if D is large enough (see Bertoni, Campadelli, Posenato [1997] for a sharp estimate), so maxcut′(Gn) ≤ D/3. Hence

ent*(G′n, H) ≥ cD/2,   but   ent*(Gn, H) ≤ 1 + cD/3.

If cD/2 − cD/3 = cD/6 > 1, then the sequence (ent*(G1, H), ent*(G′1, H), ent*(G2, H), ent*(G′2, H), . . .) cannot be convergent with high probability. So assuming ∆(W) ≤ 7/D would not be enough in (ii).

While Theorem 20.15 sounds similar to the results in Chapter 12 (in particular Theorem 12.20), it is both more and less than that theorem. We get a characterization of convergence in terms of left and right homomorphisms, but no analogue of the characterization as a Cauchy sequence in the cut metric. Also, convergence is not established for all soft-core graphs H, just for those close to a complete graph. On the other hand, the proof below says more, since it provides explicit formulas relating left and right homomorphism numbers. Furthermore, homomorphism densities into graphons are considered, not just weighted graphs; recall that the corresponding extension of Theorem 12.20 to graphons is false (Remark 12.22).
Supposing that we have a convergent sequence (Gn) tending to an involution-invariant distribution σ (or to a graphing G), what is the limit of the homomorphism entropies? The answer is not trivial, since there is no good way to "count" homomorphisms of an infinite graph (or of a graphing) into a weighted graph H. Even for the number of q-colorings (q > 2D), and for a sequence of D-regular graphs with girth tending to ∞, where the Benjamini–Schramm limit is a single (infinite) D-regular rooted tree, the limiting value is nontrivial to determine. A natural guess would be that starting at the root of the infinite tree, and working our way out, we have q choices for the color of the root and q − 1 choices for every other node, which suggests an entropy of ent*(Gn, Kq) → log(1 − 1/q). But this is not the right answer, which was determined by Bandyopadhyay and Gamarnik [2008]:

(20.14) ent*(Gn, Kq) → (D/2) log(1 − 1/q).

To motivate the following description of the limiting homomorphism entropy in the general case, consider a finite graph G. For any ordering (v1, . . . , vn) of its nodes, we consider the graphs Gi = G[v1, . . . , vi]. Then

(20.15) ent*(G, W) = (1/n) ∑_{i=1}^n log ( t(Gi, W) / t(Gi−1, W) ).
If W = W_H for some looped-simple graph H, then the fraction inside the logarithm is the conditional probability that a random map V(Gi) → V(H) is a homomorphism, given that its restriction to Gi−1 is a homomorphism. In general, it can be expressed as

t(Gi, W) / t(Gi−1, W) = E_y E_x ( ∏_{u∈N(vi)∩V(Gi−1)} W(x, y_u) ),
where x is a uniform random number in [0, 1], and y is a random W-weighting of Gi−1. If we try to extend this to infinite graphs, the formula makes sense, but as Example 20.16 shows, it may give the wrong result. The trick is to average over all orderings of V(G). We generate an ordering by a random map τ : V(G) → [0, 1]. We denote the set of nodes u ∈ V(G) with τ(u) < τ(v) by Vτ(v), and set Nτ(v) = N(v) ∩ Vτ(v). We can view (20.15) as another averaging (over a random node of G). Thus we get

ent*(G, W) = E_v E_τ log E_y E_x ( ∏_{u∈Nτ(v)} W(x, y_u) ),
where x is a uniform random number in [0, 1], where y is a random W -weighting of G[Vτ (v)]. Now this formula extends to involution-invariant distributions σ. Instead of a random node v, we consider a random rooted graph (G, v) from σ. Instead of random bijection V (G) → [n], we consider a random map τ : V (G) → [0, 1]. Assuming G satisfies the Dobrushin condition, so does G[Vτ (v)], and so the random W -weighting y is well defined. So we can define ( ∏ ) (20.16) ent∗ (σ, W ) = E(G,v) Eτ log Ey Ex W (x, yu ) . u∈Nτ (v)
378
20. RIGHT CONVERGENCE OF BOUNDED DEGREE GRAPHS
Then we have the following supplement to Theorem 20.15. Supplement 20.17. Let (Gn ) be a locally convergent sequence of graphs with degrees at most D, and let σ be the involution-invariant distribution representing its limit. Let W be a graphon with dob(W ) < 1/D. Then ent∗ (Gn , W ) → ent∗ (σ, W ). Let us illustrate that the rather hairy formula (20.16) does allow us to determine the limiting values of homomorphism entropies, at least in a simple case. Example 20.18. Suppose that Gn is D-regular and the girth of Gn tends to infinity. Let H = Kq , so that hom(G, H) = chr(G, q). Then Gn tends to the involution-invariant distribution concentrated on the infinite D-regular tree (at least we don’t have to take expectation over this). Specializing (20.16), we get ( ∏ ) ent∗ (σ, W ) = Eτ log Ey Ex 1(x ̸= yu ) . u∈Nτ (v)
Here y is a random coloring with colors from [q], and x is a random color. Whatever Vτ (v) is, y assigns uniform and independent colors to the nodes in Nτ (v), since our graph is a tree. Hence for every x, ) ( q − 1 )|Nτ (v)| ( ∏ 1(x ̸= yu ) = Ey , q u∈Nτ (v)
and hence ent∗ (σ, W ) = Eτ log
(( ) q − 1 )|Nτ (v)| q
( ( 1) D 1) = Eτ (Nτ (v)) log 1 − = log 1 − . q 2 q
So we get the theorem of Bandyopadhyay and Gamarnik (20.14).
Proof of Theorem 20.15. (i)⇒(ii) Let G = (V, E) be a simple graph with degrees bounded by D. We may assume that αH = 1. We use the formula (20.15) derived above, and concentrate on the innermost expression ) ( ∏ W (x, yu ) . s(v, τ, y) = Ex u∈Nτ (v)
The Dobrushin Uniqueness Theorem 20.4 implies that we don’t change the expression by much if we restrict everything to the r-neighborhood Nr (v). To be precise, let c = Ddob(H) < 1, and define Gr = G[Nr (v)], Vτr (v) = Nr (v) ∩ Vτ (v), and let sr denote the function s defined for the graph Gr . Let z be a random W -weighting of G[Vτr (v)], then Theorem 20.4 implies that the distributions of y and z, when restricted to v and its neighbors, are closer that (D + 1)cr−1 is total variation distance. This implies that r Ez s (v, τ, z) − Ey s(v, τ, y) ≤ (D + 1)cr−1 , and hence (20.17) where (20.18)
∗ ent (G, H) − Ev Fr (v) ≤ (D + 1)cr−1 , ( ) Fr (v) = Eτ log Ez sr (v, τ, z) .
(We can take expectation over the same τ , since it induces a uniform random permutation of V (Gr ) as well as of V (G).)
20.2. CONVERGENCE FROM THE RIGHT
379
Let us note that in (20.18) Fr (v) depends only on the r-ball B = Nr (v), and we can denote it by F (B). This allows us to express Ev Fr (v) in terms of the distribution σG,r of r-neighborhoods in G. Thus (20.17) implies ∑ ∗ (20.19) σG,r (B)F (B) ≤ (D + 1)cr−1 . ent (G, H) − B∈Br
Now, let (Gn ) be a locally convergent sequence tending to an involutioninvariant distribution σ. Then (20.19) implies that ∑ lim sup ent∗ (Gn , H) ≤ σr (B)F (B) + (D + 1)cr−1 , n
B∈Br
and hence lim sup ent∗ (Gn , H) ≤ lim inf r
n
∑
σr (B)F (B).
B∈Br
A similar argument proves that lim inf n ≥ lim supr , which implies that both limits exist. (ii)⇒(iii) is trivial. (iii)⇒(i) We switch to the natural logarithm, since we are going to use analytic formulas (this only means that all formulas are multiplied by ln 2). We express the logarithm of t(G, H) as ∑ (20.20) ln t(G, H) = ℓ(G[S], H), S≤V (G)
where by M¨obius inversion, (20.21)
ℓ(G, H) =
∑
(−1)|V (G)|−|S| ln t(G[S], H).
S⊆V (G)
Using that ln t(., H) is an additive graph parameter for any fixed H, it is easy to see that ℓ(F, H) = 0 unless F is a connected graph together with isolated nodes (cf. Exercise 4.2). The term corresponding to the edgeless graph is 0, and so we can modify (20.20) so that the summation runs over connected induced subgraphs of G. Collecting terms with isomorphic graphs, we get ∑ ind(F, G) ℓ(F, H) (20.22) ent∗ (G, H) = · , v(G) aut(F ) F
where the summation ranges over all isomorphism types of connected graphs F ; but of course, only a finite number of terms are non-zero for any fixed G. So we can express the homomorphism entropies ent∗ (Gn , H) as linear combinations of the induced subgraph densities ind(F, Gn )/v(Gn ). This suggests a heuristic for the proof: We show that the system of equations (20.22) can be inverted, to express the induced subgraph densities as linear combinations of the homomorphism entropies. It follows then that if the homomorphism entropy into any given graph converges to some value, then so does the frequency of each induced subgraph. This heuristic is of course very naive: (20.22) is an infinite system of equations, and so to do anything with it we need tail bounds; furthermore, the coefficient ℓ(F, H) is defined by the hairy formula (20.21), which has all the unpleasant features one can think of: it has an exponential number of terms, these terms alternate in sign, and the terms themselves are logarithms of simpler functions.
380
20. RIGHT CONVERGENCE OF BOUNDED DEGREE GRAPHS
The identities developed in Section 5.3.1 come to rescue. We can get rid of the logarithms using Corollary 5.22. Substituting the formula for ln t(G, H) in the definition of ℓ(G, H), we get a lot of cancellation, which leads to the formula (20.23)
ℓ(F, H) =
∞ ∑ (−1)m m! m=1
∑
(−1)
∑ i
e(Ji )
J1 ,...,Jm ∈Conn(F ) ∪i V (Ji )=V (F )
k ( )∏ × cri L(J1 , . . . , Jm ) t(Jr , H). r=1
(It is not clear at this point that this is any better than (20.21), but be patient.) Next we turn to inverting the expression (20.22). Let m ≥ 1 and let {F1 , . . . , FN } be the set of all connected simple graphs with 2 ≤ v(Fi ) ≤ m. Let q > m/ε, add q − v(Fi ) ≥ 1 new isolated nodes to Fi , and take the complement to get a looped-simple graph Hi on [q] with loops added at all nodes. We weight each node of Hi by 1/q. Every node in Hi has degree at least q − m, so ∆(H i ) ≤ εq. Consider any graph G with all degrees at most D. We write (20.22) in the form (20.24)
ent∗ (G, Hj ) =
N ∑ ind(Fi , G) ℓ(Fi , Hj ) · + R(G, Hj ), v(G) aut(Fi ) i=1
where (20.25)
∑
R(G, Hj ) =
v(F )>m
ind(F, G)ℓ(F, Hj ) aut(F )v(G)
is a remainder term. We can view (20.24) as a system of N equations in the N unknowns xi = ( )N inj(Fi , G)/v(G). Let A = ℓ(Fi , Hj )/aut(Fi ) i,j=1 be the matrix of this system, and let s, R ∈ RN be defined by sj = ent∗ (G, Hj ) and Rj = R(G, Hj ), then we have AT x = s−R. Assuming that A is invertible (which we will prove momentarily), let B = (AT )−1 . Then the system can be solved: x = B(s − R), or ind(Fi , G) ∑ = Bij ent∗ (G, Hj ) + ri (G), v(G) j=1 N
(20.26) where
ri = ri (G) =
N ∑
Bij R(G, Hj )
j=1
is a remainder term. We have to show that the matrix A is invertible (at least if q is large enough) and estimate the remainder terms. We use (20.23): (20.27)
ℓ(F, Hi ) =
∞ ∑ (−1)k k=1
k!
∑
∑
(−1)
i
e(Ji )
J1 ,...,Jk ∈Conn(F ) ∪V (Ji )=V (F )
k ( )∏ × cri L(J1 , . . . , Jk ) t(Jr , Hi ). r=1
20.2. CONVERGENCE FROM THE RIGHT
381
By the construction of Hi , we have t(Jr , Hi ) = q −v(Jr ) hom(Jr , Fi ), and so k ∏
t(Jr , Hi ) = q −
∑ r
v(Jr )
r=1
k ∏
t(Jr , Fi ).
r=1
Note that for a nonzero term the exponent of q is less than −v(F ) except for k = 1 and V (J1 ) = V (F ), and that the last product does not depend on q. Hence for any simple graph F , ∑ (20.28) ℓ(F, Hi ) = q −v(F ) (−1)e(J)−1 t(J, Fi ) + O(q −v(F )−1 ). J∈Csp(F )
(Here and in what follows, the constants implied in the big-O notation may depend ( )N on m, but not on q and G). By Proposition 5.43, the matrix M = t(Fi , Fj ) i,j=1 is nonsingular. Let L be the N × N matrix with entries Lij = 1(Fi ∈ Csp(Fj )), and let P and Q denote the diagonal matrices with entries Pii = (−1)e(Fi )−1 and Qii = q v(Fi ) aut(Fi ), respectively. Clearly L, P and Q are nonsingular. By (20.28), we have QAT = LT P M + O(q −1 ), which implies that A is nonsingular if q is large enough. Furthermore, Bij = q v(Fi ) aut(Fi )((M T P L)−1 )ij + O(q v(Fj )−1 ), and so |Bij | = O(q v(Fi ) ) = O(q m ).
(20.29)
Using this, the remainder terms can be estimated as follows: ∞ ∑ ∑ ind(F, G) |R(G, Hj )| ≤ |ℓ(F, Hj )| aut(F )v(G) r=m+1 v(F )=r
=
∞ ∑
∑
r=m+1 v(F )=r
(20.30)
=
∞ ∑
ind(F, G) O(q −r ) aut(F )v(G)
2Dr O(q −r ) = O(q −m−1 ).
r=m+1
and (20.31)
ri (G) =
N ∑
Bji R(G, Hj ) = O(q m )O(q −m−1 ) = O(q −1 ).
j=1
So we have proved that in (20.26), for fixed m, the error term ri tends to 0 as q → ∞. The rest of the proof is standard analysis: Assume that ent∗ (Gn , H) → Sj (n → ∞) for every looped-simple graph H with ∆(H) ≤ ε. Consider any simple graph Fi on m nodes. Equation (20.26) implies that (20.32)
N N ind(F , G ) ∑ ∑ i n − Bji Sj ≤ |Bji ||ent∗ (Gn , Hj ) − Sj | + |ri (Gn )|. v(Gn ) j=1 j=1
Let δ > 0 be given, and choose q large enough so that |ri (Gn )| ≤ δ/2 for every n (recall that the big-O in (20.31) does not depend on G). Since ent∗ (Gn , Hj ) → Sj ,
382
20. RIGHT CONVERGENCE OF BOUNDED DEGREE GRAPHS
the first term on the right side of (20.32) is at most δ/2 if n is large enough. It follows that ind(F, Gn )/v(Gn ) is a Cauchy sequence, which means that the sequence (Gn ) is locally convergent. The proof of the Supplement is based on similar arguments and not given here in detail. The proof method used above for (iii)⇒(i) can also be used to prove a somewhat weaker version of (ii), replacing the Dobrushin condition dob(W ) < 1/D by 8D∆(W ) < 1. In fact, the expression (20.22) yields itself more directly to a proof of (i)⇒(ii) than to a proof of (b): naively, if the frequency of any induced subgraph converges to some value, then so do the homomorphism entropies. The main issue is to obtain good tail bounds, which can be done similarly as in the proof above, as long as we are satisfied with proving the convergence for very small ∆(W ); but if we want a bound that is sharp up to a constant, then we need more technical computations. We refer to the paper of Borgs, Chayes, Kahn and Lov´asz [2012] for these details. Remark 20.19. It is a natural question to ask which sequences of bounded degree graphs are right-convergent in the sense that their homomorphism entropies converge for all soft-core target graphs. Gamarnik [2012] studies this problem for sparse random graphs, but the general question is unsettled. It is also natural to ask whether local-global convergence can be characterized by any right-convergence condition. Exercise 20.20. Let G and G′ be two graphs on the same set of nodes [n], such nd that |E(G)△E(G′ )| ≤ εn. Prove that δ⊙ (G1 , G2 ) ≤ 2ε. Exercise 20.21. Let H be a weighted graph with positive edgeweights and (Gn ), a bounded degree graph sequence for which the sequence (ent∗ (Gn , H)) is convergent. Let G′n be obtained from Gn by deleting o(v(Gn )) nodes and edges. Prove that (ent∗ (G′n , H)) is convergent. Exercise 20.22. Let H be a weighted graph with at least one positive edgeweight. Prove that the sequence ent∗ (Pn Pm , H) is convergent as n, m → ∞, and the same holds for the sequence ent∗ (Cn Pm , H), provided n is restricted to even numbers. Exercise 20.23. Let H be a weighted graph whose edges with positive weight form a connected and nonbipartite graph. Prove that the sequence ent∗ (Cn Pm , H) is convergent as n, m → ∞. Exercise 20.24. Let σ be an involution-invariant measure. Show how to express s(σ) in terms of the associated Bernoulli graphing.
CHAPTER 21
On the structure of graphings 21.1. Hyperfiniteness A notion related to Følner sequences in the theory of amenable groups is “hyperfiniteness” for general graph families with bounded degree, which can be extended to graphings in a natural way. This notion was introduced (in different settings) by Kechris and Miller [2004], Elek [2007b] and Schramm [2008]. Hyperfiniteness of a graph family has a number of important consequences, like testability of many graph properties. Quoting an informal remark by Elek, hyperfinite bounded-degree graph families and graphings behave as nicely as dense graph sequences and graphons do. 21.1.1. Hyperfinite graph families. A graph G ∈ G is called (ε, k)hyperfinite (ε ∈ (0, 1), k ∈ N), if we can delete εv(G) edges and get a graph in which every connected component has at most k nodes. Let H ⊆ G be any family of finite graphs (recall that all degrees are bounded by D). We say that H is hyperfinite, if for every ε > 0 there is a k = k(ε) > 0 such that every G ∈ H is (ε, k)-hyperfinite. We could talk about deleting nodes instead of edges. Indeed, deleting one endnode of every edge in S results in even smaller components. Conversely, if deleting a set T of nodes results in a graph with small components, then deleting the set S of edges incident with any node in T leaves small components, and |S| ≤ D|T |. We will see that hyperfinite families are very well-behaved, often as well as dense graphs for analogous questions. How special are they? The examples below show that several important graph families are hyperfinite. (In fact, one has to work to construct a family that is not hyperfinite.) It is also likely that many large reallife networks can be thought of as hyperfinite, showing the potential applicability of the theory of hyperfinite families. Example 21.1 (Trees). The family of trees in G is hyperfinite. Indeed, select an endpoint r as the root in a tree T and fix an integer k ≥ 1. It is easy to see that if v(T ) > k, then there is always an edge such that the branch rooted at it has at least k/(D −1) but at most k nodes. If we delete recursively such edges, then the number of edges deleted is at most (D − 1)(v(T ) − 1)/k, and every connected component of the remaining forest has at most k nodes. So our tree is ((D − 1)/k, k)-hyperfinite. Example 21.2 (Grids). From an n×m grid G, delete the edges inside every M -th vertical and horizontal ribbon of squares (starting from the top and from the left, say). The number of edges deleted is at most m((n − 1)/M ) + n((m − 1)/M ) < 2v(G)/M , and every connected component of the remaining graph has at most M 2 nodes. So this grid is (M 2 , 2/M ) hyperfinite. 383
384
21. ON THE STRUCTURE OF GRAPHINGS
Example 21.3 (Planar graphs). More generally, the family of planar graphs with degree bounded by D is hyperfinite. Indeed, let G be such a graph on n nodes. The Lipton–Tarjan Planar Separation Theorem [1979] says that G has a √ set S of at most 3 n nodes such that every connected component of G − S has at most 2n/3 nodes. We repeat this with every connected component of the remaining graph until all components have at most K nodes. The estimation of the number of deleted nodes is somewhat tricky and it is left to the reader as Exercise 21.21. Example 21.4 (Random regular graphs). Let us generate a random D-regular graph Gn on n nodes, for every even n, by choosing one of the D-regular graphs uniformly at random. The family of graphs obtained is not hyperfinite with probability 1 if D ≥ 3. The following heuristic argument to prove this can be made precise rather easily. If Gn is (ε, k)-hyperfinite, then V (G) can be split into two sets of size between n/2 −k and n/2+ k in such a way that the number of edges between the two classes is at most εn. On the other hand, let {S1 , S2 } be a partition of [n] into two classes of size about n/2, and let Z denote the number of edges in Gn connecting the two classes. The expected number of such edges is about Dn/4. Furthermore, Z is highly concentrated around its mean, and so the probability that Z ≤ εn is o(2−n ) if ε is small enough. There are fewer than 2n such partitions of [n], so with high probability all of these have more than εn edges connecting the two classes. Example 21.5 (Expanders). We call a family E ⊆ G of graphs an expander family if there is a c > 0 such that for every graph G ∈ E and every S ⊆ V (G) with |S| ≤ v(G)/2, we have eG (S, V (G) \ S) ≥ c|S|. An infinite family of expander graphs is not hyperfinite. Indeed, T ⊆ E(G) and G−T has components G1 , . . . , Gr , and all of these have fewer than v(G)/2 nodes, then |T | =
r r ) ∑ 1∑ ( eG V (Gi ), V (G) \ V (Gi ) ≥ cv(Gi ) = cv(G). 2 i=1 i=1
It can be shown that the family of random D-regular graphs in Example 21.4 is an expander family with probability 1. The following explicit construction for an expander graphing was given (in a different context) by Margulis [1973]. Consider the space R2 /Z2 , a.k.a. the torus. Let us connect every point (x, y) to the points (x ± y, y) and (x, y ± x) (additions modulo 1; we can leave out the axes if we don’t want loops). This graph is the support of the measure preserving family consisting of the two maps (x, y) 7→ (x + y, y) and (x, y) 7→ (x, x + y), and hence it is a graphing. Furthermore, this graphing is an expander, and hence not hyperfinite. This is not easy to prove; for a proof based on Fourier analysis, see Gabber and Galil [1981]. Example 21.6. A special case of a hyperfinite family is a family of graphs with subexponential growth, familiar from group theory. To be precise, for a function f : N → N we say that a family H of graphs has f -bounded growth, if for any graph G ∈ H, any v ∈ V (G) and any m ∈ N, the number of nodes in the mneighborhood of v is at most f (m). We say that H has growth, if it ( subexponential ) has f -bounded growth for some function f such that ln f (m) /m → 0 (m → ∞). It was asked by Elek and proved by Fox and Pach [unpublished] that this property implies hyperfiniteness (Exercise 21.22).
21.1. HYPERFINITENESS
385
The following important class of hyperfinite families, generalizing Example 21.3 was found by Benjamini, Schramm and Shapira [2010]. Proposition 21.7. Every minor-closed family of graphs in G that does not contain all graphs is hyperfinite. Proof (sketch). Alon, Seymour and Thomas [1990] proved that the Planar Separator Theorem extends to every minor-closed property not containing all graphs. The same argument as described in Example 21.3 can be carried through to give hyperfiniteness. 21.1.2. Hyperfinite graphings. Hyperfiniteness can be generalized to graphings; in fact, we don’t have to talk about classes of graphings, the notion makes sense for a simple graphing, and leads to some nontrivial questions. A graphing G is (ε, k)-hyperfinite (ε ∈ (0, 1), k ∈ N), if there is a Borel set S ⊆ E(G) with η(S) ≤ ε such that every connected component of G − S has at most k nodes. A graphing G is hyperfinite, if for every ε > 0 there is a positive integer k such that G is (ε, k)-hyperfinite. We could relax this definition and ask for a set S ⊆ E(G) with η(S) ≤ ε such that every connected component of G − S is finite. But this would not change the notion of hyperfiniteness. Indeed, suppose that G satisfies the relaxed condition; we show that it satisfies this stronger condition as well. We choose a set S ′ for ε/2 in place of ε. Let Vm ⊂ V (G) \ S ′ be the set of points contained in some connected component of G−S ′ with m nodes. Then Vm is measurable (Exercise 18.10), and we ∑ have m η(E(G[V ∑ m ])) = η(E(G)) ≤ D. It follows that there is a∪positive integer K such that m≥K η(E(G[Vm ])) ≤ ε/2, and so the set S = S ′ ∪ m≥K E(G[Vm ]) is a set of measure at most ε such that every connected component of G − S has fewer than K nodes. Similarly as for graphs, we could define the same notion by the existence of a Borel set of points S ⊆ V (G) such that λ(S) ≤ ε and every connected component of G − S is finite. For a graphing, (ε, k)-hyperfiniteness can be expressed in a local-global way, through a red-blue coloring of the edges such that the η-measure of red edges is less than ε, and every connected k-node subgraph contains a red edge. Using this, it is easy to verify the following fact. Proposition 21.8. Let G and G′ be locally-globally equivalent graphings. If G is (ε, k)-hyperfinite, then G′ is (ε′ , k)-hyperfinite for every ε′ > ε. This proposition does not remain true for locally equivalent graphings (see Example 21.12 below). The following important and surprisingly non-trivial theorem, which is a version of a result of Schramm (see Theorem 21.13 below) shows that hyperfiniteness (without parameters) is preserved by local equivalence. Theorem 21.9. Let G and G′ be locally equivalent graphings. If G is hyperfinite, then so is G′ . The proof is best understood if we introduce a fractional version of hyperfiniteness. The motivation comes from combinatorial optimization. Let G be a graphing, and let R denote the set of subsets Y ⊆ V = V (G) that induce a connected subgraph of G and have at most k elements. This can be considered as a subset of U = V ∪ V 2 ∪ · · · ∪ V k , and it is a Borel set if we endow U with the natural sigmaalgebra it inherits from V . Let S ⊆ E(G) be a Borel set such that every connected
386
21. ON THE STRUCTURE OF GRAPHINGS
component of G \ S has at most k nodes. For x ∈ V , let Cx denote the node set of the connected component of G \ S containing x. The sets Cx partition V . Let x be a random point of G, then Cx is a random member of R, which has the following two properties: (1) If we select Cx first, and then select a uniform random point y ∈ Cx (note that Cx is finite!), then y is distributed according to λ. (2) If ∂(X) denotes the number of edges of G connecting X to V (G)\X (where X is a finite subset of V (G)), then ( ∂(C ) ) x E = η(S) ≤ ε. |Cx | Both of these properties can be easily verified using the Mass Transport Principle (similarly to the proof of Proposition 18.50). This motivates the next definition: we call a probability distribution τ on R a fractional partition (into parts in R), if selecting Y ∈ R according to τ , and then a point y ∈ Y uniformly, we get a point distributed according to λ; and we define the boundary value of τ as ( ∂(Y) ) . ∂(τ ) = E |Y| We say that G is fractionally (ε, k)-hyperfinite, if there is a fractional partition τ such that ∂(τ ) ≤ ε. It follows from the discussion above that every (ε, k)-hyperfinite graphing is fractionally (ε, k)-hyperfinite. The converse is not true (cf. Example 21.12), but we have the following weak converse: Lemma 21.10. If a graphing is fractionally (ε, k)-hyperfinite, then it is (ε log(8D/ε), k)-hyperfinite. Proof. We use the Greedy Algorithm to construct a partition from a fractional partition τ that establishes that G is fractionally (ε, k)-hyperfinite. Similar algorithms are well known in combinatorial optimization, but here we have to be careful, since we are going to construct an uncountable family of sets, and have to make sure that the partition we obtain has the property that the set of edges connecting different classes is Borel (and of course has small measure). We do our construction in r = ⌊log(2D/ε)⌋ phases. We start with R0 = U0 = ∅. In the j-th phase, let Uj,0 = Uj−1 be the union of previously selected sets. Let Rj,1 be the set of sets Y ∈ R such that ∂(Y ) < ε2j−1 |Y \ Uj,0 |. Let Qj,1 be a maximal Borel set of sets Rj,1 such that the sets Y \ Uj,0 are disjoint. Such a set exists by the following construction. Let H0 be the intersection graph of Rj,0 . It is easy to see that H0 is a Borel graph with bounded degree, and so it contains a maximal stable set that is Borel (this is implicit in the proof of Theorem 18.3, see Exercise 18.11). Let Uj,1 be the union of Uj,0 and the sets in Qj,1 . The phase is not over; we select a maximal Borel family Qj,2 of sets Y ∈ R such that the sets Y \ Uj,1 are disjoint and ∂(Y ) < ε2j−1 |Y \ Uj,1 |. We let Uj,2 be the union of Uj,1 and the sets in Qj,2 . We repeat this k + 1 times, to finish the j-th phase (after a while, we may not be adding anything). Let Qj be the family of sets Y ∈ R selected in the j-th phase, and let Uj = Uj,k be their union. We repeat this for j = 1, . . . , r. Let Q = Q1 ∪ · · · ∪ Qr be the set of all sets Y ∈ R selected. For every Y ∈ Qj , let Y 0 = Y \ Uj−1 (this is the set of nodes first
21.1. HYPERFINITENESS
387
covered by Y ). Let T0 be the set of all edges incident with any node of V \ Ur , and let T1 denote the set of all edges connecting any set Y ∈ Q to its complement. Clearly every connected component of G − (T0 ∪ T1 ) has at most k nodes. Next we show that ∂(Y ) ≥ ε2j−1 |Y \ Uj |
(21.1)
for every Y ∈ R (selected or not) and 1 ≤ j ≤ r. Suppose (by way of contradiction) that ∂(Y ) < ε2j−1 |Y \ Uj |. Then Y ̸⊆ Uj , and hence it was not selected in the j-th phase or before. But Y was eligible for selection throughout the j-th phase, and if it was not selected, then (by the maximality of the family selected) Y must contain a point of Uj,i \ Uj,i−1 for i = 1, . . . , k + 1, which is impossible since |Y | ≤ k. This proves (21.1). We want to bound the measure of T0 ∪T1 . We start with T0 . Select a random set Y ∈ R from the distribution τ , and a random point y ∈ Y . Then y is distributed according to λ, and hence by (21.1) and the definition of the fractional partition τ we have ( |Y \ U | ) ( ∂(Y) ) j (21.2) λ(V \ Uj ) = P(y ∈ / Uj ) = E ≤E ≤ 21−j |Y| ε2j−1 |Y| for every 1 ≤ j ≤ r. In particular, we have λ(V \ Ur ) ≤ 21−r ≤ 2ε/D by the choice of r, and hence ∫ η(T0 ) ≤ deg(x) dx ≤ Dλ(V \ Ur ) ≤ 2ε. V \Ur
Turning to T1 , we select a random point y of G again, and consider the set Y ∈ Q that is the first set added containing y. Since sets added at the same time are disjoint, this is well-defined, unless y ∈ / Ur , in which case we take Y = {y}. We consider Y a random set, from some distribution alpha). We can generate a random y by generating Y first according to α, and then selecting a uniform random element of Y0 . Counting every edge in T1 with its endpoint that was selected first (breaking ties arbitrarily), we have r ( ∂(Y) ) ∑ ( ) η(T1 ) ≤ E ≤ ε 2j−1 λ(Uj ) − λ(Uj−1 ) |Y0 | j=1 =ε
r ∑
( ) 2j−1 λ(V \ Uj−1 ) − λ(V \ Uj ) .
j=1
Doing partial summation and using (21.2) again, η(T1 ) ≤ ε
r ∑
2j−1 λ(V \ Uj ) ≤ εr ≤ ε log(2D/ε).
j=1
Hence η(T0 ∪ T1 ) ≤ 2ε + ε log(2D/ε) = ε log(8D/ε).
Theorem 21.9 follows from our characterization of local equivalence (Theorem 18.59), Lemma 21.10 and the following rather simple couple of facts. Proposition 21.11. Let φ : G1 → G2 be a local isomorphism between graphings G1 and G2 .
388
21. ON THE STRUCTURE OF GRAPHINGS
(a) If G2 is (ε, k)-hyperfinite, then so is G1 . (b) If G1 is fractionally (ε, k)-hyperfinite, then so is G2 . Proof. (a) Let S ⊆ E(G2 ) be a Borel set such that every connected component of G2 \ S is has at most k nodes. Then S ′ = φ−1 (S) is a Borel set in E(G1 ) with η1 (S ′ ) = η2 (S). We claim that almost every connected component of G1 \ S ′ has at most k nodes. Indeed, let X be the union of components of G1 \ S ′ with more than then for almost all x ∈ X, φ is an isomorphism on (G1 )x , and hence ( k nodes, ) φ (G1 )x is a connected subgraph of G2 \ S, which is a contradiction unless X has measure 0. (b) Let τ1 be a probability distribution on the set R1 of the connected induced subgraphs of G1 with at most k nodes that is a fractional partition with ∂(τ ) ≤ ε. Select Y ∈ R1 randomly according to τ1 ; then Z = φ(Y) is a random connected induced subgraph of G2 with at most k nodes. Let τ2 denote the distribution of Z. We claim that τ2 is a fractional partition. Indeed, we can generate a uniform random point of Z by generating a uniform random point y ∈ Y, and taking z = φ(y). (We use here that φ is a local isomorphism, and so it yields and isomorphism between Y and Z.) Since y is distributed according to λG1 , it follows that z is distributed according to λG2 . By a similar argument, we have ( ∂(Z) ) ( ∂(Y) ) E =E ≤ ε. |Z| |Y| This proves (b).
We note that the stronger assertion, namely that (ε, k)-hyperfiniteness of a graphing would be invariant under local equivalence is false. As a simple example, consider the two graphings Ca and C′′a (Exercise 18.60). These are locally equivalent, but C′′a contains a perfect matching that is a Borel set (the set of edges of the form (x, 1/2 + (x − a/2 mod 1/2)), where x ≤ 1/2), and hence it is (1/2, 2)hyperfinite. No such perfect matching exists in Ca (indeed, such a perfect matching would give a measure-preserving involution by Lemma 18.19, and its complement would be another one, contradicting the argument in Example 18.22). Hence Ca is not (1/2, 2)-hyperfinite. In this example, (ε, k)-hyperfiniteness is almost preserved, in the sense that Ca is not (1/2, 2)-hyperfinite, but (1/2 + ε, 2)-hyperfinite for every ε > 0. We will see that this is a general phenomenon among hyperfinite graphings (Corollary 21.18), but not among all graphings, as the following example shows. Example 21.12. For a 3-regular graph G, define a new graph G△ as follows. We replace every node by a triangle (call these principal triangles), and then replace every old edge by a copy of K4− , with the two nodes of degree 2 identified with the endpoints of the edge (Figure 21.1). The graph G△ has 6n nodes and 21n/2 edges. If G is bipartite, then G△ is (3/4, 3)-hyperfinite. Indeed, in this case V (G△ ) can be covered by disjoint triangles: we select all principal triangles corresponding to nodes in one color class, and all non-principal triangles disjoint from them. The number of remaining edges is |E(G△ )| − |V (G△ )| = 9n/2 = (3/4)|V (G△ )|. Now let G be nonbipartite, and let S be a minimum set of edges such that every connected component of G△ \ S has at most 3 nodes. Let A be the set of nodes in G for which the corresponding principal triangle is a component of G△ \ S. It is
21.1. HYPERFINITENESS
389
Figure 21.1. (3/4, 3)-hyperfiniteness is not even approximately preserved by local equivalence easy to check that for every edge of G induced by A or induced by V (G) \ A, there is at least one node of degree at most one among the corresponding four nodes of G△ \ S. The remaining nodes of G△ \ S have degree at most 2. Hence ) 1( |E(G△ ) \ S| ≤ |E(G)| − eG (A, V \ A) + 2(|V (G△ )| − |E(G)| + eG (A, V \ A)) 2 9n ≤ + Maxcut(G), 2 and so 9n |S| ≥ 6n − Maxcut(G) > , 2 since G is nonbipartite. So G△ is (3/4, 3)-hyperfinite. For the appropriate choice of G, we can prove more: let Gn be a random D-regular graph and G′n , a random D-regular bipartite graph on n nodes, then (G′n )△ is (3/4, 3)-hyperfinite. On the other hand, Maxcut(Gn ) < 1.41n with high probability (McKay [1982], see also Hladky [2006]), and we get that (Gn )△ is not even (4/5, 3)-hyperfinite. Let G and G′ be local-global limit graphings of the sequences (Gn )△ and ′ △ (Gn ) , respectively (or of appropriate subsequences), then G and G′ are locally equivalent, G is (3/4, 3)-hyperfinite, but G′ is not even (4/5, 3)-hyperfinite. Our argument proving Theorem 21.9, which was motivated by an argument of Schramm [2008], says that if a graphing is (ε, k)-hyperfinite, then every graphing locally equivalent to it is (ε′ , k)-hyperfinite with a somewhat larger ε′ than ε (namely, ε′ = O(ε log(1/ε))). We could have based another proof on the graph partitioning algorithm of Hassidim, Kelner, Nguyen, and Onak; [2009] this would show that if a graphing is (ε, k)-hyperfinite, then every graphing locally equivalent to it is (ε, k ′ )-hyperfinite with some larger k ′ . The following theorem was proved (in a different formulation, using involution invariant random rooted graph models) by Schramm [2008]. Theorem 21.13. Let (Gn ) be a sequence of graphs in G, converging to a graphing G. Then G is hyperfinite if and only if the family {Gn : n = 1, 2, . . . } is hyperfinite. Proof. The “if” part is not hard. Suppose that (Gn ) is hyperfinite. Let ε > 0, we want to show that G is hyperfinite. Let ε > 0, and let k ≥ 1 be chosen so that for every n that is large enough there is a set Sn ⊆ V (Gn ) with |Sn | ≤ εv(Gn ) such that every connected component of Gn − Sn has at most k nodes. Consider the pairs (Gn , Sn ) as graphs with their nodes 2-colored, and choose a subsequence that is convergent as a sequence of colored graphs. The limit can be represented
390
21. ON THE STRUCTURE OF GRAPHINGS
by a colored graphing (G′ , S). It follows from the definition of convergence that |Sn |/v(Gn ) → λ(S) (where λ is the node measure in (G′ , S)), and also that almost all connected components of G′ − S have at most k nodes. Hence the uncolored graphing G′ is hyperfinite. Since G′ and G are locally equivalent, it follows by Theorem 21.9 that G is hyperfinite. To prove the “only if” part, we invoke Theorem 19.16. By selecting an appropriate subsequence, we may assume that the sequence (Gn ) is locally-globally nd (Gn , G′ ) → 0 for evconvergent, and so it has a limit graphing G′ such that δ⊙ ′ ery k ≥ 0. Clearly G and G are locally equivalent, and hence G′ is hyperfinite by Theorem 21.9. This means that for every ε > 0 there is an m ≥ 1 such that V (G′ ) has a Borel 2-coloring with read and blue such that every connected m-node subgraph contains a red point, and λ′ {red points} ≤ ε. These properties can be read off from the 1-balls and the m-balls, respectively. It follows by the assumption that Gn → G′ in the local-global sense that for a large enough n, Gn has a 2-coloring such that the set Rn of red nodes satisfies |Rn | ≤ 2εv(Gn ), and the number of m-neighborhoods that contain a connected blue subgraph with m + 1 nodes is at most εv(Gn ). Adding the roots of these m-neighborhoods to Rn , we get a set Rn′ ⊆ V (Gn ) with |Rn′ | ≤ 3εn such that every connected component of Gn − Rn has at most m nodes. We state another result, in a sense dual to Theorem 21.13: Theorem 21.14. A graphing is hyperfinite if and only if it is the limit of a hyperfinite graph sequence. Proof. In view of Theorem 21.13, it suffices to prove that every hyperfinite graphing is the limit of a locally convergent graph sequence. (So the Aldous–Lyons conjecture holds for hyperfinite graphings.) Let G be a hyperfinite graphing, and let ε > 0. Let S be a subset of edges with η(S) = ε such that every connected component of G − S is finite. Proposition 19.1 implies that δ⊙ (G, G − S) ≤ 4ε1/ log(2D) .
(21.3)
For every graph F ∈ G, let aF be the measure of points in G − S whose ∑ a = 1, we can connected component is isomorphic to F . Since F F ∑ ∑ choose a finite set H of graphs such that a ≤ ε/D. Let n > (D/ε) F F ∈H / F ∈H v(F ), and nF = ⌊aF n/v(F )⌋ (so that the rationals nF v(F )/n approximate the real numbers aF with common denominator). For every F ∈ / H, let us delete the edges of all connected components of G − S isomorphic to F . For every F ∈ H, let us delete the edges of a set of connected components of G − S isomorphic to F so that the remaining connected components cover a set of measure nF v(F )/n; it is not hard to see that this can be done so that a Borel graph remains. The measure of the set T of deleted edges can be bounded as follows: ∑ D ∑ D( nF v(F ) ) ε D ∑ v(F ) η(T ) ≤ aF + aF − ≤ + ≤ ε. 2 2 n 2 2 n F ∈H /
F ∈H
F ∈H
Hence it follows just like above that (21.4)
δ⊙ (G − S, G − S − T ) ≤ 4ε1/ log(2D) .
Let G be a graph on n nodes consisting of nF copies of each F , together with sufficiently many isolated nodes. Then G − S − T and G have the same connected
21.1. HYPERFINITENESS
391
components, with the same frequencies; hence δ⊙ (G − S − T, G) = 0, and so by (21.3) and (21.4), δ⊙ (G, G) ≤ δ⊙ (G, G − S) + δ⊙ (G − S, G − S − T ) ≤ 8ε1/ log(2D) . So G can be approximated arbitrarily well in the δ⊙ distance by finite graphs.
Kaimanovich [1997] proved the following characterization of hyperfinite graphings, which we quote without proof. (For a proof, see also Elek [2012a].) Theorem 21.15. A graphing G is not hyperfinite if and only if it has a subgraphing F such that ηG (E(F)) > 0 and there is an ε > 0 such that ∂F (Y ) ≥ εv(Y ) for every finite connected subgraph Y of F. We conclude our discussion of hyperfiniteness with a result of Hatami, Lov´asz and Szegedy [2012] and Elek [2012a], showing that in the hyperfinite world, local and local-global are equivalent. Theorem 21.16. Any two locally equivalent hyperfinite atom-free graphings are locally-globally equivalent. As a preparation for the proof, we prove a somewhat stronger statement in a special case. Lemma 21.17. Let G and G′ be two locally equivalent graphings such that all components of them are finite and have at most k nodes (k ≥ 1). Then for every Borel m-coloring β of G′ there is a Borel m-coloring γ of G such that (G, γ) and (G, β) are locally equivalent as colored graphings. Proof. By Theorem 18.59, we may assume that there is a local isomorphism φ : G′ → G or a local isomorphism φ : G → G′ . The second alternative is trivial, so we assume the first. Applying Theorem 18.3 to the graphing obtained by filling up every connected component of G to a complete graph (which results in a Borel graph, cf. Exercise 18.7), we get that there is a Borel k-coloring α : V (G) → [k] such that any two nodes in the same component have different colors. For every isomorphism type F of k-colored graphs with at most k nodes, let UF be the union of all connected components C of G for which C ∼ = F . It is easy to see that the UF are Borel sets. Next, we pull back the coloring to G′ : let UF′ = φ−1 (UF ) and α′ = φ ◦ α. It follows from the definition of local isomorphism that every connected component of G′ [UF′ ] is isomorphic to F as a colored graph (with colors according to α′ ). Consider the given m-coloring β of G′ . On every connected component C of ′ UF , the colors according to α′ are all different, and hence β can be represented as β = α′ ◦ fC with an appropriate map fC : [k] → [m]. For every isomorphism class ′ F of k-colored graphs with at most k nodes and every map f : [k] → [m], let UF,f ∼ be the union of all connected components C for which C = F and fC = f . This partitions every set UF′ into mk sets UF,f , which are Borel (as it is easy to see). ′ The images φ(UF,f ) are not necessarily Borel sets, but we can partition every k ′ set UF into m Borel sets UF,f so that λG (UF,f ) = λG′ (UF,f ). Let us color x ∈ UF,f with color f (α(x)), to get an m-coloring γ. Then the whole component of a point ′ x ∈ UF,f is colored by γ the same way as the component of any y ∈ UF,f . This proves that γ satisfies the conclusion of the Lemma.
392
21. ON THE STRUCTURE OF GRAPHINGS
Proof of Theorem 21.16. As above, we may assume that there is a local isomorphism φ : G′ → G. We want to prove that for every m-coloring β of G′ there is an m-coloring of G for which the sampling distance of these colored graphings is arbitrarily small; and vice versa. In fact, the “vice versa” part is trivial (we can just pull back the k-coloring of G by φ). The first assertion, however, takes more work. Let ε > 0. By hyperfiniteness, there is a set S ⊆ E(G) and a k ≥ 1 such that ηG (S) ≤ ε′ , where ε′ = 14 εlog(2D+2) , and every connected component of G \ S has at most k nodes. Let S ′ = φ−1 (S), then ηG′ (S ′ ) = ηG (S) ≤ ε′ and φ is a local isomorphism from G′ \ S ′ to G \ S. Since β is a Borel m-coloring of G′ , Lemma 21.17 implies that there is a Borel m-coloring γ of G \ S such that (G \ S, γ) and (G′ \ S ′ , β) are locally equivalent as colored graphings. By Proposition 19.1, we have ε δ⊙ ((G, γ), (G \ S, γ)) ≤ 2d1 ((G, γ), (G \ S, γ))1/ log(2D+2) ≤ , 2 and similar inequality holds for G′ . Hence δ⊙ ((G, γ), (G′ , β)) ≤ δ⊙ ((G, , γ), (G \ S, γ)) + δ⊙ ((G \ S, γ), (G′ \ S ′ , β)) ε ε + δ⊙ ((G′ \ S ′ , β), (G′ , β)) ≤ + 0 + = ε. 2 2 This theorem has some interesting consequences. We have seen (Example 21.12) that a graphing may be (ε, k) hyperfinite and a locally equivalent graphing may not be (16ε/15, k) hyperfinite. However, if the graphing is hyperfinite, this cannot occur: Corollary 21.18. Let G and G′ be two hyperfinite locally equivalent graphings, and assume that G is (ε, k)-hyperfinite. Then G′ is (ε′ , k)-hyperfinite for every ε′ > ε. The second corollary shows that the two notions of convergence discussed in sections 19.1 and 19.2 are equivalent for hyperfinite graph sequences. Corollary 21.19. Every locally convergent hyperfinite graph sequence is locallyglobally convergent. Proof. Let (Gn ) be a locally convergent hyperfinite sequence, then it converges locally to a hyperfinite graphing G by Theorem 21.13. If (Gn ) does not converge locally-globally to G, then it has a locally-globally convergent subsequence whose limit graphing G′ is not locally-globally equivalent to G. Since G and G′ are locally equivalent, this contradicts Theorem 21.16. Exercise 21.20. Let G ∈ G be an (ε, k)-hyperfinite graph, and let 0 ≤ δ ≤ ε. Prove that there exists an (ε − δ, k) hyperfinite graph G′ for which δ⊙ (G, G′ ) ≤ 4δ 1/ log D . State and prove the analogous assertion for graphings. Exercise 21.21. Prove that for every planar G on n nodes and every √ graph √ integer K ≤ n, one can delete at most 60n/ K − 30 n nodes from G so that every connected component of the remaining graph has at most K nodes. (The strange formula is given as help, to facilitate induction.) Exercise 21.22. Prove that every family of graphs in G with subexponential growth is hyperfinite. Exercise 21.23. Formulate and prove a version of Corollary 21.18 for finite graphs.
21.2. HOMOGENEOUS DECOMPOSITION
393
21.2. Homogeneous decomposition Hyperfinite graphs can be decomposed into bounded size graphs by deleting a small fraction of the edges. How far can we simplify a general bounded degree graph by deleting a small fraction of the edges? It was proved recently by Angel and Szegedy [unpublished], and independently by Elek and Lippner [2011], that every graph with degrees bounded by D can be decomposed into a bounded number of “homogeneous” parts by deleting small number of edges. To be precise, let us call a subset U ⊆ V (G) an (ε, δ)-island, if eG (U, V (G)\U ) ≤ δv(G), |U | ≥ εv(G), and δ⊙ (G[U ], G) ≥ ε. We say that a graph G ∈ G (ε, δ)homogeneous, if it contains no (ε, δ)-island. Clearly, an (ε, δ)-island is also an (ε′ , δ ′ )-island if ε′ ≤ ε and δ ′ ≥ δ. Hence an (ε, δ)-homogeneous graph is also (ε′ , δ ′ )-homogeneous, if ε′ ≤ ε and δ ′ ≥ δ. Example 21.24. An m × m grid G = Pm Pm is (ε, δ)-homogeneous if δ ≤ ε2 /18 and m is large enough. Indeed, suppose that U ⊆ V (G) is an (ε, δ)-island. Consider any r ≥ 0. We claim that most nodes of G[U ] are “orderly” in the sense that they have the same r-neighborhood as a node in an infinite grid. Indeed, if v ∈ U is not orderly, then either it is closer to the boundary than r, or the r-ball around it contains one of the edges leaving U . It is easy to check that any edge leaving U can be counted at most 2r2 times, hence the number of “disorderly” nodes is at most 4rm + 2r2 δm2 < 3r2 δm2 (if m is large enough). The proportion of “disorderly” nodes in the whole grid G is even smaller, and hence r (G, G[U ]) < δ⊙
Summing, δ⊙ (G, G[U ])
0 there is a δ > 0 such that from every graph G ∈ G we can delete εv(G) edges in such a way that every component of the remaining graph is (ε, δ)-homogeneous. The dependence of δ on ε is explicit, and only moderately bad: choose r = r 1 + ⌈log(1/ε)⌉ (so that 2r ≈ ε/2), let b = |Br | ≤ DD be the number of r-balls O(log D) (Exercise 18.42), and define δ = ε5 /(4Dr b) = 2−1/ε . Proof. The proof follows the argument of Angel and Szegedy, which is reminiscent of the proof of the Regularity Lemma. By (19.3), an (ε, δ)-island U will satisfy dvar (ρG[U ],r , ρG,r ) ≥ ε/2. For a graph G ∈ G with connected components G1 , . . . , Gk , we define f (G) =
k ∑ v(Gi ) ∑ i=1
v(G)
ρGi ,r (B)2 .
B∈Br
Trivially, 0 ≤ f (G) ≤ 1. Let, say, G1 , . .∑ . , Gm be those components of G that are not (ε, δ)-homogeneous, m and suppose that i=1 v(Gi ) = p > (ε/2D)n. Let Vi′ ⊆ V (Gi ) be an (ε, δ)-island, ′′ and let Vi = V (Gi ) \ Vi′ . Let Ci be the set of edges connecting Vi and Vi′′ , then |Ci | ≤ δ|Vi′ |. Finally, let G′ be obtained from G by removing the edges in the sets
394
21. ON THE STRUCTURE OF GRAPHINGS
Ci . We want to show that if many of the parts are not ε-homogeneous, then f (G′ ) is substantially larger than f (G). To keep the notation in check, set G′i = G[Vi′ ], G′′i = G[Vi′′ ], n = v(G), ni = v(Gi ), n′i = v(G′i ) etc. Since the radius r is fixed, we don’t have to show it in notation, and write ρi = ρGi ,r , ρ′i = ρG′i ,r etc. Fix any i ∈ [k] and any B ∈ Br , and consider the difference of their contributions to f (G′ ) and f (G): n′i ′ n′′ ni ρi (B)2 + i ρ′′i (B)2 − ρi (B)2 n n n )2 n′′ ( )2 n′i ( ′ ρi (B) − ρi (B) + i ρ′′i (B) − ρi (B) = n n ) ( ′ ′ 2 ′′ ′′ + ρi (B) ni ρi (B) + ni ρi (B) − ni ρi (B) . n Here the first term will provide the gain, the second is nonnegative, while the third is an error term. To estimate the “gain” term, first we sum over all balls B: )2 ∑( )2 1 (∑ ′ 4 ε2 ρ′i (B) − ρi (B) ≥ |ρi (B) − ρi (B)| = dvar (ρ′i , ρi )2 ≥ . b b b (21.5)
B
B
Summing over i and using that ∑ i,B
n′i
n′i ( ′ ρi (B) n
≥ εni by the definition of an island, we get
− ρi (B)
)2
≥
∑ ni ε2 ε4 ε ≥ . n b Db i
To estimate the error term, we argue that the quantity |n′i ρ′i (B) + n′′i ρ′′i (B) − ni ρi (B)| is the increase or decrease in the number of neighborhoods isomorphic to B when the edges in Ci are deleted; since deletion of an edge can change at most 2Dr balls with radius r, we have ∑ n′i ρ′i (B) + n′′i ρ′′i (B) − ni ρi (B) ≤ 2Dr |Ci |, B
and so the total contribution of the error term is at most ε4 2Dr ∑ |Ci | ≤ 2Dr δ < . n i 2Db It follows that the value ∑ of f (G) increases by at least ε4 /(2Db). The number of edges deleted is at most i |Ci | ≤ δn. We can repeat this until the number of nodes in non-(ε, δ)-homogeneous components drops below εn/D. This happens after at most 2Db/ε4 repetitions (since f (G) ≤ 1), and when we get stuck, the number of deleted edges is at most (2Dbδ/ε4 )n < (ε/2)n, and the number of nodes in those components that are not (ε, δ)-homogeneous is less than (ε/D)n. Deleting all edges in these components means the deletion of no more than (ε/2)n further edges. This turns these remaining components into isolated nodes, which count as (ε, δ)-homogeneous components. This proves the theorem. 21.2.1. The quest for a regularity lemma. Is there a good analogue of the Regularity Lemma for bounded degree graphs? The Regularity Lemma, as discussed in Chapter 9, does not say anything about non-dense graphs. What do we expect from such a lemma? If we think about the many uses of the dense Regularity Lemma, there is no single answer to this question.
21.2. HOMOGENEOUS DECOMPOSITION
395
• It gives a partition of the nodes such that most bipartite graphs between different classes are homogeneous (random-like). Several extensions of the Regularity Lemma to sparse graphs in this sense are known (see e.g. Kohayakawa [1997], Gerke and Steger [2005], Scott [2011]), but they are more-or-less meaningless, or very weak, for graphs that have bounded degree. • It gives a decomposition of the graph into simpler, homogeneous subgraphs. Theorem 21.25 describes such a decomposition. However, this result is clearly not the ultimate word: the (ε, δ)-homogeneous pieces it produces can still have a very complicated structure. • It implies that an arbitrarily large (simple, dense) graph can be “scaled down” to a graph whose size depends on the error bound only, and which is almost indistinguishable from the original by sampling. Proposition 19.10 shows that such a “downscaling” is also valid for bounded degree graphs; Unfortunately, it is noneffective, and provides no algorithm for the construction of the smaller graph. • It provides an approximate code for the graph, which has bounded size (depending on the error we allow), from which basic parameters of the graph can be reconstructed, and from which graphs can be generated on an arbitrary number of nodes that are almost indistinguishable from the original graph by sampling. In this sense, a Regularity Lemma may exist, and should be very useful once we learn how to work with it. While not quite satisfactory, I feel that the results mentioned above justify cautious optimism.
CHAPTER 22
Algorithms for bounded degree graphs The algorithmic theory of large graphs with bounded degree is quite extensive. Similarly as in the case of dense graphs, we can formulate the problems of parameter estimation, property distinction, property testing, and computing a structure. However, it seems that the theory in the bounded degree case is lacking the same sort of general treatment as dense graphs had, in the form of useful general conditions for parameter estimations (like Theorem 15.1), treatment of property distinction in the limit space (Section 15.3), and the use of regularity partitions and representative sets in the design of algorithms (Section 15.4). The most important tools that are missing are analogues of the Regularity Lemma and of the cut distance. Our discussions in this chapter, accordingly, will be more an illustration of several interesting and nontrivial results than a development of a unifying theory. But even so, graph limit theory provides a useful point of view for these results. 22.1. Estimable parameters We call a graph parameter defined on bounded degree graphs estimable, if it is bounded, and for every ε > 0 there is a positive integer k and an “estimator” function g : (Bk )k → R such that for every graph G ∈ G and uniform, independently chosen random nodes v1 , . . . , vk ∈ V (G), we have ( ) (22.1) P f (G) − g(BG,k (v1 ), . . . , BG,k (vk ))| > ε ≤ ε. In other words, g estimates f from a sample chosen according to the rules of sampling from a bounded degree graph. (For convenience, we use the same ε to bound the error in the function value and the probability of a large error; also the same k for the number of samples and the radius of balls we explore around the sampling points. In specific algorithms, one may want to distinguish these values, but this would not alter the notion of estimability.) In the dense case, we did not need a separate estimator function g; we could use g = f . This is not the case here. Example 22.1. Let f (G) be the fraction of nodes of G of degree 1. If G is a 3-regular graph with large girth, then every sample BG,k (v) is a tree with more than half of its nodes of degree 1; but G itself has no nodes of degree 1. It is also easy to see that it would not be enough to use just one sample ball. Example 22.2. Let G be a 2-regular graph on n nodes, G′ , a 3-regular graph on n nodes, and GG′ , their disjoint union. Let the parameter to estimate be the average degree. In a single sample you see only nodes of degree 2 or nodes of degree 3, no matter how far you explore the graph. No matter how large neighborhoods you 397
398
22. ALGORITHMS FOR BOUNDED DEGREE GRAPHS
take, and what function of them you compute, this single bit of information (degree 2 or degree 3) will not distinguish three possibilities (G, G′ and GG′ ). On the other hand, some other facts extend from the dense case with more or less difficulty. The following theorem of Elek [2010a] connects parameter estimation with convergence (recall that the analogous result for dense graphs was trivial). Theorem 22.3. A bounded graph parameter f is estimable if and only( if for )every locally convergent graph sequence (Gn ), the sequence of numbers f (Gn ) is convergent. Proof. The “only if” part is easy: from similar graphs we get similar samples and so we compute similar estimates. Let us make this precise. Suppose that f is estimable, and let (Gn ) be a locally convergent graph sequence. Let 0 < ε < 1/8, we want to show that |f (Gn ) − f (Gm )| < ε if n, m are large enough. By the definition of estimability, we have a positive integer k and an estimator function g : (Bk )k → R such that (22.1) holds. If n, m are large enough, then δ⊙ (Gn , Gm ) ≤ 1/(4k2k ), and hence dvar (ρk,Gn , ρk,Gm ) ≤ 1/(4k). This means that we can couple a random node v ∈ V (Gn ) with a random node u ∈ V (Gm ) so that BGn ,k (v) ∼ = BGm ,k (u) with probability at least 1 − 1/(4k). If we sample k independent nodes v1 , . . . , vk from Gn and k independent nodes u1 , . . . , uk from Gm , then with probability more than 3/4, we have BGm ,k (u1 ) ∼ = BGn ,k (v1 ), . . . , BGm ,k (uk ) ∼ = BGn ,k (vk ). With positive probability, we have (simultaneously BGm ,k (u1 )) ∼ = ∼ BGn ,k (v ), . . . , B (u ) B (v ), f (G ) − g B (v ), . . . , B (v ) ≤ = Gm Gn ,k k n) Gn ,k 1 Gn ,k k 1 (,k k ε and f (Gm ) − g BGm ,k (u1 ), . . . , BGm ,k (uk ) ≤ ε. But in this case we have |f (Gn ) − f (Gm )| ≤ 2ε, which we wanted to prove.( ) The converse is a bit trickier. Suppose that f (Gn ) is convergent for every locally convergent graph sequence (Gn ). Given ε > 0, we want to find a suitable positive integer k and construct an estimator g : (Bk )k → R. The condition on f implies that for every ε > 0 there is an ε′ > 0 such that if δ⊙ (G, G′ ) ≤ ε′ then |f (G) − f (G′ )| ≤ ε. Let r be chosen so that 21−r < ε′ , and let k > 2r/(εε′ ). The estimator we construct will only depend on the r-balls around the roots of the k-balls. So we will construct a function g : (Br )k → R. For every sequence b = (B1 , . . . , Bk ) ∈ (Br )k , let ρb denote the distribution of a randomly chosen element of the sequence. We define the estimator as follows: ′ f (G) where G is any graph with dvar (ρG,r , ρb ) ≤ ε /4, g(b) = if such a graph exists, 0 otherwise. To show that this is a good estimator, let G ∈ G be any graph, and let v1 , . . . , vk )∈ ( V (G) be uniformly chosen random nodes, and let b = BG,r (v1 ), . . . , BG,r (vk ) . By the choice of k, elementary probability theory gives that with probability at least 1 − ε, we have dvar (ρb , ρG,r ) ≤ ε′ /4. If this happens, then in the definition of g(b) the first alternative applies, and so g(b) = f (G′ ) for some graph G′ that satisfies dvar (ρG′ ,r , ρb ) ≤ ε′ /4. This implies that dvar (ρG′ ,r , ρG,r ) ≤ ε′ /2. Then we have by (19.3) 1 ε′ δ⊙ (G, G′ ) ≤ r + ≤ ε′ . 2 2 By the definition of ε′ , this implies that |f (G) − f (G′ )| ≤ ε.
22.1. ESTIMABLE PARAMETERS
399
Corollary 22.4. For every estimable graph parameter f there exists a graphing parameter fb that is continuous in the δ⊙ distance such that f (Gn ) → fb(G) whenever Gn → G. Notice that continuity in the δ⊙ distance implies invariance under local equivalence. Proof. It is easy to see (using Theorem 22.3) that the parameter fb is uniquely determined for graphings that represent limits of convergent graph sequences, and it is continuous in the δ⊙ distance. However, we don’t know if all graphings are like that (cf. Conjecture 19.8). To complete the proof, we can use Tietze’s Extension Theorem to extend the definition of fb to all graphings. This possible non-uniqueness of the extension may be connected with the fact that it is typically not easy to see the “meaning” of the extension of quite natural graph parameters (cf. Supplement 20.17). Our discussion in Section 20.2 shows that parameters of the type G 7→ ent∗ (G, H) = log t(G, H)/v(G) are estimable provided the weighted graph H is sufficiently dense. In the next two sections we describe a couple of further interesting examples of estimable graph parameters. Not all natural parameters of graphs are estimable (see Examples 20.16 and 22.5). However, it was shown by Elek [2010a] that if we restrict ourselves to testing properties on hyperfinite graphs, then many of these become testable. The method is similar to property testing for hyperfinite graphs, which will be discussed later. Example 22.5 (Independence ratio). Recall that α(G) denotes the maximum size of a stable set in graph G. The independence ratio α(G)/v(G) is not estimable. Let Gn be a random D-regular graph on 2n nodes, and G′n be a random bipartite D-regular graph on 2n nodes. It is clear that α(G′n ) = n. In contract, α(Gn ) ≤ (1 − 2cD )n with high probability, where cD > 0 depends only (Bollob´as [1980]). The interlaced sequence (G1 , G′1 , G2 , G′2 , . . . ) is locally convergent (as discussed in Example 19.7), but the independence ratios oscillate between 1/2 and something less than 12 − cD , so they don’t converge. However, it we restrict ourselves to the sequence Gn , then the independence ratios form a convergent sequence; this is a recent highly nontrivial result of Bayati, Gamarnik and Tetali [2011]. 22.1.1. Number of spanning trees. Lyons [2005] proved that the number of spanning trees tree(G), suitably normalized, is an estimable parameter of bounded degree graphs. He in fact proved a more general result, allowing the degrees to be unbounded, as long as the average degree remains bounded and the degrees don’t vary too much; we treat the bounded case only, and refer for the exact statement of the more general result to the paper. Let G be a connected graph with n nodes and m edges, whose degrees are bounded by D as always in this part of the book. It is easy to see that tree(G) ≤ Dv(G) ; a bit sharper, ∏ tree(G) ≤ deg(v), v∈V (G)
whence

(1/n) log tree(G) ≤ (1/n) ∑_{v∈V(G)} log deg(v).
The right-hand side is clearly bounded and estimable, which reassures us that we have the right normalization.

Theorem 22.6. The graph parameter log tree(G)/v(G) is estimable for connected bounded degree graphs.

Proof. Let G be a connected graph with all degrees bounded by D. It will be convenient to choose D generously, so that all degrees are in fact at most D/2. We add D − deg(v) loops to each node v, to make the graph regular (here a loop adds only 1 to the degree) and also to make sure that its adjacency matrix A is positive semidefinite. This does not change the number of spanning trees, and from the samples of the original graph the samples of this augmented graph are easily generated just by adding loops.

We start with developing formulas for tree(G) and its logarithm. Here we face an embarrassment of riches: there are many formulas for tree(G) in the literature, and possibly others would also work. We use (5.41), which we write as

(22.2) (1/n) log tree(G) = ((n−1)/n) log D − (log n)/n − (log e) ∑_{r=1}^∞ (1/r) (t*(C_r, G)/D^r − 1/n).

For every fixed r, the quantity t*(C_r, G)/D^r − 1/n is estimable. Since the other terms in (22.2) are trivially estimable, we are almost done. But the problem is that we have an infinite sum, and we need a convergent majorant. (This is where it becomes important that we have subtracted 1/(nr) in every term!)

Lemma 22.7. For any r ≥ 0 and v ∈ V(G),

1/n ≤ hom_v(C_r^•, G)/D^r ≤ 1/n + 2D^{1/3}/(r + 1)^{1/3}.

(It may help with the digestion of this formula that D^r = hom_v(P_r^•, G), and so the ratio in the middle expresses the probability that a random walk started at v returns to v after r steps. Since the endpoint of a random walk becomes more and more independent of the starting point, this probability tends to 1/n by elementary properties of random walks. The main point is that the upper bound gives a uniform bound on the rate of this convergence.)

Averaging over all nodes v, the lemma implies that
(22.3) 0 ≤ t*(C_r, G)/D^r − 1/n ≤ 2D^{1/3}/(r + 1)^{1/3}.
This gives a convergent majorant, independent of G, for the infinite sum in (22.2), which proves that (1/n) log tree(G) is estimable.

Proof of Lemma 22.7. Let P = (1/D)A (this is the transition matrix of the random walk on G), and let y_r = P^r 1_v (this is the distribution of the random walk after r steps). Clearly hom_v(C_r^•, G)/D^r = 1_v^T P^r 1_v = y_r(v). Since P is positive semidefinite, we see from here that the values y_r(v) are monotone decreasing, and since P^r → (1/n)J (where J is the all-1 matrix), y_r(v) → 1/n as r → ∞. This implies the lower bound in the lemma.

To get the upper bound, we note that

(22.4) ∑_{t=0}^∞ y_t^T (I − P) y_t + ∑_{t=0}^∞ y_t^T (P − P²) y_t = 1 − 1/n.
Indeed, the matrices I − P and P − P² are positive semidefinite, hence all terms here are nonnegative. Furthermore, P y_t = y_{t+1}, so if we stop the sums at m steps, then the middle terms telescope out, and we are left with y_0^T y_0 − y_{m+1}^T y_{m+1}, where y_0^T y_0 = 1 and y_{m+1}^T y_{m+1} → 1/n.

From (22.4) it follows that there is a t ≤ r such that y_t^T (I − P) y_t ≤ 1/(r + 1). Let x = (y_t(v) + 1/n)/2, and let u be the closest node to v with y_t(u) ≤ x. Consider a shortest path v_0 v_1 ... v_k, where v_0 = v and v_k = u. Since y_t(v_0), ..., y_t(v_{k−1}) ≥ x, we must have k ≤ 1/x. On the other hand,

(y_t(v) − x)² ≤ (y_t(v_0) − y_t(v_k))²
= ((y_t(v_0) − y_t(v_1)) + ··· + (y_t(v_{k−1}) − y_t(v_k)))²
≤ k ((y_t(v_0) − y_t(v_1))² + ··· + (y_t(v_{k−1}) − y_t(v_k))²)
≤ Dk y_t^T (I − P) y_t ≤ Dk/(r + 1) ≤ D/(x(r + 1)).
Hence

(22.5) (y_t(v) − x)² x ≤ D/(r + 1).
Substituting the definition of x, we get

(y_t(v) − 1/n)³ ≤ 8(y_t(v) − x)² x ≤ 8D/(r + 1).

Since we know that y_r(v) ≤ y_t(v), this proves the lemma.

We note that Lyons gets a better estimate, with 1/2 in the exponent of r + 1 rather than 1/3, but the simpler bound above was good enough for our purposes.

Once we know that the graph parameter log tree(G)/v(G) is estimable, we also know that if (Gn) is a locally convergent graph sequence, then log tree(Gn)/v(Gn) tends to a limit. From the proof, it is not difficult to figure out what the limiting graphing parameter (or involution-invariant-distribution parameter) is. We formulate the answer for a graphing, but it is easy to translate this to the Benjamini–Schramm model. Given a graphing G and a number D such that D/2 is an upper bound on the degrees, pick a random node x, and start a random walk from x, where you have to add D − deg(y) loops to node y as you go along. Let X_r be the indicator that the random walk returns to x after r steps (not necessarily for the first time). With this notation, we have

(22.6) log tree(Gn)/v(Gn) → log D − ∑_{r=1}^∞ (1/r) E(X_r).
The expression on the right describes the limit as a function of the limiting graphing. (Note that its value may be −∞.)

Exercise 22.8. Suppose that we want to estimate the variance of the degrees, i.e., ∑_{v∈V(G)} (deg(v) − d_0)²/v(G), where d_0 is the average degree. (a) Show that this parameter is estimable. (b) Prove that we cannot estimate it using an estimator of the form g(B_{G,k}(v_1), ..., B_{G,k}(v_k)) = (h(B_{G,k}(v_1)) + ··· + h(B_{G,k}(v_k)))/k with any function h : B_k → R.
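Returning to the spanning-tree parameter: a hedged numerical sketch of the limit formula (22.6), assuming natural logarithms (so the (log e) factor of (22.2) is 1) and a degree bound D chosen, as in the proof above, at least twice the maximum degree. It estimates the return probabilities by simulating the loop-augmented random walk and truncates the series; the finite-n correction terms of (22.2) are dropped, so this is only sensible for large graphs. All names are ours.

```python
import math, random

def return_probs(G, D, R, trials=20000, rng=random.Random(1)):
    """Estimate E(X_r) for r = 1..R: the probability that the random walk on
    the loop-augmented D-regular version of G (each node v gets D - deg(v)
    loops, each adding 1 to the degree) is back at its start after r steps,
    from a uniform random starting node."""
    nodes = list(G)
    hits = [0] * (R + 1)
    for _ in range(trials):
        v0 = rng.choice(nodes)
        v = v0
        for r in range(1, R + 1):
            if rng.random() < len(G[v]) / D:   # real step w.p. deg(v)/D,
                v = rng.choice(G[v])           # otherwise a loop: stay put
            if v == v0:
                hits[r] += 1
    return [h / trials for h in hits]

def log_tree_density(G, D, R=60, **kw):
    """Truncation of (22.6): (1/n) ln tree(G) ~ ln D - sum_r E(X_r)/r."""
    p = return_probs(G, D, R, **kw)
    return math.log(D) - sum(p[r] / r for r in range(1, R + 1))
```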
22.2. Testable properties

22.2.1. Distinguishing properties. Similarly as in the dense case (Section 15.2), our next step is to discuss the problem of distinguishing two disjoint graph properties. The setup is very similar to the dense case. Let P1, P2 ⊆ G be two graph properties with P1 ∩ P2 = ∅. We call the properties P1 and P2 distinguishable by sampling if there exist positive integers r and k, and a property Q of k-tuples of r-balls, such that for every graph G ∈ G and random nodes v_1, ..., v_k,

P((B_{G,r}(v_1), ..., B_{G,r}(v_k)) ∈ Q) ≥ 2/3 if G ∈ P1, and ≤ 1/3 if G ∈ P2.

The following analogue of Theorem 15.8 is due to Benjamini, Schramm and Shapira [2010].

Theorem 22.9. Two graph properties P1, P2 ⊆ G are distinguishable by sampling if and only if δ⊙(P1, P2) > 0.

Proof. The necessity of the condition is easy to prove, along the same lines as it was done for dense graphs in the proof of Theorem 15.8; we don't go into the details. The sufficiency is proved differently, and unfortunately the proof is not constructive. Suppose that δ⊙(P1, P2) = ε > 0. Similarly as in the proof of Proposition 19.10, we can select a finite set Q1 ⊆ P1 of graphs such that for every graph G ∈ P1 there is a graph H ∈ Q1 with δ⊙(G, H) ≤ ε/4. Now if we want to decide whether G ∈ P1 or G ∈ P2, then we compute δ⊙(G, H) with error less than ε/4 for all H ∈ Q1. (This can be done with error probability less than 1/3 by taking a large enough sample of large enough balls.) If there is an H for which we find that δ⊙(G, H) ≤ ε/2, we conclude that G ∈ P1. If no such H exists, we conclude that G ∈ P2. It is straightforward to check that if G ∈ Pi, then the answer will be correct with probability larger than 2/3.

22.2.2. Testable properties and their closures. Now we come to testing a single property P. Just as in the dense case, we either want to conclude that a given graph G does not have the property, or that one can change a small number of adjacencies so that the property is restored. To be more precise, let P^ε = {G ∈ G : d_1(G, P) > ε}. We say that a property P of bounded degree graphs is testable if for every ε > 0 there are integers r = r(ε) ≥ 1 and k = k(ε) such that sampling k neighborhoods of radius r from a graph G ∈ G, we can compute "YES" or "NO" so that: (a) if G ∈ P, then the answer is "YES" with probability at least 2/3; (b) if G ∈ P^ε, then the answer is "NO" with probability at least 2/3.

Example 22.10 (Forests). Let us look at a simple example that illustrates some of the difficulties in designing algorithms for property testing for graphs with bounded degree, even for monotone properties. Suppose that we want to test whether a graph G is a forest. Our first thought might be to test whether a random ball contains a cycle. Certainly, if it does, then the graph is not a forest. But drawing a conclusion in the other direction is not justified: if the graph G has large girth, then every ball will be a tree, while G may be very far from being a forest. This shows that (unlike in the dense case) P is not a good test property for itself. If in addition to this
we estimate the average degree and eliminate small components, we can design a test for being a forest (Goldreich and Ron [2008]). To fill in the details makes an interesting exercise.

We can use limit objects to give the following condition for testability (the proof is immediate).

Proposition 22.11. A graph property P is not testable if and only if there exist an ε > 0 and two convergent sequences of graphs (Gn) and (Hn) with Gn ∈ P and d_1(Hn, P) > ε that have a common local limit.

22.2.3. Hyperfinite properties. Hyperfiniteness is particularly important in property testing. Just as in the dense case, property testing is about the interplay between the sampling distance and the edit distance, and these two distances are intimately related for hyperfinite graphs.

Benjamini, Schramm and Shapira show that hyperfiniteness is in a sense testable. This does not make sense as stated, since hyperfiniteness is a property of a family of graphs, not of a single graph. But if we quantify hyperfiniteness, then we can turn it into a meaningful statement.

Proposition 22.12. For every ε there is an ε′ such that for any positive integer k, the properties P1 = {(ε′, k)-hyperfinite} and P2 = {not (ε, k)-hyperfinite} are distinguishable.

Proof. Suppose that this is false; then by Theorem 22.9 there exist an ε > 0, a sequence εn → 0, graphs Gn, G′n ∈ G and positive integers kn such that Gn is (εn, kn)-hyperfinite, G′n is not (ε, kn)-hyperfinite, and δ⊙(Gn, G′n) → 0. Then the sequence (Gn) is hyperfinite. Let us select a convergent subsequence; the limit graphing of this subsequence is hyperfinite by Theorem 21.13. But the sequence (G′n) has the same limit, so it must be hyperfinite, by the same theorem. This implies that there is a positive integer k such that all members of the sequence are (ε, k)-hyperfinite. Since G′n is not (ε, kn)-hyperfinite, it follows that kn < k for all n. This implies that almost all connected components of G have at most k elements, but then it follows from G′n → G that all but an o(1) fraction of the connected components of G′n have at most k elements, which implies that (G′n) is an (ε, k)-hyperfinite sequence, a contradiction.

Benjamini, Schramm and Shapira [2010] proved an important analogue of Theorem 15.24: every minor-closed property of bounded degree graphs is testable. As noted by Elek, the theorem can be extended to any monotone hyperfinite graph property.

Theorem 22.13. Every monotone hyperfinite property of graphs with bounded degree is testable.

The property of being a forest is certainly minor-closed, so the example discussed at the end of the introduction above is a special case. As another special case, planarity of bounded degree graphs is testable.

Proof. Let P be a monotone hyperfinite graph property, and suppose that it is not testable. Then there exist an ε > 0 and two sequences of graphs (Gn) and (Fn) such that Gn ∈ P, d_1(Fn, P) > ε and δ⊙(Gn, Fn) → 0. We may assume that both sequences are locally convergent, and so they have a common weak limit graphing
G. Since P is hyperfinite, so is the sequence (Gn). Theorem 21.13 implies that G is hyperfinite, and applying this theorem again, we get that (Fn) is hyperfinite. By Theorem 21.19, the interlaced sequence G_1, F_1, G_2, F_2, ... is locally-globally convergent, and hence we may assume that Gn, Fn → G in the local-global sense.

Since G is hyperfinite, it has a Borel 2-coloring α : V(G) → [2] such that λ(α^{−1}(1)) ≤ ε′ and there is a k ∈ N such that every connected subgraph with k + 1 nodes contains a point v with α(v) = 1. By local-global convergence, if n is large enough, then Gn has a 2-coloring αn : V(Gn) → [2] such that δ⊙^k((G, α), (Gn, αn)) ≤ ε′/k. In particular, |αn^{−1}(1)| ≤ 2ε′ v(Gn), and the union of connected subgraphs with k + 1 nodes that contain no node with color 1 has at most ε′ v(Gn) nodes. The graph Fn has a 2-coloring βn with similar properties. It follows that δ⊙^k((Gn, αn), (Fn, βn)) ≤ 2ε′.

Let Sn = αn^{−1}(1) and Tn = V(Gn) \ Sn, and let G′n be obtained from Gn by deleting all edges incident with any node of Sn, as well as all edges in connected components of Gn[Tn] that have more than k nodes. This way we delete at most 3ε′ D v(Gn) edges. Furthermore, every connected component of G′n has at most k nodes. We define F′n analogously. It is important that whether or not an edge is deleted is determined locally, which implies that whenever v ∈ V(Gn) and u ∈ V(Fn) satisfy B_{Gn,αn,k}(v) ≅ B_{Fn,βn,k}(u), then also B_{G′n,αn,k}(v) ≅ B_{F′n,βn,k}(u), which means simply that the connected component of G′n containing v is isomorphic to the connected component of F′n containing u. Hence δ⊙^k(G′n, F′n) ≤ δ⊙^k((Gn, αn), (Fn, βn)) ≤ 2ε′. Let Y_k denote the set of connected graphs with at most k nodes (up to isomorphism), let a_Y denote the number of connected components of G′n isomorphic to Y ∈ Y_k, and let b_Y be defined analogously for F′n. In these terms, we have

∑_{Y∈Y_k} | a_Y v(Y)/v(Gn) − b_Y v(Y)/v(Fn) | ≤ 4ε′.
We may assume that v(Gn) ≥ v(Fn). Let c_Y = min(b_Y, ⌊a_Y v(Fn)/v(Gn)⌋). Let us keep c_Y copies of every Y ∈ Y_k in F′n and delete the edges of the rest, to get a graph F″n. The number of edges to delete is bounded by 2ε′ D v(Fn) + Dk|Y_k| < (ε/2) v(Fn) if n is large enough, and so d_1(Fn, F″n) ≤ 3ε′ D + ε/2 < ε. Furthermore, F″n is isomorphic to a subgraph of Gn, and hence F″n ∈ P by monotonicity. This implies that d_1(Fn, P) < ε, a contradiction.

Corollary 22.14. Every minor-closed property of graphs with bounded degree is testable.

Monotonicity of the property P was used in the proof above only in a somewhat annoying technical way, and one would like to extend the argument to all hyperfinite properties P. One must be careful though: the property that "G is a planar graph with an even number of nodes" is hyperfinite, but not testable (a large grid with an even number of nodes cannot be distinguished from a large grid with an odd number of nodes by neighborhood sampling). But the method works with a little twist: suppose that two graphs F and G have the same number of nodes, we know that G ∈ P, and the sampling distance of G and F is small; then it follows that the edit distance of F from P is small. This implies that P is testable in a non-uniform sense. For an exact formulation and details, see Newman and Sohler [2011].
Exercise 22.15. Let G and G′ be (ε, k)-hyperfinite graphs with the same number of nodes n. Prove that they can be overlayed so that

(1/n) |E(G) △ E(G′)| ≤ 2(1 + Dk)ε + D δ⊙^k(G, G′).
22.3. Computable structures

Suppose that we want to compute some structure on a very large graph with bounded degree: say, a maximum matching, a maximum flow, a spanning tree, a maximum cut, a 3-coloring. Even if we can compute (approximately) the appropriate number, what does it mean to compute this object? Similarly as in the dense case, the answer is not obvious.

One possible answer is similar to what we did in the dense case. We offer a service: if somebody comes with a question about a particular node v ("How is this node matched in your maximum matching?"), we can answer this question just by inspecting a bounded neighborhood of v (determine the mate u of v in the matching, or conclude that v is unmatched). These answers must be consistent, so for example in the case of matchings, if somebody comes with the request concerning the node u, we must match it with v. Furthermore, the proportion of unmatched nodes should be at most ε higher than for the true maximum matching.

In the bounded-degree world, however, there is another, equivalent model, which is perhaps easier to understand and analyze. Let us place an "agent" on every node of the graph G. These agents are allowed to communicate with each other, but only with their neighbors along the edges, and only for a bounded time. At the end, they have to decide whether they would be matched with any neighbor at all, and if so, with which of their neighbors. This model is called distributed computing.

The two models are essentially equivalent.
• Suppose that the agents can compute something; then in the "service" model, if somebody comes with a question about a node v, we just look up what our agent responsible for v has computed. All the information the agent has collected from the neighbors can be gathered by exploring a bounded neighborhood of the node.
• Suppose that we can provide the service correctly. Then we can instruct each agent to do the computation we would have done if the node they are responsible for were queried. Of course, the agent has to gather all the information about the neighborhood we would have explored, but this can be done by communicating with his/her neighbors. Of course, we also have to instruct them to provide and communicate the information that is needed for this. All of this takes a bounded number of bits.

We assume here that the agents can generate a name for themselves that identifies them, at least locally. We don't go into other details of this model, like whether the communication between agents is synchronized (sending a bit on every tick of the clock). We will in fact use a kind of hybrid description of the algorithms, where the agents are allowed to explore their neighborhood to a bounded depth (including the colors and weights of the other agents in this neighborhood, which we have to use in some cases). Using results of Nguyen and Onak [2008] and Csóka [2012a], we illustrate the power and some of the subtleties of these models.

Symmetry. There is a fundamental difficulty with distributed algorithms: symmetry. Suppose that we want to construct a matching in a very long cycle. All our agents see the same neighborhood of any given radius, so they will all compute the
same answer: that they want to remain unmatched (which gives an empty matching, very far from being optimal). Symmetry does not allow them to give any other answer, at least deterministically. We can break the symmetry and find a matching close to the optimum if we allow the agents to flip coins. We can consider the coinflips generated by any agent as a real number between 0 and 1, and call this the local random seed of the agent. This takes an infinite number of coinflips, but only a finite and bounded number of them has to be generated during the run of the algorithm.

Preprocessing. In our examples for computing a structure in the dense case, preprocessing (computing a representative set) played a large role. In the bounded degree case, there is less room for preprocessing. We can think of two kinds of preprocessing:
• We can do some preliminary computation (perhaps randomized) independently of the graph, and inform the agents about the result. If we are lazy, we can just let the agents do this computation for themselves. The only information they need for this is the random seed we use during the computation. So it suffices to generate a random number in [0, 1] and tell it to all the agents. We call this number the global random seed. (Note: they could generate a random number themselves, but this would not be the same for all agents!)
• We can do preliminary computation using information about the graph. This could be based on the distribution of r-balls in G for some fixed r (which is the realistic possibility for us to obtain information about the graph), but perhaps we have some other information about the graph (like somebody tells us that it is connected). Again, we can let the agents work; we just have to pass on to them the information about the graph they need. In the strongest form, we let the agents know what the graph is (up to isomorphism).

The task. Assume that our agents have to compute a decoration f : V(G) → C, where C is a finite set. Not all decorations will be feasible, but we assume that the feasibility criterion is local, i.e., there is an r ∈ N and a set F of feasible C-decorated r-neighborhoods such that a decoration is feasible if and only if every r-neighborhood is feasible. The goal is to find an "optimal" decoration. The decoration is evaluated locally in the following sense: we associate a value ω(B) ∈ [0, 1] with every C-decorated r-ball B ∈ F, and we want to minimize the average value of the r-balls. Setting ω(v) = ω(B_{G,r}(v), f|_{B_{G,r}(v)}), the cost of the decoration is defined by

w(f) = (1/v(G)) ∑_{v∈V(G)} ω(v).
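For concreteness, a small sketch of how this cost would be evaluated, assuming the decoration f and the local value ω are supplied as Python callables; everything here is illustrative, not the book's notation, and it reuses the r_ball BFS helper from the estimation sketch earlier in this chapter.

```python
def cost(G, f, omega, r):
    """w(f): the average over nodes v of omega evaluated on the C-decorated
    r-ball around v. f maps nodes to C; omega maps (root, ball nodes,
    decoration restricted to the ball) to [0, 1]; r_ball is the BFS helper
    defined in the earlier sampling sketch."""
    total = 0.0
    for v in G:
        ball_nodes = r_ball(G, v, r)
        restriction = {u: f(u) for u in ball_nodes}
        total += omega(v, ball_nodes, restriction)
    return total / len(G)
```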
The agents want to compute a decoration f for which w(f ) is as small as possible. Example 22.16 (Proper coloring). Suppose that we want to compute a proper k-coloring of G. Then we choose C = [K] for some very large K, the feasibility criterion is that the coloring should be proper (clearly this can be verified from the 1-neighborhoods), and we evaluate the coloring by imposing a penalty of 1 on every node with color larger than k. Example 22.17 (Maximum matching). Suppose that we want to compute a maximum matching in G. Then we can take C = [D + 1]. Decoration with i,
i ≤ D, means that the node is matched with its i-th largest neighbor (in the order of their local seeds); decoration with D + 1 means that the node is unmatched. The feasibility criterion is clearly local. We impose a penalty of 1 for decoration by D + 1.

Example 22.18 (Max-flow-min-cut). Suppose that we are given a 3-coloring of the nodes of a graph G by red, white and green. We consider all edges to have capacity 1, and would like to find a maximum flow from the red nodes to the green nodes. This means a decoration of every node v by a rational vector f(v) = (f_1, ..., f_D), where f_i is the flow it is sending to its i-th highest weighted neighbor (this can be negative or positive). The sum of entries of f(v) is the gain γ(v) of the node. Feasibility means that the flow on any edge uv, indicated in the decoration of u, is the negative of the flow on this edge indicated in the decoration of v; furthermore, the gain is 0 at every white node, nonnegative at every red node, and nonpositive at every green node. The objective function is the sum of γ(v) over the red nodes.

Computing the minimum cut fits in the framework quite easily too: we decorate every node by either "LEFT" or "RIGHT". Feasibility means that all red nodes are decorated by "LEFT" and all green nodes are decorated by "RIGHT". The objective value is half of the average number of neighbors of a node on the other side.

The computational model. We compare four settings, getting increasingly more powerful:
(A) The agents don't get any preprocessing information, and have to work deterministically;
(B) The agents don't get any preprocessing information, but have access to their own random number generator;
(C) The agents have access to their own random number generator, and in addition they get the same global random seed g0, chosen uniformly from [0, 1];
(D) The agents have access to their own random number generator and to the public random number g0 as in (C), and in addition they know the graph up to isomorphism (but they don't know at which node of the graph they sit).

Note that in models (A)-(C), the agents can see their own r-neighborhood, and can hear about other r-neighborhoods at a bounded distance from them, but they will not be able to learn global statistics. The agents themselves will not know, for example, the average degree of the graph. In model (D), one may be concerned how an arbitrarily large graph can be communicated to our agents; instead, we could say that this model allows any kind of information (any number of graph parameters) to be passed to the agents: the most natural would be neighborhood statistics, but the model allows non-testable graph parameters and properties like the chromatic number or connectivity to be used.

We will see that there are nontrivial algorithms in the weakest model (A); that (B) is strictly stronger than (A), and (C) is strictly stronger than (B); but every problem solvable in (D) is also solvable in (C) with an arbitrarily small increase in the cost.

22.3.1. Matchings. We start with describing an algorithm in model (B), designed by Nguyen and Onak [2008], to find an (almost) maximum matching.
Algorithm 22.19. Input: A graph G with maximum degree D and no isolated nodes, in the agent model, and an error bound ε. Output: A random matching M such that with probability at least 1 − ε, |M| ≥ (1 − ε)ν(G).

As in most matching algorithms, we start with the empty matching and augment it using augmenting paths: these are paths that start and end at unmatched nodes, and every second edge of them belongs to the current matching M. Augmenting along such a path (interchanging the matching edges and the non-matching edges) increases the size of the matching by 1. Of course, in our setting we will have to augment simultaneously along many disjoint augmenting paths, to make measurable progress. We will augment along augmenting paths of length at most k = ⌈3/ε⌉, which we call short augmenting paths.

This is again done in rounds. It will be convenient to assume that in each round, a new local seed is generated for every agent v. (They could get this from a single random real number in [0, 1], by using all the bits in even position in the first round, half of the remaining bits in the second, etc.) This way the rounds will be independent of each other in the probabilistic sense.

Augmentation along many disjoint augmenting paths will be carried out simultaneously by our agents. It is clear that agents looking at their neighborhoods with radius k will discover all short augmenting paths. The problem is that there will be conflicts: these short augmenting paths are not disjoint. To this end, we define when a path is better than another, and we will augment only along those paths that are better than any path intersecting them. To be precise, we define that path P is better than path Q if, walking along both paths starting from their endnodes with higher local seed, the first node that is different has higher local seed in P than in Q. We will augment along paths that are better than any path intersecting them; we call such a path locally best. If we allow agents to explore their neighborhoods with radius 2k, then every locally best short augmenting path will be discovered by at least one agent, who will carry out the augmentation (i.e., send a message to the agents along the path how their mates are to be changed). Several agents may do so for a given path, but there will be no conflict between their messages. The above is repeated q = 4D^{2k} ⌈log(1/ε)⌉ times; then we stop and output the current matching.

The idea in the analysis is that in a particular phase we either find many good short paths, and hence make substantial progress, or the number of all short augmenting paths is small, in which case we have an almost maximum matching. Let us call a node eligible (at a certain phase) if at least one short augmenting path starts at it (such a node is of course unmatched). Let Mi be the matching after the i-th round, and let Xi denote the number of eligible nodes. Let M′ be a maximum matching, and consider the set Mi ∪ M′. This set of edges consists of the common edges of Mi and M′, and of cycles and paths whose edges alternate between Mi and M′. Every cycle contains the same number of edges from Mi and M′. Paths that contain more edges from M′ than from Mi have to end with edges in M′ at both ends, and so they are augmenting paths. Thus the number of augmenting paths is at least |M′| − |Mi| = ν(G) − |Mi|. The number of augmenting paths among these that have length more than k is less than 2|M′|/k, so there are at least
(1 − 2/k)ν(G) − |Mi| short augmenting paths, and Xi ≥ (2 − 4/k)ν(G) − 2|Mi| eligible nodes.

Let u be an eligible node after phase i. All the short augmenting paths intersecting any of the short augmenting paths starting at u stay within B_{G,2k}(u). Since |B_{G,2k}(u)| ≤ D^{2k}, there is a chance of at least p = 1/D^{2k} that u is the node with highest local seed among them. Then the best path starting at u will be augmented upon, and hence u has a chance of at least p to become matched in that round. This means that

E(|M_{i+1}| | Mi) ≥ |Mi| + (1/2) p Xi ≥ |Mi| + p ((k−2)/k · ν(G) − |Mi|)

(here the expectation is taken over the random choices in the (i+1)-st round), which we can write as

E((k−2)/k · ν(G) − |M_{i+1}| | Mi) ≤ (1 − p) ((k−2)/k · ν(G) − |Mi|).

Taking expectation over Mi, we get

E((k−2)/k · ν(G) − |M_{i+1}|) ≤ (1 − p) E((k−2)/k · ν(G) − |Mi|).

Hence

E((k−2)/k · ν(G) − |M_q|) ≤ (1 − p)^q E((k−2)/k · ν(G) − |M_0|) = (1 − p)^q · (k−2)/k · ν(G).

By Markov's Inequality, this implies that

P((k−2)/k · ν(G) − |M_q| > (k−2)/k² · ν(G)) ≤ k(1 − p)^q ≤ k e^{−pq} ≤ ε.

So with probability at least 1 − ε, we have

|M_q| ≥ (k−2)/k · ν(G) − (k−2)/k² · ν(G) = (k−1)(k−2)/k² · ν(G) ≥ (1 − ε)ν(G).

This proves that the algorithm works as claimed. We have seen that without local seeds, there is no way to approximately compute a maximum matching. So the matching problem can be solved in model (B) but not in (A).
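The following Python sketch implements only the length-1 case of the locally best rule, so it converges to a maximal matching rather than an almost-maximum one (cf. Exercise 22.21(a)); extending it to short augmenting paths of length up to k follows the same pattern. Node labels are assumed comparable (e.g., integers), and all names are ours.

```python
import random

def local_greedy_matching(G, rounds=50, rng=random.Random(0)):
    """Each round: fresh local seeds; an edge between two unmatched nodes is
    taken iff its seed pair (max, min) lexicographically beats every free
    edge sharing an endpoint with it. This is the 'locally best' rule
    restricted to augmenting paths of length 1."""
    mate = {v: None for v in G}
    for _ in range(rounds):
        seed = {v: rng.random() for v in G}
        key = lambda a, b: tuple(sorted((seed[a], seed[b]), reverse=True))
        chosen = []
        for u in G:
            for v in G[u]:
                if u < v and mate[u] is None and mate[v] is None:
                    k = key(u, v)
                    # locally best: strictly better than all conflicting
                    # edges between two still-unmatched nodes
                    if all(key(a, b) < k
                           for a in (u, v) for b in G[a]
                           if {a, b} != {u, v} and mate[b] is None):
                        chosen.append((u, v))
        if not chosen:          # no free edge remains: matching is maximal
            break
        for u, v in chosen:     # locally best edges are pairwise disjoint
            mate[u], mate[v] = v, u
    return mate
```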
22.3.2. Maximum flow: an algorithm in the weakest model. An algorithm based on similar ideas was developed by Csóka [2012a] to find an almost maximum flow (Example 22.18). This algorithm too looks for short augmenting paths. We do not describe the details here, but point out one interesting feature. Using the random local seeds, the agents compute a flow that is almost optimal, in a way similar to the maximum matching algorithm described above. A different choice of seeds would give a different flow. But the expected flow values (expectation taken over all random seeds) also give a valid, almost maximum flow, which is independent of any random seeds. This expectation can be computed locally, from the neighborhoods with radius 2r. Thus to compute the maximum flow, we don't need the random seeds after all: deterministic agents can compute it. (This does not contradict our arguments about the curse of symmetry, because matchings are not invariant under all automorphisms of the given graph, but there is a "canonical" maximum flow that is invariant under automorphisms preserving the sets of sources and sinks: the average of all maximum flows.)
Csóka also describes an algorithm in model (C) to find an almost minimum cut, and proves that for this, a public random number is needed; in other words, the problem cannot be solved in model (B).

22.3.3. Knowing the graph does not help. Our next goal is to show that model (D), which seems to be a lot stronger than (C), is in fact equivalent to it (up to an arbitrarily small increase of the cost; Csóka [2012a]).

Theorem 22.20. Suppose that there is an algorithm in model (D) by which the agents compute, for every graph G, a feasible decoration f with cost c(G). Then for every ε > 0 there is an algorithm in model (C) by which the agents compute, for every graph G, a feasible decoration f with cost at most c(G) + ε.

Proof. Recall that A_r denotes the set of all probability distributions ρ_{G,r}, where G ranges over finite graphs. By Proposition 19.9, its closure Ā_r is convex. Let us fix a large graph G. We may assume that c(G) is the optimal cost of a decoration the agents can compute in model (D). If the agents mistakenly believe that they are working on the graph F, then they will compute (in model (D)) a decoration f_F of G (which is random, as a function of the public random seed and the private random seeds). The expectation of w(f_F) will depend on the true distribution ρ_{G,r}. Setting ω_F(v) = ω(B_{G,r}(v), f_F|_{B_{G,r}(v)}), we get by the linearity of expectation

E(w(f_F)) = (1/v(G)) ∑_{v∈V(G)} E(ω_F(v)).
The last expectation depends only on F and on B_{G,2r}(v) (since the distribution of f_F(u) depends only on the r-neighborhood of u, and to compute the distribution of ω_F(v) it suffices to know the joint distribution of the decorations f_F(u) in the r-neighborhood of v). For B ∈ B_{2r}, let a(F, B) = E(ω_F(root(B))); then

E(w(f_F)) = ∑_{B∈B_{2r}} ρ_{G,2r}(B) a(F, B) = L_F(ρ_{G,2r}),

where L_F : Ā_{2r} → R is a homogeneous linear function. Clearly L_F(ρ_{G,2r}) ≥ c(G) for any F, and our assumption about the quality of the output of (D) implies that L_G(ρ_{G,2r}) = c(G).

Next, we define a function u : Ā_{2r} → R by u(ρ) = lim inf c(Gn), where the limes inferior is to be taken over all sequences (Gn) for which ρ_{Gn,2r} → ρ. A similar argument as in the proof of Proposition 19.9 shows that the function u is convex. For any graph G, considering the constant sequence (Gn = G : n = 1, 2, ...), we get

(22.7)
u(ρG,2r ) ≤ c(G).
We claim that for every ρ ∈ A2r there is a graph F such that (22.8)
LF (ρ) < u(ρ) + ε.
Indeed, let (Gn ) be a sequence of graphs such that ρn = ρGn ,2r → ρ and c(Gn ) → u(ρ). Then LGn (ρ) = LGn (ρn ) + o(1) = c(Gn ) + o(1) = u(ρ) + o(1). Thus F = Gn can be chosen in (22.8) for a sufficiently large n.
Next, we engage in a little convex geometry. Consider the set K ⊆ R^{B_{2r}} × R defined by K = {(ρ, y) : y ≥ u(ρ)}. It is clear that this set is convex. For every graph F, we consider the halfspace H_F = {(ρ, y) : y ≤ L_F(ρ) − ε}. We claim that

K ∩ ⋂_F H_F = ∅.
Indeed, suppose that (ρ, y) is a point contained in the left side; then (ρ, y) ∈ K, and (22.8) implies that there is an F ∈ G such that y ≥ u(ρ) > L_F(ρ) − ε. On the other hand, (ρ, y) ∈ H_F implies that y ≤ L_F(ρ) − ε, a contradiction. Hence by Helly's Theorem, there is a finite set of graphs F_1, ..., F_m, where m ≤ |B_{2r}| + 1, such that K ∩ H = ∅, where H = ⋂_{i≤m} H_{F_i}. Since K and H are convex, there is a halfspace defined by a linear inequality y − L(ρ) ≤ b containing H but disjoint from K. This means two things:

(a) The inequality y ≥ u(ρ) (ρ ∈ Ā_{2r}) implies that y − L(ρ) > b. This last condition means that u(ρ) > L(ρ) + b for every ρ ∈ Ā_{2r}.

(b) The linear inequalities y − L_{F_i}(ρ) ≤ −ε imply the inequality y − L(ρ) ≤ b. By the Farkas Lemma, there are nonnegative numbers α_i such that ∑_i α_i = 1, ∑_i α_i L_{F_i} = L, and b ≥ −ε. The numbers α_i form a probability distribution α on [m].

Now we can give the following instruction to our agents: Use the even bits of the public random number g0 to pick an i ∈ [m] from the distribution α. (All agents will pick the same i.) Then pretend that you are working on the graph F_i, and compute the decoration f_{F_i} according to algorithm (C) (using the remaining bits of g0 as the public random number). Then (using (22.7) in the last step) the agents achieve a cost that is almost as good as the cost they could achieve knowing the graph:

E(w(f_{F_i})) = ∑_j α_j E(w(f_{F_j})) = ∑_j α_j L_{F_j}(ρ_{G,2r}) = L(ρ_{G,2r}) ≤ u(ρ_{G,2r}) − b ≤ u(ρ_{G,2r}) + ε ≤ c(G) + ε.
22.3.4. Computable structures and Borel sets. We conclude this chapter with sketching a connection between algorithmic problems and measure theory. Elek and Lippner [2010] give another algorithm for computing an almost maximum matching in a large bounded degree graph. Their approach is based on the connections with Borel graphs, which were discussed in Section 18.1. Instead of describing a second matching algorithm, we only illustrate the idea on a simpler example, by showing how the proof that every Borel graph with degrees at most D has a Borel coloring with D + 1 colors (Theorem 18.3) can be turned into an algorithm.

In the proof of Theorem 18.3, we start with constructing a countable Borel coloring. This part of the argument can be translated easily. We have to select an explicit countable basis for the Borel sets in [0, 1); for example, we can choose intervals of the form [a/b, (a + 1)/b), where 0 ≤ a < b are integers. We have to assign a positive integer index to each of these intervals, say (2a + 1)·2^b. Then every agent picks the interval with smallest index that contains his local seed but not the local seed of any of his neighbors. Now the agents have indices (they can forget the seeds from now on). Trivially, adjacent agents have different indices.

Next, every agent whose index is smaller than the indices of his neighbors changes his index to 1, and labels himself FINISHED. (In the proof of Theorem
18.3, only those with index 2 not adjacent to any node with index 1 did so in the first round; but it is easy to see that eventually all nodes with a locally minimal index will change to 1). Next, all those agents whose index is smaller than the indices of all their unfinished neighbors change their indices to the smallest possible, etc. At this point comes an important difference: for Borel coloring, we could repeat this infinitely many times, but here we have a time bound. Those nodes that managed to change their indices are now properly colored with D+1 colors; however, there will be some who are stuck with their large original indices. Down-to-earth work starts here to show that their number is a small fraction of v(G). We don’t go into the details (see Exercises 22.22, 22.23). Exercise 22.21. Describe how Algorithm 22.19 can be simplified in two simpler versions of the problem: (a) we only want to find a maximal (non-extendable) matching (of course, with an error); (b) somebody marks a matching for us, and we have to test whether it is maximum (again, with some error). Exercise 22.22. Prove that any constant time distributed algorithm that constructs a legitimate coloring of a cycle will use, with high probability, more than 100 colors. Exercise 22.23. Prove that for every ε > 0 there is a k ≥ 1 such that the algorithm (with k rounds) as described above will produce a coloring in which fewer than εv(G) nodes have color larger than D.
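To make the agent procedure of this subsection concrete, here is a Python sketch with two simplifications that we flag: each agent separates itself from its neighbors with the smallest mesh b rather than literally the smallest interval index, and stuck agents have their final color shifted above D + 1, so that the output is always a proper coloring. All names are ours.

```python
import random

def distributed_coloring(G, D, rounds, rng=random.Random(0)):
    """Agent sketch of the scheme above: local seeds yield locally distinct
    interval indices; in each round, every node whose index is minimal among
    its unfinished neighbors recolors to the smallest color not used by its
    finished neighbors. Exercise 22.23 bounds how many nodes stay stuck."""
    seed = {v: rng.random() for v in G}
    index = {}
    for v in G:                   # smallest mesh b separating v's seed
        b = 1
        while any(int(seed[u] * b) == int(seed[v] * b) for u in G[v]):
            b += 1
        index[v] = (2 * int(seed[v] * b) + 1) * 2 ** b
    color, finished = {}, set()
    for _ in range(rounds):
        movers = [v for v in G if v not in finished
                  and all(u in finished or index[v] < index[u] for u in G[v])]
        for v in movers:          # adjacent movers are impossible
            used = {color[u] for u in G[v] if u in finished}
            color[v] = min(c for c in range(1, D + 2) if c not in used)
            finished.add(v)
    for v in G:                   # stuck agents keep their (large) index,
        if v not in finished:     # shifted to stay disjoint from 1..D+1
            color[v] = D + 1 + index[v]
    return color
```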
Part 5
Extensions: a brief survey
CHAPTER 23
Other combinatorial structures The ideas of characterizing homomorphism functions, connection ranks, regularity lemmas and limit objects have been extended to several combinatorial structures besides graphs. Some of these extensions are rather involved and deep, like the limit theory of hypergraphs; others can be described as “analogous” (at least after finding the right definitions). Without attempting to be complete, we survey several of these extensions.
23.1. Sparse (but not very sparse) graphs

The obvious big gap in our treatment of limits of growing graph sequences is any sequence of graphs with density tending to 0, but maximum degree tending to infinity. Some interesting examples are the point-line incidence graphs of finite projective planes (about n^{3/2} edges, if n is the number of nodes), and d-cubes (n log n edges).

Some work has been done. We have mentioned extensions of the Regularity Lemma to sparser graphs by Kohayakawa [1997], Gerke and Steger [2005], and Scott [2011]. While the case of bounded degree graphs is open, these results are highly nontrivial for sparse (but not very sparse) graphs, and have important applications. They are very likely to play an important role in the limit theory of such graphs. In a substantial paper, Bollobás and Riordan [2009] investigate many of the techniques discussed in this book and elsewhere, mostly from the point of view of extending them from the case of dense graphs to sparser classes.

Lyons [2005] extended the convergence theory of bounded degree graphs to graph sequences with bounded average degree, under a condition called tightness (this guarantees that the sequence of sample distributions has a limit distribution). The following example shows that some condition like this is necessary: the average degree of a subdivision G′ of any graph G (dense or not) is bounded by 4. Clearly, to properly describe the limit of the graph sequence (G′n), the description must contain essentially the same information as the limit of the sequence (Gn). So limits of graphs with bounded average degree are as complex as limits of any graph sequence (dense or sparse).

Graphons and graphings generalize dense graphs and bounded degree graphs, respectively, and they can be considered as the two extremes as far as edge density goes. One common feature is that we can do a random walk on each of them. More precisely, there is a Markov chain on a graphon, as well as on a graphing, and we are going to show that this Markov chain contains all the necessary information about these objects.
Let W : Ω² → [0, 1] be a graphon with density ω = t(K_2, W). We can define a Markov chain on Ω by

P_u(A) = (1/d_W(u)) ∫_A W(u, v) dv
(this is defined for almost all u). This Markov chain has a stationary distribution, defined by

π(A) = (1/ω) ∫_{A×[0,1]} W(x, y) dx dy.
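Since 0 ≤ W ≤ 1, one step of this chain can be simulated exactly by rejection sampling; a minimal sketch (our own, with W given as a Python function):

```python
import random

def graphon_walk_step(W, u, rng=random.Random(0)):
    """One step of the Markov chain on a graphon W from the point u: the
    next point has density W(u, .)/d_W(u). Proposing v uniformly and
    accepting with probability W(u, v) yields exactly this density."""
    while True:
        v = rng.random()
        if rng.random() < W(u, v):
            return v

# toy example: one step from u = 0.7 on the rank-1 graphon W(x, y) = x*y
v = graphon_walk_step(lambda x, y: x * y, u=0.7)
```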
It is also easy to check that this Markov chain is reversible. The step distribution of this Markov chain is proportional to the integral measure of W. Note that the Markov chain does not change if we scale W, so we have to remember the "density" ω if we want to preserve all information about W. But the Markov chain together with the density does determine the graphon.

Next, consider a graphing G on Ω. We can define a Markov chain by

P_u(A) = deg_A(u)/deg(u).

Then the random walk defined by this chain is just the random walk on this graph in the usual sense. The measure preservation condition (18.2) says that this Markov chain is reversible. A stationary measure of this random walk is λ* (as defined in Section 18.2), and its step distribution is η/ω. So the step distribution of the Markov chain is the same as the probability measure on the edges of a graphing. The graphing is determined by this Markov chain.

It is a fascinating open problem whether Markov chains can be used to define convergence and limit objects for graph sequences that are neither dense nor of bounded degree.

23.2. Edge-coloring models

23.2.1. Edge-connection matrices. We consider multigraphs with loops. It will be useful to allow a single edge with no endpoints; we call this graph the circle, and denote it by ⃝. We can define edge-connection matrices that are analogous to the connection matrices defined before: instead of gluing graphs together along nodes, we glue them together along edges. To be precise, we define a k-broken graph as a k-labeled graph in which the labeled nodes have degree one. (It is best to think of the labeled nodes as not nodes of the graph at all, but rather as points where the k edges sticking out of the rest of the graph are broken off.) We allow that both ends of an edge be broken off. For two k-broken graphs G1 and G2, we define G1 ∗ G2 by gluing together the corresponding broken ends of G1 and G2. These ends are not nodes of the resulting graph any more, so G1 ∗ G2 is different from the graph G1G2 we would obtain by gluing together G1 and G2 as k-labeled graphs. We can glue together two copies of an edge with both ends broken off; the result is the circle ⃝. One very important difference is that while G1G2 is k-labeled, G1 ∗ G2 has no broken edges any more, and so it is not k-broken but 0-broken. This fact leads to considerable difficulties in the treatment of edge models.

For every graph parameter f and integer k ≥ 0, we define the edge-connection matrix M′(f, k) as follows. The rows and columns are indexed by isomorphism
types of k-broken graphs. The entry in the intersection of the row corresponding to G1 and the column corresponding to G2 is f(G1 ∗ G2). Note that for k = 0, we have M(f, 0) = M′(f, 0), but for other values of k, connection and edge-connection matrices are different.

Let G be a finite graph. An edge-coloring model is determined by a mapping h : N^q → R, where q is a positive integer. We call h the node evaluation function. Here we think of [q] as the set of possible edge colors; for any coloring of the edges and d ∈ N^q, we think of h(d) as the "value" of a node incident with d_c edges of color c (c ∈ [q]). In statistical physics this is called a vertex model: the edges can be in one of several states, which are represented by the color; an edge-coloring represents a state of the system, and (assuming that h > 0) ln h(d) is the contribution of a node (incident with d_c edges with color c) to the energy of the state.

There are many interesting and important questions to be investigated in connection with edge-coloring models; we will only consider what in statistical physics terms would be called its "partition function". To be more precise, for an edge-coloring φ : E(G) → [q] and node v, let deg_c(φ, v) denote the number of edges e incident with node v with color φ(e) = c. So the vector deg(φ, v) ∈ N^q is the "local view" of node v. The edge-coloring function of the model is defined by

col(G, h) = ∑_{φ: E(G)→[q]} ∏_{v∈V(G)} h(deg(φ, v)).
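A brute-force sketch of this partition function (exponential in e(G), so only for tiny graphs; the edge-list representation and the treatment of loops are our own choices):

```python
from itertools import product

def col(edges, n_nodes, q, h):
    """Partition function of an edge-coloring model: the sum over all
    colorings phi: E -> [q] of prod_v h(deg(phi, v)), where deg(phi, v)
    is the tuple of color multiplicities at v. A loop (v, v), if present,
    would count twice at v here; adjust to the convention needed."""
    total = 0.0
    for phi in product(range(q), repeat=len(edges)):
        value = 1.0
        for v in range(n_nodes):
            d = [0] * q
            for (a, b), c in zip(edges, phi):
                if v == a: d[c] += 1
                if v == b: d[c] += 1
            value *= h(tuple(d))
        total += value
    return total

# Example 23.1: h(d) = 1 iff exactly one incident edge has color 0 (black);
# col then counts perfect matchings (here: 1, namely {01, 23}).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
pm = col(edges, 4, 2, lambda d: 1.0 if d[0] == 1 else 0.0)
```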
Recall that we allow the graph ⃝ consisting of a single edge with no endpoints; by definition, col(⃝, h) = q. We also allow that q = 0, in which case col(G, h) = 1 if G has no edges, and col(G, h) = 0 otherwise. We could of course allow complex valued node evaluation functions, in which case the value of the edge-coloring function can be complex.

Example 23.1 (Number of perfect matchings). The number of perfect matchings can be defined by coloring the edges by two colors, say black and white, and requiring that the number of black edges incident with a given node be exactly one. This means that this number is col(., h), where h : N² → R is defined by h(d_1, d_2) = 1(d_1 = 1). The number of all matchings could be expressed similarly.

Example 23.2 (Number of 3-edge-colorings). This number is col(., h), where h : N³ → R is defined by h(d_1, d_2, d_3) = 1(d_1, d_2, d_3 ≤ 1).

Example 23.3 (Spectral decomposition of a graphon). Recall the definition (7.18) and expression (7.25) for t(F, W) in terms of the spectrum of T_W. We can consider χ as a coloring of E(F) with colors 1, 2, .... Then M_χ(v) depends only on the numbers of edges with different colors, and so we can write M_χ(v) = h(deg(χ, v)), and we get

t(F, W) = col(F, λ, h) = ∑_{χ: E(F)→{1,2,...}} ∏_{e∈E(F)} λ_{χ(e)} ∏_{v∈V(F)} h(deg(χ, v)).
However, this is not a proper edge-coloring model, since the value of the circle, which is the number of colors, is infinite in general. The following facts about the edge-connection matrices of edge-coloring functions are easy to prove along the same lines as Proposition 5.64:
Proposition 23.4. For every edge-coloring model h : N^q → R, the graph parameter col(., h) is multiplicative, its edge-connection matrices M′(f, k) are positive semidefinite, and rk(M′(f, k)) ≤ q^k.

B. Szegedy [2007] showed that the first two of these properties suffice to give a characterization of edge-coloring functions.

Theorem 23.5. A graph parameter f can be represented as an edge-coloring function if and only if it is multiplicative and M′(f, k) is positive semidefinite for all k ≥ 0.

The proof of this theorem is quite involved and not reproduced here; it is based on ideas similar to those used in Section 6.6 to prove Theorem 5.57, but using quite a bit more involved tools: the use of the Nullstellensatz and simple semidefiniteness arguments must be replaced by real versions of the Nullstellensatz (Positivstellensatz), and the simple symmetry arguments must be replaced by deeper results from the representation theory of algebras. (As a historical comment, the proof of Theorem 23.5 came first, and Schrijver's proof of Theorem 5.57 was motivated by this.)

Draisma, Gijswijt, Lovász, Regts and Schrijver [2012] give characterizations of complex valued edge-coloring functions. Let us state without proof a result of Schrijver [2012], which shows that a condition on the growth of the edge-connection rank (along with minor other constraints) can characterize complex edge-coloring models.

Theorem 23.6. A complex valued graph parameter f is an edge-coloring function of a complex model if and only if it is multiplicative, f(⃝) is real, and rk(M′(f, k)) ≤ f(⃝)^k for every k.
product of these tensors, and then summing over all choices of the indices. Note that every index occurs twice, so we could call this “tracing out” every index. These tensor networks play an important role in several areas of physics, but we can’t go into this topic in this book. This setup allows for a more general construction. If we have a tensor network with k broken edges, then the value associated with the graph will depend on the color of these edges, in other words, it will be described by an array (Ai1 ,...,ik : ir ∈ [q]). So the graph with k broken edges can be considered as a gadget itself. We can break down the procedure of assembling a tensor network from the gadgets (with or without broken edges) into two very simple steps: (a) We can take the disjoint union of two gadgets; if the gadgets have k and l legs, respectively, the union has k + l legs. In terms of multilinear algebra, this means to form the tensor product of two tensors. (b) We can fuse two legs of a gadget. If (Ai1 ,...,ik : ir ∈ [q]) is the tensor describing the gadget, and (say) we fuse legs k − 1 and k, then we get the tensor ∑ Bi1 ,...,ik−2 = Ai1 ,...,ik−2 ,j,j . j∈[q]
In multilinear algebra slang, we trace out the last two indices. It is easy to see that with these operations, we can construct every tensor network with or without broken edges, and we get the corresponding tensor. Supposing that we have a starting kit of gadgets, we can look at the set of all tensors that can be realized by assembling tensor graphs with broken edges from these gadgets. In the spirit of linear algebra, we take all linear combinations of the obtained tensors with the same number of slots. Every tensor obtained this way will be called an assembled tensor. It is clear from (a) and (b) above that the set of assembled tensors has the following structure: For every k, there is a linear space Tk of tensors over Rq with k slots. For every A ∈ Tk and B ∈ Tl , the tensor product A ⊗ B ∈ Tk+l . For every A ∈ Tk , and any two indices in A, tracing out these two indices results in a tensor in Tk−2 . We call such a set of tensors a traced tensor algebra. Conversely, every traced tensor algebra arises as the set of assembled tensors: for every number k of slots, we select a basis of the space Tk , and use the resulting set of tensors as the starting kit. It is quite fruitful to use this connection; one can obtain results that are new both for graphs and for tensor algebras. We describe one important result with combinatorial connections. Given a starting kit K, how can we decide about a tensor whether it can be assembled from this kit? In other words, is it contained in the traced tensor algebra generated by K? A beautiful answer to this question was found by Schrijver [2008a], which we describe in graph-theoretic terms (the proof uses the representation theory of algebras, and we do not give it here; cf. also Schrijver [2008b, 2009]). Recall that we work over a fixed vector space Rq . Every q × q real matrix A is a gadget in itself, with two legs. If it is symmetric, then the legs are interchangeable, but in general we have to talk about a “left leg” (corresponding to the row index) and a “right leg” (corresponding to the column index). Connecting the gadgets for matrices A and B in series gives a gadget representing the matrix AB.
It is easy to see that with these operations we can construct every tensor network with or without broken edges, and we get the corresponding tensor. Supposing that we have a starting kit of gadgets, we can look at the set of all tensors that can be realized by assembling tensor graphs with broken edges from these gadgets. In the spirit of linear algebra, we take all linear combinations of the obtained tensors with the same number of slots. Every tensor obtained this way will be called an assembled tensor.

It is clear from (a) and (b) above that the set of assembled tensors has the following structure: for every k, there is a linear space T_k of tensors over R^q with k slots. For every A ∈ T_k and B ∈ T_l, the tensor product A ⊗ B ∈ T_{k+l}. For every A ∈ T_k and any two indices in A, tracing out these two indices results in a tensor in T_{k−2}. We call such a set of tensors a traced tensor algebra. Conversely, every traced tensor algebra arises as the set of assembled tensors: for every number k of slots, we select a basis of the space T_k, and use the resulting set of tensors as the starting kit. It is quite fruitful to use this connection; one can obtain results that are new both for graphs and for tensor algebras. We describe one important result with combinatorial connections.

Given a starting kit K, how can we decide about a tensor whether it can be assembled from this kit? In other words, is it contained in the traced tensor algebra generated by K? A beautiful answer to this question was found by Schrijver [2008a], which we describe in graph-theoretic terms (the proof uses the representation theory of algebras, and we do not give it here; cf. also Schrijver [2008b, 2009]).

Recall that we work over a fixed vector space R^q. Every q × q real matrix A is a gadget in itself, with two legs. If it is symmetric, then the legs are interchangeable, but in general we have to talk about a "left leg" (corresponding to the row index) and a "right leg" (corresponding to the column index). Connecting the gadgets for matrices A and B in series gives a gadget representing the matrix AB.

Orthogonal matrices will play a special role. One observation is that if A is orthogonal, then AA^T = I (the identity matrix), and so if we have a gadget graph with no broken edges, we may replace any edge by a path of length 3 with A and A^T sitting on its two inner nodes (Figure 23.1). If we replace every edge by this path of length 3, then the value of the graph does not change. However, we can group together every original gadget B with the orthogonal matrices next to it, to get a gadget B^A, which is obtained, in multilinear algebra terms, from B by applying the linear transformation A to every slot. If we replace every gadget B in the kit by B^A, then the value of the tensor network does not change.
Figure 23.1. Replacing every edge by a path with the same orthogonal transformation at both inner nodes (just facing the opposite direction), and regrouping, does not change the value.

Now consider a tensor network with broken edges. If we replace every tensor B in the kit by B^A, then the matrices A and A^T along the unbroken edges still cancel each other, but on the broken edges one copy still remains. In other words, if we apply the same orthogonal transformation to every slot of every tensor in the kit, then the tensor defined by a tensor network with broken edges undergoes the same transformation.
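A quick numerical check of the closed-network cancellation (the content of Figure 23.1), with a random orthogonal Q obtained by QR factorization; an illustrative sketch, not the book's construction:

```python
import numpy as np

q = 3
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((q, q)))   # a random orthogonal matrix

# a closed network: three 2-leg gadgets contracted along a triangle
A, B, C = (rng.standard_normal((q, q)) for _ in range(3))
value = np.einsum('ab,bc,ca->', A, B, C)

def transform(T, Q):
    """Apply Q to every slot of the 2-leg tensor T (this is Q @ T @ Q.T)."""
    return np.einsum('ai,bj,ij->ab', Q, Q, T)

value_Q = np.einsum('ab,bc,ca->', transform(A, Q),
                    transform(B, Q), transform(C, Q))
assert np.isclose(value, value_Q)    # Q and Q^T cancel along every edge
```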
Figure 23.2. Applying the same orthogonal transformation to all slots of all tensors in the kit results in applying the same orthogonal transformation to the slots of the assembled tensor. In particular, if all tensors in the kit have the property that a particular orthogonal transformation applied to all their slots leaves them invariant, then the
same holds for every assembled tensor. The theorem of Schrijver [2008a] asserts that this is the only obstruction to assembling a given tensor.

Theorem 23.7. Let T be a traced tensor algebra generated by a set S of tensors, including the identity tensor 1(i = j) (i, j ∈ [q]). Then a tensor T is in T if and only if it is invariant under every orthogonal transformation that leaves every tensor in S invariant.

The special case when the generating tensors are symmetric describes edge-coloring models. This can be viewed as an analogue of Theorem 6.38, with the roles of the edges and nodes interchanged. Regts [2012] showed how Theorem 23.7 yields an exact formula for the edge-connection rank of edge-coloring models.

Example 23.8 (Number of perfect matchings revisited). The tensor model for this graph parameter is a bit more complicated than in Example 23.1. We have 2 edge colors (which it will be convenient to call 0 and 1), so we work over R²; but we need to specify a tensor for every degree d, expressing that exactly one edge is black:

T_{i_1,...,i_d} = 1(i_1 + ··· + i_d = 1).

It is easy to see that no orthogonal transformation other than the identity, applied to all slots, leaves this tensor invariant, so it follows from Theorem 23.7 that every tensor can be assembled from this kit. (We note that the tensor is invariant under permuting the slots; however, this symmetry is not preserved under composition of tensor networks.)

Example 23.9 (Number of 3-edge-colorings revisited). To construct a tensor model for the number of 3-edge-colorings, we work over R³. We again need to specify a tensor for every degree, expressing that the edges have different colors:

T_{i_1,...,i_d} = 1(i_1, ..., i_d are different)

(for d > 3, we get the 0 tensor). Permuting the colors (i.e., the coordinates in the underlying vector space R³) leaves this tensor invariant, and these are the only orthogonal transformations of R³ with this property. Theorem 23.7 implies that a tensor is invariant under the permutations of the coordinates of R³ if and only if it can be assembled from this kit.

23.3. Hypergraphs

When talking about generalizing results on graphs, the first class of structures that comes to mind is hypergraphs (at least to a combinatorialist). So it is perhaps surprising that extending the main concepts and methods developed in this book (quasirandomness, limit objects, Regularity Lemma, and Counting Lemma) to hypergraphs is highly nontrivial. Even the "right" formulation of the Regularity Lemma took a long time to find, and in the end both the Regularity Lemma and the limit object turned out quite different from what one would expect as a naive generalization. Nevertheless, the issue is essentially solved now, thanks to the work of Chung, Elek, Graham, Gowers, Rödl, Schacht, Skokan, Szegedy, Tao and others. A full account of this work would go way beyond the possibilities of this book, but we will give a glimpse of the results.

By an r-uniform hypergraph, or briefly r-graph, we mean a pair H = (V, E), where V = V(H) is a finite set and E = E(H) ⊆ \binom{V}{r} is a collection of r-element
23.3. Hypergraphs

When talking about generalizing results on graphs, the first class of structures that comes to mind is hypergraphs (at least to a combinatorialist). So it is perhaps surprising that extending the main concepts and methods developed in this book (quasirandomness, limit objects, Regularity Lemma, and Counting Lemma) to hypergraphs is highly nontrivial. Even the "right" formulation of the Regularity Lemma took a long time to find, and in the end both the Regularity Lemma and the limit object turned out quite different from what one would expect as a naive generalization. Nevertheless, the issue is essentially solved now, thanks to the work of Chung, Elek, Graham, Gowers, Rödl, Schacht, Skokan, Szegedy, Tao and others. A full account of this work would go way beyond the possibilities of this book, but we will give a glimpse of the results.

By an r-uniform hypergraph, or briefly r-graph, we mean a pair H = (V, E), where V = V(H) is a finite set and $E = E(H) \subseteq \binom{V}{r}$ is a collection of r-element subsets. The elements of V are called nodes, the elements of E are called edges. So 2-graphs are equivalent to simple graphs. We can define the homomorphism number hom(G, H) of an r-graph G into an r-graph H in the natural way, as the number of maps φ : V(G) → V(H) for which φ(A) ∈ E(H) for every A ∈ E(G). The homomorphism density of G in H is defined as one expects, by the formula
$$t(G, H) = \frac{\hom(G, H)}{|V(H)|^{|V(G)|}}.$$

Quasirandomness can be defined by generalizing the condition on the density of quadrilaterals. We need to define a couple of special hypergraph classes. Let $K_n^r$ denote the complete r-uniform hypergraph on [n] (i.e., $E(K_n^r) = \binom{[n]}{r}$). Let $L_k^r$ be the "complete r-partite hypergraph" defined on the node set $V_1 \cup \dots \cup V_r$, where the $V_i$ are disjoint k-sets, and the edges are all r-sets containing exactly one element from each $V_i$. Clearly $t(K_r^r, H) = t(L_1^r, H)$ is the edge density of H. It is not hard to prove that $t(L_k^r, H) \ge t(K_r^r, H)^{k^r}$ for every H (this generalizes inequality 2.9 from the Introduction). We define the quasirandomness of H as the difference
$$\mathrm{qr}(H) = t(L_2^r, H) - t(K_r^r, H)^{2^r}.$$
A sequence $(H_n)$ of hypergraphs is called quasirandom with density p if $t(K_r^r, H_n) \to p$ and $\mathrm{qr}(H_n) \to 0$, or equivalently, $t(L_2^r, H_n) \to p^{2^r}$. It was proved by Chung and Graham [1989] that this implies that $t(G, H_n) \to p^{e(G)}$ for every r-graph G, so the equivalence of conditions (QR2) and (QR3) for quasirandomness in the Introduction (Section 1.4.2) generalizes nicely.

As a first warning that not everything extends in a straightforward way, let us try to generalize (QR5). A first guess would be to consider disjoint sets $X_1, \dots, X_r \subseteq V$, and then stipulate that the number of edges with one endpoint in each of them is $p|X_1|\cdots|X_r| + o(n^r)$. (For simplicity of presentation, we assume that $v(H_n) = n$.) This property is indeed valid for every quasirandom sequence, but it is strictly weaker than quasirandomness. It is not well-defined what the "right" generalization is; we state one below, which is a version of a generalization found by Gowers. Several other equivalent conditions are given by Kohayakawa, Rödl and Skokan [2002].

Proposition 23.10. A sequence $(H_n)$ of hypergraphs is quasirandom with density p if and only if for every (r−1)-graph $G_n$ on $V(H_n)$, the number of edges of $H_n$ that induce a complete subhypergraph in $G_n$ is $t(K_r^{r-1}, G_n)\, t(K_r^r, H_n) \binom{n}{r} + o(n^r)$.

In the case of simple graphs (r = 2), let $H_n$ be a simple graph with edge density p. The 1-graph $G_n$ means simply a subset of $V(H_n)$, and $K_2^1$ is just a 2-element set. So the condition says that the number of edges of the graph $H_n$ induced by the set $G_n$ is asymptotically
$$t(K_2^1, G_n)\, t(K_2^2, H_n) \binom{n}{2} = \Bigl(\frac{|G_n|}{n}\Bigr)^2 \frac{2e(H_n)}{n^2}\binom{n}{2} \sim p \binom{|G_n|}{2},$$
and so we get condition (QR4). For general r, the condition can be rephrased as follows: for a random r-set X ⊆ V, the events that X is complete in $G_n$ and that X is an edge in $H_n$ are asymptotically independent.
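Before moving on, here is a minimal brute-force sketch of these definitions (plain Python; the function names are ours). It computes homomorphism densities of r-graphs and, for r = 2, the quasirandomness qr(H), where $L_2^2$ is the 4-cycle.

```python
from itertools import product

def t(G_nodes, G_edges, H_nodes, H_edges):
    """Brute-force homomorphism density t(G, H) of an r-graph G in an
    r-graph H: the fraction of all maps V(G) -> V(H) that send every
    edge of G to an edge of H."""
    H_set = {frozenset(e) for e in H_edges}
    hom = sum(all(frozenset(phi[v] for v in e) in H_set for e in G_edges)
              for phi in product(H_nodes, repeat=len(G_nodes)))
    return hom / len(H_nodes) ** len(G_nodes)

# r = 2 demo: L_2^2 is the 4-cycle and K_2^2 is a single edge, so
# qr(H) = t(L_2^2, H) - t(K_2^2, H)^4.
H_nodes = list(range(5))
H_edges = [(i, (i + 1) % 5) for i in range(5)]          # the 5-cycle C_5
e = t([0, 1], [(0, 1)], H_nodes, H_edges)               # 0.4
c4 = t([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)], H_nodes, H_edges)
print(c4 - e ** 4)  # qr(C_5) = 30/625 - 0.4^4 = 0.0224
```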
The last remark takes us to another complication.

Example 23.11. Let G(n, 1/2) be a random graph and let $T_n$ denote the 3-graph formed by the triangles in G(n, 1/2). Then $T_n$ is a 3-graph with density 1/8, which is random in some sense, but it is very different from the random 3-graph $H_n$ on [n] obtained by selecting every edge independently with probability 1/8. In fact, the sequence $(H_n)$ is quasirandom with probability 1 (this is not hard to see), while $T_n$ has a very small intersection with every quasirandom 3-graph by Proposition 23.10. Also, $T_n$ has some special features: no 4-set of nodes contains exactly 3 edges of $T_n$. On the other hand, $T_n$ is totally homogeneous. It has no special global structure; more concretely, on any two disjoint k-sets we see independent copies of the same random hypergraph.

If we want to generalize the Regularity Lemma, it has to reflect the difference between $T_n$ and $H_n$, and similarly for the generalization of the notion of graphons. Which of these sequences should tend to a constant function?

We show how to overcome this difficulty, starting with the construction of the limit object. We say that a sequence of r-graphs $(H_n)$ is convergent if $v(H_n) \to \infty$ and $t(F, H_n)$ has a limit as $n \to \infty$ for every r-graph F. Let t(F) denote this limit. How should we represent this limit function; in other words, what is the hypergraph analogue of a graphon? The natural guess would be a symmetric r-variable function $W : [0,1]^r \to [0,1]$, which would represent the limit by
$$t(F, W) = \int_{[0,1]^{V(F)}} \prod_{\{i_1,\dots,i_r\}\in E(F)} W(x_{i_1},\dots,x_{i_r})\, dx.$$
The example of the hypergraphs $H_n$ and $T_n$ above shows that this cannot be right. The only reasonable candidate for their limit object would be the function W ≡ 1/8, which represents the limiting densities correctly for the sequence $H_n$, but not for the sequence $T_n$. We could make life even more complicated and consider the intersection $H_n \cap T_n$, which is a random 3-graph with expected density 1/64, whose limiting densities are even more complicated. For r > 3, one could construct a whole zoo of homogeneous random hypergraphs, generalizing the construction of $H_n$ and $T_n$. After several steps of generalization, one arrives at the following: we generate a random coloring of $K_n^j$ for every $0 \le j \le r$ (with any number of colors). To decide whether an r-subset $X \subseteq [n]$ should be an edge, we look at the colors of its subsets, and see whether this coloring belongs to some prescribed family of colorings of $2^X$. (We assume that the prescribed family is invariant under permutations of X.)

While this example warns us of complications, it also suggests a way out: we describe the limit not in the r-dimensional but in the $2^r$-dimensional space. In fact, the limit object turns out to be a subset, rather than a function, which is a gain (though of course very little relative to the increase in the number of coordinates). Consider the set $[0,1]^{2^{[r]}}$ (so we have a coordinate $x_I$ for every $I \subseteq [r]$; the coordinate for ∅ will play no role, and we can think of it as 0). Let us note that the symmetric group $S_r$ acts on the power set $2^{[r]}$, and hence also on $[0,1]^{2^{[r]}}$. Let $U \subseteq [0,1]^{2^{[r]}}$ be a measurable set that is invariant under the action of $S_r$. We call such a set a hypergraphon.

For every hypergraphon U, we define the density of an r-graph F as follows. We assign independent random variables $X_S$, uniform in [0,1], to every subset $S \subseteq V(F)$ with $|S| \le r$. For every edge $A = \{a_1,\dots,a_r\} \in E(F)$ and every $I \subseteq [r]$, we denote by $A_I$ the subset $\{a_i : i \in I\}$, and we consider the point $X(A) \in [0,1]^{2^{[r]}}$ defined by $(X(A))_I = X_{A_I}$ (this depends on the ordering of A, but this will not matter, thanks to our symmetry assumption about U). Now we define
$$t(F, U) = P\bigl(X(A) \in U \text{ for all } A \in E(F)\bigr).$$
To illuminate the meaning of this formula a little, consider the case r = 2. Then we have $U \subseteq [0,1]^3$, where the three coordinates correspond to the sets {1}, {2} and {1,2} (as we remarked above, the empty set plays no role). For a graphon W, we define the set
$$U_W = \{(x_1, x_2, x_{12}) \in [0,1]^3 : x_{12} \le W(x_1, x_2)\}.$$
Then it is easy to see that $t(F, U_W) = t(F, W)$ for any simple graph F.
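A quick Monte Carlo sketch may help digest this definition (plain Python; the names and the sample size are ours). For r = 2 and the set $U_W$ above, it estimates t(F, U) by sampling the variables $X_S$ directly; with the constant graphon W ≡ 1/2 and F = K₃ the estimate should be close to 1/8.

```python
import random
from itertools import combinations

def t_hypergraphon(F_nodes, F_edges, W, samples=100_000):
    """Monte Carlo estimate of t(F, U_W) in the case r = 2: one uniform
    variable X_S for every node and every pair of nodes of F; the point
    of an edge {a, b} lies in U_W when X_{ab} <= W(X_a, X_b)."""
    hits = 0
    for _ in range(samples):
        X = {frozenset(S): random.random()
             for k in (1, 2) for S in combinations(F_nodes, k)}
        if all(X[frozenset(e)] <= W(X[frozenset([e[0]])], X[frozenset([e[1]])])
               for e in F_edges):
            hits += 1
    return hits / samples

# For the constant graphon W = 1/2 and F = K_3 we expect t = (1/2)^3.
triangle = [(0, 1), (0, 2), (1, 2)]
print(t_hypergraphon([0, 1, 2], triangle, lambda x, y: 0.5))  # ~0.125
```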
Elek and Szegedy [2012] prove the following.

Theorem 23.12. For every convergent sequence $(H_n)$ of r-graphs there is a hypergraphon U such that $t(F, H_n) \to t(F, U)$ for every r-graph F.

The limit hypergraphon is essentially unique up to some "structure preserving transformations", which are more difficult to define than in the case of graphs, and we don't go into the details.

Elek and Szegedy [2012] give several applications of Theorem 23.12. For a given hypergraphon U, they define U-random hypergraphs and prove that they converge to U. They derive from it the Hypergraph Removal Lemma due to Frankl and Rödl [2002], Gowers [2006], Ishigami [2006], Nagle, Rödl and Schacht [2006] and Tao [2006a]. As a refreshing exception, the statement of this lemma is a straightforward generalization of the Removal Lemma for graphs (Lemma 11.64); the proof of Elek and Szegedy is similar to our second proof in Section 11.8. They also derive the Hypergraph Regularity Lemma from Theorem 23.12, using a stepfunction approximation of hypergraphons.

This brings us to the Hypergraph Regularity Lemma, a very important but also quite complicated statement. There are several essentially equivalent, but not trivially equivalent, forms, due to Frankl and Rödl [1992], Gowers [2006, 2007], Rödl and Skokan [2004], Rödl and Schacht [2007a, 2007b]. Proving the appropriate Counting Lemma for these versions is a further difficult issue, and I will not go into it. But I must not leave this topic without stating at least one form, based on the formulation of Elek and Szegedy [2012], which in fact generalizes the strong form of the Regularity Lemma (Lemma 9.5).

We have to define what we mean by "regularizing" a hypergraph. For α, β > 0 and k ∈ N, we define an (α, β, k)-regularization of an r-graph H on [n] as follows. For every i ∈ [r], we partition the complete hypergraph $K_n^i$ into i-graphs $G_{i,1},\dots,G_{i,k}$. Let us think of the edges in $G_{i,j}$ as colored with color j. This defines a partition P of the edges of $K_n^r$, where two r-sets are in the same class if the colorings of their subsets are isomorphic. The family $\{G_{i,j} : i \in [r], j \in [k]\}$, together with an r-graph G on [n], will be called an (α, β, k)-regularization of H if
(a) every i-graph $G_{i,j}$ has quasirandomness at most α,
(b) G is the union of some of the classes of P, and
(c) $|E(H) \mathbin{\triangle} E(G)| \le \beta \binom{n}{r}$.

Now we can state one version of the Hypergraph Regularity Lemma.

Lemma 23.13 (Strong Hypergraph Regularity Lemma). For every r ≥ 2 and every sequence $\epsilon = (\varepsilon_0, \varepsilon_1, \dots)$ of positive numbers there is a positive integer $k_\epsilon$ such that for every r-graph H there is an integer $k \le k_\epsilon$ such that H has an $(\varepsilon_k, \varepsilon_0, k)$-regularization.

The main point is that to regularize H, we have to partition not only its node set, but also the set of i-tuples for all i ≤ r.
Just like in the graph case, we could demand that the i-graphs $G_{i,j}$ have almost the same number of edges for every fixed i. Of course, the price we have to pay for stating a relatively compact version is that it takes more work to apply it; but we don't go in that direction.

The extension of the theory exposed in this book to hypergraphs is not complete, and there is space for a lot of additional work. Just to mention a few loose ends: it seems that no good extension of the cut distance $\delta_\square$ to hypergraphs has been found (just as in the case of limit objects or the regularity lemma, the first natural guesses are not really useful). Another open question is to extend these results to nonuniform hypergraphs, with unbounded edge-size. The semidefiniteness conditions for homomorphism functions can be extended to hypergraphs (see e.g. Lovász and Schrijver [2008]), but perhaps this is just the first, "naive" extension. One area of applications of these conditions is extremal graph theory. The work of Razborov [2010] shows that generalizations of graph algebras and of the semidefiniteness conditions can be useful in extremal hypergraph theory. However, we have seen that graph algebras can be defined in the setting of gluing along nodes and also along edges, and this indicates that for hypergraphs a more general concept of graph algebras may be useful.

23.4. Categories

The categorial way of looking at mathematical structures is quite prevalent in many branches of mathematics. In graph theory, the use of categories (as a language and also as a guide for asking questions in a certain way) has been practiced mainly by the Prague school, and has led to many valuable results; see e.g. the book by Hell and Nešetřil [2004]. One can go a step further and consider categories (with appropriate finiteness assumptions) as objects of combinatorial study in their own right. After all, categories are rather natural generalizations of posets, and there is a huge literature on the combinatorics of posets. However, surprisingly little has happened in the direction of a combinatorial theory of categories; some early work of Isbell [1991], Lovász [1972] and Pultr [1973], and the more recent work of Kimoto [2003a, 2003b], can be cited.

Working with graph homomorphisms, we have found not only that the categorial language suggests very good questions and a very fruitful way of looking at our problems, but also that several of the basic results about graph homomorphisms and regularity can be extended to categories in a very natural way. The goal of this section is to describe these generalizations, and thereby encourage a combinatorial study of categories. (Appendix A.8 summarizes some background.)

23.4.1. Cancellation laws. Counting homomorphisms has been a main tool for proving cancellation laws for finite relational structures in Section 5.4, and it is not surprising that these results can be extended to locally finite categories (Lovász [1972], Pultr [1973]). The following two theorems generalize Theorem 5.34, Proposition 5.35(b) and Lemma 5.38 to categories.

Theorem 23.14. Let a and b be two objects in a locally finite category such that the direct powers $a^{\times k}$ and $b^{\times k}$ exist and are isomorphic. Then a and b are isomorphic.

Theorem 23.15. Let a, b, c be three objects in a locally finite category K such that the direct products a × c and b × c exist and are isomorphic.
(a) If both a and b have at least one morphism into c, then a and b are isomorphic. (b) There exists an isomorphism from a × c to b × c that commutes with the projections of a × c and b × c to c. So if there is any isomorphism σ in Figure 23.3, then there is one for which the diagram commutes.
Figure 23.3.

23.4.2. Connection matrices and algebras of morphisms. For the next theorem, we assume that K is a locally finite category that has a zero object, a left generator, pushouts, and epi-mono decompositions. Let f be a real valued function defined on the objects, invariant under isomorphism. We say that f is multiplicative over coproducts if f(a ⊕ b) = f(a)f(b) for any two objects a and b. For every object a, we define a (possibly infinite) symmetric matrix M(f, a), whose rows and columns are indexed by morphisms in $K_a^{\mathrm{in}}$, and whose entry in row α and column β is $f\bigl(t(\alpha \vee \beta)\bigr)$ (since α ∨ β is determined up to isomorphism, this is well defined). Note that, specializing to the category of graph homomorphisms, M(f, a) corresponds to the multiconnection matrix; to get the simple connection matrix, we have to restrict the row and column indices to monomorphisms.

One can extend the characterization of homomorphism functions in Corollary 5.58 to categories (Lovász and Schrijver [2010]); this theorem also contains the dual characterization Theorem 5.59.

Theorem 23.16. Let K be a locally finite category that has a zero object z, a left generator, pushouts, and epi-mono decompositions. Let f be a function defined on the objects, invariant under isomorphism. Then there is an object b such that f = |K(·, b)| if and only if the following conditions are fulfilled:
(F1) f(z) = 1,
(F2) f is multiplicative over coproducts, and
(F3) M(f, a) is positive semidefinite for every object a.

We note that if there is an epimorphism from a to b, then M(f, b) is a submatrix of M(f, a). Thus it would be enough to require the semidefiniteness condition for a left-cofinal subset of objects a.

Corollary 23.17. Conditions (F1)–(F3) of the theorem imply that (a) the values of f are non-negative integers, and (b) the rank of M(f, a) is finite for every a.

Statement (a) of this corollary contrasts with Theorem 5.54, where (thanks to the weights) the function values can be arbitrary real numbers. An analogue of (b) must be imposed as an additional condition e.g. in the characterization in Theorem 5.54, while in this version it follows from the other assumptions.
The conditions are very similar to those in Theorem 5.54, except that there the graphs cannot have loops and the matrices are indexed by monomorphisms only. As a consequence, the characterization concerns homomorphism numbers into weighted graphs, which has not been extended to categories so far.

The proof of Theorem 23.16 is built on similar ideas as the proof of Theorem 5.54 in Chapter 6, using algebras associated with the category. Since it is instructive how such algebras can be defined, we describe their construction below; for the details of the proof, we refer to the paper of Lovász and Schrijver [2010].

For two objects a and b in a locally finite category K, a formal linear combination (with real coefficients) of morphisms in K(a, b) will be called a quantum morphism. Quantum morphisms between a and b form a finite dimensional linear space Q(a, b). Let
$$x = \sum_{\varphi \in K(a,b)} x_\varphi \varphi \in Q(a,b) \qquad\text{and}\qquad y = \sum_{\psi \in K(b,c)} y_\psi \psi \in Q(b,c);$$
then we define
$$xy = \sum_{\varphi \in K(a,b)} \sum_{\psi \in K(b,c)} x_\varphi y_\psi\, \varphi\psi \in Q(a,c).$$
With this definition, quantum morphisms form a category Q on the same set of objects as K. (Of course, Q is not locally finite any more, but it is locally finite dimensional.)

We can be more ambitious and take formal linear combinations of morphisms in $K_a^{\mathrm{out}}$ (for a fixed object a), to get a linear space $Q_a^{\mathrm{out}}$. This space will be infinite dimensional in general, but it has interesting finite dimensional factors. For each object a, the pushout operation ∧ defines a semigroup on $K_a^{\mathrm{out}}$. Let $Q_a^{\mathrm{out}}$ denote its semigroup algebra of all formal finite linear combinations of morphisms in $K_a^{\mathrm{out}}$; so $Q_a^{\mathrm{out}} = \bigoplus_b Q(a,b)$. Just as in the case of graphs, every function $f : \mathrm{Ob}(K) \to \mathbb{R}$ defines an inner product on $Q_a^{\mathrm{out}}$, by $\langle \alpha, \beta \rangle = f\bigl(h(\alpha \wedge \beta)\bigr)$. Condition (F3) in Theorem 23.16 implies that this inner product is positive semidefinite. Factoring out its kernel, we get a Frobenius algebra, which is finite dimensional (this takes a separate argument, since unlike in the proof of Theorem 5.54, this is not assumed directly). The proof of Theorem 23.16, just like the proof of Theorem 5.54, is built on studying the idempotent bases in these algebras.

Example 23.18 (Graph algebras). If the category is the category of graph homomorphisms, and a is the k-labeled graph with k nodes and no edges, then $Q_a^{\mathrm{out}}$ is the gluing algebra of k-multilabeled graphs.

Example 23.19 (Flag algebras). Razborov's "flag algebras" [2007] can be defined in our setting as follows. We consider the category of embeddings (injective homomorphisms) between graphs. Fixing a graph F (which Razborov calls a "type"), the morphisms from F correspond to graphs with a specified subgraph isomorphic with F (which Razborov calls a "flag"). The pushout of two such morphisms results in an object obtained by gluing together the two graphs along the image of F, which is exactly how Razborov defines the product in flag algebras. So flag algebras are the algebras $Q_F^{\mathrm{out}}$ in the category of monomorphisms between graphs. This is a subalgebra of the algebra $Q_F^{\mathrm{out}}$ defined in terms of all homomorphisms between graphs.
23.4.3. Regularity Lemma for categories. There are more results on graph homomorphisms that extend quite naturally to the categorial setting. Let us state a generalization of the Regularity Lemma, both in its weak and original form (Lovász [Notes]).

To motivate the definitions below, consider a weighted graph G. This can be viewed as a weighting of the edges of a complete graph, i.e., as a quantum morphism $K_2 \to \widetilde{K}_n$, which is symmetric, i.e., it is invariant under swapping the two nodes of $K_2$. Regularity lemmas try to find a partition (a morphism $K_n^\circ \to \widetilde{K}_k$) and a weighting of the edges of $d = \widetilde{K}_k$ (a quantum morphism in Q(a, d)), such that "pulling back" these weights to $K_n^\circ$, we get a good approximation in the cut norm. How should we translate to the categorial language that the cut norm of a weighted graph is small? It means that for every morphism $K_n^\circ \to K_2^\circ$, if we push forward the edgeweights, then the resulting edgeweights of $K_2^\circ$ are all small (this says that versions (a) and (c) of the cut norm in Exercise 8.4 are small, but these are all equivalent up to absolute constant factors).

These considerations motivate the following general definitions. Let α ∈ K(a, b) and β ∈ K(c, b). We define a quantum morphism $\alpha\beta^* \in Q(a,c)$ by
$$\alpha\beta^* = \sum_{\varphi \in K(a,c):\ \varphi\beta = \alpha} \varphi.$$
This operation extends linearly to define $xy^*$ for x ∈ Q(a, b) and y ∈ Q(c, b). It is not hard to check that $x(zy)^* = (xy^*)z^*$ and $\langle x, yz^* \rangle = \langle xz, y \rangle$.

For every quantum morphism $x = \sum_\varphi x_\varphi \varphi \in Q(a,b)$ and every object c, we define the c-norm of x by
$$\|x\|_c = \max_{\beta \in K(b,c)} \frac{\|x\beta\|_\infty}{|K(a,b)|}.$$
This norm generalizes the cut norm: if $a = K_2$ and $c = K_2^\circ$, then a symmetric quantum morphism x ∈ Q(a, b) is a weighting of the edges of b, and it is not hard to see that $\|x\|_\square/2 \le \|x\|_c \le \|x\|_\square$.

Let $c^m$ denote the m-th direct power of the object c. The first inequality in the following lemma generalizes the Frieze–Kannan Weak Regularity Lemma 9.3, while the second implies the Original Regularity Lemma of Szemerédi (Lemma 9.2).

Lemma 23.20. Let K be a locally finite category having finite direct products. Let a, b and c be three objects in K, and let m ≥ 1. Then for every x ∈ Q(a, b) there exists a morphism $\varphi \in K(b, c^m)$ and a quantum morphism $y \in Q(a, c^m)$ such that
$$\|x - y\varphi^*\|_c \le \frac{1}{\sqrt{m}}\|x\|_2 \qquad\text{and}\qquad \|x - y\varphi^*\|_{c^{2^m}} \le \frac{1}{\sqrt{\log^* m}}\|x\|_2.$$
The Weak Regularity Lemma is obtained, as described above, by taking $a = K_2$ and $c = K_2^\circ$ and applying the first bound. Note that a morphism in $K(b, c^m)$ corresponds to a partition of V(G) into $2^m$ classes. The Original Regularity Lemma can be derived from the second bound similarly. Strong versions can be generalized as well, but for the details we refer to Lovász [Notes].

There are many unsolved questions here: Can the Counting Lemma be generalized to categories? Can the notions of convergence and limit objects be formulated
in an interesting way? Could these results shed new light on hypergraph limits and regularity lemmas? Or perhaps even on sparse regularity lemmas?

Exercise 23.21. Let K be a locally finite category, and let c be an object. Prove that every monomorphism in K(c, c) is an isomorphism.

Exercise 23.22. Let K be a locally finite category, and let c, d be two objects. Suppose that there are monomorphisms in K(c, d) and in K(d, c). Prove that c and d are isomorphic.

Exercise 23.23. Let K be a locally finite category, and let c and d be two objects. For any two morphisms α ∈ K(a, a′) and β ∈ K(b, b′), let $N_{\alpha,\beta}$ denote the number of 4-tuples of morphisms (φ, ψ, µ, ν) (φ ∈ K(c, a), ψ ∈ K(c, b), µ ∈ K(a′, d), ν ∈ K(b′, d)) such that φαµ = ψβν. Prove that the matrix $N = (N_{\alpha,\beta})$, where α and β range over all morphisms of the category, is positive semidefinite.

Exercise 23.24. Let a and b be two objects in a locally finite category. Suppose that the direct powers a × a and b × b exist and are isomorphic. Prove that a and b are isomorphic.

Exercise 23.25. Let a, b, c, d be four objects in a locally finite category K such that the direct products a × c, b × c, a × d and b × d exist, a × c and b × c are isomorphic, and d has at least one morphism into c. Prove that a × d and b × d are isomorphic.
23.5. And more...

There are many types of discrete structures for which one can try to define convergence and limit objects for growing sequences. This is typically not straightforward, as one can see from the case of simple graphs with (say) $\Theta(n^{3/2})$ edges. However, this approach has been successful in some cases.

It is a natural question to extend the theory of graph limits to directed graphs. Let us assume that these graphs are simple, so that there are no loops and there is at most one edge between two nodes in a given direction. Diaconis and Janson show that at least some of the theory can be developed based on the theory of exchangeable arrays (see Section 11.3.3). The limit object is a bit more complicated: it can be described by four measurable functions $W_{0,0}, W_{0,1}, W_{1,0}, W_{1,1} : [0,1]^2 \to [0,1]$ such that $W_{0,0}$ and $W_{1,1}$ are symmetric, $W_{0,1}(x,y) = W_{1,0}(y,x)$, and $W_{0,0} + W_{0,1} + W_{1,0} + W_{1,1} = 1$. The function $W_{0,1}(x,y)$ measures the density of edges from an infinitesimal neighborhood of x to an infinitesimal neighborhood of y, etc. Some further remarks and observations can be found scattered in papers, but no comprehensive treatment seems to be known. Perhaps most of the extension is rather straightforward (but be warned: the theory of existence of homomorphisms between digraphs is much more involved—one can say richer—than for undirected graphs; see Hell and Nešetřil [2004]).

Posets can be considered as special digraphs, but they are sufficiently important in many contexts to warrant a separate treatment. Janson [2011a, 2012] starts a limit theory of posets. The treatment is based on methods similar to the limit theory of dense graphs in this book, but there are some analytic complications and interesting special features, for which we refer to the paper.

Going away from graphs, let us consider the set $S_n$ of permutations of the set [n]. Cooper [2004, 2006] defined and characterized quasirandomness for permutations, and proved a regularity lemma for them. Hoppen, Kohayakawa, Moreira, Ráth and Menezes Sampaio [2011, 2011] defined convergent sequences of permutations, and described their limit objects. Given a permutation $\pi \in S_n$ and a subset
$A = \{a_1, \dots, a_k\} \subseteq [n]$, we can define a permutation $\pi[A] \in S_k$ by letting $\pi[A]_i < \pi[A]_j$ iff $\pi_{a_i} < \pi_{a_j}$. For a permutation $\tau \in S_k$, let $\Lambda(\tau, \pi)$ denote the number of sets A with $\pi[A] = \tau$, and define the density of τ in π by $t(\tau, \pi) = \Lambda(\tau, \pi)/\binom{n}{k}$. A sequence of permutations $\pi_1, \pi_2, \dots$ (on larger and larger sets) is convergent if for every permutation τ, the number $t(\tau, \pi_n)$ tends to a limit as $n \to \infty$. Every convergent permutation sequence has a limit object in the form of a coupling measure on $[0,1]^2$, which is uniquely determined. Král and Pikhurko [2012] have used this machinery of limit objects to prove a conjecture of Graham on permutations.

I have already mentioned the limit theory of metric spaces due to Gromov [1999]. While developed with quite different applications in mind, this turns out to be closely related to our theory of graph limits. Gromov considers metric spaces endowed with a probability measure, and defines distance, convergence and limit notions for them. A simple graph G can be considered as a special case, where the distance of two adjacent nodes is 1/2, the distance of two nonadjacent nodes is 1, and the probability distribution on the nodes is uniform. Under this correspondence, our notion of graph convergence is a special case of Gromov's "sample convergence" of metric spaces. Vershik [2002, 2004] considers random metric spaces on countable sets, and defines and proves their universality. He also characterizes isomorphism of metric spaces with measures in terms of sampling, analogously to Theorem 13.10. In a recent paper, Elek [2012b] explores this connection and shows how Gromov's notions imply results about graph convergence, and also how results about graph limits inspire answers to some questions about metric spaces. Perhaps Gromov's theory can be applied to graph sequences that are not dense, using the standard distance between nodes in the graph.

One of the earliest limit theories is John von Neumann's theory of continuous geometries. The idea here is that if we look at higher and higher dimensional vector spaces over (say) the real field, then the obvious notion of their limit is the Hilbert space. But suppose we are interested in the behavior of subspaces whose dimension is proportional to the dimension of the whole space. Going to the Hilbert space, this condition becomes meaningless. Neumann constructed a limit object, called a continuous geometry, in which the "dimensions" of subspaces are real numbers between 0 and 1. This construction can be extended to certain geometric lattices (Björner and Lovász [1987]), but its connection with the theory in this book has not been explored.

Perhaps most interesting from the point of view of quasirandomness and limits are sequences of integers, due to their role in number theory. (After all, Szemerédi's Regularity Lemma was inspired by his solution of the Erdős–Turán problem on arithmetic progressions in dense sequences of integers.) Often sequences are considered modulo n; this gives a finite group structure to work with, while one does not lose much in generality. Ever since the solution of the Erdős–Turán problem for 3-term arithmetic progressions by Roth [1952], through the general solution by Szemerédi [1975], through the work of Gowers [2001] on "Gowers norms", to the celebrated result of Green and Tao [2008] on arithmetic progressions of primes, a central issue has been to define and measure how random-like a set of integers is.
I will not go into this large literature; Tao [2006c] and Kra [2005] give accessible accounts of it. What I want to point out is the exciting asymptotic theory of structures consisting of an abelian group together with a subset of its elements, and more generally, abelian groups with a function defined on them. There has been a
lot of parallel developments in this area, most notably the work of Green, Tao and Ziegler [2011] and of Szegedy [2012a]. Not surprisingly, the latter is closer to the point of view taken in this book, and develops a theory of limit objects of functions on abelian groups, which is full of surprises but also of powerful results. (For example, to describe the limits of abelian groups, non-abelian groups are needed!) The theory has connections with number theory, ergodic theory, and higher-order Fourier analysis. This explains why I cannot go into the details, and can only refer to the papers.
APPENDIX A
A.1. Möbius functions

Let L be a finite lattice (for us, it will be either the lattice of all subsets, or the lattice of all partitions, of a finite set V). The Möbius function of the lattice is a function µ : L × L → R, defined by the equations
$$\mu(x,y) = 0 \ \text{ if } x \not\le y, \qquad \sum_{x \le z \le y} \mu(x,z) = 1(x=y).$$
This is perhaps easier to understand in a matrix algebra setting. Let M(L) denote the set of L × L matrices A in which $A_{xy} = 0$ for any two lattice elements $x \not\le y$. It is easy to see that M(L) is closed under addition, matrix multiplication and matrix inverse (if an inverse exists), and so it is a matrix algebra. One special matrix of importance is the zeta matrix Z ∈ M(L) defined by $Z_{xy} = 1(x \le y)$. Clearly Z is invertible, and $M = Z^{-1}$ is a matrix with integer entries, called the Möbius matrix. The entries of M give the Möbius function: $M_{xy} = \mu(x, y)$.

For every function f : L → C, we define its (upper) summation function $g(x) = \sum_{y \ge x} f(y)$. From g, we can recover the function f by the formula $f(x) = \sum_{y \ge x} \mu(x,y)\, g(y)$. This is again better seen in matrix form: we consider f and g as vectors in $C^L$; then g = Zf, which is equivalent to $f = Z^{-1}g = Mg$. Of course, we can turn the lattice upside down and derive similar formulas for the lower summation.

The following simple but very useful matrix identity is due to Lindström [1969] and Wilf [1968]. Let f : L → R be any function, and let $A_f$ be the L × L matrix with $(A_f)_{xy} = f(x \vee y)$. Then
$$(A.1) \qquad A_f = Z\,\mathrm{diag}(Mf)\,Z^T.$$
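The matrix formulation is easy to verify computationally. Here is a small sketch (plain NumPy; the test function is an arbitrary choice of ours) on the lattice of subsets of a 3-element set: it checks the formula for the Möbius function stated in Example A.1 below, and the identity (A.1).

```python
import numpy as np
from itertools import chain, combinations

# The lattice of subsets of {0, 1, 2}, listed in order of increasing size.
ground = (0, 1, 2)
L = [frozenset(S) for S in chain.from_iterable(
    combinations(ground, k) for k in range(len(ground) + 1))]
n = len(L)

# Zeta matrix Z_{xy} = 1(x <= y) and Moebius matrix M = Z^{-1}.
Z = np.array([[int(x <= y) for y in L] for x in L])
M = np.round(np.linalg.inv(Z)).astype(int)

# mu(X, Y) = (-1)^{|Y \ X|} whenever X <= Y (cf. Example A.1 below).
assert all(M[i, j] == (-1) ** len(L[j] - L[i])
           for i in range(n) for j in range(n) if L[i] <= L[j])

# The Lindstroem-Wilf identity (A.1), for the test function f(x) = |x| + 1.
f = np.array([len(x) + 1 for x in L])
A_f = np.array([[f[L.index(x | y)] for y in L] for x in L])
assert (A_f == Z @ np.diag(M @ f) @ Z.T).all()
print("mu and (A.1) verified on the subset lattice")
```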
An important consequence of this identity states that $A_f$ is positive semidefinite if and only if the Möbius inverse of f is nonnegative.

Example A.1. If L is the lattice of subsets of a finite set S, then $\mu(X,Y) = (-1)^{|Y \setminus X|}$ for all X ⊆ Y ⊆ S. Möbius inversion is equivalent to the inclusion-exclusion formula in this case.

Example A.2. Consider the lattice of partitions $\Pi_n$ of the finite set [n], where the bottom element is the discrete partition $P_0$ (with n classes), the top element is the indiscrete partition $P_1$ (with one class), and P ≤ Q means that P refines Q. The Möbius function of this lattice is given by the Frucht–Rota–Schützenberger Formula
$$(A.2) \qquad \mu_P = \mu(0, P) = (-1)^{n-|P|} \prod_{S \in P} (|S| - 1)!$$
where |P| denotes the number of classes of the partition P. (This easily implies a formula for µ(Q, P), but we won't need it.)

For the partition lattice, we need some simple identities: for every $P \in \Pi_n$,
$$(A.3) \qquad \sum_{R \ge P} (x)_{|R|} = x^{|P|}.$$
By Möbius inversion,
$$(A.4) \qquad \sum_{P} \mu_P\, x^{|P|} = (x)_n,$$
and from the Lindström–Wilf Formula,
$$(A.5) \qquad \sum_{P,Q} \mu_P \mu_Q\, x^{|P \vee Q|} = (x)_n.$$
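These identities can be checked by brute force for small n. A minimal sketch (plain Python; the partition generator is ours), verifying (A.4) via the formula (A.2) for n = 4:

```python
from math import factorial, prod

def partitions(s):
    """All partitions of the list s into nonempty blocks."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for P in partitions(rest):
        for i in range(len(P)):           # put `first` into an existing block
            yield P[:i] + [[first] + P[i]] + P[i + 1:]
        yield [[first]] + P               # or into a new singleton block

def mu(P, n):
    """The Moebius value (A.2): (-1)^{n-|P|} * prod_{S in P} (|S|-1)!."""
    return (-1) ** (n - len(P)) * prod(factorial(len(S) - 1) for S in P)

# Check (A.4): sum_P mu_P x^{|P|} = x(x-1)...(x-n+1), here for n = 4, x = 7.
n, x = 4, 7
lhs = sum(mu(P, n) * x ** len(P) for P in partitions(list(range(n))))
print(lhs, prod(x - i for i in range(n)))  # both 840
```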
See Van Lint and Wilson [1992] for more on the Möbius function of a lattice.

A.2. The Tutte polynomial

Several important graph invariants can be expressed in terms of the Tutte polynomial of the graph G = (V, E) (which may have loops and multiple edges). The quickest way to define it is by the following formula. Let c(A) (A ⊆ E) denote the number of connected components of the graph (V, A) (including the isolated nodes); in particular, c(E) is the number of components of G. We define
$$(A.6) \qquad \mathrm{tut}(G; x, y) = \sum_{A \subseteq E} (x-1)^{c(A)-c(E)} (y-1)^{c(A)+|A|-v(G)}.$$
This definition does not in any way indicate the many uses this polynomial has. The recurrence relation
$$(A.7) \qquad \mathrm{tut}(G; x, y) = \mathrm{tut}(G - e; x, y) + \mathrm{tut}(G/e; x, y),$$
where e ∈ E(G) is any edge that is not a cut-edge or a loop, says much more (here G/e denotes the graph obtained from G by contracting e, i.e., deleting one copy of e and identifying its endpoints). If G has i loops and j cut-edges, and no other edges, then $\mathrm{tut}(G; x, y) = x^j y^i$. The Tutte polynomial is multiplicative over connected components. There are many graph invariants that satisfy recurrence (A.7) (or some very similar recurrence), and these can be expressed as substitutions into the Tutte polynomial (or some slight modification of it).

One often uses the following version of the Tutte polynomial, sometimes called the cluster expansion polynomial:
$$(A.8) \qquad \mathrm{cep}(G; u, v) = \sum_{A \subseteq E(G)} u^{c(A)} v^{|A|}.$$
This differs from the usual Tutte polynomial tut(x, y) on two counts: first, instead of the variables x and y, we use u = (x − 1)(y − 1) and v = y − 1; second, we scale by $u^{c(E)} v^{|V|}$. The cluster expansion polynomial satisfies the following identities: (a) cep(G; u, v) = v·cep(G/e; u, v) + cep(G − e; u, v) for all edges e that are not loops; (b) cep(G; u, v) = u·cep(G − i; u, v) if i is an isolated node, and cep(G; u, v) = $u(1+v)^{e(G)}$ if G is a graph consisting of a single node. These relations determine the value of
the polynomial for any substitution. (See e.g. Welsh [1993] for more on the Tutte polynomial.)

Chromatic polynomial. Let G = (V, E) be a multigraph with n nodes. For every nonnegative integer q, we denote by chr(G, q) the number of q-colorings of G (in the usual sense, where adjacent nodes must be colored differently). Clearly chr(G, q) does not depend on the multiplicities of edges (as long as these multiplicities are positive), and chr(G, q) = 0 if G has a loop. Let $\mathrm{chr}_0(G, k)$ denote the number of k-colorings of G in which all colors occur. Then clearly
$$(A.9) \qquad \mathrm{chr}(G, q) = \sum_{k=0}^{v(G)} \mathrm{chr}_0(G, k) \binom{q}{k}.$$
This implies that chr(G, q) is a polynomial in q with leading term $q^n$ and constant term 0, which is called the chromatic polynomial of G. One can evaluate this polynomial at non-integral values of q, when it has no direct combinatorial meaning. We define $\mathrm{chr}(K_0, q) = 1$. It is easy to see that if q is a positive integer, then for every e ∈ E(G),
$$(A.10) \qquad \mathrm{chr}(G, q) = \mathrm{chr}(G - e, q) - \mathrm{chr}(G/e, q).$$
Since this equation between polynomials holds for infinitely many values of q, it holds identically. If i is an isolated node of G, then we have chr(G, q) = q·chr(G − i, q). From these recurrence relations a number of properties of the chromatic polynomial are easily proved, for example, that its coefficients alternate in sign. Most importantly, they imply that the chromatic polynomial is a special substitution of the cluster expansion polynomial: chr(G, q) = cep(G; q, −1). From formula (A.8) we get
$$(A.11) \qquad \mathrm{chr}(G; q) = \sum_{A \subseteq E(G)} (-1)^{|A|} q^{c(A)}.$$
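The subset expansion (A.11) is easy to evaluate for small graphs. A minimal sketch (plain Python; the names are ours) comparing it against a direct count of proper colorings:

```python
from itertools import combinations, product

def components(nodes, edges):
    """Number of connected components of (nodes, edges), isolated nodes included."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in nodes})

def chr_poly(nodes, edges, q):
    """Evaluate chr(G, q) via the subset expansion (A.11)."""
    return sum((-1) ** k * q ** components(nodes, A)
               for k in range(len(edges) + 1)
               for A in combinations(edges, k))

def proper_colorings(nodes, edges, q):
    """Direct count of proper q-colorings, for comparison."""
    return sum(all(c[a] != c[b] for a, b in edges)
               for c in ({v: col[i] for i, v in enumerate(nodes)}
                         for col in product(range(q), repeat=len(nodes))))

G = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 0), (2, 3)])  # triangle plus a pendant edge
print([chr_poly(*G, q) for q in range(5)])            # [0, 0, 0, 12, 72]
assert all(chr_poly(*G, q) == proper_colorings(*G, q) for q in range(5))
```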
The coefficient of the linear term in the chromatic polynomial is called the chromatic invariant of the graph. It will be convenient to consider this quantity with an adjusted sign:
$$\mathrm{cri}(G) = \sum_{G'} (-1)^{e(G') - v(G) + 1},$$
where G′ ranges through all connected spanning subgraphs of G. It follows from (A.10) that if G is a simple graph, then for every e ∈ E(G),
$$(A.12) \qquad \mathrm{cri}(G) = \mathrm{cri}(G - e) + \mathrm{cri}(G/e).$$
This implies by induction that cri(G) > 0 if G is connected and cri(G) = 0 if G is disconnected.

Spanning trees. Let tree(G) denote the number of spanning trees in the graph G. This parameter has played an important role in the development of algebraic graph theory; formulas for its computation go back to the work of Kirchhoff in the mid-19th century. The number of spanning trees satisfies the recurrence relation
$$(A.13) \qquad \mathrm{tree}(G) = \mathrm{tree}(G - e) + \mathrm{tree}(G/e)$$
for every edge that is not a loop. It is best to define $\mathrm{tree}(K_1) = 1$ and $\mathrm{tree}(K_0) = 0$. One gets by direct substitution in (A.6) that for every connected graph G, tree(G) = tut(G; 1, 1). There are many other expressions in the literature for tree(G). Perhaps the best known is Kirchhoff's Formula (also called the Matrix Tree Theorem), saying that tree(G) is equal to any cofactor of the Laplacian $L_G = D_G - A_G$ (here $A_G$ is the adjacency matrix of G and $D_G$ is the diagonal matrix composed of the degrees). There are many useful inequalities for tree(G), of which we mention two: the trivial bound
$$(A.14) \qquad \mathrm{tree}(G) \le \prod_{u} d_G(u),$$
and the relation with the chromatic invariant, which follows easily by induction from the recurrences (A.13) and (A.12):
$$(A.15) \qquad 0 \le \mathrm{cri}(G) \le \mathrm{tree}(G).$$
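Kirchhoff's Formula gives a polynomial-time way to compute tree(G). A minimal sketch (plain NumPy; the names are ours), tested against Cayley's formula for $K_4$:

```python
import numpy as np

def tree_count(n, edges):
    """Spanning trees via Kirchhoff's Formula: any cofactor of the
    Laplacian L = D - A; here we delete row and column 0."""
    L = np.zeros((n, n))
    for a, b in edges:
        L[a, a] += 1
        L[b, b] += 1
        L[a, b] -= 1
        L[b, a] -= 1
    return round(np.linalg.det(L[1:, 1:]))

# K_4: Cayley's formula gives 4^{4-2} = 16 spanning trees.
K4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
print(tree_count(4, K4))  # 16
```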
Nowhere-zero flows. Let flo(G, q) denote the number of nowhere-zero q-flows. To be precise, we fix an orientation of the edges for any graph G, and count maps $f : \vec{E}(G) \to \mathbb{Z}_q \setminus \{0\}$ such that at every node, the sum of flow values on the edges entering it is equal to the sum of flow values on the edges leaving it. This number is given by $|\mathrm{tut}(G; 0, 1-q)|$.

A.3. Some background in probability and measure theory

A.3.1. Probability spaces. We have to fix some terminology. A probability space is a triple (Ω, A, π), where A is a sigma-algebra on the set Ω, and π is a probability measure on A. We say that the space is separating if for any two elements of Ω there is a set in A containing exactly one of them. The space is countably generated if there is a countable subset J of A generating A (in other words, A is the smallest sigma-algebra containing J). It is often convenient to assume (which we can do for free) that J is a set algebra, i.e., that it is closed under intersection and complementation. An atom of the space is a singleton with positive measure.

Two probability spaces $(\Omega_i, \mathcal A_i, \pi_i)$ (i = 1, 2) are isomorphic if there is an invertible map $\varphi : \Omega_1 \to \Omega_2$ that gives a bijection between $\mathcal A_1$ and $\mathcal A_2$ and preserves the measure. Two probability spaces $(\Omega_i, \mathcal A_i, \pi_i)$ (i = 1, 2) are isomorphic up to nullsets if one can delete sets $X_i \subseteq \Omega_i$ of measure 0 so that the remaining probability spaces are isomorphic.

From the point of view of basic constructions in probability (independence, expectation and variance of random variables, etc.) the underlying probability space does not matter much, at least as long as it is atom-free. But for more advanced technical work, one likes to work with a robust class of spaces with nice properties. A Borel sigma-algebra (also called a standard Borel space) is a sigma-algebra isomorphic to the sigma-algebra of Borel subsets of a Borel set in R. It can be shown that this definition would not change if instead of subsets of R we allowed subsets of $R^n$, or indeed of any separable complete metric space. A Borel probability space is a probability space defined on a Borel sigma-algebra. Equivalently (this is nontrivial), it is isomorphic up to nullsets to the disjoint union of a closed interval (with the Borel sets and the Lebesgue measure) and a countable set of atoms. Every Borel space is countably generated and separating. Every finite probability space
is Borel. A standard probability space (with small variations, also called a Lusin, Lebesgue or Rokhlin space) is the completion of a Borel probability space (i.e., we add all subsets of sets of measure 0 to the sigma-algebra). Standard probability spaces have many useful properties, some of which will be mentioned below; in a sense, they behave as you would expect them to behave. In this sense Borel (or standard) spaces are quite special. On the other hand, they are general enough so that we can restrict our attention to them; this is due to the following fact:

Proposition A.3. Every probability space on a countably generated separating sigma-algebra can be embedded into a Borel space, in the sense that it is isomorphic up to nullsets to the restriction of a Borel space to a subset with outer measure 1.

A.3.2. Measure preserving maps. Let $(\Omega_i, \mathcal A_i, \pi_i)$ (i = 1, 2) be probability spaces. A map $\varphi : (\Omega_1, \mathcal A_1, \pi_1) \to (\Omega_2, \mathcal A_2, \pi_2)$ is measure preserving if $\varphi^{-1}(A) \in \mathcal A_1$ for every $A \in \mathcal A_2$, and $\pi_1\bigl(\varphi^{-1}(A)\bigr) = \pi_2(A)$. (So the name is a bit misleading, because it is $\varphi^{-1}$ rather than φ that preserves measure.) A measure preserving map is not necessarily bijective; for example, the map [0, 1] → [0, 1] defined by x ↦ 2x mod 1 is measure preserving. We say that a measure preserving map φ is invertible if it is bijective and $\varphi^{-1}$ is also measure preserving. If $\varphi : (\Omega_1, \mathcal A_1, \pi_1) \to (\Omega_2, \mathcal A_2, \pi_2)$ is measure preserving, then for every integrable function $f : (\Omega_2, \mathcal A_2, \pi_2) \to \mathbb{R}$ we have
$$(A.16) \qquad \int_{\Omega_1} f\bigl(\varphi(x)\bigr)\, d\pi_1(x) = \int_{\Omega_2} f(x)\, d\pi_2(x).$$
Let $\overline S_{[0,1]}$ denote the semigroup of measure preserving maps [0, 1] → [0, 1], and let $S_{[0,1]}$ be the group of invertible measure preserving maps [0, 1] → [0, 1]. One of the most important properties of standard probability spaces is that under mild conditions, their measure preserving images are also standard.

Proposition A.4. Let $(\Omega_1, \mathcal A_1, \pi_1)$ be a standard probability space and let $(\Omega_2, \mathcal A_2, \pi_2)$ be another probability space where $\mathcal A_2$ has a countable subset separating any two points of $\Omega_2$. Let $\varphi : \Omega_1 \to \Omega_2$ be a measure preserving map. Then $(\Omega_2, \mathcal A_2, \pi_2)$ is standard, and $\Omega_2' = \Omega_2 \setminus \varphi(\Omega_1)$ has measure 0. Furthermore, if φ is bijective, then $\varphi^{-1}$ is an isomorphism $(\Omega_2', \mathcal A_2|_{\Omega_2'}, \pi_2|_{\Omega_2'}) \to (\Omega_1, \mathcal A_1, \pi_1)$. In particular, $\varphi^{-1}$ is also measure preserving.

Remark A.5. It is usually a matter of taste or convenience whether we decide to work on a complete space or on a countably generated space. One tends to be sloppy about this, and just say, for example, that the underlying probability space is [0, 1], without specifying whether we mean the sigma-algebra of Borel sets or of Lebesgue measurable sets. Often, one implicitly assumes that the Borel sigma-algebra is defined as the set of Borel sets in a Polish space, and uses topological notions like open sets or continuous functions to define measure theoretic notions. This is sometimes unavoidable (see e.g. the definition of weak convergence below), but the same Borel sigma-algebra can be defined by very different topological spaces, and this is important in some cases even in this book. I will use this topological representation only where it is necessary.
A.3.3. The space of measures. Let T be a topological space, and let P(T) denote the set of probability measures on the Borel subsets of T. We say that a sequence of measures $\mu_1, \mu_2, \dots \in P(T)$ converges weakly to a probability measure $\mu \in P(T)$ if
$$\int_T f\, d\mu_n \to \int_T f\, d\mu \qquad (n \to \infty)$$
for every continuous bounded function $f : T \to \mathbb{R}$. Most often we need this notion in the case when T is a compact metric space, so we don't have to assume the boundedness of f. This notion of convergence defines a topology on P(T), which we call the topology of weak convergence. By Prokhorov's Theorem (see e.g. Billingsley [1999]; this is not the most general form), for a compact metric space K, the space P(K) is compact in the topology of weak convergence, and also metrizable. (One can describe explicit metrizations, like the Lévy–Prokhorov metric, but we don't need them.)

There is an important warning about weak convergence: it is not a purely measure theoretic notion, but a topological one. In other words, we can have a sequence of measures on a Borel sigma-algebra (Ω, B) that is weakly convergent if we put one topology on Ω with the given Borel sets, but not convergent if we put another such topology on Ω. Sometimes we play with this, and change the topology (without changing its Borel sets) to suit our needs.

A.3.4. Coupling. A coupling measure between two probability spaces $(\Omega_i, \mathcal A_i, \pi_i)$ (i = 1, 2) is a probability measure µ on the sigma-algebra $(\Omega_1, \mathcal A_1) \times (\Omega_2, \mathcal A_2)$ whose marginals are $\pi_1$ and $\pi_2$, i.e., $\mu(A_1 \times \Omega_2) = \pi_1(A_1)$ and $\mu(\Omega_1 \times A_2) = \pi_2(A_2)$ for all $A_i \in \mathcal A_i$. In terms of random variables, a coupling measure is the distribution of a pair $(X_1, X_2)$, where $X_i$ has distribution $\pi_i$. The simplest coupling measure is the product measure $\pi_1 \times \pi_2$, corresponding to choosing $X_1$ and $X_2$ independently. If $(\Omega_1, \mathcal A_1, \pi_1) = (\Omega_2, \mathcal A_2, \pi_2)$, then the measure concentrated on the diagonal $\{(x,x) : x \in \Omega_1\}$, defined by $\mu\{(x,x) : x \in A\} = \pi_1(A)$, is another coupling measure.

Suppose that $(\Omega_i, \mathcal A_i, \pi_i)$ is the sigma-algebra of Borel sets in a compact metric space $K_i$. It is easy to see that if we fix the marginal distributions, the set of coupling measures forms a closed (and hence compact) subspace of $P(K_1 \times K_2)$. This space is in fact much nicer than the space of all measures, as the following proposition shows (for a proof, see [Notes]).

Proposition A.6. Let $K_1, K_2$ be compact metric spaces and let $(K_i, \mathcal B_i, \lambda_i)$ be probability spaces on their Borel sets. Let $\mu_1, \mu_2, \dots$ and µ be coupling measures between $(K_1, \mathcal B_1, \lambda_1)$ and $(K_2, \mathcal B_2, \lambda_2)$. Then the following are equivalent:
(i) $\mu_n \to \mu$ weakly;
(ii) $\mu_n(B_1 \times B_2) \to \mu(B_1 \times B_2)$ for all sets $B_i \in \mathcal B_i$;
(iii) $\int_{K_1 \times K_2} f\, d\mu_n \to \int_{K_1 \times K_2} f\, d\mu$ for every function $f : K_1 \times K_2 \to \mathbb{R}$ that is the limit of a uniformly convergent sequence of stepfunctions;
(iv) there are measurable functions $f_n, f : [0,1] \to K_1 \times K_2$ such that $\mu_n(X) = \lambda\bigl(f_n^{-1}(X)\bigr)$, $\mu(X) = \lambda\bigl(f^{-1}(X)\bigr)$, and $f_n \to f$ almost everywhere.

The following construction of coupling measures follows from Proposition 3.8 of Kellerer [1984].
Proposition A.7. Let $(\Omega_i, \mathcal A_i, \pi_i)$ (i = 0, 1, 2) be standard probability spaces. Let $\varphi_i : \Omega_i \to \Omega_0$ (i = 1, 2) be measure preserving maps. Then there exists a coupling µ of $(\Omega_1, \mathcal A_1, \pi_1)$ and $(\Omega_2, \mathcal A_2, \pi_2)$ such that $\mu\{(x_1, x_2) : \varphi_1(x_1) = \varphi_2(x_2)\} = 1$.

A.3.5. Markov chains. Markov chains are very basic material in probability theory, but they are usually defined in a more restrictive setting than what we need, so let us give a brief introduction. A Markov chain is described by a σ-algebra (Ω, A) (the state space), together with a system of probability distributions $(P_u : u \in \Omega)$ on (Ω, A) such that $P_u(A)$ is a measurable function of u for each A ∈ A (the transition distributions). We call a probability distribution π on (Ω, A) stationary if
$$\int_\Omega P_u(A)\, d\pi(u) = \pi(A)$$
for all A ∈ A. If the state space is finite, then the Markov chain always has a stationary distribution. In the general case, this is not always true. One sufficient condition for the existence is that (Ω, A) is the sigma-algebra of Borel sets in a compact Hausdorff space K, and the map $u \mapsto P_u$ is continuous as a map from K into P(K) with the weak topology.

The more usual description of a Markov chain as a sequence of random variables is obtained if we also specify a starting distribution σ on (Ω, A). We start with an $X_0 \in \Omega$ from the distribution σ, and generate $X_{n+1}$ as a random element of Ω from the distribution $P_{X_n}$. We will call the sequence $(X_0, X_1, X_2, \dots)$ a random walk on Ω. If the Markov chain has a stationary distribution π, and $X_0$ is randomly chosen according to π, then every $X_n$ is also from the stationary distribution, and we call the sequence $(X_0, X_1, X_2, \dots)$ a stationary walk. Every Markov chain defines a probability measure ψ on Ω × Ω by
$$\psi(A \times B) = \int_A P_u(B)\, d\pi(u).$$
We can think of ψ(A × B) as the frequency with which a stationary walk steps from A to B. We call the measure ψ the step distribution of the Markov chain. We say that the Markov chain is reversible if ψ(A × B) = ψ(B × A) for any two measurable sets A, B.

A.3.6. Martingales. A (finite or infinite) sequence $(X_1, X_2, \dots)$ of real valued random variables is called a martingale if for all k ≥ 0 we have $E(|X_k|) < \infty$ and $E(X_{k+1} \mid X_1, \dots, X_k) = X_k$. More generally (and not quite logically), the sequence is called a supermartingale if $E(X_{k+1} \mid X_1, \dots, X_k) \le X_k$. A submartingale is defined analogously. It is often convenient to define $X_0 = E(X_1)$ (so this is a random variable that is concentrated on a single value). Clearly all expectations in a martingale are the same: $X_0 = E(X_1) = E(X_2) = \dots$ For a supermartingale, the expectations form a non-increasing sequence.

Example A.8. Let $Y_1, Y_2, \dots$ be independent random variables such that $E(Y_k) = 0$. Then the random variables $X_k = Y_1 + \cdots + Y_k$ form a martingale. The condition that $E(Y_k) = 0$ can of course be arranged, as soon as the expectations exist, by subtracting its expectation from each $Y_k$, which does not influence the independence of these variables. So the results of martingale theory can be
applied to the partial sums of any sequence of independent random variables with finite expectations.

Many applications in combinatorics use martingales through the following construction.

Example A.9 (Doob's Martingale). Let (Ω, A, π) be a probability space and let $f : \Omega^n \to \mathbb{R}$ be an integrable function. Let $Y_1, \dots, Y_n$ be independent random elements of Ω from the distribution π, and let $X_k = E\bigl(f(Y_1, \dots, Y_n) \mid Y_1, \dots, Y_k\bigr)$. Then $(X_1, \dots, X_n)$ is a martingale.

Example A.10. Let f : [0, 1] → R be an integrable function, and let $\mathcal P_1, \mathcal P_2, \dots$ be a sequence of partitions of [0, 1] into a finite number of measurable parts such that $\mathcal P_{n+1}$ is a refinement of $\mathcal P_n$. Let Y ∈ [0, 1] be a uniform random point, and consider the sequence $X_k = f_{\mathcal P_k}(Y)$. Then $(X_1, X_2, \dots)$ is a martingale. Instead of [0, 1], we could of course consider any probability space, for example $[0,1]^2$, which shows the connection of martingales with the stepping operator.

There are (at least) three theorems on martingales that are relevant for combinatorial applications; these play an important role in our book as well. Let $(X_0, X_1, \dots)$ be a sequence of random variables. A random variable T with nonnegative integral values is called a stopping time (for the sequence $(X_0, X_1, \dots)$) if for every k ≥ 0, the event T = k, conditioned on $X_1, \dots, X_k$, is independent of the variables $X_{k+1}, X_{k+2}, \dots$ (In computer science, this is often called a stopping rule: we decide whether we want to stop after k steps depending on the values of the variables we have seen before, possibly using some new independent coin flips.) The Martingale Stopping Theorem (a.k.a. Optional Stopping Theorem) has many versions, of which we state one:

Theorem A.11. Let $(X_1, X_2, \dots)$ be a supermartingale for which $|X_{m+1} - X_m|$ is bounded (uniformly for all m), and let T be a stopping time for which E(T) is finite. Then $E(X_T) \le X_0$.

For a martingale, we have equality in the conclusion, and for a submartingale, we have the reverse inequality. The following fact is called the Martingale Convergence Theorem (again, we don't state it in its most general form).

Theorem A.12. Let $(X_1, X_2, \dots)$ be a martingale such that $\sup_n E(|X_n|) < \infty$. Then $(X_1, X_2, \dots)$ is convergent with probability 1.

Applying this theorem to the martingale in Example A.10, we get that if f is integrable, then the functions $f_{\mathcal P_k}$ tend to a limit almost everywhere. This limit may not be the function f itself, but it is equal to f almost everywhere if any two points of [0, 1] are separated by one of the partitions $\mathcal P_n$ (cf. also Proposition 9.8).

If we want to prove that a random variable is highly concentrated around its average, most of the time we use Azuma's Inequality (or one of its corollaries).

Theorem A.13. Let $(X_1, X_2, \dots)$ be a martingale such that $|X_{m+1} - X_m| \le 1$ for every m ≥ 0. Then
$$P(X_m > X_0 + \lambda) < e^{-\lambda^2/(2m)}.$$

Applying Azuma's Inequality to the martingale $(-X_1, -X_2, \dots)$, we can bound the probability that $X_m < X_0 - \lambda$.
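Azuma's Inequality is easy to test empirically. A minimal sketch (plain Python; the parameters are an arbitrary choice of ours) for the ±1 random walk, a martingale with unit increments:

```python
import math
import random

# Empirical check of Azuma's Inequality on the +-1 random walk,
# a martingale with X_0 = 0 and unit increments.
m, lam, trials = 400, 40, 20_000
exceed = sum(sum(random.choice((-1, 1)) for _ in range(m)) > lam
             for _ in range(trials)) / trials
print(exceed, "<=", math.exp(-lam ** 2 / (2 * m)))  # ~0.02 <= 0.135
```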
Applying it to the martingale in Example A.8, we get the following inequality, which (up to minor variations) is called Bernstein's, Chernoff's or Hoeffding's:

Corollary A.14. Let $X_1, X_2, \dots$ be i.i.d. random variables, and assume that $|X_i| \le 1$. Then
$$P\Bigl(\frac{1}{m}\sum_{i=1}^m X_i - E(X_1) > \varepsilon\Bigr) < e^{-\varepsilon^2 m/2}.$$

For us, it will be most convenient to use the following corollary of Azuma's Inequality, obtained by applying it to the martingale in Example A.9:

Corollary A.15. Let (Ω, A, π) be a probability space, and let $f : \Omega^n \to \mathbb{R}$ be a measurable function such that $|f(x_1, \dots, x_n) - f(y_1, \dots, y_n)| \le 1$ whenever $(x_1, \dots, x_n)$ and $(y_1, \dots, y_n)$ differ in one coordinate only. Let x be a random point of $\Omega^n$ (chosen according to the product measure). Then
$$P\bigl(f(x) - E(f(x)) > \varepsilon n\bigr) < e^{-\varepsilon^2 n/2}.$$

There are also reverse martingales. A sequence $(X_1, X_2, \dots)$ of real valued random variables is called a reverse martingale if for all k ≥ 0 we have $E(|X_k|) < \infty$ and $E(X_k \mid X_{k+1}, X_{k+2}, \dots) = X_{k+1}$. A finite reverse martingale is just a martingale backwards, but infinite reverse martingales are different. While reverse martingales don't seem to be as important as martingales, there is a very important example.

Example A.16. Let $(Y_1, Y_2, \dots)$ be i.i.d. real valued random variables with $E(|Y_i|) < \infty$, and let $X_k = (Y_1 + \cdots + Y_k)/k$. Then $(X_1, X_2, \dots)$ is a reverse martingale. So the partial sums $Y_1 + \cdots + Y_k$ form a martingale, but dividing by the number of terms, we get a reverse martingale. (The latter is a bit trickier to verify.)

The Martingale Convergence Theorem has an analogue for reverse martingales (which holds under more general conditions and is easier to prove):

Theorem A.17. Every reverse martingale is convergent with probability 1.

Applying this theorem to Example A.16, we can derive the Strong Law of Large Numbers. We refer to the book of Williams [1991] for more on martingales.

A.4. Moments and the moment problem

Throughout this book, we deal with kernels and graphons, which are functions in two variables. We define subgraph densities in them, we consider weak isomorphism and its correspondence with measure preserving changes of the variables, and we approximate them by stepfunctions, just to name a few analytic techniques with graph-theoretic significance. In this Appendix we summarize some analogous notions and results for functions in a single variable (see Feller [1971], Diaconis and Freedman [2004] for more). Some of these are used in the study of kernels; some others should serve as motivation for the problems and results in the main body of the book.
Let us consider the space $L_\infty[0,1]$ of bounded measurable functions $f : [0,1] \to [0,1]$. For such a function, we consider its moments
$$M_k(f) = \int_0^1 f(x)^k\, dx \qquad (k = 0, 1, 2, \dots).$$
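As a quick numerical illustration of the definition (plain NumPy; the particular function and grid are ours): composing f with a measure preserving map, here x ↦ 2x mod 1, leaves all moments unchanged; this is the phenomenon behind Proposition A.18 below.

```python
import numpy as np

# Composing f with a measure preserving map (here x -> 2x mod 1) leaves
# every moment M_k(f) unchanged; a quick check on a discretization grid.
grid = (np.arange(100_000) + 0.5) / 100_000
f = grid ** 2                       # an arbitrary function in L_infty[0,1]
g = ((2 * grid) % 1.0) ** 2         # the same function composed with x -> 2x mod 1
for k in range(1, 5):
    print(k, np.mean(f ** k), np.mean(g ** k))  # each pair agrees
```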
The moment sequence of a function determines it up to measure preserving transformations:

Proposition A.18. Two bounded measurable functions $f, g \in L_\infty[0,1]$ have the same moments if and only if there are measure preserving maps $\varphi, \psi \in \overline S_{[0,1]}$ such that $f \circ \varphi = g \circ \psi$ almost everywhere. Equivalently, there is a function $h \in L_\infty[0,1]$ and maps $\varphi, \psi \in \overline S_{[0,1]}$ such that $f = h \circ \varphi$ and $g = h \circ \psi$.

What makes this correspondence substantially easier to handle in the one-variable case than in the two-variable case is that each equivalence class of weak isomorphism contains a special element:

Proposition A.19 (Monotone Reordering Theorem). For every measurable function $f : [0,1] \to \mathbb{R}_+$ there is a monotone decreasing function $h : [0,1] \to \mathbb{R}_+$ and a map $\varphi \in \overline S_{[0,1]}$ such that $f = h \circ \varphi$. The function h is uniquely determined up to a set of measure 0.

Moment sequences can be characterized; this is called the Hausdorff Moment Problem. Given a sequence $(a_0, a_1, \dots)$ of nonnegative numbers, we define two infinite matrices H(a) and M(a) by $H(a)_{n,k} = \sum_{j=0}^{k} (-1)^j \binom{k}{j} a_{n+j}$ and $M(a)_{n,k} = a_{n+k}$. Using this notation, moment sequences can be characterized in different ways:

Proposition A.20. For a sequence $(a_n)$ of nonnegative numbers, the following are equivalent:
(i) $(a_n)$ is the moment sequence of a function in $L_\infty[0,1]$;
(ii) $a_0 = 1$ and $H(a) \ge 0$ (entry by entry);
(iii) $a_0 = 1$ and $M(a)$ is positive semidefinite.

We call a function $f \in L_\infty[0,1]$ a stepfunction if its range is finite. The set $f^{-1}(x)$ for any x in the range is called a step of f. Note that the monotone reordering of a stepfunction (in the sense of Proposition A.19) is a stepfunction in the more usual sense, whose steps are intervals. Moment sequences of stepfunctions can be expressed as finite sums of the form
$$M_k(f) = \sum_{i=1}^{r} \lambda(S_i) f(x_i)^k,$$
Conversely, every exponential sum
\[ s(k) = \sum_{i=1}^{r} a_i b_i^k \]
with $a_i > 0$ and $\sum_i a_i = 1$ can be thought of as the moment sequence of a stepfunction. An infinite sum of this type can also be represented as the moment sequence of a function (with countably many "steps"). Proposition A.18 implies that the values $s(k)$ of such an exponential sum uniquely determine the numbers $a_i$ and $b_i$.
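As a numerical illustration of this uniqueness (a hypothetical Python sketch, not from the book; the recovery method used is the classical Prony procedure), for $r = 2$ the values $s(0), \dots, s(3)$ already determine the $b_i$ through a linear recurrence and then the $a_i$ through a Vandermonde system:

```python
# Recovering (a_i, b_i) from an exponential sum s(k) = a1*b1^k + a2*b2^k.
import numpy as np

a = np.array([0.3, 0.7])          # a_i > 0 with sum 1
b = np.array([0.2, 0.7])          # distinct b_i
s = np.array([np.sum(a * b**k) for k in range(4)])

# b1, b2 are the roots of x^2 - c1*x - c0, where s satisfies the
# recurrence s(k+2) = c1*s(k+1) + c0*s(k); solve a 2x2 system for (c1, c0).
c1, c0 = np.linalg.solve(np.array([[s[1], s[0]],
                                   [s[2], s[1]]]), s[2:4])
b_rec = np.sort(np.roots([1.0, -c1, -c0]))

# With the b_i known, the a_i solve a Vandermonde system against s(0), s(1).
a_rec = np.linalg.solve(np.vander(b_rec, 2, increasing=True).T, s[:2])

print(b_rec, a_rec)   # approximately [0.2, 0.7] and [0.3, 0.7]
```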
This uniqueness fact is "self-refining" in the sense that the following seemingly stronger statement easily follows from it.

Proposition A.21. Let $a_i, b_i, c_i, d_i$ ($i = 1, 2, \dots$) be nonzero real numbers such that $b_i \ne b_j$ and $d_i \ne d_j$ for $i \ne j$. Assume that there is a $k_0 \ge 0$ such that for all $k \ge k_0$, the sums $\sum_{i=1}^{\infty} a_i b_i^k$ and $\sum_{i=1}^{\infty} c_i d_i^k$ are convergent and equal. Then the two sums are formally equal, i.e., there is a permutation $\pi$ of $\mathbb{N}$ such that $a_i = c_{\pi(i)}$ and $b_i = d_{\pi(i)}$.

Returning to stepfunctions with a finite number of steps, we note that they can be characterized in terms of their moment matrices:

Proposition A.22. A function is a stepfunction if and only if its moment matrix has finite rank. In this case, the rank of the moment matrix is the number of steps.

Stepfunctions are determined by a finite number of moments, and this fact characterizes them. To be more precise:

Proposition A.23. (a) Let $f \in L^\infty[0,1]$ be a stepfunction with $m$ steps, and let $g \in L^\infty[0,1]$ be another function such that $M_k(f) = M_k(g)$ for $k = 0, \dots, m$. Then $f$ and $g$ have the same moments.
(b) For every function $g \in L^\infty[0,1]$ and every $m \ge 0$ there is a stepfunction $f \in L^\infty[0,1]$ with $m$ steps so that $M_k(f) = M_k(g)$ for $k = 0, \dots, m-1$.

These results extend quite easily to functions $f: [0,1] \to [0,1]^d$; we only formulate those that we use in the book. Such a function is called a stepfunction if its range is finite. Its moments form not a sequence but an array with $d$ indices (a $d$-array for short). For $a = (a_1, \dots, a_d) \in \mathbb{N}^d$, the corresponding moment of $f = (f_1, \dots, f_d)$ is defined by
\[ M_a(f) = \int_0^1 f_1(x)^{a_1} \cdots f_d(x)^{a_d}\,dx. \]
For a $d$-array $A: \mathbb{N}^d \to \mathbb{R}$, we define its moment matrix $M = M(A)$ as the infinite symmetric matrix whose rows and columns are indexed by vectors in $\mathbb{N}^d$, with $M_{u,v} = A_{u+v}$. Positive semidefiniteness of the moment matrix does not characterize moment arrays for $d \ge 2$ in general, but it does when the function values are bounded by 1 (Berg, Christensen and Ressel [1976], Berg and Maserick [1984]):
Proposition A.24. A $d$-array $A$ is the moment array of some measurable function $f: [0,1] \to [-1,1]^d$ if and only if $A_{0\dots0} = 1$, $M(A)$ is positive semidefinite and $|A_v| \le 1$ for all $v \in \mathbb{N}^d$. Furthermore, $f$ is a stepfunction if and only if $M(A)$ has finite rank, and the rank of $M(A)$ is equal to the number of steps of $f$.

Again, stepfunctions are determined by their moments:

Proposition A.25. (a) Let $f: [0,1] \to [0,1]^d$ be a stepfunction with $m$ steps, and let $g: [0,1] \to [0,1]^d$ be another function such that $M_a(f) = M_a(g)$ for $a \in \{0, \dots, m\}^d$. Then there are measure preserving maps $\varphi, \psi \in S_{[0,1]}$ such that $f \circ \varphi = g \circ \psi$ almost everywhere. In particular, $g$ is a stepfunction, and $M_a(f) = M_a(g)$ for $a \in \mathbb{N}^d$.
(b) For every function $f: [0,1] \to [0,1]^d$ and every finite set $S \subseteq \mathbb{N}^d$, there is a stepfunction $g: [0,1] \to [0,1]^d$ with at most $|S| + 1$ steps so that $M_a(f) = M_a(g)$ for all $a \in S$.
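To make the rank and semidefiniteness statements concrete, here is a small numerical sketch (hypothetical Python code, not from the book): for a stepfunction with two steps, a truncated moment matrix $M(a)_{n,k} = a_{n+k}$ is positive semidefinite, in line with Proposition A.20(iii), and has rank equal to the number of steps, as Proposition A.22 predicts.

```python
# Truncated moment matrix of a stepfunction with two steps:
# f = 0.2 on a set of measure 0.3 and f = 0.7 on a set of measure 0.7,
# so M_k(f) = 0.3 * 0.2^k + 0.7 * 0.7^k.
import numpy as np

moments = np.array([0.3 * 0.2**k + 0.7 * 0.7**k for k in range(12)])
N = 6
M = np.array([[moments[n + k] for k in range(N)] for n in range(N)])

print("smallest eigenvalue:", np.linalg.eigvalsh(M).min())  # >= 0 up to rounding
print("rank:", np.linalg.matrix_rank(M, tol=1e-10))         # 2 = number of steps
```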
The next question would be to define moments for functions $f: [0,1]^d \to [0,1]$. Sequences and arrays no longer suffice here. For $d = 2$, the right amount of information is contained in a graph parameter, and the subgraph densities $t(F, f)$ show many properties analogous to the classical results described above. Theorem 13.10, Theorem 11.52 together with Proposition 14.61, Theorem 5.54 and Theorem 16.46 are analogues of Propositions A.18, A.20, A.22 and A.23(a). Other results (e.g., the Monotone Reordering Theorem A.19 or Proposition A.23(b)) do not seem to generalize to $d = 2$ in any natural way. The case $d \ge 3$ clearly corresponds to hypergraphs, where, as discussed in Section 23.3, new difficulties arise, and many of the interesting questions are open.

A.5. Ultraproduct and ultralimit

Ultrafilters. Let $\omega \subseteq 2^{\mathbb{N}}$. We say that $\omega$ is an ultrafilter if it is a filter (closed under supersets and under finite intersections) and for every $X \subseteq \mathbb{N}$, either $X \in \omega$ or $\mathbb{N} \setminus X \in \omega$, but not both. It follows that $\mathbb{N} \in \omega$ and $\emptyset \notin \omega$. (See Bell and Slomson [2006] for more on ultrafilters and the other constructions below.)

A trivial example of an ultrafilter is the set of subsets $X \subseteq \mathbb{N}$ containing a given element $n \in \mathbb{N}$; such an ultrafilter is called principal. There are also non-principal ultrafilters; their existence can be proved using Zorn's Lemma (i.e., the Axiom of Choice). From now on, we fix a non-principal ultrafilter $\omega$ (it does not matter which one). It is sometimes convenient to call the sets in $\omega$ Big, and the sets not in $\omega$ Small. (We capitalize to make a distinction from the informal use of these words.) The following properties are not hard to prove:

Proposition A.26. (a) The union of a finite number of Small sets is Small.
(b) The intersection of a finite number of Big sets is Big.
(c) Every finite set is Small.

Ultraproduct of sets. Let $(V_i : i \in \mathbb{N})$ be a sequence of sets. We say that two sequences $(a_i : i \in \mathbb{N})$ and $(b_i : i \in \mathbb{N})$ ($a_i, b_i \in V_i$) are $\omega$-equivalent if they differ only on a Small set of indices, i.e., if $\{i : a_i = b_i\} \in \omega$. (It is easy to see that this is an equivalence relation.) The ultraproduct of the sets $V_i$ (with respect to the ultrafilter $\omega$) is obtained from their cartesian product $\prod_{i \in \mathbb{N}} V_i$ by identifying $\omega$-equivalent sequences. We denote this ultraproduct by $\prod_\omega V_i$. Formally, the elements of $\prod_\omega V_i$ are $\omega$-equivalence classes of sequences in $\prod_{i \in \mathbb{N}} V_i$; we denote the $\omega$-equivalence class containing a sequence $a$ by $[a]$. It is not hard to see that the cardinality of the ultraproduct of a sequence of finite non-singleton sets is continuum.

Let $U_i \subseteq V_i$. Consider the set $U$ of sequences $(a_1, a_2, \dots) \in \prod_{i \in \mathbb{N}} V_i$ such that $a_i \in U_i$ for a Big set of indices $i$. It is clear that if a sequence belongs to $U$, then so does every $\omega$-equivalent sequence; the set of $\omega$-equivalence classes contained in $U$ will be denoted (with a little abuse of notation) by $\prod_\omega U_i$.

Ultraproduct of structures. Let $A_i = (V_i, R_i^1, \dots, R_i^k)$ be relational structures of the same type, where $R_i^j$ is a relation on $V_i$ with a finite number $r_j$ of variables for $j = 1, \dots, k$. Their ultraproduct $\prod_\omega A_i$ is defined as the relational structure $(V, R^1, \dots, R^k)$ of the same type, where $V = \prod_\omega V_i$ and for any $r_j$ sequences $x^p = (x^p_1, x^p_2, \dots)$ ($p = 1, \dots, r_j$) we have $([x^1], \dots, [x^{r_j}]) \in R^j$ if and only
if $(x^1_i, \dots, x^{r_j}_i) \in R_i^j$ for a Big set of indices $i$. It is easy to see that this definition is correct in the sense that whether $([x^1], \dots, [x^{r_j}]) \in R^j$ depends only on the equivalence classes $[x^1], \dots, [x^{r_j}]$ and not on which representatives are chosen from them. A very important property of ultraproducts of structures is stated in the following theorem:

Proposition A.27 (Łoś's Theorem). If every structure $A_i$ ($i = 1, 2, \dots$) satisfies a first-order sentence $\Phi$, then their ultraproduct also satisfies $\Phi$.

As a special case, we can look at a sequence of finite simple graphs $G_i = (V_i, E_i)$, i.e., finite sets $V_i$ with a symmetric irreflexive binary relation $E_i$. Their ultraproduct is also a simple graph: the symmetry and irreflexivity of the relation on the ultraproduct are easy to check (or they follow from Łoś's Theorem, since these properties can be expressed by first-order sentences: $\forall x \forall y\,(xy \in E \leftrightarrow yx \in E)$ and $\forall x\,(xx \notin E)$). If all the graphs have degrees bounded by $D$, then so does their ultraproduct, since this property can also be expressed by a first-order sentence.

Ultralimit of a numerical sequence. As a nice application of an ultrafilter $\omega$, we can associate a "limit" with every bounded sequence of numbers. (This is a special construction of a Banach limit of bounded sequences.) Let $(a_1, a_2, \dots)$ ($a_i \in [u, v]$) be a bounded sequence of real numbers. We say that a real number $a$ is the ultralimit of the sequence (in notation $\lim_\omega a_i = a$) if for every $\varepsilon > 0$, the set $\{i : |a_i - a| > \varepsilon\}$ is Small. (Note: ordinary convergence to $a$ would require that this set be finite.) It is not hard to prove that every bounded sequence of real numbers has a unique ultralimit. Furthermore, if $\lim_\omega a_i = a$ and $a_i \in [u, v]$ for every $i$, then $a \in [u, v]$.
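As a quick illustration, consider the sequence $a_i = (-1)^i$: exactly one of the sets of even and odd indices is Big, so
\[ \lim_\omega\,(-1)^i = \begin{cases} \;\;\,1 & \text{if the even indices form a Big set,}\\ -1 & \text{otherwise,} \end{cases} \]
and in either case the ultralimit exists, even though the ordinary limit does not.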
Ultraproduct of measures. Let $\mathcal{A}_i$ be a sigma-algebra on $V_i$ for $i = 1, 2, \dots$. The sets of the form $\prod_\omega A_i$ ($A_i \in \mathcal{A}_i$), considered as subsets of $V = \prod_\omega V_i$, form a Boolean algebra $\mathcal{B}$ (they are closed under finite union, intersection, and complementation). The Boolean algebra $\mathcal{B}$ generates a sigma-algebra on $V = \prod_\omega V_i$, which we denote by $\mathcal{A} = \prod_\omega \mathcal{A}_i$. Next, suppose that there is a probability measure $\pi_i$ on $(V_i, \mathcal{A}_i)$; then we define a setfunction on $\mathcal{B}$ by
\[ \pi\Big(\prod_\omega A_i\Big) = \lim_\omega \pi_i(A_i). \]
It is not hard to see that $\pi$ is finitely additive, and a bit harder to see that it is a measure on $\mathcal{B}$, i.e., that if $B_1 \supseteq B_2 \supseteq \cdots$ ($B_n \in \mathcal{B}$) and $\bigcap_{n=1}^{\infty} B_n = \emptyset$, then $\lim_n \pi(B_n) = 0$. Trivially $\pi(V) = 1$. It follows by Carathéodory's Measure Extension Theorem (see e.g. Halmos [1950]) that $\pi$ extends to a probability measure on $\mathcal{A}$ (which we also denote by $\pi$). Thus $(V, \mathcal{A}, \pi)$ is a probability space, which we call the ultraproduct of the probability spaces $(V_i, \mathcal{A}_i, \pi_i)$. We write $\pi = \prod_\omega \pi_i$. (This is a special case of a Loeb space; see Loeb [1979].)

A.6. Vapnik–Chervonenkis dimension

In probability theory, we often have to prove that out of a large number of "bad" events, with positive probability none happens. The trivial method (which is sufficient surprisingly often) is to use the union bound: we can draw this conclusion provided the sum of the probabilities of the bad events is less than 1.
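For reference, the union bound is the inequality
\[ P\Big(\bigcup_{i=1}^{N} A_i\Big) \le \sum_{i=1}^{N} P(A_i); \]
if the right-hand side is less than 1, then with positive probability none of the events $A_i$ occurs.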