# Triangle Removal Lemma

The triangle removal lemma shows that if a lot of edges must be removed to make a graph triangle-free, then there must have been a lot of triangles in the original graph. While this observation may not sound ground breaking, it is a very useful tool in combinatorics.

## The importance of triangles

Whether or not a graph contains any triangles (3 vertices which are all adjacent to each other) is relevant for a variety of questions in mathematics.

### Intersection graphs

The intersection graph of a family of lines $\mathcal{L}$ is the graph on the vertex set $\mathcal{L}$, where an edge exists between $l_1,l_2 \in \mathcal{L}$ if and only if $l_1$ and $l_2$ intersect. A famous question posed by Erdös in the 1970s is whether the chromatic number of intersections graphs of line segments is bounded by a function of their clique number (the maximum size of a clique in the graph). The answer, no, was recently published in this paper.

The authors prove that for every positive integer k, there exists an arrangement of a finite number of line segments so that the intersection graph of the line segments is triangle-free and has chromatic number greater than k. Think about constructing a graph with large chromatic number. Intuitively, the most reasonable way to force more colors is to have 2 intersecting lines, which are both different colors in a proper coloring, and then another line intersecting both of those. However, this is exactly a triangle. Thus the construction of this arrangement is by no means obvious and is of interest to those who study combinatorial geometry.

Also in the realm of incidence geometry, the triangle removal lemma is explicitly used in this 2008 paper by Solymosi. The author proves that when an arrangement of points and lines defines a lot of incidences (at least $cn^{4/3}$ as given by the Szemerédi-Trotter theorem), then there exists k points, which are not collinear or concurrent, so that any pair of them is incident to a line.

### Networks: phase transitions and independence number

In a network, the existence of triangles may be undesirable depending on the network’s purpose. For example, a random graph on n vertices without any cliques is exactly a triangle-free graph. It’s easy to prove using the second moment method that with high probability an Erdös-Rényi random graph on n vertices is triangle-free for $p = o(1/n)$, where $p$ is the probability of an edge between any 2 vertices.

Furthermore, Ajtai, Komlós, and Szemerédi showed that when a graph $G$ on $n$ vertices is triangle-free with maximum degree at most $d \geq 1$, it has a large independence number $\alpha(G) >\frac{n \cdot \log d}{8d}$. Thus knowing whether the underlying graph of a network is triangle-free provides confirmation of the existence of a large independent set in the network.

### Arithmetic progressions

This last result is actually a consequence of the theorem proven in this post: the triangle removal lemma. Roth’s lemma states that for $A \subset \mathbb{N}$ which has positive upper density, A contains an arithmetic progression of length 3. For reasons beyond my knowledge, as my knowledge of number theory is ε, arithmetic progressions with common differences of 3 or 4 are very important.

The triangle removal lemma shows that if a lot of edges must be removed to make a graph triangle-free, then there must have been a lot of triangles in the original graph.

The Triangle Removal Lemma For every ε>0 there exists δ>0 such that for any graph G on n vertices where at least εn² edges must be removed in order to make it triangle-free, G contains at least δn³ triangles.

Thus for the reasons listed above, and many more, the triangle removal lemma is a useful tool. Its proof is due to Rusza and Szemerédi.

## Background and definitions

The experienced reader familiar with Szemerédi’s regularity lemma can skip this section. From now on we use standard graph theory notation, but will introduce all non-basic definitions. Assume G=(V,E) is a graph with vertex set V and edge set E. The triangle removal lemma and regularity lemma both rely on the fact that most graphs look random.

First, let A,B⊂V where A and B are disjoint. The density of the pair (A,B) is defined as

$d(A,B)=\frac{|E(A,B)|}{|A||B|}$

Think of density as a measurement in [0,1], which represents the fraction of total possible edges between two disjoint sets that are edges in the graph. To illustrate this, note that all disjoint pairs of vertices in a complete graph have density 1, and most of the pairs of vertices in a sparse graph will have density close to 0.

Furthermore, (A,B) is said to be ε-pseudo-random if for all pairs (A’,B’), where A’⊂A and B’⊂B, $|d(A,B)-d(A',B')| \leq \epsilon$. In other words, the edge density in a pair of subsets is close to uniform across the pair. The figure below illustrates a graph that obviously satisfies this definition for pseudo-random, $K_{4,4}$, and a graph that is definitely not pseudo-random. A subset pair which is ε-pseudo-random does not have an area where most of its edges lie.

Next, a partition on V into sets $V_1,\cdots, V_k$ is said to be ε-regular if the following two conditions hold:

1. For all $i,j \in [k]$, $\mid \vert V_i \vert-\vert V_j \vert \mid \leq 1$.
2. All except εk² pairs of subsets from the partitions $V_i,V_j$, for i<j, are ε-pseudo-random

A partition on the vertices is ε-regular if it is as close as possible to an equipartition with k parts, and most pairs in this partition have close to uniform edge density between them.

## Formal statement

Equipped with the necessary definitions, it’s time for the lemma and its proof.

The Triangle Removal Lemma For every ε>0 there exists δ>0 such that for any graph G on n vertices where at least εn² edges must be removed in order to make it triangle-free, G contains at least δn³ triangles.

A common equivalent statement is: for every ε>0 there exists δ>0 such that for any graph G on vertices with at most δn³ triangles, there exists a set of at most εn² edges, which when removed results in a triangle-free graph.

## Proof

Let G be a graph on n vertices, where at least εn² edges must be removed in order to make the graph triangle-free. Let $V_1, \cdots, V_k$ be an $\frac{\epsilon}{4}$-regular partition on V. Such a partition exists by Szemerédi’s regularity lemma. Edges which satisfy one of 3 conditions, to be enumerated soon, will be removed. Then it’s argued that at most εn² edges were removed and there are still at least δn³ triangles in the graph.

Remove edge (u,v) from G if:

1. $(u,v) \in V_i \times V_j$ where $(V_i,V_j)$ is not an $\frac{\epsilon}{4}$-pseudo-random pair. Remember our definition for ε-regular partitions allowed a few of the pairs to not have its edges uniformly distributed throughout the partition. So all edges between such pairs are removed.
2. $(u,v) \in V_i \times V_j$ where $d(V_i,V_j) < \frac{\epsilon}{2}$. This condition states to remove all edges between pairs in the partition which didn’t have many edges between them to begin with.
3. $v \in V_i$ where $\mid V_i \mid \leq \frac{\epsilon}{4k}n$. Any partitions that are small are removed. Thus obviously any edges with adjacencies in this partition are removed.

At most, $\frac{\epsilon}{4} k^2$ pairs from the partition were not $\frac{\epsilon}{4}$-pseudo-random. Denote the set of such pairs by $I$. As the partition is close to an equi-partition, for all $i \in [ k ]$, $\mid V_i \mid \leq \lfloor \frac{n}{k}\rfloor +1$. Thus from condition 1, at most $\sum\limits_{(i,j) \in I} \mid V_i\mid\cdot \mid V_j\mid \leq \frac{\epsilon}{4} k^2 \cdot \frac{n^2}{k^2} = \frac{\epsilon}{4}n^2$ edges are removed. The number of edges in a simple graph on n vertices is <. When edges are removed between pairs of partitions that have low density, specifically density $<\frac{\epsilon}{2}$, at most $\frac{\epsilon}{2} \cdot n^2$ edges are removed.

If a partition has $<\frac{\epsilon}{4k}n$ vertices, then it contributes $<\frac{\epsilon}{4k}n^2$ edges. As there are k parts, this implies $ edges are removed when the vertices from those partitions are deleted. In total, the 3 conditions for edge removal deleted only at most εn² edges. Thus the graph still contains a triangle.

It remains to show how many triangles are left in the graph after removing edges.
Fix a triangle that remains in the graph, and label the vertices of the triangle a,b,c. WLOG suppose that $a \in V_1$, $b \in V_2$, $c \in V_3$. As these three edges were not removed, (V₁,V₂), (V₂,V₃), and (V₁,V₃) are $\frac{\epsilon}{4}$-regular pairs with density $\geq \frac{\epsilon}{2}$. Lastly, $\mid V_i \mid >\frac{\epsilon}{4k}n$ for i=1,2,3.

Next, it’s necessary to show that the number of vertices which have at least $\frac{\epsilon n}{4k}$ adjacencies in both V₂ and V₃ is at least $\frac{n}{2k}$. Assume for sake of contradiction that there is a subset X ⊂V₁ where $\mid X \mid \geq \frac{\epsilon n}{4k}$ and every $x \in X$, has $< \frac{\epsilon n}{4k}$ adjacencies in V₂. Then checking the density between X and V₂, $d(X,V_2) < \frac{\mid X \mid \frac{\epsilon n}{4k}}{ \mid X \mid \mid V_2 \mid} = \frac{\epsilon}{4}$, which breaks the assumptions that (V₁,V₂) are $\frac{\epsilon}{4}$-regular pairs. The same applies for V₃, and the number of vertices which have $< \frac{\epsilon n}{4k}$ adjacencies in either V₂ or V₃ is

\begin{aligned} \frac{n}{k}(1-2 \cdot \frac{\epsilon}{4}) > \frac{n}{2k}. \end{aligned}

Thus there are at least $\frac{n}{2k}$ vertices in V₁ with at least $\frac{\epsilon n}{4k}$ adjacent vertices to both V₂ and V₃. For such an $a \in V_1$, let W₂,W₃ be subsets of V₂ and V₃ respectively which are adjacent to a. As(V₂,V₃) is $\frac{\epsilon}{4}$ regular,

\begin{aligned} \mid e(W_2,W_3)\mid \geq \frac{\epsilon}{4} |W_2||W_3| \geq \frac{\epsilon}{4} \cdot \left (\frac{\epsilon n}{4 k} \right)^2 = \frac{\epsilon^3 \cdot n^2}{64k^2} \end{aligned}

Every edge between W₂ and W₃ forms a unique triangle. This was for an arbitrary $a \in V_1$, so summing over all vertices in V1, the number of triangles is at least

\begin{aligned} \sum\limits_{i=1}^{\frac{n}{2k}} \frac{\epsilon^3 \cdot n^2}{64k^2}= \frac{\epsilon^3 \cdot n^2}{64k^2} \cdot \frac{ n}{2 k} = \delta n^3, \text{ for } \delta = \frac{\epsilon^3}{128k^3}. \end{aligned}

## What just happened?

The regularity lemma shed light on the substructure of the graph. This substructure is easier to analyze than the full graph. Even further, the edges which were removed left a graph with triangles that were easy to find and count.