Complex and Graph Neural Networks

Frank Hellmann

Initial observations

We have a language clash: The graph in Graph Neural Networks (GNNs) is the network in Complex Networks. We will try not to use the word network in this Knowledge Repo. All Complex Networks and graphs we will consider are defined over an index set \(1 \dots n\).

A brief taxonomy of Functions on Sets and Graphs

Neural Networks (NN) are function approximators. Set based Neural Networks approximate functions from a set of inputs, meaning there is no particular order of the inputs specified. We call such functions set based functions. Graph Neural Networks approximate functions from data associated to graphs; again there is no natural ordering for these data, and we refer to such functions as graph based functions. Network Measures in Complex Network theory are graph based functions for graphs with no data associated to them. They might map to a single number, to the set of nodes, or to the set of edges.

It’s instructive to briefly consider set based functions on sets where the elements carry no data, or equivalently, functions that don’t depend on the data, in analogy to Network Measures. If such a function takes values in, e.g., the reals \(\mathbb{R}\), it should tell us something about the set aspects of the inputs. The only property of a set independent of its elements is its size, thus any set based function that does not depend on the data associated to the set elements is a function of the size of the input set.

In contrast, graph based functions can be genuinely more complex even in the absence of data, since a graph carries structure beyond its size.

Classification through permutations

For all these functions, the order in which the inputs are presented does not matter. If we write a set based function taking values in the reals in the form \(f(x_1, x_2, \dots)\), it is permutation invariant, \[\begin{aligned} f(x_1, x_2, \dots) = f(x_{\sigma(1)}, x_{\sigma(2)}, \dots)\end{aligned}\] where \(\sigma\) denotes a permutation.
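As a quick sanity check, here is a minimal numeric sketch of this invariance in plain Python; the function and the data are purely illustrative:

```python
import numpy as np

# Minimal check of permutation invariance for a toy invariant function.
rng = np.random.default_rng(0)
x = rng.normal(size=5)          # a "set" of 5 scalar inputs
sigma = rng.permutation(5)      # a random permutation sigma

def f(xs):
    # built from sum and max, both of which ignore input order
    return np.sum(xs ** 2) + np.max(xs)

assert np.isclose(f(x), f(x[sigma]))   # f(x_1, ...) == f(x_sigma(1), ...)
```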

For a graph based function, we need to permute the edges and the vertices at the same time.

\[\begin{aligned} f(x_1, x_2, \dots, e_{12}, \dots) = f(x_{\sigma(1)}, x_{\sigma(2)}, \dots, e_{\sigma(1),\sigma(2)}, \dots)\end{aligned}\]

If the graph based function returns a set of data associated to the nodes or edges, then the function has to be covariant under permutations:

\[\begin{aligned} f_i(x_1, x_2, \dots) = f_{\sigma(i)}(x_{\sigma(1)}, x_{\sigma(2)}, \dots)\end{aligned}\]

Generally speaking, the domain and image of the function, written as a function on ordinary spaces, carry a representation of some permutation group.

Tensor towers

In the above, the node features live in \(\mathbf{X}^n\) and the edge features in \(\mathbf{X}_e^{n \times n} = \mathbf{X}_e^{2n}\). The permutation group \(S_n\) acts naturally on these spaces by permuting indices in the tensor copies¹. More generally, the permutation group also acts naturally on \(\mathbf{X}_k^{kn}\), where \(\mathbf{X}_k\) is some fixed base space.

We can take the direct sum of such spaces to obtain an entire tensor tower, \[\begin{aligned} \mathcal{X}= \bigoplus_{k=0}^{\infty} \mathbf{X}_k^{kn} \;.\end{aligned}\] This space also transforms naturally under \(S_n\), and it can be interpreted as the space that carries data on hypergraphs. Hypergraph Neural Networks have in fact already been defined in the literature.

General structure results for permutation invariant functions

Theorems exist that guarantee that a permutation invariant function can be written as

\[\begin{aligned} \label{eq::1 to 0 structure} f(x_1, x_2, \dots) = g \left(\sum_i h(x_i)\right)\end{aligned}\]

See theorem ??. However, this representation might not be useful or meaningful; see e.g. the discussion in the literature.

What is remarkable is that this representation theorem also allows us to compare functions on sets of different size, though known results show that generalization to larger sets is not guaranteed.
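For concreteness, here is a minimal sketch of this sum-decomposition structure in Python, with \(h\) and \(g\) chosen as simple fixed maps standing in for learned networks; all choices are illustrative:

```python
import numpy as np

def h(x):
    # per-element embedding (stand-in for a learned network)
    return np.array([x, x ** 2, np.sin(x)])

def g(s):
    # readout applied to the pooled sum (stand-in for a learned network)
    return float(np.tanh(s).sum())

def f(xs):
    # f(x_1, ..., x_n) = g(sum_i h(x_i))
    return g(sum(h(x) for x in xs))

xs = [0.3, -1.2, 2.0]
print(f(xs), f(list(reversed(xs))))   # identical: invariant by construction
```

Note that because the pooled sum is well defined for any number of elements, the same \(g\) and \(h\) define \(f\) for sets of every size, which is exactly what allows the comparison across set sizes mentioned above.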

Question: Are such theorems already known for graph based functions?

Combining functions

Let’s stick to these representations of \(S_n\) for now. We can denote the representations by noting the \(k\) that occur in the expansion: \((1)\) would act on a space \(\mathbf{X}^n\), \((1,2)\) on \(\mathbf{X}_1^n \oplus \mathbf{X}_2^{2n}\), and so on. We can then denote for a function which representations its domain and image carry. For example, the function that counts the total edge weight, \(\sum_{i,j} e_{ij}\), is of the form \((2) \rightarrow (0)\).

A set based function \(f\) is of the form \((1) \rightarrow (0)\). A node wise network measure on a graph in the sense of Complex Networks is \((2) \rightarrow (1)\).

Functions can be combined in the obvious way: given a network measure \(C\) and a set based neural network \(\mathcal{NN}\), we can obtain a function on graphs by first applying the network measure and then the neural network.
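A minimal sketch of such a composition, assuming the node degree as the network measure \(C\) and a toy invariant map standing in for the set based neural network:

```python
import numpy as np

def degree_measure(A):
    # C: (2) -> (1), node degrees of the adjacency matrix A
    return A.sum(axis=1)

def set_nn(v):
    # stand-in for a set based neural network, (1) -> (0)
    return float(np.log1p(v).mean())

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

# (2) -> (0) by composing the measure with the set based function
print(set_nn(degree_measure(A)))
```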

Notation:

We write a set of elements indexed by \(i,j,\dots \in 1 \dots N\) as \(\{z_{ijk}\}\), meaning that the set is over all possible indices. If we want to take a set that is only over a part of the indices, we write \(\{z_{ijk}\}_k\) to denote the set that covers all \(k\), and so on. Thus \(\{z_{ijk}\} = \{z_{ijk}\}_{ijk}\). This is not very clean; what we really want to say is \(z_{ij}\) up to permutation. This is the same as a set for one index, but not for several indices. This also needs to be tuned to work with covariant functions...

More interestingly, we can also use set based functions in the other direction, to lift. Given set based functions \(f: (1) \rightarrow (0)\) and \(g: (1) \rightarrow (0)\) on \(S_{n}\), we can create a set consisting of \(g_i = g(\{x_{ij}\}_{j})\) that transforms naturally under \(S_n\). We can then apply \(f\) to the set \(\{g_i\}\) to obtain a function

\[\label{eq::2 to 0 structure} f(\{g(x_{ij})\}) : (2) \rightarrow (0)\]

As an example, let the edge weights be the adjacency matrix \(A_{ij}\), take \(g\) to be the sum function and \(f\) the mean, \[\begin{aligned} g(M) &= \sum_{m \in M} m\\ f(M) &= \frac{\sum_{m \in M} m}{\sum_{m \in M} 1}\end{aligned}\] then \[f(\{g(A_{ij})\}) = \frac{\sum_{i, j} A_{ij}}{\sum_{i} 1} = \frac{\sum_{i,j} A_{ij}}{n}\] is the average degree of the network described by \(A\).
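The same computation as a numeric sketch, with an arbitrary toy adjacency matrix:

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

g_i = A.sum(axis=1)        # g({A_ij}_j): per-node sums, i.e. the degrees
avg_degree = g_i.mean()    # f: mean over the resulting set {g_i}
assert np.isclose(avg_degree, A.sum() / A.shape[0])
print(avg_degree)          # 4/3 for this toy graph
```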

A conjecture

In [eq::2 to 0 structure], \(f\) and \(g\) are set based functions, and we know that they can be expressed in the form [eq::1 to 0 structure]. Conjecture: Every function \((2) \rightarrow (0)\) can be written in the form [eq::2 to 0 structure].

Explicitly: \[f(\{e_{ij}\}) = g_2\left(\sum_j h_2 \left(g_1 \left( \sum_i h_1(e_{ij})\right)\right)\right)\]

By defining \(h_{21} = h_2 \circ g_1\) this simplifies to

\[\label{eq::2 to 0 structure conjecture} f(\{e_{ij}\}) = g_2\left(\sum_j h_{21} \left( \sum_i h_1(e_{ij})\right)\right)\]
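As code, the conjectured form reads as follows (a sketch; \(h_1\), \(h_{21}\) and \(g_2\) are arbitrary illustrative maps, not derived from the conjecture):

```python
import numpy as np

def h1(e):  return np.array([e, e ** 2])   # per-edge embedding
def h21(s): return np.tanh(s)              # map on the inner pooled sum
def g2(s):  return float(np.sum(s))        # final readout

def f(E):
    # f({e_ij}) = g2( sum_j h21( sum_i h1(E[i, j]) ) )
    inner = [sum(h1(E[i, j]) for i in range(E.shape[0]))
             for j in range(E.shape[1])]
    return g2(sum(h21(s) for s in inner))

E = np.random.default_rng(1).normal(size=(4, 4))
print(f(E))   # invariant under simultaneous row and column permutation
```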

What do we know about covariant functions \((1) \rightarrow (1)\)? A form like \(y_i = g(x_i, h(\{x_i\}))\) seems natural...
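Feature standardization is a simple instance of this form, with \(h\) computing the (invariant) mean and standard deviation and \(g\) the per-element rescaling; a quick sketch:

```python
import numpy as np

def equivariant(xs):
    # y_i = g(x_i, h({x_i})) with h = (mean, std) and g = standardization
    mu, sd = xs.mean(), xs.std()
    return (xs - mu) / sd

xs = np.array([1.0, 3.0, -2.0, 0.5])
sigma = np.array([2, 0, 3, 1])
# permuting the inputs permutes the outputs the same way: covariance
assert np.allclose(equivariant(xs)[sigma], equivariant(xs[sigma]))
```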

The dynamics on graphs perspective

An initial observation

Take a discrete time linear diffusive dynamical system on a graph. The equations are given by \[\begin{aligned} x(t + 1) = \left( A \otimes 1 + B \otimes L \right) x(t)\end{aligned}\]

with \(L\) the effective network coupling. Writing \(x\) in components \(x_{ai}\), with \(i\) denoting the graph index and \(a\) the internal indices, this becomes

\[\begin{aligned} x_{ai}(t + 1) &= A_{ab} x_{bi}(t) + B_{ab} L_{ij} x_{bj}(t) \\ x(t+1) &= A \cdot x(t) + B \cdot x(t) \cdot L^T\end{aligned}\]

with summation over repeated indices implied.

Assume for a moment that we have no local mixing, \(A = 0\); then we have

\[\begin{aligned} x(t+n) &= B^n \cdot x(t) \cdot (L^T)^n\end{aligned}\]

Terms of this type are often a central ingredient in graph convolutional networks. With \(A \neq 0\) we get something like recurrence.
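A sketch of these dynamics in matrix form, using a hypothetical path-graph Laplacian for \(L\) and small random mixing matrices for \(A\) and \(B\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 3                         # n nodes, d internal dimensions
A = 0.1 * rng.normal(size=(d, d))   # local mixing
B = 0.1 * rng.normal(size=(d, d))   # coupling mixing

# Laplacian of a path graph as a toy effective coupling L
adj = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(adj.sum(axis=1)) - adj

x = rng.normal(size=(d, n))         # state x_{ai}
for _ in range(10):
    x = A @ x + B @ x @ L.T         # x(t+1) = A.x(t) + B.x(t).L^T
print(x)
```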

This raises the possibility that more general dynamics on graphs could be of interest in the construction of GNNs, including possibly using ODE or stochastic dynamics instead of discrete time maps.

It also raises the question whether there are interesting structure results for functions of type \((1, 2) \rightarrow (1)\), which serve as the right-hand side of a graph dynamical system.

One final note: the ARMA Filters for GNNs paper constructs a dynamical system that is supposed to converge to an interesting quantity. This is done by inserting a source into the dynamics above. Generally speaking, the idea that we can solve equations by using dynamics could also be readily generalized.

Finally, a different notion of graph dynamics is given by functions \((2) \rightarrow (2)\) that change the graph itself, for example by stochastic rewiring. It would be interesting to see whether these types of dynamics have any use in the construction of GNNs.

Some thoughts on general structure theorems - linear case

A dynamical system on graphs is specified by a function \((1, 2) \rightarrow (1)\). That is, a covariant function \(f_i(\{e_{ij}\}, \{x_i\})\).

The dynamics are then either of

\[\begin{aligned} \dot x_i(t) &= f_i(\{e_{ij}\}, \{x_i(t)\})\\ x_i(t + 1) &= f_i(\{e_{ij}\}, \{x_i(t)\})\end{aligned}\]

The most typical covariant dynamics linear in \(e\) and \(x\) (separately) is given by \[\begin{aligned} x_i(t + 1) &= A \cdot x_i(t) + \sum_j B_1 e_{ij} B_2 x_j(t)\end{aligned}\]

To obtain the genuinely most general form we would need to allow pure \(e_{ij}\) terms and also consider \(e_{ji} x_{j}\) terms.

Generically this can be rewritten by going to an eigenbasis of \(B_1 \otimes B_2^T\). Call the left and right eigenmatrices \(B_k\) and \(B_k^r\); then we obtain \[\begin{aligned} x_i(t + 1) &= A \cdot x_i(t) + \sum_k \sum_j Tr (e_{ij} B_k^r) B_k x_j(t)\end{aligned}\]

Writing \({L_k}_{ij} = Tr (e_{ij} B_k^r)\), this becomes a form of multilayer dynamics on the effective networks given by the \(L_k\):

\[\begin{aligned} x(t + 1) = \left( A \otimes 1 + \sum_k B_k \otimes L_k \right) x(t)\end{aligned}\]
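A sketch of this multilayer form, with the \(B_k\) and \(L_k\) simply taken as given (constructing the \(L_k\) from edge features via the trace formula above is omitted); all shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, K = 4, 2, 3                        # nodes, internal dims, layers
A = 0.1 * rng.normal(size=(d, d))        # local mixing
Bs = 0.1 * rng.normal(size=(K, d, d))    # mixing matrices B_k
Ls = rng.normal(size=(K, n, n))          # effective networks L_k

x = rng.normal(size=(d, n))
for _ in range(5):
    # x(t+1) = (A (x) 1 + sum_k B_k (x) L_k) x(t), in matrix form
    x = A @ x + sum(Bs[k] @ x @ Ls[k].T for k in range(K))
print(x)
```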

Graphs encode a notion of locality. Functions that act covariantly can very easily be defined through local dynamics.

Linear dynamics on the graph seem to cover most cases of Graph Neural Networks considered so far.

Lifting to probabilities

If we don’t just want to look at functions from graphs to nodes, but also at functions of probability densities of graphs and nodes, an interesting potential connection to quantum field theory (QFT) arises. QFT studies linear state spaces on tensor towers like the ones above, and bosonic and fermionic statistics specifically concern the question of how the spaces considered transform under permutations.

However, this is a different permutation symmetry than the one above: in the space \(\mathbf{X}^{kn}\), QFT considers permutations of the \(k\) tensor factors rather than of the \(n\) indices. So a direct application of, e.g., Fock spaces is not possible.


  1. This can be said more abstractly, of course, but it should be relatively clear, I hope.↩︎