Optimization Layer

Calculating a minimum spanning tree for a complete graph is a difficult problem, especially when path weights are multiplicative, as opposed to additive, which they are in join optimization. SQL mostly uses heuristics, leaning on the optimal algorithm only for small table sizes.

However, the edge traversal optimization problem in SourceScape is not quite the same as the join optimization problem in SQL. An optimized SQL join is a minimum spanning tree through a complete graph where each node is a table. The graph we are optimizing for edge traversals is not necessarily complete and is actually quite likely not complete.

This means that it is not particularly bad to simply enumerate all spanning trees and estimate the cost by apply heuristic weights for each edge on the number of

Suppose we have

javascript::class(A)[name="*Component"].
javascript::class_method(A, B).
javascript::method_arg(B, C).

This constraint graph can be traversed in 3 ways:

Each of these traversals has different performance characteristics. Start with classes named “*Component” will yield a small number of initial nodes, but will fan out for each of the edge types that we traverse. Starting from method-args will yield a huge number of initial nodes, but will not fan out when traversing backwards.

Edge fanout makes a big difference, especially with edges like function-contains. If you’re looking for a function that contains an expression, it is almost guaranteed to be better to look for the expression first and then traverse up to the function.

Finally, a minimum spanning tree might not be enough to cover the desired constraint graph.

javascript::function_arg(A, B).
javascript::array_member(C, B).
javascript::function_returns(A, C).

For this, we need to traverse both edges and perform an intersection.

Last updated