# Chromosome structure via Euclidean Distance Matrices

The data represents the auto-correlation coefficients (6MB video) for gene expression of 3827 genes from the circular chromosome of E.coli across 49 different experimental conditions. In the data, the axis is ordered according to the order in which genes appear in the E.coli circular chromosome, with an arbitrary start and end point. The expression was smoothed at various resolutions to highlight spatial patterns at different scales. In this way the correlation matrices complement each other. Bright green indicates a correlation coefficient of +1 and bright red indicates anticorrelation, -1. It is assumed that the E.coli chromosome is structured such that genes that positively correlate are close together within the cell, whereas genes that anticorrelate are far apart. The exact relation is unknown, but it would be interesting to try the alternate hypothesis to see what effect this has on the structure of the molecule.

Last frame from video

In the data video, the left frame represents autocorrelation of a subset of genes at successively increasing levels of resolution. The right frame is the same autocorrelation except that, prior to decomposition of the signal into different resolutions, the positions of the genes along the chromosome are randomly permuted. (On the left is experimental data, on the right is control data; i.e., what one would get from white noise.) The idea is that this represents the autocorrelation one would expect if there were no information in the relative positions of the genes along the chromosome. As such, the right frame is a null hypothesis.

-Ronan Fleming
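The permutation control described above can be sketched on synthetic data. This is only an illustration under stated assumptions: the matrix sizes, the window length `w`, and the circular moving average are hypothetical stand-ins, since the text does not specify how the signal was decomposed into resolutions.

```matlab
% Sketch of a permutation control on synthetic data (not the E.coli data set).
rng(0);                          % fix the seed so the run is repeatable
nGenes = 200;  nCond = 49;       % hypothetical sizes; the real data has 3827 genes
w = 11;                          % hypothetical smoothing window (odd length)

% Synthetic expression with spatial structure along a circular chromosome:
% neighboring genes share a slowly varying component.
t = (0:nGenes-1)' / nGenes;
E = sin(2*pi*t)*randn(1,nCond) + cos(2*pi*t)*randn(1,nCond) + 0.5*randn(nGenes,nCond);

% Control: destroy positional information by permuting gene order first.
Ectl = E(randperm(nGenes), :);

% Circular moving average along the gene axis (assumed stand-in for the smoothing step).
pad    = (w-1)/2;
smooth = @(A) conv2([A(end-pad+1:end,:); A; A(1:pad,:)], ones(w,1)/w, 'valid');

% Gene-by-gene correlation matrices (corrcoef correlates columns, hence the transpose).
C    = corrcoef(smooth(E)');
Cctl = corrcoef(smooth(Ectl)');
```

Plotting `C` and `Cctl` side by side, for example with `imagesc`, gives a picture analogous to the left and right frames of the video.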

## Realization of Control data

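```matlab
%%% Ronan Fleming, E.coli molecule data
%%% -Jon Dattorro, August 9 2008
clear all
% her49imfs12movfull (movie frames with a cdata field) must be available here;
% if it is stored as a workspace variable, load it after the clear.

frame = 4;                                          % 1 through 12
G = her49imfs12movfull(frame).cdata;                % uint8
G = (double(G)-128)/128;                            % Gram matrix, pixel values mapped to [-1,1)
N = size(G,1);

% Vn'*G*Vn is the Gram matrix of the points taken relative to the first point
Vn = [-ones(1,N-1); speye(N-1)];
[evec, evals, flag] = eigs(Vn'*G*Vn, [], 20, 'LA');  % 20 largest (algebraic) eigenvalues
if flag, disp('convergence problem'), return, end;

close all
% Projection of -Vn'D Vn on PSD cone rank 3; the first point is placed at the origin
Xs = [zeros(3,1) sqrt(real(evals(1:3,1:3)))*real(evec(:,1:3))'];
plot3(Xs(1,:), Xs(2,:), Xs(3,:), '.')
```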

## E.coli realization

Test image E.coli

I regard the autocorrelation data you provided as a Gram matrix.

Then conversion to a Euclidean distance matrix (EDM) is straightforward; see
Chapter 5.4.2 of Convex Optimization & Euclidean Distance Geometry.
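As a reminder of what that conversion does: each entry of the EDM is a squared distance, obtained from the Gram matrix by $d_{ij} = G_{ii} + G_{jj} - 2G_{ij}$. The following is a minimal sketch that checks this identity on a handful of hypothetical points; the point list `X` is made up for illustration and has nothing to do with the E.coli data.

```matlab
% Toy check of the Gram-to-EDM conversion on hypothetical points.
X = [0 1 0 2;
     0 0 1 2;
     0 0 0 1];                   % four points in R^3, one per column
N = size(X,2);
G = X'*X;                        % Gram matrix of inner products
g = diag(G);                     % squared norms of the points

% Entries of D are squared distances: d_ij = G_ii + G_jj - 2*G_ij
D = g*ones(1,N) + ones(N,1)*g' - 2*G;

% Direct computation of squared distances for comparison
Dcheck = zeros(N);
for i = 1:N
    for j = 1:N
        Dcheck(i,j) = norm(X(:,i) - X(:,j))^2;
    end
end
disp(max(abs(D(:) - Dcheck(:))))  % largest discrepancy between the two
```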

The program calculates only the 20 largest eigenvalues of an oblique projection
of the EDM on a positive semidefinite (PSD) cone; see Chapters 7.0.4 through 7.1 ibidem.

You can see at runtime that there are many significant eigenvalues, which means that the Euclidean body (the molecule) lives in a space of dimension higher than 3, assuming I have interpreted the E.coli data correctly.

To get a picture corresponding to physical reality, we obliquely project the EDM on the closest rank-3 subset of the boundary of that PSD cone; precisely, we keep the three largest eigenvalues and discard the rest.
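The truncation step can be tried on synthetic data where the answer is known. This is a sketch under assumptions, not the program used for the E.coli figure: the points are hypothetical, they lie close to a 3-dimensional subspace of a 5-dimensional space by construction, and a dense `eig` replaces `eigs` because the matrix is small.

```matlab
% Rank-3 truncation on synthetic data (hypothetical points, dense eig instead of eigs).
rng(1);
N = 30;
X = [randn(3,N); 0.05*randn(2,N)];   % points in R^5 lying close to a 3-D subspace
G = X'*X;                            % Gram matrix

Vn = [-ones(1,N-1); eye(N-1)];       % dense version of the matrix Vn in the program above
B  = Vn'*G*Vn;                       % Gram matrix of the points relative to the first point
B  = (B + B')/2;                     % remove round-off asymmetry before eig

[Q, L] = eig(B);
[lam, idx] = sort(diag(L), 'descend');
Q = Q(:, idx);

% Keep the three largest eigenvalues; the first point goes to the origin.
Xs = [zeros(3,1), diag(sqrt(lam(1:3)))*Q(:,1:3)'];

% Compare squared distances before and after truncation.
g  = diag(G);   D  = g*ones(1,N)  + ones(N,1)*g'  - 2*G;
Gs = Xs'*Xs;
gs = diag(Gs);  Ds = gs*ones(1,N) + ones(N,1)*gs' - 2*Gs;
relerr = norm(Ds - D, 'fro') / norm(D, 'fro');
```

Here `relerr` is small because the synthetic points are nearly 3-dimensional to begin with; when many eigenvalues are significant, the same truncation distorts the distances much more.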

It is unlikely that this picture is an accurate representation unless the number of significant eigenvalues of that projection approaches 3 prior to truncation.
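One rough way to judge this is to look at how much of the computed eigenvalue sum the three largest eigenvalues account for. The sketch below uses a hypothetical vector `lam` in place of the eigenvalues returned by `eigs`; the numbers are invented for illustration. Note that when only 20 eigenvalues are computed, the fraction is relative to those 20, not to the full spectrum.

```matlab
% lam stands in for the eigenvalues returned by eigs (hypothetical values here).
lam = [9.1 7.4 5.2 4.8 3.9 2.5 1.1 0.6 0.2 0.1]';
lam = sort(lam, 'descend');
lam = max(lam, 0);                   % projection on the PSD cone discards negative eigenvalues
captured = cumsum(lam) / sum(lam);   % fraction of the computed eigenvalue sum kept at rank k
fprintf('rank 3 keeps %.0f%% of the computed eigenvalue sum\n', 100*captured(3));
```

A value of `captured(3)` close to 1 would suggest that a 3-dimensional picture loses little; a value well below 1 suggests the opposite.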

Matlab Figures allow 3D rotation in real time, so you can get a good idea of the body's shape.

I include a low-resolution figure here (frame 4) for reference.

## Knot Plot

Knot Plot by Ronan Fleming, 2009