Abstract:
Evidence indicates that members of many gene families in the genome
of an organism tend to have homologues both within their own genome and in the
genomes of other organisms. Amongst these homologues, typically only one or a
few per genome perform an analogous function in their genome. Finding subsets
of these genes which show evidence of performing a common function is an
important first step towards, for instance, the creation of phylogenetic trees,
multiple sequence alignments and secondary structure predictions.
Given a collection of taxa $P = {P_1, P_2,..., P_k}$ where $P_i$
contains genes ${p_{i1}, p_{i2},..., p_{in_i}}$, we ask to choose one gene from
each of the taxa $P_i$ such that these chosen vertices most agree. We de,ne most
agreeing in three distinct ways: most tree-like, pairwise closest, and pairwise
most similar.
We show these problems to be computationally hard from almost every
angle via classical, parameterized and approximation complexity theory.
However, on the positive side, we give randomized approximation algorithms
for the pairwise closest and pairwise most similar variants.