Problem B
Determining Nucleotide Assortments
Genes ’R Us specializes in analyzing strands of DNA to look for anomalies, matches, patterns, or whatever specific items their customers are interested in. You may recall from high school biology that the DNA molecule consists of two chains wrapped around each other to form the well-known double helix. The chains are made up of four different types of nucleotides, each distinguished by a specific nitrogen base: adenine (A), thymine (T), guanine (G) and cytosine (C).
A recent customer of Genes ’R Us wants to be able to quickly determine which of the four types of nucleotides is most prevalent in a given section of DNA, as well as which are second, third and fourth most prevalent. For a given strand of DNA (which may contain tens of thousands of nucleotides) there may be many such sections where this information is desired. You’ve been asked to write an efficient program to answer such questions.
Input
Input starts with a string describing a DNA strand. This string consists of the characters A, T, G and C and have positive length $n \leq 50\, 000$. The next line contains a positive integer $m$ ($m \leq 25\, 000$) indicating the number of sections of the DNA strand to investigate. Following this are $m$ lines each containing a pair of integers $s_ i$ $e_ i$, each indicating the start and ending positions of a section of the strand, where $1 \leq s_ i \leq e_ i \leq n$. Integer pairs are separated by a single space.
Output
For each section output a line containing a permutation of ATGC, where the first character indicates the most prevalent nucleotide in the section, the second character indicates the second most prevalent, and so on. Break any ties using the ordering ATGC (i.e., print an A before an equally occurring T, G or C, print a T before an equally occurring G or C, and so on).
Sample Input 1 | Sample Output 1 |
---|---|
TATATGCTCT 3 1 10 6 10 6 6 |
TACG TCGA GATC |