IN SILICO SELECTION OF ANTIMALARIAL TARGETS CONSERVED IN FOUR PLASMODIUM SPECIES

Objectives: New antimalarial drugs and drug targets are urgently needed because drug-resistant strains of the parasite continue to emerge, and improper target selection has resulted in therapeutic failure. The genomic and post-genomic era has made it possible to determine the 3D crystal structures of proteins and DNA that serve as drug targets, which are deposited in the Protein Data Bank. Methods: Novel antimalarial targets were derived from evolutionarily conserved short sequence motifs that are essential for transcription in the parasite. The motifs TGCATGCA, GTGCAC and GTGCGTGC were curated from experimental work, validated, and analysed by phylogenomics and comparative genomics. PlasmoDB BLASTn was used to determine their similarity in Plasmodium vivax, P. knowlesi, P. ovale and P. yoelii. The complete genomes of P. falciparum, P. vivax, P. knowlesi, P. ovale and P. yoelii were downloaded from PlasmoDB and the positions of the motifs determined. Results: Phylogenomic analysis showed that the targets are essential and conserved in rodent and other mammalian Plasmodium species, with percentage identity and similarity greater than 80%, and have no close matches elsewhere in the same genome. Comparative genomics showed them to be selective for the parasites relative to Homo sapiens, with 0% identity and similarity in the human genome. Conclusion: The targets reveal, at the molecular and biochemical level, regions that are vulnerable in the parasite but absent from the human host, which supports their selection for subsequent rational drug discovery and design.


INTRODUCTION
Human malaria, with an estimated 515 million cases, is generally caused by four species of Plasmodium, namely P. falciparum, P. vivax, P. ovale and P. malariae, and is transmitted by the bite of the female Anopheles mosquito 1 . Other mammalian Plasmodium species implicated include P. knowlesi and P. yoelii. P. falciparum takes the greatest toll on human health, especially in children under five years of age. Besides the characteristic fever and anaemia, severe malaria includes neurological involvement, which may lead to coma and subsequently death. In cerebral malaria, the hippocampus may be damaged, as evidenced by impaired learning ability and cognitive function, so the child's lifelong potential is reduced 2 . Novel drugs have been launched into the drug discovery pipeline by the Medicines for Malaria Venture (MMV), most recently through the MMV Pathogen Box 3 . Nevertheless, the propensity of malaria parasites to become drug resistant remains a major problem that may, over time, threaten the efficacy of promising new antimalarial chemotypes such as the synthetic peroxide OZ439 4 and the spiroindolone NITD609 5 , and this scenario will persist until the human-pathogenic Plasmodium species are eventually eradicated 6 . Despite recent advances in the "omics" field, major challenges remain: insufficient understanding of biological networks, incomplete understanding of the biology and dynamics of drug-target interactions, a lack of efficient and accurate ways to determine clinical relevance, and the need to integrate massive amounts of sequence and phenotype data from the public and private sectors. Attention has therefore focused on generating an inventory of potential therapeutic targets from human and pathogen genome sequence studies to replenish the depleted pipeline. Target identification and selection is a lengthy and complex process that is now expedited and improved by bioinformatics 7 .
A persistent bottleneck for target-based approaches is the identification of a suitable drug target in the first place 8 . Such a target should be essential for the survival of the parasite and sufficiently different from its closest counterpart in the human host to be inhibited selectively. Experimental tools to validate candidate drug targets are limited for malaria parasites 9 . Several bioinformatics approaches have previously been employed to identify or prioritize drug targets in Plasmodium parasites. These include methods based on automated identification of important steps in metabolic pathways 10 , methods that combine chemical starting points with protein-based queries 11 , and the use of the TDR Targets web resource 12 (http://www.tdrtargets.org) to prioritize drug targets by combining multiple data types relevant to drug development 13 . A drug target must satisfy the following conditions to be selected: (i) it must have conserved orthologues in all of the mammalian-pathogenic Plasmodium species; (ii) it must have no other match in the Plasmodium falciparum genome; (iii) it must not occur, or have a good match, anywhere in the human genome (chromosomes 1-22, X and Y); and (iv) it must function in the transcription process resulting in the formation of gametocytes. The rationale is that conserved genes usually perform essential functions; in this case, the motifs are essential in the regulation of gene expression 14 . These criteria were applied to select the candidate drug targets.
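The four selection conditions above can be expressed as a simple filter. The sketch below is a hypothetical illustration, not the authors' program: the `Candidate` fields and the `passes_criteria` function are invented names, the 80% identity threshold is taken from the Results, and criterion (iv) is assumed to be a manually curated yes/no annotation rather than something computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """Hypothetical per-sequence summary used to apply criteria (i)-(iv)."""
    sequence: str
    ortholog_identity: Dict[str, float]  # best BLASTn % identity per other species
    pf_copies: int                       # exact copies in the P. falciparum genome
    human_hits: int                      # significant matches in human chr 1-22, X, Y
    transcriptional_role: bool           # curated flag for criterion (iv)


def passes_criteria(c: Candidate, species: List[str], min_identity: float = 80.0) -> bool:
    """Return True only if the candidate satisfies all four selection criteria."""
    # (i) conserved orthologue in every listed Plasmodium species
    conserved = all(c.ortholog_identity.get(s, 0.0) >= min_identity for s in species)
    # (ii) single copy: no other match in the P. falciparum genome
    single_copy = c.pf_copies == 1
    # (iii) no good match anywhere in the human genome
    absent_in_human = c.human_hits == 0
    # (iv) functional role in transcription, taken as given
    return conserved and single_copy and absent_in_human and c.transcriptional_role


if __name__ == "__main__":
    species = ["P. vivax", "P. knowlesi", "P. ovale", "P. yoelii"]
    # Identities and copy number for TS1 as reported in the Results
    ts1 = Candidate(
        "CTGATGCATGCATTTA",
        {"P. vivax": 100.0, "P. knowlesi": 100.0, "P. ovale": 100.0, "P. yoelii": 94.0},
        pf_copies=1, human_hits=0, transcriptional_role=True,
    )
    print(passes_criteria(ts1, species))  # True
```

Treating the criteria as a conjunction makes it explicit that a raw motif occurring many times in P. falciparum fails criterion (ii) even when it is perfectly conserved.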

Motif detection
Evolutionarily conserved short sequence motifs in a set of promoters were curated from experimental work and from the Kyoto Encyclopedia of Genes and Genomes (Aurrecoechea). The frequency of occurrence of each raw motif was determined in the mammalian and rodent Plasmodium species. The upstream and downstream flanking sequences of each motif were then extracted from the Plasmodium falciparum genome, so that the final sequences were at least 16 base pairs long (Roche, Thomson et al.) 16 . BLASTn was applied to determine the percentage similarity and identity of the selected sequences in the other Plasmodium species. The entire human genome was downloaded via the file transfer protocol (FTP) from the National Center for Biotechnology Information (NCBI) FTP site.
The genome was downloaded in FASTA format.
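Counting raw motif occurrences in a downloaded FASTA genome can be sketched as below. This is a minimal illustration under stated assumptions, not the software used in the study: `read_fasta`, `count_motif` and `motif_frequencies` are hypothetical helpers, matches are exact and may overlap, and only the forward strand is scanned. TGCATGCA and GTGCAC are their own reverse complements, so scanning both strands would count each of their sites twice, whereas GTGCGTGC is not palindromic.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a FASTA text stream."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks).upper()
            # Use the first word of the header line as the record ID
            record_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks).upper()


def count_motif(seq: str, motif: str) -> int:
    """Count exact, possibly overlapping, forward-strand occurrences of motif."""
    motif = motif.upper()
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)  # step by one base to allow overlaps
    return count


def motif_frequencies(handle: TextIO, motif: str) -> Dict[str, int]:
    """Map each FASTA record (e.g. a chromosome) to its motif count."""
    return {rid: count_motif(seq, motif) for rid, seq in read_fasta(handle)}


if __name__ == "__main__":
    # Toy two-record FASTA; a wrapped sequence line is joined before counting
    demo = io.StringIO(">chr1 toy\nAATGCATGCATG\nCATGCA\n>chr2\nGGGG\n")
    print(motif_frequencies(demo, "TGCATGCA"))  # {'chr1': 3, 'chr2': 0}
```

Reporting counts per record gives the per-chromosome frequencies directly; summing the dictionary values gives the genome-wide total.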
Comparative genomics was applied to determine the percentage identity and similarity of the final 16 base pair sequences in the human genome: the sequences of interest were compared with all known sequences in the database to identify homologous sequences. The genomic sequences of Plasmodium falciparum and their functional annotations were obtained from PlasmoDB (version 4.1), and the sequence similarity search between the P. falciparum and human sequences was run on a standalone computer. Default parameters were used, and the p-value was calculated to assess sequence similarity to each human chromosome (chromosomes 1-22, X and Y). The statistical significance of the similarity search in the human genome was obtained from the BLASTn results.
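BLAST reports the significance of a hit primarily as an expect value (E-value) derived from Karlin-Altschul statistics, from which a p-value follows. The standard relations are given below for reference; they are general BLAST statistics, not values computed in this study. Here S is the raw alignment score, K and λ are constants of the scoring system, S' is the bit score, m is the query length (16 bp here) and n is the total database length (for example, the concatenated human chromosomes).

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m\,n\,2^{-S'}, \qquad
p = 1 - e^{-E}
```

In practice BLAST substitutes effective lengths for m and n to correct for edge effects, and for small E the p-value is approximately equal to E.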

RESULTS AND DISCUSSION
The targets were identified through an extensive literature search, from which the motifs were selected on the basis of their roles in biochemical and metabolic processes within the Plasmodium parasites 15 . The motifs are sequences bound by transcription factors and are conserved between evolutionarily related species (Toucan ref). A length of 16 base pairs was chosen because it corresponds to about one and a half turns of the DNA helix, which is intended to provide specificity and selectivity for the Plasmodium parasite 16 . The evolutionarily conserved short sequence motifs (SSMs) TGCATGCA, GTGCAC and GTGCGTGC were experimentally determined and validated. Locating these SSMs in the downloaded Plasmodium falciparum genome gave the following 16 base pair oligonucleotide sequences: CTGATGCATGCATTTA, ATTTCGTGCACCTACA and GTGTGTGCGTGCGGAT 16 . The 16 base pair oligonucleotides were selected on the criterion that a conserved single-copy gene lacking close matches in the same genome is most likely to be indispensable 14 . They were selected as targets because of their specific roles in regulating transcription: the TGCATGCA motif is involved in sporozoite formation, the second motif, GTGCAC, is essential for and regulates merozoite invasion of the host cell, and GCACGCGTGC is involved in regulating dephosphorylation. These motifs were found to be conserved in four Plasmodium species, namely P. falciparum, P. vivax, P. knowlesi and P. yoelii. ACCTGTGCATGCAGGGAA (TS3), which is derived from the GTGCAC motif, occurs in P. vivax and was identified in the literature 15 , is also explored as a DNA target in this study. Computational analysis showed the raw sequence motifs to be selective.
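The reported 16-mers are consistent with taking equal flanks on each side of the motif: 4 + 8 + 4 bp for TGCATGCA and GTGCGTGC, and 5 + 6 + 5 bp for GTGCAC. The sketch below assumes that symmetric-flank rule, which is inferred from the sequences rather than documented as the authors' procedure. `centered_windows` is a hypothetical helper, and motif hits too close to a sequence end to be fully flanked are skipped.

```python
from typing import List


def centered_windows(seq: str, motif: str, width: int = 16) -> List[str]:
    """Return each width-bp window that holds an exact motif hit at its centre."""
    seq, motif = seq.upper(), motif.upper()
    extra = width - len(motif)
    if extra < 0:
        raise ValueError("width must be at least the motif length")
    up = extra // 2        # bases kept on the 5' (upstream) side
    down = extra - up      # bases kept on the 3' (downstream) side; gets any odd base
    windows = []
    start = seq.find(motif)
    while start != -1:
        left, right = start - up, start + len(motif) + down
        if left >= 0 and right <= len(seq):  # skip hits too close to either end
            windows.append(seq[left:right])
        start = seq.find(motif, start + 1)
    return windows


if __name__ == "__main__":
    toy = "GGGG" + "CTGATGCATGCATTTA" + "CCCC"  # hypothetical sequence embedding TS1
    print(centered_windows(toy, "TGCATGCA"))    # ['CTGATGCATGCATTTA']
```

The same arithmetic underlies the choice of length: with roughly 10.5 bp per turn of B-DNA, 16 bp spans about 16 / 10.5 ≈ 1.5 turns.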
The frequency of occurrence of each motif in the Plasmodium falciparum genome was computed, and the target sequences were obtained at the same time from their locations on the various chromosomes. TGCATGCA occurs 26 times (Table 2.0), GTGCAC 89 times (Table 5) and GTGCGTGC only twice (Table 6) in the Plasmodium falciparum genome.
In contrast, each of the 16 base pair oligonucleotides derived from these motifs (CTGATGCATGCATTTA, ATTTCGTGCACCTACA and GTGTGTGCGTGCGGAT) occurs only once in the Plasmodium falciparum genome (Table 4, Table 5 and Table 6), thus fulfilling the criteria of essentiality and single copy in P. falciparum. Further bioinformatics analysis determined their frequency of occurrence in the human genome, in chromosomes 1-22 and the X and Y chromosomes. The results are presented graphically in Figure 1, Figure 2 and Figure 3 for the 16 base pair oligonucleotides CTGATGCATGCATTTA (TS1), ATTTCGTGCACCTACA (TS2) and GTGTGTGCGTGCGGAT (TS4), respectively. Plots of the frequency of occurrence of the raw motifs against base pair number revealed, for TS1 and TS2, a clear distinction between the Plasmodium falciparum genome (pG) and the human chromosomes (HG 1-22, X and Y). These oligonucleotides are therefore selective drug targets in the Plasmodium species in which they are conserved, and drugs developed against them are expected not to be toxic in humans. Sporozoite formation involves TS1 (CTGATGCATGCATTTA). Merozoite formation, however, involves ACCTGTGCATGCAGGGAA (TS2) and ACCAGGGTGCACAAGCA (TS3), while CATTTCGTGCACCTACAT (TS4) regulates the dephosphorylation process in the Plasmodium parasite. The sequences were located on their respective chromosomes in the genome, and the 16 base pair targets were obtained with a program that extracts the sequence in the 5'-3' direction from the motif as well as in the 3'-5' direction from the motif (Tables 1 to 3). BLASTn revealed a similarity and identity of not less than 80% in the other species: CTGATGCATGCATTTA has 100% similarity in P. vivax, P. malariae, P. ovale and P. knowlesi, and 94% in P. yoelii.
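For an ungapped alignment of two 16 bp sequences, percentage identity is the number of matching positions divided by 16. A single mismatch gives 15/16 = 93.75%, which rounds to 94%, so the figure reported for P. yoelii is consistent with one mismatch over the 16-mer. The sketch below illustrates only that arithmetic: BLASTn computes identity over its own, possibly gapped, alignment, and the one-mismatch variant used here is hypothetical, not the actual P. yoelii sequence.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    # Compare position by position, ignoring letter case
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    ts1 = "CTGATGCATGCATTTA"
    variant = "CTGATGCATGCATTTT"  # hypothetical one-mismatch variant of TS1
    print(percent_identity(ts1, ts1))             # 100.0
    print(percent_identity(ts1, variant))         # 93.75
    print(round(percent_identity(ts1, variant)))  # 94
```

Because each base contributes 6.25 percentage points in a 16-mer, the only achievable identities near the 80% threshold are 81.25% (13/16) and 75% (12/16), so the threshold effectively allows at most three mismatches.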