Leveraging Genomic and Microarray Data to Find Direct Targets of the C. elegans Transcription Factor TBX-2
Thesis posted on 11.06.2014, 00:00 by Thomas J. Ronan
Four independent methods were used to produce a data set enriched for direct targets of C. elegans TBX-2. First, a microarray experiment comparing wild-type and tbx-2 mutant embryos was analyzed, yielding a set of differentially expressed genes enriched for genes downstream of tbx-2. Second, a model for a class of T-box transcription factor binding sites was created to predict all likely binding sites in the C. elegans genome; genes lacking binding sites in their regulatory regions were then ruled out as direct targets. Third, the conservation of predicted T-box binding sites was analyzed to identify the sites most likely to be biologically functional. Fourth, an existing embryonic time-course microarray data set was used to identify genes whose expression levels and timing are compatible with being direct targets of TBX-2. Each of these methods can be used independently to identify a subset of genes enriched for targets of TBX-2, but each is limited in scope and can identify false targets. The two microarray-based methods rely on expression level or timing but lack information about the presence of binding sites through which TBX-2 could act directly. The methods based on binding site prediction rely on the presence of binding sites to predict direct targets but cannot take expression information into account. Because the methods rest on different assumptions, they should not share the same false positive predictions, so a combination of methods should provide a more reliable prediction. Finally, the methods are combined, which should yield a pool of potential targets enriched for direct targets of TBX-2.
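The abstract does not state how the four result sets are combined. As a minimal sketch of one plausible approach, the Python below counts how many methods support each gene and keeps those that meet a chosen threshold. The function name `combine_evidence`, the `min_support` parameter, the method labels, and the gene identifiers (`geneA` to `geneD`) are all hypothetical placeholders for illustration, not names or data from the thesis.

```python
from collections import Counter


def combine_evidence(method_hits, min_support=None):
    """Return the genes reported by at least `min_support` methods.

    method_hits maps a method label to the set of genes it reports.
    With min_support=None, a gene must be reported by every method
    (a strict intersection). This voting rule is an assumption, not
    the procedure used in the thesis.
    """
    if min_support is None:
        min_support = len(method_hits)
    # Count, for each gene, the number of methods that report it.
    counts = Counter(gene for hits in method_hits.values() for gene in set(hits))
    return {gene for gene, n in counts.items() if n >= min_support}


# Hypothetical placeholder data: labels and gene names are invented.
method_hits = {
    "differential_expression": {"geneA", "geneB", "geneC"},
    "binding_site_present": {"geneA", "geneB", "geneD"},
    "binding_site_conserved": {"geneA", "geneB"},
    "time_course_compatible": {"geneA", "geneC", "geneD"},
}

strict = combine_evidence(method_hits)                  # supported by all four methods
relaxed = combine_evidence(method_hits, min_support=3)  # supported by at least three
```

Requiring support from every method favors specificity, since a false positive would have to arise independently under each method's assumptions. Lowering `min_support` trades some of that specificity for sensitivity.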