ArabiTag Help Topics and FAQ


What is ArabiTag?

ArabiTag is a database and Web site designed for exploring EST support for alternative splicing (AS) in Arabidopsis thaliana. ArabiTag is based on detection and analysis of AS Events proposed in the TAIR9 gene models from the Arabidopsis Information Resource (TAIR).

What data does ArabiTag contain?

The ArabiTag system includes Arabidopsis thaliana gene models from the TAIR9 collection and alignments for ESTs that were publicly available from dbEST in July 2009.

How were ESTs aligned?

We aligned ESTs onto the TAIR9 genomic sequence using blat (from UCSC Genome Bioinformatics) with default parameter settings and maxIntron=35000.
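For readers who want to reproduce this step, the alignment settings above can be expressed as a blat command line. The sketch below only builds the argument list (blat's documented positional order is database, query, output file); the FASTA and PSL file names are hypothetical, and running it requires a local blat executable on your PATH.

```python
import subprocess
from typing import List

def blat_command(genome_fa: str, ests_fa: str, out_psl: str,
                 max_intron: int = 35000) -> List[str]:
    """Build a blat command: default settings except for the maximum intron size."""
    # Positional order: database (genome), query (ESTs), output (PSL file).
    return ["blat", genome_fa, ests_fa, f"-maxIntron={max_intron}", out_psl]

def run_blat(genome_fa: str, ests_fa: str, out_psl: str) -> None:
    # Requires blat on PATH; raises CalledProcessError if blat exits with an error.
    subprocess.run(blat_command(genome_fa, ests_fa, out_psl), check=True)
```

For example, `blat_command("TAIR9.fa", "ests.fa", "ests.psl")` yields `["blat", "TAIR9.fa", "ests.fa", "-maxIntron=35000", "ests.psl"]`.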

How were EST alignments filtered?

We excluded low-quality alignments, defined as alignments with less than 95% identity or less than 90% coverage. (Coverage is the percentage of the query EST that was included in its alignment.) We also discarded ESTs that generated more than one high-quality alignment. Only ESTs that showed evidence of splicing (alignment gaps of 30 or more bases) were considered for splicing support.
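The three filters above can be sketched as follows. This is an illustration, not ArabiTag's code: it assumes percent identity, percent coverage, and the lengths of the genomic gaps have already been computed for each alignment (blat's PSL output does not report identity or coverage directly), and the `Alignment` record is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Alignment:
    est_id: str
    identity: float          # percent identity of the aligned bases
    coverage: float          # percent of the EST included in the alignment
    gaps: Tuple[int, ...]    # lengths of gaps in the genomic sequence

def filter_alignments(alns: List[Alignment], min_identity: float = 95.0,
                      min_coverage: float = 90.0, min_intron: int = 30):
    """Return (unique high-quality alignments, the subset that looks spliced)."""
    good = [a for a in alns if a.identity >= min_identity and a.coverage >= min_coverage]
    # Discard ESTs with more than one high-quality alignment.
    counts = Counter(a.est_id for a in good)
    unique = [a for a in good if counts[a.est_id] == 1]
    # Only alignments with a gap of >= min_intron bases count as spliced.
    spliced = [a for a in unique if any(g >= min_intron for g in a.gaps)]
    return unique, spliced
```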

How do ESTs support introns?

In ArabiTag, an EST supports an intron when its genomic alignment contains a gap with boundaries that coincide with the boundaries of the intron and there are at least twenty aligned bases on either side of the gap.
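The intron-support rule can be sketched in a few lines, representing an EST alignment as its aligned genomic blocks in interbase coordinates. One assumption to note: "at least twenty aligned bases on either side" is read here as the blocks immediately adjacent to the gap each being at least 20 bases long; the function name is hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # aligned genomic block, interbase (start, end)

def supports_intron(blocks: List[Block], intron: Block, min_flank: int = 20) -> bool:
    """True if a gap between consecutive blocks matches the intron exactly and
    both neighbouring blocks contain at least min_flank aligned bases."""
    blocks = sorted(blocks)
    for (s1, e1), (s2, e2) in zip(blocks, blocks[1:]):
        if (e1, s2) == intron and e1 - s1 >= min_flank and e2 - s2 >= min_flank:
            return True
    return False
```

For example, an EST aligned as blocks (0, 50) and (150, 200) supports the intron (50, 150), but one aligned as (40, 50) and (150, 200) does not, because only 10 bases are aligned upstream of the gap.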

What are AS Choices and AS Events in ArabiTag?

AS Events and AS Choices are concepts ArabiTag uses to classify and analyze the alternative splicing (AS) choices proposed in the Arabidopsis gene models.

ArabiTag defines an AS Event as a scenario in which two gene models with the same locus id disagree with respect to the splice boundaries of an exon in one model and an intron in the other. More formally, an AS Event consists of an intron in one model that overlaps an exon in another model, where the region of overlap is internal to both models. (The requirement that the region be internal to both helps ArabiTag avoid confusing alternative promoter and polyadenylation choices with alternative splicing.)
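A minimal sketch of this comparison for one pair of models from the same locus, with each model given as a list of exons in interbase coordinates. The meaning of "internal to both" is an assumption here: the overlap must not touch the first or last base position of either model, which excludes, for example, a model whose first exon begins inside the other model's intron. The function and type names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]            # interbase (start, end)
Model = Tuple[str, List[Interval]]    # (model name, exons)

def introns(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive exons."""
    exons = sorted(exons)
    return [(a[1], b[0]) for a, b in zip(exons, exons[1:])]

def find_as_events(model_a: Model, model_b: Model):
    """Return (GA name, GP name, ioverlap, eoverlap, difference region) tuples."""
    events = []
    for (ga_name, ga_exons), (gp_name, gp_exons) in ((model_a, model_b), (model_b, model_a)):
        ga_span = (min(s for s, _ in ga_exons), max(e for _, e in ga_exons))
        gp_span = (min(s for s, _ in gp_exons), max(e for _, e in gp_exons))
        for intron in introns(ga_exons):
            for exon in sorted(gp_exons):
                start, end = max(intron[0], exon[0]), min(intron[1], exon[1])
                if start >= end:
                    continue  # this intron and exon do not overlap
                # Assumed meaning of "internal to both": the overlap touches
                # neither end of either model.
                if all(lo < start and end < hi for lo, hi in (ga_span, gp_span)):
                    events.append((ga_name, gp_name, intron, exon, (start, end)))
    return events
```

With model A = exons (0, 100) and (200, 300), and model B = exons (0, 150) and (200, 300), the sketch reports one event: A is GA, B is GP, and the Difference Region is (100, 150).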

Each AS Event has two AS Choices, one for each model with conflicting exon and intron boundaries.

What are Gene Absent and Gene Present in ArabiTag?

As described above, an AS Event (in ArabiTag) means a situation where two gene models from the same locus id disagree with respect to the splice boundaries of an exon in one model and an intron in another. More formally, an AS Event comprises an intron in one gene model that overlaps an exon in another.

The gene model that contains the overlapping exon is called Gene Present (abbreviated GP) and the gene model that contains the overlapping intron is called Gene Absent (abbreviated GA).

The overlapping exon from GP is called the exon overlap (abbreviated eoverlap), and the intron it overlaps in GA is called the intron overlap (abbreviated ioverlap).

The segment of genomic sequence where the intron and exon overlap is called the Difference Region, because it represents a place where the two models differ: the GP model includes this region in its mature, spliced sequence, while the GA model removes it during splicing.

Except in cases where the splicing event involves a retained intron, the GP model also contains an intron which overlaps the ioverlap present in the GA model. We designate the intron contained in the GP model that overlaps ioverlap as the intron alternative, or ialt.

What is an AS Event Id?

Each AS Event we detect by comparing pairs of gene models from the same locus receives a numeric identifier called an AS Event Id. ArabiTag uses these numeric identifiers to keep track of AS Events and reports them as part of most query results.

Typically, clicking a link such as http://www.transvar.org/arabitag/ASComparison.jsp?AS_Id=187 retrieves a page that describes an individual AS Event and reports the EST libraries containing ESTs that support the AS Choices associated with that AS Event.
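If you want to generate these links yourself, for example from an exported list of AS Event Ids, the pattern above can be reproduced as follows. The sketch only assumes the URL format shown in the example link; the function name is hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://www.transvar.org/arabitag/ASComparison.jsp"

def as_event_url(as_id: int) -> str:
    """Build the AS Event comparison page URL for a numeric AS Event Id."""
    return f"{BASE}?{urlencode({'AS_Id': as_id})}"
```

For example, `as_event_url(187)` returns the link shown above.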

What is a Difference Region in ArabiTag?

Each AS Event is associated with a single Difference Region, defined as the region of overlap between an exon in GP (Gene Present) and an intron in GA (Gene Absent). As described above, we define AS Events as pairs of gene models arising from the same locus in which one model contains an exon that overlaps an intron in the other. (Please see "What are Gene Absent and Gene Present in ArabiTag?" and "What are AS Choices and AS Events in ArabiTag?")

A Difference Region is defined by the chromosome on which it resides and its start and end positions along that chromosome. Please note that we use interbase coordinates to specify genomic positions. For more information about interbase coordinates and how they work, please see the Chado genomic database documentation.
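In interbase coordinates, positions count the spaces between bases starting at 0, so a region's length is simply end minus start, and a region from 100 to 150 covers the same 50 bases that a 1-based, inclusive system would call 101 to 150. The helper names below are hypothetical.

```python
from typing import Tuple

def interbase_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert an interbase (0-based, half-open) interval to 1-based, inclusive coordinates."""
    return start + 1, end

def region_length(start: int, end: int) -> int:
    """In interbase coordinates the length is end - start (0 for an insertion point)."""
    return end - start
```

Python string slicing uses the same convention, so `chrom_seq[start:end]` returns exactly the bases of a Difference Region.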

How does an EST support an AS Choice or an AS Event?

An EST supports an AS Event when it overlaps the event's Difference Region and supports one or the other of the AS Choices associated with the event. To support an AS Event, an EST must show some evidence of splicing; that is, its genomic alignment must contain at least one gap that suggests splicing (the removal of an intron).

A spliced EST supports the AS Choice exemplified by the GA model provided its genomic alignment contains a gap (an intron) that coincides with the ioverlap for that AS Event. An EST supports the other choice (exemplified by GP) provided its alignment contains a gap that coincides with the ialt contained in GP. However, if the AS Event involves a retained intron, there is no ialt. In that case, an EST supports retention of the intron provided its alignment extends at least 20 bases on either side of the 5' or 3' boundary of the retained intron, the sequence that GP contains but GA splices out.
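The rules above can be sketched as a single function that assigns one spliced EST to a choice. This is an illustration under stated assumptions, not ArabiTag's code: the EST is given as aligned genomic blocks in interbase coordinates, gaps of 30 or more bases count as introns, and, for brevity, the 20-base flank check used for ordinary intron support is not repeated here. The function name and parameters are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # interbase (start, end)

def est_choice(blocks: List[Interval], ioverlap: Interval,
               ialt: Optional[Interval] = None,
               min_intron: int = 30, min_extend: int = 20) -> Optional[str]:
    """Return "GA", "GP", or None for one EST and one AS Event.

    ialt is the GP intron that overlaps ioverlap, or None for a retained intron.
    """
    blocks = sorted(blocks)
    gaps = {(e1, s2) for (_, e1), (s2, _) in zip(blocks, blocks[1:])
            if s2 - e1 >= min_intron}
    if not gaps:
        return None  # unspliced ESTs are not used as support
    if ioverlap in gaps:
        return "GA"  # the EST splices out the GA intron
    if ialt is not None:
        return "GP" if ialt in gaps else None
    # Retained intron: one aligned block must run across the 5' or 3'
    # boundary of the intron with at least min_extend bases on each side.
    for boundary in ioverlap:
        if any(s <= boundary - min_extend and boundary + min_extend <= e
               for s, e in blocks):
            return "GP"
    return None
```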

What does it mean for a library to support an AS Choice or an AS Event?

AS Event Report pages present a breakdown of the numbers of ESTs from different libraries that support different alternative splicing choices. This information is useful when you want to get an idea of the expression patterns of individual splicing choices. For example, if a library contains many ESTs that support just one variant of an alternatively spliced gene, this may provide a clue about tissue- or condition-specific regulation of splicing for that gene. One limitation of this analysis, however, is that most libraries derive from diverse mixtures of sample types, so it is very difficult to make strong inferences about tissue- or condition-specific splicing from ESTs. This may change as more labs contribute high-throughput sequencing data sets to GenBank.

The AS Event Report pages in ArabiTag report library support for AS Choices in three sections. The first section lists libraries that contain a mix of ESTs supporting the GA and GP choices. The next section lists libraries that support GA only, and the last section lists libraries that support GP only. ArabiTag also reports the number of ESTs from each library in each section.
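The three report sections can be reproduced from a list of (library, choice) pairs, one pair per supporting EST. The sketch assumes each EST has already been assigned to "GA" or "GP"; the library names in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def library_sections(est_support: Iterable[Tuple[str, str]]):
    """Split libraries into mixed, GA-only, and GP-only sections with EST counts."""
    counts: Dict[str, Counter] = {}
    for library, choice in est_support:
        counts.setdefault(library, Counter())[choice] += 1
    # Counter returns 0 for a missing key, so absent choices read as 0.
    mixed = {lib: {"GA": c["GA"], "GP": c["GP"]}
             for lib, c in counts.items() if c["GA"] and c["GP"]}
    ga_only = {lib: c["GA"] for lib, c in counts.items() if c["GA"] and not c["GP"]}
    gp_only = {lib: c["GP"] for lib, c in counts.items() if c["GP"] and not c["GA"]}
    return mixed, ga_only, gp_only
```

For example, ESTs from lib1 supporting both choices, two lib2 ESTs supporting GA, and one lib3 EST supporting GP place lib1 in the mixed section, lib2 in the GA-only section, and lib3 in the GP-only section.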

What is an AS Event Type?

ArabiTag classifies each AS Event as one of the following types: AS: Alternative Acceptor Site; DS: Alternative Donor Site; RI: Retained Intron; ES: Exon Skipping. Because exon skipping involves changing the donor and acceptor sites flanking the skipped exon, ArabiTag reports exon skipping events as DS/ES and AS/ES.

What do the p values mean?

For AS events where there is at least one EST that supports one or the other choice, ArabiTag attempts to assess the degree of skew toward one choice or the other using a test based on the binomial distribution.

If both choices are about equally likely, then EST support should be split about evenly between them. This is analogous to flipping a coin many times to find out whether it is fair: with a fair coin, you expect about the same number of heads as tails, allowing for some random variation. ArabiTag tests whether splicing is similarly "fair" using a binomial test of the null hypothesis that the two choices are equally likely. Exceedingly small p values from this test indicate that the observed split of ESTs would be very unlikely if the two choices were equally likely, suggesting that one choice is used much more often than the other.
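A minimal sketch of an exact binomial test with success probability 0.5, computed from the EST counts for the two choices. The page does not say whether ArabiTag's test is one- or two-sided; this sketch assumes two-sided. Following the behaviour described under null p values, it returns None when no ESTs support either choice. SciPy users could instead call `scipy.stats.binomtest`.

```python
from math import comb
from typing import Optional

def binomial_p(n_ga: int, n_gp: int) -> Optional[float]:
    """Two-sided exact binomial p value for H0: both choices equally likely."""
    n = n_ga + n_gp
    if n == 0:
        return None  # no supporting ESTs, so no p value is computed
    k = min(n_ga, n_gp)
    # With p = 0.5 the distribution is symmetric, so the two-sided p value
    # is twice the smaller tail, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, 10 ESTs supporting GA and none supporting GP give p = 2/1024, about 0.002, while a 1:1 split gives p = 1.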

Also, please note that the test was performed many times, once per AS Event. (This is called multiple hypothesis testing and has been discussed exhaustively in the statistics literature.) Even if everything were completely random, the test would yield p values below alpha = 0.05, the standard cutoff for judging a test statistically significant, around 5% of the time. So if you performed a thousand tests on perfectly random data, you would expect a p value less than or equal to 0.05 in about 50 cases. For this reason, you should view the p values as a very rough guide to bias in AS and set a very low alpha for assessing the significance of any single test. (In the paper, we used 9e-6, roughly 0.05 divided by the number of unique tests.)
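Dividing alpha by the number of tests, as described above, is the Bonferroni correction. The sketch below applies it to a list of p values, skipping null p values when counting tests; it is an illustration only, and the function names are hypothetical.

```python
from typing import List, Optional

def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

def flag_significant(p_values: List[Optional[float]], alpha: float = 0.05) -> List[bool]:
    """Flag p values at or below the Bonferroni cutoff; null p values are never flagged."""
    tested = [p for p in p_values if p is not None]
    if not tested:
        return [False] * len(p_values)
    cutoff = bonferroni_cutoff(alpha, len(tested))
    return [p is not None and p <= cutoff for p in p_values]
```

With a thousand tests, for example, the per-test cutoff becomes 0.05 / 1000 = 5e-5.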

Why do some AS Events have null p values?

Not all AS Events are covered by spliced ESTs supporting both or even just one of the AS Event's alternative splicing choices. If there are no ESTs that support either one of the choices, then ArabiTag does not attempt to compute a p value assessing the bias for or against a particular choice, since there is no data available (no ESTs) relevant to that event.

If you are searching the AS Events using the Search AS Events query tool and would like to retrieve all AS Events and Difference Regions proposed in the TAIR9 gene models without regard to their EST support, then check the "include Null p-value (unsupported) events" option.

What are minimum ESTs?

The Search AS Events page includes an option to set the minimum number of ESTs supporting each AS choice or the entire AS event.

If you choose the "Each AS Choice" option, then only those AS Events for which each alternative AS Choice has the requested minimum number of supporting ESTs will be returned. You should select this option when you want to retrieve just those AS Events where each AS Choice has a given level of support from the ESTs relevant to its AS Event.

Alternatively, if you choose the "Entire AS Event" option, then only AS Events with at least the requested minimum number of overlapping ESTs that support one or the other choice will be returned. You should use this option if you just want to look up all the AS Events with a given minimum number of relevant ESTs, regardless of the specific alternative splicing choice they support.
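The difference between the two options can be expressed as a simple predicate on the EST counts for the two choices of one AS Event. This mirrors the behaviour described above; the function name and the `mode` values are hypothetical.

```python
def passes_min_ests(n_ga: int, n_gp: int, minimum: int, mode: str = "each") -> bool:
    """Apply a "minimum ESTs" filter to one AS Event.

    mode="each":   each AS Choice needs at least `minimum` supporting ESTs.
    mode="entire": the two choices together need at least `minimum` ESTs.
    """
    if mode == "each":
        return n_ga >= minimum and n_gp >= minimum
    if mode == "entire":
        return n_ga + n_gp >= minimum
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, an event with 5 GA-supporting ESTs and 1 GP-supporting EST fails a minimum of 3 under "Each AS Choice" but passes it under "Entire AS Event".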

What do the CSV buttons and checkboxes do?

Many ArabiTag search tools and query results pages allow you to save results to your computer's hard drive in CSV (comma-separated values) format. This makes it possible for you to import the results into Excel, R, or other programs and examine them in greater detail. Clicking or checking buttons or checkboxes labeled "csv" tells ArabiTag to save the output of the query as a CSV file on your hard drive instead of displaying the results as a Web page in your browser.

What kind of information is in an EST library?

ArabiTag includes ESTs from many diverse libraries prepared from many different sample types, all from Arabidopsis thaliana. On many pages, you will see EST libraries reported by their dbEST numeric id or name, and you will often see links to these libraries' records in the UniGene database at NCBI.

At this time, NCBI dbEST does not enforce strict standards for the information about sample origin that must be submitted with newly sequenced ESTs. However, it is common to find data such as a description, tissue type, and developmental stage.

How are EST libraries compared?

When investigating an alternative splicing event of interest, you may find it useful to compare the libraries from which the ESTs supporting the event are derived. ArabiTag separates the ESTs relevant to an AS Event by the gene model they support and the library they come from. It then compares the sets of libraries for the two choices and reports which libraries support both choices (shared libraries) and which support only one of the choices (unique libraries). Differences between these sets could hint that the splicing choice depends on the tissue or condition sampled.

Using Integrated Genome Browser to view AS Events and overlapping ESTs.

Next to each Difference Region, you should see an IGB icon. If you have already launched IGB and loaded the TAIR9 mRNA annotation data set, then clicking the IGB icon will tell IGB to scroll and zoom to that Difference Region and the associated gene models.

To view Difference Regions, gene models, and ESTs, follow these instructions:

  1. Launch IGB. Visit http://igb.bioviz.org, click the Download link, and on the next page click "Start with Java Web Start."
  2. Choose the TAIR9 genome. Click the Data Access tab, choose Arabidopsis thaliana, and then choose genome version A_thaliana_Jun_2009.
  3. Load gene models. Click the BioViz Quickload data source under the Data Access tab and click the checkbox next to the TAIR9 mRNA data set. It should now appear in the Load Mode table in the center of the display. Choose Whole Genome as the load mode. The gene models should then download into the IGB display.
  4. Load spliced ESTs. Click the BioViz DAS2 data source under the Data Access tab and click the checkbox next to the data set named spliced ESTs. The data set should now appear in the Load Mode table, next to the mRNA gene models. Choose Region in View as the load mode if it is not already selected as the default.
  5. Click an IGB link. To try it out, click the IGB icon next to any Difference Region. IGB should then zoom and scroll to the region specified in the IGB link.
  6. Load spliced ESTs for the region in view. Click the Refresh Data button in the Data Access panel or next to the zoomer at the top of the display. To zoom in or out, click-drag the slider at the top of the display. To stretch the display vertically (necessary if you have a lot of ESTs), operate the vertical slider on the left side of the display.
  7. Adjust Max Expand. To save space, IGB by default draws some ESTs on top of each other when a region contains many ESTs. To see all the ESTs, right-click (or control-click) the ESTs tier label and choose Adjust Max Expand. Set Max Expand to 0 to ensure that no ESTs are drawn on top of each other.

Searching AS Events

The Search AS Events tool allows you to search for AS Events and Difference Regions by p value, by AS Event type, and by the number of ESTs supporting one or both choices.

To retrieve all AS Events, including AS Events that have no overlapping ESTs, check "Include Null p-value AS Events". This will ensure that all events will be returned regardless of whether there were any ESTs supporting them.