Introducing IGB for Arabidopsis: a resource for Arabidopsis genome data and visualization

Ann Loraine(1), Gregg Helt(2), Ed Erwin(2), Sue Rhee(3), Eva Huala(3)

(1) Section on Statistical Genetics, Biostatistics Department, University of Alabama at Birmingham
(2) Affymetrix, Inc., Emeryville, CA
(3) The Arabidopsis Information Resource, the Carnegie Institute, Stanford, CA

Companion Web site for a poster presented at the 2005 Arabidopsis Research Conference. Here is a link to the poster abstract.

Last update: May 24, 2005

Introduction

The Integrated Genome Browser (IGB, pronounced 'Ig-Bee') is a Java software program that displays genes, sequence data, and microarray expression data all in the same view. It allows dynamic, real-time zooming and panning through a chromosome, link-outs to external databases, searching, and many other features and functions. You can use IGB to view data from Distributed Annotation Servers and IGB-specific Quickload sites as well as from custom data files you create, provided they follow an IGB-compatible format.

Some history

IGB got its start as a visualization application for genome tiling arrays produced by Affymetrix and as a replacement for the Neomorphic Annotation Station, which TIGR curators used to annotate early releases of the Arabidopsis genome. In September 2004, Affymetrix released the source code for IGB on their Web site under the OSI-approved Common Public License Version 1.0. The following January, the IGB team imported the code into the newly created Genoviz project CVS repository on SourceForge. IGB and the library used to create it - an updated version of the old Neomorphic Genome Software Development Kit - are now open source software and are freely available for anyone to use.

A warning to users!

If you are not a developer or bioinformaticist, please approach IGB with caution, or at least an open mind. The program is extremely complex and still very experimental. As a result, you may find it somewhat confusing to use. In addition, there are some known bugs - errors in the software that we have not yet had time to fix. And of course, there are probably bugs we simply don't know about yet. For a list of known bugs, or to report new ones, please use our bug-tracking system at the Genoviz site.

Getting more info

There has not been enough time (or funding) to make the program more accessible to newcomers, non-programmers, or non-bioinformaticists. We are doing our best, however, and if you can suggest improvements, the IGB developers would be happy to have them. We are planning to set up a feature request site, but in the meantime you can email requests to Ann Loraine. (See email address below.)

If you find you are having problems, the IGB users' manual is a good place to look for help. The manual describes the visualization and statistical methods in IGB and really is the best way to get to know the program. You can download a copy here. For an introduction to some of the ideas IGB implements, here are some potentially useful references:

These papers discuss some IGB antecedents, but the principles for visualization are more or less the same.

Methods for obtaining IGB

Method 1: Java Web Start

Probably the easiest way to get a copy of IGB is to use the Java Web Start page hosted by Affymetrix. To open a new window to the Java Web Start launch page at Affymetrix, click here. On this page you will find three buttons that allow you to launch IGB with different amounts of computer memory available for the program's use. It is usually a good idea to choose the largest memory option that your computer can support.

Method 2: Download and compile the source - (mostly) for developers

Another option for obtaining a working copy of IGB is to obtain the source code from the Genoviz project hosted at sourceforge.net. Use cvs to check out a copy of the genoviz module - just follow the standard anonymous checkout protocol.

Method 3: Download pre-compiled 'jar' files - also (mostly) for developers

The Genoviz site also provides pre-compiled Java '.jar' files you can use to run IGB if you do not wish to compile the source code yourself. Take a look at the 'build.xml' file for details on how to launch IGB using these pre-compiled .jar files.

Access Arabidopsis Data

Step One: Launch IGB

Launch the program using one of the methods described above: the Java Web Start page, source code you compile yourself, or the pre-compiled .jar files.

Step Two: Select Arabidopsis genome

Click the Quickload tab in the bottom panel of the display. If you want to use the default Quickload site at Affymetrix, use the pulldown menu on the left to select the Arabidopsis genome option. As of this writing, the genomes listed are not in any particular order, and Arabidopsis, since it was added more recently, is at the bottom of the list. A technical note: these genomes are listed in the same order as they appear in this Quickload server configuration file. When IGB starts up, it accesses this file over the network and builds its genome selection menu from what it finds in this file. IGB always accesses the Quickload site at Affymetrix by default, unless you tell it otherwise in your user preferences (igb_prefs.xml) file.

When you choose the Arabidopsis genome, a list of chromosomes will appear. To select a chromosome, click it.

To use an alternative Quickload server, such as the experimental Quickload site hosted at UAB, click the 'Quickload' tab and then click 'Quickload Options.' Then, enter the URL of the alternative server, e.g., http://www.transvar.org/data/quickload. The various genomes that are available at this new Quickload site will then appear in the Quickload genome menu. If you would like to set up your own Quickload server and tell IGB to access this server by default, edit your user preferences file. If you use the Affymetrix Java Web Start page to start IGB, this preferences file (named igb_prefs.xml) will be downloaded and saved in your home directory. You can edit this file to change aspects of IGB's behavior, such as the IGB Quickload site it accesses when the program launches.

Here is an example.

Note that in order for IGB to find its preferences file, the file must be named 'igb_prefs.xml' and should be saved in your home directory.

Step Three: Load some annotations data - genes or microarray probes

To load annotations, click the checkboxes to the right of the Quickload panel. Thus far these include:

To zoom in on a gene, click in the display to select an item and set the horizontal zoom focus, and then operate the horizontal slider above the main map display. When you do, you should see the selected item in greater detail. The blocks represent exons and the lines connecting them represent introns. The shorter regions on the left and right (5-prime and 3-prime) ends of the gene represent untranslated regions.

To view a TAIR Web page showing more information about the selected item, right-click the item (on Mac, control-click it) and select Get more info on the popup menu. If the popup menu that appears does not include this option, choose Select Parent and try again. (A red outline should appear around the entire gene model if the parent - the gene model - was selected.) Once you select the Get more info option, your Web browser should open to a Web page showing information about the selected gene.

Mac users: If the Get more info command doesn't work, you probably need to tell IGB how to open your default Web browser. To tell IGB how to open a Web page, set your IGB Browser command. Select the File menu, choose Preferences and click the Other Options tab. Scroll down and type in one simple command in the Browser command text area: open.

Here is an example of IGB zoomed in to show a gene. Note that the vertical slider (the vertical zoomer) is adjusted so that the vertical dimension is zoomed in a little - this makes the gene models taller and easier to see. Also note the white lines on the edges of the exons. Blocks with white lines on their boundaries represent gene models (mRNA variants) with 5-prime or 3-prime boundaries that are identical to the currently selected item, which is outlined in red.

If you click the ATH1 or AG options, then IGB will load all the probe sets (minus controls) for these two Affymetrix GeneChips. Here is a zoomed-in picture of a probe set:

The blue figure represents an alignment between the genome and the probe set design sequence, the sequence that was originally used to select probes for the chip. Typically, these design sequences are taken from known mRNAs, EST cluster consensus sequences - any sequence that is expected to be expressed. In the case of the Arabidopsis GeneChips, these sequences are usually gene models from the TIGR genome annotations.

The blue blocks represent regions where the design and genomic sequences align, and the lines represent gaps in the design sequence portion of the alignment. The pink-colored bars represent the location of individual probes. Note that when a probe is split across two alignment blocks, it is split into two parts connected by a line.
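The way a probe ends up split across two alignment blocks can be sketched in a few lines of Python. This is only an illustration of the coordinate arithmetic, not code from IGB: the Block structure, the map_probe function, and the example coordinates are all hypothetical, and the sketch assumes a forward-strand alignment made of ungapped blocks.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """One ungapped alignment block (hypothetical representation)."""
    design_start: int  # 0-based start in the design sequence
    genome_start: int  # 0-based start on the chromosome
    length: int        # number of aligned bases


def map_probe(blocks: List[Block], probe_start: int, probe_len: int) -> List[Tuple[int, int]]:
    """Map a probe from design-sequence to genome coordinates.

    Returns one half-open (start, end) genome interval for every
    alignment block the probe overlaps.
    """
    probe_end = probe_start + probe_len
    pieces = []
    for block in blocks:
        block_end = block.design_start + block.length
        lo = max(probe_start, block.design_start)
        hi = min(probe_end, block_end)
        if lo < hi:  # the probe overlaps this block
            offset = block.genome_start - block.design_start
            pieces.append((lo + offset, hi + offset))
    return pieces


# Two blocks separated by a 400-base gap on the genome (made-up numbers).
blocks = [Block(0, 1000, 100), Block(100, 1500, 80)]

inside = map_probe(blocks, 10, 25)  # lies entirely within the first block
split = map_probe(blocks, 90, 25)   # straddles the block boundary
print(inside)
print(split)
```

A probe that falls inside one block maps to a single genome interval, while a probe that straddles a block boundary maps to two intervals, which is the situation drawn as two bars connected by a line.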

Step Four: Load Sequence

To load sequence data, click "Load All Sequence Residues." Be patient - this could take a few seconds because the entire chromosome is being loaded over the network.

What's happening behind the scenes is that a '.bnib' file for the chromosome is being loaded over the network into IGB. This '.bnib' file is just the sequence data saved in a specialized compressed format designed for storing DNA sequences.
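To give a feel for why a packed binary format is smaller than plain text, here is a minimal Python sketch that stores two bases per byte (four bits each). It is not a reader or writer for the actual '.bnib' format, whose layout is not described on this page; the code table and the function names are assumptions made up for illustration.

```python
# Hypothetical 4-bit codes; the real '.bnib' encoding may differ.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
BASES = {code: base for base, code in CODES.items()}


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, two bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 2):
        high = CODES[seq[i]]
        # Pad the final byte with code 0 when the length is odd.
        low = CODES[seq[i + 1]] if i + 1 < len(seq) else 0
        out.append((high << 4) | low)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        bases.append(BASES[byte >> 4])
        bases.append(BASES[byte & 0x0F])
    return "".join(bases[:length])


seq = "ACGTN"
packed = pack(seq)
restored = unpack(packed, len(seq))
print(len(seq), "bases stored in", len(packed), "bytes")
```

Because the original length has to be stored separately to drop the padding, a real file format would normally carry it in a header.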

The other option - "Load Sequence Residues in View" - does not work yet for Arabidopsis data. For this option to work, we would need a Distributed Annotation Server to be set up for Arabidopsis. This is a server that can deliver sequence data in small chunks (called "nibbles") and thus reduces time-to-load and memory use inside the program.

Volker Brendel's group at Iowa State has expressed interest in setting up a DAS server for Arabidopsis at some point. If you know of anyone else who is working on this, please let us know.

Step Five: Load expression data

IGB is designed to display expression graph data from genome tiling arrays, microarrays that have probes (more or less) evenly spaced along the chromosome. These arrays have proven to be extremely useful for discovering new regions of transcriptional activity in many different genomes, including Arabidopsis. We are working on obtaining some genome tiling array data for the version 5 Arabidopsis data, which is what IGB currently shows (see below).

To see a preview of IGB functions related to displaying expression data, click the Graph Adjuster tab.

To load data from the Arabidopsis genome tiling arrays described in Yamada et al., "Empirical analysis of transcriptional activity in the Arabidopsis Genome," published in Science in October 2003, first open this directory. The files in this directory contain expression data from several different tissues and from the forward (Watson) and reverse (Crick) strands of the target chromosomes.

To open the files, download them to your local computer's hard drive. Then use the Open File function under IGB's File menu to import the data. To adjust how the graph looks, use the Graph Adjuster tab. This tab contains many functions useful for statistical manipulation of expression data within the viewer. For more information about IGB's statistical capabilities, read the IGB User Guide sections related to displaying and manipulating graphs.

About the graph (expression) data files

To get detailed information about how these data were generated, i.e., information about image scanning, normalization, and other pre- and post-processing steps, the best source is the Science paper's supplementary data files. You can obtain these by visiting the on-line version of the article. We are grateful to Joe Ecker and his group, especially Huaming Chen, for sharing the data, answering our questions, and helping us use it to demonstrate IGB functionality.

To get started working with this data, you need to know three basic things about how the experiments were done as well as how to view the data in IGB.

  1. Data file names containing a "C" indicate that the data are from probes selected from the Crick (reverse or bottom) strand of the chromosome. Conversely, data file names containing a "W" are from the Watson (forward or top) strand.
  2. The target (sample) preparation protocol was the same as the one used with regular Affymetrix chips. This means that the labeled cRNA is from the antisense strand of the target mRNA. Thus, Crick strand probes will hybridize to mRNAs expressed from bottom strand genes in IGB, and Watson strand probes will hybridize to mRNAs expressed from top strand genes.
  3. When you open a graph file, make sure you are showing the same chromosome that the graph is from.

Graph view settings

Once you open a graph file, it will be placed into a separate tier, or layer, in the main map window. To compare it to known annotations and find expressed genes, it is helpful to be able to drag it over the tier of annotations you would like to examine. To turn the graph tier into a draggable graph, click the graph to select it and then click the Graph Adjuster tab. Then click the Floating checkbox. To see a vertical scale showing the range of values in the graph, click the Y axis checkbox in the section labeled Decorations.

It is also helpful to adjust the scale of values the graph shows. Expression values are usually very unevenly distributed. That is, there are a few values that are very large and many values that are much smaller. If IGB must show the entire range of values in the graph, it will be hard to see expression values in the lower ranges. The extremely large values are outliers and are (usually) not very interesting. To view a more informative range of values, click the graph to select it and then, in the Visible Bounds box, click-drag the sliders next to the By Value or By Percentile boxes. This changes how large and small values are shown. For instance, if you set the Min and Max values to the 15th and 90th percentiles, then all values below and above these thresholds will be shown at the minimum and maximum heights in the graph. You can see how this works by watching how the y-axis scale changes as you adjust the Visible Bounds settings.
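The effect of the 15th and 90th percentile example can be imitated outside IGB with a short Python sketch. It is a rough illustration only: the function names are hypothetical, the data are made up, and the nearest-rank percentile definition used here is an assumption, so IGB's own calculation may differ.

```python
import math
from typing import List, Sequence


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted sequence."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def clamp_to_percentiles(values: Sequence[float], low_pct: float, high_pct: float) -> List[float]:
    """Clamp every value into the [low percentile, high percentile] range."""
    ordered = sorted(values)
    low = percentile(ordered, low_pct)
    high = percentile(ordered, high_pct)
    return [min(max(v, low), high) for v in values]


# Nine modest values and one large outlier (made-up numbers).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
clamped = clamp_to_percentiles(values, 15, 90)
print(clamped)
```

With these numbers the outlier no longer stretches the scale, so differences among the smaller values become visible.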

File name conventions

Data files are named after the target chromosome, the sample, and the target strand:

[chromosome].[sample].[strand].gr.gz

The files are also compressed. (You don't have to uncompress them before opening them in IGB.)
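If you work with many of these files, the naming convention can be checked programmatically. The following Python sketch splits a file name into its three parts and reports the strand. The function name and the example file name "chr2.AN.W.gr.gz" are hypothetical; the exact chromosome and sample spellings used in the real file names are not listed here.

```python
from typing import NamedTuple

# Strand letters as described on this page.
STRANDS = {"W": "Watson (forward, top)", "C": "Crick (reverse, bottom)"}
SUFFIX = ".gr.gz"


class GraphFileName(NamedTuple):
    chromosome: str
    sample: str
    strand: str


def parse_graph_file_name(name: str) -> GraphFileName:
    """Split '[chromosome].[sample].[strand].gr.gz' into its parts."""
    if not name.endswith(SUFFIX):
        raise ValueError("expected a name ending in " + SUFFIX + ": " + name)
    parts = name[: -len(SUFFIX)].split(".")
    if len(parts) != 3:
        raise ValueError("expected [chromosome].[sample].[strand]: " + name)
    chromosome, sample, strand = parts
    if strand not in STRANDS:
        raise ValueError("strand must be W or C: " + name)
    return GraphFileName(chromosome, sample, strand)


# Hypothetical file name following the convention.
info = parse_graph_file_name("chr2.AN.W.gr.gz")
print(info.chromosome, info.sample, STRANDS[info.strand])
```

The chromosome field gives a simple way to confirm that a graph file matches the chromosome currently shown in IGB before you open it.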

Sample types include:

Some example graph views

The image below shows a close-up on ribosomal protein gene AT2G18020 and graph data from the anther (AN) data for the Watson (top, W) strand of chromosome 2. Note that the graph adjuster settings are set to display the expression data as a stair-step pattern, which sometimes makes expression of individual genes easier to see.

Here is perhaps a more interesting example. This image shows anther-specific expression of the AT2G19110 locus, together with the probes and probe set consensus for a probe set from the ATH1 array. If you follow the links to the TAIR Web site, you will find that experiments using the ATH1 array confirm that this gene is expressed in developing flowers. Based on homology data, it appears to encode a cadmium-transporting ATPase and is localized to the membrane.

Sequence data provenance

The sequence data are from:

ftp://tairpub@ftp.arabidopsis.org/%2Fhome/tair/home/tair/Sequences/whole_chromosomes/

and so far appear to be identical to the latest available Genbank versions listed below, except for the mitochondrial sequences, which differ in length by one base. Copies of these sequences are available here.

Sequence data file provenance
Chromosome     TAIR sequence file                Genbank equivalent  Size (bp)  IGB .bnib file
1              ATH1_chr1.1con.01222004           NC_003070.5         30432563   chr1.bnib
2              ATH1_chr2.1con.01222004           NC_003071.3         19705359   chr2.bnib
3              ATH1_chr3.1con.01222004           NC_003074.4         23470805   chr3.bnib
4              ATH1_chr4.1con.01222004           NC_003075.3         18585042   chr4.bnib
5              ATH1_chr5.1con.04172003           NC_003076.4         26992728   chr5.bnib
chloroplast    ATH1_chloroplast.1con.01072002    NC_003071.3         154478     chrC.bnib
mitochondrion  ATH1_mitochondria.1con.01072002   Y08501.2            366923     chrM.bnib


Direct questions and comments to Ann Loraine.