circos - visualizing the genome, among other things


visual guide to Circos

A visual guide to Circos (Circos - an information aesthetic for comparative genomics) presents some of the capabilities of Circos and illustrates its application in the field of comparative genomics and genome visualization.

 

Download: medium bitmap (7 Mb) | huge bitmap (46 Mb) | PDF (40 Mb) | Illustrator (20 Mb). The PDF and Illustrator files are very complex.

Circos - an information aesthetic for comparative genomics - presented at Genome Informatics 2008, Hinxton, UK

what is Circos?


Video | An animated view of the similarity between human (upper half) and dog (lower half) genomes.

Figure | An image created with Circos.

Circos is designed for visualizing genomic data such as alignments, conservation, and generalized 2D data, such as line, scatter, heatmap and histogram plots. Circos is very flexible: you can use it to visualize any kind of data, not just genomics. Circos has been used to visualize customer flow in the auto industry, volume of courier shipments, database schemas, and presidential debates.

The creation of Circos was motivated by a need to visualize intra- and inter-chromosomal relationships within one or more genomes, or between any two or more sets of objects with a corresponding distance scale. Circos is similar to chromowheel and, to a lesser extent, genopix.

Circos uses a circular composition of ideograms to mitigate the fact that some data, like combinations of intra- and inter-chromosomal relationships (alignments, duplications, assembly paired-ends, etc.), are very difficult to organize when the underlying ideograms (or contigs) are arranged as lines. In many cases, it is impossible to keep the relationship lines from crossing other structures, and this reduces the effectiveness of the graphic.

Specific features are included to help with viewing data on the genome. The genome is a large structure with localized regions of interest, frequently separated by large oceans of uninteresting sequence. To help visualize data in this context, Circos can create images with variable axis scaling, permitting local magnification of genomic regions to be controlled without cropping. Scale smoothing ensures that the magnification level changes smoothly. In combination with axis breaks and custom ideogram order, the final image can be easily tuned to offer the clearest illustration of your data.

All aspects of the output image are tunable, making Circos a flexible and extensible tool for the generation of publication-quality, circularly composited renditions of genomic data and related annotations.

Circos is written in Perl and produces bitmap (PNG) and vector (SVG) images using plain text configuration and input files.

how does it work?

Circos is driven by an Apache-like, plain-text configuration file and accepts data from flat files. There is currently no graphical user interface for Circos and no plan to create one.

It is easy to plot, format and layer your data with Circos. A large variety of plot and feature parameters are customizable, helping you make the image that best communicates your data. You supply your data to Circos as flat files (e.g. GFF format), tell Circos what you want plotted using the configuration file, and then create the image.
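To make the "Apache-like" description concrete, here is a minimal Python sketch that renders a nested dictionary as a block-structured configuration fragment. It only illustrates the general shape of such a file. The block and parameter names (plots, plot, type, file, r0, r1) and the data file path are assumptions for illustration and should be checked against the Circos tutorials; render_block is a hypothetical helper, not part of Circos.

```python
def render_block(name, settings, indent=0):
    """Render a nested dict as an Apache-style <name> ... </name> block."""
    pad = "  " * indent
    lines = [f"{pad}<{name}>"]
    for key, value in settings.items():
        if isinstance(value, dict):
            # Nested dicts become nested blocks.
            lines.extend(render_block(key, value, indent + 1))
        else:
            # Scalars become "key = value" lines.
            lines.append(f"{pad}  {key} = {value}")
    lines.append(f"{pad}</{name}>")
    return lines


# Hypothetical track description; all names and values are assumptions.
plot_track = {
    "plot": {
        "type": "histogram",
        "file": "data/gene_density.txt",
        "r0": "0.80r",
        "r1": "0.95r",
    }
}

config_text = "\n".join(render_block("plots", plot_track))
print(config_text)
```

Keeping the description as plain text, rather than behind a graphical interface, means the same image can be regenerated from the configuration and data files alone.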

Figure | Great for posters too.

only for genomic data?

Circos can be applied to draw any kind of data, not just from the field of genomics. Since I work in genomics, I've been using Circos to draw the kind of data I work with. Circos is ideally suited when your data represents relationships between positions on one or more scales.

You can turn tabular data into Circos images using the online version of Circos. Transform boring tables into informative and visually compelling data graphics.

Visualization of a table with circos.

Large tables can be visualized - below is an example of a 54x14 table.

Visualization of a large table with circos.
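One plausible way to think about a table as a circular image is sketched below in Python: every row and every column becomes a segment sized by its total, and every non-zero cell becomes a ribbon joining its row segment to its column segment. This is an assumed mapping for illustration, not a description of how the online tool is implemented; table_to_ribbons and the sample table are hypothetical.

```python
def table_to_ribbons(row_labels, col_labels, table):
    """Map a table to segment sizes and ribbons.

    Each row and column becomes a segment whose size is its total,
    and each non-zero cell becomes a ribbon between its row segment
    and its column segment, with thickness given by the cell value.
    """
    row_sizes = {r: sum(values) for r, values in zip(row_labels, table)}
    col_sizes = {c: sum(column) for c, column in zip(col_labels, zip(*table))}
    ribbons = [
        (r, c, value)
        for r, values in zip(row_labels, table)
        for c, value in zip(col_labels, values)
        if value > 0
    ]
    return row_sizes, col_sizes, ribbons


# Made-up 2x3 table, for illustration only.
rows = ["A", "B"]
cols = ["X", "Y", "Z"]
cells = [
    [5, 0, 2],
    [1, 4, 0],
]

row_sizes, col_sizes, ribbons = table_to_ribbons(rows, cols, cells)
print(row_sizes)
print(col_sizes)
print(ribbons)
```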

I've applied circular compositing to represent database structure with Schemaball.

plot types

Support exists for a variety of plot types, such as paired-location, scatter, line, histogram, heat map, tile, glyph and text element plots. Plots may be combined in a single track and multiple tracks are supported. Colours and positions of individual elements can be tuned to suit your application.

Figure | Some examples of Circos plots. (A) glyph (B) highlight with depth control (C) scatter (D) paired-location (E) ribbon (F) histogram (G) tile (H) highlight with auto depth (I) text with auto arrange (J) heat map (K) high-density text (L) high-density glyph (M) multi-type composite (N) variable scale control (O) fine geometry control (P) flexible text and element placement (Q) transparent ribbons (R) stacked histogram (S) connectors (T) tick rings.

Rules can be written to adjust formatting of plot elements based on position, value and formatting. You can control data characteristics (such as color, text size, position, etc.) based on rules that may depend on initial data values.
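The idea behind such rules can be sketched in a few lines of Python: each rule pairs a condition on a data point with a set of formatting overrides. This is a conceptual illustration only, not Circos's rule syntax or implementation. The field names (chrom, value, color, glyph_size), the thresholds, and the choice that the first matching rule wins are all assumptions made for the example.

```python
def apply_rules(point, rules):
    """Return a copy of `point` restyled by the first rule whose condition matches."""
    styled = dict(point)
    for condition, overrides in rules:
        if condition(point):
            styled.update(overrides)
            break  # assumption: first matching rule wins
    return styled


# Hypothetical rules: (condition on the data point, formatting overrides).
rules = [
    (lambda p: p["value"] > 0.8, {"color": "red", "glyph_size": 12}),
    (lambda p: p["chrom"] == "hs17", {"color": "orange"}),
]

# Made-up data points, for illustration only.
points = [
    {"chrom": "hs1", "start": 100, "end": 200, "value": 0.9, "color": "grey"},
    {"chrom": "hs17", "start": 300, "end": 400, "value": 0.2, "color": "grey"},
    {"chrom": "hs2", "start": 500, "end": 600, "value": 0.1, "color": "grey"},
]

styled = [apply_rules(p, rules) for p in points]
for s in styled:
    print(s["chrom"], s["value"], s["color"])
```

Points that match no rule keep their original formatting, so rules only need to describe the exceptions.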

global and local zooming

Circos is unique in its support for both global and local axis scale deformation. This is illustrated in the set of figures below, where the magnification of ideograms and regions of ideograms can be independently adjusted to accentuate or attenuate the visual impact of information.

Figure | You can draw ideograms with no scaling effects (left), with a global scale change applied to one or more ideograms (middle), and additionally add any number of local scale adjustments to enlarge/compress individual regions of ideograms (right). When applying local scale changes, the magnification can be smoothly varied across the zoom region.
Figure | The purpose of scale stretching is to expand regions which contain interesting data patterns. As one region is stretched, others are contracted to maintain the entire data domain in view. In this figure, the locations of genes (green), disease genes (orange) and cancer genes (red) are plotted on chr17 with the region in the vicinity of 35 Mb repeatedly expanded. Genes are drawn using highlights with radial position representing the number of exons in the gene.
Figure | Scale stretching is very visually appealing when combined with images that depict spatial relationships. Shown here is the similarity of human chromosome 1 (hg17) to the entire genome of the mouse (mm5). Lines represent alignment chains between human and mouse regions and are color coded by the identity of the mouse chromosome on which they impinge. Regions of human chromosome 1 and mouse chromosome 5 are expanded to show details in the alignments.
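The trade-off described in the captions above (stretching one region contracts the others so the whole axis stays in view) can be sketched as a small Python calculation. This is a simplified model, not Circos's algorithm: it assumes the axis is split into regions with a constant magnification each, ignores scale smoothing, and uses a made-up 80 Mb axis. The function name angular_extents is hypothetical.

```python
def angular_extents(regions, total_angle=360.0):
    """Return the (start_angle, end_angle) occupied by each region.

    regions: list of (start, end, scale) tuples covering the axis in order.
    Each region's share of the circle is proportional to its length
    multiplied by its magnification, normalized so the whole axis
    still fills `total_angle` degrees.
    """
    weights = [(end - start) * scale for start, end, scale in regions]
    total_weight = sum(weights)
    extents = []
    angle = 0.0
    for weight in weights:
        span = total_angle * weight / total_weight
        extents.append((angle, angle + span))
        angle += span
    return extents


# Hypothetical 80 Mb axis (positions in Mb), unscaled and with a 5x local zoom.
unscaled = angular_extents([(0, 30, 1), (30, 40, 1), (40, 80, 1)])
zoomed = angular_extents([(0, 30, 1), (30, 40, 5), (40, 80, 1)])

for (a0, a1), (b0, b1) in zip(unscaled, zoomed):
    print(f"{a1 - a0:6.1f} degrees unscaled -> {b1 - b0:6.1f} degrees zoomed")
```

In this toy example the zoomed 10 Mb region grows from 45 to 150 degrees, while the flanking regions shrink to make room.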

using Circos

How do you know whether Circos can be useful to you? First, take a look at some screenshots. These will give you an idea of the types of data visualizations that Circos can create.

Figure | Circos, shamelessly promoted (PDF white, black, or archetype zoo)

I've designed Circos to be simple to use, with the goal of producing high-quality genome diagrams suitable for publication. To keep Circos flexible, the configuration file that describes the generation of the image contains many settings; be sure to read the tutorials to familiarize yourself with these features.

To use Circos, you need to have Perl installed, along with a few CPAN modules. It's likely that you already meet all the requirements if you are working on a UNIX system.

You will also need a definition of the genome karyotypes, such as the content of the cytoBandIdeo table (UCSC genome browser). You can download the karyotype from the table browser or directly for human, mouse, rat, or other species. The karyotype files let Circos know the size and features of the chromosomes for the purpose of drawing the ideograms.
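As a rough illustration of what such a conversion might look like, the Python sketch below turns cytoBandIdeo-style rows (chromosome, start, end, band name, stain) into karyotype-style lines: one line per chromosome giving its size, followed by one line per band. The output layout ("chr - ID LABEL START END COLOR" and "band ID NAME NAME START END COLOR") is an assumption based on the karyotype format as commonly documented and should be verified against the Circos tutorials. The "hs" prefix, the function name and the sample rows are made up for the example.

```python
def cytoband_to_karyotype(rows, prefix="hs"):
    """Convert (chrom, start, end, band, stain) rows to karyotype-style lines."""
    chrom_info = {}  # ident -> (label, length); dicts keep insertion order
    band_lines = []
    for chrom, start, end, band, stain in rows:
        label = chrom[3:] if chrom.startswith("chr") else chrom
        ident = prefix + label
        # The chromosome length is the largest band end seen so far.
        known_length = chrom_info.get(ident, (label, 0))[1]
        chrom_info[ident] = (label, max(end, known_length))
        band_lines.append(f"band {ident} {band} {band} {start} {end} {stain}")
    chr_lines = [
        f"chr - {ident} {label} 0 {length} chr{label}"
        for ident, (label, length) in chrom_info.items()
    ]
    return chr_lines + band_lines


# Made-up rows in cytoBandIdeo column order; not real coordinates.
sample_rows = [
    ("chr1", 0, 1000, "p1", "gneg"),
    ("chr1", 1000, 2500, "q1", "gpos50"),
    ("chr2", 0, 1800, "p1", "gneg"),
]

karyotype_lines = cytoband_to_karyotype(sample_rows)
print("\n".join(karyotype_lines))
```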

Once you've decided which species (one or more) and chromosomes (all, some, with optional spans) to use, you can layer 2D and position-paired data in concentric "tracks".

future of Circos

I work on Circos in a passive-aggressive manner - sometimes passive, sometimes aggressive. I welcome your comments - please contact Martin Krzywinski if you would like to report a bug, request a feature or share the ways in which you are using, or hope to use, Circos.

There is a development road map for Circos. With one eye on the future, I am also keeping track of what is happening now with Circos.

license

Circos is free software, licensed under GPL.