Identifying pseudo-autosomal regions with the Ensembl API

Source: Internet | Editor: 程序博客网 | Date: 2024/05/06 06:24

PAR regions

The pseudo-autosomal regions (PARs) are homologous DNA sequences on the human X and Y chromosomes. They allow these sex chromosomes to pair and cross over during meiosis in the same way the autosomes do. Because these regions are identical between X and Y, they are often stored only once. In Ensembl, the Y copies are recorded as assembly exceptions that point to the corresponding region on X.

To pull out the coordinates of the pseudo-autosomal regions (PAR) from the Ensembl database, you can perform the following query on the Ensembl core database:

Code:

select (select sr.name from seq_region sr where sr.seq_region_id = ae.seq_region_id) as chrom_1,
       ae.seq_region_start as start_1,
       ae.seq_region_end as end_1,
       (select sr.name from seq_region sr where sr.seq_region_id = ae.exc_seq_region_id) as chrom_2,
       ae.exc_seq_region_start as start_2,
       ae.exc_seq_region_end as end_2
from assembly_exception ae
where ae.exc_type = 'PAR';

For the human core database with schema version 61 (assembly GRCh37/hg19), the query returns each PAR on Y together with the region on X it corresponds to:

+---------+----------+----------+---------+-----------+-----------+
| chrom_1 | start_1  | end_1    | chrom_2 | start_2   | end_2     |
+---------+----------+----------+---------+-----------+-----------+
| Y       |    10001 |  2649520 | X       |     60001 |   2699520 |
| Y       | 59034050 | 59373566 | X       | 154931044 | 155270560 |
+---------+----------+----------+---------+-----------+-----------+

For the old assembly (NCBI36/hg18) you will get:

+---------+----------+----------+---------+-----------+-----------+
| chrom_1 | start_1  | end_1    | chrom_2 | start_2   | end_2     |
+---------+----------+----------+---------+-----------+-----------+
| Y       |        1 |  2709520 | X       |         1 |   2709520 |
| Y       | 57443438 | 57772954 | X       | 154584238 | 154913754 |
+---------+----------+----------+---------+-----------+-----------+
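If you run the query non-interactively, for example with the `mysql` client's `--batch --skip-column-names` options, each row comes back as one line of tab-separated text. The Python sketch below loads that output into records. `ParRow` and `parse_par_rows` are hypothetical helper names, not part of Ensembl, and the sample input simply reuses the GRCh37 rows from the first table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParRow:
    """One row of the assembly_exception PAR query (hypothetical helper type)."""
    chrom_1: str  # chromosome carrying the assembly exception (Y in the output above)
    start_1: int
    end_1: int
    chrom_2: str  # chromosome the region maps to (X in the output above)
    start_2: int
    end_2: int


def parse_par_rows(batch_output: str) -> List[ParRow]:
    """Parse tab-separated rows, as printed by `mysql --batch --skip-column-names`."""
    rows = []
    for line in batch_output.strip().splitlines():
        c1, s1, e1, c2, s2, e2 = line.split("\t")
        rows.append(ParRow(c1, int(s1), int(e1), c2, int(s2), int(e2)))
    return rows


# GRCh37 rows from the table above, in tab-separated batch form
grch37 = (
    "Y\t10001\t2649520\tX\t60001\t2699520\n"
    "Y\t59034050\t59373566\tX\t154931044\t155270560\n"
)
for row in parse_par_rows(grch37):
    print(row.chrom_1, row.start_1, row.end_1, "<->", row.chrom_2, row.start_2, row.end_2)
```

Keeping the coordinates as integers makes the later Y/X translation a matter of simple arithmetic on these records.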

Alternatively, you can use the Ensembl Perl API:

Code:

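use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl MySQL server; coordinates depend on the assembly it serves
Bio::EnsEMBL::Registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous');
my $db    = Bio::EnsEMBL::Registry->get_DBAdaptor('human', 'core');
my $aefa  = $db->get_AssemblyExceptionFeatureAdaptor();
my $slice = $db->get_SliceAdaptor()->fetch_by_region('chromosome', 'Y');

# Print partner chromosome, start and end of each assembly exception (here, the PARs) on the slice
foreach my $ae (@{ $aefa->fetch_all_by_Slice($slice) }) {
  print join("\t", $ae->display_id, $ae->start, $ae->end), "\n";
}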
Output (GRCh37):

X  10001     2649520
X  59034050  59373566

or, when the slice is fetched for chromosome X instead of Y:

Y  60001      2699520
Y  154931044  155270560
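The two API outputs are the same PAR pairs viewed from either side: in the printed output, `display_id` names the partner chromosome, while the start and end are positions on the chromosome whose slice was fetched. The Python sketch below models (it does not call) that behaviour, assuming rows shaped like the SQL query result; `features_on` is a hypothetical helper name.

```python
def features_on(chrom, rows):
    """Return (partner, start, end) triples as the API loop prints them for `chrom`.

    `rows` are (chrom_1, start_1, end_1, chrom_2, start_2, end_2) tuples,
    i.e. the columns returned by the SQL query.
    """
    out = []
    for c1, s1, e1, c2, s2, e2 in rows:
        if chrom == c1:
            out.append((c2, s1, e1))  # partner is chrom_2, coordinates are on chrom_1
        elif chrom == c2:
            out.append((c1, s2, e2))  # partner is chrom_1, coordinates are on chrom_2
    return out


GRCH37_PAR = [
    ("Y", 10001, 2649520, "X", 60001, 2699520),
    ("Y", 59034050, 59373566, "X", 154931044, 155270560),
]

for chrom in ("Y", "X"):
    for partner, start, end in features_on(chrom, GRCH37_PAR):
        print(partner, start, end, sep="\t")
```

Run as is, it reproduces both listings: the Y slice shows X as the partner with Y coordinates, and the X slice shows Y as the partner with X coordinates.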

So, to translate PAR locations between Y and X for GRCh37/hg19, use:

PAR-1: Y 10001 - 2649520       <->  X 60001 - 2699520, band Xp22.33
PAR-2: Y 59034050 - 59373566   <->  X 154931044 - 155270560, band Xq28
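Each PAR pair has the same length on X and Y, so translating a position is a constant shift within each region: +50,000 for PAR-1 (60,001 − 10,001) and +95,896,994 for PAR-2 (154,931,044 − 59,034,050) in GRCh37. Below is a minimal Python sketch, assuming 1-based inclusive coordinates as in the mapping above; `y_to_x_grch37` is a hypothetical helper name.

```python
from typing import Optional

# GRCh37 PAR pairs: (Y start, Y end, X start), 1-based inclusive
GRCH37_Y_TO_X = [
    (10001, 2649520, 60001),          # PAR-1, band Xp22.33
    (59034050, 59373566, 154931044),  # PAR-2, band Xq28
]


def y_to_x_grch37(y_pos: int) -> Optional[int]:
    """Map a chrY position inside a GRCh37 PAR to chrX; return None outside the PARs."""
    for y_start, y_end, x_start in GRCH37_Y_TO_X:
        if y_start <= y_pos <= y_end:
            # Same distance from the region start on both chromosomes
            return x_start + (y_pos - y_start)
    return None


print(y_to_x_grch37(10001))     # 60001 (first base of PAR-1)
print(y_to_x_grch37(59373566))  # 155270560 (last base of PAR-2)
print(y_to_x_grch37(5000000))   # None (Y-specific sequence)
```

Returning `None` rather than raising makes it easy to filter Y positions that have no X counterpart.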

and for NCBI36 / hg18:

PAR-1: Y 1 - 2709520           <->  X 1 - 2709520, band Xp22.33
PAR-2: Y 57443438 - 57772954   <->  X 154584238 - 154913754, band Xq28
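In NCBI36, the PAR-1 coordinates are identical on X and Y, so only PAR-2 needs a shift (154,584,238 − 57,443,438 = 97,140,800). The Python sketch below goes in the reverse direction, X to Y. It assumes 1-based inclusive coordinates as in the mapping above, and `x_to_y_ncbi36` is a hypothetical helper name.

```python
from typing import Optional

# NCBI36 PAR pairs: (X start, X end, Y start), 1-based inclusive
NCBI36_X_TO_Y = [
    (1, 2709520, 1),                   # PAR-1, band Xp22.33: identical coordinates
    (154584238, 154913754, 57443438),  # PAR-2, band Xq28
]


def x_to_y_ncbi36(x_pos: int) -> Optional[int]:
    """Map a chrX position inside an NCBI36 PAR to chrY; return None outside the PARs."""
    for x_start, x_end, y_start in NCBI36_X_TO_Y:
        if x_start <= x_pos <= x_end:
            return y_start + (x_pos - x_start)
    return None


print(x_to_y_ncbi36(2709520))    # 2709520 (PAR-1 is unshifted)
print(x_to_y_ncbi36(154913754))  # 57772954 (last base of PAR-2)
print(x_to_y_ncbi36(100000000))  # None (X-specific sequence)
```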

Please note that these coordinates do not agree with the PAR definitions published by the GRC and NCBI. The PAR-2 end coordinates differ by 10 kb (GRC/NCBI: chrX:155,260,560 and chrY:59,363,566; Ensembl: chrX:155,270,560 and chrY:59,373,566). The difference comes from the 10 kb telomeric (gap) region, which Ensembl includes in its PAR-2 definition so that this arrangement is represented correctly.
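The 10 kb difference can be checked directly from the quoted chrX coordinates. The Python sketch below classifies a GRCh37 chrX position against both PAR-2 definitions. The constants are the values quoted above (the PAR-2 start is the one shared by both definitions), and `classify_chrx_par2` is a hypothetical helper name.

```python
# GRCh37 chrX PAR-2 coordinates (1-based, inclusive)
PAR2_X_START = 154931044
PAR2_X_END_GRC = 155260560      # end used by the GRC/NCBI definition
PAR2_X_END_ENSEMBL = 155270560  # Ensembl end, including the 10 kb telomeric gap


def classify_chrx_par2(pos: int) -> str:
    """Report which PAR-2 definition(s) a GRCh37 chrX position falls in."""
    if PAR2_X_START <= pos <= PAR2_X_END_GRC:
        return "PAR-2 (GRC and Ensembl)"
    if PAR2_X_END_GRC < pos <= PAR2_X_END_ENSEMBL:
        return "telomeric gap (Ensembl PAR-2 only)"
    return "outside PAR-2"


print(PAR2_X_END_ENSEMBL - PAR2_X_END_GRC)  # 10000
print(classify_chrx_par2(155000000))        # PAR-2 (GRC and Ensembl)
print(classify_chrx_par2(155265000))        # telomeric gap (Ensembl PAR-2 only)
```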

See also the notes on telomere and centromere definitions.
