HapMap file 2 (everything up to each ";" is a single line):
#Wed Oct 28 01:29:38 2009: HapMap genotype data dump, SNPs genotyped in population MEX on chr7:26924045..27424045;
#For details on file format, see http://www.hapmap.org/genotypes/;
rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode NA19663 NA19664 NA19665 NA19722 NA19723 NA19649 NA19669 NA19656 NA19657 NA19658 NA19686 NA19719 NA19720 NA19724 NA19726 NA19747 NA19759 NA19773 NA19780 NA19675 NA19676 NA19677 NA19651 NA19653 NA19683 NA19684 NA19725 NA19727 NA19755 NA19756 NA19757 NA19772 NA19774 NA19775 NA19776 NA19777 NA19778 NA19783 NA19784 NA19796 NA19650 NA19671 NA19661 NA19682 NA19771 NA19779 NA19781 NA19782 NA19788 NA19659 NA19660 NA19662 NA19678 NA19680 NA19681 NA19746 NA19721 NA19748 NA19760 NA19718 NA19790 NA19794 NA19795 NA19654 NA19749 NA19751 NA19761 NA19762 NA19763 NA19770 NA19670 NA19716 NA19750 NA19789 NA19685 NA19679 NA19652;
rs774265 A/G chr7 26925442 + ncbi_b36 bbs urn:lsid:bbs.hapmap.org:Protocol:Phase3_Draft2:1 urn:lsid:bbs.hapmap.org:Assay:Phase3_Draft2_rs774265:1 urn:lsid:dcc.hapmap.org:Panel:US_Mexican-30-trios:3 QC+ GG GG GG GG GG GG GG GG GG AG GG GG GG GG GG GG GG GG AG GG GG GG GG GG GG GG GG GG GG AG AG AG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG AG GG GG GG AG GG GG GG;
......
rs4722699 C/T chr7 27423025 + ncbi_b36 bbs urn:lsid:bbs.hapmap.org:Protocol:Phase3_Draft2:1 urn:lsid:bbs.hapmap.org:Assay:Phase3_Draft2_rs4722699:1 urn:lsid:dcc.hapmap.org:Panel:US_Mexican-30-trios:3 QC+ CT CT TT CT CT CC CC CC CT CC CC CT CC TT CC CT CC CC CC CC CT CT CC CC CC CC CC CC CT CT TT CC CC CC CC CT CT CT TT CC CC CC CT CC CC CT CT CT CC CT CC CT CC CT CC TT CC TT CC CT CC CC CC CC CC CC CC CT CT CT CC CC CC CC CC CT CC;
The two files the software requires (format):
dat:
M rs774265
...
M rs4722699
This did not look very hard, so I followed hopewell's example and wrote an imitation of it:
data datas;
  infile 'E:\imputation\mex-ii.txt';
  input text;
  if substr(text,1,2)='rs' then
  do;
    text='M'||scan(text,1,' ');
    output datas;
  end;
run;
data _null_;
  set datas;
  file 'd:\dats.txt';
  put text;
run;
SAS reports these errors:
NOTE: Invalid data for text in line 1 1-4.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9
1 #Wed Oct 28 01:29:38 2009: HapMap genotype data dump, SNPs genotyped in population MEX on
91 chr7:26924045..27424045 113
text=. _ERROR_=1 _N_=1
....(the first three text lines produce similar errors)
NOTE: Invalid data for text in line 4 1-8.
4 rs774265 A/G chr7 26925442 + ncbi_b36 bbs urn:lsid:bbs.hapmap.org:Protocol:Phase3_Draft2:1
91 urn:lsid:bbs.hapmap.org:Assay:Phase3_Draft2_rs774265:1 urn:lsid:dcc.hapmap.org:Panel:US_M
181 exican-30-trios:3 QC+ GG GG GG GG GG GG GG GG GG AG GG GG GG GG GG GG GG GG
text=. _ERROR_=1 _N_=4
.....(the following n lines produce similar errors)
The output file that is finally generated is empty.
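
The log points at a likely cause: `input text;` has no `$`, so SAS treats text as a numeric variable. Every line then fails to parse, text is set to missing, the test for 'rs' never matches, and nothing is written. Below is a minimal sketch of one possible fix. It assumes the file is space-delimited, as the listing above suggests, and that only the first field of each line is needed for the dat file. The variable name rsid and its length of 32 are my own choices, and I have not run this against the full file:

```sas
/* Sketch: read only the first field of each line as a character value. */
data _null_;
    /* lrecl= allows for the very long genotype lines;                  */
    /* truncover stops INPUT from flowing onto the next line if a line  */
    /* turns out to be blank.                                           */
    infile 'E:\imputation\mex-ii.txt' lrecl=32767 truncover;
    file 'd:\dats.txt';
    length rsid $ 32;      /* assumed to be long enough for any rs ID   */
    input rsid $;          /* the $ makes rsid character, not numeric   */
    /* Keep SNP rows only. The column-header line also starts with      */
    /* "rs" (its first field is "rs#"), so it is excluded explicitly.   */
    if substr(rsid, 1, 2) = 'rs' and rsid ne 'rs#' then
        put 'M ' rsid;     /* writes e.g. "M rs774265"                  */
run;
```

Writing with `put` in the same step means the intermediate datas data set is not needed. The literal 'M ' carries the space that the dat format shows between M and the rs ID, which 'M'||scan(text,1,' ') would have left out.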