SAS Chinese Forum
Title:
Sincere request for help: importing files downloaded from HapMap into SAS
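Author:
shiyiming
Time:
2009-10-28 15:19
Title:
Sincere request for help: importing files downloaded from HapMap into SAS
Hi everyone! For my field of study I need to process SNP data downloaded from HapMap.
I'm trying to write a program that converts the .txt files downloaded from HapMap into the format required by the software I use.
The files downloaded from HapMap look like this: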
---
pop: JC
build: ncbi_b36
hapmap_release: rel22
filters: consensus+polymorphic
start: 26924045
stop: 27424045
snps:
- rs774264: 26925374
- rs774265: 26925442
......
- rs4722699: 27423025
- rs17429887: 27423321
phased_haplotypes:
- NA18524_c1: TGATTCGGCCCAGTAGGGGTTGTCCTGCATACTTGGCTCTAACGTGTCTAGATCCTTTTAATTCCCGATGATAAAGCCCAATACGGACTCATTGTTTCCCTAAGCAGATGCTCGATTGTCTATCTGTTGTTCGTACGCGCCAGCCGTCAATCATACCGGAGGGGCATGAATAAAATTTCCATCTAGTCGTTCGAAGGGTGGCAAGATTACGCGGCGAACGTCGAAGTTACGAGTGGATGGTGTTAGCCATCACGGCCATTTCTTAAGCCCTCGTTAGTTAAGGAATGAGGACCGGGCTGGTGTTCGACGTAGATATTAACACGCTGTAAATAGACACCGCTGCATTAGCGTATGCAACCC
- NA18524_c2: TGATTCGGCCCAGTAGGGGTTGTCCTGCATACTTGGCTCTCACGTGTCTAGATCCTTTTAATTCCTAGCAGCGAGCTTCGCTGTGAGTCCGCCGCACCAACAAACAGATACCCAGTCAGCTGCCCTCTTCTCGCGTCCGATGTCTTATCTCCGTTCGGAGAGGATACAGGAGAAATTTTCGTTTAGCCGTCCGGGGAGCGGCAAGGTTACGCGGCGTATTCCGAAGGTACCAGTGGATGGTGTTAGCCATCACGGCCATTTCTTAAGCCTTCGTTAGTTAAGGAATGAGGACTAGGCTGGGCTCTAACGTAGATAATAACACGCTGCAAACAGACGCCGCTGGATTAGCACATGCAACTA
......
- NA19012_c1: CGGCTTAGCTCAGCGTTCGCCGGTACCTATCTTTGGCTCTCACGTGTCTAGATCCTTTTAATTCCCGATGATAAAGCCGAATACGGACTTATTGTTTCCCTACGAGAGGATTTGGCCGTCTATCTGTTGTTGATACGCGATGTCTTATCTCCGTACCGAGAGGATACAGGAGAAATTTTCGTCTAGTCGTTCGGGGAGCGGCAAGATTACGCGGCGTATGTCGAAGTTACCCGTTGATATCGTAAGCAATTGAGACTACGCCGCGCCTTCGAACCGTTCGAGGGAGAGAACTCGAGCCATTGCTCGACGTAAATATTGACACGCTGTAAATAGACACCGCTGCATTAGCACATGCAACTA
- NA19012_c2: TGATTCGGCCCAGTAGGGGTTGTCCTGCATACCCACGCTCCGGACGAGCGAACGTCCCTTACCTGCGATGATAAAGCCCAATACGGACTTATTATTTCCCTACGAGAGGATTTGGCCGTCTATCTGTTGTTGATACGCGCTAGGCGTCATTCATACCGGAGGAGCATGAGTAAAAATTTCGTCTGGTCGCTCGGAAAACGGTCGTACGGCGTCCGAAACGTCGAAGTCACCAGTGGATGGTGTTAGCCATCACGGCCATTCCGCGCCTTCGAACCGTTCGAGGGAGAGAACTCGAGCCATTGCTCGATACGAGTATTGATATATGATGTGCTCGTACCATTACGGTAATGCGCACGGCCC
I need to generate files in the following format:
snps.txt:
rs774264
rs774265
......
rs4722699
rs17429887
and
haplos.txt:
NA18524->NA18524 HAPL01 TGATTCGGCCCAGTAGGGGTTGTCCTGCATACTTGGCTCTAACGTGTCTAGATCCTTTTAATTCCCGATGATAAAGCCCAATACGGACTCATTGTTTCCCTAAGCAGATGCTCGATTGTCTATCTGTTGTTCGTACGCGCCAGCCGTCAATCATACCGGAGGGGCATGAATAAAATTTCCATCTAGTCGTTCGAAGGGTGGCAAGATTACGCGGCGAACGTCGAAGTTACGAGTGGATGGTGTTAGCCATCACGGCCATTTCTTAAGCCCTCGTTAGTTAAGGAATGAGGACCGGGCTGGTGTTCGACGTAGATATTAACACGCTGTAAATAGACACCGCTGCATTAGCGTATGCAACCC
NA18524->NA18524 HAPL02 TGATTCGGCCCAGTAGGGGTTGTCCTGCATACTTGGCTCTCACGTGTCTAGATCCTTTTAATTCCTAGCAGCGAGCTTCGCTGTGAGTCCGCCGCACCAACAAACAGATACCCAGTCAGCTGCCCTCTTCTCGCGTCCGATGTCTTATCTCCGTTCGGAGAGGATACAGGAGAAATTTTCGTTTAGCCGTCCGGGGAGCGGCAAGGTTACGCGGCGTATTCCGAAGGTACCAGTGGATGGTGTTAGCCATCACGGCCATTTCTTAAGCCTTCGTTAGTTAAGGAATGAGGACTAGGCTGGGCTCTAACGTAGATAATAACACGCTGCAAACAGACGCCGCTGGATTAGCACATGCAACTA
......
NA19012->NA19012 HAPLO1 CGGCTTAGCTCAGCGTTCGCCGGTACCTATCTTTGGCTCTCACGTGTCTAGATCCTTTTAATTCCCGATGATAAAGCCGAATACGGACTTATTGTTTCCCTACGAGAGGATTTGGCCGTCTATCTGTTGTTGATACGCGATGTCTTATCTCCGTACCGAGAGGATACAGGAGAAATTTTCGTCTAGTCGTTCGGGGAGCGGCAAGATTACGCGGCGTATGTCGAAGTTACCCGTTGATATCGTAAGCAATTGAGACTACGCCGCGCCTTCGAACCGTTCGAGGGAGAGAACTCGAGCCATTGCTCGACGTAAATATTGACACGCTGTAAATAGACACCGCTGCATTAGCACATGCAACTA
NA19012->NA19012 HAPLO2 TGATTCGGCCCAGTAGGGGTTGTCCTGCATACCCACGCTCCGGACGAGCGAACGTCCCTTACCTGCGATGATAAAGCCCAATACGGACTTATTATTTCCCTACGAGAGGATTTGGCCGTCTATCTGTTGTTGATACGCGCTAGGCGTCATTCATACCGGAGGAGCATGAGTAAAAATTTCGTCTGGTCGCTCGGAAAACGGTCGTACGGCGTCCGAAACGTCGAAGTCACCAGTGGATGGTGTTAGCCATCACGGCCATTCCGCGCCTTCGAACCGTTCGAGGGAGAGAACTCGAGCCATTGCTCGATACGAGTATTGATATATGATGTGCTCGTACCATTACGGTAATGCGCACGGCCC
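The main problems I have run into are:
1. The raw data contains descriptive text that I don't need.
2. For the SNP names I need (e.g. rs774264), the "-" in front and the ":" and number after them are not needed.
3. The haplotype labels need to change from "- NA18524_c1: " to the "NA18524->NA18524 HAPL01" format.
So my main questions are: how do I extract the data I need? And second, is there a replace command that would solve problem 3?
If you have read this far, I'm already very grateful. Thank you~~!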
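Question 3 asks whether there is a replace command. SAS 9 and later does have a regular-expression replace function, PRXCHANGE, which can rewrite a whole haplotype line in one call. The sketch below is not from this thread: it assumes every haplotype line has the shape "- NA<digits>_c<digits>: <alleles>", and the in-stream lines are shortened copies of lines from the file above (allele strings cut to 10 characters for illustration). It writes HAPLO with a letter O, as the reply's code does; change the literal if your software expects something else.

```sas
/* Sketch (not from the thread): rewrite haplotype lines with a regex. */
/* The in-stream lines are shortened copies of the HapMap file above.  */
data haplos_demo;
   length text $400;
   infile datalines truncover;
   input text $char400.;
   /* Subsetting IF: keep only "- NA<digits>_c<digits>:" lines */
   if prxmatch('/^- NA\d+_c\d+:/', text);
   /* $1 = sample ID, $2 = chromosome copy, $3 = allele string;   */
   /* the trailing .* also consumes the blank padding of TEXT     */
   text = prxchange('s/^- (NA\d+)_c(\d+):\s*(\S+).*/$1->$1 HAPLO$2 $3/', 1, text);
   put text;
datalines;
phased_haplotypes:
- NA18524_c1: TGATTCGGCC
- NA18524_c2: TGATTCGGCC
;
run;
```

The header line is dropped by the PRXMATCH test, and the two remaining lines become "NA18524->NA18524 HAPLO1 TGATTCGGCC" and "NA18524->NA18524 HAPLO2 TGATTCGGCC". Lines longer than 400 characters would still be truncated, so the LENGTH and informat width must cover the longest haplotype line.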
Author:
shiyiming
Time:
2009-10-28 16:43
Title:
Re: Sincere request for help: importing files downloaded from HapMap into SAS
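[code:2g2ikd6z]/* Split the HapMap file into SNP names and haplotype lines.
   LRECL=400 and the INPUT width must exceed the longest line;
   raise both if the downloaded region contains more SNPs. */
data snps haplos;
infile 'd:\temp091028.txt' truncover lrecl=400;
input text $1-400;
/* "- rs774264: 26925374" -> "rs774264" */
if substr(text,1,4)='- rs' then
do;
text=scan(text,1,"- :");
output snps;
end;
/* "- NA18524_c1: TGAT..." -> "NA18524->NA18524 HAPLO1 TGAT..." */
else if substr(text,1,4)='- NA' then
do;
/* CATS/CATX strip the blanks that || would keep; CATX also
   puts exactly one space between the label and the alleles */
text=catx(' ',
          cats(scan(text,1,"- _:"),'->',scan(text,1,"- _:")),
          tranwrd(scan(text,2,"- _:"),'c','HAPLO'),
          scan(text,3,"- _:"));
output haplos;
end;
/* header lines, "snps:" and "phased_haplotypes:" are not output */
run;
data _null_;
set snps;
file 'd:\snps.txt';
put text;
run;
data _null_;
set haplos;
file 'd:\haplos.txt' lrecl=400;
put text;
run;[/code:2g2ikd6z]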
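The reply's logic rests on SCAN's third argument being a list of delimiter characters: every character in "- :" or "- _:" is a separator, leading separators are skipped, and a run of separators counts as one. The small sketch below is not from the thread; its two sample strings are shortened copies of lines from the file above. It shows which words SCAN returns, then rebuilds one output line with CATX, which (unlike ||) removes the blanks around each piece and joins them with a single space.

```sas
/* Sketch: what SCAN returns for the two kinds of input lines.        */
/* Sample strings are shortened copies of lines from the HapMap file. */
data scan_demo;
   length snp_line hap_line $60 snp id copy alleles $20 hap_out $80;
   snp_line = '- rs774264: 26925374';
   hap_line = '- NA18524_c1: TGATTCGGCC';
   /* each character of the 3rd argument is a delimiter */
   snp     = scan(snp_line, 1, '- :');    /* rs774264   */
   id      = scan(hap_line, 1, '- _:');   /* NA18524    */
   copy    = scan(hap_line, 2, '- _:');   /* c1         */
   alleles = scan(hap_line, 3, '- _:');   /* TGATTCGGCC */
   /* CATX strips blanks from each piece and joins them with one space */
   hap_out = catx(' ', cats(id, '->', id), tranwrd(copy, 'c', 'HAPLO'), alleles);
   put snp= id= copy= alleles= / hap_out=;
run;
```

The same delimiter list would also split on a "-" or "_" inside a sample ID, so this approach relies on HapMap IDs having the NA<digits> form shown in the file.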
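Author:
shiyiming
Time:
2009-10-29 13:07
Title:
Re: Sincere request for help: importing files downloaded from HapMap into SAS
Thank you so much! I went back and tried it, and it runs very well.
The scan and substr functions in your code taught this newbie a lot.
Hats off to the expert~~~\(≧▽≦)/~
bow~~~~~ My first post on this board and I met such a warm-hearted person. So happy :)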
Welcome to the SAS Chinese Forum (https://mysas.net/forum/)