I now propose an exercise based on two very important bioinformatics programs, BLAST and CLUSTALW (Thompson et al., 1994), which you should have installed on your system. The tasks I ask of you are:
formatdb
from BLAST.
To close this chapter, here is a program that uses the external program curl (there are alternatives, such as wget) to send multiple requests to a web server automatically. Sometimes this kind of solution is the only option, for example when a program is available only through its web interface. But remember that before doing this it is a good idea to write to the person responsible for that web server and ask for permission. They may tell you which time of day is best for large jobs like these, so as not to overload their machines.
#!/usr/bin/perl -w
# written by Bruno Contreras, July 2005
# program get_thermo_gibbs_energies.pl, that submits a series of dna sequences in FASTA format
# to the webserver http://wings.buffalo.edu/gsa/dna/dk/WEBTHERMODYN and retrieves their Gibbs
# energy profile

# This shows how scripts can be written to serialize web form submissions
# For different webservers you need to read the source code of the submission
# page (right mouse button on any browser) to check out the variables involved

# These are the variables for this example, that I got from my browser
#<form enctype="multipart/form-data" method="POST" action="webthermodyn.cgi">
#<input type=text name=temperature value=37>
#<input type=text name=saltconc value=10>
#<input type=text name=molecule value="">
#<select name="shape" value="Linear">
#<INPUT TYPE="radio" NAME="inputmode" VALUE="input">
#<textarea name="sequence" value="">
#<INPUT TYPE="radio" NAME="allpart" VALUE="ALL">
#<input type=text name=step value=50>
#<input type=text value=100 name=windowsize>
#<input type=text value=1 name=marknum>
#<input type=submit value="SUBMIT QUERY">

use strict;

######################################################################

my $progname = "get_thermo_freeenergies.pl";
my $thermodynCGI = "http://wings.buffalo.edu/gsa/dna/dk/WEBTHERMODYN/webthermodyn.cgi";
my $waittime = 5; # wait 5 seconds between submissions
my $curl_options = "-F temperature=37 -F sign=positive -F saltconc=10 -F shape=Linear -F inputmode=input -F allpart=ALL -F step=25 -F windowsize=25 -F marknum=5 -F timelimit=90 ";

## parse DNA sequences file in fasta ################################

my (@DNA,@NAME);
my ($dnaseq,$dnaseqname,$n_of_seqs) = ("","",0);

open(FASTADNA,$ARGV[0]) || die "#$progname : need a valid DNA fasta file to run properly, exit...\n";
while(<FASTADNA>)
{
    next if(/^$/);
    if(/\>/)
    {
        # a new header: store the previous sequence, if any
        if($dnaseq)
        {
            push(@DNA,$dnaseq);
            push(@NAME,$dnaseqname);
            $dnaseq = "";
            $dnaseqname = "";
        }
        $dnaseqname = substr((split)[0],1);
        $n_of_seqs++;
    }
    else { $dnaseq .= (split)[0]; }
}
close(FASTADNA);

# store the last sequence, which is not followed by another header
if($dnaseq)
{
    push(@DNA,$dnaseq);
    push(@NAME,$dnaseqname);
}

print "#$progname : sequences read $n_of_seqs (from $ARGV[0])\n";

## get free energy profiles for each sequence ########################

my ($seq,$s);
for($seq=0;$seq<scalar(@DNA);$seq++)
{
    my $request = $curl_options . "-F sequence=$DNA[$seq] -F molecule=$NAME[$seq]";

    open(CURL,"curl $request $thermodynCGI |") || die "#$progname : cannot run curl $request $thermodynCGI ...exit\n";
    while(<CURL>)
    {
        if(/DATADIR/)
        {
            # extract the URL of the ASCII results file from the reply
            my $results = (split(/\"\>ASCII/,$_))[0];
            $results = (split(/HREF=\"/,$results))[2];

            open(RES,"curl $results |" ) || die "#$progname : cannot read $results ...exit\n";
            while(<RES>) { print; }
            close(RES);
        }
    }
    close(CURL);

    sleep $waittime; # wait this time before the next submission is delivered
}
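The two ideas the program relies on, reading FASTA records and turning each one into a curl form submission, can be tried out without touching the network. What follows is only a minimal sketch under some assumptions: the subroutine names read_fasta and curl_args are hypothetical (they do not appear in the program above), only three of the form fields are used, and the FASTA data is an in-memory string made up for the demonstration. The command is built as a list of arguments rather than as a single string, so that sequence names containing spaces or shell characters would not be interpreted by the shell.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA from a filehandle and
# return a list of [name, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my (@records, $name, $seq);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;            # skip blank lines
        if ($line =~ /^>(\S*)/) {            # header: keep the first word as name
            push @records, [$name, $seq] if defined $name;
            ($name, $seq) = ($1, '');
        }
        elsif (defined $name) {              # sequence line: append without spaces
            $line =~ s/\s+//g;
            $seq .= $line;
        }
    }
    # the last record is not followed by a header, so store it here
    push @records, [$name, $seq] if defined $name;
    return @records;
}

# Hypothetical helper: build the curl argument list for one sequence.
# -F sends one multipart form field; --silent hides the progress meter.
sub curl_args {
    my ($url, $fields, $name, $seq) = @_;
    my @args = ('curl', '--silent');
    push @args, '-F', "$_=$fields->{$_}" for sort keys %$fields;
    push @args, '-F', "sequence=$seq", '-F', "molecule=$name", $url;
    return @args;
}

# Demo on an in-memory FASTA file (invented data); nothing is submitted.
my $fasta = ">seq1 first\nACGT\nTTGA\n\n>seq2\nGGCC\n";
open(my $fh, '<', \$fasta) or die "cannot open in-memory file: $!";
my @records = read_fasta($fh);
close($fh);

# A subset of the form fields, chosen only for illustration
my %fields = (temperature => 37, saltconc => 10, shape => 'Linear');
my $url = 'http://wings.buffalo.edu/gsa/dna/dk/WEBTHERMODYN/webthermodyn.cgi';

for my $rec (@records) {
    my @cmd = curl_args($url, \%fields, @$rec);
    print join(' ', @cmd), "\n";             # show the command instead of running it
}
```

To actually submit, each list could be passed to the list form of a pipe open, open(my $curl, '-|', @cmd), keeping a sleep between requests as in the program above.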