Research Article

Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

Algorithm 3

Proposed algorithm combining task-level parallelism and dynamic scheduling for eThread on EC2.
Data: protein gene sequences
Result: Gene annotation and tertiary structure
forall the sequences do
   read sequence
   forall the 10 threading tools do
      forall the domain, chain do
         while task to run do
            estimate tts, memory footprint
            map available resource
            if pre-processing then
               do pre-processing
            end
            do main processing
            do post-processing
         end
      end
      write output
   end
end
forall the sequences do in parallel
   /* now meta-analysis step using all outputs from 10 threading tools */
   read all outputs
   do meta-analysis
end
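
The two phases of Algorithm 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation and uses neither SAGA-Pilot nor EC2 APIs: local threads stand in for cloud instances, and a greedy longest-first mapping stands in for the dynamic scheduler. The names `Task`, `Worker`, `schedule`, `run_task`, `meta_analysis`, `pipeline` and the tool labels are hypothetical, and the cost model for tts (time to solution) and memory footprint is an assumption made only so the example runs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical tool labels; the source only states "10 threading tools".
TOOLS = [f"tool_{i}" for i in range(10)]


@dataclass
class Task:
    seq_id: str
    tool: str
    chain: str
    length: int  # residues in this domain/chain

    def estimate(self):
        """Toy cost model (an assumption): tts and memory grow with length."""
        tts = self.length * 0.01          # seconds
        mem_gb = 1.0 + self.length / 500  # gigabytes
        return tts, mem_gb


@dataclass
class Worker:
    name: str
    mem_gb: float
    busy_until: float = 0.0
    assigned: list = field(default_factory=list)


def schedule(tasks, workers):
    """Map each task, longest first, to the eligible worker that frees up earliest."""
    for task in sorted(tasks, key=lambda t: t.estimate()[0], reverse=True):
        tts, mem = task.estimate()
        eligible = [w for w in workers if w.mem_gb >= mem]
        if not eligible:
            raise RuntimeError(f"no worker has {mem:.1f} GB for {task}")
        target = min(eligible, key=lambda w: w.busy_until)
        target.busy_until += tts
        target.assigned.append(task)
    return workers


def run_task(task, needs_preprocessing=True):
    """Stand-in for pre-, main and post-processing of one threading job."""
    steps = (["pre"] if needs_preprocessing else []) + ["main", "post"]
    return (task.seq_id, task.tool, task.chain, steps)


def meta_analysis(seq_id, outputs):
    """Placeholder: count the tools that produced output for one sequence."""
    tools = {tool for sid, tool, _, _ in outputs if sid == seq_id}
    return seq_id, len(tools)


def pipeline(sequences, workers):
    # One task per (sequence, tool, domain/chain), as in the nested loops.
    tasks = [Task(sid, tool, chain, length)
             for sid, chains in sequences.items()
             for tool in TOOLS
             for chain, length in chains.items()]
    schedule(tasks, workers)
    # Each worker drains its own queue; workers run concurrently.
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        batches = pool.map(lambda w: [run_task(t) for t in w.assigned], workers)
        outputs = [out for batch in batches for out in batch]
    # Meta-analysis starts only after all threading outputs exist.
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(lambda sid: meta_analysis(sid, outputs), sequences))
```

Calling `pipeline({"seqA": {"A": 300, "B": 1200}, "seqB": {"A": 150}}, [Worker("small", 2.0), Worker("large", 8.0)])` creates 30 tasks. Under the assumed cost model, the 1200-residue chain needs more memory than the small worker offers, so those tasks are mapped only to the large one. The meta-analysis phase then runs once per sequence over the collected outputs.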