Definition part:
Connection variables (handled by the Biopython package)
Bacterial phyla (bacteria_main_groups)
List of geographical areas (read from the file countries_list_all.txt; see
    supplementary materials)
The query structure (term = “country AND geographical area’s name AND
    Bacteria [Organism] AND date of publication”)
gi_list (list of records matching the query structure)
listWC (number of records in which the /country qualifier exists)
lisV (number of records whose /country qualifier is valid, i.e. attributed to the
    correct geographical area)
// All variables are initialized to zero (0) or to an empty list.
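As a minimal sketch of the definitions above, the query term and the zeroed variables could be set up as follows in Python. The function names `build_term` and `new_counters`, the example phyla, and the `date_filter` clause are assumptions made for illustration; the source does not give the contents of bacteria_main_groups or the exact date syntax. The resulting string would typically be passed as the `term` argument of `Bio.Entrez.esearch`.

```python
# Illustrative phyla only; the source's bacteria_main_groups list is not shown.
BACTERIA_MAIN_GROUPS = ["Proteobacteria", "Firmicutes", "Actinobacteria"]


def build_term(area, date_filter):
    """Assemble the search term for one geographical area.

    date_filter is a hypothetical, caller-supplied date clause,
    for example "2000:2015[PDAT]".
    """
    return f"country AND {area} AND Bacteria [Organism] AND {date_filter}"


def new_counters(phyla):
    """Return the per-area variables, all set to zero or to an empty list."""
    return {
        "gi_list": [],
        "listWC": 0,
        "lisV": 0,
        "phyla": {phylum: 0 for phylum in phyla},
    }


term = build_term("Algeria", "2000:2015[PDAT]")
counters = new_counters(BACTERIA_MAIN_GROUPS)
print(term)
```

Keeping the term construction in one function means the same structure is applied to every geographical area in the list.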
Define treatments and operations: |
For every geographical area from the list found in “countries_list_all.txt”:
    (i) Query the NCBI database using the query structure.
    (ii) Retrieve the count of gi_list.
    (iii) Retrieve all the records (GenBank format) one by one.
    (iv) Access each record:
        If the /country qualifier exists, then:
            listWC + 1
            If the qualifier value matches the geographical area of interest:
                lisV + 1
                Check the taxonomy:
                    Count the sequence under the appropriate phylum.
                    If there is no taxonomy for the sequence (no bacteria),
                    then register the GI in the file
                    “geographical_area_Absence_Bact.txt” (see supplementary
                    materials).
    Save the results for all records of the geographical area as a row in the
    result file (country_all.txt); see supplementary materials.
    Remove the geographical area from the list of geographical areas.
If any error occurs, save the error type in “error.txt” (see supplementary materials).
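The per-record step (iv) can be sketched in plain Python as shown below. This is an illustration under stated assumptions, not the authors' implementation: the function name `tally_record`, the "other" bucket, the rule of matching on the text before the first colon of the qualifier value, and the nesting of the taxonomy check inside the matched-area branch are all assumptions. With Biopython, `country_values` would typically be read from the `qualifiers` dictionary of a record's source feature and `taxonomy` from `record.annotations["taxonomy"]`.

```python
def tally_record(gi, country_values, taxonomy, area, counters, no_bacteria):
    """Update the counters for one record of a given geographical area."""
    if not country_values:
        return  # no /country qualifier: nothing is counted
    counters["listWC"] += 1

    # Assumed matching rule: compare the part before the first colon,
    # so that "Algeria: Oran" matches the area "Algeria".
    names = [value.split(":")[0].strip().lower() for value in country_values]
    if area.lower() not in names:
        return
    counters["lisV"] += 1

    if "Bacteria" not in taxonomy:
        # Would be written to geographical_area_Absence_Bact.txt.
        no_bacteria.append(gi)
        return
    for phylum in counters["phyla"]:
        if phylum in taxonomy:
            counters["phyla"][phylum] += 1
            return
    counters["other"] += 1  # assumption: bacteria outside the listed phyla


# Made-up records: (GI, /country values, taxonomy lineage).
records = [
    ("101", ["Algeria: Oran"], ["Bacteria", "Firmicutes", "Bacilli"]),
    ("102", ["France"], ["Bacteria", "Proteobacteria"]),
    ("103", ["Algeria"], ["Eukaryota", "Fungi"]),
    ("104", [], ["Bacteria", "Firmicutes"]),
]

counters = {
    "listWC": 0,
    "lisV": 0,
    "phyla": {"Firmicutes": 0, "Proteobacteria": 0},
    "other": 0,
}
no_bacteria = []
for gi, country_values, taxonomy in records:
    tally_record(gi, country_values, taxonomy, "Algeria", counters, no_bacteria)

print(counters, no_bacteria)
```

Separating the tally from the retrieval keeps the counting rules testable without network access; the retrieval itself would wrap this function in the loop over geographical areas.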