Research Article

Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services

Box 2

Configuration and command scripts run across the BDA platform.
(A) Steps to Access Head Node at WestGrid to Start PBS Job
(1) qsub -I -l walltime=72:00:00,nodes=6:ppn=12,mem=132gb
(2) ll -ltr /global/software/Hadoop-cluster/
(installed versions: hdp 2.6.2, hb 0.98.16.1, phoenix 4.6.0)
(3) module load Hadoop/2.6.2
(4) setup_start-Hadoop.sh f (the f argument formats HDFS; do this only once…).
(5) module load HBase/…
(6) module load phoenix/…
(7) (for the exact module versions, check the ingest.sh script under ~/bel_DAD)
(8) hdfs dfsadmin -report
(9) djps (lists the running Java services/JVMs with their PIDs; a combined session sketch follows this list)
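Steps (1)–(9) can be strung together into a single session. The sketch below is a minimal consolidation, assuming the module names and the setup script behave exactly as listed above; everything after the qsub line runs inside the interactive job it opens, and the comments are ours:

qsub -I -l walltime=72:00:00,nodes=6:ppn=12,mem=132gb
module load Hadoop/2.6.2
module load HBase/0.98.16.hdp262
module load phoenix/4.6.0
setup_start-Hadoop.sh     # pass the f (format) argument only on the very first run
hdfs dfsadmin -report     # confirm the DataNodes on all allocated nodes report in
djps                      # confirm NameNode, DataNode, HMaster, and RegionServer JVMs are up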
(B) Process to Ingest the File into Phoenix/HBase Database
(1) module load Hadoop/2.6.2
(2) module load HBase/0.98.16.hdp262
(3) module load phoenix/4.6.0
(4) localFileName="the CSV file containing your data"
(5) hdfs dfs -mkdir /data
(6) hdfs dfs -put "$localFileName" /data/
(7) hdfs dfs -ls /data
(8) sqlline.py hermes0090-ib0 DAD.sql (DAD.sql holds the table DDL; a hypothetical sketch follows this list)
(9) export HADOOP_CLASSPATH=/global/software/Hadoop-cluster/HBase-0.98.16.1/lib/hbase-protocol-0.98.16.1.jar:/global/software/Hadoop-cluster/HBase-0.98.16.1/lib/high-scale-lib-1.1.1.jar:/global/scratch/dchrimes/HBase-0.98.16.1/34434213.moab01.westgrid.uvic.ca/conf
(10) time hadoop jar /global/software/Hadoop-cluster/phoenix-4.6.0/phoenix-4.6.0-HBase-0.98-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table DAD --input "/data/$localFileName"
#psql.py -t DAD localhost all.csv (commented-out single-client alternative to the MapReduce bulk load)
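Step (8) runs DAD.sql through sqlline.py, but the file itself is not reproduced in this box. Purely as an illustration, a Phoenix DDL file of this kind could look like the sketch below; the column names and types are hypothetical, not the actual DAD schema:

cat > DAD.sql <<'EOF'
-- hypothetical columns for illustration only; the real DAD schema differs
CREATE TABLE IF NOT EXISTS DAD (
    ENCOUNTER_ID   BIGINT NOT NULL,
    PATIENT_ID     BIGINT,
    ADMIT_DATE     DATE,
    DISCHARGE_DATE DATE,
    DIAGNOSIS_CODE VARCHAR
    CONSTRAINT pk PRIMARY KEY (ENCOUNTER_ID)
);
EOF
sqlline.py hermes0090-ib0 DAD.sql

Phoenix maps the table onto an HBase table of the same name, so the CsvBulkLoadTool call in step (10) can generate and load HFiles against it directly.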
(C) Ingest All Using d_runAll.sh
(1) First decide which schema file to use and check that its column names are correct: DADV2.sql (for v2) or DAD.sql (for the old schema)
(2) Create the database table using sqlline.py as illustrated above (sqlline.py hermes0090-ib0 DAD.sql)
(3) Make sure all the modules are loaded: module load Hadoop/2.6.2; module load HBase/0.98.16.hdp262; module load phoenix/4.6.0
(4) Generate the rest of the data (the target is 10 billion records) and monitor the record count in the database.
(5) Use d_runAll.sh to ingest them all at once (an illustrative sketch of such a driver follows this list).
(6) If a problem occurs (or persists), check the logs in the different locations (/global/scratch/dchrimes/ and/or /scratch/JOBID on the compute nodes).
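The d_runAll.sh script itself is not reproduced in this box. As a sketch only, a driver in the same spirit could loop over the generated CSV files already in HDFS, run the bulk-load tool on each, and keep the per-file logs that step (6) asks you to check; the /data layout, log directory, and error handling below are our assumptions, not the actual script:

#!/bin/bash
# hypothetical d_runAll.sh-style driver; all paths are assumptions
PHOENIX_JAR=/global/software/Hadoop-cluster/phoenix-4.6.0/phoenix-4.6.0-HBase-0.98-client.jar
LOGDIR=/global/scratch/dchrimes/ingest-logs
mkdir -p "$LOGDIR"
for f in $(hdfs dfs -ls /data | awk '/\.csv$/ {print $NF}'); do
    name=$(basename "$f")
    hadoop jar "$PHOENIX_JAR" org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table DAD --input "$f" > "$LOGDIR/$name.log" 2>&1 \
        || { echo "failed on $f; see $LOGDIR/$name.log"; exit 1; }
done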