Research Article

Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services

Table 2

Big Data technologies using Hadoop with possible applications in healthcare [5, 7–9, 11–13, 29, 37–42].

| Technologies | Clinical utilization |
| --- | --- |
| Hadoop Distributed File System (HDFS) | Used clinically for its high-capacity, fault-tolerant, and inexpensive storage of very large clinical datasets. |
| MapReduce | Programming paradigm that has been used to process clinical Big Data. |
| Hadoop | Infrastructure adapted for clinical data processing. |
| Spark | Used indirectly for processing and storing clinical data. |
| Cassandra | Key-value store used indirectly for clinical data. |
| HBase | NoSQL database with random access; has been used for clinical data. |
| Apache Solr | Document warehouse used indirectly for clinical data. |
| Lucene and Blur | Document warehouses not yet used in healthcare but emerging for free-text queries on the Hadoop platform; could be used for clinical data. |
| MongoDB | JSON document-oriented database; has been used for clinical data. |
| Hive | Data interaction not yet configured for clinical data, but a cross-platform SQL layer is possible. |
| Spark SQL | SQL access to Hadoop data; not yet configured for clinical data. |
| JSON | Data description and transfer format; has been used for clinical data. |
| ZooKeeper | Coordination of data flow; has been used for clinical data. |
| YARN | Resource allocator for data flow; has been used for clinical data. |
| Oozie | Workflow scheduler for managing complex multipart Hadoop jobs; not currently used for clinical data. |
| Pig | High-level data flow language for processing batches of data; not used for clinical data. |
| Storm | Streaming ingestion; has been used for clinical data. |
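
Table 2 names MapReduce as a programming paradigm without showing its shape. The following is a minimal sketch of the map, shuffle, and reduce phases in plain Python; it does not use Hadoop and is not the platform's actual job code. The records, the diagnosis codes, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical input: (patient_id, diagnosis_code) pairs invented for this sketch.
RECORDS = [
    ("p1", "I10"),
    ("p2", "E11"),
    ("p3", "I10"),
    ("p1", "E11"),
    ("p4", "I10"),
]


def map_phase(record):
    """Map: emit one (key, value) pair per record, here (diagnosis_code, 1)."""
    _patient_id, code = record
    yield code, 1


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence on an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

In a real Hadoop deployment the map and reduce functions would run in parallel across nodes over data stored in HDFS, and the framework would perform the shuffle; the single-process version above only illustrates the data flow between the phases.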