Computational and Mathematical Methods in Medicine

Research Article

Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services

Big Data technologies using Hadoop with possible applications in healthcare [5, 7–9, 11–13, 29, 37–42].


Technologies	Clinical utilization

Hadoop Distributed File System (HDFS)	Used clinically because it provides high-capacity, fault-tolerant, and inexpensive storage of very large clinical datasets.

MapReduce	Programming paradigm that has been used for processing clinical Big Data.

Hadoop	Infrastructure adapted for clinical data processing.

Spark	Used indirectly for processing/storage of clinical data.

Cassandra	Key-value store used indirectly for clinical data.

HBase	NoSQL database with random access; has been used for clinical data.

Apache Solr	Document warehouse used indirectly for clinical data.

Lucene and Blur	Document warehouse not yet used in healthcare, but emerging for free-text queries on the Hadoop platform; can be used for clinical data.

MongoDB	JSON document-oriented database has been used for clinical data.

Hive	Data interaction; not yet configured for clinical data, but a cross-platform SQL layer is possible.

Spark SQL	SQL access to Hadoop data not yet configured for clinical data.

JSON	Data description and transfer format; has been used for clinical data.

ZooKeeper	Coordination of data flow; has been used for clinical data.

YARN	Resource allocator for data flow; has been used for clinical data.

Oozie	Workflow scheduler for managing complex multipart Hadoop jobs; not currently used for clinical data.

Pig	High-level data flow language for processing batches of data; not used for clinical data.

Storm	Streaming ingestion; has been used for clinical data.
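The MapReduce paradigm listed in the table can be illustrated with a minimal single-process sketch in Python. It does not use the Hadoop API; it only mimics the three phases (map, shuffle/group by key, reduce) on a hypothetical list of patient records carrying illustrative ICD-10-style diagnosis codes, and counts how often each code occurs. On a Hadoop cluster the framework would distribute the map and reduce tasks across nodes and perform the shuffle itself; the function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (patient_id, list of diagnosis codes)
Record = Tuple[str, List[str]]


def map_phase(record: Record) -> Iterator[Tuple[str, int]]:
    """Map: emit (diagnosis_code, 1) for every code in one record."""
    _patient_id, codes = record
    for code in codes:
        yield code, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one diagnosis code."""
    return key, sum(values)


def run_job(records: Iterable[Record]) -> Dict[str, int]:
    """Chain the three phases; results are returned in sorted key order."""
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    sample = [
        ("p001", ["E11.9", "I10"]),
        ("p002", ["I10"]),
        ("p003", ["J45.909", "I10", "E11.9"]),
    ]
    print(run_job(sample))
```

Because each map call looks at one record only and each reduce call sees one key only, both phases can run in parallel over partitions of the data, which is what makes the paradigm suitable for large clinical datasets.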
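HBase provides random access by keeping rows sorted by row key, so a single row can be fetched directly and a contiguous range of keys can be scanned. The Python sketch below is a toy in-memory model of that access pattern, not the HBase client API. The class `SortedRowStore`, the helper `make_row_key`, and the patient-id-then-date key layout are hypothetical choices for illustration; column names follow HBase's `family:qualifier` convention.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SortedRowStore:
    """Toy in-memory table whose rows stay sorted by row key.

    Not the HBase client API: it only mimics two access patterns,
    a point lookup by row key and a scan over a row-key prefix.
    """

    def __init__(self) -> None:
        self._keys: List[str] = []                  # row keys in sorted order
        self._rows: Dict[str, Dict[str, str]] = {}  # row key -> {column: value}

    def put(self, row_key: str, column: str, value: str) -> None:
        """Write one cell, inserting the row key in sorted position if new."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Optional[Dict[str, str]]:
        """Random access: return a copy of one row, or None if absent."""
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan_prefix(self, prefix: str) -> List[Tuple[str, Dict[str, str]]]:
        """Return all rows whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)  # first key >= prefix
        result: List[Tuple[str, Dict[str, str]]] = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            result.append((key, dict(self._rows[key])))
        return result


def make_row_key(patient_id: str, iso_date: str) -> str:
    """Hypothetical key layout: patient id, '#', then an ISO-8601 date."""
    return f"{patient_id}#{iso_date}"


if __name__ == "__main__":
    store = SortedRowStore()
    store.put(make_row_key("p001", "2023-03-01"), "obs:heart_rate", "72")
    store.put(make_row_key("p002", "2023-01-10"), "obs:heart_rate", "88")
    store.put(make_row_key("p001", "2023-01-15"), "obs:heart_rate", "80")
    print(store.get("p002#2023-01-10"))
    for key, row in store.scan_prefix("p001#"):
        print(key, row)
```

Placing the patient identifier first and terminating it with `#` keeps all encounters for one patient in a single contiguous, date-ordered range, and prevents a scan for `p001#` from also matching a patient such as `p0010`.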