Research Article
Using Natural Language Preprocessing Architecture (NLPA) for Big Data Text Sources
Table 1
Input and output data types for all tasks.
| Input data type | Available tasks |
|---|---|
| File | GuessDateFromFilePipe (GDFF), StoreFileExtensionPipe (SFE), TargetAssigningFromPathPipe (TAFP), File2StringBufferPipe (F2SB) |
| StringBuffer | AbbreviationFromStringBufferPipe (AFSB), ComputePolarityFromStringBufferPipe (CPFSB), ComputePolarityTBWSFromStringBufferPipe (CPTFSB), ContractionsFromStringBufferPipe (CFSB), FindEmojiInStringBufferPipe (FEjISB), FindEmoticonInStringBufferPipe (FEtISB), FindHashtagInStringBufferPipe (FHISB), FindUrlInStringBufferPipe (FUISB), FindUserNameInStringBufferPipe (FUNISB), GuessLanguageFromStringBufferPipe (GLFSB), InterjectionFromStringBufferPipe (IFSB), MeasureLengthFromStringBufferPipe (MLFSB), NERFromStringBufferPipe (NFSB), StringBufferToLowerCasePipe (SBTLC), SlangFromStringBufferPipe (SFSB), StripHTMLFromStringBufferPipe (SHFSB), StopWordFromStringBufferPipe (SWFSB), TeeCSVFromStringBufferPipe (TCFSB), StringBuffer2SynsetSequencePipe (SB2SS), StringBuffer2TokenSequencePipe (SB2TS) |
| SynsetSequence | SynsetSequence2FeatureVectorPipe (SS2FV) |
| TokenSequence | TokenSequencePorterStemmerPipe (TSPS), TokenSequenceStemIrregularPipe (TSSI), TokenSequence2FeatureVectorPipe (TS2FV) |
| FeatureVector | TeeCSVFromFeatureVectorPipe (TCFFV), TeeDatasetFromFeatureVectorPipe (TDFFV) |
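Table 1 groups the tasks by the data type they consume, which implies that a sequence of tasks is only well formed when each task accepts the type produced by the task before it. The Java sketch below illustrates that compatibility check under stated assumptions. The `DataType` enum, the `Task` record and the `firstMismatch` method are hypothetical names and are not the NLPA API. The output type of each task is also an assumption inferred from its name: converters such as File2StringBufferPipe change the type (File to StringBuffer), while tasks such as StripHTMLFromStringBufferPipe or the "Tee" tasks are assumed to pass on the same type they receive.

```java
import java.util.List;

// Hypothetical sketch of type-checking a chain of NLPA-style tasks.
// The output types below are ASSUMPTIONS inferred from task names,
// not values taken from Table 1 or from the NLPA source code.
public class PipelineCheck {

    // Data types that flow between tasks (the input types listed in Table 1).
    enum DataType { FILE, STRING_BUFFER, TOKEN_SEQUENCE, SYNSET_SEQUENCE, FEATURE_VECTOR }

    // A task together with the type it consumes and the type it produces.
    record Task(String name, DataType input, DataType output) {}

    // Returns the index of the first task whose input type differs from the
    // output type of the preceding task, or -1 if the whole chain is valid.
    static int firstMismatch(List<Task> pipeline) {
        for (int i = 1; i < pipeline.size(); i++) {
            if (pipeline.get(i).input() != pipeline.get(i - 1).output()) {
                return i;
            }
        }
        return -1;
    }

    // An example chain from raw files to feature vectors (assumed types).
    static final List<Task> EXAMPLE = List.of(
        new Task("TargetAssigningFromPathPipe", DataType.FILE, DataType.FILE),
        new Task("File2StringBufferPipe", DataType.FILE, DataType.STRING_BUFFER),
        new Task("StripHTMLFromStringBufferPipe", DataType.STRING_BUFFER, DataType.STRING_BUFFER),
        new Task("StringBuffer2TokenSequencePipe", DataType.STRING_BUFFER, DataType.TOKEN_SEQUENCE),
        new Task("TokenSequencePorterStemmerPipe", DataType.TOKEN_SEQUENCE, DataType.TOKEN_SEQUENCE),
        new Task("TokenSequence2FeatureVectorPipe", DataType.TOKEN_SEQUENCE, DataType.FEATURE_VECTOR),
        new Task("TeeCSVFromFeatureVectorPipe", DataType.FEATURE_VECTOR, DataType.FEATURE_VECTOR));

    public static void main(String[] args) {
        System.out.println("valid example: " + (firstMismatch(EXAMPLE) == -1));
        // Placing a StringBuffer task after tokenization breaks the chain.
        List<Task> broken = List.of(EXAMPLE.get(1), EXAMPLE.get(3), EXAMPLE.get(2));
        System.out.println("first mismatch at index " + firstMismatch(broken));
    }
}
```

Running `main` prints `valid example: true` followed by `first mismatch at index 2`, because StripHTMLFromStringBufferPipe expects a StringBuffer but receives a TokenSequence. Keeping the type information on each task makes such ordering errors detectable before any text is processed.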