Research Article

On Detecting and Removing Superficial Redundancy in Vector Databases

Table 5

Analysis of dataset-cleaning tools. In the notation a/b, the first argument gives a feature of the free version of the corresponding tool, and the second gives the same feature in the enterprise version. Also, the symbol indicates that the tool cleans at the row level rather than at the column level.

| Indicator | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
|---|---|---|---|---|---|---|---|
| Minimal required RAM | NO | NO | 4 GB | NO | NO | 2 GB | 128 MB |
Redundancy type I: NO NO NO NO NO NO
| Redundancy type II | NO | NO | NO | NO | NO | NO | NO |
| Redundancy type III | NO | NO | NO | NO | NO | NO | NO |
| Representations | NO | NO | statistical graphics | statistical graphics | statistical graphics | NO | NO |
| Allowed input formats | text | CSV, text, JSON, XML, Google Format, RDF | CSV, text, MS Excel, JSV, LOG, MySQL, JSON, SQL Server | CSV, MS Excel, SQL Server, Oracle DB, XML, PostgreSQL, Apache Derby, IBM DB2, HSQLDB, MySQL, MongoDB | CSV, text, MS Excel, MySQL, Oracle DB, SQL Server, MS Access | CSV, text, MS Excel, XML, DIF, SYLK, dBASE | CSV, text, MS Access, SQL Server, MySQL |
| Output formats | CSV, text, TSV, JSON, lookup table | CSV, TSV, MS Excel, HTML, template | CSV, JSON, TDE / MS Access, MySQL, SQL Server | CSV, MS Excel, MySQL, SQL Server, Oracle DB | CSV, text, MS Access | CSV, text, MS Excel, DIF, XML, SYLK, PDF, XPS, HTML, Open XML, OpenDocument | CSV, text, MS Excel |
| Size restrictions | 1000 cases and 40 variables | NO | NO | NO | NO | 1,048,576 cases and 16,384 variables | 1000 cases / without limit |
| Operating systems | W/L/M | W/L/M | W/M | W/L/M | W | W/M | W |
Online version: X X Cloud X X Cloud
Local version: X
| Interface | Web | Web | Desktop | Desktop | Console/desktop | Desktop | Desktop |
| License | Free | BSD | Free / Enterprise | Free / Paying | Free / Paying | Free / Paying | Free / Paying |
| Company | Stanford / Berkeley | Open source | Trifacta | Human Interface | WinPure | Microsoft | Ashisoft |
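The caption distinguishes tools that clean at the row level from those that clean at the column level. The following Python sketch contrasts the two granularities. It assumes that "cleaning" means removing exact duplicates. The tools in Table 5 may use fuzzier matching, so this is an illustration rather than a description of any of them. The function names `dedupe_rows` and `dedupe_columns` are hypothetical and are not part of any listed tool.

```python
from typing import List, Tuple

# A table is modelled as a list of records, each record a list of string cells.
Table = List[List[str]]


def dedupe_rows(table: Table) -> Table:
    """Row-level cleaning (assumed exact match): keep the first copy of each record."""
    seen = set()
    result: Table = []
    for row in table:
        key = tuple(row)  # lists are unhashable, so compare records as tuples
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def dedupe_columns(header: List[str], table: Table) -> Tuple[List[str], Table]:
    """Column-level cleaning (assumed exact match): drop any column whose
    values exactly repeat those of an earlier column."""
    if not table:
        # No data rows: nothing can be compared, so keep every column.
        return list(header), []
    columns = list(zip(*table))  # transpose records into column tuples
    seen = set()
    keep: List[int] = []
    for idx, col in enumerate(columns):
        if col not in seen:
            seen.add(col)
            keep.append(idx)
    new_header = [header[i] for i in keep]
    new_table = [[row[i] for i in keep] for row in table]
    return new_header, new_table
```

For example, with `header = ["id", "city", "town"]` and rows where `town` always equals `city`, `dedupe_rows` removes the repeated record, while `dedupe_columns` keeps every record and instead drops the redundant `town` column.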