Integrating Correlation-Based Feature Selection and Clustering for Improved Cardiovascular Disease Diagnosis

<table class="algorithm-group"><tr><td><table class="algorithm" id="alg1">
<tr><td colspan="2"><i>Input: F = {f<sub>1</sub>, f<sub>2</sub>, f<sub>3</sub>, …, f<sub>n</sub>} /* set of all the features */;</i></td></tr>
<tr><td colspan="2">   <i>P /* statistical significance level */;</i></td></tr>
<tr><td colspan="2">   <i>R /* a threshold for correlation coefficient levels */;</i></td></tr>
<tr><td colspan="2">   <i>N /* the maximum number of features in the subset */;</i></td></tr>
<tr><td colspan="2"><i>Output: F<sub>s</sub> /* selected subset of features */;</i></td></tr>
<tr><td colspan="2">   <i>(1) Initialize F<sub>s</sub> with the feature f<sub>j</sub> ∈ F that is the least correlated with the other ones;</i></td></tr>
<tr><td colspan="2">   <i>(2) do</i></td></tr>
<tr><td colspan="2">   <i>(3) Compute C<sub>ij</sub>(F<sub>s</sub>, F \ F<sub>s</sub>) as a vector of correlation coefficients between F<sub>s</sub> and each f<sub>i</sub> ∈ {F \ F<sub>s</sub>};</i></td></tr>
<tr><td colspan="2">   <i>(4) Choose f<sub>j</sub> ∈ {F \ F<sub>s</sub>} with the lowest value of the correlation coefficient in the vector C<sub>ij</sub>(F<sub>s</sub>, F \ F<sub>s</sub>);</i></td></tr>
<tr><td colspan="2">   <i>(5) Include f<sub>j</sub> in F<sub>s</sub>;</i></td></tr>
<tr><td colspan="2">   <i>(6) while (s &lt; N AND p &gt; P AND C<sub>ij</sub>(F<sub>s</sub>, F \ F<sub>s</sub>) &lt; R).</i></td></tr>
</table></td></tr></table>

<div> Proposed feature selection algorithm using reversed correlations</div>
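The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The pseudocode does not say how the correlation between the set F<sub>s</sub> and a single candidate is aggregated, so the sketch assumes the largest absolute Pearson coefficient between the candidate and any already selected feature. It also assumes that p is the two-sided p-value of that coefficient, approximated here with the Fisher z-transform so that only the standard library is needed. The names `pearson`, `p_value`, and `select_features`, the dictionary input, and the default parameter values are all hypothetical.

```python
import math
from statistics import NormalDist, mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant feature counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z-transform)."""
    if n <= 3:
        return 1.0  # too few samples for the approximation
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def select_features(features, P=0.05, R=0.5, N=5):
    """features maps a feature name to its list of values; returns selected names."""
    names = list(features)
    if len(names) < 2:
        return names
    n = len(features[names[0]])
    # Absolute pairwise correlations between all distinct features.
    corr = {(a, b): abs(pearson(features[a], features[b]))
            for a in names for b in names if a != b}

    # Step 1: start with the feature least correlated with the others
    # (assumption: lowest mean absolute correlation).
    first = min(names, key=lambda a: mean(corr[a, b] for b in names if b != a))
    selected = [first]
    remaining = [a for a in names if a != first]

    # Steps 2-6: do ... while loop. The candidate is included before the
    # stopping condition is tested, exactly as the pseudocode is written.
    while remaining:
        # Step 3: correlation of each candidate with the selected set
        # (assumption: strongest absolute correlation with any member).
        scores = {c: max(corr[c, s] for s in selected) for c in remaining}
        # Step 4: candidate with the lowest correlation.
        best = min(remaining, key=scores.get)
        c_best = scores[best]
        # Step 5: include it in the subset.
        selected.append(best)
        remaining.remove(best)
        # Step 6: continue only while all three conditions hold.
        if not (len(selected) < N and p_value(c_best, n) > P and c_best < R):
            break
    return selected
```

Because the loop tests its condition only after step (5), the last feature added may itself exceed the threshold R or fall below the significance level P. A caller who wants a strict cut-off could check the condition before including the candidate, but that would depart from the pseudocode as printed.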

Complexity


Algorithm 1
