An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

<table><tr><td><table class="algorithm" id="alg1"><tr><td colspan="2">Set <svg height="9.21094pt" id="M132" style="vertical-align:-0.2129908pt" version="1.1" viewbox="-0.0498162 -8.99795 25.8923 9.21094" width="25.8923pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M244 607C244 633 228 655 200 655C166 655 146 618 146 594C146 564 166 546 191 546C221 546 244 574 244 607ZM222 91L209 114C184 94 148 66 133 66C127 66 124 73 130 96L201 370C213 416 211 448 191 448C162 448 88 407 29 352L42 328C73 354 104 371 114 371C120 371 119 365 115 345L53 92C32 5 45 -12 68 -12C103 -12 186 50 222 91Z" id="g113-106"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="273" vert-adv-y="273"></glyph.data></g><g transform="matrix(.0135,0,0,-0.0135,7.484,0)"><path d="M535 323V373H52V323H535ZM535 138V188H52V138H535Z" id="g117-34"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="587" vert-adv-y="587"></glyph.data></g><g transform="matrix(.0135,0,0,-0.0135,19.222,0)"><path d="M241 635C89 635 35 457 35 312C35 153 89 -12 240 -12C390 -12 443 166 443 312C443 466 390 635 241 635ZM238 602C329 602 354 454 354 312C354 172 330 22 240 22C152 22 124 173 124 313S148 602 238 602Z" id="g113-49"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="480" vert-adv-y="480"></glyph.data></g></svg>; </td></tr><tr><td colspan="2"><b>for</b> each numeric attribute <svg height="12.4257pt" id="M133" style="vertical-align:-3.427751pt" version="1.1" viewbox="-0.0498162 -8.99795 13.9791 12.4257" width="13.9791pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M686 28C612 35 607 44 591 112C563 234 541 360 519 489L489 666L457 658L147 121C100 40 89 36 24 28L17 0H240L250 28C168 34 159 41 190 101L262 237H482C495 180 503 137 510 91C517 47 514 35 441 28L433 0H677L686 28ZM475 280H285L429 541H431L475 280Z" 
id="g113-66"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="703" vert-adv-y="703"></glyph.data></g><g transform="matrix(.0095,0,0,-0.0095,10.676,3.264)"><path d="M250 606C250 634 233 656 203 656C168 656 146 618 146 593C146 564 169 545 192 545C227 545 250 573 250 606ZM227 95L212 119C187 98 152 71 135 71C129 71 128 78 134 102L207 373C219 418 217 451 194 451C165 451 92 411 30 351L44 326C77 353 106 371 114 371C124 371 121 357 117 341L55 97C32 5 46 -12 70 -12C108 -12 191 51 227 95Z" id="g50-106"></path><glyph.data ascent="3443" descent="-2856" horiz-adv-x="280" vert-adv-y="280"></glyph.data></g></svg> in dataset <svg height="9.04777pt" id="M134" style="vertical-align:-0.04981995pt" version="1.1" viewbox="-0.0498162 -8.99795 9.65971 9.04777" width="9.65971pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M686 28C612 35 607 44 591 112C563 234 541 360 519 489L489 666L457 658L147 121C100 40 89 36 24 28L17 0H240L250 28C168 34 159 41 190 101L262 237H482C495 180 503 137 510 91C517 47 514 35 441 28L433 0H677L686 28ZM475 280H285L429 541H431L475 280Z" id="g113-66"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="703" vert-adv-y="703"></glyph.data></g></svg>  <b>do</b></td></tr><tr><td colspan="2"> Compute the similarity matrix using (<a href="https://static.hindawi.com/articles/mpe/volume-2014/486075/figures/#EEq10">10</a>) and use it as the input; </td></tr><tr><td colspan="2"> Take the median of the similarities as the shared preference value;</td></tr><tr><td colspan="2"> Run the AP algorithm using (<a href="https://static.hindawi.com/articles/mpe/volume-2014/486075/figures/#EEq1">1</a>)–(<a href="https://static.hindawi.com/articles/mpe/volume-2014/486075/figures/#EEq4">4</a>) to partition the values of the attribute into <svg height="12.4257pt" id="M135" style="vertical-align:-3.427751pt" version="1.1" viewbox="-0.0498162 -8.99795 9.61387 12.4257" width="9.61387pt" 
xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M449 634C442 637 425 643 405 650C376 660 341 666 307 666C181 666 98 590 98 485C98 400 170 343 215 310L246 288C307 243 343 204 343 147C343 67 291 18 219 18C104 18 61 124 51 202L23 199C28 124 27 71 27 47C47 22 122 -16 204 -16C324 -16 428 60 428 174C428 256 379 309 307 360L276 382C223 419 179 455 179 516C179 576 221 632 293 632C379 632 410 564 418 487L448 490C446 536 446 592 449 634Z" id="g113-84"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="472" vert-adv-y="472"></glyph.data></g><g transform="matrix(.0095,0,0,-0.0095,6.327,3.264)"><path d="M250 606C250 634 233 656 203 656C168 656 146 618 146 593C146 564 169 545 192 545C227 545 250 573 250 606ZM227 95L212 119C187 98 152 71 135 71C129 71 128 78 134 102L207 373C219 418 217 451 194 451C165 451 92 411 30 351L44 326C77 353 106 371 114 371C124 371 121 357 117 341L55 97C32 5 46 -12 70 -12C108 -12 191 51 227 95Z" id="g50-106"></path><glyph.data ascent="3443" descent="-2856" horiz-adv-x="280" vert-adv-y="280"></glyph.data></g></svg> clusters;  </td></tr><tr><td colspan="2"> Discretize attribute <svg height="12.4257pt" id="M136" style="vertical-align:-3.427751pt" version="1.1" viewbox="-0.0498162 -8.99795 13.9791 12.4257" width="13.9791pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M686 28C612 35 607 44 591 112C563 234 541 360 519 489L489 666L457 658L147 121C100 40 89 36 24 28L17 0H240L250 28C168 34 159 41 190 101L262 237H482C495 180 503 137 510 91C517 47 514 35 441 28L433 0H677L686 28ZM475 280H285L429 541H431L475 280Z" id="g113-66"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="703" vert-adv-y="703"></glyph.data></g><g transform="matrix(.0095,0,0,-0.0095,10.676,3.264)"><path d="M250 606C250 634 233 656 203 656C168 656 146 618 146 593C146 564 169 545 192 545C227 545 
250 573 250 606ZM227 95L212 119C187 98 152 71 135 71C129 71 128 78 134 102L207 373C219 418 217 451 194 451C165 451 92 411 30 351L44 326C77 353 106 371 114 371C124 371 121 357 117 341L55 97C32 5 46 -12 70 -12C108 -12 191 51 227 95Z" id="g50-106"></path><glyph.data ascent="3443" descent="-2856" horiz-adv-x="280" vert-adv-y="280"></glyph.data></g></svg> into <svg height="12.4257pt" id="M137" style="vertical-align:-3.427751pt" version="1.1" viewbox="-0.0498162 -8.99795 9.61387 12.4257" width="9.61387pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M449 634C442 637 425 643 405 650C376 660 341 666 307 666C181 666 98 590 98 485C98 400 170 343 215 310L246 288C307 243 343 204 343 147C343 67 291 18 219 18C104 18 61 124 51 202L23 199C28 124 27 71 27 47C47 22 122 -16 204 -16C324 -16 428 60 428 174C428 256 379 309 307 360L276 382C223 419 179 455 179 516C179 576 221 632 293 632C379 632 410 564 418 487L448 490C446 536 446 592 449 634Z" id="g113-84"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="472" vert-adv-y="472"></glyph.data></g><g transform="matrix(.0095,0,0,-0.0095,6.327,3.264)"><path d="M250 606C250 634 233 656 203 656C168 656 146 618 146 593C146 564 169 545 192 545C227 545 250 573 250 606ZM227 95L212 119C187 98 152 71 135 71C129 71 128 78 134 102L207 373C219 418 217 451 194 451C165 451 92 411 30 351L44 326C77 353 106 371 114 371C124 371 121 357 117 341L55 97C32 5 46 -12 70 -12C108 -12 191 51 227 95Z" id="g50-106"></path><glyph.data ascent="3443" descent="-2856" horiz-adv-x="280" vert-adv-y="280"></glyph.data></g></svg> intervals according to the clustering result; </td></tr><tr><td colspan="2"> <svg height="9.36053pt" id="M138" style="vertical-align:-0.3625803pt" version="1.1" viewbox="-0.0498162 -8.99795 43.6434 9.36053" width="43.6434pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M244 
607C244 633 228 655 200 655C166 655 146 618 146 594C146 564 166 546 191 546C221 546 244 574 244 607ZM222 91L209 114C184 94 148 66 133 66C127 66 124 73 130 96L201 370C213 416 211 448 191 448C162 448 88 407 29 352L42 328C73 354 104 371 114 371C120 371 119 365 115 345L53 92C32 5 45 -12 68 -12C103 -12 186 50 222 91Z" id="g113-106"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="273" vert-adv-y="273"></glyph.data></g><g transform="matrix(.0135,0,0,-0.0135,7.484,0)"><path d="M535 323V373H52V323H535ZM535 138V188H52V138H535Z" id="g117-34"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="587" vert-adv-y="587"></glyph.data></g><g transform="matrix(.0135,0,0,-0.0135,19.222,0)"><path d="M244 607C244 633 228 655 200 655C166 655 146 618 146 594C146 564 166 546 191 546C221 546 244 574 244 607ZM222 91L209 114C184 94 148 66 133 66C127 66 124 73 130 96L201 370C213 416 211 448 191 448C162 448 88 407 29 352L42 328C73 354 104 371 114 371C120 371 119 365 115 345L53 92C32 5 45 -12 68 -12C103 -12 186 50 222 91Z" id="g113-106"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="273" vert-adv-y="273"></glyph.data></g><g transform="matrix(.0135,0,0,-0.0135,25.949,0)"><path d="M535 230V280H323V490H265V280H52V230H265V-3H323V230H535Z" id="g117-36"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="587" vert-adv-y="587"></glyph.data></g><g transform="matrix(.0135,0,0,-0.0135,36.929,0)"><path d="M384 0V27C293 34 287 42 287 114V635C232 613 172 594 109 583V559L157 557C201 555 205 550 205 499V114C205 42 199 34 109 27V0H384Z" id="g113-50"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="480" vert-adv-y="480"></glyph.data></g></svg>;  </td></tr><tr><td colspan="2"><b>end for</b></td></tr><tr><td colspan="2">Establish a new dataset <svg height="9.04777pt" id="M139" style="vertical-align:-0.04981995pt" version="1.1" viewbox="-0.0498162 -8.99795 8.27261 9.04777" width="8.27261pt" xmlns="http://www.w3.org/2000/svg" 
xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M578 512C578 619 494 650 387 650H140L134 622C219 615 223 609 210 542L127 119C112 40 104 34 23 28L17 0H235C317 0 387 7 444 33C515 65 564 122 564 195C564 289 495 335 412 352V354C505 373 578 422 578 512ZM486 510C486 422 423 367 314 367H257L294 565C299 591 305 602 315 608C324 614 339 617 362 617C421 617 486 595 486 510ZM466 200C466 100 393 35 296 35C222 35 198 51 212 127L250 333H303C388 333 466 303 466 200Z" id="g113-67"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="601" vert-adv-y="601"></glyph.data></g></svg>, which is a purely categorical dataset composed of the discretized numeric attributes and the original categorical attributes;  </td></tr><tr><td colspan="2"><b>for</b> each attribute <svg height="14.758pt" id="M140" style="vertical-align:-5.760051pt" version="1.1" viewbox="-0.0498162 -8.99795 12.796 14.758" width="12.796pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M578 512C578 619 494 650 387 650H140L134 622C219 615 223 609 210 542L127 119C112 40 104 34 23 28L17 0H235C317 0 387 7 444 33C515 65 564 122 564 195C564 289 495 335 412 352V354C505 373 578 422 578 512ZM486 510C486 422 423 367 314 367H257L294 565C299 591 305 602 315 608C324 614 339 617 362 617C421 617 486 595 486 510ZM466 200C466 100 393 35 296 35C222 35 198 51 212 127L250 333H303C388 333 466 303 466 200Z" id="g113-67"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="601" vert-adv-y="601"></glyph.data></g><g transform="matrix(.0095,0,0,-0.0095,8.075,3.264)"><path d="M400 606C400 634 383 656 353 656C316 656 294 620 294 593C294 564 317 545 343 545C375 545 400 573 400 606ZM366 351C379 413 381 451 356 451C323 451 251 408 183 341L199 313C223 335 267 365 277 365C285 365 284 354 277 312C245 132 222 27 193 -100C182 -148 160 -188 131 -188C113 -188 90 -178 75 -170C64 -164 
55 -168 48 -175C38 -185 24 -203 24 -222S48 -257 71 -257C89 -257 131 -241 186 -192C243 -141 286 -46 310 74L366 351Z" id="g50-107"></path><glyph.data ascent="3443" descent="-2856" horiz-adv-x="430" vert-adv-y="430"></glyph.data></g></svg> in dataset <svg height="9.04777pt" id="M141" style="vertical-align:-0.04981995pt" version="1.1" viewbox="-0.0498162 -8.99795 8.27261 9.04777" width="8.27261pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M578 512C578 619 494 650 387 650H140L134 622C219 615 223 609 210 542L127 119C112 40 104 34 23 28L17 0H235C317 0 387 7 444 33C515 65 564 122 564 195C564 289 495 335 412 352V354C505 373 578 422 578 512ZM486 510C486 422 423 367 314 367H257L294 565C299 591 305 602 315 608C324 614 339 617 362 617C421 617 486 595 486 510ZM466 200C466 100 393 35 296 35C222 35 198 51 212 127L250 333H303C388 333 466 303 466 200Z" id="g113-67"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="601" vert-adv-y="601"></glyph.data></g></svg>  <b>do</b></td></tr><tr><td colspan="2"> Calculate the distance between every pair of distinct values of the current attribute using (<a href="https://static.hindawi.com/articles/mpe/volume-2014/486075/figures/#EEq5">5</a>)–(<a href="https://static.hindawi.com/articles/mpe/volume-2014/486075/figures/#EEq8">8</a>);</td></tr><tr><td colspan="2"> Compute the significance (weight) of each numeric attribute using (<a href="https://static.hindawi.com/articles/mpe/volume-2014/486075/figures/#EEq9">9</a>), in which the number of intervals <svg height="9.21094pt" id="M142" style="vertical-align:-0.2129908pt" version="1.1" viewbox="-0.0498162 -8.99795 6.51834 9.21094" width="6.51834pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M449 634C442 637 425 643 405 650C376 660 341 666 307 666C181 666 98 590 98 485C98 400 170 343 215 310L246 288C307 243 343 204 343 147C343 67 
291 18 219 18C104 18 61 124 51 202L23 199C28 124 27 71 27 47C47 22 122 -16 204 -16C324 -16 428 60 428 174C428 256 379 309 307 360L276 382C223 419 179 455 179 516C179 576 221 632 293 632C379 632 410 564 418 487L448 490C446 536 446 592 449 634Z" id="g113-84"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="472" vert-adv-y="472"></glyph.data></g></svg> is replaced by <svg height="12.4257pt" id="M143" style="vertical-align:-3.427751pt" version="1.1" viewbox="-0.0498162 -8.99795 9.61387 12.4257" width="9.61387pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.0135,0,0,-0.0135,0,0)"><path d="M449 634C442 637 425 643 405 650C376 660 341 666 307 666C181 666 98 590 98 485C98 400 170 343 215 310L246 288C307 243 343 204 343 147C343 67 291 18 219 18C104 18 61 124 51 202L23 199C28 124 27 71 27 47C47 22 122 -16 204 -16C324 -16 428 60 428 174C428 256 379 309 307 360L276 382C223 419 179 455 179 516C179 576 221 632 293 632C379 632 410 564 418 487L448 490C446 536 446 592 449 634Z" id="g113-84"></path><glyph.data ascent="3473" descent="-2876" horiz-adv-x="472" vert-adv-y="472"></glyph.data></g><g transform="matrix(.0095,0,0,-0.0095,6.327,3.264)"><path d="M250 606C250 634 233 656 203 656C168 656 146 618 146 593C146 564 169 545 192 545C227 545 250 573 250 606ZM227 95L212 119C187 98 152 71 135 71C129 71 128 78 134 102L207 373C219 418 217 451 194 451C165 451 92 411 30 351L44 326C77 353 106 371 114 371C124 371 121 357 117 341L55 97C32 5 46 -12 70 -12C108 -12 191 51 227 95Z" id="g50-106"></path><glyph.data ascent="3443" descent="-2856" horiz-adv-x="280" vert-adv-y="280"></glyph.data></g></svg>.</td></tr><tr><td colspan="2"><b>end for</b></td></tr></table></td></tr></table>
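
The first loop of the algorithm (similarity matrix, median preference, AP, discretization) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the similarity (10) for a single numeric attribute is the negative squared difference between two values, that (1)–(4) are the standard Frey–Dueck responsibility and availability updates with damping, and that "discretize according to the clustering result" means cutting the attribute at the midpoints between adjacent exemplar values. In one dimension with squared-difference similarity, those midpoint cuts reproduce the nearest-exemplar assignment. The function names, the parameters `damping`, `max_iter`, `conv_iter`, and the interval labels `I0`, `I1`, ... are hypothetical.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, max_iter=200, conv_iter=15):
    """Frey-Dueck affinity propagation on a similarity matrix S (sketch).

    The diagonal of S holds the preference values. Returns the exemplar indices
    and, for every point, the position (in the exemplar array) of its exemplar.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        competitor = np.tile(first[:, None], (1, n))
        competitor[rows, best] = second
        R = damping * R + (1 - damping) * (S - competitor)
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
        # Point k is an exemplar when a(k,k) + r(k,k) > 0; stop once a non-empty
        # exemplar set has stayed unchanged for conv_iter consecutive iterations.
        exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
        stable = stable + 1 if last is not None and np.array_equal(exemplars, last) else 0
        last = exemplars
        if stable >= conv_iter and exemplars.size > 0:
            break
    if exemplars.size == 0:  # no exemplar emerged: fall back to a single one
        exemplars = np.array([int(np.argmax(S.sum(axis=1)))])
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


def discretize_attribute(values):
    """Loop body for one numeric attribute A_i: returns interval codes and S_i."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 2:
        return np.zeros(n, dtype=int), 1
    S = -(x[:, None] - x[None, :]) ** 2                # assumed similarity for (10)
    preference = np.median(S[~np.eye(n, dtype=bool)])  # median of the similarities
    np.fill_diagonal(S, preference)                    # shared preference value
    exemplars, _ = affinity_propagation(S)
    centers = np.unique(x[exemplars])                  # sorted exemplar values
    cuts = (centers[:-1] + centers[1:]) / 2            # boundaries between intervals
    return np.searchsorted(cuts, x), centers.size


def build_dataset_b(numeric_columns, categorical_columns):
    """Dataset B: interval labels for each numeric attribute plus the categorical ones."""
    dataset_b, n_intervals = [], []
    for column in numeric_columns:  # one pass per numeric attribute A_i
        codes, s_i = discretize_attribute(column)
        dataset_b.append([f"I{c}" for c in codes])
        n_intervals.append(s_i)
    dataset_b.extend(list(column) for column in categorical_columns)
    return dataset_b, n_intervals
```

Using the median as the shared preference keeps the number of intervals S_i data-driven: in AP a larger preference tends to produce more exemplars, and therefore more intervals, while a smaller one produces fewer.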

Mathematical Problems in Engineering

Algorithm 1: An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets
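
The second loop of Algorithm 1 can be sketched in the same spirit. The sketch assumes, without confirmation from the text above, that (5)–(8) are the co-occurrence distance of Ahmad and Dey: for two values x and y of one attribute and another attribute j, take the maximum over subsets ω of attribute j's values of P(ω | x) + P(~ω | y) − 1, then average over all other attributes. It further assumes that (9) averages this distance over all pairs of interval labels of a discretized numeric attribute. The maximum over ω has a closed form: put a value u in ω whenever P(u | x) ≥ P(u | y), which gives the sum over u of max(P(u | x), P(u | y)), minus 1. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def conditional_distribution(col_i, col_j, value):
    """Estimate P(A_j = u | A_i = value) for each value u that co-occurs with `value`."""
    hits = [u for a, u in zip(col_i, col_j) if a == value]
    return {u: c / len(hits) for u, c in Counter(hits).items()}


def value_distance(columns, i, x, y):
    """Assumed form of (5)-(8): distance between values x and y of attribute i.

    For every other attribute j, max over subsets w of P(w|x) + P(~w|y) - 1 equals
    sum_u max(P(u|x), P(u|y)) - 1; the result is averaged over all j != i.
    """
    others = [j for j in range(len(columns)) if j != i]
    if not others:
        return 0.0
    total = 0.0
    for j in others:
        px = conditional_distribution(columns[i], columns[j], x)
        py = conditional_distribution(columns[i], columns[j], y)
        total += sum(max(px.get(u, 0.0), py.get(u, 0.0)) for u in set(px) | set(py)) - 1.0
    return total / len(others)


def significance(columns, i):
    """Assumed form of (9): mean distance over all pairs of attribute i's labels."""
    pairs = list(combinations(sorted(set(columns[i])), 2))
    if not pairs:
        return 0.0
    return sum(value_distance(columns, i, x, y) for x, y in pairs) / len(pairs)


def all_value_distances(columns):
    """Distance table for every pair of distinct values of every attribute of B."""
    return {
        i: {(x, y): value_distance(columns, i, x, y)
            for x, y in combinations(sorted(set(columns[i])), 2)}
        for i in range(len(columns))
    }
```

Applied to dataset B, `significance(B, i)` for each discretized numeric column i gives that attribute's weight. If none of the S_i intervals of a discretized attribute is empty, the number of distinct labels in its column equals S_i, so averaging over pairs of observed labels plays the role of replacing S by S_i in (9).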