Research Article

Construction of Online English Corpus Based on Web Crawler Technology

Table 1

Web crawler frameworks available for different programming languages.

Language    Web crawler frameworks

Java        Apache Nutch, webmagic, Heritrix3, WebCollector, crawler4j, Spiderman, SeimiCrawler, jsoup, Gecco, and htmlunit
Python      Scrapy, pyspider, Newspaper, and Crawley
PHP         cola, Portia, python selenium, QueryList, phpspider, and PHPCrawl
Go          Beanbun and php selenium
C#          SmartSpider, Abot, xet, AngleSharp, HtmlAgilityPack, CsQuery, open-source-search-engine, and Cobweb
C/C++       upton, Spidr, and Larbin
Ruby        wombat
Node.js     node-crawler