[Retracted] Analysis of Application Data Mining to Capture Consumer Review Data on Booking Websites
Table 1
Web crawling and anticrawling strategies.
| Web crawling strategy | Anticrawling strategy |
| --- | --- |
| Sending requests to websites and acquiring data | When the view count of a website increases drastically during a specific period, all views are from the same IP address, and all user agents are Python-based, the manager limits the access from the IP address to the website |
| Simulating a user agent and acquiring a proxy IP | When the view count is abnormal, all users are required to log in to their accounts before viewing the website |
| Registering an account and visiting a website through cookies or tokens | A complete account database is established, and each account must have clearance to review specific information |
| Mimicking user operations by restricting the request sending frequency | A verification code is used to determine whether website visitors are real people |
| Passing the required authentication (e.g., OpenCV authentication) | Dynamic loading pages are introduced, in which data are loaded through JavaScript to increase the difficulty of website analysis |
| Using Selenium and PhantomJS to fully mimic the browsing behavior of real users | |
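Two of the crawling strategies in Table 1, simulating a user agent and restricting the request sending frequency, can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' implementation: the names `USER_AGENTS`, `Throttle`, `build_request`, and `fetch` are hypothetical, and the user-agent strings are sample values chosen only for the example.

```python
import time
import urllib.request
from itertools import cycle

# Hypothetical browser-like user-agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]
_agents = cycle(USER_AGENTS)


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the timing logic can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Sleep until min_interval has elapsed since the last call; return the delay."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


def build_request(url):
    """Build a GET request whose User-Agent header is the next entry in the pool."""
    return urllib.request.Request(url, headers={"User-Agent": next(_agents)})


def fetch(url, throttle, timeout=10):
    """Wait for the throttle, then download the page body as bytes."""
    throttle.wait()
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

A caller would create one `Throttle` per target site, for example `Throttle(2.0)` for at most one request every two seconds, and pass it to every `fetch` call. The interval and the size of the user-agent pool are assumptions to tune per site.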