Mobile Information Systems

Research Article

[Retracted] Analysis of Application Data Mining to Capture Consumer Review Data on Booking Websites

Web crawling and anticrawling strategies.


Strategies	Web crawling	Anticrawling
	Sending requests to websites and acquiring data	When a website's view count rises sharply within a specific period, all views come from the same IP address, and all user agents are Python-based, the administrator restricts access to the website from that IP address
	Spoofing a browser user agent and using a proxy IP address	When the view count is abnormal, all users are required to log in to their accounts before viewing the website
	Registering an account and visiting the website with cookies or tokens	A complete account database is established, and each account must have clearance to view specific information
	Mimicking user behavior by limiting the request frequency	A verification code (e.g., a CAPTCHA) is used to determine whether website visitors are real people
	Passing the required verification (e.g., using OpenCV-based image recognition)	Dynamically loaded pages are introduced, in which data are loaded through JavaScript to make the website harder to parse
	Using Selenium and PhantomJS to fully mimic the browsing behavior of real users
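
The first anticrawling rule in the table (many views in a short period, one IP address, Python-based user agents) and the crawling countermeasures of spoofing a user agent and limiting the request frequency can be illustrated with a small simulation. The following is a minimal sketch, not the method used in the article: the class names `IpRateLimiter` and `ThrottledClient`, the helper `simulate`, the threshold of 5 requests per 10 seconds, and the example user-agent strings are all assumptions chosen for illustration. No network requests are made; a simulated clock stands in for real time.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Anticrawling side: block an IP address whose recent requests are
    both too frequent and all sent with Python-based user agents.
    The thresholds are illustrative assumptions."""

    def __init__(self, max_requests=5, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> deque of (timestamp, user_agent)
        self.blocked = set()

    def allow(self, ip, user_agent, now=None):
        now = time.monotonic() if now is None else now
        if ip in self.blocked:
            return False
        hits = self._hits[ip]
        hits.append((now, user_agent))
        # Drop requests that have fallen out of the sliding window.
        while hits and now - hits[0][0] > self.window_seconds:
            hits.popleft()
        all_python = all("python" in ua.lower() for _, ua in hits)
        if len(hits) > self.max_requests and all_python:
            self.blocked.add(ip)
            return False
        return True


class ThrottledClient:
    """Crawling side: send a chosen User-Agent header and keep a minimum
    interval between consecutive requests."""

    def __init__(self, user_agent, min_interval=2.0):
        self.user_agent = user_agent
        self.min_interval = min_interval
        self._last_sent = None

    def headers(self):
        return {"User-Agent": self.user_agent}

    def delay_before_next(self, now):
        """Seconds to wait before the next request may be sent."""
        if self._last_sent is None:
            return 0.0
        return max(0.0, self.min_interval - (now - self._last_sent))

    def mark_sent(self, now):
        self._last_sent = now


def simulate(limiter, client, ip, n_requests, start=0.0):
    """Send n_requests on a simulated clock; return how many were allowed."""
    now, allowed = start, 0
    for _ in range(n_requests):
        now += client.delay_before_next(now)
        if limiter.allow(ip, client.headers()["User-Agent"], now=now):
            allowed += 1
        client.mark_sent(now)
    return allowed
```

In this sketch, a client that sends 20 back-to-back requests with a Python-based user agent is blocked after its fifth request, whereas a client that either spaces its requests 3 seconds apart or presents a browser-like user agent is never blocked. This is why the table pairs such a rule with stronger measures such as mandatory login and verification codes.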