Author Roderic Page
The following is a guest post by Bob Mesibov.

According to w3techs, seven out of every eight websites in the Alexa top 10 million are UTF-8 encoded. This is good news for us screenscrapers, because it means that when we scrape data into a UTF-8 encoded document, the chances are good that all the characters will be correctly encoded and displayed. It's not entirely good news, though, for two reasons.