What is the future of Data Scraping and the Structured Web?

Big Data has become a hot topic over the past year. What do you think the reason for this is?

I think this is obvious. It’s difficult to imagine today’s world without data. When I got involved in IT, a 10 MB hard drive seemed gigantic, and today, hard drives capable of storing terabytes of data are standard! Besides, the largest “drive” today is the Internet, which contains an immeasurable amount of data and expands at a mind-blowing speed. We just need to learn to separate the wheat from the chaff, and that’s what big data technologies are all about.

Do you have any tips & tricks for people who want to turn unstructured data from the Web into structured data?

The thing is, this is still a fairly complex task. Products vary from “low-level”, where you need to be familiar with things like regex, XPath, CSS, HTTP and such, to “high-level”, where all you need to do is click on the data you want to extract. The first type is usually more universal, but requires some technical skills. The second one works even for inexperienced users, but is often not efficient enough for solving more complex tasks. That’s why I truly appreciate the efforts made by and similar services to find the golden mean.
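To give a feel for what the “low-level” end looks like, here is a minimal sketch in Python using only the standard library. It is an illustration, not a description of any particular product: the HTML snippet, the class names and the function names are all made up for the example, and the XPath variant assumes the markup is well-formed, which real pages often are not.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a downloaded page.
PAGE = """
<ul>
  <li class="item"><span class="name">Widget</span><span class="price">$9.99</span></li>
  <li class="item"><span class="name">Gadget</span><span class="price">$24.50</span></li>
</ul>
"""


def extract_with_regex(html):
    """Pull (name, price) pairs out of the markup with a regular expression."""
    pattern = re.compile(
        r'<span class="name">(.*?)</span><span class="price">\$([\d.]+)</span>'
    )
    return [(name, float(price)) for name, price in pattern.findall(html)]


def extract_with_xpath(html):
    """Pull the same pairs using the XPath subset supported by ElementTree."""
    root = ET.fromstring(html.strip())
    rows = []
    for item in root.findall(".//li[@class='item']"):
        name = item.find("span[@class='name']").text
        price = item.find("span[@class='price']").text
        rows.append((name, float(price.lstrip("$"))))
    return rows


if __name__ == "__main__":
    for name, price in extract_with_xpath(PAGE):
        print(name, price)
```

The regex version breaks as soon as the markup changes even slightly, while the XPath version tolerates whitespace and attribute reordering but needs a parser that accepts the page. That trade-off is a large part of why the low-level route requires technical skills.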

What do you think the future holds for the Structured Web and web data?

There is no doubt that connections between data on the Internet will grow (remember, it all started with the good old hypertext), and the speed of this process depends on how commercially profitable it will be. However, I don’t think that the problem of data scraping will ever go away. Even if all websites eventually become structurally interconnected, there will always be a need to untangle this huge knot 🙂

