E. Amitay, D. Carmel, A. Darlow, R. Lempel, and A. Soffer, The connectivity sonar, Proceedings of the fourteenth ACM conference on Hypertext and hypermedia , HYPERTEXT '03, 2003.
DOI : 10.1145/900051.900060

L. Barbosa and J. Freire, Searching for hidden-Web databases, WebDB, 2005.

L. Barbosa and J. Freire, An adaptive crawler for locating hidden-web entry points, WWW, 2007.

B. E. Boser, I. Guyon, and V. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the fifth annual workshop on Computational learning theory , COLT '92, 1992.
DOI : 10.1145/130385.130401

R. Cai, J. Yang, W. Lai, Y. Wang, and L. Zhang, iRobot, Proceeding of the 17th international conference on World Wide Web , WWW '08, 2008.
DOI : 10.1145/1367497.1367558

S. Chakrabarti, M. Van-den, B. Berg, and . Dom, Focused crawling: a new approach to topic-specific Web resource discovery, Computer Networks, vol.31, issue.11-16, pp.3111-3127, 1999.
DOI : 10.1016/S1389-1286(99)00052-3

Y. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. Fischer, Path sharing and predicate evaluation for high-performance XML filtering, ACM Transactions on Database Systems, vol.28, issue.4, 2003.
DOI : 10.1145/958942.958947

T. Furche, G. Gottlob, G. Grasso, C. Schallhart, and A. J. Sellers, OXPath: A language for scalable, memory-efficient data extraction from web applications, p.4, 2011.

D. Gibson, K. Punera, and A. Tomkins, The volume and evolution of web page templates, Special interest tracks and posters of the 14th international conference on World Wide Web , WWW '05, 2005.
DOI : 10.1145/1062745.1062763

Y. Guo, K. Li, K. Zhang, and G. Zhang, Board Forum Crawling: A Web Crawling Method for Web Forum, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06), 2006.
DOI : 10.1109/WI.2006.52

M. Hadley, Web application description language. http://www.w3.org/Submission/wadl/. [13] International Business Times, 2011.

P. Kolari, T. Finin, and A. Joshi, Svms for the Blogosphere: Blog Identification and Splog Detection, AAAI, 2006.

C. Lindemann and L. Littig, Coarse-grained classification of web sites by their structural properties, Proceedings of the eighth ACM international workshop on Web information and data management , WIDM '06, 2006.
DOI : 10.1145/1183550.1183559

C. Lindemann and L. Littig, Classifying web sites, Proceedings of the 16th international conference on World Wide Web , WWW '07, 2007.
DOI : 10.1145/1242572.1242736

M. Liu and T. W. Ling, A rule-based query language for HTML, DASFAA, 2001.

E. Osuna, R. Freund, and F. Girosi, An improved training algorithm for support vector machines, Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop, 1997.
DOI : 10.1109/NNSP.1997.622408

A. Sahuguet and F. Azavant, Building light-weight wrappers for legacy Web data-sources using W4F, VLDB, 1999.

N. Sawa, A. Morishima, S. Sugimoto, and H. Kitagawa, Wraplet: Wrapping Your Web Contents with a Lightweight Language, 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, 2007.
DOI : 10.1109/SITIS.2007.135

W. Shen, A. Doan, J. F. Naughton, and R. Ramakrishnan, Declarative information extraction using Datalog with embedded extraction predicates, VLDB, 2007.

K. Sigurðsson, Incremental crawling with Heritrix, IWAW, 2005.

J. Su, D. Sun, I. Wu, and L. Chen, On design of browser-oriented data extraction system and plug-ins, JMST, 2010.

S. Zheng, R. Song, J. Wen, and D. Wu, Joint optimization of wrapper generation and template detection, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining , KDD '07, 2007.
DOI : 10.1145/1281192.1281287