Aplicación de algoritmos genéticos a la identificación de la estructura de enlaces en portales web

  1. Martínez Torres, María del Rocío
  2. Palacios Florencio, Beatriz
  3. Toral Marín, Sergio Luis
  4. Barrero García, Federico José
Journal:
Revista española de documentación científica

ISSN: 0210-0614 1988-4621

Year of publication: 2011

Volume: 34

Issue: 2

Pages: 232-252

Type: Article

DOI: 10.3989/REDC.2011.2.779

Abstract

This paper explores website link structure, whereby websites are considered as interconnected graphs and their features are analyzed as a social network. For each root domain, two different networks are extracted: the first being the domain network and the second, the page network. In each case, a series of indicators taken from social network analysis is evaluated in order to characterize the website structure. Factor analysis may provide an appropriate statistical methodology for extracting in graphic form the principal profile of the website in terms of its internal structure. However, the large number of indicators generated by such an exploratory search would lead to a prohibitive number of possibilities. Therefore, this work proposes the use of genetic algorithms. By using this guided search over a given space of possible solutions, genetic algorithms can provide a subset of indicators able to optimize a fitness function. The results categorize corporate websites in terms of their link structure and highlight the possibilities for using genetic algorithms as a tool for knowledge discovery.
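
The guided search described in the abstract can be illustrated with a minimal genetic algorithm that evolves bit masks over a set of indicators (1 = keep the indicator, 0 = drop it). This is only a sketch under stated assumptions, not the authors' implementation: this record does not give the paper's fitness function, operators, or parameters, so `WEIGHTS`, `toy_fitness`, `run_ga`, and every default value below are hypothetical placeholders.

```python
import random

# Hypothetical "usefulness" score for each of 12 indicators (illustration only).
WEIGHTS = [0.9, 0.1, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 0.95, 0.05, 0.55, 0.45]


def toy_fitness(mask):
    """Placeholder fitness: reward useful indicators, penalize subset size."""
    gain = sum(w for w, bit in zip(WEIGHTS, mask) if bit)
    return gain - 0.5 * sum(mask)


def run_ga(n_genes, fitness, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Search for a bit mask that maximizes `fitness` (requires n_genes >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Tournament selection: the fitter of two randomly drawn individuals.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        new_pop = [best[:]]  # elitism: the best mask found so far always survives
        while len(new_pop) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to every gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    selected = run_ga(len(WEIGHTS), toy_fitness, seed=1)
    print("selected indicators:", [i for i, bit in enumerate(selected) if bit])
```

In a setting like the one the abstract describes, the placeholder fitness would be replaced by a measure of how well the selected indicators support the factor analysis. The search loop itself would stay the same.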

Bibliographic References

  • Almind, T. C., y Ingwersen, P. (1997). Informetric analyses on the World Wide Web: Methodological approaches to Webometrics, Journal of Documentation, vol. 53 (4), pp. 404-426. doi:10.1108/EUM0000000007205
  • Almpanidis, G.; Kotropoulos, C., y Pitas, I. (2007). Combining text and link analysis for focused crawling. An application for vertical search engines, Information Systems, vol. 32, pp. 886-908. doi:10.1016/j.is.2006.09.004
  • Baeza-Yates, R., y Castillo, C. (2007). Characterization of national web domains, ACM Transactions on Internet Technology, vol. 7 (2), pp. 1-32. doi:10.1145/1239971.1239973
  • Berlt, K.; Silva de Moura, E.; Carvalho, A.; Cristo, M.; Ziviani, N., y Couto, T. (2010). Modeling the web as a hypergraph to compute page reputation, Information Systems, vol. 35 (5), pp. 530-543. doi:10.1016/j.is.2009.02.005
  • Björneborn, L., y Ingwersen, P. (2004). Toward a basic framework for webometrics, Journal of the American Society for Information Science and Technology, vol. 55 (14), pp. 1216-1227. doi:10.1002/asi.20077
  • Faba-Pérez, C.; Zapico-Alonso, F.; Guerrero-Bote, V. P., y de Moya-Anegón, F. (2005). Comparative analysis of webometric measurements in thematic environments, Journal of the American Society for Information Science and Technology, vol. 56 (8), pp. 779-785. doi:10.1002/asi.20161
  • Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Publishing Company, Inc.
  • Goldfarb, A. (2006). The (teaching) role of universities in the diffusion of the Internet, International Journal of Industrial Organization, vol. 24 (2), pp. 203-225. doi:10.1016/j.ijindorg.2005.11.004
  • Holland, J. (1975). Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI.
  • Huizingh, E. K. (2000). The content and design of web sites: an empirical study, Information & Management, vol. 37 (3), pp. 123-134. doi:10.1016/S0378-7206(99)00044-0
  • Iacobucci, D. (1994). Graphs and matrices. In: Wasserman, S. y Faust, K. (eds.), Social network analysis: methods and applications. New York, NY: Cambridge University Press, pp. 92-166.
  • Martínez Torres, M. R., y Toral, S. L. (2010a). International Comparison of R&D Investment By European, US and Japanese Companies, International Journal of Technology Management, vol. 49 (1-2-3), pp. 107-122.
  • Martínez-Torres, M. R., y Toral, S. L. (2010b). Strategic group identification using evolutionary computation, Expert Systems with Applications, vol. 37 (7), pp. 4948-4954.
  • Martínez-Torres, M. R.; Toral, S. L.; Barrero, F., y Cortés, F. (2010). The role of Internet in the development of Future Software Projects, Internet Research, vol. 20 (1), pp. 72-86. doi:10.1108/10662241011020842
  • Miranda González, F. J., y Bañegil, T. M. (2004). Quantitative evaluation of commercial web sites: an empirical study of Spanish firms, International Journal of Information Management, vol. 24, pp. 313-328. doi:10.1016/j.ijinfomgt.2004.04.009
  • Nooy, W.; Mrvar, A., y Batagelj, V. (2005). Exploratory Network Analysis with Pajek, Cambridge University Press, New York.
  • Ortega, J. L., y Aguillo, I. F. (2008). Visualization of the Nordic academic web: Link analysis using social network tools, Information Processing and Management, vol. 44, pp. 1624-1633.
  • Ortega, J. L., y Aguillo, I. F. (2009). Mapping world-class universities on the web, Information Processing and Management, vol. 45, pp. 272-279. doi:10.1016/j.ipm.2008.10.001
  • Park, H. W., y Thelwall, M. (2003). Hyperlink analysis: Between networks and indicators, Journal of Computer-Mediated Communication, vol. 8 (4). (http://www.ascusc.org/jcmc/vol8/issue4/park.html) [accessed: May 2010].
  • Pinto-Molina, M.; Alonso-Berrocal, J. L.; Cordón-García, J. A.; Fernández-Marcial, V.; García-Figuerola, C.; García-Marco, J.; Gómez-Camarero, C.; Zazo, Á. F., y Doucet, A. V. (2004). Análisis cualitativo de la visibilidad de la investigación de las universidades españolas a través de sus páginas web. Revista Española de Documentación Científica, vol. 27 (3), pp. 345-370.
  • Rencher, A. C. (2002). Methods of Multivariate Analysis. 2nd ed. Wiley Series in Probability and Statistics, John Wiley & Sons. doi:10.1002/0471271357
  • Robbins, S. S., y Stylianou, A. C. (2003). Global corporate web sites: an empirical investigation of content and design, Information & Management, vol. 40 (3), pp. 205-212. doi:10.1016/S0378-7206(02)00002-2
  • Tan, G. W. y Wei, K. K. (2006). An empirical study of Web browsing behaviour: Towards an effective Website design, Electronic Commerce Research and Applications, vol. 5, pp. 261-271. doi:10.1016/j.elerap.2006.04.007
  • Thelwall, M. (2004). Link Analysis: An Information Science Approach, Amsterdam, Elsevier.
  • Thelwall, M. (2008). Bibliometrics to webometrics, Journal of Information Science, vol. 34 (4), pp. 605-621. doi:10.1177/0165551507087238
  • Toral, S. L.; Martínez Torres, M. R., y Barrero, F. (2010). Analysis of Virtual Communities supporting OSS Projects using Social Network Analysis, Information and Software Technology, vol. 52 (3), pp. 296-303. doi:10.1016/j.infsof.2009.10.007
  • Toral, S. L.; Martínez-Torres, M. R., y Barrero, F. (2009a). Virtual Communities as a resource for the development of OSS projects: the case of Linux ports to embedded processors, Behaviour & Information Technology, vol. 28 (5), pp. 405-419. doi:10.1080/01449290903121394
  • Toral, S. L.; Martínez-Torres, M. R.; Barrero, F., y Cortés, F. (2009b). An empirical study of the driving forces behind online communities, Internet Research, vol. 19 (4), pp. 378-392. doi:10.1108/10662240910981353
  • Toral, S. L.; Martínez-Torres, M. R., y Barrero, F. (2009c). Modelling Mailing List Behaviour in Open Source Projects: the Case of ARM Embedded Linux, Journal of Universal Computer Science, vol. 15 (3), pp. 648-664.
  • Yang, B., y Qin, J. (2008). Data collection system for link analysis, Third International Conference on Digital Information Management, pp. 247-252. doi:10.1109/ICDIM.2008.4746781