Crawling Microblog by Common-Designed Software

Gang Lu, Shumei Liu, Kevin Lü

Abstract


A large amount of microblog data must be crawled for research, business analysis, and other purposes. However, microblog Web pages make heavy use of dynamic Web techniques, which makes it hard for traditional Web crawlers to extract data by parsing page contents. Fortunately, microblogs provide APIs: well-structured data are returned to users simply by accessing those APIs in the form of URLs. Based on that mechanism, researchers have obtained microblog data for their studies. Nevertheless, no common software for crawling microblogs has been published so far, so everyone has to design a microblog crawler from scratch. This paper proposes MBCrawler, a common software architecture for microblog crawlers based on microblog APIs. Its structure, architecture, and key classes are introduced, showing that MBCrawler is modular and scalable. An implementation of a real microblog crawler for Sina Weibo shows that MBCrawler can fit the specific features of different microblogs.
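The API-based mechanism described above can be illustrated with a minimal Python sketch. It is not MBCrawler's actual design: the class `MicroblogAPI`, the endpoint `friends/ids`, the parameter `uid`, the host `api.example.com`, and the JSON field `ids` are all hypothetical names chosen for illustration, and the network call is replaced by a stand-in function so that the example runs offline.

```python
import json
from collections import deque
from urllib.parse import urlencode


class MicroblogAPI:
    """Hypothetical site adapter: builds API URLs and parses JSON replies."""

    def __init__(self, base_url, fetch):
        self.base_url = base_url
        self.fetch = fetch  # callable: URL string -> response body string

    def build_url(self, endpoint, **params):
        # An API call is just a URL with query parameters.
        query = urlencode(sorted(params.items()))
        return f"{self.base_url}/{endpoint}.json?{query}"

    def friends_of(self, user_id):
        # Assumed response shape: {"ids": [...]}.
        body = self.fetch(self.build_url("friends/ids", uid=user_id))
        return json.loads(body)["ids"]


def crawl_users(api, seed, limit):
    """Breadth-first crawl of the follow graph, visiting at most `limit` users."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue and len(order) < limit:
        uid = queue.popleft()
        order.append(uid)
        for friend in api.friends_of(uid):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return order


def fake_fetch(url):
    """Stand-in for a real HTTP request; serves a tiny in-memory follow graph."""
    graph = {"1": [2, 3], "2": [3, 4], "3": [], "4": [1]}
    uid = url.rsplit("uid=", 1)[1]
    return json.dumps({"ids": graph[uid]})


api = MicroblogAPI("https://api.example.com", fake_fetch)
visited = crawl_users(api, seed=1, limit=10)
```

Keeping the site-specific parts (URL construction and response parsing) inside one adapter class is one way a crawler could be fitted to a different microblog without changing the crawl loop; a real adapter would also need authentication and rate-limit handling, which this sketch omits.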


DOI: http://dx.doi.org/10.11591/telkomnika.v11i7.2805


Keywords


Social Computing; microblog; crawler; Twitter


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License