goolap.info – The Web as Data Warehouse
Today, the Web is one of the world’s largest databases. However, because of its textual nature, aggregating and analyzing Web data the way one would in a data warehouse is a hard problem. For instance, users may start from huge amounts of textual data, drill down into tiny sets of specific factual data, manipulate or share atomic facts, and repeat this process iteratively. In the goolap.info project we investigate fundamental problems in this process: What are common analysis operations of “end users” on natural language Web text? What is the typical iterative process for generating, verifying and sharing factual information from plain Web text? Can we integrate both the “cloud”, a cluster of massively parallel machines, and the “crowd”, the end users of goolap.info, to solve hard problems such as training tens of thousands of fact extractors, verifying billions of atomic facts, or generating analytical reports from the Web?
The current goolap.info prototype already contains factual information from the Web for several million objects. Its keyword-based query interface handles both simple query intentions, such as “display everything about Airbus”, and complex aggregation intentions, such as “List and compare mergers, acquisitions, competitors and products of airplane technology vendors”. Start your own analysis at goolap.info.