SIKS course: Web-based systems

Eyal Oren — eyal@cs.vu.nl
http://eyaloren.org/slides/2008/03/siks/

The Web as knowledge base

"For the long-term goal, exploiting the Web's potential for being the world's largest knowledge base, XML and Semantic Web are key assets, but by themselves not sufficient. We need to cope with diversity, incompleteness, and uncertainty: we have an absolute need for ranked retrieval, and statistics is key: combine techniques from DB, IR, AI, and ML."
Gerhard Weikum, ER'04, SIGMOD'07

Course overview

  • Topic
    • Web-based systems from a Semantic Web perspective
  • Organisers
    • Geert-Jan Houben (TU/e, VUB)
    • Stefan Schlobach (VU)
  • Focus areas
    • Data management (storage, querying, architectures)
      • Boncz, Rode, Siebes
    • Knowledge representation (modelling, reasoning, alignment)
      • Wester, Wang, van Atteveldt
    • Web-based systems (standards, engineering, scalability)
      • Jameson, van Ossenbruggen

Schedule

10.00 - 10.45 Introduction and overview Eyal Oren
10.45 - 11.15 Tea break
11.15 - 12.45 Representation and retrieval of Web data using semantic technology Jeroen Wester
12.45 - 13.45 Lunch
13.45 - 16.00 Representation and retrieval of Web data using DB and IR technology Peter Boncz, Henning Rode
16.00 - 16.30 Tea break
16.30 - 18.00 P2P architectures for Web-based systems Ronny Siebes
Dinner

09.30 - 12.00 User-centered Design for the Semantic Web Anthony Jameson
12.00 - 13.00 Lunch
13.00 - 13.45 STITCH, technology for Semantic Interoperability in CH Shenghui Wang
14.00 - 15.30 E-culture, the MultimediaN project Jacco van Ossenbruggen
15.30 - 16.00 Tea break
16.00 - 17.30 Semantic technology for content analysis of newspaper articles Wouter van Atteveldt

Semantic Web basics

The Web of Things

  • Usenet email: allegro!batcave!cornell!rpics!weltyc
  • Internet: "it's not the wires, but the computers"
  • Web: "it's not the computers, but the documents"
  • Semantic Web: "it's not the documents, but the things"
  • We're interested in the knowledge on the Web, not in the documents

The Web for Humans

A city, with population, flag, geo-location

The Web for Machines

Characters and tables, but what does it mean?

Web data interoperability

  • Query the Web as a database
  • Different formats, structures, vocabularies, concepts, meaning

  • Data should be structured (not ASCII)
  • Structure should be data-oriented (not HTML)
  • Meaning of data should be clear (not XML)
  • Data should have standard APIs (not Flickr)
  • Reusable mappings between data are needed (not XSLT)
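
Sketch of the last point: a mapping kept as plain data can be reused across sources, whereas a transformation script is tied to one source. The record fields, the `city:` vocabulary, the sample values, and the `apply_mapping` helper are all hypothetical and only serve the illustration.

```python
# Mappings from source-specific field names to a shared vocabulary.
# All names here are made up for the example.
MAPPINGS = {
    "source_a": {"name": "city:name", "inhabitants": "city:population"},
    "source_b": {"label": "city:name", "population": "city:population"},
}


def apply_mapping(record, mapping):
    """Rename the fields of `record` via `mapping`; unmapped fields are dropped."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Two records describing the same (fictional) city in different vocabularies.
record_a = {"name": "Example City", "inhabitants": 1000}
record_b = {"label": "Example City", "population": 1000, "flag": "flag.png"}

unified_a = apply_mapping(record_a, MAPPINGS["source_a"])
unified_b = apply_mapping(record_b, MAPPINGS["source_b"])

# After mapping, both records use the same keys and can be compared or merged.
print(unified_a == unified_b)
```

Because the mapping is data, it can itself be published, shared, and applied by any consumer of the two sources.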

The Semantic Web (1)?

"There is lots of data we all use every day, and it's not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?
No. Why not? Because we don't have a web of data. Because data is controlled by applications and each application keeps it to itself."
Tim Berners-Lee,

The Web of Data

The Semantic Web (2)?

  • AI view
    • enrich documents with metadata annotations
    • reasoning, ontologies, OWL, ML, NLP
  • DB view
    • online semi-structured data
    • query answering, indexing, storage
  • Web view
    • interlinked semi-structured reusable data & services
    • RDF, URI, SPARQL, HTTP

The Semantic Web (3)?

  • RDF: basic data format (triples form hypergraph)
    • john likes mary .
    • isbn:10993 dc:title "The Pelican Brief" .
  • RDFS: simple schema language (subclass, subproperty)
    • dc:title rdfs:subPropertyOf rdfs:label .
    • Jeep isa Car .
  • OWL: rich schema language (constraints, relations)
    • likes isa owl:SymmetricProperty .
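
A minimal sketch of what the three layers add, using plain Python tuples instead of an RDF library. The triples are the slide's examples, with the informal `isa` read as `rdf:type` (an assumption). The `infer` function is a hypothetical helper that applies just two rules, subproperty and symmetry, by naive forward chaining. It is not a complete RDFS or OWL reasoner.

```python
# Triples as (subject, predicate, object) tuples, taken from the examples above.
triples = {
    ("john", "likes", "mary"),
    ("isbn:10993", "dc:title", "The Pelican Brief"),
    ("dc:title", "rdfs:subPropertyOf", "rdfs:label"),
    ("likes", "rdf:type", "owl:SymmetricProperty"),  # "isa" read as rdf:type
}


def infer(triples):
    """Apply two entailment rules until no new triples appear."""
    closure = set(triples)
    while True:
        sub_props = {(s, o) for s, p, o in closure if p == "rdfs:subPropertyOf"}
        symmetric = {
            s for s, p, o in closure
            if p == "rdf:type" and o == "owl:SymmetricProperty"
        }
        new = set()
        for s, p, o in closure:
            # RDFS rule: (s p o) and (p subPropertyOf q) entail (s q o).
            for sub, sup in sub_props:
                if p == sub:
                    new.add((s, sup, o))
            # OWL rule: (s p o) with p symmetric entails (o p s).
            if p in symmetric:
                new.add((o, p, s))
        if new <= closure:
            return closure
        closure |= new


# Triples that were not stated explicitly but follow from the schema.
inferred = infer(triples) - triples
for triple in sorted(inferred):
    print(triple)
```

The schema statements are themselves triples, so data and schema live in the same graph and can be queried together.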
            

Current state

  • Stable formats and standards: RDF, RDFS, OWL, SPARQL
  • Technology: Adobe, Oracle, IBM, HP, Software AG, Altova
  • Deployment: Novartis, Pfizer, Telefonica, Vodafone, Elsevier
  • Data: Gene ontology, Geonames, Uniprot, Wordnet, DBLP
  • Visible applications: Twine, Vodafone Live, Freebase, Garlik
Ivan Herman, W3C

Research themes

  • Storage (DB, IR, Semplore, Yars2, column storage)
    • Zhang ISWC'07, Harth ISWC'07, Abadi CIDR'07, Stonebraker VLDB'05
  • Machine learning (relation extraction, Wikipedia, shallow NLP)
    • Wu CIKM'07, Wu WWW'08, Suchanek WWW'07, Banko AAAI'07, Auer ESWC'07
  • Social semantics (knowledge acquisition, social networks, semantic wiki)
    • Völkel WWW'06, Ankolekar JWS'08, Ramakrishnan CIDR'07, Mika JWS'07
  • Data integration (linked data, entity search, alignment, gossiping)
    • Bizer ESWC'07, Auer ISWC'07, Cheng VLDB'07, Mihalcea CIKM'07, Aberer WWW'03
  • Reasoning (approximate, minimal, fragments, scalability)
    • Schlobach IJCAI'07, Muñoz ESWC'07, Krötzsch ISWC'07, Motik IJCAI'07
  • User applications (usability, interfaces, browsing, faceted exploration)
    • Huynh WWW'07, White CHI'07, Huynh ISWC'07
