Logische Grundlagen der Datenbanktheorie ws13

Dozent Tadeusz Litak
Termine und Ort WS2013/14 Fri 10:30-14:00, Seminar Room of Informatik 8 (11 floor) (note the change!)
Umfang 4 SWS – 7,5 ECTS
Angaben Vorlesung mit Übung, geeignet als Schlüsselqualifikation, für Gasthörer zugelassen, Unterrichtssprache Deutsch oder Englisch
Studienfächer / Studienrichtungen WPF INF-MA ab 7 (ECTS-Credits: 7,5), WPF INF-BA-V-THI ab 4 (ECTS-Credits: 7,5), WPF INF-BA-V-THI ab 4 (ECTS-Credits: 7,5), WF M-BA ab 4 (ECTS-Credits: 7,5), WF M-BA ab 7 (ECTS-Credits: 7,5)

Slogans

Database queries are formulas, formulas are database queries.
Database instances are finite models, finite models are database instances.

News, lectures, homework

Highlights

  1. We will begin with a brief introduction explaining our motivating slogans (above). We will see the good old notions of model and satisfaction of formula from GLoLoP in an entirely new light. In particular, we will realize that formalisms which laid theoretical foundations for SQL and relational databases—the tuple calculus (in its named and unnamed version) and relational algebra—are variants of first-order logic on finite models. We will also quickly look at various extensions which wormed its way into SQL, such as counting, and briefly wonder what such things would correspond to. We’ll also see see how XML enters the picture: various formalisms for navigating/querying XML documents (think fragments of XPath, for example) are just variants of logic of finite trees. With XML and graph databases (think RDF, ontologies etc.), we will also see modal logic entering the picture. We will need thus to characterize the precise relationship between modal and predicate logic: on arbitrary models, on finite models and on finite trees. This will be our first encounter with a few celebrated results, such as the Van Benthem-Rosen Theorem.
  2. Finite models are a foreign country. They do things differently there. Seeing now model theory and modal/predicate logic all over the place, we will need a reminder/overview of how things work on arbitrary (i.e., possibly infinite) models. This again should be roughly familiar from introductory logic classes like GLoLoP, but I don’t want to assume too much knowledge. Thus, we will overview/recall, e.g., how the set of formulas valid on arbitrary models is recursively enumerable or what consequences compactness has. We will also see various standard model-theoretic characterizations of universal, existential, positive or existential positive formulas (the last we can now see as unions of conjunctive queries). Then we will realize with horror that when attention is restricted to finite models, most of these results and characterizations break down. Not to lose too much spirit: there are a few telling exceptions! In particular, the Van Benthem-Rosen Theorem does survive on finite models. Unions of conjunctive queries also fare surprisingly well. These are powerful and amazing logical results, established relatively recently.
  3. Furthermore, some of questions and results which arise on finite models could not even be thought of in the context of old-fashioned model theory. This includes the amazing 0-1 laws. They are a mathematical beauty in their own right. But we will also see with amazement how such theorems provide instant arguments why some queries just cannot be expressed in first-order logic. We will also learn about more standard tools used to show inexpressibility of queries/properties in a given formalisms, such as various variants of Ehrenfeucht-Fraisse games.
  4. Finally, if certain natural queries/properties cannot be expressed in basic predicate (or modal) logic, the obvious question to ask is: what if we extend the language? The first thing to think of seems recursion, which arises anyway in database formalisms like Datalog. How do extensions of predicate/modal logic with recursion look like? Which of results and techniques discussed above still apply?
  5. As you see, there are jungles, mountains and vast space out there. While there are basics we just have to cover, pretty soon we’ll have to make choices between various paths worth of exploring and individual preferences/needs of participants can be taken into account here. There is no way we can possibly cover everything in a 4 SWS semester anyway!

Ever wondered what logicians are doing at database conferences?
Ever wondered why core fragments of SQL or XPath look the way they do?
Ever wondered what “finite model theory” is all about?
Ever wondered why some queries are so easy to handle and some seem impossible even to write?

Whether you get your kicks from SQL, XML or just plain good math, come and see databases the way you never saw them before!

Contents

We will discuss the theoretical foundations of databases, starting with the underlying data model. We will focus mostly on various flavours of the relational model (calculus vs. algebra, named vs. unnamed variant etc.) and its real-life SQL incarnation, but other models, in particular semistructured databases and their real-life XML incarnation will also get their share of attention. We will investigate the connection between databases and finite model theory and overview some most interesting results in the field. Logical issues in the design of database query languages, in particular expressive completeness, but also questions of decidability and complexity will be of particular importance. Same applies to the use of logical tools for database-theory purposes, from Ehrenfeucht-Fraisse games to Herbrand structures.

Lernziele und Kompetenzen

Wissen The students will learn basic notions and results of finite model theory, with particular attention to notions and theorems relevant for database theory and essential characteristics of expressivity, decidability and complexity. As prerequisites, we will also recall and overview basics of SQL, relational algebra or XML query/navigation languages like XQuery and XPath
Verstehen The students will be able to summarize and prove major results of finite model theory and related areas relevant from the database point of view. Examples include equivalences between relational calculus and relational algebra, named and unnamed perspective, expressive completeness of relational calculus and similar results for XPath and XML query languages. The students will be able to explain the deep difference between finite and infinite model theory, by using 0-1 laws or by showing the failure of most preservation theorems
Anwenden The students will be able to use basic tools of finite model theory (and related areas relevant from a database point of view), such as Ehrenfeucht-Fraisse games or Herbrand structures. The students will use preservation theorems of Rosen or Rossman to characterize expressive power of logics over finite structures
Analysieren The students will be able to determine, e.g., whether a given SQL-like or XPath-like query is expressible over all finite models (or restricted subclasses of structures) in a chosen language
Evaluieren (Beurteilen) The best students, having gained a deep understanding of finite model theory, will be able to examine and evaluate choices involved in the design of basic database query languages, with particular attention to the criterion of expressive completeness and contrasting expressive power of a given language with its computational complexity
Erschaffen Making the right design choices is an important skill whenever the need arises for a new domain-specific language. You can think of database query languages just as an excellent case study given their omnipresence in everyday life.

Literature

  • Serge Abiteboul, Richard B. Hull, Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
  • Serge Abiteboul, Peter Buneman, Dan Suciu. Data on the Web : From Relations to Semistructured Data and XML. Morgan Kaufmann 1999
  • Heinz-Dieter Ebbinghaus, Jörg Flum, Finite Model Theory, Springer 1995
  • Paris C. Kanellakis. Elements of relational database theory, in Handbook of theoretical computer science (vol. B) Pages 1073 – 1156, MIT Press Cambridge, MA, USA 1990
  • Phokion Kolaitis. The Expressive Power of Logics on Finite Models, in Finite Model Theory and its Applications, EATCS Series: Texts in Theoretical Computer Science, Springer 2007
  • Leonid Libkin. Elements of Finite Model Theory, Springer 2012

————Supplementary reading on databases——————

  • Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom. Database Systems: The Complete Book, Prentice-Hall 2008.
  • Jeffrey D. Ullman, Jennifer Widom. First Course in Database Systems, Prentice Hall 2008
  • Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems, McGraw-Hill 2007

———–Supplementary reading on logic and classical (unrestricted) model theory———

  • Herbert B. Enderton. A Mathematical Introduction to Logic (1 ed. 1972). Academic Press Second edition, 2001. ISBN 978-0-12-238452-3
  • Wilfrid Hodges. Model theory, Cambridge University Press, Cambridge 1993, 772pp.