Overview
This is an introductory graduate-level course on Information Retrieval (IR). In
this course we will study the main components, models, and architecture
of a modern Information Retrieval system. We will describe information
organization, text operations, and the metrics used to compare the
performance of different systems. We will also study the main models
that have been proposed to represent documents and queries and to
measure the similarity between them. Finally, advanced techniques
recently employed in IR will be presented.
Exam and Questions:
The 30-minute exam will consist of 3 parts:
1) A 10-minute presentation of your assigned topic. A projector and a laptop will be provided. Bring your presentation on a memory stick.
2) Questions related to your topic.
3) Questions selected at random from the 11 exercises posted in the schedule table below (i.e. Exercise 1 through Exercise 11).
Time/Room/Schedule
Each class starts at 9 AM. Our assigned classroom is B202. Please check the
schedule regularly for new or updated information. The class notes posted
below will be finalized the day before each class. The tentative
schedule for the course is:
| Week | Day | Class # | Subject | Exercises/Assignments | Class Notes*/Extra Reading Assignments |
|---|---|---|---|---|---|
| Week 6 | Tue | 1 | Introduction I: Historical perspective, information structure, information vs. data retrieval systems, digital libraries and IR systems organization. The WWW. Overview of the course. | Read Chapter 1 from Textbook | Notes |
| Week 6 | Thur | 2 | Introduction II: Text operations: tokenization, stemming, stop words, lemmatization, compression. Words, terms and concepts, Thesaurus. Markup languages and the semantic web. | Read Chapter 7 and Appendix from Textbook. Exercise 2, to be solved in class | Notes. Read description of Porter's stemmer |
| Week 7 | Tue | 3 | Query types. Modeling in IR: Boolean and vector space models. Similarity measures. | Read Sections 2.1-2.5.3 from Textbook | Notes |
| Week 7 | Thur | | CANCELLED | Look at Exercise 3 and try to solve it | |
| Week 8 | Tue | 4 | Review of probability concepts. Probabilistic model in IR. | Read Sections 2.5.4 and 2.6.2 from Textbook | Notes. Read this paper (at least up to Section 2.6) |
| Week 8 | Thur | 5 | Review of concepts in fuzzy logic. Fuzzy logic-based model in IR. Extended Boolean model. Bayesian Networks in IR: the inference network model. | Read Sections 2.6, 2.6.1 and 2.8.1-2.8.5 from Textbook | Notes. Read this paper |
| Week 9 | Tue | 6 | Retrieval evaluation: Recall and Precision, alternate measures. Reference collections. Query Languages. Query Operations: pseudo-feedback local and global analysis. | Read Chapters 3, 4, and 5 from Textbook | Notes |
| Week 9 | Thur | | CANCELLED | Finish reading Chapters 3, 4, and 5. Look at Exercise 6 and try to solve it | |
| Week 10 | Tue | 7 | Ranking algorithms: HITS, PageRank. Indexing, searching and storage mechanisms, 1st part: flat, bitmap and signature files, PAT trees. | Read Sections 13.4.4 and 8.3 from Textbook. Exercise 7 | Notes. Read this paper and this one |
| Week 10 | Thur | 8 | Indexing, searching and storage mechanisms, 2nd part: Inverted files. Dictionaries: Tries and B-trees. Anatomy of search engines and crawlers. Libraries and Toolkits for IR: Lucene. | Read Sections 8.1, 8.2, and 13.1-13.4.3 from Textbook. Exercise 8 | Notes. Read this article and this one |
| Week 11 | Tue 14th March | 9 | Efficient IR part I: Review of parallel processing. Flynn's classification. Speedup, efficiency, Amdahl's law and Amdahl's effect. Parallel and distributed mechanisms in search engines: data parallelism for logical and physical documents, data parallelism for terms. | Read Sections 9.1-9.2.2 and 9.3 from Textbook. Exercise 9 | Notes |
| Week 11 | Thur 16th March | 10 | Efficient IR part II: A parallel crawler. Static and dynamic partitioning of search graphs. Searching models. Multimedia IR: data modeling, queries and features. Searching and indexing multimedia objects using features: R-trees, GEMINI. | Read Chapter 11, Sections 11.1-11.2.1, 11.3-11.3.1, and Chapter 12, Sections 12.1-12.2 from Textbook. Exercise 10 | Notes. Read this paper |
| Week 12 | Tue 21st March | | CANCELLED | | |
| Week 12 | Thur 23rd March | 11 | Introduction to Artificial Intelligence and its application in IR systems. QA systems. Course overview. Final Exercise session | | Notes |
| Week 13 | Tue 28th March | 12 | Jens Rúni Poulsen. Topic: Multiagent systems in IR. Discussion | | Notes |
| Week 13 | Thur 30th March | 13 | CANCELLED | | |
| Week 14 | Tue 4th April | 14 | Ole Buus. Topic: Advanced techniques in IR: genetic algorithms, simulated annealing, etc. Daniel Jacob Poulsen. Topic: Text classification (categorization). Discussion | | |
| Week 14 | Thur 6th April | | | | |
| Week 15 | Tue 11th April | | | | |
| Week 15 | Thur 13th April | | | | |
| Week 16 | Tue 18th April | | | | |
| Week 16 | Thur 20th April | 15 | Kim Beck. Topic: Natural Language Processing in IR. Jia Ma. Topic: Machine Learning methods in IR: unsupervised/semi-supervised learning. Bing Pen. Topic: Machine Learning methods in IR: supervised learning. Discussion | Presentation schedule and recommendations | Notes on NLP in IR systems by Kim Beck. Notes on unsupervised ML in IR by Jia Ma. Notes on supervised ML in IR by Bing Pen |
*Disclaimer: We will use slides and notes from the textbook, my own
notes, and other sources on the web.
References
Textbook
Modern Information Retrieval by Ricardo Baeza-Yates, Berthier Ribeiro-Neto
Publisher: Addison Wesley; 1st edition (May 15, 1999)
ISBN: 020139829X
Reference books
Mining the Web: Analysis of Hypertext and Semi Structured Data by Soumen Chakrabarti
Publisher: Morgan Kaufmann; 1st edition (August 15, 2002)
ISBN: 1558607544
Resources
Porter's stemmer algorithm
How Google works?