The amount of data produced by observational and theoretical work in astronomy and astrophysics today is very large, so it is important to be able to organise and access these data efficiently.
This course aims to give you a solid basic understanding of databases in astronomy and experience in accessing them and applying them to astronomical research, and to introduce you to the statistical tools of astronomical computing.
Throughout the course a number of practical exercises will be given. A key aim is to bring students to a level where they can apply the tools to their own research problems. By the end of the course students should be able to plan the optimal data processing for their research, select the necessary tools, and use the Virtual Observatory for data mining and for publishing their own results.
After taking this course you should be able to search and access astronomical databases efficiently. You should also know the basics of accessing databases both through online interfaces and programmatically. You will also be able to apply up-to-date statistical techniques to your data and use them to mine large datasets for scientifically interesting information.
The course will be given by Andrey Belikov (Groningen), Jarle Brinchmann (Leiden) and Edwin Valentijn (Groningen).
The lectures and practical classes will take place in room 112 in the astronomy department, Buys Ballot Laboratory, Utrecht University. The address is Princetonplein 5, 3584CC, Utrecht.
Each course day runs from 11:00 to 15:00, with a short break for lunch.
The course will be graded on the basis of a written project to be handed in towards the end of the semester. More details will be given in the lectures and on this website when available.
Each day will consist of a 2-hour lecture followed, after lunch, by a 1.5-hour practical session related to that day's lecture. During the practical session ("werkcollege"), we will get hands-on experience with the topics covered in the lectures. These tasks will not be graded. They are nevertheless an integral and important part of the course and will be assumed as background for your final project work.
The overall time allocated to the IAC is 200 hours, distributed as follows:
This does not add up to 200 hours because we count each lecture day as 8 hours for the students, to take into account the time spent travelling.
There is no single textbook for this course. Instead, a number of books are useful as references for various aspects of it. Individual lectures may also have suggested literature, listed on the page for each lecture.
The following books are useful for the course topics in general:
This book has a close relationship with the Weka data mining software.
Stanislaw Kozielski & Robert Wrembel (Eds.), "New Trends in Data Warehousing and Data Analysis", Annals of Information Systems, Vol. 3, Springer 2009, ISBN-13: 978-0-387-87430-2.
This book is more advanced.
Michael J. A. Berry & Gordon S. Linoff, "Data Mining Techniques", Wiley 2004, ISBN-13: 978-0-471-47064-9.
This book is clearly oriented towards the business world and so offers a different view of the field.
Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer 2006, ISBN-13: 978-0-387-31073-2.
This book focuses on the methods used in data mining rather than on particular applications.
The lecturer mainly responsible for each lecture is indicated in square brackets. The detailed plan of the individual lectures is still somewhat preliminary, but the overall content of the course is set. Click a lecture title to access the page for that lecture. Each page will also contain links and literature suggestions, and will include the slides once the lecture has been given.
This lecture will introduce present-day data handling in astronomy. We will cover both the data-gathering process and data storage, and review the history of data centers in astronomy. This will provide a context for the course and a first introduction to the use of these facilities.
To use a database efficiently, or to build your own, you need some knowledge of database management systems and of how to interact with them. This lecture will introduce SQL, the main query language for databases. It will also give an overview of how databases operate and how to interact with them from programs.
Alongside SQL we will introduce some other programming standards (XML, UML) and languages for data visualisation and processing (IDL, Python).
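As a minimal sketch of what "interacting with a database through a program" can look like, the example below uses Python's built-in `sqlite3` module and an in-memory SQLite database as a stand-in for a real archive. The table `objects`, its columns and the rows are all hypothetical, made up purely for illustration. The query selects objects inside a right ascension/declination box that are brighter than a magnitude limit. It passes the search parameters as `?` placeholders rather than pasting them into the SQL string, so the driver handles quoting.

```python
import sqlite3

# In-memory SQLite database: a stand-in for a real astronomical archive
# (assumption: the SQL shown here is generic and not tied to any archive).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical toy catalogue table, one row per object.
cur.execute(
    """CREATE TABLE objects (
           obj_id INTEGER PRIMARY KEY,
           ra     REAL,   -- right ascension [deg]
           dec    REAL,   -- declination [deg]
           rmag   REAL    -- r-band magnitude
       )"""
)

# Made-up illustrative rows, not real measurements.
rows = [
    (1, 150.10, 2.20, 17.5),
    (2, 150.20, 2.25, 19.8),
    (3, 210.50, -1.00, 16.2),
    (4, 150.15, 2.30, 21.3),
]
cur.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)
conn.commit()


def bright_in_box(cur, ra_min, ra_max, dec_min, dec_max, mag_limit):
    """Return (obj_id, rmag) for objects in an RA/Dec box brighter than mag_limit."""
    # Placeholders (?) let the database driver handle quoting of values.
    cur.execute(
        """SELECT obj_id, rmag FROM objects
           WHERE ra BETWEEN ? AND ?
             AND dec BETWEEN ? AND ?
             AND rmag < ?
           ORDER BY rmag""",
        (ra_min, ra_max, dec_min, dec_max, mag_limit),
    )
    return cur.fetchall()


result = bright_in_box(cur, 150.0, 150.5, 2.0, 2.5, 20.0)
print(result)  # [(1, 17.5), (2, 19.8)]
```

Object 4 lies inside the box but is fainter than the limit, so the query leaves it out. The same pattern (connect, execute a parameterised query, fetch rows) carries over to other database drivers, though the connection details differ.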
The second part of the lecture will be dedicated to efficient data visualisation.
Databases in astronomy exist not only to let us do science more efficiently, but also to let us apply novel tools and techniques to explore data. The lecture will first give some examples of astronomical databases and show how to access the data.
We will take an in-depth look at the Sloan Digital Sky Survey as an example of the practical use of SQL applied to observational data. This will be complemented by an in-depth look at the Millennium database, a theoretical dataset made available on the web through a database and accessible via SQL queries.
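Survey databases typically split information across several tables, for example photometry and spectroscopy, which are then combined with SQL joins. The sketch below illustrates this kind of query with two hypothetical toy tables, `photo` and `spec`, in an in-memory SQLite database. These tables are not the actual SDSS schema, whose table and column names differ and should be checked in the survey's own schema documentation. The query joins the tables on a shared object ID and computes a g - r colour on the fly. It then splits the sample at an illustrative cut of g - r = 0.7 and reports the number of objects and mean redshift per class. All values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical toy tables loosely modelled on a survey layout:
# photometry and spectroscopy linked by a shared obj_id.
cur.execute("CREATE TABLE photo (obj_id INTEGER PRIMARY KEY, g REAL, r REAL)")
cur.execute("CREATE TABLE spec (obj_id INTEGER, redshift REAL)")

# Made-up values for illustration only.
cur.executemany("INSERT INTO photo VALUES (?, ?, ?)", [
    (1, 18.0, 17.0),   # g - r = 1.0 (red)
    (2, 18.4, 18.0),   # g - r = 0.4 (blue)
    (3, 19.5, 18.6),   # g - r = 0.9 (red)
    (4, 20.0, 19.8),   # g - r = 0.2 (blue), but no spectrum
])
cur.executemany("INSERT INTO spec VALUES (?, ?)", [
    (1, 0.10), (2, 0.05), (3, 0.20),
])

# Inner join keeps only objects with both photometry and a spectrum;
# the colour class is computed in SQL and used for grouping.
cur.execute(
    """SELECT CASE WHEN p.g - p.r > 0.7 THEN 'red' ELSE 'blue' END AS colour_class,
              COUNT(*)        AS n,
              AVG(s.redshift) AS mean_z
       FROM photo AS p
       JOIN spec  AS s ON s.obj_id = p.obj_id
       GROUP BY colour_class
       ORDER BY colour_class"""
)
summary = cur.fetchall()

for colour_class, n, mean_z in summary:
    print(colour_class, n, round(mean_z, 3))
# blue 1 0.05
# red 2 0.15
```

Object 4 drops out because it has no spectrum, which shows why the choice between an inner and an outer join matters when combining survey tables.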
This will be followed by an overview of how these databases have been used in science and the potential uses for future scientific studies.
Introduction to statistical methods in astronomy & the computational tools
Statistical methods in astronomy & the computational tools (continued)
Please select a task and send your choice to Edith Fayolle by 5 May. At most three people can work on each task, so it is best to send a ranked list of your three preferred tasks.
The deadline for submitting the project is the end of June, with an oral presentation at about the same time (see the PDF for details). In your email to Edith Fayolle, please therefore also give your availability in late June/early July for the short presentation. It will probably take place in Utrecht (TBD).
28/1: An updated version of the course schedule, with more in-depth descriptions of the lectures and suggested literature, is available. Note that lectures have been swapped: the VO lecture now takes place on April 28.