ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2016

 

Another ESAC Data Analysis and Statistics workshop will be held at ESAC during the week of October 25 to 28, 2016. This is the third in a continuing series of annual workshops; the first two were held in October 2014 and October 2015.

Registration for this workshop is open to everyone, and no prior knowledge of statistics or advanced data analysis methods is required. The tutors will be Roberto Trotta and Željko Ivezić.

The Gaia Data Release #1 workshop, organised jointly by ESA and DPAC, will take place the week immediately following the ESAC Data Analysis and Statistics workshop, and everyone is welcome to register and stay for it as well. Note, however, that the two workshops are organised independently of one another, so you must register for each one separately.

DATES & LOCATION

The workshop will take place on 25-28 October 2016, in rooms D1/D2 at ESAC. 

TUTORS

AGENDA

Calendar ICS file

Tuesday 25 October 2016

09:00 - 09:30: Welcome, registration and installation troubleshooting.

09:30 - 09:35: Welcome to ESAC (Mark Kidger, ESAC Science Faculty Sentinel)

09:35 - 11:00: Introduction to statistics: Probabilities; random variables; parent distributions; samples, central limit theorem; the likelihood function and maximum likelihood principle; frequentist confidence intervals. (Roberto Trotta)

11:00 - 11:30: Coffee/tea break

11:30 - 13:30: Basic concepts in statistics: hands-on session. (Roberto Trotta)

13:30 - 14:30: Lunch @ ESAC canteen

14:30 - 15:30: Introduction to Bayesian inference: Bayes theorem; conceptual and philosophical principles, general advantages of the Bayesian approach; priors. (Roberto Trotta)

15:30 - 16:00: Coffee/tea break

16:00 - 17:30: Introduction to Bayesian inference: hands-on session. (Roberto Trotta)

 

Wednesday 26 October 2016

09:30 - 11:00: Bayesian parameter estimation: inferential solution; the Gaussian linear model; Markov Chain Monte Carlo (Metropolis-Hastings, Hamiltonian MC, Gibbs sampling); practical and numerical issues. (Roberto Trotta)

11:00 - 11:30: Coffee/tea break

11:30 - 13:30: Bayesian parameter estimation: hands-on session. (Roberto Trotta)

13:30 - 14:30: Lunch @ ESAC canteen

14:30 - 15:30: Bayesian model comparison: the three levels of inference; the Bayesian evidence; differences with respect to hypothesis testing; computation of the evidence (SDDR; Laplace approximation, nested sampling). (Roberto Trotta)

15:30 - 16:00: Coffee/tea break

16:00 - 17:30: Bayesian model comparison: hands-on session (Roberto Trotta)

 

Thursday 27 October 2016

09:30 - 11:00: Introduction to astroML. (Željko Ivezić)

11:00 - 11:30: Coffee/tea break

11:30 - 13:30: Introduction to astroML: hands-on session (Željko Ivezić)

13:30 - 14:30: Lunch @ ESAC canteen

14:30 - 15:30: Density estimation, one-dimensional introduction: Knuth's histograms; Scargle's Bayesian Blocks algorithm; Gaussian mixture models; kernel density estimates (KDE); hands-on session. (Željko Ivezić)

15:30 - 16:00: Coffee/tea break

16:00 - 17:30: Density estimation: high-D KDE; Bayesian nearest neighbor method; extreme deconvolution in high-D; hands-on session. (Željko Ivezić)

 

Friday 28 October 2016

09:30 - 11:00: Clustering and classification: clustering (unsupervised classification); 1D hypothesis testing; clustering with Gaussian Mixture models (GMM); hierarchical clustering algorithm. (Željko Ivezić)

11:00 - 11:30: Coffee/tea break

11:30 - 13:30: Supervised classification: potpourri of supervised classification methods: naive Bayes, quadratic discriminant analysis, GMM, KNN, support vector machines; classification comparison with ROC curves; hands-on session. (Željko Ivezić)

13:30 - 14:30: Lunch @ ESAC canteen

14:30 - 15:30: Dimensionality reduction: Principal Component Analysis; Non-negative Matrix Factorization; Independent Component Analysis; manifold learning (Locally Linear Embedding); hands-on session. (Željko Ivezić)

15:30 - 16:00: Coffee/tea break

16:00 - 17:30: Regression and a few misc. points: (Gaussian) errors in both variables; regression with non-Gaussian errors and/or outliers; learning curves; fast matching using KD trees; hands-on session. (Željko Ivezić)

FEE

The workshop fee is 50 Euro for non-ESAC (ESA or CAB @ ESAC) attendees and is payable in cash upon arrival and registration at ESAC. It covers the coffee breaks and the daily bus transportation from downtown Madrid to ESAC during the conference. 

 

LOGISTICS

 

Venue

The workshop will take place at the European Space Astronomy Centre (ESAC), Villanueva de la Cañada, near Madrid. 

 

Hotel

The official hotel for this workshop is:

Leonardo Hotel Madrid City Center

Alberto Aguilera 18, 28015 Madrid

 

Shuttle bus

A workshop shuttle bus will run between the Leonardo Hotel and ESAC and will be clearly marked with an “ESA” sign at the front. It is covered by the workshop fee, and you don’t need to stay at the workshop hotel to use it. If you miss this bus, the cost of getting to ESAC cannot be covered by the fee. Departure and arrival times are as follows:

Departure from Leonardo Hotel    Tuesday 25 October 2016                           08:00    arrival at ESAC at 09:00
Departure from ESAC              Tuesday 25 October 2016                           17:30    arrival at Leonardo Hotel at 18:30

Departure from Leonardo Hotel    Wednesday 26 to Friday 28 October 2016            08:30    arrival at ESAC at 09:30
Departure from ESAC              Wednesday 26 to Friday 28 October 2016            17:30    arrival at Leonardo Hotel at 18:30
 

 

Social event

On Wednesday 26 October 2016 at 8:00 pm, the workshop dinner will be held at the traditional Basque restaurant Segaretxe in downtown Madrid, a 13-minute walk from the hotel. Full meat, fish or vegetarian menus will be available at a total cost of 30 Euro (VAT included). At registration we will ask whether you will join the dinner and which menu you choose, and we will collect the dinner price in cash in advance if you are joining.

 

WORKSHOP MATERIAL

 

Please copy these files for Roberto's lectures:

Lecture notes (1.8 MB)

Exercises (348 KB)

Lectures for Wednesday 26 October 2016:

Lecture on Bayesian Inference (pdf, 7.0 MB) 

Lecture on Model comparison (pdf, 6.6 MB)

Ipython/Jupyter notebooks for Roberto Trotta's hands-on session:

Linear model two populations notebook

Linear model PyMC notebook

 

For Thursday and Friday:

Zeljko’s lecture notes (21 MB):
 

Please copy these files to your laptop:

 

http://www.astro.washington.edu/users/ivezic/astroML/ESAC2016/dataAll.tar.gz (245 MB)

http://www.astro.washington.edu/users/ivezic/astroML/pythonAll.tar.gz

We will discuss what to do with them during the first lecture on Thursday morning.
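For those who prefer to script the download, here is a minimal sketch; it is not an official workshop tool. It fetches the two archives listed above using only the Python standard library and lists their contents as a basic integrity check, without unpacking anything. The names download and list_members are illustrative choices introduced here, not part of the workshop material.

```python
import os
import tarfile

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2.7

# The two archive URLs given in the workshop material above.
ARCHIVE_URLS = [
    "http://www.astro.washington.edu/users/ivezic/astroML/ESAC2016/dataAll.tar.gz",
    "http://www.astro.washington.edu/users/ivezic/astroML/pythonAll.tar.gz",
]


def download(url, dest_dir="."):
    """Download url into dest_dir and return the local path.

    A file that already exists locally is not downloaded again.
    """
    local_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(local_path):
        urlretrieve(url, local_path)
    return local_path


def list_members(archive_path):
    """Return the names stored in a .tar.gz archive.

    Reading the full member list fails with an error if the file is
    truncated or corrupt, so this doubles as a basic integrity check.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


# Example usage (commented out because the first archive is 245 MB):
# for url in ARCHIVE_URLS:
#     path = download(url)
#     print("%s: %d entries" % (path, len(list_members(path))))
```

The sketch deliberately stops short of extracting the archives, since what to do with the files is covered in the first lecture on Thursday.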

 

SOFTWARE & INSTALLATION INSTRUCTIONS

The workshop and hands-on sessions will be based on Python.

Attendees are expected to come to the workshop with a working Python installation in order to participate in the hands-on sessions.

More precisely, the workshop participants should install the following software before the workshop:

  • python, with the following packages: numpy, matplotlib, scipy, scikit-learn, emcee
  • ipython notebook
  • astroML: please see http://www.astroml.org
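
A quick way to see which of the packages listed above are already installed is sketched below. It is only an illustration, not part of the official instructions. It assumes the usual import names (scikit-learn is imported as sklearn, astroML as astroML), and the helper names check_modules and print_report are hypothetical names introduced here. The code runs under both Python 2.7 and Python 3.

```python
import importlib

# Import names of the packages listed above (assumed: the usual import names).
REQUIRED_MODULES = ["numpy", "matplotlib", "scipy", "sklearn", "emcee", "astroML"]


def check_modules(names):
    """Return a dict mapping each module name to its version string.

    A module that fails to import maps to None; a module that imports but
    does not define __version__ maps to "unknown".
    """
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # ImportError if the package is missing; other errors can occur
            # when an installed package is broken or incompatible.
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


def print_report(names=REQUIRED_MODULES):
    """Print one line per module: its version, or MISSING."""
    for name, version in sorted(check_modules(names).items()):
        print("%-12s %s" % (name, version if version is not None else "MISSING"))
```

Calling print_report() from a python or ipython session prints one line per package; any line marked MISSING points to a package that still needs to be installed.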

MAC OS X

On Mac OS X, X11/XQuartz will have to be installed, in addition to the software mentioned above.

All instructions below assume that the bash shell is used, as it is the default shell on Mac OS X. (Adapt instructions accordingly if you changed your default shell.)

PYTHON & IPYTHON NOTEBOOK

We recommend the all-in-one scientific Python installer Anaconda.

  1. Download Anaconda from http://continuum.io/downloads
    For Mac OS X 10.7 (Lion), 10.8 (Mountain Lion), or 10.9 (Mavericks), pick "Mac OS X — 64-Bit Python 2.7 Graphical Installer".
    If you have Mac OS X 10.6 (Snow Leopard), you may use an older version of Anaconda.
  2. Double-click to install, and be sure to leave the default "Modify PATH" option.

Most of the necessary python modules already come by default with Anaconda: numpy, matplotlib, scipy, scikit-learn.

The only python module that needs to be added is emcee:

3.  Install emcee in anaconda: conda install -c williamsmj emcee

Test the installation:

  1. Launch python:
      python
    This should start python, and the version should mention Anaconda.

    Exit with Control-D.
  2. Launch ipython:
      ipython
    This should start ipython, and the version should mention Anaconda.
    Exit with Control-D.
  3. Launch ipython notebook:
      ipython notebook
    This should open your default browser and present you with the notebook dashboard.
    Exit by closing the page (in the browser) and pressing Control-C (in the terminal).
    Note: When the OS language is not English, ipython notebook may crash with the error "ValueError: unknown locale: UTF-8".
    In that case, before launching ipython notebook, type:
      export LC_CTYPE=en_GB.UTF-8
  4. Launch python and test the different modules:
      import numpy
      print(numpy.__version__)
      import matplotlib
      print(matplotlib.__version__)
      import scipy
      print(scipy.__version__)
      import sklearn
      print(sklearn.__version__)
      import emcee
      print(emcee.__version__)

All the python modules should load properly, and they should all print their version.
Exit with Control-D.
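
The locale note in step 3 can also be checked from Python before launching the notebook. The sketch below rests on an assumption drawn from the error message rather than on the workshop instructions: that the crash occurs when a locale environment variable is set to the bare encoding name UTF-8 instead of a full locale such as en_GB.UTF-8. The helper name vars_set_to_bare_utf8 is hypothetical.

```python
import os

# Locale-related environment variables to inspect (assumed to be the relevant ones).
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def vars_set_to_bare_utf8(environ):
    """Return the locale variables whose value is exactly "UTF-8".

    Such a value carries no language or territory part, which matches the
    "unknown locale: UTF-8" error quoted in the note above.
    """
    return [var for var in LOCALE_VARS if environ.get(var) == "UTF-8"]


def describe_locale_environment(environ=None):
    """Return a short human-readable diagnosis of the given environment."""
    if environ is None:
        environ = os.environ
    suspects = vars_set_to_bare_utf8(environ)
    if suspects:
        return "Set to bare UTF-8: " + ", ".join(suspects)
    return "No locale variable is set to bare UTF-8."
```

If describe_locale_environment() reports a variable, setting LC_CTYPE as described in the note before launching ipython notebook is the remedy suggested there.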
 

LINUX

All instructions below assume that the bash shell is used; adapt instructions accordingly if you use a different shell.

PYTHON & IPYTHON NOTEBOOK

We recommend the all-in-one scientific Python installer Anaconda.

  1. Download Anaconda from http://continuum.io/downloads
    The file will be named Anaconda-2.1.0-Linux-x86.sh (or a very similar name, adapt instructions accordingly)
  2. Install Anaconda with:
      bash Anaconda-2.1.0-Linux-x86.sh
    Note that you should type bash, regardless of whether or not you are actually using the bash shell.
    Follow the text-only prompts.
    When there is a colon at the bottom of the screen press the down arrow to move down through the text.
  3. Type yes and press enter to approve the license.
  4. Press enter to approve the default location for the files.
  5. Type yes and press enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).

Most of the necessary python modules already come by default with Anaconda: numpy, matplotlib, scipy, scikit-learn.

The only python module that needs to be added is emcee:

  6. Install emcee in anaconda (on a 64-bit linux): conda install -c lrp emcee

Note: if you are on a 32-bit linux, use the following command instead: conda install -c auto emcee

Test the installation:

  1. Launch python:
      python
    This should start python, and the version should mention Anaconda.

    Exit with Control-D.
  2. Launch ipython:
      ipython
    This should start ipython, and the version should mention Anaconda.
    Exit with Control-D.
  3. Launch ipython notebook:
      ipython notebook
    This should open your default browser and present you with the notebook dashboard.
    Exit by closing the page (in the browser) and pressing Control-C (in the terminal).
  4. Launch python and test the different modules:
      import numpy
      print(numpy.__version__)
      import matplotlib
      print(matplotlib.__version__)
      import scipy
      print(scipy.__version__)
      import sklearn
      print(sklearn.__version__)
      import emcee
      print(emcee.__version__)

All the python modules should load properly, and they should all print their version.
Exit with Control-D.
 

WINDOWS

The main issue in Windows is the lack of a packaged version of emcee.

PYTHON & IPYTHON NOTEBOOK

We recommend the all-in-one scientific Python installer Anaconda.

  1. Download Anaconda from http://continuum.io/downloads
    The file will be named Anaconda-2.1.0-Windows-x86_64.exe (or a very similar name; adapt the instructions accordingly)
  2. This package contains Python 2.7.
  3. Install Anaconda following the wizard and accepting all the defaults.

Most of the necessary python modules already come by default with Anaconda: numpy, matplotlib, scipy, scikit-learn.

The only python module that needs to be added is emcee. However, emcee is not available as a package for Windows, so it should be downloaded from GitHub and installed:

  1. Download a ZIP package with the emcee code from https://github.com/dfm/emcee/zipball/master
  2. Unpack the archive in a temporary directory
  3. Change to the temporary directory created in step 2 and run:

python setup.py install

This will add emcee to the package library managed by Anaconda.

Test the installation:

  1. Launch python:
      python
    This should start python, and the version should mention Anaconda.

    Exit with Control-Z followed by Enter.
  2. Launch ipython:
      ipython
    This should start ipython, and the version should mention Anaconda.
    Exit with Control-D.
  3. Launch ipython notebook:
      ipython notebook
    This should open your default browser and present you with the notebook dashboard.
    Exit by closing the page (in the browser) and pressing Control-C (in the terminal).
  4. Launch python and test the different modules:
      import numpy
      print(numpy.__version__)
      import matplotlib
      print(matplotlib.__version__)
      import scipy
      print(scipy.__version__)
      import sklearn
      print(sklearn.__version__)
      import emcee
      print(emcee.__version__)

All the python modules should load properly, and they should all print their version.
Exit with Control-Z followed by Enter.
 

ORGANIZING COMMITTEE 

  • Michele Armano
  • Guillaume Belanger
  • Hervé Bouy
  • Uwe Lammers
  • Bruno Merín
  • William O'Mullane
  • Pablo Riviere-Marichalar
  • Celia Sánchez
  • Luis Manuel Sarro
  • Roland Vavrek

GET NOTIFICATIONS

You can join the mailing list about statistics at ESAC here.

In case of questions you can send an email to bruno.merin at esa.int.

FUNDING

The SOC warmly thanks the ESAC Science Faculty for funding this workshop.