Rgenetics/Galaxy Seminar

by Ross last modified 2008-01-18 16:10

Workshop 

 

Title:  A hands-on workshop introducing the Rgenetics toolkit for the Galaxy Genomic workbench: Integrated genetic association analyses using best-of-breed tools and data resources.

 

Learning Objectives: At the completion of the 3-hour workshop, participants will be familiar with the main features of the Galaxy genomic workbench user interface. Participants will have performed typical genetic association research tasks such as genotype quality control analysis and data cleaning, a genetic association case-control analysis, a genetic association continuous trait analysis, and a genetic association sample ancestry analysis. Participants will have used Galaxy tools for integrating data from the UCSC and other public data repositories with the results of statistical genetic analyses.

 

Presenters/tutors: Ross Lazarus, Anton Nekrutenko and James Taylor

 

Target Audience: Genetic disease association and pharmacogenetics researchers, including clinicians, graduate students, bioinformaticians, software developers and statisticians.

 

Recommended Background: This is a practical, student-centered workshop based on adult learning principles. The workshop program will cover a lot of ground quickly, and familiarity with at least some basic concepts and terminology from at least one of population genetics, evolutionary biology, or statistics will be very helpful for getting the most out of this workshop. Hands-on experience performing genotype data quality assessment and association analyses, practical experience using the UCSC genome browser, and at least some familiarity with the documentation and applications of popular genetic association analysis tools such as Plink and Eigenstrat will also be very helpful. To learn more about the tools and resources, please visit the workshop web site at http://rgenetics.org/chi/08

 

Participant Equipment: Each participant is expected to bring a wireless-capable laptop with a modern, standards-compliant internet browser.

 

 


Workshop Program and Outline

A) Introduction.

 

Preliminaries. Warning. Do not unpack yet.

 

Faculty introductions.

 

Student sorting. Sort students into two groups: genetics (those who know what Hardy-Weinberg equilibrium (HWE) is and why it matters) and all other skills. Then form pairs or groups of three that combine strength in genetics with at least one other area from: statistics, clinical sciences, epidemiology, other biology.

 

1. Overview - Translational genomic research road-blocks and solutions.

 

Challenges, including tool and data resource heterogeneity, data scale issues, and keeping tool versions up to date, and ways of overcoming them (20 minutes - Ross)

 

2. Introduction to the Galaxy framework

 

Demonstration of the main features of the Galaxy workbench including the tool menu, managing and sharing persistent histories, accessing external data resources, and workflows (30 minutes - Anton and James)

3. Questions and discussion (10 minutes)



B) Practical Exercises – tutorial workshop

 

Participants work in small groups or individually according to preference. Each group will create a new account on the Rgalaxy server. All exercises will use public HapMap genotype data and synthetic phenotype data so there are no IRB/human subjects issues. Faculty will demonstrate each step on the projected computer, and will circulate amongst the participants, helping out as needed.

1. Genotype data quality assessment and data filtering tools (30 minutes)

 

a) Perform a single nucleotide polymorphism genotype data quality and completeness analysis. Review marker and subject genotyping technical quality metrics including missing data rates, minor allele frequencies, Hardy-Weinberg equilibrium statistics, and Mendelian transmission inconsistency rates in family (HapMap) data.

 

b) Filter the raw genotype data to remove markers and subjects with the worst genotyping data while retaining a useful sample size.

 

c) Repeat the QC analysis on the filtered data set to confirm that the desired technical thresholds are met.
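For readers who want to see what the marker-level metrics in exercise 1 measure, the following is a minimal Python sketch. It is not the code used by the Rgenetics or Plink tools. It assumes genotypes coded as 0, 1 or 2 copies of one allele, with None for a missing call; the function names and the thresholds in passes_qc are illustrative assumptions only.

```python
def marker_qc(genotypes):
    """Summarise one marker. Genotypes are 0/1/2 copies of allele B; None = missing."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = [g for g in genotypes if g is not None]
    n = len(called)
    missing_rate = 1 - n / len(genotypes)
    if n == 0:
        return {"missing_rate": missing_rate, "maf": 0.0, "hwe_chi2": 0.0}
    n_aa, n_ab, n_bb = (called.count(k) for k in (0, 1, 2))
    p_b = (n_ab + 2 * n_bb) / (2 * n)   # frequency of allele B
    p_a = 1 - p_b
    maf = min(p_a, p_b)                 # minor allele frequency
    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = (n * p_a ** 2, 2 * n * p_a * p_b, n * p_b ** 2)
    observed = (n_aa, n_ab, n_bb)
    # Pearson chi-square (1 degree of freedom) comparing observed and expected.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return {"missing_rate": missing_rate, "maf": maf, "hwe_chi2": chi2}


def passes_qc(stats, max_missing=0.05, min_maf=0.01, max_hwe_chi2=10.83):
    """Apply illustrative (assumed) thresholds; real studies choose their own."""
    return (stats["missing_rate"] <= max_missing
            and stats["maf"] >= min_maf
            and stats["hwe_chi2"] <= max_hwe_chi2)


if __name__ == "__main__":
    example = [0] * 25 + [1] * 50 + [2] * 25
    stats = marker_qc(example)
    print(stats, passes_qc(stats))
```

Running marker_qc again on the markers that pass the filter corresponds to step c), repeating the QC on the cleaned data.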

2. Statistical genetics analysis tools: (30 minutes)

 

a) Perform a genotype analysis using a binary trait, i.e. a case-control analysis, and view the results in genomic context using the UCSC genome browser.

 

b) Perform an ancestry analysis and visualize the results, to explore potential bias from population sub-structure.

 

c) Perform a genotype analysis using a quantitative trait, and view the results in genomic context using the UCSC genome browser.
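As a rough illustration of the kind of test behind the case-control analysis in exercise 2a, here is a minimal Python sketch of a basic allelic chi-square test for one marker. It is a simplified stand-in, not the method implemented by the workshop tools. It assumes the same 0/1/2 genotype coding with None for missing calls, and the function names are hypothetical.

```python
import math


def allele_counts(genotypes):
    """Return (count of allele A, count of allele B) from 0/1/2 genotype codes."""
    called = [g for g in genotypes if g is not None]
    b = sum(called)
    a = 2 * len(called) - b
    return a, b


def allelic_chi2(case_genotypes, control_genotypes):
    """Chi-square (1 df) on the 2x2 table of allele counts by case/control status."""
    a1, b1 = allele_counts(case_genotypes)
    a2, b2 = allele_counts(control_genotypes)
    n = a1 + b1 + a2 + b2
    denominator = (a1 + b1) * (a2 + b2) * (a1 + a2) * (b1 + b2)
    if denominator == 0:
        return 0.0, 1.0  # table is degenerate, so there is no evidence either way
    chi2 = n * (a1 * b2 - a2 * b1) ** 2 / denominator
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    cases = [2] * 30 + [1] * 50 + [0] * 20
    controls = [2] * 20 + [1] * 50 + [0] * 30
    print(allelic_chi2(cases, controls))
```

This sketch ignores population sub-structure, which is the potential source of bias that the ancestry analysis in 2b is meant to explore.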

Break for refreshments and conversation (20 minutes)

3. Genomic tools for prioritizing candidate markers: (30 minutes)

 

a) Identify highly conserved regions in the human genome and intersect these with the results from analyses 2a and 2c to help prioritize candidate markers for replication and functional study.

 

b) Identify regions containing functional elements (e.g. from ENCODE) in the human genome and intersect these with results from 2a and 2c, to help prioritize candidate markers for replication and functional study.
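The intersection step in exercise 3 can be pictured with a small Python sketch. This is only an assumption-laden illustration, not how Galaxy's interval tools work internally. It assumes regions given as (chromosome, start, end) with half-open, BED-style coordinates and markers given as (name, chromosome, position); the marker names and coordinates below are made up.

```python
from collections import defaultdict


def markers_in_regions(markers, regions):
    """Return names of markers whose position falls inside any region.

    markers: iterable of (name, chrom, pos)
    regions: iterable of (chrom, start, end), with start <= pos < end counted as inside
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    hits = []
    for name, chrom, pos in markers:
        # Linear scan per chromosome; adequate for a small illustrative data set.
        if any(start <= pos < end for start, end in by_chrom.get(chrom, ())):
            hits.append(name)
    return hits


if __name__ == "__main__":
    conserved = [("chr1", 100, 200), ("chr2", 500, 600)]    # made-up regions
    associated = [("snp1", "chr1", 150), ("snp2", "chr1", 250),
                  ("snp3", "chr2", 599), ("snp4", "chr3", 10)]
    print(markers_in_regions(associated, conserved))
```

Markers that both show association and fall inside conserved or functional regions would be the ones prioritized for follow-up.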

C) Concluding remarks and discussion

 

Large group – Review of participant achievements, final questions and closing remarks (10 minutes)


 
