Workshop on Genome Assembly and Annotation

A course will be held from 23 to 27 October 2017 in Oeiras, hosted by the Portuguese ELIXIR Node in cooperation with the nodes involved in ELIXIR-EXCELERATE task 10.3 “Capacity Building in Genome Assembly and Annotation”.


The course is aimed at researchers interested in learning more about genome assembly and annotation. It will include information useful for both beginners and more advanced users. We will start by introducing general concepts and then proceed step by step through all major components of a genome assembly and annotation workflow, from raw data all the way to a final assembled and annotated genome. There will be a mix of lectures and hands-on practical exercises using the Linux command line. After the course, participants will be aware of common practices and commonly used tools, and will be able to run assembly and annotation projects on their own.



Course Organizer:

Henrik Lantz has a background in biology, with a PhD in Systematic Biology from Uppsala University, Sweden (2003). He then moved to mycology with a postdoc on the phylogeny of plant-associated ascomycetes, which in turn led to a position as a bioinformatician at SLU, Uppsala, working on the assembly and annotation of fungal genomes. He has now been working with assembly and annotation for 7 years, the last 5 of which have been at NBIS, an organisation that helps Swedish researchers with bioinformatics and also constitutes the Swedish node in ELIXIR.

Henrik currently divides his time between three areas: 1) Team leader for the NBIS Assembly and Annotation Service. This team provides genome assembly and annotation expertise to Swedish research projects. 2) Administration of NBIS support requests as NGS coordinator. 3) Leading the European Union-funded ELIXIR-EXCELERATE task 10.3, "Capacity Building in Genome Assembly and Annotation".

Henrik is course leader for the "Workshop on Genome Assembly and Annotation" in Oeiras, Portugal, Oct. 2017.

           Affiliation: NBIS, Uppsala University, SE



Mahesh Binzer-Panchal has a background in Computer Science and Mathematics, with a PhD in methods of phylogeographic inference. Following this he did a postdoc at the Max Planck Institute for Evolutionary Biology, where the group studied lake-river adaptation in the three-spined stickleback and host-parasite co-evolution between the three-spined stickleback and its parasites. Mahesh carried out and assisted with many bioinformatic analyses, including resequencing analyses, RNA-seq and differential expression analyses, metagenomic analyses, and de novo assembly of the three-spined stickleback genome and of targeted genomic regions. He currently works as a member of the De novo Genome Assembly and Annotation service within NBIS (National Bioinformatics Infrastructure Sweden), specializing in de novo genome assembly. Mahesh takes a particular interest in sequence quality control and validation, and supports projects ranging from small bacterial assemblies to large eukaryotic assemblies on a wide range of data.

          Affiliation: NBIS, Uppsala University, SE


Christophe Klopp has a background in computer and agronomical sciences (MSc, University of Rennes). He was appointed head of the French national farm animal bioinformatics platform SIGENAE in 2002 and head of the South West regional bioinformatics platform in 2007. He has worked on transcriptome assembly and annotation for over ten years, producing reference databases for fish and oysters, and has also collaborated on many European projects in this field, such as Aquafirst, Aquafunc and Aquagenome. He has contributed to software packages such as RNABrowse and DRAP. He currently takes part in several genome assembly projects using long-read technologies (PacBio and Oxford Nanopore), covering bacterial, fungal, plant and fish species. He is a close collaborator of the GET sequencing platform.

Christophe will teach the long read assembly part of the course and organize the corresponding hands-on.

          Affiliation: MIAT - INRA, Toulouse, FR


Erik Hjerde has a background in biology, with a PhD in Genomics. During his career he has worked mainly with prokaryotes, focusing on genomics and transcriptomics. In recent years his focus has shifted towards metagenomic analysis of communities from the human host as well as from various ecological habitats. He heads a national (Norwegian) work package within ELIXIR which, among other things, aims to make tools and analysis workflows requested by the user community available in Galaxy. He is currently working on projects related to the human gut microbiome and to the salmon and cod gut microbiomes.

          Affiliation: UiT, Tromsø, NO


Joelle Amselem has a background in Molecular Biology and Computer Science. She is deputy leader of the URGI "Genome analysis" team and operational manager of the URGI platform. After years of bioinformatic tool development (for transcriptomics and genome annotation), she carried out numerous bioinformatic analyses within international genome sequencing and annotation projects covering plants, fungi and insects. For over ten years, her research activities have covered gene and transposable element (TE) annotation and the evolutionary dynamics and functional impact of repeats.

          Affiliation: URGI, INRA, Université Paris-Saclay, Versailles, FR


Laurent Bouri has a background in genomics (Master's degree from Versailles University) and bioinformatics (Master's degree from Paris-Sud University). As an engineer at the IRISA labs in Rennes, he spent 2 years benchmarking several software tools and algorithms for long-read (PacBio and Oxford Nanopore) correction and long-read assembly. He currently works as an engineer at IFB-core in France and is part of the French node in ELIXIR.

          Affiliation: IFB, FR


Lieven Sterck has a background in biotechnology and a PhD in bioinformatics from Ghent University, Belgium. Lieven was one of the researchers involved in the annotation of one of the first plant genomes sequenced (the poplar tree) and has thus been involved in genomics for over a decade, with a special focus on gene and genome annotation. He is currently a senior postdoc in the Bioinformatics and Evolutionary Genomics group of VIB, where he is mainly responsible for several of the international genome projects the lab collaborates on. Besides actively participating in several genome projects, he is also interested in developing new resources and tools to assist the research community in annotating and curating new genomes (e.g. the ORCAE platform).

          Affiliation: VIB, Ghent University, BE


Lucile Soler has a background in biochemistry and bioinformatics, with a PhD in Bioinformatics obtained in the aquaculture team at CIRAD (Agricultural Research for Development) in Montpellier, France. There she assembled and annotated the genome of the Nile tilapia, and carried out phylogenetic and synteny analyses in order to find genes involved in the sex cascade. Following this, she worked as a bioinformatician at Syngenta in Toulouse (France), where she developed a pipeline based on whole-genome alignment tools to transfer markers derived from NGS data. She currently works as a member of the De novo Genome Assembly and Annotation service within NBIS (National Bioinformatics Infrastructure Sweden), specializing in eukaryotic and bacterial genome annotation.

          Affiliation: NBIS, Uppsala University, SE


ELIXIR-PT Hosts:


Pedro Fernandes graduated in Electronics and Telecommunications Engineering at IST (U.T. Lisboa). He worked in Biomedical Engineering, Biophysics and Physiology and moved to Bioinformatics in 1990. He established the first user community in Portugal around the national service provided by the Portuguese node of EMBnet. In 1998 he created the Gulbenkian Training Programme in Bioinformatics, which provided user skills to more than 3200 course attendees over its twelve years of existence. In 2002, in cooperation with Mario Silva from FCUL, he designed a graduate programme in Bioinformatics. He currently teaches Bioinformatics in both graduate and undergraduate programmes.

           Affiliation: IGC, Oeiras, PT



Daniel Sobral graduated in Informatics Engineering from Instituto Superior Técnico (Lisbon, Portugal). His interest in Biology led him to join the Gulbenkian PhD programme and carry out his doctoral studies in Bioinformatics at the Université Aix-Marseille (France) with Dr. Patrick Lemaire. During his PhD he worked on different aspects of bioinformatics, focusing in particular on the gene expression networks underlying the embryonic development of a model organism, all integrated into a community resource. He later became a developer for the Ensembl project, where he worked mostly on integrating epigenetic data from the ENCODE project into Ensembl. In this context he gained significant experience with high-throughput sequencing data. In 2012 he moved back to Portugal, where he joined the Bioinformatics Unit at the IGC to help the local research community handle the sequencing revolution brought about by high-throughput technologies. In this role he has collaborated on several projects in genomics, transcriptomics and epigenetics. He has been Head of the Bioinformatics Unit at IGC since 2014.

           Affiliation: IGC, Oeiras, PT



Day 1 - Monday October 23

    09:00 - 09:30
    Welcome, introduction of teachers and round of presentations by participants
    (Daniel Sobral, Henrik Lantz).

    09:30 - 10:00
    Lecture: Introduction to genome assembly and annotation (Henrik Lantz)

    10:00 - 10:45
    Lecture: Read QC - Quality Assessment of sequencing data (Mahesh Binzer-Panchal)

    10:45 - 12:00 incl. coffee break
    Practical: Read QC (Mahesh Binzer-Panchal)

    12:00 - 13:00
    Lunch

    13:00 - 14:00
    Practical: Read QC (continuation)

    14:00 - 15:00
    Lecture: Illumina assembly (Erik Hjerde)

    15:00 - 17:00 incl. coffee break
    Practical: Genome assembly using Illumina data (Erik Hjerde)

Day 2 - Tuesday October 24

    09:00 - 10:00
    Lecture: Long read technologies and assemblies (Christophe Klopp)

    10:00 - 12:00 incl. coffee break
    Practical: Long read assembly and polishing (Christophe Klopp)

    12:00 - 13:00
    Lunch

    13:00 - 13:45
    Lecture: Assembly Validation (Mahesh Binzer-Panchal)

    13:45 - 17:00 incl. coffee break
    Practical: Assembly Validation (Mahesh Binzer-Panchal)

Day 3 - Wednesday October 25

    09:00 - 17:00 incl. coffee breaks and lunch
    Project discussion: discussion with participants about their own projects, including a look at their data if available.

    14:00 - 15:00
    Optional Invited Lecture: The INNUCA Bacterial Genome Assembly Pipeline (João Carriço, IMM).

Day 4 - Thursday October 26

    09:00 - 09:30
    Lecture: Eukaryotic Transposable Elements (Joelle Amselem)

    09:30 - 09:40
    Lecture: Prokaryotic Transposable Elements (Laurent Bouri)

    09:40 - 12:00 incl. coffee break
    Practical: Transposable Elements annotation (Eukaryotes + prokaryotes)

    12:00 - 13:00
    Lunch

    13:00 - 14:00
    Lecture: Genome Annotation (Lieven Sterck)

    14:00 - 17:00 incl. coffee break
    Lecture and practical: Bacterial annotation (Lucile Soler)

Day 5 - Friday October 27

    09:00 - 10:00
    Practical: manual curation - ORCAE (Lieven Sterck)

    10:00 - 12:00 incl. coffee break
    Practical: Gene prediction/Genome annotation (Lucile Soler)

    12:00 - 13:00
    Lunch

    13:00 - 13:30
    Lecture: Functional annotation (Lucile Soler)

    13:30 - 15:00
    Practical: Functional annotation

    15:00 - 17:00 incl. coffee break
    Walkthrough of earlier exercises and wrap-up


Registration: Participation is free of charge but requires registration.


Register using this Google form by September 30th.


We will accept a maximum of 20 participants. Preference will be given to those currently directly involved in genomic projects, so please include a brief description of your project(s) in the application. Given that most practical exercises require the Linux command line, preference will be given to those who have at least minimal experience running tools on the command line. Please indicate in the application form your experience with running bioinformatic tools.

If you require assistance in finding accommodation, please indicate that in the application form.


Contact: For any questions about this course, please contact Daniel Sobral (e-mail address below)

Start date: Monday, October 23, 2017
End date: Friday, October 27, 2017
Venue: Instituto Gulbenkian de Ciência
R. Q.ta Grande 6
2780-156 Oeiras
Contact: Daniel Sobral
Event type: Workshops and courses