September 2014 Galaxy Update

Galaxy Updates

Welcome to the September 2014 Galaxy Update, a summary of what is going on in the Galaxy community. Galaxy Updates complement the Galaxy Development News Briefs which accompany new Galaxy releases and focus on Galaxy code updates.

Galaxy-UK Community Launched

Galaxy-UK, a new Galaxy community, was launched in August. The Galaxy-UK Community aims to:

  • Bring the Galaxy community in the United Kingdom closer together
  • Identify and address the needs of the community
  • Encourage interaction and collaboration.

Galaxy-UK is also an information hub for events such as:

  • UK-based Galaxy training courses
  • UK-based talks involving Galaxy
  • Information on the location of UK Galaxy servers
  • Anything else that might be pertinent to bring the UK Galaxy users/admins/trainers together as a community

The community will also have both online meetings and physical meetings, so keep an eye open for these events.

Other Galaxy Communities

Don't fret if you want to join a community but you are not in the UK. Galaxy-UK is just one of several Galaxy communities you can join. (And there are rumors of a German-language community in the works.)

Events

Galaxy Events in Europe, Fall 2014

There is a wealth of Galaxy-related events in Europe this fall. Events include a large Galaxy presence at ECCB'14, the Fourth Galaxy User Group Grand Ouest (GUGGO) meeting, the 2014 Swiss-German Galaxy Tour, and several training events in Italy, the United Kingdom, Croatia, Norway, and France.

These events are a great way to meet other Galaxy users and developers and learn and share best practices. If you're in Europe and are interested in learning more about Galaxy and/or the community, then please give these a look.

| Date | Topic/Event | Venue/Location | Contact |
|---|---|---|---|
| September 6-10 | At least one tutorial, a panel of European Public Galaxy Instances, and 5 posters | European Conference on Computational Biology (ECCB'14), Strasbourg, France | Presenters |
| September 11 | Tools integration on Galaxy (This event is full.) | Galaxy User Group Grand Ouest, Rennes, France | Cyril Monjeaud, Yvan Le Bras |
| September 15 | Fourth GUGGO meeting | Galaxy User Group Grand Ouest, Rennes, France | Cyril Monjeaud, Yvan Le Bras |
| September 23-25 | Analisi dati Next Generation Sequencing con Galaxy | Cagliari, Italy | CRS4 |
| September 24 | Introduction to Galaxy - Data Manipulation and Visualisation | University of Cambridge, United Kingdom | Anne Pajon, Jing Su |
| September 25 | RADseq analysis using STACKS on Galaxy | Galaxy User Group Grand Ouest, Rennes, France | Yvan Le Bras, Cyril Monjeaud |
| September 30 | CIR Interactive Workshop - Introduction to bioinformatics analysis with Galaxy application | RBI, Zagreb, Croatia | Enis Afgan |
| September 30 - October 2 | Galaxy Training and Demo Day; (second Swiss) Galaxy Workshop; German Galaxy Developers Day | 2014 Swiss-German Galaxy Tour with events in Bern, Switzerland and Freiburg, Germany | Hans-Rudolf Hotz and Bjoern Gruening |
| October 6 - 24 | Intensive course in High Throughput Sequencing technologies and bioinformatics analysis | University of Oslo, Oslo, Norway | Lex Nederbragt, Karin Lagesen |
| November 26-28 | RNA-Seq & ChIP-Seq analysis course using Galaxy | PRABI, Lyon, France | Navratil V., Oger C., Veber P., Deschamps C., Perriere G. |

The Great GigaScience and Galaxy Workshop

GigaScience Journal
VLSCI
Australian Bioinformatics Network

The Great GigaScience and Galaxy (G3) Workshop will be held on Friday, 19 September 2014, at The University of Melbourne from 8:45am to 5:00pm.

The day's theme is Turning data—big data—into research impact

Morning Session of Talks (8:45am - 12:00pm)

Afternoon Workshops (2:00pm - 5:00pm)

See the event page for registration and contact links, and additional information.

Other Events

And don't worry, Europe does not have a complete lock on upcoming Galaxy-related events. There are also things going on in North America, and a few more in Australia too. See the Galaxy Events Google Calendar for details on other events of interest to the community.

| Date | Topic/Event | Venue/Location | Contact |
|---|---|---|---|
| October 29 - November 4 | Computational & Comparative Genomics Course (Application Deadline: July 15) | Cold Spring Harbor Laboratories (CSHL), New York, United States | William Pearson, Lisa Stubbs |
| November 2-5 | Introduction to Bioinformatics Analysis with Galaxy Workshop; SNP/Variant Analysis with Galaxy Workshop; A Gentle Introduction to Cloud Computing: Setting up your own Galaxy Server Workshop | ASA, CSSA, and SSSA International Annual Meeting, Long Beach, California, United States | Galaxy Outreach |
| November 19-20 | Workshop: Extended RNA-Seq analysis | The University of Queensland, Brisbane, Queensland, Australia | Mark Crowe |
| **2015** | | | |
| July 6-8 | 2015 Galaxy Community Conference (GCC2015) | The Sainsbury Lab, Norwich, United Kingdom | Galaxy Outreach |

New Papers

44 papers were added to the Galaxy CiteULike Group in August, including this one:

The new papers were tagged in many different areas:

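| # | Tag | # | Tag | # | Tag | # | Tag |
|---|---|---|---|---|---|---|---|
| - | Cloud | - | Project | 1 | Tools | 6 | UsePublic |
| 2 | HowTo | - | RefPublic | - | UseCloud | 1 | Visualization |
| 1 | IsGalaxy | - | Reproducibility | 5 | UseLocal | 8 | Workbench |
| 30 | Methods | 1 | Shared | 8 | UseMain | | |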

Who's Hiring

Please Help! Yes you!

The Galaxy is expanding! Please help it grow.

Got a Galaxy-related opening? Send it to outreach@galaxyproject.org and we'll put it in the Galaxy News feed and include it in next month's update.


New Releases

August was an eventful month for releases. New versions of Galaxy, CloudMan, BioBlend, and blend4j were all released.

August 11, 2014 Galaxy Distribution


**[Complete News Brief](http://wiki.galaxyproject.org/DevNewsBriefs/2014-08-11)**


*Highlights:*

• [Security alert](http://tinyurl.com/nhgmbc5) from July 31st, upgrade now
• [Citations](http://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax#A.3Ccitations.3E_tag_set): DOIs, `BibTeX`, and much much more
• [Docker](http://wiki.galaxyproject.org/Admin/Tools/Docker): You voted, we've *got it*, with a little help from our friends (you!)
• Significant Workflow, API, Job, Tool Shed, and Dataset management updates
• Fixes, tunings, plus just a drop of → Gossip

getgalaxy.org
galaxy-dist.readthedocs.org
bitbucket.org/galaxy/galaxy-dist

new:     $ hg clone https://bitbucket.org/galaxy/galaxy-dist#stable
upgrade: $ hg pull
         $ hg update latest_2014.08.11

August 2014 CloudMan Release

This is mostly an incremental bug fix release with the following summary of changes:

  • On AWS, updated galaxyFS snapshot (snap-e6e1c04a), which includes the June 2, 2014 Galaxy release with the July 30th security fix. All the tools installed via the Tool Shed have been updated and a number of new tools added, most notably Tophat2, Bowtie2, FastQC, several FASTQ manipulation tools, and several QC tools.
  • For AWS, added support for VPC
  • For OpenStack clouds, added the ability to automatically recover worker instances on cluster reboot
  • Added support for creating a file system based on a downloadable archive
  • Do not run Galaxy with multiple processes by default, because Tool Shed installs do not work properly in multi-process mode. Multi-process mode can still be enabled by setting the user data option configure_multiple_galaxy_processes to True when launching an instance.
  • Set SGE slots in each queue to be equal to the number of cores on the instance
  • Set instance IP in Galaxy's FTP data upload tool message
  • Added support for Nginx v1.4 and allow it (with the PAM module) to be used as the authentication mechanism when accessing the Galaxy Reports app
  • Fixed cluster deletion when performed via the API
  • No longer automatically start Hadoop and HTCondor services
  • On manually-invoked instance reboots, do not increment the instance reboot count that otherwise eventually leads to instance termination
  • Limit the size of the log message buffer used in the UI to 1000 lines. Long-running instances had issues with this log growing large and that led to poor UI performance. The complete log is still available from the Admin page (or the command line).
  • Automatically delete the bucket/container for Test type (i.e., 'SGE only') clusters on cluster termination
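
As a rough illustration of the configure_multiple_galaxy_processes option mentioned in the list above, the following Python sketch renders a small set of launch options as user data text. It is only a sketch under stated assumptions: the simple "key: value" layout, the helper name build_user_data, and the cluster_name placeholder are illustrative and not taken from the release notes. Check the CloudMan documentation for the exact user data format your launch method expects.

```python
def build_user_data(options):
    """Render a flat dict of launch options as simple 'key: value' lines.

    The 'key: value' layout is an assumption made for illustration.
    """
    lines = []
    for key in sorted(options):
        # str(True) is "True", matching the value named in the release notes.
        lines.append("{0}: {1}".format(key, options[key]))
    return "\n".join(lines) + "\n"


# Hypothetical launch options. Only configure_multiple_galaxy_processes
# comes from the release notes; cluster_name is a placeholder.
launch_options = {
    "cluster_name": "my-galaxy-cluster",
    "configure_multiple_galaxy_processes": True,
}

user_data = build_user_data(launch_options)
print(user_data)
```

The resulting text would then be supplied as instance user data at launch time, alongside whatever other options your deployment already uses.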

For complete details on implemented changes, please see the source code commits.

CloudMan offers an easy way to get a personal and completely functional instance of Galaxy in the cloud in just a few minutes, without any manual configuration.

BioBlend 0.5.1 Release

BioBlend version 0.5.1 was released on August 19. From the CHANGELOG:

  • Fixed URL joining problem described in issue #82
  • Enabled Travis Continuous Integration testing
  • Added script to create a user and get its API key
  • Deprecated create_user() method in favor of clearer create_remote_user(). Added create_local_user().
  • Skip instead of fail tests when BIOBLEND_GALAXY_URL and BIOBLEND_GALAXY_API_KEY environment variables are not defined.
  • Added export and download to objects API
  • Added export/download history
  • GalaxyClient: changed make_put_request to return whole requests response object
  • Added Tool wrapper to BioBlend.objects plus methods to list tools and get one
  • Added show_tool() method to ToolClient class
  • Added name, in_panel and trackster filters to get_tools()
  • Added upload_dataset() method to History class.
  • Removed DataInput and Tool classes for workflow steps. Tool is to be used for running single tools.
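
The "skip instead of fail" behaviour listed above is a common testing pattern. The sketch below shows one way to get it with Python's standard unittest module. It is not BioBlend's actual test code: the helper missing_galaxy_settings and the class GalaxyConnectionTest are hypothetical names, and only the two environment variable names come from the CHANGELOG.

```python
import os
import unittest

# Environment variable names as given in the CHANGELOG entry.
REQUIRED_VARS = ("BIOBLEND_GALAXY_URL", "BIOBLEND_GALAXY_API_KEY")


def missing_galaxy_settings(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


class GalaxyConnectionTest(unittest.TestCase):
    """Hypothetical test case that needs a configured Galaxy server."""

    def setUp(self):
        missing = missing_galaxy_settings()
        if missing:
            # Mark the test as skipped rather than letting it fail.
            self.skipTest("not configured: " + ", ".join(missing))
        self.url = os.environ["BIOBLEND_GALAXY_URL"]
        self.key = os.environ["BIOBLEND_GALAXY_API_KEY"]

    def test_settings_present(self):
        self.assertTrue(self.url)
        self.assertTrue(self.key)


# Run the suite programmatically; skipped tests do not count as failures.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(GalaxyConnectionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Doing the check in setUp keeps the decision in one place, so every test in the class is skipped with the same message when the server details are not provided.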

blend4j 0.1.1 Release

blend4j version 0.1.1 was released on August 27th. Some key features from the CHANGELOG:

  • Dataset collection support by Aaron Petkau. Among other things, the histories client can now create and return information about collections, and the workflows client can specify dataset collections as inputs.
  • Documentation overhaul - API documentation is now available online.
  • Updated Tool Shed client defaults to reflect the fact that the main Tool Shed is now being served over HTTPS.

Galaxy IPython

And ... Björn Grüning and Helena Rasche also released Galaxy IPython:

We proudly present the first release of the Galaxy IPython project.

Galaxy IPython is a visualization plugin which should enable Galaxy users with coding skills to easily process their data in the most flexible way. With this plugin, it is possible to analyse and post-process data without downloading datasets or entire histories. One of our aims was to make Galaxy more attractive and accessible to bioinformaticians and programmers, and we hope that this project will build some bridges.

Disclaimer: Even though IPython notebooks can be stored and reused, this plugin will break the Galaxy philosophy of reproducibility. I personally feel bad about that, but I also think it is a great opportunity to get more bioinformaticians into Galaxy, and to get Galaxy used more often as a teaching resource. By making it possible to teach not only workflows but also the data analysis tasks that bioinformatics often requires, Galaxy will be significantly more useful in teaching environments.

Remember to write a nice Tool Shed tool if you catch yourself using IPython in Galaxy too often for the same task.

Galaxy IPython Screencast
[Screencast](https://www.youtube.com/watch?v=jQDyTuYnn1k)

A few features we have up and running:

  • Use IPython directly in the main window or in the Scratchbook
  • Completely encapsulated IPython environment with matplotlib, biopython, pandas and friends already installed.
  • IPython runs completely self-contained within a Docker container, separate from your Galaxy data
  • Easy access to datasets from your current history via pre-defined IPython functions
  • Manipulate and plot data as you like and export your new files back into your Galaxy history
  • Save IPython Notebooks across analysis sessions in your Galaxy history with the click of a button.
  • View saved IPython Notebooks directly in HTML format, or re-open them to continue your analysis.
  • Self-closing and self-cleaning IPython Docker container
  • Notebooks are secure: they are accessible only to the intended user

Please follow the installation instructions on our project page.

The underlying IPython Notebook (+Galaxy sugar) is hosted on GitHub and the Docker Registry.

You can also install an ipynb datatype:

Comments welcome!
Happy research!
Eric & Björn

Galaxy Community Hubs


There were no new Log Board or Deployment Catalog entries in August! Eek! Please don't let this happen again!

The Community Log Board and Deployment Catalog Galaxy community hubs were launched last year. If you have a Galaxy deployment or an experience you want to share, then please publish it this month.


Galaxy ToolShed

ToolShed Contributions

Galaxy Project ToolShed Repos

Here are new contributions for the past two months.

In no particular order:

Tools

  • From lparsons

  • From saskia-hiltemann

    • ireport: create interactive HTML reports from galaxy outputs.
  • From iuc

    • stringtie: fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.
    • vt_variant_tools: VT: a variant tool set that discovers short variants from Next Generation Sequencing data.
    • bcftools: Utilities for variant calling and manipulating VCFs and BCFs
    • gemini: GEMINI: a flexible framework for exploring genome variation
  • From urgi-team

  • From bgruening

    • antismash: AntiSMASH - rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters
  • From agordon:

  • From david-hoover:

  • From big-tiandm:

  • From devteam:

    • correlation: computes the matrix of correlation coefficients between numeric columns.
    • vcf2pgsnp: Convert from VCF to pgSnp format
    • pgsnp2gd_snp: converts a pgSnp dataset to gd_snp format, either starting a new dataset or appending to an old one.
    • ucsc_custom_track: Build custom track for UCSC genome browser
    • dna_filtering: Filter on ambiguities in polymorphism datasets
    • mine: Applies the Maximal Information-based Nonparametric Exploration strategy to an input dataset.
    • pearson_correlation: Computes Pearson's correlation coefficient between any two numerical columns. Column numbers start at 1.
    • generate_pc_lda_matrix: generate a matrix to be used for running the Linear Discriminant Analysis as described in Carrel et al., 2006 (PMID: 17009873)
    • count_gff_features: Counts the number of features in a GFF dataset.
    • column_maker: computes an expression for every row of a dataset and appends the result as a new column (field).
    • scatterplot: creates a simple scatter plot between two variables containing numeric values of a selected dataset.
    • plot_from_lda: generates a Receiver Operating Characteristic (ROC) plot that shows LDA classification success rates for different values of the tuning parameter tau as Figure 3 in Carrel et al., 2006 (PMID: 17009873).
    • histogram: computes a histogram of the numerical values in a column of a dataset.
    • lda_analysis: Perform Linear Discriminant Analysis
    • snpfreq: basic analysis of bi-allelic SNPs in case-control data, using the R statistical environment and Fisher's exact test to identify SNPs with a significant difference in the allele frequencies between the two groups
  • From arkarachai-fungtammasan:

    • microsatellite_ngs: Pipeline to profile and genotype microsatellites from short read data
  • From peterjc:

    • blast_rbh: BLAST Reciprocal Best Hits (RBH) from two FASTA files
  • From nikos:

  • From mcharles:

  • From alan-blakely:

  • From anton:

Data Managers

Suites

Packages / Tool Dependency Definitions

Other News