GalaxyAdmins

Two community-wide questionnaires were run in the fall of 2014. One asked Galaxy users about their experience, while the other focused on Galaxy admins, developers, and tool authors. The admin questionnaire is summarized below.

Big Picture

A total of 28 responses were received for the admin questionnaire.

How do you interact with Galaxy? # Responses
Wrap and integrate tools 27
Administer Galaxy 26
Extend Galaxy 13
Admin using CloudMan 7

About how many end users do you expect your Galaxy to have?

# Users # Responses in each range
< 10 7
10 - 49 8
50 - 99 4
100 - 199 4
200 - 499 1
> 500 1

What type of topology (and computational resources) are you using for Galaxy?

Server Topology # Responses
Standalone server 12
Compute cluster 10
Cloud 4
In transition 1
Other 1

Standalone Server

Compute Cluster

  • We run this on a Rocks cluster with about 300 cores.
  • 1st installation: 3500-node cluster
    2nd installation: 2500-node cluster
  • We have the BioTeam appliance, which started as a single standalone server. We just added a second exec host and are in the process of adding two more.
  • Running on a dedicated server, tied into the local UGE grid.
  • 128 cores, 750 GB RAM
    Running Ubuntu 14.04/SGE/PostgreSQL
  • 16 cluster nodes with 12 cores per node; the cluster is shared between Galaxy and users from the bioinformatics core facility and bioinformatics research groups.
  • https://docs.uabgrid.uab.edu/wiki/Cheaha#Hardware

Standalone Server with Compute Cluster

  • Dedicated server with most jobs run on a cluster via SGE (actually UGE).
  • Compute server with 32 cores and 192 GB RAM
    Database server with 16 cores and 64 GB RAM
    2 web front-end VMs

Cloud

  • Personal Galaxy running on national research cloud infrastructure; used for a research project on the reproducibility of genomics.
  • 7 cloud servers with 16 CPUs / 32 GB of RAM each.
  • CloudMan cluster. Head node: 16 cores, 2 web threads. 3 - 5 worker nodes (dynamic), 8 cores each.

In Transition

Other

  • VM: 4 cores, 8 GB RAM, currently running in a virtual machine. Will be upgrading to a standalone server this year and connecting to our compute cluster.

Storage

What is your approximate data storage capacity?

Terabytes # Responses
< 10 8
10 - 49 7
50 - 99 4
100 - 199 1
200 - 299 4
> 300 0

Storage Architecture

Capacity | Architecture/Vendor | Policies
10 TB | GPFS | No
500 GB | (no answer) | (no answer)
1 TB | Currently just ext3, but thinking about moving to a GlusterFS-based system | No
2 TB | btrfs | No quotas; small datasets, so it's not a problem
3 TB | ext4 on SATA hard drives from Dell; two 3 TB drives are mounted in RAID 1 | Manual procedures to manage storage
270 TB | ext4, ext3, NFS; hard drives are RAIDed using PERC from IBM (equivalent to RAID 6) | Yes
34 TB | (no answer) | No
20 TB | NetApp | No quotas; we attempt to police users manually (see the sketch after this table) and may need to resort to custom scripts, given that there are few to no facilities for administrative control of datasets
16 TB | Simple directly attached storage | Quotas and regular clean-up of deleted datasets
200 TB | (no answer) | None at this time
110 TB | Shared Linux storage using Gluster for all cluster users, including Galaxy | Galaxy user quotas with periodic manual review
30 TB | SAN | Yes, using Galaxy quotas; the default is 500 GB, and on request projects can get 1 TB and perhaps more. We need to develop procedures around staff leaving and archiving
70 TB | Local disk storage is 500 GB; we also have access to 325 TB of Isilon storage over NFS and 215 TB of Panasas parallel storage, which is being upgraded this year and will likely become several PB of GPFS storage | Yes, storage quotas are in place for users of the cluster and also in the Galaxy interface
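
One of the responses above mentions policing storage use manually and possibly resorting to custom scripts. A minimal sketch of such a script is below, assuming a PostgreSQL backend and the usual table names from Galaxy's schema (galaxy_user, history, history_dataset_association, dataset); the connection string is a placeholder, so verify all of it against your own instance before acting on the numbers.

    # Hypothetical per-user usage report for a Galaxy PostgreSQL database.
    # Table and column names are assumptions based on Galaxy's schema;
    # verify against your instance before relying on the output.
    import psycopg2  # pip install psycopg2

    QUERY = """
        SELECT u.email, SUM(d.total_size) AS bytes_used
        FROM galaxy_user u
        JOIN history h ON h.user_id = u.id
        JOIN history_dataset_association hda ON hda.history_id = h.id
        JOIN dataset d ON d.id = hda.dataset_id
        WHERE d.purged = false
        GROUP BY u.email
        ORDER BY bytes_used DESC
    """

    def main():
        # Adjust the connection string for your database server.
        conn = psycopg2.connect("dbname=galaxy")
        with conn, conn.cursor() as cur:
            cur.execute(QUERY)
            for email, bytes_used in cur.fetchall():
                # Note: a dataset shared into several histories is counted
                # once per association, so these totals are an upper bound.
                print("%s\t%.1f GB" % (email, (bytes_used or 0) / 1e9))

    if __name__ == "__main__":
        main()

Run from cron, this gives a rough usage league table that can be compared against whatever quota policy is in force.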

The Devil is in the Details

What types of customizations have you done with your Galaxy instance?

Customization Area # Responses
Tools 26
Authentication & user management 11
General admin 10
Usability and interface 8
Error handling 4
Other 4
None 0
  • Auth with internal LDAP server.
    Start/stop tooling without downtime.
  • Integrated with local AD.
    Customized tools for DNA sequence analysis and reporting.
  • Using LDAP authentication, and developed several tools and pipelines for internal processes.
  • Active Directory integration.
    Developing and testing tools locally for migration to tool sheds.
  • LDAP auth, HTTPS, local Tool Shed with locally developed tools.
  • Authentication and user management:
    • Customised text on the login page.
    • Disabled non-admin users from changing their public names and registered email addresses in "preferences" (user accounts are created by admin only).
    • Tools:
      • Have created a number of in-house tools for local use.
      • Have customised versions of tools taken from core Galaxy or the Tool Shed, e.g. to modify parameter defaults to suit local users.
    • Other: modified colours for the masthead etc. to reflect the institutional corporate colour scheme and to distinguish the appearance from Galaxy Main.
  • Shibboleth-based authentication in the Apache/mod_shib layer.
    • https://projects.uabgrid.uab.edu/galaxy/wiki/GalaxyApacheShibbolethConfiguration
    • Tools - we have copies of tool wrappers, for tools used in teaching, that have smaller resource profiles, so they get priority on the cluster for the very small datasets used when teaching classes.
    • Error handling - error integration with SGE is not very good in Galaxy. We have several cron jobs that scan the SGE logs and email the Galaxy admin when jobs are killed by the scheduler for exceeding their run-time or RAM allocations (see the sketch after this list). Standard Galaxy just marks these jobs as red, without noting the cause.
  • Fixing a URL-redirection XSS vulnerability.
  • Authentication is through Active Directory.
    • We have customised the landing page.
    • There are custom tools.
    • Runners and destinations have been configured.
    • Some customisations were done to allow remote job submission to an HPC cluster.
  • Error handling is done with Sentry, per dev suggestions.
  • Just added a tool to the Tool Shed (strelka) and installed some tools from the Tool Shed. Eventually we will install ANNOVAR manually (and maybe add it to the Tool Shed if we get enough time).
  • Scripts to delete old files of certain types. Added custom tools.
  • Lots of tool installations (Tool Shed) to support GVL tutorials and workflows.
    • Added custom GenomeSpace tools for long-term data storage.
  • Our server in Queensland uses a local UCSC genome browser mirror rather than going to the US.
    • We've developed and added some of our own tools.
    • We've attached indices via a Gluster file system.
    • Added fail2ban as a security measure.
  • Create tool wrappers for non-Galaxy tools.
  • Specific set of tools installed on image to support workflows
  • Redirection of the outputs of each tool into their own directory for each user, with a subdirectory for each of their histories, and customization of workflows to redirect all outputs of one workflow into the same subdirectory.
    • Adding new options to the forms of tools which don't show all possible options.
    • Changing the code of certain tools to improve graphical output and adding new options to manage graphics.
  • We have a growing collection of custom tools in a local tool shed.
    • I modified the workflow API to allow multiple parameters per step. I think this may have been fixed independently in a more recent version of Galaxy than the one that came with our appliance.
  • Many additional tools installed.
    • Occasional changes to improve error logging when debugging local issues.
  • Added tools and IGV display links for more data formats.
  • Stripped the user menu (described in a lightning talk at GCC2012).
    • Built several custom tools (manuscript in preparation).
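
One respondent above scans the SGE logs from cron and emails the Galaxy admin when the scheduler kills a job. A minimal sketch of that idea follows; the log location, the kill-message pattern, and the admin address are all assumptions to adapt to your scheduler's actual output, not a definitive implementation.

    # Hypothetical cron job: scan SGE/UGE log output for jobs killed for
    # exceeding run-time or memory limits, and email a summary to the admin.
    # Log path, regex, and addresses are assumptions; adjust to your site.
    import re
    import smtplib
    from email.message import EmailMessage

    SGE_LOG = "/var/spool/sge/qmaster/messages"   # assumed log location
    ADMIN = "galaxy-admin@example.org"            # hypothetical address
    # Assumed shape of a scheduler kill message; adjust to your SGE output.
    KILL_PATTERN = re.compile(r"job (\d+).*(h_rt|h_vmem|wallclock|memory)",
                              re.IGNORECASE)

    def find_killed_jobs(path):
        """Yield (job_id, line) for log lines that look like scheduler kills."""
        with open(path) as log:
            for line in log:
                match = KILL_PATTERN.search(line)
                if match:
                    yield match.group(1), line.strip()

    def main():
        killed = list(find_killed_jobs(SGE_LOG))
        if not killed:
            return  # nothing to report this run
        msg = EmailMessage()
        msg["Subject"] = "%d Galaxy job(s) killed by the scheduler" % len(killed)
        msg["From"] = ADMIN
        msg["To"] = ADMIN
        msg.set_content("\n".join("job %s: %s" % (jid, line)
                                  for jid, line in killed))
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    if __name__ == "__main__":
        main()

A production version would also remember how far into the log it has already scanned, so each cron run reports only new kills.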

Did you encounter any major issues when setting up and configuring Galaxy?

Issue Area # Responses
FTP 4
Migration 4
Cluster setup 4
Data setup 3
Security 2
None 2
Other 2
  • We still don't have a working local Tool Shed, and would like to get that working to have better control over local tools.
    • It's a bummer to have to restart everything as tools are updated.
  • When using external authentication (via LDAP), we cannot display the data at the UCSC genome browser.
  • It has been difficult to get Galaxy to play nicely with our grid. We use project-based billing and haven't gotten Galaxy to specify it per user, and tuning the number of jobs Galaxy can submit has been problematic: it either spams the grid or runs slowly. Also, communication between Galaxy and the grid leaves something to be desired; either grid jobs aren't deleted when the Galaxy process is cancelled, or Galaxy loses track of the running job.
    • For FTP, integration with LDAP was a problem with Galaxy; if the files had restrictive permissions, the import failed with an unhelpful error message because the file was owned by a user and couldn't be deleted when moved into Galaxy.
  • Tool Shed with HTTPS was tricky; I posted some documentation on this.
    • I've been surprised that for many tools I've had to manually install some dependency. It seems like that should all be handled by the Tool Shed.
    • I ran into lots of issues trying to get tests to run on the Tool Shed. I haven't tried again since the test system was revamped.
    • Data files are getting better, but too many times I've had to manually edit some .loc file. It would be really nice if there were a configurable set of indices built automatically whenever a new genome is added to Galaxy.
  • Data setup:
    • Managing .loc file references to reference data has to be done largely by hand (data managers still seem too limited, e.g. no way to delete or modify existing entries).
    • The data library admin interface can be a bit frustrating when linking to existing data; also, paths are written to the database, so the database needs to be manipulated directly, e.g. if a sysadmin changes a mount point location.
  • Our SGE cluster uses "shared nodes" - you get a fraction of a node. Most Galaxy tool wrappers seize all node RAM/CPU, which is considered "not playing well with others" - so we keep having to re-write wrappers to use only the resources that were allocated to them in the DRMAA string in universe_wsgi.ini. This leads to a lot of config-duplication headaches. We really want cores and RAM to be variables that are passed around and can even be overridden in the GUI by users on a per-job basis.
    • FTP - we don't use FTP itself, due to security concerns, but we do create a directory where users can copy their data via the command line for import. However, we have a lot of permissions errors depending on whether this directory is on the same filesystem as the datasets. See this posting: http://lists.bx.psu.edu/pipermail/galaxy-dev/2014-June/019614.html
    • As we upgrade Galaxy versions, we often have headaches from tool migrations. We also tried to move our Galaxy, and found that a lot of the Tool Shed installs of underlying packages patch ABSOLUTE PATHS into the env.sh files, and possibly the executables themselves, making these directories very difficult to move without re-doing the install!
  • The sessions were not encrypted.
  • One of the biggest issues has been keeping Galaxy up to date over the past 3 years.
  • Migration sucks, period. Conf files are still maintained as monolithic objects rather than composed of constituent parts, resulting in me having to diff them on every upgrade.
    • Tools are painful to install from the Tool Shed due to lack of interactivity during the installation ("oh, it failed; guess I have to go dig through hundreds of lines of uwsgi.log and grep out web access to figure out what command failed") and during upgrades ("hey, 50 tools have upgrades. Enjoy updating them one at a time").
  • Tool shed repos are of inconsistent quality.
  • Some tools were not managing dependencies properly, but that is not a Galaxy problem.
    • We are currently trying to move our Galaxy to a local cluster using Hadoop.
  • Getting all tools to install correctly. Another issue has been the lack of backwards compatibility of tool sheds with older Galaxy versions.
  • We did have some difficulty migrating our user base from one Galaxy server to another. There were problems with the tool database.
    • Since we don't have the luxury of updating Galaxy regularly, we are running an older version. For some tools in the main Tool Shed, when we try to install them via the admin interface we get Python errors and cannot proceed.
  • Tool setup and installing dependencies were complicated, but recent Tool Shed improvements have made the process easier.
  • The Genomics Virtual Lab project in Australia (http://genome.edu.au) has spent a lot of time sorting out these issues.
  • It's sometimes very difficult to discover where we must add the path to the data a tool needs: data tables, .loc files (sometimes several files for one tool), etc.
  • There are many things I'd like to automate that just aren't supported yet in the API. The only one that was critical enough to tackle myself (multiple parameters for workflow steps) seems to have been fixed. The next on my list is permissions. I'd like to be able to automatically publish some histories (see the sketch after this list).
    • Managing tools is still somewhat of a headache. The Tool Shed is a huge improvement, but I still hit problems regularly. We hit a pretty big bug this spring when we upgraded that broke any tools that needed to compile software. It turned out to have been fixed already in the development code, but it took some time to track down.
    • My current tool headache is that I'm unable to install the Mira4 tool from the test Tool Shed. I haven't had time to debug that one yet.
    • Still, I'm happy with the tool sheds in general; they just need a little work.
  • Minor issues with cluster integration: setting up the Galaxy server as a submit node, remembering to restart Galaxy after cluster software updates, etc.
    • Minor issues when we migrated from data storage on the Galaxy server RAID to common Gluster.
    • FTP setup did not work with the instructions provided in the wiki, although I used the same FTP server software; I had to play around with multiple settings to find one that worked.
  • Never got ProFTPd configured, so we rely on browser upload for small files and admin transfer to data libraries.
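
One respondent above would like to automate history publishing through the API. A minimal sketch, assuming a Galaxy release whose /api/histories/{id} update endpoint accepts a "published" flag (API behaviour varies across releases, so check yours first); the URL, API key, and history id below are placeholders.

    # Minimal sketch: publish a Galaxy history through the REST API.
    # Assumes the /api/histories/{id} update endpoint accepts a "published"
    # flag; verify against your Galaxy version's API documentation.
    import requests  # pip install requests

    GALAXY_URL = "https://galaxy.example.org"   # your instance
    API_KEY = "..."                             # the history owner's API key

    def publish_history(history_id):
        """Set published=True on a history the key's owner can modify."""
        response = requests.put(
            "%s/api/histories/%s" % (GALAXY_URL, history_id),
            params={"key": API_KEY},
            json={"published": True},
        )
        response.raise_for_status()
        return response.json()

    if __name__ == "__main__":
        print(publish_history("1cd8e2f6b131e891"))  # hypothetical history id

The same pattern (PUT a small JSON payload against the resource) should apply to any other history attribute the update endpoint accepts.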

What limitations have you experienced in hosting Galaxy and satisfying end user requirements?

Limitation area # Responses
Tool definitions 12
Capacity 10
User history and dataset handling 7
Workflow 6
Authentication, User, and Job management 3
None or minor 3
Error handling 2
Other user interface 2
Other 2
  • Users have a hard time keeping data organized by input file.
    • We really need some method to archive data as users fill their quotas.
  • We don't have enough time/resources to develop our own tools (making the required local adjustments) to satisfy all the user wishes.
  • Capacity is mainly to do with dataset deletion; we haven't found a way to purge only datasets that haven't been used as an input in N days.
    • Workflow is an issue because users have to know to import global workflows into their own account and then go to the workflows screen. They expect to have them show up in the list of components on the main screen, as all the individual tasks do. It would be really nice to have workflows be able to be exposed directly as a component.
  • It doesn't make coffee for me. Otherwise it covers most of our needs for users.
  • We do a bunch of RNA-seq work. The Galaxy tools are about a generation behind what people are actually using.
  • Capacity: as both cluster and storage are very limited here this has been frustrating for some users, and has hampered take-up.
    • Tool definitions: local users don't always agree with defaults or want minor additions to 3rd party tools, both of which have to be done in local copies.
  • Biggest headaches are:
    1. CPU/RAM specification is not well designed; it requires redundancy.
    2. Tool wrappers not exposing data, such as the TopHat wrapper throwing away unmapped reads (which we want to analyze).
    3. Not very good capture of errors or resource usage from the cluster. When things fail, the failure reasons are not well captured; when they succeed, run time, queue time, and actual RAM used are not captured or displayed to the end user or admin.
  • Earlier there were storage capacity issues, but these have been resolved.
    • A lot of work has been put into making certain tools work.
  • The one-workflow/one-owner model is problematic; there is no way for those who've received a copy of a workflow to update it without re-importing.
  • Tool data location definitions are cumbersome. Tool Shed installations are not smooth and require admin interaction. Tool errors are sometimes inconsistently caught, but that may be the fault of the Tool Shed owner.
  • I find the tool interface very limited, unless we can use HTML and CSS? I have not tried this yet.
    • For example, if we could put the different elements in two separate columns, put a colour on the background, or whatever.
  • Galaxy unnecessarily stores intermediate files, which exacerbates our storage issues. Also, it is very difficult to save the workflow that generated a history without also saving the history. History import is buggy. Workflows are nice, but not flexible enough to create, for example, a generic RNA-Seq workflow that can work with various numbers of samples and experimental designs.
  • Non-IUC tools are not very reliable. Lots have been abandoned in tool sheds.
  • Managing users' data is difficult. It's hard to tell which files belong to whom. Purging old histories etc. is not straightforward.
    • Jobs submitted by Galaxy to SGE don't easily allow for accounting.
  • Our users want at least the same storage quotas they get from the public Galaxy servers. We can't provide that yet. We have one group trying to get their tools working on the latest version of Galaxy, but they were written back in 2009. It is difficult to get them to understand that we can't just put the XML files in place and have everything work. This is a HUGE misunderstanding by scientists and PIs here!
  • No loops or conditional loops are possible in workflows; we must create a specific tool which implements the loop internally in order to incorporate it into a workflow.
  • Capacity:
    • As I mentioned above, it comes with the territory, but it would be nice to have more tools to handle capacity. Another thing I've wished for is the ability to archive whole histories. I'd like to keep our older histories around, but not necessarily on our best storage server. It's nearly impossible to manually go into Galaxy's files and pull out the necessary ones for a given history.
    • It'd be great if Galaxy could have a configured "archive" location. Then, if a history were "archived", all its files could be moved there. Alternatively, the ability to just download entire histories as some sort of structured archive that could be re-uploaded in the future if needed (see the sketch at the end of this list).
    • Tools:
      • We're always wishing for tools that don't exist, and we've had a string of minor hurdles installing existing tools. Still, it gets the job done.
    • Histories/Datasets:
      • As above, I'd like some sort of archiving ability and the ability to remove intermediate files from disk without losing the record.
      • Additionally, the interface for getting datasets from multiple histories into one new history is very cumbersome. An API interface for copying datasets between histories would help here.
  • Initially experienced capacity limitations due to local RAID storage, since mitigated by moving Galaxy's storage to the main shared Gluster.
    • Most tools of local interest required tool wrappers to be written.
    • Users have been observed to struggle with naming datasets and histories, due both to unhelpful defaults and to a feeling of "drowning in data" once they build up a collection of many histories/datasets.
    • Desire to move to integrated user management leveraging existing IT Windows/Linux accounts, in part to run cluster jobs as specific Linux users.
    • Currently the main limitation is staff time to manage the system and provide user help/training.
  • We do a lot of data analysis outside of Galaxy, and it's cumbersome to get data in and out without losing track.
  • I didn't realize workflows would break with newer tool versions (a minor irritant).
    • Explored using the API to pass information to other applications, but didn't resolve it.
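
The archiving wish in the capacity item above maps closely onto Galaxy's history-export machinery. A minimal sketch using the BioBlend client library, assuming Galaxy and BioBlend versions that support history export; the URL, API key, and history id are placeholders.

    # Minimal sketch: export a history to a portable archive and download it,
    # using the BioBlend client library (pip install bioblend). Assumes a
    # Galaxy version with the history-export API; verify on your instance.
    from bioblend.galaxy import GalaxyInstance

    GALAXY_URL = "https://galaxy.example.org"   # your instance
    API_KEY = "..."                             # the history owner's API key

    def archive_history(history_id, out_path):
        """Kick off a history export, wait for it, and save the archive."""
        gi = GalaxyInstance(url=GALAXY_URL, key=API_KEY)
        # export_history returns the export id once the archive is ready.
        jeha_id = gi.histories.export_history(history_id, wait=True)
        with open(out_path, "wb") as out:
            gi.histories.download_history(history_id, jeha_id, out)

    if __name__ == "__main__":
        # Hypothetical history id; the archive can later be re-imported.
        archive_history("1cd8e2f6b131e891", "history_archive.tar.gz")

The resulting archive can sit on cheaper storage and be re-imported into a Galaxy instance later, which is close to the "configured archive location" the respondent asks for.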