Support

What's New? Watch GXYcast1
Learning Galaxy? Get started with Galaxy NGS 101
Ready for your test drive? Follow the 101 Tutorials here http://usegalaxy.org/galaxy101


End-User Support Resource Short List

Contents

  1. Using Galaxy
    1. Galaxy NGS 101
    2. Learning Hub
    3. Screencasts
    4. Custom Searches
    5. Biostar
    6. Mailing Lists
    7. IRC Channel
    8. Galaxy Issue Board
    9. Multiple Galaxies
    10. About Galaxy
  2. Solutions
    1. Tool help
    2. Getting an account
    3. Finding a tool
    4. Loading data
      1. Get Data: Upload
      2. Get Data: EBI-SRA
      3. Get Data: Upload tool used with FTP
      4. Upload tool location
      5. Upload tool option to move FTP datasets into a History
      6. Upload tips
    5. Downloading data
      1. Download tip: Big data
    6. Dataset and History Guides
      1. Get Registered
      2. Example: My History is missing!
      3. Example: My account quota is too large
      4. Example: Dataset metadata missing or incomplete
    7. Dataset status and how jobs execute
      1. Green
      2. Yellow
      3. Grey
      4. Red
      5. Light blue
      6. Grey, Yellow, Grey again ???
      7. Bright blue with moving arrow (deprecated)
    8. Error from tools
      1. Troubleshooting tool errors - Review THIS before submitting bug reports
      2. Job failure reason: cancelled by admin or a cluster failure
      3. Job failure reason: exceeds memory allocation
      4. Job failure reason: execution exceeds maximum allowed job run time (walltime)
    9. Tool doesn't recognize dataset
    10. Dataset special cases
      1. FASTQ Datatype QA
      2. Tabular/Interval/BED Datatype QA
    11. Reference genomes
      1. Detecting Genome Mismatch Problems
      2. Correcting Chromosome Identifier Conflicts
      3. Avoiding Genome Mismatch Issues
      4. Reference Genomes and GATK
    12. Shared and Published data
    13. Reporting tool errors
    14. Interpreting scientific results
      1. Tools on the Test server
      2. Tools on the Main server: RNA-seq
    15. Custom reference genome
      1. Videos
      2. Best Practices
      3. Quick genome access
      4. Tools on the Main server: Extract DNA
      5. Tools on the Main server: GATK
    16. Start a Cloudman Server
  3. Community Q & A
    1. What to include in a question
    2. Starting a scientific, data, or tool usage thread
    3. Starting a technical tool, local/cloud instance, or development thread
    4. Reporting a software bug

Using Galaxy

Galaxy NGS 101

The Big Picture: Galaxy NGS 101

Learning Hub

See our Learning hub for key coverage of Galaxy user interface concepts, data, and tools. Review "Shared Data → Published Pages" on the Main server usegalaxy.org for publication supplementals and tutorials. Also see the End-User Support Resource Short List section at the top of this wiki.

Screencasts

Screencast videos walk step by step through a range of topics. Packed with tips and methods usable across analysis workflows, plus presentations and tutorials for administrators, these are a great resource for both the scientific and technical Galaxy communities.

Watch most videos at Galaxy Project on Vimeo.

Custom Searches

Looking for something specific? Try Galaxy Custom Searches with a keyword or phrase. The MailingList search finds all related prior Q & A from the Galaxy Biostar forum and any Galaxy Project mailing lists. The UseGalaxy search finds all online resources for information about using Galaxy. This includes this wiki, tool help and shared Galaxy objects at UseGalaxy.org, and Mailing Lists.

Biostar

Galaxy has teamed up with Biostar to create a Galaxy User support forum at https://biostar.usegalaxy.org.

We want to create a space where researchers using Galaxy can come together and share both scientific advice and practical tool help. Whether on http://usegalaxy.org, a Cloudman instance, or any other Galaxy, if you have something to say about Using Galaxy, this is the place to do it!

Mailing Lists

Galaxy has one public mailing list for questions, one private mailing list for bug reports, and one announcement mailing list. Please do not post questions through the Galaxy Issue Board; these will only be redirected. Manage subscriptions and learn more about these lists at the Mailing Lists home page. See also:

Note: The galaxy-user mailing list has been retired. Galaxy scientific user and tool help has moved to Galaxy Biostar.

IRC Channel

Galaxy IRC Channel
Server: irc.freenode.net  
Channel #galaxyproject

Galaxy also has an IRC channel in which you can participate. You can connect to the chat directly via browser here. This IRC channel is an informal online gathering place for the Galaxy community to post questions and help each other out. If you are unfamiliar with IRC, it is conducive to quick discussion, much like any other casual chat program. There is also lots of online help about IRC.

The #galaxyproject IRC channel has an online public archive (starting 2014/10/22) and these archives are included in the Galaxy search engines.

Galaxy Issue Board

The Galaxy Project uses Trello for issue tracking and feature requests. The Galaxy Issue Board supports issue creation, commenting, and voting on issues.

Multiple Galaxies

Some researchers use more than one Galaxy server. How to move data between these is described here.

This procedure works between any two Galaxy instances, whether the Main public instance, a Local Galaxy, a Cloud Galaxy, or one of the many Public Galaxy Servers.

About Galaxy

GalaxyProject.org
Galaxy is an open, web-based platform for data intensive biomedical research.

Galaxy's goal is to be accessible, reproducible, and transparent.
  • Accessible: Users without programming experience can easily specify parameters and run tools and workflows.

  • Reproducible: Galaxy captures information so that any user can repeat and understand a complete computational analysis.

  • Transparent: Users share and publish analyses via the web and create Pages, interactive, web-based documents that describe a complete analysis.

There are many Choices when you want to learn more about using Galaxy!

In addition to using the public Galaxy server, you can also install your own instance of Galaxy, or create an instance of Galaxy on the cloud using CloudMan and explore all Cloud resources including AWS in Education grants. Another option is to use one of the ever-increasing number of Public Galaxy Servers hosted by other organizations. And if you or your lab is looking to run your own local instance that is "ready-to-go", learn more about the Slipstream Appliance: Galaxy Edition.

The GALAXY Framework at the highest level is a set of reusable software components. Learn more about Galaxy's Architecture.

The Galaxy Team is a part of the Center for Comparative Genomics and Bioinformatics at Penn State, and the Department of Biology at Johns Hopkins University.

Galaxy is supported in part by NSF, NHGRI, the Huck Institutes of the Life Sciences, and The Institute for CyberScience at Penn State, and Johns Hopkins University.

The public Main instance of Galaxy at http://usegalaxy.org utilizes infrastructure generously provided by the iPlant Collaborative at the Texas Advanced Computing Center, with support from the National Science Foundation.



Solutions

Tool help

Galaxy has a simplified tool interface packed with usage details. Read more...

Getting an account

Having your own account on the public Test and/or Main server means that you can save histories, work with more data, associate an OpenID, and get the most out of Galaxy's functionality. Be sure to note that the public Test and Main instance usage policies are one account per user, as stated in our Terms and Conditions. Also, make sure your email address is valid so that you can confirm your new account (emails are case sensitive) and so that our administrator can contact you if needed (rare, but you'll want the email!). More details here.

Watch the Accounts on Main video for a quick how-to and see our User Accounts wiki for more help.

Finding a tool

At the top of the left tool panel, type a tool name or datatype into the tool search box. Shorter keywords find more choices. Can't find the tool you want? Try looking in the Tool Shed. New tools that can be used in local or cloud Galaxy instances are added all the time.

Loading data

Data is loaded using the tools in the Get Data tool group. Some access specific data provider sites that will load data back into your Galaxy history. To directly load your own local data or data from another source, use the tool Get Data → Upload File (also accessible from the top of the left tool panel, as seen in the graphics below). Want to practice import/export functions with small sample data? Import the Upload sample data history here.

  • Watch the Getting Data In Screencasts

  • Each file loaded creates one dataset in the history.
  • The maximum size limit is 50G (uncompressed).
  • Most individual file compression formats are supported, but multi-file archives are not (.tar, .zip).

Get Data: Upload

  • Load by "browsing" for a local file. This is only suitable for small datasets (< 2G, and it often works best with smaller files). If you are having problems with this method, try FTP.

  • Load using an HTTP URL or FTP URL.

  • Load a few lines of plain text.
  • Load using FTP, either from the command line or with a desktop client.

Get Data: EBI-SRA

  • Search for your data directly in the tool and use the Galaxy links
  • Visual example

  • Be sure to check your sequence data for correct quality score formats and the metadata "datatype" assignment. Here is how...

Get Data: Upload tool used with FTP

  • Load the data using command-line FTP or a client. More help...

  • Note that the FTP server name is specific to the Galaxy you are working on. This is by default the URL of the server.
    • For the public Main Galaxy instance at http://usegalaxy.org the FTP server name to use is usegalaxy.org.

    • For a default local (with FTP enabled, see next) the FTP server name to use is localhost:8080. If the server URL was modified, then use that custom URL.

  • If on another server, the FTP server name will appear in the Upload tool pop-up window (see graphics below). When using a local Galaxy server, be certain to configure your instance for FTP first.

  • Log in with the email and password of your account on that same instance, so that the data is saved to your account.
  • Once the data is loaded (confirm through FTP client), use the Upload tool to load the data into a History.
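
As a rough sketch of the command-line route described above, the following shell function wraps an FTP upload with curl. The function name galaxy_ftp_upload, the DRY_RUN switch, and the example file and email address are illustrative assumptions and are not part of Galaxy; the default server name is the one given above for the Main instance. When only the account email is passed to --user, curl prompts for the password, which keeps it out of the shell history.

```shell
#!/bin/sh
# Hypothetical helper: upload one file to a Galaxy FTP server with curl.
# Usage: galaxy_ftp_upload FILE EMAIL [SERVER]
# Set DRY_RUN=1 to print the curl command instead of running it.
galaxy_ftp_upload() {
    file=$1
    email=$2
    server=${3:-usegalaxy.org}   # FTP server name of the Main instance

    # A URL ending in "/" makes curl keep the local file name on the server.
    set -- curl --upload-file "$file" --user "$email" "ftp://$server/"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 galaxy_ftp_upload reads.fastq.gz you@example.org prints the command without contacting the server. If a server is configured to require encrypted FTP (an assumption to check with its administrator), curl's --ssl-reqd option may also be needed.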

Upload tool location

Upload tool option to move FTP datasets into a History

If you DO NOT see any files as in the example below, load data using FTP first, then come back to the Upload tool.

Upload tips

  • Data quota is at limit, so no new data can be loaded. Disk usage and quotas are reported at User → Preferences when logged in.

  • Password protected data will require a special URL format. Ask the data source. Double check that it is publicly accessible.

  • Use FTP, not SFTP. Check with local admin if not sure.

  • No HTML content. The loading error generated may state this. Remove HTML fields from your dataset before loading into Galaxy or omit HTML fields from the query if importing from a data source (such as Biomart).

  • Compression types .gz/.gzip, .bz/.bzip, .bz2/.bzip2, and single-file .zip are supported.

  • Only the first file in any compressed archive will load as a dataset.

  • Data must be < 50G (uncompressed) to be successfully uploaded and added as a dataset to a history, from any source.

  • Is the problem the dataset format or the assigned datatype? Can this be corrected by editing the datatype or converting formats? See Learn/Managing Datasets for help or watch the screencast above for a how-to example.

  • Problems in the first step working with your loaded data? It may not have uploaded completely. If you used an FTP client, the transfer message will indicate if a load was successful or not and can often restart interrupted loads. This makes FTP a great choice for slower connections, even when loading small files.
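
Some of the tips above can be checked locally before starting an upload. The sketch below is one possible way to do that in a shell; the function names are hypothetical, and the byte value used for "50G" (50 × 1024^3) is an assumption, since the limit above is not stated in bytes.

```shell
#!/bin/sh
# Hypothetical pre-upload checks. The byte value chosen for the 50G limit
# (50 * 1024^3) is an assumption made for illustration.
MAX_UPLOAD_BYTES=53687091200

# Write the uncompressed content of a file to standard output.
decompressed() {
    case $1 in
        *.gz|*.gzip)               gzip -dc "$1" ;;
        *.bz|*.bzip|*.bz2|*.bzip2) bzip2 -dc "$1" ;;
        *)                         cat "$1" ;;
    esac
}

# Print the uncompressed size of a file in bytes.
uncompressed_bytes() {
    decompressed "$1" | wc -c | tr -d ' '
}

# Succeed (exit status 0) if the first lines of the content look like HTML.
looks_like_html() {
    decompressed "$1" | head -n 20 | grep -qiE '<!doctype html|<html'
}

# Report the first problem found, or "ok" with the uncompressed size.
check_before_upload() {
    size=$(uncompressed_bytes "$1")
    if [ "$size" -gt "$MAX_UPLOAD_BYTES" ]; then
        echo "too large: $size bytes uncompressed"
        return 1
    fi
    if looks_like_html "$1"; then
        echo "contains HTML"
        return 1
    fi
    echo "ok: $size bytes uncompressed"
}
```

Running check_before_upload on a file prints a single line. An HTML result often means that an error page from the data source was saved in place of the data.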


Downloading data

  • Download datasets by clicking on the disk icon inside the dataset. Good for smaller sizes in all browsers.

  • Download entire histories by selecting "Export to File" from the History menu, and clicking on the link generated.

  • Transfer entire histories by selecting "Export to File" from the History menu, generating the link, and copying the link in the "from" Galaxy instance. Then, in the "to" Galaxy instance, select "Import from File" from the History menu and paste the link into the new form.

  • The video Datasets 1 includes help about different datatypes and what to expect in the download icon (one file or two!).

Download tip: Big data

  • How can I download larger datasets?

Browser option: use Google Chrome and click on the disk icon (this browser supports continuous download).

Utility option: from a shell/unix/terminal window on your computer use wget or curl.

The link can be obtained by right-clicking the floppy disk icon inside a history item and choosing "Copy Link Location" (for most datasets) or "Download Dataset/Download bam_index" (for BAM datasets there are two downloads). Once you have the <link>, type one of the commands below (where "$" indicates the terminal prompt and outfile is the name of the local file to write), keeping the <link> inside single quotes. Both commands have many options; these are examples commonly used with Galaxy.

  $ wget -O outfile '<link>'
  $ wget -O outfile --no-check-certificate '<link>' # ignore SSL certificate warnings
  $ wget -c '<link>'                        # continue an interrupted download

  $ curl -o outfile '<link>' 
  $ curl -o outfile --insecure '<link>'     # ignore SSL certificate warnings
  $ curl -C - -o outfile '<link>'           # continue an interrupted download

Dataset and History Guides

Review details about these Galaxy objects, plus Workflows and Visualizations in the Learn wiki

Get Registered

  • From the top User menu, select Register

  • More details are here: Getting an account

  • Registered accounts work with multiple Histories, increased quotas (both data and job/tool access), and have access to full Galaxy functionality.
  • A valid email address is the only piece of identifiable information that you share for registration
  • We never, ever, share your Registration information
    • On rare occasions we might send an important administration email about your account
    • These are emails that you definitely want to get

    • Use a valid email address, not only to confirm your account, but also to ensure that we can communicate with you when really necessary
  • Make sure to Register for an account when doing any serious work

    • Register at any time, even in the middle of an analysis. The current History will be added
    • Log into your existing account. The current History will be added
    • Please follow the User quotas for the Galaxy server in use. For http://usegalaxy.org, this is one account per user.

Example: My History is missing!

  • If you were working with an unregistered account, the History may really be lost to you. Get Registered before starting again!

  • Find any History

    • Locate the History menu gear icon at top of right History panel in the "Analyze Data" view
    • Select the option "Saved Histories" from this pull-down menu
    • At the top of the list in the middle panel, click into "Advanced Search"
    • Select "status: all" to see all of your active, deleted, and permanently deleted histories.
      • Histories in all states are archived for a fairly long time for registered accounts. This means you can usually find your data here if it ever appears to be "lost".

Example: My account quota is too large

Example: Dataset metadata missing or incomplete

  • How to notice if this is a problem

    • The dataset will not download when using the disk icon
    • Tools error when using a specific dataset that has been used before successfully
    • Tools error with a message that ends with: OSError: [Errno 2] No such file or directory. Note that not all failures of this type are due to metadata and may simply be a cluster failure - rerunning the job may resolve the problem instead, but try the solution first.

  • Solution

    • Reset the metadata on the dataset(s). This may be an uploaded dataset or one created by prior tools. It could be one of the input datasets to a failed job.
      • How to: Click on the Auto-detect button found near the bottom of the Edit Attributes form for the dataset. Reach this form using the dataset's pencil icon.

  • If this does not resolve the problem

    • If resetting metadata does not fix the issue, then there may have been a transient cluster job failure. Re-run the job at least once.
  • Other problematic dataset solutions are listed here, but these are not based on the same underlying issue.


Dataset status and how jobs execute

The Galaxy user interface (UI) has been designed to communicate job execution status through visual cues and concise messages. Learn more about how to identify these cues by examining what Datasets in different states look like.

When a tool is executed, one or more new datasets are added to a history. The same is true when a workflow is executed. If using the public Main Galaxy instance, the most effective strategy when running jobs on the shared resource is to start jobs (or workflows), and then leave them alone to execute until completion.

When work is urgent during peak-usage times on the public Main Galaxy instance, a CloudMan instance is a quick-to-implement alternative. For large scale and/or urgent ongoing work, a CloudMan, Local, or SlipStream Galaxy each have advantages as a longer-term solution. Read more ...

So, how does the processing of tool jobs on Main actually work?

  • The color of a dataset designates the current status of the underlying job.

Green

  • The job completed successfully.
  • The resulting data is ready to be used in visualizations, available as input to tools, can be downloaded, or utilized for any other downstream purpose.

Yellow

  • The job is executing. Allow this to complete!
  • If you are using the public Main Galaxy instance, this job is running on one of our clusters. Different types of tools send jobs to different clusters appropriate for the requirements of each tool. Some tools are more compute intensive than others, and significant resources are dedicated to job processing. Jobs have up to 72 hours to complete; if they run longer than this, they will fail with a "wall-time" error and turn red. Examining tool parameters is the first option: less sensitive parameters may produce an equally acceptable result while using fewer resources. If that is not appropriate or does not succeed, a CloudMan Galaxy or Local Galaxy with sufficient resources may be the solution.

Grey

  • The job is being evaluated to run (new dataset) or is queued. Allow this to complete.
  • If you are using the public Main Galaxy instance, this job is queued, waiting for an opening on the appropriate cluster. It is very important to allow queued jobs to remain queued, and to not delete/re-run them. If re-run, this not only moves the new job back to the end of the queue, effectively lengthening the wait time to execute, but if done repeatedly, the volume of "executing deleted" jobs can create additional work processes in the history as these are cleared away, using up resources, and can cause additional delays.

Red

  • The job has failed.
  • There can be many reasons for this, see the next section, Error from tools for details.

Light blue

  • The job is paused.
  • This indicates either an input has a problem or that you have exceeded disk quota set by the administrator of the Galaxy instance you are working on.

  • If there is an input problem, correct the problem (often by re-running an upstream job) and click on the tool form option to "resume dependencies". You will not need to stop or restart downstream jobs in most cases (this method permits paused jobs to start as input datasets become available).
  • If you need to make room, permanently delete unneeded data. If you are using the public Main Galaxy instance, disk quotas are defined here. You will not need to delete/re-run jobs while doing this, unless you are filtering your work to prevent exceeding quota again (only purging, not restarting at this time). Instead, restart using the History menu option "Resume Paused Jobs".

Grey, Yellow, Grey again ???

  • The job is waiting to run, due to an admin re-run or an automatic fail-over to a longer-running cluster (currently, Stampede)
  • First, see the descriptions for grey and yellow jobs above.

  • The job was first submitted to the default cluster, but did not finish within the "wall-time" quota. Instead of failing, the job was automatically submitted to the long-running cluster Stampede. This cluster offers the job more execution time. The wait may be longer, since jobs run on this cluster by other users are also executing for a longer time period.

  • Stopping (deleting) the job and then restarting places it back at the end of the first queue, where the cycle will begin again, extending wait time even further. Please do not do this. Allow the job to process.
  • If the job fails after running on Stampede, then it is too large to run on http://usegalaxy.org, also known as "Main". Choose another strategy to execute your job on a different Galaxy platform or consider modifying inputs/parameters to make the job less compute intensive.

  • Read more about the details of different clusters on the Main wiki.

Bright blue with moving arrow (deprecated)

  • May be found in earlier Galaxy versions.
  • Applies to "Get Data → Upload File" tool only - the upload job is queuing or running

  • The job may run immediately, or may turn grey if the server is busy, meaning that guidelines for grey jobs apply, and these grey datasets should never be deleted/re-run, for the same reasons explained above.

  • An upload job that seems to stay in the "bright blue with moving arrow" state for a very long time generally indicates that the file being loaded is too large for the method used (specifically, a browsed-file upload) and FTP should be used instead. This is the only active job that should be deleted under normal usage, as it will never complete (no file over 2G will ever load via file browser upload).

Error from tools

Dataset format problems are the #1 reason that tools fail. Most likely this problem was introduced during the initial data upload. Double check the dataset against Galaxy's datatypes or external specifications. In many cases, the format issues can be corrected using a creative combination of Galaxy's tools.

Troubleshooting tool errors - Review THIS before submitting bug reports

  • Verify the size/number of lines or md5sum between the source and Galaxy. Use Line/Word/Character count of a dataset or Secure Hash / Message Digest on a dataset to do this.

  • Look at the end of your file. Is it complete? Are there extra empty lines? Use Select last lines from a dataset with the default 10 to check.

  • Check errors that come from tools such as the FASTQ Groomer. Many tools report the exact problem with exact instructions for corrections.

  • Is the format to specification? Is it recognized by Galaxy? By the target tool or display application? Check against the Galaxy Datatypes list.

    • Are you using a Custom Reference Genome? Have you tried the quick Troubleshooting tips on the wiki?

    • Note: not all formats are outlined in detail as they are common types or derived from a particular source. Read the target tool help, ask the tool authors, or even just google for the most current specification.
  • Is the problem the dataset format or the assigned datatype? Can this be corrected by editing the datatype or converting formats? Often a combination of tools can correct a formatting problem, if the rest of the file is intact (completely loaded).

  • Is the problem a scientific or technical problem? Also see Interpreting scientific results to decide.

    • Example NGS: Mapping tools: On the tool form itself is a short list of help plus links to publications and the tool author's documentation and/or website. If you are having trouble with Bowtie, look on this tool's form for more information, including a link to this website: http://bowtie-bio.sourceforge.net/index.shtml.

    • Example NGS: RNA Analysis tools: See the galaxy-rna-seq-analysis-exercise tutorial and transcriptome-analysis-faq. If these do not address the problem, then contacting the tool authors is the next step at: mailto:tophat.cufflinks@gmail.com.

    • Example NGS: SAM Tools tools: SAMTools requires that all input files be to specification (Learn/Datatypes) and that the same exact reference genome is used for all steps. Double checking format is the first check. Double checking that the same exact version of the reference genome is used is the second check. The last double check is that the number of jobs and size of data on disk is under quota. Problems with this set of tools are rarely caused by other issues.

  • Tools for fixing/converting/modifying a dataset will often include the datatype name. Use the tool search to locate candidate tools, likely in tool groups Text Manipulation, Convert Formats, or NGS: QC and manipulation.

  • The most commonly used tools for investigating problems with upload, format and making corrections are:
    • TIP: use the Tool search in top left panel to find tools by keyword

    • Edit Attributes form, found by clicking a dataset's pencil icon

    • Convert Format tool group

    • Select first lines from a dataset

    • Select last lines from a dataset

    • Line/Word/Character count of a dataset

    • Secure Hash / Message Digest on a dataset

    • FASTQ Groomer

    • FastQC - How to read the report

    • Tabular to FASTQ, FASTQ to Tabular

    • Tabular to FASTA, FASTA to Tabular

    • FASTA Width formatter

    • Text Manipulation tool group

    • Filter and Sort tool group
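
The first two checks in this list compare a dataset in Galaxy against its source file. As a minimal sketch, the local side of that comparison could be computed in a shell as below and then compared with the output of Line/Word/Character count and Secure Hash / Message Digest in Galaxy. The function names are hypothetical; md5sum is assumed to be available, with the BSD/macOS md5 command as a fallback.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a local source file with the Galaxy dataset.

# Print line count, byte count, and MD5 checksum of a file on one line.
dataset_fingerprint() {
    lines=$(wc -l < "$1" | tr -d ' ')
    bytes=$(wc -c < "$1" | tr -d ' ')
    if command -v md5sum >/dev/null 2>&1; then
        md5=$(md5sum < "$1" | awk '{print $1}')
    else
        md5=$(md5 -q "$1")   # BSD/macOS fallback
    fi
    printf 'lines=%s bytes=%s md5=%s\n' "$lines" "$bytes" "$md5"
}

# Print the number of empty or whitespace-only lines at the end of a file.
trailing_blank_lines() {
    awk 'NF == 0 { n++; next } { n = 0 } END { print n + 0 }' "$1"
}
```

If the values differ between the source file and Galaxy, the upload was probably incomplete or the file was altered in transit. A non-zero result from trailing_blank_lines points at the extra empty lines mentioned above, and tail -n 10 on the file shows whether the last record is complete.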

Job failure reason: cancelled by admin or a cluster failure

The initial error message will be reported as below, and is found in the comments of a failed dataset (red dataset):

This job failed because it was cancelled by an administrator.
Please click the bug icon to report this problem if you need help.

Other reported errors indicate a cluster failure in the error report (click on the bug icon to review). These often do not need to be submitted, as the failure error message describes the problem and correction path.

The error indicates that the job was likely given inputs and/or parameters that are malformed, do not meet the requirements for the tool's usage, or are very computationally intensive. See Troubleshooting tool errors. These are the exact same checks applied when a submitted bug report is reviewed.

It is also possible that there was a server or cluster error. A re-run for nearly all failed jobs is the first pass solution. Exceptions may be if the error is clarified as exceeding memory or job execution time (see next sections).

See the two sections below for details about how to determine and resolve the root cause of the error.

If, after reviewing and re-running, the cause of the error is still unclear:

Job failure reason: exceeds memory allocation

The full error message will be reported as below, and can be found by clicking on the bug icon for a failed job run (red dataset):

job info:
This job was terminated because it used more memory than it was allocated.
Please click the bug icon to report this problem if you need help.

In rare cases when the memory quota is exceeded very quickly, an error message such as the following can appear:

job stderr:
Fatal error: Exit code 1 ()
Traceback (most recent call last):
(other lines)
Memory Error

The error indicates that the job ran out of memory while executing on the cluster node that ran the job. This memory is different than the amount of disk usage in an account.

Often memory errors can be avoided by the user executing the job:

  • Double check the inputs to the tool. Are the data properly formatted and labeled?

  • Review the parameters for the tool and determine if any changes made away from the default setting (or possibly the detail settings) are compute-intensive. Make changes if they suit your research goals. See the underlying tool's documentation to better understand specific parameters. This is often linked in the Help section on a tool's execution form.
  • If the tool used was one that compares two datasets, change the order of the inputs, and test with a re-run. Some tools consume less memory when the larger dataset is entered as the first input on the tool form.
  • Also see the troubleshooting help here.

  • In some cases, reporting the memory issue to our team through the "green bug" icon from a dataset is a good way to let us know about tools that run out of memory. We probably cannot solve your issue directly, but cumulative feedback helps us learn which tools would benefit from additional resource allocation.

If the job remains too large to run on the public Main Galaxy instance at http://usegalaxy.org, then moving to an instance where more memory can be allocated to jobs is the solution. A good choice is CloudMan as processing memory is easily scaled up. AWS in Education grants can help with costs. Another option is setting up a local Galaxy, if you have a computer/server with enough processing memory (16 GB minimum, but more is likely needed if the jobs are large, possibly up to 64 GB).

Job failure reason: execution exceeds maximum allowed job run time (walltime)

The full error message will be reported as below, and can be found by clicking on the bug icon for a failed job run (red dataset):

job info:
This job was terminated because it ran longer than the maximum allowed job run time.
Please click the bug icon to report this problem if you need help.

The error indicates that the job execution time exceeded the "wall-time" on the cluster node that ran the job. "Wall-time" is the maximum amount of time any job has to complete before it is terminated. When using the public Main Galaxy instance at http://usegalaxy.org, see the walltime available here.

Sometimes the execution time of a job can be shortened by adjusting the inputs, the parameters used, or the cluster used (try Stampede or Jetstream, if available on the tool form under the section Job Resource Parameters). This solution is similar to the one for jobs that error because they exceed memory allocation.

  • If the tool used was one that compares two datasets, change the order of the inputs, and test with a re-run. Some tools consume less memory when the larger dataset is entered as the first input on the tool form.
  • Also see the troubleshooting help here.

  • Give the longer-running cluster a try, see the Main wiki's section about Stampede (Jetstream is also a choice).

If the job remains too large to run on the public Main Galaxy instance at http://usegalaxy.org, then moving to an instance where more resource can be allocated for jobs is the solution. A good choice is CloudMan. AWS in Education grants can help with costs.

Tool doesn't recognize dataset

This is usually a simple datatype assignment incompatibility between the dataset and the tool. The expected input datatype is explained on the tool form itself under the parameter settings. Use Convert Format, or modify the datatype by clicking the dataset's pencil icon to reach the Edit Attributes form. Many metadata attributes can be edited on these forms, including database. You may need to first create a Custom Build when using a Custom Reference Genome.

Dataset special cases

FASTQ Datatype QA

  • If the required input is a FASTQ datatype, and the data is a newly uploaded FASTQ file, run FastQC then FASTQ Groomer as first steps, then continue with your analysis. Watch the FASTQ Prep Illumina screencast for a walk-through.

    • If you are certain that the quality scores are already scaled to Sanger Phred+33 (the result of an Illumina 1.8+ pipeline), the datatype ".fastqsanger" can be directly assigned. Click the pencil icon to reach the Edit Attributes form. In the center panel, click on the "Datatype" tab (3rd), enter the datatype ".fastqsanger", and save. Metadata will assign, then the dataset can be used.

    • If you are not sure what type of FASTQ data you have, see the help directly on the FASTQ Groomer tool for information about types.

      • For Illumina, first run FastQC on a sample of your data (how to read the full report). The output report will note the quality score type interpreted by the tool. If it is not ".fastqsanger", run FASTQ Groomer on the entire dataset. If it is ".fastqsanger", just assign the datatype.

      • For SOLiD, run NGS: Fastq manipulation → AB-SOLID DATA → Convert, to create a ".fastqcssanger" dataset. If you have uploaded a color space fastq sequence with quality scores already scaled to Sanger Phred+33 (".fastqcssanger"), first confirm by running FastQC on a sample of the data. Then if you want to double-encode the color space into pseudo-nucleotide space (required by certain tools), see the instructions on the tool form Fastq Manipulation for the conversion.

    • If your data is FASTA, but you want to use tools that require FASTQ input, then use the tool NGS: QC and manipulation → Combine FASTA and QUAL. This tool will create "placeholder" quality scores that fit your data. On the output, click the pencil icon to reach the Edit Attributes form. In the center panel, click on the "Datatype" tab (3rd), enter the datatype ".fastqsanger", and save. Metadata will assign, then the dataset can be used.
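
To get a rough idea of which quality encoding a FASTQ file uses before running FastQC, the range of quality characters can be inspected locally. The following is a minimal sketch, not the method FastQC or FASTQ Groomer uses; the function names are hypothetical, and it assumes plain four-line FASTQ records (no wrapped sequence lines). It is only a heuristic and some files will remain ambiguous.

```python
from typing import Iterable, Iterator


def quality_lines(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield every 4th line (the quality line) of four-line FASTQ records."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield line


def guess_fastq_encoding(qualities: Iterable[str]) -> str:
    """Guess the quality offset from the lowest and highest ASCII codes seen."""
    lowest, highest = 255, 0
    for line in qualities:
        for char in line.strip():
            code = ord(char)
            lowest = min(lowest, code)
            highest = max(highest, code)
    if lowest > highest:
        return "unknown (no quality characters)"
    if lowest < 59:
        # Codes below 59 (';') cannot occur in Phred+64 or Solexa+64 data.
        return "Phred+33 (fastqsanger)"
    if highest > 74:
        # Codes above 74 ('J') are unusual for Illumina Phred+33 data.
        return "likely Phred+64 or Solexa+64 (run FASTQ Groomer)"
    return "ambiguous (confirm with FastQC)"


if __name__ == "__main__":
    sample = ["@read1", "ACGT", "+", "!''*"]
    print(guess_fastq_encoding(quality_lines(sample)))
```

Running the check on a sample of reads is usually enough, but confirm the result with the FastQC report before assigning ".fastqsanger".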

Tabular/Interval/BED Datatype QA

  • If the required input is a Tabular datatype, other datatypes that are in a specialized tabular format, such as .bed, .interval, or .txt, can often be directly reassigned to tabular format. Click the pencil icon to reach the Edit Attributes form. In the center panel, using tabs to navigate, change the datatype (3rd tab) and save, then label columns (1st tab) and save. Metadata will assign, then the dataset can be used.

  • If the required input is a BED or Interval datatype, the reverse (.tab → .bed, .tab → .interval) may be possible using a combination of Text Manipulation tools, to create a dataset that matches the BED or Interval datatype specifications.
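
Before reassigning a tabular dataset as BED, it can help to confirm that the first three columns look like chromosome, start, and end. The sketch below is an illustration only (the function name `bed_problems` is hypothetical, and it checks just the three required BED columns, not the optional ones).

```python
from typing import List


def bed_problems(fields: List[str]) -> List[str]:
    """Return a list of problems found in the first three columns of one row."""
    if len(fields) < 3:
        return ["fewer than 3 columns"]
    chrom, start_text, end_text = fields[:3]
    problems = []
    if not chrom.strip():
        problems.append("empty chromosome name")
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        problems.append("start and end must be integers")
        return problems
    if start < 0 or end < 0:
        problems.append("coordinates must not be negative")
    if start > end:
        problems.append("start is greater than end")
    return problems


if __name__ == "__main__":
    for row in ["chr1\t100\t200\tgeneA", "chr1\t200\t100", "chr1\tabc\t100"]:
        print(row.split("\t")[:3], bed_problems(row.split("\t")))
```

Rows that report problems would need to be rearranged or filtered, for example with the Text Manipulation tools, before the datatype is changed.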

Reference genomes

Using the same exact reference genome for all steps in an analysis is often mandatory to obtain accurate results. To use the reference genomes available on usegalaxy.org (Main), get the genome from our rsync server.

Detecting Genome Mismatch Problems

  • How can I tell if I have a reference genome mismatch problem?

    • No single error points to this problem. However, if you are running a tool for the first time on a newly uploaded dataset and an error occurs or, more likely, unexpected results are produced, double-checking the reference genome is a good first step.

Correcting Chromosome Identifier Conflicts

  • I suspect there is a problem with the identifiers but how can I check? Or better, how can I fix the problem?

    • A quick way to check for this issue is to compare the chromosome identifiers in the input datasets to each other and to the reference genome used (or intended to be used).
    • Even small differences in identifiers can cause tools to fail, produce warnings, or create incomplete results. This is the second most common cause of usage-related tool failures (input format problems are the first).

    • Using an Ensembl-based chromosome identifier file on Galaxy Main with a locally cached reference genome? Most built-in, native, reference genomes are sourced from UCSC and have UCSC-based identifier names. When using inputs with both versions of identifiers in the same analysis pipeline, there will almost certainly be errors or unexpected results. But, in many cases, inputs from the history can be adjusted to match the cached data, all within Galaxy. Read more about how...

    • Why isn't my Ensembl GTF compatible with Cufflinks and how can I use Ensembl GTFs with Cufflinks?

      • First, determine if an Ensembl GTF is the best choice. If an iGenomes version is available, this has advantages due to the addition of specific attributes utilized by the RNA-seq Tuxedo pipeline. Check at the Cufflinks website here.

        • Download the .tar file locally, uncompress it, then upload only the .gtf file to Galaxy. Loading .tar archives is not supported and has unpredictable outcomes (sometimes the first file in the archive will load - but this is not the file you need, sometimes only a portion of the first file will load - without a warning, and other times an upload error will result: none of these cases should be reported as a bug report/tool error).

        • For certain genomes, the reference annotation GTF file is available on the public Main Galaxy instance, http://usegalaxy.org, under Shared Data -> Data Libraries -> iGenomes.

      • Next, if you want to proceed, confirm that your identifiers are a good candidate for the addition of the "chr" adjustment, then use the workflow available in the Transcriptome Analysis FAQ.
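
The identifier comparison described above can also be done outside of Galaxy on downloaded copies of the files. This is a minimal sketch under stated assumptions: the function names are hypothetical, the annotation file is tab-delimited with the chromosome in the first column, and the reference is a FASTA file whose identifier is the first word of each ">" title line.

```python
from typing import Iterable, List, Set, Tuple


def dataset_ids(lines: Iterable[str], column: int = 0) -> Set[str]:
    """Collect the chromosome identifiers from one column of a tabular file."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        ids.add(line.rstrip("\n").split("\t")[column])
    return ids


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first word of every '>' title line in a FASTA file."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def compare_ids(data: Set[str], reference: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (identifiers missing from the reference, those fixed by adding 'chr')."""
    missing = sorted(data - reference)
    fixable = sorted(name for name in missing if "chr" + name in reference)
    return missing, fixable


if __name__ == "__main__":
    gtf = ["1\tensembl\texon\t1\t100", "MT\tensembl\texon\t5\t50"]
    genome = [">chr1 example", "ACGT", ">chrM", "ACGT"]
    print(compare_ids(dataset_ids(gtf), fasta_ids(genome)))
```

Identifiers that are missing but not fixable by the prefix alone (for example "MT" versus "chrM") need to be renamed individually, so a plain "chr" adjustment is not always sufficient.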

Avoiding Genome Mismatch Issues

  • When moving between instances, what can be done to mitigate the risk of using the wrong assembly?

  • How do I load a reference genome?

    • Use FTP - details are here... and troubleshooting help is here...

    • If your genome is small (bacterial, etc.), using it as a Custom Reference Genome is the quickest way to get it into Galaxy and to start using it with tools.

Reference Genomes and GATK

Shared and Published data

Have you been asked to share a history? Or has someone shared a workflow with you but you're not sure where to find it? Or maybe you just want to find out more about how publishing your work in Galaxy can be used to support your next publication? Watch the how to Share and Publish screencast and read more here.

Reporting tool errors

  • If running a tool on the public Galaxy server (i.e., http://usegalaxy.org) is resulting in an error (the dataset is red), and you can't determine the root cause from the error message or input format checks:

    • Re-run the job to eliminate transitory cluster issues.
    • Report the problem using the dataset's bug icon. Do not submit an error for the first failure, but leave it undeleted in your history for reference.

    • IMPORTANT: Get the quickest resolution by leaving all of the input and output datasets in the analysis thread leading up to the error in your history undeleted until we have written you back. Use Options → Include Deleted Datasets and click dataset links to undelete to recover error datasets before reporting the problem, if necessary.

      • Example: Error with Cufflinks? Leave the ungroomed + groomed FASTQ, Bowtie/Tophat SAM, optional GTF + custom genome, and Cufflinks datasets undeleted.

    • Include in the bug report what checks confirmed that data format was not an issue
    • Anything else you feel is relevant to the error
  • We do our best to respond to bug reports as soon as possible.
  • Please send all email as reply-all as we work to resolve the error. The galaxy-bugs address we will be corresponding from is internal to the Galaxy team only and we work together to resolve reported problems.

  • If you have resolved the issue, a reply to the bug report to let us know is appreciated.

Interpreting scientific results

A double check against the tool help and documentation is the first step. If the tool was developed by a 3rd party, they are likely the best experts for detailed questions. Tool forms have links to documentation/authors.

Tools on the Test server

  • Tools on Test will have little to no support help offered.

  • Test tool errors reported as bug reports (#Error from tools) are considered low priority and may not receive a reply.

  • General feedback & discussion threads (instead of questions requiring a reply from the Galaxy team) are welcomed at the development mailing list.

  • Exceptions are possible. Sometimes community users help to test-drive new functionality. If you are interested in this type of testing for a particular tool, contact us on the development mailing list.

Tools on the Main server: RNA-seq


Custom reference genome

Often the quickest way to get your analysis going is to load a custom genome for your own use. Simply upload the FASTA file using FTP and use it as the "reference genome from the history" (wording can vary slightly between tools, but most have this option). Learn more about how to set up and use a Custom Genome including how to create a Custom Build.

Videos

Best Practices

  • Make sure the reference genome is in FASTA format and is completely loaded (see Trouble loading data above).

  • Use the same custom genome for all the steps in your analysis that require a reference genome. Don't switch or the data can become mismatched in your files, preventing downstream work.
  • To add a custom Genome Build so that it can be assigned as a "database" attribute, or to make it known/available to certain tools, create it under "User → Custom Builds". More details here....

  • TIP: To modify a dataset to have an unassigned reference genome, use the pencil icon to "Edit Attributes". On the form, for the attribute Database/Build:, set the genome to be " unspecified (?) ", and submit. Any prior assignments will be removed.

  • If your genome is available on usegalaxy.org (Main), but just not indexed for the tool you want to use, you can get the genome from our rsync server. This will ensure that all of your work uses the same exact reference genome for all steps in an analysis, a critical part of a successful experiment.

  • If you find that there are downstream tool errors after using a Custom reference genome in an upstream tool on usegalaxy.org (Main), this is good cause to suspect that there is a reference genome mismatch problem. This generally means that the Custom genome needs to be changed to use ours, or that you need to use the Custom genome for all downstream tools, too.
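
A quick local check of the FASTA file before upload can catch common formatting problems. The sketch below is illustrative only: the function name `fasta_problems` is hypothetical, and it cannot prove that an upload completed; it only flags missing title lines, duplicate identifiers, and records without sequence.

```python
from typing import Iterable, List


def fasta_problems(lines: Iterable[str]) -> List[str]:
    """Return a list of basic formatting problems found in FASTA text."""
    problems = []
    seen = set()
    current = None  # identifier of the record being read
    length = 0      # sequence length of the current record
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None and length == 0:
                problems.append(f"{current}: no sequence")
            parts = line[1:].split()
            current = parts[0] if parts else ""
            if not current:
                problems.append(f"line {number}: empty identifier")
            elif current in seen:
                problems.append(f"{current}: duplicate identifier")
            seen.add(current)
            length = 0
        elif current is None:
            problems.append(f"line {number}: sequence before first '>' title line")
            return problems
        else:
            length += len(line)
    if current is None:
        problems.append("no '>' title lines found")
    elif length == 0:
        problems.append(f"{current}: no sequence")
    return problems


if __name__ == "__main__":
    print(fasta_problems([">chr1", "ACGT", ">chr1", "GG"]))
```

An empty result means only that these basic checks passed; comparing the file size before and after upload is still worthwhile.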

Quick genome access

  • If your genome is small (bacterial, etc.), using it as a Custom Reference Genome is the quickest way to get it into Galaxy and to start using it with tools.

  • Obtain a FASTA version, load it using FTP, and use it from your history with tools.

Tools on the Main server: Extract DNA

  • Example: Fetch Sequences: Extract Genomic DNA

    • Start by loading the custom reference genome in FASTA format into your history as a dataset, using FTP if the dataset is over 2G in size.

    • Load or create an appropriate Interval, BED, or GFF coordinate dataset into the same history.

    • On the Extract Genomic DNA tool form, you will use the options:

      • "Source for Genomic Data:" as "History"
      • next, for the new menu option "Using reference file", select the fasta dataset of your target genome from your active history
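
To make clear what the combination of a FASTA genome and a coordinate dataset produces, here is a small sketch of the same idea in plain Python. It is not the implementation of the Extract Genomic DNA tool; the function names are hypothetical, the intervals are assumed to be BED (0-based start, end not included), and strand is ignored.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA text into a dict of identifier -> sequence."""
    parts: Dict[str, list] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}


def extract_intervals(
    genome: Dict[str, str], bed_lines: Iterable[str]
) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (chrom, start, end, sequence) for each BED interval."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in genome:
            # A mismatch here is the identifier conflict described earlier.
            raise KeyError(f"{chrom} is not in the reference genome")
        yield chrom, start, end, genome[chrom][start:end]


if __name__ == "__main__":
    genome = read_fasta([">chr1 test", "ACGTAC", "GTTT"])
    for record in extract_intervals(genome, ["chr1\t2\t6"]):
        print(record)
```

The `KeyError` branch shows why the chromosome identifiers in the coordinate dataset must match those in the FASTA title lines exactly.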

Tools on the Main server: GATK

  • GATK tools are natively indexed for the 1000 Genomes human reference genome "hg_g1k_v37" only
  • To use other genomes, load in fasta format and prepare as a Custom genome/build.
  • Note that GATK requires a specific reference genome sort order. The general guideline is "chr1, chr2, chr3,.... chrX, chrY, chrM" (followed by other partial chromosomes sorted in alphabetical order). Use tools in the group "Text Manipulation", "Convert Formats", and "Sort and Filter" to perform any needed rearrangement.
  • It is best to use the same exact reference genome for all steps, or problems can occur downstream, often requiring the analysis to be started over (from mapping, when the genome was first used).
  • Want to use hg19? The genome is available as a GATK-sorted version under "Data Libraries -> GATK". Import the fasta file into your history, then proceed with using as a custom genome with tools.
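
The sort order guideline above can be expressed as a sort key. This is a sketch of that guideline only (numbered chromosomes, then chrX, chrY, chrM, then everything else alphabetically); the function name `gatk_sort_key` is hypothetical, and the exact order required for a given genome build should be confirmed against the GATK documentation.

```python
import re
from typing import Tuple


def gatk_sort_key(name: str) -> Tuple[int, int, str]:
    """Sort key: chr1..chrN numerically, then chrX, chrY, chrM, then the rest."""
    match = re.fullmatch(r"chr(\d+)", name)
    if match:
        return (0, int(match.group(1)), "")
    special = {"chrX": 1, "chrY": 2, "chrM": 3}
    if name in special:
        return (special[name], 0, "")
    # Partial or unplaced chromosomes go last, in alphabetical order.
    return (4, 0, name)


if __name__ == "__main__":
    names = ["chrM", "chr10", "chr2", "chrX", "chr1_random", "chr1", "chrY"]
    print(sorted(names, key=gatk_sort_key))
```

The resulting order of names can then be used to decide how the FASTA records need to be rearranged before the genome is used as a custom genome.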

Start a Cloudman Server

To launch your own Cloudman Galaxy server, go to: http://launch.usegalaxy.org

Community Q & A


Search all

Still need help not covered by the tool help, the Learning Hub, a Screencast, a Tutorial, or an FAQ?

  • Start with a search in our mailing list archives to see if this question has come up before.

  • If you have a development topic to discuss, your data/tool situation has not come up before, and/or troubleshooting has failed (including at least one re-run, as explained in Error from tools above), post to a list or Galaxy Biostar

Note: If your question is about an error on Main for a job failure, start by reviewing the troubleshooting help for Tool Errors. If data input and the job error message don't resolve the issue, please use the tool error submission form from the red error dataset, instead of starting a public mailing list discussion thread (do not delete error datasets). Read more ...

What to include in a question

  1. Where you are using Galaxy: Main, other public, local, or cloud instance

  2. End-user questions from Test are generally not sent/supported - Test is for breaking

  3. If a local or cloud instance, the distribution or galaxy-central hg pull #
  4. If on Main, date/time the initial and re-run jobs were executed

  5. If there is an example/issue, exact steps to reproduce
  6. What troubleshooting steps (if a problem is being reported) you have tested out
  7. If on Main, you may be asked for a shared history link. Use Options → Share or Publish, generate the link, and email it directly back off-list. Note the dataset #'s you have questions about.

  8. IMPORTANT: Get the quickest answer for data questions by leaving all of the input and output datasets in the analysis thread in your shared history undeleted until we have written you back. Use Options → Show Deleted Datasets and click dataset links to undelete to recover datasets if necessary

  9. Always reply-all unless sharing a private link

Starting a scientific, data, or tool usage thread

Starting a technical tool, local/cloud instance, or development thread

  • Gather the information listed in "What to include in a question" above
  • Send an email to mailto:galaxy-dev@lists.galaxyproject.org

  • Subscribing to the Galaxy Development List is recommended for tool developers and instance administrators

  • Discussion threads are open to the entire community and the Galaxy team to answer
  • Always reply-all unless sharing a private link

Reporting a software bug

Bug or Error from tools? Sometimes it is hard to tell. If you are on the public Main instance, and ran a tool that produced a red error dataset, then you will probably want to start by reporting this as a Tool Error, but add comments about your suspicions of a bug if there is something odd about the job failure.

If you think you've seen a software bug (not an "Error from tools" ), please report it. More information about how and where can be found at the Galaxy Issue Board.