Just 15 minutes + questions, we focus on topics about using and developing nf-core pipelines. These are recorded and made available at https://nf-co.re, helping to build an archive of training material. Got an idea for a talk? Let us know on the #bytesize Slack channel!
This week, Jasmin Frangenberg (@jasmezz) is going to introduce nf-core/funcscan. nf-core/funcscan is a bioinformatics best-practice analysis pipeline for the screening of functional components of nucleotide sequences such as assembled contigs. This includes mining for antimicrobial peptides, antibiotic resistance genes and biosynthetic gene clusters.
Video transcription
The content has been edited to make it reader-friendly
nextflow run nf-core/funcscan. You give your input sample sheet and your output directory. This is a minimal example of a pipeline run. Of course, it is recommended to use more parameters. One of them, in the annotation step, is the flag --annotation_tool, where you can decide which tool you want to use. The tools have different properties. For example, prodigal is very fast; however, we noticed that with prokka we get better downstream results. Which tool you choose depends on your needs and ideas. The default is prokka.
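As a rough sketch of the minimal run described above, the following assembles the command with an explicit annotation tool and prints it for inspection instead of launching it. The parameter names --input and --outdir, the -profile docker setting, and the file and directory names are assumptions based on common nf-core conventions, not details taken from the talk.

```shell
# Hypothetical sample sheet and output directory; adjust to your own data.
INPUT="samplesheet.csv"
OUTDIR="results"

# Assemble the command as a string so it can be checked before launching.
# --input, --outdir and -profile docker are assumed nf-core conventions.
CMD="nextflow run nf-core/funcscan -profile docker \
--input $INPUT --outdir $OUTDIR \
--annotation_tool prodigal"

echo "$CMD"
# To launch the pipeline for real, run: eval "$CMD"
```

Replacing prodigal with prokka gives the default behaviour mentioned in the talk, trading speed for the better downstream results the speaker describes.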
After the annotation step, we come to the actual identification of the compounds. You can activate each workflow with a flag, for example --run_amp_screening for the AMPs. By activating this, all the AMP tools are run on your data. You can also choose, for any reason, to deactivate any of the tools. You can switch them off with the flag --amp_skip and then the name of the tool. This might be because some tools are very slow or you think they are so specific that you are not interested in the output. As I said, for whichever reason, you can switch them off. This is the same for the antibiotic resistance workflow. You can apply the flag, it runs all the four or five tools on your data, and you can skip any tool with the --arg_skip flag. The same applies for BGC identification. You have the flag, all the tools are run, and you can skip whichever you might want to skip. Of course, you can use not only one of the flags per run, but all three flags at the same time. Your data is investigated simultaneously and parallelized as much as possible with Nextflow. Okay, so these are the identification steps.
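To make the combination of flags concrete, here is a hedged sketch that switches on all three screening workflows and skips one tool in two of them. Only --run_amp_screening and the --amp_skip and --arg_skip prefixes come from the talk. The other two screening flags are assumed by analogy, and the full skip flag spellings and the chosen tools (macrel, deeparg) are illustrative assumptions that should be checked against the pipeline's parameter documentation. The command is printed, not launched.

```shell
# Hypothetical sample sheet and output directory; adjust to your own data.
INPUT="samplesheet.csv"
OUTDIR="results"

# Switch on all three screening workflows in one run.
# --run_arg_screening and --run_bgc_screening are assumed by analogy
# with --run_amp_screening.
SCREENING="--run_amp_screening --run_arg_screening --run_bgc_screening"

# Switch off individual tools. The exact spelling (skip prefix, underscore,
# tool name) is an assumption; macrel and deeparg are illustrative choices.
SKIPS="--amp_skip_macrel --arg_skip_deeparg"

CMD="nextflow run nf-core/funcscan --input $INPUT --outdir $OUTDIR $SCREENING $SKIPS"
echo "$CMD"
# To launch the pipeline for real, run: eval "$CMD"
```

Leaving out one of the screening flags simply leaves that workflow switched off, so the same pattern covers runs that target only one or two compound classes.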
Now we come to the summary steps for each workflow. Let’s start with antibiotic resistance, which is summarized by hAMRonization, a tool that is already out there. Here you can see the GitHub link. This tool can summarize the outputs of a bunch of resistance identification tools. Our pipeline currently includes the orange-tagged ones. The output of those tools is then summarized into a standardized gene report. This is how it looks. It’s a table with a lot of columns. You have the sample IDs, then the genes that have been identified, some information about the databases, which tools were run, and so on. The column headers are all quite self-explanatory, and you can use this output table for downstream analysis in R or any statistics program.
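For a quick look on the command line before moving to R, here is a minimal sketch that counts reported genes per sample, assuming the summary is a tab-separated table with the sample ID in the first column. The file is a toy stand-in created on the spot. Its column names and rows are illustrative assumptions, not real hAMRonization output.

```shell
# Toy stand-in for the summary table; column names and rows are
# illustrative assumptions, not real pipeline output.
summary=$(mktemp)
printf 'input_file_name\tgene_symbol\tanalysis_software_name\n' > "$summary"
printf 'sample_1\tgeneA\ttool_x\n' >> "$summary"
printf 'sample_1\tgeneB\ttool_y\n' >> "$summary"
printf 'sample_2\tgeneA\ttool_x\n' >> "$summary"

# Count rows per sample (column 1), skipping the header row.
# sort makes the order stable, since awk's array order is unspecified.
counts=$(awk -F '\t' 'NR > 1 { n[$1]++ } END { for (s in n) print s "\t" n[s] }' "$summary" | sort)
echo "$counts"

rm -f "$summary"
```

With the toy rows above, this prints sample_1 with a count of 2 and sample_2 with a count of 1. For a real report, point awk at the actual file and check which column holds the sample ID.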
AMPcombi works very similarly. We developed it ourselves (Anan and Louisa developed this). Here you also have your sample IDs and then some information about the probability of AMPs. An additional feature is that it not only identifies your antimicrobial peptides, but also aligns them back to a reference database to identify a taxonomic classification. It also infers some chemical properties like stereochemistry and provides the publication, so you can go back and read more about the identified compound. The last tool, for the BGC workflow, is comBGC. In a similar fashion, we have the sample IDs, the tools which have been applied, and then more information about your candidate biosynthetic gene clusters. With this, you see that we now have a scalable workflow to identify these compounds, which are important for several research fields such as, as I said, drug development, antibiotic research and so on.
The pipeline is almost ready and will probably be released next week. Let’s see about that. We have at least added all the modules and subworkflows. We will do some more testing and then the pull request will go out. I can already advertise: if anyone here in the chat would like to review, please feel free to reach out to us on Slack. In the future, we would like to include more screening modules and also have a visual summary of the output, which would be a graphical dashboard, probably built as a Shiny app. Let’s see about that.
With that, I would like to introduce the development team, which is James, Louisa, Anan, Moritz and me. Of course, we got a lot of help from the nf-core community, which was always there to assist, a very nice community. I would also like to mention some colleagues here at my institute, who helped with biological and biochemistry knowledge, and my supervisor, Pierre Stallforth from the Leibniz HKI. With this, I would like to close and point you to our repository and the documentation of the pipeline. If you want to interact with us, feel free to join us on Slack; otherwise I’m open for questions either now or later on Slack. Back to you, Franziska.
(host) Thank you very much. Very interesting. Anyone can now unmute themselves if they have any questions. You can also post questions in the chat and I will read them out. Are there any questions from the audience? Otherwise I actually have a question.
(question) You have shown a minimal command that you can run, that doesn’t actually specify the workflow that it’s using. Is that going to use all three workflows or a specific one, a default?
(answer) This one you mean? Exactly. In the default we have specified none. This would actually run only the annotation, which is probably not very useful for you. This is the current state of the settings. Maybe we will change this later. I don’t know.
(question) Right. Would it make sense to run all three workflows at the same time or is that different kinds of samples?
(answer) No, no, that’s what it’s designed for, to run efficiently on all three workflows. It depends on your interest: If you are not interested in the resistance genes, then of course you don’t need to run it, but it’s very efficient to use this also.
(host) Thank you. Are there any more questions at this moment in time? Otherwise, I thank you again. It was a very nice talk. Of course I would also like to thank the Chan Zuckerberg Initiative for funding our bytesize talks and our audience for listening to the talk. I hope to see everyone next week. Thank you very much. Bye.