Just 15 minutes + questions, we focus on topics about using and developing nf-core
pipelines. These are recorded and made available at https://nf-co.re, helping to build an archive of training material. Got an idea for a talk? Let us know on the #bytesize
Slack channel!
This week, Evan Floden (@evanfloden) will tell us all about the new Nextflow Tower command line client.
Video transcription
The content has been edited to make it reader-friendly
Running ./tw by itself, you can see all the different commands that you have here. Tower itself can interact with all of the different aspects of the system: we have things like actions for automations, organizing collaborators, compute environments, credentials, and data sets as well. The other nice thing you can do here is to run ./tw info, which is like a health check. It's going to connect to the Tower instance, check that we can connect with the version that we're using, show which user I'm authenticated as, and check some details there. This can be quite useful when you're first setting things up, just to confirm how you're connected. The way that you set this up is that you would typically export an access token. In this case, I've got my access token, which I copied earlier; I paste it, export it, and then you're good to go and you're connected.
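Assuming a token copied from the Tower web interface (the value below is a placeholder), the setup and health check look like this:

```shell
# Authenticate the CLI by exporting a personal access token
# (placeholder value; create one under your profile in Tower).
export TOWER_ACCESS_TOKEN="eyJ0aW...placeholder..."

# Optional: point at a self-hosted instance instead of Tower Cloud.
export TOWER_API_ENDPOINT="https://api.tower.nf"

# Health check: connectivity, version compatibility, authenticated user.
tw info
```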
There are a couple of other options that you want to consider during setup. The first one is which workspace you want to work in. I showed you the community workspace before, but if you have your own workspaces set up for your organization, you can simply change that with an environment variable, in this case the Tower workspace ID, or you can change it from the command line when you run a command. I'm going to export one that is essentially the community showcase, so anything I do now, you'll be able to follow along live. If you log into Tower, you'll be able to see all of these actions and pipelines being triggered, and the things that were created, and follow them as we go through. I'm going to export this Tower workspace ID, then just confirm that everything is working and it's all fine. I've loaded my credentials, connected to the right workspace, and we're ready to go.
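The workspace selection can be done either way; the numeric ID below is a placeholder, and the --workspace option name is an assumption based on the current tw CLI:

```shell
# Set a default workspace for all subsequent commands
# (placeholder numeric ID; find yours in the Tower web interface).
export TOWER_WORKSPACE_ID=123456789

# Or override it per command with the organization/workspace name.
tw pipelines list --workspace community/showcase
```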
The first use case for Tower is the primary thing we all do, which is to run pipelines, and how we can trigger those off. In Tower, we have a special concept of what a pipeline is. It's essentially a reserved name that we've created for the system. A pipeline is a combination of the workflow repository (think of this as the Git repository where the source code is hosted), combined with the compute environment, which is where the execution of that pipeline will take place, plus some parameters: the default inputs, along with any parameters that you want to override. Those three things come together to form what is called a pipeline.
If we go into the Tower community showcase, you should see the exact same thing. If I run tw pipelines list, you'll see that it lists all of the pipelines that we have inside of this community showcase. These are the nf-core pipelines, but we also have some other ones that we're adding in here all the time. These are pre-configured and good to go; again, each is a combination of a repository, a compute environment, and some parameters. If I want to launch one of these pipelines, I've got a couple of different options and a couple of different ways to do it. The first is to use the launch command and give it the name of the pipeline that I want to run, or I could choose an ID. Let's just take a default one and launch nf-core-chipseq. (After fixing a small typo in my command, and copy-pasting my example to be safe, the submission goes through.) This is the interaction that took place: we submitted that pipeline and told Tower to launch it.
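The launch shown here can be written as a single command; the top-level launch subcommand is an assumption based on the current tw CLI, and the pipeline name matches the showcase entry from the demo:

```shell
# Launch a pre-configured pipeline from the active workspace by name.
# Nothing runs locally; Tower submits the run to the pipeline's
# attached compute environment.
tw launch nf-core-chipseq
```

From here the run can be followed live in the Tower web interface.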
It's important to note why this is different from nextflow run. The primary difference is that when we launch this pipeline, we're directing Tower to launch it. Nothing is running now on my local machine where this was launched: there's no Nextflow instance, no head job or anything. Everything has been delegated to Tower, which itself submits the work to the compute environment. That has a lot of advantages. Firstly, I can shut down my laptop if I need to head out at the end of the day. Secondly, all of the records, the logs, and the information about the execution of the pipeline are now managed by Tower, which keeps a full history. That's very important: you don't want to be reliant on your laptop, or wherever you launched the pipeline from, to manage that information. It also makes things collaborative. If you go into the workspace now and click on this URL, you can follow along live, and this allows people to work together interactively. Maybe there's an issue that a colleague can fix before launching again; we can work in a much more collaborative manner as well.
That was a basic example where we didn't actually change any of the parameters; it was a default launch. But what if you want to run a pipeline where you're actually changing things? You've got a couple of different options. The launch command works a little like the Nextflow command line in terms of the options we can pass. Say that I now want to run the viralrecon pipeline from nf-core, and instead of running it with defaults, I want to use a particular profile. This profile may select some test data, or something specific to what you want to run. I can use the exact same profiles here to launch it. That's going to be triggered and launched into the community workspace as well.
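Selecting a profile at launch might look like this; the --profile option mirrors Nextflow's -profile:

```shell
# Launch nf-core/viralrecon with a specific config profile, much like
# `nextflow run nf-core/viralrecon -profile test` would do locally.
tw launch nf-core/viralrecon --profile test
```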
Another, more common use case is not so much changing a profile as changing the input data. What you can do here is use what we call a params file. You can create a params file in YAML or JSON. Here you can see I've defined some input data, with some different options for those inputs, and saved it into a file called params.yaml. I can then launch the nf-core-rnaseq pipeline with tw launch. Now I have the choice of defining some profiles, but I can also add in the params file itself. Let's say a profile for test, plus the params file, exactly like we would with Nextflow; at this point I just specify the location of that params file. Obviously, this allows you to predefine a lot of things; maybe you're working off sample sheets that you've predefined, and you can trigger runs from those as well. (This is my params file; I've probably made some typos in it. Copy-pasting works much better in live demos than typing.) We can see that it works, and then we can launch it. This is the basic use case of launching pipelines. You can, of course, monitor them and follow along, whether from the GUI or from the command line; there's a whole bunch of different endpoints there.
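A minimal params file and launch might look like this; the S3 paths below are illustrative placeholders, not the ones from the demo:

```shell
# Write a params file defining the pipeline inputs (YAML here; JSON also works).
cat > params.yaml <<'EOF'
input: "s3://my-bucket/samplesheet.csv"
outdir: "s3://my-bucket/results"
EOF

# Launch with a profile plus the params file, mirroring
# `nextflow run ... -profile test -params-file params.yaml`.
tw launch nf-core-rnaseq --profile test --params-file params.yaml
```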
I want to switch gears a little bit now and think about how we can define the infrastructure around this. So far, we've just launched pipelines and monitored them. What if I now want to set up pipelines for other people, or define my research environment in a more generalized way? I'm going to quickly change over to a different workspace here; I don't want to build this in the community workspace and populate it with things. I'll first change over to the workspace and then look at the different pipelines that I have inside it. This is a private one that you can't see, but the principles are exactly the same. I'm just going to run tw pipelines list, and it shows all of the pipelines in that workspace. You can see a whole bunch of things that we've been populating in here: the repository each pipeline is associated with (I see a lot of the nf-core repositories) and the name of the pipeline.
What I want to do now is imagine that I want to take one of these, say a test or development version of it, and copy it or give it to you, so that you could capture the whole thing in your environment. Or, put another way, I want to define exactly what that pipeline is made up of, beyond just the workflow itself. One way I can do that with Tower is to take the pipelines command and export the particular pipeline. This works by giving it a name; let's just take the first one from the top (I'll copy-paste this time rather than trusting myself too much). When I export this, you'll see that it's exported entirely as a JSON file. This is fantastically useful, because it means I can import and export things using this command and really define all of my pipelines as code. These pipelines may have different configurations and different setups for different environments; you can now define that entirely inside the JSON file, and you also have a pretty nice way to interact with it. That means I could go and create a new pipeline, maybe change one or two things, and all of that infrastructure is captured there as well. This principle of importing, exporting, and defining resources in a stored location works for all of the resources. I'm showing you pipelines here, but the same thing applies to, for example, credentials. Say I want a list of all my credentials inside this workspace: you can see I've got credentials here for Google, for GitHub, for Azure, et cetera. All of that becomes available for me to see.
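The export/import round trip could be sketched as follows; "my-pipeline" is a placeholder name, and the exact flag names are assumptions that may differ between CLI versions:

```shell
# Export the full definition of a pipeline (repository, compute
# environment, default parameters) as JSON.
tw pipelines export --name my-pipeline > pipeline.json

# Re-create the pipeline elsewhere (e.g. another workspace) from the file.
tw pipelines import --name my-pipeline-copy pipeline.json

# The same list/export pattern applies to other resources, e.g. credentials.
tw credentials list
```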
Then I can think about how I can link that into actual compute environments and generate this on the fly. For example, I can export my compute environment, which is where the compute takes place: in this example, my AWS Batch environment, along with the credentials for it and how it's set up. If you have a look at what one of those exports looks like, you can see that it is the whole definition that is required: the region it runs in, the working directory, and how it's all set up. There's obviously a whole bunch of these commands you can run. Just to give you a full view, and it probably makes a bit more sense now, these are all of the commands that we've seen here.
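A compute environment export follows the same pattern; "my-aws-batch" is a placeholder name, and the flag names are assumed from the current tw CLI:

```shell
# Export a compute environment definition (an AWS Batch environment in
# the demo) as JSON, including region and work directory settings.
tw compute-envs export --name my-aws-batch > compute-env.json
```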
If we run tw just by itself, you can see that you can also interact with creating the workspaces themselves, managing participants, generating pipelines, and creating data sets. I'll point out one more thing to give you some inspiration on what's possible: we recently released a blog post that took many of these ideas and put them all into play. The concept was that we wanted people to be able to essentially drop a sample sheet, or have a sample come off a sequencer, and have that trigger the execution of a pipeline all the way through. The Tower CLI is obviously perfect for this. We set this up first on AWS: we defined a Lambda function that takes a sample sheet generated when data arrives in an S3 bucket, then uses the Tower CLI to create a data set, deposit it into Tower, and invoke the job with Tower itself. It allows you to integrate with many different services and build the whole flow. This is a walkthrough if you're really interested in seeing how it can be done; we provide all the files, and there's also a Git repository if you want to go through and follow this up yourself. But I'll end with that.
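In outline, the script driven by that Lambda function could look something like this; this is an illustrative sketch, not the blog post's actual code, and the bucket, dataset, and pipeline names (and exact flag spellings) are placeholders:

```shell
# Triggered by an S3 event when a new sample sheet lands in the bucket:
# 1. fetch the sample sheet,
# 2. register it as a Tower dataset,
# 3. launch the pipeline against it.
aws s3 cp "s3://sequencer-output/run-001/samplesheet.csv" samplesheet.csv
tw datasets add --name run-001-samples samplesheet.csv
tw launch nf-core-rnaseq --params-file params.yaml
```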
We're really excited to see what people build with this. We are expanding the CLI as we add more functionality to Tower, keeping everything aligned in that respect.
(host) Thank you, Evan. That was a really interesting talk on Nextflow Tower. I don’t know if there’s anyone who has any questions to follow up on this one.
(question) Maybe I can start the questions off. I haven't used Nextflow Tower before, but are there any supplementary dependencies you need on your local machine when you're using Tower? (answer) When you're set up with Tower, there are a couple of ways to do it. One of them is simply running with Tower from your Nextflow command: you can say nextflow run -with-tower. This still runs the head job on your laptop. To connect externally, you should log into Tower Cloud, and then you can create your compute environment there. They're two different ways of working, and to really get the full power of this, you should log in; as I say, it's running as a service there. And to provide full clarity around this: this is Tower Cloud, which you can go and log in and use. Our business model is primarily around deploying Tower in customers' own environments, where customers have their own version of Tower. This is the public one I'm showing you, which you're free to use.
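The two modes mentioned here look like this in practice; the token value is a placeholder:

```shell
# Mode 1: run Nextflow locally but report the run to Tower.
# The head job stays on your machine; Tower only monitors.
export TOWER_ACCESS_TOKEN="eyJ0aW...placeholder..."
nextflow run nf-core/rnaseq -profile test -with-tower

# Mode 2: delegate execution entirely to Tower (nothing runs locally).
tw launch nf-core/rnaseq --profile test
```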
(answer) We can think of a couple of use cases; I can jump in and demo a little. In terms of HPC: I'm showing you AWS Batch and different cloud environments here, but if we go down to that same workspace we were working in before, these are the same compute environments that I showed you, and you can connect in here with all the different platforms, including your own PBS or Slurm cluster. That organizes the infrastructure side of it. There are three primary use cases. If you are creating pipelines and you want to make them available to anyone who maybe doesn't have Nextflow or command-line expertise, experimentalists et cetera, you can create your pipeline and define it in a way which makes it super easy for them to come in. You can create your own customized user interfaces: as a user, I just need to select my input data. I'd come in, select my RNA-seq sample sheet, maybe set a few options, choose to save some particular files, and then trigger the execution of that job. It simplifies the whole launch process for them. As a bioinformatician, you probably want to create those pipelines and make them available, with compute, to your users. Maybe you want a long-running service rather than relying on your own machine; you have a full history of your executions, so you can follow pipelines as they go through, and you can automate things as well. From the system-admin side, you have compute environments which can be defined centrally, along with all of the collaboration side of it. There's a whole bunch of use cases there.
(host) Okay. Apparently everyone else was appreciating your talk. I don't know if there's anyone else who has a question. Okay. Seems like everyone is satisfied. Thank you so much, Evan, for the talk. And I'm sure people can catch you on Slack if they have any questions. (speaker) Absolutely. Thanks so much for the time, everyone. And yeah, reach out if you have any other questions. Always happy to take them.
(host) Sure. Thanks, everyone, for joining today’s bytesize talk. See you next week.
(speaker) Thanks so much, folks.