Support for Ansible Tasks
AcceptedPublic

Authored by mkrizek on Mon, May 15, 4:49 PM.

Details

Summary

Request for comments. There are rough edges. All of this is subject to change and needs polishing. There are
still parts of codebase that will be removed or re-written.
As with disposable clients, I'd like to create a new side branch for work on ansible tasks.

This patch removes libtaskotron formula format for writing tasks and
introduces support for running tasks in form of an ansible playbook.

The following parts are removed:

  • task formula handling code
  • remote execution (Paramiko Wrapper)
  • concept of overlord and minion
  • the exitcode directive (hasn't been really used)

The code expects 'force_imageurl: True" and "imageurl" filled in
taskotron.yaml and image to have python2 and whatever ansible requires
to run.

When running locally, user is asked for sudo password since tasks are ran
as root.

https://fedoraproject.org/wiki/Changes/InvokingTestsAnsible

Test Plan

$ git clone https://github.com/stefwalter/gzip-dist-git
$ runtask -d -i gzip -t koji_build -a x86_64 gzip-dist-git/tests/test_rpm.yml
$ runtask -d -i gzip -t koji_build -a x86_64 --ssh $IP_ADDRESS --ssh-privkey conf/id_taskotron gzip-dist-git/tests/test_rpm.yml
$ runtask -d -i gzip -t koji_build -a x86_64 --libvirt --ssh-privkey conf/id_taskotron gzip-dist-git/tests/test_rpm.yml

Diff Detail

Repository
rLTRN libtaskotron
Branch
feature/ansiblize (branched from develop)
Lint
Lint ErrorsExcuse: WIP
SeverityLocationCodeMessage
Errortesting/functest_main.py:6F401flake8 1
Errortesting/functest_main.py:7F401flake8 1
Errortesting/functest_main.py:8F401flake8 1
Errortesting/functest_main.py:10F401flake8 1
Errortesting/functest_main.py:11F401flake8 1
Errortesting/functest_main.py:12F401flake8 1
Unit
Unit Tests OK
Build Status
Buildable 1150
Build 1150: arc lint + arc unit
mkrizek created this revision.Mon, May 15, 4:49 PM

Looks good as for a WIP to me. My only real concern here is the broad deletion of all the minion-related code - this might be because I'm not familiar with the design details, but WRT the disposable "minion" - what is the plan for choosing the "right" image for the task (aka using F24 to test F24 packages) that we had in the "previoius" code? Are we dropping the feature alltogether, or is there another process being planned to do that? If so, what is it?

libtaskotron/executor.py
59

Is dropping the environment "discovery" based on input data (aka item + type) intentional? Are we just going to be using the config defaults from now on, thus dropping the "test F24 package on F24" functionality that we had? I know the patch notes say "you need to have the right image in config", but this seems more like a temp devel solution, than the expected outcome to me. Or do you have an other method to do this in mind, which was just not implemented yet?

Looks good as for a WIP to me. My only real concern here is the broad deletion of all the minion-related code - this might be because I'm not familiar with the design details, but WRT the disposable "minion" - what is the plan for choosing the "right" image for the task (aka using F24 to test F24 packages) that we had in the "previoius" code? Are we dropping the feature alltogether, or is there another process being planned to do that? If so, what is it?

I forgot to mention this, sorry. The plan, at least short-term, is to somewhat re-use the existing code (although you're right that it's deleted in the patch). Speaking of images, according to @tflink, releng are supposed to be building the images for us \o/

Thanks for pointing that out.

kparal added a subscriber: kparal.Tue, May 16, 11:32 AM

So, first things first - I think there should be almost no deletions in this code. Our old system (formula and minion-based) will have to co-exist with the new system (ansible based) for some time, until we convert all our existing tasks to the new one, and iron out all issues. We need to keep compatibility for some time, this is probably a too big jump to switch everything over at the same time. I'd say keep all the existing code in place, just add new code to make ansible-based workflow work. Once both systems are deployed and working and once we covert all tasks to the new system, then we can drop the old code and refactor all the modules. This approach will also help in having the ansible feature branch much simpler, easier to compare and easier to rebase against develop.

One exception, the exitcode directive is truly not being used, please post a separate patch and we can drop it immediately also in develop.

Regarding ansible workflow - I'm a bit confused, so please bear with me :-) The spec says only local execution is supported, so our runner has to be running on the very same machine as the test itself (i.e. on the minion, in our terms). The code below implements our usual 3 modes - local, remote, and disposable. That's fine, it's no problem to go above the spec, to help development, etc. But since the local mode is the mandatory one (and the one being used in production), I'm not clear why we're removing so much of our existing "remote exec" and "minion creation" code. Because if we want to execute the task locally, we still need to:
a) create the minion VM. That's true until we have openstack or some other magical solution that provisions VMs for us.
b) SSH into the machine (paramiko). We can't run the task (i.e. execute ansible) from the VM host, that's against the spec. (Of course we can suggest that to be changed).
c) checkout the task (from distgit, or other location for generic tasks)
d) copy the results back to VM host (again paramiko and remote_exec methods), or upload them directly to buildbot/jenkins/etc

So, my idea was that we simply execute ansible instead of parsing formula and running directives directly, but everything else stays the same (minion spawning, files being copied in and out, etc). And once we have openstack, infra-provided images, jenkins instead of buildbot, etc, we replace those parts - but those parts are not tightly related to the ansible-based execution.

Am I talking nonsense here? Please correct me. Thanks.

conf/taskotron.yaml.example
112–113 ↗(On Diff #3032)

So where do we check out the distgit tests/ folder (or generic tasks) now?

libtaskotron/executor.py
143–149

IIUIC, you're running the task on a remote machine, just naming it localhost, right? In that case I don't think it adheres to the proposal anyway. The proposal doesn't allow any other execution than local.

So what we need to have is the ability to execute ansible on a local machine. But that doesn't mean taskotron can't have extra functionality, e.g. for easy development. In production, we'll execute everything locally, but in development, there's nothing stopping us from also supporting remote ansible execution.

As a consequence, there's no need to put remote machines until a fake "localhost" name into inventory. We don't need to hide the fact. It just must not be the default approach in production.

Am I making sense? Am I confused instead?

174

This should not work (does it?). It should be

'-u', 'root',
kparal added inline comments.Tue, May 16, 1:26 PM
libtaskotron/executor.py
183

Please use long options in scripts, they make it more readable without having to search in a man page.

In D1195#22176, @kparal wrote:

So, first things first - I think there should be almost no deletions in this code. Our old system (formula and minion-based) will have to co-exist with the new system (ansible based) for some time, until we convert all our existing tasks to the new one, and iron out all issues. We need to keep compatibility for some time, this is probably a too big jump to switch everything over at the same time. I'd say keep all the existing code in place, just add new code to make ansible-based workflow work. Once both systems are deployed and working and once we covert all tasks to the new system, then we can drop the old code and refactor all the modules. This approach will also help in having the ansible feature branch much simpler, easier to compare and easier to rebase against develop.

I don't think it's worth the effort to keep both formats supported. The remote related code needs to die in a fire. There really are not that many tasks using the current formula that we don't maintain. Let's not keep the code more complex than it has to be. Also, the remote related code needs to die in a fire. My idea was, that until we port the tasks to the new format, we keep running the current production and have the new runner in dev.

Did I mention that the remote related code needs to die in a fire?

One exception, the exitcode directive is truly not being used, please post a separate patch and we can drop it immediately also in develop.

Regarding ansible workflow - I'm a bit confused, so please bear with me :-) The spec says only local execution is supported, so our runner has to be running on the very same machine as the test itself (i.e. on the minion, in our terms). The code below implements our usual 3 modes - local, remote, and disposable. That's fine, it's no problem to go above the spec, to help development, etc. But since the local mode is the mandatory one (and the one being used in production), I'm not clear why we're removing so much of our existing "remote exec" and "minion creation" code. Because if we want to execute the task locally, we still need to:
a) create the minion VM. That's true until we have openstack or some other magical solution that provisions VMs for us.
b) SSH into the machine (paramiko). We can't run the task (i.e. execute ansible) from the VM host, that's against the spec. (Of course we can suggest that to be changed).
c) checkout the task (from distgit, or other location for generic tasks)
d) copy the results back to VM host (again paramiko and remote_exec methods), or upload them directly to buildbot/jenkins/etc

So, my idea was that we simply execute ansible instead of parsing formula and running directives directly, but everything else stays the same (minion spawning, files being copied in and out, etc). And once we have openstack, infra-provided images, jenkins instead of buildbot, etc, we replace those parts - but those parts are not tightly related to the ansible-based execution.

Am I talking nonsense here? Please correct me. Thanks.

That's correct. Though I am still hoping the 'local' requirement will go away and we don't have to keep the "inner runner". So currently, yes it does not respect the proposal in this aspect. I am hoping this will get figured out soon.

conf/taskotron.yaml.example
112–113 ↗(On Diff #3032)

Ansible handles that.

libtaskotron/executor.py
174

Thanks.

183

Yep, will do, thanks.

mkrizek updated this revision to Diff 3034.Tue, May 16, 4:00 PM

Address issues in the review. Fix tests and lint errors.

In D1195#22176, @kparal wrote:

So, first things first - I think there should be almost no deletions in this code. Our old system (formula and minion-based) will have to co-exist with the new system (ansible based) for some time, until we convert all our existing tasks to the new one, and iron out all issues. We need to keep compatibility for some time, this is probably a too big jump to switch everything over at the same time. I'd say keep all the existing code in place, just add new code to make ansible-based workflow work. Once both systems are deployed and working and once we covert all tasks to the new system, then we can drop the old code and refactor all the modules. This approach will also help in having the ansible feature branch much simpler, easier to compare and easier to rebase against develop.

I don't think it's worth the effort to keep both formats supported. The remote related code needs to die in a fire. There really are not that many tasks using the current formula that we don't maintain. Let's not keep the code more complex than it has to be. Also, the remote related code needs to die in a fire. My idea was, that until we port the tasks to the new format, we keep running the current production and have the new runner in dev.

I'm of the same mind - we don't have that many tasks using the "old" formulas in the wild that we don't already maintain. I think it'll be far less effort to just do the porting work that it'd be to support both ansible and formulas at the same time.

Did I mention that the remote related code needs to die in a fire?

I'm not sad to see it go but I've seen far fuglier code :)

tflink added a subscriber: merlinm.Wed, May 17, 5:07 AM
In D1195#22176, @kparal wrote:

Regarding ansible workflow - I'm a bit confused, so please bear with me :-)

I think that confusion abounds in this area, to be honest :)

The spec says only local execution is supported, so our runner has to be running on the very same machine as the test itself (i.e. on the minion, in our terms). The code below implements our usual 3 modes - local, remote, and disposable. That's fine, it's no problem to go above the spec, to help development, etc. But since the local mode is the mandatory one (and the one being used in production), I'm not clear why we're removing so much of our existing "remote exec" and "minion creation" code. Because if we want to execute the task locally, we still need to:
a) create the minion VM. That's true until we have openstack or some other magical solution that provisions VMs for us.
b) SSH into the machine (paramiko). We can't run the task (i.e. execute ansible) from the VM host, that's against the spec. (Of course we can suggest that to be changed).
c) checkout the task (from distgit, or other location for generic tasks)
d) copy the results back to VM host (again paramiko and remote_exec methods), or upload them directly to buildbot/jenkins/etc

So, my idea was that we simply execute ansible instead of parsing formula and running directives directly, but everything else stays the same (minion spawning, files being copied in and out, etc). And once we have openstack, infra-provided images, jenkins instead of buildbot, etc, we replace those parts - but those parts are not tightly related to the ansible-based execution.

Am I talking nonsense here? Please correct me. Thanks.

This makes sense to me and tracks what I've understood. I still don't understand why the spec was written that way but in its current form, the idea is to spawn a VM and let the test do any spawning work that it needs to do. I'm not sure this is the best way to do multi-host testing and I know that there's a usecase @merlinm is working on that doesn't cleanly fit into what's currently in the wiki. Are there other usecases that won't fit into that system well?

Are there other cases that

In D1195#22199, @tflink wrote:

I don't think it's worth the effort to keep both formats supported. The remote related code needs to die in a fire. There really are not that many tasks using the current formula that we don't maintain. Let's not keep the code more complex than it has to be. Also, the remote related code needs to die in a fire. My idea was, that until we port the tasks to the new format, we keep running the current production and have the new runner in dev.

I'm of the same mind - we don't have that many tasks using the "old" formulas in the wild that we don't already maintain. I think it'll be far less effort to just do the porting work that it'd be to support both ansible and formulas at the same time.

So, we'll aim for deploying taskotron-ansible onto dev server and also push all tasks converted to ansible format to dev branch, at the exactly same time, correct? If so, how long do we suppose to be in this state, until we update production? Because if we suppose a week or two, that's probably fine. It if should be a month, or two, or three, it's going to be a problem for task developers who use dev branch to test new code (as they rightly should) before pushing to master. (Well, I know of one such person active at the moment, python-versions' maintainer). Of course we can tell them that they need to push directly to production during this phase.

kparal added inline comments.Wed, May 17, 11:05 AM
libtaskotron.spec
36

This can be removed.

In D1195#22202, @kparal wrote:
In D1195#22199, @tflink wrote:

I don't think it's worth the effort to keep both formats supported. The remote related code needs to die in a fire. There really are not that many tasks using the current formula that we don't maintain. Let's not keep the code more complex than it has to be. Also, the remote related code needs to die in a fire. My idea was, that until we port the tasks to the new format, we keep running the current production and have the new runner in dev.

I'm of the same mind - we don't have that many tasks using the "old" formulas in the wild that we don't already maintain. I think it'll be far less effort to just do the porting work that it'd be to support both ansible and formulas at the same time.

So, we'll aim for deploying taskotron-ansible onto dev server and also push all tasks converted to ansible format to dev branch, at the exactly same time, correct? If so, how long do we suppose to be in this state, until we update production? Because if we suppose a week or two, that's probably fine. It if should be a month, or two, or three, it's going to be a problem for task developers who use dev branch to test new code (as they rightly should) before pushing to master. (Well, I know of one such person active at the moment, python-versions' maintainer). Of course we can tell them that they need to push directly to production during this phase.

That sounds about right.

Given how fast everything is moving right now, I don't think we can afford to wait a month while porting everything. That will need to be done as quickly as we can manage it.

This is probably going to be a good time to move our tasks from bitbucket to pagure as well.

mkrizek updated this revision to Diff 3050.Mon, May 22, 10:58 AM
  • inner runner :(
tflink accepted this revision.Tue, May 23, 4:44 AM

I'm not a huge fan of keeping a playbook in the root of the repo like that but I'm not coming up with any brilliant alternatives at the moment.

Otherwise, it looks good to me as a WIP.

libtaskotron.spec
172

Please add a ref to this Diff

This revision is now accepted and ready to land.Tue, May 23, 4:44 AM
jskladan accepted this revision.Tue, May 23, 8:56 AM
  • inner runner :(

Yo dawg, we heard you like playbooks. So we put a playbook, in yo playbook, so you can ansible while ansibling already!

Looks good to me. I expect we'll hit plenty of things we did not think of, along the way, but this is IMO perfectly good starting point.