depcheck: tags and timing

In the previous three blog posts I talked quite a bit about the depcheck test itself. But tests don’t run in a vacuum – there are a lot of moving pieces around the test which have to work properly for the test to provide accurate – and useful – results.

d(repo)/dt

First let’s talk about how builds become updates – that is, the steps between a new package getting built, and a new update appearing in your friendly Software Updater app. Behold the terrifying extent of my Inkscape skills:

New packages get built in Koji, and then when the maintainers are happy with a build, they file an Update Request in Bodhi. Normally they request that the update go into the updates-testing repository, so testers can poke at it for a while and make sure it works, and then the maintainer requests that it go to the official updates repository. And then it shows up in Software Update.

Note that the maintainer makes the request, and that makes Bodhi tag the package as pending – but only members of the Release Engineering (rel-eng) team can actually push packages into updates-testing or updates. So in the end, it’s up to the Release Engineers to decide when a package is ready for release – based on automated test results, feedback from testers and developers, and their own best judgement.
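To make that division of labour concrete, here is a minimal Python sketch of the workflow as described above. The Update class, the role strings, and the request/push functions are all made up for illustration; none of this is Bodhi's or Koji's real API.

```python
from dataclasses import dataclass
from typing import Optional

REPOS = ("updates-testing", "updates")


@dataclass
class Update:
    build: str                         # name of the build (hypothetical field)
    repo: Optional[str] = None         # repo the build currently lives in, if any
    pending_for: Optional[str] = None  # repo the maintainer has asked for, if any


def request(update: Update, role: str, target: str) -> None:
    """Maintainer asks for a push; this only tags the update as pending."""
    if role != "maintainer":
        raise PermissionError("only the maintainer files update requests")
    if target not in REPOS:
        raise ValueError(f"unknown repo: {target}")
    update.pending_for = target


def push(update: Update, role: str) -> None:
    """Rel-eng moves a pending update into the repo that was requested."""
    if role != "rel-eng":
        raise PermissionError("only rel-eng can push updates")
    if update.pending_for is None:
        raise ValueError("nothing is pending for this update")
    update.repo, update.pending_for = update.pending_for, None


update = Update("martini-1.0")
request(update, "maintainer", "updates-testing")
try:
    push(update, "depcheck")  # a test can approve, but it cannot push
except PermissionError as err:
    print(err)
push(update, "rel-eng")
print(update.repo)
```

The point of the sketch is only that a request and a push are separate steps owned by different people, which is why a test like depcheck can approve an update but never move it.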

This also means that depcheck can’t move packages into updates or updates-testing. It can mark them as approved – for example, by providing Bodhi feedback, or keeping test results somewhere – but it can’t actually move packages out of the pending list by itself.

Acceptance

This presents a problem. What happens if a pending update is accepted by depcheck, but before rel-eng pushes it into the wild, another update appears in pending and somehow conflicts with the first one?

For example: let’s say the martini package requires gin and vermouth. But then someone introduces a new package – vodka – which Obsoletes: gin. (Yes, I know – this would be wrong and horrible and should obviously be forbidden by Fedora policy and/or the Geneva Convention. But let’s put that aside for now.) If martini has already been accepted when vodka arrives, accepting vodka too would remove gin and leave martini with a broken dependency.

So – what do we do? We could try to go back and revoke our previous test result, which would leave one or both updates unaccepted. We’d need to write a bunch of new code to be able to revoke test results, and that would leave us with fewer packages being accepted. This seems... less than desirable.

Another solution – the one that I prefer – is that once a package is accepted, we should leave it accepted. Basically we’d treat it as if it was already part of the live repos. This makes sense: since the goal is always to have a consistent set of packages, once a package is accepted as being consistent we shouldn’t mess with that. (Plus, it’s probably about to be pushed by rel-eng anyway, so it’s not unreasonable to treat it as if it already has been pushed live.) So in practice, this means the first test result takes priority, and we just don’t revoke accepted packages. This makes the code simpler, and it should mean packages get accepted more quickly.

To handle this, we need to be able to split the pending list into two parts: pending and accepted. depcheck treats the packages in accepted like the packages in the live repos, and doesn’t provide test results for them (obviously – they’ve all passed already!). Only the unapproved pending packages actually get test results.

This doesn’t lock the packages or anything like that. They can still be removed from pending by the usual means – obsoleted by the maintainer submitting a newer package, dropped because of bad karma in Bodhi, forcibly removed by rel-eng, etc. Being accepted just means that the package has passed AutoQA and is eligible to be pushed by rel-eng – if they see fit.

Timing: more than just the secret to comedy

There’s another wrinkle. The AutoQA system runs all tests independently of each other. This is nice because it means we can run a lot of tests in parallel, but it also means that we can have multiple depcheck tests running at the same time. Which presents a problem: what if there are two depcheck tests running at the same time, and one test marks some packages as accepted while the other test is still running? What should the other test do?

This is a classic concurrency problem, and there are a lot of different ways to resolve it – usually involving locking or looping or both. We had a few ideas for simple solutions – for example, we could restart the test if the list of accepted packages changed during the test. Except what if we get stuck in a loop? And also this would change the test results in some cases – so even though it’s the same test and the same packages, the results would be different if Test #1 finishes before Test #2 starts. Why should test timing affect the test results?

After a lot of whiteboard sketching and hand-wringing and test code, we realized that the simplest solution is: just don’t run depcheck tests in parallel. (At least, not tests for the same release – we can still run depchecks for Fedora 13 alongside Fedora 14 tests, since they don’t interact at all.) True, this is less efficient, but the current runtime for a depcheck test is something like 50-60 seconds. During our busiest time ever, we pushed 1300 updates through Bodhi in a month. This works out to 43 updates a day – or somewhere between 35 and 45 minutes of depcheck test time daily, on average. Even if we had a huge burst of updates – say 250 updates submitted simultaneously – and for some reason depcheck took 10x longer than I expect (about 10 minutes per test, or 144 tests in 24 hours), rel-eng only pushes updates once a day anyway! So by the time of the first rel-eng push we’d have processed ~144 of the updates, and the rest would be done the next day. So even in a worst-case scenario the outcome is: fewer than half of the updates get delayed by one day. That’s it!
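The back-of-the-envelope numbers above can be reproduced with a few lines of Python. The 30-day month, the 24-hour window before the first push, and the function names are assumptions made for this sketch; the other figures come straight from the paragraph above.

```python
UPDATES_PER_MONTH = 1300
DAYS_PER_MONTH = 30          # assumption: a 30-day month
SECONDS_PER_DAY = 24 * 60 * 60


def daily_test_minutes(updates_per_day, seconds_per_test):
    """Minutes of serial depcheck time needed per day."""
    return updates_per_day * seconds_per_test / 60


def processed_before_push(seconds_per_test, slowdown=1,
                          window_seconds=SECONDS_PER_DAY):
    """How many serial tests fit in the window before the next push."""
    return window_seconds // (seconds_per_test * slowdown)


# Average load: whole updates per day, then the 50-60 second range.
updates_per_day = UPDATES_PER_MONTH // DAYS_PER_MONTH
print(updates_per_day)
print(daily_test_minutes(updates_per_day, 50))
print(daily_test_minutes(updates_per_day, 60))

# Worst case: 250 updates at once, each test 10x slower than 60 seconds.
BURST = 250
processed = processed_before_push(60, slowdown=10)
delayed = BURST - processed
print(processed, delayed, delayed / BURST)
```

With these inputs the worst case leaves 106 of the 250 updates waiting for the next day's push, which is where the "fewer than half" figure comes from.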

In the future we will definitely want to figure out a general strategy for handling tests that want to share information and need some locking/concurrency magic. But this turns out to be unnecessary for depcheck to function correctly and quickly. So we’ll leave it alone. For now.

So what’s left?

Not much! We should be ready to start running depcheck on new updates – in a purely informative manner – in the next couple of days. And once we’re pretty sure the results are right, we’ll start work to make Bodhi only show accepted updates to rel-eng. If it all works as it should, we should be able to use this info to keep broken updates from getting into the live repos ever again. Won’t that be nice?

3 thoughts on “depcheck: tags and timing”

  1. Thanks for the blogpost, one question:

    “Another solution – the one that I prefer – is that once a package is accepted, we should leave it accepted.”

    “This doesn’t lock the packages or anything like that. It can still be removed from pending by the usual means – [skipped, see below], dropped because of bad karma in Bodhi, forcibly removed by rel-eng, etc.”

    But this means even some previously pending-accepted packages may become broken, right? That means that we can’t rely on “once accepted, always accepted”, can we? We have to retest even the whole pending-accepted set (and possibly remove some packages from the accepted tag) after such an event.

    “obsoleted by the maintainer submitting a newer package”

    This particularly supports my argument. Imagine that we currently have these NEW packages in pending-accepted:
    gin-1.0 (requires tonic-1.0)
    vodka-1.0 (requires tonic-1.0)
    tonic-1.0

    Now gin maintainer wants to obsolete gin-1.0 and pushes:
    gin-2.0 (requires tonic-2.0)
    tonic-2.0

    That breaks vodka-1.0. We can’t prevent package obsoletion, so our only option is to retest the whole pending set (including pending-accepted, without assuming those packages are already OK) and declare vodka-1.0 broken.

    Conclusion would be: if the -pending-accepted set changes (through some external deus ex machina, like rel-eng), reset it completely (make it empty).

    Is that right?

  2. “But this means even some previously pending-accepted packages may become broken, right?”

    Not by maintainer-driven action – any newly-proposed -pending package that causes problems with a live package or an accepted update will be rejected.

    (Keep in mind that depcheck basically does yum --skip-broken upgrade [PENDING..], assuming that all the packages in all the live repos and all the accepted packages are installed on the system. Yum will skip (i.e. reject) anything that interferes with those packages.)

    “Now gin maintainer wants to obsolete gin-1.0 and pushes:
    gin-2.0 (requires tonic-2.0)
    tonic-2.0”

    Right – these updates should be rejected. First gin-2.0 would be rejected (since tonic-2.0 isn’t present), then tonic-2.0 would be rejected (since it breaks the accepted vodka-1.0 update). The maintainer would need to revoke the vodka-1.0 update to make this work.

    But you’re right that the system is breakable – it’s still possible that we’ll end up with things in an inconsistent state if rel-eng pushes/untags packages without making sure they’re consistent first. So.. hopefully they won’t do that much.

    It’d probably be smart to have a depcheck frontend designed for letting rel-eng check packages before they force-push or force-untag them.

  3. I still see a problem when packages are unpushed – that could happen because of bad karma, a manual request from the maintainer, or rel-eng action. If any package in -pending-accepted is unpushed, I believe we have to detect that and invalidate the whole -accepted set.
