In the previous three blog posts I talked quite a bit about the depcheck test itself. But tests don’t run in a vacuum – there are a lot of moving pieces around the test which have to work properly for the test to provide accurate – and useful – results.
First let’s talk about how builds become updates – that is, the steps between a new package getting built, and a new update appearing in your friendly Software Updater app. Behold the terrifying extent of my Inkscape skills:
New packages get built in Koji, and then when the maintainers are happy with a build, they file an Update Request in Bodhi. Normally they request that the update go into the updates-testing repository, so testers can poke at it for a while and make sure it works, and then the maintainer requests that it go to the official updates repository. And then it shows up in Software Update.
Note that the maintainer makes the request, and that makes Bodhi tag the package as pending – but only members of the Release Engineering (rel-eng) team can actually push packages into updates. So in the end, it’s up to the Release Engineers to decide when a package is ready for release – based on automated test results, feedback from testers and developers, and their own best judgement.
This also means that depcheck can’t move packages into updates-testing. It can mark them as approved – for example, by providing Bodhi feedback, or keeping test results somewhere – but it can’t actually move packages out of the pending list by itself.
This presents a problem. What happens if a pending update is accepted by depcheck, but before rel-eng pushes it into the wild, another update appears in pending and somehow conflicts with the first one?
For example: let’s say the martini package requires gin. But then someone introduces a new package – vodka – which Obsoletes: gin. (Yes, I know – this would be wrong and horrible and should obviously be forbidden by Fedora policy and/or the Geneva Convention. But let’s put that aside for now.)
So – what do we do? We could try to go back and revoke our previous test result, which would leave one or both updates unaccepted. We’d need to write a bunch of new code to be able to revoke test results, and that would leave us with fewer packages being accepted. This seems... less than desirable.
Another solution – the one that I prefer – is that once a package is accepted, we should leave it accepted. Basically we’d treat it as if it was already part of the live repos. This makes sense: since the goal is always to have a consistent set of packages, once a package is accepted as being consistent we shouldn’t mess with that. (Plus, it’s probably about to be pushed by rel-eng anyway, so it’s not unreasonable to treat it as if it already has been pushed live.) So in practice, this means the first test result takes priority, and we just don’t revoke accepted packages. This makes the code simpler and it should mean we get packages being accepted quicker.
To handle this, we need to be able to split the pending list into two parts: the accepted packages, and the pending packages that are still unapproved.
depcheck treats the packages in accepted like the packages in the live repos, and doesn’t provide test results for them (obviously – they’ve all passed already!). Only the unapproved pending packages actually get test results.
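As a rough illustration of that split, here is a minimal sketch. The function names, the plain-string package identifiers and the idea of keeping accepted names in a set are all assumptions made for the example, not how depcheck or Bodhi actually store this information.

```python
def split_pending(pending, accepted_names):
    """Split the pending list into accepted and still-unapproved updates.

    `accepted_names` is a hypothetical set of updates that already passed.
    """
    accepted = [u for u in pending if u in accepted_names]
    unapproved = [u for u in pending if u not in accepted_names]
    return accepted, unapproved


def plan_depcheck(live, pending, accepted_names):
    """Return (baseline, to_test) for one hypothetical depcheck run."""
    accepted, unapproved = split_pending(pending, accepted_names)
    # Accepted updates are treated as if they were already in the live repos.
    baseline = list(live) + accepted
    # Only the unapproved updates get new test results.
    return baseline, unapproved


baseline, to_test = plan_depcheck(
    live=["gin-1.0"],
    pending=["martini-2.0", "vodka-1.0"],
    accepted_names={"martini-2.0"},
)
print(baseline)  # ['gin-1.0', 'martini-2.0']
print(to_test)   # ['vodka-1.0']
```

Since martini-2.0 was accepted first, it becomes part of the baseline that vodka-1.0 is checked against, which is the "first test result takes priority" rule in miniature.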
This doesn’t lock the packages or anything like that. A package can still be removed from pending by the usual means – obsoleted by the maintainer submitting a newer package, dropped because of bad karma in Bodhi, forcibly removed by rel-eng, etc. Being accepted just means that the package has passed AutoQA and is eligible to be pushed by rel-eng – if they see fit.
Timing: more than just the secret to comedy
There’s another wrinkle. The AutoQA system runs all tests independently of each other. This is nice because it means we can run a lot of tests in parallel, but it also means that we can have multiple depcheck tests running at the same time. Which presents a problem: what if there are two depcheck tests running at the same time, and one test marks some packages as accepted while the other test is still running? What should the other test do?
This is a classic concurrency problem, and there are a lot of different ways to resolve it – usually involving locking or looping or both. We had a few ideas for simple solutions – for example, we could restart the test if the list of accepted packages changed during the test. Except what if we get stuck in a loop? And also this would change the test results in some cases – so even though it’s the same test and the same packages, the results would be different if Test #1 finishes before Test #2 starts. Why should test timing affect the test results?
After a lot of whiteboard sketching and hand-wringing and test code, we realized that the simplest solution is: just don’t run depcheck tests in parallel. (At least, not tests for the same release – we can still run depchecks for Fedora 13 alongside Fedora 14 tests, since they don’t interact at all.) True, this is less efficient, but the current runtime for a depcheck test is something like 50-60 seconds. During our busiest time ever, we pushed 1300 updates through Bodhi in a month. This works out to 43 updates a day – or somewhere between 35 and 45 minutes of depcheck test time daily, on average. Even if we had a huge burst of updates – say 250 updates submitted simultaneously – and for some reason depcheck takes 10x longer than I expect, rel-eng only pushes updates once a day anyway! So by the time of the first rel-eng push we’d have processed ~144 of the updates, and the rest would be done the next day. So even in a worst-case scenario the outcome is: fewer than half of the updates get delayed by one day. That’s it!
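The back-of-the-envelope numbers above can be reproduced with a few lines of Python. Two things here are assumptions rather than figures from the measurements: a 30-day month, and using the upper end of the runtime (60 seconds) for the "10x slower" case, which is what gives roughly 10 minutes per test.

```python
SECONDS_PER_TEST = (50, 60)   # observed depcheck runtime range
UPDATES_PER_MONTH = 1300      # busiest month so far
DAYS_PER_MONTH = 30           # assumption: a 30-day month

# Average load: about 43 updates a day.
updates_per_day = UPDATES_PER_MONTH / DAYS_PER_MONTH
daily_minutes = tuple(updates_per_day * s / 60 for s in SECONDS_PER_TEST)

# Worst case: 250 updates at once, each test taking 10x the upper runtime.
BURST = 250
slow_seconds = 10 * SECONDS_PER_TEST[1]                 # 600 s, i.e. 10 minutes
processed_first_day = min(BURST, (24 * 60 * 60) // slow_seconds)
delayed = BURST - processed_first_day

print(round(updates_per_day))                 # 43
print([round(m, 1) for m in daily_minutes])   # [36.1, 43.3]
print(processed_first_day, delayed)           # 144 106
```

With those assumptions, 106 of the 250 updates (about 42%) slip to the next day's push, which matches the "fewer than half" estimate.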
In the future we will definitely want to figure out a general strategy for handling tests that want to share information and need some locking/concurrency magic. But this turns out to be unnecessary for depcheck to function correctly and quickly. So we’ll leave it alone. For now.
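For the simple rule we settled on (one depcheck at a time per release, different releases in parallel), the idea can be sketched with one lock per release. This is only an in-process illustration using Python threads: the `run_serialized` helper and the release keys are hypothetical, and a real AutoQA setup that spreads tests across machines would need some other mechanism, such as a shared queue or an external lock.

```python
import threading
from collections import defaultdict

# One lock per release name, created on first use (hypothetical registry).
_release_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def run_serialized(release, test_func, *args):
    """Run test_func so that only one test per release runs at a time."""
    # Guard the registry so two threads never create two locks for one release.
    with _registry_guard:
        lock = _release_locks[release]
    # Tests for the same release queue up here; other releases are unaffected.
    with lock:
        return test_func(*args)


def fake_depcheck(update):
    # Stand-in for a real depcheck run.
    return f"checked {update}"


print(run_serialized("f14", fake_depcheck, "martini-2.0"))  # checked martini-2.0
```

Calls for "f13" and "f14" take different locks and can overlap, while two calls for "f14" run one after the other.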
So what’s left?
Not much! We should be ready to start running depcheck on new updates – in a purely informative manner – in the next couple of days. And once we’re pretty sure the results are right, we’ll start work to make Bodhi only show accepted updates to rel-eng. If it all works as it should, we should be able to use this info to keep broken updates from getting into the live repos ever again. Won’t that be nice?