fedup: a little background

A short history of upgrade tools

In the beginning, the Installer was all. If you wanted to upgrade a Red Hat / Fedora system you downloaded the CD images, burned them all to CDs, and sat around swapping disks in and out until your system was upgraded. What fun!

Five years ago I was a QA guy, doing lots and lots of upgrade testing like this. Eventually, after burning a couple thousand CDs, I had this idea: hey, it’d be cool if you could upgrade your system by running some command that would set everything up for you and do the upgrade, without having to download all these ISOs and burn them to CDs. A small pile of gross hacks later, it turned out it was actually possible… and thus PreUpgrade was born.

Fast-forward to last year: I’m now actually a developer on the Installer team. We’re in the middle of basically completely rewriting everything and we keep bumping into weirdo corner cases for upgrades. And we had this idea: hey, it’d be cool if someone rewrote PreUpgrade so it wasn’t a pile of gross hacks, and instead it was just a separate thing that did upgrades. And then we wouldn’t need to deal with a lot of special “if upgrade: …” cases all over the place. Hooray!

And so it was decided: the new installer doesn’t handle upgrades. At all. Something new is required. And that something is…

Fedora Upgrader – “fedup” for short

The premise is simple: There’s a frontend bit that sets up your system for the upgrade (downloads packages, sets up kernel & initrd, modifies bootloader entries, etc.). That’s fedup. And then there’s the “upgrader image” – the thing that fedup downloads to actually run the upgrade. That’s fedup-dracut. As the name implies, it’s just a regular dracut-built initramfs, with some extra bits and pieces thrown in.

The process works something like this:

fedup

  1. downloads packages and kernel + upgrade.img from the new release
  2. sets up some directories and files so fedup-dracut knows what to do
  3. sets up the bootloader config so it will boot the new kernel + upgrade.img
  4. reboots
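
(In practice, the whole frontend dance above starts with a single command – something like “sudo fedup --network 18” to pull everything from the network repos. The exact flags may well change before Final, so treat that as illustrative rather than gospel.)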

fedup-dracut

  1. starts the system and finds your root partition (like normal boot)
  2. enters your old system and sets up the disks (like normal boot)
  3. systemd trickery: If the files that fedup set up are present, we return to fedup-dracut and run special dracut hooks:
    1. upgrade-pre: for pre-upgrade migration tasks (nothing here yet)
    2. upgrade: for the actual upgrade (via system-upgrade-fedora)
    3. upgrade-post: for cleanup etc. (we save the journal to /var/log/upgrade.journal, and write /var/log/upgrade.log for those of you who don’t like the journal)
  4. reboots back to your newly-upgraded system

Part of the goal was to make the process distro-neutral. Other than system-upgrade-fedora, none of this is really Fedora-specific. The upgrade framework itself (the dracut hooks and the systemd trickery) should apply to any distro using systemd and dracut.

The details are a bit vague here because the design isn’t finalized. The current (working prototype) design involves bind-mounts and doing systemctl switch-root twice, which has caused us a bunch of problems. We’ve got some workarounds for this for Fedora 18 Beta but the design will likely change a little between now and Final.

FUDCon Blacksburg

So. FUDCon! You’re all probably tired of hearing about it already so I’ll spare you the personal details and get straight to the interesting stuff.

Installer Team North America Jamboree

New UI

Progress on the new UI continues unabated, but it’s just not gonna be done for F17. New target: F18. Woo!

Installer image resplit

My work to resplit the installer in two stages (and kill off the ancient crusty “loader” initrd environment in favor of dracut) has already partially landed in Rawhide, and we decided to commit to getting it finished for F17 if at all possible. This will mean:

  • Greatly reduced installer memory use! Definitely under 512MB for media-based installs, and possibly under 256MB in optimal conditions. We’ll keep testing and looking for ways to reduce this further, but even as-is it’s a damn sight better than F15/F16.
  • Lots of 10-year-old code gets deleted or rewritten and moved upstream! Hooray for sharing code and maintenance burdens!
  • The installer should be able to mount anything that your normal system can mount, since we’re using dracut just like your normal system does. Which leads us to…

In the glorious future of F18 there is only preupgrade

(update Nov 13 2012: fedup is totally a real thing now)

Upgrades in F18 will be handled by a totally separate codepath and runtime image, and all upgrades will happen PreUpgrade-style (i.e. you start the upgrade by running a program on your existing system).

You’ll still be able to run an upgrade from media if you want – the upgrader will be right there on the DVD/CD for you to run. But the installer won’t try to find any existing copies of Fedora to upgrade. If you want to upgrade, run the upgrader. If you want to do a fresh install, run the installer.

If you’re suddenly gasping for breath and thinking “OH NO MY /BOOT PARTITION IS TOO SMALL”: relax, friend. If we’re using the normal dracut initramfs to start the installer and upgrader, that means the upgrader will know exactly how to mount your root filesystem – even if it’s some magical encrypted-RAID10-double-LVM-with-cherries-on-top craziness. So we can store the upgrader runtime and packages basically wherever you have room for them.

Smolt Chapter IV: A New Hope

I am (ostensibly) the smolt maintainer. If you’ve filed any smolt bugs recently: Sorry! I don’t have a lot of time for smolt and it’s surprisingly hard to maintain in its current form.

But Nathaniel McCallum and I got to talking during FUDCon about how smolt could be made simpler and more powerful at the same time. The next day when I went to talk to him further about it, he was like: “oh yeah, so I implemented a really simple client and server along the lines of what we were talking about yesterday, so we just need to find somewhere to host it…”

And someone nearby said: “Why not just use OpenShift?” So, predictably, the next day Nathaniel was like: “So I got the server set up in OpenShift…”

So yeah. One weekend and we already have a live, mostly-functional prototype of a replacement for smolt. I’ll post more about it when we’re ready to start doing demos or asking for feedback but I’m kind of excited about it. If only I had more time…

I’ve never had business cards before!

I found some mail in my inbox this morning. Like, actual physical mail in my physical inbox. Weird!

An otherwise-blank envelope from a nondescript address in Kentucky? Containing… cards? What is this?

OH VERY YES.

These are, without a doubt, the finest business cards known to man. Just look at the smiling beefiness of my business card. Look at his bunly goodness and piquant mustard.

Look upon the Hot Dog, ye mighty, and despair.

vim syntax highlighting for kickstart files

“Aaargh dammit vim”, I said to myself on Monday, as I started trying to edit a kickstart file. “Why do you keep thinking that the // in that URL is the beginning of a comment? This isn’t C++. You’re stupid and I hate you.”

Fast-forward through a good 16 hours of frenzied hacking and much head-scratching, and voilà: vim syntax highlighting for kickstart files.

To install:
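
(The usual recipe for a standalone vim syntax plugin, assuming the syntax file is named kickstart.vim: drop it into ~/.vim/syntax/, then create ~/.vim/ftdetect/kickstart.vim containing one line –

    au BufNewFile,BufRead *.ks set filetype=kickstart

– so vim applies the highlighting to .ks files automatically.)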

Enjoy!

depcheck: tags and timing

In the previous three blog posts I talked quite a bit about the depcheck test itself. But tests don’t run in a vacuum – there are a lot of moving pieces around the test which have to work properly for the test to provide accurate – and useful – results.

d(repo)/dt

First let’s talk about how builds become updates – that is, the steps between a new package getting built, and a new update appearing in your friendly Software Updater app. Behold the terrifying extent of my Inkscape skills:

[diagram: Koji build → Bodhi update request (pending) → updates-testing → updates → Software Update]

New packages get built in Koji, and then when the maintainers are happy with a build, they file an Update Request in Bodhi. Normally they request that the update go into the updates-testing repository, so testers can poke at it for a while and make sure it works, and then the maintainer requests that it go to the official updates repository. And then it shows up in Software Update.

Note that the maintainer makes the request, and that makes Bodhi tag the package as pending – but only members of the Release Engineering (rel-eng) team can actually push packages into updates-testing or updates. So in the end, it’s up to the Release Engineers to decide when a package is ready for release – based on automated test results, feedback from testers and developers, and their own best judgement.

This also means that depcheck can’t move packages into updates or updates-testing. It can mark them as approved – for example, by providing Bodhi feedback, or keeping test results somewhere – but it can’t actually move packages out of the pending list by itself.

Acceptance

This presents a problem. What happens if a pending update is accepted by depcheck, but before rel-eng pushes it into the wild, another update appears in pending and somehow conflicts with the first one?

For example: let’s say the martini package requires gin and vermouth. But then someone introduces a new package – vodka – which Obsoletes: gin. (Yes, I know – this would be wrong and horrible and should obviously be forbidden by Fedora policy and/or the Geneva Convention. But let’s put that aside for now.)

So – what do we do? We could try to go back and revoke our previous test result, which would leave one or both updates unaccepted. But we’d need to write a bunch of new code to be able to revoke test results, and we’d end up with fewer packages being accepted. This seems… less than desirable.

Another solution – the one that I prefer – is that once a package is accepted, we should leave it accepted. Basically we’d treat it as if it was already part of the live repos. This makes sense: since the goal is always to have a consistent set of packages, once a package is accepted as being consistent we shouldn’t mess with that. (Plus, it’s probably about to be pushed by rel-eng anyway, so it’s not unreasonable to treat it as if it already has been pushed live.) So in practice, this means the first test result takes priority, and we just don’t revoke accepted packages. This makes the code simpler and it should mean we get packages being accepted quicker.

To handle this, we need to split the pending list into two parts: pending and accepted. depcheck treats the packages in accepted like the packages in the live repos, and doesn’t provide test results for them (obviously – they’ve all passed already!). Only the not-yet-accepted pending packages actually get test results.
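
In pseudocode-ish Python, that bookkeeping is about this simple (the names here are made up for illustration – the real logic lives in the depcheck test in AutoQA):

    def update_accepted(accepted, pending, outcomes):
        # First PASSED result wins; accepted packages are never revoked.
        for build in pending:
            if build in accepted:
                continue  # treated like the live repos: no new result
            if outcomes.get(build) == 'PASSED':
                accepted.add(build)
        return accepted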

This doesn’t lock the packages or anything like that. An accepted package can still be removed from pending by the usual means – obsoleted by the maintainer submitting a newer package, dropped because of bad karma in Bodhi, forcibly removed by rel-eng, etc. Being accepted just means that the package has passed AutoQA and is eligible to be pushed by rel-eng – if they see fit.

Timing: more than just the secret to comedy

There’s another wrinkle. The AutoQA system runs all tests independently of each other. This is nice because it means we can run a lot of tests in parallel, but it also means we can have multiple depcheck tests running at the same time. Which presents a problem: what if one test marks some packages as accepted while another test is still running? What should the other test do?

This is a classic concurrency problem, and there are a lot of different ways to resolve it – usually involving locking or looping or both. We had a few ideas for simple solutions – for example, restarting the test if the list of accepted packages changed mid-run. Except what if we get stuck in a restart loop? And it would also change the test results in some cases – the same test on the same packages would give different results depending on whether Test #1 finished before Test #2 started. Why should test timing affect the test results?

After a lot of whiteboard sketching and hand-wringing and test code, we realized that the simplest solution is: just don’t run depcheck tests in parallel. (At least, not tests for the same release – we can still run depchecks for Fedora 13 alongside Fedora 14 tests, since they don’t interact at all.) True, this is less efficient, but the current runtime for a depcheck test is something like 50-60 seconds. During our busiest time ever, we pushed 1300 updates through Bodhi in a month. This works out to 43 updates a day – or somewhere between 35 and 45 minutes of depcheck test time daily, on average. Even if we had a huge burst of updates – say 250 updates submitted simultaneously – and for some reason depcheck takes 10x longer than I expect, rel-eng only pushes updates once a day anyway! So by the time of the first rel-eng push we’d have processed ~144 of the updates, and the rest would be done the next day. So even in a worst-case scenario the outcome is: Less than half of the updates get delayed by one day. That’s it!
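
If you’re wondering what “don’t run them in parallel” looks like in practice, a per-release lockfile is about all it takes. A minimal sketch (not the actual AutoQA code):

    import fcntl

    def run_depcheck(release, run_test):
        # One lock per release: Fedora 13 and Fedora 14 runs don't
        # interact, so they can still proceed side by side.
        with open('/var/lock/depcheck-%s.lock' % release, 'w') as lockfile:
            fcntl.flock(lockfile, fcntl.LOCK_EX)  # block until the current run finishes
            return run_test(release)  # lock is dropped when the file closes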

In the future we will definitely want to figure out a general strategy for handling tests that want to share information and need some locking/concurrency magic. But this turns out to be unnecessary for depcheck to function correctly and quickly. So we’ll leave it alone. For now.

So what’s left?

Not much! We should be ready to start running depcheck on new updates – in a purely informative manner – in the next couple of days. And once we’re pretty sure the results are right, we’ll start work to make Bodhi only show accepted updates to rel-eng. If it all works as it should, we should be able to use this info to keep broken updates from getting into the live repos ever again. Won’t that be nice?

New job – and a new job opening at Red Hat

Eight releases ago (or 4 years, if your time isn’t measured in Fedora releases) I took the role of Fedora QA Lead. Or maybe “Fedora Test Lead”. It was kind of vague.

See, there wasn’t an official title, because it was a brand-new position. And there was no Feature process, no Release Criteria, no Test Day – no test plans at all – no proventesters, no Bodhi, no Koji. Everything got built by Red Hat employees, behind the Red Hat firewall, tested a bit (haphazardly, in whatever spare time the developers could muster) and pushed out to the public.

Things have come a long, long way since then, and I’m really proud of the things we’ve built and accomplished in Fedora QA. I’d like to take a minute to say “thanks” to everyone who’s contributed – anyone who’s participated in a test day, or written a test case, or pulled packages from updates-testing and given Bodhi feedback, or downloaded an Alpha/Beta/RC image and helped with the test matrix, or triaged a bug, or filed a bug, or helped someone else fix a bug.

My job changed over time to focus on the AutoQA project, and I’d also like to say “thanks” to everyone who’s given me ideas and suggestions along the way there. (And a huge thanks to James Laska, Kamil Paral, and Josef Skladanka for making these ideas actually work.)

Anyway, as the title suggests, I am indeed leaving Fedora QA. But I’m not going real far – I’m moving to Red Hat Engineering, and joining the Installer team. And this means that Red Hat is looking for someone to help lead the Fedora QA Automation efforts into the Glorious Future I keep promising. And that could be you.

If you want to work for (in my humble, personal opinion) a truly awesome company, and work with some brilliant, talented people, and get paid to write Free Software – and you think you have the skills and adaptability needed to do it – then check out the job posting. And if it feels right, apply.

I’ll be staying in QA until we get the depcheck test running – this was one of my original goals for Fedora QA and I’m not done until it is. And after that, it’s onward to causing – er, fixing – problems at the source. Wish me luck!

depcheck: the why and the how (part 3)

In part 1 I talked about the general idea of the depcheck test, and part 2 got into some of the messy details. If you’d like a more detailed look at how depcheck should operate – using some simplified examples of problems we’ve actually seen in Fedora – you should check out this document and the super-fancy Inkscape drawings therein.

Now let’s discuss a couple of things that depcheck (and AutoQA in general) doesn’t do yet.

Handling (or Not Handling) File Conflicts

As mentioned previously, depcheck is not capable of catching file conflicts. It’s outside the scope of the test, mostly due to the fact that depcheck is yum-based, and yum itself doesn’t handle file conflicts. To check for file conflicts, yum actually just downloads all the packages to be updated and tells RPM to check them.[1] RPM then reads the actual headers contained in the downloaded files and uses its complex, twisty algorithms (including the multilib magic described elsewhere) to decide whether it also thinks this update transaction is OK. This happens completely outside of yum – only RPM can correctly detect file conflicts.
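
For the curious, here’s roughly what “tell RPM to check them” looks like with the rpm Python bindings – a sketch of the mechanism, not depcheck code:

    import os
    import rpm

    def rpm_test_transaction(package_paths):
        # Ask RPM to dry-run a transaction over the downloaded packages.
        # Only RPM, reading the real file lists out of the package headers,
        # can catch file conflicts -- yum's depsolver never sees them.
        ts = rpm.TransactionSet()
        ts.setVSFlags(rpm._RPMVSF_NOSIGNATURES)  # skip sig checks for the sketch
        for path in package_paths:
            fd = os.open(path, os.O_RDONLY)
            hdr = ts.hdrFromFdno(fd)
            os.close(fd)
            ts.addInstall(hdr, path, 'u')  # 'u' = install-or-upgrade
        ts.setFlags(rpm.RPMTRANS_FLAG_TEST)  # check everything, change nothing
        problems = ts.run(lambda *args: None, '')
        return problems or []  # an empty list means RPM is happy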

So if we want to correctly catch file conflicts, we need to make RPM do the work. The obvious solution would be to trick RPM the same way we trick yum in depcheck – that is, by making RPM think all the available packages in the repos are installed on the system, so it will check the new updates against all existing, available packages.

Unfortunately, it turns out to be significantly harder to lie to RPM about what’s installed on the system. All the data that yum requires in order to simulate having a package installed is in the repo metadata, but the data RPM needs is only available from the packages themselves. So the inescapable conclusion is: right now, to do the job correctly and completely, a test to prevent file conflicts would need to examine all 20,000+ available packages every time it ran.

We could easily have a simpler test that just uses the information available in the yum repodata, and merely warns package maintainers about possible file conflicts.[2] But turning this on too soon might turn out to do more harm than good: the last thing we want to do is overwhelm maintainers with false positives, and have them start ignoring messages from AutoQA. We want AutoQA to be trustworthy and reliable, and that means making sure it’s doing things right, even if that takes a lot longer.

In the meantime, I’m pretty sure depcheck is correctly catching the problems it’s designed to catch. It’ll need some testing but soon enough it will be working exactly how we want. Then the question becomes: how do we actually prevent things that are definitely broken from getting into the live repos?

Infrastructure Integration, or: Makin’ It Do Stuff

A little bit of background: the depcheck test is part of the Fedora QA team’s effort to automate the Package Update Acceptance Test Plan. This test plan outlines a set of (very basic) tests which we use to decide whether a new update is ready to be tested by the QA team. (Please note that passing the PUATP[3] does not indicate that the update is ready for release – it just means the package is eligible for actual testing.)

So, OK, we have some tests – depcheck, rpmguard, and others to come – and they either pass or fail. But what do we do with this information? Obviously we want to pass the test results back to the testers and the Release Engineering (rel-eng) team somehow – so the testers know which packages to ignore, and rel-eng knows which packages are actually acceptable for release. For the moment the simplest solution is to let the depcheck test provide karma in Bodhi – basically a +1 vote for packages that pass the test and no votes for packages that don’t.
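
With python-fedora, leaving that karma would be just a few lines – a sketch, assuming a test account with Bodhi credentials (the update title is made up, and the method signature is from memory, so double-check against the python-fedora docs):

    from fedora.client.bodhi import BodhiClient

    bodhi = BodhiClient(username='autoqa')  # prompts for a password
    # +1 karma for an update that passed; failures simply get no vote
    bodhi.comment('martini-2.3-1.fc14', 'depcheck: PASSED', karma=1)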

Once we’re satisfied that depcheck is operating correctly, and we’ve got it providing proper karma in Bodhi when updates pass the test, we’ll add a little code to Bodhi so it only shows depcheck-approved updates to rel-eng. They can still choose to push out updates that don’t pass depcheck if necessary, but by default packages that fail depcheck will be ignored (and their maintainers notified of the failure). If the package later has its dependencies satisfied and passes depcheck, the maintainer may be notified that all is well and no action is necessary.[4]

The Glorious Future of QA Infrastructure (pt. 1: Busy Bus)

If you’ve hung around anyone from the QA or rel-eng or Infrastructure teams for any amount of time, you’ve probably heard us getting all worked up about The Fedora Messagebus. But for good reason! It’s a good idea! And not hard to understand:

The Fedora Messagebus is a service that gets notifications when Things Happen in the Fedora infrastructure, and relays them to anyone who might be listening. For example, we could send out messages when a new build completes, or a new update request is filed, or a new bug is filed, or a test completes, or whatever. These messages will contain some information about the event – package names, bug IDs, test status, etc. (This will also allow you to go to the source to get further information about the event, if you like.) The messagebus will be set up such that anyone who wants to listen for messages can listen for whatever types of messages they are interested in – so we could (for example) have a build-watcher applet that lives in your system tray and notifies you when your builds finish. Or whenever there’s a new kernel build. Or whatever!
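
To make that concrete, here’s what a tiny listener might look like with the qpid.messaging bindings (the broker address and topic name are invented for the example):

    from qpid.messaging import Connection

    conn = Connection('amqp://bus.example.fedoraproject.org')
    conn.open()
    session = conn.session()
    # subscribe only to (hypothetical) build-completion messages
    receiver = session.receiver('amq.topic/org.fedoraproject.koji.build.complete')
    while True:
        msg = receiver.fetch()  # blocks until a message arrives
        print('build finished: %s' % msg.content)
        session.acknowledge()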

How does this help QA? Well, it simplifies quite a few things. First, AutoQA currently runs a bunch of watcher scripts every few minutes, which poll for new builds in Koji, new updates in Bodhi, changes to the repos, new installer images, and so on. Replacing all these cron-based scripts with a single daemon that listens on the bus and kicks off tests when testable events happen will reduce complexity quite a bit. Second (as mentioned above), we can send messages containing test results when tests finish. This would be simpler (and more secure) than making the test itself log in to Bodhi to provide karma when it completes – Bodhi can just listen for messages about new test results[5], and mark updates as having passed depcheck when it sees the right message.

But wait, it gets (arguably) more interesting.

The Glorious Future of QA Infrastructure (pt. 2: ResultsDB)

We’ve also been working on something we call ResultsDB – a centralized, web-accessible database of all the results of all the tests. Right now the test results are all sent by email, to the autoqa-results mailing list. But email is just text, and it’s kind of a pain to search, or to slice up into interesting views (“show me all the test results for glibc in Fedora 13”, for example).

I said “web-accessible”, but we’re not going to try to create the One True Centralized Generic Test Result Browser. Every existing Centralized Generic Test Result Browser is ugly and hard to navigate and never seems to be able to show you the really important pieces of info you’re looking for – mostly because Every Test Ever is a lot of data, and a Generic Test Result Browser doesn’t know the specifics of the test(s) you’re interested in. So instead, ResultsDB is just going to hold the data, and for actually checking out test results we plan to have simple, special-purpose frontends to provide specialized views of certain test results.

One example is the israwhidebroken.com prototype. This was a simple, specialized web frontend that showed only the results of a small number of tests (the ones that made up the Rawhide Acceptance Test Suite), split up in a specific way (one page per Rawhide tree, split into a table with rows for each sub-test and columns for each supported system arch).

This is a model we’d like to continue following: start with a test plan (like the Rawhide Acceptance Test Plan), automate as much of it as possible, and have those automated tests report results (which each correspond to one test case in the test plan[6]) to ResultsDB. Once that’s working, design a nice web frontend to show you the results of the tests in a way that makes sense to you. Make it pull data from ResultsDB to fill in the boxes, and now you’ve got your own specialized web frontend that shows you exactly the data you want to see. Excellent!
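
So a frontend ends up being little more than “query ResultsDB, draw boxes.” Something like this – where the URL and the JSON fields are entirely hypothetical, since the real API is still being designed:

    import json
    import urllib2

    url = ('https://resultsdb.example.org/api/results'
           '?testcase=depcheck&item=glibc&release=13')
    for result in json.load(urllib2.urlopen(url))['data']:
        print('%s: %s' % (result['item'], result['outcome']))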

But How Will This Help With depcheck And The PUATP?

Right! As mentioned previously, there’s actually a whole Package Update Acceptance Test Plan, with other test cases and other tests involved – depcheck alone isn’t the sole deciding factor on whether a new update is broken or not. We want to run a whole bunch of tests – like using rpmguard to check whether a previously-executable program has suddenly become non-executable, or using rpmlint to make sure there’s a valid URL in the package, and so on. Once an update passes all the tests, we should let Bodhi know that the update is OK. But the tests all run independently – sometimes simultaneously – and they don’t know what other tests have run. So how do we decide when the whole test plan is complete?

This is another planned capability for ResultsDB – modeling test plans. In fact, we’ve set up a way to store test plan metadata in the wiki page, so ResultsDB can read the Test Plan page and know exactly which tests comprise that plan. So when all the tests in the PUATP finish, ResultsDB can send out a message on the bus to indicate “package martini-2.3 passed PUATP” – and Bodhi can pick up that message and unlock martini-2.3 for all its eager, thirsty users.
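
The “is the plan done?” check itself is trivial once that metadata exists – a sketch:

    def plan_complete(plan_testcases, results):
        # plan_testcases: test case names parsed from the plan's wiki metadata
        # results: {testcase: outcome} reported to ResultsDB for one update
        return all(results.get(tc) == 'PASSED' for tc in plan_testcases)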

But anyone who has used rpmlint before might be wondering: how will anyone ever get their package to pass the PUATP when rpmlint is so picky?

The Wonders of Whitelists and Waivers

This is another planned use for ResultsDB – storing whitelists and waivers. Sometimes there will be test failures that are expected, that we just want to ignore. Some packages might be idiosyncratic and the Packaging Committee might want to grant them exceptions to the normal rules. Rather than changing the test to handle every possible exception – or making the maintainers jump through weird hoops to make their package pass checks that don’t apply or don’t make sense – we’d like to have one central place to store exceptions to the policies we’ve set.

If (in the glorious future) we’re already using AutoQA to check packages against these policies, and storing the results of those tests in ResultsDB, it makes sense to store the exceptions in the same place. Then when we get a ‘failed’ result, we can check for a matching exception before we send out a ‘failed’ message and reject a new update. So we’ve got a place in the ResultsDB data model to store exceptions, and then the Packaging Committee (FPC) or the Engineering Steering Committee (FESCo) can use that to maintain a whitelist of packages which can skip (or ignore) certain tests.
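
Conceptually, the exception lookup is one extra step between “test failed” and “update rejected” – a sketch, with made-up names:

    def effective_outcome(package, testcase, outcome, exceptions):
        # exceptions: (package, testcase) pairs granted by FPC/FESCo
        # (or waived by the maintainer), stored alongside the results
        if outcome == 'FAILED' and (package, testcase) in exceptions:
            return 'PASSED'  # waived: don't reject, don't nag anyone
        return outcome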

There have also been quite a few problematic updates where an unexpected change slipped past the maintainer unnoticed, and package maintainers have thus (repeatedly!) asked for automated tests to review their packages for these kinds of things before they go out to the public. Automating the PUATP will handle a lot of that. But: we definitely don’t want to require maintainers to get approval from some committee every time something weird happens – like an executable disappearing from a package. (That might have been an intentional change, after all.) We still want to catch suspicious changes – we just want the maintainer to review and approve them before they go out to the repos. So there’s another use for exceptions: waivers.

So don’t worry: we plan to have a working interface for reviewing and waiving test failures before we ever start using rpmguard and rpmlint to enforce any kind of policy that affects package maintainers.

The Even-More Glorious Future of QA

A lot of the work we’ve discussed here is designed to solve specific problems that already exist in Fedora, using detailed (and complex) test plans developed by the QA team and others. But what about letting individual maintainers add their own tests?

This has actually been one of our goals from Day 1. We want to make it easy for packagers and maintainers to have tests run for every build/update of their packages, or to add tests for other things. We’re working right now to get the test infrastructure (AutoQA, ResultsDB, the messagebus, and everything else) working properly before we have packagers and maintainers depending on it. The test structure and API are being solidified and documented as we go. We still need to decide where packagers will check in their tests, and how we’ll make sure people don’t put malicious code in tests (or how we’ll handle unintentionally misbehaving tests).

We also want to enable functional testing of packages – including GUI testing and network-based testing. The tests I’ve been discussing don’t require installing the packages or running any of the code therein – we just inspect the package itself for correctness. Actual functional testing – installing the package and running the code – requires the ability to easily create (or find) a clean test system, install the package, run some test code, and then review the results. Obviously this is something people will need to do if they want to run tests on their packages after building them. And this isn’t hard to do with all the fancy virtualization technology we have in Fedora – we just need to write the code to make it all work.

These things (and more) will be discussed and designed and developed (in much greater detail) in the coming days and weeks in Fedora QA – if you have some ideas and want to help out (or you have any questions) join the #fedora-qa IRC channel or the Fedora tester mailing list[7] and ask!


1 This is why some update transactions can fail even after yum runs its dependency check, declares the update OK, and downloads all the packages.
2 Actually, this test already exists. See the conflicts test, which is built around a tool called potential_conflict.py. Note how it’s pretty up-front about only catching potential conflicts.
3 Yeah, “PUATP” is a crappy acronym, but we haven’t found a better name yet.
4 Although maybe not – it seems really silly to send someone an email to tell them they don’t need to do anything. Informed opinions on this matter are welcomed.
5 In fact, AMQP and the qpid bindings allow you to listen only for messages that match specific properties – so Bodhi could listen only for depcheck test results that match one of the -pending updates – it doesn’t have to listen to all the messages and filter them out itself. Neat!
6 Some AutoQA tests will test multiple test cases, and thus report multiple test results. Yes, that can be a little confusing.
7 See the instructions here: http://fedoraproject.org/wiki/QA