I just had my fifth puzzle published at GM Puzzles.
This was a fun one. It's my second Star Battle puzzle, and it's definitely more approachable than my previous dual-grid Star Duel. (Incidentally, that giant Star Duel was noted among the best object-placement puzzles of 2014 at GM Puzzles. I'm happy to see that people liked it.)
I got the idea for today's Star Battle when reading that best-of post. One of the other puzzles noted there was this lovely 9-pentomino Star Battle by Zoltán Horváth. The post mentioned that another designer, Jiří Hrdina, had independently designed a similar 9-pentomino Star Battle. (I can't link directly to that one since it's not freely available on the web. It's contained in The Art of Puzzles: Star Battle e-book available for sale). Then, in the comments, Matúš Demiger mentioned that he had also independently constructed a third 9-pentomino Star Battle as part of the 2014 24-Hour Puzzle Competition.
So at least three puzzle authors all happened to construct pentomino-themed Star Battle puzzles last year, and all three happened to choose the fairly-standard 10x10 grid size, (forcing the puzzle to include only 9 of the 12 possible pentomino shapes). Matúš's comment was "I hope someone will try to include all twelve pentominoes" and I couldn't resist the challenge.
And it was an interesting challenge, since putting 12 pentominoes into a Star Battle requires a 13x13 grid. That's not too much of a problem in and of itself, (my Star Duel used a 15x15 grid, for example). But with this particular theme, as the puzzle grows the total area of the pentominoes grows linearly, while the total puzzle area grows quadratically. In my final puzzle there are 12 regions of size 5 and then one giant outer region with 109 cells, yet each of the 13 regions contains only two stars. So the real challenge here was to ensure that when solving the puzzle the stars in the huge region didn't get determined early, (causing a bunch of cells to be wasted and forcing the solver to tediously mark off all of the unused cells).
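The arithmetic behind those numbers is simple enough to sketch, (a quick illustration of my own, just restating the figures above):

```python
# A quick sketch of the area arithmetic for the 13x13 puzzle.
grid_side = 13
grid_area = grid_side ** 2        # 169 cells total
pentomino_area = 12 * 5           # 12 pentomino regions of 5 cells each
outer_region = grid_area - pentomino_area
print(outer_region)               # 109 cells in the giant outer region

stars_per_line = 2
total_stars = grid_side * stars_per_line
print(total_stars)                # 26 stars, only 2 of them in those 109 cells
```

At 10x10 with nine pentominoes the leftover region is only 100 - 45 = 55 cells, so the imbalance really does grow quickly with the grid.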
Luckily, I think it just worked out. In all of my test-solving, the stars in the large region are among the very last determined.
Anyway, give this puzzle a try if you'd like. The large number of tiny regions means there are a lot of easy steps early on in the puzzle. But there are still a few more interesting deductions in store later on, (though nothing ever all that difficult in this Wednesday-level puzzle).
I should also thank Thomas Snyder for his editorial help. He found and fixed a small ambiguity in the first version of this puzzle that I submitted. I've since coded up a deductive Star Battle solver just to be able to verify uniqueness for puzzles I construct. But maybe I'll talk about that in a future post.
PS. If anyone is following closely, I neglected to mention my fourth puzzle when it was published a few weeks ago on a Friday. It's a Pentominous puzzle with almost no clues other than 12 F's, and I named it "F is for Fiendish". The title is a warning, and I think it deserves it. I think this is the hardest puzzle I've published so far. My Star Duel earned a longer estimated "Expert" time, (37 minutes compared to 20 minutes for "F is for Fiendish"), but that's mostly because Star Duel is so much bigger, (2 15x15 grids compared to a single 10x10 grid). The deductions required here are definitely harder to find.
My sister is really kind to do some of the initial testing of several of my puzzles. After I handed her a copy of "F is for Fiendish" one evening, she called me later that night to ask, "Can you email me a fresh copy of that puzzle? My husband and I have been trying it over and over and the paper is all disintegrating after so much erasing." That's a beautiful thing for a puzzle designer to hear---that someone is terribly frustrated with a puzzle, but still determined to stick with it and keep trying.
So if you want a challenge, give "F is for Fiendish" a try. There are logical steps that can be found at every point to solve the puzzle without needing any guessing or back-tracking, (but they may not be easy to find). Good luck, and happy puzzling!

Posted Wed Jan 21 21:49:58 2015
This puzzle is a Pentominous puzzle, a really fun style which I believe was invented by Grant Fikes, a prolific puzzle contributor to GM Puzzles. To solve the puzzle, you simply need to divide the 10x10 grid into 20 pentominoes such that each of the given letters in the grid is contained by a pentomino of that shape, and that no two pentominoes of the same shape are touching.
My puzzle today is a Monday-level difficulty, meaning it's not too tough. And I think this puzzle in particular is a great introduction to Pentominous. When I first started constructing logic puzzles, I had a hard time making good and fun puzzles at an introductory level. Part of the problem is that any puzzle I make is usually harder than I expect it will be, (I think that's common for many beginning puzzle designers), so I'd often force harder deductions in the puzzle thinking that that was a requirement to make the puzzle interesting.
But this puzzle is probably the first time I made what I think is a really successful introductory puzzle. I've given this puzzle to a lot of people, and they end up really liking the way the puzzle just sort of walks you through a series of forced pentomino placements. I've often heard feedback along the lines of, "Well, I still say that I can't do puzzles like this, but that one was actually a lot of fun," and that's really rewarding to hear.
So take a look if you're interested and let me know how it goes for you.
Oh, and I should comment on the words I ended up including in the puzzle's presentation. First, it's really tempting to include words in Pentominous puzzles. Second, it's really hard to pull them off. As a case in point, here is an example puzzle by Tapio Saarinen where he started with 4 English words, but by the time he was done, only one remained as an actual word.
For today's puzzle, I set myself up for a bigger challenge, because I wanted to get a meaningful phrase, (but I only have the letters FILNPTUVWXYZ to work with---quite limited). The phrase I chose, "Intuit, Nix, Fix" is specifically the strategy I don't recommend using to solve puzzles. That is, you can just make a guess (intuit), then work with the puzzle for a while until you encounter a contradiction, then cancel your guess and try again (nix, fix). I suppose "guess and check" would have been an easier way to say that, but you can't spell that with the twelve Pentomino letters.
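That letter constraint is easy to check mechanically. Here's a tiny sketch of my own, (not part of the puzzle), that tests whether a phrase can be spelled with only the twelve pentomino letters:

```python
# A tiny check that a phrase uses only the twelve pentomino letters.
PENTOMINO_LETTERS = set("FILNPTUVWXYZ")

def spellable(phrase):
    """True if every letter in the phrase names a pentomino."""
    letters = {c for c in phrase.upper() if c.isalpha()}
    return letters <= PENTOMINO_LETTERS

print(spellable("Intuit, Nix, Fix"))  # True
print(spellable("guess and check"))   # False: G, E, S, A, D, C, H, K
```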
Anyway, a lot of people have been trained to use guess-and-check as a strategy because they've been solving puzzles generated randomly by computer programs, and many of these puzzles don't afford any other strategy. A huge benefit of solving good, hand-made puzzles like those published at gmpuzzles.com is that you don't need that strategy. Instead, there's a nice logical path from start to finish. So resist the temptation to guess, and go give this puzzle a try.
Happy puzzling!

Posted Mon Dec 22 18:23:57 2014
My boys just finished this year's Christmas puzzle hunt, (known in our family as "The Code"). Read below for a summary of how things went, or check out the actual puzzles if you want to solve them yourself or just wonder at the amount of work my poor boys have to go through before they get any Christmas presents. I've also written up a brief history of the Christmas code linking to all the puzzles from previous years.
I'm really pleased with how things went this year. I scaled the difficulty back compared to last year, and that was good for everyone's sanity. It's easy to see the difference in just the numbers. Last year's hunt had 24 puzzles, 5 metapuzzles, and 1 final metametapuzzle. This year there were only 9 puzzles and 2 small metapuzzles. I was nervous the boys might think I did too little, but I think I actually got the difficulty just right this year.
They did solve it a few days before Christmas this year, but that's a good victory for them. Last year, there was a tough Battleships metapuzzle and the deciphering of a Playfair cipher that all came together only late at night on Christmas Eve.
Of course, the most important feedback comes from the boys themselves. There was a priceless moment last night as one-by-one, the boys in turn had the final flash of insight needed to solve the last clue. There's nothing better for a puzzle designer than getting to watch that "ah-ha" moment hit with its explosion of laughter. And that laughter is always mixed---appreciation for the disguise of the clue and a bit of disgust at not being able to see through it all earlier. And last night, it was a perfect cascade of these moments from one boy to the next. After the first one finally got it, just knowing it was possible was all the others needed to reach the same insight themselves. It was perfect.
This morning the boys reported that this year's code was the "best one yet". I think I've heard that three years in a row now. That's what gives me the motivation to do all this again. I guess I had better start planning now if I want to improve again next year!

Posted Mon Dec 22 17:33:15 2014
I've slipped into an accidental tradition of composing a puzzle hunt for my sons every year at Christmas time. We call it "The Code" and it's now one of the most anticipated events every December.
If you'd like to see some examples of what the puzzles are like, you can look at them here (or even try solving them yourself):
As one might hope, I believe my puzzle-designing skills are improving with practice. So hopefully the more recent puzzles above show some of my better work.
This tradition has become sufficiently involved that the history of The Code really deserves to be captured. Here's the story of how it all started, and how it's changed over the years.
On Keeping Christmas a Surprise: The Invention of The Code
One year, my wife and I were lamenting to some friends that Christmas gifts were losing their mystery at our house. Our boys were getting good enough at poking, prodding, and predicting their presents that there weren't a lot of surprises left for Christmas morning. Our friends had had similar problems with their children, and we ended up comparing notes on things we had done to keep gifts secret.
One obvious tactic is to disguise a gift while wrapping it. This technique has already been developed into an artform by our boys. They love giving each other a large, wrapped gift which, when opened, reveals a smaller, wrapped gift inside. Then, like a set of matryoshka dolls, each gift reveals a smaller and smaller gift. In our family, as soon as this unwrapping process is started, everyone knows what the end result will be. The final few layers are so tiny that there's no room for boxes anymore, just many, many layers of different colored gift wrap and excessive amounts of tape. These last layers take a tremendous amount of effort to get through. And the reward for all of this hard work of unwrapping? Inevitably, it's a single penny. We don't recall who wrapped the first penny, but it's now a standing tradition where the boys try to outdo each other each year by disguising a penny in the most elaborate way possible.
Clearly, we're not willing to go to such heroic efforts to disguise every gift we wrap. We had experimented with waiting to put the Christmas gifts out under the tree until just before Christmas, but that wasn't much fun. It robbed the boys of a lot of the fun anticipation of seeing the gifts under the tree throughout December.
Our friends shared an idea that had worked well in their family. Instead of writing names on any of the gifts, they wrote a single number on each gift indicating the recipient. And the method for choosing the numbers was selected in a new and unpredictable way each year. For example, one year my friend (who happens to be a dentist) wrote the number of teeth that each child had lost on their gifts.
We thought this was a great idea, and we decided to give it a try. And none of our Christmas gifts have had any of the boys' names on them since, (though they've had just about everything else possible). Read on for a rundown of what we have done for the code for each year.
The Early Years: Locking the Code up Tight
For the first year, we adopted a very simple strategy. We wrapped each boy's gifts in a unique color of wrapping paper. We didn't explain anything, and when the boys asked why none of the gifts had names on them, we were evasive. Then, on Christmas morning, we told them which color of gift wrap corresponded to each boy.
The second year, the boys correctly predicted that we wouldn't use the same technique, and they immediately set about trying to crack "the code" for the Christmas presents. They started convening secret meetings to discuss theories. We discovered some of the notes from one of the meetings and found that they had constructed an entire table mapping out the following variables for each gift under the tree: Wrapping paper color, Number of bows, Colors of bows, Picture on the tag, Names on the tag. You see, this year, I did put tags on the presents, but instead of their names I wrote the names of characters from nonsense poetry: "To: The cat; From: The fiddle", "To: The cow; From: The moon", "To: The dish; From: The spoon", etc.
Their table was fairly effective. They were able to eliminate many variables that couldn't work. If there were more colors of wrapping paper than children, they assumed they could ignore that variable. If there were only one or two bows per present, they assumed that couldn't identify one of the four boys. Fortunately for me, they didn't crack the code that year, but only because their table hadn't accounted for the color of ribbon on each present.
By the third year, I realized that I had to put more thought into designing the code. Here was an active group of intelligent agents determined to find the information I was trying to hide. I was careful this time to imagine every variable they could track, and ensure that each variable appeared with four different values, evenly distributed among the presents. I also ensured there was no correlation between any of the variables. Grouping the presents by wrapping-paper color, bows, ribbons, or anything else would always yield four entirely different sets of presents. (So yes, this meant that now my wife and I needed to consult our own table before we could know how to correctly wrap each gift).
Then, for the actual information, I chose two variables I thought they would never track. We carefully folded the flaps on each gift either in the same direction on each side or in opposite directions on each side. Then we either folded all cut edges away to leave clean creases, or left the raw edges exposed. This gave us four sets of presents: Matching flaps creased, Matching flaps raw, Opposite flaps creased, and Opposite flaps raw.
Of course, the boys never even looked at the flaps, and all of their attempts to find logical groups were foiled. When I revealed the answer on Christmas morning, I was smug, thinking I had "won" by creating a code they couldn't crack. Of course, the boys called me out saying that what I had done was totally unfair. (How could it have been unfair, I thought? This was a game that I had invented myself!) But the boys were totally right. My problem was thinking that this was a game, when in fact, this should have been a puzzle.
Years later, I read the book "Puzzle Craft" by Mike Selinker and Thomas Snyder. In the introduction, Mike Selinker describes the lesson that my boys were teaching me. He says that a game is a contest with two equal sides and the outcome is in doubt, (either side has a roughly equal chance to win). In contrast, a puzzle is a contest with two wildly-unequal sides where the outcome is never in doubt, (the weaker side will always win). When I treated The Code as a game, it wasn't fun for any of us. The odds were stacked too much in my favor---I could always create an impossible-to-solve puzzle, but who has fun with a puzzle that's impossible to solve? That just leads to frustration and giving up. A good puzzle, in contrast, has plenty of frustration, but enough fun and reward that the solvers stick it through to the end. So I needed to learn to create a puzzle.
The Code Today: The Code as a Puzzle
My boys taught me that the code needed to be fair. That is, they needed to be given enough information to be able to solve the code. There could still be lots of deception and trickery, but they needed to know that with perseverance, patience, and creativity they could actually find the answer. They also gave me a second ground rule: The solution to the code must be relevant and interesting. A final answer of "you get the presents with the matching flaps with the raw edges" doesn't cut it. Instead, the solution to the code should actually point to the boys themselves. Basically, they were telling me "Design us a puzzle", and "Make it a good puzzle", and it just took me some time to figure that out.
Christmas 2012 was the first year I approached The Code as a puzzle. For that year, I labeled each present with nothing more than a small QR code. Scanning the QR code linked to a web page with a silly animated GIF, a solid-color background, some nonsense poetry in the title, etc. Somewhere in all of that was a hidden indication of who the intended recipient of each present was. So this was simply one puzzle, and a lot of obfuscation. I thought this puzzle would have been easier than it was, (a common problem for early puzzle designers from what I understand). But the boys had a lot of fun with it, and with some hints at the end, they figured things out by Christmas.
Christmas 2013 was the first year I stepped up and instead of designing just one puzzle, I designed an entire puzzle hunt. A puzzle hunt is a connected series of several puzzles. Many puzzle hunts also include "meta-puzzles" where the solutions to several puzzles combine to form a new puzzle. This was my first puzzle hunt to design, and it included 24 puzzles, 5 metapuzzles, and 1 final metametapuzzle, (where the solutions to 4 previous metapuzzles had to be combined in another puzzle). That was probably over-ambitious for my first puzzle hunt, but it worked fairly well. There were a couple of bugs in the puzzles that I should have caught with better testing in advance.
One thing I was really pleased with was that I intentionally included every element from the previous puzzle, (animated GIFs, random background colors, nonsense poetry, etc.). But where in 2012 many of these elements were meaningless red herrings, in this year's hunt, every element was used in at least one puzzle. I was also happy that I included a mechanism for providing additional hints along the way. (And I did this in a way that I could revise those hints before the boys encountered them, so I could fine-tune the hints based on where they were getting stuck.) That was very useful, and I used that to my advantage again in 2014.
Christmas 2014 was my second puzzle hunt, so it's clearly now a new tradition that we won't give up for some time. (We'll just need to adapt things when some of the boys move out of the house, etc.). I do feel like I'm getting better at puzzle design, and the boys are still having a lot of fun solving this. I wrote up a blog post giving some of my feedback on how the solving experience went this year.

Posted Mon Dec 22 17:33:15 2014
This puzzle is a Star Battle, which is an elegant and easy-to-learn puzzle type. All you have to do is place some stars into the grid such that exactly one star, (or more stars for larger puzzles), appears in each row, column, and bold region. Also, no two stars can be placed in adjacent cells, not even diagonally.
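The rules are simple enough that a solution checker fits in a few lines. Here's a minimal sketch of my own, (not the GM Puzzles implementation), where `stars` is a set of (row, col) cells, `regions` maps every cell to a region id, and `k` is the star count per row/column/region:

```python
# A minimal Star Battle solution checker sketch.
from collections import Counter

def valid_star_battle(stars, regions, size, k):
    rows = Counter(r for r, _ in stars)
    cols = Counter(c for _, c in stars)
    regs = Counter(regions[cell] for cell in stars)
    # Exactly k stars in every row and column...
    if any(rows[i] != k or cols[i] != k for i in range(size)):
        return False
    # ...and in every region.
    if set(regs) != set(regions.values()) or any(n != k for n in regs.values()):
        return False
    # No two stars may be adjacent, not even diagonally.
    for (r, c) in stars:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0) and (r + dr, c + dc) in stars:
                    return False
    return True
```

For a typical 10x10 puzzle k is 2; the Star Duel described below uses size 15 with k = 3.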
For the puzzle I published today, I took a couple of steps beyond the basic Star Battle formula. First, I made the grid quite large, (15x15 with 3 stars per row/column/region). Then I also provided two grids. Each grid has the same solution of stars, so you work back and forth between the grids to solve them both simultaneously. Finally, instead of dividing the regions into random-looking arbitrary shapes, I tried to make some recognizable pictures with the grids. If you click through above, hopefully you'll be able to recognize the shapes I was going for. This puzzle marks my first in a series of puzzles where I chose the name of the puzzle type as the theme for the puzzle. So, I drew a star battle for a Star Battle puzzle. And with dual grids, I named this battle a Star Duel.
Finally, I should point out that where my previous puzzle was published on a Monday, today's puzzle is published on a Saturday. The GM Puzzles website publishes puzzles that get increasingly difficult throughout the week. So if you've never attempted a star-battle puzzle before, I don't actually recommend you start with my puzzle from today.
Instead, you might start with this simpler star battle puzzle that I wrote as part of a Christmas puzzle hunt for my boys. It's a very gentle introduction to the puzzle type. If you try it, you can ignore the second grid with the C, H, A, and S regions. That was part of a metapuzzle included in the puzzle hunt.
Happy puzzling!

Posted Sat Dec 20 14:40:58 2014
I've really been impressed by the puzzles published at gmpuzzles.com. They do a great job of showcasing how much more interesting a logic puzzle can be when it's hand-crafted by a puzzle designer, rather than computer generated. I've also enjoyed learning many new kinds of logic puzzles, and decided that I wanted to try my hand at composing some puzzles of my own.
My first published puzzle is a Nanro puzzle, which is a puzzle type I only recently discovered. In a typical Nanro puzzle, the solver is presented with a grid that is segmented into regions of various shapes and sizes, with some of the cells labeled with numbers. The goal is to label additional cells with numbers so that all of the labeled cells form a single connected group, (with no 2x2 square of cells fully labeled). Additionally, within each region there must be at least one labeled cell, and the number(s) within a region must be equal to the number of labeled cells within that region. (That is, a region may contain one 1 or two 2s or three 3s, etc.) Finally, no two occurrences of the same number can appear adjacent to each other across a region boundary.
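For the curious, those rules translate fairly directly into code. Here's a compact checker sketch of my own, (an illustration, nothing official), where `labels` maps each labeled cell to its number and `regions` maps every cell to its region id:

```python
# A compact Nanro solution checker sketch.
from collections import defaultdict, deque

def valid_nanro(labels, regions):
    labeled = set(labels)
    if not labeled:
        return False
    # 1. All labeled cells form a single orthogonally connected group.
    start = next(iter(labeled))
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in labeled and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    if seen != labeled:
        return False
    # 2. No 2x2 square of cells is fully labeled.
    for (r, c) in labeled:
        if {(r + 1, c), (r, c + 1), (r + 1, c + 1)} <= labeled:
            return False
    # 3. Every region is labeled, and its numbers equal its labeled count.
    per_region = defaultdict(list)
    for cell, n in labels.items():
        per_region[regions[cell]].append(n)
    if set(per_region) != set(regions.values()):
        return False
    if any(n != len(nums) for nums in per_region.values() for n in nums):
        return False
    # 4. Equal numbers never touch across a region boundary.
    for (r, c), n in labels.items():
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in labels and labels[nb] == n and regions[nb] != regions[(r, c)]:
                return False
    return True
```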
OK. So, I'll be the first to admit that the ruleset of Nanro isn't the most elegant. The rules can't be captured as succinctly as those of that other puzzle, ("no repeated numbers in any row, column, or region"). Fortunately, the solving experience is smoother than the description of the rules.
Most puzzles at gmpuzzles.com have some sort of "theme". Often there's an interesting visual presentation. Almost always there's a well-designed "solving path" of logical deductions that can be fun to discover. The idea is that there's something creative that goes into each puzzle that distinguishes it from something computer-generated. And that creativity usually translates to a more interesting experience for the solver.
For this particular puzzle, the theme I wanted to explore was a "clueless" puzzle. That is, my puzzle contains a grid segmented into regions as typical for Nanro, but it doesn't include any initial numbers in the presentation. So that's something unique, I think, (at least among the Nanro puzzles I've seen). I'm not sure how successful I was at exploring the theme in this puzzle. Hopefully it's an interesting puzzle to solve, but I am a bit nervous that I'm bringing down the average quality of puzzles at gmpuzzles.com with this one.
I hadn't seen it prior to composing my puzzle, but before mine was published I found this spectacular Nanro by the inimitable Prasanna Seshadri. It's not entirely clueless, (it has just one given number), but it comes as close as possible, and it thoroughly explores some of the ideas that my puzzle only touches briefly. Prasanna's puzzle has a really striking presentation that translates to a fun solving path. And it's obviously something that's been constructed by hand, not computer-generated, (at least given the state of the typical bland computer-generated puzzle today). So that's a great example of what gmpuzzles is all about and the standard I'll be aiming for with future puzzles.
So if you're interested in logic puzzles, take a look at my first puzzle. But even more, look around at the other puzzles at gmpuzzles.com. I doubt you'll be disappointed.

Posted Tue Nov 25 18:56:16 2014
For quite some time I've been struggling with hundreds of email messages a day, and hundreds-of-thousands of email messages in archives to search. I've used several different email programs to try to handle my mail, (pine, vm, mh-e, wanderlust, evolution, mutt), and experimented with several others, but not until now did I find one that really helps me process mail the way I want to.
Before I introduce sup, though, I want to at least mention in passing a mail indexer named mairix since I've been meaning to write a blog entry about it for several months. Anthony Towns first let me know about mairix earlier this year when I was complaining about email near him. He described his approach to mail as "mutt for reading mail and mairix for searching".
The way mairix works is that it creates an index of a mail archive, then provides a command-line search that delivers its results by creating a new maildir, (with just symlinks to the original messages if the original archive is also maildir-based).
So mairix can give you a simple way to add fast, full-archive searching to a program like evolution when its builtin search capability is hopelessly inadequate for your large mail archive. The mail client can be configured to look at a single search-results maildir folder, and mairix can be used to create the results there. Do note that mairix is somewhat limited as a mail search engine since it has no proximity or phrase-based searching, (the best you get is whether messages contain given words). But the maildir-in and maildir-out approach it has makes it easy to integrate, and is the kind of concept that can perhaps let us avoid some of the disasters out there in the form of monolithic email clients.
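Concretely, the integration looks something like this, (a sketch based on my reading of mairix's configuration format; the paths and folder names are just placeholders):

```shell
# ~/.mairixrc:
#   base=~/mail
#   maildir=archive...        # the "..." suffix recurses into subfolders
#   mfolder=search-results    # maildir where matching symlinks appear
#   database=~/.mairix_database

mairix                  # with no search terms: (re)index the archive
mairix s:cairo          # subject search; point your client at search-results/
mairix f:jamey b:sup    # from: and body-word terms can be combined
```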
But I've got a much more interesting story in sup. I was just a few days into reinventing my email workflow around mutt when I got into an interesting conversation with Jamey Sharp about what the ideal mail-handling system would look like. Here are some of the ideas we came up with:
Program presents me with messages that need my attention, (a chronological sort of the archive with unread items in bold does not count).
When I decide a message does not need my attention I can make it "go away" into the archive, (and "mark as read" does not count---I want to file away messages but still track whether they are read or not in the archive).
Threads must be first-level objects, (in particular I want a kill-thread command that archives not only the current messages, but all future messages received in a given thread).
System should support arbitrary tags on messages. Tags can be applied automatically, (such as applying categories to mail received on lists), and applied ad-hoc by the user.
System should allow searching over the entire archive with most recent results appearing instantly. Search terms can include tags, message headers, or full-body search (including phrases).
In addition to full-archive search, incremental refinement should be possible, (a new search based on the results of a previous search).
There's no need for folders in this system. Tags and incrementally refined, tag-based searching provide all the benefits, (without the bug in folder-based systems where a user has to hunt among many folders to find one where a search returns non-empty results). In particular, the "inbox" view should just be a search for unread and non-archived messages.
That was basically our list of core features. Beyond that, we also discussed some further ideas for improving the prioritization of email threads to the user. One primary job of the system is to present the user with the most important thread to be read next, (based on tags, etc.). The user supplies feedback by either reading the thread or archiving it, so there's opportunity to apply machine learning to improve the prioritization.
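To make the folders-versus-tags point concrete, here's a toy model of my own, (nothing to do with sup's actual implementation), where the "inbox" is nothing but a stored search over tags:

```python
# A toy model of the tags-not-folders idea.
class MailStore:
    def __init__(self):
        self.tags = {}  # message id -> set of tags

    def add(self, msg_id, *tags):
        self.tags[msg_id] = set(tags)

    def search(self, with_tags=(), without_tags=()):
        return sorted(
            mid for mid, t in self.tags.items()
            if set(with_tags) <= t and not set(without_tags) & t
        )

    def inbox(self):
        # Unread and non-archived messages; no folder needed.
        return self.search(with_tags={"unread"}, without_tags={"archived"})

store = MailStore()
store.add("m1", "unread", "list:cairo")
store.add("m2", "unread", "archived")   # a killed thread: unread, yet gone
store.add("m3", "list:cairo")           # already read
print(store.inbox())                    # ['m1']
```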
During the course of this conversation, Jamey mentioned that I should go look at sup which he thought might implement some of these features, but he hadn't tried it out yet. I was delighted to find that each one of our core features already exists in sup, (and sup would make a fine platform for research into the learning-based approaches).
It also does a few things I hadn't specified, such as displaying email in a full-thread nested view, (rather than one message at a time), with quoted elements and signatures folded out of view until the user asks to see them. Another nice touch is that the single-line presentation of a thread includes the first names of participants in the thread, (with bold names to indicate unread messages). This provides some of the essential information needed for applying Joey Hess's thread patterns, but without the tree view at this point.
[Note: I have been told that several of the above features are also implemented in gmail. I've never tried gmail myself, since it fails to provide some even more fundamental features: 1. offline usage, (thanks to both Mark and Greg for pointing out that gmail has offline support in beta via gears), 2. personal ownership of email storage, and 3. free-software implementation for customization.]
In the few days I've been using sup, it's definitely transformed the way I process mail. Keeping my inbox empty is simple now, and I now easily tag messages for specific, future actions without fearing that I will forget, (and without leaving them cluttering up my inbox). Searching through my entire archive is fast---the first results come back instantly and they're generally what I want, (the index is optimized to return the most-recent messages first).
Sup is implemented in ruby, and the author and maintainers on the sup-talk mailing list have been friendly and receptive to me, (in spite of my lack of ruby-clue), so I've already got a patch or two in the upstream git repository for sup. The indexing in sup has been performed by ferret, but is currently switching to xapian.
Sup is not entirely mature yet, (I discovered fairly easy ways to make the 0.8.1 version in Debian crash), but I'm happily running the latest code from git now, and trusting my mail to it. I did find that a bug in a dependent package causes sup to crash when running sup from git while sup is also installed from the Debian package. So I recommend doing "apt-get install sup-mail" to get the dependencies installed, then "apt-get remove sup-mail" (but don't autoremove the dependencies), then run sup from:
git clone git://gitorious.org/sup/mainline.git
I've also posted a short list of some changes I'd like to see in sup, (mostly in the client stuff---the indexing and searching stuff seems just fine).
There's a nice New User's Guide included with sup, but no reference manual yet, so the sup wiki is an essential (and short) substitute for a reference manual for now.
One thing sup does not do yet is separate the mail-indexing/tagging/searching system from the mail client. So if you're not happy with a curses client written in ruby, (and perhaps more significantly, a different client than what you're currently using for composing messages, making attachments, viewing mime stuff, etc.), then you're currently out of luck. Fortunately, William Morgan has plans to reinvent the most interesting aspects of sup as a server process, and he recently reported making progress on those plans over the last year. (This isn't too surprising, since why would William want to maintain all that code to do things like dealing with mbox, maildir, attachments, etc. when those are already solved problems for users everywhere?)
And hey, I would love to have a mailing list archiver based on sup's threading. So if anyone wants to cook something like that up, please feel free, (sup is licensed under the GNU GPLv2).

Posted Thu Aug 20 16:30:00 2009
Trying to get a handle on 2D graphics rendering performance can be a difficult task. Obviously, people care about the performance of their 2D applications. Nobody wants to wait for a web browser to scroll past tacky banner ads or for an email client to render a screen full of spam. And it's easy for users to notice "my programs aren't rendering as fast with the latest drivers". But what developers need is a way to quantify exactly what that means, in order to track improvements and avoid regressions. And that measurement is the hard part. Or at least it always has been hard, until Chris Wilson's recent cairo-perf-trace.
Previous attempts at 2D benchmarking
Various attempts at 2D-rendering benchmark suites have appeared and even become popular. Notable examples are x11perf and gtkperf. My claim is that these tools range from useless to actively harmful when the task is understanding performance of real applications.
These traditional benchmark suites are collections of synthetic micro-benchmarks. Within a given benchmark, some tiny operation, (such as "render a line of text" or "draw a radio box"), is performed hundreds of times in a tight loop and the total time is measured. The hope is that these operations will simulate the workload of actual applications.
Unfortunately, the workloads of things like x11perf and gtkperf rarely come close to simulating practical workloads. In the worst case, the operation being tested might never be used at all in modern applications, (notice that x11perf tests things like stippled fills and wide ellipses, which are obsolete graphics operations). Similarly, even if the operation is used, (such as a GTK+ radio button), it might not represent a significant fraction of time spent rendering by the application, (which might spend most of its time drawing its primary display area rather than any stock widget).
So far, that's just the well-known advice to focus on the primary bottlenecks rather than on peripheral operations. But even when we have identified a bottleneck in an application, x11perf can still be the wrong tool for measuring it. For example, "text rendering" is a common bottleneck for 2D applications. However, a test like "x11perf aa10text", which seems like a tempting way to measure text performance, is far from ideal. This benchmark draws a small number of glyphs from a single font at a single size, over and over. Meanwhile, a real application will use many glyphs from many fonts at many sizes. With layers and layers of caches throughout the graphics stack, it's simply not possible to accurately simulate what "text rendering" means for a real application without running the actual application.
And yes, I myself have used and perhaps indirectly advocated for using things like x11perf in the past. I won't recommend it again in the future. See below for what I suggest instead.
What do the 3D folks do?
For 3D performance, everybody knows this lesson already. Nobody measures the performance of "draw the same triangles over and over". And if someone does, (by seriously quoting glxgears fps numbers, for example), then everybody gets a good laugh. In fact, the phrase "glxgears is not a benchmark" is a catchphrase among 3D developers. Instead, 3D measurement is made with "benchmark modes" in the 3D applications that people actually care about, (which as far as I can tell is just games for some reason). In the benchmark mode, a sample session of recorded input is replayed as quickly as possible and a performance measurement is reported.
As a rule, our 2D applications don't have similar benchmark modes. (There are some exceptions such as the trender utility for mozilla and special command-line options for the swfdec player.) And coding up application-specific benchmarking code for every interesting application isn't something that anyone is signing up to do right now.
Over the past year or so, Chris "ickle" Wilson has been putting a lot of work into a debugging utility known as cairo-trace, (inspired by work on an earlier tool known as libcairowrap by Benjamin Otte and Jeff Muizelaar). The cairo-trace utility produces a trace of all cairo-based rendering operations made by an application. The trace is complete and accurate enough to allow all operations to be replayed with a separate tool.
The cairo-trace utility has long proven invaluable as a way to capture otherwise hard-to-reproduce test cases. People with complex applications that exhibit cairo bugs can generate a cairo-trace and often easily trim it down to a minimal test case. Then after submitting this trace, a developer can replicate this bug without needing to have a copy of the complex application nor its state.
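The record-and-replay idea at the heart of cairo-trace can be sketched in a few lines of Python. This is purely illustrative, not cairo's actual implementation, and every name here is hypothetical: a proxy logs each drawing call so the identical operation stream can later be replayed against a different backend.

```python
class TracingSurface:
    """Toy proxy illustrating the record-and-replay idea behind
    cairo-trace. All names here are hypothetical, not cairo's real API."""

    def __init__(self, backend):
        self.backend = backend
        self.trace = []  # recorded (operation, arguments) pairs

    def draw(self, op, *args):
        self.trace.append((op, args))            # record the operation...
        return getattr(self.backend, op)(*args)  # ...and forward it

    def replay(self, other_backend):
        # Replay the identical operation stream against another backend,
        # e.g. to compare a software backend against a native one.
        for op, args in self.trace:
            getattr(other_backend, op)(*args)
```

Replaying an identical operation stream is exactly what makes backend-to-backend comparisons (and bug reproduction without the original application) meaningful.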
More recently, Chris wrote a new "cairo-trace --profile" mode and a tool named cairo-perf-trace for replaying traces for benchmarking purposes. These tools are currently available by obtaining the cairo source code, (either from git or in the 1.9.2 development snapshot or eventually the 1.10 release or later). Hopefully we'll see them get packaged up so they're easier to use soon.
With cairo-perf-trace, it's a simple matter to get rendering performance measurements of real applications without having to do any modification of the application itself. And you can collect a trace based on exactly the workload you want, (as long as the application you are interested in performs its rendering with cairo). Simply run:
cairo-trace --profile some-application
This will generate a compressed file named something like some-application.$pid.lzma. To benchmark this trace later, first uncompress it:
lzma -cd some-application.$pid.lzma > some-application.trace
And then run cairo-perf-trace on the trace file:
cairo-perf-trace some-application.trace
The cairo-perf-trace utility will replay several iterations of the trace, (waiting for the standard deviation among reported times to drop below a threshold), and will report timing results for both the "image" backend (cairo's software backend) and whatever native backend is compiled into cairo, (xlib, quartz, win32, etc.). So one immediately useful result is that it's easy to see whether the native backend is slower than the all-software backend. Then, after making changes to the graphics stack, subsequent runs can be compared to ensure regressions are avoided and performance improvements actually help.
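The iterate-until-stable measurement strategy described above can be sketched as follows. This is a hedged illustration of the general technique, not cairo-perf-trace's actual code; the function names and thresholds are made up for the example.

```python
import statistics

def benchmark(replay, min_runs=6, max_runs=30, rel_stddev_target=0.05):
    """Replay a trace repeatedly until the relative standard deviation
    of the measured times drops below a target, then report statistics.
    (Illustrative only; the real tool's parameters differ.)"""
    times = []
    for _ in range(max_runs):
        times.append(replay())  # one timed replay of the trace
        if len(times) >= min_runs:
            rel = statistics.stdev(times) / statistics.mean(times)
            if rel < rel_stddev_target:  # timings have stabilized
                break
    return {"min": min(times),
            "median": statistics.median(times),
            "count": len(times)}
```

Stopping on a relative-stddev threshold rather than a fixed iteration count is what lets the tool report stable numbers on noisy systems without wasting time when timings converge quickly.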
Finally, Chris has also established a cairo-traces git repository which collects useful traces that can be shared and compared. It already contains several different browsing sessions with firefox, swfdec traces (one with youtube), and traces of poppler, gnome-terminal, and evolution. Obviously, anyone should feel free to generate and propose new traces to contribute.
Putting cairo-perf-trace to use
In the few days that cairo-perf-trace has existed, we're already seeing great results from it. When Kristian Høgsberg recently proposed a memory-saving patch for the Intel driver, Chris Wilson followed up with a cairo-perf-trace report showing that the memory savings had no negative impact on a traced firefox session, which addressed the concern that Eric had about the patch.
As another example, we've known that there's been a performance regression in UXA (compared to EXA) for trapezoid rendering. The problem was that UXA was allocating a pixmap only to then use software-based rasterization to that pixmap (resulting in slow read-modify-write cycles). The obvious fix I implemented is to simply malloc a buffer, do the rasterization, and only then copy the result to a pixmap.
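The shape of that fix, rasterizing into cheap local memory first and touching the shared pixmap only once at the end, can be sketched like this. It's an illustration of the pattern, not the actual driver patch, and all names are hypothetical.

```python
def render_traps_staged(trap_spans, pixmap):
    """Illustrative sketch of the just-use-malloc pattern: accumulate
    coverage in a plain local buffer, then copy the finished result into
    the shared pixmap in one bulk write, instead of doing per-pixel
    read-modify-write cycles against slow, uncached memory."""
    staging = bytearray(len(pixmap))  # cheap, cached local memory
    for start, end in trap_spans:     # software rasterization step
        for i in range(start, end):
            # read-modify-write is cheap here because it stays local
            staging[i] = min(255, staging[i] + 128)
    pixmap[:] = staging               # single bulk upload at the end
    return pixmap
```

One sequential bulk copy amortizes the transfer cost that per-pixel read-modify-write traffic would otherwise pay on every single pixel.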
After I wrote the patch, it was very satisfying to be able to validate its real-world impact with a swfdec-based trace. This trace is based on using swfdec to view the Giant Steps movie. When running this trace, sysprof makes it obvious that trapezoid rendering is the primary bottleneck. Here is the output of cairo-perf-trace on a GM965 machine before my patch:
[ # ]  backend  test                 min(s)  median(s)  stddev.  count
[  0]  image    swfdec-giant-steps   45.766     45.858    0.11%      6
[  0]  xlib     swfdec-giant-steps  194.422    194.602    0.05%      6
The performance problem is quite plain here. Replaying the swfdec trace to the X server takes 194 seconds, compared to only 45 seconds to replay it to cairo's all-software image backend. Note that 194 seconds is longer than the full video clip, meaning that my system isn't going to be able to keep up without skipping here. That's obviously not what we want.
Then, after my simple just-use-malloc patch I get:
[ # ]  backend  test                min(s)  median(s)  stddev.  count
[  0]  image    swfdec-giant-steps  45.792     46.014    0.37%      6
[  0]  xlib     swfdec-giant-steps  81.505     81.567    0.03%      6
Here the xlib result has improved from 194 seconds to 81 seconds. That's a 2.4x improvement, and fast enough to now play the movie without skipping. It's very satisfying to validate performance patches with real-world application code like this. This commit is in the recent 184.108.40.2061 release of the Intel driver, by the way. (Of course, there's still a 1.8x slowdown of the xlib backend compared to the image backend, so there's still more to be fixed here.)
The punchline is that we now have an easy way to benchmark 2D rendering in actual, real-world applications. If you see someone benchmarking with only toys like x11perf or gtkperf, go ahead and point them to this post, or the cairo-perf-trace entry in the cairo FAQ, and insist on benchmarks from real applications.

Posted Fri Jun 12 17:36:41 2009
It's hard for me to believe that my last technical post on the progress of the Intel graphics driver was nearly a year ago. Certainly, a lot has happened in that year, and recent developments have been covered well by Keith, Eric, and Jesse. So I won't go into any of those details, but I do want to give my impression of where things stand.
As my colleagues have described so well, a major restructuring of the driver is now complete, (KMS, GEM, dri2, removal of XAA, EXA, dri1, etc.). This restructuring caused months of upheaval, followed by months of stabilization during which many bugs were shaken out. The end result is that the primary developers are all optimistic about the current state of the driver. Things seem better than ever.
In the meantime, however, it's also obvious that some users are not happy with the driver, and some have gotten the impression that the quality is getting consistently worse with time. Some have theorized that no quality-assurance testing is happening, or that the developers just plain don't care about regressions. Neither of those theories is true, so why is there such a disconnect between the developers and the users?
One reason for unhappy users is the lag between development happening and users getting their hands on the code. So the phase of upheaval and instability might be behind the developers, but some users are just entering into it. This is exacerbated by the fact that the major restructuring puts several important components of the driver into the kernel. It used to be that getting the latest Intel driver just meant updating an xf86-video-intel module or upgrading an xserver-xorg-video-intel package. But now it's essential to get a recent kernel as well, (note that almost all of the major fixes that Eric describes happened in the kernel, not the user-level driver). And these fixes aren't in a major kernel release until 2.6.30 which appeared only today.
In addition, distributions and users have a tendency to update the kernel less frequently than user-level components. So when a user checks out the latest xf86-video-intel from git, (without a kernel upgrade), not only is that user not getting the "latest driver" as recommended by the developers, but they might also be running a combination that the developers and QA never saw nor tested.
For example, I recently attempted to run the latest-from-git Intel driver with a kernel version as installed by current Debian installation CDs. This kernel is ancient with respect to our driver development (2.6.25, I think). What I found was that the X server would not start at all. Obviously, we had neglected to test this particular combination, (so yes, QA isn't perfect---but it also can't cover the infinite number of possible combinations). Fortunately, the fix was simple, and appears in the 220.127.116.111 snapshot I just released (today!), so it will appear in the 2.8.0 release very soon.
The packagers of X for Ubuntu have been working hard to address this exact problem of the lag between developers and users. They've set up a repository of packages named xorg-edgers where users can easily get packages built directly from git. I don't think it includes the kernel yet, though hopefully it will soon, but this is definitely a step in the right direction. Update: It does include a 2.6.30 kernel now---maybe all that time I spent writing this blog post wasn't wasted after all.
But what about the users that did upgrade their kernel and driver and are still seeing some major problems? In this case, we really want to hear from you. We take pride in our work and are committed to doing our best to make quality releases without regressions. When you hit some awful behavior, (GPU hang, scrambled mode, abysmal performance), it may be that you're just hitting a combination of components or application behavior that we've never seen before. We want those details and we want to fix the problem. File good bug reports and please make use of the "regression" keyword as appropriate. We pay attention to that.
Performance regressions are an especially important issue. In fact, performance is so important that I'll make a separate post about that. I've got some exciting news to share about performance measurement.

Posted Wed Jun 10 16:29:07 2009
In June of 2008 I started rock climbing at a local gym, and instantly got hooked. I still can't fully explain the fascination I have with climbing, but it certainly includes an aspect of solving a three-dimensional puzzle. When I started, the easiest routes in the gym (graded 5.7) were difficult for me, but satisfying.
Within a few months I had already acquired a fair amount of gear, and I had the chance to climb outdoors (on toprope) with a good friend, (Scott DV), who was rediscovering climbing, (having climbed in Idaho years ago). We both found that there is some fine climbing to be had in the Portland area. We were just disappointed that we hadn't started sooner, because Oregon's wet season soon started interfering with outdoor climbing.
I got through the winter months by being lucky enough to find a climbing partner, (Scott D), willing to train with me 3 times a week, (and helping to motivate me to get up at ridiculously early hours to climb before work). We both took technique classes that took our climbing up a notch. By the time we were ticking off routes at 5.9, we took a lead-climbing class and started leading in the gym in January 2009.
An unseasonably cold and wet month of March kept us in the gym, where we kept at our routine, leading at 5.8/5.9 and toproping at 5.10- (with several 5.10+ projects). In April, the sun finally came out and we took the first chance we could to get outdoors with a road trip to Smith Rock and some of the best climbing in Oregon. Our first experience leading outdoors was very satisfying, (and very different than in the gym), with each of us succeeding at 5.8 leads.
Back at the gym two days later, we decided that if we're going to start leading at 5.9, we'd better be working 5.11 in the gym. So we started working our first 5.11- project. It was grueling, (and we didn't make it through the crux without cheating with some weight on the rope), but we did get to the top, and it was a blast.
One of my favorite things about climbing is that the routes are graded on an open-ended scale. It's fantastic to know that no matter how much I improve there will always be routes ready to punish me and put me back in my place.

Posted Wed Apr 22 12:19:40 2009