Planet Gentoo - http://planet.gentoo.org/
 

Setting up the storage on my new machine, I just ran into something really interesting: what seems to be deliberate, usable and useful, but completely undocumented functionality in the MD RAID layer.

It's possible to create RAID devices with the initial array having 'missing' slots, and then add the devices for those missing slots later. RAID1 lets you have one or more, RAID5 only one, RAID6 one or two, RAID10 up to half of the total. That functionality is documented in both the Documentation/md.txt of the kernel, as well as the manpage for mdadm.

What isn't documented is when you later add devices, how to get them to take up the 'missing' slots, rather than remain as spares. Nothing in md(7), mdadm(8), or Documentation/md.txt. Nothing I tried with mdadm could do it either, leaving only the sysfs interface for the RAID device.

Documentation/md.txt does describe the sysfs interface in detail, but seems to have some omissions and outdated material - the code has moved on, but the documentation hasn't caught up yet.

So, below the jump, I present my small HOWTO on creating a RAID10 with missing devices and how to later add them properly.

MD with missing devices HOWTO

We're going to create /dev/md10 as a RAID10, starting with two missing devices. In the example here, I use 4 loopback devices of 512MiB each: /dev/loop[1-4], but you should just substitute your real devices.

# mdadm --create /dev/md10 --level 10 -n 4 /dev/loop1 missing /dev/loop3 missing -x 0
mdadm: array /dev/md10 started.
# cat /proc/mdstat 
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4] 
md10 : active raid10 loop3[2] loop1[0]
      1048448 blocks 64K chunks 2 near-copies [4/2] [U_U_]
# mdadm --manage --add /dev/md10 /dev/loop2 /dev/loop4
mdadm: added /dev/loop2
mdadm: added /dev/loop4
# cat /proc/mdstat 
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4] 
md10 : active raid10 loop4[4](S) loop2[5](S) loop3[2] loop1[0]
      1048448 blocks 64K chunks 2 near-copies [4/2] [U_U_]

Now notice that the two new devices have been added as spares [denoted by the "(S)"], and that the array remains degraded [denoted by the underscores in the "[U_U_]"]. Now it's time to break out the sysfs interface.

# cd /sys/block/md10/md/
# grep . dev-loop*/{slot,state}
dev-loop1/slot:0
dev-loop2/slot:none
dev-loop3/slot:2
dev-loop4/slot:none
dev-loop1/state:in_sync
dev-loop2/state:spare
dev-loop3/state:in_sync
dev-loop4/state:spare

Now a short foray into explaining how MD-raid sees component devices. For an array with N devices total, there are slots numbered from 0 to N-1. If all the devices are present, there are no empty slots. The presence or absence of a device in a slot is noted by the display from /proc/mdstat: [U_U_]. That shows we have devices in slots 0 and 2, and nothing in slots 1 and 3. The mdstat output also includes a number in brackets after each device in the listing line: md10 : active raid10 loop4[4](S) loop2[5](S) loop3[2] loop1[0]. loop4 and loop2 are listed as slots 4 and 5, both spare. loop3 and loop1 are in slots 2 and 0. The numbers that are greater than the highest valid slot (3, for a four-device array) seem to be extraneous; I'm not sure if they are just an mdadm abstraction, or in the kernel internals only.
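The slot map can also be read mechanically. A minimal sketch, assuming a POSIX shell; `missing_slots` is a made-up helper name, not something mdadm provides. It turns the status field from /proc/mdstat into the list of empty slot numbers:

```shell
# missing_slots: hypothetical helper, not part of mdadm.
# $1 = status string from /proc/mdstat without the brackets, e.g. U_U_
# Prints the numbers of the empty slots, separated by spaces.
missing_slots() {
    map=$1
    i=0
    out=""
    while [ -n "$map" ]; do
        rest=${map#?}            # everything after the first character
        first=${map%"$rest"}     # the first character itself
        if [ "$first" = "_" ]; then
            out="${out:+$out }$i"
        fi
        map=$rest
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

missing_slots "U_U_"    # prints: 1 3
```

The helper only looks at the string it is given, so it says nothing about which spare should go into which slot; that choice stays with the administrator.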

Now we want to fix up the array. We want to promote both spares to the missing slots. This is the first item that Documentation/md.txt gets really wrong. The description for the slot sysfs node contains: "This can only be set while assembling an array." That is not the case: we CAN write to it and fix our array.

# echo 1 >dev-loop2/slot
# echo 3 >dev-loop4/slot
# grep . dev-loop*/slot
dev-loop1/slot:0
dev-loop2/slot:1
dev-loop3/slot:2
dev-loop4/slot:3
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4] 
md10 : active raid10 loop4[4] loop2[5] loop3[2] loop1[0]
      1048448 blocks 64K chunks 2 near-copies [4/2] [U_U_]

The slot numbers have changed in sysfs, but the numbers in the mdstat output have not, so the two no longer match at all. The spare marker "(S)" has also vanished. Now we can follow the sysfs documentation, and force a rebuild using the sync_action node.

In theory, the mdadm daemon, if running, should have detected that the array was degraded and had valid spares, but I don't know why it didn't. Perhaps another bug to trace down later.

# echo repair >sync_action 
(wait a moment)
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4] 
md10 : active raid10 loop4[4] loop2[5] loop3[2] loop1[0]
      1048448 blocks 64K chunks 2 near-copies [4/2] [U_U_]
      [=============>.......]  recovery = 65.6% (344064/524224) finish=0.1min speed=22937K/sec

The slot numbers still aren't what we set them to, but the array is busy rebuilding still.

# cat /proc/mdstat 
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4] 
md10 : active raid10 loop4[3] loop2[1] loop3[2] loop1[0]
      1048448 blocks 64K chunks 2 near-copies [4/4] [UUUU]

Now that the rebuild is complete, the slot numbers have flipped to their correct values.
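The slot writes from the walkthrough can be wrapped in a small guard. This is only a sketch: `fill_slot` is a hypothetical name, and it assumes, as in the listings above, that a spare reports `none` in its slot node.

```shell
# fill_slot: hypothetical wrapper around the sysfs writes shown above.
# $1 = md sysfs directory (e.g. /sys/block/md10/md)
# $2 = component name (e.g. loop2)
# $3 = slot number the component should take over
fill_slot() {
    md_dir=$1
    dev=$2
    slot=$3
    slot_file="$md_dir/dev-$dev/slot"
    # Refuse to touch a component that already occupies a slot.
    if [ "$(cat "$slot_file")" != "none" ]; then
        echo "$dev already has a slot" >&2
        return 1
    fi
    echo "$slot" >"$slot_file"
}
```

For the example array this would be called as `fill_slot /sys/block/md10/md loop2 1` and `fill_slot /sys/block/md10/md loop4 3`, followed by the same write to sync_action as in the transcript.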

Bonus: regular maintenance ideas

While we can regularly check individual disks with the daemon part of smartmontools, issuing short and long disk tests, there is also a way to check entire arrays for consistency.

The only way of doing it with mdadm is to force a rebuild, but that isn't really a nice proposition if it picks a disk that was about to fail as one of the 'good' disks. sysfs to the rescue again: there is a non-destructive way to test an array, and only promote to repair mode if there is an issue.

# echo check >sync_action 
(wait a moment)
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4] 
md10 : active raid10 loop4[3] loop2[1] loop3[2] loop1[0]
      1048448 blocks 64K chunks 2 near-copies [4/4] [UUUU]
      [============>........]  check = 62.8% (660224/1048448) finish=0.0min speed=110037K/sec

Either make a cronjob to do it, or put the functionality in mdadm. You can safely issue the check command to multiple md devices at once; the kernel will ensure that it doesn't simultaneously check arrays that share the same disks.
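A cron job for this could look roughly like the following. It is a sketch under assumptions: `start_checks` is a made-up name, the directory argument exists only so the function can be pointed at a test tree, and it relies on sync_action reading `idle` when nothing is running, as described in Documentation/md.txt.

```shell
# start_checks: hypothetical helper for a cron job.
# $1 = sysfs block directory (defaults to /sys/block)
# Writes "check" to the sync_action node of every md array that is idle.
start_checks() {
    root=${1:-/sys/block}
    for node in "$root"/md*/md/sync_action; do
        [ -f "$node" ] || continue      # the glob matched nothing
        if [ "$(cat "$node")" = "idle" ]; then
            echo check >"$node"
            echo "started check via $node"
        fi
    done
}
```

Arrays that are already resyncing, recovering or checking are left alone, so running it twice in a row does no harm.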

  Sun, 07 Sep 2008 17:55:00 +0200

So there have been quite a few people asking what is happening with KDE 4.1 on Gentoo. There are still no ebuilds in the tree for KDE 4.1 and to be totally honest I have not been actively developing the KDE ebuilds until recently (took quite a long break). It is however something I really feel that we should have and so I have been working with a few other developers and interested users on the new ebuilds in an overlay. I think these ebuilds are almost ready and I am very eager to get them into the tree.

So what have we been doing? We have been working on a set of ebuilds that can be used with portage 2.2. There were a few setbacks when it became clear that EAPI 2 was not likely to be allowed in the tree in the very near term, but good noises are being made on the mailing lists. I think one of the big changes I have been working on is bringing KDE back into the normal file system hierarchy. This means that you will not be able to slot multiple minor versions of KDE such as 4.1 and 4.2 together.

The loss of slotting also means that ebuild maintenance will be significantly easier; externally released packages will automatically relink to the latest kdelibs, for example. It also means that users will not build up multiple copies of minor KDE versions such as 4.0, 4.1, 4.2. I think for the majority of our userbase (myself included) such a build-up is undesirable, with few benefits for normal use of KDE. So having one slot for KDE 4 and upgrading in the normal fashion seems to be very beneficial.

This has not been universally accepted as a good thing. I personally think our main tree KDE should always be installed in the normal FHS hierarchy as upstream seems to intend it. I see little benefit for most users, and many developers not closely involved in KDE development, to having multiple minor versions installed. Also having external KDE packages such as k3b and amarok installed in /usr but linking to libraries in a KDE prefix has never been optimal.

My vision is for a default in-tree KDE that installs into the normal FHS tree as Gnome, Qt 4, XOrg etc do. Developers, power users and those that like to tweak would still be free to install slotted versions of KDE in KDE prefixes alongside the FHS KDE. They could also remove the FHS KDE and just use slotted versions too. We discussed during KDE 3.5 bumps how the current slotting was not ideal.

I am certainly open to opinions. It would be good to get wider feedback from the community on which direction they would like to go in and why. This will not affect KDE 3.5 slotting with KDE 4 - they will always be slotted. It will make maintenance of external KDE packages simpler as everything will be in the same tree. Overlays with ebuilds in alternate slots and prefixes are of course still easily possible and people could continue using KDE in this way.

Currently this work is taking place in an overlay. I am quite honestly exhausted discussing why this is or is not a good idea. I hope I have made my position clear and why I think it is a good thing for our user base. I really want to get KDE 4.1 into the tree as soon as possible. I do feel that these changes should be pushed into the tree, defaulting to an FHS compliant install and using a KDE prefix if requested.

What are your thoughts (other than get KDE 4.1 in tree now - we are working on that)?


In response to Bugday and my willingness to help, I want to help users that contribute. I won’t be around for Bugday and I am normally not available on the weekends. So, here is what I am willing to do. On any maintainer-needed bug, if you fix the issue, feel free to CC me (darkside (at) gentoo.org) on it. You should say something like “darkside: I fixed this and tested it, please commit” and I will review it and commit it.

Do not:

  • CC me on a bug that doesn’t have a fix.
  • CC me on a bug asking for help with something.
  • CC me on a bug with a fix that you personally have not tested.
  • CC me on a maintainer-wanted package. Sorry, but there is Sunrise for you.
  • Abuse my willingness to help you.
  • Expect an immediate commit. Work, school, and my significant other come first ;)

I am offering to do this because a) I like when users get involved and help make Gentoo better, b) the whole maintainership concept slightly bothers me, c) I can’t look at all 250+ maintainer-needed bugs to see if there is a fix for it by myself.

Here is a search for bugzilla that you could use to find maintainer-needed bugs (in assignee or CC). At the time of this writing there are 270. I (plus others I’m sure) have worked that queue before; the lowest I have seen it recently is 250. Consider this an open offer and we will see what happens. Thanks for helping!

What: Gentoo contributors get together to help each other fix bugs

Where: irc.freenode.net, #gentoo-bugs

When: Saturday, September 6, in a timezone near you

What do you need to bring?

  • A Gentoo system, an Internet connection and an IRC client
  • Your bug. If you don't have one, we will find you one to suit your area of interest and your skills
  • Your favorite editor
  • A way to test that your bug is fixed (asking people counts!)
  • You don't need to know C, C++, or bash

What's a bug? Gentoo's way of tracking change requests. A change request can be anything from "I've found a typo in foo" to "I've built this really useful program called bar but there's no ebuild for it." Bugs have various levels of helpfulness, from identifying the existence of a problem to localizing the problem to providing the patch to fix it.

There are bugs in documentation such as man pages as well as ebuilds and the source code that Gentoo distributes. These bugs are problem reports. Bugs for things Gentoo doesn't do yet but you think should be done are feature requests. Bugday is more about fixing problems than adding features, but you won't be turned away if you want help with a new feature.

Want to know more about Bugday? It's held on the first Saturday of every month. It's an opportunity for everyone to contribute to making Gentoo better, and eventually you might even become a Gentoo developer. See the Bugday project page for more details.

Bugday is about community spirit. Gentoo is a community—there is no "me" and "them", there is only "we," so instead of lobbying for "them" to fix your particular bug, work together to fix it! Bugday is an opportunity to get help to help yourself.

If you've been wanting to get involved but weren't sure how, Bugday is a great way for you to see what goes on in making a distribution and get involved in Gentoo.


Roy Bamford contributed the draft for this announcement.

Image Stacking

As Henner writes, we've managed to add a new feature that some users were requesting for quite a long time -- now there's that thingy for marking several images as "belonging together".

Imagine you've taken a series of group photos of a bunch of your friends at a pub. You weren't exactly sober, so perhaps some of these are shaky, out-of-focus or otherwise imperfect. You weren't drunk as a lord, though, so there's one of them which is pretty good. Now you don't want to erase all the other pictures, but you don't want to see them by default, either. So what you do now is to make an Image Stack of all of them and select which are bad and which one is the best one. This feature might be useful for various variants of images decoded from one RAW file or for images stitched to one big panorama as well.

It's far from complete yet (the GUI part still sucks and you can't select your order of preference, either), but it's a good start and it got done pretty fast.

Let's look at an example stack with a picture of Gunvald (Jesper's dog) on the beach. The default view in the thumbnail view shows only the first picture:
collapsed stack view … expanding to
expanded stack view

Please read Henner's post as well and let us know what you think about this feature.

Digikam

Shortly after my last post, I got a bunch of mails from our users. It's great to hear that people are interested in our work, so many thanks to all the readers! Please keep the comments going. Anyway, the most frequent question was interoperability with Digikam.

First, I'd like to mention that I'm on a KPhotoAlbum Development Sprint. None of us over here is hacking on Digikam. It surely is a great piece of software, but we don't use it ourselves. That's why I have to disappoint one reader of this blog -- nope, I can't blog about new features in Digikam, sorry. But be sure to check out the KDE Planet from time to time, their development team is blogging every now and then.

Now the interoperability thingy -- we have a nice file describing what the problems are. Looking at the document, I'm afraid that the two applications use quite different approaches to tagging -- in Digikam, tags are (I haven't checked the Digikam sources, nor am I using it, so this might actually be wrong) apparently in a flat namespace, while in KPhotoAlbum, we use a hierarchy (a tree) of tags for features like "pictures of Anna should be recognized as pictures containing anyone of my friends as well". This is certainly not a showstopper, but quite an important issue nonetheless.

However, please don't get disappointed too early. The question is -- why should we somehow prefer interoperability with Digikam over, say, XnView? There are a lot of users out there who are using different applications on different operating systems, and we'd like to do our best to be interoperable with all of them. At the same time, we certainly aren't willing to be held back by lack of features in other applications. Therefore, we won't write any feature like "copy my database to the Digikam format", nor do we expect Digikam to be able to import our stuff. (There's nothing holding you back from writing your own converter, though -- our format is pretty simple to deal with.) What we will do, however, is add a feature for metadata import and export. We'll use standardized stuff like Exif, IPTC and XMP, so that it should be easy for KPhotoAlbum users to provide their friends with images with embedded tags (and the other way round, of course).

KDE4, SQL,...

Another question was about the state of the KDE4 port and the SQL support. I'll leave that for another blogpost, as this one got pretty long already :).

  Thu, 04 Sep 2008 08:05:35 +0200
I blogged about the USE flag transition from USE=tetex to something more sensible before. This is finally done and so the next step is to get rid of virtual/tetex in the whole tree and depend on the specific virtual needed instead, such as virtual/tex-base, virtual/latex-base or virtual/texi2dvi. To avoid problems there is a recommendation for all users and developers: Check if TeXLive is available for you, unmerge teTeX and do a world update to get something more up to date. TeXLive 2008 will not provide virtual/tetex anymore and was released yesterday; after some tests it will hit the tree. Also check out the upgrade guide and this forums thread.
Edit 2008/09/05:

A private source that I inquired of indicates that the AD2000B part was only a special run of the AD1989B part. There shouldn't be any functional differences. As for a spec sheet, the AD1989B specs should be available "shortly" from Analog Devices.

Original posting:

More details will follow, but I picked up hardware for a new workstation to replace my G5. The only part of the hardware that isn't working yet is the digital audio (SPDIF/Toslink) output. My motherboard is an Asus P5Q-Premium, and the specifications claim to have "ADI® AD2000B 8-Channel High Definition Audio CODEC" as the audio chip. This chip is apparently the successor to the AD1988B chip. The analog audio part works fine; it's just that I use optical to overcome an interference issue on the run between my computers and my actual working area of my desk (with a small digital decoder and stereo speakers).

Digging around in the ALSA drivers, it just seems I need to find a different set of controls to toggle the digital lines to be outputs or enabled - and that this data would be in the public datasheet, just like previous versions of the chip. I submitted a technical request to Asus a few days ago, with no response yet. I also contacted Analog Devices directly. Their customer support referred me to their application engineers, whom I phoned, and they then proceeded to deny the existence of the chip, and I quote: "It's not in my system, we don't manufacture it." That's really interesting, because I've got it on my motherboard!

Either the divisions of Analog Devices aren't talking, or Asus is using chips from a 3rd party that's ripping off Analog Devices' trademark amongst other things.

Here's the text off the chip:

AD2000BX
14??793.1
#0816 0.3
SINGAPORE

I tried to take a photo, but it's really annoying and hard to read, without dis-assembling my machine, which I'd prefer not to do at this point.

However, I did find another photo on the web, of the same area from a review of the motherboard. The Analog Devices logo is also clearly visible after the 'BX' portion of the text. From the photo I could make out:

AD2000BX
1383055.1
#0808 0.2
SINGAPORE

If I had to make a guess about it, the chip is AD2000BX, the second line is the serial number, the third is the year and week of manufacture, plus the revision of the chip, and the last line is the manufacture location.

If you're from Asus or Analog Devices, and you're reading this, where's the datasheet for the chip? Is it a real ADI part? I simply want the public datasheet like the rest of the models so that I can fix digital audio output in Linux myself, and contribute it back to the ALSA project.

P.S. The upstream ALSA bug is here. There's no downstream Gentoo bug.

As I wrote earlier I've had some problems with the iwlwifi driver and my Intel 3945 wifi card. One workaround, brought to my attention again by Jeremy Olexa's blog post, is to run G mode only.

This can easily be done in OpenWRT Kamikaze with:

ipkg install wl

In the config file /lib/wifi/broadcom.sh add the following line just after the killall statement:

wl gmode GOnly

Restart network/box.
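The edit to broadcom.sh can be scripted as well. A rough sketch, assuming the file contains a line mentioning `killall` as described above; `add_gmode_line` is a made-up name and nothing here ships with OpenWRT:

```shell
# add_gmode_line: hypothetical helper.
# Reads a script on stdin and prints it with "wl gmode GOnly" added
# after the first line that mentions killall.
add_gmode_line() {
    awk '
        { print }
        /killall/ && !done { print "wl gmode GOnly"; done = 1 }
    '
}
```

One cautious way to use it is `add_gmode_line </lib/wifi/broadcom.sh >/tmp/broadcom.sh.new`, then compare the two files before copying the new one into place. The added line is not indented to match its surroundings, which the shell does not mind.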

With this workaround speed is back to normal, thanks to the initial blog post by Markus Golser here.

  Wed, 03 Sep 2008 07:48:39 +0200

Here’s something I haven’t written about in a long time — bend, my custom written CLI PHP5 scripts to rip and encode TV shows.

I actually rewrote the entire thing over Labor Day weekend.  What’s amazing is it took so long to write the original one, but so short a time to completely revamp it.  It’s something I’ve been wanting to do for a long time, and I’m glad I finally got to it.  The code on the old one was so horrible, and was such a frustrating experience to patch, debug or add features.  The new one is already 20 times better.

The first one was just plagued by scope creep, though — I started off just mostly coding it around the way that I thought DVDs *should* work and how they ought to be authored, only to be constantly slapped in the face by so many exceptions that I’d have to go back and hack it to work around the new found realities.

One example is that either lsdvd or libdvdread is buggy in how it outputs chapter information.  Actually, my whole experience with chapters has been that if there are any oddities, then the players will just freak out.  You wouldn’t believe the cases I ran into.  Anyway, here’s a small example.  On one DVD, lsdvd will report in original output that one track has 30 chapters on it.  But when you go to display the chapters, it will only say that there are two.  Most of the time, what happens is that it will choke anytime there is a chapter between others that is zero length.  In this case, lsdvd just chokes and stops counting them.  MPlayer (at least, the ancient version I’m using) will do a couple of things depending on its mood: sometimes freeze, sometimes skip over it, sometimes act like it’s not even there.  It’s very odd.  I’ve found a lot of interesting little bugs in the dvd libraries and tools.  I’d love to poke at the source and fix them up … when I have time.

The code is online in my svn repo, and the new one is called ‘drip’ for dvd ripper.  Original, I know, but eventually it will replace bend completely once I add in all the features the old one had plus all the new stuff I want.  I would throw in a link to trac which has prettier display output for viewing SVN files, but my installation is broken (again) and I have no idea why, and it’s always a royal pain trying to figure out what went wrong, so I’ll just fix it later.  I love trac, but it’s not easy debugging the setup.

Oh yah, also I’ve been working on my mythvideo setup, tweaking it even more.  One thing that really dawned on me, which I’ll write about in more detail once I actually have a script ready, is that you can use it to execute shell scripts using the File Types admin menu.  Just tell it to execute .sh files in your folders with /bin/bash and away you can go.

Another thing I learned is that MythVideo will only pass two variables to any external scripts, the default player (%d) and the video file (%s), or more accurately, the file you’ve selected to run.  So if you wanted to see what you’re executing, you would add this to the file type for .sh files: /bin/bash %s %s

Then, say you had test.sh, this would be the contents:

#!/bin/bash

# $1 is what MythVideo filled in for the second %s: the selected file
echo "$1"

I’m getting ahead of myself, though .. I’ll write more about that when I’ve got something to show.  I’m actually working on a shell script similar to mplayer-resume to resume playback of a playlist you’re in.  It’s a bit trickier than I thought it would (or rather, not nearly as simple as I had hoped), so I’m still scoping it out in my head.

Speaking of mplayer-resume, I fixed a bug I kept running into with it for a while now.  The script will now catch the exit code of mplayer, and if it’s not successful (zero), then it won’t overwrite or delete the old position.  I used to hit it all the time because I used to run mplayer -hardframedrop when playing my videos, which would crash the playback about 10% of the time and of course kill the file that had the playback position.  I need to repackage it and push it live, but there are a few more small fixes I want to make to it first … I might finish the playlist resume script first and add it to there.  Plus I want to get trac working, because that’s where its homepage is.
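The exit-code check can be sketched in a few lines of shell. This is not the actual mplayer-resume code: `play_and_save` is a hypothetical name, and how the new position is obtained is left out, so here it is simply passed in as an argument.

```shell
# play_and_save: hypothetical sketch of the exit-code check.
# $1 = file holding the saved playback position
# $2 = new position to store if playback ends cleanly
# remaining arguments = the player command line to run
play_and_save() {
    pos_file=$1
    new_pos=$2
    shift 2
    if "$@"; then
        # Player exited with status zero: safe to replace the old position.
        echo "$new_pos" >"$pos_file"
    else
        status=$?
        echo "player exited with status $status, keeping old position" >&2
        return "$status"
    fi
}
```

A call might look like `play_and_save ~/.resume/episode.pos 1234 mplayer episode.avi`, where the path and the position value are placeholders. A crashed player then leaves the old position file untouched.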

But, I moved my mini-itx to the living room and hooked it up to my HDTV.  It was sitting in my bedroom just collecting dust, and I figured I might as well move it to see if it gets any more usage.  Actually, I remember now, I moved it because the LED lights were really bright in my bedroom at night, and I have to sleep in total darkness to get a good night’s rest.   Anyway, it’s worked out well so far.  My TV has a VGA port so it’s super simple to plug it in, not to mention I like the fact that it doesn’t use up an HDMI port.  I love my TV. :)  Once I have this series playlist resume script finished, I think I’ll be pretty much “done” with having the setup that I’ve wanted so long.  Well, aside from the fact that I need about 12 more terabytes of harddrive space.

Good times, I tell you what.  I’m gonna go watch some Star Trek TNG.

  Tue, 02 Sep 2008 19:37:29 +0200

As jaervosz wrote the other day, the iwl3945 driver has some serious issues with it. I think I have it narrowed down to what conditions cause the problem.

Problem: When downloading large files for non-trivial amounts of time, the download speed drops to <80 K/s. This is unacceptable; the whole pipe is limited to that, by the way. I am not sure what exactly causes this but I have narrowed down the conditions under which it happens.

Encryption does not matter: it falters on wep/wpa{1,2} or open networks. However, I found that this condition only exists on mode B APs. This includes “mixed” APs as well. I do not know enough about drivers but if the AP offers B & G, then it should select G, right? Well, based on the condition of the speeds, I would have to say that it is selecting B mode and then hitting this bug again.

Anyway, for the time being…Do not use B APs. Easier said than done because I’m sure most every sys-admin would select Mixed AP over G only. So, if you are experiencing this issue as well, please comment on the upstream bug, which has been open for 7 months by the way. Annoying. Maybe we can convince them to look at this issue some more? Even intel employees are CC’d on the bug because they have the issue too.

Workaround: Convert your AP to G only or use G only APs.

  Tue, 02 Sep 2008 13:13:00 +0200

One interesting thing about using chroots to check things out is that often enough you stumble across different corner cases when you get to test one particular aspect of packages.

For instance, when I was testing linking collisions, I found a lot of included libraries. This time testing for flags being respected I found some other corner cases.

It might sound funky, but it has been common knowledge for a while that gcc -O0 sometimes produces bad code, and sometimes fails to build some packages. Unfortunately it’s difficult to track it down to specific problems when you’re “training” somebody in handling the compiler. Today, I found one of these cases.

I was going to merge sys-block/unieject in my flagstesting chroot so I could make sure it worked properly; for this, it needed dev-libs/confuse, which I use for configuration file parsing. All at once, I found this failure:

 i686-pc-linux-gnu-gcc -DLOCALEDIR=\"/usr/share/locale\" -DHAVE_CONFIG_H -I. -I. -I.. -Wall -pipe -include /var/tmp/portage/dev-libs/confuse-2.6-r1/temp/flagscheck.h -MT confuse.lo -MD -MP -MF .deps/confuse.Tpo -c confuse.c  -fPIC -DPIC -o .libs/confuse.o
confuse.c: In function 'cfg_init':
confuse.c:1112: warning: implicit declaration of function 'setlocale'
confuse.c:1112: error: 'LC_MESSAGES' undeclared (first use in this function)
confuse.c:1112: error: (Each undeclared identifier is reported only once
confuse.c:1112: error: for each function it appears in.)
confuse.c:1113: error: 'LC_CTYPE' undeclared (first use in this function)
make[2]: *** [confuse.lo] Error 1
make[2]: Leaving directory `/var/tmp/portage/dev-libs/confuse-2.6-r1/work/confuse-2.6/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/var/tmp/portage/dev-libs/confuse-2.6-r1/work/confuse-2.6'
make: *** [all] Error 2

This was funny to see as I did merge confuse lately on my main system on Yamato, and I do have nls enabled there, too. It didn’t fail, so it’s not even a glibc cleanup-related failure.

Time to dig into the code, where is setlocale used in confuse?

#if defined(ENABLE_NLS) && defined(HAVE_GETTEXT)
    setlocale(LC_MESSAGES, "");
    setlocale(LC_CTYPE, "");
    bindtextdomain(PACKAGE, LOCALEDIR);
#endif

As I used this before, I know that locale.h provides setlocale() function, but usually it’s included through gettext’s own libintl.h header file, so where is that included? A common problem here would be to have different preprocessor tests between the include and the use so that one is applied but not the other.

#if defined(ENABLE_NLS) && defined(HAVE_GETTEXT)
# include <libintl.h>
# define _(str) dgettext(PACKAGE, str)
#else
# define _(str) str
#endif
#define N_(str) str

It seems to be entirely fine; the only problem would be if libintl.h didn’t include locale.h, but why would it then work on the rest of the system?

The focal point here is to check why libintl.h includes locale.h in one place and not the other. Let’s then look at the file itself:

/* Optimized version of the function above.  */
#if defined __OPTIMIZE__ && !defined __cplusplus

[...]

/* We need LC_MESSAGES for `dgettext'.  */
# include <locale.h>

[...]

#endif  /* Optimizing.  */

So not for any kind of assurance, but just because there’s a technical need, libintl.h brings in the declaration of setlocale(), and only if you have optimisations enabled. Guess what? My chroot has no optimisations enabled, as I don’t need to execute the code, but just build it.

The fix here is very easy, just include locale.h explicitly; I’ll be sending a patch upstream and submitting one to Gentoo, but it puts an important shadow over the correctness of Free Software when building with optimisations disabled. I suppose this is one other thing that I’ll be testing for in the future, in my checklist.
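A checklist item like that could start out as a plain text scan. The following is only a sketch with a made-up name, `missing_locale_include`, and it is deliberately naive: it looks at each file on its own and does not follow other headers that might include locale.h.

```shell
# missing_locale_include: hypothetical checklist helper.
# Prints each given C file that calls setlocale() without including
# <locale.h> directly.
missing_locale_include() {
    for src in "$@"; do
        if grep -q 'setlocale *(' "$src" &&
           ! grep -Eq '^[[:space:]]*#[[:space:]]*include[[:space:]]*<locale\.h>' "$src"; then
            printf '%s\n' "$src"
        fi
    done
}
```

Run over an unpacked source tree, for instance with `missing_locale_include src/*.c`, it lists candidates to look at by hand; a hit is a hint, not proof of a build failure.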

As you might already know from the news item at the KPA website, I'm currently sitting behind a table in a wonderful country of Denmark. I'm not sitting there alone, though, because we're having a small KPhotoAlbum development sprint right now. Jesper, the main author, was so kind to invite us to his home, the KDE e.V. paid the flight tickets, so all we have to pay is just the barrels of beer and the T-shirts. And that's an investment a typical developer makes with pleasure :).

So far we have made just 17 commits, but there's a lot of talk going on here (hi Tuomas :p). For example, the SQL backend is getting closer and closer to being completely ready, the support for multiple cores is shaping up (and you won't see that annoying flickering anymore) and we even have a cute, transparent infobox which looks really sexy.

Right now, I'm working on a reworked EXIF/IPTC/XMP import/export thingy, which would essentially allow you to customize how KPhotoAlbum imports the metadata from the images, and also to store these data back to the files for increased interoperability and stuff like that. So I'll be finally able to share all the captions and tags with a friend of mine who's using XnView. This feature was already present in the 3.1.0 version, but the GUI for it was, well, not really intuitive. I'm converting it to Kross, the KDE scripting environment.

As I bought a GPS device some time ago, one can also expect some efforts towards integration with Marble. There's also a whiteboard full of interesting ideas which I won't mention here.

  Mon, 01 Sep 2008 13:58:43 +0200

Hi all…

Firefox 3.0.1 showing the Acid2 test on MIPS

I’ve been rather busy and thus haven’t had much time for Gentoo work, but today I managed to get some patches together that allow Firefox 3.0.1 to build and run on MIPS.

The ebuild is, as yet, unkeyworded, as I wish to do some further testing. I have successfully compiled it on little-endian MIPS, and it mostly seems to work okay. It mostly passes the Acid2 test with some slight errors, but unfortunately it crashes part way through the Acid3 test; this I’ll investigate when I have the time. It also crashes on my blog, so your mileage may vary.

No testing has been done on big-endian MIPS at this stage, as my O2 is down for the short term (need to build a new kernel and get X running) so I’d appreciate feedback from users on this matter.

  Mon, 01 Sep 2008 10:52:18 +0200

Just a quick post to let you all know that xf86-video-i810-2.4.2 has hit portage in ~arch.

Here's my request : please test the hell out of it !! :)

It seems to be a good improvement over the original 2.4.0 release as it has less flicker, but it needs some serious testing. I think there are still cases where I can see flickering but the conditions seem to vary over time and I can't reliably reproduce the issue.

So please test this new release as much as you can and let me know how it works for you. Bugs should be filed in Bugzilla and not in the comments ;)

As for the new 2.5 branch, it should hit ~arch with a big fat package.mask over the next couple of days.

Oh, and I'm now part of the X11 Team: as I have to work with X stuff for my job, I might as well contribute some effort there too.

Cheers!

Update : Please please file bugs if you still see flickering! Bugs won't solve themselves! Thanks :)

The August issue of the Gentoo Monthly Newsletter has been released. In this month's issue: PHP4 removal, GSOC interview, new Gentoo-based distributions, and more!

  Mon, 01 Sep 2008 05:52:34 +0200


I spent some time this weekend at the OLPC Physics Game Jam. I teamed up with the legendary Nirav Patel and we made a bridge building game. The objective is to build a bridge and see if it survives after a train starts travelling across it.

We only have one level so far, but it is quite engaging and not as easy as it might sound. It was interesting to see some youngsters try it and experiment with different bridge structures during the review session. The game also features some top notch sound effects coordinated by Brian Jordan.

We won the gold prize for game development. To learn more and download it, see the Bridge page on the wiki.

  Mon, 01 Sep 2008 02:46:00 +0200

As I said in What did Enterprise do?, I had (and have again) a series of chroots I use for testing particular setups; I have, for instance, one running OpenPAM that can tell me whether the software in the tree has the proper dependencies (either sys-libs/pam if it wants Linux-PAM or virtual/pam if it works with OpenPAM).

Since yesterday, thanks to solar, I have a new addition to my testing rig: an uclibc chroot. I asked solar to get me something I could download and run locally as I had to fix a bug with PAM which is now fixed.

I have to say that I don’t know much yet about the setup of uClibc itself, which means I haven’t yet understood well how iconv is supported in it. Certainly I now know that once USE-based dependencies are available in the tree I’ll try once again to see if libiconv can be used for something else besides Gentoo/FreeBSD (but the collision with man-pages should be solved before that, if it’s not already).

Even though I know solar does not really wish for me to mess with NLS and uClibc, I find it a pretty important part of Gentoo/FreeBSD work, and always have. The reason is that it’s easier to fix something the right way when you have more than one alternative case; otherwise you might end up special-casing something that should be made generic instead.

I also expect Gentoo/FreeBSD 7 to be upcoming, and that will probably mean my return to that too, now that I can get a VirtualBox running at a decent speed.

But I haven’t even started bringing the uClibc chroot up to speed with what I want to do; in particular I want to set up my cowstats script on it too, and maybe one day add the flags testing script as well (which is unfortunately disruptive).

All in all, I hope that having a uClibc chroot around will allow the packages I maintain to work out of the box on uClibc; it’s going to be a pretty interesting task.

As I have written in my post Flags and flags, I think that one way out of the hardened problem would be to actually respect the CFLAGS and CXXFLAGS the user requests so that they actually apply to the ebuilds. Unfortunately, not all the ebuilds in the tree respect the flags, and finding out which ones do and which ones don’t hasn’t been, up to now, an easy task.

There are a few ways to check this. The most common one is to look at the build output and spot the compile lines that lack your custom flags, but this is difficult to automate. Another option is to inject a fake definition option (-DIWASHERE) and grep for it in the build logs, but this breaks down if you consider that a package might ignore CFLAGS for just a subset of its final outputs.
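For what it is worth, the grep-based check can be sketched in a couple of lines. The function name lines_without_marker is invented for this example, and the way it recognises a compile line (anything containing " -c ") is an assumption that will not hold for every build system, for instance when the build output is silenced.

```shell
# Print the compiler invocations from a build log that lack the
# marker define. Assumption: a compile line is any line that
# contains " -c ", which is only a heuristic.
lines_without_marker() {
    grep -e ' -c ' "$1" | grep -v -e '-DIWASHERE'
}
```

Called on a saved build log, it prints only the compile lines where the injected flag went missing, which is already a bit more precise than grepping the whole log for a single hit.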

While I was without Enterprise I spent some time thinking about this and I came up with a possible solution, which I’m going to experiment with on Yamato, starting tonight (which is Friday 29th for what it’s worth).

The trick is that GCC provides a flag that allows you to include an extra file, unknown to the rest of the code. With a properly structured file, you can easily inject some beacon that you can later pick up.

And with a proper beacon injected in the build files, it shouldn’t be a problem to check using scanelf or similar tools if the flags were respected.

The trick here is all in the choice of the beacon and in looking it up. The first requirement for a proper beacon is that it does not intrude on or disrupt the code or the compilation: it has to have a name that is not common, so that it does not risk colliding with other pieces of code, and it must not clash between different translation units.

To solve this, the name can just be very long, so that it’s implausible that somebody might have used it for a function or variable name; let’s say we call that beacon cflags_test_cflags_respected. This is the first step, but it still doesn’t solve the problem of clashing translation units. If we were to write it like this:

const int cflags_test_cflags_respected = 1234;

two translation units with that in them, linked together, will cause a linker error that will stop the build. This cannot happen or it’ll make our test useless. The solution is to make the symbol a common symbol. In C, common symbols are usually the ones that are declared without an initialisation value, like this:

int cflags_test_cflags_respected;

Unfortunately this syntax doesn’t work in C++, as the notion of common symbol hasn’t crossed that language barrier. This means that we have to go deeper in the stack of languages to find a way to create the common symbol. It’s not difficult, once you decide to use assembly language:

asm(".comm cflags_test_cflags_respected,1,1");

will create a common symbol of size 1 byte. It won’t be perfect as it might increase the size of .bss section for a program by one byte, and thus screw up perfect non-.bss programs, but we’re interested in the tests rather than the performance, as of this moment.

There is still one little problem though: the asm keyword is not accepted when compiling in strict C99 mode, so we’ll have to use the alternate spelling instead: __asm__, which works in just the same way.

But before we go on with this, there is something else to take care of. As I have written in the entry linked at the start of this one, there are packages that mix CFLAGS and CXXFLAGS. While we’re here, it is easy to add some more test beacons that track down for us whether the package has used CFLAGS to build C++ code or CXXFLAGS to build C code. With this in mind, I came to create two files: flagscheck.h and flagscheck.hpp, to be injected through CFLAGS and CXXFLAGS respectively.

flame@yamato ~ % sudo cat /media/chroots/flagstesting/etc/portage/flagscheck.h
#ifdef __cplusplus
__asm__(".comm cflags_test_cxxflags_in_cflags,1,1");
#else
__asm__(".comm cflags_test_cflags_respected,1,1");
#endif
flame@yamato ~ % sudo cat /media/chroots/flagstesting/etc/portage/flagscheck.hpp
#ifndef __cplusplus
__asm__(".comm cflags_test_cflags_in_cxxflags,1,1");
#else
__asm__(".comm cflags_test_cxxflags_respected,1,1");
#endif

And here we are, now it’s just time to inject these in the variables and check the output. But I’m still not satisfied with this. There are packages that, mistakenly, save their own CFLAGS and pass them on to other programs that link against them; to keep these from falsifying our tests, I’m going to make the injection unique on a package level.

Thanks to Portage, we can create two functions in the bashrc file, pre_src_compile and post_src_compile. In the former, we’re going to link the two header files into the ${T} directory of the package (the temporary directory), then we can mess with the flags variables and insert the -include option. This way, each package will get its own particular path; when a library passes the CFLAGS assigned to itself on to another package, that package will fail to build.

pre_src_compile() {
    ln -s /etc/portage/flagscheck.{h,hpp} "${T}"

    CFLAGS="${CFLAGS} -include ${T}/flagscheck.h"
    CXXFLAGS="${CXXFLAGS} -include ${T}/flagscheck.hpp"
}

After the build has completed, it’s time to check the results. Luckily pax-utils contains scanelf, which makes it a piece of cake to check whether one of the four symbols is defined, or if none is (and thus all the flags were ignored). The one-line function is as follows:

post_src_compile() {
    scanelf "${WORKDIR}" \
        -E ET_REL -R -s \
        cflags_test_cflags_respected,cflags_test_cflags_in_cxxflags,cflags_test_cxxflags_respected,cflags_test_cxxflags_in_cflags
}

At this point you just have to look for the ET_REL output:

ET_REL cflags_test_cflags_respected /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/tilde/tilde.o 
ET_REL cflags_test_cflags_respected /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/tilde/shell.o 
ET_REL  -  /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/getopt.o 
ET_REL cflags_test_cflags_respected /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/bash.o 
ET_REL  -  /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/getopt1.o 
ET_REL cflags_test_cflags_respected /var/tmp/portage/sys-apps/which-2.19/work/which-2.19/which.o

And it’s time to find out why getopt.o and getopt1.o are not respecting CFLAGS while the rest of the build is.
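Picking the offenders out of a long scanelf listing by eye gets tedious, so a small filter helps. The sketch below assumes the three-column output shown above (type, symbol or "-", path) and paths without spaces; objects_without_beacon is just a name made up for this example.

```shell
# Read scanelf output on stdin and print the object files for which
# no beacon symbol was found (second column is "-").
# Assumes whitespace-separated columns and no spaces in paths.
objects_without_beacon() {
    awk '$1 == "ET_REL" && $2 == "-" { print $3 }'
}
```

It is meant to sit at the end of the scanelf call from post_src_compile, for instance by piping that command into objects_without_beacon.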

A far less common problem than the last two I have written about, today I wish to analyse the failure in media-gfx/sam2p I reported. I have found similar problems before, and thus I think it’s another case worth talking about although the fix is very quick.

The failure in question would be this one:

Created executable file: ps_tiny (size: 47530).
ps_tiny: error at 1.2.1: tag %<Head or %<Open expected
make: *** [l1ghz.pst] Error 3
make: *** Waiting for unfinished jobs....
encoder.cpp: In member function ‘virtual void CjpegEncode::P::vi_copy(FILE*)’:
encoder.cpp:1033: warning: array subscript is above array bounds
encoder.cpp:1034: warning: array subscript is above array bounds
encoder.cpp:1035: warning: array subscript is above array bounds
ps_tiny: error at 638.5.21620: premature EOF
make: *** [l1g8z.pst] Error 3
ps_tiny: error at 638.5.21620: premature EOF
make: *** [l1gbz.pst] Error 3

The “premature EOF” error message usually means a file is truncated. With experience, you can tell this is a race condition: either the same broken rule or two rules are creating and deleting a file, and one of the two is arriving after it was deleted already.

In this case, looking at the original Makefile, it’s not the same broken rule:

l1g8z.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 tmp.h >tmp.i
        <tmp.i >tmp.pin $(PREPROC_STRIP)
        <tmp.pin >tmp.ps0 ./ps_tiny
        <tmp.ps0 >tmp.pst $(TTT_QUOTE) $@
        mv -f tmp.pst $@
l1ghz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_HEXD=1 tmp.h >tmp.i
        <tmp.i >tmp.pin $(PREPROC_STRIP)
        <tmp.pin >tmp.ps0 ./ps_tiny
        <tmp.ps0 >tmp.pst $(TTT_QUOTE) $@
        mv -f tmp.pst $@
l1gbz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_BINARY=1 tmp.h >tmp.i
        <tmp.i >tmp.pin $(PREPROC_STRIP)
        <tmp.pin >tmp.ps0 ./ps_tiny
        <tmp.ps0 >tmp.pst $(TTT_QUOTE) $@
        mv -f tmp.pst $@

I didn’t copy over all the rules, but this already shows the problem here. All the rules, while not exactly identical (the flags passed to the pre-processors are different depending on the target), use the same setting and use the same file names. The result is that while one rule runs the others will run too, creating the race condition.

For Gentoo I fixed it in a slightly sub-optimal way, changing all the references to tmp. into $@.tmp. This is not exactly the nicest way: the correct way would have been to create different rules that generate the various temporary stages, so that they could be executed in parallel as much as possible rather than only sequentially. But as I see very little space for parallelism here, and the build system is a bit of a mess, I thought it was much easier to leave it at that. The result is that the rules above would become:

l1g8z.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >$@.tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 $@.tmp.h >$@.tmp.i
        <$@.tmp.i >$@.tmp.pin $(PREPROC_STRIP)
        <$@.tmp.pin >$@.tmp.ps0 ./ps_tiny
        <$@.tmp.ps0 >$@.tmp.pst $(TTT_QUOTE) $@
        mv -f $@.tmp.pst $@
l1ghz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >$@.tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_HEXD=1 $@.tmp.h >$@.tmp.i
        <$@.tmp.i >$@.tmp.pin $(PREPROC_STRIP)
        <$@.tmp.pin >$@.tmp.ps0 ./ps_tiny
        <$@.tmp.ps0 >$@.tmp.pst $(TTT_QUOTE) $@
        mv -f $@.tmp.pst $@
l1gbz.pst: l1zip.psm psmlib.psm ps_tiny
        <$< >$@.tmp.h perl -pe0
        $(CXX) -E $(L1_FLAGS) -DUSE_BINARY=1 $@.tmp.h >$@.tmp.i
        <$@.tmp.i >$@.tmp.pin $(PREPROC_STRIP)
        <$@.tmp.pin >$@.tmp.ps0 ./ps_tiny
        <$@.tmp.ps0 >$@.tmp.pst $(TTT_QUOTE) $@
        mv -f $@.tmp.pst $@
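The rename itself is mechanical enough to script. The following is only a sketch of how it could be done with sed, not necessarily how the actual patch was produced; the function name uniquify_tmp and the list of suffixes (taken from the rules shown above) are assumptions.

```shell
# Rewrite the shared temporary names (tmp.h, tmp.i, tmp.pin, tmp.ps0,
# tmp.pst) in Makefile text read from stdin so that each one is
# prefixed with the target name, $@. Names that already carry the
# prefix are left alone, so running it twice is harmless.
uniquify_tmp() {
    sed -E 's/(^|[^[:alnum:]_.@])tmp\.(h|i|pin|ps0|pst)/\1$@.tmp.\2/g'
}
```

Feeding the original Makefile through it (uniquify_tmp < Makefile > Makefile.new) should give rules of the shape shown above; the result still deserves a look with diff before being trusted.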

The alternative using pipes, for the first rule, would probably be something like:

l1g8z.pst: l1zip.psm psmlib.psm ps_tiny
        perl -pe0 < $< | \
        $(CXX) -E $(L1_FLAGS) -DUSE_A85D=1 | \
        $(PREPROC_STRIP) | \
        ./ps_tiny | \
        $(TTT_QUOTE) $@ > $@

I haven’t changed it into this because I didn’t have much time to look into how much difference it makes, or to test it; I’ve written it down on my TODO list for the future, as it may be a possible improvement.

In general, for parallel make, pipes should be preferred to temporary files, and if temporary files are needed, they should have different names for each target, so that they won’t overwrite one another when make is run in parallel.

  Fri, 29 Aug 2008 22:20:24 +0200

A new version of portpeek can be found here (version 1.6.7) or emerged through portage.

Changes:

The way I was checking if a package was masked was incorrect. If you had a package in package.unmask and it was also in profiles/package.mask, the code was incorrectly surmising the package was stable.

Bug reports and patches are always welcome.

Here comes another case study about parallel make failures and fixes. This time I’m going to write about a much less common, and more difficult to understand, type of failure. I have spotted and fixed this failure in gtk# (yes I have it installed).

Let’s see the failure to begin with:

Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
ALINK: error A1019: Metadata failure creating assembly -- System.IO.FileNotFoundException: Could not find file "/var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.4.glib-sharp.dll".
File name: "/var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.4.glib-sharp.dll"
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at System.IO.File.OpenRead (System.String path) [0x00000] 
  at Mono.Security.StrongName.Sign (System.String fileName) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
ALINK: error A1019: Metadata failure creating assembly -- System.IO.IOException: Sharing violation on path /var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.4.glib-sharp.dll
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean isAsync, Boolean anonymous) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess)
  at System.Reflection.Emit.ModuleBuilder.Save () [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
ALINK: error A1019: Metadata failure creating assembly -- System.IO.IOException: Sharing violation on path /var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.6.glib-sharp.dll
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean isAsync, Boolean anonymous) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess)
  at System.Reflection.Emit.ModuleBuilder.Save () [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
ALINK: error A1019: Metadata failure creating assembly -- System.IO.IOException: Sharing violation on path /var/tmp/portage/dev-dotnet/gtk-sharp-2.10.2/work/gtk-sharp-2.10.2/glib/policy.2.8.glib-sharp.dll
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at System.IO.File.OpenWrite (System.String path) [0x00000] 
  at Mono.Security.StrongName.Sign (System.String fileName) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName, PortableExecutableKinds portableExecutableKind, ImageFileMachine imageFileMachine) [0x00000] 
  at System.Reflection.Emit.AssemblyBuilder.Save (System.String assemblyFileName) [0x00000] 
  at Mono.AssemblyLinker.AssemblyLinker.DoIt () [0x00000] 
make[3]: *** [policy.2.4.glib-sharp.dll] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Okay so there are some failures during calling “alink”, in particular it reports “sharing violations”. I suppose the name of the error message is derived from the original .NET as “sharing violation” is what Windows reports when two applications try to write to the same file at once, or one tries to write to a file that is locked down by someone else.

But I want to put some emphasis on something in particular:

Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
Creating policy.2.4.glib-sharp.dll
[...]
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
Creating policy.2.6.glib-sharp.dll
[...]
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
Creating policy.2.8.glib-sharp.dll
[...]
make[3]: *** [policy.2.4.glib-sharp.dll] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

As you can see each policy is reportedly created thrice. If you, like me, know what to look for in a parallel make failure, you’ll also notice that there are three policies being created there. This is quite important and interesting, as it already suggests to an experienced eye what the problem is, but let’s go on step by step.

Once again, we know the software is built with automake so you don’t expect parallel make failures, not from the default rules at least. But C#/Mono is not one of the languages that automake supports out of the box. Which means that almost surely there are custom rules involved.

As they are using custom rules rather than automake’s own, solving the problem requires knowledge of GNU make (or any other make, but let’s assume GNU for now; it’s the most common in Free Software after all, for good or bad).

Let’s look for the “Creating” line in the Makefile.am file:

$(POLICY_ASSEMBLIES): $(top_builddir)/policy.config gtk-sharp.snk
        @for i in $(POLICY_VERSIONS); do        \
          echo "Creating policy.$$i.$(ASSEMBLY)";       \
          sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$$i/" $(top_builddir)/policy.config > policy.$$i.config;  \
          $(AL) -link:policy.$$i.config -out:policy.$$i.$(ASSEMBLY) -keyfile:gtk-sharp.snk;     \
        done

If you had to deal with a similar failure before (as I did), you knew already what you were going to find in that rule. I’m referring to the for loop. It’s a common mistake for people who don’t know make well enough to create a rule like this. They expect that declaring multiple targets in the rule means, for make, “build all of these with a single command”, while it actually means “for any of these files, use this command to generate it”.

The result is that, as you’re going to need three different files, make will launch that code three times in parallel. Not only does this waste a huge amount of time, it will also fail, as the three of them might try to access the same resource at once (as is happening here).

The solution for this kind of problem is not really obvious, as it often requires rewriting the rules entirely. My usual way of thinking of the problem here is that whoever wrote the rule didn’t know make well enough and made a mistake, and it’s easier to just rewrite the rule.

Let’s decompose the rule then: ignoring the for loop and the echo line, what we have is these two commands:

sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$$i/" $(top_builddir)/policy.config > policy.$$i.config
$(AL) -link:policy.$$i.config -out:policy.$$i.$(ASSEMBLY) -keyfile:gtk-sharp.snk

Each of these two commands creates a different file: one is intermediate, the policy configuration, and the other is the final one. This again shows a lack of understanding of how make is supposed to work, again a very common one, so I’m not blaming the developer here; make is a strange language. So there are two dependent steps involved here: the final requested result is the policy file, but to generate that you need the policy configuration.

Let’s start with the policy configuration then. The actual generation command is a simple sed call that takes the generic configuration and sets the assembly name and policy version in it. The problem here is obviously to replace the use of $$i (the variable used in the for loop) with the actual policy version. Just so we’re clear, the policy version is the 2.4, 2.6 and 2.8 string we have seen before. Luckily this is a pretty common task for software like make, and there is a construct that comes to our help: pattern rules.

The name of the generated file is always in the format policy.$VERSION.config, and we need to know the $VERSION part to use it in sed. Nothing is more suited to this than a pattern rule. Let’s replace the variable section of the filename with the magic symbol %; make will take care of matching that as needed, and will also provide us with a special variable in the rule, $*, that takes the value the % matched (the stem). The rule then becomes this:

policy.%.config: $(top_builddir)/policy.config
    sed -e "s/@ASSEMBLY_NAME@/$(ASSEMBLY_NAME)/" -e "s/@POLICY@/$*/" $(top_builddir)/policy.config > $@
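To see what ends up in $* for each target, the matching can be mimicked in plain shell: strip the fixed prefix and suffix, and whatever is left is the stem. This is only an illustration of the idea, with a made-up helper name (stem_of); unlike make, it does not check that the name actually matches the pattern.

```shell
# Mimic how make matches a pattern such as "policy.%.config" against
# a target name: remove the text before and after the %, and print
# what is left, which is what make would put in $*.
# $1 = target name, $2 = text before %, $3 = text after %
# The body runs in a subshell so the helper variable does not leak.
stem_of() (
    s=${1#"$2"}
    s=${s%"$3"}
    printf '%s\n' "$s"
)
```

For example, stem_of policy.2.4.config policy. .config prints 2.4, which is exactly the string the sed call needs for @POLICY@.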

And here we’ve created our policy configuration files in a parallel-build-friendly way: as none of them depends on the others, the three sed commands can easily be executed in parallel.

Now it’s time to create the actual policy assembly. Here we’re going to use a static pattern rule, making the best use of the fact that you can also declare dependencies based on the pattern.

Instead of a simple two-entry rule, this is going to be a three-entry rule: the first entry defines the list of targets that this rule applies to, which is the same as it was before ($(POLICY_ASSEMBLIES)); the second and third are the usual ones, defining the target pattern and the dependencies.

While the original rule depended directly on the generic policy config, this one will only depend on the actual final config, as the rule we just wrote for the configuration files will take care of it. So the final rule to generate the wanted assembly will be:

$(POLICY_ASSEMBLIES) : policy.%.$(ASSEMBLY): policy.%.config gtk-sharp.snk
    $(AL) -link:policy.$*.config -out:$@ -keyfile:gtk-sharp.snk

At this point, the same just has to be applied to all the involved Makefile.am files in the package, like I did in the patch I submitted, and the package becomes totally parallel build friendly.

There is another nice side to this: you’re trading one complex, difficult to read and broken rule for two one-liner rules, which makes the code much more readable and understandable if you’re looking for a mistake.

Following my post about parallel builds, I started today to tackle some issues with packages not building properly with parallel make. Most of them end up being quite easy to fix; some of them don’t have to be fixed at all, and just need the -j1 dropped from the ebuild because they already build fine (this is usually due to an older version failing and the ebuild never being revisited).

As I haven’t yet been able to find the time and energy to start writing full-fledged guides again (the caffeine starvation doesn’t help), I decided to start writing some “case studies”. What I mean is that I’ll try to blog about some common problems I found in a particular package, and show the process to fix them. Hopefully, this way it’ll be easier for others to fix similar problems in the future. This also goes toward the goal of showing more of what Yamato does (by the way, once again thanks to everybody who contributed, and you all are still able to chip in if you want to help me).

The first case study in the list is for libbtctl (which I think is deprecated, from what I can understand of its author’s comment).

When building with -j8 (and dropping the ebuild serialisation), the build will fail with an error similar to this:

libtool: compile:  x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I.. -g -I../intl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include -I/usr/include/pygtk-2.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/python2.5 -I/usr/include -DDATA_DIR=\"/usr/share/libbtctl\" -DGETTEXT_PACKAGE=\"libbtctl\" -march=barcelona -O2 -ftracer -pipe -ftree-vectorize -Wformat=2 -Wno-error -Wno-pointer-sign -g -ggdb -Wstrict-aliasing=2 -Wno-format-zero-length -MT btctl-pymodule.lo -MD -MP -MF .deps/btctl-pymodule.Tpo -c btctl-pymodule.c -o btctl-pymodule.o >/dev/null 2>&1
libtool: compile:  x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I.. -g -I../intl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include -I/usr/include/pygtk-2.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/python2.5 -I/usr/include -DDATA_DIR=\"/usr/share/libbtctl\" -DGETTEXT_PACKAGE=\"libbtctl\" -march=barcelona -O2 -ftracer -pipe -ftree-vectorize -Wformat=2 -Wno-error -Wno-pointer-sign -g -ggdb -Wstrict-aliasing=2 -Wno-format-zero-length -MT btctl-py.lo -MD -MP -MF .deps/btctl-py.Tpo -c btctl-py.c -o btctl-py.o >/dev/null 2>&1
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btlist] Error 1
make[3]: *** Waiting for unfinished jobs....
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btctl-async-test] Error 1
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btctl-discovery-test] Error 1
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btsignal-watch] Error 1
make[2]: *** [all] Error 2
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

It’s an easy error to understand: it cannot find libbtctl.la, piece of cake. It’s more of a problem to find the cause if you don’t know it beforehand.

The first comment to make here is that the build system used is standard autotools; standard autotools, if used with their internal rules, are not subject to parallel-make failures. They don’t build directories in parallel, but they do the rest as much in parallel as they can. This means that it’s either using a custom rule, or it has misused autotools.

Another common cause of “cannot find the library” errors with libtool is the library being in a different directory, with the order of subdirectories being wrong. This rarely creeps into the distributed tarball if upstream is smart enough to run make distcheck, or at least to build their own tarballs, but you never know; usually you find this while trying to change the way interdependent libraries link against each other so that they can be built with --as-needed.

But there’s a tell-tale sign in the message: the library is not prefixed with any path, so it’s not being built in a different directory but in the same one. This makes it very suspicious.

The first error comes from btlist, so let’s extract the source tarball, and look in src/Makefile.am (because that’s the most likely directory where it is defined, we could have grepped but it’s easier this way):

noinst_PROGRAMS=btlist [...]

[...]

btlist_LDFLAGS = \
        libbtctl.la  $(BTCTL_LIBS) \
        $(BLUETOOTH_LIBS) $(OPENOBEX_LIBS)

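What do you know? This is the only property defined for the btlist target, and indeed it doesn’t look right: the LDFLAGS variable should be used to pass flags to the linker (like -Wl,--as-needed), not the names of libraries. Even worse, these are the names of libraries that have to be built as prerequisites for the target: automake derives a program’s dependencies from its LDADD variable, not from LDFLAGS, so make has no idea that libbtctl.la must exist before btlist is linked, and a parallel build can try to link too early.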

Edit: Rémi pointed out that I didn’t give the actual solution here, for those who don’t know automake so well. The correct variable to pass the libraries in is either LIBADD (when the target is itself a library) or LDADD (when it is a final executable). As btlist is in PROGRAMS, the latter is what we need to use.
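As a rough way to hunt for the same mistake in other packages, here is a small shell sketch that reports any *_LDFLAGS assignment in a Makefile.am that lists a libtool archive. The function name find_la_in_ldflags is made up for this example, and the check is a heuristic: it only follows backslash continuations and looks for words ending in .la.

```shell
# Hypothetical helper: report *_LDFLAGS assignments that list .la files.
# Usage: find_la_in_ldflags path/to/Makefile.am
find_la_in_ldflags() {
    awk '
        # Start of an assignment such as "btlist_LDFLAGS = \"
        /^[A-Za-z0-9_]+_LDFLAGS[ \t]*[+]?=/ {
            in_var = 1
            name = $1
            sub(/[+]?=.*/, "", name)
        }
        # A libtool archive named inside the assignment
        in_var && /\.la([ \t\\]|$)/ {
            printf "%s:%d: %s lists a libtool archive\n", FILENAME, FNR, name
        }
        # The assignment ends on the first line without a trailing backslash
        in_var && !/\\[ \t]*$/ { in_var = 0 }
    ' "$1"
}
```

Run against the src/Makefile.am fragment quoted above, it should point at the line containing libbtctl.la; after moving the libraries to btlist_LDADD it should print nothing.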

And obviously the same mistake is repeated for almost every target in the Makefile.am. But luckily there’s a very active upstream, and the bug can be solved the same day it is reported.

It’s not so difficult once you see how to do it, is it?

  Thu, 28 Aug 2008 23:31:13 +0200

..and I’m not talking about the famous German mathematician. I’m talking about my ~x86-fbsd box.

It’s dead, and at the moment I have no way to replace it. So donations of any kind, or access to other boxes running gentoo/freebsd 7.0, are really _appreciated_. I’m asking because gentoo/freebsd is about to be released, we’re doing a huge amount of work testing and keywording packages, and it wouldn’t be good if we had to stop now. If you want to donate something, please drop me a line at dav_it at g.o or leave a comment on this post.

Thanks a lot

not_so_happy davit

In the IT world we’re obviously full of practices that, while they work, are strongly discouraged because they are risky, broken on different setups, or just stupid. Many of these practices are common either because they are easy to apply without knowing better, or because they were documented somewhere and people read and spread them.

These practices, in most compiled programming languages, when using optimising compilers, are guarded by warnings, almost-errors that are printed on the error stream of the compiler itself when it identifies a suspect construct. If you use Gentoo, you know them very well as you certainly see lots of them (unless you have -w in your CFLAGS).

Lots of people ignore warnings, either because fixing them is too much work, as it would require changing a huge part of the code, or because the warnings don’t stop the code from compiling. Much more rarely the code actually works fine and the warning is bogus; it’s not unheard of though. Also, the more advanced the warning, the more likely it is to be implemented wrong.

On the other hand, the vast majority of warnings are put there for a good reason, and should actually be properly taken care of. These warnings could have been used to make a program 64-bit safe years before 64-bit systems started to be widespread, or might have made sure that the code of a project written years before GCC 4.3 would build correctly with the latest version of the compiler. Of course they are not the one and absolute solution, as many changes might not have had warnings before (like the std:: namespace change), but they could have helped.

But I don’t want to talk about compiler warnings today; I’d rather talk about Portage warnings. For a few versions now, thanks also to Zack and Marius being available, Portage has been throwing warnings after a successful merge, giving you insight into possible problems with an ebuild or with the software the ebuild uses. These are pretty useful, as they can catch a ./configure switch that was renamed or removed after a version bump, and they might tell you if the software is doing something risky that you should warn upstream about. (Why should you warn upstream? Well, packagers often see a lot more code than the everyday programmer; it’s not uncommon for a programmer to be unaware of an issue that a packager knows about because of the distribution’s policy on the problem.)

In addition to these that might be setup-dependent, repoman also started warning about suspect (R)DEPEND and other issues with the ebuilds. Hopefully, even if repoman will probably become slower by piling up checks in it, it will be nice to make sure developers know what they are committing in.

This is particularly important because there are quite a few sub-optimal ebuilds in the tree already, and while it’s difficult to find and fix all of them, it’d be quite nice if we could avoid introducing new ones.

Unfortunately, I’m starting to worry that it might not be as feasible as I hoped, because there is a huge fault in my idea that adding warnings will keep people away from the mistakes: there is a lack of documentation on these problems. As much as I wish I could count my blog as a source of documentation, I know this is far from the truth, but I haven’t been able to start writing docs again yet because I was following this world rebuild closely, at least to understand how to set my priorities. I know I’ll be working on quite a few things in the future, especially once the hospital is just a memory, and hopefully I’ll be able to write enough documentation that the warnings become clear and the whole tree is safe for everybody to use under whichever circumstances.

Since I now have a true SMP system, it’s obvious that I’m expected to run parallel make to make use of this. Indeed, I set in my make.conf to use -j8 where I was using -j1 before. This has a few problems in general and it’s going to take some more work to be properly supported.

But before I start to get to those problems, I’d like to provide a public, extended answer to a friend of mine who asked me earlier today why I’m not using Portage 2.2’s parallel emerge feature.

Well, first of all, parallel emerge is helpful on an SMP system during a first install, a world rebuild (which is actually what I’m doing now) or a long update after some time spent offline; it is of little help when doing daily upgrades, or when installing a new package.

The reason is that you can effectively only merge in parallel packages that are independent of each other. This is not so easy to ensure; to avoid breaking stuff, I’m sure Portage takes the safe route and serialises rather than risk brokenness. But even this expects the dependency tree to be complete, and you won’t find it complete, because packages built with GCC are not going to depend on it. The system package set is not put in the DEPEND variable of each ebuild, as things stand, and this opens a proverbial Pandora’s box of problems, in my view. (Now you can also look up an earlier proposal of mine, and see whether it already made sense back then.)

When doing a world rebuild, or a long-due update, you’re most likely going to find long chains that can be put in parallel, which I honestly find risky, though it doesn’t have to be. When installing a new package on a system that is already well installed and has been worked on for even a few weeks, you’ll be lucky (or unlucky) to find two or three chains at all. If you’re doing daily updates, finding parallel chains is also unlikely, as the big updates (gnome, kde, …) are usually interdependent.

Although it’s a nice feature to have, I don’t think it’s going to help a lot in the long run; I think parallel make is the thing that is going to make a difference in the medium term.

Okay, so what are the problems with using -j8 for ebuilds then?

  • we express the preference in terms of (GNU) make parameters, but not all packages in Portage are built with make, let alone GNU’s;
  • ebuilds that use a non-make-compatible build system will try to parse the MAKEOPTS variable to find out the number of parallel jobs to execute; this does not always work right because there can be other options, like -s (which I use), that might make parsing difficult;
  • even the -s option can be useful to some non-make-compatible build systems, but having to translate every option is tremendously pointless and boring;
  • some people use a high number of jobs because they have multiple boxes building as a cluster, using distcc or icecream; these won’t help with linking though, or with non-compile jobs. Forcing non-compile tasks to a single job is going to displease people using SMP systems, while using a job count based on network hosts for non-compile tasks is going to slow down people with single-CPU, multi-host setups;
  • some tasks are being serialised by ebuilds when they could be run in parallel.
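To give an idea of how fiddly the MAKEOPTS parsing mentioned in the list above gets, here is a minimal sketch of a helper that pulls the job count out of a make-style option string. The name makeopts_jobs is made up for this example, and the handling is an assumption on my part: it understands -jN, -j N, --jobs=N and --jobs N, skips everything else, and falls back to 1 (a bare -j, which make treats as unlimited, is simply ignored).

```shell
# Hypothetical helper: print the job count found in a MAKEOPTS-style string.
# The body runs in a subshell so that "set -f" and the variables stay local.
makeopts_jobs() (
    set -f                  # no glob expansion while word-splitting $1
    jobs=1
    want_value=0
    for opt in $1; do
        if [ "$want_value" -eq 1 ]; then
            # Previous word was "-j" or "--jobs": a number may follow.
            want_value=0
            case $opt in
                *[!0-9]*) ;;                    # not a number, treat as an option
                *) jobs=$opt; continue ;;
            esac
        fi
        case $opt in
            -j|--jobs)     want_value=1 ;;
            -j[0-9]*)      jobs=${opt#-j} ;;
            --jobs=[0-9]*) jobs=${opt#--jobs=} ;;
        esac
    done
    echo "$jobs"
)
```

Calling makeopts_jobs "-s -j8" prints 8. Even this small sketch already has to know which options take an argument, which is exactly the kind of knowledge that should not be duplicated in every ebuild and eclass.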

And this is not yet taking into consideration buildsystem-specific problems!

What should we be doing then? Well, I think the first point to solve is the way we express the preferences. Instead of expressing them in terms of raw parameters to make, we should express them in terms of number of jobs, and of features. For instance, a future version of portage might have a make.conf like this:

BUILD_LOCAL_JOBS="8"
BUILD_NETWORK_JOBS="12"

BUILD_OPTIONS="ccache icecream silent"

And then an ebuild would call, rather than simply emake, two new scripts: elmake and enmake (local and network), which would expand to the right number of jobs, for make-compatible buildsystems, that is. For other build systems eclasses could deal with that by getting the number of jobs and the features from there.

More options might be translated this way without having to parse the make syntax in each ebuild, or in each eclass. The ebuilds could also declare a global or per-phase limit to jobs, or a RESTRICT="parallel-make", that would make Portage use a single job.
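Purely as a sketch of the idea: none of this exists in Portage. The variables and the elmake and enmake names are the ones proposed above, while build_make_args is a helper invented for this example. The wrappers could look roughly like this for make-compatible build systems. Only the silent feature is translated into a make flag here; ccache and icecream would have to be handled elsewhere (compiler selection, PATH), which this sketch does not attempt.

```shell
# Values as they might appear in the proposed make.conf (hypothetical).
BUILD_LOCAL_JOBS="8"
BUILD_NETWORK_JOBS="12"
BUILD_OPTIONS="ccache icecream silent"

# Hypothetical helper: print the make arguments for a job class,
# either "local" or "network".
build_make_args() {
    case $1 in
        local)   _jobs=${BUILD_LOCAL_JOBS:-1} ;;
        network) _jobs=${BUILD_NETWORK_JOBS:-1} ;;
        *)       echo "unknown job class: $1" >&2; return 1 ;;
    esac
    # An ebuild declaring RESTRICT="parallel-make" gets a single job.
    case " ${RESTRICT:-} " in
        *" parallel-make "*) _jobs=1 ;;
    esac
    _args="-j$_jobs"
    # Translate the one feature that maps directly onto a make flag.
    case " ${BUILD_OPTIONS:-} " in
        *" silent "*) _args="$_args -s" ;;
    esac
    echo "$_args"
}

# The two proposed wrappers; extra arguments are passed through to make.
elmake() { make $(build_make_args local) "$@"; }
enmake() { make $(build_make_args network) "$@"; }
```

An eclass for a non-make build system could call the same helper logic to get the job count and features, instead of parsing make syntax itself.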

The last point is probably the most complex one. Robin already dealt with a similar issue in the .bdf to .pcf translation of fonts, and solved it by having a new package provide a Makefile with the translation rules; the conversion could then be parallelised by make, instead of being serialised by the ebuild. I think we should do something like this in quite a few cases; the first one I can think of is the elisp compile for emacs extensions, and I don’t know whether Python serialises or executes in parallel the bytecode generation when providing multiple files to py_compile. And this is just looking at two eclasses I know of that do something similar. But Portage’s stripping and compressing of files should probably be parallelised too, where there are enough resources to do so locally.

I guess I have found yet another task I’ll spend my time on, especially once I’m back from the hospital.

  Tue, 26 Aug 2008 23:14:04 +0200

Seth Woodworth, a nocturnal intern who works and sleeps on the other half of my desk, recently installed Google Analytics on the OLPC wiki. He has found some fascinating results.

The most amazing figure is the amount of traffic that comes from Uruguay, a country where we have now delivered 100,000 XO laptops (a recent milestone which hit national media in UY in a big way). Approximately 60% of the wiki traffic originates from Uruguay, 10,000+ visits per day, almost all of which is going to the Activities page. A contact at LATU has confirmed that these visits are actually children downloading activities and spreading the word, not a script or something.

This is huge. Loads of Uruguayan children are discovering the huge range of available activities and experimenting with them, every single day. If you’ve written an activity and listed it on that page, it is almost guaranteed that a lot of children have tried it.

The official language of Uruguay is Spanish (and the same is true for many of our other deployments), so ensuring your Spanish translations are up-to-date is of huge value here.

See the end of the latest community news for more of our findings.

  Tue, 26 Aug 2008 06:10:52 +0200
i'll be effectively offline from now until i get set up at the new apartment in Moose Jaw sometime in the next couple weeks (i imagine it'll take a while for them to hook me up, being the back to school rush).

I’ve updated the chapter on graphical environments a bit to reflect how applications, window managers, X server and widget toolkits work together. Hopefully it isn’t a big lie that I wrote there ;-)

I’ll probably be doing a bit of cleanup in the coming days before I start on more chapters…

  Thu, 21 Aug 2008 14:17:33 +0200

We suck. Gentoo sucks. Really.
We (300 devs) aren’t able to implement features that 10 developers implemented in a year.
Gentoo Council can’t make decisions.
Gentoo/Freebsd and Gentoo/Interix (Gentoo/Alt in general) aren’t projects from which users will benefit.
Portage is a mess of spaghetti procedural code with no underlying design (and be careful, --jobs will surely break your system).
Gentoo developers are trying to hide other devs’ work.
And every gentoo dev is a liar. Because gentoo itself and its features don’t exist. They’re only in your mind. Or on an island in the Pacific. With Jim Morrison and the Area 51 aliens. And me probably.

I’m depressed. I’m depressed by seeing things like this.
This isn’t the way to stop the “war” and the fud. IMHO.

UPDATE: Read here for more information

  Thu, 21 Aug 2008 13:53:01 +0200

Wow. I have to apologize. And I probably have to learn a lesson about sarcasm tags (as Diego said).
I’m sorry. But I wrote the last post trying to be as *ironic* as possible. I love gentoo, I really love it.

And _actually_ I didn’t want to cause what I’ve caused.
Now, back to more interesting things! There’s a lot of work to do on g/fbsd!
Stay tuned for updates

Cheers

dav

  Wed, 20 Aug 2008 15:21:00 +0200

I’m afraid Davide will have to learn a lesson about sarcasm tags

As for what concerns me: I’d like to remind everyone that I made my comments policy known three months ago, and that comments are an additional feature, not a right, on a private blog.

But I suppose someone is only badmouthing me because he can’t stop thinking about me؟; I suppose I should be charmed by this, but sorry, I’m already taken.

Some followup notes from our alternate Linux desktop on XO work:

  • The README included with the script provides some more specific instructions.
  • Your SD card needs to be a minimum of 4GB capacity, because these distributions install more than 2GB by default. We could probably find some smaller distributions, although we require some relatively recent components for some part of the system (e.g. X), so finding a distribution that is both modern and small might be a challenge.
  • There are ongoing efforts to create a significantly smaller distro to run on the internal flash, but that is an early project at the moment.
  • We heavily recommend using an SD card that is advertised as fast, extreme, ultimate, or whatever! I was previously sceptical of cards advertised in this way - are they actually significantly faster? The answer is YES - we have a ‘non-extreme’ Kingston card and it is so much slower than the others. Installation took 4 times as long. I’m not sure of a good way to benchmark these; the standard “hdparm -t” test showed similar figures on both slow and fast cards.
  • I’ve spent most of my time with Fedora. On the whole, it works very well for a machine that is understandably lower spec than most. I do recommend slimming down the services though - turn off all the NFS daemons, cups, bluetooth daemon, pcscd, kerneloops, … In Fedora this is done with the ‘service’ command for a one-time stop, and ‘chkconfig’ to prevent things starting at boot.
  • When you run a lot of big applications, the machine does really slow down. Sometimes the mouse cursor freezes for a while. Perhaps we should experiment with swap space, currently I have none.
  • Sound does not work. In Fedora, it doesn’t work at all, probably a PulseAudio bug. I’m hoping that applying the system updates will fix this, but the Fedora infrastructure is down at the moment. In Ubuntu, sound works at the login screen (you hear the welcome sound) but not after you login.
  • On the positive side: wireless works, suspend works (most of the time), mouse and keyboard are good, gnome-power-manager dims the screen when idle, etc. It really acts as a normal fully usable distro with a few quirks identified above.
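For the service slimming mentioned in the list above, something along these lines should do, run as root. The service names are assumptions and may differ on your install (check with chkconfig --list first), and slim_services is just a name made up for this sketch. Pass echo as the first argument for a dry run that only prints the commands.

```shell
# Assumed service names; adjust to what "chkconfig --list" shows.
SERVICES="nfs nfslock cups bluetooth pcscd kerneloops"

# Hypothetical helper: stop each service now and disable it at boot.
# $1 is an optional command prefix; use "echo" for a dry run.
slim_services() {
    for svc in $SERVICES; do
        $1 service "$svc" stop      # one-time stop
        $1 chkconfig "$svc" off     # do not start at boot
    done
}
```

Start with slim_services echo, and run plain slim_services once the list looks right.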
  Tue, 19 Aug 2008 04:28:10 +0200

Bobby Powers and I have spent a few hours smoothing out the process of getting fully-featured Linux desktops to boot on the XO laptop. On the whole, OLPC developers have been pretty good at getting code upstream, so only a few fixups are needed to get things operational on the XO.

The only caveat is that you need a 4GB (or larger) SD card. The XO itself only has 1GB of storage, which is not big enough for the standard installs of the distributions that we’ve been playing with.

We’ve got Fedora 9 and Ubuntu Intrepid Alpha 3 working. Here is the process, using Fedora 9 as an example:

First, download the regular CD/DVD installation media for your distribution. For Fedora 9, you go to http://fedoraproject.org/en/get-fedora. Burn that to CD/DVD.

Next, find a regular PC that is capable of reading SD cards. We’re using a standard desktop plus a USB card reader. Boot that PC from the CD/DVD installation media that you burned earlier. Proceed through the installation as usual, but when asked where you would like to install the operating system, select the SD card.

Choose to setup the disk partitions manually. Do not do any fancy partitioning, just choose one partition that fills up the card. You don’t need to add any swap space.

Select the ext3 filesystem and choose to not install a bootloader.

Wait for installation to complete, and shut down the system.

Next, you need a PC running Linux. This can be the same PC as the one you used to install onto SD, assuming that one has Linux installed on its hard disk too. It doesn’t really matter which distribution, as long as you have git and regular development tools installed, and the SD card mounted at a known location.

On this PC, run the following:

# git clone git://dev.laptop.org/users/dsd/XO-alt-distro

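Next, become root and run the script from the directory you cloned into (~dsd is my own home directory in this example, and /media/disk is where the SD card is mounted; substitute your own paths).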

# sudo su -
# cd ~dsd/XO-alt-distro
# ./sd_fixup fedora-9 /media/disk

It will now download and compile the OLPC kernel, and perform a few other necessary tweaks to your SD card.

When the script has completed, unmount the SD card and plug it into an XO. Boot the XO, and say hello to your fully-functional Linux desktop.

In future, we plan to publish filesystem images of SD-installed distributions, so that you can avoid much of the above. To simplify further, we could also write a tool which runs on the XO which downloads said filesystem image and flashes onto SD.

Update 19/08/2008: Posted some additional notes

  Tue, 19 Aug 2008 00:10:09 +0200

Hi all,

Just a quick update to let you know that I've just added x11-drivers/xf86-video-i810-2.4.1 to Portage.

Overall, I'm not very happy with this release. It's definitely not as smooth as 2.3.2 which turned out to be very solid. So please test it out with a recent xorg-server (read, 1.4.2) and let me know in bugzilla if anything breaks.

Again, be prepared to continue the bug hunt in FreeDesktop's bugzilla as my Intel Powers (tm) are very limited. It just so happens that half a dozen Intel developers also roam there, so all in all, it's a good place to file bugs. Just add "remi at gentoo dot org" as a CC on those bugs so I can keep track of the issues.

Thanks

Update :

Josh asked me a couple questions and the answers might interest all those of you who have Intel chips but don't follow X development closely. So here goes :

My Thinkpad R61i has an Intel X3100 chip, so I'll be sure to test out the new drivers. I am sorta scared by your blog post, but, well, as long as I quickpkg the relevant packages before upgrading, I should be okay, right? ;)

Yes, let's get this straight, the driver won't eat your laptop and won't kill kittens either. A lot of the core features do work, but there are some instabilities and some quirky behaviors, and those are the issues I'm worried about.

I normally run the hardmasked/~arch X and Intel packages, with all their quirks, on that machine, so 2.4 shouldn't be all that different.

Yep, 2.4.1-r1 is in ~arch, so ~ users will probably already have it installed by now.

I wasn't aware of the git overlay. Are there any fixes in upstream's git that just didn't make it into 2.4.1? If so, I may have to make the move to the overlay.

Donnie's X11 overlay has live ebuilds for the "master" branch of all X packages. Beware that intel/master is very different from intel/xf86-video-intel-2.4-branch. I'm currently backporting patches from the latter into Portage as they will end up in the next release anyway. So there's no need to create a live git ebuild for this particular branch.

As for the current git master, as you may have read on Phoronix and Planet FreeDesktop, the driver is going through major changes. Definitely not for the faint of heart.

Also, when's the namechange to xf86-video-intel?

I plan to tackle this tiny issue when Donnie decides to put Xorg 7.4 (aka xorg-server-1.5) into portage.

Two really-desired features of Portage, that are important for, respectively, desktop and embedded use cases, are multilib and cross-compilation. Both of these, to be properly implemented, require Portage to discern between same-ABI and any-ABI dependencies.

This concept has been called Linked-in and Executed dependencies in the past, but I don’t like that name at all, as it makes sense only to those who already know what the two concepts mean. Actually, I don’t like the same-ABI/any-ABI naming either, because it confuses the term ABI as the calling convention of an architecture with the term ABI as the compiled interface of a software library. If anybody can come up with a simple term for the calling-convention type of ABI, it would be quite nice in my opinion.

Another good reason to get rid of the terms Linked-in and Executed dependencies is that, once the concept is abstracted well enough, one can easily see the same code and mechanisms being used to describe the dependencies of extension modules like Python’s and Ruby’s, which depend on the version of the interpreter they are built and run against.

With good enough support, this would make it possible to handle dependencies for multiple Python and Ruby versions at once without needing strange and silly hacks like the ones present in the ruby eclasses. It would also have solved the problem of PHP extension versions that caused them to be split into dev-php4 and dev-php5.

Why is this distinction needed for both multilib and crosscompile? Well, let’s start with the multilib cas