Planet Gentoo - http://planet.gentoo.org/
Mon, 08 Sep 2008 03:15:39 +0200
Setting up the storage on my new machine, I just ran into something really interesting: what seems to be deliberately usable and useful, but completely undocumented, functionality in the MD RAID layer. It's possible to create RAID devices with the initial array having 'missing' slots, and then add the devices for those missing slots later. RAID1 lets you have one or more, RAID5 only one, RAID6 one or two, RAID10 up to half of the total. That functionality is documented in both the Documentation/md.txt of the kernel and the manpage for mdadm.
What isn't documented is how, when you later add devices, to get them to take up the 'missing' slots rather than remain as spares. Nothing in md(7), mdadm(8), or Documentation/md.txt. Nothing I tried with mdadm could do it either, leaving only the sysfs interface for the RAID device. Documentation/md.txt does describe the sysfs interface in detail, but seems to have some omissions and outdated material - the code has moved on, but the documentation hasn't caught up yet.
So, below the jump, I present my small HOWTO on creating a RAID10 with missing devices and how to later add them properly.
MD with missing devices HOWTO
We're going to create /dev/md10 as a RAID10, starting with two missing devices. In the example here, I use 4 loopback devices of 512MiB each: /dev/loop[1-4], but you should just substitute your real devices.
# mdadm --create /dev/md10 --level 10 -n 4 /dev/loop1 missing /dev/loop3 missing -x 0
mdadm: array /dev/md10 started.
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md10 : active raid10 loop3[2] loop1[0]
1048448 blocks 64K chunks 2 near-copies [4/2] [U_U_]
# mdadm --manage --add /dev/md10 /dev/loop2 /dev/loop4
mdadm: added /dev/loop2
mdadm: added /dev/loop4
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md10 : active raid10 loop4[4](S) loop2[5](S) loop3[2] loop1[0]
1048448 blocks 64K chunks 2 near-copies [4/2] [U_U_]
Now notice that the two new devices have been added as spares [denoted by the "(S)"], and that the array remains degraded [denoted by the underscores in the "[U_U_]"]. Now it's time to break out the sysfs interface.
# cd /sys/block/md10/md/
# grep . dev-loop*/{slot,state}
dev-loop1/slot:0
dev-loop2/slot:none
dev-loop3/slot:2
dev-loop4/slot:none
dev-loop1/state:in_sync
dev-loop2/state:spare
dev-loop3/state:in_sync
dev-loop4/state:spare
Now a short foray into explaining how MD-raid sees component devices. For an array with N devices total, there are slots numbered from 0 to N-1. If all the devices are present, there are no empty slots. The presence or absence of a device in a slot is noted by the display from /proc/mdstat: [U_U_]. That shows we have devices in slots 0 and 2, and nothing in slots 1 and 3. The mdstat output does include slot numbers after each device in the listing line: md10 : active raid10 loop4[4](S) loop2[5](S) loop3[2] loop1[0]. loop4 and loop2 are in slots 4 and 5, both spare. loop3 and loop1 are in slots 2 and 0. The slot numbers that are greater than the highest valid slot number (N-1) seem to be extraneous; I'm not sure if they are just an mdadm abstraction, or in the kernel internals only.
Now we want to fix up the array. We want to promote both spares to the missing slots. This is the first item that Documentation/md.txt gets really wrong. The description for the slot sysfs node contains: "This can only be set while assembling an array." This is actually wrong: we CAN write to it and fix our array.
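Before writing to the slot nodes, it helps to know which slot numbers are actually empty. The following is only a minimal sketch: missing_slots is a made-up helper name, not part of mdadm or the kernel, and it assumes the status string is passed in exactly as /proc/mdstat prints it, brackets included.

```shell
#!/bin/sh
# missing_slots: hypothetical helper, not part of mdadm.
# Takes an mdstat status string such as "[U_U_]" and prints the
# zero-based numbers of the empty slots, separated by spaces.
# The body runs in a subshell so its variables stay local.
missing_slots() (
    status=$1
    # Strip the surrounding brackets, leaving e.g. "U_U_".
    status=${status#\[}
    status=${status%\]}
    slot=0
    empty=
    while [ -n "$status" ]; do
        rest=${status#?}          # everything after the first character
        first=${status%"$rest"}   # the first character itself
        if [ "$first" = "_" ]; then
            empty="$empty${empty:+ }$slot"
        fi
        status=$rest
        slot=$((slot + 1))
    done
    printf '%s\n' "$empty"
)
```

For the degraded array in this HOWTO, missing_slots '[U_U_]' lists slots 1 and 3, which are the two values written to the slot nodes next.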
# echo 1 >dev-loop2/slot
# echo 3 >dev-loop4/slot
# grep . dev-loop*/slot
dev-loop1/slot:0
dev-loop2/slot:1
dev-loop3/slot:2
dev-loop4/slot:3
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md10 : active raid10 loop4[4] loop2[5] loop3[2] loop1[0]
1048448 blocks 64K chunks 2 near-copies [4/2] [U_U_]
The slot numbers have changed in the mdstat output and the sysfs, but they no longer match at all. The spare marker "(S)" has also vanished. Now we can follow the sysfs documentation, and force a rebuild using the sync_action node. In theory, the mdadm daemon, if running, should have detected that the array was degraded and had valid spares, but it didn't, and I don't know why. Perhaps another bug to trace down later.
# echo repair >sync_action
(wait a moment)
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md10 : active raid10 loop4[4] loop2[5] loop3[2] loop1[0]
1048448 blocks 64K chunks 2 near-copies [4/2] [U_U_]
[=============>.......] recovery = 65.6% (344064/524224) finish=0.1min speed=22937K/sec
The slot numbers still aren't what we set them to, but the array is busy rebuilding still.
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md10 : active raid10 loop4[3] loop2[1] loop3[2] loop1[0]
1048448 blocks 64K chunks 2 near-copies [4/4] [UUUU]
Now that the rebuild is complete, the slot numbers have flipped to their correct values.
Bonus: regular maintenance ideas
While we can regularly check individual disks with the daemon part of smartmontools, issuing short and long disk tests, there is also a way to check entire arrays for consistency. The only way of doing it with mdadm is to force a rebuild, but that isn't really a nice proposition if it picks a disk that was about to fail as one of the 'good' disks. sysfs to the rescue again: there is a non-destructive way to test an array, and only promote to repair mode if there is an issue.
# echo check >sync_action
(wait a moment)
# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md10 : active raid10 loop4[3] loop2[1] loop3[2] loop1[0]
1048448 blocks 64K chunks 2 near-copies [4/4] [UUUU]
[============>........] check = 62.8% (660224/1048448) finish=0.0min speed=110037K/sec
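The check shown above could be wrapped for periodic use. This is a rough sketch under stated assumptions: start_md_checks and its optional directory argument are invented for illustration (the argument exists only so the function can be tried against a scratch directory), and the only kernel behaviour relied on is that sync_action reads "idle" when nothing is running and accepts "check".

```shell
#!/bin/sh
# start_md_checks: hypothetical helper for a cron job.
# The optional argument defaults to /sys/block.
start_md_checks() (
    sys_block=${1:-/sys/block}
    for action in "$sys_block"/md*/md/sync_action; do
        # Skips unmatched globs and nodes we may not write to.
        [ -w "$action" ] || continue
        # Only start a check when the array is idle, so a running
        # recovery or resync is left alone.
        if [ "$(cat "$action")" = "idle" ]; then
            echo check > "$action"
        fi
    done
)
```

Calling start_md_checks with no argument from a root-owned weekly cron script would be one way to use it; how often to run it is a matter of taste.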
Either make a cronjob to do it, or put the functionality in mdadm. You can safely issue the check command to multiple md devices at once; the kernel will ensure that it doesn't simultaneously check arrays that share the same disks.
Sun, 07 Sep 2008 17:55:00 +0200
So there have been quite a few people asking what is happening with KDE 4.1 on Gentoo. There are still no ebuilds in the tree for KDE 4.1 and to be totally honest I have not been actively developing the KDE ebuilds until recently (took quite a long break). It is however something I really feel that we should have and so I have been working with a few other developers and interested users on the new ebuilds in an overlay. I think these ebuilds are almost ready and I am very eager to get them into the tree.
So what have we been doing? We have been working on a set of ebuilds that can be used with portage 2.2. There were a few setbacks when it became clear that EAPI 2 was not likely to be allowed in the tree in the very near term, but good noises are being made on the mailing lists. I think one of the big changes I have been working on is bringing KDE back into the normal file system hierarchy. This means that you will not be able to slot multiple minor versions of KDE such as 4.1 and 4.2 together.
The loss of slotting also means that ebuild maintenance will be significantly easier; externally released packages will automatically relink to the latest kdelibs, for example. It also means that users will not build up multiple copies of minor KDE versions such as 4.0, 4.1, 4.2. I think for the majority of our userbase (myself included) keeping multiple minor versions around is undesirable, with few benefits for normal use of KDE. So having one slot for KDE 4 and upgrading in the normal fashion seems to be very beneficial.
This has not been universally accepted as a good thing. I personally think our main tree KDE should always be installed in the normal FHS hierarchy as upstream seems to intend it.
I see little benefit for most users, and many developers not closely involved in KDE development, to having multiple minor versions installed. Also having external KDE packages such as k3b and amarok installed in /usr but linking to libraries in a KDE prefix has never been optimal. My vision is for a default in-tree KDE that installs into the normal FHS tree as Gnome, Qt 4, XOrg etc do. Developers, power users and those that like to tweak would still be free to install slotted versions of KDE in KDE prefixes alongside the FHS KDE. They could also remove the FHS KDE and just use slotted versions too. We discussed during KDE 3.5 bumps how the current slotting was not ideal.
I am certainly open to opinions. It would be good to get wider feedback from the community on which direction they would like to go in and why. This will not affect KDE 3.5 slotting with KDE 4 - they will always be slotted. It will make maintenance of external KDE packages simpler as everything will be in the same tree. Overlays with ebuilds in alternate slots and prefixes are of course still easily possible and people could continue using KDE in this way.
Currently this work is taking place in an overlay. I am quite honestly exhausted discussing why this is or is not a good idea. I hope I have made my position clear and why I think it is a good thing for our user base. I really want to get KDE 4.1 into the tree as soon as possible. I do feel that these changes should be pushed into the tree, defaulting to an FHS compliant install and using a KDE prefix if requested. What are your thoughts (other than get KDE 4.1 in tree now - we are working on that)?
Sat, 06 Sep 2008 00:27:07 +0200
In response to Bugday and my willingness to help, I want to help users that contribute. I won’t be around for Bugday and I am normally not available on the weekends. So, here is what I am willing to do. On any maintainer-needed bug, if you fix the issue, feel free to CC me ( Do not:
I am offering to do this because a) I like when users get involved and help make Gentoo better, b) the whole maintainership concept slightly bothers me, c) I can’t look at all 250+ maintainer-needed bugs by myself to see if there is a fix for each. Here is a search for bugzilla that you could use to find maintainer-needed bugs (in assignee or CC). At the time of this writing there are 270. I (plus others I’m sure) have worked that queue before; the lowest I have seen it recently is 250. Consider this an open offer and we will see what happens. Thanks for helping!
Fri, 05 Sep 2008 08:02:09 +0200
What: Gentoo contributors get together to help each other fix bugs
Where: irc.freenode.net, #gentoo-bugs
When: Saturday, September 6, in a timezone near you
What do you need to bring?
What's a bug? Gentoo's way of tracking change requests. A change request can be anything from "I've found a typo in foo" to "I've built this really useful program called bar but there's no ebuild for it." Bugs have various levels of helpfulness, from identifying the existence of a problem to localizing the problem to providing the patch to fix it. There are bugs in documentation such as man pages as well as ebuilds and the source code that Gentoo distributes. These bugs are problem reports. Bugs for things Gentoo doesn't do yet but you think should be done are feature requests. Bugday is more about fixing problems than adding features, but you won't be turned away if you want help with a new feature.
Want to know more about Bugday? It's held on the first Saturday of every month. It's an opportunity for everyone to contribute to making Gentoo better, and eventually you might even become a Gentoo developer. See the Bugday project page for more details.
Bugday is about community spirit. Gentoo is a community: there is no "me" and "them", there is only "we," so instead of lobbying for "them" to fix your particular bug, work together to fix it! Bugday is an opportunity to get help to help yourself. If you've been wanting to get involved but weren't sure how, Bugday is a great way for you to see what goes on in making a distribution and get involved in Gentoo.
Roy Bamford contributed the draft for this announcement.
Thu, 04 Sep 2008 14:03:15 +0200
Image Stacking
As Henner writes, we've managed to add a new feature that some users were requesting for quite a long time -- now there's that thingy for marking several images as "belonging together". Imagine you've taken a group photo of a bunch of your friends at a pub. You weren't exactly sober, so perhaps some of these are shaky, out-of-focus or otherwise imperfect. You weren't drunk as a lord, though, so there's one of them which is pretty good.
Now you don't want to erase all the other pictures, but you don't want to see them by default, either. So what you do now is to make an Image Stack of all of them and select which are bad and which one is the best one. This feature might be useful for various variants of images decoded from one RAW file or for images stitched to one big panorama as well. It's far from complete yet (the GUI part still sucks and you can't select your order of preference, either), but it's a good start and it got done pretty fast.
Let's look at an example stack with a picture of Gunvald (Jesper's dog) on the beach. The default view in the thumbnail view shows only the first picture.
Please read Henner's post as well and let us know what you think about this feature.
Digikam
Shortly after my last post, I got a bunch of mails from our users. It's great to hear that people are interested in our work, so many thanks to all the readers! Please keep the comments going. Anyway, the most frequent question was interoperability with Digikam.
First, I'd like to mention that I'm on a KPhotoAlbum Development Sprint. None of us over here is hacking on Digikam. It surely is a great piece of software, but we don't use it ourselves. That's why I have to disappoint one reader of this blog -- nope, I can't blog about new features in Digikam, sorry. But be sure to check out the KDE Planet from time to time; their development team is blogging every now and then.
Now the interoperability thingy -- we have a nice file describing what the problems are. Looking at the document, I'm afraid that both applications are using quite different approaches to tagging -- in Digikam, tags are (I haven't checked the Digikam sources, nor am I using it, so this might be actually wrong) apparently in a flat namespace, while in KPhotoAlbum, we use a hierarchy (a tree) of tags for features like "pictures of Anna should be recognized as pictures containing anyone of my friends as well". This is certainly not a showstopper, but quite an important issue nonetheless.
However, please don't get disappointed too early. The question is -- why should we somehow prefer interoperability with Digikam over, say, XnView? There are a lot of users out there who are using different applications on different operating systems, and we'd like to do our best to be interoperable with all of them. At the same time, we certainly aren't willing to be held back by lack of features in other applications. Therefore, we won't write any feature like "copy my database to the Digikam format", nor do we expect Digikam to be able to import our stuff.
(There's nothing holding you back from writing your own converter, though -- our format is pretty simple to deal with.) What we will do, however, is add a feature for metadata import and export. We'll use standardized stuff like Exif, IPTC and XMP, so that it should be easy for KPhotoAlbum users to provide their friends with images with embedded tags (and the other way round, of course).
KDE4, SQL,...
Another question was the state of the KDE4 port and the SQL support. I'll leave that for another blogpost, as this one got pretty long already :).
Thu, 04 Sep 2008 08:05:35 +0200
I blogged about the USE flag transition from USE=tetex to something more sensible before. This is finally done, and so the next step is to get rid of virtual/tetex in the whole tree and depend on the needed virtual instead, like virtual/tex-base, virtual/latex-base or virtual/texi2dvi. To avoid problems there is a recommendation for all users and developers: check if TeXLive is available for you, unmerge teTeX and do a world update to get something more up to date. TeXLive 2008 will not provide virtual/tetex anymore and was released yesterday; after some tests it will hit the tree. Also check out the upgrade guide and this forums thread.
Wed, 03 Sep 2008 23:25:10 +0200
Edit 2008/09/05: A private source that I inquired of indicates that the AD2000B part was only a special run of the AD1989B part. There shouldn't be any functional differences. On the side of a spec sheet, the AD1989B specs should be available "shortly" from Analog Devices.
Original posting: So, with more details to follow, I picked up hardware for a new workstation to replace my G5. The only part of the hardware that isn't working yet is the digital audio (SPDIF/Toslink) output. My motherboard is an Asus P5Q-Premium, and the specifications claim to have "ADI® AD2000B 8-Channel High Definition Audio CODEC" as the audio chip. This chip is apparently the successor to the AD1988B chip.
The analog audio part works fine; it's just that I use optical to overcome an interference issue on the run between my computers and my actual working area of my desk (with a small digital decoder and stereo speakers). Digging around in the ALSA drivers, it just seems I need to find a different set of controls to toggle the digital lines to be outputs or enabled - and that this data would be in the public datasheet, just like previous versions of the chip.
I submitted a technical request to Asus a few days ago, with no response yet. I also contacted Analog Devices directly. Their customer support referred me to their application engineers, whom I phoned, and they then proceeded to deny the existence of the chip, and I quote: "It's not in my system, we don't manufacture it." That's really interesting, because I've got it on my motherboard! Either the divisions of Analog Devices aren't talking, or Asus is using chips from a 3rd party that's ripping off Analog Devices' trademark amongst other things.
Here's the text off the chip: AD2000BX 14??793.1 #0816 0.3 SINGAPORE
I tried to take a photo, but it's really annoying and hard to read without dis-assembling my machine, which I'd prefer not to do at this point. However, I did find another photo on the web, of the same area, from a review of the motherboard. The Analog Devices logo is also clearly visible after the 'BX' portion of the text. From the photo I could make out: AD2000BX 1383055.1 #0808 0.2 SINGAPORE
If I had to make a guess about it, the chip is AD2000BX, the second line is the serial number, the third is the year and week of manufacture plus the revision of the chip, and the last line is the manufacture location.
If you're from Asus or Analog Devices, and you're reading this, where's the datasheet for the chip? Is it a real ADI part? I simply want the public datasheet, like the rest of the models have, so that I can fix digital audio output in Linux myself, and contribute it back to the ALSA project.
P.S. The upstream ALSA bug is here. There's no downstream Gentoo bug.
Wed, 03 Sep 2008 19:56:41 +0200
As I wrote earlier I've had some problems with the iwlwifi driver and my Intel 3945 wifi card. One workaround, brought to my attention again by Jeremy Olexa's blog post, is to run G mode only. This can easily be done in OpenWRT Kamikaze with:
In the config file /lib/wifi/broadcom.sh add the following line just after the
Restart network/box. With this workaround speed is back to normal, thanks to the initial blog post by Markus Golser here.
Wed, 03 Sep 2008 07:48:39 +0200
Here’s something I haven’t written about in a long time: bend, my custom written CLI PHP5 scripts to rip and encode TV shows. I actually rewrote the entire thing over Labor Day weekend. What’s amazing is it took so long to write the original one, but so short a time to completely revamp it. It’s something I’ve been wanting to do for a long time, and I’m glad I finally got to it. The code on the old one was so horrible, and it was such a frustrating experience to patch, debug or add features. The new one is already 20 times better.
The first one was just plagued by scope creep, though. I started off mostly coding it around the way that I thought DVDs *should* work and how they ought to be authored, only to be constantly slapped in the face by so many exceptions that I’d have to go back and hack it to work around the new-found realities. One example is that either lsdvd or libdvdread is buggy in how it outputs chapter information. Actually, my whole experience with chapters has been that if there are any oddities, then the players will just freak out. You wouldn’t believe the cases I ran into.
Anyway, here’s a small example. On one DVD, lsdvd will report in its original output that one track has 30 chapters on it. But when you go to display the chapters, it will only say that there are two. Most of the time, what happens is that it will choke anytime there is a zero-length chapter between others. In this case, lsdvd just chokes and stops counting them. MPlayer (at least, the ancient version I’m using) will do a couple of things depending on its mood: sometimes freeze, sometimes skip over it, sometimes act like it’s not even there. It’s very odd. I’ve found a lot of interesting little bugs in the dvd libraries and tools. I’d love to poke at the source and fix them up … when I have time.
The code is online in my svn repo, and the new one is called ‘drip’ for dvd ripper. Original, I know, but eventually it will replace bend completely once I add in all the features the old one had plus all the new stuff I want. I would throw in a link to trac, which has prettier display output for viewing SVN files, but my installation is broken (again) and I have no idea why, and it’s always a royal pain trying to figure out what went wrong, so I’ll just fix it later. I love trac, but it’s not easy debugging the setup.
Oh yah, also I’ve been working on my mythvideo setup, tweaking it even more. One thing that really dawned on me, which I’ll write about in more detail once I actually have a script ready, is that you can use it to execute shell scripts using the File Types admin menu. Just tell it to execute .sh files in your folders with /bin/bash and away you can go. Another thing I learned is that MythVideo will only pass two variables to any external scripts: the default player (%d) and the video file (%s), or more accurately, the file you’ve selected to run. So if you wanted to see what you’re executing, you would add this to the file type for .sh files: /bin/bash %s %s
Then, say you had test.sh, this would be the contents:
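The original listing did not survive in this copy of the post, so what follows is not the author's script. It is only a hypothetical stand-in showing the idea: with the file type command set to /bin/bash %s %s, the script is run by bash and also receives its own path as its first argument. The function name show_selection is made up for illustration.

```shell
#!/bin/bash
# Hypothetical test.sh, not the original listing.
# With "/bin/bash %s %s" as the file type command, MythVideo starts this
# script and passes the selected file (the script itself) as $1.
show_selection() {
    printf 'MythVideo selected: %s\n' "$1"
}

# Report the selection only when an argument was actually passed.
if [ $# -gt 0 ]; then
    show_selection "$1"
fi
```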
I’m getting ahead of myself, though. I’ll write more about that when I’ve got something to show. I’m actually working on a shell script similar to mplayer-resume to resume playback of a playlist you’re in. It’s a bit trickier than I thought it would be (or rather, not nearly as simple as I had hoped), so I’m still scoping it out in my head.
Speaking of mplayer-resume, I fixed a bug I kept running into with it for a while now. The script will now catch the exit code of mplayer, and if it’s not successful (zero), then it won’t overwrite or delete the old position. I used to hit it all the time because I used to run mplayer -hardframedrop when playing my videos, which would crash the playback about 10% of the time and of course kill the file that had the playback position. I need to repackage it and push it live, but there are a few more small fixes I want to make to it first … I might finish the playlist resume script first and add it to there. Plus I want to get trac working, because that’s where its homepage is.
But, I moved my mini-itx to the living room and hooked it up to my HDTV. It was sitting in my bedroom just collecting dust, and I figured I might as well move it to see if it gets any more usage. Actually, I remember now, I moved it because the LED lights were really bright in my bedroom at night, and I have to sleep in total darkness to get a good night’s rest. Anyway, it’s worked out well so far. My TV has a VGA port so it’s super simple to plug it in, not to mention I like the fact that it doesn’t use up an HDMI port. I love my TV. :)
Once I have this series playlist resume script finished, I think I’ll be pretty much “done” with having the setup that I’ve wanted so long. Well, aside from the fact that I need about 12 more terabytes of harddrive space. Good times, I tell you what. I’m gonna go watch some Star Trek TNG.
Tue, 02 Sep 2008 19:37:29 +0200
As jaervosz wrote the other day, the iwl3945 driver has some serious issues with it.
I think I have it narrowed down to what conditions cause the problem.
Problem: When downloading large files for non-trivial amounts of time, the download speed drops to <80 K/s. This is unacceptable; the whole pipe is limited to that, by the way. I am not sure what exactly causes this but I have narrowed down the conditions under which it happens. Encryption does not matter: it falters on wep/wpa{1,2} or open networks. However, I found that this condition only exists on mode B APs. This includes “mixed” APs as well. I do not know enough about drivers, but if the AP offers B & G, then it should select G, right? Well, based on the speeds, I would have to say that it is selecting B mode and then hitting this bug again.
Anyway, for the time being… Do not use B APs. Easier said than done, because I’m sure most every sys-admin would select Mixed AP over G only. So, if you are experiencing this issue as well, please comment on the upstream bug, which has been open for 7 months by the way. Annoying. Maybe we can convince them to look at this issue some more? Even Intel employees are CC’d on the bug because they have the issue too.
Workaround: Convert your AP to G only or use G only APs.
Tue, 02 Sep 2008 13:13:00 +0200
One interesting thing about using chroots to check things out is that often enough you stumble across different corner cases when you get to test one particular aspect of packages. For instance, when I was testing linking collisions, I found a lot of included libraries. This time, testing for flags being respected, I found some other corner cases. It might sound funky, but it has been common knowledge for a while that I was going to merge sys-block/unieject in my flagstesting chroot so I could make sure it worked properly. For this, it needed dev-libs/confuse, which I use for configuration file parsing.
All at once, I found this failure: This was funny to see as I did merge confuse lately on my main system on Yamato, and I do have nls enabled there, too. It didn’t fail, so it’s not even a glibc cleanup-related failure. Time to dig into the code, where is As I used this before, I know that It seems to be entirely fine, the only problem would be if The focal point here is to check why So not for any kind of assurance, but just because there’s a technical need, The fix here is very easy, just include
Tue, 02 Sep 2008 12:03:22 +0200
As you might already know from the news item at the KPA website, I'm currently sitting behind a table in the wonderful country of Denmark. I'm not sitting there alone, though, because we're having a small KPhotoAlbum development sprint right now. Jesper, the main author, was so kind as to invite us to his home, and the KDE e.V. paid for the flight tickets, so all we have to pay for is the barrels of beer and the T-shirts. And that's an investment a typical developer makes with pleasure :).
So far we have made just 17 commits, but there's a lot of talk going on here (hi Tuomas :p). For example, the SQL backend is getting closer and closer to being completely ready, the support for multiple cores is shaping up (and you won't see that annoying flickering anymore) and we even have a cute, transparent infobox which looks really sexy.
Right now, I'm working on a reworked EXIF/IPTC/XMP import/export thingy, which would essentially allow you to customize how KPhotoAlbum imports the metadata from the images, and also to store these data back to the files for increased interoperability and stuff like that. So I'll finally be able to share all the captions and tags with a friend of mine who's using XnView. This feature was already present in the 3.1.0 version, but the GUI for it was, well, not really intuitive. I'm converting it to Kross, the KDE scripting environment.
As I brought a GPS device some time ago, one can also expect some efforts for integration with Marble. There's also a whiteboard full of interesting ideas which I won't mention here.
Mon, 01 Sep 2008 13:58:43 +0200
Hi all… I’ve been rather busy and thus haven’t had much time for Gentoo work, but today I managed to get some patches together that allow Firefox 3.0.1 to build and run on MIPS. The ebuild is as yet unkeyworded, as I wish to do some further testing. I have successfully compiled it on little-endian MIPS, and it mostly seems to work okay. It mostly passes the Acid2 test with some slight errors, but unfortunately crashes part way through the Acid3 test; this I’ll investigate when I have the time. It also crashes on my blog, so your mileage may vary. No testing has been done on big-endian MIPS at this stage, as my O2 is down for the short term (need to build a new kernel and get X running), so I’d appreciate feedback from users on this matter.
Mon, 01 Sep 2008 10:52:18 +0200
Just a quick post to let you all know that xf86-video-i810-2.4.2 has hit portage in ~arch. Here's my request: please test the hell out of it!! It seems to be a good improvement over the original 2.4.0 release as it has less flicker, but it needs some serious testing. I think there are still cases where I can see flickering, but the conditions seem to vary over time and I can't reliably reproduce the issue. So please test this new release as much as you can and let me know how it works for you. Bugs should be filed in Bugzilla and not in the comments.
As for the new 2.5 branch, it should hit ~arch with a big fat package.mask over the next couple of days. Oh, and now I'm part of the X11 Team: as I have to work with X stuff for my job, I might as well contribute some efforts there too. Cheers!
Update: Please please file bugs if you still see flickering! Bugs won't solve themselves!
Thanks
Mon, 01 Sep 2008 10:02:10 +0200
The August issue of the Gentoo Monthly Newsletter has been released. In this month's issue: PHP4 removal, GSOC interview, new Gentoo-based distributions, and more!
Mon, 01 Sep 2008 05:52:34 +0200
We only have one level so far, but it is quite engaging and not as easy as it might sound. It was interesting to see some youngsters try it and experiment with different bridge structures during the review session. The game also features some top notch sound effects coordinated by Brian Jordan. We won the gold prize for game development. To learn more and download it, see the Bridge page on the wiki.
Mon, 01 Sep 2008 02:46:00 +0200
As I said on What did Enterprise do?, I had (and have again) a series of chroots I use for testing particular setups; I have, for instance, one running OpenPAM that can tell me whether the software in the tree has the proper dependencies (either Since yesterday, thanks to solar, I have a new addition to my testing rig: a uClibc chroot. I asked solar to get me something I could download and run locally as I had to fix a bug with PAM, which is now fixed.
I have to say that I don’t know much yet about the setup of uClibc itself, which means I haven’t yet understood well how iconv is supported in it. Certainly I now know that once USE-based dependencies are available in the tree I’ll try once again to see if libiconv can be used for something else besides Gentoo/FreeBSD (but the collision with man-pages should be solved before that, if it’s not already).
Even though I know solar does not really wish for me to mess with NLS and uClibc, I find it a pretty important part of Gentoo/FreeBSD work, and I always have. The reason for this is that it’s easier to fix something the right way when you have more than one alternative case. Otherwise you might end up special casing something that should be made generic instead. I also expect Gentoo/FreeBSD 7 to be upcoming, and that will probably mean my return to that too, now that I can get a VirtualBox running at a decent speed.
But I haven’t even started bringing the uClibc chroot up to speed with what I want to do; in particular I want to set up my All in all, I hope that having a uClibc chroot around will allow the packages I maintain to work out of the box on uClibc; it’s going to be a pretty interesting task.

Sun, 31 Aug 2008 18:54:00 +0200

As I have written in my post Flags and flags, I think that one way out of the hardened problem would be to actually respect the CFLAGS and CXXFLAGS the user requests so that they actually apply to the ebuilds. Unfortunately, not all the ebuilds in the tree respect the flags, and finding out which ones do and which ones don’t hasn’t been, up to now, an easy task. There are several ways to check: the most common one is to look at the build output and spot compile lines that lack your custom flags, but this is difficult to automate; another option is to inject a fake definition option ( While I was without Enterprise I spent some time thinking about this and I came to find a possible solution, which I’m going to experiment with on Yamato, starting tonight (which is Friday 29th for what it’s worth).

The trick is that GCC provides a flag that allows you to include an extra file, unknown to the rest of the code. With a properly structured file, you can easily inject some beacon that you can later pick up. And with a proper beacon injected in the build files, it shouldn’t be a problem to check using scanelf or similar tools if the flags were respected.

The trick here is all in the choice of the beacon and in looking it up. The first requirement for a proper beacon is that it does not intrude on or disrupt the code or the compilation; this means it has to have a name that is not common, and thus does not risk colliding with other pieces of code, and that it won’t clash between different translation units.
To solve this, the name can be just very long so that it’s implausible that somebody might have used it for a function or variable name, let’s say we call that beacon two translation units with that in them, linked together, will cause a linker error that will stop the build. This cannot happen or it’ll make our test useless. The solution is to make the symbol a common symbol. In C, common symbols are usually the ones that are declared without an initialisation value, like this: Unfortunately this syntax doesn’t work in C++, as the notion of common symbol hasn’t crossed that language barrier. Which means that we have to go deeper in the stack of languages to find the way to create the common symbol. It’s not difficult, once you decide to use the assembly language: will create a common symbol of size 1 byte. It won’t be perfect as it might increase the size of the .bss section of a program by one byte, and thus screw up perfect non-.bss programs, but we’re interested in the tests rather than the performance, as of this moment. There is still one little problem though: the

But before we go on with this, there is something else to take care of. As I have written in the entry linked at the start of this one, there are packages that mix CFLAGS and CXXFLAGS. As we’re here, it is easy to add some more test beacons that track down for us whether the package has used CFLAGS to build C++ code or CXXFLAGS to build C code. With this in mind, I came to create two files: And here we are, now it’s just time to inject these in the variables and check for the output.

But I’m still not satisfied with this. There are packages that, mistakenly, save their own CFLAGS and pass them on to other programs that link against them; to keep these from falsifying our tests, I’m going to make the injection unique on a package level.
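As a rough illustration of the per-package idea, here is a minimal Python sketch that generates the text of a beacon file. Everything in it is an assumption made for illustration: the function names, the symbol prefix, and the choice of wrapping the `.comm` directive in a file-scope `__asm__` statement so that the same file could be pulled into both C and C++ units (for instance through GCC’s `-include` option). It is not the actual file used for the tests described here.

```python
import re


def beacon_symbol(kind: str, package: str) -> str:
    """Build a long, unlikely-to-collide symbol name for one package.

    `kind` is a hypothetical tag such as "cflags" or "cxxflags";
    `package` is the package name and version the build belongs to.
    """
    # Symbol names may only contain letters, digits and underscores,
    # so every other character of the package name is replaced.
    safe = re.sub(r"[^A-Za-z0-9_]", "_", package)
    return f"{kind}_beacon_injected_for_flag_testing_{safe}"


def beacon_source(kind: str, package: str) -> str:
    """Return the text of a file meant to be force-included in every unit.

    The GNU assembler directive `.comm name,1` declares a common symbol of
    one byte, so several translation units carrying it can still be linked.
    """
    symbol = beacon_symbol(kind, package)
    return f'__asm__(".comm {symbol},1");\n'


if __name__ == "__main__":
    # Hypothetical package atom, only used to show the generated text.
    print(beacon_source("cflags", "media-gfx/sam2p-0.45"), end="")
```

Calling it once with "cflags" and once with "cxxflags" gives two distinct symbols per package, so a beacon that leaked in through another package’s saved flags can be told apart from the one injected for the package under test.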
Thanks to Portage, we can create two functions in the After the build has completed, it’s time to check the results; luckily pax-utils contains At this point you just have to look for the ET_REL output: And it’s time to find out why getopt.o and getopt1.o are not respecting CFLAGS while the rest of the build is.

Sun, 31 Aug 2008 12:21:00 +0200

A far less common problem than the last two I have written about, today I wish to analyse the failure in media-gfx/sam2p I reported. I have found similar problems before, and thus I think it’s another case worth talking about although the fix is very quick. The failure in question would be this one: The “premature EOF” error message usually means a file is truncated. With experience, you can tell this is a race condition: either the same broken rule or two rules are creating and deleting a file, and one of the two is arriving after it was deleted already. In this case, looking at the original Makefile, it’s not the same broken rule: I didn’t copy over all the rules, but this already shows the problem here. All the rules, while not exactly identical (the flags passed to the pre-processors are different depending on the target), use the same setting and use the same file names. The result is that while one rule runs the others will run too, creating the race condition.

For Gentoo I fixed it in a slightly sub-optimal way, changing all the references to The alternative using pipes, for the first rule, would probably be something like: I haven’t changed it into this because I didn’t have too much time to look into how much difference it makes, or to test it; I’ve written it down on my TODO list for the future, as a possible improvement.

In general, for parallel make, pipes should be preferred to temporary files, and if temporary files are needed, they should have different names for each target, so that they won’t overwrite one another when make is run in parallel.
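The last recommendation can be sketched in a few lines of Python. This only illustrates the naming idea and is not the fix applied to sam2p: `temp_name_for`, `build`, `demo` and the `.h` file names are all hypothetical, and the sketch runs the "rules" one after another instead of reproducing the race itself.

```python
import os
import tempfile


def temp_name_for(target):
    """Derive a temporary file name that belongs to exactly one target."""
    directory, base = os.path.split(target)
    return os.path.join(directory, f"{base}.tmp")


def build(target, content):
    """Mimic one make rule: write an intermediate file, then produce the target."""
    tmp = temp_name_for(target)
    with open(tmp, "w") as handle:
        handle.write(content)
    # Rename the finished intermediate file onto the target; no other rule
    # uses this temporary name, so nothing can delete or truncate it meanwhile.
    os.replace(tmp, target)


def demo():
    """Build three hypothetical targets and list what is left in the directory."""
    with tempfile.TemporaryDirectory() as workdir:
        targets = [os.path.join(workdir, name) for name in ("a.h", "b.h", "c.h")]
        for target in targets:
            build(target, f"/* generated for {os.path.basename(target)} */\n")
        return sorted(os.listdir(workdir))


if __name__ == "__main__":
    print(demo())
```

Because each rule owns its intermediate file, running the three builds at the same time could not make one of them read a file that another has already removed, which is the situation behind the “premature EOF” message above.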
Fri, 29 Aug 2008 22:20:24 +0200

A new version of portpeek can be found here (version 1.6.7) or emerged through portage. Changes: The way I was checking if a package was masked was incorrect. If you had a package in package.unmask and it was also in profiles/package.mask, the code was incorrectly surmising the package was stable. Bug reports and patches are always welcome.

Fri, 29 Aug 2008 13:03:00 +0200

Here comes another case study about parallel make failures and fixes. This time I’m going to write about a much less common, and more difficult to understand, type of failure. I have spotted and fixed this failure in gtk# (yes I have it installed). Let’s see the failure to begin with: Okay so there are some failures during calling “alink”, in particular it reports “sharing violations”. I suppose the name of the error message is derived from the original .NET as “sharing violation” is what Windows reports when two applications try to write to the same file at once, or one tries to write to a file that is locked down by someone else.

But I want to put some emphasis on something in particular: As you can see each policy is reportedly created thrice. If you, like me, know what to look for in a parallel make failure, you’ll also notice that there are three policies being created there. This is quite important and interesting, as it already suggests to an experienced eye what the problem is, but let’s go on step by step.

Once again, we know the software is built with As they are using custom rules, rather than Let’s look for the “Creating” line in the If you had to deal with a similar failure before (as I did), you knew already what you were going to find in that rule. I’m referring to the The result is that, as you’re going to need three different files, make will launch that code three times in parallel. This will not only waste a huge amount of time but will also fail, as the three of them might try to access the same resource at once (as is happening here).
The solution for this kind of problem is not really obvious, as it often requires rewriting the rules entirely. My usual way of thinking of the problem here is that whoever wrote the rule didn’t know Let’s decompose the rule then, ignoring the for loop, and the echo line, what we have is these two commands: Each of these two commands creates a different file: one is intermediate, and is the policy configuration, the other is the final one. This again shows there’s a lack of understanding of how

Let’s start with the policy configuration then, the actual generation command is a simple The name of the generated file is always in the format And here we’ve created our policy configuration files, in a parallel build friendly way as none of them is dependent on the other, the three

Now it’s time to create the actual policy assembly. Again, we’re going to make use of the static pattern rules, making the best use of the fact that you can also declare dependencies based on static patterns. Instead of a simple two-entry rule, this is going to be a three-entry rule; the first entry defines the list of targets that this rule may apply to, that is the same as it was before ( While the original rule depended directly on the generic policy config, this one will only depend on the actual final config, as the rule we just wrote for the configuration files will take care of it. So the final rule to generate the wanted assembly will be: At this point, the same has just to be applied to all the involved

There is another nice addition to this: you’re trading one complex, difficult to read and broken rule for two one-liner rules, which makes the code much more readable and understandable if you’re looking for a mistake.

Fri, 29 Aug 2008 08:27:00 +0200

Following my post about parallel builds I started today to tackle some issues with packages not properly building with parallel make.
Most of them end up being quite easy to fix, some of them don’t have to be fixed at all, just need the As I haven’t been able yet to find time and energy to restart writing full-fledged guides (the caffeine starvation doesn’t help), I decided to start writing some “case studies”. What I mean is that I’ll try to blog about some common problems I found in a particular package, and show the process to fix that. Hopefully, this way it’ll be easier for others to fix similar problems in the future. This also goes toward the goal of showing more of what Yamato does (by the way, once again thanks to everybody who contributed, and you all are still able to chip in if you want to help me).

The first case study in the list is for When building with It’s an easy error to understand, it cannot find The first comment to make here is that the buildsystem used is standard autotools; standard autotools, if used with their internal rules, are not subject to parallel-make failures. They don’t build directories in parallel, but they do the rest with as much parallelism as they can. This means that it’s either using a custom rule, or it has misused autotools. Another common cause of “cannot find the library” problems with libtool is when the library is in a different directory, and the order of subdirectories is wrong; this rarely creeps into the distributed tarball, if upstream is smart enough to run a But there’s a tell-tale sign in the message: the library is not prefixed with any path, so it’s not being built in a different directory but in the same one. This makes it very suspicious.

The first error comes from What do you know? This is the only property defined for the Edit: Rémi made me notice that I didn’t give the actual solution here, for those who don’t know And obviously the same mistake is repeated for almost every target in the It’s not so difficult once you see how to do it, is it?
Thu, 28 Aug 2008 23:31:13 +0200

..and I’m not talking about the famous German mathematician. I’m talking about my ~x86-fbsd box. It’s dead, and I currently don’t have a way to replace it. So, donations of every type or access to other boxes running gentoo/freebsd 7.0 are really _appreciated_. I’m asking because gentoo/freebsd is going to be released, we’re doing a huge amount of work testing and keywording packages, and it wouldn’t be good if we stopped now. If you want to donate something please drop me two lines at dav_it at g.o or leave a comment to this post. Thanks a lot

not_so_happy davit

Thu, 28 Aug 2008 22:01:00 +0200

In the IT world we’re obviously full of practices that, albeit working, are strongly discouraged because they are risky, broken on different setups or just stupid. Many of these practices are frequent enough either because they are easy to apply without knowing, or because they were documented somewhere and people read and spread them. These practices, in most compiled programming languages, when using optimising compilers, are guarded by warnings, almost-errors that are printed on the error stream of the compiler itself when it identifies a suspect construct. If you use Gentoo, you know them very well as you certainly see lots of them (unless you have

Lots of people ignore warnings, either because fixing them is too much work as it would require changing a huge part of the code, or because the warnings are not stopping the code from compiling. Much more rarely it happens that the code actually works fine and the warning is bogus; it’s not unheard of though. Also, the more advanced the warning, the more likely it is to be implemented wrong. On the other hand, the vast majority of warnings are put there for a good reason, and should actually be properly taken care of.
These warnings could have been used to make a program 64-bit safe years before 64-bit systems started to be widespread, or might have made sure that the code of a project written years before GCC 4.3 would build correctly with the latest version of the compiler. Of course they are not the one and absolute solution, as many changes might not have had warnings before (like the

But I don’t want to talk about compiler warnings today, but rather about Portage warnings. For a few versions now, thanks also to the availability of Zack and Marius, Portage has been throwing warnings after a successful merge, giving you insight into possible problems with an ebuild or with the software the ebuild uses. These are pretty useful as they can catch for you if a In addition to these that might be setup-dependent, This is particularly important because there are quite a few sub-optimal ebuilds in the tree already, and while it’s difficult to find and fix all of them, it’d be quite nice if we could avoid introducing new ones.

Unfortunately, I start to worry that it might not be as feasible as I hoped, because there is a huge fault in my idea that adding warnings will keep people away from the mistakes: there is a lack of documentation on these problems. As much as I wish I could count my blog as a source of documentation I know this is far from the truth, but I haven’t been able to start writing docs again yet because I was following this world rebuild closely, at least to understand how to set my priorities. I know I’ll be working on quite a few things in the future, especially once the hospital is just a memory, and hopefully I’ll be able to write enough docs so that the warnings become clear enough that the whole tree will be safe for everybody to use under whichever circumstances.

Wed, 27 Aug 2008 14:57:00 +0200

Since I now have a true SMP system, it’s obvious that I’m expected to run parallel make to make use of this.
Indeed, I set in my But before I start to get to those problems, I’d like to provide a public, extended answer to a friend of mine who asked me earlier today why I’m not using Portage 2.2’s parallel emerge feature.

Well, first of all, parallel emerge is helpful on SMP systems during a first install, a world rebuild (which is actually what I’m doing now) or in a long update after some time spent offline; it is of little help when doing daily upgrades, or when installing a new package. The reason is that you can effectively only merge in parallel packages that are independent of each other. This is not so easy to ensure; to avoid breaking stuff, I’m sure Portage takes the safe route and serialises rather than risking brokenness. But even this expects the dependency tree to be complete. You won’t find it complete because packages building with GCC are not going to depend on it. The system package set is not going to be put in the DEPEND variables of each ebuild, as it is, and this opens the proverbial Pandora’s box to a huge amount of problems, in my view. (Now you can also look up an earlier proposal of mine, and see if it made sense then already).

When doing a world rebuild, or a long-due update, you’re most likely going to find long chains that can be put in parallel, which I sincerely find risky, but they don’t have to be. When installing a new package, on a system that is already well installed and worked on for a few weeks even, you’ll be lucky (or unlucky) to find two or three chains at all. If you’re doing daily updates, finding parallel chains is also unlikely, as the big updates (gnome, kde, …) are usually interdependent. Although it’s a nice feature to have, I don’t find it’s going to help a lot in the long run; I think parallel make is the thing that is going to make a difference in the medium term.

Okay, so what are the problems with using
And this is not yet taking into consideration buildsystem-specific problems! What should we be doing then? Well, I think the first point to solve is the way we express the preferences. Instead of expressing it in terms of raw parameters to And then an ebuild would call, rather than simply More options might be translated this way without having to parse the The last point is probably the most complex one. Robin already dealt with a similar issue in the .bdf to .pcf translation of fonts, and solved it by having a new package provide a I guess I have found yet another task I’ll spend my time on, especially once I’m back from the hospital.

Tue, 26 Aug 2008 23:14:04 +0200

Seth Woodworth, a nocturnal intern who works and sleeps on the other half of my desk, recently installed Google Analytics on the OLPC wiki. He has found some fascinating results. The most amazing figure is the amount of traffic that comes from Uruguay, a country where we have now delivered 100,000 XO laptops (a recent milestone which hit national media in UY in a big way). Approximately 60% of the wiki traffic originates from Uruguay, 10,000+ visits per day, almost all of which is going to the Activities page. A contact at LATU has confirmed that these visits are actually children downloading activities and spreading the word, not a script or something.

This is huge. Loads of Uruguayan children are discovering the huge range of available activities and experimenting with them, every single day. If you’ve written an activity and listed it on that page, it is almost guaranteed that a lot of children have tried it. The official language of Uruguay is Spanish (and the same is true for many of our other deployments), so ensuring your Spanish translations are up-to-date is of huge value here. See the end of the latest community news for more of our findings.
Tue, 26 Aug 2008 06:10:52 +0200

I'll be effectively offline from now until I get set up at the new apartment in Moose Jaw sometime in the next couple of weeks (I imagine it'll take a while for them to hook me up, this being the back-to-school rush).

Thu, 21 Aug 2008 22:08:51 +0200

I’ve updated the chapter on graphical environments a bit to reflect how applications, window managers, X server and widget toolkits work together. Hopefully it isn’t a big lie that I wrote there ;-) I’ll probably be doing a bit of cleanup in the coming days before I start out with more chapters…

Thu, 21 Aug 2008 14:17:33 +0200

We suck. Gentoo sucks. Really. I’m depressed. I’m depressed of seeing things like this. UPDATE: Read here for more information

Thu, 21 Aug 2008 13:53:01 +0200

Wow. I have to apologize. And I probably have to learn a lesson about sarcasm tags (as Diego said). And _actually_ I didn’t want to cause what I’ve caused. Cheers dav

Wed, 20 Aug 2008 15:21:00 +0200

I’m afraid Davide will have to learn a lesson about sarcasm tags … As for what concerns me, I’d like to remind everyone that I made my comments policy known three months ago, and that comments are an additional feature, not a right, on a private blog. But I suppose someone is only badmouthing me because he can’t stop thinking about me؟; I suppose I should be charmed by this, but sorry, I’m already taken.

Tue, 19 Aug 2008 14:52:25 +0200

Some followup notes from our alternate Linux desktop on XO work:
Tue, 19 Aug 2008 04:28:10 +0200

Bobby Powers and I have spent a few hours smoothing out the process of getting fully-featured Linux desktops to boot on the XO laptop. On the whole, OLPC developers have been pretty good at getting code upstream, so only a few fixups are needed to get things operational on the XO. The only caveat is that you need a 4GB (or larger) SD card. The XO itself only has 1GB of storage, which is not big enough for the standard installs of the distributions that we’ve been playing with. We’ve got Fedora 9 and Ubuntu Intrepid Alpha 3 working. Here is the process, using Fedora 9 as an example:

First, download the regular CD/DVD installation media for your distribution. For Fedora 9, you go to http://fedoraproject.org/en/get-fedora. Burn that to CD/DVD.

Next, find a regular PC that is capable of reading SD cards. We’re using a standard desktop plus a USB card reader. Boot that PC from the CD/DVD installation media that you burned earlier. Proceed through the installation as usual, but when asked where you would like to install the operating system, select the SD card. Choose to set up the disk partitions manually. Do not do any fancy partitioning, just choose one partition that fills up the card. You don’t need to add any swap space.

Select the ext3 filesystem and choose not to install a bootloader.

Wait for installation to complete, and shut down the system.

Next, you need a PC running Linux. This can be the same PC as the one you used to install onto SD, assuming that one has Linux installed on its hard disk too. It doesn’t really matter which distribution, as long as you have git and regular development tools installed, and the SD card mounted at a known location. On this PC, run the following:
# git clone git://dev.laptop.org/users/dsd/XO-alt-distro
Next, become root and run the script.
# sudo su -
# cd ~dsd/XO-alt-distro
# ./sd_fixup fedora-9 /media/disk
It will now download and compile the OLPC kernel, and perform a few other necessary tweaks to your SD card. When the script has completed, unmount the SD card and plug it into an XO. Boot the XO, and say hello to your fully-functional Linux desktop.

In future, we plan to publish filesystem images of SD-installed distributions, so that you can avoid much of the above. To simplify further, we could also write a tool which runs on the XO, downloads said filesystem image and flashes it onto SD.

Update 19/08/2008: Posted some additional notes

Tue, 19 Aug 2008 00:10:09 +0200

Hi all, Just a quick update to let you know that I've just put x11-drivers/xf86-video-i810-2.4.1 into Portage. Overall, I'm not very happy with this release. It's definitely not as smooth as 2.3.2 which turned out to be very solid. So please test it out with a recent xorg-server (read, 1.4.2) and let me know in bugzilla if anything breaks. Again, be prepared to continue the bug hunt in FreeDesktop's bugzilla as my Intel Powers (tm) are very limited. It just so happens that half a dozen Intel developers also roam there, so all in all, it's a good place to file bugs. Just add "remi at gentoo dot org" as a CC on those bugs so I can keep track of the issues. Thanks

Update: Josh asked me a couple of questions and the answers might interest all those of you who have Intel chips but don't follow X development closely. So here goes:
Yes, let's get this straight, the driver won't eat your laptop and won't kill kittens either. A lot of the core features do work, but there are some instabilities and some quirky behaviors, and those are the issues I'm worried about.
Yep, 2.4.1-r1 is in ~arch, so ~ users will probably already have it installed by now.
Donnie's X11 overlay has live ebuilds for the "master" branch of all X packages. Beware that intel/master is very different from intel/xf86-video-intel-2.4-branch. I'm currently backporting patches from the latter into Portage as they will end up in the next release anyway. So there's no need to create a live git ebuild for this particular branch. As for the current git master, as you may have read on Phoronix and Planet FreeDesktop, the driver is going through major changes. Definitely not for the faint of heart.
I plan to tackle this tiny issue when Donnie decides to put Xorg 7.4 (aka xorg-server-1.5) into portage.

Mon, 18 Aug 2008 20:48:00 +0200

Two really-desired features of Portage, that are important for, respectively, desktop and embedded use cases, are multilib and cross-compilation. Both of these, to be properly implemented, require Portage to discern between same-ABI and any-ABI dependencies. This concept has been called in the past Linked-in and Executed dependencies, but I don’t like that name at all as it makes sense only for those who actually know what the two concepts mean. Actually, I don’t like this name either because it confuses the term ABI as the calling convention of an architecture with the term ABI as the compiled interface of a software library. If anybody can make sure we find a simple term for the calling convention type of ABI, it would be quite nice in my opinion.

Another good reason to get rid of the terms Linked-in and Executed dependencies is that, abstracting the concept well enough, one can easily see the same code and mechanisms being used to describe the dependencies of extension modules like Python’s and Ruby’s, that depend on the version of the interpreter they are built and run against. With good enough support, this would allow handling dependencies for multiple Python and Ruby versions at once without needing strange and silly hacks like the ones present in the ruby eclasses. And it would have solved the problem of PHP extension versions that caused to split them in Why is this distinction needed for both multilib and crosscompile? Well, let’s start with the multilib cas |