<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Mothership</title>
	<atom:link href="http://www.planetcrushers.com/heide/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.planetcrushers.com/heide</link>
	<description>Just another useless personal blog</description>
	<lastBuildDate>Sun, 02 Jun 2013 21:48:21 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>It&#8217;s Time To Get Smaller</title>
		<link>http://www.planetcrushers.com/heide/archives/2013/06/02/its-time-to-get-smaller/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2013/06/02/its-time-to-get-smaller/#comments</comments>
		<pubDate>Sun, 02 Jun 2013 21:48:21 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=1046</guid>
		<description><![CDATA[Well, Haswell and the new Nvidia GPUs are out, so it&#8217;s time to start putting together a new system. It&#8217;s still too early for prebuilt ones to be available, and I wasn&#8217;t happy with the ones I looked at before since they all involved compromising on something due to limited choices, so I guess I&#8217;m [...]]]></description>
				<content:encoded><![CDATA[<p>Well, Haswell and the new Nvidia GPUs are out, so it&#8217;s time to start putting together a new system.  It&#8217;s still too early for prebuilt ones to be available, and I wasn&#8217;t happy with the ones I looked at before since they all involved compromising on something due to limited choices, so I guess I&#8217;m rolling my own again.  So far, the parts I&#8217;m looking at are:</p>
<p><strong>CPU</strong>: Intel i5-4670.  Of the new batch of Haswell chips, this seems to hit the sweet spot for price and performance.  There are slightly faster ones, but they start to get much more expensive very fast after this point.</p>
<p><strong>Motherboard</strong>: <a href="http://www.newegg.ca/Product/Product.aspx?Item=N82E16813131984">Asus Z87M-Plus uATX</a>.  There seems to be a lot less differentiation among motherboards nowadays, perhaps due to more stuff being moved right onto the CPU die.  It&#8217;s mainly about size and number of USB and PCIe slots, and I don&#8217;t really need very many of them.  I&#8217;m not going SLI, so I don&#8217;t need more than 1 PCIe 16x slot, so this board satisfies everything even though it&#8217;s on the cheaper end.</p>
<p><strong>Case</strong>: <a href="http://www.newegg.ca/Product/Product.aspx?Item=N82E16811163186">SilverStone PS07B uATX</a>.  I could reuse my current Antec P180, but it&#8217;s a behemoth, and I&#8217;d like to try something a bit more compact.  I don&#8217;t need four external 5 1/4&#8243; bays!  It&#8217;s my first time looking at uATX cases, so I&#8217;m slightly worried about things fitting and cooling, but as far as I can tell from measurements and other people&#8217;s reports, it should all work.</p>
<p><strong>Video</strong>: <a href="http://www.newegg.ca/Product/Product.aspx?Item=N82E16814121770">Asus GTX 770 DirectCUII</a>.  Not much differentiation among video cards either, but this new generation should last me a good while and this card is supposed to be particularly cool and quiet.  I&#8217;m slightly concerned that 2GB might not be enough if games ported from next-gen consoles need more and more texture memory, but it&#8217;s easy enough to replace the card down the road if that happens.</p>
<p><strong>Storage</strong>: SanDisk Extreme 480GB SSD and Seagate Barracuda 3TB.  It&#8217;s about time I got an SSD, and this one&#8217;s been recommended for being reliable and reasonably performant.  It won&#8217;t be big enough by itself, so there&#8217;s ye olde regular hard drive in there too.</p>
<p><strong>Power Supply</strong>: <a href="http://www.newegg.ca/Product/Product.aspx?Item=N82E16817151088">SeaSonic X650 Gold</a>.  Not a particularly glamorous component, but I&#8217;m going for a modular one to cut down on the mess of cables inside, especially with a smaller case.</p>
<p>I&#8217;m also trying to choose quieter parts this time.  I didn&#8217;t really focus on it last time, and the fans are fairly noticeable even at idle.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2013/06/02/its-time-to-get-smaller/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bad Timing</title>
		<link>http://www.planetcrushers.com/heide/archives/2013/05/17/bad-timing/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2013/05/17/bad-timing/#comments</comments>
		<pubDate>Sat, 18 May 2013 04:37:50 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=1041</guid>
		<description><![CDATA[Ugh, it&#8217;s not dead yet, but at 7 bad sectors and steadily climbing, causing random system lockups, it won&#8217;t be long now. This drive&#8217;s had a good life, lasting at least five or six years, and everything&#8217;s already backed up, so there&#8217;s no concern over losing anything. It&#8217;s just really bad timing, since I was [...]]]></description>
				<content:encoded><![CDATA[<p><img src="/img/heide/dying_drive.png" /></p>
<p>Ugh, it&#8217;s not dead yet, but at 7 bad sectors and steadily climbing, causing random system lockups, it won&#8217;t be long now.  This drive&#8217;s had a good life, lasting at least five or six years, and everything&#8217;s already backed up, so there&#8217;s no concern over losing anything.</p>
<p>It&#8217;s just really bad timing, since I was planning on putting together a new gaming system in a couple months or so, based on the upcoming Haswell chips and new GPU generation.  If the drive dies tomorrow, there&#8217;s not much point in spending the time to immediately replace it and restore everything when I&#8217;m going to wipe and reinstall in a couple months anyway.  But if I don&#8217;t, I&#8217;ll be left without my main gaming system for that couple months instead.</p>
<p>For now I&#8217;m just crossing my fingers and hoping it manages to limp along just long enough to assemble the new system.  Come on and hurry up, Intel!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2013/05/17/bad-timing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Faceless</title>
		<link>http://www.planetcrushers.com/heide/archives/2013/05/06/faceless/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2013/05/06/faceless/#comments</comments>
		<pubDate>Mon, 06 May 2013 23:51:25 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[Personal]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=1034</guid>
		<description><![CDATA[A couple of weeks ago, I closed my Facebook account. I haven&#8217;t really missed it since. I wasn&#8217;t really using it much anyway, and it was just becoming more of a burden than a benefit. I never posted any status updates; nothing in my life really seems worth injecting into other people&#8217;s news streams. Reading [...]]]></description>
				<content:encoded><![CDATA[<p>A couple of weeks ago, I closed my Facebook account.  I haven&#8217;t really missed it since.</p>
<p>I wasn&#8217;t really using it much anyway, and it was just becoming more of a burden than a benefit.  I never posted any status updates; nothing in my life really seems worth injecting into other people&#8217;s news streams.  Reading other people&#8217;s updates was just a depressing reminder of inadequacy.  The constant stream of ads and game updates and &#8216;suggestions&#8217; was annoying and nearly impossible to fully disable.  The web page behaved poorly in Chrome, often chewing up huge amounts of CPU or crashing the tab.  The privacy options were murky at best, and what they track was increasingly invasive.  Perhaps the last straw was the update they tried to push to their Android client, which practically takes over the device and tracks what you do on it.</p>
<p>&#8220;But you&#8217;ll miss out on what all your friends are doing!&#8221;</p>
<p>I don&#8217;t need it.</p>
<p>Yeah, Facebook is great at efficiently distributing the news of your life to all the people you know, but I&#8217;ve been finding it increasingly alienating.  Nobody communicates <em>to</em> me, it&#8217;s all indirect typing past each other.  Telling Facebook about how your day went is not the same thing as telling me how your day went.  Comments are not a replacement for conversation.  If you want to talk to me, <em>talk to me</em>.  If you don&#8217;t, that&#8217;s fine too, social interaction shouldn&#8217;t be forced.</p>
<p>And yeah, it&#8217;s a bit hypocritical in that this blog is the same kind of indirect interaction, but I&#8217;d like to think that the important differences are that I try to keep it to bigger and/or more niche topics, and stuff I&#8217;d rant about to no one in particular, not day-to-day stuff; that your decision to come here and read is voluntary, not obligatory just because you labeled someone a friend; and that I have total control here and am not trying to sell you something or track your reading habits.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2013/05/06/faceless/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Late Spam</title>
		<link>http://www.planetcrushers.com/heide/archives/2013/05/03/late-spam/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2013/05/03/late-spam/#comments</comments>
		<pubDate>Sat, 04 May 2013 03:51:03 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[SpamSpamSpam]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=895</guid>
		<description><![CDATA[Subject: They need you weak Yeah, well, the cyber-samurai aren&#8217;t going to catch me off-guard that easily&#8230; Subject: Postgraduate degree in economics is your dream? Get it right now over here. Man, I don&#8217;t remember most of my dreams, but I hope they&#8217;re more exciting than that. Subject: Get more swell down there You&#8217;re going [...]]]></description>
				<content:encoded><![CDATA[<blockquote class="quote"><p><tt>Subject: They need you weak</tt></p>
<p>Yeah, well, the cyber-samurai aren&#8217;t going to catch me off-guard that easily&#8230;</p></blockquote>
<blockquote class="quote"><p><tt>Subject: Postgraduate degree in economics is your dream? Get it right now over here.</tt></p>
<p>Man, I don&#8217;t remember most of my dreams, but I hope they&#8217;re more exciting than that.</p></blockquote>
<blockquote class="quote"><p><tt>Subject: Get more swell down there</tt></p>
<p>You&#8217;re going to help me make more friends down in the U.S.?</p></blockquote>
<blockquote class="quote"><p><tt>Subject: Check out my awesome racks</tt></p>
<p>I would, but my keycard won&#8217;t let me in your server room.</p></blockquote>
<blockquote class="quote"><p><tt>Subject: Two simple steps,Add the title "Ph.D" to your resume</tt></p>
<p>Hey, that&#8217;s only one step!  I may not have a Ph.D, but I did take Counting 100.</p></blockquote>
<blockquote class="quote"><p><tt>Subject: Tarzan in bed after 1 doze.</tt></p>
<p>He probably should have taken the NoDoz instead.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2013/05/03/late-spam/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Roto-Router</title>
		<link>http://www.planetcrushers.com/heide/archives/2012/11/22/roto-router/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2012/11/22/roto-router/#comments</comments>
		<pubDate>Fri, 23 Nov 2012 03:19:50 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=1025</guid>
		<description><![CDATA[Well I finally got sick of the glitchiness of my old Linksys router. Every once in a while it just &#8216;goes away&#8217; and has to be power cycled, and it&#8217;s annoying when it happens while I&#8217;m at work and still need to access something at home. Doing research on routers is usually a big pain, [...]]]></description>
				<content:encoded><![CDATA[<p>Well I finally got sick of the glitchiness of my old Linksys router.  Every once in a while it just &#8216;goes away&#8217; and has to be power cycled, and it&#8217;s annoying when it happens while I&#8217;m at work and still need to access something at home.</p>
<p>Doing research on routers is usually a big pain, but this time I went with some recommendations and got the ASUS RT-N66U.  Besides just generally good reviews, it&#8217;s one of the models that works well with TomatoUSB, and I&#8217;ve had good luck with the Tomato firmware in the past, so the first thing I did was replace the standard firmware with that <a href="http://www.shadowandy.net/2012/03/asus-rt-n66u-tomatousb-firmware-flashing-guide.htm">via these instructions</a>.  It&#8217;s a bit trickier with this model since you have to use an external utility to flash the firmware, so there&#8217;s a slightly higher chance of turning it into an expensive brick, but it went pretty smoothly.</p>
<p>Only time will tell as far as reliability goes, but it&#8217;s nice to finally be able to use the 5GHz band separately, as it&#8217;s much less congested around here.  Being an apartment and condo-dense area, there&#8217;s a crazy number of APs around here, and the status page for the 2.4GHz radio says &#8220;Interference level: Severe&#8221;.  I&#8217;ll also have to fiddle with the QoS and USB options at some point.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2012/11/22/roto-router/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t Call It A Pad</title>
		<link>http://www.planetcrushers.com/heide/archives/2012/09/19/dont-call-it-a-pad/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2012/09/19/dont-call-it-a-pad/#comments</comments>
		<pubDate>Wed, 19 Sep 2012 18:17:59 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=1015</guid>
		<description><![CDATA[Well, I gave in to my techno-curiosity about this whole Android thing (I just had an iPhone already) and picked up a Nexus 7 last night. It&#8217;s pretty cheap for a tablet while still being fairly high quality, so it lets me indulge my curiosity without feeling like a big commitment. The build quality is [...]]]></description>
				<content:encoded><![CDATA[<p>Well, I gave in to my techno-curiosity about this whole Android thing (I just had an iPhone already) and picked up a <a href="http://en.wikipedia.org/wiki/Nexus_7">Nexus 7</a> last night.  It&#8217;s pretty cheap for a tablet while still being fairly high quality, so it lets me indulge my curiosity without feeling like a big commitment.</p>
<p>The build quality is pretty great, the screen is awesome, and no problems at all with the touch interface. The only real concerns were that the camera is pretty low-res and grainy in my living room light, but it&#8217;s clearly really just for Skype and such and people who try and take photos with tablets are evil anyway; and when holding it in landscape mode to watch a video, the sound distractingly comes from just one side, so have a good set of headphones instead.  It&#8217;s a little bit heavier than expected, and my arm got a bit tired of holding it up, but that&#8217;s really just because I&#8217;m not exactly in great shape&#8230;</p>
<p>I checked out some comics, and although cramped, I still found them to be reasonably legible at the full-page-fit level. That was admittedly with some DC/Marvel-style bright, large-text pages though, and I can see comics with denser art and text needing more fiddling with zooming in and out. I&#8217;d say it&#8217;s &#8216;adequate&#8217;, obviously not as good as a full-size tablet but I&#8217;d certainly rather view them on this than my iPhone.</p>
<p>Google Play seems nice enough, though I haven&#8217;t really probed the depth of their selections yet. The major omission in Canada is the music store and sync/streaming service, so for us it&#8217;s just not going to be a very good music device unless you set up your own DLNA server and use something like Subsonic to do your own streaming. The Google integration is nice in that I recently switched to Chrome on the desktop, so it&#8217;s convenient already having bookmarks and such shared.</p>
<p>And I&#8217;m still exploring the whole Android ecosystem, but it does please the geek inside me that I can get fairly low-level with terminals, file management, etc. even without having rooted it. The app selection may be smaller, but it hasn&#8217;t really felt like anything&#8217;s missing yet (hell, I even found a .mod music player). I still have to check out the gaming selection in more depth.</p>
<p>I still do way too much fiddly stuff on my laptop for this to ever be a replacement for it, but I&#8217;ll just have to keep using it and see what kind of role it naturally falls into.  It&#8217;s definitely way easier to use than the laptop while lying in bed&#8230;</p>
<p>(Though one thing that really bugged me while setting it up is that I had to log in to my Google account <i>four times</i>, typing out my whole crazy-long strong password each time. The first time fails because of two-factor authentication, so it redirects you to a web page to sign in again and enter the mobile code, but then I misclicked something and it took me to another web page with no obvious navigation or gesture controls to get back to the code entry. I wound up having to hard power it off and go through the whole setup process again, requiring another two password prompts. At least it was only a one-time process&#8230;  The paranoid side of me also isn&#8217;t crazy about having another way essential credentials for things like Google could be leaked, but hopefully having a strong PIN on it suffices.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2012/09/19/dont-call-it-a-pad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>It&#8217;s Shinier Too</title>
		<link>http://www.planetcrushers.com/heide/archives/2012/09/06/its-shinier-too/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2012/09/06/its-shinier-too/#comments</comments>
		<pubDate>Fri, 07 Sep 2012 01:42:40 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=1010</guid>
		<description><![CDATA[(Yikes, it&#8217;s been a while&#8230;) Well, it finally did it. I&#8217;ve been a staunch Firefox user ever since it was in beta, since I liked using the same browser across multiple platforms, but it&#8217;s gotten to the point where it&#8217;s just too glitchy to tolerate. Pages not responding to the refresh button properly, unexplained choppiness [...]]]></description>
				<content:encoded><![CDATA[<p>(Yikes, it&#8217;s been a while&#8230;)</p>
<p>Well, it finally did it.  I&#8217;ve been a staunch Firefox user ever since it was in beta, since I liked using the same browser across multiple platforms, but it&#8217;s gotten to the point where it&#8217;s just too glitchy to tolerate.  Pages not responding to the refresh button properly, unexplained choppiness in video playback, the tab bar not scrolling all the way and leaving some tabs accessible only by the tab list, crashes when I use the Flash player full-screen, the tab list missing some tabs at the bottom, high CPU usage on OS X even when lightly loaded, the Firefox window suddenly losing focus so I have to click on it again, not remembering passwords on some pages it should, and so on.  Little things, but they add up.</p>
<p>So, I&#8217;m going to give Chrome a try for a while.  There&#8217;s no guarantee that I won&#8217;t find a bunch of things about it that&#8217;ll annoy me just as much, but it&#8217;s worth a shot.  Now I just have to find a set of equivalent extensions&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2012/09/06/its-shinier-too/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Settings Are A Bit Off</title>
		<link>http://www.planetcrushers.com/heide/archives/2011/09/02/the-settings-are-a-bit-off/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2011/09/02/the-settings-are-a-bit-off/#comments</comments>
		<pubDate>Fri, 02 Sep 2011 18:27:28 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=964</guid>
		<description><![CDATA[The offset sizes were another area I could experiment with a bit. Originally I had three different offset lengths, a short-range one, a medium-range one, and a long-range one, on the theory that shorter offsets might occur more often than longer offsets, and could be stored in fewer bits. If a buffer size of &#8216;n&#8217; [...]]]></description>
				<content:encoded><![CDATA[<p>The offset sizes were another area I could experiment with a bit.  Originally I had three different offset lengths, a short-range one, a medium-range one, and a long-range one, on the theory that shorter offsets might occur more often than longer offsets, and could be stored in fewer bits.  If a buffer size of &#8216;n&#8217; bits was specified, the long-range offset would be &#8216;n&#8217; bits, the medium range offset would be &#8216;n-1&#8242; bits, and the short-range offset would be &#8216;n-2&#8242; bits.</p>
<p>Some experimentation showed that having these different ranges was indeed more efficient than having just a single offset length, but it was hard to tell just what the optimal sizes were for each range.  I kept it to only three different ranges because initially I didn&#8217;t want the number of identifier symbols to be too large, but after merging the identifiers into the literals, I had a bit more leeway in how many more ranges I could add.</p>
<p>So&#8230;why not add a range for <i>every</i> bit length?  I set it up so that symbol 256 corresponds to a 6-bit offset, 257 to a 7-bit offset, 258 to an 8-bit offset, etc., all the way up to 24-bit offsets.  This also had the property that, except for the bottom range, an &#8216;n&#8217;-bit offset could be stored in &#8216;n-1&#8217; bits, since the uppermost bit would always be &#8216;1&#8217; and could be thrown away (if it was &#8216;0&#8217;, it wouldn&#8217;t be considered an &#8216;n&#8217;-bit offset, since it would fit in a smaller range).  Some testing against a set of data files showed that this did indeed improve the compression efficiency and produced smaller files.</p>
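<p>A rough Python sketch of the offset classes described above.  The function names are made up for illustration, and it assumes offsets are plain non-negative integers; the real compressor may count them differently.</p>

```python
MIN_OFFSET_BITS = 6          # symbol 256 covers every offset that fits in 6 bits
MAX_OFFSET_BITS = 24         # largest class mentioned in the post
FIRST_OFFSET_SYMBOL = 256    # symbols 0-255 are assumed to be the literals


def encode_offset(offset):
    """Return (symbol, payload, payload_bits) for a non-negative offset."""
    nbits = max(offset.bit_length(), MIN_OFFSET_BITS)
    if nbits > MAX_OFFSET_BITS:
        raise ValueError("offset does not fit in 24 bits")
    symbol = FIRST_OFFSET_SYMBOL + (nbits - MIN_OFFSET_BITS)
    if nbits == MIN_OFFSET_BITS:
        # Bottom class: the top bit may be 0, so all 6 bits are stored.
        return symbol, offset, nbits
    # Higher classes: the top bit is always 1, so it is dropped.
    return symbol, offset - (1 << (nbits - 1)), nbits - 1


def decode_offset(symbol, payload):
    """Inverse of encode_offset: restore the implicit top bit."""
    nbits = MIN_OFFSET_BITS + (symbol - FIRST_OFFSET_SYMBOL)
    if nbits == MIN_OFFSET_BITS:
        return payload
    return payload | (1 << (nbits - 1))
```

<p>With these assumptions, an offset of 544 lands in the 10-bit class (symbol 260) and only 9 payload bits are stored.</p>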
<p>With all of these possible bit values and lengths though, there was still the open question of what should be considered <i>reasonable</i> values for things like the default history buffer size and match length.  Unfortunately, the answer is that it&#8230;depends.  I used a shell script called &#8216;explode&#8217; to run files through the compressor with all possible combinations of a set of buffer sizes and match lengths to see which would produce the smallest files, and the results varied a lot depending on the type and size of input file.  Increasing the match length did not necessarily help, since it increased the average size of the length symbols and didn&#8217;t necessarily find enough long matches to cancel that out.  Increasing the buffer size generally improves compression, but greatly increases memory usage and slows down compression.  After some more experimentation with the &#8216;explode&#8217; script, I settled on defaults of 17 bits for the buffer size, and a match length of 130.</p>
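<p>The &#8216;explode&#8217; script itself is not shown, but the idea of trying every combination of settings can be sketched in Python.  This uses zlib purely as a stand-in compressor, and since zlib does not expose a maximum match length, it sweeps the window size and compression level instead.  The function names are hypothetical.</p>

```python
import itertools
import zlib


def compressed_size(data, window_bits, level):
    """Size in bytes of data compressed with the given zlib settings."""
    comp = zlib.compressobj(level, zlib.DEFLATED, window_bits)
    return len(comp.compress(data) + comp.flush())


def sweep(data, window_bits_options, level_options):
    """Try every combination; return (size, window_bits, level) tuples, smallest first."""
    results = [
        (compressed_size(data, wb, lvl), wb, lvl)
        for wb, lvl in itertools.product(window_bits_options, level_options)
    ]
    return sorted(results)


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog. " * 200
    for size, wb, lvl in sweep(sample, range(9, 16), (1, 6, 9))[:3]:
        print(f"window_bits={wb} level={lvl} size={size}")
```

<p>As in the post, the best combination depends heavily on the input, so a sweep like this only says something about the files it was run on.</p>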
<p>Another idea I&#8217;d remembered hearing about was how the best match at the current byte might not necessarily be the most efficient match.  It might be more efficient to emit the current byte as a literal instead if the next byte is the start of an even longer match.  It was only an intuitive feeling though, so I implemented this and tested it and it did indeed seem to give a consistent improvement in compression efficiency.  As an example, in one text document the phrase &#8216;edge of the dock&#8217; was compressed like so:</p>
<pre>Literal: 'e' (101) (4 bits)
Literal: 'd' (100) (6 bits)
Literal: 'g' (103) (8 bits)
10-bit offset: 544   Length: 3 'e o' (16 bits)
 8-bit offset: 170   Length: 6 'f the ' (17 bits)
10-bit offset: 592   Length: 3 ' do' (16 bits)
Literal: 'c' (99) (6 bits)
Literal: 'k' (107) (7 bits)</pre>
<p>but with the new test, it generated the following instead:</p>
<pre>Literal: 'e' (101) (4 bits)
Literal: 'd' (100) (6 bits)
Literal: 'g' (103) (8 bits)
Literal: 'e' (101) (4 bits) (forced, match len=3)
 8-bit offset: 170   Length: 8 ' of the ' (19 bits)
10-bit offset: 592   Length: 3 ' do' (16 bits)
Literal: 'c' (99) (6 bits)
Literal: 'k' (107) (7 bits)</pre>
<p>The &#8216;forced&#8217; literal normally would have been part of the first match, but by emitting it as a literal instead it was able to find a more efficient match and only two offset/length tokens were needed instead of three, for a difference of 80 bits for the original versus 70 bits for the improved match.  Doing these extra tests does slow down compression a fair bit though, so I made it an optional feature, enabled on the command line.</p>
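<p>The check described above can be sketched as a small Python tokenizer.  This is a brute-force illustration, not the actual compressor: the names, the 4096-byte window, and the minimum match length of 3 are all assumptions.</p>

```python
MIN_MATCH = 3  # assumed shortest match worth emitting


def longest_match(data, pos, window):
    """Brute-force search for the longest earlier match; returns (length, distance)."""
    best_len, best_dist = 0, 0
    for start in range(max(0, pos - window), pos):
        length = 0
        while pos + length < len(data) and data[start + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - start
    return best_len, best_dist


def tokenize(data, window=4096, lazy=False):
    """Return a list of ('lit', byte) and ('match', distance, length) tokens."""
    tokens = []
    pos = 0
    while pos < len(data):
        length, dist = longest_match(data, pos, window)
        if length >= MIN_MATCH and lazy and pos + 1 < len(data):
            next_len, _ = longest_match(data, pos + 1, window)
            if next_len > length:
                length = 0  # force a literal; the next byte starts a longer match
        if length >= MIN_MATCH:
            tokens.append(("match", dist, length))
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def detokenize(tokens):
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, dist, length = token
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte so overlapping matches work
    return bytes(out)
```

<p>Calling <tt>tokenize(data, lazy=True)</tt> runs a second search at most positions where a match was found, which is where the extra compression time goes.</p>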
<p>At this point though, it&#8217;s getting harder and harder to extract gains in compression efficiency, as it starts devolving into a whole bunch of special cases.  For example, increasing the buffer size sometimes makes compression <i>worse</i>, as in the following example:</p>
<pre>'diff' output between two runs:
 17-bit offset: 87005   Length: 10 'with the t' (26 bits)
 14-bit offset: 10812   Length: 3 'arp' (18 bits)
-13-bit offset: 7705   Length: 3 ', w' (17 bits)
-13-bit offset: 5544   Length: 8 'ould you' (19 bits)
+18-bit offset: 131750   Length: 4 ', wo' (41 bits)
+13-bit offset: 5544   Length: 7 'uld you' (19 bits)
 16-bit offset: 50860   Length: 7 '?  You ' (22 bits)
 17-bit offset: 73350   Length: 10 'take that ' (26 bits)</pre>
<p>The compressor looks for the longest matches, and in the &#8216;+&#8217; run it found a longer match, but at a larger offset than in the &#8216;-&#8217; run.  In this case, 18-bit offsets are rare enough that their symbol has been pushed low in the Huffman tree and the bitstring is very long, making it even less efficient to use a long offset, and in the end a whopping 24 bits are completely wasted.  Detecting these kinds of cases requires a bunch of extra tests though, and this is just one example.</p>
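<p>One possible extra test, sketched in Python, is to rank candidate matches by bits spent per byte covered rather than by length alone.  This heuristic is an assumption for illustration, not something the compressor does, and it ignores the fact that symbol costs shift as the Huffman trees adapt.</p>

```python
def bits_per_byte(candidate):
    """Estimated cost of a (length, cost_in_bits) candidate per byte it covers."""
    length, cost_in_bits = candidate
    return cost_in_bits / length


def pick_match(candidates):
    """Pick the candidate with the lowest cost per covered byte, not the longest one."""
    return min(candidates, key=bits_per_byte)


# Figures taken from the diff above: ', w' cost 17 bits at a 13-bit offset,
# while the longer ', wo' cost 41 bits at an 18-bit offset.
short_match = (3, 17)
long_match = (4, 41)
print(pick_match([short_match, long_match]))

# Total for the two tokens in each run, and the difference between them.
wasted = (41 + 19) - (17 + 19)
print(wasted)
```

<p>This prints <tt>(3, 17)</tt> and then <tt>24</tt>, the same 24 bits lost in the example.</p>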
<p>So, I think that&#8217;s about all I&#8217;m going to do for attempting to improve the compression efficiency.  How does it do overall?  Well, that 195kB text file that originally compressed to 87.4kB and then made it down to 84.2kB can now be compressed down, with harder searching on and optimal buffer and match length sizes determined, to 77.9kB.  That&#8217;s even lower than &#8216;gzip -9&#8217; at 81.1kB!</p>
<p>It&#8217;s not all good news, though.  If I take the <a href="http://corpus.canterbury.ac.nz/">Canterbury Corpus</a> and test against it, the original total size is 2810784 bytes, &#8216;gzip -9&#8217; reduces them to a total of 730732 bytes (26.0%), and at the default settings, my compressor gets&#8230;785421 bytes (27.9%).  If I enable the extra searching and find optimal compression parameters for each file via &#8216;explode&#8217;, I can get it down to 719246 bytes (25.6%), but that takes a <b>lot</b> of effort.  Otherwise, at the default settings, some of the files are smaller than gzip and others are larger; typically I do worse on the smaller files where there hasn&#8217;t really been much of a chance for the Huffman trees to adapt yet, and the Excel spreadsheet in particular does really poorly with my compressor, for some reason I&#8217;d have to investigate further.</p>
<p>But I&#8217;m not going to.  No, the main remaining problem was one of speed&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2011/09/02/the-settings-are-a-bit-off/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>I Ain&#8217;t No Huffman</title>
		<link>http://www.planetcrushers.com/heide/archives/2011/08/29/i-aint-no-huffman/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2011/08/29/i-aint-no-huffman/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 05:38:11 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=946</guid>
		<description><![CDATA[In terms of compression efficiency, I knew there were some obvious places that could use improvement. In particular, my Huffman trees&#8230;weren&#8217;t even really Huffman trees. The intent was for them to be Huffman-like in that the most frequently seen symbols would be closest to the top of the tree and thus have the shortest bitstrings, [...]]]></description>
				<content:encoded><![CDATA[<p>In terms of compression efficiency, I knew there were some obvious places that could use improvement.  In particular, my Huffman trees&#8230;weren&#8217;t even really <a href="http://en.wikipedia.org/wiki/Huffman_coding">Huffman trees</a>.  The intent was for them to be Huffman-like in that the most frequently seen symbols would be closest to the top of the tree and thus have the shortest bitstrings, but the construction and balancing method was completely different.  Whenever a symbol&#8217;s count increased, I compared it to the count of the parent&#8217;s parent&#8217;s other child, and if the current symbol&#8217;s count was now greater, the code swapped that node with the current symbol, inserted a new branch where the updated node used to be, and pushed the other child down a level.</p>
<p>Unfortunately, that method led to horribly imbalanced trees, since it only considered nearby nodes when rebalancing, when changing the frequency of a symbol can actually affect the relationship of symbols on relatively distant parts of the tree as well.  As an example, here&#8217;s what a 4-bit length tree wound up looking like with my original adaptive method:</p>
<pre>Lengths tree:
    Leaf node 0: Count=2256 BitString=1
    Leaf node 1: Count=1731 BitString=001
    Leaf node 2: Count=1268 BitString=0001
    Leaf node 3: Count=853 BitString=00001
    Leaf node 4: Count=576 BitString=000001
    Leaf node 5: Count=405 BitString=0000001
    Leaf node 6: Count=313 BitString=00000001
    Leaf node 7: Count=215 BitString=000000000
    Leaf node 8: Count=108 BitString=0000000011
    Leaf node 9: Count=81 BitString=00000000101
    Leaf node 10: Count=47 BitString=000000001001
    Leaf node 11: Count=22 BitString=00000000100001
    Leaf node 12: Count=28 BitString=0000000010001
    Leaf node 13: Count=15 BitString=000000001000000
    Leaf node 14: Count=9 BitString=000000001000001
    Leaf node 15: Count=169 BitString=01
    Avg bits per symbol = 3.881052</pre>
<p>If you take the same data and manually construct a Huffman tree the proper way, you get a much more balanced tree without the ludicrously long strings:</p>
<pre>    Leaf node 0: Count=2256 BitString=10
    Leaf node 1: Count=1731 BitString=01
    Leaf node 2: Count=1268 BitString=111
    Leaf node 3: Count=853 BitString=001
    Leaf node 4: Count=576 BitString=1100
    Leaf node 5: Count=405 BitString=0001
    Leaf node 6: Count=313 BitString=11011
    Leaf node 7: Count=215 BitString=00001
    Leaf node 8: Count=108 BitString=000001
    Leaf node 9: Count=81 BitString=000000
    Leaf node 10: Count=47 BitString=1101000
    Leaf node 11: Count=22 BitString=110100110
    Leaf node 12: Count=28 BitString=11010010
    Leaf node 13: Count=15 BitString=1101001111
    Leaf node 14: Count=9 BitString=1101001110
    Leaf node 15: Count=169 BitString=110101
    Avg bits per symbol = 2.969368</pre>
<p>That&#8217;s nearly a bit per symbol better, which may not sound like much but with the original method there was barely any compression happening at all, whereas a proper tree achieves just over 25% compression.</p>
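<p>As a quick sanity check on those figures, here is a minimal Python sketch (not part of the original C++ code) that recomputes the average bits per symbol from the counts and bitstring lengths in the two listings above and compares them with storing each length as a raw 4-bit value.  The names <code>average_bits</code> and <code>savings</code> are made up for this example.</p>
<pre>
```python
# Counts and code lengths copied from the two tree dumps above (symbols 0..15).
COUNTS = [2256, 1731, 1268, 853, 576, 405, 313, 215, 108, 81, 47, 22, 28, 15, 9, 169]
ADAPTIVE_LENGTHS = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 15, 2]
HUFFMAN_LENGTHS = [2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 9, 8, 10, 10, 6]
FIXED_BITS = 4  # a raw, uncompressed length value takes 4 bits


def average_bits(counts, lengths):
    """Weighted average code length: total bits emitted divided by total symbols."""
    total_bits = sum(c * n for c, n in zip(counts, lengths))
    return total_bits / sum(counts)


def savings(avg_bits, fixed_bits=FIXED_BITS):
    """Fraction of space saved relative to storing every symbol in fixed_bits."""
    return 1 - avg_bits / fixed_bits


for name, lengths in (("adaptive", ADAPTIVE_LENGTHS), ("huffman", HUFFMAN_LENGTHS)):
    avg = average_bits(COUNTS, lengths)
    print(f"{name}: {avg:.6f} bits/symbol, {savings(avg):.1%} saved")
```
</pre>
<p>This reproduces the 3.881052 and 2.969368 averages from the listings, which work out to roughly 3% and just under 26% saved against the fixed 4-bit encoding.</p>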
<p>So, I simply dumped my original adaptive method and made it construct a Huffman tree in the more traditional way, pairing the lowest count nodes in a sorted list.  To keep it adaptive, it still does the count check against the parent&#8217;s parent&#8217;s other child, and when it crosses the threshold it simply rebuilds the entire Huffman tree from scratch based on the current symbol counts.  This involves a lot more CPU work, but as we&#8217;ll see later, performance bottlenecks aren&#8217;t necessarily where you think they are&#8230;</p>
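<p>For reference, the textbook construction repeatedly merges the two lowest-count nodes until a single root remains.  The following is a minimal Python sketch of that idea, using a heap instead of a sorted list; it is an illustration only, not the actual C++ implementation, and it does not model the adaptive rebuild trigger.  The function name <code>huffman_code_lengths</code> is hypothetical.</p>
<pre>
```python
import heapq
from itertools import count


def huffman_code_lengths(counts):
    """Return a list of code lengths, one per symbol, for the given counts."""
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(c, next(tiebreak), [sym]) for sym, c in enumerate(counts)]
    heapq.heapify(heap)
    lengths = [0] * len(counts)
    while len(heap) > 1:
        # Pull out the two lowest-count nodes and merge them under a new parent.
        c1, _, syms1 = heapq.heappop(heap)
        c2, _, syms2 = heapq.heappop(heap)
        merged = syms1 + syms2
        for sym in merged:
            lengths[sym] += 1  # every leaf under the new parent sits one bit deeper
        heapq.heappush(heap, (c1 + c2, next(tiebreak), merged))
    return lengths


# Counts from the lengths tree dump above.
COUNTS = [2256, 1731, 1268, 853, 576, 405, 313, 215, 108, 81, 47, 22, 28, 15, 9, 169]
lengths = huffman_code_lengths(COUNTS)
total_bits = sum(c * n for c, n in zip(COUNTS, lengths))
print(lengths)
print(total_bits / sum(COUNTS))
```
</pre>
<p>Run on the counts from the listing, this yields the same code lengths as the hand-built tree above (2, 2, 3, 3, 4, 4, &#8230;).  The exact bit patterns depend on which child is labelled 0 or 1, so only the lengths are comparable.</p>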
<p>My trees also differ from traditional ones in that they prepopulate the tree with all possible symbols with a count of zero, whereas usually you only insert nodes into a Huffman tree if they have a count greater than zero.  This is slightly suboptimal, but it avoids a chicken-and-egg problem with the decoder not knowing what symbol a bitstring corresponds to if it doesn&#8217;t exist in the tree yet because it&#8217;s the first time the symbol has been seen.</p>
<p>Knowing that, and with the improved Huffman trees, another thing became clear: using Huffman trees for the offsets wasn&#8217;t really doing much good at all.  With most files, the offset values are too evenly distributed, and many are never used at all.  All those zero-count entries would get pushed down the tree and become longer strings, so the first time an offset got used it would often have a string longer than its basic bit length, causing file growth instead of compression.  Instead, I just ripped those trees out and emitted plain old integer values for the offsets.</p>
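<p>To illustrate how prepopulated zero-count entries end up with strings longer than their basic bit length, here is a small Python sketch with an invented 3-bit offset alphabet in which half of the values have never been seen.  The counts are made up for the example, and <code>huffman_code_lengths</code> is a hypothetical helper using the standard lowest-two-first construction, not the original code.</p>
<pre>
```python
import heapq
from itertools import count


def huffman_code_lengths(counts):
    """Standard Huffman construction; returns one code length per symbol."""
    tiebreak = count()  # unique sequence number so ties never compare the lists
    heap = [(c, next(tiebreak), [sym]) for sym, c in enumerate(counts)]
    heapq.heapify(heap)
    lengths = [0] * len(counts)
    while len(heap) > 1:
        c1, _, syms1 = heapq.heappop(heap)
        c2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, next(tiebreak), syms1 + syms2))
    return lengths


OFFSET_BITS = 3  # plain integer cost of one offset in this toy example
counts = [40, 30, 20, 10, 0, 0, 0, 0]  # invented: four offsets never used so far
lengths = huffman_code_lengths(counts)
unused = [sym for sym, c in enumerate(counts) if c == 0]
worst = max(lengths[sym] for sym in unused)
print(lengths)
print("extra bits for a first-time offset:", worst - OFFSET_BITS)
```
</pre>
<p>Here every unused offset gets a 6-bit string, twice the 3 bits a plain integer would cost, which is the file-growth effect described above.</p>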
<p>The way I was constructing my trees also had another limitation: the total number of symbols had to be a power of two.  With the proper construction method, an arbitrary number of symbols could be specified, and that allowed another potential optimization: merging the identifier tree and the literals tree.  The identifier token in the output stream guaranteed that there would always be at least 1 wasted non-data bit per token, and often two.  Merging it with the literals would increase the size of literal symbols, but the expectation is that the larger merged symbols would still be smaller, on average, than the sum of the identifier symbols and the smaller literal symbols, especially as more &#8216;special case&#8217; symbols are added.  Instead of reading an identifier symbol and deciding what to do based on that, the decoder would read a &#8216;literal&#8217; symbol.  If it was in the range 0-255, it was indeed a literal byte value and interpreted that way, but if it was 256 or above, it would be treated as having a following offset/length pair.</p>
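<p>The decoder dispatch described above might look roughly like the following Python sketch.  It works on symbols that have already been entropy-decoded into integers, and it assumes that a match symbol is followed by an offset (counted back from the end of the output so far) and then a length, both as plain integers.  Those conventions, and the name <code>decode</code>, are assumptions made for the example rather than details of the real format.</p>
<pre>
```python
def decode(stream):
    """Decode an iterable of ints: merged symbols, each match followed by offset, length."""
    it = iter(stream)
    out = bytearray()
    for symbol in it:
        if symbol in range(256):
            out.append(symbol)  # 0-255: a plain literal byte
        else:
            # 256 or above: an offset/length pair follows (assumed order and meaning).
            offset = next(it)
            length = next(it)
            start = len(out) - offset
            for i in range(length):
                # Copy byte by byte so a match may overlap the bytes it produces.
                out.append(out[start + i])
    return bytes(out)


print(decode([97, 98, 99, 256, 3, 6]))
```
</pre>
<p>In this toy stream, the three literals <code>abc</code> are followed by a match that copies 6 bytes starting 3 back, giving <code>abcabcabc</code>.</p>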
<p>The range of offsets to handle would also have to change, but that&#8217;s for next time&#8230;  With the Huffman tree improvements, my 195kB test file that compressed to 87.4kB before now compressed to 84.2kB.  Still not as good as gzip, but getting there.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2011/08/29/i-aint-no-huffman/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Compressing History</title>
		<link>http://www.planetcrushers.com/heide/archives/2011/08/26/compressing-history/</link>
		<comments>http://www.planetcrushers.com/heide/archives/2011/08/26/compressing-history/#comments</comments>
		<pubDate>Sat, 27 Aug 2011 05:26:00 +0000</pubDate>
		<dc:creator>heide</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://www.planetcrushers.com/heide/?p=933</guid>
		<description><![CDATA[While sorting through some old files of mine, I happened upon the source code to a compression engine I&#8217;d written 18 years ago. It was one of the first things I&#8217;d ever written in C++, aside from some university coursework, and I worked on it in the evenings during the summer I was in Cold [...]]]></description>
				<content:encoded><![CDATA[<p>While sorting through some old files of mine, I happened upon the source code to a compression engine I&#8217;d written <i>18 years ago</i>.  It was one of the first things I&#8217;d ever written in C++, aside from some university coursework, and I worked on it in the evenings during the summer I was in Cold Lake on a work term, just for fun.  Yes, I am truly a nerd, but there wasn&#8217;t really much else to do in a tiny town like that, especially when you only get 3 TV channels.</p>
<p>Looking at it now it&#8217;s kind of embarrassing, since of course it&#8217;s riddled with inexperience.  No comments at all, leaving me mystified at what some of the code was even doing in the first place, unnecessary global variables, little error checking, poor header/module separation, unnecessary exposure of class internals, poor const correctness, and so on.  It kind of irks my pride to leave something in such a poor state though, so I quickly resolved to at least clean it up a bit.</p>
<p>Of course, I had to understand it first, and I started to remember more about it as I looked over the code.  It&#8217;s a fairly basic combination of both LZ77 pattern matching and Huffman coding, like the ubiquitous Zip method, but the twist I wanted to explore was in making the Huffman trees adaptive, so that the symbols would shift around the tree to automatically adjust as their frequency changed within the input stream.  There were two parameters that controlled compression efficiency: history buffer size, and maximum pattern length.  The history size controlled how far back it would look for matching patterns, and the length controlled the upper limit on the length of a match that could be found.</p>
<p>Compression proceeded by moving through the input file byte by byte, looking for the longest possible exact byte match between the data ahead of the current position and the data in the history buffer just behind the current position.  If a match could not be found, it would emit the current byte as a literal and move one byte ahead, and if a match was found, it would emit a token with the offset and length of the match in the history buffer.  To differentiate between these cases, it would first emit an &#8216;identifier&#8217; token with one of four possible values: one for a literal, which would then be followed by the 8-bit value of the literal, and three for offset and length values, with three different possible bit lengths for the offset so that closer matches took fewer bits.  Only matches of length 3 or longer were considered, since two-byte matches would likely have an identifier+offset+length string longer than just emitting the two bytes as literals.  In summary, the four possible types of bit strings you&#8217;d see in the output were:</p>
<pre>
    | ident 0 | 8-bit literal |

    | ident 1 | 'x'-bit offset    | length |

    | ident 2 | 'y'-bit offset        | length |

    | ident 3 | 'z'-bit offset            | length |</pre>
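<p>The matching loop described above can be sketched roughly as follows.  This is a brute-force Python illustration rather than the original C++ code: the function name <code>tokenize</code>, the token tuples, and the default parameter values are all invented for the example, and the identifier and Huffman stages are left out entirely.  It also assumes a match is allowed to run past the current position, which the original may or may not have permitted.</p>
<pre>
```python
def tokenize(data, history_size=4096, max_length=18, min_length=3):
    """Greedy LZ77 pass: return ('lit', byte) and ('match', offset, length) tokens."""
    tokens = []
    pos = 0
    while len(data) > pos:
        best_len, best_off = 0, 0
        limit = min(max_length, len(data) - pos)  # never read past the end of the input
        # Try every starting point in the history window behind the current position.
        for cand in range(max(0, pos - history_size), pos):
            n = 0
            while limit > n and data[cand + n] == data[pos + n]:
                n += 1
            if n >= best_len:
                # On ties the later candidate wins, which is the closer match.
                best_len, best_off = n, pos - cand
        if best_len >= min_length:
            tokens.append(("match", best_off, best_len))
            pos += best_len
        else:
            tokens.append(("lit", data[pos]))  # too short to be worth a match token
            pos += 1
    return tokens


print(tokenize(b"abcabcabc"))
```
</pre>
<p>For the input <code>abcabcabc</code> this emits three literals followed by a single match with offset 3 and length 6.  A real encoder would then pick one of the three offset widths based on how far back the match is.</p>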
<p>And then I used a lot of Huffman trees.  Each of these values was then run through a Huffman tree to generate the actual symbol emitted to the output stream, with separate trees for the identifier token, the literals, the lengths, and the three offset types.  HUFFMAN TREES EVERYWHERE!  The compression parameters were also written to a header in the file, so the decoder would know what history buffer size to use and maximum length allowed.</p>
<p>It worked&#8230;okay&#8230;  I&#8217;ve lost my original test files, but on one example text file of 195kB, my method compresses it down to 87.4kB, while &#8216;gzip -9&#8217; manages 81.1kB.  Not really competitive, but not too bad for a completely amateur attempt either.  There&#8217;s still plenty of room for improvement, which will come&#8230;next time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.planetcrushers.com/heide/archives/2011/08/26/compressing-history/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

