2008-08-18

PixCede gets shorter file identifiers

My initial filenaming convention for PixCede was pure rubbish.

It all started with the best of intentions. After reading about the development of Pastebin and why the database got scrapped, I was inspired to make PixCede rely on the filesystem and not the database. I did, however, need to keep the timestamp so I could sort the images by the order in which they arrived. So my initial naming convention was:

2008-07-04T07:19:03+00:00_f6cfee1f4e477963a3c24d8f9b769722.jpg

That is, PHP's date('c') followed by an MD5 of the file. My logic was, date('c') would keep the time property and, if by chance any two images hit the system in the same second, they would certainly have different contents, and the MD5 would differentiate them (unless of course, the same image hit at the same time, but.. I don't see why it would need to be there twice). Using date('c') was just a bad idea from the start; if I had spent 2 seconds more thinking about it, I would have just used time(), which returns the number of seconds since the Unix epoch. Using an MD5 hash is a dumb idea too, because it's a pretty expensive operation.
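For the record, that first scheme boils down to something like the sketch below. The function name and its parameters are made up for illustration and are not lifted from the PixCede source; date('c') and md5_file() are the stock PHP calls.

```php
<?php
// Hypothetical helper: rebuilds the version-one name for an uploaded file.
// $timestamp can be passed in to make the result repeatable; it defaults to now.
function pixcede_name_v1(string $path, ?int $timestamp = null): string
{
    $timestamp = $timestamp === null ? time() : $timestamp;
    // date('c') is an ISO 8601 date: 25 characters, colons and all.
    $stamp = date('c', $timestamp);
    // md5_file() reads the entire file and returns 32 hex characters.
    $hash = md5_file($path);
    // 25 + 1 + 32 = 58 characters before the extension.
    return $stamp . '_' . $hash;
}
```

Hashing means reading every byte of every upload, which is the expensive part.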

So for version two, I used time() concatenated to uniqid(), a function that creates a UID based on the current time in microseconds (it's what the PHP manual pages recommend to use for Session IDs). Without any parameters, uniqid() returns a 13 character string. That brings me to:

1219098558133aaffc6178602.jpg
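A minimal sketch of version two, assuming nothing beyond the two built-ins named above (the wrapper function is hypothetical):

```php
<?php
// Hypothetical helper for the version-two name: Unix seconds followed by uniqid().
function pixcede_name_v2(): string
{
    // time() is 10 digits long for any date between 2001 and 2286.
    $seconds = (string) time();
    // uniqid() with no arguments returns 13 hex characters derived from the
    // current time in microseconds.
    return $seconds . uniqid();
}

echo pixcede_name_v2() . ".jpg\n";
```

No file contents are read here, so the cost no longer grows with the size of the image.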

Substantially better, but.. as the great doctor says, if something's worth doing, it's worth doing right. Ideally, I want PixCede to send a small enough unique identifier back to the user that he can type it into his browser, after receiving an SMS back. Next on the chopping block: the 13 character unique identifier.

In a dream world, I might get 1000 pictures per second with PixCede. Realistically, I think I only need to worry about two at once (and that's a stretch), but 1000 seems like a nice round number. I can pick a random number from 0 to 1,679,615 (36^4 - 1), convert it to base 36 (10 digits + 26 letters) with PHP's base_convert(), and use only 4 precious characters. This brings it down to:

1219098558xe21.jpg
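Here is a sketch of how that four-character suffix could be generated. The helper name is hypothetical, and the zero-padding is an addition of this sketch: without it, small random numbers come back as fewer than four characters. The random number is capped at 36^4 - 1 so it never spills into a fifth character.

```php
<?php
// Hypothetical helper: a fixed-width, four-character base 36 suffix.
function pixcede_suffix(): string
{
    // 36^4 = 1,679,616 possible values, so the largest usable one is 36^4 - 1.
    $n = rand(0, 36 ** 4 - 1);
    // base_convert() takes and returns strings. Left-pad with zeros so every
    // suffix is exactly four characters long.
    return str_pad(base_convert((string) $n, 10, 36), 4, '0', STR_PAD_LEFT);
}

// Ten digits of timestamp plus four characters of randomness.
$name = time() . pixcede_suffix();
echo $name . ".jpg\n";
```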

Which is a lot better! It preserves the ordering in the first 10 characters, and uses 4 characters on the end to make it unique. But why not go balls to the wall? I might as well convert both the timestamp and the random number to base 36; a timestamp in base36 will sort just as well as one in base10. The final code I am using:

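// Shift the timestamp left by 7 decimal digits to make room for the random part (at most 1,679,616, which is 7 digits).
$id = (time() * pow(10, 7)) + rand(0, pow(36, 4));
// Encode the combined number in base 36.
$uid = base_convert($id, 10, 36);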


3bnd9c4wf66.jpg

11 characters total, a saving of 47 over my original, poorly encoded 58 characters! This, I feel, is an acceptable UID to have to type in by hand. If you want to improve on it, base62 is just a small function away (there's a user-contributed one on the PHP base_convert() page).
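As a sanity check on the length and the sort order, here is the same arithmetic as the two lines of code above, wrapped in a hypothetical function that takes the clock and the random part as arguments so the result is repeatable. It assumes 64-bit PHP integers; on a 32-bit build the product would turn into a float and lose precision.

```php
<?php
// Hypothetical wrapper around the final scheme. Assumes 64-bit integers.
function pixcede_uid(int $seconds, int $random): string
{
    // Timestamp shifted left by 7 decimal digits, plus the random part.
    $id = $seconds * 10 ** 7 + $random;
    return base_convert((string) $id, 10, 36);
}

$earlier = pixcede_uid(1219098558, 1679615); // largest four-character random part
$later   = pixcede_uid(1219098559, 0);       // one second later, smallest random part

var_dump(strlen($earlier));              // length of the identifier
var_dump(strcmp($earlier, $later) < 0);  // true when string order matches arrival order
```

Both strings are the same length and base 36 digits sort in ASCII order (0-9, then a-z), so a plain string sort on the filenames keeps the images in arrival order.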

EDIT:
I decided to go ahead and switch it over to base62 encoding. The comment on base_convert() actually didn't work for what I needed; the images were mostly sorted, but off a little bit. It turns out you need to switch the upper and lower case letter sets in the dec2any() function ("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"), because upper case letters sort before lower case ones in ASCII. Now everything works properly, with a final, 9 character encoding of:

tfebHWV5s.jpg
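For anyone who wants to see the idea without digging through the manual comments, here is a from-scratch sketch of a base62 encoder. It is not the dec2any() function itself; the function name is hypothetical, and it assumes 64-bit PHP integers and PHP 7 (for intdiv()).

```php
<?php
// Hypothetical base62 encoder. The alphabet is in ASCII order (digits, upper
// case, lower case) so that equal-length strings sort the same way as the
// numbers they encode.
function base62_encode(int $number): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    if ($number < 0) {
        throw new InvalidArgumentException('Expected a non-negative integer');
    }
    $out = '';
    do {
        // Peel off the least significant base62 digit and prepend it.
        $out = $alphabet[$number % 62] . $out;
        $number = intdiv($number, 62);
    } while ($number > 0);
    return $out;
}

// Same numeric id as before: timestamp shifted by 7 digits plus a random part.
$id = time() * 10 ** 7 + rand(0, 36 ** 4 - 1);
echo base62_encode($id) . ".jpg\n";
```

With the lower case letters first in the alphabet, "a" would encode a smaller digit than "A" but sort after it, which is the kind of mismatch that leaves a listing only mostly in order.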

2008-08-03

Adam's Hang Gliding Adventure: Cooler Than What You Did Saturday

As I was going through some envelopes last week, I found two $100 checks that I wasn't counting on. The "Rules of Surprise Money" were very clear on the next step: I had to buy something I didn't need that was awesome, or do something I didn't need to do that would be awesome. I've got enough junk, so I decided to go with the latter. Browsing through Meetup.com, I found a group of people going hang gliding on August 2nd, and the total cost of ground instruction plus a tandem flight at 1000 feet was $195. The choice was clear -- hang gliding and a Chipotle burrito!

This involved waking up earlier than man was meant to on a Saturday to meet the Meetup.com participants in Santa Clara at 6:15am and then continue the ride down to Tres Pinos outside of Hollister, CA. Once we arrived there, we split into a ground instruction group and the tandem group. The ground training consisted of learning the basics: learning how to run taking long strides (the longer the strides, the less you bounce up and down, and the lower chance you have of rupturing air tension along the wings of the glider), picking up the glider and balancing it on your shoulders, learning how to orient the glider with or against the wind, walking the glider on the ground, running with the glider, and finally, freakin' lifting off the ground with the glider!

IMG_4810

The tandem was a little more intense. They had a winch that pulled a tow line connected to the tandem glider, so that it would pick up enough speed to lift up. Man. There are very few things that I have experienced that are as intense as that initial lift -- you go up one hundred feet before you even realize what's going on!
After climbing to a cruising altitude of about 700 feet, you see those birds flying around down there near the ground. I taunted them for being wusses. After circling around for six or seven minutes, we landed in the field right next to the runway.

IMG_4881

The instructors had all been doing this for years, and had great stories; for example, jumping off of Glacier Point in Yosemite and hang gliding from one side of the valley to the other. Or, the guy who hang glided from Mt. Tam, just north of San Francisco, to a spot about a hundred miles south. Or, the guy who has the longest flight record at 440 miles. Yea. 440 miles. That's roughly equivalent to the distance between Cleveland, Ohio and New York City. In a hang glider.

IMG_4914
Satisfied Customer