Archive for the 'Code' Category

A tale of two developer sites

The other day, I wanted to look at how to write applications for Windows for a class I am in. Where is the logical place to look? The manufacturer’s developer website would be a good place to start. Upon arriving at Microsoft’s Developer Network homepage, suddenly everything got really confusing.

I have a theory about developer home pages, whether they be from Microsoft, Apple, ARM, or Intel: they should be extremely simple and geared mainly towards beginners who know nothing about the technology they want to write software for. I feel that Microsoft fell far short of that. Their MSDN homepage is cluttered, makes it hard to pinpoint where to go if you want to write Windows applications, depends on you already knowing Microsoft’s existing technologies, and even has ads for their developer products (making people pay for developer tools is a whole other issue).

MSDN home page

Now on the other side of this spectrum is Apple’s developer homepage, which is simple, clean, and extremely informative about where you should go next: “Are you developing for the Mac or the iPhone?”

Apple home page

Matters get worse with the Mac and Windows Dev Centers. If I am a developer wanting to write an application, I want something that tells me where to go, explains things simply, and walks me through getting started. Once again, Apple gets it. Even after trying to get through Microsoft’s developer site for about half an hour, I couldn’t figure it out. Apple’s seemed to take about 2 seconds (or maybe like a minute).

Microsoft’s Windows Dev Center:

Windows Dev Center page

Apple’s Mac Dev Center:

Apple's Dev Center page

So I guess my question is: what’s so wrong with making things simple? Is it just a personal thing?

Granted, all this said, neither of these two developer sites is perfect in any regard. Both sites place a huge emphasis on their products rather than on actual development for said products. Secondly, anyone who wanders onto either site should be able to write a 5-minute “Hello world!” application with no hassle. Apple and Microsoft both need to simplify, swallow the fact that people do not care to see ads for their products, and then provide a handy guide for how to write applications for their platforms.

Thread safety performance on different CPU architectures

A few weeks ago, I did a post on thread safety performance on my machine. However, that was only on one machine, a 2.16 GHz Intel Core Duo architecture.

Yesterday, Joey and I decided to run some more tests of thread safety implementations (no thread safety, mutexes, and semaphores) on different architectures, and the results were quite interesting. We ran the tests on my machine again, on a 2.16 GHz Intel Core 2 Duo iMac, an 800 MHz PowerPC G4 iMac, and a quad-core 2.66 GHz Intel Xeon Mac Pro. After running the same tests on each machine, we normalized the data against each machine’s no-thread-safety result to get rid of the clock-speed factor.

For mutexes, the machines all performed about the same, with my Core Duo doing slightly better than all of the other machines. In second place was the PPC G4, and the Core 2 Duo and Xeon machines came in last. For the semaphores, the PPC G4 smoked the Intel chips, a very interesting result.

Here is a summary of our results (the numbers listed are calls completed in 5 seconds, and the numbers in parentheses are the normalized data):

Core Duo:
No thread safety: 440909468 (1)
Mutexes: 98324839 (0.223)
Semaphores: 197981 (0.00044)

Core 2 Duo:
No thread safety: 831519248 (1)
Mutexes: 135168338 (0.163)
Semaphores: 1153251 (0.00139)

PowerPC G4:
No thread safety: 119723944 (1)
Mutexes: 23709889 (0.198)
Semaphores: 1020029 (0.00852)

Quad-core Xeon:
No thread safety: 884848737 (1)
Mutexes: 140097898 (0.158)
Semaphores: 1157646 (0.00131)
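
To make the normalization concrete: each figure in parentheses is the raw count divided by the no-thread-safety count from the same machine. Here is a minimal sketch of that arithmetic in C. The function name normalized_throughput is made up for illustration and is not part of the original test code.

```c
#include <assert.h>

/* Ratio of a synchronized run's call count to the unsynchronized
 * baseline measured on the same machine. Dividing by the baseline is
 * what removes the raw clock-speed difference between machines.
 * (Hypothetical helper, not from the original test program.) */
double normalized_throughput(unsigned long long calls,
                             unsigned long long baseline_calls)
{
    assert(baseline_calls > 0);   /* a zero baseline has no meaning */
    return (double)calls / (double)baseline_calls;
}
```

For example, normalized_throughput(98324839ULL, 440909468ULL) gives the Core Duo mutex ratio of roughly 0.223 listed above.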

Here are some charts representing this data:
All data

Mutex performance

Semaphore performance

Detecting conflicting Objective-C category methods

Objective-C categories are an extremely powerful way to add functionality to an existing class. Take, for example, the category below, which adds some stack-like methods to NSMutableArray:

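#import <Foundation/Foundation.h>

@interface NSMutableArray (StackAdditions)
- (id)pop;
- (void)push:(id)object;
@end

@implementation NSMutableArray (StackAdditions)
// Removes and returns the last object, or nil if the array is empty.
- (id)pop
{
    if ([self count] == 0)
        return nil;  // -removeLastObject raises on an empty array

    // Retain/autorelease so the object survives its removal from the array.
    id lastObject = [[[self lastObject] retain] autorelease];
    [self removeLastObject];
    return lastObject;
}

// Pushes an object onto the end of the array.
- (void)push:(id)object
{
    [self addObject:object];
}
@end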

Now let’s say that I was writing a plugin for a Mac OS X app that has a plugin architecture, such as IB, Aperture, Address Book, etc. Let’s also say that the app has the same category methods on NSMutableArray, but slightly different; maybe its version doesn’t retain the object because of some special need. When my plugin gets loaded into the app, the Objective-C runtime will auto-magically replace their category methods with mine because the methods have the same names.

Obviously, this presents a problem. Now whenever their code calls -[NSMutableArray pop] or -[NSMutableArray push:], then the runtime will use my implementation instead. And if their category relied on some special happenings in their implementation, then the entire app could quickly fall into an unstable and fatal state.

So what’s to be done? If you are writing an application that has a plugin architecture, or you are writing a plugin for an app, you should run the application during testing with the environment variable OBJC_PRINT_REPLACED_METHODS set to YES. This tells the Objective-C runtime to print out any methods it replaces when loading a plugin. If you see that your implementation is being overwritten, or that you are overriding someone else’s, then you know you need to rename your method, for example to -[NSMutableArray myPop] instead of the original -[NSMutableArray pop].

Attempted to upgrade WordPress

A new version of WordPress is out, and after attempting to update to this new and improved version, well… my site broke. Luckily, after some finagling with the server, I was able to restore the old version.

Thread synchronization implementation performance

Thread safety is certainly a very important factor in modern software design, especially as the number of cores increases per machine, and being able to run code concurrently becomes a requirement.

So I decided to see just how expensive pthread_mutex_lock and semaphores are. The basic idea of the test is to see how many times you can enter a critical section of code in a fixed amount of time. I ran the same test for each synchronization implementation:

  1. No thread-safety.
  2. Thread safe using pthread_mutex_lock.
  3. Thread safe using sem_wait.

The code for each test is below:

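// __count, __lock, and sem are globals set up elsewhere in the test program.

// Baseline: no synchronization at all.
void lazy_init_no_lock(void)
{
    static _Bool __isInitialized = 0;
    if (!__isInitialized)       // only true on the first call
        __isInitialized = 1;

    __count++;
}

// Guards the initialization check with a pthread mutex.
void lazy_init_lock(void)
{
    static _Bool __isInitialized = 0;

    pthread_mutex_lock(&__lock);
    if (!__isInitialized)
        __isInitialized = 1;
    pthread_mutex_unlock(&__lock);

    __count++;
}

// Guards the initialization check with a POSIX semaphore.
void lazy_init_semaphore(void)
{
    static _Bool __isInitialized = 0;

    sem_wait(&sem);
    if (!__isInitialized)
        __isInitialized = 1;
    sem_post(&sem);

    __count++;
}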

And the results were certainly much more interesting than the code. Pretty much, I called each test from an infinite loop for 5 seconds, and clocked how many times I was able to complete the call.

    while(1)
    {
        if (test == 0)
            lazy_init_no_lock();
        else if (test == 1)
            lazy_init_lock();
        else if (test == 2)
            lazy_init_semaphore();
        else
            break;
    }
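
The loop above does not show how the 5-second cutoff is enforced. Here is a minimal sketch of one way to do it, assuming a POSIX monotonic clock is available. The names count_calls_for, sample_work, and work_counter are hypothetical, and this is not necessarily how the original test was timed.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and CLOCK_MONOTONIC */
#include <assert.h>
#include <time.h>

/* Current reading of the monotonic clock, in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Calls fn repeatedly for at least `seconds` and returns how many calls
 * completed. The clock is read only once per batch of 1024 calls so the
 * timekeeping itself does not dominate the measurement. */
unsigned long long count_calls_for(double seconds, void (*fn)(void))
{
    const unsigned long long batch = 1024;
    unsigned long long calls = 0;
    double deadline;

    assert(fn != NULL);
    deadline = now_seconds() + seconds;

    do {
        for (unsigned long long i = 0; i < batch; i++)
            fn();
        calls += batch;
    } while (now_seconds() < deadline);

    return calls;
}

/* Stand-in workload so the sketch can be exercised on its own. */
volatile unsigned long long work_counter = 0;

void sample_work(void)
{
    work_counter++;
}
```

With something like this, each test could be measured as count_calls_for(5.0, lazy_init_lock). Because the clock is only checked between batches, the run can overshoot the deadline by up to one batch.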

On average, having no thread safety yielded 440,909,468 calls, thread-safety with mutexes yielded 98,324,839 calls, and semaphores brought up the rear with 1,197,981 calls. It made a nice little graph:
Graph

So I guess the moral of the story is, if you don’t need super powers when doing thread-safety, use mutexes. And if you know your code is not going to run on a multithreaded system, don’t wrap all your critical regions in mutexes or semaphores.
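
For reference, here is a minimal sketch of the mutex-guarded lazy initialization pattern that the test functions imitate, this time exercised from several threads. All of the names (lazy_init_once, init_runs, run_lazy_init_demo) are hypothetical and the one-time setup is just a counter, so treat it as an illustration rather than production code.

```c
#include <pthread.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int is_initialized = 0;
static int init_runs = 0;   /* how many times the setup actually ran */

/* Thread-safe lazy initialization: the first caller performs the setup,
 * every later caller finds the flag already set and skips it. */
void lazy_init_once(void)
{
    pthread_mutex_lock(&init_lock);
    if (!is_initialized) {
        init_runs++;        /* stand-in for the real one-time setup */
        is_initialized = 1;
    }
    pthread_mutex_unlock(&init_lock);
}

/* Thread body: call the initializer many times. */
static void *caller(void *unused)
{
    (void)unused;
    for (int i = 0; i < 1000; i++)
        lazy_init_once();
    return NULL;
}

/* Hammers lazy_init_once from up to four threads and returns how many
 * times the setup ran. */
int run_lazy_init_demo(void)
{
    pthread_t threads[4];
    int started = 0;

    for (int i = 0; i < 4; i++)
        if (pthread_create(&threads[started], NULL, caller, NULL) == 0)
            started++;

    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return init_runs;
}
```

The mutex is what guarantees the setup runs exactly once no matter how the threads interleave. Without it, two threads could both see the flag as unset and both run the setup.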

How MPRuntime performs

Over spring break, one of the other things I did was run some performance tests of MPRuntime, comparing how it performed against Apple’s Foundation and CoreFoundation frameworks. The results I found were actually quite interesting.

Click here or the button below to see a full report.
View the performance report

Leaving me wanting

The SDK is downloaded, I wrote my first iPhone app, but…

I can’t test it on the iPhone.

Unfortunately, in order to be an iPhone developer, you have to have a certificate that code-signs your software. In order to obtain this certificate, it is necessary to enroll in the iPhone Developer Program, which costs anywhere from $99 to $299. At this point, that isn’t worth it for me.

Conclusion: iPhone development to be continued at a later time.

iPhone SDK on the way

A beta of the iPhone SDK was released this afternoon at about 1 CST. I immediately tried to get on the site, only to be greeted by a friendly page that said:

Safari can’t open the page “http://developer.apple.com/iphone/program/” because the server unexpectedly dropped the connection, which sometimes occurs when the server is busy. You might be able to open the page later.

This continued all afternoon until just about 10 minutes ago, when I was finally able to get through!
My downloads window

Using counting semaphores on Mac OS X

The POSIX realtime extension includes counting semaphores, specified by functions such as sem_wait, sem_post, and sem_init. Unfortunately, OS X does not implement sem_init, which creates unnamed counting semaphores; it only supports named semaphores, which are created with sem_open. This certainly creates quite an issue if you want the ease of use (or power, in some cases) of an unnamed counting semaphore.
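
Since named semaphores are the variant that is available, one workaround is to create a named semaphore and unlink its name right away, so it behaves much like a private one. Below is a minimal sketch of that idea. The function name open_private_semaphore and the "/demo-sem-" naming scheme are assumptions made up for this example.

```c
#include <fcntl.h>      /* O_CREAT, O_EXCL */
#include <semaphore.h>
#include <stdio.h>      /* snprintf */
#include <unistd.h>     /* getpid */

/* Creates a named counting semaphore with the given initial value and
 * immediately unlinks the name, so no other process can open it later.
 * Returns SEM_FAILED on error. */
sem_t *open_private_semaphore(unsigned int initial_value)
{
    char name[32];
    sem_t *sem;

    /* Hypothetical naming scheme: include the pid to avoid collisions. */
    snprintf(name, sizeof name, "/demo-sem-%ld", (long)getpid());

    sem = sem_open(name, O_CREAT | O_EXCL, 0600, initial_value);
    if (sem != SEM_FAILED)
        sem_unlink(name);   /* the open handle stays valid after unlink */
    return sem;
}
```

The returned pointer is used with sem_wait and sem_post as usual, and is released with sem_close rather than sem_destroy.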

I thought I was at a loss for using them until I stumbled upon the Mach kernel primitive semaphores, which happen to be, thank goodness, counting semaphores. The Mach kernel specifies the following functions for using semaphores, as defined by <mach/semaphore.h> and <mach/task.h> (Note: I have only listed the important ones that have POSIX relatives):

kern_return_t semaphore_create(task_t task, semaphore_t *semaphore,
    int policy, int value)
kern_return_t semaphore_signal(semaphore_t semaphore)
kern_return_t semaphore_signal_all(semaphore_t semaphore)
kern_return_t semaphore_wait(semaphore_t semaphore)
kern_return_t semaphore_destroy(task_t task, semaphore_t semaphore)
kern_return_t semaphore_signal_thread(semaphore_t semaphore,
    thread_act_t thread_act)

By looking at them, you can probably guess what they do, and if you can’t, you can always look them up in the documentation, or click here to view it on Apple’s developer website.

You may, however, be wondering what the task_t type is, and how you might possibly retrieve the correct task_t value. But never fear: in <mach/task.h>, there are two functions for getting the current task, current_task and mach_task_self, which both return the current task, but in different forms. For user-space programming, you should use mach_task_self, which returns a Mach port name referring to your own task. If you are doing kernel programming, however, you would not want to use mach_task_self, and instead you want to use current_task. Do not attempt to use current_task in user-space programming: linking will fail, since the symbol is only available to kernel extensions.

Below is an example using the counting semaphore primitives. Basically, we are counting up and counting down a static variable by the same amount on two separate threads. With the semaphores, the final value should be zero. If you remove the semaphores, you will not get zero, and you will get some arbitrary value.

#include <stdio.h>
#include <pthread.h>

#include <mach/semaphore.h>
#include <mach/task.h>

static int x = 0;
semaphore_t sem = 0;

void * countUp(void *param)
{
    semaphore_wait(sem);

    unsigned i = 0;
    for (i = 0; i < 100000000; i++)
        x++;

    semaphore_signal(sem);
    return NULL;
}

void * countDown(void * param)
{
    semaphore_wait(sem);
    unsigned i = 0;
    for (i = 0; i < 100000000; i++)
        x--;

    semaphore_signal(sem);
    return NULL;
}

int main (int argc, const char * argv[])
{
    // Create our semaphore. SYNC_POLICY_FIFO is how we handle threads that are
    // waiting on the semaphore. I am pretty sure that POSIX uses FIFO, so I
    // think it is best to use that here, though there are other options defined
    // in the Mach headers.
    int initialValue = 1;
    semaphore_create(mach_task_self(), &sem, SYNC_POLICY_FIFO, initialValue);

    pthread_t up, down;
    pthread_create(&up, NULL, countUp, NULL);
    pthread_create(&down, NULL, countDown, NULL);

    pthread_join(up, NULL);
    pthread_join(down, NULL);

    printf("%d\n", x);  // Should print 0.

    return 0;
}

Now the only problem with this code is that it is not platform independent, and will not compile on Linux. If you wanted, you could make platform independent versions of semaphore creating, posting, and waiting functions, as implemented in the example below:

#ifdef __APPLE__
#include <mach/semaphore.h>
#include <mach/task.h>
#else
#include <semaphore.h>
#endif

void platform_sem_create(void * semStructure, int initialValue)
{
    #ifdef __APPLE__
    semaphore_create(mach_task_self(), (semaphore_t *)semStructure, SYNC_POLICY_FIFO, initialValue);
    #else
    int pshared = 0;
    sem_init((sem_t *)semStructure, pshared, initialValue);
    #endif
}

void platform_sem_signal(void * semStructure)
{
    #ifdef __APPLE__
    semaphore_signal(*((semaphore_t *)semStructure));
    #else
    sem_post((sem_t *)semStructure);
    #endif
}

void platform_sem_wait(void * semStructure)
{
    #ifdef __APPLE__
    semaphore_wait(*((semaphore_t *)semStructure));
    #else
    sem_wait((sem_t *)semStructure);
    #endif
}

Then in your code, just use platform_sem_create, platform_sem_signal, and platform_sem_wait instead of functions like sem_wait, semaphore_create, etc.
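
Here is a minimal sketch of how such wrappers might be used, with two changes that are my own assumptions rather than part of the code above: a platform_sem_t typedef replaces the void * parameter, and each wrapper returns 0 on success or -1 on failure. The checked_sem_* names, the gate and balance variables, and run_up_and_down are all hypothetical.

```c
#include <pthread.h>
#ifdef __APPLE__
#include <mach/mach.h>
typedef semaphore_t platform_sem_t;
#else
#include <semaphore.h>
typedef sem_t platform_sem_t;
#endif

/* Each wrapper returns 0 on success and -1 on failure. */
int checked_sem_create(platform_sem_t *sem, int initial_value)
{
#ifdef __APPLE__
    return semaphore_create(mach_task_self(), sem, SYNC_POLICY_FIFO,
                            initial_value) == KERN_SUCCESS ? 0 : -1;
#else
    return sem_init(sem, 0, (unsigned int)initial_value);
#endif
}

int checked_sem_wait(platform_sem_t *sem)
{
#ifdef __APPLE__
    return semaphore_wait(*sem) == KERN_SUCCESS ? 0 : -1;
#else
    return sem_wait(sem);
#endif
}

int checked_sem_signal(platform_sem_t *sem)
{
#ifdef __APPLE__
    return semaphore_signal(*sem) == KERN_SUCCESS ? 0 : -1;
#else
    return sem_post(sem);
#endif
}

int checked_sem_destroy(platform_sem_t *sem)
{
#ifdef __APPLE__
    return semaphore_destroy(mach_task_self(), *sem) == KERN_SUCCESS ? 0 : -1;
#else
    return sem_destroy(sem);
#endif
}

static platform_sem_t gate;
static long balance = 0;

/* Adds *arg to the shared balance many times, one guarded step at a time. */
static void *adjust(void *arg)
{
    long step = *(long *)arg;
    for (int i = 0; i < 100000; i++) {
        checked_sem_wait(&gate);
        balance += step;
        checked_sem_signal(&gate);
    }
    return NULL;
}

/* Counts up and down on two threads; returns the final balance, or -1
 * if the semaphore could not be created. */
long run_up_and_down(void)
{
    long up = 1, down = -1;
    pthread_t t_up, t_down;

    balance = 0;
    if (checked_sem_create(&gate, 1) != 0)
        return -1;

    pthread_create(&t_up, NULL, adjust, &up);
    pthread_create(&t_down, NULL, adjust, &down);
    pthread_join(t_up, NULL);
    pthread_join(t_down, NULL);

    checked_sem_destroy(&gate);
    return balance;
}
```

Returning a status code lets the caller notice when creation fails, which the void wrappers cannot report. The semaphore here guards each individual update instead of the whole loop, so the two threads genuinely interleave.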

And that’s the magic to using counting semaphores on Mac OS X.

MPRuntime now has autorelease

I just finished implementing an autorelease pool opaque type for the MPRuntime, and to be honest, it’s pretty sweet. Basically, an autorelease pool is a list that you put object pointers in, and when you are done with the pool and release it, it releases all of the objects in the pool. This can greatly reduce code size and also defer expensive memory free calls to a single moment.
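
To illustrate the idea (and only the idea), here is a minimal sketch of a pool as a growable list of pointers that each get released once when the pool is drained. This is not MPRuntime’s implementation; every name in it (DemoObject, DemoPool, demo_autorelease, and so on) is hypothetical.

```c
#include <stdlib.h>

/* A tiny reference-counted object (hypothetical, not MPRuntime's layout). */
typedef struct {
    int retain_count;
    int value;
} DemoObject;

int live_objects = 0;   /* counts objects not yet freed, for the demo */

DemoObject *demo_create(int value)
{
    DemoObject *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->retain_count = 1;
    obj->value = value;
    live_objects++;
    return obj;
}

void demo_release(DemoObject *obj)
{
    if (obj != NULL && --obj->retain_count == 0) {
        free(obj);
        live_objects--;
    }
}

/* The pool is just a growable list of pointers awaiting one release each. */
typedef struct {
    DemoObject **items;
    size_t count;
    size_t capacity;
} DemoPool;

/* Registers obj with the pool and returns it, so calls can be nested. */
DemoObject *demo_autorelease(DemoPool *pool, DemoObject *obj)
{
    if (obj == NULL)
        return NULL;
    if (pool->count == pool->capacity) {
        size_t new_capacity = pool->capacity ? pool->capacity * 2 : 8;
        DemoObject **grown = realloc(pool->items,
                                     new_capacity * sizeof *grown);
        if (grown == NULL)
            return obj;     /* not pooled: the caller still owns obj */
        pool->items = grown;
        pool->capacity = new_capacity;
    }
    pool->items[pool->count++] = obj;
    return obj;
}

/* Releases every pooled object once, then frees the list itself. */
void demo_pool_drain(DemoPool *pool)
{
    for (size_t i = 0; i < pool->count; i++)
        demo_release(pool->items[i]);
    free(pool->items);
    pool->items = NULL;
    pool->count = 0;
    pool->capacity = 0;
}
```

A pool starts out zeroed (DemoPool pool = {0};). Because demo_autorelease returns its argument, a freshly created object can be passed straight into another call without keeping its pointer around.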

Non-autorelease pool way:
Below is the old code for creating an array, and then adding 100 integers to it.

// Create an array and add a bunch of integers to it.
MPArrayRef array = MPArrayCreate(&kMPTypeArrayCallbacks);
for (unsigned i = 0; i < 100; i++)
{
    MPNumberRef num = MPNumberCreateWithUnsigned(i);
    MPArrayAppendValue(array, num);
    MPRelease(num);
}

// Release our array.
MPRelease(array);

As you can see, it’s quite a hassle to keep the pointer around so you can call release on it. The example above isn’t the greatest because there are no nested function calls and such, but just imagine multiplying the number of MPRelease statements and how many pointers you would have to keep around. So instead, as in the example below, you can simply autorelease the object so you don’t have to keep that pointer around, and you can pass newly created objects directly as arguments to function calls and whatnot.

Autorelease pool way:

// Create the pool.
MPAutoreleasePoolRef pool = MPAutoreleasePoolCreate(0);

// Create an array and add a bunch of integers to it.
MPArrayRef array = MPAutorelease(MPArrayCreate(&kMPTypeArrayCallbacks));
for (unsigned i = 0; i < 100; i++)
    MPArrayAppendValue(array, MPAutorelease(MPNumberCreateWithUnsigned(i)));

// Release the pool, thus draining it.
MPRelease(pool);

Now that’s what I call sweet!

The files on the MPRuntime page contain the autorelease pool.