Finds by @samoli

Ruby/Rails developer and designer in London.

Here are articles that I've read and found interesting, mostly from places like Hacker News and a collection of RSS feeds that I follow. It's mainly here for my own records but you're welcome to follow me.

I'm starring items I find in Google Reader (using Byline) and using ifttt to publish them to Tumblr. This way I can read and reblog articles on the tube (London underground) where there is no Internet connection.

You can also subscribe via email if you like

N.B I'm investigating why the full contents of the post are sometimes posted. They are intended to be links but sometimes the whole post comes through. Also I'd like a way to add a note to the post but since Google removed most of the useful features in reader, I'm not sure how I can do this.

— @samoli on Twitter.

When I started writing The Spirit War (Eli novel #4), I had a bit of a problem. I had a brand new baby and my life (like every new mother’s life) was constantly on the verge of shambles. I paid for a sitter four times a week so I could get some writing time, and I guarded these hours like a mama bear guards her cubs - with ferocity and hiker-mauling violence. To keep my schedule and make my deadlines, I needed to write 4000 words during each of these carefully arranged sessions. I thought this would be simple. After all, before I quit my job to write full time I’d been writing 2k a day in the three hours before work. Surely with 6 hours of baby free writing time, 4k a day would be nothing….

But (of course), things didn’t work out like that. Every day I’d sit down to add 4000 words to my new manuscript. I was determined, I was experienced, I knew my world. There was no reason I couldn’t get 4k down. But every night when I hauled myself away, my word count had only increased by 2k, the same number of words I’d been getting before I quit my day job. 

Needless to say, I felt like a failure. Here I was, a professional writer with three books about to come out, and I couldn’t even beat the writing I’d done before I went pro. At first I made excuses, this novel was the most complicated of all the Eli books I’d written, I was tired because my son thinks 4am is an awesome time to play, etc. etc. But the truth was there was no excuse. I had to find a way to boost my word count, and with months of 2k a day dragging me down, I had to do it fast. So I got scientific. I gathered data and tried experiments, and ultimately ended up boosting my word count to heights far beyond what I’d thought was possible, and I did it while making my writing better than ever before.

When I told people at ConCarolinas that I’d gone from writing 2k to 10k per day, I got a huge response. Everyone wanted to know how I’d done it, and I finally got so sick of telling the same story over and over again that I decided to write it down here.

So, once and for all, here’s the story of how I went from writing 500 words an hour to over 1500, and (hopefully) how you can too:


A quick note: There are many fine, successful writers out there who equate writing quickly with being a hack. I firmly disagree. My methods remove the dross, the time spent tooling around lost in your daily writing, not the time spent making plot decisions or word choices. This is not a choice between ruminating on art or churning out the novels for gross commercialism (though I happen to like commercial novels), it’s about not wasting your time for whatever sort of novels you want to write.
Drastically increasing your words per day is actually pretty easy, all it takes is a shift in perspective and the ability to be honest with yourself (which is the hardest part). Because I’m a giant nerd, I ended up creating a metric, a triangle with three core requirements: Knowledge, Time, and Enthusiasm. Any one of these can noticeably boost your daily output, but all three together can turn you into a word machine. I never start writing these days unless I can hit all three.

Update! The talented Vicky Teinaki made a graphic of this metric and let me use it! She is awesome!



Side 1: Knowledge, or Know What You’re Writing Before You Write It
The first big boost to my daily wordcount happened almost by accident. Used to be I would just pop open the laptop and start writing. Now, I wasn’t a total make-it-up-as-you-go writer. I had a general plot outline, but my scene notes were things like “Miranda and Banage argue” or “Eli steals the king.” Not very useful, but I knew generally what direction I was writing in, and I liked to let the characters decide how the scene would go. Unfortunately, this meant I wasted a lot of time rewriting and backtracking when the scene veered off course. 

This was how I had always written, it felt natural to me. But then one day I got mired in a real mess. I had spent three days knee deep in the same horrible scene. I was drastically behind on my wordcount, and I was facing the real possibility of missing my deadline… again. It was the perfect storm of all my insecurities, the thought of letting people down mixed with the fear that I really didn’t know what I was doing, that I wasn’t a real writer at all, just an amateur pretending to be one. But as I got angrier and angrier with myself, I looked down at my novel and suddenly realized that I was being an absolute idiot. Here I was, desperate for time, floundering in a scene, and yet I was doing the hardest work of writing (figuring out exactly what needs to happen to move the scene forward in the most dramatic and exciting way) in the most time consuming way possible (ie, in the middle of the writing itself). 

As soon as I realized this, I stopped. I closed my laptop and got out my pad of paper. Then, instead of trying to write the scene in the novel as I had been, I started scribbling a very short hand, truncated version the scene on the paper. I didn’t describe anything, I didn’t do transitions. I wasn’t writing, I was simply noting down what I would write when the time came. It took me about five minutes and three pages of notebook paper to untangle my seemingly unfixable scene, the one that had just eaten three days of my life before I tried this new approach. Better still, after I’d worked everything out in shorthand I was able to dive back into the scene and finish it in record time. The words flew onto the screen, and at the end of that session I’d written 3000 words rather than 2000, most of them in that last hour and a half.

Looking back, it was so simple I feel stupid for not thinking of it sooner. If you want to write faster, the first step is to know what you’re writing before you write it. I’m not even talking about macro plot stuff, I mean working out the back and forth exchanges of an argument between characters, blocking out fights, writing up fast descriptions. Writing this stuff out in words you actually want other people to read, especially if you’re making everything up as you go along, takes FOREVER. It’s horribly inefficient and when you get yourself in a dead end, you end up trashing hundreds, sometimes thousands of words to get out. But jotting it down on a pad? Takes no time at all. If the scene you’re sketching out starts to go the wrong way, you see it immedeatly, and all you have to do is cross out the parts that went sour and start again at the beginning. That’s it. No words lost, no time wasted. It was god damn beautiful.

Every writing session after this realization, I dedicated five minutes (sometimes more, never less) and wrote out a quick description of what I was going to write. Sometimes it wasn’t even a paragraph, just a list of this happens then this then this. This simple change, these five stupid minutes, boosted my wordcount enormously. I went from writing 2k a day to writing 5k a day within a week without increasing my 5 hour writing block. Some days I even finished early.

Of the three sides of the triangle, I consider knowledge to be the most important. This step alone more than doubled my word count. If you only want to try one change at a time, this is the one I recommend the most.

Side 2: Time
Now that I’d had such a huge boost from one minor change, I started to wonder what else I could do to jack my numbers up even higher. But as I looked for other things I could tweak, I quickly realized that I knew embarrassingly little about how I actually wrote my novels. I’d kept no records of my progress, I couldn’t even tell you how long it took me to write any of my last three novels beyond broad guesstimations, celebratory blog posts, and vague memories of past word counts. It was like I started every book by throwing myself at the keyboard and praying for a novel to shoot out of my fingers before the deadline. And keep in mind this is my business. Can you imagine a bakery or a freelance designer working this way? Never tracking hours or keeping a record of how long it took me to actually produce the thing I was selling? Yeah, pretty stupid way to work.

If I was going to boost my output (or know how long it took me to actually write a freaking novel), I had to know what I was outputting in the first place. So, I started keeping records. Every day I had a writing session I would note the time I started, the time I stopped, how many words I wrote, and where I was writing on a spreadsheet. I did this for two months, and then I looked for patterns.

Several things were immediately clear. First, my productivity was at its highest when I was in a place other than my home. That is to say, a place without internet. The afternoons I wrote at the coffee shop with no wireless were twice as productive as the mornings I wrote at home. I also saw that, while butt in chair time is the root of all writing, not all butt in chair time is equal. For example, those days where I only got one hour to write I never managed more than five hundred words in that hour. By contrast, those days I got five hours of solid writing I was clearing close to 1500 words an hour. The numbers were clear: the longer I wrote, the faster I wrote (and I believe the better I wrote, certainly the writing got easier the longer I went). This corresponding rise of wordcount and writing hours only worked up to a point, though. There was a definite words per hour drop off around hour 7 when I was simply too brain fried to go on. 

But these numbers are very personal, the point I’m trying to make is that by recording my progress every day I had the data I needed to start optimizing my daily writing. Once I had my data in hand, I rearranged my schedule to make sure my writing time was always in the afternoon (my most prolific time according to my sheet, which was a real discovery. I would have bet money I was better in the morning.), always at my coffee shop with no internet, and always at least 4 hours long. Once I set my time, I guarded it viciously, and low and behold my words per day shot up again. This time to an average of 6k-7k per writing day, and all without adding any extra hours. All I had to do was discover what made good writing time for me and then make sure the good writing time was the time I fought hardest to get. 

Even if you don’t have the luxury of 4 uninterrupted hours at your prime time of day, I highly suggest measuring your writing in the times you do have to write. Even if you only have 1 free hour a day, trying that hour in the morning some days and the evening on others and tracking the results can make sure you aren’t wasting your precious writing time on avoidable inefficiencies. Time really does matter.

Side 3: Enthusiasm
I was flying high on my new discoveries. Over the course of two months I’d jacked my daily writing from 2k per day to 7k with just a few simple changes and was now actually running ahead of schedule for the first time  in my writing career. But I wasn’t done yet. I was absolutely determined I was going to break the 10k a day barrier.

I’d actually broken it before. Using Knowledge and Time, I’d already managed a few 10k+ days, including one where I wrote 12,689 words, or two chapters, in 7 hours. To be fair, I had been writing outside of my usual writing window in addition to my normal writing on those days, so it wasn’t a total words-per-hour efficiency jump. But that’s the great thing about going this fast, the novel starts to eat you and you find yourself writing any time you can just for the pure joy of it. Even better, on the days where I broke 10k, I was also pulling fantastic words-per-hour numbers, 1600 - 2000 words per hour as opposed to my usual 1500. It was clear these days were special, but I didn’t know how. I did know that I wanted those days to become the norm rather than the exception, so I went back to my records (which I now kept meticulously) to find out what made the 10k days different.


The answer was head-slappingly obvious. Those days I broke 10k were the days I was writing scenes I’d been dying to write since I planned the book. They were the candy bar scenes, the scenes I wrote all that other stuff to get to. By contrast, my slow days (days where I was struggling to break 5k) corresponded to the scenes I wasn’t that crazy about.

This was a duh moment for me, but it also brought up a troubling new problem. If I had scenes that were boring enough that I didn’t want to write them, then there was no way in hell anyone would want to read them. This was my novel, after all. If I didn’t love it, no one would.

Fortunately, the solution turned out to be, yet again, stupidly simple. Every day, while I was writing out my little description of what I was going to write for the knowledge component of the triangle, I would play the scene through in my mind and try to get excited about it. I’d look for all the cool little hooks, the parts that interested me most, and focus on those since they were obviously what made the scene cool. If I couldn’t find anything to get excited over, then I would change the scene, or get rid of it entirely. I decided then and there that, no matter how useful a scene might be for my plot, boring scenes had no place in my novels. 


This discovery turned out to be a fantastic one for my writing. I trashed and rewrote several otherwise perfectly good scenes, and the effect on the novel was amazing. Plus, my daily wordcount numbers shot up again because I was always excited about my work. Double bonus!

Life On 10k A Day
With all three sides of my triangle now in place, I was routinely pulling 10-12k per day by the time I finished Spirits’ End, the fifth Eli novel. I was almost 2 months ahead of where I’d thought I’d be, and the novel had only taken me 3 months to write rather than the 7 months I’d burned on the Spirit War (facts I knew now that I was keeping records). I was ahead of schedule with plenty of time to do revisions before I needed to hand the novel in to my editor, and I was happier with my writing than ever before. There were several days toward the end when I’d close my laptop and stumble out of the coffee shop feeling almost drunk on writing. I felt like I was on top of the world, utterly invincible and happier than I’ve ever been. Writing that much that quickly was like taking some kind of weird success opiate, and I was thoroughly addicted. Once you’ve hit 10k a day for a week straight, anything less feels like your story is crawling.

Now, again, 10k a day is my high point as a professional author whose child is now in daycare (PRICELESS). I write 6 - 7 hours a day, usually 2 in the morning and 4-5 in the afternoon, five days a week. Honestly, I don’t see how anyone other than a full time novelist could pull those kind of hours, but that doesn’t mean you have to be a pro to drastically increase your daily word count.

So 10k might be the high end of the spectrum, but of the people I’ve told about this (a lot) who’ve gotten back to me (not nearly as many), most have doubled their word counts by striving to hit all three sides of the triangle every time they write. This means some have gone from 1k a day to 2k, or 2k to 4k. Some of my great success with increasing my wordcount is undoubtedly a product of experience, as I also hit my million word mark somewhere in the fifth Eli novel. Even so, I believe most of the big leaps in efficiency came from changing the way I approached my writing. Just as changing your lifestyle can help you lose a hundred pounds, changing they way you sit down to write can boost your words per hour in astonishing ways.

If you’re looking to get more out of your writing time, I really hope you try my triangle. If you do, please write me (or comment below) and let me know. Even if it doesn’t work (especially if it doesn’t work) I’d love to hear about it. Also, if you find another efficiency hack for writing, let me know about that too! There’s no reason our triangle can’t be a square, and I’m always looking for a way to hit 15k a day :D. 

Again, I really hope this helps you hit your goals. Good luck with your writing!

- Rachel Aaron

UPDATE, November 25, 2011: Wow, I never expected this to be so popular! I’m so glad it’s been helpful for people! Just to update you all on my progress, since writing this post, I’ve completed, edited (and re-edited, and rewrote again, sigh, still working on speeding that part up), and sent out 2 complete novels, one of which was written in 12 days. You can see the worksheets and stats for that one in my 12 Days of Glory post.

Also, if anyone’s interested in how I plot my novels, I’ve posted that here: How I Plot A Novel in 5 Steps! Again, I’m so glad this was helpful to all of you. Thank you so much for all the lovely feedback!

- R

UPDATE, February 7, 2012: I now have a post on editing novels as well called Editing for People Who Hate Editing if you’re at that stage and want to see how I do it.



[shared via Hacker News]

Mar 26

RSpec 2.9.0 released … GitHub Ruby Styleguide … technoweenie’s blog post, “Ending the Mass Assignment Party” … Life Without Gemsets … Custom Bundler Groups … Avdi Grimm’s _Objects on Rails_ Book Available Free

Listen to this episode on Ruby5

This episode is sponsored by New Relic
New Relic is the all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.

Ruby5 News Byte: RSpec 2.9.0 Released
RSpec 2.9.0 has been released.

GitHub Ruby Styleguide
Rejoice, Ruby troglodytes! GitHub has come down from the mountain with the GitHub Ruby Styleguide. Read it! Live it! Love it!

Ending the Mass Assignment Party
technoweenie’s blog post, “Ending the Mass Assignment Party”, and the TainedHash gem for guarding against mass assignment vulnerabilities.

Life Without Gemsets
If you’re using Bundler on all of your projects and you’re still using RVM gemsets, it’s time to move on. Bundler don’t need no stinkin’ gemsets.

Bundler Groups
Make your Gemfile tell the whole story by taking advantage of custom Bundler groups.

Objects on Rails Book Available Free
Avdi Grimm’s “Objects on Rails” e-book is out of beta, and now available to read online, for free.

[shared via Ruby5]

Mar 23

You probably know about this Factory Girl definition syntax:

FactoryGirl.define do
  factory :user do
    name 'Connie Customer'
  end
end

But did you know about this Factory Girl invocation syntax?

setup do
  @user = create(:user)
end

Or:

setup do
  @user = build(:user)
end

Or:

setup do
  post :create, user: attributes_for(:user)
end

It’s in there.

Configuration for Test::Unit / Shoulda:

class ActiveSupport::TestCase
  include FactoryGirl::Syntax::Methods
end

Configuration for RSpec:

RSpec.configure do |config|
  config.include FactoryGirl::Syntax::Methods
end

Configuration for Cucumber:

World Factory::Syntax::Methods

[shared via giant robots smashing into other giant robots]

Mar 12

Since version 10.5 “Leopard”, Mac OS X has had DTrace, a tool used for performance analysis and troubleshooting. It provides data for Apple’s Instruments tool, as well as a collection of command line tools that are implemented as DTrace scripts. I’m familiar with the latter as I wrote the originals for the DTraceToolkit, which Apple then customized and enhanced for Mac OS X where they are shipped by default (great!). I use them regularly to answer this question:

why is my MacBook slow?

I work in an office where everyone has MacBook Pros, and “why is my MacBook slow?” is a common question. Applications can become slow or unresponsive while waiting for CPU work, memory requests or disk I/O to complete.

For people who try to ignore the slowdown, the question can become:

why is my MacBook fan so loud?

Standard performance analysis tools like Activity Monitor and top(1) (and any third-party tools based on the same foundation) can’t tell you some key information about activity on your system, such as how much CPU consumption is caused by short-lived processes, or which processes are causing disk I/O. DTrace, however, can see (just about) everything.

In this post, I’ll cover the top ten Mac OS X DTrace scripts that I use for figuring out why laptops are slow or why applications are misbehaving. Most of these scripts are already installed, a few are from the new DTrace book.

How to use DTrace

If you’ve never run a DTrace script before or even used the command line, here’s a basic walkthrough:

Open Terminal

Run the “Terminal” application. You can find it here in Finder:

You can also type “terminal” in Spotlight (the magnifying glass in the top right corner of your Mac’s screen), which should find it. I usually drag it to my Dock so it’s easy to find later:

Customize Terminal

When you first run Terminal, it’ll probably look like this:

I find the default font small and hard to read. The size of the window in terms of characters is also small (80 columns by 24 rows), presumably to pay homage to original Unix terminals of that size. (Why Unix? Mac OS X is Unix under the hood: the Darwin kernel).

You can adjust the font to your liking in Terminal->Preferences->Text. I use Monaco 13pt, with “Antialias Text” on. Under the “Window” tab is the default size, 80×24, which you can also increase later by clicking and dragging the bottom right corner of the terminal. For example, here’s my screen as I write this blog post (in a terminal-based text editor).

Running top

While DTrace can see everything, there are some things already covered by easy-to-use (and easy-to-type) tools, like top(1). Now that you have Terminal running, type “top -o cpu” and hit enter, which will refresh the screen showing top running processes. Type “q” to exit the top program. The output will look something like this:

Brendan-2:~ brendan$ top -o cpu
[...]
PID    COMMAND      %CPU TIME     #TH  #WQ  #PORT #MREGS RPRVT  RSHRD  RSIZE
67254- firefox-bin  98.8 04:53:45 62/1 1    743   9935+  374M+  78M    637M+
86550  top          15.3 00:28.60 1/1  0    24    33     1748K  264K   2324K
1068   Terminal     13.2 17:38:19 5    1    358   1326   37M    67M    61M+
61501- Google Chrom 7.9  13:40.89 6    1    93-   556-   45M-   73M    77M-
[...]

I’ve truncated the header block to just include the columns. Look at the “%CPU” column to see which processes are making the CPUs busy, and the “RSIZE” column to see who is consuming main memory. The busiest (CPU) process will be at the top, since we sorted on cpu (“-o cpu”). The top was “firefox-bin” (Mozilla Firefox) at 98.8% CPU, which is in terms of a single processor (this has two). If firefox stayed that high you could look for the responsible tab and close it down, or restart Firefox.

After top, I turn to DTrace.

Running DTrace

DTrace requires admin privileges, so to use it you’ll usually need to type in a password to authenticate, provided you have administrator access (if you aren’t sure you do, click here to see how to check).

You can run DTrace by prefixing your DTrace commands with “sudo”, which will prompt for the password the first time around (but not for some time after that). When a DTrace command is running, you usually type Ctrl-C to end it. Here’s an example:

Brendan-2:~ brendan$ sudo iosnoop
Password: password
  UID   PID D    BLOCK   SIZE       COMM PATHNAME
  503 67261 W 384070496  73728  TweetDeck ??/Cookies/Cookies.plist_tmp_67261_0.dat
  503 67261 W 384070640   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 67261 W 384070640   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 67261 W 308056800   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
  503 67261 W 308056856   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
^C
brendan@macbook:~>

If sudo says “permission denied”, check your admin status.

and here’s what happens if you forgot the “sudo”:

Brendan-2:~ brendan$ iosnoop
dtrace: failed to initialize dtrace: DTrace requires additional privileges

Scripts

The scripts follow. A note on style: if the script ends with “.d”, it’s a basic DTrace script. If it doesn’t, then it’s a shell-wrapped script that provides command line options. Use “-h” to list them.

And if you’d like to learn DTrace, I’d recommend starting by reading the basic ones.

1. iosnoop

This “traces” disk I/O execution live. Each time a disk I/O completes, a line of output is printed to summarize it, including process name and filename details:

Brendan-2:~ brendan$ sudo iosnoop
  UID   PID D    BLOCK   SIZE       COMM PATHNAME
  503 54079 R 286522800   4096 Google Chrome ??/Cache/data_2
  503 54079 R 286522808   4096 Google Chrome ??/Cache/data_2
  503 34852 W 385001216  53248      Adium ??/Default/.dat8824.01b
  503 65002 W 308056800   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
  503 65002 W 308056864   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
  503 65002 W 385001320   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 65002 W 385001320   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 65002 W 385001320   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 54079 W 385001384  12288 Google Chrome ??/Default/Cookies-journal
  503 54079 W 385001384   4096 Google Chrome ??/Default/Cookies-journal
  503 54079 W 134993856   4096 Google Chrome ??/Default/Cookies
  503 54079 W 134994056   4096 Google Chrome ??/Default/Cookies
  503 54079 W 134994176   4096 Google Chrome ??/Default/Cookies
  503 54079 W 134994224   8192 Google Chrome ??/Default/Cookies
[...]

This lets you instantly find out which applications are using the disk, and what files they are reading or writing to. Disk I/O is typically slow (for non-SSD disks), so an application calling frequent disk I/O (a dozen per second or more) may run slowly as it waits for the disk I/O to complete.

The output columns show: UID = user ID, PID = process ID (unique identifier for the process), D = direction (R = read, W = write), BLOCK = location on disk, SIZE = I/O size in bytes, COMM = process name, PATHNAME = trailing portion of file pathname.

In that output I caught Google Chrome reading from a cache file (“data_2″), and writing to cookie files (“Cookies-journal” and “Cookies”). TweetDeck also wrote to database files (“td_26_brendangregg.db” and “td_26_brendangregg.db-journal”). The “??” is where the path information ends for iosnoop (if you are a darwin programmer and want to take a swing at improving that, see the fi_pathname translator in /usr/lib/dtrace/io.d).

The “-h” option lists options (don’t need “sudo” for this):

Brendan-2:~ brendan$ iosnoop -h
USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename]
               [-m mount_point] [-n name] [-p PID]
       iosnoop          # default output
                -a      # print all data (mostly)
                -A      # dump all data, space delimited
                -D      # print time delta, us (elapsed)
                -e      # print device name
                -g      # print command arguments
                -i      # print device instance
                -N      # print major and minor numbers
                -o      # print disk delta time, us
                -s      # print start time, us
                -t      # print completion time, us
                -v      # print completion time, string
                -d device       # instance name to snoop
                -f filename     # snoop this file only
                -m mount_point  # this FS only
                -n name         # this process name only
                -p PID          # this PID only
   eg,
        iosnoop -v      # human readable timestamps
        iosnoop -N      # print major and minor numbers
        iosnoop -m /    # snoop events on filesystem / only

For tricky performance issues I often use “-stoD” to get start and end timestamps for each I/O in microseconds, and a couple of different types of I/O time calculations.

In hindsight, I should have called it diskiosnoop, since “io” could refer to different locations in the kernel I/O stack.

2. hfsslower.d

This script answers an iosnoop FAQ: why dosen’t iosnoop see my application disk I/O?

The reason is that applications rarely request disk I/O directly, rather, they access a file system that does disk I/O on their behalf. To increase performance, the file system will usually try to cache as much file data as possible in main memory (DRAM). The application may (by some notion) think that it’s doing disk I/O, but it’s actually reading from very fast DRAM, thanks to the file system. Writes can also buffer in DRAM and write to disk later, which also speeds up application performance.

The hfsslower.d script measures I/O before it is processed by the HFS+ file system (Apple’s current default file system). iosnoop measures I/O after the file system, and only if it reaches disk:

A couple of points:

  • hfsslower.d will see a lot more I/O than iosnoop, as it includes file system cache hits.
  • hfsslower.d better reflects application performance, as it measures the same latency that the application directly suffered.

The hfsslower.d script is from the DTrace book, and can be found here. To run it, you’ll need to create a text file containing the script (or pull it from the DTrace book tarball), and make the file executable from Terminal by running “chmod 755 hfsslower.d”.

This script takes an argument which is the minimum number of milliseconds to show I/O for. Here’s tracing 1ms HFS+ I/O and slower:

Brendan-2:~ brendan$ sudo ./dtbook_scripts/Chap5/hfsslower.d 1
TIME                 PROCESS          D   KB     ms FILE
2011 Sep 27 19:00:15 Google Chrome    R    0     15 data_5
2011 Sep 27 19:00:15 Google Chrome    R    0     10 data_0
2011 Sep 27 19:00:15 Google Chrome    R    0     10 data_0
2011 Sep 27 19:00:15 Google Chrome    R    0     10 data_5
2011 Sep 27 19:00:21 TweetDeck        W    0      1 td_26_brendangregg.db-journal
2011 Sep 27 19:00:25 Adium            R    0     18 Smile.png
2011 Sep 27 19:01:08 firefox-bin      W    0     17 _CACHE_001_
2011 Sep 27 19:01:36 firefox-bin      W    0      9 _CACHE_001_
2011 Sep 27 19:01:36 firefox-bin      W    2     14 _CACHE_002_
2011 Sep 27 19:01:36 firefox-bin      W    0      7 _CACHE_001_
2011 Sep 27 19:01:37 firefox-bin      W    0      1 _CACHE_001_
2011 Sep 27 19:01:40 firefox-bin      W    0      6 _CACHE_001_
2011 Sep 27 19:01:46 firefox-bin      W    0     14 _CACHE_001_
2011 Sep 27 19:01:49 firefox-bin      R   15      1 _CACHE_003_
2011 Sep 27 19:01:49 firefox-bin      W   12      7 _CACHE_003_
2011 Sep 27 19:01:49 firefox-bin      W    0      4 _CACHE_001_
2011 Sep 27 19:01:51 firefox-bin      W    0      9 _CACHE_001_
2011 Sep 27 19:01:52 firefox-bin      R   21     27 Times.dfont/..namedfork/rsrc
2011 Sep 27 19:01:52 fontd            R    0     16 annex_aux
2011 Sep 27 19:01:52 firefox-bin      W    0      5 _CACHE_001_
[...]

Columns are: TIME = time of I/O completion, PROCESS = application name, D = direction (R = read, W = write), KB = I/O size in Kbytes, ms = I/O latency in milliseconds, FILE = filename.

If you use the argument “0″, it will trace everything. If I’m chasing down slow I/O, I’ll often use an argument of “10″ for I/O slower than 10 milliseconds.

At this point you may think: if you just care about slow I/O, then just use iosnoop. That works to a point, but there can be slow I/O caused by something other than disks (file system lock contention, for example). The other advantage of the hfsslower.d script is that the measured latency matches the application pain suffered, whereas at the disk level you can only assume a correlation.

3. execsnoop

This traces the execution of new processes. This is great at identifying short-lived processes that may be caused by misbehaving applications and can slow down your system. These short-lived processes are usually too quick to be picked up by standard monitoring tools like the Activity Monitor or top(1).

To demonstrate this tool, here’s what happens when you type “man ls”:

Brendan-2:~ brendan$ sudo execsnoop -v
STRTIME                UID    PID   PPID ARGS
2011 Sep 28 20:19:18     0  67234  66312 man
2011 Sep 28 20:19:18     0  67234  66312 man
2011 Sep 28 20:19:18     0  67235  67234 sh
2011 Sep 28 20:19:18     0  67235  67234 gzip
2011 Sep 28 20:19:19     0  67236  67234 sh
2011 Sep 28 20:19:18     0  67234  66312 man
2011 Sep 28 20:19:19     0  67236  67234 gzip
2011 Sep 28 20:19:19     0  67234  66312 man
2011 Sep 28 20:19:19     0  67244  67243 less
2011 Sep 28 20:19:19     0  67242  67239 sh
2011 Sep 28 20:19:19     0  67242  67239 gzip
2011 Sep 28 20:19:19     0  67240  67238 tbl
2011 Sep 28 20:19:19     0  67241  67238 groff
2011 Sep 28 20:19:19     0  67245  67241 troff
2011 Sep 28 20:19:19     0  67246  67241 grotty

This prints a line for each new process that is executed, in a rolling output similar to the previous tools. The fields are: STRTIME = (string) timestamp, UID = user ID, PID = process ID, PPID = parent process ID, ARGS = process name (should be process + arguments, but that doesn’t yet work on Mac OS X; if you want to debug, see pr_psargs in /usr/lib/dtrace/proc.d).

Here’s what happens when I turn AirPort (wifi) off, then on:

Brendan-2:~ brendan$ sudo execsnoop -v
STRTIME                UID    PID   PPID ARGS
2011 Sep 28 20:28:00     0  67204      1 airportd
2011 Sep 28 20:28:01   503  67205    140 fontworker
2011 Sep 28 20:28:01     0  67206      1 mDNSResponderHel
2011 Sep 28 20:28:06     0     49      1 autofsd
2011 Sep 28 20:28:10     0  67208     37 ManagedClient
2011 Sep 28 20:28:31   503  67209    140 fontworker
2011 Sep 28 20:28:43     0  67210      1 airportd
2011 Sep 28 20:28:49     0  67211      1 mDNSResponderHel
2011 Sep 28 20:28:49   503  67212    140 fontworker
2011 Sep 28 20:28:51     0  67214      1 kerberosautoconf
2011 Sep 28 20:28:51     0  67215      1 kerberosautoconf
2011 Sep 28 20:28:56     0     49      1 autofsd
2011 Sep 28 20:29:01     0  67219     37 ManagedClient
2011 Sep 28 20:29:04     0  67220      1 ocspd
2011 Sep 28 20:29:16   503  67221  54079 Google Chrome He
2011 Sep 28 20:29:19   503  67222    140 fontworker

Near the end I opened up a new tab in Mozilla Firefox and another in Google Chrome. You can see a fundamental difference: Chrome creates a new process for that tab, Firefox doesn’t.

execsnoop has various options, use -h to list them all. I used -v above, to print the time.

4. opensnoop

This traces file opens and prints various details, including the time and error code when using “-ve”. I usually use it to look for failed opens, which can be a sign of misconfigured applications. Discovering their config files and resource files can also be useful.

Brendan-2:~ brendan$ sudo opensnoop -ve
STRTIME                UID    PID COMM          FD ERR PATH
2011 Sep 30 01:44:15     0     11 DirectoryServic  -1   2 /var/db/dslocal/nodes/Default/hosts
2011 Sep 30 01:44:15     0     11 DirectoryServic  20   0 /etc/hosts
2011 Sep 30 01:44:15     0  85198 dtrace         5   0 /etc/localtime
2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /Users/brendan/Library/Preferences/com.apple.menuextra.clock.plist.47C2X9o
2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /System/Library/CoreServices/Menu Extras/TimeMachine.menu/Contents/Resources/TMRotatingArrow.pdf
2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /System/Library/CoreServices/Menu Extras/Clock.menu/Contents/Resources/Clock.pdf
2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /System/Library/CoreServices/Menu Extras/TimeMachine.menu/Contents/Resources/TMRotatingArrow.pdf
2011 Sep 30 01:44:17   503  67261 TweetDeck     17   0 /Users/brendan/Library/Cookies/Cookies.plist_tmp_67261_0.dat
2011 Sep 30 01:44:17   503  67261 TweetDeck     17   0 /Users/brendan/Library/Cookies/Cookies.plist
2011 Sep 30 01:44:18   503  54079 Google Chrome  72   0 /var/folders/bU/bU0WYlnwE6KKnMBrvI5QUU+++TQ/-Tmp-/.com.google.Chrome.sQdN3D
2011 Sep 30 01:44:19     0     36 mds            9   0 .
^C

While tracing, I changed the clock in the top bar from digital:

to analogue:

The files opened to performed this can be seen by the “SystemUIServer” process, beginning with a Preferences file and then some PDFs. Huh? PDFs? What are those PDFs?

That’s bizarre. I didn’t know that those tiny icons were implemented as PDFs!

5. dtruss

The previous two tools, opensnoop and execsnoop, operate by tracing specific system calls. A system call (or “syscall”) is what an application performs to request the operating system kernel to perform privileged work, including process creation, file operations, and other I/O (eg, disk or network I/O). Syscalls are a great target for analysis with DTrace, since examining them often provides a pretty good picture of what an application is trying to do. They can also provide useful details including byte counts, file and process names, error codes, and latency.

The dtruss tool traces all types of system calls, which is very useful for general debugging, especially since Mac OS X doesn’t come with a standard syscall tracer (like Linux’s “strace” or Solaris’s “truss”). One advantage of dtruss over those other tools is that dtruss can trace multiple processes at the same time, matching on the process name “-n”. For example, tracing Firefox via its process name “firefox-bin”:

Brendan-2:~ brendan$ sudo dtruss -n firefox-bin
	PID/THRD  SYSCALL(args) 		 = return
67254/0x720286:  write_nocancel(0x7, "8\0", 0x1)		 = 1 0
67254/0x720286:  lseek(0x24, 0x75500, 0x0)		 = 480512 0
67254/0x720286:  read_nocancel(0x24, "\0", 0x200)		 = 512 0
67254/0x720286:  write_nocancel(0x7, "8\0", 0x1)		 = 1 0
67254/0x720286:  lseek(0x24, 0x73B00, 0x0)		 = 473856 0
67254/0x720286:  read_nocancel(0x24, "\0", 0x200)		 = 512 0
67254/0x746a81:  select_nocancel(0x3D, 0xB04909B8, 0xB0490938)		 = 1 0
67254/0x746a81:  read_nocancel(0x6, "8\0", 0x400)		 = 1 0
67254/0x746a81:  recvfrom_nocancel(0x36, 0xB0490C3F, 0x1)		 = -1 Err#35
67254/0x746a81:  select_nocancel(0x3D, 0xB0490998, 0xB0490918)		 = 1 0
67254/0x746a81:  sendto_nocancel(0x36, 0x20A51008, 0x292)		 = 658 0
67254/0x720286:  write_nocancel(0x7, "8\0", 0x1)		 = 1 0
67254/0x720286:  lseek(0x24, 0x74000, 0x0)		 = 475136 0
[...]

The output will be many pages, as applications commonly make frequent system calls.

dtruss can also launch and trace a program. Here the humble “ls -l” command was traced:

Brendan-2:~ brendan$ sudo dtruss ls -l hfsslower.d
-rwxr-xr-x  1 brendan  staff  1152 Jan 28  2011 hfsslower.d
SYSCALL(args) 		 = return
ioctl(0x3, 0x80086804, 0x7FFF5FBFD710)		 = 0 0
close(0x3)		 = 0 0
stat64("/usr/lib/libstdc++.6.dylib\0", 0x7FFF5FBFCB20, 0x7FFF5FBFD160)		 = 0 0
mmap(0x0, 0xD000, 0x3, 0x1002, 0x1000000, 0x7FFF00000001)		 = 0x20000 0
[...100 lines truncated...]
open_nocancel("/etc/sysinfo.conf\0", 0x0, 0x1B6)		 = -1 Err#2
lstat64_extended(0x100100768, 0x7FFF5FBFE3F0, 0x100101580)		 = 0 0
listxattr(0x100100768, 0x0, 0x0)		 = 0 0
fstat64(0x1, 0x7FFF5FBFD960, 0x7FFF5FBFDA2C)		 = 0 0
ioctl(0x1, 0x4004667A, 0x7FFF5FBFD9AC)		 = 0 0
access("/etc/localtime\0", 0x4, 0x0)		 = 0 0
open_nocancel("/etc/localtime\0", 0x0, 0x0)		 = 4 0
fstat64(0x4, 0x7FFF5FBFB8B0, 0x0)		 = 0 0
read_nocancel(0x4, "TZif\0", 0x2A64)		 = 1017 0
close_nocancel(0x4)		 = 0 0
write_nocancel(0x1, "-rwxr-xr-x  1 brendan  staff  1152 Jan 28  2011 hfsslower.d\n\0", 0x3C)		 = 60 0
fchdir(0x3, 0x100100000, 0xFC080)		 = 0 0
fchdir(0x3, 0x0, 0x100800000)		 = 0 0
close_nocancel(0x3)		 = 0 0

100 lines were trimmed to keep that example short.

dtruss supports various options, including the printing of system call timing for use when analyzing performance.

Brendan-2:~ brendan$ dtruss -h
USAGE: dtruss [-acdefholLs] [-t syscall] { -p PID | -n name | command }

          -p PID          # examine this PID
          -n name         # examine this process name
          -t syscall      # examine this syscall only
          -a              # print all details
          -c              # print syscall counts
          -d              # print relative times (us)
          -e              # print elapsed times (us)
          -f              # follow children
          -l              # force printing pid/lwpid
          -o              # print on cpu times
          -s              # print stack backtraces
          -L              # don't print pid/lwpid
          -b bufsize      # dynamic variable buf size
   eg,
       dtruss df -h       # run and examine "df -h"
       dtruss -p 1871     # examine PID 1871
       dtruss -n tar      # examine all processes called "tar"
       dtruss -f test.sh  # run test.sh and follow children

I’ve used the “-e” option many times to figure out what’s slowing down an application. It shows the elapsed time for the system call in microseconds:

Brendan-2:~ brendan$ dtruss -e ls -l hfsslower.d 
-rwxr-xr-x  1 brendan  staff  1152 Jan 28  2011 hfsslower.d
 ELAPSD SYSCALL(args) 		 = return
    245 mmap(0x10000B000, 0x2000, 0x5, 0x12, 0x3, 0x7FFF00000001)		 = 0xB000 0
     11 mmap(0x10000D000, 0x1000, 0x3, 0x12, 0x3, 0x7FFF00000001)		 = 0xD000 0
     10 mmap(0x10000E000, 0x1F10, 0x1, 0x12, 0x3, 0x7FFF00000001)		 = 0xE000 0
     31 open("/dev/dtracehelper\0", 0x2, 0x7FFF5FC45370)		 = 3 0
     40 __sysctl(0x7FFF5FBFD5B0, 0x2, 0x7FFF5FBFD5DC)		 = 0 0

I then look down the “ELAPSD” column for the largest times.

A dtruss FAQ is: why is the output in the wrong order?

As with other DTrace “tracing” style tools (that print lines of output as they occur) the output can be shuffled slightly due to CPU buffering on multi-CPU systems. The “-d” option for relative times (since program start) is useful for cases where the output order is important. You can then post-sort by that column so that the dtruss output is in the correct order. (For example, by redirecting the output to a file, then using the sort(1) command on that column.)

Ryan Dahl (creator of node.js) had been using dtruss so frequently recently that he made some enhancements and posted a newer version on github here.

6. soconnect_mac.d

This script is from the DTrace book, and traces outbound TCP connections along with details:

Brendan-2:~ brendan$ sudo ./dtbook/Chap6/soconnect_mac.d
PID    PROCESS          FAM ADDRESS          PORT   LAT(us) RESULT
45343  firefox-bin      2   70.32.100.166    443        735 In progress
65002  TweetDeck        2   199.59.148.201   80          94 In progress
65002  TweetDeck        2   208.93.137.53    80          89 In progress
54079  Google Chrome    2   97.74.215.143    80          76 In progress
45343  firefox-bin      2   97.74.215.143    80         752 In progress
45343  firefox-bin      2   97.74.215.143    80          88 In progress
45343  firefox-bin      2   97.74.215.143    80          85 In progress
45343  firefox-bin      2   97.74.215.143    80          39 In progress
45343  firefox-bin      2   97.74.215.143    80          22 In progress
45343  firefox-bin      2   97.74.215.143    80          20 In progress
27     ntpd             2   17.151.16.20     123         24 Success
[...]

It’s a quick way to find out what applications are connecting to whom on the Internet. The script is here.

Columns are: PID = process ID, PROCESS = process name, FAM = protocol family (2 = IPv4), ADDRESS = IP address, PORT = TCP destination port, LAT(us) latency of the connect() system call in microseconds, RESULT = return of the connect().

Many connections these days are “non-blocking”, so the latency appears low and has the result “In progress”. The actual connection completes sometime later (the timing can be traced using some more DTrace, when desired).

There is a companion tool, soaccept_mac.d, which shows inbound TCP connections. If you ran both regularly, you may catch something performing networking that shouldn’t be (e.g., spyware, virus, …).

7. errinfo

This tool provides a summary of which system calls were failing, showing the process name, error code, and short description of the error:

Brendan-2:~ brendan$ sudo errinfo -c
Tracing... Hit Ctrl-C to end.
^C
            EXEC          SYSCALL  ERR  COUNT  DESC
       TweetDeck          lstat64    2      1  No such file or directory
       TweetDeck           select    9      1  Bad file descriptor
       TweetDeck           stat64    2      1  No such file or directory
           cupsd           unlink    2      1  No such file or directory
     firefox-bin connect_nocancel   36      1  Operation now in progress
         launchd            mkdir   17      1  File exists
         launchd    open_nocancel    2      1  No such file or directory
[...]
          Google           Chrome    0     35
          Google           Chrome    0     35
        mdworker      getattrlist    2     24  No such file or directory
        mdworker            mkdir   13     24  Permission denied
       TweetDeck           access    2     33  No such file or directory
       TweetDeck             read   35     35  Resource temporarily unavailable
   mDNSResponder          recvmsg   35     38  Resource temporarily unavailable
        mdworker           stat64    2     47  No such file or directory
       gpg-agent    read_nocancel   35     60  Resource temporarily unavailable
        thnuclnt            ioctl   25     60  Inappropriate ioctl for device
            ntpd __pthread_canceled   22     69  Invalid argument
            ntpd        sigreturn   -2     69
            ntpd       sigsuspend    4     69  Interrupted system call
          Google           Chrome    0      0
 activitymonitor        proc_info    3   1495  No such process

This can be another quick way to track down failing or misconfigured applications. Note that (usually) most system call errors are not a problem: it can be normal for system calls to fail, and the calling application handles that condition correctly. errinfo is particularly useful when the application didn’t handle the failure correctly, and the system calls are silently failing.

8. bitesize.d

This is a simple DTrace script that characterizes the disk I/O workload, showing a distribution of the size of the I/O in bytes along with the application name:

Brendan-2:~ brendan$ sudo bitesize.d
Tracing... Hit Ctrl-C to end.
^C

     PID  CMD
   31502  vim\0

           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         4
            8192 |                                         0
           16384 |@@@@@@@@                                 1
           32768 |                                         0        

   54079  Google Chrome\0

           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        18
            8192 |@@@@@@@                                  4
           16384 |                                         0        

   86143  bsdtar\0

           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@                                      19
            8192 |@@@@                                     23
           16384 |@@                                       12
           32768 |@@@@@@@@@@@@@@@@@@@@@@@                  151
           65536 |@@@@                                     25
          131072 |@@                                       10
          262144 |@@                                       10
          524288 |@                                        8
         1048576 |                                         3
         2097152 |                                         0

The “value” column shows the minimum size of the I/O in bytes, and the “count” column shows the number of I/O in that range: between the minimum size and the next size shown.

To follow an example: the “bsdtar” command shown above performed 151 disk I/Os that were between 32768 bytes and 65535 bytes in size (32K to 64K), which was where most of its I/O fell. The text plot in the middle provides a visualization for the distribution of the events, which shows a spike that corresponds to this 32K to 64K range. Google Chrome did a couple of dozen I/O between 4 and 16 Kbytes.

Disk I/O is an expensive operation, so it’s usually preferable for it to be larger in size to improve the throughput to the device.

9. iotop

This presents the same data as iosnoop, but in a summarized way similar to top(1). It’s handy when disk I/O is so frequent that iosnoop is too verbose, and you want a high level summary of which process is rattling the disks. Here I’ve used “-CP” to not clear the screen and provide a rolling output (-C), and to show disk busy percentages (-P):

Brendan-2:~ brendan$ sudo iotop -CP 1
Tracing... Please wait.
2011 Sep 27 19:33:11,  load: 0.74,  disk_r:      0 KB,  disk_w:  57344 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
    0  66661  66312 cp               ??       14   2 W     98

2011 Sep 27 19:33:12,  load: 0.74,  disk_r:      4 KB,  disk_w:  55752 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
    0  66661  66312 cp               ??       14   2 W     93

2011 Sep 27 19:33:13,  load: 0.68,  disk_r:     60 KB,  disk_w:  54464 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
    0  66661  66312 cp               ??       14   2 R     11
    0  66661  66312 cp               ??       14   2 W     88

2011 Sep 27 19:33:14,  load: 0.68,  disk_r:   2560 KB,  disk_w:  32928 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
  503  65002    140 TweetDeck        ??       14   2 W      0
    0      1      0 launchd          ??       14   2 W      1
    0  66661  66312 cp               ??       14   2 R     11
    0  66661  66312 cp               ??       14   2 W     71

The “??” is for where some details aren’t yet available to DTrace, for the same reason as was described for the iosnoop script (if you are a programmer, you can see what needs to be fixed in the /usr/lib/dtrace/io.d file).

If you get “dynvardrops”, sorry: back when I wrote this (6 years ago), I never saw a disk workload heavy enough to cause these. A remedy can be to increase the dynvarsize tunable by adding the following line:

#pragma D option dynvarsize=16m

near this one:

#pragma D option quiet

in /usr/bin/iotop. You may want to make a backup of iotop first (iotop.orig) or copy it somewhere else to run it.

10. maclife.d

This script is from the DTrace book, and traces the creation and deletion of files:

Brendan-2:~ brendan$ sudo ./dtbook/Chap5/maclife.d
TIME(ms)        UID    PID PROCESS      CALL         DIR/FILE
3754594958      503  54079 Google Chrom VNOP_CREATE  Chrome/.com.google.Chrome.K7I9jy
3754597703      503  54079 Google Chrom VNOP_CREATE  -Tmp-/.com.google.Chrome.8MIKKL
3754597703      503  54079 Google Chrom VNOP_REMOVE  -Tmp-/.com.google.Chrome.8MIKKL
3754597703      503  54079 Google Chrom VNOP_CREATE  -Tmp-/.com.google.Chrome.TLlOje
3754597703      503  54079 Google Chrom VNOP_REMOVE  -Tmp-/.com.google.Chrome.TLlOje
3754598365      503  54079 Google Chrom VNOP_CREATE  -Tmp-/.com.google.Chrome.yIwTdE
3754598365      503  54079 Google Chrom VNOP_REMOVE  -Tmp-/.com.google.Chrome.yIwTdE
3754603801      503  65002 TweetDeck    VNOP_CREATE  Cookies/Cookies.plist_tmp_65002_0.dat
3754605028      503  65002 TweetDeck    VNOP_REMOVE  Local Store/td_26_brendangregg.db-journal
3754605026      503  65002 TweetDeck    VNOP_CREATE  Local Store/td_26_brendangregg.db-journal
3754607674      503  65002 TweetDeck    VNOP_CREATE  Local Store/td_26_brendangregg.db-journal
3754607676      503  65002 TweetDeck    VNOP_REMOVE  Local Store/td_26_brendangregg.db-journal
3754609536      503  34852 Adium        VNOP_CREATE  Default/.dat8824.9fa
3754711583      503  17726 thnuclnt     VNOP_CREATE  thnuclnt-17717/thnuclnt.conf-fta17726
3754711587      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e8220118e918
3754711617      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e8220118ead8
3754711620      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e8220119621c
3754711623      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e82201196d7e
3754711624      503  17726 thnuclnt     VNOP_REMOVE  thnuclnt-17717/thnuclnt.conf-fta17726
3754612740      503  31502 vim          VNOP_CREATE  macosx/4913
3754612740      503  31502 vim          VNOP_REMOVE  macosx/4913
3754612741      503  31502 vim          VNOP_CREATE  macosx/top10dtrace.html
3754612741      503  31502 vim          VNOP_REMOVE  macosx/top10dtrace.html~
[...]

Interesting! While tracing I saved the file I was editing in vim, which is seen in the last four lines. This tells me that vim is creating and removing temporary files as part of the save process.

Also note that it looks like TweetDeck created the file twice before removing it (two VNOP_CREATEs followed by a VNOP_REMOVE). This isn’t the correct order, which can be seen by examining the TIME(ms) column. A side-effect of DTrace’s negligible performance impact design is that output can be slightly shuffled due to the way it collects data from per-CPU buffers. I often include a TIME column like that one, not just for the usefulness of knowing time, but also as a means to post sort the output.

Other Scripts

There are other tools ready-to-go in Mac OS X. If you run the “man -k dtrace” command in a Terminal (or “apropos dtrace”), you’ll see the following output:

Brendan-2:~ brendan$ man -k dtrace
[...]
bitesize.d(1m)           - analyse disk I/O size by process. Uses DTrace
cpuwalk.d(1m)            - Measure which CPUs a process runs on. Uses DTrace
creatbyproc.d(1m)        - snoop creat()s by process name. Uses DTrace
dappprof(1m)             - profile user and lib function usage. Uses DTrace
dapptrace(1m)            - trace user and library function usage. Uses DTrace
diskhits(1m)             - disk access by file offset. Uses DTrace
dispqlen.d(1m)           - dispatcher queue length by CPU. Uses DTrace
dtrace(1)                - generic front-end to the DTrace facility
dtruss(1m)               - process syscall details. Uses DTrace
errinfo(1m)              - print errno for syscall fails. Uses DTrace
execsnoop(1m)            - snoop new process execution. Uses DTrace
fddist(1m)               - file descriptor usage distributions. Uses DTrace
filebyproc.d(1m)         - snoop opens by process name. Uses DTrace
hotspot.d(1m)            - print disk event by location. Uses DTrace
httpdstat.d(1m)          - realtime httpd statistics. Uses DTrace
iofile.d(1m)             - I/O wait time by file and process. Uses DTrace
iofileb.d(1m)            - I/O bytes by file and process. Uses DTrace
iopattern(1m)            - print disk I/O pattern. Uses DTrace
iopending(1m)            - plot number of pending disk events. Uses DTrace
iosnoop(1m)              - snoop I/O events as they occur. Uses DTrace
iotop(1m)                - display top disk I/O events by process. Uses DTrace
kill.d(1m)               - snoop process signals as they occur. Uses DTrace
lastwords(1m)            - print syscalls before exit. Uses DTrace
loads.d(1m)              - print load averages. Uses DTrace
newproc.d(1m)            - snoop new processes. Uses DTrace
opensnoop(1m)            - snoop file opens as they occur. Uses DTrace
pathopens.d(1m)          - full pathnames opened ok count. Uses DTrace
pidpersec.d(1m)          - print new PIDs per sec. Uses DTrace
plockstat(1)             - front-end to DTrace to print statistics about POSIX mutexes and read/write locks
priclass.d(1m)           - priority distribution by scheduling class. Uses DTrace
pridist.d(1m)            - process priority distribution. Uses DTrace
procsystime(1m)          - analyse system call times. Uses DTrace
runocc.d(1m)             - run queue occupancy by CPU. Uses DTrace
rwbypid.d(1m)            - read/write calls by PID. Uses DTrace
rwbytype.d(1m)           - read/write bytes by vnode type. Uses DTrace
rwsnoop(1m)              - snoop read/write events. Uses DTrace
sampleproc(1m)           - sample processes on the CPUs. Uses DTrace
seeksize.d(1m)           - print disk event seek report. Uses DTrace
setuids.d(1m)            - snoop setuid calls as they occur. Uses DTrace
sigdist.d(1m)            - signal distribution by process. Uses DTrace
syscallbypid.d(1m)       - syscalls by process ID. Uses DTrace
syscallbyproc.d(1m)      - syscalls by process name. Uses DTrace
syscallbysysc.d(1m)      - syscalls by syscall. Uses DTrace
topsyscall(1m)           - top syscalls by syscall name. Uses DTrace
topsysproc(1m)           - top syscalls by process name. Uses DTrace
weblatency.d(1m)         - website latency statistics. Uses DTrace

Plus more in the DTrace book.

And More…

DTrace isn’t just about running scripts; you can write your own custom scripts, run one-liners, and use higher-level tools that use DTrace behind the scenes, like Apple’s Instruments and Joyent’s Cloud Analytics. And it’s for more than just your MacBook: if you are using servers that have DTrace available, you can use it to diagnose their performance and issues too, including tracing kernel and application code. I regularly use it to see how a MySQL database interacts with a kernel file system, for example.

For more reading about DTrace, you can see my posts tagged dtrace, other blogs on dtrace.org, the original 410 page DTrace guide, 1100 page DTrace book, and my original DTrace page (which includes the DTraceToolkit). It’s a little old now (and I think prustat needs updating), but there’s also Matty’s Top Ten DTrace Scripts, which includes some of those above; and Greg Miller’s Exploring Leopard with DTrace, which includes Objective-C tracing.


[shared via Hacker News]

Mar 11

How I went from writing 2,000 words a day to 10,000 words a day

When I started writing The Spirit War (Eli novel #4), I had a bit of a problem. I had a brand new baby and my life (like every new mother’s life) was constantly on the verge of shambles. I paid for a sitter four times a week so I could get some writing time, and I guarded these hours like a mama bear guards her cubs - with ferocity and hiker-mauling violence. To keep my schedule and make my deadlines, I needed to write 4000 words during each of these carefully arranged sessions. I thought this would be simple. After all, before I quit my job to write full time I’d been writing 2k a day in the three hours before work. Surely with 6 hours of baby free writing time, 4k a day would be nothing….

But (of course), things didn’t work out like that. Every day I’d sit down to add 4000 words to my new manuscript. I was determined, I was experienced, I knew my world. There was no reason I couldn’t get 4k down. But every night when I hauled myself away, my word count had only increased by 2k, the same number of words I’d been getting before I quit my day job. 

Needless to say, I felt like a failure. Here I was, a professional writer with three books about to come out, and I couldn’t even beat the writing I’d done before I went pro. At first I made excuses, this novel was the most complicated of all the Eli books I’d written, I was tired because my son thinks 4am is an awesome time to play, etc. etc. But the truth was there was no excuse. I had to find a way to boost my word count, and with months of 2k a day dragging me down, I had to do it fast. So I got scientific. I gathered data and tried experiments, and ultimately ended up boosting my word count to heights far beyond what I’d thought was possible, and I did it while making my writing better than ever before.

When I told people at ConCarolinas that I’d gone from writing 2k to 10k per day, I got a huge response. Everyone wanted to know how I’d done it, and I finally got so sick of telling the same story over and over again that I decided to write it down here.

So, once and for all, here’s the story of how I went from writing 500 words an hour to over 1500, and (hopefully) how you can too:


A quick note: There are many fine, successful writers out there who equate writing quickly with being a hack. I firmly disagree. My methods remove the dross, the time spent tooling around lost in your daily writing, not the time spent making plot decisions or word choices. This is not a choice between ruminating on art or churning out the novels for gross commercialism (though I happen to like commercial novels), it’s about not wasting your time for whatever sort of novels you want to write.
Drastically increasing your words per day is actually pretty easy, all it takes is a shift in perspective and the ability to be honest with yourself (which is the hardest part). Because I’m a giant nerd, I ended up creating a metric, a triangle with three core requirements: Knowledge, Time, and Enthusiasm. Any one of these can noticeably boost your daily output, but all three together can turn you into a word machine. I never start writing these days unless I can hit all three.

Update! The talented Vicky Teinaki made a graphic of this metric and let me use it! She is awesome!



Side 1: Knowledge, or Know What You’re Writing Before You Write It
The first big boost to my daily wordcount happened almost by accident. Used to be I would just pop open the laptop and start writing. Now, I wasn’t a total make-it-up-as-you-go writer. I had a general plot outline, but my scene notes were things like “Miranda and Banage argue” or “Eli steals the king.” Not very useful, but I knew generally what direction I was writing in, and I liked to let the characters decide how the scene would go. Unfortunately, this meant I wasted a lot of time rewriting and backtracking when the scene veered off course. 

This was how I had always written, it felt natural to me. But then one day I got mired in a real mess. I had spent three days knee deep in the same horrible scene. I was drastically behind on my wordcount, and I was facing the real possibility of missing my deadline… again. It was the perfect storm of all my insecurities, the thought of letting people down mixed with the fear that I really didn’t know what I was doing, that I wasn’t a real writer at all, just an amateur pretending to be one. But as I got angrier and angrier with myself, I looked down at my novel and suddenly realized that I was being an absolute idiot. Here I was, desperate for time, floundering in a scene, and yet I was doing the hardest work of writing (figuring out exactly what needs to happen to move the scene forward in the most dramatic and exciting way) in the most time consuming way possible (ie, in the middle of the writing itself). 

As soon as I realized this, I stopped. I closed my laptop and got out my pad of paper. Then, instead of trying to write the scene in the novel as I had been, I started scribbling a very short hand, truncated version the scene on the paper. I didn’t describe anything, I didn’t do transitions. I wasn’t writing, I was simply noting down what I would write when the time came. It took me about five minutes and three pages of notebook paper to untangle my seemingly unfixable scene, the one that had just eaten three days of my life before I tried this new approach. Better still, after I’d worked everything out in shorthand I was able to dive back into the scene and finish it in record time. The words flew onto the screen, and at the end of that session I’d written 3000 words rather than 2000, most of them in that last hour and a half.

Looking back, it was so simple I feel stupid for not thinking of it sooner. If you want to write faster, the first step is to know what you’re writing before you write it. I’m not even talking about macro plot stuff, I mean working out the back and forth exchanges of an argument between characters, blocking out fights, writing up fast descriptions. Writing this stuff out in words you actually want other people to read, especially if you’re making everything up as you go along, takes FOREVER. It’s horribly inefficient and when you get yourself in a dead end, you end up trashing hundreds, sometimes thousands of words to get out. But jotting it down on a pad? Takes no time at all. If the scene you’re sketching out starts to go the wrong way, you see it immedeatly, and all you have to do is cross out the parts that went sour and start again at the beginning. That’s it. No words lost, no time wasted. It was god damn beautiful.

Every writing session after this realization, I dedicated five minutes (sometimes more, never less) and wrote out a quick description of what I was going to write. Sometimes it wasn’t even a paragraph, just a list of this happens then this then this. This simple change, these five stupid minutes, boosted my wordcount enormously. I went from writing 2k a day to writing 5k a day within a week without increasing my 5 hour writing block. Some days I even finished early.

Of the three sides of the triangle, I consider knowledge to be the most important. This step alone more than doubled my word count. If you only want to try one change at a time, this is the one I recommend the most.

Side 2: Time
Now that I’d had such a huge boost from one minor change, I started to wonder what else I could do to jack my numbers up even higher. But as I looked for other things I could tweak, I quickly realized that I knew embarrassingly little about how I actually wrote my novels. I’d kept no records of my progress, I couldn’t even tell you how long it took me to write any of my last three novels beyond broad guesstimations, celebratory blog posts, and vague memories of past word counts. It was like I started every book by throwing myself at the keyboard and praying for a novel to shoot out of my fingers before the deadline. And keep in mind this is my business. Can you imagine a bakery or a freelance designer working this way? Never tracking hours or keeping a record of how long it took me to actually produce the thing I was selling? Yeah, pretty stupid way to work.

If I was going to boost my output (or know how long it took me to actually write a freaking novel), I had to know what I was outputting in the first place. So, I started keeping records. Every day I had a writing session I would note the time I started, the time I stopped, how many words I wrote, and where I was writing on a spreadsheet. I did this for two months, and then I looked for patterns.

Several things were immediately clear. First, my productivity was at its highest when I was in a place other than my home. That is to say, a place without internet. The afternoons I wrote at the coffee shop with no wireless were twice as productive as the mornings I wrote at home. I also saw that, while butt in chair time is the root of all writing, not all butt in chair time is equal. For example, those days where I only got one hour to write I never managed more than five hundred words in that hour. By contrast, those days I got five hours of solid writing I was clearing close to 1500 words an hour. The numbers were clear: the longer I wrote, the faster I wrote (and I believe the better I wrote, certainly the writing got easier the longer I went). This corresponding rise of wordcount and writing hours only worked up to a point, though. There was a definite words per hour drop off around hour 7 when I was simply too brain fried to go on. 

But these numbers are very personal, the point I’m trying to make is that by recording my progress every day I had the data I needed to start optimizing my daily writing. Once I had my data in hand, I rearranged my schedule to make sure my writing time was always in the afternoon (my most prolific time according to my sheet, which was a real discovery. I would have bet money I was better in the morning.), always at my coffee shop with no internet, and always at least 4 hours long. Once I set my time, I guarded it viciously, and low and behold my words per day shot up again. This time to an average of 6k-7k per writing day, and all without adding any extra hours. All I had to do was discover what made good writing time for me and then make sure the good writing time was the time I fought hardest to get. 

Even if you don’t have the luxury of 4 uninterrupted hours at your prime time of day, I highly suggest measuring your writing in the times you do have to write. Even if you only have 1 free hour a day, trying that hour in the morning some days and the evening on others and tracking the results can make sure you aren’t wasting your precious writing time on avoidable inefficiencies. Time really does matter.

Side 3: Enthusiasm
I was flying high on my new discoveries. Over the course of two months I’d jacked my daily writing from 2k per day to 7k with just a few simple changes and was now actually running ahead of schedule for the first time  in my writing career. But I wasn’t done yet. I was absolutely determined I was going to break the 10k a day barrier.

I’d actually broken it before. Using Knowledge and Time, I’d already managed a few 10k+ days, including one where I wrote 12,689 words, or two chapters, in 7 hours. To be fair, I had been writing outside of my usual writing window in addition to my normal writing on those days, so it wasn’t a total words-per-hour efficiency jump. But that’s the great thing about going this fast, the novel starts to eat you and you find yourself writing any time you can just for the pure joy of it. Even better, on the days where I broke 10k, I was also pulling fantastic words-per-hour numbers, 1600 - 2000 words per hour as opposed to my usual 1500. It was clear these days were special, but I didn’t know how. I did know that I wanted those days to become the norm rather than the exception, so I went back to my records (which I now kept meticulously) to find out what made the 10k days different.


The answer was head-slappingly obvious. Those days I broke 10k were the days I was writing scenes I’d been dying to write since I planned the book. They were the candy bar scenes, the scenes I wrote all that other stuff to get to. By contrast, my slow days (days where I was struggling to break 5k) corresponded to the scenes I wasn’t that crazy about.

This was a duh moment for me, but it also brought up a troubling new problem. If I had scenes that were boring enough that I didn’t want to write them, then there was no way in hell anyone would want to read them. This was my novel, after all. If I didn’t love it, no one would.

Fortunately, the solution turned out to be, yet again, stupidly simple. Every day, while I was writing out my little description of what I was going to write for the knowledge component of the triangle, I would play the scene through in my mind and try to get excited about it. I’d look for all the cool little hooks, the parts that interested me most, and focus on those since they were obviously what made the scene cool. If I couldn’t find anything to get excited over, then I would change the scene, or get rid of it entirely. I decided then and there that, no matter how useful a scene might be for my plot, boring scenes had no place in my novels. 


This discovery turned out to be a fantastic one for my writing. I trashed and rewrote several otherwise perfectly good scenes, and the effect on the novel was amazing. Plus, my daily wordcount numbers shot up again because I was always excited about my work. Double bonus!

Life On 10k A Day
With all three sides of my triangle now in place, I was routinely pulling 10-12k per day by the time I finished Spirits’ End, the fifth Eli novel. I was almost 2 months ahead of where I’d thought I’d be, and the novel had only taken me 3 months to write rather than the 7 months I’d burned on the Spirit War (facts I knew now that I was keeping records). I was ahead of schedule with plenty of time to do revisions before I needed to hand the novel in to my editor, and I was happier with my writing than ever before. There were several days toward the end when I’d close my laptop and stumble out of the coffee shop feeling almost drunk on writing. I felt like I was on top of the world, utterly invincible and happier than I’ve ever been. Writing that much that quickly was like taking some kind of weird success opiate, and I was thoroughly addicted. Once you’ve hit 10k a day for a week straight, anything less feels like your story is crawling.

Now, again, 10k a day is my high point as a professional author whose child is now in daycare (PRICELESS). I write 6 - 7 hours a day, usually 2 in the morning and 4-5 in the afternoon, five days a week. Honestly, I don’t see how anyone other than a full time novelist could pull those kind of hours, but that doesn’t mean you have to be a pro to drastically increase your daily word count.

So 10k might be the high end of the spectrum, but of the people I’ve told about this (a lot) who’ve gotten back to me (not nearly as many), most have doubled their word counts by striving to hit all three sides of the triangle every time they write. This means some have gone from 1k a day to 2k, or 2k to 4k. Some of my great success with increasing my wordcount is undoubtedly a product of experience, as I also hit my million word mark somewhere in the fifth Eli novel. Even so, I believe most of the big leaps in efficiency came from changing the way I approached my writing. Just as changing your lifestyle can help you lose a hundred pounds, changing they way you sit down to write can boost your words per hour in astonishing ways.

If you’re looking to get more out of your writing time, I really hope you try my triangle. If you do, please write me (or comment below) and let me know. Even if it doesn’t work (especially if it doesn’t work) I’d love to hear about it. Also, if you find another efficiency hack for writing, let me know about that too! There’s no reason our triangle can’t be a square, and I’m always looking for a way to hit 15k a day :D. 

Again, I really hope this helps you hit your goals. Good luck with your writing!

- Rachel Aaron

UPDATE, November 25, 2011: Wow, I never expected this to be so popular! I’m so glad it’s been helpful for people! Just to update you all on my progress, since writing this post, I’ve completed, edited (and re-edited, and rewrote again, sigh, still working on speeding that part up), and sent out 2 complete novels, one of which was written in 12 days. You can see the worksheets and stats for that one in my 12 Days of Glory post.

Also, if anyone’s interested in how I plot my novels, I’ve posted that here: How I Plot A Novel in 5 Steps! Again, I’m so glad this was helpful to all of you. Thank you so much for all the lovely feedback!

- R

UPDATE, February 7, 2012: I now have a post on editing novels as well called Editing for People Who Hate Editing if you’re at that stage and want to see how I do it.



[shared via Hacker News]

Episode #257 - March 23rd, 2012

Posted on Friday March 23rd 2012 at 04:00pm. Its tags are listed below.

Episode #257 - March 23rd, 2012

RSpec 2.9.0 released … GitHub Ruby Styleguide … technoweenie’s blog post, “Ending the Mass Assignment Party” … Life Without Gemsets … Custom Bundler Groups … Avdi Grimm’s _Objects on Rails_ Book Available Free

Listen to this episode on Ruby5

This episode is sponsored by New Relic
New Relic is the all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.

Ruby5 News Byte: RSpec 2.9.0 Released
RSpec 2.9.0 has been released.

GitHub Ruby Styleguide
Rejoice, Ruby troglodytes! GitHub has come down from the mountain with the GitHub Ruby Styleguide. Read it! Live it! Love it!

Ending the Mass Assignment Party
technoweenie’s blog post, “Ending the Mass Assignment Party”, and the TainedHash gem for guarding against mass assignment vulnerabilities.

Life Without Gemsets
If you’re using Bundler on all of your projects and you’re still using RVM gemsets, it’s time to move on. Bundler don’t need no stinkin’ gemsets.

Bundler Groups
Make your Gemfile tell the whole story by taking advantage of custom Bundler groups.

Objects on Rails Book Available Free
Avdi Grimm’s “Objects on Rails” e-book is out of beta, and now available to read online, for free.

[shared via Ruby5]

Man finds his way home using Google Earth - after 25 years

Comments

[shared via Hacker News]

Bitcoin Brainwallet

Posted on Tuesday March 13th 2012 at 12:01pm. Its tags are listed below.

Bitcoin Brainwallet

Comments

[shared via Hacker News]

Short, explicit test setups

Posted on Monday March 12th 2012 at 01:15pm. Its tags are listed below.

Short, explicit test setups

You probably know about this Factory Girl definition syntax:

FactoryGirl.define do
  factory :user do
    name 'Connie Customer'
  end
end

But did you know about this Factory Girl invocation syntax?

setup do
  @user = create(:user)
end

Or:

setup do
  @user = build(:user)
end

Or:

setup do
  post :create, user: attributes_for(:user)
end

It’s in there.

Configuration for Test::Unit / Shoulda:

class ActiveSupport::TestCase
  include FactoryGirl::Syntax::Methods
end

Configuration for RSpec:

RSpec.configure do |config|
  config.include FactoryGirl::Syntax::Methods
end

Configuration for Cucumber:

World Factory::Syntax::Methods

[shared via giant robots smashing into other giant robots]

Your product sucks… and why that’s OK

Posted on Monday March 12th 2012 at 12:31pm. Its tags are listed below.

Your product sucks… and why that’s OK

Comments

[shared via Hacker News]

5 Graph Databases to Consider

Posted on Monday March 12th 2012 at 12:31pm. Its tags are listed below.

5 Graph Databases to Consider

Comments

[shared via Hacker News]

Hidden gems in Mac OS X

Posted on Sunday March 11th 2012 at 11:00am. Its tags are listed below.

Hidden gems in Mac OS X

Since version 10.5 “Leopard”, Mac OS X has had DTrace, a tool used for performance analysis and troubleshooting. It provides data for Apple’s Instruments tool, as well as a collection of command line tools that are implemented as DTrace scripts. I’m familiar with the latter as I wrote the originals for the DTraceToolkit, which Apple then customized and enhanced for Mac OS X where they are shipped by default (great!). I use them regularly to answer this question:

why is my MacBook slow?

I work in an office where everyone has MacBook Pros, and “why is my MacBook slow?” is a common question. Applications can become slow or unresponsive while waiting for CPU work, memory requests or disk I/O to complete.

For people who try to ignore the slowdown, the question can become:

why is my MacBook fan so loud?

Standard performance analysis tools like Activity Monitor and top(1) (and any third-party tools based on the same foundation) can’t tell you some key information about activity on your system, such as how much CPU consumption is caused by short-lived processes, or which processes are causing disk I/O. DTrace, however, can see (just about) everything.

In this post, I’ll cover the top ten Mac OS X DTrace scripts that I use for figuring out why laptops are slow or why applications are misbehaving. Most of these scripts are already installed, a few are from the new DTrace book.

How to use DTrace

If you’ve never run a DTrace script before or even used the command line, here’s a basic walkthrough:

Open Terminal

Run the “Terminal” application. You can find it here in Finder:

You can also type “terminal” in Spotlight (the magnifying glass in the top right corner of your Mac’s screen), which should find it. I usually drag it to my Dock so it’s easy to find later:

Customize Terminal

When you first run Terminal, it’ll probably look like this:

I find the default font small and hard to read. The size of the window in terms of characters is also small (80 columns by 24 rows), presumably to pay homage to original Unix terminals of that size. (Why Unix? Mac OS X is Unix under the hood: the Darwin kernel).

You can adjust the font to your liking in Terminal->Preferences->Text. I use Monaco 13pt, with “Antialias Text” on. Under the “Window” tab is the default size, 80×24, which you can also increase later by clicking and dragging the bottom right corner of the terminal. For example, here’s my screen as I write this blog post (in a terminal-based text editor).

Running top

While DTrace can see everything, there are some things already covered by easy-to-use (and easy-to-type) tools, like top(1). Now that you have Terminal running, type “top -o cpu” and hit enter, which will refresh the screen showing top running processes. Type “q” to exit the top program. The output will look something like this:

Brendan-2:~ brendan$ top -o cpu
[...]
PID    COMMAND      %CPU TIME     #TH  #WQ  #PORT #MREGS RPRVT  RSHRD  RSIZE
67254- firefox-bin  98.8 04:53:45 62/1 1    743   9935+  374M+  78M    637M+
86550  top          15.3 00:28.60 1/1  0    24    33     1748K  264K   2324K
1068   Terminal     13.2 17:38:19 5    1    358   1326   37M    67M    61M+
61501- Google Chrom 7.9  13:40.89 6    1    93-   556-   45M-   73M    77M-
[...]

I’ve truncated the header block to just include the columns. Look at the “%CPU” column to see which processes are making the CPUs busy, and the “RSIZE” column to see who is consuming main memory. The busiest (CPU) process will be at the top, since we sorted on cpu (“-o cpu”). The top was “firefox-bin” (Mozilla Firefox) at 98.8% CPU, which is in terms of a single processor (this has two). If firefox stayed that high you could look for the responsible tab and close it down, or restart Firefox.

After top, I turn to DTrace.

Running DTrace

DTrace requires admin privileges, so to use it you’ll usually need to type in a password to authenticate, provided you have administrator access (if you aren’t sure you do, click here to see how to check).

You can run DTrace by prefixing your DTrace commands with “sudo”, which will prompt for the password the first time around (but not for some time after that). When a DTrace command is running, you usually type Ctrl-C to end it. Here’s an example:

Brendan-2:~ brendan$ sudo iosnoop
Password: password
  UID   PID D    BLOCK   SIZE       COMM PATHNAME
  503 67261 W 384070496  73728  TweetDeck ??/Cookies/Cookies.plist_tmp_67261_0.dat
  503 67261 W 384070640   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 67261 W 384070640   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 67261 W 308056800   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
  503 67261 W 308056856   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
^C
brendan@macbook:~>

If sudo says “permission denied”, check your admin status.

and here’s what happens if you forgot the “sudo”:

Brendan-2:~ brendan$ iosnoop
dtrace: failed to initialize dtrace: DTrace requires additional privileges

Scripts

The scripts follow. A note on style: if the script ends with “.d”, it’s a basic DTrace script. If it doesn’t, then it’s a shell-wrapped script that provides command line options. Use “-h” to list them.

And if you’d like to learn DTrace, I’d recommend starting by reading the basic ones.

1. iosnoop

This “traces” disk I/O execution live. Each time a disk I/O completes, a line of output is printed to summarize it, including process name and filename details:

Brendan-2:~ brendan$ sudo iosnoop
  UID   PID D    BLOCK   SIZE       COMM PATHNAME
  503 54079 R 286522800   4096 Google Chrome ??/Cache/data_2
  503 54079 R 286522808   4096 Google Chrome ??/Cache/data_2
  503 34852 W 385001216  53248      Adium ??/Default/.dat8824.01b
  503 65002 W 308056800   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
  503 65002 W 308056864   4096  TweetDeck ??/Local Store/td_26_brendangregg.db
  503 65002 W 385001320   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 65002 W 385001320   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 65002 W 385001320   4096  TweetDeck ??/Local Store/td_26_brendangregg.db-journal
  503 54079 W 385001384  12288 Google Chrome ??/Default/Cookies-journal
  503 54079 W 385001384   4096 Google Chrome ??/Default/Cookies-journal
  503 54079 W 134993856   4096 Google Chrome ??/Default/Cookies
  503 54079 W 134994056   4096 Google Chrome ??/Default/Cookies
  503 54079 W 134994176   4096 Google Chrome ??/Default/Cookies
  503 54079 W 134994224   8192 Google Chrome ??/Default/Cookies
[...]

This lets you instantly find out which applications are using the disk, and what files they are reading or writing to. Disk I/O is typically slow (for non-SSD disks), so an application calling frequent disk I/O (a dozen per second or more) may run slowly as it waits for the disk I/O to complete.

The output columns show: UID = user ID, PID = process ID (unique identifier for the process), D = direction (R = read, W = write), BLOCK = location on disk, SIZE = I/O size in bytes, COMM = process name, PATHNAME = trailing portion of file pathname.

In that output I caught Google Chrome reading from a cache file (“data_2″), and writing to cookie files (“Cookies-journal” and “Cookies”). TweetDeck also wrote to database files (“td_26_brendangregg.db” and “td_26_brendangregg.db-journal”). The “??” is where the path information ends for iosnoop (if you are a darwin programmer and want to take a swing at improving that, see the fi_pathname translator in /usr/lib/dtrace/io.d).

The “-h” option lists options (don’t need “sudo” for this):

Brendan-2:~ brendan$ iosnoop -h
USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename]
               [-m mount_point] [-n name] [-p PID]
       iosnoop          # default output
                -a      # print all data (mostly)
                -A      # dump all data, space delimited
                -D      # print time delta, us (elapsed)
                -e      # print device name
                -g      # print command arguments
                -i      # print device instance
                -N      # print major and minor numbers
                -o      # print disk delta time, us
                -s      # print start time, us
                -t      # print completion time, us
                -v      # print completion time, string
                -d device       # instance name to snoop
                -f filename     # snoop this file only
                -m mount_point  # this FS only
                -n name         # this process name only
                -p PID          # this PID only
   eg,
        iosnoop -v      # human readable timestamps
        iosnoop -N      # print major and minor numbers
        iosnoop -m /    # snoop events on filesystem / only

For tricky performance issues I often use “-stoD” to get start and end timestamps for each I/O in microseconds, and a couple of different types of I/O time calculations.

In hindsight, I should have called it diskiosnoop, since “io” could refer to different locations in the kernel I/O stack.

2. hfsslower.d

This script answers an iosnoop FAQ: why dosen’t iosnoop see my application disk I/O?

The reason is that applications rarely request disk I/O directly, rather, they access a file system that does disk I/O on their behalf. To increase performance, the file system will usually try to cache as much file data as possible in main memory (DRAM). The application may (by some notion) think that it’s doing disk I/O, but it’s actually reading from very fast DRAM, thanks to the file system. Writes can also buffer in DRAM and write to disk later, which also speeds up application performance.

The hfsslower.d script measures I/O before it is processed by the HFS+ file system (Apple’s current default file system). iosnoop measures I/O after the file system, and only if it reaches disk:

A couple of points:

  • hfsslower.d will see a lot more I/O than iosnoop, as it includes file system cache hits.
  • hfsslower.d better reflects application performance, as it measures the same latency that the application directly suffered.

The hfsslower.d script is from the DTrace book, and can be found here. To run it, you’ll need to create a text file containing the script (or pull it from the DTrace book tarball), and make the file executable from Terminal by running “chmod 755 hfsslower.d”.

This script takes an argument which is the minimum number of milliseconds to show I/O for. Here’s tracing 1ms HFS+ I/O and slower:

Brendan-2:~ brendan$ sudo ./dtbook_scripts/Chap5/hfsslower.d 1
TIME                 PROCESS          D   KB     ms FILE
2011 Sep 27 19:00:15 Google Chrome    R    0     15 data_5
2011 Sep 27 19:00:15 Google Chrome    R    0     10 data_0
2011 Sep 27 19:00:15 Google Chrome    R    0     10 data_0
2011 Sep 27 19:00:15 Google Chrome    R    0     10 data_5
2011 Sep 27 19:00:21 TweetDeck        W    0      1 td_26_brendangregg.db-journal
2011 Sep 27 19:00:25 Adium            R    0     18 Smile.png
2011 Sep 27 19:01:08 firefox-bin      W    0     17 _CACHE_001_
2011 Sep 27 19:01:36 firefox-bin      W    0      9 _CACHE_001_
2011 Sep 27 19:01:36 firefox-bin      W    2     14 _CACHE_002_
2011 Sep 27 19:01:36 firefox-bin      W    0      7 _CACHE_001_
2011 Sep 27 19:01:37 firefox-bin      W    0      1 _CACHE_001_
2011 Sep 27 19:01:40 firefox-bin      W    0      6 _CACHE_001_
2011 Sep 27 19:01:46 firefox-bin      W    0     14 _CACHE_001_
2011 Sep 27 19:01:49 firefox-bin      R   15      1 _CACHE_003_
2011 Sep 27 19:01:49 firefox-bin      W   12      7 _CACHE_003_
2011 Sep 27 19:01:49 firefox-bin      W    0      4 _CACHE_001_
2011 Sep 27 19:01:51 firefox-bin      W    0      9 _CACHE_001_
2011 Sep 27 19:01:52 firefox-bin      R   21     27 Times.dfont/..namedfork/rsrc
2011 Sep 27 19:01:52 fontd            R    0     16 annex_aux
2011 Sep 27 19:01:52 firefox-bin      W    0      5 _CACHE_001_
[...]

Columns are: TIME = time of I/O completion, PROCESS = application name, D = direction (R = read, W = write), KB = I/O size in Kbytes, ms = I/O latency in milliseconds, FILE = filename.

If you use the argument “0″, it will trace everything. If I’m chasing down slow I/O, I’ll often use an argument of “10″ for I/O slower than 10 milliseconds.

At this point you may think: if you just care about slow I/O, then just use iosnoop. That works to a point, but there can be slow I/O caused by something other than disks (file system lock contention, for example). The other advantage of the hfsslower.d script is that the measured latency matches the application pain suffered, whereas at the disk level you can only assume a correlation.

3. execsnoop

This traces the execution of new processes. This is great at identifying short-lived processes that may be caused by misbehaving applications and can slow down your system. These short-lived processes are usually too quick to be picked up by standard monitoring tools like the Activity Monitor or top(1).

To demonstrate this tool, here’s what happens when you type “man ls”:

Brendan-2:~ brendan$ sudo execsnoop -v
STRTIME                UID    PID   PPID ARGS
2011 Sep 28 20:19:18     0  67234  66312 man
2011 Sep 28 20:19:18     0  67234  66312 man
2011 Sep 28 20:19:18     0  67235  67234 sh
2011 Sep 28 20:19:18     0  67235  67234 gzip
2011 Sep 28 20:19:19     0  67236  67234 sh
2011 Sep 28 20:19:18     0  67234  66312 man
2011 Sep 28 20:19:19     0  67236  67234 gzip
2011 Sep 28 20:19:19     0  67234  66312 man
2011 Sep 28 20:19:19     0  67244  67243 less
2011 Sep 28 20:19:19     0  67242  67239 sh
2011 Sep 28 20:19:19     0  67242  67239 gzip
2011 Sep 28 20:19:19     0  67240  67238 tbl
2011 Sep 28 20:19:19     0  67241  67238 groff
2011 Sep 28 20:19:19     0  67245  67241 troff
2011 Sep 28 20:19:19     0  67246  67241 grotty

This prints a line for each new process that is executed, in a rolling output similar to the previous tools. The fields are: STRTIME = (string) timestamp, UID = user ID, PID = process ID, PPID = parent process ID, ARGS = process name (should be process + arguments, but that doesn’t yet work on Mac OS X; if you want to debug, see pr_psargs in /usr/lib/dtrace/proc.d).

Here’s what happens when I turn AirPort (wifi) off, then on:

Brendan-2:~ brendan$ sudo execsnoop -v
STRTIME                UID    PID   PPID ARGS
2011 Sep 28 20:28:00     0  67204      1 airportd
2011 Sep 28 20:28:01   503  67205    140 fontworker
2011 Sep 28 20:28:01     0  67206      1 mDNSResponderHel
2011 Sep 28 20:28:06     0     49      1 autofsd
2011 Sep 28 20:28:10     0  67208     37 ManagedClient
2011 Sep 28 20:28:31   503  67209    140 fontworker
2011 Sep 28 20:28:43     0  67210      1 airportd
2011 Sep 28 20:28:49     0  67211      1 mDNSResponderHel
2011 Sep 28 20:28:49   503  67212    140 fontworker
2011 Sep 28 20:28:51     0  67214      1 kerberosautoconf
2011 Sep 28 20:28:51     0  67215      1 kerberosautoconf
2011 Sep 28 20:28:56     0     49      1 autofsd
2011 Sep 28 20:29:01     0  67219     37 ManagedClient
2011 Sep 28 20:29:04     0  67220      1 ocspd
2011 Sep 28 20:29:16   503  67221  54079 Google Chrome He
2011 Sep 28 20:29:19   503  67222    140 fontworker

Near the end I opened up a new tab in Mozilla Firefox and another in Google Chrome. You can see a fundamental difference: Chrome creates a new process for that tab, Firefox doesn’t.

execsnoop has various options, use -h to list them all. I used -v above, to print the time.

4. opensnoop

This traces file opens and prints various details, including the time and error code when using “-ve”. I usually use it to look for failed opens, which can be a sign of misconfigured applications. Discovering their config files and resource files can also be useful.

Brendan-2:~ brendan$ sudo opensnoop -ve
STRTIME                UID    PID COMM          FD ERR PATH
2011 Sep 30 01:44:15     0     11 DirectoryServic  -1   2 /var/db/dslocal/nodes/Default/hosts
2011 Sep 30 01:44:15     0     11 DirectoryServic  20   0 /etc/hosts
2011 Sep 30 01:44:15     0  85198 dtrace         5   0 /etc/localtime
2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /Users/brendan/Library/Preferences/com.apple.menuextra.clock.plist.47C2X9o
2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /System/Library/CoreServices/Menu Extras/TimeMachine.menu/Contents/Resources/TMRotatingArrow.pdf
2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /System/Library/CoreServices/Menu Extras/Clock.menu/Contents/Resources/Clock.pdf
2011 Sep 30 01:44:15   503    146 SystemUIServer  19   0 /System/Library/CoreServices/Menu Extras/TimeMachine.menu/Contents/Resources/TMRotatingArrow.pdf
2011 Sep 30 01:44:17   503  67261 TweetDeck     17   0 /Users/brendan/Library/Cookies/Cookies.plist_tmp_67261_0.dat
2011 Sep 30 01:44:17   503  67261 TweetDeck     17   0 /Users/brendan/Library/Cookies/Cookies.plist
2011 Sep 30 01:44:18   503  54079 Google Chrome  72   0 /var/folders/bU/bU0WYlnwE6KKnMBrvI5QUU+++TQ/-Tmp-/.com.google.Chrome.sQdN3D
2011 Sep 30 01:44:19     0     36 mds            9   0 .
^C

While tracing, I changed the clock in the top bar from digital:

to analogue:

The files opened to performed this can be seen by the “SystemUIServer” process, beginning with a Preferences file and then some PDFs. Huh? PDFs? What are those PDFs?

That’s bizarre. I didn’t know that those tiny icons were implemented as PDFs!

5. dtruss

The previous two tools, opensnoop and execsnoop, operate by tracing specific system calls. A system call (or “syscall”) is what an application performs to request the operating system kernel to perform privileged work, including process creation, file operations, and other I/O (eg, disk or network I/O). Syscalls are a great target for analysis with DTrace, since examining them often provides a pretty good picture of what an application is trying to do. They can also provide useful details including byte counts, file and process names, error codes, and latency.

The dtruss tool traces all types of system calls, which is very useful for general debugging, especially since Mac OS X doesn’t come with a standard syscall tracer (like Linux’s “strace” or Solaris’s “truss”). One advantage of dtruss over those other tools is that dtruss can trace multiple processes at the same time, matching on the process name “-n”. For example, tracing Firefox via its process name “firefox-bin”:

Brendan-2:~ brendan$ sudo dtruss -n firefox-bin
	PID/THRD  SYSCALL(args) 		 = return
67254/0x720286:  write_nocancel(0x7, "8\0", 0x1)		 = 1 0
67254/0x720286:  lseek(0x24, 0x75500, 0x0)		 = 480512 0
67254/0x720286:  read_nocancel(0x24, "\0", 0x200)		 = 512 0
67254/0x720286:  write_nocancel(0x7, "8\0", 0x1)		 = 1 0
67254/0x720286:  lseek(0x24, 0x73B00, 0x0)		 = 473856 0
67254/0x720286:  read_nocancel(0x24, "\0", 0x200)		 = 512 0
67254/0x746a81:  select_nocancel(0x3D, 0xB04909B8, 0xB0490938)		 = 1 0
67254/0x746a81:  read_nocancel(0x6, "8\0", 0x400)		 = 1 0
67254/0x746a81:  recvfrom_nocancel(0x36, 0xB0490C3F, 0x1)		 = -1 Err#35
67254/0x746a81:  select_nocancel(0x3D, 0xB0490998, 0xB0490918)		 = 1 0
67254/0x746a81:  sendto_nocancel(0x36, 0x20A51008, 0x292)		 = 658 0
67254/0x720286:  write_nocancel(0x7, "8\0", 0x1)		 = 1 0
67254/0x720286:  lseek(0x24, 0x74000, 0x0)		 = 475136 0
[...]

The output will be many pages, as applications commonly make frequent system calls.

dtruss can also launch and trace a program. Here the humble “ls -l” command was traced:

Brendan-2:~ brendan$ sudo dtruss ls -l hfsslower.d
-rwxr-xr-x  1 brendan  staff  1152 Jan 28  2011 hfsslower.d
SYSCALL(args) 		 = return
ioctl(0x3, 0x80086804, 0x7FFF5FBFD710)		 = 0 0
close(0x3)		 = 0 0
stat64("/usr/lib/libstdc++.6.dylib\0", 0x7FFF5FBFCB20, 0x7FFF5FBFD160)		 = 0 0
mmap(0x0, 0xD000, 0x3, 0x1002, 0x1000000, 0x7FFF00000001)		 = 0x20000 0
[...100 lines truncated...]
open_nocancel("/etc/sysinfo.conf\0", 0x0, 0x1B6)		 = -1 Err#2
lstat64_extended(0x100100768, 0x7FFF5FBFE3F0, 0x100101580)		 = 0 0
listxattr(0x100100768, 0x0, 0x0)		 = 0 0
fstat64(0x1, 0x7FFF5FBFD960, 0x7FFF5FBFDA2C)		 = 0 0
ioctl(0x1, 0x4004667A, 0x7FFF5FBFD9AC)		 = 0 0
access("/etc/localtime\0", 0x4, 0x0)		 = 0 0
open_nocancel("/etc/localtime\0", 0x0, 0x0)		 = 4 0
fstat64(0x4, 0x7FFF5FBFB8B0, 0x0)		 = 0 0
read_nocancel(0x4, "TZif\0", 0x2A64)		 = 1017 0
close_nocancel(0x4)		 = 0 0
write_nocancel(0x1, "-rwxr-xr-x  1 brendan  staff  1152 Jan 28  2011 hfsslower.d\n\0", 0x3C)		 = 60 0
fchdir(0x3, 0x100100000, 0xFC080)		 = 0 0
fchdir(0x3, 0x0, 0x100800000)		 = 0 0
close_nocancel(0x3)		 = 0 0

100 lines were trimmed to keep that example short.

dtruss supports various options, including the printing of system call timing for use when analyzing performance.

Brendan-2:~ brendan$ dtruss -h
USAGE: dtruss [-acdefholLs] [-t syscall] { -p PID | -n name | command }

          -p PID          # examine this PID
          -n name         # examine this process name
          -t syscall      # examine this syscall only
          -a              # print all details
          -c              # print syscall counts
          -d              # print relative times (us)
          -e              # print elapsed times (us)
          -f              # follow children
          -l              # force printing pid/lwpid
          -o              # print on cpu times
          -s              # print stack backtraces
          -L              # don't print pid/lwpid
          -b bufsize      # dynamic variable buf size
   eg,
       dtruss df -h       # run and examine "df -h"
       dtruss -p 1871     # examine PID 1871
       dtruss -n tar      # examine all processes called "tar"
       dtruss -f test.sh  # run test.sh and follow children

I’ve used the “-e” option many times to figure out what’s slowing down an application. It shows the elapsed time for the system call in microseconds:

Brendan-2:~ brendan$ dtruss -e ls -l hfsslower.d 
-rwxr-xr-x  1 brendan  staff  1152 Jan 28  2011 hfsslower.d
 ELAPSD SYSCALL(args) 		 = return
    245 mmap(0x10000B000, 0x2000, 0x5, 0x12, 0x3, 0x7FFF00000001)		 = 0xB000 0
     11 mmap(0x10000D000, 0x1000, 0x3, 0x12, 0x3, 0x7FFF00000001)		 = 0xD000 0
     10 mmap(0x10000E000, 0x1F10, 0x1, 0x12, 0x3, 0x7FFF00000001)		 = 0xE000 0
     31 open("/dev/dtracehelper\0", 0x2, 0x7FFF5FC45370)		 = 3 0
     40 __sysctl(0x7FFF5FBFD5B0, 0x2, 0x7FFF5FBFD5DC)		 = 0 0

I then look down the “ELAPSD” column for the largest times.

A dtruss FAQ is: why is the output in the wrong order?

As with other DTrace “tracing” style tools (that print lines of output as they occur) the output can be shuffled slightly due to CPU buffering on multi-CPU systems. The “-d” option for relative times (since program start) is useful for cases where the output order is important. You can then post-sort by that column so that the dtruss output is in the correct order. (For example, by redirecting the output to a file, then using the sort(1) command on that column.)

Ryan Dahl (creator of node.js) had been using dtruss so frequently recently that he made some enhancements and posted a newer version on github here.

6. soconnect_mac.d

This script is from the DTrace book, and traces outbound TCP connections along with details:

Brendan-2:~ brendan$ sudo ./dtbook/Chap6/soconnect_mac.d
PID    PROCESS          FAM ADDRESS          PORT   LAT(us) RESULT
45343  firefox-bin      2   70.32.100.166    443        735 In progress
65002  TweetDeck        2   199.59.148.201   80          94 In progress
65002  TweetDeck        2   208.93.137.53    80          89 In progress
54079  Google Chrome    2   97.74.215.143    80          76 In progress
45343  firefox-bin      2   97.74.215.143    80         752 In progress
45343  firefox-bin      2   97.74.215.143    80          88 In progress
45343  firefox-bin      2   97.74.215.143    80          85 In progress
45343  firefox-bin      2   97.74.215.143    80          39 In progress
45343  firefox-bin      2   97.74.215.143    80          22 In progress
45343  firefox-bin      2   97.74.215.143    80          20 In progress
27     ntpd             2   17.151.16.20     123         24 Success
[...]

It’s a quick way to find out what applications are connecting to whom on the Internet. The script is here.

Columns are: PID = process ID, PROCESS = process name, FAM = protocol family (2 = IPv4), ADDRESS = IP address, PORT = TCP destination port, LAT(us) latency of the connect() system call in microseconds, RESULT = return of the connect().

Many connections these days are “non-blocking”, so the latency appears low and has the result “In progress”. The actual connection completes sometime later (the timing can be traced using some more DTrace, when desired).

There is a companion tool, soaccept_mac.d, which shows inbound TCP connections. If you ran both regularly, you may catch something performing networking that shouldn’t be (e.g., spyware, virus, …).

7. errinfo

This tool provides a summary of which system calls were failing, showing the process name, error code, and short description of the error:

Brendan-2:~ brendan$ sudo errinfo -c
Tracing... Hit Ctrl-C to end.
^C
            EXEC          SYSCALL  ERR  COUNT  DESC
       TweetDeck          lstat64    2      1  No such file or directory
       TweetDeck           select    9      1  Bad file descriptor
       TweetDeck           stat64    2      1  No such file or directory
           cupsd           unlink    2      1  No such file or directory
     firefox-bin connect_nocancel   36      1  Operation now in progress
         launchd            mkdir   17      1  File exists
         launchd    open_nocancel    2      1  No such file or directory
[...]
          Google           Chrome    0     35
          Google           Chrome    0     35
        mdworker      getattrlist    2     24  No such file or directory
        mdworker            mkdir   13     24  Permission denied
       TweetDeck           access    2     33  No such file or directory
       TweetDeck             read   35     35  Resource temporarily unavailable
   mDNSResponder          recvmsg   35     38  Resource temporarily unavailable
        mdworker           stat64    2     47  No such file or directory
       gpg-agent    read_nocancel   35     60  Resource temporarily unavailable
        thnuclnt            ioctl   25     60  Inappropriate ioctl for device
            ntpd __pthread_canceled   22     69  Invalid argument
            ntpd        sigreturn   -2     69
            ntpd       sigsuspend    4     69  Interrupted system call
          Google           Chrome    0      0
 activitymonitor        proc_info    3   1495  No such process

This can be another quick way to track down failing or misconfigured applications. Note that (usually) most system call errors are not a problem: it can be normal for system calls to fail, and the calling application handles that condition correctly. errinfo is particularly useful when the application didn’t handle the failure correctly, and the system calls are silently failing.

8. bitesize.d

This is a simple DTrace script that characterizes the disk I/O workload, showing a distribution of the size of the I/O in bytes along with the application name:

Brendan-2:~ brendan$ sudo bitesize.d
Tracing... Hit Ctrl-C to end.
^C

     PID  CMD
   31502  vim\0

           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         4
            8192 |                                         0
           16384 |@@@@@@@@                                 1
           32768 |                                         0        

   54079  Google Chrome\0

           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        18
            8192 |@@@@@@@                                  4
           16384 |                                         0        

   86143  bsdtar\0

           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@                                      19
            8192 |@@@@                                     23
           16384 |@@                                       12
           32768 |@@@@@@@@@@@@@@@@@@@@@@@                  151
           65536 |@@@@                                     25
          131072 |@@                                       10
          262144 |@@                                       10
          524288 |@                                        8
         1048576 |                                         3
         2097152 |                                         0

The “value” column shows the minimum size of the I/O in bytes, and the “count” column shows the number of I/O in that range: between the minimum size and the next size shown.

To follow an example: the “bsdtar” command shown above performed 151 disk I/Os that were between 32768 bytes and 65535 bytes in size (32K to 64K), which was where most of its I/O fell. The text plot in the middle provides a visualization for the distribution of the events, which shows a spike that corresponds to this 32K to 64K range. Google Chrome did a couple of dozen I/O between 4 and 16 Kbytes.

Disk I/O is an expensive operation, so it’s usually preferable for it to be larger in size to improve the throughput to the device.

9. iotop

This presents the same data as iosnoop, but in a summarized way similar to top(1). It’s handy when disk I/O is so frequent that iosnoop is too verbose, and you want a high level summary of which process is rattling the disks. Here I’ve used “-CP” to not clear the screen and provide a rolling output (-C), and to show disk busy percentages (-P):

Brendan-2:~ brendan$ sudo iotop -CP 1
Tracing... Please wait.
2011 Sep 27 19:33:11,  load: 0.74,  disk_r:      0 KB,  disk_w:  57344 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
    0  66661  66312 cp               ??       14   2 W     98

2011 Sep 27 19:33:12,  load: 0.74,  disk_r:      4 KB,  disk_w:  55752 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
    0  66661  66312 cp               ??       14   2 W     93

2011 Sep 27 19:33:13,  load: 0.68,  disk_r:     60 KB,  disk_w:  54464 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
    0  66661  66312 cp               ??       14   2 R     11
    0  66661  66312 cp               ??       14   2 W     88

2011 Sep 27 19:33:14,  load: 0.68,  disk_r:   2560 KB,  disk_w:  32928 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
  503  65002    140 TweetDeck        ??       14   2 W      0
    0      1      0 launchd          ??       14   2 W      1
    0  66661  66312 cp               ??       14   2 R     11
    0  66661  66312 cp               ??       14   2 W     71

The “??” is for where some details aren’t yet available to DTrace, for the same reason as was described for the iosnoop script (if you are a programmer, you can see what needs to be fixed in the /usr/lib/dtrace/io.d file).

If you get “dynvardrops”, sorry: back when I wrote this (6 years ago), I never saw a disk workload heavy enough to cause these. A remedy can be to increase the dynvarsize tunable by adding the following line:

#pragma D option dynvarsize=16m

near this one:

#pragma D option quiet

in /usr/bin/iotop. You may want to make a backup of iotop first (iotop.orig) or copy it somewhere else to run it.

10. maclife.d

This script is from the DTrace book, and traces the creation and deletion of files:

Brendan-2:~ brendan$ sudo ./dtbook/Chap5/maclife.d
TIME(ms)        UID    PID PROCESS      CALL         DIR/FILE
3754594958      503  54079 Google Chrom VNOP_CREATE  Chrome/.com.google.Chrome.K7I9jy
3754597703      503  54079 Google Chrom VNOP_CREATE  -Tmp-/.com.google.Chrome.8MIKKL
3754597703      503  54079 Google Chrom VNOP_REMOVE  -Tmp-/.com.google.Chrome.8MIKKL
3754597703      503  54079 Google Chrom VNOP_CREATE  -Tmp-/.com.google.Chrome.TLlOje
3754597703      503  54079 Google Chrom VNOP_REMOVE  -Tmp-/.com.google.Chrome.TLlOje
3754598365      503  54079 Google Chrom VNOP_CREATE  -Tmp-/.com.google.Chrome.yIwTdE
3754598365      503  54079 Google Chrom VNOP_REMOVE  -Tmp-/.com.google.Chrome.yIwTdE
3754603801      503  65002 TweetDeck    VNOP_CREATE  Cookies/Cookies.plist_tmp_65002_0.dat
3754605028      503  65002 TweetDeck    VNOP_REMOVE  Local Store/td_26_brendangregg.db-journal
3754605026      503  65002 TweetDeck    VNOP_CREATE  Local Store/td_26_brendangregg.db-journal
3754607674      503  65002 TweetDeck    VNOP_CREATE  Local Store/td_26_brendangregg.db-journal
3754607676      503  65002 TweetDeck    VNOP_REMOVE  Local Store/td_26_brendangregg.db-journal
3754609536      503  34852 Adium        VNOP_CREATE  Default/.dat8824.9fa
3754711583      503  17726 thnuclnt     VNOP_CREATE  thnuclnt-17717/thnuclnt.conf-fta17726
3754711587      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e8220118e918
3754711617      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e8220118ead8
3754711620      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e8220119621c
3754711623      503  17726 thnuclnt     VNOP_REMOVE  -Tmp-/4e82201196d7e
3754711624      503  17726 thnuclnt     VNOP_REMOVE  thnuclnt-17717/thnuclnt.conf-fta17726
3754612740      503  31502 vim          VNOP_CREATE  macosx/4913
3754612740      503  31502 vim          VNOP_REMOVE  macosx/4913
3754612741      503  31502 vim          VNOP_CREATE  macosx/top10dtrace.html
3754612741      503  31502 vim          VNOP_REMOVE  macosx/top10dtrace.html~
[...]

Interesting! While tracing I saved the file I was editing in vim, which is seen in the last four lines. This tells me that vim is creating and removing temporary files as part of the save process.

Also note that it looks like TweetDeck created the file twice before removing it (two VNOP_CREATEs followed by a VNOP_REMOVE). This isn’t the correct order, which can be seen by examining the TIME(ms) column. A side-effect of DTrace’s negligible performance impact design is that output can be slightly shuffled due to the way it collects data from per-CPU buffers. I often include a TIME column like that one, not just for the usefulness of knowing time, but also as a means to post sort the output.

Other Scripts

There are other tools ready-to-go in Mac OS X. If you run the “man -k dtrace” command in a Terminal (or “apropos dtrace”), you’ll see the following output:

Brendan-2:~ brendan$ man -k dtrace
[...]
bitesize.d(1m)           - analyse disk I/O size by process. Uses DTrace
cpuwalk.d(1m)            - Measure which CPUs a process runs on. Uses DTrace
creatbyproc.d(1m)        - snoop creat()s by process name. Uses DTrace
dappprof(1m)             - profile user and lib function usage. Uses DTrace
dapptrace(1m)            - trace user and library function usage. Uses DTrace
diskhits(1m)             - disk access by file offset. Uses DTrace
dispqlen.d(1m)           - dispatcher queue length by CPU. Uses DTrace
dtrace(1)                - generic front-end to the DTrace facility
dtruss(1m)               - process syscall details. Uses DTrace
errinfo(1m)              - print errno for syscall fails. Uses DTrace
execsnoop(1m)            - snoop new process execution. Uses DTrace
fddist(1m)               - file descriptor usage distributions. Uses DTrace
filebyproc.d(1m)         - snoop opens by process name. Uses DTrace
hotspot.d(1m)            - print disk event by location. Uses DTrace
httpdstat.d(1m)          - realtime httpd statistics. Uses DTrace
iofile.d(1m)             - I/O wait time by file and process. Uses DTrace
iofileb.d(1m)            - I/O bytes by file and process. Uses DTrace
iopattern(1m)            - print disk I/O pattern. Uses DTrace
iopending(1m)            - plot number of pending disk events. Uses DTrace
iosnoop(1m)              - snoop I/O events as they occur. Uses DTrace
iotop(1m)                - display top disk I/O events by process. Uses DTrace
kill.d(1m)               - snoop process signals as they occur. Uses DTrace
lastwords(1m)            - print syscalls before exit. Uses DTrace
loads.d(1m)              - print load averages. Uses DTrace
newproc.d(1m)            - snoop new processes. Uses DTrace
opensnoop(1m)            - snoop file opens as they occur. Uses DTrace
pathopens.d(1m)          - full pathnames opened ok count. Uses DTrace
pidpersec.d(1m)          - print new PIDs per sec. Uses DTrace
plockstat(1)             - front-end to DTrace to print statistics about POSIX mutexes and read/write locks
priclass.d(1m)           - priority distribution by scheduling class. Uses DTrace
pridist.d(1m)            - process priority distribution. Uses DTrace
procsystime(1m)          - analyse system call times. Uses DTrace
runocc.d(1m)             - run queue occupancy by CPU. Uses DTrace
rwbypid.d(1m)            - read/write calls by PID. Uses DTrace
rwbytype.d(1m)           - read/write bytes by vnode type. Uses DTrace
rwsnoop(1m)              - snoop read/write events. Uses DTrace
sampleproc(1m)           - sample processes on the CPUs. Uses DTrace
seeksize.d(1m)           - print disk event seek report. Uses DTrace
setuids.d(1m)            - snoop setuid calls as they occur. Uses DTrace
sigdist.d(1m)            - signal distribution by process. Uses DTrace
syscallbypid.d(1m)       - syscalls by process ID. Uses DTrace
syscallbyproc.d(1m)      - syscalls by process name. Uses DTrace
syscallbysysc.d(1m)      - syscalls by syscall. Uses DTrace
topsyscall(1m)           - top syscalls by syscall name. Uses DTrace
topsysproc(1m)           - top syscalls by process name. Uses DTrace
weblatency.d(1m)         - website latency statistics. Uses DTrace

Plus more in the DTrace book.

And More…

DTrace isn’t just about running scripts; you can write your own custom scripts, run one-liners, and use higher-level tools that use DTrace behind the scenes, like Apple’s Instruments and Joyent’s Cloud Analytics. And it’s for more than just your MacBook: if you are using servers that have DTrace available, you can use it to diagnose their performance and issues too, including tracing kernel and application code. I regularly use it to see how a MySQL database interacts with a kernel file system, for example.

For more reading about DTrace, you can see my posts tagged dtrace, other blogs on dtrace.org, the original 410 page DTrace guide, 1100 page DTrace book, and my original DTrace page (which includes the DTraceToolkit). It’s a little old now (and I think prustat needs updating), but there’s also Matty’s Top Ten DTrace Scripts, which includes some of those above; and Greg Miller’s Exploring Leopard with DTrace, which includes Objective-C tracing.


[shared via Hacker News]

Google celebrates the UK’s Bletchley Park and ground-breaking Colossus computer

Comments

[shared via Hacker News]

Securing Your Domain Against Seizure: Where Is Safe to Register a Domain Name?

Comments

[shared via Hacker News]

Raspberry Pi interview: Eben Upton reveals all

Posted on Tuesday March 6th 2012 at 12:15pm. Its tags are listed below.

Raspberry Pi interview: Eben Upton reveals all

Comments

[shared via Hacker News]

Sometimes the bug isn’t in your code, it’s in the CPU

Comments

[shared via Hacker News]

_uacct = "UA-687561-1"; urchinTracker();