iOS Crash Symbolication for dummies Part 2

In the previous post, we learned what the symbolication process is and why it is needed. In this post we will dive deeper, learn how to make sure a dSYM file is generated, and see how we can manually use it to symbolicate crash reports.

How do I make sure dSYM is actually being generated?

Xcode has several settings that may affect dSYM generation. Let's review them one by one.

First of all, let’s make sure that debug information is being generated:

Let's instruct Xcode to put the debug symbols in a separate dSYM file:

By default that option is only selected for release builds, not for debug builds. In most cases this is enough, as debug builds are most commonly used only by the developer, debugging their own application while attached to Xcode. However, when trying out symbolication, or when there is a chance a debug build will end up on the device of a colleague who is going to test it, we may still want the dSYM file so we can analyze the crashes.

And last, but by no means least important:

This option is not important for the symbolication process itself, but it is important to check, as it instructs Xcode to strip the debug symbols from the binary file itself, the file we are going to distribute to the App Store. This affects the size of the distributed application, but more importantly, leaving debug information in the application makes our competitors' lives much easier.
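
For reference, the screenshots above correspond roughly to a handful of build settings (shown here in xcconfig syntax; exact names can shift between Xcode versions, and the stripping step in particular has a few related settings, so treat this as a sketch rather than an exhaustive list):

// generate debug information in the first place
GCC_GENERATE_DEBUGGING_SYMBOLS = YES

// put the debug symbols into a separate dSYM file
DEBUG_INFORMATION_FORMAT = dwarf-with-dsym

// strip the symbols from the binary we actually ship
DEPLOYMENT_POSTPROCESSING = YES
STRIP_INSTALLED_PRODUCT = YES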

With all these options checked, our next build should produce a dSYM file (or rather a series of dSYM files: one for our main application and one for each framework we build as part of our application). The files are located in the products folder. Apple made it quite tricky to find; one common method is to look at the build logs and copy the path from there. There is an alternative way through Xcode:

  • Go to File->Project Settings
  • Click on Advanced
  • Click on the small arrow to reveal the products folder in Finder
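
If the products folder still proves elusive, searching DerivedData from a terminal (the default build location, assuming you have not relocated it) usually turns up the freshly built dSYM files just as well:

$ find ~/Library/Developer/Xcode/DerivedData -name '*.dSYM'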

Note: Xcode 8.2 was used at the time of writing this post; the options may differ in other Xcode versions.

What can I do with a crash report and a dSYM file?

Let's say we have that raw crash report with addresses and a dSYM file that we know is the matching one. What can we do with them?
The address and the dSYM file should be enough to extract debug information about that address, but there is still one element missing. We need to know the exact address the image was loaded at during that specific crash. The reason for this is that the operating system randomizes the offset at which programs are loaded every time they are run. The technique is called ASLR (Address Space Layout Randomization) and it is mostly done for security reasons, as it prevents exploits that rely on a specific layout of the program at runtime.

This is where the list of all the loaded images comes into play. If you are dealing with raw crash reports generated by Xcode, it can be found in the "Binary Images" section of the text file.

Binary Images:
0x10007C000 - 0x1002C3FFF +MyApplication arm64 /var/containers/Bundle/Application/MyApplication.app/MyApplication
0x184158000 - 0x184159FFF libSystem.B.dylib arm64 /usr/lib/libSystem.B.dylib
0x18415A000 - 0x1841AFFFF libc++.1.dylib arm64 /usr/lib/libc++.1.dylib
0x1841B0000 - 0x1841D0FFF libc++abi.dylib arm64 /usr/lib/libc++abi.dylib
0x1841D4000 - 0x1845ADFFF libobjc.A.dylib arm64 /usr/lib/libobjc.A.dylib
0x184871000 - 0x184871FFF libvminterpose.dylib arm64 /usr/lib/system/libvminterpose.dylib
0x184872000 - 0x184898FFF libxpc.dylib arm64 /usr/lib/system/libxpc.dylib
0x184899000 - 0x184AB3FFF libicucore.A.dylib arm64 /usr/lib/libicucore.A.dylib
0x184AB4000 - 0x184AC4FFF libz.1.dylib arm64 /usr/lib/libz.1.dylib
0x185675000 - 0x1859F9FFF CoreFoundation arm64 /System/Library/Frameworks/CoreFoundation.framework/CoreFoundation

Now that we know that on that particular run our application was loaded at 0x10007C000, we can use the atos tool that comes with Xcode to extract more info:

$ atos -o MyApplication.app.dSYM -l 0x10007C000 0x100117f48
getElementFromArray (in MyApplication.app.dSYM) (AppDelegate.m:22)
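
Under the hood the math is simple: 0x100117f48 − 0x10007C000 = 0x9BF48 is the offset of the crashed instruction within our image. atos adds that offset to the image's link-time base address (0x100000000 for a typical arm64 app) and looks the result up in the DWARF data. Assuming that default base address, the same lookup can be done with dwarfdump directly:

$ dwarfdump --lookup 0x10009BF48 --arch arm64 MyApplication.app.dSYM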

So the lookup works. But doing this manually for every frame is a lot of tedious work. If we want a nice human-readable snapshot of the call stack, then for each address we have to:

  • Find the image that the address corresponds to in that particular run of the application (remember ASLR?)
  • Get the start address for that image
  • Locate the dSYM file for that specific image (where do we get dSYM files for all the system images? *)
  • Use the atos tool to translate the address into a human-readable location (a small script automating these steps is sketched below)
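
Here is a minimal sketch of such a script, at least for our own image, assuming the load address and file names from the example above (system frameworks would each need their own dSYM and load address):

#!/bin/bash
# usage: ./symbolicate.sh <dSYM bundle> <image load address> <address>...
DSYM=$1
LOAD_ADDR=$2
shift 2

# atos rebases each raw address against the load address
# and looks it up in the dSYM's debug information
for ADDR in "$@"; do
    atos -o "$DSYM" -l "$LOAD_ADDR" "$ADDR"
done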

When you deal with Xcode raw crash reports, there is a Perl script shipped inside one of Xcode's frameworks that can partially automate this flow. In Xcode 8.2 it can be found at:
/Applications/Xcode.app/Contents/SharedFrameworks/DVTFoundation.framework/Versions/A/Resources/symbolicatecrash
but the location may vary from version to version, so it is easier to just search for it:

$ find /Applications/Xcode.app -name symbolicatecrash -type f

Now we can try to symbolicate the whole report at once:

path/to/symbolicatecrash /path/to/MyApplication_2016-12-19_Device.crash /path/to/MyApplication.app.dSYM
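
One gotcha: symbolicatecrash typically refuses to run until the DEVELOPER_DIR environment variable is set, so if it exits with an error about DEVELOPER_DIR, point it at your Xcode installation first:

$ export DEVELOPER_DIR="/Applications/Xcode.app/Contents/Developer"
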
* In fact, locating dSYM files for system frameworks is a major pain point for most developers. These dSYM files are usually located in the ~/Library/Developer/Xcode/iOS\ DeviceSupport/ folder. However, this folder is populated by Xcode and only contains symbols for the iOS versions and architectures of devices that were attached to that particular Xcode (i.e. if you've never attached an armv7 device running iOS 8.2 to your Mac, you will not have iOS 8.2 armv7 symbols on this machine). The good news is that starting with iOS 10, Apple dropped support for all old armv7 devices, and support files for both arm64 and armv7s are shipped with iOS regardless of the architecture of the device itself. So it is now enough to attach any device with iOS 10.2 to Xcode to get support files for both the armv7s and arm64 flavors. It is still practically impossible to "collect them all", however, especially when Apple sometimes releases iOS beta builds daily.

With this script in hand we know how to completely symbolicate a single crash. And that is assuming we have the crash report itself, all the dSYM files, all the tools and a lot of patience. In the real world, however, this approach becomes impractical really quickly, as our app is deployed on thousands (hopefully millions!) of devices with different iOS versions and architectures. In order to have a complete solution we have to:

  • Have system frameworks dSYM files for all available iOS versions and architectures out there
  • Be able to match and combine similar crashes, even if they have somewhat different stack traces, but share identical root cause
  • Automatically catalogue dSYM files for each application and framework build we produce
  • Detect, register and process every crash from every user and device
  • Analyze app crash statistics and trends per device, iOS version, App version, etc.

That is exactly where 3rd party crash reporting services come into the picture. Crash reporting services can do all that and much more, leaving us time to focus on building the app itself instead of spending precious time on building infrastructure and toolchains for debugging and analysis. Crash reporting services differ in the quality of their stack trace reports, as well as in the additional contextual information they provide about the crashes. Bugsee crash reporting has recently been ranked the highest among all iOS crash reporting services when it comes to accuracy and the amount of detail in the report. Bugsee doesn't stop there, however; it also presents video of user actions, console logs and network traffic that preceded the crash.

In the next post of the series, we will be diving deeper into advanced topics of symbolication such as Bitcode.

iOS Crash Symbolication for dummies Part 1

Many developers use Bugsee for its great crash reporting capabilities. In fact, Bugsee crash reporting has recently been ranked the highest among all iOS crash reporting services when it comes to accuracy and the amount of detail in the report. Bugsee doesn't stop there, however; it also presents video of user actions, console logs and network traffic that preceded the crash.

In the following series of posts we are actually going to focus on the crash log itself, explain the magic behind it and show how to properly set it up.

The first post in the series is an introductory one.

What is symbolication?

In order to answer that question we must briefly touch on the build process itself. Regardless of the language our project is written in (be it Objective-C, Swift or any other), the build process translates our human-readable code into machine binary code. Consider the following buggy code (can you spot the bug?).

static NSArray* array; // declaration needed for the snippet to compile

void initialize() {
    array = @[@"one", @"two", @"three"];
}

NSNumber* getElementFromArray(int index) {
    return array[index];
}

void printAllElements() {
    for (int i = 0; i <= 3; i++) {
        NSLog(@"%@", getElementFromArray(i));
    }
}

After the build it will eventually become something like this:

0x100117dec: stp x29, x30, [sp, #-16]! ; <--- Start of the initialize() method
<...skipped...>
0x100117e9c: ldp x29, x30, [sp], #16
0x100117ea0: ret
0x100117ea4: bl 0x10022d83c
0x100117ea8: stp x29, x30, [sp, #-16]! ; <--- Start of the printAllElements() method
0x100117eac: mov x29, sp
0x100117eb0: sub sp, sp, #32
0x100117eb4: stur wzr, [x29, #-4]
0x100117eb8: ldur w8, [x29, #-4]
0x100117ebc: cmp w8, #3
0x100117ec0: b.gt 0x100117f08
0x100117ec4: ldur w0, [x29, #-4]
0x100117ec8: bl 0x100117f14 ; <---- this is where it calls getElementFromArray()
0x100117ecc: mov x29, x29
0x100117ed0: bl 0x10022d668
<...skipped...>
0x100117f0c: ldp x29, x30, [sp], #16
0x100117f10: ret
0x100117f14: stp x29, x30, [sp, #-16]! ; <--- Start of getElementFromArray() method
0x100117f18: mov x29, sp
0x100117f1c: sub sp, sp, #16
0x100117f20: adrp x8, 436
0x100117f24: add x8, x8, #2520
0x100117f28: adrp x9, 452
0x100117f2c: add x9, x9, #1512
0x100117f30: stur w0, [x29, #-4]
0x100117f34: ldr x9, [x9]
0x100117f38: ldursw x2, [x29, #-4]
0x100117f3c: ldr x1, [x8]
0x100117f40: mov x0, x9
0x100117f44: bl 0x10022d608 ; <--- Here we send a message to NSArray to retrieve the element
0x100117f48: mov sp, x29
0x100117f4c: ldp x29, x30, [sp], #16
0x100117f50: ret

As you can see from this example, the build process got rid of all the symbols (variable and method names). It also no longer knows anything about the layout of our code or the amount of whitespace we used to separate the functions; all that information is lost. So now, when a crash occurs (and it will occur, after all we access an element beyond the bounds of that array), if we don't have symbolication properly set up, this is the only crash information we will end up with:

NSRangeException: *** -[__NSArrayI objectAtIndex:]: index 3 beyond bounds [0 .. 2]
0 CoreFoundation 0x1857A51B8
1 libobjc.A.dylib 0x1841DC55C
2 CoreFoundation 0x1856807F4
3 MyApplication 0x100117f48
4 MyApplication 0x100117ecc
5 ...

This is pretty raw, and not very useful. We know it failed in some method inside CoreFoundation, which was in turn called from some method in libobjc.A.dylib, which was in turn called from another method in CoreFoundation, which in turn was called from our application (finally!). But what is 0x100117f48? Where exactly is it? What file, function or line number does it correspond to? That is exactly where symbolication comes in.

Symbolication is the process of translating the return addresses back into human-readable method/file names and line numbers.

Successful symbolication will result in the following report instead:

NSRangeException: *** -[__NSArrayI objectAtIndex:]: index 3 beyond bounds [0 .. 2]
0 CoreFoundation __exceptionPreprocess + 124
1 libobjc.A.dylib objc_exception_throw + 52
2 CoreFoundation -[__NSArrayI objectAtIndex:] + 180
3 MyApplication getElementFromArray (MyFile.m:22)
4 MyApplication printAllElements (MyFile.m:27)

Now it's pretty obvious that the crash was caused by improper array access at line 22 of MyFile.m, which happens to be within the getElementFromArray method. And if we need more context, we can easily see this one was called by printAllElements at line 27 of the same file.

What is a dSYM file?

Luckily for us, Xcode can be instructed to keep a lot of the data that is otherwise lost during the build process. It could put it inside the application itself, but that is not a good idea. We do not want to ship our application with all this extra debugging information; it would make it very easy for our competitors and hackers to reverse engineer the app. We would like to have it generated, but kept out of the App Store. That's exactly what the dSYM file is all about. During the build process, Xcode strips all the debug information from the main executable file and puts it inside a special file called a dSYM. This keeps our executable small and easier to distribute to happy customers.

If our application is using frameworks, the product folder will have a separate dSYM file generated for each framework built. Eventually all of them are needed if we want to cover our bases and be able to symbolicate a crash in every possible location in our app.

Needless to say, a dSYM file generated while building a specific version of the application can only be used to symbolicate crashes from that specific version.
dSYM files are identified by a Unique ID (UUID), which changes every time we modify and rebuild our code, and that ID is what is used to match a symbol file to a specific crash. A dSYM may be associated with more than one UUID, as it may contain debug information for more than one architecture.

The UUID of a dSYM can be easily retrieved using the dwarfdump command:

$ dwarfdump -u MyApplication.app.dSYM
UUID: 9F665FD6-E70C-3EB9-8622-34FD9EC002CA (armv7) MyApplication.app.dSYM
UUID: 8C2F9BB8-BB3F-37FE-A83E-7F2FF7B98889 (arm64) MyApplication.app.dSYM

The dSYM above has debug information for both the armv7 and arm64 flavors of our application; each flavor has its own UUID.
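
Spotlight indexes these UUIDs, which gives us a quick way to locate a matching dSYM anywhere on the machine. Using one of the UUIDs above (the attribute name is the one Xcode registers for dSYM bundles):

$ mdfind "com_apple_xcode_dsym_uuids == 8C2F9BB8-BB3F-37FE-A83E-7F2FF7B98889"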

These dSYM files can and should be stored for future symbolication of crashes in production builds. Alternatively, they can be uploaded to a crash reporting service like Bugsee, where they will be put in a vault and eventually used for processing a crash for that specific build. Typically, a special build phase is added to the build process that is responsible for uploading dSYM files to the vault, as sketched below.
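
As a rough illustration, such a build phase is just a small shell script. The sketch below zips the dSYM produced by the current build and posts it to a placeholder endpoint; UPLOAD_URL is hypothetical, and the real script for your crash reporting service will look different:

#!/bin/bash
# Run Script build phase: archive and upload the dSYM from this build.
# DWARF_DSYM_FOLDER_PATH and DWARF_DSYM_FILE_NAME are provided by Xcode.
UPLOAD_URL="https://example.com/upload-dsym"   # placeholder endpoint

ZIP_PATH="${TMPDIR}/${DWARF_DSYM_FILE_NAME}.zip"

# a dSYM is a bundle (directory), so zip it before uploading
cd "${DWARF_DSYM_FOLDER_PATH}"
zip -r "${ZIP_PATH}" "${DWARF_DSYM_FILE_NAME}"
curl -F "dsym=@${ZIP_PATH}" "${UPLOAD_URL}"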

What happens during an iOS crash?

During a crash, the following information is collected on the device:

  • Crash/exception type and an exception specific message (if and when available)
  • Stack trace for each thread (in raw form, the list of these unreadable return addresses that we saw before)
  • List of all images (user and system frameworks and extensions loaded by the application. Each one has a unique UUID to help match it to the right dSYM file)
  • Other information about the specific build, device, time of the crash, etc. These are less relevant to the symbolication process, but important nevertheless.

This information is sent for processing to the crash reporting service, where it will be matched with the proper dSYM files that were already uploaded at build time, or will be uploaded manually at a later time. The symbolication process happens on the server and produces a nice, human-readable crash report that can either be viewed through a web dashboard or downloaded as a file. The report will typically include the items listed above (basic info, crash/exception details and, if all the stars are aligned and all symbol files were properly uploaded and processed, a symbolicated stack trace).

That is what a typical crash reporting service provides. Bugsee provides much more than that; among other things, Bugsee reports also include an interactive player that can play back, in a synchronized manner, the following:

  • Video of the screen and user interactions that preceded the crash
  • Network traffic, with complete request and response headers and body
  • System and application traces (disk space, cpu loads, top view and window names, etc.)
  • Your custom traces
  • Console logs

This extra context is of tremendous help when trying to debug an elusive issue that is only happening for customers in the field.

In the next post of the series, we will dive deeper into the symbolication process itself and show how to manually symbolicate an address or a full Apple crash report.

Managing iOS build configurations

Why?

Applications today are rarely being built without 3rd party libraries and SDKs. There are libraries for integrating remote backends into your app. Libraries for effective image caching and loading. Libraries for gathering analytics and libraries for pushing messages. Libraries that help your users report issues, help you debug and analyze crashes. (Yes, that last one is Bugsee)

It is also common practice for developers to maintain more than one flavor of the application during active development. A debug version of the application may require a new version of the backend server which might itself still be under development, it might send analytics data to a different service, or it may need to include a helper library that is not necessary in the release version.

The following tutorial describes several options for maintaining different build configurations.

We are using the Bugsee SDK as an example in this tutorial. At the same time, we do want to point out that the Bugsee SDK is extremely lightweight and doesn't impact your users' experience. Bugsee also passes all Apple App Store requirements. A lot of Bugsee's customers have seen the benefits of shipping the library in their App Store builds, as it provides them with video-enabled crash reporting and in-app bug reporting.

Debug vs. Release

This is the default and most common setup you get when you create your application with Xcode. With preprocessor macros it is easy to differentiate at compile time between a Debug and a Release build. Thus, it is easy to launch (or not launch) a library in any particular configuration. Let's launch Bugsee only in Debug builds.

#if DEBUG
#import "Bugsee/Bugsee.h"
#endif
....
- (BOOL)application:(UIApplication *)application
didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // ...other initialization code

#if DEBUG
    [Bugsee launchWithToken:@"your app token"];
#endif

    return YES;
}

So now you don't import it and don't run it, but the framework is still getting packaged into your application. If you use CocoaPods for your library management, there is a solution: starting with v0.34, CocoaPods honors build configurations. All you have to do is put the following in your Podfile:

pod 'Bugsee', :configurations => ['Debug']

Unfortunately, if you manually copy libraries into your project, you are out of luck.

Debug vs. Release (+TestFlight)

In this configuration you still manage and maintain two build flavors (Debug and Release). The Release build is first distributed to QA and beta testers through Apple TestFlight, and only when ready do you promote it to the App Store. The beauty of this approach is that it is the same build (same binary!). Thus, you can be sure there are no differences between what was tested and what was released. This means, however, that the Bugsee SDK must be part of that Release build, and you just want to disable it when the app is installed from the App Store.

Fortunately, it is easy to differentiate between App Store and TestFlight installations at runtime and initialize Bugsee only when appropriate:

- (BOOL)application:(UIApplication *)application
didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // ...other initialization code

    if ([[[[NSBundle mainBundle] appStoreReceiptURL] lastPathComponent] isEqualToString:@"sandboxReceipt"]) {
        // We are in TestFlight, enable Bugsee!
        [Bugsee launchWithToken:@"your app token"];
    }
    return YES;
}

Debug vs. Release vs. Beta

In this setup you create and maintain three build flavors. The distribution method of the Beta flavor is not important; it can be done either through Ad-hoc distribution (see our tutorial on building an in-house Ad-hoc distribution using S3) or through TestFlight. In this setup you are back to detecting the configuration at compile time. Adding an additional configuration, however, is a multi-step process:

Go to your project Info, and under Configurations, create a new one by duplicating the Release one.

Go to Build Settings, and under Preprocessor Macros add BETA=1 to that new Beta configuration (note that this is actually the place where DEBUG=1 is set for the Debug build).

Voila! You can now detect a Release versus a Beta build at compile time. Other than the BETA flag, the builds are identical.

Let’s launch Bugsee in Debug and Beta only:

#if (DEBUG || BETA)
#import "Bugsee/Bugsee.h"
#endif
....
- (BOOL)application:(UIApplication *)application
didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // ...other initialization code

#if (DEBUG || BETA)
    [Bugsee launchWithToken:@"your app token"];
#endif

    return YES;
}

Let’s use that CocoaPods trick to install the library only in Debug and Beta builds:

pod 'Bugsee', :configurations => ['Debug', 'Beta']

The only thing left to do is to build the Beta configuration.

First of all, go to Product->Scheme->Edit Scheme and click on the Duplicate Scheme button. Name it properly (if your original scheme was called MyApp, call the new one MyApp Beta).

Edit the new scheme and change the configuration of the Archive step to use the Beta configuration.

That's it. Now you have two schemes, one for App Store and one for Beta releases. As long as you are developing and using Build & Run, they are identical, and it doesn't matter which one you are using. It does matter when you Archive, though: depending on whether you want to produce an IPA for the App Store or for Ad-Hoc distribution, remember to switch to the right one.
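
If you archive from the command line (on a build server, for example), picking the flavor is just a matter of the -scheme argument. A minimal sketch, assuming the workspace and scheme names used above:

$ xcodebuild -workspace MyApp.xcworkspace -scheme "MyApp Beta" archive   # Beta build
$ xcodebuild -workspace MyApp.xcworkspace -scheme "MyApp" archive        # App Store build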

iOS Ad-hoc distribution using Amazon S3

Why?

Every iOS developer at some point in their life is challenged with the task of distributing their awesome app to a group of loyal beta testers. Apple does not make it easy. The reasons for this are clear: we all care about security and greatly appreciate the fact that the iOS platform is much more secure than its main competitor, but it is still a problem that needs to be solved. There are services out there that try to automate the ad-hoc distribution process, but sometimes it is not desirable or even possible to use them. Thus, there is a need to bake an in-house distribution system.

To make the ad-hoc distribution work, one must:

  • Upload the IPA file to a place where it can be downloaded
  • Generate a special plist xml file, referring to the IPA above. This file must be accessible over https!
  • Generate an html file with a special Download link, pointing to the plist file above. This is the page users will see

Throughout this tutorial, we are going to use an application called Dishero as an example. We will host it directly on S3, without spinning up machines or configuring http servers. The result will look similar to this:

We will produce a nice webpage, hosted on S3 that will provide details and a one-click install for our application.

Prerequisites

Create an Ad-hoc distribution profile

First of all you will need to create an Ad-Hoc distribution profile in the Apple Developer portal.

Ad-Hoc distribution limits the ability to install your application to specific devices that have to be pre-registered. Instruct your beta users to visit http://whatsmyudid.com/ and follow the instructions there to obtain the UDID (Unique Device ID) of their device and send it to you.

Follow the pictures below to create an Ad-Hoc provisioning profile and add all the devices to it. Download it once ready and install it by double-clicking on it.

Create a bucket on S3 and enable web hosting on it

You will also need a dedicated bucket on S3 for this. Make sure you enable Web Hosting on your bucket as shown in the picture below.
Bonus: if you want your download page to be at a nice URL (we are using download.dishero.com for this example), you have to do a few tricks:

  1. Name your bucket accordingly - download.dishero.com
  2. In your DNS, configure a CNAME record download.dishero.com to point to download.dishero.com.s3.amazonaws.com.
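
In plain zone-file syntax (your DNS provider's UI may present this differently), that record would look something like:

download.dishero.com.  IN  CNAME  download.dishero.com.s3.amazonaws.com.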

Install S3cmd

We will need s3cmd to upload the files to S3 and set the right permissions.

Installing it using homebrew is easy:

brew install s3cmd

So is installing it using python’s pip:

sudo pip install s3cmd

…or you can just follow the manual process described on the s3cmd website.

Once you have it installed, run it

s3cmd --configure

and enter your AWS credentials when prompted. Credentials will be saved and we won’t need to state them explicitly anymore.

Where the magic happens

The script below does all the heavy lifting. It generates index.html and a proper plist file, and uploads them along with the IPA itself to S3.

#!/bin/bash
set -e
PROJECT="Dishero"
# bundle identifier referenced in the plist below; the value here is a
# placeholder (the original script left this definition out), set your own
BUNDLE="com.example.dishero"
FILE=$1
TARGET=$2
VERSION=$3
BUILD=$4
BUILD_TIME=`stat -f "%Sm" -t "%Y-%m-%d %H:%M:%S" $1`
SECURE_TARGET="https://s3-us-west-2.amazonaws.com/${TARGET}"
echo "Generating ${PROJECT}-${VERSION}-${BUILD}.plist"
cat > ./${PROJECT}-${VERSION}-${BUILD}.plist <<PLIST_DELIM
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>items</key>
<array>
<dict>
<key>assets</key>
<array>
<dict>
<key>kind</key>
<string>software-package</string>
<key>url</key>
<string>http://${TARGET}/${PROJECT}-${VERSION}-${BUILD}.ipa</string>
</dict>
</array>
<key>metadata</key>
<dict>
<key>bundle-identifier</key>
<string>${BUNDLE}</string>
<key>bundle-version</key>
<string>${VERSION}</string>
<key>kind</key>
<string>software</string>
<key>subtitle</key>
<string>${PROJECT}</string>
<key>title</key>
<string>${PROJECT}</string>
</dict>
</dict>
</array>
</dict>
</plist>
PLIST_DELIM
echo "Generating index.html"
cat > ./index.html <<INDEX_DELIM
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">
<title>${PROJECT} ${VERSION} (${BUILD})</title>
<style type="text/css">
body {background:#fff;margin:0;padding:0;font-family:arial,helvetica,sans-serif;text-align:center;padding:10px;color:#333;font-size:16px;}
#container {width:300px;margin:0 auto;}
h1 {margin:0;padding:0;font-size:14px;}
p {font-size:13px;}
.link {background:#ecf5ff;border-top:1px solid #fff;border:1px solid #dfebf8;margin-top:.5em;padding:.3em;}
.link a {text-decoration:none;font-size:15px;display:block;color:#069;}
.warning {font-size: 12px; color:#F00; font-weight:bold; margin:10px 0px;}
</style>
</head>
<body>
<div id="container">
<h1>iOS 7.0 and newer:</h1>
<p>${PROJECT} Beta ${VERSION} (${BUILD})</p>
<p>Built on ${BUILD_TIME}</p>
<div class="link"><a href="itms-services://?action=download-manifest;url=${SECURE_TARGET}/${PROJECT}-${VERSION}-${BUILD}.plist">Tap to install!</a></div>
<p><strong>Link didn't work?</strong><br />Make sure you're visiting this page on your device, not your computer.</p>
</div>
</body>
</html>
INDEX_DELIM
echo "Uploading ${PROJECT}-${VERSION}-${BUILD}.ipa to s3://${TARGET}"
s3cmd put -P $FILE s3://${TARGET}/${PROJECT}-${VERSION}-${BUILD}.ipa
echo "Uploading ${PROJECT}-${VERSION}-${BUILD}.plist to s3://${TARGET}"
s3cmd put -P ${PROJECT}-${VERSION}-${BUILD}.plist s3://${TARGET}/${PROJECT}-${VERSION}-${BUILD}.plist
echo "Uploading index.html to s3://${TARGET}"
s3cmd put -P index.html s3://${TARGET}/

Executing the script requires only a few parameters: an IPA file, a bucket path, a version and a build number.

# $PATH_TO_SCRIPT/upload_to_s3.sh <IPA_FILE> <TARGET_PATH> <VERSION> <BUILD>
$PATH_TO_SCRIPT/upload_to_s3.sh Dishero.ipa download.dishero.com/beta 2.5.0 1795

The command above will make the download page available at http://download.dishero.com/beta. Alternatively, if DNS is not under your control and you can’t do CNAME tricks to get a proper hostname, you will have to point your users to https://s3-us-west-2.amazonaws.com/bucketname/path/

Typing bidi text

I feel stupid asking it, but where can I learn the sacred secret knowledge of typing Hebrew text mixed with English, numbers and punctuation using modern editors and operating systems? Oh, and after typing, being able to copy-paste it to another app/window while preserving the right order of elements in a sentence. I can't remember the last time I did that; I think the most popular text editor in Israel was called Einstein back then, and copy-paste hadn't been discovered by humanity yet. In any case, I can't seem to be able to do it now, no matter what I try. All I need is to type two short sentences.

NGINX rewrites on Amazon Elastic Beanstalk

We run most of our backend code at Dishero on Amazon Elastic Beanstalk and it has been great so far. With Elastic Beanstalk we don't need to worry about provisioning new instances and auto-scaling; in most cases we just upload our app and Elastic Beanstalk takes care of the rest. However, in the rare cases when we actually want to tinker with the internals and make changes to the actual container, it turns out to be non-trivial and sometimes takes time to figure out.

Typical Elastic Beanstalk setup

In a typical auto-scaled Elastic Beanstalk setup, we have one Elastic Load Balancer and multiple instances that execute our code. The setup is as follows:

  • The Elastic Load Balancer is the one terminating the SSL connection; traffic to our instances is plain HTTP over port 80. You can find out more about setting up SSL certificates on Elastic Beanstalk here.
  • The instances are configured to forward all traffic from port 80 to 8080 using iptables.
  • Each instance has an NGINX running, which listens on port 8080 and forwards the traffic to our actual Node application.

The problem

We want to configure nginx to redirect all non-https traffic to https, and while we are at it, to redirect all non-www traffic to www (i.e. always push users to https://www.example.com/…).

  • It would also be nice to serve these as one redirect; a naive approach of writing two different rules (one for www and one for https) might result in two sequential redirects.
  • Since we are running the same configuration on multiple environments which have different base URLs, we do not want to hardcode the actual URLs in the rules, but rather keep them generic.

The solution

After several unsuccessful iterations we arrived at the following set of rules:

set $fixedWWW '';
set $needRedir 0;

# nginx does not allow nested if statements
# check and decide on adding www prefix
if ($host !~* ^www(.*)) {
    set $fixedWWW 'www.';
    set $needRedir 1;
}

# what about that https? the traffic is all http right now
# but elastic load balancer tells us about the original scheme
# using $http_x_forwarded_proto variable
if ($http_x_forwarded_proto != 'https') {
    set $needRedir 1;
}

# ok, so whats the verdict, do we need to redirect?
if ($needRedir = 1) {
    rewrite ^(.*) https://$fixedWWW$host$1 redirect;
}

So the question is: where should we put it?

The file that configures nginx to proxy traffic from 8080 to the application in the Elastic Beanstalk environment is located at /etc/nginx/conf.d/00_elastic_beanstalk_proxy.conf

Obviously, SSH'ing to all the instances, modifying the file and restarting nginx manually is of no use: the file will get overwritten the next time the app is deployed, and newly deployed instances won't have the changes either.
Luckily for us, Beanstalk allows us to customize the EC2 environment it provisions using configuration files. That system is pretty flexible and allows us not only to install yum packages and write or overwrite files in the system, but to run commands and shell scripts during app deployments as well.

We may be tempted to use the config files to overwrite the 00_elastic_beanstalk_proxy.conf file in /etc/nginx/conf.d directly, and then wonder where our changes went and why they are nowhere to be seen in the system. This might actually work well if all we wanted was to add new nginx configuration files, but the issue is with existing nginx files: during the deployment process, the customization stage happens before the default nginx files are installed into their place by the Elastic Beanstalk system, so even if we set up our copy of 00_elastic_beanstalk_proxy.conf, moments later it will still be overwritten with the default one. We need to overwrite that default one instead. The source location of these defaults is /tmp/deployment/config/, and the one we are mostly interested in is surprisingly named #etc#nginx#conf.d#00_elastic_beanstalk_proxy.conf

So eventually, after all the trial and error, the solution appears to be quite simple. The one thing that needs to be added to our project is the following nginx.config file inside our .ebextensions folder:

.ebextensions/nginx.config
files:
  "/tmp/deployment/config/#etc#nginx#conf.d#00_elastic_beanstalk_proxy.conf":
    mode: "000755"
    owner: root
    group: root
    content: |
      upstream nodejs {
          server 127.0.0.1:8081;
          keepalive 256;
      }

      server {
          listen 8080;

          set $fixedWWW '';
          set $needRedir 0;

          # nginx does not allow nested if statements
          # check and decide on adding www prefix
          if ($host !~* ^www(.*)) {
              set $fixedWWW 'www.';
              set $needRedir 1;
          }

          # what about that https? the traffic is all http right now
          # but elastic load balancer tells us about the original scheme
          # using $http_x_forwarded_proto variable
          if ($http_x_forwarded_proto != 'https') {
              set $needRedir 1;
          }

          # ok, so whats the verdict, do we need to redirect?
          if ($needRedir = 1) {
              rewrite ^(.*) https://$fixedWWW$host$1 redirect;
          }

          location / {
              proxy_pass http://nodejs;
              proxy_set_header Connection "";
              proxy_http_version 1.1;
              proxy_set_header Host $host;
              proxy_set_header X-Real-IP $remote_addr;
              proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          }

          gzip on;
      }

hexo

Spent the evening setting up a new blog engine as I couldn't revive the old one. This whole Ruby/Gem/Bundler mess is beyond me; it just never works. So this blog is now running on hexo. I am sure that won't make me write more, but it's Node.js/npm based, so at least I will be able to set it up again on a new machine if I suddenly do decide to.

What was it I wanted to blog about? I really can’t remember.

Google Reader

Since Google shut down Reader, I keep reading the same idea over and over in various sources: the idea that RSS is an old technology, that it was and is used only by geeks, and that it is obsolete and not needed anymore, as one is supposed to get one's fix of news through social media (Facebook, Twitter) and social reading apps like Zite, Flipboard, etc.

What bothers me most is I read these same ideas in blogs of pretty technical people whose opinions I do usually value.

The interesting part is, if it weren't for RSS, I wouldn't be reading these blogs in the first place, as it would be too much of a hassle for me to visit each of their individual sites to check whether they have generated some new content. I am also not going to subscribe to their Facebook or Twitter feeds for the occasional notifications, as they have too much noise for my taste. I don't care for their random 140-letter ideas, nor do I care about photos of their dinner; I only want their well-structured thoughts, the ones they actually spent some time on. So if RSS dies tomorrow, chances are I will still visit some of these sites, but the frequency will drop with time, and eventually I might even forget about some of them, or give up on regularly reading blogs altogether.

I do agree that social media and tablet/iPhone magazines do solve the problem of getting news, the latest internet memes, etc. I do use Flipboard, Facebook and Twitter myself. But these are things that will eventually pop up in one of these social feeds one way or another anyway; if I miss the post from site A, I will notice a repost from site B the next day or a week later. And even if I do miss some, who cares; not knowing doesn't really bother me.

But what about personal blogs? I know John Doe, I know he is writing about a topic that is close to my heart, I want to read absolutely everything he writes and I want to read it the day he writes it because I like to be part of the discussion as well. I also like to read everything in one place and I want to be able to save reading some of these articles for later. Is there any technology besides RSS today that fits all these requirements?

I will happily admit RSS is dead if you can point me in the direction of that alternative technology. I don’t think you can though.

Hello, Octopress!

I finally gave in to peer pressure and switched my blog to Octopress.

Apparently WordPress is not cool anymore and all the kids forgot PHP, uninstalled MySQL, went back to generating static pages and just outsource all the comment-management headaches to Disqus.

In the process of migration I decided I didn't really have any important posts or comments to move over from the old blog (which was mostly a mirror of my old LiveJournal account anyway), so I just left all of them behind and am starting from a clean sheet here.

Well, I actually did move one post, the only technical post I had there, and since the plan is for this new blog to be fairly technical, I decided it would be a good start.

Wish me luck and let's hope I will actually write.

Multiple working folders with single GIT repository

The problem

As much as I love GIT, there are several design decisions that make my everyday work really difficult. One such design flaw is making the repository and the working folder tightly coupled. It is a serious problem for every developer who frequently switches between several product/topic branches of the same project.

Git gives you two options to deal with this:

  • Create different local clones of the repository and check out a different branch in each one of them.
  • Do all your work in one local repository; every time you have a task switch, either stash or commit your changes on the current branch and check out a different one.

As I quickly learned, both solutions were not very practical for what I was doing. As a Linux kernel developer in a company that actually ships multiple Linux-based products, I eventually found myself working on and supporting several product branches, sometimes based on different versions of the kernel. One branch could be a plain vanilla kernel.org kernel based on 2.6.32, another one could be an old product running on 2.6.24 with tons of additional architecture-specific code that came from a vendor, etc. Both approaches had very serious drawbacks:

  • Cloning a kernel repository for each product/branch would take more disk space (every copy of the kernel repository is about 500MB); even though disk space is cheap today, you have to admit it doesn't make sense to keep several copies of the same binary blob on your disk (UPDATE 10/25/2010: That is actually not true, cloning locally from an existing repository is done via hard links, so no extra space is wasted). But the biggest problem is synchronizing all these local repositories. If I need to merge branches or cherry-pick individual commits between them, I end up pushing and pulling the changes through a shared remote repository, or connecting these local repositories as remotes to one another.
  • Doing all your work in one repository is not an option either; if you ever tried to check out a 2.6.24 Linux kernel branch while already having 2.6.35 in your folder, you'd understand why: it takes time. When you are multi-tasking you usually try to minimize the overhead of each switch, and having to stash your current work or create dummy commits every time you switch doesn't seem like the right way to do it. Another drawback of this approach is that there is no convenient way to do a nice visual diff between files and folders on two branches; yes, it's doable, but it is not as easy as running "meld folderA folderB".

The solution

I'm not going to take credit for the method I am going to show you; I wasn't the one who came up with it. Ed, one of the guys on my team, did. Ed actually developed a whole set of useful scripts for kernel development involving frequent switching between product branches, but the part about branching and multiple working folders is applicable to other projects as well, so that's what I want to share with you.
The trick is actually quite trivial: we use our knowledge of git internals and unix symbolic links to achieve what we want. All we want is to have one repository and multiple working folders associated with it, so let's do just that. This is how our final folder structure will look:

project
+-.repo
  +-.git
    +-branches
    +-config
    +-description
    +-HEAD
    +-hooks
    +-objects
    +-info
    +-packed-refs
    +-refs
+-branchA
  +-.git
    +-branches -> ../../.repo/.git/branches
    +-config -> ../../.repo/.git/config
    +-description
    +-HEAD
    +-index
    +-hooks -> ../../.repo/.git/hooks
    +-objects -> ../../.repo/.git/objects
    +-info -> ../../.repo/.git/info
    +-packed-refs -> ../../.repo/.git/packed-refs
    +-refs -> ../../.repo/.git/refs
+-branchB
+-branchC
...
etc

We created one wrapper "project" folder, one master GIT repository in .repo and a bunch of branchX folders. Each of these folders is a legal GIT folder by itself; however, it doesn't keep its own config and objects/refs databases, it links to the ones in .repo instead. It does keep a private local version of HEAD, index (staging area) and the whole working folder. Every manipulation of the database performed in any of the branch folders is immediately visible in the others (commits, branches, tags, remote changes); context switching is just a matter of changing folders now.

This is how we initialize this structure for the first time:

init.sh
#!/bin/bash
mkdir .repo
pushd .repo
git init
git remote add origin git://some.remote.url.../project.git
git fetch origin
popd
newfolder.sh branchA
pushd branchA
git checkout -b branchA origin/branchA
popd
newfolder.sh branchB
pushd branchB
git checkout -b branchB origin/branchB
popd

Below is the script we used to automate the creation of a new working folder (newfolder.sh). We can call it as many times as we want, whenever we need to create a new folder.

newfolder.sh
if [ "$1" == "" ]; then
echo "Need to specify target."
return
fi
TARGET=$1
mkdir $TARGET
pushd $TARGET
git init
pushd .git
for i in branches config hooks info objects refs packed-refs ; do
rm -rf $i
ln -sf ../../.repo/.git/$i $i
done
popd # .git
popd # $TARGET
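
A quick demonstration of the shared database in action, assuming the branchA/branchB layout from above:

cd project/branchA
git commit -am "fix a bug on branchA"
cd ../branchB
git log branchA -1    # the new commit is already visible here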

Enjoy! I find this method very useful already, but I’ll be glad to hear some feedback since I am sure someone will come up with a way to polish it and make it even better.

Update 10/25/10: I wrote the post yesterday, and today I accidentally found out that the trick described above has for a while been part of the standard git distribution and can be found in contrib/workdir/git-new-workdir :)