Typing bidi text

I feel stupid asking this, but where can I learn the sacred secret knowledge of typing Hebrew text mixed with English, numbers and punctuation using modern editors and operating systems? Oh, and after typing, being able to copy-paste it to another app/window while preserving the right order of elements in a sentence. I can’t remember the last time I did that; I think the most popular text editor in Israel was called Einstein back then, and copy-paste hadn’t been discovered by humanity yet. In any case, I can’t seem to do it now, no matter what I try. All I need is to type two short sentences.

NGINX rewrites on Amazon Elastic Beanstalk

We are running most of our backend code at Dishero on Amazon Elastic Beanstalk, and it has been great so far. With Elastic Beanstalk we don’t need to worry about provisioning new instances and auto-scaling; in most cases we just upload our app and Elastic Beanstalk takes care of the rest. However, in the rare cases when we actually want to tinker with the internals and make changes to the underlying container, it turns out to be non-trivial and can take time to figure out.

Typical Elastic Beanstalk setup

In a typical auto-scaled Elastic Beanstalk setup, we have one elastic load balancer and multiple instances that execute our code. The setup is as follows:

  • The Elastic Load Balancer terminates the SSL connection; traffic to our instances is plain HTTP over port 80. You can find out more about setting up SSL certificates on Elastic Beanstalk here.
  • The instances are configured to forward all traffic from port 80 to port 8080 using iptables (a sketch of such a rule follows this list).
  • Each instance has NGINX running, listening on port 8080 and forwarding the traffic to our actual Node application.
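
For reference, the port forwarding mentioned above boils down to a NAT rule along these lines (just a sketch; the exact rule Elastic Beanstalk installs may differ):

# redirect incoming traffic on port 80 to port 8080, where nginx listens
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080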

The problem

We want to configure nginx to redirect all non-HTTPS traffic to HTTPS and, while we are at it, to redirect all non-www traffic to www (i.e. always push users to https://www.example.com/…).

  • It would also be nice to do this as a single redirect; a naive approach of writing two separate rules (one for www and one for https) might result in two sequential redirects.
  • Since we are running the same configuration on multiple environments with different base URLs, we do not want to hardcode the actual URLs in the rules, but rather keep them generic.

The solution

After several unsuccessful iterations we arrived at the following set of rules:

set $fixedWWW '';
set $needRedir 0;

# nginx does not allow nested if statements
# check and decide on adding www prefix
if ($host !~* ^www(.*)) {
    set $fixedWWW 'www.';
    set $needRedir 1;
}

# what about that https? the traffic is all http right now
# but elastic load balancer tells us about the original scheme
# using $http_x_forwarded_proto variable
if ($http_x_forwarded_proto != 'https') {
    set $needRedir 1;
}

# ok, so what's the verdict, do we need to redirect?
if ($needRedir = 1) {
    rewrite ^(.*) https://$fixedWWW$host$1 redirect;
}

So the question is: where should we put it?

The file that configures nginx to proxy traffic from port 8080 to the application in the Elastic Beanstalk environment is located at /etc/nginx/conf.d/00_elastic_beanstalk_proxy.conf.

Obviously, SSH’ing into all the instances, modifying the file and restarting nginx manually is of no use: the file will get overwritten the next time the app is deployed, and newly provisioned instances won’t have the changes either.
Luckily for us, Elastic Beanstalk allows us to customize the EC2 environment it provisions using configuration files. That system is pretty flexible: it allows us not only to install yum packages and write or overwrite files in the system, but also to run commands and shell scripts during app deployments.
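
For example, a minimal (and purely hypothetical) .ebextensions config combining these capabilities might look like the following; the file name, package and command are made up for illustration:

# .ebextensions/01_example.config (hypothetical)
packages:
  yum:
    htop: []
container_commands:
  01_log_deploy:
    command: "echo deployed on $(date) >> /var/log/deploy-marker.log"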

We may be tempted to use the config files to overwrite the 00_elastic_beanstalk_proxy.conf file in /etc/nginx/conf.d directly, and then wonder where our changes went and why they are nowhere to be seen in the system. This actually works fine if all we want is to add new nginx configuration files; the issue is with the existing nginx files. During the deployment process, the customization stage happens before the default nginx files are installed into their place by the Elastic Beanstalk system, so even if we set up our copy of 00_elastic_beanstalk_proxy.conf, moments later it will still be overwritten with the default one. We need to overwrite that default one instead. The source location for these defaults is /tmp/deployment/config/, and the file we are mostly interested in is surprisingly named #etc#nginx#conf.d#00_elastic_beanstalk_proxy.conf

So eventually, after all the trial and error, the solution turns out to be quite simple: the one thing that needs to be added to our project is the following nginx.config file inside our .ebextensions folder:

.ebextensions/nginx.config
files:
  "/tmp/deployment/config/#etc#nginx#conf.d#00_elastic_beanstalk_proxy.conf":
    mode: "000755"
    owner: root
    group: root
    content: |
      upstream nodejs {
        server 127.0.0.1:8081;
        keepalive 256;
      }

      server {
        listen 8080;

        set $fixedWWW '';
        set $needRedir 0;

        # nginx does not allow nested if statements
        # check and decide on adding www prefix
        if ($host !~* ^www(.*)) {
          set $fixedWWW 'www.';
          set $needRedir 1;
        }

        # what about that https? the traffic is all http right now
        # but elastic load balancer tells us about the original scheme
        # using $http_x_forwarded_proto variable
        if ($http_x_forwarded_proto != 'https') {
          set $needRedir 1;
        }

        # ok, so what's the verdict, do we need to redirect?
        if ($needRedir = 1) {
          rewrite ^(.*) https://$fixedWWW$host$1 redirect;
        }

        location / {
          proxy_pass http://nodejs;
          proxy_set_header Connection "";
          proxy_http_version 1.1;
          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }

        gzip on;
      }

hexo

Spent the evening setting up a new blog engine as I couldn’t revive the old one. This whole Ruby/Gem/Bundler mess is beyond me; it just never works. So this blog is now running on hexo. I am sure that won’t make me write more, but it’s Node.js/npm based, so at least I will be able to set it up again on a new machine if I suddenly do decide to.

What was it I wanted to blog about? I really can’t remember.

Google Reader

Since Google shut down Reader, I keep reading the same idea over and over in various sources: that RSS is an old technology, that it was and is used only by geeks, and that it is obsolete and not needed anymore, as one is supposed to get their fix of news through social media (Facebook, Twitter) and social reading apps like Zite, Flipboard, etc.

What bothers me most is that I read these same ideas in the blogs of pretty technical people whose opinions I usually value.

The interesting part is, if it wasn’t for RSS, I wouldn’t be reading these blogs in the first place, as it would be too much of a hassle to visit each of their individual sites to check whether they had posted something new. I am also not going to subscribe to their Facebook or Twitter feeds for the occasional notification, as they have too much noise for my taste: I don’t care for their random 140-character ideas, nor do I care about photos of their dinner. I only want their well-structured thoughts, the ones they actually spent some time on. So if RSS dies tomorrow, chances are I will still visit some of these sites, but the frequency will drop significantly with time, and eventually I might even forget about some of them, or give up on regularly reading blogs altogether.

I do agree that social media and tablet/iPhone magazines do solve the problem of getting news, the latest internet memes, etc. I use Flipboard, Facebook and Twitter myself. But these are things that will eventually pop up in one of those social feeds one way or another anyway; if I miss the post from site A, I will notice a repost from site B the next day or a week later. And even if I do miss some, who cares; not knowing doesn’t really bother me.

But what about personal blogs? I know John Doe, I know he is writing about a topic that is close to my heart, I want to read absolutely everything he writes and I want to read it the day he writes it because I like to be part of the discussion as well. I also like to read everything in one place and I want to be able to save reading some of these articles for later. Is there any technology besides RSS today that fits all these requirements?

I will happily admit RSS is dead if you can point me in the direction of that alternative technology. I don’t think you can though.

Hello, Octopress!

I finally gave in to peer pressure and switched my blog to Octopress.

Apparently WordPress is not cool anymore and all the kids forgot php, uninstalled mysql, went back to generating static pages and just outsource all comment management headache to Disqus.

In the process of migration I decided I didn’t really have any important posts or comments I had to move over from the old blog (which was mostly a mirror of my old LiveJournal account anyway), so I just left them all behind and am starting from a clean sheet here.

Well, I actually did move one post, the only technical post I had there, and since the plan is for this new blog to be fairly technical, I decided it would be a good start.

Wish me luck and let’s hope I will actually write.

Multiple working folders with single GIT repository

The problem

As much as I love GIT, there are several design decisions that make my everyday work really difficult. One such design flaw is making the repository and the working folder tightly coupled. It is a serious problem for every developer who frequently switches between several product/topic branches of the same project.

Git gives you two options to deal with this:

  • Create different local clones of the repository and check out a different branch in each of them.
  • Do all your work in one local repository; every time you switch tasks, either stash or commit your changes on the current branch and check out a different one.

As I quickly learned, both solutions were not very practical for what I was doing. As a Linux kernel developer in a company that actually ships multiple Linux-based products, I eventually found myself working on and supporting several product branches, sometimes based on different versions of the kernel. One branch could be a plain vanilla kernel.org kernel based on 2.6.32, another could be an old product running on 2.6.24 with tons of additional architecture-specific code that came from a vendor, etc. Both approaches had very serious drawbacks:

  • Cloning a kernel repository for each product/branch would take more disk space (every copy of the kernel repository is about 500MB), and even though disk space is cheap today, you have to admit it doesn’t make sense to keep several copies of the same binary blob on your disk (UPDATE 10/25/2010: That is actually not true; cloning locally from an existing repository is done via hard links, so no extra space is wasted). But the biggest problem is synchronizing all these local repositories: if I need to merge branches or cherry-pick individual commits between them, I end up pushing and pulling the changes through a shared remote repository, or connecting these local repositories as remotes to one another.
  • Doing all your work in one repository is not an option either; if you have ever tried to check out a 2.6.24 Linux kernel branch while already having 2.6.35 in your folder, you’d understand why: it takes time. When you are multi-tasking you usually try to minimize the overhead of each switch, and having to stash your current work or create dummy commits every time you switch doesn’t seem like the right way to do it. Another drawback of this approach is that there is no convenient way to do a nice visual diff between files and folders on two branches; yes, it’s doable, but it is not as easy as running “meld folderA folderB”.

The solution

I’m not going to take credit for the method I am about to show you; I wasn’t the one who came up with it. Ed, one of the guys on my team, did. Ed actually developed a whole set of useful scripts for kernel development involving frequent switching between product branches, but the part about branching and multiple working folders is applicable to other projects as well, so that’s what I want to share with you.
The trick itself is quite trivial: we use our knowledge of git internals and unix symbolic links to achieve what we want. All we want is to have one repository and multiple working folders associated with it, so let’s do just that. This is how our final folder structure will look:

project
+-.repo
  +-.git
    +-branches
    +-config
    +-description
    +-HEAD
    +-hooks
    +-objects
    +-info
    +-packed-refs
    +-refs
+-branchA
  +-.git
    +-branches -> ../../.repo/.git/branches
    +-config -> ../../.repo/.git/config
    +-description
    +-HEAD
    +-index
    +-hooks -> ../../.repo/.git/hooks
    +-objects -> ../../.repo/.git/objects
    +-info -> ../../.repo/.git/info
    +-packed-refs -> ../../.repo/.git/packed-refs
    +-refs -> ../../.repo/.git/refs
+-branchB
+-branchC
...
etc

We created one wrapper “project” folder, one master GIT repository in .repo, and a bunch of branchX folders. Each of these folders is a valid GIT working folder by itself, but it doesn’t keep its own config and object/ref databases; it links to the ones in .repo instead. It does keep a private local copy of HEAD, the index (staging area) and the whole working tree. Every manipulation of the database performed in any of the branch folders is immediately visible in the others (commits, branches, tags, remote changes); context switching is just a matter of changing folders now.

This is how we initialize this structure for the first time:

init.sh
# create the master repository in .repo and fetch the remote
mkdir .repo
pushd .repo
git init
git remote add origin git://some.remote.url.../project.git
git fetch origin
popd

# create one working folder per branch (newfolder.sh is shown below)
newfolder.sh branchA
pushd branchA
git checkout -b branchA origin/branchA
popd

newfolder.sh branchB
pushd branchB
git checkout -b branchB origin/branchB
popd

The script we use to automate the creation of a new working folder is newfolder.sh. We can call it as many times as we want, whenever we need to create a new folder.

newfolder.sh
if [ "$1" == "" ]; then
echo "Need to specify target."
return
fi
TARGET=$1
mkdir $TARGET
pushd $TARGET
git init
pushd .git
for i in branches config hooks info objects refs packed-refs ; do
rm -rf $i
ln -sf ../../.repo/.git/$i $i
done
popd # .git
popd # $TARGET
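
A quick way to convince yourself that the working folders really do share a single object database (using the branchA/branchB folders from the layout above):

cd branchA
git commit -am "work in progress"   # commit in one working folder
cd ../branchB
git log -1 branchA                  # the new commit is immediately visible here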

Enjoy! I find this method very useful already, but I’ll be glad to hear some feedback, since I am sure someone will come up with a way to polish it and make it even better.

Update 10/25/2010: I wrote the post yesterday, and today I accidentally found out that the trick described above has been part of the standard git distribution for a while and can be found in contrib/workdir/git-new-workdir :)
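
For the record, that contrib script is invoked roughly like this (the paths below are just examples):

# create a new working folder "branchC" from an existing repository, checked out at origin/branchC
contrib/workdir/git-new-workdir /path/to/project/.repo branchC origin/branchC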