Understanding Git in 5 minutes

Git, it seems, is known for being confusing and difficult to learn. If you are transitioning from a “traditional” versioning system such as CVS or Subversion, here are the things you need to know:

  • A “working copy” in Subversion is a copy of the various files in a Subversion repository, together with metadata linking it back to the repository. When using Git, your working copy (referred to as the “working directory” or “working tree” in Git parlance) is actually hosted inside a local copy (clone) of the remote repository. To create this clone you use the “git clone” command. So, generally, you would use “git clone” where you would have used “svn checkout”.
  • A repository has a collection of branches, some of which may be remote-tracking branches (exact copies of the upstream repository), and the rest of which are local branches (which generally have an associated upstream and remote-tracking branch).
  • In Git you make changes by committing them to the local branch. You can later push these commits upstream, which (if successful) also updates the associated remote tracking branch in your repository.
  • But actually, commit is a two-stage operation in Git. First you stage the files you want to commit (“git add” command), then you perform the commit (“git commit”).
  • You can fetch any new changes from the remote repository into a remote-tracking branch, and you can then merge these changes into your local branch; you can combine both operations by doing a pull (“git pull”), which is the normal way of doing it.
  • A Git “checkout” replaces your working copy with a copy of a branch, so to switch branches you use the “git checkout” command. You cannot (or at least don’t normally) check out remote-tracking branches directly; you instead check out the associated local branch.
  • So actually a Git repository contains these different things:
    • A collection of branches, both local and remote-tracking;
    • A working copy
    • A “stash” (where uncommitted changes can be temporarily set aside) and an “index”, or staging area (holding the changes staged for the next commit)
    • Some configuration
    • (a few other things not mentioned here).
  • Except for the working copy itself, everything in a Git repository is stored in the “.git” directory within your working copy.
  • A “bare” Git repository doesn’t have a working copy (or stash). The data that would normally be inside “.git” is instead contained directly inside the repository root folder. A repository hosted on a server is often a “bare” repository; when you clone it, your clone will not be bare, so that you can perform checkouts / make commits etc.
  • Because you make commits to a local branch, a commit operation does not contact the origin server. This means that other commits may have been made to the origin repository in the meantime; a merge will then be needed before your local commits can be pushed to the remote repository.
  • Git versions represent a snapshot of the repository state. They are identified by an SHA-1 checksum (of both file contents and some metadata, including version creation date and author). A Git version normally has a single preceding version (its parent), but has multiple parents if it is the result of a merge.
  • To avoid complicating the commit history, you can rebase commits made to your local branch, which means that the changes they make are re-applied after the changes from remote commits. This re-writes the history of your commits, effectively making it appear that your commits were performed after the remote changes, instead of interleaved with them. (You shouldn’t rebase commits after you have pushed them). A sketch of the basic workflow follows.
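
To make that concrete, here is a minimal sketch of the day-to-day workflow (the repository URL and file name are invented, and I’m assuming the default branch is called “master”):

git clone https://example.com/some/project.git   # roughly where you'd have used "svn checkout"
cd project
git checkout master              # switch to (or stay on) the local "master" branch
git add somefile.c               # stage a change...
git commit -m "Fix something"    # ...and commit it to the local branch (no server contact)
git pull --rebase                # fetch remote changes and rebase your commits on top of them
git push                         # push your commits upstream (this also updates origin/master)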

C++ and pass-by-reference-to-copy

I’m sure many are familiar with the terms pass-by-reference and pass-by-value. In pass-by-reference a reference to the original value is passed into a function, which potentially allows the function to modify the value. In pass-by-value the function instead receives a copy of the original value. C++ has pass-by-value semantics by default (except, arguably, for arrays) but function parameters can be explicitly marked as being pass-by-reference, with the ‘&’ modifier.

Today, I learned that C++ will in some circumstances pass by reference to a (temporary) copy.

Interjection: I said “copy”, but actually the temporary object will have a different type. Technically, it is a temporary initialised using the original value, not a copy of the original value.

Consider the following program:

#include <iostream>

void foo(void * const &p)
{
    std::cout << "foo, &p = " << &p << std::endl;
}

int main (int argc, char **argv)
{
    int * argcp = &argc;
    std::cout << "main, &argcp = " << &argcp << std::endl;
    foo(argcp);
    foo(argcp);
    return 0;
}

What should the output be? Naively, I expected it to print the same pointer value three times. Instead, it prints this:

main, &argcp = 0x7ffc247c9af8
foo, &p = 0x7ffc247c9b00
foo, &p = 0x7ffc247c9b08

Why? It turns out that what we end up passing is a reference to a temporary, because the pointer types aren’t compatible. That is, a “void * &” cannot be a reference to an “int *” variable (essentially for the same reason that, for example, a “float &” cannot be a reference to a “double” value). Because the parameter is tagged as const, it is safe to instead pass a temporary initialised with the value of the argument – the value can’t be changed via the reference, so there won’t be a problem with such changes being lost due to them only affecting the temporary.

I can see some cases where this might cause an issue, though, and I was a bit surprised to find that C++ will do this “conversion” automatically. Perhaps it allows for various conveniences that wouldn’t otherwise be possible; for instance, it means that I can choose to change any function parameter type to a const reference and all existing calls will still be valid.

The same thing happens with a “Base * const &” and “Derived *” in place of “void * const &” / “int *”, and for any types which offer conversion, eg:

#include <iostream>

class A {};

class B
{
    public:
    operator A()
    {
        return A();
    }
};

void foo(A const &p)
{
    std::cout << "foo, &p = " << &p << std::endl;
}

int main (int argc, char **argv)
{
    B b;
    std::cout << "main, &b = " << &b << std::endl;
    foo(b);
    foo(b);
    return 0;
}

Note this last example is not passing pointers, but (references to) the objects themselves.

Takeaway thoughts:

  • Storing the address of a parameter received by const reference is probably unwise; it may refer to a temporary.
  • Similarly, indirectly storing a reference to a parameter received by const reference could cause problems.
  • In general, you cannot assume that the object referred to by a const-reference parameter is the one actually passed as an argument, and it may no longer exist once the function returns (see the sketch below).
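
To make the first point concrete, here is a small, contrived sketch along the lines of the earlier examples; “remember” and “stored” are just made-up names:

#include <iostream>

// 'remember' stores the address of its const-reference parameter for later
// use - exactly the unwise thing warned about above.
void * const *stored = nullptr;

void remember(void * const &p)
{
    stored = &p;  // if p is bound to a temporary, this dangles once the full expression ends
}

int main()
{
    int x = 42;
    int *xp = &x;
    remember(xp);  // int* is not void*, so p binds to a temporary void* copy
    std::cout << "&xp    = " << &xp << std::endl;
    std::cout << "stored = " << stored << std::endl;  // a different address: that of the dead temporary
    return 0;
}

The two printed addresses will generally differ, and dereferencing “stored” after the call’s full expression has ended gives undefined behaviour.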

Maven in 5 minutes

Despite having been a professional Java programmer for some time, I’ve never had to use Maven until now, and to be honest I’ve avoided it. Just recently, however, I’ve found myself needing to use it to build a pre-existing project, and so I’ve had to get acquainted with it to a small extent.

So what is Maven, exactly?

For some reason people who write the web pages and other documentation for build systems seem to have trouble articulating a meaningful description of what their project is, what it does and does not do, and what sets it apart from other projects. The official Maven website is no exception: the “what is Maven?” page manages to explain everything except what Maven actually is. So here’s my attempt to explain Maven, or at least my understanding of it, succinctly.

  • Maven is a build tool, much like Ant. And like Ant, Maven builds are configured using an XML file. Whereas for Ant it is build.xml, for Maven it is pom.xml.
  • Maven is built around the idea that a project depends on several libraries, which fundamentally exist as jar files somewhere. To build a project, the dependencies have to be downloaded. Maven automates this by fetching dependencies from a Maven repository and caching them locally.
  • A lot of Maven functionality is provided by plugins, which are also jar files and which can also be downloaded from a repository.
  • So, yeah, Maven loves downloading shit. Maven builds can fail if run offline, because the dependencies/plugins can’t be downloaded.
  • The pom.xml file basically contains some basic project information (title etc), a list of dependencies, and a build section with a list of plugins.
  • Dependencies have a scope – they might be required at compile time, test time, or run time, for example.
  • Even basic functionality is provided by plugins. To compile Java code, you need the maven-compiler-plugin.
  • To Maven, “deploy” means “upload to a repository”. Notionally, when you build a new version of something with Maven, you can then upload your new version to a Maven repository.
  • Where Ant has build targets, Maven has “lifecycle phases”. These include compile, test, package and deploy (which are all part of the build “lifecycle”, as opposed to the clean lifecycle). Running a phase runs all prior phases in the same lifecycle first. (Maven plugins provide “goals”, which can be bound to a phase via the “executions” section for the plugin in the pom.xml file, or run directly from the command line; see the example commands after this list).
  • It seems that the Maven creators are fans of “convention over configuration”. So, for instance, your Java source normally goes into a particular directory (src/main/java) and this doesn’t have to be specified in the pom file.
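
For illustration, driving those phases (and one plugin goal) from the command line looks like this:

mvn compile           # run the build lifecycle up to and including the "compile" phase
mvn package           # ... up to "package" (produces the jar)
mvn deploy            # ... up to "deploy" (uploads the artifact to a repository)
mvn clean             # the separate "clean" lifecycle
mvn dependency:tree   # invoke a plugin goal directly (here, from the maven-dependency-plugin)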

Initial Thoughts on Maven

I can see that using Maven means you have to expend minimal effort on the build system if you have a straightforward project structure and build process. Of course, the main issue with the “convention over configuration” approach is that it can be inflexible, actually not allowing (as opposed to just not requiring) configuration of some aspects. In the words of the Maven documentation, “If you decide to use Maven, and have an unusual build structure that you cannot reorganise, you may have to forgo some features or the use of Maven altogether”. I like to call this “convention and fuck you (and your real-world requirements)”, but at least it’s pretty easy to tell if Maven is a good fit.

However, I’m not sure why a whole new build system was really needed. Ant seems, in many respects, much more powerful than Maven (and I’m not saying I actually think Ant is good). The main feature that Maven brings to the table is automatic dependency downloading (and sharing, though disk space is cheap enough now that I don’t know if this is really such a big deal) – probably something that could have been implemented in Ant, or as a simple tool meant for use with Ant.

So, meh. It knows what it can do and it does it well, which is more than can be said for many pieces of software. It’s just not clear that its existence is really warranted.

In conclusion

There you have it – my take on Maven and a simple explanation of how it works that you can grasp without reading through a ton of documentation. Take it with a grain of salt as I’m no expert. I wish someone else had written this before me, but hopefully this will be useful to any other poor soul who has to use Maven and wants to waste as little time as possible coming to grips with the basics.

On Going 64-bit

Some History

My desktop system almost exclusively runs software that I compiled from source. It’s been that way for a long time, since I got frustrated with the lacklustre pace of Debian-stable software updates and with the package management system in general (this was 1998 or so). I began compiling packages from source rather than updating them via the Debian repository, and eventually decided to purge the remaining parts of Debian from my system, replacing Debian packages with compiled-from-source counterparts.

What can I say; I was young and optimistic. Surprisingly enough, I managed to get a working system, though compiling many packages was fraught with unwarranted difficulty and required various hacks to makefiles and build scripts. Linux From Scratch didn’t even exist yet, so I was mostly on my own when I ran into problems, but I persevered and it paid off. I had a nice, fast, working system and I had a good understanding of how it all fit together. Later, when Fedora and Ubuntu appeared on the scene (and Debian had got its act together somewhat) I felt no desire to switch to these fancy new distributions, because doing so would mean losing the strong connection with the system that I had pieced together by hand.

Sure, I faced some problems. Upgrading packages was difficult, and often required upgrading several of the dependencies. Uninstalling packages was a nightmarish procedure of identifying which files belonged to the package and deleting them one by one. I learned that I could use “make DESTDIR=… install” to output many packages into their own root, and eventually I wrote a shell script that would “install” packages by symbolically linking them into the root file system from a package-specific root, and could uninstall them by removing those links – so I had a kind of rudimentary package management (I cursed those packages which didn’t support DESTDIR or an equivalent; I often ended up needing to edit the makefile by hand). I usually compiled without many of the optional dependencies, and to a large extent I avoided the “dependency hell” that had plagued me while using Debian. I found Linux From Scratch at some point and used its guides as a starting point when I wanted to upgrade or install a new package. I kept notes of build options and any specific hacks or patches that I had used. Larger packages (the Mozilla suite and OpenOffice come to mind) were the most problematic, often incorporating non-standard build systems and having undocumented dependencies; to fix an obscure build problem I often had to tinker and then repeat the build process, which could run for hours, multiple times until I had managed to work around the issue (anyone who’s read this blog is now probably starting to understand why I have such a loathing for bad build documentation!).

Despite all the problems and the work involved, I maintained this system to the present day.

Modernising

When I started building my system, the processor was a 32-bit Pentium III. When an upgrade gave me a processor that was 64-bit capable, it didn’t seem like the effort of switching to the new architecture was justifiable, so I stuck with the 32-bit software system. Recently I decided that it’s time to change, so I began the task of re-compiling the system for the 64-bit architecture. This required building GCC as a cross-compiler, that is, a compiler that runs on one architecture but targets another.

Building a cross-compiler was not as easy as I had hoped it would be. GCC requires a toolchain (linker, assembler etc) that supports the target architecture. A GNU Binutils targeting a 32-bit architecture cannot handle the production of 64-bit binaries, so the first step (and probably the easiest) was to build a cross-target Binutils. This was really about as simple as:

./configure --prefix=/opt/x86_64 --host=i686-pc-linux-gnu --target=x86_64-pc-linux-gnu
make
make install

However, building GCC as a cross compiler is nowhere near as trivial. The issue is that GCC includes both a compiler and a runtime-support library. Building the runtime-support library requires linking against an appropriate system C library, and of course I didn’t have one of those, and I couldn’t build one because I didn’t have a suitable cross-compiler. It turns out, however, that you can build just enough of GCC to be able to build just enough of Glibc that you can then build a bit more of GCC, then the rest of Glibc, and finally the rest of GCC. This process isn’t formally documented (which is a shame) but there’s a good rundown of it in a guide by Jeff Preshing, without which I’m not sure I would have succeeded. (Some additional notes on cross compiling GCC are included at the end of this post).
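
Very roughly, and leaving out version numbers and several important details (which are in Jeff’s guide), the dance looks something like this; the paths are illustrative, following the Binutils commands above:

# 1. GCC, C/C++ compilers only (no runtime-support library yet):
../gcc-src/configure --prefix=/opt/x86_64 --target=x86_64-pc-linux-gnu --enable-languages=c,c++
make all-gcc && make install-gcc
# 2. Glibc headers and startup files, built with the new cross-compiler:
../glibc-src/configure --prefix=/opt/x86_64/x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
make install-bootstrap-headers=yes install-headers
make csu/subdir_lib          # produces crt1.o, crti.o, crtn.o (which then need installing by hand)
# 3. Back in the GCC build directory: the runtime-support library (libgcc):
make all-target-libgcc && make install-target-libgcc
# 4. Then the rest of Glibc, and finally the rest of GCC (libstdc++ and friends):
make && make install         # in the Glibc build directory
make && make install         # in the GCC build directory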

Now I had a working cross-compiler targeting the x86_64 platform. When built as a cross-compiler GCC and Binutils name their executables beginning with an architecture prefix, so for instance “gcc” becomes “x86_64-pc-linux-gnu-gcc” and “ld” becomes “x86_64-pc-linux-gnu-ld” (you actually get these names when building as a non-cross compiler, too, but in that case you also get the non-prefixed name installed). To build a 64-bit kernel, you simply supply some variables to the kernel make process:

make ARCH=x86_64 CROSS_COMPILE="/opt/x86_64/bin/x86_64-pc-linux-gnu-" menuconfig
make ARCH=x86_64 CROSS_COMPILE="/opt/x86_64/bin/x86_64-pc-linux-gnu-" bzImage
# and so on

The “CROSS_COMPILE” variable is the prefix applied to the name of each compiler/binutils utility when producing object code for the target. So for example “gcc” is prefixed to become “/opt/x86_64/bin/x86_64-pc-linux-gnu-gcc”.

I built the kernel, and booted to it. It worked! … except that it refused to run “init”, because I hadn’t enabled “IA-32 emulation” (the ability to run 32-bit executables on a 64-bit kernel). That was easily resolved, however.

I then set about building some more 64-bit packages: building Binutils as a 64-bit native package, re-building Glibc without the special --prefix, and building GCC as a 64-bit native compiler (which required first cross-compiling its prerequisites, including MPFR, GMP and MPC). All seems to be going ok so far.

Multi-lib / Multi-arch

One issue with having 32-bit and 64-bit libraries in the same system is that you have name clashes. If I have a 32-bit /usr/lib/libgmp.so.10.1.3, where do I put the 64-bit version? It seems that the prevalent solution is to put 64-bit libraries in /usr/lib64 (or /lib64) and leave only 32-bit libraries in plain /usr/lib and /lib. The Glibc dynamic linker hard-codes these paths when compiled for x86_64, it seems. So, when I build 64-bit packages I’ll use --libdir=/usr/lib64, I guess, but I don’t like this much; the division seems pretty unnatural, mainly in that it favours the 32-bit architecture and is somewhat arbitrary. Debian and Ubuntu both appear to have similar reservations and are working on a “Multiarch” spec, but for now I’ll go with the lib/lib64 division as it’s going to be less immediate hassle.

I also still have the issue that I can’t control which packages pkg-config will detect. I guess I can use the PKG_CONFIG_PATH environment variable to add the 64-bit paths when doing a 64-bit build, but I can’t prevent it from picking up 32-bit packages in the case where the 64-bit package isn’t installed, which I imagine could lead to some pretty funky build errors.
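
So for now, configuring a 64-bit package looks something like this (the paths are illustrative):

PKG_CONFIG_PATH=/usr/lib64/pkgconfig:/usr/share/pkgconfig \
    ./configure --prefix=/usr --libdir=/usr/lib64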

That’s about where I’m at. It wasn’t particularly easy, but it is done and it seems to work.

Notes on building GCC as a Cross Compiler

First, read the guide by Jeff Preshing.

I have the following additional notes:

  • If it’s not clear from Jeff’s guide, --prefix for your GCC cross-compiler build should be /xyz (or whatever you like) and --prefix for Glibc should be /xyz/$TARGET; in my case I used /usr/x86-64 and /usr/x86-64/x86_64-pc-linux-gnu. The GCC cross-compiler will expect to find libraries and include files in the latter (under …/lib and …/include).
  • I wanted multilib support, whereas Jeff’s guide builds GCC without multilib. For this you need the 32-bit Glibc available to the cross-compiler, in $GLIBC_PREFIX/lib/32 (being /usr/x86-64/x86_64-pc-linux-gnu/lib/32 in my case). I symlinked various libraries, but you might get away with just symlinking your entire /usr/lib as $GLIBC_PREFIX/lib/32. You also need $GLIBC_PREFIX/include/gnu/stubs-32.h (which you can link from /usr/include/gnu/stubs-32.h). See the commands after this list for an illustration.
  • During the Glibc build I got an error about an unresolved symbol, on something like __stack_chk_guard (foolishly I did not record the exact error). I got around this by configuring Glibc with ‘libc_cv_ssp=no’.
  • If you install 64-bit libraries into the standard /usr/lib and want to link against them when building with your cross-compiler, you’ll need to add -L/usr/lib to your linker flags (eg LDFLAGS=-L/usr/lib during configure).
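
The brute-force version of that symlinking looks something like this (using my prefixes from above; adjust to suit):

ln -s /usr/lib /usr/x86-64/x86_64-pc-linux-gnu/lib/32      # whole 32-bit /usr/lib as lib/32
mkdir -p /usr/x86-64/x86_64-pc-linux-gnu/include/gnu
ln -s /usr/include/gnu/stubs-32.h /usr/x86-64/x86_64-pc-linux-gnu/include/gnu/stubs-32.h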

I normally like Google, but…

I recently had to fill out the “Report a delivery problem between your domain and Gmail” form.

This is a server that I use personally. It is not an open relay and sends email only from me. I have checked the outgoing mail logs and there is no way, not a chance, that there has been spam from my domain to any gmail accounts.

Google, I don’t understand why you’ve blocked my server from sending email to gmail accounts. Perhaps the IP address was used by a spammer in the past, but that must have been years ago. I don’t understand, either, why you make this form (https://support.google.com/mail/contact/msgdelivery) so difficult to find and fill out; why, for instance, you ask for ‘results from your tests’ and then limit the field length so that it is impossible to provide those results.

I have now done everything within my power to ensure that my server cannot be seen as a spammer. I have set up reverse DNS records so that the IP address (***.***.***.***) correctly resolves to the hostname (******.***). I have added an SPF record for the site. In fact I have completely complied, always, with your “Best practices for forwarding emails to gmail” (https://support.google.com/mail/answer/175365?hl=en), and more recently with your “Bulk Senders Guidelines” (https://support.google.com/mail/answer/81126?hl=en) despite the fact that I am clearly not a Bulk Sender.

Please, at least, remove my server from your blacklist and allow the limited number of your users that I wish to contact to receive my emails.

Please also fix your form. And for pity’s sake please fix your 550 response so that it guides server admins to the form rather than requiring them to trawl the internet in search of it. I’d like to suggest, furthermore, that it’s not reasonable to blacklist a server and return SMTP 550 responses, without allowing the server administrator some means of discovering why their server is blacklisted.

Thank you.

After submitting the form, this text is displayed:
Thank you for your report. We will investigate this issue and take the necessary steps to resolve it. We will contact you if we need more details; however, you will not receive a response or email acknowledgment of your submission.

… so, I can’t even expect a response? Lift your game, Google. Lift your game.

Edit 17/July/2015: After having avoided the problem for some time by (a) occasionally using a different mail server and (b) not emailing Gmail addresses, I finally figured out the problem – after looking at the Postfix logs and noticing that the IP address for the Gmail relay was an IPv6 address, I re-configured Postfix to contact Gmail servers only via IPv4. Hey presto, it worked! It seems that I had reverse DNS for IPv4 but not IPv6, and the lack of reverse DNS is enough to make the Gmail relays refuse to accept mail.

I wondered about this. I had SPF set up, and surely that makes the reverse-DNS check unnecessary? Of course, there is always the following scenario:

  • I gain access to a mail server through which I can route mail (maybe it’s an open relay, or maybe I get access via some other means);
  • I set up a domain name, and specify (via an SPF record) that my relay is used to send email for that domain;
  • I spam away.

While this is certainly possible, it also seems to be easy to deal with, because it requires the spammer to purchase a domain and, given that emails can be verified as “originating” from that domain due to SPF, the domain can just be blacklisted. In any case I would think that the 550 response from the Gmail relay should include information on exactly why the message was refused, which would have saved me a lot of trouble.
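
As for the Postfix re-configuration mentioned in the edit above: forcing IPv4 for outgoing SMTP comes down to a setting in main.cf along these lines (shown as an illustration rather than the exact change I made):

# /etc/postfix/main.cf
smtp_address_preference = ipv4    # prefer IPv4 where a destination has both A and AAAA records
# or, more bluntly, disable IPv6 for Postfix altogether:
# inet_protocols = ipv4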

Sakura 2.4.0 build failure

I was just trying to build the Sakura terminal emulator, more or less following the instructions exactly. After running “cmake .”, I then ran “make”, at which point I got the following:

Unknown option: u
make[2]: *** [CMakeFiles/man] Error 1
make[1]: *** [CMakeFiles/man.dir/all] Error 2
make: *** [all] Error 2

Yeah, it’s not exactly the most helpful output, is it? This is what I hate about CMake and non-standard build systems in general. Eventually I figured out that the problem is in the file “CMakeFiles/man.dir/build.make”, where the target “CMakeFiles/man:” calls the “pod2man” utility (from that almighty piece-of-crap scripting language, perl) with a “-u” argument which my “pod2man” doesn’t recognize (perl 5.8.8). Removing that argument lets the build complete.
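
If you hit the same thing, a fix along these lines should do it (the exact text in build.make may differ, so check what the grep finds before editing blindly):

grep -n pod2man CMakeFiles/man.dir/build.make                  # find the offending pod2man invocation
sed -i 's/pod2man -u/pod2man/' CMakeFiles/man.dir/build.make   # then drop the unrecognised -u flag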


Schwartz did not. Innocent until proven otherwise.

OSNews is running an article with this headline:

De Icaza: Sun’s Schwartz Pitched Google Lawsuit to Oracle

Seriously. De Icaza is saying that Schwartz had actually pitched the idea of a Google lawsuit to Oracle; but he has nothing to back it up.

His report has been confirmed by James Gosling …

No it hasn’t. James Gosling’s blog post says nothing about Jonathan Schwartz whatsoever. In this community I’d expect a bit more restraint. Icaza is full of crap in this. All you have to do is read James’ blog post. He, I repeat, says nothing about Schwartz. Why is De Icaza pointing fingers?

Icaza specifically says:

So now we know that Jonathan shopped Sun with a big “Sue Google” sign. So much for his visionary patent defense against Apple and of course this jewel: …

Ehh, wrong. Icaza, you really are an idiot. Read James’ friggin’ post. OSNews, you are also idiots for reporting this.

Update 2012-05-07: So Google’s star witness in Oracle vs Google is… Jonathan Schwartz. So much for De Icaza’s credibility.