TCP/IP Addressing
From Wiki99
IP Addresses: How Computers Address Each Other
The first thing we need to know is how one computer on the internet manages to communicate with another computer. This is done through an IP address, the a.b.c.d style of address (for example 100.27.36.65) that you occasionally see.
A single network
The internet is, like its name suggests, a collection of networks. Each network is, conceptually, a single wire running through all the computers on that network.
In the old days there was literally one wire snaking through all the computers. Nowadays, for reasons of performance and reliability, when we use ethernet we use what are called hubs or switches. The ethernet cables from each computer all plug into a single box, but conceptually those computers are all still joined together with a single wire.
In the case of a wireless network, the "wire" connecting all the computers is more abstract, but the idea is still the same. In this case the "wire" is an agreement that all the computers using a single wireless network all use the same frequency and other signal characteristics when sending and receiving their radio signals.
Because these computers are all connected by a single conceptual wire, every message thrown onto that wire can be seen by every other computer. Each message has a header which gives the address of the destination computer. Each computer reads the header, sees if it is the destination computer, and if so reads the rest of the message.
|
As a technical detail, the address that is used in this case is not the IP address; it is a second type of address called the ethernet address. There are good technical reasons for these two levels of addressing, but they are not important at our level of detail; it is a good enough approximation to just treat this address as being an IP address. |
Routers
Connecting these different networks together are boxes called routers. A router is just a computer that has at least two networks connected to it. This computer can be a standard computer like a Mac or a PC, or it can be a small cheap specialized box, like an Airport base station, or it can be a large expensive specialized box like the Cisco routers used by large corporations. Each network connection on the router is called a network interface.
In a normal home situation, you might have something like this:
- You have a cable modem which is connected to a wire that also connects to a number of the neighboring houses and snakes all the way round your neighborhood to the main cable office. The cable modem box is not a router, it is just electronics that convert the type of electrical signal on a cable TV wire to the type of electrical signal that is used by an ethernet cable.
- You then have an Airport base station connected to the cable modem.
- Then you have two "real" computers in your house talking to each other wirelessly, say a tower and a portable.
In this case your home network actually consists of three computers, namely your two "real" computers and the Airport base station. The Airport base station does a number of things, one of which is that it acts as a router.
Now consider the problem of how a packet of data gets from one computer to another.
- Any packet you send from the tower to the portable goes there directly, problem solved. If the two were connected by wired ethernet, the packet would just travel over the wire. The two are connected wirelessly, but the principle is the same --- one of them radiates out the packet over its antenna, and the other receives the packet using its antenna.
- Now consider a packet that you want to send to the outside world, say to Google, from your laptop.
Your laptop, based on various tables it maintains, knows that the Google server is not on its local wireless network. Your laptop also knows that any data addressed to a computer not on the local network should be sent to the router for this network, ie to the Airport base station. So the packet gets sent to the router (via its wireless interface).
The Airport base station router is especially simple since it only has two network interfaces, so it knows that anything that comes in over one interface just needs to be sent out again over the other interface.
So the packet is copied over to its other interface (ie the ethernet interface that is connected to the cable modem) and is thrown onto the cable TV wire. The packet now runs along the cable TV wire until it gets to the head office where there will be another router.
This router is much bigger than our simple little base station, with many many wires (usually optical fibers going to various places around the country) connected to it, representing various networks. Based on tables inside this large router, it knows which of the wires connected to it goes, ideally, to Google. Usually no wire goes directly to Google, but the router tables will indicate that one of the wires will get the packet closer to Google. The router throws the packet onto the appropriate wire; the packet bounces around from one network to another, at each stage going through a router; and eventually it arrives at Google's network where one of Google's computers will claim it and process it.
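The table lookup that the large router performs can be sketched in Python. This is only an illustration under stated assumptions, not how any particular router is built: the table entries, the interface names and the `next_hop_interface` function are invented for the example. The notation `200.100.75.0/24` means "all addresses whose first 24 bits (the first three numbers) are 200.100.75", and `0.0.0.0/0` matches every address, which models the "this wire gets the packet closer" fallback. When several entries match, the sketch picks the most specific one, which is the usual convention.

```python
import ipaddress

# Hypothetical routing table: (destination network, outgoing wire).
# Networks and interface names are made up for illustration.
ROUTING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "fiber-to-upstream"),      # fallback: "gets closer"
    (ipaddress.ip_network("200.100.0.0/16"), "fiber-to-west"),
    (ipaddress.ip_network("200.100.75.0/24"), "fiber-to-office"),
]

def next_hop_interface(destination: str) -> str:
    """Return the wire whose network most specifically matches the destination."""
    addr = ipaddress.ip_address(destination)
    # Collect every table entry whose network contains the destination.
    matches = [(net, iface) for net, iface in ROUTING_TABLE if addr in net]
    # The most specific match is the one with the longest prefix.
    best_net, best_iface = max(matches, key=lambda m: m[0].prefixlen)
    return best_iface
```

Each router along the way repeats this kind of lookup with its own table, which is how the packet hops from network to network until it reaches its destination.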
IP address specifics
The above description was fairly vague. We need to look at this in just a little more detail to really understand what is going on.
How can a computer identify where it wants a packet of data to go? The answer is that each computer has what is called an IP address, a number that's unique to that computer, that identifies the computer.
|
But that's not quite an accurate description. What if a computer has two network interfaces, say ethernet and Airport, or two ethernet connectors? How would packets know anything about which of the two ethernet ports they're supposed to go over? The real answer is that it is not each computer but, in fact, each network interface that has an IP address. In practice most computers, even if they have multiple network interfaces (eg ethernet and Airport), only have one of these in use at a given time, and so one can loosely talk about "a computer having an IP address". |
Now consider an IP address like 200.100.75.6. This is not simply a random number. In particular (to simplify aggressively), the first three numbers are a "network ID" and the last number identifies a particular computer on that network. This is how your portable knows whether it can send outgoing packets directly to your tower, or whether it needs to send them to the router, which will then figure out what to do with them. If the "network ID" part of the destination IP address matches the "network ID" part of the sender's IP address, the sender just throws the packet onto the network and the recipient computer will see it; otherwise it sends the packet to the router, which will forward it.
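The decision just described can be written as a few lines of Python. This is a minimal sketch that keeps the same aggressive simplification (the network ID is always the first three numbers); the function names `network_id` and `next_stop` are invented for the example.

```python
def network_id(ip: str) -> str:
    """Return the first three numbers of a dotted address (the simplified network ID)."""
    return ".".join(ip.split(".")[:3])

def next_stop(sender: str, destination: str, router: str) -> str:
    """Return the address the sender should hand the packet to first."""
    if network_id(sender) == network_id(destination):
        return destination   # same network: throw it on the wire, the recipient sees it
    return router            # different network: let the router forward it
```

For example, `next_stop("200.100.75.6", "200.100.75.9", "200.100.75.1")` gives the destination itself, while a destination of `100.27.36.65` gives the router's address.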
This all has a very important, but slightly confusing, consequence: routers end up having multiple IP addresses, one for each network they are connected to. Suppose my office network is 200.100.75.x, and the office is connected to the internet via a DSL modem connecting it to some phone company network numbered 222.50.60.x; then the router at my office will have one IP address on the office network, say 200.100.75.1, associated with the interface that is connected to my office, and one IP address on the phone company network, say 222.50.60.27, associated with the interface that is connected to the DSL modem. The whole point of the router, the reason for its existence, is to accept packets handed to it at either of these addresses (ie 200.100.75.1 or 222.50.60.27) on their respective interfaces and, when a packet that is bound for the other network comes in, to copy it over to the other interface.
The important thing to learn from this is that at a bare minimum to do anything using the network your computer has to know two IP addresses.
- Your computer has to know its own IP address, so that when a packet runs past it over its network wire, it knows to read that packet and act on it.
- And your computer needs to know the IP address of the router on its network to know where to send all packets that are destined for the outside world rather than the local network.
(Note that your computer doesn't need to care about what other IP addresses the router may have on its other interfaces. Only the internals of the router care about that.)
|
There is actually a third piece of information it needs to know, the so-called subnet mask, which usually looks something like 255.255.255.0. This is a technical detail that is not essential to understanding how the system works so we'll ignore it. It is this subnet mask which tells your computer the split between the network ID (which we described above as always being the first three numbers in the IP address) and the ID of a specific computer on that network. |
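For the curious, the split that the subnet mask describes is a bitwise AND. The sketch below uses Python's standard `ipaddress` module only to convert between dotted notation and plain integers; the function names `network_part` and `host_part` are invented for the example.

```python
import ipaddress

def network_part(ip: str, mask: str) -> str:
    """AND the address with the mask, keeping only the network ID bits."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))

def host_part(ip: str, mask: str) -> int:
    """AND the address with the inverted mask, keeping only the computer's own number."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    # Masking with 0xFFFFFFFF keeps the inverted mask within 32 bits.
    return ip_int & ~mask_int & 0xFFFFFFFF
```

With the mask 255.255.255.0, the address 200.100.75.6 splits into network 200.100.75.0 and computer number 6, which is exactly the "first three numbers" rule. A different mask, such as 255.255.0.0, would move the split.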
So where do you get these two IP addresses from and what do you do with them? What you do with them is simpler so let's look at that first.
What to do with an IP address
The networking preference panel shows you a list of the various ways your computer could be connected to a network: for example ethernet, Airport, modem, USB, firewire. (You could even have multiple ethernet cards if you wanted.) The most common situation is to have only one of these active at a time, but in theory you could have all of them active at once. Each one could be connected to a different network, and could be sending and receiving packets. This might seem a little strange, but when you think about it, there's really no problem. Each network you are connected to simply sends packets to some particular address, and if your computer is set up to use some address for ethernet, and reads packets from ethernet, there's nothing to stop it using some other address for wireless and reading packets from wireless. Alternatively a large server might have eight ethernet cards, with eight IP addresses, simply because one ethernet connection isn't fast enough.
|
Once you have multiple active connections to different networks, you can then, if you want, switch on extra software to make your computer act as a router. Some people do this with PowerBooks. For example suppose you and your friends are in a hotel room with an ethernet jack. You can plug one of the PowerBooks into the ethernet jack, and set it up [using the Sharing preference panel] to also make the wireless network connection active, and to route packets between the wireless connection and the ethernet connection. Now all your other friends can see that wireless network and connect to the internet wirelessly, without needing a base station. |
So the important things to understand are that
- each connection to a network is independent
- each can be active at the same time as the others
- you can run routing software, if more than one interface is active (but you don't have to, and we won't talk anymore about this)
- (this is the important point) each active network connection needs to know its own IP address and the IP address of the router for the network to which it is connected
|
If you are a programmer or have an analytical mind you might be wondering: "What if I want to send out data? How does the operating system know which network interface to send it on?" For example, suppose I am connected to one network using ethernet card 1 and another network using ethernet card 2. I want to send out a packet to Google. What determines which ethernet card is used to send out that packet? At the programming level this is specified through the bind() API. At the user level, for most applications it simply doesn't matter; for applications where it does matter it can usually be specified through command-line arguments or some preference setting. For example ssh has a -b flag to allow you to specify this. |
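As a rough illustration of the bind() call mentioned in the note above, here is a minimal Python sketch using the standard `socket` module. The helper name `open_socket_on` is invented for the example, and the loopback address 127.0.0.1 is used only so that the sketch runs on any machine; in practice you would pass the IP address of whichever interface you wanted to use. Binding fixes the local (source) address of the socket; depending on the operating system, the routing tables may still have a say in which physical interface the packet leaves by.

```python
import socket

def open_socket_on(local_ip: str) -> socket.socket:
    """Create a UDP socket tied to the given local IP address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Port 0 asks the operating system to pick any free port.
    sock.bind((local_ip, 0))
    return sock

# 127.0.0.1 stands in for "the IP address of ethernet card 1" (an assumption
# made so the example runs anywhere).
sock = open_socket_on("127.0.0.1")
bound_ip, bound_port = sock.getsockname()
sock.close()
```

After the call, `bound_ip` holds the address the socket was bound to and `bound_port` the port the operating system chose.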
Where do IP addresses come from?
Manual IP addresses
Let's return to the question of where these two essential IP addresses per interface come from.
In the old days these were manually assigned. (At some offices or campuses they still are.) There would be a network administrator for each network who would have a list of all the IP addresses on that network, showing which ones were being used and which ones were not. When you acquired a new computer (or when you moved to a new building and thus a new network), before you could connect to the network you would apply for an IP address. The administrator for the network you wanted to connect to would look at the list to find a free IP number, and would give it to you along with the IP address of the router sitting on that network.
This manual process obviously led to all sorts of headaches. People would get confused and use the wrong numbers, the master list of IP addresses would get lost or out of date, people would have to wait a few days to get their number, and so on.
DHCP
To improve life, something called DHCP (dynamic host configuration protocol) was invented.
Double-click on any particular network interface in the network prefs panel, say Airport, and select the TCP/IP tab. You will see a menu listing ways to configure IPv4.
If you choose Manual configuration, you have to type in the IP address for your computer (more precisely the IP address for the Airport network interface of this computer), along with the IP address for the router on this network. (And the mysterious, not-explained-here, subnet mask.)
But the usual situation is that you don't use the Manual option, you use the DHCP option. If you choose DHCP, your computer will essentially send a packet onto the network saying "help, tell me who I am", and a reply will come back from a DHCP server giving your computer an assigned IP address and the IP address of the router for the network this interface is attached to. The DHCP server keeps track of all the details a person used to track, such as which addresses are being used by computers on the network and which addresses are currently not in use.
|
You might ask how this can work since your computer does not yet have any IP address information. How does it know what address to use to send a "help" packet to the DHCP server? The answer is that it uses a special IP address which is treated as a broadcast address, meaning it is listened to by every computer on the network. The "help" packet includes in it the ethernet address of the ethernet card that is asking for help. (The ethernet address of a given piece of ethernet hardware is burned into that hardware at the factory so you don't need to set it or ever change it. An ethernet address looks like 00:1a:95:b1:c2:99. You can see it in the ethernet tab of the network prefs pane for an ethernet connector.) One of the computers on this ethernet (usually it is the same box that is the router, but it doesn't have to be) is a DHCP server and it recognizes this sort of "help" packet. The DHCP server picks a free address and sends a reply packet containing that address along with the IP address of the router. It sends the packet back using ethernet addressing, not IP addressing. This same scheme works for Airport because Airport basically fakes things to look like an ethernet network (for example each Airport card has an address that looks just like an ethernet address). |
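The bookkeeping that the DHCP server takes over from the human administrator can be sketched as a small Python class. This is a toy model of the address list only, not the real DHCP protocol: the class name `ToyDhcpServer`, its methods, and the sample addresses are all invented for illustration.

```python
class ToyDhcpServer:
    """A toy address list keyed by ethernet address. Not the real DHCP protocol."""

    def __init__(self, router_ip, free_addresses):
        self.router_ip = router_ip
        self.free = list(free_addresses)   # addresses not currently in use
        self.assigned = {}                 # ethernet address -> IP address

    def handle_help(self, ethernet_address):
        """Answer a "help" packet with (assigned IP address, router IP address)."""
        if ethernet_address not in self.assigned:
            if not self.free:
                raise RuntimeError("no free addresses left")
            # Pick the first free address and remember who has it.
            self.assigned[ethernet_address] = self.free.pop(0)
        return self.assigned[ethernet_address], self.router_ip

    def release(self, ethernet_address):
        """Mark the address held by this ethernet address as free again."""
        ip = self.assigned.pop(ethernet_address, None)
        if ip is not None:
            self.free.append(ip)
```

For example, a server created with `ToyDhcpServer("200.100.75.1", ["200.100.75.2", "200.100.75.3"])` would answer the first "help" packet from `00:1a:95:b1:c2:99` with the pair `("200.100.75.2", "200.100.75.1")`.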
Alternatives to DHCP
Of course since DHCP uses ethernet-specific concepts (broadcast, ethernet addresses), it won't work on other types of network connections. Other types of network connectors use their own methods of giving you the IP information you need.
As one example, phone modems usually use PPP to connect to the internet. One aspect of PPP does basically the same thing as DHCP; as soon as the modem connection is made, your computer asks the remote modem it is connected to over the phone line to give it the IP address info it needs.
|
The network control panel for ethernet or Airport gives you two other ways to set up your IP address. BootP is a mostly obsolete predecessor to DHCP, but the same sort of idea. DHCP with manual address uses DHCP to pull in some addresses, but then allows you to override some of the information pulled in. |
Real world implications of DHCP for your server
Now the above is interesting and all, but how does it actually affect your server? Of course you need to know how this all works to be able to track down problems, but there is more to it than that.
When you sign up with an ISP, you usually have one of two choices. You can buy "home" service, which utilizes DHCP, or you can buy "business" service. ("Business" service is usually not what you want. It costs far more than home service, while frequently giving you a slower downlink connection. What you get for your extra dollars is a faster uplink connection, and a fixed IP address, also called a static IP address.)
Let's ignore routers, base stations and so on for now; suppose that you have your computer directly connected to a cable modem or a DSL modem using DHCP. One implication of this is that every time you disconnect your computer from the internet (for example when you switch it off or when it goes to sleep) the ISP's DHCP server will see that you are no longer using that IP address, and will mark it as free. Next time you connect, your computer will again send out a "help" packet and will get back what is probably a different IP address (though the IP address for the ISP's router will probably not have changed).
Moreover your DHCP assigned IP address usually comes with a lifetime attached to it, say 24 hours, so after 24 hours, even if your computer was never disconnected from the internet, it will again ask the DHCP server for IP information, and the IP address it gets back may be the one it was using or may be a different one.
In summary, unless you specifically purchase a fixed IP address, your IP address, as assigned by DHCP, will change occasionally. This matters because how are people going to connect to your server if its address keeps changing? We'll discuss how this is handled a little later when we talk about DNS.
NAT
Now when we talked about DHCP in the previous section, we assumed your computer was directly connected to the cable modem/DSL modem. But this is no longer the usual case. More usual is that you have an Airport base station plugged into the cable/DSL modem, and your computers have wireless cards.
As far as the ISP is concerned, there is one computer plugged into their network, just like before, that computer being the base station. The base station will use DHCP, just like any other computer, to acquire its IP information (for its cable-modem-ethernet interface) from the ISP's DHCP server; and just like any other computer, the IP address of the base station will change every so often. The cable-modem-ethernet connector on the base station will have an address like 200.20.85.65 assigned by the cable/DSL ISP's DHCP server.
Meanwhile, on its wireless connection, the base station acts as a DHCP server for the other computers, namely your tower and your portable, in your house. The base station assigns itself an IP address for its wireless interface (two common cases are 10.0.1.1 for Airport and 192.168.1.1 for Linksys), and when the other computers, your tower and your portable, send out "help" requests to get their IP info, they will be assigned addresses like 10.0.1.2 or 10.0.1.3, and told that the router for this wireless network is 10.0.1.1. So in this way the base station has learned how to communicate with the cable/DSL network, and the local machines have learned how to communicate with the base station.
But we've left out one essential detail. How can this, as I have explained it, possibly work? According to what I have said so far, we have base stations all over the world telling users' machines "your IP address is 10.0.1.2". Now if my portable has address 10.0.1.2 (assigned by my Airport base station) and the guy next door's iMac also has address 10.0.1.2 (assigned by his Airport base station), then if Amazon.com sends out a packet to me (using my 10.0.1.2 address), how does it get delivered to me and not to my neighbor?
There are two parts to answering this:
The first is that the IP addresses 10.x.x.x and 192.168.x.x (along with 172.16.x.x through 172.31.x.x) are special private addresses. Routers are supposed to never send packets with these addresses in them outside the local network. So I, using my portable with address 10.0.1.2, can send packets to my tower at 10.0.1.3 and receive them, but these packets will not be sent out by the Airport router to the outside world.
But, then, how do I talk to the outside world? More specifically, when I send a packet to the outside world, if the "reply-to" address that I put in the packet is 10.0.1.2, and addresses like that are not supposed to be used in the outside world, how does this all work?
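A check for whether an address is one of these private addresses might look like the Python sketch below. The three ranges listed are the standard reserved private ranges (10.x.x.x, 172.16.x.x through 172.31.x.x, and 192.168.x.x); the names `PRIVATE_RANGES` and `is_private` are invented for the example.

```python
import ipaddress

# The reserved private ranges, written in network/prefix-length notation.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.x.x.x
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.x.x through 172.31.x.x
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.x.x
]

def is_private(ip: str) -> bool:
    """Return True if the address should never appear outside a local network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PRIVATE_RANGES)
```

So `is_private("10.0.1.2")` is true for the portable, while `is_private("200.20.85.65")` is false for the base station's cable modem interface.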
The answer is something called NAT which stands for Network Address Translation. We'll discuss NAT in more detail later when we cover ports, but the bottom line is that when my portable (with address 10.0.1.2) tries to contact Amazon.com, the Airport base station will send the packets to the outside world rewriting the originating address so that it looks like the packets came from the base station, with an address (on its cable modem interface) of 200.20.85.65. Then when a reply packet arrives, the base station will forward it on to either the portable (at 10.0.1.2) or the tower (at 10.0.1.3) as appropriate. (This of course means that the base station has to maintain various tables so that it can figure out, for any particular incoming packet, whether that packet is a response to a request sent out by the portable [and should be forwarded to 10.0.1.2] or a request sent out by the tower [meaning it should be forwarded to 10.0.1.3]).
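The table-keeping described above can be sketched in Python. This is a toy model under stated assumptions, not the base station's actual implementation: the class `ToyNat`, its method names and the starting port number 50000 are invented for illustration. The sketch tells replies apart by port number, which is the usual approach.

```python
class ToyNat:
    """A toy translation table: public port <-> (private address, private port)."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 50000     # arbitrary first public-side port (an assumption)
        self.by_public_port = {}   # public port -> (private ip, private port)
        self.by_private = {}       # (private ip, private port) -> public port

    def outgoing(self, src_ip, src_port):
        """Rewrite the originating address of an outgoing packet and remember it."""
        key = (src_ip, src_port)
        if key not in self.by_private:
            self.by_private[key] = self.next_port
            self.by_public_port[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.by_private[key]

    def incoming(self, dst_port):
        """Return the private (address, port) a reply belongs to, or None if unknown."""
        return self.by_public_port.get(dst_port)
```

With `nat = ToyNat("200.20.85.65")`, a packet from the portable at 10.0.1.2 and one from the tower at 10.0.1.3 both leave looking as if they came from 200.20.85.65, but with different port numbers, so `nat.incoming(...)` can send each reply back to the right machine.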
NAT causes all sorts of complications for our home server as we will see. The ideal situation would be for the tower, the portable and the base station each to have its own distinct static IP address given out by the ISP. The problem is that
- ISPs insist on charging people extra money for extra IP addresses, and they insist on charging for static addresses, and
- doing this gets us back to the problem of coordinating available IP addresses that we discussed when introducing DHCP.
The current version of IP is called IPv4. There is a successor defined, called IPv6 which is slowly moving from research into the real world. These problems don't exist in IPv6, and one glorious day we'll all be using IPv6 and will be able to forget about NAT.
But that day has not yet arrived, so for now NAT is something that you have to understand if you are setting up a home server.

