From van@lbl-csam.arpa  Sun Jan 31 00:03:16 1988
Posted-Date: Sun, 31 Jan 88 00:01:12 PST
Received-Date: Sun, 31 Jan 88 00:03:16 PST
Received: from LBL-CSAM.ARPA by venera.isi.edu (5.54/5.51)
	id AA19754; Sun, 31 Jan 88 00:03:16 PST
Received: by lbl-csam.arpa (5.58/1.18)
	id AA11040; Sun, 31 Jan 88 00:01:14 PST
Message-Id: <8801310801.AA11040@lbl-csam.arpa>
To: Jon Crowcroft <jon@cs.ucl.ac.uk>
Cc: end2end-interest@venera.isi.edu
Subject: Re: measurements 
In-Reply-To: Your message of Fri, 29 Jan 88 15:59:31 GMT.
Date: Sun, 31 Jan 88 00:01:12 PST
From: Van Jacobson <van@lbl-csam.arpa>
Status: R

Jon -

Your latest measurements are absolutely fascinating.  I was
(barely) able to ftp the goony & cnuce tests (I sure hope purple
isn't running the new tcp -- if it is, looks like I've got a lot
of tuning left to do).  I think I understand most of what's there
except for a clock mystery on the goony test. 

The clock mystery is that the ttcp output in your message said
the goony test took 693 seconds for a throughput of 2.88 kBps. 
The tcpdump output says the test took 770 sec for a throughput of
2.65 kBps.  I think there's evidence in the trace for the 2.65
kBps rate so I'll make a wild guess that it's the correct number
(but I don't understand why the ttcp & tcpdump output agree
exactly for the cnuce test and disagree by 10% on the goony test.
And the relative offsets between ego's clock and the clock of
whatever was running tcpdump changed between the two tests: The
clocks read 16:07 & 16:01 at the start of the goony test then
17:51 & 18:18 at the start of the cnuce test.  Are you running
timed or something that could change the clock while the test was
running?  I did check the tcpdump trace to make sure the clock
was monotone with no jump discontinuities but it's not possible
to check the ttcp machine since it printed only two numbers.)
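
The size of the discrepancy can be tallied in a few lines.  This is
only a sketch (Python rather than awk); it assumes the first clock
reading in each pair above is ego's and the second is the tcpdump
machine's, which the text implies by word order but does not state.
The helper name `minutes` is made up for the illustration.

```python
def minutes(hhmm):
    """Convert an 'HH:MM' clock reading to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

# Assumed order: (ego's clock, tcpdump machine's clock).
goony_offset = minutes("16:07") - minutes("16:01")   # ego ahead by 6 min
cnuce_offset = minutes("17:51") - minutes("18:18")   # ego behind by 27 min
offset_change = cnuce_offset - goony_offset          # relative shift between tests

# Elapsed time for the goony test as reported by each tool.
ttcp_sec, tcpdump_sec = 693, 770
gap_sec = tcpdump_sec - ttcp_sec
gap_frac = gap_sec / tcpdump_sec

print(goony_offset, cnuce_offset, offset_change)
print(gap_sec, round(gap_frac, 2))
```

Under that assumption the relative offset moved by about half an hour
between the two tests, and the 77 sec gap is the 10% disagreement
mentioned above.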

(BTW, my real reason for believing the 2.65 kBps rate is that I
have a model that predicted you would get 2.667 kBps throughput
for an echo test under ideal conditions.)

I think the situation is better than you stated for the Goony
test, even if we believe the lower rate.  With a user data size
(mtu) of 420 bytes, there should be 488 bytes going through the
simp for each data packet (2 frags, 1 w/20 byte tcp hdr, both
w/20 byte ip hdr & 4 byte butterfly hdr = 420 + 20 + 2*(20 + 4) =
488).  There's also an ack for every 2 data packets that
contributes another 44 bytes (20 tcp, 20 ip, 4 butterfly).  If we
charge each data packet for 1/2 an ack, the simp sees 488 + 44/2
= 510 bytes for every 420 bytes of user data.  Or, the total
unidirectional throughput through the simp is 21% higher than the
data throughput.  2.65 kBps * 8 = 21 kbps, and 21 kbps * 1.21 =
26 kbps in each direction.  So, with the echoed data coming back
the other way, the simp should have been handling around 52 kbps
(which doesn't seem too shabby). 
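
The byte accounting above is easy to re-check mechanically.  The
sketch below (Python, with illustrative variable names) uses only the
sizes quoted in the paragraph.  Reading the 52 kbps figure as twice
the one-way rate, because an echo test carries the same load in both
directions, is an assumption.

```python
# Per-packet byte accounting for the goony test.
USER_DATA = 420        # bytes of user data per data packet
TCP_HDR = 20
IP_HDR = 20
BUTTERFLY_HDR = 4
FRAGS = 2              # each data packet crosses the simp as two fragments

data_pkt = USER_DATA + TCP_HDR + FRAGS * (IP_HDR + BUTTERFLY_HDR)
ack_pkt = TCP_HDR + IP_HDR + BUTTERFLY_HDR
per_data_pkt = data_pkt + ack_pkt / 2      # one ack per two data packets

overhead = per_data_pkt / USER_DATA        # wire bytes per user byte
user_kBps = 2.65                           # throughput from the tcpdump trace
one_way_kbps = user_kBps * 8 * overhead
both_ways_kbps = 2 * one_way_kbps          # assumption: echo doubles the load

print(data_pkt, ack_pkt, per_data_pkt)
print(round(overhead, 3), round(one_way_kbps, 1), round(both_ways_kbps, 1))
```

Without intermediate rounding the one-way figure comes out a little
under 26 kbps and the two-way figure a little under 52 kbps.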

Since your trace included all the outbound traffic, we can cross
check this.  I put together an awk script to compute total bits
sent to the simp vs. time (the script is at the end of this
message).  A graph of its output shows a very nice line whose
slope turns out to be 26 kbps with a couple of brief (~30sec)
excursions to 30 kbps. 
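
One way to get a slope out of the script's output without plotting is
an ordinary least-squares fit of cumulative Kbits against time.  The
sketch below is in Python rather than awk; `fit_slope` is a
hypothetical helper, and the sample lines are synthetic, made up only
to exercise the code.  They are not data from the trace.

```python
def fit_slope(lines):
    """Least-squares slope (Kbits/sec) of '<p|a> <time> <Kbits>' lines."""
    pts = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue                      # skip anything malformed
        pts.append((float(fields[1]), float(fields[2])))
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_k = sum(k for _, k in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (k - mean_k) for t, k in pts)
    return sxy / sxx

# Synthetic input lying exactly on a 26 Kbit/sec line (NOT real data).
sample = ["p\t%7.2f\t%g" % (t, 26.0 * t) for t in range(10)]
slope = fit_slope(sample)
print(round(slope, 3))
```

With a real trace one would feed it the output of bps.awk, e.g.
`fit_slope(open("goony.out"))`, where the file name is again just a
placeholder.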

The script labels bits due to packets vs. bits due to acks so
you can plot them in different colors (at least, that's what I
did).  This shows a wonderful example of self-organizing
behavior: At around 20sec, the connection has gotten to
equilibrium.  Because of the way slow-start works, the acks &
data are pretty well interleaved.  But because of cookie crumb
effects, the acks almost immediately start diffusing towards the
"late" end of the rtt slot and, by 100sec, all the sends show up
at the early end of the slot & all the acks at the late end (you
can see this as a "staircase" effect in the plot that is barely
noticeable at 20sec and grows to be substantial at 120sec). 

We can measure the average amount of the slot occupied by the
acks (which seemed to be 1.6 sec) and the number of acks (which
was 8 at equilibrium).  We know that there's one ack generated
for every 2 packets and the acks are generated at exactly the
rate that data packets come out of the simp.  Thus 16 packets
times 488 bytes/packet times 8 bits/byte in 1.6 sec = 39,040 bps
= 38 kbps.  This is amazingly close to Karen's 37 kbps
theoretical max for one 64kb channel.  The agreement will be
comforting if goonhilly-echo uses one channel & a real mystery if
it uses two channels. 
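
The channel estimate above is just the measured numbers multiplied
out.  A minimal Python sketch with illustrative names follows.
Dividing by 1024 to get the "38 kbps" figure is an assumption,
borrowed from the 8/1024 scaling that bps.awk uses.

```python
ACKS_PER_SLOT = 8        # acks seen per rtt slot at equilibrium
PKTS_PER_ACK = 2         # one ack is generated for every 2 data packets
PKT_BYTES = 488          # bytes through the simp per data packet
ACK_SLOT_SEC = 1.6       # measured duration of the ack part of the slot

pkts = ACKS_PER_SLOT * PKTS_PER_ACK
bits = pkts * PKT_BYTES * 8
channel_bps = bits / ACK_SLOT_SEC
channel_kbps = channel_bps / 1024     # assumption: K = 1024 here

print(pkts, bits, round(channel_bps), round(channel_kbps, 2))
```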

This flow separation of the sends & acks (which showed up in your
earlier data) is where I got the 2.65 kBps prediction.  We say
that at equilibrium the sends & the acks are going to occupy
non-overlapping parts of an rtt slot of length R.  Say that at
equilibrium we generate n packets, each of total size P bits.  If
the channel bandwidth is B, the ack portion of the slot must have
duration nP/B.  Two packets will be sent on the receipt of each
ack so, if the ack spacing is preserved as the acks flow back
through the channel, the send slot will also have duration nP/B. 
However, because of the PODA piggybacking & the symmetry of an echo
test, the acks get jammed together & come out the other end only
one packet time apart rather than two (I can draw a picture of
why this should happen but I won't even attempt it on a terminal.
But measure the slope of the steep part of the stairstep & you'll
see it's true).  Thus the duration of the send slot is half the
ack slot.  Thus we have

    R = 1/2 nP/B + nP/B = 3/2 nP/B

but the effective throughput for the test is the amount of data
divided by the rtt, or nP/R.  Substituting the above expression
for the R, the nP's cancel and we end up saying the effective
throughput for an echo test will be 2/3 B. That says that for a
38 kbps channel you should have gotten 25.33 kbps.  A least-
squares fit to the tcpdump data says you got 26.43 kbps (and
we'll agree not to quibble about the extra 2% since it's an error
in the right direction). 
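
The slot model can be written out as a tiny function to confirm that
n and P really do drop out.  This is a sketch under the assumptions
stated above (ack slot nP/B, send slot half of that);
`echo_throughput` and its default arguments are illustrative only.

```python
def echo_throughput(channel_bps, n=16, pkt_bits=488 * 8):
    """Effective throughput nP/R for the echo-test slot model."""
    ack_slot = n * pkt_bits / channel_bps    # nP/B
    send_slot = ack_slot / 2                 # acks arrive one packet time apart
    rtt = send_slot + ack_slot               # R = 3/2 nP/B
    return n * pkt_bits / rtt                # comes out to 2/3 B

predicted_38 = echo_throughput(38_000)       # rounded channel estimate
predicted_39 = echo_throughput(39_040)       # unrounded estimate from the ack slot
measured = 26_430                            # least-squares fit quoted above

print(round(predicted_38), round(measured / predicted_38, 3))
print(round(predicted_39), round(measured / predicted_39, 3))
```

The ratios do not depend on whether k means 1000 or 1024, as long as
the same unit is used throughout.  Using the unrounded 39,040 bps
channel estimate instead of 38 kbps predicts about 26.03 kbps, which
is within about 2% of the fitted 26.43 kbps.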


The cnuce data could have been interesting (since it wasn't an
echo test) but was limited by the receiver advertising only a
4KB window.  The average send-to-ack time was 2.6 sec so the
maximum possible throughput was 4/2.6 = 1.5 kBps.  Several short
portions of the trace (short = ~30 sec) seemed to go at this rate
but, overall, it looks like you got the dismal 1.1 kBps because
of a very high error rate (2% or 97 packets resent for 4973
sent).  With this high error rate, it's going to be hard to get
good throughput even when the window is a reasonable size. 
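
The two cnuce figures above follow directly from the quoted numbers.
A short Python check with illustrative names:

```python
WINDOW_KB = 4              # receiver's advertised window
SEND_TO_ACK_SEC = 2.6      # average send-to-ack time from the trace

# At most one window of data can be outstanding per send-to-ack time.
max_kBps = WINDOW_KB / SEND_TO_ACK_SEC

resent, sent = 97, 4973
resend_rate = resent / sent

print(round(max_kBps, 2), round(100 * resend_rate, 2))
```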

 - Van

ps: It seems like a crime that your satnet connection is being
    cut off in two weeks, just at the time you're getting all
    these great measurements and things are starting to make
    sense.  Is there any way to get an extension?

---------------------------
# bps.awk:
# given a tcpdump ftp trace, output one line for each send
# in the form
#   <p or a> <send time> <amt>
# where <p or a> indicates whether this was a data packet or ack
# packet, <send time> is the time packet was sent (in seconds with
# zero at time of first packet) and <amt> is the number of Kbits
# seen so far.  We compute total bits.  I.e., appropriate sizes
# for the internet headers of data packets & ack packets are added
# to the measured amount of user data in the packet.
NR == 1 {
	n = split ($1,t,":")
	tim = t[1]*3600 + t[2]*60 + t[3]
	tzero = tim
	OFS = "\t"
	hdrsize = 68	# data pkt: 20 tcp + 2 frags * (20 ip + 4 butterfly)
	ahdrsize = 44	# ack: 20 tcp + 20 ip + 4 butterfly
}
{
	# convert time to seconds
	n = split ($1,t,":")
	tim = t[1]*3600 + t[2]*60 + t[3]
	if ($6 !~ /^ack/) {
		# get amount of data in the packet
		i = index($6,"(")
		# add it to the amount of header & update the running total
		nbytes += substr($6,i+1,length($6)-i-1) + hdrsize
		printf "p\t%7.2f\t%g\n", tim-tzero, nbytes*(8/1024)
	} else {
		# acks are pure header
		nbytes += ahdrsize
		printf "a\t%7.2f\t%g\n", tim-tzero, nbytes*(8/1024)
	}
}

