Work

This page collects notes on papers, books, and some websites. Most work-related, notable web reading is captured via del.icio.us tagging; this is essentially a list of readings that might make good cites at some point.

Most of this has been gutted and moved to Papers.

2k8/8

"The Voice of the Turtle: Whatever Happened to AI?" Lenat. AI Magazine, Summer 2008.

"Electric Elves: What Went Wrong and Why." Tambe, Bowring, Pearce, Varakantham, Scerri, Pynadath. AI Magazine, Summer 2008.

2k8/5

A First Course in Database Systems. Chapter 5, The Database Language SQL. Ullman and Widom, 2007.

  • Familiarizing w/ SQL.
  • Subsections: Simple queries, subqueries, duplicates, aggregation, modifications, schema definitions, views, nulls and outerjoins, recursion.
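
A few of those subsections are easy to pin down with a running example; a minimal sketch using Python's sqlite3 (the tables and data are invented for illustration):

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (name TEXT, dept_id INTEGER, salary REAL);
    INSERT INTO dept VALUES (1, 'eng'), (2, 'sales');
    INSERT INTO emp  VALUES ('a', 1, 100), ('b', 1, 120), ('c', NULL, 90);
""")

# Aggregation with grouping (NULL dept_ids form their own group).
rows = conn.execute(
    "SELECT dept_id, AVG(salary) FROM emp GROUP BY dept_id").fetchall()

# Subquery: employees above the overall average salary.
above = conn.execute(
    "SELECT name FROM emp WHERE salary > (SELECT AVG(salary) FROM emp)"
).fetchall()

# Outer join keeps departments with no employees; NULLs fill the gaps.
outer = conn.execute("""
    SELECT dept.name, emp.name
    FROM dept LEFT OUTER JOIN emp ON emp.dept_id = dept.id
""").fetchall()
```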

The Next Mainstream Programming Language: A Game Developer's Perspective. Sweeney. POPL 2006 (presentation).

  • Some interesting discussion about the codebase for a typical game.
    • Three main segments of code: Gameplay simulation, numeric computation, shading
      • Gameplay: High-level, object oriented; tens of thousands of objects, each touching a handful of others when updated every 1/30th--1/60th of a second.
      • Numeric: Physics, path planning, sound, scene graphs; low level, high performance; essentially functional---taking one large input and producing a smaller, different output
      • Shading: Highly specialized, running on the GPU; extremely parallel already
  • The two major issues languages need to address: Concurrency, reliability
    • It turns out that the two often go hand-in-hand
    • Effective, reliable modularization needed in large products is very similar to guarantees needed for good concurrency
    • Also need to be able to support parallel object hierarchies, i.e. a new game reconfiguring a game engine's framework
  • Willing to trade some performance for ease-of-coding, reliability
    • Much more static and run time checking, guarantees, semantics
    • Interesting changes to integer data types recommended, performance cost feasibility shown
  • Lots of focus on features of Haskell, but doesn't like the syntax
    • Also, questions lazy evaluation, preferring lenient evaluation

An Integrated Framework for Enabling Effective Data Collection and Statistical Analysis with ns-2. Cicconetti, Mingozzi, Stea. WNS2 2006.

  • Important goals in network simulation w/ ns-2:
    • Not requiring people to write a lot of ad hoc, offline analysis code
    • Reducing simulation run time by reducing logging, encapsulating results
      • May want to watch just some traffic, not all the background traffic
      • Logging may be especially bad for performance in distributed setting
  • Describes ns2measure package, containing two main components
    • C++ classes for collecting data in ns-2
      • Stat class connects to probes, collects histograms
      • Stat object is global, can be called from any code; arguments are metric identifier, numerical identifier (e.g. a flow), and the sample value
    • Scripts & programs for doing post-analysis
      • Scripts take set of measures of interest, compute averages and confidence intervals
      • Manage independent replications---run a number of separate simulations, collecting the sample from each one, then taking the mean of those
      • Supports stats terminating simulations, e.g. stopping replications once confidence interval is within given bounds
  • Some interesting measures: Size of routing tables, size of contention windows, throughput, loss rate, queue length, end-to-end delay
  • Some interesting statistics: Histograms, means, counts, time averaged
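
The replication-and-confidence-interval machinery is simple to sketch outside ns-2; a hypothetical Python version (the metric and its distribution are invented, and the stopping rule mirrors the stat-terminated simulations mentioned above):

```python
import math
import random

def run_replication(seed):
    """Stand-in for one independent simulation run; returns one sample
    of some metric (e.g. mean end-to-end delay, in ms)."""
    rng = random.Random(seed)
    return 50.0 + rng.gauss(0, 5)

def replicate(n_runs, z=1.96):
    """Independent replications: one sample per run, then the mean and a
    ~95% normal-approximation confidence interval over those samples."""
    samples = [run_replication(seed) for seed in range(n_runs)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    half = z * math.sqrt(var / len(samples))
    return mean, (mean - half, mean + half)

def replicate_until(ci_half_width, batch=10, max_runs=200):
    """Stat-terminated simulation: add replications until the confidence
    interval is within the requested bound."""
    n = batch
    while n <= max_runs:
        mean, (lo, hi) = replicate(n)
        if (hi - lo) / 2 <= ci_half_width:
            return n, mean
        n += batch
    return n - batch, mean
```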

An Object-Oriented Representation for Efficient Reinforcement Learning. Diuk, Cohen, Littman, ICML 2008.

  • Presents a rule and object-oriented representation for reinforcement learning.
    • Simple object oriented paradigm used to describe world
    • Rule conditions are states in that world
    • Actions manipulate attributes and objects
  • Close relationships w/ Guestrin's relational RL
  • Compresses the state space a great deal, and generalizes
    • E.g., walls are basically the same, and I should learn about all of them anytime I interact with one of them
  • Great example taken from Atari's Pitfall, played through an emulator
  • Future work: More generalization, object hierarchies

VLDB review---distributed asynchronous continuous query checkpointing.

VLDB review---external flash memory sorting.

2k7/12

IP Performance Metrics (IPPM) for spatial and multicast. Stephan, Liang, Morton. draft-ietf-ippm-multimetrics-05.txt

  • Detailed notes here.

2k7/11

An Ethernet Address Resolution Protocol. Plummer. http://www.ietf.org/rfc/rfc826.txt

  • Apparently still current?

2k7/9

HOLSR: A Hierarchical Proactive Routing Mechanism for Mobile Ad Hoc Networks. Villasenor-Gonzalez, Ge, Lamont. IEEE Communications, July 2005.

  • Basic point 1: Ad hoc networks, and perhaps especially proactive ones due to the state they must keep and flood around the network, do not scale well. One way to address this is to break the space up, so not all nodes are keeping track of all nodes.
  • Basic point 2: Current protocols are not well designed for heterogeneous networks, consisting of nodes with different radio range, throughput, memory, etc
  • Hierarchical OLSR runs several OLSR networks, each associated with a layer in a hierarchy defined by a group of nodes sharing a particular communications band
    • For example, small sensors may form clusters at the bottom, WiFi a mid layer, and satellite the top layer
  • Nodes with clusters under them must know all the details of that network, but other networks need not. This improves scalability; such nodes generally associated with more powerful computing platforms anyway
    • Does not require a particular addressing scheme

Wiring Vietnam. Tambini. Scarecrow Press, 2007.

  • Documents very early sensor network/field deployments and development throughout the Vietnam War.
  • Good look at operations, structure, and successful uses of sensor networks.
  • Some talk of current developments and usages.

2k7/8

Allowing Errors in Speech Over Wireless LANs. Chakeres, Dong, Belding-Royer, Gersho, Gibson. Workshop on Applications and Services in Wireless Networks (ASWN), 2004.

  • Basic point:
    • 802.11 MAC discards packets with any error in them.
    • For some applications, like speech and voice, this is unnecessary; they can tolerate some error
    • What can be modified to enable this, and how well does it perform?
  • Varying amounts of protected bytes are required
    • MAC & IP Headers have to be protected, otherwise packet may not be delivered correctly
    • Some parts of the application data must also be protected, such as sequence numbers, frame counts, metadata, and other identifiers
    • Accomplished here by altering the MAC to protect parts of data (corresponding to VOIP traffic)
    • Could be done by modifying MAC to protect header, allowing IP checksum over its header, and using UDP Lite or similar package to protect header and parts of payload
  • Performance is not overwhelmingly better here though
    • Perhaps it would be over multiple hops or lesser quality links?
    • Congestion is notably eased, however, by requiring fewer retransmits after discards
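
The UDP Lite-style idea in the last bullet can be sketched directly: checksum only the headers and the error-sensitive prefix, so bit errors in the remaining speech samples don't force a discard. A toy Python illustration (not the actual RFC 3828 wire format; the header bytes are invented):

```python
def checksum16(data: bytes) -> int:
    """Simple 16-bit ones'-complement sum, as used by IP-family checksums."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF

def accept(packet: bytes, coverage: int, expected: int) -> bool:
    """Accept the packet if the *covered* prefix checks out; errors in
    the uncovered tail (e.g. voice samples) are tolerated."""
    return checksum16(packet[:coverage]) == expected

header = b"\x12\x34\x56\x78"   # stand-in for protected headers/metadata
payload = bytes(16)            # speech samples, tolerant of errors
pkt = header + payload
cov = len(header)              # protect only the header
want = checksum16(pkt[:cov])

corrupted = bytearray(pkt)
corrupted[10] ^= 0xFF          # flip bits in the unprotected payload
```

With the corruption confined to the uncovered payload, `accept` still returns True; a flipped header bit would fail the check.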

Generalized MANET Packet/Message Format. Clausen, Dearlove, Dean, Adjih. draft-ietf-manet-packetbb-08.

  • NOTE: Version 9 now exists, but does not change any significant content except noting that child protocols may choose how to handle malformed packets.
  • Defines a general format for control information packets, intended to be used by MANET routing protocols, etc.
  • Based around blocks of addresses and TLV structures
  • Comments to MANET group are here.

MANET Neighborhood Discovery Protocol (NHDP). Clausen, Dearlove, Dean, OLSRv2 Design Team. draft-ietf-manet-nhdp-04.txt

  • Outlines a relatively simple protocol for nodes to determine their two-hop neighborhood.
    • Done by locally broadcasting all known neighbors; receiving nodes then know their neighbors' neighbors
  • Based on PacketBB, using address blocks and TLVs to convey information
    • HELLO messages may carry other information, e.g. MPR info
  • Some provision for quality information, but no specifics, left up to implementation
    • Possibilities: Using announced times for next message to count missing messages, link layer info, etc
  • Comments to MANET group are here.
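
The HELLO mechanism in the first bullet fits in a few lines; a toy Python model (no PacketBB encoding, topology invented) where each node broadcasts its neighbor set and receivers derive their two-hop neighborhood:

```python
# Symmetric links in a toy topology: a-b, b-c, c-d.
links = {("a", "b"), ("b", "c"), ("c", "d")}

def neighbors(node):
    return ({y for x, y in links if x == node}
            | {x for x, y in links if y == node})

def hello(node):
    """A HELLO message: the sender plus all neighbors it currently knows."""
    return node, neighbors(node)

def two_hop(node):
    """Receive HELLOs from each neighbor; their neighbors (minus us and
    our own one-hop set) are the two-hop neighborhood."""
    n1 = neighbors(node)
    n2 = set()
    for nbr in n1:
        _sender, their_neighbors = hello(nbr)
        n2 |= their_neighbors
    return n2 - n1 - {node}
```

For node "a", the only HELLO heard is b's, which announces {a, c}, so a's two-hop neighborhood is {c}.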

User Datagram Protocol. Postel. RFC 768.

  • UDP, w00t!

Internet Protocol: DARPA Internet Program Protocol Specification. Postel. RFC 791.

  • Discusses the basic IP model and packet format.

Broadcasting Internet Datagrams in the Presence of Subnets. Mogul. RFC 922.

  • Discusses addressing of various classes of broadcast messages, with slightly more attention on subnet issues.
  • Does not indicate what source should be chosen for multi-interface broadcasts.

Broadcasting Internet Datagrams. Mogul. RFC 919.

  • Discusses addressing of various classes of broadcast messages.
  • Does not indicate what source should be chosen for multi-interface broadcasts.

Simplified Multicast Forwarding for MANET. Macker, SMF Design Team. draft-ietf-manet-smf-05.

Review for WIDM 2007. Comments in email.

IP over 802.16 Problem Statement and Goals. Lee et al, IETF Internet Draft (Feb 2007 ver)

  • 802.16 stops, as usual, at physical and link layers. WiMAX Forum and WiBro defining network architecture above those layers.
  • 802.16 MAC is point-to-point and connection oriented, which complicates this
  • Each node has a 48-bit MAC address, including the base station
  • Nodes have no connection upon entry to the network; must create them
  • Uplink is always unicast
  • PDU (Protocol Data Unit) contains no source and destination addresses; only CID (Connection ID)
  • IP Specific Subpart of 802.16 Packet Convergence Sublayer supports IP over point-to-point connection
  • This draft includes a good terminology section

Applied Cryptography. 2nd edition. Schneier. Wiley & Sons. Chapters 1--8, 10, 24.

  • Chapter 1: Foundations
  • Chapter 2: Protocol Building Blocks
  • Chapter 3: Basic Protocols
  • Chapter 7: Key Length
  • Chapter 8: Key Management
  • Chapter 10: Using Algorithms
  • Chapter 24: Example Implementations

2k7/7

Policy-Based Network Management for NeXt Generation Spectrum Access Control. Perich. IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks, 2007.

  • Need to overcome harmful interference from malfunctioning devices or malicious users
  • Defines an OWL-based policy language for various capabilities of software defined radios
    • Uses SWI-Prolog on desktops, ad hoc reasoner in radios (maybe just XML parsing?)
    • SWRL also used
  • Policies and configuration decisions pushed around using NETCONF

Performance of Routing Protocols in Large-Scale Mobile Wireless Ad Hoc Networks. Zhang, Riley. IEEE Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2005)

  • Uses GTNetS to evaluate AODV on very large networks under various parameters
    • 10000 to 50000 nodes
    • Increased mobility, by either faster speed or shorter pause times
    • Increased traffic load
  • Metrics: Packet delivery ratio, end-to-end latency, control overhead, average hop count
  • Results:
    • Within that range of topology size, there is little effect on packet delivery ratio; it's already hard enough to begin with
      • Would be nice to show that statistically the difference is insignificant
    • Traffic load has substantial effect on packet delivery ratio, presumably due to collisions
    • Speed has little effect on packet delivery ratio; pause time has significant effect
    • Pause time has greatest effect on end-to-end delay of any parameter
    • Not clear how control overhead grows; appears linear, but rises sharply at 50k nodes
    • MAC layer packet loss is very sensitive to mobility
  • Includes short description of AODV and various possible enhancements, though it doesn't do much to look at those enhancements
  • General lack of solid statistics here, i.e. confidence or even number of trials

Opinion: Is It Time To Replace SMTP? Crocker. IP Journal; Volume 10, number 2 (June 2007).

  • Key point: SMTP hasn't changed in a long time. It's good at what it does, but should it do more?
  • Rather than evolving SMTP, people have moved toward more centralized, non-standard solutions for other comms modalities like forums, instant messaging and wikis
  • But, mail is definitely different; if you're not looking for/expecting communication, forums don't work; need to be able to push messages
  • How do those concepts and usage models relate to mail?

More ROAP. IETF Journal, IETF 68, May 2007, volume 3, issue 1

  • Network Management Research Group (nmrg) and European Network of Excellence for the Management of Internet Technologies and Complex Services (EMANICS) interest in uncertainty and probabilistic approaches as well as ontologies for network management
  • Separating identity and location is a huge, looming issue
    • Important to reduce the information associated with locators so that they reflect primarily network topology
    • Need to be able to aggregate them efficiently
    • Important features: Multi-homing across multiple providers, ease of provider switching and locator renumbering, mobility, roaming, session resilience, and traffic engineering
  • Clean slate approach is pretty questionable, how could you ever get it moving?
  • BGP is unusual in that it works w/ TCP's send queue, compressing state by removing obsoleted data waiting in the send queue
    • This has to be done on a host-by-host basis (as opposed to data/message) to avoid slowing uncongested hosts
  • Difficult to make richer endpoints in the face of simple devices such as RFID tags

Improving 802.11 Range with Forward Error Correction. Riemann, Winstein. Technical Report MIT-CSAIL-TR-2005-011, 2005.

  • Ethernet is restricted by delay, not signal power; combined with CSMA/CD, this means errors come from congestion, not noise.
  • In wireless settings lots of things can cause noise; adding FEC redundancy can recover from bit errors in received packets
  • FEC makes even more sense for 1-way links as well as when latency is important; doesn't make sense for 2-way links when latency does not matter
  • According to authors, at the time only Intersil Prism 2/2.5/3 chipset cards have a method to retrieve frames with errors
    • MadWifi for Atheros might now as well
    • Used modified Host AP driver to catch erroneous frames
    • Used broadcast to avoid retransmits (most 802.11 cards retransmit 4 times)
  • Scrambles the message to mitigate problems with contiguous sequences of errors, which would overwhelm that coding block.
  • Gets distances up to half a mile plain, and 0.87 miles w/ Reed-Solomon coding, with a 200mW transmitter.
  • Notes that RoofNet did not have nearly the same success; hypothesizes that this is because line-of-sight links are fundamentally different from urban multi-path.
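
The scrambling step above is essentially block interleaving: write symbols row-wise into a matrix and read column-wise, so a contiguous burst on the wire lands in many different coding blocks. A minimal Python sketch (the paper's actual interleaver may differ):

```python
def interleave(data, rows):
    """Row-wise write, column-wise read over `rows` rows."""
    cols = len(data) // rows
    assert rows * cols == len(data), "pad input to a full matrix first"
    return bytes(data[r * cols + c] for c in range(cols) for r in range(rows))

def deinterleave(data, rows):
    """Inverse permutation: column-wise write, row-wise read."""
    cols = len(data) // rows
    return bytes(data[c * rows + r] for r in range(rows) for c in range(cols))

msg = bytes(range(12))
mixed = interleave(msg, rows=3)

# A 3-byte burst in the interleaved stream touches 3 different rows
# (coding blocks) after deinterleaving, instead of hitting one block 3 deep.
```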

The Google File System. Ghemawat, Gobioff, Leung. SOSP 2003.

  • Design for failure; file system is built on commodity hardware, some percentage of which is bound to have a fault at some point and/or fail permanently, and at large scale that means a lot of devices will fail.
  • Examine assumptions---very large files and predominantly append-only writes implies different strategies for disk access, block structure, atomicity guarantees, and optimization.
  • Focus on sustained bandwidth rather than low latency.
  • Support short read/write operations in random places, but don't optimize for them.
  • Non-standard operations:
    • Snapshot: Copy a file or directory tree at low cost
    • Record Append: Allow concurrent, atomic appending to a file. Simplifies apps by removing locking logic. Useful for multi-way merge & producer/consumer queues with many producers.
  • Clients get metadata from central master server; talk directly to chunkservers to perform reads/writes
  • Chunkservers ultimately have final say over what they're storing, and there are many cases such as failure where the master cannot control it.
    • Therefore, use soft state on the master rather than trying to keep a consistent view.
  • Master state periodically written to disk in direct memory form; no need to parse upon reload.
    • Once operation log grows too big, it is checkpointed in this fashion.
  • Consistency model is somewhat simplified compared to DB semantics
    • Consistent: All clients will always see the same data, regardless of which replica is used
    • Defined: Consistent, and clients will see the written data in its entirety

The Art of Computer Systems Performance Analysis. Jain. Wiley Press, 1991. Chapters 1--3, 13.

  • Chap 1: Basic overview of performance evaluation.
  • Chap 2: Common mistakes.
  • Chap 3: Techniques & Metrics.
  • Chap 13: System comparison using sample data.

2k7/6

Introduction to RF Propagation. Seybold. Wiley Press, 2005. Chapters 1--4, 6, 7, 12.

  • Chap 1: Basic applications and introduction to RF modeling.
  • Chap 2: Electromagnetic physics overview.
  • Chap 3: Antenna fundamentals.
  • Chap 4: Link budgets.
  • Chap 6: Atmospheric effects.
  • Chap 7: Near-earth propagation models.
  • Chap 12: RF safety.

Experimental Evaluation of Ad Hoc Routing Protocols. Borgia. IEEE PerCom Wkshps 2005

  • Compares AODV and OLSR in outdoor and indoor settings; UNIK-OLSR and UU-AODV implementations
  • Claim: Current routing technology has an ad hoc horizon of 2--3 hops and 10--20 nodes, beyond which it doesn't function very well
    • Relatively small number of nodes used, 8 Linux laptops of varying kinds
    • Test inside an office/academic building, out on a field
  • Metrics: Overhead, delay, packet delivery ratio
    • Overhead: OLSR 200--1200 Bps; AODV 200--400 Bps
  • AODV 19--20 seconds to establish first path; OLSR 8
  • AODV delays are significantly longer, ~1--2 seconds versus ~200msec OLSR
  • Outdoor 3 hop, OLSR goes to 1 sec delay, AODV drops 50% more packets
  • Decent set of references in this.

ISM-Band and Short Range Device Antennas. Loy, Sylla. Texas Instruments Application Report SWRA046A. 2005

  • Reviews basic antenna and propagation models

Path Loss Model for Wireless Applications at 3500 MHz. Joseph, Roelens, Martens. IEEE (where?)

  • Quickly presents a measurement study for a WiMAX-type transmitter and develops a path loss model
  • Compares the model to some existing ones, but does not seem to go back and compare to new data again

A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols. Broch, Maltz, Johnson, Hu, Jetcheva. MOBICOM 98

  • Required for simulating MANETs:
    • Node mobility
      • Spatial diversity, varying collision domains
    • Radio propagation model supporting propagation delay
    • Propagation typically $1/r^2$ at short distance (r is distance) and $1/r^4$ at longer distances
      • Crossover is usually ~100m for outdoor low-gain antennas roughly 1.5m above ground plane in the 1--2GHz band
    • Capture effects (continue receiving the current packet as long as any new packet's RSS is at least 10dB below it)
    • Carrier sense
  • General improvements:
    • Periodic broadcasts and response packets are artificially jittered to avoid synchronization
    • Routing information queued for transmission ahead of ARP and data packets
    • Most used link breakage detection relying on MAC layer
  • Good summary here of DSDV, TORA, DSR, AODV; some optimizations introduced:
    • DSDV: Hop by hop distance vector routing protocol; guarantees loop freedom
    • TORA: Prioritizes quickly found routes over the best routes
      • Uses IMEP (Internet MANET encapsulation protocol); does some aggregation of packets, guarantees in-order delivery for control traffic
        • Very sensitive to the interval chosen to wait for aggregation
    • DSR: Reactive source routing; some optimizations:
      • Supports unidirectional routes, though 802.11 does not
      • First queries neighbors to see if they're destination or have route
      • Sniffs for routes in promiscuous mode
      • Caching multiple redundant routes, to use on failure
    • AODV: Combines DSR and DSDV in a sense
      • Use link layer link detection instead of HELLO messages
      • Shorten timeouts on route requests
  • Notes on experiment setup:
    • Chose a rectangular workspace to force longer routes than in a square space
    • Includes a lot of runs---70 movement patterns
    • Speeds from 1m to 20m per second
    • Constant bit rate sources
    • None of these protocols do load balancing, and scene gets congested, so investigation limited to small packets
    • Claim: Difficult to compare routing protocols directly using TCP, since it is also adapting to the network
  • Metrics: Packet delivery ratio, routing overhead, path optimality
    • Delivery rate is ultimately what matters most to many applications
    • Routing overhead affects scalability, functionality in congested or low-bandwidth environments, and efficiency of power consumption; may exacerbate congestion/collisions, fill up queues.
      • Should include #bytes, not just messages in this---some have big headers
    • Path optimality affects delay, loss, bandwidth
  • Interesting points from results:
    • DSDV suffers because it does not cache backup routes
  • Claim: broadcast packets are inherently less reliable because they cannot use DCF
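
The propagation bullet above (1/r² up close, 1/r⁴ beyond a crossover) is the usual free-space/two-ray-ground switch; a rough Python sketch of the received-power trend (constants purely illustrative, real values depend on antennas and frequency):

```python
def rx_power(pt, r, crossover=100.0):
    """Relative received power: free-space (1/r^2) inside the crossover
    distance, two-ray ground (1/r^4) beyond it. Scaled so the two
    regimes meet at the crossover; not a calibrated model."""
    if r <= crossover:
        return pt / r**2
    # crossover**2 factor makes the curve continuous at the crossover.
    return pt * crossover**2 / r**4
```

Past the crossover, doubling the distance costs 12 dB instead of 6, which is why the ~100 m figure matters so much for outdoor simulation setups.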

Large-Scale Network Simulations With GTNetS. Riley. Winter Simulation Conference 2003.

  • Briefly touches on overall design of GTNetS.
  • Describes optimizations to reduce memory footprint, typically what constrains topology the most
    • FIFO event queues---though it's not clear to me why you'd do this any other way...
    • Abstract packet queues---calculating some events to eliminate the actual marker
    • Timer buckets---grouping events into a less detailed time-resolution to save list size
    • No routing tables---compute on demand, cache at sources
    • No routing tables or computation at leaf nodes or for local broadcast destinations
    • Lots of control over logging, built in statistics collection (obviating data collection)
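
The timer-bucket idea is easy to sketch: instead of one event-list entry per timer, group events into coarse fixed-width buckets and fire whole buckets, trading time resolution for list size. A toy Python version (assumed semantics, not GTNetS code):

```python
from collections import defaultdict

class BucketTimers:
    """Group timer events into fixed-width time buckets; the event list
    holds one entry per non-empty bucket instead of one per timer."""
    def __init__(self, width):
        self.width = width
        self.buckets = defaultdict(list)

    def schedule(self, time, callback):
        self.buckets[int(time // self.width)].append(callback)

    def advance_to(self, now):
        """Fire every bucket whose window has fully elapsed by `now`."""
        due = sorted(k for k in self.buckets if (k + 1) * self.width <= now)
        for key in due:
            for cb in self.buckets.pop(key):
                cb()

fired = []
t = BucketTimers(width=1.0)
t.schedule(0.2, lambda: fired.append("a"))
t.schedule(0.7, lambda: fired.append("b"))  # lands in the same bucket as 0.2
t.schedule(1.5, lambda: fired.append("c"))
t.advance_to(1.0)   # fires the [0, 1) bucket: "a" then "b"
```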

2k7/5

Empirical Methods for Artificial Intelligence. Cohen. MIT Press, 1995. Chapters 1--3.

  • Covers basic statistics, experiment design, and classical hypothesis testing.

2k7/4

Dude, get up to date!

2k7/3

TAG: A Tiny AGgregation Service for Ad-Hoc Sensor Networks. Madden, Franklin, Hellerstein, Hong. OSDI 2002.

OWLDB: Integrating Ontologies and Relational Databases. Auer, Ives. Pre-print draft.

  • Extensive comments in email.

Aurora: A New Model and Architecture for Data Stream Management. Abadi, Carney, Cetintemel, Cherniak, Convey, Lee, Stonebraker, Tatbul, Zdonik. VLDB Journal, 2003.

This paper outlines a very comprehensive stream processing system. In addition to the basic operators and data management mechanisms (buffers, archival storage, etc), it incorporates a heavy focus on resource utilization and quality of service optimization. In addition to data flow, users also specify resource limits and QoS graphs based on time, values, and other properties. The system then employs a variety of mechanisms to meet resource limits while optimizing QoS.
As presented, Aurora is a centralized stream processor, with some mention of future work on distributing the system. More than being "just" a stream processor, however, it presents a lot of great concepts for application messaging. It's interesting to consider all messaging as streams, and apply Aurora concepts to those streams. Examples include squelching messages that have current recipients (moving drop boxes downstream), prioritizing messages based on application metrics (value-based QoS graphs), and aggregating messages when specifics are not needed.
The notion of train scheduling probably fits in well with generating bursts of network traffic in a distributed Aurora system. Processing and then shipping out tuples in batches will improve aggregate costs in transmitting the tuples.
It's interesting that the Aurora scheduling process consumes as much time or more than the actual data processing. More discussion on how that overhead varies---by data, by network, etc---would be worthwhile. It's not completely clear from the graphs presented how much of the costs are fixed, how much are data load dependent, and how much depend on network size.

Models and Issues in Data Stream Systems. Babcock, Babu, Datar, Motwani, Widom. Invited talk, ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2002.

This paper outlines several models for streaming databases, with a focus on continuous queries. It frames much of the discussion in terms of approximation, where relaxation is done either by discarding data (e.g, sliding windows over recent data, or sampling) or aggregation (e.g. statistics and synopsis techniques). Other interesting points include punctuation, control markers inserted into data for upstream consumption (i.e. everything after this point is day > 10), optimization over query plans, and queries referencing discarded history.
If declarative routing style networking were to be extended to incorporate higher, application and domain layer data and policies, it would most likely have to more explicitly incorporate data stream mechanisms such as those described here. The current work seems to include aspects of update data models and sliding windows (timeouts on data), but would have to be more flexible and explicitly defined to support the wide range of data models (i.e. append) and queries (i.e. averages over periods of time) that applications may require.
Also of interest are the close similarities between database streams and networking. Sliding windows, aggregations, and other approximations may seem somewhat new for databases, but are very natural and accepted from a networking perspective. However, there are some things here which networking can in turn perhaps draw from. For example, different data models, such as append versus update, imply possibilities for different delivery strategies and optimizations (i.e. deleting a packet if not delivered before its update window, or replacing queued items made obsolete by newly delivered data).
It's also interesting to consider all of a system's networking from a data stream perspective. That might provide a unified framework for tasks such as cost based, cross-layer optimization of networking strategies and quality of service. Examples include expediting messages deemed important by the domain, reducing all traffic in the presence of dwindling power supplies, trading off delay versus traffic in aggregating smaller messages, etc.
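
The sliding-window relaxation discussed above is concrete enough to sketch: keep only the last N stream elements and answer a continuous aggregate query from that bounded state. A minimal Python version:

```python
from collections import deque

class SlidingAverage:
    """Continuous average over the last `size` stream elements, using
    O(size) state instead of the full stream history."""
    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def push(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]   # oldest element falls out
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)

avg = SlidingAverage(3)
results = [avg.push(v) for v in [1, 2, 3, 10]]  # last answer covers 2, 3, 10
```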

BAA07-07: WNaN Adaptive Network Development (WAND). DARPA. 2007.

  • Call to develop ultra-large (tens of thousands of nodes), densely connected, adaptive networks over inexpensive wireless hardware (developed under BAA06-26: WANN).
  • Huge networks, mission aware, network planning, infrastructure utilization, distributed computation, multicast, caching, cross layer, policy reasoning, delay tolerant, flexible and maintainable (easily updated).

Long Term Knowledge Retention Workshop Summary. Lubell, Rachuri, Subrahmanian, Regli. NISTIR 7386, December 2006.

  • Review of workshop held at NIST in 2006.
  • There are real business cases & stories around for engineering archiving, including examples from legal, operational, and development perspectives.
  • Same old story: STEP is great for solid geometry, not so much other data. Need to capture simulations, intent, test data, process plans, etc.
  • Need package formats and specifications.
  • Don't worry about capturing everything in some easily accessible format. It's more important to capture as much as possible than to do it well but only get a little bit. The "digital archeology" approach to archiving.

Forensic Analysis for Epidemic Attacks in Federated Networks. Xie, Sekar, Reiter, Zhang. IEEE ICNP 2006.

  • Describes a system for information sharing between autonomous systems (in the Internet sense) in order to more effectively analyze and trace attacks.
  • Key point revolves around statistical sampling of connections in the network, moonwalks, to backtrace nodes likely to have been involved in the epidemic.
  • ASs may optionally provide information to their neighbors to help with this process. It's in their benefit to do so, as they will also be better able to analyze the attack.

2k7/2

Service Oriented Architecture (SOA) Primer: Service Oriented Computing, SOA Components, Web Services, and Service Orchestration. Mongan. Draft preliminary background study, 2006.

Active Networking: One View of the Past, Present, and Future. Smith. IEEE Transactions on Systems, Man, and Cybernetics, February 2004.

  • Great review of development of active networks.
  • Interesting point: Typical users don't see the network as dumb. Routing, caches, DNS, etc, all just gets lumped in there.
  • Basic code loading approaches: On demand; in advance; per packet.
    • Related distinguishers: Active over packets or flows? Upgrades by application developers or network administrators?
  • Basic tradeoff: Flexibility vs safety vs performance.
  • Interesting point: Not all nodes need to be active. Could place at lower bandwidth links, AS boundaries, etc.
  • Interesting combination: Powerful services installed easily on a node, most of them offering information rather than forwarding decisions, and a small per-packet glue language to tie them together and make decisions through the network.
  • PLAN packet language employs interesting, severe restrictions to ensure properties such as termination, byte-code compiling for performance, framework resource controls for fairness on node.
  • Much of active networks has since been rolled into projects such as PlanetLab. Overlay networks in general can be seen as an ad-hoc form of active networking. Using kernel hooks for userspace routing programs is another (popular for MANETs).
    • Example from router development/experimentation: NRL's SMF implementation, which uses the IPQueue module in IPTables to install its own code and rewrite the kernel's IP sequencing.
  • A lot of active networks concepts are also being deployed in the mainstream, but in an ad-hoc fashion. Firewalls, IPTables, etc. Mostly from the services/stack direction rather than capsules.
  • Related work under other names, e.g. composable protocols (DARPA SAPIENT), extensible routers.
  • Mobile agents from AI community basically combo of overlay & active networking. Different tradeoff in flexibility vs performance: Don't do routing, but mobility useful for large application layer tasks.
  • Encapsulation very useful, e.g. to allow encryption as possible services.
  • Maybe capsule approaches make the most sense in low-bandwidth, high latency settings (e.g. IPN DTN), where retransmits are costly, and decisions may vary widely.

Active Network Vision and Reality: Lessons from a Capsule-Based System. Wetherall. SOSP 1999.

  • In this work, capsules do not carry code. Instead, they carry an identifier uniquely associated with a service. If that code is not loaded at a node when a packet arrives, the code is requested of the previous forwarding node. If it's already not available there, the packet is dropped.
  • Relies on Java language & security model, simple techniques such as killing long-running threads, to provide resource guarantees.
  • API is very restricted. Deployment of upgrades to basic active framework sounds almost as problematic as traditional router upgrade problem. Mitigating factor is that packets can reason on upgrades and possibly degrade gracefully in their absence.
  • It's unfortunate that packets of different types cannot share information. It would perhaps be useful if soft state was world-readable, but only writable by packets in that service class.
  • It seems somewhat misleading to refer to this as a capsule based approach. My impression is that their choice of Java made per-capsule code completely infeasible. In addressing the size issue, they wound up moving to basically a router plugin architecture into which packets are demultiplexed and plugins are easily fetched on demand. Technically you could ship new code with every packet, but it would still require the fetch after the packet is received.
  • Capsule code in a domain specific language is likely to be much smaller and faster. This implies that it might be feasible to have a powerful, full blown language implementing services, and a small packet language to glue them together. Seems to be what PLAN does.
  • You might be able to argue that packets hardly ever change, so why not just have them demultiplex into a set of services?
  • Regardless, it would be nice if services were more composable, which they aren't really here.
  • No generic setup here for collecting local information and allowing packets to use it, such as bitrates, drop rates, neighbors, etc.
  • Some interesting points here about amortizing the fetch cost. Also, interesting numbers on how much processing time can be allocated per packet at different places in the network.
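The fetch-on-miss-and-cache behavior described above can be sketched in a few lines. This is a hypothetical, single-process illustration (all names invented), not the paper's actual Java implementation: a capsule carries only a service identifier, the code is requested from the previous-hop node on a cache miss, and the packet is dropped if that also fails. The cache is what amortizes the fetch cost over subsequent packets of the same service.

```python
# Hypothetical sketch of demand-loading capsule code, as in the paper's scheme.
class Node:
    def __init__(self, name):
        self.name = name
        self.code_cache = {}  # service id -> forwarding routine

    def fetch_from(self, prev, service_id):
        """Ask the previous forwarding node for the service's code."""
        return prev.code_cache.get(service_id) if prev else None

    def receive(self, capsule, prev=None):
        sid = capsule["service"]
        code = self.code_cache.get(sid)
        if code is None:
            code = self.fetch_from(prev, sid)
            if code is None:
                return "dropped"          # neither node has the code
            self.code_cache[sid] = code   # amortize the fetch over later packets
        return code(capsule)              # run the service's forwarding routine

a, b = Node("a"), Node("b")
a.code_cache["compress"] = lambda cap: "forwarded"
print(b.receive({"service": "compress"}, prev=a))  # fetched from a, then cached
print(b.receive({"service": "compress"}))          # cache hit, no prev needed
```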

Implementing Declarative Overlays. Loo, Condie, Hellerstein, Maniatis, Roscoe, Stoica. ACM SOSP 2005

This paper presents P2, an extension of the "Declarative Routing" work to better support more sophisticated network protocols. In particular, the aim is essentially to enable rapid network overlay prototyping, debugging, and evaluation. A version of Datalog extended with implicit message passing/tuple storage and many functions and features for protocols, such as tuple-triggering timers, is developed as the language for describing these protocols. A flexible dataflow framework with Datalog operators, timers, network handlers, and other components is then generated to conduct actual execution.
One issue here is that the pool of functions required outside of the tuple semantics in order to implement many protocols could grow large and quite arbitrary. Already this paper introduces a number of functions and features not incorporated in the original Declarative Routing publications. This might especially be the case for protocols closely tied to application and domain specific logic. However, for many protocols this may not be the case. A review and sample implementations of many distinct protocols may be necessary to determine the seriousness of this concern. It is also worth pointing out that much of this may be obviated by supplying a small language or making it easier to deploy new functions. However, this definitely reduces the elegance of the Datalog-based semantics & language.
The implementation discussed here is based around a single-thread, follow-to-completion execution model. This may have to change in order to support worthwhile goals such as supporting simultaneous execution of several overlay variants, with data and processing sharing between them. In addition to the implementation concerns, it's not clear what additional features will be required, e.g. stronger locking or other transaction semantics. Such features may also be necessary to implement reactive protocol components, which seem to have great potential to block the processing of an entire sequence and waste node resources (CPU, memory, network).
As with the declarative routing work, some sort of tuple aggregation scheme to reduce the amount of generated traffic and save on packet overhead also seems to be a requirement for practical use. Some of this can be implemented through careful orchestration within the Datalog specification, however that again moves the language away from its clean, logic-based semantics. On a related note, it seems difficult in this scheme to implement protocols which ride control traffic on top of data traffic, and this is not something which can be done within the language.
Another note is that the work here makes several significant uses of negation. It seems that an ability to provide an ordering over rules is required to make effective and non-ambiguous use of negation as failure, which seems to be what's in play here. There also seem to be similar concerns & interactions regarding the delete operator, which might when combined with negation-as-failure allow for a very strong and possibly unwieldy negation.
Concerns about the difficulty of programming in the language are probably a minor issue. Every language has a learning curve, and rule-based programming certainly takes some getting used to. The opportunities for verification and automated optimization, beyond the concise nature of the specifications, probably outweigh this concern.
On a different topic, it would be interesting to incorporate a transformation engine into the dataflow framework in order to interoperate with other implementations. This may also require some aggregation scheme or coding in the rulebase. However, the dataflow framework and automated composition does seem well suited for simply including such transformations as another element to be inserted, and is a definite plus for the approach.

Declarative Routing: Extensible Routing with Declarative Queries. Loo, Hellerstein, Stoica, Ramakrishnan. SIGCOMM 2005

  • See slides from my presentation reviewing this paper. Other comments:
  • Really interesting; rule based specifications of routing protocols compiled down into DB-style operator strands in a dataflow execution framework.
  • Rules implicitly trigger communication between nodes to exchange neighbors, paths, etc.
  • Very interesting; some serious questions about overhead in messaging, necessity to incorporate some sort of aggregation, etc.
  • Related to communication planning framework interests? Add more hardware reasoning in?
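The flavor of the rule-based specifications can be shown with the canonical two-rule path program that this line of work builds on. The sketch below is a centralized, naive bottom-up evaluation in Python (the real system compiles the rules into a distributed dataflow, and the implicit messaging is the whole point); it is only meant to illustrate what the rules compute:

```python
# Datalog-style path rules, evaluated centrally as a naive fixpoint:
#   path(X, Y) :- link(X, Y).
#   path(X, Z) :- link(X, Y), path(Y, Z).
def paths(links):
    path = set(links)                        # rule 1: every link is a path
    while True:
        new = {(x, z) for (x, y) in links
                      for (y2, z) in path if y == y2} - path  # rule 2: join
        if not new:
            return path
        path |= new

links = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(paths(links)))
# adds ("a", "c"), ("a", "d"), ("b", "d") to the original links
```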

Loo, Ives, Smith. NSF FIND proposal.

2k7/1

Knowledge Plane for the Internet, A. Clark, Partridge, Ramming, Wroclawski. SIGCOMM03.

  • Vision paper on the need for networking to become more self managing and autonomous.
  • Current Internet requires an awful lot of management, more and more things configured by hand (e.g. many routers are manually configured in order to implement policy decisions).
  • Only by being more autonomous can it truly adapt at reasonable time scales.
  • Need to treat this as a knowledge-based problem, ship data around, support users, and make actionable decisions.
  • Machines can do a lot to help users diagnose frequently opaque problems ("Your game doesn't work because your ISP is blocking multicast traffic to that port.").
  • It's no longer efficient enough to have all the intelligence at the network edges. The core needs to know what should be happening, because it's the one in a place to detect problems and do something about them.
  • Need to coordinate across regions and the globe.
  • Must handle incomplete, out of date, inconsistent, and malicious data.
  • Information economies---services compete to provide data/analysis. Among many others, interesting issues here of provenance---how do I know everyone's not just repackaging the same data?
  • Data and knowledge based routing.
  • Where does it run, where does the KP live?

Outdoor Experimental Comparison of Four Ad Hoc Routing Algorithms. Gray, et al. MSWiM 2004.

  • Describes construction of and results from a testbed for outdoor/indoor/simulated routing comparison.
  • Constructs common routing algorithm software environment so the same implementation can be used across each. Uses mobility & connectivity traces from outdoor experiments in indoor testing.
  • Each node generates a local beacon, which serves to build some picture of the connectivity, regardless of what the routing tables say. Further, for many applications this sort of traffic is also necessary (i.e. broadcasting GPS coordinates).
  • Each node also generates random amounts of dummy application traffic, from the host to a randomly chosen other host. Intervals and size are on a normal distribution. This traffic emulates application traffic.
  • Four algorithms were studied:
    • APRL: Each node puts out a beacon including its routing table. The first determined route for a node is used, without any notion of "shortest path." Routing loops are avoided by not using a route until a dummy packet is successfully sent to a destination and returned.
    • STARA-S: With some probability, a neighboring node is chosen as the next hop. Acknowledgements are sent back from the destination. Probabilities are adjusted based on the delay, such that long routes are used less frequently. This can lead to an exploration problem. Messages to determine these times are sent periodically.
    • AODV: Upon requiring a route to a destination, AODV sends out a request. Any node receiving this that knows of a route sends back a response. Otherwise it forwards the request on. Regular beacons are sent out to determine neighbors; messages are sent out if a node disappears, to invalidate routes.
    • ODMRP: Very similar to AODV, except that data is sent along with request messages. This is good when data is small, bad otherwise. Also, it maintains a notion of a multicast group, and routes to forward messages to members of that group. These forwarding groups time out periodically.
  • Results from this paper are questionable; their analysis is funny in places.
  • However, they note that ODMRP does very well, probably largely because it shares data and control packets and the data is small. APRL and STARA either do not adapt frequently enough, or clog the network.
  • Metrics used in their studies include:
    • Communication efficiency: How many control and other packets are generated per message (not including fragmentation; all packets here are smaller than MTU).
    • End to end latency: How long it takes for messages to be delivered. To all hosts in multicast case. Some stuff here about trying to sync clocks based on GPS data, which isn't accurate enough, so they also try to adjust for skew afterward.
    • Hop count: How many nodes a message is routed through before reaching the destination. Not clear what this means if several routes are used (e.g. redundant broadcasting).
    • Message delivery ratio: How many messages actually arrive at their intended destinations.
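STARA-S's delay-weighted probabilistic forwarding is easy to mis-read from the summary above, so here is a toy sketch (parameter names and the inverse-delay weighting are my own simplification, not necessarily the paper's exact formula): each neighbor carries an estimated delay, the next hop is drawn with probability inversely proportional to that delay, and acknowledged deliveries update the estimate.

```python
import random

def choose_next_hop(delays, rng=random):
    """delays: dict of neighbor -> estimated delay; pick inversely to delay."""
    weights = {n: 1.0 / d for n, d in delays.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for n, w in weights.items():
        r -= w
        if r <= 0:
            return n
    return n  # numerical fallback

def update_delay(delays, neighbor, measured, alpha=0.2):
    """EWMA update on an acknowledged delivery."""
    delays[neighbor] = (1 - alpha) * delays[neighbor] + alpha * measured

delays = {"n1": 0.05, "n2": 0.20}        # n1 is 4x faster, so picked ~4x as often
picks = [choose_next_hop(delays) for _ in range(10000)]
print(picks.count("n1") / len(picks))    # roughly 0.8
```

Note how the "exploration problem" mentioned above falls out of this: a neighbor whose estimate has drifted high is rarely sampled, so its estimate is rarely corrected.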

2k6/12

On the Credibility of Manet Simulations. Andel, Yasinsac. IEEE Computer, July 2006.

  • Network simulation is good & important, but there's not nearly enough rigor used in practice.
  • Results are not reproducible, are misinterpreted or poorly analyzed, and models don't reflect reality.
  • Difficult to determine where to sample; e.g. one protocol may dominate another at 10 & 100Mbps, while the second protocol is better at intermediate values. How do you find that? Exhaustive testing is time consuming.
  • Simulators and their components (e.g. fading modules) vary widely & produce different results.
  • People don't generate traffic in realistic ways, often using a single connection at a fixed bit rate. Need to model usage requirements more closely, e.g. need to have application-layer connections coming up and down, include ARP, changing traffic, etc.
  • Need to thoroughly document test parameters for it to be useful. Put critical metadata in paper, complete information on web.
  • Need to determine how many trials are necessary to be statistically valid. Some interesting notes & references on how to do this.
  • Most MANET studies use random mobility models---but how often do people actually move randomly?
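The trial-count question the paper raises has a standard back-of-the-envelope answer (normal approximation; this is textbook statistics, not a method from the paper): the number of independent runs needed so the sample mean of a metric lands within a margin E of the true mean at a given confidence level.

```python
import math

def trials_needed(sample_stddev, margin, z=1.96):
    """Runs needed so the mean is within +/- margin at ~95% confidence (z=1.96)."""
    return math.ceil((z * sample_stddev / margin) ** 2)

# e.g. pilot runs show a delivery-ratio stddev of 0.08; we want +/- 0.02:
print(trials_needed(0.08, 0.02))  # 62
```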

Defining Network Capacity. Chimento, Ishac. draft-ietf-ippm-bw-capacity-04.

  • Note read of earlier version listed below.
  • The following are thoughts passed on to the IPPM group (here):
    • There should perhaps be more emphasis or explanation that this is really about path capacity. The notions defined here are "network capacity" from the perspective of one host to another specific host along specific routes, not from the perspective of the network or even across all hosts or all routes to one host. In particular, MANET literature often discusses network capacity in the sense of the entire network, how many bits can be moving forward at one time between all hosts. This is particularly interesting in that setting because of the conflicts in the following point.
    • pg 7, definition 3.3, IP Layer Path Capacity. This definition assumes that the links are independent and do not affect each other. The minimum capacity along the path does not necessarily provide a lower bound on the path's capacity. For example, this is the case on multi-hop wireless networks, where each link may very well have reasonable capacity, but communication (forwarding) on each link hinders communication on the successor and predecessor links as they can't both be live. Note also that definitions which account for this probably take care of congestion as well. The note about congestion in this definition is odd, although perhaps it's really trying to say that this definition does not address the above phenomenon of adjacent links affecting each other? In that case, this definition perhaps has some value, but a metric which accounts for that would be more intuitive and practically useful.
    • pg 9, discussion 3.2, Hardware Duplicates. This argument at least needs more motivation. It should point out that IP packets do have IDs and can conceivably be identified as duplicates (do most implementations set sensible IDs for non-fragmented packets or is it ignored?). Wouldn't most (all?) implementations simply pass duplicate packets up the stack? I would not have previously thought IP included any checking for this. In that case, it's not immediately obvious why these are not included in the network capacity but corrupted data is. Granted, it reduces the amount of data that can be transmitted. But so do corrupt packets. That line of thought seems to rely on an argument about measuring transport layer bits, which is not the goal of this draft. In short, I don't necessarily believe this definition should be changed, but it's clear it will be a sticking point for most readers and at a minimum needs to be stronger. Also, the closing sentence should read "reflect the duplication with a lower value" to be slightly more clear, rather than "the lower value."

Wireless Sensor Networks for Emergency Navigation. Tseng, Pan, Tsai. IEEE Computer, July 2006.

  • Discusses adapting TORA (temporally ordered routing algorithm) to handle physical routing of people in an emergency (e.g. escape in a forest fire, building fire, etc).
  • Seems based largely on potential fields.
  • Includes some notion of hazardous regions, essentially making path cost not just a factor of length but hazard as well.
  • Mentions Great Duck Island and Firebug.
  • Not clear or well written at all.

2k6/11

Efficient Reasoning in EL+. Baader, Lutz, Suntisvaraporn.

  • EL+ is EL concepts with general concept and role inclusion.
  • Can handle transitive roles, role hierarchies and right identities.
  • This language is rich enough for many applications.
  • It remains tractable.

Measurement Study of Vehicular Internet Access Using In Situ Wi-Fi Networks, A. Bychovsky, Hull, Miu, Balakrishnan, Madden. Mobicom 2006.

  • Studies the ability of wi-fi in cars driving in residential areas to connect to encountered, unplanned, open access points.
  • While driving in large cities (this paper reports on Boston), enough access points are encountered to provide useful coverage.
  • Typically in range of such access points long enough to transfer useful amounts of information, about 216KB over TCP.
  • Throughput not affected much by speed, though obviously the overall time of connection shortens as you drive out of range. Median of 30KB/s, fairly typical of broadband upload speeds in US.
  • Association times improved by a small optimization of DHCP, remembering leases granted by access points and requesting them again if still within lease bounds.
  • Some interesting points about actually running the test, e.g. problems with starting up scripts causing time lags due to the limited systems.
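The DHCP lease-caching optimization described above amounts to remembering past grants per access point. The structure below is my own invention to illustrate the idea, not the paper's code: cache the lease granted by each AP, and on re-association, skip the full DISCOVER/OFFER exchange by directly REQUESTing the cached address if the lease is still valid.

```python
import time

class LeaseCache:
    def __init__(self):
        self._leases = {}  # AP BSSID -> (ip, expiry timestamp)

    def remember(self, bssid, ip, lease_seconds, now=None):
        now = time.time() if now is None else now
        self._leases[bssid] = (ip, now + lease_seconds)

    def reusable_ip(self, bssid, now=None):
        """Return a cached IP to REQUEST directly, or None to do full DHCP."""
        now = time.time() if now is None else now
        entry = self._leases.get(bssid)
        if entry and entry[1] > now:
            return entry[0]
        return None

cache = LeaseCache()
cache.remember("aa:bb:cc:dd:ee:ff", "10.0.0.42", lease_seconds=3600, now=0)
print(cache.reusable_ip("aa:bb:cc:dd:ee:ff", now=1800))  # 10.0.0.42
print(cache.reusable_ip("aa:bb:cc:dd:ee:ff", now=7200))  # None, lease expired
```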

Measurement-based Characterization of 802.11 in a Hotspot Setting. Rodrig, Reis, Mahajan, Wetherall, Zahorjan. SIGCOMM 2005 Workshop.

  • Studies wi-fi data collected around access points at a reasonably large conference.
  • Finds that only 40% of wi-fi data is spent transmitting original data packets. 35% spent on retransmissions, 15% acknowledgements, 10% management traffic.
  • Retransmissions are common---28% of data transmissions, 46% of the time due to decremented bit rate.
  • Most transmissions are done at 1Mbps, which actually only makes contention worse. Switching bit rates is common and very frequent. Most cards strongly prefer the highest bit rate.
  • Placed monitors external to but near access points. Collected most data with tcpdump. Some hacks to MadWiFi to get data on errors.
  • Key point: Airtime dominated by 1Mb traffic.
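The airtime arithmetic behind that key point is worth spelling out: frame airtime scales inversely with bit rate, so even a minority of frames sent at 1 Mbps can consume most of the channel. The numbers below are invented and PHY overheads are ignored; this is just the proportionality argument.

```python
def airtime_us(payload_bytes, rate_mbps):
    """Transmission time of the payload in microseconds (PHY overhead ignored)."""
    return payload_bytes * 8 / rate_mbps

frames = [("11 Mbps", 900, 1000, 11.0),  # label, frame count, bytes, rate
          ("1 Mbps",  100, 1000, 1.0)]
total = sum(n * airtime_us(size, rate) for _, n, size, rate in frames)
for label, n, size, rate in frames:
    share = n * airtime_us(size, rate) / total
    print(f"{label}: {share:.0%} of airtime")
# The 11 Mbps frames are 90% of the traffic but under half the airtime.
```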

Framework for IP Performance Metrics. Paxson, Almes, Mahdavi, Mathis. RFC 2330

  • Wire time vs. host time; imperfect clocks.
  • Why it is recommended to avoid thinking about Internet properties in probabilistic terms.
  • Metrics must be concrete & well defined; repeatable; unbiased; useful; not create artificial goals.
  • Includes definitions for hosts, routers, paths, clouds.
  • IPPM documents are based around metric (SI) definitions of prefixes, e.g. M=1000, not 1024.
  • Many different measurement methodologies exist:
    • Direct measurement with generated traffic.
    • Calculating a metric from other metrics, e.g. given propagation delay for each link, work out propagation delay for the path.
    • Estimation of a metric from more aggregated metrics, e.g. estimating propagation delay from packet delays & sizes.
    • Estimating a metric at the current time based on past and current data for that and other metrics.
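The "estimate a metric from more aggregated metrics" methodology can be illustrated with a classic trick (my example with synthetic data, not from the RFC): given one-way delays for packets of varying size, a least-squares fit of delay = prop + size/capacity recovers the propagation delay as the intercept and the capacity from the slope.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

sizes = [200, 400, 600, 800, 1000, 1200, 1400]   # packet sizes in bytes
cap, prop = 1_000_000, 0.010                     # true values: 1 Mb/s, 10 ms
delays = [prop + s * 8 / cap for s in sizes]     # synthetic delay samples (s)
intercept, slope = fit_line(sizes, delays)
print(f"propagation ~ {intercept*1000:.1f} ms, capacity ~ {8/slope/1e6:.2f} Mb/s")
```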

One-way Active Measurement Protocol (OWAMP) Requirements. Shalunov, Teitelbaum. RFC 3763.

  • Increasing availability of high precision time sources (GPS, CDMA) enables accurate one-way active measurement.
  • But, note that for many metrics, such as loss, highly accurate timing is not critical anyway.
  • Need to define:
    • Test stream initiation.
    • Exchange of test packets.
    • Retrieval of test information.
  • Poisson streams.
  • Keep raw data so user can generate any statistics desired.
  • Should disconnect test controls and actual test execution.
  • Cannot allow control to specify construction of arbitrary packets for testing, due to security concerns.
  • Have to ensure as best as possible that tests cannot be arbitrarily boosted, e.g. by watching for OWAMP traffic and giving that priority.
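The Poisson stream mentioned above is just send times with exponentially distributed gaps, which avoids synchronizing with any periodic behavior in the network. A minimal sketch (rate and seed arbitrary, my own code):

```python
import random

def poisson_send_schedule(rate_pps, duration_s, rng=random):
    """Yield send timestamps for a Poisson stream of rate_pps packets/second."""
    t = 0.0
    while True:
        t += rng.expovariate(rate_pps)  # exponential inter-packet gap
        if t >= duration_s:
            return
        yield t

random.seed(42)
times = list(poisson_send_schedule(rate_pps=10, duration_s=5))
print(len(times))  # ~50 packets on average for 10 pps over 5 s
```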

IP Performance Metrics (IPPM) for Spatial and Multicast. Stephan, Liang, Morton. draft-ietf-ippm-multimetrics-02.

  • Typographical comments for this draft are at Reading.IPPMMultimetrics.
  • Other standards define metrics for point-to-point communication, possibly over several links. This defines them for point-to-group communications.
  • Spatial metrics: One of the hosts involved is neither the sender nor the receiver (e.g. a link in a multi-hop path).
    • It is useful to decompose paths and determine statistics along each segment. For example, this can help determine where delay and jitter are occurring.
  • One-to-group metrics: The measured packet is sent by one host and received by several destinations.
  • Does not address group-to-one and group-to-group metrics, but hopes those defined here may be used to build those.
  • Notes that the placement of instrumentation affects the results---you may see more before a buffer than after.
  • Makes heavy use of 1-way metrics, which require a great deal of clock synchronization.
  • Notes basic problem in passive measuring---packets don't have timestamps necessarily.
  • Having the multicast group address optional may be useful for applying the tests to broadcast tests.

Two-way Active Measurement Protocol (TWAMP), A. Babiarz, Hedayat, Krzanowski, Yum. draft-ietf-ippm-twamp-01.

  • Defines extensions to One-Way Active Measurement Protocol to handle a Two-Way test.
  • This would be applicable, e.g., if clocks cannot be synced well.

Defining Network Capacity. Chimento, Ishac. draft-ietf-ippm-bw-capacity-03.

  • The term "bandwidth" is overloaded. Hence, this document defines "capacity."
  • Have to differentiate from theoretical link capacity versus usable capacity, e.g. after overhead bits, errors, etc.
  • Must avoid sampling in a way that causes bias. For example, sampling at a frequency near a multiple of some underlying phenomena would be a problem.
  • Have to include IP fragments in measurement even if full packet cannot be reassembled, as these are useful IP data. Some tests differ on this.
  • However, packets partially received when the test timeout occurs are to be tossed. This has some ramifications for biasing introduced by relationships between packet size and timeouts that must be kept in mind.

Tailoring OWL for Data Intensive Ontologies. Calvanese, Giacomo, Lembo, Lenzerini, Rosati. DL2006.

  • Same as below, but some more details, lower bounds on reasoning (LOGSPACE).

DL-Lite: Practical Reasoning for Rich DLs. Calvanese, Giacomo, Lenzerini, Rosati, Vetere. DL2004.

  • Introduces simple but usable DL including existential quantification, limited negation, etc.
  • Orients reasoning procedures around data & reasoning over queries.
  • Reasoning taken as two-step process:
    • Transform query into a new query, possibly in a different language.
    • Evaluate over ABOX.
  • For DL-LITE, the original query can be transformed into an equivalent conjunctive query to execute in a relational DB, taking advantage of its optimizations, memory structure, etc.
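The spirit of that two-step process, for the simplest case of atomic concept inclusions, can be sketched in miniature (names invented, and this handles only subclass axioms, a tiny fragment of what the DL-Lite rewriting actually covers): the TBox axioms are compiled into the query, so the rewritten query runs directly over a plain relational ABox with no reasoning at evaluation time.

```python
def rewrite(concept, axioms):
    """Return all concepts whose instances answer a query for `concept`,
    given axioms as (sub, sup) pairs meaning sub subclassOf sup."""
    answers, frontier = {concept}, [concept]
    while frontier:
        c = frontier.pop()
        for sub, sup in axioms:
            if sup == c and sub not in answers:
                answers.add(sub)
                frontier.append(sub)
    return answers

axioms = [("Professor", "Teacher"), ("Lecturer", "Teacher")]
abox = {"Teacher": {"t1"}, "Professor": {"p1"}, "Lecturer": {"l1"}}

query = rewrite("Teacher", axioms)  # query becomes a union over three tables
print(sorted(set().union(*(abox.get(c, set()) for c in query))))
# ['l1', 'p1', 't1']
```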

JCISE review.

Design Framework for Highly Concurrent Systems, A. Welsh, Gribble, Brewer, Culler. Published where?

  • Highly concurrent server design largely based on thread or event models.
    • Threads. Pros: Comfortable programming model, good tools, wrap blocking calls, utilize multiple processors. Cons: Lock contention, OS overhead leading to degradation in performance.
    • Events. Pros: Less overhead, no degradation at maximum levels. Cons: Difficult programming model, can't utilize parallelism (e.g. multiple processors).
  • These are actually ends of a spectrum and have to combine both to work most effectively.
  • Introduces design patterns based around Tasks, Thread Pools, and Queues.
    • Wrap: Associate a thread pool with a queue to create a Task Handler; eliminate degradation, utilize parallelism.
    • Pipeline: Break tasks up into Stages in order to utilize downtime, different task latencies, utilize cache, etc.
    • Combine: Share threads in a thread pool between low-latency Tasks to minimize number of threads.
    • Replicate: Duplicate Task Handler to utilize parallelism.
  • Note: Have to determine thread count threshold for machine, OS I/O request thresholds, etc. in order to tune well.
  • Interesting comments in passing on use of economic models for scheduling/allocation.
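The Wrap pattern above (queue + thread pool = Task Handler) is the core building block, and a minimal sketch helps fix the idea. This is my own toy rendering, not the paper's code; the worker count is arbitrary and real tuning is exactly the thresholds note above.

```python
import queue
import threading

class TaskHandler:
    """Wrap pattern: a queue fronted by a bounded pool of worker threads."""
    def __init__(self, handler, workers=4):
        self.q = queue.Queue()
        self.handler = handler
        self.threads = [threading.Thread(target=self._loop, daemon=True)
                        for _ in range(workers)]
        for t in self.threads:
            t.start()

    def _loop(self):
        while True:
            task = self.q.get()
            if task is None:
                return                     # shutdown sentinel
            self.handler(task)
            self.q.task_done()

    def submit(self, task):
        self.q.put(task)

results, lock = [], threading.Lock()
def work(n):
    with lock:
        results.append(n * n)

th = TaskHandler(work, workers=4)
for i in range(10):
    th.submit(i)
th.q.join()                                # wait for all queued tasks
print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Replicate and Combine then fall out naturally: duplicate the whole TaskHandler for parallelism, or point several low-latency queues at one shared pool.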

2k6/10

Essential Guide to RF and Wireless, The. Weisman. 2nd Edition. Book.

  • Pretty good intro to basic RF principles, antennas, cellular technology, etc. Not so much on wi-fi. The "Here's another equation you won't care about." shtick wears thin fast, but it disappears as the book goes on. Good enough that I read this cover to cover. Interesting tidbits:
    • Time modulation.
    • Commercial avionics run near 900 MHz, ergo rules on cell phone use.
    • Cell towers usually have 3 antennas per 120 degree sector: 1 to transmit, 2 to receive and cancel multipath effects.

Architecture and Evaluation of an Unplanned 802.11b Mesh Network. Bicket, Aguayo, Biswas, Morris. Mobicom 2006.

Page last modified on March 03, 2011.