Why there is no support for compression in ExpansionTree:

  The state encodings used by ExpansionTree can get quite large if a tree
  has many expanded branches.  Several other approaches to creating tree
  widgets include automatic zlib-based compression of these strings to keep
  data sizes to a minimum.  ExpansionTree doesn't.  What follows is a
  technical explanation of why, and how you can work around this.


The primary reason I removed compression support from my code is because the
decompression of a state encoding is a security risk.  Without proper data
validation decompressing user-supplied data can easily lead to resource
exhaustion attacks.  The problem I faced with ExpansionTree is that I didn't
want to make any assumptions about the state encoding that I didn't have to.
Unfortunately, to safely decompress "tainted" data one must make assumptions,
or compromise flexibility.  Lets examine why this is...

  I'd been experimenting with Python 2.2's low-memory use decompression
  hooks when I ran into a situation I couldn't solve.  Before I get to that
  though, lets back up for a second and look at Zope's history of using
  decompression in tree widgets.  The first tree widget library was the
  dtml-tree, and like many early designs, it used depressingly naive code to
  handle compressed data.  It simply called zlib.decompress(state) which, to
  be fair, was all Python supported at the time, but also opens up the server
  to potentially needing an enormous amount of memory all at once.  There
  were a few patches to, and Products based on, dtml-tree after its release,
  but none of them modified the decompression code, so they all remained
  vulnerable to potential resource exhaustion from carefully crafted query
  data.  (Consider that the compression ratio of sparse data is nearly 1000:1
  using zlib.  We're actually lucky that it doesn't offer any form of RLE
  support!)  ZTUtils.SimpleTree was introduced next, and initially it didn't
  support zlib compression.  When it was added in October 2002, it too used
  the naive zlib.decompress() routine.  The compressed state string was
  limited to 8191 bytes thanks to an earlier attempt to avoid denial-of-service
  attacks, (see collector issue 605) but this still allows for a ~8MB payload
  to be delivered (see collector issue 1012).  Similar problems faced Philipp
  von Weitershausen's ZopeTree Product.

  OK, back to my experiments--Python 2.2 added an interface to the zlib
  library to allow for decompression with low memory usage (prior to this,
  breaking up the compressed data stream had to be done by hand, so this was
  a welcome addition).  This let me decompress the data in small pieces, and
  untaint them as I went, unfortunately it also meant having to make some tough
  choices.  ExpansionTree's encoded state strings are a series of encoded
  object ids and depth markers separated by the ':' character.  This led me
  the following code snippet:

    d = decompressobj()
    s = d.decompress(compressed_state_string, chunk_size)
    then = 0
    while d.unconsumed_tail:
        now = s.count(':')
        if now < then + min_limit: raise Error # too few separators
        if now > max_limit: raise Error        # too many separators
        then = now
        s += d.decompress(d.unconsumed_tail, chunk_size)
    if s.count(':') > limit: raise Error       # too many separators

  So we have a few unknowns:
    chunk_size  --  This one is easy, just pick something sane, I used 512.
    max_limit   --  I let the programmer using the ExpansionTree library
                    define this.  Given the encoding algorithm, for a tree
                    with n branch nodes there can be maximum 2n-2 separators
                    present in a valid state string.  Sadly, we can't know
                    the potential number of branch nodes in a tree ahead of
                    time, so it becomes the programmer's responsibility.
    min_limit   --  min_limit represents an approximate value of how many
                    separators there should be per-chunk_size bytes of data.
                    To determine a reasonable value for min_limit we must
                    know beforehand what the maximum length of a valid object
                    id is, after it has been encoded with b2a().

  Big problem with the algorithm above: its not easy for the programmer
  trying to use this library to understand why they need to define all these
  limits or how they should go about deriving reasonable values for them
  without first really understanding all the nitty gritty details of how
  ExpansionTree actually works.

  As the author of the library, I was left with a dilemma.  If a library is
  too difficult to understand, people simply won't use it, it doesn't matter
  how secure it is.  I'd already exposed the max_limit concept in version v0.9
  of ExpansionTree, and people didn't seem to be bitching too loudly--then
  again I had no proof anyone was even using it...  I could simply hardcode a
  value for min_limit, but that would be defining an arbitrary limit on
  encoded object ids, and that still means I'd have to explain why that limit
  exists and hope people can understand.  (As opposed to this document, which
  if you don't understand, it won't hurt you.) So there I was, torn by how to
  proceed, and not particularly happy with any of my options.

  I toyed with a few other ideas, like using crypto to determine validity of
  user-supplied data, and so forth, then my friend Tim Morgan blindsided me
  with the obvious solution that I'd totally overlooked: just use sessions,
  don't let the state string ever leave the server so you don't have to trust
  the integrity of user-supplied data at all.  Duh!  Its was so simple!  It
  *is* so simple!

So there you have it, thats the history behind, and reason why, ExpansionTree
doesn't support state string compression--because there's no need for it given
sessioning machinery, and Zope has sessioning machinery.  If your state strings
get too large to fit comfortably or reliably within HTTP headers, query
variables, or URI path components, just use sessions to store them, and pass
around a session key instead.
