Making Networked Systems Robust

We’ve been building high-performance protocols and networked systems for more than twenty years now. We did this very well: the Internet and other networked systems play a key role in our daily lives and there continues to be phenomenal growth in data transferred. The problem, though, is that networked systems suffer from faults ranging from infrequent yet crippling distributed denial-of-service attacks to niggling yet frequent performance problems in edge networks (e.g., enterprise and university networks). Making networked systems robust to such faults is difficult, I believe, for two reasons. First, the fundamental value of some systems, such as web sites, is tied to their being openly accessible. This means there is little control over requests and traffic, and so these systems have to maintain good performance despite unpredictable or malicious inputs. Second, systems that one can control, such as campus networks where an administrator has full control, are beset with scale and complexity issues. Software, hardware and applications from many vendors, with complex inter-dependencies, have to work together. When users see poor performance, we need a model of what affects the user’s action, and in what ways, to identify the causes of the problem. But such models don’t exist today.

In this talk, I will present two examples of robust networked systems that address each of the above two problems. For faults due to uncontrollable external factors such as a denial-of-service attack, Kill-Bots, a web-server protection mechanism, reacts quickly, re-apportions server resources to maintain good performance despite an ongoing attack, and eventually mitigates the attack by detecting the attackers. Kill-Bots is the first system to address a novel kind of DDoS attack that mimics legitimate users, and it has led to significant follow-on work.
For faults due to complex inter-dependencies within an enterprise, Sherlock learns the underlying functional dependencies between these components, encodes the dependencies in a probabilistic inference graph and then uses the graph to quickly identify the causes of performance problems. We deployed Sherlock in a portion of the Microsoft Enterprise Network and demonstrated its practical use.

      — Srikanth Kandula

Developing, Optimizing and Hosting Data Driven Web Applications

How can we develop thousands of web applications in a simple, efficient way? More and more data-driven web applications are appearing. Maybe we need a more powerful language to help us develop them; LINQ and Hilda are steps in this direction.

P2P, Streaming and CDNs: What Will Really Work?

From the wondering-out-loud dept. comes this question: Is peer-to-peer (P2P) technology on the verge of radically changing the content-delivery marketplace? And if so, what does that mean for both content producers and content delivery networks — more opportunity, or threatened business models, or both all around?

While there’s no single news nugget to point to emphatically, a series of recent announcements, posts and observations all seem headed in the direction of a big collision between traditional CDNs, P2P technology and streaming video. Out of the pileup, we see the following questions that don’t yet have clear answers; but please feel free to provide some in the comments arena.

  • What happens to the traditional CDN business when P2P is added to the mix? According to this week’s news from CacheLogic, it means more flexibility and cheaper pricing for content providers. Akamai last month bought its own P2P play, RedSwoosh. And how do BitTorrent and upstart Neokast fit into the equation?
  • Does a combination of CDN and P2P solve some of the quality-of-service issues many service providers were predicting that heavy video use would bring? If so, what happens then to AT&T’s and Verizon’s IPTV business models, which were built somewhat on the idea of being able to charge premiums for faster video delivery?
  • When will Google and Cisco flex their considerable infrastructure muscle to take (even more) advantage of the growing demand for online video? On Wednesday Google took one step in that direction by making video search part of its powerful first page of results — wonder how that went over in Sumner Redstone’s office.
  • Cisco, meanwhile, confirmed its intentions this week to offer such networking services, which we had wondered about previously. Even as Cisco second-in-command Charlie Giancarlo tried to dispel notions that such a service would be consumer-pointed, or have a Cisco brand name, it’s clear now that the networking giant is going to move beyond boxes — but what does that mean to all its service-provider customers?
  • What are the new business models that better content delivery technology could enable, beyond Joost, Justin.TV and Ustream?

    As you are crafting your opinions, some more P2P/CDN nuggets:
    – Most BitTorrent traffic is TV shows, not movies. (TorrentFreak)
    – Online media requires a hybrid approach? (Streaming Media)
    – New Flash Player will enable P2P for .FLV clips. (Beet.TV)
    – Can Joost overcome Infrastructure Problems? (NewTeeVee)
    – CDN Startups Talk Tough (Light Reading)

    So what do you think?

    Memory management

    Recently I have been busy building a library. I have found that designing a user-friendly library is an art. :)

    Designing the interface is difficult: you have to think through all the corner cases in which users might misuse the APIs. I also found that I need to refactor the memory management code. If you cannot provide a clean memory interface, users can easily write an incorrect program that crashes the whole system.

    How to optimize memory usage in your application

    Building a JavaScript library (jQuery)

    http://video.google.com/videoplay?docid=-474821803269194441

    jQuery is a very popular library. Maybe someday I should look through it.

    How to write a good library?

    • Write a solid API (make a grid, fill in the blanks)
    • Fear Adding Methods (methods can cause a support nightmare; avoid adding them if you can; defer to extensibility)
    • Embrace Removing code
    • Provide an Upgrade path
    • Reduce to a common root
    • Consistency
      • Pick a naming scheme and stick with it
      • Argument position (options, arg2, … , callback)
      • Callback context
    • Namespacing (questions to ask)
      • Can my code coexist with other random code on the site?
      • Can my code coexist with other copies of my own library?
      • Can my code be embedded inside another namespace?
    •  Perform Type Checking
      • Make your API more fault resistant
      • Correct values whenever possible
      • Error messages
    • Errors
      • Never gobble errors
      • Ignore the temptation to try { … } catch(e) {}
      • Improves debug-ability for everyone [weil: Mike Burrows gives the same suggestion; he said we should raise an assertion violation as early as we can.]
    • Extensibility
      • Your code should be easily extensible
      • Write less, defer to others
      • Makes for cleaner code
      • Foster community and growth
    • Documentation
      • Structured (provide a clear format; users can build new views with it. An API for your API!)
      • Users want to help
        • Make barriers to helping very low
        • Keep your docs in a wiki
        • Only do this if you’ve already written all of your docs
        • Use templates to maintain structure
      • Write the Docs Yourself
        • It isn’t glamorous, but it’s essential
        • You must buckle-down and do it yourself
        • Improves your longevity
    • Testing (1000% Essential)
      • Test-driven development
        • Write test cases before you tackle the bugs
        • find devs who love to write test cases
        • check for failures before commit
    • Maintain Focus
      • very very important
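
A minimal sketch of the "Perform Type Checking" and "Errors" points above, written in Python rather than JavaScript. The function `resize` and its parameters are hypothetical, made up only for illustration; they are not part of jQuery or of the talk. It corrects values where it safely can, fails early with a clear message where it cannot, and keeps the callback as the last argument.

```python
def resize(width, height, callback=None):
    """Hypothetical library entry point, used only to illustrate the notes above."""
    # Correct values whenever possible: accept numeric strings such as "640".
    # (Note that int() also truncates floats such as 3.7 to 3.)
    try:
        width, height = int(width), int(height)
    except (TypeError, ValueError) as exc:
        # Never gobble errors: re-raise early with a message the caller can act on.
        raise TypeError(
            f"width and height must be integers, got {width!r} and {height!r}"
        ) from exc

    if width <= 0 or height <= 0:
        raise ValueError(f"width and height must be positive, got {width} x {height}")

    result = (width, height)
    # Consistent argument position: the callback always comes last.
    if callback is not None:
        callback(result)
    return result
```

Calling `resize("640", 480)` returns `(640, 480)`, while `resize("wide", 480)` raises a `TypeError` at the call site instead of failing somewhere deeper in the library.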

    Gears and the mashup problem (speaker: Yahoo)

     http://video.google.com/videoplay?docid=452089494323007214

    The Yoda of lambda programming and Google Gears.

    any damn fool could produce a better data format than XML (James Clark, 2007-04-06)

     Java

    • Java was a huge failure
    • Very popular, high acceptance
    • “Write once, run everywhere” promise not kept
    • Unworkable “blame the victim” security model.
    • Tedious UI model.
    • Successful as a server technology.

    Ajax

    • Applications without installation
    • Highly interactive
    • High social potential
    • Easy to use
    • Great network efficiency
    • but it is too damn hard to write applications

    Mashups: the most interesting innovation in software development in 20 years. But mashups are insecure: a mashup must not have access to any confidential information.

    Why?

    JavaScript dumps all programs into a common global space. There is nothing to protect the secrets of one component from another; any information in any component is visible to all other components.

     Drivers of innovation

    1. Proposal
    2. Standard
    3. Browser Makers
    4. Application Developers

    What kind of work is worth doing? We should do work that has an impact on others

    Sometimes I don’t like computer systems research, because so little of it is useful to others. People too often just play trick games. I hate that.

    Data Center Networking

    Data center applications are becoming more and more important to companies such as Microsoft, Google and Amazon. In this scenario, do we still need the TCP/IP network stack? TCP/IP targets a complex environment, with routing and failure handling. Those are unnecessary inside the data center, so maybe it is time for us to redesign the network stack for the data center.

    Here are some of my initial ideas about this topic.

    We partition the machines in the data center into several groups. In each group, the machines are all connected. We don’t need to maintain connections or a resend mechanism.

    1.jpg

    Dynamo: Amazon’s Highly Available Key-Value Store

    System Assumptions and Requirements

    • Query Model: simple read and write operations to a data item that is uniquely identified by a key.
    • Dynamo targets applications that operate with weaker consistency if this results in high availability. Dynamo does not provide any isolation guarantees and permits only single-key updates.
    • Efficiency: The system needs to function on commodity hardware infrastructure. Services have stringent latency requirements, which are in general measured at the 99.9th percentile of the distribution (for example, a service guaranteeing a response within 300 ms for 99.9% of its requests at a peak client load of 500 requests per second).
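
To make the 99.9th-percentile requirement concrete, here is a small sketch of how such a check could be computed from measured latencies. The function names and the nearest-rank percentile definition are my own assumptions for illustration; the Dynamo paper does not prescribe how the percentile is calculated.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids float rounding at rank boundaries.
    rank = math.ceil(Fraction(str(p)) / 100 * len(ordered))
    return ordered[rank - 1]


def meets_sla(latencies_ms, bound_ms=300.0, p=99.9):
    """True if the p-th percentile latency is within the bound (assumed SLA shape)."""
    return percentile(latencies_ms, p) <= bound_ms
```

With 2000 requests, three slow responses are enough to push the 99.9th percentile above the bound, even though the average latency still looks healthy. That is why the requirement is stated at the tail and not as a mean.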

    Design Considerations

    • Weak consistency (eventually consistent)
    • Application resolves conflicts (always writable)
    • Incremental scalability
    • Symmetry
    • Decentralization
    • Heterogeneity

    System Interface:

    • get(key) : return a single object or a list of objects with conflicting versions along with a context
    • put(key, context, object)
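
The shape of this interface can be sketched with a toy, single-process store. Everything here is an assumption for illustration: the class name `TinyVersionedStore` is made up, the context is modelled as the set of version ids the client has read, and there is no replication. Real Dynamo tracks versions with vector clocks across replicas.

```python
import itertools


class TinyVersionedStore:
    """Toy single-node store that keeps conflicting versions side by side."""

    def __init__(self):
        self._versions = {}              # key -> {version_id: object}
        self._ids = itertools.count(1)   # source of fresh version ids

    def get(self, key):
        """Return (list of stored versions, context describing what was read)."""
        versions = self._versions.get(key, {})
        return list(versions.values()), frozenset(versions)

    def put(self, key, context, obj):
        """Store obj, superseding only the versions named in the context."""
        versions = self._versions.setdefault(key, {})
        for version_id in context:
            versions.pop(version_id, None)
        versions[next(self._ids)] = obj


store = TinyVersionedStore()
store.put("cart", frozenset(), ["book"])
_, ctx_a = store.get("cart")
_, ctx_b = store.get("cart")                # a second client reads the same version
store.put("cart", ctx_a, ["book", "pen"])
store.put("cart", ctx_b, ["book", "cup"])   # stale context, so a conflict is kept
conflicting, ctx = store.get("cart")        # two versions come back
store.put("cart", ctx, ["book", "pen", "cup"])  # the application merges them
```

The store never rejects a write ("always writable"); it hands the conflicting versions back on the next get and leaves the merge to the application.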

    Experiences and lessons learned

    • The main advantage of Dynamo is that its client applications can tune the values of N, R and W to achieve their desired levels of performance, availability and durability.
    • Each node keeps an object buffer in main memory. Each write operation is stored in the buffer and is periodically written to storage by a writer thread.
    • 99.94% of requests saw exactly one version; 0.00057% of requests saw 2 versions; 0.00047% of requests saw 3 versions and 0.00009% of requests saw 4 versions (amazing)
    • Client-driven coordination is better than server-driven coordination.
    • Balancing background vs. foreground tasks.
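
A small sketch of what tuning N, R and W means. The paper notes that choosing R + W > N gives a quorum-like system; the helper names below are my own, and the brute-force check only confirms the overlap argument on small cases, it does not model Dynamo itself.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """True if R + W > N, so that read and write quorums must share a replica."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def every_read_sees_a_write(n, r, w):
    """Brute force: does every possible read set intersect every write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print((n, r, w), quorums_overlap(n, r, w), every_read_sees_a_write(n, r, w))
```

Lower R makes reads cheaper and lower W makes writes cheaper and more available, but once R + W drops to N or below, a read can miss the latest write entirely, as the (3, 1, 1) case shows.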
