Archive for Research

P2P, Streaming and CDNs: What Will Really Work?

From the wondering-out-loud dept. comes this question: Is peer-to-peer (P2P) technology on the verge of radically changing the content-delivery marketplace? And if so, what does that mean for both content producers and content delivery networks — more opportunity, or threatened business models, or both all around?

While there’s no single news nugget to point to emphatically, a series of recent announcements, posts and observations all seem headed in the direction of a big collision between traditional CDNs, P2P technology and streaming video. Out of the pileup, we see the following questions that don’t yet have clear answers; but please feel free to provide some in the comments arena.

  • What happens to the traditional CDN business when P2P is added to the mix? According to this week’s news from CacheLogic, it means more flexibility and cheaper pricing for content providers. Akamai last month bought its own P2P play, RedSwoosh. And how do BitTorrent and upstart Neokast fit into the equation?
  • Does a combination of CDN and P2P solve some of the quality-of-service issues many service providers were predicting that heavy video use would bring? If so, what happens then to AT&T’s and Verizon’s IPTV business models, which were built somewhat on the idea of being able to charge premiums for faster video delivery?
  • When will Google and Cisco flex their considerable infrastructure muscle to take (even more) advantage of the growing demand for online video? On Wednesday Google took one step in that direction by making video search part of its powerful first page of results — wonder how that went over in Sumner Redstone’s office.
  • Cisco, meanwhile, confirmed its intentions this week to offer such networking services, which we had wondered about previously. Even as Cisco second-in-command Charlie Giancarlo tried to dispel notions that such a service would be consumer-pointed, or have a Cisco brand name, it’s clear now that the networking giant is going to move beyond boxes — but what does that mean to all its service-provider customers?
  • What are the new business models that better content delivery technology could enable, beyond Joost, Justin.TV and Ustream?

    As you are crafting your opinions, some more P2P/CDN nuggets:
    – Most BitTorrent traffic is TV shows, not movies. (TorrentFreak)
    – Online media requires a hybrid approach? (Streaming Media)
    – New Flash Player will enable P2P for .FLV clips. (Beet.TV)
    – Can Joost overcome Infrastructure Problems? (NewTeeVee)
    – CDN Startups Talk Tough (Light Reading)

    So what do you think?


    Gears and mashup problem (speaker: Yahoo)

     http://video.google.com/videoplay?docid=452089494323007214

    The Yoda of lambda programming, on Google Gears.

    “Any damn fool could produce a better data format than XML.” (James Clark, 2007-04-06)

     Java

    • Java was a huge failure
    • Very popular, high acceptance
    • “Write once, run everywhere” promise not kept
    • Unworkable “blame the victim” security model.
    • Tedious UI model.
    • Successful as a server technology.

    Ajax

    • Applications without installation
    • Highly interactive
    • High social potential
    • Easy to use
    • Great network efficiency
    • but it is too damn hard to write applications

    Mashups: the most interesting innovation in software development in 20 years. But mashups are insecure, so a mashup must not have access to any confidential information.

    Why?

    JavaScript dumps all programs into a common global space; there is nothing to protect the secrets of one component from another. Any information in any component is visible to all other components.

     Drivers of innovation

    1. Proposal
    2. Standard
    3. Browser Makers
    4. Application Developers


    Data Center Networking

    Data center applications matter more and more to companies such as Microsoft, Google, and Amazon. In this scenario, do we still need the TCP/IP network stack? TCP/IP is targeted at complex environments: routing and failure handling. Much of that is unnecessary inside a data center, so maybe it is time for us to redesign the network stack for the data center.

    Here are some of my initial ideas about this topic.

    We partition the machines in the data center into several groups. Within each group, the machines are fully connected, so we do not need to maintain connections or a resend mechanism between them.



    Dynamo: Amazon’s Highly Available Key-Value Store

    System Assumptions and Requirements

    • Query Model: simple read and write operations to a data item that is uniquely identified by a key.
    • Dynamo targets applications that can operate with weaker consistency if this results in higher availability. Dynamo does not provide any isolation guarantees and permits only single-key updates.
    • Efficiency: the system needs to function on a commodity hardware infrastructure. Services have stringent latency requirements, which are in general measured at the 99.9th percentile of the distribution (e.g., respond within 300 ms for 99.9% of requests at a peak client load of 500 requests per second).

    Design Considerations

    • Weak consistency (eventually consistent)
    • Application resolves conflicts (always writable)
    • Incremental scalability
    • Symmetry
    • Decentralization
    • Heterogeneity

    System Interface:

    • get(key): returns a single object, or a list of objects with conflicting versions, along with a context
    • put(key, context, object)

    Experiences and lessons learned

    • The main advantage of Dynamo is that its client applications can tune the values of N, R, and W to achieve their desired levels of performance, availability, and durability (see the sketch after this list).
    • Use an object buffer in each node's main memory: each write operation is stored in the buffer and gets periodically written to storage by a writer thread.
    • 99.94% of requests saw exactly one version; 0.00057% of requests saw 2 versions; 0.00047% saw 3 versions; and 0.00009% saw 4 versions (amazing).
    • Client-driven coordination is better than server-driven coordination.
    • Balancing background vs. foreground tasks.
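
    To make that first point concrete, here is a minimal, self-contained C sketch of N/R/W quorum reads and writes. The in-memory replica array and the helper names are invented for this sketch and are not Amazon's implementation; the one real constraint it encodes is that choosing R + W > N makes every read quorum overlap every write quorum.

        /* quorum.c -- a toy of Dynamo-style N/R/W quorum tuning; the
         * in-memory "replicas" below stand in for real storage nodes. */
        #include <stdio.h>
        #include <string.h>

        #define N 3   /* replicas per key */
        #define W 2   /* write quorum     */
        #define R 2   /* read quorum: R + W > N forces overlap */

        static char replica[N][64];          /* one toy value per node */

        static int replica_write(int node, const char *val)
        {
            strncpy(replica[node], val, sizeof replica[node] - 1);
            return 1;                        /* node acknowledged */
        }

        static int replica_read(int node, char *out, size_t len)
        {
            strncpy(out, replica[node], len - 1);
            return 1;
        }

        /* A put succeeds once W of the N replicas acknowledge it;
         * a real coordinator would stop waiting after W acks. */
        static int quorum_put(const char *val)
        {
            int acks = 0;
            for (int node = 0; node < N; node++)
                acks += replica_write(node, val);
            return acks >= W;
        }

        /* A get succeeds once R replicas answer; since R + W > N, at
         * least one answering replica saw the latest successful put. */
        static int quorum_get(char *out, size_t len)
        {
            int acks = 0;
            for (int node = 0; node < N; node++)
                acks += replica_read(node, out, len);
            return acks >= R;
        }

        int main(void)
        {
            char buf[64];
            quorum_put("v1");
            if (quorum_get(buf, sizeof buf))
                printf("read: %s\n", buf);
            return 0;
        }

    Lowering W toward 1 buys write latency and availability at the cost of durability, which is the trade-off the notes above describe.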


    The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

    http://www.gotw.ca/publications/concurrency-ddj.htm

    Two issues that any parallel program must address are how the data is partitioned and how tasks communicate. Through user choice and market attrition, parallel programming standards have essentially converged on the following three models:

    1. Data parallelism. Each task works on data that is separate from every other task's, and tasks communicate by message passing; both the data partitioning and the message passing are done by the compiler.

    HPF (High Performance Fortran) is the classic data-parallel programming language. Because current compiler technology still handles the irregular problems of real applications poorly, and because HPF focuses exclusively on data parallelism, it has not been widely adopted.

    2. Message passing. Each task again works on separate data, and tasks communicate by message passing, but here the data partitioning and message passing are done by the programmer and user, which demands a great deal of the programmer. This model fits message-passing architectures (such as clusters) very well; the main concerns for the user and programmer are communication synchronization and communication performance.

    PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) are the two widely used message-passing parallel programming standards. PVM emphasizes portability and interoperability in heterogeneous environments; MPI stresses performance, with different implementations for heterogeneous environments. Almost all high-performance computing systems support PVM and MPI.
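
    To make the message-passing model concrete, here is a minimal MPI program in C. It is a generic sketch of my own, not drawn from any source above; it uses only standard MPI calls, and note that the programmer moves every byte explicitly.

        /* mpi_ping.c -- minimal message passing with MPI.
         * Build: mpicc mpi_ping.c -o mpi_ping
         * Run:   mpirun -np 2 ./mpi_ping */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int rank, value;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) {
                value = 42;
                /* the programmer moves the data explicitly */
                MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 1 received %d from rank 0\n", value);
            }

            MPI_Finalize();
            return 0;
        }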

    3. Shared memory. Tasks share the data they process in memory and communicate through that shared data; the sharing can be managed by the programmer or by the compiler. Shared-memory parallel programming is mainly used on SMP (Symmetric Multi-Processor) systems.

    OpenMP (Open Multi-Processing, which grew out of X3H5) and PThreads (POSIX Threads) are both implementations of shared-memory parallel programming.

    OpenMP evolved from the X3H5 standard established in 1993 and has become the de facto industry standard for shared-memory parallel programming, with broad support from vendors such as DEC, Intel, IBM, and Sun. It is implemented for Fortran and C/C++, and it mainly supports implicit parallel programming: the compiler realizes the parallelism.
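
    Here is what that implicit style looks like in practice. This tiny example is my own generic sketch, not from any source above; the single pragma is the entire parallelization, and the compiler and runtime do the rest.

        /* omp_sum.c -- implicit shared-memory parallelism with OpenMP.
         * Build: gcc -fopenmp omp_sum.c -o omp_sum */
        #include <stdio.h>

        int main(void)
        {
            double sum = 0.0;

            /* the runtime splits the iterations across threads and
             * merges each thread's private partial sum (reduction) */
            #pragma omp parallel for reduction(+:sum)
            for (int i = 1; i <= 1000000; i++)
                sum += 1.0 / i;

            printf("harmonic(1000000) = %f\n", sum);
            return 0;
        }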

    PThreads is used mainly on Unix systems. There are many Unix implementations (Linux, FreeBSD, Solaris, Mac OS X, and so on), and developing a cross-platform multithreaded application across the many Unix-like systems is far from easy, which is why the POSIX Thread standard was created. David R. Butenhof's book Programming with POSIX Threads (Butenhof is one of the founders of the Boost library and a member of the ISO C++ standards committee) is the essential reference for writing multithreaded applications on Unix, and it is also highly valuable for parallel development on other platforms.
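
    For contrast, here is the same shared-memory model done explicitly with POSIX threads. This toy counter is my own illustration; the mutex around the shared variable is exactly the kind of locking the closing paragraph below worries about.

        /* pthread_count.c -- explicit shared-memory threading with
         * POSIX threads. Build: gcc -pthread pthread_count.c */
        #include <pthread.h>
        #include <stdio.h>

        #define NTHREADS 4

        static long counter;                    /* shared data */
        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

        static void *worker(void *arg)
        {
            (void)arg;
            for (int i = 0; i < 100000; i++) {
                pthread_mutex_lock(&lock);      /* programmer-managed lock */
                counter++;
                pthread_mutex_unlock(&lock);
            }
            return NULL;
        }

        int main(void)
        {
            pthread_t tid[NTHREADS];
            for (int i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, worker, NULL);
            for (int i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
            printf("counter = %ld\n", counter); /* expect 400000 */
            return 0;
        }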

    Overall, shared-memory parallel programming is the closest match to how most multithreaded programmers already think, so it is the lowest-cost path for programmers moving from single-core to multi-core systems. Experts still disagree, though. Herb Sutter, for example, is not optimistic about OpenMP: shared-memory parallel programming is not fundamentally much of an improvement, since it still depends on locking shared data, and that brings performance problems. Message passing has the performance advantage, but it asks too much of the programmer. Solving these hard problems will take continued work from the experts behind parallelism and the various standards and libraries.


    AltaVista Index Talk (Mike Burrows)

    http://www.researchchannel.org/prog/displayevent.aspx?rID=2123

    Some notes:

    1. The continuation design is good; it avoids a lot of unused branches.
    2. For critical pieces of code: choose instructions that dual-issue well; the fixed word structure allows prefetch; avoid branch mispredictions.
    3. Branch mispredictions cost a lot of performance, but the question is how to reduce them on x86. I am very familiar with optimizing applications on RISC, but not with the x86 micro-architecture. I currently use the VTune tools to get performance results but don't know how to reduce the branch mispredictions. If you have some ideas, please tell me. Thanks a lot.
    4. The interface for index stream readers (ISRs): loc(), next(), seek(X). (A sketch of this interface follows the list.)
    5. Constraint-solver processing is critical to serving from the index.
    6. Queries take about 100 cycles/query/MByte (AltaVista), with a 1.5 GB index; if the cost scales with index size, that is on the order of 150,000 cycles per query.
    7. Time breakdown: 30% inner loop, 15% constraint solver, 15% higher-level seek code, 7% ranking code, 0.2% merging results. Miss ratios: 2% I-cache, 8% D-cache, 8% level-2 cache, 40% level-3 cache. It seems that the current ranking algorithm is more complex than the one in 2000.
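
    The talk only names the three ISR operations: loc(), next(), and seek(X). Everything below (the struct layout, the leapfrog AND iterator, the toy array-backed reader) is my own guess in C at how such readers compose, not the AltaVista code; the conjunction loop is the flavor of work the constraint solver does over posting locations.

        /* isr.c -- sketch of an index-stream-reader interface. */
        #include <stdint.h>
        #include <stdio.h>

        #define LOC_EOF UINT64_MAX

        typedef struct isr {
            uint64_t (*loc)(struct isr *);            /* current location */
            uint64_t (*next)(struct isr *);           /* advance one hit  */
            uint64_t (*seek)(struct isr *, uint64_t); /* first hit >= X   */
        } isr;

        /* Conjunction ("a AND b"): leapfrog the readers until both sit
         * on the same location, or one of them is exhausted. */
        static uint64_t and_seek(isr *a, isr *b, uint64_t x)
        {
            uint64_t pa = a->seek(a, x);
            while (pa != LOC_EOF) {
                uint64_t pb = b->seek(b, pa);
                if (pb == pa)
                    return pa;            /* both agree: a match */
                if (pb == LOC_EOF)
                    break;
                pa = a->seek(a, pb);      /* catch the lagging reader up */
            }
            return LOC_EOF;
        }

        /* Toy array-backed reader so the sketch actually runs. */
        typedef struct { isr base; const uint64_t *hits; int n, i; } array_isr;

        static uint64_t a_loc(isr *s)
        {
            array_isr *r = (array_isr *)s;
            return r->i < r->n ? r->hits[r->i] : LOC_EOF;
        }
        static uint64_t a_next(isr *s)
        {
            array_isr *r = (array_isr *)s;
            if (r->i < r->n) r->i++;
            return a_loc(s);
        }
        static uint64_t a_seek(isr *s, uint64_t x)
        {
            array_isr *r = (array_isr *)s;
            while (r->i < r->n && r->hits[r->i] < x) r->i++;
            return a_loc(s);
        }

        int main(void)
        {
            const uint64_t h1[] = {2, 5, 9, 14}, h2[] = {5, 9, 20};
            array_isr r1 = {{a_loc, a_next, a_seek}, h1, 4, 0};
            array_isr r2 = {{a_loc, a_next, a_seek}, h2, 3, 0};
            printf("first match: %llu\n",
                   (unsigned long long)and_seek(&r1.base, &r2.base, 0));
            return 0;
        }

    Running it prints "first match: 5", the first location present in both readers.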


    MashupOS

    Today we discussed the SOSP paper “Protection and Communication Abstractions for Web Browsers in MashupOS”. As we know, mashups are very popular now; maybe this is the killer of the traditional operating system. Maybe in the future we will do everything through the browser, with our data kept on the Internet at big service companies like Google, Facebook, Yahoo, Live, and MSN. The paper proposes communication abstractions for browsers and ways to protect them. Current browsers do not support communication between different domains, but mashups need that communication, so future browsers may well support it; at that point the browser starts to look like a new operating system. And how do we avoid the problems we encounter now (memory leaks, buffer overflows)? This paper wants to solve these problems in the mashup OS.

