How efficient is transaction control in a distributed JavaBean application?

There is something about transaction control in distributed applications that I don't understand.
In a distributed application, the same JavaBean is spread across many different JVMs. If I want to perform an insert/update/delete on it, don't all the JVMs holding that JavaBean have to be notified and transactionally locked before the operation can proceed? If, as in cloud computing, there are tens or hundreds of thousands of machines and a JavaBean is distributed across that many JVMs, wouldn't insert/update/delete operations drag the whole system down?
A search engine can live with transactional inconsistency; a few pages more or less, or data that is a few days stale, doesn't really matter. But an enterprise application cannot tolerate that kind of error, so isn't there a problem for this class of applications?
Also, if a system has several web servers (Struts+Spring+Hibernate) all connecting to one db server, can Hibernate run into transaction problems? For example, if a POJO is changed through the Hibernate on web server A, will the same POJO held in web server B's Hibernate Session automatically be locked until the change completes?
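For reference, Hibernate Sessions running in different JVMs know nothing about each other, so nothing on web server B is locked automatically; the usual safeguard against two servers updating the same row is optimistic locking with a version column, checked when each transaction commits against the shared database. A minimal sketch, assuming a hypothetical Account entity mapped with the standard JPA/Hibernate annotations:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

// Hypothetical entity used only to illustrate optimistic locking.
@Entity
public class Account {

    @Id
    private Long id;

    private long balance;

    // Hibernate increments this column on every update. If the Session on
    // web server B committed a newer version first, the stale update from
    // web server A fails (Hibernate raises StaleObjectStateException)
    // instead of silently overwriting the other change.
    @Version
    private int version;

    public void credit(long amount) {
        this.balance += amount;
    }
}

In other words, conflicting changes from two web servers are detected at commit time through the row's version value, not prevented by any lock spanning the two Hibernate Sessions.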


Is nobody going to reply?! I keep feeling that transaction and state handling in a distributed system costs far too much. Surely there is a tipping point beyond which scaling out the cluster no longer pays off for an enterprise application, i.e. where the overhead spent on transactions and state exceeds the benefit of adding more machines?

Banq has mentioned more than once that EJB's advantage over Hibernate is distributed objects/components, but I seem to remember material saying that in a distributed system 80% of object/component calls happen within the same JVM. In other words, distribution may be spending 80% of the effort to solve 20% of the problem. Is that reasonable?
As for load balancing, shouldn't a hardware-level F5 or an Apache web-container-level cluster be able to solve 80% of the problem? Granted, coarse-grained requests that need a lot of compute in a single call can put a fairly heavy load on one machine.

On fail over, Banq has argued that a SessionBean beats HttpSession in many ways, mainly two: first, a SessionBean lives in the application layer and is a better place to keep client state, whereas HttpSession belongs to the presentation layer, so using it for client state muddles the layering; second, session replication: with HttpSession you can only get fail over through rather inelegant means such as in-memory replication or persisting to a database or LDAP, while a SessionBean is distributed by nature!
But a single machine (especially one with hot-standby CPUs and memory) is more and more reliable, so fail over should happen very, very rarely, shouldn't it? And even when it does, isn't that tolerable for an ordinary enterprise application?

Also, can enterprise applications live with the RESTful model, which keeps no client state on the server at all?

Here is a blog post about lessons on distributed transactions that was once recommended by TSS:
http://natishalom.typepad.com/nati_shaloms_blog/2007/08/lessons-from-am.html

Unfortunately that site is blocked in mainland China, so this excellent article can't be read there directly; I'm reposting it below in the hope that it gives everyone a serious understanding of distributed transactions. Distributed transactions are about keeping data safe under heavy traffic; neglect them and you get embarrassments like an ATM spitting out cash. We programmers had better broaden our knowledge first and then decide what to use case by case. Or ask your customer: the system I'm building for you has maybe a 1% chance of an ATM-cash-spitting incident; let the customer sort it out with the lawyers and have the unlucky user who hits that 1% sentenced to life. Then the world will be at peace.

Lessons from Pat Helland: Life Beyond Distributed Transactions
Distributed Transactions are a common pattern used to ensure ACID properties are met between distributed resources. Since the late 70s, the first generation of this model has been widely used in many large-scale applications that struggled with the difficulties of multiplexing many online terminals. This led to the emergence of the 1st-generation TP Monitors (TPMs), such as Tuxedo (now owned by BEA). The emergence of web-based applications in the late 90s drove the creation of 2nd-generation TPMs, in the form of JEE application servers, to address similar needs using more open and modern technologies, as described in the following article by Subbu Allamaraju: Nuts and Bolts of Transaction Processing. The diagram below illustrates a typical transaction flow in a JEE environment:

Transactions in EJB Application Server
(Source: Subbu Allamaraju, Nuts and Bolts of Transaction Processing)

The increased business demand for greater scalability led many to the realization that the current transaction model is a bottleneck due to its inherently centralized approach. It also makes our systems quite brittle and complex due to the tight coupling that it introduces, as described in a paper by Pat Helland of Amazon.com: Life beyond Distributed Transactions: an Apostate’s Opinion:

Today, we see new design pressures foisted onto programmers that simply want to solve business problems. Their realities are taking them into a world of almost-infinite scaling and forcing them into design problems largely unrelated to the real business at hand. Unfortunately, programmers striving to solve business goals like eCommerce, supply-chain-management, financial, and health-care applications increasingly need to think about scaling without distributed transactions. They do this because attempts to use distributed transactions are too fragile and perform poorly.

This challenge leads us to the emergence of a third generation of TPMs, or what Gartner calls Extreme Transaction Processing (XTP).

What approaches are people taking today to overcome the limitations of previous generations? From Pat:

"..Because the performance costs and fragility make distributed transaction impractical. Natural selection kicks in,... applications are built using different techniques which do not provide the same transactional guarantees but still meet the needs of their businesses"

So the question is how to achieve high transactional throughput and performance without compromising consistency and reliability?

Pat Helland suggests the following principles:

Instead of global transactional serializability we assume multiple disjoint scopes of transactional serializability
Scalable Apps Use Uniquely Identified “Entities” [Nati: Entities represent data elements that are equivalent to Tuples in Space-Based Architecture terminology]
Atomic Transactions Cannot Span Entities
From the programmer’s perspective, the uniquely identified entity is the scope of serializability
Messages Are Addressed to Entities [Nati: in other words, we use messages to manage the workflow between the disjointed transactions - this is pretty much the equivalent of a transaction coordinator].
Entities Manage Per-Partner State (“Activities”) [Nati: in GigaSpaces terminology: each activity is executed within a collocated partition].
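
To make "the uniquely identified entity is the scope of serializability" concrete, here is a small Java sketch of my own (OrderEntity and OutgoingMessage are made-up names, not from the paper): a change that touches only one entity is atomic, while work involving another entity is turned into a message addressed to that entity's key.

// Illustrative only: each entity has a unique identity, and all atomic
// (serializable) work happens inside exactly one entity.
public final class OrderEntity {

    private final String orderId;   // unique identity of this entity
    private String status = "NEW";

    public OrderEntity(String orderId) {
        this.orderId = orderId;
    }

    // An atomic state change: it touches only this entity's own state,
    // so it needs no transaction that spans other entities.
    public synchronized void markPaid() {
        this.status = "PAID";
    }

    // Work that involves another entity (say, the warehouse) is not done
    // in the same transaction. Instead, a message addressed to that
    // entity's key is produced, and the overall workflow is stitched
    // together from several single-entity transactions.
    public OutgoingMessage requestShipment() {
        return new OutgoingMessage("warehouse:" + orderId, "SHIP " + orderId);
    }
}

// Minimal message carrier; a real system would hand this to a queue or grid.
final class OutgoingMessage {
    final String targetEntityKey;
    final String payload;

    OutgoingMessage(String targetEntityKey, String payload) {
        this.targetEntityKey = targetEntityKey;
        this.payload = payload;
    }
}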
In order to avoid distributed transactions you must:

Create/rearrange your object model around entities (objects with IDs), in such a way that most of your state changes happen inside the boundaries of a single entity. A change to a single entity is atomic.
For those changes that exceed the boundaries of one entity, the change will not be atomic; the final desired state is reached through a sequence of atomic changes. Because of the chance of retransmission after failures, every change request (activity) MUST be implemented to be idempotent.
To accomplish idempotency of your activities (actions/methods), the entity itself SHOULD hold a history of committed activities per source (entity).
Combining these three principles allows you to safely change the state of the application without distributed transactions, while allowing for almost infinite scaling.
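
A rough Java sketch of the idempotency point, again with made-up names (AccountEntity, applyCredit): the entity keeps a per-source history of the activity ids it has already committed, so a retransmitted request changes nothing the second time.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical entity that records committed activities per source entity,
// which is what makes re-delivered change requests harmless.
public class AccountEntity {

    private long balance;

    // source entity id -> ids of activities already applied from that source
    private final Map<String, Set<String>> committedActivities =
            new HashMap<String, Set<String>>();

    // Returns true if the activity was applied, false if it was a duplicate.
    public synchronized boolean applyCredit(String sourceId, String activityId, long amount) {
        Set<String> history = committedActivities.get(sourceId);
        if (history == null) {
            history = new HashSet<String>();
            committedActivities.put(sourceId, history);
        }
        if (!history.add(activityId)) {
            return false;        // already committed once: idempotent no-op
        }
        balance += amount;       // the actual atomic, single-entity change
        return true;
    }
}

Calling applyCredit twice with the same sourceId and activityId changes the balance only once, which is exactly why a failed or duplicated message can simply be retried.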

How does this apply to SOA?

A core principle in SOA, event-driven architecture (EDA) and grid computing is the notion of loose coupling of services. The distributed transaction model leads to tight coupling by definition, as it requires intimate integration and runtime dependencies among all of the services that are associated with a transaction. The suggested model of "disjoint transactions" fits in nicely with this type of new architecture. Here's Pat again:

Everything discussed in this paper is supportive of SOA. Most SOA implementations embrace independent transaction scopes across services. The major enhancement to SOA presented here is the notion that each service may confront almost-infinite scaling within itself and some observations about what that means. These observations apply across services in a SOA and within those individual services where they are designed to independently scale.

Pat ends with:

In a few years we are likely to see the development of new middleware or platforms which provide automated management of these applications and eliminate the scaling challenges for applications developed within a stylized programming paradigm. This is strongly parallel to the emergence of TP-monitors in the 1970s.

Guess what, Pat, the future is now!

Having skimmed it, the idea seems to be that, through good design, you keep each transaction within a controllable scope as much as possible (so the transactional coupling stays tight only inside that scope?).
Each entity exists in the distributed system as exactly one uniquely identified instance.
Too busy today; I'll read the rest tomorrow.