Iterating multiple soap responses with Yield Return
  Posted April 23, 2004

My previous post on Streaming Multiple SOAP Responses actually was going somewhere web services related (other than 'CSV Rules! (for tabular data)'), but it got rather long in the tooth so I truncated it.

So how could you solve this with SOAP? You cannot use an array of returns, because you have to parse that last closing element to establish well-formedness before you pass the response out of the SOAP layer. But of course, you could just use multiple responses in the same stream, complete with multiple SOAP envelopes. Kind of like when you and your three roommates get the same credit card offer: the post office box has three slightly different envelopes. Message queues, SMTP, and other strange transports may have difficulty expressing a complete set of the responses and linking them with a request. But in HTTP (the most common transport) this is simple: you send the documents back to back, and where you would normally complain about data following the document element, you instead sever the stream and start a new document, as though it were in a different stream. You could do stranger stuff, like requiring one envelope per chunk to aid in parsing, but intermediaries that don't understand that requirement may make new chunks or even merge chunks; that is their right as proxies.
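As a rough illustration of severing the stream at each document element, here is a minimal Java sketch that splits a buffer of back-to-back envelopes by tracking element depth. The class and method names are made up for this example, and it assumes the input contains no comments, CDATA sections, or '>' characters inside attribute values; a real SOAP stack would do this inside its parser rather than on raw text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any SOAP toolkit.
class EnvelopeSplitter {
    /** Returns each complete top-level document found in the buffer. */
    static List<String> split(String stream) {
        List<String> docs = new ArrayList<>();
        int depth = 0;   // how many elements are currently open
        int start = -1;  // index where the current document began
        int i = 0;
        while (i < stream.length()) {
            int lt = stream.indexOf('<', i);
            if (lt < 0) break;
            int gt = stream.indexOf('>', lt);
            if (gt < 0) break; // incomplete tag: wait for more data
            if (start < 0) start = lt;
            i = gt + 1;
            char first = stream.charAt(lt + 1);
            if (first == '?' || first == '!') continue; // XML declaration or DOCTYPE
            if (first == '/') {
                depth--;                      // end tag
            } else if (stream.charAt(gt - 1) != '/') {
                depth++;                      // start tag (not self-closing)
            }
            if (depth == 0) {
                // The document element just closed: sever here and start over.
                docs.add(stream.substring(start, gt + 1));
                start = -1;
            }
        }
        return docs;
    }
}
```

Each string returned by `EnvelopeSplitter.split` could then be handed to an ordinary XML parser as though it had arrived on its own connection.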

Interestingly enough, in the 2.0 work of WSDL the W3C separated the semantics of the content of the operations from their use. Part 2 explains Message Exchange Patterns. Even though all of the patterns involve zero or one requests and zero or one responses, my quick perusal did not indicate any prohibition against extension patterns supporting multiple responses (correct me if I'm wrong), so in theory some semantic like this could be done with enough standardization, like say an in-multiple-out pattern. I suppose you could do multiple requests, but that is about as usable as a multiple-instruction-single-data CPU architecture: theoretically robust and complete but practically useless.

But there was one piece missing from this puzzle... how does this gently transition to simple language features? If you return an array and stream the responses, you then have to wait to get all the data before returning, which really defeats the purpose of a dribbling stream. ONDotNet however shows a potential answer in a new C# language construct, the yield return. When you have a piece of data you immediately call yield return foo, and that signals the language to go in and out of your method context to push the result down the wire. In a non-C# context, rather than using goofy context methods and listeners, a similar thing could be done by modeling the methods as stateful iterators.
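For the non-C# case, a stateful iterator might look something like the following Java sketch. The MatchIterator name and the predicate-based search are assumptions made up for illustration; the point is only that each result is handed to the caller as soon as it is found, and the iterator remembers where it left off.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical stand-in for a service method written with "yield return".
class MatchIterator<T> implements Iterator<T> {
    private final Iterator<T> source;           // the expensive underlying search
    private final Predicate<? super T> matches; // what counts as a result
    private T pending;                          // next result, once found
    private boolean ready;                      // true when pending holds a result

    MatchIterator(Iterator<T> source, Predicate<? super T> matches) {
        this.source = source;
        this.matches = matches;
    }

    @Override
    public boolean hasNext() {
        // Advance only as far as the next match, never to the end of the data.
        while (!ready && source.hasNext()) {
            T candidate = source.next();
            if (matches.test(candidate)) {
                pending = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        ready = false;
        T result = pending;
        pending = null;
        return result;
    }
}
```

A transport layer could call `next()` in a loop and write one response per item, which is roughly what yield return would arrange automatically.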

It could happen, in a standards-driven world, in 2-3 years on a broad basis perhaps. This may be the most compelling reason to do something other than the zero-or-one request or response patterns already described. But I'm a realist: CSV will have to do for a good while.

Streaming Multiple SOAP Responses
  Posted April 16, 2004

Sometimes it's just hard for the newest fangled spangled technology like SOAP and WSDL to beat tried and true technologies like Comma Separated Values, and sometimes it is the most benign of examples that seem to prove it.

Consider the following situation: you are searching or listing some potentially large data set, like a file system. You don't know how much stuff you are getting, but when you do find something you know all you will ever need to know about it. Conversely, this knowledge is atomic: you know nothing until you know everything. Furthermore, getting that information has some non-trivial cost that is basically static. And for the sake of argument all of this API is hidden behind some horrible facade that you cannot access (yea, I could be less abstract but, you know, NDA this, trade secret that, confidential interface here, yadda yadda yadda). How would you write a SOAP service to access that and get the data?

The obvious solution is to queue all of the results up into a single query. That's all fine and dandy, except that you then get all of the data at one time. That may not be desirable: what if you are showing a table with a row count north of the one-comma-when-pretty-printed range, and after five minutes without data the user/QA Engineer/Customer thinks the query is hung and kills the app? Never mind the fact that the first data was available within 2 seconds of submission; essentially we have made the time to first data the same as the time to all data plus network overhead. Why? Because of well-formed XML constraints: I don't know the SOAP response is valid XML until I receive the close tag, and for an unbounded element I don't know how many there are until the close tag for the wrapper appears. And then, because of threading issues, streaming just doesn't deal well with common SOAP toolkits.

Could we cache the results from the query and get the results in multiple calls? Then we have to cache it on the server side, which means we need to keep SOAP sessions or handle callback objects or other such messiness. And what if the query is aborted? Then how do you deal with leftover data no one ever came to get? LRU methods could also cause the data to be pushed out before you could get it when you have a high number of users or calls, unless you make the caches very big, not to mention the bug potential. And even if we could do a quicker call to get just some key for each piece of data and return that sooner, we would still have to go back and make N calls. If the overhead of each individual SOAP call is any single-digit fraction of the time to get the data from the key (or worse yet, a multiple), then the first row may arrive in 2 seconds but the total time to get all of that data is now 150% to 300% of what it was before if the overhead was 1/2 or 2x.
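To put rough numbers on that 150% to 300% figure, here is a back-of-the-envelope model in Java. The class, the method names, and the cost model itself are assumptions made for illustration: every item costs the same to fetch, every SOAP call carries the same fixed overhead, and the extra call that lists the keys is ignored.

```java
// Hypothetical cost model, not a measurement of any real service.
class CallOverhead {
    /** One call returns all n items, so the call overhead is paid once. */
    static double batchedTotal(int n, double fetch, double overhead) {
        return overhead + n * fetch;
    }

    /** One call per key, so the call overhead is paid n times. */
    static double perKeyTotal(int n, double fetch, double overhead) {
        return n * (overhead + fetch);
    }

    /** How much longer the per-key approach takes, as a multiple of the batched time. */
    static double ratio(int n, double fetch, double overhead) {
        return perKeyTotal(n, fetch, overhead) / batchedTotal(n, fetch, overhead);
    }
}
```

For large n the ratio approaches 1 + overhead/fetch, so an overhead of half the fetch time gives about 1.5x and an overhead of twice the fetch time gives about 3x.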

So how does CSV fit into this? First off, a CSV file can have full context for any given row taken out of the file at any point (the agreement on column meanings is the same as needing a WSDL to describe it). Parsing single lines is a rather trivial state machine that college freshmen could implement as a weekly CS101 lab. You don't need some magic end-of-content marker like a close tag; it's always a newline outside of quotes. Detecting when you are done is as simple as detecting the fact that the file is closed, and if a user abandons a request mid-stream the socket errors tell the querying engine to stop creating more data. It's much like feeding a baby: you move the stuff in the jar one spoonful at a time to the baby and you stop when the baby spits up, gets resistant, or you run out of baby food.
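The CS101-lab state machine could be sketched in Java roughly as follows. The CsvLine class is made up for this example, and it assumes the common convention that a quote inside a quoted field is written as two quotes; it handles one record at a time and leaves the "newline outside of quotes" record splitting to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a two-state parser (inside quotes / outside quotes).
class CsvLine {
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas are plain data in here
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // field boundary
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field has no trailing comma
        return fields;
    }
}
```

Because each row parses on its own, a client can display it the moment the line arrives instead of waiting for a closing tag.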

Optimizing Web Services is much like other optimization tasks: there are hot spots where you focus on tweaking the algorithm or dropping down to assembly language if you have to. And when you optimize for raw speed sometimes you get other pleasant surprises too: it's 10x to 100x smaller as well.

Web Services aren't Distributed Objects?!
  Posted December 03, 2003

Newsflash: Web Services aren't Distributed Objects [via The Server Side].

I have just one thing to say about that..... Duh! Maybe that's why it's getting greater traction than CORBA and Java RMI. RPC is a lot simpler than Distributed Objects, and sometimes the simplest solution is the best.

WS-I Basic Profile Testing Tools
  Posted August 13, 2003

For anyone that's been reading the technical news lately, WS-I released the 1.0 WS-I Basic Profile (WBP). Basically it's a document that takes several of the published documents relating to web services and removes as much of the ambiguity as it can. Clearly there was some politicking in the process behind some of the rules, but most of the "don't do that" rules fall under the realm of vague and ill-defined.

One of the nice things is that they also have a set of testing tools. They are currently in beta but cover the bulk of the big "gotchas" from the spec. Implementations are available in Java (based predominantly on Xerces and Axis) and C# (based on .NET). Source is not available. I spent most of today running these against the WSDL definitions I am developing and gleaned some surprising details.

  • Axis 1.1's WSDL2Java accepts a lot of WSDL that lies outside of the WBP. That's possibly because Java2WSDL generates a lot of WSDL that lies outside of the WBP. I consider that a good thing for WSDL2Java because it accepts what is within the basic profile (it's a tool, not a WBP conformance cop). I consider this a bad thing for Java2WSDL since some users are just going to blindly use the tool and say "Look! Web Services! Oohh. Aahh." Highlights (all WBP restrictions on the original WSDL spec) are the name attribute missing from the wsdl:fault element in the bindings, creating types instead of elements for the parts that are fault responses, and using wsdl:import to point to schema files.
  • SOAP Arrays are goners. This is one piece of conformance I won't be conforming to for a while. Why? For structures themselves I can use other structures to get WSDL2Java to create arrays. The problem lies in the port definitions. If I use the tricks the WBP recommends to construct arrays, WSDL2Java generates Java classes that consist of a single indexed JavaBean property, and the methods take parameters of those classes and not parameters that are arrays. If I use the SOAP Array extensions I get a method call that has arrays instead of that bogus class. This seems to be a tough problem for Axis to address in a non-specialized form. Perhaps there could be a processor rule that checks to see if a type for a message part consists of a single unbounded element, and if so then the method part becomes an array with the name of the element child of the element's type. I'd hate to be an IBM programmer working on WebSphere right now. As for .NET programmers? They'll never know, since the C# version of the test tool doesn't flag this error.
  • No method overloading. There can be only one bound port and operation corresponding to a name. Differentiating these operations by message parts is not allowed; the names must be distinct. The reasons make sense. Using overloading makes your WSDL a leaky abstraction, showing too much of the underlying language's features. That, and some client languages don't allow method overloading and that would break stuff. We want to be sure that those COBOL programmers can use your web service too!
  • Goodbye RPC/encoded and hello RPC/literal. This is portrayed as one of the biggest political force-feeds in the spec, but for clear definition of your web service's XML structure it has some merit. The use attribute on all of the bindings must be literal and can never be encoded. From an Axis viewpoint it means very little. You will only see a difference if you snoop the wire and see that all of those multi-ref elements are missing. What's really funny is that WSDL.exe (from Microsoft's .NET SDK) doesn't accept RPC/literal and demands RPC/encoded or document/literal, accepting nothing else. It whines like a mule and refuses to create operations when you try and feed it RPC/literal. This is another portion of the WBP I won't be conformant to for a while, because I use C# to prove some base level of interoperability. So when Microsoft gets their rear in gear I'll get mine. Did I mention I'm going on a vacation soon?

And one last tidbit: don't use circular schema imports. The Java version of the testing tool will start in an infinite recursive loop that no number of StackOverflowErrors will ever be able to stop. You probably shouldn't do it anyway, but the testing tool won't tell you it's a problem (since it won't tell you anything).

Straw Man, SOAPy Water
  Posted July 16, 2003

Here we have a piece of RDF advocacy masquerading as a SOAP RPC vs. Document Encoding article. The problem is that once you get past the first section, the arguments the author uses in no way support the contention made in the opening section; they are in fact blasting some unrelated topic that he has vaguely associated with the original contention.

This is a common argumentation fallacy referred to as a "Straw Man Argument." The layman's definition is that the arguer brings in some unrelated issue (the straw man) and proceeds to destroy it, and then claims that he was actually destroying the original argument, while actually not addressing the issue in any tangible fashion.

Here's the essential summary of the technical portion of the article: "Using RDF is more descriptive than a real bad tree encoding stored in a hash table." A slightly longer version is "Using RDF (let's demonstrate this with a document encoding) is more descriptive than a real bad tree encoding stored in a hash table (let's use RPC encoding for this example)." The issue of SOAP binding encoding has absolutely nothing to do with the real philosophical issue at hand, and instead he is beating the RDF drum. His "perfect" RDF example could just as easily have been done with the <soap:binding style="RPC" ... > . But that would have ruined his groove, so he ignores that fact.

Despite the misplaced direction there are some good practices that we can draw from this example. First, use the WSDL file as the master definition. In the first example he uses a .jws to deploy the first web service, which is quick and dirty, with an emphasis on the dirty. However hard you try, you cannot separate the quick from the dirty using .jws files. By using a WSDL as the master in the second he has also increased the interoperability of the entire system, since leaky abstractions from Java to WSDL won't get translated into any third language with second-level abstraction leaks. With the WSDL as the master each client has to map only the WSDL issues and not inherit old Java issues.

Second: some data structures just plain old suck. Anything is better than the encoded mess that is the hashtable/tuple mapping with encoded data. Not to mention that using a tree path as a string is so LDAP. Even RDF was better than the first example (it wasn't that hard).

Third: if you have a data format already in XML Schema, just go ahead and use it in the WSDL. You don't need to re-invent the wheel if you already have your data format. If you want to use RDF as your message parts in a WSDL call, go for it! This issue is totally unrelated to whether the client should consider this an RPC or document/literal call (which they can freely ignore if they choose).

Lots of good issues in that article, but it fell prey to the trendy RPC vs. document debate in the SOAP community while having nothing to do with it. And why the blind allegiance to document? Is he applying for a job at Microsoft?

Axis of Cookies
  Posted May 05, 2003

I swear, sometimes it feels like I spend more time reading third party code to see what it's really doing than writing code of my own. Today's subject of abuse will be Apache Axis. Here's the use case: using a Java Plug-in applet inside a web app that uses form-based authentication, access a SOAP service behind that authentication. You don't have the password available (there are tricks to get the username, but you really have to think outside the box to intercept the password, and forget about C3 certification at that point!)

My first line of thought was that cookies are magically added to URL requests in the plug-in, so it must work in Axis. That is a cool and very real feature of the plug-in, but it only works when using a java.net.URL, and Axis doesn't use it; it uses some custom HTTP sender with support for the Jakarta Commons HTTP Sender. Thank you for playing, try again.

What worked for a while was the URL re-writing method from the servlet spec, basically using http://shemnon.com/speling/;jsessionid=<sessionID>. That works fine on some servers, but not others. The hitch is that in the spec URL re-writing is only meant to be used when the client rejects the cookies. One app server accepts the session id in either the cookie or the URL re-writing, but another one we need to support doesn't. And you know what sucks? It's the second app server that is probably properly implementing that part of the servlet spec! We're not playing horseshoes here so I need to keep going.
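For reference, the URL re-writing trick amounts to splicing the session id into the path ahead of any query string. A minimal Java sketch, with a made-up helper name and assuming the URL has no fragment:

```java
// Hypothetical helper; a servlet container does this for you via
// HttpServletResponse.encodeURL on the server side.
class SessionUrl {
    static String withSessionId(String url, String sessionId) {
        int q = url.indexOf('?');
        String path = (q < 0) ? url : url.substring(0, q);
        String query = (q < 0) ? "" : url.substring(q);
        // The session id rides along as a path parameter, before the query.
        return path + ";jsessionid=" + sessionId + query;
    }
}
```

The resulting string is what would be handed to the service locator as the endpoint address, which is exactly the part that only some servers honor when a cookie is also possible.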

So as little as I want to, I have to dive into the JAX-RPC spec. The truth is that Axis's WSDL2Java is so easy to use from a coding perspective that you rarely need to use the standard APIs that it implements. It just looks like another RMI-style interface that magically works. Believe me, I like stuff that magically works; it allows me to go home after my 8 hours are in.

It turns out that the classes you get from the ServiceLocators are also javax.xml.rpc.Stub classes. These provide access to some standard properties; if I needed HTTP Basic authentication I'd be in luck, but alas no dice, I need to use the currently bound user. There is also another standard property that will maintain the HTTP session, but it maintains only new sessions! And it sucks when all you get is "302 Found" messages to the login form when you don't know the password. And apparently the session is only maintained on a per-service basis as well (per instance of the service, to make things worse).

This is where open source and source-available software comes in handy. After diving into the setMaintainSession(boolean) code deep inside of the transport handlers I came to the conclusion that the cookie was stored in a property set that inherited ultimately from the same properties that the service uses to read the standard JAX-RPC properties. So all I need to do is set the cookie in the service session when I get it from the service...

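import org.apache.axis.transport.http.HTTPConstants;

/* ... */

// sessionID is assumed to hold the JSESSIONID of the already authenticated session
MyServiceLocator locator = new MyServiceLocator();
MyService service = locator.getMyService();
// Standard JAX-RPC property: ask the stub to maintain the HTTP session
((javax.xml.rpc.Stub)service)._setProperty(
    "javax.xml.rpc.session.maintain",
    Boolean.TRUE);
// Seed the cookie property so the existing session is reused instead of a new one
((javax.xml.rpc.Stub)service)._setProperty(
    HTTPConstants.HEADER_COOKIE,
    "JSESSIONID=" + sessionID);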

and those last two statements only cost me one afternoon of lost time....