If I understand what you are asking, I would suggest that you use encapsulated transactions and cache each transaction to disk in a binary form. Then, when you restart, you not only know where the process was, but you also know whether you have enough data to complete the transaction and log it as complete, or need to roll it back and log it as failed. The reason for separating the transactions from the log is to handle them more efficiently from a space and processing perspective (binary records are smaller and faster than ASCII), while using the log for informational data unless some sort of debugging is on.
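A minimal sketch of the idea (all names here are hypothetical, not a real API): append each transaction to a compact binary log, and append a commit marker only after the work completes, so a restart can tell finished transactions from ones that need rolling back.

```fsharp
open System.IO

// Hypothetical sketch of the two-phase log described above.
// Phase 1: append the transaction payload in binary form.
let writeTransaction (w : BinaryWriter) (payload : byte[]) =
    w.Write payload.Length   // record header: payload size
    w.Write payload          // the transaction itself, smaller than ASCII
    w.Flush()                // force it to disk before doing the work

// Phase 2: append a marker once the transaction's effects are complete.
// On restart, a trailing record with no marker gets rolled back and
// logged as failed; a marked record gets logged as complete.
let markComplete (w : BinaryWriter) =
    w.Write 0xC1uy           // arbitrary commit-marker byte
    w.Flush()
```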

If I got what you were asking correctly, then yes, the above-mentioned method is a good idea.

By on 9/9/2008 3:02 PM ()

Hi code8,

I don't think I understand completely. Do you mean to say that in addition to what's been outlined so far, you suggest that the host-side IO processing support transactions where possible to make the system even more reliable? If this is what you meant, yes, I think that's a good idea too.

As for your comparison to ASCII - if you're referring to my "1000 nulls" example, then yes, I certainly think that the details of the record/replay mechanism can change - I was just trying my best to focus on the core idea and leave the specifics of the implementation to the implementors.

Thanks!

By on 9/9/2008 3:38 PM ()

Hi JDK,

I'm a bit skeptical that it would be this easy, but then again, I'm by no means an expert.

Does this approach scale? What about repeated actions, or loops? Suppose I'm sending a thousand mails. A million. Wouldn't the log become very large? Wouldn't it take a very long time to "replay" it when you resume?

Can you still use non-IO bound local variables to store an intermediate result? I guess not, since you'd probably need that result to replay your state afterwards.

All this "automatic persistence" business is very tricky. What about open file handles? Sockets? When you replay, the outside world may rect very differently than when you actually did the action. Purity does not help you much since it's a theoretical notion: the world is not actually pure!

I'm not saying what you're trying is impossible, but I do think that, like you say, the OS is not made for this kind of stuff, and it seems to me that is going to leak through to your abstractions no matter what.

Maybe it would be a nice start if you tried to make a domain-specific language on top of WF that deals with the complexities you mention?

just my 2c!

Kurt

By on 9/8/2008 12:49 AM ()

Hi Kurt,

These are good questions.

I believe that this approach does scale to reasonable limits for the types of applications I have in mind. Repeated actions and loops DO work: since the log has been generated from an initial state put through deterministic transformations, the loop takes the same path each time, and the log will have those responses ready to replay. Right? By the same logic, non-IO-bound local <i>values</i> (and mutable <i>variables</i>) would also be replayed and available where needed because, again, those code-defined transformations and the states they came from are all deterministic. It's BECAUSE they'd be pure (by discipline) that we could guarantee that no different information is available upon replay.

And yes, sending 1000 emails, uploading 1000 documents to SharePoint, or assigning 1000 tasks will certainly result in a larger log (though in the case of the emails, maybe only 1000 bytes of nulls).

All IO operations would have to be limited, I'm guessing, to operations that are completely tied up by the workflow host before continuing. That is, no streams would be passed back to user code, only buffers and other fixed structures - arrays of the data read, for instance. The host would offer these IO operations (send an email, use a web service, assign a task, log some reporting data, etc.) and allow admins to add additional trusted IO operations. Also, the guest workflows can be composed to build up complex, reusable, and still pure workflows for things like requesting that the host send a certain format of email from a company-specific template.

Does that clear up some of the questions?

Thanks,
JDK
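To make the replay idea concrete, here's a minimal sketch (the names and shape are mine, purely illustrative) of a host that performs IO live on the first run and serves cached results on replay, so the pure guest code, loops included, takes the same path every time:

```fsharp
open System.Collections.Generic

// Hypothetical sketch: the host owns all IO and logs every result.
type ReplayHost() =
    let log = List<obj>()     // a real host would persist this to disk
    let mutable cursor = 0
    // Run an IO action live, or replay its logged result after a restart.
    member this.Io<'T>(action : unit -> 'T) : 'T =
        if cursor < log.Count then
            let r = unbox<'T> log.[cursor]   // replay: no real IO happens
            cursor <- cursor + 1
            r
        else
            let r = action ()                // live: do the IO, log the result
            log.Add(box r)
            cursor <- cursor + 1
            r
    // Rewind to the start of the log to simulate restarting the process.
    member this.Rewind() = cursor <- 0
```

Because the guest's own transformations are pure, re-running them against the same log reconstructs every local value, including loop state, without repeating any IO.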

By on 9/8/2008 3:38 AM ()

Also, I think the syntax and development of the monad could be simplified if the CTP's experimental custom bindings were explored further and added as part of the language. The custom bindings seem to pick up ANY word with '!' after it.

So instead of parsing out the intent of the request for the host from "let! SendEmailAsync ..." we could just say "SendEmailAsync! ...". This would be more natural for the user and less complex for the monad development. But I can't figure out how to even use this experimental feature - there's no documentation that I've found.

And SendEmailAsync and AssignTask may actually turn out to be higher-level pure functions which stitch together lower-level host-supported primitives.

I'm kinda rambling in this reply, but just wanted to make some notes.

By on 9/9/2008 11:16 AM ()

Hi JDK,

The remnants of the "experimental custom binding feature" in the CTP should be ignored and will be deleted in the next release. :-)

[ Aside: For those interested in this bit of archaeology, if you type

   foo! x = 1

then you get a warning related to this. However, the syntax is always treated as "let!". :-) ]

Thanks

Don

By on 9/9/2008 12:29 PM ()

Hi Don,

Thanks for that info.

But you're exactly the guy I'm interested in hearing from - What about the rest of it? Basically - adding a record and replay ability to the async workflow concept.

I was told about a similar project written in Haskell called HAppS, although its focus is on being a web server rather than a BPA engine.

Thanks!

By on 9/9/2008 12:42 PM ()

Hi JDK

It's certainly interesting, and looks feasible.

Note this line is not correct:

        if (Async.Run(result) = Approved) then

Alarm bells should ring if you _ever_ see Async.Run inside a workflow: this is a blocking operation. Instead, unblock the operation by binding:

        let! res = result
        if (res = Approved) then
Don

P.S. Another approach to BPA problems may be to start with a quotation of the workflow, and pre-process it to insert persistence at each binding point. That might allow you to lift the "nothing impure" restriction, and also to optionally translate the entire workflow to one of the BPA object models, or to choose to use other formats to persist the whole workflow. The FSharp.PowerPack.Linq.QuotationEvaluation quotation evaluator would come in useful here, though you should feel free to extend that if needed.
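As a rough illustration of the P.S. (the traversal below is mine and only collects the binding points rather than rewriting them), a quotation can be walked to find every `let`, which is where a preprocessor would splice in the persistence calls:

```fsharp
open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Quotations.Patterns

// Hypothetical sketch: list the names bound at each binding point of a
// quoted workflow. A real preprocessor would rebuild the expression here,
// wrapping each binding in a persistence operation instead.
let rec bindingPoints (e : Expr) : string list =
    match e with
    | Let(v, def, body) ->
        v.Name :: (bindingPoints def @ bindingPoints body)
    | ExprShape.ShapeLambda(_, body) -> bindingPoints body
    | ExprShape.ShapeCombination(_, args) -> List.collect bindingPoints args
    | ExprShape.ShapeVar _ -> []

// e.g. bindingPoints <@ let x = 1 in let y = x + 1 in y @>
```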

By on 9/9/2008 4:22 PM ()

BPA processes are similar to regular programs except that they often do very little computation compared to the length of time that they are considered running, sitting idle while still (and undesirably) consuming fixed resources as OS processes do. These programs may still perform arbitrary computation and interact with the outside world through one-way and two-way communication. Also, a BPA process may need to outlive the OS processes that represent it. BPA is difficult to model well with current technologies because OS processes aren't built for this type of long-running, process-unaware model.

So, a BPA process has the following properties:

- it's a long-running process
- it sits mostly idle, consuming fixed resources in an OS
- from time to time, it interacts with the outside world
- the process serializes its state to disk so that it can be reconstructed on demand whenever the OS process ends.
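As a toy illustration of the last property (the type and file layout are invented here), resuming is just ordinary serialization of whatever state the process needs:

```fsharp
open System.IO
open System.Runtime.Serialization.Formatters.Binary

// Hypothetical sketch: a BPA's resumable state, serialized to disk so the
// OS process can end and the business process can be reconstructed later.
type ProcessState = { Step : int; PendingEmails : string list }

let save (path : string) (state : ProcessState) =
    use s = File.Create path
    BinaryFormatter().Serialize(s, state)   // persist across OS processes

let load (path : string) : ProcessState =
    use s = File.OpenRead path
    BinaryFormatter().Deserialize(s) :?> ProcessState
```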

From my point of view, a messageboard already has those properties. You can think of a thread as the BPA, and the posts in the thread as the stateful information; posts are serialized to disk to persist across process contexts, and de-serialized back into memory whenever a user wants to read the thread. Threads are mostly idle, not interacting with the user too often. Threads persist across application contexts; they can be reconstructed after a server reboot or a power outage.

If we concede that a messageboard thread satisfies the definition of a BPA, then it turns out that writing process-unaware code is a common, mundane programming task in any language.

The BPA you describe, which sends emails to people, is fundamentally no different from the one I described -- as long as you save the state of your process to disk from time to time, you can in principle terminate and reconstruct the process on demand and across application contexts.

Please correct me if I misunderstood you :)

By on 9/6/2008 7:57 PM ()

I don't think you misunderstood me; I think you're being intentionally obtuse.

By on 9/6/2008 8:30 PM ()

I don't think you misunderstood me; I think you're being intentionally obtuse.

Not trying to be obtuse -- at least not intentionally. What you're trying to describe in your opening post just goes over my head (to me it doesn't sound any different from caching data to disk, but evidently you mean something more abstract), so I apologize if my reply wasn't helpful.

By on 9/7/2008 10:55 PM ()

Hi Juliet,

No problem. You're right that I'm looking to cache data to disk, but it's the specific kind of data that makes it a challenge.

As an OS process crawls along a program, stacks and heaps of data are formed in the process's memory space - the process is implicitly tracking the locations, sizes, and values of that data, but such information is not explicitly exposed. I'm making stuff up here, but a debug-mode program could maybe pause and store this info to disk; however, in order to bring it back again - probably at a different location in memory - I think the debugger would have to be really smart, going in and updating ALL the pointers to reflect the new locations. I don't know about that, though. I'm not sure whether the memory manager on the CPU does all the address arithmetic live, storing only relative values in raw memory, or which version of the addresses the debugger would see.

In any case, the amount of data to be saved is probably prohibitively large and is definitely difficult to analyze. Also, this assumes that we map one OS process to one business process instance, since two otherwise unrelated instances could not be practically separated - and that's not going to scale.

Rather than that nightmare, I propose that we decouple the business process from the OS process by considering the business process as a graph of pure transformations. If we restrict ourselves to writing code in this certain style, then it will be much easier to track all the critical pieces of information that would be needed to reconstruct a process's implicit state if the in-memory state were lost. Although the raw data in memory may be different upon reconstruction, logically, at our level of abstraction, the data is identical. Given any initial state and any set of deterministic, pure transformations, we arrive at the same outcome. Now, along the way, we may request that certain non-deterministic operations be performed on our behalf - the results of these IO operations are cached along with the initial state and are fed back to us during a replay operation. This should all work even if the set of transformations is not known at start-up but is generated along the way - that is, we don't know which path the process will take without running it, and each transformation tells the host which transformation to run next.
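One way to picture the claim above (the types are mine, purely illustrative, and this static version ignores the dynamic-path case the paragraph ends on): treat the process as a pure fold of transformation steps over a state, with the cached IO results fed in alongside. Folding twice over the same inputs necessarily lands in the same state, which is all "reconstruction" has to do.

```fsharp
// Hypothetical sketch: a step is a pure function of the current state
// and the next cached IO result; the process is a fold over the steps.
type Step<'S> = 'S -> string -> 'S

let replay (steps : Step<'S> list) (cachedIo : string list) (initial : 'S) =
    // Deterministic: same steps + same cached results = same final state,
    // no matter how many times we run it or where in memory it lives.
    List.fold2 (fun state step io -> step state io) initial steps cachedIo
```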

This is confusing, I'm sure. I don't even know what I'm talking about - I'm just trying to explain it as best I can and get someone who's better at this stuff to comment on why it's not the BPA miracle drug, show me where it already exists, or start developing a working host.

By on 9/9/2008 11:55 AM ()