Anything can be done :)

Are you talking about a file that's many hundreds of megabytes or more? If not, you may just as well read it into memory, modify it, and write it out. (Unless you're not talking about a modern workstation.)

If this is not a one-off task and it needs to be done quickly (at run time), then I'd do some research into memory-mapped files on Windows. (And use a 64-bit machine if you can.)

The next best thing would simply be to do the usual buffered (chunked) read/process/write of the file, possibly using async/interleaved I/O.
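For what it's worth, here is a minimal sketch of that buffered read/process/write loop. It is in Python rather than F# just to keep it short, and the function name, the 4 MiB chunk size, and the paths are my own assumptions, not anything from the original question. Because lower-casing ASCII is byte-by-byte and stateless, chunk boundaries don't matter.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB buffer; tune by measuring on your disks


def lowercase_copy(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Stream src to dst, lower-casing ASCII letters one chunk at a time.

    Returns the number of bytes written. The source file is never modified.
    """
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            # bytes.lower() only touches ASCII A-Z, other bytes pass through.
            dst.write(chunk.lower())
            total += len(chunk)
    return total
```

Writing to a separate destination keeps the original intact, so a failed run can simply be restarted.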

Actually modifying the file *in-place* sounds like a bad idea as it's destructive and can't be rolled back, and it will likely be slower as you'll be blowing the disk cache out of the water by modifying the same neighborhood you're reading. (And, of course, if you ever decide to insert/delete characters, you're hosed.)

If your source file and destination file are on different physical drives, you'll be much better off.

If this is a homework assignment, there are lots of choices :)

If you have a specific plan and are having trouble expressing it in F#, then this may be the group - otherwise you may want to try a general .NET forum.

By on 1/26/2010 11:04 AM ()

Okay, Whoa...

Google Reader just flagged this as a *new* thread, which is why I replied.
Sorry...

By on 1/26/2010 11:08 AM ()

Thanks for your reply. It was a new thread before you answered :-)
I was talking about files >100GB. I guess memory-mapped files are the right solution. Thankfully, .NET 4.0 supports them natively, without P/Invoke - looking at it now.

By on 1/26/2010 12:41 PM ()

Man, I must need coffee :) - the 2nd reply was because I quickly parsed your join date as the post's date -- it's weird that the former has such a prominent position in the UI. Anyway...

Since the standard buffered-copy approach is a no-brainer, you can try that first, before the MMF approach, and see what the timing is like. (As a back-of-the-envelope calculation, it seems it should take about three hours for 100GB?) I could be wrong about MMF being better in this case, as you'll have to keep re-mapping segments of the file on a 32-bit machine, and on a 64-bit machine it would be interesting to see how Windows write-behind and caching compare to manual buffer management. (It's been years since I've seen these issues addressed.) Please publish results.
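To make the "re-mapping segments" part concrete, here is a rough sketch of sliding a fixed-size mapped window over a file. It uses Python's `mmap` module rather than the .NET 4.0 API, and the function name and window size are assumptions of mine. Note that it edits the file in place, so it carries the rollback risk mentioned earlier and is best tried on a copy. Mapping offsets have to be multiples of `mmap.ALLOCATIONGRANULARITY`, which is why the window is sized as a multiple of it.

```python
import mmap
import os

# Assumed window size; any multiple of the allocation granularity keeps offsets aligned.
WINDOW = 64 * mmap.ALLOCATIONGRANULARITY


def lowercase_in_place(path, window=WINDOW):
    """Lower-case ASCII letters in place, mapping one window of the file at a time.

    `window` must be a multiple of mmap.ALLOCATIONGRANULARITY.
    Returns the file size in bytes.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        offset = 0
        while offset < size:
            length = min(window, size - offset)  # last window may be shorter
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_WRITE, offset=offset) as view:
                view[:] = view[:].lower()  # same length, so the slice assignment is valid
                view.flush()               # push the modified pages back to the file
            offset += length
    return size
```

On a 64-bit machine one could map much larger windows (or the whole file) and let the OS do the paging. Whether that beats the plain buffered copy is exactly the thing to measure.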

Of course, you could parallelize things as well - if there's truly no state involved, as you say.
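A hedged sketch of what that could look like: split the file into disjoint byte ranges and let each worker copy its own range into a pre-sized destination file. Again this is Python, and the function names, worker count, and chunk size are made up for illustration. On a single spinning disk the extra seeking may well make this slower than the sequential loop, so treat it as something to time rather than a recommendation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _lowercase_range(src_path, dst_path, start, length, chunk_size):
    """Copy bytes [start, start + length) to the same offsets in dst, lower-cased."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(start)
        dst.seek(start)
        remaining = length
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                break
            dst.write(chunk.lower())
            remaining -= len(chunk)
    return length - remaining  # bytes actually processed


def parallel_lowercase_copy(src_path, dst_path, workers=4, chunk_size=1 << 20):
    """Lower-case src into dst using `workers` threads over disjoint ranges."""
    size = os.path.getsize(src_path)
    # Pre-size the destination so every worker can seek to its own offset.
    with open(dst_path, "wb") as dst:
        dst.truncate(size)
    if size == 0:
        return 0
    part = -(-size // workers)  # ceiling division: bytes per worker
    ranges = [(start, min(part, size - start)) for start in range(0, size, part)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(_lowercase_range, src_path, dst_path, s, n, chunk_size)
                   for s, n in ranges]
        return sum(f.result() for f in futures)
```

This only works because the transformation is stateless and length-preserving: each output byte depends on exactly one input byte at the same offset.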

If you don't mind me asking, what kind of domain/task involves lower-casing > 100GB files? - that's a lot of typing with CAPSLOCK on :)

By on 1/26/2010 1:02 PM ()

Bioinformatics. ACGT->acgt

By on 1/27/2010 6:14 PM ()