Your input is a 200 MB byte array and your output is a corresponding string using two characters per input byte, so it takes 800 MB to store (a .NET char is UTF-16, i.e. 2 bytes). At some point both are in memory, so you need 1 GB. You also have to assume some memory will be used by intermediate structures during the execution of your algorithm, in between GC passes.
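To make the arithmetic concrete, here is a quick sketch of the memory accounting (illustrative Python; the thread is about .NET, where each string character occupies 2 bytes):

```python
# Rough memory accounting for hex-encoding a 200 MB buffer.
input_mb = 200
chars_per_byte = 2   # two hex digits per input byte
bytes_per_char = 2   # UTF-16 char, as in .NET strings

output_mb = input_mb * chars_per_byte * bytes_per_char  # size of the result string
peak_mb = input_mb + output_mb                          # both live at once

print(output_mb, peak_mb)  # 800 1000
```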

Solution 1: Buy more memory.

Solution 2: Tweak your algorithm so that it does not require the complete input to be in memory at once, then process it in chunks.
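A minimal sketch of the chunked approach in Python (the thread is about .NET, but the idea is identical): read a fixed-size block, encode it, write it out, and never hold the whole input or output at once. The file names and chunk size below are illustrative assumptions, not from the original post.

```python
def hex_encode_stream(src, dst, chunk_size=64 * 1024):
    """Hex-encode src into dst one chunk at a time.

    src and dst are binary-mode file-like objects.
    Peak memory is O(chunk_size), not O(file size).
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.hex().encode("ascii"))

# Usage (paths are hypothetical):
# with open("input.pdf", "rb") as src, open("out.hex", "wb") as dst:
#     hex_encode_stream(src, dst)
```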

Solution 3: Implement regular expressions over bytes, or rewrite your algorithm to avoid regular expressions entirely. The idea is to keep the 200 MB byte array as the only large structure permanently in memory. This assumes that using 200 MB is acceptable to you.
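As an illustration of "regular expressions over bytes": Python's `re` module can match byte patterns directly against a byte buffer, with no string conversion at all, which is exactly the effect the suggestion aims for (in .NET, `Regex` works only on strings, so you would need a byte-level matcher or a hand-rolled scanner). The buffer and pattern here are hypothetical stand-ins for the poster's PDF data.

```python
import re

# Illustrative buffer; a real one would be the 200 MB PDF read as bytes.
data = b"%PDF-1.4 ... stream ... endstream"

# A bytes pattern (note the rb prefix) yields byte offsets into the buffer,
# so the only large object in memory stays the byte array itself.
for m in re.finditer(rb"stream(.*?)endstream", data, re.DOTALL):
    print(m.start(), m.end())
```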

As an aside, this is not what you asked for, but there is plenty you can do to make the code run faster - not asymptotically faster, since your complexity is already linear, which is fine, but faster by a constant factor, which can be significant in practice. For example, try not to allocate the per-byte intermediate strings: use a StringWriter and Write("{0:x2}", x).
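The same constant-factor point, sketched in Python rather than .NET: formatting each byte individually creates one short-lived string object per input byte, while a single bulk conversion allocates the output once. The function names are made up for the example.

```python
# Naive: one intermediate string allocation per input byte.
def hex_naive(buf):
    return "".join("{:02x}".format(b) for b in buf)

# Bulk: one allocation for the whole result (bytes.hex runs at C level).
def hex_bulk(buf):
    return buf.hex()

data = bytes(range(256)) * 1000  # 256 KB sample buffer

# Both produce the same output; the bulk version simply avoids the
# per-byte garbage that the naive version generates.
assert hex_naive(data) == hex_bulk(data)
```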

By on 5/29/2012 9:31 AM ()

Thanks for your input!

I realised that I clearly was not addressing the real problem, which was trying to analyse a 200 MB input as one string.

I changed my algorithm to use the PDF file structure to split the file into the parts I needed, and now I'm getting decent performance and resource usage.

By on 5/29/2012 6:16 PM ()
IntelliFactory Offices Copyright (c) 2011-2012 IntelliFactory. All rights reserved.