Well, matching the full text with a regex should definitely work, but you need to get the regex right of course (which at the moment it isn't). Depending on the situation it might be more appropriate to go through the text in chunks (e.g. lines), but for most HTML-related tasks there is no need to bother.
Code could look something like this:
Regex.Matches("input", "[aeiouy]")
|> Seq.cast<Match>
|> Seq.map (fun m -> m.Value)
|> List.of_seq;;
val it : string list = ["i"; "u"]
Maybe with this regex?
let linksFromHtml html =
    let regex = "http://www.nba.com/games/\\d{8}/([^/]+)/gameinfo.html"
    Regex.Matches(html, regex)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value, m.Value)
    |> List.of_seq
BTW: it's better to label regex groups explicitly, but I still don't get how to put the < and > in my code without having them disappear
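For anyone who wants to try the idea without hitting the network, here is a minimal sketch. The `sampleHtml` string is made up for illustration and only stands in for the downloaded page; the pattern is the one above, written as a verbatim string with the dots escaped. It assumes a current F#, where `List.of_seq` is spelled `List.ofSeq`.

```fsharp
open System.Text.RegularExpressions

// Hypothetical sample standing in for the downloaded scoreboard page.
let sampleHtml =
    "<a href=\"http://www.nba.com/games/20090125/DALBOS/gameinfo.html\">Recap</a>" +
    "<a href=\"http://www.nba.com/games/20090125/SASLAL/gameinfo.html\">Recap</a>"

// Same pattern as above; the verbatim string (@"...") avoids doubling the backslash.
let gameLink = @"http://www\.nba\.com/games/\d{8}/([^/]+)/gameinfo\.html"

// Returns one (teams, full link) pair per match found in the text.
let linksFromHtml (html: string) =
    Regex.Matches(html, gameLink)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value, m.Value)
    |> List.ofSeq

let links = linksFromHtml sampleHtml
```

The whole page is treated as one string, so there is no need to loop over lines by hand.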
deleted ... missed your Seq.Cast
Hi all!
Sorry I haven't answered earlier, I have been QUITE busy the past weeks.
Going back to the topic... I still can't get it. If you remember, the goal was to get the games listed on the NBA webpage, which each day has a URL like this: [link:www.nba.com]
The code so far is this:
#light
open System.IO
open System.Net
open System.Text.RegularExpressions
let mydate = "19/02/2009"
/// Function to transform a date given as "dd/mm/yyyy" to the one required by the NBA webpage as "yyyymmdd"
let dateTransform date = date |> String.split ['/'] |> List.rev |> List.reduce_left (+)
/// Transform the given date to the NBA webpage format
let dateNBA = mydate |> dateTransform
/// Compose exact URL
let myUrl = "http://www.nba.com/games/" + dateNBA + "/scoreboard.html"
/// Get the contents of the URL via a web request
let http (url: string) =
    let req = System.Net.WebRequest.Create(url)
    let resp = req.GetResponse()
    let stream = resp.GetResponseStream()
    let reader = new StreamReader(stream)
    let html = reader.ReadToEnd()
    resp.Close()
    html
//The actual HTML content
let webpage = myUrl |> http
//Extraction of matches for the date
let linksFromHtml webpage =
    let regex = "http://www.nba.com/games/\\d{8}/([^/]+)/gameinfo.html"
    Regex.Matches(webpage, regex)
    |> Seq.cast &lt; Match &gt;
    |> Seq.map (fun m -> m.Groups.[1].Value, m.Value)
    |> List.of_seq
The part that's failing is the last one, which you kindly tried to help me with. I've gone through the entire chapter on regular expressions in the Expert F# book but haven't been able to fix that last function.
As it is now, it claims that Regex.Matches(webpage, regex) should have type 'unit but has type 'bool.
I don't know where the error is, as MSDN claims that Regex.Matches "Searches an input string for all occurrences of a regular expression and returns all the successful matches" .
That should return "The MatchCollection of Match objects found by the search", as they say, which really doesn't help me too much.
Any hint on this?
Thank you for all the effort you put into helping people with their first steps into this :).
Cheers!
Strange, the exact same code works fine for me. What version of F# and the .NET Framework are you using?
Also, for each of the following two functions
let test = Regex.Matches("test", "test") |> Seq.cast<Match>
let test2 = Regex.Matches("test", "test")
What type does VS tell you that test and test2 are? It looks to me like for some reason it's thinking the function has a different type than it does?
Oh and one more thing, I assume the &lt; &gt; are in your code because you didn't wrap it in code tags and it translated your symbols. But just in case you actually copied it out of brilsmurf's post and didn't replace them with the correct symbols, they should actually be the less than and greater than symbols, like I put in mine above. Hopefully that's not too obvious lol, but I'm not sure why else you'd get such a strange error.
Edit: OK I noticed something very weird. If I replace the less than and greater than symbols in my code (that works) with &lt; and &gt; and hover over the variable I assigned the result to, it says that it has type bool, even though it gives me a syntax error (albeit a different syntax error than you are getting). If I change my working code slightly from this:
let result =
    Regex.Matches("test", "test")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value, m.Value)
    |> List.of_seq
to this:
let result =
    Regex.Matches("test", "test")
    |> Seq.cast <Match>
    |> Seq.map (fun m -> m.Groups.[1].Value, m.Value)
    |> List.of_seq
it still fails but now it tells me that result has type 'unit'. Note that the only difference between these two code fragments is the presence of a single whitespace character between Seq.cast and the type annotation. So, try it exactly verbatim to the one that I verified works (the one without the whitespace character) and see what happens.
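A small sketch of the two spellings that should be safe, assuming a current F# (hence `List.ofSeq`). As far as I understand it, explicit type arguments have to touch the function name, and with a space in between the `<` gets read as the less-than operator, which would explain the odd bool/unit messages. The second variant avoids angle brackets entirely by annotating the lambda parameter instead.

```fsharp
open System.Text.RegularExpressions

// Variant 1: type argument directly attached to Seq.cast (no whitespace).
let viaTypeArgument =
    Regex.Matches("input", "[aeiouy]")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

// Variant 2: no angle brackets at all; the annotation on m tells
// the compiler which element type Seq.cast should produce.
let viaAnnotation =
    Regex.Matches("input", "[aeiouy]")
    |> Seq.cast
    |> Seq.map (fun (m: Match) -> m.Value)
    |> List.ofSeq
```

Both values hold the same list of matched vowels.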
Fortunately or unfortunately for you, I am that dumb... and I did put &lt; and &gt; instead of their corresponding symbols. As these are my first steps I just thought they had a purpose I didn't understand, as I haven't nearly finished the book and this is my first program in F#... sorry.
Also, sorry for the code tags... I didn't see any icon for them and thought they weren't allowed. I'll put them from now on.
On the matter... it all seems to be correct now... there's no syntactic mistake, but I still can't get the list of matches. It might be a problem with the regular expression or something, as I get an empty list.
If anyone wants to try, the code right now is like this:
#light
open System.IO
open System.Net
open System.Text.RegularExpressions

let mydate = "23/02/2009"

/// Function to transform a date given as "dd/mm/yyyy" to the one required by the NBA webpage as "yyyymmdd"
let dateTransform date =
    date
    |> String.split ['/']
    |> List.rev
    |> List.reduce_left (+)

/// Transform the given date to the NBA webpage format
let dateNBA = mydate |> dateTransform

/// Compose exact URL
let myUrl = "http://www.nba.com/games/" + dateNBA + "/scoreboard.html"

/// Get the contents of the URL via a web request
let http (url: string) =
    let req = System.Net.WebRequest.Create(url)
    let resp = req.GetResponse()
    let stream = resp.GetResponseStream()
    let reader = new StreamReader(stream)
    let html = reader.ReadToEnd()
    resp.Close()
    html

//The actual HTML content
let webpage = myUrl |> http

let re = "http://www.nba.com/games/\\d{8}/([^/]+)/gameinfo.html"

let result =
    Regex.Matches(webpage, re)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value, m.Value)
    |> List.of_seq

let games = result

let showGames games =
    games |> List.iter (fun x -> printf "%s" x)
Thank you for all the help! Appreciate it.
Cheers!
Heh, no problem. Just for reference, the code tags need an attribute of language="F#". Like this:
<code lang=fsharp>
..code goes here
</code>
As for the regex, it's been a while since I've done anything with regexes, but the \\d might be the problem. \d is a digit already, I'm not sure what the extra slash would do. What if you change it to:
<code lang=fsharp>
let re = "http://www.nba.com/games/[^/]*/[^/]*/gameinfo.html"
</code>
If that works, then you know the problem is with the digit specification.
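On the doubled backslash: in an ordinary F# string literal `\\` is just the escape for a single backslash, so `"\\d"` hands the regex engine `\d`. A quick sketch to check this (the input string is made up for illustration):

```fsharp
open System.Text.RegularExpressions

let escaped = "\\d{8}"     // regular string: \\ becomes one backslash
let verbatim = @"\d{8}"    // verbatim string: the backslash is taken literally

// Both literals denote the same six characters: \d{8}
let samePattern = (escaped = verbatim)

// And the pattern matches an eight-digit date segment.
let matchesDate = Regex.IsMatch("games/20090223/", escaped)
```

So the digit specification by itself should not be what makes the list come back empty.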
I don't know what I'm doing wrong but it seems I just can't get this right hehe!
With the regex you put the result is the same. Even if I do it with something like this
let re = "http://www.nba.com/games/*"
The final list is empty... I guess the problem is elsewhere... :).
Thanks for everything!
Ahh, I see. For starters, you need to use "captures". By default a match gives you exactly one group, and that is the entire matched string, in this case the URL. If I run the code with the URL given in the beginning, I get the list
[("", "[link:www.nba.com]
as the result. If you want to extract the date and the teams, you need to mark the parts you want to capture with parentheses in the regular expression.
let re = "http://www.nba.com/games/([^/]*)/([^/]*)/gameinfo.html"
This should save the values 20090125 and DALBOS. I don't remember how you access them, if it's through the "Groups" or "Captures", I remember it was unintuitive last time I had to do it, but hopefully that gets you on the right track.
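For reference, a short sketch of reading the parenthesised parts back out. In .NET they are reached through `Groups` on each `Match`: index 0 is the whole match and the numbered groups follow in order of their opening parenthesis. The input URL below is only an illustrative sample.

```fsharp
open System.Text.RegularExpressions

let re = "http://www.nba.com/games/([^/]*)/([^/]*)/gameinfo.html"

// Illustrative input; in the real program this would be the page text.
let m = Regex.Match("http://www.nba.com/games/20090125/DALBOS/gameinfo.html", re)

let whole = m.Groups.[0].Value   // the entire matched URL
let date  = m.Groups.[1].Value   // first parenthesised group
let teams = m.Groups.[2].Value   // second parenthesised group
```

`Captures` only becomes interesting when a single group can match several times inside one match, which is not the case here.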
That's definitely helpful.
Just yesterday I had a deep look at Groups, etc in the regex and some more code other than the book's on how to manage them.
I'll try again when I come at night.
Thanks for all the interest you've put in this.
Cheers!
I finally did it... at last!
It all comes down to a decision by nba.com to change the way they write the links, hehe!
Before, they were written as "www.nba.com/games/20090302/NOHPHI/gameinfo.html", while now each link has been shortened to "games/20090302/NOHPHI/gameinfo.html". That's why the regex match was giving me an empty result.
Code now looks like this
#light
open System.IO
open System.Net
open System.Text.RegularExpressions

let mydate = "02/03/2009"

/// Function to transform a date given as "dd/mm/yyyy" to the one required by the NBA webpage as "yyyymmdd"
let dateTransform date =
    date
    |> String.split ['/']
    |> List.rev
    |> List.reduce_left (+)

/// Transform the given date to the NBA webpage format
let dateNBA = mydate |> dateTransform

/// Compose exact URL
let myUrl = "http://www.nba.com/games/" + dateNBA + "/scoreboard.html"

/// Get the contents of the URL via a web request
let http (url: string) =
    let req = System.Net.WebRequest.Create(url)
    let resp = req.GetResponse()
    let stream = resp.GetResponseStream()
    let reader = new StreamReader(stream)
    let html = reader.ReadToEnd()
    resp.Close()
    html

//The actual HTML content
let webpage = myUrl |> http

/// Regular expression to match the date and teams involved
/// We take as Group "date" the numbers after the first slash "/"
/// We take as Group "teams" the rest until the next slash "/"
let re = @"games/(?<date>[0-9]+)/(?<teams>\w+)"

let result =
    Regex.Matches(webpage, re)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.Item("date").Value, m.Groups.Item("teams").Value, m.Value)
    |> List.of_seq
> result;;
val it : (string * string * string) list
= [("20090302", "NOHPHI", "games/20090302/NOHPHI");
("20090302", "ATLWAS", "games/20090302/ATLWAS");
("20090302", "CLEMIA", "games/20090302/CLEMIA");
("20090302", "DALOKC", "games/20090302/DALOKC");
("20090302", "SASLAC", "games/20090302/SASLAC")]
It could be better... I suppose. For a start, I think it would be good to separate the teams' initials directly in the regex... but I will leave it as is until I get my copy of "Mastering regular expressions" ;).
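One possible way to split the initials inside the regex, as a sketch only. It assumes every team code is exactly three upper-case letters, which matches the output above but is not something the thread confirms. The group names `first` and `second` and the sample input are made up for illustration, and `List.ofSeq` assumes a current F#.

```fsharp
open System.Text.RegularExpressions

// Assumption: each team code is exactly three upper-case letters.
let re = @"games/(?<date>[0-9]+)/(?<first>[A-Z]{3})(?<second>[A-Z]{3})"

// Returns (date, first team, second team) for every game link found.
let splitTeams (html: string) =
    Regex.Matches(html, re)
    |> Seq.cast<Match>
    |> Seq.map (fun m ->
        m.Groups.["date"].Value, m.Groups.["first"].Value, m.Groups.["second"].Value)
    |> List.ofSeq

// Hypothetical fragment in the shortened link format described above.
let sample = splitTeams "href=\"games/20090302/NOHPHI/gameinfo.html\""
```

If a team code ever had a different length, the two fixed-width groups would need to be replaced by something smarter, such as a lookup of known codes.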
Thanks, thanks, thanks a lot for all the help and interest put in solving my problem.
Cheers!
Could it be related to the fact that, according to MSDN, MatchCollection "Represents the set of successful matches found by iteratively applying a regular expression pattern to the input string."
[SerializableAttribute]
public class MatchCollection : ICollection, IEnumerable
I'm new to .NET and F#, so I don't really know whether there's a problem treating an ICollection and IEnumerable object as a sequence. Where can I look up that kind of thing so I don't bother you all more than necessary?
Thanks!
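On the ICollection/IEnumerable question, a small sketch of how the pieces fit, as far as I can tell: `MatchCollection` implements the non-generic `IEnumerable`, and `Seq.cast` exists to turn such a collection into a typed `seq<Match>`, after which the usual `Seq` functions apply. The input string is made up for illustration.

```fsharp
open System.Collections
open System.Text.RegularExpressions

let matches = Regex.Matches("a1b22c333", @"\d+")

// MatchCollection can be viewed as a plain, untyped IEnumerable...
let untyped = matches :> IEnumerable

// ...and Seq.cast turns that into a typed F# sequence.
let typed : seq<Match> = Seq.cast<Match> untyped

let values = typed |> Seq.map (fun m -> m.Value) |> List.ofSeq

// The ICollection side is still available on the original object.
let count = matches.Count
```

So there is no problem in principle; the cast step is only needed because the collection does not say what element type it holds.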
Hi again!
I hope I'm not asking too many questions...
The problem now is this: I want to look through an HTML page whose source I get with the "http" function. Every link (href) that I want to save has the form of e.g. "[link:www.nba.com] (I guess the regex would be "/games/20090125/*/gameinfo.html"). I should create a list with the form ("DALBOS","[link:www.nba.com] of all the links in the page that match the regex I put above. I really don't know how to go through the whole document and save every match the regex finds (maybe convert the full text to a huge string and then filter it... but I tried that one already).
I'm not asking for the code, but I've tried several different approaches without success. So I'm asking if someone could just give me some guidelines on the best way to do this, in terms of code flow, structures, etc.
I've been googling for a while, so if anyone knows of a good article explaining similar cases with F#, regex, and text, I'd be very glad.
Thanks a lot!