Question : Fetch special characters like "Ñ" and absolute URL from href attribute of anchors

Hi there

I used the code below to fetch the source of a web page (assigned to url2 in the code) but ran into two painful problems.

1. Odd or missing characters, e.g. the name "ALBARIÑO ORGANISTRUN" becomes "ALBARIÿ ORGANISTRUN" when displayed in Notepad++, or "ALBARI ORGANISTRUN" when displayed in Notepad. What I want is "ALBARIÑO ORGANISTRUN".

2. Relative URLs, e.g. the href value of the anchor "Blancos" is "prodtype.asp?PT_ID=107&numRecordPosition=1&strPageHistory=cat&strKeywords=&strSearchCriteria=", but what I really want is its absolute address, like "http://www.elcatavinos.com/tienda/prodtype.asp?PT_ID=107&numRecordPosition=1&strPageHistory=cat&strKeywords=&strSearchCriteria=".

Can anyone help me sort these out?

Thanks in advance!

Jason

__________________________________________________________
Code I used:

dim url1
dim url2
dim xmlhttp
dim datafile
dim FS
dim dataFileTs
dim i
dim cookie

url1 = "http://www.elcatavinos.com/tienda/store/dynamicIndex.asp?sm=b1"
url2 = "http://www.elcatavinos.com/tienda/product.asp?numRecordPosition=5&P_ID=25473&strPageHistory=cat&strKeywords=&SearchFor=&PT_ID=107"
datafile = "c:\temp\test.dat"

set FS = Wscript.CreateObject("Scripting.FileSystemObject")
set datafileTs = FS.CreateTextFile(datafile, True, True)
set xmlHTTP = Wscript.CreateObject("MSXML2.XMLHTTP.3.0")
xmlHTTP.Open "HEAD",url1, false
xmlHTTP.Send
i = 0
do until xmlHTTP.readyState = 4            
      Wscript.Sleep 100
      i = i + 1
      if i > 1000 then exit do
Loop
cookie = xmlhttp.getResponseHeader("set-cookie")

xmlHTTP.Open "GET",url2, false
' cookies go back to the server in the "Cookie" request header
xmlHTTP.SetRequestHeader "Cookie",cookie
xmlHTTP.SetRequestHeader "Content-Type","text/html; charset=iso-8859-1"
xmlHTTP.SetRequestHeader "Content-Location","absoluteURI"
xmlHTTP.Send
i = 0
do until xmlHTTP.readyState = 4            
      Wscript.Sleep 100
      i = i + 1
      if i > 1000 then exit do
Loop
datafileTs.Writeline xmlhttp.responseText
datafileTs.Close
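
A possible cause of problem 1: responseText decodes the body as UTF-8 unless the response carries a Unicode byte-order mark, so a byte such as the ISO-8859-1 "Ñ" is not a valid sequence and gets mangled. A minimal sketch of a commonly used workaround is to read the raw bytes from responseBody and decode them with ADODB.Stream. BytesToString is a hypothetical helper name, and "iso-8859-1" is an assumption taken from the charset the script above already guesses in its Content-Type header.

```vbscript
Option Explicit

' Decode raw response bytes with an explicit charset via ADODB.Stream.
' BytesToString is a hypothetical helper name, not part of MSXML.
Function BytesToString(bytes, charsetName)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1              ' adTypeBinary: accept the raw bytes
    stream.Open
    stream.Write bytes
    stream.Position = 0          ' rewind before switching the stream type
    stream.Type = 2              ' adTypeText: read the same bytes back as text
    stream.Charset = charsetName
    BytesToString = stream.ReadText
    stream.Close
End Function

Dim http, fs, outFile, html
Set http = CreateObject("MSXML2.XMLHTTP.3.0")
http.Open "GET", "http://www.elcatavinos.com/tienda/product.asp?numRecordPosition=5&P_ID=25473&strPageHistory=cat&strKeywords=&SearchFor=&PT_ID=107", False
http.Send

' Assumption: the page is ISO-8859-1; adjust if the server reports another charset.
html = BytesToString(http.responseBody, "iso-8859-1")

Set fs = CreateObject("Scripting.FileSystemObject")
' CreateTextFile(path, overwrite, unicode): write the decoded text as Unicode
Set outFile = fs.CreateTextFile("c:\temp\test.dat", True, True)
outFile.Write html
outFile.Close
```

Because the file is still written as Unicode, both Notepad and Notepad++ should show the decoded characters, provided the charset guess matches what the server actually sends.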





Answer : Fetch special characters like "Ñ" and absolute URL from href attribute of anchors

> .. href attribute of every anchor should already contain its domain ..
As I already explained in http:#22674364, the client cannot enforce this; the server needs to do it.
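
The server decides what text appears in each href, so the downloaded HTML keeps its relative values. What the script can do is resolve each value against the URL of the page after downloading it. The sketch below is a simplified illustration, not a full URL resolver: ResolveUrl is a hypothetical helper that handles only absolute, root-relative ("/x") and document-relative ("x") references, and does not handle "../" segments, "//host" forms, or query-only references.

```vbscript
Option Explicit

' Hypothetical helper: resolve an href against the URL of the page it came from.
' Simplified on purpose; see the limitations listed in the text above.
Function ResolveUrl(baseUrl, href)
    Dim re, hostStart, hostEnd, origin, path, cut

    ' An href that starts with a scheme (http:, mailto:, ...) is left unchanged.
    Set re = New RegExp
    re.Pattern = "^[a-z][a-z0-9+.\-]*:"
    re.IgnoreCase = True
    If re.Test(href) Then
        ResolveUrl = href
        Exit Function
    End If

    ' Split the base URL into origin ("http://host") and path ("/dir/page.asp?...").
    hostStart = InStr(baseUrl, "://") + 3
    hostEnd = InStr(hostStart, baseUrl, "/")
    If hostEnd = 0 Then
        origin = baseUrl
        path = "/"
    Else
        origin = Left(baseUrl, hostEnd - 1)
        path = Mid(baseUrl, hostEnd)
    End If

    ' Drop the query string and fragment of the base before taking its directory.
    cut = InStr(path, "?")
    If cut > 0 Then path = Left(path, cut - 1)
    cut = InStr(path, "#")
    If cut > 0 Then path = Left(path, cut - 1)

    If Left(href, 1) = "/" Then
        ' Root-relative: append to the origin.
        ResolveUrl = origin & href
    Else
        ' Document-relative: append to the directory of the base page.
        ResolveUrl = origin & Left(path, InStrRev(path, "/")) & href
    End If
End Function

' Prints: http://www.elcatavinos.com/tienda/prodtype.asp?PT_ID=107
WScript.Echo ResolveUrl("http://www.elcatavinos.com/tienda/product.asp?P_ID=25473", "prodtype.asp?PT_ID=107")
```

Each href extracted from the page would be passed through this function together with the URL the page was fetched from (url2 in the question).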