login - Follow a page redirect using rvest in R


I am new to R and rvest. I am trying to use these to get information from a website (www.medicinescomplete.com) that allows sign-in using the Athens academic login system. In a browser, when you click on the Athens login button, it transfers you to an Athens login form. After you submit the user credentials, the form redirects the browser back to the original site, but logged in.

I used the submit_form() function to submit my credentials to the Athens form, and this returns a 200 code. However, R does not follow the redirect as the browser would, and if I use the jump_to() command to return to the original site, it is not logged in. I suspect that the redirected link returned by the sign-in page might contain the login credentials I need, but I do not know how to find this link and send it using rvest.

Has anyone worked out how to log in via Athens using rvest, or does anyone have an idea how to make it follow the automatic redirect?

The code I have used so far (login credentials changed):

library(rvest)
library(magrittr)

url <- "https://www.medicinescomplete.com/about/"
mcsession <- html_session(url)
mcsession <- jump_to(mcsession, "/mc/athens.htm?uri=https%3a%2f%2fwww.medicinescomplete.com%2fabout%2f")
athensform <- html_form(mcsession)[[1]]
athensform <- set_values(athensform, ath_uname = "xxx", ath_passwd = "yyy")
submit_form(mcsession, athensform)
jump_to(mcsession, "https://www.medicinescomplete.com/mc/bnf/current/")

I get a 200 code from the submit_form() step, but a 403 Forbidden code from the jump_to() on the last line.

I piped the submit_form() step into html() and printed it. As far as I can make out, the login is successful, and in the body of the main page there is a line referring to redirecting to the original site. The HTML of the whole page is too long to post, but the relevant bit seems to be:

<div style="padding: 8px;" id="logindiv">                         <form method="post" action="https://www.medicinescomplete.com/mc/athens">                             please wait while transfer you. <br><noscript>javascript disabled, please<input type="submit" value="click here" style="border:none;background:none;text-decoration:underline;color:#e27b2f;"> 

And I wonder if the following bit refers to a login key:

<input type="hidden" name="target" value="https://www.medicinescomplete.com/about/" style="display:none"><input type="hidden" name="relaystate" value="https://www.medicinescomplete.com/about/" style="display:none"><input type="hidden" name="samlresponse" value="pfjlc3bvbnnlihhtbg5zpsj1cm46b2fzaxm6bmftzxm6dgm6u0fntdoylja6chjvdg9jb2wiihhtbg5zonnhbwwypsj1cm46b2fzaxm6bmftzxm6dgm6u0fntdoylja6yxnzzxj0aw9uiibezxn... 

Aha! Further down the page there is this:

<script> window.onload = function() { document.forms[0].submit(); } </script> 

I think this window.onload handler is meant to automatically submit the form, which performs a POST to the original medicinescomplete.com site to authenticate, using the hidden field as the login credential. However, on trying to use submit_form() on this page I don't seem to get any further! I have added the following line to try to work out what is going on:

> submit_form(mcsession, athensform) %>% html_form() %>% str() 

And this gives the following output:

Submitting with 'submit'
List of 1
 $ :List of 5
  ..$ name   : chr "<unnamed>"
  ..$ method : chr "post"
  ..$ url    : chr "https://www.medicinescomplete.com/mc/athens"
  ..$ enctype: chr "form"
  ..$ fields :List of 4
  .. ..$ NULL        :List of 7
  .. .. ..$ name    : NULL
  .. .. ..$ type    : chr "submit"
  .. .. ..$ value   : chr "click here"
  .. .. ..$ checked : NULL
  .. .. ..$ disabled: NULL
  .. .. ..$ readonly: NULL
  .. .. ..$ required: logi FALSE
  .. .. ..- attr(*, "class")= chr "input"
  .. ..$ target      :List of 7
  .. .. ..$ name    : chr "target"
  .. .. ..$ type    : chr "hidden"
  .. .. ..$ value   : chr "https://www.medicinescomplete.com/about/"
  .. .. ..$ checked : NULL
  .. .. ..$ disabled: NULL
  .. .. ..$ readonly: NULL
  .. .. ..$ required: logi FALSE
  .. .. ..- attr(*, "class")= chr "input"
  .. ..$ relaystate  :List of 7
  .. .. ..$ name    : chr "relaystate"
  .. .. ..$ type    : chr "hidden"
  .. .. ..$ value   : chr "https://www.medicinescomplete.com/about/"
  .. .. ..$ checked : NULL
  .. .. ..$ disabled: NULL
  .. .. ..$ readonly: NULL
  .. .. ..$ required: logi FALSE
  .. .. ..- attr(*, "class")= chr "input"
  .. ..$ samlresponse:List of 7
  .. .. ..$ name    : chr "samlresponse"
  .. .. ..$ type    : chr "hidden"
  .. .. ..$ value   : chr "pfjlc3bvbnnlihhtbg5zpsj1cm46b2fzaxm6bmftzxm6dgm6u0fntdoylja6chjvdg9jb2wiihhtbg5zonnhbwwypsj1cm46b2fzaxm6bmftzxm6dgm6u0fntdoylja"| __truncated__
  .. .. ..$ checked : NULL
  .. .. ..$ disabled: NULL
  .. .. ..$ readonly: NULL
  .. .. ..$ required: logi FALSE
  .. .. ..- attr(*, "class")= chr "input"
  .. ..- attr(*, "class")= chr "fields"
  ..- attr(*, "class")= chr "form"

I feel that the information in this form should allow me to log in to the original site, but I don't quite understand how! Unfortunately, when I try the submit_form() function again on this form, it doesn't seem to work. I tried this:

submit_form(mcsession, athensform) %>% html_form() %>% submit_form(mcsession, .) %>% html() 

And got this:

Submitting with 'submit'
Submitting with ''
Error in if (!(submit %in% names(submits))) { :
  argument is of length zero

I think it's tied to this issue, which prevents httr from issuing a correct GET query on redirect.

It is a little hard to guess, though, because you're missing a reproducible example or the complete verbose output of the query.

A workaround would be to prevent the redirect with:

rvest::submit_form(..., httr::config(followlocation = FALSE))
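Separately, the relay page shown in the question is just a form that the browser's script submits automatically, so the same POST could in principle be assembled by hand. Below is a minimal sketch of that idea using xml2 rather than rvest's form helpers. It assumes the hidden inputs sit inside the first form on the page; `relay_fields` is a hypothetical helper name, and the `page` string is a shortened stand-in with a placeholder value, not the real response.

```r
library(xml2)

# Hypothetical helper: pull the action URL and the hidden name/value pairs
# out of the first <form> of an auto-submitting relay page.
relay_fields <- function(page_html) {
  doc <- read_html(page_html)
  form <- xml_find_first(doc, "//form")
  hidden <- xml_find_all(form, ".//input[@type='hidden']")
  fields <- as.list(xml_attr(hidden, "value"))
  names(fields) <- xml_attr(hidden, "name")
  list(action = xml_attr(form, "action"), fields = fields)
}

# Shortened stand-in for the page returned by the Athens login step
# (the samlresponse value is a placeholder, not real data).
page <- '<form method="post" action="https://www.medicinescomplete.com/mc/athens">
  <noscript><input type="submit" value="click here"></noscript>
  <input type="hidden" name="target" value="https://www.medicinescomplete.com/about/">
  <input type="hidden" name="relaystate" value="https://www.medicinescomplete.com/about/">
  <input type="hidden" name="samlresponse" value="placeholder">
</form>'

relay <- relay_fields(page)
str(relay)
```

With the fields in hand, something like `httr::POST(relay$action, body = relay$fields, encode = "form")` would send the request that the script triggers in a browser. Whether the site also expects the cookies held by the rvest session is not something this sketch addresses, so treat it as a starting point for debugging and not as a confirmed fix.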
