php - How to parse the data obtained by using cURL to get a DL?

March 15, 2015

I want to show a list of journals and their abbreviations, like this:

journal name, abbreviation

I am getting the data I need from http://images.webofknowledge.com/wok46/help/wos/d_abrvjt.html by running the following:

// set options
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'http://images.webofknowledge.com/wok46/help/wos/a_abrvjt.html'
));
$result = curl_exec($curl);
curl_close($curl);
$data = json_decode($result, true);
//!end function, make_call

But it shows me the whole page. As I said, I only need the names of the journals (dt) and the abbreviations (dd). How can I parse the result?
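
A likely reason the whole page is displayed: when CURLOPT_RETURNTRANSFER is not set, curl_exec() writes the response straight to the output and returns true instead of the body. The page is also HTML rather than JSON, so json_decode() cannot turn it into an array. A minimal sketch of fetching the page as a string follows; the helper name fetch_html is hypothetical and not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): fetch a page as a string.
function fetch_html(string $url): ?string
{
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, if any
    ));
    $result = curl_exec($curl);

    // curl_exec() returns false on failure; report that as null.
    return is_string($result) ? $result : null;
}

// HTML is not valid JSON, so json_decode() yields null for it.
var_dump(json_decode('<dl><dt>a<dd>b</dl>', true)); // NULL
```

Calling fetch_html() with the page URL would then give an HTML string that still has to be parsed as HTML, not decoded as JSON.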

HTML DOM parsing via the Simple HTML DOM scraping method ...

<?php
function scraper($file, $cnt = null) {
    /*
      @param $file, url or path/file
      @param $cnt, (number of results to list) empty for all, or a number
    */
    require_once('path/to/simple_html_dom.php');
    //set_time_limit(0); // uncomment for large files
    $result = array();

    // create dom from url
    $html = file_get_html($file);
    if ($html) {
        if (empty($cnt)) { $cnt = count($html->find('dt')); }

        foreach ($html->find('dl') as $dl) {
            for ($i = 0; $i < $cnt; $i++) {
                $dtNode = $dl->find('dt', $i);
                $ddNode = $dl->find('dd', $i);
                if (!$dtNode || !$ddNode) { break; } // no more pairs in this dl
                $result[] = array(trim($dtNode->plaintext) => trim($ddNode->plaintext));
            }
        }
    }

    return $result;
}

$array = scraper('http://somesite.com/page.html');
print_r($array);
?>

Example output ...

Array
(
    [0] => Array
        (
            [d h lawrence review] => d h lawrence rev
        )

    [1] => Array
        (
            [d-d excitations in transition-metal oxides] => springer tr mod phys
        )

    [2] => Array
        (
            [dados-revista de ciencias sociais] => dados-rev cienc soc
        )

    [3] => Array
        (
            [daedalus] => daedalus
        )

    [4] => Array
        (
            [daedalus] => daedalus-us
        )

    [5] => Array
        (
            [daghestan , world of islam] => suomal tied toim sar
        )

)

Updated example specific to user350082's issue ...

In the definition list, the dt and dd tags are not closed, which results in the dd text being included in the find('dt') result.

<dt>d h lawrence review<b><dd>  d h lawrence rev</b>
<dt>d-d excitations in transition-metal oxides<b><dd>   springer tr mod phys</b>
etc. etc. etc.

Updated function ...

function scraper($file, $cnt = null) {
    /*
      @param $file, url or path/file
      @param $cnt, (number of results to list) empty for all, or a number
    */
    require_once('path/to/simple_html_dom.php');
    //set_time_limit(0); // uncomment for large files
    $result = array();

    // create dom from url
    $html = file_get_html($file);
    if ($html) {
        foreach ($html->find('dl') as $dl) {
            if (empty($cnt)) { $cnt = count($html->find('dt')); } // set count if null

            for ($i = 0; $i < $cnt; $i++) {
                $dtNode = $dl->find('dt', $i);
                $ddNode = $dl->find('dd', $i);
                if (!$dtNode || !$ddNode) { break; } // no more pairs in this dl

                $dd = $ddNode->plaintext;
                $dt = $dtNode->innertext; // dt with html tags, easier for removing dd duplication
                $dt = preg_replace('/\s+/', ' ', $dt); // collapse whitespace, tabs etc.

                // strip the dd text duplicated inside dt
                if (($pos = strrpos($dt, $dd)) !== false) {
                    $dt = substr_replace($dt, "", $pos, strlen($dd));
                }

                $dt = strip_tags($dt); // remove html tags
                if (empty($dt)) { $dt = $dd; } // make sure dt is not empty

                $result[] = array(trim($dt) => trim($dd));
            }
        }
    }

    return $result;
}
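
As an alternative that avoids the third-party library, the unclosed markup shown above can also be matched directly with a regular expression. This is only a minimal sketch, not the method from the answer: the name parse_unclosed_dl and the sample string are illustrative, and it assumes every entry follows the <dt>TITLE<b><dd>ABBREVIATION</b> pattern from the sample. Regex matching of HTML is brittle, so it will miss entries if the real page deviates from that pattern.

```php
<?php
// Hypothetical helper: extract [title, abbreviation] pairs from markup whose
// <dt> and <dd> tags are never closed.
function parse_unclosed_dl(string $html): array
{
    $pairs = array();

    // Assumed entry shape: <dt>TITLE<b><dd>ABBREVIATION</b>
    $pattern = '/<dt>(.*?)<b>\s*<dd>(.*?)<\/b>/is';

    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Drop any leftover tags and collapse runs of whitespace.
            $title = trim(preg_replace('/\s+/', ' ', strip_tags($m[1])));
            $abbr  = trim(preg_replace('/\s+/', ' ', strip_tags($m[2])));
            // Pairs are stored as lists so that repeated titles are not lost.
            $pairs[] = array($title, $abbr);
        }
    }

    return $pairs;
}

$sample = '<dl><dt>d h lawrence review<b><dd>  d h lawrence rev</b> '
        . '<dt>d-d excitations in transition-metal oxides<b><dd>   springer tr mod phys</b></dl>';

// Print one "journal name, abbreviation" line per entry.
foreach (parse_unclosed_dl($sample) as $pair) {
    echo $pair[0], ', ', $pair[1], PHP_EOL;
}
```

Storing each entry as a two-element list rather than as title => abbreviation keeps titles that occur more than once, such as the two daedalus rows in the example output.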
