php - How to parse the data obtained by using cURL to get DL?
I want to show a list of journals and their abbreviations, like:
journal name, abbreviation
I am getting the data I need from http://images.webofknowledge.com/wok46/help/wos/d_abrvjt.html by running the following:
$ch = curl_init();
// set options
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'http://images.webofknowledge.com/wok46/help/wos/a_abrvjt.html'
));
$result = curl_exec($curl);
curl_close($curl);
$data = json_decode($result, true);
// !end function, make_call

But it shows me the whole page. As I said, I only need the names of the journals (dt) and the abbreviations (dd). How can I parse the result?
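One likely reason the whole page is printed: curl_exec() writes the response straight to the output unless CURLOPT_RETURNTRANSFER is set, and json_decode() expects JSON rather than HTML. A minimal sketch of fetching the page as a string, assuming a hypothetical helper named fetch_html() (not part of the original snippet) and assuming that following redirects is wanted:

```php
<?php
// Hypothetical helper, not from the original post: fetch a page body as a string.
function fetch_html(string $url): ?string
{
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // assumption: redirects should be followed
    ));
    $result = curl_exec($curl); // string on success, false on failure
    curl_close($curl);

    return $result === false ? null : $result;
}
```

The returned string is HTML, so it would be handed to an HTML parser (for example Simple HTML DOM's str_get_html()) rather than to json_decode().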
HTML DOM parsing via the Simple HTML DOM scraping method ...
<?php
function scraper($file, $cnt = null) {
    /*
      @param $file, URL or path/file
      @param $cnt, number of results to list: empty for all, or a number
    */
    require_once('path/to/simple_html_dom.php');
    //set_time_limit(0); // uncomment for large files
    $result = array();
    // create DOM from URL
    $html = file_get_html($file);
    if ($html) {
        if (empty($cnt)) { $cnt = count($html->find('dt')); }
        foreach ($html->find('dl') as $dl) {
            for ($i = 0; $i < $cnt; $i++) {
                $dt = $dl->find('dt', $i)->plaintext;
                $dd = $dl->find('dd', $i)->plaintext;
                $result[] = array(trim($dt) => trim($dd));
            }
        }
    }
    return $result;
}

$array = scraper('http://somesite.com/page.html');
print_r($array);
?>

Example output ...
Array
(
    [0] => Array ( [d h lawrence review] => d h lawrence rev )
    [1] => Array ( [d-d excitations in transition-metal oxides] => springer tr mod phys )
    [2] => Array ( [dados-revista de ciencias sociais] => dados-rev cienc soc )
    [3] => Array ( [daedalus] => daedalus )
    [4] => Array ( [daedalus] => daedalus-us )
    [5] => Array ( [daghestan , world of islam] => suomal tied toim sar )
)

Updated example specific to user350082's issue ...
The definition list's dt and dd tags are not closed, which results in the dd text being included in the find('dt') result.
<dt>d h lawrence review<b><dd> d h lawrence rev</b>
<dt>d-d excitations in transition-metal oxides<b><dd> springer tr mod phys</b>
etc. etc. etc.

Updated function ...
function scraper($file, $cnt = null) {
    /*
      @param $file, URL or path/file
      @param $cnt, number of results to list: empty for all, or a number
    */
    require_once('path/to/simple_html_dom.php');
    //set_time_limit(0); // uncomment for large files
    $result = array();
    // create DOM from URL
    $html = file_get_html($file);
    if ($html) {
        foreach ($html->find('dl') as $dl) {
            if (empty($cnt)) { $cnt = count($html->find('dt')); } // set count if null
            for ($i = 0; $i < $cnt; $i++) {
                $dd = $dl->find('dd', $i)->plaintext;
                $dt = $dl->find('dt', $i)->innertext; // dt with HTML tags, easier for removing dd duplication
                $dt = preg_replace('/\s+/', ' ', $dt); // collapse whitespace, tabs etc.
                // strip the duplicated dd text from dt
                if (($pos = strrpos($dt, $dd)) !== false) {
                    $strlen = strlen($dd);
                    $dt = substr_replace($dt, "", $pos, $strlen);
                }
                $dt = strip_tags($dt); // remove HTML tags
                if (empty($dt)) { $dt = $dd; } // make sure dt is not empty
                $result[] = array(trim($dt) => trim($dd));
            }
        }
    }
    return $result;
}
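The de-duplication step in the function above can be tried on its own, without Simple HTML DOM. The sketch below isolates it in a hypothetical helper, strip_dd_from_dt() (a name introduced here for illustration), and feeds it a string shaped like the unclosed markup shown earlier. It assumes the dt's inner HTML contains the dd text verbatim.

```php
<?php
// Hypothetical helper: remove the duplicated dd text from a dt's inner HTML.
function strip_dd_from_dt(string $dtInner, string $dd): string
{
    // Collapse runs of whitespace (tabs, newlines) into single spaces.
    $dt = preg_replace('/\s+/', ' ', $dtInner);

    // Cut out the LAST occurrence of the dd text, if present.
    $pos = strrpos($dt, $dd);
    if ($pos !== false) {
        $dt = substr_replace($dt, '', $pos, strlen($dd));
    }

    // Drop leftover tags such as <b> and <dd>, then trim.
    $dt = trim(strip_tags($dt));

    // Fall back to the dd text if nothing is left.
    return $dt === '' ? trim($dd) : $dt;
}

echo strip_dd_from_dt(
    'd h lawrence review<b><dd> d h lawrence rev</b>',
    'd h lawrence rev'
), "\n"; // prints "d h lawrence review"
```

Using strrpos() rather than strpos() matters here: "d h lawrence rev" is also a prefix of "d h lawrence review", so removing the first occurrence would damage the journal name instead of the duplicated abbreviation.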