Help with Regex in PHP


HS Staff
06-05-03, 10:48 AM
Hello All:

I need some PHP code that runs a regex on a string (the environment variable holding a user's hostname) and returns the fully qualified domain name from that string. Below are some examples of such strings:

poolf7-029.wwa.com
whatever.sub.sub2.domain.co.uk
morestuff.more-stuff.sub.school.k12.us

You get the idea. If every name ended in a single .xxx extension it would be simple, but two-level extensions such as .com.ty or .co.uk, and perhaps even three-level ones, make this more challenging.
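
To make the difficulty concrete, here is a minimal sketch of the "simple" approach: a regex that always keeps the last two labels. The function name last_two_labels is made up for this illustration. It works for .com names but cuts too much off a .co.uk name.

```php
<?php
// Naive approach: always keep the last two dot-separated labels.
// last_two_labels() is a hypothetical helper, not part of PHP.
function last_two_labels(string $host): ?string
{
    if (preg_match('/([^.]+\.[^.]+)$/', $host, $m)) {
        return $m[1];
    }
    return null; // no dot in the hostname
}

echo last_two_labels('poolf7-029.wwa.com'), "\n";             // wwa.com
echo last_two_labels('whatever.sub.sub2.domain.co.uk'), "\n"; // co.uk, not domain.co.uk
```

A pattern alone cannot tell that co.uk is an extension while wwa.com is a domain, which is why some kind of list or lookup is needed.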

Anyone know of some code that has been written to parse fully qualified domain names out of hostnames?

Pearl
06-05-03, 12:01 PM
PHP is not my thing, but I found these; hopefully they'll help.

http://www.zend.com/codex.php?id=553&single=1
http://ca2.php.net/preg_match
http://www.perlmonks.org/index.pl?node_id=33357 (perl)
http://www.educat.hu-berlin.de/doc/php/functions.html#gethostbyaddr ??

Best of Luck.

phpkid
06-06-03, 03:10 AM
That is an interesting problem.
Here is what I could come up with.


<?php

// Keep the list of fully qualified domain names in the $domain_names array.
$domain_names[] = '.abc.dk';
$domain_names[] = '.com';
$domain_names[] = '.hi';
$domain_names[] = '.hello';
$domain_names[] = '.mark';
$domain_names[] = '.in';

// Supply the domain name to check in the $domain variable.
$domain = 'hello.com.in';

$num_levels = 3; // Number of levels (or extensions) to look at in a domain.


// Start processing...

// Find the number of dots.
$num_dots = substr_count($domain, '.');

$domain_dup  = $domain; // Copy the original domain name.
$domain_root = array(); // Candidate endings, shortest first.
$found       = FALSE;

if ($num_dots == 0)
{
    print 'Not a valid domain!';
}
else
{
    $level = 0; // Start from the last dot and work left for up to $num_levels levels.
    while ($level < $num_levels)
    {
        $domain_part = strrchr($domain_dup, '.'); // Last label, with its leading dot.
        $previous = count($domain_root) ? $domain_root[count($domain_root) - 1] : '';
        $domain_root[] = $domain_part . $previous;
        // Strip only the trailing label; str_replace() would remove every occurrence.
        $domain_dup = substr($domain_dup, 0, -strlen($domain_part));

        $level++;
        if ($level == $num_dots)
            break;
    }
}

// Now that the list of candidate endings is built, search through our existing list.
foreach ($domain_root as $domain_to_check)
{
    if (in_array($domain_to_check, $domain_names))
    {
        $found = TRUE;
        print "Fully qualified domain name for <h1>$domain</h1> is <h2>$domain_to_check</h2>";
    }
}
if (! $found)
{
    print 'Invalid domain name.';
}
?>


Try it and see if it works. I think it might not work in some situations and I am right now too lazy to think about those situations.

So if you could populate $domain_names properly and check it, that would be great!
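
One way to populate the list without typing it by hand is to read it from text with one extension per line. This is only a sketch under assumptions: load_domain_names is a made-up helper, and the input layout (plain entries, // comment lines, * and ! rules) is assumed to follow the style of the Public Suffix List. Wildcard and exception rules are simply skipped here.

```php
<?php
/**
 * Turns suffix-list text (one suffix per line) into the '.suffix' form
 * that $domain_names uses. Blank lines and '//' comment lines are skipped,
 * and so are wildcard ('*') and exception ('!') rules, which this sketch ignores.
 * load_domain_names() is a hypothetical helper name.
 */
function load_domain_names(string $text): array
{
    $domain_names = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = strtolower(trim($line));
        if ($line === '' || strpos($line, '//') === 0
            || $line[0] === '*' || $line[0] === '!') {
            continue;
        }
        $domain_names[] = '.' . $line;
    }
    return $domain_names;
}

// Sample data made up for illustration.
$sample = "// sample data\ncom\nco.uk\n\n*.ck\nin\nco.in\n";
print_r(load_domain_names($sample));
```

With a real file, the text could come from file_get_contents() on a local copy of the list.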

Regards,
JD

phpkid
06-06-03, 03:19 AM
Okie,
I figured out when my code will break.
Say your list contains both .co.in and .in: for a name ending in .co.in, the code reports .in as the fully qualified domain name before it gets to the correct '.co.in'. Anyway, the following code solves the problem by checking the longest ending first! :)


<?php

$domain = 'india.co.in';

print "Fully qualified name for the domain $domain is " . '<h1>' . fully_qualified_name($domain) . '</h1>';

?>
<?php

function fully_qualified_name($domain)
{
    // Keep the list of fully qualified domain names in the $domain_names array.
    $domain_names[] = '.abc.dk';
    $domain_names[] = '.com';
    $domain_names[] = '.hi';
    $domain_names[] = '.hello';
    $domain_names[] = '.mark';
    $domain_names[] = '.in';
    $domain_names[] = '.co.in';

    $num_levels = 3; // Number of levels (or extensions) to look at in a domain.

    // Start processing...

    // Find the number of dots.
    $num_dots = substr_count($domain, '.');

    $domain_dup  = $domain; // Copy the original domain name.
    $domain_root = array(); // Candidate endings, shortest first.

    if ($num_dots == 0)
    {
        return 'Not a valid domain!';
    }

    $level = 0; // Start from the last dot and work left for up to $num_levels levels.
    while ($level < $num_levels)
    {
        $domain_part = strrchr($domain_dup, '.'); // Last label, with its leading dot.
        $previous = count($domain_root) ? $domain_root[count($domain_root) - 1] : '';
        $domain_root[] = $domain_part . $previous;
        // Strip only the trailing label; str_replace() would remove every occurrence.
        $domain_dup = substr($domain_dup, 0, -strlen($domain_part));

        $level++;
        if ($level == $num_dots)
            break;
    }

    // Longest ending first, so '.co.in' is tried before '.in'.
    $domain_root = array_reverse($domain_root);

    // Now that the list of candidate endings is built, search through our existing list.
    foreach ($domain_root as $domain_to_check)
    {
        if (in_array($domain_to_check, $domain_names))
        {
            return $domain_to_check;
        }
    }
    return 'Invalid domain name.';
}

?>
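
The function above returns only the matching ending (for example .co.in). If what is wanted is the domain itself, such as domain.co.uk from whatever.sub.sub2.domain.co.uk, the same longest-ending-first idea can be sketched more compactly as below. The name registered_domain and the short $suffixes list are assumptions made up for illustration, not a complete or authoritative list.

```php
<?php
// Hypothetical suffix list, for illustration only; a real one is much longer.
$suffixes = ['com', 'uk', 'co.uk', 'us', 'k12.us', 'in', 'co.in'];

/**
 * Returns the longest known suffix plus the one label to its left,
 * or null if no suffix in $suffixes matches.
 * registered_domain() is a hypothetical helper name.
 */
function registered_domain(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower(trim($host, '.')));
    $count  = count($labels);

    // Start at index 1 so at least one label remains in front of the suffix.
    // A lower index means a longer candidate, so the first hit is the longest suffix.
    for ($i = 1; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            return $labels[$i - 1] . '.' . $candidate;
        }
    }
    return null;
}

echo registered_domain('whatever.sub.sub2.domain.co.uk', $suffixes), "\n"; // domain.co.uk
echo registered_domain('poolf7-029.wwa.com', $suffixes), "\n";             // wwa.com
```

Because the loop walks from the longest candidate to the shortest, no array_reverse() step is needed.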


Hope that helps! :)
JD

DuEy
06-06-03, 06:51 PM
This returns the domains that are real (the ones that resolve to an IP address). You can modify it to return only the first one.

<?php
$domain = "php.is.the.best.netdupe.com";

$darray = explode(".", $domain);
$buffer = "";

// Walk from the last label to the first, building ever longer candidates:
// "com", "netdupe.com", "best.netdupe.com", ...
for ($i = count($darray) - 1; $i >= 0; $i--) {
    $buffer = ($buffer === "") ? $darray[$i] : $darray[$i] . "." . $buffer;
    // gethostbyname() returns the name unchanged when the lookup fails.
    $ip = gethostbyname($buffer);

    // ereg() was removed in PHP 7; filter_var() checks for an IPv4 address instead.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4)) {
        echo $buffer . "<br>";
    }
}
?>


This echoes "netdupe.com", as it is the only real domain.

If you want it to return only the top-level domain (no subdomains), you will need to collect all the real domains into a comma-separated string and take the first entry. It should look something like this:


    // Inside the loop above, replace the if block; $end must be set to "" before the loop.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4)) {
        $end .= $buffer . ",";
    }
}
$buffer = explode(",", $end);
echo $buffer[0];
?>
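
As an alternative to building a comma-separated string, the loop can simply stop at the first name that resolves. The sketch below is an assumption-laden illustration: first_resolving_suffix and the $resolves callback are made-up names, and the lookup is passed in as a function so the logic can be tried without touching DNS.

```php
<?php
/**
 * Returns the shortest trailing part of $host that $resolves accepts, or null.
 * $resolves is a hypothetical callback: it takes a name and returns true
 * when that name has an address.
 */
function first_resolving_suffix(string $host, callable $resolves): ?string
{
    $labels = explode('.', $host);
    $candidate = '';

    // Build "com", "netdupe.com", "best.netdupe.com", ... from right to left.
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        $candidate = ($candidate === '') ? $labels[$i] : $labels[$i] . '.' . $candidate;
        if ($resolves($candidate)) {
            return $candidate; // stop at the first hit
        }
    }
    return null;
}

// A DNS-backed check: gethostbyname() returns its argument unchanged on failure.
$dnsCheck = function (string $name): bool {
    return filter_var(gethostbyname($name), FILTER_VALIDATE_IP) !== false;
};

// Usage (performs real DNS lookups, so the result depends on the network):
// echo first_resolving_suffix('php.is.the.best.netdupe.com', $dnsCheck);
```

Note that a DNS check only says that a name has an address; it does not prove the name is the registered domain, since an extension itself may resolve on some networks.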


Go here for the proper file; the forum strips some characters that the script needs in order to work.
http://www.devdino.com/domaincheck.txt

DuEy
06-07-03, 02:32 AM
OK, I've modified it a bit to try to speed the script up; it is tested and works fine. I enjoyed making this one, lol. See attached file.