Re: Help scripting dns lookup using awk

Tim Daneliuk
On 09/14/2017 07:55 PM, Ernie Luzar wrote:
> The following sh script works, but runs very slow.
Rudiments of the same thing in Python (spacing and indents matter):

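#!/usr/bin/env python3
# Return IP of hostnames specified on the command line

from socket import gethostbyname
from sys import argv

for host in argv[1:]:
    try:
        # Resolve the name and print "address        hostname"
        print(gethostbyname(host), " " * 8, host)

    except (OSError, UnicodeError):
        # OSError covers failed lookups; UnicodeError covers malformed names
        print("Invalid Hostname: %s" % host)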


----------------------------------------------------------------------------
Tim Daneliuk     [hidden email]
PGP Key:         http://www.tundraware.com/PGP/

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[hidden email]"

Re: Help scripting dns lookup using awk

Polytropon
On Thu, 14 Sep 2017 20:55:00 -0400, Ernie Luzar wrote:

>    host_in="$1"
>    host_out="$2"
>    host_error="$3"
>    truncate -s 0 $host_out
>    truncate -s 0 $host_error
>
>    cat $host_in | awk '
>      { system(host $1)
>       rc_status = system($0)
>       if (rc_status != 0)
>          print $1 > $host_error
>        else
>          print $1 > $host_out
>      }'
>
>
> # command line exec command.
>  >hosts2void-dns_lookup.awk /tmp/aw.hosts \
> /root/good.dns /root/bad.dns
>
> # This is the output.
> sh: medrx.sensis.com.au: not found
> sh: medrx.sensis.com.au: not found

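You're not providing the whole command as needed. In system(host $1),
host is an uninitialized (empty) awk variable, so only the hostname is
passed to the shell; system($0) likewise tries to execute the hostname
itself, not a "host <something>" command. That's why the message
appears twice.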



> awk: illegal field $(), name "host_error"
>   input record number 1, file
>   source line number 5

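The $ is a reserved character in awk to indicate the fields; $0 is
the whole record, $1 the first field, and so on. So $host_error is
not read as the sh variable, but as a field reference using an awk
variable named host_error.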
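
A small sketch of how awk reads a $ in front of a name. The sample
input line and the variable name n are made up for this illustration:

```shell
#!/bin/sh
# Sketch only: the sample line and the variable name "n" are invented here.

# $0 is the whole record, $1 the first field, $NF the last field.
# A "$" in front of a variable means "the field whose number is stored
# in that variable"; it never refers to a variable of the calling shell.
second_field() {
    awk '{ n = 2; print $n }'
}

echo "alpha beta gamma" | second_field    # prints: beta
```

Since n holds 2, $n selects the second field.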



> I see 2 problems with my awk code.
>
> 1. The text output of the host command is going to
>     the console screen. In the sh version I kill the output
>     using > /dev/null. How would I do something like that in awk?

Combine the command string before executing, for example like this:

        cmd = sprintf("/usr/bin/host %s > /dev/null 2>&1", $1)
        rc = system(cmd)

This should suppress all messages, and you can still evaluate the
return code of the external program call.
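
A minimal sketch of that pattern wrapped in a sh function. The helper
name check_cmd is hypothetical, and true/false stand in for
/usr/bin/host so the example runs without any network access. The
command name is handed to awk with -v:

```shell
#!/bin/sh
# Sketch only: "check_cmd" is a hypothetical helper, and "true"/"false"
# stand in for /usr/bin/host so the example runs without a network.

check_cmd() {
    # $1 = command to run for each input line, e.g. /usr/bin/host
    awk -v tool="$1" '{
        # Build the complete command line first, then run it quietly.
        cmd = sprintf("%s %s > /dev/null 2>&1", tool, $1)
        rc = system(cmd)
        print $1, (rc == 0 ? "ok" : "failed")
    }'
}

echo "example.org" | check_cmd true     # prints: example.org ok
echo "example.org" | check_cmd false    # prints: example.org failed
```

Note that the input field is pasted into a shell command line as-is, so
this is only reasonable for input that contains plain host names.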



> 2. I get that doing  print $1 > $host_error  is not allowed.
>     What is the correct way to pass script variables to awk?

Look closely: Your awk script is in ' ... ' (single quotes). According
to standard sh behavior, this means that $<something> is not expanded
(unlike " ... $<something> ..."). If you want to transfer parameters
into an awk script from a sh "enclosure", use awk's -v parameter.
For example:

        ... | awk -v host_out="$host_out" -v host_error="$host_error" '
                # your awk code here
        '
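
Putting the pieces together, the whole script could look roughly like
the sketch below. The function name sort_hosts and the optional fourth
argument (the lookup command, defaulting to /usr/bin/host) are
assumptions added for illustration, so that the logic can be tried
with a stand-in command:

```shell
#!/bin/sh
# Sketch only: "sort_hosts" and the optional 4th argument are hypothetical.
# Without a 4th argument, /usr/bin/host is used for the lookups.

sort_hosts() {
    host_in="$1"; host_out="$2"; host_error="$3"
    lookup="${4:-/usr/bin/host}"
    : > "$host_out"                       # empty both result files
    : > "$host_error"
    awk -v host_out="$host_out" -v host_error="$host_error" \
        -v lookup="$lookup" '{
        cmd = sprintf("%s %s > /dev/null 2>&1", lookup, $1)
        if (system(cmd) != 0)
            print $1 > host_error         # awk variable, so no "$" in front
        else
            print $1 > host_out
    }' "$host_in"
}

# Example call, matching the command line from the original post:
#   sort_hosts /tmp/aw.hosts /root/good.dns /root/bad.dns
```

Inside the awk program, host_out and host_error are ordinary awk
variables holding the file names, which is why they appear without $.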



> Now I am wondering if there is a simpler way to do dns lookup
> in awk?

Just tidy up your code a little bit, the basic parts are already
there. ;-)



--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: Help scripting dns lookup using awk

FREE BSD QUESTIONS mailing list
In reply to this post by Tim Daneliuk
On Thu, 14 Sep 2017 20:55:00 -0400
Ernie Luzar wrote:

> The following sh script works, but runs very slow.

Almost certainly the reason it's slow is that you are doing sequential
synchronous lookups. Switching to another language isn't going to help
much. To speed it up you either need to switch to a language with a
DNS library that supports asynchronous lookups or fire off parallel
child processes. The latter is easier.
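
One possible way to get parallel child processes from sh is xargs with
its -P option. This is an extension found in FreeBSD and GNU xargs, not
part of POSIX. The function name parallel_check, the job count and the
lookup command argument are assumptions for this sketch:

```shell
#!/bin/sh
# Sketch only: "parallel_check" is a hypothetical helper. -P is an xargs
# extension (FreeBSD and GNU), not POSIX.

parallel_check() {
    # $1 = input file (one name per line)
    # $2 = number of lookups to run at the same time
    # $3 = lookup command, e.g. /usr/bin/host
    xargs -n 1 -P "$2" sh -c '
        # Inside this child shell: $0 = lookup command, $1 = one host name
        if "$0" "$1" > /dev/null 2>&1; then
            echo "good $1"
        else
            echo "bad $1"
        fi
    ' "$3" < "$1"
}

# Example call (20 is an arbitrary choice, results arrive in any order):
#   parallel_check /tmp/aw.hosts 20 /usr/bin/host | sort > /tmp/results
```

How many parallel lookups are sensible depends on the resolver being
queried, so the job count is something to find by testing.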





Re: Help scripting dns lookup using awk

Polytropon
On Fri, 15 Sep 2017 14:30:19 +0100, RW via freebsd-questions wrote:

> On Thu, 14 Sep 2017 20:55:00 -0400
> Ernie Luzar wrote:
>
> > The following sh script works, but runs very slow.
>
> Almost certainly the reason it's slow is that you are doing sequential
> synchronous lookups. Switching to another language isn't going to help
> much. To speed it up you either need to switch to a language with a
> DNS library that supports asynchronous lookups or fire-off parallel
> child processes. The latter is easier.

Correct. The bottleneck is the sequential calls to "host <parameters>".
Separating the input, for example per TLD, and then executing the
queries in parallel could help. It's also possible to use IP ranges
for separation. However, only actual testing will reveal which
approach works best. :-)



--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: Help scripting dns lookup using awk

Ernie Luzar
In reply to this post by FREE BSD QUESTIONS mailing list
RW via freebsd-questions wrote:

> On Thu, 14 Sep 2017 20:55:00 -0400
> Ernie Luzar wrote:
>
>> The following sh script works, but runs very slow.
>
> Almost certainly the reason it's slow is that you are doing sequential
> synchronous lookups. Switching to another language isn't going to help
> much. To speed it up you either need to switch to a language with a
> DNS library that supports asynchronous lookups or fire-off parallel
> child processes. The latter is easier.
>

How would I go about coding a sh script to fire-off parallel child
processes?

Re: Help scripting dns lookup using awk

Jonathan McKeown-2
Ernie, I've been following your questions over the last month or so. I
think I can guess your problem domain, and I suspect if you told the list
what you're trying to achieve you'd get much better suggested solutions.

As it is I think you have one approach in mind, and all your questions
relate to implementing parts of your idea.

My humble apologies if I'm wrong; but please consider explaining what the
overall problem is. (Someone mentioned x-y problems - I think you have one
here.)

Re: Help scripting dns lookup using awk

Polytropon
In reply to this post by Ernie Luzar
On Fri, 15 Sep 2017 19:20:22 -0400, Ernie Luzar wrote:

> RW via freebsd-questions wrote:
> > On Thu, 14 Sep 2017 20:55:00 -0400
> > Ernie Luzar wrote:
> >
> >> The following sh script works, but runs very slow.
> >
> > Almost certainly the reason it's slow is that you are doing sequential
> > synchronous lookups. Switching to another language isn't going to help
> > much. To speed it up you either need to switch to a language with a
> > DNS library that supports asynchronous lookups or fire-off parallel
> > child processes. The latter is easier.
> >
>
> How would I go about coding a sh script to fire-off parallel child
> processes?

There are several methods. One would be to "pre-sort" per TLD
or per IP. You can do this with awk or a sort | grep construct.
Then you iterate over the result files and send the batch lookup
processes into the background by appending & to each command. When
everything is done, re-combine the result files as needed.
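
A rough sketch of that scheme. For brevity the input is cut into
equal-sized pieces with split(1) instead of being sorted per TLD; the
function name batch_check, the chunk size and the lookup command
argument are assumptions made for this example:

```shell
#!/bin/sh
# Sketch only: "batch_check" is a hypothetical helper. The input is cut into
# equal chunks with split(1) rather than per TLD, and is assumed non-empty.

batch_check() {
    # $1 = input file, $2 = lines per chunk,
    # $3 = lookup command (e.g. /usr/bin/host), $4 = combined output file
    work=$(mktemp -d) || return 1
    split -l "$2" "$1" "$work/chunk."
    for chunk in "$work"/chunk.*; do
        # One background worker per chunk, each writing its own result file.
        while read -r name; do
            if "$3" "$name" < /dev/null > /dev/null 2>&1; then
                echo "good $name"
            else
                echo "bad $name"
            fi
        done < "$chunk" > "$chunk.out" &
    done
    wait                                  # block until every worker is done
    cat "$work"/chunk.*.out > "$4"        # re-combine the partial results
    rm -rf "$work"
}

# Example call: batch_check /tmp/aw.hosts 100 /usr/bin/host /tmp/results
```

The number of background workers equals the number of chunks, so the
chunk size should be chosen with the size of the input file in mind.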



--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: Help scripting dns lookup using awk

Ernie Luzar
In reply to this post by Jonathan McKeown-2
Jonathan McKeown wrote:

> Ernie, I've been following your questions over the last month or so. I
> think I can guess your problem domain, and I suspect if you told the
> list what you're trying to achieve you'd get much better suggested
> solutions.
>
> As it is I think you have one approach in mind, and all your questions
> relate to implementing parts of your idea.
>
> My humble apologies if I'm wrong; but please consider explaining what
> the overall problem is. (Someone mentioned x-y problems - I think you
> have one here.)

Yes, all my different posts over the last month are related to a solution
I am trying to develop. It all started with what looked like a very
simple request from top management: "Stop employees from using social
media from company PCs while at work." The one and only FreeBSD system
is the front door to the Company LAN and wifi. All LAN devices are
Windows machines, either cabled or wifi, including hand-held smart phones.
So I needed a single point solution that would affect the whole digital shop.

You may ask about smart phones accessing their wireless service. In the
USA a wireless signal jammer is not legal if the people being affected
are unaware of its existence. On being hired, all employees sign a legal
contract containing security requirements and are made aware that a cell
phone wireless signal jammer is employed covering the Company estate and
that the Company land line phone service is the only allowed way for phone
contact with the public for personal and Company business.

As the result of questions posted here, I learned about online providers
of "host" lists. These lists contain "127.0.0.1 domain-name" records of
known malware sites. These "host" lists can be used on Windows and Unix
flavored operating systems by populating each machine's hosts file. This
was not a single point solution.

Along comes using DNS as a single point solution. The 3 main players
are bind, unbound, and dnsmasq, which can all be populated with
domain names to be blocked at the local host level as not
found. I chose unbound, but am having problems with /etc/resolv.conf and
resolvconf not working as documented. I also could not get the built-in
local-unbound to work with any local changes. I posted questions here
which went unanswered. local-unbound and resolvconf are new and don't
have a user base yet to draw answers from. So pretty much a dead end. I
finally installed the port version of unbound and got it working.
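
As a hedged illustration of what "populating" unbound can look like:
the sketch below turns a list of domain names into unbound.conf
local-zone lines of type static, which answers names in the zone as
not found when no local data is defined. The function name to_unbound
is hypothetical, and whether static is the right zone type for a given
setup is an assumption to check against unbound.conf(5):

```shell
#!/bin/sh
# Sketch only: "to_unbound" is a hypothetical helper; the choice of the
# "static" zone type is an assumption, see unbound.conf(5).

to_unbound() {
    # Reads one domain name per line, writes one local-zone line per name.
    awk 'NF { printf "local-zone: \"%s\" static\n", $1 }' "$@"
}

echo "ads.example.com" | to_unbound
# prints: local-zone: "ads.example.com" static
```

The generated lines would go into a file that is pulled into the
server: section of the unbound configuration.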

Using the public host files and unbound became a single point solution
to provide protection LAN-wide that is unseen by the user base. No more
installing browser plug-ins that try to do the same blocking function.
The DNS solution provides protection to the LAN users from LAN machines
that may become infected. There is no absolute solution, just more layers
of protection.

These publicly available "host" files contain a lot of unnecessary junk
that needed to be cleaned away. I wrote a .sh script to do this, but it
was very slow. I got help from this list to convert it to awk. Using the
same sample input file, the .sh version took 7+ minutes and the awk
version took 4 seconds. It's a no-brainer which version I plan to use.
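
For readers following along, a cleanup of this kind might look like the
sketch below. It is not the script from the earlier thread. What counts
as "junk" here (comments, blank lines, carriage returns, localhost
entries, duplicates) and the function name clean_hosts are assumptions:

```shell
#!/bin/sh
# Sketch only: "clean_hosts" is a hypothetical helper, and the definition of
# "junk" (comments, blanks, localhost entries, duplicates) is an assumption.

clean_hosts() {
    awk '
        { gsub(/\r/, ""); sub(/#.*/, "") }      # drop CRs and comments
        NF < 2 { next }                         # skip blank or short lines
        $1 != "127.0.0.1" && $1 != "0.0.0.0" { next }
        { name = tolower($2) }
        name == "localhost" || name == "localhost.localdomain" { next }
        !seen[name]++ { print name }            # keep the first copy only
    ' "$@"
}

# Example call: clean_hosts raw.hosts > /tmp/aw.hosts
```

The output is one lower-cased domain name per line, which is the input
format the lookup scripts in this thread expect.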

As the last step in massaging the raw "host" file content, I wanted to do a
DNS lookup to verify those host domain names were really good. Feeding
unbound bogus domain names is not going to hurt anything, but I just
wanted to be thorough. Again I started with a .sh script using the host
system command, which is very slow. I got help here from the list to
convert it to awk and it was only a few seconds faster overall. I
changed the .sh script to use the drill system command and it ran in
half the time the host command version took.

In reply to the subject of this post, I got the following:

" Almost certainly the reason it's slow is that you are doing sequential
   synchronous lookups. Switching to another language isn't going to help
   much. To speed it up you either need to switch to a language with a
   DNS library that supports asynchronous lookups or fire-off parallel
   child processes. The latter is easier."

So I posted my last reply asking:

How would I go about coding a sh script to fire-off parallel child
processes?

The only "other language" installed on my front door host is perl,
because it's part of the apache pkg. I don't want to install another
language just because it has a fast pre-canned DNS lookup.

So if anyone knows of a perl dns lookup solution I sure would be
interested in hearing about it.

While waiting for a reply to that last question I have done more
testing. Using the drill command version of the .sh script against a
"host" file containing 409 records, which is the smallest file I have, I
found that 174 host names return NXDOMAIN or SERVFAIL. So it's
obvious that all 12 host files need DNS verification. That's 900,000+
records.

If I run that .sh script against the same host file I start receiving
this console message;

Error: error sending query: Could not send or receive, because of
network error

The results indicate all the hosts were looked up. My isp provides 1gb
upload and 3gb download speeds so limited speed is not the cause of the
network error.

Does anyone have any ideas about what is going on here?








Re: Help scripting dns lookup using awk

Edgar Pettijohn III-2
On Sat, Sep 16, 2017 at 10:24:16AM -0400, Ernie Luzar wrote:

> If I run that .sh script against the same host file I start receiving
> this console message;
>
> Error: error sending query: Could not send or receive, because of
> network error

Looking at the source for drill, I believe this may be a generic error for
the sending or receiving of the DNS packet. Perhaps a snippet of the script
you are running, showing how you are using drill, may shed some light on it.

>
> The results indicate all the hosts were looked up. My isp provides 1gb
> upload and 3gb download speeds so limited speed is not the cause of the
> network error.
>
> Does anyone have any ideas about what is going here?

Re: Help scripting dns lookup using awk

Matthew Seaman-2
In reply to this post by Ernie Luzar
On 16/09/2017 15:24, Ernie Luzar wrote:
> Yes, all my different posts over the last month are related to a solution
> I am trying to develop. It all started with what looked like a very
> simple request from top management: "Stop employees from using social
> media from company PCs while at work." The one and only FreeBSD system
> is the front door to the Company LAN and wifi. All LAN devices are
> Windows machines, either cabled or wifi, including hand-held smart phones.
> So I needed a single point solution that would affect the whole digital shop.

The canonical solution to this sort of requirement is to implement a web
proxy on the egress from your network.  Within the proxy you maintain a
blacklist of forbidden sites that it will refuse to provide service to.

The trick is to use firewall redirection to force any and all web
traffic to hit the proxy, and permit only the proxy to make web requests
from your corporate network to the outside world -- the term is
"transparent proxy."

This works best with unencrypted traffic, but can also be made to work
with HTTPS, although not quite as effectively.  It is also possible for
a motivated person to use VPN software to get around this sort of
restriction, but anyone so desperate to evade your corporate policies is
probably better handled by your HR department than by getting into a
technological arms-race.

        Cheers,

        Matthew



Re: Help scripting dns lookup using awk

Steve O'Hara-Smith
On Sun, 17 Sep 2017 08:21:23 +0100
Matthew Seaman <[hidden email]> wrote:

> This works best with unencrypted traffic, but can also be made to work
> with HTTPS, although not quite as effectively.  It is also possible for
> a motivated person to use VPN software to get around this sort of
> restriction,

        Blocking all unproxied outside access kills that option, leaving
esoterica like IP over DNS as the only options to bypass security.

--
Steve O'Hara-Smith <[hidden email]>