awk help


awk help

Ernie Luzar
Hello list;

I wrote the following script to read a hosts file I downloaded
containing known bad sex domain names intended for addition to
/etc/hosts file. This script drops some useless records and builds
a local-zone: record for input to unbound. The file contains 55,000
records and this script takes 7 minutes to process it.

I know awk can do this same thing much faster. I have searched the awk
documentation and cannot find anything that does what the tr '\r' ' '
command does.

Would someone point me to documentation, with an example, of how to get
the same result with awk as the tr '\r' ' ' command gives?

Thanks.



#! /bin/sh
date
   host_out="$1"
   host_in="$2"
   truncate -s 0 $host_out

# Make the input file read a line at a time, not a field at a time.
   IFS=$'\n'
   set -f

   for line in `cat $host_in`; do

     # Locate and replace carriage return with blank.
     line=`echo -n "${line}" | tr '\r' ' '`

     # Locate and replace tab with blank.
     line=`echo -n "${line}" | tr '\t' ' '`

     # Drop blank lines.
     blank_line=`echo -n $line | cut -c 1-1`
     if [ "$blank_line" = " " ]; then
       continue
     fi

     # Drop lines with localhost in it.
     localhost=`echo -n $line | cut -w -f 2`
     if [ "$localhost" = "localhost" ]; then
       continue
     fi

     # Drop line with # in column 1 as a comment.
     comment1=`echo -n $line | cut -c 1-1`
     if [ "$comment1" = "#" ]; then
       continue
     fi

     # Drop line with word Malvertising starting in column 1
     comment1=`echo -n $line | cut -w -f 1`
     if [ "$comment1" = "Malvertising" ]; then
       continue
     fi

     # Output record.
     ip=`echo -n $line | cut -w -f 1`
     $trace_on echo "ip = ${ip}"
     if [ "$ip" = "127.0.0.1" -o "$ip" = "0.0.0.0" ]; then
       domain_name=`echo -n $line | cut -w -f 2`
       echo "local-zone: \"${domain_name}\" always_nxdomain" >> $host_out
       continue
     else
       domain_name=`echo -n $line | cut -w -f 1`
       echo "local-zone: \"${domain_name}\" always_nxdomain" >> $host_out
     fi

   done
   date
   exit 0
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[hidden email]"

Re: awk help

Tim Daneliuk
On 09/10/2017 08:32 AM, Ernie Luzar wrote:
> Hello list;
>
> I wrote the following script to read a hosts file I downloaded
> containing known bad sex domain names intended for addition to
> /etc/hosts file. This script drops some useless records and builds
> a local-zone: record for input to unbound. The file contains 55,000
> records and this script takes 7 minutes to process it.
>
Ernie -

It would be a bit easier if you could provide us with a line or two of
sample input, and an example of desired output.


----------------------------------------------------------------------------
Tim Daneliuk     [hidden email]
PGP Key:         http://www.tundraware.com/PGP/


ignore

Ernie Luzar
testing email address

Re: ignore

Adam Vande More
On Sun, Sep 10, 2017 at 11:12 AM, Ernie Luzar <[hidden email]> wrote:

> testing email address
>

Please use [hidden email] for this.

--
Adam

Re: awk help

Ernie Luzar
In reply to this post by Tim Daneliuk
Tim Daneliuk wrote:

> On 09/10/2017 08:32 AM, Ernie Luzar wrote:
>> Hello list;
>>
>> I wrote the following script to read a hosts file I downloaded
>> containing known bad sex domain names intended for addition to
>> /etc/hosts file. This script drops some useless records and builds
>> a local-zone: record for input to unbound. The file contains 55,000
>> records and this script takes 7 minutes to process it.
>>
> Ernie -
>
> It would be a bit easier if you could provide us with a line or two of
> sample input, and an example of desired output.
>
>
> ----------------------------------------------------------------------------
> Tim Daneliuk     [hidden email]
> PGP Key:         http://www.tundraware.com/PGP/


127.0.0.1 localhostCR
127.0.0.1 001.hitgraph.jpCR
127.0.0.1 002.hitgraph.jpCR^M

### UncheckyAds

0.0.0.0 cdn.appround.biz
0.0.0.0 cdn.bigspeedpro.com
0.0.0.0 cdn.bispd.com


Re: awk help

Tim Daneliuk
On 09/10/2017 11:22 AM, Ernie Luzar wrote:
<SNIP>


Sorry, I am still a bit dense or something.  Are all the lines below
input or are you trying to drop stuff starting with 127.0.0.1 or ....?


>>
>
>
> 127.0.0.1 localhostCR
> 127.0.0.1 001.hitgraph.jpCR
> 127.0.0.1 002.hitgraph.jpCR^M
>
> ### UncheckyAds
>
> 0.0.0.0 cdn.appround.biz
> 0.0.0.0 cdn.bigspeedpro.com
> 0.0.0.0 cdn.bispd.com
>


--
----------------------------------------------------------------------------
Tim Daneliuk     [hidden email]
PGP Key:         http://www.tundraware.com/PGP/


Re: awk help

Polytropon
In reply to this post by Ernie Luzar
Allow me a few comments regarding sh -> awk for reformatting
your input data.

On Sun, 10 Sep 2017 09:32:02 -0400, Ernie Luzar wrote:
>      # Locate and replace carriage return with blank.
>      line=`echo -n "${line}" | tr '\r' ' '`

Drop the ^Ms in a pipe step ... | tr -d '\r' | ...
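
If the carriage returns should be handled inside awk itself rather than
in a separate tr step, awk's gsub() function can do it. What follows is
only a minimal sketch, not run against the real hosts file; the function
name strip_cr is made up for illustration. Note that it deletes the \r
(as tr -d does) instead of replacing it with a blank.

```shell
#!/bin/sh
# strip_cr is a hypothetical helper name: it reads stdin and deletes
# every carriage return, like tr -d '\r', but inside awk.
strip_cr() {
        awk '{ gsub(/\r/, ""); print }'
}

# Two CRLF-terminated sample lines; od -c makes any leftover \r visible.
printf '127.0.0.1 001.hitgraph.jp\r\n0.0.0.0 cdn.bispd.com\r\n' | strip_cr | od -c
```

To get a blank instead, as the original tr '\r' ' ' did, the second
argument of gsub() would be " " rather than "".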



>      # Locate and replace tab with blank.
>      line=`echo -n "${line}" | tr '\t' ' '`

No need, awk defaults to tab(s) and/or space(s) as field
separators, and you can easily access the fields with $1,
$2, $3 and so on.
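
A small sketch of the default field splitting, assuming nothing beyond
a standard awk; second_field is a made-up helper name. One sample line
uses a tab and the other several spaces, and $2 is the host name in
both cases.

```shell
#!/bin/sh
# second_field is a hypothetical helper: it prints field 2 of every line.
second_field() {
        awk '{ print $2 }'
}

# First line is tab-separated, second uses a run of spaces.
printf '127.0.0.1\t001.hitgraph.jp\n0.0.0.0    cdn.bispd.com\n' | second_field
# prints:
# 001.hitgraph.jp
# cdn.bispd.com
```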



>      # Drop blank lines.
>      blank_line=`echo -n $line | cut -c 1-1`
>      if [ "$blank_line" = " " ]; then
>        continue
>      fi

Just add a rule (length > 0) in front of your { ... awk
statements for each line }.



>      # Drop lines with localhost in it.
>      localhost=`echo -n $line | cut -w -f 2`
>      if [ "$localhost" = "localhost" ]; then
>        continue
>      fi

Expand the rule like (length > 0 && $2 != "localhost") { ... },
in case "localhost" is the exact text; if you want to use a
regex, use $2 !~ /localhost/ instead (awk matches regexes with
~ and !~, not with == and !=).
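
To illustrate the difference between the exact comparison and the regex
match, here is a rough sketch. The two function names are invented, and
length($0) is spelled out in full, which should behave the same as a
bare length. In awk a regex match is written with ~ or !~.

```shell
#!/bin/sh
# Both helper names below are hypothetical.

# Exact comparison: drops only lines whose second field is exactly "localhost".
drop_localhost() {
        awk 'length($0) > 0 && $2 != "localhost" { print $2 }'
}

# Regex match: drops every line whose second field contains "localhost".
drop_localhost_re() {
        awk 'length($0) > 0 && $2 !~ /localhost/ { print $2 }'
}

sample='127.0.0.1 localhost\n\n127.0.0.1 localhost.example.com\n0.0.0.0 cdn.bispd.com\n'

printf "$sample" | drop_localhost
# prints:
# localhost.example.com
# cdn.bispd.com

printf "$sample" | drop_localhost_re
# prints:
# cdn.bispd.com
```

Which one is wanted depends on whether names that merely contain
"localhost" should survive; the original sh script compares exactly.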



>      # Drop line with # in column 1 as a comment.
>      comment1=`echo -n $line | cut -c 1-1`
>      if [ "$comment1" = "#" ]; then
>        continue
>      fi

Add another rule as a regex, !/^#/ && ( ... as above ... ) { ... },
to filter those. Or, also possible, use ... | grep -v "^#" | ...
in front of awk.



>      # Drop line with word Malvertising starting in column 1
>      comment1=`echo -n $line | cut -w -f 1`
>      if [ "$comment1" = "Malvertising" ]; then
>        continue
>      fi

See above: compare the first field, for example $1 != "Malvertising".
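
An alternative to stacking conditions in front of one block is to give
each kind of unwanted line its own rule that ends in "next". This is
just a sketch; drop_noise is a made-up name and the "Malvertising list"
sample line is invented, since no such line was posted.

```shell
#!/bin/sh
# drop_noise is a hypothetical helper name.
drop_noise() {
        awk '
        /^#/                 { next }   # comment: "#" in column 1
        $1 == "Malvertising" { next }   # label line: first word is Malvertising
                             { print }  # everything else passes through
        '
}

printf '### UncheckyAds\nMalvertising list\n0.0.0.0 cdn.appround.biz\n' | drop_noise
# prints:
# 0.0.0.0 cdn.appround.biz
```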



>      # Output record.
>      ip=`echo -n $line | cut -w -f 1`
>      $trace_on echo "ip = ${ip}"
>      if [ "$ip" = "127.0.0.1" -o "$ip" = "0.0.0.0" ]; then
>        domain_name=`echo -n $line | cut -w -f 2`
>        echo "local-zone: \"${domain_name}\" always_nxdomain" >> $host_out
>        continue
>      else
>        domain_name=`echo -n $line | cut -w -f 1`
>        echo "local-zone: \"${domain_name}\" always_nxdomain" >> $host_out
>      fi

Construct an output statement as desired. Use variables instead
of $1, $2, $3 if the whole thing gets too complex, for example
like this (not tested, just for illustration):

#!/bin/sh

tr -d '\r' < input.txt | awk '
!/^#/ && (length > 0) && $2 != "localhost" && $1 != "Malvertising" {
        ip = $1
        host = $2

        # hosts-style line: the name is in field 2; otherwise it is field 1
        if (ip == "127.0.0.1" || ip == "0.0.0.0")
                printf("local-zone: \"%s\" always_nxdomain\n", host)
        else
                printf("local-zone: \"%s\" always_nxdomain\n", ip)
}' > output.txt

That should be basically what you need. You now just have to
combine the moving parts correctly. :-)
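
For reference, one way the parts might be combined into a single awk
program that also strips the carriage returns itself. Treat it as a
sketch under assumptions: hosts_to_unbound is a made-up name, the
filtering rules are my reading of the original sh script, and the test
input is taken from the sample lines posted in this thread.

```shell
#!/bin/sh
# hosts_to_unbound is a hypothetical name; it reads a hosts-style list on
# stdin and writes unbound local-zone records on stdout.
hosts_to_unbound() {
        awk '
        { gsub(/\r/, "") }                # strip carriage returns; fields are re-split
        NF == 0 || /^#/      { next }     # blank lines and comments
        $1 == "Malvertising" { next }     # label lines
        $2 == "localhost"    { next }     # localhost entries
        {
                # hosts format has the name in field 2; a bare name is field 1
                name = ($1 == "127.0.0.1" || $1 == "0.0.0.0") ? $2 : $1
                printf("local-zone: \"%s\" always_nxdomain\n", name)
        }'
}

printf '127.0.0.1 localhost\r\n127.0.0.1 001.hitgraph.jp\r\n\r\n### UncheckyAds\n\n0.0.0.0 cdn.appround.biz\n' |
        hosts_to_unbound
# prints:
# local-zone: "001.hitgraph.jp" always_nxdomain
# local-zone: "cdn.appround.biz" always_nxdomain
```

With the real files it would be called as
hosts_to_unbound < "$host_in" > "$host_out". Because the whole file goes
through one awk process instead of several subshells per line, it should
run far faster than the sh loop, though I have not timed it on the
55,000-record file.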





--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: ignore

Thomas Mueller-6
In reply to this post by Adam Vande More

from Adam Vande More:

> On Sun, Sep 10, 2017 at 11:12 AM, Ernie Luzar <[hidden email]> wrote:
       
> > testing email address
       
> Please use [hidden email] for this.

I just checked.  The list to send test messages to is [hidden email].

Comparable to Usenet newsgroup alt.test .

Tom
