Merge pull request #39 from dhs-ncats/bugfix/remove_bad_characters
Remove a few bad characters
jsf9k authored Nov 30, 2018
2 parents 38875b1 + 22f357a commit 28a2d49
Showing 2 changed files with 11 additions and 1 deletion.
10 changes: 10 additions & 0 deletions gather-domains.sh
@@ -100,6 +100,16 @@ cut -d"," -f1 gathered.csv > scanme.csv
 # Remove characters that might break parsing
 sed -i '/^ *$/d;/@/d;s/ //g;s/\"//g;s/'\''//g' scanme.csv
 
+# The latest Censys snapshot contains a host name that contains a few
+# carriage return characters in the middle of it. Let's get rid of
+# those.
+sed -i 's/\r//g' scanme.csv
+
+# We collect a few host names that contain consecutive dots. These
+# seem to always be typos, so replace multiple dots in host names with
+# a single dot.
+sed -i 's/\.\+/\./g' scanme.csv
+
 # Move the scanme to the output directory
 mv scanme.csv $OUTPUT_DIR/scanme.csv
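
To illustrate what the two new cleanup steps do, here is a minimal sketch that applies equivalent substitutions to a stream instead of editing scanme.csv in place. The function name clean_hostnames and the sample host names are made up for this example and are not part of the commit. The sketch also swaps in portable forms (tr -d '\r' and the POSIX interval \{2,\}) because the \r and \+ escapes used in the commit are GNU sed extensions.

```shell
# Hypothetical helper (not in gather-domains.sh): read host names on
# stdin and write the cleaned names to stdout.
clean_hostnames() {
    # Step 1: delete every carriage return character.
    # Step 2: collapse any run of two or more dots into a single dot.
    tr -d '\r' | sed 's/\.\{2,\}/./g'
}

# Made-up sample input: one name with an embedded carriage return and
# one name with repeated dots.
printf 'exa\rmple.gov\nwww..example...gov\n' | clean_hostnames
```

Both steps leave well-formed host names untouched, so running the function over already clean input should be a no-op.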

2 changes: 1 addition & 1 deletion version.txt
@@ -1 +1 @@
-1.0.6
+1.0.7
