Search String Faster With Ruby

Posted in Technical | Tagged as grep, ruby, shellscript

Is grep slow? Sure it is. Read on. Just a few days back there was an electrical outage and a lot of applications went down.
In one case, a lot of customer orders could not be processed. That called for manual intervention: extracting the orders (XML) from a log file and re-feeding them to another system. The task was simple. I had the order numbers in orders.txt, and I had to write a shell script that would grep for the XML containing each of these orders, extract it, and create one file per order.

Shell script to loop and find an order in another file
cat extract.sh
--------------
# For every order ID in orders.txt, grep the logs and write <order_id>.xml
i=0
echo "Extraction Script Started at: $(date)"
while read -r order_id; do
  filename="${order_id}_tmp.xml"
  finalfilename="${order_id}.xml"
  # Each iteration scans every *.log file from start to end
  if grep ".*$order_id." *.log > "$filename"; then
    echo "written xml for $order_id in $filename"
  else
    echo "$order_id" >> Orders_not_found.txt
  fi
  # Replace unwanted text and strip the "***" markers from the extracted lines
  sed -e 's|text-to-remove|new-text|g' -e 's|\*\*\*||g' "$filename" > "$finalfilename"
  rm "$filename"
  ((i++))
done < orders.txt
echo "Extraction Script Ended at: $(date)"

But the problem was that the log files I was searching were huge: about 5 GB in total. grep was therefore taking at least 4-5 minutes to find one order and create an XML file for it, because every order meant another full scan of the logs. Clearly this was not a solution, as I had to find a thousand orders in those log files and the job was critical for the end customer.

If my calculation was right, I would have had to spend:
4 mins = 1 XML
60 mins = 1 hr = 15 XMLs
At this rate, 1000 XMLs would take about 67 hours at 4 minutes each (more at 5 minutes), so I would have spent roughly 3-4 days of CPU time to get them all. (Not to mention the pain and frustration.) In other words, we would all have been getting screwed by the customer, over and over, for those 3-4 days.
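The estimate above can be checked with a few lines of Ruby. This is only a back-of-the-envelope sketch: the method name `estimated_days` is made up for illustration, and it assumes the orders are processed strictly one after another at the 4-5 minutes per order quoted above.

```ruby
# Rough estimate of how long the grep-per-order loop would run.
# `estimated_days` is a hypothetical helper, not part of the original script.
def estimated_days(orders, minutes_per_order)
  total_minutes = orders * minutes_per_order
  total_minutes / (60.0 * 24) # minutes -> days
end

low  = estimated_days(1000, 4) # best case: 4 minutes per order
high = estimated_days(1000, 5) # worst case: 5 minutes per order

puts format("%.1f to %.1f days", low, high) # prints "2.8 to 3.5 days"
```

So even in the best case the loop needs close to three full days of uninterrupted run time.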

Enter Ruby: a one-liner saved us.
I used this Ruby command to first pull out every line matching a generic pattern, and then created the order XML files by running a normal shell script like the one above against the result. I thought I would keep it here for future reference.

$ ruby -pe 'next unless $_ =~ /<RegExp for stringtomatch>.*Number.*/' < 5GB-file.log >> 6Mb-file-reduced.log
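For readers unfamiliar with the `-p` switch, the one-liner can be spelled out as an ordinary script. The sketch below is an approximation of what it does, not the exact code Ruby generates: `filter_lines` and `PATTERN` are hypothetical names, and the pattern is a stand-in because the real regex is not shown in this post.

```ruby
require "stringio"

# Hypothetical pattern for illustration only; the real regex is not shown here.
PATTERN = /<Order>.*Number.*/

# Copy only the lines matching `pattern` from `input` to `output`.
# Returns the number of lines kept.
def filter_lines(input, output, pattern)
  kept = 0
  input.each_line do |line|
    next unless line =~ pattern # same test as in the one-liner
    output.write(line)
    kept += 1
  end
  kept
end

# Small in-memory demo so the sketch runs without a 5 GB log.
log = StringIO.new("<Order><Number>42</Number></Order>\nheartbeat ok\n")
reduced = StringIO.new
count = filter_lines(log, reduced, PATTERN)
puts "kept #{count} line(s)" # prints "kept 1 line(s)"
```

On the real log you would pass `$stdin` and `$stdout` instead of the `StringIO` objects and keep the same shell redirections as in the one-liner.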

Wonderful! Ruby took just a few minutes to filter the 5 GB log file with the regular expression, and now I only had to search for the orders in the much smaller intermediate file.
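The second step, finding each order in the reduced file, could also be done in a single pass instead of one grep per order. This is a different technique from the shell loop I actually used, shown only as a minimal sketch: `group_lines_by_order` is a hypothetical helper, the sample lines are made up, and it assumes that no order ID is a prefix of another one.

```ruby
# Group log lines by the order ID they mention, reading the input only once.
# Assumes no order ID is a prefix of another (e.g. "123" and "1234").
def group_lines_by_order(lines, order_ids)
  matcher = Regexp.union(order_ids) # literal alternation of all IDs
  found = Hash.new { |hash, id| hash[id] = [] }
  lines.each do |line|
    # uniq: record a line once even if it repeats the same ID
    line.scan(matcher).uniq.each { |id| found[id] << line }
  end
  found
end

# Made-up sample data standing in for the reduced log and orders.txt.
sample_log = [
  "<Order><Number>1001</Number></Order>",
  "heartbeat ok",
  "<Order><Number>1002</Number></Order>"
]
wanted = %w[1001 1002 1003]

result  = group_lines_by_order(sample_log, wanted)
missing = wanted - result.keys # plays the role of Orders_not_found.txt

puts "found: #{result.keys.join(', ')}" # prints "found: 1001, 1002"
puts "missing: #{missing.join(', ')}"   # prints "missing: 1003"
```

With real files, `lines` could be `File.foreach("6Mb-file-reduced.log")` and `order_ids` could come from `File.readlines("orders.txt", chomp: true)`; each entry of `result` would then be written to its own `<order_id>.xml` file.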

This saved us about 3 days: the whole job was done in just half an hour.

Voila !!!

Credit for the one-liner goes to Garry Tan; that is where I found this wonderful Ruby command.





Gravatar of Ashwani Kumar
