Sunday, 30 September 2012

HTTP proxy


I really wanted to make one and this assignment on the Princeton blog was a perfect guide to start with.

Code- https://github.com/pragya1990/proxy (written in C)

Proxy.c is a simple program that forwards client requests to the server. It can also filter or modify the server's responses before they reach the client, depending on the type of proxy the user wants to implement.

Some of the HTTP status codes I used were for:
- bad_request 400
- forbidden 403
- hostless request
- internal error 500
- missing
- moved permanently 301
- not found 404
- unauthorised request 401

The proxy uses socket programming to handle the connection with a web client. The program also includes checks such as validating client requests and adjusting the buffer size.
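To give an idea of what validating a client request can look like, here is a minimal sketch. It is written in Python for brevity and is not the code from proxy.c; the function names validate_request and error_response are made up for this example, and treating a request with no usable host as a 400 is an assumption about how a "hostless request" might be handled.

```python
from http import HTTPStatus
from urllib.parse import urlsplit


def _host_and_port(url):
    """Return (hostname, port) for a URL, or (None, None) if it cannot be parsed."""
    try:
        parts = urlsplit(url)
        return parts.hostname, parts.port
    except ValueError:
        return None, None


def validate_request(raw):
    """Check a raw request and return (status, host, port).

    A status of 200 means the request looks fine to forward;
    400 means it is malformed or names no host.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split()
    # The request line must look like: METHOD TARGET HTTP/x.y
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return 400, None, None
    # A proxy usually receives an absolute URI such as http://host/path.
    host, port = _host_and_port(parts[1])
    if host is None:
        # Otherwise fall back to the Host header.
        for line in lines[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "host":
                host, port = _host_and_port("//" + value.strip())
                break
    if host is None:
        return 400, None, None  # hostless request
    return 200, host, port or 80


def error_response(status):
    """Build a small HTML error reply for the given status code."""
    reason = HTTPStatus(status).phrase
    body = f"<html><body><h1>{status} {reason}</h1></body></html>"
    head = (
        f"HTTP/1.0 {status} {reason}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n\r\n"
    )
    return (head + body).encode("ascii")
```

A proxy built this way would call validate_request on whatever it reads from the client socket and send error_response(400) back instead of forwarding when the check fails.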


traffic-analysis using tcpdump

Tcpdump is a pretty useful tool. Wireshark can also be used as an alternative.

code - https://github.com/pragya1990/traffic-analysis


Tcpdump saves the captured packets in .pcap format. Using the pcap library functions, we can analyse the packets captured by tcpdump. More information can be found with 'man pcap'.

The program "pcap_program.c" reads the packets from a pcap file "packet.pcap".
It then maps the IP addresses to numbers, which are stored in map.txt.
The edges.txt file shows which IP addresses are talking to each other, and for how many seconds and microseconds.
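The same reading-and-mapping idea can be sketched without the pcap library, since a classic .pcap file is just a 24-byte global header followed by records that each start with a 16-byte header (seconds, microseconds, captured length, original length). This is a rough Python sketch and not the code from pcap_program.c; read_pcap and number_addresses are hypothetical names, the newer pcapng format is not handled, and every frame is assumed to be IPv4 over Ethernet.

```python
import socket
import struct


def read_pcap(data):
    """Yield (seconds, microseconds, frame_bytes) for each record in a classic pcap file."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += 16
        frame = data[offset:offset + incl_len]
        offset += incl_len
        yield ts_sec, ts_usec, frame


def number_addresses(frames):
    """Give every IP address a number and list who talks to whom.

    Assumes each frame is Ethernet + IPv4, so the source address sits at
    bytes 26..29 and the destination address at bytes 30..33.
    """
    ids = {}
    edges = []
    for frame in frames:
        src = socket.inet_ntoa(frame[26:30])
        dst = socket.inet_ntoa(frame[30:34])
        for addr in (src, dst):
            ids.setdefault(addr, len(ids) + 1)
        edges.append((ids[src], ids[dst]))
    return ids, edges
```

Here ids plays the role of map.txt and edges the role of edges.txt, minus the talking time.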

In the terminal, I executed the command: tcpdump -i 3 -c 15 -w /home/hp/Desktop/tcpdump/packets.pcap
It captures 15 packets on interface number 3 (wlan0 here) and saves them to packets.pcap.

The captured packets, with their IP addresses, as shown in the terminal:

root@ubuntu:~# tcpdump -i 3 -c 15 -w /home/hp/Desktop/tcpdump/packets.pcap
tcpdump: listening on wlan0, link-type EN10MB (Ethernet), capture size 96 bytes
15 packets captured
15 packets received by filter
0 packets dropped by kernel
root@ubuntu:~# tcpdump -n -q -r /home/hp/Desktop/tcpdump/packets.pcap
reading from file /home/hp/Desktop/tcpdump/packets.pcap, link-type EN10MB (Ethernet)
16:45:45.464511 IP 209.85.231.83.443 > 192.168.1.2.59417: tcp 52
16:45:45.464568 IP 192.168.1.2.59417 > 209.85.231.83.443: tcp 0
16:45:49.471288 IP 192.168.1.2.56531 > 192.168.1.1.53: UDP, length 36
16:45:49.493679 IP 192.168.1.1.53 > 192.168.1.2.56531: UDP, length 180
16:45:49.494077 IP 192.168.1.2.46097 > 75.101.153.231.80: tcp 0
16:45:49.723182 IP 192.168.1.2.46098 > 75.101.153.231.80: tcp 0
16:45:49.836979 IP 75.101.153.231.80 > 192.168.1.2.46097: tcp 0
16:45:49.837062 IP 192.168.1.2.46097 > 75.101.153.231.80: tcp 0
16:45:49.837767 IP 192.168.1.2.46097 > 75.101.153.231.80: tcp 482
16:45:50.062293 IP 75.101.153.231.80 > 192.168.1.2.46098: tcp 0
16:45:50.062343 IP 192.168.1.2.46098 > 75.101.153.231.80: tcp 0
16:45:50.182085 IP 75.101.153.231.80 > 192.168.1.2.46097: tcp 0
16:45:50.184581 IP 75.101.153.231.80 > 192.168.1.2.46097: tcp 231
16:45:50.184612 IP 192.168.1.2.46097 > 75.101.153.231.80: tcp 0
16:45:50.185293 IP 75.101.153.231.80 > 192.168.1.2.46097: tcp 0

root@ubuntu:~/Desktop/tcpdump# gcc -lpcap -o pcap_program pcap_program.c
root@ubuntu:~/Desktop/tcpdump# ./pcap_program


After compiling the program and executing ./pcap_program, two files, map.txt and edges.txt, were created.

In this program, I assumed that every packet has ether type IP. However, after running the program several times, I realised that ARP packets sometimes show up as well. This caused an error, because the pointer to the IP header was set at offset 14, right after the Ethernet header, which is only valid for IP packets. I worked around the problem by using an offset of '0' for protocols other than IP, but it's not always correct.
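One way around the ARP problem is to look at the EtherType field (bytes 12 and 13 of the Ethernet header) and skip anything that is not IPv4, instead of changing the offset. The following is a small Python sketch of that check, not the fix used in pcap_program.c; ipv4_addresses is a hypothetical helper name, and VLAN-tagged frames are not handled.

```python
import struct

ETH_HEADER_LEN = 14      # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806   # listed for reference; ARP frames are simply skipped


def ipv4_addresses(frame):
    """Return (src, dst) as dotted strings, or None if the frame is not IPv4."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return None  # too short to hold an Ethernet header plus a minimal IPv4 header
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None  # ARP or some other protocol: do not read an IP header here
    # Source and destination sit 12 and 16 bytes into the IPv4 header.
    src = frame[ETH_HEADER_LEN + 12:ETH_HEADER_LEN + 16]
    dst = frame[ETH_HEADER_LEN + 16:ETH_HEADER_LEN + 20]
    return ".".join(map(str, src)), ".".join(map(str, dst))
```

Frames for which this returns None can be left out of map.txt and edges.txt, so a stray ARP packet no longer produces garbage addresses.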

To find the talking time of the IP addresses, I subtracted the timestamps of two consecutive IP packets. I am not sure if this is how we get the talking time, but it seems correct. The talking time of the last packet is not shown, as we need the next packet to find it.
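The subtraction itself is easy to get wrong when the microsecond part of the later packet is smaller than that of the earlier one. A minimal Python sketch of the calculation follows; talking_times is a made-up name, and the timestamps are assumed to be (seconds, microseconds) pairs as stored in a pcap record header.

```python
def talking_times(timestamps):
    """Return the gap between consecutive packets as (seconds, microseconds) pairs.

    The result has one entry fewer than the input, because the last packet
    has no following packet to subtract from.
    """
    gaps = []
    for (s0, u0), (s1, u1) in zip(timestamps, timestamps[1:]):
        # Work in microseconds so that the borrow is handled automatically.
        total_usec = (s1 - s0) * 1_000_000 + (u1 - u0)
        gaps.append(divmod(total_usec, 1_000_000))
    return gaps
```

For example, two packets stamped .464511 and .464568 within the same second are 57 microseconds apart.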

We can also visualize this traffic using the igraph library. Interesting analytics can be done on these graphs by studying how the nodes communicate, one example being "Six degrees of separation". :)
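As a taste of that kind of analysis, the number of hops between two nodes can be found with a breadth-first search over the edge list. This sketch uses only the Python standard library rather than igraph, and it assumes the edges are available as pairs of node numbers, which may not match the exact layout of edges.txt; degrees_of_separation is a hypothetical name.

```python
from collections import deque


def degrees_of_separation(edges, start, goal):
    """Return the smallest number of hops between two nodes, or None if unreachable."""
    # Treat every edge as two-way: if A talked to B, they are neighbours.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None
```

On a home network capture, most nodes end up two hops apart, because everything goes through the local machine.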

text-editor in python

Python is an extremely simple language and a good option to start with small projects when you are still a noob at programming.

code - https://github.com/pragya1990/editor

I used a Python IDE and the Tkinter library for the GUI.

The editor file is the starting point.
The code should work, except that the paths need to be changed for dictionary.txt and for all the gif files, which can be downloaded from GitHub.
- The path for dictionary.txt appears in find_misspelled_word.py
- The path for the *.gif files appears in Interface_2.py

Just download the code and type: "python editor.py" in the terminal.


Pointers: 
  • Spell-check : A misspelled word is found by looking it up in a dictionary stored as a hash table. To build the list of suggestions, each candidate word has a score that increases with its similarity to the misspelled word. Similarity takes into account the length of the common substring, the number of common letters, and whether the word starts with the same letter as the misspelled word. For instance, for the misspelled word "utail", "tail" ranks higher because the common substring has length 4 (very high, considering that "utail" is only 5 letters long). The ten words with the highest scores are displayed in the list of suggested words.
  • Undo-Redo : This was the most interesting part. After getting it wrong many times, the final solution had an undo stack and a redo stack. The attributes associated with an event (keypress) were event_type, start, end, value and operation. Start and end define the index range, value stores the string, and event_type defines the behaviour: delete the string, insert the string, or move the cursor. On pressing the undo button, the event at the top of the undo stack is executed, popped and pushed onto the redo stack. The redo stack works the same way in the other direction.
  • Find and Replace : It supports features such as match case, match entire word only, search backwards and wrap around. Boyer-Moore would become inefficient when all permutations of a pattern are considered, so a modified Rabin-Karp was used instead. Matched patterns were highlighted. Replace had additional buttons for replace one and replace all.
  • Functions for open file, save, quit, new were simple as they were already provided by the library. 
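The two-stack idea from the Undo-Redo pointer can be sketched on a plain string. This is a simplified illustration and not the code from the repository: UndoRedoBuffer is a made-up class, it records whole insert and delete operations instead of individual keypress events, and it leaves out cursor movement.

```python
class UndoRedoBuffer:
    """A text buffer with an undo stack and a redo stack."""

    def __init__(self):
        self.text = ""
        self.undo_stack = []  # events that can be undone, newest last
        self.redo_stack = []  # events that were undone and can be redone

    def _apply(self, operation, start, value):
        if operation == "insert":
            self.text = self.text[:start] + value + self.text[start:]
        else:  # "delete"
            self.text = self.text[:start] + self.text[start + len(value):]

    def insert(self, start, value):
        self._apply("insert", start, value)
        self.undo_stack.append(("insert", start, value))
        self.redo_stack.clear()  # a fresh edit invalidates the redo history

    def delete(self, start, end):
        value = self.text[start:end]  # remember what was removed
        self._apply("delete", start, value)
        self.undo_stack.append(("delete", start, value))
        self.redo_stack.clear()

    def undo(self):
        if not self.undo_stack:
            return
        operation, start, value = self.undo_stack.pop()
        inverse = "delete" if operation == "insert" else "insert"
        self._apply(inverse, start, value)
        self.redo_stack.append((operation, start, value))

    def redo(self):
        if not self.redo_stack:
            return
        operation, start, value = self.redo_stack.pop()
        self._apply(operation, start, value)
        self.undo_stack.append((operation, start, value))
```

Storing the removed string with each delete event is what makes the delete reversible; without it, undo would have nothing to put back.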
Overall, it turned out to be 2330 lines of code divided into 24 files.
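For the search part of Find and Replace, here is a sketch of the textbook Rabin-Karp algorithm with a rolling hash. It is the plain version, not the modified one used in the editor, and the function name rabin_karp and its match_case parameter are made up for this example.

```python
def rabin_karp(text, pattern, match_case=True, base=256, modulus=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    if not match_case:
        # Simple case folding; fine for ASCII text.
        text, pattern = text.lower(), pattern.lower()
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, modulus)
    pattern_hash = window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus
    matches = []
    for i in range(n - m + 1):
        # Compare the strings only when the hashes agree, to rule out collisions.
        if pattern_hash == window_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            window_hash = (
                (window_hash - ord(text[i]) * high) * base + ord(text[i + m])
            ) % modulus
    return matches
```

The returned indices are what an editor would use to highlight the matches, and replace all amounts to walking that list from the end of the text backwards so that earlier indices stay valid.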

Beginning..

This blog is mostly about the projects and DIY things I did during my college life. I started late and there aren't many, but I thought it would be a good idea to post them somewhere.

They might come in handy for beginners, especially those who are interested in networking.