Monday, 1 October 2012

local-dropbox for home

The very idea of Dropbox fascinates me. Local-dropbox uses the bandwidth of the home wireless router to transfer large files from one device to another. The idea is to save internet bandwidth by converting a spare laptop into a server and using it to auto-sync files across different devices (laptops, smartphones, iPod). For instance, using Dropbox to transfer a movie is not efficient, as it will take hours to upload and sync (and it will ask for money :P). A wireless router at home can serve the same purpose with much lower latency and zero internet bandwidth, and it is more secure since no external device is involved.

code - https://github.com/pragya1990/local-dropbox

Used Python IDE and SimpleHTTPServer.

The code is still very raw, but I would still highlight how I went about it:

1) Initial connection: The server listens on the standard HTTP port for messages from the client. The client connects to the server over a socket, and initial control information such as client_id, client_port_for_file_transfer and server_port_for_file_transfer is exchanged between them. Standard OK messages are sent if everything went fine; otherwise appropriate error messages are shown.

2) File transfer: I used SimpleHTTPServer to open two ports (one at the server and another at the client) for transferring files. The urllib library was used for viewing the files at the other end and reading them.

3) Syncing: At the client end, the program continuously checks whether the dropbox folder has been updated. The log files of the client and the server should be exactly the same if no changes have happened on either side. If an update is needed, the client modifies its log file and sends a control message to the server to update its log file as well. The server compares its log file with the client's log and downloads/updates the required files. If it is a shared folder, control messages are also sent to the other clients that share the modified folder.

4) Testing: Initially both the client and the server run on the same laptop to make testing easy. Syncing and file updates work perfectly. :)
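
The log comparison in step 3 could look roughly like the sketch below. It is not the code from the repository: the function names `snapshot` and `diff_logs` are made up for illustration, it is written for Python 3, and it assumes the "log" is simply a mapping from each file's relative path to a hash of its contents.

```python
import hashlib
import os


def snapshot(folder):
    """Return {relative_path: sha1_of_contents} for every file under folder."""
    log = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha1(handle.read()).hexdigest()
            log[os.path.relpath(path, folder)] = digest
    return log


def diff_logs(old_log, new_log):
    """Compare two logs and return (added, modified, deleted) as sorted lists."""
    added = sorted(p for p in new_log if p not in old_log)
    deleted = sorted(p for p in old_log if p not in new_log)
    modified = sorted(
        p for p in new_log if p in old_log and new_log[p] != old_log[p]
    )
    return added, modified, deleted
```

If both logs are identical, all three lists come back empty and nothing needs to be transferred. Otherwise the added and modified paths are the files the other side has to download.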

HTTP forwarder


I like playing with network packets and I wanted a deeper understanding of how every layer works, not just theoretically but practically. Small projects really help a lot:
My previous exercises in analysing network traffic exposed me to the network layer, but only to a small extent.
Making a proxy gave me some idea of how the application layer works.

This particular project was aimed at developing a deeper understanding of all layers of the TCP/IP protocol suite. The program mainly involves changing physical and logical addresses, checking ports, and computing checksums at the Ethernet, network and transport layers.
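
To give an idea of the checksum part: the IPv4 header checksum is the one's-complement sum of 16-bit words described in RFC 1071. The sketch below is in Python for brevity (the project itself is in C), and the function name `internet_checksum` is just an illustrative choice. The TCP checksum uses the same arithmetic but also covers a pseudo-header, which is not shown here.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

After changing the source or destination IP, the checksum field is set to zero, the function is run over the header, and the result is written back. For the 20-byte header 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7 it returns 0xB861, and running it over a header that already carries a correct checksum gives 0.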

code - https://github.com/pragya1990/http-forwarder (written in C)


Consider a scenario in which one laptop (say, the server) acts as an HTTP forwarder (it has internet access over WLAN) for another laptop (say, the client, which is not connected to the internet), and the two laptops are connected to each other by LAN.

Steps while forwarding a packet:
1) Sending (in the child process): listen on eth1 and, if destination port == 80, modify the packet: (a) write the destination IP and source port to router.txt, (b) change the source IP, source and destination MAC addresses, and checksum, (c) inject the modified packet into the wlan1 interface using pcap_inject.
2) Receiving (in the parent process): listen on wlan1 and, if the source IP and destination port number exist in router.txt and source port == 80, modify the packet: (a) change the destination IP, source and destination MAC addresses, and checksum, (b) inject the modified packet into the eth0 interface using pcap_inject.
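
The matching logic of the two steps above can be sketched like this. It is only an illustration in Python, not the C code from the repository: the `Packet` and `Forwarder` names are hypothetical, an in-memory set stands in for router.txt, and MAC rewriting, checksums and pcap_inject are left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


class Forwarder:
    def __init__(self, wlan_ip, client_ip):
        self.wlan_ip = wlan_ip      # address of the forwarder's WLAN interface
        self.client_ip = client_ip  # address of the laptop on the LAN side
        self.flows = set()          # stands in for router.txt

    def outbound(self, pkt):
        """Client -> internet: forward only packets addressed to port 80."""
        if pkt.dst_port != 80:
            return None
        self.flows.add((pkt.dst_ip, pkt.src_port))
        return replace(pkt, src_ip=self.wlan_ip)

    def inbound(self, pkt):
        """Internet -> client: forward only replies to a recorded flow."""
        if pkt.src_port != 80 or (pkt.src_ip, pkt.dst_port) not in self.flows:
            return None
        return replace(pkt, dst_ip=self.client_ip)
```

A reply is accepted only if its source IP and destination port match what was recorded when the request went out, which is the same check the receiving step does against router.txt.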


The exchange with the remote server includes the TCP handshake packets (SYN, SYN-ACK, ACK) and one HTTP GET request.
Here's the Wireshark dump of one such exchange (162.254.3.1 is the IP address of the client laptop and 125.252.226.160 is some random site):
3 0.000370 162.254.3.1 125.252.226.160 TCP 51442 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460
5 0.063316 125.252.226.160 162.254.3.1 TCP http > 51442 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460
6 0.063568 162.254.3.1 125.252.226.160 TCP 51442 > http [ACK] Seq=1 Ack=1 Win=5840 Len=0
7 0.063996 162.254.3.1 125.252.226.160 HTTP GET / HTTP/1.1
8 0.064114 125.252.226.160 162.254.3.1 TCP http > 51443 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460
9 0.064381 162.254.3.1 125.252.226.160 TCP 51443 > http [ACK] Seq=1 Ack=1 Win=5840 Len=0

Sunday, 30 September 2012

HTTP proxy


I really wanted to make one and this assignment on the Princeton blog was a perfect guide to start with.

Code- https://github.com/pragya1990/proxy (written in C)

Proxy.c is a simple program which forwards client requests to the server and has the ability to filter/modify the messages reaching the client from the server depending on the type of proxy the user wants to implement.

Some of the HTTP status codes I used were for:
- bad_request 400
- forbidden 403
- hostless request
- internal error 500
- missing 500
- moved permanently 301
- not found 404
- unauthorised request 401

Socket programming is used for making a web connection with a web client. The program includes checks like validating client requests, changing buffer size etc.
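
As an illustration of what validating a client request can mean, here is a small Python sketch (the proxy itself is in C). The function name `validate_request` and the set of accepted methods are assumptions made for the example, not something taken from proxy.c.

```python
ALLOWED_METHODS = {"GET", "HEAD", "POST"}  # assumed set, for illustration only


def validate_request(raw: bytes):
    """Return (status_code, reason) for the head of a raw HTTP request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return 400, "Bad Request"
    method, _target, version = parts
    if method not in ALLOWED_METHODS:
        return 501, "Not Implemented"
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            return 400, "Bad Request"
        headers[name.strip().lower()] = value.strip()
    if version == "HTTP/1.1" and "host" not in headers:
        return 400, "Bad Request"  # the hostless request case
    return 200, "OK"
```

Only requests that come back as 200 would be forwarded to the server. For anything else the proxy can answer the client directly with the matching error response.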


traffic-analysis using tcpdump

Tcpdump is a pretty useful tool. Wireshark can also be used as an alternative.

code - https://github.com/pragya1990/traffic-analysis


Tcpdump stores captured packets in .pcap format. Using the pcap library functions, we can analyse the packets captured with tcpdump. More information can be found in 'man pcap'.

The program "pcap_program.c" reads the packets of a pcap file "packet.pcap".
It then maps the IP addresses to numbers, which are stored in map.txt.
The edges.txt file shows which IP addresses are talking to each other, and for how many seconds and microseconds.

In the terminal, I executed the command: tcpdump -i 3 -c 15 -w /home/hp/Desktop/tcpdump/packets.pcap
It captures 15 packets and saves them to packets.pcap.

The full terminal session, including the list of IP addresses:

root@ubuntu:~# tcpdump -i 3 -c 15 -w /home/hp/Desktop/tcpdump/packets.pcap
tcpdump: listening on wlan0, link-type EN10MB (Ethernet), capture size 96 bytes
15 packets captured
15 packets received by filter
0 packets dropped by kernel
root@ubuntu:~# tcpdump -n -q -r /home/hp/Desktop/tcpdump/packets.pcap
reading from file /home/hp/Desktop/tcpdump/packets.pcap, link-type EN10MB (Ethernet)
16:45:45.464511 IP 209.85.231.83.443 > 192.168.1.2.59417: tcp 52
16:45:45.464568 IP 192.168.1.2.59417 > 209.85.231.83.443: tcp 0
16:45:49.471288 IP 192.168.1.2.56531 > 192.168.1.1.53: UDP, length 36
16:45:49.493679 IP 192.168.1.1.53 > 192.168.1.2.56531: UDP, length 180
16:45:49.494077 IP 192.168.1.2.46097 > 75.101.153.231.80: tcp 0
16:45:49.723182 IP 192.168.1.2.46098 > 75.101.153.231.80: tcp 0
16:45:49.836979 IP 75.101.153.231.80 > 192.168.1.2.46097: tcp 0
16:45:49.837062 IP 192.168.1.2.46097 > 75.101.153.231.80: tcp 0
16:45:49.837767 IP 192.168.1.2.46097 > 75.101.153.231.80: tcp 482
16:45:50.062293 IP 75.101.153.231.80 > 192.168.1.2.46098: tcp 0
16:45:50.062343 IP 192.168.1.2.46098 > 75.101.153.231.80: tcp 0
16:45:50.182085 IP 75.101.153.231.80 > 192.168.1.2.46097: tcp 0
16:45:50.184581 IP 75.101.153.231.80 > 192.168.1.2.46097: tcp 231
16:45:50.184612 IP 192.168.1.2.46097 > 75.101.153.231.80: tcp 0
16:45:50.185293 IP 75.101.153.231.80 > 192.168.1.2.46097: tcp 0

root@ubuntu:~/Desktop/tcpdump# gcc -lpcap -o pcap_program pcap_program.c
root@ubuntu:~/Desktop/tcpdump# ./pcap_program


After compiling the program and executing ./pcap_program, two files, map.txt and edges.txt, were created.

In this program, I assumed that all packets have EtherType IP. However, while running the program several times, I realised that ARP packets sometimes show up too, and this caused an error because the IP header pointer is set at offset 14, right after the Ethernet header. I worked around the problem by using an offset of '0' for protocols other than IP, but it's not always correct.
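
A cleaner way to deal with this is to look at the EtherType field (bytes 12 and 13 of the Ethernet header) and skip anything that is not IPv4. Here is a hedged Python sketch of that check. The function name `ipv4_payload` is made up, and VLAN-tagged frames are not handled.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERNET_HEADER_LEN = 14  # 6 bytes dst MAC + 6 bytes src MAC + 2 bytes EtherType


def ipv4_payload(frame: bytes):
    """Return the IPv4 packet carried by an Ethernet frame, or None otherwise."""
    if len(frame) < ETHERNET_HEADER_LEN:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return None  # e.g. ARP, whose EtherType is 0x0806
    return frame[ETHERNET_HEADER_LEN:]
```

With a check like this, ARP frames are simply ignored instead of being parsed as if they carried an IP header.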

For finding the talking time of the IP addresses, I subtracted the timestamps of two consecutive IP packets. I am not sure if this is how we get the talking time, but it seems correct. The talking time of the last packet is not shown, as we need the next packet to compute it.
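
The mapping and the talking-time calculation can be sketched in Python as below. This is not pcap_program.c: it works on the text output of `tcpdump -n -q -r` instead of the pcap file, the names `parse` and `build_map_and_edges` are hypothetical, and it assumes the capture does not cross midnight.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches lines such as:
# 16:45:49.471288 IP 192.168.1.2.56531 > 192.168.1.1.53: UDP, length 36
LINE = re.compile(
    r"^(\d\d:\d\d:\d\d\.\d{6}) IP "
    r"(\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\d+:"
)


def parse(lines):
    """Turn tcpdump text lines into (timestamp, src_ip, dst_ip) tuples."""
    packets = []
    for line in lines:
        match = LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            packets.append((stamp, match.group(2), match.group(3)))
    return packets


def build_map_and_edges(packets):
    """Number the IPs in order of appearance and sum talking time per edge."""
    ids = {}
    for _stamp, src, dst in packets:
        for ip in (src, dst):
            ids.setdefault(ip, len(ids))
    edges = defaultdict(timedelta)
    # Talking time of a packet = next packet's timestamp minus its own,
    # so the last packet contributes nothing.
    for (stamp, src, dst), (next_stamp, _s, _d) in zip(packets, packets[1:]):
        edges[(ids[src], ids[dst])] += next_stamp - stamp
    return ids, dict(edges)
```

Each edge value is a `timedelta`, so its `seconds` and `microseconds` attributes give the two numbers that go into edges.txt.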

We can also visualize this traffic using the igraph library, and interesting analytics could be done on these graphs by studying how the nodes communicate, one example being "Six degrees of separation". :)

text-editor in python

Python is an extremely simple language and a good option to start with small projects when you are still a noob at programming.

code - https://github.com/pragya1990/editor

Used Python IDE and the Tkinter library for the GUI.

The editor file is the starting point. 
The code should work except that the path needs to be changed for dictionary.txt and for all the gif files that can be downloaded from github. 
- The path for dictionary.txt appears in find_misspelled_word.py 
- the path for *.gif files appears in Interface_2.py

Just download the code and type: "python editor.py" in the terminal.

Screenshot - 


Pointers: 
  • Spell-check : A misspelled word is detected by looking it up in a dictionary stored as a hash table. To build the list of suggestions, each dictionary word gets a score that increases with its similarity to the misspelled word. Similarity takes into account the length of the common substring, the number of common letters, and whether the word starts with the same letter as the misspelled word. For instance, for the misspelled word "utail", "tail" ranks high because the common substring has length 4 (very high considering that "utail" is only 5 letters long). The ten words with the highest scores are displayed in the list of suggestions.
  • Undo-Redo : This was the most interesting part. After getting it wrong many times, the final solution had an undo stack and a redo stack. The attributes associated with an event (keypress) were event_type, start, end, value, operation. Start and end define the index range, value stores the string, and event_type defines the behaviour: delete/insert the string or shift the cursor. On pressing the undo button, the event at the top of the undo stack is executed, popped, and pushed onto the redo stack. The redo button does the same in the other direction.
  • Find and Replace : It supports features such as match case, match entire word only, search backwards and wrap around. Boyer-Moore would become inefficient when all permutations of a pattern are considered, so a modified Rabin-Karp was used instead. Matched patterns were highlighted. Replace had additional buttons for replace one and replace all.
  • Functions for open file, save, quit and new were simple, as they were already provided by the library.
Overall, it turned out to be 2330 lines of code divided into 24 files.
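
The two-stack idea behind undo and redo can be shown with a minimal sketch. This is not the editor's actual code: the `TextBuffer` class is hypothetical, it works on a plain string instead of a Tkinter widget, it only handles insert and delete events, and clearing the redo stack on a fresh edit is my own assumption.

```python
class TextBuffer:
    def __init__(self):
        self.text = ""
        self.undo_stack = []  # events that can be undone
        self.redo_stack = []  # events that were undone and can be redone

    def _apply(self, op, start, value):
        if op == "insert":
            self.text = self.text[:start] + value + self.text[start:]
        else:  # "delete"
            self.text = self.text[:start] + self.text[start + len(value):]

    def insert(self, start, value):
        self._apply("insert", start, value)
        self.undo_stack.append(("insert", start, value))
        self.redo_stack.clear()

    def delete(self, start, end):
        value = self.text[start:end]  # remember what was removed
        self._apply("delete", start, value)
        self.undo_stack.append(("delete", start, value))
        self.redo_stack.clear()

    def undo(self):
        if not self.undo_stack:
            return
        op, start, value = self.undo_stack.pop()
        inverse = "delete" if op == "insert" else "insert"
        self._apply(inverse, start, value)
        self.redo_stack.append((op, start, value))

    def redo(self):
        if not self.redo_stack:
            return
        op, start, value = self.redo_stack.pop()
        self._apply(op, start, value)
        self.undo_stack.append((op, start, value))
```

Storing the deleted string in the event is what makes a delete reversible: undoing it is just an insert of the same value at the same index.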

Beginning..

This blog is mostly about the projects and DIY things I did during my college life. I started late and there aren't many, but I thought it would be a good idea to post them somewhere.

They might come in handy for beginners, especially those who are interested in networking.