Automate Everything w/ Bash, Linux & Command Line
  1. Google Page Rank Bash Script

    You can get the Google Page Rank for a page in many ways. Perhaps the easiest is to install a web browser extension; they’re widely available for Google Chrome and Firefox. In Chrome, I’m currently using one simply called "PageRank" and it works well.

    What if you need something more? What if you need to get Google Page Rank outside of your browser toolbar? Maybe you’re working on link building and you have a list of sites/pages you’re considering and want to pull Google Page Rank for each of them. Or maybe you just want to track Google Page Rank changes for your own pages over time and compare them to your competitors. There are many reasons why you may need to pull this data in an automated manner.

    How to Look-up Google Page Rank

    Asking Google for a page’s PR isn’t a complicated request. It requires the URL of the page you’re asking about (duh…) and the checksum of that URL, as calculated by Google’s algorithm. The checksum is the difficult piece.

    Google Page Rank Reporting Options

    Many of the popular scripting languages have add-on modules for calculating the checksum needed to query Google Page Rank, or will actually make the full query. However, I like the simplicity of Bash and wanted to figure out how to calculate the checksum in Bash. I failed, on multiple attempts.

    Fortunately, I did find a program written in C which was previously published at http://zhiwei.li/. At this point the page where the code was is 404ing, so I’ve decided to publish it below.

    /******************************************************************************
    Filename     : pagerank.c
    Description  : Google PageRank Checksum Algorithm 
    Author       : http://zhiwei.li/
    Log          : Ver 0.1 2005-9-13  first release
                   Ver 1.0 2005-10-19  fixed :final character bug
                   Ver 1.1 2006-10-05  refine code
                   Ver 1.2 2008-8-20   use boolean type
    ******************************************************************************/
    
    #include <stdio.h>
    #include <stdbool.h>
    
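    /* Unsigned arithmetic: the hash relies on wrapping at 32 bits, and signed
       overflow is undefined behaviour in C. */
    unsigned int ConvertStrToInt(const char *pStr, unsigned int Init, unsigned int Factor)
    {
        while (*pStr) {
            Init *= Factor;
            Init += *pStr++;
        }
        return Init;
    }
    
    unsigned int HashURL(const char *pStr)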
    {
        unsigned int C1, C2, T1, T2;
    
        C1 = ConvertStrToInt(pStr, 0x1505, 0x21);
        C2 = ConvertStrToInt(pStr, 0, 0x1003F);
        C1 >>= 2;
        C1 = ((C1 >> 4) & 0x3FFFFC0) | (C1 & 0x3F);
        C1 = ((C1 >> 4) & 0x3FFC00) | (C1 & 0x3FF);
        C1 = ((C1 >> 4) & 0x3C000) | (C1 & 0x3FFF);
    
        T1 = (C1 & 0x3C0) << 4;
        T1 |= C1 & 0x3C;
        T1 = (T1 << 2) | (C2 & 0xF0F);
    
        T2 = (C1 & 0xFFFFC000) << 4;
        T2 |= C1 & 0x3C00;
        T2 = (T2 << 0xA) | (C2 & 0xF0F0000);
    
        return (T1 | T2);
    }
    
    char CheckHash(unsigned int HashInt)
    {
        int Check = 0;
        bool Flag = false;
        int Remainder;
    
        do {
            Remainder = HashInt % 10;
            HashInt /= 10;
            if (Flag){
                Remainder += Remainder;
                Remainder = (Remainder / 10) + (Remainder % 10);
            }
            Check += Remainder;
            Flag = !Flag;
        } while( 0 != HashInt);
    
        Check %= 10;
        if (0 != Check) {
            Check = 10 - Check;
            if (Flag) {
                if (1 == (Check % 2)) {
                    Check += 9;
                }
                Check >>= 1;
            }
        }
        Check += 0x30;
        return Check;
    }
    
    int main(int argc, char* argv[])
    {
        unsigned int HashInt;
    
        if (argc != 2) {
            printf("Usage: %s [URL]\n",argv[0]);
            return 1;
        }
    
        HashInt = HashURL(argv[1]);
        printf("Checksum=7%c%u\n", CheckHash(HashInt), HashInt);
        return 0;
    }
    

    Since this is C source code, you have to compile it before you can execute it. Save the code as pagerank.checksum.c and then run the following command to compile it. This works great in Linux, and should also work in OS X and Cygwin.

    gcc -o pagerank.checksum pagerank.checksum.c
    

    This will create the binary named pagerank.checksum. This part of the program only gives you the checksum value. You can test that it’s working by executing it on the command line with the page you want as an argument, like this:

    ./pagerank.checksum http://www.google.com
    

    It will output Checksum=791322981365 as the value. The hard part is over. Now you can use that binary in the following Bash script to query the Google Page Rank. Save it as something like pr.sh.

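    #!/bin/bash
    # This script uses the C program sourced from http://zhiwei.li/ to calculate the page checksum needed to query Google Page Rank.
    page=$1
    # Percent-encode the slashes and colons in the URL for use in the query string.
    page_encoded=$(printf '%s\n' "$page" | sed 's/\//%2F/g;s/:/%3A/g')
    # Strip the "Checksum=" label, leaving only the checksum value.
    checksum=$(./pagerank.checksum "$page" | sed 's/Checksum=//')
    pr_request="http://toolbarqueries.google.com/tbr?client=navclient-auto&ch=$checksum&ie=UTF-8&oe=UTF-8&features=Rank&q=info:$page_encoded"
    # The response is colon-separated; the rank is the third field.
    curl -s "$pr_request" | cut -d":" -f3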
    

    Then run it like this.

    ./pr.sh http://www.google.com
    

    And you’ll see that Google currently gives itself a rank of 9. That’s it. You’ve now combined a program written in C and a Bash script to get Google Page Rank for a page.
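
The sed step in pr.sh only escapes `/` and `:`, so a URL containing a query string (`?`, `&`, `=`) or a space would be sent unescaped. Below is a minimal sketch of a fuller encoder, assuming ASCII-only URLs. The name `urlencode` is a hypothetical helper, not part of the original script.

```bash
#!/bin/bash
# Percent-encode every character outside the RFC 3986 "unreserved" set
# (letters, digits, ".", "_", "~", "-").
# Assumption: the input is plain ASCII; multi-byte characters are not handled.
urlencode() {
    local rest ch out
    rest=$1
    out=
    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}   # first character of what is left
        rest=${rest#?}           # drop that character
        case $ch in
            [A-Za-z0-9._~-]) out=$out$ch ;;
            *) out=$out$(printf '%%%02X' "'$ch") ;;
        esac
    done
    printf '%s\n' "$out"
}
```

In pr.sh this could stand in for the sed pipeline, for example `page_encoded=$(urlencode "$page")`. For a plain URL like http://www.google.com the result is the same as the sed version.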

    What if you need to pull PR for a huge list of pages? That’s also easy. Save all the pages you want to check into a text file called pages.to.check.txt, one URL per line, and then run this command.

    while IFS= read -r URL; do printf '%s\t%s\n' "$URL" "$(./pr.sh "$URL")"; done < pages.to.check.txt
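
For tracking rank changes over time, the loop above can be wrapped so each run appends date-stamped rows to a log. This is only a sketch under a few assumptions: the function name `log_ranks`, the log file name in the usage note, the `PAUSE` variable, and the one-second pause between requests are all illustrative choices, not something the original loop does.

```bash
#!/bin/bash
# Append one "date<TAB>url<TAB>rank" row per URL to a log file.
# Usage: log_ranks LIST_FILE LOG_FILE [LOOKUP_COMMAND]
# Assumptions: LOOKUP_COMMAND defaults to ./pr.sh in the current directory;
# PAUSE (seconds between requests) defaults to 1.
log_ranks() {
    local list log lookup today url
    list=$1
    log=$2
    lookup=${3:-./pr.sh}
    today=$(date +%F)                              # e.g. 2024-01-31
    while IFS= read -r url || [ -n "$url" ]; do
        case $url in ''|'#'*) continue ;; esac     # skip blank lines and # comments
        # </dev/null keeps the lookup from swallowing the rest of the list
        printf '%s\t%s\t%s\n' "$today" "$url" "$("$lookup" "$url" </dev/null)" >> "$log"
        sleep "${PAUSE:-1}"
    done < "$list"
}
```

A run might look like `log_ranks pages.to.check.txt pagerank.history.tsv`. Because rows are appended, repeated runs on different days build up a history that can be filtered with grep or cut.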
    

    Maybe at some point I’ll bundle this all up for download…
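
Until that bundle exists, here is a rough sketch for anyone who would rather not compile anything: the same checksum arithmetic as the C program, step for step, written as shell functions. The function names (`str_to_int`, `hash_url`, `check_digit`, `pagerank_checksum`) are made up for this example. It assumes ASCII-only URLs and 64-bit shell arithmetic (as in Bash), masking each step to 32 bits to mimic C's `unsigned int`.

```bash
#!/bin/bash
# Sketch: the checksum arithmetic of pagerank.checksum.c using shell integers.
# Assumptions: ASCII-only URLs; shell arithmetic is 64-bit, so results are
# masked back to 32 bits wherever the C code would wrap.

MASK=$(( 0xFFFFFFFF ))

# Counterpart of ConvertStrToInt: acc = acc * factor + character code.
str_to_int() {
    local rest acc factor ch code
    rest=$1 acc=$2 factor=$3
    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}        # first remaining character
        rest=${rest#?}
        code=$(printf '%d' "'$ch")    # its numeric (ASCII) code
        acc=$(( (acc * factor + code) & MASK ))
    done
    echo "$acc"
}

# Counterpart of HashURL: mix two string hashes into one 32-bit value.
hash_url() {
    local c1 c2 t1 t2
    c1=$(str_to_int "$1" $(( 0x1505 )) $(( 0x21 )))
    c2=$(str_to_int "$1" 0 $(( 0x1003F )))
    c1=$(( c1 >> 2 ))
    c1=$(( ((c1 >> 4) & 0x3FFFFC0) | (c1 & 0x3F) ))
    c1=$(( ((c1 >> 4) & 0x3FFC00) | (c1 & 0x3FF) ))
    c1=$(( ((c1 >> 4) & 0x3C000) | (c1 & 0x3FFF) ))

    t1=$(( ((c1 & 0x3C0) << 4) | (c1 & 0x3C) ))
    t1=$(( ((t1 << 2) | (c2 & 0xF0F)) & MASK ))

    t2=$(( (((c1 & 0xFFFFC000) << 4) & MASK) | (c1 & 0x3C00) ))
    t2=$(( ((t2 << 10) | (c2 & 0xF0F0000)) & MASK ))

    echo $(( (t1 | t2) & MASK ))
}

# Counterpart of CheckHash: a check digit computed over the decimal digits.
check_digit() {
    local n check flag rem
    n=$1 check=0 flag=0
    while :; do
        rem=$(( n % 10 ))
        n=$(( n / 10 ))
        if [ "$flag" -eq 1 ]; then    # every second digit is doubled and folded
            rem=$(( rem * 2 ))
            rem=$(( rem / 10 + rem % 10 ))
        fi
        check=$(( check + rem ))
        flag=$(( 1 - flag ))
        [ "$n" -ne 0 ] || break
    done
    check=$(( check % 10 ))
    if [ "$check" -ne 0 ]; then
        check=$(( 10 - check ))
        if [ "$flag" -eq 1 ]; then
            if [ $(( check % 2 )) -eq 1 ]; then check=$(( check + 9 )); fi
            check=$(( check >> 1 ))
        fi
    fi
    echo "$check"
}

# Same layout as the C program's output, minus the "Checksum=" label.
pagerank_checksum() {
    local hash
    hash=$(hash_url "$1")
    echo "7$(check_digit "$hash")$hash"
}

# Print a checksum only when a URL is given on the command line.
if [ "$#" -gt 0 ]; then pagerank_checksum "$1"; fi
```

Saved as, say, checksum.sh (a hypothetical name), `bash checksum.sh http://www.google.com` should print 791322981365, the same value the C binary reports above. Expect it to be noticeably slower than the compiled version, since it starts a subshell for every character of the URL.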

    Happy automation!

     