Jobs

From stacky wiki
Revision as of 12:44, 16 October 2011 by Anton

Here's a Python script that takes a list of university names and produces an HTML page listing every job at those universities that is posted on mathjobs. It uses the mechanize and lxml packages and logs in with your mathjobs account.

#!/usr/bin/python
#

email = "YOUREMAILHERE"
passwd= "YOURPASSWORDHERE"
names = ["City University of New York","Georgia Institute of Technology","Indiana University"]
targetfile = "jobs.html"

from mechanize import Browser
from lxml import etree
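try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import BytesIO as StringIO  # Python 3: the response body is bytes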
import time

br = Browser()

# need to log in first
br.open("https://www.mathjobs.org/jobs?info-ja")
br.select_form(nr=0)
br["email"]=email
br["pass"]=passwd
br.submit()

# now search for jobs
results = etree.Element("table")
first = True
for n in names:
    br.open("https://www.mathjobs.org/jobs?jobsearch")
    br.select_form(name="mainForm")
    br["Name"]=n
    br.submit()
    page = etree.parse(StringIO(br.response().read()), etree.HTMLParser()).getroot()
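    try:
        resulttable = page.find("body").findall("table")[1]
    except (AttributeError, IndexError):
        # no <body>, or fewer than two tables: not the expected results page
        print("trouble with " + n)
        time.sleep(5)
        continue
    # Copy the row list first: lxml moves an element when it is appended
    # elsewhere, which would shift the indices of the remaining rows.
    rows = list(resulttable)
    if first and rows:
        results.append(rows[0]) # include the header row the first time
        first = False
    results.extend(rows[1:]) # skip the header row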
    time.sleep(5) # wait five seconds to be gentle on the mathjobs server

page[1][3]=results  # plug the concatenated results into the last viewed page
# etree.tostring returns a byte string, so write in binary mode
with open(targetfile, "wb") as f:
    f.write(etree.tostring(page))
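
The merging step in the loop (keep the header row from the first table, then only the data rows from every table) can be tried offline on small inline tables. The following is only a sketch: it uses the standard library's xml.etree.ElementTree rather than lxml, the function name merge_tables and the sample tables are made up for illustration, and it assumes the header is the first row of each table. Unlike lxml, ElementTree does not move an element when it is appended, but copying the row list first is harmless here and is what makes the same logic safe under lxml.

```python
import xml.etree.ElementTree as ET

def merge_tables(tables):
    """Merge tables into one, keeping only the first table's header row.

    Hypothetical helper for illustration; assumes row 0 of each table
    is its header row.
    """
    merged = ET.Element("table")
    first = True
    for table in tables:
        rows = list(table)  # snapshot of the rows before moving anything
        if not rows:
            continue  # nothing to merge from an empty table
        if first:
            merged.append(rows[0])  # header row, only once
            first = False
        merged.extend(rows[1:])  # data rows only
    return merged

# Made-up sample data standing in for two search-result tables.
a = ET.fromstring("<table><tr><th>Job</th></tr><tr><td>Postdoc A</td></tr></table>")
b = ET.fromstring("<table><tr><th>Job</th></tr><tr><td>Postdoc B</td></tr></table>")
merged = merge_tables([a, b])
cells = [cell.text for row in merged for cell in row]
```

Here merged ends up with one header row followed by one data row from each table.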

The links in jobs.html are relative to the mathjobs site, so make them absolute with

sed -e 's|<a href="/jobs|<a href="https://www.mathjobs.org/jobs|g' jobs.html > jobs1.html
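
If sed isn't available, the same substitution can be done in Python. This is a minimal sketch, not part of the original script: the names absolutize_links and fix_file are hypothetical, and it assumes the links look exactly like the pattern in the sed command above.

```python
MATHJOBS = "https://www.mathjobs.org"

def absolutize_links(html, base=MATHJOBS):
    """Prefix site-relative /jobs links with the mathjobs host.

    Hypothetical helper; performs the same literal substitution as the
    sed command, nothing smarter.
    """
    return html.replace('<a href="/jobs', '<a href="' + base + '/jobs')

def fix_file(src="jobs.html", dst="jobs1.html"):
    """Read src, rewrite its links, and write the result to dst."""
    with open(src) as f:
        html = f.read()
    with open(dst, "w") as f:
        f.write(absolutize_links(html))
```

Calling fix_file() with no arguments reads jobs.html and writes jobs1.html, matching the file names used above.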

You'll need to configure your browser not to send a referrer, since mathjobs doesn't like people linking directly to job postings. For Firefox, you can use the RefControl extension.

Then you can open jobs1.html in your browser, and click the links to go to the jobs. Note that the wizard tool won't work.