Sane PCRE URL Regex for PHP

So it turns out that there are some "oddities" in the URL spec: strings that don't look like URLs are actually valid URLs, and thus PHP's filter_var lets them pass. A TLD is not a requirement, for example.
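To make that concrete, here's a minimal sketch of the filter_var behaviour in question. The helper name `passes_filter_var` and the sample inputs are made up for illustration; only `filter_var` and `FILTER_VALIDATE_URL` are real PHP. On the PHP versions I'd expect you to be running, the TLD-less hosts should come back as accepted, but treat this as something to run yourself rather than a guarantee.

```php
<?php
// Hypothetical helper (not part of PHP or WordPress): wraps filter_var
// so we get a plain boolean instead of "the string or false".
function passes_filter_var(string $candidate): bool
{
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

// Illustrative inputs only.
$candidates = [
    'http://localhost', // host without a TLD
    'http://example',   // single-label host, also no TLD
    'not a url',        // no scheme and no host
];

foreach ($candidates as $candidate) {
    printf(
        "%-18s %s\n",
        $candidate,
        passes_filter_var($candidate) ? 'accepted' : 'rejected'
    );
}
```

The point is only that "valid according to filter_var" is a much looser bar than "looks like a website address".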

This created some interesting conflicts, though. I'm building an exporter that moves content out of WordPress and into Ghost (yes, I'm the guy you need to bug if you have issues with the Ghost plugin).

Upon import, data is checked against a validator, specifically this one: https://github.com/chriso/validator.js

WordPress allows users to enter things that only loosely resemble a URL, and if I export them as is, they fail on import to Ghost. For example, according to WordPress, this is something that can be saved in the website field: http://<script>alert(derp);</script>. The characters would be escaped, but it's clearly not a URL. Ghost chokes on it.

Anyway, I set out to port the regexp used in validator.js to PHP. Enjoy: