CUTCODEDOWN
Minimalist Semantic Markup


Author Topic: PROBLEM - URL Case  (Read 1461 times)

benanamen

  • Full Member
  • ***
  • Posts: 160
  • Karma: +14/-0
PROBLEM - URL Case
« on: 4 Dec 2020, 06:23:22 pm »
Squire and Paladin currently allow any case, including mixed case, for the URL path. This should not be so.

Consider the /contact path. There are 128 case permutations of the word "contact" all of which will work in the URL.

Just by adding one more letter (/contacts) there would be 256 case permutations all of which would work.
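
The arithmetic is 2^n for a word of n letters, since each letter can be upper or lower case. A minimal sketch that enumerates the variants, assuming plain ASCII paths; `casePermutations` is a hypothetical helper name, not part of Squire or Paladin:

```php
<?php
// Hypothetical helper: enumerates every upper/lower case variant of an ASCII word.
function casePermutations(string $word): array
{
    $variants = [''];
    foreach (str_split($word) as $char) {
        $lower = strtolower($char);
        $upper = strtoupper($char);
        // Characters without a case (digits, hyphens) contribute one form only.
        $forms = ($lower === $upper) ? [$char] : [$lower, $upper];
        $next = [];
        foreach ($variants as $prefix) {
            foreach ($forms as $form) {
                $next[] = $prefix . $form;
            }
        }
        $variants = $next;
    }
    return $variants;
}

echo count(casePermutations('contact')), "\n";  // 2^7 = 128
echo count(casePermutations('contacts')), "\n"; // 2^8 = 256
```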

There should be one, and only one case that should work, all lowercase. Any other case should result in a 404.


These cases all work  :o

Code: [Select]
contact Contact cOntact COntact coNtact CoNtact cONtact CONtact conTact ConTact cOnTact COnTact coNTact CoNTact cONTact CONTact contAct ContAct cOntAct COntAct coNtAct CoNtAct cONtAct CONtAct conTAct ConTAct cOnTAct COnTAct coNTAct CoNTAct cONTAct CONTAct contaCt ContaCt cOntaCt COntaCt coNtaCt CoNtaCt cONtaCt CONtaCt conTaCt ConTaCt cOnTaCt COnTaCt coNTaCt CoNTaCt cONTaCt CONTaCt contACt ContACt cOntACt COntACt coNtACt CoNtACt cONtACt CONtACt conTACt ConTACt cOnTACt COnTACt coNTACt CoNTACt cONTACt CONTACt contacT ContacT cOntacT COntacT coNtacT CoNtacT cONtacT CONtacT conTacT ConTacT cOnTacT COnTacT coNTacT CoNTacT cONTacT CONTacT contAcT ContAcT cOntAcT COntAcT coNtAcT CoNtAcT cONtAcT CONtAcT conTAcT ConTAcT cOnTAcT COnTAcT coNTAcT CoNTAcT cONTAcT CONTAcT contaCT ContaCT cOntaCT COntaCT coNtaCT CoNtaCT cONtaCT CONtaCT conTaCT ConTaCT cOnTaCT COnTaCT coNTaCT CoNTaCT cONTaCT CONTaCT contACT ContACT cOntACT COntACT coNtACT CoNtACT cONtACT CONtACT conTACT ConTACT cOnTACT COnTACT coNTACT CoNTACT cONTACT CONTACT
To save time, let's just assume I am never wrong.

GrumpyYoungMan

  • Hero Member
  • *****
  • Posts: 633
  • Karma: +8/-0
    • GrumpyYoungMan
Re: PROBLEM - URL Case
« Reply #1 on: 5 Dec 2020, 01:13:11 am »
Why is this a problem? It’s a genuine question...

If we know it is meant to be “contact”, why should “contact” and “Contact” be able to generate different pages depending on the case?
Trying to learn a new trick to prove old dogs can learn new ones...

Total Novice have-a go Amateur Programmer - not sure that is the right thing to say... but trying to learn...

benanamen

  • Full Member
  • ***
  • Posts: 160
  • Karma: +14/-0
Re: PROBLEM - URL Case
« Reply #2 on: 5 Dec 2020, 01:22:38 am »
There are no different pages generated by different cases... and that is the problem.

The problem comes when you get dinged by search engines for "duplicate content" and get a lower search ranking. Anyone can link to your site by any of the URL variations. Those links get indexed by search engines. If search ranking is not important to you then it doesn't matter.

Besides, it's just sloppy that it can do that.
To save time, let's just assume I am never wrong.

John_Betong

  • Full Member
  • ***
  • Posts: 218
  • Karma: +23/-1
    • The Fastest Joke Site On The Web
Re: PROBLEM - URL Case
« Reply #3 on: 6 Dec 2020, 03:14:54 am »
The following is "belt and braces": not recommended, but it does solve the problem, which should have been solved at the source:

Code: [Select]
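// Lowercase the request URI and emit it as the canonical URL, including the host.
$url = strtolower($_SERVER['REQUEST_URI']);
echo '<link rel="canonical" href="https://' . $_SERVER['HTTP_HOST'] . htmlspecialchars($url) . '" />';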
Retired in the City of Angels where the weather suits my clothes

fgm

  • Jr. Member
  • **
  • Posts: 58
  • Karma: +5/-0
Re: PROBLEM - URL Case
« Reply #4 on: 6 Dec 2020, 03:58:17 am »
In similar cases what I do is set the canonical URL.

This is what I do for www vs non-www. Instead of using a 301 redirect from non-www to www, I set the www URL as canonical, so I save one redirect on the first connection.
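
A rough sketch of that idea, assuming the www host is the preferred one and that query strings should be left out of the canonical URL (both are assumptions, as is the helper name `canonicalLink`):

```php
<?php
// Hypothetical helper: builds a canonical <link> that always points at the www host.
function canonicalLink(string $host, string $requestUri): string
{
    // Assumption: the preferred host is the www variant.
    if (strncasecmp($host, 'www.', 4) !== 0) {
        $host = 'www.' . $host;
    }
    // Assumption: drop the query string so parameter variations share one canonical URL.
    $path = parse_url($requestUri, PHP_URL_PATH) ?: '/';
    $href = 'https://' . strtolower($host) . $path;
    return '<link rel="canonical" href="' . htmlspecialchars($href, ENT_QUOTES) . '">';
}

echo canonicalLink('example.com', '/contact?ref=mail'), "\n";
// <link rel="canonical" href="https://www.example.com/contact">
```

In a real page the host would typically come from `$_SERVER['HTTP_HOST']` and the URI from `$_SERVER['REQUEST_URI']`; `example.com` here is only a placeholder.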
« Last Edit: 6 Dec 2020, 06:58:34 am by fgm »

Jason Knight

  • Administrator
  • Hero Member
  • *****
  • Posts: 832
  • Karma: +154/-1
    • CutCodeDown -- Minimalist Semantic Markup
Re: PROBLEM - URL Case
« Reply #5 on: 6 Dec 2020, 10:21:13 am »
Laughably this "error" occurs only on Windows. The Windows filesystem is case insensitive; the ENTIRE rest of the world is not.

I hadn't thought of that since I don't use Windows for ACTUAL server use, only for initial testing via XAMPP.

Thanks, that's something I should add a safety check for, at least for the actions.

Hmm... glob instead of file_exists? It does return case-sensitive results.

Something like ...

Code: [Select]
!empty($match = glob('/actions/' . $ACTION, GLOB_ONLYDIR)) &&
($match !== false)

... for the directory matching. File matching could be similarly implemented.

Would only be needed for Winblows though, so I'd probably use a wrapping file check function or object, with which one is loaded being based on capabilities detection. One of the few times overloading a PHP function would be nice. Actually that code would work on all systems and not be significant overhead. The CHECK for it might take more time than just doing it.
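
One possible shape for such a wrapping check, sketched with scandir() rather than glob(): scandir() reports names as they are stored on disk, so a strict comparison is byte-exact even on a case-insensitive filesystem. The helper name `existsWithExactCase` and the `actions` layout in the usage note are assumptions for illustration only:

```php
<?php
// Hypothetical helper: true only if $name exists in $parent with exactly this case.
function existsWithExactCase(string $parent, string $name): bool
{
    // Reject anything that could step outside $parent.
    if ($name === '' || $name === '.' || $name === '..' || strpbrk($name, "/\\") !== false) {
        return false;
    }
    $entries = is_dir($parent) ? scandir($parent) : false;
    // scandir() returns names as stored on disk; strict in_array() compares them exactly.
    return $entries !== false && in_array($name, $entries, true);
}
```

Usage would be along the lines of `existsWithExactCase(__DIR__ . '/actions', $ACTION)` before loading the action. It costs one directory read per request, which may or may not matter compared to detecting the filesystem type first.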

As much as I'd like to say "F*** the halfwit morons hosting on Windows" I know that's not a viable choice.
I'll fix every flaw, I'll break every law, I'll tear up the rulebook if that's what it takes. You will see, I will crush this cold machine.

GrumpyYoungMan

  • Hero Member
  • *****
  • Posts: 633
  • Karma: +8/-0
    • GrumpyYoungMan
Re: PROBLEM - URL Case
« Reply #6 on: 6 Dec 2020, 10:49:46 am »
Useless information and adds nothing to this conversation!

But I have both a local Windows 10 server and a local Ubuntu server, and then a dedicated public-facing CentOS server for my live website...
« Last Edit: 6 Dec 2020, 10:52:04 am by GrumpyYoungMan »
Trying to learn a new trick to prove old dogs can learn new ones...

Total Novice have-a go Amateur Programmer - not sure that is the right thing to say... but trying to learn...

benanamen

  • Full Member
  • ***
  • Posts: 160
  • Karma: +14/-0
Re: PROBLEM - URL Case
« Reply #7 on: 6 Dec 2020, 11:52:33 am »
Quote
The following is "belt and braces": not recommended, but it does solve the problem, which should have been solved at the source:

Code: [Select]
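// Lowercase the request URI and emit it as the canonical URL, including the host.
$url = strtolower($_SERVER['REQUEST_URI']);
echo '<link rel="canonical" href="https://' . $_SERVER['HTTP_HOST'] . htmlspecialchars($url) . '" />';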

Hi John,

strtolower was initially how I solved the "routing" problem when it came to code but then I realized the problem with this "fix". As I already explained, it is possible for any of the permutations to be indexed by search engines and/or be seen as "duplicate content", thus lowering your search rankings. So, strtolower is not the solution. There should only be one acceptable URL path (all lowercase, no spaces). Anything else should result in a 404 even though the letters are the same.
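
A minimal sketch of that rule, assuming the allowed characters are a-z, 0-9, hyphens and slashes; the function names are hypothetical and this is not how Squire or Paladin currently route:

```php
<?php
// Hypothetical helper: accepts only lowercase paths made of a-z, 0-9, hyphens and slashes.
function isCanonicalPath(string $path): bool
{
    return preg_match('~^/[a-z0-9/-]*$~D', $path) === 1;
}

// Hypothetical front-controller check: returns the status a request would get.
function statusFor(string $requestUri): int
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    return (is_string($path) && isCanonicalPath($path)) ? 200 : 404;
}

echo statusFor('/contact'), "\n"; // 200
echo statusFor('/Contact'), "\n"; // 404
```

Encoded spaces such as `/my%20page` fail the same check because `%` is not in the allowed set.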
To save time, let's just assume I am never wrong.

John_Betong

  • Full Member
  • ***
  • Posts: 218
  • Karma: +23/-1
    • The Fastest Joke Site On The Web
Re: PROBLEM - URL Case
« Reply #8 on: 6 Dec 2020, 10:34:38 pm »
Quote
Hi John, strtolower was initially how I solved the "routing" problem when it came to code but then I realized the problem with this "fix". As I already explained, it is possible for any of the permutations to be indexed by search engines and/or be seen as "duplicate content", thus lowering your search rankings. So, strtolower is not the solution. There should only be one acceptable URL path (all lowercase, no spaces). Anything else should result in a 404 even though the letters are the same.

Hi @benanamen,


You missed how the strtolower(...) function result was used.

Well over fifteen years ago I started a joke site and learned a lot about SEO Brownie Points.

I progressed from:

1. using links with numeric parameters to search static joke web pages

2. changed the static web pages to parameters with spaces between joke title words

3. changed spaces to underscores

4. changed underscores to hyphens

5. changed joke titles to Proper Case

6. earlier on I added a MySQL database

Progressive changes were due to learning from SEO gurus such as Google. Each progression created problems with "duplicate content", which I overcame by using canonical links so that the previous joke parameter points to the new joke parameter.

The numeric and Proper Case parameters still work, but I have dropped the conversion from items 2 ... 4, as can be seen from the following four links:

https://www.johns-jokes.com/2001

https://www.johns-jokes.com/golf-with-a-gorilla

https://www.johns-jokes.com/Golf-With-a-Gorilla

www.johns-jokes.com/GOLF-WITH-A-GORILLA?size-medium&color=blue
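
How the different title forms can be folded into one canonical value can be sketched as follows. This assumes a lowercase hyphenated slug is chosen as the canonical form, which may not match what the site above actually does; `canonicalSlug` is a hypothetical name:

```php
<?php
// Hypothetical helper: collapses spaces, underscores and mixed case into one hyphenated slug.
function canonicalSlug(string $title): string
{
    $slug = strtolower(rawurldecode($title));
    // Treat runs of whitespace, underscores and hyphens as a single hyphen.
    $slug = preg_replace('~[\s_-]+~', '-', $slug);
    return trim($slug, '-');
}

foreach (['Golf With a Gorilla', 'golf_with_a_gorilla', 'GOLF-WITH-A-GORILLA'] as $variant) {
    echo canonicalSlug($variant), "\n"; // golf-with-a-gorilla each time
}
```

The result would then go into the `href` of the canonical link, so every variant reports the same URL to search engines.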

Recommended canonical reading:

https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls
Retired in the City of Angels where the weather suits my clothes

Jason Knight

  • Administrator
  • Hero Member
  • *****
  • Posts: 832
  • Karma: +154/-1
    • CutCodeDown -- Minimalist Semantic Markup
Re: PROBLEM - URL Case
« Reply #9 on: 7 Dec 2020, 05:16:57 pm »
Quote
As I already explained, it is possible for any of the permutations to be indexed by search engines and/or be seen as "duplicate content", thus lowering your search rankings.

Which they would ONLY do if someone makes links for all those permutations, since engines spider sites by existing links, not randomly throwing shit at your server like a monkey at the zoo.

It could be a mechanism of attack to sabotage your SEO by some black hat cracker, but it's not something a search engine is going to do intentionally "just because it can". It has to find a link worded that way before it goes off trying to access it.

If you have a file on your hosting with no href anywhere on the web pointing at it, search engines will NEVER find it.
« Last Edit: 7 Dec 2020, 05:20:16 pm by Jason Knight »
I'll fix every flaw, I'll break every law, I'll tear up the rulebook if that's what it takes. You will see, I will crush this cold machine.

benanamen

  • Full Member
  • ***
  • Posts: 160
  • Karma: +14/-0
Re: PROBLEM - URL Case
« Reply #10 on: 7 Dec 2020, 06:16:52 pm »
Quote
If you have a file on your hosting with no href anywhere on the web pointing at it, search engines will NEVER find it.

That is very much not true.
To save time, let's just assume I am never wrong.

Jason Knight

  • Administrator
  • Hero Member
  • *****
  • Posts: 832
  • Karma: +154/-1
    • CutCodeDown -- Minimalist Semantic Markup
Re: PROBLEM - URL Case
« Reply #11 on: 7 Dec 2020, 06:21:30 pm »
Quote
That is very much not true.

Since when? Isn't that in fact the very reason that sitemap nonsense came into being, and why tools for checking that all your pages are cross-linked for SEO purposes exist?

If there's NO link anywhere on the web to it, how would a search engine EVER find it? In what reality does that even make sense to be a thing?

They do NOT sit there just slamming your server with every possible upper and lower case combination... if they did the average server logs would blow the servers apart with 404's.
I'll fix every flaw, I'll break every law, I'll tear up the rulebook if that's what it takes. You will see, I will crush this cold machine.

benanamen

  • Full Member
  • ***
  • Posts: 160
  • Karma: +14/-0
Re: PROBLEM - URL Case
« Reply #12 on: 7 Dec 2020, 06:36:48 pm »
Quote
how would a search engine EVER find it?

Probably the same way I do when I Pen Test. Getting a list of files off a server is trivial and does not require them to have links to them. I can find all kinds of goodies people think nobody knows about. Tons and tons of backup files, zips, tars, etc that are not linked to and much more.
To save time, let's just assume I am never wrong.

Jason Knight

  • Administrator
  • Hero Member
  • *****
  • Posts: 832
  • Karma: +154/-1
    • CutCodeDown -- Minimalist Semantic Markup
Re: PROBLEM - URL Case
« Reply #13 on: 7 Dec 2020, 09:37:51 pm »
Quote
Probably the same way I do when I Pen Test. Getting a list of files off a server is trivial and does not require them to have links to them.

That "list of files off a server" IS links to them. Which means you're exploiting flaws in configuration, like directory listings being public facing (as I had recently).

I just placed a file called "nobodyShouldSeeThis.txt" on the main cutcodedown.com in a random subdirectory that has never been publicly linked to. Find it.

I even made it "Easy" by giving you its name. Unless you've cracked my FTP or found some way to pull directory listings that shouldn't exist,  you're not gonna find it. Much less a search engine which has much better things to spend their time on.

... and it would also mean I have a security flaw in my jdir.php I'd like to know about so I can fix it.
« Last Edit: 8 Dec 2020, 06:08:08 am by Jason Knight »
I'll fix every flaw, I'll break every law, I'll tear up the rulebook if that's what it takes. You will see, I will crush this cold machine.
