URL shortener
Source: Internet  Published: double in C  Editor: 程序博客网  Time: 2024/06/05 23:07
I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef". Instead of "abcdef" there can be any other string with six characters containing a-z, A-Z and 0-9. That makes about 56.8 billion (62^6) possible strings.
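The size of the key space follows directly from the alphabet size and the string length. A minimal sketch of that arithmetic, where the class and method names (`KeySpace`, `count`) are made up for illustration:

```java
final class KeySpace {
    // Number of distinct strings of the given length over an alphabet
    // of the given size, i.e. alphabetSize raised to the power length.
    static long count(int alphabetSize, int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            total = Math.multiplyExact(total, alphabetSize);
        }
        return total;
    }

    public static void main(String[] args) {
        // 26 lowercase + 26 uppercase + 10 digits = 62 symbols, six positions
        System.out.println(count(62, 6)); // prints 56800235584
    }
}
```

Note that the result does not fit into a 32-bit `int`, which matters when choosing the ID type.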
Edit: Due to the ongoing interest in this topic, I've uploaded the code that I used to GitHub, with implementations for Java, PHP and JavaScript. Add your solutions if you like :)
My approach:
I have a database table with three columns:
- id, integer, auto-increment
- long, string, the long URL the user entered
- short, string, the shortened URL (or just the six characters)
I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 create strings that are too long, so I don't think I can use them. A self-built algorithm will work, too.
My idea:
For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps:
    short = '';
    if divisible by 2, add "a"+the result to short
    if divisible by 3, add "b"+the result to short
    ... until I have divisors for a-z and A-Z.
That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?
19 Answers
I would continue your "convert number to string" approach. However you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.
Theoretical background
You need a bijective function f. This is necessary so that you can find an inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:
- There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
- and for every y you must be able to find an x so that f(x) = y.
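The first condition (no two IDs map to the same string) can at least be spot-checked over a finite range of IDs. The following is only a sketch under that assumption; `InjectivityCheck` and `isInjectiveOn` are hypothetical names, and passing the check on a range does not prove the property for all inputs.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntFunction;

final class InjectivityCheck {
    // Returns true if f maps every id in [0, limit) to a distinct string.
    static boolean isInjectiveOn(IntFunction<String> f, int limit) {
        Set<String> seen = new HashSet<>();
        for (int id = 0; id < limit; id++) {
            // Set.add returns false when the string was already produced by a smaller id
            if (!seen.add(f.apply(id))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A plain base conversion never collides.
        System.out.println(isInjectiveOn(id -> Integer.toString(id, 36), 10_000));
        // Anything that throws information away does: ids 0 and 100 both give "0".
        System.out.println(isInjectiveOn(id -> Integer.toString(id % 100), 1_000));
    }
}
```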
How to convert the ID to a shortened URL
- Think of an alphabet we want to use. In your case that's [a-zA-Z0-9]. It contains 62 letters.
- Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example). For this example I will use $125_{10}$ (125 with a base of 10).
- Now you have to convert $125_{10}$ to $X_{62}$ (base 62).
  $125_{10} = 2 \times 62^1 + 1 \times 62^0 = [2,1]$
This requires use of integer division and modulo. A pseudo-code example:
    digits = []
    while num > 0
      remainder = modulo(num, 62)
      digits.push(remainder)
      num = divide(num, 62)
    digits = digits.reverse
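The pseudo-code above translates to Java almost line by line. This is a sketch, not the answer author's code: the class name `Base62Digits` is invented, and returning `[0]` for an input of 0 is an added assumption (the pseudo-code itself would return an empty list).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Base62Digits {
    // Splits a non-negative number into its base-62 digits, most significant first.
    static List<Integer> toDigits(long num) {
        if (num < 0) {
            throw new IllegalArgumentException("num must be non-negative");
        }
        List<Integer> digits = new ArrayList<>();
        if (num == 0) {
            digits.add(0); // assumption: represent zero as a single 0 digit
            return digits;
        }
        while (num > 0) {
            digits.add((int) (num % 62)); // remainder is the next least significant digit
            num /= 62;                    // integer division drops that digit
        }
        Collections.reverse(digits);      // digits were collected least significant first
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(toDigits(125)); // prints [2, 1]
    }
}
```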
Now map the indices 2 and 1 to your alphabet. This is what your mapping (with an array for example) could look like:
    0 → a
    1 → b
    ...
    25 → z
    ...
    52 → 0
    61 → 9
With 2 → c and 1 → b you will receive $cb_{62}$ as the shortened URL.
http://shor.ty/cb
How to resolve a shortened URL to the initial ID
The reverse is even easier. You just do a reverse lookup in your alphabet.
$e9a_{62}$ will be resolved to "4th, 61st, and 0th letter in alphabet".
$e9a_{62} = [4,61,0] = 4 \times 62^2 + 61 \times 62^1 + 0 \times 62^0 = 19158_{10}$
Now find your database-record with WHERE id = 19158 and do the redirect.
Some implementations (provided by commenters)
- Ruby
- Python
- CoffeeScript
- Haskell
- Perl
- C#
3792586=='F_ck' with u in the place of _). I would exclude some characters like u/U in order to minimize this. – Paulo Scardine Jun 28 '13 at 16:02
Why would you want to use a hash?
You can just use a simple translation of your auto-increment value to an alphanumeric value. You can do that easily by using some base conversion. Say your character space (A-Z, a-z, 0-9) has 62 characters: convert the id to a base-62 number and use the characters as the digits.
public class UrlShortener {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int BASE = ALPHABET.length();

    // Convert a non-negative ID to its base-62 string.
    public static String encode(int num) {
        if (num == 0) return ALPHABET.substring(0, 1); // otherwise 0 would encode to ""
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.append(ALPHABET.charAt(num % BASE)); // least significant digit first
            num /= BASE;
        }
        return sb.reverse().toString();
    }

    // Convert a base-62 string back to the ID.
    public static int decode(String str) {
        int num = 0;
        for (int i = 0; i < str.length(); i++)
            num = num * BASE + ALPHABET.indexOf(str.charAt(i));
        return num;
    }
}
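One caveat with the class above: it works on `int`, which tops out at about 2.1 billion, while six base-62 characters can represent values up to 62^6 - 1 = 56800235583. A possible variant, sketched here under the assumption that IDs are stored as `long`, also rejects characters outside the alphabet. The class name `Base62` is hypothetical and not part of the original answer.

```java
final class Base62 {
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int BASE = ALPHABET.length(); // 62

    // Converts a non-negative ID to its base-62 string.
    static String encode(long num) {
        if (num < 0) {
            throw new IllegalArgumentException("num must be non-negative");
        }
        if (num == 0) {
            return String.valueOf(ALPHABET.charAt(0)); // zero is the single digit "a"
        }
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.append(ALPHABET.charAt((int) (num % BASE)));
            num /= BASE;
        }
        return sb.reverse().toString();
    }

    // Converts a base-62 string back to the ID; throws on unknown characters or overflow.
    static long decode(String str) {
        if (str.isEmpty()) {
            throw new IllegalArgumentException("empty code");
        }
        long num = 0;
        for (int i = 0; i < str.length(); i++) {
            int digit = ALPHABET.indexOf(str.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + str.charAt(i));
            }
            // The *Exact methods throw ArithmeticException rather than wrapping around.
            num = Math.addExact(Math.multiplyExact(num, (long) BASE), (long) digit);
        }
        return num;
    }

    public static void main(String[] args) {
        System.out.println(encode(125));      // prints cb
        System.out.println(decode("e9a"));    // prints 19158
        System.out.println(decode("999999")); // prints 56800235583
    }
}
```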
Not an answer to your question, but I wouldn't use case-sensitive shortened URLs. They are hard to remember, usually unreadable (many fonts render 1 and l, 0 and O and other characters so similarly that they are nearly impossible to tell apart) and downright error prone. Try to use lower or upper case only.
Also, try to have a format where you mix the numbers and characters in a predefined form. There are studies that show that people tend to remember one form better than others (think phone numbers, where the numbers are grouped in a specific form). Try something like num-char-char-num-char-char. I know this will lower the combinations, especially if you don't have upper and lower case, but it would be more usable and therefore useful.
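A fixed pattern such as num-char-char-num-char-char can still be generated from a numeric ID by using a different base per position (10 for digit slots, 26 for letter slots). The sketch below assumes lowercase letters only and invents the name `PatternCode`; it shows one possible way to do it, and it also makes the reduced capacity concrete: 10 × 26 × 26 × 10 × 26 × 26 = 45,697,600 codes.

```java
final class PatternCode {
    private static final String DIGITS = "0123456789";
    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";
    // One alphabet per position: num-char-char-num-char-char.
    private static final String[] SLOTS = {DIGITS, LETTERS, LETTERS, DIGITS, LETTERS, LETTERS};

    // Total number of distinct codes this pattern can represent.
    static final long CAPACITY = 10L * 26 * 26 * 10 * 26 * 26; // 45,697,600

    static String encode(long id) {
        if (id < 0 || id >= CAPACITY) {
            throw new IllegalArgumentException("id out of range: " + id);
        }
        char[] out = new char[SLOTS.length];
        // Fill from the rightmost slot, like base conversion with a per-slot base.
        for (int i = SLOTS.length - 1; i >= 0; i--) {
            int base = SLOTS[i].length();
            out[i] = SLOTS[i].charAt((int) (id % base));
            id /= base;
        }
        return new String(out);
    }

    static long decode(String code) {
        if (code.length() != SLOTS.length) {
            throw new IllegalArgumentException("code must have " + SLOTS.length + " characters");
        }
        long id = 0;
        for (int i = 0; i < SLOTS.length; i++) {
            int digit = SLOTS[i].indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("unexpected character at position " + i);
            }
            id = id * SLOTS[i].length() + digit;
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(encode(0));            // prints 0aa0aa
        System.out.println(encode(CAPACITY - 1)); // prints 9zz9zz
    }
}
```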
My approach: Take the Database ID, then Base36 Encode it. I would NOT use both Upper AND Lowercase letters, because that makes transmitting those URLs over the telephone a nightmare, but you could of course easily extend the function to be a base 62 en/decoder.
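In Java, Base36 needs no hand-written conversion, because `Long.toString(long, int)` and `Long.parseLong(String, int)` accept a radix of up to 36. A minimal sketch, with `Base36Ids` as a made-up wrapper name:

```java
final class Base36Ids {
    // Encodes a non-negative database ID using digits 0-9 and lowercase a-z.
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        return Long.toString(id, 36);
    }

    // Decodes a Base36 string; parseLong accepts upper- and lowercase letters alike,
    // and throws NumberFormatException for anything that is not valid Base36.
    static long decode(String code) {
        return Long.parseLong(code, 36);
    }

    public static void main(String[] args) {
        System.out.println(encode(239472)); // prints 54s0
        System.out.println(decode("54S0")); // prints 239472
    }
}
```

Because decoding ignores case, a code read out over the phone resolves the same way however the listener types it.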
encode() and decode() functions. The steps are, therefore: (1) Save URL in database (2) Get unique row ID for that URL from database (3) Convert integer ID to short string with encode(), e.g. 273984 to f5a4 (4) Use the short string (e.g. f4a4) in your sharable URLs (5) When receiving a request for a short string (e.g. 20a8), decode the string to an integer ID with decode() (6) Look up URL in database for given ID. For conversion, use: github.com/delight-im/ShortURL – Marco W. Feb 10 '15 at 10:31
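The six steps in the comment above can be sketched end to end with an in-memory list standing in for the database table. Everything here is an assumption for illustration: the class `InMemoryShortener` is invented, IDs start at 1 to mimic an auto-increment column, and the JDK's built-in Base36 conversion is used as a stand-in for the linked library's encode()/decode().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class InMemoryShortener {
    // Stand-in for the database table; the row with ID n lives at index n - 1.
    private final List<String> urls = new ArrayList<>();

    // Steps (1) to (3): save the URL, take its row ID, convert the ID to a short string.
    String shorten(String longUrl) {
        urls.add(longUrl);
        long id = urls.size();
        return Long.toString(id, 36); // stand-in for encode()
    }

    // Steps (5) and (6): convert the short string back to an ID and look it up.
    Optional<String> resolve(String code) {
        long id;
        try {
            id = Long.parseLong(code, 36); // stand-in for decode()
        } catch (NumberFormatException e) {
            return Optional.empty(); // not a valid code at all
        }
        if (id < 1 || id > urls.size()) {
            return Optional.empty(); // well-formed, but no such row
        }
        return Optional.of(urls.get((int) (id - 1)));
    }

    public static void main(String[] args) {
        InMemoryShortener shortener = new InMemoryShortener();
        String code = shortener.shorten("http://www.google.de/");
        // Step (4) would publish the code as part of a shareable URL.
        System.out.println(code + " -> " + shortener.resolve(code).orElse("not found"));
    }
}
```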