HDU OJ Problem 4018: Parsing URL

Source: Internet. Editor: 程序博客网. Time: 2024/04/29 00:23

HDU OJ problem 4018, Parsing URL (problem link).

Parsing URL
Problem Description
In computing, a Uniform Resource Locator or Universal Resource Locator (URL) is a character string that specifies where a known resource is available on the Internet and the mechanism for retrieving it.
The syntax of a typical URL is:
scheme://domain:port/path?query_string#fragment_id
In this problem, the scheme and domain are required in every URL, and the other components are optional. That is, for example, the following are all correct URLs:
http://dict.bing.com.cn/#%E5%B0%8F%E6%95%B0%E7%82%B9
http://www.mariowiki.com/Mushroom
https://mail.google.com/mail/?shva=1#inbox
http://en.wikipedia.org/wiki/Bowser_(character)
ftp://fs.fudan.edu.cn/
telnet://bbs.fudan.edu.cn/
http://mail.bashu.cn:8080/BsOnline/
Your task is to find the domain for all given URLs.
Input
There are multiple test cases in this problem. The first line of input contains a single integer denoting the number of test cases. Each test case consists of a single line containing a valid URL.
Output
For each test case, you should output the domain of the given URL.
Sample Input
3
http://dict.bing.com.cn/#%E5%B0%8F%E6%95%B0%E7%82%B9
http://www.mariowiki.com/Mushroom
https://mail.google.com/mail/?shva=1#inbox
Sample Output
Case #1: dict.bing.com.cn
Case #2: www.mariowiki.com
Case #3: mail.google.com
Source
The 36th ACM/ICPC Asia Regional Shanghai Site —— Warmup

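Approach: this is simple string parsing with no real difficulty. One thing to watch for: the port number must not be printed. Java's regular expressions handle this easily.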

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main
{
    public static void main(String[] args)
    {
        Scanner cin = new Scanner(System.in);
        // Group 1: scheme plus "://". Group 2: domain, i.e. everything up to
        // the first ':' or '/'. Group 3 (port, path, ...) is optional, because
        // only the scheme and domain are required by the problem statement.
        Pattern pattern = Pattern.compile("([A-Za-z]+://)([^:/]+)([:/].*)?");
        int n = cin.nextInt();
        cin.nextLine(); // consume the rest of the first line
        for (int i = 1; i <= n; i++)
        {
            String url = cin.nextLine();
            Matcher matcher = pattern.matcher(url);
            if (matcher.matches())
                System.out.println("Case #" + i + ": " + matcher.group(2));
        }
    }
}

C works too, if you prefer it. In principle, C could use GNU regular expressions.

C + GNU regular expressions

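#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>

typedef int COUNT;
#define MAX_LENGTH 1000

int main (void)
{
    COUNT i;
    int n;
    char url[MAX_LENGTH];
    regmatch_t pmatch[4];
    regex_t match_regex;

    /* Group 1: scheme plus "://". Group 2: domain. Group 3: optional rest. */
    regcomp( &match_regex, "([A-Za-z]+://)([^:/]+)([:/].*)?", REG_EXTENDED );
    scanf( "%d", &n );
    for ( i = 1 ; i <= n ; i ++ )
    {
        scanf( "%999s", url );
        /* Skip the line if it does not look like a URL. */
        if ( regexec( &match_regex, url, 4, pmatch, 0 ) != 0 )
            continue;
        /* Cut the string at the end of the domain, then print from its start. */
        url[pmatch[2].rm_eo] = '\0';
        printf( "Case #%d: %s\n", i, &(url[pmatch[2].rm_so]) );
    }
    regfree( &match_regex );
    return EXIT_SUCCESS;
}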

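However, HDU OJ runs on a Windows server and its gcc is the MinGW gcc, which does not support GNU regular expressions. So if you write the solution in C, you have to parse the string yourself. The C code is as follows: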

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

typedef int COUNT;
#define MAX_LENGTH 1000

int main (void)
{
    COUNT i, j;
    int n;
    bool starturl;
    char url[MAX_LENGTH];
    char outputurl[MAX_LENGTH];
    int len;

    scanf( "%d", &n );
    for ( i = 1 ; i <= n ; i ++ )
    {
        starturl = false;
        scanf( "%999s", url );
        sprintf( outputurl, "Case #%d: ", i );
        len = strlen( outputurl );
        for ( j = 0 ; url[j] != '\0' ; j ++ )
        {
            if ( !starturl )
            {
                /* First '/' of "://": step onto the second '/', and the
                   loop increment then moves to the first domain character. */
                if ( url[j] == '/' )
                {
                    j ++;
                    starturl = true;
                }
            }
            else
            {
                /* The domain ends at the port separator or the path. */
                if ( url[j] == ':' || url[j] == '/' )
                    break;
                outputurl[len++] = url[j];
            }
        }
        outputurl[len] = '\0';
        puts( outputurl );
    }
    return EXIT_SUCCESS;
}
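The hand-written scan above can also be expressed with two standard C library calls, strstr and strcspn, which need no regex support and so should be available under MinGW as well. The following is only a sketch under that assumption, not the solution that was submitted; the function name extract_domain and its return convention are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies the domain of `url` into `out`, whose
 * capacity is `out_size` bytes. Returns 0 on success, or -1 if "://"
 * is missing, the domain is empty, or it does not fit into `out`.
 */
int extract_domain(const char *url, char *out, size_t out_size)
{
    const char *start = strstr(url, "://");
    size_t len;

    if (start == NULL)
        return -1;
    start += 3;                  /* skip "://" */
    len = strcspn(start, ":/");  /* length up to ':', '/' or end of string */
    if (len == 0 || len >= out_size)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Inside the main loop, one would read the URL as before, call extract_domain(url, domain, sizeof domain), and print the result with printf("Case #%d: %s\n", i, domain). Because strcspn stops at the first ':' or '/', the port number is dropped in the same way as in the manual loop.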

0 0