HDU OJ Problem 4018: Parsing URL


  HDU OJ Problem 4018, Parsing URL (problem link).

Parsing URL

Problem Description

In computing, a Uniform Resource Locator or Universal Resource Locator (URL) is a character string that specifies where a known resource is available on the Internet and the mechanism for retrieving it.
The syntax of a typical URL is:
scheme://domain:port/path?query_string#fragment_id
In this problem, the scheme, domain is required by all URL and other components are optional. That is, for example, the following are all correct urls:
http://dict.bing.com.cn/#%E5%B0%8F%E6%95%B0%E7%82%B9
http://www.mariowiki.com/Mushroom
https://mail.google.com/mail/?shva=1#inbox
http://en.wikipedia.org/wiki/Bowser_(character)
ftp://fs.fudan.edu.cn/
telnet://bbs.fudan.edu.cn/
http://mail.bashu.cn:8080/BsOnline/
Your task is to find the domain for all given URLs.
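
As an aside to the statement, the syntax line above can be made concrete with a small C sketch. It splits a URL into its first three components (scheme, domain, port), which is all this problem needs. The names `url_parts`, `copy_span` and `split_url` are hypothetical, introduced here only for illustration, and the buffer sizes are assumptions rather than limits given by the problem.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper type: holds the first three components of
   scheme://domain:port/path?query_string#fragment_id.
   The buffer sizes are assumptions, not limits from the problem. */
struct url_parts {
    char scheme[32];
    char domain[256];
    char port[8];   /* empty string when the URL has no port */
};

/* Copy at most cap-1 bytes of src[0..len) into dst and terminate it. */
static void copy_span(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap)
        len = cap - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Returns 0 on success, -1 if the "://" separator is missing. */
int split_url(const char *url, struct url_parts *out)
{
    assert(url != NULL && out != NULL);

    const char *sep = strstr(url, "://");
    if (sep == NULL)
        return -1;
    copy_span(out->scheme, sizeof out->scheme, url, (size_t)(sep - url));

    /* The domain runs until ':' (port), '/' (path), '?', '#', or the end. */
    const char *host = sep + 3;
    size_t host_len = strcspn(host, ":/?#");
    copy_span(out->domain, sizeof out->domain, host, host_len);

    /* An optional port follows ':' and runs until '/', '?', '#', or the end. */
    const char *rest = host + host_len;
    out->port[0] = '\0';
    if (*rest == ':') {
        rest++;
        copy_span(out->port, sizeof out->port, rest, strcspn(rest, "/?#"));
    }
    return 0;
}
```

Calling `split_url` on `http://mail.bashu.cn:8080/BsOnline/` fills `domain` with `mail.bashu.cn` and `port` with `8080`; only the domain is wanted in the output.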

Input

There are multiple test cases in this problem. The first line of input contains a single integer denoting the number of test cases. For each of test case, there is only one line contains a valid URL.

Output

For each test case, you should output the domain of the given URL.

Sample Input

3
http://dict.bing.com.cn/#%E5%B0%8F%E6%95%B0%E7%82%B9
http://www.mariowiki.com/Mushroom
https://mail.google.com/mail/?shva=1#inbox

Sample Output

Case #1: dict.bing.com.cn
Case #2: www.mariowiki.com
Case #3: mail.google.com

Source

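The 36th ACM/ICPC Asia Regional Shanghai Site - Warmup

  Approach: this is simple string parsing with no real difficulty. The one thing to watch is that the port number must not be printed. Java's regular expressions handle it easily.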

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main
{
    public static void main(String[] args)
    {
        Scanner cin = new Scanner(System.in);
        // Group 2 is the domain. The trailing port/path part is optional,
        // because the problem only requires the scheme and the domain.
        Pattern pattern = Pattern.compile("([A-Za-z]+://)([^:/]+)([:/].*)?");
        int n = cin.nextInt();
        cin.nextLine(); // skip the rest of the line that holds n
        for (int i = 1; i <= n; i++)
        {
            String url = cin.nextLine().trim();
            Matcher matcher = pattern.matcher(url);
            if (matcher.matches())
                System.out.println("Case #" + i + ": " + matcher.group(2));
        }
    }
}

  C works too if you prefer it. In principle, C could use GNU regular expressions (the POSIX <regex.h> interface provided by glibc).

C + GNU regular expressions

#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

#define MAX_LENGTH 1000

int main(void)
{
    int i, n;
    char url[MAX_LENGTH];
    regmatch_t pmatch[4];
    regex_t match_regex;

    /* Group 2 is the domain; the trailing port/path part is optional. */
    if (regcomp(&match_regex, "([A-Za-z]+://)([^:/]+)([:/].*)?", REG_EXTENDED) != 0)
        return EXIT_FAILURE;
    scanf("%d", &n);
    for (i = 1; i <= n; i++)
    {
        scanf("%999s", url);
        if (regexec(&match_regex, url, 4, pmatch, 0) != 0)
            continue;
        /* Cut the string at the end of the domain, then print from its start. */
        url[pmatch[2].rm_eo] = '\0';
        printf("Case #%d: %s\n", i, &url[pmatch[2].rm_so]);
    }
    regfree(&match_regex);
    return EXIT_SUCCESS;
}

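However, the HDU OJ runs on a Windows server and its gcc is MinGW gcc, which does not support GNU regular expressions. So if you write the solution in C, you have to parse the string yourself. The C code is as follows: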

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define MAX_LENGTH 1000

int main(void)
{
    int i, j, n, len;
    bool starturl;
    char url[MAX_LENGTH];
    char outputurl[MAX_LENGTH + 32]; /* room for the "Case #k: " prefix */

    scanf("%d", &n);
    for (i = 1; i <= n; i++)
    {
        starturl = false;
        scanf("%999s", url);
        sprintf(outputurl, "Case #%d: ", i);
        len = (int)strlen(outputurl);
        for (j = 0; url[j] != '\0'; j++)
        {
            if (!starturl)
            {
                /* First '/' of "://": skip the second one as well,
                   so the domain starts on the next iteration. */
                if (url[j] == '/')
                {
                    j++;
                    starturl = true;
                }
            }
            else
            {
                /* The domain ends at the port or at the path. */
                if (url[j] == ':' || url[j] == '/')
                    break;
                outputurl[len++] = url[j];
            }
        }
        outputurl[len] = '\0';
        puts(outputurl);
    }
    return EXIT_SUCCESS;
}
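
The hand-written scan can also be expressed with standard library calls only, which should build under MinGW as well. The following is a minimal sketch, not the submitted solution: `format_case` is a hypothetical helper name, and it assumes the domain ends at the first ':' or '/' after "://", as in the code above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "Case #k: <domain>" for one URL into out.
   Returns 0 on success, -1 if the URL has no "://" separator. */
int format_case(int k, const char *url, char *out, size_t cap)
{
    assert(url != NULL && out != NULL && cap > 0);

    const char *sep = strstr(url, "://");
    if (sep == NULL)
        return -1;

    const char *host = sep + 3;        /* first character of the domain */
    size_t len = strcspn(host, ":/");  /* stop at the port, the path, or the end */

    /* "%.*s" prints at most len characters of host. */
    snprintf(out, cap, "Case #%d: %.*s", k, (int)len, host);
    return 0;
}
```

A driver would read n, then call `format_case(i, url, buf, sizeof buf)` for each URL and `puts(buf)`. `strcspn` returns the length of the leading span that contains none of the listed characters, so a URL with no port and no path is handled without a special case.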