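This is my first technical blog post, so please bear with any rough edges. Thanks (^_^)
Some junior students in my group have recently been learning Android, so I wrote this post to share with them and with everyone else.
The post has two parts.
Part 1 builds a news list screen (a ListView).
Part 2 scrapes the news data (using regular expressions).
Technologies involved: Java regular expressions and Java network programming (I/O streams).
IDE: Android Studio
The structure of the demo project is shown below.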
1. Part 1: building the news list screen
MainActivity.java is as follows.

package per.edward.androidnewsreader;

import android.app.Activity;
import android.os.Bundle;
import android.view.View;
import android.widget.AdapterView;
import android.widget.ListView;
import android.widget.Toast;

import java.util.ArrayList;
import java.util.List;

import per.edward.androidnewsreader.adapter.NewsAdapter;
import per.edward.androidnewsreader.bean.NewsItemModel;

public class MainActivity extends Activity {
    private ListView mListView;
    private List<NewsItemModel> list;
    private NewsAdapter adapter;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        initView();
        initData();
    }

    public void initView() {
        list = new ArrayList<NewsItemModel>();
        mListView = (ListView) findViewById(R.id.list_view);
    }

    public void initData() {
        // Fill the list with 15 placeholder rows until real data is available.
        // NewsAdapter expects NewsItemModel items, so each row gets its index as the title.
        for (int i = 0; i < 15; i++) {
            NewsItemModel model = new NewsItemModel();
            model.setNewsTitle(String.valueOf(i));
            list.add(model);
        }
        adapter = new NewsAdapter(this, list, R.layout.adapter_news_item);
        mListView.setAdapter(adapter);
        mListView.setOnItemClickListener(new ItemClickListener());
    }

    /**
     * Click handler for the news list.
     */
    public class ItemClickListener implements AdapterView.OnItemClickListener {
        @Override
        public void onItemClick(AdapterView<?> adapterView, View view, int i, long l) {
            // Show the position of the clicked row.
            Toast.makeText(getApplicationContext(), String.valueOf(i), Toast.LENGTH_SHORT).show();
        }
    }
}
activity_main.xml is as follows.

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <ListView
        android:id="@+id/list_view"
        android:layout_width="match_parent"
        android:layout_height="match_parent" />
</RelativeLayout>

adapter_news_item.xml is as follows.

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:padding="10dp">

    <ImageView
        android:id="@+id/image_view"
        android:layout_width="80dp"
        android:layout_height="wrap_content"
        android:layout_centerVertical="true"
        android:background="@mipmap/ic_launcher"
        android:scaleType="centerCrop" />

    <LinearLayout
        android:id="@+id/line"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_marginLeft="10dp"
        android:layout_toRightOf="@id/image_view"
        android:orientation="vertical">

        <TextView
            android:id="@+id/txt_title"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Edward"
            android:textSize="16sp" />

        <TextView
            android:id="@+id/txt_summary"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_marginTop="5dp"
            android:text="岭南学院"
            android:textSize="12sp" />
    </LinearLayout>
</RelativeLayout>
The resulting screen is shown below; it is very simple.
That completes the news list screen. Next, let's look at how to scrape the data from a news site.
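2. Part 2: analyzing and scraping the data
Target URL: http://news.qq.com/china_index.shtml
Let's look at the content of this page: each entry has an image on the left and a news title and summary on the right.
The goal is clear: pull all of this data down and display it in the screen built in Part 1.
Looking at the page source, shown below, I have outlined the source of three news items in red.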
<a target="_blank" class="pic" href="/a/20150909/036168.htm"><img class="picto" src="http://img1.gtimg.com/news/pics/hv1/51/43/1920/124859016.jpg"></a><em class="f14 l24"><a target="_blank" class="linkto" href="/a/20150909/036168.htm">英航客机美国拉斯维加斯起火 14人轻伤送医治疗</a></em><p class="l22">美国联邦航空管理局发布声明说,飞机左引擎起火,机组中断起飞,指挥乘客紧急疏散。</p>
Notice that only the image URL, the title, the summary, and the detail-page URL change from one news item to the next; the surrounding tags stay the same. Based on that rule, a regular expression is enough to pull out the data we want.
The core of the regular expression is as follows.

Pattern pattern = Pattern.compile(
        "<a target=\"_blank\" class=\"pic\" href=\"([^\"]*)\">"
        + "<img class=\"picto\" src=\"([^\"]*)\"></a>"
        + "<em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"[^\"]*\">([^<]*)</a></em>"
        + "<p class=\"l22\">([^<]*)</p>");

The string passed to compile is almost identical to the source of a single news item, apart from the capture groups ([^\"]*) and ([^<]*).
[^\"]* matches any run of characters that are not a double quote, so ([^\"]*) captures an attribute value up to its closing quote.
([^<]*) works the same way: it captures the text up to the next <, which is where the closing </a> or </p> tag begins. A character class lists individual characters rather than a string, so a class such as [^</a>] would exclude the four characters <, /, a and >, and a title containing the letter a would fail to match.
Function.java is as follows.

package per.edward.androidnewsreader.function;

import android.util.Log;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import per.edward.androidnewsreader.bean.NewsItemModel;

public class Function {
    public static List<NewsItemModel> parseHtmlData(String result) {
        List<NewsItemModel> list = new ArrayList<>();
        // Groups: 1 = detail-page URL, 2 = image URL, 3 = title, 4 = summary.
        Pattern pattern = Pattern.compile(
                "<a target=\"_blank\" class=\"pic\" href=\"([^\"]*)\">"
                + "<img class=\"picto\" src=\"([^\"]*)\"></a>"
                + "<em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"[^\"]*\">([^<]*)</a></em>"
                + "<p class=\"l22\">([^<]*)</p>");
        Matcher matcher = pattern.matcher(result);
        StringBuilder sb = new StringBuilder();
        while (matcher.find()) {
            NewsItemModel model = new NewsItemModel();
            model.setNewsDetailUrl(matcher.group(1).trim());
            model.setUrlImgAddress(matcher.group(2).trim());
            model.setNewsTitle(matcher.group(3).trim());
            model.setNewsSummary(matcher.group(4).trim());
            // Collect the same fields for a single debug log entry.
            sb.append("Detail URL: " + matcher.group(1).trim() + "\n");
            sb.append("Image URL: " + matcher.group(2).trim() + "\n");
            sb.append("Title: " + matcher.group(3).trim() + "\n");
            sb.append("Summary: " + matcher.group(4).trim() + "\n\n");
            list.add(model);
        }
        Log.d("----------------->", sb.toString());
        return list;
    }
}
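To see what the pattern captures without running the app, it can be tried in plain Java against the sample news item shown earlier. The sketch below is not part of the project: the class NewsRegexDemo and the method firstMatch are made-up names, the pattern is modeled on the one in parseHtmlData with [^<]* capturing the title and summary, and it assumes the page markup still looks like the 2015 sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone demo; not part of the Android project.
class NewsRegexDemo {
    // Groups: 1 = detail-page URL, 2 = image URL, 3 = title, 4 = summary.
    static final Pattern NEWS = Pattern.compile(
            "<a target=\"_blank\" class=\"pic\" href=\"([^\"]*)\">"
            + "<img class=\"picto\" src=\"([^\"]*)\"></a>"
            + "<em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"[^\"]*\">([^<]*)</a></em>"
            + "<p class=\"l22\">([^<]*)</p>");

    // The sample news item from the page source, as one unbroken string.
    static final String SAMPLE =
            "<a target=\"_blank\" class=\"pic\" href=\"/a/20150909/036168.htm\">"
            + "<img class=\"picto\" src=\"http://img1.gtimg.com/news/pics/hv1/51/43/1920/124859016.jpg\"></a>"
            + "<em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"/a/20150909/036168.htm\">"
            + "英航客机美国拉斯维加斯起火 14人轻伤送医治疗</a></em>"
            + "<p class=\"l22\">美国联邦航空管理局发布声明说,飞机左引擎起火,机组中断起飞,指挥乘客紧急疏散。</p>";

    /** Returns the four captured groups of the first match, or null if nothing matches. */
    static String[] firstMatch(String html) {
        Matcher m = NEWS.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        String[] fields = firstMatch(SAMPLE);
        System.out.println("detail URL: " + fields[0]);
        System.out.println("image URL:  " + fields[1]);
        System.out.println("title:      " + fields[2]);
        System.out.println("summary:    " + fields[3]);

        // A character class excludes single characters, not the string "</a>":
        System.out.println(Pattern.matches("[^</a>]*", "iPad")); // false, 'a' is excluded
        System.out.println(Pattern.matches("[^<]*", "iPad"));    // true
    }
}
```

Because the pattern allows no whitespace between tags, it only matches while the site keeps emitting each item as one unbroken run of markup; any layout change on the page would require updating the pattern.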
NewsItemModel.java is as follows.

package per.edward.androidnewsreader.bean;

import android.graphics.Bitmap;

/**
 * description: news model
 * <p/>
 * author: Edward
 * <p/>
 * 2015/9/9
 */
public class NewsItemModel {
    private Bitmap newsBitmap;     // downloaded thumbnail
    private String newsDetailUrl;  // link to the full article
    private String urlImgAddress;  // thumbnail URL
    private String newsTitle;
    private String newsSummary;

    public Bitmap getNewsBitmap() {
        return newsBitmap;
    }

    public void setNewsBitmap(Bitmap newsBitmap) {
        this.newsBitmap = newsBitmap;
    }

    public String getUrlImgAddress() {
        return urlImgAddress;
    }

    public void setUrlImgAddress(String urlImgAddress) {
        this.urlImgAddress = urlImgAddress;
    }

    public String getNewsDetailUrl() {
        return newsDetailUrl;
    }

    public void setNewsDetailUrl(String newsDetailUrl) {
        this.newsDetailUrl = newsDetailUrl;
    }

    public String getNewsTitle() {
        return newsTitle;
    }

    public void setNewsTitle(String newsTitle) {
        this.newsTitle = newsTitle;
    }

    public String getNewsSummary() {
        return newsSummary;
    }

    public void setNewsSummary(String newsSummary) {
        this.newsSummary = newsSummary;
    }
}
CommonTool.java is as follows.

package per.edward.androidnewsreader.tool;

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class CommonTool {
    /**
     * Sends a GET request and returns the response body as text.
     *
     * @param urlString  address to fetch
     * @param codingType name of the charset used to decode the response, e.g. "gbk"
     * @return the decoded body, or null if the request failed
     */
    public static String getRequest(String urlString, String codingType) {
        BufferedInputStream bis = null;
        ByteArrayOutputStream bos = null;
        InputStream is = null;
        try {
            URL url = new URL(urlString);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept", "*/*");
            conn.connect();
            int responseCode = conn.getResponseCode();
            if (responseCode == 200) {
                is = conn.getInputStream();
                bis = new BufferedInputStream(is);
                bos = new ByteArrayOutputStream();
                int length;
                byte[] by = new byte[1024];
                // Copy the response into memory 1 KB at a time.
                while ((length = bis.read(by)) != -1) {
                    bos.write(by, 0, length);
                }
                bos.flush();
                return new String(bos.toByteArray(), codingType);
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (bos != null) {
                    bos.close();
                }
                if (bis != null) {
                    bis.close();
                }
                if (is != null) {
                    is.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
                System.out.println("Failed to close the stream!");
            }
        }
        return null;
    }

    /**
     * Opens a stream for downloading an image over the network.
     *
     * @param urlString address of the image
     * @return the response stream, or null if the request failed
     */
    public static InputStream getImgInputStream(String urlString) {
        try {
            URL url = new URL(urlString);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.setReadTimeout(10 * 1000);
            connection.connect();
            if (connection.getResponseCode() == 200) {
                return connection.getInputStream();
            } else {
                return null;
            }
        } catch (Exception e) {
            return null;
        }
    }
}
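getRequest takes the page encoding as its second argument because the downloaded bytes only become readable text when they are decoded with the charset the server used; the MainActivity in this post passes "gbk" for news.qq.com. The plain-Java sketch below isolates that read-and-decode step so it can be tried without a network connection. StreamDecodeDemo and readAll are hypothetical names, and the in-memory stream merely stands in for conn.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical stand-alone demo; not part of the Android project.
class StreamDecodeDemo {
    /** Reads the stream to the end and decodes the bytes with the named charset. */
    static String readAll(InputStream in, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        // try-with-resources closes the stream even if read() throws.
        try (InputStream input = in) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int length;
            while ((length = input.read(buffer)) != -1) {
                bos.write(buffer, 0, length);
            }
            return new String(bos.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u65b0\u95fb"; // the two characters of "新闻" (news)
        byte[] gbkBytes = text.getBytes(Charset.forName("GBK"));

        String decodedAsGbk = readAll(new ByteArrayInputStream(gbkBytes), "GBK");
        String decodedAsUtf8 = readAll(new ByteArrayInputStream(gbkBytes), "UTF-8");

        System.out.println(decodedAsGbk.equals(text));  // true: matching charset
        System.out.println(decodedAsUtf8.equals(text)); // false: wrong charset garbles the text
    }
}
```

If the scraped titles ever show up as garbled characters, a mismatch between the page encoding and the charset passed to getRequest is the first thing worth checking.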
NewsAdapter.java is as follows.

package per.edward.androidnewsreader.adapter;

import android.content.Context;
import android.view.LayoutInflater;
import android.view.View;
import android.view.ViewGroup;
import android.widget.BaseAdapter;
import android.widget.ImageView;
import android.widget.TextView;

import java.util.List;

import per.edward.androidnewsreader.R;
import per.edward.androidnewsreader.bean.NewsItemModel;

/**
 * description: adapter that binds news items to the list rows
 * <p/>
 * author: Edward
 * <p/>
 * 2015/9/9
 */
public class NewsAdapter extends BaseAdapter {
    private Context mContext;
    private List<NewsItemModel> list;
    private int layoutId;

    public NewsAdapter(Context mContext, List<NewsItemModel> list, int layoutId) {
        this.mContext = mContext;
        this.list = list;
        this.layoutId = layoutId;
    }

    @Override
    public int getCount() {
        return list.size();
    }

    @Override
    public Object getItem(int i) {
        return list.get(i);
    }

    @Override
    public long getItemId(int i) {
        return i;
    }

    @Override
    public View getView(final int position, View view, ViewGroup viewGroup) {
        ViewHolder viewHolder;
        if (view == null) {
            // No recycled row available: inflate one and cache its child views.
            viewHolder = new ViewHolder();
            view = LayoutInflater.from(mContext).inflate(layoutId, viewGroup, false);
            viewHolder.imageView = (ImageView) view.findViewById(R.id.image_view);
            viewHolder.txtTitle = (TextView) view.findViewById(R.id.txt_title);
            viewHolder.txtSummary = (TextView) view.findViewById(R.id.txt_summary);
            view.setTag(viewHolder);
        } else {
            viewHolder = (ViewHolder) view.getTag();
        }
        NewsItemModel item = list.get(position);
        if (item.getNewsBitmap() != null) {
            // Rows are recycled, so restore visibility in case this row was hidden earlier.
            viewHolder.imageView.setVisibility(View.VISIBLE);
            viewHolder.imageView.setImageBitmap(item.getNewsBitmap());
        } else {
            viewHolder.imageView.setVisibility(View.GONE);
        }
        viewHolder.txtTitle.setText(item.getNewsTitle());
        viewHolder.txtSummary.setText(item.getNewsSummary());
        return view;
    }

    public static class ViewHolder {
        ImageView imageView;
        TextView txtTitle, txtSummary;
    }
}
Finally, because the app performs network operations, don't forget to declare the network permission in AndroidManifest.xml.

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="per.edward.androidnewsreader">

    <uses-permission android:name="android.permission.INTERNET" />

    <uses-sdk
        android:maxSdkVersion="22"
        android:minSdkVersion="9" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:theme="@style/AppTheme">
        <activity
            android:name=".MainActivity"
            android:label="@string/app_name">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>
</manifest>
Last of all, update the MainActivity.java shown in Part 1 as follows.

package per.edward.androidnewsreader;

import android.app.Activity;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.os.Handler;
import android.os.Message;
import android.util.Log;
import android.view.View;
import android.widget.AdapterView;
import android.widget.ListView;
import android.widget.Toast;

import java.util.ArrayList;
import java.util.List;

import per.edward.androidnewsreader.adapter.NewsAdapter;
import per.edward.androidnewsreader.bean.NewsItemModel;
import per.edward.androidnewsreader.function.Function;
import per.edward.androidnewsreader.tool.CommonTool;

public class MainActivity extends Activity {
    private ListView mListView;
    private List<NewsItemModel> list;
    private NewsAdapter adapter;
    private final static int GET_DATA_SUCCEED = 1;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        initView();
        initData();
    }

    public void initView() {
        list = new ArrayList<NewsItemModel>();
        mListView = (ListView) findViewById(R.id.list_view);
    }

    public void initData() {
        // Network access is not allowed on the main thread, so fetch in the background.
        new Thread(new Runnable() {
            @Override
            public void run() {
                String result = CommonTool.getRequest("http://news.qq.com/china_index.shtml", "gbk");
                if (result == null) {
                    // getRequest returns null when the request fails; nothing to parse.
                    return;
                }
                Log.d("result------------->", result);
                List<NewsItemModel> newsList = Function.parseHtmlData(result);
                // Download the thumbnail of every item before handing the list to the UI.
                for (int i = 0; i < newsList.size(); i++) {
                    NewsItemModel model = newsList.get(i);
                    Bitmap bitmap = BitmapFactory.decodeStream(
                            CommonTool.getImgInputStream(model.getUrlImgAddress()));
                    model.setNewsBitmap(bitmap);
                }
                mHandler.sendMessage(mHandler.obtainMessage(GET_DATA_SUCCEED, newsList));
            }
        }).start();
    }

    // Receives the parsed list on the main thread and binds it to the ListView.
    public Handler mHandler = new Handler() {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case GET_DATA_SUCCEED:
                    list = (List<NewsItemModel>) msg.obj;
                    adapter = new NewsAdapter(MainActivity.this, list, R.layout.adapter_news_item);
                    mListView.setAdapter(adapter);
                    mListView.setOnItemClickListener(new ItemClickListener());
                    Toast.makeText(getApplicationContext(), String.valueOf(list.size()), Toast.LENGTH_LONG).show();
                    break;
            }
        }
    };

    /**
     * Click handler for the news list.
     */
    public class ItemClickListener implements AdapterView.OnItemClickListener {
        @Override
        public void onItemClick(AdapterView<?> adapterView, View view, int i, long l) {
            NewsItemModel temp = (NewsItemModel) adapter.getItem(i);
            Toast.makeText(getApplicationContext(), temp.getNewsTitle(), Toast.LENGTH_SHORT).show();
        }
    }
}
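The click listener above only shows the title in a Toast. If the detail page were to be opened later, note that the href captured from the page is site-relative (for example /a/20150909/036168.htm), so it would first have to be resolved against the address of the list page. A minimal plain-Java sketch of that step follows; DetailUrlDemo and toAbsolute are hypothetical names, not part of the project.

```java
import java.net.URI;

// Hypothetical stand-alone demo; not part of the Android project.
class DetailUrlDemo {
    // The list page that the news items were scraped from.
    static final String LIST_PAGE = "http://news.qq.com/china_index.shtml";

    /** Resolves an href found on pageUrl into an absolute URL string. */
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Site-relative detail link, as captured by group 1 of the pattern.
        System.out.println(toAbsolute(LIST_PAGE, "/a/20150909/036168.htm"));
        // prints http://news.qq.com/a/20150909/036168.htm

        // An href that is already absolute, such as the image URL, is returned unchanged.
        System.out.println(toAbsolute(LIST_PAGE,
                "http://img1.gtimg.com/news/pics/hv1/51/43/1920/124859016.jpg"));
    }
}
```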
Final result of the demo