Notes

Web Crawler Study Notes

scraping the web for fun and for profit  March 2, 2020

Course Plan

  • Getting started

  • HttpClient

  • Jsoup

  • Case studies

    dependencies {
        implementation('org.jsoup:jsoup:1.12.2')
        implementation('org.apache.httpcomponents:httpclient:4.5.2')
      
        testImplementation('org.slf4j:slf4j-log4j12:1.7.25')
        testImplementation('org.junit.jupiter:junit-jupiter:5.6.0')
    }
    

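Crawler Functions

Functionally, a crawler is usually divided into three parts: data collection, processing, and storage. A crawler starts from the URLs of one or more seed pages and collects the URLs found on them. While fetching pages, it keeps extracting new URLs from the current page and adding them to a queue until a stop condition defined by the system is met.

  • Data collection
  • Processing
  • Storage

Start from the seed page, crawl the detail-page links on it, then move on to the next page, and so on.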
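The seed, queue, and stop-condition loop described above can be sketched in Java. This is a minimal sketch, not a real crawler: instead of downloading pages with HttpClient and parsing links with Jsoup, it walks a hypothetical in-memory map from URL to outgoing links, and the `maxPages` budget is an assumed stop condition.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrawlFrontierSketch {

    // Hypothetical in-memory "web": page URL -> links found on that page.
    // A real crawler would fetch pages (e.g. with HttpClient) and parse links (e.g. with Jsoup).
    public static Map<String, List<String>> sampleWeb() {
        return Map.of(
                "/list?page=1", List.of("/item/1", "/item/2", "/list?page=2"),
                "/list?page=2", List.of("/item/3", "/list?page=1"),
                "/item/1", List.of(),
                "/item/2", List.of("/item/1"),
                "/item/3", List.of());
    }

    // Breadth-first crawl from the seed URLs; stops when the queue is empty
    // or when maxPages pages have been visited (the assumed stop condition).
    public static List<String> crawl(Map<String, List<String>> web, List<String> seeds, int maxPages) {
        Deque<String> queue = new ArrayDeque<>(seeds); // URLs waiting to be fetched
        Set<String> seen = new HashSet<>(seeds);       // prevents queuing the same URL twice
        List<String> visited = new ArrayList<>();      // visit order; stands in for storage

        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            visited.add(url);                          // "data collection": pretend we downloaded it
            for (String link : web.getOrDefault(url, List.of())) {
                if (seen.add(link)) {                  // "processing": enqueue only new URLs
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(crawl(sampleWeb(), List.of("/list?page=1"), 10));
        // prints [/list?page=1, /item/1, /item/2, /list?page=2, /item/3]
    }
}
```

The `seen` set is what keeps the crawl from looping forever when pages link back to each other, as `/list?page=2` does here.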


OCP-1Z0-816 Practice Test 1 Review

Quite a lot of questions  February 26, 2020

4. Given:

Path p1 = Paths.get("c:\\temp\\test1.txt");
Path p2 = Paths.get("c:\\temp\\test2.txt");

Which of the following code fragments moves the file test1.txt to test2.txt, even if test2.txt exists?

  • Files.move(p1, p2);

    This will throw a java.nio.file.FileAlreadyExistsException if the file already exists.

  • Files.move(p1, p2, StandardCopyOption.REPLACE_EXISTING);
  • try(Files.move(p1, p2)) { }

    Files.move returns a Path object (the destination file), which is not a resource that can be closed because Path does not implement the AutoCloseable interface. So this will not compile.

  • try(Files.copy(p1, p2, StandardCopyOption.REPLACE_EXISTING)) { Files.delete(p1); }
  • Files.copy(p1, p2, StandardCopyOption.REPLACE_EXISTING); Files.delete(p1);

Explanation
The Files.copy method copies test1.txt into test2.txt; if test2.txt doesn't exist, it will be created. However, the Files.isSameFile method doesn't check the contents of the files. It is meant to check whether the two Path objects resolve to the same file. In this case they do not, so it will return false.
The following is a brief JavaDoc description of both methods:
public static Path copy(Path source, Path target, CopyOption... options) throws IOException
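A small, self-contained check of the two fragments that work (the `REPLACE_EXISTING` move, and copy-then-delete) and of why the plain `Files.move` fails. The class and method names are hypothetical, and the files live in a temporary directory instead of `c:\temp` so the sketch runs on any OS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveReplaceDemo {

    // Files.move(p1, p2, REPLACE_EXISTING): one move that overwrites an existing target
    public static void moveReplacing(Path source, Path target) throws IOException {
        Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Files.copy(..., REPLACE_EXISTING) followed by Files.delete(p1)
    public static void copyThenDelete(Path source, Path target) throws IOException {
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        Files.delete(source);
    }

    // Files.move(p1, p2) without options: returns true if rejected because the target exists
    public static boolean plainMoveRejected(Path source, Path target) throws IOException {
        try {
            Files.move(source, target);
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("move-demo"); // stands in for c:\temp
        Path p1 = dir.resolve("test1.txt");
        Path p2 = dir.resolve("test2.txt");
        Files.write(p1, "new".getBytes(StandardCharsets.UTF_8));
        Files.write(p2, "old".getBytes(StandardCharsets.UTF_8));

        System.out.println("plain move rejected: " + plainMoveRejected(p1, p2));
        moveReplacing(p1, p2);
        String content = new String(Files.readAllBytes(p2), StandardCharsets.UTF_8);
        System.out.println("test1 exists: " + Files.exists(p1) + ", test2 = " + content);
    }
}
```

The try-with-resources fragments are left out on purpose: they do not compile, so there is nothing to run.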


816 Exam Topics: Quick Notes

1. Unmodifiable collections using of/copyOf and Collections.unmodifiableXXX methods

java.util.List and java.util.Set have of and copyOf static factory methods that provide a convenient way to create unmodifiable lists/sets.

The of methods accept either an array or multiple individual parameters. If you pass a collection to of, it is treated as a regular object, i.e., it returns a list/set containing that collection object itself instead of a list/set containing the elements of the passed collection.

The copyOf method, on the other hand, accepts only a Collection. It iterates through the passed Collection and adds all of its elements to the returned list/set.
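The difference between `of`, `copyOf`, and `Collections.unmodifiableList` shows up once the source list is modified afterwards. A minimal sketch; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OfVsCopyOf {

    public static List<String> demo() {
        List<String> source = new ArrayList<>(List.of("a", "b", "c"));

        List<List<String>> wrapped = List.of(source);            // of: the collection itself is the only element
        List<String> copied = List.copyOf(source);               // copyOf: elements copied into a new unmodifiable list
        List<String> view = Collections.unmodifiableList(source); // read-only view backed by source

        source.add("d"); // modify the original after all three were created

        List<String> results = new ArrayList<>();
        results.add("of: size=" + wrapped.size() + ", element is source=" + (wrapped.get(0) == source));
        results.add("copyOf: " + copied);         // unaffected by the later add
        results.add("unmodifiableList: " + view); // reflects the later add
        try {
            copied.add("x");
            results.add("copyOf accepted add");
        } catch (UnsupportedOperationException e) {
            results.add("copyOf rejected add");
        }
        return results;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The `unmodifiableList` result is only a read-only view: it cannot be changed through the view, but it still shows changes made to the backing list.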


816 Benchmark Test Review

1. Which of the following annotations are retained for run time?

  • @SuppressWarnings

    It is defined with @Retention(SOURCE)

  • @Override

    It is defined with @Retention(SOURCE)

  • @SafeVarargs

    It is defined with @Retention(RUNTIME)

  • @FunctionalInterface

    It is defined with @Retention(RUNTIME)

  • @Deprecated

    It is defined with @Retention(RUNTIME)
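One way to confirm these retention policies is to ask reflection which annotations are still present at run time. SOURCE-retained annotations are discarded by the compiler, so `isAnnotationPresent` cannot see them. The `Sample` and `Greeter` types below are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RetentionCheck {

    @FunctionalInterface
    interface Greeter {
        String greet(String name);
    }

    static class Sample {
        @Deprecated
        void oldMethod() { }

        @SafeVarargs
        static <T> List<T> listOf(T... items) {
            return Arrays.asList(items);
        }

        @SuppressWarnings("unchecked")
        void quiet() { }

        @Override
        public String toString() {
            return "Sample";
        }
    }

    // Looks up a declared method of Sample, turning the checked exception into an unchecked one
    private static Method method(String name, Class<?>... params) {
        try {
            return Sample.class.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // true = the annotation is still visible through reflection at run time
    public static Map<String, Boolean> runtimeVisibility() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        result.put("@SuppressWarnings", method("quiet").isAnnotationPresent(SuppressWarnings.class));
        result.put("@Override", method("toString").isAnnotationPresent(Override.class));
        // the varargs parameter T... erases to Object[]
        result.put("@SafeVarargs", method("listOf", Object[].class).isAnnotationPresent(SafeVarargs.class));
        result.put("@FunctionalInterface", Greeter.class.isAnnotationPresent(FunctionalInterface.class));
        result.put("@Deprecated", method("oldMethod").isAnnotationPresent(Deprecated.class));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runtimeVisibility());
        // {@SuppressWarnings=false, @Override=false, @SafeVarargs=true, @FunctionalInterface=true, @Deprecated=true}
    }
}
```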


2. Your application needs to load a set of key value pairs from a database table which never changes. Multiple threads need to access this information but none of them changes it.
Which class would be the most appropriate to store such data if the values need not be kept in a sorted fashion?

  • Hashtable
  • HashMap
  • Set
  • TreeMap
  • List

Explanation
You should know that all Hashtable methods are synchronized and this compromises its performance for simultaneous reads.
Since no thread modifies the data, using a Hashtable is not efficient.
A HashMap is the best choice because its methods are not synchronized, so it allows efficient concurrent reads. TreeMap keeps the keys sorted, which makes it a little slower than HashMap.
Set and List can’t be used since we need to store Key-value pairs.
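A sketch of the scenario in the question, assuming the rows have already been read from the database (a plain `Map` stands in for the table here; `ReadOnlyLookup` and `concurrentHits` are hypothetical names). The map is filled once in the constructor and stored in a `final` field, which safely publishes it to other threads; after that, unsynchronized `HashMap` reads are fine because no thread writes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReadOnlyLookup {

    // Filled once in the constructor and never modified; the final field
    // guarantees other threads see the fully built map.
    private final Map<String, String> table;

    public ReadOnlyLookup(Map<String, String> rows) {
        this.table = new HashMap<>(rows); // rows: hypothetical data already read from the DB table
    }

    // Plain, unsynchronized HashMap read: safe because no thread ever writes
    public String get(String key) {
        return table.get(key);
    }

    // Runs the same lookups from several threads at once; returns the total number of hits
    public static int concurrentHits(ReadOnlyLookup lookup, List<String> keys, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    int hits = 0;
                    for (String key : keys) {
                        if (lookup.get(key) != null) {
                            hits++;
                        }
                    }
                    return hits;
                }));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        ReadOnlyLookup lookup = new ReadOnlyLookup(Map.of("US", "United States", "CN", "China"));
        System.out.println(concurrentHits(lookup, List.of("US", "CN", "XX"), 4)); // prints 8
    }
}
```

If you also want accidental writes to fail fast, wrapping the copy with `Collections.unmodifiableMap` (or using `Map.copyOf` on Java 10+) is a common option.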


Exercises for Inner Classes

Exercises from the inner classes chapter  February 15, 2020

  1. What is an inner class? Differentiate between member, local, and anonymous inner classes.

    An inner class is a class declared inside another class without the static modifier. Inner classes declared at the member level are called member inner classes; those declared locally in a method or an instance initialization block are local inner classes. Local inner classes that have no name are anonymous inner classes.
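A compact illustration of the three kinds described in the answer; `InnerKinds`, `Member`, and `Local` are illustrative names. Each kind can read the enclosing instance's private field `name`.

```java
import java.util.ArrayList;
import java.util.List;

public class InnerKinds {
    private final String name = "outer";

    // Member inner class: declared at member level, no static modifier
    class Member {
        String describe() {
            return "member sees " + name;
        }
    }

    public List<String> demo() {
        List<String> out = new ArrayList<>();
        out.add(new Member().describe());

        // Local inner class: declared inside a method body
        class Local {
            String describe() {
                return "local sees " + name;
            }
        }
        out.add(new Local().describe());

        // Anonymous inner class: no name, declared and instantiated in one expression
        Runnable anon = new Runnable() {
            @Override
            public void run() {
                out.add("anonymous sees " + name);
            }
        };
        anon.run();
        return out;
    }

    public static void main(String[] args) {
        new InnerKinds().demo().forEach(System.out::println);
    }
}
```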


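Getting Started with Travis CI

Trying out continuous builds and continuous integration  February 12, 2020

Configuration File

The .travis.yml script I use to build my blog automatically: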

language: ruby
rvm:
  - 2.6.3
env:
  global:
  - NOKOGIRI_USE_SYSTEM_LIBRARIES=true
  - secure: D0RrVHHBEEJMyJK9OUUPM6Wms+D+zifJhHFl4BcmC0leSR0vANoQ6357Bl4miZL2flmnBU/1Z9WNzcDVM0JwQpF8ZVeEdMvgBY5+73Xjp8cIwYfWx8MAUB8J1Hug2aEiXw6KoSxjFOnUpSRnhCcWpviFNCZOa5XavMfe7+OHbNQT1U1xu+ZOmDG6Hrm3CMhfzu2cRPIfziAAgnWVfgtqA0STXnY9D0HvfsKPl6khy6VWkeF09Chqo/XVO/SV5LSgplkgXh1OzS6ITwT99UWDRqIeXYAvMJFyrHkzdxNgtAddYVxmx8WQqfqUFQ6A3HSnJy2Fq04H3gLszQxrVPzw8fNeOzBe7hIviaZdgRsglc26NO13EX//e4DA1ImiYKciFGvDNDBrqtv8u3XAG+8jVe/dLwKbBXnvi64udh2/H/lTBfWQ3eV08/R6m0m4aF8jbv0oBuXReyXuEy9D6xNm+yceekWzKAmTYPFb9W5XLveIgt8SIQO9P6tGuMWV6w7KRhgbpgI886ZeSVS3WcAGIr11SvSNa+x5WapOd/mizHrA8VvW4v8GjmYYwTOsmiLuYl2t1fWY76yYOOdSDLC9XhJN/mS3qxxX7WR4KCJgpstyzTVXj+uyuzi23sV1SdUvO35r3VkZk11ad9Qmp9WLXpdSQfMAKbu/lC7C9HfrGuM=
cache:
  bundler: true
  directories:
    - /home/travis/.rvm/
install:
- gem install bundler
- bundle install
script:
- bundle exec rake deploy
after_success:
- bash <(curl -s https://codecov.io/bash)

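After enabling the cache, builds became much faster. Here is an animated GIF to pad out the content:

[Animated GIF: cache_1581461859_5053]

It was used to test the build duration, hahaha!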


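AsciiDoc Syntax Reference

Study notes  February 12, 2020

Paragraphs

Paragraphs need no special formatting; just write them. And then...

Literal Paragraphs

This is a normal paragraph

Start a line with at least one space and it becomes a literal paragraph. The content of one literal paragraph must be contiguous,
that is, it cannot contain blank lines.
A blank line starts another paragraph. Literal paragraph text is shown in a monospace font, and spaces and line breaks are preserved.
For example,    the spaces in front of this are preserved,
and like this, the line break is preserved.

Now this is a normal paragraph again.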


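Java Inner Classes

Inner Class  February 4, 2020

In this article, you will learn:

  • What inner classes are
  • How to declare inner classes
  • How to declare member, local, and anonymous inner classes
  • How to create objects of inner classes

What Is an Inner Class?

We know that a class is a member of a package. Such a class, placed directly in a package, is called a top-level class. Take the following code as an example: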

// TopLevel.java
package io.zwt.innerclasses;

public class TopLevel {
    private int value = 101;
    
    public int getValue() {
        return value;
    }
    
    public void setValue(int value) {
        this.value = value;
    }
}

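The TopLevel class above is a member of the io.zwt.innerclasses package. This class has three members:

  • One instance variable: value
  • Two methods: getValue() and setValue()

A class can also be declared inside another class; such a class is called an inner class. If a class defined inside another class is explicitly or implicitly declared static, it is called a static nested class rather than an inner class. The class that contains an inner class is called its enclosing class, or outer class. Look at the following declarations of the Outer and Inner classes: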

// Outer.java
package io.zwt.innerclasses;

public class Outer {
	public class Inner {
		// Members of the inner class go here
	}
	// Other members of the outer class go here
}

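An instance of an inner class can exist only within an instance of its outer class; that is, you must have an instance of the outer class before you can create an instance of the inner class. This rule is useful for modeling an object that cannot exist without another object. An inner class has full access to all members of its outer class, including private members.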
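A filled-in variant of the `Outer` class above (without the package declaration, and with a hypothetical private field `secret`) shows both rules: an `Inner` object is created through an existing `Outer` instance with `outer.new Inner()`, and it can read and modify the outer object's private field.

```java
public class Outer {
    private int secret = 42; // hypothetical private field of the outer class

    public class Inner {
        // An inner class can read the enclosing instance's private members directly
        public int readSecret() {
            return secret;
        }

        // Outer.this refers explicitly to the enclosing Outer instance
        public void bump() {
            Outer.this.secret++;
        }
    }

    public static void main(String[] args) {
        Outer outer = new Outer();             // an Outer instance must exist first
        Outer.Inner inner = outer.new Inner(); // the new Inner is bound to `outer`
        inner.bump();
        System.out.println(inner.readSecret()); // prints 43
    }
}
```

Two `Inner` objects created from the same `outer` share that one enclosing instance, so a change made through one is visible through the other.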


The Builder Design Pattern in Depth

Builder Pattern in Depth  February 2, 2020

GoF Definition

Separate the construction of a complex object from its representation so that the same construction process can create different representations.


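The Chinese edition of Design Patterns translates Builder as “生成器” (“generator”), and I follow that rendering in the original notes. The Builder pattern is a creational pattern, concerned with how objects are created. Consider it when the object to build is complex and made up of many parts, that is, when its constructor would need many parameters. The Builder pattern holds that an object's construction mechanism should be independent of its parts (its attributes): the construction process does not depend on what the parts are. As a result, the same construction process can build objects with different representations (attributes) by varying the construction steps.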
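A minimal Java sketch of the idea. Note that this is the common fluent variant (a static nested `Builder` with chained setters, as popularized by Effective Java) rather than the full GoF structure with a separate Director and an abstract Builder interface; the `Computer` class and its fields are hypothetical.

```java
public final class Computer {
    private final String cpu;      // required part
    private final int ramGb;       // optional parts, with defaults set in the Builder
    private final int storageGb;
    private final boolean hasGpu;

    // Only the Builder can create instances
    private Computer(Builder b) {
        this.cpu = b.cpu;
        this.ramGb = b.ramGb;
        this.storageGb = b.storageGb;
        this.hasGpu = b.hasGpu;
    }

    @Override
    public String toString() {
        return "Computer[cpu=" + cpu + ", ram=" + ramGb + "GB, storage=" + storageGb
                + "GB, gpu=" + hasGpu + "]";
    }

    public static final class Builder {
        private final String cpu;
        private int ramGb = 8;
        private int storageGb = 256;
        private boolean hasGpu = false;

        public Builder(String cpu) {
            this.cpu = cpu;
        }

        // Each step sets one part and returns the builder so calls can be chained
        public Builder ram(int gb) { this.ramGb = gb; return this; }
        public Builder storage(int gb) { this.storageGb = gb; return this; }
        public Builder gpu(boolean present) { this.hasGpu = present; return this; }

        // Validates once, then produces an immutable Computer
        public Computer build() {
            if (ramGb <= 0 || storageGb <= 0) {
                throw new IllegalStateException("sizes must be positive");
            }
            return new Computer(this);
        }
    }

    public static void main(String[] args) {
        Computer office = new Computer.Builder("i5").build();
        Computer gaming = new Computer.Builder("i9").ram(32).storage(2048).gpu(true).build();
        System.out.println(office); // Computer[cpu=i5, ram=8GB, storage=256GB, gpu=false]
        System.out.println(gaming); // Computer[cpu=i9, ram=32GB, storage=2048GB, gpu=true]
    }
}
```

The same construction process (`new Builder(...)`, then a chain of steps, then `build()`) yields different representations depending on which steps are called.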


Spring Boot Classes

Study notes  February 1, 2020

What Common Classes Are For

SpringApplicationBuilder

java.lang.Object
org.springframework.boot.builder.SpringApplicationBuilder

Builder for SpringApplication and ApplicationContext instances with convenient fluent API and context hierarchy support. Simple example of a context hierarchy:
new SpringApplicationBuilder(ParentConfig.class).child(ChildConfig.class).run(args);
Another common use case is setting active profiles and default properties to set up the environment for an application:

new SpringApplicationBuilder(Application.class).profiles("server")
    .properties("transport=local").run(args);

If your needs are simpler, consider using the static convenience methods in SpringApplication instead.

(org.springframework.boot:spring-boot:2.2.4.RELEASE)
