HtmlLinkValidationTF

Validate links in an HTML document.

Task package: org.schmant.task.base
Java package: org.schmant.task.html
Category: HTML tasks
Since: 0.6
EntityFS-aware? Yes*
Implements: ActionTaskFactory
Restriction: This task only works on files that can be resolved as File:s, since it must be able to resolve relative links.

Description:

This task validates links in an HTML document by trying to load the resource that each link targets. Links whose addresses match any of the regular expressions in the optional ignorePatterns collection are not validated.
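
The Restriction above says the task needs File-backed documents because relative links must be resolved. Below is a minimal sketch of that resolution step in plain Java. It assumes a relative link is resolved against the URI of the HTML file that contains it. RelativeLinkCheck is a hypothetical helper, not part of Schmant. As a simplification of "trying to load the resource", it only checks that a local target file exists and rejects remote links.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not Schmant code): resolves a link relative to the
// HTML file that contains it and checks whether a local target exists.
class RelativeLinkCheck {

    // Returns true if the local target of href exists. Remote links are
    // rejected; a real validator would request them over the network instead.
    static boolean localTargetExists(Path htmlFile, String href) throws URISyntaxException {
        // Relative links can only be resolved against a real file location.
        URI base = htmlFile.toAbsolutePath().toUri();
        URI target = base.resolve(href);
        if (!"file".equalsIgnoreCase(target.getScheme())) {
            throw new IllegalArgumentException("Not a local link: " + target);
        }
        // Rebuild the URI without query or fragment ("#section") so that it
        // maps to a plain file path.
        URI pathOnly = new URI("file", null, target.getPath(), null);
        return Files.exists(Path.of(pathOnly));
    }
}
```

For a page at /docs/page.html, the link other.html#intro resolves to /docs/other.html. The fragment is dropped before the file is checked.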

This task can also validate the targets of Javadoc links. To enable this, add each Javadoc package whose links should be validated to the javadocPackages property.

This task is well suited to running in a TaskExecutor, since it typically spends most of its time waiting for remote servers to respond to its requests.

The task stores all validated links in a collection, which it checks before validating another link. A link that is already in the collection is not validated again. The collection can be set with the validLinkCollection property.
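
The caching rule described above can be sketched as follows. This illustrates the rule under stated assumptions and is not the task's code. ValidLinkCache is a hypothetical name, and the Predicate stands in for actually loading the link target. Following the validLinkCollection description, the sketch adds only links that pass validation to the collection. As a result, a broken link is checked again each time the sketch meets it.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the "validate each link once" rule. The validator
// predicate stands in for actually loading the link's target.
class ValidLinkCache {
    private final Set<String> validLinks;      // links known to be valid
    private final Predicate<String> validator; // returns true if the target loads
    private int validations = 0;               // number of real validation attempts

    ValidLinkCache(Set<String> validLinks, Predicate<String> validator) {
        this.validLinks = validLinks;
        this.validator = validator;
    }

    // Returns true if the link is valid, consulting the collection first.
    boolean check(String absoluteLink) {
        if (validLinks.contains(absoluteLink)) {
            return true; // validated earlier: skip the expensive request
        }
        validations++;
        boolean valid = validator.test(absoluteLink);
        if (valid) {
            validLinks.add(absoluteLink); // remember only links that passed
        }
        return valid;
    }

    int validations() {
        return validations;
    }
}
```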

Required properties

sources

Properties

ignorePatterns

A collection of regular expression patterns. The link addresses that match any of the patterns in the collection will not be validated.

The patterns are parsed by Pattern.compile().

Setter method:
addIgnorePattern(String s)
Add one ignore pattern.
parameters:
s – A pattern.
Setter method:
addIgnorePattern(Pattern p)
Add one ignore pattern.
parameters:
p – A pattern.
Setter method:
addIgnorePatterns(Object o)
Add one or several ignore patterns.
parameters:
o – One ignore pattern, or an array or collection of ignore patterns (strings and/or Pattern objects).
Setter method:
clearIgnorePatterns()
Clear the collection of ignore patterns.
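
The ignore-pattern handling can be sketched like this. The sketch accepts the same argument shapes as addIgnorePatterns(Object): a String, a Pattern, or an array or collection of them. It compiles strings with Pattern.compile(). This page does not say whether a pattern must match the whole link address or only part of it. The sketch assumes a whole-address match (Matcher.matches()). IgnorePatterns is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: collects ignore patterns and tests link addresses.
class IgnorePatterns {
    private final List<Pattern> patterns = new ArrayList<>();

    // Accepts a String, a Pattern, or an array or collection of those,
    // mirroring the documented addIgnorePatterns(Object) argument types.
    void add(Object o) {
        if (o instanceof Pattern) {
            patterns.add((Pattern) o);
        } else if (o instanceof String) {
            patterns.add(Pattern.compile((String) o));
        } else if (o instanceof Object[]) {
            for (Object e : (Object[]) o) {
                add(e);
            }
        } else if (o instanceof Collection) {
            for (Object e : (Collection<?>) o) {
                add(e);
            }
        } else {
            throw new IllegalArgumentException("Unsupported ignore pattern: " + o);
        }
    }

    // Assumption: a link is ignored if some pattern matches the whole address.
    boolean isIgnored(String link) {
        for (Pattern p : patterns) {
            if (p.matcher(link).matches()) {
                return true;
            }
        }
        return false;
    }
}
```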

inputEncoding

Character encoding of the HTML files.

Setter method:
setInputEncoding(Charset c)
parameters:
c – A character encoding.
Setter method:
setInputEncoding(String s)
parameters:
s – The name of a character encoding.
Default value:
The JVM's default character encoding (platform dependent).
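
Here is a small sketch of the documented default. It assumes that a null encoding name means the property was not set. HtmlDecoding is a hypothetical helper. Decoding with the wrong encoding can garble non-ASCII characters, including any that appear in link addresses.

```java
import java.nio.charset.Charset;

// Hypothetical helper: decodes HTML bytes with the configured encoding.
class HtmlDecoding {
    // charsetName == null means "not set": fall back to the JVM default
    // (platform dependent), as documented for the inputEncoding property.
    static String decode(byte[] html, String charsetName) {
        Charset cs = (charsetName == null)
                ? Charset.defaultCharset()
                : Charset.forName(charsetName);
        return new String(html, cs);
    }
}
```

For example, the UTF-8 bytes of "café" decode to "cafÃ©" when they are read as ISO-8859-1.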

javadocPackages

Javadoc links pointing to any of the classes in the packages (and their subpackages) added to this property will be validated.

Javadoc links are of the form http://host/path/index.html?package-path/class.html. When a Javadoc link is validated, the validator first tries to read the index.html file, and then the class HTML file that the link references.

Setter method:
addJavadocPackage(String p)
Add one package. Javadoc links referencing classes in the package or any of its subpackages will be validated.
parameters:
p – The package name.
Setter method:
addJavadocPackages(Object o)
Add one or several package names. Javadoc links referencing classes in any of the packages or their subpackages will be validated.
parameters:
o – A package name or an array or collection of package names (strings).
Setter method:
clearJavadocPackages()
Clear the collection of packages.
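
The Javadoc link form described above can be taken apart with java.net.URI. This sketch makes two assumptions. First, the package is the directory part of the query string, so org/example/sub/Foo.html gives org.example.sub. Second, subpackage matching is a prefix match on whole name segments. JavadocLink and its methods are hypothetical and are not the task's implementation.

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;

// Hypothetical helpers for Javadoc links of the form
// http://host/path/index.html?package-path/class.html
class JavadocLink {

    // The package encoded in the query, e.g. "org/example/sub/Foo.html" gives
    // "org.example.sub". Empty if the link does not have the Javadoc form.
    static Optional<String> packageOf(URI link) {
        String path = link.getPath();
        String query = link.getQuery();
        if (path == null || !path.endsWith("/index.html")
                || query == null || !query.endsWith(".html")) {
            return Optional.empty();
        }
        int lastSlash = query.lastIndexOf('/');
        if (lastSlash < 0) {
            return Optional.empty(); // class in the unnamed package
        }
        return Optional.of(query.substring(0, lastSlash).replace('/', '.'));
    }

    // True if pkg is one of the configured packages or one of their subpackages.
    // Comparing against p + "." keeps "org.examples" from matching "org.example".
    static boolean isCovered(String pkg, List<String> packages) {
        for (String p : packages) {
            if (pkg.equals(p) || pkg.startsWith(p + ".")) {
                return true;
            }
        }
        return false;
    }

    // The index.html page, without the query.
    static URI indexPage(URI link) {
        return link.resolve("index.html");
    }

    // The class page: the query path resolved against the index page's
    // directory. Only call this for links where packageOf(...) is present.
    static URI classPage(URI link) {
        return link.resolve(link.getRawQuery());
    }
}
```

Consider the link http://example.org/api/index.html?org/example/sub/Foo.html. Under these assumptions the two pages to load are http://example.org/api/index.html and http://example.org/api/org/example/sub/Foo.html. The link is covered if org.example was added as a Javadoc package.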

logFooter

The message that is logged at the info level after the task has run successfully.

Setter method:
setLogFooter(String s)
parameters:
s – The footer message.
Default value:
Empty (no footer message is logged).
See also:
logHeader

logHeader

The message that is logged at the info level before the task is run.

Setter method:
setLogHeader(String s)
parameters:
s – The header message.
Default value:
A task class specific message.
See also:
logFooter

reportLevel

This property is used to change the report level for all tasks created by this task factory. The report level is changed for the thread running the task when the task is run, and is restored to its previous level when the task is done.

Setter method:
setReportLevel(Level l)
Set the report level.
parameters:
l – The new report level.
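
The save-and-restore behavior described above follows the usual try/finally pattern around a per-thread setting. In this hypothetical sketch, the report level is a plain String held in a ThreadLocal. The real property takes a Level object, and this page does not describe how Schmant stores it per thread.

```java
import java.util.function.Supplier;

// Hypothetical sketch: a per-thread "report level" that is overridden while a
// task runs and restored afterwards, even if the task throws.
class ThreadReportLevel {
    private static final ThreadLocal<String> LEVEL =
            ThreadLocal.withInitial(() -> "INFO");

    static String current() {
        return LEVEL.get();
    }

    static <T> T runWithLevel(String level, Supplier<T> task) {
        String previous = LEVEL.get();
        LEVEL.set(level);
        try {
            return task.get();
        } finally {
            LEVEL.set(previous); // restore the thread's previous level
        }
    }
}
```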

sources (required)

A collection of HTML files to validate.

Setter method:
addSource(Object o)
Add one or several HTML files.
parameters:
o – An HTML file or an array or collection of HTML files.
Interpreted by InterpretAsFileStrategy.
Setter method:
addSources(Object o)
Add one or several HTML files.
parameters:
o – An HTML file or an array or collection of HTML files.
Interpreted by InterpretAsFileStrategy.
Setter method:
clearSources()
Discard all HTML files.
Setter method:
setSource(Object o)
Set one or several HTML files, discarding previously set files.
parameters:
o – An HTML file or an array or collection of HTML files.
Interpreted by InterpretAsFileStrategy.
Setter method:
setSources(Object o)
Set one or several HTML files, discarding previously set files.
parameters:
o – An HTML file or an array or collection of HTML files.
Interpreted by InterpretAsFileStrategy.

traceLogging

If trace logging is enabled for a task, it reports its configuration before it is run.

Trace logging may also be enabled globally for all tasks by calling TraceMode.setTraceMode(boolean).

Setter method:
setTraceLogging(boolean b)
Enable or disable trace logging.
parameters:
b – Enable trace logging?

validLinkCollection

A collection of absolute links that have already been validated. This collection can be shared between several link validation tasks so that a valid absolute link is not validated more than once. If several validation tasks use the collection concurrently, it must be safe for concurrent access by several threads (see the synchronized wrapper methods in java.util.Collections, such as Collections.synchronizedSet).

Setter method:
setValidLinkCollection(Collection<String> c)
parameters:
c – The valid link collection. This object should be safe to use concurrently from several threads.
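
Validation tasks often run concurrently in a TaskExecutor, so several threads read and write the shared collection. This sketch uses a plain ExecutorService as a stand-in for TaskExecutor and treats every link as valid. SharedValidLinks and validateAll are hypothetical names. The caller passes a set wrapped with Collections.synchronizedSet, which makes each individual contains and add call thread safe.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one "validation task" per page, all sharing one
// thread-safe set of valid absolute links.
class SharedValidLinks {

    // Returns the number of simulated validation requests that were made.
    static int validateAll(List<List<String>> linksPerPage, Set<String> validLinks)
            throws InterruptedException {
        AtomicInteger requests = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (List<String> pageLinks : linksPerPage) {
            pool.submit(() -> {
                for (String link : pageLinks) {
                    if (validLinks.contains(link)) {
                        continue; // another task already found it valid
                    }
                    requests.incrementAndGet(); // stand-in for an HTTP request
                    validLinks.add(link);       // every link is valid in this sketch
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return requests.get();
    }
}
```

The check and the add are two separate calls, so two tasks can occasionally validate the same link at the same time. This costs only a duplicate request, and the collection itself stays consistent. That is why per-call thread safety is enough here.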

Examples

Example 1

Validate links in all HTML files in the directory hierarchy under the doc directory. Don't validate links that point to Java SE's API documentation. (Assume that Javadoc gets those right.) Run the validation tasks in a TaskExecutor.

import org.entityfs.util.filter.entity.*
import org.schmant.arg.DirectoryAndFilter
import org.schmant.run.TaskExecutor
import org.schmant.task.html.HtmlLinkValidationTF
import org.schmant.task.meta.RecursiveActionTF

// A collection of links that have already been validated. This is used to
// prevent valid links from being validated more than once.
def vlc = Collections.synchronizedSet(new HashSet())

def te = new TaskExecutor().
  // Use eight parallel threads. It is good to use a large number of threads
  // here since each thread often spends a lot of time waiting for a remote
  // server to reply.
  setNumberOfThreads(8).
  start()
try {
  // Use a RecursiveActionTF to run the task for each HTML file that it finds.
  new RecursiveActionTF().
    // Don't disable the header and footer logging from the nested tasks since
    // then we would not see which files have been validated.
    setDisableHeaderLogging(false).
    // Use the task executor to run the tasks instead of running them in
    // the recursive action task's thread.
    setTaskExecutor(te).
    addSource(
      // Use a filter that only lets HTML files through. (doc is assumed to
      // reference the doc directory.)
      new DirectoryAndFilter(doc, new EFileNameExtensionFilter("html"))).
    setTaskFactory(
      new HtmlLinkValidationTF().
        setValidLinkCollection(vlc).
        // Validate all Javadoc links referencing the org.example or
        // org.example2 packages or any of their subpackages.
        addJavadocPackages(["org.example", "org.example2"]).
        // Ignore the links to Java SE's API documentation from the API
        // documentation. Assume that javadoc gets it right (and don't hammer
        // Sun's servers).
        addIgnorePattern("http://java.sun.com/javase/.*?/docs/api/.*")).
    run()
  // Wait for all validation tasks to finish.
  te.waitFor()
} finally {
  te.shutdown()
}


* An EntityFS-aware task is implemented using EntityFS. This means that it uses the filter settings of DirectoryView:s, and that it can often work with file system implementations other than the File-based one, such as the RAM file system.