@Generated(value="com.amazonaws:aws-java-sdk-code-generator") public class WebCrawlerConfiguration extends Object implements Serializable, Cloneable, StructuredPojo
The configuration of web URLs that you want to crawl. You should be authorized to crawl the URLs.
Constructor and Description |
---|
WebCrawlerConfiguration() |
Modifier and Type | Method and Description |
---|---|
WebCrawlerConfiguration | clone() |
boolean | equals(Object obj) |
WebCrawlerLimits | getCrawlerLimits() The configuration of crawl limits for the web URLs. |
List<String> | getExclusionFilters() A list of one or more exclusion regular expression patterns to exclude certain URLs. |
List<String> | getInclusionFilters() A list of one or more inclusion regular expression patterns to include certain URLs. |
String | getScope() The scope of what is crawled for your URLs. |
int | hashCode() |
void | marshall(ProtocolMarshaller protocolMarshaller) Marshalls this structured data using the given ProtocolMarshaller. |
void | setCrawlerLimits(WebCrawlerLimits crawlerLimits) The configuration of crawl limits for the web URLs. |
void | setExclusionFilters(Collection<String> exclusionFilters) A list of one or more exclusion regular expression patterns to exclude certain URLs. |
void | setInclusionFilters(Collection<String> inclusionFilters) A list of one or more inclusion regular expression patterns to include certain URLs. |
void | setScope(String scope) The scope of what is crawled for your URLs. |
String | toString() Returns a string representation of this object. |
WebCrawlerConfiguration | withCrawlerLimits(WebCrawlerLimits crawlerLimits) The configuration of crawl limits for the web URLs. |
WebCrawlerConfiguration | withExclusionFilters(Collection<String> exclusionFilters) A list of one or more exclusion regular expression patterns to exclude certain URLs. |
WebCrawlerConfiguration | withExclusionFilters(String... exclusionFilters) A list of one or more exclusion regular expression patterns to exclude certain URLs. |
WebCrawlerConfiguration | withInclusionFilters(Collection<String> inclusionFilters) A list of one or more inclusion regular expression patterns to include certain URLs. |
WebCrawlerConfiguration | withInclusionFilters(String... inclusionFilters) A list of one or more inclusion regular expression patterns to include certain URLs. |
WebCrawlerConfiguration | withScope(String scope) The scope of what is crawled for your URLs. |
WebCrawlerConfiguration | withScope(WebScopeType scope) The scope of what is crawled for your URLs. |
public void setCrawlerLimits(WebCrawlerLimits crawlerLimits)
The configuration of crawl limits for the web URLs.
crawlerLimits - The configuration of crawl limits for the web URLs.
public WebCrawlerLimits getCrawlerLimits()
The configuration of crawl limits for the web URLs.
public WebCrawlerConfiguration withCrawlerLimits(WebCrawlerLimits crawlerLimits)
The configuration of crawl limits for the web URLs.
crawlerLimits - The configuration of crawl limits for the web URLs.
public List<String> getExclusionFilters()
A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
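The precedence rule above can be illustrated with a small, self-contained sketch. It is not the service's implementation: the class FilterPrecedenceSketch and its methods are hypothetical, the use of whole-URL matching is an assumption (the source does not say whether patterns must match the entire URL), and treating an empty inclusion list as "no restriction" is also an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the AWS SDK. */
final class FilterPrecedenceSketch {

    /** Returns true if any pattern matches the whole URL (full matching is an assumption). */
    static boolean matchesAny(String url, List<String> patterns) {
        for (String pattern : patterns) {
            if (Pattern.matches(pattern, url)) {
                return true;
            }
        }
        return false;
    }

    /** Decides whether a URL would be crawled under the documented precedence rule. */
    static boolean shouldCrawl(String url, List<String> inclusions, List<String> exclusions) {
        // Exclusion is checked first, so it wins when both kinds of filter match.
        if (matchesAny(url, exclusions)) {
            return false;
        }
        // Assumption: an empty inclusion list places no restriction on the URL.
        return inclusions.isEmpty() || matchesAny(url, inclusions);
    }

    public static void main(String[] args) {
        List<String> include = Arrays.asList(".*/userguide/.*");
        List<String> exclude = Arrays.asList(".*\\.pdf");
        // Matches only the inclusion filter.
        System.out.println(shouldCrawl("https://example.com/userguide/intro.html", include, exclude));
        // Matches both filters, so the exclusion filter takes precedence.
        System.out.println(shouldCrawl("https://example.com/userguide/guide.pdf", include, exclude));
    }
}
```

In this sketch the first URL is accepted and the second is rejected, because the second matches both an inclusion and an exclusion pattern.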
public void setExclusionFilters(Collection<String> exclusionFilters)
A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
exclusionFilters - A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
public WebCrawlerConfiguration withExclusionFilters(String... exclusionFilters)
A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
NOTE: This method appends the values to the existing list (if any). Use setExclusionFilters(java.util.Collection) or withExclusionFilters(java.util.Collection) if you want to override the existing values.
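The difference between appending and overriding can be shown with a minimal stand-in class. FilterListSketch below is hypothetical and only mirrors the behavior described in the NOTE; it is not the SDK class, and its method names are shortened for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for the filter accessors; not the SDK class itself. */
final class FilterListSketch {
    private List<String> filters;

    List<String> getFilters() {
        return filters;
    }

    /** Replaces the stored list with a copy of the argument (null clears it). */
    void setFilters(Collection<String> values) {
        filters = (values == null) ? null : new ArrayList<>(values);
    }

    /** Collection overload: overrides the existing values, then returns this for chaining. */
    FilterListSketch withFilters(Collection<String> values) {
        setFilters(values);
        return this;
    }

    /** Varargs overload: appends to the existing list, creating it if needed. */
    FilterListSketch withFilters(String... values) {
        if (filters == null) {
            filters = new ArrayList<>(values.length);
        }
        filters.addAll(Arrays.asList(values));
        return this;
    }

    public static void main(String[] args) {
        FilterListSketch sketch = new FilterListSketch()
                .withFilters(".*\\.pdf")
                .withFilters(".*\\.zip");   // appended, so both patterns are kept
        System.out.println(sketch.getFilters());

        sketch.withFilters(Arrays.asList(".*/private/.*")); // overrides both earlier patterns
        System.out.println(sketch.getFilters());
    }
}
```

Chaining the varargs overload accumulates patterns, while passing a collection discards whatever was stored before.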
exclusionFilters - A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
public WebCrawlerConfiguration withExclusionFilters(Collection<String> exclusionFilters)
A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
exclusionFilters - A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
public List<String> getInclusionFilters()
A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
public void setInclusionFilters(Collection<String> inclusionFilters)
A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
inclusionFilters - A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
public WebCrawlerConfiguration withInclusionFilters(String... inclusionFilters)
A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
NOTE: This method appends the values to the existing list (if any). Use setInclusionFilters(java.util.Collection) or withInclusionFilters(java.util.Collection) if you want to override the existing values.
inclusionFilters - A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
public WebCrawlerConfiguration withInclusionFilters(Collection<String> inclusionFilters)
A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
inclusionFilters - A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
public void setScope(String scope)
The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, you can crawl only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. Alternatively, you can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
scope - The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, you can crawl only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. Alternatively, you can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
See Also: WebScopeType
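The two scope choices described above can be sketched as a host comparison. This is a simplified illustration under stated assumptions, not the crawler's actual logic: ScopeSketch, its local Scope enum, and inScope are hypothetical names, and the sketch compares hosts by suffix only, without any handling of what counts as a "primary domain".

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the AWS SDK. */
final class ScopeSketch {

    /** Local stand-in for the scope choice; the constant names are illustrative. */
    enum Scope { HOST_ONLY, SUBDOMAINS }

    /** Returns true if candidateUrl falls within the chosen scope of seedUrl. */
    static boolean inScope(String seedUrl, String candidateUrl, Scope scope) {
        String seedHost = URI.create(seedUrl).getHost();
        String host = URI.create(candidateUrl).getHost();
        if (seedHost == null || host == null) {
            return false; // URLs without a host cannot be compared
        }
        seedHost = seedHost.toLowerCase(Locale.ROOT);
        host = host.toLowerCase(Locale.ROOT);
        if (host.equals(seedHost)) {
            return true; // same host is in scope for both choices
        }
        // Assumption: a subdomain is any host that ends with "." + the seed host.
        return scope == Scope.SUBDOMAINS && host.endsWith("." + seedHost);
    }

    public static void main(String[] args) {
        String seed = "https://aws.amazon.com/";
        String page = "https://docs.aws.amazon.com/bedrock/latest/userguide/";
        System.out.println(inScope(seed, page, Scope.HOST_ONLY));  // different host
        System.out.println(inScope(seed, page, Scope.SUBDOMAINS)); // subdomain of the seed host
    }
}
```

With the seed "aws.amazon.com", the page on "docs.aws.amazon.com" is rejected when only the host is in scope and accepted when subdomains are included.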
public String getScope()
The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, you can crawl only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. Alternatively, you can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
See Also: WebScopeType
public WebCrawlerConfiguration withScope(String scope)
The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, you can crawl only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. Alternatively, you can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
scope - The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, you can crawl only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. Alternatively, you can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
See Also: WebScopeType
public WebCrawlerConfiguration withScope(WebScopeType scope)
The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, you can crawl only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. Alternatively, you can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
scope - The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, you can crawl only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. Alternatively, you can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
See Also: WebScopeType
public String toString()
Overrides: toString in class Object
See Also: Object.toString()
public WebCrawlerConfiguration clone()
public void marshall(ProtocolMarshaller protocolMarshaller)
Marshalls this structured data using the given ProtocolMarshaller.
Specified by: marshall in interface StructuredPojo
protocolMarshaller - Implementation of ProtocolMarshaller used to marshall this object's data.