Class StandardAnalyzer
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
You must specify the required LuceneVersion compatibility when creating StandardAnalyzer:
- As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you specify an earlier version, the previous broken behavior is reproduced exactly for backwards compatibility.
- As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
- As of 2.9, StopFilter preserves position increments.
- As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
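The version requirement above can be illustrated with a minimal sketch. It assumes the Lucene.Net 4.8 packages are referenced and that LuceneVersion.LUCENE_48 is the compatibility level you want; the field name "body", the sample text, and the class name StandardAnalyzerDemo are made up for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class StandardAnalyzerDemo
{
    public static void Main()
    {
        // Assumption: LUCENE_48 is the desired compatibility level.
        // Pass an older value only if you need the older tokenization behavior.
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader("The Quick Brown Fox")))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

            stream.Reset();                      // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();                        // finalize end-of-stream state
        }
    }
}
```

Because the analyzer applies LowerCaseFilter and StopFilter, the printed terms are lowercased and common English stop words are omitted.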
Inherited Members
Lucene.Net.Analysis.Analyzer.NewAnonymous(System.Func<System.String, System.IO.TextReader, Lucene.Net.Analysis.TokenStreamComponents>)
Lucene.Net.Analysis.Analyzer.NewAnonymous(System.Func<System.String, System.IO.TextReader, Lucene.Net.Analysis.TokenStreamComponents>, Lucene.Net.Analysis.ReuseStrategy)
Lucene.Net.Analysis.Analyzer.NewAnonymous(System.Func<System.String, System.IO.TextReader, Lucene.Net.Analysis.TokenStreamComponents>, System.Func<System.String, System.IO.TextReader, System.IO.TextReader>)
Lucene.Net.Analysis.Analyzer.NewAnonymous(System.Func<System.String, System.IO.TextReader, Lucene.Net.Analysis.TokenStreamComponents>, System.Func<System.String, System.IO.TextReader, System.IO.TextReader>, Lucene.Net.Analysis.ReuseStrategy)
Lucene.Net.Analysis.Analyzer.GetTokenStream(System.String, System.IO.TextReader)
Lucene.Net.Analysis.Analyzer.InitReader(System.String, System.IO.TextReader)
Lucene.Net.Analysis.Analyzer.GetObjectData(System.Runtime.Serialization.SerializationInfo, System.Runtime.Serialization.StreamingContext)
System.Object.ToString()
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Serializable]
public sealed class StandardAnalyzer : StopwordAnalyzerBase, IDisposable
Constructors
Name | Description |
---|---|
StandardAnalyzer(LuceneVersion) | Builds an analyzer with the default stop words (STOP_WORDS_SET). |
StandardAnalyzer(LuceneVersion, CharArraySet) | Builds an analyzer with the given stop words. |
StandardAnalyzer(LuceneVersion, TextReader) | Builds an analyzer with the stop words from the given reader. |
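As a hedged sketch of the two stop-word constructors listed above: the class name StopWordSetup, the words "foo" and "bar", and the one-word-per-line reader format are assumptions for illustration. The CharArraySet constructor taking a version, a string collection, and an ignore-case flag is assumed to be available in the Lucene.Net version in use.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StopWordSetup
{
    // Builds an analyzer from an in-memory stop word set.
    public static StandardAnalyzer FromSet()
    {
        // ignoreCase: true, so lookups do not depend on letter case.
        var stopWords = new CharArraySet(LuceneVersion.LUCENE_48, new[] { "foo", "bar" }, true);
        return new StandardAnalyzer(LuceneVersion.LUCENE_48, stopWords);
    }

    // Builds an analyzer from a reader; assumes one stop word per line.
    public static StandardAnalyzer FromReader()
    {
        using (TextReader reader = new StringReader("foo\nbar\n"))
        {
            // The stop words are read while the constructor runs,
            // so the reader is not needed afterwards.
            return new StandardAnalyzer(LuceneVersion.LUCENE_48, reader);
        }
    }
}
```

The single-argument constructor remains the simplest choice when the default STOP_WORDS_SET is acceptable.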
Fields
Name | Description |
---|---|
DEFAULT_MAX_TOKEN_LENGTH | Default maximum allowed token length |
STOP_WORDS_SET | An unmodifiable set containing some common English words that are usually not useful for searching. |
Properties
Name | Description |
---|---|
MaxTokenLength | Set maximum allowed token length. If a token exceeds this length, it is discarded. This setting only takes effect the next time GetTokenStream is called. |
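A short sketch of adjusting MaxTokenLength follows. The limit of 5, the sample text, and the class name MaxTokenLengthDemo are arbitrary illustrative choices, and LUCENE_48 is again an assumed version.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class MaxTokenLengthDemo
{
    public static void Main()
    {
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            // Set the limit before requesting a token stream; it applies
            // to streams obtained after this point.
            analyzer.MaxTokenLength = 5;

            using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader("tiny extraordinarily small")))
            {
                ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
                stream.Reset();
                while (stream.IncrementToken())
                {
                    // Tokens longer than the limit are discarded, per the description above.
                    Console.WriteLine(term.ToString());
                }
                stream.End();
            }
        }
    }
}
```

If no value is assigned, the analyzer uses DEFAULT_MAX_TOKEN_LENGTH.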
Methods
Name | Description |
---|---|
CreateComponents(String, TextReader) |