spotterbase.dnm package
Submodules
spotterbase.dnm.defaults module
- spotterbase.dnm.defaults.get_arxmliv_dnm_factory(*, decorate_replacements: bool = True, number_replacements: bool = True, keep_titles: bool = True, keep_replacements_as_annotations: bool = True, normalize_white_space: bool = True, wrap_replacements_with_spaces: bool = False) → DnmFactory
- spotterbase.dnm.defaults.get_simple_arxmliv_factory(*, token_prefix: str = '', token_suffix: str = '', keep_titles: bool = True, core_token_processor: Callable[[str], str] = _default_core_token_processor) → SimpleDnmFactory
spotterbase.dnm.dnm module
- class spotterbase.dnm.dnm.Dnm(*, meta_info: _MetaInfoType, string: str | None = None, start_refs: Sequence[int] | None = None, end_refs: Sequence[int] | None = None, _rel_data: _RelData[_MetaInfoType] | None = None)
- get_embedded_annotations() → list[Annotation]
- sub_dnm_from_dom_range(dom_range: DomRange) → tuple[Dnm, DnmMatchIssues]
- sub_dnm_from_ref_range(start_ref: int, end_ref: int) → tuple[Dnm, DnmMatchIssues]
- to_dom_offset_range() → DomOffsetRange
- to_fragment_target(target_uri: str | Uri | URIRef | Path | VocabularyMeta) → FragmentTarget
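`sub_dnm_from_ref_range` selects part of a DNM by reference positions instead of string indices. The sketch below shows one plausible way to turn a reference range into string indices (compare `LinkedStr.get_indices_from_ref_range`). It assumes, as a simplification, that the per-character start and end references are sorted in ascending order. `indices_from_ref_range` is a hypothetical stand-alone helper, not spotterbase's implementation.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def indices_from_ref_range(
    start_refs: Sequence[int], end_refs: Sequence[int], start_ref: int, end_ref: int
) -> tuple[int, int]:
    """Hypothetical helper: map a reference range to string indices.

    Assumes start_refs[i] / end_refs[i] are the reference positions where
    character i starts / ends, and that both sequences are sorted ascending.
    """
    # first character that starts at or after start_ref
    start = bisect_left(start_refs, start_ref)
    # one past the last character that ends at or before end_ref
    end = bisect_right(end_refs, end_ref)
    return start, end


# "abc" extracted from reference positions 10..13 (one position per character)
string = "abc"
start_refs = [10, 11, 12]
end_refs = [11, 12, 13]
i, j = indices_from_ref_range(start_refs, end_refs, 11, 13)
print(string[i:j])  # bc
```

Binary search keeps the lookup logarithmic in the string length. The real method may also have to handle unsorted references, for example after replacements.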
- class spotterbase.dnm.dnm.DnmFactory
Bases:
ABC
- class spotterbase.dnm.dnm.DnmMatchIssues(dom_start_earlier: bool, dom_end_later: bool, dom_start_later: bool, dom_end_earlier: bool)
Bases:
object
When mapping part of the DOM to the DNM, the match might not be exact. This class contains more detailed information about the mismatch.
- property dnm_covers_more: bool
- property dnm_misses_something: bool
- dom_end_earlier: bool
- dom_end_later: bool
- dom_start_earlier: bool
- dom_start_later: bool
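The two properties summarize the four flags. The sketch below is a hypothetical re-implementation, not the library class. It assumes that `dom_start_earlier` and `dom_end_later` mean the DNM span, mapped back to the DOM, extends beyond the requested range. It also assumes that `dom_start_later` and `dom_end_earlier` mean the span falls short of that range. Check the source before relying on this reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchIssuesSketch:
    """Hypothetical stand-in for DnmMatchIssues (not the library class)."""

    dom_start_earlier: bool
    dom_end_later: bool
    dom_start_later: bool
    dom_end_earlier: bool

    @property
    def dnm_covers_more(self) -> bool:
        # assumption: the DNM span extends beyond the DOM range on either side
        return self.dom_start_earlier or self.dom_end_later

    @property
    def dnm_misses_something(self) -> bool:
        # assumption: the DNM span stops short of the DOM range on either side
        return self.dom_start_later or self.dom_end_earlier


issues = MatchIssuesSketch(dom_start_earlier=True, dom_end_later=False,
                           dom_start_later=False, dom_end_earlier=False)
print(issues.dnm_covers_more, issues.dnm_misses_something)  # True False
```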
- class spotterbase.dnm.dnm.DnmMeta(dom: '_Element', offset_converter: 'OffsetConverter', selector_converter: 'SelectorConverter', uri: 'Uri')
Bases:
object
- dom: _Element
- embedded_annotations: EmbeddedAnnotations
- offset_converter: OffsetConverter
- selector_converter: SelectorConverter
- class spotterbase.dnm.dnm.EmbeddedAnnotations
Bases:
object
- get_next_replacement_number(category: str) → int
- insert(replacement: str, range_: DomOffsetRange, annotation: Annotation, replacement_unique: bool = True)
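`get_next_replacement_number` suggests that each replacement category is numbered independently, so that tokens such as a numbered math replacement can be told apart. The following is a minimal sketch of such per-category numbering. `ReplacementCounter` is hypothetical, and starting at 1 is an assumption; the real `EmbeddedAnnotations` may number differently.

```python
from collections import defaultdict


class ReplacementCounter:
    """Hypothetical sketch of per-category replacement numbering.

    Numbering starts at 1 here; the real EmbeddedAnnotations may differ.
    """

    def __init__(self) -> None:
        self._counters: defaultdict[str, int] = defaultdict(int)

    def get_next_replacement_number(self, category: str) -> int:
        # each category keeps its own independent counter
        self._counters[category] += 1
        return self._counters[category]


c = ReplacementCounter()
numbers = [c.get_next_replacement_number(cat) for cat in ["math", "math", "cite", "math"]]
print(numbers)  # [1, 2, 1, 3]
```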
spotterbase.dnm.linked_str module
- class spotterbase.dnm.linked_str.LinkedStr(*, meta_info: _MetaInfoType, string: str | None = None, start_refs: Sequence[int] | None = None, end_refs: Sequence[int] | None = None, _rel_data: _RelData[_MetaInfoType] | None = None)
Bases:
Generic[_MetaInfoType]
Should be treated as immutable! For optimization, data is shared by reference (e.g. when creating a sub-linked-str).
- char_at(pos: int) → str
- get_end_ref() → int
- get_end_refs() → Sequence[int]
- get_indices_from_ref_range(start_ref, end_ref) → tuple[int, int]
- get_meta_info() → _MetaInfoType
- get_start_ref() → int
- get_start_refs() → Sequence[int]
- lower() → LinkedStr_T
- normalize_spaces() → LinkedStr_T
Replaces each sequence of whitespace characters with a single whitespace character.
- replacements_at_positions(replacements: Iterable[tuple[int, int, str]], positions_are_references: bool = True) → LinkedStr_T
- strip() → LinkedStr_T
- upper() → LinkedStr_T
- with_string(string: str) → LinkedStr_T
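The core idea of `LinkedStr` is a string in which every character remembers where it came from, so that operations like `strip` or `normalize_spaces` return a new, immutable object whose characters still point back to the source. The sketch below is a simplified, hypothetical `MiniLinkedStr`, not the library class. It assumes each character carries a single reference range `[start_refs[i], end_refs[i])`. It also assumes that a collapsed whitespace run becomes one space spanning the whole run; the real `normalize_spaces` may choose differently.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniLinkedStr:
    """Hypothetical, simplified LinkedStr: character i of `string` comes from
    the reference range [start_refs[i], end_refs[i])."""

    string: str
    start_refs: tuple[int, ...]
    end_refs: tuple[int, ...]

    @classmethod
    def from_offset(cls, string: str, offset: int = 0) -> "MiniLinkedStr":
        # each character maps to exactly one reference position
        refs = range(offset, offset + len(string))
        return cls(string, tuple(refs), tuple(r + 1 for r in refs))

    def _select(self, indices: list[int]) -> "MiniLinkedStr":
        # build a new object from a subset of characters; refs travel along
        return MiniLinkedStr(
            "".join(self.string[i] for i in indices),
            tuple(self.start_refs[i] for i in indices),
            tuple(self.end_refs[i] for i in indices),
        )

    def strip(self) -> "MiniLinkedStr":
        start = len(self.string) - len(self.string.lstrip())
        end = len(self.string.rstrip())
        return self._select(list(range(start, end)))

    def normalize_spaces(self) -> "MiniLinkedStr":
        chars: list[str] = []
        starts: list[int] = []
        ends: list[int] = []
        for m in re.finditer(r"\s+|\S", self.string):
            a, b = m.start(), m.end()
            # a whitespace run collapses to one space spanning the whole run
            chars.append(" " if self.string[a].isspace() else self.string[a])
            starts.append(self.start_refs[a])
            ends.append(self.end_refs[b - 1])
        return MiniLinkedStr("".join(chars), tuple(starts), tuple(ends))


s = MiniLinkedStr.from_offset("  a \n b ", offset=100).strip().normalize_spaces()
print(repr(s.string), s.start_refs, s.end_refs)  # 'a b' (102, 103, 106) (103, 106, 107)
```

The middle space in the result maps back to the references 103 to 106, which covers the original `" \n "` run.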
spotterbase.dnm.node_based_dnm_factory module
- class spotterbase.dnm.node_based_dnm_factory.NodeBasedDnmFactory(root_processor: NodeProcessor)
Bases:
DnmFactory
- class spotterbase.dnm.node_based_dnm_factory.NodeProcessor
Bases:
ABC
- class spotterbase.dnm.node_based_dnm_factory.ReplacingNP(replacement_pattern: ReplacementPattern, category: str, number_replacements: bool = True, keep_annotation: bool = True)
Bases:
NodeProcessor
- class spotterbase.dnm.node_based_dnm_factory.SkippingNP
Bases:
NodeProcessor
- class spotterbase.dnm.node_based_dnm_factory.SourceHtmlNP
Bases:
NodeProcessor
Essentially outputs the original HTML source of the node (regenerated from the DOM). TODO: Currently, attributes are skipped. This should be configurable.
- class spotterbase.dnm.node_based_dnm_factory.TextExtractingBlockedNP(child_processor: NodeProcessor)
Bases:
NodeProcessor
Extracts the text content of a node, but processes the children with a different processor. This can be useful to avoid infinite recursion in some cases.
- class spotterbase.dnm.node_based_dnm_factory.TextExtractingNP
Bases:
NodeProcessor
- register_class_processor(class_: str, processor: NodeProcessor)
- register_tag_processor(tag: str, processor: NodeProcessor)
- class spotterbase.dnm.node_based_dnm_factory.TokenAfterNodeNP(token: str, node_processor: NodeProcessor)
Bases:
NodeProcessor
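`TextExtractingNP` extracts text but lets callers register dedicated processors for particular tags or classes. The sketch below illustrates that dispatch idea using the standard library's `xml.etree.ElementTree`, which shares lxml's `text`/`tail` model. `TextExtractingSketch` is hypothetical. Processors are simplified to callables returning strings, and the precedence of class over tag processors is an assumption. The two lambdas only imitate the roles of `ReplacingNP` and `SkippingNP`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Hypothetical simplification: a "processor" maps an element to output text.
Processor = Callable[[ET.Element], str]


class TextExtractingSketch:
    """Hypothetical analogue of TextExtractingNP: extracts text, but hands
    child nodes with a registered tag or class to a dedicated processor."""

    def __init__(self) -> None:
        self.tag_processors: dict[str, Processor] = {}
        self.class_processors: dict[str, Processor] = {}

    def register_tag_processor(self, tag: str, processor: Processor) -> None:
        self.tag_processors[tag] = processor

    def register_class_processor(self, class_: str, processor: Processor) -> None:
        self.class_processors[class_] = processor

    def _processor_for(self, node: ET.Element) -> Optional[Processor]:
        # assumption: class processors take precedence over tag processors
        for class_ in node.get("class", "").split():
            if class_ in self.class_processors:
                return self.class_processors[class_]
        return self.tag_processors.get(node.tag)

    def __call__(self, node: ET.Element) -> str:
        parts = [node.text or ""]
        for child in node:
            proc = self._processor_for(child) or self  # default: recurse
            parts.append(proc(child))
            # the tail belongs to the parent's content, so it is always kept
            parts.append(child.tail or "")
        return "".join(parts)


extract = TextExtractingSketch()
extract.register_tag_processor("math", lambda n: "MathNode")  # ReplacingNP-like
extract.register_class_processor("ltx_note", lambda n: "")    # SkippingNP-like
doc = ET.fromstring('<p>Let <math>x^2</math> hold.<span class="ltx_note">fn</span> Done.</p>')
print(extract(doc))  # Let MathNode hold. Done.
```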
spotterbase.dnm.post_processing_dnm_factory module
- class spotterbase.dnm.post_processing_dnm_factory.PostProcessingDnmFactory(main_factory: DnmFactory, post_processor: Callable[[Dnm], Dnm])
Bases:
DnmFactory
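`PostProcessingDnmFactory` composes a main factory with a post-processing function, which is a decorator-style wrapper. The minimal sketch below shows that composition with plain strings standing in for `Dnm` objects. `FactorySketch`, its `make` method, and `UpperFactory` are hypothetical names; the actual abstract methods of `DnmFactory` are not listed in this reference.

```python
from abc import ABC, abstractmethod
from typing import Callable


class FactorySketch(ABC):
    """Hypothetical stand-in for DnmFactory; `make` is an invented method name."""

    @abstractmethod
    def make(self, source: str) -> str: ...


class UpperFactory(FactorySketch):
    """Toy main factory used only for illustration."""

    def make(self, source: str) -> str:
        return source.upper()


class PostProcessingSketch(FactorySketch):
    """Delegate to the main factory, then apply the post-processor."""

    def __init__(self, main_factory: FactorySketch, post_processor: Callable[[str], str]) -> None:
        self.main_factory = main_factory
        self.post_processor = post_processor

    def make(self, source: str) -> str:
        return self.post_processor(self.main_factory.make(source))


factory = PostProcessingSketch(UpperFactory(), lambda s: " ".join(s.split()))
print(factory.make("  some   text "))  # SOME TEXT
```

Because the wrapper is itself a factory, wrappers can be stacked to chain several post-processing steps.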
spotterbase.dnm.range_subst module
- class spotterbase.dnm.range_subst.RangeSubstituter(to_substitute: Iterable[tuple[tuple[int, int], tuple[str, _T]]])
Bases:
Generic[_T]
- apply(dnm: LinkedStr_T) → LinkedStr_T
- ordered_ref_ranges: list[tuple[int, int]]
- replacement_values: dict[tuple[int, int], str]
- substitution_values: dict[tuple[int, int], _T]
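The attributes of `RangeSubstituter` suggest that it stores, for each range, a replacement string plus an arbitrary associated value, and applies the ranges in order. The sketch below is a hypothetical simplification that works on plain string indices. The real `apply` operates on a `LinkedStr` and its ranges are references. Rejecting overlapping ranges is an assumption.

```python
from typing import Generic, Iterable, TypeVar

T = TypeVar("T")


class RangeSubstituterSketch(Generic[T]):
    """Hypothetical simplification of RangeSubstituter working on plain string
    indices instead of LinkedStr references."""

    def __init__(self, to_substitute: Iterable[tuple[tuple[int, int], tuple[str, T]]]) -> None:
        self.replacement_values: dict[tuple[int, int], str] = {}
        self.substitution_values: dict[tuple[int, int], T] = {}
        for range_, (replacement, value) in to_substitute:
            self.replacement_values[range_] = replacement
            self.substitution_values[range_] = value
        self.ordered_ref_ranges = sorted(self.replacement_values)
        # assumption: ranges must not overlap
        for (_, end), (start, _) in zip(self.ordered_ref_ranges, self.ordered_ref_ranges[1:]):
            if start < end:
                raise ValueError("overlapping ranges")

    def apply(self, text: str) -> str:
        parts: list[str] = []
        pos = 0
        for start, end in self.ordered_ref_ranges:
            parts.append(text[pos:start])  # untouched text before the range
            parts.append(self.replacement_values[(start, end)])
            pos = end
        parts.append(text[pos:])
        return "".join(parts)


text = "We have $x$ and $y^2$."
sub = RangeSubstituterSketch([((16, 21), ("MathNode", "y^2")), ((8, 11), ("MathNode", "x"))])
print(sub.apply(text))  # We have MathNode and MathNode.
print(sub.substitution_values[(8, 11)])  # x
```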
spotterbase.dnm.replacement_pattern module
- class spotterbase.dnm.replacement_pattern.CategoryStyle
Bases:
object
- ALL_CAPS() → str
- CAMEL_CASE() → str
- HYPHENATED() → str
- class spotterbase.dnm.replacement_pattern.ReplacementPattern
Bases:
ABC
- class spotterbase.dnm.replacement_pattern.StandardReplacementPattern(prefix: 'str' = '', infix: 'str' = '', suffix: 'str' = '', include_infix_if_unnumbered: 'bool' = False, include_prefix_suffix_if_unnumbered: 'bool' = True, category_style: 'Callable[[str], str]' = <functools._lru_cache_wrapper object>)
Bases:
ReplacementPattern
- category_style() → str
- include_infix_if_unnumbered: bool = False
- include_prefix_suffix_if_unnumbered: bool = True
- infix: str = ''
- prefix: str = ''
- suffix: str = ''
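The fields of `StandardReplacementPattern` describe how a replacement token is assembled from a styled category name, an optional number, and a prefix, infix, and suffix. The sketch below is one hypothetical reading of those fields, not the library's logic. The style functions (stand-ins for `CategoryStyle`), the `get_replacement` method name, and the assembly order `prefix + category + infix + number + suffix` are all assumptions, as is the assumption that categories are space-separated words.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical category styles; assumption: categories are space-separated words.
def all_caps(category: str) -> str:
    return "_".join(category.split()).upper()


def camel_case(category: str) -> str:
    return "".join(w.capitalize() for w in category.split())


def hyphenated(category: str) -> str:
    return "-".join(category.lower().split())


@dataclass(frozen=True)
class PatternSketch:
    """Hypothetical reading of StandardReplacementPattern's fields."""

    prefix: str = ""
    infix: str = ""
    suffix: str = ""
    include_infix_if_unnumbered: bool = False
    include_prefix_suffix_if_unnumbered: bool = True
    category_style: Callable[[str], str] = hyphenated

    def get_replacement(self, category: str, number: Optional[int] = None) -> str:
        core = self.category_style(category)
        if number is not None:
            # numbered: prefix + styled category + infix + number + suffix
            return f"{self.prefix}{core}{self.infix}{number}{self.suffix}"
        if self.include_infix_if_unnumbered:
            core += self.infix
        if self.include_prefix_suffix_if_unnumbered:
            return f"{self.prefix}{core}{self.suffix}"
        return core


p = PatternSketch(prefix="[", infix="-", suffix="]", category_style=all_caps)
print(p.get_replacement("inline math", 3))  # [INLINE_MATH-3]
print(p.get_replacement("inline math"))     # [INLINE_MATH]
print(PatternSketch(category_style=camel_case).get_replacement("inline math", 1))  # InlineMath1
```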
spotterbase.dnm.simple_dnm_factory module
- class spotterbase.dnm.simple_dnm_factory.SimpleDnmFactory(nodes_to_replace: dict[str, str] | None = None, classes_to_replace: dict[str, str] | None = None)
Bases:
DnmFactory
spotterbase.dnm.xml_util module
- class spotterbase.dnm.xml_util.XmlNode(node: _Element, text_node: Literal['text', 'tail', 'none'] = 'none')
Bases:
object
The lxml implementation of text nodes is not very convenient, so this class wraps the nodes that are needed.
- node: _Element
- tail: bool = False
- text: bool = False
- spotterbase.dnm.xml_util.get_node_classes(node: _Element) → list[str]
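The awkwardness that `XmlNode` works around comes from the `text`/`tail` model, which lxml shares with the standard library's `xml.etree.ElementTree`. Text after an element's closing tag is stored on that element as `tail`, not on its parent, so text "nodes" must be addressed as (element, `'text'` or `'tail'`). The sketch below demonstrates this with `ElementTree`. `node_classes` is a hypothetical analogue of `get_node_classes` that assumes classes are the whitespace-separated tokens of the `class` attribute.

```python
import xml.etree.ElementTree as ET


def node_classes(node: ET.Element) -> list[str]:
    """Hypothetical analogue of get_node_classes: split the class attribute."""
    return node.get("class", "").split()


# Text inside an element before its first child is `.text`; text after the
# element's closing tag (but still inside the parent) is the element's `.tail`.
p = ET.fromstring('<p>before <b class="ltx_font_bold  x">bold</b> after</p>')
b = p[0]
print(repr(p.text), repr(b.text), repr(b.tail))  # 'before ' 'bold' ' after'
print(node_classes(b))  # ['ltx_font_bold', 'x']
```

In this model, `" after"` belongs to `<b>` even though it is rendered as part of `<p>`. A wrapper such as `XmlNode(node, text_node='tail')` can refer to that text explicitly.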