Module: SchemaGraphy::RegexpUtils

Defined in:
lib/schemagraphy/regexp_utils.rb

Overview

A utility module for parsing pattern strings into `Regexp` objects and extracting captured content from matches. It accepts regexp literals (`/pattern/flags`, `%r{pattern}flags`) as well as plain pattern strings.

Class Method Details

.create_regexp(pattern, flags = '') ⇒ Regexp

Create a Regexp object from a pattern string and explicit flags.

Parameters:

  • pattern (String)
    The regex pattern (without delimiters).
  • flags (String) (defaults to: '')
    The flags string (e.g., "im").

Returns:

  • (Regexp)
    The compiled Regexp object.


# File 'lib/schemagraphy/regexp_utils.rb', line 137

def create_regexp pattern, flags = ''
  options = flags_to_options(flags)
  Regexp.new(pattern, options)
end
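
A minimal usage sketch of the listing above. The two helpers are copied here as plain top-level methods so the snippet runs without the gem; in the library they are module methods on `SchemaGraphy::RegexpUtils`.

```ruby
# Local copies of the helpers documented in this section (standalone sketch).
def flags_to_options(flags)
  options = 0
  flags = flags.to_s
  options |= Regexp::IGNORECASE if flags.include?('i')
  options |= Regexp::MULTILINE if flags.include?('m')
  options |= Regexp::EXTENDED if flags.include?('x')
  options
end

def create_regexp(pattern, flags = '')
  Regexp.new(pattern, flags_to_options(flags))
end

with_flags = create_regexp('^hello.world$', 'im')
no_flags   = create_regexp('^hello.world$')

# 'i' ignores case and 'm' lets '.' match the newline.
p with_flags.match?("HELLO\nWORLD") # true
p no_flags.match?("HELLO\nWORLD")   # false
```

The pattern is passed without delimiters; a string such as `"/abc/i"` would be compiled with the slashes as literal characters.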

.extract_all_captures(text, pattern_info) ⇒ Hash, ...

Extract every capture from the first match: a hash keyed by group name if the pattern defines named groups, otherwise an array of positional captures.

Parameters:

  • text (String)
    The text to match against.
  • pattern_info (Hash)
    The hash result from `parse_pattern`.

Returns:

  • (Hash, Array, nil)
    A hash of named captures, an array of positional captures, or `nil` if there is no match.


# File 'lib/schemagraphy/regexp_utils.rb', line 173

def extract_all_captures text, pattern_info
  return nil unless text && pattern_info

  regexp = pattern_info[:regexp]
  match = text.match(regexp)

  return nil unless match

  if match.names.any?
    # Return hash of named captures
    match.names.each_with_object({}) do |name, captures|
      captures[name] = match[name]
    end
  else
    # Return array of positional captures
    match.captures
  end
end
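
The two return shapes can be seen in a short standalone sketch. The method is copied from the listing above, and the `pattern_info` hashes are hand-built stand-ins for the result of `parse_pattern` (only the `:regexp` key is read).

```ruby
# Standalone copy of extract_all_captures from the listing above.
def extract_all_captures(text, pattern_info)
  return nil unless text && pattern_info

  match = text.match(pattern_info[:regexp])
  return nil unless match

  if match.names.any?
    match.names.each_with_object({}) { |name, captures| captures[name] = match[name] }
  else
    match.captures
  end
end

# Hypothetical pattern_info hashes built by hand for illustration.
named      = { regexp: /(?<year>\d{4})-(?<month>\d{2})/ }
positional = { regexp: /(\d{4})-(\d{2})/ }

by_name  = extract_all_captures('released 2024-03', named)
by_index = extract_all_captures('released 2024-03', positional)

puts by_name['year']   # 2024
puts by_name['month']  # 03
puts by_index.first    # 2024
p extract_all_captures('no date here', named) # nil
```

Keys in the returned hash are strings, because `MatchData#names` returns strings.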

.extract_capture(text, pattern_info, capture_name = nil) ⇒ String?

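Extract a single capture from the first match. Returns the group named by `capture_name` if the pattern defines it; otherwise the first capture group; otherwise the entire match.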

Parameters:

  • text (String)
    The text to match against.
  • pattern_info (Hash)
    The hash result from `parse_pattern`.
  • capture_name (String) (defaults to: nil)
    The name of the capture group to extract (optional).

Returns:

  • (String, nil)
    The extracted text, or `nil` if no match is found.


# File 'lib/schemagraphy/regexp_utils.rb', line 148

def extract_capture text, pattern_info, capture_name = nil
  return nil unless text && pattern_info

  regexp = pattern_info[:regexp]
  match = text.match(regexp)

  return nil unless match

  if capture_name && match.names.include?(capture_name.to_s)
    # Extract named capture group
    match[capture_name.to_s]
  elsif match.captures.any?
    # Extract first capture group
    match[1]
  else
    # Return the entire match
    match[0]
  end
end
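
The fallback order in the listing above (named group, then first group, then whole match) can be exercised with a standalone sketch. The method is copied from the listing, and the `pattern_info` hashes are hand-built stand-ins for the result of `parse_pattern`.

```ruby
# Standalone copy of extract_capture from the listing above.
def extract_capture(text, pattern_info, capture_name = nil)
  return nil unless text && pattern_info

  match = text.match(pattern_info[:regexp])
  return nil unless match

  if capture_name && match.names.include?(capture_name.to_s)
    match[capture_name.to_s] # named capture group
  elsif match.captures.any?
    match[1]                 # first capture group
  else
    match[0]                 # entire match
  end
end

# Hypothetical pattern_info hashes built by hand for illustration.
grouped   = { regexp: /v(?<major>\d+)\.(?<minor>\d+)/ }
ungrouped = { regexp: /v\d+\.\d+/ }

puts extract_capture('version v3.12', grouped, :minor)  # 12
puts extract_capture('version v3.12', grouped)          # 3
puts extract_capture('version v3.12', grouped, 'patch') # 3 (unknown name falls back to group 1)
puts extract_capture('version v3.12', ungrouped)        # v3.12
```

Note that an unknown `capture_name` does not return `nil`; it silently falls back to the first capture group.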

.extract_flags_from_regexp(regexp) ⇒ String

Extract a flags string from a compiled Regexp object.

Parameters:

  • regexp (Regexp)
    A compiled regexp object.

Returns:

  • (String)
    String representation of the flags (e.g., "im").


# File 'lib/schemagraphy/regexp_utils.rb', line 124

def extract_flags_from_regexp regexp
  flags = ''
  flags += 'i' if regexp.options.anybits?(Regexp::IGNORECASE)
  flags += 'm' if regexp.options.anybits?(Regexp::MULTILINE)
  flags += 'x' if regexp.options.anybits?(Regexp::EXTENDED)
  flags
end
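
A standalone sketch of the listing above, applied to a few regexp literals. The method body is copied as-is; `Integer#anybits?` requires Ruby 2.5 or later.

```ruby
# Standalone copy of extract_flags_from_regexp from the listing above.
def extract_flags_from_regexp(regexp)
  flags = ''
  flags += 'i' if regexp.options.anybits?(Regexp::IGNORECASE)
  flags += 'm' if regexp.options.anybits?(Regexp::MULTILINE)
  flags += 'x' if regexp.options.anybits?(Regexp::EXTENDED)
  flags
end

puts extract_flags_from_regexp(/abc/mi)      # im (always reported in i, m, x order)
puts extract_flags_from_regexp(/a b c/x)     # x
puts extract_flags_from_regexp(/abc/).empty? # true
```

Only the `i`, `m`, and `x` bits are inspected, so any encoding-related option bits on the regexp do not appear in the result.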

.flags_to_options(flags) ⇒ Integer

Convert a flags string (e.g., "im") to a Regexp options integer. Only `i`, `m`, and `x` are recognized; other characters are ignored.

Parameters:

  • flags (String)
    String containing regex flags.

Returns:

  • (Integer)
    Regexp options integer.


# File 'lib/schemagraphy/regexp_utils.rb', line 106

def flags_to_options flags
  options = 0
  flags = flags.to_s

  options |= Regexp::IGNORECASE if flags.include?('i')
  options |= Regexp::MULTILINE if flags.include?('m')
  options |= Regexp::EXTENDED if flags.include?('x')

  # NOTE: 'g' (global) and 'o' (once) are not standard Ruby flags
  # encoding flags ('n', 'e', 's', 'u') are handled by to_regexp

  options
end
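
The options integer is a bitmask of Ruby's `Regexp` constants (`IGNORECASE` is 1, `EXTENDED` is 2, `MULTILINE` is 4). The sketch below copies the method from the listing above and shows the resulting values.

```ruby
# Standalone copy of flags_to_options from the listing above.
def flags_to_options(flags)
  options = 0
  flags = flags.to_s

  options |= Regexp::IGNORECASE if flags.include?('i')
  options |= Regexp::MULTILINE if flags.include?('m')
  options |= Regexp::EXTENDED if flags.include?('x')

  options
end

p flags_to_options('im')  # 5 (1 | 4)
p flags_to_options('imx') # 7 (1 | 4 | 2)
p flags_to_options('gi')  # 1 ('g' is ignored)
p flags_to_options(nil)   # 0 (nil.to_s is the empty string)
```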

.parse_and_extract(text, pattern_input, capture_name = nil, default_flags = '') ⇒ String?

A convenience method that combines parsing and a single extraction.

Parameters:

  • text (String)
    The text to match against.
  • pattern_input (String)
    The pattern string, either a literal (`/pattern/flags`) or a plain pattern.
  • capture_name (String) (defaults to: nil)
    Name of the capture group to extract (optional).
  • default_flags (String) (defaults to: '')
    Flags applied when `pattern_input` is a plain pattern; a literal uses its own flags.

Returns:

  • (String, nil)
    The extracted text, or `nil` if no match is found.


# File 'lib/schemagraphy/regexp_utils.rb', line 199

def parse_and_extract text, pattern_input, capture_name = nil, default_flags = ''
  pattern_info = parse_pattern(pattern_input, default_flags)
  extract_capture(text, pattern_info, capture_name)
end

.parse_and_extract_all(text, pattern_input, default_flags = '') ⇒ Hash, ...

A convenience method that combines parsing and extraction of all captures.

Parameters:

  • text (String)
    The text to match against.
  • pattern_input (String)
    The pattern string, either a literal (`/pattern/flags`) or a plain pattern.
  • default_flags (String) (defaults to: '')
    Flags applied when `pattern_input` is a plain pattern; a literal uses its own flags.

Returns:

  • (Hash, Array, nil)
    All captured content, or `nil` if no match is found.


# File 'lib/schemagraphy/regexp_utils.rb', line 210

def parse_and_extract_all text, pattern_input, default_flags = ''
  pattern_info = parse_pattern(pattern_input, default_flags)
  extract_all_captures(text, pattern_info)
end

.parse_pattern(input, default_flags = '') ⇒ Hash?

Parse a regex pattern string into a `Regexp` and its components. Literal forms (`/pattern/flags`, `%r{pattern}flags`) are parsed with the `to_regexp` gem; anything else is compiled as a plain pattern with `default_flags`.

Examples:

parse_pattern("/^hello.*$/im")
# => { pattern: "^hello.*$", flags: "im", regexp: /^hello.*$/im, options: 5 }
parse_pattern("hello world")
# => { pattern: "hello world", flags: "", regexp: /hello world/, options: 0 }
parse_pattern("hello world", "i")
# => { pattern: "hello world", flags: "i", regexp: /hello world/i, options: 1 }

Parameters:

  • input (String)
    The input string, e.g., "/pattern/flags" or "plain pattern".
  • default_flags (String) (defaults to: '')
    Flags applied when `input` is a plain pattern; a literal uses its own flags.

Returns:

  • (Hash, nil)
    A hash with `:pattern`, `:flags`, `:regexp`, and `:options`, or `nil` if `input` is `nil` or blank.

Raises:

  • (RegexpError)
    If the pattern cannot be compiled.


# File 'lib/schemagraphy/regexp_utils.rb', line 30

def parse_pattern input, default_flags = ''
  return nil if input.nil? || input.to_s.strip.empty?

  input_str = input.to_s.strip

  # Remove surrounding quotes that might come from YAML parsing
  clean_input = input_str.gsub(/^["']|["']$/, '')

  # Heuristic to detect if it's a Regexp literal
  is_literal = (clean_input.start_with?('/') && clean_input.rindex('/').positive?) || clean_input.start_with?('%r{')

  if is_literal
    # Try to parse as regex literal using to_regexp
    begin
      regexp_obj = clean_input.to_regexp(detect: true)

      # Extract pattern and flags from the compiled regexp
      pattern_str = regexp_obj.source
      flags_str = extract_flags_from_regexp(regexp_obj)

      {
        pattern: pattern_str,
        flags: flags_str,
        regexp: regexp_obj,
        options: regexp_obj.options
      }
    rescue RegexpError => e
      # Malformed literal is an error
      raise RegexpError, "Invalid regex literal '#{input}': #{e.message}"
    end
  else
    # Treat as plain pattern string with default flags
    flags_str = default_flags.to_s
    options = flags_to_options(flags_str)

    begin
      regexp_obj = Regexp.new(clean_input, options)

      {
        pattern: clean_input,
        flags: flags_str,
        regexp: regexp_obj,
        options: options
      }
    rescue RegexpError => e
      raise RegexpError, "Invalid regex pattern '#{input}': #{e.message}"
    end
  end
end
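
The first steps of the listing above (quote stripping and the literal-detection heuristic) can be tried in isolation without the `to_regexp` gem. The helper names `strip_quotes` and `literal?` are hypothetical and exist only in this sketch; the expressions inside them are copied from the listing.

```ruby
# Hypothetical helper: the quote-stripping step of parse_pattern.
def strip_quotes(input)
  input.to_s.strip.gsub(/^["']|["']$/, '')
end

# Hypothetical helper: the literal-detection heuristic of parse_pattern.
def literal?(clean_input)
  (clean_input.start_with?('/') && clean_input.rindex('/').positive?) ||
    clean_input.start_with?('%r{')
end

p literal?('/^hello.*$/im')        # true
p literal?('%r{a/b}i')             # true
p literal?('hello world')          # false
p literal?('/usr')                 # false (the only slash is at index 0)
p literal?('/usr/bin')             # true  (a second slash is enough)
p literal?(strip_quotes('"/abc/i"')) # true (surrounding quotes are removed first)
```

As the `/usr/bin` case suggests, any string that starts with a slash and contains a second one is routed to the literal branch, so a path-like plain pattern is handed to `to_regexp` rather than compiled with `default_flags`.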

.parse_structured_pattern(pattern_hash) ⇒ Object

Note:
Not yet implemented.
Future enhancement to parse structured pattern definitions from a Hash.

Parameters:

  • pattern_hash (Hash)
    A hash with 'pattern' and 'flags' keys.

Raises:

  • (NotImplementedError)
    Always raises this error.


# File 'lib/schemagraphy/regexp_utils.rb', line 84

def parse_structured_pattern pattern_hash
  # TODO: Implement structured pattern parsing
  # pattern_hash should have 'pattern' and 'flags' keys
  # flags can be string or array
  raise NotImplementedError, 'Structured pattern parsing not yet implemented'
end

.parse_tagged_pattern(tagged_input, tag_type) ⇒ Object

Note:
Not yet implemented.
Future enhancement to parse custom YAML tags for regular expressions.

Parameters:

  • tagged_input (String)
    The input string with a YAML tag.
  • tag_type (Symbol)
    The type of tag, e.g., `:literal` or `:pattern`.

Raises:

  • (NotImplementedError)
    Always raises this error.


# File 'lib/schemagraphy/regexp_utils.rb', line 96

def parse_tagged_pattern tagged_input, tag_type
  # TODO: Implement custom YAML tag parsing
  # tag_type would be :literal or :pattern
  raise NotImplementedError, 'Tagged pattern parsing not yet implemented'
end